| Age | Commit message | Author |
|
* ZJIT: Profile `invokesuper` instructions
* ZJIT: Introduce the `InvokeSuperDirect` HIR instruction
The new instruction is an optimized version of `InvokeSuper` when we know the `super` target is an ISEQ.
* ZJIT: Expand definition of unspecializable to more complex cases
* ZJIT: Ensure `invokesuper` optimization works when the inheritance hierarchy is modified
* ZJIT: Simplify `invokesuper` specialization to most common case
Looking at ruby-bench, most `super` calls don't pass a block, which means we can use the already optimized `SendWithoutBlockDirect`.
* ZJIT: Track `super` method entries directly to avoid GC issues
Because the method entry isn't typed as a `VALUE`, we set up barriers on its `VALUE` fields. However, that was insufficient: the method entry itself could be collected in certain cases, leaving dangling references. Now we track the method entry as a `VALUE` and can more naturally mark it and its children.
* ZJIT: Optimize `super` calls with simple argument forms
* ZJIT: Report the reason why we can't optimize an `invokesuper` instance
* ZJIT: Revise send fallback reasons for `super` calls
* ZJIT: Assert `super` calls are `FCALL` and don't need visibility checks
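
To make the call shapes in the list above concrete, here is a small plain-Ruby illustration (not ZJIT code). It shows a `super` call with simple positional arguments and no block, and how including a module later changes which method `super` resolves to, which is the situation any `invokesuper` specialization has to stay correct under. The class and module names (`Base`, `Child`, `Polite`) are made up for this sketch.

```ruby
# Illustration only: ordinary Ruby, hypothetical class and module names.
class Base
  def greet(name)
    "hello, #{name}"
  end
end

class Child < Base
  # A `super` call with one positional argument and no block.
  def greet(name)
    super(name).upcase
  end
end

# Included after Child is defined, so it lands between Child and Base
# in the ancestor chain and becomes the new `super` target.
module Polite
  def greet(name)
    super(name) + ", welcome"
  end
end

obj = Child.new
before = obj.greet("zjit")   # resolves `super` to Base#greet

Child.include(Polite)        # modify the inheritance hierarchy
after = obj.greet("zjit")    # now resolves `super` to Polite#greet

puts before
puts after
```

The same call site in `Child#greet` reaches a different method after `Child.include(Polite)`, so code compiled against the old `super` target must be invalidated or guarded.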
|
|
|
|
|
|
Don't support shape transitions for now.
|
|
This lets us constant-fold common monomorphic cases.
|
|
As can be seen in vm_block_handler_verify(), VM_BLOCK_HANDLER_NONE is
not a valid argument for vm_block_handler(). Store nil in the profiler
when seen instead of crashing.
|
|
|
|
Storing the tagged block handler in profiles is not GC-safe (nice catch,
Kokubun). Store the untagged block handler instead.
Fix bug in https://github.com/ruby/ruby/pull/15051
|
|
I made a special kind of `ProfiledType` that looks at specific objects, not just their classes/shapes (https://github.com/ruby/ruby/pull/15051). Then I profiled some of our benchmarks.
For lobsters:
```
Top-6 invokeblock handler (100.0% of total 1,064,155):
megamorphic: 494,931 (46.5%)
monomorphic_iseq: 337,171 (31.7%)
polymorphic: 113,381 (10.7%)
monomorphic_ifunc: 52,260 ( 4.9%)
monomorphic_other: 38,970 ( 3.7%)
no_profiles: 27,442 ( 2.6%)
```
For railsbench:
```
Top-6 invokeblock handler (100.0% of total 2,529,104):
monomorphic_iseq: 834,452 (33.0%)
megamorphic: 818,347 (32.4%)
polymorphic: 632,273 (25.0%)
monomorphic_ifunc: 224,243 ( 8.9%)
monomorphic_other: 19,595 ( 0.8%)
no_profiles: 194 ( 0.0%)
```
For shipit:
```
Top-6 invokeblock handler (100.0% of total 2,104,148):
megamorphic: 1,269,889 (60.4%)
polymorphic: 411,475 (19.6%)
no_profiles: 173,367 ( 8.2%)
monomorphic_other: 118,619 ( 5.6%)
monomorphic_iseq: 84,891 ( 4.0%)
monomorphic_ifunc: 45,907 ( 2.2%)
```
Seems like a monomorphic case for a specific ISEQ actually isn't a bad way of going about this, at least to start...
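
As a rough sketch of how the buckets in the tables above could be derived from what a single `invokeblock` site observes, the following Ruby classifies a list of observed block handlers. This is an assumption-laden illustration, not ZJIT's profiler: the `[kind, id]` pairs are stand-ins for real block handlers, `classify_handlers` is a hypothetical helper, and the `POLYMORPHIC_LIMIT` cutoff of 4 is invented for the example.

```ruby
# Assumed cutoff between "polymorphic" and "megamorphic"; not ZJIT's value.
POLYMORPHIC_LIMIT = 4

# observed: array of [kind, id] pairs, where kind is :iseq, :ifunc, or
# anything else, and id stands in for the identity of the handler.
def classify_handlers(observed)
  distinct = observed.uniq
  return :no_profiles if distinct.empty?
  return :megamorphic if distinct.size > POLYMORPHIC_LIMIT
  return :polymorphic if distinct.size > 1

  kind, _id = distinct.first
  case kind
  when :iseq  then :monomorphic_iseq
  when :ifunc then :monomorphic_ifunc
  else             :monomorphic_other
  end
end

p classify_handlers([])
p classify_handlers([[:iseq, 1], [:iseq, 1]])
p classify_handlers([[:iseq, 1], [:ifunc, 2]])
p classify_handlers((1..5).map { |i| [:iseq, i] })
```

Only the `monomorphic_iseq` bucket gives a single known ISEQ to specialize on, which is why it is the natural first target.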
|
|
|
|
Since `Send` has a block iseq, I updated `CCallWithFrame` to take an optional `blockiseq` as well, and then generate `CCallWithFrame` for `Send` when the condition is right.
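
For context, the Ruby-level shape this targets is a call that passes a literal block to a C-implemented method. The sketch below is plain Ruby, not ZJIT code; `cfunc?` is a hypothetical helper that relies on `Method#source_location` returning `nil` for methods without Ruby source, and it assumes `String#each_char` is implemented in C in the CRuby build at hand.

```ruby
# Heuristic: a method with no Ruby source location is usually a cfunc in CRuby.
def cfunc?(receiver, name)
  receiver.method(name).source_location.nil?
end

chars = []
# A send with a literal block: the block body is compiled to its own ISEQ.
"abc".each_char { |c| chars << c.upcase }

p chars
puts "String#each_char looks like a cfunc: #{cfunc?("abc", :each_char)}"
```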
## Stats
`liquid-render` Benchmark
| Metric | Before | After | Change |
|----------------------|--------------------|--------------------|---------------------|
| send_no_profiles | 3,209,418 (34.1%) | 4,119 (0.1%) | -3,205,299 (-99.9%) |
| dynamic_send_count | 9,410,758 (23.1%) | 6,459,678 (15.9%) | -2,951,080 (-31.4%) |
| optimized_send_count | 31,269,388 (76.9%) | 34,220,474 (84.1%) | +2,951,086 (+9.4%) |
`lobsters` Benchmark
| Metric | Before | After | Change |
|----------------------|------------|------------|---------------------|
| send_no_profiles | 10,769,052 | 2,902,865 | -7,866,187 (-73.0%) |
| dynamic_send_count | 45,673,185 | 42,880,160 | -2,793,025 (-6.1%) |
| optimized_send_count | 75,142,407 | 78,378,514 | +3,236,107 (+4.3%) |
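
The "Change" column in the tables above is the absolute difference followed by the difference relative to the "Before" value. A minimal Ruby sketch of that arithmetic, using a hypothetical `change` helper and figures copied from the tables:

```ruby
# Returns [absolute delta, percent change relative to `before`, rounded to 0.1].
def change(before, after)
  delta = after - before
  pct = (delta * 100.0 / before).round(1)
  [delta, pct]
end

p change(3_209_418, 4_119)            # liquid-render send_no_profiles
p change(45_673_185, 42_880_160)      # lobsters dynamic_send_count
p change(75_142_407, 78_378_514)      # lobsters optimized_send_count
```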
### `liquid-render` Before
<details>
```
Average of last 22, non-warmup iters: 262ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (96.9% of total 10,370,809):
Kernel#respond_to?: 5,069,204 (48.9%)
Hash#key?: 2,394,488 (23.1%)
Set#include?: 778,429 ( 7.5%)
String#===: 326,134 ( 3.1%)
String#<<: 203,231 ( 2.0%)
Integer#<<: 166,768 ( 1.6%)
Kernel#is_a?: 164,272 ( 1.6%)
Kernel#format: 124,262 ( 1.2%)
Integer#/: 124,262 ( 1.2%)
Array#<<: 115,325 ( 1.1%)
Regexp.last_match: 94,862 ( 0.9%)
Hash#[]=: 88,485 ( 0.9%)
String#start_with?: 55,933 ( 0.5%)
CGI::EscapeExt#escapeHTML: 55,471 ( 0.5%)
Array#shift: 55,298 ( 0.5%)
Regexp#===: 48,928 ( 0.5%)
String#=~: 48,477 ( 0.5%)
Array#unshift: 47,331 ( 0.5%)
String#empty?: 42,870 ( 0.4%)
Array#push: 41,215 ( 0.4%)
Top-20 not annotated C methods (97.1% of total 10,394,421):
Kernel#respond_to?: 5,069,204 (48.8%)
Hash#key?: 2,394,488 (23.0%)
Set#include?: 778,429 ( 7.5%)
String#===: 326,134 ( 3.1%)
Kernel#is_a?: 208,664 ( 2.0%)
String#<<: 203,231 ( 2.0%)
Integer#<<: 166,768 ( 1.6%)
Integer#/: 124,262 ( 1.2%)
Kernel#format: 124,262 ( 1.2%)
Array#<<: 115,325 ( 1.1%)
Regexp.last_match: 94,862 ( 0.9%)
Hash#[]=: 88,485 ( 0.9%)
String#start_with?: 55,933 ( 0.5%)
CGI::EscapeExt#escapeHTML: 55,471 ( 0.5%)
Array#shift: 55,298 ( 0.5%)
Regexp#===: 48,928 ( 0.5%)
String#=~: 48,477 ( 0.5%)
Array#unshift: 47,331 ( 0.5%)
String#empty?: 42,870 ( 0.4%)
Array#push: 41,215 ( 0.4%)
Top-2 not optimized method types for send (100.0% of total 2,382):
cfunc: 1,196 (50.2%)
iseq: 1,186 (49.8%)
Top-4 not optimized method types for send_without_block (100.0% of total 2,561,006):
iseq: 2,442,091 (95.4%)
optimized: 118,882 ( 4.6%)
alias: 20 ( 0.0%)
null: 13 ( 0.0%)
Top-9 not optimized instructions (100.0% of total 685,128):
invokeblock: 227,376 (33.2%)
opt_neq: 166,471 (24.3%)
opt_and: 166,471 (24.3%)
opt_eq: 66,721 ( 9.7%)
invokesuper: 39,363 ( 5.7%)
opt_le: 16,278 ( 2.4%)
opt_minus: 1,574 ( 0.2%)
opt_send_without_block: 772 ( 0.1%)
opt_or: 102 ( 0.0%)
Top-8 send fallback reasons (100.0% of total 9,410,758):
send_no_profiles: 3,209,418 (34.1%)
send_without_block_polymorphic: 2,858,558 (30.4%)
send_without_block_not_optimized_method_type: 2,561,006 (27.2%)
not_optimized_instruction: 685,128 ( 7.3%)
send_without_block_no_profiles: 91,913 ( 1.0%)
send_not_optimized_method_type: 2,382 ( 0.0%)
obj_to_string_not_string: 2,352 ( 0.0%)
send_without_block_cfunc_array_variadic: 1 ( 0.0%)
Top-3 unhandled YARV insns (100.0% of total 83,682):
getclassvariable: 83,431 (99.7%)
once: 137 ( 0.2%)
getconstant: 114 ( 0.1%)
Top-3 compile error reasons (100.0% of total 5,431,910):
register_spill_on_alloc: 4,665,393 (85.9%)
exception_handler: 766,347 (14.1%)
register_spill_on_ccall: 170 ( 0.0%)
Top-11 side exit reasons (100.0% of total 14,635,508):
compile_error: 5,431,910 (37.1%)
guard_shape_failure: 3,436,341 (23.5%)
guard_type_failure: 2,545,791 (17.4%)
unhandled_splat: 2,162,907 (14.8%)
unhandled_kwarg: 952,568 ( 6.5%)
unhandled_yarv_insn: 83,682 ( 0.6%)
unhandled_hir_insn: 19,112 ( 0.1%)
patchpoint_stable_constant_names: 1,608 ( 0.0%)
obj_to_string_fallback: 902 ( 0.0%)
patchpoint_method_redefined: 599 ( 0.0%)
block_param_proxy_not_iseq_or_ifunc: 88 ( 0.0%)
send_count: 40,680,153
dynamic_send_count: 9,410,758 (23.1%)
optimized_send_count: 31,269,395 (76.9%)
iseq_optimized_send_count: 13,886,902 (34.1%)
inline_cfunc_optimized_send_count: 7,011,684 (17.2%)
non_variadic_cfunc_optimized_send_count: 4,670,333 (11.5%)
variadic_cfunc_optimized_send_count: 5,700,476 (14.0%)
dynamic_getivar_count: 1,144,613
dynamic_setivar_count: 950,830
compiled_iseq_count: 402
failed_iseq_count: 48
compile_time: 976ms
profile_time: 3,223ms
gc_time: 22ms
invalidation_time: 0ms
vm_write_pc_count: 37,744,491
vm_write_sp_count: 37,511,865
vm_write_locals_count: 37,511,865
vm_write_stack_count: 37,511,865
vm_write_to_parent_iseq_local_count: 558,177
vm_read_from_parent_iseq_local_count: 14,317,032
code_region_bytes: 2,211,840
side_exit_count: 14,635,508
total_insn_count: 476,097,972
vm_insn_count: 253,795,154
zjit_insn_count: 222,302,818
ratio_in_zjit: 46.7%
```
</details>
### `liquid-render` After
<details>
```
Average of last 21, non-warmup iters: 272ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (96.8% of total 10,093,966):
Kernel#respond_to?: 4,932,224 (48.9%)
Hash#key?: 2,329,928 (23.1%)
Set#include?: 757,389 ( 7.5%)
String#===: 317,494 ( 3.1%)
String#<<: 197,831 ( 2.0%)
Integer#<<: 162,268 ( 1.6%)
Kernel#is_a?: 159,892 ( 1.6%)
Kernel#format: 120,902 ( 1.2%)
Integer#/: 120,902 ( 1.2%)
Array#<<: 112,225 ( 1.1%)
Regexp.last_match: 92,382 ( 0.9%)
Hash#[]=: 86,145 ( 0.9%)
String#start_with?: 54,953 ( 0.5%)
Array#shift: 54,038 ( 0.5%)
CGI::EscapeExt#escapeHTML: 53,971 ( 0.5%)
Regexp#===: 47,848 ( 0.5%)
String#=~: 47,237 ( 0.5%)
Array#unshift: 46,051 ( 0.5%)
String#empty?: 41,750 ( 0.4%)
Array#push: 40,115 ( 0.4%)
Top-20 not annotated C methods (97.1% of total 10,116,938):
Kernel#respond_to?: 4,932,224 (48.8%)
Hash#key?: 2,329,928 (23.0%)
Set#include?: 757,389 ( 7.5%)
String#===: 317,494 ( 3.1%)
Kernel#is_a?: 203,084 ( 2.0%)
String#<<: 197,831 ( 2.0%)
Integer#<<: 162,268 ( 1.6%)
Kernel#format: 120,902 ( 1.2%)
Integer#/: 120,902 ( 1.2%)
Array#<<: 112,225 ( 1.1%)
Regexp.last_match: 92,382 ( 0.9%)
Hash#[]=: 86,145 ( 0.9%)
String#start_with?: 54,953 ( 0.5%)
Array#shift: 54,038 ( 0.5%)
CGI::EscapeExt#escapeHTML: 53,971 ( 0.5%)
Regexp#===: 47,848 ( 0.5%)
String#=~: 47,237 ( 0.5%)
Array#unshift: 46,051 ( 0.5%)
String#empty?: 41,750 ( 0.4%)
Array#push: 40,115 ( 0.4%)
Top-2 not optimized method types for send (100.0% of total 182,938):
iseq: 178,414 (97.5%)
cfunc: 4,524 ( 2.5%)
Top-4 not optimized method types for send_without_block (100.0% of total 2,492,246):
iseq: 2,376,511 (95.4%)
optimized: 115,702 ( 4.6%)
alias: 20 ( 0.0%)
null: 13 ( 0.0%)
Top-9 not optimized instructions (100.0% of total 667,727):
invokeblock: 221,375 (33.2%)
opt_neq: 161,971 (24.3%)
opt_and: 161,971 (24.3%)
opt_eq: 64,921 ( 9.7%)
invokesuper: 39,243 ( 5.9%)
opt_le: 15,838 ( 2.4%)
opt_minus: 1,534 ( 0.2%)
opt_send_without_block: 772 ( 0.1%)
opt_or: 102 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 6,287,956):
send_without_block_polymorphic: 2,782,058 (44.2%)
send_without_block_not_optimized_method_type: 2,492,246 (39.6%)
not_optimized_instruction: 667,727 (10.6%)
send_not_optimized_method_type: 182,938 ( 2.9%)
send_without_block_no_profiles: 89,613 ( 1.4%)
send_polymorphic: 66,962 ( 1.1%)
send_no_profiles: 4,059 ( 0.1%)
obj_to_string_not_string: 2,352 ( 0.0%)
send_without_block_cfunc_array_variadic: 1 ( 0.0%)
Top-3 unhandled YARV insns (100.0% of total 81,482):
getclassvariable: 81,231 (99.7%)
once: 137 ( 0.2%)
getconstant: 114 ( 0.1%)
Top-3 compile error reasons (100.0% of total 5,286,310):
register_spill_on_alloc: 4,540,413 (85.9%)
exception_handler: 745,727 (14.1%)
register_spill_on_ccall: 170 ( 0.0%)
Top-12 side exit reasons (100.0% of total 14,244,881):
compile_error: 5,286,310 (37.1%)
guard_shape_failure: 3,346,873 (23.5%)
guard_type_failure: 2,477,071 (17.4%)
unhandled_splat: 2,104,447 (14.8%)
unhandled_kwarg: 926,828 ( 6.5%)
unhandled_yarv_insn: 81,482 ( 0.6%)
unhandled_hir_insn: 18,672 ( 0.1%)
patchpoint_stable_constant_names: 1,608 ( 0.0%)
obj_to_string_fallback: 902 ( 0.0%)
patchpoint_method_redefined: 599 ( 0.0%)
block_param_proxy_not_iseq_or_ifunc: 88 ( 0.0%)
interrupt: 1 ( 0.0%)
send_count: 39,591,410
dynamic_send_count: 6,287,956 (15.9%)
optimized_send_count: 33,303,454 (84.1%)
iseq_optimized_send_count: 13,514,283 (34.1%)
inline_cfunc_optimized_send_count: 6,823,745 (17.2%)
non_variadic_cfunc_optimized_send_count: 7,417,432 (18.7%)
variadic_cfunc_optimized_send_count: 5,547,994 (14.0%)
dynamic_getivar_count: 1,110,647
dynamic_setivar_count: 927,309
compiled_iseq_count: 403
failed_iseq_count: 48
compile_time: 968ms
profile_time: 3,547ms
gc_time: 22ms
invalidation_time: 0ms
vm_write_pc_count: 36,735,108
vm_write_sp_count: 36,508,262
vm_write_locals_count: 36,508,262
vm_write_stack_count: 36,508,262
vm_write_to_parent_iseq_local_count: 543,097
vm_read_from_parent_iseq_local_count: 13,930,672
code_region_bytes: 2,228,224
side_exit_count: 14,244,881
total_insn_count: 463,357,969
vm_insn_count: 247,003,727
zjit_insn_count: 216,354,242
ratio_in_zjit: 46.7%
```
</details>
### `lobsters` Before
<details>
```
Average of last 10, non-warmup iters: 898ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (61.3% of total 19,495,906):
String#<<: 1,764,437 ( 9.1%)
Kernel#is_a?: 1,615,120 ( 8.3%)
Hash#[]=: 1,159,455 ( 5.9%)
Regexp#match?: 777,496 ( 4.0%)
String#empty?: 722,953 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,017 ( 3.1%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.3%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 405,271 ( 2.1%)
Hash#fetch: 382,302 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,341 ( 1.7%)
Kernel#dup: 328,162 ( 1.7%)
String.new: 306,667 ( 1.6%)
String#==: 287,549 ( 1.5%)
BasicObject#!=: 284,642 ( 1.5%)
String#length: 256,070 ( 1.3%)
Top-20 not annotated C methods (62.4% of total 19,796,172):
Kernel#is_a?: 1,993,676 (10.1%)
String#<<: 1,764,437 ( 8.9%)
Hash#[]=: 1,159,634 ( 5.9%)
Regexp#match?: 777,496 ( 3.9%)
String#empty?: 738,030 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,017 ( 3.0%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.2%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 425,813 ( 2.2%)
Hash#fetch: 382,302 ( 1.9%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,375 ( 1.7%)
Kernel#dup: 328,169 ( 1.7%)
String.new: 306,667 ( 1.5%)
String#==: 293,520 ( 1.5%)
BasicObject#!=: 284,825 ( 1.4%)
String#length: 256,070 ( 1.3%)
Top-2 not optimized method types for send (100.0% of total 115,007):
cfunc: 76,172 (66.2%)
iseq: 38,835 (33.8%)
Top-6 not optimized method types for send_without_block (100.0% of total 8,003,641):
iseq: 3,999,211 (50.0%)
bmethod: 1,750,271 (21.9%)
optimized: 1,653,426 (20.7%)
alias: 591,342 ( 7.4%)
null: 8,174 ( 0.1%)
cfunc: 1,217 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 7,590,826):
invokesuper: 4,335,446 (57.1%)
invokeblock: 1,329,215 (17.5%)
sendforward: 841,463 (11.1%)
opt_eq: 810,614 (10.7%)
opt_plus: 141,773 ( 1.9%)
opt_minus: 52,270 ( 0.7%)
opt_send_without_block: 43,248 ( 0.6%)
opt_neq: 15,047 ( 0.2%)
opt_mult: 13,824 ( 0.2%)
opt_or: 7,451 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 45,673,212):
send_without_block_polymorphic: 17,390,335 (38.1%)
send_no_profiles: 10,769,053 (23.6%)
send_without_block_not_optimized_method_type: 8,003,641 (17.5%)
not_optimized_instruction: 7,590,826 (16.6%)
send_without_block_no_profiles: 1,757,109 ( 3.8%)
send_not_optimized_method_type: 115,007 ( 0.3%)
send_without_block_cfunc_array_variadic: 31,149 ( 0.1%)
obj_to_string_not_string: 15,518 ( 0.0%)
send_without_block_direct_too_many_args: 574 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 1,242,228):
expandarray: 622,203 (50.1%)
checkkeyword: 316,111 (25.4%)
getclassvariable: 120,540 ( 9.7%)
getblockparam: 88,480 ( 7.1%)
invokesuperforward: 78,842 ( 6.3%)
opt_duparray_send: 14,149 ( 1.1%)
getconstant: 1,588 ( 0.1%)
checkmatch: 288 ( 0.0%)
once: 27 ( 0.0%)
Top-3 compile error reasons (100.0% of total 6,769,693):
register_spill_on_alloc: 6,188,305 (91.4%)
register_spill_on_ccall: 347,108 ( 5.1%)
exception_handler: 234,280 ( 3.5%)
Top-17 side exit reasons (100.0% of total 20,142,827):
compile_error: 6,769,693 (33.6%)
guard_type_failure: 5,169,050 (25.7%)
guard_shape_failure: 3,726,362 (18.5%)
unhandled_yarv_insn: 1,242,228 ( 6.2%)
block_param_proxy_not_iseq_or_ifunc: 984,480 ( 4.9%)
unhandled_kwarg: 800,154 ( 4.0%)
unknown_newarray_send: 539,317 ( 2.7%)
patchpoint_stable_constant_names: 340,283 ( 1.7%)
unhandled_splat: 229,440 ( 1.1%)
unhandled_hir_insn: 147,351 ( 0.7%)
patchpoint_no_singleton_class: 128,856 ( 0.6%)
patchpoint_method_redefined: 32,718 ( 0.2%)
block_param_proxy_modified: 25,274 ( 0.1%)
patchpoint_no_ep_escape: 7,559 ( 0.0%)
obj_to_string_fallback: 24 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
interrupt: 16 ( 0.0%)
send_count: 120,815,640
dynamic_send_count: 45,673,212 (37.8%)
optimized_send_count: 75,142,428 (62.2%)
iseq_optimized_send_count: 32,188,039 (26.6%)
inline_cfunc_optimized_send_count: 23,458,483 (19.4%)
non_variadic_cfunc_optimized_send_count: 14,809,797 (12.3%)
variadic_cfunc_optimized_send_count: 4,686,109 ( 3.9%)
dynamic_getivar_count: 13,023,437
dynamic_setivar_count: 12,311,158
compiled_iseq_count: 4,806
failed_iseq_count: 466
compile_time: 8,943ms
profile_time: 99ms
gc_time: 45ms
invalidation_time: 239ms
vm_write_pc_count: 113,652,291
vm_write_sp_count: 111,209,623
vm_write_locals_count: 111,209,623
vm_write_stack_count: 111,209,623
vm_write_to_parent_iseq_local_count: 516,800
vm_read_from_parent_iseq_local_count: 11,225,587
code_region_bytes: 22,609,920
side_exit_count: 20,142,827
total_insn_count: 926,088,942
vm_insn_count: 297,636,255
zjit_insn_count: 628,452,687
ratio_in_zjit: 67.9%
```
</details>
### `lobsters` After
<details>
```
Average of last 10, non-warmup iters: 919ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (61.3% of total 19,495,868):
String#<<: 1,764,437 ( 9.1%)
Kernel#is_a?: 1,615,110 ( 8.3%)
Hash#[]=: 1,159,455 ( 5.9%)
Regexp#match?: 777,496 ( 4.0%)
String#empty?: 722,953 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,016 ( 3.1%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.3%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 405,271 ( 2.1%)
Hash#fetch: 382,302 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,341 ( 1.7%)
Kernel#dup: 328,162 ( 1.7%)
String.new: 306,667 ( 1.6%)
String#==: 287,545 ( 1.5%)
BasicObject#!=: 284,642 ( 1.5%)
String#length: 256,070 ( 1.3%)
Top-20 not annotated C methods (62.4% of total 19,796,134):
Kernel#is_a?: 1,993,666 (10.1%)
String#<<: 1,764,437 ( 8.9%)
Hash#[]=: 1,159,634 ( 5.9%)
Regexp#match?: 777,496 ( 3.9%)
String#empty?: 738,030 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,016 ( 3.0%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.2%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 425,813 ( 2.2%)
Hash#fetch: 382,302 ( 1.9%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,375 ( 1.7%)
Kernel#dup: 328,169 ( 1.7%)
String.new: 306,667 ( 1.5%)
String#==: 293,516 ( 1.5%)
BasicObject#!=: 284,825 ( 1.4%)
String#length: 256,070 ( 1.3%)
Top-4 not optimized method types for send (100.0% of total 4,749,678):
iseq: 2,563,391 (54.0%)
cfunc: 2,064,888 (43.5%)
alias: 118,577 ( 2.5%)
null: 2,822 ( 0.1%)
Top-6 not optimized method types for send_without_block (100.0% of total 8,003,641):
iseq: 3,999,211 (50.0%)
bmethod: 1,750,271 (21.9%)
optimized: 1,653,426 (20.7%)
alias: 591,342 ( 7.4%)
null: 8,174 ( 0.1%)
cfunc: 1,217 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 7,590,818):
invokesuper: 4,335,442 (57.1%)
invokeblock: 1,329,215 (17.5%)
sendforward: 841,463 (11.1%)
opt_eq: 810,610 (10.7%)
opt_plus: 141,773 ( 1.9%)
opt_minus: 52,270 ( 0.7%)
opt_send_without_block: 43,248 ( 0.6%)
opt_neq: 15,047 ( 0.2%)
opt_mult: 13,824 ( 0.2%)
opt_or: 7,451 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-10 send fallback reasons (100.0% of total 43,152,037):
send_without_block_polymorphic: 17,390,322 (40.3%)
send_without_block_not_optimized_method_type: 8,003,641 (18.5%)
not_optimized_instruction: 7,590,818 (17.6%)
send_not_optimized_method_type: 4,749,678 (11.0%)
send_no_profiles: 2,893,666 ( 6.7%)
send_without_block_no_profiles: 1,757,109 ( 4.1%)
send_polymorphic: 719,562 ( 1.7%)
send_without_block_cfunc_array_variadic: 31,149 ( 0.1%)
obj_to_string_not_string: 15,518 ( 0.0%)
send_without_block_direct_too_many_args: 574 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 1,242,215):
expandarray: 622,203 (50.1%)
checkkeyword: 316,111 (25.4%)
getclassvariable: 120,540 ( 9.7%)
getblockparam: 88,467 ( 7.1%)
invokesuperforward: 78,842 ( 6.3%)
opt_duparray_send: 14,149 ( 1.1%)
getconstant: 1,588 ( 0.1%)
checkmatch: 288 ( 0.0%)
once: 27 ( 0.0%)
Top-3 compile error reasons (100.0% of total 6,769,688):
register_spill_on_alloc: 6,188,305 (91.4%)
register_spill_on_ccall: 347,108 ( 5.1%)
exception_handler: 234,275 ( 3.5%)
Top-17 side exit reasons (100.0% of total 20,144,372):
compile_error: 6,769,688 (33.6%)
guard_type_failure: 5,169,204 (25.7%)
guard_shape_failure: 3,726,374 (18.5%)
unhandled_yarv_insn: 1,242,215 ( 6.2%)
block_param_proxy_not_iseq_or_ifunc: 984,480 ( 4.9%)
unhandled_kwarg: 800,154 ( 4.0%)
unknown_newarray_send: 539,317 ( 2.7%)
patchpoint_stable_constant_names: 340,283 ( 1.7%)
unhandled_splat: 229,440 ( 1.1%)
unhandled_hir_insn: 147,351 ( 0.7%)
patchpoint_no_singleton_class: 130,252 ( 0.6%)
patchpoint_method_redefined: 32,716 ( 0.2%)
block_param_proxy_modified: 25,274 ( 0.1%)
patchpoint_no_ep_escape: 7,559 ( 0.0%)
obj_to_string_fallback: 24 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
interrupt: 19 ( 0.0%)
send_count: 120,812,030
dynamic_send_count: 43,152,037 (35.7%)
optimized_send_count: 77,659,993 (64.3%)
iseq_optimized_send_count: 32,187,900 (26.6%)
inline_cfunc_optimized_send_count: 23,458,491 (19.4%)
non_variadic_cfunc_optimized_send_count: 17,327,499 (14.3%)
variadic_cfunc_optimized_send_count: 4,686,103 ( 3.9%)
dynamic_getivar_count: 13,023,424
dynamic_setivar_count: 12,310,991
compiled_iseq_count: 4,806
failed_iseq_count: 466
compile_time: 9,012ms
profile_time: 104ms
gc_time: 44ms
invalidation_time: 239ms
vm_write_pc_count: 113,648,665
vm_write_sp_count: 111,205,997
vm_write_locals_count: 111,205,997
vm_write_stack_count: 111,205,997
vm_write_to_parent_iseq_local_count: 516,800
vm_read_from_parent_iseq_local_count: 11,225,587
code_region_bytes: 23,052,288
side_exit_count: 20,144,372
total_insn_count: 926,090,214
vm_insn_count: 297,647,811
zjit_insn_count: 628,442,403
ratio_in_zjit: 67.9%
```
</details>
|
|
This is only really called a lot in the benchmark harness, as far as I
can tell.
|
|
These bring `send_without_block_no_profiles` numbers down more.
On lobsters:
Before: send_without_block_no_profiles: 1,293,375
After: send_without_block_no_profiles: 998,724
all stats before:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (71.1% of total 15,575,335):
Hash#[]: 4,519,774 (29.0%)
Kernel#is_a?: 1,030,758 ( 6.6%)
String#<<: 851,929 ( 5.5%)
Hash#[]=: 742,941 ( 4.8%)
Regexp#match?: 399,889 ( 2.6%)
String#empty?: 353,775 ( 2.3%)
Hash#key?: 349,129 ( 2.2%)
String#start_with?: 334,961 ( 2.2%)
Kernel#respond_to?: 316,527 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.4%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 181,792 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,340 ( 1.2%)
BasicObject#!=: 175,997 ( 1.1%)
Class#new: 168,078 ( 1.1%)
Kernel#kind_of?: 165,600 ( 1.1%)
Top-20 not annotated C methods (71.6% of total 15,737,478):
Hash#[]: 4,519,784 (28.7%)
Kernel#is_a?: 1,212,649 ( 7.7%)
String#<<: 851,929 ( 5.4%)
Hash#[]=: 743,120 ( 4.7%)
Regexp#match?: 399,889 ( 2.5%)
String#empty?: 361,013 ( 2.3%)
Hash#key?: 349,129 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,527 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.3%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 191,661 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,347 ( 1.1%)
BasicObject#!=: 176,181 ( 1.1%)
Class#new: 168,078 ( 1.1%)
Kernel#kind_of?: 165,634 ( 1.1%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,523,648):
iseq: 2,271,904 (50.2%)
bmethod: 985,636 (21.8%)
optimized: 949,702 (21.0%)
alias: 310,746 ( 6.9%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,096):
invokesuper: 2,373,391 (55.3%)
invokeblock: 811,872 (18.9%)
sendforward: 505,448 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,225 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 25,824,463):
send_without_block_polymorphic: 9,721,727 (37.6%)
send_no_profiles: 5,894,760 (22.8%)
send_without_block_not_optimized_method_type: 4,523,648 (17.5%)
not_optimized_instruction: 4,293,096 (16.6%)
send_without_block_no_profiles: 1,293,386 ( 5.0%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,765 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,502):
register_spill_on_alloc: 3,457,791 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,363 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,860,787):
compile_error: 3,752,502 (34.6%)
guard_type_failure: 2,638,903 (24.3%)
guard_shape_failure: 1,917,195 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,787 ( 4.9%)
unhandled_kwarg: 421,943 ( 3.9%)
patchpoint: 370,449 ( 3.4%)
unknown_newarray_send: 314,785 ( 2.9%)
unhandled_splat: 122,060 ( 1.1%)
unhandled_hir_insn: 76,396 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 504 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,945,801
dynamic_send_count: 25,824,463 (38.6%)
optimized_send_count: 41,121,338 (61.4%)
iseq_optimized_send_count: 18,587,368 (27.8%)
inline_cfunc_optimized_send_count: 6,958,635 (10.4%)
non_variadic_cfunc_optimized_send_count: 12,911,155 (19.3%)
variadic_cfunc_optimized_send_count: 2,664,180 ( 4.0%)
dynamic_getivar_count: 7,365,975
dynamic_setivar_count: 7,245,897
compiled_iseq_count: 4,794
failed_iseq_count: 450
compile_time: 760ms
profile_time: 9ms
gc_time: 8ms
invalidation_time: 55ms
vm_write_pc_count: 64,284,053
vm_write_sp_count: 62,940,297
vm_write_locals_count: 62,940,297
vm_write_stack_count: 62,940,297
vm_write_to_parent_iseq_local_count: 292,446
vm_read_from_parent_iseq_local_count: 6,470,923
code_region_bytes: 23,019,520
side_exit_count: 10,860,787
total_insn_count: 517,576,320
vm_insn_count: 163,188,910
zjit_insn_count: 354,387,410
ratio_in_zjit: 68.5%
```
all stats after:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (70.4% of total 15,740,856):
Hash#[]: 4,519,792 (28.7%)
Kernel#is_a?: 1,030,776 ( 6.5%)
String#<<: 851,940 ( 5.4%)
Hash#[]=: 742,914 ( 4.7%)
Regexp#match?: 399,887 ( 2.5%)
String#empty?: 353,775 ( 2.2%)
Hash#key?: 349,139 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.3%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 181,788 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,341 ( 1.1%)
BasicObject#!=: 175,996 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,600 ( 1.1%)
Top-20 not annotated C methods (70.9% of total 15,902,999):
Hash#[]: 4,519,802 (28.4%)
Kernel#is_a?: 1,212,667 ( 7.6%)
String#<<: 851,940 ( 5.4%)
Hash#[]=: 743,093 ( 4.7%)
Regexp#match?: 399,887 ( 2.5%)
String#empty?: 361,013 ( 2.3%)
Hash#key?: 349,139 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.3%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 191,657 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.1%)
Kernel#dup: 179,348 ( 1.1%)
BasicObject#!=: 176,180 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,634 ( 1.0%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,523,637):
iseq: 2,271,900 (50.2%)
bmethod: 985,636 (21.8%)
optimized: 949,695 (21.0%)
alias: 310,746 ( 6.9%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,128):
invokesuper: 2,373,401 (55.3%)
invokeblock: 811,890 (18.9%)
sendforward: 505,449 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,228 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 25,530,605):
send_without_block_polymorphic: 9,722,499 (38.1%)
send_no_profiles: 5,894,763 (23.1%)
send_without_block_not_optimized_method_type: 4,523,637 (17.7%)
not_optimized_instruction: 4,293,128 (16.8%)
send_without_block_no_profiles: 998,732 ( 3.9%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,765 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,500):
register_spill_on_alloc: 3,457,792 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,360 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,860,797):
compile_error: 3,752,500 (34.6%)
guard_type_failure: 2,638,909 (24.3%)
guard_shape_failure: 1,917,203 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,784 ( 4.9%)
unhandled_kwarg: 421,947 ( 3.9%)
patchpoint: 370,474 ( 3.4%)
unknown_newarray_send: 314,786 ( 2.9%)
unhandled_splat: 122,067 ( 1.1%)
unhandled_hir_insn: 76,395 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 469 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,945,326
dynamic_send_count: 25,530,605 (38.1%)
optimized_send_count: 41,414,721 (61.9%)
iseq_optimized_send_count: 18,587,439 (27.8%)
inline_cfunc_optimized_send_count: 7,086,426 (10.6%)
non_variadic_cfunc_optimized_send_count: 13,076,682 (19.5%)
variadic_cfunc_optimized_send_count: 2,664,174 ( 4.0%)
dynamic_getivar_count: 7,365,985
dynamic_setivar_count: 7,245,954
compiled_iseq_count: 4,794
failed_iseq_count: 450
compile_time: 748ms
profile_time: 9ms
gc_time: 8ms
invalidation_time: 58ms
vm_write_pc_count: 64,155,801
vm_write_sp_count: 62,812,041
vm_write_locals_count: 62,812,041
vm_write_stack_count: 62,812,041
vm_write_to_parent_iseq_local_count: 292,448
vm_read_from_parent_iseq_local_count: 6,470,939
code_region_bytes: 23,052,288
side_exit_count: 10,860,797
total_insn_count: 517,576,915
vm_insn_count: 163,192,099
zjit_insn_count: 354,384,816
ratio_in_zjit: 68.5%
```
|
|
These bring `send_without_block_no_profiles` numbers down dramatically.
On lobsters:
Before: send_without_block_no_profiles: 3,466,375
After: send_without_block_no_profiles: 1,293,375
all stats before:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (70.4% of total 14,174,061):
Hash#[]: 4,519,771 (31.9%)
Kernel#is_a?: 1,030,757 ( 7.3%)
Regexp#match?: 399,885 ( 2.8%)
String#empty?: 353,775 ( 2.5%)
Hash#key?: 349,125 ( 2.5%)
Hash#[]=: 344,348 ( 2.4%)
String#start_with?: 334,961 ( 2.4%)
Kernel#respond_to?: 316,527 ( 2.2%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.7%)
TrueClass#===: 235,770 ( 1.7%)
FalseClass#===: 231,143 ( 1.6%)
Array#include?: 211,383 ( 1.5%)
Hash#fetch: 204,702 ( 1.4%)
Kernel#block_given?: 181,793 ( 1.3%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.3%)
Kernel#dup: 179,341 ( 1.3%)
BasicObject#!=: 175,996 ( 1.2%)
Class#new: 168,079 ( 1.2%)
Kernel#kind_of?: 165,600 ( 1.2%)
String#==: 157,734 ( 1.1%)
Top-20 not annotated C methods (71.1% of total 14,336,035):
Hash#[]: 4,519,781 (31.5%)
Kernel#is_a?: 1,212,647 ( 8.5%)
Regexp#match?: 399,885 ( 2.8%)
String#empty?: 361,013 ( 2.5%)
Hash#key?: 349,125 ( 2.4%)
Hash#[]=: 344,348 ( 2.4%)
String#start_with?: 334,961 ( 2.3%)
Kernel#respond_to?: 316,527 ( 2.2%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.7%)
TrueClass#===: 235,770 ( 1.6%)
FalseClass#===: 231,143 ( 1.6%)
Array#include?: 211,383 ( 1.5%)
Hash#fetch: 204,702 ( 1.4%)
Kernel#block_given?: 191,662 ( 1.3%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.3%)
Kernel#dup: 179,348 ( 1.3%)
BasicObject#!=: 176,180 ( 1.2%)
Class#new: 168,079 ( 1.2%)
Kernel#kind_of?: 165,634 ( 1.2%)
String#==: 163,666 ( 1.1%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,536,895):
iseq: 2,281,897 (50.3%)
bmethod: 985,679 (21.7%)
optimized: 952,914 (21.0%)
alias: 310,745 ( 6.8%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,123):
invokesuper: 2,373,396 (55.3%)
invokeblock: 811,891 (18.9%)
sendforward: 505,449 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,227 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 27,795,022):
send_without_block_polymorphic: 9,505,835 (34.2%)
send_no_profiles: 5,894,763 (21.2%)
send_without_block_not_optimized_method_type: 4,536,895 (16.3%)
not_optimized_instruction: 4,293,123 (15.4%)
send_without_block_no_profiles: 3,466,407 (12.5%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,918 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,391):
register_spill_on_alloc: 3,457,680 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,363 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,852,021):
compile_error: 3,752,391 (34.6%)
guard_type_failure: 2,630,877 (24.2%)
guard_shape_failure: 1,917,208 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,784 ( 4.9%)
unhandled_kwarg: 421,989 ( 3.9%)
patchpoint: 369,799 ( 3.4%)
unknown_newarray_send: 314,786 ( 2.9%)
unhandled_splat: 122,062 ( 1.1%)
unhandled_hir_insn: 76,394 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 468 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,989,407
dynamic_send_count: 27,795,022 (41.5%)
optimized_send_count: 39,194,385 (58.5%)
iseq_optimized_send_count: 18,060,194 (27.0%)
inline_cfunc_optimized_send_count: 6,960,130 (10.4%)
non_variadic_cfunc_optimized_send_count: 11,523,682 (17.2%)
variadic_cfunc_optimized_send_count: 2,650,379 ( 4.0%)
dynamic_getivar_count: 7,365,982
dynamic_setivar_count: 7,245,929
compiled_iseq_count: 4,795
failed_iseq_count: 449
compile_time: 846ms
profile_time: 12ms
gc_time: 9ms
invalidation_time: 61ms
vm_write_pc_count: 64,326,442
vm_write_sp_count: 62,982,524
vm_write_locals_count: 62,982,524
vm_write_stack_count: 62,982,524
vm_write_to_parent_iseq_local_count: 292,448
vm_read_from_parent_iseq_local_count: 6,471,353
code_region_bytes: 22,708,224
side_exit_count: 10,852,021
total_insn_count: 517,550,288
vm_insn_count: 162,946,459
zjit_insn_count: 354,603,829
ratio_in_zjit: 68.5%
```
all stats after:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (71.1% of total 15,575,343):
Hash#[]: 4,519,778 (29.0%)
Kernel#is_a?: 1,030,758 ( 6.6%)
String#<<: 851,931 ( 5.5%)
Hash#[]=: 742,938 ( 4.8%)
Regexp#match?: 399,886 ( 2.6%)
String#empty?: 353,775 ( 2.3%)
Hash#key?: 349,127 ( 2.2%)
String#start_with?: 334,961 ( 2.2%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,380 ( 1.4%)
Hash#fetch: 204,701 ( 1.3%)
Kernel#block_given?: 181,792 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,341 ( 1.2%)
BasicObject#!=: 175,997 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,600 ( 1.1%)
Top-20 not annotated C methods (71.6% of total 15,737,486):
Hash#[]: 4,519,788 (28.7%)
Kernel#is_a?: 1,212,649 ( 7.7%)
String#<<: 851,931 ( 5.4%)
Hash#[]=: 743,117 ( 4.7%)
Regexp#match?: 399,886 ( 2.5%)
String#empty?: 361,013 ( 2.3%)
Hash#key?: 349,127 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,380 ( 1.3%)
Hash#fetch: 204,701 ( 1.3%)
Kernel#block_given?: 191,661 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,348 ( 1.1%)
BasicObject#!=: 176,181 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,634 ( 1.1%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,523,650):
iseq: 2,271,911 (50.2%)
bmethod: 985,636 (21.8%)
optimized: 949,696 (21.0%)
alias: 310,747 ( 6.9%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,126):
invokesuper: 2,373,395 (55.3%)
invokeblock: 811,894 (18.9%)
sendforward: 505,449 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,228 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 25,824,512):
send_without_block_polymorphic: 9,721,725 (37.6%)
send_no_profiles: 5,894,761 (22.8%)
send_without_block_not_optimized_method_type: 4,523,650 (17.5%)
not_optimized_instruction: 4,293,126 (16.6%)
send_without_block_no_profiles: 1,293,404 ( 5.0%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,765 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,504):
register_spill_on_alloc: 3,457,793 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,363 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,860,754):
compile_error: 3,752,504 (34.6%)
guard_type_failure: 2,638,901 (24.3%)
guard_shape_failure: 1,917,198 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,785 ( 4.9%)
unhandled_kwarg: 421,947 ( 3.9%)
patchpoint: 370,447 ( 3.4%)
unknown_newarray_send: 314,786 ( 2.9%)
unhandled_splat: 122,065 ( 1.1%)
unhandled_hir_insn: 76,395 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 463 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,945,926
dynamic_send_count: 25,824,512 (38.6%)
optimized_send_count: 41,121,414 (61.4%)
iseq_optimized_send_count: 18,587,430 (27.8%)
inline_cfunc_optimized_send_count: 6,958,641 (10.4%)
non_variadic_cfunc_optimized_send_count: 12,911,166 (19.3%)
variadic_cfunc_optimized_send_count: 2,664,177 ( 4.0%)
dynamic_getivar_count: 7,365,985
dynamic_setivar_count: 7,245,942
compiled_iseq_count: 4,794
failed_iseq_count: 450
compile_time: 852ms
profile_time: 13ms
gc_time: 11ms
invalidation_time: 63ms
vm_write_pc_count: 64,284,194
vm_write_sp_count: 62,940,427
vm_write_locals_count: 62,940,427
vm_write_stack_count: 62,940,427
vm_write_to_parent_iseq_local_count: 292,447
vm_read_from_parent_iseq_local_count: 6,470,931
code_region_bytes: 23,019,520
side_exit_count: 10,860,754
total_insn_count: 517,576,267
vm_insn_count: 163,188,187
zjit_insn_count: 354,388,080
ratio_in_zjit: 68.5%
```
|
|
* ZJIT: Profile opt_aref
* ZJIT: Add test for opt_aref
* ZJIT: Move test and add hash opt test
* ZJIT: Update zjit bindgen
* ZJIT: Add inspect calls to opt_aref tests
|
|
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
|
|
When we investigate guard failure issues, we sometimes need a profile
count of around 100k (e.g. for `lobsters` in ruby-bench).
This change allows that.
|
|
|
|
* failing test for ObjToString optimization with GuardType
* profile ObjToString receiver and rewrite with guard
* adjust integration tests for objtostring type guard optimization
* Implement new GuardTypeNot HIR; objtostring sends to_s directly on profiled nonstrings
* codegen for GuardTypeNot
* typo fixes
* better name for tests; fix side exit reason for GuardTypeNot
* revert accidental change
* make bindgen
* Fix is_string to identify subclasses of String; fix codegen for identifying if val is String
|
|
|
|
|
|
Specialize monomorphic `GetIvar` into:
* `GuardType(HeapObject)`
* `GuardShape`
* `LoadIvarEmbedded` or `LoadIvarExtended`
This requires profiling self for `getinstancevariable` (it's not on the operand
stack).
This also optimizes `GetIvar`s that happen as a result of inlining
`attr_reader` and `attr_accessor`.
Also move some (newly) shared JIT helpers into jit.c.
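A minimal sketch of the guard chain above, written as a plain Rust model rather than the actual HIR or codegen. `Value`, `Object`, `Storage`, and `specialized_getivar` are hypothetical names for illustration. The side-exit variants mirror the `guard_type_failure` and `guard_shape_failure` counters in the stats output.
```rust
/// Why the specialized path bailed out to the interpreter.
#[derive(Debug, PartialEq)]
pub enum SideExit {
    GuardTypeFailure,
    GuardShapeFailure,
}

/// Hypothetical model of where an object keeps its ivars.
pub enum Storage {
    /// Ivars stored inline in the object.
    Embedded([i64; 3]),
    /// Ivars stored in a separate heap buffer.
    Extended(Vec<i64>),
}

pub struct Object {
    pub shape: u32,
    pub storage: Storage,
}

pub enum Value {
    Fixnum(i64),
    HeapObject(Object),
}

/// Model of a monomorphic ivar read: two guards, then a direct load.
pub fn specialized_getivar(
    recv: &Value,
    expected_shape: u32,
    index: usize,
) -> Result<i64, SideExit> {
    // GuardType(HeapObject): immediates have no shape or ivar storage.
    let obj = match recv {
        Value::HeapObject(obj) => obj,
        _ => return Err(SideExit::GuardTypeFailure),
    };
    // GuardShape: the profiled shape fixes the ivar layout.
    if obj.shape != expected_shape {
        return Err(SideExit::GuardShapeFailure);
    }
    // LoadIvarEmbedded or LoadIvarExtended. A JIT would pick one of the two
    // when compiling, since the guarded shape determines the layout; this
    // model picks at run time to stay small.
    Ok(match &obj.storage {
        Storage::Embedded(slots) => slots[index],
        Storage::Extended(slots) => slots[index],
    })
}
```
A receiver that fails either guard returns an `Err` here, which stands in for a side exit back to the interpreter.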
|
|
We will (for now) only cache ivar reads from T_OBJECTs.
|
|
This lets us know where to look for an ivar: in the object itself or
indirectly, elsewhere in the heap.
|
|
* ZJIT: Profile and specialize Array#empty?
* ZJIT: Specialize BasicObject#==
* ZJIT: Specialize Hash#empty?
* ZJIT: Specialize BasicObject#!
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
|
|
It only makes sense for heap objects.
|
|
Co-authored-by: Stan Lo <stan001212@gmail.com>
|
|
* ZJIT: Implement SingleRactorMode invalidation
* ZJIT: Add macro for compiling jumps
* ZJIT: Fix typo in comment
* YJIT: Fix typo in comment
* ZJIT: Avoid using unexported types in zjit.h
`enum ruby_vminsn_type` is declared in `insns.inc` and is not exported.
Using it in `zjit.h` would cause build errors when the file including it
doesn't include `insns.inc`.
|
|
ZJIT uses the interpreter to take type profiles of what objects pass through
the code. It stores a compressed record of the history per opcode for the
opcodes we select.
Before this change, we reused the HIR `Type` data structure, a shallow type
lattice, to store historical type information. This was quick for bring-up but
is quite lossy as profiles go: we get one bit per built-in type seen, and if we
see a non-built-in type in addition, we end up with BasicObject. Not very
helpful. Additionally, it does not give us any notion of cardinality: how many
of each type did we see?
This change brings with it a much more interesting slice of type history: a
histogram. A Distribution holds a record of the top-N (where N is fixed at Ruby
compile-time) `(Class, ShapeId)` pairs and their counts. It also holds an
*other* count in case we see more than N pairs.
Using this distribution, we can make more informed decisions about when we
should use type information. We can determine if we are strictly monomorphic,
very nearly monomorphic, or something else. Maybe the call-site is polymorphic,
so we should have a polymorphic inline cache. Exciting stuff.
I also plumb this new distribution into the HIR part of the compilation
pipeline.
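A rough sketch of the idea, not the code in this change: a fixed-size top-N histogram with an overflow counter, plus a classification step. All names here (`ProfiledType`, `Distribution`, `Kind`, `observe`, `kind`) and the integer stand-ins for class and shape are assumptions made for illustration.
```rust
/// Hypothetical stand-in for a `(Class, ShapeId)` pair.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct ProfiledType {
    pub class: usize,
    pub shape: u32,
}

/// Top-N histogram of observed types plus an "other" overflow count.
#[derive(Debug)]
pub struct Distribution<const N: usize> {
    buckets: [Option<(ProfiledType, u64)>; N],
    other: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Kind {
    Empty,
    Monomorphic,
    Polymorphic,
    Megamorphic,
}

impl<const N: usize> Distribution<N> {
    pub fn new() -> Self {
        Self { buckets: [None; N], other: 0 }
    }

    /// Record one observation of `ty`.
    pub fn observe(&mut self, ty: ProfiledType) {
        // Bump the matching bucket if this pair was seen before.
        for entry in self.buckets.iter_mut().flatten() {
            if entry.0 == ty {
                entry.1 = entry.1.saturating_add(1);
                return;
            }
        }
        // Otherwise claim the first free bucket.
        if let Some(free) = self.buckets.iter_mut().find(|slot| slot.is_none()) {
            *free = Some((ty, 1));
            return;
        }
        // All N buckets are taken: only remember that something else was seen.
        self.other = self.other.saturating_add(1);
    }

    /// Classify the call site from the recorded history.
    pub fn kind(&self) -> Kind {
        let distinct = self.buckets.iter().flatten().count();
        match (distinct, self.other) {
            (0, _) => Kind::Empty,
            (1, 0) => Kind::Monomorphic,
            (_, 0) => Kind::Polymorphic,
            _ => Kind::Megamorphic,
        }
    }
}
```
Because the counts are kept, a "very nearly monomorphic" check could be layered on top by comparing the largest bucket against the total; that is left out here to keep the sketch short.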
|
|
This issues write barriers for objects added via gc_offsets or by
profiling. This may be slower than writebarrier_remember, but we would
like it to be more debuggable.
Co-authored-by: Max Bernstein <ruby@bernsteinbear.com>
Co-authored-by: Stan Lo <stan001212@gmail.com>
|
|
Fixes `TestZJIT::test_require_rubygems`. It was crashing locally due to
false collection of a live object. See
<https://alanwu.space/post/write-barrier/>.
Co-authored-by: Max Bernstein <max@bernsteinbear.com>
Co-authored-by: Takashi Kokubun <takashi.kokubun@shopify.com>
Co-authored-by: Stan Lo <stan.lo@shopify.com>
|
|
|
|
* ZJIT: Profile each instruction at most num_profiles times
* Use saturating_add for num_profiles
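To illustrate why a saturating counter matters for a profiling cap, here is a small sketch. `InsnProfile`, `record`, and `done` are hypothetical names, and the `u8` counter width is an assumption; the source only says that `num_profiles` is bumped with `saturating_add`.
```rust
/// Hypothetical per-instruction profiling state.
pub struct InsnProfile {
    /// Number of samples recorded so far (assumed to be a narrow integer).
    pub num_profiles: u8,
}

impl InsnProfile {
    pub fn new() -> Self {
        Self { num_profiles: 0 }
    }

    /// Count one more profiled execution.
    pub fn record(&mut self) {
        // saturating_add pins the counter at u8::MAX. A wrapping add would
        // roll over to a small value and make the instruction look
        // under-profiled again.
        self.num_profiles = self.num_profiles.saturating_add(1);
    }

    /// True once the instruction has been profiled at least `limit` times.
    pub fn done(&self, limit: u8) -> bool {
        self.num_profiles >= limit
    }
}
```
With this shape, calling `record` more than 255 times leaves the counter at 255, so `done` keeps returning true for any limit.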
|
|
|
|
This is notably faster: no need to hash indices.
Before:
```
plum% samply record ~/.rubies/ruby-zjit/bin/ruby --zjit benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-10T14:40:49Z master 51252ef8d7) +ZJIT dev +PRISM [arm64-darwin24]
itr: time
#1: 5311ms
#2: 49ms
#3: 49ms
#4: 48ms
```
After:
```
plum% samply record ~/.rubies/ruby-zjit/bin/ruby --zjit benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-10T15:09:06Z mb-benchmark-compile 42ffd3c1ee) +ZJIT dev +PRISM [arm64-darwin24]
itr: time
#1: 1332ms
#2: 49ms
#3: 48ms
#4: 48ms
```
|
|
|
|
|
|
This allows ZJIT to profile `nil?` calls and create type guards for
its receiver.
- Add `zjit_profile` to `opt_nil_p` insn
- Start profiling `opt_nil_p` calls
- Use `runtime_exact_ruby_class` instead of `exact_ruby_class` to determine
the profiled receiver class
|
|
* Implement JIT-to-JIT calls
* Use a closer dummy address for Arm64
* Revert an obsoleted change
* Revert a few more obsoleted changes
* Fix outdated comments
* Explain PosMarkers for CCall
* s/JIT code/machine code/
* Get rid of ParallelMov
Notes:
Merged: https://github.com/ruby/ruby/pull/13131
|
|
Split out from the CCall changes since we discussed during pairing that
this is useful to unblock some other changes. No tests since no one
consumes this profiling data yet.
Notes:
Merged: https://github.com/ruby/ruby/pull/13131
|
|
(https://github.com/Shopify/zjit/pull/39)
Notes:
Merged: https://github.com/ruby/ruby/pull/13131
|
|
Top/Bottom can be unintuitive or ambiguous.
Notes:
Merged: https://github.com/ruby/ruby/pull/13131
|
|
(https://github.com/Shopify/zjit/pull/24)
* Profile instructions for fixnum arithmetic
* Drop PartialEq from Type
* Do not push PatchPoint onto the stack
* Avoid pushing the output of the guards
* Pop operands after guards
* Test HIR from profiled runs
* Implement Display for new instructions
* Drop unused FIXNUM_BITS
* Use a Rust function to split lines
* Use Display for GuardType operands
Co-authored-by: Max Bernstein <max@bernsteinbear.com>
* Fix tests with Display-ed values
---------
Co-authored-by: Max Bernstein <max@bernsteinbear.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/13131
|
|
(https://github.com/Shopify/zjit/pull/16)
* Add zjit_* instructions to profile the interpreter
* Rename FixnumPlus to FixnumAdd
* Update a comment about Invalidate
* Rename Guard to GuardType
* Rename Invalidate to PatchPoint
* Drop unneeded debug!()
* Plan on profiling the types
* Use the output of GuardType as type refined outputs
Notes:
Merged: https://github.com/ruby/ruby/pull/13131
|