| Age | Commit message (Collapse) | Author |
|
Don't support shape transitions for now.
|
|
This lets us constant-fold common monomorphic cases.
|
|
|
|
I made a special kind of `ProfiledType` that looks at specific objects, not just their classes/shapes (https://github.com/ruby/ruby/pull/15051). Then I profiled some of our benchmarks.
For lobsters:
```
Top-6 invokeblock handler (100.0% of total 1,064,155):
megamorphic: 494,931 (46.5%)
monomorphic_iseq: 337,171 (31.7%)
polymorphic: 113,381 (10.7%)
monomorphic_ifunc: 52,260 ( 4.9%)
monomorphic_other: 38,970 ( 3.7%)
no_profiles: 27,442 ( 2.6%)
```
For railsbench:
```
Top-6 invokeblock handler (100.0% of total 2,529,104):
monomorphic_iseq: 834,452 (33.0%)
megamorphic: 818,347 (32.4%)
polymorphic: 632,273 (25.0%)
monomorphic_ifunc: 224,243 ( 8.9%)
monomorphic_other: 19,595 ( 0.8%)
no_profiles: 194 ( 0.0%)
```
For shipit:
```
Top-6 invokeblock handler (100.0% of total 2,104,148):
megamorphic: 1,269,889 (60.4%)
polymorphic: 411,475 (19.6%)
no_profiles: 173,367 ( 8.2%)
monomorphic_other: 118,619 ( 5.6%)
monomorphic_iseq: 84,891 ( 4.0%)
monomorphic_ifunc: 45,907 ( 2.2%)
```
Seems like a monomorphic case for a specific ISEQ actually isn't a bad way of going about this, at least to start...
|
|
Since `Send` has a block iseq, I updated `CCallWithFrame` to take an optional `blockiseq` as well, and then generate `CCallWithFrame` for `Send` when the condition is right.
## Stats
`liquid-render` Benchmark
| Metric | Before | After | Change |
|----------------------|--------------------|--------------------|--------------------- |
| send_no_profiles | 3,209,418 (34.1%) | 4,119 (0.1%) | -3,205,299 (-99.9%) |
| dynamic_send_count | 9,410,758 (23.1%) | 6,459,678 (15.9%) | -2,951,080 (-31.4%) |
| optimized_send_count | 31,269,388 (76.9%) | 34,220,474 (84.1%) | +2,951,086 (+9.4%) |
`lobsters` Benchmark
| Metric | Before | After | Change |
|----------------------|------------|------------|---------------------|
| send_no_profiles | 10,769,052 | 2,902,865 | -7,866,187 (-73.0%) |
| dynamic_send_count | 45,673,185 | 42,880,160 | -2,793,025 (-6.1%) |
| optimized_send_count | 75,142,407 | 78,378,514 | +3,236,107 (+4.3%) |
### `liquid-render` Before
<details>
```
Average of last 22, non-warmup iters: 262ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (96.9% of total 10,370,809):
Kernel#respond_to?: 5,069,204 (48.9%)
Hash#key?: 2,394,488 (23.1%)
Set#include?: 778,429 ( 7.5%)
String#===: 326,134 ( 3.1%)
String#<<: 203,231 ( 2.0%)
Integer#<<: 166,768 ( 1.6%)
Kernel#is_a?: 164,272 ( 1.6%)
Kernel#format: 124,262 ( 1.2%)
Integer#/: 124,262 ( 1.2%)
Array#<<: 115,325 ( 1.1%)
Regexp.last_match: 94,862 ( 0.9%)
Hash#[]=: 88,485 ( 0.9%)
String#start_with?: 55,933 ( 0.5%)
CGI::EscapeExt#escapeHTML: 55,471 ( 0.5%)
Array#shift: 55,298 ( 0.5%)
Regexp#===: 48,928 ( 0.5%)
String#=~: 48,477 ( 0.5%)
Array#unshift: 47,331 ( 0.5%)
String#empty?: 42,870 ( 0.4%)
Array#push: 41,215 ( 0.4%)
Top-20 not annotated C methods (97.1% of total 10,394,421):
Kernel#respond_to?: 5,069,204 (48.8%)
Hash#key?: 2,394,488 (23.0%)
Set#include?: 778,429 ( 7.5%)
String#===: 326,134 ( 3.1%)
Kernel#is_a?: 208,664 ( 2.0%)
String#<<: 203,231 ( 2.0%)
Integer#<<: 166,768 ( 1.6%)
Integer#/: 124,262 ( 1.2%)
Kernel#format: 124,262 ( 1.2%)
Array#<<: 115,325 ( 1.1%)
Regexp.last_match: 94,862 ( 0.9%)
Hash#[]=: 88,485 ( 0.9%)
String#start_with?: 55,933 ( 0.5%)
CGI::EscapeExt#escapeHTML: 55,471 ( 0.5%)
Array#shift: 55,298 ( 0.5%)
Regexp#===: 48,928 ( 0.5%)
String#=~: 48,477 ( 0.5%)
Array#unshift: 47,331 ( 0.5%)
String#empty?: 42,870 ( 0.4%)
Array#push: 41,215 ( 0.4%)
Top-2 not optimized method types for send (100.0% of total 2,382):
cfunc: 1,196 (50.2%)
iseq: 1,186 (49.8%)
Top-4 not optimized method types for send_without_block (100.0% of total 2,561,006):
iseq: 2,442,091 (95.4%)
optimized: 118,882 ( 4.6%)
alias: 20 ( 0.0%)
null: 13 ( 0.0%)
Top-9 not optimized instructions (100.0% of total 685,128):
invokeblock: 227,376 (33.2%)
opt_neq: 166,471 (24.3%)
opt_and: 166,471 (24.3%)
opt_eq: 66,721 ( 9.7%)
invokesuper: 39,363 ( 5.7%)
opt_le: 16,278 ( 2.4%)
opt_minus: 1,574 ( 0.2%)
opt_send_without_block: 772 ( 0.1%)
opt_or: 102 ( 0.0%)
Top-8 send fallback reasons (100.0% of total 9,410,758):
send_no_profiles: 3,209,418 (34.1%)
send_without_block_polymorphic: 2,858,558 (30.4%)
send_without_block_not_optimized_method_type: 2,561,006 (27.2%)
not_optimized_instruction: 685,128 ( 7.3%)
send_without_block_no_profiles: 91,913 ( 1.0%)
send_not_optimized_method_type: 2,382 ( 0.0%)
obj_to_string_not_string: 2,352 ( 0.0%)
send_without_block_cfunc_array_variadic: 1 ( 0.0%)
Top-3 unhandled YARV insns (100.0% of total 83,682):
getclassvariable: 83,431 (99.7%)
once: 137 ( 0.2%)
getconstant: 114 ( 0.1%)
Top-3 compile error reasons (100.0% of total 5,431,910):
register_spill_on_alloc: 4,665,393 (85.9%)
exception_handler: 766,347 (14.1%)
register_spill_on_ccall: 170 ( 0.0%)
Top-11 side exit reasons (100.0% of total 14,635,508):
compile_error: 5,431,910 (37.1%)
guard_shape_failure: 3,436,341 (23.5%)
guard_type_failure: 2,545,791 (17.4%)
unhandled_splat: 2,162,907 (14.8%)
unhandled_kwarg: 952,568 ( 6.5%)
unhandled_yarv_insn: 83,682 ( 0.6%)
unhandled_hir_insn: 19,112 ( 0.1%)
patchpoint_stable_constant_names: 1,608 ( 0.0%)
obj_to_string_fallback: 902 ( 0.0%)
patchpoint_method_redefined: 599 ( 0.0%)
block_param_proxy_not_iseq_or_ifunc: 88 ( 0.0%)
send_count: 40,680,153
dynamic_send_count: 9,410,758 (23.1%)
optimized_send_count: 31,269,395 (76.9%)
iseq_optimized_send_count: 13,886,902 (34.1%)
inline_cfunc_optimized_send_count: 7,011,684 (17.2%)
non_variadic_cfunc_optimized_send_count: 4,670,333 (11.5%)
variadic_cfunc_optimized_send_count: 5,700,476 (14.0%)
dynamic_getivar_count: 1,144,613
dynamic_setivar_count: 950,830
compiled_iseq_count: 402
failed_iseq_count: 48
compile_time: 976ms
profile_time: 3,223ms
gc_time: 22ms
invalidation_time: 0ms
vm_write_pc_count: 37,744,491
vm_write_sp_count: 37,511,865
vm_write_locals_count: 37,511,865
vm_write_stack_count: 37,511,865
vm_write_to_parent_iseq_local_count: 558,177
vm_read_from_parent_iseq_local_count: 14,317,032
code_region_bytes: 2,211,840
side_exit_count: 14,635,508
total_insn_count: 476,097,972
vm_insn_count: 253,795,154
zjit_insn_count: 222,302,818
ratio_in_zjit: 46.7%
```
</details>
### `liquid-render` After
<details>
```
Average of last 21, non-warmup iters: 272ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (96.8% of total 10,093,966):
Kernel#respond_to?: 4,932,224 (48.9%)
Hash#key?: 2,329,928 (23.1%)
Set#include?: 757,389 ( 7.5%)
String#===: 317,494 ( 3.1%)
String#<<: 197,831 ( 2.0%)
Integer#<<: 162,268 ( 1.6%)
Kernel#is_a?: 159,892 ( 1.6%)
Kernel#format: 120,902 ( 1.2%)
Integer#/: 120,902 ( 1.2%)
Array#<<: 112,225 ( 1.1%)
Regexp.last_match: 92,382 ( 0.9%)
Hash#[]=: 86,145 ( 0.9%)
String#start_with?: 54,953 ( 0.5%)
Array#shift: 54,038 ( 0.5%)
CGI::EscapeExt#escapeHTML: 53,971 ( 0.5%)
Regexp#===: 47,848 ( 0.5%)
String#=~: 47,237 ( 0.5%)
Array#unshift: 46,051 ( 0.5%)
String#empty?: 41,750 ( 0.4%)
Array#push: 40,115 ( 0.4%)
Top-20 not annotated C methods (97.1% of total 10,116,938):
Kernel#respond_to?: 4,932,224 (48.8%)
Hash#key?: 2,329,928 (23.0%)
Set#include?: 757,389 ( 7.5%)
String#===: 317,494 ( 3.1%)
Kernel#is_a?: 203,084 ( 2.0%)
String#<<: 197,831 ( 2.0%)
Integer#<<: 162,268 ( 1.6%)
Kernel#format: 120,902 ( 1.2%)
Integer#/: 120,902 ( 1.2%)
Array#<<: 112,225 ( 1.1%)
Regexp.last_match: 92,382 ( 0.9%)
Hash#[]=: 86,145 ( 0.9%)
String#start_with?: 54,953 ( 0.5%)
Array#shift: 54,038 ( 0.5%)
CGI::EscapeExt#escapeHTML: 53,971 ( 0.5%)
Regexp#===: 47,848 ( 0.5%)
String#=~: 47,237 ( 0.5%)
Array#unshift: 46,051 ( 0.5%)
String#empty?: 41,750 ( 0.4%)
Array#push: 40,115 ( 0.4%)
Top-2 not optimized method types for send (100.0% of total 182,938):
iseq: 178,414 (97.5%)
cfunc: 4,524 ( 2.5%)
Top-4 not optimized method types for send_without_block (100.0% of total 2,492,246):
iseq: 2,376,511 (95.4%)
optimized: 115,702 ( 4.6%)
alias: 20 ( 0.0%)
null: 13 ( 0.0%)
Top-9 not optimized instructions (100.0% of total 667,727):
invokeblock: 221,375 (33.2%)
opt_neq: 161,971 (24.3%)
opt_and: 161,971 (24.3%)
opt_eq: 64,921 ( 9.7%)
invokesuper: 39,243 ( 5.9%)
opt_le: 15,838 ( 2.4%)
opt_minus: 1,534 ( 0.2%)
opt_send_without_block: 772 ( 0.1%)
opt_or: 102 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 6,287,956):
send_without_block_polymorphic: 2,782,058 (44.2%)
send_without_block_not_optimized_method_type: 2,492,246 (39.6%)
not_optimized_instruction: 667,727 (10.6%)
send_not_optimized_method_type: 182,938 ( 2.9%)
send_without_block_no_profiles: 89,613 ( 1.4%)
send_polymorphic: 66,962 ( 1.1%)
send_no_profiles: 4,059 ( 0.1%)
obj_to_string_not_string: 2,352 ( 0.0%)
send_without_block_cfunc_array_variadic: 1 ( 0.0%)
Top-3 unhandled YARV insns (100.0% of total 81,482):
getclassvariable: 81,231 (99.7%)
once: 137 ( 0.2%)
getconstant: 114 ( 0.1%)
Top-3 compile error reasons (100.0% of total 5,286,310):
register_spill_on_alloc: 4,540,413 (85.9%)
exception_handler: 745,727 (14.1%)
register_spill_on_ccall: 170 ( 0.0%)
Top-12 side exit reasons (100.0% of total 14,244,881):
compile_error: 5,286,310 (37.1%)
guard_shape_failure: 3,346,873 (23.5%)
guard_type_failure: 2,477,071 (17.4%)
unhandled_splat: 2,104,447 (14.8%)
unhandled_kwarg: 926,828 ( 6.5%)
unhandled_yarv_insn: 81,482 ( 0.6%)
unhandled_hir_insn: 18,672 ( 0.1%)
patchpoint_stable_constant_names: 1,608 ( 0.0%)
obj_to_string_fallback: 902 ( 0.0%)
patchpoint_method_redefined: 599 ( 0.0%)
block_param_proxy_not_iseq_or_ifunc: 88 ( 0.0%)
interrupt: 1 ( 0.0%)
send_count: 39,591,410
dynamic_send_count: 6,287,956 (15.9%)
optimized_send_count: 33,303,454 (84.1%)
iseq_optimized_send_count: 13,514,283 (34.1%)
inline_cfunc_optimized_send_count: 6,823,745 (17.2%)
non_variadic_cfunc_optimized_send_count: 7,417,432 (18.7%)
variadic_cfunc_optimized_send_count: 5,547,994 (14.0%)
dynamic_getivar_count: 1,110,647
dynamic_setivar_count: 927,309
compiled_iseq_count: 403
failed_iseq_count: 48
compile_time: 968ms
profile_time: 3,547ms
gc_time: 22ms
invalidation_time: 0ms
vm_write_pc_count: 36,735,108
vm_write_sp_count: 36,508,262
vm_write_locals_count: 36,508,262
vm_write_stack_count: 36,508,262
vm_write_to_parent_iseq_local_count: 543,097
vm_read_from_parent_iseq_local_count: 13,930,672
code_region_bytes: 2,228,224
side_exit_count: 14,244,881
total_insn_count: 463,357,969
vm_insn_count: 247,003,727
zjit_insn_count: 216,354,242
ratio_in_zjit: 46.7%
```
</details>
### `lobsters` Before
<details>
```
Average of last 10, non-warmup iters: 898ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (61.3% of total 19,495,906):
String#<<: 1,764,437 ( 9.1%)
Kernel#is_a?: 1,615,120 ( 8.3%)
Hash#[]=: 1,159,455 ( 5.9%)
Regexp#match?: 777,496 ( 4.0%)
String#empty?: 722,953 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,017 ( 3.1%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.3%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 405,271 ( 2.1%)
Hash#fetch: 382,302 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,341 ( 1.7%)
Kernel#dup: 328,162 ( 1.7%)
String.new: 306,667 ( 1.6%)
String#==: 287,549 ( 1.5%)
BasicObject#!=: 284,642 ( 1.5%)
String#length: 256,070 ( 1.3%)
Top-20 not annotated C methods (62.4% of total 19,796,172):
Kernel#is_a?: 1,993,676 (10.1%)
String#<<: 1,764,437 ( 8.9%)
Hash#[]=: 1,159,634 ( 5.9%)
Regexp#match?: 777,496 ( 3.9%)
String#empty?: 738,030 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,017 ( 3.0%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.2%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 425,813 ( 2.2%)
Hash#fetch: 382,302 ( 1.9%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,375 ( 1.7%)
Kernel#dup: 328,169 ( 1.7%)
String.new: 306,667 ( 1.5%)
String#==: 293,520 ( 1.5%)
BasicObject#!=: 284,825 ( 1.4%)
String#length: 256,070 ( 1.3%)
Top-2 not optimized method types for send (100.0% of total 115,007):
cfunc: 76,172 (66.2%)
iseq: 38,835 (33.8%)
Top-6 not optimized method types for send_without_block (100.0% of total 8,003,641):
iseq: 3,999,211 (50.0%)
bmethod: 1,750,271 (21.9%)
optimized: 1,653,426 (20.7%)
alias: 591,342 ( 7.4%)
null: 8,174 ( 0.1%)
cfunc: 1,217 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 7,590,826):
invokesuper: 4,335,446 (57.1%)
invokeblock: 1,329,215 (17.5%)
sendforward: 841,463 (11.1%)
opt_eq: 810,614 (10.7%)
opt_plus: 141,773 ( 1.9%)
opt_minus: 52,270 ( 0.7%)
opt_send_without_block: 43,248 ( 0.6%)
opt_neq: 15,047 ( 0.2%)
opt_mult: 13,824 ( 0.2%)
opt_or: 7,451 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 45,673,212):
send_without_block_polymorphic: 17,390,335 (38.1%)
send_no_profiles: 10,769,053 (23.6%)
send_without_block_not_optimized_method_type: 8,003,641 (17.5%)
not_optimized_instruction: 7,590,826 (16.6%)
send_without_block_no_profiles: 1,757,109 ( 3.8%)
send_not_optimized_method_type: 115,007 ( 0.3%)
send_without_block_cfunc_array_variadic: 31,149 ( 0.1%)
obj_to_string_not_string: 15,518 ( 0.0%)
send_without_block_direct_too_many_args: 574 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 1,242,228):
expandarray: 622,203 (50.1%)
checkkeyword: 316,111 (25.4%)
getclassvariable: 120,540 ( 9.7%)
getblockparam: 88,480 ( 7.1%)
invokesuperforward: 78,842 ( 6.3%)
opt_duparray_send: 14,149 ( 1.1%)
getconstant: 1,588 ( 0.1%)
checkmatch: 288 ( 0.0%)
once: 27 ( 0.0%)
Top-3 compile error reasons (100.0% of total 6,769,693):
register_spill_on_alloc: 6,188,305 (91.4%)
register_spill_on_ccall: 347,108 ( 5.1%)
exception_handler: 234,280 ( 3.5%)
Top-17 side exit reasons (100.0% of total 20,142,827):
compile_error: 6,769,693 (33.6%)
guard_type_failure: 5,169,050 (25.7%)
guard_shape_failure: 3,726,362 (18.5%)
unhandled_yarv_insn: 1,242,228 ( 6.2%)
block_param_proxy_not_iseq_or_ifunc: 984,480 ( 4.9%)
unhandled_kwarg: 800,154 ( 4.0%)
unknown_newarray_send: 539,317 ( 2.7%)
patchpoint_stable_constant_names: 340,283 ( 1.7%)
unhandled_splat: 229,440 ( 1.1%)
unhandled_hir_insn: 147,351 ( 0.7%)
patchpoint_no_singleton_class: 128,856 ( 0.6%)
patchpoint_method_redefined: 32,718 ( 0.2%)
block_param_proxy_modified: 25,274 ( 0.1%)
patchpoint_no_ep_escape: 7,559 ( 0.0%)
obj_to_string_fallback: 24 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
interrupt: 16 ( 0.0%)
send_count: 120,815,640
dynamic_send_count: 45,673,212 (37.8%)
optimized_send_count: 75,142,428 (62.2%)
iseq_optimized_send_count: 32,188,039 (26.6%)
inline_cfunc_optimized_send_count: 23,458,483 (19.4%)
non_variadic_cfunc_optimized_send_count: 14,809,797 (12.3%)
variadic_cfunc_optimized_send_count: 4,686,109 ( 3.9%)
dynamic_getivar_count: 13,023,437
dynamic_setivar_count: 12,311,158
compiled_iseq_count: 4,806
failed_iseq_count: 466
compile_time: 8,943ms
profile_time: 99ms
gc_time: 45ms
invalidation_time: 239ms
vm_write_pc_count: 113,652,291
vm_write_sp_count: 111,209,623
vm_write_locals_count: 111,209,623
vm_write_stack_count: 111,209,623
vm_write_to_parent_iseq_local_count: 516,800
vm_read_from_parent_iseq_local_count: 11,225,587
code_region_bytes: 22,609,920
side_exit_count: 20,142,827
total_insn_count: 926,088,942
vm_insn_count: 297,636,255
zjit_insn_count: 628,452,687
ratio_in_zjit: 67.9%
```
</details>
### `lobsters` After
<details>
```
Average of last 10, non-warmup iters: 919ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (61.3% of total 19,495,868):
String#<<: 1,764,437 ( 9.1%)
Kernel#is_a?: 1,615,110 ( 8.3%)
Hash#[]=: 1,159,455 ( 5.9%)
Regexp#match?: 777,496 ( 4.0%)
String#empty?: 722,953 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,016 ( 3.1%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.3%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 405,271 ( 2.1%)
Hash#fetch: 382,302 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,341 ( 1.7%)
Kernel#dup: 328,162 ( 1.7%)
String.new: 306,667 ( 1.6%)
String#==: 287,545 ( 1.5%)
BasicObject#!=: 284,642 ( 1.5%)
String#length: 256,070 ( 1.3%)
Top-20 not annotated C methods (62.4% of total 19,796,134):
Kernel#is_a?: 1,993,666 (10.1%)
String#<<: 1,764,437 ( 8.9%)
Hash#[]=: 1,159,634 ( 5.9%)
Regexp#match?: 777,496 ( 3.9%)
String#empty?: 738,030 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,016 ( 3.0%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.2%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 425,813 ( 2.2%)
Hash#fetch: 382,302 ( 1.9%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,375 ( 1.7%)
Kernel#dup: 328,169 ( 1.7%)
String.new: 306,667 ( 1.5%)
String#==: 293,516 ( 1.5%)
BasicObject#!=: 284,825 ( 1.4%)
String#length: 256,070 ( 1.3%)
Top-4 not optimized method types for send (100.0% of total 4,749,678):
iseq: 2,563,391 (54.0%)
cfunc: 2,064,888 (43.5%)
alias: 118,577 ( 2.5%)
null: 2,822 ( 0.1%)
Top-6 not optimized method types for send_without_block (100.0% of total 8,003,641):
iseq: 3,999,211 (50.0%)
bmethod: 1,750,271 (21.9%)
optimized: 1,653,426 (20.7%)
alias: 591,342 ( 7.4%)
null: 8,174 ( 0.1%)
cfunc: 1,217 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 7,590,818):
invokesuper: 4,335,442 (57.1%)
invokeblock: 1,329,215 (17.5%)
sendforward: 841,463 (11.1%)
opt_eq: 810,610 (10.7%)
opt_plus: 141,773 ( 1.9%)
opt_minus: 52,270 ( 0.7%)
opt_send_without_block: 43,248 ( 0.6%)
opt_neq: 15,047 ( 0.2%)
opt_mult: 13,824 ( 0.2%)
opt_or: 7,451 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-10 send fallback reasons (100.0% of total 43,152,037):
send_without_block_polymorphic: 17,390,322 (40.3%)
send_without_block_not_optimized_method_type: 8,003,641 (18.5%)
not_optimized_instruction: 7,590,818 (17.6%)
send_not_optimized_method_type: 4,749,678 (11.0%)
send_no_profiles: 2,893,666 ( 6.7%)
send_without_block_no_profiles: 1,757,109 ( 4.1%)
send_polymorphic: 719,562 ( 1.7%)
send_without_block_cfunc_array_variadic: 31,149 ( 0.1%)
obj_to_string_not_string: 15,518 ( 0.0%)
send_without_block_direct_too_many_args: 574 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 1,242,215):
expandarray: 622,203 (50.1%)
checkkeyword: 316,111 (25.4%)
getclassvariable: 120,540 ( 9.7%)
getblockparam: 88,467 ( 7.1%)
invokesuperforward: 78,842 ( 6.3%)
opt_duparray_send: 14,149 ( 1.1%)
getconstant: 1,588 ( 0.1%)
checkmatch: 288 ( 0.0%)
once: 27 ( 0.0%)
Top-3 compile error reasons (100.0% of total 6,769,688):
register_spill_on_alloc: 6,188,305 (91.4%)
register_spill_on_ccall: 347,108 ( 5.1%)
exception_handler: 234,275 ( 3.5%)
Top-17 side exit reasons (100.0% of total 20,144,372):
compile_error: 6,769,688 (33.6%)
guard_type_failure: 5,169,204 (25.7%)
guard_shape_failure: 3,726,374 (18.5%)
unhandled_yarv_insn: 1,242,215 ( 6.2%)
block_param_proxy_not_iseq_or_ifunc: 984,480 ( 4.9%)
unhandled_kwarg: 800,154 ( 4.0%)
unknown_newarray_send: 539,317 ( 2.7%)
patchpoint_stable_constant_names: 340,283 ( 1.7%)
unhandled_splat: 229,440 ( 1.1%)
unhandled_hir_insn: 147,351 ( 0.7%)
patchpoint_no_singleton_class: 130,252 ( 0.6%)
patchpoint_method_redefined: 32,716 ( 0.2%)
block_param_proxy_modified: 25,274 ( 0.1%)
patchpoint_no_ep_escape: 7,559 ( 0.0%)
obj_to_string_fallback: 24 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
interrupt: 19 ( 0.0%)
send_count: 120,812,030
dynamic_send_count: 43,152,037 (35.7%)
optimized_send_count: 77,659,993 (64.3%)
iseq_optimized_send_count: 32,187,900 (26.6%)
inline_cfunc_optimized_send_count: 23,458,491 (19.4%)
non_variadic_cfunc_optimized_send_count: 17,327,499 (14.3%)
variadic_cfunc_optimized_send_count: 4,686,103 ( 3.9%)
dynamic_getivar_count: 13,023,424
dynamic_setivar_count: 12,310,991
compiled_iseq_count: 4,806
failed_iseq_count: 466
compile_time: 9,012ms
profile_time: 104ms
gc_time: 44ms
invalidation_time: 239ms
vm_write_pc_count: 113,648,665
vm_write_sp_count: 111,205,997
vm_write_locals_count: 111,205,997
vm_write_stack_count: 111,205,997
vm_write_to_parent_iseq_local_count: 516,800
vm_read_from_parent_iseq_local_count: 11,225,587
code_region_bytes: 23,052,288
side_exit_count: 20,144,372
total_insn_count: 926,090,214
vm_insn_count: 297,647,811
zjit_insn_count: 628,442,403
ratio_in_zjit: 67.9%
```
</details>
|
|
This is only really called a lot in the benchmark harness, as far as I
can tell.
|
|
These bring `send_without_block_no_profiles` numbers down more.
On lobsters:
Before: send_without_block_no_profiles: 1,293,375
After: send_without_block_no_profiles: 998,724
all stats before:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (71.1% of total 15,575,335):
Hash#[]: 4,519,774 (29.0%)
Kernel#is_a?: 1,030,758 ( 6.6%)
String#<<: 851,929 ( 5.5%)
Hash#[]=: 742,941 ( 4.8%)
Regexp#match?: 399,889 ( 2.6%)
String#empty?: 353,775 ( 2.3%)
Hash#key?: 349,129 ( 2.2%)
String#start_with?: 334,961 ( 2.2%)
Kernel#respond_to?: 316,527 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.4%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 181,792 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,340 ( 1.2%)
BasicObject#!=: 175,997 ( 1.1%)
Class#new: 168,078 ( 1.1%)
Kernel#kind_of?: 165,600 ( 1.1%)
Top-20 not annotated C methods (71.6% of total 15,737,478):
Hash#[]: 4,519,784 (28.7%)
Kernel#is_a?: 1,212,649 ( 7.7%)
String#<<: 851,929 ( 5.4%)
Hash#[]=: 743,120 ( 4.7%)
Regexp#match?: 399,889 ( 2.5%)
String#empty?: 361,013 ( 2.3%)
Hash#key?: 349,129 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,527 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.3%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 191,661 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,347 ( 1.1%)
BasicObject#!=: 176,181 ( 1.1%)
Class#new: 168,078 ( 1.1%)
Kernel#kind_of?: 165,634 ( 1.1%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,523,648):
iseq: 2,271,904 (50.2%)
bmethod: 985,636 (21.8%)
optimized: 949,702 (21.0%)
alias: 310,746 ( 6.9%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,096):
invokesuper: 2,373,391 (55.3%)
invokeblock: 811,872 (18.9%)
sendforward: 505,448 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,225 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 25,824,463):
send_without_block_polymorphic: 9,721,727 (37.6%)
send_no_profiles: 5,894,760 (22.8%)
send_without_block_not_optimized_method_type: 4,523,648 (17.5%)
not_optimized_instruction: 4,293,096 (16.6%)
send_without_block_no_profiles: 1,293,386 ( 5.0%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,765 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,502):
register_spill_on_alloc: 3,457,791 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,363 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,860,787):
compile_error: 3,752,502 (34.6%)
guard_type_failure: 2,638,903 (24.3%)
guard_shape_failure: 1,917,195 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,787 ( 4.9%)
unhandled_kwarg: 421,943 ( 3.9%)
patchpoint: 370,449 ( 3.4%)
unknown_newarray_send: 314,785 ( 2.9%)
unhandled_splat: 122,060 ( 1.1%)
unhandled_hir_insn: 76,396 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 504 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,945,801
dynamic_send_count: 25,824,463 (38.6%)
optimized_send_count: 41,121,338 (61.4%)
iseq_optimized_send_count: 18,587,368 (27.8%)
inline_cfunc_optimized_send_count: 6,958,635 (10.4%)
non_variadic_cfunc_optimized_send_count: 12,911,155 (19.3%)
variadic_cfunc_optimized_send_count: 2,664,180 ( 4.0%)
dynamic_getivar_count: 7,365,975
dynamic_setivar_count: 7,245,897
compiled_iseq_count: 4,794
failed_iseq_count: 450
compile_time: 760ms
profile_time: 9ms
gc_time: 8ms
invalidation_time: 55ms
vm_write_pc_count: 64,284,053
vm_write_sp_count: 62,940,297
vm_write_locals_count: 62,940,297
vm_write_stack_count: 62,940,297
vm_write_to_parent_iseq_local_count: 292,446
vm_read_from_parent_iseq_local_count: 6,470,923
code_region_bytes: 23,019,520
side_exit_count: 10,860,787
total_insn_count: 517,576,320
vm_insn_count: 163,188,910
zjit_insn_count: 354,387,410
ratio_in_zjit: 68.5%
```
all stats after:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (70.4% of total 15,740,856):
Hash#[]: 4,519,792 (28.7%)
Kernel#is_a?: 1,030,776 ( 6.5%)
String#<<: 851,940 ( 5.4%)
Hash#[]=: 742,914 ( 4.7%)
Regexp#match?: 399,887 ( 2.5%)
String#empty?: 353,775 ( 2.2%)
Hash#key?: 349,139 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.3%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 181,788 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,341 ( 1.1%)
BasicObject#!=: 175,996 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,600 ( 1.1%)
Top-20 not annotated C methods (70.9% of total 15,902,999):
Hash#[]: 4,519,802 (28.4%)
Kernel#is_a?: 1,212,667 ( 7.6%)
String#<<: 851,940 ( 5.4%)
Hash#[]=: 743,093 ( 4.7%)
Regexp#match?: 399,887 ( 2.5%)
String#empty?: 361,013 ( 2.3%)
Hash#key?: 349,139 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.3%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 191,657 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.1%)
Kernel#dup: 179,348 ( 1.1%)
BasicObject#!=: 176,180 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,634 ( 1.0%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,523,637):
iseq: 2,271,900 (50.2%)
bmethod: 985,636 (21.8%)
optimized: 949,695 (21.0%)
alias: 310,746 ( 6.9%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,128):
invokesuper: 2,373,401 (55.3%)
invokeblock: 811,890 (18.9%)
sendforward: 505,449 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,228 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 25,530,605):
send_without_block_polymorphic: 9,722,499 (38.1%)
send_no_profiles: 5,894,763 (23.1%)
send_without_block_not_optimized_method_type: 4,523,637 (17.7%)
not_optimized_instruction: 4,293,128 (16.8%)
send_without_block_no_profiles: 998,732 ( 3.9%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,765 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,500):
register_spill_on_alloc: 3,457,792 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,360 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,860,797):
compile_error: 3,752,500 (34.6%)
guard_type_failure: 2,638,909 (24.3%)
guard_shape_failure: 1,917,203 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,784 ( 4.9%)
unhandled_kwarg: 421,947 ( 3.9%)
patchpoint: 370,474 ( 3.4%)
unknown_newarray_send: 314,786 ( 2.9%)
unhandled_splat: 122,067 ( 1.1%)
unhandled_hir_insn: 76,395 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 469 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,945,326
dynamic_send_count: 25,530,605 (38.1%)
optimized_send_count: 41,414,721 (61.9%)
iseq_optimized_send_count: 18,587,439 (27.8%)
inline_cfunc_optimized_send_count: 7,086,426 (10.6%)
non_variadic_cfunc_optimized_send_count: 13,076,682 (19.5%)
variadic_cfunc_optimized_send_count: 2,664,174 ( 4.0%)
dynamic_getivar_count: 7,365,985
dynamic_setivar_count: 7,245,954
compiled_iseq_count: 4,794
failed_iseq_count: 450
compile_time: 748ms
profile_time: 9ms
gc_time: 8ms
invalidation_time: 58ms
vm_write_pc_count: 64,155,801
vm_write_sp_count: 62,812,041
vm_write_locals_count: 62,812,041
vm_write_stack_count: 62,812,041
vm_write_to_parent_iseq_local_count: 292,448
vm_read_from_parent_iseq_local_count: 6,470,939
code_region_bytes: 23,052,288
side_exit_count: 10,860,797
total_insn_count: 517,576,915
vm_insn_count: 163,192,099
zjit_insn_count: 354,384,816
ratio_in_zjit: 68.5%
```
|
|
These bring `send_without_block_no_profiles` numbers down dramatically.
On lobsters:
Before: send_without_block_no_profiles: 3,466,375
After: send_without_block_no_profiles: 1,293,375
all stats before:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (70.4% of total 14,174,061):
Hash#[]: 4,519,771 (31.9%)
Kernel#is_a?: 1,030,757 ( 7.3%)
Regexp#match?: 399,885 ( 2.8%)
String#empty?: 353,775 ( 2.5%)
Hash#key?: 349,125 ( 2.5%)
Hash#[]=: 344,348 ( 2.4%)
String#start_with?: 334,961 ( 2.4%)
Kernel#respond_to?: 316,527 ( 2.2%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.7%)
TrueClass#===: 235,770 ( 1.7%)
FalseClass#===: 231,143 ( 1.6%)
Array#include?: 211,383 ( 1.5%)
Hash#fetch: 204,702 ( 1.4%)
Kernel#block_given?: 181,793 ( 1.3%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.3%)
Kernel#dup: 179,341 ( 1.3%)
BasicObject#!=: 175,996 ( 1.2%)
Class#new: 168,079 ( 1.2%)
Kernel#kind_of?: 165,600 ( 1.2%)
String#==: 157,734 ( 1.1%)
Top-20 not annotated C methods (71.1% of total 14,336,035):
Hash#[]: 4,519,781 (31.5%)
Kernel#is_a?: 1,212,647 ( 8.5%)
Regexp#match?: 399,885 ( 2.8%)
String#empty?: 361,013 ( 2.5%)
Hash#key?: 349,125 ( 2.4%)
Hash#[]=: 344,348 ( 2.4%)
String#start_with?: 334,961 ( 2.3%)
Kernel#respond_to?: 316,527 ( 2.2%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.7%)
TrueClass#===: 235,770 ( 1.6%)
FalseClass#===: 231,143 ( 1.6%)
Array#include?: 211,383 ( 1.5%)
Hash#fetch: 204,702 ( 1.4%)
Kernel#block_given?: 191,662 ( 1.3%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.3%)
Kernel#dup: 179,348 ( 1.3%)
BasicObject#!=: 176,180 ( 1.2%)
Class#new: 168,079 ( 1.2%)
Kernel#kind_of?: 165,634 ( 1.2%)
String#==: 163,666 ( 1.1%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,536,895):
iseq: 2,281,897 (50.3%)
bmethod: 985,679 (21.7%)
optimized: 952,914 (21.0%)
alias: 310,745 ( 6.8%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,123):
invokesuper: 2,373,396 (55.3%)
invokeblock: 811,891 (18.9%)
sendforward: 505,449 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,227 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 27,795,022):
send_without_block_polymorphic: 9,505,835 (34.2%)
send_no_profiles: 5,894,763 (21.2%)
send_without_block_not_optimized_method_type: 4,536,895 (16.3%)
not_optimized_instruction: 4,293,123 (15.4%)
send_without_block_no_profiles: 3,466,407 (12.5%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,918 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,391):
register_spill_on_alloc: 3,457,680 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,363 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,852,021):
compile_error: 3,752,391 (34.6%)
guard_type_failure: 2,630,877 (24.2%)
guard_shape_failure: 1,917,208 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,784 ( 4.9%)
unhandled_kwarg: 421,989 ( 3.9%)
patchpoint: 369,799 ( 3.4%)
unknown_newarray_send: 314,786 ( 2.9%)
unhandled_splat: 122,062 ( 1.1%)
unhandled_hir_insn: 76,394 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 468 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,989,407
dynamic_send_count: 27,795,022 (41.5%)
optimized_send_count: 39,194,385 (58.5%)
iseq_optimized_send_count: 18,060,194 (27.0%)
inline_cfunc_optimized_send_count: 6,960,130 (10.4%)
non_variadic_cfunc_optimized_send_count: 11,523,682 (17.2%)
variadic_cfunc_optimized_send_count: 2,650,379 ( 4.0%)
dynamic_getivar_count: 7,365,982
dynamic_setivar_count: 7,245,929
compiled_iseq_count: 4,795
failed_iseq_count: 449
compile_time: 846ms
profile_time: 12ms
gc_time: 9ms
invalidation_time: 61ms
vm_write_pc_count: 64,326,442
vm_write_sp_count: 62,982,524
vm_write_locals_count: 62,982,524
vm_write_stack_count: 62,982,524
vm_write_to_parent_iseq_local_count: 292,448
vm_read_from_parent_iseq_local_count: 6,471,353
code_region_bytes: 22,708,224
side_exit_count: 10,852,021
total_insn_count: 517,550,288
vm_insn_count: 162,946,459
zjit_insn_count: 354,603,829
ratio_in_zjit: 68.5%
```
all stats after:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (71.1% of total 15,575,343):
Hash#[]: 4,519,778 (29.0%)
Kernel#is_a?: 1,030,758 ( 6.6%)
String#<<: 851,931 ( 5.5%)
Hash#[]=: 742,938 ( 4.8%)
Regexp#match?: 399,886 ( 2.6%)
String#empty?: 353,775 ( 2.3%)
Hash#key?: 349,127 ( 2.2%)
String#start_with?: 334,961 ( 2.2%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,380 ( 1.4%)
Hash#fetch: 204,701 ( 1.3%)
Kernel#block_given?: 181,792 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,341 ( 1.2%)
BasicObject#!=: 175,997 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,600 ( 1.1%)
Top-20 not annotated C methods (71.6% of total 15,737,486):
Hash#[]: 4,519,788 (28.7%)
Kernel#is_a?: 1,212,649 ( 7.7%)
String#<<: 851,931 ( 5.4%)
Hash#[]=: 743,117 ( 4.7%)
Regexp#match?: 399,886 ( 2.5%)
String#empty?: 361,013 ( 2.3%)
Hash#key?: 349,127 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,380 ( 1.3%)
Hash#fetch: 204,701 ( 1.3%)
Kernel#block_given?: 191,661 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,348 ( 1.1%)
BasicObject#!=: 176,181 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,634 ( 1.1%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,523,650):
iseq: 2,271,911 (50.2%)
bmethod: 985,636 (21.8%)
optimized: 949,696 (21.0%)
alias: 310,747 ( 6.9%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,126):
invokesuper: 2,373,395 (55.3%)
invokeblock: 811,894 (18.9%)
sendforward: 505,449 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,228 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 25,824,512):
send_without_block_polymorphic: 9,721,725 (37.6%)
send_no_profiles: 5,894,761 (22.8%)
send_without_block_not_optimized_method_type: 4,523,650 (17.5%)
not_optimized_instruction: 4,293,126 (16.6%)
send_without_block_no_profiles: 1,293,404 ( 5.0%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,765 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,504):
register_spill_on_alloc: 3,457,793 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,363 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,860,754):
compile_error: 3,752,504 (34.6%)
guard_type_failure: 2,638,901 (24.3%)
guard_shape_failure: 1,917,198 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,785 ( 4.9%)
unhandled_kwarg: 421,947 ( 3.9%)
patchpoint: 370,447 ( 3.4%)
unknown_newarray_send: 314,786 ( 2.9%)
unhandled_splat: 122,065 ( 1.1%)
unhandled_hir_insn: 76,395 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 463 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,945,926
dynamic_send_count: 25,824,512 (38.6%)
optimized_send_count: 41,121,414 (61.4%)
iseq_optimized_send_count: 18,587,430 (27.8%)
inline_cfunc_optimized_send_count: 6,958,641 (10.4%)
non_variadic_cfunc_optimized_send_count: 12,911,166 (19.3%)
variadic_cfunc_optimized_send_count: 2,664,177 ( 4.0%)
dynamic_getivar_count: 7,365,985
dynamic_setivar_count: 7,245,942
compiled_iseq_count: 4,794
failed_iseq_count: 450
compile_time: 852ms
profile_time: 13ms
gc_time: 11ms
invalidation_time: 63ms
vm_write_pc_count: 64,284,194
vm_write_sp_count: 62,940,427
vm_write_locals_count: 62,940,427
vm_write_stack_count: 62,940,427
vm_write_to_parent_iseq_local_count: 292,447
vm_read_from_parent_iseq_local_count: 6,470,931
code_region_bytes: 23,019,520
side_exit_count: 10,860,754
total_insn_count: 517,576,267
vm_insn_count: 163,188,187
zjit_insn_count: 354,388,080
ratio_in_zjit: 68.5%
```
|
|
* ZJIT: Profile opt_aref
* ZJIT: Add test for opt_aref
* ZJIT: Move test and add hash opt test
* ZJIT: Update zjit bindgen
* ZJIT: Add inspect calls to opt_aref tests
|
|
to fix inconsistent and wrong current namespace detections.
This includes:
* Moving load_path and related things from rb_vm_t to rb_namespace_t to simplify
accessing those values via namespace (instead of accessing either vm or ns)
* Initializing root_namespace earlier and consolidate builtin_namespace into root_namespace
* Adding VM_FRAME_FLAG_NS_REQUIRE for checkpoints to detect a namespace to load/require files
* Removing implicit refinements in the root namespace which was used to determine
the namespace to be loaded (replaced by VM_FRAME_FLAG_NS_REQUIRE)
* Removing namespaces from rb_proc_t because its namespace can be identified by lexical context
* Starting to use ep[VM_ENV_DATA_INDEX_SPECVAL] to store the current namespace when
the frame type is MAGIC_TOP or MAGIC_CLASS (block handlers don't exist in this case)
|
|
* failing test for ObjToString optimization with GuardType
* profile ObjToString receiver and rewrite with guard
* adjust integration tests for objtostring type guard optimization
* Implement new GuardTypeNot HIR; objtostring sends to_s directly on profiled nonstrings
* codegen for GuardTypeNot
* typo fixes
* better name for tests; fix side exit reason for GuardTypeNot
* revert accidental change
* make bindgen
* Fix is_string to identify subclasses of String; fix codegen for identifying if val is String
|
|
It was used to let MJIT override the leafness of the instruction when it
decides to remove check_ints for it. Now that MJIT is gone, nobody needs
to "override" the leafness using this.
|
|
Specialize monomorphic `GetIvar` into:
* `GuardType(HeapObject)`
* `GuardShape`
* `LoadIvarEmbedded` or `LoadIvarExtended`
This requires profiling self for `getinstancevariable` (it's not on the operand
stack).
This also optimizes `GetIvar`s that happen as a result of inlining
`attr_reader` and `attr_accessor`.
Also move some (newly) shared JIT helpers into jit.c.
|
|
* ZJIT: Profile and specialize Array#empty?
* ZJIT: Specialize BasicObject#==
* ZJIT: Specialize Hash#empty?
* ZJIT: Specialize BasicObject#!
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
|
|
When these instructions were introduced it was common to read from a
hash with mutable string literals. However, these days, I think these
instructions are fairly rare.
I tested this with the lobsters benchmark, and saw no difference in
speed. In order to be sure, I tracked down every use of this
instruction in the lobsters benchmark, and there were only 4 places
where it was used.
Additionally, this patch fixes a case where "chilled strings" should
emit a warning but they don't.
```ruby
class Foo
def self.[](x)= x.gsub!(/hello/, "hi")
end
Foo["hello world"]
```
Removing these instructions shows this warning:
```
> ./miniruby -vw test.rb
ruby 3.5.0dev (2025-08-25T21:36:50Z rm-opt_aref_with dca08e286c) +PRISM [arm64-darwin24]
test.rb:2: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information)
```
[Feature #21553]
|
|
|
|
This allows ZJIT to profile `nil?` calls and create type guards for
its receiver.
- Add `zjit_profile` to `opt_nil_p` insn
- Start profiling `opt_nil_p` calls
- Use `runtime_exact_ruby_class` instead of `exact_ruby_class` to determine
the profiled receiver class
|
|
Always use opt_new behavior regardless of tracepoint state.
Notes:
Merged: https://github.com/ruby/ruby/pull/13232
|
|
We don't calculate the correct argc so the bookkeeping slot is something
else (unexpected) instead of Qnil (expected).
Notes:
Merged: https://github.com/ruby/ruby/pull/13198
|
|
|
|
This commit inlines instructions for Class#new. To make this work, we
added a new YARV instructions, `opt_new`. `opt_new` checks whether or
not the `new` method is the default allocator method. If it is, it
allocates the object, and pushes the instance on the stack. If not, the
instruction jumps to the "slow path" method call instructions.
Old instructions:
```
> ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
New instructions:
```
> ./miniruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
This commit speeds up basic object allocation (`Foo.new`) by 60%, but
classes that take keyword parameters see an even bigger benefit because
no hash is allocated when instantiating the object (3x to 6x faster).
Here is an example that uses `Hash.new(capacity: 0)`:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.004 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.076 s … 1.088 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.9 ms ± 3.5 ms [User: 622.7 ms, System: 4.8 ms]
Range (min … max): 622.7 ms … 633.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.01 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
This commit changes the backtrace for `initialize`:
```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize
puts caller
end
end
def hello
Foo.new
end
hello
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:8:in 'Class#new'
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
aaron@tc ~/g/ruby (inline-new)> ./miniruby -v test.rb
ruby 3.5.0dev (2025-03-28T23:59:40Z inline-new c4157884e4) +PRISM [arm64-darwin24]
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
```
It also increases memory usage for calls to `new` by 122 bytes:
```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def hello
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:hello)))
aaron@tc ~/g/ruby (inline-new)> make runruby
RUBY_ON_BUG='gdb -x ./.gdbinit -p' ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems ./test.rb
656
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
```
Thanks to @ko1 for coming up with this idea!
Co-Authored-By: John Hawthorn <john@hawthorn.email>
|
|
Split out from the CCall changes since we discussed during pairing that
this is useful to unblock some other changes. No tests since no one
consumes this profiling data yet.
Notes:
Merged: https://github.com/ruby/ruby/pull/13131
|
|
(https://github.com/Shopify/zjit/pull/24)
* Profile instructions for fixnum arithmetic
* Drop PartialEq from Type
* Do not push PatchPoint onto the stack
* Avoid pushing the output of the guards
* Pop operands after guards
* Test HIR from profiled runs
* Implement Display for new instructions
* Drop unused FIXNUM_BITS
* Use a Rust function to split lines
* Use Display for GuardType operands
Co-authored-by: Max Bernstein <max@bernsteinbear.com>
* Fix tests with Display-ed values
---------
Co-authored-by: Max Bernstein <max@bernsteinbear.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/13131
|
|
(https://github.com/Shopify/zjit/pull/16)
* Add zjit_* instructions to profile the interpreter
* Rename FixnumPlus to FixnumAdd
* Update a comment about Invalidate
* Rename Guard to GuardType
* Rename Invalidate to PatchPoint
* Drop unneeded debug!()
* Plan on profiling the types
* Use the output of GuardType as type refined outputs
Notes:
Merged: https://github.com/ruby/ruby/pull/13131
|
|
|
|
The forwarding instructions should use the `ec` parameter passed to
vm_exec_core instead of trying to look up the EC via `GET_EC()`. It's
cheaper to get the local than to try looking up a global
Notes:
Merged: https://github.com/ruby/ruby/pull/12931
|
|
* Add opt_duparray_send insn to skip the allocation on `#include?`
If the method isn't going to modify the array we don't need to copy it.
This avoids the allocation / array copy for things like `[:a, :b].include?(x)`.
This adds a BOP for include? and tracks redefinition for it on Array.
Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>
* YJIT: Implement opt_duparray_send include_p
Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>
* Update opt_newarray_send to support simple forms of include?(arg)
Similar to opt_duparray_send but for non-static arrays.
* YJIT: Implement opt_newarray_send include_p
---------
Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
If a Hash which is empty or only using literals is frozen, we detect
this as a peephole optimization and change the instructions to be
`opt_hash_freeze`.
[Feature #20684]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
Notes:
Merged: https://github.com/ruby/ruby/pull/11406
|
|
If an Array which is empty or only using literals is frozen, we detect
this as a peephole optimization and change the instructions to be
`opt_ary_freeze`.
[Feature #20684]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
Notes:
Merged: https://github.com/ruby/ruby/pull/11406
|
|
The pushtoarraykwsplat instruction was designed to replace newarraykwsplat,
and we now meet the condition for deletion mentioned in
77c1233f79a0f96a081b70da533fbbde4f3037fa.
Notes:
Merged: https://github.com/ruby/ruby/pull/11371
Merged-By: XrXr
|
|
Use an enum for the method arg instead of needing to add an id
that doesn't map to an actual method name.
$ ruby --dump=insns -e 'b = "x"; [v].pack("E*", buffer: b)'
before:
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring "x" ( 1)[Li]
0002 setlocal_WC_0 b@0
0004 putself
0005 opt_send_without_block <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0007 newarray 1
0009 putchilledstring "E*"
0011 getlocal_WC_0 b@0
0013 opt_send_without_block <calldata!mid:pack, argc:2, kw:[#<Symbol:0x000000000023110c>], KWARG>
0015 leave
```
after:
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring "x" ( 1)[Li]
0002 setlocal_WC_0 b@0
0004 putself
0005 opt_send_without_block <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0007 putchilledstring "E*"
0009 getlocal b@0, 0
0012 opt_newarray_send 3, 5
0015 leave
```
Notes:
Merged: https://github.com/ruby/ruby/pull/11249
|
|
This should make the diff more clean
|
|
This commit adds `sendforward` and `invokesuperforward` for forwarding
parameters to calls
Co-authored-by: Matt Valentine-House <matt@eightbitraptor.com>
|
|
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.
Calls it optimizes look like this:
```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```
```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```
```ruby
def bar(*a) = a
def foo(...)
list = [1, 2]
bar(*list, ...) # optimized
end
foo(123)
```
All variants of the above but using `super` are also optimized, including a bare super like this:
```ruby
def foo(...)
super
end
```
This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:
```ruby
def m
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def bar(a) = a
def foo(...) = bar(...)
def test
m { foo(123) }
end
test
p test # allocates 1 object on master, but 0 objects with this patch
```
```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)
def test
m { foo(1, b: 2) }
end
test
p test # allocates 2 objects on master, but 0 objects with this patch
```
How does it work?
-----------------
This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.
I think this description is kind of confusing, so let's walk through an example with code.
```ruby
def delegatee(a, b) = a + b
def delegator(...)
delegatee(...) # CI2 (FORWARDING)
end
def caller
delegator(1, 2) # CI1 (argc: 2)
end
```
Before we call the delegator method, the stack looks like this:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # |
5| delegatee(...) # CI2 (FORWARDING) |
6| end |
7| |
8| def caller |
-> 9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method. The `delegator` method has a special local called `...`
that holds the caller's CI object.
Here is the ISeq disasm fo `delegator`:
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
The local called `...` will contain the caller's CI: CI1.
Here is the stack when we enter `delegator`:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
-> 4| # | CI1 (argc: 2)
5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller |
9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`. In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).
Before executing the `send` instruction, we push `...` on the stack. The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
Instruction 001 puts the caller's CI on the stack. `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # | CI1 (argc: 2)
-> 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller | self
9| delegator(1, 2) # CI1 (argc: 2) | 1
10| end | 2
```
The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.
Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.
I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.
I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.
For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:
```ruby
SomeObject.new(foo: 1)
```
If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
|
|
[Feature #20205]
Now that chilled strings no longer appear as frozen, there is no
need to offer an API to check for chilled strings.
We however need to change `rb_check_frozen_internal` to no
longer be a macro, as it needs to check for chilled strings.
|
|
Instructions for this code:
```ruby
# frozen_string_literal: true
[a].pack("C")
```
Before this commit:
```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself ( 3)[Li]
0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 newarray 1
0005 putobject "C"
0007 opt_send_without_block <calldata!mid:pack, argc:1, ARGS_SIMPLE>
0009 leave
```
After this commit:
```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself ( 3)[Li]
0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 putobject "C"
0005 opt_newarray_send 2, :pack
0008 leave
```
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
[Feature #20205]
As a path toward enabling frozen string literals by default in the future,
this commit introduce "chilled strings". From a user perspective chilled
strings pretend to be frozen, but on the first attempt to mutate them,
they lose their frozen status and emit a warning rather than to raise a
`FrozenError`.
Implementation wise, `rb_compile_option_struct.frozen_string_literal` is
no longer a boolean but a tri-state of `enabled/disabled/unset`.
When code is compiled with frozen string literals neither explictly enabled
or disabled, string literals are compiled with a new `putchilledstring`
instruction. This instruction is identical to `putstring` except it marks
the String with the `STR_CHILLED (FL_USER3)` and `FL_FREEZE` flags.
Chilled strings have the `FL_FREEZE` flag as to minimize the need to check
for chilled strings across the codebase, and to improve compatibility with
C extensions.
Notes:
- `String#freeze`: clears the chilled flag.
- `String#-@`: acts as if the string was mutable.
- `String#+@`: acts as if the string was mutable.
- `String#clone`: copies the chilled flag.
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
|
|
This is designed to replace the newarraykwsplat instruction, which is
no longer used in the parse.y compiler after this commit. This avoids
an unnecessary array allocation in the case where ARGSCAT is followed
by LIST with keyword:
```ruby
a = []
kw = {}
[*a, 1, **kw]
```
Previous Instructions:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 newhash 0 ( 2)[Li]
0006 setlocal_WC_0 kw@1
0008 getlocal_WC_0 a@0 ( 3)[Li]
0010 splatarray true
0012 putobject_INT2FIX_1_
0013 putspecialobject 1
0015 newhash 0
0017 getlocal_WC_0 kw@1
0019 opt_send_without_block <calldata!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>
0021 newarraykwsplat 2
0023 concattoarray
0024 leave
```
New Instructions:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 newhash 0 ( 2)[Li]
0006 setlocal_WC_0 kw@1
0008 getlocal_WC_0 a@0 ( 3)[Li]
0010 splatarray true
0012 putobject_INT2FIX_1_
0013 pushtoarray 1
0015 putspecialobject 1
0017 newhash 0
0019 getlocal_WC_0 kw@1
0021 opt_send_without_block <calldata!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>
0023 pushtoarraykwsplat
0024 leave
```
pushtoarraykwsplat is designed to be simpler than newarraykwsplat.
It does not take a variable number of arguments from the stack, it
pops the top of the stack, and appends it to the second from the top,
unless the top of the stack is an empty hash.
During this work, I found the ARGSPUSH followed by HASH with keyword
did not compile correctly, as it pushed the generated hash to the
array even if the hash was empty. This fixes the behavior, to use
pushtoarraykwsplat instead of pushtoarray in that case:
```ruby
a = []
kw = {}
[*a, **kw]
[{}] # Before
[] # After
```
This does not remove the newarraykwsplat instruction, as it is still
referenced in the prism compiler (which should be updated similar
to this), YJIT (only in the bindings, it does not appear to be
implemented), and RJIT (in a couple comments). After those are
updated, the newarraykwsplat instruction should be removed.
|
|
Previously, `**nil` by itself worked, but if you add a block argument,
it raised a conversion error. The presence of the block argument
shouldn't change how keyword splat works.
See: <https://bugs.ruby-lang.org/issues/20064>
|
|
|
|
This instruction is similar to concattoarray, but it takes the
number of arguments to push to the array, removes that number
of arguments from the stack, and adds them to the array now at
the top of the stack.
This allows `f(*a, 1)` to allocate only a single array on the
caller side (which can be reused on the callee side in the case of
`def f(*a)`). Prior to this commit, `f(*a, 1)` would generate
3 arrays:
* a dupped by splatarray true
* 1 wrapped in array by newarray
* a dupped again by concatarray
Instructions Before for `a = []; f(*a, 1)`:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 putself
0005 getlocal_WC_0 a@0
0007 splatarray true
0009 putobject_INT2FIX_1_
0010 newarray 1
0012 concatarray
0013 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|FCALL>
0015 leave
```
Instructions After for `a = []; f(*a, 1)`:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 putself
0005 getlocal_WC_0 a@0
0007 splatarray true
0009 putobject_INT2FIX_1_
0010 pushtoarray 1
0012 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL>
0014 leave
```
With these changes, method calls to Ruby methods should
implicitly allocate at most one array.
Ignore typeprof bundled gem failure due to unrecognized instruction.
|
|
This instruction is similar to concatarray, but assumes the first
object is already an array, and appends to it directly. This is
different than concatarray, which will create a new array instead
of appending to an existing array.
Additionally, for both concatarray and concattoarray, if the second
argument cannot be converted to an array, then just push it onto
the array, instead of creating a new array to wrap it, and then
using concat array. This saves an array allocation in that case.
This allows `f(*a, *a, *1)` to allocate only a single array on the
caller side (which can be reused on the callee side in the case of
`def f(*a)`). Prior to this commit, `f(*a, *a, *1)` would generate
4 arrays:
* a dupped by splatarray true
* a dupped again by first concatarray
* 1 wrapped in array by third splatarray
* result of [*a, *a] dupped by second concatarray
Instructions Before for `a = []; f(*a, *a, *1)`:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 putself
0005 getlocal_WC_0 a@0
0007 splatarray true
0009 getlocal_WC_0 a@0
0011 splatarray false
0013 concatarray
0014 putobject_INT2FIX_1_
0015 splatarray false
0017 concatarray
0018 opt_send_without_block <calldata!mid:g, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL>
0020 leave
```
Instructions After for `a = []; f(*a, *a, *1)`:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 putself
0005 getlocal_WC_0 a@0
0007 splatarray true
0009 getlocal_WC_0 a@0
0011 concattoarray
0012 putobject_INT2FIX_1_
0013 concattoarray
0014 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL>
0016 leave
```
|
|
Previously, block.to_proc was called first, by vm_caller_setup_arg_block.
kw.to_hash was called later inside CALLER_SETUP_ARG or setup_parameters_complex.
This adds a splatkw instruction that is inserted before sends with
ARGS_BLOCKARG and KW_SPLAT and without KW_SPLAT_MUT. This is not needed in the
KW_SPLAT_MUT case, because then you know the value is a hash, and you don't
need to call to_hash on it.
The splatkw instruction checks whether the second to top block is a hash,
and if not, replaces it with the value of calling to_hash on it (using
rb_to_hash_type). As it is always before a send with ARGS_BLOCKARG and
KW_SPLAT, second to top is the keyword splat, and top is the passed block.
|
|
The expandarray instruction can allocate an array, which can trigger
a GC compaction. However, since it does not increment the sp until the
end of the instruction, the objects it places on the stack are not
marked or reference updated by the GC, which can cause the objects to
move which leaves broken or incorrect objects on the stack.
This commit changes the instruction to be handles_sp so the sp is
incremented inside of the instruction right after the object is written
on the stack.
|
|
* YJIT: Fallback opt_getconstant_path for const_missing
* Fix a comment [ci skip]
* Remove a wrapper function
|
|
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
This commit introduces a new instruction `opt_newarray_send` which is
used when there is an array literal followed by either the `hash`,
`min`, or `max` method.
```
[a, b, c].hash
```
Will emit an `opt_newarray_send` instruction. This instruction falls
back to a method call if the "interested" method has been monkey
patched.
Here are some examples of the instructions generated:
```
$ ./miniruby --dump=insns -e '[@a, @b].max'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :max
0009 leave
$ ./miniruby --dump=insns -e '[@a, @b].min'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :min
0009 leave
$ ./miniruby --dump=insns -e '[@a, @b].hash'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :hash
0009 leave
```
[Feature #18897] [ruby-core:109147]
Co-authored-by: John Hawthorn <jhawthorn@github.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/6090
|
|
I closed https://github.com/ruby/ruby/pull/7543, but part of the diff
seems useful regardless, so I extracted it.
|
|
* Break up jit_exec from vm_sendish
* YJIT: Implement throw instruction
* YJIT: Explain what rb_vm_throw does [ci skip]
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
because non-opt instructions should contain `_` char.
Notes:
Merged: https://github.com/ruby/ruby/pull/7485
|