path: root/insns.def
Age | Commit message | Author
2025-11-25 | ZJIT: Specialize setinstancevariable when ivar is already in shape (#15290) | Max Bernstein
Don't support shape transitions for now.
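To illustrate the idea of specializing an ivar write only when the ivar is already in the object's shape, here is a minimal toy sketch. `Shape`, `Obj`, `generic_setivar`, and `specialize_setivar` are hypothetical names for this illustration and do not reflect ZJIT's or CRuby's actual data structures; a real JIT would side-exit on a failed guard rather than call a Python fallback.

```python
class Shape:
    """Toy shape: maps ivar names to slot indexes (an assumed layout)."""
    def __init__(self, ivar_slots):
        self.ivar_slots = dict(ivar_slots)


class Obj:
    """Toy object: a shape plus a flat list of ivar slots."""
    def __init__(self, shape):
        self.shape = shape
        self.slots = [None] * len(shape.ivar_slots)


def generic_setivar(obj, name, value):
    """Slow path: performs a shape transition when the ivar is new."""
    if name not in obj.shape.ivar_slots:
        new_slots = dict(obj.shape.ivar_slots)
        new_slots[name] = len(new_slots)
        obj.shape = Shape(new_slots)
        obj.slots.append(None)
    obj.slots[obj.shape.ivar_slots[name]] = value


def specialize_setivar(profiled_shape, name):
    """Return a guarded fast writer, or None if a transition would be needed."""
    slot = profiled_shape.ivar_slots.get(name)
    if slot is None:
        return None  # ivar not in the shape yet: leave unspecialized

    def fast_setivar(obj, value):
        if obj.shape is not profiled_shape:  # shape guard failed
            generic_setivar(obj, name, value)
            return "fallback"
        obj.slots[slot] = value  # direct slot write, no lookup
        return "fast"

    return fast_setivar
```

In this sketch, `specialize_setivar(shape, "@z")` returns `None` when `@z` is not yet part of `shape`, which mirrors the "no shape transitions for now" restriction.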
2025-11-21 | ZJIT: Specialize monomorphic DefinedIvar (#15281) | Max Bernstein
This lets us constant-fold common monomorphic cases.
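As a rough sketch of why a monomorphic profile allows constant folding: if every observation at a site saw the same shape, the answer to "is this ivar defined?" is fixed for that shape and can be emitted as a constant behind a shape guard. The function name and the representation of a shape as a `frozenset` of ivar names are assumptions made for this example only.

```python
def fold_defined_ivar(observed_shapes, name):
    """Return True/False if the site is monomorphic, else None.

    observed_shapes: list of frozensets of ivar names seen at this site
    (a hypothetical stand-in for a shape profile).
    """
    distinct = set(observed_shapes)
    if len(distinct) != 1:
        return None  # no profile, or more than one shape: cannot fold
    (shape,) = distinct
    # Valid only while a shape guard on the receiver holds.
    return name in shape
```

A `None` result stands for "keep the dynamic check"; the folded constant would be wrong for any receiver with a different shape, hence the guard.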
2025-11-07 | renaming internal data structures and functions from namespace to box | Satoshi Tagomori
2025-11-05 | ZJIT: Profile specific objects for invokeblock (#15051) | Max Bernstein
I made a special kind of `ProfiledType` that looks at specific objects, not just their classes/shapes (https://github.com/ruby/ruby/pull/15051). Then I profiled some of our benchmarks.

For lobsters:

```
Top-6 invokeblock handler (100.0% of total 1,064,155):
  megamorphic: 494,931 (46.5%)
  monomorphic_iseq: 337,171 (31.7%)
  polymorphic: 113,381 (10.7%)
  monomorphic_ifunc: 52,260 ( 4.9%)
  monomorphic_other: 38,970 ( 3.7%)
  no_profiles: 27,442 ( 2.6%)
```

For railsbench:

```
Top-6 invokeblock handler (100.0% of total 2,529,104):
  monomorphic_iseq: 834,452 (33.0%)
  megamorphic: 818,347 (32.4%)
  polymorphic: 632,273 (25.0%)
  monomorphic_ifunc: 224,243 ( 8.9%)
  monomorphic_other: 19,595 ( 0.8%)
  no_profiles: 194 ( 0.0%)
```

For shipit:

```
Top-6 invokeblock handler (100.0% of total 2,104,148):
  megamorphic: 1,269,889 (60.4%)
  polymorphic: 411,475 (19.6%)
  no_profiles: 173,367 ( 8.2%)
  monomorphic_other: 118,619 ( 5.6%)
  monomorphic_iseq: 84,891 ( 4.0%)
  monomorphic_ifunc: 45,907 ( 2.2%)
```

Seems like a monomorphic case for a specific ISEQ actually isn't a bad way of going about this, at least to start...
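The categories in these tables can be pictured with a small classifier over the distinct block handlers observed at one `invokeblock` site. This is only a sketch: the `(handler_id, kind)` representation and the `MAX_POLYMORPHIC` cutoff are assumptions for illustration, not values taken from ZJIT.

```python
from collections import Counter

MAX_POLYMORPHIC = 4  # hypothetical cutoff between polymorphic and megamorphic


def classify(observed):
    """Classify one call site from its observed (handler_id, kind) pairs."""
    distinct = set(observed)
    if not distinct:
        return "no_profiles"
    if len(distinct) == 1:
        _handler_id, kind = next(iter(distinct))
        if kind in ("iseq", "ifunc"):
            return f"monomorphic_{kind}"
        return "monomorphic_other"
    if len(distinct) <= MAX_POLYMORPHIC:
        return "polymorphic"
    return "megamorphic"


def tally(sites):
    """Count how many call sites fall into each category."""
    return Counter(classify(site) for site in sites)
```

Keying the profile on the handler object itself, rather than only on its class or shape, is what lets a site that always receives the same ISEQ be reported as `monomorphic_iseq`.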
2025-10-20 | ZJIT: Optimize send with block into CCallWithFrame (#14863) | Stan Lo
Since `Send` has a block iseq, I updated `CCallWithFrame` to take an optional `blockiseq` as well, and then generate `CCallWithFrame` for `Send` when the condition is right.

## Stats

`liquid-render` Benchmark

| Metric               | Before             | After              | Change              |
|----------------------|--------------------|--------------------|---------------------|
| send_no_profiles     | 3,209,418 (34.1%)  | 4,119 (0.1%)       | -3,205,299 (-99.9%) |
| dynamic_send_count   | 9,410,758 (23.1%)  | 6,459,678 (15.9%)  | -2,951,080 (-31.4%) |
| optimized_send_count | 31,269,388 (76.9%) | 34,220,474 (84.1%) | +2,951,086 (+9.4%)  |

`lobsters` Benchmark

| Metric               | Before     | After      | Change              |
|----------------------|------------|------------|---------------------|
| send_no_profiles     | 10,769,052 | 2,902,865  | -7,866,187 (-73.0%) |
| dynamic_send_count   | 45,673,185 | 42,880,160 | -2,793,025 (-6.1%)  |
| optimized_send_count | 75,142,407 | 78,378,514 | +3,236,107 (+4.3%)  |

### `liquid-render` Before

<details>

```
Average of last 22, non-warmup iters: 262ms ***ZJIT: Printing ZJIT statistics on exit*** Top-20 not inlined C methods (96.9% of total 10,370,809): Kernel#respond_to?: 5,069,204 (48.9%) Hash#key?: 2,394,488 (23.1%) Set#include?: 778,429 ( 7.5%) String#===: 326,134 ( 3.1%) String#<<: 203,231 ( 2.0%) Integer#<<: 166,768 ( 1.6%) Kernel#is_a?: 164,272 ( 1.6%) Kernel#format: 124,262 ( 1.2%) Integer#/: 124,262 ( 1.2%) Array#<<: 115,325 ( 1.1%) Regexp.last_match: 94,862 ( 0.9%) Hash#[]=: 88,485 ( 0.9%) String#start_with?: 55,933 ( 0.5%) CGI::EscapeExt#escapeHTML: 55,471 ( 0.5%) Array#shift: 55,298 ( 0.5%) Regexp#===: 48,928 ( 0.5%) String#=~: 48,477 ( 0.5%) Array#unshift: 47,331 ( 0.5%) String#empty?: 42,870 ( 0.4%) Array#push: 41,215 ( 0.4%) Top-20 not annotated C methods (97.1% of total 10,394,421): Kernel#respond_to?: 5,069,204 (48.8%) Hash#key?: 2,394,488 (23.0%) Set#include?: 778,429 ( 7.5%) String#===: 326,134 ( 3.1%) Kernel#is_a?: 208,664 ( 2.0%) String#<<: 203,231 ( 2.0%) Integer#<<: 166,768 ( 1.6%) Integer#/: 
124,262 ( 1.2%) Kernel#format: 124,262 ( 1.2%) Array#<<: 115,325 ( 1.1%) Regexp.last_match: 94,862 ( 0.9%) Hash#[]=: 88,485 ( 0.9%) String#start_with?: 55,933 ( 0.5%) CGI::EscapeExt#escapeHTML: 55,471 ( 0.5%) Array#shift: 55,298 ( 0.5%) Regexp#===: 48,928 ( 0.5%) String#=~: 48,477 ( 0.5%) Array#unshift: 47,331 ( 0.5%) String#empty?: 42,870 ( 0.4%) Array#push: 41,215 ( 0.4%) Top-2 not optimized method types for send (100.0% of total 2,382): cfunc: 1,196 (50.2%) iseq: 1,186 (49.8%) Top-4 not optimized method types for send_without_block (100.0% of total 2,561,006): iseq: 2,442,091 (95.4%) optimized: 118,882 ( 4.6%) alias: 20 ( 0.0%) null: 13 ( 0.0%) Top-9 not optimized instructions (100.0% of total 685,128): invokeblock: 227,376 (33.2%) opt_neq: 166,471 (24.3%) opt_and: 166,471 (24.3%) opt_eq: 66,721 ( 9.7%) invokesuper: 39,363 ( 5.7%) opt_le: 16,278 ( 2.4%) opt_minus: 1,574 ( 0.2%) opt_send_without_block: 772 ( 0.1%) opt_or: 102 ( 0.0%) Top-8 send fallback reasons (100.0% of total 9,410,758): send_no_profiles: 3,209,418 (34.1%) send_without_block_polymorphic: 2,858,558 (30.4%) send_without_block_not_optimized_method_type: 2,561,006 (27.2%) not_optimized_instruction: 685,128 ( 7.3%) send_without_block_no_profiles: 91,913 ( 1.0%) send_not_optimized_method_type: 2,382 ( 0.0%) obj_to_string_not_string: 2,352 ( 0.0%) send_without_block_cfunc_array_variadic: 1 ( 0.0%) Top-3 unhandled YARV insns (100.0% of total 83,682): getclassvariable: 83,431 (99.7%) once: 137 ( 0.2%) getconstant: 114 ( 0.1%) Top-3 compile error reasons (100.0% of total 5,431,910): register_spill_on_alloc: 4,665,393 (85.9%) exception_handler: 766,347 (14.1%) register_spill_on_ccall: 170 ( 0.0%) Top-11 side exit reasons (100.0% of total 14,635,508): compile_error: 5,431,910 (37.1%) guard_shape_failure: 3,436,341 (23.5%) guard_type_failure: 2,545,791 (17.4%) unhandled_splat: 2,162,907 (14.8%) unhandled_kwarg: 952,568 ( 6.5%) unhandled_yarv_insn: 83,682 ( 0.6%) unhandled_hir_insn: 19,112 ( 0.1%) 
patchpoint_stable_constant_names: 1,608 ( 0.0%) obj_to_string_fallback: 902 ( 0.0%) patchpoint_method_redefined: 599 ( 0.0%) block_param_proxy_not_iseq_or_ifunc: 88 ( 0.0%) send_count: 40,680,153 dynamic_send_count: 9,410,758 (23.1%) optimized_send_count: 31,269,395 (76.9%) iseq_optimized_send_count: 13,886,902 (34.1%) inline_cfunc_optimized_send_count: 7,011,684 (17.2%) non_variadic_cfunc_optimized_send_count: 4,670,333 (11.5%) variadic_cfunc_optimized_send_count: 5,700,476 (14.0%) dynamic_getivar_count: 1,144,613 dynamic_setivar_count: 950,830 compiled_iseq_count: 402 failed_iseq_count: 48 compile_time: 976ms profile_time: 3,223ms gc_time: 22ms invalidation_time: 0ms vm_write_pc_count: 37,744,491 vm_write_sp_count: 37,511,865 vm_write_locals_count: 37,511,865 vm_write_stack_count: 37,511,865 vm_write_to_parent_iseq_local_count: 558,177 vm_read_from_parent_iseq_local_count: 14,317,032 code_region_bytes: 2,211,840 side_exit_count: 14,635,508 total_insn_count: 476,097,972 vm_insn_count: 253,795,154 zjit_insn_count: 222,302,818 ratio_in_zjit: 46.7% ``` </details> ### `liquid-render` After <details> ``` Average of last 21, non-warmup iters: 272ms ***ZJIT: Printing ZJIT statistics on exit*** Top-20 not inlined C methods (96.8% of total 10,093,966): Kernel#respond_to?: 4,932,224 (48.9%) Hash#key?: 2,329,928 (23.1%) Set#include?: 757,389 ( 7.5%) String#===: 317,494 ( 3.1%) String#<<: 197,831 ( 2.0%) Integer#<<: 162,268 ( 1.6%) Kernel#is_a?: 159,892 ( 1.6%) Kernel#format: 120,902 ( 1.2%) Integer#/: 120,902 ( 1.2%) Array#<<: 112,225 ( 1.1%) Regexp.last_match: 92,382 ( 0.9%) Hash#[]=: 86,145 ( 0.9%) String#start_with?: 54,953 ( 0.5%) Array#shift: 54,038 ( 0.5%) CGI::EscapeExt#escapeHTML: 53,971 ( 0.5%) Regexp#===: 47,848 ( 0.5%) String#=~: 47,237 ( 0.5%) Array#unshift: 46,051 ( 0.5%) String#empty?: 41,750 ( 0.4%) Array#push: 40,115 ( 0.4%) Top-20 not annotated C methods (97.1% of total 10,116,938): Kernel#respond_to?: 4,932,224 (48.8%) Hash#key?: 2,329,928 (23.0%) 
Set#include?: 757,389 ( 7.5%) String#===: 317,494 ( 3.1%) Kernel#is_a?: 203,084 ( 2.0%) String#<<: 197,831 ( 2.0%) Integer#<<: 162,268 ( 1.6%) Kernel#format: 120,902 ( 1.2%) Integer#/: 120,902 ( 1.2%) Array#<<: 112,225 ( 1.1%) Regexp.last_match: 92,382 ( 0.9%) Hash#[]=: 86,145 ( 0.9%) String#start_with?: 54,953 ( 0.5%) Array#shift: 54,038 ( 0.5%) CGI::EscapeExt#escapeHTML: 53,971 ( 0.5%) Regexp#===: 47,848 ( 0.5%) String#=~: 47,237 ( 0.5%) Array#unshift: 46,051 ( 0.5%) String#empty?: 41,750 ( 0.4%) Array#push: 40,115 ( 0.4%) Top-2 not optimized method types for send (100.0% of total 182,938): iseq: 178,414 (97.5%) cfunc: 4,524 ( 2.5%) Top-4 not optimized method types for send_without_block (100.0% of total 2,492,246): iseq: 2,376,511 (95.4%) optimized: 115,702 ( 4.6%) alias: 20 ( 0.0%) null: 13 ( 0.0%) Top-9 not optimized instructions (100.0% of total 667,727): invokeblock: 221,375 (33.2%) opt_neq: 161,971 (24.3%) opt_and: 161,971 (24.3%) opt_eq: 64,921 ( 9.7%) invokesuper: 39,243 ( 5.9%) opt_le: 15,838 ( 2.4%) opt_minus: 1,534 ( 0.2%) opt_send_without_block: 772 ( 0.1%) opt_or: 102 ( 0.0%) Top-9 send fallback reasons (100.0% of total 6,287,956): send_without_block_polymorphic: 2,782,058 (44.2%) send_without_block_not_optimized_method_type: 2,492,246 (39.6%) not_optimized_instruction: 667,727 (10.6%) send_not_optimized_method_type: 182,938 ( 2.9%) send_without_block_no_profiles: 89,613 ( 1.4%) send_polymorphic: 66,962 ( 1.1%) send_no_profiles: 4,059 ( 0.1%) obj_to_string_not_string: 2,352 ( 0.0%) send_without_block_cfunc_array_variadic: 1 ( 0.0%) Top-3 unhandled YARV insns (100.0% of total 81,482): getclassvariable: 81,231 (99.7%) once: 137 ( 0.2%) getconstant: 114 ( 0.1%) Top-3 compile error reasons (100.0% of total 5,286,310): register_spill_on_alloc: 4,540,413 (85.9%) exception_handler: 745,727 (14.1%) register_spill_on_ccall: 170 ( 0.0%) Top-12 side exit reasons (100.0% of total 14,244,881): compile_error: 5,286,310 (37.1%) guard_shape_failure: 3,346,873 
(23.5%) guard_type_failure: 2,477,071 (17.4%) unhandled_splat: 2,104,447 (14.8%) unhandled_kwarg: 926,828 ( 6.5%) unhandled_yarv_insn: 81,482 ( 0.6%) unhandled_hir_insn: 18,672 ( 0.1%) patchpoint_stable_constant_names: 1,608 ( 0.0%) obj_to_string_fallback: 902 ( 0.0%) patchpoint_method_redefined: 599 ( 0.0%) block_param_proxy_not_iseq_or_ifunc: 88 ( 0.0%) interrupt: 1 ( 0.0%) send_count: 39,591,410 dynamic_send_count: 6,287,956 (15.9%) optimized_send_count: 33,303,454 (84.1%) iseq_optimized_send_count: 13,514,283 (34.1%) inline_cfunc_optimized_send_count: 6,823,745 (17.2%) non_variadic_cfunc_optimized_send_count: 7,417,432 (18.7%) variadic_cfunc_optimized_send_count: 5,547,994 (14.0%) dynamic_getivar_count: 1,110,647 dynamic_setivar_count: 927,309 compiled_iseq_count: 403 failed_iseq_count: 48 compile_time: 968ms profile_time: 3,547ms gc_time: 22ms invalidation_time: 0ms vm_write_pc_count: 36,735,108 vm_write_sp_count: 36,508,262 vm_write_locals_count: 36,508,262 vm_write_stack_count: 36,508,262 vm_write_to_parent_iseq_local_count: 543,097 vm_read_from_parent_iseq_local_count: 13,930,672 code_region_bytes: 2,228,224 side_exit_count: 14,244,881 total_insn_count: 463,357,969 vm_insn_count: 247,003,727 zjit_insn_count: 216,354,242 ratio_in_zjit: 46.7% ``` </details> ### `lobsters` Before <details> ``` Average of last 10, non-warmup iters: 898ms ***ZJIT: Printing ZJIT statistics on exit*** Top-20 not inlined C methods (61.3% of total 19,495,906): String#<<: 1,764,437 ( 9.1%) Kernel#is_a?: 1,615,120 ( 8.3%) Hash#[]=: 1,159,455 ( 5.9%) Regexp#match?: 777,496 ( 4.0%) String#empty?: 722,953 ( 3.7%) Hash#key?: 685,258 ( 3.5%) Kernel#respond_to?: 602,017 ( 3.1%) TrueClass#===: 447,671 ( 2.3%) FalseClass#===: 439,276 ( 2.3%) Array#include?: 426,758 ( 2.2%) Kernel#block_given?: 405,271 ( 2.1%) Hash#fetch: 382,302 ( 2.0%) ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%) String#start_with?: 353,793 ( 1.8%) Kernel#kind_of?: 340,341 ( 1.7%) Kernel#dup: 328,162 ( 1.7%) String.new: 
306,667 ( 1.6%) String#==: 287,549 ( 1.5%) BasicObject#!=: 284,642 ( 1.5%) String#length: 256,070 ( 1.3%) Top-20 not annotated C methods (62.4% of total 19,796,172): Kernel#is_a?: 1,993,676 (10.1%) String#<<: 1,764,437 ( 8.9%) Hash#[]=: 1,159,634 ( 5.9%) Regexp#match?: 777,496 ( 3.9%) String#empty?: 738,030 ( 3.7%) Hash#key?: 685,258 ( 3.5%) Kernel#respond_to?: 602,017 ( 3.0%) TrueClass#===: 447,671 ( 2.3%) FalseClass#===: 439,276 ( 2.2%) Array#include?: 426,758 ( 2.2%) Kernel#block_given?: 425,813 ( 2.2%) Hash#fetch: 382,302 ( 1.9%) ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%) String#start_with?: 353,793 ( 1.8%) Kernel#kind_of?: 340,375 ( 1.7%) Kernel#dup: 328,169 ( 1.7%) String.new: 306,667 ( 1.5%) String#==: 293,520 ( 1.5%) BasicObject#!=: 284,825 ( 1.4%) String#length: 256,070 ( 1.3%) Top-2 not optimized method types for send (100.0% of total 115,007): cfunc: 76,172 (66.2%) iseq: 38,835 (33.8%) Top-6 not optimized method types for send_without_block (100.0% of total 8,003,641): iseq: 3,999,211 (50.0%) bmethod: 1,750,271 (21.9%) optimized: 1,653,426 (20.7%) alias: 591,342 ( 7.4%) null: 8,174 ( 0.1%) cfunc: 1,217 ( 0.0%) Top-13 not optimized instructions (100.0% of total 7,590,826): invokesuper: 4,335,446 (57.1%) invokeblock: 1,329,215 (17.5%) sendforward: 841,463 (11.1%) opt_eq: 810,614 (10.7%) opt_plus: 141,773 ( 1.9%) opt_minus: 52,270 ( 0.7%) opt_send_without_block: 43,248 ( 0.6%) opt_neq: 15,047 ( 0.2%) opt_mult: 13,824 ( 0.2%) opt_or: 7,451 ( 0.1%) opt_lt: 348 ( 0.0%) opt_ge: 91 ( 0.0%) opt_gt: 36 ( 0.0%) Top-9 send fallback reasons (100.0% of total 45,673,212): send_without_block_polymorphic: 17,390,335 (38.1%) send_no_profiles: 10,769,053 (23.6%) send_without_block_not_optimized_method_type: 8,003,641 (17.5%) not_optimized_instruction: 7,590,826 (16.6%) send_without_block_no_profiles: 1,757,109 ( 3.8%) send_not_optimized_method_type: 115,007 ( 0.3%) send_without_block_cfunc_array_variadic: 31,149 ( 0.1%) obj_to_string_not_string: 15,518 ( 0.0%) 
send_without_block_direct_too_many_args: 574 ( 0.0%) Top-9 unhandled YARV insns (100.0% of total 1,242,228): expandarray: 622,203 (50.1%) checkkeyword: 316,111 (25.4%) getclassvariable: 120,540 ( 9.7%) getblockparam: 88,480 ( 7.1%) invokesuperforward: 78,842 ( 6.3%) opt_duparray_send: 14,149 ( 1.1%) getconstant: 1,588 ( 0.1%) checkmatch: 288 ( 0.0%) once: 27 ( 0.0%) Top-3 compile error reasons (100.0% of total 6,769,693): register_spill_on_alloc: 6,188,305 (91.4%) register_spill_on_ccall: 347,108 ( 5.1%) exception_handler: 234,280 ( 3.5%) Top-17 side exit reasons (100.0% of total 20,142,827): compile_error: 6,769,693 (33.6%) guard_type_failure: 5,169,050 (25.7%) guard_shape_failure: 3,726,362 (18.5%) unhandled_yarv_insn: 1,242,228 ( 6.2%) block_param_proxy_not_iseq_or_ifunc: 984,480 ( 4.9%) unhandled_kwarg: 800,154 ( 4.0%) unknown_newarray_send: 539,317 ( 2.7%) patchpoint_stable_constant_names: 340,283 ( 1.7%) unhandled_splat: 229,440 ( 1.1%) unhandled_hir_insn: 147,351 ( 0.7%) patchpoint_no_singleton_class: 128,856 ( 0.6%) patchpoint_method_redefined: 32,718 ( 0.2%) block_param_proxy_modified: 25,274 ( 0.1%) patchpoint_no_ep_escape: 7,559 ( 0.0%) obj_to_string_fallback: 24 ( 0.0%) guard_type_not_failure: 22 ( 0.0%) interrupt: 16 ( 0.0%) send_count: 120,815,640 dynamic_send_count: 45,673,212 (37.8%) optimized_send_count: 75,142,428 (62.2%) iseq_optimized_send_count: 32,188,039 (26.6%) inline_cfunc_optimized_send_count: 23,458,483 (19.4%) non_variadic_cfunc_optimized_send_count: 14,809,797 (12.3%) variadic_cfunc_optimized_send_count: 4,686,109 ( 3.9%) dynamic_getivar_count: 13,023,437 dynamic_setivar_count: 12,311,158 compiled_iseq_count: 4,806 failed_iseq_count: 466 compile_time: 8,943ms profile_time: 99ms gc_time: 45ms invalidation_time: 239ms vm_write_pc_count: 113,652,291 vm_write_sp_count: 111,209,623 vm_write_locals_count: 111,209,623 vm_write_stack_count: 111,209,623 vm_write_to_parent_iseq_local_count: 516,800 vm_read_from_parent_iseq_local_count: 11,225,587 
code_region_bytes: 22,609,920 side_exit_count: 20,142,827 total_insn_count: 926,088,942 vm_insn_count: 297,636,255 zjit_insn_count: 628,452,687 ratio_in_zjit: 67.9% ``` </details> ### `lobsters` After <details> ``` Average of last 10, non-warmup iters: 919ms ***ZJIT: Printing ZJIT statistics on exit*** Top-20 not inlined C methods (61.3% of total 19,495,868): String#<<: 1,764,437 ( 9.1%) Kernel#is_a?: 1,615,110 ( 8.3%) Hash#[]=: 1,159,455 ( 5.9%) Regexp#match?: 777,496 ( 4.0%) String#empty?: 722,953 ( 3.7%) Hash#key?: 685,258 ( 3.5%) Kernel#respond_to?: 602,016 ( 3.1%) TrueClass#===: 447,671 ( 2.3%) FalseClass#===: 439,276 ( 2.3%) Array#include?: 426,758 ( 2.2%) Kernel#block_given?: 405,271 ( 2.1%) Hash#fetch: 382,302 ( 2.0%) ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%) String#start_with?: 353,793 ( 1.8%) Kernel#kind_of?: 340,341 ( 1.7%) Kernel#dup: 328,162 ( 1.7%) String.new: 306,667 ( 1.6%) String#==: 287,545 ( 1.5%) BasicObject#!=: 284,642 ( 1.5%) String#length: 256,070 ( 1.3%) Top-20 not annotated C methods (62.4% of total 19,796,134): Kernel#is_a?: 1,993,666 (10.1%) String#<<: 1,764,437 ( 8.9%) Hash#[]=: 1,159,634 ( 5.9%) Regexp#match?: 777,496 ( 3.9%) String#empty?: 738,030 ( 3.7%) Hash#key?: 685,258 ( 3.5%) Kernel#respond_to?: 602,016 ( 3.0%) TrueClass#===: 447,671 ( 2.3%) FalseClass#===: 439,276 ( 2.2%) Array#include?: 426,758 ( 2.2%) Kernel#block_given?: 425,813 ( 2.2%) Hash#fetch: 382,302 ( 1.9%) ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%) String#start_with?: 353,793 ( 1.8%) Kernel#kind_of?: 340,375 ( 1.7%) Kernel#dup: 328,169 ( 1.7%) String.new: 306,667 ( 1.5%) String#==: 293,516 ( 1.5%) BasicObject#!=: 284,825 ( 1.4%) String#length: 256,070 ( 1.3%) Top-4 not optimized method types for send (100.0% of total 4,749,678): iseq: 2,563,391 (54.0%) cfunc: 2,064,888 (43.5%) alias: 118,577 ( 2.5%) null: 2,822 ( 0.1%) Top-6 not optimized method types for send_without_block (100.0% of total 8,003,641): iseq: 3,999,211 (50.0%) bmethod: 1,750,271 (21.9%) optimized: 
1,653,426 (20.7%) alias: 591,342 ( 7.4%) null: 8,174 ( 0.1%) cfunc: 1,217 ( 0.0%) Top-13 not optimized instructions (100.0% of total 7,590,818): invokesuper: 4,335,442 (57.1%) invokeblock: 1,329,215 (17.5%) sendforward: 841,463 (11.1%) opt_eq: 810,610 (10.7%) opt_plus: 141,773 ( 1.9%) opt_minus: 52,270 ( 0.7%) opt_send_without_block: 43,248 ( 0.6%) opt_neq: 15,047 ( 0.2%) opt_mult: 13,824 ( 0.2%) opt_or: 7,451 ( 0.1%) opt_lt: 348 ( 0.0%) opt_ge: 91 ( 0.0%) opt_gt: 36 ( 0.0%) Top-10 send fallback reasons (100.0% of total 43,152,037): send_without_block_polymorphic: 17,390,322 (40.3%) send_without_block_not_optimized_method_type: 8,003,641 (18.5%) not_optimized_instruction: 7,590,818 (17.6%) send_not_optimized_method_type: 4,749,678 (11.0%) send_no_profiles: 2,893,666 ( 6.7%) send_without_block_no_profiles: 1,757,109 ( 4.1%) send_polymorphic: 719,562 ( 1.7%) send_without_block_cfunc_array_variadic: 31,149 ( 0.1%) obj_to_string_not_string: 15,518 ( 0.0%) send_without_block_direct_too_many_args: 574 ( 0.0%) Top-9 unhandled YARV insns (100.0% of total 1,242,215): expandarray: 622,203 (50.1%) checkkeyword: 316,111 (25.4%) getclassvariable: 120,540 ( 9.7%) getblockparam: 88,467 ( 7.1%) invokesuperforward: 78,842 ( 6.3%) opt_duparray_send: 14,149 ( 1.1%) getconstant: 1,588 ( 0.1%) checkmatch: 288 ( 0.0%) once: 27 ( 0.0%) Top-3 compile error reasons (100.0% of total 6,769,688): register_spill_on_alloc: 6,188,305 (91.4%) register_spill_on_ccall: 347,108 ( 5.1%) exception_handler: 234,275 ( 3.5%) Top-17 side exit reasons (100.0% of total 20,144,372): compile_error: 6,769,688 (33.6%) guard_type_failure: 5,169,204 (25.7%) guard_shape_failure: 3,726,374 (18.5%) unhandled_yarv_insn: 1,242,215 ( 6.2%) block_param_proxy_not_iseq_or_ifunc: 984,480 ( 4.9%) unhandled_kwarg: 800,154 ( 4.0%) unknown_newarray_send: 539,317 ( 2.7%) patchpoint_stable_constant_names: 340,283 ( 1.7%) unhandled_splat: 229,440 ( 1.1%) unhandled_hir_insn: 147,351 ( 0.7%) patchpoint_no_singleton_class: 130,252 ( 
0.6%) patchpoint_method_redefined: 32,716 ( 0.2%) block_param_proxy_modified: 25,274 ( 0.1%) patchpoint_no_ep_escape: 7,559 ( 0.0%) obj_to_string_fallback: 24 ( 0.0%) guard_type_not_failure: 22 ( 0.0%) interrupt: 19 ( 0.0%) send_count: 120,812,030 dynamic_send_count: 43,152,037 (35.7%) optimized_send_count: 77,659,993 (64.3%) iseq_optimized_send_count: 32,187,900 (26.6%) inline_cfunc_optimized_send_count: 23,458,491 (19.4%) non_variadic_cfunc_optimized_send_count: 17,327,499 (14.3%) variadic_cfunc_optimized_send_count: 4,686,103 ( 3.9%) dynamic_getivar_count: 13,023,424 dynamic_setivar_count: 12,310,991 compiled_iseq_count: 4,806 failed_iseq_count: 466 compile_time: 9,012ms profile_time: 104ms gc_time: 44ms invalidation_time: 239ms vm_write_pc_count: 113,648,665 vm_write_sp_count: 111,205,997 vm_write_locals_count: 111,205,997 vm_write_stack_count: 111,205,997 vm_write_to_parent_iseq_local_count: 516,800 vm_read_from_parent_iseq_local_count: 11,225,587 code_region_bytes: 23,052,288 side_exit_count: 20,144,372 total_insn_count: 926,090,214 vm_insn_count: 297,647,811 zjit_insn_count: 628,442,403 ratio_in_zjit: 67.9% ``` </details>
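The "Change" column in the summary tables of this commit message is the absolute difference followed by the relative change against the "Before" value. A small helper (the name `change` is made up for this example) reproduces that formatting:

```python
def change(before, after):
    """Format the difference as 'delta (pct%)', relative to `before`."""
    delta = after - before
    pct = 100.0 * delta / before
    # '+,' forces an explicit sign and thousands separators.
    return f"{delta:+,} ({pct:+.1f}%)"


# Rows from the `liquid-render` table above.
print(change(3_209_418, 4_119))        # send_no_profiles
print(change(9_410_758, 6_459_678))    # dynamic_send_count
print(change(31_269_388, 34_220_474))  # optimized_send_count
```

Applied to the `lobsters` rows, the same helper gives the values listed in that table as well.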
2025-10-15 | ZJIT: Profile opt_succ and inline Integer#succ for Fixnum (#14846) | Max Bernstein
This is only really called a lot in the benchmark harness, as far as I can tell.
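For readers unfamiliar with what inlining `Integer#succ` for Fixnum involves, the following sketch models the fast path: check that the receiver is a Fixnum, add one, and fall back if the result no longer fits. It assumes a 64-bit CRuby build where Fixnums are 63-bit signed integers; `inlined_succ` and the `fallback` callback are hypothetical names, and real generated code would side-exit instead of calling a function.

```python
# Assumption: 64-bit CRuby, where Fixnum covers -(2**62) .. 2**62 - 1.
FIXNUM_MAX = 2**62 - 1
FIXNUM_MIN = -(2**62)


def is_fixnum(value):
    """True for plain ints in the assumed Fixnum range (bools excluded)."""
    return type(value) is int and FIXNUM_MIN <= value <= FIXNUM_MAX


def inlined_succ(value, fallback):
    """Fast path for value.succ; defer to `fallback` on guard failure."""
    if not is_fixnum(value):
        return fallback(value)  # type guard failed
    result = value + 1
    if result > FIXNUM_MAX:
        return fallback(value)  # overflow: result would not be a Fixnum
    return result
```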
2025-10-14 | ZJIT: Profile opt_size, opt_length, opt_regexpmatch2 (#14837) | Max Bernstein
These bring `send_without_block_no_profiles` numbers down more. On lobsters: Before: send_without_block_no_profiles: 1,293,375 After: send_without_block_no_profiles: 998,724 all stats before: ``` ***ZJIT: Printing ZJIT statistics on exit*** Top-20 not inlined C methods (71.1% of total 15,575,335): Hash#[]: 4,519,774 (29.0%) Kernel#is_a?: 1,030,758 ( 6.6%) String#<<: 851,929 ( 5.5%) Hash#[]=: 742,941 ( 4.8%) Regexp#match?: 399,889 ( 2.6%) String#empty?: 353,775 ( 2.3%) Hash#key?: 349,129 ( 2.2%) String#start_with?: 334,961 ( 2.2%) Kernel#respond_to?: 316,527 ( 2.0%) ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%) TrueClass#===: 235,771 ( 1.5%) FalseClass#===: 231,144 ( 1.5%) Array#include?: 211,381 ( 1.4%) Hash#fetch: 204,702 ( 1.3%) Kernel#block_given?: 181,792 ( 1.2%) ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%) Kernel#dup: 179,340 ( 1.2%) BasicObject#!=: 175,997 ( 1.1%) Class#new: 168,078 ( 1.1%) Kernel#kind_of?: 165,600 ( 1.1%) Top-20 not annotated C methods (71.6% of total 15,737,478): Hash#[]: 4,519,784 (28.7%) Kernel#is_a?: 1,212,649 ( 7.7%) String#<<: 851,929 ( 5.4%) Hash#[]=: 743,120 ( 4.7%) Regexp#match?: 399,889 ( 2.5%) String#empty?: 361,013 ( 2.3%) Hash#key?: 349,129 ( 2.2%) String#start_with?: 334,961 ( 2.1%) Kernel#respond_to?: 316,527 ( 2.0%) ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%) TrueClass#===: 235,771 ( 1.5%) FalseClass#===: 231,144 ( 1.5%) Array#include?: 211,381 ( 1.3%) Hash#fetch: 204,702 ( 1.3%) Kernel#block_given?: 191,661 ( 1.2%) ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%) Kernel#dup: 179,347 ( 1.1%) BasicObject#!=: 176,181 ( 1.1%) Class#new: 168,078 ( 1.1%) Kernel#kind_of?: 165,634 ( 1.1%) Top-2 not optimized method types for send (100.0% of total 72,318): cfunc: 48,055 (66.4%) iseq: 24,263 (33.6%) Top-6 not optimized method types for send_without_block (100.0% of total 4,523,648): iseq: 2,271,904 (50.2%) bmethod: 985,636 (21.8%) optimized: 949,702 (21.0%) alias: 310,746 ( 6.9%) null: 5,106 ( 0.1%) cfunc: 554 ( 0.0%) Top-13 
not optimized instructions (100.0% of total 4,293,096): invokesuper: 2,373,391 (55.3%) invokeblock: 811,872 (18.9%) sendforward: 505,448 (11.8%) opt_eq: 451,754 (10.5%) opt_plus: 74,403 ( 1.7%) opt_minus: 36,225 ( 0.8%) opt_send_without_block: 21,792 ( 0.5%) opt_neq: 7,231 ( 0.2%) opt_mult: 6,752 ( 0.2%) opt_or: 3,753 ( 0.1%) opt_lt: 348 ( 0.0%) opt_ge: 91 ( 0.0%) opt_gt: 36 ( 0.0%) Top-9 send fallback reasons (100.0% of total 25,824,463): send_without_block_polymorphic: 9,721,727 (37.6%) send_no_profiles: 5,894,760 (22.8%) send_without_block_not_optimized_method_type: 4,523,648 (17.5%) not_optimized_instruction: 4,293,096 (16.6%) send_without_block_no_profiles: 1,293,386 ( 5.0%) send_not_optimized_method_type: 72,318 ( 0.3%) send_without_block_cfunc_array_variadic: 15,134 ( 0.1%) obj_to_string_not_string: 9,765 ( 0.0%) send_without_block_direct_too_many_args: 629 ( 0.0%) Top-9 unhandled YARV insns (100.0% of total 690,482): expandarray: 328,490 (47.6%) checkkeyword: 190,694 (27.6%) getclassvariable: 59,901 ( 8.7%) invokesuperforward: 49,503 ( 7.2%) getblockparam: 48,651 ( 7.0%) opt_duparray_send: 11,978 ( 1.7%) getconstant: 952 ( 0.1%) checkmatch: 290 ( 0.0%) once: 23 ( 0.0%) Top-3 compile error reasons (100.0% of total 3,752,502): register_spill_on_alloc: 3,457,791 (92.1%) register_spill_on_ccall: 176,348 ( 4.7%) exception_handler: 118,363 ( 3.2%) Top-14 side exit reasons (100.0% of total 10,860,787): compile_error: 3,752,502 (34.6%) guard_type_failure: 2,638,903 (24.3%) guard_shape_failure: 1,917,195 (17.7%) unhandled_yarv_insn: 690,482 ( 6.4%) block_param_proxy_not_iseq_or_ifunc: 535,787 ( 4.9%) unhandled_kwarg: 421,943 ( 3.9%) patchpoint: 370,449 ( 3.4%) unknown_newarray_send: 314,785 ( 2.9%) unhandled_splat: 122,060 ( 1.1%) unhandled_hir_insn: 76,396 ( 0.7%) block_param_proxy_modified: 19,193 ( 0.2%) obj_to_string_fallback: 566 ( 0.0%) interrupt: 504 ( 0.0%) guard_type_not_failure: 22 ( 0.0%) send_count: 66,945,801 dynamic_send_count: 25,824,463 (38.6%) 
optimized_send_count: 41,121,338 (61.4%) iseq_optimized_send_count: 18,587,368 (27.8%) inline_cfunc_optimized_send_count: 6,958,635 (10.4%) non_variadic_cfunc_optimized_send_count: 12,911,155 (19.3%) variadic_cfunc_optimized_send_count: 2,664,180 ( 4.0%) dynamic_getivar_count: 7,365,975 dynamic_setivar_count: 7,245,897 compiled_iseq_count: 4,794 failed_iseq_count: 450 compile_time: 760ms profile_time: 9ms gc_time: 8ms invalidation_time: 55ms vm_write_pc_count: 64,284,053 vm_write_sp_count: 62,940,297 vm_write_locals_count: 62,940,297 vm_write_stack_count: 62,940,297 vm_write_to_parent_iseq_local_count: 292,446 vm_read_from_parent_iseq_local_count: 6,470,923 code_region_bytes: 23,019,520 side_exit_count: 10,860,787 total_insn_count: 517,576,320 vm_insn_count: 163,188,910 zjit_insn_count: 354,387,410 ratio_in_zjit: 68.5% ``` all stats after: ``` ***ZJIT: Printing ZJIT statistics on exit*** Top-20 not inlined C methods (70.4% of total 15,740,856): Hash#[]: 4,519,792 (28.7%) Kernel#is_a?: 1,030,776 ( 6.5%) String#<<: 851,940 ( 5.4%) Hash#[]=: 742,914 ( 4.7%) Regexp#match?: 399,887 ( 2.5%) String#empty?: 353,775 ( 2.2%) Hash#key?: 349,139 ( 2.2%) String#start_with?: 334,961 ( 2.1%) Kernel#respond_to?: 316,529 ( 2.0%) ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%) TrueClass#===: 235,771 ( 1.5%) FalseClass#===: 231,144 ( 1.5%) Array#include?: 211,381 ( 1.3%) Hash#fetch: 204,702 ( 1.3%) Kernel#block_given?: 181,788 ( 1.2%) ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%) Kernel#dup: 179,341 ( 1.1%) BasicObject#!=: 175,996 ( 1.1%) Class#new: 168,079 ( 1.1%) Kernel#kind_of?: 165,600 ( 1.1%) Top-20 not annotated C methods (70.9% of total 15,902,999): Hash#[]: 4,519,802 (28.4%) Kernel#is_a?: 1,212,667 ( 7.6%) String#<<: 851,940 ( 5.4%) Hash#[]=: 743,093 ( 4.7%) Regexp#match?: 399,887 ( 2.5%) String#empty?: 361,013 ( 2.3%) Hash#key?: 349,139 ( 2.2%) String#start_with?: 334,961 ( 2.1%) Kernel#respond_to?: 316,529 ( 2.0%) ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%) TrueClass#===: 
235,771 ( 1.5%) FalseClass#===: 231,144 ( 1.5%) Array#include?: 211,381 ( 1.3%) Hash#fetch: 204,702 ( 1.3%) Kernel#block_given?: 191,657 ( 1.2%) ActiveSupport::OrderedOptions#_get: 181,272 ( 1.1%) Kernel#dup: 179,348 ( 1.1%) BasicObject#!=: 176,180 ( 1.1%) Class#new: 168,079 ( 1.1%) Kernel#kind_of?: 165,634 ( 1.0%) Top-2 not optimized method types for send (100.0% of total 72,318): cfunc: 48,055 (66.4%) iseq: 24,263 (33.6%) Top-6 not optimized method types for send_without_block (100.0% of total 4,523,637): iseq: 2,271,900 (50.2%) bmethod: 985,636 (21.8%) optimized: 949,695 (21.0%) alias: 310,746 ( 6.9%) null: 5,106 ( 0.1%) cfunc: 554 ( 0.0%) Top-13 not optimized instructions (100.0% of total 4,293,128): invokesuper: 2,373,401 (55.3%) invokeblock: 811,890 (18.9%) sendforward: 505,449 (11.8%) opt_eq: 451,754 (10.5%) opt_plus: 74,403 ( 1.7%) opt_minus: 36,228 ( 0.8%) opt_send_without_block: 21,792 ( 0.5%) opt_neq: 7,231 ( 0.2%) opt_mult: 6,752 ( 0.2%) opt_or: 3,753 ( 0.1%) opt_lt: 348 ( 0.0%) opt_ge: 91 ( 0.0%) opt_gt: 36 ( 0.0%) Top-9 send fallback reasons (100.0% of total 25,530,605): send_without_block_polymorphic: 9,722,499 (38.1%) send_no_profiles: 5,894,763 (23.1%) send_without_block_not_optimized_method_type: 4,523,637 (17.7%) not_optimized_instruction: 4,293,128 (16.8%) send_without_block_no_profiles: 998,732 ( 3.9%) send_not_optimized_method_type: 72,318 ( 0.3%) send_without_block_cfunc_array_variadic: 15,134 ( 0.1%) obj_to_string_not_string: 9,765 ( 0.0%) send_without_block_direct_too_many_args: 629 ( 0.0%) Top-9 unhandled YARV insns (100.0% of total 690,482): expandarray: 328,490 (47.6%) checkkeyword: 190,694 (27.6%) getclassvariable: 59,901 ( 8.7%) invokesuperforward: 49,503 ( 7.2%) getblockparam: 48,651 ( 7.0%) opt_duparray_send: 11,978 ( 1.7%) getconstant: 952 ( 0.1%) checkmatch: 290 ( 0.0%) once: 23 ( 0.0%) Top-3 compile error reasons (100.0% of total 3,752,500): register_spill_on_alloc: 3,457,792 (92.1%) register_spill_on_ccall: 176,348 ( 4.7%) 
exception_handler: 118,360 ( 3.2%) Top-14 side exit reasons (100.0% of total 10,860,797): compile_error: 3,752,500 (34.6%) guard_type_failure: 2,638,909 (24.3%) guard_shape_failure: 1,917,203 (17.7%) unhandled_yarv_insn: 690,482 ( 6.4%) block_param_proxy_not_iseq_or_ifunc: 535,784 ( 4.9%) unhandled_kwarg: 421,947 ( 3.9%) patchpoint: 370,474 ( 3.4%) unknown_newarray_send: 314,786 ( 2.9%) unhandled_splat: 122,067 ( 1.1%) unhandled_hir_insn: 76,395 ( 0.7%) block_param_proxy_modified: 19,193 ( 0.2%) obj_to_string_fallback: 566 ( 0.0%) interrupt: 469 ( 0.0%) guard_type_not_failure: 22 ( 0.0%) send_count: 66,945,326 dynamic_send_count: 25,530,605 (38.1%) optimized_send_count: 41,414,721 (61.9%) iseq_optimized_send_count: 18,587,439 (27.8%) inline_cfunc_optimized_send_count: 7,086,426 (10.6%) non_variadic_cfunc_optimized_send_count: 13,076,682 (19.5%) variadic_cfunc_optimized_send_count: 2,664,174 ( 4.0%) dynamic_getivar_count: 7,365,985 dynamic_setivar_count: 7,245,954 compiled_iseq_count: 4,794 failed_iseq_count: 450 compile_time: 748ms profile_time: 9ms gc_time: 8ms invalidation_time: 58ms vm_write_pc_count: 64,155,801 vm_write_sp_count: 62,812,041 vm_write_locals_count: 62,812,041 vm_write_stack_count: 62,812,041 vm_write_to_parent_iseq_local_count: 292,448 vm_read_from_parent_iseq_local_count: 6,470,939 code_region_bytes: 23,052,288 side_exit_count: 10,860,797 total_insn_count: 517,576,915 vm_insn_count: 163,192,099 zjit_insn_count: 354,384,816 ratio_in_zjit: 68.5% ```
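Comparing two stat dumps like the ones above by eye is tedious. As a rough sketch, the `name: count (pct%)` entries can be pulled out with a regular expression and diffed. The helper names are invented for this example, and the pattern is only an assumption about the dump format as it appears in this log. Since the same method name can occur in several sections, it is safest to feed it one section at a time.

```python
import re

# Matches entries such as "send_no_profiles: 5,894,760 (22.8%)".
ENTRY = re.compile(r"(\S+): ([\d,]+) \(\s*([\d.]+)%\)")


def parse_counts(text):
    """Map each entry name to its integer count."""
    return {name: int(count.replace(",", ""))
            for name, count, _pct in ENTRY.findall(text)}


def diff_counts(before_text, after_text):
    """Return after - before for every name present in both dumps."""
    before, after = parse_counts(before_text), parse_counts(after_text)
    return {name: after[name] - before[name]
            for name in before if name in after}


before = ("Top-9 send fallback reasons (100.0% of total 25,824,463): "
          "send_no_profiles: 5,894,760 (22.8%) "
          "send_without_block_no_profiles: 1,293,386 ( 5.0%)")
after = ("Top-9 send fallback reasons (100.0% of total 25,530,605): "
         "send_no_profiles: 5,894,763 (23.1%) "
         "send_without_block_no_profiles: 998,732 ( 3.9%)")
print(diff_counts(before, after))
```

Section headers such as `Top-9 send fallback reasons (...)` are skipped because they are not followed by a count and a percentage in the expected positions.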
2025-10-14 | ZJIT: Profile opt_ltlt and opt_aset (#14834) | Max Bernstein
These bring `send_without_block_no_profiles` numbers down dramatically. On lobsters: Before: send_without_block_no_profiles: 3,466,375 After: send_without_block_no_profiles: 1,293,375 all stats before: ``` ***ZJIT: Printing ZJIT statistics on exit*** Top-20 not inlined C methods (70.4% of total 14,174,061): Hash#[]: 4,519,771 (31.9%) Kernel#is_a?: 1,030,757 ( 7.3%) Regexp#match?: 399,885 ( 2.8%) String#empty?: 353,775 ( 2.5%) Hash#key?: 349,125 ( 2.5%) Hash#[]=: 344,348 ( 2.4%) String#start_with?: 334,961 ( 2.4%) Kernel#respond_to?: 316,527 ( 2.2%) ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.7%) TrueClass#===: 235,770 ( 1.7%) FalseClass#===: 231,143 ( 1.6%) Array#include?: 211,383 ( 1.5%) Hash#fetch: 204,702 ( 1.4%) Kernel#block_given?: 181,793 ( 1.3%) ActiveSupport::OrderedOptions#_get: 181,272 ( 1.3%) Kernel#dup: 179,341 ( 1.3%) BasicObject#!=: 175,996 ( 1.2%) Class#new: 168,079 ( 1.2%) Kernel#kind_of?: 165,600 ( 1.2%) String#==: 157,734 ( 1.1%) Top-20 not annotated C methods (71.1% of total 14,336,035): Hash#[]: 4,519,781 (31.5%) Kernel#is_a?: 1,212,647 ( 8.5%) Regexp#match?: 399,885 ( 2.8%) String#empty?: 361,013 ( 2.5%) Hash#key?: 349,125 ( 2.4%) Hash#[]=: 344,348 ( 2.4%) String#start_with?: 334,961 ( 2.3%) Kernel#respond_to?: 316,527 ( 2.2%) ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.7%) TrueClass#===: 235,770 ( 1.6%) FalseClass#===: 231,143 ( 1.6%) Array#include?: 211,383 ( 1.5%) Hash#fetch: 204,702 ( 1.4%) Kernel#block_given?: 191,662 ( 1.3%) ActiveSupport::OrderedOptions#_get: 181,272 ( 1.3%) Kernel#dup: 179,348 ( 1.3%) BasicObject#!=: 176,180 ( 1.2%) Class#new: 168,079 ( 1.2%) Kernel#kind_of?: 165,634 ( 1.2%) String#==: 163,666 ( 1.1%) Top-2 not optimized method types for send (100.0% of total 72,318): cfunc: 48,055 (66.4%) iseq: 24,263 (33.6%) Top-6 not optimized method types for send_without_block (100.0% of total 4,536,895): iseq: 2,281,897 (50.3%) bmethod: 985,679 (21.7%) optimized: 952,914 (21.0%) alias: 310,745 ( 6.8%) null: 5,106 ( 0.1%) cfunc: 554 ( 
0.0%) Top-13 not optimized instructions (100.0% of total 4,293,123): invokesuper: 2,373,396 (55.3%) invokeblock: 811,891 (18.9%) sendforward: 505,449 (11.8%) opt_eq: 451,754 (10.5%) opt_plus: 74,403 ( 1.7%) opt_minus: 36,227 ( 0.8%) opt_send_without_block: 21,792 ( 0.5%) opt_neq: 7,231 ( 0.2%) opt_mult: 6,752 ( 0.2%) opt_or: 3,753 ( 0.1%) opt_lt: 348 ( 0.0%) opt_ge: 91 ( 0.0%) opt_gt: 36 ( 0.0%) Top-9 send fallback reasons (100.0% of total 27,795,022): send_without_block_polymorphic: 9,505,835 (34.2%) send_no_profiles: 5,894,763 (21.2%) send_without_block_not_optimized_method_type: 4,536,895 (16.3%) not_optimized_instruction: 4,293,123 (15.4%) send_without_block_no_profiles: 3,466,407 (12.5%) send_not_optimized_method_type: 72,318 ( 0.3%) send_without_block_cfunc_array_variadic: 15,134 ( 0.1%) obj_to_string_not_string: 9,918 ( 0.0%) send_without_block_direct_too_many_args: 629 ( 0.0%) Top-9 unhandled YARV insns (100.0% of total 690,482): expandarray: 328,490 (47.6%) checkkeyword: 190,694 (27.6%) getclassvariable: 59,901 ( 8.7%) invokesuperforward: 49,503 ( 7.2%) getblockparam: 48,651 ( 7.0%) opt_duparray_send: 11,978 ( 1.7%) getconstant: 952 ( 0.1%) checkmatch: 290 ( 0.0%) once: 23 ( 0.0%) Top-3 compile error reasons (100.0% of total 3,752,391): register_spill_on_alloc: 3,457,680 (92.1%) register_spill_on_ccall: 176,348 ( 4.7%) exception_handler: 118,363 ( 3.2%) Top-14 side exit reasons (100.0% of total 10,852,021): compile_error: 3,752,391 (34.6%) guard_type_failure: 2,630,877 (24.2%) guard_shape_failure: 1,917,208 (17.7%) unhandled_yarv_insn: 690,482 ( 6.4%) block_param_proxy_not_iseq_or_ifunc: 535,784 ( 4.9%) unhandled_kwarg: 421,989 ( 3.9%) patchpoint: 369,799 ( 3.4%) unknown_newarray_send: 314,786 ( 2.9%) unhandled_splat: 122,062 ( 1.1%) unhandled_hir_insn: 76,394 ( 0.7%) block_param_proxy_modified: 19,193 ( 0.2%) obj_to_string_fallback: 566 ( 0.0%) interrupt: 468 ( 0.0%) guard_type_not_failure: 22 ( 0.0%) send_count: 66,989,407 dynamic_send_count: 27,795,022 
(41.5%) optimized_send_count: 39,194,385 (58.5%) iseq_optimized_send_count: 18,060,194 (27.0%) inline_cfunc_optimized_send_count: 6,960,130 (10.4%) non_variadic_cfunc_optimized_send_count: 11,523,682 (17.2%) variadic_cfunc_optimized_send_count: 2,650,379 ( 4.0%) dynamic_getivar_count: 7,365,982 dynamic_setivar_count: 7,245,929 compiled_iseq_count: 4,795 failed_iseq_count: 449 compile_time: 846ms profile_time: 12ms gc_time: 9ms invalidation_time: 61ms vm_write_pc_count: 64,326,442 vm_write_sp_count: 62,982,524 vm_write_locals_count: 62,982,524 vm_write_stack_count: 62,982,524 vm_write_to_parent_iseq_local_count: 292,448 vm_read_from_parent_iseq_local_count: 6,471,353 code_region_bytes: 22,708,224 side_exit_count: 10,852,021 total_insn_count: 517,550,288 vm_insn_count: 162,946,459 zjit_insn_count: 354,603,829 ratio_in_zjit: 68.5% ``` all stats after: ``` ***ZJIT: Printing ZJIT statistics on exit*** Top-20 not inlined C methods (71.1% of total 15,575,343): Hash#[]: 4,519,778 (29.0%) Kernel#is_a?: 1,030,758 ( 6.6%) String#<<: 851,931 ( 5.5%) Hash#[]=: 742,938 ( 4.8%) Regexp#match?: 399,886 ( 2.6%) String#empty?: 353,775 ( 2.3%) Hash#key?: 349,127 ( 2.2%) String#start_with?: 334,961 ( 2.2%) Kernel#respond_to?: 316,529 ( 2.0%) ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%) TrueClass#===: 235,771 ( 1.5%) FalseClass#===: 231,144 ( 1.5%) Array#include?: 211,380 ( 1.4%) Hash#fetch: 204,701 ( 1.3%) Kernel#block_given?: 181,792 ( 1.2%) ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%) Kernel#dup: 179,341 ( 1.2%) BasicObject#!=: 175,997 ( 1.1%) Class#new: 168,079 ( 1.1%) Kernel#kind_of?: 165,600 ( 1.1%) Top-20 not annotated C methods (71.6% of total 15,737,486): Hash#[]: 4,519,788 (28.7%) Kernel#is_a?: 1,212,649 ( 7.7%) String#<<: 851,931 ( 5.4%) Hash#[]=: 743,117 ( 4.7%) Regexp#match?: 399,886 ( 2.5%) String#empty?: 361,013 ( 2.3%) Hash#key?: 349,127 ( 2.2%) String#start_with?: 334,961 ( 2.1%) Kernel#respond_to?: 316,529 ( 2.0%) ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%) 
TrueClass#===: 235,771 ( 1.5%) FalseClass#===: 231,144 ( 1.5%) Array#include?: 211,380 ( 1.3%) Hash#fetch: 204,701 ( 1.3%) Kernel#block_given?: 191,661 ( 1.2%) ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%) Kernel#dup: 179,348 ( 1.1%) BasicObject#!=: 176,181 ( 1.1%) Class#new: 168,079 ( 1.1%) Kernel#kind_of?: 165,634 ( 1.1%) Top-2 not optimized method types for send (100.0% of total 72,318): cfunc: 48,055 (66.4%) iseq: 24,263 (33.6%) Top-6 not optimized method types for send_without_block (100.0% of total 4,523,650): iseq: 2,271,911 (50.2%) bmethod: 985,636 (21.8%) optimized: 949,696 (21.0%) alias: 310,747 ( 6.9%) null: 5,106 ( 0.1%) cfunc: 554 ( 0.0%) Top-13 not optimized instructions (100.0% of total 4,293,126): invokesuper: 2,373,395 (55.3%) invokeblock: 811,894 (18.9%) sendforward: 505,449 (11.8%) opt_eq: 451,754 (10.5%) opt_plus: 74,403 ( 1.7%) opt_minus: 36,228 ( 0.8%) opt_send_without_block: 21,792 ( 0.5%) opt_neq: 7,231 ( 0.2%) opt_mult: 6,752 ( 0.2%) opt_or: 3,753 ( 0.1%) opt_lt: 348 ( 0.0%) opt_ge: 91 ( 0.0%) opt_gt: 36 ( 0.0%) Top-9 send fallback reasons (100.0% of total 25,824,512): send_without_block_polymorphic: 9,721,725 (37.6%) send_no_profiles: 5,894,761 (22.8%) send_without_block_not_optimized_method_type: 4,523,650 (17.5%) not_optimized_instruction: 4,293,126 (16.6%) send_without_block_no_profiles: 1,293,404 ( 5.0%) send_not_optimized_method_type: 72,318 ( 0.3%) send_without_block_cfunc_array_variadic: 15,134 ( 0.1%) obj_to_string_not_string: 9,765 ( 0.0%) send_without_block_direct_too_many_args: 629 ( 0.0%) Top-9 unhandled YARV insns (100.0% of total 690,482): expandarray: 328,490 (47.6%) checkkeyword: 190,694 (27.6%) getclassvariable: 59,901 ( 8.7%) invokesuperforward: 49,503 ( 7.2%) getblockparam: 48,651 ( 7.0%) opt_duparray_send: 11,978 ( 1.7%) getconstant: 952 ( 0.1%) checkmatch: 290 ( 0.0%) once: 23 ( 0.0%) Top-3 compile error reasons (100.0% of total 3,752,504): register_spill_on_alloc: 3,457,793 (92.1%) register_spill_on_ccall: 
176,348 ( 4.7%) exception_handler: 118,363 ( 3.2%) Top-14 side exit reasons (100.0% of total 10,860,754): compile_error: 3,752,504 (34.6%) guard_type_failure: 2,638,901 (24.3%) guard_shape_failure: 1,917,198 (17.7%) unhandled_yarv_insn: 690,482 ( 6.4%) block_param_proxy_not_iseq_or_ifunc: 535,785 ( 4.9%) unhandled_kwarg: 421,947 ( 3.9%) patchpoint: 370,447 ( 3.4%) unknown_newarray_send: 314,786 ( 2.9%) unhandled_splat: 122,065 ( 1.1%) unhandled_hir_insn: 76,395 ( 0.7%) block_param_proxy_modified: 19,193 ( 0.2%) obj_to_string_fallback: 566 ( 0.0%) interrupt: 463 ( 0.0%) guard_type_not_failure: 22 ( 0.0%) send_count: 66,945,926 dynamic_send_count: 25,824,512 (38.6%) optimized_send_count: 41,121,414 (61.4%) iseq_optimized_send_count: 18,587,430 (27.8%) inline_cfunc_optimized_send_count: 6,958,641 (10.4%) non_variadic_cfunc_optimized_send_count: 12,911,166 (19.3%) variadic_cfunc_optimized_send_count: 2,664,177 ( 4.0%) dynamic_getivar_count: 7,365,985 dynamic_setivar_count: 7,245,942 compiled_iseq_count: 4,794 failed_iseq_count: 450 compile_time: 852ms profile_time: 13ms gc_time: 11ms invalidation_time: 63ms vm_write_pc_count: 64,284,194 vm_write_sp_count: 62,940,427 vm_write_locals_count: 62,940,427 vm_write_stack_count: 62,940,427 vm_write_to_parent_iseq_local_count: 292,447 vm_read_from_parent_iseq_local_count: 6,470,931 code_region_bytes: 23,019,520 side_exit_count: 10,860,754 total_insn_count: 517,576,267 vm_insn_count: 163,188,187 zjit_insn_count: 354,388,080 ratio_in_zjit: 68.5% ```
2025-10-09ZJIT: Profile opt_aref (#14778)Aiden Fox Ivey
* ZJIT: Profile opt_aref * ZJIT: Add test for opt_aref * ZJIT: Move test and add hash opt test * ZJIT: Update zjit bindgen * ZJIT: Add inspect calls to opt_aref tests
2025-09-29Update current namespace management by using control frames and lexical contextsSatoshi Tagomori
to fix inconsistent and wrong current namespace detections. This includes: * Moving load_path and related things from rb_vm_t to rb_namespace_t to simplify accessing those values via namespace (instead of accessing either vm or ns) * Initializing root_namespace earlier and consolidate builtin_namespace into root_namespace * Adding VM_FRAME_FLAG_NS_REQUIRE for checkpoints to detect a namespace to load/require files * Removing implicit refinements in the root namespace which was used to determine the namespace to be loaded (replaced by VM_FRAME_FLAG_NS_REQUIRE) * Removing namespaces from rb_proc_t because its namespace can be identified by lexical context * Starting to use ep[VM_ENV_DATA_INDEX_SPECVAL] to store the current namespace when the frame type is MAGIC_TOP or MAGIC_CLASS (block handlers don't exist in this case)
2025-09-09ZJIT: Optimize `ObjToString` with type guards (#14469)André Luiz Tiago Soares
* failing test for ObjToString optimization with GuardType * profile ObjToString receiver and rewrite with guard * adjust integration tests for objtostring type guard optimization * Implement new GuardTypeNot HIR; objtostring sends to_s directly on profiled nonstrings * codegen for GuardTypeNot * typo fixes * better name for tests; fix side exit reason for GuardTypeNot * revert accidental change * make bindgen * Fix is_string to identify subclasses of String; fix codegen for identifying if val is String
2025-09-05insns.def: Drop unused leafness_of_check_intsTakashi Kokubun
It was used to let MJIT override the leafness of the instruction when it decides to remove check_ints for it. Now that MJIT is gone, nobody needs to "override" the leafness using this.
2025-08-29ZJIT: Specialize monomorphic GetIvar (#14388)Max Bernstein
Specialize monomorphic `GetIvar` into: * `GuardType(HeapObject)` * `GuardShape` * `LoadIvarEmbedded` or `LoadIvarExtended` This requires profiling self for `getinstancevariable` (it's not on the operand stack). This also optimizes `GetIvar`s that happen as a result of inlining `attr_reader` and `attr_accessor`. Also move some (newly) shared JIT helpers into jit.c.
2025-08-27ZJIT: Specialize some Sends (#14363)Max Bernstein
* ZJIT: Profile and specialize Array#empty? * ZJIT: Specialize BasicObject#== * ZJIT: Specialize Hash#empty? * ZJIT: Specialize BasicObject#! Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2025-08-26Remove `opt_aref_with` and `opt_aset_with`Aaron Patterson
When these instructions were introduced it was common to read from a hash with mutable string literals. However, these days, I think these instructions are fairly rare. I tested this with the lobsters benchmark, and saw no difference in speed. In order to be sure, I tracked down every use of this instruction in the lobsters benchmark, and there were only 4 places where it was used. Additionally, this patch fixes a case where "chilled strings" should emit a warning but they don't. ```ruby class Foo def self.[](x)= x.gsub!(/hello/, "hi") end Foo["hello world"] ``` Removing these instructions shows this warning: ``` > ./miniruby -vw test.rb ruby 3.5.0dev (2025-08-25T21:36:50Z rm-opt_aref_with dca08e286c) +PRISM [arm64-darwin24] test.rb:2: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information) ``` [Feature #21553]
2025-07-09ZJIT: Profile `opt_and` and `opt_or` instructionsStan Lo
2025-07-08ZJIT: Profile `nil?` callsStan Lo
This allows ZJIT to profile `nil?` calls and create type guards for its receiver. - Add `zjit_profile` to `opt_nil_p` insn - Start profiling `opt_nil_p` calls - Use `runtime_exact_ruby_class` instead of `exact_ruby_class` to determine the profiled receiver class
2025-05-15Maintain same behavior regardless of tracepoint stateAaron Patterson
Always use opt_new behavior regardless of tracepoint state. Notes: Merged: https://github.com/ruby/ruby/pull/13232
2025-04-29Don't support blockarg in opt_newMax Bernstein
We don't calculate the correct argc so the bookkeeping slot is something else (unexpected) instead of Qnil (expected). Notes: Merged: https://github.com/ruby/ruby/pull/13198
2025-04-25Deopt if iseq trace events are enabledAaron Patterson
2025-04-25Inline Class#new.Aaron Patterson
This commit inlines instructions for Class#new. To make this work, we added a new YARV instruction, `opt_new`. `opt_new` checks whether or not the `new` method is the default allocator method. If it is, it allocates the object, and pushes the instance on the stack. If not, the instruction jumps to the "slow path" method call instructions. Old instructions: ``` > ruby --dump=insns -e'Object.new' == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)> 0000 opt_getconstant_path <ic:0 Object> ( 1)[Li] 0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE> 0004 leave ``` New instructions: ``` > ./miniruby --dump=insns -e'Object.new' == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)> 0000 opt_getconstant_path <ic:0 Object> ( 1)[Li] 0002 putnil 0003 swap 0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11 0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE> 0009 jump 14 0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE> 0013 swap 0014 pop 0015 leave ``` This commit speeds up basic object allocation (`Foo.new`) by 60%, but classes that take keyword parameters see an even bigger benefit because no hash is allocated when instantiating the object (3x to 6x faster).
Here is an example that uses `Hash.new(capacity: 0)`: ``` > hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' Time (mean ± σ): 1.082 s ± 0.004 s [User: 1.074 s, System: 0.008 s] Range (min … max): 1.076 s … 1.088 s 10 runs Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' Time (mean ± σ): 627.9 ms ± 3.5 ms [User: 622.7 ms, System: 4.8 ms] Range (min … max): 622.7 ms … 633.2 ms 10 runs Summary ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran 1.72 ± 0.01 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ``` This commit changes the backtrace for `initialize`: ``` aaron@tc ~/g/ruby (inline-new)> cat test.rb class Foo def initialize puts caller end end def hello Foo.new end hello aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24] test.rb:8:in 'Class#new' test.rb:8:in 'Object#hello' test.rb:11:in '<main>' aaron@tc ~/g/ruby (inline-new)> ./miniruby -v test.rb ruby 3.5.0dev (2025-03-28T23:59:40Z inline-new c4157884e4) +PRISM [arm64-darwin24] test.rb:8:in 'Object#hello' test.rb:11:in '<main>' ``` It also increases memory usage for calls to `new` by 122 bytes: ``` aaron@tc ~/g/ruby (inline-new)> cat test.rb require "objspace" class Foo def initialize puts caller end end def hello Foo.new end puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:hello))) aaron@tc ~/g/ruby (inline-new)> make runruby RUBY_ON_BUG='gdb -x ./.gdbinit -p' ./miniruby -I./lib -I. 
-I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems ./test.rb 656 aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24] 544 ``` Thanks to @ko1 for coming up with this idea! Co-Authored-By: John Hawthorn <john@hawthorn.email>
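One way to check whether a given interpreter includes this change is to look for `opt_new` in the disassembly of a `new` call. This is a minimal sketch, assuming CRuby (`RubyVM::InstructionSequence` is implementation-specific); the helper name `emits_opt_new?` is made up for illustration and is not part of the commit.

```ruby
# Hypothetical helper: reports whether compiling `source` on the running
# interpreter produces an `opt_new` instruction.
def emits_opt_new?(source = "Object.new")
  RubyVM::InstructionSequence.compile(source).disasm.include?("opt_new")
end

# The answer depends on the Ruby build, so no particular output is assumed.
puts "opt_new emitted: #{emits_opt_new?}"
```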
2025-04-18Add profiling for opt_send_without_blockAlan Wu
Split out from the CCall changes since we discussed during pairing that this is useful to unblock some other changes. No tests since no one consumes this profiling data yet. Notes: Merged: https://github.com/ruby/ruby/pull/13131
2025-04-18Profile instructions for fixnum arithmetic (https://github.com/Shopify/zjit/pull/24)Takashi Kokubun
* Profile instructions for fixnum arithmetic * Drop PartialEq from Type * Do not push PatchPoint onto the stack * Avoid pushing the output of the guards * Pop operands after guards * Test HIR from profiled runs * Implement Display for new instructions * Drop unused FIXNUM_BITS * Use a Rust function to split lines * Use Display for GuardType operands * Fix tests with Display-ed values Co-authored-by: Max Bernstein <max@bernsteinbear.com> Notes: Merged: https://github.com/ruby/ruby/pull/13131
2025-04-18Add zjit_* instructions to profile the interpreter (https://github.com/Shopify/zjit/pull/16)Takashi Kokubun
* Add zjit_* instructions to profile the interpreter * Rename FixnumPlus to FixnumAdd * Update a comment about Invalidate * Rename Guard to GuardType * Rename Invalidate to PatchPoint * Drop unneeded debug!() * Plan on profiling the types * Use the output of GuardType as type refined outputs Notes: Merged: https://github.com/ruby/ruby/pull/13131
2025-03-18Adjust style [ci skip]Nobuyoshi Nakada
2025-03-13Use the EC parameter in instructions.Aaron Patterson
The forwarding instructions should use the `ec` parameter passed to vm_exec_core instead of trying to look up the EC via `GET_EC()`. It's cheaper to read the local than to look up a global. Notes: Merged: https://github.com/ruby/ruby/pull/12931
2024-11-26Optimize instructions when creating an array just to call `include?` (#12123)Randy Stauner
* Add opt_duparray_send insn to skip the allocation on `#include?` If the method isn't going to modify the array we don't need to copy it. This avoids the allocation / array copy for things like `[:a, :b].include?(x)`. This adds a BOP for include? and tracks redefinition for it on Array. Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> * YJIT: Implement opt_duparray_send include_p Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> * Update opt_newarray_send to support simple forms of include?(arg) Similar to opt_duparray_send but for non-static arrays. * YJIT: Implement opt_newarray_send include_p --------- Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
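The effect on allocations can be observed from Ruby itself. This is a minimal sketch; the method names `allowed?` and `allocations` are invented for illustration, and the count printed depends on the Ruby version (and on whether `Array#include?` has been redefined), so no particular number is assumed.

```ruby
# A literal array built only to call include? on it.
def allowed?(x)
  [:a, :b].include?(x)
end

# Counts objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

allowed?(:a) # warm up inline caches before measuring
count = allocations { allowed?(:a) }
puts "objects allocated by allowed?(:a): #{count}"
```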
2024-09-05Optimized instruction for Hash#freezeÉtienne Barrié
If a Hash which is empty or only using literals is frozen, we detect this as a peephole optimization and change the instructions to be `opt_hash_freeze`. [Feature #20684] Co-authored-by: Jean Boussier <byroot@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/11406
2024-09-05Optimized instruction for Array#freezeÉtienne Barrié
If an Array which is empty or only using literals is frozen, we detect this as a peephole optimization and change the instructions to be `opt_ary_freeze`. [Feature #20684] Co-authored-by: Jean Boussier <byroot@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/11406
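The two freeze optimizations above apply to code of the following shape. This is a small sketch with made-up method names; whether repeated calls return the very same object depends on the Ruby version, so the identity check is printed rather than assumed.

```ruby
# Literals that are frozen immediately; candidates for the peephole
# optimizations described above.
def default_options
  {}.freeze
end

def default_list
  [1, 2, 3].freeze
end

puts default_options.frozen?                 # frozen in every Ruby version
puts default_list.frozen?
# Version dependent: true only if the interpreter reuses one frozen object.
puts default_list.equal?(default_list)
```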
2024-08-13Delete newarraykwsplatAlan Wu
The pushtoarraykwsplat instruction was designed to replace newarraykwsplat, and we now meet the condition for deletion mentioned in 77c1233f79a0f96a081b70da533fbbde4f3037fa. Notes: Merged: https://github.com/ruby/ruby/pull/11371 Merged-By: XrXr
2024-07-29Expand opt_newarray_send to support Array#pack with buffer keyword argRandy Stauner
Use an enum for the method arg instead of needing to add an id that doesn't map to an actual method name. $ ruby --dump=insns -e 'b = "x"; [v].pack("E*", buffer: b)' before: ``` == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] b@0 0000 putchilledstring "x" ( 1)[Li] 0002 setlocal_WC_0 b@0 0004 putself 0005 opt_send_without_block <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE> 0007 newarray 1 0009 putchilledstring "E*" 0011 getlocal_WC_0 b@0 0013 opt_send_without_block <calldata!mid:pack, argc:2, kw:[#<Symbol:0x000000000023110c>], KWARG> 0015 leave ``` after: ``` == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] b@0 0000 putchilledstring "x" ( 1)[Li] 0002 setlocal_WC_0 b@0 0004 putself 0005 opt_send_without_block <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE> 0007 putchilledstring "E*" 0009 getlocal b@0, 0 0012 opt_newarray_send 3, 5 0015 leave ``` Notes: Merged: https://github.com/ruby/ruby/pull/11249
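For readers unfamiliar with the `buffer:` keyword that this instruction now covers, here is a short sketch of its Ruby-level behavior. The variable names are illustrative only; the example uses the `"C*"` directive rather than `"E*"` to keep the bytes readable.

```ruby
# Array#pack with buffer: appends the packed bytes to an existing string
# instead of allocating a fresh result string.
buffer = +"x"                      # unary plus gives a mutable string
result = [65, 66].pack("C*", buffer: buffer)

puts result                        # the packed bytes follow the original "x"
puts buffer.bytesize
```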
2024-06-18Refactor so we don't have _cdAaron Patterson
This should make the diff cleaner.
2024-06-18Add two new instructions for forwarding callsAaron Patterson
This commit adds `sendforward` and `invokesuperforward` for forwarding parameters to calls Co-authored-by: Matt Valentine-House <matt@eightbitraptor.com>
2024-06-18Optimized forwarding callers and calleesAaron Patterson
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls. Calls it optimizes look like this: ```ruby def bar(a) = a def foo(...) = bar(...) # optimized foo(123) ``` ```ruby def bar(a) = a def foo(...) = bar(1, 2, ...) # optimized foo(123) ``` ```ruby def bar(*a) = a def foo(...) list = [1, 2] bar(*list, ...) # optimized end foo(123) ``` All variants of the above but using `super` are also optimized, including a bare super like this: ```ruby def foo(...) super end ``` This patch eliminates intermediate allocations made when calling methods that accept `...`. We can observe allocation elimination like this: ```ruby def m x = GC.stat(:total_allocated_objects) yield GC.stat(:total_allocated_objects) - x end def bar(a) = a def foo(...) = bar(...) def test m { foo(123) } end test p test # allocates 1 object on master, but 0 objects with this patch ``` ```ruby def bar(a, b:) = a + b def foo(...) = bar(...) def test m { foo(1, b: 2) } end test p test # allocates 2 objects on master, but 0 objects with this patch ``` How does it work? ----------------- This patch works by using a dynamic stack size when passing forwarded parameters to callees. The caller's info object (known as the "CI") contains the stack size of the parameters, so we pass the CI object itself as a parameter to the callee. When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee. The CI at the forwarded call site is adjusted using information from the caller's CI. I think this description is kind of confusing, so let's walk through an example with code. ```ruby def delegatee(a, b) = a + b def delegator(...) delegatee(...) 
# CI2 (FORWARDING) end def caller delegator(1, 2) # CI1 (argc: 2) end ``` Before we call the delegator method, the stack looks like this: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 4| # | 5| delegatee(...) # CI2 (FORWARDING) | 6| end | 7| | 8| def caller | -> 9| delegator(1, 2) # CI1 (argc: 2) | 10| end | ``` The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in to `delegator`, it writes `CI1` on to the stack as a local variable for the `delegator` method. The `delegator` method has a special local called `...` that holds the caller's CI object. Here is the ISeq disasm for `delegator`: ``` == disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] "..."@0 0000 putself ( 1)[LiCa] 0001 getlocal_WC_0 "..."@0 0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil 0006 leave [Re] ``` The local called `...` will contain the caller's CI: CI1. Here is the stack when we enter `delegator`: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 -> 4| # | CI1 (argc: 2) 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me 6| end | specval 7| | type 8| def caller | 9| delegator(1, 2) # CI1 (argc: 2) | 10| end | ``` The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to memcopy the caller's stack before calling `delegatee`. In this case, it will memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much memory to copy from the caller because `CI1` contains stack size information (argc: 2). Before executing the `send` instruction, we push `...` on the stack. 
The `send` instruction pops `...`, and because it is tagged with `FORWARDING`, it knows to memcopy (using the information in the CI it just popped): ``` == disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] "..."@0 0000 putself ( 1)[LiCa] 0001 getlocal_WC_0 "..."@0 0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil 0006 leave [Re] ``` Instruction 0001 puts the caller's CI on the stack. `send` is tagged with FORWARDING, so it reads the CI and _copies_ the caller's stack to this stack: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 4| # | CI1 (argc: 2) -> 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me 6| end | specval 7| | type 8| def caller | self 9| delegator(1, 2) # CI1 (argc: 2) | 1 10| end | 2 ``` The "FORWARDING" call site combines information from CI1 with CI2 in order to support passing other values in addition to the `...` value, as well as to perfectly forward splat args, kwargs, etc. Since we're able to copy the stack from `caller` in to `delegator`'s stack, we can avoid allocating objects. I want to do this to eliminate object allocations for delegate methods. My long term goal is to implement `Class#new` in Ruby and it uses `...`. I was able to implement `Class#new` in Ruby [here](https://github.com/ruby/ruby/pull/9289). If we adopt the technique in this patch, then we can optimize allocating objects that take keyword parameters for `initialize`. For example, this code will allocate 2 objects: one for `SomeObject`, and one for the kwargs: ```ruby SomeObject.new(foo: 1) ``` If we combine this technique, plus implement `Class#new` in Ruby, then we can reduce allocations for this common operation. Co-Authored-By: John Hawthorn <john@hawthorn.email> Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
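The claim that a forwarding call site can pass extra values alongside `...` while still forwarding keyword arguments and the block can be checked at the Ruby level. This sketch only demonstrates the observable semantics, not the stack-copying mechanism; `target` and `forward` are illustrative names.

```ruby
# Callee that reports everything it received.
def target(*args, **kwargs, &block)
  [args, kwargs, block&.call]
end

# Forwarder that prepends one positional argument and forwards the rest.
def forward(...)
  target(:prefix, ...)
end

received = forward(1, 2, key: :value) { :from_block }
p received
```

Positional arguments, keywords, and the block all reach `target` unchanged, with `:prefix` placed in front of the forwarded positionals.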
2024-06-02Stop exposing `rb_str_chilled_p`Jean Boussier
[Feature #20205] Now that chilled strings no longer appear as frozen, there is no need to offer an API to check for chilled strings. We however need to change `rb_check_frozen_internal` to no longer be a macro, as it needs to check for chilled strings.
2024-05-23Introduce a specialize instruction for Array#packNobuyoshi Nakada
Instructions for this code: ```ruby # frozen_string_literal: true [a].pack("C") ``` Before this commit: ``` == disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)> 0000 putself ( 3)[Li] 0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE> 0003 newarray 1 0005 putobject "C" 0007 opt_send_without_block <calldata!mid:pack, argc:1, ARGS_SIMPLE> 0009 leave ``` After this commit: ``` == disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)> 0000 putself ( 3)[Li] 0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE> 0003 putobject "C" 0005 opt_newarray_send 2, :pack 0008 leave ``` Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2024-03-19Implement chilled stringsÉtienne Barrié
[Feature #20205] As a path toward enabling frozen string literals by default in the future, this commit introduces "chilled strings". From a user perspective chilled strings pretend to be frozen, but on the first attempt to mutate them, they lose their frozen status and emit a warning rather than raising a `FrozenError`. Implementation-wise, `rb_compile_option_struct.frozen_string_literal` is no longer a boolean but a tri-state of `enabled/disabled/unset`. When code is compiled with frozen string literals neither explicitly enabled nor disabled, string literals are compiled with a new `putchilledstring` instruction. This instruction is identical to `putstring` except it marks the String with the `STR_CHILLED (FL_USER3)` and `FL_FREEZE` flags. Chilled strings have the `FL_FREEZE` flag so as to minimize the need to check for chilled strings across the codebase, and to improve compatibility with C extensions. Notes: - `String#freeze`: clears the chilled flag. - `String#-@`: acts as if the string was mutable. - `String#+@`: acts as if the string was mutable. - `String#clone`: copies the chilled flag. Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
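The user-visible difference between a chilled literal and an explicitly frozen string can be sketched as follows. This assumes the code is compiled without any frozen-string-literal magic comment or command-line option, and that chilled-string warnings are reported under the deprecation category (an assumption about the running Ruby; older versions simply treat the literal as mutable).

```ruby
# Opt in to deprecation warnings so that a chilled-string warning, if the
# running Ruby emits one, is visible on stderr.
Warning[:deprecated] = true

greeting = "hello"
greeting << " world"      # does not raise; a chilled literal only warns
puts greeting

hard_frozen = "hello".freeze
begin
  hard_frozen << " world" # an explicit freeze still raises
rescue FrozenError => e
  puts "FrozenError: #{e.message}"
end
```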
2024-02-20Add pushtoarraykwsplat instruction to avoid unnecessary array allocationJeremy Evans
This is designed to replace the newarraykwsplat instruction, which is no longer used in the parse.y compiler after this commit. This avoids an unnecessary array allocation in the case where ARGSCAT is followed by LIST with keyword: ```ruby a = [] kw = {} [*a, 1, **kw] ``` Previous Instructions: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 newhash 0 ( 2)[Li] 0006 setlocal_WC_0 kw@1 0008 getlocal_WC_0 a@0 ( 3)[Li] 0010 splatarray true 0012 putobject_INT2FIX_1_ 0013 putspecialobject 1 0015 newhash 0 0017 getlocal_WC_0 kw@1 0019 opt_send_without_block <calldata!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE> 0021 newarraykwsplat 2 0023 concattoarray 0024 leave ``` New Instructions: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 newhash 0 ( 2)[Li] 0006 setlocal_WC_0 kw@1 0008 getlocal_WC_0 a@0 ( 3)[Li] 0010 splatarray true 0012 putobject_INT2FIX_1_ 0013 pushtoarray 1 0015 putspecialobject 1 0017 newhash 0 0019 getlocal_WC_0 kw@1 0021 opt_send_without_block <calldata!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE> 0023 pushtoarraykwsplat 0024 leave ``` pushtoarraykwsplat is designed to be simpler than newarraykwsplat. It does not take a variable number of arguments from the stack, it pops the top of the stack, and appends it to the second from the top, unless the top of the stack is an empty hash. During this work, I found the ARGSPUSH followed by HASH with keyword did not compile correctly, as it pushed the generated hash to the array even if the hash was empty. This fixes the behavior, to use pushtoarraykwsplat instead of pushtoarray in that case: ```ruby a = [] kw = {} [*a, **kw] [{}] # Before [] # After ``` This does not remove the newarraykwsplat instruction, as it is still referenced in the prism compiler (which should be updated similar to this), YJIT (only in the bindings, it does not appear to be implemented), and RJIT (in a couple comments). After those are updated, the newarraykwsplat instruction should be removed.
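The rule that an empty keyword splat contributes nothing to the array, while a non-empty one is appended as a hash, looks like this from Ruby. This is a minimal sketch with illustrative variable names; it uses the `[*a, 1, **kw]` form, which behaved this way before the commit as well.

```ruby
a = []
empty_kw = {}
some_kw = { b: 2 }

without_hash = [*a, 1, **empty_kw]  # empty splat: no trailing hash element
with_hash = [*a, 1, **some_kw]      # non-empty splat: hash becomes last element

p without_hash
p with_hash
```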
2024-02-12Allow `foo(**nil, &block_arg)`Alan Wu
Previously, `**nil` by itself worked, but if you add a block argument, it raised a conversion error. The presence of the block argument shouldn't change how keyword splat works. See: <https://bugs.ruby-lang.org/issues/20064>
2024-01-30Use `UNDEF_P`Nobuyoshi Nakada
2024-01-24Add pushtoarray VM instructionJeremy Evans
This instruction is similar to concattoarray, but it takes the number of arguments to push to the array, removes that number of arguments from the stack, and adds them to the array now at the top of the stack. This allows `f(*a, 1)` to allocate only a single array on the caller side (which can be reused on the callee side in the case of `def f(*a)`). Prior to this commit, `f(*a, 1)` would generate 3 arrays: * a dupped by splatarray true * 1 wrapped in array by newarray * a dupped again by concatarray Instructions Before for `a = []; f(*a, 1)`: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 putself 0005 getlocal_WC_0 a@0 0007 splatarray true 0009 putobject_INT2FIX_1_ 0010 newarray 1 0012 concatarray 0013 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|FCALL> 0015 leave ``` Instructions After for `a = []; f(*a, 1)`: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 putself 0005 getlocal_WC_0 a@0 0007 splatarray true 0009 putobject_INT2FIX_1_ 0010 pushtoarray 1 0012 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL> 0014 leave ``` With these changes, method calls to Ruby methods should implicitly allocate at most one array. Ignore typeprof bundled gem failure due to unrecognized instruction.
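At the Ruby level the change is invisible apart from allocations: the callee still sees one combined argument list and the caller's array is left untouched. A small sketch with illustrative names:

```ruby
def f(*args)
  args
end

a = [1, 2]
received = f(*a, 3)

p received   # the splatted elements followed by the pushed argument
p a          # the caller's array is not modified by the call
```

The copy made by `splatarray true` is what gets appended to, which is why `a` itself never changes.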
2024-01-24Add concattoarray VM instructionJeremy Evans
This instruction is similar to concatarray, but assumes the first object is already an array, and appends to it directly. This is different than concatarray, which will create a new array instead of appending to an existing array.

Additionally, for both concatarray and concattoarray, if the second argument cannot be converted to an array, then just push it onto the array, instead of creating a new array to wrap it, and then using concatarray. This saves an array allocation in that case.

This allows `f(*a, *a, *1)` to allocate only a single array on the caller side (which can be reused on the callee side in the case of `def f(*a)`). Prior to this commit, `f(*a, *a, *1)` would generate 4 arrays:

* a dupped by splatarray true
* a dupped again by first concatarray
* 1 wrapped in array by third splatarray
* result of [*a, *a] dupped by second concatarray

Instructions Before for `a = []; f(*a, *a, *1)`:

```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 putself
0005 getlocal_WC_0 a@0
0007 splatarray true
0009 getlocal_WC_0 a@0
0011 splatarray false
0013 concatarray
0014 putobject_INT2FIX_1_
0015 splatarray false
0017 concatarray
0018 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL>
0020 leave
```

Instructions After for `a = []; f(*a, *a, *1)`:

```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 putself
0005 getlocal_WC_0 a@0
0007 splatarray true
0009 getlocal_WC_0 a@0
0011 concattoarray
0012 putobject_INT2FIX_1_
0013 concattoarray
0014 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL>
0016 leave
```
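To make the `f(*a, *a, *1)` case concrete, the sketch below shows what the callee observes. It uses two hypothetical helpers, `collect_splats` and `concat_splats`. The splatted integer cannot be converted to an array, so it is passed as a single element, and the caller's array is left untouched.

```ruby
# Hypothetical callee standing in for `def f(*a)`.
def collect_splats(*args)
  args
end

# Splat the same array twice, then splat a non-array value.
def concat_splats
  a = [1, 2]
  received = collect_splats(*a, *a, *1)
  [received, a]
end

received, original = concat_splats
p received # both copies of a, then the integer as one element
p original # the caller's array, unchanged
```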
2023-12-09Ensure f(**kw, &block) calls kw.to_hash before block.to_procJeremy Evans
Previously, block.to_proc was called first, by vm_caller_setup_arg_block. kw.to_hash was called later inside CALLER_SETUP_ARG or setup_parameters_complex.

This adds a splatkw instruction that is inserted before sends with ARGS_BLOCKARG and KW_SPLAT and without KW_SPLAT_MUT. This is not needed in the KW_SPLAT_MUT case, because then you know the value is a hash, and you don't need to call to_hash on it.

The splatkw instruction checks whether the second-to-top stack entry is a hash, and if not, replaces it with the value of calling to_hash on it (using rb_to_hash_type). As it is always before a send with ARGS_BLOCKARG and KW_SPLAT, second to top is the keyword splat, and top is the passed block.
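One way to observe the ordering described above is to pass objects that record when their conversion methods run. This is only a sketch: `takes_kw_and_block` and `conversion_order` are hypothetical names, and the recorded order depends on whether the interpreter includes this change.

```ruby
# Hypothetical callee that accepts a keyword splat and a block.
def takes_kw_and_block(**kw, &block)
  nil
end

# Returns the order in which to_hash and to_proc were invoked by the call.
def conversion_order
  calls = []

  kw = Object.new
  kw.define_singleton_method(:to_hash) do
    calls << :to_hash
    {}
  end

  blk = Object.new
  blk.define_singleton_method(:to_proc) do
    calls << :to_proc
    proc {}
  end

  takes_kw_and_block(**kw, &blk)
  calls
end

# With this commit, :to_hash is expected to be recorded before :to_proc.
p conversion_order
```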
2023-12-01Make expandarray compaction safePeter Zhu
The expandarray instruction can allocate an array, which can trigger a GC compaction. However, since it does not increment the sp until the end of the instruction, the objects it places on the stack are not marked or reference-updated by the GC. Those objects can therefore be moved, which leaves broken or incorrect objects on the stack.

This commit changes the instruction to be handles_sp so the sp is incremented inside of the instruction right after the object is written on the stack.
2023-10-13YJIT: Fallback opt_getconstant_path for const_missing (#8623)Takashi Kokubun
* YJIT: Fallback opt_getconstant_path for const_missing
* Fix a comment [ci skip]
* Remove a wrapper function
2023-04-21Remove unused opt_call_c_function insn (#7750)Takashi Kokubun
Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2023-04-18Emit special instruction for array literal + .(hash|min|max)Aaron Patterson
This commit introduces a new instruction `opt_newarray_send` which is used when there is an array literal followed by either the `hash`, `min`, or `max` method.

```
[a, b, c].hash
```

Will emit an `opt_newarray_send` instruction. This instruction falls back to a method call if the "interested" method has been monkey patched.

Here are some examples of the instructions generated:

```
$ ./miniruby --dump=insns -e '[@a, @b].max'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :max
0009 leave

$ ./miniruby --dump=insns -e '[@a, @b].min'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :min
0009 leave

$ ./miniruby --dump=insns -e '[@a, @b].hash'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :hash
0009 leave
```

[Feature #18897] [ruby-core:109147]

Co-authored-by: John Hawthorn <jhawthorn@github.com>

Notes: Merged: https://github.com/ruby/ruby/pull/6090
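The fallback mentioned above can be seen from plain Ruby: a call on an array literal must still honor a redefined `Array#max`. The sketch below patches the method temporarily and restores it afterwards. `literal_max`, `max_with_temporary_patch`, and the alias name `max_before_patch` are hypothetical names chosen for this illustration.

```ruby
# Hypothetical method whose body is an array literal followed by .max,
# the pattern this commit compiles to a specialized instruction.
def literal_max(a, b)
  [a, b].max
end

# Temporarily replaces Array#max, calls literal_max, then restores the original.
def max_with_temporary_patch
  before = literal_max(1, 2)
  Array.class_eval do
    alias_method :max_before_patch, :max
    define_method(:max) { |*| :patched }
  end
  patched = literal_max(1, 2)
  [before, patched]
ensure
  Array.class_eval do
    if method_defined?(:max_before_patch)
      alias_method :max, :max_before_patch
      remove_method :max_before_patch
    end
  end
end

p max_with_temporary_patch # result before the patch, then the patched result
p literal_max(1, 2)        # original behavior again after the restore
```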
2023-03-16Refactor jit_func_t and jit_execTakashi Kokubun
I closed https://github.com/ruby/ruby/pull/7543, but part of the diff seems useful regardless, so I extracted it.
2023-03-14YJIT: Implement throw instruction (#7491)Takashi Kokubun
* Break up jit_exec from vm_sendish
* YJIT: Implement throw instruction
* YJIT: Explain what rb_vm_throw does [ci skip]

Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2023-03-10rename `defined_ivar` to `definedivar`Koichi Sasada
because non-opt instructions should not contain the `_` char.

Notes: Merged: https://github.com/ruby/ruby/pull/7485