path: root/zjit.rb
Age | Commit message | Author
6 days | ZJIT: Handle `nil` case for `getblockparamproxy` (#15986) | Jeff Zhang
Resolves https://github.com/Shopify/ruby/issues/772 Adds profiling for the `getblockparamproxy` YARV instruction and handles the `nil` block case by pushing `nil` instead of the block proxy object, improves `ratio_in_zjit` a tiny bit (0.1%) Profiling data for `getblockparamproxy` on Lobsters ``` Top-6 getblockparamproxy handler (100.0% of total 3,353,291): polymorphic: 2,337,372 (69.7%) nil: 552,629 (16.5%) iseq: 259,636 ( 7.7%) no_profiles: 156,734 ( 4.7%) proc: 40,223 ( 1.2%) megamorphic: 6,697 ( 0.2%) ``` Lobsters benchmark stats: <details> <summary>Stats before (master):</summary> <p> ``` ❯ ./run_benchmarks.rb --chruby 'ruby-zjit --zjit-stats' lobsters ***ZJIT: Printing ZJIT statistics on exit*** ... Top-20 side exit reasons (100.0% of total 15,338,024): guard_type_failure: 6,889,050 (44.9%) guard_shape_failure: 6,848,898 (44.7%) block_param_proxy_not_iseq_or_ifunc: 1,008,525 ( 6.6%) unhandled_hir_insn: 236,977 ( 1.5%) compile_error: 191,763 ( 1.3%) fixnum_mult_overflow: 50,739 ( 0.3%) block_param_proxy_modified: 28,119 ( 0.2%) patchpoint_stable_constant_names: 18,229 ( 0.1%) unhandled_newarray_send_pack: 14,481 ( 0.1%) unhandled_block_arg: 13,782 ( 0.1%) fixnum_lshift_overflow: 10,085 ( 0.1%) patchpoint_no_ep_escape: 7,815 ( 0.1%) unhandled_yarv_insn: 7,540 ( 0.0%) expandarray_failure: 4,533 ( 0.0%) guard_super_method_entry: 4,475 ( 0.0%) patchpoint_method_redefined: 1,207 ( 0.0%) patchpoint_no_singleton_class: 1,130 ( 0.0%) obj_to_string_fallback: 412 ( 0.0%) guard_less_failure: 163 ( 0.0%) interrupt: 82 ( 0.0%) ... ratio_in_zjit: 82.1% ``` </p> </details> <details> <summary>Stats after:</summary> <p> ``` ❯ ./run_benchmarks.rb --chruby 'ruby-zjit --zjit-stats' lobsters ***ZJIT: Printing ZJIT statistics on exit*** ... 
Top-20 side exit reasons (100.0% of total 15,061,422): guard_type_failure: 6,892,934 (45.8%) guard_shape_failure: 6,850,512 (45.5%) block_param_proxy_not_iseq_or_ifunc: 549,823 ( 3.7%) unhandled_hir_insn: 236,979 ( 1.6%) compile_error: 191,782 ( 1.3%) unhandled_yarv_insn: 128,695 ( 0.9%) block_param_proxy_not_nil: 68,623 ( 0.5%) fixnum_mult_overflow: 50,739 ( 0.3%) patchpoint_stable_constant_names: 18,568 ( 0.1%) unhandled_newarray_send_pack: 14,481 ( 0.1%) block_param_proxy_modified: 13,819 ( 0.1%) unhandled_block_arg: 13,798 ( 0.1%) fixnum_lshift_overflow: 10,085 ( 0.1%) patchpoint_no_ep_escape: 7,815 ( 0.1%) expandarray_failure: 4,533 ( 0.0%) guard_super_method_entry: 4,475 ( 0.0%) patchpoint_method_redefined: 1,207 ( 0.0%) obj_to_string_fallback: 1,140 ( 0.0%) patchpoint_no_singleton_class: 1,130 ( 0.0%) guard_less_failure: 163 ( 0.0%) ... ratio_in_zjit: 82.2% ``` </p> </details>
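For illustration, the `nil` block case described in this commit arises when a method that declares `&block` is called without a block. The sketch below shows both call shapes. `relay` and `target` are hypothetical names, and the idea that CRuby compiles the `&block` forwarding to `getblockparamproxy` is an assumption that can be checked with `RubyVM::InstructionSequence.of(method(:relay)).disasm`.

```ruby
# Hypothetical callee: reports whether a block reached it.
def target
  block_given? ? yield : :no_block
end

# Hypothetical forwarder: passes its block parameter along unchanged.
# When relay is called without a block, `block` is nil here.
def relay(&block)
  target(&block)
end

p(relay { :from_block }) # block case
p(relay)                 # nil case
```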
2026-01-14 | ZJIT: Optimize common `invokesuper` cases (#15816) | Kevin Menard
* ZJIT: Profile `invokesuper` instructions
* ZJIT: Introduce the `InvokeSuperDirect` HIR instruction
  The new instruction is an optimized version of `InvokeSuper` when we know the `super` target is an ISEQ.
* ZJIT: Expand definition of unspecializable to more complex cases
* ZJIT: Ensure `invokesuper` optimization works when the inheritance hierarchy is modified
* ZJIT: Simplify `invokesuper` specialization to most common case
  Looking at ruby-bench, most `super` calls don't pass a block, which means we can use the already optimized `SendWithoutBlockDirect`.
* ZJIT: Track `super` method entries directly to avoid GC issues
  Because the method entry isn't typed as a `VALUE`, we set up barriers on its `VALUE` fields. But that was insufficient, as the method entry itself could be collected in certain cases, resulting in dangling objects. Now we track the method entry as a `VALUE` and can more naturally mark it and its children.
* ZJIT: Optimize `super` calls with simple argument forms
* ZJIT: Report the reason why we can't optimize an `invokesuper` instance
* ZJIT: Revise send fallback reasons for `super` calls
* ZJIT: Assert `super` calls are `FCALL` and don't need visibility checks
2025-12-09 | ZJIT: Add dump to file for --zjit-stats (#15414) | Aiden Fox Ivey
* ZJIT: Add dump to file for --zjit-stats
* ZJIT: Rename --zjit-stats=quiet to --zjit-stats-quiet
2025-11-26 | ZJIT: Count fallback reasons for set/get/definedivar (#15324) | Max Bernstein
lobsters:
```
Top-4 setivar fallback reasons (100.0% of total 7,789,008):
  shape_transition: 6,074,085 (78.0%)
  not_monomorphic: 1,484,013 (19.1%)
  not_t_object: 172,629 ( 2.2%)
  too_complex: 58,281 ( 0.7%)
Top-3 getivar fallback reasons (100.0% of total 9,348,832):
  not_t_object: 4,658,833 (49.8%)
  not_monomorphic: 4,542,316 (48.6%)
  too_complex: 147,683 ( 1.6%)
Top-3 definedivar fallback reasons (100.0% of total 366,383):
  not_monomorphic: 361,389 (98.6%)
  too_complex: 3,062 ( 0.8%)
  not_t_object: 1,932 ( 0.5%)
```
railsbench:
```
Top-3 setivar fallback reasons (100.0% of total 15,119,057):
  shape_transition: 13,760,763 (91.0%)
  not_monomorphic: 982,368 ( 6.5%)
  not_t_object: 375,926 ( 2.5%)
Top-2 getivar fallback reasons (100.0% of total 14,438,747):
  not_t_object: 7,643,870 (52.9%)
  not_monomorphic: 6,794,877 (47.1%)
Top-2 definedivar fallback reasons (100.0% of total 209,613):
  not_monomorphic: 209,526 (100.0%)
  not_t_object: 87 ( 0.0%)
```
shipit:
```
Top-3 setivar fallback reasons (100.0% of total 14,516,254):
  shape_transition: 8,613,512 (59.3%)
  not_monomorphic: 5,761,398 (39.7%)
  not_t_object: 141,344 ( 1.0%)
Top-2 getivar fallback reasons (100.0% of total 21,016,444):
  not_monomorphic: 11,313,482 (53.8%)
  not_t_object: 9,702,962 (46.2%)
Top-2 definedivar fallback reasons (100.0% of total 290,382):
  not_monomorphic: 287,755 (99.1%)
  not_t_object: 2,627 ( 0.9%)
```
2025-11-19 | ZJIT: Count all calls to C functions from generated code (#15240) | Max Bernstein
lobsters: ``` Top-20 calls to C functions from JIT code (79.9% of total 97,004,883): rb_vm_opt_send_without_block: 19,874,212 (20.5%) rb_vm_setinstancevariable: 9,774,841 (10.1%) rb_ivar_get: 9,358,866 ( 9.6%) rb_hash_aref: 6,828,948 ( 7.0%) rb_vm_send: 6,441,551 ( 6.6%) rb_vm_env_write: 5,375,989 ( 5.5%) rb_vm_invokesuper: 3,037,836 ( 3.1%) Module#===: 2,562,446 ( 2.6%) rb_ary_entry: 2,354,546 ( 2.4%) Kernel#is_a?: 1,424,092 ( 1.5%) rb_vm_opt_getconstant_path: 1,344,923 ( 1.4%) Thread.current: 1,300,822 ( 1.3%) rb_zjit_defined_ivar: 1,222,613 ( 1.3%) rb_vm_invokeblock: 1,184,555 ( 1.2%) Hash#[]=: 1,061,969 ( 1.1%) rb_ary_push: 1,024,987 ( 1.1%) rb_ary_new_capa: 904,003 ( 0.9%) rb_str_buf_append: 833,782 ( 0.9%) rb_class_allocate_instance: 822,626 ( 0.8%) Hash#fetch: 755,913 ( 0.8%) ``` railsbench: ``` Top-20 calls to C functions from JIT code (74.8% of total 189,170,268): rb_vm_opt_send_without_block: 29,870,307 (15.8%) rb_vm_setinstancevariable: 17,631,199 ( 9.3%) rb_hash_aref: 16,928,890 ( 8.9%) rb_ivar_get: 14,441,240 ( 7.6%) rb_vm_env_write: 11,571,001 ( 6.1%) rb_vm_send: 11,153,457 ( 5.9%) rb_vm_invokesuper: 7,568,267 ( 4.0%) Module#===: 6,065,923 ( 3.2%) Hash#[]=: 2,842,990 ( 1.5%) rb_ary_entry: 2,766,125 ( 1.5%) rb_ary_push: 2,722,079 ( 1.4%) rb_vm_invokeblock: 2,594,398 ( 1.4%) Thread.current: 2,560,129 ( 1.4%) rb_str_getbyte: 1,965,627 ( 1.0%) Kernel#is_a?: 1,961,815 ( 1.0%) rb_vm_opt_getconstant_path: 1,863,678 ( 1.0%) rb_hash_new_with_size: 1,796,456 ( 0.9%) rb_class_allocate_instance: 1,785,043 ( 0.9%) String#empty?: 1,713,414 ( 0.9%) rb_ary_new_capa: 1,678,834 ( 0.9%) ``` shipit: ``` Top-20 calls to C functions from JIT code (83.4% of total 182,402,821): rb_vm_opt_send_without_block: 45,753,484 (25.1%) rb_ivar_get: 21,020,650 (11.5%) rb_vm_setinstancevariable: 17,528,603 ( 9.6%) rb_hash_aref: 11,892,856 ( 6.5%) rb_vm_send: 11,723,471 ( 6.4%) rb_vm_env_write: 10,434,452 ( 5.7%) Module#===: 4,225,048 ( 2.3%) rb_vm_invokesuper: 3,705,906 ( 2.0%) 
Thread.current: 3,337,603 ( 1.8%) rb_ary_entry: 3,114,378 ( 1.7%) Hash#[]=: 2,509,912 ( 1.4%) Array#empty?: 2,282,994 ( 1.3%) rb_vm_invokeblock: 2,210,511 ( 1.2%) Hash#fetch: 2,017,960 ( 1.1%) _bi20: 1,975,147 ( 1.1%) rb_zjit_defined_ivar: 1,897,127 ( 1.0%) rb_vm_opt_getconstant_path: 1,813,294 ( 1.0%) rb_ary_new_capa: 1,615,406 ( 0.9%) Kernel#is_a?: 1,567,854 ( 0.9%) rb_class_allocate_instance: 1,560,035 ( 0.9%) ``` Thanks to @eregon for the idea. Co-authored-by: Jacob Denbeaux <jacob.denbeaux@shopify.com> Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2025-11-18 | ZJIT: Skip empty counter sections in stats | Shannon Skipper
2025-11-18 | ZJIT: Avoid `NaN%` ratio appearing in stats | Shannon Skipper
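A minimal sketch of how a `NaN` ratio can arise and be avoided, assuming the ratio is computed by float division of two counters. The helper names are hypothetical and are not taken from `zjit.rb`; the sample numbers are made up.

```ruby
# Hypothetical helper: returns a percentage, or nil when the denominator is
# zero, since float division by zero would otherwise yield NaN or Infinity.
def percent(numerator, denominator)
  return nil if denominator.zero?
  100.0 * numerator / denominator
end

# Hypothetical formatter: returns nil so the caller can skip the line
# instead of printing a meaningless ratio.
def ratio_line(name, numerator, denominator)
  pct = percent(numerator, denominator)
  return nil if pct.nil?
  format("%s: %.1f%%", name, pct)
end

puts (100.0 * 0 / 0).nan?                            # unguarded division
puts ratio_line("guard_type_exit_ratio", 43, 1000)   # made-up counts
puts ratio_line("guard_type_exit_ratio", 0, 0).inspect
```

Returning `nil` rather than `0.0%` keeps "no data" distinguishable from "measured zero"; either choice would avoid the `NaN%` printout.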
2025-11-18 | ZJIT: add support for lazy `RubyVM::ZJIT.enable` | Godfrey Chan
This implements Shopify#854:
- Splits boot-time and enable-time initialization, tracks progress with `InitializationState` enum
- Introduces `RubyVM::ZJIT.enable` Ruby method for enabling the JIT lazily, if not already enabled
- Introduces `--zjit-disable` flag, which can be used alongside the other `--zjit-*` flags but prevents enabling the JIT at boot time
- Adds ZJIT infra to support JIT hooks, but this is not currently exercised (Shopify/ruby#667)

Left for future enhancements:
- Support kwargs for overriding the CLI flags in `RubyVM::ZJIT.enable`

Closes Shopify#854
2025-11-12 | ZJIT: Revert patch_point_count counter (#15160) | Takashi Kokubun
2025-11-10 | ZJIT: Split unhandled_hir_insn and unknown_newarray_send stats (#15127) | Takashi Kokubun
2025-11-10 | ZJIT: Rename not_optimized_instruction to uncategorized_instruction (#15130) | Randy Stauner
Make it more obvious that this hasn't been handled and could be broken down more.
2025-11-10 | ZJIT: Add patch_point_count stat (#15100) | Takashi Kokubun
2025-11-05 | ZJIT: Profile specific objects for invokeblock (#15051) | Max Bernstein
I made a special kind of `ProfiledType` that looks at specific objects, not just their classes/shapes (https://github.com/ruby/ruby/pull/15051). Then I profiled some of our benchmarks.

For lobsters:
```
Top-6 invokeblock handler (100.0% of total 1,064,155):
  megamorphic: 494,931 (46.5%)
  monomorphic_iseq: 337,171 (31.7%)
  polymorphic: 113,381 (10.7%)
  monomorphic_ifunc: 52,260 ( 4.9%)
  monomorphic_other: 38,970 ( 3.7%)
  no_profiles: 27,442 ( 2.6%)
```
For railsbench:
```
Top-6 invokeblock handler (100.0% of total 2,529,104):
  monomorphic_iseq: 834,452 (33.0%)
  megamorphic: 818,347 (32.4%)
  polymorphic: 632,273 (25.0%)
  monomorphic_ifunc: 224,243 ( 8.9%)
  monomorphic_other: 19,595 ( 0.8%)
  no_profiles: 194 ( 0.0%)
```
For shipit:
```
Top-6 invokeblock handler (100.0% of total 2,104,148):
  megamorphic: 1,269,889 (60.4%)
  polymorphic: 411,475 (19.6%)
  no_profiles: 173,367 ( 8.2%)
  monomorphic_other: 118,619 ( 5.6%)
  monomorphic_iseq: 84,891 ( 4.0%)
  monomorphic_ifunc: 45,907 ( 2.2%)
```
Seems like a monomorphic case for a specific ISEQ actually isn't a bad way of going about this, at least to start...
2025-11-05 | ZJIT: Add zjit_alloc_bytes and total_mem_bytes stats (#15059) | Takashi Kokubun
2025-11-05 | ZJIT: Track guard shape exit ratio (#15052) | Randy Stauner
new ZJIT stats excerpt from liquid-runtime:
```
vm_read_from_parent_iseq_local_count: 10,909,753
guard_type_count: 45,109,441
guard_type_exit_ratio: 4.3%
guard_shape_count: 15,272,133
guard_shape_exit_ratio: 20.1%
code_region_bytes: 3,899,392
```
lobsters
```
guard_type_count: 71,765,580
guard_type_exit_ratio: 4.3%
guard_shape_count: 21,872,560
guard_shape_exit_ratio: 8.0%
```
railsbench
```
guard_type_count: 117,661,124
guard_type_exit_ratio: 0.7%
guard_shape_count: 28,032,665
guard_shape_exit_ratio: 5.1%
```
shipit
```
guard_type_count: 106,195,615
guard_type_exit_ratio: 3.5%
guard_shape_count: 33,672,673
guard_shape_exit_ratio: 10.1%
```
2025-11-04 | ZJIT: Fallback counter rename: s/fancy/complex/ | Alan Wu
Kokubun brought up that "complex" is a more fitting name for what these counters count. Thanks!

Also:
- make the SendFallbackReason enum name consistent with the counter name
- rewrite the printout prompt in zjit.rb
2025-10-30 | ZJIT: Unsupported call feature accounting, and new `send_fallback_fancy_call_feature` | Alan Wu
When we fall back because the callee has an unsupported signature, it was a little inaccurate to use `send_fallback_send_not_optimized_method_type`. We do support the method type in other situations. Add a new `send_fallback_fancy_call_feature` for these situations. Also add `send_fallback_bmethod_non_iseq_proc` so we can stop using `not_optimized_method_type` completely for bmethods.

Add accompanying `fancy_arg_pass_*` counters. These don't sum to the number of unoptimized calls that run, but they establish the level of support the optimizer provides for a given workload.
2025-10-28 | ZJIT: Print percentage of GuardType failure | Max Bernstein
2025-10-28 | ZJIT: Count GuardType instructions | Max Bernstein
We can measure how many we can remove by adding type information to C functions, etc.
2025-10-27 | ZJIT: Print out full path to --zjit-trace-exits output (#14966) | Max Bernstein
* ZJIT: Print out full path to --zjit-trace-exits output
  This helps with any `chdir`-related issues.
* Don't include dot
  Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>

---------

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2025-10-22 | ZJIT: Disable not-annotated cfuncs in --zjit-stats (#14915) | Max Bernstein
It's mostly a duplicate of not-inlined-cfuncs right now.
2025-10-22 | ZJIT: Revert removal of empty samples from zjit trace exits (#14905) | Aiden Fox Ivey
ZJIT: Revert 9a75c05
2025-10-22 | ZJIT: Inline simple SendWithoutBlockDirect (#14888) | Max Bernstein
Copy the YJIT simple inliner except for the kwargs bit. It works great!
2025-10-11 | ZJIT: Count unoptimized `Send` (#14801) | Stan Lo
* ZJIT: Count unoptimized `Send`
  This includes `Send` in `send fallback reasons` to guide future optimizations.
* ZJIT: Create dedicated def_type counter for Send
2025-10-09 | ZJIT: Get stats for which C functions are not annotated | Max Bernstein
2025-10-07 | ZJIT: Ignore results with no samples | Aiden Fox Ivey
2025-10-07 | ZJIT: Refactor comments and rewrite frames handling | Aiden Fox Ivey
2025-10-07 | ZJIT: Change name format of zjit_exit_locations dump file | Aiden Fox Ivey
2025-10-07 | ZJIT: Remove unnecessary .dup calls in exit_locations | Aiden Fox Ivey
* Using https://www.rubyexplorer.xyz/?c=frames+%3D+results%5B%3Aframes%5D.dup shows dup is called regardless
2025-10-03 | ZJIT: Count CCallWithFrame as optimized_send_count (#14722) | Takashi Kokubun
2025-10-03 | ZJIT: Add HIR for calling Cfunc with frame (#14661) | Stan Lo
* ZJIT: Add HIR for CCallWithFrame
* ZJIT: Update stats to count not inlined cfunc calls
* ZJIT: Stops optimizing SendWithoutBlock when TracePoint is activated
* ZJIT: Fallback to SendWithoutBlock when CCallWithFrame has too many args
* ZJIT: Rename cfun -> cfunc
2025-10-02 | ZJIT: Enable sample rate for side exit tracing (#14696) | Aiden Fox Ivey
2025-10-01 | ZJIT: Use Marshal.dump to handle large writes | Aiden Fox Ivey
`File.binwrite` with a big string can exceed the `INT_MAX` limit of write(2) and fail with an exception.
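A sketch of the approach named in the commit title, assuming the data is a plain Ruby object that `Marshal` can serialize. Passing an IO as the second argument to `Marshal.dump` writes the serialized form to that IO, so the caller never hands one large string to `File.binwrite`. The method names, file name, and sample hash are hypothetical and not taken from `zjit.rb`.

```ruby
require "tmpdir"

# Hypothetical writer: serialize straight into the open file.
def dump_results(path, results)
  File.open(path, "wb") { |file| Marshal.dump(results, file) }
end

# Hypothetical reader: the inverse operation.
def load_results(path)
  File.open(path, "rb") { |file| Marshal.load(file) }
end

# Round-trip through a temporary directory to show the two halves agree.
def roundtrip(results)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "exit_locations.dump")
    dump_results(path, results)
    load_results(path)
  end
end

sample = { frames: { 1 => { name: "foo" } }, raw: [1, 2, 3] }
p roundtrip(sample) == sample
```

Only load Marshal data from files you trust; `Marshal.load` can instantiate arbitrary objects.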
2025-09-30 | ZJIT: Add more *_send_count stats (#14689) | Takashi Kokubun
2025-09-30 | ZJIT: Decouple stats and side exit tracing (#14688) | Aiden Fox Ivey
2025-09-30 | ZJIT: Use optimized exit_locations implementation | Aiden Fox Ivey
2025-09-30 | ZJIT: Use binwrite in zjit.rb | Aiden Fox Ivey
2025-09-30 | ZJIT: Fix "malformed format string" on stats (#14681) | Takashi Kokubun
2025-09-30 | ZJIT: Add --zjit-trace-exits (#14640) | Aiden Fox Ivey
Add side exit tracing functionality for ZJIT
2025-09-30 | ZJIT: Unify fallback counters for send-ish insns (#14676) | Takashi Kokubun
2025-09-29 | ZJIT: Count dynamic instance variable lookups (#14615) | Max Bernstein
2025-09-29 | ZJIT: Add stats for cfuncs that are not optimized (#14638) | Stan Lo
* ZJIT: Add stats for cfuncs that are not optimized * ZJIT: Add IncrCounterPtr HIR instead From `lobsters` ``` Top-20 Unoptimized C functions (73.0% of total 15,276,688): Kernel#is_a?: 2,052,363 (13.4%) Class#current: 1,892,623 (12.4%) String#to_s: 975,973 ( 6.4%) Hash#key?: 677,623 ( 4.4%) String#empty?: 636,468 ( 4.2%) TrueClass#===: 457,232 ( 3.0%) Hash#[]=: 455,908 ( 3.0%) FalseClass#===: 448,798 ( 2.9%) ActiveSupport::OrderedOptions#_get: 377,468 ( 2.5%) Kernel#kind_of?: 339,551 ( 2.2%) Kernel#dup: 329,371 ( 2.2%) String#==: 324,286 ( 2.1%) String#include?: 297,528 ( 1.9%) Hash#[]: 294,561 ( 1.9%) Array#include?: 287,145 ( 1.9%) Kernel#block_given?: 283,633 ( 1.9%) BasicObject#!=: 278,874 ( 1.8%) Hash#delete: 250,951 ( 1.6%) Set#include?: 246,447 ( 1.6%) NilClass#===: 242,776 ( 1.6%) ``` From `liquid-render` ``` Top-20 Unoptimized C functions (99.8% of total 5,195,549): Hash#key?: 2,459,048 (47.3%) String#to_s: 1,119,758 (21.6%) Set#include?: 799,469 (15.4%) Kernel#is_a?: 214,223 ( 4.1%) Integer#<<: 171,073 ( 3.3%) Integer#/: 127,622 ( 2.5%) CGI::EscapeExt#escapeHTML: 56,971 ( 1.1%) Regexp#===: 50,008 ( 1.0%) String#empty?: 43,990 ( 0.8%) String#===: 36,838 ( 0.7%) String#==: 21,309 ( 0.4%) Time#strftime: 21,251 ( 0.4%) String#strip: 15,271 ( 0.3%) String#scan: 13,753 ( 0.3%) String#+@: 12,603 ( 0.2%) Array#include?: 8,059 ( 0.2%) String#+: 5,295 ( 0.1%) String#dup: 4,606 ( 0.1%) String#-@: 3,213 ( 0.1%) Class#generate: 3,011 ( 0.1%) ```
2025-09-22 | ZJIT: Add polymorphism counters (#14608) | Aiden Fox Ivey
* ZJIT: Add polymorphism counters * . * .
2025-09-19 | ZJIT: Measure reading/writing locals with level > 0 (#14601) | Max Bernstein
ZJIT: Measure writing to locals with level > 0
2025-09-18 | ZJIT: Put exit reasons later in stats_string (#14599) | Takashi Kokubun
2025-09-18 | ZJIT: Count writes to the VM frame (#14597) | Max Bernstein
This is a) a lot of memory traffic and b) another good proxy for our ability to strength-reduce method calls.
2025-09-17 | ZJIT: Add stat for `def_type` of send fallbacks (#14533) | Stan Lo
I thought about creating a new HIR like `SendWithoutBlockFailedToOptimize` that can carry very specific reasons later. But it'll mean adding it to every branch matching `SendWithoutBlock` and may make code unnecessarily complicated. So I take the easier path for now:
```
Top-4 send fallback def_types (100.0% of total 21,375,357):
  cfunc: 20,164,487 (94.3%)
  optimized: 1,197,897 ( 5.6%)
  attrset: 12,953 ( 0.1%)
  alias: 20 ( 0.0%)
```
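The "Top-N" printouts quoted throughout this log all follow the same shape: a header with the share of the total that the listed entries cover, then one line per counter with a thousands-delimited count and its percentage. The sketch below reproduces that shape from a plain hash, using the `def_type` numbers from this commit message. The helper names `delimit` and `top_n_lines` are hypothetical, and this is not the code in `zjit.rb`.

```ruby
# Hypothetical helper: 21375357 -> "21,375,357".
def delimit(number)
  number.to_s.reverse.scan(/\d{1,3}/).join(",").reverse
end

# Hypothetical helper: build the header and per-counter lines.
# counters maps a counter name to its count.
def top_n_lines(title, counters, limit: 20)
  total = counters.values.sum
  return [] if total.zero? # nothing to report, and avoids dividing by zero

  top = counters.sort_by { |_name, count| -count }.first(limit)
  shown = top.sum { |_name, count| count }
  width = top.map { |name, _count| name.to_s.size }.max

  header = format("Top-%d %s (%.1f%% of total %s):",
                  top.size, title, 100.0 * shown / total, delimit(total))
  body = top.map do |name, count|
    format("  %s: %s (%4.1f%%)",
           name.to_s.rjust(width), delimit(count), 100.0 * count / total)
  end
  [header, *body]
end

def_types = { cfunc: 20_164_487, optimized: 1_197_897, attrset: 12_953, alias: 20 }
puts top_n_lines("send fallback def_types", def_types)
```

With `limit:` smaller than the number of counters, the header percentage drops below 100%, which matches how the "79.9% of total" style headers elsewhere in this log read.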
2025-09-15 | ZJIT: Revert VM_CALL_ARGS_SPLAT and VM_CALL_KWARG support (#14565) | Takashi Kokubun
2025-09-12 | ZJIT: Add specific dynamic send type counters (#14528) | Stan Lo
2025-09-09 | ZJIT: Avoid mutating string in zjit stats (#14485) | Daniel Colson
[ZJIT] Avoid mutating string in zjit stats

GitHub runs with a Symbol patch that causes a frozen string error when running `--zjit-stats`:

```rb
class Symbol
  alias_method :to_s, :name
end
```

I remember hearing that Shopify runs a similar patch, and that we might try to make this the default behavior in Ruby some day.

Any chance we can avoid mutating the string here in case it's frozen? That does mean we'll end up making some extra strings when it's not frozen, but I think that's OK for printing stats.
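To make the failure mode concrete, here is a small sketch under stated assumptions: `Symbol#name` (Ruby 3.0+) returns a frozen string, so any in-place string method on it raises `FrozenError`, while the non-bang variant returns a new string. The method names and the counter name `:send_fallback_cfunc` are hypothetical; the actual code changed in `zjit.rb` is not shown in this log.

```ruby
# Hypothetical mutating version: fails when the string is frozen,
# which is what `Symbol#name` returns.
def mutating_label(counter, prefix)
  label = counter.name
  label.delete_prefix!(prefix) # raises FrozenError on a frozen string
  label
end

# Hypothetical non-mutating version: allocates a new string instead.
def short_label(counter, prefix)
  counter.name.delete_prefix(prefix)
end

begin
  mutating_label(:send_fallback_cfunc, "send_fallback_")
rescue FrozenError => e
  puts "mutating version failed: #{e.class}"
end

puts short_label(:send_fallback_cfunc, "send_fallback_")
```

The extra allocation is the trade-off the commit message accepts for a stats printout.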