| Age | Commit message | Author | |
|---|---|---|---|
| 6 hours | YJIT: Fix local register mapping overflow | Ibrahim Awwal | |
| Previously, local indices greater than 255 were truncated when converted to RegOpnd::Local. This could make a high-index local alias a tracked low-index local in the register mapping and produce incorrect values after compilation. Cap overflowing indices to MAX_CTX_LOCALS. RegMapping treats that index as untrackable because it is just outside the tracked local range. Fixes [Bug #22074]. Co-authored-by: Codex <199175422+chatgpt-connector[bot]@users.noreply.github.com> | |||
| 33 hours | Use atomics for kwargs reference count | John Hawthorn | |
| Fixes [Bug #22075] | |||
| 4 days | Stop boxing cdhash offsets | Jean Boussier | |
| Now that they're no longer an RHash instance, we don't have to box the offsets; we can directly store the raw values and stop marking them. | |||
| 4 days | Use IMEMO to store `cdhash` | Jean Boussier | |
| RHash isn't a good fit for storing `cdhash`, as this forces us to allow arbitrary hash types into RHash, which doesn't work with AR tables. It also causes the cdhash to be larger than needed. | |||
| 8 days | Replace subclasses linked list with weakref array | John Hawthorn | |
| 2026-05-06 | shapes: Rename `TOO_COMPLEX` to just `COMPLEX` | Jean Boussier | |
| The `too_` prefix wasn't consistently used and just makes the name longer for no benefit. | |||
| 2026-05-06 | shape.c: transition to complex when `max_capacity` is reached | Jean Boussier | |
| Now that we have 1024B slots, we can store up to 126 fields inline. Objects larger than this are rare if not non-existent, hence we can get rid of the `malloc` path for imemo/fields and simply transition to `TOO_COMPLEX`. This additionally allows shrinking `attr_index_t` from 16 to 8 bits. Note: the ZJIT "ivar on extended" tests are renamed as "complex" because "extended" AKA malloc allocated imemo/fields no longer exists. They're now complex fields, AKA st tables. rb_class_allocate_instance: start as complex when over max_fields If `RCLASS_MAX_IV_COUNT` is over `max_fields`, allocating a large slot only to end up transitioning to `TOO_COMPLEX` is wasteful. We might as well start as complex directly. | |||
| 2026-05-04 | vm_insnhelper.c: refactor and optimize setivar cache revalidation | Jean Boussier | |
| By only storing shape offsets in the cache, we're able to still match two equal objects that happen to be in different heaps, as well as to validate the object is neither frozen nor complex. | |||
| 2026-05-03 | Refactor shape transition functions | Jean Boussier | |
| Expose both `rb_obj_shape_` functions that take a `VALUE` and `rb_shape_` functions that take a `shape_id`. Make common transition functions such as `complex` and `frozen` inlineable. Also get rid of RB_SET_SHAPE_ID and rb_set_boxed_class_shape_id. | |||
| 2026-05-02 | jits: don't assume `attr_index_t` is u16 | Jean Boussier | |
| Extracted from: https://github.com/ruby/ruby/pull/16817 It's likely that it will be u8 soon. | |||
| 2026-05-02 | shape.c: reorganize rb_shape_tree_t | Jean Boussier | |
| Embed and shrink the capacities array so that it can be queried more efficiently. Move `next_shape_id` and `cache_size` outside of the struct on their own cache line, as they're expected to be incremented atomically from concurrent threads, so we want to avoid false sharing. They also don't need to be exposed to the rest of the VM. Get rid of `root_shape` as it's always equal to `shape_list`. | |||
| 2026-04-30 | YJIT: Replace std::mem::transmute with pointer casting | Alan Wu | |
| As the documentation puts it, transmute is "incredibly unsafe". We can express what we need to with pointer casts in these closure FFI situations, so let's use pointer casts. | |||
| 2026-04-30 | Rename `putstring` instruction as `dupstring` | Jean Boussier | |
| As well as `putchilledstring` as `dupchilledstring`. This is more consistent with the similar `duparray` and `duphash` instructions and better reflects its behavior. | |||
| 2026-04-29 | Add YJIT test for outdated comment | Jean Boussier | |
| As of Ruby 4.0, the YJIT comment isn't quite correct because we now store the reference to the `imemo/fields` object inline. This means that a Struct of precisely 78 members (max embeddable on 64bit archs) can coexist in embedded and heap shapes, as demonstrated by the added tests. If the comment was correct I would expect we could easily crash YJIT, but I'm unable to, so I suspect there is a guard I'm not seeing that already handles that. | |||
| 2026-04-21 | class.c: Make cvc_tbl a managed object | Jean Boussier | |
| [Bug #21952] Solves the double-free or use-after-free concern with boxes. Now entries can safely be used for copy-on-write. It is also likely necessary to make it safe to read cvars from secondary ractors, as allowed since: ab32c0e690b805cdaaf264ad4c3421696c588204 | |||
| 2026-04-06 | ZJIT: Guard `T_*` in addition to shape in polymorphic getivar | Alan Wu | |
| This is a8f3c34556 ("ZJIT: Add missing guard on ivar access on T_{DATA,CLASS,MODULE}") but for the polymorphic implementation in HIR build. | |||
| 2026-03-16 | YJIT: Fix not reading locals from `cfp->ep` after `YJIT.enable` and exceptional entry | Alan Wu | |
| Fix for [Bug #21941]. In case of `--yjit-disable`, YJIT only starts to record environment escapes after `RubyVM::YJIT.enable`. Previously we falsely assumed that we always have a full history all the way back to VM boot. This had YJIT install and run code that assumes EP=BP when EP≠BP for some exceptional entry into the middle of a running frame, if the environment escaped before `YJIT.enable`. The fix is to reject exceptional entry with an escaped environment. Rename things and explain in more detail how the predicate for deciding to assume EP=BP works. It's quite subtle since it reasons about all parties in the system that push a control frame and then run JIT code. Note that while can_assume_on_stack_env() checks the currently running environment if it so happens to be the one YJIT is compiling against, it can return true for any ISEQ. The check isn't necessary for fixing the bug, and the load-bearing part of this patch is the change to exceptional entries. This fix is flat on speed and space on ruby-bench headline benchmarks. Many thanks for the community effort to create a small test case for this bug. | |||
| 2026-03-03 | Shrink struct rb_callinfo to 32 bytes | John Hawthorn | |
| This shouldn't do anything right now under the default GC, but in the future (or now on MMTK?) this would allow them to be allocated from a smaller size pool. | |||
| 2026-03-03 | Expand the Shape heap index mask | Matt Valentine-House | |
| I think we have spare bits here, so let's expand them out and see what happens. This will allow us to have more than 7 size pools in the future | |||
| 2026-03-02 | ZJIT: Use LoadField for TypedData ivars (#16259) | Max Bernstein | |
| Drops C calls to `rb_ivar_get_at_no_ractor_check` out of the stats completely. Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> | |||
| 2026-02-27 | ZJIT: Use LoadField for Class/Module ivars (#16252) | Max Bernstein | |
| Assume only one box (root box) and invalidate otherwise. Drops C calls to `rb_ivar_get_at_no_ractor_check`. Before: ``` Top-20 calls to C functions from JIT code (77.3% of total 64,311,573): rb_vm_opt_send_without_block: 11,939,854 (18.6%) rb_hash_aref: 5,400,091 ( 8.4%) rb_vm_invokeblock: 4,453,357 ( 6.9%) rb_zjit_writebarrier_check_immediate: 4,279,890 ( 6.7%) rb_vm_getinstancevariable: 3,504,908 ( 5.4%) rb_vm_send: 3,103,424 ( 4.8%) rb_ivar_get_at_no_ractor_check: 2,864,766 ( 4.5%) rb_obj_is_kind_of: 2,313,479 ( 3.6%) rb_hash_aset: 1,903,359 ( 3.0%) Hash#fetch: 1,639,937 ( 2.5%) rb_vm_setinstancevariable: 1,596,791 ( 2.5%) rb_vm_opt_getconstant_path: 1,328,761 ( 2.1%) rb_jit_ary_push: 960,563 ( 1.5%) rb_ec_ary_new_from_values: 722,913 ( 1.1%) rb_class_allocate_instance: 721,483 ( 1.1%) fetch: 713,134 ( 1.1%) rb_str_buf_append: 667,545 ( 1.0%) rb_ivar_get: 585,817 ( 0.9%) rb_hash_new_with_size: 520,347 ( 0.8%) rb_vm_sendforward: 479,029 ( 0.7%) ``` After: ``` Top-20 calls to C functions from JIT code (76.5% of total 62,282,359): rb_vm_opt_send_without_block: 11,939,850 (19.2%) rb_hash_aref: 5,400,092 ( 8.7%) rb_vm_invokeblock: 4,453,357 ( 7.2%) rb_zjit_writebarrier_check_immediate: 4,279,893 ( 6.9%) rb_vm_getinstancevariable: 3,504,920 ( 5.6%) rb_vm_send: 3,103,441 ( 5.0%) rb_obj_is_kind_of: 2,313,510 ( 3.7%) rb_hash_aset: 1,903,359 ( 3.1%) Hash#fetch: 1,639,937 ( 2.6%) rb_vm_setinstancevariable: 1,596,797 ( 2.6%) rb_vm_opt_getconstant_path: 1,328,761 ( 2.1%) rb_jit_ary_push: 960,563 ( 1.5%) rb_ivar_get_at_no_ractor_check: 835,498 ( 1.3%) rb_ec_ary_new_from_values: 722,921 ( 1.2%) rb_class_allocate_instance: 721,492 ( 1.2%) fetch: 713,135 ( 1.1%) rb_str_buf_append: 667,545 ( 1.1%) rb_ivar_get: 585,815 ( 0.9%) rb_hash_new_with_size: 520,348 ( 0.8%) rb_vm_sendforward: 479,029 ( 0.8%) ``` The remaining `rb_ivar_get_at_no_ractor_check` are due to TypedData access, mostly on `Thread`. | |||
| 2026-02-27 | ZJIT: Handle splatkw YARV instruction (#16267) | Max Bernstein | |
| The most common cases are nil and hash, so just cover those. If we need to convert to a hash, we can handle that later. | |||
| 2026-02-20 | YJIT: Fix version_map use-after-free from mutable aliasing UB | Randy Stauner | |
| Multiple YJIT functions created overlapping `&'static mut IseqPayload` references by calling `get_iseq_payload()` multiple times for the same iseq. Overlapping &mut is UB in Rust's aliasing model, and as a consequence, we triggered use-after-free on the `version_map` Vec header due to false claims of LLVM `noalias`. This manifested as crashes in various YJIT operations (block lookup, GC marking, block removal) that dereference the stale pointer. Fix by moving `delayed_deallocation` and `get_or_create_version_list` from free functions (which each call `get_iseq_payload()` internally) to methods on `IseqPayload` that operate through `&mut self`. This lets callers obtain a single payload reference and use it for all operations without creating overlapping mutable borrows. The three fixed call sites: 1. `rb_yjit_tracing_invalidate_all` (invariants.rs): The loop called `delayed_deallocation()` which internally called `get_iseq_payload()`, creating a second `&mut` overlapping with the outer `payload` reference. Fix: call `payload.delayed_deallocation()` method instead. 2. `add_block_version` (core.rs): Called `get_or_create_version_list()` then later `get_iseq_payload()` for pages, creating two references. Fix: use a single `get_or_create_iseq_payload()` call then call the `get_or_create_version_list()` method on it for both version_map and pages access. Also adds regression tests exercising tracing invalidation with on-stack methods and suspended fibers. [alan: edited commit message] Reviewed-by: Alan Wu <alanwu@ruby-lang.org> | |||
| 2026-02-19 | YJIT: Register builtin CMEs before prelude to avoid prepend crash | Randy Stauner | |
| Split rb_yjit_init into rb_yjit_init_builtin_cmes (called before ruby_init_prelude) and rb_yjit_init (called after). The prelude may load bundler via BUNDLER_SETUP which can call Kernel.prepend, moving core methods to an origin iclass. Registering method codegen before the prelude ensures we capture CMEs while core classes are pristine. | |||
| 2026-02-18 | YJIT: Fix always-failing guard for `super()` in BMETHODs | Alan Wu | |
| Previously, when dealing with a `super()` nested in a block that runs as a method (through e.g. `define_method`), YJIT generated a guard that never passes leading to a misidentification of the callsite as megamorphic and an unconditional interpreter fallback. The issue was in the subroutine to find the currently running method entry. In the interpreter, this is rb_vm_frame_method_entry(). YJIT used `gen_get_lep()` to find the EP with `VM_ENV_FLAG_LOCAL`, but in case of BMETHODs, the corresponding CME is never at an EP level with `VM_ENV_FLAG_LOCAL` set. Because each block nesting level can dynamically run as either a BMETHOD or not, starting at a block and finding the first EP that has a method entry ultimately requires a search loop such as the one in rb_vm_frame_method_entry(). This patch introduces such a loop. Because `invokesuper` in a block can now work end-to-end, add check for the previously masked "implicit argument passing of super from method defined by define_method() is not supported..." condition. | |||
| 2026-02-09 | Add without_interrupts primitive attribute to skip interrupt checks | Max Bernstein | |
| 2026-02-09 | Update disassembly snapshots for capstone 0.14.0 | Max Bernstein | |
| Capstone 0.14.0 uses canonical ARM64 aliases in its disassembly output: - `orr x, xzr, #imm` is now shown as `mov x, #imm` - `.byte` sequences matching `udf` are now decoded as `udf #imm` These are cosmetic changes to the disassembly text only; the underlying machine code (verified by hex snapshots) is unchanged. Also update the root Cargo.lock which was missed by Dependabot. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | |||
| 2026-02-09 | Bump the jit group across 2 directories with 1 update | dependabot[bot] | |
| Bumps the jit group with 1 update in the /yjit directory: [capstone](https://github.com/capstone-rust/capstone-rs). Bumps the jit group with 1 update in the /zjit directory: [capstone](https://github.com/capstone-rust/capstone-rs). Updates `capstone` from 0.13.0 to 0.14.0 - [Release notes](https://github.com/capstone-rust/capstone-rs/releases) - [Changelog](https://github.com/capstone-rust/capstone-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/capstone-rust/capstone-rs/compare/capstone-v0.13.0...capstone-v0.14.0) Updates `capstone` from 0.13.0 to 0.14.0 - [Release notes](https://github.com/capstone-rust/capstone-rs/releases) - [Changelog](https://github.com/capstone-rust/capstone-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/capstone-rust/capstone-rs/compare/capstone-v0.13.0...capstone-v0.14.0) --- updated-dependencies: - dependency-name: capstone dependency-version: 0.14.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: jit - dependency-name: capstone dependency-version: 0.14.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: jit ... Signed-off-by: dependabot[bot] <support@github.com> | |||
| 2026-02-07 | Clean rust target when rustc bumped up | Nobuyoshi Nakada | |
| Fix: ``` error[E0514]: found crate `jit` compiled by an incompatible version of rustc ``` | |||
| 2026-01-31 | Fix wrong declaration of `rb_optimized_call` | Nobuyoshi Nakada | |
| `recv` is used as the input argument to `GetProcPtr`, which is a `VALUE`. Fix up ruby/ruby#6691. | |||
| 2026-01-29 | ZJIT: Handle `nil` case for `getblockparamproxy` (#15986) | Jeff Zhang | |
| Resolves https://github.com/Shopify/ruby/issues/772 Adds profiling for the `getblockparamproxy` YARV instruction and handles the `nil` block case by pushing `nil` instead of the block proxy object, improves `ratio_in_zjit` a tiny bit (0.1%) Profiling data for `getblockparamproxy` on Lobsters ``` Top-6 getblockparamproxy handler (100.0% of total 3,353,291): polymorphic: 2,337,372 (69.7%) nil: 552,629 (16.5%) iseq: 259,636 ( 7.7%) no_profiles: 156,734 ( 4.7%) proc: 40,223 ( 1.2%) megamorphic: 6,697 ( 0.2%) ``` Lobsters benchmark stats: <details> <summary>Stats before (master):</summary> <p> ``` ❯ ./run_benchmarks.rb --chruby 'ruby-zjit --zjit-stats' lobsters ***ZJIT: Printing ZJIT statistics on exit*** ... Top-20 side exit reasons (100.0% of total 15,338,024): guard_type_failure: 6,889,050 (44.9%) guard_shape_failure: 6,848,898 (44.7%) block_param_proxy_not_iseq_or_ifunc: 1,008,525 ( 6.6%) unhandled_hir_insn: 236,977 ( 1.5%) compile_error: 191,763 ( 1.3%) fixnum_mult_overflow: 50,739 ( 0.3%) block_param_proxy_modified: 28,119 ( 0.2%) patchpoint_stable_constant_names: 18,229 ( 0.1%) unhandled_newarray_send_pack: 14,481 ( 0.1%) unhandled_block_arg: 13,782 ( 0.1%) fixnum_lshift_overflow: 10,085 ( 0.1%) patchpoint_no_ep_escape: 7,815 ( 0.1%) unhandled_yarv_insn: 7,540 ( 0.0%) expandarray_failure: 4,533 ( 0.0%) guard_super_method_entry: 4,475 ( 0.0%) patchpoint_method_redefined: 1,207 ( 0.0%) patchpoint_no_singleton_class: 1,130 ( 0.0%) obj_to_string_fallback: 412 ( 0.0%) guard_less_failure: 163 ( 0.0%) interrupt: 82 ( 0.0%) ... ratio_in_zjit: 82.1% ``` </p> </details> <details> <summary>Stats after:</summary> <p> ``` ❯ ./run_benchmarks.rb --chruby 'ruby-zjit --zjit-stats' lobsters ***ZJIT: Printing ZJIT statistics on exit*** ... 
Top-20 side exit reasons (100.0% of total 15,061,422): guard_type_failure: 6,892,934 (45.8%) guard_shape_failure: 6,850,512 (45.5%) block_param_proxy_not_iseq_or_ifunc: 549,823 ( 3.7%) unhandled_hir_insn: 236,979 ( 1.6%) compile_error: 191,782 ( 1.3%) unhandled_yarv_insn: 128,695 ( 0.9%) block_param_proxy_not_nil: 68,623 ( 0.5%) fixnum_mult_overflow: 50,739 ( 0.3%) patchpoint_stable_constant_names: 18,568 ( 0.1%) unhandled_newarray_send_pack: 14,481 ( 0.1%) block_param_proxy_modified: 13,819 ( 0.1%) unhandled_block_arg: 13,798 ( 0.1%) fixnum_lshift_overflow: 10,085 ( 0.1%) patchpoint_no_ep_escape: 7,815 ( 0.1%) expandarray_failure: 4,533 ( 0.0%) guard_super_method_entry: 4,475 ( 0.0%) patchpoint_method_redefined: 1,207 ( 0.0%) obj_to_string_fallback: 1,140 ( 0.0%) patchpoint_no_singleton_class: 1,130 ( 0.0%) guard_less_failure: 163 ( 0.0%) ... ratio_in_zjit: 82.2% ``` </p> </details> | |||
| 2026-01-19 | JITs: Fix comment about ARM64 stack growth direction [ci skip] | Alan Wu | |
| 2026-01-16 | ZJIT: Specialize OPTIMIZED_METHOD_TYPE_CALL (#15859) | Nozomi Hijikata | |
| Closes: https://github.com/Shopify/ruby/issues/865 ## Benchmark ### lobsters - wall clock time - before patch: Average of last 10, non-warmup iters: 809ms - after patch: Average of last 10, non-warmup iters: 754ms - zjit stats below <details> <summary>before patch</summary> ``` ***ZJIT: Printing ZJIT statistics on exit*** Top-20 not inlined C methods (54.9% of total 18,003,698): Hash#fetch: 3,184,106 (17.7%) Regexp#match?: 707,148 ( 3.9%) Hash#key?: 689,879 ( 3.8%) String#sub!: 489,841 ( 2.7%) Array#include?: 470,648 ( 2.6%) Set#include?: 397,520 ( 2.2%) String#<<: 396,279 ( 2.2%) String#start_with?: 373,666 ( 2.1%) Kernel#dup: 352,617 ( 2.0%) Array#any?: 350,454 ( 1.9%) Hash#delete: 331,784 ( 1.8%) String.new: 307,248 ( 1.7%) Integer#===: 262,336 ( 1.5%) Symbol#end_with?: 255,538 ( 1.4%) Kernel#is_a?: 247,292 ( 1.4%) Process.clock_gettime: 221,588 ( 1.2%) Integer#>: 219,718 ( 1.2%) String#match?: 216,903 ( 1.2%) String#downcase: 213,108 ( 1.2%) Integer#<=: 202,617 ( 1.1%) Top-20 calls to C functions from JIT code (80.3% of total 130,255,689): rb_vm_opt_send_without_block: 28,329,698 (21.7%) rb_hash_aref: 8,992,191 ( 6.9%) rb_vm_env_write: 8,526,087 ( 6.5%) rb_vm_send: 8,337,448 ( 6.4%) rb_zjit_writebarrier_check_immediate: 7,809,310 ( 6.0%) rb_obj_is_kind_of: 6,098,929 ( 4.7%) rb_vm_getinstancevariable: 5,783,055 ( 4.4%) rb_vm_invokesuper: 5,038,443 ( 3.9%) rb_ivar_get_at_no_ractor_check: 4,762,093 ( 3.7%) rb_ary_entry: 4,283,966 ( 3.3%) rb_hash_aset: 2,429,862 ( 1.9%) rb_vm_setinstancevariable: 2,343,571 ( 1.8%) rb_vm_opt_getconstant_path: 2,284,810 ( 1.8%) Hash#fetch: 1,778,515 ( 1.4%) fetch: 1,405,591 ( 1.1%) rb_vm_invokeblock: 1,381,332 ( 1.1%) rb_str_buf_append: 1,362,272 ( 1.0%) rb_ec_ary_new_from_values: 1,324,997 ( 1.0%) rb_class_allocate_instance: 1,288,936 ( 1.0%) rb_hash_new_with_size: 998,628 ( 0.8%) Top-2 not optimized method types for send (100.0% of total 4,896,274): iseq: 4,893,452 (99.9%) null: 2,822 ( 0.1%) Top-4 not optimized method types for 
send_without_block (100.0% of total 782,296): optimized_send: 479,562 (61.3%) optimized_call: 256,609 (32.8%) null: 41,967 ( 5.4%) optimized_block_call: 4,158 ( 0.5%) Top-4 instructions with uncategorized fallback reason (100.0% of total 7,250,555): invokesuper: 5,038,443 (69.5%) invokeblock: 1,381,332 (19.1%) sendforward: 798,924 (11.0%) opt_send_without_block: 31,856 ( 0.4%) Top-18 send fallback reasons (100.0% of total 43,885,845): send_without_block_polymorphic: 18,533,639 (42.2%) uncategorized: 7,250,555 (16.5%) send_not_optimized_method_type: 4,896,274 (11.2%) send_without_block_no_profiles: 4,741,871 (10.8%) send_no_profiles: 2,865,577 ( 6.5%) one_or_more_complex_arg_pass: 2,825,240 ( 6.4%) send_without_block_not_optimized_method_type_optimized: 740,329 ( 1.7%) send_without_block_megamorphic: 709,818 ( 1.6%) send_polymorphic: 541,186 ( 1.2%) send_without_block_not_optimized_need_permission: 382,622 ( 0.9%) too_many_args_for_lir: 173,244 ( 0.4%) argc_param_mismatch: 50,382 ( 0.1%) send_without_block_not_optimized_method_type: 41,967 ( 0.1%) send_without_block_cfunc_array_variadic: 36,302 ( 0.1%) obj_to_string_not_string: 34,169 ( 0.1%) send_without_block_direct_keyword_mismatch: 32,436 ( 0.1%) send_megamorphic: 28,613 ( 0.1%) ccall_with_frame_too_many_args: 1,621 ( 0.0%) Top-4 setivar fallback reasons (100.0% of total 2,343,571): not_monomorphic: 2,120,856 (90.5%) not_t_object: 125,163 ( 5.3%) too_complex: 97,531 ( 4.2%) new_shape_needs_extension: 21 ( 0.0%) Top-2 getivar fallback reasons (100.0% of total 5,908,168): not_monomorphic: 5,658,909 (95.8%) too_complex: 249,259 ( 4.2%) Top-3 definedivar fallback reasons (100.0% of total 405,079): not_monomorphic: 397,150 (98.0%) too_complex: 5,122 ( 1.3%) not_t_object: 2,807 ( 0.7%) Top-6 invokeblock handler (100.0% of total 1,381,332): monomorphic_iseq: 685,359 (49.6%) polymorphic: 521,992 (37.8%) monomorphic_other: 104,640 ( 7.6%) monomorphic_ifunc: 55,505 ( 4.0%) no_profiles: 9,164 ( 0.7%) megamorphic: 4,672 ( 
0.3%) Top-9 popular complex argument-parameter features not optimized (100.0% of total 3,097,538): param_kw_opt: 1,333,367 (43.0%) param_block: 632,885 (20.4%) param_forwardable: 600,601 (19.4%) param_rest: 329,020 (10.6%) param_kwrest: 119,971 ( 3.9%) caller_kw_splat: 39,001 ( 1.3%) caller_splat: 36,785 ( 1.2%) caller_blockarg: 5,798 ( 0.2%) caller_kwarg: 110 ( 0.0%) Top-1 compile error reasons (100.0% of total 186,900): exception_handler: 186,900 (100.0%) Top-7 unhandled YARV insns (100.0% of total 186,598): getblockparam: 99,414 (53.3%) invokesuperforward: 81,667 (43.8%) setblockparam: 2,837 ( 1.5%) getconstant: 1,537 ( 0.8%) checkmatch: 616 ( 0.3%) expandarray: 360 ( 0.2%) once: 167 ( 0.1%) Top-3 unhandled HIR insns (100.0% of total 236,962): throw: 198,474 (83.8%) invokebuiltin: 35,767 (15.1%) array_max: 2,721 ( 1.1%) Top-19 side exit reasons (100.0% of total 15,427,184): guard_type_failure: 6,865,696 (44.5%) guard_shape_failure: 6,779,586 (43.9%) block_param_proxy_not_iseq_or_ifunc: 1,030,319 ( 6.7%) unhandled_hir_insn: 236,962 ( 1.5%) compile_error: 186,900 ( 1.2%) unhandled_yarv_insn: 186,598 ( 1.2%) fixnum_mult_overflow: 50,739 ( 0.3%) block_param_proxy_modified: 28,119 ( 0.2%) patchpoint_no_singleton_class: 14,903 ( 0.1%) unhandled_newarray_send_pack: 14,481 ( 0.1%) fixnum_lshift_overflow: 10,085 ( 0.1%) patchpoint_stable_constant_names: 9,198 ( 0.1%) patchpoint_no_ep_escape: 7,815 ( 0.1%) expandarray_failure: 4,533 ( 0.0%) patchpoint_method_redefined: 662 ( 0.0%) obj_to_string_fallback: 277 ( 0.0%) guard_less_failure: 163 ( 0.0%) interrupt: 128 ( 0.0%) guard_greater_eq_failure: 20 ( 0.0%) send_count: 151,233,937 dynamic_send_count: 43,885,845 (29.0%) optimized_send_count: 107,348,092 (71.0%) dynamic_setivar_count: 2,343,571 ( 1.5%) dynamic_getivar_count: 5,908,168 ( 3.9%) dynamic_definedivar_count: 405,079 ( 0.3%) iseq_optimized_send_count: 37,324,023 (24.7%) inline_cfunc_optimized_send_count: 46,056,028 (30.5%) inline_iseq_optimized_send_count: 
3,756,875 ( 2.5%) non_variadic_cfunc_optimized_send_count: 11,618,909 ( 7.7%) variadic_cfunc_optimized_send_count: 8,592,257 ( 5.7%) compiled_iseq_count: 5,289 failed_iseq_count: 0 compile_time: 1,664ms profile_time: 13ms gc_time: 20ms invalidation_time: 479ms vm_write_pc_count: 127,571,422 vm_write_sp_count: 127,571,422 vm_write_locals_count: 122,781,971 vm_write_stack_count: 122,781,971 vm_write_to_parent_iseq_local_count: 689,945 vm_read_from_parent_iseq_local_count: 14,721,820 guard_type_count: 167,633,896 guard_type_exit_ratio: 4.1% guard_shape_count: 0 code_region_bytes: 38,912,000 zjit_alloc_bytes: 40,542,102 total_mem_bytes: 79,454,102 side_exit_count: 15,427,184 total_insn_count: 927,373,567 vm_insn_count: 156,976,359 zjit_insn_count: 770,397,208 ratio_in_zjit: 83.1% ``` </details> <details> <summary>after patch</summary> ``` ***ZJIT: Printing ZJIT statistics on exit*** Top-20 not inlined C methods (55.0% of total 18,012,630): Hash#fetch: 3,184,101 (17.7%) Regexp#match?: 707,150 ( 3.9%) Hash#key?: 689,871 ( 3.8%) String#sub!: 489,841 ( 2.7%) Array#include?: 470,648 ( 2.6%) Set#include?: 397,520 ( 2.2%) String#<<: 396,279 ( 2.2%) String#start_with?: 382,538 ( 2.1%) Kernel#dup: 352,617 ( 2.0%) Array#any?: 350,454 ( 1.9%) Hash#delete: 331,802 ( 1.8%) String.new: 307,248 ( 1.7%) Integer#===: 262,336 ( 1.5%) Symbol#end_with?: 255,540 ( 1.4%) Kernel#is_a?: 247,292 ( 1.4%) Process.clock_gettime: 221,588 ( 1.2%) Integer#>: 219,718 ( 1.2%) String#match?: 216,905 ( 1.2%) String#downcase: 213,107 ( 1.2%) Integer#<=: 202,617 ( 1.1%) Top-20 calls to C functions from JIT code (80.1% of total 130,218,934): rb_vm_opt_send_without_block: 28,073,153 (21.6%) rb_hash_aref: 8,992,167 ( 6.9%) rb_vm_env_write: 8,526,089 ( 6.5%) rb_vm_send: 8,337,453 ( 6.4%) rb_zjit_writebarrier_check_immediate: 7,786,426 ( 6.0%) rb_obj_is_kind_of: 6,098,927 ( 4.7%) rb_vm_getinstancevariable: 5,783,053 ( 4.4%) rb_vm_invokesuper: 5,038,444 ( 3.9%) rb_ivar_get_at_no_ractor_check: 4,762,093 ( 3.7%) 
rb_ary_entry: 4,283,965 ( 3.3%) rb_hash_aset: 2,429,864 ( 1.9%) rb_vm_setinstancevariable: 2,343,573 ( 1.8%) rb_vm_opt_getconstant_path: 2,284,809 ( 1.8%) Hash#fetch: 1,778,510 ( 1.4%) fetch: 1,405,591 ( 1.1%) rb_vm_invokeblock: 1,381,329 ( 1.1%) rb_str_buf_append: 1,362,272 ( 1.0%) rb_ec_ary_new_from_values: 1,325,005 ( 1.0%) rb_class_allocate_instance: 1,288,944 ( 1.0%) rb_hash_new_with_size: 998,629 ( 0.8%) Top-2 not optimized method types for send (100.0% of total 4,896,276): iseq: 4,893,454 (99.9%) null: 2,822 ( 0.1%) Top-3 not optimized method types for send_without_block (100.0% of total 525,687): optimized_send: 479,562 (91.2%) null: 41,967 ( 8.0%) optimized_block_call: 4,158 ( 0.8%) Top-4 instructions with uncategorized fallback reason (100.0% of total 7,250,556): invokesuper: 5,038,444 (69.5%) invokeblock: 1,381,329 (19.1%) sendforward: 798,924 (11.0%) opt_send_without_block: 31,859 ( 0.4%) Top-18 send fallback reasons (100.0% of total 43,629,303): send_without_block_polymorphic: 18,533,669 (42.5%) uncategorized: 7,250,556 (16.6%) send_not_optimized_method_type: 4,896,276 (11.2%) send_without_block_no_profiles: 4,741,899 (10.9%) send_no_profiles: 2,865,579 ( 6.6%) one_or_more_complex_arg_pass: 2,825,242 ( 6.5%) send_without_block_megamorphic: 709,818 ( 1.6%) send_polymorphic: 541,187 ( 1.2%) send_without_block_not_optimized_method_type_optimized: 483,720 ( 1.1%) send_without_block_not_optimized_need_permission: 382,623 ( 0.9%) too_many_args_for_lir: 173,244 ( 0.4%) argc_param_mismatch: 50,382 ( 0.1%) send_without_block_not_optimized_method_type: 41,967 ( 0.1%) send_without_block_cfunc_array_variadic: 36,302 ( 0.1%) obj_to_string_not_string: 34,169 ( 0.1%) send_without_block_direct_keyword_mismatch: 32,436 ( 0.1%) send_megamorphic: 28,613 ( 0.1%) ccall_with_frame_too_many_args: 1,621 ( 0.0%) Top-4 setivar fallback reasons (100.0% of total 2,343,573): not_monomorphic: 2,120,858 (90.5%) not_t_object: 125,163 ( 5.3%) too_complex: 97,531 ( 4.2%) 
new_shape_needs_extension: 21 ( 0.0%) Top-2 getivar fallback reasons (100.0% of total 5,908,165): not_monomorphic: 5,658,912 (95.8%) too_complex: 249,253 ( 4.2%) Top-3 definedivar fallback reasons (100.0% of total 405,079): not_monomorphic: 397,150 (98.0%) too_complex: 5,122 ( 1.3%) not_t_object: 2,807 ( 0.7%) Top-6 invokeblock handler (100.0% of total 1,381,329): monomorphic_iseq: 685,363 (49.6%) polymorphic: 521,984 (37.8%) monomorphic_other: 104,640 ( 7.6%) monomorphic_ifunc: 55,505 ( 4.0%) no_profiles: 9,164 ( 0.7%) megamorphic: 4,673 ( 0.3%) Top-9 popular complex argument-parameter features not optimized (100.0% of total 3,094,719): param_kw_opt: 1,333,367 (43.1%) param_block: 632,886 (20.5%) param_forwardable: 600,605 (19.4%) param_rest: 329,019 (10.6%) param_kwrest: 119,971 ( 3.9%) caller_kw_splat: 39,001 ( 1.3%) caller_splat: 33,962 ( 1.1%) caller_blockarg: 5,798 ( 0.2%) caller_kwarg: 110 ( 0.0%) Top-1 compile error reasons (100.0% of total 186,917): exception_handler: 186,917 (100.0%) Top-7 unhandled YARV insns (100.0% of total 186,598): getblockparam: 99,414 (53.3%) invokesuperforward: 81,667 (43.8%) setblockparam: 2,837 ( 1.5%) getconstant: 1,537 ( 0.8%) checkmatch: 616 ( 0.3%) expandarray: 360 ( 0.2%) once: 167 ( 0.1%) Top-3 unhandled HIR insns (100.0% of total 236,969): throw: 198,475 (83.8%) invokebuiltin: 35,773 (15.1%) array_max: 2,721 ( 1.1%) Top-19 side exit reasons (100.0% of total 15,450,102): guard_type_failure: 6,888,596 (44.6%) guard_shape_failure: 6,779,586 (43.9%) block_param_proxy_not_iseq_or_ifunc: 1,030,319 ( 6.7%) unhandled_hir_insn: 236,969 ( 1.5%) compile_error: 186,917 ( 1.2%) unhandled_yarv_insn: 186,598 ( 1.2%) fixnum_mult_overflow: 50,739 ( 0.3%) block_param_proxy_modified: 28,119 ( 0.2%) patchpoint_no_singleton_class: 14,903 ( 0.1%) unhandled_newarray_send_pack: 14,481 ( 0.1%) fixnum_lshift_overflow: 10,085 ( 0.1%) patchpoint_stable_constant_names: 9,198 ( 0.1%) patchpoint_no_ep_escape: 7,815 ( 0.1%) expandarray_failure: 4,533 ( 
0.0%) patchpoint_method_redefined: 662 ( 0.0%) obj_to_string_fallback: 277 ( 0.0%) guard_less_failure: 163 ( 0.0%) interrupt: 122 ( 0.0%) guard_greater_eq_failure: 20 ( 0.0%) send_count: 150,986,368 dynamic_send_count: 43,629,303 (28.9%) optimized_send_count: 107,357,065 (71.1%) dynamic_setivar_count: 2,343,573 ( 1.6%) dynamic_getivar_count: 5,908,165 ( 3.9%) dynamic_definedivar_count: 405,079 ( 0.3%) iseq_optimized_send_count: 37,324,039 (24.7%) inline_cfunc_optimized_send_count: 46,056,046 (30.5%) inline_iseq_optimized_send_count: 3,756,881 ( 2.5%) non_variadic_cfunc_optimized_send_count: 11,618,958 ( 7.7%) variadic_cfunc_optimized_send_count: 8,601,141 ( 5.7%) compiled_iseq_count: 5,289 failed_iseq_count: 0 compile_time: 1,700ms profile_time: 13ms gc_time: 21ms invalidation_time: 519ms vm_write_pc_count: 127,557,549 vm_write_sp_count: 127,557,549 vm_write_locals_count: 122,768,084 vm_write_stack_count: 122,768,084 vm_write_to_parent_iseq_local_count: 689,953 vm_read_from_parent_iseq_local_count: 14,730,705 guard_type_count: 167,853,730 guard_type_exit_ratio: 4.1% guard_shape_count: 0 code_region_bytes: 38,928,384 zjit_alloc_bytes: 41,103,415 total_mem_bytes: 80,031,799 side_exit_count: 15,450,102 total_insn_count: 927,432,364 vm_insn_count: 157,182,251 zjit_insn_count: 770,250,113 ratio_in_zjit: 83.1% ``` </details> | |||
| 2026-01-14 | ZJIT: Optimize common `invokesuper` cases (#15816) | Kevin Menard | |
| * ZJIT: Profile `invokesuper` instructions * ZJIT: Introduce the `InvokeSuperDirect` HIR instruction The new instruction is an optimized version of `InvokeSuper` when we know the `super` target is an ISEQ. * ZJIT: Expand definition of unspecializable to more complex cases * ZJIT: Ensure `invokesuper` optimization works when the inheritance hierarchy is modified * ZJIT: Simplify `invokesuper` specialization to most common case Looking at ruby-bench, most `super` calls don't pass a block, which means we can use the already optimized `SendWithoutBlockDirect`. * ZJIT: Track `super` method entries directly to avoid GC issues Because the method entry isn't typed as a `VALUE`, we set up barriers on its `VALUE` fields. But, that was insufficient as the method entry itself could be collected in certain cases, resulting in dangling objects. Now we track the method entry as a `VALUE` and can more naturally mark it and its children. * ZJIT: Optimize `super` calls with simple argument forms * ZJIT: Report the reason why we can't optimize an `invokesuper` instance * ZJIT: Revise send fallback reasons for `super` calls * ZJIT: Assert `super` calls are `FCALL` and don't need visibility checks | |||
| 2026-01-14 | YJIT: A64: In CPopAll, pop into the register before using MSR | Alan Wu | |
| Or else we put garbage into the flags. | |||
| 2026-01-14 | YJIT: Properly preserve register mapping in cpush_all() and cpop_all() | Alan Wu | |
| Previously, cpop_all() did not in fact restore the register mapping state since it was effectively doing a no-op `self.ctx.set_reg_mapping(self.ctx.get_reg_mapping())`. This desync in bookkeeping led to issues with the --yjit-dump-insns option because print_str() used to use cpush_all() and cpop_all(). | |||
| 2026-01-14 | YJIT: Fix --yjit-dump-insns by removing {cpush,cpop}_all() in printers | Alan Wu | |
| cpush_all() and cpop_all() in theory enabled these `print_*` utilities to work in more spots, but with automatic spilling in asm.ccall(), the benefits are now limited. They also have a bug at the moment. Stop using them to dodge the bug. | |||
| 2026-01-09 | YJIT: Add frozen guard for struct aset (#15835) | Max Bernstein | |
| We used to just skip this check (oops), but we should not allow modifying frozen objects. | |||
| 2026-01-09 | YJIT: gen_struct_aset check for frozen status | Jean Boussier | |
| 2025-12-26 | Remove taintedness/trustedness enums/macros deprecated for 4 years | Nobuyoshi Nakada | |
| 2025-12-25 | Implement declaring weak references | Peter Zhu | |
| [Feature #21084] # Summary The current way of marking weak references uses `rb_gc_mark_weak(VALUE *ptr)`. This presents challenges because Ruby's GC is incremental, meaning that if the `ptr` changes (e.g. realloc'd or free'd), then we could have an invalid memory access. This also sets `*ptr = Qundef` if the object at `*ptr` is dead, which prevents any cleanup from being run (e.g. freeing memory or deleting entries from hash tables). This ticket proposes `rb_gc_declare_weak_references` which declares that an object has weak references and calls a cleanup function after marking, allowing the object to clean up any memory for dead objects. # Introduction In [[Feature #19783]](https://bugs.ruby-lang.org/issues/19783), I introduced an API allowing objects to mark weak references; the function signature looks like this: ```c void rb_gc_mark_weak(VALUE *ptr); ``` `rb_gc_mark_weak` is called during the marking phase of the GC to specify that the memory at `ptr` holds a pointer to a Ruby object that is weakly referenced. `rb_gc_mark_weak` appends this pointer to a list that is processed after the marking phase of the GC. If the object at `*ptr` is no longer alive, then it overwrites the object reference with a special value (`*ptr = Qundef`). However, this API resulted in two challenges: 1. Ruby's default GC is incremental, which means that the GC is not run in one phase, but rather split into chunks of work that interleave with Ruby execution. The `ptr` passed into `rb_gc_mark_weak` could be on the malloc heap, and that memory could be realloc'd or even free'd. We had to use workarounds such as `rb_gc_remove_weak` to ensure that there were no illegal memory accesses. This made `rb_gc_mark_weak` difficult to use, impacted runtime performance, and increased memory usage. 2. When an object dies, `rb_gc_mark_weak` only overwrites the reference with `Qundef`. This means that if we want to do any cleanup (e.g. 
free a piece of memory or delete a hash table entry), we could not do that and had to defer this process elsewhere (e.g. during marking or runtime). In this ticket, I'm proposing a new API for weak references. Instead of an object marking its weak references during the marking phase, the object declares that it has weak references using the `rb_gc_declare_weak_references` function. This declaration occurs during runtime (e.g. after the object has been created) rather than during GC. After an object declares that it has weak references, it will have its callback function called after marking as long as that object is alive. This callback function can then call a special function `rb_gc_handle_weak_references_alive_p` to determine whether its references are alive. This will allow the callback function to do whatever it wants on the object, allowing it to perform any cleanup work it needs. This significantly simplifies the code for `ObjectSpace::WeakMap` and `ObjectSpace::WeakKeyMap` because it no longer needs to have the workarounds for the limitations of `rb_gc_mark_weak`. # Performance The performance results below demonstrate that `ObjectSpace::WeakMap#[]=` is now about 60% faster because the implementation has been simplified and the number of allocations has been reduced. We can see that there is not a significant impact on the performance of `ObjectSpace::WeakMap#[]`. 
Base: ``` ObjectSpace::WeakMap#[]= 4.620M (± 6.4%) i/s (216.44 ns/i) - 23.342M in 5.072149s ObjectSpace::WeakMap#[] 30.967M (± 1.9%) i/s (32.29 ns/i) - 154.998M in 5.007157s ``` Branch: ``` ObjectSpace::WeakMap#[]= 7.336M (± 2.8%) i/s (136.31 ns/i) - 36.755M in 5.013983s ObjectSpace::WeakMap#[] 30.902M (± 5.4%) i/s (32.36 ns/i) - 155.901M in 5.064060s ``` Code: ``` require "bundler/inline" gemfile do source "https://rubygems.org" gem "benchmark-ips" end wmap = ObjectSpace::WeakMap.new key = Object.new val = Object.new wmap[key] = val Benchmark.ips do |x| x.report("ObjectSpace::WeakMap#[]=") do |times| i = 0 while i < times wmap[Object.new] = Object.new i += 1 end end x.report("ObjectSpace::WeakMap#[]") do |times| i = 0 while i < times wmap[key] wmap[val] # does not exist i += 1 end end end ``` # Alternative designs Currently, `rb_gc_declare_weak_references` is designed to be an internal-only API. This allows us to assume the object types that call `rb_gc_declare_weak_references`. In the future, if we want to open up this API to third parties, we may want to change this function to something like: ```c void rb_gc_add_cleaner(VALUE obj, void (*callback)(VALUE obj)); ``` This will allow the third party to implement a custom `callback` that gets called after the marking phase of GC to clean up any dead references. I chose not to implement this design because it is less efficient as we would need to store a mapping from `obj` to `callback`, which requires extra memory. | |||
| 2025-12-18 | JIT: Move EC offsets to jit_bindgen_constants | John Hawthorn | |
| Co-authored-by: Alan Wu <alanwu@ruby-lang.org> | |||
| 2025-12-18 | Co-authored-by: Luke Gruber <luke.gru@gmail.com> | John Hawthorn | |
| Co-authored-by: Alan Wu <alanwu@ruby-lang.org> YJIT: Support calling bmethods in Ractors Co-authored-by: Luke Gruber <luke.gru@gmail.com> Suggestion from Alan | |||
| 2025-12-18 | YJIT: Support calling bmethods in Ractors | John Hawthorn | |
| Co-authored-by: Luke Gruber <luke.gru@gmail.com> | |||
| 2025-12-17 | JITs: Pass down GNU make jobserver resources when appropriate | Alan Wu | |
| To fix warnings from rustc on e.g. Make 4.3, which is in Ubuntu 24.04: > warning: failed to connect to jobserver from environment variable | |||
| 2025-12-16 | YJIT: Print `Rc` strong and weak count on assert failure | Alan Wu | |
| For <https://bugs.ruby-lang.org/issues/21716>, the panic looks like some sort of third party memory corruption, with YJIT taking the fall. At the point of this assert, the assembler has dropped, so there's nothing in YJIT's code other than JITState that could be holding on to these transient `PendingBranchRef`. A strong count of more than a handful or a non-zero weak count shows that someone in the process (likely some native extension) corrupted the Rc's counts. | |||
| 2025-12-15 | YJIT: Bail out if proc would be stored above stack top | Randy Stauner | |
| Fixes [Bug #21266]. | |||
| 2025-12-12 | YJIT: Fix panic from overly loose filtering in identity method inlining | Alan Wu | |
| Credits to @rwstauner for noticing this issue in GH-15533. | |||
| 2025-12-12 | YJIT: Add missing local variable type update for fallback setlocal blocks | Alan Wu | |
| Previously, the chain_depth>0 version of setlocal blocks did not update the type of the local variable in the context. This can leave the context with stale type information and trigger panics like in [Bug #21772] or lead to miscompilation. To trigger the issue, YJIT needs to see the same ISEQ before and after environment escape and have tracked type info before the escape. To trigger in ISEQs that do not send with a block, it probably requires Kernel#binding or the use of include/ruby/debug.h APIs. | |||
| 2025-12-10 | JITs: Drop cargo and use just rustc for release combo build | Alan Wu | |
| So we don't expose builders to network flakiness which cannot be worked around using cargo's --offline flag. | |||
