summaryrefslogtreecommitdiff
path: root/zjit/src/codegen.rs
AgeCommit message (Collapse)Author
6 hoursZJIT: Optimize common `invokesuper` cases (#15816)Kevin Menard
* ZJIT: Profile `invokesuper` instructions * ZJIT: Introduce the `InvokeSuperDirect` HIR instruction The new instruction is an optimized version of `InvokeSuper` when we know the `super` target is an ISEQ. * ZJIT: Expand definition of unspecializable to more complex cases * ZJIT: Ensure `invokesuper` optimization works when the inheritance hierarchy is modified * ZJIT: Simplify `invokesuper` specialization to most common case Looking at ruby-bench, most `super` calls don't pass a block, which means we can use the already optimized `SendWithoutBlockDirect`. * ZJIT: Track `super` method entries directly to avoid GC issues Because the method entry isn't typed as a `VALUE`, we set up barriers on its `VALUE` fields. But, that was insufficient as the method entry itself could be collected in certain cases, resulting in dangling objects. Now we track the method entry as a `VALUE` and can more naturally mark it and its children. * ZJIT: Optimize `super` calls with simple argument forms * ZJIT: Report the reason why we can't optimize an `invokesuper` instance * ZJIT: Revise send fallback reasons for `super` calls * ZJIT: Assert `super` calls are `FCALL` and don't need visibily checks
2 daysZJIT: Check arg limit before pushing SendWithoutBLockDirect insn (#15854)Randy Stauner
This reduces some processing and makes the HIR more accurate.
2 daysZJIT: Optimize Integer#[]Max Bernstein
This is used a lot in optcarrot.
6 daysZJIT: Replace GuardShape with LoadField+GuardBitEquals (#15821)Max Bernstein
GuardShape is just load+guard, so use the existing HIR instructions for load+guard. Probably makes future analysis slightly easier.
7 daysZJIT: Add ArrayAset instruction to HIR (#15747)Nozomi Hijikata
Inline `Array#[]=` into `ArrayAset`.
2025-12-18JIT: Move EC offsets to jit_bindgen_constantsJohn Hawthorn
Co-authored-by: Alan Wu <alanwu@ruby-lang.org>
2025-12-16ZJIT: Use rb_zjit_writebarrier_check_immediate() instead of ↵Benoit Daloze
rb_gc_writebarrier() in gen_write_barrier() * To avoid calling rb_gc_writebarrier() with an immediate value in gen_write_barrier(), and avoid the LIR jump issue.
2025-12-16Revert "ZJIT: Do not call rb_gc_writebarrier() with an immediate value in ↵Benoit Daloze
gen_write_barrier()" * This reverts commit 623559faa3dd0927b4034a752226a30ae8821604. * There is an issue with the jump in LIR, see https://github.com/ruby/ruby/pull/15542.
2025-12-16ZJIT: Do not call rb_gc_writebarrier() with an immediate value in ↵Benoit Daloze
gen_write_barrier()
2025-12-16Revert "ZJIT: Allow ccalls above 7 arguments"Alan Wu
This reverts commit 2f151e76b5dc578026706b31f054d5caf5374b05. The SP decrement (push) before the call do not match up with the pops after the call, so registers were restored incorrectly. Code from: ./miniruby --zjit-call-threshold=1 --zjit-dump-disasm -e 'p Time.new(1992, 9, 23, 23, 0, 0, :std)' str x11, [sp, #-0x10]! str x12, [sp, #-0x10]! stur x7, [sp] # last argument mov x0, x20 mov x7, x6 mov x6, x5 mov x5, x4 mov x4, x3 mov x3, x2 mov x2, x1 ldur x1, [x29, #-0x20] mov x16, #0xccfc movk x16, #0x2e7, lsl #16 movk x16, #1, lsl #32 blr x16 ldr x12, [sp], #0x10 # supposed to match str x12, [sp, #-0x10]!, but got last argument ldr x11, [sp], #0x10
2025-12-13ZJIT: Nil-fill locals in direct send (#15536)Randy Stauner
Avoid garbage reads from locals in eval. Before the fix the test fails with <"[\"x\", \"x\", \"x\", \"x\"]"> expected but was <"[\"x\", \"x\", \"x\", \"x286326928\"]">.
2025-12-12ZJIT: Allow ccalls above 7 arguments (#15312)Aiden Fox Ivey
ZJIT: Add stack support for CCalls
2025-12-12ZJIT: Inline `Hash#[]=`Stan Lo
2025-12-12ZJIT: Add Shape type to HIR (#15528)Max Bernstein
It's just a nicety (they fit fine as CUInt32) but this makes printing look nicer in real execution and also in tests (helps with #15489). Co-authored-by: Randy Stauner <randy@r4s6.net>
2025-12-10ZJIT: Re-compile ISEQs invalidated by PatchPoint (#15459)Takashi Kokubun
2025-12-10ZJIT: Use inline format args (#15482)Alex Rocha
2025-12-09ZJIT: Put keyword bits in callee frame rather than c_argsRandy Stauner
2025-12-09ZJIT: Handle caller_kwarg in direct send when all keyword params are requiredRandy Stauner
2025-12-09ZJIT: Support opt_newarray_send with PACK_BUFFERMax Bernstein
2025-12-09ZJIT: Add codegen for FixnumDiv (#15452)Abrar Habib
Fixes https://github.com/Shopify/ruby/issues/902 This pull request adds code generation for dividing fixnums. Testing confirms the normal case, flooring, and side-exiting on division by zero.
2025-12-08ZJIT: Avoid redundant SP save in codegen (#15448)Stan Lo
2025-12-03ZJIT: Optimize setivar with shape transition (#15375)Max Bernstein
Since we do a decent job of pre-sizing objects, don't handle the case where we would need to re-size an object. Also don't handle too-complex shapes. lobsters stats before: ``` Top-20 calls to C functions from JIT code (79.4% of total 90,051,140): rb_vm_opt_send_without_block: 19,762,433 (21.9%) rb_vm_setinstancevariable: 7,698,314 ( 8.5%) rb_hash_aref: 6,767,461 ( 7.5%) rb_vm_env_write: 5,373,080 ( 6.0%) rb_vm_send: 5,049,229 ( 5.6%) rb_vm_getinstancevariable: 4,535,259 ( 5.0%) rb_obj_is_kind_of: 3,746,306 ( 4.2%) rb_ivar_get_at_no_ractor_check: 3,745,237 ( 4.2%) rb_vm_invokesuper: 3,037,467 ( 3.4%) rb_ary_entry: 2,351,983 ( 2.6%) rb_vm_opt_getconstant_path: 1,344,740 ( 1.5%) rb_vm_invokeblock: 1,184,474 ( 1.3%) Hash#[]=: 1,064,288 ( 1.2%) rb_gc_writebarrier: 1,006,972 ( 1.1%) rb_ec_ary_new_from_values: 902,687 ( 1.0%) fetch: 898,667 ( 1.0%) rb_str_buf_append: 833,787 ( 0.9%) rb_class_allocate_instance: 822,024 ( 0.9%) Hash#fetch: 699,580 ( 0.8%) _bi20: 682,068 ( 0.8%) Top-4 setivar fallback reasons (100.0% of total 7,732,326): shape_transition: 6,032,109 (78.0%) not_monomorphic: 1,469,300 (19.0%) not_t_object: 172,636 ( 2.2%) too_complex: 58,281 ( 0.8%) ``` lobsters stats after: ``` Top-20 calls to C functions from JIT code (79.0% of total 88,322,656): rb_vm_opt_send_without_block: 19,777,880 (22.4%) rb_hash_aref: 6,771,589 ( 7.7%) rb_vm_env_write: 5,372,789 ( 6.1%) rb_gc_writebarrier: 5,195,527 ( 5.9%) rb_vm_send: 5,049,145 ( 5.7%) rb_vm_getinstancevariable: 4,538,485 ( 5.1%) rb_obj_is_kind_of: 3,746,241 ( 4.2%) rb_ivar_get_at_no_ractor_check: 3,745,172 ( 4.2%) rb_vm_invokesuper: 3,037,157 ( 3.4%) rb_ary_entry: 2,351,968 ( 2.7%) rb_vm_setinstancevariable: 1,703,337 ( 1.9%) rb_vm_opt_getconstant_path: 1,344,730 ( 1.5%) rb_vm_invokeblock: 1,184,290 ( 1.3%) Hash#[]=: 1,061,868 ( 1.2%) rb_ec_ary_new_from_values: 902,666 ( 1.0%) fetch: 898,666 ( 1.0%) rb_str_buf_append: 833,784 ( 0.9%) rb_class_allocate_instance: 821,778 ( 0.9%) Hash#fetch: 755,913 ( 0.9%) Top-4 setivar fallback reasons (100.0% of total 1,703,337): not_monomorphic: 1,472,405 (86.4%) not_t_object: 172,629 (10.1%) too_complex: 58,281 ( 3.4%) new_shape_needs_extension: 22 ( 0.0%) ``` I also noticed that primitive printing in HIR was broken so I fixed that. Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2025-12-03ZJIT: Optimize NewArray to use rb_ec_ary_new_from_values (#15391)Goshanraj Govindaraj
2025-12-02ZJIT: Optimize GetIvar for non-T_OBJECTBenoit Daloze
* All Invariant::SingleRactorMode PatchPoint are replaced by assume_single_ractor_mode() to fix https://github.com/Shopify/ruby/issues/875 for SingleRactorMode patchpoints.
2025-12-01ZJIT: Open-code String#getbyteMax Bernstein
Don't call a C function.
2025-12-01ZJIT: Specialize Integer#>>Max Bernstein
Same as Integer#>>. Also add more strict type checks for both Integer#>> and Integer#<<.
2025-12-01ZJIT: Specialize String#<< with FixnumMax Bernstein
Append a codepoint.
2025-12-01ZJIT: Standardize method dispatch insns' `recv` field (#15334)Stan Lo
ZJIT: Standardize C call related insn fields - Add `recv` field to `CCall` and `CCallWithFrame` so now all method dispatch related instructions have `recv` field, separate from `args` field. This ensures consistent pointer arithmetic when generating code for these instructions. - Standardize `recv` field's display position in send related instructions.
2025-12-01ZJIT: Optimize variadic cfunc `Send` calls into `CCallVariadic` (#14898)Stan Lo
ZJIT: Optimize variadic cfunc Send calls into CCallVariadic
2025-11-26ZJIT: Count fallback reasons for set/get/definedivar (#15324)Max Bernstein
lobsters: ``` Top-4 setivar fallback reasons (100.0% of total 7,789,008): shape_transition: 6,074,085 (78.0%) not_monomorphic: 1,484,013 (19.1%) not_t_object: 172,629 ( 2.2%) too_complex: 58,281 ( 0.7%) Top-3 getivar fallback reasons (100.0% of total 9,348,832): not_t_object: 4,658,833 (49.8%) not_monomorphic: 4,542,316 (48.6%) too_complex: 147,683 ( 1.6%) Top-3 definedivar fallback reasons (100.0% of total 366,383): not_monomorphic: 361,389 (98.6%) too_complex: 3,062 ( 0.8%) not_t_object: 1,932 ( 0.5%) ``` railsbench: ``` Top-3 setivar fallback reasons (100.0% of total 15,119,057): shape_transition: 13,760,763 (91.0%) not_monomorphic: 982,368 ( 6.5%) not_t_object: 375,926 ( 2.5%) Top-2 getivar fallback reasons (100.0% of total 14,438,747): not_t_object: 7,643,870 (52.9%) not_monomorphic: 6,794,877 (47.1%) Top-2 definedivar fallback reasons (100.0% of total 209,613): not_monomorphic: 209,526 (100.0%) not_t_object: 87 ( 0.0%) ``` shipit: ``` Top-3 setivar fallback reasons (100.0% of total 14,516,254): shape_transition: 8,613,512 (59.3%) not_monomorphic: 5,761,398 (39.7%) not_t_object: 141,344 ( 1.0%) Top-2 getivar fallback reasons (100.0% of total 21,016,444): not_monomorphic: 11,313,482 (53.8%) not_t_object: 9,702,962 (46.2%) Top-2 definedivar fallback reasons (100.0% of total 290,382): not_monomorphic: 287,755 (99.1%) not_t_object: 2,627 ( 0.9%) ```
2025-11-21ZJIT: Don't make GuardNotFrozen consider immediatesMax Bernstein
2025-11-21ZJIT: Inline GuardNotFrozen into LIRMax Bernstein
No sense calling a C function.
2025-11-21ZJIT: Specialize Module#=== and Kernel#is_a? into IsAMax Bernstein
2025-11-21ZJIT: Inline Integer#<< for constant rhs (#15258)Max Bernstein
This is good for protoboeuf and other binary parsing
2025-11-21ZJIT: Inline Thread.current (#15272)Max Bernstein
Add `LoadEC` then it's just two `LoadField`.
2025-11-21ZJIT: Inline ArrayLength into LIRMax Bernstein
2025-11-20ZJIT: Read `iseq->body->param` directly instead of through FFIAlan Wu
Going through a call to a C function just to read a bitfield was a little extreme. We did it to be super conservative since bitfields have historically been the trigger of many bugs and surprises. Let's try directly accessing them with code from rust-bindgen. If this ends up causing issues, we can use the FFI approach behind nicer wrappers. In any case, directly access regular struct fields such as `lead_num` and `opt_num` to remove boilerplate.
2025-11-20ZJIT: Compile the VM_OPT_NEWARRAY_SEND_HASH variant of opt_newarray_sendKevin Menard
2025-11-20ZJIT: Rename array length reference to make the code easier to followKevin Menard
2025-11-20ZJIT: Put optional interpreter cache on both GetIvar and SetIvarMax Bernstein
2025-11-20ZJIT: Fix pointer types for SetInstanceVariableMax Bernstein
2025-11-19ZJIT: Count all calls to C functions from generated code (#15240)Max Bernstein
lobsters: ``` Top-20 calls to C functions from JIT code (79.9% of total 97,004,883): rb_vm_opt_send_without_block: 19,874,212 (20.5%) rb_vm_setinstancevariable: 9,774,841 (10.1%) rb_ivar_get: 9,358,866 ( 9.6%) rb_hash_aref: 6,828,948 ( 7.0%) rb_vm_send: 6,441,551 ( 6.6%) rb_vm_env_write: 5,375,989 ( 5.5%) rb_vm_invokesuper: 3,037,836 ( 3.1%) Module#===: 2,562,446 ( 2.6%) rb_ary_entry: 2,354,546 ( 2.4%) Kernel#is_a?: 1,424,092 ( 1.5%) rb_vm_opt_getconstant_path: 1,344,923 ( 1.4%) Thread.current: 1,300,822 ( 1.3%) rb_zjit_defined_ivar: 1,222,613 ( 1.3%) rb_vm_invokeblock: 1,184,555 ( 1.2%) Hash#[]=: 1,061,969 ( 1.1%) rb_ary_push: 1,024,987 ( 1.1%) rb_ary_new_capa: 904,003 ( 0.9%) rb_str_buf_append: 833,782 ( 0.9%) rb_class_allocate_instance: 822,626 ( 0.8%) Hash#fetch: 755,913 ( 0.8%) ``` railsbench: ``` Top-20 calls to C functions from JIT code (74.8% of total 189,170,268): rb_vm_opt_send_without_block: 29,870,307 (15.8%) rb_vm_setinstancevariable: 17,631,199 ( 9.3%) rb_hash_aref: 16,928,890 ( 8.9%) rb_ivar_get: 14,441,240 ( 7.6%) rb_vm_env_write: 11,571,001 ( 6.1%) rb_vm_send: 11,153,457 ( 5.9%) rb_vm_invokesuper: 7,568,267 ( 4.0%) Module#===: 6,065,923 ( 3.2%) Hash#[]=: 2,842,990 ( 1.5%) rb_ary_entry: 2,766,125 ( 1.5%) rb_ary_push: 2,722,079 ( 1.4%) rb_vm_invokeblock: 2,594,398 ( 1.4%) Thread.current: 2,560,129 ( 1.4%) rb_str_getbyte: 1,965,627 ( 1.0%) Kernel#is_a?: 1,961,815 ( 1.0%) rb_vm_opt_getconstant_path: 1,863,678 ( 1.0%) rb_hash_new_with_size: 1,796,456 ( 0.9%) rb_class_allocate_instance: 1,785,043 ( 0.9%) String#empty?: 1,713,414 ( 0.9%) rb_ary_new_capa: 1,678,834 ( 0.9%) ``` shipit: ``` Top-20 calls to C functions from JIT code (83.4% of total 182,402,821): rb_vm_opt_send_without_block: 45,753,484 (25.1%) rb_ivar_get: 21,020,650 (11.5%) rb_vm_setinstancevariable: 17,528,603 ( 9.6%) rb_hash_aref: 11,892,856 ( 6.5%) rb_vm_send: 11,723,471 ( 6.4%) rb_vm_env_write: 10,434,452 ( 5.7%) Module#===: 4,225,048 ( 2.3%) rb_vm_invokesuper: 3,705,906 ( 2.0%) Thread.current: 3,337,603 ( 1.8%) rb_ary_entry: 3,114,378 ( 1.7%) Hash#[]=: 2,509,912 ( 1.4%) Array#empty?: 2,282,994 ( 1.3%) rb_vm_invokeblock: 2,210,511 ( 1.2%) Hash#fetch: 2,017,960 ( 1.1%) _bi20: 1,975,147 ( 1.1%) rb_zjit_defined_ivar: 1,897,127 ( 1.0%) rb_vm_opt_getconstant_path: 1,813,294 ( 1.0%) rb_ary_new_capa: 1,615,406 ( 0.9%) Kernel#is_a?: 1,567,854 ( 0.9%) rb_class_allocate_instance: 1,560,035 ( 0.9%) ``` Thanks to @eregon for the idea. Co-authored-by: Jacob Denbeaux <jacob.denbeaux@shopify.com> Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2025-11-18ZJIT: Rename the operand of Insn::GuardNotFrozen from val to recvBenoit Daloze
* When writing to an object, the receiver should be checked if it's frozen, not the value, so this avoids an error-prone autocomplete.
2025-11-18ZJIT: Inline setting Struct fieldsBenoit Daloze
* Add Insn::StoreField and Insn::WriteBarrier
2025-11-14ZJIT: Support JIT-to-JIT calls to callees with optional parametersAlan Wu
* Correct JIT entry points for optionals so each optional start with nil before their initialization routine runs. Establish `jit_entry_points[filled_opts_num]` gives the appropriate entry point * Correct number of HIR block parameters for each JIT entry point * Entry points that share the same ISEQ PC get separate entries since they start with different state. No more deduplication. * Reject post parameters. Was hidden behind check for optionals. * Make sure to visit every BB in iseq_to_hir(). Some wasn't visited when the initialization routine for an optional terminates the block in a `SideExit`. Remove the now impossible `FailedOptionalArguments`.
2025-11-14ZJIT: Check argument count matches callee's parametersAlan Wu
2025-11-10ZJIT: Rename things so that they aren't named "not_optimized_optimized" (#15135)Randy Stauner
These refer to "OptimizedMethodType" which is a subcategory of "MethodType::Optimized" so name them after the latter to avoid "not_optimized_optimized".
2025-11-10ZJIT: Set cfp->sp on leaf calls with GC (#15137)Takashi Kokubun
Co-authored-by: Randy Stauner <randy@r4s6.net>
2025-11-10ZJIT: Split unhandled_hir_insn and unknown_newarray_send stats (#15127)Takashi Kokubun
2025-11-10ZJIT: Rename not_optimized_instruction to uncategorized_instruction (#15130)Randy Stauner
Make it more obvious that this hasn't been handled and could be broken down more.