summaryrefslogtreecommitdiff
path: root/yjit/src/codegen.rs
AgeCommit message (Collapse)Author
2025-01-30YJIT: Explicitly specify C ABI to fix a nightly Rust warningAlan Wu
2025-01-28YJIT: Initialize locals in ISeqs defined with `...` (#12660)Alan Wu
* YJIT: Fix indentation [ci skip] Fixes: cdf33ed5f37f9649c482c3ba1d245f0d80ac01ce * YJIT: Initialize locals in ISeqs defined with `...` Previously, callers of forwardable ISeqs moved the stack pointer up without writing to the stack. If there happens to be a stale value in the area skipped over, it could crash due to "try to mark T_NONE". Also, the uninitialized local variables were observable through `binding`. Initialize the locals to nil. [Bug #21021] Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2025-01-10YJIT: Rename send_iseq_forwarding->send_forwardingAlan Wu
It's in gen_send_general(), so nothing specifically to do with iseqs. Notes: Merged: https://github.com/ruby/ruby/pull/12550
2025-01-08YJIT: Filter `&` calls from specialized C method codegenAlan Wu
Evident with the crash reported in [Bug #20997], the C replacement codegen functions aren't authored to handle block arguments (nor should they because the extra code from the complexity defeats optimization). Filter sites with VM_CALL_ARGS_BLOCKARG. Notes: Merged: https://github.com/ruby/ruby/pull/12536
2025-01-04YJIT: Fix crash when yielding keyword argumentsAlan Wu
Previously, the code for dropping surplus arguments when yielding into blocks erroneously attempted to drop keyword arguments when there is in fact no surplus arguments. Fix the condition and test that supplying the exact number of keyword arguments as require compiles without fallback. Notes: Merged: https://github.com/ruby/ruby/pull/12499
2024-12-17YJIT: Load registers on JIT entry to reuse blocks (#12355)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-12-13YJIT: Speculate block arg for `c_func_method(&nil)` calls (#12326)Alan Wu
A good amount of call sites always pass nil as block argument, but the nil doesn't show up in the context. Put a runtime guard for those cases to handle it. Particular relevant for the `ruby-lsp` benchmark in `yjit-bench`. Up to a 2% speedup across headline benchmarks. Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org> Co-authored-by: Kevin Menard <kevin@nirvdrum.com> Co-authored-by: Randy Stauner <randy.stauner@shopify.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-12-09YJIT: Add a comment about a lazy frame callTakashi Kokubun
jit_prepare_lazy_frame_call is a complicated trick and comes with memory overhead. Every use of the function should come with justification.
2024-12-09YJIT: Spill/load argument registers to reuse blocks (#12287)Takashi Kokubun
* YJIT: Spill/load argument registers to reuse blocks * Mention the immediate function name * Explain the context behind spill/load operations Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2024-12-04YJIT: Generate specialized code for Symbol for objtostring (#12247)Maximillian Polhill
* YJIT: Generate specialized code for Symbol for objtostring Co-authored-by: John Hawthorn <john@hawthorn.email> * Update yjit/src/codegen.rs --------- Co-authored-by: John Hawthorn <john@hawthorn.email> Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-29YJIT: Avoid std::ffi::CString with rb_intern2() during bootAlan Wu
Fewer allocations on boot, too. Suggested-by: https://github.com/ruby/ruby/pull/12217 Notes: Merged: https://github.com/ruby/ruby/pull/12220
2024-11-28YJIT: Add missing prepare before calling str_dupJohn Hawthorn
Notes: Merged: https://github.com/ruby/ruby/pull/12202
2024-11-26YJIT: Implement opt_reverse insn (#12175)Randy Stauner
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-26Optimize instructions when creating an array just to call `include?` (#12123)Randy Stauner
* Add opt_duparray_send insn to skip the allocation on `#include?` If the method isn't going to modify the array we don't need to copy it. This avoids the allocation / array copy for things like `[:a, :b].include?(x)`. This adds a BOP for include? and tracks redefinition for it on Array. Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> * YJIT: Implement opt_duparray_send include_p Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> * Update opt_newarray_send to support simple forms of include?(arg) Similar to opt_duparray_send but for non-static arrays. * YJIT: Implement opt_newarray_send include_p --------- Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-20YJIT: Refactor to forward jump_to_next_insn() return valueAlan Wu
It's more concise this way and since `return Some(EndBlock)` is the only correct answer, no point repeating it everywhere. Notes: Merged: https://github.com/ruby/ruby/pull/12124
2024-11-20YJIT: Abandon block when gen_outlined_exit() failsAlan Wu
When CodeBlock::set_page fails (part of next_page(), see their docs for exact conditions), it can cause gen_outlined_exit() to fail while there is still plenty of memory available. Previously, this can have YJIT running incomplete code due to taking the early return in end_block_with_jump() that manifested as crashes with SIGILL. Add and use a wrapper with error handling. Notes: Merged: https://github.com/ruby/ruby/pull/12124
2024-11-14YJIT: Specialize String#dup (#12090)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-14YJIT: Specialize Integer#pred (#12082)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-13YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments (#12069)Randy Stauner
* YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments String#[] is in the top few C calls of several YJIT benchmarks: liquid-compile rubocop mail sudoku This speeds up these benchmarks by 1-2%. * YJIT: Try harder to get type info for `String#[]` In the large generated code of the mail gem the context doesn't have the type info. In that case if we peek at the stack and add a guard we can still apply the specialization and it speeds up the mail benchmark by 5%. Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com> --------- Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-11Fix false-positive memory leak using Valgrind in YJIT (#12057)Peter Zhu
When we run with RUBY_FREE_AT_EXIT, there's a false-positive memory leak reported in YJIT because the METHOD_CODEGEN_TABLE is never freed. This commit adds rb_yjit_free_at_exit that is called at shutdown when RUBY_FREE_AT_EXIT is set. Reported memory leak: ==699816== 1,104 bytes in 1 blocks are possibly lost in loss record 1 of 1 ==699816== at 0x484680F: malloc (vg_replace_malloc.c:446) ==699816== by 0x155B3E: UnknownInlinedFun (unix.rs:14) ==699816== by 0x155B3E: UnknownInlinedFun (stats.rs:36) ==699816== by 0x155B3E: UnknownInlinedFun (stats.rs:27) ==699816== by 0x155B3E: alloc (alloc.rs:98) ==699816== by 0x155B3E: alloc_impl (alloc.rs:181) ==699816== by 0x155B3E: allocate (alloc.rs:241) ==699816== by 0x155B3E: do_alloc<alloc::alloc::Global> (alloc.rs:15) ==699816== by 0x155B3E: new_uninitialized<alloc::alloc::Global> (mod.rs:1750) ==699816== by 0x155B3E: fallible_with_capacity<alloc::alloc::Global> (mod.rs:1788) ==699816== by 0x155B3E: prepare_resize<alloc::alloc::Global> (mod.rs:2864) ==699816== by 0x155B3E: resize_inner<alloc::alloc::Global> (mod.rs:3060) ==699816== by 0x155B3E: reserve_rehash_inner<alloc::alloc::Global> (mod.rs:2950) ==699816== by 0x155B3E: hashbrown::raw::RawTable<T,A>::reserve_rehash (mod.rs:1231) ==699816== by 0x5BC39F: UnknownInlinedFun (mod.rs:1179) ==699816== by 0x5BC39F: find_or_find_insert_slot<(usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool), alloc::alloc::Global, hashbrown::map::equivalent_key::{closure_env#0}<usize, usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool>, hashbrown::map::make_hasher::{closure_env#0}<usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool, std::hash::random::RandomState>> (mod.rs:1413) ==699816== by 0x5BC39F: hashbrown::map::HashMap<K,V,S,A>::insert (map.rs:1754) ==699816== by 0x57C5C6: insert<usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool, std::hash::random::RandomState> (map.rs:1104) ==699816== by 0x57C5C6: yjit::codegen::reg_method_codegen (codegen.rs:10521) ==699816== by 0x57C295: yjit::codegen::yjit_reg_method_codegen_fns (codegen.rs:10464) ==699816== by 0x5C6B07: rb_yjit_init (yjit.rs:40) ==699816== by 0x393723: ruby_opt_init (ruby.c:1820) ==699816== by 0x393723: ruby_opt_init (ruby.c:1767) ==699816== by 0x3957D4: prism_script (ruby.c:2215) ==699816== by 0x3957D4: process_options (ruby.c:2538) ==699816== by 0x396065: ruby_process_options (ruby.c:3166) ==699816== by 0x236E56: ruby_options (eval.c:117) ==699816== by 0x15BAED: rb_main (main.c:43) ==699816== by 0x15BAED: main (main.c:62) After this patch, there are no more memory leaks reported when running RUBY_FREE_AT_EXIT with Valgrind on an empty Ruby script: $ RUBY_FREE_AT_EXIT=1 valgrind --leak-check=full ruby -e "" ... ==700357== HEAP SUMMARY: ==700357== in use at exit: 0 bytes in 0 blocks ==700357== total heap usage: 36,559 allocs, 36,559 frees, 6,064,783 bytes allocated ==700357== ==700357== All heap blocks were freed -- no leaks are possible Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-08YJIT: Always abandon the block when gen_branch() or defer_compilation() failsAlan Wu
In [1], we started checking for gen_branch failures, but I made two crucial mistakes. One, defer_compilation() had the same issue as gen_branch() but wasn't checked. Two, returning None from a codegen function does not throw away the block. Checking how gen_single_block() handles codegen functions, you can see that None terminates the block with an exit, but does not overall return an Err. This handling is fine for unimplemented instructions, for example, but incorrect in case gen_branch() fails. The missing branch essentially corrupts the block; adding more code after a missing branch doesn't correct the code. Always abandon the block when defer_compilation() or gen_branch() fails. [1]: cb661d7d82984cdb54485ea3f4af01ac21960882 Fixup: [1] Notes: Merged: https://github.com/ruby/ruby/pull/12035 Merged-By: XrXr
2024-10-23YJIT: Check when gen_branch() failsAlan Wu
We got some core dumps in the wild where a PendingBranch had everything as None, leading to a panic unwrapping in PendingBranch::into_branch(). This happened while compiling a `branchif`. It seems that the only way this can happen is when core::gen_branch() fails, but not due to OOM. We wouldn't have reach into_branch() when OOM, and the only way to not leave markers that would've set the branch's start_addr to some value in gen_branch() is for set_target() to fail, causing an early return. Unfortunately, it's hard to tell the exact sequence of events that led to this situation, but regardless, the dumps show us that we should check for errors in gen_branch(). Because gen_branch() is used deep in the stack during compilation (e.g. guard_known_class() -> jit_chain_guard() -> gen_branch()), it'd be bad for compile speed to propagate the error everywhere, not to mention the massive patch required. Opt for a flag checked near the end of compilation. Notes: Merged: https://github.com/ruby/ruby/pull/11938 Merged-By: XrXr
2024-10-22Rewrite Numeric#dup and Numeric#+@ in Ruby (#11933)Takashi Kokubun
Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2024-10-22YJIT: Implement specialization for no-op `{Kernel,Numeric}#dup`Alan Wu
Type information in the context for no additional work! This is the `if (special_object_p(obj)) return obj;` path in rb_obj_dup() and for Numeric#dup, it's always the identity function. Notes: Merged: https://github.com/ruby/ruby/pull/11926
2024-10-21YJIT: Rename method substitution functions and improve docs (+1) (#11919)Alan Wu
* YJIT: Fill in commented-out assertion * YJIT: Rename yjit_reg_method() and add links in docs Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-10-18YJIT: Allow shareable consts in multi-ractor mode (#11917)John Hawthorn
* Update yjit-bindgen deps * YJIT: Allow shareable consts in multi-ractor mode * Update yjit/src/codegen.rs Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> --------- Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-10-17YJIT: In stats, group by resolved C method nameAlan Wu
Previously, in the "Top-N most frequent C calls" section of --yjit-stats output, we printed the class name of the receiver, not the method owner. This meant that calls on subclass instances that land on the same method showed up as different entires. Similarly, method called using an alias showed up as different entries from other aliases. Group by the resolved method instead. Test program: 1.itself; [].itself; true.inspect; true.to_s Before: Top-4 most frequent C calls (80.0% of C calls): 1 (20.0%): Integer#itself 1 (20.0%): TrueClass#to_s 1 (20.0%): TrueClass#inspect 1 (20.0%): Array#itself After: Top-2 most frequent C calls (80.0% of C calls): 2 (40.0%): Kernel#itself 2 (40.0%): TrueClass#to_s Notes: Merged: https://github.com/ruby/ruby/pull/11913
2024-10-17YJIT: Add compilation log (#11818)Kevin Menard
* YJIT: Add `--yjit-compilation-log` flag to print out the compilation log at exit. * YJIT: Add an option to enable the compilation log at runtime. * YJIT: Fix a typo in the `IseqPayload` docs. * YJIT: Add stubs for getting the YJIT compilation log in memory. * YJIT: Add a compilation log based on a circular buffer to cap the log size. * YJIT: Allow specifying either a file or directory name for the YJIT compilation log. The compilation log will be populated as compilation events occur. If a directory is supplied, then a filename based on the PID will be used as the write target. If a file name is supplied instead, the log will be written to that file. * YJIT: Add JIT compilation of C function substitutions to the compilation log. * YJIT: Add compilation events to the circular buffer even if output is sent to a file. Previously, the two modes were treated as being exclusive of one another. However, it could be beneficial to log all events to a file while also allowing for direct access of the last N events via `RubyVM::YJIT.compilation_log`. * YJIT: Make timestamps the first element in the YJIT compilation log tuple. * YJIT: Stream log to stderr if `--yjit-compilation-log` is supplied without an argument. * YJIT: Eagerly compute compilation log messages to avoid hanging on to references that may GC. * YJIT: Log all compiled blocks, not just the method entry points. * YJIT: Remove all compilation events other than block compilation to slim down the log. * YJIT: Replace circular buffer iterator with a consuming loop. * YJIT: Support `--yjit-compilation-log=quiet` as a way to activate the in-memory log without printing it. Co-authored-by: Randy Stauner <randy.stauner@shopify.com> * YJIT: Promote the compilation log to being the one YJIT log. Co-authored-by: Randy Stauner <randy.stauner@shopify.com> * Update doc/yjit/yjit.md * Update doc/yjit/yjit.md --------- Co-authored-by: Randy Stauner <randy.stauner@shopify.com> Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-10-08YJIT: Fastpath for Module#name (#11819)Alan Wu
Module#name shows up as a top C method callee in lobsters so probably common enough. It's also easy to substitute thanks to rb_mod_name() already having no GC yield points. klass = BasicObject 50_000_000.times { klass.name } Benchmark 1: /.rubies/post/bin/ruby --yjit mod_name.rb Time (mean ± σ): 1.433 s ± 0.010 s [User: 1.410 s, System: 0.010 s] Range (min … max): 1.421 s … 1.449 s 10 runs Benchmark 2: /.rubies/mstr/bin/ruby --yjit mod_name.rb Time (mean ± σ): 1.491 s ± 0.012 s [User: 1.468 s, System: 0.010 s] Range (min … max): 1.470 s … 1.511 s 10 runs Summary /.rubies/post/bin/ruby --yjit mod_name.rb ran 1.04 ± 0.01 times faster than /.rubies/mstr/bin/ruby --yjit mod_name.rb
2024-10-07YJIT: Add --yjit-mem-size option (#11810)Takashi Kokubun
* YJIT: Add --yjit-mem-size option * Improve --help * s/the region/this virtual memory region/ Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> --------- Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-09-30Return an Iterator Instead of a Vector in `addrs_to_pages` Method (#11725)whtsht
* Returning an iterator instead of a vec * Avoid changing the meaning of end_page --------- Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com> Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2024-09-05Optimized instruction for Hash#freezeÉtienne Barrié
If a Hash which is empty or only using literals is frozen, we detect this as a peephole optimization and change the instructions to be `opt_hash_freeze`. [Feature #20684] Co-authored-by: Jean Boussier <byroot@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/11406
2024-09-05Optimized instruction for Array#freezeÉtienne Barrié
If an Array which is empty or only using literals is frozen, we detect this as a peephole optimization and change the instructions to be `opt_ary_freeze`. [Feature #20684] Co-authored-by: Jean Boussier <byroot@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/11406
2024-08-27YJIT: Pass method arguments using registers (#11280)Takashi Kokubun
* YJIT: Pass method arguments using registers * s/at_current_insn/at_compile_target/ * Implement register shuffle Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2024-08-08YJIT: Allow tracing fallback counters (#11347)Takashi Kokubun
* YJIT: Allow tracing fallback counters * Update yjit.md about --yjit-trace-exits=counter Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2024-08-02YJIT: Enhance the `String#<<` method substitution to handle integer ↵Kevin Menard
codepoint values. (#11032) * Document why we need to explicitly spill registers. * Simplify passing a byte value to `str_buf_cat`. * YJIT: Enhance the `String#<<` method substitution to handle integer codepoint values. * YJIT: Move runtime type check into YJIT. Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum. Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-07-31YJIT: Decouple Context from encoding details (#11283)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-07-29Expand opt_newarray_send to support Array#pack with buffer keyword argRandy Stauner
Use an enum for the method arg instead of needing to add an id that doesn't map to an actual method name. $ ruby --dump=insns -e 'b = "x"; [v].pack("E*", buffer: b)' before: ``` == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] b@0 0000 putchilledstring "x" ( 1)[Li] 0002 setlocal_WC_0 b@0 0004 putself 0005 opt_send_without_block <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE> 0007 newarray 1 0009 putchilledstring "E*" 0011 getlocal_WC_0 b@0 0013 opt_send_without_block <calldata!mid:pack, argc:2, kw:[#<Symbol:0x000000000023110c>], KWARG> 0015 leave ``` after: ``` == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] b@0 0000 putchilledstring "x" ( 1)[Li] 0002 setlocal_WC_0 b@0 0004 putself 0005 opt_send_without_block <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE> 0007 putchilledstring "E*" 0009 getlocal b@0, 0 0012 opt_newarray_send 3, 5 0015 leave ``` Notes: Merged: https://github.com/ruby/ruby/pull/11249
2024-07-18YJIT: Allow dev_nodebug to disasm release-mode code (#11198)Takashi Kokubun
* YJIT: Allow dev_nodebug to disasm release-mode code * Revert "YJIT: Squash canary before falling back" This reverts commit f05ad373d84909da7541bd6d6ace38b48eaf24a1. The stray canary issue should have been solved by def7023ee4a3fc6eeba9d3a34c31a5bcff315fac, alleviating this codegen accommodation. * s/runtime_assertions/runtime_checks/ --------- Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2024-07-15YJIT: Local variable register allocation (#11157)Takashi Kokubun
* YJIT: Local variable register allocation * locals are not stack temps * Rename RegTemps to RegMappings * Rename RegMapping to RegOpnd * Rename local_size to num_locals * s/stack value/operand/ * Rename spill_temps() to spill_regs() * Clarify when num_locals becomes None * Mention that InsnOut uses different registers * Rename get_reg_mapping to get_reg_opnd * Resurrect --yjit-temp-regs capability * Use MAX_CTX_TEMPS and MAX_CTX_LOCALS
2024-07-08YJIT: `dump-disasm`: Print comments and bytes in release buildsAlan Wu
This change implements a fallback mode for the `--yjit-dump-disasm` development command-line option to make it usable in release builds. Previously, using the option with release builds of YJIT yielded only a warning asking the user to build with `--enable-yjit=dev`. While builds that use the `disasm` feature still give the best output, just having the comments is useful enough for many kinds of debugging. Having it usable in release builds is nice for new hackers, too, since this allows for tinkering without having to learn how to build YJIT in development mode. Sample output on A64: ``` # regenerate_branch # Insn: 0001 opt_send_without_block (stack_size: 1) # guard known object with singleton class 0x11f7e0034: 4b 00 00 58 03 00 00 14 08 ce 9c 04 01 00 00 0x11f7e0043: 00 3f 00 0b eb 81 06 01 54 1f 20 03 d5 # RUBY_VM_CHECK_INTS(ec) 0x11f7e0050: 8b 02 42 b8 cb 07 01 35 # stack overflow check 0x11f7e0058: ab 62 02 91 7f 02 0b eb 69 07 01 54 # save PC to CFP 0x11f7e0064: 0b 3b 9a d2 2b 2f a0 f2 0b 00 cc f2 6b 02 00 0x11f7e0073: f8 ab 82 00 91 ``` To ensure this feature doesn't incur too much cost when running without the `--yjit-dump-disasm` option, I checked that there is no significant impact to compile time and memory usage with the `compile_time_ns` and `yjit_alloc_size` entry in `RubyVM::YJIT.runtime_stats`. For each sample, I ran 3 iterations of the `lobsters` YJIT benchmark. The statistics summary and done with the `summary` function in R. Compile time, sample size of 60, lower is better: ``` Before After Min. :2.054e+09 Min. :2.028e+09 1st Qu.:2.069e+09 1st Qu.:2.044e+09 Median :2.081e+09 Median :2.060e+09 Mean :2.089e+09 Mean :2.066e+09 3rd Qu.:2.109e+09 3rd Qu.:2.085e+09 Max. :2.146e+09 Max. :2.144e+09 ``` Allocation size, sample size of 20, lower is better: ``` Before After Min. :21804742 Min. :21794082 1st Qu.:21826682 1st Qu.:21816282 Median :21844042 Median :21826814 Mean :21960664 Mean :22026291 3rd Qu.:21861228 3rd Qu.:22040439 Max. :22587426 Max. :22930614 ``` The `yjit_alloc_size` samples are noisy, but since the average increased by only 0.3%, and the median is lower, I feel safe saying that there is no significant change.
2024-07-02YJIT: Inline simple ISEQs with unused keyword parametersGabriel Lacroix
This commit expands inlining for simple ISeqs to accept callees that have unused keyword parameters and callers that specify unused keywords. The following shows 2 new callsites that will be inlined: ```ruby def let(a, checked: true) = a let(1) let(1, checked: false) ``` Co-authored-by: Kaan Ozkan <kaan.ozkan@shopify.com>
2024-06-29[YJIT] Don't expand kwargs on forwardingAaron Patterson
Similarly to splat arrays, we shouldn't expand splat kwargs. [ruby-core:118401]
2024-06-28YJIT: Fix `cargo doc --document-private-items` warnings [ci skip]Alan Wu
Mostly putting angle brackets around links to follow markdown syntax.
2024-06-28YJIT: Move `ocb` parameters into `JITState`Alan Wu
Many functions take an outlined code block but do nothing more than passing it along; only a couple of functions actually make use of it. So, in most cases the `ocb` parameter is just boilerplate. Most functions that take `ocb` already also take a `JITState` and this commit moves `ocb` into `JITState` to remove the visual noise of the `ocb` parameter.
2024-06-26[YJIT] Fix block and splat handling when forwardingAaron Patterson
This commit fixes splat and block handling when calling in to a forwarding iseq. In the case of a splat we need to avoid expanding the array to the stack. We need to also ensure the CI write is flushed to the SP, otherwise it's possible for a block handler to clobber the CI [ruby-core:118360]
2024-06-18Add two new instructions for forwarding callsAaron Patterson
This commit adds `sendforward` and `invokesuperforward` for forwarding parameters to calls Co-authored-by: Matt Valentine-House <matt@eightbitraptor.com>
2024-06-18Optimized forwarding callers and calleesAaron Patterson
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls. Calls it optimizes look like this: ```ruby def bar(a) = a def foo(...) = bar(...) # optimized foo(123) ``` ```ruby def bar(a) = a def foo(...) = bar(1, 2, ...) # optimized foo(123) ``` ```ruby def bar(*a) = a def foo(...) list = [1, 2] bar(*list, ...) # optimized end foo(123) ``` All variants of the above but using `super` are also optimized, including a bare super like this: ```ruby def foo(...) super end ``` This patch eliminates intermediate allocations made when calling methods that accept `...`. We can observe allocation elimination like this: ```ruby def m x = GC.stat(:total_allocated_objects) yield GC.stat(:total_allocated_objects) - x end def bar(a) = a def foo(...) = bar(...) def test m { foo(123) } end test p test # allocates 1 object on master, but 0 objects with this patch ``` ```ruby def bar(a, b:) = a + b def foo(...) = bar(...) def test m { foo(1, b: 2) } end test p test # allocates 2 objects on master, but 0 objects with this patch ``` How does it work? ----------------- This patch works by using a dynamic stack size when passing forwarded parameters to callees. The caller's info object (known as the "CI") contains the stack size of the parameters, so we pass the CI object itself as a parameter to the callee. When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee. The CI at the forwarded call site is adjusted using information from the caller's CI. I think this description is kind of confusing, so let's walk through an example with code. ```ruby def delegatee(a, b) = a + b def delegator(...) delegatee(...) # CI2 (FORWARDING) end def caller delegator(1, 2) # CI1 (argc: 2) end ``` Before we call the delegator method, the stack looks like this: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 4| # | 5| delegatee(...) # CI2 (FORWARDING) | 6| end | 7| | 8| def caller | -> 9| delegator(1, 2) # CI1 (argc: 2) | 10| end | ``` The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in to `delegator`, it writes `CI1` on to the stack as a local variable for the `delegator` method. The `delegator` method has a special local called `...` that holds the caller's CI object. Here is the ISeq disasm fo `delegator`: ``` == disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] "..."@0 0000 putself ( 1)[LiCa] 0001 getlocal_WC_0 "..."@0 0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil 0006 leave [Re] ``` The local called `...` will contain the caller's CI: CI1. Here is the stack when we enter `delegator`: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 -> 4| # | CI1 (argc: 2) 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me 6| end | specval 7| | type 8| def caller | 9| delegator(1, 2) # CI1 (argc: 2) | 10| end | ``` The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to memcopy the caller's stack before calling `delegatee`. In this case, it will memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much memory to copy from the caller because `CI1` contains stack size information (argc: 2). Before executing the `send` instruction, we push `...` on the stack. The `send` instruction pops `...`, and because it is tagged with `FORWARDING`, it knows to memcopy (using the information in the CI it just popped): ``` == disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] "..."@0 0000 putself ( 1)[LiCa] 0001 getlocal_WC_0 "..."@0 0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil 0006 leave [Re] ``` Instruction 001 puts the caller's CI on the stack. `send` is tagged with FORWARDING, so it reads the CI and _copies_ the callers stack to this stack: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 4| # | CI1 (argc: 2) -> 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me 6| end | specval 7| | type 8| def caller | self 9| delegator(1, 2) # CI1 (argc: 2) | 1 10| end | 2 ``` The "FORWARDING" call site combines information from CI1 with CI2 in order to support passing other values in addition to the `...` value, as well as perfectly forward splat args, kwargs, etc. Since we're able to copy the stack from `caller` in to `delegator`'s stack, we can avoid allocating objects. I want to do this to eliminate object allocations for delegate methods. My long term goal is to implement `Class#new` in Ruby and it uses `...`. I was able to implement `Class#new` in Ruby [here](https://github.com/ruby/ruby/pull/9289). If we adopt the technique in this patch, then we can optimize allocating objects that take keyword parameters for `initialize`. For example, this code will allocate 2 objects: one for `SomeObject`, and one for the kwargs: ```ruby SomeObject.new(foo: 1) ``` If we combine this technique, plus implement `Class#new` in Ruby, then we can reduce allocations for this common operation. Co-Authored-By: John Hawthorn <john@hawthorn.email> Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
2024-06-07YJIT: implement variable-length context encoding scheme (#10888)Maxime Chevalier-Boisvert
* Implement BitVector data structure for variable-length context encoding * Rename method to make intent clearer * Rename write_uint => push_uint to make intent clearer * Implement debug trait for BitVector * Fix bug in BitVector::read_uint_at(), enable more tests * Add one more test for good measure * Start sketching Context::encode() * Progress on variable length context encoding * Add tests. Fix bug. * Encode stack state * Add comments. Try to estimate context encoding size. * More compact encoding for stack size * Commit before rebase * Change Context::encode() to take a BitVector as input * Refactor BitVector::read_uint(), add helper read functions * Implement Context::decode() function. Add test. * Fix bug, add tests * Rename methods * Add Context::encode() and decode() methods using global data * Make encode and decode methods use u32 indices * Refactor YJIT to use variable-length context encoding * Tag functions as allow unused * Add a simple caching mechanism and stats for bytes per context etc * Add comments, fix formatting * Grow vector of bytes by 1.2x instead of 2x * Add debug assert to check round-trip encoding-decoding * Take some rustfmt formatting * Add decoded_from field to Context to reuse previous encodings * Remove olde context stats * Re-add stack_size assert * Disable decoded_from optimization for now
2024-06-04Do not emit shape transition warnings when YJIT is compilingJean Boussier
[Bug #20522] If `Warning.warn` is redefined in Ruby, emitting a warning would invoke Ruby code, which can't safely be done when YJIT is compiling.