path: root/vm_insnhelper.c
AgeCommit message (Collapse)Author
2025-12-01vm_cc_new: don't assume `cme` is present. (#15322)Jean Boussier
[Bug #21694] `vm_search_super_method` explicitly calls `vm_cc_new` with `cme=NULL` when there is no super class.
2025-10-08merge revision(s) bbf1130f918ca26e33aba4711ccf99a8083517ea, ↵Takashi Kokubun
43dbb9a93f4de3f1170d7d18641c30e81cc08365, 2bb6fe3854e2a4854bb89bfce4eaaea9d848fd1b, 7c9dd0ecff61153b96473c6c51d5582e809da489: [Backport #21629] [PATCH] Add `RBIMPL_ATTR_NONSTRING_ARRAY()` macro for GCC 15 [PATCH] [Bug #21629] Enable `nonstring` attribute on clang 21 [PATCH] [Bug #21629] Initialize `struct RString` [PATCH] [Bug #21629] Initialize `struct RArray`
2025-08-27Fix bad NameError raised using sendforward instruction through vcallLuke Gruber
If you called a VCALL method and the method takes forwarding arguments and then you forward those arguments along using the sendforward instruction, the method_missing class was wrongly chosen as NameError instead of NoMethodError. This is because the VM looked at the CallInfo of the vcall and determined it needed to raise NameError. Now we detect that case and raise NoMethodError. [Backport #21535]
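The distinction this fix relies on can be observed from plain Ruby. The following is a minimal sketch, not taken from the commit or its tests; the method names (`vcall_error_class`, `call_error_class`, `forwarder`, `no_such_target_xyz`) are made up for illustration. A bare identifier (a VCALL) that resolves to nothing raises `NameError`, while a call written with parentheses raises `NoMethodError`. The last case is an assumed reproduction of the scenario described above: a method invoked as a VCALL that forwards `...` to a missing method.

```ruby
# Bare identifier (VCALL): could be a local variable or a method.
def vcall_error_class
  no_such_name_xyz
rescue NameError => e
  e.class
end

# Parenthesized call: unambiguously a method call.
def call_error_class
  no_such_name_xyz()
rescue NameError => e
  e.class
end

# Hypothetical reproduction: `forwarder` is invoked as a VCALL and forwards
# `...` to a method that does not exist.
def forwarder(...) = no_such_target_xyz(...)

def forwarded_error_class
  forwarder
rescue NameError => e
  e.class
end

puts vcall_error_class      # NameError
puts call_error_class       # NoMethodError
puts forwarded_error_class  # expected to be NoMethodError on builds with this fix
```

`NoMethodError` is a subclass of `NameError`, so a single `rescue NameError` catches both; only the reported class differs.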
2025-03-14Push a real iseq in rb_vm_push_frame_fname()Alan Wu
Previously, vm_make_env_each() (used during proc creation and for the debug inspector C API) picked up the non-GC-allocated iseq that rb_vm_push_frame_fname() creates, which led to a SEGV when the GC tried to mark the non-GC object. Put a real iseq imemo there instead. Speed should be about the same since the old code also did an imemo allocation and a malloc allocation. A real iseq allows ironing out the special-casing of dummy frames in rb_execution_context_mark() and rb_execution_context_update(). A check is added to RubyVM::ISeq#eval, though, to stop attempts to run dummy iseqs. [Bug #21180] Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2024-11-29Fix use-after-free in constant cachePeter Zhu
[Bug #20921] When we create a cache entry for a constant, the following sequence of events could happen: - vm_track_constant_cache is called to insert a constant cache. - In vm_track_constant_cache, we first look up the ST table for the ID of the constant. Assume the ST table exists because another iseq also holds a cache entry for this ID. - We then insert into this ST table with the iseq_inline_constant_cache. - However, while inserting into this ST table, it allocates memory, which could trigger a GC. Assume that it does trigger a GC. - The GC frees the one and only other iseq that holds a cache entry for this ID. - In remove_from_constant_cache, it will appear that the ST table is now empty because there are no more iseq with cache entries for this ID, so we free the ST table. - We complete GC and continue our st_insert. However, this ST table has been freed so we now have a use-after-free. This issue is very hard to reproduce, because it requires that the GC runs at a very specific time. 
However, we can make it show up by applying this patch which runs GC right before the st_insert to mimic the st_insert triggering a GC: diff --git a/vm_insnhelper.c b/vm_insnhelper.c index 3cb23f06f0..a93998136a 100644 --- a/vm_insnhelper.c +++ b/vm_insnhelper.c @@ -6338,6 +6338,10 @@ vm_track_constant_cache(ID id, void *ic) rb_id_table_insert(const_cache, id, (VALUE)ics); } + if (id == rb_intern("MyConstant")) rb_gc(); + st_insert(ics, (st_data_t) ic, (st_data_t) Qtrue); } And if we run this script: Object.const_set("MyConstant", "Hello!") my_proc = eval("-> { MyConstant }") my_proc.call my_proc = eval("-> { MyConstant }") my_proc.call We can see that ASAN outputs a use-after-free error: ==36540==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000049528 at pc 0x000102f3ceac bp 0x00016d607a70 sp 0x00016d607a68 READ of size 8 at 0x606000049528 thread T0 #0 0x102f3cea8 in do_hash st.c:321 #1 0x102f3ddd0 in rb_st_insert st.c:1132 #2 0x103140700 in vm_track_constant_cache vm_insnhelper.c:6345 #3 0x1030b91d8 in vm_ic_track_const_chain vm_insnhelper.c:6356 #4 0x1030b8cf8 in rb_vm_opt_getconstant_path vm_insnhelper.c:6424 #5 0x1030bc1e0 in vm_exec_core insns.def:263 #6 0x1030b55fc in rb_vm_exec vm.c:2585 #7 0x1030fe0ac in rb_iseq_eval_main vm.c:2851 #8 0x102a82588 in rb_ec_exec_node eval.c:281 #9 0x102a81fe0 in ruby_run_node eval.c:319 #10 0x1027f3db4 in rb_main main.c:43 #11 0x1027f3bd4 in main main.c:68 #12 0x183900270 (<unknown module>) 0x606000049528 is located 8 bytes inside of 56-byte region [0x606000049520,0x606000049558) freed by thread T0 here: #0 0x104174d40 in free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x54d40) #1 0x102ada89c in rb_gc_impl_free default.c:8183 #2 0x102ada7dc in ruby_sized_xfree gc.c:4507 #3 0x102ac4d34 in ruby_xfree gc.c:4518 #4 0x102f3cb34 in rb_st_free_table st.c:663 #5 0x102bd52d8 in remove_from_constant_cache iseq.c:119 #6 0x102bbe2cc in iseq_clear_ic_references iseq.c:153 #7 0x102bbd2a0 in rb_iseq_free iseq.c:166 #8 
0x102b32ed0 in rb_imemo_free imemo.c:564 #9 0x102ac4b44 in rb_gc_obj_free gc.c:1407 #10 0x102af4290 in gc_sweep_plane default.c:3546 #11 0x102af3bdc in gc_sweep_page default.c:3634 #12 0x102aeb140 in gc_sweep_step default.c:3906 #13 0x102aeadf0 in gc_sweep_rest default.c:3978 #14 0x102ae4714 in gc_sweep default.c:4155 #15 0x102af8474 in gc_start default.c:6484 #16 0x102afbe30 in garbage_collect default.c:6363 #17 0x102ad37f0 in rb_gc_impl_start default.c:6816 #18 0x102ad3634 in rb_gc gc.c:3624 #19 0x1031406ec in vm_track_constant_cache vm_insnhelper.c:6342 #20 0x1030b91d8 in vm_ic_track_const_chain vm_insnhelper.c:6356 #21 0x1030b8cf8 in rb_vm_opt_getconstant_path vm_insnhelper.c:6424 #22 0x1030bc1e0 in vm_exec_core insns.def:263 #23 0x1030b55fc in rb_vm_exec vm.c:2585 #24 0x1030fe0ac in rb_iseq_eval_main vm.c:2851 #25 0x102a82588 in rb_ec_exec_node eval.c:281 #26 0x102a81fe0 in ruby_run_node eval.c:319 #27 0x1027f3db4 in rb_main main.c:43 #28 0x1027f3bd4 in main main.c:68 #29 0x183900270 (<unknown module>) previously allocated by thread T0 here: #0 0x104174c04 in malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x54c04) #1 0x102ada0ec in rb_gc_impl_malloc default.c:8198 #2 0x102acee44 in ruby_xmalloc gc.c:4438 #3 0x102f3c85c in rb_st_init_table_with_size st.c:571 #4 0x102f3c900 in rb_st_init_table st.c:600 #5 0x102f3c920 in rb_st_init_numtable st.c:608 #6 0x103140698 in vm_track_constant_cache vm_insnhelper.c:6337 #7 0x1030b91d8 in vm_ic_track_const_chain vm_insnhelper.c:6356 #8 0x1030b8cf8 in rb_vm_opt_getconstant_path vm_insnhelper.c:6424 #9 0x1030bc1e0 in vm_exec_core insns.def:263 #10 0x1030b55fc in rb_vm_exec vm.c:2585 #11 0x1030fe0ac in rb_iseq_eval_main vm.c:2851 #12 0x102a82588 in rb_ec_exec_node eval.c:281 #13 0x102a81fe0 in ruby_run_node eval.c:319 #14 0x1027f3db4 in rb_main main.c:43 #15 0x1027f3bd4 in main main.c:68 #16 0x183900270 (<unknown module>) This commit fixes this bug by adding a inserting_constant_cache_id field to the VM, which stores 
the ID that is currently being inserted and, in remove_from_constant_cache, we don't free the ST table for ID equal to this one. Co-Authored-By: Alan Wu <alanwu@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/12203
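The guard described above can be illustrated with a toy model. This is only a sketch in Ruby, not CRuby code: `ConstCacheModel`, `track`, `remove`, and `simulate_constant_cache` are hypothetical names, the block passed to `track` stands in for a GC that runs during the insertion, and `@inserting_id` plays the role the message assigns to `inserting_constant_cache_id`. Ruby has no manual `free`, so the model shows the logical effect (the per-ID table stays registered) rather than the memory error itself.

```ruby
# Toy model: maps a constant ID to the set of inline caches referencing it.
class ConstCacheModel
  attr_reader :tables

  def initialize
    @tables = {}         # id => { cache_entry => true }
    @inserting_id = nil  # ID whose table is currently being inserted into
  end

  # Analogue of vm_track_constant_cache. The optional block simulates a GC
  # that is triggered while the insertion allocates memory.
  def track(id, ic)
    table = (@tables[id] ||= {})
    @inserting_id = id
    yield if block_given?
    table[ic] = true
  ensure
    @inserting_id = nil
  end

  # Analogue of remove_from_constant_cache: drop the per-ID table once it is
  # empty, unless an insertion into that same table is in progress.
  def remove(id, ic)
    table = @tables[id] or return
    table.delete(ic)
    @tables.delete(id) if table.empty? && id != @inserting_id
  end
end

def simulate_constant_cache
  model = ConstCacheModel.new
  model.track(:MyConstant, :ic_old)
  # The simulated GC frees the only other cache entry mid-insertion.
  model.track(:MyConstant, :ic_new) { model.remove(:MyConstant, :ic_old) }
  model.tables
end

p simulate_constant_cache
```

Without the `id != @inserting_id` check, the model would drop the table while `track` still holds a reference to it, and the new entry would land in a table that is no longer registered, which is the model's counterpart of the use-after-free.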
2024-11-26Optimize instructions when creating an array just to call `include?` (#12123)Randy Stauner
* Add opt_duparray_send insn to skip the allocation on `#include?` If the method isn't going to modify the array we don't need to copy it. This avoids the allocation / array copy for things like `[:a, :b].include?(x)`. This adds a BOP for include? and tracks redefinition for it on Array. Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> * YJIT: Implement opt_duparray_send include_p Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> * Update opt_newarray_send to support simple forms of include?(arg) Similar to opt_duparray_send but for non-static arrays. * YJIT: Implement opt_newarray_send include_p --------- Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
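A rough way to observe the effect is to count allocations around a literal-array `include?` call. This is a sketch only: the helper names `allocations` and `in_list?` are made up here, and the count printed depends on the Ruby build, so no particular number is claimed.

```ruby
# Count objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# A literal array used only to call include?; the optimization described
# above targets this shape of code.
def in_list?(x) = [:a, :b].include?(x)

in_list?(:a) # warm up
count = allocations { 100.times { in_list?(:b) } }
puts "objects allocated by 100 calls: #{count}"
```

On builds without the optimization, each call is expected to duplicate the literal array; on builds with it, the copy should be skipped as long as `Array#include?` has not been redefined.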
2024-11-25Fix vm_objtostring optimization for SymbolMaximillian Polhill
Co-authored-by: John Hawthorn <john@hawthorn.email> Notes: Merged: https://github.com/ruby/ruby/pull/12168
2024-11-06`Warning[:strict_unused_block]`Koichi Sasada
to show unused block warning strictly.

```ruby
class C
  def f = nil
end

class D
  def f = yield
end

[C.new, D.new].each{|obj| obj.f{}}
```

In this case, `D#f` accepts a block, but `C#f` does not. A call such as `obj.f{}` may be made where `obj` is either a `C` or a `D`. To avoid warning in such cases, the unused block warning is emitted only if no method with the same name accepts a block. In the example above, `C.new.f{}` does not show any warning because a same-named method, `D#f`, accepts a block. We call this default behavior "relax mode".

The new `strict_unused_block` warning category switches from "relax mode" to "strict mode": same-named methods are not checked, and `C.new.f{}` produces a warning.

[Feature #15554] Notes: Merged: https://github.com/ruby/ruby/pull/12005
2024-10-31Define `VM_ASSERT_TYPE` macrosNobuyoshi Nakada
2024-10-18YJIT: Allow shareable consts in multi-ractor mode (#11917)John Hawthorn
* Update yjit-bindgen deps * YJIT: Allow shareable consts in multi-ractor mode * Update yjit/src/codegen.rs Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> --------- Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-10-10Cast via `uintptr_t` function pointer between object pointerNobuyoshi Nakada
2024-10-08Add a macro to initialize union `cfunc_type`Nobuyoshi Nakada
``` vm_insnhelper.c:2430:49: error: ISO C prohibits argument conversion to union type [-Wpedantic] 2430 | if (!vm_method_cfunc_is(cd_owner, cd, recv, rb_obj_equal)) { | ^~~~~~~~~~~~ vm_insnhelper.c:2448:42: error: ISO C prohibits argument conversion to union type [-Wpedantic] 2448 | if (cc && check_cfunc(vm_cc_cme(cc), rb_obj_equal)) { | ^~~~~~~~~~~~ ``` and so on.
2024-10-08Cast via `uintptr_t` function pointer between object pointerNobuyoshi Nakada
- ISO C forbids conversion of function pointer to object pointer type - ISO C forbids conversion of object pointer to function pointer type
2024-10-07Revert "Add debugging code to vm_objtostring in ASAN"Peter Zhu
This reverts commit c32fd1b5ed6709dfbed3d19cac881886576e231b. The bug seems to have been fixed with 6acf03618a937f5302fbd3043f9c3420a49f8cb3.
2024-09-25Don't check poisoned for immediatesPeter Zhu
2024-09-25Add debugging code to vm_objtostring in ASANPeter Zhu
To debug this issue on CI: http://ci.rvm.jp/logfiles/brlog.trunk_asan.20240922-002945 Notes: Merged: https://github.com/ruby/ruby/pull/11667
2024-09-05Optimized instruction for Hash#freezeÉtienne Barrié
If a Hash which is empty or only using literals is frozen, we detect this as a peephole optimization and change the instructions to be `opt_hash_freeze`. [Feature #20684] Co-authored-by: Jean Boussier <byroot@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/11406
2024-09-05Optimized instruction for Array#freezeÉtienne Barrié
If an Array which is empty or only using literals is frozen, we detect this as a peephole optimization and change the instructions to be `opt_ary_freeze`. [Feature #20684] Co-authored-by: Jean Boussier <byroot@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/11406
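Whether a given build applies these peephole optimizations can be checked by inspecting the compiled instructions. The sketch below assumes CRuby (it guards the use of `RubyVM`), and the method names `frozen_array` and `frozen_hash` are illustrative only. The observable behavior of `freeze` is the same either way.

```ruby
def frozen_array = [].freeze
def frozen_hash = {}.freeze

p frozen_array.frozen?  # true
p frozen_hash.frozen?   # true

# RubyVM is CRuby-specific, so guard its use.
if defined?(RubyVM::InstructionSequence)
  ["[].freeze", "{}.freeze"].each do |src|
    disasm = RubyVM::InstructionSequence.compile(src).disasm
    specialized = disasm.match?(/opt_(ary|hash)_freeze/)
    puts "#{src}: specialized instruction present? #{specialized}"
  end
end
```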
2024-08-19Avoid hash allocation for certain proc callsJeremy Evans
Previously, proc calls such as: ```ruby proc{|| }.(**empty_hash) proc{|b: 1| }.(**r2k_array_with_empty_hash) ``` both allocated hashes unnecessarily, due to two separate code paths. The first call goes through CALLER_SETUP_ARG/vm_caller_setup_keyword_hash, and is simple to fix by not duping an empty keyword hash that will be dropped. The second case is more involved, in setup_parameters_complex, but is fixed the exact same way as when the ruby2_keywords hash is not empty, by flattening the rest array to the VM stack, ignoring the last element (the empty keyword splat). Add a flatten_rest_array static function to handle this case. Update test_allocation.rb to automatically convert the method call allocation tests to proc allocation tests, at least for the calls that can be converted. With the code changes, all proc call allocation tests pass, showing that proc calls and method calls now allocate the same number of objects. I've audited the allocation tests, and I believe that all of the low hanging fruit has been collected. All remaining allocations are either caller side: * Positional splat + post argument * Multiple positional splats * Literal keywords + keyword splat * Multiple keyword splats Or callee side: * Positional splat parameter * Keyword splat parameter * Keyword to positional argument conversion for methods that don't accept keywords * ruby2_keywords method called with keywords Reapplies abc04e898b627ab37fa9dd5e330f239768778d8b, which was reverted at d56470a27c5a8a2e7aee7a76cea445c2d29c0c59, with the addition of a bug fix and test. Fixes [Bug #20679] Notes: Merged: https://github.com/ruby/ruby/pull/11409 Merged-By: jeremyevans <code@jeremyevans.net>
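The first of the two cases can be measured with a small allocation counter. This is a sketch under assumptions: `allocations`, `EMPTY_KW`, `NO_PARAMS`, and `call_with_empty_splat` are names invented for illustration, and the printed count varies by Ruby version, so it is not stated here.

```ruby
# Count objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

EMPTY_KW = {}
NO_PARAMS = proc { || }

# Calls a proc with an empty keyword splat; the splatted hash is dropped
# because the proc takes no parameters.
def call_with_empty_splat = NO_PARAMS.(**EMPTY_KW)

call_with_empty_splat # warm up
count = allocations { call_with_empty_splat }
puts "objects allocated by one call: #{count}"
```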
2024-08-16Revert "Avoid hash allocation for certain proc calls"Jeremy Evans
This reverts commit abc04e898b627ab37fa9dd5e330f239768778d8b. This caused problems in a Rails test. Notes: Merged: https://github.com/ruby/ruby/pull/11394
2024-08-16Stringize VM_ASSERT expression before expansionNobuyoshi Nakada
2024-08-15Avoid hash allocation for certain proc callsJeremy Evans
Previously, proc calls such as: ```ruby proc{|| }.(**empty_hash) proc{|b: 1| }.(**r2k_array_with_empty_hash) ``` both allocated hashes unnecessarily, due to two separate code paths. The first call goes through CALLER_SETUP_ARG/vm_caller_setup_keyword_hash, and is simple to fix by not duping an empty keyword hash that will be dropped. The second case is more involved, in setup_parameters_complex, but is fixed the exact same way as when the ruby2_keywords hash is not empty, by flattening the rest array to the VM stack, ignoring the last element (the empty keyword splat). Add a flatten_rest_array static function to handle this case. Update test_allocation.rb to automatically convert the method call allocation tests to proc allocation tests, at least for the calls that can be converted. With the code changes, all proc call allocation tests pass, showing that proc calls and method calls now allocate the same number of objects. I've audited the allocation tests, and I believe that all of the low hanging fruit has been collected. All remaining allocations are either caller side: * Positional splat + post argument * Multiple positional splats * Literal keywords + keyword splat * Multiple keyword splats Or callee side: * Positional splat parameter * Keyword splat parameter * Keyword to positional argument conversion for methods that don't accept keywords * ruby2_keywords method called with keywords Notes: Merged: https://github.com/ruby/ruby/pull/11258
2024-08-13do not show unused block on `send`Koichi Sasada
In some cases, such as in a general-purpose framework, it is difficult to know whether a method called via `send` uses a block. So this patch stops showing the unused block warning on `send`. Example with test/unit:

```ruby
require 'test/unit'

class T < Test::Unit::TestCase
  def setup
  end

  def test_foo = nil
end
```

=> /home/ko1/ruby/install/master/lib/ruby/gems/3.4.0+0/gems/test-unit-3.6.2/lib/test/unit/fixture.rb:284: warning: the block passed to 'priority_setup' defined at /home/ko1/ruby/install/master/lib/ruby/gems/3.4.0+0/gems/test-unit-3.6.2/lib/test/unit/priority.rb:183 may be ignored

because test/unit can call any setup method (`priority_setup` in this case) with a block. Maybe we can show the warning again once we provide a way to recognize whether the called method uses a block. Notes: Merged: https://github.com/ruby/ruby/pull/11349
2024-08-10rb_setup_fake_ary: use precomputed flagsJean Boussier
Setting up the fake array is a bit more expensive than would be expected because `rb_ary_freeze` does a lot of checks and looks up a shape transition. If we assume fake arrays will always be frozen, we can precompute the flags state and just assign it. Notes: Merged: https://github.com/ruby/ruby/pull/11344
2024-08-07Make rb_vm_invoke_bmethod() staticAlan Wu
Notes: Merged: https://github.com/ruby/ruby/pull/11331
2024-08-07Tune codegen for rb_yield() calls landing in ISeqsYour Name
Unlike in older revisions in the year, GCC 11 isn't inlining the call to vm_push_frame() inside invoke_iseq_block_from_c() anymore. We do want it to be inlined since rb_yield() speed is fairly important. Logs from -fopt-info-optimized-inline reveal that GCC was blowing its code size budget inlining invoke_block_from_c_bh() into its various callers, leaving suboptimal code for its body. Take away some uses of the `inline` keyword and merge a common tail call to vm_exec() for overall better code. This tweak gives about 18% on a micro benchmark and 1% on the chunky-png benchmark from yjit-bench. I tested on a Skylake server. ``` $ cat c-to-ruby-call.yml benchmark: - 0.upto(10_000_000) {} $ benchmark-driver --chruby '+patch;master' c-to-ruby-call.yml Warming up -------------------------------------- 0.upto(10_000_000) {} 2.299 i/s - 3.000 times in 1.304689s (434.90ms/i) Calculating ------------------------------------- +patch master 0.upto(10_000_000) {} 2.299 1.943 i/s - 6.000 times in 2.609393s 3.088353s Comparison: 0.upto(10_000_000) {} +patch: 2.3 i/s master: 1.9 i/s - 1.18x slower $ ruby run_benchmarks.rb --chruby 'master;+patch' chunky-png <snip> ---------- ----------- ---------- ----------- ---------- -------------- ------------- bench master (ms) stddev (%) +patch (ms) stddev (%) +patch 1st itr master/+patch chunky-png 1156.1 0.1 1142.2 0.2 1.01 1.01 ---------- ----------- ---------- ----------- ---------- -------------- ------------- ``` Notes: Merged: https://github.com/ruby/ruby/pull/11321
2024-08-02Delete unused declarationAlan Wu
2024-07-29Expand opt_newarray_send to support Array#pack with buffer keyword argRandy Stauner
Use an enum for the method arg instead of needing to add an id that doesn't map to an actual method name.

$ ruby --dump=insns -e 'b = "x"; [v].pack("E*", buffer: b)'

before:

```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring                       "x"                       (   1)[Li]
0002 setlocal_WC_0                          b@0
0004 putself
0005 opt_send_without_block                 <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0007 newarray                               1
0009 putchilledstring                       "E*"
0011 getlocal_WC_0                          b@0
0013 opt_send_without_block                 <calldata!mid:pack, argc:2, kw:[#<Symbol:0x000000000023110c>], KWARG>
0015 leave
```

after:

```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring                       "x"                       (   1)[Li]
0002 setlocal_WC_0                          b@0
0004 putself
0005 opt_send_without_block                 <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0007 putchilledstring                       "E*"
0009 getlocal                               b@0, 0
0012 opt_newarray_send                      3, 5
0015 leave
```

Notes: Merged: https://github.com/ruby/ruby/pull/11249
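For readers unfamiliar with the `buffer:` keyword, here is a small sketch of the call shape this instruction targets. The method name `pack_into` is made up for illustration. The optimization is not meant to change the result: `Array#pack` appends the packed bytes to the given buffer and returns that same string.

```ruby
# Build a one-element array and pack it into an existing buffer; this is
# the `[x].pack(fmt, buffer: b)` shape discussed above.
def pack_into(buffer, value) = [value].pack("C", buffer: buffer)

buf = +"x"                 # unfrozen, mutable string
result = pack_into(buf, 66)
p result                   # "xB"
p result.equal?(buf)       # true: the buffer itself is returned
```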
2024-07-14[Bug #20633] Fix the condition for `atomic_signal_fence`kimuraw (Wataru Kimura)
`AC_CHECK_DECLS` defines `HAVE_DECL_SYMBOL` to 1 if declared, 0 otherwise, not undefined.
2024-07-12fix `defined?(@ivar)` with RactorsKoichi Sasada
`defined?(@ivar)` on a non-main Ractor has two issues:

1. It raises an exception.

```ruby
class C
  @iv1 = []
  def self.defined_iv1 = defined?(@iv1)
end

Ractor.new{
  p C.defined_iv1
  #=> can not get unshareable values from instance variables of classes/modules from non-main Ractors (Ractor::IsolationError)
}.take
```

-> Do not raise an exception; return `"instance-variable"` because the variable is defined.

2. It returns `"instance-variable"` even when the variable is not defined.

```
class C
  # @iv2 is not defined
  def self.defined_iv2 = defined?(@iv2)
end

Ractor.new{
  p C.defined_iv2 #=> "instance-variable"
}.take
```

-> Return `nil`.
2024-07-03[Feature #20470] Split GC into gc_impl.cPeter Zhu
This commit splits gc.c into two files: - gc.c now only contains code not specific to Ruby GC. This includes code to mark objects (which the GC implementation may choose not to use) and wrappers for internal APIs that the implementation may need to use (e.g. locking the VM). - gc_impl.c now contains the implementation of Ruby's GC. This includes marking, sweeping, compaction, and statistics. Most importantly, gc_impl.c only uses public APIs in Ruby and a limited set of functions exposed in gc.c. This allows us to build gc_impl.c independently of Ruby and plug Ruby's GC into itself.
2024-07-03Add explicit compiler fence when pushing frames to ensure safe profilingIvo Anjo
**What does this PR do?**

This PR tweaks the `vm_push_frame` function to add an explicit compiler fence (`atomic_signal_fence`) to ensure profilers that use signals to interrupt applications (stackprof, vernier, pf2, Datadog profiler) can safely sample from the signal handler.

**Motivation:**

The `vm_push_frame` function was specifically tweaked in https://github.com/ruby/ruby/pull/3296 to initialize a frame before updating the `cfp` pointer. But since there's nothing stopping the compiler from reordering the initialization of a frame (`*cfp =`) with the update of the cfp pointer (`ec->cfp = cfp`), we've been hesitant to rely on this in the Datadog profiler.

In practice, after some experimentation and talking to folks, this reordering does not seem to happen. But since modern compilers have a way for us to tell them exactly not to do the reordering (`atomic_signal_fence`), this seems even better.

I've actually extracted `vm_push_frame` into the "Compiler Explorer" website, which you can use to see the assembly output of this function across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:

1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output (== has no cost in practice)

**Additional Notes:**

The checks added in `configure.ac` define two new macros:

* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`

Since Ruby generates an arch-specific `config.h` header with these macros upon installation, this can be used by profilers and other libraries to test if Ruby was compiled with the fence enabled.

**How to test the change?**

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K to confirm the compiled output of `vm_push_frame` does not change in most compilers (at least all that I've checked on that site).
2024-07-02Fix forwarding for optimized sendeileencodes
Always treat forwarding as a complex call.
2024-07-02Calling into a C func shouldn't fast path when forwardingeileencodes
When we forward calls to C functions, if the callsite is a forwarding site it might not always be a splat, so we can't use the fast path. Fixes: [ruby-core:118418]
2024-06-21fix sendfwd with `send` and `method_missing`Koichi Sasada
Combining the (optimized) `send` method or `method_missing` with a forwarding send (`...`) needs to respect the given `rb_forwarding_call_data`. Otherwise it causes a critical error such as a SEGV.
2024-06-18Deconstruct ci in one placeAaron Patterson
Putting these calls next to each other lets the compiler combine "packed ci" checks
2024-06-18Refactor so we don't have _cdAaron Patterson
This should make the diff cleaner
2024-06-18Add two new instructions for forwarding callsAaron Patterson
This commit adds `sendforward` and `invokesuperforward` for forwarding parameters to calls Co-authored-by: Matt Valentine-House <matt@eightbitraptor.com>
2024-06-18Set a fast path for forwardable iseqsAaron Patterson
2024-06-18Add a CC fastpath for forwardable methodsAaron Patterson
2024-06-18Optimized forwarding callers and calleesAaron Patterson
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.

Calls it optimizes look like this:

```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```

```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```

```ruby
def bar(*a) = a

def foo(...)
  list = [1, 2]
  bar(*list, ...) # optimized
end
foo(123)
```

All variants of the above but using `super` are also optimized, including a bare super like this:

```ruby
def foo(...)
  super
end
```

This patch eliminates intermediate allocations made when calling methods that accept `...`. We can observe allocation elimination like this:

```ruby
def m
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

def bar(a) = a
def foo(...) = bar(...)

def test
  m { foo(123) }
end

test
p test # allocates 1 object on master, but 0 objects with this patch
```

```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)

def test
  m { foo(1, b: 2) }
end

test
p test # allocates 2 objects on master, but 0 objects with this patch
```

How does it work?
-----------------

This patch works by using a dynamic stack size when passing forwarded parameters to callees. The caller's info object (known as the "CI") contains the stack size of the parameters, so we pass the CI object itself as a parameter to the callee. When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee. The CI at the forwarded call site is adjusted using information from the caller's CI.

I think this description is kind of confusing, so let's walk through an example with code.

```ruby
def delegatee(a, b) = a + b

def delegator(...)
  delegatee(...)  # CI2 (FORWARDING)
end

def caller
  delegator(1, 2) # CI1 (argc: 2)
end
```

Before we call the delegator method, the stack looks like this:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   |
              5|   delegatee(...)  # CI2 (FORWARDING)  |
              6| end                                   |
              7|                                       |
              8| def caller                            |
          ->  9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls into `delegator`, it writes `CI1` onto the stack as a local variable for the `delegator` method. The `delegator` method has a special local called `...` that holds the caller's CI object.

Here is the ISeq disasm for `delegator`:

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

The local called `...` will contain the caller's CI: CI1.

Here is the stack when we enter `delegator`:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
          ->  4|   #                                   | CI1 (argc: 2)
              5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            |
              9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to memcopy the caller's stack before calling `delegatee`. In this case, it will memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much memory to copy from the caller because `CI1` contains stack size information (argc: 2).

Before executing the `send` instruction, we push `...` on the stack. The `send` instruction pops `...`, and because it is tagged with `FORWARDING`, it knows to memcopy (using the information in the CI it just popped):

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

Instruction 0001 puts the caller's CI on the stack. `send` is tagged with FORWARDING, so it reads the CI and _copies_ the caller's stack to this stack:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   | CI1 (argc: 2)
          ->  5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            | self
              9|   delegator(1, 2) # CI1 (argc: 2)     | 1
             10| end                                   | 2
```

The "FORWARDING" call site combines information from CI1 with CI2 in order to support passing other values in addition to the `...` value, as well as perfectly forward splat args, kwargs, etc.

Since we're able to copy the stack from `caller` into `delegator`'s stack, we can avoid allocating objects.

I want to do this to eliminate object allocations for delegate methods. My long term goal is to implement `Class#new` in Ruby and it uses `...`.

I was able to implement `Class#new` in Ruby [here](https://github.com/ruby/ruby/pull/9289). If we adopt the technique in this patch, then we can optimize allocating objects that take keyword parameters for `initialize`.

For example, this code will allocate 2 objects: one for `SomeObject`, and one for the kwargs:

```ruby
SomeObject.new(foo: 1)
```

If we combine this technique, plus implement `Class#new` in Ruby, then we can reduce allocations for this common operation.

Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
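The stack walk-through above can be mimicked with a toy model. This is a simplified sketch in plain Ruby, not how the VM is implemented: `CallInfo`, `forwarding_send`, and `simulate_forwarding` are hypothetical names, the stack is an ordinary array, and the push and pop of the `...` local around `send` is left out.

```ruby
# Toy call info: only the argument count matters for this model.
CallInfo = Struct.new(:argc)

# Model of a FORWARDING send: copy the receiver plus `argc` arguments from
# the caller's region of the stack onto the top of the stack.
def forwarding_send(stack, caller_base, ci)
  stack + stack[caller_base, ci.argc + 1]
end

def simulate_forwarding
  ci1 = CallInfo.new(2)                         # CI1 (argc: 2)
  stack = [:self, 1, 2]                         # `caller` pushed self, 1, 2
  stack += [ci1, :cref_or_me, :specval, :type]  # entering `delegator`
  forwarding_send(stack, 0, ci1)                # `delegatee(...)`
end

p simulate_forwarding.last(3)  # [:self, 1, 2]
```

The copied tail matches the last three stack slots in the final diagram: the amount copied is taken from the caller's CI, so the forwarding method does not need to collect the arguments into a newly allocated array or hash.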
2024-06-03Count uninitialized call cache as miss emptyNobuyoshi Nakada
Fix segfault at start up when `USE_DEBUG_COUNTER` is enabled.
2024-06-02Stop exposing `rb_str_chilled_p`Jean Boussier
[Feature #20205] Now that chilled strings no longer appear as frozen, there is no need to offer an API to check for chilled strings. We however need to change `rb_check_frozen_internal` to no longer be a macro, as it needs to check for chilled strings.
2024-05-29Cast to void pointer for -Wformat-pedanticNobuyoshi Nakada
2024-05-23Introduce a specialize instruction for Array#packNobuyoshi Nakada
Instructions for this code:

```ruby
# frozen_string_literal: true

[a].pack("C")
```

Before this commit:

```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself                                                          (   3)[Li]
0001 opt_send_without_block                 <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 newarray                               1
0005 putobject                              "C"
0007 opt_send_without_block                 <calldata!mid:pack, argc:1, ARGS_SIMPLE>
0009 leave
```

After this commit:

```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself                                                          (   3)[Li]
0001 opt_send_without_block                 <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 putobject                              "C"
0005 opt_newarray_send                      2, :pack
0008 leave
```

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2024-04-24We don't need to check if the ci is markable anymoreAaron Patterson
It doesn't matter if CI's are stack allocated or not.
2024-04-24Pass a callinfo object to global call cache searchAaron Patterson
Global call cache can be used with only a CI
2024-04-24Reuse slow path method search for gccctAaron Patterson
This way all code paths use the same search code for finding call caches for a particular method.
2024-04-19`RUBY_TRY_UNUSED_BLOCK_WARNING_STRICT`Koichi Sasada
`RUBY_TRY_UNUSED_BLOCK_WARNING_STRICT=1 ruby ...` will enable the strict check for the unused block warning. This option is only for trial use, to compare results, so the environment variable name has not been carefully considered. It should be removed before the Ruby 3.4.0 release.
2024-04-18Remove markable guard before pushing on ccs listAaron Patterson
CCS list doesn't mark CI objects, so it doesn't matter whether or not they are markable before pushing.