| Age | Commit message (Collapse) | Author |
|
Push a real iseq in rb_vm_push_frame_fname()
Previously, vm_make_env_each() (used during proc
creation and for the debug inspector C API) picked up the
non-GC-allocated iseq that rb_vm_push_frame_fname() creates,
which led to a SEGV when the GC tried to mark the non GC object.
Put a real iseq imemo instead. Speed should be about the same since
the old code also did a imemo allocation and a malloc allocation.
Real iseq allows ironing out the special-casing of dummy frames in
rb_execution_context_mark() and rb_execution_context_update(). A check
is added to RubyVM::ISeq#eval, though, to stop attempts to run dummy
iseqs.
[Bug #21180]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
Fix use-after-free in constant cache
[Bug #20921]
When we create a cache entry for a constant, the following sequence of
events could happen:
- vm_track_constant_cache is called to insert a constant cache.
- In vm_track_constant_cache, we first look up the ST table for the ID
of the constant. Assume the ST table exists because another iseq also
holds a cache entry for this ID.
- We then insert into this ST table with the iseq_inline_constant_cache.
- However, while inserting into this ST table, it allocates memory, which
could trigger a GC. Assume that it does trigger a GC.
- The GC frees the one and only other iseq that holds a cache entry for
this ID.
- In remove_from_constant_cache, it will appear that the ST table is now
empty because there are no more iseq with cache entries for this ID, so
we free the ST table.
- We complete GC and continue our st_insert. However, this ST table has
been freed so we now have a use-after-free.
This issue is very hard to reproduce, because it requires that the GC runs
at a very specific time. However, we can make it show up by applying this
patch which runs GC right before the st_insert to mimic the st_insert
triggering a GC:
diff --git a/vm_insnhelper.c b/vm_insnhelper.c
index 3cb23f06f0..a93998136a 100644
--- a/vm_insnhelper.c
+++ b/vm_insnhelper.c
@@ -6338,6 +6338,10 @@ vm_track_constant_cache(ID id, void *ic)
rb_id_table_insert(const_cache, id, (VALUE)ics);
}
+ if (id == rb_intern("MyConstant")) rb_gc();
+
st_insert(ics, (st_data_t) ic, (st_data_t) Qtrue);
}
And if we run this script:
Object.const_set("MyConstant", "Hello!")
my_proc = eval("-> { MyConstant }")
my_proc.call
my_proc = eval("-> { MyConstant }")
my_proc.call
We can see that ASAN outputs a use-after-free error:
==36540==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000049528 at pc 0x000102f3ceac bp 0x00016d607a70 sp 0x00016d607a68
READ of size 8 at 0x606000049528 thread T0
#0 0x102f3cea8 in do_hash st.c:321
#1 0x102f3ddd0 in rb_st_insert st.c:1132
#2 0x103140700 in vm_track_constant_cache vm_insnhelper.c:6345
#3 0x1030b91d8 in vm_ic_track_const_chain vm_insnhelper.c:6356
#4 0x1030b8cf8 in rb_vm_opt_getconstant_path vm_insnhelper.c:6424
#5 0x1030bc1e0 in vm_exec_core insns.def:263
#6 0x1030b55fc in rb_vm_exec vm.c:2585
#7 0x1030fe0ac in rb_iseq_eval_main vm.c:2851
#8 0x102a82588 in rb_ec_exec_node eval.c:281
#9 0x102a81fe0 in ruby_run_node eval.c:319
#10 0x1027f3db4 in rb_main main.c:43
#11 0x1027f3bd4 in main main.c:68
#12 0x183900270 (<unknown module>)
0x606000049528 is located 8 bytes inside of 56-byte region [0x606000049520,0x606000049558)
freed by thread T0 here:
#0 0x104174d40 in free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x54d40)
#1 0x102ada89c in rb_gc_impl_free default.c:8183
#2 0x102ada7dc in ruby_sized_xfree gc.c:4507
#3 0x102ac4d34 in ruby_xfree gc.c:4518
#4 0x102f3cb34 in rb_st_free_table st.c:663
#5 0x102bd52d8 in remove_from_constant_cache iseq.c:119
#6 0x102bbe2cc in iseq_clear_ic_references iseq.c:153
#7 0x102bbd2a0 in rb_iseq_free iseq.c:166
#8 0x102b32ed0 in rb_imemo_free imemo.c:564
#9 0x102ac4b44 in rb_gc_obj_free gc.c:1407
#10 0x102af4290 in gc_sweep_plane default.c:3546
#11 0x102af3bdc in gc_sweep_page default.c:3634
#12 0x102aeb140 in gc_sweep_step default.c:3906
#13 0x102aeadf0 in gc_sweep_rest default.c:3978
#14 0x102ae4714 in gc_sweep default.c:4155
#15 0x102af8474 in gc_start default.c:6484
#16 0x102afbe30 in garbage_collect default.c:6363
#17 0x102ad37f0 in rb_gc_impl_start default.c:6816
#18 0x102ad3634 in rb_gc gc.c:3624
#19 0x1031406ec in vm_track_constant_cache vm_insnhelper.c:6342
#20 0x1030b91d8 in vm_ic_track_const_chain vm_insnhelper.c:6356
#21 0x1030b8cf8 in rb_vm_opt_getconstant_path vm_insnhelper.c:6424
#22 0x1030bc1e0 in vm_exec_core insns.def:263
#23 0x1030b55fc in rb_vm_exec vm.c:2585
#24 0x1030fe0ac in rb_iseq_eval_main vm.c:2851
#25 0x102a82588 in rb_ec_exec_node eval.c:281
#26 0x102a81fe0 in ruby_run_node eval.c:319
#27 0x1027f3db4 in rb_main main.c:43
#28 0x1027f3bd4 in main main.c:68
#29 0x183900270 (<unknown module>)
previously allocated by thread T0 here:
#0 0x104174c04 in malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x54c04)
#1 0x102ada0ec in rb_gc_impl_malloc default.c:8198
#2 0x102acee44 in ruby_xmalloc gc.c:4438
#3 0x102f3c85c in rb_st_init_table_with_size st.c:571
#4 0x102f3c900 in rb_st_init_table st.c:600
#5 0x102f3c920 in rb_st_init_numtable st.c:608
#6 0x103140698 in vm_track_constant_cache vm_insnhelper.c:6337
#7 0x1030b91d8 in vm_ic_track_const_chain vm_insnhelper.c:6356
#8 0x1030b8cf8 in rb_vm_opt_getconstant_path vm_insnhelper.c:6424
#9 0x1030bc1e0 in vm_exec_core insns.def:263
#10 0x1030b55fc in rb_vm_exec vm.c:2585
#11 0x1030fe0ac in rb_iseq_eval_main vm.c:2851
#12 0x102a82588 in rb_ec_exec_node eval.c:281
#13 0x102a81fe0 in ruby_run_node eval.c:319
#14 0x1027f3db4 in rb_main main.c:43
#15 0x1027f3bd4 in main main.c:68
#16 0x183900270 (<unknown module>)
This commit fixes this bug by adding a inserting_constant_cache_id field
to the VM, which stores the ID that is currently being inserted and, in
remove_from_constant_cache, we don't free the ST table for ID equal to
this one.
Co-Authored-By: Alan Wu <alanwu@ruby-lang.org>
|
|
[Bug #20633] Fix the condition for `atomic_signal_fence`
`AC_CHECK_DECLS` defines `HAVE_DECL_SYMBOL` to 1 if declared, 0
otherwise, not undefined.
Co-authored-by: kimuraw (Wataru Kimura) <kimuraw@i.nifty.jp>
|
|
safe profiling (#11090)
**What does this PR do?**
This PR tweaks the `vm_push_frame` function to add an explicit compiler
fence (`atomic_signal_fence`) to ensure profilers that use signals
to interrupt applications (stackprof, vernier, pf2, Datadog profiler)
can safely sample from the signal handler.
This is a backport of #11036 to Ruby 3.3 .
**Motivation:**
The `vm_push_frame` was specifically tweaked in
https://github.com/ruby/ruby/pull/3296 to initialize the a frame
before updating the `cfp` pointer.
But since there's nothing stopping the compiler from reordering
the initialization of a frame (`*cfp =`) with the update of the cfp
pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on
the Datadog profiler.
In practice, after some experimentation + talking to folks, this
reordering does not seem to happen.
But since modern compilers have a way for us to exactly tell them
not to do the reordering (`atomic_signal_fence`), this seems even
better.
I've actually extracted `vm_push_frame` into the "Compiler Explorer"
website, which you can use to see the assembly output of this function
across many compilers and architectures: https://godbolt.org/z/3oxd1446K
On that link you can observe two things across many compilers:
1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output
(== has no cost in practice)
**Additional Notes:**
The checks added in `configure.ac` define two new macros:
* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`
Since Ruby generates an arch-specific `config.h` header with
these macros upon installation, this can be used by profilers
and other libraries to test if Ruby was compiled with the fence enabled.
**How to test the change?**
As I mentioned above, you can check https://godbolt.org/z/3oxd1446K
to confirm the compiled output of `vm_push_frame` does not change
in most compilers (at least all that I've checked on that site).
|
|
This follows the same approach used for attr_reader/attr_writer in
2d98593bf54a37397c6e4886ccc7e3654c2eaf85, skipping the checking for
tracing after the first call using the call cache, and clearing the
call cache when tracing is turned on/off.
Fixes [Bug #18886]
|
|
Since Ruby 2.4, `return` is supported at toplevel. This makes `eval "return"`
also supported at toplevel.
This mostly uses the same tests as direct `return` at toplevel, with a couple
differences:
`END {return if false}` is a SyntaxError, but `END {eval "return" if false}`
is not an error since the eval is never executed. `END {return}` is a
SyntaxError, but `END {eval "return"}` is a LocalJumpError.
The following is a SyntaxError:
```ruby
class X
nil&defined?0--begin e=no_method_error(); return; 0;end
end
```
However, the following is not, because the eval is never executed:
```ruby
class X
nil&defined?0--begin e=no_method_error(); eval "return"; 0;end
end
```
Fixes [Bug #19779]
|
|
The expandarray instruction can allocate an array, which can trigger
a GC compaction. However, since it does not increment the sp until the
end of the instruction, the objects it places on the stack are not
marked or reference updated by the GC, which can cause the objects to
move which leaves broken or incorrect objects on the stack.
This commit changes the instruction to be handles_sp so the sp is
incremented inside of the instruction right after the object is written
on the stack.
|
|
Previously, we didn't invalidate the method entry wrapped by
VM_METHOD_TYPE_REFINED method entries which could cause calls to
land in the wrong method like it did in the included test.
Do the invalidation, and adjust rb_method_entry_clone() to accommodate
this new invalidation vector.
Fix: cfd7729ce7a31c8b6ec5dd0e99c67b2932de4732
See-also: e201b81f79828c30500947fe8c8ea3c515e3d112
|
|
We've seen occasional CI failures on i686 in this codepath:
```
[BUG] vm_setivar_slowpath: didn't find ivar @verify_depth in shape
```
Generic ivars are very complex to get right, but also quite rare.
I don't see a good reason to take the risk to give them an optimized
path here, when the much more common T_CLASS/T_MODULE don't have one.
Having an optimization here means duplicating the fairly brittle
logic, which is a recipe for bugs, and I don't think it's worth
it in such case.
|
|
We're occasionally hitting this bug on CI, it would be useful
to see if the id is consistent.
|
|
This reverts commit 5f3fb4f4e397735783743fe52a7899b614bece20.
|
|
This reverts commit f6910a61122931e4193bcc0fad18d839c319b720.
We're seeing crashes in the test suite of Shopify's core monolith after
this change.
|
|
We don't need to create a shape to transition capacity as we can
transition the capacity when the capacity of the SHAPE_IVAR changes.
|
|
This commit changes generic ivars to respect the capacity transition in
shapes rather than growing the capacity independently.
|
|
When an inline cache misses, it is very likely that the stale shape_id
and the current instance shape_id have a close common ancestor.
For example if the instance variable is sometimes frozen sometimes
not, one of the two shape will be the direct parent of the other.
Another pattern that commonly cause IC misses is "memoization",
in such case the object will have a "base common shape" and then
a number of close descendants.
In addition, when we find a common ancestor, we store it in the
inline cache instead of the current shape. This help prevent the
cache from flip-flopping, ensuring the next lookup will be marginally
faster and more generally avoid writing in memory too much.
However, now that shapes have an ancestors index, we only check
for a few ancestors before falling back to use the index.
So overall this change speeds up what is assumed to be the more common
case, but makes what is assumed to be the less common case a bit slower.
```
compare-ruby: ruby 3.3.0dev (2023-10-26T05:30:17Z master 701ca070b4) [arm64-darwin22]
built-ruby: ruby 3.3.0dev (2023-10-26T09:25:09Z shapes_double_sear.. a723a85235) [arm64-darwin22]
warming up......
| |compare-ruby|built-ruby|
|:------------------------------------|-----------:|---------:|
|vm_ivar_stable_shape | 11.672M| 11.679M|
| | -| 1.00x|
|vm_ivar_memoize_unstable_shape | 7.551M| 10.506M|
| | -| 1.39x|
|vm_ivar_memoize_unstable_shape_miss | 11.591M| 11.624M|
| | -| 1.00x|
|vm_ivar_unstable_undef | 9.037M| 7.981M|
| | 1.13x| -|
|vm_ivar_divergent_shape | 8.034M| 6.657M|
| | 1.21x| -|
|vm_ivar_divergent_shape_imbalanced | 10.471M| 9.231M|
| | 1.13x| -|
```
Co-Authored-By: John Hawthorn <john@hawthorn.email>
|
|
|
|
On 32-bit systems, we must store the shape ID in the gen_ivtbl to not
lose the shape. If we directly store the ST table into the generic
ivar table, then we lose the shape. This makes it impossible to
determine the shape of the object and whether it is too complex or not.
|
|
There is a handful of call sites where we may transition to
OBJ_TOO_COMPLEX_SHAPE if we just ran out of shapes, but that
weren't handling it properly.
|
|
We don't need to intern "initialize" all the time because we already
have `idInitialize` available
|
|
* YJIT: Fallback opt_getconstant_path for const_missing
* Fix a comment [ci skip]
* Remove a wrapper function
|
|
fix memory leak in vm_method
This introduces a unified reference_count to clarify who is referencing a method.
This also allows us to treat the refinement method as the def owner since it counts itself as a reference
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
|
|
Previously, TestStack#test_machine_stack_size failed pretty consistently
on ARM64 macOS, with Rust code and part of the interpreter used for
per-instruction fallback (rb_vm_invokeblock() and friends) touching the
stack guard page and crashing with SEGV. I've also seen the same test
fail on x64 Linux, though with a different symptom.
Notes:
Merged: https://github.com/ruby/ruby/pull/8443
Merged-By: XrXr
|
|
* YJIT: implement side chain fallback for setlocal to avoid exiting
* Update yjit/src/codegen.rs
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
---------
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
fix [Feature #19572]
Notes:
Merged: https://github.com/ruby/ruby/pull/8150
|
|
From Ruby 3.0, refined method invocations are slow because
resolved methods are not cached by inline cache because of
conservertive strategy. However, `using` clears all caches
so that it seems safe to cache resolved method entries.
This patch caches resolved method entries in inline cache
and clear all of inline method caches when `using` is called.
fix [Bug #18572]
```ruby
# without refinements
class C
def foo = :C
end
N = 1_000_000
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
_END__
user system total real
master 0.362859 0.002544 0.365403 ( 0.365424)
modified 0.357251 0.000000 0.357251 ( 0.357258)
```
```ruby
# with refinment but without using
class C
def foo = :C
end
module R
refine C do
def foo = :R
end
end
N = 1_000_000
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
__END__
user system total real
master 0.957182 0.000000 0.957182 ( 0.957212)
modified 0.359228 0.000000 0.359228 ( 0.359238)
```
```ruby
# with using
class C
def foo = :C
end
module R
refine C do
def foo = :R
end
end
N = 1_000_000
using R
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
Notes:
Merged: https://github.com/ruby/ruby/pull/8129
|
|
`struct rb_calling_info::cd` is introduced and `rb_calling_info::ci`
is replaced with it to manipulate the inline cache of iseq while
method invocation process. So that `ci` can be acessed with
`calling->cd->ci`. It adds one indirection but it can be justified
by the following points:
1) `vm_search_method_fastpath()` doesn't need `ci` and also
`vm_call_iseq_setup_normal()` doesn't need `ci`. It means
reducing `cd->ci` access in `vm_sendish()` can make it faster.
2) most of method types need to access `ci` once in theory
so that 1 additional indirection doesn't matter.
Notes:
Merged: https://github.com/ruby/ruby/pull/8129
|
|
`vm_search_super_method()` makes orphan CCs (they are not connected
from ccs) and `cc->cme_` can be collected before without marking.
Notes:
Merged: https://github.com/ruby/ruby/pull/8145
|
|
Implement gen_opt_aref_with
Vm opt_aref_with is available
Test opt_aref_with
Stats for opt_aref_with
Co-authored-by: jhawthorn <jhawthorn@github.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
Remove rb_control_frame_t::__bp__ and optimize bmethod calls
This commit removes the __bp__ field from rb_control_frame_t. It was
introduced to help MJIT, but since MJIT was replaced by RJIT, we can use
vm_base_ptr() to compute it from the SP of the previous control frame
instead. Removing the field avoids needing to set it up when pushing new
frames.
Simply removing __bp__ would cause crashes since RJIT and YJIT used a
slightly different stack layout for bmethod calls than the interpreter.
At the moment of the call, the two layouts looked as follows:
┌────────────┐ ┌────────────┐
│ frame_base │ │ frame_base │
├────────────┤ ├────────────┤
│ ... │ │ ... │
├────────────┤ ├────────────┤
│ args │ │ args │
├────────────┤ └────────────┘<─prev_frame_sp
│ receiver │
prev_frame_sp─>└────────────┘
RJIT & YJIT interpreter
Essentially, vm_base_ptr() needs to compute the address to frame_base
given prev_frame_sp in the diagrams. The presence of the receiver
created an off-by-one situation.
Make the interpreter use the layout the JITs use for iseq-to-iseq
bmethod calls. Doing so removes unnecessary argument shifting and
vm_exec_core() re-entry from the interpreter, yielding a speed
improvement visible through `benchmark/vm_defined_method.yml`:
patched: 7578743.1 i/s
master: 4796596.3 i/s - 1.58x slower
C-to-iseq bmethod calls now store one more VALUE than before, but that
should have negligible impact on overall performance.
Note that re-entering vm_exec_core() used to be necessary for firing
TracePoint events, but that's no longer the case since
9121e57a5f50bc91bae48b3b91edb283bf96cb6b.
Closes ruby/ruby#6428
|
|
RARRAY_CONST_PTR now does the same things as RARRAY_CONST_PTR_TRANSIENT.
Notes:
Merged: https://github.com/ruby/ruby/pull/8071
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7983
|
|
We were missing the write barrier for class_value to cref. This should
fix the segv we were seeing in http://ci.rvm.jp/logfiles/brlog.trunk-gc-asserts.20230601-165052
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Notes:
Merged: https://github.com/ruby/ruby/pull/7900
|
|
This reverts commit 10621f7cb9a0c70e568f89cce47a02e878af6778.
This was reverted because the gc integrity build started failing. We
have figured out a fix so I'm reopening the PR.
Original commit message:
Fix cvar caching when class is cloned
The class variable cache that was added in
ruby#4544 changed the behavior of class
variables on cloned classes. As reported when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Notes:
Merged: https://github.com/ruby/ruby/pull/7900
|
|
This reverts commit 77d1b082470790c17c24a2f406b4fec5d522636b.
|
|
The class variable cache that was added in
https://github.com/ruby/ruby/pull/4544 changed the behavior of class
variables on cloned classes. As reported when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Notes:
Merged: https://github.com/ruby/ruby/pull/7265
|
|
|
|
Notes:
Merged-By: ioquatix <samuel@codeotaku.com>
|
|
Co-authored-by: Rafael Mendonça França <rafael@franca.dev>
|
|
Prior to this commit, a segmentation fault occurred in `vm_defined`'s
`zsuper` implementation after NULL is returned as `BasicObject`'s superclass.
This fix returns false from `vm_defined` if the superclass is NULL.
For example, the following code resulted in a segfault.
```ruby
class BasicObject
def seg_fault
defined?(super)
end
end
seg_fault
```
|
|
CALLER_ARG_SPLAT is not necessary for method_missing. We just need
to unshift the method name into the arguments.
This optimizes all method_missing calls:
* mm(recv) ~9%
* mm(recv, *args) ~215% for args.length == 200
* mm(recv, *args, **kw) ~55% for args.length == 200
* mm(recv, **kw) ~22%
* mm(recv, kw: 1) ~100%
Note that empty argument splats do get slower with this approach,
by about 30-40%. Other than non-empty argument splats, other
argument splats are faster, with the speedup depending on the
number of arguments.
Notes:
Merged: https://github.com/ruby/ruby/pull/7522
|
|
Similar to the bmethod/send optimization, this avoids using
CALLER_ARG_SPLAT if not necessary. As long as the receiver argument
can be shifted off, other arguments are passed through as-is.
This optimizes the following types of calls:
* symproc.(recv) ~5%
* symproc.(recv, *args) ~65% for args.length == 200
* symproc.(recv, *args, **kw) ~45% for args.length == 200
* symproc.(recv, **kw) ~30%
* symproc.(recv, kw: 1) ~100%
Note that empty argument splats do get slower with this approach,
by about 2-3%. This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Other
than non-empty argument splats, other argument splats are faster,
with the speedup depending on the number of arguments.
The following types of calls are not optimized:
* symproc.(*args)
* symproc.(*args, **kw)
This is because the you cannot shift the receiver argument off
without first splatting the arg.
Notes:
Merged: https://github.com/ruby/ruby/pull/7522
|
|
Similar to the bmethod optimization, this avoids using
CALLER_ARG_SPLAT if not necessary. As long as the method argument
can be shifted off, other arguments are passed through as-is.
This optimizes the following types of calls:
* send(meth, arg) ~5%
* send(meth, *args) ~75% for args.length == 200
* send(meth, *args, **kw) ~50% for args.length == 200
* send(meth, **kw) ~25%
* send(meth, kw: 1) ~115%
Note that empty argument splats do get slower with this approach,
by about 20%. This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Other
than non-empty argument splats, other argument splats are faster,
with the speedup depending on the number of arguments.
The following types of calls are not optimized:
* send(*args)
* send(*args, **kw)
This is because the you cannot shift the method argument off
without first splatting the arg.
Notes:
Merged: https://github.com/ruby/ruby/pull/7522
|
|
This optimizes the following calls:
* ~10-15% for f(*a) when a does not end with a flagged keywords hash
* ~10-15% for f(*a) when a ends with an empty flagged keywords hash
* ~35-40% for f(*a, **kw) if kw is empty
This still copies the array contents to the VM stack, but avoids some
overhead. It would be faster to use the array pointer directly,
but that could cause problems if the array was modified during
the call to the function. You could do that optimization for frozen
arrays, but as splatting frozen arrays is uncommon, and the speedup
is minimal (<5%), it doesn't seem worth it.
The vm_send_cfunc benchmark has been updated to test additional cfunc
call types, and the numbers above were taken from the benchmark results.
Notes:
Merged: https://github.com/ruby/ruby/pull/7522
|
|
Currently, bmethod arguments are copied from the VM stack to the
C stack in vm_call_bmethod, then copied from the C stack to the VM
stack later in invoke_iseq_block_from_c. This is inefficient.
This adds vm_call_iseq_bmethod and vm_call_noniseq_bmethod.
vm_call_iseq_bmethod is an optimized method that skips stack
copies (though there is one copy to remove the receiver from
the stack), and avoids calling vm_call_bmethod_body,
rb_vm_invoke_bmethod, invoke_block_from_c_proc,
invoke_iseq_block_from_c, and vm_yield_setup_args.
Th vm_call_iseq_bmethod argument handling is similar to the
way normal iseq methods are called, and allows for similar
performance optimizations when using splats or keywords.
However, even in the no argument case it's still significantly
faster.
A benchmark is added for bmethod calling. In my environment,
it improves bmethod calling performance by 38-59% for simple
bmethod calls, and up to 180% for bmethod calls passing
literal keywords on both sides.
```
./miniruby-iseq-bmethod: 18159792.6 i/s
./miniruby-m: 13174419.1 i/s - 1.38x slower
bmethod_simple_1
./miniruby-iseq-bmethod: 15890745.4 i/s
./miniruby-m: 10008972.7 i/s - 1.59x slower
bmethod_simple_0_splat
./miniruby-iseq-bmethod: 13142804.3 i/s
./miniruby-m: 11168595.2 i/s - 1.18x slower
bmethod_simple_1_splat
./miniruby-iseq-bmethod: 12375791.0 i/s
./miniruby-m: 8491140.1 i/s - 1.46x slower
bmethod_no_splat
./miniruby-iseq-bmethod: 10151258.8 i/s
./miniruby-m: 8716664.1 i/s - 1.16x slower
bmethod_0_splat
./miniruby-iseq-bmethod: 8138802.5 i/s
./miniruby-m: 7515600.2 i/s - 1.08x slower
bmethod_1_splat
./miniruby-iseq-bmethod: 8028372.7 i/s
./miniruby-m: 5947658.6 i/s - 1.35x slower
bmethod_10_splat
./miniruby-iseq-bmethod: 6953514.1 i/s
./miniruby-m: 4840132.9 i/s - 1.44x slower
bmethod_100_splat
./miniruby-iseq-bmethod: 5287288.4 i/s
./miniruby-m: 2243218.4 i/s - 2.36x slower
bmethod_kw
./miniruby-iseq-bmethod: 8931358.2 i/s
./miniruby-m: 3185818.6 i/s - 2.80x slower
bmethod_no_kw
./miniruby-iseq-bmethod: 12281287.4 i/s
./miniruby-m: 10041727.9 i/s - 1.22x slower
bmethod_kw_splat
./miniruby-iseq-bmethod: 5618956.8 i/s
./miniruby-m: 3657549.5 i/s - 1.54x slower
```
Notes:
Merged: https://github.com/ruby/ruby/pull/7522
|
|
SystemStackError
Originally, when 2e7bceb34ea858649e1f975a934ce1894d1f06a6 fixed cfuncs to no
longer use the VM stack for large array splats, it was thought to have fully
fixed Bug #4040, since the issue was fixed for methods defined in Ruby (iseqs)
back in Ruby 2.2.
After additional research, I determined that same issue affects almost all
types of method calls, not just iseq and cfunc calls. There were two main
types of remaining issues, important cases (where large array splat should
work) and pedantic cases (where large array splat raised SystemStackError
instead of ArgumentError).
Important cases:
```ruby
define_method(:a){|*a|}
a(*1380888.times)
def b(*a); end
send(:b, *1380888.times)
:b.to_proc.call(self, *1380888.times)
def d; yield(*1380888.times) end
d(&method(:b))
def self.method_missing(*a); end
not_a_method(*1380888.times)
```
Pedantic cases:
```ruby
def a; end
a(*1380888.times)
def b(_); end
b(*1380888.times)
def c(_=nil); end
c(*1380888.times)
c = Class.new do
attr_accessor :a
alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)
c = Struct.new(:a) do
alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)
```
This patch fixes all usage of CALLER_SETUP_ARG with splatting a large
number of arguments, and required similar fixes to use a temporary
hidden array in three other cases where the VM would use the VM stack
for handling a large number of arguments. However, it is possible
there may be additional cases where splatting a large number
of arguments still causes a SystemStackError.
This has a measurable performance impact, as it requires additional
checks for a large number of arguments in many additional cases.
This change is fairly invasive, as there were many different VM
functions that needed to be modified to support this. To avoid
too much API change, I modified struct rb_calling_info to add a
heap_argv member for storing the array, so I would not have to
thread it through many functions. This struct is always stack
allocated, which helps ensure sure GC doesn't collect it early.
Because of how invasive the changes are, and how rarely large
arrays are actually splatted in Ruby code, the existing test/spec
suites are not great at testing for correct behavior. To try to
find and fix all issues, I tested this in CI with
VM_ARGC_STACK_MAX to -1, ensuring that a temporary array is used
for all array splat method calls. This was very helpful in
finding breaking cases, especially ones involving flagged keyword
hashes.
Fixes [Bug #4040]
Co-authored-by: Jimmy Miller <jimmy.miller@shopify.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/7522
|
|
This commit implements opt_newarray_send along with min / max / hash for
stack allocated arrays
Notes:
Merged: https://github.com/ruby/ruby/pull/6090
|
|
This commit introduces a new instruction `opt_newarray_send` which is
used when there is an array literal followed by either the `hash`,
`min`, or `max` method.
```
[a, b, c].hash
```
Will emit an `opt_newarray_send` instruction. This instruction falls
back to a method call if the "interested" method has been monkey
patched.
Here are some examples of the instructions generated:
```
$ ./miniruby --dump=insns -e '[@a, @b].max'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :max
0009 leave
$ ./miniruby --dump=insns -e '[@a, @b].min'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :min
0009 leave
$ ./miniruby --dump=insns -e '[@a, @b].hash'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :hash
0009 leave
```
[Feature #18897] [ruby-core:109147]
Co-authored-by: John Hawthorn <jhawthorn@github.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/6090
|