| Age | Commit message (Collapse) | Author |
|
Co-authored-by: Luke Gruber <luke.gru@gmail.com>
|
|
This is easier to access as ec->ractor_id instead of pointer-chasing through
ec->thread->ractor->ractor_id
Co-authored-by: Luke Gruber <luke.gru@gmail.com>
|
|
Before this change, GC'ing any Ractor object caused you to lose all
enabled tracepoints across all ractors (even main). Now tracepoints are
ractor-local and this doesn't happen. Internal events are still global.
Fixes [Bug #19112]
|
|
Since it now live in the EC.
|
|
Mutexes spend a significant amount of time in `rb_fiber_serial`
because it can't be inlined (except with LTO).
The fiber struct is opaque the so function can't be defined as inlineable.
Ideally the while fiber struct would not be opaque to the rest of
Ruby core, but it's tricky to do.
Instead we can store the fiber serial in the execution context
itself, and make its access cheaper:
```
$ hyperfine './miniruby-baseline --yjit /tmp/mut.rb' './miniruby-inline-serial --yjit /tmp/mut.rb'
Benchmark 1: ./miniruby-baseline --yjit /tmp/mut.rb
Time (mean ± σ): 4.011 s ± 0.084 s [User: 3.977 s, System: 0.011 s]
Range (min … max): 3.950 s … 4.245 s 10 runs
Benchmark 2: ./miniruby-inline-serial --yjit /tmp/mut.rb
Time (mean ± σ): 3.495 s ± 0.150 s [User: 3.448 s, System: 0.009 s]
Range (min … max): 3.340 s … 3.869 s 10 runs
Summary
./miniruby-inline-serial --yjit /tmp/mut.rb ran
1.15 ± 0.05 times faster than ./miniruby-baseline --yjit /tmp/mut.rb
```
```ruby
i = 10_000_000
mut = Mutex.new
while i > 0
i -= 1
mut.synchronize { }
mut.synchronize { }
mut.synchronize { }
mut.synchronize { }
mut.synchronize { }
mut.synchronize { }
mut.synchronize { }
mut.synchronize { }
mut.synchronize { }
mut.synchronize { }
end
```
|
|
|
|
|
|
Internal tracepoints only make sense to run globally, and they already
took completely different paths.
|
|
Fix race between timer thread dequeuing waiting thread and thread
skipping sleeping due to being dequeued. We now use `th->event_serial` which
is protected by `thread_sched_lock`. When a thread is put on timer thread's waiting
list, the event serial is saved on the item. The timer thread checks
that the saved serial is the same as current thread's serial before
calling `thread_sched_to_ready`.
The following script (taken from a test in `test_thread.rb` used to crash on
scheduler debug assertions. It would likely crash in non-debug mode as well.
```ruby
def assert_nil(val)
if val != nil
raise "Expected #{val} to be nil"
end
end
def assert_equal(expected, actual)
if expected != actual
raise "Expected #{expected} to be #{actual}"
end
end
def test_join2
ok = false
t1 = Thread.new { ok = true; sleep }
Thread.pass until ok
Thread.pass until t1.stop?
t2 = Thread.new do
Thread.pass while ok
t1.join(0.01)
end
t3 = Thread.new do
ok = false
t1.join
end
assert_nil(t2.value)
t1.wakeup
assert_equal(t1, t3.value)
ensure
t1&.kill&.join
t2&.kill&.join
t3&.kill&.join
end
rs = 30.times.map do
Ractor.new do
test_join2
end
end
rs.each(&:join)
```
|
|
|
|
This will make reading the parameters nicer for the JITs. Should be
no-op for the C side.
|
|
Rename to `VM_KW_SPECIFIED_BITS_MAX` now that it's in `vm_core.h`.
|
|
|
|
|
|
|
|
|
|
This is due to the way MMTK frees objects, which is on another native thread.
Due to this, there's no `ec` so we can't grab the VM Lock.
This was causing issues in release builds of MMTK on CI like:
```
/home/runner/work/ruby/ruby/build/ruby(sigsegv+0x46) [0x557905117ef6] ../src/signal.c:948
/lib/x86_64-linux-gnu/libc.so.6(0x7f555f845330) [0x7f555f845330]
/home/runner/work/ruby/ruby/build/ruby(rb_ec_thread_ptr+0x0) [0x5579051d59d5] ../src/vm_core.h:2087
/home/runner/work/ruby/ruby/build/ruby(rb_ec_ractor_ptr) ../src/vm_core.h:2036
/home/runner/work/ruby/ruby/build/ruby(rb_current_execution_context) ../src/vm_core.h:2105
/home/runner/work/ruby/ruby/build/ruby(rb_current_ractor_raw) ../src/vm_core.h:2104
/home/runner/work/ruby/ruby/build/ruby(rb_current_ractor) ../src/vm_core.h:2112
/home/runner/work/ruby/ruby/build/ruby(rb_current_ractor) ../src/vm_core.h:2110
/home/runner/work/ruby/ruby/build/ruby(vm_locked) ../src/vm_sync.c:15
/home/runner/work/ruby/ruby/build/ruby(rb_vm_lock_enter_body) ../src/vm_sync.c:141
/home/runner/work/ruby/ruby/build/ruby(rb_vm_lock_enter+0xa) [0x557905390a5a] ../src/vm_sync.h:76
/home/runner/work/ruby/ruby/build/ruby(fiber_pool_stack_release) ../src/cont.c:777
/home/runner/work/ruby/ruby/build/ruby(fiber_stack_release+0xe) [0x557905392075] ../src/cont.c:919
/home/runner/work/ruby/ruby/build/ruby(cont_free) ../src/cont.c:1087
/home/runner/work/ruby/ruby/build/ruby(fiber_free) ../src/cont.c:1180
```
This would have ran into an assertion error in a debug build but we don't run debug builds of MMTK on Github's CI.
Co-authored-by: john.hawthorn@shopify.com
|
|
|
|
We allocate the stack of the main thread using malloc, but we never set
malloc_stack to true and context_stack. If we fork, the main thread may
no longer be the main thread anymore so it reports memory being leaked
in RUBY_FREE_AT_EXIT.
This commit allows the main thread to free its own VM stack at shutdown.
|
|
|
|
to fix inconsistent and wrong current namespace detections.
This includes:
* Moving load_path and related things from rb_vm_t to rb_namespace_t to simplify
accessing those values via namespace (instead of accessing either vm or ns)
* Initializing root_namespace earlier and consolidate builtin_namespace into root_namespace
* Adding VM_FRAME_FLAG_NS_REQUIRE for checkpoints to detect a namespace to load/require files
* Removing implicit refinements in the root namespace which was used to determine
the namespace to be loaded (replaced by VM_FRAME_FLAG_NS_REQUIRE)
* Removing namespaces from rb_proc_t because its namespace can be identified by lexical context
* Starting to use ep[VM_ENV_DATA_INDEX_SPECVAL] to store the current namespace when
the frame type is MAGIC_TOP or MAGIC_CLASS (block handlers don't exist in this case)
|
|
rb_profile_frames() is used by profilers in a way such that it can run
on any instruction in the binary, and it crashed previously in the
following situation in `RUBY_DEBUG` builds:
```
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x00000001002827f0 miniruby`vm_make_env_each(ec=0x0000000101866b00, cfp=0x000000080c91bee8) at vm.c:992:74
989 }
990
991 vm_make_env_each(ec, prev_cfp);
-> 992 VM_FORCE_WRITE_SPECIAL_CONST(&ep[VM_ENV_DATA_INDEX_SPECVAL], VM_GUARDED_PREV_EP(prev_cfp->ep));
993 }
994 }
995 else {
(lldb) call rb_profile_frames(0, 100, $2, $3)
/Users/alan/ruby/vm_core.h:1448: Assertion Failed: VM_ENV_FLAGS:FIXNUM_P(flags)
ruby 3.5.0dev (2025-09-23T20:20:04Z master 06b7a70837) +PRISM [arm64-darwin25]
-- Crash Report log information --------------------------------------------
See Crash Report log file in one of the following locations:
* ~/Library/Logs/DiagnosticReports
* /Library/Logs/DiagnosticReports
for more details.
Don't forget to include the above Crash Report log file in bug reports.
-- Control frame information -----------------------------------------------
c:0008 p:---- s:0029 e:000028 CFUNC :lambda
/Users/alan/ruby/vm_core.h:1448: Assertion Failed: VM_ENV_FLAGS:FIXNUM_P(flags)
ruby 3.5.0dev (2025-09-23T20:20:04Z master 06b7a70837) +PRISM [arm64-darwin25]
-- Crash Report log information --------------------------------------------
<snip>
```
There is a small window where the control frame is invalid and fails the
assert.
The double crash also shows that in `RUBY_DEBUG` builds, the crash reporter was
previously not resilient to corrupt frame state. In release builds, it
prints more info.
Add unchecked APIs for the crash reporter and profilers so they work
as well in `RUBY_DEBUG` builds as non-debug builds.
|
|
call-seq:
Ractor.sharable_proc(self: nil){} -> sharable proc
It returns shareable Proc object. The Proc object is
shareable and the self in a block will be replaced with
the value passed via `self:` keyword.
In a shareable Proc, the outer variables should
* (1) refer shareable objects
* (2) be not be overwritten
```ruby
a = 42
Ractor.shareable_proc{ p a }
#=> OK
b = 43
Ractor.shareable_proc{ p b; b = 44 }
#=> Ractor::IsolationError because 'b' is reassigned in the block.
c = 44
Ractor.shareable_proc{ p c }
#=> Ractor::IsolationError because 'c' will be reassigned outside of the block.
c = 45
d = 45
d = 46 if cond
Ractor.shareable_proc{ p d }
#=> Ractor::IsolationError because 'd' was reassigned outside of the block.
```
The last `d`'s case can be relaxed in a future version.
The above check will be done in a static analysis at compile time,
so the reflection feature such as `Binding#local_varaible_set`
can not be detected.
```ruby
e = 42
shpr = Ractor.shareable_proc{ p e } #=> OK
binding.local_variable_set(:e, 43)
shpr.call #=> 42 (returns captured timing value)
```
Ractor.sharaeble_lambda is also introduced.
[Feature #21550]
[Feature #21557]
|
|
Add the macro `RB_THREAD_CURRENT_EC_NOINLINE` to manage the condition to use
no-inline version rb_current_ec for a better maintainability.
Note that the `vm_core.h` includes the `THREAD_IMPL_H` by the
`#include THREAD_IMPL_H`. The `THREAD_IMPL_H` can be `thread_none.h`,
`thread_pthread.h` or `thread_win32.h` according to the
`tool/m4/ruby_thread.m4` creating the `THREAD_IMPL_H`.
The change in this commit only defining the `RB_THREAD_CURRENT_EC_NOINLINE` in
the `thread_pthread.h` is okay in this situation. Because in the
`thread_none.h` case, the thread feature is not used at all, including
Thread-Local Storage (TLS), and in the `thread_win32.h` case, the
`RB_THREAD_LOCAL_SPECIFIER` is not defined. In the `thread_pthread.h` case, the
`RB_THREAD_LOCAL_SPECIFIER` is defined in the `configure.ac`. In the
`thread_none.h` case, the `RB_THREAD_LOCAL_SPECIFIER` is defined in the
`thread_none.h`.
|
|
|
|
This commit fixes the failures in bootstraptest/test_ractor.rb reported on
the issue ticket <https://bugs.ruby-lang.org/issues/21534>.
TLS (Thread-Local Storage) may not be accessed across .so on ppc64le too.
I am not sure about that. The comment "// TLS can not be accessed across
.so on ..." in this commit comes from the following commit.
https://github.com/ruby/ruby/commit/319afed20fba8f9b44611d16e4930260f7b56b86#diff-408391c43b2372cfe1fefb3e1c2531df0184ed711f46d229b08964ec9e8fa8c7R118
> // on Darwin, TLS can not be accessed across .so`
This failures only happened when configuring with cppflags=-DRUBY_DEBUG and -O3
on ppc64le.
The reproducing steps were below.
```
$ ./autogen.sh
$ ./configure -C --disable-install-doc cppflags=-DRUBY_DEBUG
$ make -j4
$ make btest BTESTS=bootstraptest/test_ractor.rb
...
FAIL 2/147 tests failed
make: *** [uncommon.mk:913: yes-btest] Error 1
```
The steps with a reproducing script based on the `bootstraptest/test_ractor.rb`
were below.
```
$ cat test_ractor_1.rb
counts = []
counts << Ractor.count
p counts.inspect
ractors = (1..2).map { Ractor.new { Ractor.receive } }
counts << Ractor.count
p counts.inspect
ractors[0].send('End 0').join
sleep 0.1 until ractors[0].inspect =~ /terminated/
counts << Ractor.count
p counts.inspect
ractors[1].send('End 1').join
sleep 0.1 until ractors[1].inspect =~ /terminated/
counts << Ractor.count
p counts.inspect
$ make run TESTRUN_SCRIPT=test_ractor_1.rb
...
vm_core.h:2017: Assertion Failed: rb_current_execution_context:ec == rb_current_ec_noinline()
...
```
The assertion failure happened at the following line.
https://github.com/ruby/ruby/blob/f3206cc79bec2fd852e81ec56de59f0a67ab32b7/vm_core.h#L2017
This fix is similar with the following commit for the arm64.
https://github.com/ruby/ruby/commit/f7059af50a31a4d27a04ace0beadb60616f3f971
Fixes [Bug #21534]
|
|
|
|
There is a high likelyhood that `rb_obj_fields` is called
consecutively for the same object.
If we keep a cache of the last IMEMO/fields we interacted with,
we can save having to lookup the `gen_fields_tbl`, synchronize
the VM lock, etc.
On yjit-bench's, I instrumented the hit rate of this cache at:
- `shipit`: 38%, with 111k hits.
- `lobsters`: 59%, with 367k hits.
- `rubocop`: 100% with only 300 hits.
I also ran a micro-benchmark which shows that ivar access is:
- 1.25x faster when the cache is hit in single ractor mode.
- 2x faster when the cache is hit in multi ractor mode.
- 1.06x slower when the cache miss in single ractor mode.
- 1.01x slower when the cache miss in multi ractor mode.
```yml
prelude: |
class GenIvar < Array
def initialize(...)
super
@iv = 1
end
attr_reader :iv
end
a = GenIvar.new
b = GenIvar.new
benchmark:
hit: a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv;
miss: a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv;
```
Single ractor:
```
compare-ruby: ruby 3.5.0dev (2025-08-12T02:14:57Z master 428937a536) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-12T09:25:35Z gen-fields-cache 9456c35893) +YJIT +PRISM [arm64-darwin24]
warming up..
| |compare-ruby|built-ruby|
|:-----|-----------:|---------:|
|hit | 4.090M| 5.121M|
| | -| 1.25x|
|miss | 3.756M| 3.534M|
| | 1.06x| -|
```
Multi-ractor:
```
compare-ruby: ruby 3.5.0dev (2025-08-12T02:14:57Z master 428937a536) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-12T09:25:35Z gen-fields-cache 9456c35893) +YJIT +PRISM [arm64-darwin24]
warming up..
| |compare-ruby|built-ruby|
|:-----|-----------:|---------:|
|hit | 2.205M| 4.460M|
| | -| 2.02x|
|miss | 2.117M| 2.094M|
| | 1.01x| -|
```
|
|
This introduces a new method to encapsulate checking whether the current
Ractor owns the vm->ractor.sync lock. This allows us to disable TSan on
it since that operation should be safe, and still get validation of
other uses.
|
|
|
|
Fixes [Bug #21201]
This change addresses a performance regression where defining methods
inside `refine` blocks caused severe slowdowns. The issue was due to
`rb_clear_all_refinement_method_cache()` triggering a full object
space scan via `rb_objspace_each_objects` to find and invalidate
affected callcaches, which is very inefficient.
To fix this, I introduce `vm->cc_refinement_table` to track
callcaches related to refinements. This allows us to invalidate
only the necessary callcaches without scanning the entire heap,
resulting in significant performance improvement.
Notes:
Merged: https://github.com/ruby/ruby/pull/13077
|
|
|
|
Followup: https://github.com/ruby/ruby/pull/13341 / [Feature #21353]
Even thought `shape_id_t` has been make 32bits, we were still limited
to use only the lower 16 bits because they had to fit alongside `attr_index_t`
inside a `uintptr_t` in inline caches.
By enlarging inline caches we can unlock the full 32bits on all
platforms, allowing to use these extra bits for tagging.
Notes:
Merged: https://github.com/ruby/ruby/pull/13500
|
|
* Added `Ractor::Port`
* `Ractor::Port#receive` (support multi-threads)
* `Rcator::Port#close`
* `Ractor::Port#closed?`
* Added some methods
* `Ractor#join`
* `Ractor#value`
* `Ractor#monitor`
* `Ractor#unmonitor`
* Removed some methods
* `Ractor#take`
* `Ractor.yield`
* Change the spec
* `Racotr.select`
You can wait for multiple sequences of messages with `Ractor::Port`.
```ruby
ports = 3.times.map{ Ractor::Port.new }
ports.map.with_index do |port, ri|
Ractor.new port,ri do |port, ri|
3.times{|i| port << "r#{ri}-#{i}"}
end
end
p ports.each{|port| pp 3.times.map{port.receive}}
```
In this example, we use 3 ports, and 3 Ractors send messages to them respectively.
We can receive a series of messages from each port.
You can use `Ractor#value` to get the last value of a Ractor's block:
```ruby
result = Ractor.new do
heavy_task()
end.value
```
You can wait for the termination of a Ractor with `Ractor#join` like this:
```ruby
Ractor.new do
some_task()
end.join
```
`#value` and `#join` are similar to `Thread#value` and `Thread#join`.
To implement `#join`, `Ractor#monitor` (and `Ractor#unmonitor`) is introduced.
This commit changes `Ractor.select()` method.
It now only accepts ports or Ractors, and returns when a port receives a message or a Ractor terminates.
We removes `Ractor.yield` and `Ractor#take` because:
* `Ractor::Port` supports most of similar use cases in a simpler manner.
* Removing them significantly simplifies the code.
We also change the internal thread scheduler code (thread_pthread.c):
* During barrier synchronization, we keep the `ractor_sched` lock to avoid deadlocks.
This lock is released by `rb_ractor_sched_barrier_end()`
which is called at the end of operations that require the barrier.
* fix potential deadlock issues by checking interrupts just before setting UBF.
https://bugs.ruby-lang.org/issues/21262
Notes:
Merged: https://github.com/ruby/ruby/pull/13445
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13425
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13357
|
|
Notes:
Merged-By: ioquatix <samuel@codeotaku.com>
|
|
in same ractor
Rework ractors so that any ractor action (Ractor.receive, Ractor#send, Ractor.yield, Ractor#take,
Ractor.select) will operate on the thread that called the action. It will put that thread to sleep if
it's a blocking function and it needs to put it to sleep, and the awakening action (Ractor.yield,
Ractor#send) will wake up the blocked thread.
Before this change every blocking ractor action was associated with the ractor struct and its fields.
If a ractor called Ractor.receive, its wait status was wait_receiving, and when another ractor calls
r.send on it, it will look for that status in the ractor struct fields and wake it up. The problem was that
what if 2 threads call blocking ractor actions in the same ractor. Imagine if 1 thread has called Ractor.receive
and another r.take. Then, when a different ractor calls r.send on it, it doesn't know which ruby thread is associated
to which ractor action, so what ruby thread should it schedule? This change moves some fields onto the ruby thread
itself so that ruby threads are the ones that have ractor blocking statuses, and threads are then specifically scheduled
when unblocked.
Fixes [#17624]
Fixes [#21037]
Notes:
Merged: https://github.com/ruby/ruby/pull/12633
|
|
- `rb_thread_fd_close` is deprecated and now a no-op.
- IO operations (including close) no longer take a vm-wide lock.
Notes:
Merged-By: ioquatix <samuel@codeotaku.com>
|
|
|
|
When doing a coroutine transfer from one thread to another, there's a
risk that the compiler will reuse an address from TLS before the
transfer to the new thread.
These VM assertions are all in places we would not otherwise be reading
from TLS, but using the value of `ec` or `cr` passed in. Switching these
to test against rb_current_ec_noinline() instead ensures there isn't an
optimization applied to how we read ruby_current_ec.
Currently it seems we were hitting this on LLVM 18 specifically, but I
don't know of any reason other versions wouldn't have the same issue.
Notes:
Merged: https://github.com/ruby/ruby/pull/13217
|
|
Now that we have a hash-set implementation we can use that
instead of a hash-table with a static value.
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13131
|
|
This implements a hash set which is wait-free for lookup and lock-free
for insert (unless resizing) to use for fstring de-duplication.
As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of
fstrings (frozen interned strings) can significantly reduce the
parallelism of Ractors.
I tried a few other approaches first: using an RWLock, striping a series
of RWlocks (partitioning the hash N-ways to reduce lock contention), and
putting a cache in front of it. All of these improved the situation, but
were unsatisfying as all still required locks for writes (and granular
locks are awkward, since we run the risk of needing to reach a vm
barrier) and this table is somewhat write-heavy.
My main reference for this was Cliff Click's talk on a lock free
hash-table for java https://www.youtube.com/watch?v=HJ-719EGIts. It
turns out this lock-free hash set is made easier to implement by a few
properties:
* We only need a hash set rather than a hash table (we only need keys,
not values), and so the full entry can be written as a single VALUE
* As a set we only need lookup/insert/delete, no update
* Delete is only run inside GC so does not need to be atomic (It could
be made concurrent)
* I use rb_vm_barrier for the (rare) table rebuilds (It could be made
concurrent) We VM lock (but don't require other threads to stop) for
table rebuilds, as those are rare
* The conservative garbage collector makes deferred replication easy,
using a T_DATA object
Another benefits of having a table specific to fstrings is that we
compare by value on lookup/insert, but by identity on delete, as we only
want to remove the exact string which is being freed. This is faster and
provides a second way to avoid the race condition in
https://bugs.ruby-lang.org/issues/21172.
This is a pretty standard open-addressing hash table with quadratic
probing. Similar to our existing st_table or id_table. Deletes (which
happen on GC) replace existing keys with a tombstone, which is the only
type of update which can occur. Tombstones are only cleared out on
resize.
Unlike st_table, the VALUEs are stored in the hash table itself
(st_table's bins) rather than as a compact index. This avoids an extra
pointer dereference and is possible because we don't need to preserve
insertion order. The table targets a load factor of 2 (it is enlarged
once it is half full).
Notes:
Merged: https://github.com/ruby/ruby/pull/12921
|
|
It looks like stat_insn_usage was introduced with YARV, but as far as I
can tell the field has never been used. I think we should remove the
field since we don't use it.
Notes:
Merged: https://github.com/ruby/ruby/pull/13100
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13044
|
|
This reverts the following commits as it's causing OOM in some cases in
ruby/ruby.wasm test suite.
* 372515f33c908b36b3f5fbd2edcb34c69b418500
* 3a730be8b464454878a42132f6fecb98ab4c1b5b
Notes:
Merged: https://github.com/ruby/ruby/pull/13026
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/12995
|
|
Also, Binding#local_variable_get and #local_variable_set rejects an
access to numbered parameters.
[Bug #20965] [Bug #21049]
Notes:
Merged: https://github.com/ruby/ruby/pull/12746
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/12740
|