ruby.git/internal/vm.h, branch v4.0.2

thread_sync.c: directly pass the execution context to yield

2025-12-12T09:08:05+00:00

Saves one more call to GET_EC()

Add `rb_eval_cmd_call_kw` to shortcut

2025-12-04T09:07:49+00:00

ZJIT: Implement side exit stats (#14357)

2025-08-27T17:01:07+00:00

ZJIT: Add --zjit-stats (#14034)

2025-07-29T17:00:15+00:00

Change how to correct the first lineno in the backtrace on ArgumentError

2025-06-24T02:39:58+00:00

Follow up to fix 3b7373fd00a0ba456498a7b7d6de2a47c96434a2.
In that commit, the line number in the first frame was overwritten after
the whole backtrace was created. There was a problem that the line
number was overwritten even if the location was backpatched.

Instead, this commit uses first_lineno if the frame is
VM_FRAME_MAGIC_DUMMY when generating the backtrace.

Before the patch:
```
$ ./miniruby -e '[1, 2].inject(:tap)'
-e:in '<main>': wrong number of arguments (given 1, expected 0) (ArgumentError)
        from -e:1:in 'Enumerable#inject'
        from -e:1:in '<main>'
```

After the patch:
```
$ ./miniruby -e '[1, 2].inject(:tap)'
-e:1:in '<main>': wrong number of arguments (given 1, expected 0) (ArgumentError)
        from -e:1:in 'Enumerable#inject'
        from -e:1:in '<main>'
```

namespace on read

2025-05-11T14:32:50+00:00

Lock-free hash set for fstrings [Feature #21268]

2025-04-18T04:03:54+00:00

This implements a hash set which is wait-free for lookup and lock-free
for insert (unless resizing) to use for fstring de-duplication.

As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of
fstrings (frozen interned strings) can significantly reduce the
parallelism of Ractors.

I tried a few other approaches first: using an RWLock, striping a series
of RWLocks (partitioning the hash N-ways to reduce lock contention), and
putting a cache in front of it. All of these improved the situation, but
were unsatisfying as all still required locks for writes (and granular
locks are awkward, since we run the risk of needing to reach a vm
barrier) and this table is somewhat write-heavy.

My main reference for this was Cliff Click's talk on a lock-free
hash table for Java, https://www.youtube.com/watch?v=HJ-719EGIts. It
turns out this lock-free hash set is made easier to implement by a few
properties:

 * We only need a hash set rather than a hash table (we only need keys,
   not values), and so the full entry can be written as a single VALUE
 * As a set we only need lookup/insert/delete, no update
 * Delete is only run inside GC so does not need to be atomic (It could
   be made concurrent)
 * I use rb_vm_barrier for the (rare) table rebuilds (it could be made
   concurrent). We VM lock (but don't require other threads to stop) for
   table rebuilds, as those are rare
 * The conservative garbage collector makes deferred replication easy,
   using a T_DATA object

Another benefit of having a table specific to fstrings is that we
compare by value on lookup/insert, but by identity on delete, as we only
want to remove the exact string which is being freed. This is faster and
provides a second way to avoid the race condition in
https://bugs.ruby-lang.org/issues/21172.

This is a pretty standard open-addressing hash table with quadratic
probing. Similar to our existing st_table or id_table. Deletes (which
happen on GC) replace existing keys with a tombstone, which is the only
type of update which can occur. Tombstones are only cleared out on
resize.

Unlike st_table, the VALUEs are stored in the hash table itself
(st_table's bins) rather than as a compact index. This avoids an extra
pointer dereference and is possible because we don't need to preserve
insertion order. The table targets a maximum load factor of 1/2 (it is
enlarged once it is half full).
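
As a rough illustration of the design described above (not the code from
this commit), here is a minimal C11 sketch of an open-addressing set with
quadratic probing, compare-and-swap insertion, tombstone deletes and a
half-full limit. Every name in it (`toy_set`, `toy_set_insert`, and so on)
is hypothetical, keys are plain integers rather than strings, and resizing
is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the code from the commit. */
#define TOY_EMPTY     ((uintptr_t)0)   /* slot never used */
#define TOY_TOMBSTONE ((uintptr_t)1)   /* slot whose key was deleted */
#define TOY_CAPACITY  64u              /* power of two; resizing is omitted */

struct toy_set {
    _Atomic uintptr_t slots[TOY_CAPACITY]; /* keys stored inline in the table */
    atomic_uint occupied;                  /* keys plus tombstones */
};

/* Multiplicative hash reduced to a starting slot index. */
static unsigned toy_start(uintptr_t key)
{
    return (unsigned)(((uint64_t)key * 0x9E3779B97F4A7C15ull) >> 32)
           & (TOY_CAPACITY - 1);
}

/* Lookup only loads slots, so it never waits on another thread. */
static int toy_set_contains(struct toy_set *set, uintptr_t key)
{
    unsigned idx = toy_start(key);
    for (unsigned step = 1; step <= TOY_CAPACITY; step++) {
        uintptr_t cur = atomic_load(&set->slots[idx]);
        if (cur == TOY_EMPTY) return 0;
        if (cur == key) return 1;
        idx = (idx + step) & (TOY_CAPACITY - 1); /* triangular (quadratic) probe */
    }
    return 0;
}

/* Returns 1 if inserted, 0 if already present, -1 if the table is half full. */
static int toy_set_insert(struct toy_set *set, uintptr_t key)
{
    assert(key != TOY_EMPTY && key != TOY_TOMBSTONE);
    unsigned idx = toy_start(key);
    for (unsigned step = 1; step <= TOY_CAPACITY; step++) {
        uintptr_t cur = atomic_load(&set->slots[idx]);
        if (cur == TOY_EMPTY) {
            if (atomic_load(&set->occupied) >= TOY_CAPACITY / 2) return -1;
            /* Claim the slot; on failure cur holds the value that won the race. */
            if (atomic_compare_exchange_strong(&set->slots[idx], &cur, key)) {
                atomic_fetch_add(&set->occupied, 1);
                return 1;
            }
        }
        if (cur == key) return 0;
        idx = (idx + step) & (TOY_CAPACITY - 1);
    }
    return -1;
}

/* Delete leaves a tombstone so later probes keep walking past this slot. */
static int toy_set_delete(struct toy_set *set, uintptr_t key)
{
    unsigned idx = toy_start(key);
    for (unsigned step = 1; step <= TOY_CAPACITY; step++) {
        uintptr_t cur = atomic_load(&set->slots[idx]);
        if (cur == TOY_EMPTY) return 0;
        if (cur == key) {
            atomic_store(&set->slots[idx], TOY_TOMBSTONE);
            return 1;
        }
        idx = (idx + step) & (TOY_CAPACITY - 1);
    }
    return 0;
}
```

In this sketch a slot only ever moves from empty to a key to a tombstone,
which is why a failed compare-and-swap can simply continue probing. Since
the keys are integers, comparing by value and by identity coincide here; the
distinction drawn above only matters for real strings.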

Only count VM instructions in YJIT stats builds

2025-02-14T19:39:35+00:00

The instruction counter is slowing multi-Ractor applications.  I had
changed it to use a thread local, but using a thread local is slowing
single threaded applications.  This commit only enables the instruction
counter in YJIT stats builds until we can figure out a way to gather the
information with lower overhead.
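
One common way to get this effect, shown here only as a sketch, is to hide
the increment behind a macro that expands to nothing unless a stats flag is
set at build time. `TOY_STATS`, `TOY_COUNT_INSN` and `toy_run` are
hypothetical names chosen for illustration, not the identifiers used in the
tree.

```c
#include <stdint.h>

/* TOY_STATS is a hypothetical build flag standing in for a stats build. */
#ifdef TOY_STATS
static uint64_t toy_insns_count;
# define TOY_COUNT_INSN()  ((void)toy_insns_count++)
# define TOY_INSNS_COUNT() (toy_insns_count)
#else
# define TOY_COUNT_INSN()  ((void)0)       /* compiles away entirely */
# define TOY_INSNS_COUNT() ((uint64_t)0)
#endif

/* Pretend interpreter loop: each "instruction" adds its operand. */
static int toy_run(const int *insns, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++) {
        TOY_COUNT_INSN();   /* no code is emitted here without TOY_STATS */
        acc += insns[i];
    }
    return acc;
}
```

Built with `-DTOY_STATS` the loop counts every iteration; built without it,
the hot loop contains no counter access at all.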

Co-authored-by: Randy Stauner

Make rb_vm_insns_count a thread local variable

2025-01-10T21:39:21+00:00

`rb_vm_insns_count` is a global variable used for reporting YJIT
statistics. It is a counter that tallies the number of interpreter
instructions that have been executed, so that we can approximate how
much time we're spending in YJIT compared to the interpreter.

Unfortunately keeping this statistic means that every instruction
executed in the interpreter loop must increment the counter. Normally
this isn't a problem, but in multi-threaded situations (when Ractors are
used), incrementing this counter can become quite costly due to page
caching issues.

Additionally, since there is no locking when incrementing this global,
the count can't really make sense in a multi-threaded environment.

This commit changes `rb_vm_insns_count` to a thread local. That way each
Ractor has its own copy of the counter and incrementing the counter
becomes quite cheap. Of course this means that in multi-threaded
situations, the value doesn't really make sense (but it didn't make
sense before because of the lack of locking).

The counter is used for YJIT statistics, and since YJIT is basically
disabled when Ractors are in use, I don't think we care about
inaccuracies (for the time being). We can revisit this counter when we
give YJIT multi-threading support, but for the time being this commit
restores multi-threaded performance.
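
For readers unfamiliar with the mechanism, the following is a minimal
sketch of a thread-local counter in C11. The names `toy_local_count`,
`toy_interpret` and `toy_insns_on_this_thread` are hypothetical, and the
sketch does not reproduce how the real variable is declared.

```c
#include <stdint.h>

/* Hypothetical stand-in: every thread gets its own zero-initialized copy. */
static _Thread_local uint64_t toy_local_count;

/* Pretend interpreter loop: bump the counter once per instruction. */
static void toy_interpret(int n_insns)
{
    for (int i = 0; i < n_insns; i++) {
        toy_local_count++;   /* touches only the calling thread's storage */
    }
}

/* Reports only what the calling thread executed, not a process-wide total. */
static uint64_t toy_insns_on_this_thread(void)
{
    return toy_local_count;
}
```

Because no two threads write to the same memory, the increments no longer
compete for the same cache line. The cost is the one already noted above:
there is no single total unless the per-thread values are summed somewhere.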

To test this, I used the benchmark in [Bug #20489].

Here is the performance on Ruby 3.2:

```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.

________________________________________________________
Executed in    2.53 secs    fish           external
   usr time   19.86 secs  370.00 micros   19.86 secs
   sys time    0.02 secs  320.00 micros    0.02 secs
```

We can see the regression in performance on the master branch:

```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.5.0dev (2025-01-10T16:22:26Z master 4a2702dafb) +PRISM [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.

________________________________________________________
Executed in   24.87 secs    fish           external
   usr time  195.55 secs    0.00 micros  195.55 secs
   sys time    0.00 secs  716.00 micros    0.00 secs
```

Here are the stats after this commit:

```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.5.0dev (2025-01-10T20:37:06Z tl 3ef0432779) +PRISM [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.

________________________________________________________
Executed in    2.46 secs    fish           external
   usr time   19.34 secs  381.00 micros   19.34 secs
   sys time    0.01 secs  321.00 micros    0.01 secs
```

[Bug #20489]

Remove 1 allocation in Enumerable#each_with_index (#11868)

2024-10-11T14:22:44+00:00

* Remove 1 allocation in Enumerable#each_with_index

Previously, each call to Enumerable#each_with_index allocates 2
objects, one for the counting index, the other an imemo_ifunc passed
to `self.each` as a block.

Use `struct vm_ifunc::data` to hold the counting index directly to
remove 1 allocation.
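
The idea can be sketched roughly as follows, assuming a much simplified
callback record. `struct toy_ifunc` and both functions are hypothetical
stand-ins, not the real `struct vm_ifunc`. The pointer-sized `data` field
carries the index itself, cast to and from an integer, instead of pointing
at a separately allocated counter.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a block-callback record. */
struct toy_ifunc {
    void (*func)(struct toy_ifunc *self, int elem, long *out);
    const void *data;   /* holds the running index directly, not a pointer */
};

/* Callback invoked once per element: reads the index, then advances it. */
static void toy_with_index(struct toy_ifunc *self, int elem, long *out)
{
    uintptr_t index = (uintptr_t)self->data;
    *out += (long)index * elem;
    self->data = (const void *)(index + 1);
}

/* Sums index * element, keeping the index inside the record itself. */
static long toy_each_with_index(const int *ary, int n)
{
    struct toy_ifunc ifunc = { toy_with_index, (const void *)(uintptr_t)0 };
    long acc = 0;
    for (int i = 0; i < n; i++) {
        ifunc.func(&ifunc, ary[i], &acc);
    }
    return acc;
}
```

In this sketch the record lives on the stack, so it allocates nothing at
all; what it illustrates is only that the counter needs no object of its
own once it fits in a field the record already has.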

* [DOC] Brief summary for usages of `struct vm_ifunc`