| Age | Commit message | Author |
|
Saves one more call to GET_EC()
|
|
|
|
|
|
|
|
Follow up to fix 3b7373fd00a0ba456498a7b7d6de2a47c96434a2.
In that commit, the line number in the first frame was overwritten after
the whole backtrace was created. There was a problem that the line
number was overwritten even if the location was backpatched.
Instead, this commit uses first_lineno if the frame is
VM_FRAME_MAGIC_DUMMY when generating the backtrace.
Before the patch:
```
$ ./miniruby -e '[1, 2].inject(:tap)'
-e:in '<main>': wrong number of arguments (given 1, expected 0) (ArgumentError)
from -e:1:in 'Enumerable#inject'
from -e:1:in '<main>'
```
After the patch:
```
$ ./miniruby -e '[1, 2].inject(:tap)'
-e:1:in '<main>': wrong number of arguments (given 1, expected 0) (ArgumentError)
from -e:1:in 'Enumerable#inject'
from -e:1:in '<main>'
```
|
|
|
|
This implements a hash set which is wait-free for lookup and lock-free
for insert (unless resizing) to use for fstring de-duplication.
As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of
fstrings (frozen interned strings) can significantly reduce the
parallelism of Ractors.
I tried a few other approaches first: using an RWLock, striping a series
of RWLocks (partitioning the hash N-ways to reduce lock contention), and
putting a cache in front of it. All of these improved the situation, but
were unsatisfying, as all still required locks for writes (and granular
locks are awkward, since we run the risk of needing to reach a VM
barrier) and this table is somewhat write-heavy.
My main reference for this was Cliff Click's talk on a lock-free
hash table for Java: https://www.youtube.com/watch?v=HJ-719EGIts. It
turns out this lock-free hash set is made easier to implement by a few
properties:
* We only need a hash set rather than a hash table (we only need keys,
not values), and so the full entry can be written as a single VALUE
* As a set we only need lookup/insert/delete, no update
* Delete is only run inside GC so does not need to be atomic (it could
be made concurrent)
* I use rb_vm_barrier for the (rare) table rebuilds (it could be made
concurrent). We VM lock (but don't require other threads to stop) for
table rebuilds, as those are rare
* The conservative garbage collector makes deferred replication easy,
using a T_DATA object
Another benefit of having a table specific to fstrings is that we
compare by value on lookup/insert, but by identity on delete, as we only
want to remove the exact string which is being freed. This is faster and
provides a second way to avoid the race condition in
https://bugs.ruby-lang.org/issues/21172.
This is a pretty standard open-addressing hash table with quadratic
probing, similar to our existing st_table or id_table. Deletes (which
happen on GC) replace existing keys with a tombstone, which is the only
type of update which can occur. Tombstones are only cleared out on
resize.
Unlike st_table, the VALUEs are stored in the hash table itself
(st_table's bins) rather than as a compact index. This avoids an extra
pointer dereference and is possible because we don't need to preserve
insertion order. The table targets a maximum load factor of 1/2 (it is
enlarged once it is half full).
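The layout described above can be sketched in plain Ruby. This is only an
illustration under loud assumptions: it is single-threaded (no atomics, so it
is neither wait-free nor lock-free), and every name in it (`SketchFstringSet`,
`intern`, `rebuild`, ...) is hypothetical rather than taken from the patch. It
shows quadratic probing, lookup/insert by value, delete by identity with a
tombstone, and tombstones being dropped only when the table is rebuilt.

```ruby
# Hypothetical, single-threaded sketch. The real table is written in C and
# uses atomic operations; all names here are made up for illustration.
class SketchFstringSet
  EMPTY = Object.new.freeze      # slot that was never used
  TOMBSTONE = Object.new.freeze  # slot whose key was deleted

  attr_reader :capacity

  def initialize(capacity = 8)
    @capacity = capacity         # must be a power of two
    @slots = Array.new(capacity, EMPTY)
    @used = 0                    # live keys plus tombstones
  end

  # Lookup-or-insert, comparing by value.
  def intern(str)
    rebuild if (@used + 1) * 2 > @capacity # keep the table at most half full
    find_or_insert(str)
  end

  # Delete by identity: only the exact object is removed.
  def delete(str)
    each_probe(str.hash) do |i|
      slot = @slots[i]
      return false if slot.equal?(EMPTY)
      if slot.equal?(str)
        @slots[i] = TOMBSTONE
        return true
      end
    end
    false
  end

  private

  def find_or_insert(str)
    each_probe(str.hash) do |i|
      slot = @slots[i]
      if slot.equal?(EMPTY)
        @used += 1
        return @slots[i] = (str.frozen? ? str : str.dup.freeze)
      elsif !slot.equal?(TOMBSTONE) && slot == str
        return slot
      end
    end
    raise "unreachable: a half-empty table always has an EMPTY slot"
  end

  # Quadratic (triangular-number) probing; this visits every slot when the
  # capacity is a power of two.
  def each_probe(hash)
    mask = @capacity - 1
    i = hash & mask
    @capacity.times do |step|
      yield i
      i = (i + step + 1) & mask
    end
  end

  # Tombstones are only dropped here, when the table is rebuilt.
  def rebuild
    live = @slots.reject { |s| s.equal?(EMPTY) || s.equal?(TOMBSTONE) }
    @capacity *= 2 while (live.size + 1) * 2 > @capacity
    @slots = Array.new(@capacity, EMPTY)
    @used = 0
    live.each { |s| find_or_insert(s) }
  end
end
```

Storing the key object directly in the slot array mirrors the "single VALUE
per entry" property; a concurrent version would presumably publish each slot
with a compare-and-swap instead of a plain assignment.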
Notes:
Merged: https://github.com/ruby/ruby/pull/12921
|
|
The instruction counter is slowing multi-Ractor applications. I had
changed it to use a thread local, but using a thread local is slowing
single threaded applications. This commit only enables the instruction
counter in YJIT stats builds until we can figure out a way to gather the
information with lower overhead.
Co-authored-by: Randy Stauner <randy.stauner@shopify.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/12670
|
|
`rb_vm_insns_count` is a global variable used for reporting YJIT
statistics. It is a counter that tallies the number of interpreter
instructions that have been executed; this way we can approximate how
much time we're spending in YJIT compared to the interpreter.
Unfortunately keeping this statistic means that every instruction
executed in the interpreter loop must increment the counter. Normally
this isn't a problem, but in multi-threaded situations (when Ractors are
used), incrementing this counter can become quite costly due to page
caching issues.
Additionally, since there is no locking when incrementing this global,
the count can't really make sense in a multi-threaded environment.
This commit changes `rb_vm_insns_count` to a thread local. That way each
Ractor has its own copy of the counter and incrementing the counter
becomes quite cheap. Of course this means that in multi-threaded
situations, the value doesn't really make sense (but it didn't make
sense before because of the lack of locking).
The counter is used for YJIT statistics, and since YJIT is basically
disabled when Ractors are in use, I don't think we care about
inaccuracies (for the time being). We can revisit this counter when we
give YJIT multi-threading support, but for the time being this commit
restores multi-threaded performance.
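As a rough Ruby-level analogy (not the C change itself), the trade-off can be
sketched with thread variables: each thread increments only its own counter,
so nothing is shared on the hot path, but no single counter holds the global
total anymore. The helper name `run_counted` and the `:insns_count` key are
hypothetical.

```ruby
# Hypothetical sketch: each thread bumps its own counter, so there is no
# shared write, but a global total needs explicit aggregation afterwards.
def run_counted(iterations)
  Thread.current.thread_variable_set(:insns_count, 0)
  iterations.times do
    count = Thread.current.thread_variable_get(:insns_count)
    Thread.current.thread_variable_set(:insns_count, count + 1)
  end
  Thread.current.thread_variable_get(:insns_count)
end

per_thread = 4.times.map { Thread.new { run_counted(1_000) } }.map(&:value)
p per_thread: per_thread, total: per_thread.sum
```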
To test this, I used the benchmark in [Bug #20489].
Here is the performance on Ruby 3.2:
```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
________________________________________________________
Executed in 2.53 secs fish external
usr time 19.86 secs 370.00 micros 19.86 secs
sys time 0.02 secs 320.00 micros 0.02 secs
```
We can see the regression in performance on the master branch:
```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.5.0dev (2025-01-10T16:22:26Z master 4a2702dafb) +PRISM [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
________________________________________________________
Executed in 24.87 secs fish external
usr time 195.55 secs 0.00 micros 195.55 secs
sys time 0.00 secs 716.00 micros 0.00 secs
```
Here are the stats after this commit:
```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.5.0dev (2025-01-10T20:37:06Z tl 3ef0432779) +PRISM [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
________________________________________________________
Executed in 2.46 secs fish external
usr time 19.34 secs 381.00 micros 19.34 secs
sys time 0.01 secs 321.00 micros 0.01 secs
```
[Bug #20489]
Notes:
Merged: https://github.com/ruby/ruby/pull/12549
|
|
* Remove 1 allocation in Enumerable#each_with_index
Previously, each call to Enumerable#each_with_index allocated 2
objects: one for the counting index, the other an imemo_ifunc passed
to `self.each` as a block.
Use `struct vm_ifunc::data` to hold the counting index directly to
remove 1 allocation.
* [DOC] Brief summary for usages of `struct vm_ifunc`
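One way to observe the effect from Ruby is to count allocations around a call,
in the same style as other allocation benchmarks in this log. The helper name
`allocations` is hypothetical, and the printed number depends on the Ruby
version, so no expected value is given.

```ruby
# Hypothetical helper (not part of Ruby): counts objects allocated by a block.
def allocations
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

list = [1, 2, 3]
# Array gets each_with_index from Enumerable, so this exercises the changed path.
p each_with_index_allocations: allocations { list.each_with_index { |x, i| } }
```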
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/11331
|
|
This function accepts flags:
RB_NO_KEYWORDS, RB_PASS_KEYWORDS, RB_PASS_CALLED_KEYWORDS:
Work the same as in rb_block_call_kw.
RB_BLOCK_NO_USE_PACKED_ARGS:
The given block ("bl_proc") does not use "yielded_arg" of rb_block_call_func_t.
Instead, the block accesses the yielded arguments via "argc" and "argv".
This flag allows the called method to yield arguments without allocating an Array.
|
|
[Feature #13557]
Setting the backtrace with an array of strings is lossy. The resulting
exception will return nil on `#backtrace_locations`.
By accepting an array of `Backtrace::Location` instances, we can rebuild
a `Backtrace` instance and have a fully functioning Exception.
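A small sketch of the difference, assuming a Ruby that includes this change.
The helper name is hypothetical, and on Rubies without the change the
`Location` case raises `TypeError`, which the sketch reports as `:unsupported`
instead of failing.

```ruby
# Hypothetical helper: attach `backtrace` to a fresh exception and report
# what `#backtrace_locations` returns afterwards.
def locations_after_set_backtrace(backtrace)
  error = RuntimeError.new("boom")
  error.set_backtrace(backtrace)
  error.backtrace_locations
rescue TypeError
  :unsupported # Rubies without this change reject arrays of Location objects
end

p locations_after_set_backtrace(caller(0))           # strings: locations are lost
p locations_after_set_backtrace(caller_locations(0)) # locations: kept where supported
```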
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
|
|
This `st_table` is used to both mark and pin classes
defined from the C API. But `vm->mark_object_ary` already
does both much more efficiently.
Currently a Ruby process starts with 252 rooted classes,
which uses `7224B` in an `st_table` or `2016B` in an `RArray`.
So a baseline of 5kB saved, but since `mark_object_ary` is
preallocated with `1024` slots but only uses `405` of them,
it's a net `7kB` saving.
`vm->mark_object_ary` is also being refactored.
Prior to this change, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.
This has the detrimental effect of marking these references on every
minor GC even though it's a mostly append-only list.
By using a custom TypedData we avoid having to mark
all the references on minor GC runs.
Additionally, immediate values are now ignored and not appended
to `vm->mark_object_ary`, as that would just be wasted space.
|
|
|
|
When the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
|
|
This commit moves IO#readline to Ruby. In order to call C functions,
keyword arguments must be converted to hashes. Prior to this commit,
code like `io.readline(chomp: true)` would allocate a hash. This
commit moves the keyword "denaturing" to Ruby, allowing us to send
positional arguments to the C API and avoiding the hash allocation.
Here is an allocation benchmark for the method:
```
x = GC.stat(:total_allocated_objects)
File.open("/usr/share/dict/words") do |f|
  f.readline(chomp: true) until f.eof?
end
p ALLOCATIONS: GC.stat(:total_allocated_objects) - x
```
Before this commit, the output was this:
```
$ make run
./miniruby -I./lib -I. -I.ext/common -r./arm64-darwin22-fake ./test.rb
{:ALLOCATIONS=>707939}
```
Now it is this:
```
$ make run
./miniruby -I./lib -I. -I.ext/common -r./arm64-darwin22-fake ./test.rb
{:ALLOCATIONS=>471962}
```
[Bug #19890] [ruby-core:114803]
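The shape of the "denaturing" idea can be sketched as follows. This is not the
actual implementation: `primitive_readline` is a hypothetical stand-in for the
C function, and `readline_sketch` is a made-up name. The point is only that the
Ruby-level method receives `chomp:` as a keyword and forwards it positionally,
so the callee never needs a keyword hash.

```ruby
require "stringio"

# Hypothetical stand-in for the C primitive: positional arguments only.
def primitive_readline(io, sep, limit, chomp)
  line = limit ? io.gets(sep, limit) : io.gets(sep)
  raise EOFError, "end of file reached" if line.nil?
  chomp ? line.chomp(sep) : line
end

# Ruby-level wrapper: unpacks the keyword here and passes it positionally.
def readline_sketch(io, sep = $/, limit = nil, chomp: false)
  primitive_readline(io, sep, limit, chomp)
end

io = StringIO.new("first\nsecond\n")
p readline_sketch(io, chomp: true)
p readline_sketch(io)
```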
|
|
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
This was used only by MJIT.
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7461
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7459
|
|
Before object shapes, we were using class serial to invalidate
inline caches. Now that we use shape_id for inline cache keys,
the class serial is unnecessary.
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Notes:
Merged: https://github.com/ruby/ruby/pull/6605
|
|
This patch pushes dummy frames when loading code, for profiling
purposes.
The following methods push a dummy frame:
* `Kernel#require`
* `Kernel#load`
* `RubyVM::InstructionSequence.compile_file`
* `RubyVM::InstructionSequence.load_from_binary`
https://bugs.ruby-lang.org/issues/18559
Notes:
Merged: https://github.com/ruby/ruby/pull/6572
|
|
This commit reintroduces finer-grained constant cache invalidation.
After 8008fb7 got merged, it was causing issues on token-threaded
builds (such as on Windows).
The issue was that when you're iterating through instruction sequences
and using the translator functions to get back the instruction structs,
you're either using `rb_vm_insn_null_translator` or
`rb_vm_insn_addr2insn2` depending on whether it's a direct-threading build.
`rb_vm_insn_addr2insn2` does some normalization to always return to
you the non-trace version of whatever instruction you're looking at.
`rb_vm_insn_null_translator` does not do that normalization.
This means that when you're looping through the instructions and
trying to do an opcode comparison, the result can change depending on
the type of threading that you're using. This can be very confusing. So, this
commit creates a new translator function
`rb_vm_insn_normalizing_translator` to always return the non-trace
version so that opcode comparisons don't have to worry about different
configurations.
[Feature #18589]
Notes:
Merged: https://github.com/ruby/ruby/pull/5716
|
|
This reverts commits for [Feature #18589]:
* 8008fb7352abc6fba433b99bf20763cf0d4adb38
"Update formatting per feedback"
* 8f6eaca2e19828e92ecdb28b0fe693d606a03f96
"Delete ID from constant cache table if it becomes empty on ISEQ free"
* 629908586b4bead1103267652f8b96b1083573a8
"Finer-grained inline constant cache invalidation"
MSWin builds on AppVeyor have been crashing since the merger.
Notes:
Merged: https://github.com/ruby/ruby/pull/5715
Merged-By: nobu <nobu@ruby-lang.org>
|
|
Current behavior - caches depend on a global counter. All constant mutations cause caches to be invalidated.
```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends on global counter
end

foo # populate inline cache
foo # hit inline cache

C = 1 # global counter increments, all caches are invalidated

foo # misses inline cache due to `C = 1`
```
Proposed behavior - caches depend on name components. Only constant mutations with corresponding names will invalidate the cache.
```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends on constants named "A" and "B"
end

foo # populate inline cache
foo # hit inline cache

C = 1 # caches that depend on the name "C" are invalidated

foo # hits inline cache because IC only depends on "A" and "B"
```
Examples of breaking the new cache:
```ruby
module C
  # Breaks `foo` cache because "A" constant is set and the cache in foo depends
  # on "A" and "B"
  class A; end
end

B = 1
```
We expect the new cache scheme to be invalidated less often because names aren't frequently reused. With the cache being invalidated less, we can rely on its stability more to keep our constant references fast and reduce the need to throw away generated code in YJIT.
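The bookkeeping behind name-based invalidation can be modelled in a few lines
of Ruby. This is a toy model, not the VM's data structure: the class and method
names are hypothetical, and it only tracks which caches depend on which
constant names.

```ruby
# Hypothetical model: a table from constant name to the caches depending on it.
class SketchConstantCaches
  class Cache
    attr_reader :names

    def initialize(names)
      @names = names
      @valid = true
    end

    def valid?
      @valid
    end

    def invalidate!
      @valid = false
    end
  end

  def initialize
    @by_name = Hash.new { |hash, name| hash[name] = [] }
  end

  # Register a cache for a reference such as A::B, i.e. names [:A, :B].
  def register(*names)
    cache = Cache.new(names)
    names.each { |name| @by_name[name] << cache }
    cache
  end

  # Called when a constant named `name` is set anywhere; returns how many
  # caches were invalidated.
  def constant_set(name)
    caches = @by_name.delete(name) || []
    caches.each do |cache|
      cache.invalidate!
      # Drop the dead cache from the other names it was registered under.
      cache.names.each { |other| @by_name[other].delete(cache) unless other == name }
    end
    caches.size
  end
end

table = SketchConstantCaches.new
foo_cache = table.register(:A, :B)
table.constant_set(:C) # unrelated name: foo_cache stays valid
table.constant_set(:A) # any constant named "A" invalidates foo_cache
p foo_cache.valid?
```

Note that the model, like the scheme described above, keys on names only, so
defining `C::A` invalidates a cache for `A::B` even though it is a different
constant.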
Notes:
Merged: https://github.com/ruby/ruby/pull/5433
|
|
In the past, many internal functions were declared in intern.h
under the include/ruby directory, because there were no headers for
internal use.
|
|
This check is needed to fix a bug in error_highlight that occurs when a
NameError is raised in eval'ed code.
https://github.com/ruby/error_highlight/pull/16
The same check for proc/method was already introduced in
64ac984129a7a4645efe5ac57c168ef880b479b2.
|
|
ast.c: Use kept script_lines data instead of re-opening the source file
Notes:
Merged-By: mame <mame@ruby-lang.org>
|
|
These contents are purely implementation details, not worth appearing in
CAPI documents. [ci skip]
Notes:
Merged: https://github.com/ruby/ruby/pull/4815
|
|
RubyVM::AST.of(Thread::Backtrace::Location) returns a node that
corresponds to the location. Typically, the node is a method call, but
not always.
This change also includes iseq's dump/load support of node_ids for each
instruction.
Notes:
Merged: https://github.com/ruby/ruby/pull/4558
|
|
This reverts commit fac2498e0299f13dffe4f09a7dd7657fb49bf643 for
now, due to [Bug #17509]: it breaks the case where `super` is
called in `respond_to?`.
Notes:
Merged: https://github.com/ruby/ruby/pull/4057
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/4053
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/4037
|
|
This avoids recursive checks when the `hash` method of an object
isn't specialized.
Notes:
Merged-By: nurse <naruse@airemix.jp>
|
|
Make the code a bit more modern and consistent with some other places.
|
|
`cd` is passed from method call functions to method invocation
functions, but `cd` can be manipulated by other ractors simultaneously,
so this is a thread-safety issue.
To solve this issue, this patch stores `ci` and the found `cc` in `calling`
and stops passing `cd`.
Notes:
Merged: https://github.com/ruby/ruby/pull/3903
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/3777
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/3741
|
|
* `GC.auto_compact=`, `GC.auto_compact` can be used to control when
compaction runs. Setting `auto_compact=` to true will cause
compaction to occur during major collections. At the moment,
compaction adds significant overhead to major collections, so please
test first!
[Feature #17176]
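A cautious way to try the setting is to enable it only around a chosen block
and restore the previous value afterwards. The helper names below are
hypothetical, and the sketch assumes that platforms without compaction support
either lack the method or raise `NotImplementedError`.

```ruby
# Hypothetical helper: true only where GC.auto_compact can actually be set.
def auto_compact_supported?
  return false unless GC.respond_to?(:auto_compact=)
  GC.auto_compact = GC.auto_compact # no-op write; raises where unsupported
  true
rescue NotImplementedError
  false
end

# Hypothetical helper: run a block with auto-compaction on, then restore.
def with_auto_compact
  return yield unless auto_compact_supported?
  previous = GC.auto_compact
  GC.auto_compact = true
  begin
    yield
  ensure
    GC.auto_compact = previous
  end
end

with_auto_compact { GC.start } # a major collection; compacts where enabled
```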
|
|
See <https://bugs.ruby-lang.org/issues/16815> for more details.
Notes:
Merged: https://github.com/ruby/ruby/pull/3422
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/3180
|
|
To fix build failures.
Notes:
Merged: https://github.com/ruby/ruby/pull/3079
|
|
This should fix compile errors.
Notes:
Merged: https://github.com/ruby/ruby/pull/3079
|
|
Since https://github.com/ruby/ruby/pull/2888 this macro is no longer
used in any place.
|
|
According to the MSVC manual (*1), cl.exe can skip including a header file
when it:
- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.
GCC has a similar trick (*2), but it is stricter (e.g. there
must be _no tokens_ outside of #ifndef...#endif).
Sun C lacked #pragma once for a looong time. Oracle Developer Studio
12.5 finally implemented it, but we cannot assume such recent version.
This changeset modifies header files so that each of them includes
exactly one #ifndef...#endif. I believe this is the most portable way
to trigger compiler optimizations. [Bug #16770]
*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
Notes:
Merged: https://github.com/ruby/ruby/pull/3023
|
|
Split ruby.h
Notes:
Merged-By: shyouhei <shyouhei@ruby-lang.org>
|
|
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Make the call-cache (CC) an RVALUE (GC target object) and allocate a new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements, like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of a fixed-size global method cache (GMC), pCMC allows a flexible
cache size.
* Caching CCs reduces CC allocation and allows sharing a CC's fast-path
between call-sites with the same call-info (CI).
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set an "invalidated" flag on the method
entry itself to represent cache invalidation.
* Compared with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidates all method caches of the class and
sub-classes.
* The proposed approach invalidates the method cache of only one ME.
See [Feature #16614] for more details.
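Idea (3) can be illustrated with a toy model in Ruby. It is a sketch under
heavy simplification: all class names are hypothetical, the call site caches
the method entry directly instead of a separate CC object, and it assumes a
single receiver class (no class check). What it does show is that redefining
one method invalidates only the caches holding that method's entry.

```ruby
# Hypothetical model of a method entry (ME) carrying an "invalidated" flag.
class SketchMethodEntry
  attr_reader :body

  def initialize(body)
    @body = body
    @invalidated = false
  end

  def invalidated?
    @invalidated
  end

  def invalidate!
    @invalidated = true
  end
end

# Hypothetical model of a class's method table.
class SketchClass
  def initialize
    @methods = {}
  end

  # Redefinition flags the old entry instead of bumping a class-wide serial.
  def define(name, &body)
    @methods[name]&.invalidate!
    @methods[name] = SketchMethodEntry.new(body)
  end

  def lookup(name)
    @methods.fetch(name)
  end
end

# Hypothetical call site with an inline cache holding one method entry.
class SketchCallSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached = nil
    @misses = 0
  end

  def call(klass, *args)
    entry = @cached
    if entry.nil? || entry.invalidated?
      @misses += 1 # slow path: look the method up again and re-cache it
      entry = @cached = klass.lookup(@name)
    end
    entry.body.call(*args)
  end
end
```

With two call sites for `foo` and `bar` on the same `SketchClass`, redefining
`foo` causes one extra miss at the `foo` site only; the `bar` site keeps
hitting its cache, which is the contrast with class serials drawn above.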
Notes:
Merged: https://github.com/ruby/ruby/pull/2888
|