| Age | Commit message (Collapse) | Author |
|
Specialize monomorphic `GetIvar` into:
* `GuardType(HeapObject)`
* `GuardShape`
* `LoadIvarEmbedded` or `LoadIvarExtended`
This requires profiling self for `getinstancevariable` (it's not on the operand
stack).
This also optimizes `GetIvar`s that happen as a result of inlining
`attr_reader` and `attr_accessor`.
Also move some (newly) shared JIT helpers into jit.c.
|
|
Using `assume_single_ractor_mode` we can skip all ractor safety
checks if we're in single ractor mode.
```
compare-ruby: ruby 3.5.0dev (2025-08-27T14:58:58Z merge-vm-setivar-d.. 5b749d8e53) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-28T21:23:38Z yjit-get-exivar 3cc21b76d4) +YJIT +PRISM [arm64-darwin24]
| |compare-ruby|built-ruby|
|:--------------------------|-----------:|---------:|
|vm_ivar_get_on_obj | 975.981| 975.772|
| | 1.00x| -|
|vm_ivar_get_on_class | 136.214| 470.912|
| | -| 3.46x|
|vm_ivar_get_on_generic | 148.315| 299.122|
| | -| 2.02x|
```
|
|
The only exception it could raise is if we're in multi ractor mode.
|
|
While accessing the ivars of other types is too complicated to
realistically generate the ASM for it, we can at least provide
the ivar index as to not have to lookup the shape tree every
time.
```
compare-ruby: ruby 3.5.0dev (2025-08-27T14:58:58Z merge-vm-setivar-d.. 5b749d8e53) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-28T17:58:32Z yjit-get-exivar efaa8c9b09) +YJIT +PRISM [arm64-darwin24]
| |compare-ruby|built-ruby|
|:--------------------------|-----------:|---------:|
|vm_ivar_get_on_obj | 930.458| 936.865|
| | -| 1.01x|
|vm_ivar_get_on_class | 134.471| 431.622|
| | -| 3.21x|
|vm_ivar_get_on_generic | 146.679| 284.408|
| | -| 1.94x|
```
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
The `shape_id` now includes 3 bits for the `heap_id`.
It is always non-zero for `T_OBJECT` and always zero for all
other types.
Hence all these allocator checks are no longer necessary.
|
|
|
|
The embed layout is way more common than the heap one,
especially since WVA.
I think it makes for more readable code to inverse the
flag.
|
|
When these instructions were introduced it was common to read from a
hash with mutable string literals. However, these days, I think these
instructions are fairly rare.
I tested this with the lobsters benchmark, and saw no difference in
speed. In order to be sure, I tracked down every use of this
instruction in the lobsters benchmark, and there were only 4 places
where it was used.
Additionally, this patch fixes a case where "chilled strings" should
emit a warning but they don't.
```ruby
class Foo
def self.[](x)= x.gsub!(/hello/, "hi")
end
Foo["hello world"]
```
Removing these instructions shows this warning:
```
> ./miniruby -vw test.rb
ruby 3.5.0dev (2025-08-25T21:36:50Z rm-opt_aref_with dca08e286c) +PRISM [arm64-darwin24]
test.rb:2: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information)
```
[Feature #21553]
|
|
|
|
|
|
It only makes sense for heap objects.
|
|
Previously, YJIT returned truthy for the block given query at the top
level. That's incorrect because the top level script never receives a
block, and `yield` is a syntax error there.
Inside methods, the number of hops to get from `iseq` to
`iseq->body->local_iseq` is the same as the number of
`VM_ENV_PREV_EP(ep)` hops to get to an environment with
`VM_ENV_FLAG_LOCAL`. YJIT and the interpreter both rely on this as can
be seen in get_lvar_level(). However, this identity does not hold for
the top level frame because of vm_set_eval_stack(), which sets up
`TOPLEVEL_BINDING`.
Since only methods can take a block that `yield` goes to, have ISEQs
that are the child of a non-method ISEQ return falsy for the block given
query. This fixes the issue for the top level script and is an
optimization for non-method contexts such as inside `ISEQ_TYPE_CLASS`.
|
|
|
|
* ZJIT: Implement SingleRactorMode invalidation
* ZJIT: Add macro for compiling jumps
* ZJIT: Fix typo in comment
* YJIT: Fix typo in comment
* ZJIT: Avoid using unexported types in zjit.h
`enum ruby_vminsn_type` is declared in `insns.inc` and is not exported.
Using it in `zjit.h` would cause build errors when the file including it
doesn't include `insns.inc`.
|
|
|
|
|
|
It has been marked as obsolete for a while and I see no reason
to keep it.
|
|
|
|
Because we have set all code memory to writable before the reference
updating phase, we can use raw memory writes directly.
|
|
|
|
* YJIT: Side-exit on String#dup when it's not leaf
* Use an enum instead of a macro for bindgen
|
|
This is the second part of making YJIT work with parallel GC.
During GC, `rb_yjit_iseq_mark` and `rb_yjit_iseq_update_references` need
to resolve offsets in `Block::gc_obj_offsets` into absolute addresses
before reading or updating the fields. This needs the base address
stored in `VirtualMemory::region_start` which was previously behind a
`RefCell`. When multiple GC threads scan multiple iseq simultaneously
(which is possible for some GC modules such as MMTk), it will panic
because the `RefCell` is already borrowed.
We notice that some fields of `VirtualMemory`, such as `region_start`,
are never modified once `VirtualMemory` is constructed. We change the
type of the field `CodeBlock::mem_block` from `Rc<RefCell<T>>` to
`Rc<T>`, and push the `RefCell` into `VirtualMemory`. We extract
mutable fields of `VirtualMemory` into a dedicated struct
`VirtualMemoryMut`, and store them in a field `VirtualMemory::mutable`
which is a `RefCell<VirtualMemoryMut>`. After this change, methods that
access immutable fields in `VirtualMemory`, particularly `base_ptr()`
which reads `region_start`, will no longer need to borrow any `RefCell`.
Methods that access mutable fields will need to borrow
`VirtualMemory::mutable`, but the number of borrowing operations becomes
strictly fewer than before because borrowing operations previously done
in callers (such as `CodeBlock::write_mem`) are moved into methods of
`VirtualMemory` (such as `VirtualMemory::write_bytes`).
|
|
Some GC modules, notably MMTk, support parallel GC, i.e. multiple GC
threads work in parallel during a GC. Currently, when two GC threads
scan two iseq objects simultaneously when YJIT is enabled, both threads
will attempt to borrow `CodeBlock::mem_block`, which will result in
panic.
This commit makes one part of the change.
We now set the YJIT code memory to writable in bulk before the
reference-updating phase, and reset it to executable in bulk after the
reference-updating phase. Previously, YJIT lazily sets memory pages
writable while updating object references embedded in JIT-compiled
machine code, and sets the memory back to executable by calling
`mark_all_executable`. This approach is inherently unfriendly to
parallel GC because (1) it borrows `CodeBlock::mem_block`, and (2) it
sets the whole `CodeBlock` as executable which races with other GC
threads that are updating other iseq objects. It also has performance
overhead due to the frequent invocation of system calls. We now set the
permission of all the code memory in bulk before and after the reference
updating phase. Multiple GC threads can now perform raw memory writes
in parallel. We should also see performance improvement during moving
GC because of the reduced number of `mprotect` system calls.
|
|
|
|
Lots of stdlib methods such as Integer#times and Kernel#then use this,
so at least this will make writing tests slightly easier.
|
|
If we have a frozen array `[..., a, ...]` and a compile-time fixnum index `i`,
we can do the array load at compile-time.
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13626
|
|
Now that the shape_id gives us all the same information, it's no
longer needed.
Notes:
Merged: https://github.com/ruby/ruby/pull/13612
|
|
This allow checking if an object has ivars with just a shape_id
mask.
Notes:
Merged: https://github.com/ruby/ruby/pull/13606
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13596
|
|
This behave almost exactly as a T_OBJECT, the layout is entirely
compatible.
This aims to solve two problems.
First, it solves the problem of namspaced classes having
a single `shape_id`. Now each namespaced classext
has an object that can hold the namespace specific
shape.
Second, it open the door to later make class instance variable
writes atomics, hence be able to read class variables
without locking the VM.
In the future, in multi-ractor mode, we can do the write
on a copy of the `fields_obj` and then atomically swap it.
Considerations:
- Right now the `RClass` shape_id is always synchronized,
but with namespace we should likely mark classes that have
multiple namespace with a specific shape flag.
Notes:
Merged: https://github.com/ruby/ruby/pull/13411
|
|
Previously, `asm.mov(m32, imm32)` panicked when `imm32 > 0x80000000`. It
attempted to split imm32 into a register before doing the store, but
then the register size didn't match the destination size.
Instead of splitting, use the `MOV r/m32, imm32` form which works for
all 32-bit values. Adjust asserts that assumed that all forms undergo
sign extension, which is not true for this case.
See: 54edc930f9f0a658da45cfcef46648d1b6f82467
Notes:
Merged: https://github.com/ruby/ruby/pull/13576
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13556
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13524
|
|
Now all flags are only in the `shape_id_t`, and can all be checked
without needing to dereference a pointer.
Notes:
Merged: https://github.com/ruby/ruby/pull/13515
|
|
Instead it's now a `shape_id` flag.
This allows to check if an object is complex without having
to chase the `rb_shape_t` pointer.
Notes:
Merged: https://github.com/ruby/ruby/pull/13511
|
|
Followup: https://github.com/ruby/ruby/pull/13341 / [Feature #21353]
Even thought `shape_id_t` has been make 32bits, we were still limited
to use only the lower 16 bits because they had to fit alongside `attr_index_t`
inside a `uintptr_t` in inline caches.
By enlarging inline caches we can unlock the full 32bits on all
platforms, allowing to use these extra bits for tagging.
Notes:
Merged: https://github.com/ruby/ruby/pull/13500
|
|
Whenever we run into an inline cache miss when we try to set
an ivar, we may need to take the global lock, just to be able to
lookup inside `shape->edges`.
To solve that, when we're in multi-ractor mode, we can treat
the `shape->edges` as immutable. When we need to add a new
edge, we first copy the table, and then replace it with
CAS.
This increases memory allocations, however we expect that
creating new transitions becomes increasingly rare over time.
```ruby
class A
def initialize(bool)
@a = 1
if bool
@b = 2
else
@c = 3
end
end
def test
@d = 4
end
end
def bench(iterations)
i = iterations
while i > 0
A.new(true).test
A.new(false).test
i -= 1
end
end
if ARGV.first == "ractor"
ractors = 8.times.map do
Ractor.new do
bench(20_000_000 / 8)
end
end
ractors.each(&:take)
else
bench(20_000_000)
end
```
The above benchmark takes 27 seconds in Ractor mode on Ruby 3.4,
and only 1.7s with this branch.
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/13441
|
|
Previously we used a flag to set whether a module was uninitialized.
When checked whether a class was initialized, we first had to check that
it had a non-zero superclass, as well as that it wasn't BasicObject.
With the advent of namespaces, RCLASS_SUPER is now an expensive
operation, and though we could just check for the prime superclass, we
might as well take this opportunity to use a flag so that we can perform
the initialized check with as few instructions as possible.
It's possible in the future that we could prevent uninitialized classes
from being available to the user, but currently there are a few ways to
do that.
Notes:
Merged: https://github.com/ruby/ruby/pull/13443
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13450
|
|
Further reduce exposure of `rb_shape_t`.
Notes:
Merged: https://github.com/ruby/ruby/pull/13450
|
|
We should avoid conversions from `rb_shape_t *` into `shape_id_t`
outside of `shape.c` as the short term goal is to have `shape_id_t`
contain tags.
Notes:
Merged: https://github.com/ruby/ruby/pull/13448
|
|
```
# frozen_string_ltieral: true
hash["literal"] = value
```
Notes:
Merged: https://github.com/ruby/ruby/pull/13342
|
|
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
|
|
As well as `RB_OBJ_SHAPE_ID` -> `rb_obj_shape_id`
and `RSHAPE` is now a simple alias for `rb_shape_lookup`.
I tried to turn all these into `static inline` but I'm having
trouble with `RUBY_EXTERN rb_shape_tree_t *rb_shape_tree_ptr;`
not being exposed as I'd expect.
Notes:
Merged: https://github.com/ruby/ruby/pull/13283
|
|
And `rb_shape_get_shape` -> `RB_OBJ_SHAPE`.
Notes:
Merged: https://github.com/ruby/ruby/pull/13283
|
|
Also rename it, and change parameters to be consistent with
other transition functions.
Notes:
Merged: https://github.com/ruby/ruby/pull/13283
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13283
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13283
|