<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/yjit/src/asm/mod.rs, branch v4.0.4</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>YJIT: Tiny refactors (#14505)</title>
<updated>2025-09-10T20:37:17+00:00</updated>
<author>
<name>Stan Lo</name>
<email>stan.lo@shopify.com</email>
</author>
<published>2025-09-10T20:37:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=23c60e185ea7e74f0eebc27119184fb1ed844856'/>
<id>23c60e185ea7e74f0eebc27119184fb1ed844856</id>
<content type='text'>
Addressed some suggestions from clippy that made sense to me.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Addressed some suggestions from clippy that made sense to me.</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Move RefCell one level down</title>
<updated>2025-07-14T20:21:55+00:00</updated>
<author>
<name>Kunshan Wang</name>
<email>wks1986@gmail.com</email>
</author>
<published>2025-07-10T08:55:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=3a47f4eacf3cd755df9db554a0b5e40789611602'/>
<id>3a47f4eacf3cd755df9db554a0b5e40789611602</id>
<content type='text'>
This is the second part of making YJIT work with parallel GC.

During GC, `rb_yjit_iseq_mark` and `rb_yjit_iseq_update_references` need
to resolve offsets in `Block::gc_obj_offsets` into absolute addresses
before reading or updating the fields.  This needs the base address
stored in `VirtualMemory::region_start` which was previously behind a
`RefCell`.  When multiple GC threads scan multiple iseqs simultaneously
(which is possible for some GC modules such as MMTk), YJIT will panic
because the `RefCell` is already borrowed.

We notice that some fields of `VirtualMemory`, such as `region_start`,
are never modified once `VirtualMemory` is constructed.  We change the
type of the field `CodeBlock::mem_block` from `Rc&lt;RefCell&lt;T&gt;&gt;` to
`Rc&lt;T&gt;`, and push the `RefCell` into `VirtualMemory`.  We extract
mutable fields of `VirtualMemory` into a dedicated struct
`VirtualMemoryMut`, and store them in a field `VirtualMemory::mutable`
which is a `RefCell&lt;VirtualMemoryMut&gt;`.  After this change, methods that
access immutable fields in `VirtualMemory`, particularly `base_ptr()`
which reads `region_start`, will no longer need to borrow any `RefCell`.
Methods that access mutable fields will need to borrow
`VirtualMemory::mutable`, but the number of borrowing operations becomes
strictly fewer than before because borrowing operations previously done
in callers (such as `CodeBlock::write_mem`) are moved into methods of
`VirtualMemory` (such as `VirtualMemory::write_bytes`).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is the second part of making YJIT work with parallel GC.

During GC, `rb_yjit_iseq_mark` and `rb_yjit_iseq_update_references` need
to resolve offsets in `Block::gc_obj_offsets` into absolute addresses
before reading or updating the fields.  This needs the base address
stored in `VirtualMemory::region_start` which was previously behind a
`RefCell`.  When multiple GC threads scan multiple iseqs simultaneously
(which is possible for some GC modules such as MMTk), YJIT will panic
because the `RefCell` is already borrowed.

We notice that some fields of `VirtualMemory`, such as `region_start`,
are never modified once `VirtualMemory` is constructed.  We change the
type of the field `CodeBlock::mem_block` from `Rc&lt;RefCell&lt;T&gt;&gt;` to
`Rc&lt;T&gt;`, and push the `RefCell` into `VirtualMemory`.  We extract
mutable fields of `VirtualMemory` into a dedicated struct
`VirtualMemoryMut`, and store them in a field `VirtualMemory::mutable`
which is a `RefCell&lt;VirtualMemoryMut&gt;`.  After this change, methods that
access immutable fields in `VirtualMemory`, particularly `base_ptr()`
which reads `region_start`, will no longer need to borrow any `RefCell`.
Methods that access mutable fields will need to borrow
`VirtualMemory::mutable`, but the number of borrowing operations becomes
strictly fewer than before because borrowing operations previously done
in callers (such as `CodeBlock::write_mem`) are moved into methods of
`VirtualMemory` (such as `VirtualMemory::write_bytes`).
</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Set code mem permissions in bulk</title>
<updated>2025-07-14T20:21:55+00:00</updated>
<author>
<name>Kunshan Wang</name>
<email>wks1986@gmail.com</email>
</author>
<published>2025-06-30T06:21:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=51a3ea5adeb452e51c119a395acfd5c87cc63735'/>
<id>51a3ea5adeb452e51c119a395acfd5c87cc63735</id>
<content type='text'>
Some GC modules, notably MMTk, support parallel GC, i.e. multiple GC
threads work in parallel during a GC.  Currently, when two GC threads
scan two iseq objects simultaneously when YJIT is enabled, both threads
will attempt to borrow `CodeBlock::mem_block`, which will result in
panic.

This commit makes one part of the change.

We now set the YJIT code memory to writable in bulk before the
reference-updating phase, and reset it to executable in bulk after the
reference-updating phase.  Previously, YJIT lazily set memory pages
writable while updating object references embedded in JIT-compiled
machine code, and set the memory back to executable by calling
`mark_all_executable`.  This approach is inherently unfriendly to
parallel GC because (1) it borrows `CodeBlock::mem_block`, and (2) it
sets the whole `CodeBlock` as executable which races with other GC
threads that are updating other iseq objects.  It also has performance
overhead due to the frequent invocation of system calls.  We now set the
permission of all the code memory in bulk before and after the reference
updating phase.  Multiple GC threads can now perform raw memory writes
in parallel.  We should also see performance improvement during moving
GC because of the reduced number of `mprotect` system calls.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some GC modules, notably MMTk, support parallel GC, i.e. multiple GC
threads work in parallel during a GC.  Currently, when two GC threads
scan two iseq objects simultaneously when YJIT is enabled, both threads
will attempt to borrow `CodeBlock::mem_block`, which will result in
panic.

This commit makes one part of the change.

We now set the YJIT code memory to writable in bulk before the
reference-updating phase, and reset it to executable in bulk after the
reference-updating phase.  Previously, YJIT lazily set memory pages
writable while updating object references embedded in JIT-compiled
machine code, and set the memory back to executable by calling
`mark_all_executable`.  This approach is inherently unfriendly to
parallel GC because (1) it borrows `CodeBlock::mem_block`, and (2) it
sets the whole `CodeBlock` as executable which races with other GC
threads that are updating other iseq objects.  It also has performance
overhead due to the frequent invocation of system calls.  We now set the
permission of all the code memory in bulk before and after the reference
updating phase.  Multiple GC threads can now perform raw memory writes
in parallel.  We should also see performance improvement during moving
GC because of the reduced number of `mprotect` system calls.
</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: A64: Remove assert that trips when OOM at page boundary</title>
<updated>2025-01-30T00:09:39+00:00</updated>
<author>
<name>Alan Wu</name>
<email>XrXr@users.noreply.github.com</email>
</author>
<published>2025-01-29T19:59:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=5a7089fc03fb77da6f0f6c005a9a0ef655660cff'/>
<id>5a7089fc03fb77da6f0f6c005a9a0ef655660cff</id>
<content type='text'>
With a well-timed OOM around a page switch in the backend, it can return
RetryOnNextPage twice and crash due to the assert. (More places can
signal OOM now since VirtualMem tracks Rust malloc heap size for
--yjit-mem-size.)

Return error in these cases instead of crashing.

Fixes: https://github.com/Shopify/ruby/issues/566
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With a well-timed OOM around a page switch in the backend, it can return
RetryOnNextPage twice and crash due to the assert. (More places can
signal OOM now since VirtualMem tracks Rust malloc heap size for
--yjit-mem-size.)

Return error in these cases instead of crashing.

Fixes: https://github.com/Shopify/ruby/issues/566
</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Add --yjit-mem-size option (#11810)</title>
<updated>2024-10-07T17:07:23+00:00</updated>
<author>
<name>Takashi Kokubun</name>
<email>takashikkbn@gmail.com</email>
</author>
<published>2024-10-07T17:07:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=35711903f239e462da682929982f434ee45c2199'/>
<id>35711903f239e462da682929982f434ee45c2199</id>
<content type='text'>
* YJIT: Add --yjit-mem-size option

* Improve --help

* s/the region/this virtual memory region/

Co-authored-by: Maxime Chevalier-Boisvert &lt;maxime.chevalierboisvert@shopify.com&gt;

---------

Co-authored-by: Maxime Chevalier-Boisvert &lt;maxime.chevalierboisvert@shopify.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* YJIT: Add --yjit-mem-size option

* Improve --help

* s/the region/this virtual memory region/

Co-authored-by: Maxime Chevalier-Boisvert &lt;maxime.chevalierboisvert@shopify.com&gt;

---------

Co-authored-by: Maxime Chevalier-Boisvert &lt;maxime.chevalierboisvert@shopify.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>Return an Iterator Instead of a Vector in `addrs_to_pages` Method (#11725)</title>
<updated>2024-09-30T23:00:54+00:00</updated>
<author>
<name>whtsht</name>
<email>85547207+whtsht@users.noreply.github.com</email>
</author>
<published>2024-09-30T23:00:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=af63b4f8b7a659ab78a75af97416c042ca357a3b'/>
<id>af63b4f8b7a659ab78a75af97416c042ca357a3b</id>
<content type='text'>
* Returning an iterator instead of a vec

* Avoid changing the meaning of end_page

---------

Co-authored-by: Takashi Kokubun &lt;takashikkbn@gmail.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Returning an iterator instead of a vec

* Avoid changing the meaning of end_page

---------

Co-authored-by: Takashi Kokubun &lt;takashikkbn@gmail.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: `dump-disasm`: Print comments and bytes in release builds</title>
<updated>2024-07-08T20:02:30+00:00</updated>
<author>
<name>Alan Wu</name>
<email>XrXr@users.noreply.github.com</email>
</author>
<published>2024-07-08T20:02:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=3be9ce3cf61e8396ecc3eea6169a2fd1a2c2bfea'/>
<id>3be9ce3cf61e8396ecc3eea6169a2fd1a2c2bfea</id>
<content type='text'>
This change implements a fallback mode for the `--yjit-dump-disasm`
development command-line option to make it usable in release builds.
Previously, using the option with release builds of YJIT yielded only
a warning asking the user to build with `--enable-yjit=dev`.

While builds that use the `disasm` feature still give the best output,
just having the comments is useful enough for many kinds of debugging.
Having it usable in release builds is nice for new hackers, too, since
this allows for tinkering without having to learn how to build YJIT in
development mode.

Sample output on A64:

```
  # regenerate_branch
  # Insn: 0001 opt_send_without_block (stack_size: 1)
  # guard known object with singleton class
  0x11f7e0034: 4b 00 00 58 03 00 00 14 08 ce 9c 04 01 00 00
  0x11f7e0043: 00 3f 00 0b eb 81 06 01 54 1f 20 03 d5
  # RUBY_VM_CHECK_INTS(ec)
  0x11f7e0050: 8b 02 42 b8 cb 07 01 35
  # stack overflow check
  0x11f7e0058: ab 62 02 91 7f 02 0b eb 69 07 01 54
  # save PC to CFP
  0x11f7e0064: 0b 3b 9a d2 2b 2f a0 f2 0b 00 cc f2 6b 02 00
  0x11f7e0073: f8 ab 82 00 91
```

To ensure this feature doesn't incur too much cost when running without
the `--yjit-dump-disasm` option, I checked that there is no significant
impact to compile time and memory usage with the `compile_time_ns` and
`yjit_alloc_size` entries in `RubyVM::YJIT.runtime_stats`. For each
sample, I ran 3 iterations of the `lobsters` YJIT benchmark. The
statistics summary was done with the `summary` function in R.

Compile time, sample size of 60, lower is better:

```
       Before              After
 Min.   :2.054e+09   Min.   :2.028e+09
 1st Qu.:2.069e+09   1st Qu.:2.044e+09
 Median :2.081e+09   Median :2.060e+09
 Mean   :2.089e+09   Mean   :2.066e+09
 3rd Qu.:2.109e+09   3rd Qu.:2.085e+09
 Max.   :2.146e+09   Max.   :2.144e+09
```

Allocation size, sample size of 20, lower is better:

```
       Before             After
 Min.   :21804742   Min.   :21794082
 1st Qu.:21826682   1st Qu.:21816282
 Median :21844042   Median :21826814
 Mean   :21960664   Mean   :22026291
 3rd Qu.:21861228   3rd Qu.:22040439
 Max.   :22587426   Max.   :22930614
```

The `yjit_alloc_size` samples are noisy, but since the average increased
by only 0.3%, and the median is lower, I feel safe saying that there is
no significant change.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change implements a fallback mode for the `--yjit-dump-disasm`
development command-line option to make it usable in release builds.
Previously, using the option with release builds of YJIT yielded only
a warning asking the user to build with `--enable-yjit=dev`.

While builds that use the `disasm` feature still give the best output,
just having the comments is useful enough for many kinds of debugging.
Having it usable in release builds is nice for new hackers, too, since
this allows for tinkering without having to learn how to build YJIT in
development mode.

Sample output on A64:

```
  # regenerate_branch
  # Insn: 0001 opt_send_without_block (stack_size: 1)
  # guard known object with singleton class
  0x11f7e0034: 4b 00 00 58 03 00 00 14 08 ce 9c 04 01 00 00
  0x11f7e0043: 00 3f 00 0b eb 81 06 01 54 1f 20 03 d5
  # RUBY_VM_CHECK_INTS(ec)
  0x11f7e0050: 8b 02 42 b8 cb 07 01 35
  # stack overflow check
  0x11f7e0058: ab 62 02 91 7f 02 0b eb 69 07 01 54
  # save PC to CFP
  0x11f7e0064: 0b 3b 9a d2 2b 2f a0 f2 0b 00 cc f2 6b 02 00
  0x11f7e0073: f8 ab 82 00 91
```

To ensure this feature doesn't incur too much cost when running without
the `--yjit-dump-disasm` option, I checked that there is no significant
impact to compile time and memory usage with the `compile_time_ns` and
`yjit_alloc_size` entries in `RubyVM::YJIT.runtime_stats`. For each
sample, I ran 3 iterations of the `lobsters` YJIT benchmark. The
statistics summary was done with the `summary` function in R.

Compile time, sample size of 60, lower is better:

```
       Before              After
 Min.   :2.054e+09   Min.   :2.028e+09
 1st Qu.:2.069e+09   1st Qu.:2.044e+09
 Median :2.081e+09   Median :2.060e+09
 Mean   :2.089e+09   Mean   :2.066e+09
 3rd Qu.:2.109e+09   3rd Qu.:2.085e+09
 Max.   :2.146e+09   Max.   :2.144e+09
```

Allocation size, sample size of 20, lower is better:

```
       Before             After
 Min.   :21804742   Min.   :21794082
 1st Qu.:21826682   1st Qu.:21816282
 Median :21844042   Median :21826814
 Mean   :21960664   Mean   :22026291
 3rd Qu.:21861228   3rd Qu.:22040439
 Max.   :22587426   Max.   :22930614
```

The `yjit_alloc_size` samples are noisy, but since the average increased
by only 0.3%, and the median is lower, I feel safe saying that there is
no significant change.</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Assert code pages are not partially in-bounds</title>
<updated>2023-12-05T18:20:06+00:00</updated>
<author>
<name>Alan Wu</name>
<email>XrXr@users.noreply.github.com</email>
</author>
<published>2023-12-05T16:40:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=a063969ec18c8f213d9c70471a29d0d31ec5850a'/>
<id>a063969ec18c8f213d9c70471a29d0d31ec5850a</id>
<content type='text'>
Helps understand page switching
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Helps understand page switching
</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Simplify code page switching logic, remove an assert</title>
<updated>2023-12-05T18:20:06+00:00</updated>
<author>
<name>Alan Wu</name>
<email>XrXr@users.noreply.github.com</email>
</author>
<published>2023-12-05T15:22:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=695e5c179ed06761e47c700c6b31a26f48eee699'/>
<id>695e5c179ed06761e47c700c6b31a26f48eee699</id>
<content type='text'>
We have received a report of `assert!( !cb.has_dropped_bytes())` in
set_page() failing. The only explanation for this seems to be memory
allocation failing in write_byte(). The if condition implies that
`current_write_pos &lt; dst_pos &lt; mem_size`, which rules out failing to
encode the relative jump. The has_capacity() assert above not tripping
implies that we were in a place in the page where write_byte() did
attempt to write the byte and potentially made a syscall in the process.

Remove the assert, since memory allocation could fail. Also, return
failure if the destination is outside of the code region to detect that
out-of-memory situation quicker.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have received a report of `assert!( !cb.has_dropped_bytes())` in
set_page() failing. The only explanation for this seems to be memory
allocation failing in write_byte(). The if condition implies that
`current_write_pos &lt; dst_pos &lt; mem_size`, which rules out failing to
encode the relative jump. The has_capacity() assert above not tripping
implies that we were in a place in the page where write_byte() did
attempt to write the byte and potentially made a syscall in the process.

Remove the assert, since memory allocation could fail. Also, return
failure if the destination is outside of the code region to detect that
out-of-memory situation quicker.
</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Use u32 for CodePtr to save 4 bytes each</title>
<updated>2023-11-07T22:43:43+00:00</updated>
<author>
<name>Alan Wu</name>
<email>XrXr@users.noreply.github.com</email>
</author>
<published>2023-10-16T22:35:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=a1c61f0ae5f5ecaa7d8289942b78e6b0c77118fe'/>
<id>a1c61f0ae5f5ecaa7d8289942b78e6b0c77118fe</id>
<content type='text'>
We've long had a size restriction on the code memory region such that a
u32 could refer to everything. This commit capitalizes on this
restriction by shrinking the size of `CodePtr` to be 4 bytes from 8.

To derive a full raw pointer from a `CodePtr`, one needs a base pointer.
Both `CodeBlock` and `VirtualMemory` can be used for this purpose. The
base pointer is readily available everywhere, except for in the case of
the `jit_return` "branch". Generalize lea_label() to lea_jump_target()
in the IR to delay deriving the `jit_return` address until `compile()`,
when the base pointer is available.

On railsbench, this yields roughly a 1% reduction to `yjit_alloc_size`
(58,397,765 to 57,742,248).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We've long had a size restriction on the code memory region such that a
u32 could refer to everything. This commit capitalizes on this
restriction by shrinking the size of `CodePtr` to be 4 bytes from 8.

To derive a full raw pointer from a `CodePtr`, one needs a base pointer.
Both `CodeBlock` and `VirtualMemory` can be used for this purpose. The
base pointer is readily available everywhere, except for in the case of
the `jit_return` "branch". Generalize lea_label() to lea_jump_target()
in the IR to delay deriving the `jit_return` address until `compile()`,
when the base pointer is available.

On railsbench, this yields roughly a 1% reduction to `yjit_alloc_size`
(58,397,765 to 57,742,248).
</pre>
</div>
</content>
</entry>
</feed>
