|
Addressed some suggestions from clippy that made sense to me.
|
|
This is the second part of making YJIT work with parallel GC.
During GC, `rb_yjit_iseq_mark` and `rb_yjit_iseq_update_references` need
to resolve offsets in `Block::gc_obj_offsets` into absolute addresses
before reading or updating the fields. This needs the base address
stored in `VirtualMemory::region_start`, which was previously behind a
`RefCell`. When multiple GC threads scan multiple iseqs simultaneously
(which is possible with some GC modules, such as MMTk), the process
panics because the `RefCell` is already borrowed.
We notice that some fields of `VirtualMemory`, such as `region_start`,
are never modified once `VirtualMemory` is constructed. We change the
type of the field `CodeBlock::mem_block` from `Rc<RefCell<T>>` to
`Rc<T>`, and push the `RefCell` into `VirtualMemory`. We extract
mutable fields of `VirtualMemory` into a dedicated struct
`VirtualMemoryMut`, and store them in a field `VirtualMemory::mutable`
which is a `RefCell<VirtualMemoryMut>`. After this change, methods that
access immutable fields in `VirtualMemory`, particularly `base_ptr()`
which reads `region_start`, will no longer need to borrow any `RefCell`.
Methods that access mutable fields will need to borrow
`VirtualMemory::mutable`, but the number of borrowing operations is
strictly smaller than before because borrows previously done in callers
(such as `CodeBlock::write_mem`) have moved into methods of
`VirtualMemory` (such as `VirtualMemory::write_bytes`).
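A minimal sketch of the resulting layout; field names beyond those mentioned above are assumed for illustration:
```
use std::cell::RefCell;
use std::ptr::NonNull;

pub struct VirtualMemory {
    // Immutable after construction: readable without any RefCell
    // borrow, so parallel GC threads can resolve offsets concurrently.
    region_start: NonNull<u8>,
    region_size_bytes: usize,
    // All remaining mutable state sits behind a single inner RefCell.
    mutable: RefCell<VirtualMemoryMut>,
}

struct VirtualMemoryMut {
    current_write_page: Option<usize>,
}

impl VirtualMemory {
    // Borrow-free: safe to call from multiple GC threads at once.
    pub fn base_ptr(&self) -> *const u8 {
        self.region_start.as_ptr()
    }

    // The borrow now happens inside the method, so callers such as
    // `CodeBlock::write_mem` hold no borrow of their own.
    pub fn write_byte(&self, offset: usize, byte: u8) {
        assert!(offset < self.region_size_bytes);
        let mut inner = self.mutable.borrow_mut();
        inner.current_write_page = Some(offset / 4096); // page bookkeeping
        unsafe { self.region_start.as_ptr().add(offset).write(byte) };
    }
}
```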
|
|
Some GC modules, notably MMTk, support parallel GC, i.e. multiple GC
threads working in parallel during a GC. Currently, when two GC threads
scan two iseq objects simultaneously while YJIT is enabled, both threads
attempt to borrow `CodeBlock::mem_block`, which results in a panic.
This commit makes one part of the change.
We now set the YJIT code memory to writable in bulk before the
reference-updating phase, and reset it to executable in bulk after the
reference-updating phase. Previously, YJIT lazily set memory pages
writable while updating object references embedded in JIT-compiled
machine code, and set the memory back to executable by calling
`mark_all_executable`. That approach is inherently unfriendly to
parallel GC because (1) it borrows `CodeBlock::mem_block`, and (2) it
sets the whole `CodeBlock` as executable, which races with other GC
threads that are updating other iseq objects. It also has performance
overhead due to the frequent invocation of system calls. With the
permissions now changed in bulk, multiple GC threads can perform raw
memory writes in parallel, and moving GC should also get faster thanks
to the reduced number of `mprotect` system calls.
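A sketch of the bulk transition; the helper and the call sites are illustrative, not lifted from the patch:
```
// Flip the protection of the whole mapped code region with a single
// syscall instead of one mprotect per touched page.
unsafe fn set_code_region_prot(start: *mut u8, mapped_bytes: usize, prot: libc::c_int) {
    if libc::mprotect(start.cast::<libc::c_void>(), mapped_bytes, prot) != 0 {
        panic!("mprotect failed");
    }
}

// Before the reference-updating phase: make everything writable.
//   set_code_region_prot(base, mapped, libc::PROT_READ | libc::PROT_WRITE);
// GC threads then patch `gc_obj_offsets` with plain stores, in parallel.
// After the phase: restore executability.
//   set_code_region_prot(base, mapped, libc::PROT_READ | libc::PROT_EXEC);
```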
|
|
Previously, `asm.mov(m32, imm32)` panicked when `imm32 > 0x80000000`. It
attempted to split `imm32` into a register before doing the store, but
then the register size didn't match the destination size.
Instead of splitting, use the `MOV r/m32, imm32` form, which works for
all 32-bit values. Adjust asserts that assumed all forms undergo sign
extension, which is not true in this case.
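For context, the two encodings behave differently; these Intel-syntax lines are illustrative, not output from the patch:
```
; MOV r/m64, imm32 sign-extends its immediate, as the old asserts assumed:
mov qword ptr [rdi], -1            ; stores 0xFFFFFFFFFFFFFFFF
; MOV r/m32, imm32 does not; it stores the raw 32 bits:
mov dword ptr [rdi], 0x80000001    ; stores exactly 0x80000001
```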
See: 54edc930f9f0a658da45cfcef46648d1b6f82467
Notes:
Merged: https://github.com/ruby/ruby/pull/13576
|
|
With a well-timed OOM around a page switch in the backend, codegen can
return `RetryOnNextPage` twice and crash due to the assert. (More places
can signal OOM now since VirtualMem tracks the Rust malloc heap size for
`--yjit-mem-size`.)
Return an error in these cases instead of crashing.
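A hedged sketch of the shape of the fix, with hypothetical names throughout:
```
// The second RetryOnNextPage now surfaces as an error for the caller
// to handle instead of tripping an assert.
fn emit_with_retry(emit: impl Fn() -> Result<CodePtr, EmitError>) -> Result<CodePtr, EmitError> {
    match emit() {
        Err(EmitError::RetryOnNextPage) => match emit() {
            // previously this case was an assert failure
            Err(EmitError::RetryOnNextPage) => Err(EmitError::OutOfMemory),
            result => result,
        },
        result => result,
    }
}
```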
Fixes: https://github.com/Shopify/ruby/issues/566
Notes:
Merged: https://github.com/ruby/ruby/pull/12668
|
|
* YJIT: Add --yjit-mem-size option
* Improve --help
* s/the region/this virtual memory region/
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* Returning an iterator instead of a vec
* Avoid changing the meaning of end_page
---------
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
This change implements a fallback mode for the `--yjit-dump-disasm`
development command-line option to make it usable in release builds.
Previously, using the option with release builds of YJIT yielded only
a warning asking the user to build with `--enable-yjit=dev`.
While builds that use the `disasm` feature still give the best output,
just having the comments is useful enough for many kinds of debugging.
Having it usable in release builds is nice for new hackers, too, since
this allows for tinkering without having to learn how to build YJIT in
development mode.
Sample output on A64:
```
# regenerate_branch
# Insn: 0001 opt_send_without_block (stack_size: 1)
# guard known object with singleton class
0x11f7e0034: 4b 00 00 58 03 00 00 14 08 ce 9c 04 01 00 00
0x11f7e0043: 00 3f 00 0b eb 81 06 01 54 1f 20 03 d5
# RUBY_VM_CHECK_INTS(ec)
0x11f7e0050: 8b 02 42 b8 cb 07 01 35
# stack overflow check
0x11f7e0058: ab 62 02 91 7f 02 0b eb 69 07 01 54
# save PC to CFP
0x11f7e0064: 0b 3b 9a d2 2b 2f a0 f2 0b 00 cc f2 6b 02 00
0x11f7e0073: f8 ab 82 00 91
```
To ensure this feature doesn't incur too much cost when running without
the `--yjit-dump-disasm` option, I checked that there is no significant
impact on compile time and memory usage, per the `compile_time_ns` and
`yjit_alloc_size` entries in `RubyVM::YJIT.runtime_stats`. For each
sample, I ran 3 iterations of the `lobsters` YJIT benchmark. The
statistics summaries were produced with the `summary` function in R.
Compile time, sample size of 60, lower is better:
```
Before After
Min. :2.054e+09 Min. :2.028e+09
1st Qu.:2.069e+09 1st Qu.:2.044e+09
Median :2.081e+09 Median :2.060e+09
Mean :2.089e+09 Mean :2.066e+09
3rd Qu.:2.109e+09 3rd Qu.:2.085e+09
Max. :2.146e+09 Max. :2.144e+09
```
Allocation size, sample size of 20, lower is better:
```
Before After
Min. :21804742 Min. :21794082
1st Qu.:21826682 1st Qu.:21816282
Median :21844042 Median :21826814
Mean :21960664 Mean :22026291
3rd Qu.:21861228 3rd Qu.:22040439
Max. :22587426 Max. :22930614
```
The `yjit_alloc_size` samples are noisy, but since the average increased
by only 0.3%, and the median is lower, I feel safe saying that there is
no significant change.
|
|
Use a special breakpoint address if one isn't explicitly supplied in order to support natural line stepping. (#11083)
ARM64 will not increment the program counter (PC) upon hitting a breakpoint instruction. Consequently, stepping through code with a debugger ends up looping back to the breakpoint instruction. LLDB has a special breakpoint address of 0xf000 that will increment the PC and allow the debugger to work as expected. This change makes it possible to debug YJIT generated code on ARM64.
More details at: https://discourse.llvm.org/t/stepping-over-a-brk-instruction-on-arm64/69766/8
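A minimal sketch of what the default might look like; the encoder names are assumed:
```
// Default to LLDB's magic immediate so the debugger advances the PC
// past the breakpoint instead of looping back to it.
pub fn breakpoint(cb: &mut CodeBlock) {
    brk(cb, A64Opnd::UImm(0xf000)); // brk #0xf000 steps cleanly in LLDB
}
```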
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
Mostly putting angle brackets around links to follow markdown syntax.
|
|
* YJIT: A64: Add CBZ and CBNZ encoding functions
* YJIT: A64: Use CBZ/CBNZ to check for zero
Instead of emitting `cmp x0, #0` plus `b.z #target`, A64 offers Compare
and Branch on Zero for us to just do `cbz x0, #target`. This commit
utilizes that and the related CBNZ instruction when appropriate.
We check for zero most commonly in interrupt checks:
```diff
# Insn: 0003 leave (stack_size: 1)
# RUBY_VM_CHECK_INTS(ec)
ldur w11, [x20, #0x20]
-tst w11, w11
-b.ne #0x109002164
+cbnz w11, #0x1049021d0
```
* fix copy paste error
Co-authored-by: Randy Stauner <randy@r4s6.net>
---------
Co-authored-by: Randy Stauner <randy@r4s6.net>
|
|
* YJIT: A64: Use ADDS/SUBS/CMP (immediate) when possible
We were loading 1 into a register and then doing ADDS/SUBS previously.
That was particularly bad since those come up in fixnum operations.
```diff
# integer left shift with rhs=1
- mov x11, #1
- subs x11, x1, x11
+ subs x11, x1, #1
lsl x12, x11, #1
asr x13, x12, #1
cmp x13, x11
- b.ne #0x106ab60f8
- mov x11, #1
- adds x12, x12, x11
+ b.ne #0x10903a0f8
+ adds x12, x12, #1
mov x1, x12
```
Note that it's fine to cast between i64 and u64 since the bit pattern is
preserved, and the add/sub themselves don't care about the signedness of
the operands.
CMP is just another mnemonic for SUBS.
* YJIT: A64: Split asm.mul() with immediates properly
There is in fact no MUL on A64 that takes an immediate, so this
instruction was using the wrong split method. No current usages of this
form in YJIT.
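Illustrative A64 output for such a split (not taken from the patch):
```
# A64 has no `mul xd, xn, #imm` encoding, so the immediate is
# materialized into a scratch register before the multiply
mov x11, #5
mul x12, x1, x11
```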
---------
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
|
|
|
|
|
|
Helps understand page switching
|
|
We have received a report of `assert!(!cb.has_dropped_bytes())` in
set_page() failing. The only explanation for this seems to be memory
allocation failing in write_byte(). The if condition implies that
`current_write_pos < dst_pos < mem_size`, which rules out failing to
encode the relative jump. The has_capacity() assert above not tripping
implies that we were in a place in the page where write_byte() did
attempt to write the byte and potentially made a syscall in the process.
Remove the assert, since memory allocation could fail. Also, return
failure if the destination is outside of the code region to detect that
out-of-memory situation sooner.
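A hedged sketch of the early-out, with a simplified signature:
```
// Reject jump destinations outside the mapped code region up front,
// rather than letting a later failed write trip an assert.
fn set_page(&mut self, dst_pos: usize) -> bool {
    if dst_pos >= self.mem_size {
        return false; // out of memory, detected at the page switch
    }
    // ... encode the relative jump to dst_pos ...
    true
}
```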
|
|
|
|
We've long had a size restriction on the code memory region such that a
u32 could refer to everything. This commit capitalizes on that
restriction by shrinking `CodePtr` from 8 bytes to 4.
To derive a full raw pointer from a `CodePtr`, one needs a base pointer.
Both `CodeBlock` and `VirtualMemory` can be used for this purpose. The
base pointer is readily available everywhere, except for in the case of
the `jit_return` "branch". Generalize lea_label() to lea_jump_target()
in the IR to delay deriving the `jit_return` address until `compile()`,
when the base pointer is available.
On railsbench, this yields roughly a 1% reduction to `yjit_alloc_size`
(58,397,765 to 57,742,248).
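A minimal sketch of the idea (the real type resolves through a base-pointer provider; this version takes a raw base for brevity):
```
// CodePtr stores a 32-bit offset into the code region instead of a
// full 8-byte pointer, halving its size.
#[derive(Copy, Clone, PartialEq, Eq)]
pub struct CodePtr(u32);

impl CodePtr {
    pub fn raw_ptr(self, base: *const u8) -> *const u8 {
        // The region is capped below u32::MAX bytes, so base + offset
        // reaches every code address.
        unsafe { base.add(self.0 as usize) }
    }
}
```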
|
|
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
|
|
YJIT: Skip adding past_pages_bytes for past pages
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Make compiled_* stats available by default
* Update comment about default counters [ci skip]
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
The ARM backend allows for this, so let's make x64 consistent.
Notes:
Merged: https://github.com/ruby/ruby/pull/8263
Merged-By: XrXr
|
|
* YJIT: implement fast path for integer multiplication in opt_mult
* Update yjit/src/codegen.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Implement mul with overflow checking on arm64 (see the sketch after this list)
* Fix missing semicolon
* Add arm splitting for lshift, rshift, urshift
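A64 has no flags-setting multiply, so overflow is conventionally detected by comparing the high half of the product against the sign extension of the low half; an illustrative sequence, not lifted from the patch:
```
mul   x11, x0, x1        # low 64 bits of the signed product
smulh x12, x0, x1        # high 64 bits of the signed product
cmp   x12, x11, asr #63  # must equal the low half's sign bits
b.ne  overflow           # otherwise the multiplication overflowed
```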
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: implement codegen for rb_int_lshift
* Update yjit/src/asm/x86_64/mod.rs
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
---------
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
|
|
Typo fix for the last commit (1432b37)
|
|
This is used only for arm64's cb.jmp_ptr_bytes().
|
|
* YJIT: Reduce paddings if --yjit-exec-mem-size <= 128 on arm64
* YJIT: Define jmp_ptr_bytes on CodeBlock
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
The derived `&mut` from `other_cb()` overlapped with the parameter
`ocb`.
Use `cfg!()` instead of `#[cfg...]` to avoid unused warnings.
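For illustration, the difference between the two forms:
```
// #[cfg] removes the other branch before it is type-checked, leaving
// its bindings unused on one target; cfg!() is an ordinary boolean, so
// both branches stay compiled and the dead one is optimized away.
if cfg!(target_arch = "x86_64") {
    // x86_64-specific path
} else {
    // aarch64 (and other) path
}
```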
Notes:
Merged: https://github.com/ruby/ruby/pull/7611
|
|
Creating overlapping `&mut`s triggers Undefined Behavior. This function
previously had them through `cb` and `ocb` aliasing with `self` or live
references in the caller.
To fix the overlap, take `ocb` as a parameter and don't use `get_inline_cb()`
in the body of the function.
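A hedged before/after sketch with assumed signatures:
```
// Before: deriving a second &mut to state the caller already borrows
// creates overlapping exclusive references:
//
//     let ocb = CodegenGlobals::get_outlined_cb(); // aliases caller's &mut
//
// After: the caller threads the outlined block through as a parameter,
// so exactly one &mut to it is live at any time.
fn gen_exit(exit_pc: *mut VALUE, ocb: &mut OutlinedCb) -> CodePtr {
    // ... emit the exit into ocb ...
    todo!()
}
```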
Notes:
Merged: https://github.com/ruby/ruby/pull/7611
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
The code in there has been commented out for a long time. The issues
that the counter used to solve are now addressed more
comprehensively by "runningness" [tracking][1] introduced by Code GC
and [delayed deallocation][2].
A single counter doesn't fit our current model anyway, where code pages
that may or may not be touched are interleaved.
Just delete the code.
[1]: e7c71c6c9271b0c29f210769159090e17128e740
[2]: a0b0365e905e1ac51998ace7e6fc723406a2f157
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Follows up [Bug #19400]
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Previously on ARM64 Linux systems that use 64 KiB pages
(`CONFIG_ARM64_64K_PAGES=y`), YJIT was panicking on boot due to a failed
assertion.
The assertion was making sure that code GC can free the last code page
that YJIT manages without freeing unrelated memory. YJIT prefers picking
16 KiB as the granularity at which to free code memory, but when the
system can only free at 64 KiB granularity, that is not possible.
The fix is to use the system page size as the code page size when the
system page size is 64 KiB. Continue to use 16 KiB as the code page size
on common systems that use 16/4 KiB pages.
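A hedged sketch of the selection logic described above:
```
// Use 16 KiB code pages where the system can free at that granularity,
// and fall back to the system page size on 64 KiB kernels.
let sys_page_size = virt_mem.system_page_size();
let code_page_size = if 16 * 1024 % sys_page_size == 0 {
    16 * 1024      // systems with 4 KiB or 16 KiB pages
} else if sys_page_size == 64 * 1024 {
    sys_page_size  // CONFIG_ARM64_64K_PAGES=y
} else {
    panic!("unsupported system page size: {sys_page_size}");
};
```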
Add asserts to code_gc() and free_page() about code GC's assumptions.
Fixes [Bug #19400]
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
|
|
Since the other cb is in CodegenGlobals and we want Rust tests to be
self-contained.
Notes:
Merged: https://github.com/ruby/ruby/pull/7227
|
|
This allows for supplying a freed_pages vec in Rust tests. We need it so we
can test scenarios that occur after code GC.
Notes:
Merged: https://github.com/ruby/ruby/pull/7227
|
|
* Add stats so we can keep track of x86 rel32 vs register calls
To know if we get that "prime real estate" as Alan put it.
* Fix bug pointed out by Alan
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* Add job to check clippy lints in CI
* Address all remaining clippy lints
* Check lints on arm64 as well
* Apply latest clippy lints
* Do not exit 0 on clippy warnings
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
|
|
warning: unused variable: `start_addr`
   --> ../yjit/src/asm/mod.rs:359:39
    |
359 |     pub fn remove_comments(&mut self, start_addr: CodePtr, end_addr: CodePtr) {
    |                                       ^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_start_addr`
    |
    = note: `#[warn(unused_variables)]` on by default

warning: unused variable: `end_addr`
   --> ../yjit/src/asm/mod.rs:359:60
    |
359 |     pub fn remove_comments(&mut self, start_addr: CodePtr, end_addr: CodePtr) {
    |
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Certain code page sizes don't work and can cause crashes, so having this
value available as a command-line option is a bit dangerous. Remove it
and turn it into a constant instead.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* Fix 32 and 16 bit register store in YJIT
Co-Authored-By: Takashi Kokubun <takashikkbn@gmail.com>
* Remove an unnecessary diff
* Reuse an rm_num_bits result
* Use u16::MAX instead
* Update the link
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Just use sturh for 16 bits
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|