| Age | Commit message | Author |
|
Or else we put garbage into the flags.
|
|
Previously, cpop_all() did not in fact restore the register mapping
state since it was effectively doing a no-op
`self.ctx.set_reg_mapping(self.ctx.get_reg_mapping())`. This desync in
bookkeeping led to issues with the --yjit-dump-insns option because
print_str() used to use cpush_all() and cpop_all().
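A minimal sketch of the bookkeeping problem, using hypothetical stand-in types (`RegMapping`, `Ctx`, `Asm`, and the `saved` stack are assumptions for illustration, not YJIT's real definitions; whether the real `cpush_all()` clears the mapping is also an assumption). It only shows why re-assigning the current mapping restores nothing, while restoring the value captured at push time does.

```rust
/// Hypothetical stand-in for the register mapping state (a plain bitmask here).
#[derive(Clone, Copy, Debug, PartialEq, Default)]
pub struct RegMapping(pub u8);

/// Hypothetical stand-in for the codegen context.
#[derive(Default)]
pub struct Ctx {
    reg_mapping: RegMapping,
}

impl Ctx {
    pub fn get_reg_mapping(&self) -> RegMapping {
        self.reg_mapping
    }
    pub fn set_reg_mapping(&mut self, mapping: RegMapping) {
        self.reg_mapping = mapping;
    }
}

/// Hypothetical stand-in for the assembler.
#[derive(Default)]
pub struct Asm {
    pub ctx: Ctx,
    saved: Vec<RegMapping>,
}

impl Asm {
    /// Remember the current mapping, then (by assumption) treat every
    /// register as spilled while the pushed state is live.
    pub fn cpush_all(&mut self) {
        self.saved.push(self.ctx.get_reg_mapping());
        self.ctx.set_reg_mapping(RegMapping::default());
    }

    /// The shape of the bug: setting the mapping to itself is a no-op.
    pub fn cpop_all_buggy(&mut self) {
        self.saved.pop();
        let current = self.ctx.get_reg_mapping();
        self.ctx.set_reg_mapping(current);
    }

    /// Restore the mapping that was captured by the matching push.
    pub fn cpop_all(&mut self) {
        if let Some(mapping) = self.saved.pop() {
            self.ctx.set_reg_mapping(mapping);
        }
    }
}
```

With the buggy variant, the mapping observed after the pop is whatever the push left behind, which is the kind of desync described above.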
|
|
* ZJIT: Share more code with YJIT in jit.c
* Fix ZJIT references to JIT
|
|
With a well-timed OOM around a page switch, the backend can return
RetryOnNextPage twice and crash due to the assert. (More places can
signal OOM now since VirtualMem tracks Rust malloc heap size for
--yjit-mem-size.)
Return error in these cases instead of crashing.
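A rough sketch of the control flow, under the assumption that emission is retried exactly once after a page switch. `EmitError`, `compile_with_retry`, and the two closures are hypothetical names for illustration and are not the actual backend API.

```rust
/// Hypothetical error type; only `RetryOnNextPage` is named in the message above.
#[derive(Debug, PartialEq)]
pub enum EmitError {
    RetryOnNextPage,
    OutOfMemory,
}

/// Try to emit code; on `RetryOnNextPage`, switch pages and try once more.
/// A second `RetryOnNextPage` is reported as an error instead of being
/// treated as impossible with an assert.
pub fn compile_with_retry<E, N>(mut emit: E, mut next_page: N) -> Result<usize, EmitError>
where
    E: FnMut() -> Result<usize, EmitError>,
    N: FnMut() -> bool,
{
    match emit() {
        Err(EmitError::RetryOnNextPage) => {
            // The page switch itself can fail when memory is exhausted.
            if !next_page() {
                return Err(EmitError::OutOfMemory);
            }
            match emit() {
                Err(EmitError::RetryOnNextPage) => Err(EmitError::OutOfMemory),
                other => other,
            }
        }
        other => other,
    }
}
```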
Fixes: https://github.com/Shopify/ruby/issues/566
Notes:
Merged: https://github.com/ruby/ruby/pull/12668
|
|
* YJIT: Local variable register allocation
* locals are not stack temps
* Rename RegTemps to RegMappings
* Rename RegMapping to RegOpnd
* Rename local_size to num_locals
* s/stack value/operand/
* Rename spill_temps() to spill_regs()
* Clarify when num_locals becomes None
* Mention that InsnOut uses different registers
* Rename get_reg_mapping to get_reg_opnd
* Resurrect --yjit-temp-regs capability
* Use MAX_CTX_TEMPS and MAX_CTX_LOCALS
|
|
This change implements a fallback mode for the `--yjit-dump-disasm`
development command-line option to make it usable in release builds.
Previously, using the option with release builds of YJIT yielded only
a warning asking the user to build with `--enable-yjit=dev`.
While builds that use the `disasm` feature still give the best output,
just having the comments is useful enough for many kinds of debugging.
Having it usable in release builds is nice for new hackers, too, since
this allows for tinkering without having to learn how to build YJIT in
development mode.
Sample output on A64:
```
# regenerate_branch
# Insn: 0001 opt_send_without_block (stack_size: 1)
# guard known object with singleton class
0x11f7e0034: 4b 00 00 58 03 00 00 14 08 ce 9c 04 01 00 00
0x11f7e0043: 00 3f 00 0b eb 81 06 01 54 1f 20 03 d5
# RUBY_VM_CHECK_INTS(ec)
0x11f7e0050: 8b 02 42 b8 cb 07 01 35
# stack overflow check
0x11f7e0058: ab 62 02 91 7f 02 0b eb 69 07 01 54
# save PC to CFP
0x11f7e0064: 0b 3b 9a d2 2b 2f a0 f2 0b 00 cc f2 6b 02 00
0x11f7e0073: f8 ab 82 00 91
```
To ensure this feature doesn't incur too much cost when running without
the `--yjit-dump-disasm` option, I checked that there is no significant
impact to compile time and memory usage with the `compile_time_ns` and
`yjit_alloc_size` entries in `RubyVM::YJIT.runtime_stats`. For each
sample, I ran 3 iterations of the `lobsters` YJIT benchmark. The
statistics summary was done with the `summary` function in R.
Compile time, sample size of 60, lower is better:
```
Before After
Min. :2.054e+09 Min. :2.028e+09
1st Qu.:2.069e+09 1st Qu.:2.044e+09
Median :2.081e+09 Median :2.060e+09
Mean :2.089e+09 Mean :2.066e+09
3rd Qu.:2.109e+09 3rd Qu.:2.085e+09
Max. :2.146e+09 Max. :2.144e+09
```
Allocation size, sample size of 20, lower is better:
```
Before After
Min. :21804742 Min. :21794082
1st Qu.:21826682 1st Qu.:21816282
Median :21844042 Median :21826814
Mean :21960664 Mean :22026291
3rd Qu.:21861228 3rd Qu.:22040439
Max. :22587426 Max. :22930614
```
The `yjit_alloc_size` samples are noisy, but since the average increased
by only 0.3%, and the median is lower, I feel safe saying that there is
no significant change.
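The percentage quoted above can be re-derived from the table. This is only a small arithmetic check; the constants are the mean and median rows of the allocation size table, and the function name is made up for the example.

```rust
/// Mean `yjit_alloc_size` before and after, copied from the table above.
pub const MEAN_BEFORE: f64 = 21_960_664.0;
pub const MEAN_AFTER: f64 = 22_026_291.0;

/// Median `yjit_alloc_size` before and after, copied from the table above.
pub const MEDIAN_BEFORE: f64 = 21_844_042.0;
pub const MEDIAN_AFTER: f64 = 21_826_814.0;

/// Relative change from `before` to `after`, in percent.
/// Positive means the value grew.
pub fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

`percent_change(MEAN_BEFORE, MEAN_AFTER)` comes out at roughly 0.3, and the same call on the medians is negative, which is consistent with the reading given above.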
|
|
* YJIT: A64: Add CBZ and CBNZ encoding functions
* YJIT: A64: Use CBZ/CBNZ to check for zero
Instead of emitting `cmp x0, #0` plus `b.z #target`, A64 offers Compare
and Branch on Zero for us to just do `cbz x0, #target`. This commit
utilizes that and the related CBNZ instruction when appropriate.
We check for zero most commonly in interrupt checks:
```diff
# Insn: 0003 leave (stack_size: 1)
# RUBY_VM_CHECK_INTS(ec)
ldur w11, [x20, #0x20]
-tst w11, w11
-b.ne #0x109002164
+cbnz w11, #0x1049021d0
```
* fix copy paste error
Co-authored-by: Randy Stauner <randy@r4s6.net>
---------
Co-authored-by: Randy Stauner <randy@r4s6.net>
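For reference, a sketch of how CBZ and CBNZ could be encoded, following the A64 layout `sf | 011010 | op | imm19 | Rt`. The function name and signature are assumptions for illustration and are not the encoding functions this commit adds.

```rust
/// Encode CBZ (`nonzero == false`) or CBNZ (`nonzero == true`).
///
/// `offset_insns` is the branch offset counted in 4-byte instructions and
/// must fit the signed 19-bit immediate. Returns `None` when the register
/// number or the offset is not encodable.
pub fn encode_cb(is_64bit: bool, nonzero: bool, rt: u8, offset_insns: i32) -> Option<u32> {
    if rt > 31 {
        return None;
    }
    // imm19 is signed: -2^18 ..= 2^18 - 1
    if offset_insns < -(1 << 18) || offset_insns >= (1 << 18) {
        return None;
    }
    let sf = (is_64bit as u32) << 31; // 1 selects the X (64-bit) register view
    let fixed = 0b011010u32 << 25; // fixed opcode bits 30..25
    let op = (nonzero as u32) << 24; // 0 = CBZ, 1 = CBNZ
    let imm19 = ((offset_insns as u32) & 0x7ffff) << 5;
    Some(sf | fixed | op | imm19 | rt as u32)
}
```

For instance, `encode_cb(false, true, 11, 0x83e)` encodes a `cbnz w11` and yields `0x350107cb`, whose little-endian bytes are `cb 07 01 35`.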
|
|
Same idea as the x64 equivalent in c2622b52536c5, removing the register
shuffle coming from the pop two, push one stack motion these VM
instructions perform.
```
# Insn: 0004 opt_or (stack_size: 2)
- orr x11, x1, x9
- mov x1, x11
+ orr x1, x1, x9
```
|
|
* YJIT: A64: Use ADDS/SUBS/CMP (immediate) when possible
We were loading 1 into a register and then doing ADDS/SUBS previously.
That was particularly bad since those come up in fixnum operations.
```diff
# integer left shift with rhs=1
- mov x11, #1
- subs x11, x1, x11
+ subs x11, x1, #1
lsl x12, x11, #1
asr x13, x12, #1
cmp x13, x11
- b.ne #0x106ab60f8
- mov x11, #1
- adds x12, x12, x11
+ b.ne #0x10903a0f8
+ adds x12, x12, #1
mov x1, x12
```
Note that it's fine to cast between i64 and u64 since the bit pattern is
preserved, and the add/sub themselves don't care about the signedness of
the operands.
CMP is just another mnemonic for SUBS.
* YJIT: A64: Split asm.mul() with immediates properly
There is in fact no MUL on A64 that takes an immediate, so this
instruction was using the wrong split method. No current usages of this
form in YJIT.
---------
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
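A sketch of the two facts the first commit relies on. A64 ADD/SUB (immediate) takes an unsigned 12-bit value, optionally shifted left by 12, and two's complement addition gives the same bits regardless of signedness. The helper names are hypothetical, and whether YJIT uses the shifted form is not stated above.

```rust
/// If `value` fits the ADD/SUB (immediate) form, return `(imm12, shifted)`,
/// where `shifted` means the immediate is applied with LSL #12.
pub fn addsub_imm(value: u64) -> Option<(u16, bool)> {
    if value <= 0xfff {
        Some((value as u16, false))
    } else if value & 0xfff == 0 && (value >> 12) <= 0xfff {
        Some(((value >> 12) as u16, true))
    } else {
        // Not encodable: the value has to be loaded into a register first.
        None
    }
}

/// Add two signed values by going through u64. The resulting bit pattern
/// is identical to what a signed wrapping add would produce.
pub fn add_bits(a: i64, b: i64) -> u64 {
    (a as u64).wrapping_add(b as u64)
}
```

When `addsub_imm` returns `None`, the fallback is the older sequence of a `mov` into a scratch register followed by the register form of the instruction.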
|
|
I ran into this while trying to implement setbyte, and was surprised
to find out we hadn't implemented it yet.
|
|
|
|
|
|
Previously, PosMarker callbacks ran even when the assembler failed to
assemble its contents due to insufficient space. This was problematic
because when Assembler::compile() failed, the callbacks were given
positions that have no valid code, contrary to general expectation.
For example, we use a PosMarker callback to record VM instruction
boundaries and patch in jumps to exits in case the guest program starts
tracing. Previously, however, we could record a location near the end of
the code block, where there is no space to patch in jumps. I suspect
this is the cause of the recent occurrences of rare random failures on
GitHub Actions with the invariants.rs:529 "can rewrite existing code"
message. `--yjit-perf` also uses PosMarker and had a similar issue.
Buffer the list of callbacks to fire, and only fire them when all code
in the assembler is written out successfully. It's more intuitive this
way.
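The buffering idea can be sketched roughly like this. `Op`, `PosMarkerFn`, and this `compile` function are simplified, hypothetical stand-ins and not the real `Assembler::compile()`; the byte vector plays the role of the code block.

```rust
/// Hypothetical callback type: receives the position where the marker landed.
pub type PosMarkerFn = Box<dyn FnOnce(usize)>;

/// Hypothetical, heavily simplified instruction stream.
pub enum Op {
    Bytes(Vec<u8>),
    PosMarker(PosMarkerFn),
}

/// Write `ops` into `buf`, which may hold at most `capacity` bytes.
/// Callbacks are buffered and fired only if everything was written.
pub fn compile(ops: Vec<Op>, buf: &mut Vec<u8>, capacity: usize) -> bool {
    let start = buf.len();
    let mut pending: Vec<(usize, PosMarkerFn)> = Vec::new();
    for op in ops {
        match op {
            Op::Bytes(bytes) => {
                if buf.len() + bytes.len() > capacity {
                    // Out of space: discard partial output, fire nothing.
                    buf.truncate(start);
                    return false;
                }
                buf.extend_from_slice(&bytes);
            }
            // Record the position now, run the callback later.
            Op::PosMarker(callback) => pending.push((buf.len(), callback)),
        }
    }
    for (pos, callback) in pending {
        callback(pos);
    }
    true
}
```

On failure no callback ever observes a position, so nothing is recorded for code that was never written.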
|
|
We've long had a size restriction on the code memory region such that a
u32 could refer to everything. This commit capitalizes on this
restriction by shrinking the size of `CodePtr` to be 4 bytes from 8.
To derive a full raw pointer from a `CodePtr`, one needs a base pointer.
Both `CodeBlock` and `VirtualMemory` can be used for this purpose. The
base pointer is readily available everywhere, except for in the case of
the `jit_return` "branch". Generalize lea_label() to lea_jump_target()
in the IR to delay deriving the `jit_return` address until `compile()`,
when the base pointer is available.
On railsbench, this yields roughly a 1% reduction to `yjit_alloc_size`
(58,397,765 to 57,742,248).
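A minimal sketch of the representation, assuming the compact pointer is simply a byte offset from the start of the code region. The method names here are illustrative and may not match the real `CodePtr` API.

```rust
/// A 4-byte code pointer: an offset from the base of the code region.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CodePtr(u32);

impl CodePtr {
    /// Build a `CodePtr` from a raw pointer and the region's base pointer.
    /// Returns `None` if `ptr` is below `base` or too far away for a u32.
    pub fn from_raw(base: *const u8, ptr: *const u8) -> Option<CodePtr> {
        let offset = (ptr as usize).checked_sub(base as usize)?;
        if offset > u32::MAX as usize {
            None
        } else {
            Some(CodePtr(offset as u32))
        }
    }

    /// Recover the full raw pointer. The caller must supply the same base,
    /// which is why a base pointer has to be available at the use site.
    pub fn raw_ptr(self, base: *const u8) -> *const u8 {
        base.wrapping_add(self.0 as usize)
    }
}
```

`std::mem::size_of::<CodePtr>()` is 4 for this definition, compared with 8 for a raw pointer on a 64-bit target.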
|
|
|
|
So that we get a reminder to check CodeBlock::has_dropped_bytes().
Internally, asm.compile() already checks it, and this patch just
propagates it out to the caller with a `#[must_use]`.
The Code GC logic moved out one level in entry_stub_hit(), so the body
can freely use `?`.
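Roughly, the pattern looks like the following. `CodeBlock` here is a toy stand-in with a byte counter instead of real memory, and `compile` and `gen_two` are hypothetical names; only the use of `#[must_use]` together with `?` is the point.

```rust
/// Toy stand-in for a code block with a fixed capacity.
pub struct CodeBlock {
    write_pos: usize,
    capacity: usize,
    dropped_bytes: bool,
}

impl CodeBlock {
    pub fn new(capacity: usize) -> Self {
        CodeBlock { write_pos: 0, capacity, dropped_bytes: false }
    }

    /// Pretend to write `len` bytes; remember if they did not fit.
    pub fn write(&mut self, len: usize) {
        if self.write_pos + len > self.capacity {
            self.dropped_bytes = true;
        } else {
            self.write_pos += len;
        }
    }

    pub fn has_dropped_bytes(&self) -> bool {
        self.dropped_bytes
    }
}

/// Returns the start position on success. `#[must_use]` makes the compiler
/// warn when a caller ignores the result and so forgets the failure case.
#[must_use]
pub fn compile(cb: &mut CodeBlock, len: usize) -> Option<usize> {
    let start = cb.write_pos;
    cb.write(len);
    if cb.has_dropped_bytes() { None } else { Some(start) }
}

/// A body that bails out early with `?`; the caller one level up decides
/// what to do on `None` (for example, trigger code GC).
pub fn gen_two(cb: &mut CodeBlock) -> Option<(usize, usize)> {
    let first = compile(cb, 8)?;
    let second = compile(cb, 8)?;
    Some((first, second))
}
```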
|
|
* YJIT: Chain-guard opt_mult overflow
* YJIT: Support regenerating Jo after Mul
|
|
* YJIT: Avoid creating a vector in get_temp_regs()
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
* Remove unused import
---------
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
|
|
* YJIT: implement fast path for integer multiplication in opt_mult
* Update yjit/src/codegen.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Implement mul with overflow checking on arm64
* Fix missing semicolon
* Add arm splitting for lshift, rshift, urshift
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
YJIT: implement missing jg instruction in backend
While trying to implement a specialized integer left shift, I ran
into a problem where we have no way to do a greater-than comparison
at the moment. Surprising we went this far without ever needing it.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* Revert "Revert "YJIT: Break register cycles for C arguments (#7918)""
This reverts commit 78ca085785460de46bfc4851a898d525c1698ef8.
* Use shifted_live_ranges for the last-insn check
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
This reverts commit 888ba29e462075472776098f4f95eb6d3df8e730.
It caused a CI failure
http://ci.rvm.jp/results/trunk-yjit@ruby-sp2-docker/4598881
and I'm investigating it.
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* YJIT: Refactor arm64_split with &mut insn
* YJIT: Merge csel and mov on arm64
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* YJIT: Avoid splitting mov for small values on arm64
* Fix a comment
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* YJIT: Test the 0xffff boundary
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* YJIT: Replace Mov with LoadInto on arm64
* YJIT: Add a test for the new pass
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* YJIT: Remove Insn::RegTemps
* Update a comment
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Introduce Target::SideExit
* YJIT: Obviate Insn::SideExitContext
* YJIT: Avoid cloning a Context for each insn
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Let Assembler own Context
* Update a comment
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* YJIT: Reduce paddings if --yjit-exec-mem-size <= 128 on arm64
* YJIT: Define jmp_ptr_bytes on CodeBlock
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Stack temp register allocation for arm64
* Update a comment
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
* Update comments about assertion
* Update a comment
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
I kept getting unused warnings for this macro on A64 macOS.
Notes:
Merged: https://github.com/ruby/ruby/pull/7533
Merged-By: XrXr
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Previously, with Code GC, YJIT panicked while trying to emit a B.cond
instruction with an offset that is not encodable in 19 bits. This only
happens when the code in an assembler instance straddles two pages.
To fix this, when we detect that a jump to a label can land on a
different page, we switch to a fresh new page and regenerate all the
code in the assembler there. We still assume that no one assembler has
so much code that it wouldn't fit inside a fresh new page.
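For context, B.cond encodes its target as a signed 19-bit count of 4-byte instructions, so the reachable byte offsets are multiples of 4 in the range -2^20 to 2^20 - 4. The sketch below shows that range check and a simple page-straddling test; both helper names and the page size parameter are assumptions for illustration and are not YJIT functions.

```rust
/// Whether `byte_offset` can be encoded in B.cond's imm19 field.
pub fn bcond_offset_fits(byte_offset: i64) -> bool {
    // Offsets are in units of 4 bytes, and imm19 is signed.
    byte_offset % 4 == 0 && (-(1 << 20)..(1 << 20)).contains(&byte_offset)
}

/// Whether `len` bytes starting at `start` would span more than one page.
pub fn straddles_page(start: usize, len: usize, page_size: usize) -> bool {
    len > 0 && start / page_size != (start + len - 1) / page_size
}
```

When code straddles pages, a label can end up far from the branch that targets it, which is how an offset can fall outside this range.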
[Bug #19385]
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/7227
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7227
|
|
* Fix 32 and 16 bit register store in YJIT
Co-Authored-By: Takashi Kokubun <takashikkbn@gmail.com>
* Remove an unnecessary diff
* Reuse an rm_num_bits result
* Use u16::MAX instead
* Update the link
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Just use sturh for 16 bits
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
YJIT: Skip padding jumps to side exits
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
This commit changes the shape id comparisons to use a 32 bit comparison
rather than 64 bit. That means we don't need to load the shape id to a
register on x86 machines.
Given the following program:
```ruby
class Foo
def initialize
@foo = 1
@bar = 1
end
def read
[@foo, @bar]
end
end
foo = Foo.new
foo.read
foo.read
foo.read
foo.read
foo.read
puts RubyVM::YJIT.disasm(Foo.instance_method(:read))
```
The machine code we generated _before_ this change is like this:
```
== BLOCK 1/4, ISEQ RANGE [0,3), 65 bytes ======================
# getinstancevariable
0x559a18623023: mov rax, qword ptr [r13 + 0x18]
# guard object is heap
0x559a18623027: test al, 7
0x559a1862302a: jne 0x559a1862502d
0x559a18623030: cmp rax, 4
0x559a18623034: jbe 0x559a1862502d
# guard shape, embedded, and T_OBJECT
0x559a1862303a: mov rcx, qword ptr [rax]
0x559a1862303d: movabs r11, 0xffff00000000201f
0x559a18623047: and rcx, r11
0x559a1862304a: movabs r11, 0xb000000002001
0x559a18623054: cmp rcx, r11
0x559a18623057: jne 0x559a18625046
0x559a1862305d: mov rax, qword ptr [rax + 0x18]
0x559a18623061: mov qword ptr [rbx], rax
== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 47 bytes ======================
# gen_direct_jmp: fallthrough
# getinstancevariable
# regenerate_branch
# getinstancevariable
# regenerate_branch
0x559a18623064: mov rax, qword ptr [r13 + 0x18]
# guard shape, embedded, and T_OBJECT
0x559a18623068: mov rcx, qword ptr [rax]
0x559a1862306b: movabs r11, 0xffff00000000201f
0x559a18623075: and rcx, r11
0x559a18623078: movabs r11, 0xb000000002001
0x559a18623082: cmp rcx, r11
0x559a18623085: jne 0x559a18625099
0x559a1862308b: mov rax, qword ptr [rax + 0x20]
0x559a1862308f: mov qword ptr [rbx + 8], rax
```
After this change, it's like this:
```
== BLOCK 1/4, ISEQ RANGE [0,3), 41 bytes ======================
# getinstancevariable
0x5560c986d023: mov rax, qword ptr [r13 + 0x18]
# guard object is heap
0x5560c986d027: test al, 7
0x5560c986d02a: jne 0x5560c986f02d
0x5560c986d030: cmp rax, 4
0x5560c986d034: jbe 0x5560c986f02d
# guard shape
0x5560c986d03a: cmp word ptr [rax + 6], 0x19
0x5560c986d03f: jne 0x5560c986f046
0x5560c986d045: mov rax, qword ptr [rax + 0x10]
0x5560c986d049: mov qword ptr [rbx], rax
== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 23 bytes ======================
# gen_direct_jmp: fallthrough
# getinstancevariable
# regenerate_branch
# getinstancevariable
# regenerate_branch
0x5560c986d04c: mov rax, qword ptr [r13 + 0x18]
# guard shape
0x5560c986d050: cmp word ptr [rax + 6], 0x19
0x5560c986d055: jne 0x5560c986f099
0x5560c986d05b: mov rax, qword ptr [rax + 0x18]
0x5560c986d05f: mov qword ptr [rbx + 8], rax
```
The first ivar read is a bit more complex, but the second ivar read is
much simpler. I think eventually we could teach the context about the
shape, then emit only one shape guard.
Notes:
Merged: https://github.com/ruby/ruby/pull/6737
|
|
Co-Authored-By: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Support invokeblock
* Update yjit/src/backend/arm64/mod.rs
* Update yjit/src/codegen.rs
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Invalidate i-cache for the other cb on next_page
* YJIT: Invalidate only what's written by jmp_ptr
* YJIT: Move the code to the arm64 backend
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Previously, enabling only "disasm" didn't actually build. Since these
two features are closely related and we don't really use one without the
other, let's simplify and merge the two features together.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* fixes more clippy warnings
* Fix x86 c_callable to have doc_strings
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|