| Age | Commit message (Collapse) | Author |
|
|
|
|
|
Previously, PosMarker callbacks ran even when the assembler failed to
assemble its contents due to insufficient space. This was problematic
because when Assembler::compile() failed, the callbacks were given
positions that have no valid code, contrary to general expectation.
For example, we use a PosMarker callback to record VM instruction
boundaries and patch in jumps to exits in case the guest program starts
tracing, however, previously, we could record a location near the end of
the code block, where there is no space to patch in jumps. I suspect
this is the cause of the recent occurrences of rare random failures on
GitHub Actions with the invariants.rs:529 "can rewrite existing code"
message. `--yjit-perf` also uses PosMarker and had a similar issue.
Buffer the list of callbacks to fire, and only fire them when all code
in the assembler are written out successfully. It's more intuitive this
way.
|
|
We've long had a size restriction on the code memory region such that a
u32 could refer to everything. This commit capitalizes on this
restriction by shrinking the size of `CodePtr` to be 4 bytes from 8.
To derive a full raw pointer from a `CodePtr`, one needs a base pointer.
Both `CodeBlock` and `VirtualMemory` can be used for this purpose. The
base pointer is readily available everywhere, except for in the case of
the `jit_return` "branch". Generalize lea_label() to lea_jump_target()
in the IR to delay deriving the `jit_return` address until `compile()`,
when the base pointer is available.
On railsbench, this yields roughly a 1% reduction to `yjit_alloc_size`
(58,397,765 to 57,742,248).
|
|
|
|
So that we get a reminder to check CodeBlock::has_dropped_bytes().
Internally, asm.compile() already checks it, and this patch just
propagates it out to the caller with a `#[must_use]`.
Code GC logic moved out one level in entry_stub_hit(), so the body
can freely use `?`
|
|
* YJIT: Chain-guard opt_mult overflow
* YJIT: Support regenerating Jo after Mul
|
|
* YJIT: Avoid creating a vector in get_temp_regs()
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
* Remove unused import
---------
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
|
|
* YJIT: implement fast path for integer multiplication in opt_mult
* Update yjit/src/codegen.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Implement mul with overflow checking on arm64
* Fix missing semicolon
* Add arm splitting for lshift, rshift, urshift
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
YJIT: implement missing jg instruction in backend
While trying to implement a specialize integer left shift, I ran
into a problem where we have no way to do a greater-than comparison
at the moment. Surprising we went this far without ever needing it.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* Revert "Revert "YJIT: Break register cycles for C arguments (#7918)""
This reverts commit 78ca085785460de46bfc4851a898d525c1698ef8.
* Use shfited_live_ranges for the last-insn check
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
This reverts commit 888ba29e462075472776098f4f95eb6d3df8e730.
It caused a CI failure
http://ci.rvm.jp/results/trunk-yjit@ruby-sp2-docker/4598881
and I'm investigating it.
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* YJIT: Refactor arm64_split with &mut insn
* YJIT: Merge csel and mov on arm64
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* YJIT: Avoid splitting mov for small values on arm64
* Fix a comment
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* YJIT: Test the 0xffff boundary
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* YJIT: Replace Mov with LoadInto on arm64
* YJIT: Add a test for the new pass
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* YJIT: Remove Insn::RegTemps
* Update a comment
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Introduce Target::SideExit
* YJIT: Obviate Insn::SideExitContext
* YJIT: Avoid cloning a Context for each insn
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Let Assembler own Context
* Update a comment
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* YJIT: Reduce paddings if --yjit-exec-mem-size <= 128
on arm64
* YJIT: Define jmp_ptr_bytes on CodeBlock
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Stack temp register allocation for arm64
* Update a comment
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
* Update comments about assertion
* Update a comment
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
I kept getting unused warnings for this macro on A64 macOS.
Notes:
Merged: https://github.com/ruby/ruby/pull/7533
Merged-By: XrXr
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Previously, with Code GC, YJIT panicked while trying to emit a B.cond
instruction with an offset that is not encodable in 19 bits. This only
happens when the code in an assembler instance straddles two pages.
To fix this, when we detect that a jump to a label can land on a
different page, we switch to a fresh new page and regenerate all the
code in the assembler there. We still assume that no one assembler has
so much code that it wouldn't fit inside a fresh new page.
[Bug #19385]
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/7227
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7227
|
|
* Fix 32 and 16 bit register store in YJIT
Co-Authored-By: Takashi Kokubun <takashikkbn@gmail.com>
* Remove an unnecessary diff
* Reuse an rm_num_bits result
* Use u16::MAX instead
* Update the link
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Just use sturh for 16 bits
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
YJIT: Skip padding jumps to side exits
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
This commit changes the shape id comparisons to use a 32 bit comparison
rather than 64 bit. That means we don't need to load the shape id to a
register on x86 machines.
Given the following program:
```ruby
class Foo
def initialize
@foo = 1
@bar = 1
end
def read
[@foo, @bar]
end
end
foo = Foo.new
foo.read
foo.read
foo.read
foo.read
foo.read
puts RubyVM::YJIT.disasm(Foo.instance_method(:read))
```
The machine code we generated _before_ this change is like this:
```
== BLOCK 1/4, ISEQ RANGE [0,3), 65 bytes ======================
# getinstancevariable
0x559a18623023: mov rax, qword ptr [r13 + 0x18]
# guard object is heap
0x559a18623027: test al, 7
0x559a1862302a: jne 0x559a1862502d
0x559a18623030: cmp rax, 4
0x559a18623034: jbe 0x559a1862502d
# guard shape, embedded, and T_OBJECT
0x559a1862303a: mov rcx, qword ptr [rax]
0x559a1862303d: movabs r11, 0xffff00000000201f
0x559a18623047: and rcx, r11
0x559a1862304a: movabs r11, 0xb000000002001
0x559a18623054: cmp rcx, r11
0x559a18623057: jne 0x559a18625046
0x559a1862305d: mov rax, qword ptr [rax + 0x18]
0x559a18623061: mov qword ptr [rbx], rax
== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 47 bytes ======================
# gen_direct_jmp: fallthrough
# getinstancevariable
# regenerate_branch
# getinstancevariable
# regenerate_branch
0x559a18623064: mov rax, qword ptr [r13 + 0x18]
# guard shape, embedded, and T_OBJECT
0x559a18623068: mov rcx, qword ptr [rax]
0x559a1862306b: movabs r11, 0xffff00000000201f
0x559a18623075: and rcx, r11
0x559a18623078: movabs r11, 0xb000000002001
0x559a18623082: cmp rcx, r11
0x559a18623085: jne 0x559a18625099
0x559a1862308b: mov rax, qword ptr [rax + 0x20]
0x559a1862308f: mov qword ptr [rbx + 8], rax
```
After this change, it's like this:
```
== BLOCK 1/4, ISEQ RANGE [0,3), 41 bytes ======================
# getinstancevariable
0x5560c986d023: mov rax, qword ptr [r13 + 0x18]
# guard object is heap
0x5560c986d027: test al, 7
0x5560c986d02a: jne 0x5560c986f02d
0x5560c986d030: cmp rax, 4
0x5560c986d034: jbe 0x5560c986f02d
# guard shape
0x5560c986d03a: cmp word ptr [rax + 6], 0x19
0x5560c986d03f: jne 0x5560c986f046
0x5560c986d045: mov rax, qword ptr [rax + 0x10]
0x5560c986d049: mov qword ptr [rbx], rax
== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 23 bytes ======================
# gen_direct_jmp: fallthrough
# getinstancevariable
# regenerate_branch
# getinstancevariable
# regenerate_branch
0x5560c986d04c: mov rax, qword ptr [r13 + 0x18]
# guard shape
0x5560c986d050: cmp word ptr [rax + 6], 0x19
0x5560c986d055: jne 0x5560c986f099
0x5560c986d05b: mov rax, qword ptr [rax + 0x18]
0x5560c986d05f: mov qword ptr [rbx + 8], rax
```
The first ivar read is a bit more complex, but the second ivar read is
much simpler. I think eventually we could teach the context about the
shape, then emit only one shape guard.
Notes:
Merged: https://github.com/ruby/ruby/pull/6737
|
|
Co-Authored-By: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Support invokeblock
* Update yjit/src/backend/arm64/mod.rs
* Update yjit/src/codegen.rs
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Invalidate i-cache for the other cb on next_page
* YJIT: Invalidate only what's written by jmp_ptr
* YJIT: Move the code to the arm64 backend
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Previously, enabling only "disasm" didn't actually build. Since these
two features are closely related and we don't really use one without the
other, let's simplify and merge the two features together.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* fixes more clippy warnings
* Fix x86 c_callable to have doc_strings
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
For logical instructions such as AND, there is a constraint that the N
part of the bitmask immediate must be 0. We weren't respecting this
condition previously and were silently emitting undefined instructions.
Check for this condition in the assembler and tweak the backend to
correctly detect whether a number could be encoded as an immediate in a
32 bit logical instruction. Due to the nature of the immediate encoding,
the same numeric value encodes differently depending on the size of
the register the instruction works on.
We currently don't have cases where we use 32 bit immediates but we ran
into this encoding issue during development.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* Change IncrCounter lowering on AArch64
Previously we were using LDADDAL which is not available on
Graviton 1 chips. Instead, we're going to use an exclusive
load/store group through the LDAXR/STLXR instructions.
* Update yjit/src/backend/arm64/mod.rs
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* YJIT: Add Opnd#sub_opnd to use only 8 bits
* Add with_num_bits and let arm64_split use it
* Add another assertion to with_num_bits
* Use only with_num_bits
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* Introduce InstructionOffset for AArch64
There are a lot of instructions on AArch64 where we take an offset
from PC in terms of the number of instructions. This is for loading
a value relative to the PC or for jumping.
We were usually accepting an A64Opnd or an i32. It can get
confusing and inconsistent though because sometimes you would
divide by 4 to get the number of instructions or multiply by 4 to
get the number of bytes.
This commit adds a struct that wraps an i32 in order to keep all of
that logic in one place. It makes it much easier to read and reason
about how these offsets are getting used.
* Use b instruction when the offset fits on AArch64
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
This commit does a bunch of stuff to try to eliminate as many
unnecessary mov instructions as possible.
First, it introduces the Insn::LoadInto instruction. Previously
when we needed a value to go into a specific register (like in
Insn::CCall when we're putting values into the argument registers
or in Insn::CRet when we're putting a value into the return
register) we would first load the value and then mov it into the
correct register. This resulted in a lot of duplicated work with
short live ranges since they basically immediately we unnecessary.
The new instruction accepts a destination and does not interact
with the register allocator at all, making it much more efficient.
We then use the new instruction when we're loading values into
argument registers for AArch64 or X86_64, and when we're returning
a value from AArch64. Notably we don't do it when we're returning
a value from X86_64 because everything can be accomplished with a
single mov anyway.
A couple of unnecessary movs were also present because when we
called the split_load_opnd function in a lot of split passes we
were loading all registers and instruction outputs. We no longer do
that.
This commit also makes it so that UImm(0) passes through the
Insn::Store split without attempting to be loaded, which allows it
can take advantage of the zero register. So now instead of mov-ing
0 into a register and then calling store, it just stores XZR.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
[Bug #18985]
* Callee-save x22 for aarch64
* Just use a caller-saved register
* Update yjit/src/backend/arm64/mod.rs
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* Better b.cond usage on AArch64
When we're lowering a conditional jump, we previously had a bit of
a complicated setup where we could emit a conditional jump to skip
over a jump that was the next instruction, and then write out the
destination and use a branch register.
Now instead we use the b.cond instruction if our offset fits (not
common, but not unused either) and if it doesn't we write out an
inverse condition to jump past loading the destination and
branching directly.
* Added an inverse fn for Condition (#443)
Prevents the need to pass two params and potentially reduces errors.
Co-authored-by: Jimmy Miller <jimmyhmiller@jimmys-mbp.lan>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Co-authored-by: Jimmy Miller <jimmyhmiller@jimmys-mbp.lan>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Co-authored-by: Kevin Newton <kddnewton@gmail.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/6304
|