| Age | Commit message | Author |
|
Use a special breakpoint address if one isn't explicitly supplied in order to support natural line stepping. (#11083)
ARM64 does not increment the program counter (PC) upon hitting a breakpoint instruction, so stepping through code with a debugger loops back to the same breakpoint. LLDB treats the special breakpoint immediate 0xf000 (`brk #0xf000`) as one it should step past by incrementing the PC, which lets the debugger work as expected. This change makes it possible to debug YJIT-generated code on ARM64.
More details at: https://discourse.llvm.org/t/stepping-over-a-brk-instruction-on-arm64/69766/8
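For reference, a minimal sketch of the encoding involved, assuming the standard A64 layout where BRK's base opcode is 0xD4200000 and its 16-bit immediate ("comment field") occupies bits [20:5] (the real encoder lives in yjit/src/asm/arm64/):
```rust
fn brk(imm16: u16) -> u32 {
    0xD420_0000 | ((imm16 as u32) << 5)
}

fn main() {
    assert_eq!(brk(0x0000), 0xD420_0000); // plain breakpoint
    assert_eq!(brk(0xf000), 0xD43E_0000); // the LLDB-friendly form above
}
```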
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
Mostly putting angle brackets around links to follow markdown syntax.
|
|
* YJIT: A64: Add CBZ and CBNZ encoding functions
* YJIT: A64: Use CBZ/CBNZ to check for zero
Instead of emitting `cmp x0, #0` plus `b.eq #target`, A64 offers Compare
and Branch on Zero, which lets us do just `cbz x0, #target`. This commit
uses that and the related CBNZ instruction where appropriate.
We check for zero most commonly in interrupt checks:
```diff
# Insn: 0003 leave (stack_size: 1)
# RUBY_VM_CHECK_INTS(ec)
ldur w11, [x20, #0x20]
-tst w11, w11
-b.ne #0x109002164
+cbnz w11, #0x1049021d0
```
* fix copy paste error
Co-authored-by: Randy Stauner <randy@r4s6.net>
---------
Co-authored-by: Randy Stauner <randy@r4s6.net>
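For reference, a sketch of the two encodings in their 64-bit form, assuming the offset is already expressed in instructions (bytes / 4), as A64 branch offsets are:
```rust
// CBZ/CBNZ (64-bit): imm19 in bits [23:5], Rt in bits [4:0];
// the two differ only in the op bit (bit 24).
fn cbz(rt: u8, offset_insns: i32) -> u32 {
    0xB400_0000 | (((offset_insns as u32) & 0x7FFFF) << 5) | rt as u32
}

fn cbnz(rt: u8, offset_insns: i32) -> u32 {
    0xB500_0000 | (((offset_insns as u32) & 0x7FFFF) << 5) | rt as u32
}
```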
|
|
* YJIT: A64: Use ADDS/SUBS/CMP (immediate) when possible
Previously we were loading 1 into a register and then doing a
register-register ADDS/SUBS. That was particularly bad since those come
up in fixnum operations.
```diff
# integer left shift with rhs=1
- mov x11, #1
- subs x11, x1, x11
+ subs x11, x1, #1
lsl x12, x11, #1
asr x13, x12, #1
cmp x13, x11
- b.ne #0x106ab60f8
- mov x11, #1
- adds x12, x12, x11
+ b.ne #0x10903a0f8
+ adds x12, x12, #1
mov x1, x12
```
Note that it's fine to cast between i64 and u64 since the bit pattern is
preserved, and the add/sub themselves don't care about the signedness of
the operands.
CMP is just another mnemonic for SUBS.
* YJIT: A64: Split asm.mul() with immediates properly
There is in fact no MUL on A64 that takes an immediate, so this
instruction was using the wrong split method. No current usages of this
form in YJIT.
---------
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
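A sketch of the immediate form this change starts using; as noted above, CMP is the same encoding with Rd = 31 (the zero register):
```rust
// SUBS Xd, Xn, #imm12 (64-bit, unshifted): imm12 in bits [21:10],
// Rn in [9:5], Rd in [4:0]. ADDS (immediate) uses base 0xB100_0000.
fn subs_imm(rd: u8, rn: u8, imm12: u16) -> u32 {
    assert!(imm12 < (1 << 12), "larger values get split into a register load");
    0xF100_0000 | ((imm12 as u32) << 10) | ((rn as u32) << 5) | rd as u32
}

// `subs x11, x1, #1` from the diff above is subs_imm(11, 1, 1).
```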
|
|
* YJIT: implement fast path for integer multiplication in opt_mult
* Update yjit/src/codegen.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Implement mul with overflow checking on arm64
* Fix missing semicolon
* Add arm splitting for lshift, rshift, urshift
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
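MUL on A64 doesn't set an overflow flag, so a common approach is to pair it with SMULH: MUL yields the low 64 bits of the product and SMULH the high 64, and the product fits in 64 bits exactly when the high half equals the sign-extension of the low half. A plain-Rust sketch of that condition:
```rust
fn mul_overflows(a: i64, b: i64) -> bool {
    let wide = (a as i128) * (b as i128);
    let lo = wide as i64;          // what MUL would produce
    let hi = (wide >> 64) as i64;  // what SMULH would produce
    hi != (lo >> 63)               // overflow iff they disagree
}

fn main() {
    assert!(!mul_overflows(3, 4));
    assert!(mul_overflows(i64::MAX, 2));
}
```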
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* Fix 32 and 16 bit register store in YJIT
Co-Authored-By: Takashi Kokubun <takashikkbn@gmail.com>
* Remove an unnecessary diff
* Reuse an rm_num_bits result
* Use u16::MAX instead
* Update the link
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Just use sturh for 16 bits
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
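A sketch of the STURH encoding the last item settles on, assuming the standard unscaled-offset layout:
```rust
// STURH Wt, [Xn, #simm9]: imm9 in bits [20:12], Rn in [9:5], Rt in [4:0].
fn sturh(rt: u8, rn: u8, imm9: i16) -> u32 {
    assert!((-256..256).contains(&(imm9 as i32)));
    0x7800_0000 | (((imm9 as u32) & 0x1FF) << 12) | ((rn as u32) << 5) | rt as u32
}
```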
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
This commit changes the shape id comparisons to use a 32-bit comparison
rather than a 64-bit one. That means we don't need to load the shape id
into a register on x86 machines.
Given the following program:
```ruby
class Foo
def initialize
@foo = 1
@bar = 1
end
def read
[@foo, @bar]
end
end
foo = Foo.new
foo.read
foo.read
foo.read
foo.read
foo.read
puts RubyVM::YJIT.disasm(Foo.instance_method(:read))
```
The machine code we generated _before_ this change is like this:
```
== BLOCK 1/4, ISEQ RANGE [0,3), 65 bytes ======================
# getinstancevariable
0x559a18623023: mov rax, qword ptr [r13 + 0x18]
# guard object is heap
0x559a18623027: test al, 7
0x559a1862302a: jne 0x559a1862502d
0x559a18623030: cmp rax, 4
0x559a18623034: jbe 0x559a1862502d
# guard shape, embedded, and T_OBJECT
0x559a1862303a: mov rcx, qword ptr [rax]
0x559a1862303d: movabs r11, 0xffff00000000201f
0x559a18623047: and rcx, r11
0x559a1862304a: movabs r11, 0xb000000002001
0x559a18623054: cmp rcx, r11
0x559a18623057: jne 0x559a18625046
0x559a1862305d: mov rax, qword ptr [rax + 0x18]
0x559a18623061: mov qword ptr [rbx], rax
== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 47 bytes ======================
# gen_direct_jmp: fallthrough
# getinstancevariable
# regenerate_branch
# getinstancevariable
# regenerate_branch
0x559a18623064: mov rax, qword ptr [r13 + 0x18]
# guard shape, embedded, and T_OBJECT
0x559a18623068: mov rcx, qword ptr [rax]
0x559a1862306b: movabs r11, 0xffff00000000201f
0x559a18623075: and rcx, r11
0x559a18623078: movabs r11, 0xb000000002001
0x559a18623082: cmp rcx, r11
0x559a18623085: jne 0x559a18625099
0x559a1862308b: mov rax, qword ptr [rax + 0x20]
0x559a1862308f: mov qword ptr [rbx + 8], rax
```
After this change, it's like this:
```
== BLOCK 1/4, ISEQ RANGE [0,3), 41 bytes ======================
# getinstancevariable
0x5560c986d023: mov rax, qword ptr [r13 + 0x18]
# guard object is heap
0x5560c986d027: test al, 7
0x5560c986d02a: jne 0x5560c986f02d
0x5560c986d030: cmp rax, 4
0x5560c986d034: jbe 0x5560c986f02d
# guard shape
0x5560c986d03a: cmp word ptr [rax + 6], 0x19
0x5560c986d03f: jne 0x5560c986f046
0x5560c986d045: mov rax, qword ptr [rax + 0x10]
0x5560c986d049: mov qword ptr [rbx], rax
== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 23 bytes ======================
# gen_direct_jmp: fallthrough
# getinstancevariable
# regenerate_branch
# getinstancevariable
# regenerate_branch
0x5560c986d04c: mov rax, qword ptr [r13 + 0x18]
# guard shape
0x5560c986d050: cmp word ptr [rax + 6], 0x19
0x5560c986d055: jne 0x5560c986f099
0x5560c986d05b: mov rax, qword ptr [rax + 0x18]
0x5560c986d05f: mov qword ptr [rbx + 8], rax
```
The first ivar read is a bit more complex, but the second ivar read is
much simpler. I think eventually we could teach the context about the
shape, then emit only one shape guard.
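A plain-Rust sketch of what the narrower guard checks, under the flags layout visible in this build's disassembly (a 16-bit shape id in the top bits of the RBasic flags word, hence `cmp word ptr [rax + 6]`):
```rust
fn shape_guard(flags_word: u64, expected_shape_id: u16) -> bool {
    // Byte offset 6 into the 8-byte flags word == the top 16 bits,
    // so no 64-bit load-and-mask is needed.
    let shape_id = (flags_word >> 48) as u16;
    shape_id == expected_shape_id
}
```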
Notes:
Merged: https://github.com/ruby/ruby/pull/6737
|
|
When RUBY_DEBUG is enabled, shape ids are 16 bits. I would like to do
16 bit comparisons, so I need to load halfwords sometimes. This commit
adds LDURH so that I can load halfwords.
https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/LDURH--Load-Register-Halfword--unscaled--?lang=en
I verified the bytes using clang:
```
$ cat asmthing.s
.global _start
.align 2
_start:
ldurh w10, [x1]
ldurh w10, [x1, #123]
$ as asmthing.s -o asmthing.o && objdump --disassemble asmthing.o
asmthing.o: file format mach-o arm64
Disassembly of section __TEXT,__text:
0000000000000000 <ltmp0>:
0: 2a 00 40 78 ldurh w10, [x1]
4: 2a b0 47 78 ldurh w10, [x1, #123]
```
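A sketch of the encoding those bytes decode to: LDURH Wt, [Xn, #simm9], with imm9 in bits [20:12], Rn in bits [9:5], and Rt in bits [4:0].
```rust
fn ldurh(rt: u8, rn: u8, imm9: i16) -> u32 {
    assert!((-256..256).contains(&(imm9 as i32)));
    0x7840_0000 | (((imm9 as u32) & 0x1FF) << 12) | ((rn as u32) << 5) | rt as u32
}

fn main() {
    assert_eq!(ldurh(10, 1, 0), 0x7840_002A);   // 2a 00 40 78 above
    assert_eq!(ldurh(10, 1, 123), 0x7847_B02A); // 2a b0 47 78 above
}
```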
Notes:
Merged: https://github.com/ruby/ruby/pull/6729
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
For 32-bit logical instructions such as AND, there is a constraint that
the N bit of the bitmask immediate must be 0. We weren't respecting this
condition previously and were silently emitting undefined instructions.
Check for this condition in the assembler and tweak the backend to
correctly detect whether a number can be encoded as an immediate in a
32-bit logical instruction. Due to the nature of the immediate encoding,
the same numeric value encodes differently depending on the size of
the register the instruction operates on.
We currently don't have cases where we use 32-bit immediates, but we ran
into this encoding issue during development.
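A minimal sketch of the added validation, assuming a decoded bitmask immediate in the (N, immr, imms) triple the A64 encoding uses. N = 1 selects a 64-bit element pattern, which no 32-bit logical instruction can represent; for example, 0x00FF00FF repeats within 32 bits and encodes with N = 0, while the 64-bit value 0x00000000_00FF00FF is not a repeating pattern at all and has no encoding.
```rust
struct BitmaskImmediate {
    n: bool,  // element size selector: true means a full 64-bit pattern
    immr: u8, // rotation amount
    imms: u8, // element size / run length
}

impl BitmaskImmediate {
    /// Usable by the 32-bit variants of AND/ORR/EOR/TST only when N == 0.
    fn fits_32_bit(&self) -> bool {
        !self.n
    }
}
```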
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* Change IncrCounter lowering on AArch64
Previously we were using LDADDAL, which is not available on
Graviton 1 chips. Instead, we now emit an exclusive load/store
retry loop using the LDAXR/STLXR instructions.
* Update yjit/src/backend/arm64/mod.rs
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
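A plain-Rust model of what the emitted loop does, with `compare_exchange_weak` standing in for the LDAXR/STLXR pair (both can fail spuriously and must be retried):
```rust
use std::sync::atomic::{AtomicU64, Ordering};

fn incr_counter(counter: &AtomicU64, amount: u64) {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(
            current,
            current.wrapping_add(amount),
            Ordering::AcqRel, // acquire/release, like LDAXR/STLXR
            Ordering::Relaxed,
        ) {
            Ok(_) => break,                      // exclusive store succeeded
            Err(observed) => current = observed, // CBNZ back to the load
        }
    }
}
```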
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Add Opnd#sub_opnd to use only 8 bits
* Add with_num_bits and let arm64_split use it
* Add another assertion to with_num_bits
* Use only with_num_bits
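A sketch of the idea behind `with_num_bits`, using simplified stand-ins for the backend's operand types: keep the same location, change only the width, and reject widths the backend can't express.
```rust
#[derive(Copy, Clone)]
enum Opnd {
    Reg { reg_no: u8, num_bits: u8 },
    Mem { base_reg_no: u8, disp: i32, num_bits: u8 },
}

impl Opnd {
    // Option because the real enum has further variants for which
    // narrowing makes no sense and would return None.
    fn with_num_bits(self, num_bits: u8) -> Option<Opnd> {
        assert!(matches!(num_bits, 8 | 16 | 32 | 64));
        match self {
            Opnd::Reg { reg_no, .. } => Some(Opnd::Reg { reg_no, num_bits }),
            Opnd::Mem { base_reg_no, disp, .. } => {
                Some(Opnd::Mem { base_reg_no, disp, num_bits })
            }
        }
    }
}
```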
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* Introduce InstructionOffset for AArch64
There are a lot of instructions on AArch64 where we take an offset
from PC in terms of the number of instructions. This is for loading
a value relative to the PC or for jumping.
We were usually accepting either an A64Opnd or an i32, which got
confusing and inconsistent because sometimes you would divide by 4
to get the number of instructions or multiply by 4 to get the number
of bytes.
This commit adds a struct that wraps an i32 in order to keep all of
that logic in one place (see the sketch below). It makes it much
easier to read and reason about how these offsets are used.
* Use b instruction when the offset fits on AArch64
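A sketch of the wrapper described above, assuming it stores the distance in instructions and makes every byte conversion explicit so the multiply/divide-by-4 logic lives in one place:
```rust
/// A signed offset measured in 4-byte A64 instructions.
#[derive(Copy, Clone)]
pub struct InstructionOffset(i32);

impl InstructionOffset {
    pub fn from_insns(insns: i32) -> Self {
        InstructionOffset(insns)
    }

    pub fn from_bytes(bytes: i32) -> Self {
        assert_eq!(bytes % 4, 0, "A64 instructions are 4 bytes wide");
        InstructionOffset(bytes / 4)
    }
}
```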
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* Better b.cond usage on AArch64
When lowering a conditional jump, we previously had a somewhat
complicated setup: emit a conditional jump that skips over the next
instruction, then write out the destination and branch to it through
a register.
Now we use the b.cond instruction directly when the offset fits (not
common, but not unused either); when it doesn't, we emit the inverse
condition to jump past the code that loads the destination and
branches through a register.
* Added an inverse fn for Condition (#443)
Prevents the need to pass two params and potentially reduces errors.
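A sketch of how such an inverse can be computed: in the A64 condition encoding, flipping the low bit pairs every condition with its inverse (EQ 0b0000 with NE 0b0001, GE 0b1010 with LT 0b1011, and so on), with AL/NV excluded.
```rust
fn inverse(cond: u8) -> u8 {
    assert!(cond < 0b1110, "AL and NV have no meaningful inverse");
    cond ^ 1
}
```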
Co-authored-by: Jimmy Miller <jimmyhmiller@jimmys-mbp.lan>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Co-authored-by: Jimmy Miller <jimmyhmiller@jimmys-mbp.lan>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
There are a lot of times when encoding AArch64 instructions that we
need to represent an integer value with a custom fixed width. For
example, the offset for a B instruction is 26 bits, so we store an
i32 on the instruction struct and then mask it when we encode.
We've been doing this masking everywhere, which has worked, but
it's getting a bit copy-pasty all over the place. This commit
centralizes that logic to make sure we stay consistent.
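A sketch of the centralized helper, assuming its job is simply to mask a (possibly negative) immediate down to WIDTH bits before it gets OR-ed into the instruction word:
```rust
fn truncate_imm<const WIDTH: u32>(imm: i32) -> u32 {
    let mask = (1u32 << WIDTH) - 1;
    (imm as u32) & mask
}

fn main() {
    // A backward branch of one instruction in B's 26-bit offset field:
    assert_eq!(truncate_imm::<26>(-1), 0x03FF_FFFF);
}
```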
Notes:
Merged: https://github.com/ruby/ruby/pull/6289
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/6289
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/6289
|
|
* When we're storing an immediate 0 value at a memory address, we
can use STUR XZR, [Xd] instead of loading 0 into a register and
then storing that register.
* When we're moving 0 into an argument register, we can use
MOV Xd, XZR instead of loading the value into a register first.
* In the newarray instruction, we can skip looking at the stack at
all if the number of values we're using is 0.
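A sketch of the first rewrite as a raw encoding (XZR is register number 31): STUR XZR, [Xn] stores 64 zero bits without needing a scratch register.
```rust
fn stur_zero(rn: u8) -> u32 {
    let (imm9, xzr) = (0u32, 31u32);
    // STUR Xt, [Xn, #simm9]: base 0xF800_0000, imm9 in bits [20:12].
    0xF800_0000 | (imm9 << 12) | ((rn as u32) << 5) | xzr
}
```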
Notes:
Merged: https://github.com/ruby/ruby/pull/6289
|
|
(https://github.com/Shopify/ruby/pull/382)
* LDR instruction for AArch64
* Split loads in arm64_split when memory address displacements do not fit
|
|
* Left and right shift for IR
* Update yjit/src/backend/x86_64/mod.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
|
|
(https://github.com/Shopify/ruby/pull/344)
* A64: Fix off by one in offset encoding for BL
It's relative to the address of the instruction, not the end of it.
* A64: Fix off by one when encoding B
It's relative to the start of the instruction, not the end.
* A64: Add some tests for boundary offsets
|
|
* Fix conditional jumps to label
* Bitmask immediates cannot be u64::MAX
|
|
* Better splitting for Op::Add, Op::Sub, and Op::Cmp
* Split stores if the displacement is too large
* Use a shifted immediate argument
* Split all places where shifted immediates are used
* Add more tests to the cirrus workflow
|
|
* Fix bitmask encoding to u32
* Fix splitting for Op::And to account for bitmask immediate
|
|
(https://github.com/Shopify/ruby/pull/335)
|
|
(https://github.com/Shopify/ruby/pull/329)
* Move to/from SP on AArch64
* Consolidate loads and stores
* Implement LDR post-index and LDR pre-index for AArch64
* Implement STR post-index and STR pre-index for AArch64
* Module entrypoints for LDR pre/post -index and STR pre/post -index
* Use STR (pre-index) and LDR (post-index) to implement push/pop
* Go back to using MOV for to/from SP
|
|
* CSEL on AArch64
* Implement various Op::CSel* instructions
|
|
* Port print_int to the new backend
* Tests for print_int and print_str
|
|
* ADR and ADRP for AArch64
* Implement Op::Jbe on X86
* Lera instruction
* Op::BakeString
* LeaPC -> LeaLabel
* Port print_str to the new backend
* Port print_value to the new backend
* Port print_ptr to the new backend
* Write null-terminators in Op::BakeString
* Fix up rebase issues on print-str port
* Add back in panic for X86 backend for unsupported instructions being lowered
* Fix target architecture
|
|
Instructions for pushing all caller-save registers and the flags so that
we can implement dump_insns.
|
|
|
Previously we were using a `Box<dyn FnOnce>` to support patching the
code when jumping to labels. We needed to do this because some of the
closures that were being used to patch needed to capture local variables
(on both X86 and ARM it was the type of condition for the conditional
jumps).
To get around that, we can instead use const generics, since the
condition codes are always known at compile time. The closures then go
from polymorphic to monomorphic: they no longer capture anything, so
they can be stored as plain `fn` pointers instead of `Box<dyn FnOnce>`.
This simplifies the storage of the `LabelRef` structs and should
hopefully be a better default going forward.
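A sketch of the resulting storage, with assumed field names (`CodeBlock` stands in for the assembler's output buffer): the encoder is now a plain function pointer taking the source and destination addresses, so no heap allocation or dynamic dispatch remains.
```rust
struct CodeBlock; // stand-in for the real output buffer type

struct LabelRef {
    pos: usize,                           // where the instruction gets patched
    num_bytes: usize,                     // bytes reserved for it
    encode: fn(&mut CodeBlock, i64, i64), // writes the final jump (src, dst)
}
```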
|
|
* More Arm64 lowering/backend work
* We now have encoding support for the LDR instruction for loading a PC-relative memory location
* You can now call add/adds/sub/subs with signed immediates, which switches appropriately based on sign
* We can now load immediates into registers appropriately, attempting to use the minimal number of instructions:
* If it fits into 16 bits, we use just a single movz.
* Else if it can be encoded as a bitmask immediate, we use a single mov.
* Otherwise we use a movz, a movk, and then optionally another one or two movks.
* Fixed a bunch of code to do with the Op::Load opcode.
* We now handle GC-offsets properly for Op::Load by skipping around them with a jump instruction. (This will be made better by constant pools in the future.)
* Op::Lea is doing what it's supposed to do now.
* Fixed a bug in the backend tests to do with not using the result of an Op::Add.
* Fix the remaining tests for Arm64
* Move split loads logic into each backend
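A back-of-the-envelope sketch of the instruction count for the last case, ignoring the single-movz and bitmask-immediate fast paths listed above: one movz for the first non-zero 16-bit chunk, then one movk per remaining non-zero chunk.
```rust
fn movz_movk_count(value: u64) -> usize {
    let nonzero_chunks = (0..4)
        .filter(|i| (value >> (16 * i)) & 0xffff != 0)
        .count();
    nonzero_chunks.max(1) // zero itself still takes a single movz
}

fn main() {
    assert_eq!(movz_movk_count(0x1234), 1);                // movz only
    assert_eq!(movz_movk_count(0x0012_0000_0000_3400), 2); // movz + one movk
    assert_eq!(movz_movk_count(u64::MAX >> 1), 4);         // movz + three movks
}
```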
|
|
* Get initial wiring up
* Split IncrCounter instruction
* Breakpoints in Arm64
* Support for ORR
* MOV instruction encodings
* Implement JmpOpnd and CRet
* Add ORN
* Add MVN
* PUSH, POP, CCALL for Arm64
* Some formatting and implement Op::Not for Arm64
* Consistent constants when working with the Arm64 SP
* Allow OR-ing values into the memory buffer
* Test lowering Arm64 ADD
* Emit unconditional jumps consistently in Arm64
* Begin emitting conditional jumps for A64
* Back out some labelref changes
* Remove label API that no longer exists
* Use a trait for the label encoders
* Encode nop
* Add in nops so jumps are the same width no matter what on Arm64
* Op::Jbe for CodePtr
* Pass src_addr and dst_addr instead of calculated offset to label refs
* Even more jump work for Arm64
* Fix up jumps to use consistent assertions
* Handle splitting Add, Sub, and Not insns for Arm64
* More Arm64 splits and various fixes
* PR feedback for Arm64 support
* Split up jumps and conditional jump logic
|
|
* LSL and LSR
* B.cond
* Move A64 files around to make more sense
* offset -> byte_offset for bcond
|
|
* Add TST instruction and AND/ANDS entrypoints for immediates
* TST/AND/ANDS for registers
* CMP instruction
|