| Age | Commit message (Collapse) | Author |
|
There are a lot of times when encoding AArch64 instructions that we
need to represent an integer value with a custom fixed width. For
example, the offset for a B instruction is 26 bits, so we store an
i32 on the instruction struct and then mask it when we encode.
We've been doing this masking everywhere, which has worked, but
it's getting a bit copy-pasty all over the place. This commit
centralizes that logic to make sure we stay consistent.
Notes:
Merged: https://github.com/ruby/ruby/pull/6289
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/6289
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/6289
|
|
(https://github.com/Shopify/ruby/pull/430)
* Add --yjit-dump-disasm to dump every compiled code
* Just use get_option
* Carve out disasm_from_addr
* Avoid push_str with format!
* Share the logic through asm.compile
* This seems to negatively impact the compilation speed
Notes:
Merged: https://github.com/ruby/ruby/pull/6289
|
|
* When we're storing an immediate 0 value at a memory address, we
can use STUR XZR, Xd instead of loading 0 into a register and
then storing that register.
* When we're moving 0 into an argument register, we can use
MOV Xd, XZR instead of loading the value into a register first.
* In the newarray instruction, we can skip looking at the stack at
all if the number of values we're using is 0.
Notes:
Merged: https://github.com/ruby/ruby/pull/6289
|
|
|
|
|
|
(https://github.com/Shopify/ruby/pull/395)
`YJIT.simulate_oom!` used to leave one byte of space in the code block,
so our test didn't expose a problem with asserting that the write
position is in bounds in `CodeBlock::set_pos`. We do the following when
patching code:
1. save current write position
2. seek to middle of the code block and patch
3. restore old write position
The bounds check fails on (3) when the code block is already filled up.
Leaving one byte of space also meant that when we write that byte, we
need to fill the entire code region with trapping instruction in
`VirtualMem`, which made the OOM tests unnecessarily slow.
Remove the incorrect bounds check and stop leaving space in the code
block when simulating OOM.
|
|
(https://github.com/Shopify/ruby/pull/382)
* LDR instruction for AArch64
* Split loads in arm64_split when memory address displacements do not fit
|
|
* Left and right shift for IR
* Update yjit/src/backend/x86_64/mod.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
|
|
(https://github.com/Shopify/ruby/pull/342)
|
|
(https://github.com/Shopify/ruby/pull/344)
* A64: Fix off by one in offset encoding for BL
It's relative to the address of the instruction not the end of it.
* A64: Fix off by one when encoding B
It's relative to the start of the instruction not the end.
* A64: Add some tests for boundary offsets
|
|
* Fix conditional jumps to label
* Bitmask immediates cannot be u64::MAX
|
|
* Better splitting for Op::Add, Op::Sub, and Op::Cmp
* Split stores if the displacement is too large
* Use a shifted immediate argument
* Split all places where shifted immediates are used
* Add more tests to the cirrus workflow
|
|
* Fix bitmask encoding to u32
* Fix splitting for Op::And to account for bitmask immediate
|
|
(https://github.com/Shopify/ruby/pull/335)
|
|
(https://github.com/Shopify/ruby/pull/329)
* Move to/from SP on AArch64
* Consolidate loads and stores
* Implement LDR post-index and LDR pre-index for AArch64
* Implement STR post-index and STR pre-index for AArch64
* Module entrypoints for LDR pre/post -index and STR pre/post -index
* Use STR (pre-index) and LDR (post-index) to implement push/pop
* Go back to using MOV for to/from SP
|
|
|
|
|
|
|
|
* CSEL on AArch64
* Implement various Op::CSel* instructions
|
|
* Port print_int to the new backend
* Tests for print_int and print_str
|
|
* ADR and ADRP for AArch64
* Implement Op::Jbe on X86
* Lera instruction
* Op::BakeString
* LeaPC -> LeaLabel
* Port print_str to the new backend
* Port print_value to the new backend
* Port print_ptr to the new backend
* Write null-terminators in Op::BakeString
* Fix up rebase issues on print-str port
* Add back in panic for X86 backend for unsupported instructions being lowered
* Fix target architecture
|
|
Instructions for pushing all caller-save registers and the flags so that
we can implement dump_insns.
|
|
(https://github.com/Shopify/ruby/pull/316)
|
|
|
|
Previously we were using a `Box<dyn FnOnce>` to support patching the
code when jumping to labels. We needed to do this because some of the
closures that were being used to patch needed to capture local variables
(on both X86 and ARM it was the type of condition for the conditional
jumps).
To get around that, we can instead use const generics since the
condition codes are always known at compile-time. This means that the
closures go from polymorphic to monomorphic, which means they can be
represented as an `fn` instead of a `Box<dyn FnOnce>`, which means they
can fall back to a plain function pointer. This simplifies the storage
of the `LabelRef` structs and should hopefully be a better default
going forward.
|
|
* More Arm64 lowering/backend work
* We now have encoding support for the LDR instruction for loading a PC-relative memory location
* You can now call add/adds/sub/subs with signed immediates, which switches appropriately based on sign
* We can now load immediates into registers appropriately, attempting to keep the minimal number of instructions:
* If it fits into 16 bytes, we use just a single movz.
* Else if it can be encoded into a bitmask immediate, we use a single mov.
* Otherwise we use a movz, a movk, and then optionally another one or two movks.
* Fixed a bunch of code to do with the Op::Load opcode.
* We now handle GC-offsets properly for Op::Load by skipping around them with a jump instruction. (This will be made better by constant pools in the future.)
* Op::Lea is doing what it's supposed to do now.
* Fixed a bug in the backend tests to do with not using the result of an Op::Add.
* Fix the remaining tests for Arm64
* Move split loads logic into each backend
|
|
* Get initial wiring up
* Split IncrCounter instruction
* Breakpoints in Arm64
* Support for ORR
* MOV instruction encodings
* Implement JmpOpnd and CRet
* Add ORN
* Add MVN
* PUSH, POP, CCALL for Arm64
* Some formatting and implement Op::Not for Arm64
* Consistent constants when working with the Arm64 SP
* Allow OR-ing values into the memory buffer
* Test lowering Arm64 ADD
* Emit unconditional jumps consistently in Arm64
* Begin emitting conditional jumps for A64
* Back out some labelref changes
* Remove label API that no longer exists
* Use a trait for the label encoders
* Encode nop
* Add in nops so jumps are the same width no matter what on Arm64
* Op::Jbe for CodePtr
* Pass src_addr and dst_addr instead of calculated offset to label refs
* Even more jump work for Arm64
* Fix up jumps to use consistent assertions
* Handle splitting Add, Sub, and Not insns for Arm64
* More Arm64 splits and various fixes
* PR feedback for Arm64 support
* Split up jumps and conditional jump logic
|
|
* LSL and LSR
* B.cond
* Move A64 files around to make more sense
* offset -> byte_offset for bcond
|
|
* Add TST instruction and AND/ANDS entrypoints for immediates
* TST/AND/ANDS for registers
* CMP instruction
|
|
|
|
|
|
|
|
* LDADDAL instruction
* STUR
* BL instruction
* Remove num_bits from imm and uimm
* Tests for imm_fits_bits and uimm_fits_bits
* Reorder arguments to LDADDAL
|
|
* MOVK instruction
* More tests for the A64 entrypoints
* Finish testing entrypoints
* MOVZ
* BR instruction
|
|
* Remove x86-64 dependency from codegen.rs
* Port over putnil and putobject
* Port over gen_leave()
* Complete port of gen_leave()
* Fix bug in x86 instruction splitting
|
|
|
|
* LDUR
* Fix up immediate masking
* Consume operands directly
* Consistency and cleanup
* More consistency and entrypoints
* Cleaner syntax for masks
* Cleaner shifting for encodings
|
|
|
|
* Initial setup for aarch64
* ADDS and SUBS
* ADD and SUB for immediates
* Revert moved code
* Documentation
* Rename Arm64* to A64*
* Comments on shift types
* Share sig_imm_size and unsig_imm_size
|
|
|
|
|
|
|
|
|
|
This commit makes YJIT allocate memory for generated code gradually as
needed. Previously, YJIT allocates all the memory it needs on boot in
one go, leading to higher than necessary resident set size (RSS) and
time spent on boot initializing the memory with a large memset().
Users should no longer need to search for a magic number to pass to
`--yjit-exec-mem` since physical memory consumption should now more
accurately reflect the requirement of the workload.
YJIT now reserves a range of addresses on boot. This region start out
with no access permission at all so buggy attempts to jump to the region
crashes like before this change. To get this hardening at finer
granularity than the page size, we fill each page with trapping
instructions when we first allocate physical memory for the page.
Most of the time applications don't need 256 MiB of executable code, so
allocating on-demand ends up doing less total work than before. Case in
point, a simple `ruby --yjit-call-threshold=1 -eitself` takes about
half as long after this change. In terms of memory consumption, here is
a table to give a rough summary of the impact:
| Peak RSS in MiB | -eitself example | railsbench once |
| :-------------: | ---------------: | --------------: |
| before | 265 | 377 |
| after | 11 | 143 |
| no YJIT | 10 | 101 |
A new module is introduced to handle allocation bookkeeping.
`CodePtr` is moved into the module since it has a close relationship
with the new `VirtualMemory` struct. This new interface has a slightly
smaller surface than before in that marking a region as writable is no
longer a public operation.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Follow up https://github.com/ruby/ruby/commit/0514d81
Rust YJIT requires Rust 1.60.0 or later. So, `extern crate` looks unnecessary
because it can use the following Rust 2018 edition feature:
https://doc.rust-lang.org/stable/edition-guide/rust-2018/path-changes.html#no-more-extern-crate
It passes the following tests.
```console
% cd yjit
% cargo test --features asm_comments,disasm
(snip)
test result: ok. 56 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
`rustc` performs in depth dead code analysis and issues warning
even for things like unused struct fields and unconstructed enum
variants. This was annoying for us during the port but hopefully
they are less of an issue now.
This patch enables all the unused warnings we disabled and address
all the warnings we previously ignored. Generally, the approach I've
taken is to use `cfg!` instead of using the `cfg` attribute and
to delete code where it makes sense. I've put `#[allow(unused)]`
on things we intentionally keep around for printf style debugging
and on items that are too annoying to keep warning-free in all
build configs.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
This adopts most suggestions that rust-clippy is confident enough to
auto apply. The manual changes mostly fix manual if-lets and take
opportunities to use the `Default` trait on standard collections.
Co-authored-by: Kevin Newton <kddnewton@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/5853
|
|
is disabled (#5863)
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|