summaryrefslogtreecommitdiff
path: root/yjit/src/backend/ir.rs
AgeCommit message (Collapse)Author
2022-11-23YJIT: Simplify Insn::CCall to obviate Target::FunPtr (#6793)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-11-22YJIT: Skip padding jumps to side exits on Arm (#6790)Takashi Kokubun
YJIT: Skip padding jumps to side exits Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-11-1832 bit comparison on shape idAaron Patterson
This commit changes the shape id comparisons to use a 32 bit comparison rather than 64 bit. That means we don't need to load the shape id to a register on x86 machines. Given the following program: ```ruby class Foo def initialize @foo = 1 @bar = 1 end def read [@foo, @bar] end end foo = Foo.new foo.read foo.read foo.read foo.read foo.read puts RubyVM::YJIT.disasm(Foo.instance_method(:read)) ``` The machine code we generated _before_ this change is like this: ``` == BLOCK 1/4, ISEQ RANGE [0,3), 65 bytes ====================== # getinstancevariable 0x559a18623023: mov rax, qword ptr [r13 + 0x18] # guard object is heap 0x559a18623027: test al, 7 0x559a1862302a: jne 0x559a1862502d 0x559a18623030: cmp rax, 4 0x559a18623034: jbe 0x559a1862502d # guard shape, embedded, and T_OBJECT 0x559a1862303a: mov rcx, qword ptr [rax] 0x559a1862303d: movabs r11, 0xffff00000000201f 0x559a18623047: and rcx, r11 0x559a1862304a: movabs r11, 0xb000000002001 0x559a18623054: cmp rcx, r11 0x559a18623057: jne 0x559a18625046 0x559a1862305d: mov rax, qword ptr [rax + 0x18] 0x559a18623061: mov qword ptr [rbx], rax == BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes ======================= == BLOCK 3/4, ISEQ RANGE [3,6), 47 bytes ====================== # gen_direct_jmp: fallthrough # getinstancevariable # regenerate_branch # getinstancevariable # regenerate_branch 0x559a18623064: mov rax, qword ptr [r13 + 0x18] # guard shape, embedded, and T_OBJECT 0x559a18623068: mov rcx, qword ptr [rax] 0x559a1862306b: movabs r11, 0xffff00000000201f 0x559a18623075: and rcx, r11 0x559a18623078: movabs r11, 0xb000000002001 0x559a18623082: cmp rcx, r11 0x559a18623085: jne 0x559a18625099 0x559a1862308b: mov rax, qword ptr [rax + 0x20] 0x559a1862308f: mov qword ptr [rbx + 8], rax ``` After this change, it's like this: ``` == BLOCK 1/4, ISEQ RANGE [0,3), 41 bytes ====================== # getinstancevariable 0x5560c986d023: mov rax, qword ptr [r13 + 0x18] # guard object is heap 0x5560c986d027: test al, 7 0x5560c986d02a: jne 0x5560c986f02d 0x5560c986d030: cmp rax, 4 0x5560c986d034: jbe 0x5560c986f02d # guard shape 0x5560c986d03a: cmp word ptr [rax + 6], 0x19 0x5560c986d03f: jne 0x5560c986f046 0x5560c986d045: mov rax, qword ptr [rax + 0x10] 0x5560c986d049: mov qword ptr [rbx], rax == BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes ======================= == BLOCK 3/4, ISEQ RANGE [3,6), 23 bytes ====================== # gen_direct_jmp: fallthrough # getinstancevariable # regenerate_branch # getinstancevariable # regenerate_branch 0x5560c986d04c: mov rax, qword ptr [r13 + 0x18] # guard shape 0x5560c986d050: cmp word ptr [rax + 6], 0x19 0x5560c986d055: jne 0x5560c986f099 0x5560c986d05b: mov rax, qword ptr [rax + 0x18] 0x5560c986d05f: mov qword ptr [rbx + 8], rax ``` The first ivar read is a bit more complex, but the second ivar read is much simpler. I think eventually we could teach the context about the shape, then emit only one shape guard. Notes: Merged: https://github.com/ruby/ruby/pull/6737
2022-11-01YJIT: Visualize live ranges on register spill (#6651)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-10-19YJIT: Skip dumping code for the other cb on --yjit-dump-disasm (#6592)Takashi Kokubun
YJIT: Skip dumping code for the other cb on --yjit-dump-disasm Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-10-17YJIT: Allow --yjit-dump-disasm to dump into a file (#6552)Takashi Kokubun
* YJIT: Allow --yjit-dump-disasm to dump into a file * YJIT: Move IO implementation to disasm.rs * YJIT: More consistent naming Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2022-10-17YJIT: Interleave inline and outlined code blocks (#6460)Takashi Kokubun
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2022-10-13fixes more clippy warnings (#6543)Jimmy Miller
* fixes more clippy warnings * Fix x86 c_callable to have doc_strings Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-30A bunch of clippy auto fixes for yjit (#6476)Jimmy Miller
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-14YJIT: Add Opnd#with_num_bits to use only 8 bits (#6359)Takashi Kokubun
* YJIT: Add Opnd#sub_opnd to use only 8 bits * Add with_num_bits and let arm64_split use it * Add another assertion to with_num_bits * Use only with_num_bits Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-08Remove as many unnecessary moves as possible (#6342)v3_2_0_preview2Kevin Newton
This commit does a bunch of stuff to try to eliminate as many unnecessary mov instructions as possible. First, it introduces the Insn::LoadInto instruction. Previously when we needed a value to go into a specific register (like in Insn::CCall when we're putting values into the argument registers or in Insn::CRet when we're putting a value into the return register) we would first load the value and then mov it into the correct register. This resulted in a lot of duplicated work with short live ranges since they basically immediately we unnecessary. The new instruction accepts a destination and does not interact with the register allocator at all, making it much more efficient. We then use the new instruction when we're loading values into argument registers for AArch64 or X86_64, and when we're returning a value from AArch64. Notably we don't do it when we're returning a value from X86_64 because everything can be accomplished with a single mov anyway. A couple of unnecessary movs were also present because when we called the split_load_opnd function in a lot of split passes we were loading all registers and instruction outputs. We no longer do that. This commit also makes it so that UImm(0) passes through the Insn::Store split without attempting to be loaded, which allows it can take advantage of the zero register. So now instead of mov-ing 0 into a register and then calling store, it just stores XZR. Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-01Let --yjit-dump-disasm=all dump ocb code as well (#6309)Takashi Kokubun
* Let --yjit-dump-disasm=all dump ocb code as well * Use an enum instead * Add a None Option to DumpDisasm (#444) * Add a None Option to DumpDisasm * Update yjit/src/asm/mod.rs Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com> * Fix a build failure * Use only a single name * Only None will be a disabled case * Fix cargo test * Fix --yjit-dump-disasm=all to print outlined cb Co-authored-by: Jimmy Miller <jimmyhmiller@gmail.com> Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com> Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2022-08-29Add --yjit-dump-disasm to dump every compiled code ↵Takashi Kokubun
(https://github.com/Shopify/ruby/pull/430) * Add --yjit-dump-disasm to dump every compiled code * Just use get_option * Carve out disasm_from_addr * Avoid push_str with format! * Share the logic through asm.compile * This seems to negatively impact the compilation speed Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Use shorter syntax for the same pattern ↵Alan Wu
(https://github.com/Shopify/ruby/pull/425) Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Better variable name, no must_use on ccall ↵Kevin Newton
(https://github.com/Shopify/ruby/pull/424) Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Instruction enum (https://github.com/Shopify/ruby/pull/423)Kevin Newton
* Remove references to explicit instruction parts Previously we would reference individual instruction fields manually. We can't do that with instructions that are enums, so this commit removes those references. As a side effect, we can remove the push_insn_parts() function from the assembler because we now explicitly push instruction structs every time. * Switch instructions to enum Instructions are now no longer a large struct with a bunch of optional fields. Instead they are an enum with individual shapes for the variants. In terms of size, the instruction struct was 120 bytes while the new instruction enum is 106 bytes. The bigger win however is that we're not allocating any vectors for instruction operands (except for CCall), which should help cut down on memory usage. Adding new instructions will be a little more complicated going forward, but every mission-critical function that needs to be touched will have an exhaustive match, so the compiler should guide any additions. Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29More work toward instruction enum (https://github.com/Shopify/ruby/pull/421)Kevin Newton
* Operand iterators There are a couple of times when we're dealing with instructions that we need to iterate through their operands. At the moment this is relatively easy because there's an opnds field and we can work with it directly. When the instructions become enums, however, the shape of each variant will be different so we'll need an iterator to make sense of the shape. This commit introduces two new iterators that are created from an instruction. One iterates over references to each operand (for instances where they don't need to be mutable like updating live ranges) and one iterates over mutable references to each operand (for instances where you need to mutate them like loading values in arm64). Note that because iterators can't have generic items (i.e., be associated with lifetimes) the mutable iterator forces you to use the `while let Some` syntax as opposed to the for-loop like we did with instructions. This commit eliminates the last reference to insn.opnds, which is going to make it much easier to transition to an enum. * Consolidate output operand fetching Currently we always look at the .out field on instructions whenever we want to access the output operand. When the instructions become an enum, this is not going to be possible since the shape of the variants will be different. Instead, this commit introduces two functions on Insn: out_opnd() and out_opnd_mut(). These return an Option containing a reference to the output operand and a mutable reference to the output operand, respectively. This commit then uses those functions to replace all instances of accessing the output operand. For the most part this was straightforward; when we previously checked if it was Opnd::None we now check that it's None, when we assumed there was an output operand we now unwrap. Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Fix a bus error on regenerate_branch (https://github.com/Shopify/ruby/pull/408)Takashi Kokubun
* Fix a bus error on regenerate_branch * Fix pad_size Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Even more prep for instruction enum (https://github.com/Shopify/ruby/pull/413)Kevin Newton
* Mutate in place for register allocation Currently we allocate a new instruction every time when we're doing register allocation by first splitting up the instruction into its component parts, mapping the operands and the output, and then pushing all of its parts onto the new assembler. Since we don't need the old instruction, we can mutate the existing one in place. While it's not that big of a win in and of itself, it matches much more closely to what we're going to have to do when we switch the instruction from being a struct to being an enum, because it's much easier for the instruction to modify itself since it knows its own shape than it is to push a new instruction that very closely matches. * Mutate in place for arm64 split When we're splitting instructions for the arm64 backend, we map all of the operands for a given instruction when it has an Opnd::Value. We can do this in place with the existing operand instead of allocating a new vector each time. This enables us to pattern match against the entire instruction instead of just the opcode, which is much closer to matching against an enum. * Match against entire instruction in arm64_emit Instead of matching against the opcode and then accessing all of the various fields on the instruction when emitting bytecode for arm64, we should instead match against the entire instruction. This makes it much closer to what's going to happen when we switch it over to being an enum. * Match against entire instruction in x86_64 backend When we're splitting or emitting code for x86_64, we should match against the entire instruction instead of matching against just the opcode. This gets us closer to matching against an enum instead of a struct. * Reuse instructions for arm64_split When we're splitting, the default behavior was previously to split up the instruction into its component parts and then reassemble them in a new instruction. Instead, we can reuse the existing instruction. Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Build output operands explicitly (https://github.com/Shopify/ruby/pull/411)Kevin Newton
When we're pushing instructions onto the assembler, we previously would iterate through the instruction's operands and then assign the output operand to it through the push_insn function. This is easy when all instructions have a vector of operands, but is much more difficult when the shape differs in an enum. This commit changes it so that we explicitly define the output operand for each instruction before it gets pushed onto the assembler. This has the added benefit of changing the definition of push_insn to no longer require a mutable instruction. This paves the way to make the out field on the instructions an Option<Opnd> instead which is going to more accurately reflect the behavior we're going to have once we switch the instructions over to an enum instead of a struct.
2022-08-29Instruction builders for backend IR (https://github.com/Shopify/ruby/pull/410)Kevin Newton
Currently we use macros to define the shape of each of the instruction building methods. This works while all of the instructions share the same fields, but is really hard to get working when they're an enum with different shapes. This is an incremental step toward a bigger refactor of changing the Insn from a struct to an enum.
2022-08-29Fix issue with expandarray, add missing jl, enable tests ↵Maxime Chevalier-Boisvert
(https://github.com/Shopify/ruby/pull/409)
2022-08-29Op::Xor for backend IR (https://github.com/Shopify/ruby/pull/397)Kevin Newton
2022-08-29Iterator (https://github.com/Shopify/ruby/pull/372)Kevin Newton
* Iterator * Use the new iterator for the X86 backend split * Use iterator for reg alloc, remove forward pass * Fix up iterator usage on AArch64 * Update yjit/src/backend/ir.rs Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com> * Various PR feedback for iterators for IR * Use a local mutable reference for a64_split * Move tests from ir.rs to tests.rs in backend * Fix x86 shift instructions live range calculation * Iterator * Use the new iterator for the X86 backend split * Fix up x86 iterator usage * Fix ARM iterator usage * Remove unintentionally duplicated tests
2022-08-29Left and right shift for IR (https://github.com/Shopify/ruby/pull/374)Kevin Newton
* Left and right shift for IR * Update yjit/src/backend/x86_64/mod.rs Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-08-29Opnd::Value fixes (https://github.com/Shopify/ruby/pull/354)Alan Wu
* Fix asm.load(VALUE) - `<VALUE as impl Into<Opnd>>` didn't track that the value is a value - `Iterator::map` doesn't evaluate the closure you give it until you call `collect`. Use a for loop instead so we put the gc offsets into the compiled block properly. * x64: Mov(mem, VALUE) should load the value first Tripped in codegen for putobject now that we are actually feeding `Opnd::Value` into the backend. * x64 split: Canonicallize VALUE loads * Update yjit/src/backend/x86_64/mod.rs
2022-08-29Fix live_ranges idx calculation (https://github.com/Shopify/ruby/pull/353)John Hawthorn
forward_pass adjusts the indexes of our opnds to reflect the new instructions as they are generated in the forward pass. However, we were using the old live_ranges array, for which the new indexes are incorrect. This caused us to previously generate an IR which contained unnecessary trivial load instructions (ex. mov rax, rax), because it was looking at the wrong lifespans. Presumably this could also cause bugs because the lifespan of the incorrectly considered operand idx could be short. We've added an assert which would have failed on the previous trivial case (but not necessarily all cases). Co-authored-by: Matthew Draper <matthew@trebex.net>
2022-08-29Convert getinstancevariable to new backend IR ↵Takashi Kokubun
(https://github.com/Shopify/ruby/pull/352) * Convert getinstancevariable to new backend IR * Support mem-based mem * Use more into() * Add tests for getivar * Just load obj_opnd to a register * Apply another into() * Flip the nil-out condition * Fix duplicated counts of side_exit
2022-08-29Binary OR instruction for the IR (https://github.com/Shopify/ruby/pull/355)Kevin Newton
2022-08-29Fix C call reg alloc bug reported by Noah & KokubunMaxime Chevalier-Boisvert
2022-08-29Push first pass at SSA IR sketchMaxime Chevalier-Boisvert
2022-08-29Minor cleanups (https://github.com/Shopify/ruby/pull/345)Alan Wu
* Move allocation into Assembler::pos_marker We wanted to do this to begin with but didn't because we were confused about the lifetime parameter. It's actually talking about the lifetime of the references that the closure captures. Since all of our usages capture no references (they use `move`), it's fine to put a `+ 'static` here. * Use optional token syntax for calling convention macro * Explicitly request C ABI on ARM It looks like the Rust calling convention for functions are the same as the C ABI for now and it's unlikely to change, but it's easy for us to be explicit here. I also tried saying `extern "aapcs"` but that unfortunately doesn't work.
2022-08-29Add LiveReg IR instruction to fix stats leave exit code ↵Alan Wu
(https://github.com/Shopify/ruby/pull/341) It allows for reserving a specific register and prevents the register allocator from clobbering it. Without this `./miniruby --yjit-stats --yjit-callthreshold=1 -e0` was crashing because the counter incrementing code was clobbering RAX incorrectly.
2022-08-29Refactor YJIT branches to use PosMarker ↵Maxime Chevalier-Boisvert
(https://github.com/Shopify/ruby/pull/333) * Refactor defer_compilation to use PosMarker * Port gen_direct_jump() to use PosMarker * Port gen_branch, branchunless * Port over gen_jump() * Port over branchif and branchnil * Fix use od record_boundary_patch_point in jump_to_next_insn
2022-08-29Fix dupn (https://github.com/Shopify/ruby/pull/330)Noah Gibbs
* get_dupn was allocating and throwing away an Assembler object instead of using the one passed in * Uncomment remaining tests in codegen.rs, which seem to work now
2022-08-29Implement PosMarker instruction (https://github.com/Shopify/ruby/pull/328)Maxime Chevalier-Boisvert
* Implement PosMarker instruction * Implement PosMarker in the arm backend * Make bindgen run only for clang image * Fix if-else in cirrus CI file * Add missing semicolon * Try removing trailing semicolon * Try to fix shell/YAML syntax Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2022-08-29AArch64 frames (https://github.com/Shopify/ruby/pull/324)Kevin Newton
2022-08-29Conditionals (https://github.com/Shopify/ruby/pull/323)Kevin Newton
* CSEL on AArch64 * Implement various Op::CSel* instructions
2022-08-29Port print_int to the new backend (https://github.com/Shopify/ruby/pull/321)Kevin Newton
* Port print_int to the new backend * Tests for print_int and print_str
2022-08-29Port print_str to new backend (https://github.com/Shopify/ruby/pull/318)Kevin Newton
* ADR and ADRP for AArch64 * Implement Op::Jbe on X86 * Lera instruction * Op::BakeString * LeaPC -> LeaLabel * Port print_str to the new backend * Port print_value to the new backend * Port print_ptr to the new backend * Write null-terminators in Op::BakeString * Fix up rebase issues on print-str port * Add back in panic for X86 backend for unsupported instructions being lowered * Fix target architecture
2022-08-29Port newhash, add tests for newhash, duphashMaxime Chevalier-Boisvert
2022-08-29Add extra assertion in new_label for KevinMaxime Chevalier-Boisvert
2022-08-29Exclude X0 (C_RET_REG) from allocatable registers on arm ↵Maxime Chevalier-Boisvert
(https://github.com/Shopify/ruby/pull/319) * Exclude X0 (C_RET_REG) from allocatable registers on arm * Add another small test snippett
2022-08-29Op::CPushAll and Op::CPopAll (https://github.com/Shopify/ruby/pull/317)Kevin Newton
Instructions for pushing all caller-save registers and the flags so that we can implement dump_insns.
2022-08-29Port over setlocal_wc0Maxime Chevalier-Boisvert
2022-08-29More Arm64 lowering/backend work (https://github.com/Shopify/ruby/pull/307)Kevin Newton
* More Arm64 lowering/backend work * We now have encoding support for the LDR instruction for loading a PC-relative memory location * You can now call add/adds/sub/subs with signed immediates, which switches appropriately based on sign * We can now load immediates into registers appropriately, attempting to keep the minimal number of instructions: * If it fits into 16 bytes, we use just a single movz. * Else if it can be encoded into a bitmask immediate, we use a single mov. * Otherwise we use a movz, a movk, and then optionally another one or two movks. * Fixed a bunch of code to do with the Op::Load opcode. * We now handle GC-offsets properly for Op::Load by skipping around them with a jump instruction. (This will be made better by constant pools in the future.) * Op::Lea is doing what it's supposed to do now. * Fixed a bug in the backend tests to do with not using the result of an Op::Add. * Fix the remaining tests for Arm64 * Move split loads logic into each backend
2022-08-29Add #[must_use] annotations to asm instructionsMaxime Chevalier-Boisvert
2022-08-29Get started on branchunless portMaxime Chevalier-Boisvert
2022-08-29Arm64 progress (https://github.com/Shopify/ruby/pull/304)Kevin Newton
* Get initial wiring up * Split IncrCounter instruction * Breakpoints in Arm64 * Support for ORR * MOV instruction encodings * Implement JmpOpnd and CRet * Add ORN * Add MVN * PUSH, POP, CCALL for Arm64 * Some formatting and implement Op::Not for Arm64 * Consistent constants when working with the Arm64 SP * Allow OR-ing values into the memory buffer * Test lowering Arm64 ADD * Emit unconditional jumps consistently in Arm64 * Begin emitting conditional jumps for A64 * Back out some labelref changes * Remove label API that no longer exists * Use a trait for the label encoders * Encode nop * Add in nops so jumps are the same width no matter what on Arm64 * Op::Jbe for CodePtr * Pass src_addr and dst_addr instead of calculated offset to label refs * Even more jump work for Arm64 * Fix up jumps to use consistent assertions * Handle splitting Add, Sub, and Not insns for Arm64 * More Arm64 splits and various fixes * PR feedback for Arm64 support * Split up jumps and conditional jump logic
2022-08-29Conscise IR disassembly (https://github.com/Shopify/ruby/pull/302)Alan Wu
The output from `dbg!` was too verbose. For `test_jo` the output went from 37 lines to 5 lines. The added index helps parsing InsnOut indicies. Samples: ``` test backend::tests::test_jo ... [src/backend/ir.rs:589] &self = Assembler 000 Load(Mem64[Reg(3) + 8]) -> Out64(0) 001 Sub(Out64(0), 1_i64) -> Out64(1) 002 Load(Out64(1)) -> Out64(2) 003 Add(Out64(2), Mem64[Reg(3)]) -> Out64(3) 004 Jo() target=CodePtr(CodePtr(0x5)) -> Out64(4) 005 Mov(Mem64[Reg(3)], Out64(3)) -> Out64(5) test backend::tests::test_reuse_reg ... [src/backend/ir.rs:589] &self = Assembler 000 Load(Mem64[Reg(3)]) -> Out64(0) 001 Add(Out64(0), 1_u64) -> Out64(1) 002 Load(Mem64[Reg(3) + 8]) -> Out64(2) 003 Add(Out64(2), 1_u64) -> Out64(3) 004 Add(Out64(1), 1_u64) -> Out64(4) 005 Add(Out64(1), Out64(4)) -> Out64(5) 006 Store(Mem64[Reg(3)], Out64(4)) -> Out64(6) 007 Store(Mem64[Reg(3) + 8], Out64(5)) -> Out64(7) ```