summaryrefslogtreecommitdiff
path: root/yjit/src/backend
AgeCommit message (Collapse)Author
2023-03-13YJIT: Merge add/sub/and/or/xor and mov on x86_64 (#7492)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2023-03-09YJIT: Merge x86_merge into x86_split (#7487)Takashi Kokubun
Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2023-03-09YJIT: Optimize `cmp REG, 0` into `test REG, REG` (#7471)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2023-03-07YJIT: Add comments to peek and x86_mergeTakashi Kokubun
Notes: Merged: https://github.com/ruby/ruby/pull/7453
2023-03-07YJIT: Merge lea and mov on x86_64 when possibleTakashi Kokubun
Notes: Merged: https://github.com/ruby/ruby/pull/7453
2023-03-02YJIT: shrink stack_size/sp_offet to u8/i8 (#7426)Maxime Chevalier-Boisvert
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2023-02-22YJIT: Introduce Opnd::Stack (#7352)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2023-02-02Fix typos in YJIT [ci skip]Alan Wu
2023-02-02YJIT: ARM64: Fix long jumps to labelsAlan Wu
Previously, with Code GC, YJIT panicked while trying to emit a B.cond instruction with an offset that is not encodable in 19 bits. This only happens when the code in an assembler instance straddles two pages. To fix this, when we detect that a jump to a label can land on a different page, we switch to a fresh new page and regenerate all the code in the assembler there. We still assume that no one assembler has so much code that it wouldn't fit inside a fresh new page. [Bug #19385] Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Notes: Merged: https://github.com/ruby/ruby/pull/7227
2023-02-02YJIT: ARM64: Move functions out of arm64_emit()Alan Wu
Notes: Merged: https://github.com/ruby/ruby/pull/7227
2023-01-19YJIT: Refactor side_exitsJimmy Miller
Notes: Merged: https://github.com/ruby/ruby/pull/7155
2023-01-03YJIT: Dump spill error to stderr [ci skip]Alan Wu
Since the panic message is in stderr, better to use the same stream in case stdout and stderr are not synced due to IO redirection.
2022-12-01YJIT: fix 32 and 16 bit register store (#6840)Jemma Issroff
* Fix 32 and 16 bit register store in YJIT Co-Authored-By: Takashi Kokubun <takashikkbn@gmail.com> * Remove an unnecessary diff * Reuse an rm_num_bits result * Use u16::MAX instead * Update the link Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> * Just use sturh for 16 bits Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com> Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-11-23YJIT: Simplify Insn::CCall to obviate Target::FunPtr (#6793)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-11-23Fix YJIT backend to account for unsigned int immediates (#6789)Jemma Issroff
YJIT: x86_64: Fix cmp with number where sign bit is set Before this commit, we were unconditionally treating unsigned ints as signed ints when counting the number of bits required for representing the immediate in machine code. When the size of the immediate matches the size of the other operand, no sign extension happens, so this was incorrect. `asm.cmp(opnd64, 0x8000_0000)` panicked even though it's encodable as `CMP r/m32, imm32`. Large shape ids were impacted by this issue. Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org> Co-Authored-By: Alan Wu <alanwu@ruby-lang.org> Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org> Co-authored-by: Alan Wu <alanwu@ruby-lang.org> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-11-22YJIT: Skip padding jumps to side exits on Arm (#6790)Takashi Kokubun
YJIT: Skip padding jumps to side exits Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-11-1832 bit comparison on shape idAaron Patterson
This commit changes the shape id comparisons to use a 32 bit comparison rather than 64 bit. That means we don't need to load the shape id to a register on x86 machines. Given the following program: ```ruby class Foo def initialize @foo = 1 @bar = 1 end def read [@foo, @bar] end end foo = Foo.new foo.read foo.read foo.read foo.read foo.read puts RubyVM::YJIT.disasm(Foo.instance_method(:read)) ``` The machine code we generated _before_ this change is like this: ``` == BLOCK 1/4, ISEQ RANGE [0,3), 65 bytes ====================== # getinstancevariable 0x559a18623023: mov rax, qword ptr [r13 + 0x18] # guard object is heap 0x559a18623027: test al, 7 0x559a1862302a: jne 0x559a1862502d 0x559a18623030: cmp rax, 4 0x559a18623034: jbe 0x559a1862502d # guard shape, embedded, and T_OBJECT 0x559a1862303a: mov rcx, qword ptr [rax] 0x559a1862303d: movabs r11, 0xffff00000000201f 0x559a18623047: and rcx, r11 0x559a1862304a: movabs r11, 0xb000000002001 0x559a18623054: cmp rcx, r11 0x559a18623057: jne 0x559a18625046 0x559a1862305d: mov rax, qword ptr [rax + 0x18] 0x559a18623061: mov qword ptr [rbx], rax == BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes ======================= == BLOCK 3/4, ISEQ RANGE [3,6), 47 bytes ====================== # gen_direct_jmp: fallthrough # getinstancevariable # regenerate_branch # getinstancevariable # regenerate_branch 0x559a18623064: mov rax, qword ptr [r13 + 0x18] # guard shape, embedded, and T_OBJECT 0x559a18623068: mov rcx, qword ptr [rax] 0x559a1862306b: movabs r11, 0xffff00000000201f 0x559a18623075: and rcx, r11 0x559a18623078: movabs r11, 0xb000000002001 0x559a18623082: cmp rcx, r11 0x559a18623085: jne 0x559a18625099 0x559a1862308b: mov rax, qword ptr [rax + 0x20] 0x559a1862308f: mov qword ptr [rbx + 8], rax ``` After this change, it's like this: ``` == BLOCK 1/4, ISEQ RANGE [0,3), 41 bytes ====================== # getinstancevariable 0x5560c986d023: mov rax, qword ptr [r13 + 0x18] # guard object is heap 0x5560c986d027: test al, 7 0x5560c986d02a: jne 0x5560c986f02d 0x5560c986d030: cmp rax, 4 0x5560c986d034: jbe 0x5560c986f02d # guard shape 0x5560c986d03a: cmp word ptr [rax + 6], 0x19 0x5560c986d03f: jne 0x5560c986f046 0x5560c986d045: mov rax, qword ptr [rax + 0x10] 0x5560c986d049: mov qword ptr [rbx], rax == BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes ======================= == BLOCK 3/4, ISEQ RANGE [3,6), 23 bytes ====================== # gen_direct_jmp: fallthrough # getinstancevariable # regenerate_branch # getinstancevariable # regenerate_branch 0x5560c986d04c: mov rax, qword ptr [r13 + 0x18] # guard shape 0x5560c986d050: cmp word ptr [rax + 6], 0x19 0x5560c986d055: jne 0x5560c986f099 0x5560c986d05b: mov rax, qword ptr [rax + 0x18] 0x5560c986d05f: mov qword ptr [rbx + 8], rax ``` The first ivar read is a bit more complex, but the second ivar read is much simpler. I think eventually we could teach the context about the shape, then emit only one shape guard. Notes: Merged: https://github.com/ruby/ruby/pull/6737
2022-11-15YJIT: Always encode Opnd::Value in 64 bits on x86_64 for GC offsets (#6733)Takashi Kokubun
* YJIT: Always encode Opnd::Value in 64 bits on x86_64 for GC offsets Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> * Introduce heap_object_p * Leave original mov intact * Remove unneeded branches * Add a test for movabs Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2022-11-03YJIT: Stop incrementing write_pos if cb.has_dropped_bytes (#6664)Takashi Kokubun
Co-Authored-By: Alan Wu <alansi.xingwu@shopify.com> Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-11-02YJIT: Support invokeblock (#6640)Takashi Kokubun
* YJIT: Support invokeblock * Update yjit/src/backend/arm64/mod.rs * Update yjit/src/codegen.rs Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-11-01YJIT: Visualize live ranges on register spill (#6651)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-10-26YJIT: Invalidate i-cache for the other cb on next_page (#6631)Takashi Kokubun
* YJIT: Invalidate i-cache for the other cb on next_page * YJIT: Invalidate only what's written by jmp_ptr * YJIT: Move the code to the arm64 backend Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-10-19YJIT: Skip dumping code for the other cb on --yjit-dump-disasm (#6592)Takashi Kokubun
YJIT: Skip dumping code for the other cb on --yjit-dump-disasm Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-10-19YJIT: fold the "asm_comments" feature into "disasm" (#6591)Alan Wu
Previously, enabling only "disasm" didn't actually build. Since these two features are closely related and we don't really use one without the other, let's simplify and merge the two features together. Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-10-17YJIT: Allow --yjit-dump-disasm to dump into a file (#6552)Takashi Kokubun
* YJIT: Allow --yjit-dump-disasm to dump into a file * YJIT: Move IO implementation to disasm.rs * YJIT: More consistent naming Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2022-10-17YJIT: Interleave inline and outlined code blocks (#6460)Takashi Kokubun
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2022-10-14More clippy fixes (#6547)Jimmy Miller
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-10-13fixes more clippy warnings (#6543)Jimmy Miller
* fixes more clippy warnings * Fix x86 c_callable to have doc_strings Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-10-06YJIT: fix ARM64 bitmask encoding for 32 bit registers (#6503)Alan Wu
For logical instructions such as AND, there is a constraint that the N part of the bitmask immediate must be 0. We weren't respecting this condition previously and were silently emitting undefined instructions. Check for this condition in the assembler and tweak the backend to correctly detect whether a number could be encoded as an immediate in a 32 bit logical instruction. Due to the nature of the immediate encoding, the same numeric value encodes differently depending on the size of the register the instruction works on. We currently don't have cases where we use 32 bit immediates but we ran into this encoding issue during development. Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-10-03Split cmp operations that aren't 32/64 bit for arm (#6484)Jimmy Miller
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-30A bunch of clippy auto fixes for yjit (#6476)Jimmy Miller
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-27Change IncrCounter lowering on AArch64 (#6455)Kevin Newton
* Change IncrCounter lowering on AArch64 Previously we were using LDADDAL which is not available on Graviton 1 chips. Instead, we're going to use an exclusive load/store group through the LDAXR/STLXR instructions. * Update yjit/src/backend/arm64/mod.rs Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-16Invalidate i-cache after link_labels (#6388)Takashi Kokubun
Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2022-09-14YJIT: Add Opnd#with_num_bits to use only 8 bits (#6359)Takashi Kokubun
* YJIT: Add Opnd#sub_opnd to use only 8 bits * Add with_num_bits and let arm64_split use it * Add another assertion to with_num_bits * Use only with_num_bits Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-09YJIT: eliminate redundant mov in csel/cmov on x86 (#6348)Maxime Chevalier-Boisvert
* Eliminate redundant mov in csel/cmov. Translate mov reg,0 into xor * Fix x86 asm test * Remove dbg!() * xor optimization unsound because it resets flags Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-09Better offsets (#6315)Kevin Newton
* Introduce InstructionOffset for AArch64 There are a lot of instructions on AArch64 where we take an offset from PC in terms of the number of instructions. This is for loading a value relative to the PC or for jumping. We were usually accepting an A64Opnd or an i32. It can get confusing and inconsistent though because sometimes you would divide by 4 to get the number of instructions or multiply by 4 to get the number of bytes. This commit adds a struct that wraps an i32 in order to keep all of that logic in one place. It makes it much easier to read and reason about how these offsets are getting used. * Use b instruction when the offset fits on AArch64 Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-08Remove as many unnecessary moves as possible (#6342)v3_2_0_preview2Kevin Newton
This commit does a bunch of stuff to try to eliminate as many unnecessary mov instructions as possible. First, it introduces the Insn::LoadInto instruction. Previously when we needed a value to go into a specific register (like in Insn::CCall when we're putting values into the argument registers or in Insn::CRet when we're putting a value into the return register) we would first load the value and then mov it into the correct register. This resulted in a lot of duplicated work with short live ranges since they basically immediately we unnecessary. The new instruction accepts a destination and does not interact with the register allocator at all, making it much more efficient. We then use the new instruction when we're loading values into argument registers for AArch64 or X86_64, and when we're returning a value from AArch64. Notably we don't do it when we're returning a value from X86_64 because everything can be accomplished with a single mov anyway. A couple of unnecessary movs were also present because when we called the split_load_opnd function in a lot of split passes we were loading all registers and instruction outputs. We no longer do that. This commit also makes it so that UImm(0) passes through the Insn::Store split without attempting to be loaded, which allows it can take advantage of the zero register. So now instead of mov-ing 0 into a register and then calling store, it just stores XZR. Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-01Allow comparing against 64-bit immediates on x86 (#6314)Kevin Newton
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-09-01Stop using a callee-saved register for scratch0 on aarch64 (#6312)Takashi Kokubun
[Bug #18985] * Callee-save x22 for aarch64 * Just use a caller-saved register * Update yjit/src/backend/arm64/mod.rs Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com> Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2022-09-01Let --yjit-dump-disasm=all dump ocb code as well (#6309)Takashi Kokubun
* Let --yjit-dump-disasm=all dump ocb code as well * Use an enum instead * Add a None Option to DumpDisasm (#444) * Add a None Option to DumpDisasm * Update yjit/src/asm/mod.rs Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com> * Fix a build failure * Use only a single name * Only None will be a disabled case * Fix cargo test * Fix --yjit-dump-disasm=all to print outlined cb Co-authored-by: Jimmy Miller <jimmyhmiller@gmail.com> Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com> Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2022-08-31Better b.cond usage on AArch64 (#6305)Kevin Newton
* Better b.cond usage on AArch64 When we're lowering a conditional jump, we previously had a bit of a complicated setup where we could emit a conditional jump to skip over a jump that was the next instruction, and then write out the destination and use a branch register. Now instead we use the b.cond instruction if our offset fits (not common, but not unused either) and if it doesn't we write out an inverse condition to jump past loading the destination and branching directly. * Added an inverse fn for Condition (#443) Prevents the need to pass two params and potentially reduces errors. Co-authored-by: Jimmy Miller <jimmyhmiller@jimmys-mbp.lan> Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com> Co-authored-by: Jimmy Miller <jimmyhmiller@jimmys-mbp.lan> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2022-08-30Skip linking rb_yjit_icache_invalidate on cargo testTakashi Kokubun
Co-authored-by: Kevin Newton <kddnewton@gmail.com> Notes: Merged: https://github.com/ruby/ruby/pull/6304
2022-08-29A64: Only clear icache when writing out new code ↵Alan Wu
(https://github.com/Shopify/ruby/pull/442) Previously we cleared the cache for all the code in the system when we flip memory protection, which was prohibitively expensive since the operation is not constant time. Instead, only clear the cache for the memory region of newly written code when we write out new code. This brings the runtime for the 30k_if_else test down to about 6 seconds from the previous 45 seconds on my laptop. Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Remove ir_ssa.rs as we aren't using it and it's now outdatedMaxime Chevalier-Boisvert
Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Add --yjit-dump-disasm to dump every compiled code ↵Takashi Kokubun
(https://github.com/Shopify/ruby/pull/430) * Add --yjit-dump-disasm to dump every compiled code * Just use get_option * Carve out disasm_from_addr * Avoid push_str with format! * Share the logic through asm.compile * This seems to negatively impact the compilation speed Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Various AArch64 optimizations (https://github.com/Shopify/ruby/pull/433)Kevin Newton
* When we're storing an immediate 0 value at a memory address, we can use STUR XZR, Xd instead of loading 0 into a register and then storing that register. * When we're moving 0 into an argument register, we can use MOV Xd, XZR instead of loading the value into a register first. * In the newarray instruction, we can skip looking at the stack at all if the number of values we're using is 0. Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Use shorter syntax for the same pattern ↵Alan Wu
(https://github.com/Shopify/ruby/pull/425) Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Better variable name, no must_use on ccall ↵Kevin Newton
(https://github.com/Shopify/ruby/pull/424) Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29Instruction enum (https://github.com/Shopify/ruby/pull/423)Kevin Newton
* Remove references to explicit instruction parts Previously we would reference individual instruction fields manually. We can't do that with instructions that are enums, so this commit removes those references. As a side effect, we can remove the push_insn_parts() function from the assembler because we now explicitly push instruction structs every time. * Switch instructions to enum Instructions are now no longer a large struct with a bunch of optional fields. Instead they are an enum with individual shapes for the variants. In terms of size, the instruction struct was 120 bytes while the new instruction enum is 106 bytes. The bigger win however is that we're not allocating any vectors for instruction operands (except for CCall), which should help cut down on memory usage. Adding new instructions will be a little more complicated going forward, but every mission-critical function that needs to be touched will have an exhaustive match, so the compiler should guide any additions. Notes: Merged: https://github.com/ruby/ruby/pull/6289
2022-08-29More work toward instruction enum (https://github.com/Shopify/ruby/pull/421)Kevin Newton
* Operand iterators There are a couple of times when we're dealing with instructions that we need to iterate through their operands. At the moment this is relatively easy because there's an opnds field and we can work with it directly. When the instructions become enums, however, the shape of each variant will be different so we'll need an iterator to make sense of the shape. This commit introduces two new iterators that are created from an instruction. One iterates over references to each operand (for instances where they don't need to be mutable like updating live ranges) and one iterates over mutable references to each operand (for instances where you need to mutate them like loading values in arm64). Note that because iterators can't have generic items (i.e., be associated with lifetimes) the mutable iterator forces you to use the `while let Some` syntax as opposed to the for-loop like we did with instructions. This commit eliminates the last reference to insn.opnds, which is going to make it much easier to transition to an enum. * Consolidate output operand fetching Currently we always look at the .out field on instructions whenever we want to access the output operand. When the instructions become an enum, this is not going to be possible since the shape of the variants will be different. Instead, this commit introduces two functions on Insn: out_opnd() and out_opnd_mut(). These return an Option containing a reference to the output operand and a mutable reference to the output operand, respectively. This commit then uses those functions to replace all instances of accessing the output operand. For the most part this was straightforward; when we previously checked if it was Opnd::None we now check that it's None, when we assumed there was an output operand we now unwrap. Notes: Merged: https://github.com/ruby/ruby/pull/6289