| Age | Commit message | Author |
|
* YJIT: Add --yjit-mem-size option
* Improve --help
* s/the region/this virtual memory region/
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* Returning an iterator instead of a vec
* Avoid changing the meaning of end_page
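For illustration, a minimal sketch of the pattern (the struct and method names here are hypothetical, not YJIT's actual API): returning `impl Iterator` lets callers consume items lazily, without the intermediate `Vec` allocation.
```rust
struct CodePages {
    freed: Vec<bool>, // whether each page index has been freed
}

impl CodePages {
    // Before: fn freed_page_indexes(&self) -> Vec<usize>
    fn freed_page_indexes(&self) -> impl Iterator<Item = usize> + '_ {
        self.freed
            .iter()
            .enumerate()
            .filter_map(|(idx, &freed)| freed.then_some(idx))
    }
}

fn main() {
    let pages = CodePages { freed: vec![false, true, true] };
    assert_eq!(pages.freed_page_indexes().collect::<Vec<_>>(), vec![1, 2]);
}
```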
---------
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
This change implements a fallback mode for the `--yjit-dump-disasm`
development command-line option to make it usable in release builds.
Previously, using the option with release builds of YJIT yielded only
a warning asking the user to build with `--enable-yjit=dev`.
While builds that use the `disasm` feature still give the best output,
just having the comments is useful enough for many kinds of debugging.
Having it usable in release builds is nice for new hackers, too, since
this allows for tinkering without having to learn how to build YJIT in
development mode.
Sample output on A64:
```
# regenerate_branch
# Insn: 0001 opt_send_without_block (stack_size: 1)
# guard known object with singleton class
0x11f7e0034: 4b 00 00 58 03 00 00 14 08 ce 9c 04 01 00 00
0x11f7e0043: 00 3f 00 0b eb 81 06 01 54 1f 20 03 d5
# RUBY_VM_CHECK_INTS(ec)
0x11f7e0050: 8b 02 42 b8 cb 07 01 35
# stack overflow check
0x11f7e0058: ab 62 02 91 7f 02 0b eb 69 07 01 54
# save PC to CFP
0x11f7e0064: 0b 3b 9a d2 2b 2f a0 f2 0b 00 cc f2 6b 02 00
0x11f7e0073: f8 ab 82 00 91
```
To ensure this feature doesn't incur too much cost when running without
the `--yjit-dump-disasm` option, I checked that there is no significant
impact to compile time and memory usage using the `compile_time_ns` and
`yjit_alloc_size` entries in `RubyVM::YJIT.runtime_stats`. For each
sample, I ran 3 iterations of the `lobsters` YJIT benchmark. The
statistics summaries were done with the `summary` function in R.
Compile time, sample size of 60, lower is better:
```
Before After
Min. :2.054e+09 Min. :2.028e+09
1st Qu.:2.069e+09 1st Qu.:2.044e+09
Median :2.081e+09 Median :2.060e+09
Mean :2.089e+09 Mean :2.066e+09
3rd Qu.:2.109e+09 3rd Qu.:2.085e+09
Max. :2.146e+09 Max. :2.144e+09
```
Allocation size, sample size of 20, lower is better:
```
Before After
Min. :21804742 Min. :21794082
1st Qu.:21826682 1st Qu.:21816282
Median :21844042 Median :21826814
Mean :21960664 Mean :22026291
3rd Qu.:21861228 3rd Qu.:22040439
Max. :22587426 Max. :22930614
```
The `yjit_alloc_size` samples are noisy, but since the average increased
by only 0.3%, and the median is lower, I feel safe saying that there is
no significant change.
|
|
Use a special breakpoint address if one isn't explicitly supplied in order to support natural line stepping. (#11083)
ARM64 will not increment the program counter (PC) upon hitting a breakpoint instruction. Consequently, stepping through code with a debugger ends up looping back to the breakpoint instruction. LLDB has a special breakpoint address of 0xf000 that will increment the PC and allow the debugger to work as expected. This change makes it possible to debug YJIT generated code on ARM64.
More details at: https://discourse.llvm.org/t/stepping-over-a-brk-instruction-on-arm64/69766/8
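As a rough sketch of what's involved (not YJIT's actual emitter code), the A64 `BRK` instruction encodes its 16-bit immediate in bits 20:5, so emitting the special LLDB value looks something like:
```rust
// BRK #imm16: top bits are 0b11010100_001, imm16 sits in bits 20:5,
// and the low five bits are zero.
fn brk(imm16: u16) -> u32 {
    0xD420_0000 | (u32::from(imm16) << 5)
}

fn main() {
    // LLDB's special breakpoint immediate: brk #0xf000 => 0xD43E0000
    assert_eq!(brk(0xf000), 0xD43E_0000);
}
```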
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
Mostly putting angle brackets around links to follow markdown syntax.
|
|
* YJIT: A64: Add CBZ and CBNZ encoding functions
* YJIT: A64: Use CBZ/CBNZ to check for zero
Instead of emitting `cmp x0, #0` plus `b.eq #target`, A64 offers Compare
and Branch on Zero for us to just do `cbz x0, #target`. This commit
utilizes that and the related CBNZ instruction when appropriate.
We check for zero most commonly in interrupt checks:
```diff
# Insn: 0003 leave (stack_size: 1)
# RUBY_VM_CHECK_INTS(ec)
ldur w11, [x20, #0x20]
-tst w11, w11
-b.ne #0x109002164
+cbnz w11, #0x1049021d0
```
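For reference, a hedged sketch of what such encoding functions look like, following the field layout in the Arm manual (`sf | 011010 | op | imm19 | Rt`, with op=0 for CBZ and op=1 for CBNZ); the actual YJIT function signatures may differ:
```rust
// imm19 is the branch offset in instructions (bytes / 4), as usual for A64.
fn cbz_cbnz(num_bits: u8, is_cbnz: bool, imm19: i32, rt: u8) -> u32 {
    let sf = u32::from(num_bits == 64);
    let op = u32::from(is_cbnz);
    (sf << 31)
        | (0b011010 << 25)
        | (op << 24)
        | (((imm19 as u32) & 0x7_FFFF) << 5)
        | u32::from(rt)
}

fn main() {
    // cbz x0, #0 assembles to 0xB4000000
    assert_eq!(cbz_cbnz(64, false, 0, 0), 0xB400_0000);
}
```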
* fix copy paste error
Co-authored-by: Randy Stauner <randy@r4s6.net>
---------
Co-authored-by: Randy Stauner <randy@r4s6.net>
|
|
* YJIT: A64: Use ADDS/SUBS/CMP (immediate) when possible
We were loading 1 into a register and then doing ADDS/SUBS previously.
That was particularly bad since those come up in fixnum operations.
```diff
# integer left shift with rhs=1
- mov x11, #1
- subs x11, x1, x11
+ subs x11, x1, #1
lsl x12, x11, #1
asr x13, x12, #1
cmp x13, x11
- b.ne #0x106ab60f8
- mov x11, #1
- adds x12, x12, x11
+ b.ne #0x10903a0f8
+ adds x12, x12, #1
mov x1, x12
```
Note that it's fine to cast between i64 and u64 since the bit pattern is
preserved, and the add/sub themselves don't care about the signedness of
the operands.
CMP is just another mnemonic for SUBS.
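To make the split concrete, here is an illustrative version of the check (not YJIT's exact code): A64 ADD/SUB (immediate) accepts a 12-bit immediate, optionally shifted left by 12, and the bit-preserving `i64`/`u64` cast means one unsigned check covers both signednesses.
```rust
fn fits_add_sub_imm(imm: i64) -> bool {
    let bits = imm as u64; // same bit pattern, so safe for either signedness
    bits < (1 << 12) || ((bits & 0xFFF) == 0 && bits < (1 << 24))
}

fn main() {
    assert!(fits_add_sub_imm(1));      // subs x11, x1, #1
    assert!(fits_add_sub_imm(0x1000)); // representable with LSL #12
    assert!(!fits_add_sub_imm(-1));    // must go through a register
}
```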
* YJIT: A64: Split asm.mul() with immediates properly
There is in fact no MUL on A64 that takes an immediate, so this
instruction was using the wrong split method. No current usages of this
form in YJIT.
---------
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
|
|
|
|
|
|
Helps understand page switching
|
|
We have received a report of `assert!( !cb.has_dropped_bytes())` in
set_page() failing. The only explanation for this seems to be memory
allocation failing in write_byte(). The if condition implies that
`current_write_pos < dst_pos < mem_size`, which rules out failing to
encode the relative jump. The has_capacity() assert above not tripping
implies that we were in a place in the page where write_byte() did
attempt to write the byte and potentially made a syscall in the process.
Remove the assert, since memory allocation could fail. Also, return
failure if the destination is outside of the code region to detect that
out-of-memory situation quicker.
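A minimal sketch of the new failure path, with illustrative names (YJIT's real `set_page()` does more than this):
```rust
struct CodeBlock {
    write_pos: usize,
    mem_size: usize,
}

impl CodeBlock {
    // Report failure for destinations outside the code region instead of
    // asserting, so the out-of-memory situation is detected sooner.
    fn set_page(&mut self, dst_pos: usize) -> bool {
        if dst_pos >= self.mem_size {
            return false;
        }
        self.write_pos = dst_pos;
        true
    }
}

fn main() {
    let mut cb = CodeBlock { write_pos: 0, mem_size: 4096 };
    assert!(cb.set_page(1024));
    assert_eq!(cb.write_pos, 1024);
    assert!(!cb.set_page(8192)); // outside the region: caller must bail
}
```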
|
|
|
|
We've long had a size restriction on the code memory region such that a
u32 could refer to everything. This commit capitalizes on that
restriction by shrinking `CodePtr` from 8 bytes to 4.
To derive a full raw pointer from a `CodePtr`, one needs a base pointer.
Both `CodeBlock` and `VirtualMemory` can be used for this purpose. The
base pointer is readily available everywhere, except for in the case of
the `jit_return` "branch". Generalize lea_label() to lea_jump_target()
in the IR to delay deriving the `jit_return` address until `compile()`,
when the base pointer is available.
On railsbench, this yields roughly a 1% reduction to `yjit_alloc_size`
(58,397,765 to 57,742,248).
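The core idea, as a hedged sketch (field and method names are hypothetical): store a 32-bit offset and derive the raw pointer from a base pointer on demand.
```rust
#[derive(Clone, Copy)]
struct CodePtr(u32); // 4 bytes; an offset into the code region

impl CodePtr {
    // Deriving the raw pointer requires a base, e.g. from VirtualMemory.
    fn raw_ptr(self, base: *const u8) -> *const u8 {
        base.wrapping_add(self.0 as usize)
    }
}

fn main() {
    assert_eq!(std::mem::size_of::<CodePtr>(), 4);
    let base = 0x1000 as *const u8;
    assert_eq!(CodePtr(0x20).raw_ptr(base) as usize, 0x1020);
}
```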
|
|
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
|
|
YJIT: Skip adding past_pages_bytes for past pages
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Make compiled_* stats available by default
* Update comment about default counters [ci skip]
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
The ARM backend allows for this so let's make x64 consistent.
Notes:
Merged: https://github.com/ruby/ruby/pull/8263
Merged-By: XrXr
|
|
* YJIT: implement fast path for integer multiplication in opt_mult
* Update yjit/src/codegen.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Implement mul with overflow checking on arm64
* Fix missing semicolon
* Add arm splitting for lshift, rshift, urshift
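The semantics the arm64 lowering has to preserve, sketched with Rust's `checked_mul` (the generated code uses machine instructions, not this helper): multiply on the fast path and side-exit when the product overflows.
```rust
fn fixnum_mul_fast_path(a: i64, b: i64) -> Option<i64> {
    // None means "overflow": bail out of the JIT fast path and let the
    // interpreter promote the result to a Bignum.
    a.checked_mul(b)
}

fn main() {
    assert_eq!(fixnum_mul_fast_path(3, 4), Some(12));
    assert_eq!(fixnum_mul_fast_path(i64::MAX, 2), None);
}
```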
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: implement codegen for rb_int_lshift
* Update yjit/src/asm/x86_64/mod.rs
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
---------
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
|
|
Typo fix for the last commit (1432b37)
|
|
This is used only for arm64's cb.jmp_ptr_bytes().
|
|
* YJIT: Reduce padding on arm64 if --yjit-exec-mem-size <= 128
* YJIT: Define jmp_ptr_bytes on CodeBlock
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
The derived `&mut` from `other_cb()` overlapped with the parameter
`ocb`.
Use `cfg!()` instead of `#[cfg...]` to avoid unused warnings.
Notes:
Merged: https://github.com/ruby/ruby/pull/7611
|
|
Making overlapping `&mut`s triggers Undefined Behavior. This function
previously had them through `cb` and `ocb` aliasing with `self` or live
references in the caller.
To fix the overlap, take `ocb` as a parameter and don't use `get_inline_cb()`
in the body of the function.
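The shape of the fix, as an illustrative sketch (the types are stand-ins): borrow the two buffers disjointly up front and pass both down, instead of deriving one from `self` inside the function.
```rust
struct CodeBlock;
struct Globals {
    inline_cb: CodeBlock,
    outlined_cb: CodeBlock,
}

// After the fix: both code blocks come in as parameters, so the borrow
// checker can prove the two `&mut`s never alias.
fn gen_code(_cb: &mut CodeBlock, _ocb: &mut CodeBlock) { /* ... */ }

fn main() {
    let mut globals = Globals { inline_cb: CodeBlock, outlined_cb: CodeBlock };
    // Split borrow: one &mut per field, no overlap.
    let Globals { inline_cb, outlined_cb } = &mut globals;
    gen_code(inline_cb, outlined_cb);
}
```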
Notes:
Merged: https://github.com/ruby/ruby/pull/7611
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
The code in there has long been commented out. The issues that the
counter used to solve are now solved more comprehensively by the
"runningness" [tracking][1] introduced by Code GC and by [delayed
deallocation][2].
A single counter doesn't fit our current model anyway, since code pages
that may or may not be touched are interleaved.
Just delete the code.
[1]: e7c71c6c9271b0c29f210769159090e17128e740
[2]: a0b0365e905e1ac51998ace7e6fc723406a2f157
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Follows up [Bug #19400]
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Previously on ARM64 Linux systems that use 64 KiB pages
(`CONFIG_ARM64_64K_PAGES=y`), YJIT was panicking on boot due to a failed
assertion.
The assertion was making sure that code GC can free the last code page
that YJIT manages without freeing unrelated memory. YJIT prefers picking
16 KiB as the granularity at which to free code memory, but when the
system can only free at 64 KiB granularity, that is not possible.
The fix is to use the system page size as the code page size when the
system page size is 64 KiB. Continue to use 16 KiB as the code page size
on common systems that use 16/4 KiB pages.
Add asserts to code_gc() and free_page() about code GC's assumptions.
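A sketch of the selection logic described above (illustrative, not the exact code):
```rust
// Prefer 16 KiB code pages, but never go below the granularity at which
// the system lets us unmap memory.
fn code_page_size(system_page_size: usize) -> usize {
    const PREFERRED: usize = 16 * 1024;
    if system_page_size > PREFERRED {
        system_page_size // e.g. 64 KiB kernels (CONFIG_ARM64_64K_PAGES=y)
    } else {
        PREFERRED // common systems with 16/4 KiB pages
    }
}

fn main() {
    assert_eq!(code_page_size(4 * 1024), 16 * 1024);
    assert_eq!(code_page_size(64 * 1024), 64 * 1024);
}
```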
Fixes [Bug #19400]
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
|
|
Since the other cb is in CodegenGlobals, and we want Rust tests to be
self-contained.
Notes:
Merged: https://github.com/ruby/ruby/pull/7227
|
|
This allows for supplying a freed_pages vec in Rust tests. We need it so we
can test scenarios that occur after code GC.
Notes:
Merged: https://github.com/ruby/ruby/pull/7227
|
|
* Add stats so we can keep track of x86 rel32 vs register calls
To know if we get that "prime real estate", as Alan put it.
* Fix bug pointed out by Alan
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* Add job to check clippy lints in CI
* Address all remaining clippy lints
* Check lints on arm64 as well
* Apply latest clippy lints
* Do not exit 0 on clippy warnings
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
|
|
warning: unused variable: `start_addr`
--> ../yjit/src/asm/mod.rs:359:39
|
359 | pub fn remove_comments(&mut self, start_addr: CodePtr, end_addr: CodePtr) {
| ^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_start_addr`
|
= note: `#[warn(unused_variables)]` on by default
warning: unused variable: `end_addr`
--> ../yjit/src/asm/mod.rs:359:60
|
359 | pub fn remove_comments(&mut self, start_addr: CodePtr, end_addr: CodePtr) {
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Certain code page sizes don't work and can cause crashes, so having this
value available as a command-line option is a bit dangerous. Remove it
and turn it into a constant instead.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
* Fix 32 and 16 bit register store in YJIT
Co-Authored-By: Takashi Kokubun <takashikkbn@gmail.com>
* Remove an unnecessary diff
* Reuse an rm_num_bits result
* Use u16::MAX instead
* Update the link
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Just use sturh for 16 bits
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
HashSet::clear() doesn't deallocate the backing buffer or shrink its
capacity. Replace the set with a 0-capacity one instead so we reclaim
some memory on each code GC.
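The behavior in question, demonstrated directly (standard library semantics, not YJIT code):
```rust
use std::collections::HashSet;

fn main() {
    let mut set: HashSet<u64> = (0..100_000).collect();
    set.clear();
    assert!(set.capacity() > 0); // clear() keeps the backing buffer

    set = HashSet::new(); // replace with a 0-capacity set instead
    assert_eq!(set.capacity(), 0); // old buffer is deallocated
}
```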
Notes:
Merged: https://github.com/ruby/ruby/pull/6833
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
YJIT: x86_64: Fix cmp with number where sign bit is set
Before this commit, we were unconditionally treating unsigned ints as
signed ints when counting the number of bits required for representing
the immediate in machine code. When the size of the immediate matches
the size of the other operand, no sign extension happens, so this was
incorrect. `asm.cmp(opnd64, 0x8000_0000)` panicked even though it's
encodable as `CMP r/m32, imm32`. Large shape ids were impacted by this
issue.
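The distinction, sketched out (illustrative helpers, not the actual backend code):
```rust
// Old, incorrect assumption: every immediate is sign-extended, so it
// must fit in the signed 32-bit range.
fn fits_as_signed_imm32(value: u64) -> bool {
    (value as i64) >= i64::from(i32::MIN) && (value as i64) <= i64::from(i32::MAX)
}

// When the other operand is itself 32 bits, no sign extension happens,
// so the unsigned fit is what matters.
fn fits_as_imm32_for_32bit_op(value: u64) -> bool {
    value <= u64::from(u32::MAX)
}

fn main() {
    let shape_id: u64 = 0x8000_0000; // the i32 sign bit is set
    assert!(!fits_as_signed_imm32(shape_id)); // the old check rejected this
    assert!(fits_as_imm32_for_32bit_op(shape_id)); // CMP r/m32, imm32 is fine
}
```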
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Alan Wu <alanwu@ruby-lang.org>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Co-authored-by: Alan Wu <alanwu@ruby-lang.org>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
This commit changes the shape id comparisons to use a 32-bit comparison
rather than a 64-bit one. That means we don't need to load the shape id
into a register on x86 machines.
Given the following program:
```ruby
class Foo
def initialize
@foo = 1
@bar = 1
end
def read
[@foo, @bar]
end
end
foo = Foo.new
foo.read
foo.read
foo.read
foo.read
foo.read
puts RubyVM::YJIT.disasm(Foo.instance_method(:read))
```
The machine code we generated _before_ this change is like this:
```
== BLOCK 1/4, ISEQ RANGE [0,3), 65 bytes ======================
# getinstancevariable
0x559a18623023: mov rax, qword ptr [r13 + 0x18]
# guard object is heap
0x559a18623027: test al, 7
0x559a1862302a: jne 0x559a1862502d
0x559a18623030: cmp rax, 4
0x559a18623034: jbe 0x559a1862502d
# guard shape, embedded, and T_OBJECT
0x559a1862303a: mov rcx, qword ptr [rax]
0x559a1862303d: movabs r11, 0xffff00000000201f
0x559a18623047: and rcx, r11
0x559a1862304a: movabs r11, 0xb000000002001
0x559a18623054: cmp rcx, r11
0x559a18623057: jne 0x559a18625046
0x559a1862305d: mov rax, qword ptr [rax + 0x18]
0x559a18623061: mov qword ptr [rbx], rax
== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 47 bytes ======================
# gen_direct_jmp: fallthrough
# getinstancevariable
# regenerate_branch
# getinstancevariable
# regenerate_branch
0x559a18623064: mov rax, qword ptr [r13 + 0x18]
# guard shape, embedded, and T_OBJECT
0x559a18623068: mov rcx, qword ptr [rax]
0x559a1862306b: movabs r11, 0xffff00000000201f
0x559a18623075: and rcx, r11
0x559a18623078: movabs r11, 0xb000000002001
0x559a18623082: cmp rcx, r11
0x559a18623085: jne 0x559a18625099
0x559a1862308b: mov rax, qword ptr [rax + 0x20]
0x559a1862308f: mov qword ptr [rbx + 8], rax
```
After this change, it's like this:
```
== BLOCK 1/4, ISEQ RANGE [0,3), 41 bytes ======================
# getinstancevariable
0x5560c986d023: mov rax, qword ptr [r13 + 0x18]
# guard object is heap
0x5560c986d027: test al, 7
0x5560c986d02a: jne 0x5560c986f02d
0x5560c986d030: cmp rax, 4
0x5560c986d034: jbe 0x5560c986f02d
# guard shape
0x5560c986d03a: cmp word ptr [rax + 6], 0x19
0x5560c986d03f: jne 0x5560c986f046
0x5560c986d045: mov rax, qword ptr [rax + 0x10]
0x5560c986d049: mov qword ptr [rbx], rax
== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 23 bytes ======================
# gen_direct_jmp: fallthrough
# getinstancevariable
# regenerate_branch
# getinstancevariable
# regenerate_branch
0x5560c986d04c: mov rax, qword ptr [r13 + 0x18]
# guard shape
0x5560c986d050: cmp word ptr [rax + 6], 0x19
0x5560c986d055: jne 0x5560c986f099
0x5560c986d05b: mov rax, qword ptr [rax + 0x18]
0x5560c986d05f: mov qword ptr [rbx + 8], rax
```
The first ivar read is a bit more complex, but the second ivar read is
much simpler. I think eventually we could teach the context about the
shape, then emit only one shape guard.
Notes:
Merged: https://github.com/ruby/ruby/pull/6737
|
|
* YJIT: Always encode Opnd::Value in 64 bits on x86_64 for GC offsets
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
* Introduce heap_object_p
* Leave original mov intact
* Remove unneeded branches
* Add a test for movabs
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
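For context, a sketch of the fixed-width form (standard x86-64 encoding, not YJIT's emitter): `movabs rax, imm64` is REX.W plus opcode B8 followed by all 8 immediate bytes, so the slot stays patchable in place when the GC moves the referenced object.
```rust
fn movabs_rax(imm: u64) -> Vec<u8> {
    let mut bytes = vec![0x48, 0xB8]; // REX.W, B8+rd with rd = rax
    bytes.extend_from_slice(&imm.to_le_bytes());
    bytes
}

fn main() {
    // Always 10 bytes, even for small values, so the GC can rewrite the
    // pointer without shifting any surrounding code.
    assert_eq!(movabs_rax(0x1234).len(), 10);
}
```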
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|