summaryrefslogtreecommitdiff
path: root/yjit/src/stats.rs
AgeCommit message (Collapse)Author
2025-12-16merge revision(s) 9168cad4d63a5d281d443bde4edea6be213b0b25: [Backport #21266]Takashi Kokubun
[PATCH] YJIT: Bail out if proc would be stored above stack top Fixes [Bug #21266].
2025-12-01YJIT: Abort expandarray optimization if method_missing is definedRandy Stauner
Fixes: [Bug #21707] [AW: rewrote comments] Co-authored-by: Alan Wu <alanwu@ruby-lang.org>
2024-12-13YJIT: Speculate block arg for `c_func_method(&nil)` calls (#12326)Alan Wu
A good amount of call sites always pass nil as block argument, but the nil doesn't show up in the context. Put a runtime guard for those cases to handle it. Particular relevant for the `ruby-lsp` benchmark in `yjit-bench`. Up to a 2% speedup across headline benchmarks. Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org> Co-authored-by: Kevin Menard <kevin@nirvdrum.com> Co-authored-by: Randy Stauner <randy.stauner@shopify.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-12-04YJIT: track time since initialization (#12263)Maxime Chevalier-Boisvert
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-20YJIT: Make compilation_failure a default stat (#12128)Alan Wu
It's good to monitor compilation failures. Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-14YJIT: Specialize String#dup (#12090)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-14YJIT: Specialize Integer#pred (#12082)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-13YJIT: Add inline_block_count stat (#12081)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-13YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments (#12069)Randy Stauner
* YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments String#[] is in the top few C calls of several YJIT benchmarks: liquid-compile rubocop mail sudoku This speeds up these benchmarks by 1-2%. * YJIT: Try harder to get type info for `String#[]` In the large generated code of the mail gem the context doesn't have the type info. In that case if we peek at the stack and add a guard we can still apply the specialization and it speeds up the mail benchmark by 5%. Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com> --------- Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-10-23YJIT: Check when gen_branch() failsAlan Wu
We got some core dumps in the wild where a PendingBranch had everything as None, leading to a panic unwrapping in PendingBranch::into_branch(). This happened while compiling a `branchif`. It seems that the only way this can happen is when core::gen_branch() fails, but not due to OOM. We wouldn't have reach into_branch() when OOM, and the only way to not leave markers that would've set the branch's start_addr to some value in gen_branch() is for set_target() to fail, causing an early return. Unfortunately, it's hard to tell the exact sequence of events that led to this situation, but regardless, the dumps show us that we should check for errors in gen_branch(). Because gen_branch() is used deep in the stack during compilation (e.g. guard_known_class() -> jit_chain_guard() -> gen_branch()), it'd be bad for compile speed to propagate the error everywhere, not to mention the massive patch required. Opt for a flag checked near the end of compilation. Notes: Merged: https://github.com/ruby/ruby/pull/11938 Merged-By: XrXr
2024-10-07YJIT: Add --yjit-mem-size option (#11810)Takashi Kokubun
* YJIT: Add --yjit-mem-size option * Improve --help * s/the region/this virtual memory region/ Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> --------- Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-09-25YJIT: Cache Context decoding (#11680)Takashi Kokubun
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-09-17YJIT: Accept key for runtime_stats to return only that stat (#11536)Randy Stauner
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-08-27YJIT: Encode doubles to VALUE objects and move stat generation to rust (#11388)Randy Stauner
* YJIT: Encode doubles to VALUE objects and move stat generation to rust Stats that can now be generated from rust have been moved there. * Move object_shape_count call for runtime_stats to rust This reduces the ruby method to a single primitive. * Change hash_aset_usize from macro to function Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-08-02YJIT: Enhance the `String#<<` method substitution to handle integer ↵Kevin Menard
codepoint values. (#11032) * Document why we need to explicitly spill registers. * Simplify passing a byte value to `str_buf_cat`. * YJIT: Enhance the `String#<<` method substitution to handle integer codepoint values. * YJIT: Move runtime type check into YJIT. Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum. Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-06-18Optimized forwarding callers and calleesAaron Patterson
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls. Calls it optimizes look like this: ```ruby def bar(a) = a def foo(...) = bar(...) # optimized foo(123) ``` ```ruby def bar(a) = a def foo(...) = bar(1, 2, ...) # optimized foo(123) ``` ```ruby def bar(*a) = a def foo(...) list = [1, 2] bar(*list, ...) # optimized end foo(123) ``` All variants of the above but using `super` are also optimized, including a bare super like this: ```ruby def foo(...) super end ``` This patch eliminates intermediate allocations made when calling methods that accept `...`. We can observe allocation elimination like this: ```ruby def m x = GC.stat(:total_allocated_objects) yield GC.stat(:total_allocated_objects) - x end def bar(a) = a def foo(...) = bar(...) def test m { foo(123) } end test p test # allocates 1 object on master, but 0 objects with this patch ``` ```ruby def bar(a, b:) = a + b def foo(...) = bar(...) def test m { foo(1, b: 2) } end test p test # allocates 2 objects on master, but 0 objects with this patch ``` How does it work? ----------------- This patch works by using a dynamic stack size when passing forwarded parameters to callees. The caller's info object (known as the "CI") contains the stack size of the parameters, so we pass the CI object itself as a parameter to the callee. When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee. The CI at the forwarded call site is adjusted using information from the caller's CI. I think this description is kind of confusing, so let's walk through an example with code. ```ruby def delegatee(a, b) = a + b def delegator(...) delegatee(...) # CI2 (FORWARDING) end def caller delegator(1, 2) # CI1 (argc: 2) end ``` Before we call the delegator method, the stack looks like this: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 4| # | 5| delegatee(...) # CI2 (FORWARDING) | 6| end | 7| | 8| def caller | -> 9| delegator(1, 2) # CI1 (argc: 2) | 10| end | ``` The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in to `delegator`, it writes `CI1` on to the stack as a local variable for the `delegator` method. The `delegator` method has a special local called `...` that holds the caller's CI object. Here is the ISeq disasm fo `delegator`: ``` == disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] "..."@0 0000 putself ( 1)[LiCa] 0001 getlocal_WC_0 "..."@0 0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil 0006 leave [Re] ``` The local called `...` will contain the caller's CI: CI1. Here is the stack when we enter `delegator`: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 -> 4| # | CI1 (argc: 2) 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me 6| end | specval 7| | type 8| def caller | 9| delegator(1, 2) # CI1 (argc: 2) | 10| end | ``` The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to memcopy the caller's stack before calling `delegatee`. In this case, it will memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much memory to copy from the caller because `CI1` contains stack size information (argc: 2). Before executing the `send` instruction, we push `...` on the stack. The `send` instruction pops `...`, and because it is tagged with `FORWARDING`, it knows to memcopy (using the information in the CI it just popped): ``` == disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] "..."@0 0000 putself ( 1)[LiCa] 0001 getlocal_WC_0 "..."@0 0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil 0006 leave [Re] ``` Instruction 001 puts the caller's CI on the stack. `send` is tagged with FORWARDING, so it reads the CI and _copies_ the callers stack to this stack: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 4| # | CI1 (argc: 2) -> 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me 6| end | specval 7| | type 8| def caller | self 9| delegator(1, 2) # CI1 (argc: 2) | 1 10| end | 2 ``` The "FORWARDING" call site combines information from CI1 with CI2 in order to support passing other values in addition to the `...` value, as well as perfectly forward splat args, kwargs, etc. Since we're able to copy the stack from `caller` in to `delegator`'s stack, we can avoid allocating objects. I want to do this to eliminate object allocations for delegate methods. My long term goal is to implement `Class#new` in Ruby and it uses `...`. I was able to implement `Class#new` in Ruby [here](https://github.com/ruby/ruby/pull/9289). If we adopt the technique in this patch, then we can optimize allocating objects that take keyword parameters for `initialize`. For example, this code will allocate 2 objects: one for `SomeObject`, and one for the kwargs: ```ruby SomeObject.new(foo: 1) ``` If we combine this technique, plus implement `Class#new` in Ruby, then we can reduce allocations for this common operation. Co-Authored-By: John Hawthorn <john@hawthorn.email> Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
2024-06-13YJIT: Delete otherwise-empty defer_compilation() blocksAlan Wu
Calls to defer_compilation() leave behind a stub and a `struct Block` that we retain. If the block is empty, it only exits to hold the `struct Branch` that the stub needs. This patch transplants the branch out of the empty block into the newly generated block when the defer_compilation() stub is hit, and deletes the empty block to save memory. To assist the transplantation, `Block::outgoing` is now a `MutableBranchList`, and `Branch::Block` now in a `Cell`. These types don't incur a size cost. On the `lobsters` benchmark, `yjit_alloc_size` is roughly 98% of what it was before the change. Co-authored-by: Kevin Menard <kevin.menard@shopify.com> Co-authored-by: Randy Stauner <randy@r4s6.net> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
2024-06-12YJIT: add context cache hits stat (#10979)Maxime Chevalier-Boisvert
* YJIT: add context cache hits stat This stat should make more sense when it comes to interpreting the effectiveness of the cache on large deployed apps.
2024-06-11YJIT: Make num_contexts_encoded a default counterTakashi Kokubun
2024-06-11YJIT: add context cache size stat, lazily allocate cacheMaxime Chevalier-Boisvert
* YJIT: add context cache size stat * Allocate the context cache in a box so CRuby doesn't pay overhead * Add an extra debug assertion
2024-06-07YJIT: implement variable-length context encoding scheme (#10888)Maxime Chevalier-Boisvert
* Implement BitVector data structure for variable-length context encoding * Rename method to make intent clearer * Rename write_uint => push_uint to make intent clearer * Implement debug trait for BitVector * Fix bug in BitVector::read_uint_at(), enable more tests * Add one more test for good measure * Start sketching Context::encode() * Progress on variable length context encoding * Add tests. Fix bug. * Encode stack state * Add comments. Try to estimate context encoding size. * More compact encoding for stack size * Commit before rebase * Change Context::encode() to take a BitVector as input * Refactor BitVector::read_uint(), add helper read functions * Implement Context::decode() function. Add test. * Fix bug, add tests * Rename methods * Add Context::encode() and decode() methods using global data * Make encode and decode methods use u32 indices * Refactor YJIT to use variable-length context encoding * Tag functions as allow unused * Add a simple caching mechanism and stats for bytes per context etc * Add comments, fix formatting * Grow vector of bytes by 1.2x instead of 2x * Add debug assert to check round-trip encoding-decoding * Take some rustfmt formatting * Add decoded_from field to Context to reuse previous encodings * Remove olde context stats * Re-add stack_size assert * Disable decoded_from optimization for now
2024-05-28YJIT: limit size of call count stats dict (#10858)Maxime Chevalier-Boisvert
* YJIT: limit size of call count stats dict Someone reported that logs were getting bloated because the ISEQ and C call count dicts were huge, since they include all of the call sites. I wrote code on the Rust size to limit the size of the dict to avoid this problem. The size limit is hardcoded at 20, but I figure this is probably fine? * Fix bug reported by Kokubun.
2024-05-06YJIT: Fix comment and counter in rb_yjit_invalidate_ep_is_bp() (#10722)Alan Wu
`mem::take` substitutes an empty instance which makes `jit.ep_is_bp()` return false.
2024-04-03YJIT: Suppress warn(static_mut_refs) (#10440)Takashi Kokubun
2024-03-28YJIT: add iseq_alloc_count to stats (#10398)Maxime Chevalier-Boisvert
* YJIT: add iseq_alloc_count to stats * Remove an empty line --------- Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2024-03-25YJIT: Propagate Array, Hash, and String classes (#10323)Takashi Kokubun
2024-03-18YJIT: Support arity=-2 cfuncs (#10268)Alan Wu
This type of cfuncs shows up as consume a lot of cycles in profiles of the lobsters benchmark, even though in the stats they don't happen that frequently. Might be a bug in the profiling, but these calls are not too bad to support, so might as well do it. Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
2024-03-13YJIT: Fallback cfunc varg splat for ruby2_keywords (#10226)Takashi Kokubun
2024-03-06YJIT: String#getbyte codegen (#10188)Maxime Chevalier-Boisvert
* WIP getbyte implementation * WIP String#getbyte implementation * Fix whitespace in stats.rs * fix? * Fix whitespace, add comment --------- Co-authored-by: Aaron Patterson <aaron.patterson@shopify.com>
2024-02-27YJIT: Support splat with C methods with -1 arityAlan Wu
Usually we deal with splats by speculating that they're of a specific size. In this case, the C method takes a pointer and a length, so we can support changing sizes just fine.
2024-02-23YJIT: Lazily push a frame for specialized C funcs (#10080)Takashi Kokubun
* YJIT: Lazily push a frame for specialized C funcs Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> * Fix a comment on pc_to_cfunc * Rename rb_yjit_check_pc to rb_yjit_lazy_push_frame * Rename it to jit_prepare_lazy_frame_call * Fix a typo * Optimize String#getbyte as well * Optimize String#byteslice as well --------- Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
2024-02-22YJIT: Optimize attr_writer (#9986)Takashi Kokubun
* YJIT: Optimize attr_writer * Comment about StackOpnd vs SelfOpnd
2024-02-20YJIT: Support `**nil` for cfuncsAlan Wu
Similar to the iseq call support. Fairly straight forward.
2024-02-16YJIT: Remove unused countersAlan Wu
2024-02-16YJIT: Support `**nil`Alan Wu
This adds YJIT support for VM_CALL_KW_SPLAT with nil, specifically for when we already know from the context that it's done with a nil. This is enough to support forwarding with `...` when there no keyword arguments are present. Amend the kw_rest support to propagate the type of the parameter to help with this. Test interactions with splat, since the splat array sits lower on the stack when a kw_splat argument is present.
2024-02-15YJIT: Pass nil to anonymous kwrest when empty (#9972)Alan Wu
This is the same optimization as e4272fd29 ("Avoid allocation when passing no keywords to anonymous kwrest methods") but for YJIT. For anonymous kwrest parameters, nil is just as good as an empty hash. On the usage side, update `splatkw` to handle `nil` with a leaner path.
2024-02-14YJIT: Simplify Kernel#send guards and admit more cases (#9956)Alan Wu
Previously, our compile time check rejected dynamic symbols (e.g. what String#to_sym could return) even though we could handle them just fine. The runtime guards for the type of method name was also overly restrictive and didn't accept dynamic symbols. Fold the type check into the rb_get_symbol_id() and take advantage of the guard already checking for 0. This also avoids generating the same call twice in case the same method name is presented as different types.
2024-02-12YJIT: Add support for `**kwrest` parametersAlan Wu
Now that `...` uses `**kwrest` instead of regular splat and ruby2keywords, we need to support these type of methods to support `...` well.
2024-02-09YJIT: Add top ISEQ call counts to --yjit-stats (#9906)Takashi Kokubun
2024-02-09YJIT: Fallback megamorphic opt_case_dispatch (#9894)Takashi Kokubun
2024-02-08YJIT: Allow tracing a counted exit (#9890)Takashi Kokubun
* YJIT: Allow tracing a counted exit * Avoid clobbering caller-saved registers
2024-02-08YJIT: Report invalidation counts in non-stats mode (#9878)John Hawthorn
The `invalidation_count` and `invalidate_*` counters are all incremented using `incr_counter!` without a guard on stats mode, so they can be made always available. This could be to helpful in investigating where, how often, and what types of invalidations are occurring in a production system.
2024-02-05YJIT: No need to reject splat+zsuperAlan Wu
There is nothing special about argument handling when it comes to zsuper if you look around in the VM. Everything passes removing these fallback reasons. It was ~16% on `railsbench`.
2024-01-31YJIT: Add codegen for Float arithmetics (#9774)Takashi Kokubun
* YJIT: Add codegen for Float arithmetics * Add Flonum and Fixnum tests
2024-01-30YJIT: Specialize splatkw on T_HASH (#9764)Takashi Kokubun
* YJIT: Specialize splatkw on T_HASH * Fix a typo Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> * Fix a few more comments --------- Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2024-01-25YJIT: Add a counter for invokebuiltin exits (#9696)Takashi Kokubun
2024-01-24YJIT: Avoid leaks by skipping objects with a singleton classAlan Wu
For receiver with a singleton class, there are multiple vectors YJIT can end up retaining the object. There is a path in jit_guard_known_klass() that bakes the receiver into the code, and the object could also be kept alive indirectly through a path starting at the CME object baked into the code. To avoid these leaks, avoid compiling calls on objects with a singleton class. See: https://github.com/Shopify/ruby/issues/552 [Bug #20209]
2024-01-23YJIT: Fix ruby2_keywords splat+rest and drop bogus checksAlan Wu
YJIT didn't guard for ruby2_keywords hash in case of splat calls that land in methods with a rest parameter, creating incorrect results. The compile-time checks didn't correspond to any actual effects of ruby2_keywords, so it was masking this bug and YJIT was needlessly refusing to compile some code. About 16% of fallback reasons in `lobsters` was due to the ISeq check. We already handle the tagging part with exit_if_supplying_kw_and_has_no_kw() and should now have a dynamic guard for all splat cases. Note for backporting: You also need 7f51959ff1. [Bug #20195]
2024-01-23YJIT: Allow inlining ISEQ calls with a block (#9622)Takashi Kokubun
* YJIT: Allow inlining ISEQ calls with a block * Leave a TODO comment about u16 inline_block
2024-01-22YJIT: Drop extra arguments passed by yield (#9596)Alan Wu
Support dropping extra arguments passed by `yield` in blocks. For example `10.times { work }` drops the count argument. This is common enough that it's about 3% of fallback reasons in `lobsters`. Only support simple cases where the surplus arguments are at the top of the stack, that way they just need to be popped, which takes no work.