| Age | Commit message (Collapse) | Author |
|
[PATCH] YJIT: Bail out if proc would be stored above stack top
Fixes [Bug #21266].
|
|
Fixes: [Bug #21707]
[AW: rewrote comments]
Co-authored-by: Alan Wu <alanwu@ruby-lang.org>
|
|
A good amount of call sites always pass nil as block argument, but the
nil doesn't show up in the context. Put a runtime guard for those
cases to handle it. Particular relevant for the `ruby-lsp` benchmark in
`yjit-bench`. Up to a 2% speedup across headline benchmarks.
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Co-authored-by: Kevin Menard <kevin@nirvdrum.com>
Co-authored-by: Randy Stauner <randy.stauner@shopify.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
It's good to monitor compilation failures.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments
String#[] is in the top few C calls of several YJIT benchmarks:
liquid-compile rubocop mail sudoku
This speeds up these benchmarks by 1-2%.
* YJIT: Try harder to get type info for `String#[]`
In the large generated code of the mail gem the context doesn't have
the type info. In that case if we peek at the stack and add a guard
we can still apply the specialization
and it speeds up the mail benchmark by 5%.
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
We got some core dumps in the wild where a PendingBranch had everything
as None, leading to a panic unwrapping in PendingBranch::into_branch().
This happened while compiling a `branchif`.
It seems that the only way this can happen is when core::gen_branch()
fails, but not due to OOM. We wouldn't have reach into_branch() when
OOM, and the only way to not leave markers that would've set the
branch's start_addr to some value in gen_branch() is for set_target() to
fail, causing an early return.
Unfortunately, it's hard to tell the exact sequence of events that led
to this situation, but regardless, the dumps show us that we should
check for errors in gen_branch().
Because gen_branch() is used deep in the stack during compilation (e.g.
guard_known_class() -> jit_chain_guard() -> gen_branch()), it'd be bad
for compile speed to propagate the error everywhere, not to mention the
massive patch required. Opt for a flag checked near the end of
compilation.
Notes:
Merged: https://github.com/ruby/ruby/pull/11938
Merged-By: XrXr
|
|
* YJIT: Add --yjit-mem-size option
* Improve --help
* s/the region/this virtual memory region/
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Encode doubles to VALUE objects and move stat generation to rust
Stats that can now be generated from rust have been moved there.
* Move object_shape_count call for runtime_stats to rust
This reduces the ruby method to a single primitive.
* Change hash_aset_usize from macro to function
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
codepoint values. (#11032)
* Document why we need to explicitly spill registers.
* Simplify passing a byte value to `str_buf_cat`.
* YJIT: Enhance the `String#<<` method substitution to handle integer codepoint values.
* YJIT: Move runtime type check into YJIT.
Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.
Calls it optimizes look like this:
```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```
```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```
```ruby
def bar(*a) = a
def foo(...)
list = [1, 2]
bar(*list, ...) # optimized
end
foo(123)
```
All variants of the above but using `super` are also optimized, including a bare super like this:
```ruby
def foo(...)
super
end
```
This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:
```ruby
def m
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def bar(a) = a
def foo(...) = bar(...)
def test
m { foo(123) }
end
test
p test # allocates 1 object on master, but 0 objects with this patch
```
```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)
def test
m { foo(1, b: 2) }
end
test
p test # allocates 2 objects on master, but 0 objects with this patch
```
How does it work?
-----------------
This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.
I think this description is kind of confusing, so let's walk through an example with code.
```ruby
def delegatee(a, b) = a + b
def delegator(...)
delegatee(...) # CI2 (FORWARDING)
end
def caller
delegator(1, 2) # CI1 (argc: 2)
end
```
Before we call the delegator method, the stack looks like this:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # |
5| delegatee(...) # CI2 (FORWARDING) |
6| end |
7| |
8| def caller |
-> 9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method. The `delegator` method has a special local called `...`
that holds the caller's CI object.
Here is the ISeq disasm fo `delegator`:
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
The local called `...` will contain the caller's CI: CI1.
Here is the stack when we enter `delegator`:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
-> 4| # | CI1 (argc: 2)
5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller |
9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`. In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).
Before executing the `send` instruction, we push `...` on the stack. The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
Instruction 001 puts the caller's CI on the stack. `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # | CI1 (argc: 2)
-> 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller | self
9| delegator(1, 2) # CI1 (argc: 2) | 1
10| end | 2
```
The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.
Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.
I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.
I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.
For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:
```ruby
SomeObject.new(foo: 1)
```
If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
|
|
Calls to defer_compilation() leave behind a stub and a `struct Block`
that we retain. If the block is empty, it only exits to hold the
`struct Branch` that the stub needs.
This patch transplants the branch out of the empty block into the newly
generated block when the defer_compilation() stub is hit, and deletes
the empty block to save memory.
To assist the transplantation, `Block::outgoing` is now a
`MutableBranchList`, and `Branch::Block` now in a `Cell`. These types
don't incur a size cost.
On the `lobsters` benchmark, `yjit_alloc_size` is roughly 98% of what
it was before the change.
Co-authored-by: Kevin Menard <kevin.menard@shopify.com>
Co-authored-by: Randy Stauner <randy@r4s6.net>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
|
|
* YJIT: add context cache hits stat
This stat should make more sense when it comes to interpreting
the effectiveness of the cache on large deployed apps.
|
|
|
|
* YJIT: add context cache size stat
* Allocate the context cache in a box so CRuby doesn't pay overhead
* Add an extra debug assertion
|
|
* Implement BitVector data structure for variable-length context encoding
* Rename method to make intent clearer
* Rename write_uint => push_uint to make intent clearer
* Implement debug trait for BitVector
* Fix bug in BitVector::read_uint_at(), enable more tests
* Add one more test for good measure
* Start sketching Context::encode()
* Progress on variable length context encoding
* Add tests. Fix bug.
* Encode stack state
* Add comments. Try to estimate context encoding size.
* More compact encoding for stack size
* Commit before rebase
* Change Context::encode() to take a BitVector as input
* Refactor BitVector::read_uint(), add helper read functions
* Implement Context::decode() function. Add test.
* Fix bug, add tests
* Rename methods
* Add Context::encode() and decode() methods using global data
* Make encode and decode methods use u32 indices
* Refactor YJIT to use variable-length context encoding
* Tag functions as allow unused
* Add a simple caching mechanism and stats for bytes per context etc
* Add comments, fix formatting
* Grow vector of bytes by 1.2x instead of 2x
* Add debug assert to check round-trip encoding-decoding
* Take some rustfmt formatting
* Add decoded_from field to Context to reuse previous encodings
* Remove olde context stats
* Re-add stack_size assert
* Disable decoded_from optimization for now
|
|
* YJIT: limit size of call count stats dict
Someone reported that logs were getting bloated because the
ISEQ and C call count dicts were huge, since they include all
of the call sites. I wrote code on the Rust size to limit the
size of the dict to avoid this problem. The size limit is
hardcoded at 20, but I figure this is probably fine?
* Fix bug reported by Kokubun.
|
|
`mem::take` substitutes an empty instance which makes `jit.ep_is_bp()`
return false.
|
|
|
|
* YJIT: add iseq_alloc_count to stats
* Remove an empty line
---------
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
|
|
|
|
This type of cfuncs shows up as consume a lot of cycles in profiles of
the lobsters benchmark, even though in the stats they don't happen that
frequently. Might be a bug in the profiling, but these calls are not
too bad to support, so might as well do it.
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
|
|
|
|
* WIP getbyte implementation
* WIP String#getbyte implementation
* Fix whitespace in stats.rs
* fix?
* Fix whitespace, add comment
---------
Co-authored-by: Aaron Patterson <aaron.patterson@shopify.com>
|
|
Usually we deal with splats by speculating that they're of a specific
size. In this case, the C method takes a pointer and a length, so
we can support changing sizes just fine.
|
|
* YJIT: Lazily push a frame for specialized C funcs
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
* Fix a comment on pc_to_cfunc
* Rename rb_yjit_check_pc to rb_yjit_lazy_push_frame
* Rename it to jit_prepare_lazy_frame_call
* Fix a typo
* Optimize String#getbyte as well
* Optimize String#byteslice as well
---------
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
|
|
* YJIT: Optimize attr_writer
* Comment about StackOpnd vs SelfOpnd
|
|
Similar to the iseq call support. Fairly straight forward.
|
|
|
|
This adds YJIT support for VM_CALL_KW_SPLAT with nil, specifically for
when we already know from the context that it's done with a nil. This is
enough to support forwarding with `...` when there no keyword arguments
are present.
Amend the kw_rest support to propagate the type of the parameter to help
with this. Test interactions with splat, since the splat array sits
lower on the stack when a kw_splat argument is present.
|
|
This is the same optimization as e4272fd29 ("Avoid allocation when
passing no keywords to anonymous kwrest methods") but for YJIT. For
anonymous kwrest parameters, nil is just as good as an empty hash.
On the usage side, update `splatkw` to handle `nil` with a leaner path.
|
|
Previously, our compile time check rejected dynamic symbols (e.g. what
String#to_sym could return) even though we could handle them just fine.
The runtime guards for the type of method name was also overly
restrictive and didn't accept dynamic symbols.
Fold the type check into the rb_get_symbol_id() and take advantage of
the guard already checking for 0. This also avoids generating the same
call twice in case the same method name is presented as different
types.
|
|
Now that `...` uses `**kwrest` instead of regular splat and
ruby2keywords, we need to support these type of methods to
support `...` well.
|
|
|
|
|
|
* YJIT: Allow tracing a counted exit
* Avoid clobbering caller-saved registers
|
|
The `invalidation_count` and `invalidate_*` counters are all incremented
using `incr_counter!` without a guard on stats mode, so they can be made
always available.
This could be to helpful in investigating where, how often, and what
types of invalidations are occurring in a production system.
|
|
There is nothing special about argument handling when it comes to zsuper
if you look around in the VM. Everything passes removing these fallback
reasons. It was ~16% on `railsbench`.
|
|
* YJIT: Add codegen for Float arithmetics
* Add Flonum and Fixnum tests
|
|
* YJIT: Specialize splatkw on T_HASH
* Fix a typo
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Fix a few more comments
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
|
|
|
|
For receiver with a singleton class, there are multiple vectors YJIT can
end up retaining the object. There is a path in jit_guard_known_klass()
that bakes the receiver into the code, and the object could also be kept
alive indirectly through a path starting at the CME object baked into
the code.
To avoid these leaks, avoid compiling calls on objects with a singleton
class.
See: https://github.com/Shopify/ruby/issues/552
[Bug #20209]
|
|
YJIT didn't guard for ruby2_keywords hash in case of splat calls that
land in methods with a rest parameter, creating incorrect results.
The compile-time checks didn't correspond to any actual effects of
ruby2_keywords, so it was masking this bug and YJIT was needlessly
refusing to compile some code. About 16% of fallback reasons in
`lobsters` was due to the ISeq check.
We already handle the tagging part with
exit_if_supplying_kw_and_has_no_kw() and should now have a dynamic guard
for all splat cases.
Note for backporting: You also need 7f51959ff1.
[Bug #20195]
|
|
* YJIT: Allow inlining ISEQ calls with a block
* Leave a TODO comment about u16 inline_block
|
|
Support dropping extra arguments passed by `yield` in blocks. For
example `10.times { work }` drops the count argument. This is common
enough that it's about 3% of fallback reasons in `lobsters`.
Only support simple cases where the surplus arguments are at the top of
the stack, that way they just need to be popped, which takes no work.
|