| Age | Commit message | Author |
|
For <https://bugs.ruby-lang.org/issues/21716>, the panic looks like
some sort of third party memory corruption, with YJIT taking the fall.
At the point of this assert, the assembler has been dropped, so there's
nothing in YJIT's code other than JITState that could be holding on to
these transient `PendingBranchRef`s.
A strong count of more than a handful, or a non-zero weak count,
shows that someone in the process (likely some native extension)
corrupted the Rc's counts.
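To illustrate the kind of sanity check described above, here is a minimal sketch built on `Rc::strong_count` and `Rc::weak_count` from the standard library. The `PendingBranch` struct, the `MAX_EXPECTED_STRONG` threshold, and the `counts_look_sane` helper are hypothetical names chosen for this example and are not taken from YJIT.

```rust
use std::rc::Rc;

// Hypothetical stand-in for YJIT's pending branch type.
struct PendingBranch {
    id: u32,
}

// Assumed upper bound on legitimate owners; the real threshold may differ.
const MAX_EXPECTED_STRONG: usize = 4;

/// Returns true when the counts are plausible for a transient `Rc`
/// that is never downgraded to a `Weak`.
fn counts_look_sane(branch: &Rc<PendingBranch>) -> bool {
    Rc::strong_count(branch) <= MAX_EXPECTED_STRONG && Rc::weak_count(branch) == 0
}

fn main() {
    let branch = Rc::new(PendingBranch { id: 1 });
    let second_owner = Rc::clone(&branch);
    assert!(counts_look_sane(&branch));

    // A live `Weak` makes the weak count non-zero, which this sketch
    // treats as a sign that something unexpected touched the counts.
    let weak = Rc::downgrade(&branch);
    assert!(!counts_look_sane(&branch));

    drop(weak);
    drop(second_owner);
    assert!(counts_look_sane(&branch));
    println!("branch {} has sane counts", branch.id);
}
```

In safe Rust the counts can only change through `clone`, `downgrade`, and `drop`, which is why implausible values point to something outside the owning code.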
|
|
[DOC] Fix typos in YJIT core
|
|
I'm seeing some memory corruption in the wild on blocks in
`IseqPayload::dead_blocks`. Unfortunately, I can't recreate the issue
(for all I know, it could be some external code corrupting YJIT's
memory), but establishing a link between dead blocks and live blocks seems
fishy enough that we ought to prevent it. When it did happen, it might've
interacted badly with Code GC and the optimization to immediately
free empty blocks.
|
|
Addressed some suggestions from clippy that made sense to me.
|
|
ZJIT: Remove JITed code after TracePoint is enabled
|
|
Because we have set all code memory to writable before the reference
updating phase, we can use raw memory writes directly.
|
|
Some GC modules, notably MMTk, support parallel GC, i.e. multiple GC
threads work in parallel during a GC. Currently, when two GC threads
scan two iseq objects simultaneously when YJIT is enabled, both threads
will attempt to borrow `CodeBlock::mem_block`, which will result in
a panic.
This commit makes one part of the change.
We now set the YJIT code memory to writable in bulk before the
reference-updating phase, and reset it to executable in bulk after the
reference-updating phase. Previously, YJIT lazily set memory pages
writable while updating object references embedded in JIT-compiled
machine code, and set the memory back to executable by calling
`mark_all_executable`. This approach is inherently unfriendly to
parallel GC because (1) it borrows `CodeBlock::mem_block`, and (2) it
sets the whole `CodeBlock` as executable which races with other GC
threads that are updating other iseq objects. It also has performance
overhead due to the frequent invocation of system calls. We now set the
permission of all the code memory in bulk before and after the reference
updating phase. Multiple GC threads can now perform raw memory writes
in parallel. We should also see performance improvement during moving
GC because of the reduced number of `mprotect` system calls.
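The difference in system call counts can be sketched with a small simulation. This is only a model under stated assumptions: `CodeMemory`, `Perm`, `lazy_update`, and `bulk_update` are hypothetical names, permission changes are counted rather than performed, and no real `mprotect` call is made.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Perm {
    Executable,
    Writable,
}

// Hypothetical model of JIT code pages; it only tracks permissions.
struct CodeMemory {
    pages: Vec<Perm>,
    protect_calls: usize, // number of simulated mprotect calls
}

impl CodeMemory {
    fn new(num_pages: usize) -> Self {
        CodeMemory { pages: vec![Perm::Executable; num_pages], protect_calls: 0 }
    }

    // One simulated system call covering the whole region.
    fn set_all(&mut self, perm: Perm) {
        self.pages.fill(perm);
        self.protect_calls += 1;
    }

    // Lazy scheme: flip a page to writable the first time it is written.
    fn write_lazy(&mut self, page: usize) {
        if self.pages[page] != Perm::Writable {
            self.pages[page] = Perm::Writable;
            self.protect_calls += 1;
        }
    }

    // Bulk scheme: a raw write that needs no permission bookkeeping.
    fn write_raw(&self, page: usize) {
        assert_eq!(self.pages[page], Perm::Writable);
    }
}

fn lazy_update(num_pages: usize, patched: &[usize]) -> usize {
    let mut mem = CodeMemory::new(num_pages);
    for &page in patched {
        mem.write_lazy(page);
    }
    mem.set_all(Perm::Executable);
    mem.protect_calls
}

fn bulk_update(num_pages: usize, patched: &[usize]) -> usize {
    let mut mem = CodeMemory::new(num_pages);
    mem.set_all(Perm::Writable);
    for &page in patched {
        mem.write_raw(page);
    }
    mem.set_all(Perm::Executable);
    mem.protect_calls
}

fn main() {
    let patched = [0, 3, 3, 7, 9];
    println!("lazy: {} calls", lazy_update(16, &patched)); // lazy: 5 calls
    println!("bulk: {} calls", bulk_update(16, &patched)); // bulk: 2 calls
}
```

Note that `write_raw` takes `&self`, which mirrors the point that writers in the bulk scheme no longer need exclusive access to the permission state.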
|
|
|
|
Avoid generating an infinite loop in the case where:
1. Block `first` is adjacent to block `second`, and the branch from `first` to
`second` is a fallthrough, and
2. Block `second` immediately exits to the interpreter, and
3. Block `second` is invalidated and YJIT is OOM
While pondering how to fix this, I think I've stumbled on another related edge case:
1. Block `incoming_one` and `incoming_two` both branch to block `second`. Block
`incoming_one` has a fallthrough
2. Block `second` immediately exits to the interpreter (so it starts with its exit)
3. When Block `second` is invalidated, the incoming fallthrough branch from
`incoming_one` might be rewritten first, which overwrites the start of block
`second` with a jump to a new branch stub.
4. YJIT runs out of memory
5. The incoming branch from `incoming_two` is then rewritten, but because we're
OOM we can't generate a new stub, so we use `second`'s exit as the branch
target. However `second`'s exit was already overwritten with a jump to the
branch stub for `incoming_one`, so `incoming_two` will end up jumping to
`incoming_one`'s branch stub.
Fixes [Bug #21257]
Notes:
Merged: https://github.com/ruby/ruby/pull/13186
Merged-By: XrXr
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/12310
|
|
* YJIT: Spill/load argument registers to reuse blocks
* Mention the immediate function name
* Explain the context behind spill/load operations
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
In [1], we started checking for gen_branch failures, but I made two
crucial mistakes. One, defer_compilation() had the same issue as
gen_branch() but wasn't checked. Two, returning None from a codegen
function does not throw away the block. Checking how gen_single_block()
handles codegen functions, you can see that None terminates the block
with an exit, but does not overall return an Err. This handling is fine
for unimplemented instructions, for example, but incorrect in case
gen_branch() fails. The missing branch essentially corrupts the
block; adding more code after a missing branch doesn't correct the code.
Always abandon the block when defer_compilation() or gen_branch() fails.
[1]: cb661d7d82984cdb54485ea3f4af01ac21960882
Fixup: [1]
Notes:
Merged: https://github.com/ruby/ruby/pull/12035
Merged-By: XrXr
|
|
|
|
We got some core dumps in the wild where a PendingBranch had everything
as None, leading to a panic unwrapping in PendingBranch::into_branch().
This happened while compiling a `branchif`.
It seems that the only way this can happen is when core::gen_branch()
fails, but not due to OOM. We wouldn't have reached into_branch() when
OOM, and the only way to not leave markers that would've set the
branch's start_addr to some value in gen_branch() is for set_target() to
fail, causing an early return.
Unfortunately, it's hard to tell the exact sequence of events that led
to this situation, but regardless, the dumps show us that we should
check for errors in gen_branch().
Because gen_branch() is used deep in the stack during compilation (e.g.
guard_known_class() -> jit_chain_guard() -> gen_branch()), it'd be bad
for compile speed to propagate the error everywhere, not to mention the
massive patch required. Opt for a flag checked near the end of
compilation.
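A minimal sketch of the flag approach, assuming a simplified compiler: `JitState`, its `block_abandoned` field, and the string "instructions" are hypothetical, and only the call chain names come from the message above.

```rust
// Hypothetical, heavily simplified compilation state.
struct JitState {
    block_abandoned: bool,
    emitted: Vec<&'static str>,
}

fn gen_branch(jit: &mut JitState, target_ok: bool) {
    if !target_ok {
        // Record the failure instead of threading a Result through every caller.
        jit.block_abandoned = true;
        return;
    }
    jit.emitted.push("branch");
}

// Intermediate helpers keep their infallible signatures.
fn jit_chain_guard(jit: &mut JitState, target_ok: bool) {
    jit.emitted.push("cmp");
    gen_branch(jit, target_ok);
}

fn guard_known_class(jit: &mut JitState, target_ok: bool) {
    jit_chain_guard(jit, target_ok);
}

fn compile_block(target_ok: bool) -> Result<Vec<&'static str>, ()> {
    let mut jit = JitState { block_abandoned: false, emitted: Vec::new() };
    guard_known_class(&mut jit, target_ok);
    jit.emitted.push("exit");
    // Single check near the end: a block with a missing branch is thrown away.
    if jit.block_abandoned {
        return Err(());
    }
    Ok(jit.emitted)
}

fn main() {
    println!("{:?}", compile_block(true));
    println!("{:?}", compile_block(false));
}
```

The trade-off in this sketch is that code emitted after the failed branch is wasted work, in exchange for leaving the signatures of deeply nested helpers untouched.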
Notes:
Merged: https://github.com/ruby/ruby/pull/11938
Merged-By: XrXr
|
|
|
|
* YJIT: Add `--yjit-compilation-log` flag to print out the compilation log at exit.
* YJIT: Add an option to enable the compilation log at runtime.
* YJIT: Fix a typo in the `IseqPayload` docs.
* YJIT: Add stubs for getting the YJIT compilation log in memory.
* YJIT: Add a compilation log based on a circular buffer to cap the log size.
* YJIT: Allow specifying either a file or directory name for the YJIT compilation log.
The compilation log will be populated as compilation events occur. If a directory is supplied, then a filename based on the PID will be used as the write target. If a file name is supplied instead, the log will be written to that file.
* YJIT: Add JIT compilation of C function substitutions to the compilation log.
* YJIT: Add compilation events to the circular buffer even if output is sent to a file.
Previously, the two modes were treated as being exclusive of one another. However, it could be beneficial to log all events to a file while also allowing for direct access of the last N events via `RubyVM::YJIT.compilation_log`.
* YJIT: Make timestamps the first element in the YJIT compilation log tuple.
* YJIT: Stream log to stderr if `--yjit-compilation-log` is supplied without an argument.
* YJIT: Eagerly compute compilation log messages to avoid hanging on to references that may GC.
* YJIT: Log all compiled blocks, not just the method entry points.
* YJIT: Remove all compilation events other than block compilation to slim down the log.
* YJIT: Replace circular buffer iterator with a consuming loop.
* YJIT: Support `--yjit-compilation-log=quiet` as a way to activate the in-memory log without printing it.
Co-authored-by: Randy Stauner <randy.stauner@shopify.com>
* YJIT: Promote the compilation log to being the one YJIT log.
Co-authored-by: Randy Stauner <randy.stauner@shopify.com>
* Update doc/yjit/yjit.md
* Update doc/yjit/yjit.md
---------
Co-authored-by: Randy Stauner <randy.stauner@shopify.com>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/11882
Merged-By: XrXr
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
|
|
* YJIT: Pass method arguments using registers
* s/at_current_insn/at_compile_target/
* Implement register shuffle
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Split these values to avoid using a bit mask in the context
Use variable length encoding to save a few bits on chain depth
|
|
* YJIT: Local variable register allocation
* locals are not stack temps
* Rename RegTemps to RegMappings
* Rename RegMapping to RegOpnd
* Rename local_size to num_locals
* s/stack value/operand/
* Rename spill_temps() to spill_regs()
* Clarify when num_locals becomes None
* Mention that InsnOut uses different registers
* Rename get_reg_mapping to get_reg_opnd
* Resurrect --yjit-temp-regs capability
* Use MAX_CTX_TEMPS and MAX_CTX_LOCALS
|
|
* YJIT: increase context cache size to 1024 redux
* Move context hashing code outside of unsafe block
* Avoid allocating large table on the stack, which would cause a stack overflow
Co-authored by Alan Wu @XrXr
|
|
* YJIT: increase context cache size to 1024
The other day I ran into a mysterious bug while increasing the
cache size to 1024. I was not able to reproduce this locally.
Opening this PR for testing/debugging.
* Add extra debug assertions
* Add more comments to context code
* Update yjit/src/core.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Update yjit/src/core.rs
* Comment out potentially problematic assertion
* Revert cache size to 512 so we can merge other changes
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
|
|
Mostly putting angle brackets around links to follow markdown syntax.
|
|
Many functions take an outlined code block but do nothing more than
passing it along; only a couple of functions actually make use of it.
So, in most cases the `ocb` parameter is just boilerplate.
Most functions that take `ocb` already also take a `JITState` and this
commit moves `ocb` into `JITState` to remove the visual noise of the
`ocb` parameter.
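As a rough sketch of the shape of this refactor (not YJIT's actual types), the outlined code block can live inside the state struct so that pass-through functions lose a parameter. `OutlinedCb`, the field layout of `JitState`, and the function names are assumptions made for illustration.

```rust
// Hypothetical outlined code block: just a growable byte buffer.
struct OutlinedCb {
    bytes: Vec<u8>,
}

// Hypothetical state struct that now carries the outlined block.
struct JitState<'a> {
    ocb: &'a mut OutlinedCb,
}

// One of the few functions that actually writes to the outlined block.
fn gen_side_exit(jit: &mut JitState) -> usize {
    let addr = jit.ocb.bytes.len();
    jit.ocb.bytes.push(0xC3);
    addr
}

// A pass-through function: it used to forward an `ocb` argument and
// now only needs the state.
fn gen_guard(jit: &mut JitState) -> usize {
    gen_side_exit(jit)
}

fn compile(ocb: &mut OutlinedCb) -> Vec<usize> {
    let mut jit = JitState { ocb };
    vec![gen_guard(&mut jit), gen_guard(&mut jit)]
}

fn main() {
    let mut ocb = OutlinedCb { bytes: Vec::new() };
    println!("{:?}", compile(&mut ocb));
}
```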
|
|
Calls to defer_compilation() leave behind a stub and a `struct Block`
that we retain. If the block is empty, it only exists to hold the
`struct Branch` that the stub needs.
This patch transplants the branch out of the empty block into the newly
generated block when the defer_compilation() stub is hit, and deletes
the empty block to save memory.
To assist the transplantation, `Block::outgoing` is now a
`MutableBranchList`, and `Branch::Block` is now in a `Cell`. These types
don't incur a size cost.
On the `lobsters` benchmark, `yjit_alloc_size` is roughly 98% of what
it was before the change.
Co-authored-by: Kevin Menard <kevin.menard@shopify.com>
Co-authored-by: Randy Stauner <randy@r4s6.net>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
|
|
* YJIT: add context cache hits stat
This stat should make more sense when it comes to interpreting
the effectiveness of the cache on large deployed apps.
|
|
* YJIT: add context cache size stat
* Allocate the context cache in a box so CRuby doesn't pay overhead
* Add an extra debug assertion
|
|
* YJIT: implement cache for recently encoded/decoded contexts
* Increase cache size to 512
|
|
* Implement BitVector data structure for variable-length context encoding
* Rename method to make intent clearer
* Rename write_uint => push_uint to make intent clearer
* Implement debug trait for BitVector
* Fix bug in BitVector::read_uint_at(), enable more tests
* Add one more test for good measure
* Start sketching Context::encode()
* Progress on variable length context encoding
* Add tests. Fix bug.
* Encode stack state
* Add comments. Try to estimate context encoding size.
* More compact encoding for stack size
* Commit before rebase
* Change Context::encode() to take a BitVector as input
* Refactor BitVector::read_uint(), add helper read functions
* Implement Context::decode() function. Add test.
* Fix bug, add tests
* Rename methods
* Add Context::encode() and decode() methods using global data
* Make encode and decode methods use u32 indices
* Refactor YJIT to use variable-length context encoding
* Tag functions as allow unused
* Add a simple caching mechanism and stats for bytes per context etc
* Add comments, fix formatting
* Grow vector of bytes by 1.2x instead of 2x
* Add debug assert to check round-trip encoding-decoding
* Take some rustfmt formatting
* Add decoded_from field to Context to reuse previous encodings
* Remove old context stats
* Re-add stack_size assert
* Disable decoded_from optimization for now
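For readers unfamiliar with the idea, a bit vector for variable-length encoding can be sketched as follows. This is an illustrative toy, not YJIT's `BitVector`: the field layout, the least-significant-bit-first order, and the exact signatures of `push_uint` and `read_uint_at` are assumptions.

```rust
/// Hypothetical minimal bit vector; the real one may differ in layout and API.
struct BitVector {
    bytes: Vec<u8>,
    num_bits: usize,
}

impl BitVector {
    fn new() -> Self {
        BitVector { bytes: Vec::new(), num_bits: 0 }
    }

    /// Append the low `num_bits` bits of `val`, least significant bit first.
    fn push_uint(&mut self, val: u64, num_bits: usize) {
        assert!(num_bits <= 64);
        for i in 0..num_bits {
            let bit = ((val >> i) & 1) as u8;
            let pos = self.num_bits;
            if pos / 8 == self.bytes.len() {
                self.bytes.push(0);
            }
            self.bytes[pos / 8] |= bit << (pos % 8);
            self.num_bits += 1;
        }
    }

    /// Read `num_bits` bits starting at bit index `start`.
    fn read_uint_at(&self, start: usize, num_bits: usize) -> u64 {
        assert!(num_bits <= 64 && start + num_bits <= self.num_bits);
        let mut val = 0u64;
        for i in 0..num_bits {
            let pos = start + i;
            let bit = (self.bytes[pos / 8] >> (pos % 8)) & 1;
            val |= (bit as u64) << i;
        }
        val
    }
}

fn main() {
    let mut bv = BitVector::new();
    bv.push_uint(5, 3); // a small field needs only 3 bits
    bv.push_uint(300, 9); // a larger field gets 9 bits
    println!("{} {}", bv.read_uint_at(0, 3), bv.read_uint_at(3, 9));
    println!("{} bits in {} bytes", bv.num_bits, bv.bytes.len());
}
```

Packing fields at bit granularity is what lets a small value cost fewer bits than a fixed-width field would.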
|
|
No plan about migrating to the 2024 edition yet (it's not even
available yet), but this is a simple enough suggestion so we can just
take it.
```
warning: this method call resolves to `<&Box<[T]> as IntoIterator>::into_iter` (due to backwards compatibility), but will resolve to `<Box<[T]> as IntoIterator>::into_iter` in Rust 2024
--> ../yjit/src/core.rs:1003:49
|
1003 | formatter.debug_list().entries(branches.into_iter()).finish()
| ^^^^^^^^^
|
= warning: this changes meaning in Rust 2024
= note: `#[warn(boxed_slice_into_iter)]` on by default
help: use `.iter()` instead of `.into_iter()` to avoid ambiguity
|
1003 | formatter.debug_list().entries(branches.iter()).finish()
| ~~~~
help: or use `IntoIterator::into_iter(..)` instead of `.into_iter()` to explicitly iterate by value
|
1003 | formatter.debug_list().entries(IntoIterator::into_iter(branches)).finish()
| ++++++++++++++++++++++++ ~
```
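The first suggestion from the warning can be sketched in isolation. `BranchList` and its `u32` elements are hypothetical stand-ins for the real types in `core.rs`; the point is only that `.iter()` on a boxed slice iterates by reference regardless of edition.

```rust
use std::fmt;

// Hypothetical wrapper around a boxed slice, standing in for the real branch list.
struct BranchList(Box<[u32]>);

impl fmt::Debug for BranchList {
    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
        let branches = &self.0;
        // `.iter()` yields `&u32` unambiguously, so the meaning does not
        // depend on which `IntoIterator` impl the edition selects.
        formatter.debug_list().entries(branches.iter()).finish()
    }
}

fn main() {
    let list = BranchList(vec![1, 2, 3].into_boxed_slice());
    println!("{:?}", list); // [1, 2, 3]
}
```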
|
|
Previously, the update was done in the ISEQ callback. That effectively
never updated anything because the callback itself is given an intact
reference, so it could update its content, and `rb_gc_location(iseq)`
never returned a new address. Update the whole table once in the YJIT
root instead.
|
|
Types like `Type::CString` really only assert that at one point the object had
its class field equal to `String`. Once a singleton class is created for any
strings, the type makes no assertion about any class field anymore, and becomes
the same as `Type::TString`.
Previously, the `--yjit-verify-ctx` option wasn't allowing objects of these
kinds that have singleton classes to pass verification even though the code
generators handle them just fine.
Found through `ruby/spec`.
|
|
* Revert "Revert "YJIT: Optimize local variables when EP == BP" (#10584)"
This reverts commit c8783441952217c18e523749c821f82cd7e5d222.
* YJIT: Take care of GC references in ISEQ invariants
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
---------
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
|
|
* YJIT: Fix shrinking block with assumption too much
Under very specific circumstances, discovered by a test case in
`ruby/spec`, an `expandarray` block can contain just a branch and carry
a method lookup assumption. Previously, when we regenerated the branch,
we allowed it to shrink to empty, since we put the code at the jump
target immediately after it. That was incorrect and caused a crash while
the block is invalidated, since that left no room to patch in an exit.
When regenerating a branch that makes up a block entirely, and the block
could be invalidated, we need to ensure there is room for invalidation.
When there is code before the branch, it should act as padding, so we
don't need to worry about those cases.
* skip on RJIT
|
|
This reverts commit 4cc58ea0b865f2fd20f1e881ddbd4c4fab0b072c.
Since the change landed, call-threshold=1 CI runs have been timing out.
There have also been `verify-ctx` violations. Revert for now while we debug.
|
|
|
|
|
|
|
|
|
|
When running_iseq happens to be 0, it's better to fail on the assertion
rather than referencing the null pointer.
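A minimal sketch of the pattern, assuming a hypothetical `read_iseq_size` helper and a `u32` pointee in place of the real iseq type:

```rust
/// Hypothetical helper: `running_iseq` stands in for a raw pointer handed over by the VM.
fn read_iseq_size(running_iseq: *const u32) -> u32 {
    // Fail loudly with a clear message instead of dereferencing null.
    assert!(!running_iseq.is_null(), "running_iseq must not be null");
    // SAFETY: checked non-null above; the caller guarantees the pointer is valid.
    unsafe { *running_iseq }
}

fn main() {
    let iseq_size: u32 = 42;
    println!("{}", read_iseq_size(&iseq_size)); // 42
}
```

An assertion failure produces a panic with a message, whereas a null dereference is undefined behavior and typically shows up as a bare segfault.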
|
|
It eventually casts it to i32 anyways, and a lot of callers already have
an i32, so using isize was just adding unnecessary casts.
|
|
* YJIT: Allow tracing a counted exit
* Avoid clobbering caller-saved registers
|
|
|