| Age | Commit message (Collapse) | Author |
|
jump, branchif, etc. don't invalidate locals in the JIT. They might in the interpreter because they can execute arbitrary code, but the JIT side-exits before that happens.
|
|
This can happen with documentation updates and we don't want
those to trip on ZJIT tests.
Redact the whole name since names like "_bi342" aren't that helpful
anyways.
|
|
|
|
|
|
|
|
It's just a nicety (they fit fine as CUInt32) but this makes printing
look nicer in real execution and also in tests (helps with #15489).
Co-authored-by: Randy Stauner <randy@r4s6.net>
|
|
|
|
Fix https://github.com/Shopify/ruby/issues/874
|
|
No point doing the manual size unit conversion for add. Sorry, no new
tests since there is no way to generate a LoadField with a negative
offset from ruby code AFAICT. Careful with the `as` casts.
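
To illustrate the `as` cast concern, here is a minimal sketch, not the actual ZJIT code. The function names and the `u64`/`i32` types are assumptions chosen for the example. It shows how a negative offset keeps its meaning only when the cast sign-extends before the add.

```rust
/// Hypothetical helper: apply a signed field offset to a base address.
/// Casting i32 -> i64 -> u64 sign-extends, so a negative offset wraps
/// around to the intended lower address.
fn field_address(base: u64, offset: i32) -> u64 {
    base.wrapping_add(offset as i64 as u64)
}

/// Hypothetical buggy variant: casting i32 -> u32 -> u64 zero-extends,
/// so a negative offset turns into a large positive displacement.
fn field_address_zero_extended(base: u64, offset: i32) -> u64 {
    base.wrapping_add(offset as u32 as u64)
}
```

With `base = 0x1000` and `offset = -8`, the first function yields `0xFF8`, while the second yields `0x1_0000_0FF8`.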
|
|
* ZJIT: Fold LoadField on frozen objects to constants
When accessing instance variables from frozen objects via attr_reader/
attr_accessor, fold the LoadField instruction to a constant at compile
time. This enables further optimizations like constant propagation.
- Add fold_getinstancevariable_frozen optimization in Function::optimize
- Check if receiver type has a known ruby_object() that is frozen
- Read the field value at compile time and replace with Const instruction
- Add 10 unit tests covering various value types (fixnum, string, symbol,
nil, true/false) and negative cases (unfrozen, dynamic receiver)
* Run zjit-test-update
* Add a test that we don't fold non-BasicObject
* Small cleanups
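
The folding idea can be sketched with a toy model. This is not ZJIT's HIR: `Insn`, `KnownObject`, and `fold_load_field` are hypothetical names, and field values are simplified to `i64`.

```rust
/// Hypothetical stand-in for a receiver whose identity is known at compile time.
#[derive(Debug, Clone, PartialEq)]
struct KnownObject {
    frozen: bool,
    fields: Vec<i64>,
}

/// Hypothetical, heavily simplified instruction set.
#[derive(Debug, Clone, PartialEq)]
enum Insn {
    Const(i64),
    /// `recv` is `None` when the receiver is only known dynamically.
    LoadField { recv: Option<KnownObject>, index: usize },
}

/// Replace a LoadField with a Const when the receiver is a known, frozen object.
/// Anything else (unfrozen, dynamic receiver, bad index) is left untouched.
fn fold_load_field(insn: &Insn) -> Insn {
    if let Insn::LoadField { recv: Some(obj), index } = insn {
        if obj.frozen {
            if let Some(value) = obj.fields.get(*index) {
                return Insn::Const(*value);
            }
        }
    }
    insn.clone()
}
```

The frozen check is what makes the fold safe in this model: an unfrozen object could have its field reassigned after compilation.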
---------
Co-authored-by: Max Bernstein <ruby@bernsteinbear.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
|
|
This fixes a crash when the new shape after a transition is too complex;
we need to check that it's not complex before trying to read by index.
|
|
This adds comments to the hir dump output like this:
v13:BasicObject = SendWithoutBlock v6, :test, v11 # SendFallbackReason: Complex argument passing
|
|
|
|
|
|
|
|
|
|
Fixes https://github.com/Shopify/ruby/issues/902
This pull request adds code generation for dividing fixnums.
Testing confirms the normal case, flooring, and side-exiting on division by zero.
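
The three tested behaviours can be modelled in plain Rust. This is a sketch of the semantics only, not the generated machine code: `fixnum_div_floor` is a hypothetical name, fixnums are approximated as `i64`, and `None` stands in for a side exit.

```rust
/// Hypothetical model of fixnum division.
/// Returns None where compiled code would side-exit to the interpreter.
fn fixnum_div_floor(lhs: i64, rhs: i64) -> Option<i64> {
    if rhs == 0 {
        // Division by zero: leave it to the interpreter to raise.
        return None;
    }
    // checked_div / checked_rem return None on i64::MIN / -1 overflow.
    let quotient = lhs.checked_div(rhs)?;
    let remainder = lhs.checked_rem(rhs)?;
    // Rust truncates toward zero; Ruby's Integer#/ floors toward negative infinity.
    // Adjust when there is a remainder whose sign differs from the divisor's.
    if remainder != 0 && ((remainder < 0) != (rhs < 0)) {
        Some(quotient - 1)
    } else {
        Some(quotient)
    }
}
```

For example, `fixnum_div_floor(-7, 2)` gives `Some(-4)`, matching Ruby's `-7 / 2`, whereas Rust's own `-7 / 2` is `-3`.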
|
|
ZJIT: Print local variable names in GetLocal and SetLocal instructions
|
|
Since we do a decent job of pre-sizing objects, don't handle the case where we would need to re-size an object. Also don't handle too-complex shapes.
lobsters stats before:
```
Top-20 calls to C functions from JIT code (79.4% of total 90,051,140):
rb_vm_opt_send_without_block: 19,762,433 (21.9%)
rb_vm_setinstancevariable: 7,698,314 ( 8.5%)
rb_hash_aref: 6,767,461 ( 7.5%)
rb_vm_env_write: 5,373,080 ( 6.0%)
rb_vm_send: 5,049,229 ( 5.6%)
rb_vm_getinstancevariable: 4,535,259 ( 5.0%)
rb_obj_is_kind_of: 3,746,306 ( 4.2%)
rb_ivar_get_at_no_ractor_check: 3,745,237 ( 4.2%)
rb_vm_invokesuper: 3,037,467 ( 3.4%)
rb_ary_entry: 2,351,983 ( 2.6%)
rb_vm_opt_getconstant_path: 1,344,740 ( 1.5%)
rb_vm_invokeblock: 1,184,474 ( 1.3%)
Hash#[]=: 1,064,288 ( 1.2%)
rb_gc_writebarrier: 1,006,972 ( 1.1%)
rb_ec_ary_new_from_values: 902,687 ( 1.0%)
fetch: 898,667 ( 1.0%)
rb_str_buf_append: 833,787 ( 0.9%)
rb_class_allocate_instance: 822,024 ( 0.9%)
Hash#fetch: 699,580 ( 0.8%)
_bi20: 682,068 ( 0.8%)
Top-4 setivar fallback reasons (100.0% of total 7,732,326):
shape_transition: 6,032,109 (78.0%)
not_monomorphic: 1,469,300 (19.0%)
not_t_object: 172,636 ( 2.2%)
too_complex: 58,281 ( 0.8%)
```
lobsters stats after:
```
Top-20 calls to C functions from JIT code (79.0% of total 88,322,656):
rb_vm_opt_send_without_block: 19,777,880 (22.4%)
rb_hash_aref: 6,771,589 ( 7.7%)
rb_vm_env_write: 5,372,789 ( 6.1%)
rb_gc_writebarrier: 5,195,527 ( 5.9%)
rb_vm_send: 5,049,145 ( 5.7%)
rb_vm_getinstancevariable: 4,538,485 ( 5.1%)
rb_obj_is_kind_of: 3,746,241 ( 4.2%)
rb_ivar_get_at_no_ractor_check: 3,745,172 ( 4.2%)
rb_vm_invokesuper: 3,037,157 ( 3.4%)
rb_ary_entry: 2,351,968 ( 2.7%)
rb_vm_setinstancevariable: 1,703,337 ( 1.9%)
rb_vm_opt_getconstant_path: 1,344,730 ( 1.5%)
rb_vm_invokeblock: 1,184,290 ( 1.3%)
Hash#[]=: 1,061,868 ( 1.2%)
rb_ec_ary_new_from_values: 902,666 ( 1.0%)
fetch: 898,666 ( 1.0%)
rb_str_buf_append: 833,784 ( 0.9%)
rb_class_allocate_instance: 821,778 ( 0.9%)
Hash#fetch: 755,913 ( 0.9%)
Top-4 setivar fallback reasons (100.0% of total 1,703,337):
not_monomorphic: 1,472,405 (86.4%)
not_t_object: 172,629 (10.1%)
too_complex: 58,281 ( 3.4%)
new_shape_needs_extension: 22 ( 0.0%)
```
I also noticed that primitive printing in HIR was broken so I fixed that.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
We generally know the receiver's class from profile info. I see 600k of these when running lobsters.
|
|
|
|
It's used as an alternative to find-and-replace, so we should have
nothing to replace.
|
|
* All Invariant::SingleRactorMode PatchPoints are replaced by
assume_single_ractor_mode() to fix https://github.com/Shopify/ruby/issues/875
for SingleRactorMode patchpoints.
|
|
Don't call a C function.
|
|
Same as Integer#>>. Also add more strict type checks for both Integer#>>
and Integer#<<.
|
|
Append a codepoint.
|
|
This otherwise would miss annotations of C methods.
|
|
Use actual receiver type. This gives us better method lookup.
|
|
ZJIT: Standardize C call related insn fields
- Add `recv` field to `CCall` and `CCallWithFrame` so now all method dispatch
related instructions have `recv` field, separate from `args` field.
This ensures consistent pointer arithmetic when generating code for these
instructions.
- Standardize `recv` field's display position in send related instructions.
|
|
ZJIT: Optimize variadic cfunc Send calls into CCallVariadic
|
|
lobsters:
```
Top-4 setivar fallback reasons (100.0% of total 7,789,008):
shape_transition: 6,074,085 (78.0%)
not_monomorphic: 1,484,013 (19.1%)
not_t_object: 172,629 ( 2.2%)
too_complex: 58,281 ( 0.7%)
Top-3 getivar fallback reasons (100.0% of total 9,348,832):
not_t_object: 4,658,833 (49.8%)
not_monomorphic: 4,542,316 (48.6%)
too_complex: 147,683 ( 1.6%)
Top-3 definedivar fallback reasons (100.0% of total 366,383):
not_monomorphic: 361,389 (98.6%)
too_complex: 3,062 ( 0.8%)
not_t_object: 1,932 ( 0.5%)
```
railsbench:
```
Top-3 setivar fallback reasons (100.0% of total 15,119,057):
shape_transition: 13,760,763 (91.0%)
not_monomorphic: 982,368 ( 6.5%)
not_t_object: 375,926 ( 2.5%)
Top-2 getivar fallback reasons (100.0% of total 14,438,747):
not_t_object: 7,643,870 (52.9%)
not_monomorphic: 6,794,877 (47.1%)
Top-2 definedivar fallback reasons (100.0% of total 209,613):
not_monomorphic: 209,526 (100.0%)
not_t_object: 87 ( 0.0%)
```
shipit:
```
Top-3 setivar fallback reasons (100.0% of total 14,516,254):
shape_transition: 8,613,512 (59.3%)
not_monomorphic: 5,761,398 (39.7%)
not_t_object: 141,344 ( 1.0%)
Top-2 getivar fallback reasons (100.0% of total 21,016,444):
not_monomorphic: 11,313,482 (53.8%)
not_t_object: 9,702,962 (46.2%)
Top-2 definedivar fallback reasons (100.0% of total 290,382):
not_monomorphic: 287,755 (99.1%)
not_t_object: 2,627 ( 0.9%)
```
|
|
Don't support shape transitions for now.
|
|
JIT-to-JIT sends don't blit locals to nil in the callee's
EP memory region because HIR is aware of this initial state and
memory ops are only done when necessary. Previously, we
read from this uninitialized memory by emitting `GetLocal` in e.g. BBs
that are immediate successors of an entry point.
The entry point sets up the frame state properly and we also reload
locals if necessary after an operation that potentially makes the
environment escape. So, listen to the frame state when it's supposed to
be up-to-date (`!local_inval`).
|
|
|
|
|
|
This is good for protoboeuf and other binary parsing
|
|
Add `LoadEC` then it's just two `LoadField`.
|
|
This lets us constant-fold common monomorphic cases.
|
|
Don't emit a CCall.
|
|
Going through a call to a C function just to read a bitfield was a
little extreme. We did it to be super conservative since bitfields
have historically been the trigger of many bugs and surprises. Let's
try directly accessing them with code from rust-bindgen. If this
ends up causing issues, we can use the FFI approach behind nicer
wrappers.
In any case, directly access regular struct fields such as `lead_num`
and `opt_num` to remove boilerplate.
|
|
|
|
|
|
Fixes https://github.com/Shopify/ruby/issues/877
When originally crafting the Iongraph support PR, I didn't consider that the successor or predecessor sets could contain duplicates. This change prevents that from happening in the future.
I don't think duplicates interfere with the underlying Iongraph implementation, but they don't really make sense.
I think this kind of behaviour happens when there are multiple jump instructions that go to the same basic block within a given block.
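
A minimal sketch of the deduplication described above, assuming a hypothetical `BlockId` type and `unique_successors` helper rather than the actual ZJIT code. It keeps first-seen order so the output stays deterministic.

```rust
use std::collections::HashSet;

/// Hypothetical block identifier for the example.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct BlockId(usize);

/// Collect jump targets in first-seen order, dropping repeats.
/// A block with two jumps to the same target yields that target once.
fn unique_successors(jump_targets: &[BlockId]) -> Vec<BlockId> {
    let mut seen = HashSet::new();
    jump_targets
        .iter()
        .copied()
        // HashSet::insert returns false if the value was already present.
        .filter(|target| seen.insert(*target))
        .collect()
}
```

A block whose jumps target `bb2`, `bb3`, and `bb2` again would report only `bb2` and `bb3` as successors.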
|
|
|
|
|
|
## Components
This PR adds functionality to visualize HIR using the [Iongraph](https://spidermonkey.dev/blog/2025/10/28/iongraph-web.html) tool first created for use with Spidermonkey.
## Justification
Iongraph's viewer is (as mentioned in the article above) a few notches above graphviz for viewing large CFGs. It also allows easily inspecting different compiler optimization passes and multiple functions in the same browser window. Since Spidermonkey is using this format, it may be beneficial to use it for our own JIT development.
The requirement for JSON is downstream from that of the Iongraph format. As for writing the implementation myself, ZJIT leans towards having fewer dependencies, so this is the preferred approach.
## How does it look?
<img width="902" height="957" alt="image" src="https://github.com/user-attachments/assets/e4e0991b-572a-41fd-9fed-1215bd1926c3" />
<img width="770" height="624" alt="image" src="https://github.com/user-attachments/assets/01398373-1f75-46b8-b1aa-7f5d4cbca6b8" />
Right now, it's aesthetically minimal, but is fairly robust.
## Functionality
Using `--zjit-dump-hir-iongraph` will dump all compiled functions into a directory named `/tmp/zjit-iongraph-{PROCESS_PID}`. Each file will be named `func_{ZJIT_FUNC_NAME}.json`. In order to use them in the Iongraph viewer, you'll need to use `jq` to collate them into a single file. An example invocation of `jq` is shown below for reference. As far as I can tell, the name of the file created does not matter.
`jq --slurp --null-input '.functions=inputs | .version=2' /tmp/zjit-iongraph-{PROCESS_PID}/func*.json > ~/Downloads/foo.json`
From there, you can use https://mozilla-spidermonkey.github.io/iongraph/ to view your trace.
### Caveats
- The upstream Iongraph viewer doesn't allow you to click arguments to an instruction to find the instruction that they originate from when using the format that this PR generates. (I have made a small fork at https://github.com/aidenfoxivey/iongraph that fixes that functionality via https://github.com/aidenfoxivey/iongraph/commit/9e9c29b41c4dbb35cf66cb6161e5b19c8b796379.patch)
- The upstream Iongraph viewer can sometimes show "exiting edges" in the CFG as detached from the box representing their basic block.
<img width="1814" height="762" alt="image" src="https://github.com/user-attachments/assets/afbbaa16-332f-498f-849e-11c69a8cb0cc" />
(Image courtesy of @tekknolagi)
This is because the original tool was (to our understanding) written for an SSA format that does not use extended basic blocks. (Extended basic blocks let you put a jump instruction, conditional or otherwise, anywhere in the basic block.) This means that our format may generate more outgoing edges than the viewer is written to handle.
|
|
|
|
lobsters:
```
Top-20 calls to C functions from JIT code (79.9% of total 97,004,883):
rb_vm_opt_send_without_block: 19,874,212 (20.5%)
rb_vm_setinstancevariable: 9,774,841 (10.1%)
rb_ivar_get: 9,358,866 ( 9.6%)
rb_hash_aref: 6,828,948 ( 7.0%)
rb_vm_send: 6,441,551 ( 6.6%)
rb_vm_env_write: 5,375,989 ( 5.5%)
rb_vm_invokesuper: 3,037,836 ( 3.1%)
Module#===: 2,562,446 ( 2.6%)
rb_ary_entry: 2,354,546 ( 2.4%)
Kernel#is_a?: 1,424,092 ( 1.5%)
rb_vm_opt_getconstant_path: 1,344,923 ( 1.4%)
Thread.current: 1,300,822 ( 1.3%)
rb_zjit_defined_ivar: 1,222,613 ( 1.3%)
rb_vm_invokeblock: 1,184,555 ( 1.2%)
Hash#[]=: 1,061,969 ( 1.1%)
rb_ary_push: 1,024,987 ( 1.1%)
rb_ary_new_capa: 904,003 ( 0.9%)
rb_str_buf_append: 833,782 ( 0.9%)
rb_class_allocate_instance: 822,626 ( 0.8%)
Hash#fetch: 755,913 ( 0.8%)
```
railsbench:
```
Top-20 calls to C functions from JIT code (74.8% of total 189,170,268):
rb_vm_opt_send_without_block: 29,870,307 (15.8%)
rb_vm_setinstancevariable: 17,631,199 ( 9.3%)
rb_hash_aref: 16,928,890 ( 8.9%)
rb_ivar_get: 14,441,240 ( 7.6%)
rb_vm_env_write: 11,571,001 ( 6.1%)
rb_vm_send: 11,153,457 ( 5.9%)
rb_vm_invokesuper: 7,568,267 ( 4.0%)
Module#===: 6,065,923 ( 3.2%)
Hash#[]=: 2,842,990 ( 1.5%)
rb_ary_entry: 2,766,125 ( 1.5%)
rb_ary_push: 2,722,079 ( 1.4%)
rb_vm_invokeblock: 2,594,398 ( 1.4%)
Thread.current: 2,560,129 ( 1.4%)
rb_str_getbyte: 1,965,627 ( 1.0%)
Kernel#is_a?: 1,961,815 ( 1.0%)
rb_vm_opt_getconstant_path: 1,863,678 ( 1.0%)
rb_hash_new_with_size: 1,796,456 ( 0.9%)
rb_class_allocate_instance: 1,785,043 ( 0.9%)
String#empty?: 1,713,414 ( 0.9%)
rb_ary_new_capa: 1,678,834 ( 0.9%)
```
shipit:
```
Top-20 calls to C functions from JIT code (83.4% of total 182,402,821):
rb_vm_opt_send_without_block: 45,753,484 (25.1%)
rb_ivar_get: 21,020,650 (11.5%)
rb_vm_setinstancevariable: 17,528,603 ( 9.6%)
rb_hash_aref: 11,892,856 ( 6.5%)
rb_vm_send: 11,723,471 ( 6.4%)
rb_vm_env_write: 10,434,452 ( 5.7%)
Module#===: 4,225,048 ( 2.3%)
rb_vm_invokesuper: 3,705,906 ( 2.0%)
Thread.current: 3,337,603 ( 1.8%)
rb_ary_entry: 3,114,378 ( 1.7%)
Hash#[]=: 2,509,912 ( 1.4%)
Array#empty?: 2,282,994 ( 1.3%)
rb_vm_invokeblock: 2,210,511 ( 1.2%)
Hash#fetch: 2,017,960 ( 1.1%)
_bi20: 1,975,147 ( 1.1%)
rb_zjit_defined_ivar: 1,897,127 ( 1.0%)
rb_vm_opt_getconstant_path: 1,813,294 ( 1.0%)
rb_ary_new_capa: 1,615,406 ( 0.9%)
Kernel#is_a?: 1,567,854 ( 0.9%)
rb_class_allocate_instance: 1,560,035 ( 0.9%)
```
Thanks to @eregon for the idea.
Co-authored-by: Jacob Denbeaux <jacob.denbeaux@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
|
|
Rename to `VM_KW_SPECIFIED_BITS_MAX` now that it's in `vm_core.h`.
|