| Age | Commit message | Author |
|
jump, branchif, etc. don't invalidate locals in the JIT. They might in the interpreter because they can execute arbitrary code, but the JIT side-exits before that happens.
|
|
This can happen with documentation updates and we don't want
those to trip on ZJIT tests.
Redact the whole name since names like "_bi342" aren't that helpful
anyways.
|
|
|
|
|
|
|
|
It's just a nicety (they fit fine as CUInt32) but this makes printing
look nicer in real execution and also in tests (helps with #15489).
Co-authored-by: Randy Stauner <randy@r4s6.net>
|
|
|
|
Fix https://github.com/Shopify/ruby/issues/874
|
|
* ZJIT: Fold LoadField on frozen objects to constants
When accessing instance variables from frozen objects via attr_reader/
attr_accessor, fold the LoadField instruction to a constant at compile
time. This enables further optimizations like constant propagation.
- Add fold_getinstancevariable_frozen optimization in Function::optimize
- Check if receiver type has a known ruby_object() that is frozen
- Read the field value at compile time and replace with Const instruction
- Add 10 unit tests covering various value types (fixnum, string, symbol,
nil, true/false) and negative cases (unfrozen, dynamic receiver)
* Run zjit-test-update
* Add a test that we don't fold non-BasicObject
* Small cleanups
---------
Co-authored-by: Max Bernstein <ruby@bernsteinbear.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
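For illustration, a minimal sketch of the Ruby-level pattern this folding is aimed at, assuming a frozen receiver that is reachable through a constant. The class, constant, and method names are hypothetical and do not come from the commit or its tests.

```ruby
# Hypothetical names; only the pattern (frozen receiver + attr_reader)
# is taken from the commit message above.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
    freeze # instance variables can no longer be reassigned after this
  end
end

ORIGIN = Point.new(0, 0)

def origin_x
  # The receiver is a known frozen object, so the value read by the
  # attr_reader is fixed for as long as ORIGIN itself is not reassigned.
  ORIGIN.x
end

def dynamic_x(point)
  # The receiver is only known at run time: this corresponds to the
  # "dynamic receiver" negative case, where no folding is expected.
  point.x
end
```

Both methods behave the same with or without the optimization; the fold only changes how `origin_x` is compiled.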
|
|
This fixes a crash when the new shape after a transition is too complex;
we need to check that it's not complex before trying to read by index.
|
|
This adds comments to the hir dump output like this:
v13:BasicObject = SendWithoutBlock v6, :test, v11 # SendFallbackReason: Complex argument passing
|
|
|
|
|
|
|
|
ZJIT: Print local variable names in GetLocal and SetLocal instructions
|
|
Since we do a decent job of pre-sizing objects, don't handle the case where we would need to re-size an object. Also don't handle too-complex shapes.
lobsters stats before:
```
Top-20 calls to C functions from JIT code (79.4% of total 90,051,140):
rb_vm_opt_send_without_block: 19,762,433 (21.9%)
rb_vm_setinstancevariable: 7,698,314 ( 8.5%)
rb_hash_aref: 6,767,461 ( 7.5%)
rb_vm_env_write: 5,373,080 ( 6.0%)
rb_vm_send: 5,049,229 ( 5.6%)
rb_vm_getinstancevariable: 4,535,259 ( 5.0%)
rb_obj_is_kind_of: 3,746,306 ( 4.2%)
rb_ivar_get_at_no_ractor_check: 3,745,237 ( 4.2%)
rb_vm_invokesuper: 3,037,467 ( 3.4%)
rb_ary_entry: 2,351,983 ( 2.6%)
rb_vm_opt_getconstant_path: 1,344,740 ( 1.5%)
rb_vm_invokeblock: 1,184,474 ( 1.3%)
Hash#[]=: 1,064,288 ( 1.2%)
rb_gc_writebarrier: 1,006,972 ( 1.1%)
rb_ec_ary_new_from_values: 902,687 ( 1.0%)
fetch: 898,667 ( 1.0%)
rb_str_buf_append: 833,787 ( 0.9%)
rb_class_allocate_instance: 822,024 ( 0.9%)
Hash#fetch: 699,580 ( 0.8%)
_bi20: 682,068 ( 0.8%)
Top-4 setivar fallback reasons (100.0% of total 7,732,326):
shape_transition: 6,032,109 (78.0%)
not_monomorphic: 1,469,300 (19.0%)
not_t_object: 172,636 ( 2.2%)
too_complex: 58,281 ( 0.8%)
```
lobsters stats after:
```
Top-20 calls to C functions from JIT code (79.0% of total 88,322,656):
rb_vm_opt_send_without_block: 19,777,880 (22.4%)
rb_hash_aref: 6,771,589 ( 7.7%)
rb_vm_env_write: 5,372,789 ( 6.1%)
rb_gc_writebarrier: 5,195,527 ( 5.9%)
rb_vm_send: 5,049,145 ( 5.7%)
rb_vm_getinstancevariable: 4,538,485 ( 5.1%)
rb_obj_is_kind_of: 3,746,241 ( 4.2%)
rb_ivar_get_at_no_ractor_check: 3,745,172 ( 4.2%)
rb_vm_invokesuper: 3,037,157 ( 3.4%)
rb_ary_entry: 2,351,968 ( 2.7%)
rb_vm_setinstancevariable: 1,703,337 ( 1.9%)
rb_vm_opt_getconstant_path: 1,344,730 ( 1.5%)
rb_vm_invokeblock: 1,184,290 ( 1.3%)
Hash#[]=: 1,061,868 ( 1.2%)
rb_ec_ary_new_from_values: 902,666 ( 1.0%)
fetch: 898,666 ( 1.0%)
rb_str_buf_append: 833,784 ( 0.9%)
rb_class_allocate_instance: 821,778 ( 0.9%)
Hash#fetch: 755,913 ( 0.9%)
Top-4 setivar fallback reasons (100.0% of total 1,703,337):
not_monomorphic: 1,472,405 (86.4%)
not_t_object: 172,629 (10.1%)
too_complex: 58,281 ( 3.4%)
new_shape_needs_extension: 22 ( 0.0%)
```
I also noticed that primitive printing in HIR was broken so I fixed that.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
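As a rough illustration of what the `shape_transition` fallback reason above refers to, here is a sketch assuming CRuby's object shapes, where the first write to a new instance variable moves the object to a new shape and later writes to the same variable do not. The class is hypothetical.

```ruby
# Hypothetical class; the comments describe general CRuby shape behavior,
# not the exact code ZJIT emits.
class User
  def initialize(name)
    @name = name  # first write to @name: the object transitions to a new shape
    @visits = 0   # first write to @visits: another shape transition
  end

  def visit!
    @visits += 1  # @visits already exists: same shape, plain overwrite
  end
end

user = User.new("ada")
user.visit!
```

Writes like the two in `initialize` are the ones that previously fell back to `rb_vm_setinstancevariable` with reason `shape_transition`.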
|
|
We generally know the receiver's class from profile info. I see 600k of these when running lobsters.
|
|
* All Invariant::SingleRactorMode PatchPoints are replaced by
assume_single_ractor_mode(). This fixes
https://github.com/Shopify/ruby/issues/875 for SingleRactorMode patchpoints.
|
|
Don't call a C function.
|
|
|
|
Same as Integer#>>. Also add more strict type checks for both Integer#>>
and Integer#<<.
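For reference, a small sketch of the Ruby-level shift semantics that any inlined version has to preserve. Which of these cases ZJIT handles inline and which it guards against is not stated in the message, so the grouping in the comments is only an assumption.

```ruby
# Plain shifts on small integers.
right = 8 >> 1          # => 4
left  = 1 << 3          # => 8

# Right shift of a negative integer rounds toward negative infinity.
neg_right = -8 >> 1     # => -4

# A negative shift amount reverses the direction.
reversed_right = 8 >> -1  # => 16
reversed_left  = 8 << -1  # => 4

# A left shift can overflow the fixnum range and produce a big Integer.
big = 1 << 64           # => 18446744073709551616
```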
|
|
Append a codepoint.
|
|
This otherwise would miss annotations of C methods.
|
|
Use actual receiver type. This gives us better method lookup.
|
|
|
|
ZJIT: Standardize C call related insn fields
- Add a `recv` field to `CCall` and `CCallWithFrame` so that all method-dispatch-related
instructions now have a `recv` field, separate from the `args` field.
This ensures consistent pointer arithmetic when generating code for these
instructions.
- Standardize `recv` field's display position in send related instructions.
|
|
ZJIT: Optimize variadic cfunc Send calls into CCallVariadic
|
|
lobsters:
```
Top-4 setivar fallback reasons (100.0% of total 7,789,008):
shape_transition: 6,074,085 (78.0%)
not_monomorphic: 1,484,013 (19.1%)
not_t_object: 172,629 ( 2.2%)
too_complex: 58,281 ( 0.7%)
Top-3 getivar fallback reasons (100.0% of total 9,348,832):
not_t_object: 4,658,833 (49.8%)
not_monomorphic: 4,542,316 (48.6%)
too_complex: 147,683 ( 1.6%)
Top-3 definedivar fallback reasons (100.0% of total 366,383):
not_monomorphic: 361,389 (98.6%)
too_complex: 3,062 ( 0.8%)
not_t_object: 1,932 ( 0.5%)
```
railsbench:
```
Top-3 setivar fallback reasons (100.0% of total 15,119,057):
shape_transition: 13,760,763 (91.0%)
not_monomorphic: 982,368 ( 6.5%)
not_t_object: 375,926 ( 2.5%)
Top-2 getivar fallback reasons (100.0% of total 14,438,747):
not_t_object: 7,643,870 (52.9%)
not_monomorphic: 6,794,877 (47.1%)
Top-2 definedivar fallback reasons (100.0% of total 209,613):
not_monomorphic: 209,526 (100.0%)
not_t_object: 87 ( 0.0%)
```
shipit:
```
Top-3 setivar fallback reasons (100.0% of total 14,516,254):
shape_transition: 8,613,512 (59.3%)
not_monomorphic: 5,761,398 (39.7%)
not_t_object: 141,344 ( 1.0%)
Top-2 getivar fallback reasons (100.0% of total 21,016,444):
not_monomorphic: 11,313,482 (53.8%)
not_t_object: 9,702,962 (46.2%)
Top-2 definedivar fallback reasons (100.0% of total 290,382):
not_monomorphic: 287,755 (99.1%)
not_t_object: 2,627 ( 0.9%)
```
|
|
Don't support shape transitions for now.
|
|
JIT-to-JIT sends don't blit locals to nil in the callee's
EP memory region because HIR is aware of this initial state and
memory ops are only done when necessary. Previously, we
read from this uninitialized memory by emitting `GetLocal` in e.g. BBs
that are immediate successors of an entry point.
The entry points set up the frame state properly, and we also reload
locals if necessary after an operation that potentially makes the
environment escape. So, listen to the frame state when it's supposed to
be up-to-date (`!local_inval`).
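A minimal Ruby sketch of why the initial nil state of locals is observable, assuming nothing about ZJIT internals. The method name is hypothetical.

```ruby
# Hypothetical method: `value` is declared by the parser but only
# assigned when `flag` is truthy, so it must read as nil otherwise.
def maybe_assigned(flag)
  value = 42 if flag
  value
end

assigned   = maybe_assigned(true)   # => 42
unassigned = maybe_assigned(false)  # => nil
```

If the callee's local slots are not written on a JIT-to-JIT send, the nil in the second call has to come from the frame state rather than from memory.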
|
|
|
|
|
|
|
|
This is good for protoboeuf and other binary parsing.
|
|
Add `LoadEC`; then it's just two `LoadField`s.
|
|
This lets us constant-fold common monomorphic cases.
|
|
Don't emit a CCall.
|
|
|
|
lobsters:
```
Top-20 calls to C functions from JIT code (79.9% of total 97,004,883):
rb_vm_opt_send_without_block: 19,874,212 (20.5%)
rb_vm_setinstancevariable: 9,774,841 (10.1%)
rb_ivar_get: 9,358,866 ( 9.6%)
rb_hash_aref: 6,828,948 ( 7.0%)
rb_vm_send: 6,441,551 ( 6.6%)
rb_vm_env_write: 5,375,989 ( 5.5%)
rb_vm_invokesuper: 3,037,836 ( 3.1%)
Module#===: 2,562,446 ( 2.6%)
rb_ary_entry: 2,354,546 ( 2.4%)
Kernel#is_a?: 1,424,092 ( 1.5%)
rb_vm_opt_getconstant_path: 1,344,923 ( 1.4%)
Thread.current: 1,300,822 ( 1.3%)
rb_zjit_defined_ivar: 1,222,613 ( 1.3%)
rb_vm_invokeblock: 1,184,555 ( 1.2%)
Hash#[]=: 1,061,969 ( 1.1%)
rb_ary_push: 1,024,987 ( 1.1%)
rb_ary_new_capa: 904,003 ( 0.9%)
rb_str_buf_append: 833,782 ( 0.9%)
rb_class_allocate_instance: 822,626 ( 0.8%)
Hash#fetch: 755,913 ( 0.8%)
```
railsbench:
```
Top-20 calls to C functions from JIT code (74.8% of total 189,170,268):
rb_vm_opt_send_without_block: 29,870,307 (15.8%)
rb_vm_setinstancevariable: 17,631,199 ( 9.3%)
rb_hash_aref: 16,928,890 ( 8.9%)
rb_ivar_get: 14,441,240 ( 7.6%)
rb_vm_env_write: 11,571,001 ( 6.1%)
rb_vm_send: 11,153,457 ( 5.9%)
rb_vm_invokesuper: 7,568,267 ( 4.0%)
Module#===: 6,065,923 ( 3.2%)
Hash#[]=: 2,842,990 ( 1.5%)
rb_ary_entry: 2,766,125 ( 1.5%)
rb_ary_push: 2,722,079 ( 1.4%)
rb_vm_invokeblock: 2,594,398 ( 1.4%)
Thread.current: 2,560,129 ( 1.4%)
rb_str_getbyte: 1,965,627 ( 1.0%)
Kernel#is_a?: 1,961,815 ( 1.0%)
rb_vm_opt_getconstant_path: 1,863,678 ( 1.0%)
rb_hash_new_with_size: 1,796,456 ( 0.9%)
rb_class_allocate_instance: 1,785,043 ( 0.9%)
String#empty?: 1,713,414 ( 0.9%)
rb_ary_new_capa: 1,678,834 ( 0.9%)
```
shipit:
```
Top-20 calls to C functions from JIT code (83.4% of total 182,402,821):
rb_vm_opt_send_without_block: 45,753,484 (25.1%)
rb_ivar_get: 21,020,650 (11.5%)
rb_vm_setinstancevariable: 17,528,603 ( 9.6%)
rb_hash_aref: 11,892,856 ( 6.5%)
rb_vm_send: 11,723,471 ( 6.4%)
rb_vm_env_write: 10,434,452 ( 5.7%)
Module#===: 4,225,048 ( 2.3%)
rb_vm_invokesuper: 3,705,906 ( 2.0%)
Thread.current: 3,337,603 ( 1.8%)
rb_ary_entry: 3,114,378 ( 1.7%)
Hash#[]=: 2,509,912 ( 1.4%)
Array#empty?: 2,282,994 ( 1.3%)
rb_vm_invokeblock: 2,210,511 ( 1.2%)
Hash#fetch: 2,017,960 ( 1.1%)
_bi20: 1,975,147 ( 1.1%)
rb_zjit_defined_ivar: 1,897,127 ( 1.0%)
rb_vm_opt_getconstant_path: 1,813,294 ( 1.0%)
rb_ary_new_capa: 1,615,406 ( 0.9%)
Kernel#is_a?: 1,567,854 ( 0.9%)
rb_class_allocate_instance: 1,560,035 ( 0.9%)
```
Thanks to @eregon for the idea.
Co-authored-by: Jacob Denbeaux <jacob.denbeaux@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
|
|
|
|
The name is contradictory now, and we have other tests testing the same thing.
|
|
|
|
* Add Insn::StoreField and Insn::WriteBarrier
|
|
Make it easier to see what happens when one is changed.
|
|
* Correct JIT entry points for optionals so each optional starts with nil
before its initialization routine runs. Establish that
`jit_entry_points[filled_opts_num]` gives the appropriate entry point
* Correct number of HIR block parameters for each JIT entry point
* Entry points that share the same ISEQ PC get separate entries since
they start with different state. No more deduplication.
* Reject post parameters. This was hidden behind the check for optionals.
* Make sure to visit every BB in iseq_to_hir(). Some weren't visited
when the initialization routine for an optional terminates the block
in a `SideExit`. Remove the now impossible `FailedOptionalArguments`.
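To make the entry-point indexing concrete, here is a sketch of a Ruby method with two optionals, called with zero, one, and two of them filled. The method names are hypothetical; the mapping of each call to `jit_entry_points[filled_opts_num]` follows the description above and is not taken from the implementation.

```ruby
# Hypothetical method with one required and two optional parameters.
# The default for `punctuation` depends on `greeting`, so the
# initialization routines for unfilled optionals must run in order.
def greet(name, greeting = "Hello", punctuation = (greeting == "Hello" ? "!" : "."))
  "#{greeting}, #{name}#{punctuation}"
end

none_filled = greet("Ada")             # 0 optionals filled: both defaults run
one_filled  = greet("Ada", "Hi")       # 1 filled: only the punctuation default runs
two_filled  = greet("Ada", "Hi", "?")  # 2 filled: no default runs

# A post parameter (`last`) after an optional. Per the message above,
# methods shaped like this are rejected by the compiler for now.
def with_post(first, middle = 1, last)
  [first, middle, last]
end

post_result = with_post(:a, :z)
```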
|
|
|
|
|
|
|
|
|
|
|