| Age | Commit message (Collapse) | Author |
|
|
|
|
|
use of it later (e.g. fold_constants for ArrayAref)
|
|
|
|
* ZJIT: Profile `invokesuper` instructions
* ZJIT: Introduce the `InvokeSuperDirect` HIR instruction
The new instruction is an optimized version of `InvokeSuper` when we know the `super` target is an ISEQ.
* ZJIT: Expand definition of unspecializable to more complex cases
* ZJIT: Ensure `invokesuper` optimization works when the inheritance hierarchy is modified
* ZJIT: Simplify `invokesuper` specialization to most common case
Looking at ruby-bench, most `super` calls don't pass a block, which means we can use the already optimized `SendWithoutBlockDirect`.
* ZJIT: Track `super` method entries directly to avoid GC issues
Because the method entry isn't typed as a `VALUE`, we set up barriers on its `VALUE` fields. But, that was insufficient as the method entry itself could be collected in certain cases, resulting in dangling objects. Now we track the method entry as a `VALUE` and can more naturally mark it and its children.
* ZJIT: Optimize `super` calls with simple argument forms
* ZJIT: Report the reason why we can't optimize an `invokesuper` instance
* ZJIT: Revise send fallback reasons for `super` calls
* ZJIT: Assert `super` calls are `FCALL` and don't need visibily checks
|
|
Make sure we check if we have seen a singleton for this class before assuming we have not. Port the API from YJIT.
|
|
Resolves https://github.com/Shopify/ruby/issues/915
When we have `LoadField` with a `Shape` return type, we can fold it similar to the object case.
`GuardBitEquals` can be removed when the argument is `Const` and the values are equal.
The behaviors for loading instances variables from frozen/dynamic objects are already covered in existing tests so no new tests were added.
|
|
This reduces some processing and makes the HIR more accurate.
|
|
This is used a lot in optcarrot.
|
|
You can see the reordered args in the new Snapshot right before the
DirectSend insn:
v14:Any = Snapshot FrameState { pc: 0x00, stack: [v6, v11, v13], locals: [] }
PatchPoint MethodRedefined(Object@0x00, a@0x00, cme:0x00)
PatchPoint NoSingletonClass(Object@0x00)
v22:HeapObject[class_exact*:Object@VALUE(0x00)] = GuardType v6, HeapObject[class_exact*:Object@VALUE(0x00)]
- v23:BasicObject = SendWithoutBlockDirect v22, :a (0x00), v13, v11
- v16:Any = Snapshot FrameState { pc: 0x00, stack: [v23], locals: [] }
+ v23:Any = Snapshot FrameState { pc: 0x00, stack: [v6, v13, v11], locals: [] }
+ v24:BasicObject = SendWithoutBlockDirect v22, :a (0x00), v13, v11
+ v16:Any = Snapshot FrameState { pc: 0x00, stack: [v24], locals: [] }
|
|
|
|
|
|
|
|
GuardShape is just load+guard, so use the existing HIR instructions for load+guard. Probably makes future analysis slightly easier.
|
|
Inline `Array#[]=` into `ArrayAset`.
|
|
jump, branchif, etc don't invalidate locals in the JIT; they might in the interpreter because they can execute arbitrary code, but the JIT side exits before that happens.
|
|
This can happen with documentation updates and we don't want
those to trip on ZJIT tests.
Redact the whole name since names like "_bi342" aren't that helpful
anyways.
|
|
|
|
|
|
|
|
It's just a nicety (they fit fine as CUInt32) but this makes printing
look nicer in real execution and also in tests (helps with #15489).
Co-authored-by: Randy Stauner <randy@r4s6.net>
|
|
|
|
Fix https://github.com/Shopify/ruby/issues/874
|
|
* ZJIT: Fold LoadField on frozen objects to constants
When accessing instance variables from frozen objects via attr_reader/
attr_accessor, fold the LoadField instruction to a constant at compile
time. This enables further optimizations like constant propagation.
- Add fold_getinstancevariable_frozen optimization in Function::optimize
- Check if receiver type has a known ruby_object() that is frozen
- Read the field value at compile time and replace with Const instruction
- Add 10 unit tests covering various value types (fixnum, string, symbol,
nil, true/false) and negative cases (unfrozen, dynamic receiver)
* Run zjit-test-update
* Add a test that we don't fold non-BasicObject
* Small cleanups
---------
Co-authored-by: Max Bernstein <ruby@bernsteinbear.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
This fixes a crash when the new shape after a transition is too complex;
we need to check that it's not complex before trying to read by index.
|
|
This adds comments to the hir dump output like this:
v13:BasicObject = SendWithoutBlock v6, :test, v11 # SendFallbackReason: Complex argument passing
|
|
|
|
|
|
|
|
|
|
ZJIT: Print local variable names GetLocal and SetLocal instructions
|
|
Since we do a decent job of pre-sizing objects, don't handle the case where we would need to re-size an object. Also don't handle too-complex shapes.
lobsters stats before:
```
Top-20 calls to C functions from JIT code (79.4% of total 90,051,140):
rb_vm_opt_send_without_block: 19,762,433 (21.9%)
rb_vm_setinstancevariable: 7,698,314 ( 8.5%)
rb_hash_aref: 6,767,461 ( 7.5%)
rb_vm_env_write: 5,373,080 ( 6.0%)
rb_vm_send: 5,049,229 ( 5.6%)
rb_vm_getinstancevariable: 4,535,259 ( 5.0%)
rb_obj_is_kind_of: 3,746,306 ( 4.2%)
rb_ivar_get_at_no_ractor_check: 3,745,237 ( 4.2%)
rb_vm_invokesuper: 3,037,467 ( 3.4%)
rb_ary_entry: 2,351,983 ( 2.6%)
rb_vm_opt_getconstant_path: 1,344,740 ( 1.5%)
rb_vm_invokeblock: 1,184,474 ( 1.3%)
Hash#[]=: 1,064,288 ( 1.2%)
rb_gc_writebarrier: 1,006,972 ( 1.1%)
rb_ec_ary_new_from_values: 902,687 ( 1.0%)
fetch: 898,667 ( 1.0%)
rb_str_buf_append: 833,787 ( 0.9%)
rb_class_allocate_instance: 822,024 ( 0.9%)
Hash#fetch: 699,580 ( 0.8%)
_bi20: 682,068 ( 0.8%)
Top-4 setivar fallback reasons (100.0% of total 7,732,326):
shape_transition: 6,032,109 (78.0%)
not_monomorphic: 1,469,300 (19.0%)
not_t_object: 172,636 ( 2.2%)
too_complex: 58,281 ( 0.8%)
```
lobsters stats after:
```
Top-20 calls to C functions from JIT code (79.0% of total 88,322,656):
rb_vm_opt_send_without_block: 19,777,880 (22.4%)
rb_hash_aref: 6,771,589 ( 7.7%)
rb_vm_env_write: 5,372,789 ( 6.1%)
rb_gc_writebarrier: 5,195,527 ( 5.9%)
rb_vm_send: 5,049,145 ( 5.7%)
rb_vm_getinstancevariable: 4,538,485 ( 5.1%)
rb_obj_is_kind_of: 3,746,241 ( 4.2%)
rb_ivar_get_at_no_ractor_check: 3,745,172 ( 4.2%)
rb_vm_invokesuper: 3,037,157 ( 3.4%)
rb_ary_entry: 2,351,968 ( 2.7%)
rb_vm_setinstancevariable: 1,703,337 ( 1.9%)
rb_vm_opt_getconstant_path: 1,344,730 ( 1.5%)
rb_vm_invokeblock: 1,184,290 ( 1.3%)
Hash#[]=: 1,061,868 ( 1.2%)
rb_ec_ary_new_from_values: 902,666 ( 1.0%)
fetch: 898,666 ( 1.0%)
rb_str_buf_append: 833,784 ( 0.9%)
rb_class_allocate_instance: 821,778 ( 0.9%)
Hash#fetch: 755,913 ( 0.9%)
Top-4 setivar fallback reasons (100.0% of total 1,703,337):
not_monomorphic: 1,472,405 (86.4%)
not_t_object: 172,629 (10.1%)
too_complex: 58,281 ( 3.4%)
new_shape_needs_extension: 22 ( 0.0%)
```
I also noticed that primitive printing in HIR was broken so I fixed that.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
We generally know the receiver's class from profile info. I see 600k of these when running lobsters.
|
|
* All Invariant::SingleRactorMode PatchPoint are replaced by
assume_single_ractor_mode() to fix https://github.com/Shopify/ruby/issues/875
for SingleRactorMode patchpoints.
|
|
|
|
Don't call a C function.
|
|
|
|
Same as Integer#>>. Also add more strict type checks for both Integer#>>
and Integer#<<.
|
|
Append a codepoint.
|
|
This otherwise would miss annotations of C methods.
|
|
Use actual receiver type. This gives us better method lookup.
|
|
|
|
ZJIT: Standardize C call related insn fields
- Add `recv` field to `CCall` and `CCallWithFrame` so now all method dispatch
related instructions have `recv` field, separate from `args` field.
This ensures consistent pointer arithmetic when generating code for these
instructions.
- Standardize `recv` field's display position in send related instructions.
|
|
ZJIT: Optimize variadic cfunc Send calls into CCallVariadic
|
|
lobsters:
```
Top-4 setivar fallback reasons (100.0% of total 7,789,008):
shape_transition: 6,074,085 (78.0%)
not_monomorphic: 1,484,013 (19.1%)
not_t_object: 172,629 ( 2.2%)
too_complex: 58,281 ( 0.7%)
Top-3 getivar fallback reasons (100.0% of total 9,348,832):
not_t_object: 4,658,833 (49.8%)
not_monomorphic: 4,542,316 (48.6%)
too_complex: 147,683 ( 1.6%)
Top-3 definedivar fallback reasons (100.0% of total 366,383):
not_monomorphic: 361,389 (98.6%)
too_complex: 3,062 ( 0.8%)
not_t_object: 1,932 ( 0.5%)
```
railsbench:
```
Top-3 setivar fallback reasons (100.0% of total 15,119,057):
shape_transition: 13,760,763 (91.0%)
not_monomorphic: 982,368 ( 6.5%)
not_t_object: 375,926 ( 2.5%)
Top-2 getivar fallback reasons (100.0% of total 14,438,747):
not_t_object: 7,643,870 (52.9%)
not_monomorphic: 6,794,877 (47.1%)
Top-2 definedivar fallback reasons (100.0% of total 209,613):
not_monomorphic: 209,526 (100.0%)
not_t_object: 87 ( 0.0%)
```
shipit:
```
Top-3 setivar fallback reasons (100.0% of total 14,516,254):
shape_transition: 8,613,512 (59.3%)
not_monomorphic: 5,761,398 (39.7%)
not_t_object: 141,344 ( 1.0%)
Top-2 getivar fallback reasons (100.0% of total 21,016,444):
not_monomorphic: 11,313,482 (53.8%)
not_t_object: 9,702,962 (46.2%)
Top-2 definedivar fallback reasons (100.0% of total 290,382):
not_monomorphic: 287,755 (99.1%)
not_t_object: 2,627 ( 0.9%)
```
|
|
Don't support shape transitions for now.
|
|
JIT-to-JIT sends don't blit locals to nil in the callee's
EP memory region because HIR is aware of this initial state and
memory ops are only done when necessary. Previously, we
read from this initialized memory by emitting `GetLocal` in e.g. BBs
that are immediate successor to an entrypoint.
The entry points sets up the frame state properly and we also reload
locals if necessary after an operation that potentially makes the
environment escape. So, listen to the frame state when it's supposed to
be up-to-date (`!local_inval`).
|
|
|
|
|
|
|