| Age | Commit message | Author |
|
Resolves https://github.com/Shopify/ruby/issues/772
Adds profiling for the `getblockparamproxy` YARV instruction and handles the `nil` block case by pushing `nil` instead of the block proxy object. This improves `ratio_in_zjit` slightly on Lobsters (82.1% to 82.2%).
Profiling data for `getblockparamproxy` on Lobsters:
```
Top-6 getblockparamproxy handler (100.0% of total 3,353,291):
polymorphic: 2,337,372 (69.7%)
nil: 552,629 (16.5%)
iseq: 259,636 ( 7.7%)
no_profiles: 156,734 ( 4.7%)
proc: 40,223 ( 1.2%)
megamorphic: 6,697 ( 0.2%)
```
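For context, a minimal sketch of Ruby code that hits the `nil` case. As far as I understand CRuby's compiler, forwarding a block parameter with `&block` is what emits `getblockparamproxy`; the `Collection` class is made up for illustration and is not taken from Lobsters.

```ruby
# Hypothetical class for illustration only.
class Collection
  def initialize(items)
    @items = items
  end

  # Forwarding the block parameter with `&block` is the pattern that
  # (to my understanding) compiles to `getblockparamproxy`.
  def each(&block)
    @items.each(&block)
  end
end

collection = Collection.new([1, 2, 3])

# Called with a block: the block parameter is a real block.
seen = []
collection.each { |item| seen << item }

# Called without a block: the block parameter is nil, which is the case
# this change handles by pushing `nil`. Array#each then returns an Enumerator.
enum = collection.each
```

The second call is the interesting one: no block is given, so there is nothing to wrap in a proxy.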
Lobsters benchmark stats:
<details>
<summary>Stats before (master):</summary>
<p>
```
❯ ./run_benchmarks.rb --chruby 'ruby-zjit --zjit-stats' lobsters
***ZJIT: Printing ZJIT statistics on exit***
...
Top-20 side exit reasons (100.0% of total 15,338,024):
guard_type_failure: 6,889,050 (44.9%)
guard_shape_failure: 6,848,898 (44.7%)
block_param_proxy_not_iseq_or_ifunc: 1,008,525 ( 6.6%)
unhandled_hir_insn: 236,977 ( 1.5%)
compile_error: 191,763 ( 1.3%)
fixnum_mult_overflow: 50,739 ( 0.3%)
block_param_proxy_modified: 28,119 ( 0.2%)
patchpoint_stable_constant_names: 18,229 ( 0.1%)
unhandled_newarray_send_pack: 14,481 ( 0.1%)
unhandled_block_arg: 13,782 ( 0.1%)
fixnum_lshift_overflow: 10,085 ( 0.1%)
patchpoint_no_ep_escape: 7,815 ( 0.1%)
unhandled_yarv_insn: 7,540 ( 0.0%)
expandarray_failure: 4,533 ( 0.0%)
guard_super_method_entry: 4,475 ( 0.0%)
patchpoint_method_redefined: 1,207 ( 0.0%)
patchpoint_no_singleton_class: 1,130 ( 0.0%)
obj_to_string_fallback: 412 ( 0.0%)
guard_less_failure: 163 ( 0.0%)
interrupt: 82 ( 0.0%)
...
ratio_in_zjit: 82.1%
```
</p>
</details>
<details>
<summary>Stats after:</summary>
<p>
```
❯ ./run_benchmarks.rb --chruby 'ruby-zjit --zjit-stats' lobsters
***ZJIT: Printing ZJIT statistics on exit***
...
Top-20 side exit reasons (100.0% of total 15,061,422):
guard_type_failure: 6,892,934 (45.8%)
guard_shape_failure: 6,850,512 (45.5%)
block_param_proxy_not_iseq_or_ifunc: 549,823 ( 3.7%)
unhandled_hir_insn: 236,979 ( 1.6%)
compile_error: 191,782 ( 1.3%)
unhandled_yarv_insn: 128,695 ( 0.9%)
block_param_proxy_not_nil: 68,623 ( 0.5%)
fixnum_mult_overflow: 50,739 ( 0.3%)
patchpoint_stable_constant_names: 18,568 ( 0.1%)
unhandled_newarray_send_pack: 14,481 ( 0.1%)
block_param_proxy_modified: 13,819 ( 0.1%)
unhandled_block_arg: 13,798 ( 0.1%)
fixnum_lshift_overflow: 10,085 ( 0.1%)
patchpoint_no_ep_escape: 7,815 ( 0.1%)
expandarray_failure: 4,533 ( 0.0%)
guard_super_method_entry: 4,475 ( 0.0%)
patchpoint_method_redefined: 1,207 ( 0.0%)
obj_to_string_fallback: 1,140 ( 0.0%)
patchpoint_no_singleton_class: 1,130 ( 0.0%)
guard_less_failure: 163 ( 0.0%)
...
ratio_in_zjit: 82.2%
```
</p>
</details>
|
|
* ZJIT: Profile `invokesuper` instructions
* ZJIT: Introduce the `InvokeSuperDirect` HIR instruction
The new instruction is an optimized version of `InvokeSuper` when we know the `super` target is an ISEQ.
* ZJIT: Expand definition of unspecializable to more complex cases
* ZJIT: Ensure `invokesuper` optimization works when the inheritance hierarchy is modified
* ZJIT: Simplify `invokesuper` specialization to most common case
Looking at ruby-bench, most `super` calls don't pass a block, which means we can use the already optimized `SendWithoutBlockDirect`.
* ZJIT: Track `super` method entries directly to avoid GC issues
Because the method entry isn't typed as a `VALUE`, we set up barriers on its `VALUE` fields. But, that was insufficient as the method entry itself could be collected in certain cases, resulting in dangling objects. Now we track the method entry as a `VALUE` and can more naturally mark it and its children.
* ZJIT: Optimize `super` calls with simple argument forms
* ZJIT: Report the reason why we can't optimize an `invokesuper` instance
* ZJIT: Revise send fallback reasons for `super` calls
* ZJIT: Assert `super` calls are `FCALL` and don't need visibility checks
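To illustrate the call shapes discussed above, here is a small sketch with a `super` call that has no block at the call site and one that forwards a block explicitly. The classes are hypothetical, and which of these ZJIT actually specializes is not something this sketch asserts.

```ruby
# Hypothetical classes for illustration only.
class Base
  def greet(name)
    "hello, #{name}"
  end

  def each_item
    yield 1
    yield 2
  end
end

class Child < Base
  # Bare `super` with simple arguments and no block at the call site:
  # the shape the message above describes as the most common one.
  def greet(name)
    super.upcase
  end

  # `super` that explicitly forwards a block: the less common shape.
  def each_item(&block)
    super(&block)
  end
end

greeting = Child.new.greet("zjit")

items = []
Child.new.each_item { |i| items << i }
```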
|
|
* ZJIT: Add dump to file for --zjit-stats
* ZJIT: Rename --zjit-stats=quiet to --zjit-stats-quiet
|
|
lobsters:
```
Top-4 setivar fallback reasons (100.0% of total 7,789,008):
shape_transition: 6,074,085 (78.0%)
not_monomorphic: 1,484,013 (19.1%)
not_t_object: 172,629 ( 2.2%)
too_complex: 58,281 ( 0.7%)
Top-3 getivar fallback reasons (100.0% of total 9,348,832):
not_t_object: 4,658,833 (49.8%)
not_monomorphic: 4,542,316 (48.6%)
too_complex: 147,683 ( 1.6%)
Top-3 definedivar fallback reasons (100.0% of total 366,383):
not_monomorphic: 361,389 (98.6%)
too_complex: 3,062 ( 0.8%)
not_t_object: 1,932 ( 0.5%)
```
railsbench:
```
Top-3 setivar fallback reasons (100.0% of total 15,119,057):
shape_transition: 13,760,763 (91.0%)
not_monomorphic: 982,368 ( 6.5%)
not_t_object: 375,926 ( 2.5%)
Top-2 getivar fallback reasons (100.0% of total 14,438,747):
not_t_object: 7,643,870 (52.9%)
not_monomorphic: 6,794,877 (47.1%)
Top-2 definedivar fallback reasons (100.0% of total 209,613):
not_monomorphic: 209,526 (100.0%)
not_t_object: 87 ( 0.0%)
```
shipit:
```
Top-3 setivar fallback reasons (100.0% of total 14,516,254):
shape_transition: 8,613,512 (59.3%)
not_monomorphic: 5,761,398 (39.7%)
not_t_object: 141,344 ( 1.0%)
Top-2 getivar fallback reasons (100.0% of total 21,016,444):
not_monomorphic: 11,313,482 (53.8%)
not_t_object: 9,702,962 (46.2%)
Top-2 definedivar fallback reasons (100.0% of total 290,382):
not_monomorphic: 287,755 (99.1%)
not_t_object: 2,627 ( 0.9%)
```
|
|
lobsters:
```
Top-20 calls to C functions from JIT code (79.9% of total 97,004,883):
rb_vm_opt_send_without_block: 19,874,212 (20.5%)
rb_vm_setinstancevariable: 9,774,841 (10.1%)
rb_ivar_get: 9,358,866 ( 9.6%)
rb_hash_aref: 6,828,948 ( 7.0%)
rb_vm_send: 6,441,551 ( 6.6%)
rb_vm_env_write: 5,375,989 ( 5.5%)
rb_vm_invokesuper: 3,037,836 ( 3.1%)
Module#===: 2,562,446 ( 2.6%)
rb_ary_entry: 2,354,546 ( 2.4%)
Kernel#is_a?: 1,424,092 ( 1.5%)
rb_vm_opt_getconstant_path: 1,344,923 ( 1.4%)
Thread.current: 1,300,822 ( 1.3%)
rb_zjit_defined_ivar: 1,222,613 ( 1.3%)
rb_vm_invokeblock: 1,184,555 ( 1.2%)
Hash#[]=: 1,061,969 ( 1.1%)
rb_ary_push: 1,024,987 ( 1.1%)
rb_ary_new_capa: 904,003 ( 0.9%)
rb_str_buf_append: 833,782 ( 0.9%)
rb_class_allocate_instance: 822,626 ( 0.8%)
Hash#fetch: 755,913 ( 0.8%)
```
railsbench:
```
Top-20 calls to C functions from JIT code (74.8% of total 189,170,268):
rb_vm_opt_send_without_block: 29,870,307 (15.8%)
rb_vm_setinstancevariable: 17,631,199 ( 9.3%)
rb_hash_aref: 16,928,890 ( 8.9%)
rb_ivar_get: 14,441,240 ( 7.6%)
rb_vm_env_write: 11,571,001 ( 6.1%)
rb_vm_send: 11,153,457 ( 5.9%)
rb_vm_invokesuper: 7,568,267 ( 4.0%)
Module#===: 6,065,923 ( 3.2%)
Hash#[]=: 2,842,990 ( 1.5%)
rb_ary_entry: 2,766,125 ( 1.5%)
rb_ary_push: 2,722,079 ( 1.4%)
rb_vm_invokeblock: 2,594,398 ( 1.4%)
Thread.current: 2,560,129 ( 1.4%)
rb_str_getbyte: 1,965,627 ( 1.0%)
Kernel#is_a?: 1,961,815 ( 1.0%)
rb_vm_opt_getconstant_path: 1,863,678 ( 1.0%)
rb_hash_new_with_size: 1,796,456 ( 0.9%)
rb_class_allocate_instance: 1,785,043 ( 0.9%)
String#empty?: 1,713,414 ( 0.9%)
rb_ary_new_capa: 1,678,834 ( 0.9%)
```
shipit:
```
Top-20 calls to C functions from JIT code (83.4% of total 182,402,821):
rb_vm_opt_send_without_block: 45,753,484 (25.1%)
rb_ivar_get: 21,020,650 (11.5%)
rb_vm_setinstancevariable: 17,528,603 ( 9.6%)
rb_hash_aref: 11,892,856 ( 6.5%)
rb_vm_send: 11,723,471 ( 6.4%)
rb_vm_env_write: 10,434,452 ( 5.7%)
Module#===: 4,225,048 ( 2.3%)
rb_vm_invokesuper: 3,705,906 ( 2.0%)
Thread.current: 3,337,603 ( 1.8%)
rb_ary_entry: 3,114,378 ( 1.7%)
Hash#[]=: 2,509,912 ( 1.4%)
Array#empty?: 2,282,994 ( 1.3%)
rb_vm_invokeblock: 2,210,511 ( 1.2%)
Hash#fetch: 2,017,960 ( 1.1%)
_bi20: 1,975,147 ( 1.1%)
rb_zjit_defined_ivar: 1,897,127 ( 1.0%)
rb_vm_opt_getconstant_path: 1,813,294 ( 1.0%)
rb_ary_new_capa: 1,615,406 ( 0.9%)
Kernel#is_a?: 1,567,854 ( 0.9%)
rb_class_allocate_instance: 1,560,035 ( 0.9%)
```
Thanks to @eregon for the idea.
Co-authored-by: Jacob Denbeaux <jacob.denbeaux@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
|
|
|
|
|
|
This implements Shopify#854:
- Splits boot-time and enable-time initialization,
tracks progress with `InitializationState` enum
- Introduces `RubyVM::ZJIT.enable` Ruby method for
enabling the JIT lazily, if not already enabled
- Introduces `--zjit-disable` flag, which can be
used alongside the other `--zjit-*` flags but
prevents enabling the JIT at boot time
- Adds ZJIT infra to support JIT hooks, but this
is not currently exercised (Shopify/ruby#667)
Left for future enhancements:
- Support kwargs for overriding the CLI flags in
`RubyVM::ZJIT.enable`
Closes Shopify#854
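A minimal usage sketch for the lazy-enable path, assuming only that `RubyVM::ZJIT.enable` exists as described above. The helper name `enable_zjit_lazily` and its return symbols are made up here, and the guard lets the snippet run on builds without ZJIT.

```ruby
# Hypothetical helper: requests ZJIT at runtime when the build provides it.
# Returns :requested if RubyVM::ZJIT.enable was called, :unavailable otherwise.
def enable_zjit_lazily
  # Guard so this also runs on Rubies built without ZJIT.
  unless defined?(RubyVM::ZJIT) && RubyVM::ZJIT.respond_to?(:enable)
    return :unavailable
  end

  # Per the message above, this enables the JIT if it is not already enabled.
  RubyVM::ZJIT.enable
  :requested
end

status = enable_zjit_lazily
```

Presumably this pairs with booting under `--zjit-disable` plus other `--zjit-*` flags, then calling the helper once the application has finished loading.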
|
|
|
|
|
|
Make it more obvious that this hasn't been handled and could be
broken down more.
|
|
|
|
I made a special kind of `ProfiledType` that looks at specific objects, not just their classes/shapes (https://github.com/ruby/ruby/pull/15051). Then I profiled some of our benchmarks.
For lobsters:
```
Top-6 invokeblock handler (100.0% of total 1,064,155):
megamorphic: 494,931 (46.5%)
monomorphic_iseq: 337,171 (31.7%)
polymorphic: 113,381 (10.7%)
monomorphic_ifunc: 52,260 ( 4.9%)
monomorphic_other: 38,970 ( 3.7%)
no_profiles: 27,442 ( 2.6%)
```
For railsbench:
```
Top-6 invokeblock handler (100.0% of total 2,529,104):
monomorphic_iseq: 834,452 (33.0%)
megamorphic: 818,347 (32.4%)
polymorphic: 632,273 (25.0%)
monomorphic_ifunc: 224,243 ( 8.9%)
monomorphic_other: 19,595 ( 0.8%)
no_profiles: 194 ( 0.0%)
```
For shipit:
```
Top-6 invokeblock handler (100.0% of total 2,104,148):
megamorphic: 1,269,889 (60.4%)
polymorphic: 411,475 (19.6%)
no_profiles: 173,367 ( 8.2%)
monomorphic_other: 118,619 ( 5.6%)
monomorphic_iseq: 84,891 ( 4.0%)
monomorphic_ifunc: 45,907 ( 2.2%)
```
Seems like a monomorphic case for a specific ISEQ actually isn't a bad way of going about this, at least to start...
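A rough sketch of what the two ends of that spectrum look like in Ruby source. The method names are made up, and how the profiler would classify each `yield` is my reading of the categories above, not measured output.

```ruby
# The `yield` here only ever sees one block body, so its profile would
# plausibly be monomorphic for that ISEQ.
def mono_yield(value)
  yield value
end

# The `yield` here sees a different block body from each call site,
# which is how polymorphic or megamorphic profiles come about.
def poly_yield(value)
  yield value
end

# One call site, one block ISEQ, called repeatedly.
doubled = (1..3).map { |i| mono_yield(i) { |x| x * 2 } }

# Three call sites, three distinct block ISEQs.
varied = [
  poly_yield(2) { |x| x + 1 },
  poly_yield(2) { |x| x * 10 },
  poly_yield(2) { |x| -x },
]
```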
|
|
|
|
new ZJIT stats excerpt from liquid-runtime:
```
vm_read_from_parent_iseq_local_count: 10,909,753
guard_type_count: 45,109,441
guard_type_exit_ratio: 4.3%
guard_shape_count: 15,272,133
guard_shape_exit_ratio: 20.1%
code_region_bytes: 3,899,392
```
lobsters
```
guard_type_count: 71,765,580
guard_type_exit_ratio: 4.3%
guard_shape_count: 21,872,560
guard_shape_exit_ratio: 8.0%
```
railsbench
```
guard_type_count: 117,661,124
guard_type_exit_ratio: 0.7%
guard_shape_count: 28,032,665
guard_shape_exit_ratio: 5.1%
```
shipit
```
guard_type_count: 106,195,615
guard_type_exit_ratio: 3.5%
guard_shape_count: 33,672,673
guard_shape_exit_ratio: 10.1%
```
|
|
Kokubun brought up that "complex" is a more fitting name for what these
counters count. Thanks!
Also:
- make the SendFallbackReason enum name consistent with the counter name
- rewrite the printout prompt in zjit.rb
|
|
`send_fallback_fancy_call_feature`
When we fall back because the callee has an unsupported signature, it
was a little inaccurate to use `send_fallback_send_not_optimized_method_type`,
since we do support the method type in other situations.
Add a new `send_fallback_fancy_call_feature` for these situations. Also add
`send_fallback_bmethod_non_iseq_proc` so we can stop using
`not_optimized_method_type` completely for bmethods.
Add accompanying `fancy_arg_pass_*` counters. These don't sum to the number
of unoptimized calls that run, but they establish the level of support the
optimizer provides for a given workload.
|
|
|
|
We can measure how many we can remove by adding type information to C
functions, etc.
|
|
* ZJIT: Print out full path to --zjit-trace-exits output
This helps with any `chdir`-related issues.
* Don't include dot
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
---------
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
|
|
It's mostly a duplicate of not-inlined-cfuncs right now.
|
|
ZJIT: Revert 9a75c05
|
|
Copy the YJIT simple inliner except for the kwargs bit. It works great!
|
|
* ZJIT: Count unoptimized `Send`
This includes `Send` in `send fallback reasons` to guide future
optimizations.
* ZJIT: Create dedicated def_type counter for Send
|
|
|
|
|
|
|
|
|
|
* Using https://www.rubyexplorer.xyz/?c=frames+%3D+results%5B%3Aframes%5D.dup shows dup is called regardless
|
|
|
|
* ZJIT: Add HIR for CCallWithFrame
* ZJIT: Update stats to count not inlined cfunc calls
* ZJIT: Stops optimizing SendWithoutBlock when TracePoint is activated
* ZJIT: Fall back to SendWithoutBlock when CCallWithFrame has too many args
* ZJIT: Rename cfun -> cfunc
|
|
|
|
`File.binwrite` with a big string can exceed the `INT_MAX` limit of write(2)
and fail with an exception.
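One way to stay under a per-call size limit is to write in bounded chunks. This is a sketch of that general technique, not necessarily what this commit does; `write_in_chunks` and the 1 GiB default are assumptions made for illustration.

```ruby
# Hypothetical chunk size, comfortably below INT_MAX bytes.
CHUNK_BYTES = 1 << 30

# Writes `data` to `path` in binary mode, at most `chunk_bytes` per write
# call, and returns the total number of bytes written.
def write_in_chunks(path, data, chunk_bytes = CHUNK_BYTES)
  File.open(path, "wb") do |io|
    offset = 0
    while offset < data.bytesize
      # IO#write returns the number of bytes written for this chunk.
      offset += io.write(data.byteslice(offset, chunk_bytes))
    end
  end
  data.bytesize
end
```

Slicing with `byteslice` rather than character indexing keeps each chunk at a known byte size regardless of the string's encoding.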
|
|
|
|
|
|
|
|
|
|
|
|
Add side exit tracing functionality for ZJIT
|
|
|
|
|
|
* ZJIT: Add stats for cfuncs that are not optimized
* ZJIT: Add IncrCounterPtr HIR instead
From `lobsters`
```
Top-20 Unoptimized C functions (73.0% of total 15,276,688):
Kernel#is_a?: 2,052,363 (13.4%)
Class#current: 1,892,623 (12.4%)
String#to_s: 975,973 ( 6.4%)
Hash#key?: 677,623 ( 4.4%)
String#empty?: 636,468 ( 4.2%)
TrueClass#===: 457,232 ( 3.0%)
Hash#[]=: 455,908 ( 3.0%)
FalseClass#===: 448,798 ( 2.9%)
ActiveSupport::OrderedOptions#_get: 377,468 ( 2.5%)
Kernel#kind_of?: 339,551 ( 2.2%)
Kernel#dup: 329,371 ( 2.2%)
String#==: 324,286 ( 2.1%)
String#include?: 297,528 ( 1.9%)
Hash#[]: 294,561 ( 1.9%)
Array#include?: 287,145 ( 1.9%)
Kernel#block_given?: 283,633 ( 1.9%)
BasicObject#!=: 278,874 ( 1.8%)
Hash#delete: 250,951 ( 1.6%)
Set#include?: 246,447 ( 1.6%)
NilClass#===: 242,776 ( 1.6%)
```
From `liquid-render`
```
Top-20 Unoptimized C functions (99.8% of total 5,195,549):
Hash#key?: 2,459,048 (47.3%)
String#to_s: 1,119,758 (21.6%)
Set#include?: 799,469 (15.4%)
Kernel#is_a?: 214,223 ( 4.1%)
Integer#<<: 171,073 ( 3.3%)
Integer#/: 127,622 ( 2.5%)
CGI::EscapeExt#escapeHTML: 56,971 ( 1.1%)
Regexp#===: 50,008 ( 1.0%)
String#empty?: 43,990 ( 0.8%)
String#===: 36,838 ( 0.7%)
String#==: 21,309 ( 0.4%)
Time#strftime: 21,251 ( 0.4%)
String#strip: 15,271 ( 0.3%)
String#scan: 13,753 ( 0.3%)
String#+@: 12,603 ( 0.2%)
Array#include?: 8,059 ( 0.2%)
String#+: 5,295 ( 0.1%)
String#dup: 4,606 ( 0.1%)
String#-@: 3,213 ( 0.1%)
Class#generate: 3,011 ( 0.1%)
```
|
|
* ZJIT: Add polymorphism counters
* .
* .
|
|
ZJIT: Measure writing to locals with level > 0
|
|
|
|
This is a) a lot of memory traffic and b) another good proxy for our
ability to strength reduce method calls.
|
|
I thought about creating a new HIR like `SendWithoutBlockFailedToOptimize` that can carry very specific reasons later. But it'll mean adding it to every branch matching `SendWithoutBlock` and may make code unnecessarily complicated.
So I take the easier path for now:
```
Top-4 send fallback def_types (100.0% of total 21,375,357):
cfunc: 20,164,487 (94.3%)
optimized: 1,197,897 ( 5.6%)
attrset: 12,953 ( 0.1%)
alias: 20 ( 0.0%)
```
|
|
|
|
|
|
[ZJIT] Avoid mutating string in zjit stats
GitHub runs with a Symbol patch that causes a frozen string error when
running `--zjit-stats`
```rb
class Symbol
  alias_method :to_s, :name
end
```
I remember hearing that Shopify runs a similar patch, and that we might
try to make this the default behavior in Ruby some day.
Any chance we can avoid mutating the string here in case it's frozen?
That does mean we'll end up making some extra strings when it's not
frozen, but I think that's OK for printing stats.
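A small sketch of the failure mode and the non-mutating alternative. It uses `Symbol#name` (Ruby 3.0+), which returns a frozen string, instead of applying the patch globally; the counter name and the formatting are illustrative and not the actual code in `zjit.rb`.

```ruby
# Symbol#name returns a frozen String, which is what `to_s` would
# return under the patch described above.
label = :guard_type_failure.name

# Appending in place raises FrozenError on a frozen string.
mutation_succeeded =
  begin
    label << ":"
    true
  rescue FrozenError
    false
  end

# Building a new string leaves the original untouched, at the cost of
# one extra allocation.
line = "#{label}: 42"
```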
|