| Age | Commit message (Collapse) | Author |
|
* ZJIT: Profile `invokesuper` instructions
* ZJIT: Introduce the `InvokeSuperDirect` HIR instruction
The new instruction is an optimized version of `InvokeSuper` when we know the `super` target is an ISEQ.
* ZJIT: Expand definition of unspecializable to more complex cases
* ZJIT: Ensure `invokesuper` optimization works when the inheritance hierarchy is modified
* ZJIT: Simplify `invokesuper` specialization to most common case
Looking at ruby-bench, most `super` calls don't pass a block, which means we can use the already optimized `SendWithoutBlockDirect`.
* ZJIT: Track `super` method entries directly to avoid GC issues
Because the method entry isn't typed as a `VALUE`, we set up barriers on its `VALUE` fields. But that was insufficient, as the method entry itself could be collected in certain cases, resulting in dangling objects. Now we track the method entry as a `VALUE` and can more naturally mark it and its children.
* ZJIT: Optimize `super` calls with simple argument forms
* ZJIT: Report the reason why we can't optimize an `invokesuper` instance
* ZJIT: Revise send fallback reasons for `super` calls
* ZJIT: Assert `super` calls are `FCALL` and don't need visibility checks
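For illustration, a minimal Ruby sketch of the shape of call these commits target: a `super` call with a simple argument form and no block, followed by a change to the inheritance hierarchy that alters what `super` resolves to. The class and module names (`Base`, `Child`, `Loud`) are hypothetical, and how ZJIT invalidates the compiled code is not shown here.

```ruby
# Hypothetical classes for illustration only.
class Base
  def greet(name)
    "hello, #{name}"
  end
end

class Child < Base
  def greet(name)
    # Simple argument form, no block passed to `super`.
    super(name).upcase
  end
end

child = Child.new
before = child.greet("zjit") # `super` resolves to Base#greet

# Modify the inheritance hierarchy after the call site has already run.
module Loud
  def greet(name)
    super + "!"
  end
end
Child.include(Loud)

after = child.greet("zjit") # `super` in Child#greet now resolves to Loud#greet

puts before
puts after
```

Any specialization of the `super` call in `Child#greet` has to stay correct across the `Child.include(Loud)` line, which is why the second call is the interesting one.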
|
|
Make sure we check if we have seen a singleton for this class before assuming we have not. Port the API from YJIT.
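A small Ruby sketch of why a "no singleton class" assumption matters for method dispatch, assuming that is what the check guards: once an object has a singleton class, a lookup based only on its class can pick the wrong method. The `Greeter` class is hypothetical.

```ruby
# Hypothetical class for illustration only.
class Greeter
  def hi
    "hi"
  end
end

plain = Greeter.new
special = Greeter.new

# Defining a method on one object creates a singleton class for it.
def special.hi
  "HI"
end

plain_result = plain.hi     # dispatches to Greeter#hi
special_result = special.hi # dispatches to the singleton method

puts plain_result
puts special_result
```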
|
|
GuardShape is just load+guard, so use the existing HIR instructions for load+guard. Probably makes future analysis slightly easier.
|
|
Inline `Array#[]=` into `ArrayAset`.
|
|
Fix https://github.com/Shopify/ruby/issues/874
|
|
|
|
|
|
|
|
Fixes https://github.com/Shopify/ruby/issues/902
This pull request adds code generation for dividing fixnums.
Testing confirms the normal case, flooring, and side-exiting on division by zero.
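For reference, the Ruby-level behavior that the three tested cases correspond to can be sketched as follows. The `div` helper is hypothetical and only wraps Ruby's built-in `Integer#/`.

```ruby
# Hypothetical helper wrapping Integer#/ so the division happens in a method body.
def div(a, b)
  a / b
end

# Normal case.
normal = div(7, 2)

# Flooring: Ruby rounds integer division toward negative infinity,
# not toward zero.
floored = [div(-7, 2), div(7, -2), div(-7, -2)]

# Division by zero raises, so compiled code cannot just emit a divide
# instruction for this case.
zero_div = begin
  div(1, 0)
rescue ZeroDivisionError => e
  e.class
end

p normal
p floored
p zero_div
```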
|
|
Since we do a decent job of pre-sizing objects, don't handle the case where we would need to re-size an object. Also don't handle too-complex shapes.
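For illustration, a minimal Ruby sketch of the kind of writes the `shape_transition` counter presumably refers to: an ivar write that adds a new instance variable, as opposed to one that overwrites an existing one. The `User` class is hypothetical, and which of these writes the JIT specializes is an assumption here.

```ruby
# Hypothetical class for illustration only.
class User
  def initialize(name)
    @name = name # first write to @name: the object gains an ivar
  end

  def rename(new_name)
    @name = new_name # @name already exists: same set of ivars before and after
  end

  def tag!(tag)
    @tag = tag # first write to @tag: the object gains another ivar
  end
end

user = User.new("a")
ivars_initial = user.instance_variables

user.rename("b")
ivars_after_rename = user.instance_variables

user.tag!("admin")
ivars_after_tag = user.instance_variables

p ivars_initial
p ivars_after_rename
p ivars_after_tag
```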
lobsters stats before:
```
Top-20 calls to C functions from JIT code (79.4% of total 90,051,140):
rb_vm_opt_send_without_block: 19,762,433 (21.9%)
rb_vm_setinstancevariable: 7,698,314 ( 8.5%)
rb_hash_aref: 6,767,461 ( 7.5%)
rb_vm_env_write: 5,373,080 ( 6.0%)
rb_vm_send: 5,049,229 ( 5.6%)
rb_vm_getinstancevariable: 4,535,259 ( 5.0%)
rb_obj_is_kind_of: 3,746,306 ( 4.2%)
rb_ivar_get_at_no_ractor_check: 3,745,237 ( 4.2%)
rb_vm_invokesuper: 3,037,467 ( 3.4%)
rb_ary_entry: 2,351,983 ( 2.6%)
rb_vm_opt_getconstant_path: 1,344,740 ( 1.5%)
rb_vm_invokeblock: 1,184,474 ( 1.3%)
Hash#[]=: 1,064,288 ( 1.2%)
rb_gc_writebarrier: 1,006,972 ( 1.1%)
rb_ec_ary_new_from_values: 902,687 ( 1.0%)
fetch: 898,667 ( 1.0%)
rb_str_buf_append: 833,787 ( 0.9%)
rb_class_allocate_instance: 822,024 ( 0.9%)
Hash#fetch: 699,580 ( 0.8%)
_bi20: 682,068 ( 0.8%)
Top-4 setivar fallback reasons (100.0% of total 7,732,326):
shape_transition: 6,032,109 (78.0%)
not_monomorphic: 1,469,300 (19.0%)
not_t_object: 172,636 ( 2.2%)
too_complex: 58,281 ( 0.8%)
```
lobsters stats after:
```
Top-20 calls to C functions from JIT code (79.0% of total 88,322,656):
rb_vm_opt_send_without_block: 19,777,880 (22.4%)
rb_hash_aref: 6,771,589 ( 7.7%)
rb_vm_env_write: 5,372,789 ( 6.1%)
rb_gc_writebarrier: 5,195,527 ( 5.9%)
rb_vm_send: 5,049,145 ( 5.7%)
rb_vm_getinstancevariable: 4,538,485 ( 5.1%)
rb_obj_is_kind_of: 3,746,241 ( 4.2%)
rb_ivar_get_at_no_ractor_check: 3,745,172 ( 4.2%)
rb_vm_invokesuper: 3,037,157 ( 3.4%)
rb_ary_entry: 2,351,968 ( 2.7%)
rb_vm_setinstancevariable: 1,703,337 ( 1.9%)
rb_vm_opt_getconstant_path: 1,344,730 ( 1.5%)
rb_vm_invokeblock: 1,184,290 ( 1.3%)
Hash#[]=: 1,061,868 ( 1.2%)
rb_ec_ary_new_from_values: 902,666 ( 1.0%)
fetch: 898,666 ( 1.0%)
rb_str_buf_append: 833,784 ( 0.9%)
rb_class_allocate_instance: 821,778 ( 0.9%)
Hash#fetch: 755,913 ( 0.9%)
Top-4 setivar fallback reasons (100.0% of total 1,703,337):
not_monomorphic: 1,472,405 (86.4%)
not_t_object: 172,629 (10.1%)
too_complex: 58,281 ( 3.4%)
new_shape_needs_extension: 22 ( 0.0%)
```
I also noticed that primitive printing in HIR was broken so I fixed that.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
lobsters:
```
Top-4 setivar fallback reasons (100.0% of total 7,789,008):
shape_transition: 6,074,085 (78.0%)
not_monomorphic: 1,484,013 (19.1%)
not_t_object: 172,629 ( 2.2%)
too_complex: 58,281 ( 0.7%)
Top-3 getivar fallback reasons (100.0% of total 9,348,832):
not_t_object: 4,658,833 (49.8%)
not_monomorphic: 4,542,316 (48.6%)
too_complex: 147,683 ( 1.6%)
Top-3 definedivar fallback reasons (100.0% of total 366,383):
not_monomorphic: 361,389 (98.6%)
too_complex: 3,062 ( 0.8%)
not_t_object: 1,932 ( 0.5%)
```
railsbench:
```
Top-3 setivar fallback reasons (100.0% of total 15,119,057):
shape_transition: 13,760,763 (91.0%)
not_monomorphic: 982,368 ( 6.5%)
not_t_object: 375,926 ( 2.5%)
Top-2 getivar fallback reasons (100.0% of total 14,438,747):
not_t_object: 7,643,870 (52.9%)
not_monomorphic: 6,794,877 (47.1%)
Top-2 definedivar fallback reasons (100.0% of total 209,613):
not_monomorphic: 209,526 (100.0%)
not_t_object: 87 ( 0.0%)
```
shipit:
```
Top-3 setivar fallback reasons (100.0% of total 14,516,254):
shape_transition: 8,613,512 (59.3%)
not_monomorphic: 5,761,398 (39.7%)
not_t_object: 141,344 ( 1.0%)
Top-2 getivar fallback reasons (100.0% of total 21,016,444):
not_monomorphic: 11,313,482 (53.8%)
not_t_object: 9,702,962 (46.2%)
Top-2 definedivar fallback reasons (100.0% of total 290,382):
not_monomorphic: 287,755 (99.1%)
not_t_object: 2,627 ( 0.9%)
```
|
|
This is good for protoboeuf and other binary parsing
|
|
lobsters:
```
Top-20 calls to C functions from JIT code (79.9% of total 97,004,883):
rb_vm_opt_send_without_block: 19,874,212 (20.5%)
rb_vm_setinstancevariable: 9,774,841 (10.1%)
rb_ivar_get: 9,358,866 ( 9.6%)
rb_hash_aref: 6,828,948 ( 7.0%)
rb_vm_send: 6,441,551 ( 6.6%)
rb_vm_env_write: 5,375,989 ( 5.5%)
rb_vm_invokesuper: 3,037,836 ( 3.1%)
Module#===: 2,562,446 ( 2.6%)
rb_ary_entry: 2,354,546 ( 2.4%)
Kernel#is_a?: 1,424,092 ( 1.5%)
rb_vm_opt_getconstant_path: 1,344,923 ( 1.4%)
Thread.current: 1,300,822 ( 1.3%)
rb_zjit_defined_ivar: 1,222,613 ( 1.3%)
rb_vm_invokeblock: 1,184,555 ( 1.2%)
Hash#[]=: 1,061,969 ( 1.1%)
rb_ary_push: 1,024,987 ( 1.1%)
rb_ary_new_capa: 904,003 ( 0.9%)
rb_str_buf_append: 833,782 ( 0.9%)
rb_class_allocate_instance: 822,626 ( 0.8%)
Hash#fetch: 755,913 ( 0.8%)
```
railsbench:
```
Top-20 calls to C functions from JIT code (74.8% of total 189,170,268):
rb_vm_opt_send_without_block: 29,870,307 (15.8%)
rb_vm_setinstancevariable: 17,631,199 ( 9.3%)
rb_hash_aref: 16,928,890 ( 8.9%)
rb_ivar_get: 14,441,240 ( 7.6%)
rb_vm_env_write: 11,571,001 ( 6.1%)
rb_vm_send: 11,153,457 ( 5.9%)
rb_vm_invokesuper: 7,568,267 ( 4.0%)
Module#===: 6,065,923 ( 3.2%)
Hash#[]=: 2,842,990 ( 1.5%)
rb_ary_entry: 2,766,125 ( 1.5%)
rb_ary_push: 2,722,079 ( 1.4%)
rb_vm_invokeblock: 2,594,398 ( 1.4%)
Thread.current: 2,560,129 ( 1.4%)
rb_str_getbyte: 1,965,627 ( 1.0%)
Kernel#is_a?: 1,961,815 ( 1.0%)
rb_vm_opt_getconstant_path: 1,863,678 ( 1.0%)
rb_hash_new_with_size: 1,796,456 ( 0.9%)
rb_class_allocate_instance: 1,785,043 ( 0.9%)
String#empty?: 1,713,414 ( 0.9%)
rb_ary_new_capa: 1,678,834 ( 0.9%)
```
shipit:
```
Top-20 calls to C functions from JIT code (83.4% of total 182,402,821):
rb_vm_opt_send_without_block: 45,753,484 (25.1%)
rb_ivar_get: 21,020,650 (11.5%)
rb_vm_setinstancevariable: 17,528,603 ( 9.6%)
rb_hash_aref: 11,892,856 ( 6.5%)
rb_vm_send: 11,723,471 ( 6.4%)
rb_vm_env_write: 10,434,452 ( 5.7%)
Module#===: 4,225,048 ( 2.3%)
rb_vm_invokesuper: 3,705,906 ( 2.0%)
Thread.current: 3,337,603 ( 1.8%)
rb_ary_entry: 3,114,378 ( 1.7%)
Hash#[]=: 2,509,912 ( 1.4%)
Array#empty?: 2,282,994 ( 1.3%)
rb_vm_invokeblock: 2,210,511 ( 1.2%)
Hash#fetch: 2,017,960 ( 1.1%)
_bi20: 1,975,147 ( 1.1%)
rb_zjit_defined_ivar: 1,897,127 ( 1.0%)
rb_vm_opt_getconstant_path: 1,813,294 ( 1.0%)
rb_ary_new_capa: 1,615,406 ( 0.9%)
Kernel#is_a?: 1,567,854 ( 0.9%)
rb_class_allocate_instance: 1,560,035 ( 0.9%)
```
Thanks to @eregon for the idea.
Co-authored-by: Jacob Denbeaux <jacob.denbeaux@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
|
|
* Correct JIT entry points for optionals so each optional starts with nil
before its initialization routine runs. Establish that
`jit_entry_points[filled_opts_num]` gives the appropriate entry point
* Correct number of HIR block parameters for each JIT entry point
* Entry points that share the same ISEQ PC get separate entries since
they start with different state. No more deduplication.
* Reject post parameters. Was hidden behind check for optionals.
* Make sure to visit every BB in iseq_to_hir(). Some weren't visited
when the initialization routine for an optional terminates the block
in a `SideExit`. Remove the now impossible `FailedOptionalArguments`.
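At the Ruby level, the situation described in the list above can be sketched as follows: the number of optionals the caller fills determines which default expressions still need to run, and a post parameter is a required parameter that follows the optionals. The method names `opts` and `post` are hypothetical.

```ruby
# Two optional parameters: the caller can fill 0, 1, or 2 of them.
def opts(a, b = a + 1, c = b * 2)
  [a, b, c]
end

zero_filled = opts(1)          # defaults for b and c both run
one_filled  = opts(1, 10)      # only the default for c runs
two_filled  = opts(1, 10, 100) # no default runs

# A post parameter: a required parameter after an optional one.
def post(a, b = 1, c)
  [a, b, c]
end

post_params = method(:post).parameters
post_result = post(1, 2) # 2 goes to c, b keeps its default

p zero_filled, one_filled, two_filled
p post_params, post_result
```

Each of the three `opts` calls starts executing the method at a different point, which is the distinction the per-count entry points capture.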
|
|
|
|
lobsters before:
```
Top-14 instructions with uncategorized fallback reason (100.0% of total 5,583,226):
invokesuper: 3,039,693 (54.4%)
invokeblock: 1,181,433 (21.2%)
sendforward: 572,612 (10.3%)
opt_eq: 464,760 ( 8.3%)
opt_plus: 169,904 ( 3.0%)
opt_minus: 77,487 ( 1.4%)
opt_send_without_block: 42,264 ( 0.8%)
opt_gt: 12,263 ( 0.2%)
opt_neq: 9,033 ( 0.2%)
opt_mult: 8,384 ( 0.2%)
opt_or: 4,792 ( 0.1%)
opt_lt: 404 ( 0.0%)
opt_and: 160 ( 0.0%)
opt_ge: 37 ( 0.0%)
Top-15 send fallback reasons (100.0% of total 33,316,627):
send_without_block_polymorphic: 12,847,877 (38.6%)
uncategorized: 5,583,226 (16.8%)
one_or_more_complex_arg_pass: 4,504,446 (13.5%)
send_not_optimized_method_type: 3,773,513 (11.3%)
send_without_block_no_profiles: 2,663,575 ( 8.0%)
send_no_profiles: 2,206,479 ( 6.6%)
send_without_block_not_optimized_method_type_optimized: 742,574 ( 2.2%)
send_polymorphic: 467,750 ( 1.4%)
send_without_block_megamorphic: 428,364 ( 1.3%)
send_without_block_direct_too_many_args: 33,097 ( 0.1%)
send_without_block_cfunc_array_variadic: 22,255 ( 0.1%)
obj_to_string_not_string: 19,435 ( 0.1%)
send_megamorphic: 17,153 ( 0.1%)
send_without_block_not_optimized_method_type: 5,922 ( 0.0%)
ccall_with_frame_too_many_args: 961 ( 0.0%)
```
lobsters after:
```
Top-4 instructions with uncategorized fallback reason (100.0% of total 4,835,995):
invokesuper: 3,039,692 (62.9%)
invokeblock: 1,181,427 (24.4%)
sendforward: 572,612 (11.8%)
opt_send_without_block: 42,264 ( 0.9%)
Top-17 send fallback reasons (100.0% of total 33,316,645):
send_without_block_polymorphic: 12,847,879 (38.6%)
uncategorized: 4,835,995 (14.5%)
one_or_more_complex_arg_pass: 4,502,767 (13.5%)
send_without_block_no_profiles: 2,663,578 ( 8.0%)
send_not_optimized_method_type: 2,381,743 ( 7.1%)
send_no_profiles: 2,206,481 ( 6.6%)
send_cfunc_variadic: 1,391,775 ( 4.2%)
send_without_block_operands_not_fixnum: 747,228 ( 2.2%)
send_without_block_not_optimized_method_type_optimized: 742,574 ( 2.2%)
send_polymorphic: 467,750 ( 1.4%)
send_without_block_megamorphic: 428,364 ( 1.3%)
send_without_block_direct_too_many_args: 33,097 ( 0.1%)
send_without_block_cfunc_array_variadic: 22,255 ( 0.1%)
obj_to_string_not_string: 19,440 ( 0.1%)
send_megamorphic: 17,153 ( 0.1%)
send_without_block_not_optimized_method_type: 7,605 ( 0.0%)
ccall_with_frame_too_many_args: 961 ( 0.0%)
```
|
|
|
|
These refer to "OptimizedMethodType" which is a subcategory of "MethodType::Optimized"
so name them after the latter to avoid "not_optimized_optimized".
|
|
|
|
Make it more obvious that this hasn't been handled and could be
broken down more.
|
|
|
|
|
|
|
|
<details>
<summary>Before</summary>
<br>
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (64.0% of total 3,683,424):
Kernel#is_a?: 427,127 (11.6%)
Hash#[]=: 426,276 (11.6%)
String#start_with?: 336,245 ( 9.1%)
ObjectSpace::WeakKeyMap#[]: 139,406 ( 3.8%)
Hash#fetch: 127,291 ( 3.5%)
String#hash: 79,259 ( 2.2%)
Process.clock_gettime: 74,658 ( 2.0%)
Array#any?: 74,441 ( 2.0%)
Integer#==: 71,067 ( 1.9%)
Kernel#dup: 68,058 ( 1.8%)
Hash#key?: 62,306 ( 1.7%)
Regexp#match?: 62,247 ( 1.7%)
SQLite3::Statement#step: 61,172 ( 1.7%)
SQLite3::Statement#done?: 61,172 ( 1.7%)
Kernel#Array: 55,015 ( 1.5%)
Integer#<=>: 49,127 ( 1.3%)
String.new: 48,363 ( 1.3%)
IO#read: 47,753 ( 1.3%)
Array#include?: 43,307 ( 1.2%)
Struct#initialize: 42,650 ( 1.2%)
Top-3 not optimized method types for send (100.0% of total 1,022,743):
iseq: 736,483 (72.0%)
cfunc: 286,174 (28.0%)
null: 86 ( 0.0%)
Top-6 not optimized method types for send_without_block (100.0% of total 189,556):
optimized_call: 115,966 (61.2%)
optimized_send: 36,767 (19.4%)
optimized_struct_aset: 33,788 (17.8%)
null: 2,521 ( 1.3%)
optimized_block_call: 510 ( 0.3%)
cfunc: 4 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 1,648,882):
invokesuper: 697,471 (42.3%)
invokeblock: 496,687 (30.1%)
sendforward: 221,094 (13.4%)
opt_eq: 147,620 ( 9.0%)
opt_minus: 40,865 ( 2.5%)
opt_plus: 22,912 ( 1.4%)
opt_send_without_block: 18,932 ( 1.1%)
opt_gt: 867 ( 0.1%)
opt_mult: 768 ( 0.0%)
opt_neq: 654 ( 0.0%)
opt_or: 508 ( 0.0%)
opt_lt: 359 ( 0.0%)
opt_ge: 145 ( 0.0%)
Top-13 send fallback reasons (100.0% of total 8,308,826):
send_without_block_polymorphic: 3,174,975 (38.2%)
not_optimized_instruction: 1,648,882 (19.8%)
fancy_call_feature: 1,072,807 (12.9%)
send_not_optimized_method_type: 1,022,743 (12.3%)
send_no_profiles: 599,715 ( 7.2%)
send_without_block_no_profiles: 486,108 ( 5.9%)
send_without_block_not_optimized_optimized_method_type: 187,031 ( 2.3%)
send_polymorphic: 101,834 ( 1.2%)
obj_to_string_not_string: 7,610 ( 0.1%)
send_without_block_not_optimized_method_type: 2,525 ( 0.0%)
send_without_block_direct_too_many_args: 2,369 ( 0.0%)
send_without_block_cfunc_array_variadic: 2,190 ( 0.0%)
ccall_with_frame_too_many_args: 37 ( 0.0%)
Top-8 popular unsupported argument-parameter features (100.0% of total 1,209,121):
param_opt: 583,595 (48.3%)
param_forwardable: 178,162 (14.7%)
param_block: 162,689 (13.5%)
param_kw: 150,575 (12.5%)
param_rest: 90,091 ( 7.5%)
param_kwrest: 33,791 ( 2.8%)
caller_splat: 10,214 ( 0.8%)
caller_kw_splat: 4 ( 0.0%)
Top-7 unhandled YARV insns (100.0% of total 128,032):
checkkeyword: 88,698 (69.3%)
invokesuperforward: 22,296 (17.4%)
getblockparam: 16,292 (12.7%)
getconstant: 336 ( 0.3%)
checkmatch: 290 ( 0.2%)
setblockparam: 101 ( 0.1%)
once: 19 ( 0.0%)
Top-1 compile error reasons (100.0% of total 21,283):
exception_handler: 21,283 (100.0%)
Top-18 side exit reasons (100.0% of total 2,335,562):
guard_type_failure: 677,930 (29.0%)
guard_shape_failure: 410,183 (17.6%)
unhandled_kwarg: 235,100 (10.1%)
patchpoint_stable_constant_names: 206,172 ( 8.8%)
block_param_proxy_not_iseq_or_ifunc: 199,931 ( 8.6%)
patchpoint_no_singleton_class: 188,359 ( 8.1%)
unhandled_yarv_insn: 128,032 ( 5.5%)
unknown_newarray_send: 124,805 ( 5.3%)
patchpoint_method_redefined: 73,062 ( 3.1%)
unhandled_hir_insn: 56,688 ( 2.4%)
compile_error: 21,283 ( 0.9%)
block_param_proxy_modified: 11,647 ( 0.5%)
fixnum_mult_overflow: 954 ( 0.0%)
patchpoint_no_ep_escape: 813 ( 0.0%)
guard_bit_equals_failure: 316 ( 0.0%)
obj_to_string_fallback: 230 ( 0.0%)
interrupt: 35 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 26,775,579
dynamic_send_count: 8,308,826 (31.0%)
optimized_send_count: 18,466,753 (69.0%)
iseq_optimized_send_count: 7,611,729 (28.4%)
inline_cfunc_optimized_send_count: 5,935,290 (22.2%)
inline_iseq_optimized_send_count: 657,555 ( 2.5%)
non_variadic_cfunc_optimized_send_count: 3,169,054 (11.8%)
variadic_cfunc_optimized_send_count: 1,093,125 ( 4.1%)
dynamic_getivar_count: 2,793,635
dynamic_setivar_count: 3,040,844
compiled_iseq_count: 4,496
failed_iseq_count: 0
compile_time: 915ms
profile_time: 6ms
gc_time: 6ms
invalidation_time: 20ms
vm_write_pc_count: 26,857,114
vm_write_sp_count: 25,770,558
vm_write_locals_count: 25,770,558
vm_write_stack_count: 25,770,558
vm_write_to_parent_iseq_local_count: 106,036
vm_read_from_parent_iseq_local_count: 3,213,992
guard_type_count: 27,683,170
guard_type_exit_ratio: 2.4%
code_region_bytes: 32,178,176
side_exit_count: 2,335,562
total_insn_count: 170,714,077
vm_insn_count: 28,999,194
zjit_insn_count: 141,714,883
ratio_in_zjit: 83.0%
```
</details>
<details>
<summary>After</summary>
<br>
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (63.9% of total 3,686,703):
Kernel#is_a?: 427,123 (11.6%)
Hash#[]=: 426,276 (11.6%)
String#start_with?: 336,245 ( 9.1%)
ObjectSpace::WeakKeyMap#[]: 139,406 ( 3.8%)
Hash#fetch: 127,291 ( 3.5%)
String#hash: 79,259 ( 2.1%)
Process.clock_gettime: 74,658 ( 2.0%)
Array#any?: 74,441 ( 2.0%)
Integer#==: 71,067 ( 1.9%)
Kernel#dup: 68,058 ( 1.8%)
Regexp#match?: 62,336 ( 1.7%)
Hash#key?: 62,306 ( 1.7%)
SQLite3::Statement#step: 61,172 ( 1.7%)
SQLite3::Statement#done?: 61,172 ( 1.7%)
Kernel#Array: 55,048 ( 1.5%)
Integer#<=>: 49,127 ( 1.3%)
String.new: 48,363 ( 1.3%)
IO#read: 47,753 ( 1.3%)
Array#include?: 43,309 ( 1.2%)
Struct#initialize: 42,650 ( 1.2%)
Top-3 not optimized method types for send (100.0% of total 1,026,413):
iseq: 737,496 (71.9%)
cfunc: 288,831 (28.1%)
null: 86 ( 0.0%)
Top-6 not optimized method types for send_without_block (100.0% of total 189,556):
optimized_call: 115,966 (61.2%)
optimized_send: 36,767 (19.4%)
optimized_struct_aset: 33,788 (17.8%)
null: 2,521 ( 1.3%)
optimized_block_call: 510 ( 0.3%)
cfunc: 4 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 1,648,949):
invokesuper: 697,452 (42.3%)
invokeblock: 496,687 (30.1%)
sendforward: 221,094 (13.4%)
opt_eq: 147,620 ( 9.0%)
opt_minus: 40,863 ( 2.5%)
opt_plus: 22,912 ( 1.4%)
opt_send_without_block: 19,020 ( 1.2%)
opt_gt: 867 ( 0.1%)
opt_mult: 768 ( 0.0%)
opt_neq: 654 ( 0.0%)
opt_or: 508 ( 0.0%)
opt_lt: 359 ( 0.0%)
opt_ge: 145 ( 0.0%)
Top-13 send fallback reasons (100.0% of total 8,318,975):
send_without_block_polymorphic: 3,177,471 (38.2%)
not_optimized_instruction: 1,648,949 (19.8%)
fancy_call_feature: 1,075,143 (12.9%)
send_not_optimized_method_type: 1,026,413 (12.3%)
send_no_profiles: 599,748 ( 7.2%)
send_without_block_no_profiles: 486,190 ( 5.8%)
send_without_block_not_optimized_optimized_method_type: 187,031 ( 2.2%)
send_polymorphic: 102,497 ( 1.2%)
obj_to_string_not_string: 8,412 ( 0.1%)
send_without_block_not_optimized_method_type: 2,525 ( 0.0%)
send_without_block_direct_too_many_args: 2,369 ( 0.0%)
send_without_block_cfunc_array_variadic: 2,190 ( 0.0%)
ccall_with_frame_too_many_args: 37 ( 0.0%)
Top-8 popular unsupported argument-parameter features (100.0% of total 1,211,457):
param_opt: 584,073 (48.2%)
param_forwardable: 178,907 (14.8%)
param_block: 162,689 (13.4%)
param_kw: 151,688 (12.5%)
param_rest: 90,091 ( 7.4%)
param_kwrest: 33,791 ( 2.8%)
caller_splat: 10,214 ( 0.8%)
caller_kw_splat: 4 ( 0.0%)
Top-6 unhandled YARV insns (100.0% of total 39,334):
invokesuperforward: 22,296 (56.7%)
getblockparam: 16,292 (41.4%)
getconstant: 336 ( 0.9%)
checkmatch: 290 ( 0.7%)
setblockparam: 101 ( 0.3%)
once: 19 ( 0.0%)
Top-1 compile error reasons (100.0% of total 21,283):
exception_handler: 21,283 (100.0%)
Top-18 side exit reasons (100.0% of total 2,253,541):
guard_type_failure: 682,695 (30.3%)
guard_shape_failure: 410,183 (18.2%)
unhandled_kwarg: 236,780 (10.5%)
patchpoint_stable_constant_names: 206,310 ( 9.2%)
block_param_proxy_not_iseq_or_ifunc: 199,931 ( 8.9%)
patchpoint_no_singleton_class: 188,438 ( 8.4%)
unknown_newarray_send: 124,805 ( 5.5%)
patchpoint_method_redefined: 73,056 ( 3.2%)
unhandled_hir_insn: 56,686 ( 2.5%)
unhandled_yarv_insn: 39,334 ( 1.7%)
compile_error: 21,283 ( 0.9%)
block_param_proxy_modified: 11,647 ( 0.5%)
fixnum_mult_overflow: 954 ( 0.0%)
patchpoint_no_ep_escape: 813 ( 0.0%)
guard_bit_equals_failure: 316 ( 0.0%)
obj_to_string_fallback: 230 ( 0.0%)
interrupt: 58 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 27,032,751
dynamic_send_count: 8,318,975 (30.8%)
optimized_send_count: 18,713,776 (69.2%)
iseq_optimized_send_count: 7,809,698 (28.9%)
inline_cfunc_optimized_send_count: 5,980,083 (22.1%)
inline_iseq_optimized_send_count: 657,677 ( 2.4%)
non_variadic_cfunc_optimized_send_count: 3,170,381 (11.7%)
variadic_cfunc_optimized_send_count: 1,095,937 ( 4.1%)
dynamic_getivar_count: 2,793,987
dynamic_setivar_count: 3,350,905
compiled_iseq_count: 4,498
failed_iseq_count: 0
compile_time: 884ms
profile_time: 6ms
gc_time: 6ms
invalidation_time: 19ms
vm_write_pc_count: 27,417,915
vm_write_sp_count: 26,327,928
vm_write_locals_count: 26,327,928
vm_write_stack_count: 26,327,928
vm_write_to_parent_iseq_local_count: 106,036
vm_read_from_parent_iseq_local_count: 3,213,992
guard_type_count: 27,937,831
guard_type_exit_ratio: 2.4%
code_region_bytes: 32,571,392
side_exit_count: 2,253,541
total_insn_count: 170,630,429
vm_insn_count: 26,617,244
zjit_insn_count: 144,013,185
ratio_in_zjit: 84.4%
```
</details>
|
|
|
|
I made a special kind of `ProfiledType` that looks at specific objects, not just their classes/shapes (https://github.com/ruby/ruby/pull/15051). Then I profiled some of our benchmarks.
For lobsters:
```
Top-6 invokeblock handler (100.0% of total 1,064,155):
megamorphic: 494,931 (46.5%)
monomorphic_iseq: 337,171 (31.7%)
polymorphic: 113,381 (10.7%)
monomorphic_ifunc: 52,260 ( 4.9%)
monomorphic_other: 38,970 ( 3.7%)
no_profiles: 27,442 ( 2.6%)
```
For railsbench:
```
Top-6 invokeblock handler (100.0% of total 2,529,104):
monomorphic_iseq: 834,452 (33.0%)
megamorphic: 818,347 (32.4%)
polymorphic: 632,273 (25.0%)
monomorphic_ifunc: 224,243 ( 8.9%)
monomorphic_other: 19,595 ( 0.8%)
no_profiles: 194 ( 0.0%)
```
For shipit:
```
Top-6 invokeblock handler (100.0% of total 2,104,148):
megamorphic: 1,269,889 (60.4%)
polymorphic: 411,475 (19.6%)
no_profiles: 173,367 ( 8.2%)
monomorphic_other: 118,619 ( 5.6%)
monomorphic_iseq: 84,891 ( 4.0%)
monomorphic_ifunc: 45,907 ( 2.2%)
```
Seems like a monomorphic case for a specific ISEQ actually isn't a bad way of going about this, at least to start...
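As a rough Ruby-level picture of the distinction, assuming the profile is keyed on which block reaches a given `yield`: one `yield` site that only ever sees a single literal block, versus one that sees several different blocks. The method names are hypothetical, and the mapping to the counter names above is an assumption.

```ruby
# Hypothetical methods; each contains exactly one `yield` site.
def yield_mono
  yield
end

def yield_many
  yield
end

# The same literal block reaches yield_mono on every call.
mono_sum = 0
5.times { mono_sum += yield_mono { 1 } }

# Three distinct literal blocks reach yield_many.
many_sum = 0
many_sum += yield_many { 1 }
many_sum += yield_many { 2 }
many_sum += yield_many { 3 }

p mono_sum
p many_sum
```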
|
|
|
|
new ZJIT stats excerpt from liquid-runtime:
```
vm_read_from_parent_iseq_local_count: 10,909,753
guard_type_count: 45,109,441
guard_type_exit_ratio: 4.3%
guard_shape_count: 15,272,133
guard_shape_exit_ratio: 20.1%
code_region_bytes: 3,899,392
```
lobsters
```
guard_type_count: 71,765,580
guard_type_exit_ratio: 4.3%
guard_shape_count: 21,872,560
guard_shape_exit_ratio: 8.0%
```
railsbench
```
guard_type_count: 117,661,124
guard_type_exit_ratio: 0.7%
guard_shape_count: 28,032,665
guard_shape_exit_ratio: 5.1%
```
shipit
```
guard_type_count: 106,195,615
guard_type_exit_ratio: 3.5%
guard_shape_count: 33,672,673
guard_shape_exit_ratio: 10.1%
```
|
|
Kokubun brought up that "complex" is a more fitting name for what these
counters count. Thanks!
Also:
- make the SendFallbackReason enum name consistent with the counter name
- rewrite the printout prompt in zjit.rb
|
|
Inline the `String#bytesize` function and remove the C call.
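For reference, `String#bytesize` returns the length in bytes rather than in characters, so the two differ for multibyte strings:

```ruby
ascii = "hello"
utf8  = "h\u00e9llo" # "héllo"; the accented character takes two bytes in UTF-8

ascii_sizes = [ascii.length, ascii.bytesize]
utf8_sizes  = [utf8.length, utf8.bytesize]

p ascii_sizes
p utf8_sizes
```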
|
|
These just call the C functions that perform the optimized test, but doing so avoids the side exit.
See https://github.com/ruby/ruby/pull/12123 for the original CRuby/YJIT implementation.
|
|
These count caller-side features we don't support. But because we side
exit when we see them through unhandled_call_type(), these new counters
currently don't trigger.
|
|
`send_fallback_fancy_call_feature`
In cases where we fall back because the callee has an unsupported signature, it
was a little inaccurate to use `send_fallback_send_not_optimized_method_type`.
We do support the method type in other situations.
Add a new `send_fallback_fancy_call_feature` for these situations. Also,
`send_fallback_bmethod_non_iseq_proc` so we can stop using
`not_optimized_method_type` completely for bmethods.
Add accompanying `fancy_arg_pass_*` counters. These don't sum to the number
of unoptimized calls that run, but they establish the level of support the
optimizer provides for a given workload.
|
|
We can see send/block call/struct aref/... e.g. on lobsters:
```
Top-9 not optimized method types for send_without_block (100.0% of total 3,133,812):
iseq: 2,004,557 (64.0%)
optimized_struct_aref: 496,232 (15.8%)
alias: 268,579 ( 8.6%)
optimized_call: 224,883 ( 7.2%)
optimized_send: 120,531 ( 3.8%)
bmethod: 12,011 ( 0.4%)
null: 4,636 ( 0.1%)
optimized_block_call: 1,930 ( 0.1%)
cfunc: 453 ( 0.0%)
```
railsbench:
```
Top-8 not optimized method types for send_without_block (100.0% of total 5,735,608):
iseq: 2,854,551 (49.8%)
optimized_struct_aref: 871,459 (15.2%)
optimized_call: 862,185 (15.0%)
alias: 588,486 (10.3%)
optimized_send: 482,171 ( 8.4%)
null: 39,942 ( 0.7%)
bmethod: 36,784 ( 0.6%)
cfunc: 30 ( 0.0%)
```
shipit:
```
Top-10 not optimized method types for send_without_block (100.0% of total 4,844,304):
iseq: 2,881,206 (59.5%)
optimized_struct_aref: 1,158,935 (23.9%)
optimized_call: 472,898 ( 9.8%)
alias: 208,010 ( 4.3%)
optimized_send: 55,479 ( 1.1%)
null: 47,273 ( 1.0%)
bmethod: 12,608 ( 0.3%)
optimized_block_call: 7,860 ( 0.2%)
cfunc: 31 ( 0.0%)
optimized_struct_aset: 4 ( 0.0%)
```
|
|
Allow instructions to constrain their operands' input types to avoid
accidentally creating invalid HIR.
|
|
We can measure how many we can remove by adding type information to C
functions, etc.
|
|
Fixes https://github.com/Shopify/ruby/issues/814
This change specializes the case of calling `Array#pop` on a non-frozen array with no arguments. `Array#pop` appears in the non-inlined C function list in the ZJIT SFR performance burndown list.
If it is helpful in the future, this patch could be extended to support the case where an argument is provided, but this initial work seeks to elide the Ruby frame normally pushed in the case of `Array#pop` without an argument.
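The cases mentioned above look like this at the Ruby level. Only the first form (no argument, non-frozen receiver) is the one described as specialized; the others are shown for contrast.

```ruby
# No argument, non-frozen receiver: the specialized case.
arr = [1, 2, 3]
last = arr.pop

# Popping from an empty array returns nil.
from_empty = [].pop

# With an argument, pop returns an array of the removed elements.
last_two = [1, 2, 3].pop(2)

# A frozen receiver raises instead of mutating.
frozen_error = begin
  [1].freeze.pop
rescue FrozenError => e
  e.class
end

p last, arr, from_empty, last_two, frozen_error
```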
|
|
Copy the YJIT simple inliner except for the kwargs bit. It works great!
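A sketch of the kind of method bodies a simple inliner typically targets, assuming it handles trivially small methods whose return value is a constant, `self`, or one of the arguments. The `Config` class is hypothetical, and the exact set of patterns handled is not stated in this commit.

```ruby
# Hypothetical class for illustration only.
class Config
  # Returns a constant.
  def enabled?
    true
  end

  # Returns self.
  def itself_again
    self
  end

  # Returns its argument unchanged.
  def echo(value)
    value
  end
end

config = Config.new
results = [config.enabled?, config.itself_again.equal?(config), config.echo(42)]
p results
```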
|
|
This helps ZJIT optimize ~300,000 more sends in ruby-bench's lobsters
Top-6 not optimized method types for send_without_block
Before After
iseq: 713,899 (48.0%) iseq: 725,668 (62.4%)
optimized: 359,864 (24.2%) optimized: 359,940 (31.0%)
bmethod: 339,040 (22.8%) alias: 73,541 ( 6.3%)
alias: 73,392 ( 4.9%) null: 2,521 ( 0.2%)
null: 2,521 ( 0.2%) bmethod: 979 ( 0.1%)
cfunc: 4 ( 0.0%) cfunc: 4 ( 0.0%)
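Judging from the table, the sends that moved out of the not-optimized bucket are `bmethod` calls, that is, methods defined from a block with `define_method`. A minimal Ruby sketch of such a method next to an ordinary one; the `Counter` class is hypothetical.

```ruby
# Hypothetical class for illustration only.
class Counter
  # Defined from a block: a bmethod.
  define_method(:bump) { |n| n + 1 }

  # Defined with `def`: an ordinary ISEQ method.
  def bump_plain(n)
    n + 1
  end
end

counter = Counter.new
bumped = [counter.bump(1), counter.bump_plain(1)]
p bumped
```

Both calls are sends without a block at the call site; they differ only in how the callee was defined.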
|
|
This is mostly to see what happens to the loops-times benchmark.
|
|
Only support the simple case: no splat or rest.
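Assuming this refers to array destructuring (`expandarray` drops out of the unhandled YARV insns list in the stats below), the simple case and the unsupported splat/rest case look like this in Ruby. The method names are hypothetical.

```ruby
# Simple case: destructure into a fixed number of locals.
def simple(pair)
  a, b = pair
  [a, b]
end

# Splat/rest case: the remaining elements are collected into an array.
def with_rest(list)
  first, *rest = list
  [first, rest]
end

simple_full  = simple([1, 2])
simple_short = simple([1])          # missing elements become nil
rest_result  = with_rest([1, 2, 3])

p simple_full, simple_short, rest_result
```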
lobsters before:
<details>
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (60.5% of total 11,039,954):
Kernel#is_a?: 1,030,769 ( 9.3%)
String#<<: 851,954 ( 7.7%)
Hash#[]=: 742,941 ( 6.7%)
Regexp#match?: 399,894 ( 3.6%)
String#empty?: 353,775 ( 3.2%)
Hash#key?: 349,147 ( 3.2%)
String#start_with?: 334,961 ( 3.0%)
Kernel#respond_to?: 316,528 ( 2.9%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 2.2%)
TrueClass#===: 235,771 ( 2.1%)
FalseClass#===: 231,144 ( 2.1%)
Array#include?: 211,385 ( 1.9%)
Hash#fetch: 204,702 ( 1.9%)
Kernel#block_given?: 181,797 ( 1.6%)
Kernel#dup: 179,341 ( 1.6%)
BasicObject#!=: 175,997 ( 1.6%)
Class#new: 168,079 ( 1.5%)
Kernel#kind_of?: 165,600 ( 1.5%)
String#==: 157,735 ( 1.4%)
Module#clock_gettime: 144,992 ( 1.3%)
Top-20 not annotated C methods (61.4% of total 11,202,087):
Kernel#is_a?: 1,212,660 (10.8%)
String#<<: 851,954 ( 7.6%)
Hash#[]=: 743,120 ( 6.6%)
Regexp#match?: 399,894 ( 3.6%)
String#empty?: 361,013 ( 3.2%)
Hash#key?: 349,147 ( 3.1%)
String#start_with?: 334,961 ( 3.0%)
Kernel#respond_to?: 316,528 ( 2.8%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 2.1%)
TrueClass#===: 235,771 ( 2.1%)
FalseClass#===: 231,144 ( 2.1%)
Array#include?: 211,385 ( 1.9%)
Hash#fetch: 204,702 ( 1.8%)
Kernel#block_given?: 191,666 ( 1.7%)
Kernel#dup: 179,348 ( 1.6%)
BasicObject#!=: 176,181 ( 1.6%)
Class#new: 168,079 ( 1.5%)
Kernel#kind_of?: 165,634 ( 1.5%)
String#==: 163,667 ( 1.5%)
Module#clock_gettime: 144,992 ( 1.3%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,523,682):
iseq: 2,271,936 (50.2%)
bmethod: 985,636 (21.8%)
optimized: 949,703 (21.0%)
alias: 310,747 ( 6.9%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,171):
invokesuper: 2,373,404 (55.3%)
invokeblock: 811,926 (18.9%)
sendforward: 505,452 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,404 ( 1.7%)
opt_minus: 36,228 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 25,530,724):
send_without_block_polymorphic: 9,722,491 (38.1%)
send_no_profiles: 5,894,788 (23.1%)
send_without_block_not_optimized_method_type: 4,523,682 (17.7%)
not_optimized_instruction: 4,293,171 (16.8%)
send_without_block_no_profiles: 998,746 ( 3.9%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,765 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,950):
expandarray: 328,490 (47.5%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 49,119 ( 7.1%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,718,636):
register_spill_on_alloc: 3,418,255 (91.9%)
register_spill_on_ccall: 182,018 ( 4.9%)
exception_handler: 118,363 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,860,385):
compile_error: 3,718,636 (34.2%)
guard_type_failure: 2,638,926 (24.3%)
guard_shape_failure: 1,917,209 (17.7%)
unhandled_yarv_insn: 690,950 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,789 ( 4.9%)
unhandled_kwarg: 455,347 ( 4.2%)
patchpoint: 370,476 ( 3.4%)
unknown_newarray_send: 314,786 ( 2.9%)
unhandled_splat: 122,071 ( 1.1%)
unhandled_hir_insn: 76,397 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
interrupt: 17 ( 0.0%)
send_count: 62,244,604
dynamic_send_count: 25,530,724 (41.0%)
optimized_send_count: 36,713,880 (59.0%)
iseq_optimized_send_count: 18,587,512 (29.9%)
inline_cfunc_optimized_send_count: 7,086,414 (11.4%)
non_variadic_cfunc_optimized_send_count: 8,375,754 (13.5%)
variadic_cfunc_optimized_send_count: 2,664,200 ( 4.3%)
dynamic_getivar_count: 7,365,995
dynamic_setivar_count: 7,245,005
compiled_iseq_count: 4,796
failed_iseq_count: 447
compile_time: 814ms
profile_time: 9ms
gc_time: 9ms
invalidation_time: 72ms
vm_write_pc_count: 64,156,223
vm_write_sp_count: 62,812,449
vm_write_locals_count: 62,812,449
vm_write_stack_count: 62,812,449
vm_write_to_parent_iseq_local_count: 292,458
vm_read_from_parent_iseq_local_count: 6,599,701
code_region_bytes: 22,953,984
side_exit_count: 10,860,385
total_insn_count: 517,606,340
vm_insn_count: 162,979,530
zjit_insn_count: 354,626,810
ratio_in_zjit: 68.5%
```
</details>
lobsters after:
<details>
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (59.9% of total 11,291,815):
Kernel#is_a?: 1,046,269 ( 9.3%)
String#<<: 851,954 ( 7.5%)
Hash#[]=: 743,274 ( 6.6%)
Regexp#match?: 399,894 ( 3.5%)
String#empty?: 353,775 ( 3.1%)
Hash#key?: 349,147 ( 3.1%)
String#start_with?: 334,961 ( 3.0%)
Kernel#respond_to?: 316,502 ( 2.8%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 2.1%)
TrueClass#===: 235,771 ( 2.1%)
FalseClass#===: 231,144 ( 2.0%)
String#sub!: 219,579 ( 1.9%)
Array#include?: 211,385 ( 1.9%)
Hash#fetch: 204,702 ( 1.8%)
Kernel#block_given?: 181,797 ( 1.6%)
Kernel#dup: 179,341 ( 1.6%)
BasicObject#!=: 175,997 ( 1.6%)
Class#new: 168,079 ( 1.5%)
Kernel#kind_of?: 165,600 ( 1.5%)
String#==: 157,742 ( 1.4%)
Top-20 not annotated C methods (60.9% of total 11,466,928):
Kernel#is_a?: 1,239,923 (10.8%)
String#<<: 851,954 ( 7.4%)
Hash#[]=: 743,453 ( 6.5%)
Regexp#match?: 399,894 ( 3.5%)
String#empty?: 361,013 ( 3.1%)
Hash#key?: 349,147 ( 3.0%)
String#start_with?: 334,961 ( 2.9%)
Kernel#respond_to?: 316,502 ( 2.8%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 2.1%)
TrueClass#===: 235,771 ( 2.1%)
FalseClass#===: 231,144 ( 2.0%)
String#sub!: 219,579 ( 1.9%)
Array#include?: 211,385 ( 1.8%)
Hash#fetch: 204,702 ( 1.8%)
Kernel#block_given?: 191,666 ( 1.7%)
Kernel#dup: 179,348 ( 1.6%)
BasicObject#!=: 176,181 ( 1.5%)
Class#new: 168,079 ( 1.5%)
Kernel#kind_of?: 165,634 ( 1.4%)
String#==: 163,674 ( 1.4%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,524,016):
iseq: 2,272,269 (50.2%)
bmethod: 985,636 (21.8%)
optimized: 949,704 (21.0%)
alias: 310,747 ( 6.9%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,294,241):
invokesuper: 2,375,446 (55.3%)
invokeblock: 810,955 (18.9%)
sendforward: 505,451 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,404 ( 1.7%)
opt_minus: 36,228 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 25,534,542):
send_without_block_polymorphic: 9,723,469 (38.1%)
send_no_profiles: 5,896,023 (23.1%)
send_without_block_not_optimized_method_type: 4,524,016 (17.7%)
not_optimized_instruction: 4,294,241 (16.8%)
send_without_block_no_profiles: 998,947 ( 3.9%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,765 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-8 unhandled YARV insns (100.0% of total 362,460):
checkkeyword: 190,694 (52.6%)
getclassvariable: 59,901 (16.5%)
invokesuperforward: 49,503 (13.7%)
getblockparam: 49,119 (13.6%)
opt_duparray_send: 11,978 ( 3.3%)
getconstant: 952 ( 0.3%)
checkmatch: 290 ( 0.1%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,798,744):
register_spill_on_alloc: 3,495,669 (92.0%)
register_spill_on_ccall: 184,712 ( 4.9%)
exception_handler: 118,363 ( 3.1%)
Top-15 side exit reasons (100.0% of total 10,637,319):
compile_error: 3,798,744 (35.7%)
guard_type_failure: 2,655,504 (25.0%)
guard_shape_failure: 1,917,217 (18.0%)
block_param_proxy_not_iseq_or_ifunc: 535,789 ( 5.0%)
unhandled_kwarg: 455,492 ( 4.3%)
patchpoint: 370,478 ( 3.5%)
unhandled_yarv_insn: 362,460 ( 3.4%)
unknown_newarray_send: 314,786 ( 3.0%)
unhandled_splat: 122,071 ( 1.1%)
unhandled_hir_insn: 83,066 ( 0.8%)
block_param_proxy_modified: 19,193 ( 0.2%)
guard_int_equals_failure: 1,914 ( 0.0%)
obj_to_string_fallback: 566 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
interrupt: 17 ( 0.0%)
send_count: 62,495,067
dynamic_send_count: 25,534,542 (40.9%)
optimized_send_count: 36,960,525 (59.1%)
iseq_optimized_send_count: 18,582,072 (29.7%)
inline_cfunc_optimized_send_count: 7,086,638 (11.3%)
non_variadic_cfunc_optimized_send_count: 8,392,657 (13.4%)
variadic_cfunc_optimized_send_count: 2,899,158 ( 4.6%)
dynamic_getivar_count: 7,365,994
dynamic_setivar_count: 7,248,500
compiled_iseq_count: 4,780
failed_iseq_count: 463
compile_time: 816ms
profile_time: 9ms
gc_time: 11ms
invalidation_time: 70ms
vm_write_pc_count: 64,363,541
vm_write_sp_count: 63,022,221
vm_write_locals_count: 63,022,221
vm_write_stack_count: 63,022,221
vm_write_to_parent_iseq_local_count: 292,458
vm_read_from_parent_iseq_local_count: 6,850,977
code_region_bytes: 23,019,520
side_exit_count: 10,637,319
total_insn_count: 517,303,190
vm_insn_count: 160,562,103
zjit_insn_count: 356,741,087
ratio_in_zjit: 69.0%
```
</details>
railsbench before:
<details>
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (66.1% of total 25,524,934):
Hash#[]=: 1,700,237 ( 6.7%)
String#getbyte: 1,572,123 ( 6.2%)
String#<<: 1,494,022 ( 5.9%)
Kernel#is_a?: 1,429,930 ( 5.6%)
String#empty?: 1,370,323 ( 5.4%)
Regexp#match?: 1,235,067 ( 4.8%)
Kernel#respond_to?: 1,198,251 ( 4.7%)
Hash#key?: 1,087,406 ( 4.3%)
String#setbyte: 810,022 ( 3.2%)
Integer#^: 766,624 ( 3.0%)
Kernel#block_given?: 603,613 ( 2.4%)
String#==: 590,409 ( 2.3%)
Class#new: 506,216 ( 2.0%)
Hash#delete: 455,288 ( 1.8%)
BasicObject#!=: 428,771 ( 1.7%)
Hash#fetch: 408,621 ( 1.6%)
String#ascii_only?: 373,915 ( 1.5%)
ObjectSpace::WeakKeyMap#[]: 287,957 ( 1.1%)
NilClass#===: 277,244 ( 1.1%)
Kernel#Array: 269,590 ( 1.1%)
Top-20 not annotated C methods (66.8% of total 25,392,654):
Hash#[]=: 1,700,416 ( 6.7%)
String#getbyte: 1,572,123 ( 6.2%)
Kernel#is_a?: 1,515,672 ( 6.0%)
String#<<: 1,494,022 ( 5.9%)
String#empty?: 1,370,478 ( 5.4%)
Regexp#match?: 1,235,067 ( 4.9%)
Kernel#respond_to?: 1,198,251 ( 4.7%)
Hash#key?: 1,087,406 ( 4.3%)
String#setbyte: 810,022 ( 3.2%)
Integer#^: 766,624 ( 3.0%)
Kernel#block_given?: 603,613 ( 2.4%)
String#==: 601,115 ( 2.4%)
Class#new: 506,216 ( 2.0%)
Hash#delete: 455,288 ( 1.8%)
BasicObject#!=: 428,876 ( 1.7%)
Hash#fetch: 408,621 ( 1.6%)
String#ascii_only?: 373,915 ( 1.5%)
ObjectSpace::WeakKeyMap#[]: 287,957 ( 1.1%)
NilClass#===: 277,244 ( 1.1%)
Kernel#Array: 269,590 ( 1.1%)
Top-2 not optimized method types for send (100.0% of total 186,159):
iseq: 112,747 (60.6%)
cfunc: 73,412 (39.4%)
Top-6 not optimized method types for send_without_block (100.0% of total 8,142,248):
iseq: 3,464,671 (42.6%)
optimized: 2,632,884 (32.3%)
bmethod: 1,290,701 (15.9%)
alias: 706,020 ( 8.7%)
null: 47,942 ( 0.6%)
cfunc: 30 ( 0.0%)
Top-11 not optimized instructions (100.0% of total 8,394,873):
invokesuper: 5,602,274 (66.7%)
invokeblock: 1,764,936 (21.0%)
sendforward: 551,832 ( 6.6%)
opt_eq: 441,959 ( 5.3%)
opt_plus: 31,635 ( 0.4%)
opt_send_without_block: 1,163 ( 0.0%)
opt_lt: 372 ( 0.0%)
opt_mult: 251 ( 0.0%)
opt_ge: 193 ( 0.0%)
opt_neq: 149 ( 0.0%)
opt_or: 109 ( 0.0%)
Top-8 send fallback reasons (100.0% of total 40,748,753):
send_without_block_polymorphic: 12,933,923 (31.7%)
send_no_profiles: 9,033,636 (22.2%)
not_optimized_instruction: 8,394,873 (20.6%)
send_without_block_not_optimized_method_type: 8,142,248 (20.0%)
send_without_block_no_profiles: 1,839,228 ( 4.5%)
send_without_block_cfunc_array_variadic: 215,046 ( 0.5%)
send_not_optimized_method_type: 186,159 ( 0.5%)
obj_to_string_not_string: 3,640 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 1,604,456):
getclassvariable: 458,136 (28.6%)
getblockparam: 455,921 (28.4%)
checkkeyword: 265,425 (16.5%)
invokesuperforward: 239,383 (14.9%)
expandarray: 137,305 ( 8.6%)
getconstant: 48,100 ( 3.0%)
checkmatch: 149 ( 0.0%)
once: 23 ( 0.0%)
opt_duparray_send: 14 ( 0.0%)
Top-3 compile error reasons (100.0% of total 5,570,130):
register_spill_on_alloc: 4,994,130 (89.7%)
exception_handler: 356,784 ( 6.4%)
register_spill_on_ccall: 219,216 ( 3.9%)
Top-13 side exit reasons (100.0% of total 12,412,181):
compile_error: 5,570,130 (44.9%)
unhandled_yarv_insn: 1,604,456 (12.9%)
guard_shape_failure: 1,462,872 (11.8%)
guard_type_failure: 845,891 ( 6.8%)
block_param_proxy_not_iseq_or_ifunc: 765,968 ( 6.2%)
unhandled_kwarg: 658,341 ( 5.3%)
patchpoint: 504,437 ( 4.1%)
unhandled_splat: 446,990 ( 3.6%)
unknown_newarray_send: 332,740 ( 2.7%)
unhandled_hir_insn: 160,205 ( 1.3%)
block_param_proxy_modified: 59,589 ( 0.5%)
obj_to_string_fallback: 553 ( 0.0%)
interrupt: 9 ( 0.0%)
send_count: 119,067,587
dynamic_send_count: 40,748,753 (34.2%)
optimized_send_count: 78,318,834 (65.8%)
iseq_optimized_send_count: 39,936,542 (33.5%)
inline_cfunc_optimized_send_count: 12,857,358 (10.8%)
non_variadic_cfunc_optimized_send_count: 19,722,584 (16.6%)
variadic_cfunc_optimized_send_count: 5,802,350 ( 4.9%)
dynamic_getivar_count: 10,980,323
dynamic_setivar_count: 12,962,726
compiled_iseq_count: 2,531
failed_iseq_count: 245
compile_time: 414ms
profile_time: 21ms
gc_time: 33ms
invalidation_time: 5ms
vm_write_pc_count: 129,093,714
vm_write_sp_count: 126,023,084
vm_write_locals_count: 126,023,084
vm_write_stack_count: 126,023,084
vm_write_to_parent_iseq_local_count: 385,461
vm_read_from_parent_iseq_local_count: 11,266,484
code_region_bytes: 12,156,928
side_exit_count: 12,412,181
total_insn_count: 866,780,158
vm_insn_count: 216,821,134
zjit_insn_count: 649,959,024
ratio_in_zjit: 75.0%
```
</details>
railsbench after:
<details>
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (66.0% of total 25,597,895):
Hash#[]=: 1,724,042 ( 6.7%)
String#getbyte: 1,572,123 ( 6.1%)
String#<<: 1,494,022 ( 5.8%)
Kernel#is_a?: 1,429,946 ( 5.6%)
String#empty?: 1,370,323 ( 5.4%)
Regexp#match?: 1,235,067 ( 4.8%)
Kernel#respond_to?: 1,198,251 ( 4.7%)
Hash#key?: 1,087,406 ( 4.2%)
String#setbyte: 810,022 ( 3.2%)
Integer#^: 766,624 ( 3.0%)
Kernel#block_given?: 603,613 ( 2.4%)
String#==: 590,699 ( 2.3%)
Class#new: 506,216 ( 2.0%)
Hash#delete: 455,288 ( 1.8%)
BasicObject#!=: 428,771 ( 1.7%)
Hash#fetch: 408,621 ( 1.6%)
String#ascii_only?: 373,915 ( 1.5%)
ObjectSpace::WeakKeyMap#[]: 287,957 ( 1.1%)
NilClass#===: 277,244 ( 1.1%)
Kernel#Array: 269,590 ( 1.1%)
Top-20 not annotated C methods (66.7% of total 25,465,615):
Hash#[]=: 1,724,221 ( 6.8%)
String#getbyte: 1,572,123 ( 6.2%)
Kernel#is_a?: 1,515,688 ( 6.0%)
String#<<: 1,494,022 ( 5.9%)
String#empty?: 1,370,478 ( 5.4%)
Regexp#match?: 1,235,067 ( 4.8%)
Kernel#respond_to?: 1,198,251 ( 4.7%)
Hash#key?: 1,087,406 ( 4.3%)
String#setbyte: 810,022 ( 3.2%)
Integer#^: 766,624 ( 3.0%)
Kernel#block_given?: 603,613 ( 2.4%)
String#==: 601,405 ( 2.4%)
Class#new: 506,216 ( 2.0%)
Hash#delete: 455,288 ( 1.8%)
BasicObject#!=: 428,876 ( 1.7%)
Hash#fetch: 408,621 ( 1.6%)
String#ascii_only?: 373,915 ( 1.5%)
ObjectSpace::WeakKeyMap#[]: 287,957 ( 1.1%)
NilClass#===: 277,244 ( 1.1%)
Kernel#Array: 269,590 ( 1.1%)
Top-2 not optimized method types for send (100.0% of total 186,159):
iseq: 112,747 (60.6%)
cfunc: 73,412 (39.4%)
Top-6 not optimized method types for send_without_block (100.0% of total 8,142,248):
iseq: 3,464,671 (42.6%)
optimized: 2,632,884 (32.3%)
bmethod: 1,290,701 (15.9%)
alias: 706,020 ( 8.7%)
null: 47,942 ( 0.6%)
cfunc: 30 ( 0.0%)
Top-11 not optimized instructions (100.0% of total 8,442,456):
invokesuper: 5,649,857 (66.9%)
invokeblock: 1,764,936 (20.9%)
sendforward: 551,832 ( 6.5%)
opt_eq: 441,959 ( 5.2%)
opt_plus: 31,635 ( 0.4%)
opt_send_without_block: 1,163 ( 0.0%)
opt_lt: 372 ( 0.0%)
opt_mult: 251 ( 0.0%)
opt_ge: 193 ( 0.0%)
opt_neq: 149 ( 0.0%)
opt_or: 109 ( 0.0%)
Top-8 send fallback reasons (100.0% of total 40,796,314):
send_without_block_polymorphic: 12,933,921 (31.7%)
send_no_profiles: 9,033,616 (22.1%)
not_optimized_instruction: 8,442,456 (20.7%)
send_without_block_not_optimized_method_type: 8,142,248 (20.0%)
send_without_block_no_profiles: 1,839,228 ( 4.5%)
send_without_block_cfunc_array_variadic: 215,046 ( 0.5%)
send_not_optimized_method_type: 186,159 ( 0.5%)
obj_to_string_not_string: 3,640 ( 0.0%)
Top-8 unhandled YARV insns (100.0% of total 1,467,151):
getclassvariable: 458,136 (31.2%)
getblockparam: 455,921 (31.1%)
checkkeyword: 265,425 (18.1%)
invokesuperforward: 239,383 (16.3%)
getconstant: 48,100 ( 3.3%)
checkmatch: 149 ( 0.0%)
once: 23 ( 0.0%)
opt_duparray_send: 14 ( 0.0%)
Top-3 compile error reasons (100.0% of total 5,825,923):
register_spill_on_alloc: 5,225,940 (89.7%)
exception_handler: 356,784 ( 6.1%)
register_spill_on_ccall: 243,199 ( 4.2%)
Top-13 side exit reasons (100.0% of total 12,530,763):
compile_error: 5,825,923 (46.5%)
unhandled_yarv_insn: 1,467,151 (11.7%)
guard_shape_failure: 1,462,876 (11.7%)
guard_type_failure: 845,913 ( 6.8%)
block_param_proxy_not_iseq_or_ifunc: 765,968 ( 6.1%)
unhandled_kwarg: 658,341 ( 5.3%)
patchpoint: 504,437 ( 4.0%)
unhandled_splat: 446,990 ( 3.6%)
unknown_newarray_send: 332,740 ( 2.7%)
unhandled_hir_insn: 160,273 ( 1.3%)
block_param_proxy_modified: 59,589 ( 0.5%)
obj_to_string_fallback: 553 ( 0.0%)
interrupt: 9 ( 0.0%)
send_count: 119,163,569
dynamic_send_count: 40,796,314 (34.2%)
optimized_send_count: 78,367,255 (65.8%)
iseq_optimized_send_count: 39,911,967 (33.5%)
inline_cfunc_optimized_send_count: 12,857,393 (10.8%)
non_variadic_cfunc_optimized_send_count: 19,770,401 (16.6%)
variadic_cfunc_optimized_send_count: 5,827,494 ( 4.9%)
dynamic_getivar_count: 10,980,323
dynamic_setivar_count: 12,986,381
compiled_iseq_count: 2,523
failed_iseq_count: 252
compile_time: 420ms
profile_time: 21ms
gc_time: 30ms
invalidation_time: 4ms
vm_write_pc_count: 128,973,665
vm_write_sp_count: 125,926,968
vm_write_locals_count: 125,926,968
vm_write_stack_count: 125,926,968
vm_write_to_parent_iseq_local_count: 385,752
vm_read_from_parent_iseq_local_count: 11,267,766
code_region_bytes: 12,189,696
side_exit_count: 12,530,763
total_insn_count: 866,667,490
vm_insn_count: 217,813,201
zjit_insn_count: 648,854,289
ratio_in_zjit: 74.9%
```
</details>
|
|
We have a lot of patchpoint exits on some applications and this helps
pin down why.
|
|
|
|
* ZJIT: Count unoptimized `Send`
This includes `Send` in `send fallback reasons` to guide future
optimizations.
* ZJIT: Create dedicated def_type counter for Send
|
|
|
|
The `Counter::name()` method creates a new `String` on every call, so each call allocates memory and copies the string. Returning `&'static str` instead would reduce memory pressure. The change is safe because it introduces no breaking changes to the API.
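
A minimal sketch of the idea, not the actual ZJIT code: the `Counter` enum, its variants, and the `format_stat` helper below are hypothetical stand-ins chosen for illustration. It shows how a `name()` that returns `&'static str` hands back a string literal stored in the binary, so looking up a counter name does not touch the heap.

```rust
/// Hypothetical stand-in for a stats counter enum; the variants are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Counter {
    SendCount,
    SideExitCount,
    CompileTime,
}

impl Counter {
    /// Returns a string literal with static lifetime.
    /// No allocation or copy happens here, unlike a version returning `String`.
    pub fn name(self) -> &'static str {
        match self {
            Counter::SendCount => "send_count",
            Counter::SideExitCount => "side_exit_count",
            Counter::CompileTime => "compile_time",
        }
    }
}

/// Hypothetical helper that formats one stats line.
/// The only allocation is the returned `String`; the counter name is borrowed.
pub fn format_stat(counter: Counter, value: u64) -> String {
    format!("{}: {}", counter.name(), value)
}
```

A caller that still needs an owned value can write `counter.name().to_string()`, which keeps the allocation explicit at the one place that requires it.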
|
|
|
|
* ZJIT: Add HIR for CCallWithFrame
* ZJIT: Update stats to count not inlined cfunc calls
* ZJIT: Stop optimizing SendWithoutBlock when TracePoint is activated
* ZJIT: Fallback to SendWithoutBlock when CCallWithFrame has too many args
* ZJIT: Rename cfun -> cfunc
|
|
|
|
|