| field | value | date |
|---|---|---|
| author | Your Name <you@example.com> | 2024-08-03 00:53:13 +0000 |
| committer | Alan Wu <XrXr@users.noreply.github.com> | 2024-08-07 18:49:20 -0400 |
| commit | 34715bdd910698e1aa9770b36c2b6e38e708c629 (patch) | |
| tree | d345af17dbb2ec3d564eb26cdb0cd4a26ba330eb /test | |
| parent | e271feb8663415d9ed8a55e0e78bd655a16e0201 (diff) | |
Tune codegen for rb_yield() calls landing in ISeqs
Unlike with revisions from earlier in the year, GCC 11 isn't inlining the call
to vm_push_frame() inside invoke_iseq_block_from_c() anymore. We do
want it to be inlined since rb_yield() speed is fairly important.
Logs from -fopt-info-optimized-inline reveal that GCC was blowing its
code size budget inlining invoke_block_from_c_bh() into its various
callers, leaving suboptimal code for its body.
Take away some uses of the `inline` keyword and merge a common tail
call to vm_exec() for overall better code.
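As a rough illustration of what "merging a common tail call" can look like, here is a minimal sketch. It is not the code from this patch: `frame_t`, `push_frame()`, `run()`, and both `dispatch_*` functions are hypothetical stand-ins, with `run()` playing the role of an expensive shared callee such as vm_exec(). The idea is that when every branch ends in its own call, the compiler may spend inlining budget on each call site; with a single call in the tail, there is only one site to consider.

```c
/* Hypothetical sketch; none of these names come from the Ruby source. */

typedef struct { int depth; } frame_t;

/* Stand-in for a small, hot helper that we would like to see inlined. */
static inline void push_frame(frame_t *f) { f->depth++; }

/* Stand-in for the expensive shared callee (think: the interpreter loop). */
static int run(frame_t *f, int arg) { return f->depth * 100 + arg; }

/* Before: each branch finishes with its own call to run(). */
static int dispatch_before(frame_t *f, int kind, int arg)
{
    if (kind == 0) {
        push_frame(f);
        return run(f, arg);
    }
    else {
        push_frame(f);
        push_frame(f);
        return run(f, arg + 1);
    }
}

/* After: branches only do setup; one call to run() sits in the tail. */
static int dispatch_after(frame_t *f, int kind, int arg)
{
    if (kind == 0) {
        push_frame(f);
    }
    else {
        push_frame(f);
        push_frame(f);
        arg += 1;
    }
    return run(f, arg);
}
```

Both versions compute the same result; only the number of call sites differs. Whether this helps in a given build depends on the compiler's heuristics, so output from -fopt-info-optimized-inline remains the thing to check.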
This tweak gives about an 18% speedup on a micro benchmark and 1% on the
chunky-png benchmark from yjit-bench. I tested on a Skylake server.
```
$ cat c-to-ruby-call.yml
benchmark:
- 0.upto(10_000_000) {}
$ benchmark-driver --chruby '+patch;master' c-to-ruby-call.yml
Warming up --------------------------------------
0.upto(10_000_000) {} 2.299 i/s - 3.000 times in 1.304689s (434.90ms/i)
Calculating -------------------------------------
+patch master
0.upto(10_000_000) {} 2.299 1.943 i/s - 6.000 times in 2.609393s 3.088353s
Comparison:
0.upto(10_000_000) {}
+patch: 2.3 i/s
master: 1.9 i/s - 1.18x slower
$ ruby run_benchmarks.rb --chruby 'master;+patch' chunky-png
<snip>
----------  -----------  ----------  -----------  ----------  --------------  -------------
bench       master (ms)  stddev (%)  +patch (ms)  stddev (%)  +patch 1st itr  master/+patch
chunky-png  1156.1       0.1         1142.2       0.2         1.01            1.01
----------  -----------  ----------  -----------  ----------  --------------  -------------
```
Notes:
Merged: https://github.com/ruby/ruby/pull/11321
Diffstat (limited to 'test')
0 files changed, 0 insertions, 0 deletions
