author Your Name <you@example.com> 2024-08-03 00:53:13 +0000
committer Alan Wu <XrXr@users.noreply.github.com> 2024-08-07 18:49:20 -0400
commit 34715bdd910698e1aa9770b36c2b6e38e708c629
tree d345af17dbb2ec3d564eb26cdb0cd4a26ba330eb /test
parent e271feb8663415d9ed8a55e0e78bd655a16e0201
Tune codegen for rb_yield() calls landing in ISeqs

Unlike in revisions from earlier in the year, GCC 11 isn't inlining the call to vm_push_frame() inside invoke_iseq_block_from_c() anymore. We do want it to be inlined since rb_yield() speed is fairly important. Logs from -fopt-info-optimized-inline reveal that GCC was blowing its code size budget inlining invoke_block_from_c_bh() into its various callers, leaving suboptimal code for its body.

Take away some uses of the `inline` keyword and merge a common tail call to vm_exec() for overall better code.

This tweak gives about an 18% speedup on a micro benchmark and 1% on the chunky-png benchmark from yjit-bench. I tested on a Skylake server.

```
$ cat c-to-ruby-call.yml
benchmark:
  - 0.upto(10_000_000) {}

$ benchmark-driver --chruby '+patch;master' c-to-ruby-call.yml
Warming up --------------------------------------
0.upto(10_000_000) {}      2.299 i/s -      3.000 times in 1.304689s (434.90ms/i)
Calculating -------------------------------------
                          +patch      master
0.upto(10_000_000) {}      2.299       1.943 i/s -      6.000 times in 2.609393s 3.088353s

Comparison:
             0.upto(10_000_000) {}
              +patch:         2.3 i/s
              master:         1.9 i/s - 1.18x slower

$ ruby run_benchmarks.rb --chruby 'master;+patch' chunky-png

<snip>

----------  -----------  ----------  -----------  ----------  --------------  -------------
bench       master (ms)  stddev (%)  +patch (ms)  stddev (%)  +patch 1st itr  master/+patch
chunky-png  1156.1       0.1         1142.2       0.2         1.01            1.01
----------  -----------  ----------  -----------  ----------  --------------  -------------
```
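
To illustrate the shape of the change described above, here is a minimal sketch, not the actual patch. Every name in it (`frame_t`, `push_frame`, `run_frame`, `invoke_block_before`, `invoke_block_after`) is a hypothetical stand-in and none of it is taken from the Ruby source. It only shows the general idea of letting each branch set up a frame and then sharing a single trailing call, rather than repeating that call in every branch.

```c
/* Hypothetical stand-ins for illustration only; not Ruby VM code. */
typedef struct {
    int self;
    int arg;
    int is_lambda;
} frame_t;

/* Small, hot helper: the kind of function we want inlined into its caller. */
static inline void
push_frame(frame_t *f, int self, int arg, int is_lambda)
{
    f->self = self;
    f->arg = arg;
    f->is_lambda = is_lambda;
}

/* Stand-in for a large interpreter loop that should stay out of line. */
static int
run_frame(const frame_t *f)
{
    return f->is_lambda ? f->self + f->arg : f->self * f->arg;
}

/* Before: each branch ends with its own call to run_frame(). */
static int
invoke_block_before(frame_t *f, int self, int arg, int is_lambda)
{
    if (is_lambda) {
        push_frame(f, self, arg, 1);
        return run_frame(f);
    }
    else {
        push_frame(f, self, arg, 0);
        return run_frame(f);
    }
}

/* After: branches only set up the frame and share one trailing call.
 * The function is not marked `inline`, so the compiler decides per
 * call site whether inlining it is worth the code size. */
static int
invoke_block_after(frame_t *f, int self, int arg, int is_lambda)
{
    if (is_lambda) {
        push_frame(f, self, arg, 1);
    }
    else {
        push_frame(f, self, arg, 0);
    }
    return run_frame(f);
}
```

Both variants compute the same result; the difference is only in how much code the compiler has to duplicate when it inlines them. Whether a given compiler version actually inlines `push_frame` in either variant depends on its heuristics, which is why the commit relies on GCC's inlining logs rather than on the source shape alone.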
Notes: Merged: https://github.com/ruby/ruby/pull/11321
Diffstat (limited to 'test')
0 files changed, 0 insertions, 0 deletions