summaryrefslogtreecommitdiff
path: root/benchmark
AgeCommit message (Collapse)Author
2022-08-19Make benchmark indentation consistentTakashi Kokubun
Related to https://github.com/Shopify/yjit-bench/pull/109
2022-08-17Added vm setivar benchmark from yjit-benchJemma Issroff
Notes: Merged: https://github.com/ruby/ruby/pull/6247
2022-08-15Optimize Marshal dump/load for large (> 31-bit) FIXNUM (#6229)John Hawthorn
* Optimize Marshal dump of large fixnum Marshal's FIXNUM type only supports 31-bit fixnums, so on 64-bit platforms the 63-bit fixnums need to be represented in Marshal's BIGNUM. Previously this was done by converting to a bugnum and serializing the bignum object. This commit avoids allocating the intermediate bignum object, instead outputting the T_FIXNUM directly to a Marshal bignum. This maintains the same representation as the previous implementation, including not using LINKs for these large fixnums (an artifact of the previous implementation always allocating a new BIGNUM). This commit also avoids unnecessary st_lookups on immediate values, which we know will not be in that table. * Fastpath for loading FIXNUM from Marshal bignum * Run update-deps Notes: Merged-By: jhawthorn <john@hawthorn.email>
2022-08-09Update multiple assignment benchmarks to include non-literal array casesJeremy Evans
This allows them to show the effect of the previous newarray/expandarray to swap/opt_reverse optimization. This shows an 35-83% performance improvement in the four multiple assignment benchmarks that use this optimization. Notes: Merged: https://github.com/ruby/ruby/pull/6158
2022-08-08Update IO::Buffer#get_value benchmarkJean Boussier
- The method was renamed from `get` to `get_value` - Comparing to `String#unpack` isn't quite equivalent, `unpack1` is closer. - Use frozen_string_literal to avoid allocating a format string every time. - Use `N` format which is equivalent to `:U32` (`uint_32_t` big-endian). - Disable experimental warnings to not mess up the output. Notes: Merged: https://github.com/ruby/ruby/pull/6221
2022-07-25rb_str_buf_append: add a fast path for ENC_CODERANGE_VALIDJean Boussier
If the RHS has valid encoding, and both strings have the same encoding, we can use the fast path. However we need to update the LHS coderange. ``` compare-ruby: ruby 3.2.0dev (2022-07-21T14:46:32Z master cdbb9b8555) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-25T07:25:41Z string-concat-vali.. 11a2772bdd) [arm64-darwin21] warming up... | |compare-ruby|built-ruby| |:-------------------|-----------:|---------:| |binary_concat_7bit | 554.816k| 556.460k| | | -| 1.00x| |utf8_concat_7bit | 556.367k| 555.101k| | | 1.00x| -| |utf8_concat_UTF8 | 412.555k| 556.824k| | | -| 1.35x| ``` Notes: Merged: https://github.com/ruby/ruby/pull/6163
2022-07-21string.c: use str_enc_fastpath in TERM_LENJean Boussier
Not having to fetch the rb_encoding save a significant amount of time. Additionally, even when we have to fetch it, we can do it faster using `ENCODING_GET` rather than `rb_enc_get`. ``` compare-ruby: ruby 3.2.0dev (2022-07-19T08:41:40Z master cb9fd920a3) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-21T11:16:16Z faster-buffer-conc.. 4f001f0748) [arm64-darwin21] warming up... | |compare-ruby|built-ruby| |:---------------------|-----------:|---------:| |binary_concat_utf8 | 510.580k| 565.600k| | | -| 1.11x| |binary_concat_binary | 512.653k| 571.483k| | | -| 1.11x| |utf8_concat_utf8 | 511.396k| 566.879k| | | -| 1.11x| ``` Notes: Merged: https://github.com/ruby/ruby/pull/6160
2022-07-19rb_str_buf_append: fastpath to str_buf_catJean Boussier
If the LHS is ASCII compatible and the RHS is 7BIT we can directly concat without being concerned about anything else. Benchmark: ``` compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-13T10:13:53Z faster-buffer-conc.. a04c10476d) [arm64-darwin21] warming up... | |compare-ruby|built-ruby| |:---------------------|-----------:|---------:| |binary_append_utf8 | 385.315k| 573.663k| | | -| 1.49x| |binary_append_binary | 446.579k| 574.898k| | | -| 1.29x| |utf8_append_utf8 | 430.936k| 573.394k| | | -| 1.33x| ``` Note that in the benchmark, the RHS always have a precomputed coderange. So the benchmark never enter the slowpath of having to scan the RHS. However it's extremly likely that we'll end up scanning it anyway in rb_enc_cr_str_buf_cat Notes: Merged: https://github.com/ruby/ruby/pull/6120
2022-07-15Add benchmarks for setting / getting ivars on genericsJemma Issroff
Notes: Merged: https://github.com/ruby/ruby/pull/6144
2022-07-15Fixes ivar benchmarks to not depend on object allocationJemma Issroff
Prior to this change, we were measuring object allocation as well as setting instance variables within ivar benchmarks. With this change, we now only measure setting instance variables within ivar benchmarks. Notes: Merged: https://github.com/ruby/ruby/pull/6142
2022-07-06vm_opt_ltlt: call rb_str_buf_append directly if RHS is a StringJean Boussier
`rb_str_concat` does a lot of type checking we can easily bypass. ``` | |compare-ruby|built-ruby| |:--------------|-----------:|---------:| |string_concat | 362.007k| 398.965k| | | -| 1.10x| ``` Notes: Merged: https://github.com/ruby/ruby/pull/6095
2022-06-16Added vm_ivar benchmark for initializing an embedded objJemma Issroff
Notes: Merged: https://github.com/ruby/ruby/pull/6030
2022-06-07Update the help message on /benchmarkTakashi Kokubun
I wanted to point out there's --output=all.
2022-05-28Add IO write throughput/locking overhead benchmark.Samuel Williams
Notes: Merged: https://github.com/ruby/ruby/pull/5419
2022-04-01Finer-grained constant cache invalidation (take 2)Kevin Newton
This commit reintroduces finer-grained constant cache invalidation. After 8008fb7 got merged, it was causing issues on token-threaded builds (such as on Windows). The issue was that when you're iterating through instruction sequences and using the translator functions to get back the instruction structs, you're either using `rb_vm_insn_null_translator` or `rb_vm_insn_addr2insn2` depending if it's a direct-threading build. `rb_vm_insn_addr2insn2` does some normalization to always return to you the non-trace version of whatever instruction you're looking at. `rb_vm_insn_null_translator` does not do that normalization. This means that when you're looping through the instructions if you're trying to do an opcode comparison, it can change depending on the type of threading that you're using. This can be very confusing. So, this commit creates a new translator function `rb_vm_insn_normalizing_translator` to always return the non-trace version so that opcode comparisons don't have to worry about different configurations. [Feature #18589] Notes: Merged: https://github.com/ruby/ruby/pull/5716
2022-03-25Revert "Finer-grained inline constant cache invalidation"Nobuyoshi Nakada
This reverts commits for [Feature #18589]: * 8008fb7352abc6fba433b99bf20763cf0d4adb38 "Update formatting per feedback" * 8f6eaca2e19828e92ecdb28b0fe693d606a03f96 "Delete ID from constant cache table if it becomes empty on ISEQ free" * 629908586b4bead1103267652f8b96b1083573a8 "Finer-grained inline constant cache invalidation" MSWin builds on AppVeyor have been crashing since the merger. Notes: Merged: https://github.com/ruby/ruby/pull/5715 Merged-By: nobu <nobu@ruby-lang.org>
2022-03-24Finer-grained inline constant cache invalidationKevin Newton
Current behavior - caches depend on a global counter. All constant mutations cause caches to be invalidated. ```ruby class A B = 1 end def foo A::B # inline cache depends on global counter end foo # populate inline cache foo # hit inline cache C = 1 # global counter increments, all caches are invalidated foo # misses inline cache due to `C = 1` ``` Proposed behavior - caches depend on name components. Only constant mutations with corresponding names will invalidate the cache. ```ruby class A B = 1 end def foo A::B # inline cache depends constants named "A" and "B" end foo # populate inline cache foo # hit inline cache C = 1 # caches that depend on the name "C" are invalidated foo # hits inline cache because IC only depends on "A" and "B" ``` Examples of breaking the new cache: ```ruby module C # Breaks `foo` cache because "A" constant is set and the cache in foo depends # on "A" and "B" class A; end end B = 1 ``` We expect the new cache scheme to be invalidated less often because names aren't frequently reused. With the cache being invalidated less, we can rely on its stability more to keep our constant references fast and reduce the need to throw away generated code in YJIT. Notes: Merged: https://github.com/ruby/ruby/pull/5433
2022-02-23Constant time class to class ancestor lookupJohn Hawthorn
Previously when checking ancestors, we would walk all the way up the ancestry chain checking each parent for a matching class or module. I believe this was especially unfriendly to CPU cache since for each step we need to check two cache lines (the class and class ext). This check is used quite often in: * case statements * rescue statements * Calling protected methods * Class#is_a? * Module#=== * Module#<=> I believe it's most common to check a class against a parent class, to this commit aims to improve that (unfortunately does not help checking for an included Module). This is done by storing on each class the number and an array of all parent classes, in order (BasicObject is at index 0). Using this we can check whether a class is a subclass of another in constant time since we know the location to expect it in the hierarchy. Notes: Merged: https://github.com/ruby/ruby/pull/5568
2022-01-12Speed up and avoid kwarg hash alloc in Time.nowJohn Hawthorn
Previously Time.now was switched to use Time.new as it added support for the in: argument. Unfortunately because Class#new is a cfunc this requires always allocating a Hash. This commit switches Time.now back to using a builtin time_s_now. This avoids the extra Hash allocation and is about 3x faster. $ benchmark-driver -e './ruby;3.1::~/.rubies/ruby-3.1.0/bin/ruby;3.0::~/.rubies/ruby-3.0.2/bin/ruby' benchmark/time_now.yml Warming up -------------------------------------- Time.now 6.704M i/s - 6.710M times in 1.000814s (149.16ns/i, 328clocks/i) Time.now(in: "+09:00") 2.003M i/s - 2.112M times in 1.054330s (499.31ns/i) Calculating ------------------------------------- ./ruby 3.1 3.0 Time.now 7.693M 2.763M 6.394M i/s - 20.113M times in 2.614428s 7.278710s 3.145572s Time.now(in: "+09:00") 2.030M 1.260M 1.617M i/s - 6.008M times in 2.960132s 4.769378s 3.716537s Comparison: Time.now ./ruby: 7693129.7 i/s 3.0: 6394109.2 i/s - 1.20x slower 3.1: 2763282.5 i/s - 2.78x slower Time.now(in: "+09:00") ./ruby: 2029757.4 i/s 3.0: 1616652.3 i/s - 1.26x slower 3.1: 1259776.2 i/s - 1.61x slower Notes: Merged: https://github.com/ruby/ruby/pull/5429
2021-12-13Prepare for removing RubyVM::JIT (#5262)Takashi Kokubun
Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2021-11-18Optimize dynamic string interpolation for symbol/true/false/nil/0-9Jeremy Evans
This provides a significant speedup for symbol, true, false, nil, and 0-9, class/module, and a small speedup in most other cases. Speedups (using included benchmarks): :symbol :: 60% 0-9 :: 50% Class/Module :: 50% nil/true/false :: 20% integer :: 10% [] :: 10% "" :: 3% One reason this approach is faster is it reduces the number of VM instructions for each interpolated value. Initial idea, approach, and benchmarks from Eric Wong. I applied the same approach against the master branch, updating it to handle the significant internal changes since this was first proposed 4 years ago (such as CALL_INFO/CALL_CACHE -> CALL_DATA). I also expanded it to optimize true/false/nil/0-9/class/module, and added handling of missing methods, refined methods, and RUBY_DEBUG. This renames the tostring insn to anytostring, and adds an objtostring insn that implements the optimization. This requires making a few functions non-static, and adding some non-static functions. This disables 4 YJIT tests. Those tests should be reenabled after YJIT optimizes the new objtostring insn. Implements [Feature #13715] Co-authored-by: Eric Wong <e@80x24.org> Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> Co-authored-by: Yusuke Endoh <mame@ruby-lang.org> Co-authored-by: Koichi Sasada <ko1@atdot.net> Notes: Merged: https://github.com/ruby/ruby/pull/5002 Merged-By: jeremyevans <code@jeremyevans.net>
2021-11-14Skip string allocation in benchmark/time_at.ymlTakashi Kokubun
and also drop a weird newline from benchmark/array_sample.yml.
2021-11-15add benchmark/time_at.ymlKoichi Sasada
``` ruby_2_6 ruby_2_7 ruby_3_0 master modified Time.at(0) 12.362M 11.015M 9.499M 6.615M 9.000M i/s - 32.115M times in 2.597946s 2.915517s 3.380725s 4.854651s 3.568234s Time.at(0, 500) 7.542M 7.136M 8.252M 5.707M 5.646M i/s - 20.713M times in 2.746279s 2.902556s 2.510166s 3.629644s 3.668854s Time.at(0, in: "+09:00") 1.426M 1.346M 1.565M 1.674M 1.667M i/s - 4.240M times in 2.974049s 3.149753s 2.709416s 2.533043s 2.542853s ``` ``` ruby_2_6: ruby 2.6.7p150 (2020-12-09 revision 67888) [x86_64-linux] ruby_2_7: ruby 2.7.3p140 (2020-12-09 revision 9b884df6dd) [x86_64-linux] ruby_3_0: ruby 3.0.3p150 (2021-11-06 revision 6d540c1b98) [x86_64-linux] master: ruby 3.1.0dev (2021-11-13T20:48:57Z master fc456adc6a) [x86_64-linux] modified: ruby 3.1.0dev (2021-11-15T01:12:51Z mandatory_only_bui.. b0228446db) [x86_64-linux] ``` Notes: Merged: https://github.com/ruby/ruby/pull/5112
2021-11-15add benchmark/array_sample.ymlKoichi Sasada
``` ruby_2_6 ruby_2_7 ruby_3_0 master modified ary.sample 32.113M 30.146M 11.162M 10.539M 26.620M i/s - 64.882M times in 2.020428s 2.152296s 5.812981s 6.156398s 2.437325s ary.sample(2) 9.420M 8.987M 7.500M 6.973M 7.191M i/s - 25.170M times in 2.672085s 2.800616s 3.355896s 3.609534s 3.500108s ``` ``` ruby_2_6: ruby 2.6.7p150 (2020-12-09 revision 67888) [x86_64-linux] ruby_2_7: ruby 2.7.3p140 (2020-12-09 revision 9b884df6dd) [x86_64-linux] ruby_3_0: ruby 3.0.3p150 (2021-11-06 revision 6d540c1b98) [x86_64-linux] master: ruby 3.1.0dev (2021-11-13T20:48:57Z master fc456adc6a) [x86_64-linux] modified: ruby 3.1.0dev (2021-11-15T01:12:51Z mandatory_only_bui.. b0228446db) [x86_64-linux] ``` Notes: Merged: https://github.com/ruby/ruby/pull/5112
2021-11-10IO::Buffer for scheduler interface.Samuel Williams
Notes: Merged: https://github.com/ruby/ruby/pull/4621
2021-10-23add vm_ivar_of_class_setKoichi Sasada
benchmark for a class's ivar setter Notes: Merged: https://github.com/ruby/ruby/pull/5006
2021-10-23allow to access ivars of classes/modulesKoichi Sasada
if an ivar of a class/module refer to a shareable object, this ivar can be read from non-main Ractors. Notes: Merged: https://github.com/ruby/ruby/pull/5006
2021-09-30Use faster any_hash logic in rb_hashJohn Hawthorn
From the documentation of rb_obj_hash: > Certain core classes such as Integer use built-in hash calculations and > do not call the #hash method when used as a hash key. So if you override, say, Integer#hash it won't be used from rb_hash_aref and similar. This avoids method lookups in many common cases. This commit uses the same optimization in rb_hash, a method used internally and in the C API to get the hash value of an object. Usually this is used to build the hash of an object based on its elements. Previously it would always do a method lookup for 'hash'. This is primarily intended to speed up hashing of Arrays and Hashes, which call rb_hash for each element. compare-ruby: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux] built-ruby: ruby 3.1.0dev (2021-09-29T02:13:24Z fast_hash d670bf88b2) [x86_64-linux] # Iteration per second (i/s) | |compare-ruby|built-ruby| |:----------------|-----------:|---------:| |hash_aref_array | 1.008| 1.769| | | -| 1.76x| Notes: Merged: https://github.com/ruby/ruby/pull/4916
2021-09-12Add benchmarks to create Time instancesNobuyoshi Nakada
2021-08-29Support tracing of attr_reader and attr_writerJeremy Evans
In vm_call_method_each_type, check for c_call and c_return events before dispatching to vm_call_ivar and vm_call_attrset. With this approach, the call cache will still dispatch directly to those functions, so this change will only decrease performance for the first (uncached) call, and even then, the performance decrease is very minimal. This approach requires that we clear the call caches when tracing is enabled or disabled. The approach currently switches all vm_call_ivar and vm_call_attrset call caches to vm_call_general any time tracing is enabled or disabled. So it could theoretically result in a slowdown for code that constantly enables or disables tracing. This approach does not handle targeted tracepoints, but from my testing, c_call and c_return events are not supported for targeted tracepoints, so that shouldn't matter. This includes a benchmark showing the performance decrease is minimal if detectable at all. Fixes [Bug #16383] Fixes [Bug #10470] Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com> Notes: Merged: https://github.com/ruby/ruby/pull/4767
2021-06-29Prefer qualified names under ThreadNobuyoshi Nakada
2021-06-18Add a cache for class variableseileencodes
Redo of 34a2acdac788602c14bf05fb616215187badd504 and 931138b00696419945dc03e10f033b1f53cd50f3 which were reverted. GitHub PR #4340. This change implements a cache for class variables. Previously there was no cache for cvars. Cvar access is slow due to needing to travel all the way up th ancestor tree before returning the cvar value. The deeper the ancestor tree the slower cvar access will be. The benefits of the cache are more visible with a higher number of included modules due to the way Ruby looks up class variables. The benchmark here includes 26 modules and shows with the cache, this branch is 6.5x faster when accessing class variables. ``` compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19] built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19] | |compare-ruby|built-ruby| |:--------|-----------:|---------:| |vm_cvar | 5.681M| 36.980M| | | -| 6.51x| ``` Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails application. ActiveRecord::Base.logger has 71 ancestors. The more ancestors a tree has, the more clear the speed increase. IE if Base had only one ancestor we'd see no improvement. This benchmark is run on a vanilla Rails application. Benchmark code: ```ruby require "benchmark/ips" require_relative "config/environment" Benchmark.ips do |x| x.report "logger" do ActiveRecord::Base.logger end end ``` Ruby 3.0 master / Rails 6.1: ``` Warming up -------------------------------------- logger 155.251k i/100ms Calculating ------------------------------------- ``` Ruby 3.0 with cvar cache / Rails 6.1: ``` Warming up -------------------------------------- logger 1.546M i/100ms Calculating ------------------------------------- logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s ``` Lastly we ran a benchmark to demonstate the difference between master and our cache when the number of modules increases. This benchmark measures 1 ancestor, 30 ancestors, and 100 ancestors. Ruby 3.0 master: ``` Warming up -------------------------------------- 1 module 1.231M i/100ms 30 modules 432.020k i/100ms 100 modules 145.399k i/100ms Calculating ------------------------------------- 1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s 30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s 100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s Comparison: 1 module: 12209958.3 i/s 30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower 100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower ``` Ruby 3.0 with cvar cache: ``` Warming up -------------------------------------- 1 module 1.641M i/100ms 30 modules 1.655M i/100ms 100 modules 1.620M i/100ms Calculating ------------------------------------- 1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s 30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s 100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s Comparison: 1 module: 16279458.0 i/s 100 modules: 16087484.6 i/s - same-ish: difference falls within error 30 modules: 15891406.2 i/s - same-ish: difference falls within error ``` Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/4544
2021-06-04Improve perfomance for Integer#size method [Feature #17135] (#3476)S.H
* Improve perfomance for Integer#size method [Feature #17135] * re-run ci * Let MJIT frame skip work for Integer#size Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com> Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2021-06-02Implemented some NilClass method in Ruby code is faster [Feature #17054] (#3366)S.H
Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2021-05-28compile.c: Emit send for === calls in when statementsAlan Wu
The checkmatch instruction with VM_CHECKMATCH_TYPE_CASE calls === without a call cache. Emit a send instruction to make the call instead. It includes a call cache. The call cache improves throughput of using when statements to check the class of a given object. This is useful for say, JSON serialization. Use of a regular send instead of checkmatch also avoids taking the VM lock every time, which is good for multi-ractor workloads. Calculating ------------------------------------- master post vm_case_classes 11.013M 16.172M i/s - 6.000M times in 0.544795s 0.371009s vm_case_lit 2.296 2.263 i/s - 1.000 times in 0.435606s 0.441826s vm_case 74.098M 64.338M i/s - 6.000M times in 0.080974s 0.093257s Comparison: vm_case_classes post: 16172114.4 i/s master: 11013316.9 i/s - 1.47x slower vm_case_lit master: 2.3 i/s post: 2.3 i/s - 1.01x slower vm_case master: 74097858.6 i/s post: 64338333.9 i/s - 1.15x slower The vm_case benchmark is a bit slower post patch, possibily due to the larger instruction sequence. The benchmark dispatches using opt_case_dispatch so was not running checkmatch and does not make the === call post patch. Notes: Merged: https://github.com/ruby/ruby/pull/4468
2021-05-11Revert "Filling cache values on cvar write"Aaron Patterson
This reverts commit 08de37f9fa3469365e6b5c964689ae2bae0eb9f3. This reverts commit e8ae922b62adb00a80d3d4c49f7d7b0e6026eaba.
2021-05-11Add a cache for class variableseileencodes
This change implements a cache for class variables. Previously there was no cache for cvars. Cvar access is slow due to needing to travel all the way up th ancestor tree before returning the cvar value. The deeper the ancestor tree the slower cvar access will be. The benefits of the cache are more visible with a higher number of included modules due to the way Ruby looks up class variables. The benchmark here includes 26 modules and shows with the cache, this branch is 6.5x faster when accessing class variables. ``` compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19] built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19] | |compare-ruby|built-ruby| |:--------|-----------:|---------:| |vm_cvar | 5.681M| 36.980M| | | -| 6.51x| ``` Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails application. ActiveRecord::Base.logger has 71 ancestors. The more ancestors a tree has, the more clear the speed increase. IE if Base had only one ancestor we'd see no improvement. This benchmark is run on a vanilla Rails application. Benchmark code: ```ruby require "benchmark/ips" require_relative "config/environment" Benchmark.ips do |x| x.report "logger" do ActiveRecord::Base.logger end end ``` Ruby 3.0 master / Rails 6.1: ``` Warming up -------------------------------------- logger 155.251k i/100ms Calculating ------------------------------------- ``` Ruby 3.0 with cvar cache / Rails 6.1: ``` Warming up -------------------------------------- logger 1.546M i/100ms Calculating ------------------------------------- logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s ``` Lastly we ran a benchmark to demonstate the difference between master and our cache when the number of modules increases. This benchmark measures 1 ancestor, 30 ancestors, and 100 ancestors. Ruby 3.0 master: ``` Warming up -------------------------------------- 1 module 1.231M i/100ms 30 modules 432.020k i/100ms 100 modules 145.399k i/100ms Calculating ------------------------------------- 1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s 30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s 100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s Comparison: 1 module: 12209958.3 i/s 30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower 100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower ``` Ruby 3.0 with cvar cache: ``` Warming up -------------------------------------- 1 module 1.641M i/100ms 30 modules 1.655M i/100ms 100 modules 1.620M i/100ms Calculating ------------------------------------- 1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s 30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s 100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s Comparison: 1 module: 16279458.0 i/s 100 modules: 16087484.6 i/s - same-ish: difference falls within error 30 modules: 15891406.2 i/s - same-ish: difference falls within error ``` Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/4340
2021-05-03Eagerly allocate instance variable tables along with objectAaron Patterson
This allows us to allocate the right size for the object in advance, meaning that we don't have to pay the cost of ivar table extension later. The idea is that if an object type ever became "extended" at some point, then it is very likely it will become extended again. So we may as well allocate the ivar table up front. Notes: Merged: https://github.com/ruby/ruby/pull/4216
2021-04-25[Doc] Fix a typo s/visilibity/visibility/wonda-tea-coffee
Notes: Merged: https://github.com/ruby/ruby/pull/4406
2021-04-21Evaluate multiple assignment left hand side before right hand sideJeremy Evans
In regular assignment, Ruby evaluates the left hand side before the right hand side. For example: ```ruby foo[0] = bar ``` Calls `foo`, then `bar`, then `[]=` on the result of `foo`. Previously, multiple assignment didn't work this way. If you did: ```ruby abc.def, foo[0] = bar, baz ``` Ruby would previously call `bar`, then `baz`, then `abc`, then `def=` on the result of `abc`, then `foo`, then `[]=` on the result of `foo`. This change makes multiple assignment similar to single assignment, changing the evaluation order of the above multiple assignment code to calling `abc`, then `foo`, then `bar`, then `baz`, then `def=` on the result of `abc`, then `[]=` on the result of `foo`. Implementing this is challenging with the stack-based virtual machine. We need to keep track of all of the left hand side attribute setter receivers and setter arguments, and then keep track of the stack level while handling the assignment processing, so we can issue the appropriate topn instructions to get the receiver. Here's an example of how the multiple assignment is executed, showing the stack and instructions: ``` self # putself abc # send abc, self # putself abc, foo # send abc, foo, 0 # putobject 0 abc, foo, 0, [bar, baz] # evaluate RHS abc, foo, 0, [bar, baz], baz, bar # expandarray abc, foo, 0, [bar, baz], baz, bar, abc # topn 5 abc, foo, 0, [bar, baz], baz, abc, bar # swap abc, foo, 0, [bar, baz], baz, def= # send abc, foo, 0, [bar, baz], baz # pop abc, foo, 0, [bar, baz], baz, foo # topn 3 abc, foo, 0, [bar, baz], baz, foo, 0 # topn 3 abc, foo, 0, [bar, baz], baz, foo, 0, baz # topn 2 abc, foo, 0, [bar, baz], baz, []= # send abc, foo, 0, [bar, baz], baz # pop abc, foo, 0, [bar, baz] # pop [bar, baz], foo, 0, [bar, baz] # setn 3 [bar, baz], foo, 0 # pop [bar, baz], foo # pop [bar, baz] # pop ``` As multiple assignment must deal with splats, post args, and any level of nesting, it gets quite a bit more complex than this in non-trivial cases. To handle this, struct masgn_state is added to keep track of the overall state of the mass assignment, which stores a linked list of struct masgn_attrasgn, one for each assigned attribute. This adds a new optimization that replaces a topn 1/pop instruction combination with a single swap instruction for multiple assignment to non-aref attributes. This new approach isn't compatible with one of the optimizations previously used, in the case where the multiple assignment return value was not needed, there was no lhs splat, and one of the left hand side used an attribute setter. This removes that optimization. Removing the optimization allowed for removing the POP_ELEMENT and adjust_stack functions. This adds a benchmark to measure how much slower multiple assignment is with the correct evaluation order. This benchmark shows: * 4-9% decrease for attribute sets * 14-23% decrease for array member sets * Basically same speed for local variable sets Importantly, it shows no significant difference between the popped (where return value of the multiple assignment is not needed) and !popped (where return value of the multiple assignment is needed) cases for attribute and array member sets. This indicates the previous optimization, which was dropped in the evaluation order fix and only affected the popped case, is not important to performance. Fixes [Bug #4443] Notes: Merged: https://github.com/ruby/ruby/pull/4390 Merged-By: jeremyevans <code@jeremyevans.net>
2021-04-11st.c: skip all deleted entries [Bug #17779]tompng (tomoya ishida)
Update the start entry skipping all already deleted entries. Fixes performance issue of `Hash#first` in a certain case.
2021-03-16Improve Enumerable#tally performanceNobuyoshi Nakada
Iteration per second (i/s) | |compare-ruby|built-ruby| |:------|-----------:|---------:| |tally | 52.814| 114.936| | | -| 2.18x| Notes: Merged: https://github.com/ruby/ruby/pull/4278
2021-03-10Add a benchmark for RubyVM::InstructionSequence.load_from_binaryJean Boussier
Notes: Merged: https://github.com/ruby/ruby/pull/4119
2021-03-10proc.c: make bind_call use existing callable method entry when possibleJean Boussier
The most common use case for `bind_call` is to protect from core methods being redefined, for instance a typical use: ```ruby UNBOUND_METHOD_MODULE_NAME = Module.instance_method(:name) def real_mod_name(mod) UNBOUND_METHOD_MODULE_NAME.bind_call(mod) end ``` But it's extremely common that the method wasn't actually redefined. In such case we can avoid creating a new callable method entry, and simply delegate to the receiver. This result in a 1.5-2X speed-up for the fast path, and little to no impact on the slowpath: ``` compare-ruby: ruby 3.1.0dev (2021-02-05T06:33:00Z master b2674c1fd7) [x86_64-darwin19] built-ruby: ruby 3.1.0dev (2021-02-15T10:35:17Z bind-call-fastpath d687e06615) [x86_64-darwin19] | |compare-ruby|built-ruby| |:---------|-----------:|---------:| |fastpath | 11.325M| 16.393M| | | -| 1.45x| |slowpath | 10.488M| 10.242M| | | 1.02x| -| ``` Notes: Merged: https://github.com/ruby/ruby/pull/4188
2021-02-19Improve performance some Numeric methods [Feature #17632] (#4190)S.H
Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2021-02-11Do not run File.write while Ractors are runningTakashi Kokubun
also make sure all local variables have the __bmdv_ prefix.
2021-02-10Add a benchmark-driver runner for Ractor (#4172)Takashi Kokubun
* Add a benchmark-driver runner for Ractor * Process.clock_gettime(Process:CLOCK_MONOTONIC) could be slow in Ruby 3.0 Ractor * Fetching Time could also be slow * Fix a comment * Assert overriding a private method Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2021-02-10Simple benchmark of Float#to_sNobuyoshi Nakada
2021-02-08Improve performance Float#positive? and Float#negative? [Feature #17614] (#4160)S.H
Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2021-01-13Rename RubyVM::MJIT to RubyVM::JITTakashi Kokubun
because the name "MJIT" is an internal code name, it's inconsistent with --jit while they are related to each other, and I want to discourage future JIT implementation-specific (e.g. MJIT-specific) APIs by this rename. [Feature #17490]