<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/benchmark, branch ruby_3_0</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>Allow inlining Integer#-@ and #~</title>
<updated>2020-12-23T06:32:19+00:00</updated>
<author>
<name>Takashi Kokubun</name>
<email>takashikkbn@gmail.com</email>
</author>
<published>2020-12-23T06:23:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=dbb4f1996939d0ce977e6b37579e28dd886428ff'/>
<id>dbb4f1996939d0ce977e6b37579e28dd886428ff</id>
<content type='text'>
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_integer.yml --filter '(comp|uminus)'
before --jit: ruby 3.0.0dev (2020-12-23T05:41:44Z master 0dd4896175) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-12-23T06:25:41Z master 8887d78992) +JIT [x86_64-linux]
last_commit=Allow inlining Integer#-@ and #~
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_comp(1)      44.006M      70.417M i/s -     40.000M times in 0.908967s 0.568042s
      mjit_uminus(1)      44.333M      68.422M i/s -     40.000M times in 0.902255s 0.584603s

Comparison:
                     mjit_comp(1)
         after --jit:  70417331.4 i/s
        before --jit:  44005980.4 i/s - 1.60x  slower

                   mjit_uminus(1)
         after --jit:  68422468.8 i/s
        before --jit:  44333371.0 i/s - 1.54x  slower
```
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_integer.yml --filter '(comp|uminus)'
before --jit: ruby 3.0.0dev (2020-12-23T05:41:44Z master 0dd4896175) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-12-23T06:25:41Z master 8887d78992) +JIT [x86_64-linux]
last_commit=Allow inlining Integer#-@ and #~
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_comp(1)      44.006M      70.417M i/s -     40.000M times in 0.908967s 0.568042s
      mjit_uminus(1)      44.333M      68.422M i/s -     40.000M times in 0.902255s 0.584603s

Comparison:
                     mjit_comp(1)
         after --jit:  70417331.4 i/s
        before --jit:  44005980.4 i/s - 1.60x  slower

                   mjit_uminus(1)
         after --jit:  68422468.8 i/s
        before --jit:  44333371.0 i/s - 1.54x  slower
```
</pre>
</div>
</content>
</entry>
<entry>
<title>fix duplicated name</title>
<updated>2020-12-16T01:29:48+00:00</updated>
<author>
<name>Koichi Sasada</name>
<email>ko1@atdot.net</email>
</author>
<published>2020-12-16T01:29:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=dff67c809f786c94933583a450dbd9398fa07039'/>
<id>dff67c809f786c94933583a450dbd9398fa07039</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Guard all accesses to RubyVM::MJIT with defined?(RubyVM::MJIT) &amp;&amp;</title>
<updated>2020-12-04T15:45:54+00:00</updated>
<author>
<name>Benoit Daloze</name>
<email>eregontp@gmail.com</email>
</author>
<published>2020-12-04T15:40:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=b4ec4a41c24105efbb43f9b70ca7f36d22f98294'/>
<id>b4ec4a41c24105efbb43f9b70ca7f36d22f98294</id>
<content type='text'>
* Otherwise those tests, etc cannot run on alternative Ruby implementations.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Otherwise those tests, etc cannot run on alternative Ruby implementations.
</pre>
</div>
</content>
</entry>
<entry>
<title>Set allocator on class creation</title>
<updated>2020-11-16T22:41:40+00:00</updated>
<author>
<name>Alan Wu</name>
<email>XrXr@users.noreply.github.com</email>
</author>
<published>2020-11-12T20:15:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=68ffc8db088a7b29613a3746be5cc996be9f66fe'/>
<id>68ffc8db088a7b29613a3746be5cc996be9f66fe</id>
<content type='text'>
Allocating an instance of a class uses the allocator for the class. When
the class has no allocator set, Ruby looks for it in the super class
(see rb_get_alloc_func()).

It's uncommon for classes created from Ruby code to ever have an
allocator set, so it's common during the allocation process to search
all the way to BasicObject from the class with which the allocation is
being performed. This makes creating instances of classes that have
long ancestry chains more expensive than creating instances of classes
have that shorter ancestry chains.

Setting the allocator at class creation time removes the need to perform
a search for the alloctor during allocation.

This is a breaking change for C-extensions that assume that classes
created from Ruby code have no allocator set. Libraries that setup a
class hierarchy in Ruby code and then set the allocator on some parent
class, for example, can experience breakage. This seems like an unusual
use case and hopefully it is rare or non-existent in practice.

Rails has many classes that have upwards of 60 elements in the ancestry
chain and benchmark shows a significant improvement for allocating with
a class that includes 64 modules.

```
pre: ruby 3.0.0dev (2020-11-12T14:39:27Z master 6325866421)
post: ruby 3.0.0dev (2020-11-12T20:15:30Z cut-allocator-lookup)

Comparison:
                  allocate_8_deep
                post:  10336985.6 i/s
                 pre:   8691873.1 i/s - 1.19x  slower

                 allocate_32_deep
                post:  10423181.2 i/s
                 pre:   6264879.1 i/s - 1.66x  slower

                 allocate_64_deep
                post:  10541851.2 i/s
                 pre:   4936321.5 i/s - 2.14x  slower

                allocate_128_deep
                post:  10451505.0 i/s
                 pre:   3031313.5 i/s - 3.45x  slower
```
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Allocating an instance of a class uses the allocator for the class. When
the class has no allocator set, Ruby looks for it in the super class
(see rb_get_alloc_func()).

It's uncommon for classes created from Ruby code to ever have an
allocator set, so it's common during the allocation process to search
all the way to BasicObject from the class with which the allocation is
being performed. This makes creating instances of classes that have
long ancestry chains more expensive than creating instances of classes
have that shorter ancestry chains.

Setting the allocator at class creation time removes the need to perform
a search for the alloctor during allocation.

This is a breaking change for C-extensions that assume that classes
created from Ruby code have no allocator set. Libraries that setup a
class hierarchy in Ruby code and then set the allocator on some parent
class, for example, can experience breakage. This seems like an unusual
use case and hopefully it is rare or non-existent in practice.

Rails has many classes that have upwards of 60 elements in the ancestry
chain and benchmark shows a significant improvement for allocating with
a class that includes 64 modules.

```
pre: ruby 3.0.0dev (2020-11-12T14:39:27Z master 6325866421)
post: ruby 3.0.0dev (2020-11-12T20:15:30Z cut-allocator-lookup)

Comparison:
                  allocate_8_deep
                post:  10336985.6 i/s
                 pre:   8691873.1 i/s - 1.19x  slower

                 allocate_32_deep
                post:  10423181.2 i/s
                 pre:   6264879.1 i/s - 1.66x  slower

                 allocate_64_deep
                post:  10541851.2 i/s
                 pre:   4936321.5 i/s - 2.14x  slower

                allocate_128_deep
                post:  10451505.0 i/s
                 pre:   3031313.5 i/s - 3.45x  slower
```
</pre>
</div>
</content>
</entry>
<entry>
<title>Add a benchmark for polymorphic ivar setting</title>
<updated>2020-11-09T22:05:41+00:00</updated>
<author>
<name>Aaron Patterson</name>
<email>tenderlove@ruby-lang.org</email>
</author>
<published>2020-11-09T20:02:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=d7581370fd7cef8743c317a1a119215cf064bb73'/>
<id>d7581370fd7cef8743c317a1a119215cf064bb73</id>
<content type='text'>
This benchmark demonstrates the performance of setting an instance
variable when the type of object is constantly changing.  This benchmark
should give us an idea of the performance of ivar setting in a
polymorphic environment
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This benchmark demonstrates the performance of setting an instance
variable when the type of object is constantly changing.  This benchmark
should give us an idea of the performance of ivar setting in a
polymorphic environment
</pre>
</div>
</content>
</entry>
<entry>
<title>eagerly initialize ivar table when index is small enough</title>
<updated>2020-11-09T17:44:16+00:00</updated>
<author>
<name>Aaron Patterson</name>
<email>tenderlove@ruby-lang.org</email>
</author>
<published>2020-11-06T18:11:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=eb229994e5b53e201e776ea103970751d3b1725b'/>
<id>eb229994e5b53e201e776ea103970751d3b1725b</id>
<content type='text'>
When the inline cache is written, the iv table will contain an entry for
the instance variable.  If we get an inline cache hit, then we know the
iv table must contain a value for the index written to the inline cache.

If the index in the inline cache is larger than the list on the object,
but *smaller* than the iv index table on the class, then we can just
eagerly allocate the iv list to be the same size as the iv index table.

This avoids duplicate work of checking frozen as well as looking up the
index for the particular instance variable name.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the inline cache is written, the iv table will contain an entry for
the instance variable.  If we get an inline cache hit, then we know the
iv table must contain a value for the index written to the inline cache.

If the index in the inline cache is larger than the list on the object,
but *smaller* than the iv index table on the class, then we can just
eagerly allocate the iv list to be the same size as the iv index table.

This avoids duplicate work of checking frozen as well as looking up the
index for the particular instance variable name.
</pre>
</div>
</content>
</entry>
<entry>
<title>Added benchmark of vm_send by variable [ci skip]</title>
<updated>2020-10-28T00:47:46+00:00</updated>
<author>
<name>Nobuyoshi Nakada</name>
<email>nobu@ruby-lang.org</email>
</author>
<published>2020-10-27T23:26:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=8f9c113f354f6d5d1a430d47ca0cb1c1438bee93'/>
<id>8f9c113f354f6d5d1a430d47ca0cb1c1438bee93</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Improve the performance of super</title>
<updated>2020-09-23T18:52:36+00:00</updated>
<author>
<name>eileencodes</name>
<email>eileencodes@gmail.com</email>
</author>
<published>2020-08-11T17:22:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=637d1cc0c051132e8562b77d8faeb6c099011dfa'/>
<id>637d1cc0c051132e8562b77d8faeb6c099011dfa</id>
<content type='text'>
This PR improves the performance of `super` calls. While working on some
Rails optimizations jhawthorn discovered that `super` calls were slower
than expected.

The changes here do the following:

1) Adds a check for whether the call frame is not equal to the method
entry iseq. This avoids the `rb_obj_is_kind_of` check on the next line
which is quite slow. If the current call frame is equal to the method
entry we know we can't have an instance eval, etc.
2) Changes `FL_TEST` to `FL_TEST_RAW`. This is safe because we've
already done the check for `T_ICLASS` above.
3) Adds a benchmark for `T_ICLASS` super calls.
4) Note: makes a chage for `method_entry_cref` to use `const`.

On master the benchmarks showed that `super` is 1.76x slower. Our
changes improved the performance so that it is now only 1.36x slower.

Benchmark IPS:

```
Warming up --------------------------------------
               super   244.918k i/100ms
         method call   383.007k i/100ms
Calculating -------------------------------------
               super      2.280M (± 6.7%) i/s -     11.511M in   5.071758s
         method call      3.834M (± 4.9%) i/s -     19.150M in   5.008444s

Comparison:
         method call:  3833648.3 i/s
               super:  2279837.9 i/s - 1.68x  (± 0.00) slower
```

With changes:

```
Warming up --------------------------------------
               super   308.777k i/100ms
         method call   375.051k i/100ms
Calculating -------------------------------------
               super      2.951M (± 5.4%) i/s -     14.821M in   5.039592s
         method call      3.551M (± 4.9%) i/s -     18.002M in   5.081695s

Comparison:
         method call:  3551372.7 i/s
               super:  2950557.9 i/s - 1.20x  (± 0.00) slower
```

Ruby VM benchmarks also showed an improvement:

Existing `vm_super` benchmark`.

```
$ make benchmark ITEM=vm_super

|          |compare-ruby|built-ruby|
|:---------|-----------:|---------:|
|vm_super  |     21.555M|   37.819M|
|          |           -|     1.75x|
```

New `vm_iclass_super` benchmark:

```
$ make benchmark ITEM=vm_iclass_super

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|vm_iclass_super  |      1.669M|    3.683M|
|                 |           -|     2.21x|
```

This is the benchmark script used for the benchmark-ips benchmarks:

```ruby
require "benchmark/ips"

class Foo
  def zuper; end
  def top; end

  last_method = "top"

  ("A".."M").each do |module_name|
    eval &lt;&lt;-EOM
    module #{module_name}
      def zuper; super; end
      def #{module_name.downcase}
        #{last_method}
      end
    end
    prepend #{module_name}
    EOM
    last_method = module_name.downcase
  end
end

foo = Foo.new

Benchmark.ips do |x|
  x.report "super" do
    foo.zuper
  end

  x.report "method call" do
    foo.m
  end

  x.compare!
end
```

Co-authored-by: Aaron Patterson &lt;tenderlove@ruby-lang.org&gt;
Co-authored-by: John Hawthorn &lt;john@hawthorn.email&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This PR improves the performance of `super` calls. While working on some
Rails optimizations jhawthorn discovered that `super` calls were slower
than expected.

The changes here do the following:

1) Adds a check for whether the call frame is not equal to the method
entry iseq. This avoids the `rb_obj_is_kind_of` check on the next line
which is quite slow. If the current call frame is equal to the method
entry we know we can't have an instance eval, etc.
2) Changes `FL_TEST` to `FL_TEST_RAW`. This is safe because we've
already done the check for `T_ICLASS` above.
3) Adds a benchmark for `T_ICLASS` super calls.
4) Note: makes a chage for `method_entry_cref` to use `const`.

On master the benchmarks showed that `super` is 1.76x slower. Our
changes improved the performance so that it is now only 1.36x slower.

Benchmark IPS:

```
Warming up --------------------------------------
               super   244.918k i/100ms
         method call   383.007k i/100ms
Calculating -------------------------------------
               super      2.280M (± 6.7%) i/s -     11.511M in   5.071758s
         method call      3.834M (± 4.9%) i/s -     19.150M in   5.008444s

Comparison:
         method call:  3833648.3 i/s
               super:  2279837.9 i/s - 1.68x  (± 0.00) slower
```

With changes:

```
Warming up --------------------------------------
               super   308.777k i/100ms
         method call   375.051k i/100ms
Calculating -------------------------------------
               super      2.951M (± 5.4%) i/s -     14.821M in   5.039592s
         method call      3.551M (± 4.9%) i/s -     18.002M in   5.081695s

Comparison:
         method call:  3551372.7 i/s
               super:  2950557.9 i/s - 1.20x  (± 0.00) slower
```

Ruby VM benchmarks also showed an improvement:

Existing `vm_super` benchmark`.

```
$ make benchmark ITEM=vm_super

|          |compare-ruby|built-ruby|
|:---------|-----------:|---------:|
|vm_super  |     21.555M|   37.819M|
|          |           -|     1.75x|
```

New `vm_iclass_super` benchmark:

```
$ make benchmark ITEM=vm_iclass_super

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|vm_iclass_super  |      1.669M|    3.683M|
|                 |           -|     2.21x|
```

This is the benchmark script used for the benchmark-ips benchmarks:

```ruby
require "benchmark/ips"

class Foo
  def zuper; end
  def top; end

  last_method = "top"

  ("A".."M").each do |module_name|
    eval &lt;&lt;-EOM
    module #{module_name}
      def zuper; super; end
      def #{module_name.downcase}
        #{last_method}
      end
    end
    prepend #{module_name}
    EOM
    last_method = module_name.downcase
  end
end

foo = Foo.new

Benchmark.ips do |x|
  x.report "super" do
    foo.zuper
  end

  x.report "method call" do
    foo.m
  end

  x.compare!
end
```

Co-authored-by: Aaron Patterson &lt;tenderlove@ruby-lang.org&gt;
Co-authored-by: John Hawthorn &lt;john@hawthorn.email&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Optimize ObjectSpace.dump_all</title>
<updated>2020-09-09T18:11:36+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>jean.boussier@gmail.com</email>
</author>
<published>2020-08-18T07:41:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=5001cc47169614ea07d87651c95c2ee185e374e0'/>
<id>5001cc47169614ea07d87651c95c2ee185e374e0</id>
<content type='text'>
The two main optimization are:
  - buffer writes for improved performance
  - avoid formatting functions when possible

```

|                   |compare-ruby|built-ruby|
|:------------------|-----------:|---------:|
|dump_all_string    |       1.038|   195.925|
|                   |           -|   188.77x|
|dump_all_file      |      33.453|   139.645|
|                   |           -|     4.17x|
|dump_all_dev_null  |      44.030|   278.552|
|                   |           -|     6.33x|
```
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The two main optimization are:
  - buffer writes for improved performance
  - avoid formatting functions when possible

```

|                   |compare-ruby|built-ruby|
|:------------------|-----------:|---------:|
|dump_all_string    |       1.038|   195.925|
|                   |           -|   188.77x|
|dump_all_file      |      33.453|   139.645|
|                   |           -|     4.17x|
|dump_all_dev_null  |      44.030|   278.552|
|                   |           -|     6.33x|
```
</pre>
</div>
</content>
</entry>
<entry>
<title>Improved Enumerable::Lazy#zip</title>
<updated>2020-07-23T07:57:26+00:00</updated>
<author>
<name>Nobuyoshi Nakada</name>
<email>nobu@ruby-lang.org</email>
</author>
<published>2020-07-22T00:52:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=54acb3dd52a5fe75c32c3e36a6984d3ec4314a72'/>
<id>54acb3dd52a5fe75c32c3e36a6984d3ec4314a72</id>
<content type='text'>
|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|first_ary           |    290.514k|  296.331k|
|                    |           -|     1.02x|
|first_nonary        |    166.954k|  169.178k|
|                    |           -|     1.01x|
|first_noarg         |    299.547k|  305.358k|
|                    |           -|     1.02x|
|take3_ary           |    129.388k|  188.360k|
|                    |           -|     1.46x|
|take3_nonary        |     90.684k|  112.688k|
|                    |           -|     1.24x|
|take3_noarg         |    131.940k|  189.471k|
|                    |           -|     1.44x|
|chain-first_ary     |    195.913k|  286.194k|
|                    |           -|     1.46x|
|chain-first_nonary  |    127.483k|  168.716k|
|                    |           -|     1.32x|
|chain-first_noarg   |    201.252k|  298.562k|
|                    |           -|     1.48x|
|chain-take3_ary     |    101.189k|  183.188k|
|                    |           -|     1.81x|
|chain-take3_nonary  |     75.381k|  112.301k|
|                    |           -|     1.49x|
|chain-take3_noarg   |    101.483k|  192.148k|
|                    |           -|     1.89x|
|block               |    296.696k|  292.877k|
|                    |       1.01x|         -|
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|first_ary           |    290.514k|  296.331k|
|                    |           -|     1.02x|
|first_nonary        |    166.954k|  169.178k|
|                    |           -|     1.01x|
|first_noarg         |    299.547k|  305.358k|
|                    |           -|     1.02x|
|take3_ary           |    129.388k|  188.360k|
|                    |           -|     1.46x|
|take3_nonary        |     90.684k|  112.688k|
|                    |           -|     1.24x|
|take3_noarg         |    131.940k|  189.471k|
|                    |           -|     1.44x|
|chain-first_ary     |    195.913k|  286.194k|
|                    |           -|     1.46x|
|chain-first_nonary  |    127.483k|  168.716k|
|                    |           -|     1.32x|
|chain-first_noarg   |    201.252k|  298.562k|
|                    |           -|     1.48x|
|chain-take3_ary     |    101.189k|  183.188k|
|                    |           -|     1.81x|
|chain-take3_nonary  |     75.381k|  112.301k|
|                    |           -|     1.49x|
|chain-take3_noarg   |    101.483k|  192.148k|
|                    |           -|     1.89x|
|block               |    296.696k|  292.877k|
|                    |       1.01x|         -|
</pre>
</div>
</content>
</entry>
</feed>
