ruby.git/benchmark, branch ruby_3_0

Allow inlining Integer#-@ and #~

2020-12-23T06:32:19+00:00

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_integer.yml --filter '(comp|uminus)'
before --jit: ruby 3.0.0dev (2020-12-23T05:41:44Z master 0dd4896175) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-12-23T06:25:41Z master 8887d78992) +JIT [x86_64-linux]
last_commit=Allow inlining Integer#-@ and #~
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_comp(1)      44.006M      70.417M i/s -     40.000M times in 0.908967s 0.568042s
      mjit_uminus(1)      44.333M      68.422M i/s -     40.000M times in 0.902255s 0.584603s

Comparison:
                     mjit_comp(1)
         after --jit:  70417331.4 i/s
        before --jit:  44005980.4 i/s - 1.60x  slower

                   mjit_uminus(1)
         after --jit:  68422468.8 i/s
        before --jit:  44333371.0 i/s - 1.54x  slower
```

fix duplicated name

2020-12-16T01:29:48+00:00

Guard all accesses to RubyVM::MJIT with defined?(RubyVM::MJIT) &&

2020-12-04T15:45:54+00:00

* Otherwise those tests, etc. cannot run on alternative Ruby implementations.
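
As an illustration of the guard pattern named in the subject line, here is a minimal sketch. The helper name `mjit_enabled?` is hypothetical and not taken from the commit; `RubyVM::MJIT.enabled?` is only called when the constant exists.

```ruby
# Hypothetical helper (not part of the commit): reports whether MJIT is
# present and enabled without raising NameError on other implementations.
def mjit_enabled?
  # `defined?` returns nil when RubyVM or RubyVM::MJIT does not exist,
  # so the right-hand side is never evaluated in that case.
  !!(defined?(RubyVM::MJIT) && RubyVM::MJIT.enabled?)
end

puts "MJIT enabled: #{mjit_enabled?}"
```

Because the check short-circuits, the same test file can load on implementations that do not define `RubyVM::MJIT` at all.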

Set allocator on class creation

2020-11-16T22:41:40+00:00

Allocating an instance of a class uses the allocator for the class. When
the class has no allocator set, Ruby looks for it in the super class
(see rb_get_alloc_func()).

It's uncommon for classes created from Ruby code to ever have an
allocator set, so it's common during the allocation process to search
all the way to BasicObject from the class with which the allocation is
being performed. This makes creating instances of classes that have
long ancestry chains more expensive than creating instances of classes
that have shorter ancestry chains.

Setting the allocator at class creation time removes the need to perform
a search for the allocator during allocation.

This is a breaking change for C-extensions that assume that classes
created from Ruby code have no allocator set. Libraries that set up a
class hierarchy in Ruby code and then set the allocator on some parent
class, for example, can experience breakage. This seems like an unusual
use case and hopefully it is rare or non-existent in practice.

Rails has many classes that have upwards of 60 elements in the ancestry
chain, and the benchmark shows a significant improvement for allocating
with a class that includes 64 modules.

```
pre: ruby 3.0.0dev (2020-11-12T14:39:27Z master 6325866421)
post: ruby 3.0.0dev (2020-11-12T20:15:30Z cut-allocator-lookup)

Comparison:
                  allocate_8_deep
                post:  10336985.6 i/s
                 pre:   8691873.1 i/s - 1.19x  slower

                 allocate_32_deep
                post:  10423181.2 i/s
                 pre:   6264879.1 i/s - 1.66x  slower

                 allocate_64_deep
                post:  10541851.2 i/s
                 pre:   4936321.5 i/s - 2.14x  slower

                allocate_128_deep
                post:  10451505.0 i/s
                 pre:   3031313.5 i/s - 3.45x  slower
```
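
The shape of such a measurement can be sketched as follows. This is not the benchmark file from the commit: the helper `class_with_ancestry`, the iteration count, and the use of the stdlib `Benchmark` module are assumptions made for illustration.

```ruby
require "benchmark"

# Hypothetical helper: builds an anonymous class that includes `depth`
# empty modules, lengthening its ancestry chain by `depth` entries.
def class_with_ancestry(depth)
  Class.new do
    depth.times { include Module.new }
  end
end

shallow = class_with_ancestry(8)
deep    = class_with_ancestry(128)

puts "shallow ancestors: #{shallow.ancestors.size}"
puts "deep ancestors:    #{deep.ancestors.size}"

iterations = 200_000
Benchmark.bm(12) do |x|
  x.report("8 modules")   { iterations.times { shallow.new } }
  x.report("128 modules") { iterations.times { deep.new } }
end
```

On a Ruby that sets the allocator at class creation, the two timings should be close; on one that searches the ancestry chain, the deeper class is expected to be slower.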

Add a benchmark for polymorphic ivar setting

2020-11-09T22:05:41+00:00

This benchmark demonstrates the performance of setting an instance
variable when the type of object is constantly changing. It should give
us an idea of the performance of ivar setting in a polymorphic
environment.
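
A minimal sketch of what "polymorphic ivar setting" can look like is shown below. The class and module names are hypothetical and this is not the benchmark added by the commit; the point is only that a single `@c = value` call site sees receivers of alternating classes.

```ruby
# Two hypothetical classes that define their instance variables in a
# different order.
class ShapeA
  def initialize
    @a = 1
    @b = 2
    @c = 3
  end
end

class ShapeB
  def initialize
    @c = 1
    @b = 2
    @a = 3
  end
end

# One shared method, so there is exactly one `@c =` call site.
module SetC
  def set_c(value)
    @c = value
  end
end

[ShapeA, ShapeB].each { |klass| klass.include(SetC) }

objects = [ShapeA.new, ShapeB.new]

# Alternate the receiver's class on every call.
100_000.times { |i| objects[i % 2].set_c(i) }

puts objects.map { |o| o.instance_variable_get(:@c) }.inspect
```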

eagerly initialize ivar table when index is small enough

2020-11-09T17:44:16+00:00

When the inline cache is written, the iv table will contain an entry for
the instance variable.  If we get an inline cache hit, then we know the
iv table must contain a value for the index written to the inline cache.

If the index in the inline cache is larger than the list on the object,
but *smaller* than the iv index table on the class, then we can just
eagerly allocate the iv list to be the same size as the iv index table.

This avoids the duplicate work of checking whether the object is frozen
and of looking up the index for the particular instance variable name.
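
One way to reach the situation described above from Ruby code is sketched here. The class `Point` is hypothetical, and the comments about the inline cache and table sizes restate the commit message rather than anything observable from Ruby; only the final state of the objects can be checked.

```ruby
class Point
  def initialize
    @x = 0
    @y = 0
    @z = 0
  end

  # Single call site for the `@z =` write.
  def z=(value)
    @z = value
  end
end

# A fully initialized object runs the call site first; per the commit
# message, this is when the inline cache for `@z =` gets written.
warm = Point.new
warm.z = 1

# `allocate` skips `initialize`, so this object has no instance variables
# yet, while the class already knows about @x, @y and @z.
bare = Point.allocate
bare.z = 2

puts bare.instance_variables.inspect
```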

Added benchmark of vm_send by variable [ci skip]

2020-10-28T00:47:46+00:00

Improve the performance of super

2020-09-23T18:52:36+00:00

This PR improves the performance of `super` calls. While working on some
Rails optimizations jhawthorn discovered that `super` calls were slower
than expected.

The changes here do the following:

1) Adds a check for whether the call frame is not equal to the method
entry iseq. This avoids the `rb_obj_is_kind_of` check on the next line
which is quite slow. If the current call frame is equal to the method
entry we know we can't have an instance eval, etc.
2) Changes `FL_TEST` to `FL_TEST_RAW`. This is safe because we've
already done the check for `T_ICLASS` above.
3) Adds a benchmark for `T_ICLASS` super calls.
4) Note: makes a change for `method_entry_cref` to use `const`.

On master the benchmarks showed that `super` is 1.76x slower. Our
changes improved the performance so that it is now only 1.36x slower.

Benchmark IPS:

```
Warming up --------------------------------------
               super   244.918k i/100ms
         method call   383.007k i/100ms
Calculating -------------------------------------
               super      2.280M (± 6.7%) i/s -     11.511M in   5.071758s
         method call      3.834M (± 4.9%) i/s -     19.150M in   5.008444s

Comparison:
         method call:  3833648.3 i/s
               super:  2279837.9 i/s - 1.68x  (± 0.00) slower
```

With changes:

```
Warming up --------------------------------------
               super   308.777k i/100ms
         method call   375.051k i/100ms
Calculating -------------------------------------
               super      2.951M (± 5.4%) i/s -     14.821M in   5.039592s
         method call      3.551M (± 4.9%) i/s -     18.002M in   5.081695s

Comparison:
         method call:  3551372.7 i/s
               super:  2950557.9 i/s - 1.20x  (± 0.00) slower
```

Ruby VM benchmarks also showed an improvement:

Existing `vm_super` benchmark:

```
$ make benchmark ITEM=vm_super

|          |compare-ruby|built-ruby|
|:---------|-----------:|---------:|
|vm_super  |     21.555M|   37.819M|
|          |           -|     1.75x|
```

New `vm_iclass_super` benchmark:

```
$ make benchmark ITEM=vm_iclass_super

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|vm_iclass_super  |      1.669M|    3.683M|
|                 |           -|     2.21x|
```
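
For readers unfamiliar with the `T_ICLASS` case, the following minimal sketch shows a `super` call made from a method defined in an included module. The names `Loud`, `Base`, and `Child` are hypothetical, and this is not the `vm_iclass_super` benchmark itself.

```ruby
module Loud
  def greet
    # This `super` is issued from a module method. In CRuby the module is
    # linked into Child's ancestry through an include class (T_ICLASS).
    super.upcase
  end
end

class Base
  def greet
    "hello"
  end
end

class Child < Base
  include Loud
end

puts Child.ancestors.first(3).inspect
puts Child.new.greet
```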

This is the benchmark script used for the benchmark-ips benchmarks:

```ruby
require "benchmark/ips"

class Foo
  def zuper; end
  def top; end

  last_method = "top"

  ("A".."M").each do |module_name|
    eval <<-EOM
    module #{module_name}
      def zuper; super; end
      def #{module_name.downcase}
        #{last_method}
      end
    end
    prepend #{module_name}
    EOM
    last_method = module_name.downcase
  end
end

foo = Foo.new

Benchmark.ips do |x|
  x.report "super" do
    foo.zuper
  end

  x.report "method call" do
    foo.m
  end

  x.compare!
end
```

Co-authored-by: Aaron Patterson 
Co-authored-by: John Hawthorn

Optimize ObjectSpace.dump_all

2020-09-09T18:11:36+00:00

The two main optimizations are:
  - buffer writes for improved performance
  - avoid formatting functions when possible

```
|                   |compare-ruby|built-ruby|
|:------------------|-----------:|---------:|
|dump_all_string    |       1.038|   195.925|
|                   |           -|   188.77x|
|dump_all_file      |      33.453|   139.645|
|                   |           -|     4.17x|
|dump_all_dev_null  |      44.030|   278.552|
|                   |           -|     6.33x|
```
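
The three benchmark names suggest three output targets. A hedged sketch of the string and null-device cases follows; it is not the benchmark file from the commit, and the variable names are illustrative. `ObjectSpace.dump_all` accepts `output: :string` or an IO object.

```ruby
require "objspace"

# Dump the whole heap into a String, one JSON document per line.
json = ObjectSpace.dump_all(output: :string)
line_count = json.count("\n")
puts "string dump: #{json.bytesize} bytes, #{line_count} lines"

# Dump to an IO instead. File::NULL is the platform's null device,
# so only the cost of producing the dump is paid, not of storing it.
File.open(File::NULL, "w") do |io|
  ObjectSpace.dump_all(output: io)
end
```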

Improved Enumerable::Lazy#zip

2020-07-23T07:57:26+00:00

|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|first_ary           |    290.514k|  296.331k|
|                    |           -|     1.02x|
|first_nonary        |    166.954k|  169.178k|
|                    |           -|     1.01x|
|first_noarg         |    299.547k|  305.358k|
|                    |           -|     1.02x|
|take3_ary           |    129.388k|  188.360k|
|                    |           -|     1.46x|
|take3_nonary        |     90.684k|  112.688k|
|                    |           -|     1.24x|
|take3_noarg         |    131.940k|  189.471k|
|                    |           -|     1.44x|
|chain-first_ary     |    195.913k|  286.194k|
|                    |           -|     1.46x|
|chain-first_nonary  |    127.483k|  168.716k|
|                    |           -|     1.32x|
|chain-first_noarg   |    201.252k|  298.562k|
|                    |           -|     1.48x|
|chain-take3_ary     |    101.189k|  183.188k|
|                    |           -|     1.81x|
|chain-take3_nonary  |     75.381k|  112.301k|
|                    |           -|     1.49x|
|chain-take3_noarg   |    101.483k|  192.148k|
|                    |           -|     1.89x|
|block               |    296.696k|  292.877k|
|                    |       1.01x|         -|
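
The row names appear to combine an operation (`first`, `take3`, optionally after a chained step) with the kind of argument passed to `zip`. The sketch below shows one plausible reading; the variable names and the exact inputs are assumptions, not the benchmark source.

```ruby
ary  = [10, 20, 30]
lazy = (1..Float::INFINITY).lazy

first_ary    = lazy.zip(ary).first          # Array argument
first_nonary = lazy.zip(ary.each).first     # non-Array (Enumerator) argument
first_noarg  = lazy.zip.first               # no argument
take3_ary    = lazy.zip(ary).take(3).to_a   # force three zipped elements

# "chain" is assumed here to mean zipping after another lazy step.
chain_first  = lazy.map { |x| x * 2 }.zip(ary).first

p first_ary, first_nonary, first_noarg, take3_ary, chain_first
```

Because the receiver is lazy, each expression only evaluates as many elements as `first` or `take(3)` requests, even though the underlying range is infinite.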