<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/yjit/src/cruby.rs, branch v3_4_9</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>YJIT: Abort expandarray optimization if method_missing is defined</title>
<updated>2025-12-01T17:45:35+00:00</updated>
<author>
<name>Randy Stauner</name>
<email>randy@r4s6.net</email>
</author>
<published>2025-11-26T02:29:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=28b2e2ede97b569c2f6121e7808975fbd5c6f298'/>
<id>28b2e2ede97b569c2f6121e7808975fbd5c6f298</id>
<content type='text'>
Fixes: [Bug #21707]
[AW: rewrote comments]
Co-authored-by: Alan Wu &lt;alanwu@ruby-lang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes: [Bug #21707]
[AW: rewrote comments]
Co-authored-by: Alan Wu &lt;alanwu@ruby-lang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Generate specialized code for Symbol for objtostring (#12247)</title>
<updated>2024-12-04T21:34:16+00:00</updated>
<author>
<name>Maximillian Polhill</name>
<email>xodene@github.com</email>
</author>
<published>2024-12-04T21:34:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=1c4dbb133e8de0e2f194e659e8d3d47171e32643'/>
<id>1c4dbb133e8de0e2f194e659e8d3d47171e32643</id>
<content type='text'>
* YJIT: Generate specialized code for Symbol for objtostring

Co-authored-by: John Hawthorn &lt;john@hawthorn.email&gt;

* Update yjit/src/codegen.rs

---------

Co-authored-by: John Hawthorn &lt;john@hawthorn.email&gt;
Co-authored-by: Maxime Chevalier-Boisvert &lt;maximechevalierb@gmail.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* YJIT: Generate specialized code for Symbol for objtostring

Co-authored-by: John Hawthorn &lt;john@hawthorn.email&gt;

* Update yjit/src/codegen.rs

---------

Co-authored-by: John Hawthorn &lt;john@hawthorn.email&gt;
Co-authored-by: Maxime Chevalier-Boisvert &lt;maximechevalierb@gmail.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>Optimize instructions when creating an array just to call `include?` (#12123)</title>
<updated>2024-11-26T19:31:08+00:00</updated>
<author>
<name>Randy Stauner</name>
<email>randy.stauner@shopify.com</email>
</author>
<published>2024-11-26T19:31:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=1dd40ec18a55ff46f52d0ba44ff5d7923f57c08f'/>
<id>1dd40ec18a55ff46f52d0ba44ff5d7923f57c08f</id>
<content type='text'>
* Add opt_duparray_send insn to skip the allocation on `#include?`

If the method isn't going to modify the array we don't need to copy it.
This avoids the allocation / array copy for things like `[:a, :b].include?(x)`.

This adds a BOP for include? and tracks redefinition for it on Array.

Co-authored-by: Andrew Novoselac &lt;andrew.novoselac@shopify.com&gt;

* YJIT: Implement opt_duparray_send include_p

Co-authored-by: Andrew Novoselac &lt;andrew.novoselac@shopify.com&gt;

* Update opt_newarray_send to support simple forms of include?(arg)

Similar to opt_duparray_send but for non-static arrays.

* YJIT: Implement opt_newarray_send include_p

---------

Co-authored-by: Andrew Novoselac &lt;andrew.novoselac@shopify.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Add opt_duparray_send insn to skip the allocation on `#include?`

If the method isn't going to modify the array we don't need to copy it.
This avoids the allocation / array copy for things like `[:a, :b].include?(x)`.

This adds a BOP for include? and tracks redefinition for it on Array.

Co-authored-by: Andrew Novoselac &lt;andrew.novoselac@shopify.com&gt;

* YJIT: Implement opt_duparray_send include_p

Co-authored-by: Andrew Novoselac &lt;andrew.novoselac@shopify.com&gt;

* Update opt_newarray_send to support simple forms of include?(arg)

Similar to opt_duparray_send but for non-static arrays.

* YJIT: Implement opt_newarray_send include_p

---------

Co-authored-by: Andrew Novoselac &lt;andrew.novoselac@shopify.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Add compilation log (#11818)</title>
<updated>2024-10-17T21:36:43+00:00</updated>
<author>
<name>Kevin Menard</name>
<email>kevin@nirvdrum.com</email>
</author>
<published>2024-10-17T21:36:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=158b8cb52ec58c8ef8f5241a9db1c6dea4285253'/>
<id>158b8cb52ec58c8ef8f5241a9db1c6dea4285253</id>
<content type='text'>
* YJIT: Add `--yjit-compilation-log` flag to print out the compilation log at exit.

* YJIT: Add an option to enable the compilation log at runtime.

* YJIT: Fix a typo in the `IseqPayload` docs.

* YJIT: Add stubs for getting the YJIT compilation log in memory.

* YJIT: Add a compilation log based on a circular buffer to cap the log size.

* YJIT: Allow specifying either a file or directory name for the YJIT compilation log.

The compilation log will be populated as compilation events occur. If a directory is supplied, then a filename based on the PID will be used as the write target. If a file name is supplied instead, the log will be written to that file.

* YJIT: Add JIT compilation of C function substitutions to the compilation log.

* YJIT: Add compilation events to the circular buffer even if output is sent to a file.

Previously, the two modes were treated as being exclusive of one another. However, it could be beneficial to log all events to a file while also allowing for direct access of the last N events via `RubyVM::YJIT.compilation_log`.

* YJIT: Make timestamps the first element in the YJIT compilation log tuple.

* YJIT: Stream log to stderr if `--yjit-compilation-log` is supplied without an argument.

* YJIT: Eagerly compute compilation log messages to avoid hanging on to references that may GC.

* YJIT: Log all compiled blocks, not just the method entry points.

* YJIT: Remove all compilation events other than block compilation to slim down the log.

* YJIT: Replace circular buffer iterator with a consuming loop.

* YJIT: Support `--yjit-compilation-log=quiet` as a way to activate the in-memory log without printing it.

Co-authored-by: Randy Stauner &lt;randy.stauner@shopify.com&gt;

* YJIT: Promote the compilation log to being the one YJIT log.

Co-authored-by: Randy Stauner &lt;randy.stauner@shopify.com&gt;

* Update doc/yjit/yjit.md

* Update doc/yjit/yjit.md

---------

Co-authored-by: Randy Stauner &lt;randy.stauner@shopify.com&gt;
Co-authored-by: Maxime Chevalier-Boisvert &lt;maximechevalierb@gmail.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* YJIT: Add `--yjit-compilation-log` flag to print out the compilation log at exit.

* YJIT: Add an option to enable the compilation log at runtime.

* YJIT: Fix a typo in the `IseqPayload` docs.

* YJIT: Add stubs for getting the YJIT compilation log in memory.

* YJIT: Add a compilation log based on a circular buffer to cap the log size.

* YJIT: Allow specifying either a file or directory name for the YJIT compilation log.

The compilation log will be populated as compilation events occur. If a directory is supplied, then a filename based on the PID will be used as the write target. If a file name is supplied instead, the log will be written to that file.

* YJIT: Add JIT compilation of C function substitutions to the compilation log.

* YJIT: Add compilation events to the circular buffer even if output is sent to a file.

Previously, the two modes were treated as being exclusive of one another. However, it could be beneficial to log all events to a file while also allowing for direct access of the last N events via `RubyVM::YJIT.compilation_log`.

* YJIT: Make timestamps the first element in the YJIT compilation log tuple.

* YJIT: Stream log to stderr if `--yjit-compilation-log` is supplied without an argument.

* YJIT: Eagerly compute compilation log messages to avoid hanging on to references that may GC.

* YJIT: Log all compiled blocks, not just the method entry points.

* YJIT: Remove all compilation events other than block compilation to slim down the log.

* YJIT: Replace circular buffer iterator with a consuming loop.

* YJIT: Support `--yjit-compilation-log=quiet` as a way to activate the in-memory log without printing it.

Co-authored-by: Randy Stauner &lt;randy.stauner@shopify.com&gt;

* YJIT: Promote the compilation log to being the one YJIT log.

Co-authored-by: Randy Stauner &lt;randy.stauner@shopify.com&gt;

* Update doc/yjit/yjit.md

* Update doc/yjit/yjit.md

---------

Co-authored-by: Randy Stauner &lt;randy.stauner@shopify.com&gt;
Co-authored-by: Maxime Chevalier-Boisvert &lt;maximechevalierb@gmail.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Merge `impl VALUE` blocks [ci skip]</title>
<updated>2024-10-02T17:47:35+00:00</updated>
<author>
<name>Alan Wu</name>
<email>alanwu@ruby-lang.org</email>
</author>
<published>2024-10-02T17:47:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=2f5ab4c4b8cea493022655577f70eb5d0256c64e'/>
<id>2f5ab4c4b8cea493022655577f70eb5d0256c64e</id>
<content type='text'>
Reported by Kevin Menard.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reported by Kevin Menard.
</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Encode doubles to VALUE objects and move stat generation to rust (#11388)</title>
<updated>2024-08-28T02:24:17+00:00</updated>
<author>
<name>Randy Stauner</name>
<email>randy.stauner@shopify.com</email>
</author>
<published>2024-08-28T02:24:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=942317ebf8a5e4a85189411ee4d48267f21ecca5'/>
<id>942317ebf8a5e4a85189411ee4d48267f21ecca5</id>
<content type='text'>
* YJIT: Encode doubles to VALUE objects and move stat generation to rust

Stats that can now be generated from rust have been moved there.

* Move object_shape_count call for runtime_stats to rust

This reduces the ruby method to a single primitive.

* Change hash_aset_usize from macro to function</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* YJIT: Encode doubles to VALUE objects and move stat generation to rust

Stats that can now be generated from rust have been moved there.

* Move object_shape_count call for runtime_stats to rust

This reduces the ruby method to a single primitive.

* Change hash_aset_usize from macro to function</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Enhance the `String#&lt;&lt;` method substitution to handle integer codepoint values. (#11032)</title>
<updated>2024-08-02T19:45:22+00:00</updated>
<author>
<name>Kevin Menard</name>
<email>kevin@nirvdrum.com</email>
</author>
<published>2024-08-02T19:45:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=04a6165ac07f8f2107fbbd3a5665944fb27bc092'/>
<id>04a6165ac07f8f2107fbbd3a5665944fb27bc092</id>
<content type='text'>
* Document why we need to explicitly spill registers.

* Simplify passing a byte value to `str_buf_cat`.

* YJIT: Enhance the `String#&lt;&lt;` method substitution to handle integer codepoint values.

* YJIT: Move runtime type check into YJIT.

Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Document why we need to explicitly spill registers.

* Simplify passing a byte value to `str_buf_cat`.

* YJIT: Enhance the `String#&lt;&lt;` method substitution to handle integer codepoint values.

* YJIT: Move runtime type check into YJIT.

Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum.</pre>
</div>
</content>
</entry>
<entry>
<title>Expand opt_newarray_send to support Array#pack with buffer keyword arg</title>
<updated>2024-07-29T20:26:58+00:00</updated>
<author>
<name>Randy Stauner</name>
<email>randy.stauner@shopify.com</email>
</author>
<published>2024-07-20T17:03:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=acbb8d4fb56ac3b5894991760a075dbef78d10e3'/>
<id>acbb8d4fb56ac3b5894991760a075dbef78d10e3</id>
<content type='text'>
Use an enum for the method arg instead of needing to add an id
that doesn't map to an actual method name.

$ ruby --dump=insns -e 'b = "x"; [v].pack("E*", buffer: b)'

before:

```
== disasm: #&lt;ISeq:&lt;main&gt;@-e:1 (1,0)-(1,34)&gt;
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring                       "x"                       (   1)[Li]
0002 setlocal_WC_0                          b@0
0004 putself
0005 opt_send_without_block                 &lt;calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE&gt;
0007 newarray                               1
0009 putchilledstring                       "E*"
0011 getlocal_WC_0                          b@0
0013 opt_send_without_block                 &lt;calldata!mid:pack, argc:2, kw:[#&lt;Symbol:0x000000000023110c&gt;], KWARG&gt;
0015 leave
```

after:

```
== disasm: #&lt;ISeq:&lt;main&gt;@-e:1 (1,0)-(1,34)&gt;
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring                       "x"                       (   1)[Li]
0002 setlocal_WC_0                          b@0
0004 putself
0005 opt_send_without_block                 &lt;calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE&gt;
0007 putchilledstring                       "E*"
0009 getlocal                               b@0, 0
0012 opt_newarray_send                      3, 5
0015 leave
```
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use an enum for the method arg instead of needing to add an id
that doesn't map to an actual method name.

$ ruby --dump=insns -e 'b = "x"; [v].pack("E*", buffer: b)'

before:

```
== disasm: #&lt;ISeq:&lt;main&gt;@-e:1 (1,0)-(1,34)&gt;
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring                       "x"                       (   1)[Li]
0002 setlocal_WC_0                          b@0
0004 putself
0005 opt_send_without_block                 &lt;calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE&gt;
0007 newarray                               1
0009 putchilledstring                       "E*"
0011 getlocal_WC_0                          b@0
0013 opt_send_without_block                 &lt;calldata!mid:pack, argc:2, kw:[#&lt;Symbol:0x000000000023110c&gt;], KWARG&gt;
0015 leave
```

after:

```
== disasm: #&lt;ISeq:&lt;main&gt;@-e:1 (1,0)-(1,34)&gt;
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring                       "x"                       (   1)[Li]
0002 setlocal_WC_0                          b@0
0004 putself
0005 opt_send_without_block                 &lt;calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE&gt;
0007 putchilledstring                       "E*"
0009 getlocal                               b@0, 0
0012 opt_newarray_send                      3, 5
0015 leave
```
</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Fix `cargo doc --document-private-items` warnings [ci skip]</title>
<updated>2024-06-28T17:44:35+00:00</updated>
<author>
<name>Alan Wu</name>
<email>alanwu@ruby-lang.org</email>
</author>
<published>2024-06-28T17:44:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=3e14fe7c2115a71ac46bca50443c12c4be516efc'/>
<id>3e14fe7c2115a71ac46bca50443c12c4be516efc</id>
<content type='text'>
Mostly putting angle brackets around links to follow markdown syntax.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Mostly putting angle brackets around links to follow markdown syntax.
</pre>
</div>
</content>
</entry>
<entry>
<title>Optimized forwarding callers and callees</title>
<updated>2024-06-18T16:28:25+00:00</updated>
<author>
<name>Aaron Patterson</name>
<email>tenderlove@ruby-lang.org</email>
</author>
<published>2024-04-15T17:48:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=cdf33ed5f37f9649c482c3ba1d245f0d80ac01ce'/>
<id>cdf33ed5f37f9649c482c3ba1d245f0d80ac01ce</id>
<content type='text'>
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.

Calls it optimizes look like this:

```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```

```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```

```ruby
def bar(*a) = a

def foo(...)
  list = [1, 2]
  bar(*list, ...) # optimized
end
foo(123)
```

All variants of the above but using `super` are also optimized, including a bare super like this:

```ruby
def foo(...)
  super
end
```

This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:

```ruby
def m
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

def bar(a) = a
def foo(...) = bar(...)

def test
  m { foo(123) }
end

test
p test # allocates 1 object on master, but 0 objects with this patch
```

```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)

def test
  m { foo(1, b: 2) }
end

test
p test # allocates 2 objects on master, but 0 objects with this patch
```

How does it work?
-----------------

This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.

I think this description is kind of confusing, so let's walk through an example with code.

```ruby
def delegatee(a, b) = a + b

def delegator(...)
  delegatee(...)  # CI2 (FORWARDING)
end

def caller
  delegator(1, 2) # CI1 (argc: 2)
end
```

Before we call the delegator method, the stack looks like this:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   |
              5|   delegatee(...)  # CI2 (FORWARDING)  |
              6| end                                   |
              7|                                       |
              8| def caller                            |
          -&gt;  9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method.  The `delegator` method has a special local called `...`
that holds the caller's CI object.

Here is the ISeq disasm for `delegator`:

```
== disasm: #&lt;ISeq:delegator@-e:1 (1,0)-(1,39)&gt;
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   &lt;calldata!mid:delegatee, argc:0, FCALL|FORWARDING&gt;, nil
0006 leave                                  [Re]
```

The local called `...` will contain the caller's CI: CI1.

Here is the stack when we enter `delegator`:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
           -&gt; 4|   #                                   | CI1 (argc: 2)
              5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            |
              9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`.  In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`.  It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).

Before executing the `send` instruction, we push `...` on the stack.  The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):

```
== disasm: #&lt;ISeq:delegator@-e:1 (1,0)-(1,39)&gt;
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   &lt;calldata!mid:delegatee, argc:0, FCALL|FORWARDING&gt;, nil
0006 leave                                  [Re]
```

Instruction 0001 puts the caller's CI on the stack.  `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the caller's stack to this stack:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   | CI1 (argc: 2)
           -&gt; 5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            | self
              9|   delegator(1, 2) # CI1 (argc: 2)     | 1
             10| end                                   | 2
```

The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.

Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.
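
As a rough illustration only, the copy step can be modeled in plain Ruby with an
Array standing in for the VM stack. This is a toy sketch, not how the VM
implements it; `ToyCI` and `forward_copy` are hypothetical names, and the model
skips the push and pop of `...` around the `send` instruction.

```ruby
# Toy model of the FORWARDING copy step; not the VM's actual implementation.
# ToyCI and forward_copy are hypothetical names used only for illustration.
ToyCI = Struct.new(:argc)

# Copy the receiver slot plus argc argument slots, starting at caller_base,
# onto the top of the stack, mirroring the last stack diagram above.
def forward_copy(stack, caller_base, ci)
  slots = ci.argc + 1 # self plus the arguments
  stack.concat(stack[caller_base, slots])
end

ci1 = ToyCI.new(2)
# Stack on entry to `delegator`: self, 1, 2, then CI1 and the frame metadata.
stack = [:self, 1, 2, ci1, :cref_or_me, :specval, :type]
forward_copy(stack, 0, ci1)
p stack.last(3) # prints [:self, 1, 2]
```

The point of the model is that the number of slots to copy comes entirely from
the caller's CI, so no intermediate Array or Hash is needed to carry the
arguments.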

I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.

I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.

For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:

```ruby
SomeObject.new(foo: 1)
```

If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.
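
One way to check the allocation count on a given build is to reuse the
measurement pattern from earlier in this message. This is a sketch: `SomeObject`
is a placeholder class, `allocations` is a hypothetical helper, and the number
printed depends on the Ruby version and on how `Class#new` is implemented there.

```ruby
# SomeObject is a placeholder class used only for this measurement.
class SomeObject
  def initialize(foo:)
    @foo = foo
  end
end

# Hypothetical helper: objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

def test
  allocations { SomeObject.new(foo: 1) }
end

test # warm up first, as in the earlier examples
count = test
p count # at least 1 for the SomeObject instance; more if kwargs allocate
```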

Co-Authored-By: John Hawthorn &lt;john@hawthorn.email&gt;
Co-Authored-By: Alan Wu &lt;XrXr@users.noreply.github.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.

Calls it optimizes look like this:

```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```

```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```

```ruby
def bar(*a) = a

def foo(...)
  list = [1, 2]
  bar(*list, ...) # optimized
end
foo(123)
```

All variants of the above but using `super` are also optimized, including a bare super like this:

```ruby
def foo(...)
  super
end
```

This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:

```ruby
def m
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

def bar(a) = a
def foo(...) = bar(...)

def test
  m { foo(123) }
end

test
p test # allocates 1 object on master, but 0 objects with this patch
```

```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)

def test
  m { foo(1, b: 2) }
end

test
p test # allocates 2 objects on master, but 0 objects with this patch
```

How does it work?
-----------------

This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.

I think this description is kind of confusing, so let's walk through an example with code.

```ruby
def delegatee(a, b) = a + b

def delegator(...)
  delegatee(...)  # CI2 (FORWARDING)
end

def caller
  delegator(1, 2) # CI1 (argc: 2)
end
```

Before we call the delegator method, the stack looks like this:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   |
              5|   delegatee(...)  # CI2 (FORWARDING)  |
              6| end                                   |
              7|                                       |
              8| def caller                            |
          -&gt;  9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method.  The `delegator` method has a special local called `...`
that holds the caller's CI object.

Here is the ISeq disasm for `delegator`:

```
== disasm: #&lt;ISeq:delegator@-e:1 (1,0)-(1,39)&gt;
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   &lt;calldata!mid:delegatee, argc:0, FCALL|FORWARDING&gt;, nil
0006 leave                                  [Re]
```

The local called `...` will contain the caller's CI: CI1.

Here is the stack when we enter `delegator`:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
           -&gt; 4|   #                                   | CI1 (argc: 2)
              5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            |
              9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`.  In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`.  It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).

Before executing the `send` instruction, we push `...` on the stack.  The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):

```
== disasm: #&lt;ISeq:delegator@-e:1 (1,0)-(1,39)&gt;
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   &lt;calldata!mid:delegatee, argc:0, FCALL|FORWARDING&gt;, nil
0006 leave                                  [Re]
```

Instruction 0001 puts the caller's CI on the stack.  `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the caller's stack to this stack:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   | CI1 (argc: 2)
           -&gt; 5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            | self
              9|   delegator(1, 2) # CI1 (argc: 2)     | 1
             10| end                                   | 2
```

The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.

Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.

I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.

I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.

For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:

```ruby
SomeObject.new(foo: 1)
```

If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.

Co-Authored-By: John Hawthorn &lt;john@hawthorn.email&gt;
Co-Authored-By: Alan Wu &lt;XrXr@users.noreply.github.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
