<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/internal/parse.h, branch v3_2_11</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods</title>
<updated>2022-11-21T00:01:34+00:00</updated>
<author>
<name>yui-knk</name>
<email>spiketeika@gmail.com</email>
</author>
<published>2022-09-23T13:40:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=d8601621edcf29e3323b90dcf04b774edd9fb45e'/>
<id>d8601621edcf29e3323b90dcf04b774edd9fb45e</id>
<content type='text'>
Implementation for Language Server Protocol (LSP) sometimes needs token information.
For example both `m(1)` and `m(1, )` has same AST structure other than node locations
then it's impossible to check the existence of `,` from AST. However in later case,
it might be better to suggest variables list for the second argument.
Token information is important for such case.

This commit adds these methods.

* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node.

[Feature #19070]

Impacts on memory usage and performance are below:

Memory usage:

```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)

$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb

# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb

# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```

Performance:

```
$ cat ../ast_keep_tokens.yml
prelude: |
  src = &lt;&lt;~SRC
    module M
      class C
        def m1(a, b)
          1 + a + b
        end
      end
    end
  SRC
benchmark:
  without_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
  with_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..

|                     |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens  |     21.659k|   21.303k|
|                     |       1.02x|         -|
|with_keep_tokens     |      6.220k|    5.691k|
|                     |       1.09x|         -|
```
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Implementation for Language Server Protocol (LSP) sometimes needs token information.
For example both `m(1)` and `m(1, )` has same AST structure other than node locations
then it's impossible to check the existence of `,` from AST. However in later case,
it might be better to suggest variables list for the second argument.
Token information is important for such case.

This commit adds these methods.

* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node.

[Feature #19070]

Impacts on memory usage and performance are below:

Memory usage:

```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)

$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb

# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb

# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```

Performance:

```
$ cat ../ast_keep_tokens.yml
prelude: |
  src = &lt;&lt;~SRC
    module M
      class C
        def m1(a, b)
          1 + a + b
        end
      end
    end
  SRC
benchmark:
  without_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
  with_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..

|                     |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens  |     21.659k|   21.303k|
|                     |       1.02x|         -|
|with_keep_tokens     |      6.220k|    5.691k|
|                     |       1.09x|         -|
```
</pre>
</div>
</content>
</entry>
<entry>
<title>Add error_tolerant option to RubyVM::AST</title>
<updated>2022-10-08T08:59:11+00:00</updated>
<author>
<name>yui-knk</name>
<email>spiketeika@gmail.com</email>
</author>
<published>2022-09-25T08:53:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=fbbdbdd8911ffb24d98bb71c7c33d24609ce7dfe'/>
<id>fbbdbdd8911ffb24d98bb71c7c33d24609ce7dfe</id>
<content type='text'>
If this option is enabled, SyntaxError is not raised and Node is
returned even if passed script is broken.

[Feature #19013]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If this option is enabled, SyntaxError is not raised and Node is
returned even if passed script is broken.

[Feature #19013]
</pre>
</div>
</content>
</entry>
<entry>
<title>internal/*.h: skip doxygen</title>
<updated>2021-09-10T11:00:06+00:00</updated>
<author>
<name>卜部昌平</name>
<email>shyouhei@ruby-lang.org</email>
</author>
<published>2021-06-08T00:40:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=daf0c04a47e5aaede2f2a3e3663148dff96ff770'/>
<id>daf0c04a47e5aaede2f2a3e3663148dff96ff770</id>
<content type='text'>
These contents are purely implementation details, not worth appearing in
CAPI documents. [ci skip]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These contents are purely implementation details, not worth appearing in
CAPI documents. [ci skip]
</pre>
</div>
</content>
</entry>
<entry>
<title>ast.c: Rename "save_script_lines" to "keep_script_lines"</title>
<updated>2021-08-20T07:18:36+00:00</updated>
<author>
<name>Yusuke Endoh</name>
<email>mame@ruby-lang.org</email>
</author>
<published>2021-08-20T07:18:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=cad83fa3c4491153df0561b06bb261e25a831d0f'/>
<id>cad83fa3c4491153df0561b06bb261e25a831d0f</id>
<content type='text'>
... as per ko1's preference. He is preparing to extend this feature to
ISeq for his new debugger. He prefers "keep" to "save" for this wording.
This API is internal and not included in any released version, so I
change it in advance.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
... as per ko1's preference. He is preparing to extend this feature to
ISeq for his new debugger. He prefers "keep" to "save" for this wording.
This API is internal and not included in any released version, so I
change it in advance.
</pre>
</div>
</content>
</entry>
<entry>
<title>ast.rb: RubyVM::AST.parse and .of accepts `save_script_lines: true`</title>
<updated>2021-06-17T17:34:27+00:00</updated>
<author>
<name>Yusuke Endoh</name>
<email>mame@ruby-lang.org</email>
</author>
<published>2021-06-17T14:43:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=acae5f363dfaedd9c2873cee68c9498da3c072f5'/>
<id>acae5f363dfaedd9c2873cee68c9498da3c072f5</id>
<content type='text'>
This option makes the parser keep the original source as an array of
the original code lines. This feature exploits the mechanism of
`SCRIPT_LINES__` but records only the specified code that is passed to
RubyVM::AST.of or .parse, instead of recording all parsed program texts.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This option makes the parser keep the original source as an array of
the original code lines. This feature exploits the mechanism of
`SCRIPT_LINES__` but records only the specified code that is passed to
RubyVM::AST.of or .parse, instead of recording all parsed program texts.
</pre>
</div>
</content>
</entry>
<entry>
<title>add #include guard hack</title>
<updated>2020-04-13T07:06:00+00:00</updated>
<author>
<name>卜部昌平</name>
<email>shyouhei@ruby-lang.org</email>
</author>
<published>2020-04-10T05:11:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=4ff3f205408ff8bb413d69151105d301858136ba'/>
<id>4ff3f205408ff8bb413d69151105d301858136ba</id>
<content type='text'>
According to MSVC manual (*1), cl.exe can skip including a header file
when that:

- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.

GCC has a similar trick (*2), but it acts more stricter (e. g. there
must be _no tokens_ outside of #ifndef...#endif).

Sun C lacked #pragma once for a looong time.  Oracle Developer Studio
12.5 finally implemented it, but we cannot assume such recent version.

This changeset modifies header files so that each of them include
strictly one #ifndef...#endif.  I believe this is the most portable way
to trigger compiler optimizations. [Bug #16770]

*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
According to MSVC manual (*1), cl.exe can skip including a header file
when that:

- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.

GCC has a similar trick (*2), but it acts more stricter (e. g. there
must be _no tokens_ outside of #ifndef...#endif).

Sun C lacked #pragma once for a looong time.  Oracle Developer Studio
12.5 finally implemented it, but we cannot assume such recent version.

This changeset modifies header files so that each of them include
strictly one #ifndef...#endif.  I believe this is the most portable way
to trigger compiler optimizations. [Bug #16770]

*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge pull request #2991 from shyouhei/ruby.h</title>
<updated>2020-04-08T04:28:13+00:00</updated>
<author>
<name>卜部昌平</name>
<email>shyouhei@ruby-lang.org</email>
</author>
<published>2020-04-08T04:28:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=9e6e39c3512f7a962c44dc3729c98a0f8be90341'/>
<id>9e6e39c3512f7a962c44dc3729c98a0f8be90341</id>
<content type='text'>
Split ruby.h</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Split ruby.h</pre>
</div>
</content>
</entry>
<entry>
<title>decouple internal.h headers</title>
<updated>2019-12-26T11:45:12+00:00</updated>
<author>
<name>卜部昌平</name>
<email>shyouhei@ruby-lang.org</email>
</author>
<published>2019-12-04T08:16:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=5e22f873ed26092522f9bfc617d729bac88b284f'/>
<id>5e22f873ed26092522f9bfc617d729bac88b284f</id>
<content type='text'>
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead.  This would significantly
speed up incremental builds.

We take the following inclusion order in this changeset:

1.  "ruby/config.h", where _GNU_SOURCE is defined (must be the very
    first thing among everything).
2.  RUBY_EXTCONF_H if any.
3.  Standard C headers, sorted alphabetically.
4.  Other system headers, maybe guarded by #ifdef
5.  Everything else, sorted alphabetically.

Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead.  This would significantly
speed up incremental builds.

We take the following inclusion order in this changeset:

1.  "ruby/config.h", where _GNU_SOURCE is defined (must be the very
    first thing among everything).
2.  RUBY_EXTCONF_H if any.
3.  Standard C headers, sorted alphabetically.
4.  Other system headers, maybe guarded by #ifdef
5.  Everything else, sorted alphabetically.

Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
</pre>
</div>
</content>
</entry>
<entry>
<title>other minior internal header tweaks</title>
<updated>2019-12-26T11:45:12+00:00</updated>
<author>
<name>卜部昌平</name>
<email>shyouhei@ruby-lang.org</email>
</author>
<published>2019-12-03T05:47:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=bf53d6c7d19f877c821901b3288d7f80955ffbb7'/>
<id>bf53d6c7d19f877c821901b3288d7f80955ffbb7</id>
<content type='text'>
These headers need no rewrite.  Just add some minor tweaks, like
addition of #include lines.  Mainly cosmetic.

TIMET_MAX_PLUS_ONE was deleted because the macro was used from only
one place (directly write expression there).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These headers need no rewrite.  Just add some minor tweaks, like
addition of #include lines.  Mainly cosmetic.

TIMET_MAX_PLUS_ONE was deleted because the macro was used from only
one place (directly write expression there).
</pre>
</div>
</content>
</entry>
<entry>
<title>internal/symbol.h rework</title>
<updated>2019-12-26T11:45:12+00:00</updated>
<author>
<name>卜部昌平</name>
<email>shyouhei@ruby-lang.org</email>
</author>
<published>2019-12-04T05:15:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=ce2c97d738d6eb374e6dedf6e082b06a61ab6ef9'/>
<id>ce2c97d738d6eb374e6dedf6e082b06a61ab6ef9</id>
<content type='text'>
Some declatations are moved from internal/parse.h, to reflect the fact
that they are defined in symbol.c.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some declatations are moved from internal/parse.h, to reflect the fact
that they are defined in symbol.c.
</pre>
</div>
</content>
</entry>
</feed>
