diff options
| author | Koichi ITO <koic.ito@gmail.com> | 2024-09-10 12:22:13 +0900 |
|---|---|---|
| committer | git <svn-admin@ruby-lang.org> | 2024-09-20 17:17:21 +0000 |
| commit | 75ed086348da66e4cfe9488ae9ece5462dd2aef9 (patch) | |
| tree | d9d02b7ea4bb7ee0202135328314951e12dd0fed | |
| parent | 69f28ab715a02692fb2a9128bed46044963cbb50 (diff) | |
[ruby/prism] Fix `kDO_LAMBDA` token incompatibility for `Prism::Translation::Parser::Lexer`
## Summary
This PR fixes `kDO_LAMBDA` token incompatibility between Parser gem and `Prism::Translation::Parser` for lambda `do` block.
### Parser gem (Expected)
Returns `kDO_LAMBDA` token:
```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```
### `Prism::Translation::Parser` (Actual)
Previously, the parser returned `kDO` token when parsing the following:
```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```
After the update, the parser now returns `kDO_LAMBDA` token for the same input:
```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```
## Additional Information
Unfortunately, this kind of edge case doesn't work as expected; `kDO` is returned instead of `kDO_LAMBDA`.
However, since `kDO` is already being returned in this case, there is no change in behavior.
### Parser gem
Returns `tLAMBDA` token:
```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> (foo = -> (bar) {}) do end"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.3.5 (2024-09-03 revision https://github.com/ruby/prism/commit/ef084cc8f4) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 4...7>]], [:tEQL, ["=", #<Parser::Source::Range example.rb 8...9>]],
[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 10...12>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 13...14>]],
[:tIDENTIFIER, ["bar", #<Parser::Source::Range example.rb 14...17>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 17...18>]],
[:tLAMBEG, ["{", #<Parser::Source::Range example.rb 19...20>]], [:tRCURLY, ["}", #<Parser::Source::Range example.rb 20...21>]],
[:tRPAREN, [")", #<Parser::Source::Range example.rb 21...22>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 23...25>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 26...29>]]]
```
### `Prism::Translation::Parser`
Returns `kDO` token:
```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> (foo = -> (bar) {}) do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.3.5 (2024-09-03 revision https://github.com/ruby/prism/commit/ef084cc8f4) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 4...7>]], [:tEQL, ["=", #<Parser::Source::Range example.rb 8...9>]],
[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 10...12>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 13...14>]],
[:tIDENTIFIER, ["bar", #<Parser::Source::Range example.rb 14...17>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 17...18>]],
[:tLAMBEG, ["{", #<Parser::Source::Range example.rb 19...20>]], [:tRCURLY, ["}", #<Parser::Source::Range example.rb 20...21>]],
[:tRPAREN, [")", #<Parser::Source::Range example.rb 21...22>]], [:kDO, ["do", #<Parser::Source::Range example.rb 23...25>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 26...29>]]]
```
As the intention is not to address such special cases at this point, a comment has been left indicating that this case still returns `kDO`.
In other words, `kDO_LAMBDA` will now be returned except for edge cases after this PR.
https://github.com/ruby/prism/commit/2ee480654c
| -rw-r--r-- | lib/prism/translation/parser/lexer.rb | 15 | ||||
| -rw-r--r-- | test/prism/ruby/parser_test.rb | 2 |
2 files changed, 15 insertions, 2 deletions
diff --git a/lib/prism/translation/parser/lexer.rb b/lib/prism/translation/parser/lexer.rb index 5ca507f7e7..db7dbb1c87 100644 --- a/lib/prism/translation/parser/lexer.rb +++ b/lib/prism/translation/parser/lexer.rb @@ -187,6 +187,12 @@ module Prism EXPR_BEG = 0x1 # :nodoc: EXPR_LABEL = 0x400 # :nodoc: + # It is used to determine whether `do` is of the token type `kDO` or `kDO_LAMBDA`. + # + # NOTE: In edge cases like `-> (foo = -> (bar) {}) do end`, please note that `kDO` is still returned + # instead of `kDO_LAMBDA`, which is expected: https://github.com/ruby/prism/pull/3046 + LAMBDA_TOKEN_TYPES = [:kDO_LAMBDA, :tLAMBDA, :tLAMBEG] + # The `PARENTHESIS_LEFT` token in Prism is classified as either `tLPAREN` or `tLPAREN2` in the Parser gem. # The following token types are listed as those classified as `tLPAREN`. LPAREN_CONVERSION_TOKEN_TYPES = [ @@ -194,7 +200,7 @@ module Prism :tEQL, :tLPAREN, :tLPAREN2, :tLSHFT, :tNL, :tOP_ASGN, :tOROP, :tPIPE, :tSEMI, :tSTRING_DBEG, :tUMINUS, :tUPLUS ] - private_constant :TYPES, :EXPR_BEG, :EXPR_LABEL, :LPAREN_CONVERSION_TOKEN_TYPES + private_constant :TYPES, :EXPR_BEG, :EXPR_LABEL, :LAMBDA_TOKEN_TYPES, :LPAREN_CONVERSION_TOKEN_TYPES # The Parser::Source::Buffer that the tokens were lexed from. attr_reader :source_buffer @@ -236,6 +242,13 @@ module Prism location = Range.new(source_buffer, offset_cache[token.location.start_offset], offset_cache[token.location.end_offset]) case type + when :kDO + types = tokens.map(&:first) + nearest_lambda_token_type = types.reverse.find { |type| LAMBDA_TOKEN_TYPES.include?(type) } + + if nearest_lambda_token_type == :tLAMBDA + type = :kDO_LAMBDA + end when :tCHARACTER value.delete_prefix!("?") when :tCOMMENT diff --git a/test/prism/ruby/parser_test.rb b/test/prism/ruby/parser_test.rb index 87749efdda..606a0e54f6 100644 --- a/test/prism/ruby/parser_test.rb +++ b/test/prism/ruby/parser_test.rb @@ -268,7 +268,7 @@ module Prism # There are a lot of tokens that have very specific meaning according # to the context of the parser. We don't expose that information in # prism, so we need to normalize these tokens a bit. - if actual_token[0] == :kDO && %i[kDO_BLOCK kDO_LAMBDA].include?(expected_token[0]) + if expected_token[0] == :kDO_BLOCK && actual_token[0] == :kDO actual_token[0] = expected_token[0] end |
