Ignore invalid escapes in regexp comments

Invalid escapes are handled at multiple levels. The first level is in parse.y, so skip invalid unicode escape checks for regexps in parse.y. Make rb_reg_preprocess and unescape_nonascii accept the regexp options. In unescape_nonascii, if the regexp is an extended regexp, when "#" is encountered, ignore all characters until the end of line or end of regexp. Unfortunately, in extended regexps, you can use "#" as a non-comment character inside a character class, so also parse "[" and "]" specially for extended regexps, and only skip comments if "#" is not inside a character class. Handle nested character classes as well. This issue doesn't just affect extended regexps, it also affects "(#?" comments inside all regexps. So for those comments, scan until trailing ")" and ignore content inside. I'm not sure if there are other corner cases not handled. A better fix would be to redesign the regexp parser so that it unescaped during parsing instead of before parsing, so you already know the current parsing state. Fixes [Bug #18294] Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
author: Jeremy Evans <code@jeremyevans.net> 2022-06-06 13:50:03 -0700
committer: GitHub <noreply@github.com> 2022-06-06 13:50:03 -0700
commit: ec3542229b29ec93062e9d90e877ea29d3c19472 (patch)
tree: 1f96f5d38b24e9605bccf074765a23c6a6e018a8 /test/ruby/test_regexp.rb
parent: c85d1cda86d75ee2c3f7b42f22c543409cb5a186 (diff)
1 files changed, 53 insertions, 0 deletions
diff --git a/test/ruby/test_regexp.rb b/test/ruby/test_regexp.rb
index 84687c5380..71d56ad027 100644
--- a/test/ruby/test_regexp.rb
+++ b/test/ruby/test_regexp.rb
@@ -91,6 +91,59 @@ class TestRegexp < Test::Unit::TestCase
     assert_warn('', '[ruby-core:82328] [Bug #13798]') {re.to_s}
   end
 
+  def test_extended_comment_invalid_escape_bug_18294
+    assert_separately([], <<-RUBY)
+      re = / C:\\\\[a-z]{5} # e.g. C:\\users /x
+      assert_match(re, 'C:\\users')
+      assert_not_match(re, 'C:\\user')
+
+      re = /
+        foo  # \\M-ca
+        bar
+      /x
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /
+        f[#o]o  # \\M-ca
+        bar
+      /x
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /
+        f[[:alnum:]#]o  # \\M-ca
+        bar
+      /x
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /
+        f(?# \\M-ca)oo  # \\M-ca
+        bar
+      /x
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /f(?# \\M-ca)oobar/
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /[-(?# fca)]oobar/
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /f(?# ca\0\\M-ca)oobar/
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+    RUBY
+
+    assert_raise(SyntaxError) {eval "/\\users/x"}
+    assert_raise(SyntaxError) {eval "/[\\users]/x"}
+    assert_raise(SyntaxError) {eval "/(?<\\users)/x"}
+    assert_raise(SyntaxError) {eval "/# \\users/"}
+  end
+
   def test_union
     assert_equal :ok, begin
       Regexp.union(
author	Jeremy Evans <code@jeremyevans.net>	2022-06-06 13:50:03 -0700
committer	GitHub <noreply@github.com>	2022-06-06 13:50:03 -0700
commit	ec3542229b29ec93062e9d90e877ea29d3c19472 (patch)
tree	1f96f5d38b24e9605bccf074765a23c6a6e018a8 /test/ruby/test_regexp.rb
parent	c85d1cda86d75ee2c3f7b42f22c543409cb5a186 (diff)