summaryrefslogtreecommitdiff
path: root/test/ruby
diff options
context:
space:
mode:
authorJeremy Evans <code@jeremyevans.net>2023-01-27 11:08:49 -0800
committerJeremy Evans <code@jeremyevans.net>2023-01-30 08:51:12 -0800
commiteccfc978fd6f65332eb70c9a46fbb4d5110bbe0a (patch)
treec739a7f9659ec7d73ea3eabda184e321aa4aeaf8 /test/ruby
parent3f54d09a5b8b6e4fd734abc8911e170d5967b5b0 (diff)
Fix parsing of regexps that toggle extended mode on/off inside regexp
This was broken in ec3542229b29ec93062e9d90e877ea29d3c19472. That commit didn't handle cases where extended mode was turned on/off inside the regexp. There are two ways to turn extended mode on/off: ``` /(?-x:#y)#z /x =~ '#y' /(?-x)#y(?x)#z /x =~ '#y' ``` These can be nested inside the same regexp: ``` /(?-x:(?x)#x (?-x)#y)#z /x =~ '#y' ``` As you can probably imagine, this makes handling these regexps somewhat complex. Due to the nesting inside portions of regexps, the unassign_nonascii function needs to be recursive. In recursive mode, it needs to track both opening and closing parentheses, similar to how it already tracked opening and closing brackets for character classes. When scanning the regexp and coming to `(?` not followed by `#`, scan for options, and use `x` and `i` to determine whether to turn on or off extended mode. For `:`, indicting only the current regexp section should have the extended mode switched, recurse with the extended mode set or unset. For `)`, indicating the remainder of the regexp (or current regexp portion if already recursing) should turn extended mode on or off, just change the extended mode flag and keep scanning. While testing this, I noticed that `a`, `d`, and `u` are accepted as options, in addition to `i`, `m`, and `x`, but I can't see where those options are documented. I'm not sure whether or not handling `a`, `d`, and `u` as options is a bug. Fixes [Bug #19379]
Notes
Notes: Merged: https://github.com/ruby/ruby/pull/7192
Diffstat (limited to 'test/ruby')
-rw-r--r--test/ruby/test_regexp.rb56
1 files changed, 56 insertions, 0 deletions
diff --git a/test/ruby/test_regexp.rb b/test/ruby/test_regexp.rb
index 98bf41d2f1..c871580aeb 100644
--- a/test/ruby/test_regexp.rb
+++ b/test/ruby/test_regexp.rb
@@ -144,6 +144,62 @@ class TestRegexp < Test::Unit::TestCase
assert_raise(SyntaxError) {eval "/# \\users/"}
end
+ def test_nonextended_section_of_extended_regexp_bug_19379
+ assert_separately([], <<-'RUBY')
+ re = /(?-x:#)/x
+ assert_match(re, '#')
+ assert_not_match(re, '-')
+
+ re = /(?xi:#
+ y)/
+ assert_match(re, 'Y')
+ assert_not_match(re, '-')
+
+ re = /(?mix:#
+ y)/
+ assert_match(re, 'Y')
+ assert_not_match(re, '-')
+
+ re = /(?x-im:#
+ y)/i
+ assert_match(re, 'y')
+ assert_not_match(re, 'Y')
+
+ re = /(?-imx:(?xim:#
+ y))/x
+ assert_match(re, 'y')
+ assert_not_match(re, '-')
+
+ re = /(?x)#
+ y/
+ assert_match(re, 'y')
+ assert_not_match(re, 'Y')
+
+ re = /(?mx-i)#
+ y/i
+ assert_match(re, 'y')
+ assert_not_match(re, 'Y')
+
+ re = /(?-imx:(?xim:#
+ (?-x)y#))/x
+ assert_match(re, 'Y#')
+ assert_not_match(re, '-#')
+
+ re = /(?imx:#
+ (?-xim:#(?im)#(?x)#
+ )#
+ (?x)#
+ y)/
+ assert_match(re, '###Y')
+ assert_not_match(re, '###-')
+
+ re = %r{#c-\w+/comment/[\w-]+}
+ re = %r{https?://[^/]+#{re}}x
+ assert_match(re, 'http://foo#c-x/comment/bar')
+ assert_not_match(re, 'http://foo#cx/comment/bar')
+ RUBY
+ end
+
def test_union
assert_equal :ok, begin
Regexp.union(