<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/test/ruby/test_regexp.rb, branch v3_4_9</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>[Backport #13671] Fix that "ss" in look-behind causes syntax error</title>
<updated>2025-11-06T18:25:26+00:00</updated>
<author>
<name>K.Takata</name>
<email>kentkt@csc.jp</email>
</author>
<published>2019-01-25T09:54:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=3150a1d989d81089c7da7d0491321e370e71f482'/>
<id>3150a1d989d81089c7da7d0491321e370e71f482</id>
<content type='text'>
Fixes k-takata/Onigmo#92.

This fix was ported from oniguruma:
https://github.com/kkos/oniguruma/commit/257082dac8c6019198b56324012f0bd1830ff4ba

https://github.com/k-takata/Onigmo/commit/b1a5445fbeba97b3e94a733c2ce11c033453af73
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes k-takata/Onigmo#92.

This fix was ported from oniguruma:
https://github.com/kkos/oniguruma/commit/257082dac8c6019198b56324012f0bd1830ff4ba

https://github.com/k-takata/Onigmo/commit/b1a5445fbeba97b3e94a733c2ce11c033453af73
</pre>
</div>
</content>
</entry>
<entry>
<title>Make word prop match join_control to conform to UTS 18</title>
<updated>2025-08-27T22:17:15+00:00</updated>
<author>
<name>Janosch Müller</name>
<email>janosch84@gmail.com</email>
</author>
<published>2023-04-13T18:43:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=5a42d267bfabc86f86cae2e83de24b1b86bc316a'/>
<id>5a42d267bfabc86f86cae2e83de24b1b86bc316a</id>
<content type='text'>
See &lt;https://bugs.ruby-lang.org/issues/19417#note-3&gt;.

https://unicode.org/reports/tr18/#word states word should match join_control chars.

It did not previously:

```ruby
[*0x0..0xD7FF, *0xE000..0x10FFFF].map { |n| n.chr 'utf-8' } =&gt; all_chars
all_chars.grep(/\p{join_control}/) =&gt; jc
jc.count # =&gt; 2
jc.grep(/\p{word}/).count # =&gt; 0
```
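
A minimal sketch for checking the behavior on a given Ruby build. It assumes the two Join_Control code points are U+200C and U+200D; the variable names are illustrative only, and the `word` result is printed rather than assumed because it depends on whether the running Ruby includes this change:

```ruby
# U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER)
# are the two Join_Control code points.
join_controls = ["\u200C", "\u200D"]

join_controls.each do |ch|
  label = format("U+%04X", ch.ord)
  is_jc = ch.match?(/\p{join_control}/)
  # Expected to be true only on a Ruby that includes this change.
  is_word = ch.match?(/\p{word}/)
  puts "#{label}: join_control=#{is_jc} word=#{is_word}"
end
```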
[Backport #19417]

---

Backporting note: I regenerated `enc/unicode/15.0.0/name2ctype.h` using
`make update-unicode`.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
See &lt;https://bugs.ruby-lang.org/issues/19417#note-3&gt;.

https://unicode.org/reports/tr18/#word states word should match join_control chars.

It did not previously:

```ruby
[*0x0..0xD7FF, *0xE000..0x10FFFF].map { |n| n.chr 'utf-8' } =&gt; all_chars
all_chars.grep(/\p{join_control}/) =&gt; jc
jc.count # =&gt; 2
jc.grep(/\p{word}/).count # =&gt; 0
```
[Backport #19417]

---

Backporting note: I regenerated `enc/unicode/15.0.0/name2ctype.h` using
`make update-unicode`.
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix regex timeout double-free after stack_double</title>
<updated>2024-11-12T07:33:21+00:00</updated>
<author>
<name>John Hawthorn</name>
<email>john@hawthorn.email</email>
</author>
<published>2024-11-05T02:05:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=8409edc4971f34cf0d77c375909c5b8f7b1e058a'/>
<id>8409edc4971f34cf0d77c375909c5b8f7b1e058a</id>
<content type='text'>
As of 10574857ce167869524b97ee862b610928f6272f, it's possible to crash
on a double free due to `stk_alloc` AKA `msa-&gt;stack_p` being freed
twice, once at the end of match_at and a second time in `FREE_MATCH_ARG`
in the parent caller.

Fixes [Bug #20886]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As of 10574857ce167869524b97ee862b610928f6272f, it's possible to crash
on a double free due to `stk_alloc` AKA `msa-&gt;stack_p` being freed
twice, once at the end of match_at and a second time in `FREE_MATCH_ARG`
in the parent caller.

Fixes [Bug #20886]
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix memory leak in Regexp capture group when timeout</title>
<updated>2024-07-25T13:23:49+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2024-07-24T19:16:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=10574857ce167869524b97ee862b610928f6272f'/>
<id>10574857ce167869524b97ee862b610928f6272f</id>
<content type='text'>
[Bug #20650]

The capture group allocates memory that is leaked when it times out.

For example:

    re = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
    str = "a" * 1000000 + "x"

    10.times do
      100.times do
        re =~ str
      rescue Regexp::TimeoutError
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    34688
    56416
    78288
    100368
    120784
    140704
    161904
    183568
    204320
    224800

After:

    16288
    16288
    16880
    16896
    16912
    16928
    16944
    17184
    17184
    17200
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[Bug #20650]

The capture group allocates memory that is leaked when it times out.

For example:

    re = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
    str = "a" * 1000000 + "x"

    10.times do
      100.times do
        re =~ str
      rescue Regexp::TimeoutError
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    34688
    56416
    78288
    100368
    120784
    140704
    161904
    183568
    204320
    224800

After:

    16288
    16288
    16880
    16896
    16912
    16928
    16944
    17184
    17184
    17200
</pre>
</div>
</content>
</entry>
<entry>
<title>Add MatchData#bytebegin and MatchData#byteend</title>
<updated>2024-07-16T05:48:06+00:00</updated>
<author>
<name>Shugo Maeda</name>
<email>shugo@ruby-lang.org</email>
</author>
<published>2024-06-12T02:35:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=e048a073a3cba04576b8f6a1673c283e4e20cd90'/>
<id>e048a073a3cba04576b8f6a1673c283e4e20cd90</id>
<content type='text'>
These methods return the byte-based offset of the beginning or end of the specified match.
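
A small sketch of how the byte-based offsets differ from the character-based ones. The `byte_bounds` helper is hypothetical and not part of this change; it only exists so the example also runs on a Ruby without `MatchData#bytebegin` and `MatchData#byteend`, where it derives the same values from character offsets:

```ruby
# Hypothetical helper: prefers MatchData#bytebegin / #byteend when the running
# Ruby provides them, otherwise computes the byte offsets from character offsets.
def byte_bounds(match, group = 0)
  if match.respond_to?(:bytebegin)
    [match.bytebegin(group), match.byteend(group)]
  else
    str = match.string
    [str[0, match.begin(group)].bytesize, str[0, match.end(group)].bytesize]
  end
end

m = /Ruby/.match("caf\u00E9 Ruby")
char_bounds = [m.begin(0), m.end(0)] # character offsets: [5, 9]
bounds = byte_bounds(m)              # byte offsets: [6, 10], the accented letter is two bytes in UTF-8
puts "chars=#{char_bounds.inspect} bytes=#{bounds.inspect}"
```

The byte offsets are the ones to use with byte-oriented methods such as `String#byteslice`.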

[Feature #20576]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These methods return the byte-based offset of the beginning or end of the specified match.

[Feature #20576]
</pre>
</div>
</content>
</entry>
<entry>
<title>TestRegexp#test_match_cache_positive_look_behind: Extend the timeout limit</title>
<updated>2024-06-07T14:29:59+00:00</updated>
<author>
<name>Yusuke Endoh</name>
<email>mame@ruby-lang.org</email>
</author>
<published>2024-06-07T14:29:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=239378613b710f96bd078301bb4061078e088524'/>
<id>239378613b710f96bd078301bb4061078e088524</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>TestRegexp#test_timeout_shorter_than_global: Extend the timeout limit</title>
<updated>2024-06-07T14:11:10+00:00</updated>
<author>
<name>Yusuke Endoh</name>
<email>mame@ruby-lang.org</email>
</author>
<published>2024-06-07T14:11:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=36b3fea0ff02645af071097e11801e6d2293bc95'/>
<id>36b3fea0ff02645af071097e11801e6d2293bc95</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>TestRegexp#test_s_timeout: accept timeout errors more tolerantly</title>
<updated>2024-06-07T13:37:08+00:00</updated>
<author>
<name>Yusuke Endoh</name>
<email>mame@ruby-lang.org</email>
</author>
<published>2024-06-07T13:34:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=91b86f1b4f1b6b269cca800fbbe53415f0d8d173'/>
<id>91b86f1b4f1b6b269cca800fbbe53415f0d8d173</id>
<content type='text'>
This test seems flaky on macOS GitHub Actions
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This test seems flaky on macOS GitHub Actions
</pre>
</div>
</content>
</entry>
<entry>
<title>Don't use assert_separately in Bug 20453 test</title>
<updated>2024-04-25T15:28:56+00:00</updated>
<author>
<name>Daniel Colson</name>
<email>danieljamescolson@gmail.com</email>
</author>
<published>2024-04-25T14:45:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=3a5d9553a7b2c21d121160b1646e43884825ede0'/>
<id>3a5d9553a7b2c21d121160b1646e43884825ede0</id>
<content type='text'>
https://github.com/ruby/ruby/pull/10630#discussion_r1579565056

The PR was merged before I had a chance to address this feedback.
`assert_separately` is not necessary for this test if I don't use a
global timeout.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
https://github.com/ruby/ruby/pull/10630#discussion_r1579565056

The PR was merged before I had a chance to address this feedback.
`assert_separately` is not necessary for this test if I don't use a
global timeout.
</pre>
</div>
</content>
</entry>
<entry>
<title>[Bug #20453] segfault in Regexp timeout</title>
<updated>2024-04-25T14:28:18+00:00</updated>
<author>
<name>Daniel Colson</name>
<email>danieljamescolson@gmail.com</email>
</author>
<published>2024-04-25T02:20:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=d292a9b98ce03c76dbe13138d20b9fbf613cc02d'/>
<id>d292a9b98ce03c76dbe13138d20b9fbf613cc02d</id>
<content type='text'>
https://bugs.ruby-lang.org/issues/20228 started freeing `stk_base` to
avoid a memory leak. But `stk_base` is sometimes stack allocated (using
`xalloca`), so the free only works if the regex stack has grown enough
to hit `stack_double` (which uses `xmalloc` and `xrealloc`).

To reproduce the problem on master and 3.3.1:

```ruby
Regexp.timeout = 0.001
/^(a*)x$/ =~ "a" * 1000000 + "x"
```

Some details about this potential fix:

`stk_base == stk_alloc` on
[init](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1153),
so if `stk_base != stk_alloc` we can be sure we called
[`stack_double`](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1210)
and it's safe to free. It's also safe to free if we've
[saved](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1187-L1189)
the stack to `msa-&gt;stack_p`, since we do the `stk_base != stk_alloc`
check before saving.

This matches the check we do inside
[`stack_double`](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1221)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
https://bugs.ruby-lang.org/issues/20228 started freeing `stk_base` to
avoid a memory leak. But `stk_base` is sometimes stack allocated (using
`xalloca`), so the free only works if the regex stack has grown enough
to hit `stack_double` (which uses `xmalloc` and `xrealloc`).

To reproduce the problem on master and 3.3.1:

```ruby
Regexp.timeout = 0.001
/^(a*)x$/ =~ "a" * 1000000 + "x"
```

Some details about this potential fix:

`stk_base == stk_alloc` on
[init](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1153),
so if `stk_base != stk_alloc` we can be sure we called
[`stack_double`](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1210)
and it's safe to free. It's also safe to free if we've
[saved](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1187-L1189)
the stack to `msa-&gt;stack_p`, since we do the `stk_base != stk_alloc`
check before saving.

This matches the check we do inside
[`stack_double`](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1221)
</pre>
</div>
</content>
</entry>
</feed>
