<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/regparse.c, branch v3_2_11</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>merge revision(s) 37ed86fd3c798e298fad9db6e7df1f3f45e1e03b: [Backport #20161]</title>
<updated>2024-01-18T02:51:58+00:00</updated>
<author>
<name>nagachika</name>
<email>nagachika@ruby-lang.org</email>
</author>
<published>2024-01-18T02:51:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=a26b41bf7a2db69b0889ed599f568a4ba2529eba'/>
<id>a26b41bf7a2db69b0889ed599f568a4ba2529eba</id>
<content type='text'>
	Fix memory leak in regexp grapheme clusters

	[Bug #20161]

	The cc-&gt;mbuf gets overwritten, so we need to free it to not leak memory.

	For example:

	    str = "hello world".encode(Encoding::UTF_32LE)

	    10.times do
	      1_000.times do
	        str.grapheme_clusters
	      end

	      puts `ps -o rss= -p #{$$}`
	    end

	Before:

	    15536
	    15760
	    15920
	    16144
	    16304
	    16480
	    16640
	    16784
	    17008
	    17280

	After:

	    15584
	    15584
	    15760
	    15824
	    15888
	    15888
	    15888
	    15888
	    16048
	    16112
	---
	 regparse.c | 3 ++-
	 1 file changed, 2 insertions(+), 1 deletion(-)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
	Fix memory leak in regexp grapheme clusters

	[Bug #20161]

	The cc-&gt;mbuf gets overwritten, so we need to free it to not leak memory.

	For example:

	    str = "hello world".encode(Encoding::UTF_32LE)

	    10.times do
	      1_000.times do
	        str.grapheme_clusters
	      end

	      puts `ps -o rss= -p #{$$}`
	    end

	Before:

	    15536
	    15760
	    15920
	    16144
	    16304
	    16480
	    16640
	    16784
	    17008
	    17280

	After:

	    15584
	    15584
	    15760
	    15824
	    15888
	    15888
	    15888
	    15888
	    16048
	    16112
	---
	 regparse.c | 3 ++-
	 1 file changed, 2 insertions(+), 1 deletion(-)
</pre>
</div>
</content>
</entry>
<entry>
<title>Prevent potential buffer overrun in onigmo</title>
<updated>2022-10-25T08:02:43+00:00</updated>
<author>
<name>Yusuke Endoh</name>
<email>mame@ruby-lang.org</email>
</author>
<published>2022-10-25T06:45:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=1d2d25dcadda0764f303183ac091d0c87b432566'/>
<id>1d2d25dcadda0764f303183ac091d0c87b432566</id>
<content type='text'>
A code pattern `p + enclen(enc, p, pend)` may lead to a buffer overrun
if the incomplete bytes of a UTF-8 character are placed at the end of a
string. Because this pattern is used in several places in onigmo,
this change fixes the issue on the `enclen` side: the function should
not return a number that is larger than `pend - p`.
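
The clamping idea can be sketched in Ruby. This is only an illustration
under stated assumptions, not the actual C change: `utf8_announced_len`
and `clamped_enclen` are hypothetical helpers, and array indexes stand in
for the `p` and `pend` pointers.

```ruby
# Hypothetical helper: the byte length announced by a UTF-8 lead byte.
def utf8_announced_len(lead)
  case lead
  when 0xF0..0xF7 then 4
  when 0xE0..0xEF then 3
  when 0xC0..0xDF then 2
  else 1
  end
end

# Hypothetical helper: never report more bytes than remain before pend.
def clamped_enclen(bytes, p, pend)
  [utf8_announced_len(bytes[p]), pend - p].min
end

# The trailing 0xE6 announces a 3-byte character, but only 1 byte remains.
bytes = "ab\xE6".b.bytes
pos = 2
pend = bytes.size

puts utf8_announced_len(bytes[pos])   # unclamped length: 3
puts clamped_enclen(bytes, pos, pend) # clamped length: 1
```

With the unclamped length, `p + enclen` would point two bytes past
`pend`; with the clamped length it stops exactly at `pend`.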

Co-Authored-By: Nobuyoshi Nakada &lt;nobu@ruby-lang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A code pattern `p + enclen(enc, p, pend)` may lead to a buffer overrun
if the incomplete bytes of a UTF-8 character are placed at the end of a
string. Because this pattern is used in several places in onigmo,
this change fixes the issue on the `enclen` side: the function should
not return a number that is larger than `pend - p`.

Co-Authored-By: Nobuyoshi Nakada &lt;nobu@ruby-lang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Prevent buffer overrun in regparse.c</title>
<updated>2022-10-25T04:20:25+00:00</updated>
<author>
<name>Yusuke Endoh</name>
<email>mame@ruby-lang.org</email>
</author>
<published>2022-10-25T04:20:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=902e459b733a92c3ccdd8762427f71dded997e7c'/>
<id>902e459b733a92c3ccdd8762427f71dded997e7c</id>
<content type='text'>
A regexp that ends with an escape following an incomplete UTF-8 char
might cause a buffer overrun. Found by OSS-Fuzz.

```
$ valgrind ./miniruby -e 'Regexp.new("\\u2d73\\0\\0\\0\\0          \\\xE6".b)'
==296213== Memcheck, a memory error detector
==296213== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==296213== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==296213== Command: ./miniruby -e Regexp.new("\\\\u2d73\\\\0\\\\0\\\\0\\\\0\ \ \ \ \ \ \ \ \ \ \\\\\\xE6".b)
==296213==
==296213== Warning: client switching stacks?  SP change: 0x1ffe8020e0 --&gt; 0x1ffeffff10
==296213==          to suppress, use: --max-stackframe=8379952 or greater
==296213== Invalid read of size 1
==296213==    at 0x484EA10: memmove (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==296213==    by 0x339568: memcpy (string_fortified.h:29)
==296213==    by 0x339568: onig_strcpy (regparse.c:271)
==296213==    by 0x339568: onig_node_str_cat (regparse.c:1413)
==296213==    by 0x33CBA0: parse_exp (regparse.c:6198)
==296213==    by 0x33EDE4: parse_branch (regparse.c:6511)
==296213==    by 0x33EEA2: parse_subexp (regparse.c:6544)
==296213==    by 0x34019C: parse_regexp (regparse.c:6593)
==296213==    by 0x34019C: onig_parse_make_tree (regparse.c:6638)
==296213==    by 0x32782D: onig_compile_ruby (regcomp.c:5779)
==296213==    by 0x313EFA: onig_new_with_source (re.c:876)
==296213==    by 0x313EFA: make_regexp (re.c:900)
==296213==    by 0x313EFA: rb_reg_initialize (re.c:3136)
==296213==    by 0x318555: rb_reg_initialize_str (re.c:3170)
==296213==    by 0x318555: rb_reg_init_str (re.c:3205)
==296213==    by 0x31A669: rb_reg_initialize_m (re.c:3856)
==296213==    by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150)
==296213==    by 0x3E5165: vm_call0_cfunc (vm_eval.c:164)
==296213==    by 0x3E5165: vm_call0_body (vm_eval.c:210)
==296213==    by 0x3E89BD: vm_call0_cc (vm_eval.c:87)
==296213==    by 0x3E89BD: rb_call0 (vm_eval.c:551)
==296213==  Address 0x9d45b10 is 0 bytes after a block of size 32 alloc'd
==296213==    at 0x4844899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==296213==    by 0x20FA7B: objspace_xmalloc0 (gc.c:12146)
==296213==    by 0x35F8C9: str_buf_cat4.part.0 (string.c:3132)
==296213==    by 0x31359D: unescape_escaped_nonascii (re.c:2690)
==296213==    by 0x313A9D: unescape_nonascii (re.c:2869)
==296213==    by 0x313A9D: rb_reg_preprocess (re.c:2992)
==296213==    by 0x313DFC: rb_reg_initialize (re.c:3109)
==296213==    by 0x318555: rb_reg_initialize_str (re.c:3170)
==296213==    by 0x318555: rb_reg_init_str (re.c:3205)
==296213==    by 0x31A669: rb_reg_initialize_m (re.c:3856)
==296213==    by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150)
==296213==    by 0x3E5165: vm_call0_cfunc (vm_eval.c:164)
==296213==    by 0x3E5165: vm_call0_body (vm_eval.c:210)
==296213==    by 0x3E89BD: vm_call0_cc (vm_eval.c:87)
==296213==    by 0x3E89BD: rb_call0 (vm_eval.c:551)
==296213==    by 0x3E957B: rb_call (vm_eval.c:877)
==296213==    by 0x3E957B: rb_funcallv_kw (vm_eval.c:1074)
==296213==    by 0x2A4123: rb_class_new_instance_pass_kw (object.c:1991)
==296213==
==296213==
==296213== HEAP SUMMARY:
==296213==     in use at exit: 35,476,538 bytes in 9,489 blocks
==296213==   total heap usage: 14,893 allocs, 5,404 frees, 37,517,821 bytes allocated
==296213==
==296213== LEAK SUMMARY:
==296213==    definitely lost: 316,081 bytes in 2,989 blocks
==296213==    indirectly lost: 136,808 bytes in 2,361 blocks
==296213==      possibly lost: 1,048,624 bytes in 3 blocks
==296213==    still reachable: 33,975,025 bytes in 4,136 blocks
==296213==         suppressed: 0 bytes in 0 blocks
==296213== Rerun with --leak-check=full to see details of leaked memory
==296213==
==296213== For lists of detected and suppressed errors, rerun with: -s
==296213== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A regexp that ends with an escape following an incomplete UTF-8 char
might cause a buffer overrun. Found by OSS-Fuzz.

```
$ valgrind ./miniruby -e 'Regexp.new("\\u2d73\\0\\0\\0\\0          \\\xE6".b)'
==296213== Memcheck, a memory error detector
==296213== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==296213== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==296213== Command: ./miniruby -e Regexp.new("\\\\u2d73\\\\0\\\\0\\\\0\\\\0\ \ \ \ \ \ \ \ \ \ \\\\\\xE6".b)
==296213==
==296213== Warning: client switching stacks?  SP change: 0x1ffe8020e0 --&gt; 0x1ffeffff10
==296213==          to suppress, use: --max-stackframe=8379952 or greater
==296213== Invalid read of size 1
==296213==    at 0x484EA10: memmove (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==296213==    by 0x339568: memcpy (string_fortified.h:29)
==296213==    by 0x339568: onig_strcpy (regparse.c:271)
==296213==    by 0x339568: onig_node_str_cat (regparse.c:1413)
==296213==    by 0x33CBA0: parse_exp (regparse.c:6198)
==296213==    by 0x33EDE4: parse_branch (regparse.c:6511)
==296213==    by 0x33EEA2: parse_subexp (regparse.c:6544)
==296213==    by 0x34019C: parse_regexp (regparse.c:6593)
==296213==    by 0x34019C: onig_parse_make_tree (regparse.c:6638)
==296213==    by 0x32782D: onig_compile_ruby (regcomp.c:5779)
==296213==    by 0x313EFA: onig_new_with_source (re.c:876)
==296213==    by 0x313EFA: make_regexp (re.c:900)
==296213==    by 0x313EFA: rb_reg_initialize (re.c:3136)
==296213==    by 0x318555: rb_reg_initialize_str (re.c:3170)
==296213==    by 0x318555: rb_reg_init_str (re.c:3205)
==296213==    by 0x31A669: rb_reg_initialize_m (re.c:3856)
==296213==    by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150)
==296213==    by 0x3E5165: vm_call0_cfunc (vm_eval.c:164)
==296213==    by 0x3E5165: vm_call0_body (vm_eval.c:210)
==296213==    by 0x3E89BD: vm_call0_cc (vm_eval.c:87)
==296213==    by 0x3E89BD: rb_call0 (vm_eval.c:551)
==296213==  Address 0x9d45b10 is 0 bytes after a block of size 32 alloc'd
==296213==    at 0x4844899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==296213==    by 0x20FA7B: objspace_xmalloc0 (gc.c:12146)
==296213==    by 0x35F8C9: str_buf_cat4.part.0 (string.c:3132)
==296213==    by 0x31359D: unescape_escaped_nonascii (re.c:2690)
==296213==    by 0x313A9D: unescape_nonascii (re.c:2869)
==296213==    by 0x313A9D: rb_reg_preprocess (re.c:2992)
==296213==    by 0x313DFC: rb_reg_initialize (re.c:3109)
==296213==    by 0x318555: rb_reg_initialize_str (re.c:3170)
==296213==    by 0x318555: rb_reg_init_str (re.c:3205)
==296213==    by 0x31A669: rb_reg_initialize_m (re.c:3856)
==296213==    by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150)
==296213==    by 0x3E5165: vm_call0_cfunc (vm_eval.c:164)
==296213==    by 0x3E5165: vm_call0_body (vm_eval.c:210)
==296213==    by 0x3E89BD: vm_call0_cc (vm_eval.c:87)
==296213==    by 0x3E89BD: rb_call0 (vm_eval.c:551)
==296213==    by 0x3E957B: rb_call (vm_eval.c:877)
==296213==    by 0x3E957B: rb_funcallv_kw (vm_eval.c:1074)
==296213==    by 0x2A4123: rb_class_new_instance_pass_kw (object.c:1991)
==296213==
==296213==
==296213== HEAP SUMMARY:
==296213==     in use at exit: 35,476,538 bytes in 9,489 blocks
==296213==   total heap usage: 14,893 allocs, 5,404 frees, 37,517,821 bytes allocated
==296213==
==296213== LEAK SUMMARY:
==296213==    definitely lost: 316,081 bytes in 2,989 blocks
==296213==    indirectly lost: 136,808 bytes in 2,361 blocks
==296213==      possibly lost: 1,048,624 bytes in 3 blocks
==296213==    still reachable: 33,975,025 bytes in 4,136 blocks
==296213==         suppressed: 0 bytes in 0 blocks
==296213== Rerun with --leak-check=full to see details of leaked memory
==296213==
==296213== For lists of detected and suppressed errors, rerun with: -s
==296213== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```</pre>
</div>
</content>
</entry>
<entry>
<title>Fix some UBSAN false positives (#6115)</title>
<updated>2022-07-12T18:48:10+00:00</updated>
<author>
<name>Kevin Backhouse</name>
<email>kevinbackhouse@github.com</email>
</author>
<published>2022-07-12T18:48:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=8c1808151f4c1b44e8b0fe935c571f05b2641b8b'/>
<id>8c1808151f4c1b44e8b0fe935c571f05b2641b8b</id>
<content type='text'>
* Fix some UBSAN false positives.
* ruby tool/update-deps --fix</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Fix some UBSAN false positives.
* ruby tool/update-deps --fix</pre>
</div>
</content>
</entry>
<entry>
<title>regparse.c: Suppress false-positive warnings of GCC 12.1</title>
<updated>2022-06-21T02:32:02+00:00</updated>
<author>
<name>Yusuke Endoh</name>
<email>mame@ruby-lang.org</email>
</author>
<published>2022-06-13T06:47:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=f44547c99913093b397e15a0240b7bce3f7c53ca'/>
<id>f44547c99913093b397e15a0240b7bce3f7c53ca</id>
<content type='text'>
http://rubyci.s3.amazonaws.com/arch/ruby-master/log/20220613T030003Z.log.html.gz
```
regparse.c:264:15: warning: array subscript 56 is outside array bounds of ‘Node[1]’ {aka ‘struct _Node[1]’} [-Warray-bounds]
```

and

```
/usr/include/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ pointer overflow between offset 32 and size [9223372036854775792, 9223372036854775807] [-Warray-bounds]
```
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
http://rubyci.s3.amazonaws.com/arch/ruby-master/log/20220613T030003Z.log.html.gz
```
regparse.c:264:15: warning: array subscript 56 is outside array bounds of ‘Node[1]’ {aka ‘struct _Node[1]’} [-Warray-bounds]
```

and

```
/usr/include/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ pointer overflow between offset 32 and size [9223372036854775792, 9223372036854775807] [-Warray-bounds]
```
</pre>
</div>
</content>
</entry>
<entry>
<title>Add printf-style format attribute to oniguruma functions</title>
<updated>2021-09-27T10:02:45+00:00</updated>
<author>
<name>Nobuyoshi Nakada</name>
<email>nobu@ruby-lang.org</email>
</author>
<published>2021-09-27T10:02:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=efa0c31ce518bb26aca80392cce7fc5471ca9fef'/>
<id>efa0c31ce518bb26aca80392cce7fc5471ca9fef</id>
<content type='text'>
Also make the format string compatible with literal strings which
are const arrays of "plain" chars.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Also make the format string compatible with literal strings which
are const arrays of "plain" chars.</pre>
</div>
</content>
</entry>
<entry>
<title>Do not reduce quantifiers if it affects which text will be matched</title>
<updated>2020-12-02T17:42:02+00:00</updated>
<author>
<name>Jeremy Evans</name>
<email>code@jeremyevans.net</email>
</author>
<published>2020-11-23T22:40:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=9e73177d5362c1986814f411961b712967dc5f97'/>
<id>9e73177d5362c1986814f411961b712967dc5f97</id>
<content type='text'>
Quantifier reduction when using +?)* and +?)+ should not be done
as it affects which text will be matched.

This removes the need for the RQ_PQ_Q ReduceType, so remove the
enum entry and related switch case.

Test that these are the only two patterns affected by testing all
quantifier reduction tuples for both the captured and uncaptured
cases and making sure the matched text is the same for both.
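
A minimal sketch of that comparison for the two affected patterns,
assuming a Ruby build that includes this fix. The subject string "aaa"
and the variable names are chosen only for illustration and are not
taken from the test suite.

```ruby
# Compare the captured and uncaptured forms of the two affected patterns.
subject = "aaa"

pairs = [
  [/(a+?)*/, /(?:a+?)*/],
  [/(a+?)+/, /(?:a+?)+/],
]

pairs.each do |captured, uncaptured|
  with_group = subject[captured]       # text matched with a capture group
  without_group = subject[uncaptured]  # text matched without one
  puts "#{uncaptured.source}: #{with_group.inspect} / #{without_group.inspect}"
end
```

Both forms of each pattern should print the same matched text; a
difference between them would suggest that quantifier reduction is
still changing what the uncaptured form matches.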

Fixes [Bug #17341]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Quantifier reduction when using +?)* and +?)+ should not be done
as it affects which text will be matched.

This removes the need for the RQ_PQ_Q ReduceType, so remove the
enum entry and related switch case.

Test that these are the only two patterns affected by testing all
quantifier reduction tuples for both the captured and uncaptured
cases and making sure the matched text is the same for both.

Fixes [Bug #17341]
</pre>
</div>
</content>
</entry>
<entry>
<title>Detect the premature end of char property in regexp</title>
<updated>2020-11-24T15:01:30+00:00</updated>
<author>
<name>Jeremy Evans</name>
<email>code@jeremyevans.net</email>
</author>
<published>2020-11-23T19:03:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=b26d6c70e0f08050ca23388bb0e8442f73269c73'/>
<id>b26d6c70e0f08050ca23388bb0e8442f73269c73</id>
<content type='text'>
Default to ONIGERR_INVALID_CHAR_PROPERTY_NAME in
fetch_char_property_to_ctype and only set otherwise if an ending
} is found.

Fixes [Bug #17340]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Default to ONIGERR_INVALID_CHAR_PROPERTY_NAME in
fetch_char_property_to_ctype and only set otherwise if an ending
} is found.

Fixes [Bug #17340]
</pre>
</div>
</content>
</entry>
<entry>
<title>Fixed misspellings</title>
<updated>2019-12-20T00:32:42+00:00</updated>
<author>
<name>Nobuyoshi Nakada</name>
<email>nobu@ruby-lang.org</email>
</author>
<published>2019-12-20T00:19:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=db166290088fb7d39d01f68b9860253893d4f1a7'/>
<id>db166290088fb7d39d01f68b9860253893d4f1a7</id>
<content type='text'>
Fixed misspellings reported at [Bug #16437], only in ruby and rubyspec.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixed misspellings reported at [Bug #16437], only in ruby and rubyspec.
</pre>
</div>
</content>
</entry>
<entry>
<title>st_foreach now free from ANYARGS</title>
<updated>2019-08-27T06:52:26+00:00</updated>
<author>
<name>卜部昌平</name>
<email>shyouhei@ruby-lang.org</email>
</author>
<published>2019-08-26T07:06:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=6dd60cf114701f1ff3526381c0e742c588af2f91'/>
<id>6dd60cf114701f1ff3526381c0e742c588af2f91</id>
<content type='text'>
After 5e86b005c0f2ef30df2f9906c7e2f3abefe286a2, I now think ANYARGS is
dangerous and should be eliminated.  This commit deletes ANYARGS from
st_foreach.  I strongly believe that this commit should have come
with b0af0592fdd9e9d4e4b863fde006d67ccefeac21, which added an extra
parameter to st_foreach callbacks.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After 5e86b005c0f2ef30df2f9906c7e2f3abefe286a2, I now think ANYARGS is
dangerous and should be eliminated.  This commit deletes ANYARGS from
st_foreach.  I strongly believe that this commit should have come
with b0af0592fdd9e9d4e4b863fde006d67ccefeac21, which added an extra
parameter to st_foreach callbacks.
</pre>
</div>
</content>
</entry>
</feed>
