<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/regparse.c, branch v3_3_11</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>Fix indents in Onigmo files to use spaces instead of tabs (#14047) [no ci]</title>
<updated>2025-11-02T05:05:12+00:00</updated>
<author>
<name>Hiroya Fujinami</name>
<email>make.just.on@gmail.com</email>
</author>
<published>2025-07-31T04:08:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=e02c892ade90ee401daf26292fbb99d32af7f619'/>
<id>e02c892ade90ee401daf26292fbb99d32af7f619</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Backport 37ed86fd3c798e298fad9db6e7df1f3f45e1e03b (#10248)</title>
<updated>2024-03-14T07:53:14+00:00</updated>
<author>
<name>NARUSE, Yui</name>
<email>nurse@users.noreply.github.com</email>
</author>
<published>2024-03-14T07:53:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=577f9c7a8334bb33512f01e7db95f6fb15e280b2'/>
<id>577f9c7a8334bb33512f01e7db95f6fb15e280b2</id>
<content type='text'>
merge revision(s) 37ed86fd3c798e298fad9db6e7df1f3f45e1e03b: [Backport #20161]

	Fix memory leak in regexp grapheme clusters

	[Bug #20161]

	cc-&gt;mbuf gets overwritten, so we need to free it first to avoid leaking memory.

	For example:

	    str = "hello world".encode(Encoding::UTF_32LE)

	    10.times do
	      1_000.times do
	        str.grapheme_clusters
	      end

	      puts `ps -o rss= -p #{$$}`
	    end

	Before:

	    15536
	    15760
	    15920
	    16144
	    16304
	    16480
	    16640
	    16784
	    17008
	    17280

	After:

	    15584
	    15584
	    15760
	    15824
	    15888
	    15888
	    15888
	    15888
	    16048
	    16112
	---
	 regparse.c | 3 ++-
	 1 file changed, 2 insertions(+), 1 deletion(-)</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
merge revision(s) 37ed86fd3c798e298fad9db6e7df1f3f45e1e03b: [Backport #20161]

	Fix memory leak in regexp grapheme clusters

	[Bug #20161]

	cc-&gt;mbuf gets overwritten, so we need to free it first to avoid leaking memory.

	For example:

	    str = "hello world".encode(Encoding::UTF_32LE)

	    10.times do
	      1_000.times do
	        str.grapheme_clusters
	      end

	      puts `ps -o rss= -p #{$$}`
	    end

	Before:

	    15536
	    15760
	    15920
	    16144
	    16304
	    16480
	    16640
	    16784
	    17008
	    17280

	After:

	    15584
	    15584
	    15760
	    15824
	    15888
	    15888
	    15888
	    15888
	    16048
	    16112
	---
	 regparse.c | 3 ++-
	 1 file changed, 2 insertions(+), 1 deletion(-)</pre>
</div>
</content>
</entry>
<entry>
<title>Improve error and memory handling</title>
<updated>2023-11-08T13:05:58+00:00</updated>
<author>
<name>Adam Hess</name>
<email>HParker@github.com</email>
</author>
<published>2023-11-07T06:46:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=f694bd158c4aaffbbb9e4b2f0608c6d428a4999c'/>
<id>f694bd158c4aaffbbb9e4b2f0608c6d428a4999c</id>
<content type='text'>
Apply Nobu's suggestions, which improve style, memory handling, and error handling.

Co-authored-by: Nobuyoshi Nakada &lt;nobu@ruby-lang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Apply Nobu's suggestions, which improve style, memory handling, and error handling.

Co-authored-by: Nobuyoshi Nakada &lt;nobu@ruby-lang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix regex-from-regex memory corruption</title>
<updated>2023-11-08T13:05:58+00:00</updated>
<author>
<name>Adam Hess</name>
<email>adamhess1991@gmail.com</email>
</author>
<published>2023-11-01T07:01:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=05cde4155cd43a79733ab4996db2d8b1f64c4fb5'/>
<id>05cde4155cd43a79733ab4996db2d8b1f64c4fb5</id>
<content type='text'>
Before this change, creating a regex from a regex with a named capture, e.g. Regexp.new(/(?&lt;name&gt;)/), caused memory to be shared between the two named capture groups, which could cause a segfault if the original was GCed.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before this change, creating a regex from a regex with a named capture, e.g. Regexp.new(/(?&lt;name&gt;)/), caused memory to be shared between the two named capture groups, which could cause a segfault if the original was GCed.
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix onigmo name table without st</title>
<updated>2023-11-03T01:41:48+00:00</updated>
<author>
<name>Nobuyoshi Nakada</name>
<email>nobu@ruby-lang.org</email>
</author>
<published>2023-11-02T14:35:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=5cff4c5aa375787924e2df5c0b981dd922b95a8c'/>
<id>5cff4c5aa375787924e2df5c0b981dd922b95a8c</id>
<content type='text'>
Co-authored-by: Adam Hess &lt;HParker@github.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Co-authored-by: Adam Hess &lt;HParker@github.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix functions for name tables as `st_foreach_callback_func`</title>
<updated>2023-11-02T06:00:39+00:00</updated>
<author>
<name>Nobuyoshi Nakada</name>
<email>nobu@ruby-lang.org</email>
</author>
<published>2023-11-02T03:53:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=4218e913d8d1d1e4c2fb123348fd98721e2b0ba8'/>
<id>4218e913d8d1d1e4c2fb123348fd98721e2b0ba8</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Don't check for null pointer in calls to free</title>
<updated>2023-06-30T13:13:31+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2023-06-29T20:31:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=58386814a7c7275f66ffa111175fca2fe307a1b5'/>
<id>58386814a7c7275f66ffa111175fca2fe307a1b5</id>
<content type='text'>
According to the C99 specification section 7.20.3.2 paragraph 2:

&gt; If ptr is a null pointer, no action occurs.

So we do not need to check whether the pointer is null before calling free.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
According to the C99 specification section 7.20.3.2 paragraph 2:

&gt; If ptr is a null pointer, no action occurs.

So we do not need to check whether the pointer is null before calling free.
</pre>
</div>
</content>
</entry>
<entry>
<title>Prevent potential buffer overrun in onigmo</title>
<updated>2022-10-25T08:02:43+00:00</updated>
<author>
<name>Yusuke Endoh</name>
<email>mame@ruby-lang.org</email>
</author>
<published>2022-10-25T06:45:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=1d2d25dcadda0764f303183ac091d0c87b432566'/>
<id>1d2d25dcadda0764f303183ac091d0c87b432566</id>
<content type='text'>
A code pattern `p + enclen(enc, p, pend)` may lead to a buffer overrun
if the incomplete bytes of a UTF-8 character are placed at the end of a
string. Because this pattern is used in several places in onigmo,
this change fixes the issue on the `enclen` side: the function should
not return a number that is larger than `pend - p`.

Co-Authored-By: Nobuyoshi Nakada &lt;nobu@ruby-lang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A code pattern `p + enclen(enc, p, pend)` may lead to a buffer overrun
if the incomplete bytes of a UTF-8 character are placed at the end of a
string. Because this pattern is used in several places in onigmo,
this change fixes the issue on the `enclen` side: the function should
not return a number that is larger than `pend - p`.

Co-Authored-By: Nobuyoshi Nakada &lt;nobu@ruby-lang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Prevent buffer overrun in regparse.c</title>
<updated>2022-10-25T04:20:25+00:00</updated>
<author>
<name>Yusuke Endoh</name>
<email>mame@ruby-lang.org</email>
</author>
<published>2022-10-25T04:20:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=902e459b733a92c3ccdd8762427f71dded997e7c'/>
<id>902e459b733a92c3ccdd8762427f71dded997e7c</id>
<content type='text'>
A regexp that ends with an escape followed by an incomplete UTF-8 char
might cause a buffer overrun. Found by OSS-Fuzz.

```
$ valgrind ./miniruby -e 'Regexp.new("\\u2d73\\0\\0\\0\\0          \\\xE6".b)'
==296213== Memcheck, a memory error detector
==296213== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==296213== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==296213== Command: ./miniruby -e Regexp.new("\\\\u2d73\\\\0\\\\0\\\\0\\\\0\ \ \ \ \ \ \ \ \ \ \\\\\\xE6".b)
==296213==
==296213== Warning: client switching stacks?  SP change: 0x1ffe8020e0 --&gt; 0x1ffeffff10
==296213==          to suppress, use: --max-stackframe=8379952 or greater
==296213== Invalid read of size 1
==296213==    at 0x484EA10: memmove (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==296213==    by 0x339568: memcpy (string_fortified.h:29)
==296213==    by 0x339568: onig_strcpy (regparse.c:271)
==296213==    by 0x339568: onig_node_str_cat (regparse.c:1413)
==296213==    by 0x33CBA0: parse_exp (regparse.c:6198)
==296213==    by 0x33EDE4: parse_branch (regparse.c:6511)
==296213==    by 0x33EEA2: parse_subexp (regparse.c:6544)
==296213==    by 0x34019C: parse_regexp (regparse.c:6593)
==296213==    by 0x34019C: onig_parse_make_tree (regparse.c:6638)
==296213==    by 0x32782D: onig_compile_ruby (regcomp.c:5779)
==296213==    by 0x313EFA: onig_new_with_source (re.c:876)
==296213==    by 0x313EFA: make_regexp (re.c:900)
==296213==    by 0x313EFA: rb_reg_initialize (re.c:3136)
==296213==    by 0x318555: rb_reg_initialize_str (re.c:3170)
==296213==    by 0x318555: rb_reg_init_str (re.c:3205)
==296213==    by 0x31A669: rb_reg_initialize_m (re.c:3856)
==296213==    by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150)
==296213==    by 0x3E5165: vm_call0_cfunc (vm_eval.c:164)
==296213==    by 0x3E5165: vm_call0_body (vm_eval.c:210)
==296213==    by 0x3E89BD: vm_call0_cc (vm_eval.c:87)
==296213==    by 0x3E89BD: rb_call0 (vm_eval.c:551)
==296213==  Address 0x9d45b10 is 0 bytes after a block of size 32 alloc'd
==296213==    at 0x4844899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==296213==    by 0x20FA7B: objspace_xmalloc0 (gc.c:12146)
==296213==    by 0x35F8C9: str_buf_cat4.part.0 (string.c:3132)
==296213==    by 0x31359D: unescape_escaped_nonascii (re.c:2690)
==296213==    by 0x313A9D: unescape_nonascii (re.c:2869)
==296213==    by 0x313A9D: rb_reg_preprocess (re.c:2992)
==296213==    by 0x313DFC: rb_reg_initialize (re.c:3109)
==296213==    by 0x318555: rb_reg_initialize_str (re.c:3170)
==296213==    by 0x318555: rb_reg_init_str (re.c:3205)
==296213==    by 0x31A669: rb_reg_initialize_m (re.c:3856)
==296213==    by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150)
==296213==    by 0x3E5165: vm_call0_cfunc (vm_eval.c:164)
==296213==    by 0x3E5165: vm_call0_body (vm_eval.c:210)
==296213==    by 0x3E89BD: vm_call0_cc (vm_eval.c:87)
==296213==    by 0x3E89BD: rb_call0 (vm_eval.c:551)
==296213==    by 0x3E957B: rb_call (vm_eval.c:877)
==296213==    by 0x3E957B: rb_funcallv_kw (vm_eval.c:1074)
==296213==    by 0x2A4123: rb_class_new_instance_pass_kw (object.c:1991)
==296213==
==296213==
==296213== HEAP SUMMARY:
==296213==     in use at exit: 35,476,538 bytes in 9,489 blocks
==296213==   total heap usage: 14,893 allocs, 5,404 frees, 37,517,821 bytes allocated
==296213==
==296213== LEAK SUMMARY:
==296213==    definitely lost: 316,081 bytes in 2,989 blocks
==296213==    indirectly lost: 136,808 bytes in 2,361 blocks
==296213==      possibly lost: 1,048,624 bytes in 3 blocks
==296213==    still reachable: 33,975,025 bytes in 4,136 blocks
==296213==         suppressed: 0 bytes in 0 blocks
==296213== Rerun with --leak-check=full to see details of leaked memory
==296213==
==296213== For lists of detected and suppressed errors, rerun with: -s
==296213== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A regexp that ends with an escape followed by an incomplete UTF-8 char
might cause a buffer overrun. Found by OSS-Fuzz.

```
$ valgrind ./miniruby -e 'Regexp.new("\\u2d73\\0\\0\\0\\0          \\\xE6".b)'
==296213== Memcheck, a memory error detector
==296213== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==296213== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==296213== Command: ./miniruby -e Regexp.new("\\\\u2d73\\\\0\\\\0\\\\0\\\\0\ \ \ \ \ \ \ \ \ \ \\\\\\xE6".b)
==296213==
==296213== Warning: client switching stacks?  SP change: 0x1ffe8020e0 --&gt; 0x1ffeffff10
==296213==          to suppress, use: --max-stackframe=8379952 or greater
==296213== Invalid read of size 1
==296213==    at 0x484EA10: memmove (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==296213==    by 0x339568: memcpy (string_fortified.h:29)
==296213==    by 0x339568: onig_strcpy (regparse.c:271)
==296213==    by 0x339568: onig_node_str_cat (regparse.c:1413)
==296213==    by 0x33CBA0: parse_exp (regparse.c:6198)
==296213==    by 0x33EDE4: parse_branch (regparse.c:6511)
==296213==    by 0x33EEA2: parse_subexp (regparse.c:6544)
==296213==    by 0x34019C: parse_regexp (regparse.c:6593)
==296213==    by 0x34019C: onig_parse_make_tree (regparse.c:6638)
==296213==    by 0x32782D: onig_compile_ruby (regcomp.c:5779)
==296213==    by 0x313EFA: onig_new_with_source (re.c:876)
==296213==    by 0x313EFA: make_regexp (re.c:900)
==296213==    by 0x313EFA: rb_reg_initialize (re.c:3136)
==296213==    by 0x318555: rb_reg_initialize_str (re.c:3170)
==296213==    by 0x318555: rb_reg_init_str (re.c:3205)
==296213==    by 0x31A669: rb_reg_initialize_m (re.c:3856)
==296213==    by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150)
==296213==    by 0x3E5165: vm_call0_cfunc (vm_eval.c:164)
==296213==    by 0x3E5165: vm_call0_body (vm_eval.c:210)
==296213==    by 0x3E89BD: vm_call0_cc (vm_eval.c:87)
==296213==    by 0x3E89BD: rb_call0 (vm_eval.c:551)
==296213==  Address 0x9d45b10 is 0 bytes after a block of size 32 alloc'd
==296213==    at 0x4844899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==296213==    by 0x20FA7B: objspace_xmalloc0 (gc.c:12146)
==296213==    by 0x35F8C9: str_buf_cat4.part.0 (string.c:3132)
==296213==    by 0x31359D: unescape_escaped_nonascii (re.c:2690)
==296213==    by 0x313A9D: unescape_nonascii (re.c:2869)
==296213==    by 0x313A9D: rb_reg_preprocess (re.c:2992)
==296213==    by 0x313DFC: rb_reg_initialize (re.c:3109)
==296213==    by 0x318555: rb_reg_initialize_str (re.c:3170)
==296213==    by 0x318555: rb_reg_init_str (re.c:3205)
==296213==    by 0x31A669: rb_reg_initialize_m (re.c:3856)
==296213==    by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150)
==296213==    by 0x3E5165: vm_call0_cfunc (vm_eval.c:164)
==296213==    by 0x3E5165: vm_call0_body (vm_eval.c:210)
==296213==    by 0x3E89BD: vm_call0_cc (vm_eval.c:87)
==296213==    by 0x3E89BD: rb_call0 (vm_eval.c:551)
==296213==    by 0x3E957B: rb_call (vm_eval.c:877)
==296213==    by 0x3E957B: rb_funcallv_kw (vm_eval.c:1074)
==296213==    by 0x2A4123: rb_class_new_instance_pass_kw (object.c:1991)
==296213==
==296213==
==296213== HEAP SUMMARY:
==296213==     in use at exit: 35,476,538 bytes in 9,489 blocks
==296213==   total heap usage: 14,893 allocs, 5,404 frees, 37,517,821 bytes allocated
==296213==
==296213== LEAK SUMMARY:
==296213==    definitely lost: 316,081 bytes in 2,989 blocks
==296213==    indirectly lost: 136,808 bytes in 2,361 blocks
==296213==      possibly lost: 1,048,624 bytes in 3 blocks
==296213==    still reachable: 33,975,025 bytes in 4,136 blocks
==296213==         suppressed: 0 bytes in 0 blocks
==296213== Rerun with --leak-check=full to see details of leaked memory
==296213==
==296213== For lists of detected and suppressed errors, rerun with: -s
==296213== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```</pre>
</div>
</content>
</entry>
<entry>
<title>Fix some UBSAN false positives (#6115)</title>
<updated>2022-07-12T18:48:10+00:00</updated>
<author>
<name>Kevin Backhouse</name>
<email>kevinbackhouse@github.com</email>
</author>
<published>2022-07-12T18:48:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=8c1808151f4c1b44e8b0fe935c571f05b2641b8b'/>
<id>8c1808151f4c1b44e8b0fe935c571f05b2641b8b</id>
<content type='text'>
* Fix some UBSAN false positives.
* ruby tool/update-deps --fix</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Fix some UBSAN false positives.
* ruby tool/update-deps --fix</pre>
</div>
</content>
</entry>
</feed>
