path: root/re.c
Age | Commit message | Author
2026-02-01 | Use ruby_sized_xfree | Jean Boussier
2026-01-16 | MatchData: Avoid large stack allocations in MatchData (GH-15872) | Andrii Furmanets
2026-01-07 | [DOC] Harmonize #=~ methods (#15814) | Burdette Lamar
2026-01-07 | [DOC] Harmonize #[] methods | Burdette Lamar
2026-01-06 | [DOC] Harmonize #== methods (#15805) | Burdette Lamar
2026-01-05 | [DOC] Harmonize #=== methods | BurdetteLamar
2025-12-29 | Move MEMO_NEW to imemo.c and rename to rb_imemo_memo_new | Peter Zhu
2025-12-26 | Remove a no longer used prototype declaration in re.c | Nobuyoshi Nakada
Include internal/error.h instead.
2025-10-23 | use `SET_SHAREABLE` | Koichi Sasada
to adopt the strict shareable rule:
* (basically) shareable objects refer only to shareable objects
* (exception) shareable objects can refer to unshareable objects, but should not leak references to unshareable objects to the Ruby world
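The shareability rule can be observed from Ruby through the public Ractor API. This is a minimal sketch, assuming Ruby 3.0 or later (where regexp literals are frozen); it exercises `Ractor.shareable?` and `Ractor.make_shareable` only, not the C-level `SET_SHAREABLE` macro itself.

```ruby
# Shareability as seen from Ruby (assumes Ruby 3.0+, where regexp literals are frozen).
literal = /ab+c/                    # frozen literal
built   = Regexp.new("ab+c")        # Regexp.new returns an unfrozen object

p Ractor.shareable?(literal)        # => true
p Ractor.shareable?(built)          # => false

# Ractor.make_shareable deep-freezes the object (and what it references)
# so that other ractors may safely hold it.
shared = Ractor.make_shareable(built)
p shared.frozen?                    # => true
p Ractor.shareable?(shared)         # => true
```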
2025-08-19 | ZJIT: Compile toregexp (#14200) | Daniel Colson
`toregexp` is fairly similar to `concatstrings`, so this commit extracts a helper for pushing and popping operands on the native stack. There's probably opportunity to move some of this into lir (e.g. Alan suggested a push_many that could use STP on ARM to push 2 at a time), but I might save that for another day.
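To see where `toregexp` and `concatstrings` come from, one can disassemble an interpolated regexp literal and an interpolated string literal. This is a sketch for CRuby only (`RubyVM::InstructionSequence` is implementation-specific); `name` is a hypothetical method call that is compiled but never evaluated.

```ruby
# Disassemble two interpolated literals (CRuby only).
# `name` is a hypothetical method call; compiling does not evaluate it.
regexp_asm = RubyVM::InstructionSequence.compile('/#{name}-\d+/').disasm
string_asm = RubyVM::InstructionSequence.compile('"#{name}-1"').disasm

p regexp_asm.include?("toregexp")       # => true
p string_asm.include?("concatstrings")  # => true
```

Both instructions take a count of operands already pushed on the VM stack, which is why they can share a push/pop helper in the JIT.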
2025-08-04 | [DOC] Fill undocumented documents | Nobuyoshi Nakada
2025-06-17 | * adjust indent | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/13634
2025-06-12 | Only use regex internal reg_cache when in main ractor | Luke Gruber
Using this `reg_cache` is racy across ractors, so don't use it outside the main ractor. Also, using it across ractors can cause a regular expression created in one ractor to be used in another ractor (an isolation bug).
Notes: Merged: https://github.com/ruby/ruby/pull/13598
2025-06-10 | Fix regular expressions across ractors that match different encodings | Luke Gruber
In commit d42b9ffb206, an optimization was introduced that can speed up Regexp#match by 15% when it matches against strings of different encodings. This optimization, however, does not work across ractors. To fix this, we only use the optimization if no ractors have been started. In the future, we could use atomics for the reference counting if we find it's needed and if it's more performant.

The backtrace of the misbehaving native thread:
```
* frame #0: 0x0000000189c94388 libsystem_kernel.dylib`__pthread_kill + 8
  frame #1: 0x0000000189ccd88c libsystem_pthread.dylib`pthread_kill + 296
  frame #2: 0x0000000189bd6c60 libsystem_c.dylib`abort + 124
  frame #3: 0x0000000189adb174 libsystem_malloc.dylib`malloc_vreport + 892
  frame #4: 0x0000000189adec90 libsystem_malloc.dylib`malloc_report + 64
  frame #5: 0x0000000189ae321c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
  frame #6: 0x00000001001c3be4 ruby`onig_free_body(reg=0x000000012d84b660) at regcomp.c:5663:5
  frame #7: 0x00000001001ba828 ruby`rb_reg_prepare_re(re=4748462304, str=4748451168) at re.c:1680:13
  frame #8: 0x00000001001bac58 ruby`rb_reg_onig_match(re=4748462304, str=4748451168, match=(ruby`reg_onig_search [inlined] rbimpl_RB_TYPE_P_fastpath at value_type.h:349:14 ruby`reg_onig_search [inlined] rbimpl_rstring_getmem at rstring.h:391:5 ruby`reg_onig_search at re.c:1781:5), args=0x000000013824b168, regs=0x000000013824b150) at re.c:1708:20
  frame #9: 0x00000001001baefc ruby`rb_reg_search_set_match(re=4748462304, str=4748451168, pos=<unavailable>, reverse=0, set_backref_str=1, set_match=0x0000000000000000) at re.c:1809:27
  frame #10: 0x00000001001bae80 ruby`rb_reg_search0(re=<unavailable>, str=<unavailable>, pos=<unavailable>, reverse=<unavailable>, set_backref_str=<unavailable>, match=<unavailable>) at re.c:1861:12 [artificial]
  frame #11: 0x0000000100230b90 ruby`rb_pat_search0(pat=<unavailable>, str=<unavailable>, pos=<unavailable>, set_backref_str=<unavailable>, match=<unavailable>) at string.c:6619:16 [artificial]
  frame #12: 0x00000001002287f4 ruby`rb_str_sub_bang [inlined] rb_pat_search(pat=4748462304, str=4748451168, pos=0, set_backref_str=1) at string.c:6626:12
  frame #13: 0x00000001002287dc ruby`rb_str_sub_bang(argc=1, argv=0x00000001381280d0, str=4748451168) at string.c:6668:11
  frame #14: 0x000000010022826c ruby`rb_str_sub
```
You can reproduce this by running:
```
RUBY_TESTOPTS="--name=/test_str_capitalize/" make test-all TESTS=test/ruby/test_m17n_comb.rb
```
However, you need to run it with multiple ractors at once.

Co-authored-by: jhawthorn <john@hawthorn.email>
Notes: Merged: https://github.com/ruby/ruby/pull/13568
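The workload targeted by the d42b9ffb206 optimization is one regexp applied to strings in several encodings. The sketch below shows such a workload from Ruby; whether re.c reuses a pattern recompiled for another encoding is an internal detail (and, after this commit, depends on whether any ractor has been started), so the example checks only the observable results.

```ruby
# One ASCII-only regexp literal applied to strings in several encodings.
re = /b/
p re.encoding                      # => #<Encoding:US-ASCII>

subjects = [
  "\u00e9b",                        # UTF-8
  "\u00e9b".encode("ISO-8859-1"),   # Latin-1
  "\u00e9b".encode("Windows-1252"), # CP1252
]

# begin(0) is a character offset, so it is 1 in every encoding.
offsets   = subjects.map { |s| re.match(s).begin(0) }
encodings = subjects.map { |s| s.encoding.name }
p offsets    # => [1, 1, 1]
p encodings  # => ["UTF-8", "ISO-8859-1", "Windows-1252"]
```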
2025-03-11 | Fix memory leak in rb_reg_search_set_match | Peter Zhu
https://github.com/ruby/ruby/pull/12801 changed regexp matches to reuse the backref, which leaks memory if the original registers of the match are not freed. For example, the following script leaks memory:

    10.times do
      1_000_000.times do
        "aaaaaaaaaaa".gsub(/a/, "")
      end

      puts `ps -o rss= -p #{$$}`
    end

Before (RSS in KiB): 774256 1535152 2297360 3059280 3821296 4583552 5160304 5091456 5114256 4980192
After (RSS in KiB): 12480 11440 11696 11632 11632 11760 11824 11824 11824 11888
Notes: Merged: https://github.com/ruby/ruby/pull/12905
2025-02-24 | Reuse the backref if it isn't marked as busy. | Jean Boussier
[Misc #20652]
2025-02-24 | String#gsub! Elide MatchData allocation when we know it can't escape | Jean Boussier
If gsub is used with a string replacement or a Hash replacement that doesn't have a default proc, we know for sure that no code can cause the MatchData to escape the `gsub` call. In that case, we still have to allocate a new MatchData because we don't know the lifetime of the backref, but every subsequent match can reuse the MatchData we allocated ourselves, which reduces allocations significantly. This partially fixes [Misc #20652] (not when a block is used) and partially reduces the performance impact of abc0304cb28cb9dcc3476993bc487884c139fd11 / [Bug #17507].
```
compare-ruby: ruby 3.5.0dev (2025-02-24T09:44:57Z master 5cf146399f) +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-02-24T10:58:27Z gsub-elude-match da966636e9) +PRISM [arm64-darwin24]
warming up....

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|escape           |      3.577k|    3.697k|
|                 |           -|     1.03x|
|escape_bin       |      5.869k|    6.743k|
|                 |           -|     1.15x|
|escape_utf8      |      3.448k|    3.738k|
|                 |           -|     1.08x|
|escape_utf8_bin  |      6.361k|    7.267k|
|                 |           -|     1.14x|
```
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
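The distinction these two commits rely on is visible from Ruby: with a String or a plain Hash replacement no user code runs per match, while a block can retain `$~`. A minimal sketch of the user-visible contract (it does not measure allocations; the variable names are illustrative only):

```ruby
# Replacement forms in which no user code ever sees the MatchData:
with_string = "cat hat".gsub(/[ch]at/, "X")
with_hash   = "cat hat".gsub(/[ch]at/, { "cat" => "dog", "hat" => "cap" })
p with_string  # => "X X"
p with_hash    # => "dog cap"

# With a block, user code can keep $~ around, so each retained MatchData
# must still describe its own match after gsub moves on.
kept = []
with_block = "cat hat".gsub(/[ch]at/) { kept << $~; $~[0].upcase }
p with_block                            # => "CAT HAT"
p kept.map { |m| [m[0], m.begin(0)] }   # => [["cat", 0], ["hat", 4]]
```

A Hash with a default proc is excluded for the same reason as a block: the proc is user code that runs during the substitution.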
2025-01-16 | [DOC] Follow up link to heading changes | Nobuyoshi Nakada
The section "Special global variables" has changed:
* e021754db013ca9cd6dbd68b416425b32ee81490: Special Global Variables
* 2b4b513ef046c25c0a8d3d7b10a0566314b27099: Regexp Global Variables
* e50b7bf784b53ac126986dd7f9fd22ccc9b59c60: Regexp@Global+Variables
Notes: Merged: https://github.com/ruby/ruby/pull/12587
2024-12-15 | Fix links to syntax/literals.rdoc | Stan Lo
Notes: Merged: https://github.com/ruby/ruby/pull/12348
2024-08-09 | [DOC] Regexp.last_match returns `$~`, not `$!` | Alan Wu
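A short check of the corrected statement, using only core Ruby: `Regexp.last_match` hands back the very MatchData stored in `$~`, while `$!` is the unrelated exception variable.

```ruby
md = /(\d+)-(\d+)/.match("range 10-20")   # also sets $~

same_object = Regexp.last_match.equal?($~)
second      = Regexp.last_match(2)        # shorthand for $~[2]
p same_object  # => true
p second       # => "20"

# $! is unrelated: it holds the exception currently being handled.
p $!           # => nil (no exception is being handled here)
```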
2024-07-26 | Fix memory leak in String#start_with? when regexp times out | Peter Zhu
[Bug #20653] This commit refactors how Onigmo handles timeouts. Instead of raising a timeout error, onig_search now returns ONIGERR_TIMEOUT, so the caller can free memory and then raise the timeout error. This fixes a memory leak in String#start_with? when the regexp times out. For example:

    regex = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
    str = "a" * 1000000 + "x"

    10.times do
      100.times do
        str.start_with?(regex)
      rescue
      end

      puts `ps -o rss= -p #{$$}`
    end

Before (RSS in KiB): 33216 51936 71152 81728 97152 103248 120384 133392 133520 133616
After (RSS in KiB): 14912 15376 15824 15824 16128 16128 16144 16144 16160 16160
Notes: Merged: https://github.com/ruby/ruby/pull/11247
2024-07-16 | Add MatchData#bytebegin and MatchData#byteend | Shugo Maeda
These methods return the byte-based offset of the beginning or end of the specified match. [Feature #20576]
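The difference between character and byte offsets shows up as soon as the subject contains multibyte characters. A minimal sketch, assuming a UTF-8 source file; the `bytebegin`/`byteend` calls are guarded because they exist only in Rubies that include this commit (3.4 and later).

```ruby
m = /c/.match("\u03b1\u03b2c")   # "αβc": α and β are 2 bytes each in UTF-8

p m.begin(0)               # => 2 (character offset)
p m.pre_match.bytesize     # => 4 (byte offset, computed by hand)

# MatchData#bytebegin / #byteend return byte offsets directly.
if m.respond_to?(:bytebegin)
  p m.bytebegin(0)         # => 4
  p m.byteend(0)           # => 5
end
```

Byte offsets are what `String#byteslice` and `String#byteindex` expect, which is why returning them directly avoids a character-to-byte conversion.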
2024-04-18 | Add a hint of `ASCII-8BIT` being `BINARY` | Jean Boussier
[Feature #18576] Since outright renaming `ASCII-8BIT` is deemed too backward incompatible, the next best thing is to change only its `#inspect`, particularly in exception messages.
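The effect of this change can be seen on the encoding object and on the error raised when a UTF-8 regexp meets non-ASCII binary data. A sketch; the exact `inspect` and message text depend on the Ruby version, so only the parts common to old and new versions are relied on.

```ruby
p Encoding::BINARY.equal?(Encoding::ASCII_8BIT)  # => true: one encoding, two constants
p Encoding::BINARY.name                          # => "ASCII-8BIT": the name is unchanged
puts Encoding::BINARY.inspect  # "#<Encoding:BINARY (ASCII-8BIT)>" once this change is in,
                               # "#<Encoding:ASCII-8BIT>" on older Rubies

# A UTF-8 regexp matched against non-ASCII binary data raises; the message
# names the binary encoding (with the BINARY hint on newer Rubies).
message = nil
begin
  /\u00e9/.match("\xff".b)
rescue Encoding::CompatibilityError => e
  message = e.message
end
puts message
```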
2024-02-02 | Fix memory leak in OnigRegion when match raises | Peter Zhu
[Bug #20228] rb_reg_onig_match can raise a Regexp::TimeoutError, which would cause the OnigRegion to leak.
2024-02-02 | Fix memory leak in stk_base when Regexp timeout | Peter Zhu
[Bug #20228] If rb_reg_check_timeout raises a Regexp::TimeoutError, then the stk_base will leak.
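The timeouts behind these leaks are configured from Ruby, per regexp or process-wide with `Regexp.timeout=`. A minimal sketch, assuming Ruby 3.2 or later; `match_with_timeout` is a hypothetical helper, and the example does not try to force a timeout (whether a given pattern backtracks long enough depends on the engine's optimizations).

```ruby
# Per-regexp timeout (Ruby 3.2+). Regexp.timeout= sets a process-wide default.
re = Regexp.new("\\A(?:a|b)*c", timeout: 0.5)
p re.timeout                                             # => 0.5
p Regexp::TimeoutError.ancestors.include?(RegexpError)   # => true

# Hypothetical helper: report a timed-out match as nil instead of raising.
def match_with_timeout(re, str)
  re.match?(str)
rescue Regexp::TimeoutError
  nil
end

p match_with_timeout(re, "ab" * 20 + "c")  # => true
```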
2024-01-07 | * expand tabs. [ci skip] | git
Please consider using misc/expand_tabs.rb as a pre-commit hook.
2024-01-08 | Adjust styles and indents [ci skip] | Nobuyoshi Nakada
2024-01-01 | Don't create T_MATCH object if /regexp/.match(string) doesn't match | Luke Gruber
Fixes [Bug #20104]
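From Ruby, the observable contract is that a failed `Regexp#match` returns `nil` and clears `$~`, so no MatchData is needed for a miss. A short sketch using only core methods; `match?` is shown as the allocation-free alternative when only a boolean is wanted.

```ruby
hit  = /b/.match("abc")
miss = /x/.match("abc")
after_miss = $~           # a failed match also resets the last-match variable

p hit.class     # => MatchData
p miss          # => nil: nothing matched, so there is no MatchData to return
p after_miss    # => nil

# When only a yes/no answer is needed, match? skips MatchData and $~ entirely.
p(/x/.match?("abc"))  # => false
```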
2023-12-24 | Fix Regexp#inspect for GC compaction | Peter Zhu
rb_reg_desc was not safe for GC compaction because it took the C string and length but not the backing String object, so the string could be moved during compaction while rb_reg_desc still held a pointer into it. This commit changes rb_reg_desc to use the string from the Regexp object. The test fails when RGENGC_CHECK_MODE is turned on:
TestRegexp#test_inspect_under_gc_compact_stress [test/ruby/test_regexp.rb:474]: <"(?-mix:\\/)|"> expected but was <"/\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00/">.
2023-12-24 | Fix Regexp#match for GC compaction | Peter Zhu
The test fails when RGENGC_CHECK_MODE is turned on: TestRegexp#test_match_under_gc_compact_stress: NoMethodError: undefined method `match' for nil test_regexp.rb:878:in `block in test_match_under_gc_compact_stress'
2023-12-23 | Fix Regexp#to_s for GC compaction | Peter Zhu
The test fails when RGENGC_CHECK_MODE is turned on: TestRegexp#test_to_s_under_gc_compact_stress = 13.46 s 1) Failure: TestRegexp#test_to_s_under_gc_compact_stress [test/ruby/test_regexp.rb:81]: <"(?-mix:abcd\u3042)"> expected but was <"(?-mix:\u5C78\u3030\u5C78\u3030\u5C78\u3030\u5C78\u3030\u5C78\u3030)">.
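A rough way to exercise the three code paths fixed above from plain Ruby. This only approximates the tests named in these commits, which run under `RGENGC_CHECK_MODE` and a compaction stress helper; here a single `GC.compact` (Ruby 2.7+, not supported on every platform) moves objects once before the checks. The expected `to_s` value is the one quoted in the failing test above.

```ruby
re    = Regexp.new("abcd\u3042")
slash = Regexp.new("a/b")

begin
  GC.compact            # moves objects where the platform allows it
rescue NotImplementedError
  # compaction is unsupported here; the checks below still run
end

to_s_ok    = re.to_s == "(?-mix:abcd\u3042)"
inspect_ok = slash.inspect == "/a\\/b/"    # inspect escapes the "/" delimiter
m          = re.match("xabcd\u3042")
match_ok   = !m.nil? && m[0] == "abcd\u3042"
p [to_s_ok, inspect_ok, match_ok]          # => [true, true, true]
```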
2023-12-19 | [DOC] State MatchData#[] when multiple captures with the same name | Nobuyoshi Nakada
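The documented rule, that `MatchData#[]` with a duplicated name returns the last group of that name that actually participated in the match, can be checked directly. A short sketch using only `MatchData#[]` and `#captures`:

```ruby
m = /(?<foo>.)(?<foo>.+)/.match("hoge")
p m[:foo]      # => "oge": both groups matched, so the last one wins
p m.captures   # => ["h", "oge"]

alt = /(?<x>a)|(?<x>b)/
first  = alt.match("a")[:x]   # only the first x group participated
second = alt.match("b")[:x]   # only the second x group participated
p first        # => "a"
p second       # => "b"
```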