path: root/re.c
Age / Commit message / Author
2023-07-31Reuse Regexp ptr when recompilingPeter Zhu
When matching an incompatible encoding, the Regexp needs to recompile. If `usecnt == 0`, then we can reuse the `ptr` because nothing else is using it. This avoids allocating another `regex_t`.

This speeds up matches that switch to incompatible encodings by 15%.

Branch:

```
Regex#match? with different encoding    1.431M (± 1.3%) i/s -      7.264M in   5.076153s
Regex#match? with same encoding        16.858M (± 1.1%) i/s -     85.347M in   5.063279s
```

Base:

```
Regex#match? with different encoding    1.248M (± 2.0%) i/s -      6.342M in   5.083151s
Regex#match? with same encoding        16.377M (± 1.1%) i/s -     82.519M in   5.039504s
```

Script:

```
regex = /foo/
str1 = "日本語"
str2 = "English".force_encoding("ASCII-8BIT")

Benchmark.ips do |x|
  x.report("Regex#match? with different encoding") do |times|
    i = 0
    while i < times
      regex.match?(str1)
      regex.match?(str2)
      i += 1
    end
  end

  x.report("Regex#match? with same encoding") do |times|
    i = 0
    while i < times
      regex.match?(str1)
      i += 1
    end
  end
end
```
2023-07-27Resurrect rb_reg_prepare_re C APITakashi Kokubun
Existing strscan releases rely on this C API. It means that the current Ruby master doesn't work if your Gemfile.lock has strscan unless it's locked to 3.0.7, which is not released yet. To fix it, let's not remove the C API we've exposed to users.
2023-07-27Don't load RREGEXP_PTR twicePeter Zhu
2023-07-27Refactor err string in rb_reg_prepare_rePeter Zhu
2023-07-27Add function rb_reg_onig_matchPeter Zhu
rb_reg_onig_match performs preparation, error handling, and cleanup for matching a regex against a string. This reduces repetitive code and removes the need for StringScanner to access internal data of regex. Notes: Merged: https://github.com/ruby/ruby/pull/8123
2023-07-20Embed struct rmatch into GC slot (#8097)Kunshan Wang
2023-06-27Stop allocating unused backref strings at `defined?`Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/7983
2023-06-27Use `rb_reg_nth_defined` instead of `rb_match_nth_defined`Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/7983
2023-06-20[DOC] Regexp doc (#7923)Burdette Lamar
Notes: Merged-By: peterzhu2118 <peter@peterzhu.ca>
2023-06-09* expand tabs. [ci skip]git
Please consider using misc/expand_tabs.rb as a pre-commit hook.
2023-06-09Optimize `Regexp#dup` and `Regexp.new(/RE/)`Nobuyoshi Nakada
When copying from another regexp, copy already built `regex_t` instead of re-compiling its source. Notes: Merged: https://github.com/ruby/ruby/pull/7922
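The two copy paths named in this commit can be exercised from Ruby as sketched below. This only shows the observable behaviour (each copy equals its source but is a separate object); it cannot show whether the underlying `regex_t` was copied or recompiled. The helper name `regexp_copies` is made up for illustration.

```ruby
# Build copies of an existing Regexp through the two paths named in the commit.
# (regexp_copies is a hypothetical helper, not part of Ruby.)
def regexp_copies(original)
  [original.dup, Regexp.new(original)]
end

original = /fo+o/i
regexp_copies(original).each do |copy|
  # Each copy carries the same source and options but is a distinct object.
  puts "#{copy.inspect}: equal=#{copy == original}, same object=#{copy.equal?(original)}"
end
```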
2023-04-23Use UTF-8 encoding for literal extended regexps with UTF-8 characters in commentsJeremy Evans
Fixes [Bug #19455] Notes: Merged: https://github.com/ruby/ruby/pull/7592
2023-04-19MatchData#named_captures: add optional symbolize_names keyword (#6952)Vladimir Dementyev
Notes: Merged-By: ioquatix <samuel@codeotaku.com>
2023-04-06[Feature #19474] Refactor NEWOBJ macrosMatt Valentine-House
NEWOBJ_OF is now our canonical newobj macro. It takes an optional ec Notes: Merged: https://github.com/ruby/ruby/pull/7393
2023-03-06Stop exporting symbols for MJITTakashi Kokubun
Notes: Merged: https://github.com/ruby/ruby/pull/7459
2023-03-06[DOC] Fix options of `Regexp#initialize`Nobuyoshi Nakada
`Integer#|` is bit-wise OR operator, not logical OR. Notes: Merged: https://github.com/ruby/ruby/pull/7435
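As a small illustration of the point in this doc fix, the option constants are integers that are combined with bitwise OR. The sketch below assumes nothing beyond `Regexp::IGNORECASE`, `Regexp::MULTILINE`, and `Regexp#options`; the helper name `combined_options` is made up.

```ruby
# Combine option constants with bitwise OR (Integer#|).
# (combined_options is a hypothetical helper, not part of Ruby.)
def combined_options
  Regexp::IGNORECASE | Regexp::MULTILINE
end

re = Regexp.new("a.b", combined_options)

# Both flag bits are present in the compiled regexp's options.
puts (re.options & combined_options) == combined_options

# Case-insensitive, and "." also matches a newline.
puts re.match?("A\nB")
```

A logical OR (`Regexp::IGNORECASE || Regexp::MULTILINE`) would simply evaluate to the first constant, so only one flag would be passed.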
2023-03-06`rb_scan_args` never fills optional arguments with `Qundef`Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/7435
2023-03-03[Bug #19471] `Regexp.compile` should handle keyword argumentsNobuyoshi Nakada
As well as `Regexp.new`, it should pass keyword arguments to the `Regexp#initialize` method. Notes: Merged: https://github.com/ruby/ruby/pull/7431
2023-03-01Remove support for the Regexp.new 3rd argumentJeremy Evans
This was deprecated in Ruby 3.2. Fixes [Bug #18797] Notes: Merged: https://github.com/ruby/ruby/pull/7039
2023-02-26Adjust `else` style to be consistent in each files [ci skip]Nobuyoshi Nakada
2023-02-19Remove (newly unneeded) remarks about aliasesBurdetteLamar
2023-02-10Implement Write Barrier for RMatch objectsJean Boussier
They only have two references. Notes: Merged: https://github.com/ruby/ruby/pull/7286
2023-02-10[DOC] Fix typo in document of regexp [ci skip]OKURA Masafumi
Notes: Merged: https://github.com/ruby/ruby/pull/7283 Merged-By: nobu <nobu@ruby-lang.org>
2023-02-09Remove `REG_LITERAL` flagNobuyoshi Nakada
All `Regexp` literals are frozen now. Notes: Merged: https://github.com/ruby/ruby/pull/7276
2023-01-30Fix parsing of regexps that toggle extended mode on/off inside regexpJeremy Evans
This was broken in ec3542229b29ec93062e9d90e877ea29d3c19472. That commit didn't handle cases where extended mode was turned on/off inside the regexp. There are two ways to turn extended mode on/off:

```
/(?-x:#y)#z
/x =~ '#y'
/(?-x)#y(?x)#z
/x =~ '#y'
```

These can be nested inside the same regexp:

```
/(?-x:(?x)#x
(?-x)#y)#z
/x =~ '#y'
```

As you can probably imagine, this makes handling these regexps somewhat complex. Due to the nesting inside portions of regexps, the unassign_nonascii function needs to be recursive. In recursive mode, it needs to track both opening and closing parentheses, similar to how it already tracked opening and closing brackets for character classes.

When scanning the regexp and coming to `(?` not followed by `#`, scan for options, and use `x` and `i` to determine whether to turn on or off extended mode. For `:`, indicating that only the current regexp section should have the extended mode switched, recurse with the extended mode set or unset. For `)`, indicating that the remainder of the regexp (or current regexp portion if already recursing) should turn extended mode on or off, just change the extended mode flag and keep scanning.

While testing this, I noticed that `a`, `d`, and `u` are accepted as options, in addition to `i`, `m`, and `x`, but I can't see where those options are documented. I'm not sure whether or not handling `a`, `d`, and `u` as options is a bug.

Fixes [Bug #19379]

Notes: Merged: https://github.com/ruby/ruby/pull/7192
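The toggling forms described above can be tried from Ruby with a sketch like the following. The patterns are built from strings so that the line break ending each extended-mode comment is explicit; placing a newline after each comment is an assumption about how the original examples were laid out. The helper names are made up, and the ASCII-only patterns exercise the regexp semantics rather than the non-ASCII handling that the fix itself targets.

```ruby
# Hypothetical helper: the three toggle forms from the commit message,
# written as strings with an explicit "\n" after every # comment.
def toggle_patterns
  {
    "scoped group (?-x:...)"  => "(?-x:#y)#z\n",
    "inline toggle (?-x)(?x)" => "(?-x)#y(?x)#z\n",
    "nested toggles"          => "(?-x:(?x)#x\n(?-x)#y)#z\n",
  }
end

# Compile each pattern in extended mode and record where it matches.
def match_positions(patterns, subject)
  patterns.transform_values { |src| Regexp.new(src, Regexp::EXTENDED) =~ subject }
end

match_positions(toggle_patterns, "#y").each do |name, pos|
  puts "#{name}: #{pos.inspect}"
end
```

In each pattern, `#y` sits in a region where extended mode is off and is therefore a literal, while `#x` and `#z` sit in regions where it is on and are treated as comments.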
2023-01-16[DOC] Correction to RDoc for Regexp.new (#7130)Burdette Lamar
Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-12-22Always issue deprecation warning when calling Regexp.new with 3rd positional argumentJeremy Evans
Previously, only certain values of the 3rd argument triggered a deprecation warning. First step for fix for bug #18797. Support for the 3rd argument will be removed after the release of Ruby 3.2. Fix minor fallout discovered by the tests. Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/6976
2022-12-22Refactor `reg_extract_args` to return regexp if givenNobuyoshi Nakada
2022-12-22Share argument parsing in `Regexp#initialize` and `Regexp.linear_time?`Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/6988
2022-12-19typo in doc [ci skip]卜部昌平
2022-12-19Note about Regexp.linear_time? [ci skip]卜部昌平
2022-12-14Add `Regexp.linear_time?` (#6901)TSUYUSATO Kitsune
Notes: Merged-By: makenowjust <make.just.on@gmail.com>
2022-12-02Introduce encoding check macroS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/6700
2022-12-01Prevent segfault in String#scan with ObjectSpace.each_objectYusuke Endoh
Calling `String#scan` without a block creates an incomplete MatchData object whose `RMATCH(match)->str` is Qfalse. Usually this object is not leaked, but it was possible to pull it by using ObjectSpace.each_object. This change hides the internal MatchData object by using rb_obj_hide. Fixes [Bug #19159] Notes: Merged: https://github.com/ruby/ruby/pull/6836
2022-11-16Using UNDEF_P macroS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/6721
2022-11-08Suppress false warning by a bug of gccNobuyoshi Nakada
GCC [Bug 99578] seems triggered by calling `rb_reg_last_match` before `match_check(match)`, probably by `NIL_P(match)` in `rb_reg_nth_match`. [Bug 99578]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578 Notes: Merged: https://github.com/ruby/ruby/pull/6690
2022-10-24Refactor timeout-setting code to a functionYusuke Endoh
2022-10-24Refactor timeout-related code in re.c a littleYusuke Endoh
2022-10-24Fix per-instance Regexp timeout (#6621)Yusuke Endoh
This makes it follow what was decided in [Bug #19055]:

* `Regexp.new(str, timeout: nil)` should respect the global timeout
* `Regexp.new(str, timeout: huge_val)` should use the maximum value that can be represented in the internal representation
* `Regexp.new(str, timeout: 0 or negative value)` should raise an error

Notes: Merged-By: mame <mame@ruby-lang.org>
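The rules listed above could be probed with a sketch like this one. It assumes a Ruby that supports the `timeout:` keyword and `Regexp#timeout` (hence the `respond_to?` guard), and it reports whatever error class is raised rather than assuming a particular one. The helper name `timeout_outcome` is made up, and the `huge_val` case is left out.

```ruby
# Hypothetical helper: describe how Regexp.new treats a given timeout: value.
def timeout_outcome(value)
  Regexp.new("a", timeout: value).timeout.inspect
rescue StandardError => e
  "#{e.class} (#{e.message})"
end

# Per-instance timeouts are only available where Regexp.timeout exists.
if Regexp.respond_to?(:timeout)
  [nil, 2.0, 0, -1].each do |value|
    puts "timeout: #{value.inspect} -> #{timeout_outcome(value)}"
  end
end
```

A per-instance timeout of `nil` leaves the instance without its own limit, so the global `Regexp.timeout` setting is the one that applies.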
2022-10-23Fix argument & Remove enumS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/6616
2022-10-23Introduce rb_memsearch_with_char_size functionS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/6616
2022-10-10* expand tabs. [ci skip]git
Tabs were expanded because the file did not have any tab indentation in unedited lines. Please update your editor config, and use misc/expand_tabs.rb in the pre-commit hook.
2022-10-10Should use dedicated function `Check_Type`Nobuyoshi Nakada
2022-10-10Add MatchData#deconstruct/deconstruct_keysVladimir Dementyev
Notes: Merged: https://github.com/ruby/ruby/pull/6216
2022-08-18[DOC] `offset` argument of Regexp#matchNobuyoshi Nakada
2022-08-02Speed up setting the backref match objectAaron Patterson
This patch speeds up setting the backref match object by avoiding some memcopies. Take the following code for example:

```ruby
"hello world" =~ /hello/
p $~
```

When the RE matches the string, we have to set the Match object in the backref global. So we would allocate a match object[^1] and use `rb_reg_region_copy`[^2] to make a deep copy of the stack allocated `re_registers` struct[^3] in to the newly created Ruby object. This could possibly trigger GC[^4], and would allocate new memory. This patch makes a shallow copy of the `re_registers` struct on to the Match object allowing the match object to manage the `re_registers` pointer and also avoiding some calls to `xmalloc` and some manual memcopy.

Benchmark looks like this:

```ruby
require "benchmark/ips"

def test_re thing
  thing =~ /hello/
end

Benchmark.ips do |x|
  x.report("re hit") do
    test_re "hello world"
  end

  x.report("re miss") do
    test_re "world"
  end
end
```

Before this patch:

```
$ ruby -v test.rb
ruby 3.2.0dev (2022-07-27T22:29:00Z master 4ad69899b7) [arm64-darwin21]
Ignoring bcrypt-3.1.16 because its extensions are not built. Try: gem pristine bcrypt --version 3.1.16
Warming up --------------------------------------
              re hit   345.401k i/100ms
             re miss   673.584k i/100ms
Calculating -------------------------------------
              re hit      3.452M (± 0.5%) i/s -     17.270M in   5.002535s
             re miss      6.736M (± 0.4%) i/s -     34.353M in   5.099593s
```

After this patch:

```
$ ./ruby -v test.rb
ruby 3.2.0dev (2022-08-01T21:24:12Z less-memcpy 0ff2a56606) [arm64-darwin21]
Warming up --------------------------------------
              re hit   419.578k i/100ms
             re miss   673.251k i/100ms
Calculating -------------------------------------
              re hit      4.201M (± 0.7%) i/s -     21.398M in   5.093593s
             re miss      6.716M (± 0.4%) i/s -     33.663M in   5.012756s
```

Matches get faster and misses maintain the same speed

[^1]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1737
[^2]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1738
[^3]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1686
[^4]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L981

Notes: Merged: https://github.com/ruby/ruby/pull/6206
2022-07-21Expand tabs [ci skip]Takashi Kokubun
[Misc #18891] Notes: Merged: https://github.com/ruby/ruby/pull/6094
2022-06-26[DOC] Fix a typo [ci skip]Kazuhiro NISHIYAMA
2022-06-20Document that Regexp#source does not retain lexer escapesJeremy Evans
Related to [Feature #18838] Notes: Merged: https://github.com/ruby/ruby/pull/6047
2022-06-20[Feature #18788] [DOC] String options to `Regexp.new`Nobuyoshi Nakada
Co-Authored-By: Janosch Müller <janosch.mueller@betterplace.org> Notes: Merged: https://github.com/ruby/ruby/pull/6039