path: root/re.c
Age | Commit message | Author
2022-08-18 | [DOC] `offset` argument of Regexp#match | Nobuyoshi Nakada
2022-08-02 | Speed up setting the backref match object | Aaron Patterson
This patch speeds up setting the backref match object by avoiding some memcpy calls. Take the following code for example:

```ruby
"hello world" =~ /hello/
p $~
```

When the RE matches the string, we have to set the Match object in the backref global. So we would allocate a match object[^1] and use `rb_reg_region_copy`[^2] to make a deep copy of the stack-allocated `re_registers` struct[^3] into the newly created Ruby object. This could possibly trigger GC[^4], and would allocate new memory.

This patch makes a shallow copy of the `re_registers` struct onto the Match object. The Match object then manages the `re_registers` pointer, which avoids some calls to `xmalloc` and some manual memcpy.

Benchmark looks like this:

```ruby
require "benchmark/ips"

def test_re thing
  thing =~ /hello/
end

Benchmark.ips do |x|
  x.report("re hit") do
    test_re "hello world"
  end

  x.report("re miss") do
    test_re "world"
  end
end
```

Before this patch:

```
$ ruby -v test.rb
ruby 3.2.0dev (2022-07-27T22:29:00Z master 4ad69899b7) [arm64-darwin21]
Ignoring bcrypt-3.1.16 because its extensions are not built. Try: gem pristine bcrypt --version 3.1.16
Warming up --------------------------------------
              re hit   345.401k i/100ms
             re miss   673.584k i/100ms
Calculating -------------------------------------
              re hit      3.452M (± 0.5%) i/s -     17.270M in   5.002535s
             re miss      6.736M (± 0.4%) i/s -     34.353M in   5.099593s
```

After this patch:

```
$ ./ruby -v test.rb
ruby 3.2.0dev (2022-08-01T21:24:12Z less-memcpy 0ff2a56606) [arm64-darwin21]
Warming up --------------------------------------
              re hit   419.578k i/100ms
             re miss   673.251k i/100ms
Calculating -------------------------------------
              re hit      4.201M (± 0.7%) i/s -     21.398M in   5.093593s
             re miss      6.716M (± 0.4%) i/s -     33.663M in   5.012756s
```

Matches get faster and misses maintain the same speed.

[^1]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1737
[^2]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1738
[^3]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1686
[^4]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L981

Notes: Merged: https://github.com/ruby/ruby/pull/6206
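You can observe the behaviour this patch optimizes from Ruby code. The sketch below is minimal, and the helper names `backref_after` and `match_p_keeps_backref` are hypothetical. A successful `=~` stores a MatchData in the frame-local backref `$~`, and a miss leaves `$~` as `nil`. `Regexp#match?` answers the same yes/no question without updating `$~` at all.

```ruby
# Hypothetical helper: return whatever =~ leaves in the frame-local backref ($~).
def backref_after(str)
  str =~ /hello/
  $~
end

hit  = backref_after("hello world") # a MatchData for "hello"
miss = backref_after("world")       # no match, so $~ is nil

puts hit[0] # prints hello
p miss      # prints nil

# Hypothetical helper: Regexp#match? reports a hit but does not update $~.
def match_p_keeps_backref
  "hello world" =~ /world/      # sets $~ to a match for "world"
  /hello/.match?("hello world") # true, but $~ is left untouched
  $~[0]
end

puts match_p_keeps_backref # prints world
```

When only a yes/no answer is needed, `match?` sidesteps setting the backref entirely.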
2022-07-21 | Expand tabs [ci skip] | Takashi Kokubun
[Misc #18891] Notes: Merged: https://github.com/ruby/ruby/pull/6094
2022-06-26 | [DOC] Fix a typo [ci skip] | Kazuhiro NISHIYAMA
2022-06-20 | Document that Regexp#source does not retain lexer escapes | Jeremy Evans
Related to [Feature #18838] Notes: Merged: https://github.com/ruby/ruby/pull/6047
2022-06-20 | [Feature #18788] [DOC] String options to `Regexp.new` | Nobuyoshi Nakada
Co-Authored-By: Janosch Müller <janosch.mueller@betterplace.org> Notes: Merged: https://github.com/ruby/ruby/pull/6039
2022-06-20 | [Feature #18788] Support options as `String` to `Regexp.new` | Nobuyoshi Nakada
`Regexp.new` now supports passing the regexp flags not only as an `Integer`, but also as a `String`. Unknown flags raise errors. Notes: Merged: https://github.com/ruby/ruby/pull/6039
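On Rubies that predate this change, you can translate the same option letters into the Integer flags by hand. This is a minimal sketch. `OPTION_FLAGS` and `options_from_string` are hypothetical names. The choice of `ArgumentError` for unknown letters is an assumption made to mirror "unknown flags raise errors".

```ruby
# Hypothetical mapping of option letters to Regexp flag constants.
OPTION_FLAGS = {
  "i" => Regexp::IGNORECASE,
  "x" => Regexp::EXTENDED,
  "m" => Regexp::MULTILINE,
}.freeze

# Hypothetical helper: "im" -> Regexp::IGNORECASE | Regexp::MULTILINE.
def options_from_string(letters)
  letters.each_char.reduce(0) do |flags, ch|
    flags | OPTION_FLAGS.fetch(ch) { raise ArgumentError, "unknown regexp option: #{ch}" }
  end
end

re = Regexp.new("hello", options_from_string("im"))
p re.casefold?       # prints true
p re.match?("HELLO") # prints true
```

With this commit applied, `Regexp.new("hello", "im")` is expected to produce the same flags directly.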
2022-06-20 | Warn suspicious flag to `Regexp.new` | Nobuyoshi Nakada
Now the second argument should be `true`, `false`, `nil`, or an Integer. This flag is sometimes confused with the third argument. Notes: Merged: https://github.com/ruby/ruby/pull/6039
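The sketch below shows the second-argument forms that the commit message lists as expected. It relies only on long-standing `Regexp.new` behaviour: `true` requests case-insensitive matching, while `nil` and `false` request no options. Values of other types are what now trigger the warning.

```ruby
by_flag  = Regexp.new("abc", Regexp::IGNORECASE) # Integer flags
by_true  = Regexp.new("abc", true)               # true: case-insensitive
by_nil   = Regexp.new("abc", nil)                # nil: no options
by_false = Regexp.new("abc", false)              # false: no options

p by_flag.casefold?  # prints true
p by_true.casefold?  # prints true
p by_nil.casefold?   # prints false
p by_false.casefold? # prints false
```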
2022-06-20 | [DOC] Refine Regexp.new argument descriptions | Nobuyoshi Nakada
2022-06-20 | [DOC] Regexp timeout is float or nil | Nobuyoshi Nakada
2022-06-20 | [DOC] Fixed omissions in Regexp.new arguments | Nobuyoshi Nakada
2022-06-06 | Ignore invalid escapes in regexp comments | Jeremy Evans
Invalid escapes are handled at multiple levels. The first level is in parse.y, so skip invalid unicode escape checks for regexps in parse.y.

Make rb_reg_preprocess and unescape_nonascii accept the regexp options. In unescape_nonascii, if the regexp is an extended regexp, when "#" is encountered, ignore all characters until the end of line or end of regexp.

Unfortunately, in extended regexps, you can use "#" as a non-comment character inside a character class, so also parse "[" and "]" specially for extended regexps, and only skip comments if "#" is not inside a character class. Handle nested character classes as well.

This issue doesn't just affect extended regexps; it also affects "(?#" comments inside all regexps. So for those comments, scan until the trailing ")" and ignore the content inside.

I'm not sure if there are other corner cases not handled. A better fix would be to redesign the regexp parser so that it unescapes during parsing instead of before parsing, so you already know the current parsing state.

Fixes [Bug #18294] Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/5721 Merged-By: jeremyevans <code@jeremyevans.net>
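Plain Ruby can show the two comment forms this commit has to recognise. In `/x` mode, `#` starts a comment except inside a character class. `(?#...)` is a comment in any mode. This is a minimal sketch that uses only behaviour predating the fix. It does not exercise the invalid-escape case itself, because that depends on the Ruby version.

```ruby
# Extended mode: whitespace and "#" comments are ignored outside [...].
TAGGED = /
  foo   # the word foo
  [#]   # a literal hash sign: inside a character class "#" is not a comment
  bar   # the word bar
/x

# A group comment works in every mode; its content is skipped up to ")".
GROUP_COMMENT = /a(?#ignored)b/

p TAGGED.match?("foo#bar")   # prints true
p TAGGED.match?("foobar")    # prints false
p GROUP_COMMENT.match?("ab") # prints true
```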
2022-04-18 | [DOC] Enhanced RDoc for MatchData (#5822) | Burdette Lamar
Treats: #to_s #named_captures #string #inspect #hash #== Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-18 | Enhanced RDoc for MatchData (#5821) | Burdette Lamar
Treats: #[] #values_at Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-18 | Enhanced RDoc for MatchData (#5820) | Burdette Lamar
Treats: #pre_match #post_match #to_a #captures Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-18 | [DOC] Enhanced RDoc for MatchData (#5819) | Burdette Lamar
Treats: #begin #end #match #match_length Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-18 | [DOC] Enhanced RDoc for MatchData (#5818) | Burdette Lamar
Treats: #regexp #names #size #offset Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-18 | [DOC] Enhanced RDoc for Regexp (#5815) | Burdette Lamar
Treats: ::new ::escape ::try_convert ::union ::last_match Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-16 | [DOC] Enhanced RDoc for Regexp (#5812) | Burdette Lamar
Treats: #fixed_encoding? #hash #== #=~ #match #match? Also, in regexp.rdoc: changes the heading 'Special Global Variables' to 'Regexp Global Variables' and adds a small section 'Regexp Interpolation'. Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-15 | [DOC] Enhanced RDoc for Regexp (#5807) | Burdette Lamar
Treats: #source #inspect #to_s #casefold? #options #names #named_captures Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-31 | Return only captured range in `MatchData` [Bug #18670] | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/5740 Merged-By: nobu <nobu@ruby-lang.org>
2022-03-31 | re.c: stop a wrong warning of "flags ignored" on Regexp.new(//) | Yusuke Endoh
[Bug #18669]
2022-03-30 | internal/ractor.h: Added | Yusuke Endoh
Currently it has only one function prototype. Notes: Merged: https://github.com/ruby/ruby/pull/5703
2022-03-30 | re.c: raise Regexp::TimeoutError instead of RuntimeError | Yusuke Endoh
Notes: Merged: https://github.com/ruby/ruby/pull/5703
2022-03-30 | re.c: Add `timeout` keyword for Regexp.new and Regexp#timeout | Yusuke Endoh
Notes: Merged: https://github.com/ruby/ruby/pull/5703
2022-03-30 | re.c: Add Regexp.timeout= and Regexp.timeout | Yusuke Endoh
[Feature #17837] Notes: Merged: https://github.com/ruby/ruby/pull/5703
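The three timeout commits above add a process-wide default (`Regexp.timeout=`), a per-object `timeout:` keyword, and a dedicated `Regexp::TimeoutError`. The sketch below is hedged. It assumes a Ruby that ships these APIs (3.2 and later) and returns `nil` elsewhere. `configure_timeouts` and `safe_match?` are hypothetical helpers.

```ruby
# Hypothetical helper: set a global default, build a regexp with its own
# limit, report both values, then restore the previous global default.
def configure_timeouts
  return nil unless Regexp.respond_to?(:timeout=) # APIs are absent before 3.2
  previous = Regexp.timeout
  begin
    Regexp.timeout = 2.0                 # seconds, Float or nil
    re = Regexp.new("a+b", timeout: 0.5) # per-regexp override
    { global: Regexp.timeout, local: re.timeout }
  ensure
    Regexp.timeout = previous
  end
end

# Hypothetical helper: treat a timed-out match as "no match".
def safe_match?(re, str)
  re.match?(str)
rescue Regexp::TimeoutError # constant is only looked up if an exception occurs
  false
end

p configure_timeouts         # global 2.0 and local 0.5 on 3.2+, nil otherwise
p safe_match?(/a+b/, "aaab") # prints true
```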
2022-02-19 | Add String#byteindex, String#byterindex, and MatchData#byteoffset (#5518) | Shugo Maeda
* Add String#byteindex, String#byterindex, and MatchData#byteoffset [Feature #13110]

Co-authored-by: NARUSE, Yui <naruse@airemix.jp> Notes: Merged-By: shugo <shugo@ruby-lang.org>
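For non-ASCII strings, character offsets (`MatchData#offset`) and byte offsets differ, and `MatchData#byteoffset` exposes the byte offsets. This minimal sketch derives byte offsets from character offsets on any Ruby. `byte_offsets` is a hypothetical helper. The comparison with the built-in method runs only where that method exists.

```ruby
# Hypothetical helper: [start, end] byte offsets of capture n, or
# [nil, nil] when the group did not participate (like MatchData#offset).
def byte_offsets(m, n = 0)
  start = m.begin(n)
  return [nil, nil] if start.nil?
  [m.string[0, start].bytesize, m.string[0, m.end(n)].bytesize]
end

m = /wörld/.match("héllo wörld")
p m.offset(0)     # prints [6, 11] (characters)
p byte_offsets(m) # prints [7, 13] ("é" and "ö" each take two bytes in UTF-8)
p m.byteoffset(0) if m.respond_to?(:byteoffset) # built-in, where available
```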
2022-02-18 | LONG2NUM() should be used for rmatch_offset::{beg,end} | Shugo Maeda
https://github.com/ruby/ruby/pull/5518#discussion_r809645406
2022-02-08 | [DOC] Fix broken links to literals.rdoc | Nobuyoshi Nakada
2022-01-17 | Replace to RBOOL macro | S-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/5449
2021-12-03 | Adding links to literals and Kernel (#5192) | Burdette Lamar
* Adding links to literals and Kernel

Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2021-10-03 | Using NIL_P macro instead of `== Qnil` | S.H
Notes: Merged: https://github.com/ruby/ruby/pull/4925 Merged-By: nobu <nobu@ruby-lang.org>
2021-10-01 | Avoid race condition in Regexp#match | Jeremy Evans
In certain conditions, Regexp#match could return a MatchData with missing captures. This seems to require, at the least, multiple threads calling a method that calls the same block/proc/lambda which calls Regexp#match.

The race condition happens because the MatchData is passed indirectly via the backref, and other threads can modify the backref.

Fix the issue by:

1. Not reusing the existing MatchData from the backref, and always allocating a new MatchData.
2. Passing the MatchData directly to the caller using a VALUE*, instead of indirectly through the backref.

It's likely that variants of this issue exist for other Regexp methods. Anywhere that MatchData is passed implicitly through the backref is probably vulnerable to this issue.

Fixes [Bug #17507] Notes: Merged: https://github.com/ruby/ruby/pull/4734
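The fix applies the same principle a caller can use: take the MatchData from the return value instead of reading it back from the backref. This minimal sketch shows that safe pattern; it does not reproduce the original race. `extract_year` and the sample strings are hypothetical.

```ruby
YEAR = /(\d{4})/

# Hypothetical helper: use the MatchData returned by Regexp#match directly,
# instead of reading $~ afterwards, so the result is tied to this call.
def extract_year(str)
  m = YEAR.match(str) # MatchData or nil
  m && m[1]
end

threads = 4.times.map do |i|
  Thread.new { extract_year("released in #{2000 + i}") }
end
years = threads.map(&:value) # Thread#value joins and returns the block result
p years # prints ["2000", "2001", "2002", "2003"]
```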
2021-09-16 | [Feature #18172] Add MatchData#match_length | Nobuyoshi Nakada
Returns the length of the matched substring corresponding to the given argument. Notes: Merged: https://github.com/ruby/ruby/pull/4851
2021-09-16 | [Feature #18172] Add MatchData#match | Nobuyoshi Nakada
Returns the single matched substring corresponding to the given argument. Notes: Merged: https://github.com/ruby/ruby/pull/4851
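`MatchData#match(n)` and `MatchData#match_length(n)` return a single capture and its length, with `nil` for a group that did not participate. The sketch below expresses the same results with `MatchData#[]`. `capture_and_length` is a hypothetical helper. The built-in methods are only called where they exist.

```ruby
m = /(.)(.)(\d+)(\d)(\w)?/.match("THX1138.")

# Hypothetical helper: the capture string and its length, or [nil, nil]
# when group n took no part in the match.
def capture_and_length(m, n)
  s = m[n]
  [s, s && s.length]
end

p capture_and_length(m, 3) # prints ["113", 3]
p capture_and_length(m, 5) # prints [nil, nil]: (\w)? matched nothing

if m.respond_to?(:match_length)
  p [m.match(3), m.match_length(3)] # prints ["113", 3]
end
```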
2021-09-15 | Refactor and Using RBOOL macro | S.H
Notes: Merged: https://github.com/ruby/ruby/pull/4837 Merged-By: nobu <nobu@ruby-lang.org>
2021-09-12 | Extract backref_number_check | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/4822
2021-09-12 | Preserve the encoding of the argument in IndexError [Bug #18160] | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/4822
2021-09-01 | Show default argument explicitly for Regexp#match? [ci skip] | Martin Dürst
2021-09-01 | Fix minor grammar issue in documentation of Regexp#match? [ci skip] | Martin Dürst
2021-08-02 | Using RBOOL macro | S.H
Notes: Merged: https://github.com/ruby/ruby/pull/4695 Merged-By: nobu <nobu@ruby-lang.org>
2021-06-03 | Warn more duplicate literal hash keys | Nobuyoshi Nakada
Following non-special_const literals:

* T_REGEXP

Notes: Merged: https://github.com/ruby/ruby/pull/4548
2021-06-01 | Add static modifier to C function in re.c (#3153) | S.H
* add static modifier for rb_reg_eqq func
* add static modifier for rb_check_regexp_type func

Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2021-02-07 | [DOC] {Array,MatchData}#values_at understand ranges [ci skip] | Nobuyoshi Nakada
2021-01-05 | [DOC] Fix grammar: "is same as" -> "is the same as" | Marcus Stollsteimer
2020-12-18 | Use category: :deprecated in warnings that are related to deprecation | Jeremy Evans
Also document that both :deprecated and :experimental are supported :category option values.

The locations where warnings were marked as deprecation warnings were previously reviewed by shyouhei.

Comment a couple of locations where deprecation warnings should probably be used but currently are not, because deprecation warnings have not yet been enabled at the time they are called (RUBY_FREE_MIN, RUBY_HEAP_MIN_SLOTS, -K).

Add assert_deprecated_warn to test assertions. Use this to simplify some tests, and fix failing tests after marking some warnings with the deprecated category.

Notes: Merged: https://github.com/ruby/ruby/pull/3917
2020-11-28 | [Feature #17136] Remove special behavior from $KCODE | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/3483
2020-10-27 | freeze dynamic regexp literals | Koichi Sasada
Regexp literals are frozen, and now dynamically compiled Regexp literals (/#{expr}/) are frozen as well. Notes: Merged: https://github.com/ruby/ruby/pull/3676
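This minimal sketch illustrates the frozen-literal behaviour described above. `dynamic_regexp` is a hypothetical helper. The `frozen?` result is only expected to be `true` on Ruby 3.0 or later, the release line that followed this commit.

```ruby
# Hypothetical helper building an interpolated ("dynamic") regexp literal.
def dynamic_regexp(word)
  /\A#{Regexp.escape(word)}\z/
end

re = dynamic_regexp("a.b")
p re.frozen?       # true on Ruby 3.0 and later
p re.match?("a.b") # prints true
p re.match?("axb") # prints false: Regexp.escape quoted the "."
```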
2020-10-20 | Some global variables can be accessed from ractors | Koichi Sasada
Some global variables should be used from non-main Ractors. [Bug #17268]

```ruby
# ractor-local (derived from created ractor): debug
'$DEBUG' => $DEBUG,
'$-d' => $-d,

# ractor-local (derived from created ractor): verbose
'$VERBOSE' => $VERBOSE,
'$-w' => $-w,
'$-W' => $-W,
'$-v' => $-v,

# process-local (readonly): other commandline parameters
'$-p' => $-p,
'$-l' => $-l,
'$-a' => $-a,

# process-local (readonly): getpid
'$$' => $$,

# thread local: process result
'$?' => $?,

# scope local: match
'$~' => $~.inspect,
'$&' => $&,
'$`' => $`,
'$\'' => $',
'$+' => $+,
'$1' => $1,

# scope local: last line
'$_' => $_,

# scope local: last backtrace
'$@' => $@,
'$!' => $!,

# ractor local: stdin, out, err
'$stdin' => $stdin.inspect,
'$stdout' => $stdout.inspect,
'$stderr' => $stderr.inspect,
```

Notes: Merged: https://github.com/ruby/ruby/pull/3670
2020-08-28 | Try to fix compile error on windows | Kazuhiro NISHIYAMA
https://github.com/ruby/ruby/runs/1041040167?check_suite_focus=true#step:11:177

```
compiling ../src/re.c
re.c
../src/re.c(317): error C2057: expected constant expression
../src/re.c(317): error C2466: cannot allocate an array of constant size 0
../src/re.c(467): error C2057: expected constant expression
../src/re.c(467): error C2466: cannot allocate an array of constant size 0
../src/re.c(467): error C2133: 'opts': unknown size
../src/re.c(559): error C2057: expected constant expression
../src/re.c(559): error C2466: cannot allocate an array of constant size 0
../src/re.c(559): error C2133: 'optbuf': unknown size
../src/re.c(673): error C2057: expected constant expression
../src/re.c(673): error C2466: cannot allocate an array of constant size 0
../src/re.c(673): error C2133: 'opts': unknown size
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.27.29110\bin\HostX64\x64\cl.EXE"' : return code '0x2'
Stop.
```