path: root/ext/strscan/strscan.c
Age | Commit message | Author
2024-12-16 | [ruby/strscan] [DOC] Add syntax highlighting to Markdown code blocks | Alexander Momchilov
(https://github.com/ruby/strscan/pull/126) Split off from https://github.com/ruby/ruby/pull/12322 https://github.com/ruby/strscan/commit/9bee37e0f5
2024-12-16 | [ruby/strscan] Bump version | Sutou Kouhei
https://github.com/ruby/strscan/commit/fd140b8582
2024-12-12 | Lock released version of strscan-3.1.1 | Hiroshi SHIBATA
2024-12-02 | [ruby/strscan] Micro-optimize encoding checks | Jean Boussier
(https://github.com/ruby/strscan/pull/117)

Profiling shows a lot of time spent in various encoding check functions. I'm working on optimizing them on the Ruby side, but if we assume most strings use one of the three simple encodings, we can skip a lot of overhead.

```ruby
require 'strscan'
require 'benchmark/ips'

source = 10_000.times.map { rand(9999999).to_s }.join(",").force_encoding(Encoding::UTF_8).freeze

def scan_to_i(source)
  scanner = StringScanner.new(source)
  while number = scanner.scan(/\d+/)
    number.to_i
    scanner.skip(",")
  end
end

def scan_integer(source)
  scanner = StringScanner.new(source)
  while scanner.scan_integer
    scanner.skip(",")
  end
end

Benchmark.ips do |x|
  x.report("scan.to_i") { scan_to_i(source) }
  x.report("scan_integer") { scan_integer(source) }
  x.compare!
end
```

Before:

```
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/strscan/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
scan.to_i 93.000 i/100ms
scan_integer 232.000 i/100ms
Calculating -------------------------------------
scan.to_i 933.191 (± 0.2%) i/s (1.07 ms/i) - 4.743k in 5.082597s
scan_integer 2.326k (± 0.8%) i/s (429.99 μs/i) - 11.832k in 5.087974s

Comparison:
scan_integer: 2325.6 i/s
scan.to_i: 933.2 i/s - 2.49x slower
```

After:

```
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/strscan/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
scan.to_i 96.000 i/100ms
scan_integer 274.000 i/100ms
Calculating -------------------------------------
scan.to_i 969.489 (± 0.2%) i/s (1.03 ms/i) - 4.896k in 5.050114s
scan_integer 2.756k (± 0.1%) i/s (362.88 μs/i) - 13.974k in 5.070837s

Comparison:
scan_integer: 2755.8 i/s
scan.to_i: 969.5 i/s - 2.84x slower
```

https://github.com/ruby/strscan/commit/c02b1ce684
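The fast-path idea can be illustrated at the Ruby level. This is a sketch, not the C code from the PR: the commit message does not name the three encodings, so treating BINARY, UTF-8 and US-ASCII as the "simple" ones is an assumption, as is modelling the skipped work as an ASCII-compatibility check (analogous to `rb_must_asciicompat` in C). `must_ascii_compat!` is a hypothetical helper name.

```ruby
# Assumed set of common encodings that are known to be ASCII-compatible.
SIMPLE_ENCODINGS = [Encoding::BINARY, Encoding::UTF_8, Encoding::US_ASCII].freeze

# Hypothetical helper: raise unless +str+ uses an ASCII-compatible encoding.
def must_ascii_compat!(str)
  enc = str.encoding
  # Fast path: the most common encodings are known to be fine, so the
  # general check is skipped entirely.
  return if SIMPLE_ENCODINGS.include?(enc)

  # Slow path: the general check for any other encoding.
  unless enc.ascii_compatible?
    raise Encoding::CompatibilityError, "ASCII incompatible encoding: #{enc}"
  end
end
```

The payoff comes from the fact that almost all strings hit the first `return`, so the more general lookup only runs for unusual encodings such as EUC-JP or UTF-16.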
2024-12-02 | StringScanner#scan_integer: support base 16 integers (#116) | Jean Boussier
Followup: https://github.com/ruby/strscan/pull/115

`scan_integer` is now implemented in Ruby so as to handle keyword arguments efficiently without allocating a Hash. Given that the goal of `scan_integer` is to parse integers more efficiently without allocating an intermediary object, using `rb_scan_args` would defeat the purpose.

Additionally, the C implementation now uses `rb_isdigit` and `rb_isxdigit`, because on Windows `isdigit` is locale-dependent.
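A pure-Ruby approximation of a `base:` keyword interface might look like the following. `scan_int` and `INTEGER_PATTERNS` are hypothetical names; the exact syntax accepted for base 16 (here an optional sign and an optional `0x` prefix) is an assumption, and unlike the real method this version allocates a substring before converting it.

```ruby
require "strscan"

# Hypothetical per-base patterns; the optional "0x" prefix is an assumption.
INTEGER_PATTERNS = {
  10 => /[+-]?\d+/,
  16 => /[+-]?(?:0x)?\h+/i,
}.freeze

# Hypothetical stand-in for scan_integer(base:). The keyword argument is
# handled by Ruby itself, so no options Hash has to be parsed in C.
def scan_int(scanner, base: 10)
  pattern = INTEGER_PATTERNS.fetch(base) do
    raise ArgumentError, "Unsupported integer base: #{base.inspect}"
  end
  # Returns nil (and leaves the scan pointer alone) when nothing matches.
  matched = scanner.scan(pattern)
  # String#to_i(16) itself accepts a sign and an optional "0x" prefix.
  matched && matched.to_i(base)
end
```

For example, scanning `"ff,-0x1F"` with `base: 16` yields `255`, then (after skipping the comma) `-31`.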
2024-11-27 | [ruby/strscan] Implement #scan_integer to efficiently parse Integer | Jean Boussier
(https://github.com/ruby/strscan/pull/115)

Fix: https://github.com/ruby/strscan/issues/113

This allows parsing an Integer directly from a String without first allocating a substring.

Notes: The implementation is deliberately limited; it's meant as a first step, so only the most straightforward case, base 10 integers, is supported.

https://github.com/ruby/strscan/commit/6a3c74b4c8
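To illustrate what parsing "without first allocating a substring" means, here is a pure-Ruby sketch that reads digits with `String#getbyte` and builds the value arithmetically. `scan_decimal` is a hypothetical helper, not the library's implementation, and it assumes an ASCII-compatible string.

```ruby
require "strscan"

# Hypothetical helper: parse an optionally signed base-10 integer at the scan
# pointer by reading bytes directly, so no intermediate substring is created.
def scan_decimal(scanner)
  str = scanner.string
  pos = scanner.pos
  negative = false

  byte = str.getbyte(pos)
  if byte == 0x2D || byte == 0x2B # '-' or '+'
    negative = (byte == 0x2D)
    pos += 1
  end

  start = pos
  value = 0
  while (byte = str.getbyte(pos)) && byte >= 0x30 && byte <= 0x39 # '0'..'9'
    value = value * 10 + (byte - 0x30)
    pos += 1
  end
  return nil if pos == start # no digits: leave the scan pointer unchanged

  scanner.pos = pos
  negative ? -value : value
end
```

Doing the equivalent work in C avoids even the per-byte method calls, which is where the speedup in the later benchmark comes from.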
2024-10-26 | [ruby/strscan] [CRuby] Optimize `strscan_do_scan()`: Remove unnecessary use of `rb_enc_get()` | NAITOH Jun
(https://github.com/ruby/strscan/pull/108)

- before: #106

## Why?

In `rb_strseq_index()`, the result of `rb_enc_check()` is used, so the encoding returned by `rb_enc_check()` can be used directly and a separate `rb_enc_get()` call is unnecessary.

- https://github.com/ruby/ruby/blob/6c7209cd3788ceec01e504d99057f9d3b396be84/string.c#L4335-L4368

> enc = rb_enc_check(str, sub);
> return strseq_core(str_ptr, str_ptr_end, str_len, sub_ptr, sub_len, offset, enc);

- https://github.com/ruby/ruby/blob/6c7209cd3788ceec01e504d99057f9d3b396be84/string.c#L4309-L4318

```C
strseq_core(const char *str_ptr, const char *str_ptr_end, long str_len,
            const char *sub_ptr, long sub_len,
            long offset, rb_encoding *enc)
{
    const char *search_start = str_ptr;
    long pos, search_len = str_len - offset;

    for (;;) {
        const char *t;
        pos = rb_memsearch(sub_ptr, sub_len, search_start, search_len, enc);
```

## Benchmark

It shows that a String pattern is 1.24x faster than a Regexp pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 9.225M i/s - 9.328M times in 1.011068s (108.40ns/i)
regexp_var 9.327M i/s - 9.413M times in 1.009214s (107.21ns/i)
string 9.200M i/s - 9.355M times in 1.016840s (108.70ns/i)
string_var 11.249M i/s - 11.255M times in 1.000578s (88.90ns/i)
Calculating -------------------------------------
regexp 9.565M i/s - 27.676M times in 2.893476s (104.55ns/i)
regexp_var 10.111M i/s - 27.982M times in 2.767496s (98.90ns/i)
string 10.060M i/s - 27.600M times in 2.743465s (99.40ns/i)
string_var 12.519M i/s - 33.746M times in 2.695615s (79.88ns/i)

Comparison:
string_var: 12518707.2 i/s
regexp_var: 10111089.6 i/s - 1.24x slower
string: 10060144.4 i/s - 1.24x slower
regexp: 9565124.4 i/s - 1.31x slower
```

https://github.com/ruby/strscan/commit/ff2d7afa19
2024-10-26 | [ruby/strscan] Use C90 as long as Ruby 2.6 or earlier is supported | Nobuyoshi Nakada
(https://github.com/ruby/strscan/pull/101) https://github.com/ruby/strscan/commit/d31274f41b
2024-09-17 | [ruby/strscan] Accept String as a pattern at non-head positions | NAITOH Jun
(https://github.com/ruby/strscan/pull/106)

This adds support for String patterns in non-head match cases such as StringScanner#scan_until. Using a String as a pattern can improve match performance. Here are the results of the included benchmark.

## CRuby

It shows that a String pattern is 1.18x faster than a Regexp pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 9.403M i/s - 9.548M times in 1.015459s (106.35ns/i)
regexp_var 9.162M i/s - 9.248M times in 1.009479s (109.15ns/i)
string 8.966M i/s - 9.274M times in 1.034343s (111.54ns/i)
string_var 11.051M i/s - 11.190M times in 1.012538s (90.49ns/i)
Calculating -------------------------------------
regexp 10.319M i/s - 28.209M times in 2.733707s (96.91ns/i)
regexp_var 10.032M i/s - 27.485M times in 2.739807s (99.68ns/i)
string 9.681M i/s - 26.897M times in 2.778397s (103.30ns/i)
string_var 12.162M i/s - 33.154M times in 2.726046s (82.22ns/i)

Comparison:
string_var: 12161920.6 i/s
regexp: 10318949.7 i/s - 1.18x slower
regexp_var: 10031617.6 i/s - 1.21x slower
string: 9680843.7 i/s - 1.26x slower
```

## JRuby

It shows that a String pattern is 2.11x faster than a Regexp pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 7.591M i/s - 7.544M times in 0.993780s (131.74ns/i)
regexp_var 6.143M i/s - 6.125M times in 0.997038s (162.77ns/i)
string 14.135M i/s - 14.079M times in 0.996067s (70.75ns/i)
string_var 14.079M i/s - 14.057M times in 0.998420s (71.03ns/i)
Calculating -------------------------------------
regexp 9.409M i/s - 22.773M times in 2.420268s (106.28ns/i)
regexp_var 10.116M i/s - 18.430M times in 1.821820s (98.85ns/i)
string 21.389M i/s - 42.404M times in 1.982519s (46.75ns/i)
string_var 20.897M i/s - 42.237M times in 2.021187s (47.85ns/i)

Comparison:
string: 21389191.1 i/s
string_var: 20897327.5 i/s - 1.02x slower
regexp_var: 10116464.7 i/s - 2.11x slower
regexp: 9409222.3 i/s - 2.27x slower
```

See: https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1736

---------

https://github.com/ruby/strscan/commit/f9d96c446a

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
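A String pattern is matched literally rather than as a regular expression. On strscan versions without this change, the same semantics can be obtained by escaping the String into a Regexp, as in the sketch below. `scan_until_literal` is a hypothetical helper; note that it still goes through the regexp engine, so it does not deliver the speedup measured above.

```ruby
require "strscan"

# Hypothetical fallback: match +needle+ literally (so "." is not a wildcard)
# anywhere after the scan pointer, like scan_until with a String pattern.
def scan_until_literal(scanner, needle)
  scanner.scan_until(Regexp.new(Regexp.escape(needle)))
end
```

Scanning `"axb a.b rest"` for `"a.b"` skips `"axb"` (the dot must be a literal dot) and returns `"axb a.b"`, leaving `" rest"` unscanned.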
2024-08-31 | Added pre-release suffix for development version of default gems | Hiroshi SHIBATA
https://github.com/ruby/stringio/issues/81
2024-06-04 | Sync strscan HEAD again. | Hiroshi SHIBATA
https://github.com/ruby/strscan/pull/99 split out the documentation that contains multi-byte characters.
2024-05-30 | Revert "[ruby/strscan] Doc for StringScanner" | Hiroshi SHIBATA
This reverts commit 974ed1408c516d1e8f992f0b304e2de6f8bd5c1f.
2024-05-30 | Revert "Fix reference path for strscan documentation" | Hiroshi SHIBATA
This reverts commit 1fa93fb9488a32018101689fd727965fd5874eb5.
2024-05-30 | Fix reference path for strscan documentation | Hiroshi SHIBATA
2024-05-30 | [ruby/strscan] Doc for StringScanner | Burdette Lamar
(https://github.com/ruby/strscan/pull/96) #peek_byte and #scan_byte not updated (not available in my repo -- sorry). --------- https://github.com/ruby/strscan/commit/0123da7352 Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
2024-02-26 | [ruby/strscan] Add a method for peeking and reading bytes as integers | Aaron Patterson
(https://github.com/ruby/strscan/pull/89)

This commit adds `scan_byte` and `peek_byte`. `scan_byte` will scan the current byte, return it as an integer, and advance the cursor. `peek_byte` will return the current byte as an integer without advancing the cursor.

Currently `StringScanner#get_byte` returns a string, but I want to get the current byte without allocating a string. I think this will help with writing high-performance lexers.

---------

https://github.com/ruby/strscan/commit/873aba2e5d

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
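The behavior described above can be approximated on older strscan versions with `String#getbyte`, which also returns an Integer without allocating a String. `peek_byte_compat` and `scan_byte_compat` are hypothetical names; unlike the real `scan_byte`, this sketch does not update the scanner's match data (so `#matched` is not set).

```ruby
require "strscan"

# Hypothetical stand-in for peek_byte: the byte at the scan pointer as an
# Integer (nil at the end of the string); the pointer does not move.
def peek_byte_compat(scanner)
  scanner.string.getbyte(scanner.pos)
end

# Hypothetical stand-in for scan_byte: like peek_byte_compat, but advances
# the scan pointer by one byte when a byte was read.
def scan_byte_compat(scanner)
  byte = peek_byte_compat(scanner)
  scanner.pos += 1 if byte
  byte
end
```

A lexer can then dispatch on integer byte values (for example `97` for `"a"`) instead of comparing one-character strings.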
2024-02-08 | [ruby/strscan] Bump version | Sutou Kouhei
https://github.com/ruby/strscan/commit/ba338b882c
2024-02-08 | [ruby/strscan] Bump version | Sutou Kouhei
https://github.com/ruby/strscan/commit/842845af1f
2024-01-19 | [ruby/strscan] Bump version | Sutou Kouhei
https://github.com/ruby/strscan/commit/d6f97ec102
2024-01-14 | [ruby/strscan] StringScanner#captures: Return nil not "" for unmatched capture | NAITOH Jun
(https://github.com/ruby/strscan/pull/72)

Fix https://github.com/ruby/strscan/issues/70

If there is no substring matching a group (here s[3]), `#captures` behaves differently from `#[]`. When no substring matches the group, the corresponding element should be nil, as s[3] already is.

Before:

```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan /(foo)(bar)(BAZ)?/ #=> "foobar"
s[0] #=> "foobar"
s[1] #=> "foo"
s[2] #=> "bar"
s[3] #=> nil
s.captures #=> ["foo", "bar", ""]
s.captures.compact #=> ["foo", "bar", ""]
```

After:

```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan /(foo)(bar)(BAZ)?/ #=> "foobar"
s[0] #=> "foobar"
s[1] #=> "foo"
s[2] #=> "bar"
s[3] #=> nil
s.captures #=> ["foo", "bar", nil]
s.captures.compact #=> ["foo", "bar"]
```

https://docs.ruby-lang.org/ja/latest/method/MatchData/i/captures.html

```
/(foo)(bar)(BAZ)?/ =~ "foobarbaz" #=> 0
$~.to_a #=> ["foobar", "foo", "bar", nil]
$~.captures #=> ["foo", "bar", nil]
$~.captures.compact #=> ["foo", "bar"]
```

* StringScanner#captures is not yet documented. https://docs.ruby-lang.org/ja/latest/class/StringScanner.html

https://github.com/ruby/strscan/commit/1fbfdd3c6f
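Code that must behave the same on strscan versions before and after this change can build the captures from `#size` and `#[]`, which already return nil for a group that did not participate. `captures_compat` is a hypothetical helper, sketched under that assumption.

```ruby
require "strscan"

# Hypothetical helper: the capture groups of the last match, with nil for
# groups that did not participate (the behavior this commit gives #captures).
def captures_compat(scanner)
  return nil unless scanner.matched?
  # #size counts the whole match plus every group, so groups are 1...size.
  (1...scanner.size).map { |i| scanner[i] }
end
```

For the `'foobarbaz'` example this yields `["foo", "bar", nil]`, the same as `MatchData#captures`.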
2023-12-25 | Revert "Rollback to released version numbers of stringio and strscan" | Hiroshi SHIBATA
This reverts commit 6a79e53823e328281b9e9eee53cd141af28f8548.
2023-12-16 | Rollback to released version numbers of stringio and strscan | Hiroshi SHIBATA
2023-11-08 | [ruby/strscan] Bump version | Sutou Kouhei
https://github.com/ruby/strscan/commit/1b3393be05
2023-07-28 | [ruby/strscan] Fix indentation in strscan.c | Peter Zhu
[ci skip]
2023-07-27 | Add function rb_reg_onig_match | Peter Zhu
rb_reg_onig_match performs preparation, error handling, and cleanup for matching a regex against a string. This reduces repetitive code and removes the need for StringScanner to access internal data of regex. Notes: Merged: https://github.com/ruby/ruby/pull/8123
2023-07-27 | [ruby/strscan] Sync missed commit | Peter Zhu
Syncs commit ruby/strscan@76b377a5d875ec77282d9319d62d8f24fe283b40.
2023-02-21 | [ruby/strscan] Bump version | Sutou Kouhei
https://github.com/ruby/strscan/commit/681cde0f27
2023-02-21 | [ruby/strscan] Mention return value of `rest?` in the doc | OKURA Masafumi
(https://github.com/ruby/strscan/pull/49) The doc of `rest?` was unclear about its return value. This commit documents the return value.
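For reference, `rest?` returns `true` while unscanned input remains and `false` once the scan pointer reaches the end, i.e. the negation of `eos?`:

```ruby
require "strscan"

scanner = StringScanner.new("ab")
before = scanner.rest? # true: nothing has been consumed yet
scanner.scan(/ab/)
after = scanner.rest?  # false: the scan pointer is at the end
```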
2022-12-26 | [ruby/strscan] Bump version | Sutou Kouhei
https://github.com/ruby/strscan/commit/3ada12613d Notes: Merged: https://github.com/ruby/ruby/pull/7025
2022-12-09 | Merge strscan-3.0.5 | Hiroshi SHIBATA
Notes: Merged: https://github.com/ruby/ruby/pull/6890
2021-10-24 | [ruby/strscan] Bump version | Sutou Kouhei
If we use the same version as the default strscan gem bundled with Ruby, "gem install" doesn't extract the .gem, and then fails because it can't find ext/strscan/ to build. https://github.com/ruby/strscan/commit/3ceafa6cdc Notes: Merged: https://github.com/ruby/ruby/pull/5011
2021-05-06 | [ruby/strscan] Replace "iff" with "if and only if" (#18) | Gannon McGibbon
iff means if and only if, but readers without that knowledge might assume this to be a spelling mistake. To me, this seems like exclusionary language that is unnecessary. Simply using "if and only if" instead should suffice. https://github.com/ruby/strscan/commit/066451c11e
2021-05-06 | [ruby/strscan] Fix segmentation fault of `StringScanner#charpos` when `String#byteslice` returns a non-String value [Bug #17756] (#20) | Kenichi Kamiya
https://github.com/ruby/strscan/commit/92961cde2b
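For context, `#pos` counts bytes while `#charpos` counts characters, which is why `charpos` has to inspect the string's contents (as the commit title implies, via `String#byteslice`). A small illustration:

```ruby
require "strscan"

scanner = StringScanner.new("ärger")
scanner.scan(/ä/)
byte_pos = scanner.pos     # 2: "ä" is two bytes in UTF-8
char_pos = scanner.charpos # 1: but it is a single character
```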
2021-02-10 | Update class documentation for StringScanner | Jeremy Evans
The [] method wasn't being displayed; this also tries to fix the formatting for bol? and << (even though they aren't linked). Fixes [Bug #17620]
2020-12-18 | [strscan] Fix license comment and files | Kenta Murata
https://github.com/ruby/strscan/commit/a999f2c6d1
2020-12-18 | [strscan] Version 3.0.0 | Kenta Murata
https://github.com/ruby/strscan/commit/08645e4e77
2020-12-18 | [strscan] Make strscan Ractor safe (#17) | Kenta Murata
* Make strscan Ractor safe
* Add test-unit in the development dependencies

https://github.com/ruby/strscan/commit/3c93c2bebe
2020-10-02 | mark regex internal to string scanner | Aaron Patterson
Notes: Merged: https://github.com/ruby/ruby/pull/3623
2020-09-02 | Document that StringScanner#matched_size returns size in bytes [ci skip] | Jeremy Evans
Fixes [Bug #17139]
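The byte/character distinction only shows up with multi-byte text. In this short example the single matched character occupies two bytes in UTF-8:

```ruby
require "strscan"

scanner = StringScanner.new("éa")
scanner.scan(/é/)
matched_bytes = scanner.matched_size   # 2: size in bytes
matched_chars = scanner.matched.length # 1: size in characters
```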
2020-08-31 | [ruby/strscan] Bump version | Sutou Kouhei
https://github.com/ruby/strscan/commit/df90d541fa
2020-08-31 | [ruby/strscan] Replaced examples using $KCODE with encodings | Nobuyoshi Nakada
`$KCODE` has been deprecated and has had no effect for years. https://github.com/ruby/strscan/commit/7c4dbd4cb3
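Without `$KCODE`, the string's own encoding is what tells StringScanner how wide each character is. The sketch below uses EUC-JP, chosen only for illustration; it is not necessarily the encoding used in the updated examples.

```ruby
require "strscan"

# The string's encoding (EUC-JP here), not a global setting, determines how
# #getch splits multi-byte characters.
scanner = StringScanner.new("\u3042\u3044".encode("EUC-JP"))
first = scanner.getch    # "あ" in EUC-JP: one character, two bytes
byte  = scanner.get_byte # always exactly one byte, whatever the encoding
```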
2020-04-08 | Suppress -Wshorten-64-to-32 warnings | Nobuyoshi Nakada
2019-11-18 | [ruby/strscan] Remove taint support | Jeremy Evans
Ruby 2.7 deprecates taint and it no longer has an effect. The lack of taint support should not cause a problem in previous Ruby versions. Notes: Merged: https://github.com/ruby/ruby/pull/2476
2019-10-14 | Fixed overflow at onig_region_set | Nobuyoshi Nakada
To work around a bug in `onig_region_set`, which takes `int`s instead of `OnigPosition`s, set the elements of the `beg` and `end` members directly for the time being.
2019-10-14 | Import StringScanner 1.0.3 (#2553) | Sutou Kouhei
Notes: Merged-By: kou <kou@clear-code.com>
2018-02-16 | no ID cache in Init functions | nobu
Init functions are called only once, so the cache is useless. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62429 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-08 | ext/strscan/strscan.c: [DOC] grammar fixes | stomar
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-29 | strscan.c: add MatchData-like methods | nobu
* ext/strscan/strscan.c: added `size`, `captures` and `values_at` to StringScanner, shorthands of accessing the matched data. based on the patch by apeiros (Stefan Rusterholz) at [ruby-core:20412]. [Feature #836] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60929 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
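The three shorthands can be exercised together on a single match. This example is written in the style of the StringScanner reference documentation; the expected values are noted in the comments.

```ruby
require "strscan"

scanner = StringScanner.new("Fri Dec 12 1975 14:39")
scanner.scan(/(\w+) (\w+) (\d+) /)

size     = scanner.size                   # 4: the whole match plus three groups
captures = scanner.captures               # ["Fri", "Dec", "12"]
values   = scanner.values_at(0, -1, 5, 2) # ["Fri Dec 12 ", "12", nil, "Dec"]
```

As with `#[]`, a negative index counts back from the last group and an out-of-range index yields nil.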
2017-07-21 | strscan.c: fix segfault in aref | nobu
* ext/strscan/strscan.c (strscan_aref): fix segfault after get_byte or getch which do not apply regexp. [ruby-core:82116] [Bug #13759] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59384 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-28 | strscan.c: minl | nobu
* ext/strscan/strscan.c (minl): extract to reduce repeated S_LEN. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56282 b2dd03c8-39d4-4d8f-98ff-823fe69b080e