summaryrefslogtreecommitdiff
path: root/ext/strscan
AgeCommit message (Collapse)Author
2025-07-15[ruby/strscan] Run `have_func` with the header providing the declarationsNobuyoshi Nakada
https://github.com/ruby/strscan/commit/18c0a59b65
2025-07-15[ruby/strscan] Update extconf.rbNobuyoshi Nakada
(https://github.com/ruby/strscan/pull/158) - `have_func` includes "ruby.h" by default. - include "ruby/re.h" where `rb_reg_onig_match` is declared. https://github.com/ruby/strscan/commit/1ac96f47e9
2024-12-16[ruby/strscan] [DOC] Add syntax highlighting to MarkDown code blocksAlexander Momchilov
(https://github.com/ruby/strscan/pull/126) Split off from https://github.com/ruby/ruby/pull/12322 https://github.com/ruby/strscan/commit/9bee37e0f5
2024-12-16[ruby/strscan] Bump versionSutou Kouhei
https://github.com/ruby/strscan/commit/fd140b8582
2024-12-12Lock released version of strscan-3.1.1Hiroshi SHIBATA
2024-12-02Removed trailing spacesHiroshi SHIBATA
2024-12-02[ruby/strscan] Micro optimize encoding checksJean Boussier
(https://github.com/ruby/strscan/pull/117) Profiling shows a lot of time spent in various encoding check functions. I'm working on optimizing them on the Ruby side, but if we assume most strings are one of the simple 3 encodings, we can skip a lot of overhead. ```ruby require 'strscan' require 'benchmark/ips' source = 10_000.times.map { rand(9999999).to_s }.join(",").force_encoding(Encoding::UTF_8).freeze def scan_to_i(source) scanner = StringScanner.new(source) while number = scanner.scan(/\d+/) number.to_i scanner.skip(",") end end def scan_integer(source) scanner = StringScanner.new(source) while scanner.scan_integer scanner.skip(",") end end Benchmark.ips do |x| x.report("scan.to_i") { scan_to_i(source) } x.report("scan_integer") { scan_integer(source) } x.compare! end ``` Before: ``` ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/strscan/commit/be1089c8ec) +YJIT [arm64-darwin23] Warming up -------------------------------------- scan.to_i 93.000 i/100ms scan_integer 232.000 i/100ms Calculating ------------------------------------- scan.to_i 933.191 (± 0.2%) i/s (1.07 ms/i) - 4.743k in 5.082597s scan_integer 2.326k (± 0.8%) i/s (429.99 μs/i) - 11.832k in 5.087974s Comparison: scan_integer: 2325.6 i/s scan.to_i: 933.2 i/s - 2.49x slower ``` After: ``` ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/strscan/commit/be1089c8ec) +YJIT [arm64-darwin23] Warming up -------------------------------------- scan.to_i 96.000 i/100ms scan_integer 274.000 i/100ms Calculating ------------------------------------- scan.to_i 969.489 (± 0.2%) i/s (1.03 ms/i) - 4.896k in 5.050114s scan_integer 2.756k (± 0.1%) i/s (362.88 μs/i) - 13.974k in 5.070837s Comparison: scan_integer: 2755.8 i/s scan.to_i: 969.5 i/s - 2.84x slower ``` https://github.com/ruby/strscan/commit/c02b1ce684
2024-12-02StringScanner#scan_integer support base 16 integers (#116)Jean Boussier
Followup: https://github.com/ruby/strscan/pull/115 `scan_integer` is now implemented in Ruby as to efficiently handle keyword arguments without allocating a Hash. Given the goal of `scan_integer` is to more effciently parse integers without having to allocate an intermediary object, using `rb_scan_args` would defeat the purpose. Additionally, the C implementation now uses `rb_isdigit` and `rb_isxdigit`, because on Windows `isdigit` is locale dependent.
2024-11-27[ruby/strscan] Implement #scan_integer to efficiently parse IntegerJean Boussier
(https://github.com/ruby/strscan/pull/115) Fix: https://github.com/ruby/strscan/issues/113 This allows to directly parse an Integer from a String without needing to first allocate a sub string. Notes: The implementation is limited by design, it's meant as a first step, only the most straightforward, based 10 integers are supported. https://github.com/ruby/strscan/commit/6a3c74b4c8
2024-10-26[ruby/strscan] [CRuby] Optimize `strscan_do_scan()`: RemoveNAITOH Jun
unnecessary use of `rb_enc_get()` (https://github.com/ruby/strscan/pull/108) - before: #106 ## Why? In `rb_strseq_index()`, the result of `rb_enc_check()` is used. - https://github.com/ruby/ruby/blob/6c7209cd3788ceec01e504d99057f9d3b396be84/string.c#L4335-L4368 > enc = rb_enc_check(str, sub); > return strseq_core(str_ptr, str_ptr_end, str_len, sub_ptr, sub_len, offset, enc); - https://github.com/ruby/ruby/blob/6c7209cd3788ceec01e504d99057f9d3b396be84/string.c#L4309-L4318 ```C strseq_core(const char *str_ptr, const char *str_ptr_end, long str_len, const char *sub_ptr, long sub_len, long offset, rb_encoding *enc) { const char *search_start = str_ptr; long pos, search_len = str_len - offset; for (;;) { const char *t; pos = rb_memsearch(sub_ptr, sub_len, search_start, search_len, enc); ``` ## Benchmark It shows String as a pattern is 1.24x faster than Regexp as a pattern. ``` $ benchmark-driver benchmark/check_until.yaml Warming up -------------------------------------- regexp 9.225M i/s - 9.328M times in 1.011068s (108.40ns/i) regexp_var 9.327M i/s - 9.413M times in 1.009214s (107.21ns/i) string 9.200M i/s - 9.355M times in 1.016840s (108.70ns/i) string_var 11.249M i/s - 11.255M times in 1.000578s (88.90ns/i) Calculating ------------------------------------- regexp 9.565M i/s - 27.676M times in 2.893476s (104.55ns/i) regexp_var 10.111M i/s - 27.982M times in 2.767496s (98.90ns/i) string 10.060M i/s - 27.600M times in 2.743465s (99.40ns/i) string_var 12.519M i/s - 33.746M times in 2.695615s (79.88ns/i) Comparison: string_var: 12518707.2 i/s regexp_var: 10111089.6 i/s - 1.24x slower string: 10060144.4 i/s - 1.24x slower regexp: 9565124.4 i/s - 1.31x slower ``` https://github.com/ruby/strscan/commit/ff2d7afa19
2024-10-26[ruby/strscan] Use C90 as far as supporting 2.6 or earlierNobuyoshi Nakada
(https://github.com/ruby/strscan/pull/101) https://github.com/ruby/strscan/commit/d31274f41b
2024-09-17[ruby/strscan] Accept String as a pattern at non headNAITOH Jun
(https://github.com/ruby/strscan/pull/106) It supports non-head match cases such as StringScanner#scan_until. If we use a String as a pattern, we can improve match performance. Here is a result of the including benchmark. ## CRuby It shows String as a pattern is 1.18x faster than Regexp as a pattern. ``` $ benchmark-driver benchmark/check_until.yaml Warming up -------------------------------------- regexp 9.403M i/s - 9.548M times in 1.015459s (106.35ns/i) regexp_var 9.162M i/s - 9.248M times in 1.009479s (109.15ns/i) string 8.966M i/s - 9.274M times in 1.034343s (111.54ns/i) string_var 11.051M i/s - 11.190M times in 1.012538s (90.49ns/i) Calculating ------------------------------------- regexp 10.319M i/s - 28.209M times in 2.733707s (96.91ns/i) regexp_var 10.032M i/s - 27.485M times in 2.739807s (99.68ns/i) string 9.681M i/s - 26.897M times in 2.778397s (103.30ns/i) string_var 12.162M i/s - 33.154M times in 2.726046s (82.22ns/i) Comparison: string_var: 12161920.6 i/s regexp: 10318949.7 i/s - 1.18x slower regexp_var: 10031617.6 i/s - 1.21x slower string: 9680843.7 i/s - 1.26x slower ``` ## JRuby It shows String as a pattern is 2.11x faster than Regexp as a pattern. ``` $ benchmark-driver benchmark/check_until.yaml Warming up -------------------------------------- regexp 7.591M i/s - 7.544M times in 0.993780s (131.74ns/i) regexp_var 6.143M i/s - 6.125M times in 0.997038s (162.77ns/i) string 14.135M i/s - 14.079M times in 0.996067s (70.75ns/i) string_var 14.079M i/s - 14.057M times in 0.998420s (71.03ns/i) Calculating ------------------------------------- regexp 9.409M i/s - 22.773M times in 2.420268s (106.28ns/i) regexp_var 10.116M i/s - 18.430M times in 1.821820s (98.85ns/i) string 21.389M i/s - 42.404M times in 1.982519s (46.75ns/i) string_var 20.897M i/s - 42.237M times in 2.021187s (47.85ns/i) Comparison: string: 21389191.1 i/s string_var: 20897327.5 i/s - 1.02x slower regexp_var: 10116464.7 i/s - 2.11x slower regexp: 9409222.3 i/s - 2.27x slower ``` See: https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1736 --------- https://github.com/ruby/strscan/commit/f9d96c446a Co-authored-by: Sutou Kouhei <kou@clear-code.com>
2024-08-31Added pre-release suffix for development version of default gemsHiroshi SHIBATA
https://github.com/ruby/stringio/issues/81
2024-06-04Sync strscan HEAD again.Hiroshi SHIBATA
https://github.com/ruby/strscan/pull/99 split document with multi-byte chars.
2024-05-30Revert "[ruby/strscan] Doc for StringScanner"Hiroshi SHIBATA
This reverts commit 974ed1408c516d1e8f992f0b304e2de6f8bd5c1f.
2024-05-30Revert "Fix reference path for strscan documentation"Hiroshi SHIBATA
This reverts commit 1fa93fb9488a32018101689fd727965fd5874eb5.
2024-05-30Fix reference path for strscan documentationHiroshi SHIBATA
2024-05-30[ruby/strscan] Doc for StringScannerBurdette Lamar
(https://github.com/ruby/strscan/pull/96) #peek_byte and #scan_byte not updated (not available in my repo -- sorry). --------- https://github.com/ruby/strscan/commit/0123da7352 Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
2024-04-27ruby tool/update-deps --fix卜部昌平
2024-02-26[ruby/strscan] Add a method for peeking and reading bytes asAaron Patterson
integers (https://github.com/ruby/strscan/pull/89) This commit adds `scan_byte` and `peek_byte`. `scan_byte` will scan the current byte, return it as an integer, and advance the cursor. `peek_byte` will return the current byte as an integer without advancing the cursor. Currently `StringScanner#get_byte` returns a string, but I want to get the current byte without allocating a string. I think this will help with writing high performance lexers. --------- https://github.com/ruby/strscan/commit/873aba2e5d Co-authored-by: Sutou Kouhei <kou@clear-code.com>
2024-02-08[ruby/strscan] Bump versionSutou Kouhei
https://github.com/ruby/strscan/commit/ba338b882c
2024-02-08[ruby/strscan] Bump versionSutou Kouhei
https://github.com/ruby/strscan/commit/842845af1f
2024-01-19[ruby/strscan] Bump versionSutou Kouhei
https://github.com/ruby/strscan/commit/d6f97ec102
2024-01-14[ruby/strscan] StringScanner#captures: Return nil not "" forNAITOH Jun
unmached capture (https://github.com/ruby/strscan/pull/72) fix https://github.com/ruby/strscan/issues/70 If there is no substring matching the group (s[3]), the behavior is different. If there is no substring matching the group, the corresponding element (s[3]) should be nil. ``` s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba..."> s.scan /(foo)(bar)(BAZ)?/ #=> "foobar" s[0] #=> "foobar" s[1] #=> "foo" s[2] #=> "bar" s[3] #=> nil s.captures #=> ["foo", "bar", ""] s.captures.compact #=> ["foo", "bar", ""] ``` ``` s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba..."> s.scan /(foo)(bar)(BAZ)?/ #=> "foobar" s[0] #=> "foobar" s[1] #=> "foo" s[2] #=> "bar" s[3] #=> nil s.captures #=> ["foo", "bar", nil] s.captures.compact #=> ["foo", "bar"] ``` https://docs.ruby-lang.org/ja/latest/method/MatchData/i/captures.html ``` /(foo)(bar)(BAZ)?/ =~ "foobarbaz" #=> 0 $~.to_a #=> ["foobar", "foo", "bar", nil] $~.captures #=> ["foo", "bar", nil] $~.captures.compact #=> ["foo", "bar"] ``` * StringScanner#captures is not yet documented. https://docs.ruby-lang.org/ja/latest/class/StringScanner.html https://github.com/ruby/strscan/commit/1fbfdd3c6f
2023-12-25Revert "Rollback to released version numbers of stringio and strscan"Hiroshi SHIBATA
This reverts commit 6a79e53823e328281b9e9eee53cd141af28f8548.
2023-12-16Rollback to released version numbers of stringio and strscanHiroshi SHIBATA
2023-11-08[ruby/strscan] Bump versionSutou Kouhei
https://github.com/ruby/strscan/commit/1b3393be05
2023-07-28[ruby/strscan] Fix indentation in strscan.cPeter Zhu
[ci skip]
2023-07-27Add function rb_reg_onig_matchPeter Zhu
rb_reg_onig_match performs preparation, error handling, and cleanup for matching a regex against a string. This reduces repetitive code and removes the need for StringScanner to access internal data of regex. Notes: Merged: https://github.com/ruby/ruby/pull/8123
2023-07-27[ruby/strscan] Sync missed commitPeter Zhu
Syncs commit ruby/strscan@76b377a5d875ec77282d9319d62d8f24fe283b40.
2023-02-28Update the depend filesMatt Valentine-House
Notes: Merged: https://github.com/ruby/ruby/pull/7310
2023-02-27Remove intern/gc.h from Make depsMatt Valentine-House
Notes: Merged: https://github.com/ruby/ruby/pull/7330
2023-02-21[ruby/strscan] Bump versionSutou Kouhei
https://github.com/ruby/strscan/commit/681cde0f27
2023-02-21[ruby/strscan] Mention return value of `rest?` in the docOKURA Masafumi
(https://github.com/ruby/strscan/pull/49) The doc of `rest?` was unclear about return value. This commit adds the return value to the doc.
2023-02-08Extract include/ruby/internal/attr/packed_struct.hNobuyoshi Nakada
Split `PACKED_STRUCT` and `PACKED_STRUCT_UNALIGNED` macros into the macros bellow: * `RBIMPL_ATTR_PACKED_STRUCT_BEGIN` * `RBIMPL_ATTR_PACKED_STRUCT_END` * `RBIMPL_ATTR_PACKED_STRUCT_UNALIGNED_BEGIN` * `RBIMPL_ATTR_PACKED_STRUCT_UNALIGNED_END` Notes: Merged: https://github.com/ruby/ruby/pull/7268
2022-12-26[ruby/strscan] Bump versionSutou Kouhei
https://github.com/ruby/strscan/commit/3ada12613d Notes: Merged: https://github.com/ruby/ruby/pull/7025
2022-12-09Merge strscan-3.0.5Hiroshi SHIBATA
Notes: Merged: https://github.com/ruby/ruby/pull/6890
2022-02-22[Feature #18249] Update dependenciesPeter Zhu
Notes: Merged: https://github.com/ruby/ruby/pull/5474
2021-11-21Update dependenciesNobuyoshi Nakada
2021-10-24[ruby/strscan] Bump versionSutou Kouhei
If we use the same version as the default strscan gem in Ruby, "gem install" doesn't extract .gem. It fails "gem install" because "gem install" can't find ext/strscan/ to be built. https://github.com/ruby/strscan/commit/3ceafa6cdc Notes: Merged: https://github.com/ruby/ruby/pull/5011
2021-10-05ruby tool/update-deps --fix卜部昌平
Notes: Merged: https://github.com/ruby/ruby/pull/4909
2021-05-06[ruby/strscan] Replace "iff" with "if and only if" (#18)Gannon McGibbon
iff means if and only if, but readers without that knowledge might assume this to be a spelling mistake. To me, this seems like exclusionary language that is unnecessary. Simply using "if and only if" instead should suffice. https://github.com/ruby/strscan/commit/066451c11e
2021-05-06[ruby/strscan] Fix segmentation fault of `StringScanner#charpos` when ↵Kenichi Kamiya
`String#byteslice` returns non string value [Bug #17756] (#20) https://github.com/ruby/strscan/commit/92961cde2b
2021-05-06Import from https://github.com/ruby/strscan/pull/19Hiroshi SHIBATA
* Use Gemfile instead of Gem::Specification#add_development_dependency. * Use pend instead of skip for test-unit.
2021-04-13dependency updates卜部昌平
Notes: Merged: https://github.com/ruby/ruby/pull/4371
2021-02-10Update class documentation for StringScannerJeremy Evans
The [] wasn't being displayed, and try to fix formatting for bol? and << (even if they aren't linked). Fixes [Bug #17620]
2020-12-18[strscan] Fix license comment and filesKenta Murata
https://github.com/ruby/strscan/commit/a999f2c6d1
2020-12-18[strscan] Version 3.0.0Kenta Murata
https://github.com/ruby/strscan/commit/08645e4e77
2020-12-18[strscan] Make strscan Ractor safe (#17)Kenta Murata
* Make strscan Ractor safe * Add test-unit in the development dependencies https://github.com/ruby/strscan/commit/3c93c2bebe
2020-10-02mark regex internal to string scannerAaron Patterson
Notes: Merged: https://github.com/ruby/ruby/pull/3623