path: root/test/strscan
Age | Commit message | Author
2024-12-02 | [ruby/strscan] test: don't omit "(...)" for method calls that have at least one argument | Sutou Kouhei
https://github.com/ruby/strscan/commit/dddae9c99a
2024-12-02 | StringScanner#scan_integer support base 16 integers (#116) | Jean Boussier
Followup: https://github.com/ruby/strscan/pull/115 `scan_integer` is now implemented in Ruby so that it can handle keyword arguments efficiently without allocating a Hash. Given that the goal of `scan_integer` is to parse integers more efficiently without allocating an intermediary object, using `rb_scan_args` would defeat the purpose. Additionally, the C implementation now uses `rb_isdigit` and `rb_isxdigit`, because on Windows `isdigit` is locale dependent.
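A minimal sketch of the `base: 16` keyword described above. The helper name `scan_hex` is hypothetical, and the regexp fallback is an assumption added so the example also runs on strscan releases that predate `scan_integer`; it is not part of the library.

```ruby
require "strscan"

# Hypothetical helper: use the native method when the installed strscan
# provides it, otherwise fall back to a regexp scan plus String#hex
# (the fallback allocates a substring, which scan_integer avoids).
def scan_hex(scanner)
  return scanner.scan_integer(base: 16) if scanner.respond_to?(:scan_integer)
  scanner.scan(/[+-]?(?:0x)?\h+/)&.hex
end

s = StringScanner.new("ff rest")
scan_hex(s) # => 255
s.pos       # => 2
```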
2024-12-02 | [ruby/strscan] Prevent a warning "ambiguous first argument" during a | Yusuke Endoh
test (https://github.com/ruby/strscan/pull/118)

https://rubyci.s3.amazonaws.com/debian11/ruby-master/log/20241128T153002Z.log.html.gz

```
/home/chkbuild/chkbuild/tmp/build/20241128T153002Z/ruby/test/strscan/test_stringscanner.rb:908: warning: ambiguous first argument; put parentheses or a space even after `-` operator
```

https://github.com/ruby/strscan/commit/af3fd2f045
2024-11-27 | [ruby/strscan] Implement #scan_integer to efficiently parse Integer | Jean Boussier
(https://github.com/ruby/strscan/pull/115) Fix: https://github.com/ruby/strscan/issues/113 This allows parsing an Integer directly from a String without first allocating a substring. Notes: The implementation is limited by design; it is meant as a first step, and only the most straightforward base 10 integers are supported. https://github.com/ruby/strscan/commit/6a3c74b4c8
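As a rough illustration of the base 10 case, the sketch below parses a leading integer and leaves the cursor just past it. `scan_decimal` is a hypothetical wrapper; its regexp branch is an assumption that only exists to keep the example runnable on older strscan releases.

```ruby
require "strscan"

# Hypothetical wrapper around StringScanner#scan_integer (base 10).
def scan_decimal(scanner)
  return scanner.scan_integer if scanner.respond_to?(:scan_integer)
  # Fallback for older releases: allocates the matched substring first.
  scanner.scan(/[+-]?\d+/)&.to_i
end

s = StringScanner.new("42 apples")
scan_decimal(s) # => 42
s.rest          # => " apples"
scan_decimal(s) # => nil, because the cursor is on a space
```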
2024-10-26 | [ruby/strscan] [JRuby] Optimize `scan()`: Remove duplicate `if | NAITOH Jun
(restLen() < patternsize()) return context.nil;` checks in `!headonly`.
(https://github.com/ruby/strscan/pull/110)

- before: #109

## Why?

https://github.com/ruby/strscan/blob/d31274f41b7c1e28f23d58cf7bfea03baa818cb7/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java#L371-L373

This means the following:
`if (str.size() - curr < pattern.size()) return context.nil;`

A similar check is made within `StringSupport#index()` within `!headonly`.

https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720

```Java
public static int index(ByteList source, ByteList other, int offset, Encoding enc) {
    int sourceLen = source.realSize();
    int sourceBegin = source.begin();
    int otherLen = other.realSize();

    if (otherLen == 0) return offset;
    if (sourceLen - offset < otherLen) return -1;
```

- source = `strBL`
- other = `patternBL`
- offset = `strBeg + curr`

This means the following:
`if (strBL.realSize() - (strBeg + curr) < patternBL.realSize()) return -1;`

Both checks are the same.

## Benchmark

It shows String as a pattern is 2.40x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
      regexp     7.613M i/s -      7.593M times in 0.997350s (131.35ns/i)
  regexp_var     7.793M i/s -      7.772M times in 0.997364s (128.32ns/i)
      string    13.222M i/s -     13.199M times in 0.998297s (75.63ns/i)
  string_var    15.283M i/s -     15.216M times in 0.995667s (65.43ns/i)
Calculating -------------------------------------
      regexp    10.003M i/s -     22.840M times in 2.283361s (99.97ns/i)
  regexp_var     9.991M i/s -     23.378M times in 2.340019s (100.09ns/i)
      string    23.454M i/s -     39.666M times in 1.691221s (42.64ns/i)
  string_var    23.998M i/s -     45.848M times in 1.910447s (41.67ns/i)

Comparison:
  string_var:  23998466.3 i/s
      string:  23453777.5 i/s - 1.02x  slower
      regexp:  10002809.4 i/s - 2.40x  slower
  regexp_var:   9990580.1 i/s - 2.40x  slower
```

https://github.com/ruby/strscan/commit/843e931d13
2024-09-17 | [ruby/strscan] Accept String as a pattern at non head | NAITOH Jun
(https://github.com/ruby/strscan/pull/106)

It supports non-head match cases such as StringScanner#scan_until. If we use a String as a pattern, we can improve match performance. Here is the result of the included benchmark.

## CRuby

It shows String as a pattern is 1.18x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
      regexp     9.403M i/s -      9.548M times in 1.015459s (106.35ns/i)
  regexp_var     9.162M i/s -      9.248M times in 1.009479s (109.15ns/i)
      string     8.966M i/s -      9.274M times in 1.034343s (111.54ns/i)
  string_var    11.051M i/s -     11.190M times in 1.012538s (90.49ns/i)
Calculating -------------------------------------
      regexp    10.319M i/s -     28.209M times in 2.733707s (96.91ns/i)
  regexp_var    10.032M i/s -     27.485M times in 2.739807s (99.68ns/i)
      string     9.681M i/s -     26.897M times in 2.778397s (103.30ns/i)
  string_var    12.162M i/s -     33.154M times in 2.726046s (82.22ns/i)

Comparison:
  string_var:  12161920.6 i/s
      regexp:  10318949.7 i/s - 1.18x  slower
  regexp_var:  10031617.6 i/s - 1.21x  slower
      string:   9680843.7 i/s - 1.26x  slower
```

## JRuby

It shows String as a pattern is 2.11x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
      regexp     7.591M i/s -      7.544M times in 0.993780s (131.74ns/i)
  regexp_var     6.143M i/s -      6.125M times in 0.997038s (162.77ns/i)
      string    14.135M i/s -     14.079M times in 0.996067s (70.75ns/i)
  string_var    14.079M i/s -     14.057M times in 0.998420s (71.03ns/i)
Calculating -------------------------------------
      regexp     9.409M i/s -     22.773M times in 2.420268s (106.28ns/i)
  regexp_var    10.116M i/s -     18.430M times in 1.821820s (98.85ns/i)
      string    21.389M i/s -     42.404M times in 1.982519s (46.75ns/i)
  string_var    20.897M i/s -     42.237M times in 2.021187s (47.85ns/i)

Comparison:
      string:  21389191.1 i/s
  string_var:  20897327.5 i/s - 1.02x  slower
  regexp_var:  10116464.7 i/s - 2.11x  slower
      regexp:   9409222.3 i/s - 2.27x  slower
```

See: https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1736

---------

https://github.com/ruby/strscan/commit/f9d96c446a

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
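The non-head String pattern described above might be used as in the sketch below. `scan_until_literal` is a hypothetical helper. The `rescue TypeError` branch assumes that releases without this change reject a String here; it rebuilds an equivalent Regexp so the example still runs on them.

```ruby
require "strscan"

# Hypothetical helper: pass the String straight to scan_until where that is
# supported, otherwise escape it into a Regexp (assumed older behavior).
def scan_until_literal(scanner, text)
  scanner.scan_until(text)
rescue TypeError
  scanner.scan_until(Regexp.new(Regexp.escape(text)))
end

s = StringScanner.new("Fri Dec 12 1975")
scan_until_literal(s, "Dec") # => "Fri Dec"
s.pos                        # => 7
```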
2024-03-27 | [ruby/strscan] Omit tests for `#scan_byte` and `#peek_byte` on | Andrii Konchyn
TruffleRuby temporarily (https://github.com/ruby/strscan/pull/91) The methods were added in #89 but they aren't implemented in TruffleRuby yet. So let's omit them for now to keep CI green. https://github.com/ruby/strscan/commit/844d963b56
2024-02-26 | [ruby/strscan] Add a method for peeking and reading bytes as | Aaron Patterson
integers (https://github.com/ruby/strscan/pull/89) This commit adds `scan_byte` and `peek_byte`. `scan_byte` will scan the current byte, return it as an integer, and advance the cursor. `peek_byte` will return the current byte as an integer without advancing the cursor. Currently `StringScanner#get_byte` returns a string, but I want to get the current byte without allocating a string. I think this will help with writing high performance lexers. --------- https://github.com/ruby/strscan/commit/873aba2e5d Co-authored-by: Sutou Kouhei <kou@clear-code.com>
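A small sketch of the two methods described above. The `*_compat` helper names are hypothetical, and their fallbacks (via `String#getbyte`) are assumptions added only so the example runs where `scan_byte` and `peek_byte` are not available.

```ruby
require "strscan"

# Hypothetical helpers: prefer the native byte methods when present.
def peek_byte_compat(scanner)
  return scanner.peek_byte if scanner.respond_to?(:peek_byte)
  scanner.string.getbyte(scanner.pos)
end

def scan_byte_compat(scanner)
  return scanner.scan_byte if scanner.respond_to?(:scan_byte)
  # Fallback: get_byte allocates a one-byte String, which scan_byte avoids.
  scanner.get_byte&.getbyte(0)
end

s = StringScanner.new("ab")
peek_byte_compat(s) # => 97, cursor stays at 0
scan_byte_compat(s) # => 97, cursor moves to 1
scan_byte_compat(s) # => 98
scan_byte_compat(s) # => nil at end of string
```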
2024-02-08 | [ruby/strscan] Don't add begin to length for new string slice | Charles Oliver Nutter
(https://github.com/ruby/strscan/pull/87) Fixes https://github.com/ruby/strscan/pull/86 https://github.com/ruby/strscan/commit/c17b015c00
2024-01-19 | [ruby/strscan] Add test to check encoding for empty string | NAITOH Jun
(https://github.com/ruby/strscan/pull/80) See: https://github.com/ruby/strscan/issues/78#issuecomment-1890849891 https://github.com/ruby/strscan/commit/d0508518a9
2024-01-14 | [ruby/strscan] StringScanner#captures: Return nil not "" for | NAITOH Jun
unmatched capture (https://github.com/ruby/strscan/pull/72)

fix https://github.com/ruby/strscan/issues/70

If no substring matches a group (s[3] below), `StringScanner#captures` behaves differently from `MatchData#captures`. The corresponding element should be nil, as s[3] already is.

Before:

```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan(/(foo)(bar)(BAZ)?/) #=> "foobar"
s[0] #=> "foobar"
s[1] #=> "foo"
s[2] #=> "bar"
s[3] #=> nil
s.captures #=> ["foo", "bar", ""]
s.captures.compact #=> ["foo", "bar", ""]
```

After:

```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan(/(foo)(bar)(BAZ)?/) #=> "foobar"
s[0] #=> "foobar"
s[1] #=> "foo"
s[2] #=> "bar"
s[3] #=> nil
s.captures #=> ["foo", "bar", nil]
s.captures.compact #=> ["foo", "bar"]
```

https://docs.ruby-lang.org/ja/latest/method/MatchData/i/captures.html

```
/(foo)(bar)(BAZ)?/ =~ "foobarbaz" #=> 0
$~.to_a #=> ["foobar", "foo", "bar", nil]
$~.captures #=> ["foo", "bar", nil]
$~.captures.compact #=> ["foo", "bar"]
```

* StringScanner#captures is not yet documented. https://docs.ruby-lang.org/ja/latest/class/StringScanner.html

https://github.com/ruby/strscan/commit/1fbfdd3c6f
2023-07-27 | [ruby/strscan] Sync missed commit | Peter Zhu
Syncs commit ruby/strscan@76b377a5d875ec77282d9319d62d8f24fe283b40.
2023-02-21 | [ruby/strscan] Mask out this test on JRuby/Windows | Charles Oliver Nutter
See https://github.com/jruby/jruby/issues/7644 for the root issue, which will require fixes to JRuby's regular expression engine, JOni. https://github.com/ruby/strscan/commit/29a65abff2
2023-02-21 | [ruby/strscan] test: Run test more with fixed anchor mode | Sutou Kouhei
(https://github.com/ruby/strscan/pull/60) fix https://github.com/ruby/strscan/pull/56
2023-02-21 | [ruby/strscan] Add test case to `test_string` | OKURA Masafumi
(https://github.com/ruby/strscan/pull/58) `string` returns the original string after `scan` is called. The existing test did not check this behavior; it is now covered.
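The behavior covered by that test can be illustrated roughly as follows; the sample text is arbitrary.

```ruby
require "strscan"

s = StringScanner.new("test string")
s.scan(/test/) # => "test"
s.string       # => "test string" (the whole string, not just the rest)
s.rest         # => " string"
```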
2022-12-09 | Merge strscan-3.0.5 | Hiroshi SHIBATA
Notes: Merged: https://github.com/ruby/ruby/pull/6890
2021-05-06 | [ruby/strscan] Fix segmentation fault of `StringScanner#charpos` when `String#byteslice` returns non string value [Bug #17756] (#20) | Kenichi Kamiya
https://github.com/ruby/strscan/commit/92961cde2b
2021-05-06 | Import from https://github.com/ruby/strscan/pull/19 | Hiroshi SHIBATA
* Use Gemfile instead of Gem::Specification#add_development_dependency. * Use pend instead of skip for test-unit.
2020-12-18 | [strscan] Make strscan Ractor safe (#17) | Kenta Murata
* Make strscan Ractor safe * Add test-unit in the development dependencies https://github.com/ruby/strscan/commit/3c93c2bebe
2019-11-18 | Deprecate taint/trust and related methods, and make the methods no-ops | Jeremy Evans
This removes the related tests, and puts the related specs behind version guards. This affects all code in lib, including some libraries that may want to support older versions of Ruby. Notes: Merged: https://github.com/ruby/ruby/pull/2476
2019-10-14 | Import StringScanner 1.0.3 (#2553) | Sutou Kouhei
Notes: Merged-By: kou <kou@clear-code.com>
2017-11-29 | strscan.c: add MatchData-like methods | nobu
* ext/strscan/strscan.c: added `size`, `captures` and `values_at` to StringScanner, shorthands of accessing the matched data. based on the patch by apeiros (Stefan Rusterholz) at [ruby-core:20412]. [Feature #836] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60929 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
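A short sketch of the three MatchData-like shorthands named above, using an arbitrary date string as input.

```ruby
require "strscan"

s = StringScanner.new("Fri Dec 12 1975 14:39")
s.scan(/(\w+) (\w+) (\d+) /) # => "Fri Dec 12 "

s.size                   # => 4 (whole match plus three groups)
s.captures               # => ["Fri", "Dec", "12"]
s.values_at(0, -1, 5, 2) # => ["Fri Dec 12 ", "12", nil, "Dec"]
```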
2017-07-21 | strscan.c: fix segfault in aref | nobu
* ext/strscan/strscan.c (strscan_aref): fix segfault after get_byte or getch which do not apply regexp. [ruby-core:82116] [Bug #13759] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59384 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-06 | {ext,test}/strscan: Specify frozen_string_literal: true. | kazu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57551 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-16 | Add frozen_string_literal: false for all files | naruse
When you change this to true, you may need to add more tests. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53141 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-08-03 | strscan.c: encoding in messages | nobu
* ext/strscan/strscan.c (strscan_aref): preserve argument encoding in error messages. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-03-11 | * test: get rid of warnings. | usa
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@45313 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2013-05-24 | * ext/strscan/strscan.c (strscan_aref): raise error if given | naruse
name reference is not found. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@40912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2013-05-21 | * ext/strscan/strscan.c (strscan_aref): support named captures. | naruse
patched by Konstantin Haase [ruby-core:54664] [Feature #8343] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@40881 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
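Named-capture access through `StringScanner#[]` might look like the sketch below. The group names are arbitrary, and the `IndexError` shown for an unknown name reflects the follow-up change in the 2013-05-24 entry.

```ruby
require "strscan"

s = StringScanner.new("2013-05-21")
s.scan(/(?<year>\d+)-(?<month>\d+)-(?<day>\d+)/)

s[:year]   # => "2013" (Symbol key)
s["month"] # => "05"   (String key)
s[3]       # => "21"   (numeric index still works)

# A name that is not defined in the pattern raises instead of returning nil.
unknown = begin
  s[:hour]
rescue IndexError => e
  e.class
end
unknown # => IndexError
```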
2012-11-28 | Added #charpos for multibyte string position. | ryan
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@37916 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
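To illustrate the difference between the byte position and `#charpos`, here is a minimal sketch with a UTF-8 string in which "é" occupies two bytes; the sample text is arbitrary.

```ruby
require "strscan"

# "été 2012" written with escapes so the file encoding does not matter.
s = StringScanner.new("\u00E9t\u00E9 2012")
s.scan(/\S+/) # consumes "été"

s.pos     # => 5 (bytes: 2 + 1 + 2)
s.charpos # => 3 (characters)
```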
2010-02-14 | avoid method redefinition. | akr
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26663 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-08-26 | * ext/strscan/strscan.c (strscan_set_string): set string should not be | nobu
dupped or frozen, because freezing it causes #concat method failure, and unnecessary to dup without freezing. a patch from Aaron Patterson at [ruby-core:25145]. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24679 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-06-17 | * ext/strscan/strscan.c (Init_strscan): remove obsolete | matz
matchedsize method, use matched_size instead. [ruby-dev:38591] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23721 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-24 | * test: assert_raises has been deprecated since a long time ago. | nobu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19536 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-06-05 | * test/stringio/test_stringio.rb: add tests to achieve over 95% test | mame
coverage of stringio. * test/strscan/test_stringscanner.rb: ditto for strscan. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16847 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-05-12 | * re.c (rb_reg_prepare_re): made non static with small refactoring. | matz
* ext/strscan/strscan.c (strscan_do_scan): should adjust encoding before regex searching. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16387 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-28 | add a test. | akr
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14773 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-28 | * ext/strscan/strscan.c (str_new): new function to allocate a string | akr
with encoding propagation. (extract_range): use str_new. (extract_beg_len): ditto. (strscan_peek): ditto. (strscan_rest): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14772 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-11-14 | * test/socket/test_socket.rb: update not to use 1.8 assignment to | matz
external local variable in the block parameters. [ruby-dev:32251] * test/strscan/test_stringscanner.rb: avoid $KCODE, and use String#force_encoding(). [ruby-dev:32251] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13922 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2006-07-26 | * ext/strscan/strscan.c (strscan_do_scan): StringScanner.new("").scan(//) should return "". [ruby-Bugs:4361] | aamine
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@10606 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
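The fixed case can be checked directly; a minimal sketch:

```ruby
require "strscan"

s = StringScanner.new("")
s.scan(//) # => "" (an empty match, not nil)
s.eos?     # => true
```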
2004-03-05 | * ext/strscan/strscan.c: new method StringScanner#initialize_copy to allow #dup and #clone. | aamine
* test/strscan/test_strscan.rb: test StringScanner#dup. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@5889 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
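A rough sketch of what `#dup` makes possible: the copy keeps the scan position at the time of copying and then advances independently of the original.

```ruby
require "strscan"

s = StringScanner.new("abc def")
s.scan(/abc/)

copy = s.dup       # starts at the same position as s
copy.scan(/ def/)

s.pos    # => 3 (the original is unaffected)
copy.pos # => 7
```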
2004-02-18 | * test/*: should not depend on $KCODE. | nahi
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@5764 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2003-12-16 | introduce some new methods | aamine
* ext/strscan/strscan.c: new method StringScanner#beginning_of_line? (alias #bol?) * ext/strscan/strscan.c: new method StringScanner#concat and #<<. * ext/strscan/strscan.c: StringScanner#new(str) does not duplicate nor freeze STR (allow destructive modification). * test/strscan/test_stringscanner.rb: test new methods above. * test/strscan/test_stringscanner.rb: test destructive string modification. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@5201 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
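The methods introduced here can be sketched together as follows. The buffer contents are arbitrary, and `+"..."` is used only to guarantee an unfrozen String.

```ruby
require "strscan"

buffer = +"first\n" # unary plus returns a mutable String
s = StringScanner.new(buffer)
s.scan(/first\n/)
s.eos?           # => true

s << "second"    # same as s.concat("second"); appends to the scanned string
s.eos?           # => false
s.bol?           # => true, the cursor sits right after "\n"
s.scan(/second/) # => "second"

buffer           # => "first\nsecond", since the scanner did not duplicate it
```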
2003-09-17 | * test/strscan/test_stringscanner.rb: require test/unit. | aamine
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@4563 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2003-09-17 | * test/strscan/test_stringscanner.rb: new file. | aamine
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@4560 b2dd03c8-39d4-4d8f-98ff-823fe69b080e