<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/test/strscan, branch v3_4_9</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>[ruby/strscan] test: don't omit "(...)" for method calls that have at least one argument</title>
<updated>2024-12-02T01:50:34+00:00</updated>
<author>
<name>Sutou Kouhei</name>
<email>kou@clear-code.com</email>
</author>
<published>2024-11-29T06:22:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=9a7f050eda62b8492d3d0fd8ecc32df854f05874'/>
<id>9a7f050eda62b8492d3d0fd8ecc32df854f05874</id>
<content type='text'>
https://github.com/ruby/strscan/commit/dddae9c99a
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
https://github.com/ruby/strscan/commit/dddae9c99a
</pre>
</div>
</content>
</entry>
<entry>
<title>StringScanner#scan_integer support base 16 integers (#116)</title>
<updated>2024-12-02T01:50:34+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>jean.boussier@gmail.com</email>
</author>
<published>2024-11-27T08:31:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=79cc3d26ed3a038750988070d81912ece31c735b'/>
<id>79cc3d26ed3a038750988070d81912ece31c735b</id>
<content type='text'>
Followup: https://github.com/ruby/strscan/pull/115

`scan_integer` is now implemented in Ruby so as to efficiently handle
keyword arguments without allocating a Hash. Given that the goal of
`scan_integer` is to parse integers more efficiently without having to
allocate an intermediary object, using `rb_scan_args` would defeat the
purpose.

Additionally, the C implementation now uses `rb_isdigit` and
`rb_isxdigit`, because on Windows `isdigit` is locale dependent.
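
As a rough illustration of the behaviour described above, the sketch below
parses a base 10 and a base 16 integer. `read_integer` is a hypothetical
helper, not part of strscan; it assumes `scan_integer` accepts a `base:`
keyword where available and falls back to an allocating Regexp scan on
older strscan versions.

```ruby
require "strscan"

# Hypothetical helper (not part of strscan): parse an Integer at the
# scanner's current position, preferring the built-in scan_integer.
def read_integer(scanner, base: 10)
  if scanner.respond_to?(:scan_integer)
    scanner.scan_integer(base: base)
  else
    # Fallback for older strscan: this allocates the substring that
    # scan_integer is designed to avoid.
    pattern = base == 16 ? /[+-]?(?:0x)?\h+/ : /[+-]?\d+/
    text = scanner.scan(pattern)
    text ? text.to_i(base) : nil
  end
end

decimal = StringScanner.new("42 apples")
hex = StringScanner.new("0x1F rest")

decimal_value = read_integer(decimal)       # base 10 by default
hex_value = read_integer(hex, base: 16)     # optional 0x prefix
```

In both cases the scanner advances past the digits it consumed, so the
remaining input can be scanned as usual.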
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Followup: https://github.com/ruby/strscan/pull/115

`scan_integer` is now implemented in Ruby so as to efficiently handle
keyword arguments without allocating a Hash. Given that the goal of
`scan_integer` is to parse integers more efficiently without having to
allocate an intermediary object, using `rb_scan_args` would defeat the
purpose.

Additionally, the C implementation now uses `rb_isdigit` and
`rb_isxdigit`, because on Windows `isdigit` is locale dependent.
</pre>
</div>
</content>
</entry>
<entry>
<title>[ruby/strscan] Prevent a warning "ambiguous first argument" during a</title>
<updated>2024-12-02T01:50:34+00:00</updated>
<author>
<name>Yusuke Endoh</name>
<email>mame@ruby-lang.org</email>
</author>
<published>2024-11-29T00:41:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=5514485e1336382b02f61c5e2f127ec9d437b201'/>
<id>5514485e1336382b02f61c5e2f127ec9d437b201</id>
<content type='text'>
test
(https://github.com/ruby/strscan/pull/118)

https://rubyci.s3.amazonaws.com/debian11/ruby-master/log/20241128T153002Z.log.html.gz
```
/home/chkbuild/chkbuild/tmp/build/20241128T153002Z/ruby/test/strscan/test_stringscanner.rb:908: warning: ambiguous first argument; put parentheses or a space even after `-` operator
```

https://github.com/ruby/strscan/commit/af3fd2f045
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
test
(https://github.com/ruby/strscan/pull/118)

https://rubyci.s3.amazonaws.com/debian11/ruby-master/log/20241128T153002Z.log.html.gz
```
/home/chkbuild/chkbuild/tmp/build/20241128T153002Z/ruby/test/strscan/test_stringscanner.rb:908: warning: ambiguous first argument; put parentheses or a space even after `-` operator
```

https://github.com/ruby/strscan/commit/af3fd2f045
</pre>
</div>
</content>
</entry>
<entry>
<title>[ruby/strscan] Implement #scan_integer to efficiently parse Integer</title>
<updated>2024-11-27T00:24:07+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>jean.boussier@gmail.com</email>
</author>
<published>2024-11-26T08:22:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=d5de1a57893b16aff7bc3336b34fa2e9acefb3d2'/>
<id>d5de1a57893b16aff7bc3336b34fa2e9acefb3d2</id>
<content type='text'>
(https://github.com/ruby/strscan/pull/115)

Fix: https://github.com/ruby/strscan/issues/113

This allows parsing an Integer directly from a String without needing
to first allocate a substring.

Notes:

The implementation is limited by design. It's meant as a first step:
only the most straightforward base 10 integers are supported.

https://github.com/ruby/strscan/commit/6a3c74b4c8
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
(https://github.com/ruby/strscan/pull/115)

Fix: https://github.com/ruby/strscan/issues/113

This allows parsing an Integer directly from a String without needing
to first allocate a substring.

Notes:

The implementation is limited by design. It's meant as a first step:
only the most straightforward base 10 integers are supported.

https://github.com/ruby/strscan/commit/6a3c74b4c8
</pre>
</div>
</content>
</entry>
<entry>
<title>[ruby/strscan] [JRuby] Optimize `scan()`: Remove duplicate `if</title>
<updated>2024-10-26T09:44:15+00:00</updated>
<author>
<name>NAITOH Jun</name>
<email>naitoh@gmail.com</email>
</author>
<published>2024-10-19T06:15:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=e61bb75a8650075e2283502de8853c3228307fbc'/>
<id>e61bb75a8650075e2283502de8853c3228307fbc</id>
<content type='text'>
(restLen() &lt; patternsize()) return context.nil;` checks in
`!headonly`.
(https://github.com/ruby/strscan/pull/110)

- before: #109

## Why?

https://github.com/ruby/strscan/blob/d31274f41b7c1e28f23d58cf7bfea03baa818cb7/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java#L371-L373

This means the following:

`if (str.size() - curr &lt; pattern.size()) return context.nil;`

A similar check is made within `StringSupport#index()` within
`!headonly`.

https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720

```Java
    public static int index(ByteList source, ByteList other, int offset, Encoding enc) {
        int sourceLen = source.realSize();
        int sourceBegin = source.begin();
        int otherLen = other.realSize();

        if (otherLen == 0) return offset;
        if (sourceLen - offset &lt; otherLen) return -1;
```

- source = `strBL`
- other = `patternBL`
- offset = `strBeg + curr`

This means the following:
`if (strBL.realSize() - (strBeg + curr) &lt; patternBL.realSize()) return
-1;`

Both checks are the same.

## Benchmark

It shows String as a pattern is 2.40x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.613M i/s -      7.593M times in 0.997350s (131.35ns/i)
          regexp_var     7.793M i/s -      7.772M times in 0.997364s (128.32ns/i)
              string    13.222M i/s -     13.199M times in 0.998297s (75.63ns/i)
          string_var    15.283M i/s -     15.216M times in 0.995667s (65.43ns/i)
Calculating -------------------------------------
              regexp    10.003M i/s -     22.840M times in 2.283361s (99.97ns/i)
          regexp_var     9.991M i/s -     23.378M times in 2.340019s (100.09ns/i)
              string    23.454M i/s -     39.666M times in 1.691221s (42.64ns/i)
          string_var    23.998M i/s -     45.848M times in 1.910447s (41.67ns/i)

Comparison:
          string_var:  23998466.3 i/s
              string:  23453777.5 i/s - 1.02x  slower
              regexp:  10002809.4 i/s - 2.40x  slower
          regexp_var:   9990580.1 i/s - 2.40x  slower
```

https://github.com/ruby/strscan/commit/843e931d13
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
(restLen() &lt; patternsize()) return context.nil;` checks in
`!headonly`.
(https://github.com/ruby/strscan/pull/110)

- before: #109

## Why?

https://github.com/ruby/strscan/blob/d31274f41b7c1e28f23d58cf7bfea03baa818cb7/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java#L371-L373

This means the following:

`if (str.size() - curr &lt; pattern.size()) return context.nil;`

A similar check is made within `StringSupport#index()` within
`!headonly`.

https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720

```Java
    public static int index(ByteList source, ByteList other, int offset, Encoding enc) {
        int sourceLen = source.realSize();
        int sourceBegin = source.begin();
        int otherLen = other.realSize();

        if (otherLen == 0) return offset;
        if (sourceLen - offset &lt; otherLen) return -1;
```

- source = `strBL`
- other = `patternBL`
- offset = `strBeg + curr`

This means the following:
`if (strBL.realSize() - (strBeg + curr) &lt; patternBL.realSize()) return
-1;`

Both checks are the same.

## Benchmark

It shows String as a pattern is 2.40x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.613M i/s -      7.593M times in 0.997350s (131.35ns/i)
          regexp_var     7.793M i/s -      7.772M times in 0.997364s (128.32ns/i)
              string    13.222M i/s -     13.199M times in 0.998297s (75.63ns/i)
          string_var    15.283M i/s -     15.216M times in 0.995667s (65.43ns/i)
Calculating -------------------------------------
              regexp    10.003M i/s -     22.840M times in 2.283361s (99.97ns/i)
          regexp_var     9.991M i/s -     23.378M times in 2.340019s (100.09ns/i)
              string    23.454M i/s -     39.666M times in 1.691221s (42.64ns/i)
          string_var    23.998M i/s -     45.848M times in 1.910447s (41.67ns/i)

Comparison:
          string_var:  23998466.3 i/s
              string:  23453777.5 i/s - 1.02x  slower
              regexp:  10002809.4 i/s - 2.40x  slower
          regexp_var:   9990580.1 i/s - 2.40x  slower
```

https://github.com/ruby/strscan/commit/843e931d13
</pre>
</div>
</content>
</entry>
<entry>
<title>[ruby/strscan] Accept String as a pattern at non head</title>
<updated>2024-09-17T06:12:25+00:00</updated>
<author>
<name>NAITOH Jun</name>
<email>naitoh@gmail.com</email>
</author>
<published>2024-09-14T00:32:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=d81b0588bb3c97167d1f7e2d2a74185e0c19b68c'/>
<id>d81b0588bb3c97167d1f7e2d2a74185e0c19b68c</id>
<content type='text'>
(https://github.com/ruby/strscan/pull/106)

It supports non-head match cases such as StringScanner#scan_until.
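
A minimal sketch of such a non-head match follows. `scan_until_literal` is
a hypothetical helper, not part of strscan; it assumes that older strscan
versions raise `TypeError` when `scan_until` is given a String, and falls
back to an escaped Regexp in that case.

```ruby
require "strscan"

# Hypothetical helper (not part of strscan): scan up to and including a
# literal String, using the String pattern directly when it is accepted.
def scan_until_literal(scanner, literal)
  scanner.scan_until(literal)
rescue TypeError
  # Assumed behaviour of older strscan: only Regexp patterns are accepted
  # for non-head matches, so build an equivalent escaped Regexp.
  scanner.scan_until(Regexp.new(Regexp.escape(literal)))
end

scanner = StringScanner.new("key: value")
prefix = scan_until_literal(scanner, ": ")
remaining = scanner.rest
```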

If we use a String as a pattern, we can improve match performance.
Here is the result of the included benchmark.

## CRuby

It shows that a String pattern (`string_var`) is 1.18x faster than a
Regexp pattern (`regexp`).

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     9.403M i/s -      9.548M times in 1.015459s (106.35ns/i)
          regexp_var     9.162M i/s -      9.248M times in 1.009479s (109.15ns/i)
              string     8.966M i/s -      9.274M times in 1.034343s (111.54ns/i)
          string_var    11.051M i/s -     11.190M times in 1.012538s (90.49ns/i)
Calculating -------------------------------------
              regexp    10.319M i/s -     28.209M times in 2.733707s (96.91ns/i)
          regexp_var    10.032M i/s -     27.485M times in 2.739807s (99.68ns/i)
              string     9.681M i/s -     26.897M times in 2.778397s (103.30ns/i)
          string_var    12.162M i/s -     33.154M times in 2.726046s (82.22ns/i)

Comparison:
          string_var:  12161920.6 i/s
              regexp:  10318949.7 i/s - 1.18x  slower
          regexp_var:  10031617.6 i/s - 1.21x  slower
              string:   9680843.7 i/s - 1.26x  slower
```

## JRuby

It shows String as a pattern is 2.11x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.591M i/s -      7.544M times in 0.993780s (131.74ns/i)
          regexp_var     6.143M i/s -      6.125M times in 0.997038s (162.77ns/i)
              string    14.135M i/s -     14.079M times in 0.996067s (70.75ns/i)
          string_var    14.079M i/s -     14.057M times in 0.998420s (71.03ns/i)
Calculating -------------------------------------
              regexp     9.409M i/s -     22.773M times in 2.420268s (106.28ns/i)
          regexp_var    10.116M i/s -     18.430M times in 1.821820s (98.85ns/i)
              string    21.389M i/s -     42.404M times in 1.982519s (46.75ns/i)
          string_var    20.897M i/s -     42.237M times in 2.021187s (47.85ns/i)

Comparison:
              string:  21389191.1 i/s
          string_var:  20897327.5 i/s - 1.02x  slower
          regexp_var:  10116464.7 i/s - 2.11x  slower
              regexp:   9409222.3 i/s - 2.27x  slower
```

See:
https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1736

---------

https://github.com/ruby/strscan/commit/f9d96c446a

Co-authored-by: Sutou Kouhei &lt;kou@clear-code.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
(https://github.com/ruby/strscan/pull/106)

It supports non-head match cases such as StringScanner#scan_until.

If we use a String as a pattern, we can improve match performance.
Here is the result of the included benchmark.

## CRuby

It shows that a String pattern (`string_var`) is 1.18x faster than a
Regexp pattern (`regexp`).

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     9.403M i/s -      9.548M times in 1.015459s (106.35ns/i)
          regexp_var     9.162M i/s -      9.248M times in 1.009479s (109.15ns/i)
              string     8.966M i/s -      9.274M times in 1.034343s (111.54ns/i)
          string_var    11.051M i/s -     11.190M times in 1.012538s (90.49ns/i)
Calculating -------------------------------------
              regexp    10.319M i/s -     28.209M times in 2.733707s (96.91ns/i)
          regexp_var    10.032M i/s -     27.485M times in 2.739807s (99.68ns/i)
              string     9.681M i/s -     26.897M times in 2.778397s (103.30ns/i)
          string_var    12.162M i/s -     33.154M times in 2.726046s (82.22ns/i)

Comparison:
          string_var:  12161920.6 i/s
              regexp:  10318949.7 i/s - 1.18x  slower
          regexp_var:  10031617.6 i/s - 1.21x  slower
              string:   9680843.7 i/s - 1.26x  slower
```

## JRuby

It shows String as a pattern is 2.11x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.591M i/s -      7.544M times in 0.993780s (131.74ns/i)
          regexp_var     6.143M i/s -      6.125M times in 0.997038s (162.77ns/i)
              string    14.135M i/s -     14.079M times in 0.996067s (70.75ns/i)
          string_var    14.079M i/s -     14.057M times in 0.998420s (71.03ns/i)
Calculating -------------------------------------
              regexp     9.409M i/s -     22.773M times in 2.420268s (106.28ns/i)
          regexp_var    10.116M i/s -     18.430M times in 1.821820s (98.85ns/i)
              string    21.389M i/s -     42.404M times in 1.982519s (46.75ns/i)
          string_var    20.897M i/s -     42.237M times in 2.021187s (47.85ns/i)

Comparison:
              string:  21389191.1 i/s
          string_var:  20897327.5 i/s - 1.02x  slower
          regexp_var:  10116464.7 i/s - 2.11x  slower
              regexp:   9409222.3 i/s - 2.27x  slower
```

See:
https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1736

---------

https://github.com/ruby/strscan/commit/f9d96c446a

Co-authored-by: Sutou Kouhei &lt;kou@clear-code.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[ruby/strscan] Omit tests for `#scan_byte` and `#peek_byte` on</title>
<updated>2024-03-27T03:17:01+00:00</updated>
<author>
<name>Andrii Konchyn</name>
<email>andry.konchin@gmail.com</email>
</author>
<published>2024-03-27T00:39:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=8fa6c364925bff4e704d4c0fd73555fb33aa7029'/>
<id>8fa6c364925bff4e704d4c0fd73555fb33aa7029</id>
<content type='text'>
TruffleRuby temporarily
(https://github.com/ruby/strscan/pull/91)

The methods were added in #89 but they aren't implemented in TruffleRuby
yet. So let's omit them for now to keep CI green.

https://github.com/ruby/strscan/commit/844d963b56
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
TruffleRuby temporarily
(https://github.com/ruby/strscan/pull/91)

The methods were added in #89 but they aren't implemented in TruffleRuby
yet. So let's omit them for now to keep CI green.

https://github.com/ruby/strscan/commit/844d963b56
</pre>
</div>
</content>
</entry>
<entry>
<title>[ruby/strscan] Add a method for peeking and reading bytes as</title>
<updated>2024-02-26T06:54:54+00:00</updated>
<author>
<name>Aaron Patterson</name>
<email>tenderlove@ruby-lang.org</email>
</author>
<published>2024-02-26T00:45:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=164e464b042239cdbd14d3751a7f907754d580ce'/>
<id>164e464b042239cdbd14d3751a7f907754d580ce</id>
<content type='text'>
integers
(https://github.com/ruby/strscan/pull/89)

This commit adds `scan_byte` and `peek_byte`. `scan_byte` will scan the
current byte, return it as an integer, and advance the cursor.
`peek_byte` will return the current byte as an integer without advancing
the cursor.

Currently `StringScanner#get_byte` returns a string, but I want to get
the current byte without allocating a string. I think this will help
with writing high-performance lexers.
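
The difference between the two methods can be sketched as follows.
`next_byte` and `peek_next_byte` are hypothetical helpers, not part of
strscan; they use `scan_byte` and `peek_byte` where available and fall
back to older calls otherwise.

```ruby
require "strscan"

# Hypothetical helper: return the next byte as an Integer and advance.
def next_byte(scanner)
  if scanner.respond_to?(:scan_byte)
    scanner.scan_byte
  else
    byte = scanner.get_byte            # allocates a one-byte String
    byte ? byte.getbyte(0) : nil
  end
end

# Hypothetical helper: return the next byte as an Integer without advancing.
def peek_next_byte(scanner)
  if scanner.respond_to?(:peek_byte)
    scanner.peek_byte
  else
    scanner.string.getbyte(scanner.pos)
  end
end

scanner = StringScanner.new("ab")
peeked = peek_next_byte(scanner)       # cursor stays at 0
first = next_byte(scanner)             # cursor moves to 1
second = next_byte(scanner)            # cursor moves to 2
at_end = next_byte(scanner)            # nothing left to read
```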

---------

https://github.com/ruby/strscan/commit/873aba2e5d

Co-authored-by: Sutou Kouhei &lt;kou@clear-code.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
integers
(https://github.com/ruby/strscan/pull/89)

This commit adds `scan_byte` and `peek_byte`. `scan_byte` will scan the
current byte, return it as an integer, and advance the cursor.
`peek_byte` will return the current byte as an integer without advancing
the cursor.

Currently `StringScanner#get_byte` returns a string, but I want to get
the current byte without allocating a string. I think this will help
with writing high-performance lexers.

---------

https://github.com/ruby/strscan/commit/873aba2e5d

Co-authored-by: Sutou Kouhei &lt;kou@clear-code.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[ruby/strscan] Don't add begin to length for new string slice</title>
<updated>2024-02-08T05:43:56+00:00</updated>
<author>
<name>Charles Oliver Nutter</name>
<email>headius@headius.com</email>
</author>
<published>2024-02-03T10:56:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=39f2e37ff1c12cf4c9fec0b697a1495bc1930995'/>
<id>39f2e37ff1c12cf4c9fec0b697a1495bc1930995</id>
<content type='text'>
(https://github.com/ruby/strscan/pull/87)

Fixes https://github.com/ruby/strscan/pull/86

https://github.com/ruby/strscan/commit/c17b015c00
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
(https://github.com/ruby/strscan/pull/87)

Fixes https://github.com/ruby/strscan/pull/86

https://github.com/ruby/strscan/commit/c17b015c00
</pre>
</div>
</content>
</entry>
<entry>
<title>[ruby/strscan] Add test to check encoding for empty string</title>
<updated>2024-01-19T01:49:12+00:00</updated>
<author>
<name>NAITOH Jun</name>
<email>naitoh@gmail.com</email>
</author>
<published>2024-01-14T12:26:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=91f35305807f7303bfb58ccdffe86820a2300b8c'/>
<id>91f35305807f7303bfb58ccdffe86820a2300b8c</id>
<content type='text'>
(https://github.com/ruby/strscan/pull/80)

See: https://github.com/ruby/strscan/issues/78#issuecomment-1890849891

https://github.com/ruby/strscan/commit/d0508518a9
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
(https://github.com/ruby/strscan/pull/80)

See: https://github.com/ruby/strscan/issues/78#issuecomment-1890849891

https://github.com/ruby/strscan/commit/d0508518a9
</pre>
</div>
</content>
</entry>
</feed>
