<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/ext/strscan/strscan.c, branch v3_4_9</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>[ruby/strscan] [DOC] Add syntax highlighting to MarkDown code blocks</title>
<updated>2024-12-16T01:10:34+00:00</updated>
<author>
<name>Alexander Momchilov</name>
<email>amomchilov@users.noreply.github.com</email>
</author>
<published>2024-12-13T01:28:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=41e24c2f3e9a5ff29cccbfe92ecf4d412e5a4e0d'/>
<id>41e24c2f3e9a5ff29cccbfe92ecf4d412e5a4e0d</id>
<content type='text'>
(https://github.com/ruby/strscan/pull/126)

Split off from https://github.com/ruby/ruby/pull/12322

https://github.com/ruby/strscan/commit/9bee37e0f5
</content>
</entry>
<entry>
<title>[ruby/strscan] Bump version</title>
<updated>2024-12-16T01:10:34+00:00</updated>
<author>
<name>Sutou Kouhei</name>
<email>kou@clear-code.com</email>
</author>
<published>2024-12-12T02:41:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=219c2eee5a4a2b76f054c396635893e6139694a4'/>
<id>219c2eee5a4a2b76f054c396635893e6139694a4</id>
<content type='text'>
https://github.com/ruby/strscan/commit/fd140b8582
</content>
</entry>
<entry>
<title>Lock released version of strscan-3.1.1</title>
<updated>2024-12-12T07:14:25+00:00</updated>
<author>
<name>Hiroshi SHIBATA</name>
<email>hsbt@ruby-lang.org</email>
</author>
<published>2024-12-12T07:14:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=78ca87f8a8c79f0af1b7c6a0d819faacd75ec76e'/>
<id>78ca87f8a8c79f0af1b7c6a0d819faacd75ec76e</id>
<content type='text'>
</content>
</entry>
<entry>
<title>[ruby/strscan] Micro optimize encoding checks</title>
<updated>2024-12-02T01:50:34+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>jean.boussier@gmail.com</email>
</author>
<published>2024-11-28T04:15:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=636d57bd1c523ef3653708e4010270919a01b2a0'/>
<id>636d57bd1c523ef3653708e4010270919a01b2a0</id>
<content type='text'>
(https://github.com/ruby/strscan/pull/117)

Profiling shows a lot of time spent in various encoding check functions.
I'm working on optimizing them on the Ruby side, but if we assume most
strings use one of the three simple encodings, we can skip a lot of
that overhead.

```ruby
require 'strscan'
require 'benchmark/ips'

source = 10_000.times.map { rand(9999999).to_s }.join(",").force_encoding(Encoding::UTF_8).freeze

def scan_to_i(source)
  scanner = StringScanner.new(source)
  while number = scanner.scan(/\d+/)
    number.to_i
    scanner.skip(",")
  end
end

def scan_integer(source)
  scanner = StringScanner.new(source)
  while scanner.scan_integer
    scanner.skip(",")
  end
end

Benchmark.ips do |x|
  x.report("scan.to_i") { scan_to_i(source) }
  x.report("scan_integer") { scan_integer(source) }
  x.compare!
end
```

Before:

```
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
           scan.to_i    93.000 i/100ms
        scan_integer   232.000 i/100ms
Calculating -------------------------------------
           scan.to_i    933.191 (± 0.2%) i/s    (1.07 ms/i) -      4.743k in   5.082597s
        scan_integer      2.326k (± 0.8%) i/s  (429.99 μs/i) -     11.832k in   5.087974s

Comparison:
        scan_integer:     2325.6 i/s
           scan.to_i:      933.2 i/s - 2.49x  slower
```

After:

```
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
           scan.to_i    96.000 i/100ms
        scan_integer   274.000 i/100ms
Calculating -------------------------------------
           scan.to_i    969.489 (± 0.2%) i/s    (1.03 ms/i) -      4.896k in   5.050114s
        scan_integer      2.756k (± 0.1%) i/s  (362.88 μs/i) -     13.974k in   5.070837s

Comparison:
        scan_integer:     2755.8 i/s
           scan.to_i:      969.5 i/s - 2.84x  slower
```
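
A minimal Ruby sketch of the fast-path idea above, for illustration only. It assumes the "simple 3 encodings" are UTF-8, US-ASCII and ASCII-8BIT (`Encoding::BINARY`) and that the check being short-circuited is an ASCII-compatibility check. The actual change is in the C extension, and `FAST_PATH_ENCODINGS` and `must_ascii_compat!` are hypothetical names.

```ruby
# Hypothetical illustration: strscan performs this check in C, not with
# Encoding objects.
FAST_PATH_ENCODINGS = [Encoding::UTF_8, Encoding::US_ASCII, Encoding::BINARY].freeze

# Returns nil when `str` has an ASCII-compatible encoding; raises otherwise.
def must_ascii_compat!(str)
  enc = str.encoding
  # Fast path: the three common encodings are known to be ASCII-compatible,
  # so a small membership test settles it.
  return nil if FAST_PATH_ENCODINGS.include?(enc)

  # Slow path: the general check, needed only for other encodings.
  return nil if enc.ascii_compatible?

  raise Encoding::CompatibilityError, "ASCII incompatible encoding: #{enc}"
end

must_ascii_compat!("123,456")                  # fast path
must_ascii_compat!("123".encode("ISO-8859-1")) # slow path, still accepted
```

A string such as `"123".encode("UTF-16LE")` falls through both checks and raises `Encoding::CompatibilityError`. The point of the design is that the common case is settled by a cheap test before any general encoding logic runs.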

https://github.com/ruby/strscan/commit/c02b1ce684
</content>
</entry>
<entry>
<title>StringScanner#scan_integer support base 16 integers (#116)</title>
<updated>2024-12-02T01:50:34+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>jean.boussier@gmail.com</email>
</author>
<published>2024-11-27T08:31:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=79cc3d26ed3a038750988070d81912ece31c735b'/>
<id>79cc3d26ed3a038750988070d81912ece31c735b</id>
<content type='text'>
Followup: https://github.com/ruby/strscan/pull/115

`scan_integer` is now implemented in Ruby so that it can handle
keyword arguments efficiently without allocating a Hash. Since the goal
of `scan_integer` is to parse integers more efficiently without
allocating an intermediate object, using `rb_scan_args` would defeat
the purpose.

Additionally, the C implementation now uses `rb_isdigit` and
`rb_isxdigit`, because on Windows `isdigit` is locale-dependent.
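
A rough pure-Ruby sketch of the split described above, assuming a `[+-]?(0x)?[0-9a-fA-F]+` syntax for base 16. A Ruby-level method receives the `base:` keyword and dispatches to a per-base helper that takes no arguments, and digits are matched as explicit ASCII ranges in the spirit of `rb_isdigit`/`rb_isxdigit`. `IntegerScanner` and its helpers are hypothetical; unlike the real `scan_integer`, they allocate the matched substring before converting it.

```ruby
require "strscan"

# Hypothetical wrapper for illustration; not strscan's actual code.
class IntegerScanner
  # Explicit ASCII ranges: only "0".."9" (plus "a".."f" and "A".."F" for
  # base 16) count as digits, independent of locale.
  DECIMAL = /[+-]?[0-9]+/
  HEX = /[+-]?(?:0x)?[0-9a-fA-F]+/

  attr_reader :scanner

  def initialize(string)
    @scanner = StringScanner.new(string)
  end

  # The keyword argument is handled here in Ruby; each branch calls a
  # helper that takes no arguments at all.
  def scan_integer(base: 10)
    case base
    when 10 then scan_decimal
    when 16 then scan_hex
    else raise ArgumentError, "unsupported integer base: #{base.inspect}"
    end
  end

  private

  # Both helpers return nil (without moving) when nothing matches.
  def scan_decimal
    text = @scanner.scan(DECIMAL)
    text and Integer(text, 10)
  end

  def scan_hex
    text = @scanner.scan(HEX)
    # Integer() with base 16 accepts an optional sign and "0x" prefix.
    text and Integer(text, 16)
  end
end

s = IntegerScanner.new("ff,-0x10,42")
s.scan_integer(base: 16) # returns 255
```

Keeping the keyword handling in Ruby means the per-base helpers never see optional arguments, which is the property the change relies on to avoid `rb_scan_args` and the Hash it would allocate.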
</content>
</entry>
<entry>
<title>[ruby/strscan] Implement #scan_integer to efficiently parse Integer</title>
<updated>2024-11-27T00:24:07+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>jean.boussier@gmail.com</email>
</author>
<published>2024-11-26T08:22:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=d5de1a57893b16aff7bc3336b34fa2e9acefb3d2'/>
<id>d5de1a57893b16aff7bc3336b34fa2e9acefb3d2</id>
<content type='text'>
(https://github.com/ruby/strscan/pull/115)

Fix: https://github.com/ruby/strscan/issues/113

This allows parsing an Integer directly from a String without first
allocating a substring.

Notes:

The implementation is deliberately limited as a first step: only the
most straightforward base-10 integers are supported.
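
To make the "no substring" idea concrete, here is a pure-Ruby sketch that reads bytes directly from the scanner's string with `String#getbyte`, accumulates the value, and advances `pos` only when at least one digit was read. `scan_decimal_without_substring` is a hypothetical name; the real method is implemented in C, and this sketch does not update match data the way `scan` does.

```ruby
require "strscan"

# Hypothetical helper, not strscan's implementation. Parses an optional
# sign and ASCII decimal digits at the scanner's position by reading
# bytes, so no intermediate substring is created for the digits.
def scan_decimal_without_substring(scanner)
  str = scanner.string
  i = scanner.pos # byte offset, the same unit String#getbyte uses
  sign = 1
  case str.getbyte(i)
  when "+".ord
    i += 1
  when "-".ord
    sign = -1
    i += 1
  end

  value = 0
  digits = 0
  # getbyte returns nil past the end of the string, which ends the loop.
  while (byte = str.getbyte(i)) and byte.between?("0".ord, "9".ord)
    value = value * 10 + (byte - "0".ord)
    digits += 1
    i += 1
  end
  return nil if digits.zero? # no digits: leave pos where it was

  scanner.pos = i
  sign * value
end

scanner = StringScanner.new("123,-45")
scan_decimal_without_substring(scanner) # returns 123
scanner.pos                             # returns 3
```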

https://github.com/ruby/strscan/commit/6a3c74b4c8
</content>
</entry>
<entry>
<title>[ruby/strscan] [CRuby] Optimize `strscan_do_scan()`: Remove</title>
<updated>2024-10-26T09:44:15+00:00</updated>
<author>
<name>NAITOH Jun</name>
<email>naitoh@gmail.com</email>
</author>
<published>2024-10-16T00:59:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=e73f35ddaf0510f5ce620340454cb69cd4228162'/>
<id>e73f35ddaf0510f5ce620340454cb69cd4228162</id>
<content type='text'>
unnecessary use of `rb_enc_get()`
(https://github.com/ruby/strscan/pull/108)

- before: #106

## Why?

In `rb_strseq_index()`, the encoding returned by `rb_enc_check()` is the one passed on to the search, so a separate `rb_enc_get()` call is unnecessary.

- https://github.com/ruby/ruby/blob/6c7209cd3788ceec01e504d99057f9d3b396be84/string.c#L4335-L4368
&gt; enc = rb_enc_check(str, sub);

&gt; return strseq_core(str_ptr, str_ptr_end, str_len, sub_ptr, sub_len,
&gt; offset, enc);

- https://github.com/ruby/ruby/blob/6c7209cd3788ceec01e504d99057f9d3b396be84/string.c#L4309-L4318
```C
strseq_core(const char *str_ptr, const char *str_ptr_end, long str_len,
            const char *sub_ptr, long sub_len, long offset, rb_encoding *enc)
{
    const char *search_start = str_ptr;
    long pos, search_len = str_len - offset;

    for (;;) {
        const char *t;
        pos = rb_memsearch(sub_ptr, sub_len, search_start, search_len, enc);
```

## Benchmark

It shows that a String pattern is 1.24x faster than a Regexp pattern, comparing the fastest case of each (`string_var` vs. `regexp_var`).

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     9.225M i/s -      9.328M times in 1.011068s (108.40ns/i)
          regexp_var     9.327M i/s -      9.413M times in 1.009214s (107.21ns/i)
              string     9.200M i/s -      9.355M times in 1.016840s (108.70ns/i)
          string_var    11.249M i/s -     11.255M times in 1.000578s (88.90ns/i)
Calculating -------------------------------------
              regexp     9.565M i/s -     27.676M times in 2.893476s (104.55ns/i)
          regexp_var    10.111M i/s -     27.982M times in 2.767496s (98.90ns/i)
              string    10.060M i/s -     27.600M times in 2.743465s (99.40ns/i)
          string_var    12.519M i/s -     33.746M times in 2.695615s (79.88ns/i)

Comparison:
          string_var:  12518707.2 i/s
          regexp_var:  10111089.6 i/s - 1.24x  slower
              string:  10060144.4 i/s - 1.24x  slower
              regexp:   9565124.4 i/s - 1.31x  slower
```

https://github.com/ruby/strscan/commit/ff2d7afa19
</content>
</entry>
<entry>
<title>[ruby/strscan] Use C90 as far as supporting 2.6 or earlier</title>
<updated>2024-10-26T09:44:15+00:00</updated>
<author>
<name>Nobuyoshi Nakada</name>
<email>nobu@ruby-lang.org</email>
</author>
<published>2024-10-01T21:28:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=d6046bccb7bbfd7b1c5810da16a5c86ee22a19fc'/>
<id>d6046bccb7bbfd7b1c5810da16a5c86ee22a19fc</id>
<content type='text'>
(https://github.com/ruby/strscan/pull/101)

https://github.com/ruby/strscan/commit/d31274f41b
</content>
</entry>
<entry>
<title>[ruby/strscan] Accept String as a pattern at non head</title>
<updated>2024-09-17T06:12:25+00:00</updated>
<author>
<name>NAITOH Jun</name>
<email>naitoh@gmail.com</email>
</author>
<published>2024-09-14T00:32:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=d81b0588bb3c97167d1f7e2d2a74185e0c19b68c'/>
<id>d81b0588bb3c97167d1f7e2d2a74185e0c19b68c</id>
<content type='text'>
(https://github.com/ruby/strscan/pull/106)

It adds support for String patterns in non-head matching methods such
as StringScanner#scan_until.

Using a String as a pattern can improve match performance.
Here are the results of the included benchmark.

## CRuby

It shows that a String pattern is 1.18x faster than a Regexp pattern, comparing the fastest case of each (`string_var` vs. `regexp`).

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     9.403M i/s -      9.548M times in 1.015459s (106.35ns/i)
          regexp_var     9.162M i/s -      9.248M times in 1.009479s (109.15ns/i)
              string     8.966M i/s -      9.274M times in 1.034343s (111.54ns/i)
          string_var    11.051M i/s -     11.190M times in 1.012538s (90.49ns/i)
Calculating -------------------------------------
              regexp    10.319M i/s -     28.209M times in 2.733707s (96.91ns/i)
          regexp_var    10.032M i/s -     27.485M times in 2.739807s (99.68ns/i)
              string     9.681M i/s -     26.897M times in 2.778397s (103.30ns/i)
          string_var    12.162M i/s -     33.154M times in 2.726046s (82.22ns/i)

Comparison:
          string_var:  12161920.6 i/s
              regexp:  10318949.7 i/s - 1.18x  slower
          regexp_var:  10031617.6 i/s - 1.21x  slower
              string:   9680843.7 i/s - 1.26x  slower
```

## JRuby

It shows that a String pattern is 2.11x faster than a Regexp pattern, comparing the fastest case of each (`string` vs. `regexp_var`).

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.591M i/s -      7.544M times in 0.993780s (131.74ns/i)
          regexp_var     6.143M i/s -      6.125M times in 0.997038s (162.77ns/i)
              string    14.135M i/s -     14.079M times in 0.996067s (70.75ns/i)
          string_var    14.079M i/s -     14.057M times in 0.998420s (71.03ns/i)
Calculating -------------------------------------
              regexp     9.409M i/s -     22.773M times in 2.420268s (106.28ns/i)
          regexp_var    10.116M i/s -     18.430M times in 1.821820s (98.85ns/i)
              string    21.389M i/s -     42.404M times in 1.982519s (46.75ns/i)
          string_var    20.897M i/s -     42.237M times in 2.021187s (47.85ns/i)

Comparison:
              string:  21389191.1 i/s
          string_var:  20897327.5 i/s - 1.02x  slower
          regexp_var:  10116464.7 i/s - 2.11x  slower
              regexp:   9409222.3 i/s - 2.27x  slower
```

See:
https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1736
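
For comparison, before this change a literal search at a non-head position needed a Regexp, usually built with `Regexp.escape` so that characters such as `.` match literally. The sketch below shows that workaround with a hypothetical `literal_regexp` helper; with String patterns accepted, `scanner.scan_until("a.b")` expresses the same literal search without building a Regexp.

```ruby
require "strscan"

# Hypothetical helper: a Regexp matching `literal` verbatim, so "." and
# other metacharacters lose their special meaning.
def literal_regexp(literal)
  Regexp.new(Regexp.escape(literal))
end

scanner = StringScanner.new("xa-b a.b tail")
# An unescaped /a.b/ would stop at "a-b"; the escaped one skips it.
scanner.scan_until(literal_regexp("a.b")) # returns "xa-b a.b"
scanner.pos                               # returns 8
```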

---------

https://github.com/ruby/strscan/commit/f9d96c446a

Co-authored-by: Sutou Kouhei &lt;kou@clear-code.com&gt;
</content>
</entry>
<entry>
<title>Added pre-release suffix for development version of default gems</title>
<updated>2024-08-31T05:22:17+00:00</updated>
<author>
<name>Hiroshi SHIBATA</name>
<email>hsbt@ruby-lang.org</email>
</author>
<published>2024-08-31T05:19:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=32f134bb8541b21b941c49c68b5bf91cf62c97dc'/>
<id>32f134bb8541b21b941c49c68b5bf91cf62c97dc</id>
<content type='text'>
https://github.com/ruby/stringio/issues/81
</content>
</entry>
</feed>
