| author | NAITOH Jun <naitoh@gmail.com> | 2024-10-16 09:59:44 +0900 |
|---|---|---|
| committer | Hiroshi SHIBATA <hsbt@ruby-lang.org> | 2024-10-26 18:44:15 +0900 |
| commit | e73f35ddaf0510f5ce620340454cb69cd4228162 (patch) | |
| tree | 44b13a3165485ce8f5aceaa4dd2a0b1b0012b378 | |
| parent | d6046bccb7bbfd7b1c5810da16a5c86ee22a19fc (diff) | |
[ruby/strscan] [CRuby] Optimize `strscan_do_scan()`: Remove
unnecessary use of `rb_enc_get()`
(https://github.com/ruby/strscan/pull/108)
- before: #106
## Why?
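In `rb_strseq_index()`, the encoding returned by `rb_enc_check()` is passed on to `strseq_core()` and from there to `rb_memsearch()`. `strscan_do_scan()` can reuse that return value the same way instead of calling `rb_enc_get()` separately.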
- https://github.com/ruby/ruby/blob/6c7209cd3788ceec01e504d99057f9d3b396be84/string.c#L4335-L4368
> enc = rb_enc_check(str, sub);
> return strseq_core(str_ptr, str_ptr_end, str_len, sub_ptr, sub_len, offset, enc);
- https://github.com/ruby/ruby/blob/6c7209cd3788ceec01e504d99057f9d3b396be84/string.c#L4309-L4318
```C
strseq_core(const char *str_ptr, const char *str_ptr_end, long str_len,
const char *sub_ptr, long sub_len, long offset, rb_encoding *enc)
{
const char *search_start = str_ptr;
long pos, search_len = str_len - offset;
for (;;) {
const char *t;
pos = rb_memsearch(sub_ptr, sub_len, search_start, search_len, enc);
```
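As a rough Ruby-level analogy (not part of this commit), `Encoding.compatible?` reports the encoding two strings can share, much like the value `rb_enc_check()` returns in C. The difference is that `rb_enc_check()` raises `Encoding::CompatibilityError` where `Encoding.compatible?` returns `nil`. The variable names below are illustrative only. The sketch shows that the pattern's own encoding and the shared encoding can differ for an ASCII-only pattern.

```ruby
# Ruby-level analogy only; the actual change is in C (ext/strscan/strscan.c).
haystack = "\u65E5\u672C\u8A9E text"               # UTF-8 string with non-ASCII characters
pattern  = "text".dup.force_encoding("US-ASCII")  # ASCII-only pattern in another encoding

# Roughly what rb_enc_get(pattern) yields: the pattern's own encoding.
pattern_enc = pattern.encoding

# Roughly what rb_enc_check(str, pattern) yields: the encoding both strings share.
compatible_enc = Encoding.compatible?(haystack, pattern)

puts "pattern encoding:    #{pattern_enc}"
puts "compatible encoding: #{compatible_enc}"
```

Since the compatibility check already has to run, keeping its return value avoids a second encoding lookup on the same objects.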
## Benchmark
With a String pattern (`string_var`), the benchmark runs 1.24x faster than with a Regexp pattern (`regexp_var`).
```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 9.225M i/s - 9.328M times in 1.011068s (108.40ns/i)
regexp_var 9.327M i/s - 9.413M times in 1.009214s (107.21ns/i)
string 9.200M i/s - 9.355M times in 1.016840s (108.70ns/i)
string_var 11.249M i/s - 11.255M times in 1.000578s (88.90ns/i)
Calculating -------------------------------------
regexp 9.565M i/s - 27.676M times in 2.893476s (104.55ns/i)
regexp_var 10.111M i/s - 27.982M times in 2.767496s (98.90ns/i)
string 10.060M i/s - 27.600M times in 2.743465s (99.40ns/i)
string_var 12.519M i/s - 33.746M times in 2.695615s (79.88ns/i)
Comparison:
string_var: 12518707.2 i/s
regexp_var: 10111089.6 i/s - 1.24x slower
string: 10060144.4 i/s - 1.24x slower
regexp: 9565124.4 i/s - 1.31x slower
```
https://github.com/ruby/strscan/commit/ff2d7afa19
| -rw-r--r-- | ext/strscan/strscan.c | 7 |
1 files changed, 4 insertions, 3 deletions
diff --git a/ext/strscan/strscan.c b/ext/strscan/strscan.c
index 1da53d8620..e1559cb5c3 100644
--- a/ext/strscan/strscan.c
+++ b/ext/strscan/strscan.c
@@ -709,7 +709,7 @@ strscan_do_scan(VALUE self, VALUE pattern, int succptr, int getstr, int headonly
     }
     else {
         StringValue(pattern);
-        rb_enc_check(p->str, pattern);
+        rb_encoding *enc = rb_enc_check(p->str, pattern);
         if (S_RESTLEN(p) < RSTRING_LEN(pattern)) {
             return Qnil;
         }
@@ -719,9 +719,10 @@ strscan_do_scan(VALUE self, VALUE pattern, int succptr, int getstr, int headonly
                 return Qnil;
             }
             set_registers(p, RSTRING_LEN(pattern));
-        } else {
+        }
+        else {
             long pos = rb_memsearch(RSTRING_PTR(pattern), RSTRING_LEN(pattern),
-                                    CURPTR(p), S_RESTLEN(p), rb_enc_get(pattern));
+                                    CURPTR(p), S_RESTLEN(p), enc);
             if (pos == -1) {
                 return Qnil;
             }
