ruby.git/benchmark, branch master

pathname: Add simple benchmarks

2026-05-13T08:37:48+00:00

Speed up Integer#to_s with a two digit lookup table (#16719)

2026-05-08T17:10:12+00:00

* numeric: emit two decimal digits per iteration in rb_fix2str

Replace the digit-at-a-time loop in rb_fix2str with the standard
itoa 2-digit lookup table for base 10.  Each iteration now
writes two digits using a single (u % 100, u / 100) pair, so the
number of loop iterations is halved for multi-digit integers.
The classic per-digit loop is kept for non-base-10 conversion.

Benchmark (Apple M-series, 5M-10M ops, best of 3 runs):

  case            base       patch      delta
  ---------       -----      -----      -----
  1-digit   (5)   64 ns/op   64 ns/op    -0%
  2-digit   (42)  64 ns/op   65 ns/op    +2%  (noise)
  3-digit   (400) 66 ns/op   64 ns/op    -3%
  5-digit   (12345)          69 ns/op   67 ns/op    -3%
  10-digit  (1234567890)     77 ns/op   67 ns/op   -13%
  19-digit  (2^62-1)        111 ns/op   75 ns/op   -33%

The crossover is at ~3 digits: below that the constant setup
dominates and the benefit is within noise, above that the halved
iteration count shows up linearly.  Typical Rails payloads mix
short IDs (1-5 digits) and longer values (timestamps, nanos,
large counts), so the win is workload-dependent but strictly
non-negative for real code.

Correctness: 100k random fuzz across the full fixnum range plus
targeted edges (0, ±1, ±99, ±100, 2^30-1, 2^62-1, etc.) all pass.
make test-all shows 34694 tests, 7325860 assertions, 0 new
failures (same pre-existing TestArgf#test_puts flake as on
master) — test_integer.rb alone runs 38 tests / 421628 assertions
of which Integer#to_s exercises the bulk, all pass.

The 200-byte lookup table sits in .rodata and fits in a single
cache line of its own (3 lines for the whole table).  No change
to public API, no change to bignum conversion, no change to
non-base-10 conversion paths.

* bignum: emit two decimal digits per iteration in big2str_2bdigits

Extend the 2-digit lookup-table itoa optimisation from rb_fix2str to
the inner conversion loop used by Bignum#to_s.  big2str_2bdigits has
two code paths — a leading-chunk path that emits variable-length
digits, and a recursive-chunk path that emits a fixed-width zero-
padded block — and both gain from the halved division count.  The
classic per-digit loop is preserved for non-base-10 conversion.

Moves the ruby_decimal_digit_pairs table from a file-static in
numeric.c to bignum.c next to ruby_digitmap, and exposes it through
internal/bignum.h so both files share the same 200-byte .rodata
instance.

Benchmark (Apple M-series, best of 3 runs, measures bignum-only
speedup against the preceding fixnum commit):

  case            base      patch     delta
  ---------       -----     -----     -----
  big_20dig   10^19+...  146 ns/op 124 ns/op  -15%
  big_40dig   10^39+...  174 ns/op 152 ns/op  -13%
  big_100dig  10^99+42   236 ns/op 213 ns/op  -10%
  big_500dig  10^499+7  1119 ns/op 1086 ns/op  -3%
  big_1000dig 10^999    3490 ns/op 3459 ns/op  -1%
  fix_19dig   2^62-1      76 ns/op   76 ns/op   0% (unchanged path)

Wins concentrate in the 20-100 digit range where big2str_2bdigits
is the dominant cost.  Above ~500 digits the Karatsuba divmod
recursion dominates and the digit-emission saving shrinks to the
noise floor.  The 20-100 range is what actual Ruby code exercises
(financial high-precision sums, nanosecond timestamps, large
counters); crypto-size (1000+ digit) bignums are rare in to_s paths.

Correctness: 100k random fixnum fuzz unchanged, 500 random bignum
fuzz up to 2^256 with cross-check against sprintf("%d"), bases
2/8/16/36 round-trip, plus edge cases (0, just-above-fixnum, ±2^100,
20-digit strings near the fixnum boundary).  test/ruby/test_integer.rb
stays at 38 tests / 421628 assertions / 0 failures, test_bignum.rb
passes 74 / 607 / 0 failures, full make test-all reports 34694
tests / 0 new failures (same TestArgf#test_puts pre-existing flake
as master).

* benchmark: add int_to_s yaml for Integer#to_s

Reproducible benchmark for the two preceding commits.  Covers:

- 1/2/3/5/10/19-digit positive fixnums (spans the break-even point
  and the two large-number wins at the top)
- A negative fixnum (exercises the minus-sign prepend path)
- 20/40/100-digit bignums (spans the big2str_2bdigits win range)
- Two string-interpolation scenarios, so reviewers can see how much
  of the Integer#to_s speedup reaches real code that allocates the
  result string too

Intended to be consumed by benchmark-driver against master vs
int-to-s-twodigit for A/B comparison.  Matches the numbers in the
commit messages of 5bfb7e02a2 and c5df6de835.

---------

Co-authored-by: tomoya ishida

ZJIT: Add HIR tests and benchmarks for numeric predicate annotations

2026-04-14T21:04:52+00:00

Add snapshot tests verifying correct HIR generation for each annotated
method:
- Float cfuncs (nan?, finite?, infinite?) emit CCall with BoolExact
  or Fixnum|NilClass return type
- Integer builtins (zero?, even?, odd?) emit InvokeBuiltin with
  BoolExact return type
- Float builtins (zero?, positive?, negative?) emit InvokeBuiltin
  with BoolExact return type

Add benchmark yml files for Float and Integer predicates to the
benchmark/ directory.

dir.c: cache and revalidate working directory

2026-04-10T09:49:58+00:00

`rb_dir_getwd_ospath()` is called quite frequently, but
the overwhelming majority of the time, the current directory
didn't change.

We can also assume that most of the time, `PATH_MAX` is enough
for `getcwd`, hence we can first attempt to use a small stack
buffer rather than always allocate on the heap.

This way we can keep the last `pwd` and revalidate it with no
allocation.

On macOS syscalls are fairly slow, so the gain isn't very large.

macOS:

```
compare-ruby: ruby 4.1.0dev (2026-04-09T05:19:02Z master c091c186e4) +PRISM [arm64-darwin25]
built-ruby: ruby 4.1.0dev (2026-04-09T06:37:20Z get-cwd-cache ea02126d79) +PRISM [arm64-darwin25]
```

|         |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|Dir.pwd  |    105.183k|  113.420k|
|         |           -|     1.08x|
```

Linux (inside virtualized Docker)

```
compare-ruby: ruby 4.1.0dev (2026-04-07T08:26:25Z master fcd210086c) +PRISM [aarch64-linux]
built-ruby: ruby 4.1.0dev (2026-04-09T06:38:09Z get-cwd-cache 6774af9ba7) +PRISM [aarch64-linux]
```

|         |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|Dir.pwd  |      4.157M|    5.541M|
|         |           -|     1.33x|

Add single byte fast path to `File.expand_path`

2026-04-10T06:37:12+00:00

Similar to previous changes to `File.join` & al.

```
compare-ruby: ruby 4.1.0dev (2026-04-09T12:24:09Z master c919778017) +PRISM [arm64-darwin25]
built-ruby: ruby 4.1.0dev (2026-04-09T17:42:24Z expand-path-mbenc 4fd047f73c) +PRISM [arm64-darwin25]
```

|             |compare-ruby|built-ruby|
|:------------|-----------:|---------:|
|expand_path  |    719.828k|    1.922M|
|             |           -|     2.67x|

Speed up memmem on Apple

2026-03-13T13:15:55+00:00

Apple's libc implementation of memmem is super slow (it is a forked
version of freebsd's that never got vectorized). Instead, we should
fall back to the rolling hash on Apple. In the attached benchmark,
I'm seeing 1.07% slower to 30.34% slower, depending on the
haystack.

For reference, here are the various implementations I checked:

* musl: https://git.musl-libc.org/cgit/musl/tree/src/string/memmem.c
* freebsd: https://github.com/freebsd/freebsd-src/blob/main/lib/libc/string/memmem.c
* apple: https://github.com/apple-oss-distributions/Libc/blob/main/string/FreeBSD/memmem.c

You can see Apple just linearly searches through the string and
calls memcmp each time, whereas the other two do a window'd rolling
hash similar to the fallback Ruby already has.

Look up slot sizes for allocations in a table

2026-03-09T15:09:53+00:00

Also remove BASE_SLOT_SIZE.

Optimize File.basename

2026-01-21T10:23:01+00:00

The actual algorithm is largely unchanged, just allowed to use
singlebyte checks for common encodings.

It could certainly be optimized much further, as here again it often
scans from the front of the string when we're interested in the back of
it. But the algorithm as many Windows only corner cases so I'd rather
ship a good improvement now and eventually come back to it later.

Most of improvement here is from the reduced setup cost (avodi double
null checks, avoid duping the argument, etc), and skipping the multi-byte
checks.

```
compare-ruby: ruby 4.1.0dev (2026-01-19T03:51:30Z master 631bf19b37) +PRISM [arm64-darwin25]
built-ruby: ruby 4.1.0dev (2026-01-21T08:21:05Z opt-basename 7eb11745b2) +PRISM [arm64-darwin25]
```

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|long       |      3.412M|   18.158M|
|           |           -|     5.32x|
|long_name  |      1.981M|    8.580M|
|           |           -|     4.33x|
|withext    |      3.200M|   12.986M|
|           |           -|     4.06x|

Optimize `File.extname` for common encodings

2026-01-20T08:58:51+00:00

Similar optimizations to the ones performed in GH-15907.

- Skip the expensive multi-byte encoding handling for the common
  encodings that are known to be safe.
- Use `CheckPath` to save on copying the argument and only scan it for
  NULL bytes once.
- Create the return string with rb_enc_str_new instead of rb_str_subseq
  as it's going to be a very small string anyway.

This could be optimized a little bit further by searching for both `.` and `dirsep`
in one pass,

```
compare-ruby: ruby 4.1.0dev (2026-01-19T03:51:30Z master 631bf19b37) +PRISM [arm64-darwin25]
built-ruby: ruby 4.1.0dev (2026-01-20T07:33:42Z master 6fb50434e3) +PRISM [arm64-darwin25]
```

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|long       |      3.606M|   22.229M|
|           |           -|     6.17x|
|long_name  |      2.254M|   13.416M|
|           |           -|     5.95x|
|short      |     16.488M|   29.969M|
|           |           -|     1.82x|

file.c: dirname_n also use strrdirsep when n > 1

2026-01-20T07:33:42+00:00

It's both simpler and faster.

|       |compare-ruby|built-ruby|
|:------|-----------:|---------:|
|long   |      3.960M|   24.072M|
|       |           -|     6.08x|
|short  |     15.417M|   29.841M|
|       |           -|     1.94x|
|n_4    |      3.858M|   18.415M|
|       |           -|     4.77x|