<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/internal/string.h, branch v4.0.4</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>Add SHAPE_ID_HAS_IVAR_MASK for quick ivar check</title>
<updated>2025-06-13T17:46:29+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>jean.boussier@gmail.com</email>
</author>
<published>2025-06-13T09:23:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=a99d941cacbb9d5d277400abf76f5648f91009ea'/>
<id>a99d941cacbb9d5d277400abf76f5648f91009ea</id>
<content type='text'>
This allow checking if an object has ivars with just a shape_id
mask.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This allow checking if an object has ivars with just a shape_id
mask.
</pre>
</div>
</content>
</entry>
<entry>
<title>Lock-free hash set for fstrings [Feature #21268]</title>
<updated>2025-04-18T04:03:54+00:00</updated>
<author>
<name>John Hawthorn</name>
<email>john@hawthorn.email</email>
</author>
<published>2025-03-07T00:16:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=57b6a7503f19a9ffb53b1c505c7e093d7a6e33a4'/>
<id>57b6a7503f19a9ffb53b1c505c7e093d7a6e33a4</id>
<content type='text'>
This implements a hash set which is wait-free for lookup and lock-free
for insert (unless resizing) to use for fstring de-duplication.

As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of
fstrings (frozen interned strings) can significantly reduce the
parallelism of Ractors.

I tried a few other approaches first: using an RWLock, striping a series
of RWlocks (partitioning the hash N-ways to reduce lock contention), and
putting a cache in front of it. All of these improved the situation, but
were unsatisfying as all still required locks for writes (and granular
locks are awkward, since we run the risk of needing to reach a vm
barrier) and this table is somewhat write-heavy.

My main reference for this was Cliff Click's talk on a lock free
hash-table for java https://www.youtube.com/watch?v=HJ-719EGIts. It
turns out this lock-free hash set is made easier to implement by a few
properties:

 * We only need a hash set rather than a hash table (we only need keys,
   not values), and so the full entry can be written as a single VALUE
 * As a set we only need lookup/insert/delete, no update
 * Delete is only run inside GC so does not need to be atomic (It could
   be made concurrent)
 * I use rb_vm_barrier for the (rare) table rebuilds (It could be made
   concurrent) We VM lock (but don't require other threads to stop) for
   table rebuilds, as those are rare
 * The conservative garbage collector makes deferred replication easy,
   using a T_DATA object

Another benefits of having a table specific to fstrings is that we
compare by value on lookup/insert, but by identity on delete, as we only
want to remove the exact string which is being freed. This is faster and
provides a second way to avoid the race condition in
https://bugs.ruby-lang.org/issues/21172.

This is a pretty standard open-addressing hash table with quadratic
probing. Similar to our existing st_table or id_table. Deletes (which
happen on GC) replace existing keys with a tombstone, which is the only
type of update which can occur. Tombstones are only cleared out on
resize.

Unlike st_table, the VALUEs are stored in the hash table itself
(st_table's bins) rather than as a compact index. This avoids an extra
pointer dereference and is possible because we don't need to preserve
insertion order. The table targets a load factor of 2 (it is enlarged
once it is half full).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This implements a hash set which is wait-free for lookup and lock-free
for insert (unless resizing) to use for fstring de-duplication.

As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of
fstrings (frozen interned strings) can significantly reduce the
parallelism of Ractors.

I tried a few other approaches first: using an RWLock, striping a series
of RWlocks (partitioning the hash N-ways to reduce lock contention), and
putting a cache in front of it. All of these improved the situation, but
were unsatisfying as all still required locks for writes (and granular
locks are awkward, since we run the risk of needing to reach a vm
barrier) and this table is somewhat write-heavy.

My main reference for this was Cliff Click's talk on a lock free
hash-table for java https://www.youtube.com/watch?v=HJ-719EGIts. It
turns out this lock-free hash set is made easier to implement by a few
properties:

 * We only need a hash set rather than a hash table (we only need keys,
   not values), and so the full entry can be written as a single VALUE
 * As a set we only need lookup/insert/delete, no update
 * Delete is only run inside GC so does not need to be atomic (It could
   be made concurrent)
 * I use rb_vm_barrier for the (rare) table rebuilds (It could be made
   concurrent) We VM lock (but don't require other threads to stop) for
   table rebuilds, as those are rare
 * The conservative garbage collector makes deferred replication easy,
   using a T_DATA object

Another benefits of having a table specific to fstrings is that we
compare by value on lookup/insert, but by identity on delete, as we only
want to remove the exact string which is being freed. This is faster and
provides a second way to avoid the race condition in
https://bugs.ruby-lang.org/issues/21172.

This is a pretty standard open-addressing hash table with quadratic
probing. Similar to our existing st_table or id_table. Deletes (which
happen on GC) replace existing keys with a tombstone, which is the only
type of update which can occur. Tombstones are only cleared out on
resize.

Unlike st_table, the VALUEs are stored in the hash table itself
(st_table's bins) rather than as a compact index. This avoids an extra
pointer dereference and is possible because we don't need to preserve
insertion order. The table targets a load factor of 2 (it is enlarged
once it is half full).
</pre>
</div>
</content>
</entry>
<entry>
<title>Extract rb_gc_free_fstring to string.c</title>
<updated>2025-04-18T04:03:54+00:00</updated>
<author>
<name>John Hawthorn</name>
<email>john@hawthorn.email</email>
</author>
<published>2025-03-11T06:49:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=89199a47db25a89cbce05f1be6685e96ded32db5'/>
<id>89199a47db25a89cbce05f1be6685e96ded32db5</id>
<content type='text'>
This allows more flexibility in how we deal with the fstring table
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This allows more flexibility in how we deal with the fstring table
</pre>
</div>
</content>
</entry>
<entry>
<title>Freeze $/ and make it ractor safe</title>
<updated>2025-03-27T16:54:56+00:00</updated>
<author>
<name>Étienne Barrié</name>
<email>etienne.barrie@gmail.com</email>
</author>
<published>2025-03-24T11:17:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=6ecfe643b5d8d64682c6f6bce5b27db5c007331d'/>
<id>6ecfe643b5d8d64682c6f6bce5b27db5c007331d</id>
<content type='text'>
[Feature #21109]

By always freezing when setting the global rb_rs variable, we can ensure
it is not modified and can be accessed from a ractor.

We're also making sure it's an instance of String and does not have any
instance variables.

Of course, if $/ is changed at runtime, it may cause surprising behavior
but doing so is deprecated already anyway.

Co-authored-by: Jean Boussier &lt;jean.boussier@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[Feature #21109]

By always freezing when setting the global rb_rs variable, we can ensure
it is not modified and can be accessed from a ractor.

We're also making sure it's an instance of String and does not have any
instance variables.

Of course, if $/ is changed at runtime, it may cause surprising behavior
but doing so is deprecated already anyway.

Co-authored-by: Jean Boussier &lt;jean.boussier@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments (#12069)</title>
<updated>2024-11-13T17:25:09+00:00</updated>
<author>
<name>Randy Stauner</name>
<email>randy.stauner@shopify.com</email>
</author>
<published>2024-11-13T17:25:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=beafae97505f9def3967e958bb1f7bc7fd7b9a7a'/>
<id>beafae97505f9def3967e958bb1f7bc7fd7b9a7a</id>
<content type='text'>
* YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments

String#[] is in the top few C calls of several YJIT benchmarks:
liquid-compile rubocop mail sudoku

This speeds up these benchmarks by 1-2%.

* YJIT: Try harder to get type info for `String#[]`

In the large generated code of the mail gem the context doesn't have
the type info.  In that case if we peek at the stack and add a guard
we can still apply the specialization
and it speeds up the mail benchmark by 5%.

Co-authored-by: Maxime Chevalier-Boisvert &lt;maxime.chevalierboisvert@shopify.com&gt;
Co-authored-by: Takashi Kokubun (k0kubun) &lt;takashikkbn@gmail.com&gt;

---------

Co-authored-by: Maxime Chevalier-Boisvert &lt;maxime.chevalierboisvert@shopify.com&gt;
Co-authored-by: Takashi Kokubun (k0kubun) &lt;takashikkbn@gmail.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments

String#[] is in the top few C calls of several YJIT benchmarks:
liquid-compile rubocop mail sudoku

This speeds up these benchmarks by 1-2%.

* YJIT: Try harder to get type info for `String#[]`

In the large generated code of the mail gem the context doesn't have
the type info.  In that case if we peek at the stack and add a guard
we can still apply the specialization
and it speeds up the mail benchmark by 5%.

Co-authored-by: Maxime Chevalier-Boisvert &lt;maxime.chevalierboisvert@shopify.com&gt;
Co-authored-by: Takashi Kokubun (k0kubun) &lt;takashikkbn@gmail.com&gt;

---------

Co-authored-by: Maxime Chevalier-Boisvert &lt;maxime.chevalierboisvert@shopify.com&gt;
Co-authored-by: Takashi Kokubun (k0kubun) &lt;takashikkbn@gmail.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>Mark strings returned by Symbol#to_s as chilled  (#12065)</title>
<updated>2024-11-13T14:20:00+00:00</updated>
<author>
<name>Jean byroot Boussier</name>
<email>jean.boussier+github@shopify.com</email>
</author>
<published>2024-11-13T14:20:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=6deeec5d459ecff5ec4628523b14ac7379fd942e'/>
<id>6deeec5d459ecff5ec4628523b14ac7379fd942e</id>
<content type='text'>
* Use FL_USER0 for ELTS_SHARED

This makes space in RString for two bits for chilled strings.

* Mark strings returned by `Symbol#to_s` as chilled

[Feature #20350]

`STR_CHILLED` now spans on two user flags. If one bit is set it
marks a chilled string literal, if it's the other it marks a
`Symbol#to_s` chilled string.

Since it's not possible, and doesn't make much sense to include
debug info when `--debug-frozen-string-literal` is set, we can't
include allocation source, but we can safely include the symbol
name in the warning message, making it much easier to find the source
of the issue.

Co-Authored-By: Étienne Barrié &lt;etienne.barrie@gmail.com&gt;

---------

Co-authored-by: Étienne Barrié &lt;etienne.barrie@gmail.com&gt;
Co-authored-by: Jean Boussier &lt;jean.boussier@gmail.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Use FL_USER0 for ELTS_SHARED

This makes space in RString for two bits for chilled strings.

* Mark strings returned by `Symbol#to_s` as chilled

[Feature #20350]

`STR_CHILLED` now spans on two user flags. If one bit is set it
marks a chilled string literal, if it's the other it marks a
`Symbol#to_s` chilled string.

Since it's not possible, and doesn't make much sense to include
debug info when `--debug-frozen-string-literal` is set, we can't
include allocation source, but we can safely include the symbol
name in the warning message, making it much easier to find the source
of the issue.

Co-Authored-By: Étienne Barrié &lt;etienne.barrie@gmail.com&gt;

---------

Co-authored-by: Étienne Barrié &lt;etienne.barrie@gmail.com&gt;
Co-authored-by: Jean Boussier &lt;jean.boussier@gmail.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>Show where mutated chilled strings were allocated</title>
<updated>2024-10-21T10:33:02+00:00</updated>
<author>
<name>Étienne Barrié</name>
<email>etienne.barrie@gmail.com</email>
</author>
<published>2024-10-14T09:28:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=257f78fb671151f1db06dcd8e35cf4cc736f735e'/>
<id>257f78fb671151f1db06dcd8e35cf4cc736f735e</id>
<content type='text'>
[Feature #20205]

The warning now suggests running with --debug-frozen-string-literal:

```
test.rb:3: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information)
```

When using --debug-frozen-string-literal, the location where the string
was created is shown:

```
test.rb:3: warning: literal string will be frozen in the future
test.rb:1: info: the string was created here
```

When resurrecting strings and debug mode is not enabled, the overhead is a simple FL_TEST_RAW.
When mutating chilled strings and deprecation warnings are not enabled,
the overhead is a simple warning category enabled check.

Co-authored-by: Jean Boussier &lt;byroot@ruby-lang.org&gt;
Co-authored-by: Nobuyoshi Nakada &lt;nobu@ruby-lang.org&gt;
Co-authored-by: Jean Boussier &lt;byroot@ruby-lang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[Feature #20205]

The warning now suggests running with --debug-frozen-string-literal:

```
test.rb:3: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information)
```

When using --debug-frozen-string-literal, the location where the string
was created is shown:

```
test.rb:3: warning: literal string will be frozen in the future
test.rb:1: info: the string was created here
```

When resurrecting strings and debug mode is not enabled, the overhead is a simple FL_TEST_RAW.
When mutating chilled strings and deprecation warnings are not enabled,
the overhead is a simple warning category enabled check.

Co-authored-by: Jean Boussier &lt;byroot@ruby-lang.org&gt;
Co-authored-by: Nobuyoshi Nakada &lt;nobu@ruby-lang.org&gt;
Co-authored-by: Jean Boussier &lt;byroot@ruby-lang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Don't export unnecessary string functions</title>
<updated>2024-09-16T18:38:49+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2024-09-16T16:29:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=1e53e46275e2f49a711ff90adddc804d11a347b1'/>
<id>1e53e46275e2f49a711ff90adddc804d11a347b1</id>
<content type='text'>
These functions are not used publicly, so we don't need to export them.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These functions are not used publicly, so we don't need to export them.
</pre>
</div>
</content>
</entry>
<entry>
<title>Precompute embedded string literals hash code</title>
<updated>2024-05-28T05:32:41+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>byroot@ruby-lang.org</email>
</author>
<published>2024-04-08T10:04:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=9e9f1d9301b05604d475573ddd18d6bf5185466c'/>
<id>9e9f1d9301b05604d475573ddd18d6bf5185466c</id>
<content type='text'>
With embedded strings we often have some space left in the slot, which
we can use to store the string Hash code.

It's probably only worth it for string literals, as they are the ones
likely to be used as hash keys.

We chose to store the Hash code right after the string terminator as to
make it easy/fast to compute, and not require one more union in RString.

```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code

|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|
```

Co-Authored-By: Étienne Barrié &lt;etienne.barrie@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With embedded strings we often have some space left in the slot, which
we can use to store the string Hash code.

It's probably only worth it for string literals, as they are the ones
likely to be used as hash keys.

We chose to store the Hash code right after the string terminator as to
make it easy/fast to compute, and not require one more union in RString.

```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code

|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|
```

Co-Authored-By: Étienne Barrié &lt;etienne.barrie@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Stop marking chilled strings as frozen</title>
<updated>2024-05-28T05:32:33+00:00</updated>
<author>
<name>Étienne Barrié</name>
<email>etienne.barrie@gmail.com</email>
</author>
<published>2024-05-27T09:22:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=1376881e9afe6ff673f64afa791cf30f57147ee2'/>
<id>1376881e9afe6ff673f64afa791cf30f57147ee2</id>
<content type='text'>
They were initially made frozen to avoid false positives for cases such
as:

    str = str.dup if str.frozen?

But this may cause bugs and is generally confusing for users.

[Feature #20205]

Co-authored-by: Jean Boussier &lt;byroot@ruby-lang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
They were initially made frozen to avoid false positives for cases such
as:

    str = str.dup if str.frozen?

But this may cause bugs and is generally confusing for users.

[Feature #20205]

Co-authored-by: Jean Boussier &lt;byroot@ruby-lang.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
