ruby.git/test/ruby/test_string.rb, branch v3_4_9

Don't modify fstrings in rb_str_tmp_frozen_no_embed_acquire (#15104)

2025-11-08T03:44:56+00:00

[Bug #21671]

merge revision(s) fa85d23ff4a02985ebfe0716b0ff768f5b4fe13d: [Backport #21380]

2025-07-14T21:23:45+00:00

	[Bug #21380] Prohibit modification in String#split block

	Reported at https://hackerone.com/reports/3163876

Many Oniguruma functions need validly encoded strings

2024-11-26T02:46:34+00:00

Check negative integer underflow

2024-11-26T02:46:34+00:00

Rename size_pool -> heap

2024-10-03T20:20:09+00:00

Now that we've inlined the eden_heap into the size_pool, we should
rename the size_pool to heap, so that Ruby contains multiple heaps, each
holding objects of a different size.

The term heap, meaning a collection of memory pages, is more in keeping
with memory management nomenclature, whereas size_pool was a name chosen
out of necessity during the development of the Variable Width Allocation
features of Ruby.

The concept of size pools was introduced in order to facilitate
different sized objects (other than the default 40 bytes). They wrapped
the eden heap and the tomb heap, and some related state, and provided a
reasonably simple way of duplicating all related concerns, to provide
multiple pools that all shared the same structure but held different
objects.

Since then, various changes have happened in Ruby's memory layout:

* The concept of tomb heaps has been replaced by a global free pages list,
  with each page having its slot size reconfigured at the point when it
  is resurrected.
* The eden heap has been inlined into the size pool itself, so that now
  the size pool directly controls the free_pages list, the sweeping
  page, the compaction cursor and the other state that was previously
  being managed by the eden heap.

Now that there is no need for a heap wrapper, we should refer to the
collection of pages containing Ruby objects as a heap again rather than
a size pool.
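
As a rough illustration of "multiple heaps, each with its own slot size", the
per-heap statistics can be inspected from Ruby. This is only a sketch: it
assumes CRuby 3.1 or later, where `GC.stat_heap` is available, and the helper
name `heap_slot_sizes` is hypothetical, not part of Ruby.

```ruby
# Hypothetical helper: collect the slot size reported for each heap.
# GC.stat_heap returns a hash keyed by heap index; each value is a hash
# of statistics that includes :slot_size.
def heap_slot_sizes
  GC.stat_heap.map { |_index, stats| stats[:slot_size] }
end

# Print one line per heap; the actual sizes depend on the Ruby build.
heap_slot_sizes.each_with_index do |size, index|
  puts "heap #{index}: #{size}-byte slots"
end
```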

Implement String#append_as_bytes(String | Integer, ...)

2024-09-09T13:04:51+00:00

[Feature #20594]

A handy method to construct a string out of multiple chunks.

Contrary to `String#concat`, it doesn't do any encoding negotiation,
and simply appends the content as bytes regardless of whether this
results in a broken string or not.

It's the caller's responsibility to check for `String#valid_encoding?`
in cases where it's needed.

When passed integers, only the lower byte is considered, like in
`String#setbyte`.
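
A minimal sketch of the behaviour described above. `append_bytes` is a
hypothetical wrapper, not part of Ruby: it calls `String#append_as_bytes`
where that method exists, and otherwise approximates it with a binary
round-trip so the example also runs on older versions. The fallback is an
assumption about equivalent behaviour, not the actual implementation.

```ruby
# Hypothetical helper: append chunks as raw bytes, keeping buf's encoding.
def append_bytes(buf, *chunks)
  return buf.append_as_bytes(*chunks) if buf.respond_to?(:append_as_bytes)

  # Fallback for Rubies without append_as_bytes: treat everything as binary.
  original_encoding = buf.encoding
  buf.force_encoding(Encoding::BINARY)
  chunks.each do |chunk|
    # Integers contribute only their lowest byte, as with String#setbyte.
    buf << (chunk.is_a?(Integer) ? (chunk & 0xff).chr : chunk.b)
  end
  buf.force_encoding(original_encoding)
end

buf = +"caf"                       # mutable UTF-8 string
append_bytes(buf, "\xC3".b, 0xA9)  # the two bytes that encode "é" in UTF-8
puts buf.valid_encoding?           # the bytes happen to form valid UTF-8

append_bytes(buf, 0x1FF)           # only the low byte, 0xFF, is appended
puts buf.valid_encoding?           # a lone 0xFF byte breaks the string
```

No error is raised when the result is broken; the caller has to check
`valid_encoding?` explicitly.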

Fix memory leak in String#start_with? when regexp times out

2024-07-26T12:42:38+00:00

[Bug #20653]

This commit refactors how Onigmo handles timeouts. Instead of raising a
timeout error itself, onig_search now returns ONIGERR_TIMEOUT, so that the
caller can free memory and then raise the timeout error.

This fixes a memory leak in String#start_with? when the regexp times out.
For example:

    regex = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
    str = "a" * 1000000 + "x"

    10.times do
      100.times do
        str.start_with?(regex)
      rescue
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    33216
    51936
    71152
    81728
    97152
    103248
    120384
    133392
    133520
    133616

After:

    14912
    15376
    15824
    15824
    16128
    16128
    16144
    16144
    16160
    16160

Stop marking chilled strings as frozen

2024-05-28T05:32:33+00:00

They were initially made frozen to avoid false positives for cases such
as:

    str = str.dup if str.frozen?

But this may cause bugs and is generally confusing for users.

[Feature #20205]

Co-authored-by: Jean Boussier
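
The defensive-copy idiom mentioned above can be sketched as follows. The
helper name `mutable` is hypothetical and used only for illustration. The
example uses an explicitly frozen string, so it does not depend on how a
given Ruby version reports chilled literals.

```ruby
# Hypothetical helper: return a string that is safe to mutate,
# duplicating the argument only when it reports itself as frozen.
def mutable(str)
  str = str.dup if str.frozen?
  str
end

frozen = "abc".freeze
copy = mutable(frozen)
copy << "d"            # safe: copy is an unfrozen duplicate
puts copy
puts frozen            # the frozen original is left untouched
```

A string that answers `false` to `frozen?` is returned as-is by this idiom,
so after this change a chilled string passes through without being copied.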

test_uplus_minus: Use a different string literal

2024-04-17T08:35:14+00:00

This test fails relatively frequently, and it's unclear what is
happening.

```
str: {"address":"0x7fbdeb26d4e0", "type":"STRING", "shape_id":1, "slot_size":40, "class":"0x7fbdd1e0ec50", "frozen":true, "embedded":true, "fstring":true, "bytesize":3, "value":"bar", "encoding":"UTF-8", "coderange":"7bit", "memsize":40, "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}}
bar: {"address":"0x7fbdd0a8b138", "type":"STRING", "shape_id":1, "slot_size":40, "class":"0x7fbdd1e0ec50", "frozen":true, "embedded":true, "fstring":true, "bytesize":3, "value":"bar", "encoding":"UTF-8", "coderange":"7bit", "memsize":40, "flags":{"wb_protected":true}}
```

The `"bar".freeze` literal correctly puts an old-gen fstring on the stack.
But `-%w(b a r).join('')` returns a young-gen fstring, which suggests it
somehow failed to find the old one in the `frozen_strings` table.

This could be caused by another test corrupting the table, or corrupting
the `"bar"` fstring.

By using a different literal value we can learn whether the bug is specific
to `"bar"` (used in many tests) or more general.
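
For context, the property this test relies on can be sketched in a few lines:
`String#-@` returns a frozen, deduplicated string, so two equal strings built
at runtime should resolve to the same object. The helper name `interned` and
the value `"qux"` are arbitrary choices for this illustration, not taken from
the test itself.

```ruby
# Hypothetical helper: build a string at runtime and intern it with String#-@.
def interned(chars)
  -chars.join
end

a = interned(%w(q u x))
b = interned(%w(q u x))

puts a.frozen?     # interned strings are frozen
puts a.equal?(b)   # both calls should return the very same object
puts (+a).frozen?  # String#+@ returns an unfrozen duplicate of a frozen string
```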

Include more debug information in test_uplus_minus

2024-04-15T12:56:33+00:00