ruby.git/encoding.c, branch v3_4_9

string.c: Directly create strings with the correct encoding

2024-11-13T12:32:32+00:00

While profiling msgpack-ruby I noticed a very substantial amout of time
spent in `rb_enc_associate_index`, called by `rb_utf8_str_new`.

On that benchmark, `rb_utf8_str_new` is 33% of the total runtime,
in big part because it cause GC to trigger often, but even then
`5.3%` of the total runtime is spent in `rb_enc_associate_index`
called by `rb_utf8_str_new`.

After closer inspection, it appears that it's performing a lot of
safety check we can assert we don't need, and other extra useless
operations, because strings are first created and filled as ASCII-8BIT
and then later reassociated to the desired encoding.

By directly allocating the string with the right encoding, it allow
to skip a lot of duplicated and useless operations.

After this change, the time spent in `rb_utf8_str_new` is down
to `28.4%` of total runtime, and most of that is GC.

Move common code to `enc_compatible_latter`

2024-10-05T07:06:54+00:00

Fix corruption of internal encoding string

2024-06-27T18:06:40+00:00

[Bug #20598]

Just like [Bug #20595], Encoding#name_list and Encoding#aliases can have
their strings corrupted when Encoding.default_internal is set to nil.

Co-authored-by: Matthew Valentine-House

Fix corruption of encoding name string

2024-06-27T13:47:22+00:00

[Bug #20595]

enc_set_default_encoding will free the C string if the encoding is nil,
but the C string can be used by the encoding name string. This will cause
the encoding name string to be corrupted.

Consider the following code:

    Encoding.default_internal = Encoding::ASCII_8BIT
    names = Encoding.default_internal.names
    p names
    Encoding.default_internal = nil
    p names

It outputs:

    ["ASCII-8BIT", "BINARY", "internal"]
    ["ASCII-8BIT", "BINARY", "\x00\x00\x00\x00\x00\x00\x00\x00"]

Co-authored-by: Matthew Valentine-House

Add a hint of `ASCII-8BIT` being `BINARY`

2024-04-18T08:17:26+00:00

[Feature #18576]

Since outright renaming `ASCII-8BIT` is deemed to backward incompatible,
the next best thing would be to only change its `#inspect`, particularly
in exception messages.

Refactor VM root modules

2024-03-06T20:33:43+00:00

This `st_table` is used to both mark and pin classes
defined from the C API. But `vm->mark_object_ary` already
does both much more efficiently.

Currently a Ruby process starts with 252 rooted classes,
which uses `7224B` in an `st_table` or `2016B` in an `RArray`.

So a baseline of 5kB saved, but since `mark_object_ary` is
preallocated with `1024` slots but only use `405` of them,
it's a net `7kB` save.

`vm->mark_object_ary` is also being refactored.

Prior to this changes, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.

This has the detrimental effect of marking these references on every
minors even though it's a mostly append only list.

But using a custom TypedData we can save from having to mark
all the references on minor GC runs.

Addtionally, immediate values are now ignored and not appended
to `vm->mark_object_ary` as it's just wasted space.

Fix memory leak in setting encodings

2024-01-03T18:31:43+00:00

There is a memory leak in Encoding.default_external= and
Encoding.default_internal= because the duplicated name is not freed
when overwriting.

    10.times do
      1_000_000.times do
        Encoding.default_internal = nil
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

     25664
     41504
     57360
     73232
     89168
    105056
    120944
    136816
    152720
    168576

After:

    9648
    9648
    9648
    9680
    9680
    9680
    9680
    9680
    9680
    9680

Free everything at shutdown

2023-12-07T20:52:35+00:00

when the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.

Co-authored-by: Nobuyoshi Nakada 
Co-authored-by: Peter Zhu

Mark Encoding as Write Barrier protected

2023-02-07T10:48:57+00:00

It doesn't even have a mark function.
It's only about a hundred objects, but not reason
to scan them every time.

Remove Encoding#replicate

2023-01-11T12:41:41+00:00