<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/encoding.c, branch v3_4_9</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>string.c: Directly create strings with the correct encoding</title>
<updated>2024-11-13T12:32:32+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>jean.boussier@gmail.com</email>
</author>
<published>2024-11-13T08:34:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=fae86a701edf9afef6b05199fe8f6651b1e155ea'/>
<id>fae86a701edf9afef6b05199fe8f6651b1e155ea</id>
<content type='text'>
While profiling msgpack-ruby I noticed a very substantial amout of time
spent in `rb_enc_associate_index`, called by `rb_utf8_str_new`.

On that benchmark, `rb_utf8_str_new` is 33% of the total runtime,
in big part because it cause GC to trigger often, but even then
`5.3%` of the total runtime is spent in `rb_enc_associate_index`
called by `rb_utf8_str_new`.

After closer inspection, it appears that it's performing a lot of
safety check we can assert we don't need, and other extra useless
operations, because strings are first created and filled as ASCII-8BIT
and then later reassociated to the desired encoding.

By directly allocating the string with the right encoding, it allow
to skip a lot of duplicated and useless operations.

After this change, the time spent in `rb_utf8_str_new` is down
to `28.4%` of total runtime, and most of that is GC.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
While profiling msgpack-ruby I noticed a very substantial amout of time
spent in `rb_enc_associate_index`, called by `rb_utf8_str_new`.

On that benchmark, `rb_utf8_str_new` is 33% of the total runtime,
in big part because it cause GC to trigger often, but even then
`5.3%` of the total runtime is spent in `rb_enc_associate_index`
called by `rb_utf8_str_new`.

After closer inspection, it appears that it's performing a lot of
safety check we can assert we don't need, and other extra useless
operations, because strings are first created and filled as ASCII-8BIT
and then later reassociated to the desired encoding.

By directly allocating the string with the right encoding, it allow
to skip a lot of duplicated and useless operations.

After this change, the time spent in `rb_utf8_str_new` is down
to `28.4%` of total runtime, and most of that is GC.
</pre>
</div>
</content>
</entry>
<entry>
<title>Move common code to `enc_compatible_latter`</title>
<updated>2024-10-05T07:06:54+00:00</updated>
<author>
<name>Nobuyoshi Nakada</name>
<email>nobu@ruby-lang.org</email>
</author>
<published>2024-10-05T07:06:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=4b065bbe2b3e2c5ff747cb27bb1763aa040453a3'/>
<id>4b065bbe2b3e2c5ff747cb27bb1763aa040453a3</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix corruption of internal encoding string</title>
<updated>2024-06-27T18:06:40+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2024-06-27T18:02:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=176c4bb3c7db87ca5b0486012cb6a005105448c5'/>
<id>176c4bb3c7db87ca5b0486012cb6a005105448c5</id>
<content type='text'>
[Bug #20598]

Just like [Bug #20595], Encoding#name_list and Encoding#aliases can have
their strings corrupted when Encoding.default_internal is set to nil.

Co-authored-by: Matthew Valentine-House &lt;matt@eightbitraptor.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[Bug #20598]

Just like [Bug #20595], Encoding#name_list and Encoding#aliases can have
their strings corrupted when Encoding.default_internal is set to nil.

Co-authored-by: Matthew Valentine-House &lt;matt@eightbitraptor.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix corruption of encoding name string</title>
<updated>2024-06-27T13:47:22+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2024-06-26T20:12:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=c6a0d03649c686a537c1f513a1e32205ac6a6512'/>
<id>c6a0d03649c686a537c1f513a1e32205ac6a6512</id>
<content type='text'>
[Bug #20595]

enc_set_default_encoding will free the C string if the encoding is nil,
but the C string can be used by the encoding name string. This will cause
the encoding name string to be corrupted.

Consider the following code:

    Encoding.default_internal = Encoding::ASCII_8BIT
    names = Encoding.default_internal.names
    p names
    Encoding.default_internal = nil
    p names

It outputs:

    ["ASCII-8BIT", "BINARY", "internal"]
    ["ASCII-8BIT", "BINARY", "\x00\x00\x00\x00\x00\x00\x00\x00"]

Co-authored-by: Matthew Valentine-House &lt;matt@eightbitraptor.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[Bug #20595]

enc_set_default_encoding will free the C string if the encoding is nil,
but the C string can be used by the encoding name string. This will cause
the encoding name string to be corrupted.

Consider the following code:

    Encoding.default_internal = Encoding::ASCII_8BIT
    names = Encoding.default_internal.names
    p names
    Encoding.default_internal = nil
    p names

It outputs:

    ["ASCII-8BIT", "BINARY", "internal"]
    ["ASCII-8BIT", "BINARY", "\x00\x00\x00\x00\x00\x00\x00\x00"]

Co-authored-by: Matthew Valentine-House &lt;matt@eightbitraptor.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add a hint of `ASCII-8BIT` being `BINARY`</title>
<updated>2024-04-18T08:17:26+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>byroot@ruby-lang.org</email>
</author>
<published>2024-02-19T12:35:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=3a7846b1aa4c10d86dc5a91c6df94f89d60bb0c3'/>
<id>3a7846b1aa4c10d86dc5a91c6df94f89d60bb0c3</id>
<content type='text'>
[Feature #18576]

Since outright renaming `ASCII-8BIT` is deemed to backward incompatible,
the next best thing would be to only change its `#inspect`, particularly
in exception messages.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[Feature #18576]

Since outright renaming `ASCII-8BIT` is deemed to backward incompatible,
the next best thing would be to only change its `#inspect`, particularly
in exception messages.
</pre>
</div>
</content>
</entry>
<entry>
<title>Refactor VM root modules</title>
<updated>2024-03-06T20:33:43+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>byroot@ruby-lang.org</email>
</author>
<published>2024-03-03T09:46:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=d4f3dcf4dff80ded472ba3061e62c6c676ffab8c'/>
<id>d4f3dcf4dff80ded472ba3061e62c6c676ffab8c</id>
<content type='text'>
This `st_table` is used to both mark and pin classes
defined from the C API. But `vm-&gt;mark_object_ary` already
does both much more efficiently.

Currently a Ruby process starts with 252 rooted classes,
which uses `7224B` in an `st_table` or `2016B` in an `RArray`.

So a baseline of 5kB saved, but since `mark_object_ary` is
preallocated with `1024` slots but only use `405` of them,
it's a net `7kB` save.

`vm-&gt;mark_object_ary` is also being refactored.

Prior to this changes, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.

This has the detrimental effect of marking these references on every
minors even though it's a mostly append only list.

But using a custom TypedData we can save from having to mark
all the references on minor GC runs.

Addtionally, immediate values are now ignored and not appended
to `vm-&gt;mark_object_ary` as it's just wasted space.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This `st_table` is used to both mark and pin classes
defined from the C API. But `vm-&gt;mark_object_ary` already
does both much more efficiently.

Currently a Ruby process starts with 252 rooted classes,
which uses `7224B` in an `st_table` or `2016B` in an `RArray`.

So a baseline of 5kB saved, but since `mark_object_ary` is
preallocated with `1024` slots but only use `405` of them,
it's a net `7kB` save.

`vm-&gt;mark_object_ary` is also being refactored.

Prior to this changes, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.

This has the detrimental effect of marking these references on every
minors even though it's a mostly append only list.

But using a custom TypedData we can save from having to mark
all the references on minor GC runs.

Addtionally, immediate values are now ignored and not appended
to `vm-&gt;mark_object_ary` as it's just wasted space.
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix memory leak in setting encodings</title>
<updated>2024-01-03T18:31:43+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2024-01-03T16:15:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=c7ce2f537f96ab2cf2f5fc2982d6147866ff5340'/>
<id>c7ce2f537f96ab2cf2f5fc2982d6147866ff5340</id>
<content type='text'>
There is a memory leak in Encoding.default_external= and
Encoding.default_internal= because the duplicated name is not freed
when overwriting.

    10.times do
      1_000_000.times do
        Encoding.default_internal = nil
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

     25664
     41504
     57360
     73232
     89168
    105056
    120944
    136816
    152720
    168576

After:

    9648
    9648
    9648
    9680
    9680
    9680
    9680
    9680
    9680
    9680
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a memory leak in Encoding.default_external= and
Encoding.default_internal= because the duplicated name is not freed
when overwriting.

    10.times do
      1_000_000.times do
        Encoding.default_internal = nil
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

     25664
     41504
     57360
     73232
     89168
    105056
    120944
    136816
    152720
    168576

After:

    9648
    9648
    9648
    9680
    9680
    9680
    9680
    9680
    9680
    9680
</pre>
</div>
</content>
</entry>
<entry>
<title>Free everything at shutdown</title>
<updated>2023-12-07T20:52:35+00:00</updated>
<author>
<name>Adam Hess</name>
<email>adamhess1991@gmail.com</email>
</author>
<published>2023-10-12T18:15:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=6816e8efcff3be75f8020cd1b0ea57d3cd664bbc'/>
<id>6816e8efcff3be75f8020cd1b0ea57d3cd664bbc</id>
<content type='text'>
when the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.

Co-authored-by: Nobuyoshi Nakada &lt;nobu@ruby-lang.org&gt;
Co-authored-by: Peter Zhu &lt;peter@peterzhu.ca&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
when the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.

Co-authored-by: Nobuyoshi Nakada &lt;nobu@ruby-lang.org&gt;
Co-authored-by: Peter Zhu &lt;peter@peterzhu.ca&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Mark Encoding as Write Barrier protected</title>
<updated>2023-02-07T10:48:57+00:00</updated>
<author>
<name>Jean Boussier</name>
<email>byroot@ruby-lang.org</email>
</author>
<published>2023-02-06T22:18:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=60c924770d6f0ce05c04c8c0a60a9bf23c79d85f'/>
<id>60c924770d6f0ce05c04c8c0a60a9bf23c79d85f</id>
<content type='text'>
It doesn't even have a mark function.
It's only about a hundred objects, but not reason
to scan them every time.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It doesn't even have a mark function.
It's only about a hundred objects, but not reason
to scan them every time.
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove Encoding#replicate</title>
<updated>2023-01-11T12:41:41+00:00</updated>
<author>
<name>Benoit Daloze</name>
<email>eregontp@gmail.com</email>
</author>
<published>2023-01-06T14:07:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=6abe20e87b74a5a672dc59f72fa1f550ceab430c'/>
<id>6abe20e87b74a5a672dc59f72fa1f550ceab430c</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
</feed>
