<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/test/ruby/test_gc_compact.rb, branch v3_3_11</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>Add a fudge factor to the GC compaction move up/down tests</title>
<updated>2023-12-09T20:49:51+00:00</updated>
<author>
<name>KJ Tsanaktsidis</name>
<email>kj@kjtsanaktsidis.id.au</email>
</author>
<published>2023-12-08T23:55:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=c0b6ea7c8b5dc6e48ecf6e14e1dbd135d079f0fc'/>
<id>c0b6ea7c8b5dc6e48ecf6e14e1dbd135d079f0fc</id>
<content type='text'>
There seems to be another manifestation of bug #20021, where some of the
compaction tests are failing on i686 for unrelated PRs because of fake
"live" references to moved objects on the machine stack.

We _could_ solve this by counting how many objects are pinned during
compaction, but doing that involves pushing down the mark &amp; pin bitset
merge into gc_compact_plane and out of gc_compact_page, which I thought
was pretty ugly.

Now that we've solved bug #20022 though, we're able to compact
arbitrarily many objects with GC.verify_compaction_references, so the
number of objects we're moving is now 50,000 instead of 500. Since
that's now much larger than the number of objects likely to be pinned, I
think it's safe enough to just add a fudge-factor to the tests.

Any _other_ change in GC.verify_compaction_references that breaks
compaction is now highly likely to break the assertion by more than 10
objects, since it's operating on so many more in the first place.
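
As a rough illustration (not the actual test code), the fudge factor
amounts to something like the sketch below. The helper name
moved_enough? and its parameters are hypothetical; the default of 10
is the tolerance mentioned above.

```ruby
# Hypothetical helper, for illustration only: treat the compaction as
# successful if at most `fudge` of the expected objects failed to move
# (for example because they were pinned from the machine stack).
def moved_enough?(moved, expected, fudge: 10)
  # Equivalent to "moved is at least expected minus fudge".
  !(moved + fudge - expected).negative?
end

puts moved_enough?(49_995, 50_000) # a handful of pinned objects is tolerated
puts moved_enough?(49_000, 50_000) # a real regression still fails
```

With 50,000 objects in play, a tolerance of 10 is a tiny fraction of
the total, so it should hide little beyond stack-pinning noise.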

[Bug #20021]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There seems to be another manifestation of bug #20021, where some of the
compaction tests are failing on i686 for unrelated PRs because of fake
"live" references to moved objects on the machine stack.

We _could_ solve this by counting how many objects are pinned during
compaction, but doing that involves pushing down the mark &amp; pin bitset
merge into gc_compact_plane and out of gc_compact_page, which I thought
was pretty ugly.

Now that we've solved bug #20022 though, we're able to compact
arbitrarily many objects with GC.verify_compaction_references, so the
number of objects we're moving is now 50,000 instead of 500. Since
that's now much larger than the number of objects likely to be pinned, I
think it's safe enough to just add a fudge-factor to the tests.

Any _other_ change in GC.verify_compaction_references that breaks
compaction is now highly likely to break the assertion by more than 10
objects, since it's operating on so many more in the first place.

[Bug #20021]
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix GC.verify_compaction_references not moving every object</title>
<updated>2023-12-07T15:19:35+00:00</updated>
<author>
<name>KJ Tsanaktsidis</name>
<email>kj@kjtsanaktsidis.id.au</email>
</author>
<published>2023-11-27T05:50:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=cbc0e0bef08f9389f5bbe76de016795f01c3bc76'/>
<id>cbc0e0bef08f9389f5bbe76de016795f01c3bc76</id>
<content type='text'>
The intention of GC.verify_compaction_references is, I believe, to force
every single movable object to be moved, so that it's possible to debug
native extensions which are not correctly updating their references to
objects they mark as movable.

To do this, it doubles the number of allocated pages for each size pool,
and sorts the heap pages so that the free ones are swept first; thus,
every object in an old page should be moved into a free slot in one of
the new pages.

This worked fine until movement of objects _between_ size pools during
compaction was implemented. That causes some problems for
verify_compaction_references:

* We were doubling the number of pages in each size pool, but actually
  if some objects need to move into a _different_ pool, there's no
  guarantee that there'll be enough room in that one.
* It's possible for the sweep &amp; compact cursors to meet in one size pool
  before all the objects that want to move into that size pool from
  another are processed by the compaction.

You can see these problems by changing some of the movement tests in
test_gc_compact.rb to try and move e.g. 50,000 objects instead of
500; the test is not able to actually move all of the objects in a
single compaction run.

To fix this, we do two things in verify_compaction_references:

* Firstly, we add enough pages to every size pool to make them the same
  size. This ensures that their compact cursors will all have space to
  move during compaction (even if that means empty pages are
  pointlessly compacted).
* Then, we examine every object and determine where it _wants_ to be
  compacted into. We use this information to add additional pages to
  each size pool to handle all objects which should live there.

With these two changes, we can move an arbitrary number of objects into
the correct size pool in a single call to verify_compaction_references.
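
The two steps can be pictured with the toy model below. It is only a
sketch under assumed inputs, not the C code in gc.c: extra_pages, the
pool names, and the slots-per-page figures are all made up for
illustration.

```ruby
# Toy model, for illustration only. Given the current page count of each
# size pool, the number of objects that want to end up in each pool, and
# the number of slots per page in each pool, return how many pages to add.
def extra_pages(pool_pages, wanted_objects, slots_per_page)
  # Step 1: grow every pool to the size of the largest one.
  target = pool_pages.values.max
  pool_pages.to_h do |pool, pages|
    equalize = target - pages
    # Step 2: add room for every object that wants to live in this pool.
    wanted = wanted_objects.fetch(pool, 0)
    for_moves = wanted.fdiv(slots_per_page.fetch(pool)).ceil
    [pool, equalize + for_moves]
  end
end

pages = extra_pages(
  { small: 4, large: 2 },   # current pages per pool (made-up numbers)
  { small: 100, large: 10 }, # objects wanting each pool (made-up numbers)
  { small: 50, large: 5 }    # slots per page (made-up numbers)
)
p pages
```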

My _motivation_ for performing this work was to try and fix some test
stability issues in test_gc_compact.rb. I now no longer think that we
actually see this particular bug in rubyci.org, but I also think
verify_compaction_references should do what it says on the tin, so it's
worth keeping.

[Bug #20022]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The intention of GC.verify_compaction_references is, I believe, to force
every single movable object to be moved, so that it's possible to debug
native extensions which are not correctly updating their references to
objects they mark as movable.

To do this, it doubles the number of allocated pages for each size pool,
and sorts the heap pages so that the free ones are swept first; thus,
every object in an old page should be moved into a free slot in one of
the new pages.

This worked fine until movement of objects _between_ size pools during
compaction was implemented. That causes some problems for
verify_compaction_references:

* We were doubling the number of pages in each size pool, but actually
  if some objects need to move into a _different_ pool, there's no
  guarantee that there'll be enough room in that one.
* It's possible for the sweep &amp; compact cursors to meet in one size pool
  before all the objects that want to move into that size pool from
  another are processed by the compaction.

You can see these problems by changing some of the movement tests in
test_gc_compact.rb to try and move e.g. 50,000 objects instead of
500; the test is not able to actually move all of the objects in a
single compaction run.

To fix this, we do two things in verify_compaction_references:

* Firstly, we add enough pages to every size pool to make them the same
  size. This ensures that their compact cursors will all have space to
  move during compaction (even if that means empty pages are
  pointlessly compacted).
* Then, we examine every object and determine where it _wants_ to be
  compacted into. We use this information to add additional pages to
  each size pool to handle all objects which should live there.

With these two changes, we can move an arbitrary number of objects into
the correct size pool in a single call to verify_compaction_references.

My _motivation_ for performing this work was to try and fix some test
stability issues in test_gc_compact.rb. I now no longer think that we
actually see this particular bug in rubyci.org, but I also think
verify_compaction_references should do what it says on the tin, so it's
worth keeping.

[Bug #20022]
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix flaky "Expected 499 to be &gt;= 500" assertion in test_gc_compact.rb</title>
<updated>2023-11-27T16:02:11+00:00</updated>
<author>
<name>KJ Tsanaktsidis</name>
<email>kj@kjtsanaktsidis.id.au</email>
</author>
<published>2023-11-27T06:33:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=8427a8a655e2a04bfdc6a645ec967405d3617137'/>
<id>8427a8a655e2a04bfdc6a645ec967405d3617137</id>
<content type='text'>
There have been some sporadically flaky tests related to GC compaction,
which fail with:

  1) Failure:
TestGCCompact#test_moving_hashes_down_size_pools [/test/ruby/test_gc_compact.rb:442]:
Expected 499 to be &gt;= 500.

What's happening here is that, _sometimes_, depending on very unlucky
combinations of machine things, one of the expected-to-be-moved hashes
might be found on the machine stack during GC, and thus pinned.

One factor which seems to make this _more_ likely is that GCC 11 on
Ubuntu 22.04 seems to want to allocate 440 bytes of stack space for
`gc_start`, which is much more than it actually uses on the common code
path. The result is that there are some 50-odd VALUE-sized cells "live"
on the stack which may well contain valid heap pointers from previous
function calls, and will need to be pinned.

This is, of course, totally normal and expected; Ruby's GC is
conservative and if there is the possibility that a VALUE might be live
on the machine stack, it can't be moved. However, it does make these
tests flaky.

This commit "fixes" the tests by performing the work in a fiber; the
fiber goes out of scope and should be collected by the call to
verify_compaction_references, so there should be no references to the
to-be-moved objects floating around on the machine stack.
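
The shape of the workaround is roughly as follows. This is a minimal
sketch rather than the code in test_gc_compact.rb: in_fresh_fiber is a
hypothetical helper name, and the hashes built here are placeholders.

```ruby
# Hypothetical helper, for illustration only: run the block inside a new
# Fiber so that temporaries created while building the objects live on
# the fiber's own stack rather than on the caller's machine stack.
def in_fresh_fiber
  result = nil
  fiber = Fiber.new { result = yield }
  fiber.resume
  result
end

# Build the objects that a later compaction is expected to move. Only the
# returned array is kept; the fiber itself is no longer referenced.
objects = in_fresh_fiber { Array.new(500) { |i| { id: i } } }
puts objects.size
```

This does not make the GC any less conservative; it only reduces the
chance that a stale pointer to one of the objects is still sitting on
the stack that gets scanned.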

Fixes [#20021]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There have been some sporadically flaky tests related to GC compaction,
which fail with:

  1) Failure:
TestGCCompact#test_moving_hashes_down_size_pools [/test/ruby/test_gc_compact.rb:442]:
Expected 499 to be &gt;= 500.

What's happening here is that, _sometimes_, depending on very unlucky
combinations of machine things, one of the expected-to-be-moved hashes
might be found on the machine stack during GC, and thus pinned.

One factor which seems to make this _more_ likely is that GCC 11 on
Ubuntu 22.04 seems to want to allocate 440 bytes of stack space for
`gc_start`, which is much more than it actually uses on the common code
path. The result is that there are some 50-odd VALUE-sized cells "live"
on the stack which may well contain valid heap pointers from previous
function calls, and will need to be pinned.

This is, of course, totally normal and expected; Ruby's GC is
conservative and if there is the possibility that a VALUE might be live
on the machine stack, it can't be moved. However, it does make these
tests flaky.

This commit "fixes" the tests by performing the work in a fiber; the
fiber goes out of scope and should be collected by the call to
verify_compaction_references, so there should be no references to the
to-be-moved objects floating around on the machine stack.

Fixes [#20021]
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "Tests to move between size pools are flaky on Windows too"</title>
<updated>2023-08-04T13:13:57+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2023-08-03T15:08:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=61b76e74afa0976ff97685aa6e762633a3d43376'/>
<id>61b76e74afa0976ff97685aa6e762633a3d43376</id>
<content type='text'>
This reverts commit c5abe0d08f8f7686422e6eef374cf8c78aefacb6.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit c5abe0d08f8f7686422e6eef374cf8c78aefacb6.
</pre>
</div>
</content>
</entry>
<entry>
<title>Tests to move between size pools are flaky on Windows too [ci skip]</title>
<updated>2023-08-02T05:19:44+00:00</updated>
<author>
<name>Nobuyoshi Nakada</name>
<email>nobu@ruby-lang.org</email>
</author>
<published>2023-08-02T05:19:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=c5abe0d08f8f7686422e6eef374cf8c78aefacb6'/>
<id>c5abe0d08f8f7686422e6eef374cf8c78aefacb6</id>
<content type='text'>
Needs more investigation.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Needs more investigation.
</pre>
</div>
</content>
</entry>
<entry>
<title>Skip flaky test on Solaris</title>
<updated>2023-08-01T00:02:32+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2023-08-01T00:02:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=ec0e6809f9247a8028500d3e915ab01f63c7d59b'/>
<id>ec0e6809f9247a8028500d3e915ab01f63c7d59b</id>
<content type='text'>
This test is flaky on "SPARC Solaris 10 (gcc)" CI with this message:

TestGCCompact#test_moving_objects_between_size_pools [test/ruby/test_gc_compact.rb:378]:
Expected 499 to be &gt;= 500.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This test is flaky on "SPARC Solaris 10 (gcc)" CI with this message:

TestGCCompact#test_moving_objects_between_size_pools [test/ruby/test_gc_compact.rb:378]:
Expected 499 to be &gt;= 500.
</pre>
</div>
</content>
</entry>
<entry>
<title>Assert that at least one element has been embedded</title>
<updated>2023-07-31T15:46:53+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2023-07-31T14:26:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=547d2378acca6ee376a9f1b0a619c919e834b3cb'/>
<id>547d2378acca6ee376a9f1b0a619c919e834b3cb</id>
<content type='text'>
It's not guaranteed that the first element will always be embedded.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It's not guaranteed that the first element will always be embedded.
</pre>
</div>
</content>
</entry>
<entry>
<title>Skip test on Solaris SPARC</title>
<updated>2023-06-23T14:37:04+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2023-06-23T14:35:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=01507d2f80a99985c5f25027d87e58b8a97b3da2'/>
<id>01507d2f80a99985c5f25027d87e58b8a97b3da2</id>
<content type='text'>
This test fails on Solaris SPARC with the following error and I can't
figure out why:
  TestGCCompact#test_moving_hashes_down_size_pools
  Expected 499 to be &gt;= 500.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This test fails on Solaris SPARC with the following error and I can't
figure out why:
  TestGCCompact#test_moving_hashes_down_size_pools
  Expected 499 to be &gt;= 500.
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert debugging code in test_gc_compact.rb</title>
<updated>2023-06-06T14:18:50+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2023-06-05T14:54:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=fae2f80d06f5058b40e91f62ba27fb01f2463d12'/>
<id>fae2f80d06f5058b40e91f62ba27fb01f2463d12</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>More debug code to GC compaction test</title>
<updated>2023-05-31T20:16:50+00:00</updated>
<author>
<name>Peter Zhu</name>
<email>peter@peterzhu.ca</email>
</author>
<published>2023-05-31T20:16:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=e4163112f6b99d9c205f6bc260878dcb00954a13'/>
<id>e4163112f6b99d9c205f6bc260878dcb00954a13</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
</feed>
