| Age | Commit message | Author |
|
|
|
Fixes issue pointed out in https://bugs.ruby-lang.org/issues/21084#note-7.
The following script crashes:
wmap = ObjectSpace::WeakMap.new
GC.disable # only manual GCs
GC.start
GC.start
retain = []
50.times do
  k = Object.new
  wmap[k] = true
  retain << k
end
GC.start # wmap promoted, other objects still young
retain.clear
GC.start(full_mark: false)
wmap.keys.each(&:itself) # call method on keys to cause crash
|
|
|
|
[Feature #21084]
# Summary
The current way of marking weak references uses `rb_gc_mark_weak(VALUE *ptr)`.
This presents challenges because Ruby's GC is incremental, meaning that if the
`ptr` changes (e.g. realloc'd or free'd), then we could have an invalid memory
access. This also overwrites `*ptr = Qundef` if `*ptr` is dead, which prevents
any cleanup from being run (e.g. freeing memory or deleting entries from hash
tables). This ticket proposes `rb_gc_declare_weak_references` which declares
that an object has weak references and calls a cleanup function after marking,
allowing the object to clean up any memory for dead objects.
# Introduction
In [[Feature #19783]](https://bugs.ruby-lang.org/issues/19783), I introduced an
API allowing objects to mark weak references. The function signature looks like
this:
```c
void rb_gc_mark_weak(VALUE *ptr);
```
`rb_gc_mark_weak` is called during the marking phase of the GC to specify that
the memory at `ptr` holds a pointer to a Ruby object that is weakly referenced.
`rb_gc_mark_weak` appends this pointer to a list that is processed after the
marking phase of the GC. If the object at `*ptr` is no longer alive, then it
overwrites the object reference with a special value (`*ptr = Qundef`).
However, this API resulted in two challenges:
1. Ruby's default GC is incremental, which means that the GC is not run in one
phase, but rather split into chunks of work that interleave with Ruby
execution. The `ptr` passed into `rb_gc_mark_weak` could be on the malloc
heap, and that memory could be realloc'd or even free'd. We had to use
workarounds such as `rb_gc_remove_weak` to ensure that there were no illegal
memory accesses. This made `rb_gc_mark_weak` difficult to use, impacted
runtime performance, and increased memory usage.
2. When an object dies, `rb_gc_mark_weak` only overwrites the reference with
`Qundef`. This means that if we wanted to do any cleanup (e.g. free a piece of
memory or delete a hash table entry), we could not do it there and had to defer
it elsewhere (e.g. during marking or runtime).
In this ticket, I'm proposing a new API for weak references. Instead of an
object marking its weak references during the marking phase, the object declares
that it has weak references using the `rb_gc_declare_weak_references` function.
This declaration occurs during runtime (e.g. after the object has been created)
rather than during GC.
After an object declares that it has weak references, it will have its callback
function called after marking as long as that object is alive. This callback
function can then call a special function `rb_gc_handle_weak_references_alive_p`
to determine whether its references are alive. This will allow the callback
function to do whatever it wants on the object, allowing it to perform any
cleanup work it needs.
This significantly simplifies the code for `ObjectSpace::WeakMap` and
`ObjectSpace::WeakKeyMap` because it no longer needs to have the workarounds for
the limitations of `rb_gc_mark_weak`.
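As a rough illustration of the difference between the two designs, here is a toy model in plain Ruby. It is not how the GC is implemented: `ToyWeakTable`, `overwrite_dead!`, `cleanup_dead!`, and `UNDEF` are hypothetical names, and an array of strings stands in for weakly referenced objects.

```ruby
# Toy model only: plain Ruby values stand in for GC-managed objects.
# Every name in this block is hypothetical and exists for illustration.
UNDEF = Object.new.freeze # stand-in for Qundef

class ToyWeakTable
  attr_reader :slots

  def initialize(refs)
    @slots = refs.dup
  end

  # rb_gc_mark_weak style: dead references are overwritten in place,
  # so the table keeps one slot per dead entry.
  def overwrite_dead!(alive)
    @slots.map! { |ref| alive.include?(ref) ? ref : UNDEF }
  end

  # Declared-weak-references style: a callback runs after marking,
  # checks each reference, and removes dead entries itself.
  def cleanup_dead!(alive)
    @slots.select! { |ref| alive.include?(ref) }
  end
end

refs  = %w[a b c d]
alive = %w[a c]

old_table = ToyWeakTable.new(refs)
old_table.overwrite_dead!(alive)

new_table = ToyWeakTable.new(refs)
new_table.cleanup_dead!(alive)

puts old_table.slots.size # 4 (two live entries, two UNDEF tombstones)
puts new_table.slots.size # 2 (dead entries removed)
```

In the toy model the second table shrinks immediately, which mirrors the cleanup work (freeing memory, deleting hash table entries) that the proposal says the callback can perform.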
# Performance
The performance results below demonstrate that `ObjectSpace::WeakMap#[]=` is now
about 60% faster because the implementation has been simplified and the number
of allocations has been reduced. We can see that there is not a significant
impact on the performance of `ObjectSpace::WeakMap#[]`.
Base:
```
ObjectSpace::WeakMap#[]=
4.620M (± 6.4%) i/s (216.44 ns/i) - 23.342M in 5.072149s
ObjectSpace::WeakMap#[]
30.967M (± 1.9%) i/s (32.29 ns/i) - 154.998M in 5.007157s
```
Branch:
```
ObjectSpace::WeakMap#[]=
7.336M (± 2.8%) i/s (136.31 ns/i) - 36.755M in 5.013983s
ObjectSpace::WeakMap#[]
30.902M (± 5.4%) i/s (32.36 ns/i) - 155.901M in 5.064060s
```
Code:
```ruby
require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "benchmark-ips"
end

wmap = ObjectSpace::WeakMap.new
key = Object.new
val = Object.new
wmap[key] = val

Benchmark.ips do |x|
  x.report("ObjectSpace::WeakMap#[]=") do |times|
    i = 0
    while i < times
      wmap[Object.new] = Object.new
      i += 1
    end
  end

  x.report("ObjectSpace::WeakMap#[]") do |times|
    i = 0
    while i < times
      wmap[key]
      wmap[val] # does not exist
      i += 1
    end
  end
end
```
# Alternative designs
Currently, `rb_gc_declare_weak_references` is designed to be an internal-only
API. This allows us to assume the object types that call
`rb_gc_declare_weak_references`. In the future, if we want to open up this API
to third parties, we may want to change this function to something like:
```c
void rb_gc_add_cleaner(VALUE obj, void (*callback)(VALUE obj));
```
This will allow the third party to implement a custom `callback` that gets
called after the marking phase of GC to clean up any dead references. I chose
not to implement this design because it is less efficient as we would need to
store a mapping from `obj` to `callback`, which requires extra memory.
|
|
|
|
not hold
|
|
This guard was removed in https://github.com/ruby/ruby/pull/13497
on the justification that some GCs may need to be notified even for
immediates.
But the two currently available GCs don't, and there are plenty
of assumptions that GCs don't everywhere, notably in YJIT and ZJIT.
This optimization is also not so micro (but not huge either).
I routinely see 1-2% wasted there on micro-benchmarks.
So perhaps if in the future we actually need this, it might make
sense to introduce a way for GCs to declare that as an option,
but in the meantime it's extra overhead with little gain.
|
|
I believe this was accidentally left in as part of
2beb3798bac52624c3170138f8ef65869f1da6c0
|
|
This reverts commit 228d13f6ed914d1e7f6bd2416e3f5be8283be865.
This commit makes default.c and mmtk.c depend on shape.h, which prevents
them from building independently.
|
|
|
|
|
|
Attempt to fix the following SEGV:
```
ruby(gc_mark) ../src/gc/default/default.c:4429
ruby(gc_mark_children+0x45) [0x560b380bf8b5] ../src/gc/default/default.c:4625
ruby(gc_mark_stacked_objects) ../src/gc/default/default.c:4647
ruby(gc_mark_stacked_objects_all) ../src/gc/default/default.c:4685
ruby(gc_marks_rest) ../src/gc/default/default.c:5707
ruby(gc_marks+0x4e7) [0x560b380c41c1] ../src/gc/default/default.c:5821
ruby(gc_start) ../src/gc/default/default.c:6502
ruby(heap_prepare+0xa4) [0x560b380c4efc] ../src/gc/default/default.c:2074
ruby(heap_next_free_page) ../src/gc/default/default.c:2289
ruby(newobj_cache_miss) ../src/gc/default/default.c:2396
ruby(RB_SPECIAL_CONST_P+0x0) [0x560b380c5df4] ../src/gc/default/default.c:2420
ruby(RB_BUILTIN_TYPE) ../src/include/ruby/internal/value_type.h:184
ruby(newobj_init) ../src/gc/default/default.c:2136
ruby(rb_gc_impl_new_obj) ../src/gc/default/default.c:2500
ruby(newobj_of) ../src/gc.c:996
ruby(rb_imemo_new+0x37) [0x560b380d8bed] ../src/imemo.c:46
ruby(imemo_fields_new) ../src/imemo.c:105
ruby(rb_imemo_fields_new) ../src/imemo.c:120
```
I have no reproduction, but my understanding based on the backtrace
and error is that GC is triggered inside `newobj_init` causing the
new object to be marked while in an incomplete state.
I believe the fix is to pass the `shape_id` down to `newobj_init`
so it can be set before the GC has a chance to trigger.
|
|
This should be less common than many of the other flags, so should
not inflate the heap too much. This is desirable because reducing the
number of remembered objects will improve minor GC speeds.
|
|
Fix sign-compare warning
|
|
|
|
|
|
This allows us to do less work when allocating a fresh page.
|
|
|
|
Since we do not run a Ractor barrier before forking, it's possible that
another Ractor is halfway through allocating an object during forking.
This may lead to allocated_objects_count being off by one.
For example, the following script reproduces the bug:
100.times do |i|
  Ractor.new(i) do |j|
    10000.times do |i|
      "#{j}-#{i}"
    end
    Ractor.receive
  end
  pid = fork { GC.verify_internal_consistency }
  _, status = Process.waitpid2 pid
  raise unless status.success?
end
We need to run with `taskset -c 1` to force it to use a single CPU core
to more consistently reproduce the bug:
heap_pages_final_slots: 1, total_freed_objects: 16628
test.rb:8: [BUG] inconsistent live slot number: expect 19589, but 19588.
ruby 4.0.0dev (2025-11-25T03:06:55Z master 55892f5994) +PRISM [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0007 p:---- s:0029 e:000028 l:y b:---- CFUNC :verify_internal_consistency
c:0006 p:0004 s:0025 e:000024 l:n b:---- BLOCK test.rb:8 [FINISH]
c:0005 p:---- s:0022 e:000021 l:y b:---- CFUNC :fork
c:0004 p:0012 s:0018 E:0014c0 l:n b:---- BLOCK test.rb:8
c:0003 p:0024 s:0011 e:000010 l:y b:0001 METHOD <internal:numeric>:257
c:0002 p:0005 s:0006 E:001730 l:n b:---- EVAL test.rb:1 [FINISH]
c:0001 p:0000 s:0003 E:001d20 l:y b:---- DUMMY [FINISH]
-- Ruby level backtrace information ----------------------------------------
test.rb:1:in '<main>'
<internal:numeric>:257:in 'times'
test.rb:8:in 'block in <main>'
test.rb:8:in 'fork'
test.rb:8:in 'block (2 levels) in <main>'
test.rb:8:in 'verify_internal_consistency'
-- Threading information ---------------------------------------------------
Total ractor count: 1
Ruby thread count for this ractor: 1
-- C level backtrace information -------------------------------------------
ruby(rb_print_backtrace+0x14) [0x61b67ac48b60] vm_dump.c:1105
ruby(rb_vm_bugreport) vm_dump.c:1450
ruby(rb_bug_without_die_internal+0x5f) [0x61b67a818a28] error.c:1098
ruby(rb_bug) error.c:1116
ruby(gc_verify_internal_consistency_+0xbdd) [0x61b67a83d8ed] gc/default/default.c:5186
ruby(gc_verify_internal_consistency+0x2d) [0x61b67a83d960] gc/default/default.c:5241
ruby(rb_gc_verify_internal_consistency) gc/default/default.c:8950
ruby(gc_verify_internal_consistency_m) gc/default/default.c:8966
ruby(vm_call_cfunc_with_frame_+0x10d) [0x61b67a9e50fd] vm_insnhelper.c:3902
ruby(vm_sendish+0x111) [0x61b67a9eeaf1] vm_insnhelper.c:6124
ruby(vm_exec_core+0x84) [0x61b67aa07434] insns.def:903
ruby(vm_exec_loop+0xa) [0x61b67a9f8155] vm.c:2811
ruby(rb_vm_exec) vm.c:2787
ruby(vm_yield_with_cref+0x90) [0x61b67a9fd2ea] vm.c:1865
ruby(vm_yield) vm.c:1873
ruby(rb_yield) vm_eval.c:1362
ruby(rb_protect+0xef) [0x61b67a81fe6f] eval.c:1154
ruby(rb_f_fork+0x16) [0x61b67a8e98ab] process.c:4293
ruby(rb_f_fork) process.c:4284
|
|
|
|
|
|
|
|
[Feature #20408]
|
|
[Feature #20408]
|
|
[Feature #20408]
|
|
It may return sizes that aren't allocatable for arrays and strings.
|
|
rb_gc_verify_shareable is not GC implementation specific so it should live
in gc.c.
|
|
We shouldn't run any ruby code with the VM lock held.
|
|
We can avoid taking this barrier if we're not incremental marking or lazy sweeping.
I found this was taking a significant amount of samples when profiling `Psych.load`
in multiple ractors due to the vm barrier. With this change, we get significant improvements
in ractor benchmarks that allocate lots of objects.
-- Psych.load benchmark --
```
Before: After:
r: itr: time r: itr: time
0 #1: 960ms 0 #1: 943ms
0 #2: 979ms 0 #2: 939ms
0 #3: 968ms 0 #3: 948ms
0 #4: 963ms 0 #4: 946ms
0 #5: 964ms 0 #5: 944ms
1 #1: 947ms 1 #1: 940ms
1 #2: 950ms 1 #2: 947ms
1 #3: 962ms 1 #3: 950ms
1 #4: 947ms 1 #4: 945ms
1 #5: 947ms 1 #5: 943ms
2 #1: 1131ms 2 #1: 1005ms
2 #2: 1153ms 2 #2: 996ms
2 #3: 1155ms 2 #3: 1003ms
2 #4: 1205ms 2 #4: 1012ms
2 #5: 1179ms 2 #5: 1012ms
4 #1: 1555ms 4 #1: 1209ms
4 #2: 1509ms 4 #2: 1244ms
4 #3: 1529ms 4 #3: 1254ms
4 #4: 1512ms 4 #4: 1267ms
4 #5: 1513ms 4 #5: 1245ms
6 #1: 2122ms 6 #1: 1584ms
6 #2: 2080ms 6 #2: 1532ms
6 #3: 2079ms 6 #3: 1476ms
6 #4: 2021ms 6 #4: 1463ms
6 #5: 1999ms 6 #5: 1461ms
8 #1: 2741ms 8 #1: 1630ms
8 #2: 2711ms 8 #2: 1632ms
8 #3: 2688ms 8 #3: 1654ms
8 #4: 2641ms 8 #4: 1684ms
8 #5: 2656ms 8 #5: 1752ms
```
|
|
|
|
to adopt strict shareable rule.
* (basically) shareable objects only refer to shareable objects
* (exception) shareable objects can refer to unshareable objects
but should not leak references to unshareable objects to the Ruby world
|
|
* `RB_OBJ_SET_SHAREABLE(obj)` makes obj shareable.
All objects reachable from `obj` should be shareable.
* `RB_OBJ_SET_FROZEN_SHAREABLE(obj)` is the same as above
but freezes `obj` before making it shareable.
Also `rb_gc_verify_shareable(obj)` is introduced to strictly check
that `obj` does not violate the shareable rule (a shareable object
only refers to shareable objects).
The rule has some exceptions: some shareable objects can refer to
unshareable objects. For example, a Ractor object (which is a shareable
object) can refer to its Ractor-local objects.
To handle such cases, the `check_shareable` flag is also introduced.
The `STRICT_VERIFY_SHAREABLE` macro is also introduced to verify
the strict shareable rule at `SET_SHAREABLE`.
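The same rule can be observed from Ruby through the public `Ractor.shareable?` and `Ractor.make_shareable` methods. The sketch below is only a Ruby-level illustration of the rule, not of the C macros above, and the `config` hash is a made-up example.

```ruby
# Ruby-level view of the shareable rule. The `config` hash is an
# illustrative example and does not come from the commit.
config = { name: "app", tags: ["a", "b"] }

# An unfrozen Hash is not shareable.
before = Ractor.shareable?(config)

# make_shareable freezes the object and everything reachable from it.
Ractor.make_shareable(config)
after = Ractor.shareable?(config)

puts before                               # false
puts after                                # true
puts config[:tags].frozen?                # true
puts Ractor.shareable?(config[:tags][0])  # true
```

Because everything reachable from `config` is frozen along with it, the result satisfies the basic rule that a shareable object refers only to shareable objects.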
|
|
|
|
This isn't (yet?) safe to do because it concurrently modifies GC
structures and dfree functions are not necessarily safe to do without
stopping all Ractors.
If it was safe to do this we should also do it for
gc_enter_event_continue. I do think sweeping could be done concurrently
with the mutator and in parallel, but that requires more work first.
|
|
We should only be executing WBs when GC is not running. We ran into this
issue when debugging 3cd2407045a67838cf2ab949e5164676b6870958.
|
|
Previously we were tracking down a bug where this was used after it was
no longer valid.
Co-authored-by: Luke Gruber <luke.gru@gmail.com>
|
|
Previously on our mark_and_move we were calling rb_gc_mark, which isn't
safe to call at compaction time.
Co-authored-by: Luke Gruber <luke.gru@gmail.com>
|
|
When we mark a T_NONE, we crash with the object and parent object information
in the bug report. However, if the parent object is young then it is Qfalse.
For example, a bug report looks like:
[BUG] try to mark T_NONE object (obj: 0x00003990e42d7c70 T_NONE/, parent: (none))
This commit changes it to always set the parent object and also adds a
new field parent_object_old_p to quickly determine if the parent object
is old or not.
|
|
The FL_WB_PROTECTED flag is no longer used and is not set on objects, so
that assertion cannot be true. Instead, we should use RVALUE_WB_UNPROTECTED.
|
|
Setting v1, v2, v3 when we allocate an object assumes that we always
allocate 40 byte objects. By removing v1, v2, v3, we can make the base
slot size another size.
|
|
|
|
klass is not used, so we can shrink RZombie down to 32 bytes.
|
|
|
|
rb_obj_info_dump outputs to stderr, which is not included in the bug
report, so this information is lost.
|
|
|
|
If we malloc when the current Ractor is locked, we can deadlock because
GC requires VM lock and Ractor barrier. If another Ractor is waiting on
this Ractor lock, then it will deadlock because the other Ractor will
never join the barrier.
For example, this script deadlocks:
r = Ractor.new do
  loop do
    Ractor::Port.new
  end
end
100000.times do |i|
  r.send(nil)
  puts i
end
On debug builds, it fails with this assertion error:
vm_sync.c:75: Assertion Failed: vm_lock_enter:cr->sync.locked_by != rb_ractor_self(cr)
On non-debug builds, we can see that it deadlocks in the debugger:
Main Ractor:
frame #3: 0x000000010021fdc4 miniruby`rb_native_mutex_lock(lock=<unavailable>) at thread_pthread.c:115:14
frame #4: 0x0000000100193eb8 miniruby`ractor_send0 [inlined] ractor_lock(r=<unavailable>, file=<unavailable>, line=1180) at ractor.c:73:5
frame #5: 0x0000000100193eb0 miniruby`ractor_send0 [inlined] ractor_send_basket(ec=<unavailable>, rp=0x0000000131092840, b=0x000000011c63de80, raise_on_error=true) at ractor_sync.c:1180:5
frame #6: 0x0000000100193eac miniruby`ractor_send0(ec=<unavailable>, rp=0x0000000131092840, obj=4, move=<unavailable>, raise_on_error=true) at ractor_sync.c:1211:5
Second Ractor:
frame #2: 0x00000001002208d0 miniruby`rb_ractor_sched_barrier_start [inlined] rb_native_cond_wait(cond=<unavailable>, mutex=<unavailable>) at thread_pthread.c:221:13
frame #3: 0x00000001002208cc miniruby`rb_ractor_sched_barrier_start(vm=0x000000013180d600, cr=0x0000000131093460) at thread_pthread.c:1438:13
frame #4: 0x000000010028a328 miniruby`rb_vm_barrier at vm_sync.c:262:13 [artificial]
frame #5: 0x00000001000dfa6c miniruby`gc_start [inlined] rb_gc_vm_barrier at gc.c:179:5
frame #6: 0x00000001000dfa68 miniruby`gc_start [inlined] gc_enter(objspace=0x000000013180fc00, event=gc_enter_event_start, lock_lev=<unavailable>) at default.c:6636:9
frame #7: 0x00000001000dfa48 miniruby`gc_start(objspace=0x000000013180fc00, reason=<unavailable>) at default.c:6361:5
frame #8: 0x00000001000e3fd8 miniruby`objspace_malloc_increase_body [inlined] garbage_collect(objspace=0x000000013180fc00, reason=512) at default.c:6341:15
frame #9: 0x00000001000e3fa4 miniruby`objspace_malloc_increase_body [inlined] garbage_collect_with_gvl(objspace=0x000000013180fc00, reason=512) at default.c:6741:16
frame #10: 0x00000001000e3f88 miniruby`objspace_malloc_increase_body(objspace=0x000000013180fc00, mem=<unavailable>, new_size=<unavailable>, old_size=<unavailable>, type=<unavailable>) at default.c:8007:13
frame #11: 0x00000001000e3c44 miniruby`rb_gc_impl_malloc [inlined] objspace_malloc_fixup(objspace=0x000000013180fc00, mem=0x000000011c700000, size=12582912) at default.c:8085:5
frame #12: 0x00000001000e3c30 miniruby`rb_gc_impl_malloc(objspace_ptr=0x000000013180fc00, size=12582912) at default.c:8182:12
frame #13: 0x00000001000d4584 miniruby`ruby_xmalloc [inlined] ruby_xmalloc_body(size=<unavailable>) at gc.c:5128:12
frame #14: 0x00000001000d4568 miniruby`ruby_xmalloc(size=<unavailable>) at gc.c:5118:34
frame #15: 0x00000001001eb184 miniruby`rb_st_init_existing_table_with_size(tab=0x000000011c2b4b40, type=<unavailable>, size=<unavailable>) at st.c:559:39
frame #16: 0x00000001001ebc74 miniruby`rebuild_table_if_necessary [inlined] rb_st_init_table_with_size(type=0x00000001004f4a78, size=524287) at st.c:585:5
frame #17: 0x00000001001ebc5c miniruby`rebuild_table_if_necessary [inlined] rebuild_table(tab=0x000000013108e2f0) at st.c:753:19
frame #18: 0x00000001001ebbfc miniruby`rebuild_table_if_necessary(tab=0x000000013108e2f0) at st.c:1125:9
frame #19: 0x00000001001eba08 miniruby`rb_st_insert(tab=0x000000013108e2f0, key=262144, value=4767566624) at st.c:1143:5
frame #20: 0x0000000100194b84 miniruby`ractor_port_initialzie [inlined] ractor_add_port(r=0x0000000131093460, id=262144) at ractor_sync.c:399:9
frame #21: 0x0000000100194b58 miniruby`ractor_port_initialzie [inlined] ractor_port_init(rpv=4750065560, r=0x0000000131093460) at ractor_sync.c:87:5
frame #22: 0x0000000100194b34 miniruby`ractor_port_initialzie(self=4750065560) at ractor_sync.c:103:12
|
|
[Bug #21548]
In lazy sweeping, if we need to allocate an object in a heap where we
weren't able to free any slots, but we also either have empty pages or
could allocate new pages, then we want to preemptively claim a page
because it's possible that sweeping another heap will call gc_sweep_finish_heap,
which may use up all of the empty/allocatable pages. If other heaps are
not finished sweeping then we do not finish this GC and we will end up
triggering a new GC cycle during this GC phase.
|
|
|
|
rb_gc_impl_writebarrier_remember is not Ractor safe because it writes to
bitmaps and also pushes onto the mark stack during incremental marking.
We should acquire the VM lock to prevent race conditions.
In the case that the object is not old, there is no performance impact.
However, we can see a performance impact in this microbenchmark where the
object is old:
4.times.map do
  Ractor.new do
    ary = []
    3.times { GC.start }
    10_000_000.times do |i|
      ary.push(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
      ary.clear
    end
  end
end.map(&:value)
Before:
Time (mean ± σ): 682.4 ms ± 5.1 ms [User: 2564.8 ms, System: 16.0 ms]
After:
Time (mean ± σ): 5.522 s ± 0.096 s [User: 8.237 s, System: 7.931 s]
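The shape of the change (no cost when the object is not old, a lock around the shared structures otherwise) can be sketched with a toy model in Ruby. This is not the GC code: `ToyRememberSet` and its methods are hypothetical names, a `Mutex` stands in for the VM lock, and a Hash with an `:old` key stands in for a heap object.

```ruby
# Toy model only. ToyRememberSet is a hypothetical name; a Ruby Mutex
# stands in for the VM lock and a Hash with an :old key stands in for
# a heap object.
class ToyRememberSet
  attr_reader :remembered, :lock_acquisitions

  def initialize
    @lock = Mutex.new
    @remembered = []
    @lock_acquisitions = 0
  end

  def remember(obj)
    # Fast path: objects that are not old return before touching the lock.
    return unless obj[:old]

    @lock.synchronize do
      @lock_acquisitions += 1
      # The shared structure is only mutated while the lock is held.
      @remembered << obj unless @remembered.any? { |o| o.equal?(obj) }
    end
  end
end

set = ToyRememberSet.new
young = { old: false }
old = { old: true }

threads = 4.times.map do
  Thread.new do
    1_000.times do
      set.remember(young)
      set.remember(old)
    end
  end
end
threads.each(&:join)

puts set.lock_acquisitions # 4000: only calls with the old object take the lock
puts set.remembered.size   # 1
```

Every call with the old object pays for a lock acquisition, which is the kind of contention the microbenchmark above is measuring.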
Co-Authored-By: Luke Gruber <luke.gruber@shopify.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
|
|
Assuming not all objects are moved during compaction, it
is preferable to avoid rewriting references to objects that haven't
moved, so as to avoid invalidating potentially shared memory pages.
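The idea amounts to checking before storing. Below is a minimal sketch in Ruby, assuming a made-up model where `slots` holds an object's reference fields as integer addresses and `forwarding` maps the old address of each moved object to its new one. Both names, and `update_references!`, are hypothetical.

```ruby
# Toy model only: `slots` stands in for an object's reference fields and
# `forwarding` maps old addresses to new ones for objects that moved.
# All names are hypothetical.
def update_references!(slots, forwarding)
  writes = 0
  slots.each_index do |i|
    new_ref = forwarding.fetch(slots[i], slots[i])
    next if new_ref == slots[i] # target did not move: skip the store

    slots[i] = new_ref
    writes += 1
  end
  writes
end

slots = [0x1000, 0x2000, 0x3000]
forwarding = { 0x2000 => 0x2800 } # only one object moved

writes = update_references!(slots, forwarding)

puts writes # 1
p slots     # [4096, 10240, 12288]
```

Only the slot whose target moved is written. In the real collector the benefit would come from not touching memory pages that are otherwise unmodified.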
|