author    Matt Valentine-House <matt@eightbitraptor.com>  2026-01-14 21:51:49 +0000
committer Matt Valentine-House <matt@eightbitraptor.com>  2026-01-26 18:01:09 +0000
commit    c21f3490d1f28b43564639ae8563bc2e02e828a4
tree      d89af65583de8d257a538198a80efbdd87d804a4
parent    5add7c3ea9a13e657fc7cba78b2633b9548a55aa
Implement a fast path for sweeping (gc_sweep_fast_path_p).
[Feature #21846]
There is currently a single path through our GC sweeping code: we always
call rb_gc_obj_free_vm_weak_references and rb_gc_obj_free before adding
an object back to the freelist. We do this even when the object has no
external resources that need freeing and no weak references pointing to
it.
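For reference, the per-object work on that path looks roughly like this
(simplified from the pre-patch gc_sweep_plane shown in the diff below;
compaction and RGENGC checks elided):
```
/* Every dead object currently pays for the event hook, the weak
 * reference scan and rb_gc_obj_free, whether it needs them or not. */
rb_gc_event_hook(vp, RUBY_INTERNAL_EVENT_FREEOBJ);

rb_gc_obj_free_vm_weak_references(vp);   /* clear weak references to vp */
if (rb_gc_obj_free(objspace, vp)) {      /* release any external resources */
    RVALUE_AGE_SET_BITMAP(vp, 0);
    heap_page_add_freeobj(objspace, sweep_page, vp);
    ctx->freed_slots++;
}
else {
    /* the object could not be freed in place (e.g. it is finalizable),
     * so it is counted as a final slot instead */
    ctx->final_slots++;
}
```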
This commit introduces a conservative fast path through gc_sweep_plane
that uses the object's flags to identify cases where these calls can
safely be skipped; those objects are added straight back onto the
freelist. Any object for which gc_sweep_fast_path_p returns false takes
the existing full sweep code (referred to here as the slow path).
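Condensed from the diff below, the dispatch looks roughly like this
(note that the fast path also skips the FREEOBJ event hook unless
something has subscribed to it):
```
if (gc_sweep_fast_path_p(vp)) {
    /* fast path: no weak references, nothing external to free */
    if (UNLIKELY(objspace->hook_events & RUBY_INTERNAL_EVENT_FREEOBJ)) {
        rb_gc_event_hook(vp, RUBY_INTERNAL_EVENT_FREEOBJ);
    }
    RVALUE_AGE_SET_BITMAP(vp, 0);
    heap_page_add_freeobj(objspace, sweep_page, vp);
    ctx->freed_slots++;
}
else {
    /* slow path: the existing hook/free sequence shown above */
}
```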
Currently there are two checks that will _always_ send an object down
the slow path:
1. its object_id has been observed and stored in the id2ref_table
2. it has generic ivars in the gen_fields table
If neither of these is true, then we run some flag checks on the object
and send the following cases down the fast path (see the condensed
sketch after this list):
- Objects whose fields are embedded (not heap allocated)
- Embedded Strings that aren't in the fstring table
- Embedded Arrays
- Embedded Hashes
- Embedded Bignums
- Embedded Structs
- Embedded TypedData objects with no dfree function
- Floats, Rationals and Complex
- Various IMEMO subtypes that do no allocation
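In skeleton form (condensed from gc_sweep_fast_path_p in the diff
below), the predicate filters by type, rules out observed object_ids
via the shape, then applies the per-type flag checks; the String case
is representative:
```
static inline bool
gc_sweep_fast_path_p(VALUE obj)
{
    VALUE flags = RBASIC(obj)->flags;

    if (flags & FL_FINALIZE) return false;  /* finalizable: always slow */

    /* ... first switch: bail out for types with no fast path ... */

    shape_id_t shape_id = RBASIC_SHAPE_ID(obj);
    if (id2ref_tbl && rb_shape_has_object_id(shape_id)) return false;

    switch (flags & RUBY_T_MASK) {
      case T_STRING:
        /* heap buffers and fstring table entries need the slow path */
        if (flags & (RSTRING_NOEMBED | RSTRING_FSTR)) return false;
        /* generic ivars live in the gen_fields table: slow path too */
        return !rb_shape_has_fields(shape_id);
      /* ... analogous flag checks for the other fast-path types ... */
      default:
        return false;
    }
}
```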
We've benchmarked this code using ruby-bench as well as the gcbench
benchmarks inside Ruby (benchmark/gc). This patch results in a modest
speed improvement on almost all of the headline benchmarks (2% on
railsbench with YJIT enabled) and an observable ~30% improvement in
time spent sweeping during the GC benchmarks:
```
master: ruby 4.1.0dev (2026-01-19T12:03:33Z master 859920dfd2) +YJIT +PRISM [x86_64-linux]
experiment: ruby 4.1.0dev (2026-01-16T21:36:46Z mvh-sweep-fast-pat.. c3ffe377a1) +YJIT +PRISM [x86_64-linux]
-------------- ----------- ---------- --------------- ---------- ------------------ -----------------
bench master (ms) stddev (%) experiment (ms) stddev (%) experiment 1st itr master/experiment
lobsters N/A N/A N/A N/A N/A N/A
activerecord 132.5 0.9 132.5 1.0 1.056 1.001
chunky-png 577.2 0.4 580.1 0.4 0.994 0.995
erubi-rails 902.9 0.2 894.3 0.2 1.040 1.010
hexapdf 1763.9 3.3 1760.6 3.7 1.027 1.002
liquid-c 56.9 0.6 56.7 1.4 1.004 1.003
liquid-compile 46.3 2.1 46.1 2.1 1.005 1.004
liquid-render 77.8 0.8 75.1 0.9 1.023 1.036
mail 114.7 0.4 113.0 1.4 1.054 1.015
psych-load 1635.4 1.4 1625.9 0.5 0.988 1.006
railsbench 1685.4 2.4 1650.1 2.0 0.989 1.021
rubocop 133.5 8.1 130.3 7.8 1.002 1.024
ruby-lsp 140.3 1.9 137.5 1.8 1.007 1.020
sequel 64.6 0.7 63.9 0.7 1.003 1.011
shipit 1196.2 4.3 1181.5 4.2 1.003 1.012
-------------- ----------- ---------- --------------- ---------- ------------------ -----------------
Legend:
- experiment 1st itr: ratio of master/experiment time for the first benchmarking iteration.
- master/experiment: ratio of master/experiment time. Higher is better for experiment. Above 1 represents a speedup.
```
```
Benchmark │ Wall(B) Sweep(B) Mark(B) │ Wall(E) Sweep(E) Mark(E) │ Wall Δ Sweep Δ
───────────────┼─────────────────────────────────┼─────────────────────────────────┼──────────────────
null │ 0.000s 1ms 4ms │ 0.000s 1ms 4ms │ 0% 0%
hash1 │ 4.330s 875ms 46ms │ 3.960s 531ms 44ms │ +8.6% +39.3%
hash2 │ 6.356s 243ms 988ms │ 6.298s 176ms 1.03s │ +0.9% +27.6%
rdoc │ 37.337s 2.42s 1.09s │ 36.678s 2.11s 1.20s │ +1.8% +13.1%
binary_trees │ 3.366s 426ms 252ms │ 3.082s 275ms 239ms │ +8.4% +35.4%
ring │ 5.252s 14ms 2.47s │ 5.327s 12ms 2.43s │ -1.4% +14.3%
redblack │ 2.966s 28ms 41ms │ 2.940s 21ms 38ms │ +0.9% +25.0%
───────────────┼─────────────────────────────────┼─────────────────────────────────┼──────────────────
Legend: (B) = Baseline, (E) = Experiment, Δ = improvement (positive = faster)
Wall = total wallclock, Sweep = GC sweeping time, Mark = GC marking time
Times are median of 3 runs
```
These results are also borne out when YJIT is disabled:
```
master: ruby 4.1.0dev (2026-01-19T12:03:33Z master 859920dfd2) +PRISM [x86_64-linux]
experiment: ruby 4.1.0dev (2026-01-16T21:36:46Z mvh-sweep-fast-pat.. c3ffe377a1) +PRISM [x86_64-linux]
-------------- ----------- ---------- --------------- ---------- ------------------ -----------------
bench master (ms) stddev (%) experiment (ms) stddev (%) experiment 1st itr master/experiment
lobsters N/A N/A N/A N/A N/A N/A
activerecord 389.6 0.3 377.5 0.3 1.032 1.032
chunky-png 1123.4 0.2 1109.2 0.2 1.013 1.013
erubi-rails 1754.3 0.1 1725.7 0.1 1.035 1.017
hexapdf 3346.5 0.9 3326.9 0.7 1.003 1.006
liquid-c 84.0 0.5 83.5 0.5 0.992 1.006
liquid-compile 74.0 1.5 73.5 1.4 1.011 1.008
liquid-render 199.9 0.4 199.6 0.4 1.000 1.002
mail 177.8 0.4 176.4 0.4 1.069 1.008
psych-load 2749.6 0.7 2777.0 0.0 0.980 0.990
railsbench 2983.0 1.0 2965.5 0.8 1.041 1.006
rubocop 228.8 1.0 227.5 1.2 1.015 1.005
ruby-lsp 221.8 0.9 216.1 0.8 1.011 1.026
sequel 89.1 0.5 89.1 1.8 1.005 1.000
shipit 2385.6 1.6 2371.8 1.0 1.002 1.006
-------------- ----------- ---------- --------------- ---------- ------------------ -----------------
Legend:
- experiment 1st itr: ratio of master/experiment time for the first benchmarking iteration.
- master/experiment: ratio of master/experiment time. Higher is better for experiment. Above 1 represents a speedup.
```
```
Benchmark │ Wall(B) Sweep(B) Mark(B) │ Wall(E) Sweep(E) Mark(E) │ Wall Δ Sweep Δ
───────────────┼─────────────────────────────────┼─────────────────────────────────┼──────────────────
null │ 0.000s 1ms 4ms │ 0.000s 1ms 3ms │ 0% 0%
hash1 │ 4.349s 877ms 45ms │ 4.045s 532ms 44ms │ +7.0% +39.3%
hash2 │ 6.575s 235ms 967ms │ 6.540s 181ms 1.04s │ +0.5% +23.0%
rdoc │ 45.782s 2.23s 1.14s │ 44.925s 1.90s 1.01s │ +1.9% +15.0%
binary_trees │ 6.433s 426ms 252ms │ 6.268s 278ms 240ms │ +2.6% +34.7%
ring │ 6.584s 17ms 2.33s │ 6.738s 13ms 2.33s │ -2.3% +30.8%
redblack │ 13.334s 31ms 42ms │ 13.296s 24ms 107ms │ +0.3% +22.6%
───────────────┼─────────────────────────────────┼─────────────────────────────────┼──────────────────
Legend: (B) = Baseline, (E) = Experiment, Δ = improvement (positive = faster)
Wall = total wallclock, Sweep = GC sweeping time, Mark = GC marking time
Times are median of 3 runs
```
-rw-r--r--  gc.c                   2
-rw-r--r--  gc/default/default.c 163
2 files changed, 133 insertions, 32 deletions
diff --git a/gc.c b/gc.c
--- a/gc.c
+++ b/gc.c
@@ -596,6 +596,7 @@ rb_gc_guarded_ptr_val(volatile VALUE *ptr, VALUE val)
 #endif
 
 static const char *obj_type_name(VALUE obj);
+static st_table *id2ref_tbl;
 
 #include "gc/default/default.c"
 #if USE_MODULAR_GC && !defined(HAVE_DLOPEN)
@@ -1831,7 +1832,6 @@ rb_gc_pointer_to_heap_p(VALUE obj)
 #define OBJ_ID_INCREMENT (RUBY_IMMEDIATE_MASK + 1)
 #define LAST_OBJECT_ID() (object_id_counter * OBJ_ID_INCREMENT)
 static VALUE id2ref_value = 0;
-static st_table *id2ref_tbl = NULL;
 
 #if SIZEOF_SIZE_T == SIZEOF_LONG_LONG
 static size_t object_id_counter = 1;
diff --git a/gc/default/default.c b/gc/default/default.c
index 013c074994..ff43e38ab9 100644
--- a/gc/default/default.c
+++ b/gc/default/default.c
@@ -843,6 +843,93 @@ heap_page_in_global_empty_pages_pool(rb_objspace_t *objspace, struct heap_page *
 #define GET_HEAP_WB_UNPROTECTED_BITS(x) (&GET_HEAP_PAGE(x)->wb_unprotected_bits[0])
 #define GET_HEAP_MARKING_BITS(x)        (&GET_HEAP_PAGE(x)->marking_bits[0])
 
+#ifndef BUILDING_MODULAR_GC
+static inline bool
+gc_sweep_fast_path_p(VALUE obj)
+{
+    VALUE flags = RBASIC(obj)->flags;
+
+    if (flags & FL_FINALIZE) return false;
+
+    switch (flags & RUBY_T_MASK) {
+      case T_IMEMO:
+        switch (imemo_type(obj)) {
+          case imemo_constcache:
+          case imemo_cref:
+          case imemo_ifunc:
+          case imemo_memo:
+          case imemo_svar:
+          case imemo_throw_data:
+            return true;
+          default:
+            return false;
+        }
+
+      case T_DATA:
+        if (flags & RUBY_FL_USERPRIV0) {
+            uintptr_t type = (uintptr_t)RTYPEDDATA(obj)->type;
+            if (type & TYPED_DATA_EMBEDDED) {
+                RUBY_DATA_FUNC dfree = ((const rb_data_type_t *)(type & TYPED_DATA_PTR_MASK))->function.dfree;
+                return (uintptr_t)dfree + 1 <= 1;
+            }
+        }
+        return false;
+
+      case T_OBJECT:
+      case T_STRING:
+      case T_ARRAY:
+      case T_HASH:
+      case T_BIGNUM:
+      case T_STRUCT:
+      case T_FLOAT:
+      case T_RATIONAL:
+      case T_COMPLEX:
+        break;
+
+      default:
+        return false;
+    }
+
+    shape_id_t shape_id = RBASIC_SHAPE_ID(obj);
+    if (id2ref_tbl && rb_shape_has_object_id(shape_id)) return false;
+
+    switch (flags & RUBY_T_MASK) {
+      case T_OBJECT:
+        return !(flags & ROBJECT_HEAP);
+
+      case T_STRING:
+        if (flags & (RSTRING_NOEMBED | RSTRING_FSTR)) return false;
+        return !rb_shape_has_fields(shape_id);
+
+      case T_ARRAY:
+        if (!(flags & RARRAY_EMBED_FLAG)) return false;
+        return !rb_shape_has_fields(shape_id);
+
+      case T_HASH:
+        if (flags & RHASH_ST_TABLE_FLAG) return false;
+        return !rb_shape_has_fields(shape_id);
+
+      case T_BIGNUM:
+        if (!(flags & BIGNUM_EMBED_FLAG)) return false;
+        return !rb_shape_has_fields(shape_id);
+
+      case T_STRUCT:
+        if (!(flags & RSTRUCT_EMBED_LEN_MASK)) return false;
+        if (flags & RSTRUCT_GEN_FIELDS) return !rb_shape_has_fields(shape_id);
+        return true;
+
+      case T_FLOAT:
+      case T_RATIONAL:
+      case T_COMPLEX:
+        return !rb_shape_has_fields(shape_id);
+
+      default:
+        UNREACHABLE_RETURN(false);
+    }
+}
+#endif
+
 #define RVALUE_AGE_BITMAP_INDEX(n)  (NUM_IN_PAGE(n) / (BITS_BITLENGTH / RVALUE_AGE_BIT_COUNT))
 #define RVALUE_AGE_BITMAP_OFFSET(n) ((NUM_IN_PAGE(n) % (BITS_BITLENGTH / RVALUE_AGE_BIT_COUNT)) * RVALUE_AGE_BIT_COUNT)
@@ -3481,15 +3568,34 @@ gc_sweep_plane(rb_objspace_t *objspace, rb_heap_t *heap, uintptr_t p, bits_t bit
         rb_asan_unpoison_object(vp, false);
         if (bitset & 1) {
             switch (BUILTIN_TYPE(vp)) {
-              default: /* majority case */
-                gc_report(2, objspace, "page_sweep: free %p\n", (void *)p);
+              case T_MOVED:
+                if (objspace->flags.during_compacting) {
+                    /* The sweep cursor shouldn't have made it to any
+                     * T_MOVED slots while the compact flag is enabled.
+                     * The sweep cursor and compact cursor move in
+                     * opposite directions, and when they meet references will
+                     * get updated and "during_compacting" should get disabled */
+                    rb_bug("T_MOVED shouldn't be seen until compaction is finished");
+                }
+                gc_report(3, objspace, "page_sweep: %s is added to freelist\n", rb_obj_info(vp));
+                ctx->empty_slots++;
+                RVALUE_AGE_SET_BITMAP(vp, 0);
+                heap_page_add_freeobj(objspace, sweep_page, vp);
+                break;
+              case T_ZOMBIE:
+                /* already counted */
+                break;
+              case T_NONE:
+                ctx->empty_slots++; /* already freed */
+                break;
+
+              default:
 #if RGENGC_CHECK_MODE
                 if (!is_full_marking(objspace)) {
                     if (RVALUE_OLD_P(objspace, vp)) rb_bug("page_sweep: %p - old while minor GC.", (void *)p);
                     if (RVALUE_REMEMBERED(objspace, vp)) rb_bug("page_sweep: %p - remembered.", (void *)p);
                 }
 #endif
-
                 if (RVALUE_WB_UNPROTECTED(objspace, vp)) CLEAR_IN_BITMAP(GET_HEAP_WB_UNPROTECTED_BITS(vp), vp);
 
 #if RGENGC_CHECK_MODE
@@ -3501,42 +3607,37 @@ gc_sweep_plane(rb_objspace_t *objspace, rb_heap_t *heap, uintptr_t p, bits_t bit
 #undef CHECK
 #endif
-                rb_gc_event_hook(vp, RUBY_INTERNAL_EVENT_FREEOBJ);
+#ifndef BUILDING_MODULAR_GC
+                if (gc_sweep_fast_path_p(vp)) {
+                    if (UNLIKELY(objspace->hook_events & RUBY_INTERNAL_EVENT_FREEOBJ)) {
+                        rb_gc_event_hook(vp, RUBY_INTERNAL_EVENT_FREEOBJ);
+                    }
 
-                rb_gc_obj_free_vm_weak_references(vp);
-                if (rb_gc_obj_free(objspace, vp)) {
-                    // always add free slots back to the swept pages freelist,
-                    // so that if we're compacting, we can re-use the slots
                     (void)VALGRIND_MAKE_MEM_UNDEFINED((void*)p, BASE_SLOT_SIZE);
                     RVALUE_AGE_SET_BITMAP(vp, 0);
                     heap_page_add_freeobj(objspace, sweep_page, vp);
-                    gc_report(3, objspace, "page_sweep: %s is added to freelist\n", rb_obj_info(vp));
+                    gc_report(3, objspace, "page_sweep: %s (fast path) added to freelist\n", rb_obj_info(vp));
                     ctx->freed_slots++;
                 }
-                else {
-                    ctx->final_slots++;
-                }
-                break;
+                else
+#endif
+                {
+                    gc_report(2, objspace, "page_sweep: free %p\n", (void *)p);
 
-              case T_MOVED:
-                if (objspace->flags.during_compacting) {
-                    /* The sweep cursor shouldn't have made it to any
-                     * T_MOVED slots while the compact flag is enabled.
-                     * The sweep cursor and compact cursor move in
-                     * opposite directions, and when they meet references will
-                     * get updated and "during_compacting" should get disabled */
-                    rb_bug("T_MOVED shouldn't be seen until compaction is finished");
+                    rb_gc_event_hook(vp, RUBY_INTERNAL_EVENT_FREEOBJ);
+
+                    rb_gc_obj_free_vm_weak_references(vp);
+                    if (rb_gc_obj_free(objspace, vp)) {
+                        (void)VALGRIND_MAKE_MEM_UNDEFINED((void*)p, BASE_SLOT_SIZE);
+                        RVALUE_AGE_SET_BITMAP(vp, 0);
+                        heap_page_add_freeobj(objspace, sweep_page, vp);
+                        gc_report(3, objspace, "page_sweep: %s is added to freelist\n", rb_obj_info(vp));
+                        ctx->freed_slots++;
+                    }
+                    else {
+                        ctx->final_slots++;
+                    }
                 }
-                gc_report(3, objspace, "page_sweep: %s is added to freelist\n", rb_obj_info(vp));
-                ctx->empty_slots++;
-                RVALUE_AGE_SET_BITMAP(vp, 0);
-                heap_page_add_freeobj(objspace, sweep_page, vp);
-                break;
-              case T_ZOMBIE:
-                /* already counted */
-                break;
-              case T_NONE:
-                ctx->empty_slots++; /* already freed */
                 break;
             }
         }
