ruby.git - The Ruby Programming Language

Age	Commit message (Collapse)	Author
2025-12-29	Add rb_gc_register_pinning_obj	Peter Zhu

2025-12-25	Implement declaring weak references	Peter Zhu
	[Feature #21084] # Summary The current way of marking weak references uses `rb_gc_mark_weak(VALUE ptr)`. This presents challenges because Ruby's GC is incremental, meaning that if the `ptr` changes (e.g. realloc'd or free'd), then we could have an invalid memory access. This also overwrites `ptr = Qundef` if `ptr` is dead, which prevents any cleanup to be run (e.g. freeing memory or deleting entries from hash tables). This ticket proposes `rb_gc_declare_weak_references` which declares that an object has weak references and calls a cleanup function after marking, allowing the object to clean up any memory for dead objects. # Introduction In [[Feature #19783]](https://bugs.ruby-lang.org/issues/19783), I introduced an API allowing objects to mark weak references, the function signature looks like this: ```c void rb_gc_mark_weak(VALUE ptr); ``` `rb_gc_mark_weak` is called during the marking phase of the GC to specify that the memory at `ptr` holds a pointer to a Ruby object that is weakly referenced. `rb_gc_mark_weak` appends this pointer to a list that is processed after the marking phase of the GC. If the object at `ptr` is no longer alive, then it overwrites the object reference with a special value (`ptr = Qundef`). However, this API resulted in two challenges: 1. Ruby's default GC is incremental, which means that the GC is not ran in one phase, but rather split into chunks of work that interleaves with Ruby execution. The `ptr` passed into `rb_gc_mark_weak` could be on the malloc heap, and that memory could be realloc'd or even free'd. We had to use workarounds such as `rb_gc_remove_weak` to ensure that there were no illegal memory accesses. This made `rb_gc_mark_weak` difficult to use, impacted runtime performance, and increased memory usage. 2. When an object dies, `rb_gc_mark_weak` only overwites the reference with `Qundef`. This means that if we want to do any cleanup (e.g. free a piece of memory or delete a hash table entry), we could not do that and had to defer this process elsewhere (e.g. during marking or runtime). In this ticket, I'm proposing a new API for weak references. Instead of an object marking its weak references during the marking phase, the object declares that it has weak references using the `rb_gc_declare_weak_references` function. This declaration occurs during runtime (e.g. after the object has been created) rather than during GC. After an object declares that it has weak references, it will have its callback function called after marking as long as that object is alive. This callback function can then call a special function `rb_gc_handle_weak_references_alive_p` to determine whether its references are alive. This will allow the callback function to do whatever it wants on the object, allowing it to perform any cleanup work it needs. This significantly simplifies the code for `ObjectSpace::WeakMap` and `ObjectSpace::WeakKeyMap` because it no longer needs to have the workarounds for the limitations of `rb_gc_mark_weak`. # Performance The performance results below demonstrate that `ObjectSpace::WeakMap#[]=` is now about 60% faster because the implementation has been simplified and the number of allocations has been reduced. We can see that there is not a significant impact on the performance of `ObjectSpace::WeakMap#[]`. Base: ``` ObjectSpace::WeakMap#[]= 4.620M (± 6.4%) i/s (216.44 ns/i) - 23.342M in 5.072149s ObjectSpace::WeakMap#[] 30.967M (± 1.9%) i/s (32.29 ns/i) - 154.998M in 5.007157s ``` Branch: ``` ObjectSpace::WeakMap#[]= 7.336M (± 2.8%) i/s (136.31 ns/i) - 36.755M in 5.013983s ObjectSpace::WeakMap#[] 30.902M (± 5.4%) i/s (32.36 ns/i) - 155.901M in 5.064060s ``` Code: ``` require "bundler/inline" gemfile do source "https://rubygems.org" gem "benchmark-ips" end wmap = ObjectSpace::WeakMap.new key = Object.new val = Object.new wmap[key] = val Benchmark.ips do \|x\| x.report("ObjectSpace::WeakMap#[]=") do \|times\| i = 0 while i < times wmap[Object.new] = Object.new i += 1 end end x.report("ObjectSpace::WeakMap#[]") do \|times\| i = 0 while i < times wmap[key] wmap[val] # does not exist i += 1 end end end ``` # Alternative designs Currently, `rb_gc_declare_weak_references` is designed to be an internal-only API. This allows us to assume the object types that call `rb_gc_declare_weak_references`. In the future, if we want to open up this API to third parties, we may want to change this function to something like: ```c void rb_gc_add_cleaner(VALUE obj, void (*callback)(VALUE obj)); ``` This will allow the third party to implement a custom `callback` that gets called after the marking phase of GC to clean up any dead references. I chose not to implement this design because it is less efficient as we would need to store a mapping from `obj` to `callback`, which requires extra memory.
2025-12-03	Handle NEWOBJ tracepoints settings fields	Jean Boussier
	[Bug #21710] - struct.c: `struct_alloc` It is possible for a `NEWOBJ` tracepoint call back to write fields into a newly allocated object before `struct_alloc` had the time to set the `RSTRUCT_GEN_FIELDS` flags and such. Hence we can't blindly initialize the `fields_obj` reference to `0` we first need to check no fields were added yet. - object.c: `rb_class_allocate_instance` Similarly, if a `NEWOBJ` tracepoint tries to set fields on the object, the `shape_id` must already be set, as it's required on T_OBJECT to know where to write fields. `NEWOBJ_OF` had to be refactored to accept a `shape_id`.
2025-10-23	use `SET_SHAREABLE`	Koichi Sasada
	to adopt strict shareable rule. * (basically) shareable objects only refer shareable objects * (exception) shareable objects can refere unshareable objects but should not leak reference to unshareable objects to Ruby world
2025-10-23	add SET_SHAREABLE macros	Koichi Sasada
	* `RB_OBJ_SET_SHAREABLE(obj)` makes obj shareable. All of reachable objects from `obj` should be shareable. * `RB_OBJ_SET_FROZEN_SHAREABLE(obj)` same as above but freeze `obj` before making it shareable. Also `rb_gc_verify_shareable(obj)` is introduced to check the `obj` does not violate shareable rule (an shareable object only refers shareable objects) strictly. The rule has some exceptions (some shareable objects can refer to unshareable objects, such as a Ractor object (which is a shareable object) can refer to the Ractor local objects. To handle such case, `check_shareable` flag is also introduced. `STRICT_VERIFY_SHAREABLE` macro is also introduced to verify the strict shareable rule at `SET_SHAREABLE`.
2025-07-03	imemo_fields_set: save copying when reassigning a variable	Jean Boussier
	If we still fit in the existing imemo/fields object we can update it atomically, saving a reallocation.
2025-05-08	Move `object_id` in object fields.	Jean Boussier
	And get rid of the `obj_to_id_tbl` It's no longer needed, the `object_id` is now stored inline in the object alongside instance variables. We still need the inverse table in case `_id2ref` is invoked, but we lazily build it by walking the heap if that happens. The `object_id` concern is also no longer a GC implementation concern, but a generic implementation. Co-Authored-By: Matt Valentine-House <matt@eightbitraptor.com> Notes: Merged: https://github.com/ruby/ruby/pull/13159
2025-04-04	Ractor: revert to moving object bytes, but size pool aware	Jean Boussier
	Using `rb_obj_clone` introduce other problems, such as `initialize_*` callbacks invocation in the context of the parent ractor. So we can revert back to copy the content of the object slots, but in a way that is aware of size pools. Notes: Merged: https://github.com/ruby/ruby/pull/13070
2025-03-31	Ractor: Fix moving embedded objects	Jean Boussier
	[Bug #20271] [Bug #20267] [Bug #20255] `rb_obj_alloc(RBASIC_CLASS(obj))` will always allocate from the basic 40B pool, so if `obj` is larger than `40B`, we'll create a corrupted object when we later copy the shape_id. Instead we can use the same logic than ractor copy, which is to use `rb_obj_clone`, and later ask the GC to free the original object. We then must turn it into a `T_OBJECT`, because otherwise just changing its class to `RactorMoved` leaves a lot of ways to keep using the object, e.g.: ``` a = [1, 2, 3] Ractor.new{}.send(a, move: true) [].concat(a) # Should raise, but wasn't. ``` If it turns out that `rb_obj_clone` isn't performant enough for some uses, we can always have carefully crafted specialized paths for the types that would benefit from it. Notes: Merged: https://github.com/ruby/ruby/pull/13008
2025-02-19	Add rb_gc_object_metadata API	Peter Zhu
	This function replaces the internal rb_obj_gc_flags API. rb_gc_object_metadata returns an array of name and value pairs, with the last element having 0 for the name. Notes: Merged: https://github.com/ruby/ruby/pull/12777
2025-01-11	Remove stale declaration for modular GC	Nobuyoshi Nakada
	Notes: Merged: https://github.com/ruby/ruby/pull/12546
2024-12-16	Check whether object is valid in allocation_info_tracer_compact	Peter Zhu
	When reference updating ObjectSpace.trace_object_allocations, we need to check whether the object is valid or not because it does not mark the object so the object may be dead. This can cause a segmentation fault if the object is on a free heap page. For example, the following script crashes: require "objspace" objs = [] ObjectSpace.trace_object_allocations do 1_000_000.times do objs << Object.new end end objs = nil # Free pages that the objs were on GC.start # Run compaction and check that it doesn't crash GC.compact Notes: Merged: https://github.com/ruby/ruby/pull/12360
2024-12-06	Add rb_gc_impl_active_gc_name to gc/gc_impl.h	Peter Zhu
	Notes: Merged: https://github.com/ruby/ruby/pull/12271
2024-12-05	Use rb_gc_enable/rb_gc_disable_no_rest instead of ruby_disable_gc	Peter Zhu
	We should use the rb_gc_enable/rb_gc_disable_no_rest APIs instead of directly setting the ruby_disable_gc variable. Notes: Merged: https://github.com/ruby/ruby/pull/12264
2024-12-05	Standardize on the name "modular GC"	Peter Zhu
	We have name fragmentation for this feature, including "shared GC", "modular GC", and "external GC". This commit standardizes the feature name to "modular GC" and the implementation to "GC library". Notes: Merged: https://github.com/ruby/ruby/pull/12261
2024-11-25	Place all non-default GC API behind USE_SHARED_GC	Matt Valentine-House
	So that it doesn't get included in the generated binaries for builds that don't support loading shared GC modules Co-Authored-By: Peter Zhu <peter@peterzhu.ca> Notes: Merged: https://github.com/ruby/ruby/pull/12149
2024-11-21	Annotate anonymous mmap	Kunshan Wang
	Use PR_SET_VMA_ANON_NAME to set human-readable names for anonymous virtual memory areas mapped by `mmap()` when compiled and run on Linux 5.17 or higher. This makes it convenient for developers to debug mmap. Notes: Merged: https://github.com/ruby/ruby/pull/12119
2024-11-14	Include the currently active GC in RUBY_DESCRIPTION	Matt Valentine-House
	This will add +MOD_GC to the version string and Ruby description when Ruby is compiled with shared gc support. When shared GC support is compiled in and a GC module has been loaded using RUBY_GC_LIBRARY, the version string will include the name of the currently active GC as reported by the rb_gc_active_gc_name function in the form +MOD_GC[gc_name] [Feature #20794] Notes: Merged: https://github.com/ruby/ruby/pull/11872
2024-10-03	Rename size_pool -> heap	Matt Valentine-House
	Now that we've inlined the eden_heap into the size_pool, we should rename the size_pool to heap. So that Ruby contains multiple heaps, with different sized objects. The term heap as a collection of memory pages is more in memory management nomenclature, whereas size_pool was a name chosen out of necessity during the development of the Variable Width Allocation features of Ruby. The concept of size pools was introduced in order to facilitate different sized objects (other than the default 40 bytes). They wrapped the eden heap and the tomb heap, and some related state, and provided a reasonably simple way of duplicating all related concerns, to provide multiple pools that all shared the same structure but held different objects. Since then various changes have happend in Ruby's memory layout: * The concept of tomb heaps has been replaced by a global free pages list, with each page having it's slot size reconfigured at the point when it is resurrected * the eden heap has been inlined into the size pool itself, so that now the size pool directly controls the free_pages list, the sweeping page, the compaction cursor and the other state that was previously being managed by the eden heap. Now that there is no need for a heap wrapper, we should refer to the collection of pages containing Ruby objects as a heap again rather than a size pool Notes: Merged: https://github.com/ruby/ruby/pull/11771
2024-07-03	Move ruby_load_external_gc_from_argv to gc.h	Peter Zhu

2024-07-03	[Feature #20470] Split GC into gc_impl.c	Peter Zhu
	This commit splits gc.c into two files: - gc.c now only contains code not specific to Ruby GC. This includes code to mark objects (which the GC implementation may choose not to use) and wrappers for internal APIs that the implementation may need to use (e.g. locking the VM). - gc_impl.c now contains the implementation of Ruby's GC. This includes marking, sweeping, compaction, and statistics. Most importantly, gc_impl.c only uses public APIs in Ruby and a limited set of functions exposed in gc.c. This allows us to build gc_impl.c independently of Ruby and plug Ruby's GC into itself.
2024-04-24	Add ruby_mimcalloc	Peter Zhu
	Many places call ruby_mimmalloc then MEMZERO. This can be reduced by using ruby_mimcalloc instead.
2024-04-18	Remove unused rb_size_pool_slot_size	Peter Zhu

2024-04-15	Initialize external GC Library	Matt Valentine-House
	Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
2024-03-27	Fix setting GC stress at boot when objspace not available	Peter Zhu

2024-03-26	Refactor init_copy gc attributes	eileencodes
	This PR moves `rb_copy_wb_protected_attribute` and `rb_gc_copy_finalizer` into a single function called `rb_gc_copy_attributes` to be called by `init_copy`. This reduces the surface area of the GC API. Co-authored-by: Peter Zhu <peter@peterzhu.ca>
2024-03-25	Fix --debug=gc_stress flag	Peter Zhu
	ruby_env_debug_option gets called after Init_gc_stress, so the --debug=gc_stress flag never works.
2024-03-20	Make rb_aligned_malloc private	Peter Zhu
	It is not used anywhere else.
2024-03-18	Remove duplicated function prototype rb_gc_disable_no_rest	Peter Zhu

2024-03-18	Remove rb_raw_obj_info_basic	Peter Zhu
	It's not used outside of gc.c.
2024-03-14	[Feature #20265] Remove rb_newobj_of and RB_NEWOBJ_OF	Peter Zhu

2024-03-13	Don't directly read the SIZE_POOL_COUNT in shapes	Peter Zhu
	This removes the assumption about SIZE_POOL_COUNT for shapes.
2024-03-13	Simplify NEWOBJ_OF macro	Peter Zhu

2024-03-11	Use NEWOBJ_OF_ec in NEWOBJ_OF_0	Peter Zhu

2024-03-08	Retire RUBY_MARK_UNLESS_NULL	Jean Boussier
	Marking `Qnil` or `Qfalse` works fine, having an extra macro to avoid it isn't needed.
2024-02-28	Make rb_define_finalizer_no_check private	Peter Zhu

2024-02-28	Remove unused rb_gc_id2ref_obj_tbl	Peter Zhu

2024-02-26	Remove rb_objspace_marked_object_p	Peter Zhu
	rb_objspace_marked_object_p is no longer used in the objspace module, so we can remove it.
2024-02-26	Make rb_objspace_data_type_memsize private	Peter Zhu
	rb_objspace_data_type_memsize is not used in the objspace module, so we can make it private.
2024-02-26	Remove unused rb_objspace_each_objects_without_setup	Peter Zhu

2024-02-22	Extract imemo functions from gc.c into imemo.c	Peter Zhu

2024-02-14	Move rb_class_allocate_instance from gc.c to object.c	Peter Zhu

2024-01-11	Fix crash when printing RGENGC_DEBUG=5 output from GC	KJ Tsanaktsidis
	I was trying to debug an (unrelated) issue in the GC, and wanted to turn on the trace-level GC output by compiling it with -DRGENGC_DEBUG=5. Unfortunately, this actually causes a crash in newobj_init() because the code there tries to log the obj_info() of the newly created object. However, the object is not actually sufficiently set up for some of the things that obj_info() tries to do: * The instance variable table for a class is not yet initialized, and when using variable-length RVALUES, said ivar table is embedded in as-yet unitialized memory after the struct RValue. Attempting to read this, as obj_info() does, causes a crash. * T_DATA variables need to dereference their ->type field to print out the underlying C type name, which is not set up until newobj_fill() is called. To fix this, create a new method `obj_info_basic`, which dumps out only the parts of the object that are valid before the object is fully initialized. [Fixes #18795]
2023-11-24	Fix compaction for generic ivars	Peter Zhu
	When generic instance variable has a shape, it is marked movable. If it it transitions to too complex, it needs to update references otherwise it may have incorrect references.
2023-10-23	rb_shape_transition_shape_capa: use optimal sizes transitions	Jean Boussier
	Previously the growth was 3(embed), 6, 12, 24, ... With this change it's now 3(embed), 8, 16, 32, 64, ... by default. However, since power of two isn't the best size for all allocators, if `malloc_usable_size` is vailable, we use it to discover the best offset. On Linux/glibc 2.35 for instance, the growth will be 3(embed), 7, 15, 31 to avoid wasting 8B per object. Test program: ```c size_t test(size_t slots) { size_t allocated = slots * VALUE_SIZE; void test_ptr = malloc(allocated); size_t wasted = malloc_usable_size(test_ptr) - allocated; free(test_ptr); fprintf(stderr, "slots = %lu, wasted_bytes = %lu\n", slots, wasted); return wasted; } int main(int argc, char argv[]) { size_t best_padding = 0; size_t padding = 0; for (padding = 0; padding <= 2; padding++) { size_t wasted = test(8 - padding); if (wasted == 0) { best_padding = padding; break; } } size_t index = 0; fprintf(stderr, "=============== naive ================\n"); size_t list_size = 4; for (index = 0; index < 10; index++) { test(list_size); list_size = 2; } fprintf(stderr, "=============== auto-padded (-%lu) ================\n", best_padding); list_size = 4; for (index = 0; index < 10; index ++) { test(list_size - best_padding); list_size = 2; } fprintf(stderr, "\n\n"); return 0; } ``` ``` ===== glibc ====== slots = 8, wasted_bytes = 8 slots = 7, wasted_bytes = 0 =============== naive ================ slots = 4, wasted_bytes = 8 slots = 8, wasted_bytes = 8 slots = 16, wasted_bytes = 8 slots = 32, wasted_bytes = 8 slots = 64, wasted_bytes = 8 slots = 128, wasted_bytes = 8 slots = 256, wasted_bytes = 8 slots = 512, wasted_bytes = 8 slots = 1024, wasted_bytes = 8 slots = 2048, wasted_bytes = 8 =============== auto-padded (-1) ================ slots = 3, wasted_bytes = 0 slots = 7, wasted_bytes = 0 slots = 15, wasted_bytes = 0 slots = 31, wasted_bytes = 0 slots = 63, wasted_bytes = 0 slots = 127, wasted_bytes = 0 slots = 255, wasted_bytes = 0 slots = 511, wasted_bytes = 0 slots = 1023, wasted_bytes = 0 slots = 2047, wasted_bytes = 0 ``` ``` ========== jemalloc ======= slots = 8, wasted_bytes = 0 =============== naive ================ slots = 4, wasted_bytes = 0 slots = 8, wasted_bytes = 0 slots = 16, wasted_bytes = 0 slots = 32, wasted_bytes = 0 slots = 64, wasted_bytes = 0 slots = 128, wasted_bytes = 0 slots = 256, wasted_bytes = 0 slots = 512, wasted_bytes = 0 slots = 1024, wasted_bytes = 0 slots = 2048, wasted_bytes = 0 =============== auto-padded (-0) ================ slots = 4, wasted_bytes = 0 slots = 8, wasted_bytes = 0 slots = 16, wasted_bytes = 0 slots = 32, wasted_bytes = 0 slots = 64, wasted_bytes = 0 slots = 128, wasted_bytes = 0 slots = 256, wasted_bytes = 0 slots = 512, wasted_bytes = 0 slots = 1024, wasted_bytes = 0 slots = 2048, wasted_bytes = 0 ```
2023-09-06	Fix crash in WeakMap during compaction	Peter Zhu
	WeakMap can crash during compaction because the st_insert could allocate memory.
2023-09-05	Introduce rb_gc_remove_weak	Peter Zhu
	If we're during incremental marking, then Ruby code can execute that deallocates certain memory buffers that have been called with rb_gc_mark_weak, which can cause use-after-free bugs. Notes: Merged: https://github.com/ruby/ruby/pull/8375
2023-08-31	Prevent rb_gc_mark_values from pinning objects	Matt Valentine-House
	This is an internal only function not exposed to the C extension API. It's only use so far is from rb_vm_mark, where it's used to mark the values in the vm->trap_list.cmd array. There shouldn't be any reason why these cannot move. This commit allows them to move by updating their references during the reference updating step of compaction. To do this we've introduced another internal function rb_gc_update_values as a partner to rb_gc_mark_values. This allows us to refactor rb_gc_mark_values to not pin Notes: Merged: https://github.com/ruby/ruby/pull/8341
2023-08-25	Implement weak references in the GC	Peter Zhu
	[Feature #19783] This commit adds support for weak references in the GC through the function `rb_gc_mark_weak`. Unlike strong references, weak references does not mark the object, but rather lets the GC know that an object refers to another one. If the child object is freed, the pointer from the parent object is overwritten with `Qundef`. Co-Authored-By: Jean Boussier <byroot@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/8113
2023-07-17	Implement Process.warmup	Jean Boussier
	[Feature #18885] For now, the optimizations performed are: - Run a major GC - Compact the heap - Promote all surviving objects to oldgen Other optimizations may follow. Notes: Merged: https://github.com/ruby/ruby/pull/7662