summaryrefslogtreecommitdiff
path: root/test/ruby/test_set.rb
AgeCommit message (Collapse)Author
2025-12-17Revert the override of Enumerator#to_set that performed size checksAkinori Musha
[Bug #21780]
2025-12-12Fix Set#^ to not mutate its argument (#15296)Gil Desmarais
* test(set): add test Set#xor does not mutate other_set * Fix Set#^ to not mutate its argument
2025-11-20Support backwards compatibility for Set subclassesJeremy Evans
For subclasses from Set, require `set/subclass_compatible`, and extend the subclass and include a module in it that makes it more backwards compatible with the pure Ruby Set implementation used before Ruby 4. The module included in the subclass contains a near-copy of the previous Set implementation, with the following changes: * Accesses to `@hash` are generally replaced with `super` calls. In some cases, they are replaced with a call to another instance method. * Some methods that only accessed `@hash` and nothing else are not defined, so they inherit behavior from core Set. * The previous `Set#divide` implementation is not used, to avoid depending on tsort. This fixes the following two issues: * [Bug #21375] Set[] does not call #initialize * [Bug #21396] Set#initialize should call Set#add on items passed in It should also fix the vast majority of backwards compatibility issues in other cases where code subclassed Set and depended on implementation details (such as which methods call which other methods). This does not affect Set internals, so Set itself remains fast. For users who want to subclass Set but do not need to worry about backwards compatibility, they can subclass from Set::CoreSet, a Set subclass that does not have the backward compatibility layer included.
2025-11-13Add size checks to Range#to_set and Enumerator#to_set [Bug #21654]Akinori Musha
These two class are most common sources of infinite sequences. This change should effectively prevent accidental infinite loops when calling to_set on them. [Bug #21513]
2025-11-13Revert "[Bug #21513] Raise on converting endless range to set"Akinori Musha
This reverts commit d4020dd5faf28486123853e7f00c36139fc07793, which introduced performance regression for objects like ActiveRecord::Relation by calling the costly #size method on them.
2025-08-12set.c: Store `set_table->bins` at the end of `set_table->entries`Jean Boussier
This saves one pointer in `struct set_table`, which would allow `Set` objects to still fit in 80B TypedData slots even if RTypedData goes from 32B to 40B large. The existing set benchmark seem to show this doesn't have a very significant impact. Smaller sets are a bit faster, larger sets a bit slower. It seem consistent over multiple runs, but it's unclear how much of that is just error margin. ``` compare-ruby: ruby 3.5.0dev (2025-08-12T02:14:57Z master 428937a536) +YJIT +PRISM [arm64-darwin24] built-ruby: ruby 3.5.0dev (2025-08-12T07:22:26Z set-entries-bounds da30024fdc) +YJIT +PRISM [arm64-darwin24] warming up........ | |compare-ruby|built-ruby| |:------------------------|-----------:|---------:| |new_0 | 15.459M| 15.823M| | | -| 1.02x| |new_10 | 3.484M| 3.574M| | | -| 1.03x| |new_100 | 546.992k| 564.679k| | | -| 1.03x| |new_1000 | 49.391k| 48.169k| | | 1.03x| -| |aref_0 | 18.643M| 19.350M| | | -| 1.04x| |aref_10 | 5.941M| 6.006M| | | -| 1.01x| |aref_100 | 822.197k| 814.219k| | | 1.01x| -| |aref_1000 | 83.230k| 79.411k| | | 1.05x| -| ```
2025-07-29[Bug #21513] Raise on converting endless range to setviralpraxis
ref: https://bugs.ruby-lang.org/issues/21513 Before this patch, trying to convert endless range (e.g. `(1..)`) to set (using `to_set`) would hang
2025-06-25Include Set subclass name in Set#inspect outputJeremy Evans
Fixes [Bug #21377] Co-authored-by: zzak <zzak@hey.com>
2025-06-25Simplify Set#inspect outputJeremy Evans
As Set is now a core collection class, it should have special inspect output. Ideally, inspect output should be suitable to eval, similar to array and hash (assuming the elements are also suitable to eval): set = Set[1, 2, 3] eval(set.inspect) == set # should be true The simplest way to do this is to use the Set[] syntax. This deliberately does not use any subclass name in the output, similar to array and hash. It is more important that users know they are dealing with a set than which subclass: Class.new(Set)[] # this does: Set[] # not: #<Class:0x00000c21c78699e0>[] This inspect change breaks the power_assert bundled gem tests, so add power_assert to TEST_BUNDLED_GEMS_ALLOW_FAILURES in the workflows. Implements [Feature #21389]
2025-06-24[Bug #21449] Fix Set#divide{|a,b|} using Union-find structure (#13680)tomoya ishida
* [Bug #21449] Fix Set#divide{|a,b|} using Union-find structure Implements Union-find structure with path compression. Since divide{|a,b|} calls the given block n**2 times in the worst case, there is no need to implement union-by-rank or union-by-size optimization. * Avoid internal arrays from being modified from block passed to Set#divide Internal arrays can be modified from yielded block through ObjectSpace. Freeze readonly array, use ALLOCV_N instead of mutable array.
2025-06-09Add missing write barrier in set_i_initialize_copyJohn Hawthorn
When we copy the table from one set to another we need to run write barriers. Notes: Merged: https://github.com/ruby/ruby/pull/13558
2025-05-04Handle mutating of array passed to Set.new during iterationJeremy Evans
This avoids a heap-use-after-free. Fixes [Bug #21306] Notes: Merged: https://github.com/ruby/ruby/pull/13253
2025-05-04Handle mutation of array being merged into setJeremy Evans
Check length of array during every iteration, as a #hash method could truncate the array, resulting in heap-use-after-free. Fixes [Bug #21305] Notes: Merged: https://github.com/ruby/ruby/pull/13253
2025-04-29test/ruby/test_set.rb: mmtk doesn't have GC.compactJean Boussier
2025-04-29Don't call hash tombstone compaction from GC compactionAaron Patterson
Tombstone removal may possibly require allocation, and we're not allowed to allocate during GC. This commit also renames `set_compact` to `set_update_references` to differentiate tombstone removal compaction with GC object compaction. Co-Authored-By: Max Bernstein <max.bernstein@shopify.com> Co-authored-by: Jean Boussier <jean.boussier@gmail.com> Notes: Merged: https://github.com/ruby/ruby/pull/13206
2025-04-28Support Marshal.{dump,load} for core SetJeremy Evans
This was missed when adding core Set, because it's handled implicitly for T_OBJECT. Keep marshal compatibility between core Set and stdlib Set, so you can unmarshal core Set with stdlib Set and vice versa. Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/13185 Merged-By: jeremyevans <code@jeremyevans.net>
2025-04-26Implement Set as a core classJeremy Evans
Set has been an autoloaded standard library since Ruby 3.2. The standard library Set is less efficient than it could be, as it uses Hash for storage, which stores unnecessary values for each key. Implementation details: * Core Set uses a modified version of `st_table`, named `set_table`. than `s/st_/set_/`, the main difference is that the stored records do not have values, making them 1/3 smaller. `st_table_entry` stores `hash`, `key`, and `record` (value), while `set_table_entry` only stores `hash` and `key`. This results in large sets using ~33% less memory compared to stdlib Set. For small sets, core Set uses 12% more memory (160 byte object slot and 64 malloc bytes, while stdlib set uses 40 for Set and 160 for Hash). More memory is used because the set_table is embedded and 72 bytes in the object slot are currently wasted. Hopefully we can make this more efficient and have it stored in an 80 byte object slot in the future. * All methods are implemented as cfuncs, except the pretty_print methods, which were moved to `lib/pp.rb` (which is where the pretty_print methods for other core classes are defined). As is typical for core classes, internal calls call C functions and not Ruby methods. For example, to check if something is a Set, `rb_obj_is_kind_of` is used, instead of calling `is_a?(Set)` on the related object. * Almost all methods use the same algorithm that the pure-Ruby implementation used. The exception is when calling `Set#divide` with a block with 2-arity. The pure-Ruby method used tsort to implement this. I developed an algorithm that only allocates a single intermediate hash and does not need tsort. * The `flatten_merge` protected method is no longer necessary, so it is not implemented (it could be). * Similar to Hash/Array, subclasses of Set are no longer reflected in `inspect` output. * RDoc from stdlib Set was moved to core Set, with minor updates. This includes a comprehensive benchmark suite for all public Set methods. As you would expect, the native version is faster in the vast majority of cases, and multiple times faster in many cases. There are a few cases where it is significantly slower: * Set.new with no arguments (~1.6x) * Set#compare_by_identity for small sets (~1.3x) * Set#clone for small sets (~1.5x) * Set#dup for small sets (~1.7x) These are slower as Set does not currently use the AR table optimization that Hash does, so a new set_table is initialized for each call. I'm not sure it's worth the complexity to have an AR table-like optimization for small sets (for hashes it makes sense, as small hashes are used everywhere in Ruby). The rbs and repl_type_completor bundled gems will need updates to support core Set. The pull request marks them as allowed failures. This passes all set tests with no changes. The following specs needed modification: * Modifying frozen set error message (changed for the better) * `Set#divide` when passed a 2-arity block no longer yields the same object as both the first and second argument (this seems like an issue with the previous implementation). * Set-like objects that override `is_a?` such that `is_a?(Set)` return `true` are no longer treated as Set instances. * `Set.allocate.hash` is no longer the same as `nil.hash` * `Set#join` no longer calls `Set#to_a` (it calls the underlying C function). * `Set#flatten_merge` protected method is not implemented. Previously, `set.rb` added a `SortedSet` autoload, which loads `set/sorted_set.rb`. This replaces the `Set` autoload in `prelude.rb` with a `SortedSet` autoload, but I recommend removing it and `set/sorted_set.rb`. This moves `test/set/test_set.rb` to `test/ruby/test_set.rb`, reflecting that switch to a core class. This does not move the spec files, as I'm not sure how they should be handled. Internally, this uses the st_* types and functions as much as possible, and only adds set_* types and functions as needed. The underlying set_table implementation is stored in st.c, but there is no public C-API for it, nor is there one planned, in order to keep the ability to change the internals going forward. For internal uses of st_table with Qtrue values, those can probably be replaced with set_table. To do that, include internal/set_table.h. To handle symbol visibility (rb_ prefix), internal/set_table.h uses the same macro approach that include/ruby/st.h uses. The Set class (rb_cSet) and all methods are defined in set.c. There isn't currently a C-API for the Set class, though C-API functions can be added as needed going forward. Implements [Feature #21216] Co-authored-by: Jean Boussier <jean.boussier@gmail.com> Co-authored-by: Oliver Nutter <mrnoname1000@riseup.net>