path: root/test/ruby/test_string.rb
Age | Commit message | Author
2025-11-07 | Don't modify fstrings in rb_str_tmp_frozen_no_embed_acquire (#15104) | John Hawthorn
[Bug #21671]
2025-07-14 | merge revision(s) fa85d23ff4a02985ebfe0716b0ff768f5b4fe13d: [Backport #21380] | Takashi Kokubun
[Bug #21380] Prohibit modification in String#split block. Reported at https://hackerone.com/reports/3163876
2024-11-26 | Many of Oniguruma functions need valid encoding strings | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/12169
2024-11-26 | Check negative integer underflow | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/12169
2024-10-03 | Rename size_pool -> heap | Matt Valentine-House
Now that we've inlined the eden_heap into the size_pool, we should rename the size_pool to heap, so that Ruby contains multiple heaps with different sized objects.
The term heap, as a collection of memory pages, is more in keeping with memory management nomenclature, whereas size_pool was a name chosen out of necessity during the development of the Variable Width Allocation features of Ruby.
The concept of size pools was introduced in order to facilitate different sized objects (other than the default 40 bytes). They wrapped the eden heap and the tomb heap, and some related state, and provided a reasonably simple way of duplicating all related concerns, to provide multiple pools that all shared the same structure but held different objects.
Since then various changes have happened in Ruby's memory layout:
* The concept of tomb heaps has been replaced by a global free pages list, with each page having its slot size reconfigured at the point when it is resurrected.
* The eden heap has been inlined into the size pool itself, so that now the size pool directly controls the free_pages list, the sweeping page, the compaction cursor and the other state that was previously being managed by the eden heap.
Now that there is no need for a heap wrapper, we should refer to the collection of pages containing Ruby objects as a heap again rather than a size pool.
Notes: Merged: https://github.com/ruby/ruby/pull/11771
2024-09-09 | Implement String#append_as_bytes(String | Integer, ...) | Jean Boussier
[Feature #20594] A handy method to construct a string out of multiple chunks. Contrary to `String#concat`, it doesn't do any encoding negotiation, and simply appends the content as bytes regardless of whether this results in a broken string or not. It's the caller's responsibility to check for `String#valid_encoding?` in cases where it's needed. When passed integers, only the lower byte is considered, like in `String#setbyte`.
Notes: Merged: https://github.com/ruby/ruby/pull/11552
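As a rough illustration of the behaviour described above, the sketch below wraps the call in a hypothetical helper, `append_bytes`, which is not part of Ruby. It calls `String#append_as_bytes` when the running Ruby provides it and otherwise falls back to an approximation built from `force_encoding`. The fallback is an assumption about equivalent behaviour, not the actual implementation.

```ruby
# Hypothetical helper (not part of Ruby): appends chunks byte-wise, using
# String#append_as_bytes where the running Ruby provides it.
def append_bytes(buffer, *chunks)
  return buffer.append_as_bytes(*chunks) if buffer.respond_to?(:append_as_bytes)

  # Fallback sketch for older Rubies: append in binary, then restore the encoding.
  encoding = buffer.encoding
  buffer.force_encoding(Encoding::BINARY)
  chunks.each do |chunk|
    # Integers contribute only their low byte, as with String#setbyte.
    buffer << (chunk.is_a?(Integer) ? [chunk & 0xff].pack("C") : chunk.b)
  end
  buffer.force_encoding(encoding)
end

buf = String.new("caf", encoding: Encoding::UTF_8)
append_bytes(buf, "\xC3".b, 0x1A9) # 0x1A9 is reduced to the byte 0xA9
p buf.bytes
p buf.valid_encoding?
```

Because no encoding negotiation takes place, the two appended bytes together complete a UTF-8 sequence. Appending only the first of them would leave a string for which `valid_encoding?` is false, which is why the commit message leaves that check to the caller.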
2024-07-26Fix memory leak in String#start_with? when regexp times outPeter Zhu
[Bug #20653] This commit refactors how Onigmo handles timeout. Instead of raising a timeout error, onig_search will return a ONIGERR_TIMEOUT which the caller can free memory, and then raise a timeout error. This fixes a memory leak in String#start_with when the regexp times out. For example: regex = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001) str = "a" * 1000000 + "x" 10.times do 100.times do str.start_with?(regex) rescue end puts `ps -o rss= -p #{$$}` end Before: 33216 51936 71152 81728 97152 103248 120384 133392 133520 133616 After: 14912 15376 15824 15824 16128 16128 16144 16144 16160 16160 Notes: Merged: https://github.com/ruby/ruby/pull/11247
2024-05-28 | Stop marking chilled strings as frozen | Étienne Barrié
They were initially made frozen to avoid false positives for cases such as:
    str = str.dup if str.frozen?
But this may cause bugs and is generally confusing for users.
[Feature #20205]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
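The idiom quoted in this message can be sketched as below. The helper name `mutable` is hypothetical. How a chilled literal answers `frozen?` depends on the Ruby version, so the example uses an explicitly frozen string, whose behaviour does not.

```ruby
# Hypothetical helper: returns a string that is safe to mutate,
# copying only when the argument reports itself as frozen.
def mutable(str)
  str.frozen? ? str.dup : str
end

frozen = "config".freeze
copy = mutable(frozen)
copy << ".yml"

p frozen # the frozen original is untouched
p copy
```

An unfrozen argument is returned as-is, so the copy is made only when `frozen?` is true. That is why the answer a chilled string gives to `frozen?` decides whether code written this way ever mutates the literal itself.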
2024-04-17 | test_uplus_minus: Use a different string literal | Jean Boussier
This test fails relatively frequently and it's unclear what is happening.
```
str: {"address":"0x7fbdeb26d4e0", "type":"STRING", "shape_id":1, "slot_size":40, "class":"0x7fbdd1e0ec50", "frozen":true, "embedded":true, "fstring":true, "bytesize":3, "value":"bar", "encoding":"UTF-8", "coderange":"7bit", "memsize":40, "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}}
bar: {"address":"0x7fbdd0a8b138", "type":"STRING", "shape_id":1, "slot_size":40, "class":"0x7fbdd1e0ec50", "frozen":true, "embedded":true, "fstring":true, "bytesize":3, "value":"bar", "encoding":"UTF-8", "coderange":"7bit", "memsize":40, "flags":{"wb_protected":true}}
```
The `"bar".freeze` literal correctly put an old-gen fstring on the stack. But `-%w(b a r).join('')` returns a young-gen fstring, which suggests it somehow failed to find the old one in the `frozen_strings` table.
This could be caused by another test corrupting the table, or corrupting the `"bar"` fstring. By using a different literal value we can learn whether the bug is specific to `"bar"` (used in many tests) or more general.
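The deduplication that `test_uplus_minus` asserts can be sketched as follows. The string value is arbitrary and `interned` is a hypothetical helper name used only for this illustration.

```ruby
# Hypothetical helper: returns the deduplicated, frozen copy of a string.
def interned(str)
  -str
end

from_literal = interned("chunk")
from_join    = interned(%w(ch unk).join)

# Equal content and encoding are expected to yield the very same object,
# which is the property the failing assertion checks.
p from_literal.equal?(from_join)
p from_literal.frozen?
```

The failure described above corresponds to the first check reporting two distinct objects even though both strings hold the same bytes.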
2024-04-15 | Include more debug information in test_uplus_minus | Jean Boussier
2024-04-15 | Add more assertions in `test_uplus_minus` | Jean Boussier
Not long after 1b830740ba8371c4bcfdfc6eb2cb7e0ae81a84e0, CI started to rarely fail this test:
```
TestString#test_uplus_minus: Test::Unit::AssertionFailedError: uminus deduplicates [Feature #13077].
1) Failure:
TestString#test_uplus_minus [/tmp/ruby/src/trunk/test/ruby/test_string.rb:3368]:
```
It's unclear what is going on, but one possibility is that `"bar".freeze` might no longer compile correctly. Another possibility is that another test redefines `String#freeze`, causing `opt_str_freeze` to no longer return an `fstring`.
2024-04-04 | Prevent "ambiguous first argument" warnings | Yusuke Endoh
2024-03-27 | [DOC] remove repetitive words in comments | crazeteam
Signed-off-by: crazeteam <lilujing@outlook.com>
2024-03-25 | [Bug #20389] Chilled string cannot be a shared root | Nobuyoshi Nakada
2024-03-20 | Avoid deprecation warnings in TestString | Jean Boussier
2024-03-19 | Implement chilled strings | Étienne Barrié
[Feature #20205] As a path toward enabling frozen string literals by default in the future, this commit introduces "chilled strings". From a user perspective chilled strings pretend to be frozen, but on the first attempt to mutate them, they lose their frozen status and emit a warning rather than raising a `FrozenError`.
Implementation wise, `rb_compile_option_struct.frozen_string_literal` is no longer a boolean but a tri-state of `enabled/disabled/unset`. When code is compiled with frozen string literals neither explicitly enabled nor disabled, string literals are compiled with a new `putchilledstring` instruction. This instruction is identical to `putstring` except it marks the String with the `STR_CHILLED (FL_USER3)` and `FL_FREEZE` flags.
Chilled strings have the `FL_FREEZE` flag so as to minimize the need to check for chilled strings across the codebase, and to improve compatibility with C extensions.
Notes:
- `String#freeze`: clears the chilled flag.
- `String#-@`: acts as if the string was mutable.
- `String#+@`: acts as if the string was mutable.
- `String#clone`: copies the chilled flag.
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
2024-03-12 | [Bug #20277] Remove stale `String` test conditionals | Nobuyoshi Nakada
These instance variables for conditional execution have remained unchanged for nearly twenty years, since YARV merger.
2024-02-27 | Add a failing test for https://bugs.ruby-lang.org/issues/20305. | Fable Phippen
This bug demonstrates the issue I reported. It passes on commit https://github.com/ruby/ruby/commit/114e71d06280f9c57b9859ee4405ae89a989ddb6 but does not pass on commit (the immediate child of the above commit) https://github.com/ruby/ruby/commit/1d2d25dcadda0764f303183ac091d0c87b432566
2024-02-22 | Skip under_gc_compact_stress on s390x (#10073) | Takashi Kokubun
2024-02-22 | [Bug #20292] Truncate embedded string to new capacity | Nobuyoshi Nakada
2024-01-16 | Fix coderange of invalid_encoding_string.<<(ord) | tompng
Appending a valid-encoding character can change the coderange from invalid to valid. Example: "\x95".force_encoding('sjis')<<0x5C will be a valid string "\x{955C}"
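The example from this message, expanded into a small sketch. The helper name `complete_sjis_pair` is hypothetical. On a Ruby that includes this fix the final check is expected to report a valid string; on earlier versions it may still report the stale, broken coderange, so no output is asserted for it here.

```ruby
# Hypothetical helper: builds the lone Shift_JIS lead byte from the commit
# message, records its validity, then appends the trail byte as an Integer.
def complete_sjis_pair
  str = "\x95".b.force_encoding(Encoding::Shift_JIS)
  valid_before = str.valid_encoding?
  str << 0x5C
  [valid_before, str]
end

valid_before, str = complete_sjis_pair
p valid_before        # a lone lead byte is not a complete character
p str.bytes           # the lead byte followed by the appended trail byte
p str.valid_encoding? # depends on whether the coderange was refreshed
```

Calling `valid_encoding?` before the append matters for the demonstration: it makes Ruby cache the "broken" coderange, which is the state the append then has to update.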
2024-01-08 | Fix memory leak in grapheme clusters | Peter Zhu
[Bug #20150] String#grapheme_clusters and String#each_grapheme_cluster leak memory because if the string is not UTF-8, then the created regex will not be freed. For example:
    str = "hello world".encode(Encoding::UTF_32LE)
    10.times do
      1_000.times do
        str.grapheme_clusters
      end
      puts `ps -o rss= -p #{$$}`
    end
Before:
    26000 42256 59008 75792 92528 109232 125936 142672 159392 176160
After:
    9264 9504 9808 10000 10128 10224 10352 10544 10704 10896
2023-12-23 | Fix String#sub for GC compaction | Peter Zhu
The test fails when RGENGC_CHECK_MODE is turned on:
    TestString#test_sub_gc_compact_stress = 9.42 s
    1) Failure:
    TestString#test_sub_gc_compact_stress [test/ruby/test_string.rb:2089]:
    <"aaa [amp] yyy"> expected but was
    <"aaa [] yyy">.
2023-12-17 | Stir the hash value more with encoding index | Nobuyoshi Nakada
2023-12-16 | [Bug #20068] Encoding does not matter to empty strings | Nobuyoshi Nakada
2023-12-13 | Make String#chomp! raise ArgumentError for 2+ arguments if string is empty | Jeremy Evans
String#chomp! returned nil without checking the number of passed arguments in this case.
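A small probe for the argument check, using a hypothetical helper name (`chomp_arity_error?`). For a non-empty receiver, two arguments raise `ArgumentError`. For an empty receiver the outcome depends on whether the running Ruby includes this commit, so the sketch prints it rather than assuming it.

```ruby
# Hypothetical helper: reports whether chomp! rejects two arguments
# for the given receiver.
def chomp_arity_error?(str)
  str.chomp!("a", "b")
  false
rescue ArgumentError
  true
end

p chomp_arity_error?("abc".dup) # non-empty receiver: arguments are checked
p chomp_arity_error?("".dup)    # empty receiver: result is version-dependent
```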
2023-12-01 | Make String#undump compaction safe | Peter Zhu
2023-11-29 | Guard match from GC in String#gsub | Peter Zhu
We need to guard match from GC because otherwise it could end up being reclaimed or moved in compaction.
2023-11-27 | Guard match from GC when scanning string | Peter Zhu
We need to guard match from GC because otherwise it could end up being reclaimed or moved in compaction.
2023-09-01 | Add regression tests for start_with?/delete_prefix | ywenc
Notes: Merged: https://github.com/ruby/ruby/pull/8348
2023-08-26 | [Bug #19784] Fix behaviors against prefix with broken encoding | Nobuyoshi Nakada
- String#start_with?
- String#delete_prefix
- String#delete_prefix!
Notes: Merged: https://github.com/ruby/ruby/pull/8296
2023-08-26 | Split string tests | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/8296
2023-07-15 | [Bug #19769] Fix range of size 1 in `String#tr` | alexandre184
Notes: Merged: https://github.com/ruby/ruby/pull/8080 Merged-By: nobu <nobu@ruby-lang.org>
2023-06-28 | [Bug #19748] Fix out-of-bound access in `String#byteindex` | Nobuyoshi Nakada
2023-06-28 | [Bug #19746] `String#index` with regexp should clear `$~` unless matched | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/7988
2023-06-28 | Assert `$~` after `String#index` family | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/7988
2023-06-28 | Use the same capacities for memory leak tests | Nobuyoshi Nakada
2023-01-20 | [Feature #19314] Add new arguments of String#bytesplice | Shugo Maeda
    bytesplice(index, length, str, str_index, str_length) -> string
    bytesplice(range, str, str_range) -> string
In these forms, the content of +self+ is replaced by str.byteslice(str_index, str_length) or str.byteslice(str_range); however the substring of +str+ is not allocated as a new string.
Notes: Merged: https://github.com/ruby/ruby/pull/7160
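The effect of the five-argument form can be approximated with older building blocks. The sketch below is an emulation under a hypothetical name (`splice_bytes_sketch`), not the real method: unlike `String#bytesplice` it does allocate the intermediate substring and it performs no character-boundary checks. It assumes `index + length` does not exceed the receiver's byte size.

```ruby
# Hypothetical emulation of
#   dst.bytesplice(index, length, src, src_index, src_length)
# built from String#byteslice and String#replace.
def splice_bytes_sketch(dst, index, length, src, src_index, src_length)
  piece = src.byteslice(src_index, src_length)   # allocated here, unlike the real method
  head  = dst.byteslice(0, index)
  tail  = dst.byteslice((index + length)..-1)
  dst.replace(head + piece + tail)               # mutates and returns the receiver
end

greeting = "hello world".dup
splice_bytes_sketch(greeting, 0, 5, "HELLO there", 0, 5)
p greeting
```

Returning the receiver mirrors the follow-up decision in the next entry that `String#bytesplice` should return self.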
2023-01-19 | String#bytesplice should return self | Shugo Maeda
In Feature #19314, we concluded that the return value of String#bytesplice should be changed from the source string to the receiver, because the source string is useless and confusing when extra arguments are added. This change should be included in Ruby 3.2.1.
2023-01-13 | Remove MIN_PRE_ALLOC_SIZE from Strings. | Matt Valentine-House
This optimisation is no longer helpful now that we use VWA to allocate strings in larger size pools where they can be embedded. Notes: Merged: https://github.com/ruby/ruby/pull/6965
2022-12-01 | Prevent segfault in String#scan with ObjectSpace.each_object | Yusuke Endoh
Calling `String#scan` without a block creates an incomplete MatchData object whose `RMATCH(match)->str` is Qfalse. Usually this object is not leaked, but it was possible to pull it by using ObjectSpace.each_object. This change hides the internal MatchData object by using rb_obj_hide. Fixes [Bug #19159] Notes: Merged: https://github.com/ruby/ruby/pull/6836
2022-11-24 | Make String#rstrip{,!} raise Encoding::CompatibilityError for broken coderange | Jeremy Evans
It's questionable whether we want to allow rstrip to work for strings where the broken coderange occurs before the trailing whitespace and not after, but this approach is probably simpler, and I don't think users should expect string operations like rstrip to work on broken strings. In some cases, this changes rstrip to raise Encoding::CompatibilityError instead of ArgumentError. However, as the problem is related to an encoding issue in the receiver, and not due to an issue with an argument, I think Encoding::CompatibilityError is the more appropriate error. Fixes [Bug #18931] Notes: Merged: https://github.com/ruby/ruby/pull/6282
2022-10-19 | Transition frozen string to frozen root shape | Jemma Issroff
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/6590
2022-08-11 | Fix inspect for unicode codepoint 0x85 | Jeremy Evans
This is an inelegant hack, by manually checking for this specific code point in rb_str_inspect. Some testing indicates that this is the only code point affected. It's possible a better fix would be inside of lower-level encoding code, such that rb_enc_isprint would return false and not true for codepoint 0x85. Fixes [Bug #16842] Notes: Merged: https://github.com/ruby/ruby/pull/4229
2022-07-21 | Make String#each_line work correctly with paragraph separator and chomp | Jeremy Evans
Previously, it was including one newline when chomp was used, which is inconsistent with IO#each_line behavior. This makes behavior consistent with IO#each_line, chomping all paragraph separators (multiple consecutive newlines), but not single newlines. Partially Fixes [Bug #18768] Notes: Merged: https://github.com/ruby/ruby/pull/5960
2022-03-18 | Add String#bytesplice | Shugo Maeda
Notes: Merged: https://github.com/ruby/ruby/pull/5584
2022-03-13 | add some tests for Unicode Version 14.0.0 | Martin Dürst
2022-02-19 | Add String#byteindex, String#byterindex, and MatchData#byteoffset (#5518) | Shugo Maeda
* Add String#byteindex, String#byterindex, and MatchData#byteoffset [Feature #13110] Co-authored-by: NARUSE, Yui <naruse@airemix.jp> Notes: Merged-By: shugo <shugo@ruby-lang.org>
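To show how a byte offset differs from the character offset that `String#index` returns, here is a minimal sketch. The helper name `byte_position` is hypothetical. It uses `String#byteindex` when available and otherwise searches a binary copy, which is an approximation that is reasonable for UTF-8 but is not claimed to match `byteindex` for every encoding.

```ruby
# Hypothetical helper: byte offset of needle within haystack.
def byte_position(haystack, needle)
  if haystack.respond_to?(:byteindex)
    haystack.byteindex(needle)
  else
    # Approximation for older Rubies: search the raw bytes.
    haystack.b.index(needle.b)
  end
end

text = "h\u00E9llo"          # the second character occupies two bytes in UTF-8
p text.index("l")            # character offset
p byte_position(text, "l")   # byte offset
```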
2022-01-08 | Do not run the same tests twice | Nobuyoshi Nakada
2022-01-08 | Run an old fixed bug in the same process | Nobuyoshi Nakada