summaryrefslogtreecommitdiff
path: root/string.c
AgeCommit message (Collapse)Author
2025-11-07Don't modify fstrings in rb_str_tmp_frozen_no_embed_acquire (#15104)John Hawthorn
[Bug #21671]
2025-10-08merge revision(s) 2bb6fe3854e2a4854bb89bfce4eaaea9d848fd1b: [Backport #21629]Takashi Kokubun
[PATCH] [Bug #21629] Initialize `struct RString` which appears to be missed in the previous commit for some reason.
2025-07-14merge revision(s) b42afa1dbcbb91e89852b3b3bc72484d7f0a5528, ↵Takashi Kokubun
f1f0cc14cc7d3f9be35b203e5583f9224b1e2387, 543e3a1896ae2fe3b5b954f6497d261ab5663a15, ed2806117a0b76e4439ce1a061fae21d9e116d69, 46e4c8673747de96838d2c5dec37446d23d99d88: [Backport #21500] Suppress gcc 15 unterminated-string-initialization warnings Separate `__has_attribute` from `defined(__has_attribute)` Fix Visual C warnings: ``` regenc.h(121): warning C4067: unexpected tokens following preprocessor directive - expected a newline ``` Cast up `int` instruction code to `VALUE` Fix Visual C warnings: ``` iseq.c(3793): warning C4312: 'type cast': conversion from 'int' to 'void *' of greater size iseq.c(3794): warning C4312: 'type cast': conversion from 'int' to 'void *' of greater size ``` Do not let files depend on a phony target Detect `clock_gettime` and `clock_getres` for winpthreads
2025-07-14merge revision(s) fa85d23ff4a02985ebfe0716b0ff768f5b4fe13d: [Backport #21380]Takashi Kokubun
[Bug #21380] Prohibit modification in String#split block Reported at https://hackerone.com/reports/3163876
2025-03-06Fix a race condition with interned strings sweeping.Jean Boussier
[Bug #21172] This fixes a rare CI failure. The timeline of the race condition is: - A `"foo" oid=1` string is interned. - `"foo" oid=1` is no longer referenced and will be swept in the future. - Another `"foo" oid=2` string is interned. - `register_fstring` finds `"foo" oid=1`, but since it is about to be swept, removes it from `fstring_table` and insert `"foo" oid=2` instead. - `"foo" oid=1` is swept, since it has the `RSTRING_FSTR` flag, a `st_delete` is issued in `fstring_table` which removes `"foo" oid=2`. I don't know how to reproduce this bug consistently in a single test case.
2024-12-13[DOC] [Feature #20205] Document the new power of String#+@Alan Wu
Notes: Merged: https://github.com/ruby/ruby/pull/12341
2024-11-27Optimize `rb_must_asciicompat`Jean Boussier
While profiling `strscan`, I noticed `rb_must_asciicompat` was quite slow, as more than 5% of the benchmark was spent in it: https://share.firefox.dev/49bOcTn By checking for the common 3 ASCII compatible encoding index first, we can skip a lot of expensive operations in the happy path. Notes: Merged: https://github.com/ruby/ruby/pull/12180
2024-11-26Many of Oniguruma functions need valid encoding stringsNobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/12169
2024-11-26Check negative integer underflowNobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/12169
2024-11-25Place all non-default GC API behind USE_SHARED_GCMatt Valentine-House
So that it doesn't get included in the generated binaries for builds that don't support loading shared GC modules Co-Authored-By: Peter Zhu <peter@peterzhu.ca> Notes: Merged: https://github.com/ruby/ruby/pull/12149
2024-11-20[DOC] Fix typo in comment for STR_PRECOMPUTED_HASHPeter Zhu
2024-11-19[DOC] Fix the default `limit` of String#splitKouhei Yanagita
We can't pass `nil` as the second parameter of `String#split`. Therefore, descriptions like "if limit is nil, ..." are not appropriate. Notes: Merged: https://github.com/ruby/ruby/pull/12109
2024-11-13YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments (#12069)Randy Stauner
* YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments String#[] is in the top few C calls of several YJIT benchmarks: liquid-compile rubocop mail sudoku This speeds up these benchmarks by 1-2%. * YJIT: Try harder to get type info for `String#[]` In the large generated code of the mail gem the context doesn't have the type info. In that case if we peek at the stack and add a guard we can still apply the specialization and it speeds up the mail benchmark by 5%. Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com> --------- Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com> Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-11-13Mark strings returned by Symbol#to_s as chilled (#12065)Jean byroot Boussier
* Use FL_USER0 for ELTS_SHARED This makes space in RString for two bits for chilled strings. * Mark strings returned by `Symbol#to_s` as chilled [Feature #20350] `STR_CHILLED` now spans on two user flags. If one bit is set it marks a chilled string literal, if it's the other it marks a `Symbol#to_s` chilled string. Since it's not possible, and doesn't make much sense to include debug info when `--debug-frozen-string-literal` is set, we can't include allocation source, but we can safely include the symbol name in the warning message, making it much easier to find the source of the issue. Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com> --------- Co-authored-by: Étienne Barrié <etienne.barrie@gmail.com> Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
2024-11-13string.c: preserve coderange when interning a stringJean Boussier
Since `str_do_hash` will most likely scan the string to compute the coderange, we might as well copy it over in the interned string in case it's useful later. Notes: Merged: https://github.com/ruby/ruby/pull/12077
2024-11-13string.c: Directly create strings with the correct encodingJean Boussier
While profiling msgpack-ruby I noticed a very substantial amout of time spent in `rb_enc_associate_index`, called by `rb_utf8_str_new`. On that benchmark, `rb_utf8_str_new` is 33% of the total runtime, in big part because it cause GC to trigger often, but even then `5.3%` of the total runtime is spent in `rb_enc_associate_index` called by `rb_utf8_str_new`. After closer inspection, it appears that it's performing a lot of safety check we can assert we don't need, and other extra useless operations, because strings are first created and filled as ASCII-8BIT and then later reassociated to the desired encoding. By directly allocating the string with the right encoding, it allow to skip a lot of duplicated and useless operations. After this change, the time spent in `rb_utf8_str_new` is down to `28.4%` of total runtime, and most of that is GC. Notes: Merged: https://github.com/ruby/ruby/pull/12076
2024-11-13Move `Symbol#name` into `symbol.rb`Jean Boussier
This allows to declare it as leaf just like `Symbol#to_s`. Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
2024-11-06Store precomputed hash when there's capacityÉtienne Barrié
Co-authored-by: Jean Boussier <byroot@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/11990
2024-11-04Precompute hash only once when interning string literalsÉtienne Barrié
When a fake string is interned, use the capa field to store the string hash. This lets us compute it once for hash lookup and embedding the hash in the interned string. Co-authored-by: Jean Boussier <byroot@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/11989
2024-10-21Fix an off-by-one error of own memrchr implementationYusuke Endoh
and make it support `search_len == 0`, just for the case Ref [Bug #20796] Notes: Merged: https://github.com/ruby/ruby/pull/11923
2024-10-21Show where mutated chilled strings were allocatedÉtienne Barrié
[Feature #20205] The warning now suggests running with --debug-frozen-string-literal: ``` test.rb:3: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information) ``` When using --debug-frozen-string-literal, the location where the string was created is shown: ``` test.rb:3: warning: literal string will be frozen in the future test.rb:1: info: the string was created here ``` When resurrecting strings and debug mode is not enabled, the overhead is a simple FL_TEST_RAW. When mutating chilled strings and deprecation warnings are not enabled, the overhead is a simple warning category enabled check. Co-authored-by: Jean Boussier <byroot@ruby-lang.org> Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org> Co-authored-by: Jean Boussier <byroot@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/11893
2024-10-07[DOC] String#sub! and String#gsub! return nil if no replacement occuredHolger Just
Notes: Merged: https://github.com/ruby/ruby/pull/11700 Merged-By: nobu <nobu@ruby-lang.org>
2024-09-24Use rb_bug instead of UNREACHABLE for assertionsPeter Zhu
UNREACHABLE uses __builtin_unreachable which is not intended to be used as an assertion. Notes: Merged: https://github.com/ruby/ruby/pull/11678
2024-09-24Fix undefined behavior in String#append_as_bytesPeter Zhu
The UNREACHABLE macro calls __builtin_unreachable, which according to the [GCC docs](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005funreachable): > If control flow reaches the point of the __builtin_unreachable, the > program is undefined. But it can reach this point with the following script: "123".append_as_bytes("123") This can crash on some platforms with a `Trace/BPT trap: 5`. Notes: Merged: https://github.com/ruby/ruby/pull/11678
2024-09-18Update exception message in string_for_symbolJeremy Evans
This is a static function only called in two places (rb_to_id and rb_to_symbol), and in both places, both symbols and strings are allowed. This makes the error message consistent with rb_check_id and rb_check_symbol. Fixes [Bug #20607] Notes: Merged: https://github.com/ruby/ruby/pull/11097
2024-09-09Implement String#append_as_bytes(String | Integer, ...)Jean Boussier
[Feature #20594] A handy method to construct a string out of multiple chunks. Contrary to `String#concat`, it doesn't do any encoding negociation, and simply append the content as bytes regardless of whether this result in a broken string or not. It's the caller responsibility to check for `String#valid_encoding?` in cases where it's needed. When passed integers, only the lower byte is considered, like in `String#setbyte`. Notes: Merged: https://github.com/ruby/ruby/pull/11552
2024-09-04Fix documentation for String#index and String#byterindexJean Boussier
Notes: Merged: https://github.com/ruby/ruby/pull/11540
2024-09-04Adjust indents [ci skip]Nobuyoshi Nakada
2024-09-03rb_enc_str_asciionly_p: avoid always fetching the encodingJean Boussier
Profiling of `JSON.dump` shows a significant amount of time spent in `rb_enc_str_asciionly_p`, in large part because it fetches the encoding. It can be made twice as fast in this scenario by first checking the coderange and only falling back to fetching the encoding if the coderange is unknown. Additionally we can skip fetching the encoding for the common popular encodings. Notes: Merged: https://github.com/ruby/ruby/pull/11533
2024-09-03Improve String#rindex performance on OSXZack Deveau
On OSX, String#rindex is slow due to the lack of `memrchr`. The fallback implementation finds a match by instead doing a `memcmp` on every single character in the search string looking for a substring match. For OSX hosts, this changeset introduces a simple `memrchr` implementation, `rb_memrchr`, that can be used instead. An example benchmark below demonstrates an 8000 char long search string with a 10 char substring near the end. ``` ruby-master | substring near the end | osx UTF-8 user system total real index 0.000111 0.000000 0.000111 ( 0.000110) rindex 0.000446 0.000005 0.000451 ( 0.000454) ``` ``` ruby-patched | substring near the end | osx UTF-8 user system total real index 0.000112 0.000000 0.000112 ( 0.000111) rindex 0.000057 0.000001 0.000058 ( 0.000057) ``` Notes: Merged: https://github.com/ruby/ruby/pull/11519
2024-08-09rb_str_bytesplice: skip encoding check if encodings are the sameJean Boussier
If both strings have the same encoding, all this work is useless. Notes: Merged: https://github.com/ruby/ruby/pull/11353
2024-08-09string.c: add fastpath in str_ensure_byte_posJean Boussier
If the string only contain single byte characters we can skips all the costly checks. Notes: Merged: https://github.com/ruby/ruby/pull/11353
2024-08-09string.c: Add fastpath to single_byte_optimizableJean Boussier
`rb_enc_from_index` is a costly operation so it is worth avoiding to call it for the common encodings. Also in the case of UTF-8, it's more efficient to scan the coderange if it is unknown that to fallback to the slower algorithms. Notes: Merged: https://github.com/ruby/ruby/pull/11353
2024-08-09string.c: str_capacity don't check for immediatesJean Boussier
`STR_EMBED_P` uses `FL_TEST_RAW` meaning we already assume `str` isn't an immediate, so we can use `FL_TEST_RAW` here too. Notes: Merged: https://github.com/ruby/ruby/pull/11350
2024-08-09str_independent: add a fastpath with a single flag checkJean Boussier
If we assume that most strings we modify are not frozen and are independent, then we can optimize this case by replacing multiple flag checks by a single mask check. Notes: Merged: https://github.com/ruby/ruby/pull/11350
2024-08-02YJIT: Enhance the `String#<<` method substitution to handle integer ↵Kevin Menard
codepoint values. (#11032) * Document why we need to explicitly spill registers. * Simplify passing a byte value to `str_buf_cat`. * YJIT: Enhance the `String#<<` method substitution to handle integer codepoint values. * YJIT: Move runtime type check into YJIT. Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum. Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2024-06-19String.new(capacity:) don't substract termlenJean Boussier
[Bug #20585] This was changed in 36a06efdd9f0604093dccbaf96d4e2cb17874dc8 because `String.new(1024)` would end up allocating `1025` bytes, but the problem with this change is that the caller may be trying to right size a String. So instead, we should just better document the behavior of `capacity:`.
2024-06-17Add a fast path implementation for appending single byte values to US-ASCII ↵Kevin Menard
strings.
2024-06-17Add a fast path implementation for appending single byte values to binary ↵Kevin Menard
strings. Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2024-06-13Simplify unaligned write for pre-computed string hashAlan Wu
2024-06-13rb_str_hash(): Avoid UB with making misaligned pointerAlan Wu
Previously, on common platforms, this code made a pointer to a union of 8 byte alignment out of a char pointer that is not guaranteed to satisfy the alignment requirement. That is undefined behavior according to [C99 6.3.2.3p7](https://port70.net/~nsz/c/c99/n1256.html#6.3.2.3p7). Use memcpy() to do the unaligned read instead.
2024-06-13Simplify rb_str_resize clear range conditiontompng
2024-06-13Clear coderange when rb_str_resize change sizetompng
In some encoding like utf-16 utf-32, expanding the string with null bytes can change coderange to either broken or valid.
2024-06-09[Bug #20566] Mention out-of-range argument cases in `String#<<`Nobuyoshi Nakada
Also [Bug #18973].
2024-06-02Stop exposing `rb_str_chilled_p`Jean Boussier
[Feature #20205] Now that chilled strings no longer appear as frozen, there is no need to offer an API to check for chilled strings. We however need to change `rb_check_frozen_internal` to no longer be a macro, as it needs to check for chilled strings.
2024-05-28[Bug #20512] Set coderange in `Range#each` of stringsNobuyoshi Nakada
2024-05-28Set empty strings to ASCII-onlyNobuyoshi Nakada
2024-05-28Precompute embedded string literals hash codeJean Boussier
With embedded strings we often have some space left in the slot, which we can use to store the string Hash code. It's probably only worth it for string literals, as they are the ones likely to be used as hash keys. We chose to store the Hash code right after the string terminator as to make it easy/fast to compute, and not require one more union in RString. ``` compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23] built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23] last_commit=Precompute embedded string literals hash code | |compare-ruby|built-ruby| |:-----------|-----------:|---------:| |symbol | 39.275M| 39.753M| | | -| 1.01x| |dyn_symbol | 37.348M| 37.704M| | | -| 1.01x| |small_lit | 29.514M| 33.948M| | | -| 1.15x| |frozen_lit | 27.180M| 33.056M| | | -| 1.22x| |iseq_lit | 27.391M| 32.242M| | | -| 1.18x| ``` Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
2024-05-28Stop marking chilled strings as frozenÉtienne Barrié
They were initially made frozen to avoid false positives for cases such as: str = str.dup if str.frozen? But this may cause bugs and is generally confusing for users. [Feature #20205] Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
2024-04-18Add a hint of `ASCII-8BIT` being `BINARY`Jean Boussier
[Feature #18576] Since outright renaming `ASCII-8BIT` is deemed to backward incompatible, the next best thing would be to only change its `#inspect`, particularly in exception messages.