summaryrefslogtreecommitdiff
path: root/string.c
AgeCommit message (Collapse)Author
2024-05-28Stop marking chilled strings as frozenÉtienne Barrié
They were initially made frozen to avoid false positives for cases such as: str = str.dup if str.frozen? But this may cause bugs and is generally confusing for users. [Feature #20205] Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
2024-04-18Add a hint of `ASCII-8BIT` being `BINARY`Jean Boussier
[Feature #18576] Since outright renaming `ASCII-8BIT` is deemed to backward incompatible, the next best thing would be to only change its `#inspect`, particularly in exception messages.
2024-04-16Eliminate usage of OBJ_FREEZE_RAWJean Boussier
Previously it would bypass the `FL_ABLE` check, but since shapes introduction, it started having a different behavior than `OBJ_FREEZE`, as it would onyl set the `FL_FREEZE` flag, but not update the shape. I have no indication of this causing a bug yet, but it seems like a trap waiting to happen.
2024-04-08Document STR_CHILLED flag on RStringÉtienne Barrié
[Feature #20205]
2024-04-08Add builtin type assertionNobuyoshi Nakada
2024-04-05Assert that Symbol#inspect returns a T_STRINGPeter Zhu
2024-03-31Add missing RB_GC_GUARDs related to DATA_PTRKJ Tsanaktsidis
I discovered the problem in `compile.c` from a failing TestIseqLoad#test_stressful_roundtrip test with ASAN enabled. The other two changes in array.c and string.c I found by auditing similar usages of DATA_PTR in the codebase. [Bug #20402]
2024-03-26Expose rb_str_chilled_pÉtienne Barrié
Some extensions (like stringio) may need to differentiate between chilled strings and frozen strings. They can now use rb_str_chilled_p but must check for its presence since the function will be removed when chilled strings are removed. [Bug #20389] [Feature #20205] Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
2024-03-25[Bug #20389] Chilled string cannot be a shared rootNobuyoshi Nakada
2024-03-19Implement chilled stringsÉtienne Barrié
[Feature #20205] As a path toward enabling frozen string literals by default in the future, this commit introduce "chilled strings". From a user perspective chilled strings pretend to be frozen, but on the first attempt to mutate them, they lose their frozen status and emit a warning rather than to raise a `FrozenError`. Implementation wise, `rb_compile_option_struct.frozen_string_literal` is no longer a boolean but a tri-state of `enabled/disabled/unset`. When code is compiled with frozen string literals neither explictly enabled or disabled, string literals are compiled with a new `putchilledstring` instruction. This instruction is identical to `putstring` except it marks the String with the `STR_CHILLED (FL_USER3)` and `FL_FREEZE` flags. Chilled strings have the `FL_FREEZE` flag as to minimize the need to check for chilled strings across the codebase, and to improve compatibility with C extensions. Notes: - `String#freeze`: clears the chilled flag. - `String#-@`: acts as if the string was mutable. - `String#+@`: acts as if the string was mutable. - `String#clone`: copies the chilled flag. Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
2024-03-03[Bug #20322] Fix rb_enc_interned_str_cstr null encodingThomas Marshall
The documentation for `rb_enc_interned_str_cstr` notes that `enc` can be a null pointer, but this currently causes a segmentation fault when trying to autoload the encoding. This commit fixes the issue by checking for NULL before calling `rb_enc_autoload`.
2024-02-23Stop using rb_str_locktmp_ensure publiclyPeter Zhu
rb_str_locktmp_ensure is a private API.
2024-02-23YJIT: Lazily push a frame for specialized C funcs (#10080)Takashi Kokubun
* YJIT: Lazily push a frame for specialized C funcs Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> * Fix a comment on pc_to_cfunc * Rename rb_yjit_check_pc to rb_yjit_lazy_push_frame * Rename it to jit_prepare_lazy_frame_call * Fix a typo * Optimize String#getbyte as well * Optimize String#byteslice as well --------- Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
2024-02-23Stop using rb_fstring publiclyPeter Zhu
rb_fstring is a private API, so we should use rb_str_to_interned_str instead, which is a public API.
2024-02-23Remove unneeded RUBY_FUNC_EXPORTEDPeter Zhu
2024-02-22Fix -Wsign-compare on String#initializeTakashi Kokubun
../string.c:1886:57: warning: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long unsigned int’} and ‘long int’ [-Wsign-compare] 1886 | if (STR_EMBED_P(str)) RUBY_ASSERT(osize <= str_embed_capa(str)); | ^~
2024-02-22[Bug #20292] Truncate embedded string to new capacityNobuyoshi Nakada
2024-02-19[Bug #20280] Check by `rb_parser_enc_str_coderange`Nobuyoshi Nakada
Co-authored-by: Yuichiro Kaneko <spiketeika@gmail.com>
2024-02-19[Bug #20280] Raise SyntaxError on invalid encoding symbolNobuyoshi Nakada
2024-02-15Unset STR_SHARED when setting string to embedPeter Zhu
2024-02-15Do not include a backtick in error messages and backtracesYusuke Endoh
[Feature #16495]
2024-02-14[DOC] Doc compliance (#9955)Burdette Lamar
2024-02-13Fix use-after-move in Symbol#inspectAlan Wu
The allocation could re-embed `orig_str` and invalidate the data pointer from RSTRING_GETMEM() if the string is embedded. Found on CI, where the test introduced in 7002e776944 ("Fix Symbol#inspect for GC compaction") recently failed. See: <https://github.com/ruby/ruby/actions/runs/7880657560/job/21503019659>
2024-02-13Specialize String#byteslice(a, b) (#9939)Aaron Patterson
* Specialize String#byteslice(a, b) This adds a specialization for String#byteslice when there are two parameters. This makes our protobuf parser go from 5.84x slower to 5.33x slower ``` Comparison: decode upstream (53738 bytes): 7228.5 i/s decode protobuff (53738 bytes): 1236.8 i/s - 5.84x slower Comparison: decode upstream (53738 bytes): 7024.8 i/s decode protobuff (53738 bytes): 1318.5 i/s - 5.33x slower ``` * Update yjit/src/codegen.rs --------- Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2024-02-12Replace assert with RUBY_ASSERT in string.cPeter Zhu
assert does not print the bug report, only the file and line number of the assertion that failed. RUBY_ASSERT prints the full bug report, which makes it much easier to debug.
2024-02-08[DOC] Improve flags of stringPeter Zhu
2024-02-05Make io_fwrite safe for compactionPeter Zhu
[Bug #20169] Embedded strings are not safe for system calls without the GVL because compaction can cause pages to be locked causing the operation to fail with EFAULT. This commit changes io_fwrite to use rb_str_tmp_frozen_no_embed_acquire, which guarantees that the return string is not embedded.
2024-01-31Annotate Symbol#to_s as leaf (#9769)Takashi Kokubun
2024-01-17Fix memory leak in String#tr and String#tr_sPeter Zhu
rb_enc_codepoint_len could raise, which would cause the memory in buf to leak. For example: str1 = "\xE0\xA0\xA1#{" " * 100}".force_encoding("EUC-JP") str2 = "" str3 = "a".force_encoding("Windows-31J") 10.times do 1_000_000.times do str1.tr_s(str2, str3) rescue end puts `ps -o rss= -p #{$$}` end Before: 17536 22752 28032 33312 38688 43968 49200 54432 59744 64992 After: 12176 12352 12352 12448 12448 12448 12448 12448 12448 12448
2024-01-16Fix coderange of invalid_encoding_string.<<(ord)tompng
Appending valid encoding character can change coderange from invalid to valid. Example: "\x95".force_encoding('sjis')<<0x5C will be a valid string "\x{955C}"
2024-01-08Fix memory leak in grapheme clustersPeter Zhu
[Bug #20150] String#grapheme_cluters and String#each_grapheme_cluster leaks memory because if the string is not UTF-8, then the created regex will not be freed. For example: str = "hello world".encode(Encoding::UTF_32LE) 10.times do 1_000.times do str.grapheme_clusters end puts `ps -o rss= -p #{$$}` end Before: 26000 42256 59008 75792 92528 109232 125936 142672 159392 176160 After: 9264 9504 9808 10000 10128 10224 10352 10544 10704 10896
2024-01-02[DOC] Add parentheses in call-seq for String#include?Peter Zhu
2023-12-24Fix Symbol#inspect for GC compactionPeter Zhu
The test fails when RGENGC_CHECK_MODE is turned on: 1) Failure: TestSymbol#test_inspect_under_gc_compact_stress [test/ruby/test_symbol.rb:123]: <":testing"> expected but was <":\x00\x00\x00\x00\x00\x00\x00">.
2023-12-23Fix String#sub for GC compactionPeter Zhu
The test fails when RGENGC_CHECK_MODE is turned on: TestString#test_sub_gc_compact_stress = 9.42 s 1) Failure: TestString#test_sub_gc_compact_stress [test/ruby/test_string.rb:2089]: <"aaa [amp] yyy"> expected but was <"aaa [] yyy">.
2023-12-17Stir the hash value more with encoding indexNobuyoshi Nakada
2023-12-16[Bug #20068] Encoding does not matter to empty stringsNobuyoshi Nakada
2023-12-13Make String#chomp! raise ArgumentError for 2+ arguments if string is emptyJeremy Evans
String#chomp! returned nil without checking the number of passed arguments in this case.
2023-12-01Make String#undump compaction safePeter Zhu
2023-12-01Pin embedded shared stringsPeter Zhu
Embedded shared strings cannot be moved because strings point into the slot of the shared string. There may be code using the RSTRING_PTR on the stack, which would pin the string but not pin the shared string, causing it to move.
2023-11-29Guard match from GC in String#gsubPeter Zhu
We need to guard match from GC because otherwise it could end up being reclaimed or moved in compaction.
2023-11-27Guard match from GC when scanning stringPeter Zhu
We need to guard match from GC because otherwise it could end up being reclaimed or moved in compaction.
2023-11-20Specialize String#dupJean Boussier
`String#+@` is 2-3 times faster than `String#dup` because it can directly go through `rb_str_dup` instead of using the generic much slower `rb_obj_dup`. This fact led to the existance of the ugly `Performance/UnfreezeString` rubocop performance rule that encourage users to rewrite the much more readable and convenient `"foo".dup` into the ugly `(+"foo")`. Let's make that rubocop rule useless. ``` compare-ruby: ruby 3.3.0dev (2023-11-20T02:02:55Z master 701b0650de) [arm64-darwin22] last_commit=[ruby/prism] feat: add encoding for IBM865 (https://github.com/ruby/prism/pull/1884) built-ruby: ruby 3.3.0dev (2023-11-20T12:51:45Z faster-str-lit-dup 6b745bbc5d) [arm64-darwin22] warming up.. | |compare-ruby|built-ruby| |:------|-----------:|---------:| |uplus | 16.312M| 16.332M| | | -| 1.00x| |dup | 5.912M| 16.329M| | | -| 2.76x| ```
2023-11-09String#force_encoding don't clear coderange if encoding is unchangedJean Boussier
Some code out there blind calls `force_encoding` without checking what the original encoding was, which clears the coderange uselessly. If the String is big, it can be a rather costly mistake. For instance the `rack-utf8_sanitizer` gem does this on request bodies.
2023-11-08String for string literal is not resizableNobuyoshi Nakada
2023-11-02Make String.new size pools aware.Jean Boussier
If the required capacity would fit in an embded string, returns one. This can reduce malloc churn for code that use string buffers.
2023-09-27[DOC] Missing comment markersNobuyoshi Nakada
2023-09-26[Bug #19902] Update the coderange regarding the changed regionNobuyoshi Nakada
2023-09-01Use end of char boundary in start_with?John Hawthorn
Previously we used the next character following the found prefix to determine if the match ended on a broken character. This had caused surprising behaviour when a valid character was followed by a UTF-8 continuation byte. This commit changes the behaviour to instead look for the end of the last character in the prefix. [Bug #19784] Co-authored-by: ywenc <ywenc@github.com> Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org> Notes: Merged: https://github.com/ruby/ruby/pull/8348
2023-08-26[Bug #19784] Fix behaviors against prefix with broken encodingNobuyoshi Nakada
- String#start_with? - String#delete_prefix - String#delete_prefix! Notes: Merged: https://github.com/ruby/ruby/pull/8296
2023-08-26Introduce `at_char_boundary` functionNobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/8296