summaryrefslogtreecommitdiff
path: root/string.c
AgeCommit message (Collapse)Author
2022-12-22[DOC] Fix typoNobuyoshi Nakada
2022-12-02Introduce encoding check macroS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/6700
2022-11-24Make String#rstrip{,!} raise Encoding::CompatibilityError for broken coderangeJeremy Evans
It's questionable whether we want to allow rstrip to work for strings where the broken coderange occurs before the trailing whitespace and not after, but this approach is probably simpler, and I don't think users should expect string operations like rstrip to work on broken strings. In some cases, this changes rstrip to raise Encoding::CompatibilityError instead of ArgumentError. However, as the problem is related to an encoding issue in the receiver, and due not due to an issue with an argument, I think Encoding::CompatibilityError is the more appropriate error. Fixes [Bug #18931] Notes: Merged: https://github.com/ruby/ruby/pull/6282
2022-11-16Using UNDEF_P macroS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/6721
2022-11-15Rewrite Symbol#to_sym and #intern in Ruby (#6683)Takashi Kokubun
Notes: Merged-By: k0kubun <takashikkbn@gmail.com>
2022-11-14Use string's capacity to determine if reembeddablePeter Zhu
During auto-compaction, using length to determine whether or not a string can be re-embedded may be a problem for newly created strings. This is because usually it requires a malloc before setting the length. If the malloc triggers compaction, then the string may be re-embedded and can cause crashes.
2022-11-03Make str_alloc_heap return a STR_NOEMBED stringPeter Zhu
This commit refactors str_alloc_heap to return a string with the STR_NOEMBED flag set. Notes: Merged: https://github.com/ruby/ruby/pull/6663
2022-10-04Correcting example for swapcase! methodVaevictusnet
Example, line 3, swapcase! was incorrect. implied that the swapcase! did /not/ change the starting string. Notes: Merged: https://github.com/ruby/ruby/pull/6474
2022-09-28Fix bug when slicing a string with broken encodingPeter Zhu
Commit aa2a428 introduced a bug where non-embedded string slices copied the encoding of the original string. If the original string had a broken encoding but the slice has valid encoding, then the slice would be incorrectly marked as broken encoding. Notes: Merged: https://github.com/ruby/ruby/pull/6456
2022-09-28Make string slices views rather than copiesPeter Zhu
Just like commit 1c16645 for arrays, this commit changes string slices to be a view rather than a copy even if it can be allocated through VWA. Notes: Merged: https://github.com/ruby/ruby/pull/6456
2022-09-26Refactor str_substr and str_subseqPeter Zhu
This commit extracts common code between str_substr and rb_str_subseq into a function called str_subseq. This commit also applies optimizations in commit 2e88bca to rb_str_subseq. Notes: Merged: https://github.com/ruby/ruby/pull/6447
2022-09-26string.c: don't create a frozen copy for str_new_sharedJean Boussier
str_new_shared already has all the necessary logic to do this and is also smart enough to skip this step if the source string is already a shared string itself. This saves a useless String allocation on each call. Notes: Merged: https://github.com/ruby/ruby/pull/6443
2022-09-26Fix coderange calculation in String#bKazuki Yamaguchi
Leave the new coderange unknown if the original encoding is not ASCII-compatible. Non-ASCII-compatible encoding strings with valid or broken coderange can end up as ascii-only. Fixes 9a8f6e392fbd ("Cheaply derive code range for String#b return value", 2022-07-25).
2022-09-23Revert "Revert "error.c: Let Exception#inspect inspect its message""Yusuke Endoh
This reverts commit b9f030954a8a1572032f3548b39c5b8ac35792ce. [Bug #18170]
2022-09-12Remove get_actual_encoding() and the dynamic endian detection for dummy ↵Benoit Daloze
UTF-16/UTF-32 * And simplify callers of get_actual_encoding(). * See [Feature #18949]. * See https://github.com/ruby/ruby/pull/6322#issuecomment-1242758474
2022-09-09Avoid unnecessary copying when removing the leading part of a stringKazuki Yamaguchi
Remove the superfluous str_modify_keep_cr() call from rb_str_update(). It ends up calling either rb_str_drop_bytes() or rb_str_splice_0(), which already does checks if necessary. The extra call makes the string "independent". This is not always wanted, in other words, it can keep the same shared root when merely removing the leading part of a shared string. Notes: Merged: https://github.com/ruby/ruby/pull/6336
2022-09-08rb_str_concat_literals: use rb_str_buf_appendJean Boussier
That's about 1.30x faster. Notes: Merged: https://github.com/ruby/ruby/pull/6334
2022-09-08[DOC] non-positive `base` in `Kernel#Integer` and `String#to_i`Nobuyoshi Nakada
2022-08-31[Bug #18973] Promote US-ASCII to ASCII-8BIT when adding 8-bit charNobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/6306
2022-08-27[DOC] Fix a typo [ci skip]Nobuyoshi Nakada
2022-08-20Check if encoding capable object before check if ASCII compatibleNobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/6260
2022-08-18rb_str_resize: Only clear coderange on truncationJean Boussier
If we are expanding the string or only stripping extra capacity then coderange won't change, so clearing it is wasteful. Notes: Merged: https://github.com/ruby/ruby/pull/6178
2022-08-11Fix inspect for unicode codepoint 0x85Jeremy Evans
This is an inelegant hack, by manually checking for this specific code point in rb_str_inspect. Some testing indicates that this is the only code point affected. It's possible a better fix would be inside of lower-level encoding code, such that rb_enc_isprint would return false and not true for codepoint 0x85. Fixes [Bug #16842] Notes: Merged: https://github.com/ruby/ruby/pull/4229
2022-07-26Adjust indent [ci skip]Nobuyoshi Nakada
2022-07-26Cheaply derive code range for String#b return valueKevin Menard
The result of String#b is a string with an ASCII_8BIT/BINARY encoding. That encoding is ASCII-compatible and has no byte sequences that are invalid for the encoding. If we know the receiver's code range, we can derive the resulting string's code range without needing to perform a full code range scan. Notes: Merged: https://github.com/ruby/ruby/pull/6183
2022-07-25rb_str_buf_append: add a fast path for ENC_CODERANGE_VALIDJean Boussier
If the RHS has valid encoding, and both strings have the same encoding, we can use the fast path. However we need to update the LHS coderange. ``` compare-ruby: ruby 3.2.0dev (2022-07-21T14:46:32Z master cdbb9b8555) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-25T07:25:41Z string-concat-vali.. 11a2772bdd) [arm64-darwin21] warming up... | |compare-ruby|built-ruby| |:-------------------|-----------:|---------:| |binary_concat_7bit | 554.816k| 556.460k| | | -| 1.00x| |utf8_concat_7bit | 556.367k| 555.101k| | | 1.00x| -| |utf8_concat_UTF8 | 412.555k| 556.824k| | | -| 1.35x| ``` Notes: Merged: https://github.com/ruby/ruby/pull/6163
2022-07-21Expand tabs [ci skip]Takashi Kokubun
[Misc #18891] Notes: Merged: https://github.com/ruby/ruby/pull/6094
2022-07-21Make String#each_line work correctly with paragraph separator and chompJeremy Evans
Previously, it was including one newline when chomp was used, which is inconsistent with IO#each_line behavior. This makes behavior consistent with IO#each_line, chomping all paragraph separators (multiple consecutive newlines), but not single newlines. Partially Fixes [Bug #18768] Notes: Merged: https://github.com/ruby/ruby/pull/5960
2022-07-21string.c: use str_enc_fastpath in TERM_LENJean Boussier
Not having to fetch the rb_encoding save a significant amount of time. Additionally, even when we have to fetch it, we can do it faster using `ENCODING_GET` rather than `rb_enc_get`. ``` compare-ruby: ruby 3.2.0dev (2022-07-19T08:41:40Z master cb9fd920a3) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-21T11:16:16Z faster-buffer-conc.. 4f001f0748) [arm64-darwin21] warming up... | |compare-ruby|built-ruby| |:---------------------|-----------:|---------:| |binary_concat_utf8 | 510.580k| 565.600k| | | -| 1.11x| |binary_concat_binary | 512.653k| 571.483k| | | -| 1.11x| |utf8_concat_utf8 | 511.396k| 566.879k| | | -| 1.11x| ``` Notes: Merged: https://github.com/ruby/ruby/pull/6160
2022-07-19str_buf_cat: preserve coderange when going through fastpathJean Boussier
rb_str_modify clear the coderange, which in this case isn't necessary. ``` compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-19T07:17:01Z faster-buffer-conc.. 3cad62aab4) [arm64-darwin21] warming up... | |compare-ruby|built-ruby| |:---------------------|-----------:|---------:| |binary_concat_utf8 | 360.617k| 605.091k| | | -| 1.68x| |binary_concat_binary | 446.650k| 605.053k| | | -| 1.35x| |utf8_concat_utf8 | 454.166k| 597.311k| | | -| 1.32x| ``` ``` | |compare-ruby|built-ruby| |:-----------|-----------:|---------:| |erb_render | 1.790M| 2.045M| | | -| 1.14x| ``` Notes: Merged: https://github.com/ruby/ruby/pull/6120
2022-07-19rb_str_buf_append: fastpath to str_buf_catJean Boussier
If the LHS is ASCII compatible and the RHS is 7BIT we can directly concat without being concerned about anything else. Benchmark: ``` compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-13T10:13:53Z faster-buffer-conc.. a04c10476d) [arm64-darwin21] warming up... | |compare-ruby|built-ruby| |:---------------------|-----------:|---------:| |binary_append_utf8 | 385.315k| 573.663k| | | -| 1.49x| |binary_append_binary | 446.579k| 574.898k| | | -| 1.29x| |utf8_append_utf8 | 430.936k| 573.394k| | | -| 1.33x| ``` Note that in the benchmark, the RHS always have a precomputed coderange. So the benchmark never enter the slowpath of having to scan the RHS. However it's extremly likely that we'll end up scanning it anyway in rb_enc_cr_str_buf_cat Notes: Merged: https://github.com/ruby/ruby/pull/6120
2022-07-19Rename ENCINDEX_ASCII to ENCINDEX_ASCII_8BITJean Boussier
Otherwise it's way too easy to confuse it with US_ASCII. Notes: Merged: https://github.com/ruby/ruby/pull/6127
2022-07-13[DOC] Correct call-seq directive in string.c (#6131)Burdette Lamar
Correct call-seq directive in string.c Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-06-17Using is_ascii_string to check encodingS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/5867
2022-06-16Remove unused and accidentally public rb_str_shared_root_p()Alan Wu
This function was added to a public header in [1] probably unintentionally since it's not used anywhere, exposes implementation details, and isn't related to the goals of that pull request. [1]: 56cc3e99b6b9ec004255280337f6b8353f5e5b06 Notes: Merged: https://github.com/ruby/ruby/pull/6023 Merged-By: XrXr
2022-06-14Add placeholder to let braces matchNobuyoshi Nakada
2022-06-13Move String RVALUES between poolsMatt Valentine-House
And re-embed any strings that can now fit inside the slot they've been moved to Notes: Merged: https://github.com/ruby/ruby/pull/5986
2022-06-09[DOC] Fix markup for `String` (#5984)Alexander Ilyin
* Add missing space for `String#start_with?`. * Add missing pluses for `String#tr` and `Methods for Converting to New String` label. * Move quote into the tag for `Whitespace in Strings` label. Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-06-07Revert "error.c: Let Exception#inspect inspect its message"Yusuke Endoh
This reverts commit 9d927204e7b86eb00bfd07a060a6383139edf741. Notes: Merged: https://github.com/ruby/ruby/pull/5981
2022-06-07error.c: Let Exception#inspect inspect its messageYusuke Endoh
... only when the message string has a newline. `p StandardError.new("foo\nbar")` now prints `#<StandardError: "foo\nbar">' instead of: #<StandardError: bar> [Bug #18170] Notes: Merged: https://github.com/ruby/ruby/pull/4857
2022-05-20[Feature #18595] Alias String#-@ as String#dedupJean Boussier
Notes: Merged: https://github.com/ruby/ruby/pull/5583
2022-04-14[DOC] Move the documentations of moved Symbol methodsNobuyoshi Nakada
2022-04-13[DOC] Enhanced RDoc for Symbol (#5796)Burdette Lamar
Treats: #[] #length #empty? #upcase #downcase #capitalize #swapcase #start_with? #end_with? #encoding ::all_symbols Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-13Enforce literals on the second argumentsNobuyoshi Nakada
2022-04-12Enhanced RDoc for Symbol (#5795)Burdette Lamar
Treats: #== #inspect #name #to_s #to_sym #to_proc #succ #<=> #casecmp #casecmp? #=~ #match #match? Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-08Fix some RDoc links (#5778)Burdette Lamar
Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-07All-in-one RDoc for class String (#5777)Burdette Lamar
Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-06[DOC] Enhanced RDoc for string slices (#5769)Burdette Lamar
Creates file doc/string/slices.rdoc that the string slicing methods can link to. Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-04Enhanced RDoc for String#index (#5759)Burdette Lamar
Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-03[DOC] Enhanced RDoc for String (#5753)Burdette Lamar
Treats: #length #bytesize Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>