path: root/string.c
Age / Commit message / Author
2022-07-26Adjust indent [ci skip]Nobuyoshi Nakada
2022-07-26Cheaply derive code range for String#b return valueKevin Menard
The result of String#b is a string with an ASCII_8BIT/BINARY encoding. That encoding is ASCII-compatible and has no byte sequences that are invalid for the encoding. If we know the receiver's code range, we can derive the resulting string's code range without needing to perform a full code range scan. Notes: Merged: https://github.com/ruby/ruby/pull/6183
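The properties this commit relies on can be observed from Ruby itself. The following is a minimal sketch using only public `String` methods. The code range flags are internal to string.c and are not directly visible here, so `valid_encoding?` and `ascii_only?` stand in for them.

```ruby
# String#b returns a copy of the receiver with ASCII-8BIT (BINARY) encoding.
ascii  = "abc"
utf8   = "caf\u00e9"                                  # contains a multibyte character
broken = "\xff".dup.force_encoding(Encoding::UTF_8)   # invalid as UTF-8

ascii_b  = ascii.b
utf8_b   = utf8.b
broken_b = broken.b

# No byte sequence is invalid in BINARY, even if the receiver was broken.
p broken.valid_encoding?     # false
p broken_b.valid_encoding?   # true

# A 7-bit receiver gives a 7-bit result; a non-ASCII receiver does not.
p ascii_b.ascii_only?        # true
p utf8_b.ascii_only?         # false
```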
2022-07-25rb_str_buf_append: add a fast path for ENC_CODERANGE_VALIDJean Boussier
If the RHS has a valid encoding and both strings have the same encoding, we can use the fast path. However, we need to update the LHS coderange.

```
compare-ruby: ruby 3.2.0dev (2022-07-21T14:46:32Z master cdbb9b8555) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-25T07:25:41Z string-concat-vali.. 11a2772bdd) [arm64-darwin21]
warming up...

|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|binary_concat_7bit  |    554.816k|  556.460k|
|                    |           -|     1.00x|
|utf8_concat_7bit    |    556.367k|  555.101k|
|                    |       1.00x|         -|
|utf8_concat_UTF8    |    412.555k|  556.824k|
|                    |           -|     1.35x|
```

Notes: Merged: https://github.com/ruby/ruby/pull/6163
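The need to update the LHS coderange has a visible counterpart at the Ruby level. As a rough sketch, assuming a UTF-8 source file: appending a valid non-ASCII string to an ASCII-only string of the same encoding must change what `ascii_only?` reports for the receiver.

```ruby
lhs = +"abc"              # UTF-8, ASCII-only so far
rhs = "\u00e9t\u00e9"     # valid UTF-8, not ASCII-only (3 characters, 5 bytes)

p lhs.ascii_only?         # true
lhs << rhs                # same encoding on both sides, RHS is valid

# The receiver is still valid UTF-8 but is no longer 7-bit.
p lhs.ascii_only?         # false
p lhs.valid_encoding?     # true
p lhs.length              # 6
p lhs.bytesize            # 8
```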
2022-07-21Expand tabs [ci skip]Takashi Kokubun
[Misc #18891] Notes: Merged: https://github.com/ruby/ruby/pull/6094
2022-07-21Make String#each_line work correctly with paragraph separator and chompJeremy Evans
Previously, it was including one newline when chomp was used, which is inconsistent with IO#each_line behavior. This makes behavior consistent with IO#each_line, chomping all paragraph separators (multiple consecutive newlines), but not single newlines. Partially Fixes [Bug #18768] Notes: Merged: https://github.com/ruby/ruby/pull/5960
2022-07-21string.c: use str_enc_fastpath in TERM_LENJean Boussier
Not having to fetch the rb_encoding saves a significant amount of time. Additionally, even when we have to fetch it, we can do it faster using `ENCODING_GET` rather than `rb_enc_get`.

```
compare-ruby: ruby 3.2.0dev (2022-07-19T08:41:40Z master cb9fd920a3) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-21T11:16:16Z faster-buffer-conc.. 4f001f0748) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_concat_utf8    |    510.580k|  565.600k|
|                      |           -|     1.11x|
|binary_concat_binary  |    512.653k|  571.483k|
|                      |           -|     1.11x|
|utf8_concat_utf8      |    511.396k|  566.879k|
|                      |           -|     1.11x|
```

Notes: Merged: https://github.com/ruby/ruby/pull/6160
2022-07-19str_buf_cat: preserve coderange when going through fastpathJean Boussier
rb_str_modify clears the coderange, which in this case isn't necessary.

```
compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-19T07:17:01Z faster-buffer-conc.. 3cad62aab4) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_concat_utf8    |    360.617k|  605.091k|
|                      |           -|     1.68x|
|binary_concat_binary  |    446.650k|  605.053k|
|                      |           -|     1.35x|
|utf8_concat_utf8      |    454.166k|  597.311k|
|                      |           -|     1.32x|
```

```
|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|erb_render  |      1.790M|    2.045M|
|            |           -|     1.14x|
```

Notes: Merged: https://github.com/ruby/ruby/pull/6120
2022-07-19rb_str_buf_append: fastpath to str_buf_catJean Boussier
If the LHS is ASCII compatible and the RHS is 7BIT, we can directly concat without being concerned about anything else.

Benchmark:

```
compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-13T10:13:53Z faster-buffer-conc.. a04c10476d) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_append_utf8    |    385.315k|  573.663k|
|                      |           -|     1.49x|
|binary_append_binary  |    446.579k|  574.898k|
|                      |           -|     1.29x|
|utf8_append_utf8      |    430.936k|  573.394k|
|                      |           -|     1.33x|
```

Note that in the benchmark, the RHS always has a precomputed coderange, so the benchmark never enters the slow path of having to scan the RHS. However, it's extremely likely that we'll end up scanning it anyway in rb_enc_cr_str_buf_cat.

Notes: Merged: https://github.com/ruby/ruby/pull/6120
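The condition described above (ASCII-compatible LHS, 7BIT RHS) corresponds to a case that is always safe at the Ruby level. The sketch below only shows the visible behaviour, not the C fast path: appending an ASCII-only string never changes the receiver's encoding, whatever ASCII-compatible encoding it has.

```ruby
binary = "\xff\x00".b                        # ASCII-8BIT receiver with a high byte
utf8   = +"caf\u00e9"                        # UTF-8 receiver with a multibyte character
rhs    = "abc".encode(Encoding::US_ASCII)    # 7-bit right-hand side

binary << rhs
utf8   << rhs

# Both receivers keep their own encoding; the bytes are appended as-is.
p binary.encoding == Encoding::ASCII_8BIT    # true
p utf8.encoding == Encoding::UTF_8           # true
p binary.bytes                               # [255, 0, 97, 98, 99]
```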
2022-07-19Rename ENCINDEX_ASCII to ENCINDEX_ASCII_8BITJean Boussier
Otherwise it's way too easy to confuse it with US_ASCII. Notes: Merged: https://github.com/ruby/ruby/pull/6127
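The same distinction exists among Ruby's encoding constants, which may help show why the old name was easy to misread. A small sketch: ASCII-8BIT (also exposed as `Encoding::BINARY`) accepts every byte, while US-ASCII rejects bytes with the high bit set.

```ruby
byte = "\xff"
as_binary   = byte.b
as_us_ascii = byte.dup.force_encoding(Encoding::US_ASCII)

p Encoding::ASCII_8BIT.equal?(Encoding::BINARY)   # true, BINARY is an alias
p Encoding::ASCII_8BIT == Encoding::US_ASCII      # false, two distinct encodings

p as_binary.valid_encoding?                       # true
p as_us_ascii.valid_encoding?                     # false
```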
2022-07-13[DOC] Correct call-seq directive in string.c (#6131)Burdette Lamar
Correct call-seq directive in string.c Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-06-17Using is_ascii_string to check encodingS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/5867
2022-06-16Remove unused and accidentally public rb_str_shared_root_p()Alan Wu
This function was added to a public header in [1] probably unintentionally since it's not used anywhere, exposes implementation details, and isn't related to the goals of that pull request. [1]: 56cc3e99b6b9ec004255280337f6b8353f5e5b06 Notes: Merged: https://github.com/ruby/ruby/pull/6023 Merged-By: XrXr
2022-06-14Add placeholder to let braces matchNobuyoshi Nakada
2022-06-13Move String RVALUES between poolsMatt Valentine-House
And re-embed any strings that can now fit inside the slot they've been moved to Notes: Merged: https://github.com/ruby/ruby/pull/5986
2022-06-09[DOC] Fix markup for `String` (#5984)Alexander Ilyin
* Add missing space for `String#start_with?`. * Add missing pluses for `String#tr` and `Methods for Converting to New String` label. * Move quote into the tag for `Whitespace in Strings` label. Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-06-07Revert "error.c: Let Exception#inspect inspect its message"Yusuke Endoh
This reverts commit 9d927204e7b86eb00bfd07a060a6383139edf741. Notes: Merged: https://github.com/ruby/ruby/pull/5981
2022-06-07error.c: Let Exception#inspect inspect its messageYusuke Endoh
... only when the message string has a newline. `p StandardError.new("foo\nbar")` now prints `#<StandardError: "foo\nbar">` instead of:

    #<StandardError: foo
    bar>

[Bug #18170] Notes: Merged: https://github.com/ruby/ruby/pull/4857
2022-05-20[Feature #18595] Alias String#-@ as String#dedupJean Boussier
Notes: Merged: https://github.com/ruby/ruby/pull/5583
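A short sketch of what the alias offers. `String#-@` returns a frozen, deduplicated string, so two equal strings collapse to one object. The `dedup` call is guarded with `respond_to?` because it only exists in Ruby versions that include this change.

```ruby
# Two distinct, unfrozen strings with the same content.
a = -String.new("some value")
b = -String.new("some value")

p a.frozen?       # true
p a.equal?(b)     # true, both refer to the same deduplicated object

# Where String#dedup is available, it behaves like the unary minus.
if a.respond_to?(:dedup)
  c = String.new("some value").dedup
  p c.equal?(a)   # true
end
```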
2022-04-14[DOC] Move the documentations of moved Symbol methodsNobuyoshi Nakada
2022-04-13[DOC] Enhanced RDoc for Symbol (#5796)Burdette Lamar
Treats: #[] #length #empty? #upcase #downcase #capitalize #swapcase #start_with? #end_with? #encoding ::all_symbols Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-13Enforce literals on the second argumentsNobuyoshi Nakada
2022-04-12Enhanced RDoc for Symbol (#5795)Burdette Lamar
Treats: #== #inspect #name #to_s #to_sym #to_proc #succ #<=> #casecmp #casecmp? #=~ #match #match? Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-08Fix some RDoc links (#5778)Burdette Lamar
Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-07All-in-one RDoc for class String (#5777)Burdette Lamar
Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-06[DOC] Enhanced RDoc for string slices (#5769)Burdette Lamar
Creates file doc/string/slices.rdoc that the string slicing methods can link to. Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-04Enhanced RDoc for String#index (#5759)Burdette Lamar
Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-03[DOC] Enhanced RDoc for String (#5753)Burdette Lamar
Treats: #length #bytesize Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-04-02[DOC] Enhanced RDoc for String (#5751)Burdette Lamar
Adds to doc for String.new, also making it compliant with documentation_guide.rdoc. Fixes some broken links in io.c (that I failed to correct yesterday). Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-31[DOC] Enhanced RDoc for String (#5742)Burdette Lamar
Treats: #force_encoding #b #valid_encoding? #ascii_only? #scrub #scrub! #unicode_normalized? Plus a couple of minor tweaks. Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-30Repaired What's Here sections for Range, String, Symbol, Struct (#5735)Burdette Lamar
Repaired What's Here sections for Range, String, Symbol, Struct. Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-29[DOC] Enhanced RDoc for String (#5730)Burdette Lamar
Treats: #start_with? #end_with? #delete_prefix #delete_prefix! #delete_suffix #delete_suffix! Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-28[DOC] Enhanced RDoc for String (#5726)Burdette Lamar
Treats: #ljust #rjust #center #partition #rpartition Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-27[DOC] Enhanced RDoc for String (#5724)Burdette Lamar
Treats: #scan #hex #oct #crypt #ord #sum Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-27[DOC] Fix references to unary operatorNobuyoshi Nakada
2022-03-26Enhanced RDoc for String (#5723)Burdette Lamar
Treats: #lstrip #lstrip! #rstrip #rstrip! #strip #strip! Adds section Whitespace in Strings. Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-26[DOC] Use simple references to operator methodsNobuyoshi Nakada
Method references can not only be marked up as code, but also reflect the `--show-hash` option. The bug that prevented the old rdoc from correctly parsing these methods was fixed last month.
2022-03-24[DOC] Enhanced RDoc for String (#5707)Burdette Lamar
Treated: #chomp #chomp! #chop #chop! Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-22[DOC] Enhanced RDoc for String (#5685)Burdette Lamar
Treats: #chars #codepoints #each_char #each_codepoint #each_grapheme_cluster #grapheme_clusters Also, corrects a passage in #unicode_normalize that mentioned module UnicodeNormalize, whose doc (:nodoc:, actually) says not to mention it. Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-21[DOC] Use RDoc inclusions in string.c (#5683)Burdette Lamar
As @peterzhu2118 and @duerst have pointed out, putting string method's RDoc into doc/ (which allows non-ASCII in examples) makes the "click to toggle source" feature not work for that method. This PR moves the primary method doc back into string.c, then includes RDoc from doc/string/*.rdoc, and also removes doc/string.rdoc. The affected methods are: ::new #bytes #each_byte #each_line #split The call-seq is in string.c because it works there; it did not work when the call-seq is in doc/string/*.rdoc. This PR also updates the relevant guidance in doc/documentation_guide.rdoc. Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-18[DOC] Enhanced RDoc for String (#5675)Burdette Lamar
Treats: #split #each_line #lines #each_byte #bytes Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-18Add String#bytespliceShugo Maeda
Notes: Merged: https://github.com/ruby/ruby/pull/5584
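A minimal usage sketch, assuming the `bytesplice(index, length, str)` and `bytesplice(range, str)` forms. The calls are guarded with `respond_to?` so the snippet is harmless on Ruby versions without the method. The return value is deliberately not relied on here; only the in-place modification of the receiver is shown.

```ruby
s = +"hello world"

if s.respond_to?(:bytesplice)
  # Replace 5 bytes starting at byte offset 0.
  s.bytesplice(0, 5, "HELLO")
  # Replace the byte range 6..10 ("world"); the replacement may differ in length.
  s.bytesplice(6..10, "ruby")
  p s   # "HELLO ruby"
end
```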
2022-03-16[DOC] Enhanced RDoc for String#split (#5644)Burdette Lamar
* Enhanced RDoc for String#split Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-16Initialize mutex for crypt(3) staticallyNobuyoshi Nakada
Assuming that all platforms where only `crypt` is available, but not `crypt_r`, are POSIX-based.
2022-03-09[DOC] Enhanced RDoc for String (#5635)Burdette Lamar
Treats: #count #delete #delete! #squeeze #squeeze! Adds section "Multiple Character Selectors" to doc/character_selectors.rdoc. Co-authored-by: Peter Zhu <peter@peterzhu.ca> Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-09[DOC] Enhanced RDoc for String (#5633)Burdette Lamar
Treats: #tr (revised to link to "Character Selectors" document) #tr! #tr_s #tr_s! Also renames doc/character_selector.rdoc to match its title. Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-09[DOC] Fix default offset of String#byterindexKazuhiro NISHIYAMA
2022-03-07[DOC] Enhanced RDoc for String #tr and #tr! (#5626)Burdette Lamar
Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
2022-03-03[DOC] mark `rb_str_init` as `:nodoc:`Nobuyoshi Nakada
Otherwise, an empty entry will be generated as `String::new` along with the one from doc/string.rb.
2022-03-01[DOC] Fix String#getbyte docMau Magnaguagno
* String#getbyte returns `nil` if `index` is out of range. * Add String#getbyte example with nil output. * Modify String#getbyte example to use negative index. Notes: Merged: https://github.com/ruby/ruby/pull/5586 Merged-By: nobu <nobu@ruby-lang.org>
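The documented behaviour can be checked directly; a brief sketch covering a positive index, a negative index, and out-of-range indexes:

```ruby
s = "abc"

p s.getbyte(0)    # 97, the byte for "a"
p s.getbyte(-1)   # 99, negative indexes count from the end
p s.getbyte(3)    # nil, index is out of range
p s.getbyte(-4)   # nil, out of range on the negative side as well
```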
2022-02-26[DOC] Move String.new to allow non US-ASCII charactersNobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/5410