summaryrefslogtreecommitdiff
path: root/string.c
AgeCommit message (Collapse)Author
2020-03-30merge revision(s) 78ef2d0f331c3e056ee367214710b41722de2fe0: [Backport #15935]usa
merge revision(s) 8b3774be3dd9f472bddd99e84e3c9fe2ff99d7ac: [Backport #15935] Fix memory leak * string.c (str_make_independent_expand): free independent buffer. [Bug# 15935] Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com> git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_6@67805 b2dd03c8-39d4-4d8f-98ff-823fe69b080e git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@67860 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-08-27Revert a part of r67767usa
it was not necessary for ruby_2_5. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@67776 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-08-26merge revision(s) d5c33364e3c0efb15e11df417c925afee2cdb9c9: [Backport #16105]usa
Fixed heap-use-after-free * string.c (rb_str_sub_bang): retrieves a pointer to the replacement string buffer just before using it, for the case of replacement with the receiver string itself. [Bug #16105] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@67773 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-08-26merge revision(s) 8f51da5d41f0642d5a971e4223d1ba14643c6398: [Backport #15946]usa
Get rid of undefined behavior * string.c (rb_str_sub_bang): str and repl can be same. [Bug #15946] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@67770 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-08-26merge revision(s) ↵usa
28678997e40869f5591eae60edd9757334426ffb,8797f48373dcfa3ff8e748667732dea8aea4347e: [Backport #15937] Preserve the string content at self-copying * string.c (rb_str_init): preserve the embedded content when self-copying with a capacity. [Bug #15937] New buffer for shared string * string.c (rb_str_init): allocate new buffer if the string is shared. [Bug #15937] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@67769 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-08-26merge revision(s) 9dec4e8fc3a6018261834b5ac9b9877f787b97ca: [Backport #15934]usa
String#b: Don't depend on dependent string Registering a string that depend on a dependent string as fstring can lead to use-after-free. See c06ddfe and 3f95620 for details. The following script triggers use-after-free on trunk, 2.4.6, 2.5.5 and 2.6.3. Credits to @wanabe for using eval as a cross-version way of registering a fstring. ```ruby a = ('j' * 24).b.b eval('', binding, a) p a 4.times { GC.start } p a ``` - string.c (str_replace_shared_without_enc): when given a dependent string, depend on the root of the dependent string. [Bug #15934] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@67767 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-08-26merge revision(s) ↵usa
3f9562015e651735bfc2fdd14e8f6963b673e22a,c06ddfee878524168e4af07443217ed2f8d0954b,3b3b4a44e5: [Backport #15792] Get rid of indirect sharing * string.c (str_duplicate): share the root shared string if the original string is already sharing, so that all shared strings refer the root shared string directly. indirect sharing can cause a dangling pointer. [Bug #15792] str_duplicate: Don't share with a frozen shared string This is a follow up for 3f9562015e651735bfc2fdd14e8f6963b673e22a. Before this commit, it was possible to create a shared string which shares with another shared string by passing a frozen shared string to `str_duplicate`. Such string looks like: ``` -------- ----------------- | root | ------ owns -----> | root's buffer | -------- ----------------- ^ ^ ^ ----------- | | | shared1 | ------ references ----- | ----------- | ^ | ----------- | | shared2 | ------ references --------- ----------- ``` This is bad news because `rb_fstring(shared2)` can make `shared1` independent, which severs the reference from `shared1` to `root`: ```c /* from fstr_update_callback() */ str = str_new_frozen(rb_cString, shared2); /* can return shared1 */ if (STR_SHARED_P(str)) { /* shared1 is also a shared string */ str_make_independent(str); /* no frozen check */ } ``` If `shared1` was the only reference to `root`, then `root` can be reclaimed by the GC, leaving `shared2` in a corrupted state: ``` ----------- -------------------- | shared1 | -------- owns --------> | shared1's buffer | ----------- -------------------- ^ | ----------- ------------------------- | shared2 | ------ references ----> | root's buffer (freed) | ----------- ------------------------- ``` Here is a reproduction script for the situation this commit fixes. ```ruby a = ('a' * 24).strip.freeze.strip -a p a 4.times { GC.start } p a ``` - string.c (str_duplicate): always share with the root string when the original is a shared string. - test_rb_str_dup.rb: specifically test `rb_str_dup` to make sure it does not try to share with a shared string. [Bug #15792] Closes: https://github.com/ruby/ruby/pull/2159 Update dependencies git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@67766 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-08-26merge revision(s) 5e23b1138f16af0defb184d7deeffadfd2ce3c04 [backport #15820]usa
Fix potential memory leak git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@67753 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-07merge revision(s) 67167: [Backport #15635]nagachika
string.c: respect the actual encoding * string.c (rb_enc_str_coderange): respect the actual encoding of if a BOM presents, and scan for the actual code range. [ruby-core:91662] [Bug #15635] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@67192 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-30merge revision(s) 65956:nagachika
fix r65954; Keep tainty git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@66105 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-28merge revision(s) 65954,65955,65958: [Backport #15337]nagachika
Don't use single byte optimization on grapheme clusters Unicode Text Segmentation considers CRLF as a character. [Bug #15337] add tests using Unicode test data for grapheme clusters Add file test/ruby/enc/test_grapheme_breaks.rb to test String#each_grapheme_cluster and \X extended grapheme cluster matcher in regular expressions against test data provided by Unicode (ucd/auxiliary/GraphemeBreakTest.txt). Some lines in the data file are ignored, as follows: - Lines with a surrogate, because Ruby doesn't handle these - The case of "\r\n", because there is a bug (#15337) in the implementation remove guard against bug #15337, because it is fixed git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@66073 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-08-12merge revision(s) 63252: [Backport #14707]nagachika
string.c: fix scanned substring with `\K` * string.c (scan_once): fix the matched substring with `\K`, the beginning of that string may differ from the matched position. [ruby-core:86663] [Bug #14707] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@64320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-22merge revision(s) 62892,62893: [Backport #14363]naruse
fix each_grapheme_cluster's size [Bug #14363] From: Hugo Peixoto <hugo.peixoto@gmail.com> Factor out get_reg_grapheme_cluster git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@62896 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-19merge revision(s) 62040: [Backport #14388]naruse
string.c: clear substring code range * string.c (str_substr): substring of broken code range string may be valid or broken. patch by tommy (Masahiro Tomita) at [ruby-dev:50430] [Bug #14388]. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@62483 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-16merge revision(s) 61636: [Backport #14257]naruse
string.c: out-of-bounds access * string.c (rb_str_enumerate_lines): fix out-of-bounds access when record separator is longer than the last element. [Bug #14257] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@62421 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-05merge revision(s) 61513: [Backport #14257]naruse
string.c: chomp rs at the end * string.c (rb_str_enumerate_lines): should chomp record separator only, but not a newline, at the end of the receiver as well as middle, if the separator is given. [ruby-core:84552] [Bug #14257] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@61628 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-22encoding.c: rb_enc_find_index2nobu
* string.c (str_undump): use rb_enc_find_index2 to find encoding by unterminated string. check the format before encoding name. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61396 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21string.c: fix memory leaknobu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61386 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21Don't allow mixed escapenaruse
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61381 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21move dump format validation into parsing epiloguenaruse
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61380 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21fix escapes in undumpnaruse
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61379 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-16string.c: multiple codepointsnobu
* string.c (undump_after_backslash): fix multiple codepoints in braces. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61290 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-16string.c: suppress warningnobu
* string.c (str_undump): suppress maybe-uninitialized warning by gcc 7 and later. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61289 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-14Implement String#undump to unescape String#dump-ed stringtadd
[Feature #12275] [close GH-1765] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61228 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-02string.c: fix rb_external_str_new_with_encnobu
* string.c (rb_external_str_new_with_enc): do not search non-ascii by NULL pointer. [ruby-core:84055] [Bug #14150] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60979 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-14string.c: prefer rb_syserr_failnobu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60761 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-12string.c: fix up r60748rhe
An #ifdef was missing in r60748 and build broke on systems without crypt_r(). https://rubyci.org/logs/rubyci.s3.amazonaws.com/unstable11s/ruby-trunk/log/20171112T162503Z.fail.html.gz git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60749 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-12string.c: fix memory leak in String#cryptrhe
Use ALLOCV to allocate struct crypt_data for slightly cleaner and less error-prone code. It is currently possible it leaks when an invalid argument is passed to String#crypt or rb_str_new_cstr() fails to allocate memory. SIZEOF_CRYPT_DATA macro in missing/crypt.h is removed since it is not used any longer. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60748 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-07string.c: improve docs for String#{concat,<<}stomar
* string.c: [DOC] remove a misleading call-seq for String#concat, which suggests that all arguments must be Integers in this case; also clarify in the example that the receiver is modified; fix grammar for String#<<; move references to the end. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60712 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-07string.c: fix typosstomar
* string.c: [DOC] fix typos in doxygen comments. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60707 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-29string.c: improve docsstomar
* string.c: [DOC] fix rdoc for cross reference; fix grammar. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60574 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-27string.c: Improve String#prepend performance if only one argument is givenwatson1978
* string.c (rb_str_prepend_multi): Prepend the string without generating temporary String object if only one argument is given. This is very similar with https://github.com/ruby/ruby/pull/1634 String#prepend -> 47.5 % up [Fix GH-1670] [ruby-core:82195] [Bug #13773] * Before String#prepend 1.517M (± 1.8%) i/s - 7.614M in 5.019819s * After String#prepend 2.236M (± 3.4%) i/s - 11.234M in 5.029716s * Test code require 'benchmark/ips' Benchmark.ips do |x| x.report "String#prepend" do |loop| loop.times { "!".prepend("hello") } end end git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60480 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-22string.c: comment layout [ci skip]nobu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60331 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21* remove trailing spaces.svn
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60329 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21* string.c: [DOC] Split rdoc of String#<< and String#concat [ci skip]sonots
Split String#<< and String#concat docs to reflect single and multiple arguments patched by MSP-Greg [fix GH-1614] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60328 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21* string.c: Remove errant "the" in gsub documentationsonots
patched by jlmuir (J. Lewis Muir) [fix GH-1679] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60324 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21Improve performance of string interpolationnobu
This patch will add pre-allocation in string interpolation. By this, unecessary capacity resizing is avoided. For small strings, optimized `rb_str_resurrect` operation is faster, so pre-allocation is done only when concatenated strings are large. `MIN_PRE_ALLOC_SIZE` was decided by experimenting with local machine (x86_64-apple-darwin 16.5.0, Apple LLVM version 8.1.0 (clang - 802.0.42)). String interpolation will be faster around 72% when large string is created. * Before ``` Calculating ------------------------------------- Large string interpolation 1.276M (± 5.9%) i/s - 6.358M in 5.002022s Small string interpolation 5.156M (± 5.5%) i/s - 25.728M in 5.005731s ``` * After ``` Calculating ------------------------------------- Large string interpolation 2.201M (± 5.8%) i/s - 11.063M in 5.043724s Small string interpolation 5.192M (± 5.7%) i/s - 25.971M in 5.020516s ``` * Test code ```ruby require 'benchmark/ips' Benchmark.ips do |x| x.report "Large string interpolation" do |t| a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo" b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld" t.times do "#{a}, #{b}!" end end x.report "Small string interpolation" do |t| a = "Hello" b = "World" t.times do "#{a}, #{b}!" end end end ``` [Fix GH-1626] From: Nao Minami <south37777@gmail.com> git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21Add documentation for `chomp` option.hsbt
https://github.com/ruby/ruby/pull/1717 Patch by @ksss [fix GH-1717] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60308 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21* string.c (deleted_prefix_length, deleted_suffix_length):sonots
Add doxygen comment. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60254 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21[Feature #13712] String#start_with? supports regexpnaruse
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60234 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-01string.c: avoid unnecessary call of str_strlen()glass
* string.c (rb_strseq_index): refactor and avoid call of str_strlen() when offset == 0. it will improve performance of String#index and #include? * benchmark/bm_string_index.rb: benchmark for this change git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-30string.c: fix ASCII-only on succnobu
* string.c (str_succ): clear coderange cache when no alpha-numeric character case, carried part may become ASCII-only. [ruby-core:83062] [Bug #13952] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60066 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-29string.c: ASCII-incompatible is not ASCII onlynobu
* string.c (tr_trans): ASCII-incompatible encoding strings cannot be ASCII-only even if valid. [ruby-core:83056] [Bug #13950] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60060 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-23dup String#split return valuenobu
* string.c (rb_str_split): return duplicated receiver, when no splits. patched by tompng (tomoya ishida) in [ruby-core:82911], and the test case by Seiei Miyagi <hanachin@gmail.com>. [Bug#13925] [Fix GH-1705] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60002 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-23dup String#rpartition return valuenobu
* string.c (rb_str_rpartition): return duplicated receiver, when no splits. [ruby-core:82911] [Bug#13925] Author: Seiei Miyagi <hanachin@gmail.com> git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60001 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-23dup String#partition return valuenobu
* string.c (rb_str_partition): return duplicated receiver, when no splits. [ruby-core:82911] [Bug#13925] Author: Seiei Miyagi <hanachin@gmail.com> git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60000 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-18refinements in string interpolationnobu
* compile.c (iseq_compile_each0): insert to_s method call, so that refinements activated at the caller should take place. [Feature #13812] * insns.def (tostring): fix up converted object to a string, infect and fallback. * insns.def (branchiftype): new instruction for conversion. branches if TOS is an instance of the given type. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59950 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06Fix a typo [ci skip]kazu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59764 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06string.c: fix false coderangenobu
* string.c (rb_enc_str_scrub): enc can differ from the actual encoding of the string, the cached coderange is useless then. [ruby-core:82674] [Bug #13874] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06string.c: optimize enumerate_grapheme_clustersnobu
* string.c (rb_str_enumerate_grapheme_clusters): optimize when single byte only. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59762 b2dd03c8-39d4-4d8f-98ff-823fe69b080e