summaryrefslogtreecommitdiff
path: root/string.c
AgeCommit message (Collapse)Author
2019-11-05Revert "[EXPERIMENTAL] Make Symbol#to_s return a frozen String [Feature #16150]"NARUSE, Yui
This reverts commit 6ffc045a817fbdf04a6945d3c260b55b0fa1fd1e.
2019-10-26Documentation improvements for Ruby corezverok
* Top-level `return`; * Documentation for comments syntax; * `rescue` inside blocks; * Enhance `Object#to_enum` docs; * Make `chomp:` option more obvious for `String#each_line` and `#lines`; * Enhance `Proc#>>` and `#<<` docs; * Enhance `Processs` class docs. Notes: Merged: https://github.com/ruby/ruby/pull/2612
2019-10-11Reduce the minimum string buffer size from 127 to 63 bytesLourens Naudé
Notes: Merged: https://github.com/ruby/ruby/pull/2151
2019-10-09avoid overflow in integer multiplication卜部昌平
This changeset basically replaces `ruby_xmalloc(x * y)` into `ruby_xmalloc2(x, y)`. Some convenient functions are also provided for instance `rb_xmalloc_mul_add(x, y, z)` which allocates x * y + z byes. Notes: Merged: https://github.com/ruby/ruby/pull/2540
2019-09-26[EXPERIMENTAL] Make Symbol#to_s return a frozen StringBenoit Daloze
* Always the same frozen String for a given Symbol. * Avoids extra allocations whenever calling Symbol#to_s. * See [Feature #16150] Notes: Merged: https://github.com/ruby/ruby/pull/2437
2019-09-26Rename STR_IS_SHARED_M to STR_BORROWEDAlan Wu
Since the introduction of STR_SHARED_ROOT, the word "shared" has become very overloaded with respect to String's internal states. Use a different name for STR_IS_SHARED_M and explain its purpose. Notes: Merged: https://github.com/ruby/ruby/pull/2480
2019-09-26Tag string shared roots to fix use-after-freeAlan Wu
The buffer deduplication codepath in rb_fstring can be used to free the buffer of shared string roots, which leads to use-after-free. Introudce a new flag to tag strings that at one point have been a shared root. Check for it in rb_fstring to avoid freeing buffers that are shared by multiple strings. This change is based on nobu's idea in [ruby-core:94838]. The included test case test for the sequence of calls to internal functions that lead to this bug. See attached ticket for Ruby level repros. [Bug #16151] Notes: Merged: https://github.com/ruby/ruby/pull/2480
2019-09-05Make Symbol#to_proc calls handle keyword argumentsJeremy Evans
Make rb_sym_proc_call take a flag for whether a keyword argument is used, and use the new rb_funcall_with_block_kw function to pass that information.
2019-08-29drop-in type check for rb_define_singleton_method卜部昌平
We can check the function pointer passed to rb_define_singleton_method like how we do so in rb_define_method. Doing so revealed many arity mismatches.
2019-08-15Fixed heap-use-after-freeNobuyoshi Nakada
* string.c (rb_str_sub_bang): retrieves a pointer to the replacement string buffer just before using it, for the case of replacement with the receiver string itself. [Bug #16105]
2019-08-15* expand tabs. [ci skip]git
2019-08-14Fold to lowercase intead of uppercase for String#casecmpJeremy Evans
strcasecmp(3) and String#casecmp? both fold to lowercase.
2019-08-12Update docs to use more natural EnglishAaron Patterson
Just a few updates to make the English sound a bit more natural
2019-08-12string.c (rb_str_sub, _gsub): improve the rdocYusuke Endoh
This change: * Added an explanation about back references except \n and \k<n> (\` \& \' \+ \0) * Added an explanation about an escape (\\) * Added some rdoc references * Rephrased and clarified the reason why double escape is needed, added some examples, and moved the note to the last (because it is not specific to the method itself).
2019-08-06leafify opt_plus卜部昌平
Inspired by 346aa557b31fe96760e505d30da26eb7a846bac9 Closes: https://github.com/ruby/ruby/pull/2321
2019-08-04Make opt_eq and opt_neq insns leafTakashi Kokubun
# Benchmark zero? ``` require 'benchmark/ips' Numeric.class_eval do def ruby_zero? self == 0 end end Benchmark.ips do |x| x.report('0.zero?') { 0.ruby_zero? } x.report('1.zero?') { 1.ruby_zero? } x.compare! end ``` ## VM No significant impact for VM. ### before ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) [x86_64-linux] 0.zero?: 21855445.5 i/s 1.zero?: 21770817.3 i/s - same-ish: difference falls within error ### after ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) [x86_64-linux] 1.zero?: 21958912.3 i/s 0.zero?: 21881625.9 i/s - same-ish: difference falls within error ## JIT The performance improves about 1.23x. ### before ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) +JIT [x86_64-linux] 0.zero?: 36343111.6 i/s 1.zero?: 36295153.3 i/s - same-ish: difference falls within error ### after ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) +JIT [x86_64-linux] 0.zero?: 44740467.2 i/s 1.zero?: 44363616.1 i/s - same-ish: difference falls within error # Benchmark str == str / str != str ``` # frozen_string_literal: true require 'benchmark/ips' Benchmark.ips do |x| x.report('a == a') { 'a' == 'a' } x.report('a == b') { 'a' == 'b' } x.report('a != a') { 'a' != 'a' } x.report('a != b') { 'a' != 'b' } x.compare! end ``` ## VM No significant impact for VM. ### before ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) [x86_64-linux] a == a: 27286219.0 i/s a != a: 24892389.5 i/s - 1.10x slower a == b: 23623635.8 i/s - 1.16x slower a != b: 21800958.0 i/s - 1.25x slower ### after ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) [x86_64-linux] a == a: 27224016.2 i/s a != a: 24490109.5 i/s - 1.11x slower a == b: 23391052.4 i/s - 1.16x slower a != b: 21811321.7 i/s - 1.25x slower ## JIT The performance improves on JIT a little. ### before ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) +JIT [x86_64-linux] a == a: 42010674.7 i/s a != a: 38920311.2 i/s - same-ish: difference falls within error a == b: 32574262.2 i/s - 1.29x slower a != b: 32099790.3 i/s - 1.31x slower ### after ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) +JIT [x86_64-linux] a == a: 46902738.8 i/s a != a: 43097258.6 i/s - 1.09x slower a == b: 35822018.4 i/s - 1.31x slower a != b: 33377257.8 i/s - 1.41x slower This is needed towards Bug#15589. Closes: https://github.com/ruby/ruby/pull/2318
2019-07-28Reuse match dataNobuyoshi Nakada
* string.c (rb_str_split_m): reuse occupied match data. [Bug #16024]
2019-07-27Occupy match dataNobuyoshi Nakada
* string.c (rb_str_split_m): occupy match data not to be modified during yielding the block. [Bug #16024]
2019-07-14string.c (str_succ): refactoringYusuke Endoh
Use more communicative variable name
2019-07-14string.c (str_succ): remove a unnecessary assignmentYusuke Endoh
This change will suppress Coverity Scan warnings
2019-07-14* expand tabs.git
2019-07-14Prefer `rb_error_arity` to `rb_check_arity` when it can be usedYusuke Endoh
2019-07-02Check that String#scrub block does not modify receiverJeremy Evans
Similar to the check used for String#gsub. Can fix possible segfault. Fixes [Bug #15941]
2019-07-02Make String#-@ not freeze receiver if called on unfrozen subclass instanceJeremy Evans
rb_fstring behavior in this case is to freeze the receiver. I'm not sure if that should be changed, so this takes the conservative approach of duping the receiver in String#-@ before passing to rb_fstring. Fixes [Bug #15926]
2019-06-29* expand tabs.git
2019-06-29Fixed String#grapheme_clusters with wide encodingsNobuyoshi Nakada
* string.c (get_reg_grapheme_cluster): make regexp from properly encoded sources fro wide-char encodings. [Bug #15965] * regparse.c (node_extended_grapheme_cluster): suppress false duplicated range warning for the time being.
2019-06-26Resize capacity for fstringJohn Hawthorn
When a string is #frozen, it's capacity is resized to fit (if it is much larger), since we know it will no longer be mutated. > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000)) {"type":"STRING", "class":"0x7feaf00b7bf0", "bytesize":30, "capacity":1000, "value":"... > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000).freeze) {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "bytesize":30, "value":"... (ObjectSpace.dump doesn't show capacity if capacity is equal to bytesize) Previously, if we dedup into an fstring, using String#-@, capacity would not be reduced. > puts ObjectSpace.dump(-String.new("a"*30, capacity: 1000)) {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "fstring":true, "bytesize":30, "capacity":1000, "value":"... This commit makes rb_fstring call rb_str_resize, the same as rb_str_freeze does. Closes: https://github.com/ruby/ruby/pull/2256
2019-06-21* expand tabs.git
2019-06-21Get rid of undefined behaviorNobuyoshi Nakada
* string.c (rb_str_sub_bang): str and repl can be same. [Bug #15946]
2019-06-19New buffer for shared stringNobuyoshi Nakada
* string.c (rb_str_init): allocate new buffer if the string is shared. [Bug #15937]
2019-06-19Preserve the string content at self-copyingNobuyoshi Nakada
* string.c (rb_str_init): preserve the embedded content when self-copying with a capacity. [Bug #15937]
2019-06-18Fix memory leakNobuyoshi Nakada
* string.c (str_make_independent_expand): free independent buffer. [Bug# 15935] Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
2019-06-18* expand tabs.git
2019-06-18String#b: Don't depend on dependent stringAlan Wu
Registering a string that depend on a dependent string as fstring can lead to use-after-free. See c06ddfe and 3f95620 for details. The following script triggers use-after-free on trunk, 2.4.6, 2.5.5 and 2.6.3. Credits to @wanabe for using eval as a cross-version way of registering a fstring. ```ruby a = ('j' * 24).b.b eval('', binding, a) p a 4.times { GC.start } p a ``` - string.c (str_replace_shared_without_enc): when given a dependent string, depend on the root of the dependent string. [Bug #15934]
2019-06-16Fix memory leakNobuyoshi Nakada
* string.c (str_replace_shared_without_enc): free previous buffer before replaced. * parse.y (gettable): make sure in advance that the `__FILE__` object shares a fstring, to get rid of replacement with the fstring later. TODO: this hack may be needed in other places. [Bug #15916] Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
2019-05-14Symbol just represents a nameNobuyoshi Nakada
2019-05-09str_duplicate: Don't share with a frozen shared stringAlan Wu
This is a follow up for 3f9562015e651735bfc2fdd14e8f6963b673e22a. Before this commit, it was possible to create a shared string which shares with another shared string by passing a frozen shared string to `str_duplicate`. Such string looks like: ``` -------- ----------------- | root | ------ owns -----> | root's buffer | -------- ----------------- ^ ^ ^ ----------- | | | shared1 | ------ references ----- | ----------- | ^ | ----------- | | shared2 | ------ references --------- ----------- ``` This is bad news because `rb_fstring(shared2)` can make `shared1` independent, which severs the reference from `shared1` to `root`: ```c /* from fstr_update_callback() */ str = str_new_frozen(rb_cString, shared2); /* can return shared1 */ if (STR_SHARED_P(str)) { /* shared1 is also a shared string */ str_make_independent(str); /* no frozen check */ } ``` If `shared1` was the only reference to `root`, then `root` can be reclaimed by the GC, leaving `shared2` in a corrupted state: ``` ----------- -------------------- | shared1 | -------- owns --------> | shared1's buffer | ----------- -------------------- ^ | ----------- ------------------------- | shared2 | ------ references ----> | root's buffer (freed) | ----------- ------------------------- ``` Here is a reproduction script for the situation this commit fixes. ```ruby a = ('a' * 24).strip.freeze.strip -a p a 4.times { GC.start } p a ``` - string.c (str_duplicate): always share with the root string when the original is a shared string. - test_rb_str_dup.rb: specifically test `rb_str_dup` to make sure it does not try to share with a shared string. [Bug #15792] Closes: https://github.com/ruby/ruby/pull/2159
2019-05-06Revert "UTF-8 is one of byte based encodings"Nobuyoshi Nakada
This reverts commit 5776ae347540ac19c40d146a3566a806cd176bf1. Mistaken `max` as `min`.
2019-05-05Improve documentation for String#{dump,undump}Marcus Stollsteimer
2019-05-03* expand tabs.git
2019-05-03Improve performance of case-conversion methodsNobuyoshi Nakada
2019-05-03UTF-8 is one of byte based encodingsNobuyoshi Nakada
2019-05-02* expand tabs.git
2019-05-02Fix potential memory leakNobuyoshi Nakada
2019-04-29this variable is not guaranteed alignedUrabe, Shyouhei
No problem for unaligned-ness because we never dereference.
2019-04-29fix typoUrabe, Shyouhei
2019-04-27Get rid of indirect sharingNobuyoshi Nakada
* string.c (str_duplicate): share the root shared string if the original string is already sharing, so that all shared strings refer the root shared string directly. indirect sharing can cause a dangling pointer. [Bug #15792]
2019-04-18string.c: warn non-nil $;nobu
* string.c (rb_str_split_m): warn use of non-nil $;. * string.c (rb_fs_setter): warn when set to non-nil value. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67603 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-17string.c: improve splitting into charsnobu
* string.c (rb_str_split_m): improve splitting into chars by an empty string, without a regexp. Comparison: to_chars-1 built-ruby: 1273527.6 i/s compare-ruby: 189423.3 i/s - 6.72x slower to_chars-10 built-ruby: 120993.5 i/s compare-ruby: 37075.8 i/s - 3.26x slower to_chars-100 built-ruby: 15646.4 i/s compare-ruby: 4012.1 i/s - 3.90x slower to_chars-1000 built-ruby: 1295.1 i/s compare-ruby: 408.5 i/s - 3.17x slower git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67582 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-20string.c: [DOC] fix reference to sprintf [ci skip]nobu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67312 b2dd03c8-39d4-4d8f-98ff-823fe69b080e