path: root/string.c
Age | Commit message | Author
2019-11-28 | Added Symbol#start_with? and Symbol#end_with? method. [Feature #16348] | NARUSE, Yui
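A short sketch of how the two new methods might be used, assuming they mirror their String counterparts (the symbol `:string_buffer` is only an illustration):

```ruby
sym = :string_buffer

p sym.start_with?("string")        # => true
p sym.end_with?("buffer")          # => true
# Several candidates may be given; the result is true if any of them matches.
p sym.start_with?("buf", "str")    # => true
p sym.end_with?("string")          # => false
```

This avoids the intermediate String that `sym.to_s.start_with?(...)` would allocate.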
2019-11-18 | delete unused codes | 卜部昌平
Suppress compiler warnings.
2019-11-18 | rb_tainted_str_new_with_enc is no longer used | Nobuyoshi Nakada
2019-11-18 | Deprecate taint/trust and related methods, and make the methods no-ops | Jeremy Evans
This removes the related tests, and puts the related specs behind version guards. This affects all code in lib, including some libraries that may want to support older versions of Ruby. Notes: Merged: https://github.com/ruby/ruby/pull/2476
2019-11-14 | delete unused functions | 卜部昌平
Looking at the list of symbols inside of libruby-static.a, I found hundreds of functions that are defined, but used from nowhere. There can be reasons for each of them (e.g. some functions are specific to some platform, some are useful when debugging, etc). However it seems the functions deleted here exist for no reason. This changeset reduces the size of ruby binary from 26,671,456 bytes to 26,592,864 bytes on my machine. Notes: Merged: https://github.com/ruby/ruby/pull/2677
2019-11-05 | Revert "[EXPERIMENTAL] Make Symbol#to_s return a frozen String [Feature #16150]" | NARUSE, Yui
This reverts commit 6ffc045a817fbdf04a6945d3c260b55b0fa1fd1e.
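As a rough illustration of what the revert preserves, the sketch below mutates the String returned by `Symbol#to_s`. It assumes a Ruby version where the result is still mutable; newer releases may print a deprecation warning for the mutation.

```ruby
s = :ruby.to_s
s << "ist"        # mutate the String returned by Symbol#to_s

p s               # => "rubyist"
p :ruby.to_s      # => "ruby"  (the Symbol's own name is unaffected)
```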
2019-10-26 | Documentation improvements for Ruby core | zverok
* Top-level `return`;
* Documentation for comments syntax;
* `rescue` inside blocks;
* Enhance `Object#to_enum` docs;
* Make `chomp:` option more obvious for `String#each_line` and `#lines`;
* Enhance `Proc#>>` and `#<<` docs;
* Enhance `Process` class docs.

Notes: Merged: https://github.com/ruby/ruby/pull/2612
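Two of the documented features can be sketched briefly: the `chomp:` option of `String#lines` and `#each_line`, and Proc composition with `>>` and `<<`. The variable names here are illustrative only.

```ruby
text = "first\nsecond\n"

p text.lines                 # => ["first\n", "second\n"]
p text.lines(chomp: true)    # => ["first", "second"]
text.each_line(chomp: true) { |line| puts line.upcase }

add_one = proc { |x| x + 1 }
double  = proc { |x| x * 2 }

left_to_right = add_one >> double   # double.call(add_one.call(x))
right_to_left = add_one << double   # add_one.call(double.call(x))
p left_to_right.call(3)   # => 8
p right_to_left.call(3)   # => 7
```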
2019-10-11 | Reduce the minimum string buffer size from 127 to 63 bytes | Lourens Naudé
Notes: Merged: https://github.com/ruby/ruby/pull/2151
2019-10-09 | avoid overflow in integer multiplication | 卜部昌平
This changeset basically replaces `ruby_xmalloc(x * y)` with `ruby_xmalloc2(x, y)`. Some convenience functions are also provided, for instance `rb_xmalloc_mul_add(x, y, z)`, which allocates x * y + z bytes. Notes: Merged: https://github.com/ruby/ruby/pull/2540
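The idea behind such checked allocation sizes can be sketched in Ruby. This is not the C implementation: `SIZE_MAX_ASSUMED` and `checked_mul_add` are hypothetical names, and a 64-bit `size_t` is assumed.

```ruby
# Hypothetical stand-in for C's SIZE_MAX on a 64-bit platform (an assumption).
SIZE_MAX_ASSUMED = 2**64 - 1

# Returns x * y + z, or raises RangeError if the result would not fit in size_t.
def checked_mul_add(x, y, z)
  raise ArgumentError, "negative size" if [x, y, z].any?(&:negative?)
  # Compare by division so the check itself could not wrap around in C.
  raise RangeError, "x * y overflows" if x != 0 && y > SIZE_MAX_ASSUMED / x
  product = x * y
  raise RangeError, "x * y + z overflows" if z > SIZE_MAX_ASSUMED - product
  product + z
end

p checked_mul_add(3, 4, 5)   # => 17
begin
  checked_mul_add(2**33, 2**33, 0)
rescue RangeError => e
  puts "rejected: #{e.message}"
end
```

The point of passing the factors separately is that the callee can reject an oversized request before any wrapped-around byte count reaches the allocator.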
2019-09-26 | [EXPERIMENTAL] Make Symbol#to_s return a frozen String | Benoit Daloze
* Always the same frozen String for a given Symbol.
* Avoids extra allocations whenever calling Symbol#to_s.
* See [Feature #16150]

Notes: Merged: https://github.com/ruby/ruby/pull/2437
2019-09-26 | Rename STR_IS_SHARED_M to STR_BORROWED | Alan Wu
Since the introduction of STR_SHARED_ROOT, the word "shared" has become very overloaded with respect to String's internal states. Use a different name for STR_IS_SHARED_M and explain its purpose. Notes: Merged: https://github.com/ruby/ruby/pull/2480
2019-09-26 | Tag string shared roots to fix use-after-free | Alan Wu
The buffer deduplication codepath in rb_fstring can be used to free the buffer of shared string roots, which leads to use-after-free. Introduce a new flag to tag strings that at one point have been a shared root. Check for it in rb_fstring to avoid freeing buffers that are shared by multiple strings. This change is based on nobu's idea in [ruby-core:94838]. The included test case tests for the sequence of calls to internal functions that lead to this bug. See attached ticket for Ruby level repros. [Bug #16151] Notes: Merged: https://github.com/ruby/ruby/pull/2480
2019-09-05 | Make Symbol#to_proc calls handle keyword arguments | Jeremy Evans
Make rb_sym_proc_call take a flag for whether a keyword argument is used, and use the new rb_funcall_with_block_kw function to pass that information.
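From the Ruby side, the effect might look like the sketch below, where keywords given to a `Symbol#to_proc` proc reach the target method as keywords. The `Greeter` class is a made-up example.

```ruby
class Greeter
  def greet(name:, punctuation: "!")
    "Hello, #{name}#{punctuation}"
  end
end

greet = :greet.to_proc
# The first argument is the receiver; the keywords are forwarded to #greet.
puts greet.call(Greeter.new, name: "Matz")
puts greet.call(Greeter.new, name: "Ruby", punctuation: "?")
```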
2019-08-29 | drop-in type check for rb_define_singleton_method | 卜部昌平
We can check the function pointer passed to rb_define_singleton_method like how we do so in rb_define_method. Doing so revealed many arity mismatches.
2019-08-15 | Fixed heap-use-after-free | Nobuyoshi Nakada
* string.c (rb_str_sub_bang): retrieves a pointer to the replacement string buffer just before using it, for the case of replacement with the receiver string itself. [Bug #16105]
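The case described above, where the replacement is the receiver itself, can be exercised from Ruby roughly as follows. The string length is arbitrary; the expected value is computed with the non-destructive `sub` and a copy of the replacement.

```ruby
s = "abc" * 20
expected = s.sub(/b/, s.dup)   # same substitution, done without aliasing

s.sub!(/b/, s)                 # replacement string is the receiver
p s == expected                # => true
```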
2019-08-15 | * expand tabs. [ci skip] | git
2019-08-14 | Fold to lowercase instead of uppercase for String#casecmp | Jeremy Evans
strcasecmp(3) and String#casecmp? both fold to lowercase.
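The fold direction only matters for characters that sort between `Z` and `a` in ASCII, such as `_` (95). A small sketch, assuming a Ruby that includes this change:

```ruby
p "A" <=> "_"          # => -1  ("A" is 65, "_" is 95)
# casecmp folds "A" to "a" (97) before comparing, so it now sorts after "_".
p "A".casecmp("_")     # => 1
p "a".casecmp("_")     # => 1
p "A".casecmp?("a")    # => true
```

With uppercase folding, both `casecmp` calls would have compared 65 against 95 and returned -1.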
2019-08-12 | Update docs to use more natural English | Aaron Patterson
Just a few updates to make the English sound a bit more natural
2019-08-12 | string.c (rb_str_sub, _gsub): improve the rdoc | Yusuke Endoh
This change:

* Added an explanation about back references except \n and \k<n> (\` \& \' \+ \0)
* Added an explanation about an escape (\\)
* Added some rdoc references
* Rephrased and clarified the reason why double escape is needed, added some examples, and moved the note to the end (because it is not specific to the method itself).
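A few of the back references and the double-escape rule mentioned above can be sketched like this (the sample strings are arbitrary):

```ruby
# \0 (equivalent to \&) stands for the whole match.
p "hello".gsub(/l/, '<\0>')        # => "he<l><l>o"

# \k<name> refers to a named capture group.
p "John Smith".sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')
# => "Smith, John"

# \' is the text after the match. In a double-quoted literal the backslash
# must itself be escaped so that it reaches sub.
p "abc".sub(/b/, "\\'")            # => "acc"

# A literal backslash in the result needs \\ in the replacement string,
# which is written '\\\\' in a single-quoted literal.
p "a.b".sub(".", '\\\\')           # => "a\\b"
```

The last case is the double escape: one level is consumed by the string literal, the other by the replacement syntax of `sub`.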
2019-08-06 | leafify opt_plus | 卜部昌平
Inspired by 346aa557b31fe96760e505d30da26eb7a846bac9 Closes: https://github.com/ruby/ruby/pull/2321
2019-08-04 | Make opt_eq and opt_neq insns leaf | Takashi Kokubun
# Benchmark zero?

```
require 'benchmark/ips'

Numeric.class_eval do
  def ruby_zero?
    self == 0
  end
end

Benchmark.ips do |x|
  x.report('0.zero?') { 0.ruby_zero? }
  x.report('1.zero?') { 1.ruby_zero? }
  x.compare!
end
```

## VM

No significant impact for VM.

### before

ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) [x86_64-linux]

0.zero?: 21855445.5 i/s
1.zero?: 21770817.3 i/s - same-ish: difference falls within error

### after

ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) [x86_64-linux]

1.zero?: 21958912.3 i/s
0.zero?: 21881625.9 i/s - same-ish: difference falls within error

## JIT

The performance improves about 1.23x.

### before

ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) +JIT [x86_64-linux]

0.zero?: 36343111.6 i/s
1.zero?: 36295153.3 i/s - same-ish: difference falls within error

### after

ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) +JIT [x86_64-linux]

0.zero?: 44740467.2 i/s
1.zero?: 44363616.1 i/s - same-ish: difference falls within error

# Benchmark str == str / str != str

```
# frozen_string_literal: true
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report('a == a') { 'a' == 'a' }
  x.report('a == b') { 'a' == 'b' }
  x.report('a != a') { 'a' != 'a' }
  x.report('a != b') { 'a' != 'b' }
  x.compare!
end
```

## VM

No significant impact for VM.

### before

ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) [x86_64-linux]

a == a: 27286219.0 i/s
a != a: 24892389.5 i/s - 1.10x slower
a == b: 23623635.8 i/s - 1.16x slower
a != b: 21800958.0 i/s - 1.25x slower

### after

ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) [x86_64-linux]

a == a: 27224016.2 i/s
a != a: 24490109.5 i/s - 1.11x slower
a == b: 23391052.4 i/s - 1.16x slower
a != b: 21811321.7 i/s - 1.25x slower

## JIT

The performance improves on JIT a little.

### before

ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) +JIT [x86_64-linux]

a == a: 42010674.7 i/s
a != a: 38920311.2 i/s - same-ish: difference falls within error
a == b: 32574262.2 i/s - 1.29x slower
a != b: 32099790.3 i/s - 1.31x slower

### after

ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) +JIT [x86_64-linux]

a == a: 46902738.8 i/s
a != a: 43097258.6 i/s - 1.09x slower
a == b: 35822018.4 i/s - 1.31x slower
a != b: 33377257.8 i/s - 1.41x slower

This is needed towards Bug#15589.

Closes: https://github.com/ruby/ruby/pull/2318
2019-07-28 | Reuse match data | Nobuyoshi Nakada
* string.c (rb_str_split_m): reuse occupied match data. [Bug #16024]
2019-07-27 | Occupy match data | Nobuyoshi Nakada
* string.c (rb_str_split_m): occupy match data so that it is not modified while yielding to the block. [Bug #16024]
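The situation these two commits guard against is a `String#split` block that runs its own regexp match while the split is still in progress. A minimal sketch with an arbitrary input string:

```ruby
parts = []
"2019-07-27".split(/-/) do |piece|
  piece =~ /\A\d{4}\z/   # another match inside the block, which updates $~
  parts << piece
end

p parts   # => ["2019", "07", "27"]
```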
2019-07-14 | string.c (str_succ): refactoring | Yusuke Endoh
Use more communicative variable name
2019-07-14 | string.c (str_succ): remove an unnecessary assignment | Yusuke Endoh
This change will suppress Coverity Scan warnings
2019-07-14 | * expand tabs. | git
2019-07-14 | Prefer `rb_error_arity` to `rb_check_arity` when it can be used | Yusuke Endoh
2019-07-02 | Check that String#scrub block does not modify receiver | Jeremy Evans
Similar to the check used for String#gsub. Can fix possible segfault. Fixes [Bug #15941]
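For contrast, a well-behaved `String#scrub` block only returns a replacement and leaves the receiver alone. The sketch below shows that intended use; the hex formatting of the replacement is just an illustration.

```ruby
broken = "abc\xFFdef"              # \xFF is not valid UTF-8
p broken.valid_encoding?            # => false

# The block receives the invalid bytes and returns their replacement.
fixed = broken.scrub { |bytes| "<#{bytes.unpack1('H*')}>" }
p fixed                             # => "abc<ff>def"
p fixed.valid_encoding?             # => true
```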
2019-07-02 | Make String#-@ not freeze receiver if called on unfrozen subclass instance | Jeremy Evans
rb_fstring behavior in this case is to freeze the receiver. I'm not sure if that should be changed, so this takes the conservative approach of duping the receiver in String#-@ before passing to rb_fstring. Fixes [Bug #15926]
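The behavior after this fix can be sketched as follows, where `Label` is a hypothetical String subclass used only for the example:

```ruby
class Label < String; end

label = Label.new("title")
interned = -label

p label.frozen?       # => false  (the receiver stays mutable)
p interned.frozen?    # => true
p interned == label   # => true
label << "!"          # still allowed
p label               # => "title!"
```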
2019-06-29 | * expand tabs. | git
2019-06-29 | Fixed String#grapheme_clusters with wide encodings | Nobuyoshi Nakada
* string.c (get_reg_grapheme_cluster): make regexp from properly encoded sources for wide-char encodings. [Bug #15965]
* regparse.c (node_extended_grapheme_cluster): suppress false duplicated range warning for the time being.
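A small sketch of the fixed behavior, assuming UTF-16LE as the wide encoding and a Ruby that includes this change. The sample text is "e" followed by a combining acute accent, then "a".

```ruby
text = "e\u0301a".encode("UTF-16LE")

clusters = text.grapheme_clusters
p clusters.size                                          # => 2
p clusters.map { |c| c.encode("UTF-8") } == ["e\u0301", "a"]   # => true
```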
2019-06-26 | Resize capacity for fstring | John Hawthorn
When a string is #frozen, its capacity is resized to fit (if it is much larger), since we know it will no longer be mutated.

    > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000))
    {"type":"STRING", "class":"0x7feaf00b7bf0", "bytesize":30, "capacity":1000, "value":"...

    > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000).freeze)
    {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "bytesize":30, "value":"...

(ObjectSpace.dump doesn't show capacity if capacity is equal to bytesize)

Previously, if we dedup into an fstring, using String#-@, capacity would not be reduced.

    > puts ObjectSpace.dump(-String.new("a"*30, capacity: 1000))
    {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "fstring":true, "bytesize":30, "capacity":1000, "value":"...

This commit makes rb_fstring call rb_str_resize, the same as rb_str_freeze does.

Closes: https://github.com/ruby/ruby/pull/2256
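One way to observe the effect without parsing `ObjectSpace.dump` output is to compare `ObjectSpace.memsize_of` before and after deduplication. This is only a sketch: the reported sizes are implementation-specific, so no particular numbers are claimed.

```ruby
require "objspace"

roomy = String.new("a" * 30, capacity: 1000)
interned = -roomy                      # deduplicated, frozen string

puts ObjectSpace.memsize_of(roomy)     # includes the oversized buffer
puts ObjectSpace.memsize_of(interned)  # expected to be smaller after this change
p interned.frozen?                     # => true
```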
2019-06-21 | * expand tabs. | git
2019-06-21 | Get rid of undefined behavior | Nobuyoshi Nakada
* string.c (rb_str_sub_bang): str and repl can be same. [Bug #15946]
2019-06-19 | New buffer for shared string | Nobuyoshi Nakada
* string.c (rb_str_init): allocate new buffer if the string is shared. [Bug #15937]
2019-06-19 | Preserve the string content at self-copying | Nobuyoshi Nakada
* string.c (rb_str_init): preserve the embedded content when self-copying with a capacity. [Bug #15937]
2019-06-18 | Fix memory leak | Nobuyoshi Nakada
* string.c (str_make_independent_expand): free independent buffer. [Bug# 15935] Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
2019-06-18 | * expand tabs. | git
2019-06-18 | String#b: Don't depend on dependent string | Alan Wu
Registering a string that depends on a dependent string as an fstring can lead to use-after-free. See c06ddfe and 3f95620 for details. The following script triggers use-after-free on trunk, 2.4.6, 2.5.5 and 2.6.3. Credits to @wanabe for using eval as a cross-version way of registering an fstring.

```ruby
a = ('j' * 24).b.b
eval('', binding, a)
p a
4.times { GC.start }
p a
```

- string.c (str_replace_shared_without_enc): when given a dependent string, depend on the root of the dependent string. [Bug #15934]
2019-06-16 | Fix memory leak | Nobuyoshi Nakada
* string.c (str_replace_shared_without_enc): free previous buffer before replaced. * parse.y (gettable): make sure in advance that the `__FILE__` object shares a fstring, to get rid of replacement with the fstring later. TODO: this hack may be needed in other places. [Bug #15916] Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
2019-05-14 | Symbol just represents a name | Nobuyoshi Nakada
2019-05-09 | str_duplicate: Don't share with a frozen shared string | Alan Wu
This is a follow up for 3f9562015e651735bfc2fdd14e8f6963b673e22a. Before this commit, it was possible to create a shared string which shares with another shared string by passing a frozen shared string to `str_duplicate`.

Such string looks like:

```
--------                    -----------------
| root | ------ owns -----> | root's buffer |
--------                    -----------------
    ^                             ^   ^
-----------                       |   |
| shared1 | ------ references -----   |
-----------                           |
    ^                                 |
-----------                           |
| shared2 | ------ references ---------
-----------
```

This is bad news because `rb_fstring(shared2)` can make `shared1` independent, which severs the reference from `shared1` to `root`:

```c
/* from fstr_update_callback() */
str = str_new_frozen(rb_cString, shared2);  /* can return shared1 */
if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
    str_make_independent(str);  /* no frozen check */
}
```

If `shared1` was the only reference to `root`, then `root` can be reclaimed by the GC, leaving `shared2` in a corrupted state:

```
-----------                         --------------------
| shared1 | -------- owns --------> | shared1's buffer |
-----------                         --------------------
    ^
    |
-----------                         -------------------------
| shared2 | ------ references ----> | root's buffer (freed) |
-----------                         -------------------------
```

Here is a reproduction script for the situation this commit fixes.

```ruby
a = ('a' * 24).strip.freeze.strip
-a
p a
4.times { GC.start }
p a
```

- string.c (str_duplicate): always share with the root string when the original is a shared string.
- test_rb_str_dup.rb: specifically test `rb_str_dup` to make sure it does not try to share with a shared string.

[Bug #15792]

Closes: https://github.com/ruby/ruby/pull/2159
2019-05-06 | Revert "UTF-8 is one of byte based encodings" | Nobuyoshi Nakada
This reverts commit 5776ae347540ac19c40d146a3566a806cd176bf1. Mistook `max` for `min`.
2019-05-05 | Improve documentation for String#{dump,undump} | Marcus Stollsteimer
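For readers unfamiliar with the two methods, a brief round-trip sketch (the sample string is arbitrary): `dump` produces a printable, escaped representation and `undump` reverses it.

```ruby
s = "caf\u00E9\t\u{1F600}\n"

dumped = s.dump
puts dumped                 # an escaped, double-quoted representation
p dumped.ascii_only?        # => true
p dumped.undump == s        # => true
```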
2019-05-03 | * expand tabs. | git
2019-05-03 | Improve performance of case-conversion methods | Nobuyoshi Nakada
2019-05-03 | UTF-8 is one of byte based encodings | Nobuyoshi Nakada
2019-05-02 | * expand tabs. | git
2019-05-02 | Fix potential memory leak | Nobuyoshi Nakada
2019-04-29 | this variable is not guaranteed aligned | Urabe, Shyouhei
No problem for unaligned-ness because we never dereference.