path: root/string.c
Age | Commit message | Author
2019-07-02 | Check that String#scrub block does not modify receiver | Jeremy Evans
This is similar to the check used for String#gsub, and can fix a possible segfault. Fixes [Bug #15941]
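A minimal Ruby sketch of the contract this check enforces, assuming a Ruby version that includes this commit (2.7 or later). A String#scrub block should only compute a replacement from the invalid bytes it receives. A block that mutates the receiver is expected to raise RuntimeError, the same "string modified" guard String#gsub uses, rather than risk a segfault. The variable names are illustrative only.

```ruby
# Well-behaved block: derives the replacement from the invalid bytes alone.
input = "abc\u3042\x81"
scrubbed = input.scrub { |bytes| "<#{bytes.unpack1('H*')}>" }

# Misbehaving block: appends to the receiver while it is being scrubbed.
# With the guard in place this is expected to raise RuntimeError.
target = "a\xFFb".dup
mutation_rejected =
  begin
    target.scrub { |_bytes| target << "x"; "?" }
    false
  rescue RuntimeError
    true
  end
```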
2019-07-02 | Make String#-@ not freeze receiver if called on unfrozen subclass instance | Jeremy Evans
rb_fstring behavior in this case is to freeze the receiver. I'm not sure if that should be changed, so this takes the conservative approach of duping the receiver in String#-@ before passing to rb_fstring. Fixes [Bug #15926]
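A small Ruby sketch of the behavior after this change, assuming a Ruby version that includes it. `TaggedString` is a hypothetical subclass introduced only for this illustration. String#-@ still returns a frozen, deduplicated string, but the unfrozen subclass receiver is duped first, so the receiver itself stays unfrozen.

```ruby
# TaggedString is a hypothetical subclass used only for this illustration.
class TaggedString < String; end

original = TaggedString.new("hello")
deduped = -original # String#-@ returns a frozen, deduplicated string
```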
2019-06-29 | * expand tabs. | git
2019-06-29 | Fixed String#grapheme_clusters with wide encodings | Nobuyoshi Nakada
* string.c (get_reg_grapheme_cluster): make the regexp from properly encoded sources for wide-char encodings. [Bug #15965]
* regparse.c (node_extended_grapheme_cluster): suppress a false duplicated-range warning for the time being.
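A short Ruby sketch of what this fix enables, assuming a Ruby version that includes it. String#grapheme_clusters on a UTF-16LE string should group "e" plus U+0301 COMBINING ACUTE ACCENT into one cluster, and each cluster keeps the receiver's wide encoding.

```ruby
# "e" followed by U+0301 COMBINING ACUTE ACCENT forms one grapheme cluster.
wide = "e\u0301a".encode(Encoding::UTF_16LE)
clusters = wide.grapheme_clusters
```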
2019-06-26 | Resize capacity for fstring | John Hawthorn
When a string is #frozen, its capacity is resized to fit (if it is much larger), since we know it will no longer be mutated.
    > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000))
    {"type":"STRING", "class":"0x7feaf00b7bf0", "bytesize":30, "capacity":1000, "value":"...
    > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000).freeze)
    {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "bytesize":30, "value":"...
(ObjectSpace.dump doesn't show capacity if capacity is equal to bytesize.)
Previously, if we deduplicated into an fstring using String#-@, the capacity would not be reduced.
    > puts ObjectSpace.dump(-String.new("a"*30, capacity: 1000))
    {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "fstring":true, "bytesize":30, "capacity":1000, "value":"...
This commit makes rb_fstring call rb_str_resize, the same as rb_str_freeze does.
Closes: https://github.com/ruby/ruby/pull/2256
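A Ruby sketch that checks the effect with ObjectSpace.memsize_of instead of reading ObjectSpace.dump output, assuming CRuby with this change applied. Exact byte counts depend on the Ruby version and platform, so the sketch only compares against the 1000-byte capacity request. The variable names are illustrative.

```ruby
require "objspace"

# A 30-byte string with room for 1000 bytes.
roomy = String.new("a" * 30, capacity: 1000)

# The same content deduplicated with String#-@. After this change rb_fstring
# shrinks the buffer to fit, the way String#freeze does.
deduped = -String.new("a" * 30, capacity: 1000)

# memsize_of reports the memory attributed to each object.
roomy_bytes = ObjectSpace.memsize_of(roomy)
deduped_bytes = ObjectSpace.memsize_of(deduped)
```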
2019-06-21 | * expand tabs. | git
2019-06-21 | Get rid of undefined behavior | Nobuyoshi Nakada
* string.c (rb_str_sub_bang): str and repl can be the same object. [Bug #15946]
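A minimal Ruby sketch of the case this commit makes well-defined, assuming a Ruby version that includes it. The receiver of String#sub! is also passed as the replacement string. The expected result is the original "abc" spliced in place of "b".

```ruby
s = String.new("abc")
# The receiver doubles as the replacement string.
s.sub!("b", s)
```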
2019-06-19 | New buffer for shared string | Nobuyoshi Nakada
* string.c (rb_str_init): allocate new buffer if the string is shared. [Bug #15937]
2019-06-19 | Preserve the string content at self-copying | Nobuyoshi Nakada
* string.c (rb_str_init): preserve the embedded content when self-copying with a capacity. [Bug #15937]
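A Ruby sketch of the self-copying case from [Bug #15937], assuming a Ruby version that includes these two fixes. String#initialize is private, so the sketch calls it through `send`. It passes the receiver itself as the source while requesting a capacity, and the content is expected to survive.

```ruby
s = String.new("hello")
# Re-initialize the string from itself while asking for a larger capacity.
s.send(:initialize, s, capacity: 100)
```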
2019-06-18 | Fix memory leak | Nobuyoshi Nakada
* string.c (str_make_independent_expand): free the independent buffer. [Bug #15935]
Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
2019-06-18 | * expand tabs. | git
2019-06-18 | String#b: Don't depend on dependent string | Alan Wu
Registering a string that depends on a dependent string as an fstring can lead to use-after-free. See c06ddfe and 3f95620 for details.
The following script triggers use-after-free on trunk, 2.4.6, 2.5.5 and 2.6.3. Credits to @wanabe for using eval as a cross-version way of registering an fstring.
```ruby
a = ('j' * 24).b.b
eval('', binding, a)
p a
4.times { GC.start }
p a
```
- string.c (str_replace_shared_without_enc): when given a dependent string, depend on the root of the dependent string. [Bug #15934]
2019-06-16 | Fix memory leak | Nobuyoshi Nakada
* string.c (str_replace_shared_without_enc): free the previous buffer before it is replaced.
* parse.y (gettable): make sure in advance that the `__FILE__` object shares an fstring, to avoid replacing it with the fstring later.
  TODO: this hack may be needed in other places.
[Bug #15916]
Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
2019-05-14 | Symbol just represents a name | Nobuyoshi Nakada
2019-05-09 | str_duplicate: Don't share with a frozen shared string | Alan Wu
This is a follow-up to 3f9562015e651735bfc2fdd14e8f6963b673e22a.
Before this commit, it was possible to create a shared string which shares with another shared string by passing a frozen shared string to `str_duplicate`. Such a string looks like this:
```
--------                    -----------------
| root | ------ owns -----> | root's buffer |
--------                    -----------------
    ^                             ^   ^
-----------                       |   |
| shared1 | ------ references -----   |
-----------                           |
    ^                                 |
-----------                           |
| shared2 | ------ references ---------
-----------
```
This is bad news because `rb_fstring(shared2)` can make `shared1` independent, which severs the reference from `shared1` to `root`:
```c
/* from fstr_update_callback() */
str = str_new_frozen(rb_cString, shared2); /* can return shared1 */
if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
    str_make_independent(str); /* no frozen check */
}
```
If `shared1` was the only reference to `root`, then `root` can be reclaimed by the GC, leaving `shared2` in a corrupted state:
```
-----------                         --------------------
| shared1 | -------- owns --------> | shared1's buffer |
-----------                         --------------------
     ^
     |
-----------                         -------------------------
| shared2 | ------ references ----> | root's buffer (freed) |
-----------                         -------------------------
```
Here is a reproduction script for the situation this commit fixes.
```ruby
a = ('a' * 24).strip.freeze.strip
-a
p a
4.times { GC.start }
p a
```
- string.c (str_duplicate): always share with the root string when the original is a shared string.
- test_rb_str_dup.rb: specifically test `rb_str_dup` to make sure it does not try to share with a shared string.
[Bug #15792]
Closes: https://github.com/ruby/ruby/pull/2159
2019-05-06 | Revert "UTF-8 is one of byte based encodings" | Nobuyoshi Nakada
This reverts commit 5776ae347540ac19c40d146a3566a806cd176bf1.
Mistook `max` for `min`.
2019-05-05 | Improve documentation for String#{dump,undump} | Marcus Stollsteimer
2019-05-03 | * expand tabs. | git
2019-05-03 | Improve performance of case-conversion methods | Nobuyoshi Nakada
2019-05-03 | UTF-8 is one of byte based encodings | Nobuyoshi Nakada
2019-05-02 | * expand tabs. | git
2019-05-02 | Fix potential memory leak | Nobuyoshi Nakada
2019-04-29 | this variable is not guaranteed aligned | Urabe, Shyouhei
The lack of alignment is not a problem, because the variable is never dereferenced.
2019-04-29 | fix typo | Urabe, Shyouhei
2019-04-27 | Get rid of indirect sharing | Nobuyoshi Nakada
* string.c (str_duplicate): share the root shared string if the original string is already sharing, so that all shared strings refer to the root shared string directly. Indirect sharing can cause a dangling pointer. [Bug #15792]
2019-04-18 | string.c: warn non-nil $; | nobu
* string.c (rb_str_split_m): warn on use of a non-nil $;.
* string.c (rb_fs_setter): warn when $; is set to a non-nil value.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67603 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-17 | string.c: improve splitting into chars | nobu
* string.c (rb_str_split_m): improve splitting into chars by an empty string, without a regexp.
Comparison:
             to_chars-1
  built-ruby:  1273527.6 i/s
compare-ruby:   189423.3 i/s - 6.72x slower
             to_chars-10
  built-ruby:   120993.5 i/s
compare-ruby:    37075.8 i/s - 3.26x slower
             to_chars-100
  built-ruby:    15646.4 i/s
compare-ruby:     4012.1 i/s - 3.90x slower
             to_chars-1000
  built-ruby:     1295.1 i/s
compare-ruby:      408.5 i/s - 3.17x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67582 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
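A tiny Ruby sketch of the operation this commit speeds up: splitting by an empty string, which yields one-character strings just like String#chars. The sample text is illustrative. It includes a non-ASCII character so the split is clearly per character rather than per byte.

```ruby
text = "h\u00E9llo"
# Splitting on "" returns each character as its own string.
pieces = text.split("")
```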
2019-03-20 | string.c: [DOC] fix reference to sprintf [ci skip] | nobu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67312 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-20 | string.c: [DOC] remove unnecessary markups [ci skip] | nobu
* string.c: remove <code> markup, which was not only unnecessary but also prevented cross-references.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67311 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-20 | string.c: [DOC] fix indent [ci skip] | nobu
* string.c (rb_str_crypt): fix the indentation so that the whole list is not rendered verbatim.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67310 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-05 | string.c: respect the actual encoding | nobu
* string.c (rb_enc_str_coderange): respect the actual encoding if a BOM is present, and scan for the actual code range. [ruby-core:91662] [Bug #15635]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67167 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-02-07 | * string.c (chopped_length): early return for empty strings | nobu
[Bug #11391]
From: Franck Verrot <franck@verrot.fr>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67018 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
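A short Ruby sketch of String#chop around the empty-string case handled here, with one ordinary case for contrast. Chopping an empty string, in any encoding, is expected to give an empty string, and a trailing CRLF is removed as one unit. The choice of UTF-16LE as the second encoding is an illustrative assumption, not taken from the bug report.

```ruby
from_empty = "".chop
from_wide_empty = "".encode(Encoding::UTF_16LE).chop
from_crlf = "ab\r\n".chop # a trailing CRLF is removed as a single unit
```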
2019-01-22 | Add more examples of `String#dump` | kazu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66906 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-21 | Improvements to documentation. | samuel
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66897 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-21 | string.c (rb_str_dump): Fix the rdoc | mame
* Officially state that String#dump is intended for round-tripping.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
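A minimal Ruby sketch of the round trip the documentation now promises, assuming Ruby 2.5 or later (where String#undump exists). String#dump produces an ASCII-only, double-quoted literal with escapes, and String#undump turns it back into the original string. The sample text is illustrative.

```ruby
original = "tab:\t quote:\" hiragana:\u3042\n"
dumped = original.dump    # double-quoted, escaped, ASCII-only
restored = dumped.undump  # reverses String#dump
```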
2019-01-15 | Use `&` instead of `modulo` | nobu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66830 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-15 | setbyte / ungetbyte allow out-of-range integers | shyouhei
* string.c: String#setbyte to accept arbitrary integers [Bug #15460]
* io.c: ditto for IO#ungetbyte
* ext/stringio/stringio.c: ditto for StringIO#ungetbyte
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66824 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
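A Ruby sketch of the relaxed String#setbyte, assuming a Ruby version that includes this commit. An out-of-range integer is no longer rejected. Only its low 8 bits are stored, so 0x142 becomes 0x42 ("B") and -1 becomes 0xFF.

```ruby
high = String.new("a")
high.setbyte(0, 0x142) # only the low 8 bits, 0x42 ("B"), are stored

negative = String.new("a")
negative.setbyte(0, -1) # the low 8 bits of -1 are 0xFF
```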
2019-01-08 | Defer escaping control char in error messages | nobu
* eval_error.c (print_errinfo): defer escaping control characters in error messages until writing to stderr, instead of quoting them while building the message. [ruby-core:90853] [Bug #15497]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66753 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-26 | string.c: remove the deprecation warnings of `String#bytes` with block | mame
And its friends: lines, chars, grapheme_clusters, and codepoints. [Feature #6670] [ruby-core:90728]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66579 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
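A small Ruby sketch of the block form that no longer warns, assuming a Ruby version that includes this commit. With a block, String#bytes yields each byte and returns the receiver instead of building an array. The variable names are illustrative.

```ruby
source = String.new("hi")
seen = []
# With a block, String#bytes yields each byte and returns the receiver.
returned = source.bytes { |byte| seen << byte }
```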
2018-12-26 | Revert "string.c: remove the deprecation warnings of `String#bytes` with block" | mame
Forgot to write the ticket number in the commit log...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66578 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-26 | string.c: remove the deprecation warnings of `String#bytes` with block | mame
And its friends: lines, chars, grapheme_clusters, and codepoints.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-12 | string.c: [DOC] fix typos | stomar
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66375 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-09 | implement special behavior for Georgian for String#capitalize | duerst
The modern Georgian script is special in that it has an 'uppercase' variant called MTAVRULI which can be used for emphasis of whole words, for screamy headlines, and so on. However, in contrast to all other bicameral scripts, the first letter of a word or a sentence is never capitalized. Words with mixed capitalization are not used at all.
We therefore implement special behavior for String#capitalize. Formally, we define String#capitalize as first applying String#downcase to the whole string, then using titlecase on the first letter. Because Georgian defines titlecase as the identity function both for MTAVRULI ('uppercase') and Mkhedruli (lowercase), this results in String#capitalize being equivalent to String#downcase for Georgian. This avoids undesirable mixed case.
* enc/unicode.c: Actual implementation
* string.c: Add mention of this special case for documentation
* test/ruby/enc/test_case_mapping.rb: Add two tests: a general one that uses String#capitalize on some (including nonsensical) combinations of MTAVRULI and Mkhedruli, and a canary test to detect the potential assignment of characters to the currently open slots (holes) at U+1CBB and U+1CBC.
* test/ruby/enc/test_case_comprehensive.rb: Tweak generation of expectation data.
Together with r65933, this closes issue #14839.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
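A Ruby sketch of the rule described above, assuming a Ruby version with Unicode 11 case data and this commit. Capitalizing MTAVRULI text should give the same result as downcasing it. Capitalizing Mkhedruli text should leave it unchanged, because the titlecase of a Mkhedruli letter is the letter itself.

```ruby
mtavruli = "\u1C90\u1C91"  # GEORGIAN MTAVRULI CAPITAL LETTER AN, BAN
mkhedruli = "\u10D0\u10D1" # GEORGIAN LETTER AN, BAN (Mkhedruli)

capitalized_upper = mtavruli.capitalize
capitalized_lower = mkhedruli.capitalize
```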
2018-12-06 | suppress warning: unused variable 'vbits' | naruse
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66245 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-06 | Prefer rb_check_arity when 0 or 1 arguments | nobu
This is especially preferable to checking argc and then calling rb_scan_args just to raise an ArgumentError.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66238 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-03 | string.c: [DOC] deprecate String#crypt [ci skip] [Feature #14915] | shyouhei
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66154 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 | * expand tabs. | svn
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65957 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 | fix r65954; Keep tainty | naruse
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65956 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 | Don't use single byte optimization on grapheme clusters | naruse
Unicode Text Segmentation (UAX #29) treats CRLF as a single grapheme cluster. [Bug #15337]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65954 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
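A one-line Ruby sketch of the case this commit fixes, assuming a Ruby version that includes it. The string is ASCII-only, which is exactly where the single-byte shortcut used to apply. CRLF should still come back as one grapheme cluster rather than two.

```ruby
clusters = "a\r\nb".grapheme_clusters
```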
2018-11-21 | char is not unsigned | shyouhei
It seems that, decades ago, Ruby was written under the assumption that char is unsigned. That assumption is of course false in general: whether plain char is signed is implementation-defined in C. We need to explicitly store the numeric value in an unsigned char variable to indicate that we expect a value in 0..255.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65900 b2dd03c8-39d4-4d8f-98ff-823fe69b080e