summaryrefslogtreecommitdiff
path: root/string.c
AgeCommit message (Collapse)Author
2017-02-05doc: Add example for Symbol#to_snormal
* string.c: add example for Symbol#to_s. The docs for Symbol#to_s only include an example for Symbol#id2name, but not for #to_s which is an alias; the docs should include examples for both methods. From: Marcus Stollsteimer <sto.mar@web.de> git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57536 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-03symbol.c (rb_id2str): eliminate branch to set classnormal
Since the fstring table encompasses all strings in the symbol table, we may reuse the fstring table walk to set the class and eliminate the branch in rb_id2str. * string.c (Init_String): use rb_cString immediately after definition * symbol.c (rb_id2str): eliminate branch to set class git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57521 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-30string.c (rb_str_tmp_frozen_release): release embedded stringsnormal
Handle the embedded case first, since we may have an embedded duplicate and non-embedded original string. * string.c (rb_str_tmp_frozen_release): handled embedded strings * test/ruby/test_io.rb (test_write_no_garbage): new test [ruby-core:78898] [Bug #13085] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57471 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-30io.c: recycle garbage on writenormal
* string.c (STR_IS_SHARED_M): new flag to mark shared mulitple times (STR_SET_SHARED): set STR_IS_SHARED_M (rb_str_tmp_frozen_acquire, rb_str_tmp_frozen_release): new functions (str_new_frozen): set/unset STR_IS_SHARED_M as appropriate * internal.h: declare new functions * io.c (fwrite_arg, fwrite_do, fwrite_end): new (io_fwrite): use new functions Introduce rb_str_tmp_frozen_acquire and rb_str_tmp_frozen_release to manage a hidden, frozen string. Reuse one bit of the embed length for shared strings as STR_IS_SHARED_M to indicate a string has been shared multiple times. In the common case, the string is only shared once so the object slot can be reclaimed immediately. minimum results in each 3 measurements. (time and size) Execution time (sec) name trunk built io_copy_stream_write 0.682 0.254 io_copy_stream_write_socket 1.225 0.751 Speedup ratio: compare with the result of `trunk' (greater is better) name built io_copy_stream_write 2.680 io_copy_stream_write_socket 1.630 Memory usage (last size) (B) name trunk built io_copy_stream_write 95436800.000 6512640.000 io_copy_stream_write_socket 117628928.000 7127040.000 Memory consuming ratio (size) with the result of `trunk' (greater is better) name built io_copy_stream_write 14.654 io_copy_stream_write_socket 16.505 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57469 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-19string.c: rindex(//) should set $~.shugo
This seems a bug introduced by r520 (1.4.0). [ruby-core:79110] [Bug #13135] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57374 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-16file.c: refine messagenobu
* file.c (rb_get_path_check_convert): refine the error message when the path name contains null byte. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57336 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-11string.c: replacement and blocknobu
* string.c (rb_enc_str_scrub): only one of replacement and block is allowed. [ruby-core:79038] [Bug #13119] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57304 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-11string.c: yield invalid partnobu
* string.c (rb_enc_str_scrub): yield the invalid part only with ASCII-incompatible. [ruby-core:79039] [Bug #13120] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57303 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-11string.c: block for scrub with ASCII-incompatiblenobu
* string.c (rb_enc_str_scrub): honor the given block with ASCII-incompatible encoding. [ruby-core:79039] [Bug #13120] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57302 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-25string.c: CRLF in paragraph modenobu
* string.c (rb_str_enumerate_lines): allow CRLF to separate paragraphs. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57185 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-25string.c: consistent paragraph mode with IOnobu
* string.c (rb_str_enumerate_lines): in paragraph mode, do not include newlines which separate paragraphs, so that it will be consistent with IO#each_line. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57184 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-22string.c: suppress a warningnobu
* string.c (rb_str_casecmp_p): [DOC] use Unicode escape form to get rid of warning C4819 by Microsoft Visual C++. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57154 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-20string.c: add missing size_t castrhe
Add size_t cast to avoid signed integer overflow. r56157 ("string.c: avoid signed integer overflow", 2016-09-13) missed this. Suppresses UBSan. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57122 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-16no crypt.h on FreeBSD 12nobu
* string.c (crypt.h): crypt_r() was added in FreeBSD 12.0 but is declared in unistd.h. [ruby-core:78664] [Bug #13038] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-16fix chomping newline only linenobu
* string.c (chomp_newline): fix chomping newline only line. rb_enc_prev_char return NULL if no previous character and must not call rb_enc_ascget on it. a patch by Ary Borenszweig <asterite AT gmail.com> at [ruby-core:78666]. [Bug #13037] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-12string.c: fix method name in rdoc [ci skip]nobu
* string.c (rb_str_equal): [DOC] fix fallback method name. the peer's == method will be used, not ===. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57056 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-12String#match? and Symbol#match?nobu
* string.c (rb_str_match_m_p): inverse of Regexp#match?. based on the patch by Herwin Weststrate <herwin@snt.utwente.nl>. [Fix GH-1483] [Feature #12898] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57053 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-03string.c: chomp optionnobu
* string.c (rb_str_enumerate_lines): implement chomp option. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56972 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-29Fix/improve documentation of String/Symbol#casecmp[?]duerst
Fix documentation of String#casecmp? (examples didn't have the '?'). Add an example with non-ASCII characters. Clarify that casecmp, unlike casecmp?, only does case-insensitivity on A-Z/a-z. [ci skip] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56926 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-29string.c: use xmallocnobu
* string.c (rb_str_casemap): use xmalloc simply instead of ALLOC_N. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56920 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-28string.c: fix zero-length arraynobu
* string.c (mapping_buffer): get rid of zero-length array member, which is not a part of C90. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-28string.c: enable rdocnobu
* string.c (rb_str_casecmp_p): [DOC] move forward declaration of rb_str_downcase to enable rdoc. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56913 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-28implement String/Symbol#casecmp? including Unicode case foldingduerst
* string.c: Implement String#casecmp? and Symbol#casecmp? by using String#downcase :fold for Unicode case folding. This does not include options such as :turkic, because these currently cannot be combined with the :fold option. This implements feature #12786. * test/ruby/test_string.rb/test_symbol.rb: Tests for above. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-05chomp optionnobu
* io.c (extract_getline_opts): extract chomp option. [Feature #12553] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56581 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-26[DOC] replace Fixnum with Integer [ci skip]nobu
* numeric.c: [DOC] update document for Integer class. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56492 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-21Fixed typo [ci skip]nobu
* string.c (rb_str_sub, rb_str_gsub): [DOC] 'backlash' should read 'backslash'. [Fix GH-1461] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56460 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-04* internal.h (ST2FIX): new macro to convert st_index_t to Fixnum.usa
a hash value of Object might be Bignum, but it causes many troubles expecially the Object is used as a key of a hash. so I've gave up to do so. * array.c (rb_ary_hash): use above macro. * bignum.c (rb_big_hash): ditto. * hash.c (rb_obj_hash, rb_hash_hash): ditto. * numeric.c (rb_dbl_hash): ditto. * proc.c (proc_hash): ditto. * re.c (rb_reg_hash, match_hash): ditto. * string.c (rb_str_hash_m): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56340 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-01string.c: negative hashnobu
* string.c (rb_str_hash_m): hash values may be negative. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56321 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-01* string.c (rb_str_hash_m): st_index_t is not guaranteed as the sameusa
size with int, and of course also not guaranteed the value can be Fixnum. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-26string.c: fast path of lstrip_offsetnobu
* string.c (lstrip_offset): add a fast path in the case of single byte optimizable strings, as well as rstrip_offset. [ruby-core:77392] [Feature #12788] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56250 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-26string.c: fix integer overflow in enc_strlen() and rb_enc_strlen_cr()rhe
* string.c (enc_strlen, rb_enc_strlen_cr): Avoid signed integer overflow. The result type of a pointer subtraction may have the same size as long. This fixes String#size returning an negative value on i686-linux environment: str = "\x00" * ((1<<31)-2)) str.slice!(-3, 3) str.force_encoding("UTF-32BE") str << 1234 p str.size git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56247 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-16 * internal.h (WARN_UNUSED_RESULT): moved to configure.in, toshyouhei
actually check its availability rather to check GCC's version. * configure.in (WARN_UNUSED_RESULT): moved to here. * configure.in (RUBY_FUNC_ATTRIBUTE): change function declaration to return int rather than void, because it makes no sense for a warn_unused_result attributed function to return void. Funny thing however is that it also makes no sense for noreturn attributed function to return int. So there is a fundamental conflict between them. While I tested this, I confirmed both GCC 6 and Clang 3.8 prefers int over void to correctly detect necessary attributes under this setup. Maybe subject to change in future. * internal.h (UNINITIALIZED_VAR): renamed to MAYBE_UNUSED, then moved to configure.in for the same reason we move WARN_UNUSED_RESULT. * configure.in (MAYBE_UNUSED): moved to here. * internal.h (__has_attribute): deleted, because it has no use now. * string.c (rb_str_enumerate_lines): refactor macro rename. * string.c (rb_str_enumerate_bytes): ditto. * string.c (rb_str_enumerate_chars): ditto. * string.c (rb_str_enumerate_codepoints): ditto. * thread.c (do_select): ditto. * vm_backtrace.c (rb_debug_inspector_open): ditto. * vsnprintf.c (BSD_vfprintf): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56169 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-13string.c: avoid signed integer overflowrhe
The behavior on signed integer overflow is undefined. On platform with sizeof(long)==4, it's fairly easy that 'len + termlen' overflows, where len is the string length and termlen is the terminator length. So, prevent the integer overflow by avoiding adding to a string length, or casting to size_t before adding where the total size is passed to {RE,}ALLOC*(). * string.c (STR_HEAP_SIZE, RESIZE_CAPA_TERM, str_new0, rb_str_buf_new, str_shared_replace, rb_str_init, str_make_independent_expand, rb_str_resize): Avoid overflow by casting the length to size_t. size_t should be able to represent LONG_MAX+termlen. * string.c (rb_str_modify_expand): Check that the new length is in the range of long before resizing. Also refactor to use RESIZE_CAPA_TERM macro. * string.c (str_buf_cat): Fix so that it does not create a negative length String. Also fix the condition for 'string sizes too big', the total length can be up to LONG_MAX. * string.c (rb_str_plus): Check the resulting String length does not exceed LONG_MAX. * string.c (rb_str_dump): Fix integer overflow. The dump result will be longer then the original String. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56157 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-13string.c: rename STR_EMBEDABLE_P to STR_EMBEDDABLE_Prhe
* string.c (STR_EMBEDDABLE_P): Renamed from STR_EMBEDABLE_P(). And use it in more places. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56155 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-13string.c: STR_EMBEDABLE_Pnobu
* string.c (STR_EMBEDABLE_P): extract the predicate macro to tell if the given length is capable in an embedded string, and fix possible integer overflow. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56151 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-13string.c: fix integer overflownobu
* string.c (rb_str_change_terminator_length): fix integer overflow in the case growing the terminator length and the string length is around LONG_MAX. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56149 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-13string.c: fix buffer overflow check condition in rb_str_set_len()rhe
* string.c (rb_str_set_len): The buffer overflow check is wrong. The space for termlen is allocated outside the capacity returned by rb_str_capacity(). This fixes r41920 ("string.c: multi-byte terminator", 2013-07-11). [ruby-core:77257] [Bug #12757] * test/-ext-/string/test_set_len.rb (test_capacity_equals_to_new_size): Test for this change. Applying only the test will trigger [BUG]. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56148 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-08replace fixnum by integer in documents.akr
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56102 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-08-27multiple argumentsnobu
* array.c (rb_ary_concat_multi): take multiple arguments. based on the patch by Satoru Horie. [Feature #12333] * string.c (rb_str_concat_multi, rb_str_prepend_multi): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56021 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-08-23string.c: rb_fs_setternobu
* string.c (rb_fs_setter): check and convert $; value at assignment. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55990 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-08-22string.c: $; name in error messagenobu
* string.c (rb_str_split_m): show $; name in error message when it is a wrong object. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55986 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-30* string.c (String#downcase), NEWS: Mentioned that case mapping for allduerst
of ISO-8859-1~16 is now supported. [ci skip] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55777 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-29rb_funcallvnobu
* *.c: rename rb_funcall2 to rb_funcallv, except for extensions which are/will be/may be gems. [Fix GH-1406] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55773 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-28* vm_core.h: revisit the structure of frame, block and env.ko1
[Bug #12628] This patch introduce many changes. * Introduce concept of "Block Handler (BH)" to represent passed blocks. * move rb_control_frame_t::flag to ep[0] (as a special local variable). This flags represents not only frame type, but also env flags such as escaped. * rename `rb_block_t` to `struct rb_block`. * Make Proc, Binding and RubyVM::Env objects wb-protected. Check [Bug #12628] for more details. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55766 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-22Fix Issues reported by PVS-Studio static analyzernobu
* vm.c (vm_set_main_stack): remove unnecessary check. toplevel binding must be initialized. [Bug #12611] (N1) * win32/win32.c (w32_symlink): fix return type. [Bug #12611] (N3) * string.c (rb_str_split_m): simplify the condition. [Bug #12611](N4) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55729 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-22* string.c (String#dump): Change escaping of non-ASCII characters induerst
UTF-8 to use upper-case four-digit hexadecimal escapes without braces where possible [Feature #12419]. * test/ruby/test_string.rb (test_dump): Add tests for above. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55728 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15* string.c (str_buf_cat): Fix potential interger overflow of capa.ngoto
In addition, termlen is used instead of +1. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55692 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15* string.c (str_buf_cat): Fix capa size for embed string.ngoto
Fix bug in r55547. [Bug #12536] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55691 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-14string.c: reduce malloc overhead for default buffer sizenormal
* string.c (STR_BUF_MIN_SIZE): reduce from 128 to 127 [ruby-core:76371] [Feature #12025] * string.c (rb_str_buf_new): adjust for above reduction From Jeremy Evans <code@jeremyevans.net>: This changes the minimum buffer size for string buffers from 128 to 127. The underlying C buffer is always 1 more than the ruby buffer, so this changes the actual amount of memory used for the minimum string buffer from 129 to 128. This makes it much easier on the malloc implementation, as evidenced by the following code (note that time -l is used here, but Linux systems may need time -v). $ cat bench_mem.rb i = ARGV.first.to_i Array.new(1000000){" " * i} $ /usr/bin/time -l ruby bench_mem.rb 128 3.10 real 2.19 user 0.46 sys 289080 maximum resident set size 72673 minor page faults 13 block output operations 29 voluntary context switches $ /usr/bin/time -l ruby bench_mem.rb 127 2.64 real 2.09 user 0.27 sys 162720 maximum resident set size 40966 minor page faults 2 block output operations 4 voluntary context switches To try to ensure a power-of-2 growth, when a ruby string capacity needs to be increased, after doubling the capacity, add one. This ensures the ruby capacity will be odd, which means actual amount of memory used will be even, which is probably better than the current case of the ruby capacity being even and the actual amount of memory used being odd. A very similar patch was proposed 4 years ago in feature #5875. It ended up being rejected, because no performance increase was shown. One reason for that is that ruby does not use STR_BUF_MIN_SIZE unless rb_str_buf_new is called, and that previously did not have a ruby API, only a C API, so unless you were using a C extension that called it, there would be no performance increase. With the recently proposed feature #12024, String.buffer is added, which is a ruby API for creating string buffers. Using String.buffer(100) wastes much less memory with this patch, as the malloc implementation can more easily deal with the power-of-2 sized memory usage. As measured above, memory usage is 44% less, and performance is 17% better. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55686 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-05* string.c (rb_str_change_terminator_length): New function to changengoto
termlen and resize heap for the terminator. This is split from rb_str_fill_terminator (str_fill_term) because filling terminator and changing terminator length are different things. [Bug #12536] * internal.h: declaration for rb_str_change_terminator_length. * string.c (str_fill_term): Simplify only to zero-fill the terminator. For non-shared strings, it assumes that (capa + termlen) bytes of heap is allocated. This partially reverts r55557. * encoding.c (rb_enc_associate_index): rb_str_change_terminator_length is used, and it should be called whenever the termlen is changed. * string.c (str_capacity): New static function to return capacity of a string with the given termlen, because the termlen may sometimes be different from TERM_LEN(str) especially during changing termlen or filling terminator with specific termlen. * string.c (rb_str_capacity): Use str_capacity. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e