summaryrefslogtreecommitdiff
path: root/string.c
AgeCommit message (Collapse)Author
2016-07-22Fix Issues reported by PVS-Studio static analyzernobu
* vm.c (vm_set_main_stack): remove unnecessary check. toplevel binding must be initialized. [Bug #12611] (N1) * win32/win32.c (w32_symlink): fix return type. [Bug #12611] (N3) * string.c (rb_str_split_m): simplify the condition. [Bug #12611](N4) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55729 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-22* string.c (String#dump): Change escaping of non-ASCII characters induerst
UTF-8 to use upper-case four-digit hexadecimal escapes without braces where possible [Feature #12419]. * test/ruby/test_string.rb (test_dump): Add tests for above. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55728 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15* string.c (str_buf_cat): Fix potential interger overflow of capa.ngoto
In addition, termlen is used instead of +1. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55692 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15* string.c (str_buf_cat): Fix capa size for embed string.ngoto
Fix bug in r55547. [Bug #12536] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55691 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-14string.c: reduce malloc overhead for default buffer sizenormal
* string.c (STR_BUF_MIN_SIZE): reduce from 128 to 127 [ruby-core:76371] [Feature #12025] * string.c (rb_str_buf_new): adjust for above reduction From Jeremy Evans <code@jeremyevans.net>: This changes the minimum buffer size for string buffers from 128 to 127. The underlying C buffer is always 1 more than the ruby buffer, so this changes the actual amount of memory used for the minimum string buffer from 129 to 128. This makes it much easier on the malloc implementation, as evidenced by the following code (note that time -l is used here, but Linux systems may need time -v). $ cat bench_mem.rb i = ARGV.first.to_i Array.new(1000000){" " * i} $ /usr/bin/time -l ruby bench_mem.rb 128 3.10 real 2.19 user 0.46 sys 289080 maximum resident set size 72673 minor page faults 13 block output operations 29 voluntary context switches $ /usr/bin/time -l ruby bench_mem.rb 127 2.64 real 2.09 user 0.27 sys 162720 maximum resident set size 40966 minor page faults 2 block output operations 4 voluntary context switches To try to ensure a power-of-2 growth, when a ruby string capacity needs to be increased, after doubling the capacity, add one. This ensures the ruby capacity will be odd, which means actual amount of memory used will be even, which is probably better than the current case of the ruby capacity being even and the actual amount of memory used being odd. A very similar patch was proposed 4 years ago in feature #5875. It ended up being rejected, because no performance increase was shown. One reason for that is that ruby does not use STR_BUF_MIN_SIZE unless rb_str_buf_new is called, and that previously did not have a ruby API, only a C API, so unless you were using a C extension that called it, there would be no performance increase. With the recently proposed feature #12024, String.buffer is added, which is a ruby API for creating string buffers. Using String.buffer(100) wastes much less memory with this patch, as the malloc implementation can more easily deal with the power-of-2 sized memory usage. As measured above, memory usage is 44% less, and performance is 17% better. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55686 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-05* string.c (rb_str_change_terminator_length): New function to changengoto
termlen and resize heap for the terminator. This is split from rb_str_fill_terminator (str_fill_term) because filling terminator and changing terminator length are different things. [Bug #12536] * internal.h: declaration for rb_str_change_terminator_length. * string.c (str_fill_term): Simplify only to zero-fill the terminator. For non-shared strings, it assumes that (capa + termlen) bytes of heap is allocated. This partially reverts r55557. * encoding.c (rb_enc_associate_index): rb_str_change_terminator_length is used, and it should be called whenever the termlen is changed. * string.c (str_capacity): New static function to return capacity of a string with the given termlen, because the termlen may sometimes be different from TERM_LEN(str) especially during changing termlen or filling terminator with specific termlen. * string.c (rb_str_capacity): Use str_capacity. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01* string.c: Partially reverts r55547 and r55555.ngoto
ChangeLog about the reverted changes are also deleted in this file. [Bug #12536] [ruby-dev:49699] [ruby-dev:49702] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55559 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01* string.c (str_fill_term): When termlen increases, re-allocationngoto
of memory for termlen should always be needed. In this fix, if possible, decrease capa instead of realloc. [Bug #12536] [ruby-dev:49699] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55557 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01* string.c: Specify termlen as far as possible.ngoto
Additional fix for [Bug #12536] [ruby-dev:49699]. * string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen which is apparently 1 for the encodings. * string.c (str_new0_cstr): New static function to create a String object from a C string with specifying termlen. * string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify termlen by using new str_new0_cstr(). * string.c (str_new_static): Specify termlen from the given encoding when creating a new String object is needed. * string.c (rb_tainted_str_new_with_enc): New function to create a tainted String object with the given encoding. This means that the termlen is correctly specified. Curretly static function. The function name might be renamed to rb_tainted_enc_str_new or rb_enc_tainted_str_new. * string.c (rb_external_str_new_with_enc): Use encoding by using the above rb_tainted_str_new_with_enc(). git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55555 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01* string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAXngoto
is used, TERM_LEN(str) should be considered with it because embedded strings are also processed by TERM_FILL. Additional fix for [Bug #12536] [ruby-dev:49699]. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55552 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01string.c: Add parentheses to avoid C source code ambiguity. [Bug #12536]ngoto
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55551 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-30* string.c: Fix memory corruptions when using UTF-16/32 strings.ngoto
[Bug #12536] [ruby-dev:49699] * string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length, the same as largest value of rb_enc_mbminlen(enc) among encodings. * string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate +TERM_LEN_MAX bytes instead of +1. This change may increase memory usage. * string.c (rb_str_new_with_class): Use TERM_LEN of the "obj". * string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware of termlen. * string.c (str_shared_replace): Copy +termlen bytes instead of +1. * string.c (rb_str_times): termlen should not be included in capa. * string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX, termlen should be counted with it because embedded strings are also processed by TERM_FILL. * string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto. * string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55547 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21CASEMAP_DEBUG [ci skip]nobu
* string.c (rb_str_casemap, rb_str_ascii_casemap): move debug/tuning messages under a preprocessor condition, CASEMAP_DEBUG. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55483 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21Fix garbage allocationnobu
* string.c (rb_str_casemap): do not put code with side effects inside RSTRING_PTR() macro which evaluates the argument multiple times. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55481 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21* string.c (rb_str_casemap): fix memory leak.naruse
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55480 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21* string.c (rb_str_casemap): int is too small for string size.naruse
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55479 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-16string.c: adjust buffer sizenobu
* string.c (tr_trans): adjust buffer size by processed and rest lengths, instead of doubling repeatedly. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55428 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-16string.c: fix terminatornobu
* string.c (tr_trans): consider terminator length and fix heap overflow. reported by Guido Vranken <guido AT guidovranken.nl>. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55427 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-11Fix typo in string.c [ci skip]nobu
* string.c (rb_str_oct): [DOC] fix typo, hornored -> honored. [Fix GH-1379] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55378 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-11* enc/iso_8859_1.c: Implement non-ASCII case mapping.duerst
* test/ruby/enc/test_case_comprehensive.rb: Tests for above. * string.c: Add iso-8859-1 to supported encodings. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55373 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-10* string.c: Special-case :ascii option in rb_str_capitalize_bang andduerst
rb_str_swapcase_bang. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55361 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-10* string.c: Special-case :ascii option in rb_str_upcase_bang (retry).duerst
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55359 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-10hash.c: ensure NUL-terminated for ENVnobu
* hash.c (get_env_cstr): ensure NUL-terminated. [ruby-dev:49655] [Bug #12475] * string.c (rb_str_fill_terminator): return the pointer to the NUL-terminated content. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55345 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08string.c (rb_str_ascii_casemap): fix compile error.kazu
error: implicit conversion loses integer precision: 'long' to 'int' [-Werror,-Wshorten-64-to-32] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55332 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08* string.c: Revert previous commit (possibility of endless loop).duerst
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55331 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08* string.c: Special-case :ascii option in rb_str_upcase_bang.duerst
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55330 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08* string.c: New static function rb_str_ascii_casemap; special-casingduerst
:ascii option in rb_str_upcase_bang and rb_str_downcase_bang. * regenc.c: Fix a bug (wrong use of unnecessary slack at end of string). * regenc.h -> include/ruby/oniguruma.h: Move declaration of onigenc_ascii_only_case_map so that it is visible in string.c. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55329 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-07* string.c (rb_str_upcase_bang, rb_str_capitalize_bang,duerst
rb_str_swapcase_bang): Switch to use primitive. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55310 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-07* string.c (rb_str_downcase_bang): Switch to use primitive except ifduerst
conversion can be done ASCII-only. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55308 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-06* string.c: Added UTF-16BE/LE and UTF-32BE/LE to supported encodingsduerst
for Unicode case mapping. * test/ruby/enc/test_case_comprehensive.rb: Tests for above functionality; fixed an encoding issue in assertion error message. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55296 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-06* string.c Change rb_str_casemap to use encoding primitiveduerst
case_map instead of directly calling onigenc_unicode_case_map. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55293 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-05* string.c: Remove :lithuanian guard for Unicode case mapping.duerst
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55277 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-04crypt.h: remove initializednobu
* missing/crypt.h (struct crypt_data): remove unnecessary member "initialized". * missing/crypt.c (des_setkey_r): nothing to be initialized in crypt_data. * configure.in (struct crypt_data): check for "initialized" in struct crypt_data, which may be only in glibc, and isn't on AIX at least. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55272 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-02* string.c: Raise ArgumentError when invalid string is detected induerst
case mapping methods. * enc/unicode.c: Check for invalid string and signal with negative length value. * test/ruby/enc/test_case_mapping.rb: Add tests for above. * test/ruby/test_m17n_comb.rb: Add a message to clarify test failure. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55253 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-01string.c: fallback to crypt_rnobu
* string.c: prefer crypt_r to crypt iff system crypt nor crypt_r are not provided. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55250 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-01use system cryptnobu
* configure.in: revert r55237. replace crypt, not crypt_r, and check if crypt is broken more. * missing/crypt.c: move crypt_r.c * string.c (rb_str_crypt): use crypt_r if provided by the system. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55245 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-01use crypt_rnobu
* string.c (rb_str_crypt): use reentrant crypt_r. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55237 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-31Revert r55225naruse
Run test-all before large commit: "* string.c: Activate full Unicode case mapping for UTF-8 by removing" This reverts commit 3fb0fcd1e881c1f6dd74db73a64e8623208acb77. http://rubyci.s3.amazonaws.com/centos5-64/ruby-trunk/log/20160531T013303Z.fail.html.gz git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55226 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-31* string.c: Activate full Unicode case mapping for UTF-8 by removingduerst
the protective check for the presence of an option. Update documentation. * test/ruby/enc/test_case_comprehensive.rb: Adjust tests for above change. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55225 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30* string.c: Document current behavior for other case mapping methodsduerst
on String. [ci skip] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55217 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30* string.c: Document current situation for String#downcase. [ci skip]duerst
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55215 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30string.c: return reallocated pointernobu
* string.c (str_fill_term): return new pointer reallocated by filling terminator. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55212 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30string.c: get rid of unnecessary empty stringnobu
* string.c (str_substr, rb_str_aref): refactor not to create unnecessary empty string. * string.c (str_byte_substr, str_byte_aref): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55209 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30string.c: check in the ordernobu
* string.c (rb_str_aref_m, rb_str_byteslice): check arguments in the left-to-right order. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55208 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-27transcode.c: scrub in the given encodingnobu
* transcode.c (str_transcode0): scrub in the given encoding when the source encoding is given, not in the encoding of the receiver. [ruby-core:75732] [Bug #12431] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55181 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-18string.c: integer overflownobu
* string.c (rb_str_modify_expand): check integer overflow. [ruby-core:75592] [Bug #12390] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55054 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-18ruby.h: RB_INTEGER_TYPE_Pnobu
* include/ruby/ruby.h (RB_INTEGER_TYPE_P): new macro and underlying inline function to check if the object is an Integer (Fixnum or Bignum). git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-08* configure.in: check function attirbute const and pure,naruse
and define CONSTFUNC and PUREFUNC if available. Note that I don't add those options as default because it still shows many false-positive (it seems not to consider longjmp). * vm_eval.c (stack_check): get rb_thread_t* as an argument to avoid duplicate call of GET_THREAD(). git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54952 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-05* string.c (rb_str_sub): Fix a special match variable name.yui-knk
[ci skip] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-03* string.c (count_utf8_lead_bytes_with_word): Use __builtin_popcountnaruse
only if it can use SSE 4.2 POPCNT whose latency is 3 cycle. * internal.h (rb_popcount64): use __builtin_popcountll because now it is in fast path. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e