path: root/string.c
Age | Commit message | Author
2019-01-17 | merge revision(s) 66760,66761,66824: [Backport #15460] | naruse
Follow behaviour of IO#ungetbyte; see r65802 and [Bug #14359].
* expand tabs.
setbyte / ungetbyte allow out-of-range integers
* string.c: String#setbyte to accept arbitrary integers [Bug #15460]
* io.c: ditto for IO#ungetbyte
* ext/stringio/stringio.c: ditto for StringIO#ungetbyte
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_6@66845 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
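Accepting arbitrary integers means storing only the low eight bits of the value. The following is a minimal C sketch of that truncation on a plain byte buffer rather than an RString. `set_byte` is a hypothetical helper, not the string.c implementation, and it handles only non-negative indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the string.c code): store the low 8 bits of an
 * arbitrary integer into buf[index]. Conversion to unsigned char is
 * defined modulo 256, so 0x141 stores 0x41 and -1 stores 0xff. */
static int set_byte(unsigned char *buf, size_t len, size_t index, long value)
{
    assert(buf != NULL);
    if (index >= len)
        return 0; /* out of range; String#setbyte raises IndexError here */
    buf[index] = (unsigned char)value;
    return 1;
}
```

Because the conversion is modular, this sketch gives the same result regardless of how wide `int` or `long` is on the platform.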
2018-12-12 | string.c: [DOC] fix typos | stomar
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66375 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-09 | implement special behavior for Georgian for String#capitalize | duerst
The modern Georgian script is special in that it has an 'uppercase' variant called MTAVRULI, which can be used for emphasis of whole words, for screamy headlines, and so on. However, in contrast to all other bicameral scripts, Georgian never capitalizes the first letter of a word or a sentence, and words with mixed capitalization are not used at all.
We therefore implement special behavior for String#capitalize. Formally, we define String#capitalize as first applying String#downcase to the whole string, then applying titlecase to the first letter. Because Georgian defines titlecase as the identity function both for MTAVRULI ('uppercase') and Mkhedruli (lowercase), String#capitalize becomes equivalent to String#downcase for Georgian. This avoids undesirable mixed case.
* enc/unicode.c: Actual implementation
* string.c: Add mention of this special case for documentation
* test/ruby/enc/test_case_mapping.rb: Add two tests: a general one that uses String#capitalize on some (including nonsensical) combinations of MTAVRULI and Mkhedruli, and a canary test to detect the potential assignment of characters to the currently open slots (holes) at U+1CBB and U+1CBC.
* test/ruby/enc/test_case_comprehensive.rb: Tweak generation of expectation data.
Together with r65933, this closes issue #14839.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
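The definition above ("downcase everything, then titlecase the first letter") can be sketched in C over arrays of code points. The case tables here are a deliberate assumption: they are toys covering only ASCII and Georgian, not the full mappings in enc/unicode.c. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy case tables: ASCII plus Georgian only (an assumption; the real
 * mappings in enc/unicode.c cover all of Unicode). */

/* Mtavruli occupies U+1C90..U+1CBA and U+1CBD..U+1CBF; U+1CBB and
 * U+1CBC are the unassigned holes mentioned above. */
static int is_mtavruli(uint32_t cp)
{
    return (cp >= 0x1C90 && cp <= 0x1CBA) || (cp >= 0x1CBD && cp <= 0x1CBF);
}

static uint32_t to_lower(uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z')
        return cp + ('a' - 'A');
    if (is_mtavruli(cp))
        return cp - 0x1C90 + 0x10D0; /* Mtavruli -> Mkhedruli, fixed offset */
    return cp;
}

static uint32_t to_title(uint32_t cp)
{
    if (cp >= 'a' && cp <= 'z')
        return cp - ('a' - 'A');
    return cp; /* Georgian titlecase is the identity */
}

/* capitalize = downcase the whole string, then titlecase the first code point */
static void capitalize(uint32_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s[i] = to_lower(s[i]);
    if (n > 0)
        s[0] = to_title(s[0]);
}
```

In this sketch, all-Mtavruli input such as U+1C9B U+1C90 comes out as the all-Mkhedruli U+10DB U+10D0. No mixed-case result is produced.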
2018-12-06 | suppress warning: unused variable 'vbits' | naruse
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66245 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-06 | Prefer rb_check_arity when 0 or 1 arguments | nobu
Especially over checking argc then calling rb_scan_args just to raise an ArgumentError.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66238 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-03 | string.c: [DOC] deprecate String#crypt [ci skip] [Feature #14915] | shyouhei
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66154 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 | * expand tabs. | svn
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65957 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 | fix r65954; Keep tainty | naruse
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65956 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 | Don't use single byte optimization on grapheme clusters | naruse
Unicode Text Segmentation considers CRLF as a character. [Bug #15337]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65954 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
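The single-byte shortcut fails even for pure ASCII text, because UAX #29 (rule GB3) joins CR followed by LF into one grapheme cluster. Below is a minimal C sketch that counts clusters in ASCII-only input. `ascii_grapheme_count` is a hypothetical name, and the sketch assumes no non-ASCII bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Count grapheme clusters in ASCII-only text. Every ASCII byte is its own
 * cluster except that CR followed by LF forms a single cluster, so treating
 * one byte as one character would count "\r\n" as two. */
static size_t ascii_grapheme_count(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == '\r' && i + 1 < len && s[i + 1] == '\n')
            i++; /* CRLF: consume both bytes as one cluster */
        count++;
    }
    return count;
}
```

For example, "a\r\nb" is four bytes but three clusters, while "\n\r" (LF before CR) remains two.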
2018-11-21 | char is not unsigned | shyouhei
It seems that decades ago, ruby was written under the assumption that char is unsigned, which is of course false: whether plain char is signed is implementation-defined. We need to explicitly store a numeric value into an unsigned char variable to make clear that we expect a value in 0..255.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65900 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-19 | string.c: setbyte silently ignores upper bits | shyouhei
The behaviour of String#setbyte has depended on the width of int, which is not portable. It must check explicitly.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65804 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-07 | string.c: this assumption is false [ci skip] | shyouhei
Looking at the lines right above, it is crystal clear that we cannot assume `p` to be aligned at all when UNALIGNED_WORD_ACCESS is true. Using __builtin_assume_aligned in that situation is wrong.
See also: https://travis-ci.org/ruby/ruby/jobs/451710732#L2007
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65592 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
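When a pointer may be unaligned, the portable way to read a word is `memcpy`. This sketch assumes a 64-bit word, and `load_word` is a hypothetical name. GCC and Clang typically lower the `memcpy` to a single load on targets that permit unaligned access. Promising alignment with `__builtin_assume_aligned` when it does not hold is undefined behaviour that the optimizer may exploit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a 64-bit word from a pointer with no alignment guarantee.
 * memcpy makes no alignment assumption about p. */
static uint64_t load_word(const void *p)
{
    assert(p != NULL);
    uint64_t w;
    memcpy(&w, p, sizeof w);
    return w;
}
```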
2018-11-06 | adopt sanitizer API | shyouhei
These APIs are much like <valgrind/memcheck.h>. Use them to annotate our memory usage at a fine grain.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65573 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-30 | fix type. | ko1
* string.c (rb_str_format_m): should pass `int`.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65456 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-30 | introduce TransientHeap. [Bug #14858] | ko1
* transient_heap.c, transient_heap.h: implement TransientHeap (theap). theap is designed for Ruby's object system. theap is like the Eden heap in generational GC terminology. theap allocation is very fast because it only needs to bump up a pointer, and deallocation is also fast because we don't do anything. However, we need to evacuate (in copying GC terminology) if theap memory is long-lived. Evacuation logic is needed for each type. See [Bug #14858] for details.
* array.c: Now, theap for T_ARRAY is supported. ary_heap_alloc() tries to allocate a memory area from theap. If this attempt succeeds, the array has a theap ptr and RARRAY_TRANSIENT_FLAG is turned on. We don't need to free the theap ptr.
* ruby.h: RARRAY_CONST_PTR() returns a malloc'ed memory area. This means that if ary is allocated in theap, it is forcibly evacuated to malloc'ed memory. This makes programs slow, but is very compatible with current code because theap memory can be evacuated (theap memory will be recycled). If you want to get the transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT() instead of RARRAY_CONST_PTR(). If you can't understand when evacuation will occur, use RARRAY_CONST_PTR().
(re-commit of r65444)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65449 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
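The allocation scheme described for theap works as follows: bump a pointer to allocate, do nothing to free, and recycle the whole area at once. It can be illustrated with a toy arena. `struct arena`, `arena_alloc` and `arena_reset` are hypothetical names, and evacuation of long-lived objects before recycling is not modeled, so this is not transient_heap.c.

```c
#include <assert.h>
#include <stddef.h>

/* Toy bump-pointer arena: allocation advances an offset, freeing a single
 * block is a no-op, and the whole arena is recycled in one step. A real
 * transient heap must first evacuate live objects; that is omitted here. */
struct arena {
    unsigned char *base;
    size_t size;
    size_t used;
};

static void *arena_alloc(struct arena *a, size_t n)
{
    assert(a != NULL);
    if (n > a->size)
        return NULL;
    size_t rounded = (n + 7) & ~(size_t)7; /* keep blocks 8-byte multiples */
    if (rounded > a->size - a->used)
        return NULL; /* arena full: a caller would fall back to malloc */
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

static void arena_reset(struct arena *a)
{
    assert(a != NULL);
    a->used = 0; /* recycle everything at once */
}
```

Allocation costs a comparison and an addition. The price is that anything still alive at reset time must have been copied out first, which is why the commit requires evacuation logic per type.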
2018-10-30 | * expand tabs. | svn
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65448 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-30 | revert r65444 and r65446 because of commit miss | ko1
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65447 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-30 | introduce TransientHeap. [Bug #14858] | ko1
* transient_heap.c, transient_heap.h: implement TransientHeap (theap). theap is designed for Ruby's object system. theap is like the Eden heap in generational GC terminology. theap allocation is very fast because it only needs to bump up a pointer, and deallocation is also fast because we don't do anything. However, we need to evacuate (in copying GC terminology) if theap memory is long-lived. Evacuation logic is needed for each type. See [Bug #14858] for details.
* array.c: Now, theap for T_ARRAY is supported. ary_heap_alloc() tries to allocate a memory area from theap. If this attempt succeeds, the array has a theap ptr and RARRAY_TRANSIENT_FLAG is turned on. We don't need to free the theap ptr.
* ruby.h: RARRAY_CONST_PTR() returns a malloc'ed memory area. This means that if ary is allocated in theap, it is forcibly evacuated to malloc'ed memory. This makes programs slow, but is very compatible with current code because theap memory can be evacuated (theap memory will be recycled). If you want to get the transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT() instead of RARRAY_CONST_PTR(). If you can't understand when evacuation will occur, use RARRAY_CONST_PTR().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65444 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-26 | string.c: improve docs for String#strip and related | stomar
* string.c: [DOC] improve docs for String#{strip,lstrip,rstrip}{,!}: small clarification, avoid referring to the receiver as `str' (does not appear in the call-seq of the generated HTML docs), enable links for cross-references, simplify rdoc.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65382 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-19 | array.c, file.c, string.c: [DOC] fix typos | stomar
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65185 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 | string.c: grapheme cluster regexp failure | nobu
* string.c (get_reg_grapheme_cluster): show error info and relax to rb_fatal from rb_bug.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65096 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-13 | string.c: [DOC] add example code for String#strip! | stomar
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65068 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-13 | string.c: small doc improvement | stomar
* string.c: [DOC] move unaltered case for String#strip to the end, similar to other strip methods.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65067 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-13 | Prefer `rb_fstring_lit` over `rb_fstring_cstr` | nobu
The former states explicitly that the argument must be a literal, and can optimize away `strlen` on all compilers.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65059 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
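The usual C technique behind a literal-only API is to paste empty literals around the argument and take `sizeof`. This rejects non-literals at compile time and yields the length without calling `strlen`. `LIT_LEN` and `cstr_len` are hypothetical names written in that spirit; they are not the definitions from the Ruby headers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: "" str "" only compiles when str is a string
 * literal, and sizeof gives its length (minus the NUL) at compile time. */
#define LIT_LEN(str) (sizeof("" str "") - 1)

/* A cstr-style API cannot know its argument's length in advance. */
static size_t cstr_len(const char *p)
{
    assert(p != NULL);
    return strlen(p);
}
```

Passing a `char *` variable to `LIT_LEN` is a syntax error, which is the "states explicitly" part. The constant result is the part that avoids a run-time `strlen`.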
2018-10-13 | Added comments to rb_setup_fake_str and rb_fstring_new [ci skip] | nobu
`ptr` for these functions must refer to constant string literals. Otherwise, the resulting string's content can be modified or discarded unexpectedly.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65058 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-09-16 | [DOC] Improve String#strip documentation. | marcandre
Patch by Josh Goldberg. [Fix GH-1933] [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64757 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-27 | move function declarations from insns.def to internal.h | shyouhei
Just avoid being loose.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63755 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-11 | string.c: [DOC] grammar fixes | stomar
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63632 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-08 | [Docs] Improve documentation of String#lines | nobu
* Document the optional getline arguments
* Add examples, especially to demonstrate `chomp: true`
[Fix GH-1886]
From: Koki Takahashi <hakatasiloving@gmail.com>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63610 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-04 | String#uminus dedupes unconditionally | normal
[Feature #14478] [ruby-core:85669]
Thanks-to: Sam Saffron <sam.saffron@gmail.com>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63566 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-01 | string.c: trivial optimizations | nobu
* string.c (rb_str_aset): prefer BUILTIN_TYPE over TYPE after SPECIAL_CONST_P check.
* string.c (rb_str_start_with): prefer RB_TYPE_P over switch by TYPE.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63543 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-01 | string.c: doc for [Feature #13712] | nobu
* string.c (rb_str_start_with): [DOC] start_with? example with regexp.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63541 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-05-22 | string.c: MAYBE_UNUSED to suppress warnings for `old` | normal
Building with HAVE_MALLOC_USABLE_SIZE currently makes SIZED_REALLOC_N ignore the old size arg.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63487 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-05-22 | string.c: size hints for free and realloc calls | normal
Another part of the plan to reduce dependencies on malloc_usable_size: https://bugs.ruby-lang.org/issues/10238
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63485 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-28 | string.c: adjust to rb_str_upto_each | nobu
* range.c (range_each_func): adjust the signature of the callback function to rb_str_upto_each, and exit the loop if the callback returned non-zero.
* string.c (rb_str_upto_endless_each): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63290 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-24 | string.c: fix scanned substring with `\K` | nobu
* string.c (scan_once): fix the matched substring with `\K`; the beginning of that string may differ from the matched position. [ruby-core:86663] [Bug #14707]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63252 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-19 | Introduce endless range [Feature#12912] | mame
Typical usages:
```
p ary[1..]          # drop the first element; identical to ary[1..-1]
(1..).each {|n|...} # iterate forever from 1; identical to 1.step{...}
```
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63192 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-17 | string.c: suppress warning | nobu
* string.c (str_undump): get rid of warning C4129 by VC.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63170 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-16 | string.c: fix dumped suffix | nobu
* string.c (rb_str_dump): get rid of an error when eval'ing the result with frozen-string-literal enabled. [ruby-core:86539] [Bug #14687]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63164 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-16 | string.c: fix checking order | nobu
* string.c (str_undump): check for the suffix before checking whether a Unicode escape conflicts with it. The message "but used force_encoding" sounds strange when force_encoding is not used.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63162 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-14 | string.c: [DOC] fix typo | stomar
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63160 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-22 | Factor out get_reg_grapheme_cluster | naruse
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62893 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-22 | fix each_grapheme_cluster's size [Bug #14363] | naruse
From: Hugo Peixoto <hugo.peixoto@gmail.com>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62892 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-22 | Revert "each_grapheme_cluster shouldn't return size [Bug #14363]" | naruse
This reverts commit r62887.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62891 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-22 | each_grapheme_cluster shouldn't return size [Bug #14363] | naruse
From: Stefan Schüßler <mail@stefanschuessler.de>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62888 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-21 | Improve documentation for 'text '.split | nobu
The documentation didn't mention trailing spaces, and the example only demonstrated the case with leading spaces.
[Fix GH-1845]
From: Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62881 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-17 | string.c: [DOC] split with block [ci skip] | nobu
* string.c (rb_str_split_m): [DOC] about split with block. [Feature #4780]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62790 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-15 | string.c: split with block | nobu
* string.c (rb_str_split_m): yield each split substring if a block is given, instead of returning the array. [Feature #4780]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
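Yielding each substring instead of building an array maps naturally onto a callback in C. The sketch below is hypothetical: `field_fn`, `split_each` and `add_len` are not Ruby APIs, and it returns a field count where String#split with a block returns the receiver. It imitates whitespace splitting, where leading and trailing whitespace yield no empty fields (the 'text '.split case).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical callback receiving one field (not NUL-terminated). */
typedef void field_fn(const char *start, size_t len, void *arg);

/* Split on runs of whitespace, handing each field to fn instead of
 * collecting them; leading/trailing whitespace produces no empty fields. */
static size_t split_each(const char *s, field_fn *fn, void *arg)
{
    assert(s != NULL && fn != NULL);
    size_t nfields = 0;
    while (*s) {
        while (*s && isspace((unsigned char)*s))
            s++;
        if (!*s)
            break;
        const char *start = s;
        while (*s && !isspace((unsigned char)*s))
            s++;
        fn(start, (size_t)(s - start), arg);
        nfields++;
    }
    return nfields;
}

/* Example callback: accumulate the total length of all fields. */
static void add_len(const char *start, size_t len, void *arg)
{
    (void)start;
    *(size_t *)arg += len;
}
```

No intermediate array is allocated, which is the practical benefit of the block form when the fields are consumed one at a time.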
2018-03-14 | quote symbols | nobu
* sprintf.c (ruby__sfvextra): quote symbols as identifiers.
* string.c (rb_id_quote_unprintable): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62747 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-08 | Export some missing symbols for MJIT | k0kubun
* tool/ruby_vm/views/_insn_name_info.erb: on Linux, rb_vm_insn_name_offset was needed to compile with --jit-debug (--jit-debug usually requires more symbols than a build without it, because -O2 skips compiling some functions).
* vm.c: when running transform_mjit_header.rb with --jit-wait, rb_source_location_cstr was reported to be missing.
* string.c: ditto, for rb_str_eql
* numeric.c: ditto, for rb_float_eql
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62313 b2dd03c8-39d4-4d8f-98ff-823fe69b080e