summaryrefslogtreecommitdiff
path: root/string.c
AgeCommit message (Collapse)Author
2017-05-14string.c: cut down intermediate stringnobu
* string.c (rb_external_str_new_with_enc): cut down intermediate string for conversion source, by appending with conversion. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58709 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-13revert r58703 & r58705nobu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58708 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-13string.c: fix up r58703nobu
* string.c (rb_external_str_new_with_enc): fix the case of conversion failure. when conversion failed for some reason, just ignores the default internal encoding and returns in the given encoding. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58705 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-13string.c: cut down intermediate stringnobu
* string.c (rb_external_str_new_with_enc): cut down intermediate string for conversion source, by appending with conversion. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58703 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-13string.c: fix one-off bugnobu
* string.c (rb_str_cat_conv_enc_opts): fix one-off bug. `ofs` equals `olen` when appending at the end. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58702 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-12string.c: remove bare Unicode.nobu
* string.c (rb_str_unicode_normalize): remove bare Unicode. do not assume that all compilers can handle UTF-8. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58688 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-11string.c: docs for String#matchstomar
* string.c: [DOC] add example for String#match with pos argument. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58669 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-11string.c: docs for Symbolstomar
* string.c: [DOC] adopt call-seq's for Symbol#{match,match?} from String methods; other small improvements for Symbol docs. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58668 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-11string.c: docs for Symbol#{match,match?}stomar
* string.c: [DOC] mention pos argument for Symbol#{match,match?}. Patch by Yuki Kurihara (ksss). [Fix GH-1606] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58666 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-09string.c: fix r58618nobu
* string.c (unicode_normalize_common): aggregation type cannot be initialized with dynamic values, in C89. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58621 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-09replace hand-written argument check by call to rb_scan_args in ↵duerst
unicode_normalize_common In string.c, replace hand-written argument count check by call to rb_scan_args. This allows to use rb_funcallv once, rather than using rb_funcall twice. Thanks to Hanmac (Hans Mackowiak) for the idea, see https://bugs.ruby-lang.org/issues/11078#note-7. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58618 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-06string.c: fix typesnobu
* string.c (id_normalize, id_normalized_p): fix types, IDs should be ID. * string.c (unicode_normalize_common): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04string.c: [DOC] improve docs for String.newstomar
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58569 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04string.c: [DOC] Properly refer to keyword argument by its namektsj
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58567 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04refactor common parts of unicode normalization functions into ↵duerst
unicode_normalize_common In string.c, refactor the common parts (requiring of unicode_normalize/normalize.rb, check of number of arguments) of the unicode normalization functions (rb_str_unicode_normalize, rb_str_unicode_normalize_bang, rb_str_unicode_normalized_p) into the new function unicode_normalize_common. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58558 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04move definition of String#unicode_normalized? to C to make sure it is documentedduerst
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalized? (including documentation). Leave a comment explaining that the file is now empty. * string.c: Define String#unicode_normalized? in rb_str_unicode_normalized_p in C, (including documentation) * lib/unicode_normalize/normalize.rb: Remove (re)definition of String#unicode_normalized? to avoid warnings (when $VERBOSE==true) and problems when String is frozen git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58555 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04move definition of String#unicode_normalize! to C to make sure it is documentedduerst
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalize! (including documentation) * string.c: Define String#unicode_normalize! in rb_str_unicode_normalize_bang in C, (including documentation) * lib/unicode_normalize/normalize.rb: Remove (re)definition of String#unicode_normalize! to avoid warnings (when $VERBOSE==true) and problems when String is frozen git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58553 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-03* remove trailing spaces.svn
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58551 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-03move definition of String#unicode_normalize to C to make sure it is documentedduerst
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalize (including documentation) * string.c: Define String#unicode_normalize in rb_str_unicode_normalize in C, (including documentation) * lib/unicode_normalize/normalize.rb: Remove (re)definition of String#unicode_normalize to avoid warnings (when $VERBOSE==true) and problems when String is frozen git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58550 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-17string.c: improve insertion performacenobu
* string.c (rb_str_splice_0): improve performace of single byte optimizable cases, insertion 7bit string to 7bit string. [ruby-dev:49984] [Bug #13228] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58383 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-29string.c: Supress logical-op-parentheses warningsorah
* string.c(rb_str_upcase_bang): Supress logical-op-parentheses warning Patch by Fukuo Kadota <fukuo-kadota@cookpad.com>, Closes [GH-1570] [Bug #13387]. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58211 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-21string.c: use the usable sizenobu
* string.c (rb_str_change_terminator_length): when called after the content has been copied, old terminator length no longer makes sense. use the whole usable size instead of capacity without terminator. [ruby-core:80257] [Bug #13339] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58042 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-18fix accidental reversal of r57997 in r58000duerst
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58009 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-17clarifiy 'codepoint' in documentation of String#each_codepointduerst
Make sure it's clear that the returned values are not Unicode codepoints for encodings other than UTF-8/UTF-16(BE|LE)/UTF-32(BE|LE). [ci skip] [Bug #13321] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58000 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-17deduplicate static rb_str_format format stringsnormal
Anybody who hits these code paths can hit them again in the future, so try deduplicating across multiple runs of these methods to reduce garbage. * string.c (str_upto_each): fstring on "%.*d" * strftime.c (rb_strftime_with_timespec): fstring on "%0*d" git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57997 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-15string.c: shortcut argument checknobu
* string.c (str_casecmp, str_casecmp_p): split to skip argument check when it is a String certainly. * string.c (sym_casecmp, sym_casecmp_p): shortcut argument checks. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57978 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-14string.c: use rb_check_string_typenobu
* string.c (rb_str_cmp_m): use rb_check_string_type for check and conversion, instead of calling the conversion method directly. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57965 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-13docs for Symbol#casecmp and Symbol#casecmp?stomar
* string.c: [DOC] improve docs of Symbol#casecmp and Symbol#casecmp? according to the similar String methods; fix RDoc markup and typos; fix call-seq's for Symbol#{upcase,downcase,capitalize,swapcase}. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57963 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-13string.c (rb_str_set_len): pathological checknobu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57961 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-13string.c: $; is a GC-rootnobu
* string.c (Init_String): $; must be a GC-root, not to be collected. [ruby-core:79582] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57958 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-11docs for String#casecmp and String#casecmp?stomar
* string.c: [DOC] specify when String#casecmp and String#casecmp? return nil; modify examples to better show difference to <=>; fix RDoc markup and typos. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57886 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-08string.c (str_uminus): update doc for deduplicationnormal
As of r57698, String#-@ can return pre-existing strings. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57813 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-08fix parennobu
* string.c (str_byte_substr): fix misplaced parenthesis at r56155. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57809 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07string.c: [DOC] Fix a typo in String#dumpkazu
[Fix GH-1531][ci skip] Author: Alex Semyonov <alex@semyonov.us> git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57802 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07string.c: negation of LONG_MINnobu
* string.c (rb_str_update): do not use negation of LONG_MIN, which is negative too. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57800 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07string.c: fix integer overflownobu
* string.c (str_byte_substr): fix another integer overflow which can happen only when SHARABLE_MIDDLE_SUBSTRING is enabled. [ruby-core:79951] [Bug #13289] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57799 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07string.c: fix integer overflownobu
* string.c (rb_str_subpos): fix integer overflow which can happen only when SHARABLE_MIDDLE_SUBSTRING is enabled. incorpolate https://github.com/mruby/mruby/commit/7db0786abdd243ba031e24683f git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57797 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-04string.c: [DOC] fix doc formatting for String#==, #===stomar
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57778 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-02string.c: restore documentation for String#<<stomar
* string.c: [DOC] restore documentation for String#<< which became undocumented with r56021; fix a typo. [ruby-core:79865] [Bug #13268] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57758 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-24string.c (str_uminus): deduplicate stringsnormal
This exposes the rb_fstring internal function to return a deduped and frozen string when a non-frozen string is given. This is useful for writing all sorts of record processing key values maybe stored, but certain keys and values are often duplicated at a high frequency, so memory savings can noticeable. Use cases are many: * email/NNTP header processing There are some standard header keys everybody uses (From/To/Cc/Date/Subject/Received/Message-ID/References/In-Reply-To), as well as common ones specific to a certain lists: (ruby-core has X-Redmine-* headers) It is also useful to dedupe values, as most inboxes have multiple messages from the same sender, or MUA. * package management systems - things like RubyGems stores identical strings for licenses, dependency names, author names/emails, etc * HTTP headers/trailers - standard headers (Host/Accept/Accept-Encoding/User-Agent/...) are common, but there are also uncommon ones. Values may be deduped, as well, as it is likely a user agent will make multiple/parallel requests to the same server. * version control systems - this can be useful for deduplicating names of frequent committers (like "nobu" :) In linux.git and git.git, there are also common trailers such as Signed-Off-By/Acked-by/Reviewed-by/Fixes/... as well as less common ones. * audio metadata - There are commonly used tags (Artist/Album/Title/Tracknumber), but Vorbis comments allows arbitrary key values to be stored. Music collections contain songs by the same artist or mutiple songs from the same album, so deduplicating values will be helpful there, too. * JSON, YAML, XML, HTML processing Certain fields, tags and attributes are commonly used across the same and multiple documents There is no security concern in this being a DoS vector by causing immortal strings. The fstring table is not a GC-root and not walked during the mark phase. GC-able dynamic symbols since Ruby 2.2 are handled in the same manner, and that implementation also relies on the non-immortality of fstrings. [Feature #13077] [ruby-core:79663] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57698 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-14string.c: assertionnobu
* string.c (str_shared_replace): use RUBY_ASSERT for pre-condition. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57628 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-14initialize variablesnobu
* string.c (rb_str_enumerate_lines): initialize conditionally used variable. * thread.c (rb_fd_no_init): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57625 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-13suppress warningsnobu
* string.c (rb_str_enumerate_lines): hint to suppress a maybe-uninitialized warning by gcc. * thread.c (rb_fd_no_init): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57618 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-05doc: Add example for Symbol#to_snormal
* string.c: add example for Symbol#to_s. The docs for Symbol#to_s only include an example for Symbol#id2name, but not for #to_s which is an alias; the docs should include examples for both methods. From: Marcus Stollsteimer <sto.mar@web.de> git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57536 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-03symbol.c (rb_id2str): eliminate branch to set classnormal
Since the fstring table encompasses all strings in the symbol table, we may reuse the fstring table walk to set the class and eliminate the branch in rb_id2str. * string.c (Init_String): use rb_cString immediately after definition * symbol.c (rb_id2str): eliminate branch to set class git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57521 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-30string.c (rb_str_tmp_frozen_release): release embedded stringsnormal
Handle the embedded case first, since we may have an embedded duplicate and non-embedded original string. * string.c (rb_str_tmp_frozen_release): handled embedded strings * test/ruby/test_io.rb (test_write_no_garbage): new test [ruby-core:78898] [Bug #13085] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57471 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-30io.c: recycle garbage on writenormal
* string.c (STR_IS_SHARED_M): new flag to mark shared mulitple times (STR_SET_SHARED): set STR_IS_SHARED_M (rb_str_tmp_frozen_acquire, rb_str_tmp_frozen_release): new functions (str_new_frozen): set/unset STR_IS_SHARED_M as appropriate * internal.h: declare new functions * io.c (fwrite_arg, fwrite_do, fwrite_end): new (io_fwrite): use new functions Introduce rb_str_tmp_frozen_acquire and rb_str_tmp_frozen_release to manage a hidden, frozen string. Reuse one bit of the embed length for shared strings as STR_IS_SHARED_M to indicate a string has been shared multiple times. In the common case, the string is only shared once so the object slot can be reclaimed immediately. minimum results in each 3 measurements. (time and size) Execution time (sec) name trunk built io_copy_stream_write 0.682 0.254 io_copy_stream_write_socket 1.225 0.751 Speedup ratio: compare with the result of `trunk' (greater is better) name built io_copy_stream_write 2.680 io_copy_stream_write_socket 1.630 Memory usage (last size) (B) name trunk built io_copy_stream_write 95436800.000 6512640.000 io_copy_stream_write_socket 117628928.000 7127040.000 Memory consuming ratio (size) with the result of `trunk' (greater is better) name built io_copy_stream_write 14.654 io_copy_stream_write_socket 16.505 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57469 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-19string.c: rindex(//) should set $~.shugo
This seems a bug introduced by r520 (1.4.0). [ruby-core:79110] [Bug #13135] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57374 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-16file.c: refine messagenobu
* file.c (rb_get_path_check_convert): refine the error message when the path name contains null byte. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57336 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-11string.c: replacement and blocknobu
* string.c (rb_enc_str_scrub): only one of replacement and block is allowed. [ruby-core:79038] [Bug #13119] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57304 b2dd03c8-39d4-4d8f-98ff-823fe69b080e