author BurdetteLamar <burdettelamar@yahoo.com> 2025-10-22 20:12:49 +0100
committer Peter Zhu <peter@peterzhu.ca> 2025-10-22 18:13:58 -0400
commit d4ea1686b5f7989c241511bed4760dc384ff7b54
tree e74f69f4fbfe5f364cbc0c25ec7e39c2bf3a37b5
parent f9338a95afbde65f33c7d8af0d3dc361b727ed4c
[DOC] Tweaks for String#split
-rw-r--r-- doc/string/split.rdoc 134
-rw-r--r-- string.c 2
2 files changed, 70 insertions, 66 deletions
diff --git a/doc/string/split.rdoc b/doc/string/split.rdoc
index 131c14b83f..9e61bc5bab 100644
--- a/doc/string/split.rdoc
+++ b/doc/string/split.rdoc
@@ -1,99 +1,103 @@
-Returns an array of substrings of +self+
-that are the result of splitting +self+
+Creates an array of substrings by splitting +self+
at each occurrence of the given field separator +field_sep+.
-When +field_sep+ is <tt>$;</tt>:
+With no arguments given,
+splits using the field separator <tt>$;</tt>,
+whose default value is +nil+.
-- If <tt>$;</tt> is +nil+ (its default value),
- the split occurs just as if +field_sep+ were given as a space character
- (see below).
+With no block given, returns the array of substrings:
-- If <tt>$;</tt> is a string,
- the split occurs just as if +field_sep+ were given as that string
- (see below).
+ 'abracadabra'.split('a') # => ["", "br", "c", "d", "br"]
-When +field_sep+ is <tt>' '</tt> and +limit+ is +0+ (its default value),
-the split occurs at each sequence of whitespace:
+When +field_sep+ is +nil+ or <tt>' '</tt> (a single space),
+splits at each sequence of whitespace:
- 'abc def ghi'.split(' ') # => ["abc", "def", "ghi"]
- "abc \n\tdef\t\n ghi".split(' ') # => ["abc", "def", "ghi"]
- 'abc def ghi'.split(' ') # => ["abc", "def", "ghi"]
+ 'foo bar baz'.split(nil) # => ["foo", "bar", "baz"]
+ 'foo bar baz'.split(' ') # => ["foo", "bar", "baz"]
+ "foo \n\tbar\t\n baz".split(' ') # => ["foo", "bar", "baz"]
+ 'foo bar baz'.split(' ') # => ["foo", "bar", "baz"]
''.split(' ') # => []
-When +field_sep+ is a string different from <tt>' '</tt>
-and +limit+ is +0+,
-the split occurs at each occurrence of +field_sep+;
-trailing empty substrings are not returned:
+When +field_sep+ is an empty string,
+splits at every character:
- 'abracadabra'.split('ab') # => ["", "racad", "ra"]
- 'aaabcdaaa'.split('a') # => ["", "", "", "bcd"]
- ''.split('a') # => []
- '3.14159'.split('1') # => ["3.", "4", "59"]
- '!@#$%^$&*($)_+'.split('$') # => ["!@#", "%^", "&*(", ")_+"]
- 'тест'.split('т') # => ["", "ес"]
- 'こんにちは'.split('に') # => ["こん", "ちは"]
+ 'abracadabra'.split('') # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]
+ ''.split('') # => []
+ 'тест'.split('') # => ["т", "е", "с", "т"]
+ 'こんにちは'.split('') # => ["こ", "ん", "に", "ち", "は"]
-When +field_sep+ is a Regexp and +limit+ is +0+,
-the split occurs at each occurrence of a match;
-trailing empty substrings are not returned:
+When +field_sep+ is a non-empty string and different from <tt>' '</tt> (a single space),
+uses that string as the separator:
+
+ 'abracadabra'.split('a') # => ["", "br", "c", "d", "br"]
+ 'abracadabra'.split('ab') # => ["", "racad", "ra"]
+ ''.split('a') # => []
+ 'тест'.split('т') # => ["", "ес"]
+ 'こんにちは'.split('に') # => ["こん", "ちは"]
+
+When +field_sep+ is a Regexp,
+splits at each occurrence of a matching substring:
'abracadabra'.split(/ab/) # => ["", "racad", "ra"]
- 'aaabcdaaa'.split(/a/) # => ["", "", "", "bcd"]
- 'aaabcdaaa'.split(//) # => ["a", "a", "a", "b", "c", "d", "a", "a", "a"]
'1 + 1 == 2'.split(/\W+/) # => ["1", "1", "2"]
+ 'abracadabra'.split(//) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]
-If the \Regexp contains groups, their matches are also included
+If the \Regexp contains groups, their matches are included
in the returned array:
'1:2:3'.split(/(:)()()/, 2) # => ["1", ":", "", "", "2:3"]
-As seen above, if +limit+ is +0+,
-trailing empty substrings are not returned:
+Argument +limit+ sets a limit on the size of the returned array;
+it also determines whether trailing empty strings are included in the returned array.
- 'aaabcdaaa'.split('a') # => ["", "", "", "bcd"]
+When +limit+ is zero,
+there is no limit on the size of the array,
+but trailing empty strings are omitted:
-If +limit+ is positive integer +n+, no more than <tt>n - 1-</tt>
-splits occur, so that at most +n+ substrings are returned,
-and trailing empty substrings are included:
+ 'abracadabra'.split('', 0) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]
+ 'abracadabra'.split('a', 0) # => ["", "br", "c", "d", "br"] # Empty string after last 'a' omitted.
- 'aaabcdaaa'.split('a', 1) # => ["aaabcdaaa"]
- 'aaabcdaaa'.split('a', 2) # => ["", "aabcdaaa"]
- 'aaabcdaaa'.split('a', 5) # => ["", "", "", "bcd", "aa"]
- 'aaabcdaaa'.split('a', 7) # => ["", "", "", "bcd", "", "", ""]
- 'aaabcdaaa'.split('a', 8) # => ["", "", "", "bcd", "", "", ""]
+When +limit+ is a positive integer +n+,
+the array has at most +n+ elements (no more than <tt>n - 1</tt> splits occur),
+and trailing empty strings are included:
-Note that if +field_sep+ is a \Regexp containing groups,
-their matches are in the returned array, but do not count toward the limit.
+ 'abracadabra'.split('', 3) # => ["a", "b", "racadabra"]
+ 'abracadabra'.split('a', 3) # => ["", "br", "cadabra"]
+ 'abracadabra'.split('', 30) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a", ""]
+ 'abracadabra'.split('a', 30) # => ["", "br", "c", "d", "br", ""]
+ 'abracadabra'.split('', 1) # => ["abracadabra"]
+ 'abracadabra'.split('a', 1) # => ["abracadabra"]
-If +limit+ is negative, it behaves the same as if +limit+ was zero,
-meaning that there is no limit,
-and trailing empty substrings are included:
+When +limit+ is negative,
+there is no limit on the size of the array,
+and trailing empty strings are included:
- 'aaabcdaaa'.split('a', -1) # => ["", "", "", "bcd", "", "", ""]
+ 'abracadabra'.split('', -1) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a", ""]
+ 'abracadabra'.split('a', -1) # => ["", "br", "c", "d", "br", ""]
If a block is given, it is called with each substring and returns +self+:
- 'abc def ghi'.split(' ') {|substring| p substring }
+ 'foo bar baz'.split(' ') {|substring| p substring }
+
+Output:
+
+ "foo"
+ "bar"
+ "baz"
-Output:
+Note that the above example is functionally equivalent to:
- "abc"
- "def"
- "ghi"
- => "abc def ghi"
+ 'foo bar baz'.split(' ').each {|substring| p substring }
-Note that the above example is functionally the same as calling +#each+ after
-+#split+ and giving the same block. However, the above example has better
-performance because it avoids the creation of an intermediate array. Also,
-note the different return values.
+Output:
- 'abc def ghi'.split(' ').each {|substring| p substring }
+ "foo"
+ "bar"
+ "baz"
-Output:
+But the latter:
- "abc"
- "def"
- "ghi"
- => ["abc", "def", "ghi"]
+- Has poorer performance because it creates an intermediate array.
+- Returns an array (instead of +self+).
-Related: String#partition, String#rpartition.
+Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non--5CString].
diff --git a/string.c b/string.c
index f907798f85..1236057ad1 100644
--- a/string.c
+++ b/string.c
@@ -9192,7 +9192,7 @@ literal_split_pattern(VALUE spat, split_type_t default_type)
/*
* call-seq:
- * split(field_sep = $;, limit = 0) -> array
+ * split(field_sep = $;, limit = 0) -> array_of_substrings
* split(field_sep = $;, limit = 0) {|substring| ... } -> self
*
* :include: doc/string/split.rdoc
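
The revised documentation distinguishes three cases for +limit+ (zero, positive, negative) and notes that the block form returns +self+. The following is a minimal sketch, not part of the patch, that exercises those cases using the same 'abracadabra' and 'foo bar baz' strings as the examples in the diff. The variable names (`zero`, `positive`, `large`, `negative`, `collected`, `result`) are illustrative only.

```ruby
# Illustrative check of String#split's limit handling; not part of the commit.
s = 'abracadabra'

zero     = s.split('a', 0)   # no size limit; trailing empty strings dropped
positive = s.split('a', 3)   # at most 3 elements; the remainder stays unsplit
large    = s.split('a', 30)  # limit never reached; trailing empty string kept
negative = s.split('a', -1)  # no size limit; trailing empty string kept

p zero      # => ["", "br", "c", "d", "br"]
p positive  # => ["", "br", "cadabra"]
p large     # => ["", "br", "c", "d", "br", ""]
p negative  # => ["", "br", "c", "d", "br", ""]

# Block form: each substring is yielded, and the receiver is returned
# instead of an array.
collected = []
result = 'foo bar baz'.split(' ') { |substring| collected << substring }

p collected  # => ["foo", "bar", "baz"]
p result     # => "foo bar baz"
```

The 30 in the `large` case only needs to exceed the number of fields; any sufficiently large positive value gives the same array as a negative limit.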