[DOC] Tweaks for String#split

author: BurdetteLamar <burdettelamar@yahoo.com> 2025-10-22 20:12:49 +0100
committer: Peter Zhu <peter@peterzhu.ca> 2025-10-22 18:13:58 -0400
commit: d4ea1686b5f7989c241511bed4760dc384ff7b54 (patch)
tree: e74f69f4fbfe5f364cbc0c25ec7e39c2bf3a37b5
parent: f9338a95afbde65f33c7d8af0d3dc361b727ed4c (diff)
2 files changed, 70 insertions, 66 deletions
diff --git a/doc/string/split.rdoc b/doc/string/split.rdoc
index 131c14b83f..9e61bc5bab 100644
--- a/doc/string/split.rdoc
+++ b/doc/string/split.rdoc
@@ -1,99 +1,103 @@
-Returns an array of substrings of +self+
-that are the result of splitting +self+
+Creates an array of substrings by splitting +self+
 at each occurrence of the given field separator +field_sep+.
 
-When +field_sep+ is <tt>$;</tt>:
+With no arguments given,
+splits using the field separator <tt>$;</tt>,
+whose default value is +nil+.
 
-- If <tt>$;</tt> is +nil+ (its default value),
-  the split occurs just as if +field_sep+ were given as a space character
-  (see below).
+With no block given, returns the array of substrings:
 
-- If <tt>$;</tt> is a string,
-  the split occurs just as if +field_sep+ were given as that string
-  (see below).
+  'abracadabra'.split('a') # => ["", "br", "c", "d", "br"]
 
-When +field_sep+ is <tt>' '</tt> and +limit+ is +0+ (its default value),
-the split occurs at each sequence of whitespace:
+When +field_sep+ is +nil+ or <tt>' '</tt> (a single space),
+splits at each sequence of whitespace:
 
-  'abc def ghi'.split(' ')          # => ["abc", "def", "ghi"]
-  "abc \n\tdef\t\n  ghi".split(' ') # => ["abc", "def", "ghi"]
-  'abc  def   ghi'.split(' ')       # => ["abc", "def", "ghi"]
+  'foo bar baz'.split(nil)          # => ["foo", "bar", "baz"]
+  'foo bar baz'.split(' ')          # => ["foo", "bar", "baz"]
+  "foo \n\tbar\t\n  baz".split(' ') # => ["foo", "bar", "baz"]
+  'foo  bar   baz'.split(' ')       # => ["foo", "bar", "baz"]
   ''.split(' ')                     # => []
 
-When +field_sep+ is a string different from <tt>' '</tt>
-and +limit+ is +0+,
-the split occurs at each occurrence of +field_sep+;
-trailing empty substrings are not returned:
+When +field_sep+ is an empty string,
+splits at every character:
 
-  'abracadabra'.split('ab')   # => ["", "racad", "ra"]
-  'aaabcdaaa'.split('a')      # => ["", "", "", "bcd"]
-  ''.split('a')               # => []
-  '3.14159'.split('1')        # => ["3.", "4", "59"]
-  '!@#$%^$&*($)_+'.split('$') # => ["!@#", "%^", "&*(", ")_+"]
-  'тест'.split('т')           # => ["", "ес"]
-  'こんにちは'.split('に')      # => ["こん", "ちは"]
+  'abracadabra'.split('') # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]
+  ''.split('')            # => []
+  'тест'.split('')        # => ["т", "е", "с", "т"]
+  'こんにちは'.split('')   # => ["こ", "ん", "に", "ち", "は"]
 
-When +field_sep+ is a Regexp and +limit+ is +0+,
-the split occurs at each occurrence of a match;
-trailing empty substrings are not returned:
+When +field_sep+ is a non-empty string other than <tt>' '</tt> (a single space),
+uses that string as the separator:
+
+  'abracadabra'.split('a')  # => ["", "br", "c", "d", "br"]
+  'abracadabra'.split('ab') # => ["", "racad", "ra"]
+  ''.split('a')             # => []
+  'тест'.split('т')         # => ["", "ес"]
+  'こんにちは'.split('に')    # => ["こん", "ちは"]
+
+When +field_sep+ is a Regexp,
+splits at each occurrence of a matching substring:
 
   'abracadabra'.split(/ab/) # => ["", "racad", "ra"]
-  'aaabcdaaa'.split(/a/)    # => ["", "", "", "bcd"]
-  'aaabcdaaa'.split(//)     # => ["a", "a", "a", "b", "c", "d", "a", "a", "a"]
   '1 + 1 == 2'.split(/\W+/) # => ["1", "1", "2"]
+  'abracadabra'.split(//)   # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]
 
-If the \Regexp contains groups, their matches are also included
+If the \Regexp contains groups, their matches are included
 in the returned array:
 
   '1:2:3'.split(/(:)()()/, 2) # => ["1", ":", "", "", "2:3"]
 
-As seen above, if +limit+ is +0+,
-trailing empty substrings are not returned:
+Argument +limit+ sets a limit on the size of the returned array;
+it also determines whether trailing empty strings are included in the returned array.
 
-  'aaabcdaaa'.split('a')    # => ["", "", "", "bcd"]
+When +limit+ is zero,
+there is no limit on the size of the array,
+but trailing empty strings are omitted:
 
-If +limit+ is positive integer +n+, no more than <tt>n - 1-</tt>
-splits occur, so that at most +n+ substrings are returned,
-and trailing empty substrings are included:
+  'abracadabra'.split('', 0)  # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]
+  'abracadabra'.split('a', 0) # => ["", "br", "c", "d", "br"]  # Empty string after last 'a' omitted.
 
-  'aaabcdaaa'.split('a', 1) # => ["aaabcdaaa"]
-  'aaabcdaaa'.split('a', 2) # => ["", "aabcdaaa"]
-  'aaabcdaaa'.split('a', 5) # => ["", "", "", "bcd", "aa"]
-  'aaabcdaaa'.split('a', 7) # => ["", "", "", "bcd", "", "", ""]
-  'aaabcdaaa'.split('a', 8) # => ["", "", "", "bcd", "", "", ""]
+When +limit+ is a positive integer +n+,
+the returned array has at most +n+ elements (no more than <tt>n - 1</tt> splits occur),
+and trailing empty strings are included:
 
-Note that if +field_sep+ is a \Regexp containing groups,
-their matches are in the returned array, but do not count toward the limit.
+  'abracadabra'.split('', 3)   # => ["a", "b", "racadabra"]
+  'abracadabra'.split('a', 3)  # => ["", "br", "cadabra"]
+  'abracadabra'.split('', 30)  # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a", ""]
+  'abracadabra'.split('a', 30) # => ["", "br", "c", "d", "br", ""]
+  'abracadabra'.split('', 1)   # => ["abracadabra"]
+  'abracadabra'.split('a', 1)  # => ["abracadabra"]
 
-If +limit+ is negative, it behaves the same as if +limit+ was zero,
-meaning that there is no limit,
-and trailing empty substrings are included:
+When +limit+ is negative,
+there is no limit on the size of the array,
+and trailing empty strings are included:
 
-  'aaabcdaaa'.split('a', -1) # => ["", "", "", "bcd", "", "", ""]
+  'abracadabra'.split('', -1)  # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a", ""]
+  'abracadabra'.split('a', -1) # => ["", "br", "c", "d", "br", ""]
 
 With a block given, calls the block with each substring and returns +self+:
 
-  'abc def ghi'.split(' ') {|substring| p substring }
+  'foo bar baz'.split(' ') {|substring| p substring }
+
+Output:
+
+  "foo"
+  "bar"
+  "baz"
 
-Output:
+Note that the above example is functionally equivalent to:
 
-  "abc"
-  "def"
-  "ghi"
-  => "abc def ghi"
+  'foo bar baz'.split(' ').each {|substring| p substring }
 
-Note that the above example is functionally the same as calling +#each+ after
-+#split+ and giving the same block. However, the above example has better
-performance because it avoids the creation of an intermediate array. Also,
-note the different return values.
+Output:
 
-  'abc def ghi'.split(' ').each {|substring| p substring }
+  "foo"
+  "bar"
+  "baz"
 
-Output:
+But the latter:
 
-  "abc"
-  "def"
-  "ghi"
-  => ["abc", "def", "ghi"]
+- Has poorer performance because it creates an intermediate array.
+- Returns an array (instead of +self+).
 
-Related: String#partition, String#rpartition.
+Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non--5CString].
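The +limit+ rules in the updated split.rdoc can be checked with a short Ruby script. This is an illustrative sketch and is not part of the commit. The expected values in the comments are the ones the rdoc examples state. The last call also covers the rule, dropped from the old text, that captured Regexp groups are added to the result but do not count toward the limit.

```ruby
# Sketch (not from the commit): exercises the limit rules described in split.rdoc.
s = 'abracadabra'

# limit == 0 (the default): no size limit, trailing empty strings omitted.
zero = s.split('a', 0)
# limit > 0: at most limit elements, trailing empty strings included.
three = s.split('a', 3)
thirty = s.split('a', 30)
one = s.split('a', 1)
# limit < 0: no size limit, trailing empty strings included.
negative = s.split('a', -1)
# Regexp groups: captured substrings are added but do not count toward the limit.
grouped = '1:2:3'.split(/(:)()()/, 2)

p zero      # => ["", "br", "c", "d", "br"]
p three     # => ["", "br", "cadabra"]
p thirty    # => ["", "br", "c", "d", "br", ""]
p one       # => ["abracadabra"]
p negative  # => ["", "br", "c", "d", "br", ""]
p grouped   # => ["1", ":", "", "", "2:3"]
```

Only the sign of +limit+ decides whether the trailing empty string after the final <tt>'a'</tt> survives. A positive +limit+ also caps how many splits are made.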
diff --git a/string.c b/string.c
index f907798f85..1236057ad1 100644
--- a/string.c
+++ b/string.c
@@ -9192,7 +9192,7 @@ literal_split_pattern(VALUE spat, split_type_t default_type)
 
 /*
  *  call-seq:
- *    split(field_sep = $;, limit = 0) -> array
+ *    split(field_sep = $;, limit = 0) -> array_of_substrings
  *    split(field_sep = $;, limit = 0) {|substring| ... } -> self
  *
  *  :include: doc/string/split.rdoc
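The revised call-seq separates the two return values: +array_of_substrings+ without a block and +self+ with one. The following Ruby sketch (illustrative, not from the commit) compares the block form with calling +each+ on the returned array. The variable names are hypothetical.

```ruby
# Sketch (not from the commit): contrasts the two call-seq forms of String#split.
str = 'foo bar baz'

# With a block: each substring is yielded, and the receiver (a String) is returned.
seen = []
block_result = str.split(' ') { |substring| seen << substring }

# Without a block: the array of substrings is built first; Array#each returns that array.
each_result = str.split(' ').each { |substring| substring }

p seen          # => ["foo", "bar", "baz"]
p block_result  # => "foo bar baz"
p each_result   # => ["foo", "bar", "baz"]
```

Both forms see the same substrings. Only the block form skips building the intermediate array, and the two forms return different kinds of object.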