Diffstat (limited to 'doc/language')
| -rw-r--r-- | doc/language/box.md | 357 |
| -rw-r--r-- | doc/language/bsearch.rdoc | 120 |
| -rw-r--r-- | doc/language/calendars.rdoc | 62 |
| -rw-r--r-- | doc/language/case_mapping.rdoc | 106 |
| -rw-r--r-- | doc/language/character_selectors.rdoc | 100 |
| -rw-r--r-- | doc/language/dig_methods.rdoc | 82 |
| -rw-r--r-- | doc/language/encodings.rdoc | 482 |
| -rw-r--r-- | doc/language/exceptions.md | 521 |
| -rw-r--r-- | doc/language/fiber.md | 290 |
| -rw-r--r-- | doc/language/format_specifications.rdoc | 354 |
| -rw-r--r-- | doc/language/globals.md | 611 |
| -rw-r--r-- | doc/language/hash_inclusion.rdoc | 31 |
| -rw-r--r-- | doc/language/implicit_conversion.rdoc | 221 |
| -rw-r--r-- | doc/language/marshal.rdoc | 318 |
| -rw-r--r-- | doc/language/option_dump.md | 265 |
| -rw-r--r-- | doc/language/options.md | 744 |
| -rw-r--r-- | doc/language/packed_data.rdoc | 729 |
| -rw-r--r-- | doc/language/ractor.md | 797 |
| -rw-r--r-- | doc/language/regexp/methods.rdoc | 41 |
| -rw-r--r-- | doc/language/regexp/unicode_properties.rdoc | 718 |
| -rw-r--r-- | doc/language/signals.rdoc | 106 |
| -rw-r--r-- | doc/language/strftime_formatting.rdoc | 525 |
22 files changed, 7580 insertions, 0 deletions
diff --git a/doc/language/box.md b/doc/language/box.md new file mode 100644 index 0000000000..92514b3ec9 --- /dev/null +++ b/doc/language/box.md @@ -0,0 +1,357 @@ +# Ruby Box - Ruby's in-process separation of Classes and Modules + +Ruby Box is designed to provide separated spaces in a Ruby process, to isolate application code, libraries and monkey patches. + +## Known issues + +* Experimental warning is shown when ruby starts with `RUBY_BOX=1` (specify `-W:no-experimental` option to hide it) +* Installing native extensions may fail under `RUBY_BOX=1` because of stack level too deep in extconf.rb +* `require 'active_support/core_ext'` may fail under `RUBY_BOX=1` +* Defined methods in a box may not be referred by built-in methods written in Ruby + +## TODOs + +* Add the loaded box on iseq to check if another box tries running the iseq (add a field only when VM_CHECK_MODE?) +* Assign its own TOPLEVEL_BINDING in boxes +* Fix calling `warn` in boxes to refer `$VERBOSE` and `Warning.warn` in the box +* Make an internal data container class `Ruby::Box::Entry` invisible +* More test cases about `$LOAD_PATH` and `$LOADED_FEATURES` + +## How to use + +### Enabling Ruby Box + +First, an environment variable should be set at the ruby process bootup: `RUBY_BOX=1`. +The only valid value is `1` to enable Ruby Box. Other values (or unset `RUBY_BOX`) means disabling Ruby Box. And setting the value after Ruby program starts doesn't work. + +### Using Ruby Box + +`Ruby::Box` class is the entrypoint of Ruby Box. + +```ruby +box = Ruby::Box.new +box.require('something') # or require_relative, load +``` + +The required file (either .rb or .so/.dll/.bundle) is loaded in the box (`box` here). The required/loaded files from `something` will be loaded in the box recursively. + +```ruby +# something.rb + +X = 1 + +class Something + def self.x = X + def x = ::X +end +``` + +Classes/modules, those methods and constants defined in the box can be accessed via `box` object. 
+ +```ruby +X = 2 +p X # 2 +p ::X # 2 +p box::Something.x # 1 +p box::X # 1 +``` + +Instance methods defined in the box also run with definitions in the box. + +```ruby +s = box::Something.new + +p s.x # 1 +``` + +## Specifications + +### Ruby Box types + +There are three box types: + +* Master box +* Root box +* User boxes + +Ruby bootstrap runs in the root box, and a + +There is the root box, just a single box in a Ruby process. All builtin classes/modules are defined and run in the root box. (See "Builtin classes and modules".) + +User boxes are to run user-written programs and libraries loaded from user programs. The user's main program (specified by the `ruby` command line argument) is executed in the "main" box, which is a user box automatically created at the end of Ruby's bootstrap. The files specified with `-r` command line option will be required in the main box. + +Calling `Ruby::Box.new` creates an "optional" box (a user, non-main box), technically equal to the main box. + +Ruby also has the master box. The master box is the "master copy" of all boxes. Boxes will be created as a copy of the master box. The master box is only for the source of box copies, and no code runs in the master box. + + +``` +[master] + | + |----[root] + | + |----[main] + | + |----[user box 1] + | + |----[user box 2] + ... +``` + +### Ruby Box class and instances + +`Ruby::Box` is a class, as a subclass of `Module`. `Ruby::Box` instances are a kind of `Module`. + +### Classes and modules defined in boxes + +The classes and modules, newly defined in a box `box`, are accessible via `box`. For example, if a class `A` is defined in `box`, it is accessible as `box::A` from outside of the box. + +In the box `box`, `A` can be referred to as `A` (and `::A`). + +### Built-in classes and modules reopened in boxes + +In boxes, builtin classes/modules are visible and can be reopened. 
Those classes/modules can be reopened using `class` or `module` clauses, and class/module definitions can be changed. + +The changed definitions are visible only in the box. In other boxes, builtin classes/modules and their instances work without changed definitions. + +```ruby +# in foo.rb +class String + BLANK_PATTERN = /\A\s*\z/ + def blank? + self.match?(BLANK_PATTERN) + end +end + +module Foo + def self.foo = "foo" + + def self.foo_is_blank? + foo.blank? + end +end + +Foo.foo.blank? #=> false +"foo".blank? #=> false + +# in main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::Foo.foo_is_blank? #=> false (#blank? called in box) + +"foo".blank? # NoMethodError +String::BLANK_PATTERN # NameError +``` + +The main box and `box` above are different boxes, so monkey patches in main are also invisible in `box`. + +### Builtin classes and modules + +In the box context, "builtin" classes and modules are classes and modules: + +* Accessible without any `require` calls in user scripts +* Defined before any user program starts running + +Hereafter, "builtin classes and modules" will be referred to as just "builtin classes". + +Builtin classes and modules are loaded in all boxes, and run in the root box. + +### Exceptional non-built-in classes/modules + +There are some exceptional classes/modules that are enabled by default, but aren't built-in classes. Those classes/modules are: + +* `RubyGems` +* `ErrorHighlight` +* `DidYouMean` +* `SyntaxSuggest` + +Those classes/modules (part of default gems) are loaded in each box independently. If a user box's code calls RubyGems, it calls the RubyGems inside the box itself, instead of the root box's one. + +### Builtin classes referred via box objects + +Builtin classes in a box `box` can be referred to from other boxes. For example, `box::String` is a valid reference, and `String` and `box::String` are identical (`String == box::String`, `String.object_id == box::String.object_id`). 
+ +`box::String`-like reference returns just a `String` in the current box, so its definition is `String` in the box, not in `box`. + +```ruby +# foo.rb +class String + def self.foo = "foo" +end + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::String.foo # NoMethodError +``` + +### Class instance variables, class variables, constants + +Builtin classes can have different sets of class instance variables, class variables and constants between boxes. + +```ruby +# foo.rb +class Array + @v = "foo" + @@v = "_foo_" + V = "FOO" +end + +Array.instance_variable_get(:@v) #=> "foo" +Array.class_variable_get(:@@v) #=> "_foo_" +Array.const_get(:V) #=> "FOO" + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +Array.instance_variable_get(:@v) #=> nil +Array.class_variable_get(:@@v) # NameError +Array.const_get(:V) # NameError +``` + +### Global variables + +In boxes, changes on global variables are also isolated in the boxes. Changes on global variables in a box are visible/applied only in the box. + +```ruby +# foo.rb +$foo = "foo" +$VERBOSE = nil + +puts "This appears: '#{$foo}'" + +# main.rb +p $foo #=> nil +p $VERBOSE #=> false + +box = Ruby::Box.new +box.require_relative('foo') # "This appears: 'foo'" + +p $foo #=> nil +p $VERBOSE #=> false +``` + +### Top level constants + +Usually, top level constants are defined as constants of `Object`. In boxes, top level constants are constants of `Object` in the box. And the box object `box`'s constants are strictly equal to constants of `Object`. + +```ruby +# foo.rb +FOO = 100 + +FOO #=> 100 +Object::FOO #=> 100 + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::FOO #=> 100 + +FOO # NameError +Object::FOO # NameError +``` + +### Top level methods + +Top level methods are private instance methods of `Object`, in each box. 
+ +```ruby +# foo.rb +def yay = "foo" + +class Foo + def self.say = yay +end + +Foo.say #=> "foo" +yay #=> "foo" + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::Foo.say #=> "foo" + +yay # NoMethodError +``` + +There is no way to expose top level methods in boxes to others. +(See "Expose top level methods as a method of the box object" in "Discussions" section below) + +### Ruby Box scopes + +Ruby Box works in file scope. One `.rb` file runs in a single box. + +Once a file is loaded in a box `box`, all methods/procs defined/created in the file run in `box`. + +### Utility methods + +Several methods are available for trying/testing Ruby Box. + +* `Ruby::Box.current` returns the current box +* `Ruby::Box.enabled?` returns true/false to represent `RUBY_BOX=1` is specified or not +* `Ruby::Box.root` returns the root box +* `Ruby::Box.main` returns the main box +* `Ruby::Box#eval` evaluates a Ruby code (String) in the receiver box, just like calling `#load` with a file + +## Implementation details + +#### ISeq inline method/constant cache + +As described above in "Ruby Box scopes", an ".rb" file runs in a box. So method/constant resolution will be done in a box consistently. + +That means ISeq inline caches work well even with boxes. Otherwise, it's a bug. + +#### Method call global cache (gccct) + +`rb_funcall()` C function refers to the global cc cache table (gccct), and the cache key is calculated with the current box. + +So, `rb_funcall()` calls have a performance penalty when Ruby Box is enabled. + +#### Current box and loading box + +The current box is the box that the executing code is in. `Ruby::Box.current` returns the current box object. + +The loading box is an internally managed box to determine the box to load newly required/loaded files. For example, `box` is the loading box when `box.require("foo")` is called. 
+ +## Discussions + +#### More builtin methods written in Ruby + +If Ruby Box is enabled by default, builtin methods can be written in Ruby because they can't be overridden by users' monkey patches. Builtin Ruby methods can be JIT-ed, which could improve performance. + +#### Monkey patching methods called by builtin methods + +Builtin methods sometimes call other builtin methods. For example, `Hash#map` calls `Hash#each` to retrieve entries to be mapped. Without Ruby Box, Ruby users can overwrite `Hash#each` and expect the behavior change of `Hash#map` as a result. + +But with boxes, `Hash#map` runs in the root box. Ruby users can define `Hash#each` only in user boxes, so users cannot change `Hash#map`'s behavior in this case. To achieve it, users should override both `Hash#map` and `Hash#each` (or only `Hash#map`). + +It is a breaking change. + +Users can define methods using `Ruby::Box.root.eval(...)`, but it's clearly not an ideal API. + +#### Assigning values to global variables used by builtin methods + +Similar to monkey patching methods, global variables assigned in a box are separated from the root box. Methods defined in the root box referring to a global variable can't find the re-assigned one. + +#### Context of `$LOAD_PATH` and `$LOADED_FEATURES` + +Global variables `$LOAD_PATH` and `$LOADED_FEATURES` control `require` method behaviors. So those variables are determined by the loading box instead of the current box. + +This could potentially conflict with the user's expectations. We should find a solution. + +#### Expose top level methods as a method of the box object + +Currently, top level methods in boxes are not accessible from outside of the box. But there might be a use case to call another box's top level methods. 
+ +#### Separate `cc_tbl` and `callable_m_tbl`, `cvc_tbl` for less classext CoW + +The fields of `rb_classext_t` contain several cache(-like) data, `cc_tbl`(callcache table), `callable_m_tbl`(table of resolved complemented methods) and `cvc_tbl`(class variable cache table). + +The classext CoW is triggered when the contents of `rb_classext_t` are changed, including `cc_tbl`, `callable_m_tbl`, and `cvc_tbl`. But those three tables are changed by just calling methods or referring to class variables. So, currently, classext CoW is triggered many more times than originally expected. + +If we can move those three tables outside of `rb_classext_t`, the number of copied `rb_classext_t` will be much less than the current implementation. diff --git a/doc/language/bsearch.rdoc b/doc/language/bsearch.rdoc new file mode 100644 index 0000000000..90705853d7 --- /dev/null +++ b/doc/language/bsearch.rdoc @@ -0,0 +1,120 @@ += Binary Searching + +A few Ruby methods support binary searching in a collection: + +Array#bsearch:: Returns an element selected via a binary search + as determined by a given block. +Array#bsearch_index:: Returns the index of an element selected via a binary search + as determined by a given block. +Range#bsearch:: Returns an element selected via a binary search + as determined by a given block. + +Each of these methods returns an enumerator if no block is given. + +Given a block, each of these methods returns an element (or element index) from +self+ +as determined by a binary search. +The search finds an element of +self+ which meets +the given condition in <tt>O(log n)</tt> operations, where +n+ is the count of elements. ++self+ should be sorted, but this is not checked. + +There are two search modes: + +Find-minimum mode:: method +bsearch+ returns the first element for which + the block returns +true+; + the block must return +true+ or +false+. +Find-any mode:: method +bsearch+ returns some element, if any, for which + the block returns zero. 
+ the block must return a numeric value. + +The block should not mix the modes by sometimes returning +true+ or +false+ +and other times returning a numeric value, but this is not checked. + +<b>Find-Minimum Mode</b> + +In find-minimum mode, the block must return +true+ or +false+. +The further requirement (though not checked) is that +there are no indexes +i+ and +j+ such that: + +- <tt>0 <= i < j < self.size</tt>. +- The block returns +true+ for <tt>self[i]</tt> and +false+ for <tt>self[j]</tt>. + +Less formally: the block is such that all +false+-evaluating elements +precede all +true+-evaluating elements. + +In find-minimum mode, method +bsearch+ returns the first element +for which the block returns +true+. + +Examples: + + a = [0, 4, 7, 10, 12] + a.bsearch {|x| x >= 4 } # => 4 + a.bsearch {|x| x >= 6 } # => 7 + a.bsearch {|x| x >= -1 } # => 0 + a.bsearch {|x| x >= 100 } # => nil + + r = (0...a.size) + r.bsearch {|i| a[i] >= 4 } #=> 1 + r.bsearch {|i| a[i] >= 6 } #=> 2 + r.bsearch {|i| a[i] >= 8 } #=> 3 + r.bsearch {|i| a[i] >= 100 } #=> nil + r = (0.0...Float::INFINITY) + r.bsearch {|x| Math.log(x) >= 0 } #=> 1.0 + +These blocks make sense in find-minimum mode: + + a = [0, 4, 7, 10, 12] + a.map {|x| x >= 4 } # => [false, true, true, true, true] + a.map {|x| x >= 6 } # => [false, false, true, true, true] + a.map {|x| x >= -1 } # => [true, true, true, true, true] + a.map {|x| x >= 100 } # => [false, false, false, false, false] + +This would not make sense: + + a.map {|x| x == 7 } # => [false, false, true, false, false] + +<b>Find-Any Mode</b> + +In find-any mode, the block must return a numeric value. +The further requirement (though not checked) is that +there are no indexes +i+ and +j+ such that: + +- <tt>0 <= i < j < self.size</tt>. +- The block returns a negative value for <tt>self[i]</tt> + and a positive value for <tt>self[j]</tt>. +- The block returns a negative value for <tt>self[i]</tt> and zero for <tt>self[j]</tt>. 
+- The block returns zero for <tt>self[i]</tt> and a positive value for <tt>self[j]</tt>. + +Less formally: the block is such that: + +- All positive-evaluating elements precede all zero-evaluating elements. +- All positive-evaluating elements precede all negative-evaluating elements. +- All zero-evaluating elements precede all negative-evaluating elements. + +In find-any mode, method +bsearch+ returns some element +for which the block returns zero, or +nil+ if no such element is found. + +Examples: + + a = [0, 4, 7, 10, 12] + a.bsearch {|element| 7 <=> element } # => 7 + a.bsearch {|element| -1 <=> element } # => nil + a.bsearch {|element| 5 <=> element } # => nil + a.bsearch {|element| 15 <=> element } # => nil + + a = [0, 100, 100, 100, 200] + r = (0..4) + r.bsearch {|i| 100 - a[i] } #=> 1, 2 or 3 + r.bsearch {|i| 300 - a[i] } #=> nil + r.bsearch {|i| 50 - a[i] } #=> nil + +These blocks make sense in find-any mode: + + a = [0, 4, 7, 10, 12] + a.map {|element| 7 <=> element } # => [1, 1, 0, -1, -1] + a.map {|element| -1 <=> element } # => [-1, -1, -1, -1, -1] + a.map {|element| 5 <=> element } # => [1, 1, -1, -1, -1] + a.map {|element| 15 <=> element } # => [1, 1, 1, 1, 1] + +This would not make sense: + + a.map {|element| element <=> 7 } # => [-1, -1, 0, 1, 1] diff --git a/doc/language/calendars.rdoc b/doc/language/calendars.rdoc new file mode 100644 index 0000000000..a2540f1c43 --- /dev/null +++ b/doc/language/calendars.rdoc @@ -0,0 +1,62 @@ +== Julian and Gregorian Calendars + +The difference between the +{Julian calendar}[https://en.wikipedia.org/wiki/Julian_calendar] +and the +{Gregorian calendar}[https://en.wikipedia.org/wiki/Gregorian_calendar] +may matter to your program if it uses dates before the switchovers. + +- October 15, 1582. +- September 14, 1752. + +A date will be different in the two calendars, in general. + +=== Different switchover dates + +The reasons for the difference are religious/political histories. 
+ +- On October 15, 1582, several countries changed + from the Julian calendar to the Gregorian calendar; + these included Italy, Poland, Portugal, and Spain. + Other countries in the Western world retained the Julian calendar. +- On September 14, 1752, most of the British empire + changed from the Julian calendar to the Gregorian calendar. + +When your code uses a date before these switchover dates, +it will matter whether it considers the switchover date +to be the earlier date or the later date (or neither). + +See also {a concrete example here}[rdoc-ref:DateTime@When+should+you+use+DateTime+and+when+should+you+use+Time-3F]. + +=== Argument +start+ + +Certain methods in class \Date handle differences in the +{Julian and Gregorian calendars}[rdoc-ref:@Julian+and+Gregorian+Calendars] +by accepting an optional argument +start+, whose value may be: + +- Date::ITALY (the default): the created date is Julian + if before October 15, 1582, Gregorian otherwise: + + d = Date.new(1582, 10, 15) + d.prev_day.julian? # => true + d.julian? # => false + d.gregorian? # => true + +- Date::ENGLAND: the created date is Julian if before September 14, 1752, + Gregorian otherwise: + + d = Date.new(1752, 9, 14, Date::ENGLAND) + d.prev_day.julian? # => true + d.julian? # => false + d.gregorian? # => true + +- Date::JULIAN: the created date is Julian regardless of its value: + + d = Date.new(1582, 10, 15, Date::JULIAN) + d.julian? # => true + +- Date::GREGORIAN: the created date is Gregorian regardless of its value: + + d = Date.new(1752, 9, 14, Date::GREGORIAN) + d.prev_day.gregorian? # => true + diff --git a/doc/language/case_mapping.rdoc b/doc/language/case_mapping.rdoc new file mode 100644 index 0000000000..d40155db03 --- /dev/null +++ b/doc/language/case_mapping.rdoc @@ -0,0 +1,106 @@ += Case Mapping + +Some string-oriented methods use case mapping. + +In String: + +- String#capitalize +- String#capitalize! +- String#casecmp +- String#casecmp? +- String#downcase +- String#downcase! 
+- String#swapcase +- String#swapcase! +- String#upcase +- String#upcase! + +In Symbol: + +- Symbol#capitalize +- Symbol#casecmp +- Symbol#casecmp? +- Symbol#downcase +- Symbol#swapcase +- Symbol#upcase + +== Default Case Mapping + +By default, all of these methods use full Unicode case mapping, +which is suitable for most languages. +See {Section 3.13 (Default Case Algorithms) of the Unicode standard}[https://www.unicode.org/versions/latest/ch03.pdf]. + +Non-ASCII case mapping and folding are supported for UTF-8, +UTF-16BE/LE, UTF-32BE/LE, and ISO-8859-1~16 Strings/Symbols. + +Context-dependent case mapping as described in +{Table 3-17 (Context Specification for Casing) of the Unicode standard}[https://www.unicode.org/versions/latest/ch03.pdf] +is currently not supported. + +In most cases, the case conversion of a string has the same number of characters as before. +There are exceptions (see also +:fold+ below): + + s = "\u00DF" # => "ß" + s.upcase # => "SS" + s = "\u0149" # => "ʼn" + s.upcase # => "ʼN" + +Case mapping may also depend on locale (see also +:turkic+ below): + + s = "\u0049" # => "I" + s.downcase # => "i" # Dot above. + s.downcase(:turkic) # => "ı" # No dot above. + +Case changes may not be reversible: + + s = 'Hello World!' # => "Hello World!" + s.downcase # => "hello world!" + s.downcase.upcase # => "HELLO WORLD!" # Different from original s. + +Case changing methods may not maintain Unicode normalization. +See String#unicode_normalize. + +== Case Mappings + +Except for +casecmp+ and +casecmp?+, +each of the case-mapping methods listed above +accepts an optional argument, <tt>mapping</tt>. + +The argument is one of: + +- +:ascii+: ASCII-only mapping. 
+ Uppercase letters ('A'..'Z') are mapped to lowercase letters ('a'..'z'); + other characters are not changed. + + s = "Foo \u00D8 \u00F8 Bar" # => "Foo Ø ø Bar" + s.upcase # => "FOO Ø Ø BAR" + s.downcase # => "foo ø ø bar" + s.upcase(:ascii) # => "FOO Ø ø BAR" + s.downcase(:ascii) # => "foo Ø ø bar" + +- +:turkic+: Full Unicode case mapping. + For the Turkic languages + that distinguish dotted and dotless I, for example Turkish and Azeri. + + s = 'Türkiye' # => "Türkiye" + s.upcase # => "TÜRKIYE" + s.upcase(:turkic) # => "TÜRKİYE" # Dot above. + + s = 'TÜRKIYE' # => "TÜRKIYE" + s.downcase # => "türkiye" + s.downcase(:turkic) # => "türkıye" # No dot above. + +- +:fold+ (available only for String#downcase, String#downcase!, + and Symbol#downcase). + Unicode case folding, + which is more far-reaching than Unicode case mapping. + + s = "\u00DF" # => "ß" + s.downcase # => "ß" + s.downcase(:fold) # => "ss" + s.upcase # => "SS" + + s = "\uFB04" # => "ffl" + s.downcase # => "ffl" + s.upcase # => "FFL" + s.downcase(:fold) # => "ffl" diff --git a/doc/language/character_selectors.rdoc b/doc/language/character_selectors.rdoc new file mode 100644 index 0000000000..8bfc9b719b --- /dev/null +++ b/doc/language/character_selectors.rdoc @@ -0,0 +1,100 @@ += Character Selectors + +== Character Selector + +A _character_ _selector_ is a string argument accepted by certain Ruby methods. +Each of these instance methods accepts one or more character selectors: + +- String#tr(selector, replacements): returns a new string. +- String#tr!(selector, replacements): returns +self+ or +nil+. +- String#tr_s(selector, replacements): returns a new string. +- String#tr_s!(selector, replacements): returns +self+ or +nil+. +- String#count(*selectors): returns the count of the specified characters. +- String#delete(*selectors): returns a new string. +- String#delete!(*selectors): returns +self+ or +nil+. +- String#squeeze(*selectors): returns a new string. 
+- String#squeeze!(*selectors): returns +self+ or +nil+. +- String#strip(*selectors): returns a new string. +- String#strip!(*selectors): returns +self+ or +nil+. + +A character selector identifies zero or more characters in +self+ +that are to be operands for the method. + +In this section, we illustrate using method String#delete(selector), +which deletes the selected characters. + +In the simplest case, the characters selected are exactly those +contained in the selector itself: + + 'abracadabra'.delete('a') # => "brcdbr" + 'abracadabra'.delete('ab') # => "rcdr" + 'abracadabra'.delete('abc') # => "rdr" + '0123456789'.delete('258') # => "0134679" + '!@#$%&*()_+'.delete('+&#') # => "!@$%*()_" + 'こんにちは'.delete('に') # => "こんちは" + +Note that order and repetitions do not matter: + + 'abracadabra'.delete('dcab') # => "rr" + 'abracadabra'.delete('aaaa') # => "brcdbr" + +In a character selector, these three characters get special treatment: + +- A leading caret (<tt>'^'</tt>) functions as a "not" operator + for the characters to its right: + + 'abracadabra'.delete('^bc') # => "bcb" + '0123456789'.delete('^852') # => "258" + +- A hyphen (<tt>'-'</tt>) between two other characters + defines a range of characters instead of a plain string of characters: + + 'abracadabra'.delete('a-d') # => "rr" + '0123456789'.delete('4-7') # => "012389" + '!@#$%&*()_+'.delete(' -/') # => "@_" + + # May contain more than one range. + 'abracadabra'.delete('a-cq-t') # => "d" + + # Ranges may be mixed with plain characters. + '0123456789'.delete('67-950-23') # => "4" + + # Ranges may be mixed with negations. 
+ 'abracadabra'.delete('^a-c') # => "abacaaba" + +- A backslash (<tt>'\'</tt>) acts as an escape for a caret, a hyphen, + or another backslash: + + 'abracadabra^'.delete('\^bc') # => "araadara" + 'abracadabra-'.delete('a\-d') # => "brcbr" + "hello\r\nworld".delete("\r") # => "hello\nworld" + "hello\r\nworld".delete("\\r") # => "hello\r\nwold" + "hello\r\nworld".delete("\\\r") # => "hello\nworld" + +== Multiple Character Selectors + +These instance methods accept multiple character selectors: + +- String#count(*selectors): returns the count of the specified characters. +- String#delete(*selectors): returns a new string. +- String#delete!(*selectors): returns +self+ or +nil+. +- String#squeeze(*selectors): returns a new string. +- String#squeeze!(*selectors): returns +self+ or +nil+. +- String#strip(*selectors): returns a new string. +- String#strip!(*selectors): returns +self+ or +nil+. + +In effect, the given selectors are formed into a single selector +consisting of only those characters common to _all_ of the given selectors. + +All forms of selectors may be used, including negations, ranges, and escapes. + +Each of these pairs of method calls is equivalent: + + s.delete('abcde', 'dcbfg') + s.delete('bcd') + + s.delete('^abc', '^def') + s.delete('^abcdef') + + s.delete('a-e', 'c-g') + s.delete('cde') diff --git a/doc/language/dig_methods.rdoc b/doc/language/dig_methods.rdoc new file mode 100644 index 0000000000..366275d451 --- /dev/null +++ b/doc/language/dig_methods.rdoc @@ -0,0 +1,82 @@ += Dig Methods + +Ruby's +dig+ methods are useful for accessing nested data structures. 
+ +Consider this data: + item = { + id: "0001", + type: "donut", + name: "Cake", + ppu: 0.55, + batters: { + batter: [ + {id: "1001", type: "Regular"}, + {id: "1002", type: "Chocolate"}, + {id: "1003", type: "Blueberry"}, + {id: "1004", type: "Devil's Food"} + ] + }, + topping: [ + {id: "5001", type: "None"}, + {id: "5002", type: "Glazed"}, + {id: "5005", type: "Sugar"}, + {id: "5007", type: "Powdered Sugar"}, + {id: "5006", type: "Chocolate with Sprinkles"}, + {id: "5003", type: "Chocolate"}, + {id: "5004", type: "Maple"} + ] + } + +Without a +dig+ method, you can write: + item[:batters][:batter][1][:type] # => "Chocolate" + +With a +dig+ method, you can write: + item.dig(:batters, :batter, 1, :type) # => "Chocolate" + +Without a +dig+ method, you can write, erroneously +(raises <tt>NoMethodError (undefined method `[]' for nil:NilClass)</tt>): + item[:batters][:BATTER][1][:type] + +With a +dig+ method, you can write (still erroneously, but avoiding the exception): + item.dig(:batters, :BATTER, 1, :type) # => nil + +== Why Is +dig+ Better? + +- It has fewer syntactical elements (to get wrong). +- It reads better. +- It does not raise an exception if an item is not found. + +== How Does +dig+ Work? + +The call sequence is: + obj.dig(*identifiers) + +The +identifiers+ define a "path" into the nested data structures: +- For each identifier in +identifiers+, calls method \#dig on a receiver + with that identifier. +- The first receiver is +self+. +- Each successive receiver is the value returned by the previous call to +dig+. +- The value finally returned is the value returned by the last call to +dig+. + +A +dig+ method raises an exception if any receiver does not respond to \#dig: + h = { foo: 1 } + # Raises TypeError (Integer does not have #dig method): + h.dig(:foo, :bar) + +== What Else? + +The structure above has \Hash objects and \Array objects, +both of which have instance method +dig+. 
+ +Altogether there are six built-in Ruby classes that have method +dig+, +three in the core classes and three in the standard library. + +In the core: +- Array#dig: the first argument is an \Integer index. +- Hash#dig: the first argument is a key. +- Struct#dig: the first argument is a key. + +In the standard library: +- OpenStruct#dig: the first argument is a \String name. +- CSV::Table#dig: the first argument is an \Integer index or a \String header. +- CSV::Row#dig: the first argument is an \Integer index or a \String header. diff --git a/doc/language/encodings.rdoc b/doc/language/encodings.rdoc new file mode 100644 index 0000000000..683842d3fb --- /dev/null +++ b/doc/language/encodings.rdoc @@ -0,0 +1,482 @@ += Encodings + +== The Basics + +A {character encoding}[https://en.wikipedia.org/wiki/Character_encoding], +often shortened to _encoding_, is a mapping between: + +- A sequence of 8-bit bytes (each byte in the range <tt>0..255</tt>). +- Characters in a specific character set. + +Some character sets contain only 1-byte characters; +{US-ASCII}[https://en.wikipedia.org/wiki/ASCII], for example, has 128 1-byte characters. +This string, encoded in US-ASCII, has six characters that are stored as six bytes: + + s = 'Hello!'.encode(Encoding::US_ASCII) # => "Hello!" + s.encoding # => #<Encoding:US-ASCII> + s.bytes # => [72, 101, 108, 108, 111, 33] + +Other encodings may involve multi-byte characters. +{UTF-8}[https://en.wikipedia.org/wiki/UTF-8], for example, +encodes more than one million characters, encoding each in one to four bytes. +The lowest-valued of these characters correspond to ASCII characters, +and so are 1-byte characters: + + s = 'Hello!' # => "Hello!" + s.bytes # => [72, 101, 108, 108, 111, 33] + +Other characters, such as the Euro symbol, are multi-byte: + + s = "\u20ac" # => "€" + s.bytes # => [226, 130, 172] + +== The \Encoding Class + +=== \Encoding Objects + +Ruby encodings are defined by constants in class \Encoding. 
+There can be only one instance of \Encoding for each of these constants. +Method Encoding.list returns an array of \Encoding objects (one for each constant): + + Encoding.list.size # => 103 + Encoding.list.first.class # => Encoding + Encoding.list.take(3) + # => [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>, #<Encoding:US-ASCII>] + +=== Names and Aliases + +Method Encoding#name returns the name of an \Encoding: + + Encoding::ASCII_8BIT.name # => "ASCII-8BIT" + Encoding::WINDOWS_31J.name # => "Windows-31J" + +An \Encoding object has zero or more aliases; +method Encoding#names returns an array containing the name and all aliases: + + Encoding::ASCII_8BIT.names + # => ["ASCII-8BIT", "BINARY"] + Encoding::WINDOWS_31J.names + #=> ["Windows-31J", "CP932", "csWindows31J", "SJIS", "PCK"] + +Method Encoding.aliases returns a hash of all alias/name pairs: + + Encoding.aliases.size # => 71 + Encoding.aliases.take(3) + # => [["BINARY", "ASCII-8BIT"], ["CP437", "IBM437"], ["CP720", "IBM720"]] + +Method Encoding.name_list returns an array of all the encoding names and aliases: + + Encoding.name_list.size # => 175 + Encoding.name_list.take(3) + # => ["ASCII-8BIT", "UTF-8", "US-ASCII"] + +Method +name_list+ returns more entries than method +list+ +because it includes both the names and their aliases. 
+ +Method Encoding.find returns the \Encoding for a given name or alias, if it exists: + + Encoding.find("US-ASCII") # => #<Encoding:US-ASCII> + Encoding.find("US-ASCII").class # => Encoding + +=== Default Encodings + +Method Encoding.find, above, also returns a default \Encoding +for each of these special names: + +- +external+: the default external \Encoding: + + Encoding.find("external") # => #<Encoding:UTF-8> + +- +internal+: the default internal \Encoding (may be +nil+): + + Encoding.find("internal") # => nil + +- +locale+: the default \Encoding for a string from the environment: + + Encoding.find("locale") # => #<Encoding:UTF-8> # Linux + Encoding.find("locale") # => #<Encoding:IBM437> # Windows + +- +filesystem+: the default \Encoding for a string from the filesystem: + + Encoding.find("filesystem") # => #<Encoding:UTF-8> + +Method Encoding.default_external returns the default external \Encoding: + + Encoding.default_external # => #<Encoding:UTF-8> + +Method Encoding.default_external= sets that value: + + Encoding.default_external = Encoding::US_ASCII # => #<Encoding:US-ASCII> + Encoding.default_external # => #<Encoding:US-ASCII> + +Method Encoding.default_internal returns the default internal \Encoding: + + Encoding.default_internal # => nil + +Method Encoding.default_internal= sets the default internal \Encoding: + + Encoding.default_internal = Encoding::US_ASCII # => #<Encoding:US-ASCII> + Encoding.default_internal # => #<Encoding:US-ASCII> + +=== Compatible Encodings + +Method Encoding.compatible? 
returns whether two given objects are encoding-compatible +(that is, whether they can be concatenated); +returns the \Encoding of the concatenated string, or +nil+ if incompatible: + + rus = "\u{442 435 441 442}" + eng = 'text' + Encoding.compatible?(rus, eng) # => #<Encoding:UTF-8> + + s0 = "\xa1\xa1".force_encoding(Encoding::ISO_8859_1) # => "\xA1\xA1" + s1 = "\xa1\xa1".force_encoding(Encoding::EUCJP) # => "\x{A1A1}" + Encoding.compatible?(s0, s1) # => nil + +== \String \Encoding + +A Ruby String object has an encoding that is an instance of class \Encoding. +The encoding may be retrieved by method String#encoding. + +The default encoding for a string literal is the script encoding; +see {Script Encoding}[rdoc-ref:@Script+Encoding]. + + 's'.encoding # => #<Encoding:UTF-8> + +The default encoding for a string created with method String.new is: + +- For no argument, ASCII-8BIT. +- For a \String object argument, the encoding of that string. +- For a string literal, the script encoding; + see {Script Encoding}[rdoc-ref:@Script+Encoding]. + +In either case, any encoding may be specified: + + s = String.new(encoding: Encoding::UTF_8) # => "" + s.encoding # => #<Encoding:UTF-8> + s = String.new('foo', encoding: Encoding::BINARY) # => "foo" + s.encoding # => #<Encoding:BINARY (ASCII-8BIT)> + +The encoding for a string may be changed: + + s = "R\xC3\xA9sum\xC3\xA9" # => "Résumé" + s.encoding # => #<Encoding:UTF-8> + s.force_encoding(Encoding::ISO_8859_1) # => "R\xC3\xA9sum\xC3\xA9" + s.encoding # => #<Encoding:ISO-8859-1> + +Changing the assigned encoding does not alter the content of the string; +it changes only the way the content is to be interpreted: + + s # => "R\xC3\xA9sum\xC3\xA9" + s.force_encoding(Encoding::UTF_8) # => "Résumé" + +The actual content of a string may also be altered; +see {Transcoding a String}[#label-Transcoding+a+String]. + +Here are a couple of useful query methods: + + s = "abc".force_encoding(Encoding::UTF_8) # => "abc" + s.ascii_only? 
# => true + s = "abc\u{6666}".force_encoding(Encoding::UTF_8) # => "abc晦" + s.ascii_only? # => false + + s = "\xc2\xa1".force_encoding(Encoding::UTF_8) # => "¡" + s.valid_encoding? # => true + s = "\xc2".force_encoding(Encoding::UTF_8) # => "\xC2" + s.valid_encoding? # => false + +== \Symbol and \Regexp Encodings + +The string stored in a Symbol or Regexp object also has an encoding; +the encoding may be retrieved by method Symbol#encoding or Regexp#encoding. + +The default encoding for these, however, is: + +- US-ASCII, if all characters are US-ASCII. +- The script encoding, otherwise; + see {Script Encoding}[rdoc-ref:@Script+Encoding]. + +== Filesystem \Encoding + +The filesystem encoding is the default \Encoding for a string from the filesystem: + + Encoding.find("filesystem") # => #<Encoding:UTF-8> + +== Locale \Encoding + +The locale encoding is the default encoding for a string from the environment, +other than from the filesystem: + + Encoding.find('locale') # => #<Encoding:IBM437> + +== Stream Encodings + +Certain stream objects can have two encodings; these objects include instances of: + +- IO. +- File. +- ARGF. +- StringIO. + +The two encodings are: + +- An _external_ _encoding_, which identifies the encoding of the stream. +- An _internal_ _encoding_, which (if not +nil+) specifies the encoding + to be used for the string constructed from the stream. + +=== External \Encoding + +The external encoding, which is an \Encoding object, specifies how bytes read +from the stream are to be interpreted as characters. + +The default external encoding is: + +- UTF-8 for a text stream. +- ASCII-8BIT for a binary stream. + +The default external encoding is returned by method Encoding.default_external, +and may be set by: + +- Ruby command-line options <tt>--external_encoding</tt> or <tt>-E</tt>. 
+ +You can also set the default external encoding using method Encoding.default_external=, +but doing so may cause problems; strings created before and after the change +may have different encodings. + +For an \IO or \File object, the external encoding may be set by: + +- Open options +external_encoding+ or +encoding+, when the object is created; + see {Open Options}[rdoc-ref:IO@Open+Options]. + +For an \IO, \File, \ARGF, or \StringIO object, the external encoding may be set by: + +- Methods +set_encoding+ or (except for \ARGF) +set_encoding_by_bom+. + +=== Internal \Encoding + +The internal encoding, which is an \Encoding object or +nil+, +specifies how characters read from the stream +are to be converted to characters in the internal encoding; +those characters become a string whose encoding is set to the internal encoding. + +The default internal encoding is +nil+ (no conversion). +It is returned by method Encoding.default_internal, +and may be set by: + +- Ruby command-line options <tt>--internal_encoding</tt> or <tt>-E</tt>. + +You can also set the default internal encoding using method Encoding.default_internal=, +but doing so may cause problems; strings created before and after the change +may have different encodings. + +For an \IO or \File object, the internal encoding may be set by: + +- Open options +internal_encoding+ or +encoding+, when the object is created; + see {Open Options}[rdoc-ref:IO@Open+Options]. + +For an \IO, \File, \ARGF, or \StringIO object, the internal encoding may be set by: + +- Method +set_encoding+. + +== Script \Encoding + +A Ruby script has a script encoding, which may be retrieved by: + + __ENCODING__ # => #<Encoding:UTF-8> + +The default script encoding is UTF-8; +a Ruby source file may set its script encoding with a magic comment +on the first line of the file (or second line, if there is a shebang on the first). 
+The comment must contain the word +coding+ or +encoding+, +followed by a colon, space and the Encoding name or alias: + + # encoding: ISO-8859-1 + __ENCODING__ #=> #<Encoding:ISO-8859-1> + +== Transcoding + +_Transcoding_ is the process of changing a sequence of characters +from one encoding to another. + +As far as possible, the characters remain the same, +but the bytes that represent them may change. + +The handling for characters that cannot be represented in the destination encoding +may be specified by @Encoding+Options. + +=== Transcoding a \String + +Each of these methods transcodes a string: + +- String#encode: Transcodes +self+ into a new string + according to given encodings and options. +- String#encode!: Like String#encode, but transcodes +self+ in place. +- String#scrub: Transcodes +self+ into a new string + by replacing invalid byte sequences with a given or default replacement string. +- String#scrub!: Like String#scrub, but transcodes +self+ in place. +- String#unicode_normalize: Transcodes +self+ into a new string + according to Unicode normalization. +- String#unicode_normalize!: Like String#unicode_normalize, + but transcodes +self+ in place. + +== Transcoding a Stream + +Each of these methods may transcode a stream; +whether it does so depends on the external and internal encodings: + +- IO.foreach: Yields each line of given stream to the block. +- IO.new: Creates and returns a new \IO object for the given integer file descriptor. +- IO.open: Creates a new \IO object. +- IO.pipe: Creates a connected pair of reader and writer \IO objects. +- IO.popen: Creates an \IO object to interact with a subprocess. +- IO.read: Returns a string with all or a subset of bytes from the given stream. +- IO.readlines: Returns an array of strings, which are the lines from the given stream. +- IO.write: Writes a given string to the given stream. 
+ +This example writes a string to a file, encoding it as ISO-8859-1, +then reads the file into a new string, encoding it as UTF-8: + + s = "R\u00E9sum\u00E9" + path = 't.tmp' + ext_enc = Encoding::ISO_8859_1 + int_enc = Encoding::UTF_8 + + File.write(path, s, external_encoding: ext_enc) + raw_text = File.binread(path) + + transcoded_text = File.read(path, external_encoding: ext_enc, internal_encoding: int_enc) + + p raw_text + p transcoded_text + +Output: + + "R\xE9sum\xE9" + "Résumé" + +== \Encoding Options + +A number of methods in the Ruby core accept keyword arguments as encoding options. + +Some of the options specify or utilize a _replacement_ _string_, to be used +in certain transcoding operations. +A replacement string may be in any encoding that can be converted +to the encoding of the destination string. + +These keyword-value pairs specify encoding options: + +- For an invalid byte sequence: + + - <tt>:invalid: nil</tt> (default): Raise exception. + - <tt>:invalid: :replace</tt>: Replace each invalid byte sequence + with the replacement string. + + Examples: + + s = "\x80foo\x80" + s.encode(Encoding::ISO_8859_3) # Raises Encoding::InvalidByteSequenceError. + s.encode(Encoding::ISO_8859_3, invalid: :replace) # => "?foo?" + +- For an undefined character: + + - <tt>:undef: nil</tt> (default): Raise exception. + - <tt>:undef: :replace</tt>: Replace each undefined character + with the replacement string. + + Examples: + + s = "\x80foo\x80" + "\x80".encode(Encoding::UTF_8, Encoding::BINARY) # Raises Encoding::UndefinedConversionError. + s.encode(Encoding::UTF_8, Encoding::BINARY, undef: :replace) # => "�foo�" + + +- Replacement string: + + - <tt>:replace: nil</tt> (default): Set replacement string to default value: + <tt>"\uFFFD"</tt> ("�") for a Unicode encoding, <tt>'?'</tt> otherwise. + - <tt>:replace: some_string</tt>: Set replacement string to the given +some_string+; + overrides +:fallback+. 
+ + Examples: + + s = "\xA5foo\xA5" + options = {:undef => :replace, :replace => 'xyzzy'} + s.encode(Encoding::UTF_8, Encoding::ISO_8859_3, **options) # => "xyzzyfooxyzzy" + +- Replacement fallback: + + One of these may be specified: + + - <tt>:fallback: nil</tt> (default): No replacement fallback. + - <tt>:fallback: hash_like_object</tt>: Set replacement fallback to the given + +hash_like_object+; the replacement string is <tt>hash_like_object[X]</tt>. + - <tt>:fallback: method</tt>: Set replacement fallback to the given + +method+; the replacement string is <tt>method(X)</tt>. + - <tt>:fallback: proc</tt>: Set replacement fallback to the given + +proc+; the replacement string is <tt>proc[X]</tt>. + + Examples: + + s = "\u3042foo\u3043" + + hash = {"\u3042" => 'xyzzy'} + hash.default = 'XYZZY' + s.encode(Encoding::US_ASCII, fallback: hash) # => "xyzzyfooXYZZY" + + def (fallback = "U+%.4X").escape(x) + self % x.unpack("U") + end + "\u{3042}".encode(Encoding::US_ASCII, fallback: fallback.method(:escape)) # => "U+3042" + + proc = Proc.new {|x| x == "\u3042" ? 'xyzzy' : 'XYZZY' } + s.encode('ASCII', fallback: proc) # => "xyzzyfooXYZZY" + +- XML entities: + + One of these may be specified: + + - <tt>:xml: nil</tt> (default): No handling for XML entities. + - <tt>:xml: :text</tt>: Treat source text as XML; + replace each undefined character + with its upper-case hexadecimal numeric character reference, + except that: + + - <tt>&</tt> is replaced with <tt>&amp;</tt>. + - <tt><</tt> is replaced with <tt>&lt;</tt>. + - <tt>></tt> is replaced with <tt>&gt;</tt>. + + - <tt>:xml: :attr</tt>: Treat source text as XML attribute value; + replace each undefined character + with its upper-case hexadecimal numeric character reference, + except that: + + - The replacement string <tt>r</tt> is double-quoted (<tt>"r"</tt>). + - Each embedded double-quote is replaced with <tt>&quot;</tt>. + - <tt>&</tt> is replaced with <tt>&amp;</tt>. + - <tt><</tt> is replaced with <tt>&lt;</tt>. 
+ - <tt>></tt> is replaced with <tt>&gt;</tt>. + + Examples: + + s = 'foo"<&>"bar' + "\u3042" + s.encode(Encoding::US_ASCII, xml: :text) # => "foo\"&lt;&amp;&gt;\"bar&#x3042;" + s.encode(Encoding::US_ASCII, xml: :attr) # => "\"foo&quot;&lt;&amp;&gt;&quot;bar&#x3042;\"" + + +- Newlines: + + One of these may be specified: + + - <tt>:cr_newline: true</tt>: Replace each line-feed character (<tt>"\n"</tt>) + with a carriage-return character (<tt>"\r"</tt>). + - <tt>:crlf_newline: true</tt>: Replace each line-feed character (<tt>"\n"</tt>) + with a carriage-return/line-feed string (<tt>"\r\n"</tt>). + - <tt>:universal_newline: true</tt>: Replace each carriage-return + character (<tt>"\r"</tt>) and each carriage-return/line-feed string + (<tt>"\r\n"</tt>) with a line-feed character (<tt>"\n"</tt>). + + Examples: + + s = "\n \r \r\n" # => "\n \r \r\n" + s.encode(Encoding::US_ASCII, cr_newline: true) # => "\r \r \r\r" + s.encode(Encoding::US_ASCII, crlf_newline: true) # => "\r\n \r \r\r\n" + s.encode(Encoding::US_ASCII, universal_newline: true) # => "\n \n \n" diff --git a/doc/language/exceptions.md b/doc/language/exceptions.md new file mode 100644 index 0000000000..5f8f0ece69 --- /dev/null +++ b/doc/language/exceptions.md @@ -0,0 +1,521 @@ +# Exceptions + +Ruby code can raise exceptions. + +Most often, a raised exception is meant to alert the running program +that an unusual (i.e., _exceptional_) situation has arisen, +and may need to be handled. + +Code throughout the Ruby core, Ruby standard library, and Ruby gems generates exceptions +in certain circumstances: + +```rb +File.open('nope.txt') # Raises Errno::ENOENT: "No such file or directory" +``` + +## Raised Exceptions + +A raised exception transfers program execution, one way or another. 
+ +### Unrescued Exceptions + +If an exception is not _rescued_ +(see [Rescued Exceptions](#label-Rescued+Exceptions) below), +execution transfers to code in the Ruby interpreter +that prints a message and exits the program (or thread): + +```console +$ ruby -e "raise" +-e:1:in '<main>': unhandled exception +``` + +### Rescued Exceptions + +An <i>exception handler</i> may determine what is to happen +when an exception is raised; +the handler may _rescue_ an exception, +and may prevent the program from exiting. + +A simple example: + +```rb +begin + raise 'Boom!' # Raises an exception, transfers control. + puts 'Will not get here.' +rescue + puts 'Rescued an exception.' # Control transferred to here; program does not exit. +end +puts 'Got here.' +``` + +Output: + +``` +Rescued an exception. +Got here. +``` + +An exception handler has several elements: + +| Element | Use | +|-----------------------------|------------------------------------------------------------------------------------------| +| Begin clause. | Begins the handler and contains the code whose raised exception, if any, may be rescued. | +| One or more rescue clauses. | Each contains "rescuing" code, which is to be executed for certain exceptions. | +| Else clause (optional). | Contains code to be executed if no exception is raised. | +| Ensure clause (optional). | Contains code to be executed whether or not an exception is raised, or is rescued. | +| <tt>end</tt> statement. | Ends the handler. | + +#### Begin Clause + +The begin clause begins the exception handler: + +- May start with a `begin` statement; + see also [Begin-Less Exception Handlers](#label-Begin-Less+Exception+Handlers). +- Contains code whose raised exception (if any) is covered + by the handler. +- Ends with the first following `rescue` statement. + +#### Rescue Clauses + +A rescue clause: + +- Starts with a `rescue` statement. +- Contains code that is to be executed for certain raised exceptions. 
+- Ends with the first following `rescue`, + `else`, `ensure`, or `end` statement. + +##### Rescued Exceptions + +A `rescue` statement may include one or more classes +that are to be rescued; +if none is given, StandardError is assumed. + +The rescue clause rescues both the specified class +(or StandardError if none given) and any of its subclasses; +see [Built-In Exception Class Hierarchy](rdoc-ref:Exception@Built-In+Exception+Class+Hierarchy). + +```rb +begin + 1 / 0 # Raises ZeroDivisionError, a subclass of StandardError. +rescue + puts "Rescued #{$!.class}" +end +``` + +Output: + +``` +Rescued ZeroDivisionError +``` + +If the `rescue` statement specifies an exception class, +only that class (or one of its subclasses) is rescued; +this example exits with a ZeroDivisionError, +which was not rescued because it is not ArgumentError or one of its subclasses: + +```rb +begin + 1 / 0 +rescue ArgumentError + puts "Rescued #{$!.class}" +end +``` + +A `rescue` statement may specify multiple classes, +which means that its code rescues an exception +of any of the given classes (or their subclasses): + +```rb +begin + 1 / 0 +rescue FloatDomainError, ZeroDivisionError + puts "Rescued #{$!.class}" +end +``` + +##### Multiple Rescue Clauses + +An exception handler may contain multiple rescue clauses; +in that case, the first clause that rescues the exception does so, +and those before and after are ignored: + +```rb +begin + Dir.open('nosuch') +rescue Errno::ENOTDIR + puts "Rescued #{$!.class}" +rescue Errno::ENOENT + puts "Rescued #{$!.class}" +end +``` + +Output: + +``` +Rescued Errno::ENOENT +``` + +##### Capturing the Rescued \Exception + +A `rescue` statement may specify a variable +whose value becomes the rescued exception +(an instance of Exception or one of its subclasses): + +```rb +begin + 1 / 0 +rescue => x + puts x.class + puts x.message +end +``` + +Output: + +``` +ZeroDivisionError +divided by 0 +``` + +##### Global Variables + +Two read-only global variables 
always have `nil` value +except in a rescue clause; +they're: + +- `$!`: contains the rescued exception. +- `$@`: contains its backtrace. + +Example: + +```rb +begin + 1 / 0 +rescue + p $! + p $@ +end +``` + +Output: + +``` +#<ZeroDivisionError: divided by 0> +["t.rb:2:in 'Integer#/'", "t.rb:2:in '<main>'"] +``` + +##### Cause + +In a rescue clause, the method Exception#cause returns the previous value of `$!`, +which may be `nil`; +elsewhere, the method returns `nil`. + +Example: + +```rb +begin + raise('Boom 0') +rescue => x0 + puts "Exception: #{x0.inspect}; $!: #{$!.inspect}; cause: #{x0.cause.inspect}." + begin + raise('Boom 1') + rescue => x1 + puts "Exception: #{x1.inspect}; $!: #{$!.inspect}; cause: #{x1.cause.inspect}." + begin + raise('Boom 2') + rescue => x2 + puts "Exception: #{x2.inspect}; $!: #{$!.inspect}; cause: #{x2.cause.inspect}." + end + end +end +``` + +Output: + +``` +Exception: #<RuntimeError: Boom 0>; $!: #<RuntimeError: Boom 0>; cause: nil. +Exception: #<RuntimeError: Boom 1>; $!: #<RuntimeError: Boom 1>; cause: #<RuntimeError: Boom 0>. +Exception: #<RuntimeError: Boom 2>; $!: #<RuntimeError: Boom 2>; cause: #<RuntimeError: Boom 1>. +``` + +#### Else Clause + +The `else` clause: + +- Starts with an `else` statement. +- Contains code that is to be executed if no exception is raised in the begin clause. +- Ends with the first following `ensure` or `end` statement. + +```rb +begin + puts 'Begin.' +rescue + puts 'Rescued an exception!' +else + puts 'No exception raised.' +end +``` + +Output: + +``` +Begin. +No exception raised. +``` + +#### Ensure Clause + +The ensure clause: + +- Starts with an `ensure` statement. +- Contains code that is to be executed + regardless of whether an exception is raised, + and regardless of whether a raised exception is handled. +- Ends with the first following `end` statement. + +```rb +def foo(boom: false) + puts 'Begin.' + raise 'Boom!' if boom +rescue + puts 'Rescued an exception!' 
+else + puts 'No exception raised.' +ensure + puts 'Always do this.' +end + +foo(boom: true) +foo(boom: false) +``` + +Output: + +``` +Begin. +Rescued an exception! +Always do this. +Begin. +No exception raised. +Always do this. +``` + +#### End Statement + +The `end` statement ends the handler. + +Code following it is reached only if any raised exception is rescued. + +#### Begin-Less \Exception Handlers + +As seen above, an exception handler may be implemented with `begin` and `end`. + +An exception handler may also be implemented as: + +- A method body: + + ```rb + def foo(boom: false) # Serves as beginning of exception handler. + puts 'Begin.' + raise 'Boom!' if boom + rescue + puts 'Rescued an exception!' + else + puts 'No exception raised.' + end # Serves as end of exception handler. + ``` + +- A block: + + ```rb + Dir.chdir('.') do |dir| # Serves as beginning of exception handler. + raise 'Boom!' + rescue + puts 'Rescued an exception!' + end # Serves as end of exception handler. + ``` + +#### Re-Raising an \Exception + +It can be useful to rescue an exception, but allow its eventual effect; +for example, a program can rescue an exception, log data about it, +and then "reinstate" the exception. + +This may be done via the `raise` method, but in a special way; +a rescuing clause: + + - Captures an exception. + - Does whatever is needed concerning the exception (such as logging it). + - Calls method `raise` with no argument, + which raises the rescued exception: + +```rb +begin + 1 / 0 +rescue ZeroDivisionError + # Do needful things (like logging). + raise # Raised exception will be ZeroDivisionError, not RuntimeError. 
+end +``` + +Output: + +``` +ruby t.rb +t.rb:2:in 'Integer#/': divided by 0 (ZeroDivisionError) + from t.rb:2:in '<main>' +``` + +#### Retrying + +It can be useful to retry a begin clause; +for example, if it must access a possibly-volatile resource +(such as a web page), +it can be useful to try the access more than once +(in the hope that it may become available): + +```rb +retries = 0 +begin + puts "Try ##{retries}." + raise 'Boom' +rescue + puts "Rescued retry ##{retries}." + if (retries += 1) < 3 + puts 'Retrying' + retry + else + puts 'Giving up.' + raise + end +end +``` + +``` +Try #0. +Rescued retry #0. +Retrying +Try #1. +Rescued retry #1. +Retrying +Try #2. +Rescued retry #2. +Giving up. +# RuntimeError ('Boom') raised. +``` + +Note that the retry re-executes the entire begin clause, +not just the part after the point of failure. + +## Raising an \Exception + +Method Kernel#raise raises an exception. + +## Custom Exceptions + +To provide additional or alternate information, +you may create custom exception classes. +Each should be a subclass of one of the built-in exception classes +(commonly StandardError or RuntimeError); +see [Built-In Exception Class Hierarchy](rdoc-ref:Exception@Built-In+Exception+Class+Hierarchy). + +```rb +class MyException < StandardError; end +``` + +## Messages + +Every `Exception` object has a message, +which is a string that is set at the time the object is created; +see Exception.new. + +The message cannot be changed, but you can create a similar object with a different message; +see Exception#exception. + +This method returns the message as defined: + +- Exception#message. + +Two other methods return enhanced versions of the message: + +- Exception#detailed_message: adds exception class name, with optional highlighting. +- Exception#full_message: adds exception class name and backtrace, with optional highlighting. 
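The three message methods listed above can be compared side by side. This is a minimal sketch, assuming Ruby 3.2 or later (where Exception#detailed_message is available); the class name `MyException` and the variable names are hypothetical, chosen only for illustration:

```ruby
# Hypothetical custom exception, used only for this illustration.
class MyException < StandardError; end

begin
  raise MyException, 'Boom!'
rescue => e
  plain    = e.message                            # The message as defined.
  detailed = e.detailed_message(highlight: false) # Message plus the exception class name.
  full     = e.full_message(highlight: false)     # Adds backtrace information as well.
end

puts plain
puts detailed
puts full
```

Passing `highlight: false` keeps ANSI codes out of the returned strings, which makes them easier to compare or to write to a log file.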
+ +Each of the two methods above accepts keyword argument `highlight`; +if the value of keyword `highlight` is `true`, +the returned string includes bolding and underlining ANSI codes (see below) +to enhance the appearance of the message. + +Any exception class (Ruby or custom) may choose to override either of these methods, +and may choose to interpret keyword argument <tt>highlight: true</tt> +to mean that the returned message should contain +[ANSI codes](https://en.wikipedia.org/wiki/ANSI_escape_code) +that specify color, bolding, and underlining. + +Because the enhanced message may be written to a non-terminal device +(e.g., into an HTML page), +it is best to limit the ANSI codes to these widely-supported codes: + +- Begin font color: + + | Color | ANSI Code | + |---------|------------------| + | Red | <tt>\\e[31m</tt> | + | Green | <tt>\\e[32m</tt> | + | Yellow | <tt>\\e[33m</tt> | + | Blue | <tt>\\e[34m</tt> | + | Magenta | <tt>\\e[35m</tt> | + | Cyan | <tt>\\e[36m</tt> | + +<br> + +- Begin font attribute: + + | Attribute | ANSI Code | + |-----------|-----------------| + | Bold | <tt>\\e[1m</tt> | + | Underline | <tt>\\e[4m</tt> | + +<br> + +- End all of the above: + + | Color | ANSI Code | + |-------|-----------------| + | Reset | <tt>\\e[0m</tt> | + +It's also best to craft a message that is conveniently human-readable, +even if the ANSI codes are included "as-is" +(rather than interpreted as font directives). + +## Backtraces + +A _backtrace_ is a record of the methods currently +in the [call stack](https://en.wikipedia.org/wiki/Call_stack); +each such method has been called, but has not yet returned. + +These methods return backtrace information: + +- Exception#backtrace: returns the backtrace as an array of strings or `nil`. +- Exception#backtrace_locations: returns the backtrace as an array + of Thread::Backtrace::Location objects or `nil`. + Each Thread::Backtrace::Location object gives detailed information about a called method. 
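The two backtrace methods above can be tried out with a short sketch such as the following; the method name `risky` and the variable names are hypothetical, and the exact text of each backtrace entry varies between Ruby versions:

```ruby
# Hypothetical method whose only job is to raise.
def risky
  raise 'Boom!'
end

begin
  risky
rescue => e
  strings   = e.backtrace            # Array of strings.
  locations = e.backtrace_locations  # Array of Thread::Backtrace::Location objects.
end

puts strings.first           # Text form of the innermost frame.
puts locations.first.lineno  # Line number of the innermost frame.
puts locations.first.label   # Label of the innermost frame (mentions the method name).
```

A Thread::Backtrace::Location object exposes the pieces of a frame (such as `path`, `lineno`, and `label`) separately, so there is no need to parse the string form.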
+ +By default, Ruby sets the backtrace of the exception to the location where it +was raised. + +The developer might adjust this by either providing the `backtrace` argument +to Kernel#raise, or using Exception#set_backtrace. + +Note that: + +- by default, both `backtrace` and `backtrace_locations` represent the same backtrace; +- if the developer sets the backtrace by one of the above methods to an array of + Thread::Backtrace::Location, they still represent the same backtrace; +- if the developer sets the backtrace to a string or an array of strings: + - by Kernel#raise: `backtrace_locations` become `nil`; + - by Exception#set_backtrace: `backtrace_locations` preserve the original + value; +- if the developer sets the backtrace to `nil` by Exception#set_backtrace, + `backtrace_locations` preserve the original value; but if the exception is then + reraised, both `backtrace` and `backtrace_locations` become the location of reraise. diff --git a/doc/language/fiber.md b/doc/language/fiber.md new file mode 100644 index 0000000000..d9011cce2f --- /dev/null +++ b/doc/language/fiber.md @@ -0,0 +1,290 @@ +# Fiber + +Fibers provide a mechanism for cooperative concurrency. + +## Context Switching + +Fibers execute a user-provided block. During the execution, the block may call `Fiber.yield` or `Fiber#transfer` to switch to another fiber. `Fiber#resume` is used to continue execution from the point where `Fiber.yield` was called. + +```rb +#!/usr/bin/env ruby + +puts "1: Start program." + +f = Fiber.new do + puts "3: Entered fiber." + Fiber.yield + puts "5: Resumed fiber." +end + +puts "2: Resume fiber first time." +f.resume + +puts "4: Resume fiber second time." +f.resume + +puts "6: Finished." +``` + +This program demonstrates the flow control of fibers. + +## Scheduler + +The scheduler interface is used to intercept blocking operations. A typical +implementation would be a wrapper for a gem like `EventMachine` or `Async`. 
This +design provides separation of concerns between the event loop implementation +and application code. It also allows for layered schedulers which can perform +instrumentation. + +To set the scheduler for the current thread: + +```rb +Fiber.set_scheduler(MyScheduler.new) +``` + +When the thread exits, there is an implicit call to `set_scheduler`: + +```rb +Fiber.set_scheduler(nil) +``` + +### Design + +The scheduler interface is designed to be an unopinionated, lightweight layer +between user code and blocking operations. The scheduler hooks should avoid +translating or converting arguments or return values. Ideally, the exact same +arguments from the user code are provided directly to the scheduler hook with +no changes. + +### Interface + +This is the interface you need to implement. + +```rb +class Scheduler + # Wait for the specified process ID to exit. + # This hook is optional. + # @parameter pid [Integer] The process ID to wait for. + # @parameter flags [Integer] A bit-mask of flags suitable for `Process::Status.wait`. + # @returns [Process::Status] A process status instance. + def process_wait(pid, flags) + Thread.new do + Process::Status.wait(pid, flags) + end.value + end + + # Wait for the given io readiness to match the specified events within + # the specified timeout. + # @parameter events [Integer] A bit mask of `IO::READABLE`, + # `IO::WRITABLE` and `IO::PRIORITY`. + # @parameter timeout [Numeric] The amount of time to wait for the event in seconds. + # @returns [Integer] The subset of events that are ready. + def io_wait(io, events, timeout) + end + + # Read from the given io into the specified buffer. + # WARNING: Experimental hook! Do not use in production code! + # @parameter io [IO] The io to read from. + # @parameter buffer [IO::Buffer] The buffer to read into. + # @parameter length [Integer] The minimum amount to read. + def io_read(io, buffer, length) + end + + # Write from the given buffer into the specified IO. 
+ # WARNING: Experimental hook! Do not use in production code! + # @parameter io [IO] The io to write to. + # @parameter buffer [IO::Buffer] The buffer to write from. + # @parameter length [Integer] The minimum amount to write. + def io_write(io, buffer, length) + end + + # Sleep the current task for the specified duration, or forever if not + # specified. + # @parameter duration [Numeric] The amount of time to sleep in seconds. + def kernel_sleep(duration = nil) + end + + # Execute the given block. If the block execution exceeds the given timeout, + # the specified exception `klass` will be raised. Typically, only non-blocking + # methods which enter the scheduler will raise such exceptions. + # @parameter duration [Integer] The amount of time to wait, after which an exception will be raised. + # @parameter klass [Class] The exception class to raise. + # @parameter *arguments [Array] The arguments to send to the constructor of the exception. + # @yields {...} The user code to execute. + def timeout_after(duration, klass, *arguments, &block) + end + + # Resolve hostname to an array of IP addresses. + # This hook is optional. + # @parameter hostname [String] Example: "www.ruby-lang.org". + # @returns [Array] An array of IPv4 and/or IPv6 address strings that the hostname resolves to. + def address_resolve(hostname) + end + + # Block the calling fiber. + # @parameter blocker [Object] What we are waiting on, informational only. + # @parameter timeout [Numeric | Nil] The amount of time to wait for in seconds. + # @returns [Boolean] Whether the blocking operation was successful or not. + def block(blocker, timeout = nil) + end + + # Unblock the specified fiber. + # @parameter blocker [Object] What we are waiting on, informational only. + # @parameter fiber [Fiber] The fiber to unblock. + # @reentrant Thread safe. + def unblock(blocker, fiber) + end + + # Intercept the creation of a non-blocking fiber. 
+ # @returns [Fiber] + def fiber(&block) + Fiber.new(blocking: false, &block) + end + + # Invoked when the thread exits. + def close + self.run + end + + def run + # Implement event loop here. + end +end +``` + +Additional hooks may be introduced in the future, we will use feature detection +in order to enable these hooks. + +### Non-blocking Execution + +The scheduler hooks will only be used in special non-blocking execution +contexts. Non-blocking execution contexts introduce non-determinism because the +execution of scheduler hooks may introduce context switching points into your +program. + +#### Fibers + +Fibers can be used to create non-blocking execution contexts. + +```rb +Fiber.new do + puts Fiber.current.blocking? # false + + # May invoke `Fiber.scheduler&.io_wait`. + io.read(...) + + # May invoke `Fiber.scheduler&.io_wait`. + io.write(...) + + # Will invoke `Fiber.scheduler&.kernel_sleep`. + sleep(n) +end.resume +``` + +We also introduce a new method which simplifies the creation of these +non-blocking fibers: + +```rb +Fiber.schedule do + puts Fiber.current.blocking? # false +end +``` + +The purpose of this method is to allow the scheduler to internally decide the +policy for when to start the fiber, and whether to use symmetric or asymmetric +fibers. + +You can also create blocking execution contexts: + +```rb +Fiber.new(blocking: true) do + # Won't use the scheduler: + sleep(n) +end +``` + +However you should generally avoid this unless you are implementing a scheduler. + +#### IO + +By default, I/O is non-blocking. Not all operating systems support non-blocking +I/O. Windows is a notable example where socket I/O can be non-blocking but pipe +I/O is blocking. Provided that there *is* a scheduler and the current thread *is +non-blocking*, the operation will invoke the scheduler. + +##### `IO#close` + +Closing an IO interrupts all blocking operations on that IO. 
When a thread calls `IO#close`, it first attempts to interrupt any threads or fibers that are blocked on that IO. The closing thread waits until all blocked threads and fibers have been properly interrupted and removed from the IO's blocking list. Each interrupted thread or fiber receives an `IOError` and is cleanly removed from the blocking operation. Only after all blocking operations have been interrupted and cleaned up will the actual file descriptor be closed, ensuring proper resource cleanup and preventing potential race conditions. + +For fibers managed by a scheduler, the interruption process involves calling `rb_fiber_scheduler_fiber_interrupt` on the scheduler. This allows the scheduler to handle the interruption in a way that's appropriate for its event loop implementation. The scheduler can then notify the fiber, which will receive an `IOError` and be removed from the blocking operation. This mechanism ensures that fiber-based concurrency works correctly with IO operations, even when those operations are interrupted by `IO#close`. 
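The thread-level part of this behaviour can be observed without a scheduler. The following is a minimal sketch rather than code from the Ruby sources: it assumes a pipe as the IO and a plain `Thread` as the blocked reader, and the names `reader`, `writer` and `blocked` are purely illustrative.

```ruby
# A pipe whose write end stays open, so a read on `reader` blocks.
reader, writer = IO.pipe

blocked = Thread.new do
  reader.read          # blocks: no data has been written and there is no EOF yet
rescue IOError => error
  error                # the interruption surfaces as an IOError
end

# Best effort: let the thread reach the blocking read before closing.
Thread.pass until blocked.stop?

reader.close           # interrupts the blocked read, then closes the descriptor
result = blocked.value # the IOError rescued inside the thread
writer.close

puts result.class
```

Whether the close lands while the thread is already blocked or just before it starts reading, the reader ends up with an `IOError`, so the sketch does not depend on exact timing.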
+ +```mermaid +sequenceDiagram + participant ThreadB + participant ThreadA + participant Scheduler + participant IO + participant Fiber1 + participant Fiber2 + + Note over ThreadA: Thread A has a fiber scheduler + activate Scheduler + ThreadA->>Fiber1: Schedule Fiber 1 + activate Fiber1 + Fiber1->>IO: IO.read + IO->>Scheduler: rb_thread_io_blocking_region + deactivate Fiber1 + + ThreadA->>Fiber2: Schedule Fiber 2 + activate Fiber2 + Fiber2->>IO: IO.read + IO->>Scheduler: rb_thread_io_blocking_region + deactivate Fiber2 + + Note over Fiber1,Fiber2: Both fibers blocked on same IO + + Note over ThreadB: IO.close + activate ThreadB + ThreadB->>IO: thread_io_close_notify_all + Note over ThreadB: rb_mutex_sleep + + IO->>Scheduler: rb_fiber_scheduler_fiber_interrupt(Fiber1) + Scheduler->>Fiber1: fiber_interrupt with IOError + activate Fiber1 + Note over IO: fiber_interrupt causes removal from blocking list + Fiber1->>IO: rb_io_blocking_operation_exit() + IO-->>ThreadB: Wakeup thread + deactivate Fiber1 + + IO->>Scheduler: rb_fiber_scheduler_fiber_interrupt(Fiber2) + Scheduler->>Fiber2: fiber_interrupt with IOError + activate Fiber2 + Note over IO: fiber_interrupt causes removal from blocking list + Fiber2->>IO: rb_io_blocking_operation_exit() + IO-->>ThreadB: Wakeup thread + deactivate Fiber2 + deactivate Scheduler + + Note over ThreadB: Blocking operations list empty + ThreadB->>IO: close(fd) + deactivate ThreadB +``` + +#### Mutex + +The `Mutex` class can be used in a non-blocking context and is fiber specific. + +#### ConditionVariable + +The `ConditionVariable` class can be used in a non-blocking context and is +fiber-specific. + +#### Queue / SizedQueue + +The `Queue` and `SizedQueue` classes can be used in a non-blocking context and +are fiber-specific. + +#### Thread + +The `Thread#join` operation can be used in a non-blocking context and is +fiber-specific. 
diff --git a/doc/language/format_specifications.rdoc b/doc/language/format_specifications.rdoc new file mode 100644 index 0000000000..763470aa02 --- /dev/null +++ b/doc/language/format_specifications.rdoc @@ -0,0 +1,354 @@ += Format Specifications + +Several Ruby core classes have instance method +printf+ or +sprintf+: + +- ARGF#printf +- IO#printf +- Kernel#printf +- Kernel#sprintf + +Each of these methods takes: + +- Argument +format_string+, which has zero or more + embedded _format_ _specifications_ (see below). +- Arguments <tt>*arguments</tt>, which are zero or more objects to be formatted. + +Each of these methods prints or returns the string +resulting from replacing each +format specification embedded in +format_string+ with a string form +of the corresponding argument among +arguments+. + +A simple example: + + sprintf('Name: %s; value: %d', 'Foo', 0) # => "Name: Foo; value: 0" + +A format specification has the form: + + %[flags][width][.precision]type + +It consists of: + +- A leading percent character. +- Zero or more _flags_ (each is a character). +- An optional _width_ _specifier_ (an integer, or <tt>*</tt>). +- An optional _precision_ _specifier_ (a period followed by a non-negative + integer, or <tt>*</tt>). +- A _type_ _specifier_ (a character). + +Except for the leading percent character, +the only required part is the type specifier, so we begin with that. + +== Type Specifiers + +This section provides a brief explanation of each type specifier. +The links lead to the details and examples. + +=== \Integer Type Specifiers + +- +b+ or +B+: Format +argument+ as a binary integer. + See {Specifiers b and B}[rdoc-ref:@Specifiers+b+and+B]. +- +d+, +i+, or +u+ (all are identical): + Format +argument+ as a decimal integer. + See {Specifier d}[rdoc-ref:@Specifier+d]. +- +o+: Format +argument+ as an octal integer. + See {Specifier o}[rdoc-ref:@Specifier+o]. +- +x+ or +X+: Format +argument+ as a hexadecimal integer. 
+ See {Specifiers x and X}[rdoc-ref:@Specifiers+x+and+X]. + +=== Floating-Point Type Specifiers + +- +a+ or +A+: Format +argument+ as hexadecimal floating-point number. + See {Specifiers a and A}[rdoc-ref:@Specifiers+a+and+A]. +- +e+ or +E+: Format +argument+ in scientific notation. + See {Specifiers e and E}[rdoc-ref:@Specifiers+e+and+E]. +- +f+: Format +argument+ as a decimal floating-point number. + See {Specifier f}[rdoc-ref:@Specifier+f]. +- +g+ or +G+: Format +argument+ in a "general" format. + See {Specifiers g and G}[rdoc-ref:@Specifiers+g+and+G]. + +=== Other Type Specifiers + +- +c+: Format +argument+ as a character. + See {Specifier c}[rdoc-ref:@Specifier+c]. +- +p+: Format +argument+ as a string via <tt>argument.inspect</tt>. + See {Specifier p}[rdoc-ref:@Specifier+p]. +- +s+: Format +argument+ as a string via <tt>argument.to_s</tt>. + See {Specifier s}[rdoc-ref:@Specifier+s]. +- <tt>%</tt>: Format +argument+ (<tt>'%'</tt>) as a single percent character. + See {Specifier %}[rdoc-ref:@Specifier+-25]. + +== Flags + +The effect of a flag may vary greatly among type specifiers. +These remarks are general in nature. +See {type-specific details}[rdoc-ref:@Type+Specifier+Details+and+Examples]. + +Multiple flags may be given with single type specifier; +order does not matter. 
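As a small illustration of the point about flag order, the sketch below formats the same value with the same flags written in different orders. It uses `Kernel#format` (an alias of `sprintf`); the variable names are illustrative only.

```ruby
# The '+' and '0' flags, in either order, with width 8.
plus_then_zero = format('%+08d', 42)
zero_then_plus = format('%0+8d', 42)

# The '-' and '+' flags, in either order; '|' marks the end of the field.
minus_then_plus = format('%-+8d|', 42)
plus_then_minus = format('%+-8d|', 42)

puts plus_then_zero == zero_then_plus    # true
puts minus_then_plus == plus_then_minus  # true
```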
+ +=== <tt>' '</tt> Flag + +Insert a space before a non-negative number: + + sprintf('%d', 10) # => "10" + sprintf('% d', 10) # => " 10" + +Insert a minus sign for a negative value: + + sprintf('%d', -10) # => "-10" + sprintf('% d', -10) # => "-10" + +=== <tt>'#'</tt> Flag + +Use an alternate format; varies among types: + + sprintf('%x', 100) # => "64" + sprintf('%#x', 100) # => "0x64" + +=== <tt>'+'</tt> Flag + +Add a leading plus sign for a non-negative number: + + sprintf('%x', 100) # => "64" + sprintf('%+x', 100) # => "+64" + +=== <tt>'-'</tt> Flag + +Left justify the value in its field: + + sprintf('%6d', 100) # => "   100" + sprintf('%-6d', 100) # => "100   " + +=== <tt>'0'</tt> Flag + +Left-pad with zeros instead of spaces: + + sprintf('%6d', 100) # => "   100" + sprintf('%06d', 100) # => "000100" + +=== <tt>'n$'</tt> Flag + +Format the (1-based) <tt>n</tt>th argument into this field: + + sprintf("%s %s", 'world', 'hello') # => "world hello" + sprintf("%2$s %1$s", 'world', 'hello') # => "hello world" + +== Width Specifier + +In general, a width specifier determines the minimum width (in characters) +of the formatted field: + + sprintf('%10d', 100) # => "       100" + + # Left-justify if negative. + sprintf('%-10d', 100) # => "100       " + + # Ignore if too small. + sprintf('%1d', 100) # => "100" + +If the width specifier is <tt>'*'</tt> instead of an integer, the actual minimum +width is taken from the argument list: + + sprintf('%*d', 20, 14) # => "                  14" + +== Precision Specifier + +A precision specifier is a decimal point followed by zero or more +decimal digits. + +For integer type specifiers, the precision specifies the minimum number of +digits to be written. If the integer has fewer digits than the precision, the result is +padded with leading zeros.
There is no modification or truncation of the result +if the integer is longer than the precision: + + sprintf('%.3d', 1) # => "001" + sprintf('%.3d', 1000) # => "1000" + + # If the precision is 0 and the value is 0, nothing is written + sprintf('%.d', 0) # => "" + sprintf('%.0d', 0) # => "" + +For the +a+/+A+, +e+/+E+, and +f+ specifiers, the precision specifies +the number of digits after the decimal point to be written: + + sprintf('%.2f', 3.14159) # => "3.14" + sprintf('%.10f', 3.14159) # => "3.1415900000" + + # With no precision specifier, defaults to 6-digit precision. + sprintf('%f', 3.14159) # => "3.141590" + +For the +g+/+G+ specifiers, the precision specifies +the number of significant digits to be written: + + sprintf('%.2g', 123.45) # => "1.2e+02" + sprintf('%.3g', 123.45) # => "123" + sprintf('%.10g', 123.45) # => "123.45" + + # With no precision specifier, defaults to 6 significant digits. + sprintf('%g', 123.456789) # => "123.457" + +For the +s+ and +p+ specifiers, the precision specifies +the maximum number of characters to write: + + sprintf('%s', Time.now) # => "2022-05-04 11:59:16 -0400" + sprintf('%.10s', Time.now) # => "2022-05-04" + +If the precision specifier is <tt>'*'</tt> instead of a non-negative integer, +the actual precision is taken from the argument list: + + sprintf('%.*d', 20, 1) # => "00000000000000000001" + +== Type Specifier Details and Examples + +=== Specifiers +a+ and +A+ + +Format +argument+ as a hexadecimal floating-point number: + + sprintf('%a', 3.14159) # => "0x1.921f9f01b866ep+1" + sprintf('%a', -3.14159) # => "-0x1.921f9f01b866ep+1" + sprintf('%a', 4096) # => "0x1p+12" + sprintf('%a', -4096) # => "-0x1p+12" + + # Capital 'A' means that alphabetical characters are printed in upper case. + sprintf('%A', 4096) # => "0X1P+12" + sprintf('%A', -4096) # => "-0X1P+12" + +=== Specifiers +b+ and +B+ + +The two specifiers +b+ and +B+ behave identically +except when flag <tt>'#'</tt> is used.
+ +Format +argument+ as a binary integer: + + sprintf('%b', 1) # => "1" + sprintf('%b', 4) # => "100" + + # Prefix '..' for negative value. + sprintf('%b', -4) # => "..100" + + # Alternate format. + sprintf('%#b', 4) # => "0b100" + sprintf('%#B', 4) # => "0B100" + +=== Specifier +c+ + +Format +argument+ as a single character: + + sprintf('%c', 'A') # => "A" + sprintf('%c', 65) # => "A" + +This behaves like String#<<, except for raising ArgumentError instead of RangeError. + +=== Specifier +d+ + +Format +argument+ as a decimal integer: + + sprintf('%d', 100) # => "100" + sprintf('%d', -100) # => "-100" + +Flag <tt>'#'</tt> does not apply. + +=== Specifiers +e+ and +E+ + +Format +argument+ in +{scientific notation}[https://en.wikipedia.org/wiki/Scientific_notation]: + + sprintf('%e', 3.14159) # => "3.141590e+00" + sprintf('%E', -3.14159) # => "-3.141590E+00" + +=== Specifier +f+ + +Format +argument+ as a floating-point number: + + sprintf('%f', 3.14159) # => "3.141590" + sprintf('%f', -3.14159) # => "-3.141590" + +Flag <tt>'#'</tt> does not apply. + +=== Specifiers +g+ and +G+ + +Format +argument+ using exponential form (+e+/+E+ specifier) +if the exponent is less than -4 or greater than or equal to the precision. +Otherwise format +argument+ using floating-point form (+f+ specifier): + + sprintf('%g', 100) # => "100" + sprintf('%g', 100.0) # => "100" + sprintf('%g', 3.14159) # => "3.14159" + sprintf('%g', 100000000000) # => "1e+11" + sprintf('%g', 0.000000000001) # => "1e-12" + + # Capital 'G' means use capital 'E'. + sprintf('%G', 100000000000) # => "1E+11" + sprintf('%G', 0.000000000001) # => "1E-12" + + # Alternate format. + sprintf('%#g', 100000000000) # => "1.00000e+11" + sprintf('%#g', 0.000000000001) # => "1.00000e-12" + sprintf('%#G', 100000000000) # => "1.00000E+11" + sprintf('%#G', 0.000000000001) # => "1.00000E-12" + +=== Specifier +o+ + +Format +argument+ as an octal integer. 
+If +argument+ is negative, it will be formatted as a two's complement +prefixed with +..7+: + + sprintf('%o', 16) # => "20" + + # Prefix '..7' for negative value. + sprintf('%o', -16) # => "..760" + + # Prefix zero for alternate format if positive. + sprintf('%#o', 16) # => "020" + sprintf('%#o', -16) # => "..760" + +=== Specifier +p+ + +Format +argument+ as a string via <tt>argument.inspect</tt>: + + t = Time.now + sprintf('%p', t) # => "2022-05-01 13:42:07.1645683 -0500" + +=== Specifier +s+ + +Format +argument+ as a string via <tt>argument.to_s</tt>: + + t = Time.now + sprintf('%s', t) # => "2022-05-01 13:42:07 -0500" + +Flag <tt>'#'</tt> does not apply. + +=== Specifiers +x+ and +X+ + +Format +argument+ as a hexadecimal integer. +If +argument+ is negative, it will be formatted as a two's complement +prefixed with +..f+: + + sprintf('%x', 100) # => "64" + + # Prefix '..f' for negative value. + sprintf('%x', -100) # => "..f9c" + + # Use alternate format. + sprintf('%#x', 100) # => "0x64" + + # Alternate format for negative value. + sprintf('%#x', -100) # => "0x..f9c" + +=== Specifier <tt>%</tt> + +Format +argument+ (<tt>'%'</tt>) as a single percent character: + + sprintf('%d %%', 100) # => "100 %" + +Flags do not apply. + +== Reference by Name + +For more complex formatting, Ruby supports referencing arguments by name. +The <tt>%<name>type</tt> style formats the named value with the type specifier that follows the name; +the <tt>%{name}</tt> style substitutes the value's string form and takes no type specifier. + +Examples: + + sprintf("%<foo>d : %<bar>f", { :foo => 1, :bar => 2 }) # => "1 : 2.000000" + sprintf("%{foo}f", { :foo => 1 }) # => "1f" diff --git a/doc/language/globals.md b/doc/language/globals.md new file mode 100644 index 0000000000..0f6b632a08 --- /dev/null +++ b/doc/language/globals.md @@ -0,0 +1,611 @@ +# Pre-Defined Global Variables + +Some of the pre-defined global variables have synonyms +that are available via module English. +For each of those, the \English synonym is given. 
+ +To use the module: + +```ruby +require 'English' +``` + +## In Brief + +### Exceptions + +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:--------:|:-----------------:|----------------------------------------|:---------:|:---------:|--------------| +| `$!` | `$ERROR_INFO` | \Exception object or `nil` | `nil` | Yes | Kernel#raise | +| `$@` | `$ERROR_POSITION` | \Array of backtrace positions or `nil` | `nil` | Yes | Kernel#raise | + +### Matched \Data + +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:---------:|:-------------------:|-----------------------------------|:---------:|:---------:|-----------------| +| `$~` | `$LAST_MATCH_INFO` | \MatchData object or `nil` | `nil` | No | Matcher methods | +| `$&` | `$MATCH` | Matched substring or `nil` | `nil` | No | Matcher methods | +| `` $` `` | `$PREMATCH` | Substring left of match or `nil` | `nil` | No | Matcher methods | +| `$'` | `$POSTMATCH` | Substring right of match or `nil` | `nil` | No | Matcher methods | +| `$+` | `$LAST_PAREN_MATCH` | Last group matched or `nil` | `nil` | No | Matcher methods | +| `$1` | | First group matched or `nil` | `nil` | Yes | Matcher methods | +| `$2` | | Second group matched or `nil` | `nil` | Yes | Matcher methods | +| `$n` | | <i>n</i>th group matched or `nil` | `nil` | Yes | Matcher methods | + +### Separators + +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:-----------:|:---------------------------:|-------------------------|:---------:|:---------:|----------| +| `$/`, `$-0` | `$INPUT_RECORD_SEPARATOR` | Input record separator | Newline | No | | +| `$\` | `$OUTPUT_RECORD_SEPARATOR` | Output record separator | `nil` | No | | + +### Streams + +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:---------:|:----------------------------:|---------------------------------------------|:---------:|:---------:|----------------------| +| `$stdin` | | Standard input stream | `STDIN` | No 
| | +| `$stdout` | | Standard output stream | `STDOUT` | No | | +| `$stderr` | | Standard error stream | `STDERR` | No | | +| `$<` | `$DEFAULT_INPUT` | Default standard input | `ARGF` | Yes | | +| `$>` | `$DEFAULT_OUTPUT` | Default standard output | `STDOUT` | No | | +| `$.` | `$INPUT_LINE_NUMBER`, `$NR` | Input position of most recently read stream | 0 | No | Certain read methods | +| `$_` | `$LAST_READ_LINE` | String from most recently read stream | `nil` | No | Certain read methods | + +### Processes + +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:-------------------------:|:----------------------:|---------------------------------|:-------------:|:---------:|----------| +| `$0`, `$PROGRAM_NAME` | | Program name | Program name | No | | +| `$*` | `$ARGV` | \ARGV array | `ARGV` | Yes | | +| `$$` | `$PROCESS_ID`, `$PID` | Process id | Process PID | Yes | | +| `$?` | `$CHILD_STATUS` | Status of recently exited child | `nil` | Yes | | +| `$LOAD_PATH`, `$:`, `$-I` | | \Array of search paths | Ruby defaults | Yes | | +| `$LOADED_FEATURES`, `$"` | | \Array of load paths | Ruby defaults | Yes | | + +### Debugging + +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:-----------:|:--------:|--------------------------------------------|:----------------------------:|:---------:|----------| +| `$FILENAME` | | Value returned by method `ARGF.filename` | Command-line argument or '-' | Yes | | +| `$DEBUG` | | Whether option `-d` or `--debug` was given | Command-line option | No | | +| `$VERBOSE` | | Whether option `-V` or `-W` was given | Command-line option | No | | + +### Other Variables + +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:-----------:|:--------:|-----------------------------------------------|:---------:|:---------:|----------| +| `$-F`, `$;` | | Separator given with command-line option `-F` | | | | +| `$-a` | | Whether option `-a` was given | | Yes | | +| `$-i` | | Extension given with 
command-line option `-i` | | No | | +| `$-l` | | Whether option `-l` was given | | Yes | | +| `$-p` | | Whether option `-p` was given | | Yes | | +| `$F` | | \Array of `$_` split by `$-F` | | | | + +## Exceptions + +### `$!` (\Exception) + +Contains the Exception object set by Kernel#raise: + +```ruby +begin + raise RuntimeError.new('Boo!') +rescue RuntimeError + p $! +end +``` + +Output: + +``` +#<RuntimeError: Boo!> +``` + +English - `$ERROR_INFO` + +### `$@` (Backtrace) + +Same as `$!.backtrace`; +returns an array of backtrace positions: + +```ruby +begin + raise RuntimeError.new('Boo!') +rescue RuntimeError + pp $@.take(4) +end +``` + +Output: + +``` +["(irb):338:in `<top (required)>'", + "/snap/ruby/317/lib/ruby/3.2.0/irb/workspace.rb:119:in `eval'", + "/snap/ruby/317/lib/ruby/3.2.0/irb/workspace.rb:119:in `evaluate'", + "/snap/ruby/317/lib/ruby/3.2.0/irb/context.rb:502:in `evaluate'"] +``` + +English - `$ERROR_POSITION`. + +## Matched \Data + +These global variables store information about the most recent +successful match in the current scope. + +For details and examples, +see [Regexp Global Variables]. + +### `$~` (\MatchData) + +MatchData object created from the match; +thread-local and frame-local. + +English - `$LAST_MATCH_INFO`. + +### `$&` (Matched Substring) + +The matched string. + +English - `$MATCH`. + +### `` $` `` (Pre-Match Substring) +The string to the left of the match. + +English - `$PREMATCH`. + +### `$'` (Post-Match Substring) + +The string to the right of the match. + +English - `$POSTMATCH`. + +### `$+` (Last Matched Group) + +The last group matched. + +English - `$LAST_PAREN_MATCH`. + +### `$1`, `$2`, \Etc. (Matched Group) + +For <tt>$n</tt> the <i>n</i>th group of the match. + +No \English. + +## Separators + +### `$/` (Input Record Separator) + +An input record separator, initially newline. +Set by the [command-line option `-0`]. + +Setting to non-nil value by other than the command-line option is +deprecated. 
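Since assigning a non-nil value to `$/` is deprecated, a per-call separator is usually the alternative. Methods such as `IO#gets`, `IO#each_line` and `String#each_line` accept a separator argument. The sketch below uses `String#each_line` with a made-up input string; the `';'` separator and the variable names are assumptions chosen for illustration.

```ruby
# Hypothetical input: records separated by ';' rather than by newlines.
input = 'alpha;beta;gamma'

# Pass the separator to the method; the global $/ is left untouched.
records = input.each_line(';', chomp: true).to_a

p records
p $/  # still the default newline
```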
+ +English - `$INPUT_RECORD_SEPARATOR`, `$RS`. + +Aliased as `$-0`. + +### `$\` (Output Record Separator) + +An output record separator, initially `nil`. + +Copied from `$/` when the [command-line option `-l`] is +given. + +Setting to non-nil value by other than the command-line option is +deprecated. + +English - `$OUTPUT_RECORD_SEPARATOR`, `$ORS`. + +## Streams + +### `$stdin` (Standard Input) + +The current standard input stream; initially: + +```ruby +$stdin # => #<IO:<STDIN>> +``` + +### `$stdout` (Standard Output) + +The current standard output stream; initially: + +```ruby +$stdout # => #<IO:<STDOUT>> +``` + +### `$stderr` (Standard Error) + +The current standard error stream; initially: + +```ruby +$stderr # => #<IO:<STDERR>> +``` + +### `$<` (\ARGF or $stdin) + +Points to stream ARGF if not empty, else to stream $stdin; read-only. + +English - `$DEFAULT_INPUT`. + +### `$>` (Default Standard Output) + +An output stream, initially `$stdout`. + +English - `$DEFAULT_OUTPUT` + +### `$.` (Input Position) + +The input position (line number) in the most recently read stream. + +English - `$INPUT_LINE_NUMBER`, `$NR` + +### `$_` (Last Read Line) + +The line (string) from the most recently read stream. + +English - `$LAST_READ_LINE`. + +## Processes + +### `$0` + +Initially, contains the name of the script being executed; +may be reassigned. + +### `$*` (\ARGV) + +Points to ARGV. + +English - `$ARGV`. + +### `$$` (Process ID) + +The process ID of the current process. Same as Process.pid. + +English - `$PROCESS_ID`, `$PID`. + +### `$?` (Child Status) + +Initially `nil`, otherwise the Process::Status object +created for the most-recently exited child process; +thread-local. + +English - `$CHILD_STATUS`. + +### `$LOAD_PATH` (Load Path) + +Contains the array of paths to be searched +by Kernel#load and Kernel#require. 
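To illustrate how the search works, the sketch below adds a directory to `$LOAD_PATH` and then requires a file from it by feature name. The temporary directory, the file name `load_path_demo.rb` and the constant `LOAD_PATH_DEMO` are hypothetical and exist only for this demonstration.

```ruby
require 'tmpdir'

Dir.mktmpdir do |dir|
  # Hypothetical feature file, created only for this demonstration.
  File.write(File.join(dir, 'load_path_demo.rb'), "LOAD_PATH_DEMO = :loaded\n")

  $LOAD_PATH.unshift(dir)            # this directory is now searched first
  loaded = require 'load_path_demo'  # found via the new $LOAD_PATH entry
  puts loaded                        # true on the first require
  puts LOAD_PATH_DEMO.inspect
ensure
  $LOAD_PATH.delete(dir)             # leave the search path as we found it
end
```

Note that the variable itself cannot be reassigned, but the array it refers to can be modified in place, as `unshift` and `delete` do here.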
+ +Singleton method `$LOAD_PATH.resolve_feature_path(feature)` +returns: + +- <tt>[:rb, path]</tt>, where `path` is the path to the Ruby file to be + loaded for the given `feature`. +- <tt>[:so, path]</tt>, where `path` is the path to the shared object file + to be loaded for the given `feature`. +- `nil` if there is no such `feature` and `path`. + +Examples: + +```ruby +$LOAD_PATH.resolve_feature_path('timeout') +# => [:rb, "/snap/ruby/317/lib/ruby/3.2.0/timeout.rb"] +$LOAD_PATH.resolve_feature_path('date_core') +# => [:so, "/snap/ruby/317/lib/ruby/3.2.0/x86_64-linux/date_core.so"] +$LOAD_PATH.resolve_feature_path('foo') +# => nil +``` + +Aliased as `$:` and `$-I`. + +### `$LOADED_FEATURES` + +Contains an array of the paths to the loaded files: + +```ruby +$LOADED_FEATURES.take(10) +# => +["enumerator.so", + "thread.rb", + "fiber.so", + "rational.so", + "complex.so", + "ruby2_keywords.rb", + "/snap/ruby/317/lib/ruby/3.2.0/x86_64-linux/enc/encdb.so", + "/snap/ruby/317/lib/ruby/3.2.0/x86_64-linux/enc/trans/transdb.so", + "/snap/ruby/317/lib/ruby/3.2.0/x86_64-linux/rbconfig.rb", + "/snap/ruby/317/lib/ruby/3.2.0/rubygems/compatibility.rb"] +``` + +Aliased as `$"`. + +## Debugging + +### `$FILENAME` + +The value returned by method ARGF.filename. + +### `$DEBUG` + +Initially `true` if [command-line option `-d`] or +[`--debug`][command-line option `-d`] is given, otherwise initially `false`; +may be set to either value in the running program. + +When `true`, prints each raised exception to `$stderr`. + +Aliased as `$-d`. + +### `$VERBOSE` + +Initially `true` if [command-line option `-v`] or +[command-line option `-w`] is given, otherwise initially `false`; +may be set to either value, or to `nil`, in the running program. + +When `true`, enables Ruby warnings. + +When `nil`, disables warnings, including those from Kernel#warn. + +Aliased as `$-v` and `$-w`. 
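Because `$VERBOSE` may be changed at run time, warnings can be silenced for a limited stretch of code and then restored. The helper below is a sketch, not part of Ruby; the name `without_warnings` is hypothetical.

```ruby
# Hypothetical helper: run a block with warnings disabled, then restore
# the previous setting even if the block raises.
def without_warnings
  saved = $VERBOSE
  $VERBOSE = nil
  yield
ensure
  $VERBOSE = saved
end

before = $VERBOSE
without_warnings { warn 'this message is suppressed' }
puts $VERBOSE == before  # the previous setting is back in place
```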
+ +## Other Variables + +### `$-F` + +The default field separator in String#split; must be a String or a +Regexp, and can be set with [command-line option `-F`]. + +Setting to non-nil value by other than the command-line option is +deprecated. + +Aliased as `$;`. + +### `$-a` + +Whether [command-line option `-a`] was given; read-only. + +### `$-i` + +Contains the extension given with [command-line option `-i`], +or `nil` if none. + +An alias of ARGF.inplace_mode. + +### `$-l` + +Whether [command-line option `-l`] was set; read-only. + +### `$-p` + +Whether [command-line option `-p`] was given; read-only. + +### `$F` + +If the [command-line option `-a`] is given, the array +obtained by splitting `$_` by `$-F` is assigned at the start of each +`-n`/`-p` loop. + +## Deprecated + +### `$=` + +### `$,` + +# Pre-Defined Global Constants + +## Summary + +### Streams + +| Constant | Contains | +|:--------:|-------------------------| +| `STDIN` | Standard input stream. | +| `STDOUT` | Standard output stream. | +| `STDERR` | Standard error stream. | + +### Environment + +| Constant | Contains | +|-----------------------|-------------------------------------------------------------------------------| +| `ENV` | Hash of current environment variable names and values. | +| `ARGF` | String concatenation of files given on the command line, or `$stdin` if none. | +| `ARGV` | Array of the given command-line arguments. | +| `TOPLEVEL_BINDING` | Binding of the top level scope. | +| `RUBY_VERSION` | String Ruby version. | +| `RUBY_RELEASE_DATE` | String Ruby release date. | +| `RUBY_PLATFORM` | String Ruby platform. | +| `RUBY_PATCHLEVEL` | Integer Ruby patch level. | +| `RUBY_REVISION` | String Ruby revision. | +| `RUBY_COPYRIGHT` | String Ruby copyright. | +| `RUBY_ENGINE` | String Ruby engine. | +| `RUBY_ENGINE_VERSION` | String Ruby engine version. | +| `RUBY_DESCRIPTION` | String Ruby description. 
| + +### Embedded \Data + +| Constant | Contains | +|:---------------------:|-------------------------------------------------------------------------------| +| `DATA` | File containing embedded data (lines following `__END__`, if any). | + +## Streams + +### `STDIN` + +The standard input stream (the default value for `$stdin`): + +```ruby +STDIN # => #<IO:<STDIN>> +``` + +### `STDOUT` + +The standard output stream (the default value for `$stdout`): + +```ruby +STDOUT # => #<IO:<STDOUT>> +``` + +### `STDERR` + +The standard error stream (the default value for `$stderr`): + +```ruby +STDERR # => #<IO:<STDERR>> +``` + +## Environment + +### `ENV` + +A hash of the current environment variable names and values: + +```ruby +ENV.take(5) +# => +[["COLORTERM", "truecolor"], + ["DBUS_SESSION_BUS_ADDRESS", "unix:path=/run/user/1000/bus"], + ["DESKTOP_SESSION", "ubuntu"], + ["DISPLAY", ":0"], + ["GDMSESSION", "ubuntu"]] +``` + +### `ARGF` + +The virtual concatenation of the files given on the command line, or from +`$stdin` if no files were given, `"-"` is given, or after +all files have been read. + +### `ARGV` + +An array of the given command-line arguments. + +### `TOPLEVEL_BINDING` + +The Binding of the top level scope: + +```ruby +TOPLEVEL_BINDING # => #<Binding:0x00007f58da0da7c0> +``` + +### `RUBY_VERSION` + +The Ruby version: + +```ruby +RUBY_VERSION # => "3.2.2" +``` + +### `RUBY_RELEASE_DATE` + +The release date string: + +```ruby +RUBY_RELEASE_DATE # => "2023-03-30" +``` + +### `RUBY_PLATFORM` + +The platform identifier: + +```ruby +RUBY_PLATFORM # => "x86_64-linux" +``` + +### `RUBY_PATCHLEVEL` + +The integer patch level for this Ruby: + +```ruby +RUBY_PATCHLEVEL # => 53 +``` + +For a development build the patch level will be -1. 
+ +### `RUBY_REVISION` + +The git commit hash for this Ruby: + +```ruby +RUBY_REVISION # => "e51014f9c05aa65cbf203442d37fef7c12390015" +``` + +### `RUBY_COPYRIGHT` + +The copyright string: + +```ruby +RUBY_COPYRIGHT +# => "ruby - Copyright (C) 1993-2023 Yukihiro Matsumoto" +``` + +### `RUBY_ENGINE` + +The name of the Ruby implementation: + +```ruby +RUBY_ENGINE # => "ruby" +``` + +### `RUBY_ENGINE_VERSION` + +The version of the Ruby implementation: + +```ruby +RUBY_ENGINE_VERSION # => "3.2.2" +``` + +### `RUBY_DESCRIPTION` + +The description of the Ruby implementation: + +```ruby +RUBY_DESCRIPTION +# => "ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]" +``` + +## Embedded \Data + +### `DATA` + +Defined if and only if the program has this line: + +```ruby +__END__ +``` + +When defined, `DATA` is a File object +containing the lines following the `__END__`, +positioned at the first of those lines: + +```ruby +p DATA +DATA.each_line { |line| p line } +__END__ +Foo +Bar +Baz +``` + +Output: + +``` +#<File:t.rb> +"Foo\n" +"Bar\n" +"Baz\n" +``` + +[command-line option `-0`]: rdoc-ref:language/options.md@-0-set--input-record-separator +[command-line option `-F`]: rdoc-ref:language/options.md@-f-set-input-field-separator +[command-line option `-a`]: rdoc-ref:language/options.md@-a-split-input-lines-into-fields +[command-line option `-d`]: rdoc-ref:language/options.md@-d-set-debug-to-true +[command-line option `-i`]: rdoc-ref:language/options.md@-i-set-argf-in-place-mode +[command-line option `-l`]: rdoc-ref:language/options.md@-l-set-output-record-separator-chop-lines +[command-line option `-p`]: rdoc-ref:language/options.md@-p--n-with-printing +[command-line option `-v`]: rdoc-ref:language/options.md@-v-print-version-set-verbose +[command-line option `-w`]: rdoc-ref:language/options.md@-w-synonym-for--w1 + +[Regexp Global Variables]: rdoc-ref:Regexp@Global+Variables + diff --git a/doc/language/hash_inclusion.rdoc b/doc/language/hash_inclusion.rdoc new file mode 
100644 index 0000000000..05c2b0932a --- /dev/null +++ b/doc/language/hash_inclusion.rdoc @@ -0,0 +1,31 @@ +== \Hash Inclusion + +A hash is set-like in that it cannot have duplicate entries +(or even duplicate keys). +\Hash inclusion can therefore be based on the idea of +{subset and superset}[https://en.wikipedia.org/wiki/Subset]. + +Two hashes may be tested for inclusion, +based on comparisons of their entries. + +An entry <tt>h0[k0]</tt> in one hash +is equal to an entry <tt>h1[k1]</tt> in another hash +if and only if the two keys are equal (<tt>k0 == k1</tt>) +and their two values are equal (<tt>h0[k0] == h1[k1]</tt>). + +A hash may be a subset or a superset of another hash: + +- Subset (included in or equal to another): + + - \Hash +h0+ is a _subset_ of hash +h1+ (see Hash#<=) + if each entry in +h0+ is equal to an entry in +h1+. + - Further, +h0+ is a <i>proper subset</i> of +h1+ (see Hash#<) + if +h1+ is larger than +h0+. + +- Superset (including or equal to another): + + - \Hash +h0+ is a _superset_ of hash +h1+ (see Hash#>=) + if each entry in +h1+ is equal to an entry in +h0+. + - Further, +h0+ is a <i>proper superset</i> of +h1+ (see Hash#>) + if +h0+ is larger than +h1+. + diff --git a/doc/language/implicit_conversion.rdoc b/doc/language/implicit_conversion.rdoc new file mode 100644 index 0000000000..e244096125 --- /dev/null +++ b/doc/language/implicit_conversion.rdoc @@ -0,0 +1,221 @@ += Implicit Conversions + +Some Ruby methods accept one or more objects +that can be either: + +* <i>Of a given class</i>, and so accepted as is. +* <i>Implicitly convertible to that class</i>, in which case + the called method converts the object. + +For each of the relevant classes, the conversion is done by calling +a specific conversion method: + +* Array: +to_ary+ +* Hash: +to_hash+ +* Integer: +to_int+ +* String: +to_str+ + +== Array-Convertible Objects + +An <i>Array-convertible object</i> is an object that: + +* Has instance method +to_ary+. 
+* The method accepts no arguments. +* The method returns an object +obj+ for which <tt>obj.kind_of?(Array)</tt> returns +true+. + +The Ruby core class that satisfies these requirements is: + +* Array + +The examples in this section use method <tt>Array#replace</tt>, +which accepts an Array-convertible argument. + +This class is Array-convertible: + + class ArrayConvertible + def to_ary + [:foo, 'bar', 2] + end + end + a = [] + a.replace(ArrayConvertible.new) # => [:foo, "bar", 2] + +This class is not Array-convertible (no +to_ary+ method): + + class NotArrayConvertible; end + a = [] + # Raises TypeError (no implicit conversion of NotArrayConvertible into Array) + a.replace(NotArrayConvertible.new) + +This class is not Array-convertible (method +to_ary+ takes arguments): + + class NotArrayConvertible + def to_ary(x) + [:foo, 'bar', 2] + end + end + a = [] + # Raises ArgumentError (wrong number of arguments (given 0, expected 1)) + a.replace(NotArrayConvertible.new) + +This class is not Array-convertible (method +to_ary+ returns non-Array): + + class NotArrayConvertible + def to_ary + :foo + end + end + a = [] + # Raises TypeError (can't convert NotArrayConvertible to Array (NotArrayConvertible#to_ary gives Symbol)) + a.replace(NotArrayConvertible.new) + +== Hash-Convertible Objects + +A <i>Hash-convertible object</i> is an object that: + +* Has instance method +to_hash+. +* The method accepts no arguments. +* The method returns an object +obj+ for which <tt>obj.kind_of?(Hash)</tt> returns +true+. + +The Ruby core class that satisfies these requirements is: + +* Hash + +The examples in this section use method <tt>Hash#merge</tt>, +which accepts a Hash-convertible argument. 
+ +This class is Hash-convertible: + + class HashConvertible + def to_hash + {foo: 0, bar: 1, baz: 2} + end + end + h = {} + h.merge(HashConvertible.new) # => {:foo=>0, :bar=>1, :baz=>2} + +This class is not Hash-convertible (no +to_hash+ method): + + class NotHashConvertible; end + h = {} + # Raises TypeError (no implicit conversion of NotHashConvertible into Hash) + h.merge(NotHashConvertible.new) + +This class is not Hash-convertible (method +to_hash+ takes arguments): + + class NotHashConvertible + def to_hash(x) + {foo: 0, bar: 1, baz: 2} + end + end + h = {} + # Raises ArgumentError (wrong number of arguments (given 0, expected 1)) + h.merge(NotHashConvertible.new) + +This class is not Hash-convertible (method +to_hash+ returns non-Hash): + + class NotHashConvertible + def to_hash + :foo + end + end + h = {} + # Raises TypeError (can't convert NotHashConvertible to Hash (NotHashConvertible#to_hash gives Symbol)) + h.merge(NotHashConvertible.new) + +== Integer-Convertible Objects + +An <i>Integer-convertible object</i> is an object that: + +* Has instance method +to_int+. +* The method accepts no arguments. +* The method returns an object +obj+ for which <tt>obj.kind_of?(Integer)</tt> returns +true+. + +The Ruby core classes that satisfy these requirements are: + +* Integer +* Float +* Complex +* Rational + +The examples in this section use method <tt>Array.new</tt>, +which accepts an Integer-convertible argument. 
+ +This user-defined class is Integer-convertible: + + class IntegerConvertible + def to_int + 3 + end + end + a = Array.new(IntegerConvertible.new).size + a # => 3 + +This class is not Integer-convertible (method +to_int+ takes arguments): + + class NotIntegerConvertible + def to_int(x) + 3 + end + end + # Raises ArgumentError (wrong number of arguments (given 0, expected 1)) + Array.new(NotIntegerConvertible.new) + +This class is not Integer-convertible (method +to_int+ returns non-Integer): + + class NotIntegerConvertible + def to_int + :foo + end + end + # Raises TypeError (can't convert NotIntegerConvertible to Integer (NotIntegerConvertible#to_int gives Symbol)) + Array.new(NotIntegerConvertible.new) + +== String-Convertible Objects + +A <i>String-convertible object</i> is an object that: +* Has instance method +to_str+. +* The method accepts no arguments. +* The method returns an object +obj+ for which <tt>obj.kind_of?(String)</tt> returns +true+. + +The Ruby core class that satisfies these requirements is: + +* String + +The examples in this section use method <tt>String::new</tt>, +which accepts a String-convertible argument. 
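The conversion rules described in these sections can also be probed directly with the core +try_convert+ class methods, which call the matching conversion method when it exists and return +nil+ otherwise. The following is a minimal sketch; the class +Convertible+ is hypothetical and exists only for this illustration.

```ruby
# Hypothetical class, defined only for this illustration.
class Convertible
  def to_ary
    [:foo]
  end

  def to_str
    'foo'
  end
end

obj = Convertible.new
Array.try_convert(obj)         # => [:foo]
String.try_convert(obj)        # => "foo"
Array.try_convert(Object.new)  # => nil (no to_ary method)
Hash.try_convert(obj)          # => nil (no to_hash method)
```

Unlike the methods used in the surrounding examples, +try_convert+ does not raise when the conversion method is missing, which makes it convenient for checking convertibility before calling a method that requires it.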
+ +This class is String-convertible: + + class StringConvertible + def to_str + 'foo' + end + end + String.new(StringConvertible.new) # => "foo" + +This class is not String-convertible (no +to_str+ method): + + class NotStringConvertible; end + # Raises TypeError (no implicit conversion of NotStringConvertible into String) + String.new(NotStringConvertible.new) + +This class is not String-convertible (method +to_str+ takes arguments): + + class NotStringConvertible + def to_str(x) + 'foo' + end + end + # Raises ArgumentError (wrong number of arguments (given 0, expected 1)) + String.new(NotStringConvertible.new) + +This class is not String-convertible (method +to_str+ returns non-String): + + class NotStringConvertible + def to_str + :foo + end + end + # Raises TypeError (can't convert NotStringConvertible to String (NotStringConvertible#to_str gives Symbol)) + String.new(NotStringConvertible.new) diff --git a/doc/language/marshal.rdoc b/doc/language/marshal.rdoc new file mode 100644 index 0000000000..740064ade6 --- /dev/null +++ b/doc/language/marshal.rdoc @@ -0,0 +1,318 @@ += Marshal Format + +The Marshal format is used to serialize ruby objects. The format can store +arbitrary objects through three user-defined extension mechanisms. + +For documentation on using Marshal to serialize and deserialize objects, see +the Marshal module. + +This document calls a serialized set of objects a stream. The Ruby +implementation can load a set of objects from a String, an IO or an object +that implements a +getc+ method. + +== Stream Format + +The first two bytes of the stream contain the major and minor version, each as +a single byte encoding a digit. The version implemented in Ruby is 4.8 +(stored as "\x04\x08") and is supported by ruby 1.8.0 and newer. + +Different major versions of the Marshal format are not compatible and cannot +be understood by other major versions. Lesser minor versions of the format +can be understood by newer minor versions. 
Format 4.7 can be loaded by a 4.8 +implementation but format 4.8 cannot be loaded by a 4.7 implementation. + +Following the version bytes is a stream describing the serialized object. The +stream contains nested objects (the same as a Ruby object) but objects in the +stream do not necessarily have a direct mapping to the Ruby object model. + +Each object in the stream is described by a byte indicating its type followed +by one or more bytes describing the object. When "object" is mentioned below +it means any of the types below that defines a Ruby object. + +=== true, false, nil + +These objects are each one byte long. "T" represents +true+, "F" +represents +false+ and "0" represents +nil+. + +=== Fixnum and long + +"i" represents a signed 32 bit value using a packed format. One through five +bytes follow the type. The value loaded will always be a Fixnum. On +32 bit platforms (where the precision of a Fixnum is less than 32 bits) +loading large values will cause overflow on CRuby. + +The fixnum type is used to represent both ruby Fixnum objects and the sizes of +marshaled arrays, hashes, instance variables and other types. In the +following sections "long" will mean the format described below, which supports +full 32 bit precision. + +The first byte has the following special values: + +"\x00":: + The value of the integer is 0. No bytes follow. + +"\x01":: + The total size of the integer is two bytes. The following byte is a + positive integer in the range of 0 through 255. Only values between 123 + and 255 should be represented this way to save bytes. + +"\xff":: + The total size of the integer is two bytes. The following byte is a + negative integer in the range of -1 through -256. + +"\x02":: + The total size of the integer is three bytes. The following two bytes are a + positive little-endian integer. + +"\xfe":: + The total size of the integer is three bytes. The following two bytes are a + negative little-endian integer. 
+ +"\x03":: + The total size of the integer is four bytes. The following three bytes are + a positive little-endian integer. + +"\xfd":: + The total size of the integer is four bytes. The following three bytes are a + negative little-endian integer. + +"\x04":: + The total size of the integer is five bytes. The following four bytes are a + positive little-endian integer. For compatibility with 32 bit ruby, + only Fixnums less than 1073741824 should be represented this way. For sizes + of stream objects full precision may be used. + +"\xfc":: + The total size of the integer is five bytes. The following four bytes are a + negative little-endian integer. For compatibility with 32 bit ruby, + only Fixnums greater than -1073741824 should be represented this way. For + sizes of stream objects full precision may be used. + +Otherwise the first byte is a sign-extended eight-bit value with an offset. +If the value is positive the value is determined by subtracting 5 from the +value. If the value is negative the value is determined by adding 5 to the +value. + +There are multiple representations for many values. CRuby always outputs the +shortest representation possible. + +=== Symbols and Byte Sequence + +":" represents a real symbol. A real symbol contains the data needed to +define the symbol for the rest of the stream as future occurrences in the +stream will instead be references (a symbol link) to this one. The reference +is a zero-indexed 32 bit value (so the first occurrence of <code>:hello</code> +is 0). + +Following the type byte is a byte sequence which consists of a long indicating +the number of bytes in the sequence followed by that many bytes of data. Byte +sequences have no encoding. + +For example, the following stream contains the Symbol <code>:hello</code>: + + "\x04\x08:\x0ahello" + +";" represents a Symbol link which references a previously defined Symbol. 
+Following the type byte is a long containing the index in the lookup table for +the linked (referenced) Symbol. + +For example, the following stream contains <code>[:hello, :hello]</code>: + + "\x04\b[\a:\nhello;\x00" + +When a "symbol" is referenced below it may be either a real symbol or a +symbol link. + +=== Object References + +Separate from but similar to symbol references, the stream contains only one +copy of each object (as determined by #object_id). For all objects except +true, false, nil, Fixnums and Symbols (which are stored separately as +described above) a one-indexed 32 bit value will be stored and reused when the +object is encountered again. (The first object has an index of 1). + +"@" represents an object link. Following the type byte is a long giving the +index of the object. + +For example, the following stream contains an Array of the same +<code>"hello"</code> object twice: + + "\004\b[\a\"\nhello@\006" + +=== Instance Variables + +"I" indicates that instance variables follow the next object. An object +follows the type byte. Following the object is a length indicating the number +of instance variables for the object. Following the length is a set of +name-value pairs. The names are symbols while the values are objects. The +symbols must be instance variable names (<code>:@name</code>). + +An Object ("o" type, described below) uses the same format for its instance +variables as described here. + +For a String and Regexp (described below) a special instance variable +<code>:E</code> is used to indicate the Encoding. + +=== Extended + +"e" indicates that the next object is extended by a module. An object follows +the type byte. Following the object is a symbol that contains the name of the +module the object is extended by. + +=== Array + +"[" represents an Array. Following the type byte is a long indicating the +number of objects in the array. The given number of objects follow the +length. 
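To make the Array layout and the packed "long" format concrete, here is a minimal sketch of a reader for a dumped Array whose elements are all Fixnums. The method names +read_long+ and +parse_fixnum_array+ are hypothetical helpers written for this illustration; they are not part of Ruby and handle only the "[" and "i" types.

```ruby
# Decode a packed "long" from an array of byte values (0..255),
# starting at index +pos+. Returns [value, next_pos].
def read_long(bytes, pos)
  c = bytes[pos]
  c -= 256 if c > 127            # sign-extend the first byte
  pos += 1
  return [0, pos] if c == 0
  return [c - 5, pos] if c > 4   # small positive value stored with an offset
  return [c + 5, pos] if c < -4  # small negative value stored with an offset

  size = c.abs                   # 1..4 little-endian data bytes follow
  value = 0
  size.times { |i| value |= bytes[pos + i] << (8 * i) }
  value -= 1 << (8 * size) if c < 0 # negative values are two's complement
  [value, pos + size]
end

# Parse a dumped Array that contains only Fixnums ("i" type).
def parse_fixnum_array(dump)
  bytes = dump.bytes
  raise ArgumentError, 'unsupported version' unless bytes[0, 2] == [4, 8]
  raise ArgumentError, 'not an Array' unless bytes[2] == '['.ord

  count, pos = read_long(bytes, 3) # the array length is itself a long
  Array.new(count) do
    raise ArgumentError, 'not a Fixnum' unless bytes[pos] == 'i'.ord

    value, pos = read_long(bytes, pos + 1)
    value
  end
end

parse_fixnum_array(Marshal.dump([0, 1, 300, -5])) # => [0, 1, 300, -5]
```

The sketch reads the two version bytes, the "[" type byte, the element count as a long, and then each element as an "i" type byte followed by a long.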
+ +=== Bignum + +"l" represents a Bignum which is composed of three parts: + +sign:: + A single byte containing "+" for a positive value or "-" for a negative + value. +length:: + A long indicating the number of bytes of Bignum data that follow, divided by + two. Multiply the length by two to determine the number of bytes of data + that follow. +data:: + Bytes of Bignum data representing the number. + +The following ruby code will reconstruct the Bignum value from an array of +bytes: + + result = 0 + + bytes.each_with_index do |byte, exp| + result += (byte * 2 ** (exp * 8)) + end + +=== +Class+ and +Module+ + +"c" represents a +Class+ object, "m" represents a +Module+ and "M" represents +either a class or module (an old-style type kept for compatibility). No class +or module content is included; this type is only a reference. Following the +type byte is a byte sequence which is used to look up an existing class or +module, respectively. + +Instance variables are not allowed on a class or module. + +If no class or module exists an exception should be raised. + +For "c" and "m" types, the loaded object must be a class or module, +respectively. + +=== Data + +"d" represents a Data object. (Data objects are wrapped pointers from ruby +extensions.) Following the type byte is a symbol indicating the class for the +Data object and an object that contains the state of the Data object. + +To dump a Data object Ruby calls _dump_data. To load a Data object Ruby calls +_load_data with the state of the object on a newly allocated instance. + +=== Float + +"f" represents a Float object. Following the type byte is a byte sequence +containing the float value. The following values are special: + +"inf":: + Positive infinity + +"-inf":: + Negative infinity + +"nan":: + Not a Number + +Otherwise the byte sequence contains a C double (loadable by strtod(3)). 
+Older minor versions of Marshal also stored extra mantissa bits to ensure +portability across platforms but 4.8 does not include these. See +[ruby-talk:69518] for some explanation. + +=== Hash and Hash with Default Value + +"{" represents a Hash object while "}" represents a Hash with a default value +set (<code>Hash.new 0</code>). Following the type byte is a long indicating +the number of key-value pairs in the Hash, the size. Double the given number +of objects follow the size. + +For a Hash with a default value, the default value follows all the pairs. + +=== Module and Old Module + +=== Object + +"o" represents an object that doesn't have any other special form (such as +a user-defined or built-in format). Following the type byte is a symbol +containing the class name of the object. Following the class name is a long +indicating the number of instance variable names and values for the object. +Double the given number of objects follow the size. + +The keys in the pairs must be symbols containing instance variable names. + +=== Regular Expression + +"/" represents a regular expression. Following the type byte is a byte +sequence containing the regular expression source. Following the byte sequence is +a byte containing the regular expression options (case-insensitive, etc.) as a +signed 8-bit value. + +Regular expressions can have an encoding attached through instance variables +(see above). If no encoding is attached escapes for the following regexp +specials not present in ruby 1.8 must be removed: g-m, o-q, u, y, E, F, H-L, +N-V, X, Y. + +=== String + +'"' represents a String. Following the type byte is a byte sequence +containing the string content. When dumped from ruby 1.9 an encoding instance +variable (<code>:E</code> see above) should be included unless the encoding is +binary. + +=== Struct + +"S" represents a Struct. Following the type byte is a symbol containing the +name of the struct. 
Following the name is a long indicating the number of +members in the struct. Double the number of objects follow the member count. +Each member is a pair containing the member's symbol and an object for the +value of that member. + +If the struct name does not match a Struct subclass in the running ruby an +exception should be raised. + +If there is a mismatch between the struct in the currently running ruby and +the member count in the marshaled struct an exception should be raised. + +=== User Class + +"C" represents a subclass of a String, Regexp, Array or Hash. Following the +type byte is a symbol containing the name of the subclass. Following the name +is the wrapped object. + +=== User Defined + +"u" represents an object with a user-defined serialization format using the ++_dump+ instance method and +_load+ class method. Following the type byte is +a symbol containing the class name. Following the class name is a byte +sequence containing the user-defined representation of the object. + +The class method +_load+ is called on the class with a string created from the +byte-sequence. + +This type is not recommended for newly created classes, because of some +restrictions: + +- cannot have recursive reference + +=== User Marshal + +"U" represents an object with a user-defined serialization format using the ++marshal_dump+ and +marshal_load+ instance methods. Following the type byte +is a symbol containing the class name. Following the class name is an object +containing the data. + +Upon loading a new instance must be allocated and +marshal_load+ must be +called on the instance with the data. + diff --git a/doc/language/option_dump.md b/doc/language/option_dump.md new file mode 100644 index 0000000000..328c6b52af --- /dev/null +++ b/doc/language/option_dump.md @@ -0,0 +1,265 @@ +# Option `--dump` + +For other argument values, +see {Option `--dump`}[rdoc-ref:options.md@--dump+Dump+Items]. 
+ +For the examples here, we use this program: + +```console +$ cat t.rb +puts 'Foo' +``` + +The supported dump items: + +- `insns`: Instruction sequences: + + ```sh + $ ruby --dump=insns t.rb + == disasm: #<ISeq:<main>@t.rb:1 (1,0)-(1,10)> (catch: FALSE) + 0000 putself ( 1)[Li] + 0001 dupstring "Foo" + 0003 opt_send_without_block <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE> + 0005 leave + ``` + +- `parsetree`: {Abstract syntax tree}[https://en.wikipedia.org/wiki/Abstract_syntax_tree] + (AST): + + ```console + $ ruby --dump=parsetree t.rb + ########################################################### + ## Do NOT use this node dump for any purpose other than ## + ## debug and research. Compatibility is not guaranteed. ## + ########################################################### + + # @ NODE_SCOPE (line: 1, location: (1,0)-(1,10)) + # +- nd_tbl: (empty) + # +- nd_args: + # | (null node) + # +- nd_body: + # @ NODE_FCALL (line: 1, location: (1,0)-(1,10))* + # +- nd_mid: :puts + # +- nd_args: + # @ NODE_LIST (line: 1, location: (1,5)-(1,10)) + # +- nd_alen: 1 + # +- nd_head: + # | @ NODE_STR (line: 1, location: (1,5)-(1,10)) + # | +- nd_lit: "Foo" + # +- nd_next: + # (null node) + ``` + +- `yydebug`: Debugging information from yacc parser generator: + + ``` + $ ruby --dump=yydebug t.rb + Starting parse + Entering state 0 + Reducing stack by rule 1 (line 1295): + lex_state: NONE -> BEG at line 1296 + vtable_alloc:12392: 0x0000558453df1a00 + vtable_alloc:12393: 0x0000558453df1a60 + cmdarg_stack(push): 0 at line 12406 + cond_stack(push): 0 at line 12407 + -> $$ = nterm $@1 (1.0-1.0: ) + Stack now 0 + Entering state 2 + Reading a token: + lex_state: BEG -> CMDARG at line 9049 + Next token is token "local variable or method" (1.0-1.4: puts) + Shifting token "local variable or method" (1.0-1.4: puts) + Entering state 35 + Reading a token: Next token is token "string literal" (1.5-1.6: ) + Reducing stack by rule 742 (line 5567): + $1 = token "local variable or method" 
(1.0-1.4: puts) + -> $$ = nterm operation (1.0-1.4: ) + Stack now 0 2 + Entering state 126 + Reducing stack by rule 78 (line 1794): + $1 = nterm operation (1.0-1.4: ) + -> $$ = nterm fcall (1.0-1.4: ) + Stack now 0 2 + Entering state 80 + Next token is token "string literal" (1.5-1.6: ) + Reducing stack by rule 292 (line 2723): + cmdarg_stack(push): 1 at line 2737 + -> $$ = nterm $@16 (1.4-1.4: ) + Stack now 0 2 80 + Entering state 235 + Next token is token "string literal" (1.5-1.6: ) + Shifting token "string literal" (1.5-1.6: ) + Entering state 216 + Reducing stack by rule 607 (line 4706): + -> $$ = nterm string_contents (1.6-1.6: ) + Stack now 0 2 80 235 216 + Entering state 437 + Reading a token: Next token is token "literal content" (1.6-1.9: "Foo") + Shifting token "literal content" (1.6-1.9: "Foo") + Entering state 503 + Reducing stack by rule 613 (line 4802): + $1 = token "literal content" (1.6-1.9: "Foo") + -> $$ = nterm string_content (1.6-1.9: ) + Stack now 0 2 80 235 216 437 + Entering state 507 + Reducing stack by rule 608 (line 4716): + $1 = nterm string_contents (1.6-1.6: ) + $2 = nterm string_content (1.6-1.9: ) + -> $$ = nterm string_contents (1.6-1.9: ) + Stack now 0 2 80 235 216 + Entering state 437 + Reading a token: + lex_state: CMDARG -> END at line 7276 + Next token is token "terminator" (1.9-1.10: ) + Shifting token "terminator" (1.9-1.10: ) + Entering state 508 + Reducing stack by rule 590 (line 4569): + $1 = token "string literal" (1.5-1.6: ) + $2 = nterm string_contents (1.6-1.9: ) + $3 = token "terminator" (1.9-1.10: ) + -> $$ = nterm string1 (1.5-1.10: ) + Stack now 0 2 80 235 + Entering state 109 + Reducing stack by rule 588 (line 4559): + $1 = nterm string1 (1.5-1.10: ) + -> $$ = nterm string (1.5-1.10: ) + Stack now 0 2 80 235 + Entering state 108 + Reading a token: + lex_state: END -> BEG at line 9200 + Next token is token '\n' (1.10-1.10: ) + Reducing stack by rule 586 (line 4541): + $1 = nterm string (1.5-1.10: ) + -> $$ = nterm 
strings (1.5-1.10: ) + Stack now 0 2 80 235 + Entering state 107 + Reducing stack by rule 307 (line 2837): + $1 = nterm strings (1.5-1.10: ) + -> $$ = nterm primary (1.5-1.10: ) + Stack now 0 2 80 235 + Entering state 90 + Next token is token '\n' (1.10-1.10: ) + Reducing stack by rule 261 (line 2553): + $1 = nterm primary (1.5-1.10: ) + -> $$ = nterm arg (1.5-1.10: ) + Stack now 0 2 80 235 + Entering state 220 + Next token is token '\n' (1.10-1.10: ) + Reducing stack by rule 270 (line 2586): + $1 = nterm arg (1.5-1.10: ) + -> $$ = nterm arg_value (1.5-1.10: ) + Stack now 0 2 80 235 + Entering state 221 + Next token is token '\n' (1.10-1.10: ) + Reducing stack by rule 297 (line 2779): + $1 = nterm arg_value (1.5-1.10: ) + -> $$ = nterm args (1.5-1.10: ) + Stack now 0 2 80 235 + Entering state 224 + Next token is token '\n' (1.10-1.10: ) + Reducing stack by rule 772 (line 5626): + -> $$ = nterm none (1.10-1.10: ) + Stack now 0 2 80 235 224 + Entering state 442 + Reducing stack by rule 296 (line 2773): + $1 = nterm none (1.10-1.10: ) + + -> $$ = nterm opt_block_arg (1.10-1.10: ) + Stack now 0 2 80 235 224 + Entering state 441 + Reducing stack by rule 288 (line 2696): + $1 = nterm args (1.5-1.10: ) + $2 = nterm opt_block_arg (1.10-1.10: ) + -> $$ = nterm call_args (1.5-1.10: ) + Stack now 0 2 80 235 + Entering state 453 + Reducing stack by rule 293 (line 2723): + $1 = nterm $@16 (1.4-1.4: ) + $2 = nterm call_args (1.5-1.10: ) + cmdarg_stack(pop): 0 at line 2754 + -> $$ = nterm command_args (1.4-1.10: ) + Stack now 0 2 80 + Entering state 333 + Next token is token '\n' (1.10-1.10: ) + Reducing stack by rule 79 (line 1804): + $1 = nterm fcall (1.0-1.4: ) + $2 = nterm command_args (1.4-1.10: ) + -> $$ = nterm command (1.0-1.10: ) + Stack now 0 2 + Entering state 81 + Next token is token '\n' (1.10-1.10: ) + Reducing stack by rule 73 (line 1770): + $1 = nterm command (1.0-1.10: ) + -> $$ = nterm command_call (1.0-1.10: ) + Stack now 0 2 + Entering state 78 + Reducing 
stack by rule 51 (line 1659): + $1 = nterm command_call (1.0-1.10: ) + -> $$ = nterm expr (1.0-1.10: ) + Stack now 0 2 + Entering state 75 + Next token is token '\n' (1.10-1.10: ) + Reducing stack by rule 39 (line 1578): + $1 = nterm expr (1.0-1.10: ) + -> $$ = nterm stmt (1.0-1.10: ) + Stack now 0 2 + Entering state 73 + Next token is token '\n' (1.10-1.10: ) + Reducing stack by rule 8 (line 1354): + $1 = nterm stmt (1.0-1.10: ) + -> $$ = nterm top_stmt (1.0-1.10: ) + Stack now 0 2 + Entering state 72 + Reducing stack by rule 5 (line 1334): + $1 = nterm top_stmt (1.0-1.10: ) + -> $$ = nterm top_stmts (1.0-1.10: ) + Stack now 0 2 + Entering state 71 + Next token is token '\n' (1.10-1.10: ) + Shifting token '\n' (1.10-1.10: ) + Entering state 311 + Reducing stack by rule 769 (line 5618): + $1 = token '\n' (1.10-1.10: ) + -> $$ = nterm term (1.10-1.10: ) + Stack now 0 2 71 + Entering state 313 + Reducing stack by rule 770 (line 5621): + $1 = nterm term (1.10-1.10: ) + -> $$ = nterm terms (1.10-1.10: ) + Stack now 0 2 71 + Entering state 314 + Reading a token: Now at end of input. + Reducing stack by rule 759 (line 5596): + $1 = nterm terms (1.10-1.10: ) + -> $$ = nterm opt_terms (1.10-1.10: ) + Stack now 0 2 71 + Entering state 312 + Reducing stack by rule 3 (line 1321): + $1 = nterm top_stmts (1.0-1.10: ) + $2 = nterm opt_terms (1.10-1.10: ) + -> $$ = nterm top_compstmt (1.0-1.10: ) + Stack now 0 2 + Entering state 70 + Reducing stack by rule 2 (line 1295): + $1 = nterm $@1 (1.0-1.0: ) + $2 = nterm top_compstmt (1.0-1.10: ) + vtable_free:12426: p->lvtbl->args(0x0000558453df1a00) + vtable_free:12427: p->lvtbl->vars(0x0000558453df1a60) + cmdarg_stack(pop): 0 at line 12428 + cond_stack(pop): 0 at line 12429 + -> $$ = nterm program (1.0-1.10: ) + Stack now 0 + Entering state 1 + Now at end of input. 
+ Shifting token "end-of-input" (1.10-1.10: ) + Entering state 3 + Stack now 0 1 3 + Cleanup: popping token "end-of-input" (1.10-1.10: ) + Cleanup: popping nterm program (1.0-1.10: ) + ``` + +Additional flags can follow dump items. + +- `+comment`: Add comments to AST. +- `+error-tolerant`: Parse in error-tolerant mode. +- `-optimize`: Disable optimizations for instruction sequences. diff --git a/doc/language/options.md b/doc/language/options.md new file mode 100644 index 0000000000..1329b7ca63 --- /dev/null +++ b/doc/language/options.md @@ -0,0 +1,744 @@ +# Ruby Command-Line Options + +## About the Examples + +Some examples here use command-line option `-e`, +which passes the Ruby code to be executed on the command line itself: + +```console +$ ruby -e 'puts "Hello, World."' +``` + +Some examples here assume that file `desiderata.txt` exists: + +```console +$ cat desiderata.txt +Go placidly amid the noise and the haste, +and remember what peace there may be in silence. +As far as possible, without surrender, +be on good terms with all persons. +``` + +## Options + +### `-0`: Set `$/` (Input Record Separator) + +Option `-0` defines the input record separator `$/` +for the invoked Ruby program. + +The optional argument to the option must be octal digits, +each in the range `0..7`; +these digits are prefixed with digit `0` to form an octal value. + +If no argument is given, the input record separator is `0x00`. + +If an argument is given, it must immediately follow the option +(no intervening whitespace or equal-sign character `'='`); +argument values: + +- `0`: the input record separator is `''`; + see {Special Line Separator Values}[rdoc-ref:IO@Special+Line+Separator+Values]. +- In range `(1..0377)`: + the input record separator `$/` is set to the character value of the argument. +- Any other octal value: the input record separator is `nil`. 
+ +Examples: + +```console +$ ruby -0 -e 'p $/' +"\x00" +$ ruby -00 -e 'p $/' +"" +$ ruby -012 -e 'p $/' +"\n" +$ ruby -015 -e 'p $/' +"\r" +$ ruby -0377 -e 'p $/' +"\xFF" +$ ruby -0400 -e 'p $/' +nil +``` + +See also: + +- [Option `-a`][-a]: + Split input lines into fields. +- [Option `-F`][-F]: + Set input field separator. +- [Option `-l`][-l]: + Set output record separator; chop lines. +- [Option `-n`][-n]: + Run program in `gets` loop. +- [Option `-p`][-p]: + `-n`, with printing. + +### `-a`: Split Input Lines into Fields + +Option `-a`, when given with either of options `-n` or `-p`, +splits the string at `$_` into an array of strings at `$F`: + +```console +$ ruby -an -e 'p $F' desiderata.txt +["Go", "placidly", "amid", "the", "noise", "and", "the", "haste,"] +["and", "remember", "what", "peace", "there", "may", "be", "in", "silence."] +["As", "far", "as", "possible,", "without", "surrender,"] +["be", "on", "good", "terms", "with", "all", "persons."] +``` + +For the splitting, +the default record separator is `$/`, +and the default field separator is `$;`. + +See also: + +- [Option `-0`][-0]: + Set `$/` (input record separator). +- [Option `-F`][-F]: + Set input field separator. +- [Option `-l`][-l]: + Set output record separator; chop lines. +- [Option `-n`][-n]: + Run program in `gets` loop. +- [Option `-p`][-p]: + `-n`, with printing. + +### `-c`: Check Syntax + +Option `-c` specifies that the specified Ruby program +should be checked for syntax, but not actually executed: + +```console +$ ruby -e 'puts "Foo"' +Foo +$ ruby -c -e 'puts "Foo"' +Syntax OK +``` + +### `-C`: Set Working Directory + +The argument to option `-C` specifies a working directory +for the invoked Ruby program; +does not change the working directory for the current process: + +```console +$ basename `pwd` +ruby +$ ruby -C lib -e 'puts File.basename(Dir.pwd)' +lib +$ basename `pwd` +ruby +``` + +This option is cumulative; relative paths are resolved from the +previous working directory. 
+ +```console +$ ruby -C / -C usr -e 'puts Dir.pwd' +/usr +``` + +If the argument is not an existing directory, a fatal error will +occur: + +```console +$ ruby -C /nonexistent +ruby: Can't chdir to /nonexistent (fatal) +$ ruby -C /dev/null +ruby: Can't chdir to /dev/null (fatal) +``` + +Whitespace between the option and its argument may be omitted. + +### `-d`: Set `$DEBUG` to `true` + +Some code in (or called by) the Ruby program may include statements or blocks +conditioned by the global variable `$DEBUG` (e.g., `if $DEBUG`); +these commonly write to `$stdout` or `$stderr`. + +The default value for `$DEBUG` is `false`; +option `-d` sets it to `true`: + +```console +$ ruby -e 'p $DEBUG' +false +$ ruby -d -e 'p $DEBUG' +true +``` + +[Option `--debug`][--debug] is an alias for option `-d`. + +### `-e`: Execute Given Ruby Code + +Option `-e` requires an argument, which is Ruby code to be executed; +the option may be given more than once: + +```console +$ ruby -e 'puts "Foo"' -e 'puts "Bar"' +Foo +Bar +``` + +Whitespace between the option and its argument may be omitted. + +The command may include other options, +but should not include arguments (which, if given, are ignored). + +### `-E`: Set Default Encodings + +Option `-E` requires an argument, which specifies either the default external encoding, +or both the default external and internal encodings for the invoked Ruby program: + +```console +# No option -E. +$ ruby -e 'p [Encoding::default_external, Encoding::default_internal]' +[#<Encoding:UTF-8>, nil] +# Option -E with default external encoding. +$ ruby -E cesu-8 -e 'p [Encoding::default_external, Encoding::default_internal]' +[#<Encoding:CESU-8>, nil] +# Option -E with default external and internal encodings. +$ ruby -E utf-8:cesu-8 -e 'p [Encoding::default_external, Encoding::default_internal]' +[#<Encoding:UTF-8>, #<Encoding:CESU-8>] +``` + +Whitespace between the option and its argument may be omitted. 
+ +See also: + +- [Option `--external-encoding`][--external-encoding]: + Set default external encoding. +- [Option `--internal-encoding`][--internal-encoding]: + Set default internal encoding. + +Option `--encoding` is an alias for option `-E`. + +### `-F`: Set Input Field Separator + +Option `-F`, when given with option `-a`, +specifies that its argument is to be the input field separator to be used for splitting: + +```console +$ ruby -an -Fs -e 'p $F' desiderata.txt +["Go placidly amid the noi", "e and the ha", "te,\n"] +["and remember what peace there may be in ", "ilence.\n"] +["A", " far a", " po", "", "ible, without ", "urrender,\n"] +["be on good term", " with all per", "on", ".\n"] +``` + +The argument may be a regular expression: + +```console +$ ruby -an -F'[.,]\s*' -e 'p $F' desiderata.txt +["Go placidly amid the noise and the haste"] +["and remember what peace there may be in silence"] +["As far as possible", "without surrender"] +["be on good terms with all persons"] +``` + +The argument must immediately follow the option +(no intervening whitespace or equal-sign character `'='`). + +See also: + +- [Option `-0`][-0]: + Set `$/` (input record separator). +- [Option `-a`][-a]: + Split input lines into fields. +- [Option `-l`][-l]: + Set output record separator; chop lines. +- [Option `-n`][-n]: + Run program in `gets` loop. +- [Option `-p`][-p]: + `-n`, with printing. + +### `-h`: Print Short Help Message + +Option `-h` prints a short help message +that includes single-hyphen options (e.g. `-I`), +and largely omits double-hyphen options (e.g., `--version`). + +Arguments and additional options are ignored. + +For a longer help message, use option `--help`. 
+ +### `-i`: Set \ARGF In-Place Mode + +Option `-i` sets the \ARGF in-place mode for the invoked Ruby program; +see ARGF#inplace_mode=: + +```console +$ ruby -e 'p ARGF.inplace_mode' +nil +$ ruby -i -e 'p ARGF.inplace_mode' +"" +$ ruby -i.bak -e 'p ARGF.inplace_mode' +".bak" +``` + +### `-I`: Add to `$LOAD_PATH` + +The argument to option `-I` specifies a directory +to be added to the array in global variable `$LOAD_PATH`; +the option may be given more than once: + +```console +$ pushd /tmp +$ ruby -e 'p $LOAD_PATH.size' +8 +$ ruby -I my_lib -I some_lib -e 'p $LOAD_PATH.size' +10 +$ ruby -I my_lib -I some_lib -e 'p $LOAD_PATH.take(2)' +["/tmp/my_lib", "/tmp/some_lib"] +$ popd +``` + +This option and [option `-C`][-C] are +applied in the order given on the command line; expansion of each `-I` argument +is affected by preceding `-C` options: + +```console +$ ruby -C / -Ilib -C usr -Ilib -e 'puts $:[0, 2]' +/lib +/usr/lib +``` + +Whitespace between the option and its argument may be omitted. + +### `-l`: Set Output Record Separator; Chop Lines + +Option `-l`, when given with option `-n` or `-p`, +modifies line-ending processing by: + +- Setting global variable output record separator `$\` + to the current value of input record separator `$/`; + this affects line-oriented output (such as the output from Kernel#puts). +- Calling String#chop! on each line read. + +Without option `-l` (unchopped): + +```console +$ ruby -n -e 'p $_' desiderata.txt +"Go placidly amid the noise and the haste,\n" +"and remember what peace there may be in silence.\n" +"As far as possible, without surrender,\n" +"be on good terms with all persons.\n" +``` + +With option `-l` (chopped): + +```console +$ ruby -ln -e 'p $_' desiderata.txt +"Go placidly amid the noise and the haste," +"and remember what peace there may be in silence." +"As far as possible, without surrender," +"be on good terms with all persons." +``` + +See also: + +- [Option `-0`][-0]: + Set `$/` (input record separator). 
+- [Option `-a`][-a]: + Split input lines into fields. +- [Option `-F`][-F]: + Set input field separator. +- [Option `-n`][-n]: + Run program in `gets` loop. +- [Option `-p`][-p]: + `-n`, with printing. + +### `-n`: Run Program in `gets` Loop + +Option `-n` runs your program in a `Kernel#gets` loop: + +```ruby +while gets + # Your Ruby code. +end +``` + +Note that `gets` reads the next line and sets global variable `$_` +to the last read line: + +```console +$ ruby -n -e 'puts $_' desiderata.txt +Go placidly amid the noise and the haste, +and remember what peace there may be in silence. +As far as possible, without surrender, +be on good terms with all persons. +``` + +See also: + +- [Option `-0`][-0]: + Set `$/` (input record separator). +- [Option `-a`][-a]: + Split input lines into fields. +- [Option `-F`][-F]: + Set input field separator. +- [Option `-l`][-l]: + Set output record separator; chop lines. +- [Option `-p`][-p]: + `-n`, with printing. + +### `-p`: `-n`, with Printing + +Option `-p` is like option `-n`, but also prints each line: + +```console +$ ruby -p -e 'puts $_.size' desiderata.txt +42 +Go placidly amid the noise and the haste, +49 +and remember what peace there may be in silence. +39 +As far as possible, without surrender, +35 +be on good terms with all persons. +``` + +See also: + +- [Option `-0`][-0]: + Set `$/` (input record separator). +- [Option `-a`][-a]: + Split input lines into fields. +- [Option `-F`][-F]: + Set input field separator. +- [Option `-l`][-l]: + Set output record separator; chop lines. +- [Option `-n`][-n]: + Run program in `gets` loop. 
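The loop behavior of options `-n` and `-p`, and the field splitting done by `-a` with `-F`, can be approximated in plain Ruby. The following is a minimal sketch, not the interpreter's actual implementation: the helper `emulate_gets_loop` and its keyword arguments are hypothetical names introduced here for illustration, and the block stands in for the program given with `-e`.

```ruby
require 'stringio'

# Hypothetical helper (not part of Ruby): approximates `ruby -n` / `ruby -p`,
# optionally with `-a -F<pattern>`, over an in-memory string.
def emulate_gets_loop(text, field_separator: nil, print_line: false)
  input = StringIO.new(text)
  output = []
  while (line = input.gets)                 # -n: one record per iteration
    # -a with -F: the fields that $F would hold for this record
    fields = field_separator ? line.split(Regexp.new(field_separator)) : nil
    result = yield(line, fields)            # stands in for the -e program body
    output << result unless result.nil?
    output << line if print_line            # -p: print the record after the body
  end
  output
end

text = "one,two\nthree,four\n"
first_fields = emulate_gets_loop(text, field_separator: ',') { |_line, fields| fields.first }
sizes_and_lines = emulate_gets_loop(text, print_line: true) { |line, _fields| line.size }
p first_fields      # => ["one", "three"]
p sizes_and_lines   # => [8, "one,two\n", 11, "three,four\n"]
```

The sketch collects output in an array instead of writing to `$stdout`, so the ordering (body output first, then the line itself under `-p`) is easy to inspect.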
+ +### `-r`: Require Library + +The argument to option `-r` specifies a library to be required +before executing the Ruby program; +the option may be given more than once: + +```console +$ ruby -e 'p defined?(JSON); p defined?(CSV)' +nil +nil +$ ruby -r csv -r json -e 'p defined?(JSON); p defined?(CSV)' +"constant" +"constant" +``` + +The library is loaded with the `Kernel#require` method, after the +other options such as [`-C`][-C], [`-I`][-I], and "custom options" defined by +[`-s`][-s], are applied. + +Whitespace between the option and its argument may be omitted. + +### `-s`: Define Global Variable + +Option `-s` specifies that a "custom option" is to define a global variable +in the invoked Ruby program: + +- The custom option must appear _after_ the program name. +- If there is no script name in the command line (using + [option `-e`][-e] or implicit reading from + `$stdin`), the custom options must be separated from the other + interpreter options with a `--`. +- The custom option must begin with a single hyphen (e.g., `-foo`), + not two hyphens (e.g., `--foo`). +- The name of the global variable is based on the option name: + global variable `$foo` for custom option `-foo`. +- The value of the global variable is the string option argument if given, + `true` otherwise. + +More than one custom option may be given: + +```console +$ cat t.rb +p [$foo, $bar] +$ ruby t.rb +[nil, nil] +$ ruby -s t.rb -foo=baz +["baz", nil] +$ ruby -s t.rb -foo +[true, nil] +$ ruby -s t.rb -foo=baz -bar=bat +["baz", "bat"] +``` + +### `-S`: Search Directories in `ENV['PATH']` + +Option `-S` specifies that the Ruby interpreter +is to search (if necessary) the directories whose paths are in the program's +`PATH` environment variable; +the program is executed in the shell's current working directory +(not necessarily in the directory where the program is found). 
+ +This example adds directory `/tmp` to the `PATH` environment variable: + +```console +$ export PATH=/tmp:$PATH +$ echo "puts File.basename(Dir.pwd)" > /tmp/t.rb +$ ruby -S t.rb +ruby +``` + +### `-v`: Print Version; Set `$VERBOSE` + +Option `-v` prints the Ruby version and sets global variable `$VERBOSE` to `true`: + +```console +$ ruby -e 'p $VERBOSE' +false +$ ruby -v -e 'p $VERBOSE' +ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x64-mingw-ucrt] +true +``` + +### `-w`: Synonym for `-W1` + +Option `-w` (lowercase letter) is equivalent to option `-W1` (uppercase letter). + +### `-W`: Set \Warning Policy + +Any Ruby code can create a <i>warning message</i> by calling method Kernel#warn; +methods in the Ruby core and standard libraries can also create warning messages. +Such a message may be printed on `$stderr` +(or not, depending on certain settings). + +Option `-W` helps determine whether a particular warning message +will be written, +by setting the initial value of global variable `$-W`: + +- `-W0`: Sets `$-W` to `0` (silent; no warnings). +- `-W1`: Sets `$-W` to `1` (moderate verbosity). +- `-W2`: Sets `$-W` to `2` (high verbosity). +- `-W`: Same as `-W2` (high verbosity). +- Option not given: Same as `-W1` (moderate verbosity). 
+ +The value of `$-W`, in turn, determines which warning messages (if any) +are to be printed to `$stderr` (see Kernel#warn): + +```console +$ ruby -W1 -e 'p $foo' +nil +$ ruby -W2 -e 'p $foo' +-e:1: warning: global variable '$foo' not initialized +nil +``` + +Ruby code may also define warnings for certain categories; +these are the default settings for the defined categories: + +```rb +Warning[:experimental] # => true +Warning[:deprecated] # => false +Warning[:performance] # => false +``` + +They may also be set: + +```rb +Warning[:experimental] = false +Warning[:deprecated] = true +Warning[:performance] = true +``` + +You can suppress a category by prefixing `no-` to the category name: + +```console +$ ruby -W:no-experimental -e 'p IO::Buffer.new' +#<IO::Buffer> +``` + +### `-x`: Execute Ruby Code Found in Text + +Option `-x` executes a Ruby program whose code is embedded +in other, non-code, text. + +The Ruby code: + +- Begins after the first line beginning with `'#!'` and containing string `'ruby'`. +- Ends before any one of: + + - End-of-file. + - A line consisting of `'__END__'`. + - Character `Ctrl-D` or `Ctrl-Z`. + +Example: + +```console +$ cat t.txt +Leading garbage. +#!ruby +puts File.basename(Dir.pwd) +__END__ +Trailing garbage. + +$ ruby -x t.txt +ruby +``` + +The optional argument specifies the directory where the text file +is to be found; +the Ruby code is executed in that directory: + +```console +$ cp t.txt /tmp/ +$ ruby -x/tmp t.txt +tmp +``` + +If an argument is given, it must immediately follow the option +(no intervening whitespace or equal-sign character `'='`). + +### `--backtrace-limit`: Set Backtrace Limit + +Option `--backtrace-limit` sets a limit on the number of entries +to be displayed in a backtrace. + +See Thread::Backtrace.limit. 
+ +### `--copyright`: Print Ruby Copyright + +Option `--copyright` prints a copyright message: + +```console +$ ruby --copyright +ruby - Copyright (C) 1993-2024 Yukihiro Matsumoto +``` + +### `--debug`: Alias for `-d` + +Option `--debug` is an alias for +[option `-d`][-d]. + +### `--disable`: Disable Features + +Option `--disable` specifies features to be disabled; +the argument is a comma-separated list of the features to be disabled: + +```sh +ruby --disable=gems,rubyopt t.rb +``` + +The supported features: + +- `gems`: Rubygems (default: enabled). +- `did_you_mean`: [`did_you_mean`](https://github.com/ruby/did_you_mean) (default: enabled). +- `rubyopt`: `RUBYOPT` environment variable (default: enabled). +- `frozen-string-literal`: Freeze all string literals (default: disabled). +- `jit`: JIT compiler (default: disabled). + +See also [option `--enable`][--enable]. + +### `--dump`: Dump Items + +Option `--dump` specifies items to be dumped; +the argument is a comma-separated list of the items. + +Some of the argument values cause the command to behave as if a different +option was given: + +- `--dump=copyright`: + Same as [option `--copyright`][--copyright]. +- `--dump=help`: + Same as [option `--help`][--help]. +- `--dump=syntax`: + Same as [option `-c`][-c]. +- `--dump=usage`: + Same as [option `-h`][-h]. +- `--dump=version`: + Same as [option `--version`][--version]. + +For other argument values and examples, +see {Option `--dump`}[rdoc-ref:option_dump.md]. + +### `--enable`: Enable Features + +Option `--enable` specifies features to be enabled; +the argument is a comma-separated list of the features to be enabled. + +```sh +ruby --enable=gems,rubyopt t.rb +``` + +For the features, +see [option `--disable`][--disable]. + +### `--encoding`: Alias for `-E`. + +Option `--encoding` is an alias for +[option `-E`][-E]. 
+ +### `--external-encoding`: Set Default External \Encoding + +Option `--external-encoding` +sets the default external encoding for the invoked Ruby program; +for values of `encoding`, +see [Encoding: Names and Aliases]. + +```console +$ ruby -e 'puts Encoding::default_external' +UTF-8 +$ ruby --external-encoding=cesu-8 -e 'puts Encoding::default_external' +CESU-8 +``` + +### `--help`: Print Help Message + +Option `--help` prints a long help message. + +Arguments and additional options are ignored. + +For a shorter help message, use option `-h`. + +### `--internal-encoding`: Set Default Internal \Encoding + +Option `--internal-encoding` +sets the default internal encoding for the invoked Ruby program; +for values of `encoding`, +see [Encoding: Names and Aliases]. + +```console +$ ruby -e 'puts Encoding::default_internal.nil?' +true +$ ruby --internal-encoding=cesu-8 -e 'puts Encoding::default_internal' +CESU-8 +``` + +### `--jit` + +Option `--jit` is an alias for option `--yjit`, which enables YJIT; +see additional YJIT options in the [YJIT documentation](rdoc-ref:jit/yjit.md). + +### `--verbose`: Set `$VERBOSE` + +Option `--verbose` sets global variable `$VERBOSE` to `true` +and disables input from `$stdin`. + +### `--version`: Print Ruby Version + +Option `--version` prints the version of the Ruby interpreter, then exits. 
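The effect of several of the options described above can be checked from Ruby itself by running the interpreter as a child process. The following is a sketch under stated assumptions: the helper name `ruby_stdout` is hypothetical, `Open3.capture3` and `RbConfig.ruby` come from the standard library, and the results assume no conflicting settings in the `RUBYOPT` environment variable.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: runs the current Ruby interpreter with the given
# command-line arguments and returns what the child wrote to $stdout.
# Anything the child writes to $stderr (such as -d diagnostics) is discarded.
def ruby_stdout(*args)
  stdout, _stderr, status = Open3.capture3(RbConfig.ruby, *args)
  raise "ruby exited with status #{status.exitstatus}" unless status.success?
  stdout
end

# Option -d sets $DEBUG to true.
debug_output = ruby_stdout('-d', '-e', 'p $DEBUG')

# Option -s with -e: custom options follow a `--` separator.
custom_output = ruby_stdout('-s', '-e', 'p [$foo, $bar]', '--', '-foo=baz', '-bar')

puts debug_output    # true
puts custom_output   # ["baz", true]
```

Passing each argument as a separate string avoids shell quoting, so the child process receives the options exactly as written.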
+ +[-0]: rdoc-ref:@-0+Set++Input+Record+Separator +[-C]: rdoc-ref:@-C+Set+Working+Directory +[-E]: rdoc-ref:@-E+Set+Default+Encodings +[-F]: rdoc-ref:@-F+Set+Input+Field+Separator +[-I]: rdoc-ref:@-I+Add+to+LOADPATH +[-a]: rdoc-ref:@-a+Split+Input+Lines+into+Fields +[-c]: rdoc-ref:@-c+Check+Syntax +[-d]: rdoc-ref:@-d+Set+DEBUG+to+true +[-e]: rdoc-ref:@-e+Execute+Given+Ruby+Code +[-h]: rdoc-ref:@-h+Print+Short+Help+Message +[-l]: rdoc-ref:@-l+Set+Output+Record+Separator+Chop+Lines +[-n]: rdoc-ref:@-n+Run+Program+in+gets+Loop +[-p]: rdoc-ref:@-p+-n+with+Printing +[-s]: rdoc-ref:@-s+Define+Global+Variable +[--copyright]: rdoc-ref:@--copyright+Print+Ruby+Copyright +[--debug]: rdoc-ref:@--debug+Alias+for+-d +[--disable]: rdoc-ref:@--disable+Disable+Features +[--enable]: rdoc-ref:@--enable+Enable+Features +[--external-encoding]: rdoc-ref:@--external+encoding+Set+Default+External+Encoding +[--internal-encoding]: rdoc-ref:@--internal+encoding+Set+Default+Internal+Encoding +[--help]: rdoc-ref:@--help+Print+Help+Message +[--version]: rdoc-ref:@--version+Print+Ruby+Version +[Encoding: Names and Aliases]: rdoc-ref:encodings.rdoc@Names+and+Aliases diff --git a/doc/language/packed_data.rdoc b/doc/language/packed_data.rdoc new file mode 100644 index 0000000000..0c84113643 --- /dev/null +++ b/doc/language/packed_data.rdoc @@ -0,0 +1,729 @@ += Packed \Data + +== Quick Reference + +These tables summarize the directives for packing and unpacking. 
+ +=== For Integers + + Directive | Meaning + --------------|--------------------------------------------------------------- + C | 8-bit unsigned (unsigned char) + S | 16-bit unsigned, native endian (uint16_t) + L | 32-bit unsigned, native endian (uint32_t) + Q | 64-bit unsigned, native endian (uint64_t) + J | pointer width unsigned, native endian (uintptr_t) + + c | 8-bit signed (signed char) + s | 16-bit signed, native endian (int16_t) + l | 32-bit signed, native endian (int32_t) + q | 64-bit signed, native endian (int64_t) + j | pointer width signed, native endian (intptr_t) + + S_ S! | unsigned short, native endian + I I_ I! | unsigned int, native endian + L_ L! | unsigned long, native endian + Q_ Q! | unsigned long long, native endian + | (raises ArgumentError if the platform has no long long type) + J! | uintptr_t, native endian (same with J) + + s_ s! | signed short, native endian + i i_ i! | signed int, native endian + l_ l! | signed long, native endian + q_ q! | signed long long, native endian + | (raises ArgumentError if the platform has no long long type) + j! 
| intptr_t, native endian (same with j) + + S> s> S!> s!> | each the same as the directive without >, but big endian + L> l> L!> l!> | S> is the same as n + I!> i!> | L> is the same as N + Q> q> Q!> q!> | + J> j> J!> j!> | + + S< s< S!< s!< | each the same as the directive without <, but little endian + L< l< L!< l!< | S< is the same as v + I!< i!< | L< is the same as V + Q< q< Q!< q!< | + J< j< J!< j!< | + + n | 16-bit unsigned, network (big-endian) byte order + N | 32-bit unsigned, network (big-endian) byte order + v | 16-bit unsigned, VAX (little-endian) byte order + V | 32-bit unsigned, VAX (little-endian) byte order + + U | UTF-8 character + w | BER-compressed integer + R | LEB128 encoded unsigned integer + r | LEB128 encoded signed integer + +=== For Floats + + Directive | Meaning + ----------|-------------------------------------------------- + D d | double-precision, native format + F f | single-precision, native format + E | double-precision, little-endian byte order + e | single-precision, little-endian byte order + G | double-precision, network (big-endian) byte order + g | single-precision, network (big-endian) byte order + +=== For Strings + + Directive | Meaning + ----------|----------------------------------------------------------------- + A | arbitrary binary string (remove trailing nulls and ASCII spaces) + a | arbitrary binary string + Z | null-terminated string + B | bit string (MSB first) + b | bit string (LSB first) + H | hex string (high nibble first) + h | hex string (low nibble first) + u | UU-encoded string + M | quoted-printable, MIME encoding (see RFC2045) + m | base64 encoded string (RFC 2045) (default) + | (base64 encoded string (RFC 4648) if followed by 0) + P | pointer to a structure (fixed-length string) + p | pointer to a null-terminated string + +=== Additional Directives for Packing + + Directive | Meaning + ----------|---------------------------------------------------------------- + @ | moves to absolute position + X | back up 
a byte + x | null byte + +=== Additional Directives for Unpacking + + Directive | Meaning + ----------|---------------------------------------------------------------- + @ | skip to the offset given by the length argument + X | skip backward one byte + x | skip forward one byte + ^ | return the current offset + +== Packing and Unpacking + +Certain Ruby core methods deal with packing and unpacking data: + +- Method Array#pack: + Formats each element in array +self+ into a binary string; + returns that string. +- Method String#unpack: + Extracts data from string +self+, + forming objects that become the elements of a new array; + returns that array. +- Method String#unpack1: + Does the same, but unpacks and returns only the first extracted object. + +Each of these methods accepts a string +template+, +consisting of zero or more _directive_ characters, +each followed by zero or more _modifier_ characters. + +Examples (directive <tt>'C'</tt> specifies 'unsigned character'): + + [65].pack('C') # => "A" # One element, one directive. + [65, 66].pack('CC') # => "AB" # Two elements, two directives. + [65, 66].pack('C') # => "A" # Extra element is ignored. + [65].pack('') # => "" # No directives. + [65].pack('CC') # Extra directive raises ArgumentError. + + 'A'.unpack('C') # => [65] # One character, one directive. + 'AB'.unpack('CC') # => [65, 66] # Two characters, two directives. + 'AB'.unpack('C') # => [65] # Extra character is ignored. + 'A'.unpack('CC') # => [65, nil] # Extra directive generates nil. + 'AB'.unpack('') # => [] # No directives. 
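A template usually combines several directives to describe one binary record. The following sketch packs and unpacks a small header; the layout (a version byte, a 16-bit big-endian length, and a 4-byte null-padded tag) and the constant name `HEADER_TEMPLATE` are hypothetical, chosen only to illustrate how directives line up with array elements.

```ruby
# Hypothetical record layout, for illustration only:
#   C  - version, 8-bit unsigned
#   n  - length, 16-bit unsigned, big-endian
#   Z4 - tag, 4 bytes, null padded when packing, cut at the first null when unpacking
HEADER_TEMPLATE = 'CnZ4'

packed = [1, 513, 'hi'].pack(HEADER_TEMPLATE)
header_bytes = packed.bytes
header_fields = packed.unpack(HEADER_TEMPLATE)
version = packed.unpack1('C')

p header_bytes    # => [1, 2, 1, 104, 105, 0, 0]
p header_fields   # => [1, 513, "hi"]
p version         # => 1
```

Using the same template string for both directions keeps the packing and unpacking code in agreement about field order and width.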
+ +The string +template+ may contain any mixture of valid directives +(directive <tt>'c'</tt> specifies 'signed character'): + + [65, -1].pack('cC') # => "A\xFF" + "A\xFF".unpack('cC') # => [65, 255] + +The string +template+ may contain whitespace (which is ignored) +and comments, each of which begins with character <tt>'#'</tt> +and continues up to and including the next following newline: + + [0,1].pack(" C #foo \n C ") # => "\x00\x01" + "\0\1".unpack(" C #foo \n C ") # => [0, 1] + +Any directive may be followed by either of these modifiers: + +- <tt>'*'</tt> - The directive is to be applied as many times as needed: + + [65, 66].pack('C*') # => "AB" + 'AB'.unpack('C*') # => [65, 66] + +- \Integer +count+ - The directive is to be applied +count+ times: + + [65, 66].pack('C2') # => "AB" + [65, 66].pack('C3') # Raises ArgumentError. + 'AB'.unpack('C2') # => [65, 66] + 'AB'.unpack('C3') # => [65, 66, nil] + + Note: Directives in <tt>%w[A a Z m]</tt> use +count+ differently; + see {\String Directives}[rdoc-ref:@String+Directives]. + +If elements don't fit the provided directive, only least significant bits are encoded: + + [257].pack("C").unpack("C") # => [1] + +== Packing Method + +Method Array#pack accepts optional keyword argument ++buffer+ that specifies the target string (instead of a new string): + + [65, 66].pack('C*', buffer: 'foo') # => "fooAB" + +The method can accept a block: + + # Packed string is passed to the block. + [65, 66].pack('C*') {|s| p s } # => "AB" + +== Unpacking Methods + +Methods String#unpack and String#unpack1 each accept +an optional keyword argument +offset+ that specifies an offset +into the string: + + 'ABC'.unpack('C*', offset: 1) # => [66, 67] + 'ABC'.unpack1('C*', offset: 1) # => 66 + +Both methods can accept a block: + + # Each unpacked object is passed to the block. + ret = [] + "ABCD".unpack("C*") {|c| ret << c } + ret # => [65, 66, 67, 68] + + # The single unpacked object is passed to the block. 
+ 'AB'.unpack1('C*') {|ele| p ele } # => 65 + +== \Integer Directives + +Each integer directive specifies the packing or unpacking +for one element in the input or output array. + +=== 8-Bit \Integer Directives + +- <tt>'c'</tt> - 8-bit signed integer + (like C <tt>signed char</tt>): + + [0, 1, 255].pack('c*') # => "\x00\x01\xFF" + s = [0, 1, -1].pack('c*') # => "\x00\x01\xFF" + s.unpack('c*') # => [0, 1, -1] + +- <tt>'C'</tt> - 8-bit unsigned integer + (like C <tt>unsigned char</tt>): + + [0, 1, 255].pack('C*') # => "\x00\x01\xFF" + s = [0, 1, -1].pack('C*') # => "\x00\x01\xFF" + s.unpack('C*') # => [0, 1, 255] + +=== 16-Bit \Integer Directives + +- <tt>'s'</tt> - 16-bit signed integer, native-endian + (like C <tt>int16_t</tt>): + + [513, -514].pack('s*') # => "\x01\x02\xFE\xFD" + s = [513, 65022].pack('s*') # => "\x01\x02\xFE\xFD" + s.unpack('s*') # => [513, -514] + +- <tt>'S'</tt> - 16-bit unsigned integer, native-endian + (like C <tt>uint16_t</tt>): + + [513, -514].pack('S*') # => "\x01\x02\xFE\xFD" + s = [513, 65022].pack('S*') # => "\x01\x02\xFE\xFD" + s.unpack('S*') # => [513, 65022] + +- <tt>'n'</tt> - 16-bit network integer, big-endian: + + s = [0, 1, -1, 32767, -32768, 65535].pack('n*') + # => "\x00\x00\x00\x01\xFF\xFF\x7F\xFF\x80\x00\xFF\xFF" + s.unpack('n*') + # => [0, 1, 65535, 32767, 32768, 65535] + +- <tt>'v'</tt> - 16-bit VAX integer, little-endian: + + s = [0, 1, -1, 32767, -32768, 65535].pack('v*') + # => "\x00\x00\x01\x00\xFF\xFF\xFF\x7F\x00\x80\xFF\xFF" + s.unpack('v*') + # => [0, 1, 65535, 32767, 32768, 65535] + +=== 32-Bit \Integer Directives + +- <tt>'l'</tt> - 32-bit signed integer, native-endian + (like C <tt>int32_t</tt>): + + s = [67305985, -50462977].pack('l*') + # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" + s.unpack('l*') + # => [67305985, -50462977] + +- <tt>'L'</tt> - 32-bit unsigned integer, native-endian + (like C <tt>uint32_t</tt>): + + s = [67305985, 4244504319].pack('L*') + # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" + s.unpack('L*') + # 
=> [67305985, 4244504319] + +- <tt>'N'</tt> - 32-bit network integer, big-endian: + + s = [0,1,-1].pack('N*') + # => "\x00\x00\x00\x00\x00\x00\x00\x01\xFF\xFF\xFF\xFF" + s.unpack('N*') + # => [0, 1, 4294967295] + +- <tt>'V'</tt> - 32-bit VAX integer, little-endian: + + s = [0,1,-1].pack('V*') + # => "\x00\x00\x00\x00\x01\x00\x00\x00\xFF\xFF\xFF\xFF" + s.unpack('V*') + # => [0, 1, 4294967295] + +=== 64-Bit \Integer Directives + +- <tt>'q'</tt> - 64-bit signed integer, native-endian + (like C <tt>int64_t</tt>): + + s = [578437695752307201, -506097522914230529].pack('q*') + # => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8" + s.unpack('q*') + # => [578437695752307201, -506097522914230529] + +- <tt>'Q'</tt> - 64-bit unsigned integer, native-endian + (like C <tt>uint64_t</tt>): + + s = [578437695752307201, 17940646550795321087].pack('Q*') + # => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8" + s.unpack('Q*') + # => [578437695752307201, 17940646550795321087] + +=== Platform-Dependent \Integer Directives + +- <tt>'i'</tt> - Platform-dependent width signed integer, + native-endian (like C <tt>int</tt>): + + s = [67305985, -50462977].pack('i*') + # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" + s.unpack('i*') + # => [67305985, -50462977] + +- <tt>'I'</tt> - Platform-dependent width unsigned integer, + native-endian (like C <tt>unsigned int</tt>): + + s = [67305985, -50462977].pack('I*') + # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" + s.unpack('I*') + # => [67305985, 4244504319] + +- <tt>'j'</tt> - Pointer-width signed integer, native-endian + (like C <tt>intptr_t</tt>): + + s = [67305985, -50462977].pack('j*') + # => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\xFF\xFF\xFF\xFF" + s.unpack('j*') + # => [67305985, -50462977] + +- <tt>'J'</tt> - Pointer-width unsigned integer, native-endian + (like C <tt>uintptr_t</tt>): + + s = [67305985, 4244504319].pack('J*') + # => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\x00\x00\x00\x00" + 
s.unpack('J*') + # => [67305985, 4244504319] + +=== Other \Integer Directives + +- <tt>'U'</tt> - UTF-8 character: + + s = [4194304].pack('U*') + # => "\xF8\x90\x80\x80\x80" + s.unpack('U*') + # => [4194304] + +- <tt>'r'</tt> - Signed LEB128-encoded integer + (see {Signed LEB128}[https://en.wikipedia.org/wiki/LEB128#Signed_LEB128]): + + s = [1, 127, -128, 16383, -16384].pack("r*") + # => "\x01\xFF\x00\x80\x7F\xFF\xFF\x00\x80\x80\x7F" + s.unpack('r*') + # => [1, 127, -128, 16383, -16384] + +- <tt>'R'</tt> - Unsigned LEB128-encoded integer + (see {Unsigned LEB128}[https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128]): + + s = [1, 127, 128, 16383, 16384].pack("R*") + # => "\x01\x7F\x80\x01\xFF\x7F\x80\x80\x01" + s.unpack('R*') + # => [1, 127, 128, 16383, 16384] + +- <tt>'w'</tt> - BER-encoded integer + (see {BER encoding}[https://en.wikipedia.org/wiki/X.690#BER_encoding]): + + s = [1073741823].pack('w*') + # => "\x83\xFF\xFF\xFF\x7F" + s.unpack('w*') + # => [1073741823] + +=== Modifiers for \Integer Directives + +For the following directives, modifier <tt>'!'</tt> or <tt>'_'</tt> may be +appended to select the underlying platform's native size. + +- <tt>'i'</tt>, <tt>'I'</tt> - C <tt>int</tt>, always native size. +- <tt>'s'</tt>, <tt>'S'</tt> - C <tt>short</tt>. +- <tt>'l'</tt>, <tt>'L'</tt> - C <tt>long</tt>. +- <tt>'q'</tt>, <tt>'Q'</tt> - C <tt>long long</tt>, if available. +- <tt>'j'</tt>, <tt>'J'</tt> - C <tt>intptr_t</tt>, always native size. + +Native size modifiers are silently ignored for directives that are always native size. + +An endian modifier may also be appended to any of the directives above: + +- <tt>'>'</tt> - Big-endian. +- <tt>'<'</tt> - Little-endian. + +== \Float Directives + +Each float directive specifies the packing or unpacking +for one element in the input or output array. 
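Float directives differ in width and in byte order, which affects both the size of the packed string and how exactly a value survives a round trip. The following is a small sketch using the directives <tt>'e'</tt>, <tt>'E'</tt>, and <tt>'g'</tt>; the variable names are illustrative only.

```ruby
single = [0.1].pack('e')   # single-precision, little-endian
double = [0.1].pack('E')   # double-precision, little-endian

single_size = single.bytesize
double_size = double.bytesize

# A Ruby Float is double-precision, so 'E' round-trips exactly,
# while 'e' keeps only single precision.
double_exact = double.unpack1('E') == 0.1
single_exact = single.unpack1('e') == 0.1

# 'g' is the big-endian counterpart of 'e': same bytes, reversed order.
reversed_match = [3.0].pack('g') == [3.0].pack('e').reverse

p single_size      # => 4
p double_size      # => 8
p double_exact     # => true
p single_exact     # => false
p reversed_match   # => true
```

When exact round trips matter, a double-precision directive is the safer choice; single precision halves the size at the cost of precision.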
+ +=== Single-Precision \Float Directives + +- <tt>'F'</tt> or <tt>'f'</tt> - Native format: + + s = [3.0].pack('F') # => "\x00\x00@@" + s.unpack('F') # => [3.0] + +- <tt>'e'</tt> - Little-endian: + + s = [3.0].pack('e') # => "\x00\x00@@" + s.unpack('e') # => [3.0] + +- <tt>'g'</tt> - Big-endian: + + s = [3.0].pack('g') # => "@@\x00\x00" + s.unpack('g') # => [3.0] + +=== Double-Precision \Float Directives + +- <tt>'D'</tt> or <tt>'d'</tt> - Native format: + + s = [3.0].pack('D') # => "\x00\x00\x00\x00\x00\x00\b@" + s.unpack('D') # => [3.0] + +- <tt>'E'</tt> - Little-endian: + + s = [3.0].pack('E') # => "\x00\x00\x00\x00\x00\x00\b@" + s.unpack('E') # => [3.0] + +- <tt>'G'</tt> - Big-endian: + + s = [3.0].pack('G') # => "@\b\x00\x00\x00\x00\x00\x00" + s.unpack('G') # => [3.0] + +A packed or unpacked float value may be infinity or not-a-number: + + inf = 1.0/0.0 # => Infinity + [inf].pack('f') # => "\x00\x00\x80\x7F" + "\x00\x00\x80\x7F".unpack('f') # => [Infinity] + + nan = inf/inf # => NaN + [nan].pack('f') # => "\x00\x00\xC0\x7F" + "\x00\x00\xC0\x7F".unpack('f') # => [NaN] + +== \String Directives + +Each string directive specifies the packing or unpacking +for one byte in the input or output string. + +=== Binary \String Directives + +- <tt>'A'</tt> - Arbitrary binary string (space padded; count is width); + +nil+ is treated as the empty string: + + ['foo'].pack('A') # => "f" + ['foo'].pack('A*') # => "foo" + ['foo'].pack('A2') # => "fo" + ['foo'].pack('A4') # => "foo " + [nil].pack('A') # => " " + [nil].pack('A*') # => "" + [nil].pack('A2') # => "  " + [nil].pack('A4') # => "    " + + "foo\0".unpack('A') # => ["f"] + "foo\0".unpack('A4') # => ["foo"] + "foo\0bar".unpack('A10') # => ["foo\x00bar"] # Reads past "\0". 
+ "foo ".unpack('A') # => ["f"] + "foo ".unpack('A4') # => ["foo"] + "foo".unpack('A4') # => ["foo"] + + japanese = 'こんにちは' + japanese.size # => 5 + japanese.bytesize # => 15 + [japanese].pack('A') # => "\xE3" + [japanese].pack('A*') # => "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF" + japanese.unpack('A') # => ["\xE3"] + japanese.unpack('A2') # => ["\xE3\x81"] + japanese.unpack('A4') # => ["\xE3\x81\x93\xE3"] + japanese.unpack('A*') # => ["\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF"] + +- <tt>'a'</tt> - Arbitrary binary string (null padded; count is width): + + ["foo"].pack('a') # => "f" + ["foo"].pack('a*') # => "foo" + ["foo"].pack('a2') # => "fo" + ["foo\0"].pack('a4') # => "foo\x00" + [nil].pack('a') # => "\x00" + [nil].pack('a*') # => "" + [nil].pack('a2') # => "\x00\x00" + [nil].pack('a4') # => "\x00\x00\x00\x00" + + "foo\0".unpack('a') # => ["f"] + "foo\0".unpack('a4') # => ["foo\x00"] + "foo ".unpack('a4') # => ["foo "] + "foo".unpack('a4') # => ["foo"] + "foo\0bar".unpack('a4') # => ["foo\x00"] # Reads past "\0". + +- <tt>'Z'</tt> - Same as <tt>'a'</tt>, + except that null is added or ignored with <tt>'*'</tt>: + + ["foo"].pack('Z*') # => "foo\x00" + [nil].pack('Z*') # => "\x00" + + "foo\0".unpack('Z*') # => ["foo"] + "foo".unpack('Z*') # => ["foo"] + "foo\0bar".unpack('Z*') # => ["foo"] # Does not read past "\0". 
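The padding and trimming rules of <tt>'A'</tt>, <tt>'a'</tt>, and <tt>'Z'</tt> make them convenient for fixed-width records. The following is a minimal sketch; the two-field layout (a 10-byte name and a 3-byte age) and the constant name `RECORD_TEMPLATE` are hypothetical.

```ruby
# Hypothetical fixed-width layout: 10-byte name, 3-byte age.
RECORD_TEMPLATE = 'A10A3'

# 'A' pads each field with spaces when packing
# and strips trailing spaces and nulls when unpacking.
record = ['alice', '30'].pack(RECORD_TEMPLATE)
fields = record.unpack(RECORD_TEMPLATE)

# 'a' pads with nulls and keeps them when unpacking;
# 'Z' stops at the first null instead.
null_padded = ['alice', '30'].pack('a10a3')
raw_fields = null_padded.unpack('a10a3')
trimmed_fields = null_padded.unpack('Z10Z3')

p record           # => "alice     30 "
p fields           # => ["alice", "30"]
p raw_fields       # => ["alice\x00\x00\x00\x00\x00", "30\x00"]
p trimmed_fields   # => ["alice", "30"]
```

Which directive fits depends on the format being read: space-padded text files suit <tt>'A'</tt>, while C-style null-padded buffers suit <tt>'a'</tt> or <tt>'Z'</tt>.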
+ +=== Bit \String Directives + +- <tt>'B'</tt> - Bit string (high byte first): + + ['11111111' + '00000000'].pack('B*') # => "\xFF\x00" + ['10000000' + '01000000'].pack('B*') # => "\x80@" + + ['1'].pack('B0') # => "" + ['1'].pack('B1') # => "\x80" + ['1'].pack('B2') # => "\x80\x00" + ['1'].pack('B3') # => "\x80\x00" + ['1'].pack('B4') # => "\x80\x00\x00" + ['1'].pack('B5') # => "\x80\x00\x00" + ['1'].pack('B6') # => "\x80\x00\x00\x00" + + "\xff\x00".unpack("B*") # => ["1111111100000000"] + "\x01\x02".unpack("B*") # => ["0000000100000010"] + + "".unpack("B0") # => [""] + "\x80".unpack("B1") # => ["1"] + "\x80".unpack("B2") # => ["10"] + "\x80".unpack("B3") # => ["100"] + +- <tt>'b'</tt> - Bit string (low byte first): + + ['11111111' + '00000000'].pack('b*') # => "\xFF\x00" + ['10000000' + '01000000'].pack('b*') # => "\x01\x02" + + ['1'].pack('b0') # => "" + ['1'].pack('b1') # => "\x01" + ['1'].pack('b2') # => "\x01\x00" + ['1'].pack('b3') # => "\x01\x00" + ['1'].pack('b4') # => "\x01\x00\x00" + ['1'].pack('b5') # => "\x01\x00\x00" + ['1'].pack('b6') # => "\x01\x00\x00\x00" + + "\xff\x00".unpack("b*") # => ["1111111100000000"] + "\x01\x02".unpack("b*") # => ["1000000001000000"] + + "".unpack("b0") # => [""] + "\x01".unpack("b1") # => ["1"] + "\x01".unpack("b2") # => ["10"] + "\x01".unpack("b3") # => ["100"] + +=== Hex \String Directives + +- <tt>'H'</tt> - Hex string (high nibble first): + + ['10ef'].pack('H*') # => "\x10\xEF" + ['10ef'].pack('H0') # => "" + ['10ef'].pack('H3') # => "\x10\xE0" + ['10ef'].pack('H5') # => "\x10\xEF\x00" + + ['fff'].pack('H3') # => "\xFF\xF0" + ['fff'].pack('H4') # => "\xFF\xF0" + ['fff'].pack('H5') # => "\xFF\xF0\x00" + ['fff'].pack('H6') # => "\xFF\xF0\x00" + ['fff'].pack('H7') # => "\xFF\xF0\x00\x00" + ['fff'].pack('H8') # => "\xFF\xF0\x00\x00" + + "\x10\xef".unpack('H*') # => ["10ef"] + "\x10\xef".unpack('H0') # => [""] + "\x10\xef".unpack('H1') # => ["1"] + "\x10\xef".unpack('H2') # => ["10"] + "\x10\xef".unpack('H3') # => ["10e"] 
+ "\x10\xef".unpack('H4') # => ["10ef"] + "\x10\xef".unpack('H5') # => ["10ef"] + +- <tt>'h'</tt> - Hex string (low nibble first): + + ['10ef'].pack('h*') # => "\x01\xFE" + ['10ef'].pack('h0') # => "" + ['10ef'].pack('h3') # => "\x01\x0E" + ['10ef'].pack('h5') # => "\x01\xFE\x00" + + ['fff'].pack('h3') # => "\xFF\x0F" + ['fff'].pack('h4') # => "\xFF\x0F" + ['fff'].pack('h5') # => "\xFF\x0F\x00" + ['fff'].pack('h6') # => "\xFF\x0F\x00" + ['fff'].pack('h7') # => "\xFF\x0F\x00\x00" + ['fff'].pack('h8') # => "\xFF\x0F\x00\x00" + + "\x01\xfe".unpack('h*') # => ["10ef"] + "\x01\xfe".unpack('h0') # => [""] + "\x01\xfe".unpack('h1') # => ["1"] + "\x01\xfe".unpack('h2') # => ["10"] + "\x01\xfe".unpack('h3') # => ["10e"] + "\x01\xfe".unpack('h4') # => ["10ef"] + "\x01\xfe".unpack('h5') # => ["10ef"] + +=== Pointer \String Directives + +- <tt>'P'</tt> - Pointer to a structure (fixed-length string): + + s = ['abc'].pack('P') # => "\xE0O\x7F\xE5\xA1\x01\x00\x00" + s.unpack('P*') # => ["abc"] + ".".unpack("P") # => [] + ("\0" * 8).unpack("P") # => [nil] + [nil].pack("P") # => "\x00\x00\x00\x00\x00\x00\x00\x00" + +- <tt>'p'</tt> - Pointer to a null-terminated string: + + s = ['abc'].pack('p') # => "(\xE4u\xE5\xA1\x01\x00\x00" + s.unpack('p*') # => ["abc"] + ".".unpack("p") # => [] + ("\0" * 8).unpack("p") # => [nil] + [nil].pack("p") # => "\x00\x00\x00\x00\x00\x00\x00\x00" + +=== Other \String Directives + +- <tt>'M'</tt> - Quoted printable, MIME encoding; + text mode, but input must use LF and output LF; + (see {RFC 2045}[https://www.ietf.org/rfc/rfc2045.txt]): + + ["a b c\td \ne"].pack('M') # => "a b c\td =\n\ne=\n" + ["\0"].pack('M') # => "=00=\n" + + ["a"*1023].pack('M') == ("a"*73+"=\n")*14+"a=\n" # => true + ("a"*73+"=\na=\n").unpack('M') == ["a"*74] # => true + (("a"*73+"=\n")*14+"a=\n").unpack('M') == ["a"*1023] # => true + + "a b c\td =\n\ne=\n".unpack('M') # => ["a b c\td \ne"] + "=00=\n".unpack('M') # => ["\x00"] + + "pre=31=32=33after".unpack('M') # => ["pre123after"] 
+ "pre=\nafter".unpack('M') # => ["preafter"] + "pre=\r\nafter".unpack('M') # => ["preafter"] + "pre=".unpack('M') # => ["pre="] + "pre=\r".unpack('M') # => ["pre=\r"] + "pre=hoge".unpack('M') # => ["pre=hoge"] + "pre==31after".unpack('M') # => ["pre==31after"] + "pre===31after".unpack('M') # => ["pre===31after"] + +- <tt>'m'</tt> - Base64 encoded string; + count specifies input bytes between each newline, + rounded down to nearest multiple of 3; + if count is zero, no newlines are added; + (see {RFC 4648}[https://www.ietf.org/rfc/rfc4648.txt]): + + [""].pack('m') # => "" + ["\0"].pack('m') # => "AA==\n" + ["\0\0"].pack('m') # => "AAA=\n" + ["\0\0\0"].pack('m') # => "AAAA\n" + ["\377"].pack('m') # => "/w==\n" + ["\377\377"].pack('m') # => "//8=\n" + ["\377\377\377"].pack('m') # => "////\n" + + "".unpack('m') # => [""] + "AA==\n".unpack('m') # => ["\x00"] + "AAA=\n".unpack('m') # => ["\x00\x00"] + "AAAA\n".unpack('m') # => ["\x00\x00\x00"] + "/w==\n".unpack('m') # => ["\xFF"] + "//8=\n".unpack('m') # => ["\xFF\xFF"] + "////\n".unpack('m') # => ["\xFF\xFF\xFF"] + "A\n".unpack('m') # => [""] + "AA\n".unpack('m') # => ["\x00"] + "AA=\n".unpack('m') # => ["\x00"] + "AAA\n".unpack('m') # => ["\x00\x00"] + + [""].pack('m0') # => "" + ["\0"].pack('m0') # => "AA==" + ["\0\0"].pack('m0') # => "AAA=" + ["\0\0\0"].pack('m0') # => "AAAA" + ["\377"].pack('m0') # => "/w==" + ["\377\377"].pack('m0') # => "//8=" + ["\377\377\377"].pack('m0') # => "////" + + "".unpack('m0') # => [""] + "AA==".unpack('m0') # => ["\x00"] + "AAA=".unpack('m0') # => ["\x00\x00"] + "AAAA".unpack('m0') # => ["\x00\x00\x00"] + "/w==".unpack('m0') # => ["\xFF"] + "//8=".unpack('m0') # => ["\xFF\xFF"] + "////".unpack('m0') # => ["\xFF\xFF\xFF"] + +- <tt>'u'</tt> - UU-encoded string: + + [""].pack("u") # => "" + ["a"].pack("u") # => "!80``\n" + ["aaa"].pack("u") # => "#86%A\n" + + "".unpack("u") # => [""] + "#86)C\n".unpack("u") # => ["abc"] + +== Offset Directives + +- <tt>'@'</tt> - Begin packing at the 
given byte offset; + for packing, null fill or shrink if necessary: + + [1, 2].pack("C@0C") # => "\x02" + [1, 2].pack("C@1C") # => "\x01\x02" + [1, 2].pack("C@5C") # => "\x01\x00\x00\x00\x00\x02" + [*1..5].pack("CCCC@2C") # => "\x01\x02\x05" + + For unpacking, the position cannot move outside the string: + + "\x01\x00\x00\x02".unpack("C@3C") # => [1, 2] + "\x00".unpack("@1C") # => [nil] + "\x00".unpack("@2C") # Raises ArgumentError. + +- <tt>'X'</tt> - For packing, shrink by the given byte offset: + + [0, 1, 2].pack("CCXC") # => "\x00\x02" + [0, 1, 2].pack("CCX2C") # => "\x02" + + For unpacking, rewind the unpacking position by the given byte offset: + + "\x00\x02".unpack("CCXC") # => [0, 2, 2] + + The position cannot move outside the string: + + [0, 1, 2].pack("CCX3C") # Raises ArgumentError. + "\x00\x02".unpack("CX3C") # Raises ArgumentError. + +- <tt>'x'</tt> - Begin packing after the given byte offset; + for packing, null fill if necessary: + + [].pack("x0") # => "" + [].pack("x") # => "\x00" + [].pack("x8") # => "\x00\x00\x00\x00\x00\x00\x00\x00" + + For unpacking, the position cannot move outside the string: + + "\x00\x00\x02".unpack("CxC") # => [0, 2] + "\x00\x00\x02".unpack("x3C") # => [nil] + "\x00\x00\x02".unpack("x4C") # Raises ArgumentError + +- <tt>'^'</tt> - Only for unpacking; the current position: + + "foo\0\0\0".unpack("Z*^") # => ["foo", 4] diff --git a/doc/language/ractor.md b/doc/language/ractor.md new file mode 100644 index 0000000000..1592656217 --- /dev/null +++ b/doc/language/ractor.md @@ -0,0 +1,797 @@ +# Ractor - Ruby's Actor-like concurrency abstraction + +Ractors are designed to provide parallel execution of Ruby code without thread-safety concerns. + +## Summary + +### Multiple Ractors in a ruby process + +You can create multiple Ractors which can run ruby code in parallel with each other. + +* `Ractor.new{ expr }` creates a new Ractor and `expr` can run in parallel with other ractors on a multi-core computer.
+* Ruby processes start with one ractor (called the *main ractor*). +* If the main ractor terminates, all other ractors receive termination requests, similar to how threads behave. +* Each Ractor contains one or more `Thread`s. + * Threads within the same ractor share a ractor-wide global lock (GVL in MRI terminology), so they can't run in parallel with each other (without releasing the GVL explicitly in C extensions). Threads in different ractors can run in parallel. + * The overhead of creating a ractor is slightly above the overhead of creating a thread. + +### Limited sharing between Ractors + +Ractors don't share all objects, unlike threads which can access any object other than objects stored in another thread's thread-locals. + +* Most objects are *unshareable objects*. Unshareable objects can only be used by the ractor that instantiated them, so you don't need to worry about thread-safety issues resulting from using the object concurrently across ractors. +* Some objects are *shareable objects*. Here is an incomplete list to give you an idea: + * `i = 123`: All `Integer`s are shareable. + * `s = "str".freeze`: Frozen strings are shareable if they have no instance variables that refer to unshareable objects. + * `a = [1, [2], 3].freeze`: `a` is not a shareable object because `a` refers to the unshareable object `[2]` (this Array is not frozen). + * `h = {c: Object}.freeze`: `h` is shareable because `Symbol`s and `Class`es are shareable, and the Hash is frozen. + * Class/Module objects are always shareable, even if they refer to unshareable objects. + * Special shareable objects + * Ractor objects themselves are shareable. + * And more... + +### Communication between Ractors with `Ractor::Port` + +Ractors communicate with each other and synchronize their execution by exchanging messages. The `Ractor::Port` class provides this communication mechanism. 
+ +```ruby +port = Ractor::Port.new + +Ractor.new port do |port| + # Other ractors can send to the port + port << 42 +end + +port.receive # get a message from the port. Only the ractor that created the Port can receive from it. +#=> 42 +``` + +All Ractors have a default port, which `Ractor#send`, `Ractor.receive` (etc) will use. + +### Copy & Move semantics when sending objects + +To send unshareable objects to another ractor, objects are either copied or moved. + +* Copy: deep-copies the object to the other ractor. All unshareable objects will be `Kernel#clone`ed. +* Move: moves membership to another ractor. + * The sending ractor can not access the moved object after it moves. + * There is a guarantee that only one ractor can access an unshareable object at once. + +### Thread-safety + +Ractors help to write thread-safe, concurrent programs. They allow sharing of data only through explicit message passing for +unshareable objects. Shareable objects are guaranteed to work correctly across ractors, even if the ractors are running in parallel. +This guarantee, however, only applies across ractors. You still need to use `Mutex`es and other thread-safety tools within a ractor if +you're using multiple ruby `Thread`s. + + * Most objects are unshareable. You can't create data-races across ractors due to the inability to use these objects across ractors. + * Shareable objects are protected by locks (or otherwise don't need to be) so they can be used by more than one ractor at once. + +## Creation and termination + +### `Ractor.new` + +* `Ractor.new { expr }` creates a Ractor. + +```ruby +# Ractor.new with a block creates a new Ractor +r = Ractor.new do + # This block can run in parallel with other ractors +end + +# You can name a Ractor with a `name:` argument. +r = Ractor.new name: 'my-first-ractor' do +end + +r.name #=> 'my-first-ractor' +``` + +### Block isolation + +The Ractor executes `expr` in the given block. +The given block will be isolated from its outer scope. 
To prevent sharing objects between ractors, outer variables, `self` and other information are isolated from the block. + +This isolation occurs at Ractor creation time (when `Ractor.new` is called). If the given block cannot be isolated because of outer variables or `self`, an error will be raised. + +```ruby +begin + a = true + r = Ractor.new do + a #=> Ractor::IsolationError because this block accesses outer variable `a`. + end + r.join # wait for ractor to finish +rescue Ractor::IsolationError +end +``` + +* The `self` of the given block is the `Ractor` object itself. + +```ruby +r = Ractor.new do + p self.class #=> Ractor + self.object_id +end +r.value == self.object_id #=> false +``` + +Arguments passed to `Ractor.new()` become block parameters for the given block. However, Ruby does not pass the objects themselves, but sends them as messages (see below for details). + +```ruby +r = Ractor.new 'ok' do |msg| + msg #=> 'ok' +end +r.value #=> 'ok' +``` + +```ruby +# similar to the last example +r = Ractor.new do + msg = Ractor.receive + msg +end +r.send 'ok' +r.value #=> 'ok' +``` + +### The execution result of the given block + +The return value of the given block becomes an outgoing message (see below for details). + +```ruby +r = Ractor.new do + 'ok' +end +r.value #=> 'ok' +``` + +An error in the given block will be propagated to the consumer of the outgoing message. + +```ruby +r = Ractor.new do + raise 'ok' # exception will be transferred to the consumer +end + +begin + r.value +rescue Ractor::RemoteError => e + e.cause.class #=> RuntimeError + e.cause.message #=> 'ok' + e.ractor #=> r +end +``` + +## Communication between Ractors + +Communication between ractors is achieved by sending and receiving messages. There are two ways to communicate: + +* (1) Sending and receiving messages via `Ractor::Port` +* (2) Using shareable container objects.
For example, the Ractor::TVar gem ([ko1/ractor-tvar](https://github.com/ko1/ractor-tvar)) + +Users can control program execution timing with (1), but should not rely on (2) for timing (use it only to perform critical sections). + +For sending and receiving messages, these are the fundamental APIs: + +* send/receive via `Ractor::Port`. + * `Ractor::Port#send(obj)` (`Ractor::Port#<<(obj)` is an alias) sends a message to the port. Ports are connected to an unbounded incoming queue, so sending will never block the caller. + * `Ractor::Port#receive` dequeues a message from its own incoming queue. If the incoming queue is empty, `Ractor::Port#receive` will block the execution of the current Thread until a message is sent. + * `Ractor#send` and `Ractor.receive` use ports (their default port) internally, so are conceptually similar to the above. +* You can close a `Ractor::Port` with `Ractor::Port#close`. A port can only be closed by the ractor that created it. + * If a port is closed, you can't `send` to it. Doing so raises an exception. + * When a ractor is terminated, the ractor's ports are automatically closed. +* You can wait for a ractor's termination and receive its return value with `Ractor#value`. This is similar to `Thread#value`. + +There are 3 ways to send an object as a message: + +1) Send a reference: sending a shareable object sends only a reference to the object (fast). + +2) Copy an object: sending an unshareable object by deep-copying it (can be slow). Note that an object which does not support deep copy cannot be sent this way. Some `T_DATA` objects (objects whose class is defined in a C extension, such as `StringIO`) are not supported. + +3) Move an object: sending an unshareable object across ractors with a membership change. The sending Ractor can not access the moved object after moving it, otherwise an exception will be raised. Implementation note: `T_DATA` objects are not supported.
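The three send modes listed above can be sketched as a small decision rule. This is only an illustration under stated assumptions: `send_mode` is a hypothetical helper, not part of Ruby's API, and it merely mirrors the description (shareable objects go by reference, unshareable objects are copied unless `move: true` is given). `Ractor.shareable?` and `Ractor.make_shareable` are the real methods used.

```ruby
# Hypothetical helper (not provided by Ruby): names the send mode that the
# rules described above would select for obj.
def send_mode(obj, move: false)
  return :reference if Ractor.shareable?(obj) # shareable: only a reference is sent
  move ? :move : :copy                        # unshareable: deep copy unless moved
end

frozen_str  = 'config'.freeze   # frozen String without instance variables
mutable_str = +'buffer'         # unfrozen String
nested      = [1, [2], 3].freeze # frozen, but the inner [2] is not

modes = {
  integer: send_mode(123),
  frozen:  send_mode(frozen_str),
  mutable: send_mode(mutable_str),
  moved:   send_mode(mutable_str, move: true),
  nested:  send_mode(nested),
}

# Freezing the whole object graph makes the nested Array shareable.
Ractor.make_shareable(nested)
nested_after = send_mode(nested)
```

Under these assumptions, only the unfrozen String and the partially frozen Array would be copied or moved; everything else would travel as a reference.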
+ +You can choose between "Copy" and "Move" by the `move:` keyword, `Ractor#send(obj, move: true/false)`. The default is `false` ("Copy"). However, if the object is shareable, only a reference is sent and the `move:` keyword has no effect. + +### Wait for multiple Ractors with `Ractor.select` + +You can wait for messages on multiple ports at once. +The return value of `Ractor.select()` is `[port, msg]` where `port` is a ready port and `msg` is the received message. + +For convenience, `Ractor.select` can also accept ractors. In this case, it waits for their termination. +The return value of `Ractor.select()` is `[r, msg]` where `r` is a terminated Ractor and `msg` is the value of the ractor's block. + +Wait for a single ractor (same as `Ractor#value`): + +```ruby +r1 = Ractor.new{'r1'} + +r, obj = Ractor.select(r1) +r == r1 and obj == 'r1' #=> true +``` + +Wait for two ractors: + +```ruby +r1 = Ractor.new{'r1'} +r2 = Ractor.new{'r2'} +rs = [r1, r2] +values = [] + +while rs.any? + r, obj = Ractor.select(*rs) + rs.delete(r) + values << obj +end + +values.sort == ['r1', 'r2'] #=> true +``` + +NOTE: Using `Ractor.select()` on a very large number of ractors currently has the same issue as `select(2)`. + +### Closing ports + +* `Ractor::Port#close` closes the port (similar to `Queue#close`). + * `port.send(obj)` will raise an exception when the port is closed. + * When the queue connected to the port is empty and the port is closed, `Ractor::Port#receive` raises an exception. If the queue is not empty, it dequeues an object without exceptions. +* When a Ractor terminates, the ports are closed automatically.
+ +Example (try to get a result from closed ractor): + +```ruby +r = Ractor.new do + 'finish' +end +r.join # success (wait for the termination) +r.value # success (will return 'finish') + +# The ractor's termination value has already been given to another ractor +Ractor.new r do |r| + r.value #=> Ractor::Error +end.join +``` + +Example (try to send to closed port): + +```ruby +r = Ractor.new do +end + +r.join # wait for termination, closes default port + +begin + r.send(1) +rescue Ractor::ClosedError + 'ok' +end +``` + +### Send a message by copying + +`Ractor::Port#send(obj)` copies `obj` deeply if `obj` is an unshareable object. + +```ruby +obj = 'str'.dup +r = Ractor.new obj do |msg| + # return received msg's object_id + msg.object_id +end + +obj.object_id == r.value #=> false +``` + +Some objects do not support copying, and raise an exception. + +```ruby +obj = Thread.new{} +begin + Ractor.new obj do |msg| + msg + end +rescue TypeError => e + e.message #=> #<TypeError: allocator undefined for Thread> +end +``` + +### Send a message by moving + +`Ractor::Port#send(obj, move: true)` moves `obj` to the destination Ractor. +If the source ractor uses the moved object (for example, calls a method like `obj.foo()`), it will raise an error. + +```ruby +r = Ractor.new do + obj = Ractor.receive + obj << ' world' +end + +str = 'hello'.dup +r.send str, move: true +# str is now moved, and accessing str from this ractor is prohibited +modified = r.value #=> 'hello world' + + +begin + # Error because it uses moved str. + str << ' exception' # raise Ractor::MovedError +rescue Ractor::MovedError + modified #=> 'hello world' +end +``` + +Some objects do not support moving, and an exception will be raised. + +```ruby +r = Ractor.new do + Ractor.receive +end + +r.send(Thread.new{}, move: true) #=> allocator undefined for Thread (TypeError) +``` + +Once an object has been moved, the source object's class is changed to `Ractor::MovedObject`. 
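The class change described above can be observed from the sending side. The following is a minimal sketch, assuming that a moved object no longer answers ordinary method calls (so `Kernel#class` has to be looked up and bound explicitly); the variable names are illustrative only.

```ruby
# The receiver simply consumes one message from its default port.
receiver = Ractor.new do
  Ractor.receive << ' world'
end

str = +'hello'                    # unfrozen, therefore unshareable
receiver.send(str, move: true)    # ownership of str moves to the receiver

# Ordinary calls such as str.class raise on a moved object, so Kernel#class
# is bound to the object explicitly to inspect it.
moved_class = Kernel.instance_method(:class).bind_call(str)

# Any normal method call on the moved object raises Ractor::MovedError.
moved_error =
  begin
    str.upcase
    nil
  rescue Ractor::MovedError => e
    e.class
  end
```

Here `moved_class` is expected to be `Ractor::MovedObject` and `moved_error` to be `Ractor::MovedError`.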
+ +### Shareable objects + +The following is an inexhaustive list of shareable objects: + +* `Integer`, `Float`, `Complex`, `Rational` +* `Symbol`, frozen `String` objects that don't refer to unshareables, `true`, `false`, `nil` +* `Regexp` objects, if they have no instance variables or their instance variables refer only to shareables +* `Class` and `Module` objects +* `Ractor` and other special objects which deal with synchronization + +To make objects shareable, `Ractor.make_shareable(obj)` is provided. It tries to make the object shareable by freezing `obj` and recursively traversing its references to freeze them all. This method accepts the `copy:` keyword (default value is false). `Ractor.make_shareable(obj, copy: true)` tries to make a deep copy of `obj` and make the copied object shareable. `Ractor.make_shareable(copy: false)` has no effect on an already shareable object. If the object cannot be made shareable, a `Ractor::Error` exception will be raised. + +## Language changes to limit sharing between Ractors + +To isolate unshareable objects across ractors, we introduced additional language semantics for multi-ractor Ruby programs. + +Note that when not using ractors, these additional semantics are not needed (100% compatible with Ruby 2). + +### Global variables + +Only the main Ractor can access global variables. + +```ruby +$gv = 1 +r = Ractor.new do + $gv +end + +begin + r.join +rescue Ractor::RemoteError => e + e.cause.message #=> 'can not access global variables from non-main Ractors' +end +``` + +Note that some special global variables, such as `$stdin`, `$stdout` and `$stderr` are local to each ractor. See [[Bug #17268]](https://bugs.ruby-lang.org/issues/17268) for more details. + +### Instance variables of shareable objects + +Instance variables of classes/modules can be accessed from non-main ractors only if their values are shareable objects. 
+ +```ruby +class C + @iv = 1 +end + +p(Ractor.new do + class C + @iv + end +end.value) #=> 1 +``` + +Otherwise, only the main Ractor can access instance variables of shareable objects. + +```ruby +class C + @iv = [] # unshareable object +end + +Ractor.new do + class C + begin + p @iv + rescue Ractor::IsolationError + p $!.message + #=> "can not get unshareable values from instance variables of classes/modules from non-main Ractors" + end + + begin + @iv = 42 + rescue Ractor::IsolationError + p $!.message + #=> "can not set instance variables of classes/modules by non-main Ractors" + end + end +end.join +``` + +```ruby +shared = Ractor.new{} +shared.instance_variable_set(:@iv, 'str') + +r = Ractor.new shared do |shared| + p shared.instance_variable_get(:@iv) +end + +begin + r.join +rescue Ractor::RemoteError => e + e.cause.message #=> can not access instance variables of shareable objects from non-main Ractors (Ractor::IsolationError) +end +``` + +### Class variables + +Only the main Ractor can access class variables. + +```ruby +class C + @@cv = 'str' +end + +r = Ractor.new do + class C + p @@cv + end +end + + +begin + r.join +rescue => e + e.class #=> Ractor::IsolationError +end +``` + +### Constants + +Only the main Ractor can read constants which refer to an unshareable object. + +```ruby +class C + CONST = 'str'.dup +end +r = Ractor.new do + C::CONST +end +begin + r.join +rescue => e + e.class #=> Ractor::IsolationError +end +``` + +Only the main Ractor can define constants which refer to an unshareable object. + +```ruby +class C +end +r = Ractor.new do + C::CONST = 'str'.dup +end +begin + r.join +rescue => e + e.class #=> Ractor::IsolationError +end +``` + +When creating/updating a library to support ractors, constants should only refer to shareable objects if they are to be used by non-main ractors. + +```ruby +TABLE = {a: 'ko1', b: 'ko2', c: 'ko3'} +``` + +In this case, `TABLE` refers to an unshareable Hash object.
In order for other ractors to use `TABLE`, we need to make it shareable. We can use `Ractor.make_shareable()` like so: + +```ruby +TABLE = Ractor.make_shareable( {a: 'ko1', b: 'ko2', c: 'ko3'} ) +``` + +To make this easier, Ruby 3.0 introduced a new `shareable_constant_value` file directive. + +```ruby +# shareable_constant_value: literal + +TABLE = {a: 'ko1', b: 'ko2', c: 'ko3'} +#=> Same as: TABLE = Ractor.make_shareable( {a: 'ko1', b: 'ko2', c: 'ko3'} ) +``` + +The `shareable_constant_value` directive accepts the following modes (descriptions use the example: `CONST = expr`): + +* none: Do nothing. Same as: `CONST = expr` +* literal: + * if `expr` consists of literals, replaced with `CONST = Ractor.make_shareable(expr)`. + * otherwise: replaced with `CONST = expr.tap{|o| raise unless Ractor.shareable?(o)}`. +* experimental_everything: replaced with `CONST = Ractor.make_shareable(expr)`. +* experimental_copy: replaced with `CONST = Ractor.make_shareable(expr, copy: true)`. + +Except for the `none` mode (default), it is guaranteed that these constants refer only to shareable objects. + +See [syntax/comments.rdoc](../syntax/comments.rdoc) for more details. + +### Shareable procs + +Procs and lambdas are unshareable objects, even when they are frozen. To create a shareable Proc, you must use `Ractor.shareable_proc { expr }`. Much like during Ractor creation, the proc's block is isolated from its outer environment, so it cannot access variables from the outside scope. `self` is also changed within the Proc to be `nil` by default, although a `self:` keyword can be provided if you want to customize the value to a different shareable object.
+ +```ruby +p = Ractor.shareable_proc { p self } +p.call #=> nil +``` + +```ruby +begin + a = 1 + pr = Ractor.shareable_proc { p a } + pr.call # never gets here +rescue Ractor::IsolationError +end +``` + +In order to dynamically define a method with `Module#define_method` that can be used from different ractors, you must define it with a shareable proc. Alternatively, you can use `Module#class_eval` or `Module#module_eval` with a String. Even though the shareable proc's `self` is initially bound to `nil`, `define_method` will bind `self` to the correct value in the method. + +```ruby +class A + define_method :testing, &Ractor.shareable_proc do + p self + end +end +Ractor.new do + a = A.new + a.testing #=> #<A:0x0000000101acfe10> +end.join +``` + +This isolation must be done to prevent the method from accessing and assigning captured outer variables across ractors. + +### Ractor-local storage + +You can store any object (even unshareables) in ractor-local storage. + +```ruby +r = Ractor.new do + values = [] + Ractor[:threads] = [] + 3.times do |i| + Ractor[:threads] << Thread.new do + values << [Ractor.receive, i+1] # Ractor.receive blocks the current thread in the current ractor until it receives a message + end + end + Ractor[:threads].each(&:join) + values +end + +r << 1 +r << 2 +r << 3 +r.value #=> [[1,1],[2,2],[3,3]] (the order can change with each run) +``` + +## Examples + +### Traditional Ring example in Actor-model + +```ruby +RN = 1_000 +CR = Ractor.current + +r = Ractor.new do + p Ractor.receive + CR << :fin +end + +RN.times{ + r = Ractor.new r do |next_r| + next_r << Ractor.receive + end +} + +p :setup_ok +r << 1 +p Ractor.receive +``` + +### Fork-join + +```ruby +def fib n + if n < 2 + 1 + else + fib(n-2) + fib(n-1) + end +end + +RN = 10 +rs = (1..RN).map do |i| + Ractor.new i do |i| + [i, fib(i)] + end +end + +until rs.empty? 
+ r, v = Ractor.select(*rs) + rs.delete r + p answer: v +end +``` + +### Worker pool + +(1) One ractor has a pool + +```ruby +require 'prime' + +N = 1000 +RN = 10 + +# make RN workers +workers = (1..RN).map do + Ractor.new do |; result_port| + loop do + n, result_port = Ractor.receive + result_port << [n, n.prime?, Ractor.current] + end + end +end + +result_port = Ractor::Port.new +results = [] + +(1..N).each do |i| + if workers.empty? + # receive a result + n, result, w = result_port.receive + results << [n, result] + else + w = workers.pop + end + + # send a task to the idle worker ractor + w << [i, result_port] +end + +# receive a result +while results.size != N + n, result, _w = result_port.receive + results << [n, result] +end + +pp results.sort_by{|n, result| n} +``` + +### Pipeline + +```ruby +# pipeline with send/receive + +r3 = Ractor.new Ractor.current do |cr| + cr.send Ractor.receive + 'r3' +end + +r2 = Ractor.new r3 do |r3| + r3.send Ractor.receive + 'r2' +end + +r1 = Ractor.new r2 do |r2| + r2.send Ractor.receive + 'r1' +end + +r1 << 'r0' +p Ractor.receive #=> "r0r1r2r3" +``` + +### Supervise + +```ruby +# ring example again + +r = Ractor.current +(1..10).map{|i| + r = Ractor.new r, i do |r, i| + r.send Ractor.receive + "r#{i}" + end +} + +r.send "r0" +p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1" +``` + +```ruby +# ring example with an error + +r = Ractor.current +rs = (1..10).map{|i| + r = Ractor.new r, i do |r, i| + loop do + msg = Ractor.receive + raise if /e/ =~ msg + r.send msg + "r#{i}" + end + end +} + +r.send "r0" +p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1" +r.send "r0" +p Ractor.select(*rs, Ractor.current) #=> [:receive, "r0r10r9r8r7r6r5r4r3r2r1"] +r.send "e0" +p Ractor.select(*rs, Ractor.current) +#=> +# <Thread:0x000056262de28bd8 run> terminated with exception (report_on_exception is true): +# Traceback (most recent call last): +# 2: from /home/ko1/src/ruby/trunk/test.rb:7:in `block (2 levels) in <main>' +# 1: from 
/home/ko1/src/ruby/trunk/test.rb:7:in `loop' +# /home/ko1/src/ruby/trunk/test.rb:9:in `block (3 levels) in <main>': unhandled exception +# Traceback (most recent call last): +# 2: from /home/ko1/src/ruby/trunk/test.rb:7:in `block (2 levels) in <main>' +# 1: from /home/ko1/src/ruby/trunk/test.rb:7:in `loop' +# /home/ko1/src/ruby/trunk/test.rb:9:in `block (3 levels) in <main>': unhandled exception +# 1: from /home/ko1/src/ruby/trunk/test.rb:21:in `<main>' +# <internal:ractor>:69:in `select': thrown by remote Ractor. (Ractor::RemoteError) +``` + +```ruby +# resend non-error message + +r = Ractor.current +rs = (1..10).map{|i| + r = Ractor.new r, i do |r, i| + loop do + msg = Ractor.receive + raise if /e/ =~ msg + r.send msg + "r#{i}" + end + end +} + +r.send "r0" +p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1" +r.send "r0" +p Ractor.select(*rs, Ractor.current) +[:receive, "r0r10r9r8r7r6r5r4r3r2r1"] +msg = 'e0' +begin + r.send msg + p Ractor.select(*rs, Ractor.current) +rescue Ractor::RemoteError + msg = 'r0' + retry +end + +#=> <internal:ractor>:100:in `send': The incoming-port is already closed (Ractor::ClosedError) +# because r == r[-1] is terminated. 
+``` + +```ruby +# ring example with supervisor and re-start + +def make_ractor r, i + Ractor.new r, i do |r, i| + loop do + msg = Ractor.receive + raise if /e/ =~ msg + r.send msg + "r#{i}" + end + end +end + +r = Ractor.current +rs = (1..10).map{|i| + r = make_ractor(r, i) +} + +msg = 'e0' # error causing message +begin + r.send msg + p Ractor.select(*rs, Ractor.current) +rescue Ractor::RemoteError + r = rs[-1] = make_ractor(rs[-2], rs.size-1) + msg = 'x0' + retry +end + +#=> [:receive, "x0r9r9r8r7r6r5r4r3r2r1"] +``` diff --git a/doc/language/regexp/methods.rdoc b/doc/language/regexp/methods.rdoc new file mode 100644 index 0000000000..356156ac9a --- /dev/null +++ b/doc/language/regexp/methods.rdoc @@ -0,0 +1,41 @@ +== \Regexp Methods + +Each of these Ruby core methods can accept a regexp as an argument: + +- Enumerable#all? +- Enumerable#any? +- Enumerable#grep +- Enumerable#grep_v +- Enumerable#none? +- Enumerable#one? +- Enumerable#slice_after +- Enumerable#slice_before +- Regexp#=~ +- Regexp#match +- Regexp#match? +- Regexp.new +- Regexp.union +- String#=~ +- String#[]= +- String#byteindex +- String#byterindex +- String#gsub +- String#gsub! +- String#index +- String#match +- String#match? +- String#partition +- String#rindex +- String#rpartition +- String#scan +- String#slice +- String#slice! +- String#split +- String#start_with? +- String#sub +- String#sub! +- Symbol#=~ +- Symbol#match +- Symbol#match? +- Symbol#slice +- Symbol#start_with? diff --git a/doc/language/regexp/unicode_properties.rdoc b/doc/language/regexp/unicode_properties.rdoc new file mode 100644 index 0000000000..94080f7199 --- /dev/null +++ b/doc/language/regexp/unicode_properties.rdoc @@ -0,0 +1,718 @@ +== \Regexps Based on Unicode Properties + +The properties shown here are those currently supported in Ruby. +Older versions may not support all of these. 
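As a brief illustration of how the properties listed in this file are used, here is a small sketch with `\p{...}` in ordinary regexp literals. The sample string and variable names are made up for the example; the property names (`Alpha`, `Greek`, `Cyrl`, `Nd`) are taken from the lists in this file.

```ruby
text = 'Ruby 123 αβγ Привет'

# POSIX-bracket style property: runs of alphabetic characters in any script.
words = text.scan(/\p{Alpha}+/)

# Script properties, by long name and by four-letter alias.
greek    = text[/\p{Greek}+/]
cyrillic = text[/\p{Cyrl}+/]

# General category: decimal digits.
digits = text.scan(/\p{Nd}/)
```

With this input, `words` collects the three alphabetic runs, `greek` and `cyrillic` pick out the Greek and Cyrillic runs respectively, and `digits` holds the three digit characters.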
+ +=== POSIX brackets + +- <tt>\p{ASCII}</tt> +- <tt>\p{Alnum}</tt> +- <tt>\p{Alphabetic}</tt>, <tt>\p{Alpha}</tt> +- <tt>\p{Blank}</tt> +- <tt>\p{Cntrl}</tt> +- <tt>\p{Digit}</tt> +- <tt>\p{Graph}</tt> +- <tt>\p{Lowercase}</tt>, <tt>\p{Lower}</tt> +- <tt>\p{Print}</tt> +- <tt>\p{Punct}</tt> +- <tt>\p{Space}</tt> +- <tt>\p{Uppercase}</tt>, <tt>\p{Upper}</tt> +- <tt>\p{Word}</tt> +- <tt>\p{XDigit}</tt> +- <tt>\p{XPosixPunct}</tt> + +=== Special + +- <tt>\p{Any}</tt> +- <tt>\p{Assigned}</tt> + +=== Major and General Categories + +- <tt>\p{Cased_Letter}</tt>, <tt>\p{LC}</tt> +- <tt>\p{Close_Punctuation}</tt>, <tt>\p{Pe}</tt> +- <tt>\p{Connector_Punctuation}</tt>, <tt>\p{Pc}</tt> +- <tt>\p{Control}</tt>, <tt>\p{Cc}</tt> +- <tt>\p{Currency_Symbol}</tt>, <tt>\p{Sc}</tt> +- <tt>\p{Dash_Punctuation}</tt>, <tt>\p{Pd}</tt> +- <tt>\p{Decimal_Number}</tt>, <tt>\p{Nd}</tt> +- <tt>\p{Enclosing_Mark}</tt>, <tt>\p{Me}</tt> +- <tt>\p{Final_Punctuation}</tt>, <tt>\p{Pf}</tt> +- <tt>\p{Format}</tt>, <tt>\p{Cf}</tt> +- <tt>\p{Initial_Punctuation}</tt>, <tt>\p{Pi}</tt> +- <tt>\p{Letter}</tt>, <tt>\p{L}</tt> +- <tt>\p{Letter_Number}</tt>, <tt>\p{Nl}</tt> +- <tt>\p{Line_Separator}</tt>, <tt>\p{Zl}</tt> +- <tt>\p{Lowercase_Letter}</tt>, <tt>\p{Ll}</tt> +- <tt>\p{Mark}</tt>, <tt>\p{M}</tt> +- <tt>\p{Math_Symbol}</tt>, <tt>\p{Sm}</tt> +- <tt>\p{Modifier_Letter}</tt>, <tt>\p{Lm}</tt> +- <tt>\p{Modifier_Symbol}</tt>, <tt>\p{Sk}</tt> +- <tt>\p{Nonspacing_Mark}</tt>, <tt>\p{Mn}</tt> +- <tt>\p{Number}</tt>, <tt>\p{N}</tt> +- <tt>\p{Open_Punctuation}</tt>, <tt>\p{Ps}</tt> +- <tt>\p{Other}</tt>, <tt>\p{C}</tt> +- <tt>\p{Other_Letter}</tt>, <tt>\p{Lo}</tt> +- <tt>\p{Other_Number}</tt>, <tt>\p{No}</tt> +- <tt>\p{Other_Punctuation}</tt>, <tt>\p{Po}</tt> +- <tt>\p{Other_Symbol}</tt>, <tt>\p{So}</tt> +- <tt>\p{Paragraph_Separator}</tt>, <tt>\p{Zp}</tt> +- <tt>\p{Private_Use}</tt>, <tt>\p{Co}</tt> +- <tt>\p{Punctuation}</tt>, <tt>\p{P}</tt> +- <tt>\p{Separator}</tt>, <tt>\p{Z}</tt> +- 
<tt>\p{Space_Separator}</tt>, <tt>\p{Zs}</tt> +- <tt>\p{Spacing_Mark}</tt>, <tt>\p{Mc}</tt> +- <tt>\p{Surrogate}</tt>, <tt>\p{Cs}</tt> +- <tt>\p{Symbol}</tt>, <tt>\p{S}</tt> +- <tt>\p{Titlecase_Letter}</tt>, <tt>\p{Lt}</tt> +- <tt>\p{Unassigned}</tt>, <tt>\p{Cn}</tt> +- <tt>\p{Uppercase_Letter}</tt>, <tt>\p{Lu}</tt> + +=== Prop List + +- <tt>\p{ASCII_Hex_Digit}</tt>, <tt>\p{AHex}</tt> +- <tt>\p{Bidi_Control}</tt>, <tt>\p{Bidi_C}</tt> +- <tt>\p{Dash}</tt> +- <tt>\p{Deprecated}</tt>, <tt>\p{Dep}</tt> +- <tt>\p{Diacritic}</tt>, <tt>\p{Dia}</tt> +- <tt>\p{Extender}</tt>, <tt>\p{Ext}</tt> +- <tt>\p{Hex_Digit}</tt>, <tt>\p{Hex}</tt> +- <tt>\p{Hyphen}</tt> +- <tt>\p{IDS_Binary_Operator}</tt>, <tt>\p{IDSB}</tt> +- <tt>\p{IDS_Trinary_Operator}</tt>, <tt>\p{IDST}</tt> +- <tt>\p{IDS_Unary_Operator}</tt>, <tt>\p{IDSU}</tt> +- <tt>\p{ID_Compat_Math_Continue}</tt> +- <tt>\p{ID_Compat_Math_Start}</tt> +- <tt>\p{Ideographic}</tt>, <tt>\p{Ideo}</tt> +- <tt>\p{Join_Control}</tt>, <tt>\p{Join_C}</tt> +- <tt>\p{Logical_Order_Exception}</tt>, <tt>\p{LOE}</tt> +- <tt>\p{Modifier_Combining_Mark}</tt>, <tt>\p{MCM}</tt> +- <tt>\p{Noncharacter_Code_Point}</tt>, <tt>\p{NChar}</tt> +- <tt>\p{Other_Alphabetic}</tt>, <tt>\p{OAlpha}</tt> +- <tt>\p{Other_Default_Ignorable_Code_Point}</tt>, <tt>\p{ODI}</tt> +- <tt>\p{Other_Grapheme_Extend}</tt>, <tt>\p{OGr_Ext}</tt> +- <tt>\p{Other_ID_Continue}</tt>, <tt>\p{OIDC}</tt> +- <tt>\p{Other_ID_Start}</tt>, <tt>\p{OIDS}</tt> +- <tt>\p{Other_Lowercase}</tt>, <tt>\p{OLower}</tt> +- <tt>\p{Other_Math}</tt>, <tt>\p{OMath}</tt> +- <tt>\p{Other_Uppercase}</tt>, <tt>\p{OUpper}</tt> +- <tt>\p{Pattern_Syntax}</tt>, <tt>\p{Pat_Syn}</tt> +- <tt>\p{Pattern_White_Space}</tt>, <tt>\p{Pat_WS}</tt> +- <tt>\p{Prepended_Concatenation_Mark}</tt>, <tt>\p{PCM}</tt> +- <tt>\p{Quotation_Mark}</tt>, <tt>\p{QMark}</tt> +- <tt>\p{Radical}</tt> +- <tt>\p{Regional_Indicator}</tt>, <tt>\p{RI}</tt> +- <tt>\p{Sentence_Terminal}</tt>, <tt>\p{STerm}</tt> +- <tt>\p{Soft_Dotted}</tt>, 
<tt>\p{SD}</tt> +- <tt>\p{Terminal_Punctuation}</tt>, <tt>\p{Term}</tt> +- <tt>\p{Unified_Ideograph}</tt>, <tt>\p{UIdeo}</tt> +- <tt>\p{Variation_Selector}</tt>, <tt>\p{VS}</tt> +- <tt>\p{White_Space}</tt>, <tt>\p{WSpace}</tt> + +=== Derived Core Properties + +- <tt>\p{Alphabetic}</tt>, <tt>\p{Alpha}</tt> +- <tt>\p{Case_Ignorable}</tt>, <tt>\p{CI}</tt> +- <tt>\p{Cased}</tt> +- <tt>\p{Changes_When_Casefolded}</tt>, <tt>\p{CWCF}</tt> +- <tt>\p{Changes_When_Casemapped}</tt>, <tt>\p{CWCM}</tt> +- <tt>\p{Changes_When_Lowercased}</tt>, <tt>\p{CWL}</tt> +- <tt>\p{Changes_When_Titlecased}</tt>, <tt>\p{CWT}</tt> +- <tt>\p{Changes_When_Uppercased}</tt>, <tt>\p{CWU}</tt> +- <tt>\p{Default_Ignorable_Code_Point}</tt>, <tt>\p{DI}</tt> +- <tt>\p{Grapheme_Base}</tt>, <tt>\p{Gr_Base}</tt> +- <tt>\p{Grapheme_Extend}</tt>, <tt>\p{Gr_Ext}</tt> +- <tt>\p{Grapheme_Link}</tt>, <tt>\p{Gr_Link}</tt> +- <tt>\p{ID_Continue}</tt>, <tt>\p{IDC}</tt> +- <tt>\p{ID_Start}</tt>, <tt>\p{IDS}</tt> +- <tt>\p{InCB_Consonant}</tt> +- <tt>\p{InCB_Extend}</tt> +- <tt>\p{InCB_Linker}</tt> +- <tt>\p{Lowercase}</tt>, <tt>\p{Lower}</tt> +- <tt>\p{Math}</tt> +- <tt>\p{Uppercase}</tt>, <tt>\p{Upper}</tt> +- <tt>\p{XID_Continue}</tt>, <tt>\p{XIDC}</tt> +- <tt>\p{XID_Start}</tt>, <tt>\p{XIDS}</tt> + +=== Scripts + +- <tt>\p{Adlam}</tt>, <tt>\p{Adlm}</tt> +- <tt>\p{Ahom}</tt> +- <tt>\p{Anatolian_Hieroglyphs}</tt>, <tt>\p{Hluw}</tt> +- <tt>\p{Arabic}</tt>, <tt>\p{Arab}</tt> +- <tt>\p{Armenian}</tt>, <tt>\p{Armn}</tt> +- <tt>\p{Avestan}</tt>, <tt>\p{Avst}</tt> +- <tt>\p{Balinese}</tt>, <tt>\p{Bali}</tt> +- <tt>\p{Bamum}</tt>, <tt>\p{Bamu}</tt> +- <tt>\p{Bassa_Vah}</tt>, <tt>\p{Bass}</tt> +- <tt>\p{Batak}</tt>, <tt>\p{Batk}</tt> +- <tt>\p{Bengali}</tt>, <tt>\p{Beng}</tt> +- <tt>\p{Beria_Erfe}</tt>, <tt>\p{Berf}</tt> +- <tt>\p{Bhaiksuki}</tt>, <tt>\p{Bhks}</tt> +- <tt>\p{Bopomofo}</tt>, <tt>\p{Bopo}</tt> +- <tt>\p{Brahmi}</tt>, <tt>\p{Brah}</tt> +- <tt>\p{Braille}</tt>, <tt>\p{Brai}</tt> +- <tt>\p{Buginese}</tt>, 
<tt>\p{Bugi}</tt> +- <tt>\p{Buhid}</tt>, <tt>\p{Buhd}</tt> +- <tt>\p{Canadian_Aboriginal}</tt>, <tt>\p{Cans}</tt> +- <tt>\p{Carian}</tt>, <tt>\p{Cari}</tt> +- <tt>\p{Caucasian_Albanian}</tt>, <tt>\p{Aghb}</tt> +- <tt>\p{Chakma}</tt>, <tt>\p{Cakm}</tt> +- <tt>\p{Cham}</tt> +- <tt>\p{Cherokee}</tt>, <tt>\p{Cher}</tt> +- <tt>\p{Chorasmian}</tt>, <tt>\p{Chrs}</tt> +- <tt>\p{Common}</tt>, <tt>\p{Zyyy}</tt> +- <tt>\p{Coptic}</tt>, <tt>\p{Copt}</tt> +- <tt>\p{Cuneiform}</tt>, <tt>\p{Xsux}</tt> +- <tt>\p{Cypriot}</tt>, <tt>\p{Cprt}</tt> +- <tt>\p{Cypro_Minoan}</tt>, <tt>\p{Cpmn}</tt> +- <tt>\p{Cyrillic}</tt>, <tt>\p{Cyrl}</tt> +- <tt>\p{Deseret}</tt>, <tt>\p{Dsrt}</tt> +- <tt>\p{Devanagari}</tt>, <tt>\p{Deva}</tt> +- <tt>\p{Dives_Akuru}</tt>, <tt>\p{Diak}</tt> +- <tt>\p{Dogra}</tt>, <tt>\p{Dogr}</tt> +- <tt>\p{Duployan}</tt>, <tt>\p{Dupl}</tt> +- <tt>\p{Egyptian_Hieroglyphs}</tt>, <tt>\p{Egyp}</tt> +- <tt>\p{Elbasan}</tt>, <tt>\p{Elba}</tt> +- <tt>\p{Elymaic}</tt>, <tt>\p{Elym}</tt> +- <tt>\p{Ethiopic}</tt>, <tt>\p{Ethi}</tt> +- <tt>\p{Garay}</tt>, <tt>\p{Gara}</tt> +- <tt>\p{Georgian}</tt>, <tt>\p{Geor}</tt> +- <tt>\p{Glagolitic}</tt>, <tt>\p{Glag}</tt> +- <tt>\p{Gothic}</tt>, <tt>\p{Goth}</tt> +- <tt>\p{Grantha}</tt>, <tt>\p{Gran}</tt> +- <tt>\p{Greek}</tt>, <tt>\p{Grek}</tt> +- <tt>\p{Gujarati}</tt>, <tt>\p{Gujr}</tt> +- <tt>\p{Gunjala_Gondi}</tt>, <tt>\p{Gong}</tt> +- <tt>\p{Gurmukhi}</tt>, <tt>\p{Guru}</tt> +- <tt>\p{Gurung_Khema}</tt>, <tt>\p{Gukh}</tt> +- <tt>\p{Han}</tt>, <tt>\p{Hani}</tt> +- <tt>\p{Hangul}</tt>, <tt>\p{Hang}</tt> +- <tt>\p{Hanifi_Rohingya}</tt>, <tt>\p{Rohg}</tt> +- <tt>\p{Hanunoo}</tt>, <tt>\p{Hano}</tt> +- <tt>\p{Hatran}</tt>, <tt>\p{Hatr}</tt> +- <tt>\p{Hebrew}</tt>, <tt>\p{Hebr}</tt> +- <tt>\p{Hiragana}</tt>, <tt>\p{Hira}</tt> +- <tt>\p{Imperial_Aramaic}</tt>, <tt>\p{Armi}</tt> +- <tt>\p{Inherited}</tt>, <tt>\p{Zinh}</tt> +- <tt>\p{Inscriptional_Pahlavi}</tt>, <tt>\p{Phli}</tt> +- <tt>\p{Inscriptional_Parthian}</tt>, <tt>\p{Prti}</tt> +- 
<tt>\p{Javanese}</tt>, <tt>\p{Java}</tt> +- <tt>\p{Kaithi}</tt>, <tt>\p{Kthi}</tt> +- <tt>\p{Kannada}</tt>, <tt>\p{Knda}</tt> +- <tt>\p{Katakana}</tt>, <tt>\p{Kana}</tt> +- <tt>\p{Kawi}</tt> +- <tt>\p{Kayah_Li}</tt>, <tt>\p{Kali}</tt> +- <tt>\p{Kharoshthi}</tt>, <tt>\p{Khar}</tt> +- <tt>\p{Khitan_Small_Script}</tt>, <tt>\p{Kits}</tt> +- <tt>\p{Khmer}</tt>, <tt>\p{Khmr}</tt> +- <tt>\p{Khojki}</tt>, <tt>\p{Khoj}</tt> +- <tt>\p{Khudawadi}</tt>, <tt>\p{Sind}</tt> +- <tt>\p{Kirat_Rai}</tt>, <tt>\p{Krai}</tt> +- <tt>\p{Lao}</tt>, <tt>\p{Laoo}</tt> +- <tt>\p{Latin}</tt>, <tt>\p{Latn}</tt> +- <tt>\p{Lepcha}</tt>, <tt>\p{Lepc}</tt> +- <tt>\p{Limbu}</tt>, <tt>\p{Limb}</tt> +- <tt>\p{Linear_A}</tt>, <tt>\p{Lina}</tt> +- <tt>\p{Linear_B}</tt>, <tt>\p{Linb}</tt> +- <tt>\p{Lisu}</tt> +- <tt>\p{Lycian}</tt>, <tt>\p{Lyci}</tt> +- <tt>\p{Lydian}</tt>, <tt>\p{Lydi}</tt> +- <tt>\p{Mahajani}</tt>, <tt>\p{Mahj}</tt> +- <tt>\p{Makasar}</tt>, <tt>\p{Maka}</tt> +- <tt>\p{Malayalam}</tt>, <tt>\p{Mlym}</tt> +- <tt>\p{Mandaic}</tt>, <tt>\p{Mand}</tt> +- <tt>\p{Manichaean}</tt>, <tt>\p{Mani}</tt> +- <tt>\p{Marchen}</tt>, <tt>\p{Marc}</tt> +- <tt>\p{Masaram_Gondi}</tt>, <tt>\p{Gonm}</tt> +- <tt>\p{Medefaidrin}</tt>, <tt>\p{Medf}</tt> +- <tt>\p{Meetei_Mayek}</tt>, <tt>\p{Mtei}</tt> +- <tt>\p{Mende_Kikakui}</tt>, <tt>\p{Mend}</tt> +- <tt>\p{Meroitic_Cursive}</tt>, <tt>\p{Merc}</tt> +- <tt>\p{Meroitic_Hieroglyphs}</tt>, <tt>\p{Mero}</tt> +- <tt>\p{Miao}</tt>, <tt>\p{Plrd}</tt> +- <tt>\p{Modi}</tt> +- <tt>\p{Mongolian}</tt>, <tt>\p{Mong}</tt> +- <tt>\p{Mro}</tt>, <tt>\p{Mroo}</tt> +- <tt>\p{Multani}</tt>, <tt>\p{Mult}</tt> +- <tt>\p{Myanmar}</tt>, <tt>\p{Mymr}</tt> +- <tt>\p{Nabataean}</tt>, <tt>\p{Nbat}</tt> +- <tt>\p{Nag_Mundari}</tt>, <tt>\p{Nagm}</tt> +- <tt>\p{Nandinagari}</tt>, <tt>\p{Nand}</tt> +- <tt>\p{New_Tai_Lue}</tt>, <tt>\p{Talu}</tt> +- <tt>\p{Newa}</tt> +- <tt>\p{Nko}</tt>, <tt>\p{Nkoo}</tt> +- <tt>\p{Nushu}</tt>, <tt>\p{Nshu}</tt> +- <tt>\p{Nyiakeng_Puachue_Hmong}</tt>, 
<tt>\p{Hmnp}</tt> +- <tt>\p{Ogham}</tt>, <tt>\p{Ogam}</tt> +- <tt>\p{Ol_Chiki}</tt>, <tt>\p{Olck}</tt> +- <tt>\p{Ol_Onal}</tt>, <tt>\p{Onao}</tt> +- <tt>\p{Old_Hungarian}</tt>, <tt>\p{Hung}</tt> +- <tt>\p{Old_Italic}</tt>, <tt>\p{Ital}</tt> +- <tt>\p{Old_North_Arabian}</tt>, <tt>\p{Narb}</tt> +- <tt>\p{Old_Permic}</tt>, <tt>\p{Perm}</tt> +- <tt>\p{Old_Persian}</tt>, <tt>\p{Xpeo}</tt> +- <tt>\p{Old_Sogdian}</tt>, <tt>\p{Sogo}</tt> +- <tt>\p{Old_South_Arabian}</tt>, <tt>\p{Sarb}</tt> +- <tt>\p{Old_Turkic}</tt>, <tt>\p{Orkh}</tt> +- <tt>\p{Old_Uyghur}</tt>, <tt>\p{Ougr}</tt> +- <tt>\p{Oriya}</tt>, <tt>\p{Orya}</tt> +- <tt>\p{Osage}</tt>, <tt>\p{Osge}</tt> +- <tt>\p{Osmanya}</tt>, <tt>\p{Osma}</tt> +- <tt>\p{Pahawh_Hmong}</tt>, <tt>\p{Hmng}</tt> +- <tt>\p{Palmyrene}</tt>, <tt>\p{Palm}</tt> +- <tt>\p{Pau_Cin_Hau}</tt>, <tt>\p{Pauc}</tt> +- <tt>\p{Phags_Pa}</tt>, <tt>\p{Phag}</tt> +- <tt>\p{Phoenician}</tt>, <tt>\p{Phnx}</tt> +- <tt>\p{Psalter_Pahlavi}</tt>, <tt>\p{Phlp}</tt> +- <tt>\p{Rejang}</tt>, <tt>\p{Rjng}</tt> +- <tt>\p{Runic}</tt>, <tt>\p{Runr}</tt> +- <tt>\p{Samaritan}</tt>, <tt>\p{Samr}</tt> +- <tt>\p{Saurashtra}</tt>, <tt>\p{Saur}</tt> +- <tt>\p{Sharada}</tt>, <tt>\p{Shrd}</tt> +- <tt>\p{Shavian}</tt>, <tt>\p{Shaw}</tt> +- <tt>\p{Siddham}</tt>, <tt>\p{Sidd}</tt> +- <tt>\p{Sidetic}</tt>, <tt>\p{Sidt}</tt> +- <tt>\p{SignWriting}</tt>, <tt>\p{Sgnw}</tt> +- <tt>\p{Sinhala}</tt>, <tt>\p{Sinh}</tt> +- <tt>\p{Sogdian}</tt>, <tt>\p{Sogd}</tt> +- <tt>\p{Sora_Sompeng}</tt>, <tt>\p{Sora}</tt> +- <tt>\p{Soyombo}</tt>, <tt>\p{Soyo}</tt> +- <tt>\p{Sundanese}</tt>, <tt>\p{Sund}</tt> +- <tt>\p{Sunuwar}</tt>, <tt>\p{Sunu}</tt> +- <tt>\p{Syloti_Nagri}</tt>, <tt>\p{Sylo}</tt> +- <tt>\p{Syriac}</tt>, <tt>\p{Syrc}</tt> +- <tt>\p{Tagalog}</tt>, <tt>\p{Tglg}</tt> +- <tt>\p{Tagbanwa}</tt>, <tt>\p{Tagb}</tt> +- <tt>\p{Tai_Le}</tt>, <tt>\p{Tale}</tt> +- <tt>\p{Tai_Tham}</tt>, <tt>\p{Lana}</tt> +- <tt>\p{Tai_Viet}</tt>, <tt>\p{Tavt}</tt> +- <tt>\p{Tai_Yo}</tt>, <tt>\p{Tayo}</tt> +- 
<tt>\p{Takri}</tt>, <tt>\p{Takr}</tt> +- <tt>\p{Tamil}</tt>, <tt>\p{Taml}</tt> +- <tt>\p{Tangsa}</tt>, <tt>\p{Tnsa}</tt> +- <tt>\p{Tangut}</tt>, <tt>\p{Tang}</tt> +- <tt>\p{Telugu}</tt>, <tt>\p{Telu}</tt> +- <tt>\p{Thaana}</tt>, <tt>\p{Thaa}</tt> +- <tt>\p{Thai}</tt> +- <tt>\p{Tibetan}</tt>, <tt>\p{Tibt}</tt> +- <tt>\p{Tifinagh}</tt>, <tt>\p{Tfng}</tt> +- <tt>\p{Tirhuta}</tt>, <tt>\p{Tirh}</tt> +- <tt>\p{Todhri}</tt>, <tt>\p{Todr}</tt> +- <tt>\p{Tolong_Siki}</tt>, <tt>\p{Tols}</tt> +- <tt>\p{Toto}</tt> +- <tt>\p{Tulu_Tigalari}</tt>, <tt>\p{Tutg}</tt> +- <tt>\p{Ugaritic}</tt>, <tt>\p{Ugar}</tt> +- <tt>\p{Unknown}</tt>, <tt>\p{Zzzz}</tt> +- <tt>\p{Vai}</tt>, <tt>\p{Vaii}</tt> +- <tt>\p{Vithkuqi}</tt>, <tt>\p{Vith}</tt> +- <tt>\p{Wancho}</tt>, <tt>\p{Wcho}</tt> +- <tt>\p{Warang_Citi}</tt>, <tt>\p{Wara}</tt> +- <tt>\p{Yezidi}</tt>, <tt>\p{Yezi}</tt> +- <tt>\p{Yi}</tt>, <tt>\p{Yiii}</tt> +- <tt>\p{Zanabazar_Square}</tt>, <tt>\p{Zanb}</tt> + +=== Blocks + +- <tt>\p{In_Adlam}</tt> +- <tt>\p{In_Aegean_Numbers}</tt> +- <tt>\p{In_Ahom}</tt> +- <tt>\p{In_Alchemical_Symbols}</tt> +- <tt>\p{In_Alphabetic_Presentation_Forms}</tt> +- <tt>\p{In_Anatolian_Hieroglyphs}</tt> +- <tt>\p{In_Ancient_Greek_Musical_Notation}</tt> +- <tt>\p{In_Ancient_Greek_Numbers}</tt> +- <tt>\p{In_Ancient_Symbols}</tt> +- <tt>\p{In_Arabic}</tt> +- <tt>\p{In_Arabic_Extended_A}</tt> +- <tt>\p{In_Arabic_Extended_B}</tt> +- <tt>\p{In_Arabic_Extended_C}</tt> +- <tt>\p{In_Arabic_Mathematical_Alphabetic_Symbols}</tt> +- <tt>\p{In_Arabic_Presentation_Forms_A}</tt> +- <tt>\p{In_Arabic_Presentation_Forms_B}</tt> +- <tt>\p{In_Arabic_Supplement}</tt> +- <tt>\p{In_Armenian}</tt> +- <tt>\p{In_Arrows}</tt> +- <tt>\p{In_Avestan}</tt> +- <tt>\p{In_Balinese}</tt> +- <tt>\p{In_Bamum}</tt> +- <tt>\p{In_Bamum_Supplement}</tt> +- <tt>\p{In_Basic_Latin}</tt> +- <tt>\p{In_Bassa_Vah}</tt> +- <tt>\p{In_Batak}</tt> +- <tt>\p{In_Bengali}</tt> +- <tt>\p{In_Beria_Erfe}</tt> +- <tt>\p{In_Bhaiksuki}</tt> +- 
<tt>\p{In_Block_Elements}</tt> +- <tt>\p{In_Bopomofo}</tt> +- <tt>\p{In_Bopomofo_Extended}</tt> +- <tt>\p{In_Box_Drawing}</tt> +- <tt>\p{In_Brahmi}</tt> +- <tt>\p{In_Braille_Patterns}</tt> +- <tt>\p{In_Buginese}</tt> +- <tt>\p{In_Buhid}</tt> +- <tt>\p{In_Byzantine_Musical_Symbols}</tt> +- <tt>\p{In_CJK_Compatibility}</tt> +- <tt>\p{In_CJK_Compatibility_Forms}</tt> +- <tt>\p{In_CJK_Compatibility_Ideographs}</tt> +- <tt>\p{In_CJK_Compatibility_Ideographs_Supplement}</tt> +- <tt>\p{In_CJK_Radicals_Supplement}</tt> +- <tt>\p{In_CJK_Strokes}</tt> +- <tt>\p{In_CJK_Symbols_and_Punctuation}</tt> +- <tt>\p{In_CJK_Unified_Ideographs}</tt> +- <tt>\p{In_CJK_Unified_Ideographs_Extension_A}</tt> +- <tt>\p{In_CJK_Unified_Ideographs_Extension_B}</tt> +- <tt>\p{In_CJK_Unified_Ideographs_Extension_C}</tt> +- <tt>\p{In_CJK_Unified_Ideographs_Extension_D}</tt> +- <tt>\p{In_CJK_Unified_Ideographs_Extension_E}</tt> +- <tt>\p{In_CJK_Unified_Ideographs_Extension_F}</tt> +- <tt>\p{In_CJK_Unified_Ideographs_Extension_G}</tt> +- <tt>\p{In_CJK_Unified_Ideographs_Extension_H}</tt> +- <tt>\p{In_CJK_Unified_Ideographs_Extension_I}</tt> +- <tt>\p{In_CJK_Unified_Ideographs_Extension_J}</tt> +- <tt>\p{In_Carian}</tt> +- <tt>\p{In_Caucasian_Albanian}</tt> +- <tt>\p{In_Chakma}</tt> +- <tt>\p{In_Cham}</tt> +- <tt>\p{In_Cherokee}</tt> +- <tt>\p{In_Cherokee_Supplement}</tt> +- <tt>\p{In_Chess_Symbols}</tt> +- <tt>\p{In_Chorasmian}</tt> +- <tt>\p{In_Combining_Diacritical_Marks}</tt> +- <tt>\p{In_Combining_Diacritical_Marks_Extended}</tt> +- <tt>\p{In_Combining_Diacritical_Marks_Supplement}</tt> +- <tt>\p{In_Combining_Diacritical_Marks_for_Symbols}</tt> +- <tt>\p{In_Combining_Half_Marks}</tt> +- <tt>\p{In_Common_Indic_Number_Forms}</tt> +- <tt>\p{In_Control_Pictures}</tt> +- <tt>\p{In_Coptic}</tt> +- <tt>\p{In_Coptic_Epact_Numbers}</tt> +- <tt>\p{In_Counting_Rod_Numerals}</tt> +- <tt>\p{In_Cuneiform}</tt> +- <tt>\p{In_Cuneiform_Numbers_and_Punctuation}</tt> +- <tt>\p{In_Currency_Symbols}</tt> +- 
<tt>\p{In_Cypriot_Syllabary}</tt> +- <tt>\p{In_Cypro_Minoan}</tt> +- <tt>\p{In_Cyrillic}</tt> +- <tt>\p{In_Cyrillic_Extended_A}</tt> +- <tt>\p{In_Cyrillic_Extended_B}</tt> +- <tt>\p{In_Cyrillic_Extended_C}</tt> +- <tt>\p{In_Cyrillic_Extended_D}</tt> +- <tt>\p{In_Cyrillic_Supplement}</tt> +- <tt>\p{In_Deseret}</tt> +- <tt>\p{In_Devanagari}</tt> +- <tt>\p{In_Devanagari_Extended}</tt> +- <tt>\p{In_Devanagari_Extended_A}</tt> +- <tt>\p{In_Dingbats}</tt> +- <tt>\p{In_Dives_Akuru}</tt> +- <tt>\p{In_Dogra}</tt> +- <tt>\p{In_Domino_Tiles}</tt> +- <tt>\p{In_Duployan}</tt> +- <tt>\p{In_Early_Dynastic_Cuneiform}</tt> +- <tt>\p{In_Egyptian_Hieroglyph_Format_Controls}</tt> +- <tt>\p{In_Egyptian_Hieroglyphs}</tt> +- <tt>\p{In_Egyptian_Hieroglyphs_Extended_A}</tt> +- <tt>\p{In_Elbasan}</tt> +- <tt>\p{In_Elymaic}</tt> +- <tt>\p{In_Emoticons}</tt> +- <tt>\p{In_Enclosed_Alphanumeric_Supplement}</tt> +- <tt>\p{In_Enclosed_Alphanumerics}</tt> +- <tt>\p{In_Enclosed_CJK_Letters_and_Months}</tt> +- <tt>\p{In_Enclosed_Ideographic_Supplement}</tt> +- <tt>\p{In_Ethiopic}</tt> +- <tt>\p{In_Ethiopic_Extended}</tt> +- <tt>\p{In_Ethiopic_Extended_A}</tt> +- <tt>\p{In_Ethiopic_Extended_B}</tt> +- <tt>\p{In_Ethiopic_Supplement}</tt> +- <tt>\p{In_Garay}</tt> +- <tt>\p{In_General_Punctuation}</tt> +- <tt>\p{In_Geometric_Shapes}</tt> +- <tt>\p{In_Geometric_Shapes_Extended}</tt> +- <tt>\p{In_Georgian}</tt> +- <tt>\p{In_Georgian_Extended}</tt> +- <tt>\p{In_Georgian_Supplement}</tt> +- <tt>\p{In_Glagolitic}</tt> +- <tt>\p{In_Glagolitic_Supplement}</tt> +- <tt>\p{In_Gothic}</tt> +- <tt>\p{In_Grantha}</tt> +- <tt>\p{In_Greek_Extended}</tt> +- <tt>\p{In_Greek_and_Coptic}</tt> +- <tt>\p{In_Gujarati}</tt> +- <tt>\p{In_Gunjala_Gondi}</tt> +- <tt>\p{In_Gurmukhi}</tt> +- <tt>\p{In_Gurung_Khema}</tt> +- <tt>\p{In_Halfwidth_and_Fullwidth_Forms}</tt> +- <tt>\p{In_Hangul_Compatibility_Jamo}</tt> +- <tt>\p{In_Hangul_Jamo}</tt> +- <tt>\p{In_Hangul_Jamo_Extended_A}</tt> +- <tt>\p{In_Hangul_Jamo_Extended_B}</tt> +- 
<tt>\p{In_Hangul_Syllables}</tt> +- <tt>\p{In_Hanifi_Rohingya}</tt> +- <tt>\p{In_Hanunoo}</tt> +- <tt>\p{In_Hatran}</tt> +- <tt>\p{In_Hebrew}</tt> +- <tt>\p{In_High_Private_Use_Surrogates}</tt> +- <tt>\p{In_High_Surrogates}</tt> +- <tt>\p{In_Hiragana}</tt> +- <tt>\p{In_IPA_Extensions}</tt> +- <tt>\p{In_Ideographic_Description_Characters}</tt> +- <tt>\p{In_Ideographic_Symbols_and_Punctuation}</tt> +- <tt>\p{In_Imperial_Aramaic}</tt> +- <tt>\p{In_Indic_Siyaq_Numbers}</tt> +- <tt>\p{In_Inscriptional_Pahlavi}</tt> +- <tt>\p{In_Inscriptional_Parthian}</tt> +- <tt>\p{In_Javanese}</tt> +- <tt>\p{In_Kaithi}</tt> +- <tt>\p{In_Kaktovik_Numerals}</tt> +- <tt>\p{In_Kana_Extended_A}</tt> +- <tt>\p{In_Kana_Extended_B}</tt> +- <tt>\p{In_Kana_Supplement}</tt> +- <tt>\p{In_Kanbun}</tt> +- <tt>\p{In_Kangxi_Radicals}</tt> +- <tt>\p{In_Kannada}</tt> +- <tt>\p{In_Katakana}</tt> +- <tt>\p{In_Katakana_Phonetic_Extensions}</tt> +- <tt>\p{In_Kawi}</tt> +- <tt>\p{In_Kayah_Li}</tt> +- <tt>\p{In_Kharoshthi}</tt> +- <tt>\p{In_Khitan_Small_Script}</tt> +- <tt>\p{In_Khmer}</tt> +- <tt>\p{In_Khmer_Symbols}</tt> +- <tt>\p{In_Khojki}</tt> +- <tt>\p{In_Khudawadi}</tt> +- <tt>\p{In_Kirat_Rai}</tt> +- <tt>\p{In_Lao}</tt> +- <tt>\p{In_Latin_1_Supplement}</tt> +- <tt>\p{In_Latin_Extended_A}</tt> +- <tt>\p{In_Latin_Extended_Additional}</tt> +- <tt>\p{In_Latin_Extended_B}</tt> +- <tt>\p{In_Latin_Extended_C}</tt> +- <tt>\p{In_Latin_Extended_D}</tt> +- <tt>\p{In_Latin_Extended_E}</tt> +- <tt>\p{In_Latin_Extended_F}</tt> +- <tt>\p{In_Latin_Extended_G}</tt> +- <tt>\p{In_Lepcha}</tt> +- <tt>\p{In_Letterlike_Symbols}</tt> +- <tt>\p{In_Limbu}</tt> +- <tt>\p{In_Linear_A}</tt> +- <tt>\p{In_Linear_B_Ideograms}</tt> +- <tt>\p{In_Linear_B_Syllabary}</tt> +- <tt>\p{In_Lisu}</tt> +- <tt>\p{In_Lisu_Supplement}</tt> +- <tt>\p{In_Low_Surrogates}</tt> +- <tt>\p{In_Lycian}</tt> +- <tt>\p{In_Lydian}</tt> +- <tt>\p{In_Mahajani}</tt> +- <tt>\p{In_Mahjong_Tiles}</tt> +- <tt>\p{In_Makasar}</tt> +- <tt>\p{In_Malayalam}</tt> +- 
<tt>\p{In_Mandaic}</tt> +- <tt>\p{In_Manichaean}</tt> +- <tt>\p{In_Marchen}</tt> +- <tt>\p{In_Masaram_Gondi}</tt> +- <tt>\p{In_Mathematical_Alphanumeric_Symbols}</tt> +- <tt>\p{In_Mathematical_Operators}</tt> +- <tt>\p{In_Mayan_Numerals}</tt> +- <tt>\p{In_Medefaidrin}</tt> +- <tt>\p{In_Meetei_Mayek}</tt> +- <tt>\p{In_Meetei_Mayek_Extensions}</tt> +- <tt>\p{In_Mende_Kikakui}</tt> +- <tt>\p{In_Meroitic_Cursive}</tt> +- <tt>\p{In_Meroitic_Hieroglyphs}</tt> +- <tt>\p{In_Miao}</tt> +- <tt>\p{In_Miscellaneous_Mathematical_Symbols_A}</tt> +- <tt>\p{In_Miscellaneous_Mathematical_Symbols_B}</tt> +- <tt>\p{In_Miscellaneous_Symbols}</tt> +- <tt>\p{In_Miscellaneous_Symbols_Supplement}</tt> +- <tt>\p{In_Miscellaneous_Symbols_and_Arrows}</tt> +- <tt>\p{In_Miscellaneous_Symbols_and_Pictographs}</tt> +- <tt>\p{In_Miscellaneous_Technical}</tt> +- <tt>\p{In_Modi}</tt> +- <tt>\p{In_Modifier_Tone_Letters}</tt> +- <tt>\p{In_Mongolian}</tt> +- <tt>\p{In_Mongolian_Supplement}</tt> +- <tt>\p{In_Mro}</tt> +- <tt>\p{In_Multani}</tt> +- <tt>\p{In_Musical_Symbols}</tt> +- <tt>\p{In_Myanmar}</tt> +- <tt>\p{In_Myanmar_Extended_A}</tt> +- <tt>\p{In_Myanmar_Extended_B}</tt> +- <tt>\p{In_Myanmar_Extended_C}</tt> +- <tt>\p{In_NKo}</tt> +- <tt>\p{In_Nabataean}</tt> +- <tt>\p{In_Nag_Mundari}</tt> +- <tt>\p{In_Nandinagari}</tt> +- <tt>\p{In_New_Tai_Lue}</tt> +- <tt>\p{In_Newa}</tt> +- <tt>\p{In_No_Block}</tt> +- <tt>\p{In_Number_Forms}</tt> +- <tt>\p{In_Nushu}</tt> +- <tt>\p{In_Nyiakeng_Puachue_Hmong}</tt> +- <tt>\p{In_Ogham}</tt> +- <tt>\p{In_Ol_Chiki}</tt> +- <tt>\p{In_Ol_Onal}</tt> +- <tt>\p{In_Old_Hungarian}</tt> +- <tt>\p{In_Old_Italic}</tt> +- <tt>\p{In_Old_North_Arabian}</tt> +- <tt>\p{In_Old_Permic}</tt> +- <tt>\p{In_Old_Persian}</tt> +- <tt>\p{In_Old_Sogdian}</tt> +- <tt>\p{In_Old_South_Arabian}</tt> +- <tt>\p{In_Old_Turkic}</tt> +- <tt>\p{In_Old_Uyghur}</tt> +- <tt>\p{In_Optical_Character_Recognition}</tt> +- <tt>\p{In_Oriya}</tt> +- <tt>\p{In_Ornamental_Dingbats}</tt> +- 
<tt>\p{In_Osage}</tt> +- <tt>\p{In_Osmanya}</tt> +- <tt>\p{In_Ottoman_Siyaq_Numbers}</tt> +- <tt>\p{In_Pahawh_Hmong}</tt> +- <tt>\p{In_Palmyrene}</tt> +- <tt>\p{In_Pau_Cin_Hau}</tt> +- <tt>\p{In_Phags_pa}</tt> +- <tt>\p{In_Phaistos_Disc}</tt> +- <tt>\p{In_Phoenician}</tt> +- <tt>\p{In_Phonetic_Extensions}</tt> +- <tt>\p{In_Phonetic_Extensions_Supplement}</tt> +- <tt>\p{In_Playing_Cards}</tt> +- <tt>\p{In_Private_Use_Area}</tt> +- <tt>\p{In_Psalter_Pahlavi}</tt> +- <tt>\p{In_Rejang}</tt> +- <tt>\p{In_Rumi_Numeral_Symbols}</tt> +- <tt>\p{In_Runic}</tt> +- <tt>\p{In_Samaritan}</tt> +- <tt>\p{In_Saurashtra}</tt> +- <tt>\p{In_Sharada}</tt> +- <tt>\p{In_Sharada_Supplement}</tt> +- <tt>\p{In_Shavian}</tt> +- <tt>\p{In_Shorthand_Format_Controls}</tt> +- <tt>\p{In_Siddham}</tt> +- <tt>\p{In_Sidetic}</tt> +- <tt>\p{In_Sinhala}</tt> +- <tt>\p{In_Sinhala_Archaic_Numbers}</tt> +- <tt>\p{In_Small_Form_Variants}</tt> +- <tt>\p{In_Small_Kana_Extension}</tt> +- <tt>\p{In_Sogdian}</tt> +- <tt>\p{In_Sora_Sompeng}</tt> +- <tt>\p{In_Soyombo}</tt> +- <tt>\p{In_Spacing_Modifier_Letters}</tt> +- <tt>\p{In_Specials}</tt> +- <tt>\p{In_Sundanese}</tt> +- <tt>\p{In_Sundanese_Supplement}</tt> +- <tt>\p{In_Sunuwar}</tt> +- <tt>\p{In_Superscripts_and_Subscripts}</tt> +- <tt>\p{In_Supplemental_Arrows_A}</tt> +- <tt>\p{In_Supplemental_Arrows_B}</tt> +- <tt>\p{In_Supplemental_Arrows_C}</tt> +- <tt>\p{In_Supplemental_Mathematical_Operators}</tt> +- <tt>\p{In_Supplemental_Punctuation}</tt> +- <tt>\p{In_Supplemental_Symbols_and_Pictographs}</tt> +- <tt>\p{In_Supplementary_Private_Use_Area_A}</tt> +- <tt>\p{In_Supplementary_Private_Use_Area_B}</tt> +- <tt>\p{In_Sutton_SignWriting}</tt> +- <tt>\p{In_Syloti_Nagri}</tt> +- <tt>\p{In_Symbols_and_Pictographs_Extended_A}</tt> +- <tt>\p{In_Symbols_for_Legacy_Computing}</tt> +- <tt>\p{In_Symbols_for_Legacy_Computing_Supplement}</tt> +- <tt>\p{In_Syriac}</tt> +- <tt>\p{In_Syriac_Supplement}</tt> +- <tt>\p{In_Tagalog}</tt> +- <tt>\p{In_Tagbanwa}</tt> +- 
<tt>\p{In_Tags}</tt> +- <tt>\p{In_Tai_Le}</tt> +- <tt>\p{In_Tai_Tham}</tt> +- <tt>\p{In_Tai_Viet}</tt> +- <tt>\p{In_Tai_Xuan_Jing_Symbols}</tt> +- <tt>\p{In_Tai_Yo}</tt> +- <tt>\p{In_Takri}</tt> +- <tt>\p{In_Tamil}</tt> +- <tt>\p{In_Tamil_Supplement}</tt> +- <tt>\p{In_Tangsa}</tt> +- <tt>\p{In_Tangut}</tt> +- <tt>\p{In_Tangut_Components}</tt> +- <tt>\p{In_Tangut_Components_Supplement}</tt> +- <tt>\p{In_Tangut_Supplement}</tt> +- <tt>\p{In_Telugu}</tt> +- <tt>\p{In_Thaana}</tt> +- <tt>\p{In_Thai}</tt> +- <tt>\p{In_Tibetan}</tt> +- <tt>\p{In_Tifinagh}</tt> +- <tt>\p{In_Tirhuta}</tt> +- <tt>\p{In_Todhri}</tt> +- <tt>\p{In_Tolong_Siki}</tt> +- <tt>\p{In_Toto}</tt> +- <tt>\p{In_Transport_and_Map_Symbols}</tt> +- <tt>\p{In_Tulu_Tigalari}</tt> +- <tt>\p{In_Ugaritic}</tt> +- <tt>\p{In_Unified_Canadian_Aboriginal_Syllabics}</tt> +- <tt>\p{In_Unified_Canadian_Aboriginal_Syllabics_Extended}</tt> +- <tt>\p{In_Unified_Canadian_Aboriginal_Syllabics_Extended_A}</tt> +- <tt>\p{In_Vai}</tt> +- <tt>\p{In_Variation_Selectors}</tt> +- <tt>\p{In_Variation_Selectors_Supplement}</tt> +- <tt>\p{In_Vedic_Extensions}</tt> +- <tt>\p{In_Vertical_Forms}</tt> +- <tt>\p{In_Vithkuqi}</tt> +- <tt>\p{In_Wancho}</tt> +- <tt>\p{In_Warang_Citi}</tt> +- <tt>\p{In_Yezidi}</tt> +- <tt>\p{In_Yi_Radicals}</tt> +- <tt>\p{In_Yi_Syllables}</tt> +- <tt>\p{In_Yijing_Hexagram_Symbols}</tt> +- <tt>\p{In_Zanabazar_Square}</tt> +- <tt>\p{In_Znamenny_Musical_Notation}</tt> + +=== Emoji + +- <tt>\p{Emoji}</tt> +- <tt>\p{Emoji_Component}</tt>, <tt>\p{EComp}</tt> +- <tt>\p{Emoji_Modifier}</tt>, <tt>\p{EMod}</tt> +- <tt>\p{Emoji_Modifier_Base}</tt>, <tt>\p{EBase}</tt> +- <tt>\p{Emoji_Presentation}</tt>, <tt>\p{EPres}</tt> +- <tt>\p{Extended_Pictographic}</tt>, <tt>\p{ExtPict}</tt> + +=== Graphemes + +- <tt>\p{Grapheme_Cluster_Break_CR}</tt> +- <tt>\p{Grapheme_Cluster_Break_Control}</tt> +- <tt>\p{Grapheme_Cluster_Break_Extend}</tt> +- <tt>\p{Grapheme_Cluster_Break_L}</tt> +- <tt>\p{Grapheme_Cluster_Break_LF}</tt> +- 
<tt>\p{Grapheme_Cluster_Break_LV}</tt> +- <tt>\p{Grapheme_Cluster_Break_LVT}</tt> +- <tt>\p{Grapheme_Cluster_Break_Prepend}</tt> +- <tt>\p{Grapheme_Cluster_Break_Regional_Indicator}</tt> +- <tt>\p{Grapheme_Cluster_Break_SpacingMark}</tt> +- <tt>\p{Grapheme_Cluster_Break_T}</tt> +- <tt>\p{Grapheme_Cluster_Break_V}</tt> +- <tt>\p{Grapheme_Cluster_Break_ZWJ}</tt> + +=== Derived Ages + +- <tt>\p{Age_10_0}</tt> +- <tt>\p{Age_11_0}</tt> +- <tt>\p{Age_12_0}</tt> +- <tt>\p{Age_12_1}</tt> +- <tt>\p{Age_13_0}</tt> +- <tt>\p{Age_14_0}</tt> +- <tt>\p{Age_15_0}</tt> +- <tt>\p{Age_15_1}</tt> +- <tt>\p{Age_16_0}</tt> +- <tt>\p{Age_17_0}</tt> +- <tt>\p{Age_1_1}</tt> +- <tt>\p{Age_2_0}</tt> +- <tt>\p{Age_2_1}</tt> +- <tt>\p{Age_3_0}</tt> +- <tt>\p{Age_3_1}</tt> +- <tt>\p{Age_3_2}</tt> +- <tt>\p{Age_4_0}</tt> +- <tt>\p{Age_4_1}</tt> +- <tt>\p{Age_5_0}</tt> +- <tt>\p{Age_5_1}</tt> +- <tt>\p{Age_5_2}</tt> +- <tt>\p{Age_6_0}</tt> +- <tt>\p{Age_6_1}</tt> +- <tt>\p{Age_6_2}</tt> +- <tt>\p{Age_6_3}</tt> +- <tt>\p{Age_7_0}</tt> +- <tt>\p{Age_8_0}</tt> +- <tt>\p{Age_9_0}</tt> diff --git a/doc/language/signals.rdoc b/doc/language/signals.rdoc new file mode 100644 index 0000000000..a82dab81c6 --- /dev/null +++ b/doc/language/signals.rdoc @@ -0,0 +1,106 @@ += Caveats for implementing Signal.trap callbacks + +As with implementing signal handlers in C or most other languages, +all code passed to Signal.trap must be reentrant. If you are not +familiar with reentrancy, you need to read up on it at +{Wikipedia}[https://en.wikipedia.org/wiki/Reentrancy_(computing)] or +elsewhere before reading the rest of this document. + +Most importantly, "thread-safety" does not guarantee reentrancy; +and methods such as Mutex#lock and Mutex#synchronize which are +commonly used for thread-safety even prevent reentrancy. 
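The point about Mutex can be illustrated with a minimal sketch. It assumes CRuby on a POSIX platform where the `USR1` signal is available; the variable names are hypothetical and chosen only for this illustration. The handler tries to take a lock and records what happens:

```ruby
lock = Mutex.new
result = nil

Signal.trap('USR1') do
  begin
    # A Mutex is a thread-safety tool, not a reentrancy tool:
    # CRuby refuses to take the lock while running a trap handler.
    lock.synchronize { result = :locked }
  rescue ThreadError => e
    result = e.class
  end
end

# Send the signal to this process, then give the VM a bounded
# amount of time to run the deferred handler.
Process.kill('USR1', Process.pid)
20.times do
  break if result
  sleep 0.05
end

puts result.inspect
```

On CRuby the handler is expected to end up in the `rescue` branch, because taking the lock is rejected in trap context rather than risking a deadlock.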
+ +== An implementation detail of the Ruby VM + +The Ruby VM defers Signal.trap callbacks from running until it is safe +for its internal data structures, but it does not know when it is safe +for data structures in YOUR code. Ruby implements deferred signal +handling by registering short C functions with only +{async-signal-safe functions}[http://man7.org/linux/man-pages/man7/signal-safety.7.html] as +signal handlers. These short C functions only do enough to tell the VM to +run callbacks registered via Signal.trap later in the main Ruby Thread. + +== Unsafe methods to call in Signal.trap blocks + +When in doubt, consider anything not listed as safe below as being +unsafe. + +* Mutex#lock, Mutex#synchronize and any code using them are explicitly + unsafe. This includes Monitor in the standard library which uses + Mutex to provide reentrancy. + +* Dir.chdir with block + +* any IO write operations when IO#sync is false; + including IO#write, IO#write_nonblock, IO#puts. + Pipes and sockets default to `IO#sync = true', so it is safe to + write to them unless IO#sync was disabled. + +* File#flock, as the underlying flock(2) call is not specified by POSIX + +== Commonly safe operations inside Signal.trap blocks + +* Assignment and retrieval of local, instance, and class variables + +* Most object allocations and initializations of common types + including Array, Hash, String, Struct, Time. + +* Common Array, Hash, String, Struct operations which do not execute a block + are generally safe; but beware if iteration is occurring elsewhere. 
+ +* Hash#[], Hash#[]= (unless Hash.new was given an unsafe block) + +* Thread::Queue#push and Thread::SizedQueue#push (since Ruby 2.1) + +* Creating a new Thread via Thread.new/Thread.start can be used to get + around the unusability of Mutexes inside a signal handler + +* Signal.trap is safe to use inside blocks passed to Signal.trap + +* arithmetic on Integer and Float (`+', `-', `%', `*', `/') + + Additionally, signal handlers do not run between two successive + local variable accesses, so shortcuts such as `+=' and `-=' will + not trigger a data race when used on Integer and Float classes in + signal handlers. + +== System call wrapper methods which are safe inside Signal.trap + +Since Ruby has wrappers around many +{async-signal-safe C functions}[http://man7.org/linux/man-pages/man7/signal-safety.7.html], +the corresponding wrappers for many IO, File, Dir, and Socket methods +are safe. + +(Incomplete list) + +* Dir.chdir (without block arg) +* Dir.mkdir +* Dir.open +* File#truncate +* File.link +* File.open +* File.readlink +* File.rename +* File.stat +* File.symlink +* File.truncate +* File.unlink +* File.utime +* IO#close +* IO#dup +* IO#fsync +* IO#read +* IO#read_nonblock +* IO#stat +* IO#sysread +* IO#syswrite +* IO.select +* IO.pipe +* Process.clock_gettime +* Process.exit! +* Process.fork +* Process.kill +* Process.pid +* Process.ppid +* Process.waitpid +... diff --git a/doc/language/strftime_formatting.rdoc b/doc/language/strftime_formatting.rdoc new file mode 100644 index 0000000000..2bfa6b975e --- /dev/null +++ b/doc/language/strftime_formatting.rdoc @@ -0,0 +1,525 @@ += Formats for Dates and Times + +Several Ruby time-related classes have instance method +strftime+, +which returns a formatted string representing all or part of a date or time: + +- Date#strftime. +- DateTime#strftime. +- Time#strftime. + +Each of these methods takes optional argument +format+, +which has zero or more embedded _format_ _specifications_ (see below). 
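As a minimal sketch of the list above, the same format string can be passed to all three classes. The date, the format string, and the variable names are chosen only for illustration:

```ruby
require 'date'

# One format string, three classes.
format = '%Y-%m-%d'

time_str     = Time.new(2022, 6, 29, 14, 2, 7).strftime(format)
date_str     = Date.new(2022, 6, 29).strftime(format)
datetime_str = DateTime.new(2022, 6, 29, 14, 2, 7).strftime(format)

puts time_str, date_str, datetime_str # each line is 2022-06-29
```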
+ +Each of these methods returns the string resulting from replacing each +format specification embedded in +format+ with a string form +of one or more parts of the date or time. + +A simple example: + + Time.now.strftime('%H:%M:%S') # => "14:02:07" + +A format specification has the form: + + %[flags][width]conversion + +It consists of: + +- A leading percent character. +- Zero or more _flags_ (each is a character). +- An optional _width_ _specifier_ (an integer). +- A _conversion_ _specifier_ (a character). + +Except for the leading percent character, +the only required part is the conversion specifier, so we begin with that. + +== Conversion Specifiers + +=== \Date (Year, Month, Day) + +- <tt>%Y</tt> - Year including century, zero-padded: + + Time.now.strftime('%Y') # => "2022" + Time.new(-1000).strftime('%Y') # => "-1000" # Before common era. + Time.new(10000).strftime('%Y') # => "10000" # Far future. + Time.new(10).strftime('%Y') # => "0010" # Zero-padded by default. + +- <tt>%y</tt> - Year without century, in range (0..99), zero-padded: + + Time.now.strftime('%y') # => "22" + Time.new(1).strftime('%y') # => "01" # Zero-padded by default. + +- <tt>%C</tt> - Century, zero-padded: + + Time.now.strftime('%C') # => "20" + Time.new(-1000).strftime('%C') # => "-10" # Before common era. + Time.new(10000).strftime('%C') # => "100" # Far future. + Time.new(100).strftime('%C') # => "01" # Zero-padded by default. + +- <tt>%m</tt> - Month of the year, in range (1..12), zero-padded: + + Time.new(2022, 1).strftime('%m') # => "01" # Zero-padded by default. + Time.new(2022, 12).strftime('%m') # => "12" + +- <tt>%B</tt> - Full month name, capitalized: + + Time.new(2022, 1).strftime('%B') # => "January" + Time.new(2022, 12).strftime('%B') # => "December" + +- <tt>%b</tt> - Abbreviated month name, capitalized: + + Time.new(2022, 1).strftime('%b') # => "Jan" + Time.new(2022, 12).strftime('%h') # => "Dec" + +- <tt>%h</tt> - Same as <tt>%b</tt>. 
+ +- <tt>%d</tt> - Day of the month, in range (1..31), zero-padded: + + Time.new(2002, 1, 1).strftime('%d') # => "01" + Time.new(2002, 1, 31).strftime('%d') # => "31" + +- <tt>%e</tt> - Day of the month, in range (1..31), blank-padded: + + Time.new(2002, 1, 1).strftime('%e') # => " 1" + Time.new(2002, 1, 31).strftime('%e') # => "31" + +- <tt>%j</tt> - Day of the year, in range (1..366), zero-padded: + + Time.new(2002, 1, 1).strftime('%j') # => "001" + Time.new(2002, 12, 31).strftime('%j') # => "365" + +=== \Time (Hour, Minute, Second, Subsecond) + +- <tt>%H</tt> - Hour of the day, in range (0..23), zero-padded: + + Time.new(2022, 1, 1, 1).strftime('%H') # => "01" + Time.new(2022, 1, 1, 13).strftime('%H') # => "13" + +- <tt>%k</tt> - Hour of the day, in range (0..23), blank-padded: + + Time.new(2022, 1, 1, 1).strftime('%k') # => " 1" + Time.new(2022, 1, 1, 13).strftime('%k') # => "13" + +- <tt>%I</tt> - Hour of the day, in range (1..12), zero-padded: + + Time.new(2022, 1, 1, 1).strftime('%I') # => "01" + Time.new(2022, 1, 1, 13).strftime('%I') # => "01" + +- <tt>%l</tt> - Hour of the day, in range (1..12), blank-padded: + + Time.new(2022, 1, 1, 1).strftime('%l') # => " 1" + Time.new(2022, 1, 1, 13).strftime('%l') # => " 1" + +- <tt>%P</tt> - Meridian indicator, lowercase: + + Time.new(2022, 1, 1, 1).strftime('%P') # => "am" + Time.new(2022, 1, 1, 13).strftime('%P') # => "pm" + +- <tt>%p</tt> - Meridian indicator, uppercase: + + Time.new(2022, 1, 1, 1).strftime('%p') # => "AM" + Time.new(2022, 1, 1, 13).strftime('%p') # => "PM" + +- <tt>%M</tt> - Minute of the hour, in range (0..59), zero-padded: + + Time.new(2022, 1, 1, 1, 0, 0).strftime('%M') # => "00" + +- <tt>%S</tt> - Second of the minute in range (0..59), zero-padded: + + Time.new(2022, 1, 1, 1, 0, 0, 0).strftime('%S') # => "00" + +- <tt>%L</tt> - Millisecond of the second, in range (0..999), zero-padded: + + Time.new(2022, 1, 1, 1, 0, 0, 0).strftime('%L') # => "000" + +- <tt>%N</tt> - Fractional seconds, 
default width is 9 digits (nanoseconds): + + t = Time.now # => 2022-06-29 07:10:20.3230914 -0500 + t.strftime('%N') # => "323091400" # Default. + + Use {width specifiers}[rdoc-ref:@Width+Specifiers] + to adjust units: + + t.strftime('%3N') # => "323" # Milliseconds. + t.strftime('%6N') # => "323091" # Microseconds. + t.strftime('%9N') # => "323091400" # Nanoseconds. + t.strftime('%12N') # => "323091400000" # Picoseconds. + t.strftime('%15N') # => "323091400000000" # Femtoseconds. + t.strftime('%18N') # => "323091400000000000" # Attoseconds. + t.strftime('%21N') # => "323091400000000000000" # Zeptoseconds. + t.strftime('%24N') # => "323091400000000000000000" # Yoctoseconds. + +- <tt>%s</tt> - Number of seconds since the epoch: + + Time.now.strftime('%s') # => "1656505136" + +=== Timezone + +- <tt>%z</tt> - Timezone as hour and minute offset from UTC: + + Time.now.strftime('%z') # => "-0500" + +- <tt>%Z</tt> - Timezone name (platform-dependent): + + Time.now.strftime('%Z') # => "Central Daylight Time" + +=== Weekday + +- <tt>%A</tt> - Full weekday name: + + Time.now.strftime('%A') # => "Wednesday" + +- <tt>%a</tt> - Abbreviated weekday name: + + Time.now.strftime('%a') # => "Wed" + +- <tt>%u</tt> - Day of the week, in range (1..7), Monday is 1: + + t = Time.new(2022, 6, 26) # => 2022-06-26 00:00:00 -0500 + t.strftime('%a') # => "Sun" + t.strftime('%u') # => "7" + +- <tt>%w</tt> - Day of the week, in range (0..6), Sunday is 0: + + t = Time.new(2022, 6, 26) # => 2022-06-26 00:00:00 -0500 + t.strftime('%a') # => "Sun" + t.strftime('%w') # => "0" + +=== Week Number + +- <tt>%U</tt> - Week number of the year, in range (0..53), zero-padded, + where each week begins on a Sunday: + + t = Time.new(2022, 6, 26) # => 2022-06-26 00:00:00 -0500 + t.strftime('%a') # => "Sun" + t.strftime('%U') # => "26" + +- <tt>%W</tt> - Week number of the year, in range (0..53), zero-padded, + where each week begins on a Monday: + + t = Time.new(2022, 6, 26) # => 2022-06-26 00:00:00 -0500 + 
t.strftime('%a') # => "Sun" + t.strftime('%W') # => "25" + +=== Week Dates + +See {ISO 8601 week dates}[https://en.wikipedia.org/wiki/ISO_8601#Week_dates]. + + t0 = Time.new(2023, 1, 1) # => 2023-01-01 00:00:00 -0600 + t1 = Time.new(2024, 1, 1) # => 2024-01-01 00:00:00 -0600 + +- <tt>%G</tt> - Week-based year: + + t0.strftime('%G') # => "2022" + t1.strftime('%G') # => "2024" + +- <tt>%g</tt> - Week-based year without century, in range (0..99), zero-padded: + + t0.strftime('%g') # => "22" + t1.strftime('%g') # => "24" + +- <tt>%V</tt> - Week number of the week-based year, in range (1..53), + zero-padded: + + t0.strftime('%V') # => "52" + t1.strftime('%V') # => "01" + +=== Literals + +- <tt>%n</tt> - Newline character "\n": + + Time.now.strftime('%n') # => "\n" + +- <tt>%t</tt> - Tab character "\t": + + Time.now.strftime('%t') # => "\t" + +- <tt>%%</tt> - Percent character '%': + + Time.now.strftime('%%') # => "%" + +=== Shorthand Conversion Specifiers + +Each shorthand specifier here is shown with its corresponding +longhand specifier. + +- <tt>%c</tt> - \Date and time: + + Time.now.strftime('%c') # => "Wed Jun 29 08:01:41 2022" + Time.now.strftime('%a %b %e %T %Y') # => "Wed Jun 29 08:02:07 2022" + +- <tt>%D</tt> - \Date: + + Time.now.strftime('%D') # => "06/29/22" + Time.now.strftime('%m/%d/%y') # => "06/29/22" + +- <tt>%F</tt> - ISO 8601 date: + + Time.now.strftime('%F') # => "2022-06-29" + Time.now.strftime('%Y-%m-%d') # => "2022-06-29" + +- <tt>%v</tt> - VMS date: + + Time.now.strftime('%v') # => "29-JUN-2022" + Time.now.strftime('%e-%^b-%4Y') # => "29-JUN-2022" + +- <tt>%x</tt> - Same as <tt>%D</tt>. + +- <tt>%X</tt> - Same as <tt>%T</tt>. 
+ +- <tt>%r</tt> - 12-hour time: + + Time.new(2022, 1, 1, 1).strftime('%r') # => "01:00:00 AM" + Time.new(2022, 1, 1, 1).strftime('%I:%M:%S %p') # => "01:00:00 AM" + Time.new(2022, 1, 1, 13).strftime('%r') # => "01:00:00 PM" + Time.new(2022, 1, 1, 13).strftime('%I:%M:%S %p') # => "01:00:00 PM" + +- <tt>%R</tt> - 24-hour time: + + Time.new(2022, 1, 1, 1).strftime('%R') # => "01:00" + Time.new(2022, 1, 1, 1).strftime('%H:%M') # => "01:00" + Time.new(2022, 1, 1, 13).strftime('%R') # => "13:00" + Time.new(2022, 1, 1, 13).strftime('%H:%M') # => "13:00" + +- <tt>%T</tt> - 24-hour time: + + Time.new(2022, 1, 1, 1).strftime('%T') # => "01:00:00" + Time.new(2022, 1, 1, 1).strftime('%H:%M:%S') # => "01:00:00" + Time.new(2022, 1, 1, 13).strftime('%T') # => "13:00:00" + Time.new(2022, 1, 1, 13).strftime('%H:%M:%S') # => "13:00:00" + +- <tt>%+</tt> (not supported in Time#strftime) - \Date and time: + + DateTime.now.strftime('%+') + # => "Wed Jun 29 08:31:53 -05:00 2022" + DateTime.now.strftime('%a %b %e %H:%M:%S %Z %Y') + # => "Wed Jun 29 08:32:18 -05:00 2022" + +== Flags + +Flags may affect certain formatting specifications. + +Multiple flags may be given with a single conversion specifier; +order does not matter. + +=== Padding Flags + +- <tt>0</tt> - Pad with zeroes: + + Time.new(10).strftime('%0Y') # => "0010" + +- <tt>_</tt> - Pad with blanks: + + Time.new(10).strftime('%_Y') # => "  10" + +- <tt>-</tt> - Don't pad: + + Time.new(10).strftime('%-Y') # => "10" + +=== Casing Flags + +- <tt>^</tt> - Upcase result: + + Time.new(2022, 1).strftime('%B') # => "January" # No casing flag. 
+ Time.new(2022, 1).strftime('%^B') # => "JANUARY" + +- <tt>#</tt> - Swapcase result: + + Time.now.strftime('%p') # => "AM" + Time.now.strftime('%^p') # => "AM" + Time.now.strftime('%#p') # => "am" + +=== Timezone Flags + +- <tt>:</tt> - Put timezone as colon-separated hours and minutes: + + Time.now.strftime('%:z') # => "-05:00" + +- <tt>::</tt> - Put timezone as colon-separated hours, minutes, and seconds: + + Time.now.strftime('%::z') # => "-05:00:00" + +== Width Specifiers + +The integer width specifier gives a minimum width for the returned string: + + Time.new(2002).strftime('%Y') # => "2002" # No width specifier. + Time.new(2002).strftime('%10Y') # => "0000002002" + Time.new(2002, 12).strftime('%B') # => "December" # No width specifier. + Time.new(2002, 12).strftime('%10B') # => "  December" + Time.new(2002, 12).strftime('%3B') # => "December" # Ignored if too small. + += Specialized Format Strings + +Here are a few specialized format strings, +each based on an external standard. + +== HTTP Format + +The HTTP date format is based on +{RFC 2616}[https://www.rfc-editor.org/rfc/rfc2616], +and treats dates in the format <tt>'%a, %d %b %Y %T GMT'</tt>: + + d = Date.new(2001, 2, 3) # => #<Date: 2001-02-03> + # Return HTTP-formatted string. + httpdate = d.httpdate # => "Sat, 03 Feb 2001 00:00:00 GMT" + # Return new date parsed from HTTP-formatted string. + Date.httpdate(httpdate) # => #<Date: 2001-02-03> + # Return hash parsed from HTTP-formatted string. + Date._httpdate(httpdate) + # => {:wday=>6, :mday=>3, :mon=>2, :year=>2001, :hour=>0, :min=>0, :sec=>0, :zone=>"GMT", :offset=>0} + +== RFC 3339 Format + +The RFC 3339 date format is based on +{RFC 3339}[https://www.rfc-editor.org/rfc/rfc3339]: + + d = Date.new(2001, 2, 3) # => #<Date: 2001-02-03> + # Return 3339-formatted string. + rfc3339 = d.rfc3339 # => "2001-02-03T00:00:00+00:00" + # Return new date parsed from 3339-formatted string. 
+ Date.rfc3339(rfc3339) # => #<Date: 2001-02-03> + # Return hash parsed from 3339-formatted string. + Date._rfc3339(rfc3339) + # => {:year=>2001, :mon=>2, :mday=>3, :hour=>0, :min=>0, :sec=>0, :zone=>"+00:00", :offset=>0} + +== RFC 2822 Format + +The RFC 2822 date format is based on +{RFC 2822}[https://www.rfc-editor.org/rfc/rfc2822], +and treats dates in the format <tt>'%a, %-d %b %Y %T %z'</tt>: + + d = Date.new(2001, 2, 3) # => #<Date: 2001-02-03> + # Return 2822-formatted string. + rfc2822 = d.rfc2822 # => "Sat, 3 Feb 2001 00:00:00 +0000" + # Return new date parsed from 2822-formatted string. + Date.rfc2822(rfc2822) # => #<Date: 2001-02-03> + # Return hash parsed from 2822-formatted string. + Date._rfc2822(rfc2822) + # => {:wday=>6, :mday=>3, :mon=>2, :year=>2001, :hour=>0, :min=>0, :sec=>0, :zone=>"+0000", :offset=>0} + +== JIS X 0301 Format + +The JIS X 0301 format includes the +{Japanese era name}[https://en.wikipedia.org/wiki/Japanese_era_name], +and treats dates in the format <tt>'%Y-%m-%d'</tt> +with the first letter of the romanized era name prefixed: + + d = Date.new(2001, 2, 3) # => #<Date: 2001-02-03> + # Return 0301-formatted string. + jisx0301 = d.jisx0301 # => "H13.02.03" + # Return new date parsed from 0301-formatted string. + Date.jisx0301(jisx0301) # => #<Date: 2001-02-03> + # Return hash parsed from 0301-formatted string. + Date._jisx0301(jisx0301) # => {:year=>2001, :mon=>2, :mday=>3} + +== ISO 8601 Format Specifications + +This section shows format specifications that are compatible with +{ISO 8601}[https://en.wikipedia.org/wiki/ISO_8601]. +Details for various formats may be seen at the links. + +Examples in this section assume: + + t = Time.now # => 2022-06-29 16:49:25.465246 -0500 + +=== Dates + +See {ISO 8601 dates}[https://en.wikipedia.org/wiki/ISO_8601#Dates]. 
+ +- {Years}[https://en.wikipedia.org/wiki/ISO_8601#Years]: + + - Basic year (+YYYY+): + + t.strftime('%Y') # => "2022" + + - Expanded year (<tt>±YYYYY</tt>): + + t.strftime('+%5Y') # => "+02022" + t.strftime('-%5Y') # => "-02022" + +- {Calendar dates}[https://en.wikipedia.org/wiki/ISO_8601#Calendar_dates]: + + - Basic date (+YYYYMMDD+): + + t.strftime('%Y%m%d') # => "20220629" + + - Extended date (<tt>YYYY-MM-DD</tt>): + + t.strftime('%Y-%m-%d') # => "2022-06-29" + + - Reduced extended date (<tt>YYYY-MM</tt>): + + t.strftime('%Y-%m') # => "2022-06" + +- {Week dates}[https://en.wikipedia.org/wiki/ISO_8601#Week_dates]: + + - Basic date (+YYYYWww+ or +YYYYWwwD+): + + t.strftime('%GW%V') # => "2022W26" + t.strftime('%GW%V%u') # => "2022W263" + + - Extended date (<tt>YYYY-Www</tt> or <tt>YYYY-Www-D</tt>): + + t.strftime('%G-W%V') # => "2022-W26" + t.strftime('%G-W%V-%u') # => "2022-W26-3" + +- {Ordinal dates}[https://en.wikipedia.org/wiki/ISO_8601#Ordinal_dates]: + + - Basic date (+YYYYDDD+): + + t.strftime('%Y%j') # => "2022180" + + - Extended date (<tt>YYYY-DDD</tt>): + + t.strftime('%Y-%j') # => "2022-180" + +=== Times + +See {ISO 8601 times}[https://en.wikipedia.org/wiki/ISO_8601#Times]. 
+ +- Times: + + - Basic time (+Thhmmss.sss+, +Thhmmss+, +Thhmm+, or +Thh+): + + t.strftime('T%H%M%S.%L') # => "T164925.465" + t.strftime('T%H%M%S') # => "T164925" + t.strftime('T%H%M') # => "T1649" + t.strftime('T%H') # => "T16" + + - Extended time (+Thh:mm:ss.sss+, +Thh:mm:ss+, or +Thh:mm+): + + t.strftime('T%H:%M:%S.%L') # => "T16:49:25.465" + t.strftime('T%H:%M:%S') # => "T16:49:25" + t.strftime('T%H:%M') # => "T16:49" + +- {Time zone designators}[https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]: + + - Timezone (+time+ represents a valid time, + +hh+ represents a valid 2-digit hour, + and +mm+ represents a valid 2-digit minute): + + - Basic timezone (<tt>time±hhmm</tt>, <tt>time±hh</tt>, or +timeZ+): + + t.strftime('T%H%M%S%z') # => "T164925-0500" + t.strftime('T%H%M%S%z').slice(0..-3) # => "T164925-05" + t.strftime('T%H%M%SZ') # => "T164925Z" + + - Extended timezone (<tt>time±hh:mm</tt>): + + t.strftime('T%H:%M:%S%:z') # => "T16:49:25-05:00" + + - See also: + + - {Local time (unqualified)}[https://en.wikipedia.org/wiki/ISO_8601#Local_time_(unqualified)]. + - {Coordinated Universal Time (UTC)}[https://en.wikipedia.org/wiki/ISO_8601#Coordinated_Universal_Time_(UTC)]. + - {Time offsets from UTC}[https://en.wikipedia.org/wiki/ISO_8601#Time_offsets_from_UTC]. + +=== Combined \Date and \Time + +See {ISO 8601 Combined date and time representations}[https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations]. + +An ISO 8601 combined date and time representation may be any +ISO 8601 date and any ISO 8601 time, +separated by the letter +T+. + +For the relevant +strftime+ formats, see {Dates}[rdoc-ref:@Dates] and {Times}[rdoc-ref:@Times] above.
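As a rough illustration of the combined representation described above, the following sketch joins a date format and a time format with the letter +T+. The fixed time value and the variable names +combined+ and +basic+ are assumptions chosen so that the output is reproducible; they do not come from the original text.

```ruby
# Hypothetical fixed time (an assumption), so the results do not depend on Time.now.
t = Time.new(2022, 6, 29, 16, 49, 25, '-05:00')

# Extended date, the letter T, extended time, and a colon-separated UTC offset.
combined = t.strftime('%Y-%m-%dT%H:%M:%S%:z')
combined # => "2022-06-29T16:49:25-05:00"

# Basic (unseparated) date and time, with the basic form of the UTC offset.
basic = t.strftime('%Y%m%dT%H%M%S%z')
basic # => "20220629T164925-0500"
```

Each format string here is simply a date format and a time format from the sections above, concatenated around a literal +T+.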
