Diffstat (limited to 'doc')
177 files changed, 7815 insertions, 5475 deletions
diff --git a/doc/.document b/doc/.document index 6e6caa8333..337289a662 100644 --- a/doc/.document +++ b/doc/.document @@ -1,12 +1,14 @@ -*.md -*.rb +[^_]*.md +[^_]*.rb [^_]*.rdoc contributing +distribution NEWS syntax optparse -date -rdoc -regexp -yjit -ruby +jit +security +language +strscan +file + diff --git a/doc/NEWS/NEWS-4.0.0.md b/doc/NEWS/NEWS-4.0.0.md new file mode 100644 index 0000000000..5d932fbf5d --- /dev/null +++ b/doc/NEWS/NEWS-4.0.0.md @@ -0,0 +1,802 @@ +# NEWS for Ruby 4.0.0 + +This document is a list of user-visible feature changes +since the **3.4.0** release, except for bug fixes. + +Note that each entry is kept to a minimum, see links for details. + +## Language changes + +* `*nil` no longer calls `nil.to_a`, similar to how `**nil` does + not call `nil.to_hash`. [[Feature #21047]] + +* Logical binary operators (`||`, `&&`, `and` and `or`) at the + beginning of a line continue the previous line, like fluent dot. + The following code examples are equal: + + ```ruby + if condition1 + && condition2 + ... + end + ``` + + Previously: + + ```ruby + if condition1 && condition2 + ... + end + ``` + + ```ruby + if condition1 && + condition2 + ... + end + ``` + + [[Feature #20925]] + +## Core classes updates + +Note: We're only listing outstanding class updates. + +* Array + + * `Array#rfind` has been added as a more efficient alternative to `array.reverse_each.find` [[Feature #21678]] + * `Array#find` has been added as a more efficient override of `Enumerable#find` [[Feature #21678]] +* Binding + + * `Binding#local_variables` does no longer include numbered parameters. + Also, `Binding#local_variable_get`, `Binding#local_variable_set`, and + `Binding#local_variable_defined?` reject to handle numbered parameters. + [[Bug #21049]] + + * `Binding#implicit_parameters`, `Binding#implicit_parameter_get`, and + `Binding#implicit_parameter_defined?` have been added to access + numbered parameters and "it" parameter. 
[[Bug #21049]] + +* Enumerator + + * `Enumerator.produce` now accepts an optional `size` keyword argument + to specify the size of the enumerator. It can be an integer, + `Float::INFINITY`, a callable object (such as a lambda), or `nil` to + indicate unknown size. When not specified, the size defaults to + `Float::INFINITY`. + + ```ruby + # Infinite enumerator + enum = Enumerator.produce(1, size: Float::INFINITY, &:succ) + enum.size # => Float::INFINITY + + # Finite enumerator with known/computable size + abs_dir = File.expand_path("./baz") # => "/foo/bar/baz" + traverser = Enumerator.produce(abs_dir, size: -> { abs_dir.count("/") + 1 }) { + raise StopIteration if it == "/" + File.dirname(it) + } + traverser.size # => 4 + ``` + + [[Feature #21701]] + +* ErrorHighlight + + * When an ArgumentError is raised, it now displays code snippets for + both the method call (caller) and the method definition (callee). + [[Feature #21543]] + + ``` + test.rb:1:in 'Object#add': wrong number of arguments (given 1, expected 2) (ArgumentError) + + caller: test.rb:3 + | add(1) + ^^^ + callee: test.rb:1 + | def add(x, y) = x + y + ^^^ + from test.rb:3:in '<main>' + ``` + +* Fiber + + * Introduce support for `Fiber#raise(cause:)` argument similar to + `Kernel#raise`. [[Feature #21360]] + +* Fiber::Scheduler + + * Introduce `Fiber::Scheduler#fiber_interrupt` to interrupt a fiber with a + given exception. The initial use case is to interrupt a fiber that is + waiting on a blocking IO operation when the IO operation is closed. + [[Feature #21166]] + + * Introduce `Fiber::Scheduler#yield` to allow the fiber scheduler to + continue processing when signal exceptions are disabled. + [[Bug #21633]] + + * Reintroduce the `Fiber::Scheduler#io_close` hook for asynchronous `IO#close`. + + * Invoke `Fiber::Scheduler#io_write` when flushing the IO write buffer. 
+ [[Bug #21789]] + +* File + + * `File::Stat#birthtime` is now available on Linux via the statx + system call when supported by the kernel and filesystem. + [[Feature #21205]] + +* IO + + * `IO.select` accepts `Float::INFINITY` as a timeout argument. + [[Feature #20610]] + + * A deprecated behavior, process creation by `IO` class methods + with a leading `|`, was removed. [[Feature #19630]] + +* Kernel + + * `Kernel#inspect` now checks for the existence of a `#instance_variables_to_inspect` method, + allowing control over which instance variables are displayed in the `#inspect` string: + + ```ruby + class DatabaseConfig + def initialize(host, user, password) + @host = host + @user = user + @password = password + end + + private def instance_variables_to_inspect = [:@host, :@user] + end + + conf = DatabaseConfig.new("localhost", "root", "hunter2") + conf.inspect #=> #<DatabaseConfig:0x0000000104def350 @host="localhost", @user="root"> + ``` + + [[Feature #21219]] + + * A deprecated behavior, process creation by `Kernel#open` with a + leading `|`, was removed. [[Feature #19630]] + +* Math + + * `Math.log1p` and `Math.expm1` are added. [[Feature #21527]] + +* Pathname + + * Pathname has been promoted from a default gem to a core class of Ruby. + [[Feature #17473]] + +* Proc + + * `Proc#parameters` now shows anonymous optional parameters as `[:opt]` + instead of `[:opt, nil]`, making the output consistent with when the + anonymous parameter is required. [[Bug #20974]] + +* Ractor + + * `Ractor::Port` class was added for a new synchronization mechanism + to communicate between Ractors. 
[[Feature #21262]] + + ```ruby + port1 = Ractor::Port.new + port2 = Ractor::Port.new + Ractor.new port1, port2 do |port1, port2| + port1 << 1 + port2 << 11 + port1 << 2 + port2 << 12 + end + 2.times{ p port1.receive } #=> 1, 2 + 2.times{ p port2.receive } #=> 11, 12 + ``` + + `Ractor::Port` provides the following methods: + + * `Ractor::Port#receive` + * `Ractor::Port#send` (or `Ractor::Port#<<`) + * `Ractor::Port#close` + * `Ractor::Port#closed?` + + As a result, `Ractor.yield` and `Ractor#take` were removed. + + * `Ractor#join` and `Ractor#value` were added to wait for the + termination of a Ractor. These are similar to `Thread#join` + and `Thread#value`. + + * `Ractor#monitor` and `Ractor#unmonitor` were added as low-level + interfaces used internally to implement `Ractor#join`. + + * `Ractor.select` now only accepts Ractors and Ports. If Ractors are given, + it returns when a Ractor terminates. + + * `Ractor#default_port` was added. Each `Ractor` has a default port, + which is used by `Ractor.send`, `Ractor.receive`. + + * `Ractor#close_incoming` and `Ractor#close_outgoing` were removed. + + * `Ractor.shareable_proc` and `Ractor.shareable_lambda` are introduced + to make shareable Proc or lambda. + [[Feature #21550]], [[Feature #21557]] + +* Range + + * `Range#to_set` now performs size checks to prevent issues with + endless ranges. [[Bug #21654]] + + * `Range#overlap?` now correctly handles infinite (unbounded) ranges. + [[Bug #21185]] + + * `Range#max` behavior on beginless integer ranges has been fixed. + [[Bug #21174]] [[Bug #21175]] + +* Ruby + + * A new toplevel module `Ruby` has been defined, which contains + Ruby-related constants. This module was reserved in Ruby 3.4 + and is now officially defined. [[Feature #20884]] + +* Ruby::Box + + * A new (experimental) feature to provide separation about definitions. + For the detail of "Ruby Box", see [doc/language/box.md](doc/language/box.md). 
+ [[Feature #21311]] [[Misc #21385]] + +* Set + + * `Set` is now a core class, instead of an autoloaded stdlib class. + [[Feature #21216]] + + * `Set#inspect` now uses a simpler display, similar to literal arrays. + (e.g., `Set[1, 2, 3]` instead of `#<Set: {1, 2, 3}>`). [[Feature #21389]] + + * Passing arguments to `Set#to_set` and `Enumerable#to_set` is now deprecated. + [[Feature #21390]] + +* Socket + + * `Socket.tcp` & `TCPSocket.new` accepts an `open_timeout` keyword argument to specify + the timeout for the initial connection. [[Feature #21347]] + * When a user-specified timeout occurred in `TCPSocket.new`, either `Errno::ETIMEDOUT` + or `IO::TimeoutError` could previously be raised depending on the situation. + This behavior has been unified so that `IO::TimeoutError` is now consistently raised. + (Please note that, in `Socket.tcp`, there are still cases where `Errno::ETIMEDOUT` + may be raised in similar situations, and that in both cases `Errno::ETIMEDOUT` may be + raised when the timeout occurs at the OS level.) + +* String + + * Update Unicode to Version 17.0.0 and Emoji Version 17.0. + [[Feature #19908]][[Feature #20724]][[Feature #21275]] (also applies to Regexp) + + * `String#strip`, `strip!`, `lstrip`, `lstrip!`, `rstrip`, and `rstrip!` + are extended to accept `*selectors` arguments. [[Feature #21552]] + +* Thread + + * Introduce support for `Thread#raise(cause:)` argument similar to + `Kernel#raise`. [[Feature #21360]] + +## Stdlib updates + +We only list stdlib changes that are notable feature changes. + +Other changes are listed in the following sections. We also listed release +history from the previous bundled version that is Ruby 3.4.0 if it has GitHub +releases. + +The following bundled gems are promoted from default gems. 
+ +* ostruct 0.6.3 + * 0.6.1 to [v0.6.2][ostruct-v0.6.2], [v0.6.3][ostruct-v0.6.3] +* pstore 0.2.0 + * 0.1.4 to [v0.2.0][pstore-v0.2.0] +* benchmark 0.5.0 + * 0.4.0 to [v0.4.1][benchmark-v0.4.1], [v0.5.0][benchmark-v0.5.0] +* logger 1.7.0 + * 1.6.4 to [v1.6.5][logger-v1.6.5], [v1.6.6][logger-v1.6.6], [v1.7.0][logger-v1.7.0] +* rdoc 7.0.3 + * 6.14.0 to [v6.14.1][rdoc-v6.14.1], [v6.14.2][rdoc-v6.14.2], [v6.15.0][rdoc-v6.15.0], [v6.15.1][rdoc-v6.15.1], [v6.16.0][rdoc-v6.16.0], [v6.16.1][rdoc-v6.16.1], [v6.17.0][rdoc-v6.17.0], [v7.0.0][rdoc-v7.0.0], [v7.0.1][rdoc-v7.0.1], [v7.0.2][rdoc-v7.0.2], [v7.0.3][rdoc-v7.0.3] +* win32ole 1.9.2 + * 1.9.1 to [v1.9.2][win32ole-v1.9.2] +* irb 1.16.0 + * 1.14.3 to [v1.15.0][irb-v1.15.0], [v1.15.1][irb-v1.15.1], [v1.15.2][irb-v1.15.2], [v1.15.3][irb-v1.15.3], [v1.16.0][irb-v1.16.0] +* reline 0.6.3 + * 0.6.0 to [v0.6.1][reline-v0.6.1], [v0.6.2][reline-v0.6.2], [v0.6.3][reline-v0.6.3] +* readline 0.0.4 +* fiddle 1.1.8 + * 1.1.6 to [v1.1.7][fiddle-v1.1.7], [v1.1.8][fiddle-v1.1.8] + +The following default gem is added. + +* win32-registry 0.1.2 + +The following default gems are updated. 
+ +* RubyGems 4.0.3 +* bundler 4.0.3 +* date 3.5.1 + * 3.4.1 to [v3.5.0][date-v3.5.0], [v3.5.1][date-v3.5.1] +* delegate 0.6.1 + * 0.4.0 to [v0.5.0][delegate-v0.5.0], [v0.6.0][delegate-v0.6.0], [v0.6.1][delegate-v0.6.1] +* digest 3.2.1 + * 3.2.0 to [v3.2.1][digest-v3.2.1] +* english 0.8.1 + * 0.8.0 to [v0.8.1][english-v0.8.1] +* erb 6.0.1 + * 4.0.4 to [v5.1.2][erb-v5.1.2], [v5.1.3][erb-v5.1.3], [v6.0.0][erb-v6.0.0], [v6.0.1][erb-v6.0.1] +* error_highlight 0.7.1 +* etc 1.4.6 +* fcntl 1.3.0 + * 1.2.0 to [v1.3.0][fcntl-v1.3.0] +* fileutils 1.8.0 + * 1.7.3 to [v1.8.0][fileutils-v1.8.0] +* forwardable 1.4.0 + * 1.3.3 to [v1.4.0][forwardable-v1.4.0] +* io-console 0.8.2 + * 0.8.1 to [v0.8.2][io-console-v0.8.2] +* io-nonblock 0.3.2 +* io-wait 0.4.0 + * 0.3.2 to [v0.3.3][io-wait-v0.3.3], [v0.3.5.test1][io-wait-v0.3.5.test1], [v0.3.5][io-wait-v0.3.5], [v0.3.6][io-wait-v0.3.6], [v0.4.0][io-wait-v0.4.0] +* ipaddr 1.2.8 +* json 2.18.0 + * 2.9.1 to [v2.10.0][json-v2.10.0], [v2.10.1][json-v2.10.1], [v2.10.2][json-v2.10.2], [v2.11.0][json-v2.11.0], [v2.11.1][json-v2.11.1], [v2.11.2][json-v2.11.2], [v2.11.3][json-v2.11.3], [v2.12.0][json-v2.12.0], [v2.12.1][json-v2.12.1], [v2.12.2][json-v2.12.2], [v2.13.0][json-v2.13.0], [v2.13.1][json-v2.13.1], [v2.13.2][json-v2.13.2], [v2.14.0][json-v2.14.0], [v2.14.1][json-v2.14.1], [v2.15.0][json-v2.15.0], [v2.15.1][json-v2.15.1], [v2.15.2][json-v2.15.2], [v2.16.0][json-v2.16.0], [v2.17.0][json-v2.17.0], [v2.17.1][json-v2.17.1], [v2.18.0][json-v2.18.0] +* net-http 0.9.1 + * 0.6.0 to [v0.7.0][net-http-v0.7.0], [v0.8.0][net-http-v0.8.0], [v0.9.0][net-http-v0.9.0], [v0.9.1][net-http-v0.9.1] +* openssl 4.0.0 + * 3.3.1 to [v3.3.2][openssl-v3.3.2], [v4.0.0][openssl-v4.0.0] +* optparse 0.8.1 + * 0.6.0 to [v0.7.0][optparse-v0.7.0], [v0.8.0][optparse-v0.8.0], [v0.8.1][optparse-v0.8.1] +* pp 0.6.3 + * 0.6.2 to [v0.6.3][pp-v0.6.3] +* prism 1.7.0 + * 1.5.2 to [v1.6.0][prism-v1.6.0], [v1.7.0][prism-v1.7.0] +* psych 5.3.1 + * 5.2.2 to [v5.2.3][psych-v5.2.3], 
[v5.2.4][psych-v5.2.4], [v5.2.5][psych-v5.2.5], [v5.2.6][psych-v5.2.6], [v5.3.0][psych-v5.3.0], [v5.3.1][psych-v5.3.1] +* resolv 0.7.0 + * 0.6.2 to [v0.6.3][resolv-v0.6.3], [v0.7.0][resolv-v0.7.0] +* stringio 3.2.0 + * 3.1.2 to [v3.1.3][stringio-v3.1.3], [v3.1.4][stringio-v3.1.4], [v3.1.5][stringio-v3.1.5], [v3.1.6][stringio-v3.1.6], [v3.1.7][stringio-v3.1.7], [v3.1.8][stringio-v3.1.8], [v3.1.9][stringio-v3.1.9], [v3.2.0][stringio-v3.2.0] +* strscan 3.1.6 + * 3.1.2 to [v3.1.3][strscan-v3.1.3], [v3.1.4][strscan-v3.1.4], [v3.1.5][strscan-v3.1.5], [v3.1.6][strscan-v3.1.6] +* time 0.4.2 + * 0.4.1 to [v0.4.2][time-v0.4.2] +* timeout 0.6.0 + * 0.4.3 to [v0.4.4][timeout-v0.4.4], [v0.5.0][timeout-v0.5.0], [v0.6.0][timeout-v0.6.0] +* uri 1.1.1 + * 1.0.4 to [v1.1.0][uri-v1.1.0], [v1.1.1][uri-v1.1.1] +* weakref 0.1.4 + * 0.1.3 to [v0.1.4][weakref-v0.1.4] +* zlib 3.2.2 + * 3.2.1 to [v3.2.2][zlib-v3.2.2] + +The following bundled gems are updated. + +* minitest 6.0.0 +* power_assert 3.0.1 + * 2.0.5 to [v3.0.0][power_assert-v3.0.0], [v3.0.1][power_assert-v3.0.1] +* rake 13.3.1 + * 13.2.1 to [v13.3.0][rake-v13.3.0], [v13.3.1][rake-v13.3.1] +* test-unit 3.7.5 + * 3.6.7 to [3.6.8][test-unit-3.6.8], [3.6.9][test-unit-3.6.9], [3.7.0][test-unit-3.7.0], [3.7.1][test-unit-3.7.1], [3.7.2][test-unit-3.7.2], [3.7.3][test-unit-3.7.3], [3.7.4][test-unit-3.7.4], [3.7.5][test-unit-3.7.5] +* rexml 3.4.4 +* rss 0.3.2 + * 0.3.1 to [0.3.2][rss-0.3.2] +* net-ftp 0.3.9 + * 0.3.8 to [v0.3.9][net-ftp-v0.3.9] +* net-imap 0.6.2 + * 0.5.8 to [v0.5.9][net-imap-v0.5.9], [v0.5.10][net-imap-v0.5.10], [v0.5.11][net-imap-v0.5.11], [v0.5.12][net-imap-v0.5.12], [v0.5.13][net-imap-v0.5.13], [v0.6.0][net-imap-v0.6.0], [v0.6.1][net-imap-v0.6.1], [v0.6.2][net-imap-v0.6.2] +* net-smtp 0.5.1 + * 0.5.0 to [v0.5.1][net-smtp-v0.5.1] +* matrix 0.4.3 + * 0.4.2 to [v0.4.3][matrix-v0.4.3] +* prime 0.1.4 + * 0.1.3 to [v0.1.4][prime-v0.1.4] +* rbs 3.10.0 + * 3.8.0 to [v3.8.1][rbs-v3.8.1], [v3.9.0.dev.1][rbs-v3.9.0.dev.1], 
[v3.9.0.pre.1][rbs-v3.9.0.pre.1], [v3.9.0.pre.2][rbs-v3.9.0.pre.2], [v3.9.0][rbs-v3.9.0], [v3.9.1][rbs-v3.9.1], [v3.9.2][rbs-v3.9.2], [v3.9.3][rbs-v3.9.3], [v3.9.4][rbs-v3.9.4], [v3.9.5][rbs-v3.9.5], [v3.10.0.pre.1][rbs-v3.10.0.pre.1], [v3.10.0.pre.2][rbs-v3.10.0.pre.2], [v3.10.0][rbs-v3.10.0] +* typeprof 0.31.1 +* debug 1.11.1 + * 1.11.0 to [v1.11.1][debug-v1.11.1] +* base64 0.3.0 + * 0.2.0 to [v0.3.0][base64-v0.3.0] +* bigdecimal 4.0.1 + * 3.1.8 to [v3.2.0][bigdecimal-v3.2.0], [v3.2.1][bigdecimal-v3.2.1], [v3.2.2][bigdecimal-v3.2.2], [v3.2.3][bigdecimal-v3.2.3], [v3.3.0][bigdecimal-v3.3.0], [v3.3.1][bigdecimal-v3.3.1], [v4.0.0][bigdecimal-v4.0.0], [v4.0.1][bigdecimal-v4.0.1] +* drb 2.2.3 + * 2.2.1 to [v2.2.3][drb-v2.2.3] +* syslog 0.3.0 + * 0.2.0 to [v0.3.0][syslog-v0.3.0] +* csv 3.3.5 + * 3.3.2 to [v3.3.3][csv-v3.3.3], [v3.3.4][csv-v3.3.4], [v3.3.5][csv-v3.3.5] +* repl_type_completor 0.1.12 + +### RubyGems and Bundler + +Ruby 4.0 bundled RubyGems and Bundler version 4. see the following links for details. + +* [Upgrading to RubyGems/Bundler 4 - RubyGems Blog](https://blog.rubygems.org/2025/12/03/upgrade-to-rubygems-bundler-4.html) +* [4.0.0 Released - RubyGems Blog](https://blog.rubygems.org/2025/12/03/4.0.0-released.html) +* [4.0.1 Released - RubyGems Blog](https://blog.rubygems.org/2025/12/09/4.0.1-released.html) +* [4.0.2 Released - RubyGems Blog](https://blog.rubygems.org/2025/12/17/4.0.2-released.html) +* [4.0.3 Released - RubyGems Blog](https://blog.rubygems.org/2025/12/23/4.0.3-released.html) + +## Supported platforms + +* Windows + + * Dropped support for MSVC versions older than 14.0 (_MSC_VER 1900). + This means Visual Studio 2015 or later is now required. + +## Compatibility issues + +* The following methods were removed from Ractor due to the addition of `Ractor::Port`: + + * `Ractor.yield` + * `Ractor#take` + * `Ractor#close_incoming` + * `Ractor#close_outgoing` + + [[Feature #21262]] + +* `ObjectSpace._id2ref` is deprecated. 
[[Feature #15408]] + +* `Process::Status#&` and `Process::Status#>>` have been removed. + They were deprecated in Ruby 3.3. [[Bug #19868]] + +* `rb_path_check` has been removed. This function was used for + `$SAFE` path checking which was removed in Ruby 2.7, + and was already deprecated. + [[Feature #20971]] + +* A backtrace for `ArgumentError` of "wrong number of arguments" now + include the receiver's class or module name (e.g., in `Foo#bar` + instead of in `bar`). [[Bug #21698]] + +* Backtraces no longer display `internal` frames. + These methods now appear as if it is in the Ruby source file, + consistent with other C-implemented methods. [[Bug #20968]] + + Before: + ``` + ruby -e '[1].fetch_values(42)' + <internal:array>:211:in 'Array#fetch': index 42 outside of array bounds: -1...1 (IndexError) + from <internal:array>:211:in 'block in Array#fetch_values' + from <internal:array>:211:in 'Array#map!' + from <internal:array>:211:in 'Array#fetch_values' + from -e:1:in '<main>' + ``` + + After: + ``` + $ ruby -e '[1].fetch_values(42)' + -e:1:in 'Array#fetch_values': index 42 outside of array bounds: -1...1 (IndexError) + from -e:1:in '<main>' + ``` + +## Stdlib compatibility issues + +* CGI library is removed from the default gems. Now we only provide `cgi/escape` for + the following methods: + + * `CGI.escape` and `CGI.unescape` + * `CGI.escapeHTML` and `CGI.unescapeHTML` + * `CGI.escapeURIComponent` and `CGI.unescapeURIComponent` + * `CGI.escapeElement` and `CGI.unescapeElement` + + [[Feature #21258]] + +* With the move of `Set` from stdlib to core class, `set/sorted_set.rb` has + been removed, and `SortedSet` is no longer an autoloaded constant. Please + install the `sorted_set` gem and `require 'sorted_set'` to use `SortedSet`. 
+ [[Feature #21287]] + +* Net::HTTP + + * The default behavior of automatically setting the `Content-Type` header + to `application/x-www-form-urlencoded` for requests with a body + (e.g., `POST`, `PUT`) when the header was not explicitly set has been + removed. If your application relied on this automatic default, your + requests will now be sent without a Content-Type header, potentially + breaking compatibility with certain servers. + [[GH-net-http #205]] + +## C API updates + +* IO + + * `rb_thread_fd_close` is deprecated and now a no-op. If you need to expose + file descriptors from C extensions to Ruby code, create an `IO` instance + using `RUBY_IO_MODE_EXTERNAL` and use `rb_io_close(io)` to close it (this + also interrupts and waits for all pending operations on the `IO` + instance). Directly closing file descriptors does not interrupt pending + operations, and may lead to undefined behaviour. In other words, if two + `IO` objects share the same file descriptor, closing one does not affect + the other. [[Feature #18455]] + +* GVL + + * `rb_thread_call_with_gvl` now works with or without the GVL. + This allows gems to avoid checking `ruby_thread_has_gvl_p`. + Please still be diligent about the GVL. [[Feature #20750]] + +* Set + + * A C API for `Set` has been added. The following methods are supported: + [[Feature #21459]] + + * `rb_set_foreach` + * `rb_set_new` + * `rb_set_new_capa` + * `rb_set_lookup` + * `rb_set_add` + * `rb_set_clear` + * `rb_set_delete` + * `rb_set_size` + +## Implementation improvements + +* `Class#new` (ex. `Object.new`) is faster in all cases, but especially when passing keyword arguments. This has also been integrated into YJIT and ZJIT. [[Feature #21254]] +* GC heaps of different size pools now grow independently, reducing memory usage when only some pools contain long-lived objects +* GC sweeping is faster on pages of large objects +* "Generic ivar" objects (String, Array, `TypedData`, etc.) 
now use a new internal "fields" object for faster instance variable access +* The GC avoids maintaining an internal `id2ref` table until it is first used, making `object_id` allocation and GC sweeping faster +* `object_id` and `hash` are faster on Class and Module objects +* Larger bignum Integers can remain embedded using variable width allocation +* `Random`, `Enumerator::Product`, `Enumerator::Chain`, `Addrinfo`, + `StringScanner`, and some internal objects are now write-barrier protected, + which reduces GC overhead. + +### Ractor + +A lot of work has gone into making Ractors more stable, performant, and usable. These improvements bring Ractor implementation closer to leaving experimental status. + +* Performance improvements + * Frozen strings and the symbol table internally use a lock-free hash set [[Feature #21268]] + * Method cache lookups avoid locking in most cases + * Class (and generic ivar) instance variable access is faster and avoids locking + * CPU cache contention is avoided in object allocation by using a per-ractor counter + * CPU cache contention is avoided in xmalloc/xfree by using a thread-local counter + * `object_id` avoids locking in most cases +* Bug fixes and stability + * Fixed possible deadlocks when combining Ractors and Threads + * Fixed issues with require and autoload in a Ractor + * Fixed encoding/transcoding issues across Ractors + * Fixed race conditions in GC operations and method invalidation + * Fixed issues with processes forking after starting a Ractor + * GC allocation counts are now accurate under Ractors + * Fixed TracePoints not working after GC [[Bug #19112]] + +## JIT + +* ZJIT + * Introduce an [experimental method-based JIT compiler](https://docs.ruby-lang.org/en/master/jit/zjit_md.html). + Where available, ZJIT can be enabled at runtime with the `--zjit` option or by calling `RubyVM::ZJIT.enable`. + When building Ruby, Rust 1.85.0 or later is required to include ZJIT support. 
+ * As of Ruby 4.0.0, ZJIT is faster than the interpreter, but not yet as fast as YJIT. + We encourage experimentation with ZJIT, but advise against deploying it in production for now. + * Our goal is to make ZJIT faster than YJIT and production-ready in Ruby 4.1. +* YJIT + * `RubyVM::YJIT.runtime_stats` + * `ratio_in_yjit` no longer works in the default build. + Use `--enable-yjit=stats` on `configure` to enable it on `--yjit-stats`. + * Add `invalidate_everything` to default stats, which is + incremented when every code is invalidated by TracePoint. + * Add `mem_size:` and `call_threshold:` options to `RubyVM::YJIT.enable`. +* RJIT + * `--rjit` is removed. We will move the implementation of the third-party JIT API + to the [ruby/rjit](https://github.com/ruby/rjit) repository. + +[Feature #15408]: https://bugs.ruby-lang.org/issues/15408 +[Feature #17473]: https://bugs.ruby-lang.org/issues/17473 +[Feature #18455]: https://bugs.ruby-lang.org/issues/18455 +[Bug #19112]: https://bugs.ruby-lang.org/issues/19112 +[Feature #19630]: https://bugs.ruby-lang.org/issues/19630 +[Bug #19868]: https://bugs.ruby-lang.org/issues/19868 +[Feature #19908]: https://bugs.ruby-lang.org/issues/19908 +[Feature #20610]: https://bugs.ruby-lang.org/issues/20610 +[Feature #20724]: https://bugs.ruby-lang.org/issues/20724 +[Feature #20750]: https://bugs.ruby-lang.org/issues/20750 +[Feature #20884]: https://bugs.ruby-lang.org/issues/20884 +[Feature #20925]: https://bugs.ruby-lang.org/issues/20925 +[Bug #20968]: https://bugs.ruby-lang.org/issues/20968 +[Feature #20971]: https://bugs.ruby-lang.org/issues/20971 +[Bug #20974]: https://bugs.ruby-lang.org/issues/20974 +[Feature #21047]: https://bugs.ruby-lang.org/issues/21047 +[Bug #21049]: https://bugs.ruby-lang.org/issues/21049 +[Feature #21166]: https://bugs.ruby-lang.org/issues/21166 +[Bug #21174]: https://bugs.ruby-lang.org/issues/21174 +[Bug #21175]: https://bugs.ruby-lang.org/issues/21175 +[Bug #21185]: https://bugs.ruby-lang.org/issues/21185 
+[Feature #21205]: https://bugs.ruby-lang.org/issues/21205 +[Feature #21216]: https://bugs.ruby-lang.org/issues/21216 +[Feature #21219]: https://bugs.ruby-lang.org/issues/21219 +[Feature #21254]: https://bugs.ruby-lang.org/issues/21254 +[Feature #21258]: https://bugs.ruby-lang.org/issues/21258 +[Feature #21268]: https://bugs.ruby-lang.org/issues/21268 +[Feature #21262]: https://bugs.ruby-lang.org/issues/21262 +[Feature #21275]: https://bugs.ruby-lang.org/issues/21275 +[Feature #21287]: https://bugs.ruby-lang.org/issues/21287 +[Feature #21311]: https://bugs.ruby-lang.org/issues/21311 +[Feature #21347]: https://bugs.ruby-lang.org/issues/21347 +[Feature #21360]: https://bugs.ruby-lang.org/issues/21360 +[Misc #21385]: https://bugs.ruby-lang.org/issues/21385 +[Feature #21389]: https://bugs.ruby-lang.org/issues/21389 +[Feature #21390]: https://bugs.ruby-lang.org/issues/21390 +[Feature #21459]: https://bugs.ruby-lang.org/issues/21459 +[Feature #21527]: https://bugs.ruby-lang.org/issues/21527 +[Feature #21543]: https://bugs.ruby-lang.org/issues/21543 +[Feature #21550]: https://bugs.ruby-lang.org/issues/21550 +[Feature #21552]: https://bugs.ruby-lang.org/issues/21552 +[Feature #21557]: https://bugs.ruby-lang.org/issues/21557 +[Bug #21633]: https://bugs.ruby-lang.org/issues/21633 +[Bug #21654]: https://bugs.ruby-lang.org/issues/21654 +[Feature #21678]: https://bugs.ruby-lang.org/issues/21678 +[Bug #21698]: https://bugs.ruby-lang.org/issues/21698 +[Feature #21701]: https://bugs.ruby-lang.org/issues/21701 +[Bug #21789]: https://bugs.ruby-lang.org/issues/21789 +[GH-net-http #205]: https://github.com/ruby/net-http/issues/205 +[ostruct-v0.6.2]: https://github.com/ruby/ostruct/releases/tag/v0.6.2 +[ostruct-v0.6.3]: https://github.com/ruby/ostruct/releases/tag/v0.6.3 +[pstore-v0.2.0]: https://github.com/ruby/pstore/releases/tag/v0.2.0 +[benchmark-v0.4.1]: https://github.com/ruby/benchmark/releases/tag/v0.4.1 +[benchmark-v0.5.0]: https://github.com/ruby/benchmark/releases/tag/v0.5.0 
+[logger-v1.6.5]: https://github.com/ruby/logger/releases/tag/v1.6.5 +[logger-v1.6.6]: https://github.com/ruby/logger/releases/tag/v1.6.6 +[logger-v1.7.0]: https://github.com/ruby/logger/releases/tag/v1.7.0 +[rdoc-v6.14.1]: https://github.com/ruby/rdoc/releases/tag/v6.14.1 +[rdoc-v6.14.2]: https://github.com/ruby/rdoc/releases/tag/v6.14.2 +[rdoc-v6.15.0]: https://github.com/ruby/rdoc/releases/tag/v6.15.0 +[rdoc-v6.15.1]: https://github.com/ruby/rdoc/releases/tag/v6.15.1 +[rdoc-v6.16.0]: https://github.com/ruby/rdoc/releases/tag/v6.16.0 +[rdoc-v6.16.1]: https://github.com/ruby/rdoc/releases/tag/v6.16.1 +[rdoc-v6.17.0]: https://github.com/ruby/rdoc/releases/tag/v6.17.0 +[rdoc-v7.0.0]: https://github.com/ruby/rdoc/releases/tag/v7.0.0 +[rdoc-v7.0.1]: https://github.com/ruby/rdoc/releases/tag/v7.0.1 +[rdoc-v7.0.2]: https://github.com/ruby/rdoc/releases/tag/v7.0.2 +[rdoc-v7.0.3]: https://github.com/ruby/rdoc/releases/tag/v7.0.3 +[win32ole-v1.9.2]: https://github.com/ruby/win32ole/releases/tag/v1.9.2 +[irb-v1.15.0]: https://github.com/ruby/irb/releases/tag/v1.15.0 +[irb-v1.15.1]: https://github.com/ruby/irb/releases/tag/v1.15.1 +[irb-v1.15.2]: https://github.com/ruby/irb/releases/tag/v1.15.2 +[irb-v1.15.3]: https://github.com/ruby/irb/releases/tag/v1.15.3 +[irb-v1.16.0]: https://github.com/ruby/irb/releases/tag/v1.16.0 +[reline-v0.6.1]: https://github.com/ruby/reline/releases/tag/v0.6.1 +[reline-v0.6.2]: https://github.com/ruby/reline/releases/tag/v0.6.2 +[reline-v0.6.3]: https://github.com/ruby/reline/releases/tag/v0.6.3 +[fiddle-v1.1.7]: https://github.com/ruby/fiddle/releases/tag/v1.1.7 +[fiddle-v1.1.8]: https://github.com/ruby/fiddle/releases/tag/v1.1.8 +[date-v3.5.0]: https://github.com/ruby/date/releases/tag/v3.5.0 +[date-v3.5.1]: https://github.com/ruby/date/releases/tag/v3.5.1 +[delegate-v0.5.0]: https://github.com/ruby/delegate/releases/tag/v0.5.0 +[delegate-v0.6.0]: https://github.com/ruby/delegate/releases/tag/v0.6.0 +[delegate-v0.6.1]: 
https://github.com/ruby/delegate/releases/tag/v0.6.1 +[digest-v3.2.1]: https://github.com/ruby/digest/releases/tag/v3.2.1 +[english-v0.8.1]: https://github.com/ruby/english/releases/tag/v0.8.1 +[erb-v5.1.2]: https://github.com/ruby/erb/releases/tag/v5.1.2 +[erb-v5.1.3]: https://github.com/ruby/erb/releases/tag/v5.1.3 +[erb-v6.0.0]: https://github.com/ruby/erb/releases/tag/v6.0.0 +[erb-v6.0.1]: https://github.com/ruby/erb/releases/tag/v6.0.1 +[fcntl-v1.3.0]: https://github.com/ruby/fcntl/releases/tag/v1.3.0 +[fileutils-v1.8.0]: https://github.com/ruby/fileutils/releases/tag/v1.8.0 +[forwardable-v1.4.0]: https://github.com/ruby/forwardable/releases/tag/v1.4.0 +[io-console-v0.8.2]: https://github.com/ruby/io-console/releases/tag/v0.8.2 +[io-wait-v0.3.3]: https://github.com/ruby/io-wait/releases/tag/v0.3.3 +[io-wait-v0.3.5.test1]: https://github.com/ruby/io-wait/releases/tag/v0.3.5.test1 +[io-wait-v0.3.5]: https://github.com/ruby/io-wait/releases/tag/v0.3.5 +[io-wait-v0.3.6]: https://github.com/ruby/io-wait/releases/tag/v0.3.6 +[io-wait-v0.4.0]: https://github.com/ruby/io-wait/releases/tag/v0.4.0 +[json-v2.10.0]: https://github.com/ruby/json/releases/tag/v2.10.0 +[json-v2.10.1]: https://github.com/ruby/json/releases/tag/v2.10.1 +[json-v2.10.2]: https://github.com/ruby/json/releases/tag/v2.10.2 +[json-v2.11.0]: https://github.com/ruby/json/releases/tag/v2.11.0 +[json-v2.11.1]: https://github.com/ruby/json/releases/tag/v2.11.1 +[json-v2.11.2]: https://github.com/ruby/json/releases/tag/v2.11.2 +[json-v2.11.3]: https://github.com/ruby/json/releases/tag/v2.11.3 +[json-v2.12.0]: https://github.com/ruby/json/releases/tag/v2.12.0 +[json-v2.12.1]: https://github.com/ruby/json/releases/tag/v2.12.1 +[json-v2.12.2]: https://github.com/ruby/json/releases/tag/v2.12.2 +[json-v2.13.0]: https://github.com/ruby/json/releases/tag/v2.13.0 +[json-v2.13.1]: https://github.com/ruby/json/releases/tag/v2.13.1 +[json-v2.13.2]: https://github.com/ruby/json/releases/tag/v2.13.2 +[json-v2.14.0]: 
https://github.com/ruby/json/releases/tag/v2.14.0 +[json-v2.14.1]: https://github.com/ruby/json/releases/tag/v2.14.1 +[json-v2.15.0]: https://github.com/ruby/json/releases/tag/v2.15.0 +[json-v2.15.1]: https://github.com/ruby/json/releases/tag/v2.15.1 +[json-v2.15.2]: https://github.com/ruby/json/releases/tag/v2.15.2 +[json-v2.16.0]: https://github.com/ruby/json/releases/tag/v2.16.0 +[json-v2.17.0]: https://github.com/ruby/json/releases/tag/v2.17.0 +[json-v2.17.1]: https://github.com/ruby/json/releases/tag/v2.17.1 +[json-v2.18.0]: https://github.com/ruby/json/releases/tag/v2.18.0 +[net-http-v0.7.0]: https://github.com/ruby/net-http/releases/tag/v0.7.0 +[net-http-v0.8.0]: https://github.com/ruby/net-http/releases/tag/v0.8.0 +[net-http-v0.9.0]: https://github.com/ruby/net-http/releases/tag/v0.9.0 +[net-http-v0.9.1]: https://github.com/ruby/net-http/releases/tag/v0.9.1 +[openssl-v3.3.2]: https://github.com/ruby/openssl/releases/tag/v3.3.2 +[openssl-v4.0.0]: https://github.com/ruby/openssl/releases/tag/v4.0.0 +[optparse-v0.7.0]: https://github.com/ruby/optparse/releases/tag/v0.7.0 +[optparse-v0.8.0]: https://github.com/ruby/optparse/releases/tag/v0.8.0 +[optparse-v0.8.1]: https://github.com/ruby/optparse/releases/tag/v0.8.1 +[pp-v0.6.3]: https://github.com/ruby/pp/releases/tag/v0.6.3 +[prism-v1.6.0]: https://github.com/ruby/prism/releases/tag/v1.6.0 +[prism-v1.7.0]: https://github.com/ruby/prism/releases/tag/v1.7.0 +[psych-v5.2.3]: https://github.com/ruby/psych/releases/tag/v5.2.3 +[psych-v5.2.4]: https://github.com/ruby/psych/releases/tag/v5.2.4 +[psych-v5.2.5]: https://github.com/ruby/psych/releases/tag/v5.2.5 +[psych-v5.2.6]: https://github.com/ruby/psych/releases/tag/v5.2.6 +[psych-v5.3.0]: https://github.com/ruby/psych/releases/tag/v5.3.0 +[psych-v5.3.1]: https://github.com/ruby/psych/releases/tag/v5.3.1 +[resolv-v0.6.3]: https://github.com/ruby/resolv/releases/tag/v0.6.3 +[resolv-v0.7.0]: https://github.com/ruby/resolv/releases/tag/v0.7.0 +[stringio-v3.1.3]: 
https://github.com/ruby/stringio/releases/tag/v3.1.3 +[stringio-v3.1.4]: https://github.com/ruby/stringio/releases/tag/v3.1.4 +[stringio-v3.1.5]: https://github.com/ruby/stringio/releases/tag/v3.1.5 +[stringio-v3.1.6]: https://github.com/ruby/stringio/releases/tag/v3.1.6 +[stringio-v3.1.7]: https://github.com/ruby/stringio/releases/tag/v3.1.7 +[stringio-v3.1.8]: https://github.com/ruby/stringio/releases/tag/v3.1.8 +[stringio-v3.1.9]: https://github.com/ruby/stringio/releases/tag/v3.1.9 +[stringio-v3.2.0]: https://github.com/ruby/stringio/releases/tag/v3.2.0 +[strscan-v3.1.3]: https://github.com/ruby/strscan/releases/tag/v3.1.3 +[strscan-v3.1.4]: https://github.com/ruby/strscan/releases/tag/v3.1.4 +[strscan-v3.1.5]: https://github.com/ruby/strscan/releases/tag/v3.1.5 +[strscan-v3.1.6]: https://github.com/ruby/strscan/releases/tag/v3.1.6 +[time-v0.4.2]: https://github.com/ruby/time/releases/tag/v0.4.2 +[timeout-v0.4.4]: https://github.com/ruby/timeout/releases/tag/v0.4.4 +[timeout-v0.5.0]: https://github.com/ruby/timeout/releases/tag/v0.5.0 +[timeout-v0.6.0]: https://github.com/ruby/timeout/releases/tag/v0.6.0 +[uri-v1.1.0]: https://github.com/ruby/uri/releases/tag/v1.1.0 +[uri-v1.1.1]: https://github.com/ruby/uri/releases/tag/v1.1.1 +[weakref-v0.1.4]: https://github.com/ruby/weakref/releases/tag/v0.1.4 +[zlib-v3.2.2]: https://github.com/ruby/zlib/releases/tag/v3.2.2 +[power_assert-v3.0.0]: https://github.com/ruby/power_assert/releases/tag/v3.0.0 +[power_assert-v3.0.1]: https://github.com/ruby/power_assert/releases/tag/v3.0.1 +[rake-v13.3.0]: https://github.com/ruby/rake/releases/tag/v13.3.0 +[rake-v13.3.1]: https://github.com/ruby/rake/releases/tag/v13.3.1 +[test-unit-3.6.8]: https://github.com/test-unit/test-unit/releases/tag/3.6.8 +[test-unit-3.6.9]: https://github.com/test-unit/test-unit/releases/tag/3.6.9 +[test-unit-3.7.0]: https://github.com/test-unit/test-unit/releases/tag/3.7.0 +[test-unit-3.7.1]: https://github.com/test-unit/test-unit/releases/tag/3.7.1 
+[test-unit-3.7.2]: https://github.com/test-unit/test-unit/releases/tag/3.7.2 +[test-unit-3.7.3]: https://github.com/test-unit/test-unit/releases/tag/3.7.3 +[test-unit-3.7.4]: https://github.com/test-unit/test-unit/releases/tag/3.7.4 +[test-unit-3.7.5]: https://github.com/test-unit/test-unit/releases/tag/3.7.5 +[rss-0.3.2]: https://github.com/ruby/rss/releases/tag/0.3.2 +[net-ftp-v0.3.9]: https://github.com/ruby/net-ftp/releases/tag/v0.3.9 +[net-imap-v0.5.9]: https://github.com/ruby/net-imap/releases/tag/v0.5.9 +[net-imap-v0.5.10]: https://github.com/ruby/net-imap/releases/tag/v0.5.10 +[net-imap-v0.5.11]: https://github.com/ruby/net-imap/releases/tag/v0.5.11 +[net-imap-v0.5.12]: https://github.com/ruby/net-imap/releases/tag/v0.5.12 +[net-imap-v0.5.13]: https://github.com/ruby/net-imap/releases/tag/v0.5.13 +[net-imap-v0.6.0]: https://github.com/ruby/net-imap/releases/tag/v0.6.0 +[net-imap-v0.6.1]: https://github.com/ruby/net-imap/releases/tag/v0.6.1 +[net-imap-v0.6.2]: https://github.com/ruby/net-imap/releases/tag/v0.6.2 +[net-smtp-v0.5.1]: https://github.com/ruby/net-smtp/releases/tag/v0.5.1 +[matrix-v0.4.3]: https://github.com/ruby/matrix/releases/tag/v0.4.3 +[prime-v0.1.4]: https://github.com/ruby/prime/releases/tag/v0.1.4 +[rbs-v3.8.1]: https://github.com/ruby/rbs/releases/tag/v3.8.1 +[rbs-v3.9.0.dev.1]: https://github.com/ruby/rbs/releases/tag/v3.9.0.dev.1 +[rbs-v3.9.0.pre.1]: https://github.com/ruby/rbs/releases/tag/v3.9.0.pre.1 +[rbs-v3.9.0.pre.2]: https://github.com/ruby/rbs/releases/tag/v3.9.0.pre.2 +[rbs-v3.9.0]: https://github.com/ruby/rbs/releases/tag/v3.9.0 +[rbs-v3.9.1]: https://github.com/ruby/rbs/releases/tag/v3.9.1 +[rbs-v3.9.2]: https://github.com/ruby/rbs/releases/tag/v3.9.2 +[rbs-v3.9.3]: https://github.com/ruby/rbs/releases/tag/v3.9.3 +[rbs-v3.9.4]: https://github.com/ruby/rbs/releases/tag/v3.9.4 +[rbs-v3.9.5]: https://github.com/ruby/rbs/releases/tag/v3.9.5 +[rbs-v3.10.0.pre.1]: https://github.com/ruby/rbs/releases/tag/v3.10.0.pre.1 
+[rbs-v3.10.0.pre.2]: https://github.com/ruby/rbs/releases/tag/v3.10.0.pre.2 +[rbs-v3.10.0]: https://github.com/ruby/rbs/releases/tag/v3.10.0 +[debug-v1.11.1]: https://github.com/ruby/debug/releases/tag/v1.11.1 +[base64-v0.3.0]: https://github.com/ruby/base64/releases/tag/v0.3.0 +[bigdecimal-v3.2.0]: https://github.com/ruby/bigdecimal/releases/tag/v3.2.0 +[bigdecimal-v3.2.1]: https://github.com/ruby/bigdecimal/releases/tag/v3.2.1 +[bigdecimal-v3.2.2]: https://github.com/ruby/bigdecimal/releases/tag/v3.2.2 +[bigdecimal-v3.2.3]: https://github.com/ruby/bigdecimal/releases/tag/v3.2.3 +[bigdecimal-v3.3.0]: https://github.com/ruby/bigdecimal/releases/tag/v3.3.0 +[bigdecimal-v3.3.1]: https://github.com/ruby/bigdecimal/releases/tag/v3.3.1 +[bigdecimal-v4.0.0]: https://github.com/ruby/bigdecimal/releases/tag/v4.0.0 +[bigdecimal-v4.0.1]: https://github.com/ruby/bigdecimal/releases/tag/v4.0.1 +[drb-v2.2.3]: https://github.com/ruby/drb/releases/tag/v2.2.3 +[syslog-v0.3.0]: https://github.com/ruby/syslog/releases/tag/v0.3.0 +[csv-v3.3.3]: https://github.com/ruby/csv/releases/tag/v3.3.3 +[csv-v3.3.4]: https://github.com/ruby/csv/releases/tag/v3.3.4 +[csv-v3.3.5]: https://github.com/ruby/csv/releases/tag/v3.3.5 diff --git a/doc/_regexp.rdoc b/doc/_regexp.rdoc index c9f3742241..4ad6118ddd 100644 --- a/doc/_regexp.rdoc +++ b/doc/_regexp.rdoc @@ -26,20 +26,20 @@ A regexp may be used: re.match('good') # => nil See sections {Method match}[rdoc-ref:Regexp@Method+match] - and {Operator =~}[rdoc-ref:Regexp@Operator+-3D~]. + and {Operator =~}[rdoc-ref:Regexp@Operator-]. - To determine whether a string matches a given pattern: re.match?('food') # => true re.match?('good') # => false - See section {Method match?}[rdoc-ref:Regexp@Method+match-3F]. + See section {Method match?}[rdoc-ref:Regexp@Method+match]. - As an argument for calls to certain methods in other classes and modules; most such methods accept an argument that may be either a string or the (much more powerful) regexp. 
- See {Regexp Methods}[rdoc-ref:regexp/methods.rdoc]. + See {Regexp Methods}[rdoc-ref:language/regexp/methods.rdoc]. == \Regexp Objects @@ -64,7 +64,7 @@ A regular expression may be created with: /foo/ # => /foo/ - A <tt>%r</tt> regexp literal - (see {%r: Regexp Literals}[rdoc-ref:syntax/literals.rdoc@25r-3A+Regexp+Literals]): + (see {%r: Regexp Literals}[rdoc-ref:syntax/literals.rdoc@r-regexp+literals]): # Same delimiter character at beginning and end; # useful for avoiding escaping characters @@ -113,7 +113,7 @@ none sets {global variables}[rdoc-ref:Regexp@Global+Variables]: Certain regexp-oriented methods assign values to global variables: - <tt>#match</tt>: see {Method match}[rdoc-ref:Regexp@Method+match]. -- <tt>#=~</tt>: see {Operator =~}[rdoc-ref:Regexp@Operator+-3D~]. +- <tt>#=~</tt>: see {Operator =~}[rdoc-ref:Regexp@Operator-]. The affected global variables are: @@ -414,21 +414,21 @@ Each of these anchors matches a boundary: Lookahead anchors: -- <tt>(?=_pat_)</tt>: Positive lookahead assertion: +- <tt>(?=pat)</tt>: Positive lookahead assertion: ensures that the following characters match _pat_, but doesn't include those characters in the matched substring. -- <tt>(?!_pat_)</tt>: Negative lookahead assertion: +- <tt>(?!pat)</tt>: Negative lookahead assertion: ensures that the following characters <i>do not</i> match _pat_, but doesn't include those characters in the matched substring. Lookbehind anchors: -- <tt>(?<=_pat_)</tt>: Positive lookbehind assertion: +- <tt>(?<=pat)</tt>: Positive lookbehind assertion: ensures that the preceding characters match _pat_, but doesn't include those characters in the matched substring. -- <tt>(?<!_pat_)</tt>: Negative lookbehind assertion: +- <tt>(?<!pat)</tt>: Negative lookbehind assertion: ensures that the preceding characters do not match _pat_, but doesn't include those characters in the matched substring. 
@@ -561,9 +561,9 @@ Quantifier matching may be greedy, lazy, or possessive: More: - About greedy and lazy matching, see - {Choosing Minimal or Maximal Repetition}[https://doc.lagout.org/programmation/Regular%20Expressions/Regular%20Expressions%20Cookbook_%20Detailed%20Solutions%20in%20Eight%20Programming%20Languages%20%282nd%20ed.%29%20%5BGoyvaerts%20%26%20Levithan%202012-09-06%5D.pdf#tutorial-backtrack]. + {Choosing Minimal or Maximal Repetition}[https://www.oreilly.com/library/view/regular-expressions-cookbook/9780596802837/ch02s13.html]. - About possessive matching, see - {Eliminate Needless Backtracking}[https://doc.lagout.org/programmation/Regular%20Expressions/Regular%20Expressions%20Cookbook_%20Detailed%20Solutions%20in%20Eight%20Programming%20Languages%20%282nd%20ed.%29%20%5BGoyvaerts%20%26%20Levithan%202012-09-06%5D.pdf#tutorial-backtrack]. + {Eliminate Needless Backtracking}[https://www.oreilly.com/library/view/regular-expressions-cookbook/9780596802837/ch02s14.html]. === Groups and Captures @@ -574,7 +574,7 @@ A simple regexp has (at most) one match: re.match('1943-02-04').size # => 1 re.match('foo') # => nil -Adding one or more pairs of parentheses, <tt>(_subexpression_)</tt>, +Adding one or more pairs of parentheses, <tt>(subexpression)</tt>, defines _groups_, which may result in multiple matched substrings, called _captures_: @@ -647,8 +647,8 @@ A regexp may contain any number of groups: - For a large number of groups: - - The ordinary <tt>\\_n_</tt> notation applies only for _n_ in range (1..9). - - The <tt>MatchData[_n_]</tt> notation applies for any non-negative _n_. + - The ordinary <tt>\\n</tt> notation applies only for _n_ in range (1..9). + - The <tt>MatchData[n]</tt> notation applies for any non-negative _n_. 
- <tt>\0</tt> is a special backreference, referring to the entire matched string; it may not be used within the regexp itself, @@ -661,7 +661,7 @@ A regexp may contain any number of groups: As seen above, a capture can be referred to by its number. A capture can also have a name, -prefixed as <tt>?<_name_></tt> or <tt>?'_name_'</tt>, +prefixed as <tt>?<name></tt> or <tt>?'name'</tt>, and the name (symbolized) may be used as an index in <tt>MatchData[]</tt>: md = /\$(?<dollars>\d+)\.(?'cents'\d+)/.match("$3.67") @@ -676,7 +676,7 @@ When a regexp contains a named capture, there are no unnamed captures: /\$(?<dollars>\d+)\.(\d+)/.match("$3.67") # => #<MatchData "$3.67" dollars:"3"> -A named group may be backreferenced as <tt>\k<_name_></tt>: +A named group may be backreferenced as <tt>\k<name></tt>: /(?<vowel>[aeiou]).\k<vowel>.\k<vowel>/.match('ototomy') # => #<MatchData "ototo" vowel:"o"> @@ -713,7 +713,7 @@ Analysis: 1. The leading subexpression <tt>"</tt> in the pattern matches the first character <tt>"</tt> in the target string. -2. The next subexpression <tt>.*</tt> matches the next substring <tt>Quote“</tt> +2. The next subexpression <tt>.*</tt> matches the next substring <tt>Quote"</tt> (including the trailing double-quote). 3. Now there is nothing left in the target string to match the trailing subexpression <tt>"</tt> in the pattern; @@ -732,10 +732,10 @@ see {Atomic Group}[https://www.regular-expressions.info/atomic.html]. ==== Subexpression Calls -As seen above, a backreference number (<tt>\\_n_</tt>) or name (<tt>\k<_name_></tt>) +As seen above, a backreference number (<tt>\\n</tt>) or name (<tt>\k<name></tt>) gives access to a captured _substring_; the corresponding regexp _subexpression_ may also be accessed, -via the number (<tt>\\g<i>n</i></tt>) or name (<tt>\g<_name_></tt>): +via the number n (<tt>\\gn</tt>) or name (<tt>\g<name></tt>): /\A(?<paren>\(\g<paren>*\))*\z/.match('(())') # ^1 @@ -764,16 +764,16 @@ The pattern: 9. 
Matches the fourth character in the string, <tt>')'</tt>. 10. Matches the end of the string. -See {Subexpression calls}[https://learnbyexample.github.io/Ruby_Regexp/groupings-and-backreferences.html?highlight=subexpression#subexpression-calls]. +See {Subexpression calls}[https://learnbyexample.github.io/Ruby_Regexp/groupings-and-backreferences.html#subexpression-calls]. ==== Conditionals -The conditional construct takes the form <tt>(?(_cond_)_yes_|_no_)</tt>, where: +The conditional construct takes the form <tt>(?(cond)yes|no)</tt>, where: - _cond_ may be a capture number or name. - The match to be applied is _yes_ if _cond_ is captured; otherwise the match to be applied is _no_. -- If not needed, <tt>|_no_</tt> may be omitted. +- If not needed, <tt>|no</tt> may be omitted. Examples: @@ -802,7 +802,7 @@ The absence operator is a special group that matches anything which does _not_ m ==== Unicode Properties -The <tt>/\p{_property_name_}/</tt> construct (with lowercase +p+) +The <tt>/\p{property_name}/</tt> construct (with lowercase +p+) matches characters using a Unicode property name, much like a character class; property +Alpha+ specifies alphabetic characters: @@ -821,7 +821,7 @@ Or by using <tt>\P</tt> (uppercase +P+): /\P{Alpha}/.match('1') # => #<MatchData "1"> /\P{Alpha}/.match('a') # => nil -See {Unicode Properties}[rdoc-ref:regexp/unicode_properties.rdoc] +See {Unicode Properties}[rdoc-ref:language/regexp/unicode_properties.rdoc] for regexps based on the numerous properties. Some commonly-used properties correspond to POSIX bracket expressions: @@ -930,7 +930,7 @@ Punctuation: - +C+, +Other+: +Cc+, +Cf+, +Cn+, +Co+, or +Cs+. - {Cc, Control}[https://www.compart.com/en/unicode/category/Cc]. - {Cf, Format}[https://www.compart.com/en/unicode/category/Cf]. -- {Cn, Unassigned}[https://www.compart.com/en/unicode/category/Cn]. +- {Cn, Unassigned}[http://zuga.net/articles/unicode/category/unassigned/]. 
- {Co, Private_Use}[https://www.compart.com/en/unicode/category/Co]. - {Cs, Surrogate}[https://www.compart.com/en/unicode/category/Cs]. @@ -1033,23 +1033,23 @@ See also {Extended Mode}[rdoc-ref:Regexp@Extended+Mode]. Each of these modifiers sets a mode for the regexp: -- +i+: <tt>/_pattern_/i</tt> sets +- +i+: <tt>/pattern/i</tt> sets {Case-Insensitive Mode}[rdoc-ref:Regexp@Case-Insensitive+Mode]. -- +m+: <tt>/_pattern_/m</tt> sets +- +m+: <tt>/pattern/m</tt> sets {Multiline Mode}[rdoc-ref:Regexp@Multiline+Mode]. -- +x+: <tt>/_pattern_/x</tt> sets +- +x+: <tt>/pattern/x</tt> sets {Extended Mode}[rdoc-ref:Regexp@Extended+Mode]. -- +o+: <tt>/_pattern_/o</tt> sets +- +o+: <tt>/pattern/o</tt> sets {Interpolation Mode}[rdoc-ref:Regexp@Interpolation+Mode]. Any, all, or none of these may be applied. Modifiers +i+, +m+, and +x+ may be applied to subexpressions: -- <tt>(?_modifier_)</tt> turns the mode "on" for ensuing subexpressions -- <tt>(?-_modifier_)</tt> turns the mode "off" for ensuing subexpressions -- <tt>(?_modifier_:_subexp_)</tt> turns the mode "on" for _subexp_ within the group -- <tt>(?-_modifier_:_subexp_)</tt> turns the mode "off" for _subexp_ within the group +- <tt>(?modifier)</tt> turns the mode "on" for ensuing subexpressions +- <tt>(?-modifier)</tt> turns the mode "off" for ensuing subexpressions +- <tt>(?modifier:subexp)</tt> turns the mode "on" for _subexp_ within the group +- <tt>(?-modifier:subexp)</tt> turns the mode "off" for _subexp_ within the group Example: @@ -1128,6 +1128,13 @@ Regexp in extended mode: re = /#{pattern}/x re.match('MCMXLIII') # => #<MatchData "MCMXLIII" 1:"CM" 2:"XL" 3:"III"> +Comments in regexp literals cannot include unescaped terminator +characters: + + / + foo # the following slash \/ must be escaped + /x + === Interpolation Mode Modifier +o+ means that the first time a literal regexp with interpolations @@ -1166,22 +1173,22 @@ A regular expression containing non-US-ASCII characters is assumed to use the source encoding. 
This can be overridden with one of the following modifiers. -- <tt>/_pat_/n</tt>: US-ASCII if only containing US-ASCII characters, +- <tt>/pat/n</tt>: US-ASCII if only containing US-ASCII characters, otherwise ASCII-8BIT: /foo/n.encoding # => #<Encoding:US-ASCII> /foo\xff/n.encoding # => #<Encoding:ASCII-8BIT> /foo\x7f/n.encoding # => #<Encoding:US-ASCII> -- <tt>/_pat_/u</tt>: UTF-8 +- <tt>/pat/u</tt>: UTF-8 /foo/u.encoding # => #<Encoding:UTF-8> -- <tt>/_pat_/e</tt>: EUC-JP +- <tt>/pat/e</tt>: EUC-JP /foo/e.encoding # => #<Encoding:EUC-JP> -- <tt>/_pat_/s</tt>: Windows-31J +- <tt>/pat/s</tt>: Windows-31J /foo/s.encoding # => #<Encoding:Windows-31J> @@ -1251,7 +1258,7 @@ the potential vulnerability arising from this is the {regular expression denial- \Regexp matching can apply an optimization to prevent ReDoS attacks. When the optimization is applied, matching time increases linearly (not polynomially or exponentially) -in relation to the input size, and a ReDoS attach is not possible. +in relation to the input size, and a ReDoS attack is not possible. This optimization is applied if the pattern meets these criteria: @@ -1272,13 +1279,13 @@ because the optimization uses memoization (which may invoke large memory consump == References -Read (online PDF books): +Read: -- {Mastering Regular Expressions}[https://ia902508.us.archive.org/10/items/allitebooks-02/Mastering%20Regular%20Expressions%2C%203rd%20Edition.pdf] +- <i>Mastering Regular Expressions</i> by Jeffrey E.F. Friedl. -- {Regular Expressions Cookbook}[https://doc.lagout.org/programmation/Regular%20Expressions/Regular%20Expressions%20Cookbook_%20Detailed%20Solutions%20in%20Eight%20Programming%20Languages%20%282nd%20ed.%29%20%5BGoyvaerts%20%26%20Levithan%202012-09-06%5D.pdf] +- <i>Regular Expressions Cookbook</i> by Jan Goyvaerts & Steven Levithan. -Explore, test (interactive online editor): +Explore, test: -- {Rubular}[https://rubular.com/]. +- {Rubular}[https://rubular.com/]: interactive online editor. 
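For reference, a few of the constructs whose markup changes in `doc/_regexp.rdoc` above can be exercised with a short Ruby sketch. The patterns for named captures, named backreferences, and subexpression calls are taken from the hunks above. The `foobar` sample strings and the variable names are assumptions chosen here for illustration.

```ruby
# Lookahead and lookbehind assert context without including it in the match.
ahead  = 'foobar'[/foo(?=bar)/]    # matches "foo" only when "bar" follows
behind = 'foobar'[/(?<=foo)bar/]   # matches "bar" only when "foo" precedes

# Named captures: ?<name> and ?'name' are equivalent spellings.
md = /\$(?<dollars>\d+)\.(?'cents'\d+)/.match("$3.67")

# A named group may be backreferenced as \k<name>.
repeated = /(?<vowel>[aeiou]).\k<vowel>.\k<vowel>/.match('ototomy')

# A subexpression call \g<name> re-runs the group's pattern (here recursively).
balanced = /\A(?<paren>\(\g<paren>*\))*\z/

# In extended mode, a slash inside a comment must still be escaped,
# because it would otherwise terminate the regexp literal.
commented = /
  foo # the following slash \/ must be escaped
/x

p ahead, behind, md[:dollars], md[:cents], repeated[0]
p balanced.match?('(())'), balanced.match?('(()'), commented.match?('foo')
```

The sketch only covers behavior described in the documentation being patched; it does not depend on anything new in this commit.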
diff --git a/doc/_timezones.rdoc b/doc/_timezones.rdoc index a2ac46584f..945654c163 100644 --- a/doc/_timezones.rdoc +++ b/doc/_timezones.rdoc @@ -11,7 +11,7 @@ Certain +Time+ methods accept arguments that specify timezones: The value given with any of these must be one of the following (each detailed below): -- {Hours/minutes offset}[rdoc-ref:Time@Hours-2FMinutes+Offsets]. +- {Hours/minutes offset}[rdoc-ref:Time@HoursMinutes+Offsets]. - {Single-letter offset}[rdoc-ref:Time@Single-Letter+Offsets]. - {Integer offset}[rdoc-ref:Time@Integer+Offsets]. - {Timezone object}[rdoc-ref:Time@Timezone+Objects]. diff --git a/doc/command_injection.rdoc b/doc/command_injection.rdoc deleted file mode 100644 index ee33d4a04e..0000000000 --- a/doc/command_injection.rdoc +++ /dev/null @@ -1,37 +0,0 @@ -= Command Injection - -Some Ruby core methods accept string data -that includes text to be executed as a system command. - -They should not be called with unknown or unsanitized commands. - -These methods include: - -- Kernel.exec -- Kernel.spawn -- Kernel.system -- {\`command` (backtick method)}[rdoc-ref:Kernel#`] - (also called by the expression <tt>%x[command]</tt>). -- IO.popen (when called with other than <tt>"-"</tt>). - -Some methods execute a system command only if the given path name starts -with a <tt>|</tt>: - -- Kernel.open(command). -- IO.read(command). -- IO.write(command). -- IO.binread(command). -- IO.binwrite(command). -- IO.readlines(command). -- IO.foreach(command). -- URI.open(command). - -Note that some of these methods do not execute commands when called -from subclass +File+: - -- File.read(path). -- File.write(path). -- File.binread(path). -- File.binwrite(path). -- File.readlines(path). -- File.foreach(path). 
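The advice in the removed `doc/command_injection.rdoc` text (do not pass unknown or unsanitized strings to methods such as `Kernel.system` or `IO.popen`) can be sketched as follows. This is a minimal illustration, not part of the patch: the `untrusted` value is made up, and spawning the current interpreter through `RbConfig.ruby` is just a portable stand-in for an external command. It relies on the documented behavior that the array form of a command is run without a shell.

```ruby
require 'rbconfig'
require 'shellwords'

# Hypothetical attacker-influenced input containing a shell metacharacter.
untrusted = 'hello; echo injected'

# Array form: no shell is involved, so the whole string arrives as one
# argument and the ';' is never interpreted as a command separator.
echoed = IO.popen([RbConfig.ruby, '-e', 'print ARGV[0]', untrusted], &:read)

# If a single command-line string is unavoidable, escape each piece first.
escaped = Shellwords.escape(untrusted)

p echoed
p escaped
```

Preferring the array form avoids quoting rules altogether; escaping is a fallback for APIs that only accept one command-line string.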
diff --git a/doc/command_line/environment.md b/doc/command_line/environment.md deleted file mode 100644 index 8f6d595f6c..0000000000 --- a/doc/command_line/environment.md +++ /dev/null @@ -1,174 +0,0 @@ -## Environment - -Certain command-line options affect the execution environment -of the invoked Ruby program. - -### About the Examples - -The examples here use command-line option `-e`, -which passes the Ruby code to be executed on the command line itself: - -```console -$ ruby -e 'puts "Hello, World."' -``` - -### Option `-C` - -The argument to option `-C` specifies a working directory -for the invoked Ruby program; -does not change the working directory for the current process: - -```console -$ basename `pwd` -ruby -$ ruby -C lib -e 'puts File.basename(Dir.pwd)' -lib -$ basename `pwd` -ruby -``` - -Whitespace between the option and its argument may be omitted. - -### Option `-I` - -The argument to option `-I` specifies a directory -to be added to the array in global variable `$LOAD_PATH`; -the option may be given more than once: - -```console -$ pushd /tmp -$ ruby -e 'p $LOAD_PATH.size' -8 -$ ruby -I my_lib -I some_lib -e 'p $LOAD_PATH.size' -10 -$ ruby -I my_lib -I some_lib -e 'p $LOAD_PATH.take(2)' -["/tmp/my_lib", "/tmp/some_lib"] -$ popd -``` - -Whitespace between the option and its argument may be omitted. - -### Option `-r` - -The argument to option `-r` specifies a library to be required -before executing the Ruby program; -the option may be given more than once: - -```console -$ ruby -e 'p defined?(JSON); p defined?(CSV)' -nil -nil -$ ruby -r CSV -r JSON -e 'p defined?(JSON); p defined?(CSV)' -"constant" -"constant" -``` - -Whitespace between the option and its argument may be omitted. - -### Option `-0` - -Option `-0` defines the input record separator `$/` -for the invoked Ruby program. 
- -The optional argument to the option must be octal digits, -each in the range `0..7`; -these digits are prefixed with digit `0` to form an octal value: - -- If no argument is given, the input record separator is `0x00`. -- If the argument is `0`, the input record separator is `''`; - see {Special Line Separator Values}[rdoc-ref:IO@Special+Line+Separator+Values]. -- If the argument is in range `(1..0377)`, - it becomes the character value of the input record separator `$/`. -- Otherwise, the input record separator is `nil`. - -Examples: - -```console -$ ruby -0 -e 'p $/' -"\x00" -$ ruby -00 -e 'p $/' -"" -$ ruby -012 -e 'p $/' -"\n" -$ ruby -015 -e 'p $/' -"\r" -$ ruby -0377 -e 'p $/' -"\xFF" -$ ruby -0400 -e 'p $/' -nil -``` - -The option may not be separated from its argument by whitespace. - -### Option `-d` - -Some code in (or called by) the Ruby program may include statements or blocks -conditioned by the global variable `$DEBUG` (e.g., `if $DEBUG`); -these commonly write to `$stdout` or `$stderr`. - -The default value for `$DEBUG` is `false`; -option `-d` (or `--debug`) sets it to `true`: - -```console -$ ruby -e 'p $DEBUG' -false -$ ruby -d -e 'p $DEBUG' -true -``` - -### Option '-w' - -Option `-w` (lowercase letter) is equivalent to option `-W1` (uppercase letter). - -### Option `-W` - -Any Ruby code can create a <i>warning message</i> by calling method Kernel#warn; -methods in the Ruby core and standard libraries can also create warning messages. -Such a message may be printed on `$stderr` -(or not, depending on certain settings). - -Option `-W` helps determine whether a particular warning message -will be written, -by setting the initial value of global variable `$-W`: - -- `-W0`: Sets `$-W` to `0` (silent; no warnings). -- `-W1`: Sets `$-W` to `1` (moderate verbosity). -- `-W2`: Sets `$-W` to `2` (high verbosity). -- `-W`: Same as `-W2` (high verbosity). -- Option not given: Same as `-W1` (moderate verbosity). 
- -The value of `$-W`, in turn, determines which warning messages (if any) -are to be printed to `$stdout` (see Kernel#warn): - -```console -$ ruby -W1 -e 'p $foo' -nil -$ ruby -W2 -e 'p $foo' --e:1: warning: global variable '$foo' not initialized -nil -``` - -Ruby code may also define warnings for certain categories; -these are the default settings for the defined categories: - -```ruby -Warning[:experimental] # => true -Warning[:deprecated] # => false -Warning[:performance] # => false -``` - -They may also be set: - -```ruby -Warning[:experimental] = false -Warning[:deprecated] = true -Warning[:performance] = true -``` - -You can suppress a category by prefixing `no-` to the category name: - -```console -$ ruby -W:no-experimental -e 'p IO::Buffer.new' -#<IO::Buffer> -``` - diff --git a/doc/bug_triaging.rdoc b/doc/contributing/bug_triaging.rdoc index 83fe88cabe..83fe88cabe 100644 --- a/doc/bug_triaging.rdoc +++ b/doc/contributing/bug_triaging.rdoc diff --git a/doc/contributing/building_ruby.md b/doc/contributing/building_ruby.md index eac83fc00e..a283a2f3db 100644 --- a/doc/contributing/building_ruby.md +++ b/doc/contributing/building_ruby.md @@ -17,11 +17,11 @@ * [autoconf] - 2.67 or later * [gperf] - 3.1 or later * Usually unneeded; only if you edit some source files using gperf - * ruby - 3.0 or later + * ruby - 3.1 or later * We can upgrade this version to system ruby version of the latest Ubuntu LTS. * git - 2.32 or later - * Anterior versions may work; 2.32 or later will prevent build + * Earlier versions may work; 2.32 or later will prevent build errors in case your system `.gitconfig` uses `$HOME` paths. 2. Install optional, recommended dependencies: @@ -151,7 +151,7 @@ ruby ├── build # Created in step 2 and populated in step 4 │ ├── GNUmakefile # Generated by `../configure` │ ├── Makefile # Generated by `../configure` -│ ├── object.o # Compiled object file, built my `make` +│ ├── object.o # Compiled object file, built by `make` │ └── ... 
other compiled `.o` object files │ │ # Other interesting files: @@ -184,7 +184,7 @@ cause build failures. ## Building on Windows The documentation for building on Windows can be found in [the separated -file](../windows.md). +file](../distribution/windows.md). ## More details diff --git a/doc/contributing/concurrency_guide.md b/doc/contributing/concurrency_guide.md new file mode 100644 index 0000000000..1fb58f7203 --- /dev/null +++ b/doc/contributing/concurrency_guide.md @@ -0,0 +1,154 @@ +# Concurrency Guide + +This is a guide to thinking about concurrency in the cruby source code, whether that's contributing to Ruby +by writing C or by contributing to one of the JITs. This does not touch on native extensions, only the core +language. It will go over: + +* What needs synchronizing? +* How to use the VM lock, and what you can and can't do when you've acquired this lock. +* What you can and can't do when you've acquired other native locks. +* The difference between the VM lock and the GVL. +* What a VM barrier is and when to use it. +* The lock ordering of some important locks. +* How ruby interrupt handling works. +* The timer thread and what it's responsible for. + +## What needs synchronizing? + +Before ractors, only one ruby thread could run at once. That didn't mean you could forget about concurrency issues, though. The timer thread +is a native thread that interacts with other ruby threads and changes some VM internals, so if these changes can be done in parallel by both the timer +thread and a ruby thread, they need to be synchronized. + +When you add ractors to the mix, it gets more complicated. However, ractors allow you to forget about synchronization for non-shareable objects because +they aren't used across ractors. Only one ruby thread can touch the object at once. For shareable objects, they are deeply frozen so there isn't any +mutation on the objects themselves. 
However, something like reading/writing constants across ractors does need to be synchronized. In this case, ruby threads need to +see a consistent view of the VM. If publishing the update takes 2 steps or even two separate instructions, like in this case, synchronization is required. + +Most synchronization is to protect VM internals. These internals include structures for the thread scheduler on each ractor, the global ractor scheduler, the +coordination between ruby threads and ractors, global tables (for `fstrings`, encodings, symbols and global vars), etc. Anything that can be mutated by a ractor +that can also be read or mutated by another ractor at the same time requires proper synchronization. + +## The VM Lock + +There's only one VM lock and it is for critical sections that can only be entered by one ractor at a time. +Without ractors, the VM lock is useless. It does not stop all ractors from running, as ractors can run +without trying to acquire this lock. If you're updating global (shared) data between ractors and aren't using +atomics, you need to use a lock and this is a convenient one to use. Unlike other locks, you can allocate ruby-managed +memory with it held. When you take the VM lock, there are things you can and can't do during your critical section: + +You can (as long as no other locks are also held before the VM lock): + +* Create ruby objects, call `ruby_xmalloc`, etc. + +You can't: + +* Context switch to another ruby thread or ractor. This is important, as many things can cause ruby-level context switches including: + + * Calling any ruby method through, for example, `rb_funcall`. If you execute ruby code, a context switch could happen. + This also applies to ruby methods defined in C, as they can be redefined in Ruby. Things that call ruby methods such as + `rb_obj_respond_to` are also disallowed. + + * Calling `rb_raise`. This will call `initialize` on the new exception object. 
With the VM lock + held, nothing you call should be able to raise an exception. `NoMemoryError` is allowed, however. + + * Calling `rb_nogvl` or a ruby-level mechanism that can context switch like `rb_mutex_lock`. + + * Entering any blocking operation managed by ruby. This will context switch to another ruby thread using `rb_nogvl` or + something equivalent. A blocking operation is one that blocks the thread's progress, such as `sleep` or `IO#read`. + +Internally, the VM lock is the `vm->ractor.sync.lock`. + +You need to be on a ruby thread to take the VM lock. You also can't take it inside any functions that could be called during sweeping, as MMTk sweeps +on another thread and you need a valid `ec` to grab the lock. For this same reason (among others), you can't take it from the timer thread either. + +## Other Locks + +All native locks that aren't the VM lock share a stricter set of rules for what's allowed during the critical section. By native locks, we mean +anything that uses `rb_native_mutex_lock`. Some important locks include the `interrupt_lock`, the ractor scheduling lock (protects global scheduling data structures), +the thread scheduling lock (local to each ractor, protects per-ractor scheduling data structures) and the ractor lock (local to each ractor, protects ractor data structures). + +When you acquire one of these locks, + +You can: + +* Allocate memory through non-ruby allocation such as raw `malloc` or the standard library. But be careful, some functions like `strdup` use +ruby allocation through the use of macros! + +* Use `ccan` lists, as they don't allocate. + +* Do the usual things like set variables or struct fields, manipulate linked lists, signal condition variables etc. + +You can't: + +* Allocate ruby-managed memory. This includes creating ruby objects or using `ruby_xmalloc` or `st_insert`. 
The reason this +is disallowed is if that allocation causes a GC, then all other ruby threads must join a VM barrier as soon as possible +(when they next check interrupts or acquire the VM lock). This is so that no other ractors are running during GC. If a ruby thread +is waiting (blocked) on this same native lock, it can't join the barrier and a deadlock occurs because the barrier will never finish. + +* Raise exceptions. You also can't use `EC_JUMP_TAG` if it jumps out of the critical section. + +* Context switch. See the `VM Lock` section for more info. + +## Difference Between VM Lock and GVL + +The VM Lock is a particular lock in the source code. There is only one VM Lock. The GVL, on the other hand, is more of a combination of locks. +It is "acquired" when a ruby thread is about to run or is running. Since many ruby threads can run at the same time if they're in different ractors, +there are many GVLs (1 per `SNT` + 1 for the main ractor). It can no longer be thought of as a "Global VM Lock" like it once was before ractors. + +## VM Barriers + +Sometimes, taking the VM Lock isn't enough and you need a guarantee that all ractors have stopped. This happens when running `GC`, for instance. +To get a barrier, you take the VM Lock and call `rb_vm_barrier()`. For the duration that the VM lock is held, no other ractors will be running. It's not used +often as taking a barrier slows ractor performance down considerably, but it's useful to know about and is sometimes the only solution. + +## Lock Orderings + +It's a good idea to not hold more than 2 locks at once on the same thread. Locking multiple locks can introduce deadlocks, so do it with care. When locking +multiple locks at once, follow an ordering that is consistent across the program, otherwise you can introduce deadlocks. 
Here are the orderings of some important locks: + +* VM lock before ractor_sched_lock +* thread_sched_lock before ractor_sched_lock +* interrupt_lock before timer_th.waiting_lock +* timer_th.waiting_lock before ractor_sched_lock + +These orderings are subject to change, so check the source if you're not sure. On top of this: + +* During each `ubf` (unblock) function, the VM lock can be taken around it in some circumstances. This happens during VM shutdown, for example. +See the "Ruby Interrupt Handling" section for more details. + +## Ruby Interrupt Handling + +When the VM runs ruby code, ruby's threads intermittently check ruby-level interrupts. These software interrupts +are for various things in ruby and they can be set by other ruby threads or the timer thread. + +* Ruby threads check when they should give up their timeslice. The native thread switches to another ruby thread when their time is up. +* The timer thread sends a "trap" interrupt to the main thread if any ruby-level signal handlers are pending. +* Ruby threads can have other ruby threads run tasks for them by sending them an interrupt. For instance, ractors send +the main thread an interrupt when they need to `require` a file so that it's done on the main thread. They wait for the +main thread's result. +* During VM shutdown, a "terminate" interrupt is sent to all ractor main threads to stop them as soon as possible. +* When calling `Thread#raise`, the caller sends an interrupt to that thread telling it which exception to raise. +* Unlocking a mutex sends the next waiter (if any) an interrupt telling it to grab the lock. +* Signalling or broadcasting on a condition variable tells the waiter(s) to wake up. + +This isn't a complete list. + +When sending an interrupt to a ruby thread, the ruby thread can be blocked. For example, it could be in the middle of a `TCPSocket#read` call. If so, +the receiving thread's `ubf` (unblock function) gets called from the thread (ruby thread or timer thread) that sent the interrupt. 
+Each ruby thread has a `ubf` that is set when it enters a blocking operation and is unset after returning from it. By default, this `ubf` function sends a +`SIGVTALRM` to the receiving thread to try to unblock it from the kernel so it can check its interrupts. There are other `ubfs` that +aren't associated with a syscall, such as when calling `Ractor#join` or `sleep`. All `ubfs` are called with the `interrupt_lock` held, +so take that into account when using locks inside `ubfs`. + +Remember, `ubfs` can be called from the timer thread so you cannot assume an `ec` inside them. The `ec` (execution context) is only set on ruby threads. + +## The Timer Thread + +The timer thread has a few functions. They are: + +* Send interrupts to ruby threads that have run for their whole timeslice. +* Wake up M:N ruby threads (threads in non-main ractors) blocked on IO or after a specified timeout. This +uses `kqueue` or `epoll`, depending on the OS, to receive IO events on behalf of the threads. +* Continue sending the `SIGVTALRM` signal if a thread is still blocked on a syscall after the first `ubf` call. +* Signal native threads (`SNT`) waiting on a ractor if there are ractors waiting in the global run queue. +* Create more `SNT`s if some are blocked, like on IO or on `Ractor#join`. diff --git a/doc/contributing/documentation_guide.md b/doc/contributing/documentation_guide.md index 8a73543e6c..7c73ad1c50 100644 --- a/doc/contributing/documentation_guide.md +++ b/doc/contributing/documentation_guide.md @@ -6,8 +6,8 @@ in the Ruby core and in the Ruby standard library. ## Generating documentation -Most Ruby documentation lives in the source files and is written in -[RDoc format](https://ruby.github.io/rdoc/RDoc/MarkupReference.html). +Most Ruby documentation lives in the source files, and is written in RDoc format +(described in the [RDoc Markup Reference]). 
Some pages live under the `doc` folder and can be written in either `.rdoc` or `.md` format, determined by the file extension. @@ -20,12 +20,19 @@ build directory: make html ``` +Or, to start a live-reloading server that automatically refreshes +the browser when you edit source files: + +```sh +make html-server +``` + +Then visit http://localhost:4000 in your browser. +To use a different port: `make html-server RDOC_SERVER_PORT=8080`. + If you don't have a build directory, follow the [quick start guide](building_ruby.md#label-Quick+start+guide) up to step 4. -Then you can preview your changes by opening -`{build folder}/.ext/html/index.html` file in your browser. - ## Goal The goal of Ruby documentation is to impart the most important @@ -43,14 +50,12 @@ Use your judgment about what the user needs to know. - Write short declarative or imperative sentences. - Group sentences into (ideally short) paragraphs, each covering a single topic. -- Organize material with - [headings]. -- Refer to authoritative and relevant sources using - [links](https://ruby.github.io/rdoc/RDoc/MarkupReference.html#class-RDoc::MarkupReference-label-Links). +- Organize material with [headings]. +- Refer to authoritative and relevant sources using [links]. - Use simple verb tenses: simple present, simple past, simple future. - Use simple sentence structure, not compound or complex structure. - Avoid: - - Excessive comma-separated phrases; consider a [list]. + - Excessive comma-separated phrases; consider a [list][lists]. - Idioms and culture-specific references. - Overuse of headings. - Using US-ASCII-incompatible characters in C source files; @@ -105,16 +110,16 @@ involving new files `doc/*.rdoc`: */ ``` -### \RDoc +### RDoc Ruby is documented using RDoc. -For information on \RDoc syntax and features, see the -[RDoc Markup Reference](https://ruby.github.io/rdoc/RDoc/MarkupReference.html). +For information on RDoc syntax and features, +see the [RDoc Markup Reference]. 
### Output from `irb` For code examples, consider using interactive Ruby, -[irb](https://ruby-doc.org/stdlib/libdoc/irb/rdoc/IRB.html). +[irb]. For a code example that includes `irb` output, consider aligning `# => ...` in successive lines. @@ -133,16 +138,15 @@ Organize a long discussion for a class or module with [headings]. Do not use formal headings in the documentation for a method or constant. In the rare case where heading-like structures are needed -within the documentation for a method or constant, use -[bold text](https://ruby.github.io/rdoc/RDoc/MarkupReference.html#class-RDoc::MarkupReference-label-Bold) -as pseudo-headings. +within the documentation for a method or constant, +use [bold text] as pseudo-headings. ### Blank Lines A blank line begins a new paragraph. -A [code block](https://ruby.github.io/rdoc/RDoc/MarkupReference.html#class-RDoc::MarkupReference-label-Code+Blocks) -or [list] should be preceded by and followed by a blank line. +A [code block][code blocks] +or [list][lists] should be preceded by and followed by a blank line. This is unnecessary for the HTML output, but helps in the `ri` output. ### \Method Names @@ -185,7 +189,7 @@ renders as: > - File.new > - File#read. -In general, \RDoc's auto-linking should not be suppressed. +In general, RDoc's auto-linking should not be suppressed. For example, we should write just plain _Float_ (which is auto-linked): ```rdoc @@ -265,16 +269,16 @@ and _never_ when referring to the class itself. When writing an explicit link, follow these guidelines. -#### +rdoc-ref+ Scheme +#### `rdoc-ref` Scheme -Use the +rdoc-ref+ scheme for: +Use the `rdoc-ref` scheme for: - A link in core documentation to other core documentation. - A link in core documentation to documentation in a standard library package. - A link in a standard library package to other documentation in that same standard library package. -See section "+rdoc-ref+ Scheme" in [links]. +See section "`rdoc-ref` Scheme" in [links]. 
#### URL-Based Link @@ -291,13 +295,37 @@ The link should lead to a target in https://docs.ruby-lang.org/en/master/. Also use a full URL-based link for a link to an off-site document. +#### Fragments + +In general, a link that includes a [fragment][fragment] +must cite the exact identifier on the target page; +otherwise, the browser finds no suitable identifier, +and does not scroll to the desired part of the page. + +However, certain pages on `github.com` and `github.io` +support "fuzzy" identifier matching, so that URL +https://github.com/rdp/ruby_tutorials_core/wiki/Ruby-Talk-FAQ#-why-are-rubys-floats-imprecise, +(whose fragment is `-why-are-rubys-floats-imprecise`) +scrolls to heading "Why are ruby’s floats imprecise?" +even though the identifier there actually is the longer +`#user-content--why-are-rubys-floats-imprecise`. + +Ruby documentation should avoid using these shortened fragments, for two reasons: + +- The GitHub pages that do this implement it using Javascript; + if the user's browser has Javascript disabled + (which some employers actually require), + the shortened fragment is ineffective and the desired scrolling does not occur. +- A program that checks links in Ruby documentation will find no suitable identifier, + and therefore will report the fragment as not found. + ### Variable Names The name of a variable (as specified in its call-seq) should be marked up as [monofont]. Also, use monofont text for the name of a transient variable -(i.e., one defined and used only in the discussion, such as +n+). +(i.e., one defined and used only in the discussion, such as `n`). 
### HTML Tags @@ -312,13 +340,12 @@ In particular, avoid building tables with HTML tags Alternatives: -- A {verbatim text block}[https://ruby.github.io/rdoc/RDoc/MarkupReference.html#class-RDoc::MarkupReference-label-Verbatim+Text+Blocks], +- A [verbatim text block][verbatim text blocks], using spaces and punctuation to format the text; - note that {text markup}[https://ruby.github.io/rdoc/RDoc/MarkupReference.html#class-RDoc::MarkupReference-label-Text+Markup] - will not be honored: + note that [text markup][text markup] will not be honored: - Example {source}[https://github.com/ruby/ruby/blob/34d802f32f00df1ac0220b62f72605827c16bad8/file.c#L6570-L6596]. - - Corresponding {output}[rdoc-ref:File@Read-2FWrite+Mode]. + - Corresponding {output}[rdoc-ref:File@ReadWrite+Mode]. - (Markdown format only): A {Github Flavored Markdown (GFM) table}[https://github.github.com/gfm/#tables-extension-], using special formatting for the text: @@ -326,6 +353,16 @@ Alternatives: - Example {source}[https://github.com/ruby/ruby/blob/34d802f32f00df1ac0220b62f72605827c16bad8/doc/contributing/glossary.md?plain=1]. - Corresponding {output}[https://docs.ruby-lang.org/en/master/contributing/glossary_md.html]. +### Languages in Examples + +For symbols and strings in documentation examples: + +- Prefer \English in \English documentation: <tt>'Hello'</tt>. +- Prefer Japanese in Japanese documentation: <tt>'こんにちは'</tt>. +- If a second language is needed (for example, to show characters with different byte sizes), + prefer Japanese in \English documentation and \English in Japanese documentation. +- Use examples in other languages only as necessary: see String#capitalize. + ## Documenting Classes and Modules The general structure of the class or module documentation should be: @@ -364,9 +401,9 @@ Guidelines: and a short description. - If the method has aliases, mention them in parentheses before the colon (and do not list the aliases separately). 
- - Check the rendered documentation to determine whether \RDoc has recognized + - Check the rendered documentation to determine whether RDoc has recognized the method and linked to it; if not, manually insert a - [link](https://ruby.github.io/rdoc/RDoc/MarkupReference.html#class-RDoc::MarkupReference-label-Links). + [link][links]. - If there are numerous entries, consider grouping them into subsections with headings. - If there are more than a few such subsections, @@ -388,11 +425,11 @@ The general structure of the method documentation should be: ### Calling Sequence (for methods written in C) -For methods written in Ruby, \RDoc documents the calling sequence automatically. +For methods written in Ruby, RDoc documents the calling sequence automatically. -For methods written in C, \RDoc cannot determine what arguments -the method accepts, so those need to be documented using \RDoc directive -[`call-seq:`](https://ruby.github.io/rdoc/RDoc/MarkupReference.html#class-RDoc::MarkupReference-label-Directives+for+Method+Documentation). +For methods written in C, RDoc cannot determine what arguments +the method accepts, so those need to be documented using RDoc directive +[`call-seq:`][call-seq] For a singleton method, use the form: @@ -481,10 +518,10 @@ Return types: - If the method can return multiple different types, separate the types with "or" and, if necessary, commas. -- If the method can return multiple types, use +object+. -- If the method returns the receiver, use +self+. +- If the method can return multiple types, use `object`. +- If the method returns the receiver, use `self`. - If the method returns an object of the same class, - prefix `new_` if and only if the object is not +self+; + prefix `new_` if and only if the object is not `self`; example: `new_array`. Aliases: @@ -567,7 +604,7 @@ argument passed if it is not obvious, not explicitly mentioned in the details, and not implicitly shown in the examples. 
If there is more than one argument or block argument, use a -[labeled list](https://ruby.github.io/rdoc/RDoc/MarkupReference.html#class-RDoc::MarkupReference-label-Labeled+Lists). +[labeled list][lists]. ### Corner Cases and Exceptions @@ -608,6 +645,15 @@ For methods that accept multiple argument types, in some cases it can be useful to document the different argument types separately. It's best to use a separate paragraph for each case you are discussing. -[headings]: https://ruby.github.io/rdoc/RDoc/MarkupReference.html#class-RDoc::MarkupReference-label-Headings -[list]: https://ruby.github.io/rdoc/RDoc/MarkupReference.html#class-RDoc::MarkupReference-label-Lists -[monofont]: https://ruby.github.io/rdoc/RDoc/MarkupReference.html#class-RDoc::MarkupReference-label-Monofont +[bold text]: https://ruby.github.io/rdoc/doc/markup_reference/rdoc_rdoc.html#bold +[call-seq]: https://ruby.github.io/rdoc/doc/markup_reference/rdoc_rdoc.html#directive-for-specifying-rdoc-source-format +[code blocks]: https://ruby.github.io/rdoc/doc/markup_reference/rdoc_rdoc.html#code-blocks +[fragment]: https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Fragment +[headings]: https://ruby.github.io/rdoc/doc/markup_reference/rdoc_rdoc.html#headings +[irb]: https://ruby.github.io/irb/index.html +[links]: https://ruby.github.io/rdoc/doc/markup_reference/rdoc_rdoc.html#links +[lists]: https://ruby.github.io/rdoc/doc/markup_reference/rdoc_rdoc.html#lists +[monofont]: https://ruby.github.io/rdoc/doc/markup_reference/rdoc_rdoc.html#monofont +[RDoc Markup Reference]: https://ruby.github.io/rdoc/doc/markup_reference/rdoc_rdoc.html +[text markup]: https://ruby.github.io/rdoc/doc/markup_reference/rdoc_rdoc.html#text-markup +[verbatim text blocks]: https://ruby.github.io/rdoc/doc/markup_reference/rdoc_rdoc.html#verbatim-text-blocks diff --git a/doc/dtrace_probes.rdoc b/doc/contributing/dtrace_probes.rdoc index 1b20597ab4..1b20597ab4 100644 --- a/doc/dtrace_probes.rdoc +++ 
b/doc/contributing/dtrace_probes.rdoc diff --git a/doc/contributing/glossary.md b/doc/contributing/glossary.md index 6f9c335028..3ec9796147 100644 --- a/doc/contributing/glossary.md +++ b/doc/contributing/glossary.md @@ -4,15 +4,17 @@ Just a list of acronyms I've run across in the Ruby source code and their meanin | Term | Definition | | --- | -----------| +| `bmethod` | Method defined by `define_method() {}` (a Block that runs as a Method). | | `BIN` | Basic Instruction Name. Used as a macro to reference the YARV instruction. Converts pop into YARVINSN_pop. | | `bop` | Basic Operator. Relates to methods like `Integer` plus and minus which can be optimized as long as they haven't been redefined. | | `cc` | Call Cache. An inline cache structure for the call site. Stored in the `cd` | | `cd` | Call Data. A data structure that points at the `ci` and the `cc`. `iseq` objects points at the `cd`, and access call information and call caches via this structure | | CFG | Control Flow Graph. Representation of the program where all control-flow and data dependencies have been made explicit by unrolling the stack and local variables. | | `cfp`| Control Frame Pointer. Represents a Ruby stack frame. Calling a method pushes a new frame (cfp), returning pops a frame. Points at the `pc`, `sp`, `ep`, and the corresponding `iseq`| -| `ci` | Call Information. Refers to an `rb_callinfo` struct. Contains call information about the call site, including number of parameters to be passed, whether it they are keyword arguments or not, etc. Used in conjunction with the `cc` and `cd`. | +| `ci` | Call Information. Refers to an `rb_callinfo` struct. Contains call information about the call site, including number of parameters to be passed, whether they are keyword arguments or not, etc. Used in conjunction with the `cc` and `cd`. | +| `cme` | Callable Method Entry. 
Refers to the `rb_callable_method_entry_t` struct, the internal representation of a Ruby method that has `defined_class` and `owner` set and is ready for dispatch. | | `cref` | Class reference. A structure pointing to the class reference where `klass_or_self`, visibility scope, and refinements are stored. It also stores a pointer to the next class in the hierarchy referenced by `rb_cref_struct * next`. The Class reference is lexically scoped. | -| CRuby | Implementation of Ruby written in C | +| CRuby | Reference implementation of Ruby written in C | | `cvar` | Class Variable. Refers to a Ruby class variable like `@@foo` | | `dvar` | Dynamic Variable. Used by the parser to refer to local variables that are defined outside of the current lexical scope. For example `def foo; bar = 1; -> { p bar }; end` the "bar" inside the block is a `dvar` | | `ec` | Execution Context. The top level VM context, points at the current `cfp` | @@ -33,10 +35,13 @@ Just a list of acronyms I've run across in the Ruby source code and their meanin | `me` | Method Entry. Refers to an `rb_method_entry_t` struct, the internal representation of a Ruby method. | | MRI | Matz's Ruby Implementation | | `pc` | Program Counter. Usually the instruction that will be executed _next_ by the VM. Pointed to by the `cfp` and incremented by the VM | +| `snt` | Shared Native Thread. OS thread on which many ruby threads can run. Ruby threads from different ractors can even run on the same SNT. Ruby threads can switch SNTs when they context switch. SNTs are used in the M:N threading model. By default, non-main ractors use this model. +| `dnt` | Dedicated Native Thread. OS thread on which only one ruby thread can run. The ruby thread always runs on that same OS thread. DNTs are used in the 1:1 threading model. By default, the main ractor uses this model. | `sp` | Stack Pointer. The top of the stack. The VM executes instructions in the `iseq` and instructions will push and pop values on the stack. 
The VM updates the `sp` on the `cfp` to point at the top of the stack| +| ST table | ST table is the main C implementation of a hash (smaller Ruby hashes may be backed by AR tables). | | `svar` | Special Variable. Refers to special local variables like `$~` and `$_`. See the `getspecial` instruction in `insns.def` | | `VALUE` | VALUE is a pointer to a ruby object from the Ruby C code. | -| VM | Virtual Machine. In MRI's case YARV (Yet Another Ruby VM) +| VM | Virtual Machine. In MRI's case YARV (Yet Another Ruby VM) | WB | Write Barrier. To do with GC write barriers | | WC | Wild Card. As seen in instructions like `getlocal_WC_0`. It means this instruction takes a "wild card" for the parameter (in this case an index for a local) | | YARV | Yet Another Ruby VM. The virtual machine that CRuby uses | diff --git a/doc/contributing/making_changes_to_stdlibs.md b/doc/contributing/making_changes_to_stdlibs.md index 2156a61e39..2ceb2e6075 100644 --- a/doc/contributing/making_changes_to_stdlibs.md +++ b/doc/contributing/making_changes_to_stdlibs.md @@ -4,7 +4,7 @@ Everything in the [lib](https://github.com/ruby/ruby/tree/master/lib) directory If you'd like to make contributions to standard libraries, do so in the standalone repositories, and the changes will be automatically mirrored into the Ruby repository. -For example, CSV lives in [a separate repository](https://github.com/ruby/csv) and is mirrored into [Ruby](https://github.com/ruby/ruby/tree/master/lib/csv). +For example, ERB lives in [a separate repository](https://github.com/ruby/erb) and is mirrored into [Ruby](https://github.com/ruby/ruby/tree/master/lib/erb). 
## Maintainers diff --git a/doc/memory_view.md b/doc/contributing/memory_view.md index 0b1369163d..0b1369163d 100644 --- a/doc/memory_view.md +++ b/doc/contributing/memory_view.md diff --git a/doc/contributing/vm_stack_and_frames.md b/doc/contributing/vm_stack_and_frames.md new file mode 100644 index 0000000000..c7dc59db16 --- /dev/null +++ b/doc/contributing/vm_stack_and_frames.md @@ -0,0 +1,163 @@ +# Ruby VM Stack and Frame Layout + +This document explains the Ruby VM stack architecture, including how the value +stack (SP) and control frames (CFP) share a single contiguous memory region, +and how individual frames are structured. + +## VM Stack Architecture + +The Ruby VM uses a single contiguous stack (`ec->vm_stack`) with two different +regions growing toward each other. Understanding this requires distinguishing +the overall architecture (how CFPs and values share one stack) from individual +frame internals (how values are organized for one single frame). + +```text +High addresses (ec->vm_stack + ec->vm_stack_size) + ↓ + [CFP region starts here] ← RUBY_VM_END_CONTROL_FRAME(ec) + [CFP - 1] New frame pushed here (grows downward) + [CFP - 2] Another frame + ... + + (Unused space - stack overflow when they meet) + + ... Value stack grows UP toward higher addresses + [SP + n] Values pushed here + [ec->cfp->sp] Current executing frame's stack pointer + ↑ +Low addresses (ec->vm_stack) +``` + +The "unused space" represents free space available for new frames and values. When this gap closes (CFP meets SP), stack overflow occurs. 
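The idea of two regions sharing one buffer and growing toward each other can be modeled with a toy Ruby class. This is only a sketch under stated assumptions: `ToyVMStack`, `FRAME_SIZE`, and the method names are hypothetical, frames are pretended to occupy two slots each, and the real VM works with C pointers and a safety margin rather than array indexes.

```ruby
# Toy model only; this is not CRuby's implementation.
# ToyVMStack, FRAME_SIZE, and all method names are hypothetical.
class ToyVMStack
  FRAME_SIZE = 2 # pretend each control frame occupies two slots

  attr_reader :sp, :cfp

  def initialize(size)
    @slots = Array.new(size)
    @sp = 0      # value stack grows up from the low end
    @cfp = size  # control frames grow down from the high end
  end

  # The gap between the two regions.
  def free_slots
    @cfp - @sp
  end

  def push_value(value)
    raise "stack overflow" if free_slots < 1
    @slots[@sp] = value
    @sp += 1
  end

  def push_frame(name)
    raise "stack overflow" if free_slots < FRAME_SIZE
    @cfp -= FRAME_SIZE
    @slots[@cfp] = name
  end
end

stack = ToyVMStack.new(8)
stack.push_value(:self)
stack.push_value(:one)
stack.push_frame(:foo)
stack.push_frame(:bar)
remaining = stack.free_slots # 8 slots - 2 values - 2 frames * 2 slots

overflow_message =
  begin
    stack.push_frame(:baz) # uses the last two free slots
    stack.push_value(:two) # no room left: the two regions have met
    nil
  rescue RuntimeError => e
    e.message
  end

puts remaining
puts overflow_message
```

In this model, overflow is simply the point where `cfp - sp` can no longer fit the next push, which mirrors the "CFP meets SP" condition in the diagram.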
+ +### Stack Growth Directions + +**Control Frames (CFP):** + +- Start at `ec->vm_stack + ec->vm_stack_size` (high addresses) +- Grow **downward** toward lower addresses as frames are pushed +- Each new frame is allocated at `cfp - 1` (lower address) +- The `rb_control_frame_t` structure itself moves downward + +**Value Stack (SP):** + +- Starts at `ec->vm_stack` (low addresses) +- Grows **upward** toward higher addresses as values are pushed +- Each frame's `cfp->sp` points to the top of its value stack + +### Stack Overflow + +When recursive calls push too many frames, CFP grows downward until it collides +with SP growing upward. The VM detects this with `CHECK_VM_STACK_OVERFLOW0`, +which computes `const rb_control_frame_struct *bound = (void *)&sp[margin];` +and raises if `cfp <= &bound[1]`. + +## Understanding Individual Frame Value Stacks + +Each frame has its own portion of the overall VM stack, called its "VM value stack" +or simply "value stack". This space is pre-allocated when the frame is created, +with size determined by: + +- `local_size` - space for local variables +- `stack_max` - maximum depth for temporary values during execution + +The frame's value stack grows upward from its base (where self/arguments/locals +live) toward `cfp->sp` (the current top of temporary values). + +## Visualizing How Frames Fit in the VM Stack + +The left side shows the overall VM stack with CFP metadata separated from frame +values. The right side zooms into one frame's value region, revealing its internal +structure. 
+ +```text +Overall VM Stack (ec->vm_stack): Zooming into Frame 2's value stack: + +High addr (vm_stack + vm_stack_size) High addr (cfp->sp) + ↓ ┌ + [CFP 1 metadata] │ [Temporaries] + [CFP 2 metadata] ─────────┐ │ [Env: Flags/Block/CME] ← cfp->ep + [CFP 3 metadata] │ │ [Locals] + ──────────────── │ ┌─┤ [Arguments] + (unused space) │ │ │ [self] + ──────────────── │ │ └ + [Frame 3 values] │ │ Low addr (frame base) + [Frame 2 values] <────────┴───────┘ + [Frame 1 values] + ↑ +Low addr (vm_stack) +``` + +## Examining a Single Frame's Value Stack + +Now let's walk through a concrete Ruby program to see how a single frame's +value stack is structured internally: + +```ruby +def foo(x, y) + z = x.casecmp(y) +end + +foo(:one, :two) +``` + +First, after arguments are evaluated and right before the `send` to `foo`: + +```text + ┌────────────┐ + putself │ :two │ + putobject :one 0x2 ├────────────┤ + putobject :two │ :one │ +► send <:foo, argc:2> 0x1 ├────────────┤ + leave │ self │ + 0x0 └────────────┘ +``` + +The `put*` instructions have pushed 3 items onto the stack. It's now time to +add a new control frame for `foo`. The following is the shape of the stack +after one instruction in `foo`: + +```text + cfp->sp=0x8 at this point. + 0x8 ┌────────────┐◄──Stack space for temporaries + │ :one │ live above the environment. + 0x7 ├────────────┤ + getlocal x@0 │ < flags > │ foo's rb_control_frame_t +► getlocal y@1 0x6 ├────────────┤◄──has cfp->ep=0x6 + send <:casecmp, argc:1> │ <no block> │ + dup 0x5 ├────────────┤ The flags, block, and CME triple + setlocal z@2 │ <CME: foo> │ (VM_ENV_DATA_SIZE) form an + leave 0x4 ├────────────┤ environment. They can be used to + │ z (nil) │ figure out what local variables + 0x3 ├────────────┤ are below them. + │ :two │ + 0x2 ├────────────┤ Notice how the arguments, now + │ :one │ locals, never moved. This layout + 0x1 ├────────────┤ allows for argument transfer + │ self │ without copying. 
+ 0x0 └────────────┘ +``` + +Given that locals have lower address than `cfp->ep`, it makes sense then that +`getlocal` in `insns.def` has `val = *(vm_get_ep(GET_EP(), level) - idx);`. +When accessing variables in the immediate scope, where `level=0`, it's +essentially `val = cfp->ep[-idx];`. + +Note that this EP-relative index has a different basis than the index that comes +after "@" in disassembly listings. The "@" index is relative to the 0th local +(`x` in this case). + +### Q&A + +Q: It seems that the receiver is always at an offset relative to EP, + like locals. Couldn't we use EP to access it instead of using `cfp->self`? + +A: Not all calls put the `self` in the callee on the stack. Two + examples are `Proc#call`, where the receiver is the Proc object, but `self` + inside the callee is `Proc#receiver`, and `yield`, where the receiver isn't + pushed onto the stack before the arguments. + +Q: Why have `cfp->ep` when it seems that everything is below `cfp->sp`? + +A: In the example, `cfp->ep` points to the stack, but it can also point to the + GC heap. Blocks can capture and evacuate their environment to the heap. diff --git a/doc/csv/arguments/io.rdoc b/doc/csv/arguments/io.rdoc deleted file mode 100644 index f5fe1d1975..0000000000 --- a/doc/csv/arguments/io.rdoc +++ /dev/null @@ -1,5 +0,0 @@ -* Argument +io+ should be an IO object that is: - * Open for reading; on return, the IO object will be closed. - * Positioned at the beginning. - To position at the end, for appending, use method CSV.generate. - For any other positioning, pass a preset \StringIO object instead. diff --git a/doc/csv/options/common/col_sep.rdoc b/doc/csv/options/common/col_sep.rdoc deleted file mode 100644 index 3f23c6d2d3..0000000000 --- a/doc/csv/options/common/col_sep.rdoc +++ /dev/null @@ -1,57 +0,0 @@ -====== Option +col_sep+ - -Specifies the \String field separator to be used -for both parsing and generating. -The \String will be transcoded into the data's \Encoding before use. 
- -Default value: - CSV::DEFAULT_OPTIONS.fetch(:col_sep) # => "," (comma) - -Using the default (comma): - str = CSV.generate do |csv| - csv << [:foo, 0] - csv << [:bar, 1] - csv << [:baz, 2] - end - str # => "foo,0\nbar,1\nbaz,2\n" - ary = CSV.parse(str) - ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -Using +:+ (colon): - col_sep = ':' - str = CSV.generate(col_sep: col_sep) do |csv| - csv << [:foo, 0] - csv << [:bar, 1] - csv << [:baz, 2] - end - str # => "foo:0\nbar:1\nbaz:2\n" - ary = CSV.parse(str, col_sep: col_sep) - ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -Using +::+ (two colons): - col_sep = '::' - str = CSV.generate(col_sep: col_sep) do |csv| - csv << [:foo, 0] - csv << [:bar, 1] - csv << [:baz, 2] - end - str # => "foo::0\nbar::1\nbaz::2\n" - ary = CSV.parse(str, col_sep: col_sep) - ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -Using <tt>''</tt> (empty string): - col_sep = '' - str = CSV.generate(col_sep: col_sep) do |csv| - csv << [:foo, 0] - csv << [:bar, 1] - csv << [:baz, 2] - end - str # => "foo0\nbar1\nbaz2\n" - ---- - -Raises an exception if parsing with the empty \String: - col_sep = '' - # Raises ArgumentError (:col_sep must be 1 or more characters: "") - CSV.parse("foo0\nbar1\nbaz2\n", col_sep: col_sep) - diff --git a/doc/csv/options/common/quote_char.rdoc b/doc/csv/options/common/quote_char.rdoc deleted file mode 100644 index 67fd3af68b..0000000000 --- a/doc/csv/options/common/quote_char.rdoc +++ /dev/null @@ -1,42 +0,0 @@ -====== Option +quote_char+ - -Specifies the character (\String of length 1) used used to quote fields -in both parsing and generating. -This String will be transcoded into the data's \Encoding before use. - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:quote_char) # => "\"" (double quote) - -This is useful for an application that incorrectly uses <tt>'</tt> (single-quote) -to quote fields, instead of the correct <tt>"</tt> (double-quote). 
- -Using the default (double quote): - str = CSV.generate do |csv| - csv << ['foo', 0] - csv << ["'bar'", 1] - csv << ['"baz"', 2] - end - str # => "foo,0\n'bar',1\n\"\"\"baz\"\"\",2\n" - ary = CSV.parse(str) - ary # => [["foo", "0"], ["'bar'", "1"], ["\"baz\"", "2"]] - -Using <tt>'</tt> (single-quote): - quote_char = "'" - str = CSV.generate(quote_char: quote_char) do |csv| - csv << ['foo', 0] - csv << ["'bar'", 1] - csv << ['"baz"', 2] - end - str # => "foo,0\n'''bar''',1\n\"baz\",2\n" - ary = CSV.parse(str, quote_char: quote_char) - ary # => [["foo", "0"], ["'bar'", "1"], ["\"baz\"", "2"]] - ---- - -Raises an exception if the \String length is greater than 1: - # Raises ArgumentError (:quote_char has to be nil or a single character String) - CSV.new('', quote_char: 'xx') - -Raises an exception if the value is not a \String: - # Raises ArgumentError (:quote_char has to be nil or a single character String) - CSV.new('', quote_char: :foo) diff --git a/doc/csv/options/common/row_sep.rdoc b/doc/csv/options/common/row_sep.rdoc deleted file mode 100644 index eae15b4a84..0000000000 --- a/doc/csv/options/common/row_sep.rdoc +++ /dev/null @@ -1,91 +0,0 @@ -====== Option +row_sep+ - -Specifies the row separator, a \String or the \Symbol <tt>:auto</tt> (see below), -to be used for both parsing and generating. - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:row_sep) # => :auto - ---- - -When +row_sep+ is a \String, that \String becomes the row separator. -The String will be transcoded into the data's Encoding before use. 
- -Using <tt>"\n"</tt>: - row_sep = "\n" - str = CSV.generate(row_sep: row_sep) do |csv| - csv << [:foo, 0] - csv << [:bar, 1] - csv << [:baz, 2] - end - str # => "foo,0\nbar,1\nbaz,2\n" - ary = CSV.parse(str) - ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -Using <tt>|</tt> (pipe): - row_sep = '|' - str = CSV.generate(row_sep: row_sep) do |csv| - csv << [:foo, 0] - csv << [:bar, 1] - csv << [:baz, 2] - end - str # => "foo,0|bar,1|baz,2|" - ary = CSV.parse(str, row_sep: row_sep) - ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -Using <tt>--</tt> (two hyphens): - row_sep = '--' - str = CSV.generate(row_sep: row_sep) do |csv| - csv << [:foo, 0] - csv << [:bar, 1] - csv << [:baz, 2] - end - str # => "foo,0--bar,1--baz,2--" - ary = CSV.parse(str, row_sep: row_sep) - ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -Using <tt>''</tt> (empty string): - row_sep = '' - str = CSV.generate(row_sep: row_sep) do |csv| - csv << [:foo, 0] - csv << [:bar, 1] - csv << [:baz, 2] - end - str # => "foo,0bar,1baz,2" - ary = CSV.parse(str, row_sep: row_sep) - ary # => [["foo", "0bar", "1baz", "2"]] - ---- - -When +row_sep+ is the \Symbol +:auto+ (the default), -generating uses <tt>"\n"</tt> as the row separator: - str = CSV.generate do |csv| - csv << [:foo, 0] - csv << [:bar, 1] - csv << [:baz, 2] - end - str # => "foo,0\nbar,1\nbaz,2\n" - -Parsing, on the other hand, invokes auto-discovery of the row separator. - -Auto-discovery reads ahead in the data looking for the next <tt>\r\n</tt>, +\n+, or +\r+ sequence. -The sequence will be selected even if it occurs in a quoted field, -assuming that you would have the same line endings there. 
- -Example: - str = CSV.generate do |csv| - csv << [:foo, 0] - csv << [:bar, 1] - csv << [:baz, 2] - end - str # => "foo,0\nbar,1\nbaz,2\n" - ary = CSV.parse(str) - ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -The default <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>) is used -if any of the following is true: -* None of those sequences is found. -* Data is +ARGF+, +STDIN+, +STDOUT+, or +STDERR+. -* The stream is only available for output. - -Obviously, discovery takes a little time. Set manually if speed is important. Also note that IO objects should be opened in binary mode on Windows if this feature will be used as the line-ending translation can cause problems with resetting the document position to where it was before the read ahead. diff --git a/doc/csv/options/generating/force_quotes.rdoc b/doc/csv/options/generating/force_quotes.rdoc deleted file mode 100644 index 11afd1a16c..0000000000 --- a/doc/csv/options/generating/force_quotes.rdoc +++ /dev/null @@ -1,17 +0,0 @@ -====== Option +force_quotes+ - -Specifies the boolean that determines whether each output field is to be double-quoted. - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:force_quotes) # => false - -For examples in this section: - ary = ['foo', 0, nil] - -Using the default, +false+: - str = CSV.generate_line(ary) - str # => "foo,0,\n" - -Using +true+: - str = CSV.generate_line(ary, force_quotes: true) - str # => "\"foo\",\"0\",\"\"\n" diff --git a/doc/csv/options/generating/quote_empty.rdoc b/doc/csv/options/generating/quote_empty.rdoc deleted file mode 100644 index 4c5645c662..0000000000 --- a/doc/csv/options/generating/quote_empty.rdoc +++ /dev/null @@ -1,12 +0,0 @@ -====== Option +quote_empty+ - -Specifies the boolean that determines whether an empty value is to be double-quoted. 
- -Default value: - CSV::DEFAULT_OPTIONS.fetch(:quote_empty) # => true - -With the default +true+: - CSV.generate_line(['"', ""]) # => "\"\"\"\",\"\"\n" - -With +false+: - CSV.generate_line(['"', ""], quote_empty: false) # => "\"\"\"\",\n" diff --git a/doc/csv/options/generating/write_converters.rdoc b/doc/csv/options/generating/write_converters.rdoc deleted file mode 100644 index d1a9cc748f..0000000000 --- a/doc/csv/options/generating/write_converters.rdoc +++ /dev/null @@ -1,25 +0,0 @@ -====== Option +write_converters+ - -Specifies converters to be used in generating fields. -See {Write Converters}[#class-CSV-label-Write+Converters] - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:write_converters) # => nil - -With no write converter: - str = CSV.generate_line(["\na\n", "\tb\t", " c "]) - str # => "\"\na\n\",\tb\t, c \n" - -With a write converter: - strip_converter = proc {|field| field.strip } - str = CSV.generate_line(["\na\n", "\tb\t", " c "], write_converters: strip_converter) - str # => "a,b,c\n" - -With two write converters (called in order): - upcase_converter = proc {|field| field.upcase } - downcase_converter = proc {|field| field.downcase } - write_converters = [upcase_converter, downcase_converter] - str = CSV.generate_line(['a', 'b', 'c'], write_converters: write_converters) - str # => "a,b,c\n" - -See also {Write Converters}[#class-CSV-label-Write+Converters] diff --git a/doc/csv/options/generating/write_empty_value.rdoc b/doc/csv/options/generating/write_empty_value.rdoc deleted file mode 100644 index 67be5662cb..0000000000 --- a/doc/csv/options/generating/write_empty_value.rdoc +++ /dev/null @@ -1,15 +0,0 @@ -====== Option +write_empty_value+ - -Specifies the object that is to be substituted for each field -that has an empty \String. 
- -Default value: - CSV::DEFAULT_OPTIONS.fetch(:write_empty_value) # => "" - -Without the option: - str = CSV.generate_line(['a', '', 'c', '']) - str # => "a,\"\",c,\"\"\n" - -With the option: - str = CSV.generate_line(['a', '', 'c', ''], write_empty_value: "x") - str # => "a,x,c,x\n" diff --git a/doc/csv/options/generating/write_headers.rdoc b/doc/csv/options/generating/write_headers.rdoc deleted file mode 100644 index c56aa48adb..0000000000 --- a/doc/csv/options/generating/write_headers.rdoc +++ /dev/null @@ -1,29 +0,0 @@ -====== Option +write_headers+ - -Specifies the boolean that determines whether a header row is included in the output; -ignored if there are no headers. - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:write_headers) # => nil - -Without +write_headers+: - file_path = 't.csv' - CSV.open(file_path,'w', - :headers => ['Name','Value'] - ) do |csv| - csv << ['foo', '0'] - end - CSV.open(file_path) do |csv| - csv.shift - end # => ["foo", "0"] - -With +write_headers+: - CSV.open(file_path,'w', - :write_headers => true, - :headers => ['Name','Value'] - ) do |csv| - csv << ['foo', '0'] - end - CSV.open(file_path) do |csv| - csv.shift - end # => ["Name", "Value"] diff --git a/doc/csv/options/generating/write_nil_value.rdoc b/doc/csv/options/generating/write_nil_value.rdoc deleted file mode 100644 index 65d33ff54e..0000000000 --- a/doc/csv/options/generating/write_nil_value.rdoc +++ /dev/null @@ -1,14 +0,0 @@ -====== Option +write_nil_value+ - -Specifies the object that is to be substituted for each +nil+-valued field.
- -Default value: - CSV::DEFAULT_OPTIONS.fetch(:write_nil_value) # => nil - -Without the option: - str = CSV.generate_line(['a', nil, 'c', nil]) - str # => "a,,c,\n" - -With the option: - str = CSV.generate_line(['a', nil, 'c', nil], write_nil_value: "x") - str # => "a,x,c,x\n" diff --git a/doc/csv/options/parsing/converters.rdoc b/doc/csv/options/parsing/converters.rdoc deleted file mode 100644 index 211fa48de6..0000000000 --- a/doc/csv/options/parsing/converters.rdoc +++ /dev/null @@ -1,46 +0,0 @@ -====== Option +converters+ - -Specifies converters to be used in parsing fields. -See {Field Converters}[#class-CSV-label-Field+Converters] - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:converters) # => nil - -The value may be a field converter name -(see {Stored Converters}[#class-CSV-label-Stored+Converters]): - str = '1,2,3' - # Without a converter - array = CSV.parse_line(str) - array # => ["1", "2", "3"] - # With built-in converter :integer - array = CSV.parse_line(str, converters: :integer) - array # => [1, 2, 3] - -The value may be a converter list -(see {Converter Lists}[#class-CSV-label-Converter+Lists]): - str = '1,3.14159' - # Without converters - array = CSV.parse_line(str) - array # => ["1", "3.14159"] - # With built-in converters - array = CSV.parse_line(str, converters: [:integer, :float]) - array # => [1, 3.14159] - -The value may be a \Proc custom converter -(see {Custom Field Converters}[#class-CSV-label-Custom+Field+Converters]): - str = ' foo , bar , baz ' - # Without a converter - array = CSV.parse_line(str) - array # => [" foo ", " bar ", " baz "] - # With a custom converter - array = CSV.parse_line(str, converters: proc {|field| field.strip }) - array # => ["foo", "bar", "baz"] - -See also {Custom Field Converters}[#class-CSV-label-Custom+Field+Converters] - ---- - -Raises an exception if the converter is not a converter name or a \Proc: - str = 'foo,0' - # Raises NoMethodError (undefined method `arity' for nil:NilClass) - CSV.parse(str, 
converters: :foo) diff --git a/doc/csv/options/parsing/empty_value.rdoc b/doc/csv/options/parsing/empty_value.rdoc deleted file mode 100644 index 7d3bcc078c..0000000000 --- a/doc/csv/options/parsing/empty_value.rdoc +++ /dev/null @@ -1,13 +0,0 @@ -====== Option +empty_value+ - -Specifies the object that is to be substituted -for each field that has an empty \String. - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:empty_value) # => "" (empty string) - -With the default, <tt>""</tt>: - CSV.parse_line('a,"",b,"",c') # => ["a", "", "b", "", "c"] - -With a different object: - CSV.parse_line('a,"",b,"",c', empty_value: 'x') # => ["a", "x", "b", "x", "c"] diff --git a/doc/csv/options/parsing/field_size_limit.rdoc b/doc/csv/options/parsing/field_size_limit.rdoc deleted file mode 100644 index 797c5776fc..0000000000 --- a/doc/csv/options/parsing/field_size_limit.rdoc +++ /dev/null @@ -1,39 +0,0 @@ -====== Option +field_size_limit+ - -Specifies the \Integer field size limit. - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:field_size_limit) # => nil - -This is a maximum size CSV will read ahead looking for the closing quote for a field. -(In truth, it reads to the first line ending beyond this size.) -If a quote cannot be found within the limit CSV will raise a MalformedCSVError, -assuming the data is faulty. -You can use this limit to prevent what are effectively DoS attacks on the parser. -However, this limit can cause a legitimate parse to fail; -therefore the default value is +nil+ (no limit). 
- -For the examples in this section: - str = <<~EOT - "a","b" - " - 2345 - ","" - EOT - str # => "\"a\",\"b\"\n\"\n2345\n\",\"\"\n" - -Using the default +nil+: - ary = CSV.parse(str) - ary # => [["a", "b"], ["\n2345\n", ""]] - -Using <tt>50</tt>: - field_size_limit = 50 - ary = CSV.parse(str, field_size_limit: field_size_limit) - ary # => [["a", "b"], ["\n2345\n", ""]] - ---- - -Raises an exception if a field is too long: - big_str = "123456789\n" * 1024 - # Raises CSV::MalformedCSVError (Field size exceeded in line 1.) - CSV.parse('valid,fields,"' + big_str + '"', field_size_limit: 2048) diff --git a/doc/csv/options/parsing/header_converters.rdoc b/doc/csv/options/parsing/header_converters.rdoc deleted file mode 100644 index 309180805f..0000000000 --- a/doc/csv/options/parsing/header_converters.rdoc +++ /dev/null @@ -1,43 +0,0 @@ -====== Option +header_converters+ - -Specifies converters to be used in parsing headers. -See {Header Converters}[#class-CSV-label-Header+Converters] - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:header_converters) # => nil - -Identical in functionality to option {converters}[#class-CSV-label-Option+converters] -except that: -- The converters apply only to the header row. -- The built-in header converters are +:downcase+ and +:symbol+. 
- -This section assumes prior execution of: - str = <<-EOT - Name,Value - foo,0 - bar,1 - baz,2 - EOT - # With no header converter - table = CSV.parse(str, headers: true) - table.headers # => ["Name", "Value"] - -The value may be a header converter name -(see {Stored Converters}[#class-CSV-label-Stored+Converters]): - table = CSV.parse(str, headers: true, header_converters: :downcase) - table.headers # => ["name", "value"] - -The value may be a converter list -(see {Converter Lists}[#class-CSV-label-Converter+Lists]): - header_converters = [:downcase, :symbol] - table = CSV.parse(str, headers: true, header_converters: header_converters) - table.headers # => [:name, :value] - -The value may be a \Proc custom converter -(see {Custom Header Converters}[#class-CSV-label-Custom+Header+Converters]): - upcase_converter = proc {|field| field.upcase } - table = CSV.parse(str, headers: true, header_converters: upcase_converter) - table.headers # => ["NAME", "VALUE"] - -See also {Custom Header Converters}[#class-CSV-label-Custom+Header+Converters] - diff --git a/doc/csv/options/parsing/headers.rdoc b/doc/csv/options/parsing/headers.rdoc deleted file mode 100644 index 0ea151f24b..0000000000 --- a/doc/csv/options/parsing/headers.rdoc +++ /dev/null @@ -1,63 +0,0 @@ -====== Option +headers+ - -Specifies a boolean, \Symbol, \Array, or \String to be used -to define column headers. 
- -Default value: - CSV::DEFAULT_OPTIONS.fetch(:headers) # => false - ---- - -Without +headers+: - str = <<-EOT - Name,Count - foo,0 - bar,1 - bax,2 - EOT - csv = CSV.new(str) - csv # => #<CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\""> - csv.headers # => nil - csv.shift # => ["Name", "Count"] - ---- - -If set to +true+ or the \Symbol +:first_row+, -the first row of the data is treated as a row of headers: - str = <<-EOT - Name,Count - foo,0 - bar,1 - bax,2 - EOT - csv = CSV.new(str, headers: true) - csv # => #<CSV io_type:StringIO encoding:UTF-8 lineno:2 col_sep:"," row_sep:"\n" quote_char:"\"" headers:["Name", "Count"]> - csv.headers # => ["Name", "Count"] - csv.shift # => #<CSV::Row "Name":"bar" "Count":"1"> - ---- - -If set to an \Array, the \Array elements are treated as headers: - str = <<-EOT - foo,0 - bar,1 - bax,2 - EOT - csv = CSV.new(str, headers: ['Name', 'Count']) - csv - csv.headers # => ["Name", "Count"] - csv.shift # => #<CSV::Row "Name":"bar" "Count":"1"> - ---- - -If set to a \String +str+, method <tt>CSV::parse_line(str, options)</tt> is called -with the current +options+, and the returned \Array is treated as headers: - str = <<-EOT - foo,0 - bar,1 - bax,2 - EOT - csv = CSV.new(str, headers: 'Name,Count') - csv - csv.headers # => ["Name", "Count"] - csv.shift # => #<CSV::Row "Name":"bar" "Count":"1"> diff --git a/doc/csv/options/parsing/liberal_parsing.rdoc b/doc/csv/options/parsing/liberal_parsing.rdoc deleted file mode 100644 index 603de28613..0000000000 --- a/doc/csv/options/parsing/liberal_parsing.rdoc +++ /dev/null @@ -1,38 +0,0 @@ -====== Option +liberal_parsing+ - -Specifies the boolean or hash value that determines whether -CSV will attempt to parse input not conformant with RFC 4180, -such as double quotes in unquoted fields. 
- -Default value: - CSV::DEFAULT_OPTIONS.fetch(:liberal_parsing) # => false - -For the next two examples: - str = 'is,this "three, or four",fields' - -Without +liberal_parsing+: - # Raises CSV::MalformedCSVError (Illegal quoting in line 1.) - CSV.parse_line(str) - -With +liberal_parsing+: - ary = CSV.parse_line(str, liberal_parsing: true) - ary # => ["is", "this \"three", " or four\"", "fields"] - -Use the +backslash_quote+ sub-option to parse values that use -a backslash to escape a double-quote character. This -causes the parser to treat <code>\"</code> as if it were -<code>""</code>. - -For the next two examples: - str = 'Show,"Harry \"Handcuff\" Houdini, the one and only","Tampa Theater"' - -With +liberal_parsing+, but without the +backslash_quote+ sub-option: - # Incorrect interpretation of backslash; incorrectly interprets the quoted comma as a field separator. - ary = CSV.parse_line(str, liberal_parsing: true) - ary # => ["Show", "\"Harry \\\"Handcuff\\\" Houdini", " the one and only\"", "Tampa Theater"] - puts ary[1] # => "Harry \"Handcuff\" Houdini - -With +liberal_parsing+ and its +backslash_quote+ sub-option: - ary = CSV.parse_line(str, liberal_parsing: { backslash_quote: true }) - ary # => ["Show", "Harry \"Handcuff\" Houdini, the one and only", "Tampa Theater"] - puts ary[1] # => Harry "Handcuff" Houdini, the one and only diff --git a/doc/csv/options/parsing/nil_value.rdoc b/doc/csv/options/parsing/nil_value.rdoc deleted file mode 100644 index 412e8795e8..0000000000 --- a/doc/csv/options/parsing/nil_value.rdoc +++ /dev/null @@ -1,12 +0,0 @@ -====== Option +nil_value+ - -Specifies the object that is to be substituted for each null (no-text) field.
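The difference between a null field (handled by +nil_value+) and an empty \String field (handled by +empty_value+) can be sketched as follows. The input line is a hypothetical one chosen so that both kinds of field appear together:

```ruby
require 'csv'

# Hypothetical line: the second field is null (no text at all),
# the third is an empty quoted string.
line = 'a,,"",b'

# With the defaults, the null field becomes nil and the empty field stays "".
defaults = CSV.parse_line(line)
p defaults # => ["a", nil, "", "b"]

# Each option replaces only its own kind of field.
replaced = CSV.parse_line(line, nil_value: 0, empty_value: 'x')
p replaced # => ["a", 0, "x", "b"]
```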
- -Default value: - CSV::DEFAULT_OPTIONS.fetch(:nil_value) # => nil - -With the default, +nil+: - CSV.parse_line('a,,b,,c') # => ["a", nil, "b", nil, "c"] - -With a different object: - CSV.parse_line('a,,b,,c', nil_value: 0) # => ["a", 0, "b", 0, "c"] diff --git a/doc/csv/options/parsing/return_headers.rdoc b/doc/csv/options/parsing/return_headers.rdoc deleted file mode 100644 index 45d2e3f3de..0000000000 --- a/doc/csv/options/parsing/return_headers.rdoc +++ /dev/null @@ -1,22 +0,0 @@ -====== Option +return_headers+ - -Specifies the boolean that determines whether method #shift -returns or ignores the header row. - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:return_headers) # => false - -Examples: - str = <<-EOT - Name,Count - foo,0 - bar,1 - bax,2 - EOT - # Without return_headers, the first row returned is the first data row. - csv = CSV.new(str, headers: true) - csv.shift # => #<CSV::Row "Name":"foo" "Count":"0"> - # With return_headers, the first row returned is the header row. - csv = CSV.new(str, headers: true, return_headers: true) - csv.shift # => #<CSV::Row "Name":"Name" "Count":"Count"> - diff --git a/doc/csv/options/parsing/skip_blanks.rdoc b/doc/csv/options/parsing/skip_blanks.rdoc deleted file mode 100644 index 2c8f7b7bb8..0000000000 --- a/doc/csv/options/parsing/skip_blanks.rdoc +++ /dev/null @@ -1,31 +0,0 @@ -====== Option +skip_blanks+ - -Specifies a boolean that determines whether blank lines in the input will be ignored; -a line that contains a column separator is not considered to be blank. - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:skip_blanks) # => false - -See also option {skip_lines}[#class-CSV-label-Option+skip_lines].
- -For examples in this section: - str = <<-EOT - foo,0 - - bar,1 - baz,2 - - , - EOT - -Using the default, +false+: - ary = CSV.parse(str) - ary # => [["foo", "0"], [], ["bar", "1"], ["baz", "2"], [], [nil, nil]] - -Using +true+: - ary = CSV.parse(str, skip_blanks: true) - ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"], [nil, nil]] - -Using a truthy value: - ary = CSV.parse(str, skip_blanks: :foo) - ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"], [nil, nil]] diff --git a/doc/csv/options/parsing/skip_lines.rdoc b/doc/csv/options/parsing/skip_lines.rdoc deleted file mode 100644 index 1481c40a5f..0000000000 --- a/doc/csv/options/parsing/skip_lines.rdoc +++ /dev/null @@ -1,37 +0,0 @@ -====== Option +skip_lines+ - -Specifies an object to use in identifying comment lines in the input that are to be ignored: -* If a \Regexp, ignores lines that match it. -* If a \String, converts it to a \Regexp, ignores lines that match it. -* If +nil+, no lines are considered to be comments. - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:skip_lines) # => nil - -For examples in this section: - str = <<-EOT - # Comment - foo,0 - bar,1 - baz,2 - # Another comment - EOT - str # => "# Comment\nfoo,0\nbar,1\nbaz,2\n# Another comment\n" - -Using the default, +nil+: - ary = CSV.parse(str) - ary # => [["# Comment"], ["foo", "0"], ["bar", "1"], ["baz", "2"], ["# Another comment"]] - -Using a \Regexp: - ary = CSV.parse(str, skip_lines: /^#/) - ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -Using a \String: - ary = CSV.parse(str, skip_lines: '#') - ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - ---- - -Raises an exception if given an object that is not a \Regexp, a \String, or +nil+: - # Raises ArgumentError (:skip_lines has to respond to #match: 0) - CSV.parse(str, skip_lines: 0) diff --git a/doc/csv/options/parsing/strip.rdoc b/doc/csv/options/parsing/strip.rdoc deleted file mode 100644 index 56ae4310c3..0000000000 --- a/doc/csv/options/parsing/strip.rdoc +++ /dev/null @@ 
-1,15 +0,0 @@ -====== Option +strip+ - -Specifies the boolean value that determines whether -whitespace is stripped from each input field. - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:strip) # => false - -With default value +false+: - ary = CSV.parse_line(' a , b ') - ary # => [" a ", " b "] - -With value +true+: - ary = CSV.parse_line(' a , b ', strip: true) - ary # => ["a", "b"] diff --git a/doc/csv/options/parsing/unconverted_fields.rdoc b/doc/csv/options/parsing/unconverted_fields.rdoc deleted file mode 100644 index 3e7f839d49..0000000000 --- a/doc/csv/options/parsing/unconverted_fields.rdoc +++ /dev/null @@ -1,27 +0,0 @@ -====== Option +unconverted_fields+ - -Specifies the boolean that determines whether unconverted field values are to be available. - -Default value: - CSV::DEFAULT_OPTIONS.fetch(:unconverted_fields) # => nil - -The unconverted field values are those found in the source data, -prior to any conversions performed via option +converters+. - -When option +unconverted_fields+ is +true+, -each returned row (\Array or \CSV::Row) has an added method, -+unconverted_fields+, that returns the unconverted field values: - str = <<-EOT - foo,0 - bar,1 - baz,2 - EOT - # Without unconverted_fields - csv = CSV.parse(str, converters: :integer) - csv # => [["foo", 0], ["bar", 1], ["baz", 2]] - csv.first.respond_to?(:unconverted_fields) # => false - # With unconverted_fields - csv = CSV.parse(str, converters: :integer, unconverted_fields: true) - csv # => [["foo", 0], ["bar", 1], ["baz", 2]] - csv.first.respond_to?(:unconverted_fields) # => true - csv.first.unconverted_fields # => ["foo", "0"] diff --git a/doc/csv/recipes/filtering.rdoc b/doc/csv/recipes/filtering.rdoc deleted file mode 100644 index 1552bf0fb8..0000000000 --- a/doc/csv/recipes/filtering.rdoc +++ /dev/null @@ -1,158 +0,0 @@ -== Recipes for Filtering \CSV - -These recipes are specific code examples for specific \CSV filtering tasks. 
- -For other recipes, see {Recipes for CSV}[./recipes_rdoc.html]. - -All code snippets on this page assume that the following has been executed: - require 'csv' - -=== Contents - -- {Source and Output Formats}[#label-Source+and+Output+Formats] - - {Filtering String to String}[#label-Filtering+String+to+String] - - {Recipe: Filter String to String with Headers}[#label-Recipe-3A+Filter+String+to+String+with+Headers] - - {Recipe: Filter String to String Without Headers}[#label-Recipe-3A+Filter+String+to+String+Without+Headers] - - {Filtering String to IO Stream}[#label-Filtering+String+to+IO+Stream] - - {Recipe: Filter String to IO Stream with Headers}[#label-Recipe-3A+Filter+String+to+IO+Stream+with+Headers] - - {Recipe: Filter String to IO Stream Without Headers}[#label-Recipe-3A+Filter+String+to+IO+Stream+Without+Headers] - - {Filtering IO Stream to String}[#label-Filtering+IO+Stream+to+String] - - {Recipe: Filter IO Stream to String with Headers}[#label-Recipe-3A+Filter+IO+Stream+to+String+with+Headers] - - {Recipe: Filter IO Stream to String Without Headers}[#label-Recipe-3A+Filter+IO+Stream+to+String+Without+Headers] - - {Filtering IO Stream to IO Stream}[#label-Filtering+IO+Stream+to+IO+Stream] - - {Recipe: Filter IO Stream to IO Stream with Headers}[#label-Recipe-3A+Filter+IO+Stream+to+IO+Stream+with+Headers] - - {Recipe: Filter IO Stream to IO Stream Without Headers}[#label-Recipe-3A+Filter+IO+Stream+to+IO+Stream+Without+Headers] - -=== Source and Output Formats - -You can use a Unix-style "filter" for \CSV data. -The filter reads source \CSV data and writes output \CSV data as modified by the filter. -The input and output \CSV data may be any mixture of \Strings and \IO streams. - -==== Filtering \String to \String - -You can filter one \String to another, with or without headers. 
- -===== Recipe: Filter \String to \String with Headers - -Use class method CSV.filter with option +headers+ to filter a \String to another \String: - in_string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - out_string = '' - CSV.filter(in_string, out_string, headers: true) do |row| - row[0] = row[0].upcase - row[1] *= 4 - end - out_string # => "Name,Value\nFOO,0000\nBAR,1111\nBAZ,2222\n" - -===== Recipe: Filter \String to \String Without Headers - -Use class method CSV.filter without option +headers+ to filter a \String to another \String: - in_string = "foo,0\nbar,1\nbaz,2\n" - out_string = '' - CSV.filter(in_string, out_string) do |row| - row[0] = row[0].upcase - row[1] *= 4 - end - out_string # => "FOO,0000\nBAR,1111\nBAZ,2222\n" - -==== Filtering \String to \IO Stream - -You can filter a \String to an \IO stream, with or without headers. - -===== Recipe: Filter \String to \IO Stream with Headers - -Use class method CSV.filter with option +headers+ to filter a \String to an \IO stream: - in_string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - path = 't.csv' - File.open(path, 'w') do |out_io| - CSV.filter(in_string, out_io, headers: true) do |row| - row[0] = row[0].upcase - row[1] *= 4 - end - end - p File.read(path) # => "Name,Value\nFOO,0000\nBAR,1111\nBAZ,2222\n" - -===== Recipe: Filter \String to \IO Stream Without Headers - -Use class method CSV.filter without option +headers+ to filter a \String to an \IO stream: - in_string = "foo,0\nbar,1\nbaz,2\n" - path = 't.csv' - File.open(path, 'w') do |out_io| - CSV.filter(in_string, out_io) do |row| - row[0] = row[0].upcase - row[1] *= 4 - end - end - p File.read(path) # => "FOO,0000\nBAR,1111\nBAZ,2222\n" - -==== Filtering \IO Stream to \String - -You can filter an \IO stream to a \String, with or without headers. 
- -===== Recipe: Filter \IO Stream to \String with Headers - -Use class method CSV.filter with option +headers+ to filter an \IO stream to a \String: - in_string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - path = 't.csv' - File.write(path, in_string) - out_string = '' - File.open(path) do |in_io| - CSV.filter(in_io, out_string, headers: true) do |row| - row[0] = row[0].upcase - row[1] *= 4 - end - end - out_string # => "Name,Value\nFOO,0000\nBAR,1111\nBAZ,2222\n" - -===== Recipe: Filter \IO Stream to \String Without Headers - -Use class method CSV.filter without option +headers+ to filter an \IO stream to a \String: - in_string = "foo,0\nbar,1\nbaz,2\n" - path = 't.csv' - File.write(path, in_string) - out_string = '' - File.open(path) do |in_io| - CSV.filter(in_io, out_string) do |row| - row[0] = row[0].upcase - row[1] *= 4 - end - end - out_string # => "FOO,0000\nBAR,1111\nBAZ,2222\n" - -==== Filtering \IO Stream to \IO Stream - -You can filter an \IO stream to another \IO stream, with or without headers.
- -===== Recipe: Filter \IO Stream to \IO Stream with Headers - -Use class method CSV.filter with option +headers+ to filter an \IO stream to another \IO stream: - in_path = 't.csv' - in_string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - File.write(in_path, in_string) - out_path = 'u.csv' - File.open(in_path) do |in_io| - File.open(out_path, 'w') do |out_io| - CSV.filter(in_io, out_io, headers: true) do |row| - row[0] = row[0].upcase - row[1] *= 4 - end - end - end - p File.read(out_path) # => "Name,Value\nFOO,0000\nBAR,1111\nBAZ,2222\n" - -===== Recipe: Filter \IO Stream to \IO Stream Without Headers - -Use class method CSV.filter without option +headers+ to filter an \IO stream to another \IO stream: - in_path = 't.csv' - in_string = "foo,0\nbar,1\nbaz,2\n" - File.write(in_path, in_string) - out_path = 'u.csv' - File.open(in_path) do |in_io| - File.open(out_path, 'w') do |out_io| - CSV.filter(in_io, out_io) do |row| - row[0] = row[0].upcase - row[1] *= 4 - end - end - end - p File.read(out_path) # => "FOO,0000\nBAR,1111\nBAZ,2222\n" diff --git a/doc/csv/recipes/generating.rdoc b/doc/csv/recipes/generating.rdoc deleted file mode 100644 index e61838d31a..0000000000 --- a/doc/csv/recipes/generating.rdoc +++ /dev/null @@ -1,246 +0,0 @@ -== Recipes for Generating \CSV - -These recipes are specific code examples for specific \CSV generating tasks. - -For other recipes, see {Recipes for CSV}[./recipes_rdoc.html]. 
- -All code snippets on this page assume that the following has been executed: - require 'csv' - -=== Contents - -- {Output Formats}[#label-Output+Formats] - - {Generating to a String}[#label-Generating+to+a+String] - - {Recipe: Generate to String with Headers}[#label-Recipe-3A+Generate+to+String+with+Headers] - - {Recipe: Generate to String Without Headers}[#label-Recipe-3A+Generate+to+String+Without+Headers] - - {Generating to a File}[#label-Generating+to+a+File] - - {Recipe: Generate to File with Headers}[#label-Recipe-3A+Generate+to+File+with+Headers] - - {Recipe: Generate to File Without Headers}[#label-Recipe-3A+Generate+to+File+Without+Headers] - - {Generating to an IO Stream}[#label-Generating+to+an+IO+Stream] - - {Recipe: Generate to IO Stream with Headers}[#label-Recipe-3A+Generate+to+IO+Stream+with+Headers] - - {Recipe: Generate to IO Stream Without Headers}[#label-Recipe-3A+Generate+to+IO+Stream+Without+Headers] -- {Converting Fields}[#label-Converting+Fields] - - {Recipe: Filter Generated Field Strings}[#label-Recipe-3A+Filter+Generated+Field+Strings] - - {Recipe: Specify Multiple Write Converters}[#label-Recipe-3A+Specify+Multiple+Write+Converters] -- {RFC 4180 Compliance}[#label-RFC+4180+Compliance] - - {Row Separator}[#label-Row+Separator] - - {Recipe: Generate Compliant Row Separator}[#label-Recipe-3A+Generate+Compliant+Row+Separator] - - {Recipe: Generate Non-Compliant Row Separator}[#label-Recipe-3A+Generate+Non-Compliant+Row+Separator] - - {Column Separator}[#label-Column+Separator] - - {Recipe: Generate Compliant Column Separator}[#label-Recipe-3A+Generate+Compliant+Column+Separator] - - {Recipe: Generate Non-Compliant Column Separator}[#label-Recipe-3A+Generate+Non-Compliant+Column+Separator] - - {Quote Character}[#label-Quote+Character] - - {Recipe: Generate Compliant Quote Character}[#label-Recipe-3A+Generate+Compliant+Quote+Character] - - {Recipe: Generate Non-Compliant Quote 
Character}[#label-Recipe-3A+Generate+Non-Compliant+Quote+Character] - -=== Output Formats - -You can generate \CSV output to a \String, to a \File (via its path), or to an \IO stream. - -==== Generating to a \String - -You can generate \CSV output to a \String, with or without headers. - -===== Recipe: Generate to \String with Headers - -Use class method CSV.generate with option +headers+ to generate to a \String. - -This example uses method CSV#<< to append the rows -that are to be generated: - output_string = CSV.generate('', headers: ['Name', 'Value'], write_headers: true) do |csv| - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - output_string # => "Name,Value\nFoo,0\nBar,1\nBaz,2\n" - -===== Recipe: Generate to \String Without Headers - -Use class method CSV.generate without option +headers+ to generate to a \String. - -This example uses method CSV#<< to append the rows -that are to be generated: - output_string = CSV.generate do |csv| - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - output_string # => "Foo,0\nBar,1\nBaz,2\n" - -==== Generating to a \File - -You can generate \CSV data to a \File, with or without headers. - -===== Recipe: Generate to \File with Headers - -Use class method CSV.open with option +headers+ to generate to a \File. - -This example uses method CSV#<< to append the rows -that are to be generated: - path = 't.csv' - CSV.open(path, 'w', headers: ['Name', 'Value'], write_headers: true) do |csv| - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - p File.read(path) # => "Name,Value\nFoo,0\nBar,1\nBaz,2\n" - -===== Recipe: Generate to \File Without Headers - -Use class method CSV.open without option +headers+ to generate to a \File.
- -This example uses method CSV#<< to append the rows -that are to be generated: - path = 't.csv' - CSV.open(path, 'w') do |csv| - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - p File.read(path) # => "Foo,0\nBar,1\nBaz,2\n" - -==== Generating to an \IO Stream - -You can generate \CSV data to an \IO stream, with or without headers. - -===== Recipe: Generate to \IO Stream with Headers - -Use class method CSV.new with option +headers+ to generate \CSV data to an \IO stream: - path = 't.csv' - File.open(path, 'w') do |file| - csv = CSV.new(file, headers: ['Name', 'Value'], write_headers: true) - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - p File.read(path) # => "Name,Value\nFoo,0\nBar,1\nBaz,2\n" - -===== Recipe: Generate to \IO Stream Without Headers - -Use class method CSV.new without option +headers+ to generate \CSV data to an \IO stream: - path = 't.csv' - File.open(path, 'w') do |file| - csv = CSV.new(file) - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - p File.read(path) # => "Foo,0\nBar,1\nBaz,2\n" - -=== Converting Fields - -You can use _write_ _converters_ to convert fields when generating \CSV. - -==== Recipe: Filter Generated Field Strings - -Use option <tt>:write_converters</tt> and a custom converter to convert field values when generating \CSV. - -This example defines and uses a custom write converter to strip whitespace from generated fields: - strip_converter = proc {|field| field.respond_to?(:strip) ? field.strip : field } - output_string = CSV.generate(write_converters: strip_converter) do |csv| - csv << [' foo ', 0] - csv << [' bar ', 1] - csv << [' baz ', 2] - end - output_string # => "foo,0\nbar,1\nbaz,2\n" - -==== Recipe: Specify Multiple Write Converters - -Use option <tt>:write_converters</tt> and multiple custom converters -to convert field values when generating \CSV.
- -This example defines and uses two custom write converters to strip and upcase generated fields: - strip_converter = proc {|field| field.respond_to?(:strip) ? field.strip : field } - upcase_converter = proc {|field| field.respond_to?(:upcase) ? field.upcase : field } - converters = [strip_converter, upcase_converter] - output_string = CSV.generate(write_converters: converters) do |csv| - csv << [' foo ', 0] - csv << [' bar ', 1] - csv << [' baz ', 2] - end - output_string # => "FOO,0\nBAR,1\nBAZ,2\n" - -=== RFC 4180 Compliance - -By default, \CSV generates data that is compliant with -{RFC 4180}[https://www.rfc-editor.org/rfc/rfc4180] -with respect to: -- Column separator. -- Quote character. - -==== Row Separator - -RFC 4180 specifies the row separator CRLF (Ruby <tt>"\r\n"</tt>). - -===== Recipe: Generate Compliant Row Separator - -For strict compliance, use option +:row_sep+ to specify row separator <tt>"\r\n"</tt>: - output_string = CSV.generate('', row_sep: "\r\n") do |csv| - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - output_string # => "Foo,0\r\nBar,1\r\nBaz,2\r\n" - -===== Recipe: Generate Non-Compliant Row Separator - -For data with non-compliant row separators, use option +:row_sep+ with a different value. -This example uses semicolon (<tt>";"</tt>) as its row separator: - output_string = CSV.generate('', row_sep: ";") do |csv| - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - output_string # => "Foo,0;Bar,1;Baz,2;" - -==== Column Separator - -RFC 4180 specifies column separator COMMA (Ruby <tt>","</tt>).
- -===== Recipe: Generate Compliant Column Separator - -Because the \CSV default column separator is <tt>","</tt>, -you need not specify option +:col_sep+ for compliant data: - output_string = CSV.generate('') do |csv| - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - output_string # => "Foo,0\nBar,1\nBaz,2\n" - -===== Recipe: Generate Non-Compliant Column Separator - -For data with non-compliant column separators, use option +:col_sep+. -This example uses TAB (<tt>"\t"</tt>) as its column separator: - output_string = CSV.generate('', col_sep: "\t") do |csv| - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - output_string # => "Foo\t0\nBar\t1\nBaz\t2\n" - -==== Quote Character - -RFC 4180 specifies quote character DQUOTE (Ruby <tt>"\""</tt>). - -===== Recipe: Generate Compliant Quote Character - -Because the \CSV default quote character is <tt>"\""</tt>, -you need not specify option +:quote_char+ for compliant data: - output_string = CSV.generate('', force_quotes: true) do |csv| - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - output_string # => "\"Foo\",\"0\"\n\"Bar\",\"1\"\n\"Baz\",\"2\"\n" - -===== Recipe: Generate Non-Compliant Quote Character - -For data with non-compliant quote characters, use option +:quote_char+. -This example uses SQUOTE (<tt>"'"</tt>) as its quote character: - output_string = CSV.generate('', quote_char: "'", force_quotes: true) do |csv| - csv << ['Foo', 0] - csv << ['Bar', 1] - csv << ['Baz', 2] - end - output_string # => "'Foo','0'\n'Bar','1'\n'Baz','2'\n" diff --git a/doc/csv/recipes/parsing.rdoc b/doc/csv/recipes/parsing.rdoc deleted file mode 100644 index 1b7071e33f..0000000000 --- a/doc/csv/recipes/parsing.rdoc +++ /dev/null @@ -1,545 +0,0 @@ -== Recipes for Parsing \CSV - -These recipes are specific code examples for specific \CSV parsing tasks. - -For other recipes, see {Recipes for CSV}[./recipes_rdoc.html].
- -All code snippets on this page assume that the following has been executed: - require 'csv' - -=== Contents - -- {Source Formats}[#label-Source+Formats] - - {Parsing from a String}[#label-Parsing+from+a+String] - - {Recipe: Parse from String with Headers}[#label-Recipe-3A+Parse+from+String+with+Headers] - - {Recipe: Parse from String Without Headers}[#label-Recipe-3A+Parse+from+String+Without+Headers] - - {Parsing from a File}[#label-Parsing+from+a+File] - - {Recipe: Parse from File with Headers}[#label-Recipe-3A+Parse+from+File+with+Headers] - - {Recipe: Parse from File Without Headers}[#label-Recipe-3A+Parse+from+File+Without+Headers] - - {Parsing from an IO Stream}[#label-Parsing+from+an+IO+Stream] - - {Recipe: Parse from IO Stream with Headers}[#label-Recipe-3A+Parse+from+IO+Stream+with+Headers] - - {Recipe: Parse from IO Stream Without Headers}[#label-Recipe-3A+Parse+from+IO+Stream+Without+Headers] -- {RFC 4180 Compliance}[#label-RFC+4180+Compliance] - - {Row Separator}[#label-Row+Separator] - - {Recipe: Handle Compliant Row Separator}[#label-Recipe-3A+Handle+Compliant+Row+Separator] - - {Recipe: Handle Non-Compliant Row Separator}[#label-Recipe-3A+Handle+Non-Compliant+Row+Separator] - - {Column Separator}[#label-Column+Separator] - - {Recipe: Handle Compliant Column Separator}[#label-Recipe-3A+Handle+Compliant+Column+Separator] - - {Recipe: Handle Non-Compliant Column Separator}[#label-Recipe-3A+Handle+Non-Compliant+Column+Separator] - - {Quote Character}[#label-Quote+Character] - - {Recipe: Handle Compliant Quote Character}[#label-Recipe-3A+Handle+Compliant+Quote+Character] - - {Recipe: Handle Non-Compliant Quote Character}[#label-Recipe-3A+Handle+Non-Compliant+Quote+Character] - - {Recipe: Allow Liberal Parsing}[#label-Recipe-3A+Allow+Liberal+Parsing] -- {Special Handling}[#label-Special+Handling] - - {Special Line Handling}[#label-Special+Line+Handling] - - {Recipe: Ignore Blank Lines}[#label-Recipe-3A+Ignore+Blank+Lines] - - {Recipe: Ignore Selected 
Lines}[#label-Recipe-3A+Ignore+Selected+Lines] - - {Special Field Handling}[#label-Special+Field+Handling] - - {Recipe: Strip Fields}[#label-Recipe-3A+Strip+Fields] - - {Recipe: Handle Null Fields}[#label-Recipe-3A+Handle+Null+Fields] - - {Recipe: Handle Empty Fields}[#label-Recipe-3A+Handle+Empty+Fields] -- {Converting Fields}[#label-Converting+Fields] - - {Converting Fields to Objects}[#label-Converting+Fields+to+Objects] - - {Recipe: Convert Fields to Integers}[#label-Recipe-3A+Convert+Fields+to+Integers] - - {Recipe: Convert Fields to Floats}[#label-Recipe-3A+Convert+Fields+to+Floats] - - {Recipe: Convert Fields to Numerics}[#label-Recipe-3A+Convert+Fields+to+Numerics] - - {Recipe: Convert Fields to Dates}[#label-Recipe-3A+Convert+Fields+to+Dates] - - {Recipe: Convert Fields to DateTimes}[#label-Recipe-3A+Convert+Fields+to+DateTimes] - - {Recipe: Convert Assorted Fields to Objects}[#label-Recipe-3A+Convert+Assorted+Fields+to+Objects] - - {Recipe: Convert Fields to Other Objects}[#label-Recipe-3A+Convert+Fields+to+Other+Objects] - - {Recipe: Filter Field Strings}[#label-Recipe-3A+Filter+Field+Strings] - - {Recipe: Register Field Converters}[#label-Recipe-3A+Register+Field+Converters] - - {Using Multiple Field Converters}[#label-Using+Multiple+Field+Converters] - - {Recipe: Specify Multiple Field Converters in Option :converters}[#label-Recipe-3A+Specify+Multiple+Field+Converters+in+Option+-3Aconverters] - - {Recipe: Specify Multiple Field Converters in a Custom Converter List}[#label-Recipe-3A+Specify+Multiple+Field+Converters+in+a+Custom+Converter+List] -- {Converting Headers}[#label-Converting+Headers] - - {Recipe: Convert Headers to Lowercase}[#label-Recipe-3A+Convert+Headers+to+Lowercase] - - {Recipe: Convert Headers to Symbols}[#label-Recipe-3A+Convert+Headers+to+Symbols] - - {Recipe: Filter Header Strings}[#label-Recipe-3A+Filter+Header+Strings] - - {Recipe: Register Header Converters}[#label-Recipe-3A+Register+Header+Converters] - - {Using Multiple Header 
Converters}[#label-Using+Multiple+Header+Converters] - - {Recipe: Specify Multiple Header Converters in Option :header_converters}[#label-Recipe-3A+Specify+Multiple+Header+Converters+in+Option+-3Aheader_converters] - - {Recipe: Specify Multiple Header Converters in a Custom Header Converter List}[#label-Recipe-3A+Specify+Multiple+Header+Converters+in+a+Custom+Header+Converter+List] -- {Diagnostics}[#label-Diagnostics] - - {Recipe: Capture Unconverted Fields}[#label-Recipe-3A+Capture+Unconverted+Fields] - - {Recipe: Capture Field Info}[#label-Recipe-3A+Capture+Field+Info] - -=== Source Formats - -You can parse \CSV data from a \String, from a \File (via its path), or from an \IO stream. - -==== Parsing from a \String - -You can parse \CSV data from a \String, with or without headers. - -===== Recipe: Parse from \String with Headers - -Use class method CSV.parse with option +headers+ to read a source \String all at once -(may have memory resource implications): - string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - CSV.parse(string, headers: true) # => #<CSV::Table mode:col_or_row row_count:4> - -Use instance method CSV#each with option +headers+ to read a source \String one row at a time: - CSV.new(string, headers: true).each do |row| - p row - end -Output: - #<CSV::Row "Name":"foo" "Value":"0"> - #<CSV::Row "Name":"bar" "Value":"1"> - #<CSV::Row "Name":"baz" "Value":"2"> - -===== Recipe: Parse from \String Without Headers - -Use class method CSV.parse without option +headers+ to read a source \String all at once -(may have memory resource implications): - string = "foo,0\nbar,1\nbaz,2\n" - CSV.parse(string) # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -Use instance method CSV#each without option +headers+ to read a source \String one row at a time: - CSV.new(string).each do |row| - p row - end -Output: - ["foo", "0"] - ["bar", "1"] - ["baz", "2"] - -==== Parsing from a \File - -You can parse \CSV data from a \File, with or without headers. 
- -===== Recipe: Parse from \File with Headers - -Use instance method CSV#read with option +headers+ to read a file all at once: - string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - path = 't.csv' - File.write(path, string) - CSV.read(path, headers: true) # => #<CSV::Table mode:col_or_row row_count:4> - -Use class method CSV.foreach with option +headers+ to read one row at a time: - CSV.foreach(path, headers: true) do |row| - p row - end -Output: - #<CSV::Row "Name":"foo" "Value":"0"> - #<CSV::Row "Name":"bar" "Value":"1"> - #<CSV::Row "Name":"baz" "Value":"2"> - -===== Recipe: Parse from \File Without Headers - -Use class method CSV.read without option +headers+ to read a file all at once: - string = "foo,0\nbar,1\nbaz,2\n" - path = 't.csv' - File.write(path, string) - CSV.read(path) # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -Use class method CSV.foreach without option +headers+ to read one row at a time: - CSV.foreach(path) do |row| - p row - end -Output: - ["foo", "0"] - ["bar", "1"] - ["baz", "2"] - -==== Parsing from an \IO Stream - -You can parse \CSV data from an \IO stream, with or without headers. 
- -===== Recipe: Parse from \IO Stream with Headers - -Use class method CSV.parse with option +headers+ to read an \IO stream all at once: - string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - path = 't.csv' - File.write(path, string) - File.open(path) do |file| - CSV.parse(file, headers: true) - end # => #<CSV::Table mode:col_or_row row_count:4> - -Use class method CSV.foreach with option +headers+ to read one row at a time: - File.open(path) do |file| - CSV.foreach(file, headers: true) do |row| - p row - end - end -Output: - #<CSV::Row "Name":"foo" "Value":"0"> - #<CSV::Row "Name":"bar" "Value":"1"> - #<CSV::Row "Name":"baz" "Value":"2"> - -===== Recipe: Parse from \IO Stream Without Headers - -Use class method CSV.parse without option +headers+ to read an \IO stream all at once: - string = "foo,0\nbar,1\nbaz,2\n" - path = 't.csv' - File.write(path, string) - File.open(path) do |file| - CSV.parse(file) - end # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -Use class method CSV.foreach without option +headers+ to read one row at a time: - File.open(path) do |file| - CSV.foreach(file) do |row| - p row - end - end -Output: - ["foo", "0"] - ["bar", "1"] - ["baz", "2"] - -=== RFC 4180 Compliance - -By default, \CSV parses data that is compliant with -{RFC 4180}[https://www.rfc-editor.org/rfc/rfc4180] -with respect to: -- Row separator. -- Column separator. -- Quote character. - -==== Row Separator - -RFC 4180 specifies the row separator CRLF (Ruby <tt>"\r\n"</tt>). - -Although the \CSV default row separator is <tt>"\n"</tt>, -the parser also by default handles row separator <tt>"\r"</tt> and the RFC-compliant <tt>"\r\n"</tt>. 
- -===== Recipe: Handle Compliant Row Separator - -For strict compliance, use option +:row_sep+ to specify row separator <tt>"\r\n"</tt>, -which allows the compliant row separator: - source = "foo,1\r\nbar,1\r\nbaz,2\r\n" - CSV.parse(source, row_sep: "\r\n") # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] -But rejects other row separators: - source = "foo,1\nbar,1\nbaz,2\n" - CSV.parse(source, row_sep: "\r\n") # Raised MalformedCSVError - source = "foo,1\rbar,1\rbaz,2\r" - CSV.parse(source, row_sep: "\r\n") # Raised MalformedCSVError - source = "foo,1\n\rbar,1\n\rbaz,2\n\r" - CSV.parse(source, row_sep: "\r\n") # Raised MalformedCSVError - -===== Recipe: Handle Non-Compliant Row Separator - -For data with non-compliant row separators, use option +:row_sep+. -This example source uses semicolon (<tt>";"</tt>) as its row separator: - source = "foo,1;bar,1;baz,2;" - CSV.parse(source, row_sep: ';') # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] - -==== Column Separator - -RFC 4180 specifies column separator COMMA (Ruby <tt>","</tt>). - -===== Recipe: Handle Compliant Column Separator - -Because the \CSV default comma separator is ',', -you need not specify option +:col_sep+ for compliant data: - source = "foo,1\nbar,1\nbaz,2\n" - CSV.parse(source) # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] - -===== Recipe: Handle Non-Compliant Column Separator - -For data with non-compliant column separators, use option +:col_sep+. -This example source uses TAB (<tt>"\t"</tt>) as its column separator: - source = "foo,1\tbar,1\tbaz,2" - CSV.parse(source, col_sep: "\t") # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] - -==== Quote Character - -RFC 4180 specifies quote character DQUOTE (Ruby <tt>"\""</tt>). 
- -===== Recipe: Handle Compliant Quote Character - -Because the \CSV default quote character is <tt>"\""</tt>, -you need not specify option +:quote_char+ for compliant data: - source = "\"foo\",\"1\"\n\"bar\",\"1\"\n\"baz\",\"2\"\n" - CSV.parse(source) # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] - -===== Recipe: Handle Non-Compliant Quote Character - -For data with non-compliant quote characters, use option +:quote_char+. -This example source uses SQUOTE (<tt>"'"</tt>) as its quote character: - source = "'foo','1'\n'bar','1'\n'baz','2'\n" - CSV.parse(source, quote_char: "'") # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] - -==== Recipe: Allow Liberal Parsing - -Use option +:liberal_parsing+ to specify that \CSV should -attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields: - source = 'is,this "three, or four",fields' - CSV.parse(source) # Raises MalformedCSVError - CSV.parse(source, liberal_parsing: true) # => [["is", "this \"three", " or four\"", "fields"]] - -=== Special Handling - -You can use parsing options to specify special handling for certain lines and fields. - -==== Special Line Handling - -Use parsing options to specify special handling for blank lines, or for other selected lines. - -===== Recipe: Ignore Blank Lines - -Use option +:skip_blanks+ to ignore blank lines: - source = <<-EOT - foo,0 - - bar,1 - baz,2 - - , - EOT - parsed = CSV.parse(source, skip_blanks: true) - parsed # => [["foo", "0"], ["bar", "1"], ["baz", "2"], [nil, nil]] - -===== Recipe: Ignore Selected Lines - -Use option +:skip_lines+ to ignore selected lines. - source = <<-EOT - # Comment - foo,0 - bar,1 - baz,2 - # Another comment - EOT - parsed = CSV.parse(source, skip_lines: /^#/) - parsed # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] - -==== Special Field Handling - -Use parsing options to specify special handling for certain field values. 
- -===== Recipe: Strip Fields - -Use option +:strip+ to strip parsed field values: - CSV.parse_line(' a , b ', strip: true) # => ["a", "b"] - -===== Recipe: Handle Null Fields - -Use option +:nil_value+ to specify a value that will replace each field -that is null (no text): - CSV.parse_line('a,,b,,c', nil_value: 0) # => ["a", 0, "b", 0, "c"] - -===== Recipe: Handle Empty Fields - -Use option +:empty_value+ to specify a value that will replace each field -that is empty (\String of length 0); - CSV.parse_line('a,"",b,"",c', empty_value: 'x') # => ["a", "x", "b", "x", "c"] - -=== Converting Fields - -You can use field converters to change parsed \String fields into other objects, -or to otherwise modify the \String fields. - -==== Converting Fields to Objects - -Use field converters to change parsed \String objects into other, more specific, objects. - -There are built-in field converters for converting to objects of certain classes: -- \Float -- \Integer -- \Date -- \DateTime - -Other built-in field converters include: -- +:numeric+: converts to \Integer and \Float. -- +:all+: converts to \DateTime, \Integer, \Float. - -You can also define field converters to convert to objects of other classes. 
- -===== Recipe: Convert Fields to Integers - -Convert fields to \Integer objects using built-in converter +:integer+: - source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - parsed = CSV.parse(source, headers: true, converters: :integer) - parsed.map {|row| row['Value'].class} # => [Integer, Integer, Integer] - -===== Recipe: Convert Fields to Floats - -Convert fields to \Float objects using built-in converter +:float+: - source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - parsed = CSV.parse(source, headers: true, converters: :float) - parsed.map {|row| row['Value'].class} # => [Float, Float, Float] - -===== Recipe: Convert Fields to Numerics - -Convert fields to \Integer and \Float objects using built-in converter +:numeric+: - source = "Name,Value\nfoo,0\nbar,1.1\nbaz,2.2\n" - parsed = CSV.parse(source, headers: true, converters: :numeric) - parsed.map {|row| row['Value'].class} # => [Integer, Float, Float] - -===== Recipe: Convert Fields to Dates - -Convert fields to \Date objects using built-in converter +:date+: - source = "Name,Date\nfoo,2001-02-03\nbar,2001-02-04\nbaz,2001-02-03\n" - parsed = CSV.parse(source, headers: true, converters: :date) - parsed.map {|row| row['Date'].class} # => [Date, Date, Date] - -===== Recipe: Convert Fields to DateTimes - -Convert fields to \DateTime objects using built-in converter +:date_time+: - source = "Name,DateTime\nfoo,2001-02-03\nbar,2001-02-04\nbaz,2020-05-07T14:59:00-05:00\n" - parsed = CSV.parse(source, headers: true, converters: :date_time) - parsed.map {|row| row['DateTime'].class} # => [DateTime, DateTime, DateTime] - -===== Recipe: Convert Assorted Fields to Objects - -Convert assorted fields to objects using built-in converter +:all+: - source = "Type,Value\nInteger,0\nFloat,1.0\nDateTime,2001-02-04\n" - parsed = CSV.parse(source, headers: true, converters: :all) - parsed.map {|row| row['Value'].class} # => [Integer, Float, DateTime] - -===== Recipe: Convert Fields to Other Objects - -Define a custom field converter to 
convert \String fields into other objects. -This example defines and uses a custom field converter -that converts each column-1 value to a \Rational object: - rational_converter = proc do |field, field_context| - field_context.index == 1 ? field.to_r : field - end - source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - parsed = CSV.parse(source, headers: true, converters: rational_converter) - parsed.map {|row| row['Value'].class} # => [Rational, Rational, Rational] - -==== Recipe: Filter Field Strings - -Define a custom field converter to modify \String fields. -This example defines and uses a custom field converter -that strips whitespace from each field value: - strip_converter = proc {|field| field.strip } - source = "Name,Value\n foo , 0 \n bar , 1 \n baz , 2 \n" - parsed = CSV.parse(source, headers: true, converters: strip_converter) - parsed['Name'] # => ["foo", "bar", "baz"] - parsed['Value'] # => ["0", "1", "2"] - -==== Recipe: Register Field Converters - -Register a custom field converter, assigning it a name; -then refer to the converter by its name: - rational_converter = proc do |field, field_context| - field_context.index == 1 ? field.to_r : field - end - CSV::Converters[:rational] = rational_converter - source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - parsed = CSV.parse(source, headers: true, converters: :rational) - parsed['Value'] # => [(0/1), (1/1), (2/1)] - -==== Using Multiple Field Converters - -You can use multiple field converters in either of these ways: -- Specify converters in option +:converters+. -- Specify converters in a custom converter list. 
- -===== Recipe: Specify Multiple Field Converters in Option +:converters+ - -Apply multiple field converters by specifying them in option +:converters+: - source = "Name,Value\nfoo,0\nbar,1.0\nbaz,2.0\n" - parsed = CSV.parse(source, headers: true, converters: [:integer, :float]) - parsed['Value'] # => [0, 1.0, 2.0] - -===== Recipe: Specify Multiple Field Converters in a Custom Converter List - -Apply multiple field converters by defining and registering a custom converter list: - strip_converter = proc {|field| field.strip } - CSV::Converters[:strip] = strip_converter - CSV::Converters[:my_converters] = [:integer, :float, :strip] - source = "Name,Value\n foo , 0 \n bar , 1.0 \n baz , 2.0 \n" - parsed = CSV.parse(source, headers: true, converters: :my_converters) - parsed['Name'] # => ["foo", "bar", "baz"] - parsed['Value'] # => [0, 1.0, 2.0] - -=== Converting Headers - -You can use header converters to modify parsed \String headers. - -Built-in header converters include: -- +:symbol+: converts \String header to \Symbol. -- +:downcase+: converts \String header to lowercase. - -You can also define header converters to otherwise modify header \Strings. - -==== Recipe: Convert Headers to Lowercase - -Convert headers to lowercase using built-in converter +:downcase+: - source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - parsed = CSV.parse(source, headers: true, header_converters: :downcase) - parsed.headers # => ["name", "value"] - -==== Recipe: Convert Headers to Symbols - -Convert headers to downcased Symbols using built-in converter +:symbol+: - source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - parsed = CSV.parse(source, headers: true, header_converters: :symbol) - parsed.headers # => [:name, :value] - parsed.headers.map {|header| header.class} # => [Symbol, Symbol] - -==== Recipe: Filter Header Strings - -Define a custom header converter to modify \String fields. 
-This example defines and uses a custom header converter -that capitalizes each header \String: - capitalize_converter = proc {|header| header.capitalize } - source = "NAME,VALUE\nfoo,0\nbar,1\nbaz,2\n" - parsed = CSV.parse(source, headers: true, header_converters: capitalize_converter) - parsed.headers # => ["Name", "Value"] - -==== Recipe: Register Header Converters - -Register a custom header converter, assigning it a name; -then refer to the converter by its name: - capitalize_converter = proc {|header| header.capitalize } - CSV::HeaderConverters[:capitalize] = capitalize_converter - source = "NAME,VALUE\nfoo,0\nbar,1\nbaz,2\n" - parsed = CSV.parse(source, headers: true, header_converters: :capitalize) - parsed.headers # => ["Name", "Value"] - -==== Using Multiple Header Converters - -You can use multiple header converters in either of these ways: -- Specify header converters in option +:header_converters+. -- Specify header converters in a custom header converter list. - -===== Recipe: Specify Multiple Header Converters in Option :header_converters - -Apply multiple header converters by specifying them in option +:header_converters+: - source = "Name,Value\nfoo,0\nbar,1.0\nbaz,2.0\n" - parsed = CSV.parse(source, headers: true, header_converters: [:downcase, :symbol]) - parsed.headers # => [:name, :value] - -===== Recipe: Specify Multiple Header Converters in a Custom Header Converter List - -Apply multiple header converters by defining and registering a custom header converter list: - CSV::HeaderConverters[:my_header_converters] = [:symbol, :downcase] - source = "NAME,VALUE\nfoo,0\nbar,1.0\nbaz,2.0\n" - parsed = CSV.parse(source, headers: true, header_converters: :my_header_converters) - parsed.headers # => [:name, :value] - -=== Diagnostics - -==== Recipe: Capture Unconverted Fields - -To capture unconverted field values, use option +:unconverted_fields+: - source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" - parsed = CSV.parse(source, converters: :integer, 
unconverted_fields: true) - parsed # => [["Name", "Value"], ["foo", 0], ["bar", 1], ["baz", 2]] - parsed.each {|row| p row.unconverted_fields } -Output: - ["Name", "Value"] - ["foo", "0"] - ["bar", "1"] - ["baz", "2"] - -==== Recipe: Capture Field Info - -To capture field info in a custom converter, accept two block arguments. -The first is the field value; the second is a +CSV::FieldInfo+ object: - strip_converter = proc {|field, field_info| p field_info; field.strip } - source = " foo , 0 \n bar , 1 \n baz , 2 \n" - parsed = CSV.parse(source, converters: strip_converter) - parsed # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] -Output: - #<struct CSV::FieldInfo index=0, line=1, header=nil> - #<struct CSV::FieldInfo index=1, line=1, header=nil> - #<struct CSV::FieldInfo index=0, line=2, header=nil> - #<struct CSV::FieldInfo index=1, line=2, header=nil> - #<struct CSV::FieldInfo index=0, line=3, header=nil> - #<struct CSV::FieldInfo index=1, line=3, header=nil> diff --git a/doc/csv/recipes/recipes.rdoc b/doc/csv/recipes/recipes.rdoc deleted file mode 100644 index 9bf7885b1e..0000000000 --- a/doc/csv/recipes/recipes.rdoc +++ /dev/null @@ -1,6 +0,0 @@ -== Recipes for \CSV - -The recipes are specific code examples for specific tasks. See: -- {Recipes for Parsing CSV}[./parsing_rdoc.html] -- {Recipes for Generating CSV}[./generating_rdoc.html] -- {Recipes for Filtering CSV}[./filtering_rdoc.html] diff --git a/doc/distribution.md b/doc/distribution/distribution.md index 164e1b7109..e0e4ad354b 100644 --- a/doc/distribution.md +++ b/doc/distribution/distribution.md @@ -31,7 +31,7 @@ This will create several tarball in the `tmp` directory. The tarball will be nam ## Building the Tarball -See [Building Ruby](contributing/building_ruby.md). +See [Building Ruby](../contributing/building_ruby.md). 
## Updating the Ruby Standard Library diff --git a/doc/windows.md b/doc/distribution/windows.md index 4ea03d0507..26a727d7ad 100644 --- a/doc/windows.md +++ b/doc/distribution/windows.md @@ -122,7 +122,7 @@ sh ../../ruby/configure -C --disable-install-doc --with-opt-dir=C:\Users\usernam 4. If you want to build from GIT source, following commands are required. * `git` - * `ruby` 3.0 or later + * `ruby` 3.1 or later You can use [scoop](https://scoop.sh/) to install them like: @@ -285,6 +285,7 @@ Any icon files(`*.ico`) in the build directory, directories specified with _icondirs_ make variable and `win32` directory under the ruby source directory will be included in DLL or executable files, according to their base names. + $(RUBY_INSTALL_NAME).ico or ruby.ico --> $(RUBY_INSTALL_NAME).exe $(RUBYW_INSTALL_NAME).ico or rubyw.ico --> $(RUBYW_INSTALL_NAME).exe the others --> $(RUBY_SO_NAME).dll diff --git a/doc/examples/files.rdoc b/doc/examples/files.rdoc index f736132770..cb400c81be 100644 --- a/doc/examples/files.rdoc +++ b/doc/examples/files.rdoc @@ -7,8 +7,8 @@ text = <<~EOT Fifth line EOT -# Russian text. -russian = "\u{442 435 441 442}" # => "тест" +# Japanese text. +japanese = 'こんにちは' # Binary data. data = "\u9990\u9991\u9992\u9993\u9994" @@ -16,8 +16,8 @@ data = "\u9990\u9991\u9992\u9993\u9994" # Text file. File.write('t.txt', text) -# File with Russian text. -File.write('t.rus', russian) +# File with Japanese text. +File.write('t.ja', japanese) # File with binary data. f = File.new('t.dat', 'wb:UTF-16') diff --git a/doc/extension.ja.rdoc b/doc/extension.ja.rdoc index 2f7856f3d4..a943f7a109 100644 --- a/doc/extension.ja.rdoc +++ b/doc/extension.ja.rdoc @@ -1,5 +1,7 @@ # extension.ja.rdoc - -*- RDoc -*- created at: Mon Aug 7 16:45:54 JST 1995 +{English}[rdoc-ref:extension.rdoc] + = Rubyの拡張ライブラリの作り方 Rubyの拡張ライブラリの作り方を説明します. 
@@ -335,11 +337,11 @@ rb_ary_aref(int argc, const VALUE *argv, VALUE ary) :: rb_ary_entry(VALUE ary, long offset) :: - \ary[offset] + ary\[offset] rb_ary_store(VALUE ary, long offset, VALUE obj) :: - \ary[offset] = obj + ary\[offset] = obj rb_ary_subseq(VALUE ary, long beg, long len) :: diff --git a/doc/extension.rdoc b/doc/extension.rdoc index ba59d107ab..9fc507706e 100644 --- a/doc/extension.rdoc +++ b/doc/extension.rdoc @@ -1,5 +1,7 @@ # extension.rdoc - -*- RDoc -*- created at: Mon Aug 7 16:45:54 JST 1995 +{日本語}[rdoc-ref:extension.ja.rdoc] + = Creating extension libraries for Ruby This document explains how to make extension libraries for Ruby. @@ -315,11 +317,11 @@ rb_ary_aref(int argc, const VALUE *argv, VALUE ary) :: rb_ary_entry(VALUE ary, long offset) :: - \ary[offset] + ary\[offset] rb_ary_store(VALUE ary, long offset, VALUE obj) :: - \ary[offset] = obj + ary\[offset] = obj rb_ary_subseq(VALUE ary, long beg, long len) :: @@ -759,6 +761,50 @@ RUBY_TYPED_FROZEN_SHAREABLE :: If this flag is not set, the object can not become a shareable object by Ractor.make_shareable() method. +RUBY_TYPED_EMBEDDABLE :: + + This flag indicates that Ruby may store the C struct inside the object + slot, rather than allocate it separately with +malloc+. + However, it is not a guarantee. Ruby may decide not to embed the object. + For instance if it's too large to fit into one of the available slot sizes. + + Embedding the C struct inside the object slot reduces pointer chasing, + malloc overhead, and improves sweep performance. + In some cases, it can also reduce the memory footprint of the object. + + To be embeddable, types must abide by some restrictions: + + * Pointers to the C struct, or into the C struct, MUST NOT be stored, + as they become invalid when GC compaction occurs. + It is however valid to pass and use such pointers for as long as the Ruby + object remains on the stack. + + In a sense, this is similar to the restrictions of a stack allocated struct. 
+ + The +RB_GC_GUARD+ macro must be used to ensure the object is not moved by + compaction and not freed, unless the object is passed directly as an + argument from Ruby to C, i.e. as a parameter of a function used with + +rb_define_method+ and similar. + + * The +DATA_PTR+ and +RTYPEDDATA_DATA+ macros can't be used. + Only the +RTYPEDDATA_GET_DATA+ or +TypedData_Get_Struct+ macros can be used + with embeddable objects. + Accessing `RDATA(obj)->data` or `RTYPEDDATA(obj)->data` is invalid too. + + * The +dfree+ function MUST NOT free the C struct itself. + Setting +dfree+ to +RUBY_DEFAULT_FREE+ is fine. + To support older Ruby versions without this feature, you can + conditionally free the C struct if +RUBY_TYPED_EMBEDDABLE+ isn't defined. + + * The type must have the +RUBY_TYPED_FREE_IMMEDIATELY+ flag set. + + If the embedded C struct is of variable size, +rb_data_typed_object_zalloc+ + can be used instead of +TypedData_Make_Struct+. + + See {Embedded TypedData}[rdoc-ref:@Appendix+G.+Embedded+TypedData] for a + commented example of how to use +RUBY_TYPED_EMBEDDABLE+. + + Note that this macro can raise an exception. If sval to be wrapped holds a resource needs to be released (e.g., allocated memory, handle from an external library, and etc), you will have to use rb_protect. @@ -2047,7 +2093,7 @@ the <code>*_kw</code> functions introduced in Ruby 2.7. #define rb_proc_call_with_block_kw(p, c, v, b, kw) rb_proc_call_with_block(p, c, v, b) #define rb_method_call_kw(c, v, m, kw) rb_method_call(c, v, m) #define rb_method_call_with_block_kw(c, v, m, b, kw) rb_method_call_with_block(c, v, m, b) - #define rb_eval_cmd_kwd(c, a, kw) rb_eval_cmd(c, a, 0) + #define rb_eval_cmd_kw(c, a, kw) rb_eval_cmd(c, a, 0) #endif == Appendix C. Functions available for use in extconf.rb @@ -2283,6 +2329,89 @@ To make a "Ractor-safe" C extension, we need to check the following points: making of a Ractor-safe extension. This document will be extended as they are discovered. +== Appendix G. 
Embedded TypedData + +Here is an example of how to use +RUBY_TYPED_EMBEDDABLE+:: + + struct my_data { + struct timespec created_at; + size_t buffer_capa; + char *buffer; + }; + + static void + my_data_free(void *ptr) + { + struct my_data *data = (struct my_data *)ptr; + + // Deliberately don't free `ptr` if it is embeddable. + // Only auxiliary memory needs to be freed. + ruby_xfree(data->buffer); + } + + static size_t + my_data_size(const void *ptr) + { + const struct my_data *data = (const struct my_data *)ptr; + // We don't need to account for `sizeof(struct my_data)` because it is embedded inside the Ruby object. + // Only auxiliary memory needs to be reported. + return data->buffer_capa; + } + + static const rb_data_type_t my_type = { + .wrap_struct_name = "my_type", + .function = { + .dfree = my_data_free, + .dsize = my_data_size, + }, + .flags = RUBY_TYPED_FREE_IMMEDIATELY | RUBY_TYPED_EMBEDDABLE, + }; + + static VALUE + my_data_alloc(VALUE klass) + { + struct my_data *data; + VALUE obj = TypedData_Make_Struct(klass, struct my_data, &my_type, data); + + // It is fine to pass pointers into the embedded struct, for as long as + // the called function won't use it after the Ruby object has left the stack. + clock_gettime(CLOCK_REALTIME, &data->created_at); + data->buffer_capa = 1024; + data->buffer = ZALLOC_N(char, data->buffer_capa); + + return obj; + } + + static VALUE + my_data_m_parse(VALUE klass) + { + struct my_data *data; + VALUE my_data_obj = my_data_alloc(klass); + TypedData_Get_Struct(my_data_obj, struct my_data, &my_type, data); + + // `my_data_obj` was allocated from C, `RB_GC_GUARD` must be used to + // ensure the compiler will keep its reference on the stack. + RB_GC_GUARD(my_data_obj); + return my_data_obj; + } + + static VALUE + my_data_read(VALUE self) + { + struct my_data *data; + TypedData_Get_Struct(self, struct my_data, &my_type, data); + + // `self` is received from `rb_define_method` so `RB_GC_GUARD` isn't necessary. 
+ return rb_str_new(data->buffer, data->buffer_capa); + } + + void + Init_my_data(void) + { + VALUE cMyData = rb_define_class("MyData", rb_cObject); + rb_define_method(cMyData, "read", my_data_read, 0); + rb_define_singleton_method(cMyData, "parse", my_data_m_parse, 0); + } + -- Local variables: fill-column: 70 diff --git a/doc/file/filename_globbing.md b/doc/file/filename_globbing.md new file mode 100644 index 0000000000..ce4549bffe --- /dev/null +++ b/doc/file/filename_globbing.md @@ -0,0 +1,299 @@ +# Filename Globbing + +Filename globbing is a pattern-matching feature implemented in certain Ruby methods: + +- Dir.glob. +- [`Dir[]`](https://docs.ruby-lang.org/en/master/Dir.html#method-c-5B-5D). +- Pathname.glob. +- Pathname#glob. + +Each `glob` method finds filesystem entries (files and directories) +that match certain patterns. + +These methods are quite different +from [filename-matching](rdoc-ref:filename_matching.md) methods, +which match patterns against string paths, and do not access the filesystem. + +## Patterns + +These are the basic elements of filename-globbing patterns; +see the sections below for details: + +| Pattern | Meaning | Examples | +|:------------------------:|------------------------------------------|------------------------------| +| Simple string. | Matches itself. | `'LEGAL'` | +| `'*'` | Matches any sequence of characters. | `'*.txt'` | +| `'?'` | Matches any single character. | `'?.txt'` | +| `'[abc]'`,<br>`'[^abc]'` | Matches a single character from a set. | `'x[abc]y'`,<br>`'x[^abc]y'` | +| `'[a-z]'`,<br>`'[^a-z]'` | Matches a single character from a range. | `'x[0-9]y'`,<br>`'x[^0-9]y'` | +| `'{ , }'` | Matches alternatives. | `'{abc,def}'` | +| `'**'` | Matches directories recursively. | `'**/test.rb'` | +| `'\'` | Escapes the next character. | `'\\*'`, `'\?'` | + +## Patterns + +### Simple \String + +A "simple string" is one that does not contain special filename-globbing patterns; +see the table above. 
+ +A simple string matches itself: + +```ruby +Dir.glob('LEGAL') # => ["LEGAL"] +Dir.glob('LEGA') # => [] # Must be exact. +Dir.glob('legal') # => [] # Case-sensitive. +``` + +Note that case-sensitivity may _not_ be modified by flags. + +By default, the Windows short name pattern is disabled: + +```ruby +Dir.glob('PROGRAM~1') # => [] +``` + +It may be enabled by flag [`File::FNM_SHORTNAME`](#constant-filefnmshortname). + + +### Any Sequence of Characters (`'*'`) + +The asterisk pattern (`'*'`) matches any sequence of characters: + +```ruby +Dir.glob('*').take(3) # => ["BSDL", "CONTRIBUTING.md", "COPYING"] +Dir.glob('\*') # => [] # Escaped. +``` + +By default, the asterisk pattern does not match a leading period (as in a dot-file): + +```ruby +Dir.glob('*').select {|entry| entry.start_with?('.') } # => [] +``` + +That matching may be enabled by flag [`File::FNM_DOTMATCH`](#constant-filefnmdotmatch). + +The asterisk pattern does not match across file separators: + +```ruby +Dir.glob('*.rb').select {|entry| entry.include?('/') } # => [] +``` + +Therefore flag File::FNM_PATHNAME does not affect the pattern. + +### Single Character (`'?'`) + +The question-mark pattern (`'?'`) matches any single character: + +```ruby +Dir.glob('???') # => ["GPL", "bin", "doc", "enc", "ext", "jit", "lib", "man"] +Dir.glob('??') # => ["gc"] # Only one entry with a 2-character name. +Dir.glob('?') # => [] # No entries with a 1-character name. +Dir.glob('\?') # => [] # No entries containing character '?'. +``` + +By default, the question-mark pattern does not match a leading period (as in a dot-file): + +```ruby +Dir.glob(".???") # => [".git"] +Dir.glob("????").select {|entry| entry.start_with?('.') } # => [] +``` + +That matching may be enabled by flag [`File::FNM_DOTMATCH`](#constant-filefnmdotmatch). 
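
As a rough illustration of the dot-file rule for `'?'`, the sketch below globs a throwaway directory with and without `File::FNM_DOTMATCH`. The helper `glob_in_sandbox` and the entry names are made up for this example; `Dir.glob`, `Dir.mktmpdir`, and `FileUtils.touch` are the only library calls it relies on.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: creates the named empty files in a temporary
# directory, globs there, and returns the matches in sorted order.
def glob_in_sandbox(pattern, names, flags: 0)
  Dir.mktmpdir do |dir|
    names.each {|name| FileUtils.touch(File.join(dir, name)) }
    Dir.glob(pattern, flags: flags, base: dir).sort
  end
end

names = ['.git', 'abcd', 'wxyz']

# Four question marks: '.git' has four characters, but its leading
# period is not matched by '?' unless FNM_DOTMATCH is given.
default  = glob_in_sandbox('????', names)
dotmatch = glob_in_sandbox('????', names, flags: File::FNM_DOTMATCH)

p default  # => ["abcd", "wxyz"]
p dotmatch # => [".git", "abcd", "wxyz"]
```

Passing `base:` keeps the returned names relative, so the results do not depend on where the temporary directory happens to live.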
+ +### Single Character from a Set (`'[abc]'`, `'[^abc]'`) + +Characters enclosed in square brackets define a set of characters, +any of which matches a single character: + +```ruby +Dir.glob('[efgh][abcd]') # => ["gc"] +Dir.glob('\[efgh][abcd]') # => [] # Escaped. +``` + +The character set may be negated: + +```ruby +Dir.glob('[^abcd][^efgh]') # => ["gc"] +``` + +### Single Character from a \Range (`'[a-c]'`, `'[^a-c]'`) + +A range of characters enclosed in square brackets defines a set of characters, +any of which matches a single character: + +```ruby +Dir.glob('[k-m][h-j][a-c]') # => ["lib"] +Dir.glob('\[k-m][h-j][a-c]') # => [] # Escaped. +``` + +The range may be negated: + +```ruby +Dir.glob('[^k-m][h-j][a-c]') # => [] +Dir.glob('[^a-c][^k-m][^h-j]') # => ["GPL", "doc", "enc", "ext", "jit", "lib", "man"] +``` + +### Alternatives (`'{ , }'`) + +The alternatives pattern consists of comma-separated strings +enclosed in curly braces: + +```ruby +Dir.glob('{k,L,R}*') # => ["kernel.rb", "LEGAL", "README.ja.md", "README.md"] +Dir.glob('{R,L,k}*') # => ["README.ja.md", "README.md", "LEGAL", "kernel.rb"] +# Whitespace matters: +Dir.glob('{k ,L,R}*') # => ["LEGAL", "README.ja.md", "README.md"] +``` + +### Recursive Directory Matching (`'**'`) + +The double-asterisk pattern (`'**'`) matches directories recursively: + +```ruby +# Find all entries everywhere ending with '.ja'. +Dir.glob('**/*.ja') +# => ["COPYING.ja", "doc/pty/README.expect.ja", "doc/pty/README.ja"] + +# Find all entries everywhere ending with '.rb'. +Dir.glob('**/*.rb').size # => 7574 +Dir.glob('**/*.rb').take(3) +# => ["KNOWNBUGS.rb", "array.rb", "ast.rb"] + +# Find all entries in directory 'lib' ending with `.rb'. +Dir.glob('lib/**/*.rb').size # => 626 +Dir.glob('lib/**/*.rb').take(3) +# # => +# ["lib/English.rb", +# "lib/bundled_gems.rb", +# "lib/bundler/build_metadata.rb"] + +# Find all entries in directory 'test/ruby' ending with '.rb'. 
+Dir.glob('test/ruby/**/*.rb').size # => 200 +Dir.glob('test/ruby/**/*.rb').take(3) +# # => +# ["test/ruby/allpairs.rb", +# "test/ruby/beginmainend.rb", +# "test/ruby/box/a.1_1_0.rb"] + +# Escaped. +Dir.glob('\**/*.rb') # => [] +``` + + +### Escape (`'\'`) + +The backslash character (`'\'`) may be used to escape any of the characters +that filename globbing treats as special: + +```ruby +Dir.glob('\*') # => [] +Dir.glob('\?') # => [] +Dir.glob('\[efgh][abcd]') # => [] +Dir.glob('\[k-m][h-j][a-c]') # => [] +Dir.glob('\**/*.rb') # => [] +``` + +## Keyword Arguments + +| Keyword | Value | Default | Meaning | +|-------------------|--------------------------|:-------:|-----------------------------------------| +| [`base`](#base) | \String path. | `'.'` | Root for searching. | +| [`flags`](#flags) | Logical OR of constants. | `0` | Modify globbing behavior. | +| [`sort`](#sort) | `true` or `false` | `true` | Whether returned array is to be sorted. | + +### `base` + +Optional keyword argument `base` (defaults to `'.'`) +specifies where in the filesystem the searching is to begin: + +```ruby +Dir.glob('*').size # => 241 +Dir.glob('*').take(3) +# => ["BSDL", "CONTRIBUTING.md", "COPYING"] + +Dir.glob('*', base: 'lib').size # => 72 +Dir.glob('*', base: 'lib').take(3) +# => ["English.gemspec", "English.rb", "bundled_gems.rb"] + +Dir.glob('*', base: 'lib/net').size # => 5 +Dir.glob('*', base: 'lib/net').take(3) +# => ["http", "http.rb", "https.rb"] +``` + +### `flags` + +Optional keyword argument `flags` (defaults to `0`) may be the bitwise OR +of the constants `File::FNM*`: + +```ruby +Dir.glob('*', flags: File::FNM_DOTMATCH | File::FNM_NOESCAPE) +``` + +These are the constants for filename-globbing patterns; +see the sections below for details: + + +| Constant | Meaning | +|-----------------------------------------------------|--------------------------------------------| +| [`File::FNM_DOTMATCH`](#constant-filefnmdotmatch) | Make pattern `'*'` match a leading period. 
| +| [`File::FNM_NOESCAPE`](#constant-filefnmnoescape) | Disable escaping. | +| [`File::FNM_SHORTNAME`](#constant-filefnmshortname) | Enable short-name matching (Windows only). | + +These constants do not affect filename globbing: + +- File::FNM_CASEFOLD. +- File::FNM_EXTGLOB. +- File::FNM_PATHNAME. +- File::FNM_SYSCASE. + +#### Constant File::FNM_DOTMATCH + +By default, filename globbing does not allow patterns `'*'` and `'?'` to match a dotfile name +(i.e, an entry name beginning with a dot); +use constant [`File::FNM_DOTMATCH`](#constant-filefnmdotmatch) +to enable the match: + +```ruby +Dir.glob('*').size # => 241 +Dir.glob('*', flags: File::FNM_DOTMATCH).size # => 256 +Dir.glob('*', flags: File::FNM_DOTMATCH).take(3) # => [".", ".dir-locals.el", ".document"] +``` + +#### Constant File::FNM_NOESCAPE + +By default filename globbing has escaping enabled; +use constant [`File::FNM_NOESCAPE`](#constant-filefnmnoescape) +to disable it: + +```ruby +Dir.glob('*').size # => 241 +Dir.glob('\*').size # => 0 +``` + +#### Constant File::FNM_SHORTNAME + +By default, Windows shortname matching is disabled; +use constant [`File::FNM_SHORTNAME`](#constant-filefnmshortname) +to enable it (on Windows only). + +Using that constant allows patterns to match short names +in filename globbing on Windows, +which can be useful for compatibility with legacy applications +that rely on these short names; +see [8.3 filename](https://en.wikipedia.org/wiki/8.3_filename). +This feature helps ensure that file operations work correctly +even when dealing with files that have long names. 
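
Because `File::FNM_SHORTNAME` is described as a Windows-only feature, portable code may want to check whether the flag does anything before relying on it. The following is only a sketch; the variable names are illustrative, and it assumes (as the filename-matching documentation states) that the constant is zero on platforms without short-name support.

```ruby
# A zero value means short-name matching cannot be enabled on this platform.
short_names_available = !File::FNM_SHORTNAME.zero?

# OR-ing in a zero-valued flag changes nothing, so the same call can be
# written once and used on every platform.
flags = File::FNM_DOTMATCH | File::FNM_SHORTNAME
entries = Dir.glob('*', flags: flags)

puts "short-name matching available: #{short_names_available}"
puts "entries matched: #{entries.size}"
```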
+ +### `sort` + +Optional keyword argument `sort` (defaults to `true`) +specifies whether the returned array is to be sorted: + +```ruby +Dir.glob('*').take(3) +# => ["BSDL", "CONTRIBUTING.md", "COPYING"] +Dir.glob('*', sort: false).take(3) +# => ["gc.rb", "yjit.rb", "iseq.h"] +``` + diff --git a/doc/file/filename_matching.md b/doc/file/filename_matching.md new file mode 100644 index 0000000000..cf5b60bac2 --- /dev/null +++ b/doc/file/filename_matching.md @@ -0,0 +1,353 @@ +# Filename Matching + +Filename matching is a pattern-matching feature implemented in certain Ruby methods: + +- File.fnmatch. +- Pathname#fnmatch. + +Each `fnmatch` method matches a pattern against a string _path_; +these methods operate only on strings, and do not access the file system. + +These methods are quite different +from [filename-globbing](rdoc-ref:filename_globbing.md) methods, +which match patterns against string paths found in the actual file system. + +## Patterns + +These are the basic elements of filename matching patterns; +see the sections below for details: + +| Pattern | Meaning | Examples | +|:------------------------:|--------------------------------------------|------------------------------| +| Simple string. | Matches itself. | `'Rakefile'`, `'LEGAL'` | +| `'*'` | Matches any sequence of characters. | `'*.txt'` | +| `'?'` | Matches any single character. | `'?.txt'` | +| `'[abc]'`,<br>`'[^abc]'` | Matches a single character from a set. | `'x[abc]y'`,<br>`'x[^abc]y'` | +| `'[a-z]'`,<br>`'[^a-z]'` | Matches a single character from a range. | `'x[0-9]y'`,<br>`'x[^0-9]y'` | +| `'\'` | Escapes the next character. | `'\\*'`, `'\?'` | + +There are two other patterns that are disabled by default: + +- Directory-like substring (`'**'`); + see [`File::FNM_PATHNAME`](#constant-filefnmpathname) below. +- Alternatives (`'{ , }'`); + see [`File::FNM_EXTGLOB`](#constant-filefnmextglob) below.
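
A minimal sketch of the two opt-in patterns, using only `File.fnmatch` and the flags named above. The paths are made-up strings; nothing here touches the file system.

```ruby
# '**/' acts as a recursive directory matcher when File::FNM_PATHNAME is given.
recursive = File.fnmatch('**/*.rb', 'lib/net/http.rb', File::FNM_PATHNAME)
# Under the same flag, a single '*' cannot cross a '/'.
single = File.fnmatch('*/*.rb', 'lib/net/http.rb', File::FNM_PATHNAME)

# Braces are ordinary characters unless File::FNM_EXTGLOB is given.
literal_braces = File.fnmatch('{lib,test}/*.rb', 'lib/http.rb')
alternatives = File.fnmatch('{lib,test}/*.rb', 'lib/http.rb', File::FNM_EXTGLOB)

p recursive      # => true
p single         # => false
p literal_braces # => false
p alternatives   # => true
```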
+ +### Simple \String + +A "simple string" is one that does not contain special filename-matching patterns; +see the table above. + +A simple string matches itself: + +```ruby +File.fnmatch('xyzzy', 'xyzzy') # => true +File.fnmatch('one_two_three', 'one_two_three') # => true +File.fnmatch('123', '123') # => true +File.fnmatch('Form 27B/6', 'Form 27B/6') # => true +File.fnmatch('bcd', 'abcde') # => false # Must be exact. +``` + +By default, the matching is case-sensitive: + +```ruby +File.fnmatch('abc', 'ABC') # => false +``` + +Case-sensitivity may be modified by flags: + +- [`File::FNM_CASEFOLD`](#constant-filefnmcasefold). +- [`File::FNM_SYSCASE`](#constant-filefnmsyscase). + +By default, the alternatives pattern is disabled: + +```ruby +File.fnmatch('R{ub,foo}y', 'Ruby') # => false +``` + +It may be enabled by flag [`File::FNM_EXTGLOB`](#constant-filefnmextglob). + +By default, the Windows short name pattern is disabled: + +```ruby +File.fnmatch('PROGRAM~1', 'Program Files') # => false +``` + +It may be enabled by flag [`File::FNM_SHORTNAME`](#constant-filefnmshortname). + +### Any Sequence of Characters (`'*'`) + +The asterisk pattern (`'*'`) matches any sequence of characters: + +```ruby +File.fnmatch('*', 'foo') # => true +File.fnmatch('*', '') # => true +File.fnmatch('*', '*') # => true +File.fnmatch('\*', 'foo') # => false # Escaped. +``` + +By default, the asterisk pattern does not match a leading period (as in a dot-file): + +```ruby +File.fnmatch('*', '.document') # => false +``` + +That matching may be enabled by flag [`File::FNM_DOTMATCH`](#constant-filefnmdotmatch). + +By default, the asterisk pattern matches across file separators: + +```ruby +File.fnmatch('*.rb', 'lib/test.rb') # => true +``` + +That matching may be disabled by flag [`File::FNM_PATHNAME`](#constant-filefnmpathname).
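
The two flags that govern the asterisk can also be combined with a bitwise OR. The sketch below is illustrative only: the variable names and paths are invented, and the calls are plain `File.fnmatch`.

```ruby
pathname_only = File::FNM_PATHNAME
pathname_and_dots = File::FNM_PATHNAME | File::FNM_DOTMATCH

# With FNM_PATHNAME each path component is matched separately, so the
# leading period of '.profile' blocks the second '*'.
hidden_default = File.fnmatch('*/*', 'dave/.profile', pathname_only)
# Adding FNM_DOTMATCH lets '*' match that leading period.
hidden_allowed = File.fnmatch('*/*', 'dave/.profile', pathname_and_dots)
# FNM_DOTMATCH does not relax the separator rule imposed by FNM_PATHNAME.
crosses_separator = File.fnmatch('*.rb', 'lib/test.rb', pathname_and_dots)

p hidden_default    # => false
p hidden_allowed    # => true
p crosses_separator # => false
```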
+ +### Single Character (`'?'`) + +The question-mark pattern (`'?'`) matches any single character: + +```ruby +File.fnmatch('?', 'f') # => true +File.fnmatch("foo-?.txt", "foo-1.txt") # => true +File.fnmatch('?', 'foo') # => false +File.fnmatch('?', '') # => false +File.fnmatch('\?', 'f') # => false # Escaped. +``` + +By default, pattern `'?'` matches the file separator: + +```ruby +File.fnmatch('foo?boo', 'foo/boo') # => true +``` + +That matching may be disabled by flag [`File::FNM_PATHNAME`](#constant-filefnmpathname). + +### Single Character from a Set (`'[abc]'`, `'[^abc]'`) + +Characters enclosed in square brackets define a set of characters, +any of which matches a single character: + +```ruby +File.fnmatch('[ruby]', 'r') # => true +File.fnmatch('[ruby]', 'u') # => true +File.fnmatch('[ruby]', 'y') # => true +File.fnmatch('[ruby]', 'ruby') # => false +File.fnmatch('\[ruby]', 'r') # => false # Escaped. +``` + +The character set may be negated: + +```ruby +File.fnmatch('[^ruby]', 'r') # => false +File.fnmatch('[^ruby]', 'u') # => false +``` + +### Single Character from a \Range (`'[a-c]'`, `'[^a-c]'`) + +A range of characters enclosed in square brackets defines a set of characters, +any of which matches a single character: + +```ruby +File.fnmatch('[a-c]', 'b') # => true +File.fnmatch('[a-c]', 'd') # => false +File.fnmatch('[a-c]', 'abc') # => false +File.fnmatch('R[t-v][a-c]y', 'Ruby') # => true # Multiple ranges allowed. +File.fnmatch('\[a-c]', 'b') # => false # Escaped. 
+``` + +The range may be negated: + +```ruby +File.fnmatch('[^a-c]', 'b') # => false +File.fnmatch('[^a-c]', 'd') # => true +``` + +### Escape (`'\'`) + +The backslash character (`'\'`) may be used to escape any of the characters +that filename matching treats as special: + +```ruby +File.fnmatch('[a-c]', 'b') # => true +File.fnmatch('\[a-c]', 'b') # => false +File.fnmatch('[a-c\]', 'b') # => false +File.fnmatch('[a\-c]', 'b') # => false + +File.fnmatch('{a,b}', 'b', File::FNM_EXTGLOB) # => true +File.fnmatch('\{a,b}', 'b', File::FNM_EXTGLOB) # => false +File.fnmatch('{a\,b}', 'b', File::FNM_EXTGLOB) # => false +File.fnmatch('{a,b\}', 'b', File::FNM_EXTGLOB) # => false +``` + +Use a double-backslash to represent an ordinary backslash: + +```ruby +File.fnmatch('\\\\', '\\') # => true +``` + +By default, escape pattern `'\'` is enabled; +it may be disabled by flag [`File::FNM_NOESCAPE`](#constant-filefnmnoescape). + +## Flags + +Optional argument `flags` (defaults to `0`) may be the bitwise OR +of the constants `File::FNM_*`. + +These are the constants for filename-matching patterns; +see the sections below for details: + +| Constant | Meaning | +|-----------------------------------------------------|-------------------------------------------------------------| +| [`File::FNM_CASEFOLD`](#constant-filefnmcasefold) | Make the pattern case-insensitive. | +| [`File::FNM_DOTMATCH`](#constant-filefnmdotmatch) | Make pattern `'*'` match a leading period. | +| [`File::FNM_EXTGLOB`](#constant-filefnmextglob) | Enable alternatives in pattern. | +| [`File::FNM_NOESCAPE`](#constant-filefnmnoescape) | Disable escaping. | +| [`File::FNM_PATHNAME`](#constant-filefnmpathname) | Make patterns `'*'` and `'?'` not match the file separator. | +| [`File::FNM_SHORTNAME`](#constant-filefnmshortname) | Enable short-name matching (Windows only). | +| [`File::FNM_SYSCASE`](#constant-filefnmsyscase) | Make the pattern use OS's case sensitivity.
| + + +### Constant File::FNM_CASEFOLD + +By default, filename matching is case-sensitive; +use constant [`File::FNM_CASEFOLD`](#constant-filefnmcasefold) +to make the matching case-insensitive: + +```ruby +File.fnmatch('abc', 'ABC') # => false +File.fnmatch('abc', 'ABC', File::FNM_CASEFOLD) # => true +``` + +### Constant File::FNM_DOTMATCH + +By default, filename matching does not allow pattern `'*'` to match a dotfile name +(i.e, a filename beginning with a dot); +use constant [`File::FNM_DOTMATCH`](#constant-filefnmdotmatch) +to enable the match: + +```ruby +File.fnmatch('*', '.document') # => false +File.fnmatch('*', '.document', File::FNM_DOTMATCH) # => true +``` +### Constant File::FNM_EXTGLOB + +By default, filename matching has the alternative notation disabled; +use constant [`File::FNM_EXTGLOB`](#constant-filefnmextglob) +to enable it: + +```ruby +File.fnmatch('R{ub,foo}y', 'Ruby') # => false +File.fnmatch('R{ub,foo}y', 'Ruby', File::FNM_EXTGLOB) # => true +``` + +The alternatives pattern consists of zero or more unquoted strings, +separated by commas, and enclosed in curly braces: + +```ruby +File.fnmatch('R{ub,foo,bar}y', 'Ruby') # => false # Not enabled. +File.fnmatch('R{ub,foo,bar}y', 'Ruby', File::FNM_EXTGLOB) # => true +# Whitespace matters. +File.fnmatch('R{ub ,foo,bar}y', 'Ruby', File::FNM_EXTGLOB) # => false +File.fnmatch('R{ ub,foo,bar}y', 'Ruby', File::FNM_EXTGLOB) # => false +# Special characters remain in force: +File.fnmatch('{*,?}', 'hello', File::FNM_EXTGLOB) # => true +File.fnmatch('{*ello,?}', 'hello', File::FNM_EXTGLOB) # => true +File.fnmatch('{*ELLO,?}', 'hello', File::FNM_EXTGLOB) # => false +File.fnmatch('{*ELLO,?????}', 'hello', File::FNM_EXTGLOB) # => true +# With the flag not given. 
+File.fnmatch('R{ub,foo,bar}y', 'Ruby') # => false +``` + +### Constant File::FNM_NOESCAPE + +By default filename matching has escaping enabled; +use constant [`File::FNM_NOESCAPE`](#constant-filefnmnoescape) +to disable it: + +```ruby +File.fnmatch('\*\?\*\*', '*?**') # => true +File.fnmatch('\*\?\*\*', '*?**', File::FNM_NOESCAPE) # => false +``` + +### Constant File::FNM_PATHNAME + +Flag [`File::FNM_PATHNAME`](#constant-filefnmpathname) affects +patterns `'**'`, `'*'`, and `'?'`. + +By default, the double-asterisk pattern (`'**'`) is equivalent to pattern `'*'`, +and matches any sequence of directory-like substrings: + +```ruby +File.fnmatch('**', 'a/b/c') # => true +File.fnmatch('*', 'a/b/c') # => true +``` + +When flag [`File::FNM_PATHNAME`](#constant-filefnmpathname) is given, +the pattern matches only one component of a file path: + +```ruby +File.fnmatch('**', 'a/b/c') # => true # Matches 'a/b/c'. +File.fnmatch('**', 'a/b/c', File::FNM_PATHNAME) # => false # Matches only 'a'. +File.fnmatch('**', 'a/b/c', File::FNM_PATHNAME) # => false # Matches only 'a/b'. +File.fnmatch('**/*', 'a/b/c', File::FNM_PATHNAME) # => true # Matches 'a/b', then 'c'. 
+``` + +By default, filename matching enables pattern `'*'` to match +at or across the file separator (`File::SEPARATOR`); +use constant [`File::FNM_PATHNAME`](#constant-filefnmpathname) +to disable such matching: + +```ruby +File::SEPARATOR # => "/" +File.fnmatch('*.rb', 'lib/test.rb') # => true +File.fnmatch('*.rb', 'lib/test.rb', File::FNM_PATHNAME) # => false +``` + +By default, filename matching enables pattern `'?'` to match +at or across the file separator (`File::SEPARATOR`); +use constant [`File::FNM_PATHNAME`](#constant-filefnmpathname) +to disable such matching: + +```ruby +File.fnmatch('foo?boo', 'foo/boo') # => true +File.fnmatch('foo?boo', 'foo/boo', File::FNM_PATHNAME) # => false +``` + +### Constant File::FNM_SHORTNAME + +By default, Windows shortname matching is disabled; +use constant [`File::FNM_SHORTNAME`](#constant-filefnmshortname) +to enable it (on Windows only). + +Using that constant allows patterns to match short names +in filename matching on Windows, +which can be useful for compatibility with legacy applications +that rely on these short names; +see [8.3 filename](https://en.wikipedia.org/wiki/8.3_filename). +This feature helps ensure that file operations work correctly +even when dealing with files that have long names. + +```ruby +File::FNM_SHORTNAME.zero? # => false # On Windows, not zero; may be enabled. +File::FNM_SHORTNAME.zero? # => true # Elsewhere, always zero; may not be enabled. + +File.fnmatch('PROGRAM~1', 'Program Files') # => false +# This will be true if and only if on Windows and short name 'PROGRAM~1' exists. +File.fnmatch('PROGRAM~1', 'Program Files', File::FNM_SHORTNAME) # => true +``` + +### Constant File::FNM_SYSCASE + +By default, filename matching uses Ruby's own case-sensitivity rules; +use constant [`File::FNM_SYSCASE`](#constant-filefnmsyscase) +to use the case-sensitivity rules of the underlying file system: + +```ruby +File::FNM_SYSCASE.zero? # => false # On Windows, not zero; may be enabled. 
+File::FNM_SYSCASE.zero? # => true # Elsewhere, always zero; may not be enabled. + +File.fnmatch('abc', 'ABC') # => false # Ruby; case-sensitive. +File.fnmatch('abc', 'ABC', File::FNM_SYSCASE) # => true # Windows; case-insensitive. +File.fnmatch('abc', 'ABC', File::FNM_SYSCASE) # => false # Linux; case-sensitive. +``` + diff --git a/doc/file/timestamps.md b/doc/file/timestamps.md new file mode 100644 index 0000000000..c8ad616567 --- /dev/null +++ b/doc/file/timestamps.md @@ -0,0 +1,83 @@ +# \File System Timestamps + +A file system entry (the name of a file or directory) +has several times (called timestamps) associated with it. + +The Ruby methods that return these timestamps (each as a Time object) +are actually returning "whatever the OS says," +and so their behaviors may vary among OS platforms. +If a platform does not support a particular timestamp, +the corresponding Ruby methods raise NotImplementedError. + +These timestamps are: + +| Name | Meaning | Changes | +|:--------------------------------:|----------------------------------------|-----------------------| +| [`birthtime`](#birth-time) | Create time. | Never. | +| [`mtime`](#modification-time) | Modification time. | When written. | +| [`atime`](#access-time) | Access time. | When read or written. | +| [`ctime`](#metadata-change-time) | Metadata-change time (or create time). | See below. | + +## Birth \Time + +The birth time for an entry is the time the entry was created. +The birth time does not change, although if the entry is deleted and re-created, +the birth time will be different. + +Each of these methods returns the birth time for an entry as a Time object: + +- File::birthtime. +- File#birthtime. +- File::Stat#birthtime. +- Pathname#birthtime. + +On Windows, each of these methods also returns the birth time: + +- File::ctime. +- File#ctime. +- File::Stat#ctime. +- Pathname#ctime. + +## Modification \Time + +The modification time for an entry is the time the entry was last modified.
+The modification time is updated when the entry is written, +though some file systems may delay the update. + +Each of these methods returns the modification time for an entry as a Time object: + +- File::mtime. +- File#mtime. +- File::Stat#mtime. +- Pathname#mtime. + +## Access \Time + +The access time for an entry is the time the entry was last read. +The access time is updated when the entry is read, +though some file systems may delay the update. + +Each of these methods returns the access time for an entry as a Time object: + +- File::atime. +- File#atime. +- File::Stat#atime. +- Pathname#atime. + +## Metadata-Change \Time + +The metadata-change time for an entry is the time the entry's metadata was last changed. +The metadata-change time is updated when the entry's metadata is changed; +changing access mode or permissions may update the metadata-change time, +though some file systems may delay the update. + +On non-Windows systems, +each of these methods returns the metadata-change time for an entry: + +- File::ctime. +- File#ctime. +- File::Stat#ctime. +- Pathname#ctime. + +On Windows, each `ctime` method returns the birth time, +not the metadata-change time. diff --git a/doc/float.rb b/doc/float.rb new file mode 100644 index 0000000000..93b57ebc4c --- /dev/null +++ b/doc/float.rb @@ -0,0 +1,128 @@ +# A \Float object stores a real number +# using the native architecture's double-precision floating-point representation. +# +# == \Float Imprecisions +# +# Some real numbers can be represented precisely as \Float objects: +# +# 37.5 # => 37.5 +# 98.75 # => 98.75 +# 12.3125 # => 12.3125 +# +# Others cannot; among these are the transcendental numbers, including: +# +# - Pi, <i>π</i>: in mathematics, a number of infinite precision: +# 3.1415926535897932384626433...
(to 25 places); +# in Ruby, it is of limited precision (in this case, to 15 decimal places): +# +# Math::PI # => 3.141592653589793 +# +# - Euler's number, <i>e</i>: in mathematics, a number of infinite precision: +# 2.7182818284590452353602874... (to 25 places); +# in Ruby, it is of limited precision (in this case, to 15 decimal places): +# +# Math::E # => 2.718281828459045 +# +# Some floating-point computations in Ruby give precise results: +# +# 1.0/2 # => 0.5 +# 100.0/8 # => 12.5 +# +# Others do not: +# +# - In mathematics, 2/3 as a decimal number is an infinitely-repeating decimal: +# 0.666... (forever); +# in Ruby, +2.0/3+ is of limited precision (in this case, to 16 decimal places): +# +# 2.0/3 # => 0.6666666666666666 +# +# - In mathematics, the square root of 2 is an irrational number of infinite precision: +# 1.4142135623730950488016887... (to 25 decimal places); +# in Ruby, it is of limited precision (in this case, to 16 decimal places): +# +# Math.sqrt(2.0) # => 1.4142135623730951 +# +# - Even a simple computation can introduce imprecision: +# +# x = 0.1 + 0.2 # => 0.30000000000000004 +# y = 0.3 # => 0.3 +# x == y # => false +# +# See: +# +# - https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html +# - https://github.com/rdp/ruby_tutorials_core/wiki/Ruby-Talk-FAQ#user-content--why-are-rubys-floats-imprecise +# - https://en.wikipedia.org/wiki/Floating_point#Accuracy_problems +# +# Note that precise storage and computation of rational numbers +# is possible using Rational objects. +# +# == Creating a \Float +# +# You can create a \Float object explicitly with: +# +# - A {floating-point literal}[rdoc-ref:syntax/literals.rdoc@Float+Literals]. +# +# You can convert certain objects to Floats with: +# +# - Method Kernel#Float. +# +# == What's Here +# +# First, what's elsewhere. Class \Float: +# +# - Inherits from +# {class Numeric}[rdoc-ref:Numeric@Whats+Here] +# and {class Object}[rdoc-ref:Object@Whats+Here].
+# - Includes {module Comparable}[rdoc-ref:Comparable@Whats+Here]. +# +# Here, class \Float provides methods for: +# +# - {Querying}[rdoc-ref:Float@Querying] +# - {Comparing}[rdoc-ref:Float@Comparing] +# - {Converting}[rdoc-ref:Float@Converting] +# +# === Querying +# +# - #finite?: Returns whether +self+ is finite. +# - #hash: Returns the integer hash code for +self+. +# - #infinite?: Returns whether +self+ is infinite. +# - #nan?: Returns whether +self+ is a NaN (not-a-number). +# +# === Comparing +# +# - #<: Returns whether +self+ is less than the given value. +# - #<=: Returns whether +self+ is less than or equal to the given value. +# - #<=>: Returns a number indicating whether +self+ is less than, equal +# to, or greater than the given value. +# - #== (aliased as #=== and #eql?): Returns whether +self+ is equal to +# the given value. +# - #>: Returns whether +self+ is greater than the given value. +# - #>=: Returns whether +self+ is greater than or equal to the given value. +# +# === Converting +# +# - #% (aliased as #modulo): Returns +self+ modulo the given value. +# - #*: Returns the product of +self+ and the given value. +# - #**: Returns the value of +self+ raised to the power of the given value. +# - #+: Returns the sum of +self+ and the given value. +# - #-: Returns the difference of +self+ and the given value. +# - #/: Returns the quotient of +self+ and the given value. +# - #ceil: Returns the smallest number greater than or equal to +self+. +# - #coerce: Returns a 2-element array containing the given value converted to a \Float +# and +self+ +# - #divmod: Returns a 2-element array containing the quotient and remainder +# results of dividing +self+ by the given value. +# - #fdiv: Returns the \Float result of dividing +self+ by the given value. +# - #floor: Returns the greatest number smaller than or equal to +self+. +# - #next_float: Returns the next-larger representable \Float. +# - #prev_float: Returns the next-smaller representable \Float. 
+# - #quo: Returns the quotient from dividing +self+ by the given value. +# - #round: Returns +self+ rounded to the nearest value, to a given precision. +# - #to_i (aliased as #to_int): Returns +self+ truncated to an Integer. +# - #to_s (aliased as #inspect): Returns a string containing the place-value +# representation of +self+. +# - #truncate: Returns +self+ truncated to a given precision. +# + + class Float; end diff --git a/doc/forwardable.rd.ja b/doc/forwardable.rd.ja deleted file mode 100644 index 53e8202513..0000000000 --- a/doc/forwardable.rd.ja +++ /dev/null @@ -1,80 +0,0 @@ - -- forwardable.rb - $Release Version: 1.1 $ - $Revision$ - -=begin -= Forwardable - -Defines method delegation for a class. - -== Usage - -Use it by extending a class with it. - - class Foo - extend Forwardable - - def_delegators("@out", "printf", "print") - def_delegators(:@in, :gets) - def_delegator(:@contents, :[], "content_at") - end - f = Foo.new - f.printf ... - f.gets - f.content_at(1) - -== Methods - ---- Forwardable#def_instance_delegators(accessor, *methods) - - Delegates the list of methods passed in ((|methods|)) to - ((|accessor|)). - ---- Forwardable#def_instance_delegator(accessor, method, ali = method) - - Delegates the method passed in ((|method|)) to ((|accessor|)). - If ((|ali|)) is given as an argument, calling method ((|ali|)) - invokes ((|method|)) on ((|accessor|)). - ---- Forwardable#def_delegators(accessor, *methods) - - Alias for ((|Forwardable#def_instance_delegators|)). - ---- Forwardable#def_delegator(accessor, method, ali = method) - - Alias for ((|Forwardable#def_instance_delegator|)). - -= SingleForwardable - -Defines method delegation for an object. - -== Usage - -Use it by calling ((|extend|)) on an object. - - g = Goo.new - g.extend SingleForwardable - g.def_delegator("@out", :puts) - g.puts ... - -== Methods - ---- SingleForwardable#def_singleton_delegators(accessor, *methods) - - Delegates the list of methods passed in ((|methods|)) to - ((|accessor|)).
- ---- SingleForwardable#def_singleton_delegator(accessor, method, ali = method) - - Delegates the method passed in ((|method|)) to ((|accessor|)). - If ((|ali|)) is given as an argument, calling method ((|ali|)) - invokes ((|method|)) on ((|accessor|)). - ---- SingleForwardable#def_delegators(accessor, *methods) - - Alias for ((|SingleForwardable#def_singleton_delegators|)). - ---- SingleForwardable#def_delegator(accessor, method, ali = method) - - Alias for ((|SingleForwardable#def_singleton_delegator|)). -=end diff --git a/doc/yjit/yjit.md b/doc/jit/yjit.md index 0024c780b9..d91877c30e 100644 --- a/doc/yjit/yjit.md +++ b/doc/jit/yjit.md @@ -14,7 +14,7 @@ This project is open source and falls under the same license as CRuby. <p align="center"><b> If you're using YJIT in production, please - <a href="mailto:maxime.chevalierboisvert@shopify.com">share your success stories with us!</a> + <a href="mailto:ruby@shopify.com">share your success stories with us!</a> </b></p> If you wish to learn more about the approach taken, here are some conference talks and publications: @@ -91,7 +91,10 @@ git clone https://github.com/ruby/ruby yjit cd yjit ``` -The YJIT `ruby` binary can be built with either GCC or Clang. It can be built either in dev (debug) mode or in release mode. For maximum performance, compile YJIT in release mode with GCC. More detailed build instructions are provided in the [Ruby README](https://github.com/ruby/ruby#how-to-build). +The YJIT `ruby` binary can be built with either GCC or Clang. +It can be built either in dev (debug) mode or in release mode. +For maximum performance, compile YJIT in release mode with GCC. +See [Building Ruby](rdoc-ref:contributing/building_ruby.md@building-ruby).
```sh # Configure in release mode for maximum performance, build and install @@ -318,8 +321,8 @@ Some of the counters include: * `:code_gc_count` - number of garbage collections of compiled code since process start * `:vm_insns_count` - number of instructions executed by the Ruby interpreter * `:compiled_iseq_count` - number of bytecode sequences compiled -* `:inline_code_size` - size in bytes of compiled YJIT blocks -* `:outline_code_size` - size in bytes of YJIT error-handling compiled code +* `:inline_code_size` - size in bytes of main-line machine code +* `:outlined_code_size` - size in bytes of relatively uncommonly executed machine code * `:side_exit_count` - number of side exits taken at runtime * `:total_exit_count` - number of exits, including side exits, taken at runtime * `:avg_len_in_yjit` - avg. number of instructions in compiled blocks before exiting to interpreter @@ -388,7 +391,7 @@ There are multiple test suites: - `make test-all` - `make test-spec` - `make check` runs all of the above -- `make yjit-smoke-test` runs quick checks to see that YJIT is working correctly +- `make yjit-check` runs quick checks to see that YJIT is working correctly The tests can be run in parallel like this: @@ -522,7 +525,7 @@ PERF=record ruby --yjit-perf=codegen -Iharness-perf benchmarks/lobsters/benchmar # Aggregate results perf script > /tmp/perf.txt -../ruby/misc/yjit_perf.py /tmp/perf.txt +../ruby/misc/jit_perf.py /tmp/perf.txt ``` #### Building perf with Python support @@ -540,5 +543,5 @@ make make install # Aggregate results -perf script -s ../ruby/misc/yjit_perf.py +perf script -s ../ruby/misc/jit_perf.py ``` diff --git a/doc/jit/zjit.md b/doc/jit/zjit.md new file mode 100644 index 0000000000..ebe5cc4f9b --- /dev/null +++ b/doc/jit/zjit.md @@ -0,0 +1,461 @@ +<p align="center"> + <img src="https://github.com/user-attachments/assets/27abfe03-3e96-4220-b6f1-278bb0c87684" width="400"> +</p> + +# ZJIT: ADVANCED RUBY JIT PROTOTYPE + +ZJIT is a method-based just-in-time 
(JIT) compiler for Ruby. It uses profile +information from the interpreter to guide optimization in the compiler. + +ZJIT is currently supported for macOS, Linux and BSD on x86-64 and arm64/aarch64 CPUs. +This project is open source and falls under the same license as CRuby. + +## Current Limitations + +ZJIT may not be suitable for certain applications. It currently only supports macOS, Linux and BSD on x86-64 and arm64/aarch64 CPUs. ZJIT will use more memory than the Ruby interpreter because the JIT compiler needs to generate machine code in memory and maintain additional state information. +You can change how much executable memory is allocated using [ZJIT's command-line options](rdoc-ref:@Command-Line+Options). + +## Contributing + +We welcome open source contributions. Feel free to open new issues to report +bugs or just to ask questions. Suggestions on how to make this document more +helpful for new contributors are most welcome. + +Bug fixes and bug reports are very valuable to us. If you find a bug in ZJIT, +it's very possible that nobody has reported it before, or that we don't have +a good reproduction for it, so please open a ticket on [the official Ruby bug +tracker][rubybugs] (or, if you don't want to make an account, [on +Shopify/ruby][shopifyruby]) and provide as much information as you can about +your configuration and a description of how you encountered the problem. List +the commands you used to run ZJIT so that we can easily reproduce the issue on +our end and investigate it. If you are able to produce a small program +reproducing the error to help us track it down, that is very much appreciated +as well. 
+ +[rubybugs]: https://bugs.ruby-lang.org/projects/ruby-master +[shopifyruby]: https://github.com/Shopify/ruby/issues + +If you would like to contribute a large patch to ZJIT, we suggest [chatting on +Zulip][zulip] for a casual chat and then opening an issue on the [Shopify/ruby +repository][shopifyruby] so that we can have a technical discussion. A common +problem is that sometimes people submit large pull requests to open source +projects without prior communication, and we have to reject them because the +work they implemented does not fit within the design of the project. We want to +save you time and frustration, so please reach out so we can have a productive +discussion as to how you can contribute patches we will want to merge into +ZJIT. + +[zulip]: https://zjit.zulipchat.com/ + +## Build Instructions + +Refer to [Building Ruby](rdoc-ref:contributing/building_ruby.md) for general build prerequisites. +Additionally, ZJIT requires Rust 1.85.0 or later. Release builds need only `rustc`. Development +builds require `cargo` and may download dependencies. GNU Make is required. 
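As a rough pre-flight check for the Rust requirement above, a small Ruby helper can compare an installed `rustc` against 1.85.0. This is only a sketch, not part of the Ruby build system: the helper name `rustc_new_enough?` and the constant `MINIMUM_RUSTC` are hypothetical, and it assumes `rustc --version` prints a line of the form `rustc 1.85.0 (...)`.

```ruby
# Hypothetical helper (not part of the Ruby build system): reports whether a
# `rustc --version` line names at least the version ZJIT requires.
MINIMUM_RUSTC = Gem::Version.new("1.85.0")

def rustc_new_enough?(version_line)
  # Pick out the first "X.Y.Z" triple, e.g. from "rustc 1.85.0 (4d91de4e4 2025-02-17)".
  triple = version_line[/\d+\.\d+\.\d+/]
  return false unless triple # no version found (for example, rustc is not installed)

  Gem::Version.new(triple) >= MINIMUM_RUSTC
end
```

One possible use is `rustc_new_enough?(`rustc --version 2>/dev/null`)` before running `./configure --enable-zjit`; an empty string (no `rustc` on `PATH`) is treated as "not new enough".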
+ +### For normal use + +To build ZJIT on macOS: + +```bash +./autogen.sh + +./configure \ + --enable-zjit \ + --prefix="$HOME"/.rubies/ruby-zjit \ + --disable-install-doc \ + --with-opt-dir="$(brew --prefix openssl):$(brew --prefix readline):$(brew --prefix libyaml)" + +make -j miniruby +``` + +To build ZJIT on Linux: + +```bash +./autogen.sh + +./configure \ + --enable-zjit \ + --prefix="$HOME"/.rubies/ruby-zjit \ + --disable-install-doc + +make -j miniruby +``` + +### For development + +To build ZJIT on macOS: + +```bash +./autogen.sh + +./configure \ + --enable-zjit=dev \ + --prefix="$HOME"/.rubies/ruby-zjit \ + --disable-install-doc \ + --with-opt-dir="$(brew --prefix openssl):$(brew --prefix readline):$(brew --prefix libyaml)" + +make -j miniruby +``` + +To build ZJIT on Linux: + +```bash +./autogen.sh + +./configure \ + --enable-zjit=dev \ + --prefix="$HOME"/.rubies/ruby-zjit \ + --disable-install-doc + +make -j miniruby +``` + +Note that `--enable-zjit=dev` does a lot of IR validation, which will help to catch errors early but mean compilation and warmup are significantly slower. + +The valid values for `--enable-zjit` are, from fastest to slowest: +* `--enable-zjit`: enable ZJIT in release mode for maximum performance +* `--enable-zjit=stats`: enable ZJIT in extended-stats mode +* `--enable-zjit=dev_nodebug`: enable ZJIT in development mode but without slow runtime checks +* `--enable-zjit=dev`: enable ZJIT in debug mode for development, also enables `RUBY_DEBUG` + +### Regenerate bindings + +When modifying `zjit/bindgen/src/main.rs` you need to regenerate bindings in `zjit/src/cruby_bindings.inc.rs` with: + +```bash +make zjit-bindgen +``` + +## Documentation + +### Command-Line Options + +See `ruby --help` for ZJIT-specific command-line options: + +``` +$ ruby --help +... +ZJIT options: + --zjit-mem-size=num + Max amount of memory that ZJIT can use in MiB (default: 128). + --zjit-call-threshold=num + Number of calls to trigger JIT (default: 30). 
+ --zjit-num-profiles=num + Number of profiled calls before JIT (default: 5). + --zjit-stats[=quiet] + Enable collecting ZJIT statistics (=quiet to suppress output). + --zjit-disable Disable ZJIT for lazily enabling it with RubyVM::ZJIT.enable. + --zjit-perf Dump ISEQ symbols into /tmp/perf-{}.map for Linux perf. + --zjit-log-compiled-iseqs=path + Log compiled ISEQs to the file. The file will be truncated. + --zjit-trace-exits[=counter] + Record source on side-exit. `Counter` picks specific counter. + --zjit-trace-exits-sample-rate=num + Frequency at which to record side exits. Must be `usize`. +$ +``` + +### Source level documentation + +You can generate and open the source level documentation in your browser using: + +```bash +cargo doc --document-private-items -p zjit --open +``` + +### Graph of the Type System + +You can generate a graph of the ZJIT type hierarchy using: + +```bash +ruby zjit/src/hir_type/gen_hir_type.rb > zjit/src/hir_type/hir_type.inc.rs +dot -O -Tpdf zjit_types.dot +open zjit_types.dot.pdf +``` + +## Testing + +Note that tests link against CRuby, so directly calling `cargo test`, or `cargo nextest` should not build. All tests are instead accessed through `make`. + +### Setup + +First, ensure you have `cargo` installed. If you do not already have it, you can use [rustup.rs](https://rustup.rs/). + +Also install cargo-binstall with: + +```bash +cargo install cargo-binstall +``` + +Make sure to add `--enable-zjit=dev` when you run `configure`, then install the following tools: + +```bash +cargo binstall --secure cargo-nextest +cargo binstall --secure cargo-insta +``` + +`cargo-insta` is used for updating snapshots. `cargo-nextest` runs each test in its own process, which is valuable since CRuby only supports booting once per process, and most APIs are not thread safe. 
+ +### Running unit tests + +For testing functionality within ZJIT, use: + +```bash +make zjit-test +``` + +You can also run a single test case by specifying the function name: + +```bash +make zjit-test ZJIT_TESTS=test_putobject +``` + +#### Snapshot Testing + +ZJIT uses [insta](https://insta.rs/) for snapshot testing within unit tests. When tests fail due to snapshot mismatches, pending snapshots are created. The test command will notify you if there are pending snapshots: + +``` +Pending snapshots found. Accept with: make zjit-test-update +``` + +To update/accept all the snapshot changes: + +```bash +make zjit-test-update +``` + +You can also review snapshot changes interactively one by one: + +```bash +cd zjit && cargo insta review +``` + +Test changes will be reviewed alongside code changes. + +### Running integration tests + +This command runs Ruby execution tests. + +```bash +make test-all TESTS="test/ruby/test_zjit.rb" +``` + +You can also run a single test case by matching the method name: + +```bash +make test-all TESTS="test/ruby/test_zjit.rb -n TestZJIT#test_putobject" +``` + +### Running all tests + +Runs both `make zjit-test` and `test/ruby/test_zjit.rb`: + +```bash +make zjit-check +``` + +## Statistics Collection + +ZJIT provides detailed statistics about JIT compilation and execution behavior. + +### Basic Stats + +Run with basic statistics printed on exit: + +```bash +./miniruby --zjit-stats script.rb +``` + +Collect stats without printing (access via `RubyVM::ZJIT.stats` in Ruby): + +```bash +./miniruby --zjit-stats=quiet script.rb +``` + +### Accessing Stats in Ruby + +```ruby +# Check if stats are enabled +if RubyVM::ZJIT.stats_enabled? + stats = RubyVM::ZJIT.stats + puts "Compiled ISEQs: #{stats[:compiled_iseq_count]}" + puts "Failed ISEQs: #{stats[:failed_iseq_count]}" + + # You can also reset stats during execution + RubyVM::ZJIT.reset_stats! 
+end +``` + +### Performance Ratio + +The `ratio_in_zjit` stat shows the percentage of Ruby instructions executed in JIT code vs interpreter. +This metric only appears when ZJIT is built with `--enable-zjit=stats` [or more](#build-instructions) (which enables `rb_vm_insn_count` tracking) and represents a key performance indicator for ZJIT effectiveness. + +### Tracing side exits + +`--zjit-trace-exits` records a backtrace every time compiled code takes a +side exit. The output is a [Fuchsia Trace Format](https://fuchsia.dev/fuchsia-src/reference/tracing/trace-format) +(`.fxt`) file written to `/tmp/perfetto-{pid}.fxt`, which can be opened +directly in [Perfetto UI](https://ui.perfetto.dev/) or queried with the +[Perfetto trace processor](https://perfetto.dev/docs/quickstart/trace-analysis). + +```bash +$ ./miniruby --zjit-trace-exits -e ' +def poly(x) + x.to_s +end + +30.times { poly(1) } +30.times { poly("hello") } +30.times { poly(:sym) } +' +ZJIT: writing trace exits to /tmp/perfetto-123456.fxt +``` + +To find the hottest side-exit locations, open the `.fxt` file in +[Perfetto UI](https://ui.perfetto.dev/) and run an SQL query via the +"Query (SQL)" tab in the bottom panel. 
Alternatively, download +`trace_processor_shell` to query from the command line: + +```bash +curl -Lo /tmp/trace_processor_shell https://get.perfetto.dev/trace_processor +chmod +x /tmp/trace_processor_shell + +/tmp/trace_processor_shell /tmp/perfetto-123456.fxt -Q " +SELECT reason, backtrace, count(*) AS exits FROM ( + SELECT + s.id, + s.name AS reason, + group_concat(a.display_value, ' <- ') AS backtrace + FROM slice s + JOIN args a USING(arg_set_id) + WHERE s.category = 'side_exit' + GROUP BY s.id +) +GROUP BY reason, backtrace +ORDER BY exits DESC +LIMIT 30 +" +``` + +Example output: + +``` +"reason","backtrace","exits" +"GuardType(Fixnum)","Object#poly (-e) <- block in <main> (-e) <- Integer#times (<internal:numeric>) <- <main> (-e)",60 +``` + +You can also trace a specific counter with `--zjit-trace-exits=<counter_name>` +(e.g. `--zjit-trace-exits=exit_compile_error`), or downsample with +`--zjit-trace-exits-sample-rate=N` to record every N-th exit. +Enabling `--zjit-trace-exits-sample-rate=N` will automatically enable +`--zjit-trace-exits`. + +### Viewing HIR as text + +The compiled ZJIT HIR can be viewed as text using the `--zjit-dump-hir` option. However, HIR will only be generated if the call threshold is reached (default 30). By setting the threshold to 1 you can easily view the HIR for code snippets such as `1 + 1`: + +```bash +./miniruby --zjit --zjit-dump-hir --zjit-call-threshold=1 -e "1 + 1" +``` + +Note that this disables profiling. To inject interpreter profiles into ZJIT, consider running your sample code 30 times: + +```bash +./miniruby --zjit --zjit-dump-hir -e "30.times { 1 + 1 }" +``` + +### Viewing HIR in Iongraph + +Using `--zjit-dump-hir-iongraph` will dump all compiled functions into a directory named `/tmp/zjit-iongraph-{PROCESS_PID}`. Each file will be named `func_{ZJIT_FUNC_NAME}.json`. In order to use them in the Iongraph viewer, you'll need to use `jq` to collate them to a single file. 
An example invocation of `jq` is shown below for reference. + +`jq --slurp --null-input '.functions=inputs | .version=1' /tmp/zjit-iongraph-{PROCESS_PID}/func*.json > ~/Downloads/ion.json` + +From there, you can use https://mozilla-spidermonkey.github.io/iongraph/ to view your trace. + +### Printing ZJIT Errors + +`--zjit-debug` prints ZJIT compilation errors and other diagnostics: + +```bash +./miniruby --zjit-debug script.rb +``` + +As you might guess from the name, this option is intended mostly for ZJIT developers. + +## Useful dev commands + +To view YARV output for code snippets: + +```bash +./miniruby --dump=insns -e0 +``` + +To run code snippets with ZJIT: + +```bash +./miniruby --zjit -e0 +``` + +You can also try https://www.rubyexplorer.xyz/ to view Ruby YARV disasm output with syntax highlighting +in a way that can be easily shared with other team members. + +## Understanding Ruby Stacks + +Ruby execution involves three distinct stacks and understanding them will help you understand ZJIT's implementation: + +### 1. Native Stack + +- **Purpose**: Return addresses and saved registers. ZJIT also uses it for some C functions' argument arrays +- **Management**: OS-managed, one per native thread +- **Growth**: Downward from high addresses +- **Constants**: `NATIVE_STACK_PTR`, `NATIVE_BASE_PTR` + +### 2. Ruby VM Stack + +The Ruby VM uses a single contiguous memory region (`ec->vm_stack`) containing two sub-stacks that grow toward each other. When they meet, stack overflow occurs. + +See [doc/contributing/vm_stack_and_frames.md](rdoc-ref:contributing/vm_stack_and_frames.md) for detailed architecture and frame layout. 
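The idea of two sub-stacks sharing one region and growing toward each other can be modelled in a few lines of Ruby. This is a toy illustration only, not how CRuby implements `ec->vm_stack`: the class `ToyVMStack` and its methods are hypothetical, and each frame takes a single slot here, whereas real control frames are multi-word structures.

```ruby
# Toy model only: one fixed-size region shared by two stacks that grow toward
# each other. All names here are hypothetical, chosen for illustration.
class ToyVMStack
  def initialize(size)
    @slots = Array.new(size)
    @sp  = 0     # next free value slot; values grow upward from the low end
    @cfp = size  # lowest occupied frame slot; frames grow downward from the high end
  end

  # Push an operand onto the value side of the region.
  def push_value(value)
    raise SystemStackError, "toy stack overflow" if @sp >= @cfp

    @slots[@sp] = value
    @sp += 1
  end

  # Push a frame record onto the frame side of the region.
  def push_frame(frame)
    raise SystemStackError, "toy stack overflow" if @cfp <= @sp

    @cfp -= 1
    @slots[@cfp] = frame
  end

  # Number of slots left before the two sides meet.
  def free_slots
    @cfp - @sp
  end
end
```

Either kind of push can exhaust the region, which is why deep recursion (many frames) and very large operand usage both end in the same overflow condition in this model.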
+ +**Control Frame Stack:** + +- **Stores**: Frame metadata (`rb_control_frame_t` structures) +- **Growth**: Downward from `vm_stack + size` (high addresses) +- **Constants**: `CFP` + +**Value Stack:** + +- **Stores**: YARV bytecode operands (self, arguments, locals, temporaries) +- **Growth**: Upward from `vm_stack` (low addresses) +- **Constants**: `SP` + +## ZJIT Glossary + +This glossary contains terms that are helpful for understanding ZJIT. + +Please note that some terms may appear in CRuby internals too but with different meanings. + +| Term | Definition | +| ----------------- | ------------------------------------------------------------------------------------------------------------------------------- | +| HIR | High-level Intermediate Representation. High-level (Ruby semantics) graph representation in static single-assignment (SSA) form | +| LIR | Low-level Intermediate Representation. Low-level IR used in the backend for assembly generation | +| SSA | Static Single Assignment. A form where each variable is assigned exactly once | +| `opnd` | Operand. An operand to an IR instruction (can be register, memory, immediate, etc.) | +| `dst` | Destination. The output operand of an instruction where the result is stored | +| VReg | Virtual Register. A virtual register that gets lowered to physical register or memory | +| `insn_id` | Instruction ID. An index of an instruction in a function | +| `block_id` | The index of a basic block, which effectively acts like a pointer | +| `branch` | Control flow edge between basic blocks in the compiled code | +| `cb` | Code Block. 
Memory region for generated machine code | +| `entry` | The starting address of compiled code for an ISEQ | +| Patch Point | Location in generated code that can be modified later in case assumptions get invalidated | +| Frame State | Captured state of the Ruby stack frame at a specific point for deoptimization | +| Guard | A run-time check that ensures assumptions are still valid | +| `invariant` | An assumption that JIT code relies on, requiring invalidation if broken | +| Deopt | Deoptimization. Process of falling back from JIT code to interpreter | +| Side Exit | Exit from JIT code back to interpreter | +| Type Lattice | Hierarchy of types used for type inference and optimization | +| Constant Folding | Optimization that evaluates constant expressions at compile time | +| RSP | x86-64 stack pointer register used for native stack operations | +| Register Spilling | Process of moving register values to memory when running out of physical registers | diff --git a/doc/language/box.md b/doc/language/box.md new file mode 100644 index 0000000000..92514b3ec9 --- /dev/null +++ b/doc/language/box.md @@ -0,0 +1,357 @@ +# Ruby Box - Ruby's in-process separation of Classes and Modules + +Ruby Box is designed to provide separated spaces in a Ruby process, to isolate application code, libraries and monkey patches. + +## Known issues + +* Experimental warning is shown when ruby starts with `RUBY_BOX=1` (specify `-W:no-experimental` option to hide it) +* Installing native extensions may fail under `RUBY_BOX=1` because of stack level too deep in extconf.rb +* `require 'active_support/core_ext'` may fail under `RUBY_BOX=1` +* Defined methods in a box may not be referred by built-in methods written in Ruby + +## TODOs + +* Add the loaded box on iseq to check if another box tries running the iseq (add a field only when VM_CHECK_MODE?) 
+* Assign its own TOPLEVEL_BINDING in boxes +* Fix calling `warn` in boxes to refer `$VERBOSE` and `Warning.warn` in the box +* Make an internal data container class `Ruby::Box::Entry` invisible +* More test cases about `$LOAD_PATH` and `$LOADED_FEATURES` + +## How to use + +### Enabling Ruby Box + +First, an environment variable should be set at the ruby process bootup: `RUBY_BOX=1`. +The only valid value is `1` to enable Ruby Box. Other values (or unset `RUBY_BOX`) means disabling Ruby Box. And setting the value after Ruby program starts doesn't work. + +### Using Ruby Box + +`Ruby::Box` class is the entrypoint of Ruby Box. + +```ruby +box = Ruby::Box.new +box.require('something') # or require_relative, load +``` + +The required file (either .rb or .so/.dll/.bundle) is loaded in the box (`box` here). The required/loaded files from `something` will be loaded in the box recursively. + +```ruby +# something.rb + +X = 1 + +class Something + def self.x = X + def x = ::X +end +``` + +Classes/modules, those methods and constants defined in the box can be accessed via `box` object. + +```ruby +X = 2 +p X # 2 +p ::X # 2 +p box::Something.x # 1 +p box::X # 1 +``` + +Instance methods defined in the box also run with definitions in the box. + +```ruby +s = box::Something.new + +p s.x # 1 +``` + +## Specifications + +### Ruby Box types + +There are three box types: + +* Master box +* Root box +* User boxes + +Ruby bootstrap runs in the root box, and a + +There is the root box, just a single box in a Ruby process. All builtin classes/modules are defined and run in the root box. (See "Builtin classes and modules".) + +User boxes are to run user-written programs and libraries loaded from user programs. The user's main program (specified by the `ruby` command line argument) is executed in the "main" box, which is a user box automatically created at the end of Ruby's bootstrap. The files specified with `-r` command line option will be required in the main box. 
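To see which kind of box a piece of code is running in, the utility methods listed later in this document (`Ruby::Box.enabled?`, `Ruby::Box.current`, `Ruby::Box.main`, `Ruby::Box.root`) can be combined as below. This is a hedged sketch: the helper name `box_kind` is hypothetical, Ruby Box is experimental, and the comparison assumes the class methods return the same box object on each call.

```ruby
# Hypothetical helper: classifies the box the calling code runs in.
# Returns :disabled when Ruby Box is unavailable or RUBY_BOX=1 was not set.
def box_kind
  return :disabled unless defined?(Ruby::Box) && Ruby::Box.enabled?

  current = Ruby::Box.current
  if current.equal?(Ruby::Box.main)
    :main      # the user's main program and `-r` files
  elsif current.equal?(Ruby::Box.root)
    :root      # bootstrap and builtin classes
  else
    :optional  # a box created with Ruby::Box.new
  end
end

puts "box kind: #{box_kind}"
```

Run as a script with `RUBY_BOX=1 ruby script.rb`, the description above suggests this should report the main box; without the environment variable it reports that boxes are disabled.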
+ +Calling `Ruby::Box.new` creates an "optional" box (a user, non-main box), technically equal to the main box. + +Ruby also has the master box. The master box is the "master copy" of all boxes. Boxes will be created as a copy of the master box. The master box is only for the source of box copies, and no code runs in the master box. + + +``` +[master] + | + |----[root] + | + |----[main] + | + |----[user box 1] + | + |----[user box 2] + ... +``` + +### Ruby Box class and instances + +`Ruby::Box` is a class, as a subclass of `Module`. `Ruby::Box` instances are a kind of `Module`. + +### Classes and modules defined in boxes + +The classes and modules, newly defined in a box `box`, are accessible via `box`. For example, if a class `A` is defined in `box`, it is accessible as `box::A` from outside of the box. + +In the box `box`, `A` can be referred to as `A` (and `::A`). + +### Built-in classes and modules reopened in boxes + +In boxes, builtin classes/modules are visible and can be reopened. Those classes/modules can be reopened using `class` or `module` clauses, and class/module definitions can be changed. + +The changed definitions are visible only in the box. In other boxes, builtin classes/modules and those instances work without changed definitions. + +```ruby +# in foo.rb +class String + BLANK_PATTERN = /\A\s*\z/ + def blank? + self.match?(BLANK_PATTERN) + end +end + +module Foo + def self.foo = "foo" + + def self.foo_is_blank? + foo.blank? + end +end + +Foo.foo.blank? #=> false +"foo".blank? #=> false + +# in main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::Foo.foo_is_blank? #=> false (#blank? called in box) + +"foo".blank? # NoMethodError +String::BLANK_PATTERN # NameError +``` + +The main box and `box` above are different boxes, so monkey patches in main are also invisible in `box`. 
+ +### Builtin classes and modules + +In the box context, "builtin" classes and modules are classes and modules: + +* Accessible without any `require` calls in user scripts +* Defined before any user program start running + +Hereafter, "builtin classes and modules" will be referred to as just "builtin classes". + +Builtin classes and modules are loaded in all boxes, and run in the root box. + +### Exceptional non-built-in classes/modules + +There are some exceptional classes/modules that are enabled in default, but aren't built-in classes. Those classes/modules are: + +* `RubyGems` +* `ErrorHighlight` +* `DidYouMean` +* `SyntaxSuggest` + +Those classes/modules (part of default gems) are loaded in each boxes independently. If a user box's code calls RubyGems, it calls the RubyGems inside the box itself, instead of the root box's one. + +### Builtin classes referred via box objects + +Builtin classes in a box `box` can be referred from other boxes. For example, `box::String` is a valid reference, and `String` and `box::String` are identical (`String == box::String`, `String.object_id == box::String.object_id`). + +`box::String`-like reference returns just a `String` in the current box, so its definition is `String` in the box, not in `box`. + +```ruby +# foo.rb +class String + def self.foo = "foo" +end + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::String.foo # NoMethodError +``` + +### Class instance variables, class variables, constants + +Builtin classes can have different sets of class instance variables, class variables and constants between boxes. 
+ +```ruby +# foo.rb +class Array + @v = "foo" + @@v = "_foo_" + V = "FOO" +end + +Array.instance_variable_get(:@v) #=> "foo" +Array.class_variable_get(:@@v) #=> "_foo_" +Array.const_get(:V) #=> "FOO" + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +Array.instance_variable_get(:@v) #=> nil +Array.class_variable_get(:@@v) # NameError +Array.const_get(:V) # NameError +``` + +### Global variables + +In boxes, changes on global variables are also isolated in the boxes. Changes on global variables in a box are visible/applied only in the box. + +```ruby +# foo.rb +$foo = "foo" +$VERBOSE = nil + +puts "This appears: '#{$foo}'" + +# main.rb +p $foo #=> nil +p $VERBOSE #=> false + +box = Ruby::Box.new +box.require_relative('foo') # "This appears: 'foo'" + +p $foo #=> nil +p $VERBOSE #=> false +``` + +### Top level constants + +Usually, top level constants are defined as constants of `Object`. In boxes, top level constants are constants of `Object` in the box. And the box object `box`'s constants are strictly equal to constants of `Object`. + +```ruby +# foo.rb +FOO = 100 + +FOO #=> 100 +Object::FOO #=> 100 + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::FOO #=> 100 + +FOO # NameError +Object::FOO # NameError +``` + +### Top level methods + +Top level methods are private instance methods of `Object`, in each box. + +```ruby +# foo.rb +def yay = "foo" + +class Foo + def self.say = yay +end + +Foo.say #=> "foo" +yay #=> "foo" + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::Foo.say #=> "foo" + +yay # NoMethodError +``` + +There is no way to expose top level methods in boxes to others. +(See "Expose top level methods as a method of the box object" in "Discussions" section below) + +### Ruby Box scopes + +Ruby Box works in file scope. One `.rb` file runs in a single box. + +Once a file is loaded in a box `box`, all methods/procs defined/created in the file run in `box`. 
+ +### Utility methods + +Several methods are available for trying/testing Ruby Box. + +* `Ruby::Box.current` returns the current box +* `Ruby::Box.enabled?` returns true/false to represent `RUBY_BOX=1` is specified or not +* `Ruby::Box.root` returns the root box +* `Ruby::Box.main` returns the main box +* `Ruby::Box#eval` evaluates a Ruby code (String) in the receiver box, just like calling `#load` with a file + +## Implementation details + +#### ISeq inline method/constant cache + +As described above in "Ruby Box scopes", an ".rb" file runs in a box. So method/constant resolution will be done in a box consistently. + +That means ISeq inline caches work well even with boxes. Otherwise, it's a bug. + +#### Method call global cache (gccct) + +`rb_funcall()` C function refers to the global cc cache table (gccct), and the cache key is calculated with the current box. + +So, `rb_funcall()` calls have a performance penalty when Ruby Box is enabled. + +#### Current box and loading box + +The current box is the box that the executing code is in. `Ruby::Box.current` returns the current box object. + +The loading box is an internally managed box to determine the box to load newly required/loaded files. For example, `box` is the loading box when `box.require("foo")` is called. + +## Discussions + +#### More builtin methods written in Ruby + +If Ruby Box is enabled by default, builtin methods can be written in Ruby because it can't be overridden by users' monkey patches. Builtin Ruby methods can be JIT-ed, and it could bring performance reward. + +#### Monkey patching methods called by builtin methods + +Builtin methods sometimes call other builtin methods. For example, `Hash#map` calls `Hash#each` to retrieve entries to be mapped. Without Ruby Box, Ruby users can overwrite `Hash#each` and expect the behavior change of `Hash#map` as a result. + +But with boxes, `Hash#map` runs in the root box. 
Ruby users can define `Hash#each` only in user boxes, so users cannot change `Hash#map`'s behavior in this case. To achieve it, users should override both`Hash#map` and `Hash#each` (or only `Hash#map`). + +It is a breaking change. + +Users can define methods using `Ruby::Box.root.eval(...)`, but it's clearly not ideal API. + +#### Assigning values to global variables used by builtin methods + +Similar to monkey patching methods, global variables assigned in a box is separated from the root box. Methods defined in the root box referring a global variable can't find the re-assigned one. + +#### Context of `$LOAD_PATH` and `$LOADED_FEATURES` + +Global variables `$LOAD_PATH` and `$LOADED_FEATURES` control `require` method behaviors. So those variables are determined by the loading box instead of the current box. + +This could potentially conflict with the user's expectations. We should find the solution. + +#### Expose top level methods as a method of the box object + +Currently, top level methods in boxes are not accessible from outside of the box. But there might be a use case to call other box's top level methods. + +#### Separate `cc_tbl` and `callable_m_tbl`, `cvc_tbl` for less classext CoW + +The fields of `rb_classext_t` contains several cache(-like) data, `cc_tbl`(callcache table), `callable_m_tbl`(table of resolved complemented methods) and `cvc_tbl`(class variable cache table). + +The classext CoW is triggered when the contents of `rb_classext_t` are changed, including `cc_tbl`, `callable_m_tbl`, and `cvc_tbl`. But those three tables are changed by just calling methods or referring class variables. So, currently, classext CoW is triggered much more times than the original expectation. + +If we can move those three tables outside of `rb_classext_t`, the number of copied `rb_classext_t` will be much less than the current implementation. 
diff --git a/doc/bsearch.rdoc b/doc/language/bsearch.rdoc index 90705853d7..90705853d7 100644 --- a/doc/bsearch.rdoc +++ b/doc/language/bsearch.rdoc diff --git a/doc/date/calendars.rdoc b/doc/language/calendars.rdoc index 4e6fd8334b..a2540f1c43 100644 --- a/doc/date/calendars.rdoc +++ b/doc/language/calendars.rdoc @@ -31,7 +31,7 @@ See also {a concrete example here}[rdoc-ref:DateTime@When+should+you+use+DateTim === Argument +start+ Certain methods in class \Date handle differences in the -{Julian and Gregorian calendars}[rdoc-ref:date/calendars.rdoc@Julian+and+Gregorian+Calendars] +{Julian and Gregorian calendars}[rdoc-ref:@Julian+and+Gregorian+Calendars] by accepting an optional argument +start+, whose value may be: - Date::ITALY (the default): the created date is Julian diff --git a/doc/case_mapping.rdoc b/doc/language/case_mapping.rdoc index d40155db03..d40155db03 100644 --- a/doc/case_mapping.rdoc +++ b/doc/language/case_mapping.rdoc diff --git a/doc/character_selectors.rdoc b/doc/language/character_selectors.rdoc index 47cf242be7..8bfc9b719b 100644 --- a/doc/character_selectors.rdoc +++ b/doc/language/character_selectors.rdoc @@ -14,6 +14,8 @@ Each of these instance methods accepts one or more character selectors: - String#delete!(*selectors): returns +self+ or +nil+. - String#squeeze(*selectors): returns a new string. - String#squeeze!(*selectors): returns +self+ or +nil+. +- String#strip(*selectors): returns a new string. +- String#strip!(*selectors): returns +self+ or +nil+. A character selector identifies zero or more characters in +self+ that are to be operands for the method. 
@@ -29,7 +31,6 @@ contained in the selector itself: 'abracadabra'.delete('abc') # => "rdr" '0123456789'.delete('258') # => "0134679" '!@#$%&*()_+'.delete('+&#') # => "!@$%*()_" - 'тест'.delete('т') # => "ес" 'こんにちは'.delete('に') # => "こんちは" Note that order and repetitions do not matter: @@ -79,6 +80,8 @@ These instance methods accept multiple character selectors: - String#delete!(*selectors): returns +self+ or +nil+. - String#squeeze(*selectors): returns a new string. - String#squeeze!(*selectors): returns +self+ or +nil+. +- String#strip(*selectors): returns a new string. +- String#strip!(*selectors): returns +self+ or +nil+. In effect, the given selectors are formed into a single selector consisting of only those characters common to _all_ of the given selectors. diff --git a/doc/dig_methods.rdoc b/doc/language/dig_methods.rdoc index 366275d451..366275d451 100644 --- a/doc/dig_methods.rdoc +++ b/doc/language/dig_methods.rdoc diff --git a/doc/encodings.rdoc b/doc/language/encodings.rdoc index bd87c38e9e..683842d3fb 100644 --- a/doc/encodings.rdoc +++ b/doc/language/encodings.rdoc @@ -138,7 +138,7 @@ A Ruby String object has an encoding that is an instance of class \Encoding. The encoding may be retrieved by method String#encoding. The default encoding for a string literal is the script encoding; -see {Script Encoding}[rdoc-ref:encodings.rdoc@Script+Encoding]. +see {Script Encoding}[rdoc-ref:@Script+Encoding]. 's'.encoding # => #<Encoding:UTF-8> @@ -147,7 +147,7 @@ The default encoding for a string created with method String.new is: - For no argument, ASCII-8BIT. - For a \String object argument, the encoding of that string. - For a string literal, the script encoding; - see {Script Encoding}[rdoc-ref:encodings.rdoc@Script+Encoding]. + see {Script Encoding}[rdoc-ref:@Script+Encoding]. In either case, any encoding may be specified: @@ -193,7 +193,7 @@ The default encoding for these, however, is: - US-ASCII, if all characters are US-ASCII. 
- The script encoding, otherwise; - see (Script Encoding)[rdoc-ref:encodings.rdoc@Script+Encoding]. + see (Script Encoding)[rdoc-ref:@Script+Encoding]. == Filesystem \Encoding @@ -393,7 +393,7 @@ These keyword-value pairs specify encoding options: - <tt>:replace: nil</tt> (default): Set replacement string to default value: <tt>"\uFFFD"</tt> ("�") for a Unicode encoding, <tt>'?'</tt> otherwise. - - <tt>:replace: _some_string_</tt>: Set replacement string to the given +some_string+; + - <tt>:replace: some_string</tt>: Set replacement string to the given +some_string+; overrides +:fallback+. Examples: @@ -407,12 +407,12 @@ These keyword-value pairs specify encoding options: One of these may be specified: - <tt>:fallback: nil</tt> (default): No replacement fallback. - - <tt>:fallback: _hash_like_object_</tt>: Set replacement fallback to the given - +hash_like_object+; the replacement string is <tt>_hash_like_object_[X]</tt>. - - <tt>:fallback: _method_</tt>: Set replacement fallback to the given - +method+; the replacement string is <tt>_method_(X)</tt>. - - <tt>:fallback: _proc_</tt>: Set replacement fallback to the given - +proc+; the replacement string is <tt>_proc_[X]</tt>. + - <tt>:fallback: hash_like_object</tt>: Set replacement fallback to the given + +hash_like_object+; the replacement string is <tt>hash_like_object[X]</tt>. + - <tt>:fallback: method</tt>: Set replacement fallback to the given + +method+; the replacement string is <tt>method(X)</tt>. + - <tt>:fallback: proc</tt>: Set replacement fallback to the given + +proc+; the replacement string is <tt>proc[X]</tt>. Examples: diff --git a/doc/exceptions.md b/doc/language/exceptions.md index 2c47455911..5f8f0ece69 100644 --- a/doc/exceptions.md +++ b/doc/language/exceptions.md @@ -504,18 +504,18 @@ These methods return backtrace information: By default, Ruby sets the backtrace of the exception to the location where it was raised. 
-The developer might adjust this by either providing +backtrace+ argument +The developer might adjust this by either providing `backtrace` argument to Kernel#raise, or using Exception#set_backtrace. Note that: -- by default, both +backtrace+ and +backtrace_locations+ represent the same backtrace; +- by default, both `backtrace` and `backtrace_locations` represent the same backtrace; - if the developer sets the backtrace by one of the above methods to an array of Thread::Backtrace::Location, they still represent the same backtrace; - if the developer sets the backtrace to a string or an array of strings: - - by Kernel#raise: +backtrace_locations+ become +nil+; - - by Exception#set_backtrace: +backtrace_locations+ preserve the original + - by Kernel#raise: `backtrace_locations` become `nil`; + - by Exception#set_backtrace: `backtrace_locations` preserve the original value; -- if the developer sets the backtrace to +nil+ by Exception#set_backtrace, - +backtrace_locations+ preserve the original value; but if the exception is then - reraised, both +backtrace+ and +backtrace_locations+ become the location of reraise. +- if the developer sets the backtrace to `nil` by Exception#set_backtrace, + `backtrace_locations` preserve the original value; but if the exception is then + reraised, both `backtrace` and `backtrace_locations` become the location of reraise. diff --git a/doc/fiber.md b/doc/language/fiber.md index d9011cce2f..d9011cce2f 100644 --- a/doc/fiber.md +++ b/doc/language/fiber.md diff --git a/doc/format_specifications.rdoc b/doc/language/format_specifications.rdoc index bdfdc24953..763470aa02 100644 --- a/doc/format_specifications.rdoc +++ b/doc/language/format_specifications.rdoc @@ -30,8 +30,9 @@ It consists of: - A leading percent character. - Zero or more _flags_ (each is a character). -- An optional _width_ _specifier_ (an integer). -- An optional _precision_ _specifier_ (a period followed by a non-negative integer). 
+- An optional _width_ _specifier_ (an integer, or <tt>*</tt>). +- An optional _precision_ _specifier_ (a period followed by a non-negative + integer, or <tt>*</tt>). - A _type_ _specifier_ (a character). Except for the leading percent character, @@ -45,42 +46,42 @@ The links lead to the details and examples. === \Integer Type Specifiers - +b+ or +B+: Format +argument+ as a binary integer. - See {Specifiers b and B}[rdoc-ref:format_specifications.rdoc@Specifiers+b+and+B]. + See {Specifiers b and B}[rdoc-ref:@Specifiers+b+and+B]. - +d+, +i+, or +u+ (all are identical): Format +argument+ as a decimal integer. - See {Specifier d}[rdoc-ref:format_specifications.rdoc@Specifier+d]. + See {Specifier d}[rdoc-ref:@Specifier+d]. - +o+: Format +argument+ as an octal integer. - See {Specifier o}[rdoc-ref:format_specifications.rdoc@Specifier+o]. + See {Specifier o}[rdoc-ref:@Specifier+o]. - +x+ or +X+: Format +argument+ as a hexadecimal integer. - See {Specifiers x and X}[rdoc-ref:format_specifications.rdoc@Specifiers+x+and+X]. + See {Specifiers x and X}[rdoc-ref:@Specifiers+x+and+X]. === Floating-Point Type Specifiers - +a+ or +A+: Format +argument+ as hexadecimal floating-point number. - See {Specifiers a and A}[rdoc-ref:format_specifications.rdoc@Specifiers+a+and+A]. + See {Specifiers a and A}[rdoc-ref:@Specifiers+a+and+A]. - +e+ or +E+: Format +argument+ in scientific notation. - See {Specifiers e and E}[rdoc-ref:format_specifications.rdoc@Specifiers+e+and+E]. + See {Specifiers e and E}[rdoc-ref:@Specifiers+e+and+E]. - +f+: Format +argument+ as a decimal floating-point number. - See {Specifier f}[rdoc-ref:format_specifications.rdoc@Specifier+f]. + See {Specifier f}[rdoc-ref:@Specifier+f]. - +g+ or +G+: Format +argument+ in a "general" format. - See {Specifiers g and G}[rdoc-ref:format_specifications.rdoc@Specifiers+g+and+G]. + See {Specifiers g and G}[rdoc-ref:@Specifiers+g+and+G]. === Other Type Specifiers - +c+: Format +argument+ as a character. 
- See {Specifier c}[rdoc-ref:format_specifications.rdoc@Specifier+c]. + See {Specifier c}[rdoc-ref:@Specifier+c]. - +p+: Format +argument+ as a string via <tt>argument.inspect</tt>. - See {Specifier p}[rdoc-ref:format_specifications.rdoc@Specifier+p]. + See {Specifier p}[rdoc-ref:@Specifier+p]. - +s+: Format +argument+ as a string via <tt>argument.to_s</tt>. - See {Specifier s}[rdoc-ref:format_specifications.rdoc@Specifier+s]. + See {Specifier s}[rdoc-ref:@Specifier+s]. - <tt>%</tt>: Format +argument+ (<tt>'%'</tt>) as a single percent character. - See {Specifier %}[rdoc-ref:format_specifications.rdoc@Specifier+-25]. + See {Specifier %}[rdoc-ref:@Specifier+-25]. == Flags The effect of a flag may vary greatly among type specifiers. These remarks are general in nature. -See {type-specific details}[rdoc-ref:format_specifications.rdoc@Type+Specifier+Details+and+Examples]. +See {type-specific details}[rdoc-ref:@Type+Specifier+Details+and+Examples]. Multiple flags may be given with single type specifier; order does not matter. @@ -125,13 +126,6 @@ Left-pad with zeros instead of spaces: sprintf('%6d', 100) # => " 100" sprintf('%06d', 100) # => "000100" -=== <tt>'*'</tt> Flag - -Use the next argument as the field width: - - sprintf('%d', 20, 14) # => "20" - sprintf('%*d', 20, 14) # => " 14" - === <tt>'n$'</tt> Flag Format the (1-based) <tt>n</tt>th argument into this field: @@ -152,6 +146,11 @@ of the formatted field: # Ignore if too small. 
sprintf('%1d', 100) # => "100" +If the width specifier is <tt>'*'</tt> instead of an integer, the actual minimum +width is taken from the argument list: + + sprintf('%*d', 20, 14) # => " 14" + == Precision Specifier A precision specifier is a decimal point followed by zero or more @@ -169,7 +168,7 @@ if the integer is longer than the precision: sprintf('%.d', 0) # => "" sprintf('%.0d', 0) # => "" -For the +a+/+A+, +e+/+E+, +f+/+F+ specifiers, the precision specifies +For the +a+/+A+, +e+/+E+, +f+ specifiers, the precision specifies the number of digits after the decimal point to be written: sprintf('%.2f', 3.14159) # => "3.14" @@ -194,6 +193,11 @@ the number of characters to write: sprintf('%s', Time.now) # => "2022-05-04 11:59:16 -0400" sprintf('%.10s', Time.now) # => "2022-05-04" +If the precision specifier is <tt>'*'</tt> instead of a non-negative integer, +the actual precision is taken from the argument list: + + sprintf('%.*d', 20, 1) # => "00000000000000000001" + == Type Specifier Details and Examples === Specifiers +a+ and +A+ diff --git a/doc/globals.md b/doc/language/globals.md index b9315f5ff9..0f6b632a08 100644 --- a/doc/globals.md +++ b/doc/language/globals.md @@ -10,74 +10,76 @@ To use the module: require 'English' ``` -## Summary +## In Brief ### Exceptions -| Variable | English | Contains | -|-------------|-------------------|----------------------------------------------------| -| `$!` | `$ERROR_INFO` | Exception object; set by Kernel#raise. | -| `$@` | `$ERROR_POSITION` | Array of backtrace positions; set by Kernel#raise. 
| +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:--------:|:-----------------:|----------------------------------------|:---------:|:---------:|--------------| +| `$!` | `$ERROR_INFO` | \Exception object or `nil` | `nil` | Yes | Kernel#raise | +| `$@` | `$ERROR_POSITION` | \Array of backtrace positions or `nil` | `nil` | Yes | Kernel#raise | -### Pattern Matching +### Matched \Data -| Variable | English | Contains | -|---------------|---------------------|--------------------------------------------------| -| `$~` | `$LAST_MATCH_INFO` | MatchData object; set by matcher method. | -| `$&` | `$MATCH` | Matched substring; set by matcher method. | -| `` $` `` | `$PRE_MATCH` | Substring left of match; set by matcher method. | -| `$'` | `$POST_MATCH` | Substring right of match; set by matcher method. | -| `$+` | `$LAST_PAREN_MATCH` | Last group matched; set by matcher method. | -| `$1` | | First group matched; set by matcher method. | -| `$2` | | Second group matched; set by matcher method. | -| <tt>$_n_</tt> | | <i>n</i>th group matched; set by matcher method. 
| +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:---------:|:-------------------:|-----------------------------------|:---------:|:---------:|-----------------| +| `$~` | `$LAST_MATCH_INFO` | \MatchData object or `nil` | `nil` | No | Matcher methods | +| `$&` | `$MATCH` | Matched substring or `nil` | `nil` | No | Matcher methods | +| `` $` `` | `$PRE_MATCH` | Substring left of match or `nil` | `nil` | No | Matcher methods | +| `$'` | `$POST_MATCH` | Substring right of match or `nil` | `nil` | No | Matcher methods | +| `$+` | `$LAST_PAREN_MATCH` | Last group matched or `nil` | `nil` | No | Matcher methods | +| `$1` | | First group matched or `nil` | `nil` | Yes | Matcher methods | +| `$2` | | Second group matched or `nil` | `nil` | Yes | Matcher methods | +| `$n` | | <i>n</i>th group matched or `nil` | `nil` | Yes | Matcher methods | ### Separators -| Variable | English | Contains | -|----------|----------------------------|--------------------------------------------| -| `$/` | `$INPUT_RECORD_SEPARATOR` | Input record separator; initially newline. | -| `$\` | `$OUTPUT_RECORD_SEPARATOR` | Output record separator; initially `nil`. | +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:-----------:|:---------------------------:|-------------------------|:---------:|:---------:|----------| +| `$/`, `$-0` | `$INPUT_RECORD_SEPARATOR` | Input record separator | Newline | No | | +| `$\` | `$OUTPUT_RECORD_SEPARATOR` | Output record separator | `nil` | No | | ### Streams -| Variable | English | Contains | -|-----------|-----------------------------|-----------------------------------------------| -| `$stdin` | | Standard input stream; initially `STDIN`. | -| `$stdout` | | Standard input stream; initially `STDIOUT`. | -| `$stderr` | | Standard input stream; initially `STDERR`. | -| `$<` | `$DEFAULT_INPUT` | Default standard input; `ARGF` or `$stdin`. | -| `$>` | `$DEFAULT_OUTPUT` | Default standard output; initially `$stdout`. 
| -| `$.` | `$INPUT_LINE_NUMBER`, `$NR` | Input position of most recently read stream. | -| `$_` | `$LAST_READ_LINE` | String from most recently read stream. | +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:---------:|:----------------------------:|---------------------------------------------|:---------:|:---------:|----------------------| +| `$stdin` | | Standard input stream | `STDIN` | No | | +| `$stdout` | | Standard output stream | `STDOUT` | No | | +| `$stderr` | | Standard error stream | `STDERR` | No | | +| `$<` | `$DEFAULT_INPUT` | Default standard input | `ARGF` | Yes | | +| `$>` | `$DEFAULT_OUTPUT` | Default standard output | `STDOUT` | No | | +| `$.` | `$INPUT_LINE_NUMBER`, `$NR` | Input position of most recently read stream | 0 | No | Certain read methods | +| `$_` | `$LAST_READ_LINE` | String from most recently read stream | `nil` | No | Certain read methods | ### Processes -| Variable | English | Contains | -|---------------------------|-----------------------|--------------------------------------------------------| -| `$0` | | Initially, the name of the executing program. | -| `$*` | `$ARGV` | Points to the `ARGV` array. | -| `$$` | `$PROCESS_ID`, `$PID` | Process ID of the current process. | -| `$?` | `$CHILD_STATUS` | Process::Status of most recently exited child process. | -| `$LOAD_PATH`, `$:`, `$-I` | | Array of paths to be searched. | -| `$LOADED_FEATURES`, `$"` | | Array of paths to loaded files. 
| +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:-------------------------:|:----------------------:|---------------------------------|:-------------:|:---------:|----------| +| `$0`, `$PROGRAM_NAME` | | Program name | Program name | No | | +| `$*` | `$ARGV` | \ARGV array | `ARGV` | Yes | | +| `$$` | `$PROCESS_ID`, `$PID` | Process id | Process PID | Yes | | +| `$?` | `$CHILD_STATUS` | Status of recently exited child | `nil` | Yes | | +| `$LOAD_PATH`, `$:`, `$-I` | | \Array of search paths | Ruby defaults | Yes | | +| `$LOADED_FEATURES`, `$"` | | \Array of load paths | Ruby defaults | Yes | | ### Debugging -| Variable | English | Contains | -|-------------|---------|--------------------------------------------------------| -| `$FILENAME` | | The value returned by method ARGF.filename. | -| `$DEBUG` | | Initially, whether option `-d` or `--debug` was given. | -| `$VERBOSE` | | Initially, whether option `-V` or `-W` was given. | +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:-----------:|:--------:|--------------------------------------------|:----------------------------:|:---------:|----------| +| `$FILENAME` | | Value returned by method `ARGF.filename` | Command-line argument or '-' | Yes | | +| `$DEBUG` | | Whether option `-d` or `--debug` was given | Command-line option | No | | +| `$VERBOSE` | | Whether option `-V` or `-W` was given | Command-line option | No | | ### Other Variables -| Variable | English | Contains | -|----------|---------|------------------------------------------------| -| `$-a` | | Whether option `-a` was given. | -| `$-i` | | Extension given with command-line option `-i`. | -| `$-l` | | Whether option `-l` was given. | -| `$-p` | | Whether option `-p` was given. 
| +| Variable | \English | Contains | Initially | Read-Only | Reset By | +|:-----------:|:--------:|-----------------------------------------------|:---------:|:---------:|----------| +| `$-F`, `$;` | | Separator given with command-line option `-F` | | | | +| `$-a` | | Whether option `-a` was given | | Yes | | +| `$-i` | | Extension given with command-line option `-i` | | No | | +| `$-l` | | Whether option `-l` was given | | Yes | | +| `$-p` | | Whether option `-p` was given | | Yes | | +| `$F` | | \Array of `$_` split by `$-F` | | | | ## Exceptions @@ -125,13 +127,13 @@ Output: English - `$ERROR_POSITION`. -## Pattern Matching +## Matched \Data These global variables store information about the most recent successful match in the current scope. For details and examples, -see {Regexp Global Variables}[rdoc-ref:Regexp@Global+Variables]. +see [Regexp Global Variables]. ### `$~` (\MatchData) @@ -165,7 +167,7 @@ English - `$LAST_PAREN_MATCH`. ### `$1`, `$2`, \Etc. (Matched Group) -For <tt>$_n_</tt> the <i>n</i>th group of the match. +For <tt>$n</tt> the <i>n</i>th group of the match. No \English. @@ -174,6 +176,10 @@ No \English. ### `$/` (Input Record Separator) An input record separator, initially newline. +Set by the [command-line option `-0`]. + +Setting to non-nil value by other than the command-line option is +deprecated. English - `$INPUT_RECORD_SEPARATOR`, `$RS`. @@ -183,6 +189,12 @@ Aliased as `$-0`. An output record separator, initially `nil`. +Copied from `$/` when the [command-line option `-l`] is +given. + +Setting to non-nil value by other than the command-line option is +deprecated. + English - `$OUTPUT_RECORD_SEPARATOR`, `$ORS`. ## Streams @@ -270,9 +282,9 @@ by Kernel#load and Kernel#require. Singleton method `$LOAD_PATH.resolve_feature_path(feature)` returns: -- <tt>[:rb, _path_]</tt>, where `path` is the path to the Ruby file to be +- <tt>[:rb, path]</tt>, where `path` is the path to the Ruby file to be loaded for the given `feature`. 
-- <tt>[:so, _path_]</tt>, where `path` is the path to the shared object file +- <tt>[:so, path]</tt>, where `path` is the path to the shared object file to be loaded for the given `feature`. - `nil` if there is no such `feature` and `path`. @@ -318,8 +330,8 @@ The value returned by method ARGF.filename. ### `$DEBUG` -Initially `true` if command-line option `-d` or `--debug` is given, -otherwise initially `false`; +Initially `true` if [command-line option `-d`] or +[`--debug`][command-line option `-d`] is given, otherwise initially `false`; may be set to either value in the running program. When `true`, prints each raised exception to `$stderr`. @@ -328,8 +340,8 @@ Aliased as `$-d`. ### `$VERBOSE` -Initially `true` if command-line option `-v` or `-w` is given, -otherwise initially `false`; +Initially `true` if [command-line option `-v`] or +[command-line option `-w`] is given, otherwise initially `false`; may be set to either value, or to `nil`, in the running program. When `true`, enables Ruby warnings. @@ -340,24 +352,40 @@ Aliased as `$-v` and `$-w`. ## Other Variables +### `$-F` + +The default field separator in String#split; must be a String or a +Regexp, and can be set with [command-line option `-F`]. + +Setting to non-nil value by other than the command-line option is +deprecated. + +Aliased as `$;`. + ### `$-a` -Whether command-line option `-a` was given; read-only. +Whether [command-line option `-a`] was given; read-only. ### `$-i` -Contains the extension given with command-line option `-i`, +Contains the extension given with [command-line option `-i`], or `nil` if none. An alias of ARGF.inplace_mode. ### `$-l` -Whether command-line option `-l` was set; read-only. +Whether [command-line option `-l`] was set; read-only. ### `$-p` -Whether command-line option `-p` was given; read-only. +Whether [command-line option `-p`] was given; read-only. 
+ +### `$F` + +If the [command-line option `-a`] is given, the array +obtained by splitting `$_` by `$-F` is assigned at the start of each +`-l`/`-p` loop. ## Deprecated @@ -365,8 +393,6 @@ Whether command-line option `-p` was given; read-only. ### `$,` -### `$;` - # Pre-Defined Global Constants ## Summary @@ -374,7 +400,7 @@ Whether command-line option `-p` was given; read-only. ### Streams | Constant | Contains | -|----------|-------------------------| +|:--------:|-------------------------| | `STDIN` | Standard input stream. | | `STDOUT` | Standard output stream. | | `STDERR` | Standard error stream. | @@ -397,11 +423,11 @@ Whether command-line option `-p` was given; read-only. | `RUBY_ENGINE_VERSION` | String Ruby engine version. | | `RUBY_DESCRIPTION` | String Ruby description. | -### Embedded Data +### Embedded \Data -| Constant | Contains | -|----------|--------------------------------------------------------------------| -| `DATA` | File containing embedded data (lines following `__END__`, if any). | +| Constant | Contains | +|:---------------------:|-------------------------------------------------------------------------------| +| `DATA` | File containing embedded data (lines following `__END__`, if any). 
| ## Streams @@ -570,3 +596,16 @@ Output: "Bar\n" "Baz\n" ``` + +[command-line option `-0`]: rdoc-ref:language/options.md@-0-set--input-record-separator +[command-line option `-F`]: rdoc-ref:language/options.md@-f-set-input-field-separator +[command-line option `-a`]: rdoc-ref:language/options.md@-a-split-input-lines-into-fields +[command-line option `-d`]: rdoc-ref:language/options.md@-d-set-debug-to-true +[command-line option `-i`]: rdoc-ref:language/options.md@-i-set-argf-in-place-mode +[command-line option `-l`]: rdoc-ref:language/options.md@-l-set-output-record-separator-chop-lines +[command-line option `-p`]: rdoc-ref:language/options.md@-p--n-with-printing +[command-line option `-v`]: rdoc-ref:language/options.md@-v-print-version-set-verbose +[command-line option `-w`]: rdoc-ref:language/options.md@-w-synonym-for--w1 + +[Regexp Global Variables]: rdoc-ref:Regexp@Global+Variables + diff --git a/doc/hash_inclusion.rdoc b/doc/language/hash_inclusion.rdoc index 05c2b0932a..05c2b0932a 100644 --- a/doc/hash_inclusion.rdoc +++ b/doc/language/hash_inclusion.rdoc diff --git a/doc/implicit_conversion.rdoc b/doc/language/implicit_conversion.rdoc index e244096125..e244096125 100644 --- a/doc/implicit_conversion.rdoc +++ b/doc/language/implicit_conversion.rdoc diff --git a/doc/marshal.rdoc b/doc/language/marshal.rdoc index 740064ade6..740064ade6 100644 --- a/doc/marshal.rdoc +++ b/doc/language/marshal.rdoc diff --git a/doc/ruby/option_dump.md b/doc/language/option_dump.md index a156484bf6..328c6b52af 100644 --- a/doc/ruby/option_dump.md +++ b/doc/language/option_dump.md @@ -1,7 +1,7 @@ # Option `--dump` For other argument values, -see {Option --dump}[options_md.html#label--dump-3A+Dump+Items]. +see {Option `--dump`}[rdoc-ref:options.md@--dump+Dump+Items]. 
For the examples here, we use this program: @@ -18,7 +18,7 @@ The supported dump items: $ ruby --dump=insns t.rb == disasm: #<ISeq:<main>@t.rb:1 (1,0)-(1,10)> (catch: FALSE) 0000 putself ( 1)[Li] - 0001 putstring "Foo" + 0001 dupstring "Foo" 0003 opt_send_without_block <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE> 0005 leave ``` diff --git a/doc/ruby/options.md b/doc/language/options.md index 943b5f967b..1329b7ca63 100644 --- a/doc/ruby/options.md +++ b/doc/language/options.md @@ -1,10 +1,3 @@ -<!--- - CAUTION - - This page on docs.ruby-lang.org is displayed in Ruby's help message (-h and --help). - Please make sure you update the link when renaming or moving this file. ----> - # Ruby Command-Line Options ## About the Examples @@ -68,15 +61,15 @@ nil See also: -- {Option -a}[rdoc-ref:ruby/options.md@a-3A+Split+Input+Lines+into+Fields]: +- [Option `-a`][-a]: Split input lines into fields. -- {Option -F}[rdoc-ref:ruby/options.md@F-3A+Set+Input+Field+Separator]: +- [Option `-F`][-F]: Set input field separator. -- {Option -l}[rdoc-ref:ruby/options.md@l-3A+Set+Output+Record+Separator-3B+Chop+Lines]: +- [Option `-l`][-l]: Set output record separator; chop lines. -- {Option -n}[rdoc-ref:ruby/options.md@n-3A+Run+Program+in+gets+Loop]: +- [Option `-n`][-n]: Run program in `gets` loop. -- {Option -p}[rdoc-ref:ruby/options.md@p-3A+-n-2C+with+Printing]: +- [Option `-p`][-p]: `-n`, with printing. ### `-a`: Split Input Lines into Fields @@ -98,15 +91,15 @@ and the default field separator is `$;`. See also: -- {Option -0}[rdoc-ref:ruby/options.md@0-3A+Set+-24-2F+-28Input+Record+Separator-29]: +- [Option `-0`][-0]: Set `$/` (input record separator). -- {Option -F}[rdoc-ref:ruby/options.md@F-3A+Set+Input+Field+Separator]: +- [Option `-F`][-F]: Set input field separator. -- {Option -l}[rdoc-ref:ruby/options.md@l-3A+Set+Output+Record+Separator-3B+Chop+Lines]: +- [Option `-l`][-l]: Set output record separator; chop lines. 
-- {Option -n}[rdoc-ref:ruby/options.md@n-3A+Run+Program+in+gets+Loop]: +- [Option `-n`][-n]: Run program in `gets` loop. -- {Option -p}[rdoc-ref:ruby/options.md@p-3A+-n-2C+with+Printing]: +- [Option `-p`][-p]: `-n`, with printing. ### `-c`: Check Syntax @@ -136,6 +129,24 @@ $ basename `pwd` ruby ``` +This option is accumulative; relative paths are solved from the +previous working directory. + +```console +$ ruby -C / -C usr -e 'puts Dir.pwd' +/usr +``` + +If the argument is not an existing directory, a fatal error will +occur: + +```console +$ ruby -C /nonexistent +ruby: Can't chdir to /nonexistent (fatal) +$ ruby -C /dev/null +ruby: Can't chdir to /dev/null (fatal) +``` + Whitespace between the option and its argument may be omitted. ### `-d`: Set `$DEBUG` to `true` @@ -154,7 +165,7 @@ $ ruby -d -e 'p $DEBUG' true ``` -Option `--debug` is an alias for option `-d`. +[Option `--debug`][--debug] is an alias for option `-d`. ### `-e`: Execute Given Ruby Code @@ -193,9 +204,9 @@ Whitespace between the option and its argument may be omitted. See also: -- {Option --external-encoding}[options_md.html#label--external-encoding-3A+Set+Default+External+Encoding]: +- [Option `--external-encoding`][--external-encoding]: Set default external encoding. -- {Option --internal-encoding}[options_md.html#label--internal-encoding-3A+Set+Default+Internal+Encoding]: +- [Option `--internal-encoding`][--internal-encoding]: Set default internal encoding. Option `--encoding` is an alias for option `-E`. @@ -228,15 +239,15 @@ The argument must immediately follow the option See also: -- {Option -0}[rdoc-ref:ruby/options.md@0-3A+Set+-24-2F+-28Input+Record+Separator-29]: +- [Option `-0`][-0]: Set `$/` (input record separator). -- {Option -a}[rdoc-ref:ruby/options.md@a-3A+Split+Input+Lines+into+Fields]: +- [Option `-a`][-a]: Split input lines into fields. 
-- {Option -l}[rdoc-ref:ruby/options.md@l-3A+Set+Output+Record+Separator-3B+Chop+Lines]: +- [Option `-l`][-l]: Set output record separator; chop lines. -- {Option -n}[rdoc-ref:ruby/options.md@n-3A+Run+Program+in+gets+Loop]: +- [Option `-n`][-n]: Run program in `gets` loop. -- {Option -p}[rdoc-ref:ruby/options.md@p-3A+-n-2C+with+Printing]: +- [Option `-p`][-p]: `-n`, with printing. ### `-h`: Print Short Help Message @@ -280,6 +291,16 @@ $ ruby -I my_lib -I some_lib -e 'p $LOAD_PATH.take(2)' $ popd ``` +This option and [option `-C`][-C] will +be applied in the order in the command line; expansion of `-I` options +are affected by preceeding `-C` options. + +```console +$ ruby -C / -Ilib -C usr -Ilib -e 'puts $:[0, 2]' +/lib +/usr/lib +``` + Whitespace between the option and its argument may be omitted. ### `-l`: Set Output Record Separator; Chop Lines @@ -314,15 +335,15 @@ $ ruby -ln -e 'p $_' desiderata.txt See also: -- {Option -0}[rdoc-ref:ruby/options.md@0-3A+Set+-24-2F+-28Input+Record+Separator-29]: +- [Option `-0`][-0]: Set `$/` (input record separator). -- {Option -a}[rdoc-ref:ruby/options.md@a-3A+Split+Input+Lines+into+Fields]: +- [Option `-a`][-a]: Split input lines into fields. -- {Option -F}[rdoc-ref:ruby/options.md@F-3A+Set+Input+Field+Separator]: +- [Option `-F`][-F]: Set input field separator. -- {Option -n}[rdoc-ref:ruby/options.md@n-3A+Run+Program+in+gets+Loop]: +- [Option `-n`][-n]: Run program in `gets` loop. -- {Option -p}[rdoc-ref:ruby/options.md@p-3A+-n-2C+with+Printing]: +- [Option `-p`][-p]: `-n`, with printing. ### `-n`: Run Program in `gets` Loop @@ -348,15 +369,15 @@ be on good terms with all persons. See also: -- {Option -0}[rdoc-ref:ruby/options.md@0-3A+Set+-24-2F+-28Input+Record+Separator-29]: +- [Option `-0`][-0]: Set `$/` (input record separator). -- {Option -a}[rdoc-ref:ruby/options.md@a-3A+Split+Input+Lines+into+Fields]: +- [Option `-a`][-a]: Split input lines into fields. 
-- {Option -F}[rdoc-ref:ruby/options.md@F-3A+Set+Input+Field+Separator]: +- [Option `-F`][-F]: Set input field separator. -- {Option -l}[rdoc-ref:ruby/options.md@l-3A+Set+Output+Record+Separator-3B+Chop+Lines]: +- [Option `-l`][-l]: Set output record separator; chop lines. -- {Option -p}[rdoc-ref:ruby/options.md@p-3A+-n-2C+with+Printing]: +- [Option `-p`][-p]: `-n`, with printing. ### `-p`: `-n`, with Printing @@ -377,15 +398,15 @@ be on good terms with all persons. See also: -- {Option -0}[rdoc-ref:ruby/options.md@0-3A+Set+-24-2F+-28Input+Record+Separator-29]: +- [Option `-0`][-0]: Set `$/` (input record separator). -- {Option -a}[rdoc-ref:ruby/options.md@a-3A+Split+Input+Lines+into+Fields]: +- [Option `-a`][-a]: Split input lines into fields. -- {Option -F}[rdoc-ref:ruby/options.md@F-3A+Set+Input+Field+Separator]: +- [Option `-F`][-F]: Set input field separator. -- {Option -l}[rdoc-ref:ruby/options.md@l-3A+Set+Output+Record+Separator-3B+Chop+Lines]: +- [Option `-l`][-l]: Set output record separator; chop lines. -- {Option -n}[rdoc-ref:ruby/options.md@n-3A+Run+Program+in+gets+Loop]: +- [Option `-n`][-n]: Run program in `gets` loop. ### `-r`: Require Library @@ -398,11 +419,15 @@ the option may be given more than once: $ ruby -e 'p defined?(JSON); p defined?(CSV)' nil nil -$ ruby -r CSV -r JSON -e 'p defined?(JSON); p defined?(CSV)' +$ ruby -r csv -r json -e 'p defined?(JSON); p defined?(CSV)' "constant" "constant" ``` +The library is loaded with the `Kernel#require` method, after the +other options such as [`-C`][-C], [`-I`][-I], and "custom options" by +[`-s`][-s], are applied: + Whitespace between the option and its argument may be omitted. ### `-s`: Define Global Variable @@ -411,6 +436,10 @@ Option `-s` specifies that a "custom option" is to define a global variable in the invoked Ruby program: - The custom option must appear _after_ the program name. 
+- If there is no script name in the command line (using {option + -e}[rdoc-ref:@-e+Execute+Given+Ruby+Code] or implicit reading from + `$stdin`), the custom options must be separated from the other + interpreter options with a `--`. - The custom option must begin with single hyphen (e.g., `-foo`), not two hyphens (e.g., `--foo`). - The name of the global variable is based on the option name: @@ -433,9 +462,6 @@ $ ruby -s t.rb -foo=baz -bar=bat ["baz", "bat"] ``` -The option may not be used with -{option -e}[rdoc-ref:ruby/options.md@e-3A+Execute+Given+Ruby+Code] - ### `-S`: Search Directories in `ENV['PATH']` Option `-S` specifies that the Ruby interpreter @@ -583,7 +609,7 @@ ruby - Copyright (C) 1993-2024 Yukihiro Matsumoto ### `--debug`: Alias for `-d` Option `--debug` is an alias for -{option -d}[rdoc-ref:ruby/options.md@d-3A+Set+-24DEBUG+to+true]. +[option `-d`][-d]. ### `--disable`: Disable Features @@ -602,7 +628,7 @@ The supported features: - `frozen-string-literal`: Freeze all string literals (default: disabled). - `jit`: JIT compiler (default: disabled). -See also {option --enable}[options_md.html#label--enable-3A+Enable+Features]. +See also [option `--enable`][--enable]. ### `--dump`: Dump Items @@ -613,18 +639,18 @@ Some of the argument values cause the command to behave as if a different option was given: - `--dump=copyright`: - Same as {option \-\-copyright}[options_md.html#label--copyright-3A+Print+Ruby+Copyright]. + Same as [option `--copyright`][--copyright]. - `--dump=help`: - Same as {option \-\-help}[options_md.html#label--help-3A+Print+Help+Message]. + Same as [option `--help`][--help]. - `--dump=syntax`: - Same as {option -c}[rdoc-ref:ruby/options.md@c-3A+Check+Syntax]. + Same as [option `-c`][-c]. - `--dump=usage`: - Same as {option -h}[rdoc-ref:ruby/options.md@h-3A+Print+Short+Help+Message]. + Same as [option `-h`][-h]. - `--dump=version`: - Same as {option \-\-version}[options_md.html#label--version-3A+Print+Ruby+Version]. 
+ Same as [option `--version`][--version]. For other argument values and examples, -see {Option --dump}[option_dump_md.html]. +see {Option `--dump`}[rdoc-ref:option_dump.md]. ### `--enable`: Enable Features @@ -636,19 +662,19 @@ ruby --enable=gems,rubyopt t.rb ``` For the features, -see {option --disable}[options_md.html#label--disable-3A+Disable+Features]. +see [option `--disable`][--disable]. ### `--encoding`: Alias for `-E`. Option `--encoding` is an alias for -{option -E}[rdoc-ref:ruby/options.md@E-3A+Set+Default+Encodings]. +[option `-E`][-E]. ### `--external-encoding`: Set Default External \Encoding Option `--external-encoding` sets the default external encoding for the invoked Ruby program; -for values of +encoding+, -see {Encoding: Names and Aliases}[rdoc-ref:encodings.rdoc@Names+and+Aliases]. +for values of `encoding`, +see [Encoding: Names and Aliases]. ```console $ ruby -e 'puts Encoding::default_external' @@ -669,8 +695,8 @@ For a shorter help message, use option `-h`. Option `--internal-encoding` sets the default internal encoding for the invoked Ruby program; -for values of +encoding+, -see {Encoding: Names and Aliases}[rdoc-ref:encodings.rdoc@Names+and+Aliases]. +for values of `encoding`, +see [Encoding: Names and Aliases]. ```console $ ruby -e 'puts Encoding::default_internal.nil?' @@ -682,7 +708,7 @@ CESU-8 ### `--jit` Option `--jit` is an alias for option `--yjit`, which enables YJIT; -see additional YJIT options in the [YJIT documentation](rdoc-ref:yjit/yjit.md). +see additional YJIT options in the [YJIT documentation](rdoc-ref:jit/yjit.md). ### `--verbose`: Set `$VERBOSE` @@ -693,3 +719,26 @@ and disables input from `$stdin`. Option `--version` prints the version of the Ruby interpreter, then exits. 
+[-0]: rdoc-ref:@-0+Set++Input+Record+Separator +[-C]: rdoc-ref:@-C+Set+Working+Directory +[-E]: rdoc-ref:@-E+Set+Default+Encodings +[-F]: rdoc-ref:@-F+Set+Input+Field+Separator +[-I]: rdoc-ref:@-I+Add+to+LOADPATH +[-a]: rdoc-ref:@-a+Split+Input+Lines+into+Fields +[-c]: rdoc-ref:@-c+Check+Syntax +[-d]: rdoc-ref:@-d+Set+DEBUG+to+true +[-e]: rdoc-ref:@-e+Execute+Given+Ruby+Code +[-h]: rdoc-ref:@-h+Print+Short+Help+Message +[-l]: rdoc-ref:@-l+Set+Output+Record+Separator+Chop+Lines +[-n]: rdoc-ref:@-n+Run+Program+in+gets+Loop +[-p]: rdoc-ref:@-p+-n+with+Printing +[-s]: rdoc-ref:@-s+Define+Global+Variable +[--copyright]: rdoc-ref:@--copyright+Print+Ruby+Copyright +[--debug]: rdoc-ref:@--debug+Alias+for+-d +[--disable]: rdoc-ref:@--disable+Disable+Features +[--enable]: rdoc-ref:@--enable+Enable+Features +[--external-encoding]: rdoc-ref:@--external+encoding+Set+Default+External+Encoding +[--internal-encoding]: rdoc-ref:@--internal+encoding+Set+Default+Internal+Encoding +[--help]: rdoc-ref:@--help+Print+Help+Message +[--version]: rdoc-ref:@--version+Print+Ruby+Version +[Encoding: Names and Aliases]: rdoc-ref:encodings.rdoc@Names+and+Aliases diff --git a/doc/language/packed_data.md b/doc/language/packed_data.md new file mode 100644 index 0000000000..1b133367d6 --- /dev/null +++ b/doc/language/packed_data.md @@ -0,0 +1,886 @@ +# Packed \Data + +## Quick Reference + +These tables summarize the directives for packing and unpacking. 
+ +### For Integers + +| Directive | Meaning | +|-----------------------|-----------------------------------------------------------------------------------------------------| +| `C` | 8-bit unsigned (`unsigned char`) | +| `S` | 16-bit unsigned, native endian (`uint16_t`) | +| `L` | 32-bit unsigned, native endian (`uint32_t`) | +| `Q` | 64-bit unsigned, native endian (`uint64_t`) | +| `J` | pointer width unsigned, native endian (`uintptr_t`) | +| | | +| `c` | 8-bit signed (`signed char`) | +| `s` | 16-bit signed, native endian (`int16_t`) | +| `l` | 32-bit signed, native endian (`int32_t`) | +| `q` | 64-bit signed, native endian (`int64_t`) | +| `j` | pointer width signed, native endian (`intptr_t`) | +| | | +| `S_` `S!` | `unsigned short`, native endian | +| `I` `I_` `I!` | `unsigned int`, native endian | +| `L_` `L!` | `unsigned long`, native endian | +| `Q_` `Q!` | `unsigned long long`, native endian; (raises ArgumentError if the platform has no `long long` type) | +| `J!` | `uintptr_t`, native endian (same with `J`) | +| | | +| `s_` `s!` | `signed short`, native endian | +| `i` `i_` `i!` | `signed int`, native endian | +| `l_` `l!` | `signed long`, native endian | +| `q_` `q!` | `signed long long`, native endian; (raises ArgumentError if the platform has no `long long` type) | +| `j!` | `intptr_t`, native endian (same with `j`) | +| | | +| `S>` `s>` `S!>` `s!>` | each the same as the directive without `>`, but big endian; `S>` is the same as `n` | +| `L>` `l>` `L!>` `l!>` | `L>` is the same as `N` | +| `I!>` `i!>` | | +| `Q>` `q>` `Q!>` `q!>` | | +| `J>` `j>` `J!>` `j!>` | | +| | | +| `S<` `s<` `S!<` `s!<` | each the same as the directive without `<`, but little endian; `S<` is the same as `v` | +| `L<` `l<` `L!<` `l!<` | `L<` is the same as `V` | +| `I!<` `i!<` | | +| `Q<` `q<` `Q!<` `q!<` | | +| `J<` `j<` `J!<` `j!<` | | +| | | +| `n` | 16-bit unsigned, network (big-endian) byte order | +| `N` | 32-bit unsigned, network (big-endian) byte order | +| `v` | 
16-bit unsigned, VAX (little-endian) byte order | +| `V` | 32-bit unsigned, VAX (little-endian) byte order | +| | | +| `U` | UTF-8 character | +| `w` | BER-compressed integer | +| `R` | LEB128 encoded unsigned integer | +| `r` | LEB128 encoded signed integer | + +### For Floats + +| Directive | Meaning | +|-----------|---------------------------------------------------| +| `D` `d` | double-precision, native format | +| `F` `f` | single-precision, native format | +| `E` | double-precision, little-endian byte order | +| `e` | single-precision, little-endian byte order | +| `G` | double-precision, network (big-endian) byte order | +| `g` | single-precision, network (big-endian) byte order | + +### For Strings + +| Directive | Meaning | +|-----------|------------------------------------------------------------------------------------------------| +| `A` | arbitrary binary string (remove trailing nulls and ASCII spaces) | +| `a` | arbitrary binary string | +| `Z` | null-terminated string | +| `B` | bit string (MSB first) | +| `b` | bit string (LSB first) | +| `H` | hex string (high nibble first) | +| `h` | hex string (low nibble first) | +| `u` | UU-encoded string | +| `M` | quoted-printable, MIME encoding (see RFC2045) | +| `m` | base64 encoded string (RFC 2045) (default) (base64 encoded string (RFC 4648) if followed by 0) | +| `P` | pointer to a structure (fixed-length string) | +| `p` | pointer to a null-terminated string | + +### Additional Directives for Packing + +| Directive | Meaning | +|-----------|----------------------------| +| `@` | moves to absolute position | +| `X` | back up a byte | +| `x` | null byte | + +### Additional Directives for Unpacking + +| Directive | Meaning | +|-----------|-------------------------------------------------| +| `@` | skip to the offset given by the length argument | +| `X` | skip backward one byte | +| `x` | skip forward one byte | +| `^` | return the current offset | + +## Packing and Unpacking + +Certain Ruby core methods 
deal with packing and unpacking data: + +- Method Array#pack: + Formats each element in array `self` into a binary string; + returns that string. +- Method String#unpack: + Extracts data from string `self`, + forming objects that become the elements of a new array; + returns that array. +- Method String#unpack1: + Does the same, but unpacks and returns only the first extracted object. + +Each of these methods accepts a string `template`, +consisting of zero or more _directive_ characters, +each followed by zero or more _modifier_ characters. + +Examples (directive `'C'` specifies '`unsigned character`'): + +```ruby +[65].pack('C') # => "A" # One element, one directive. +[65, 66].pack('CC') # => "AB" # Two elements, two directives. +[65, 66].pack('C') # => "A" # Extra element is ignored. +[65].pack('') # => "" # No directives. +[65].pack('CC') # Extra directive raises ArgumentError. +``` + +```ruby +'A'.unpack('C') # => [65] # One character, one directive. +'AB'.unpack('CC') # => [65, 66] # Two characters, two directives. +'AB'.unpack('C') # => [65] # Extra character is ignored. +'A'.unpack('CC') # => [65, nil] # Extra directive generates nil. +'AB'.unpack('') # => [] # No directives. 
+``` + +The string `template` may contain any mixture of valid directives +(directive `'c'` specifies 'signed character'): + +```ruby +[65, -1].pack('cC') # => "A\xFF" +"A\xFF".unpack('cC') # => [65, 255] +``` + +The string `template` may contain whitespace (which is ignored) +and comments, each of which begins with character `'#'` +and continues up to and including the next following newline: + +```ruby +[0,1].pack(" C #foo \n C ") # => "\x00\x01" +"\0\1".unpack(" C #foo \n C ") # => [0, 1] +``` + +Any directive may be followed by either of these modifiers: + +- `'*'` - The directive is to be applied as many times as needed: + + ```ruby + [65, 66].pack('C*') # => "AB" + 'AB'.unpack('C*') # => [65, 66] + ``` + +- \Integer `count` - The directive is to be applied `count` times: + + ```ruby + [65, 66].pack('C2') # => "AB" + [65, 66].pack('C3') # Raises ArgumentError. + 'AB'.unpack('C2') # => [65, 66] + 'AB'.unpack('C3') # => [65, 66, nil] + ``` + + Note: Directives in `%w[A a Z m]` use `count` differently; + see [\String Directives][rdoc-ref:@String+Directives]. + +If elements don't fit the provided directive, only least significant bits are encoded: + +```ruby +[257].pack("C").unpack("C") # => [1] +``` + +## Packing Method + +Method Array#pack accepts optional keyword argument +`buffer` that specifies the target string (instead of a new string): + +```ruby +[65, 66].pack('C*', buffer: 'foo') # => "fooAB" +``` + +The method can accept a block: + +```ruby +# Packed string is passed to the block. +[65, 66].pack('C*') {|s| p s } # => "AB" +``` + +## Unpacking Methods + +Methods String#unpack and String#unpack1 each accept +an optional keyword argument `offset` that specifies an offset +into the string: + +```ruby +'ABC'.unpack('C*', offset: 1) # => [66, 67] +'ABC'.unpack1('C*', offset: 1) # => 66 +``` + +Both methods can accept a block: + +```ruby +# Each unpacked object is passed to the block. 
+ret = [] +"ABCD".unpack("C*") {|c| ret << c } +ret # => [65, 66, 67, 68] +``` + +```ruby +# The single unpacked object is passed to the block. +'AB'.unpack1('C*') {|ele| p ele } # => 65 +``` + +## \Integer Directives + +Each integer directive specifies the packing or unpacking +for one element in the input or output array. + +### 8-Bit \Integer Directives + +- `'c'` - 8-bit signed integer + (like C `signed char`): + + ```ruby + [0, 1, 255].pack('c*') # => "\x00\x01\xFF" + s = [0, 1, -1].pack('c*') # => "\x00\x01\xFF" + s.unpack('c*') # => [0, 1, -1] + ``` + +- `'C'` - 8-bit unsigned integer + (like C `unsigned char`): + + ```ruby + [0, 1, 255].pack('C*') # => "\x00\x01\xFF" + s = [0, 1, -1].pack('C*') # => "\x00\x01\xFF" + s.unpack('C*') # => [0, 1, 255] + ``` + +### 16-Bit \Integer Directives + +- `'s'` - 16-bit signed integer, native-endian + (like C `int16_t`): + + ```ruby + [513, -514].pack('s*') # => "\x01\x02\xFE\xFD" + s = [513, 65022].pack('s*') # => "\x01\x02\xFE\xFD" + s.unpack('s*') # => [513, -514] + ``` + +- `'S'` - 16-bit unsigned integer, native-endian + (like C `uint16_t`): + + ```ruby + [513, -514].pack('S*') # => "\x01\x02\xFE\xFD" + s = [513, 65022].pack('S*') # => "\x01\x02\xFE\xFD" + s.unpack('S*') # => [513, 65022] + ``` + +- `'n'` - 16-bit network integer, big-endian: + + ```ruby + s = [0, 1, -1, 32767, -32768, 65535].pack('n*') + # => "\x00\x00\x00\x01\xFF\xFF\x7F\xFF\x80\x00\xFF\xFF" + s.unpack('n*') + # => [0, 1, 65535, 32767, 32768, 65535] + ``` + +- `'v'` - 16-bit VAX integer, little-endian: + + ```ruby + s = [0, 1, -1, 32767, -32768, 65535].pack('v*') + # => "\x00\x00\x01\x00\xFF\xFF\xFF\x7F\x00\x80\xFF\xFF" + s.unpack('v*') + # => [0, 1, 65535, 32767, 32768, 65535] + ``` + +### 32-Bit \Integer Directives + +- `'l'` - 32-bit signed integer, native-endian + (like C `int32_t`): + + ```ruby + s = [67305985, -50462977].pack('l*') + # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" + s.unpack('l*') + # => [67305985, -50462977] + ``` + +- `'L'` - 
32-bit unsigned integer, native-endian + (like C `uint32_t`): + + ```ruby + s = [67305985, 4244504319].pack('L*') + # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" + s.unpack('L*') + # => [67305985, 4244504319] + ``` + +- `'N'` - 32-bit network integer, big-endian: + + ```ruby + s = [0,1,-1].pack('N*') + # => "\x00\x00\x00\x00\x00\x00\x00\x01\xFF\xFF\xFF\xFF" + s.unpack('N*') + # => [0, 1, 4294967295] + ``` + +- `'V'` - 32-bit VAX integer, little-endian: + + ```ruby + s = [0,1,-1].pack('V*') + # => "\x00\x00\x00\x00\x01\x00\x00\x00\xFF\xFF\xFF\xFF" + s.unpack('V*') + # => [0, 1, 4294967295] + ``` + +### 64-Bit \Integer Directives + +- `'q'` - 64-bit signed integer, native-endian + (like C `int64_t`): + + ```ruby + s = [578437695752307201, -506097522914230529].pack('q*') + # => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8" + s.unpack('q*') + # => [578437695752307201, -506097522914230529] + ``` + +- `'Q'` - 64-bit unsigned integer, native-endian + (like C `uint64_t`): + + ```ruby + s = [578437695752307201, 17940646550795321087].pack('Q*') + # => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8" + s.unpack('Q*') + # => [578437695752307201, 17940646550795321087] + ``` + +### Platform-Dependent \Integer Directives + +- `'i'` - Platform-dependent width signed integer, + native-endian (like C `int`): + + ```ruby + s = [67305985, -50462977].pack('i*') + # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" + s.unpack('i*') + # => [67305985, -50462977] + ``` + +- `'I'` - Platform-dependent width unsigned integer, + native-endian (like C `unsigned int`): + + ```ruby + s = [67305985, -50462977].pack('I*') + # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" + s.unpack('I*') + # => [67305985, 4244504319] + ``` + +- `'j'` - Pointer-width signed integer, native-endian + (like C `intptr_t`): + + ```ruby + s = [67305985, -50462977].pack('j*') + # => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\xFF\xFF\xFF\xFF" + s.unpack('j*') + # => [67305985, -50462977] + ``` + +-
`'J'` - Pointer-width unsigned integer, native-endian + (like C `uintptr_t`): + + ```ruby + s = [67305985, 4244504319].pack('J*') + # => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\x00\x00\x00\x00" + s.unpack('J*') + # => [67305985, 4244504319] + ``` + +### Other \Integer Directives + +- `'U'` - UTF-8 character: + + ```ruby + s = [4194304].pack('U*') + # => "\xF8\x90\x80\x80\x80" + s.unpack('U*') + # => [4194304] + ``` + +- `'r'` - Signed LEB128-encoded integer + (see [Signed LEB128](https://en.wikipedia.org/wiki/LEB128#Signed_LEB128)) + + ```ruby + s = [1, 127, -128, 16383, -16384].pack("r*") + # => "\x01\xFF\x00\x80\x7F\xFF\xFF\x00\x80\x80\x7F" + s.unpack('r*') + # => [1, 127, -128, 16383, -16384] + ``` + +- `'R'` - Unsigned LEB128-encoded integer + (see [Unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) + + ```ruby + s = [1, 127, 128, 16383, 16384].pack("R*") + # => "\x01\x7F\x80\x01\xFF\x7F\x80\x80\x01" + s.unpack('R*') + # => [1, 127, 128, 16383, 16384] + ``` + +- `'w'` - BER-encoded integer + (see [BER encoding](https://en.wikipedia.org/wiki/X.690#BER_encoding)): + + ```ruby + s = [1073741823].pack('w*') + # => "\x83\xFF\xFF\xFF\x7F" + s.unpack('w*') + # => [1073741823] + ``` + +### Modifiers for \Integer Directives + +For the following directives, `'!'` or `'_'` modifiers may be +suffixed as underlying platform’s native size. + +- `'i'`, `'I'` - C `int`, always native size. +- `'s'`, `'S'` - C `short`. +- `'l'`, `'L'` - C `long`. +- `'q'`, `'Q'` - C `long long`, if available. +- `'j'`, `'J'` - C `intptr_t`, always native size. + +Native size modifiers are silently ignored for always native size directives. + +The endian modifiers also may be suffixed in the directives above: + +- `'>'` - Big-endian. +- `'<'` - Little-endian. + +## \Float Directives + +Each float directive specifies the packing or unpacking +for one element in the input or output array. 
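As a minimal sketch of the statement above (not part of the original document), the following round-trips a value through the single- and double-precision little-endian directives. The variable names `single` and `double` are illustrative only.

```ruby
# Sketch: one float directive consumes one array element on pack
# and yields one element on unpack.
single = [0.1].pack('e') # single-precision, little-endian
double = [0.1].pack('E') # double-precision, little-endian

single.bytesize # => 4
double.bytesize # => 8

# A Ruby Float is double-precision, so 'E' round-trips exactly,
# while 'e' rounds to the nearest single-precision value.
double.unpack1('E') == 0.1 # => true
single.unpack1('e') == 0.1 # => false

# 'g' holds the same bytes as 'e', in the opposite order.
[0.1].pack('g') == single.reverse # => true

# A count or '*' repeats the directive, one element per repetition.
[1.5, 2.5].pack('e*').unpack('e*') # => [1.5, 2.5]
```

Values such as `1.5` and `2.5` survive the single-precision round trip because they are exactly representable in 32 bits; `0.1` is not.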
+ +### Single-Precision \Float Directives + +- `'F'` or `'f'` - Native format: + + ```ruby + s = [3.0].pack('F') # => "\x00\x00@@" + s.unpack('F') # => [3.0] + ``` + +- `'e'` - Little-endian: + + ```ruby + s = [3.0].pack('e') # => "\x00\x00@@" + s.unpack('e') # => [3.0] + ``` + +- `'g'` - Big-endian: + + ```ruby + s = [3.0].pack('g') # => "@@\x00\x00" + s.unpack('g') # => [3.0] + ``` + +### Double-Precision \Float Directives + +- `'D'` or `'d'` - Native format: + + ```ruby + s = [3.0].pack('D') # => "\x00\x00\x00\x00\x00\x00\b@" + s.unpack('D') # => [3.0] + ``` + +- `'E'` - Little-endian: + + ```ruby + s = [3.0].pack('E') # => "\x00\x00\x00\x00\x00\x00\b@" + s.unpack('E') # => [3.0] + ``` + +- `'G'` - Big-endian: + + ```ruby + s = [3.0].pack('G') # => "@\b\x00\x00\x00\x00\x00\x00" + s.unpack('G') # => [3.0] + ``` + +A float directive may be infinity or not-a-number: + +```ruby +inf = 1.0/0.0 # => Infinity +[inf].pack('f') # => "\x00\x00\x80\x7F" +"\x00\x00\x80\x7F".unpack('f') # => [Infinity] + +nan = inf/inf # => NaN +[nan].pack('f') # => "\x00\x00\xC0\x7F" +"\x00\x00\xC0\x7F".unpack('f') # => [NaN] +``` + +## \String Directives + +Each string directive specifies the packing or unpacking +for one byte in the input or output string. + +### Binary \String Directives + +- `'A'` - Arbitrary binary string (space padded; count is width); + `nil` is treated as the empty string: + + ```ruby + ['foo'].pack('A') # => "f" + ['foo'].pack('A*') # => "foo" + ['foo'].pack('A2') # => "fo" + ['foo'].pack('A4') # => "foo " + [nil].pack('A') # => " " + [nil].pack('A*') # => "" + [nil].pack('A2') # => " " + [nil].pack('A4') # => " " + ``` + + ```ruby + "foo\0".unpack('A') # => ["f"] + "foo\0".unpack('A4') # => ["foo"] + "foo\0bar".unpack('A10') # => ["foo\x00bar"] # Reads past "\0". 
+ "foo ".unpack('A') # => ["f"] + "foo ".unpack('A4') # => ["foo"] + "foo".unpack('A4') # => ["foo"] + ``` + + ```ruby + japanese = 'こんにちは' + japanese.size # => 5 + japanese.bytesize # => 15 + [japanese].pack('A') # => "\xE3" + [japanese].pack('A*') # => "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF" + japanese.unpack('A') # => ["\xE3"] + japanese.unpack('A2') # => ["\xE3\x81"] + japanese.unpack('A4') # => ["\xE3\x81\x93\xE3"] + japanese.unpack('A*') # => ["\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF"] + ``` + +- `'a'` - Arbitrary binary string (null padded; count is width): + + ```ruby + ["foo"].pack('a') # => "f" + ["foo"].pack('a*') # => "foo" + ["foo"].pack('a2') # => "fo" + ["foo\0"].pack('a4') # => "foo\x00" + [nil].pack('a') # => "\x00" + [nil].pack('a*') # => "" + [nil].pack('a2') # => "\x00\x00" + [nil].pack('a4') # => "\x00\x00\x00\x00" + ``` + + ```ruby + "foo\0".unpack('a') # => ["f"] + "foo\0".unpack('a4') # => ["foo\x00"] + "foo ".unpack('a4') # => ["foo "] + "foo".unpack('a4') # => ["foo"] + "foo\0bar".unpack('a4') # => ["foo\x00"] # Reads past "\0". + ``` + +- `'Z'` - Same as `'a'`, + except that null is added or ignored with `'*'`: + + ```ruby + ["foo"].pack('Z*') # => "foo\x00" + [nil].pack('Z*') # => "\x00" + ``` + + ```ruby + "foo\0".unpack('Z*') # => ["foo"] + "foo".unpack('Z*') # => ["foo"] + "foo\0bar".unpack('Z*') # => ["foo"] # Does not read past "\0". 
+ ``` + +### Bit \String Directives + +- `'B'` - Bit string (high byte first): + + ```ruby + ['11111111' + '00000000'].pack('B*') # => "\xFF\x00" + ['10000000' + '01000000'].pack('B*') # => "\x80@" + ``` + + ```ruby + ['1'].pack('B0') # => "" + ['1'].pack('B1') # => "\x80" + ['1'].pack('B2') # => "\x80\x00" + ['1'].pack('B3') # => "\x80\x00" + ['1'].pack('B4') # => "\x80\x00\x00" + ['1'].pack('B5') # => "\x80\x00\x00" + ['1'].pack('B6') # => "\x80\x00\x00\x00" + ``` + + ```ruby + "\xff\x00".unpack("B*") # => ["1111111100000000"] + "\x01\x02".unpack("B*") # => ["0000000100000010"] + ``` + + ```ruby + "".unpack("B0") # => [""] + "\x80".unpack("B1") # => ["1"] + "\x80".unpack("B2") # => ["10"] + "\x80".unpack("B3") # => ["100"] + ``` + +- `'b'` - Bit string (low byte first): + + ```ruby + ['11111111' + '00000000'].pack('b*') # => "\xFF\x00" + ['10000000' + '01000000'].pack('b*') # => "\x01\x02" + ``` + + ```ruby + ['1'].pack('b0') # => "" + ['1'].pack('b1') # => "\x01" + ['1'].pack('b2') # => "\x01\x00" + ['1'].pack('b3') # => "\x01\x00" + ['1'].pack('b4') # => "\x01\x00\x00" + ['1'].pack('b5') # => "\x01\x00\x00" + ['1'].pack('b6') # => "\x01\x00\x00\x00" + ``` + + ```ruby + "\xff\x00".unpack("b*") # => ["1111111100000000"] + "\x01\x02".unpack("b*") # => ["1000000001000000"] + ``` + + ```ruby + "".unpack("b0") # => [""] + "\x01".unpack("b1") # => ["1"] + "\x01".unpack("b2") # => ["10"] + "\x01".unpack("b3") # => ["100"] + ``` + +### Hex \String Directives + +- `'H'` - Hex string (high nibble first): + + ```ruby + ['10ef'].pack('H*') # => "\x10\xEF" + ['10ef'].pack('H0') # => "" + ['10ef'].pack('H3') # => "\x10\xE0" + ['10ef'].pack('H5') # => "\x10\xEF\x00" + ``` + + ```ruby + ['fff'].pack('H3') # => "\xFF\xF0" + ['fff'].pack('H4') # => "\xFF\xF0" + ['fff'].pack('H5') # => "\xFF\xF0\x00" + ['fff'].pack('H6') # => "\xFF\xF0\x00" + ['fff'].pack('H7') # => "\xFF\xF0\x00\x00" + ['fff'].pack('H8') # => "\xFF\xF0\x00\x00" + ``` + + ```ruby + "\x10\xef".unpack('H*') # => 
["10ef"] + "\x10\xef".unpack('H0') # => [""] + "\x10\xef".unpack('H1') # => ["1"] + "\x10\xef".unpack('H2') # => ["10"] + "\x10\xef".unpack('H3') # => ["10e"] + "\x10\xef".unpack('H4') # => ["10ef"] + "\x10\xef".unpack('H5') # => ["10ef"] + ``` + +- `'h'` - Hex string (low nibble first): + + ```ruby + ['10ef'].pack('h*') # => "\x01\xFE" + ['10ef'].pack('h0') # => "" + ['10ef'].pack('h3') # => "\x01\x0E" + ['10ef'].pack('h5') # => "\x01\xFE\x00" + ``` + + ```ruby + ['fff'].pack('h3') # => "\xFF\x0F" + ['fff'].pack('h4') # => "\xFF\x0F" + ['fff'].pack('h5') # => "\xFF\x0F\x00" + ['fff'].pack('h6') # => "\xFF\x0F\x00" + ['fff'].pack('h7') # => "\xFF\x0F\x00\x00" + ['fff'].pack('h8') # => "\xFF\x0F\x00\x00" + ``` + + ```ruby + "\x01\xfe".unpack('h*') # => ["10ef"] + "\x01\xfe".unpack('h0') # => [""] + "\x01\xfe".unpack('h1') # => ["1"] + "\x01\xfe".unpack('h2') # => ["10"] + "\x01\xfe".unpack('h3') # => ["10e"] + "\x01\xfe".unpack('h4') # => ["10ef"] + "\x01\xfe".unpack('h5') # => ["10ef"] + ``` + +### Pointer \String Directives + +- `'P'` - Pointer to a structure (fixed-length string): + + ```ruby + s = ['abc'].pack('P') # => "\xE0O\x7F\xE5\xA1\x01\x00\x00" + s.unpack('P*') # => ["abc"] + ".".unpack("P") # => [] + ("\0" * 8).unpack("P") # => [nil] + [nil].pack("P") # => "\x00\x00\x00\x00\x00\x00\x00\x00" + ``` + +- `'p'` - Pointer to a null-terminated string: + + ```ruby + s = ['abc'].pack('p') # => "(\xE4u\xE5\xA1\x01\x00\x00" + s.unpack('p*') # => ["abc"] + ".".unpack("p") # => [] + ("\0" * 8).unpack("p") # => [nil] + [nil].pack("p") # => "\x00\x00\x00\x00\x00\x00\x00\x00" + ``` + +### Other \String Directives + +- `'M'` - Quoted printable, MIME encoding; + text mode, but input must use LF and output LF; + (see [RFC 2045](https://www.ietf.org/rfc/rfc2045.txt)): + + ```ruby + ["a b c\td \ne"].pack('M') # => "a b c\td =\n\ne=\n" + ["\0"].pack('M') # => "=00=\n" + ``` + + ```ruby + ["a"*1023].pack('M') == ("a"*73+"=\n")*14+"a=\n" # => true + 
("a"*73+"=\na=\n").unpack('M') == ["a"*74] # => true + (("a"*73+"=\n")*14+"a=\n").unpack('M') == ["a"*1023] # => true + ``` + + ```ruby + "a b c\td =\n\ne=\n".unpack('M') # => ["a b c\td \ne"] + "=00=\n".unpack('M') # => ["\x00"] + ``` + + ```ruby + "pre=31=32=33after".unpack('M') # => ["pre123after"] + "pre=\nafter".unpack('M') # => ["preafter"] + "pre=\r\nafter".unpack('M') # => ["preafter"] + "pre=".unpack('M') # => ["pre="] + "pre=\r".unpack('M') # => ["pre=\r"] + "pre=hoge".unpack('M') # => ["pre=hoge"] + "pre==31after".unpack('M') # => ["pre==31after"] + "pre===31after".unpack('M') # => ["pre===31after"] + ``` + +- `'m'` - Base64 encoded string; + count specifies input bytes between each newline, + rounded down to nearest multiple of 3; + if count is zero, no newlines are added; + (see [RFC 4648](https://www.ietf.org/rfc/rfc4648.txt)): + + ```ruby + [""].pack('m') # => "" + ["\0"].pack('m') # => "AA==\n" + ["\0\0"].pack('m') # => "AAA=\n" + ["\0\0\0"].pack('m') # => "AAAA\n" + ["\377"].pack('m') # => "/w==\n" + ["\377\377"].pack('m') # => "//8=\n" + ["\377\377\377"].pack('m') # => "////\n" + ``` + + ```ruby + "".unpack('m') # => [""] + "AA==\n".unpack('m') # => ["\x00"] + "AAA=\n".unpack('m') # => ["\x00\x00"] + "AAAA\n".unpack('m') # => ["\x00\x00\x00"] + "/w==\n".unpack('m') # => ["\xFF"] + "//8=\n".unpack('m') # => ["\xFF\xFF"] + "////\n".unpack('m') # => ["\xFF\xFF\xFF"] + "A\n".unpack('m') # => [""] + "AA\n".unpack('m') # => ["\x00"] + "AA=\n".unpack('m') # => ["\x00"] + "AAA\n".unpack('m') # => ["\x00\x00"] + ``` + + ```ruby + [""].pack('m0') # => "" + ["\0"].pack('m0') # => "AA==" + ["\0\0"].pack('m0') # => "AAA=" + ["\0\0\0"].pack('m0') # => "AAAA" + ["\377"].pack('m0') # => "/w==" + ["\377\377"].pack('m0') # => "//8=" + ["\377\377\377"].pack('m0') # => "////" + ``` + + ```ruby + "".unpack('m0') # => [""] + "AA==".unpack('m0') # => ["\x00"] + "AAA=".unpack('m0') # => ["\x00\x00"] + "AAAA".unpack('m0') # => ["\x00\x00\x00"] + "/w==".unpack('m0') # => 
["\xFF"] + "//8=".unpack('m0') # => ["\xFF\xFF"] + "////".unpack('m0') # => ["\xFF\xFF\xFF"] + ``` + +- `'u'` - UU-encoded string: + + ```ruby + [""].pack("u") # => "" + ["a"].pack("u") # => "!80``\n" + ["aaa"].pack("u") # => "#86%A\n" + ``` + + ```ruby + "".unpack("u") # => [""] + "#86)C\n".unpack("u") # => ["abc"] + ``` + +## Offset Directives + +- `'@'` - Begin packing at the given byte offset; + for packing, null fill or shrink if necessary: + + ```ruby + [1, 2].pack("C@0C") # => "\x02" + [1, 2].pack("C@1C") # => "\x01\x02" + [1, 2].pack("C@5C") # => "\x01\x00\x00\x00\x00\x02" + [*1..5].pack("CCCC@2C") # => "\x01\x02\x05" + ``` + + For unpacking, the position cannot move outside the string: + + ```ruby + "\x01\x00\x00\x02".unpack("C@3C") # => [1, 2] + "\x00".unpack("@1C") # => [nil] + "\x00".unpack("@2C") # Raises ArgumentError. + ``` + +- `'X'` - For packing, shrink by the given number of bytes: + + ```ruby + [0, 1, 2].pack("CCXC") # => "\x00\x02" + [0, 1, 2].pack("CCX2C") # => "\x02" + ``` + + For unpacking, rewind the unpacking position by the given number of bytes: + + ```ruby + "\x00\x02".unpack("CCXC") # => [0, 2, 2] + ``` + + The position cannot move outside the string: + + ```ruby + [0, 1, 2].pack("CCX3C") # Raises ArgumentError. + "\x00\x02".unpack("CX3C") # Raises ArgumentError.
+ ``` + +- `'x'` - Skip forward by the given number of bytes; + for packing, null fill if necessary: + + ```ruby + [].pack("x0") # => "" + [].pack("x") # => "\x00" + [].pack("x8") # => "\x00\x00\x00\x00\x00\x00\x00\x00" + ``` + + For unpacking, the position cannot move outside the string: + + ```ruby + "\x00\x00\x02".unpack("CxC") # => [0, 2] + "\x00\x00\x02".unpack("x3C") # => [nil] + "\x00\x00\x02".unpack("x4C") # Raises ArgumentError + ``` + +- `'^'` - Only for unpacking; the current position: + + ```ruby + "foo\0\0\0".unpack("Z*^") # => ["foo", 4] + ``` diff --git a/doc/language/ractor.md b/doc/language/ractor.md new file mode 100644 index 0000000000..1592656217 --- /dev/null +++ b/doc/language/ractor.md @@ -0,0 +1,797 @@ +# Ractor - Ruby's Actor-like concurrency abstraction + +Ractors are designed to provide parallel execution of Ruby code without thread-safety concerns. + +## Summary + +### Multiple Ractors in a ruby process + +You can create multiple Ractors which can run Ruby code in parallel with each other. + +* `Ractor.new{ expr }` creates a new Ractor and `expr` can run in parallel with other ractors on a multi-core computer. +* Ruby processes start with one ractor (called the *main ractor*). +* If the main ractor terminates, all other ractors receive termination requests, similar to how threads behave. +* Each Ractor contains one or more `Thread`s. + * Threads within the same ractor share a ractor-wide global lock (GVL in MRI terminology), so they can't run in parallel with each other (without releasing the GVL explicitly in C extensions). Threads in different ractors can run in parallel. + * The overhead of creating a ractor is slightly above the overhead of creating a thread. + +### Limited sharing between Ractors + +Ractors don't share all objects, unlike threads which can access any object other than objects stored in another thread's thread-locals. + +* Most objects are *unshareable objects*.
Unshareable objects can only be used by the ractor that instantiated them, so you don't need to worry about thread-safety issues resulting from using the object concurrently across ractors. +* Some objects are *shareable objects*. Here is an incomplete list to give you an idea: + * `i = 123`: All `Integer`s are shareable. + * `s = "str".freeze`: Frozen strings are shareable if they have no instance variables that refer to unshareable objects. + * `a = [1, [2], 3].freeze`: `a` is not a shareable object because `a` refers to the unshareable object `[2]` (this Array is not frozen). + * `h = {c: Object}.freeze`: `h` is shareable because `Symbol`s and `Class`es are shareable, and the Hash is frozen. + * Class/Module objects are always shareable, even if they refer to unshareable objects. + * Special shareable objects + * Ractor objects themselves are shareable. + * And more... + +### Communication between Ractors with `Ractor::Port` + +Ractors communicate with each other and synchronize their execution by exchanging messages. The `Ractor::Port` class provides this communication mechanism. + +```ruby +port = Ractor::Port.new + +Ractor.new port do |port| + # Other ractors can send to the port + port << 42 +end + +port.receive # get a message from the port. Only the ractor that created the Port can receive from it. +#=> 42 +``` + +All Ractors have a default port, which `Ractor#send`, `Ractor.receive` (etc) will use. + +### Copy & Move semantics when sending objects + +To send unshareable objects to another ractor, objects are either copied or moved. + +* Copy: deep-copies the object to the other ractor. All unshareable objects will be `Kernel#clone`ed. +* Move: moves membership to another ractor. + * The sending ractor can not access the moved object after it moves. + * There is a guarantee that only one ractor can access an unshareable object at once. + +### Thread-safety + +Ractors help to write thread-safe, concurrent programs. 
They allow sharing of data only through explicit message passing for +unshareable objects. Shareable objects are guaranteed to work correctly across ractors, even if the ractors are running in parallel. +This guarantee, however, only applies across ractors. You still need to use `Mutex`es and other thread-safety tools within a ractor if +you're using multiple ruby `Thread`s. + + * Most objects are unshareable. You can't create data-races across ractors due to the inability to use these objects across ractors. + * Shareable objects are protected by locks (or otherwise don't need to be) so they can be used by more than one ractor at once. + +## Creation and termination + +### `Ractor.new` + +* `Ractor.new { expr }` creates a Ractor. + +```ruby +# Ractor.new with a block creates a new Ractor +r = Ractor.new do + # This block can run in parallel with other ractors +end + +# You can name a Ractor with a `name:` argument. +r = Ractor.new name: 'my-first-ractor' do +end + +r.name #=> 'my-first-ractor' +``` + +### Block isolation + +The Ractor executes `expr` in the given block. +The given block will be isolated from its outer scope. To prevent sharing objects between ractors, outer variables, `self` and other information is isolated from the block. + +This isolation occurs at Ractor creation time (when `Ractor.new` is called). If the given block is not able to be isolated because of outer variables or `self`, an error will be raised. + +```ruby +begin + a = true + r = Ractor.new do + a #=> Ractor::IsolationError because this block accesses outer variable `a`. + end + r.join # wait for ractor to finish +rescue Ractor::IsolationError +end +``` + +* The `self` of the given block is the `Ractor` object itself. + +```ruby +r = Ractor.new do + p self.class #=> Ractor + self.object_id +end +r.value == self.object_id #=> false +``` + +Arguments passed to `Ractor.new()` become block parameters for the given block. 
However, Ruby does not pass the objects themselves, but sends them as messages (see below for details). + +```ruby +r = Ractor.new 'ok' do |msg| + msg #=> 'ok' +end +r.value #=> 'ok' +``` + +```ruby +# similar to the last example +r = Ractor.new do + msg = Ractor.receive + msg +end +r.send 'ok' +r.value #=> 'ok' +``` + +### The execution result of the given block + +The return value of the given block becomes an outgoing message (see below for details). + +```ruby +r = Ractor.new do + 'ok' +end +r.value #=> `ok` +``` + +An error in the given block will be propagated to the consumer of the outgoing message. + +```ruby +r = Ractor.new do + raise 'ok' # exception will be transferred to the consumer +end + +begin + r.value +rescue Ractor::RemoteError => e + e.cause.class #=> RuntimeError + e.cause.message #=> 'ok' + e.ractor #=> r +end +``` + +## Communication between Ractors + +Communication between ractors is achieved by sending and receiving messages. There are two ways to communicate: + +* (1) Sending and receiving messages via `Ractor::Port` +* (2) Using shareable container objects. For example, the Ractor::TVar gem ([ko1/ractor-tvar](https://github.com/ko1/ractor-tvar)) + +Users can control program execution timing with (1), but should not control with (2) (only perform critical sections). + +For sending and receiving messages, these are the fundamental APIs: + +* send/receive via `Ractor::Port`. + * `Ractor::Port#send(obj)` (`Ractor::Port#<<(obj)` is an alias) sends a message to the port. Ports are connected to an infinite size incoming queue so sending will never block the caller. + * `Ractor::Port#receive` dequeues a message from its own incoming queue. If the incoming queue is empty, `Ractor::Port#receive` will block the execution of the current Thread until a message is sent. + * `Ractor#send` and `Ractor.receive` use ports (their default port) internally, so are conceptually similar to the above. +* You can close a `Ractor::Port` by `Ractor::Port#close`. 
A port can only be closed by the ractor that created it. + * If a port is closed, you can't `send` to it. Doing so raises an exception. + * When a ractor is terminated, the ractor's ports are automatically closed. +* You can wait for a ractor's termination and receive its return value with `Ractor#value`. This is similar to `Thread#value`. + +There are 3 ways to send an object as a message: + +1) Send a reference: sending a shareable object sends only a reference to the object (fast). + +2) Copy an object: sending an unshareable object through copying it deeply (can be slow). Note that you can not send an object this way which does not support deep copy. Some `T_DATA` objects (objects whose class is defined in a C extension, such as `StringIO`) are not supported. + +3) Move an object: sending an unshareable object across ractors with a membership change. The sending Ractor can not access the moved object after moving it, otherwise an exception will be raised. Implementation note: `T_DATA` objects are not supported. + +You can choose between "Copy" and "Move" by the `move:` keyword, `Ractor#send(obj, move: true/false)`. The default is `false` ("Copy"). However, if the object is shareable it will automatically use `move`. + +### Wait for multiple Ractors with `Ractor.select` + +You can wait for messages on multiple ports at once. +The return value of `Ractor.select()` is `[port, msg]` where `port` is a ready port and `msg` is the received message. + +To make it convenient, `Ractor.select` can also accept ractors. In this case, it waits for their termination. +The return value of `Ractor.select()` is `[r, msg]` where `r` is a terminated Ractor and `msg` is the value of the ractor's block. + +Wait for a single ractor (same as `Ractor#value`): + +```ruby +r1 = Ractor.new{'r1'} + +r, obj = Ractor.select(r1) +r == r1 and obj == 'r1' #=> true +``` + +Wait for two ractors: + +```ruby +r1 = Ractor.new{'r1'} +r2 = Ractor.new{'r2'} +rs = [r1, r2] +values = [] + +while rs.any? 
+ r, obj = Ractor.select(*rs) + rs.delete(r) + values << obj +end + +values.sort == ['r1', 'r2'] #=> true +``` + +NOTE: Using `Ractor.select()` on a very large number of ractors has the same issue as `select(2)` currently. + +### Closing ports + +* `Ractor::Port#close` closes the port (similar to `Queue#close`). + * `port.send(obj)` will raise an exception when the port is closed. + * When the queue connected to the port is empty and port is closed, `Ractor::Port#receive` raises an exception. If the queue is not empty, it dequeues an object without exceptions. +* When a Ractor terminates, the ports are closed automatically. + +Example (try to get a result from closed ractor): + +```ruby +r = Ractor.new do + 'finish' +end +r.join # success (wait for the termination) +r.value # success (will return 'finish') + +# The ractor's termination value has already been given to another ractor +Ractor.new r do |r| + r.value #=> Ractor::Error +end.join +``` + +Example (try to send to closed port): + +```ruby +r = Ractor.new do +end + +r.join # wait for termination, closes default port + +begin + r.send(1) +rescue Ractor::ClosedError + 'ok' +end +``` + +### Send a message by copying + +`Ractor::Port#send(obj)` copies `obj` deeply if `obj` is an unshareable object. + +```ruby +obj = 'str'.dup +r = Ractor.new obj do |msg| + # return received msg's object_id + msg.object_id +end + +obj.object_id == r.value #=> false +``` + +Some objects do not support copying, and raise an exception. + +```ruby +obj = Thread.new{} +begin + Ractor.new obj do |msg| + msg + end +rescue TypeError => e + e.message #=> #<TypeError: allocator undefined for Thread> +end +``` + +### Send a message by moving + +`Ractor::Port#send(obj, move: true)` moves `obj` to the destination Ractor. +If the source ractor uses the moved object (for example, calls a method like `obj.foo()`), it will raise an error. 
+ +```ruby +r = Ractor.new do + obj = Ractor.receive + obj << ' world' +end + +str = 'hello'.dup +r.send str, move: true +# str is now moved, and accessing str from this ractor is prohibited +modified = r.value #=> 'hello world' + + +begin + # Error because it uses moved str. + str << ' exception' # raise Ractor::MovedError +rescue Ractor::MovedError + modified #=> 'hello world' +end +``` + +Some objects do not support moving, and an exception will be raised. + +```ruby +r = Ractor.new do + Ractor.receive +end + +r.send(Thread.new{}, move: true) #=> allocator undefined for Thread (TypeError) +``` + +Once an object has been moved, the source object's class is changed to `Ractor::MovedObject`. + +### Shareable objects + +The following is an inexhaustive list of shareable objects: + +* `Integer`, `Float`, `Complex`, `Rational` +* `Symbol`, frozen `String` objects that don't refer to unshareables, `true`, `false`, `nil` +* `Regexp` objects, if they have no instance variables or their instance variables refer only to shareables +* `Class` and `Module` objects +* `Ractor` and other special objects which deal with synchronization + +To make objects shareable, `Ractor.make_shareable(obj)` is provided. It tries to make the object shareable by freezing `obj` and recursively traversing its references to freeze them all. This method accepts the `copy:` keyword (default value is false). `Ractor.make_shareable(obj, copy: true)` tries to make a deep copy of `obj` and make the copied object shareable. `Ractor.make_shareable(copy: false)` has no effect on an already shareable object. If the object cannot be made shareable, a `Ractor::Error` exception will be raised. + +## Language changes to limit sharing between Ractors + +To isolate unshareable objects across ractors, we introduced additional language semantics for multi-ractor Ruby programs. + +Note that when not using ractors, these additional semantics are not needed (100% compatible with Ruby 2). 
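The copy-versus-in-place behaviour of `Ractor.make_shareable` described under "Shareable objects" above can be sketched as follows. This is only an illustration: the names `config` and `shared` are made up for the example and do not appear in the documentation being changed.

```ruby
# A minimal sketch; `config` and `shared` are illustrative names only.
config = { name: 'ko1'.dup, tags: ['a'.dup, 'b'.dup] }
Ractor.shareable?(config) #=> false

# copy: true deep-copies first, so the original stays mutable.
shared = Ractor.make_shareable(config, copy: true)
Ractor.shareable?(shared) #=> true
shared[:name].frozen?     #=> true
config.frozen?            #=> false

# The default (copy: false) freezes the object graph in place.
Ractor.make_shareable(config)
config.frozen?            #=> true
config[:tags][0].frozen?  #=> true
```

Passing `copy: true` is the safer choice when other code still expects to mutate the original object; the default avoids the cost of a deep copy.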
+ +### Global variables + +Only the main Ractor can access global variables. + +```ruby +$gv = 1 +r = Ractor.new do + $gv +end + +begin + r.join +rescue Ractor::RemoteError => e + e.cause.message #=> 'can not access global variables from non-main Ractors' +end +``` + +Note that some special global variables, such as `$stdin`, `$stdout` and `$stderr` are local to each ractor. See [[Bug #17268]](https://bugs.ruby-lang.org/issues/17268) for more details. + +### Instance variables of shareable objects + +Instance variables of classes/modules can be accessed from non-main ractors only if their values are shareable objects. + +```ruby +class C + @iv = 1 +end + +p Ractor.new do + class C + @iv + end +end.value #=> 1 +``` + +Otherwise, only the main Ractor can access instance variables of shareable objects. + +```ruby +class C + @iv = [] # unshareable object +end + +Ractor.new do + class C + begin + p @iv + rescue Ractor::IsolationError + p $!.message + #=> "can not get unshareable values from instance variables of classes/modules from non-main Ractors" + end + + begin + @iv = 42 + rescue Ractor::IsolationError + p $!.message + #=> "can not set instance variables of classes/modules by non-main Ractors" + end + end +end.join +``` + +```ruby +shared = Ractor.new{} +shared.instance_variable_set(:@iv, 'str') + +r = Ractor.new shared do |shared| + p shared.instance_variable_get(:@iv) +end + +begin + r.join +rescue Ractor::RemoteError => e + e.cause.message #=> can not access instance variables of shareable objects from non-main Ractors (Ractor::IsolationError) +end +``` + +### Class variables + +Only the main Ractor can access class variables. + +```ruby +class C + @@cv = 'str' +end + +r = Ractor.new do + class C + p @@cv + end +end + + +begin + r.join +rescue => e + e.class #=> Ractor::IsolationError +end +``` + +### Constants + +Only the main Ractor can read constants which refer to an unshareable object. 
+ +```ruby +class C + CONST = 'str'.dup +end +r = Ractor.new do + C::CONST +end +begin + r.join +rescue => e + e.class #=> Ractor::IsolationError +end +``` + +Only the main Ractor can define constants which refer to an unshareable object. + +```ruby +class C +end +r = Ractor.new do + C::CONST = 'str'.dup +end +begin + r.join +rescue => e + e.class #=> Ractor::IsolationError +end +``` + +When creating/updating a library to support ractors, constants should only refer to shareable objects if they are to be used by non-main ractors. + +```ruby +TABLE = {a: 'ko1', b: 'ko2', c: 'ko3'} +``` + +In this case, `TABLE` refers to an unshareable Hash object. In order for other ractors to use `TABLE`, we need to make it shareable. We can use `Ractor.make_shareable()` like so: + +```ruby +TABLE = Ractor.make_shareable( {a: 'ko1', b: 'ko2', c: 'ko3'} ) +``` + +To make it easy, Ruby 3.0 introduced a new `shareable_constant_value` file directive. + +```ruby +# shareable_constant_value: literal + +TABLE = {a: 'ko1', b: 'ko2', c: 'ko3'} +#=> Same as: TABLE = Ractor.make_shareable( {a: 'ko1', b: 'ko2', c: 'ko3'} ) +``` + +The `shareable_constant_value` directive accepts the following modes (descriptions use the example: `CONST = expr`): + +* none: Do nothing. Same as: `CONST = expr` +* literal: + * if `expr` consists of literals, replaced with `CONST = Ractor.make_shareable(expr)`. + * otherwise: replaced with `CONST = expr.tap{|o| raise unless Ractor.shareable?(o)}`. +* experimental_everything: replaced with `CONST = Ractor.make_shareable(expr)`. +* experimental_copy: replaced with `CONST = Ractor.make_shareable(expr, copy: true)`. + +Except for the `none` mode (default), it is guaranteed that these constants refer only to shareable objects. + +See [syntax/comments.rdoc](../syntax/comments.rdoc) for more details. + +### Shareable procs + +Procs and lambdas are unshareable objects, even when they are frozen. To create a shareable Proc, you must use `Ractor.shareable_proc { expr }`. 
Much like during Ractor creation, the proc's block is isolated from its outer environment, so it cannot access variables from the outside scope. `self` is also changed within the Proc to be `nil` by default, although a `self:` keyword can be provided if you want to customize the value to a different shareable object. + +```ruby +p = Ractor.shareable_proc { p self } +p.call #=> nil +``` + +```ruby +begin + a = 1 + pr = Ractor.shareable_proc { p a } + pr.call # never gets here +rescue Ractor::IsolationError +end +``` + +In order to dynamically define a method with `Module#define_method` that can be used from different ractors, you must define it with a shareable proc. Alternatively, you can use `Module#class_eval` or `Module#module_eval` with a String. Even though the shareable proc's `self` is initially bound to `nil`, `define_method` will bind `self` to the correct value in the method. + +```ruby +class A + define_method :testing, &Ractor.shareable_proc do + p self + end +end +Ractor.new do + a = A.new + a.testing #=> #<A:0x0000000101acfe10> +end.join +``` + +This isolation must be done to prevent the method from accessing and assigning captured outer variables across ractors. + +### Ractor-local storage + +You can store any object (even unshareables) in ractor-local storage. 
+ +```ruby +r = Ractor.new do + values = [] + Ractor[:threads] = [] + 3.times do |i| + Ractor[:threads] << Thread.new do + values << [Ractor.receive, i+1] # Ractor.receive blocks the current thread in the current ractor until it receives a message + end + end + Ractor[:threads].each(&:join) + values +end + +r << 1 +r << 2 +r << 3 +r.value #=> [[1,1],[2,2],[3,3]] (the order can change with each run) +``` + +## Examples + +### Traditional Ring example in Actor-model + +```ruby +RN = 1_000 +CR = Ractor.current + +r = Ractor.new do + p Ractor.receive + CR << :fin +end + +RN.times{ + r = Ractor.new r do |next_r| + next_r << Ractor.receive + end +} + +p :setup_ok +r << 1 +p Ractor.receive +``` + +### Fork-join + +```ruby +def fib n + if n < 2 + 1 + else + fib(n-2) + fib(n-1) + end +end + +RN = 10 +rs = (1..RN).map do |i| + Ractor.new i do |i| + [i, fib(i)] + end +end + +until rs.empty? + r, v = Ractor.select(*rs) + rs.delete r + p answer: v +end +``` + +### Worker pool + +(1) One ractor has a pool + +```ruby +require 'prime' + +N = 1000 +RN = 10 + +# make RN workers +workers = (1..RN).map do + Ractor.new do |; result_port| + loop do + n, result_port = Ractor.receive + result_port << [n, n.prime?, Ractor.current] + end + end +end + +result_port = Ractor::Port.new +results = [] + +(1..N).each do |i| + if workers.empty? 
+ # receive a result + n, result, w = result_port.receive + results << [n, result] + else + w = workers.pop + end + + # send a task to the idle worker ractor + w << [i, result_port] +end + +# receive a result +while results.size != N + n, result, _w = result_port.receive + results << [n, result] +end + +pp results.sort_by{|n, result| n} +``` + +### Pipeline + +```ruby +# pipeline with send/receive + +r3 = Ractor.new Ractor.current do |cr| + cr.send Ractor.receive + 'r3' +end + +r2 = Ractor.new r3 do |r3| + r3.send Ractor.receive + 'r2' +end + +r1 = Ractor.new r2 do |r2| + r2.send Ractor.receive + 'r1' +end + +r1 << 'r0' +p Ractor.receive #=> "r0r1r2r3" +``` + +### Supervise + +```ruby +# ring example again + +r = Ractor.current +(1..10).map{|i| + r = Ractor.new r, i do |r, i| + r.send Ractor.receive + "r#{i}" + end +} + +r.send "r0" +p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1" +``` + +```ruby +# ring example with an error + +r = Ractor.current +rs = (1..10).map{|i| + r = Ractor.new r, i do |r, i| + loop do + msg = Ractor.receive + raise if /e/ =~ msg + r.send msg + "r#{i}" + end + end +} + +r.send "r0" +p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1" +r.send "r0" +p Ractor.select(*rs, Ractor.current) #=> [:receive, "r0r10r9r8r7r6r5r4r3r2r1"] +r.send "e0" +p Ractor.select(*rs, Ractor.current) +#=> +# <Thread:0x000056262de28bd8 run> terminated with exception (report_on_exception is true): +# Traceback (most recent call last): +# 2: from /home/ko1/src/ruby/trunk/test.rb:7:in `block (2 levels) in <main>' +# 1: from /home/ko1/src/ruby/trunk/test.rb:7:in `loop' +# /home/ko1/src/ruby/trunk/test.rb:9:in `block (3 levels) in <main>': unhandled exception +# Traceback (most recent call last): +# 2: from /home/ko1/src/ruby/trunk/test.rb:7:in `block (2 levels) in <main>' +# 1: from /home/ko1/src/ruby/trunk/test.rb:7:in `loop' +# /home/ko1/src/ruby/trunk/test.rb:9:in `block (3 levels) in <main>': unhandled exception +# 1: from /home/ko1/src/ruby/trunk/test.rb:21:in `<main>' +# 
<internal:ractor>:69:in `select': thrown by remote Ractor. (Ractor::RemoteError) +``` + +```ruby +# resend non-error message + +r = Ractor.current +rs = (1..10).map{|i| + r = Ractor.new r, i do |r, i| + loop do + msg = Ractor.receive + raise if /e/ =~ msg + r.send msg + "r#{i}" + end + end +} + +r.send "r0" +p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1" +r.send "r0" +p Ractor.select(*rs, Ractor.current) +[:receive, "r0r10r9r8r7r6r5r4r3r2r1"] +msg = 'e0' +begin + r.send msg + p Ractor.select(*rs, Ractor.current) +rescue Ractor::RemoteError + msg = 'r0' + retry +end + +#=> <internal:ractor>:100:in `send': The incoming-port is already closed (Ractor::ClosedError) +# because r == r[-1] is terminated. +``` + +```ruby +# ring example with supervisor and re-start + +def make_ractor r, i + Ractor.new r, i do |r, i| + loop do + msg = Ractor.receive + raise if /e/ =~ msg + r.send msg + "r#{i}" + end + end +end + +r = Ractor.current +rs = (1..10).map{|i| + r = make_ractor(r, i) +} + +msg = 'e0' # error causing message +begin + r.send msg + p Ractor.select(*rs, Ractor.current) +rescue Ractor::RemoteError + r = rs[-1] = make_ractor(rs[-2], rs.size-1) + msg = 'x0' + retry +end + +#=> [:receive, "x0r9r9r8r7r6r5r4r3r2r1"] +``` diff --git a/doc/regexp/methods.rdoc b/doc/language/regexp/methods.rdoc index 356156ac9a..356156ac9a 100644 --- a/doc/regexp/methods.rdoc +++ b/doc/language/regexp/methods.rdoc diff --git a/doc/regexp/unicode_properties.rdoc b/doc/language/regexp/unicode_properties.rdoc index f1f1f9d6a9..94080f7199 100644 --- a/doc/regexp/unicode_properties.rdoc +++ b/doc/language/regexp/unicode_properties.rdoc @@ -146,6 +146,7 @@ Older versions may not support all of these. 
- <tt>\p{Bassa_Vah}</tt>, <tt>\p{Bass}</tt> - <tt>\p{Batak}</tt>, <tt>\p{Batk}</tt> - <tt>\p{Bengali}</tt>, <tt>\p{Beng}</tt> +- <tt>\p{Beria_Erfe}</tt>, <tt>\p{Berf}</tt> - <tt>\p{Bhaiksuki}</tt>, <tt>\p{Bhks}</tt> - <tt>\p{Bopomofo}</tt>, <tt>\p{Bopo}</tt> - <tt>\p{Brahmi}</tt>, <tt>\p{Brah}</tt> @@ -270,6 +271,7 @@ Older versions may not support all of these. - <tt>\p{Sharada}</tt>, <tt>\p{Shrd}</tt> - <tt>\p{Shavian}</tt>, <tt>\p{Shaw}</tt> - <tt>\p{Siddham}</tt>, <tt>\p{Sidd}</tt> +- <tt>\p{Sidetic}</tt>, <tt>\p{Sidt}</tt> - <tt>\p{SignWriting}</tt>, <tt>\p{Sgnw}</tt> - <tt>\p{Sinhala}</tt>, <tt>\p{Sinh}</tt> - <tt>\p{Sogdian}</tt>, <tt>\p{Sogd}</tt> @@ -284,6 +286,7 @@ Older versions may not support all of these. - <tt>\p{Tai_Le}</tt>, <tt>\p{Tale}</tt> - <tt>\p{Tai_Tham}</tt>, <tt>\p{Lana}</tt> - <tt>\p{Tai_Viet}</tt>, <tt>\p{Tavt}</tt> +- <tt>\p{Tai_Yo}</tt>, <tt>\p{Tayo}</tt> - <tt>\p{Takri}</tt>, <tt>\p{Takr}</tt> - <tt>\p{Tamil}</tt>, <tt>\p{Taml}</tt> - <tt>\p{Tangsa}</tt>, <tt>\p{Tnsa}</tt> @@ -295,6 +298,7 @@ Older versions may not support all of these. - <tt>\p{Tifinagh}</tt>, <tt>\p{Tfng}</tt> - <tt>\p{Tirhuta}</tt>, <tt>\p{Tirh}</tt> - <tt>\p{Todhri}</tt>, <tt>\p{Todr}</tt> +- <tt>\p{Tolong_Siki}</tt>, <tt>\p{Tols}</tt> - <tt>\p{Toto}</tt> - <tt>\p{Tulu_Tigalari}</tt>, <tt>\p{Tutg}</tt> - <tt>\p{Ugaritic}</tt>, <tt>\p{Ugar}</tt> @@ -336,6 +340,7 @@ Older versions may not support all of these. - <tt>\p{In_Bassa_Vah}</tt> - <tt>\p{In_Batak}</tt> - <tt>\p{In_Bengali}</tt> +- <tt>\p{In_Beria_Erfe}</tt> - <tt>\p{In_Bhaiksuki}</tt> - <tt>\p{In_Block_Elements}</tt> - <tt>\p{In_Bopomofo}</tt> @@ -363,6 +368,7 @@ Older versions may not support all of these. 
- <tt>\p{In_CJK_Unified_Ideographs_Extension_G}</tt> - <tt>\p{In_CJK_Unified_Ideographs_Extension_H}</tt> - <tt>\p{In_CJK_Unified_Ideographs_Extension_I}</tt> +- <tt>\p{In_CJK_Unified_Ideographs_Extension_J}</tt> - <tt>\p{In_Carian}</tt> - <tt>\p{In_Caucasian_Albanian}</tt> - <tt>\p{In_Chakma}</tt> @@ -516,6 +522,7 @@ Older versions may not support all of these. - <tt>\p{In_Miscellaneous_Mathematical_Symbols_A}</tt> - <tt>\p{In_Miscellaneous_Mathematical_Symbols_B}</tt> - <tt>\p{In_Miscellaneous_Symbols}</tt> +- <tt>\p{In_Miscellaneous_Symbols_Supplement}</tt> - <tt>\p{In_Miscellaneous_Symbols_and_Arrows}</tt> - <tt>\p{In_Miscellaneous_Symbols_and_Pictographs}</tt> - <tt>\p{In_Miscellaneous_Technical}</tt> @@ -575,9 +582,11 @@ Older versions may not support all of these. - <tt>\p{In_Samaritan}</tt> - <tt>\p{In_Saurashtra}</tt> - <tt>\p{In_Sharada}</tt> +- <tt>\p{In_Sharada_Supplement}</tt> - <tt>\p{In_Shavian}</tt> - <tt>\p{In_Shorthand_Format_Controls}</tt> - <tt>\p{In_Siddham}</tt> +- <tt>\p{In_Sidetic}</tt> - <tt>\p{In_Sinhala}</tt> - <tt>\p{In_Sinhala_Archaic_Numbers}</tt> - <tt>\p{In_Small_Form_Variants}</tt> @@ -613,12 +622,14 @@ Older versions may not support all of these. - <tt>\p{In_Tai_Tham}</tt> - <tt>\p{In_Tai_Viet}</tt> - <tt>\p{In_Tai_Xuan_Jing_Symbols}</tt> +- <tt>\p{In_Tai_Yo}</tt> - <tt>\p{In_Takri}</tt> - <tt>\p{In_Tamil}</tt> - <tt>\p{In_Tamil_Supplement}</tt> - <tt>\p{In_Tangsa}</tt> - <tt>\p{In_Tangut}</tt> - <tt>\p{In_Tangut_Components}</tt> +- <tt>\p{In_Tangut_Components_Supplement}</tt> - <tt>\p{In_Tangut_Supplement}</tt> - <tt>\p{In_Telugu}</tt> - <tt>\p{In_Thaana}</tt> @@ -627,6 +638,7 @@ Older versions may not support all of these. - <tt>\p{In_Tifinagh}</tt> - <tt>\p{In_Tirhuta}</tt> - <tt>\p{In_Todhri}</tt> +- <tt>\p{In_Tolong_Siki}</tt> - <tt>\p{In_Toto}</tt> - <tt>\p{In_Transport_and_Map_Symbols}</tt> - <tt>\p{In_Tulu_Tigalari}</tt> @@ -685,6 +697,7 @@ Older versions may not support all of these. 
- <tt>\p{Age_15_0}</tt> - <tt>\p{Age_15_1}</tt> - <tt>\p{Age_16_0}</tt> +- <tt>\p{Age_17_0}</tt> - <tt>\p{Age_1_1}</tt> - <tt>\p{Age_2_0}</tt> - <tt>\p{Age_2_1}</tt> diff --git a/doc/signals.rdoc b/doc/language/signals.rdoc index 403eb66549..a82dab81c6 100644 --- a/doc/signals.rdoc +++ b/doc/language/signals.rdoc @@ -17,7 +17,7 @@ for its internal data structures, but it does not know when it is safe for data structures in YOUR code. Ruby implements deferred signal handling by registering short C functions with only {async-signal-safe functions}[http://man7.org/linux/man-pages/man7/signal-safety.7.html] as -signal handlers. These short C functions only do enough tell the VM to +signal handlers. These short C functions only do enough to tell the VM to run callbacks registered via Signal.trap later in the main Ruby Thread. == Unsafe methods to call in Signal.trap blocks diff --git a/doc/strftime_formatting.rdoc b/doc/language/strftime_formatting.rdoc index 5c7b33155d..2bfa6b975e 100644 --- a/doc/strftime_formatting.rdoc +++ b/doc/language/strftime_formatting.rdoc @@ -136,7 +136,7 @@ the only required part is the conversion specifier, so we begin with that. t = Time.now # => 2022-06-29 07:10:20.3230914 -0500 t.strftime('%N') # => "323091400" # Default. - Use {width specifiers}[rdoc-ref:strftime_formatting.rdoc@Width+Specifiers] + Use {width specifiers}[rdoc-ref:@Width+Specifiers] to adjust units: t.strftime('%3N') # => "323" # Milliseconds. @@ -522,6 +522,4 @@ An ISO 8601 combined date and time representation may be any ISO 8601 date and any ISO 8601 time, separated by the letter +T+. -For the relevant +strftime+ formats, see -{Dates}[rdoc-ref:strftime_formatting.rdoc@Dates] -and {Times}[rdoc-ref:strftime_formatting.rdoc@Times] above. +For the relevant +strftime+ formats, see {Dates}[rdoc-ref:@Dates] and {Times}[rdoc-ref:@Times] above. 
diff --git a/doc/maintainers.md b/doc/maintainers.md index 7d217a1665..e87ccaca05 100644 --- a/doc/maintainers.md +++ b/doc/maintainers.md @@ -28,6 +28,10 @@ not have authority to change/add a feature on his/her part. They need consensus on ruby-core/ruby-dev before changing/adding. Some of submaintainers have commit right, others don't. +No maintainer means that there is no specific maintainer for the part now. +Members of the Ruby core team can fix issues at any time. But major changes need +consensus on ruby-core/ruby-dev. + ### Language core features including security * Yukihiro Matsumoto ([matz]) @@ -40,25 +44,30 @@ have commit right, others don't. * Yukihiro Matsumoto ([matz]) -## Standard Library Maintainers - -### Libraries +### Standard Library Maintainers #### lib/mkmf.rb -* *unmaintained* +* *No maintainer* + +#### pathname_builtin.rb, lib/pathname.rb + +* Tanaka Akira ([akr]) #### lib/rubygems.rb, lib/rubygems/* -* Eric Hodel ([drbrain]) * Hiroshi SHIBATA ([hsbt]) -* https://github.com/rubygems/rubygems +* https://github.com/ruby/rubygems #### lib/unicode_normalize.rb, lib/unicode_normalize/* * Martin J. Dürst ([duerst]) -### Extensions +### Standard Library(Extensions) Maintainers + +#### set.c + +* Akinori MUSHA ([knu]) #### ext/continuation @@ -78,15 +87,19 @@ have commit right, others don't. #### ext/objspace -* *unmaintained* +* *No maintainer* + +#### ext/pathname + +* Tanaka Akira ([akr]) #### ext/pty -* *unmaintained* +* *No maintainer* #### ext/ripper -* *unmaintained* +* *No maintainer* #### ext/socket @@ -97,29 +110,27 @@ have commit right, others don't. 
* NAKAMURA Usaku ([unak]) -## Default gems Maintainers - -### Libraries +### Default gems(Libraries) Maintainers #### lib/bundler.rb, lib/bundler/* * Hiroshi SHIBATA ([hsbt]) -* https://github.com/rubygems/rubygems +* https://github.com/ruby/rubygems * https://rubygems.org/gems/bundler #### lib/cgi/escape.rb -* *unmaintained* +* *No maintainer* #### lib/English.rb -* *unmaintained* +* *No maintainer* * https://github.com/ruby/English * https://rubygems.org/gems/English #### lib/delegate.rb -* *unmaintained* +* *No maintainer* * https://github.com/ruby/delegate * https://rubygems.org/gems/delegate @@ -150,7 +161,7 @@ have commit right, others don't. #### lib/fileutils.rb -* *unmaintained* +* *No maintainer* * https://github.com/ruby/fileutils * https://rubygems.org/gems/fileutils @@ -176,6 +187,7 @@ have commit right, others don't. * Nobuyuki Nakada ([nobu]) * https://github.com/ruby/optparse +* https://rubygems.org/gems/optparse #### lib/net/http.rb, lib/net/https.rb @@ -185,13 +197,13 @@ have commit right, others don't. #### lib/net/protocol.rb -* *unmaintained* +* *No maintainer* * https://github.com/ruby/net-protocol * https://rubygems.org/gems/net-protocol #### lib/open3.rb -* *unmaintained* +* *No maintainer* * https://github.com/ruby/open3 * https://rubygems.org/gems/open3 @@ -199,6 +211,7 @@ have commit right, others don't. * Tanaka Akira ([akr]) * https://github.com/ruby/open-uri +* https://rubygems.org/gems/open-uri #### lib/pp.rb @@ -217,6 +230,7 @@ have commit right, others don't. * Kevin Newton ([kddnewton]) * Eileen Uchitelle ([eileencodes]) * Aaron Patterson ([tenderlove]) +* Earlopain ([earlopain]) * https://github.com/ruby/prism * https://rubygems.org/gems/prism @@ -246,7 +260,7 @@ have commit right, others don't. #### lib/tempfile.rb -* *unmaintained* +* *No maintainer* * https://github.com/ruby/tempfile * https://rubygems.org/gems/tempfile @@ -262,24 +276,12 @@ have commit right, others don't. 
* https://github.com/ruby/timeout * https://rubygems.org/gems/timeout -#### lib/thwait.rb - -* Keiju ISHITSUKA ([keiju]) -* https://github.com/ruby/thwait -* https://rubygems.org/gems/thwait - #### lib/tmpdir.rb -* *unmaintained* +* *No maintainer* * https://github.com/ruby/tmpdir * https://rubygems.org/gems/tmpdir -#### lib/tsort.rb - -* Tanaka Akira ([akr]) -* https://github.com/ruby/tsort -* https://rubygems.org/gems/tsort - #### lib/un.rb * WATANABE Hirofumi ([eban]) @@ -301,11 +303,11 @@ have commit right, others don't. #### lib/weakref.rb -* *unmaintained* +* *No maintainer* * https://github.com/ruby/weakref * https://rubygems.org/gems/weakref -### Extensions +### Default gems(Extensions) Maintainers #### ext/cgi @@ -313,19 +315,19 @@ have commit right, others don't. #### ext/date -* *unmaintained* +* *No maintainer* * https://github.com/ruby/date * https://rubygems.org/gems/date #### ext/etc -* *unmaintained* +* *No maintainer* * https://github.com/ruby/etc * https://rubygems.org/gems/etc #### ext/fcntl -* *unmaintained* +* *No maintainer* * https://github.com/ruby/fcntl * https://rubygems.org/gems/fcntl @@ -361,12 +363,6 @@ have commit right, others don't. * https://github.com/ruby/openssl * https://rubygems.org/gems/openssl -#### ext/pathname - -* Tanaka Akira ([akr]) -* https://github.com/ruby/pathname -* https://rubygems.org/gems/pathname - #### ext/psych * Aaron Patterson ([tenderlove]) @@ -392,159 +388,248 @@ have commit right, others don't. * https://github.com/ruby/zlib * https://rubygems.org/gems/zlib -## Bundled gems upstream repositories +## Bundled gems upstream repositories and maintainers + +The maintenance policy of bundled gems is different from Module Maintainers above. +Please check the policies for each repository. + +The ruby core team tries to maintain the repositories with no maintainers. +It may need to reach consensus on ruby-core/ruby-dev before making major changes. 
### minitest +* Ryan Davis ([zenspider]) * https://github.com/minitest/minitest +* https://rubygems.org/gems/minitest ### power_assert +* Tsujimoto Kenta ([k-tsj]) * https://github.com/ruby/power_assert +* https://rubygems.org/gems/power_assert ### rake +* Hiroshi SHIBATA ([hsbt]) * https://github.com/ruby/rake +* https://rubygems.org/gems/rake ### test-unit +* Kouhei Sutou ([kou]) * https://github.com/test-unit/test-unit +* https://rubygems.org/gems/test-unit ### rexml +* Kouhei Sutou ([kou]) * https://github.com/ruby/rexml +* https://rubygems.org/gems/rexml ### rss +* Kouhei Sutou ([kou]) * https://github.com/ruby/rss - -### net-ftp - -* https://github.com/ruby/net-ftp +* https://rubygems.org/gems/rss ### net-imap +* Nicholas A. Evans ([nevans]) * https://github.com/ruby/net-imap - -### net-pop - -* https://github.com/ruby/net-pop +* https://rubygems.org/gems/net-imap ### net-smtp +* TOMITA Masahiro ([tmtm]) * https://github.com/ruby/net-smtp +* https://rubygems.org/gems/net-smtp ### matrix +* Marc-André Lafortune ([marcandre]) * https://github.com/ruby/matrix +* https://rubygems.org/gems/matrix ### prime * https://github.com/ruby/prime +* https://rubygems.org/gems/prime ### rbs +* Soutaro Matsumoto ([soutaro]) * https://github.com/ruby/rbs +* https://rubygems.org/gems/rbs ### typeprof +* Yusuke Endoh ([mame]) * https://github.com/ruby/typeprof +* https://rubygems.org/gems/typeprof ### debug +* Koichi Sasada ([ko1]) * https://github.com/ruby/debug +* https://rubygems.org/gems/debug ### racc +* Yuichi Kaneko ([yui-knk]) * https://github.com/ruby/racc +* https://rubygems.org/gems/racc #### mutex_m * https://github.com/ruby/mutex_m +* https://rubygems.org/gems/mutex_m #### getoptlong * https://github.com/ruby/getoptlong +* https://rubygems.org/gems/getoptlong #### base64 +* Yusuke Endoh ([mame]) * https://github.com/ruby/base64 +* https://rubygems.org/gems/base64 #### bigdecimal +* Kenta Murata ([mrkn]) * https://github.com/ruby/bigdecimal +* 
https://rubygems.org/gems/bigdecimal #### observer * https://github.com/ruby/observer +* https://rubygems.org/gems/observer #### abbrev +* Akinori MUSHA ([knu]) * https://github.com/ruby/abbrev +* https://rubygems.org/gems/abbrev #### resolv-replace +* Akira TANAKA ([akr]) * https://github.com/ruby/resolv-replace +* https://rubygems.org/gems/resolv-replace #### rinda +* Masatoshi SEKI ([seki]) * https://github.com/ruby/rinda +* https://rubygems.org/gems/rinda #### drb +* Masatoshi SEKI ([seki]) * https://github.com/ruby/drb +* https://rubygems.org/gems/drb #### nkf +* Naruse Yusuke ([nurse]) * https://github.com/ruby/nkf +* https://rubygems.org/gems/nkf #### syslog +* Akinori Musha ([knu]) * https://github.com/ruby/syslog +* https://rubygems.org/gems/syslog #### csv +* Kouhei Sutou ([kou]) * https://github.com/ruby/csv +* https://rubygems.org/gems/csv #### ostruct +* Marc-André Lafortune ([marcandre]) * https://github.com/ruby/ostruct +* https://rubygems.org/gems/ostruct #### pstore * https://github.com/ruby/pstore +* https://rubygems.org/gems/pstore #### benchmark +* Benoit Daloze ([eregon]) * https://github.com/ruby/benchmark +* https://rubygems.org/gems/benchmark #### logger +* Naotoshi Seo ([sonots]) * https://github.com/ruby/logger +* https://rubygems.org/gems/logger #### rdoc +* Stan Lo ([st0012]) +* Nobuyoshi Nakada ([nobu]) * https://github.com/ruby/rdoc +* https://rubygems.org/gems/rdoc #### win32ole +* Masaki Suketa ([suketa]) * https://github.com/ruby/win32ole +* https://rubygems.org/gems/win32ole #### irb +* Tomoya Ishida ([tompng]) +* Stan Lo ([st0012]) +* Mari Imaizumi ([ima1zumi]) +* HASUMI Hitoshi ([hasumikin]) * https://github.com/ruby/irb +* https://rubygems.org/gems/irb #### reline +* Tomoya Ishida ([tompng]) +* Stan Lo ([st0012]) +* Mari Imaizumi ([ima1zumi]) +* HASUMI Hitoshi ([hasumikin]) * https://github.com/ruby/reline +* https://rubygems.org/gems/reline #### readline * https://github.com/ruby/readline +* https://rubygems.org/gems/readline 
#### fiddle +* Kouhei Sutou ([kou]) * https://github.com/ruby/fiddle +* https://rubygems.org/gems/fiddle + +#### repl_type_completor + +* Tomoya Ishida ([tompng]) +* https://github.com/ruby/repl_type_completor +* https://rubygems.org/gems/repl_type_completor + +#### tsort + +* Tanaka Akira ([akr]) +* https://github.com/ruby/tsort +* https://rubygems.org/gems/tsort + +#### win32-registry + +* Nakamura Usaku ([unak]) +* https://github.com/ruby/win32-registry +* https://rubygems.org/gems/win32-registry ## Platform Maintainers @@ -582,7 +667,7 @@ have commit right, others don't. ### cygwin, ... -* none. (Maintainer WANTED) +* **No maintainer** ### WebAssembly/WASI @@ -593,8 +678,10 @@ have commit right, others don't. [colby-swandale]: https://github.com/colby-swandale [drbrain]: https://github.com/drbrain [duerst]: https://github.com/duerst +[earlopain]: https://github.com/earlopain [eban]: https://github.com/eban [eileencodes]: https://github.com/eileencodes +[eregon]: https://github.com/eregon [hasumikin]: https://github.com/hasumikin [hsbt]: https://github.com/hsbt [ima1zumi]: https://github.com/ima1zumi @@ -625,3 +712,11 @@ have commit right, others don't. 
 [tompng]: https://github.com/tompng
 [unak]: https://github.com/unak
 [yuki24]: https://github.com/yuki24
+[zenspider]: https://github.com/zenspider
+[k-tsj]: https://github.com/k-tsj
+[nevans]: https://github.com/nevans
+[tmtm]: https://github.com/tmtm
+[soutaro]: https://github.com/soutaro
+[yui-knk]: https://github.com/yui-knk
+[hasumikin]: https://github.com/hasumikin
+[suketa]: https://github.com/suketa
diff --git a/doc/matchdata/begin.rdoc b/doc/matchdata/begin.rdoc
index 8046dd9d55..6100617e19 100644
--- a/doc/matchdata/begin.rdoc
+++ b/doc/matchdata/begin.rdoc
@@ -10,12 +10,12 @@ returns the offset of the beginning of the <tt>n</tt>th match:
   m[3] # => "113"
   m.begin(3) # => 3
 
-  m = /(т)(е)(с)/.match('тест')
-  # => #<MatchData "тес" 1:"т" 2:"е" 3:"с">
-  m[0] # => "тес"
-  m.begin(0) # => 0
-  m[3] # => "с"
-  m.begin(3) # => 2
+  m = /(ん)(に)(ち)/.match('こんにちは')
+  # => #<MatchData "んにち" 1:"ん" 2:"に" 3:"ち">
+  m[0] # => "んにち"
+  m.begin(0) # => 1
+  m[3] # => "ち"
+  m.begin(3) # => 3
 
 When string or symbol argument +name+ is given,
 returns the offset of the beginning for the named match:
diff --git a/doc/matchdata/bytebegin.rdoc b/doc/matchdata/bytebegin.rdoc
index 5b40a7ef73..54e417a7fc 100644
--- a/doc/matchdata/bytebegin.rdoc
+++ b/doc/matchdata/bytebegin.rdoc
@@ -10,12 +10,12 @@ returns the offset of the beginning of the <tt>n</tt>th match:
   m[3] # => "113"
   m.bytebegin(3) # => 3
 
-  m = /(т)(е)(с)/.match('тест')
-  # => #<MatchData "тес" 1:"т" 2:"е" 3:"с">
-  m[0] # => "тес"
-  m.bytebegin(0) # => 0
-  m[3] # => "с"
-  m.bytebegin(3) # => 4
+  m = /(ん)(に)(ち)/.match('こんにちは')
+  # => #<MatchData "んにち" 1:"ん" 2:"に" 3:"ち">
+  m[0] # => "んにち"
+  m.bytebegin(0) # => 3
+  m[3] # => "ち"
+  m.bytebegin(3) # => 9
 
 When string or symbol argument +name+ is given,
 returns the offset of the beginning for the named match:
diff --git a/doc/matchdata/byteend.rdoc b/doc/matchdata/byteend.rdoc
index eb57664022..0a03f76208 100644
--- a/doc/matchdata/byteend.rdoc
+++ b/doc/matchdata/byteend.rdoc
@@ -10,12 +10,12 @@ returns the offset of the end of the <tt>n</tt>th match:
   m[3] # => "113"
   m.byteend(3) # => 6
 
-  m = /(т)(е)(с)/.match('тест')
-  # => #<MatchData "тес" 1:"т" 2:"е" 3:"с">
-  m[0] # => "тес"
-  m.byteend(0) # => 6
-  m[3] # => "с"
-  m.byteend(3) # => 6
+  m = /(ん)(に)(ち)/.match('こんにちは')
+  # => #<MatchData "んにち" 1:"ん" 2:"に" 3:"ち">
+  m[0] # => "んにち"
+  m.byteend(0) # => 12
+  m[3] # => "ち"
+  m.byteend(3) # => 12
 
 When string or symbol argument +name+ is given,
 returns the offset of the end for the named match:
diff --git a/doc/matchdata/end.rdoc b/doc/matchdata/end.rdoc
index 0209b2d2fc..c43a5428f3 100644
--- a/doc/matchdata/end.rdoc
+++ b/doc/matchdata/end.rdoc
@@ -10,12 +10,12 @@ returns the offset of the end of the <tt>n</tt>th match:
   m[3] # => "113"
   m.end(3) # => 6
 
-  m = /(т)(е)(с)/.match('тест')
-  # => #<MatchData "тес" 1:"т" 2:"е" 3:"с">
-  m[0] # => "тес"
-  m.end(0) # => 3
-  m[3] # => "с"
-  m.end(3) # => 3
+  m = /(ん)(に)(ち)/.match('こんにちは')
+  # => #<MatchData "んにち" 1:"ん" 2:"に" 3:"ち">
+  m[0] # => "んにち"
+  m.end(0) # => 4
+  m[3] # => "ち"
+  m.end(3) # => 4
 
 When string or symbol argument +name+ is given,
 returns the offset of the end for the named match:
diff --git a/doc/matchdata/offset.rdoc b/doc/matchdata/offset.rdoc
index 0985316d76..4194ef7ef9 100644
--- a/doc/matchdata/offset.rdoc
+++ b/doc/matchdata/offset.rdoc
@@ -11,12 +11,12 @@ returns the starting and ending offsets of the <tt>n</tt>th match:
   m[3] # => "113"
   m.offset(3) # => [3, 6]
 
-  m = /(т)(е)(с)/.match('тест')
-  # => #<MatchData "тес" 1:"т" 2:"е" 3:"с">
-  m[0] # => "тес"
-  m.offset(0) # => [0, 3]
-  m[3] # => "с"
-  m.offset(3) # => [2, 3]
+  m = /(ん)(に)(ち)/.match('こんにちは')
+  # => #<MatchData "んにち" 1:"ん" 2:"に" 3:"ち">
+  m[0] # => "んにち"
+  m.offset(0) # => [1, 4]
+  m[3] # => "ち"
+  m.offset(3) # => [3, 4]
 
 When string or symbol argument +name+ is given,
 returns the starting and ending offsets for the named match:
diff --git a/doc/namespace.md b/doc/namespace.md
deleted file mode 100644
index eee6b94072..0000000000
--- a/doc/namespace.md +++ /dev/null @@ -1,418 +0,0 @@ -# Namespace - Ruby's in-process separation of Classes and Modules - -Namespace is designed to provide separated spaces in a Ruby process, to isolate applications and libraries. - -## Known issues - -* Experimental warning is shown when ruby starts with `RUBY_NAMESPACE=1` (specify `-W:no-experimental` option to hide it) -* `bundle install` may fail -* `require 'active_support'` may fail -* A wrong current namespace detection happens sometimes in the root namespace - -## TODOs - -* Identify the CI failure cause and restore temporarily skipped tests (mmtk, test/ruby/test_allocation on i686) -* Reconstruct current/loading namespace management based on control frames -* Add the loaded namespace on iseq to check if another namespace tries running the iseq (add a field only when VM_CHECK_MODE?) -* Delete per-namespace extension files (.so) lazily or process exit -* Collect rb_classext_t entries for a namespace when the namespace is collected -* Allocate an rb_namespace_t entry as the root namespace at first, then construct the contents and wrap it as rb_cNamespace instance later (to eliminate root/builtin two namespaces situation) -* Assign its own TOPLEVEL_BINDING in namespaces -* Fix `warn` in namespaces to refer `$VERBOSE` in the namespace -* Make an internal data container `Namespace::Entry` invisible -* More test cases about `$LOAD_PATH` and `$LOADED_FEATURES` -* Return classpath and nesting without the namespace prefix in the namespace itself [#21316](https://bugs.ruby-lang.org/issues/21316), [#21318](https://bugs.ruby-lang.org/issues/21318) - -## How to use - -### Enabling namespace - -First, an environment variable should be set at the ruby process bootup: `RUBY_NAMESPACE=1`. -The only valid value is `1` to enable namespace. Other values (or unset `RUBY_NAMESPACE`) means disabling namespace. And setting the value after Ruby program starts doesn't work. 
- -### Using namespace - -`Namespace` class is the entrypoint of namespaces. - -```ruby -ns = Namespace.new -ns.require('something') # or require_relative, load -``` - -The required file (either .rb or .so/.dll/.bundle) is loaded in the namespace (`ns` here). The required/loaded files from `something` will be loaded in the namespace recursively. - -```ruby -# something.rb - -X = 1 - -class Something - def self.x = X - def x = ::X -end -``` - -Classes/modules, those methods and constants defined in the namespace can be accessed via `ns` object. - -```ruby -p ns::Something.x # 1 - -X = 2 -p X # 2 -p ::X # 2 -p ns::Something.x # 1 -p ns::X # 1 -``` - -Instance methods defined in the namespace also run with definitions in the namespace. - -```ruby -s = ns::Something.new - -p s.x # 1 -``` - -## Specifications - -### Namespace types - -There are two namespace types: - -* Root namespace -* User namespace - -There is the root namespace, just a single namespace in a Ruby process. Ruby bootstrap runs in the root namespace, and all builtin classes/modules are defined in the root namespace. (See "Builtin classes and modules".) - -User namespaces are to run user-written programs and libraries loaded from user programs. The user's main program (specified by the `ruby` command line argument) is executed in the "main" namespace, which is a user namespace automatically created at the end of Ruby's bootstrap, copied from the root namespace. - -When `Namespace.new` is called, an "optional" namespace (a user, non-main namespace) is created, copied from the root namespace. All user namespaces are flat, copied from the root namespace. - -### Namespace class and instances - -`Namespace` is a top level class, as a subclass of `Module`, and `Namespace` instances are a kind of `Module`. - -### Classes and modules defined in namespace - -The classes and modules, newly defined in a namespace `ns`, are defined under `ns`. 
For example, if a class `A` is defined in `ns`, it is actually defined as `ns::A`. - -In the namespace `ns`, `ns::A` can be referred to as `A` (and `::A`). From outside of `ns`, it can be referred to as `ns::A`. - -The main namespace is exceptional. Top level classes and modules defined in the main namespace are just top level classes and modules. - -### Classes and modules reopened in namespace - -In namespaces, builtin classes/modules are visible and can be reopened. Those classes/modules can be reopened using `class` or `module` clauses, and class/module definitions can be changed. - -The changed definitions are visible only in the namespace. In other namespaces, builtin classes/modules and those instances work without changed definitions. - -```ruby -# in foo.rb -class String - BLANK_PATTERN = /\A\s*\z/ - def blank? - self =~ BLANK_PATTERN - end -end - -module Foo - def self.foo = "foo" - - def self.foo_is_blank? - foo.blank? - end -end - -Foo.foo.blank? #=> false -"foo".blank? #=> false - -# in main.rb -ns = Namespace.new -ns.require('foo') - -Foo.foo_is_blank? #=> false (#blank? called in ns) - -Foo.foo.blank? # NoMethodError -"foo".blank? # NoMethodError -String::BLANK_PATTERN # NameError -``` - -The main namespace and `ns` are different namespaces, so monkey patches in main are also invisible in `ns`. - -### Builtin classes and modules - -In the namespace context, "builtin" classes and modules are classes and modules: - -* Accessible without any `require` calls in user scripts -* Defined before any user program start running -* Including classes/modules loaded by `prelude.rb` (including RubyGems `Gem`, for example) - -Hereafter, "builtin classes and modules" will be referred to as just "builtin classes". - -### Builtin classes referred via namespace objects - -Builtin classes in a namespace `ns` can be referred from other namespace. 
For example, `ns::String` is a valid reference, and `String` and `ns::String` are identical (`String == ns::String`, `String.object_id == ns::String.object_id`). - -`ns::String`-like reference returns just a `String` in the current namespace, so its definition is `String` in the namespace, not in `ns`. - -```ruby -# foo.rb -class String - def self.foo = "foo" -end - -# main.rb -ns = Namespace.new -ns.require('foo') - -ns::String.foo # NoMethodError -``` - -### Class instance variables, class variables, constants - -Builtin classes can have different sets of class instance variables, class variables and constants between namespaces. - -```ruby -# foo.rb -class Array - @v = "foo" - @@v = "_foo_" - V = "FOO" -end - -Array.instance_variable_get(:@v) #=> "foo" -Array.class_variable_get(:@@v) #=> "_foo_" -Array.const_get(:V) #=> "FOO" - -# main.rb -ns = Namespace.new -ns.require('foo') - -Array.instance_variable_get(:@v) #=> nil -Array.class_variable_get(:@@v) # NameError -Array.const_get(:V) # NameError -``` - -### Global variables - -In namespaces, changes on global variables are also isolated in the namespace. Changes on global variables in a namespace are visible/applied only in the namespace. - -```ruby -# foo.rb -$foo = "foo" -$VERBOSE = nil - -puts "This appears: '#{$foo}'" - -# main.rb -p $foo #=> nil -p $VERBOSE #=> false - -ns = Namespace.new -ns.require('foo') # "This appears: 'foo'" - -p $foo #=> nil -p $VERBOSE #=> false -``` - -### Top level constants - -Usually, top level constants are defined as constants of `Object`. In namespaces, top level constants are constants of `Object` in the namespace. And the namespace object `ns`'s constants are strictly equal to constants of `Object`. 
- -```ruby -# foo.rb -FOO = 100 - -FOO #=> 100 -Object::FOO #=> 100 - -# main.rb -ns = Namespace.new -ns.require('foo') - -ns::FOO #=> 100 - -FOO # NameError -Object::FOO # NameError -``` - -### Top level methods - -Top level methods are private instance methods of `Object`, in each namespace. - -```ruby -# foo.rb -def yay = "foo" - -class Foo - def self.say = yay -end - -Foo.say #=> "foo" -yay #=> "foo" - -# main.rb -ns = Namespace.new -ns.require('foo') - -ns.Foo.say #=> "foo" - -yay # NoMethodError -``` - -There is no way to expose top level methods in namespaces to another namespace. -(See "Expose top level methods as a method of the namespace object" in "Discussions" section below) - -### Namespace scopes - -Namespace works in file scope. One `.rb` file runs in a single namespace. - -Once a file is loaded in a namespace `ns`, all methods/procs defined/created in the file run in `ns`. - -## Implementation details - -#### Object Shapes - -Once builtin classes are copied and modified in namespaces, its instance variable management fallbacks from Object Shapes to a traditional iv table (st_table) because RClass stores the shape in its `flags`, not in `rb_classext_t`. - -#### Size of RClass and rb_classext_t - -Namespace requires to move some fields from RClass to `rb_classext_t`, then the size of RClass and `rb_classext_t` is now larger than `4 * RVALUE_SIZE`. It's against the expectation of [Variable Width Allocation](https://rubykaigi.org/2021-takeout/presentations/peterzhu2118.html). - -Now the `STATIC_ASSERT` to check the size is commented-out. (See "Minimize the size of RClass and rb_classext_t" in "Discussion" section below) - -#### ISeq inline method/constant cache - -As described above in "Namespace scopes", an ".rb" file runs in a namespace. So method/constant resolution will be done in a namespace consistently. - -That means ISeq inline caches work well even with namespaces. Otherwise, it's a bug. 
- -#### Method call global cache (gccct) - -`rb_funcall()` C function refers to the global cc cache table (gccct), and the cache key is calculated with the current namespace. - -So, `rb_funcall()` calls have a performance penalty when namespace is enabled. - -#### Current namespace and loading namespace - -The current namespace is the namespace that the executing code is in. `Namespace.current` returns the current namespace object. - -The loading namespace is an internally managed namespace to determine the namespace to load newly required/loaded files. For example, `ns` is the loading namespace when `ns.require("foo")` is called. - -## Discussions - -#### Namespace#inspect - -Currently, `Namespace#inspect` returns values like `"#<Namespace:0x00000001083a5660>"`. This results in the very redundant and poorly visible classpath outside the namespace. - -```ruby -# foo.rb -class C; end - -# main.rb -ns = Namespace.new -ns.require('foo') - -p ns::C # "#<Namespace:0x00000001083a5660>::C" -``` - -And currently, if a namespace is assigned to a constant `NS1`, the classpath output will be `NS1::C`. But the namespace object can be brought to another namespace and the constant `NS1` in the namespace is something different. So the constant-based classpath for namespace is not safe basically. - -So we should find a better format to show namespaces. Options are: - -* `NS1::C` (only when this namespace is created and assigned to NS1 in the current namespace) -* `#<Namespace:user:1083a5660>::C` (with namespace type and without preceding 0) -* or something else - -#### Namespace#eval - -Testing namespace features needs to create files to be loaded in namespaces. It's not easy nor casual. - -If `Namespace` class has an instance method `#eval` to evaluate code in the namespace, it can be helpful. - -#### More builtin methods written in Ruby - -If namespace is enabled by default, builtin methods can be written in Ruby because it can't be overridden by users' monkey patches. 
Builtin Ruby methods can be JIT-ed, and it could bring performance reward. - -#### Monkey patching methods called by builtin methods - -Builtin methods sometimes call other builtin methods. For example, `Hash#map` calls `Hash#each` to retrieve entries to be mapped. Without namespace, Ruby users can overwrite `Hash#each` and expect the behavior change of `Hash#map` as a result. - -But with namespaces, `Hash#map` runs in the root namespace. Ruby users can define `Hash#each` only in user namespaces, so users cannot change `Hash#map`'s behavior in this case. To achieve it, users should override both`Hash#map` and `Hash#each` (or only `Hash#map`). - -It is a breaking change. - -It's an option to change the behavior of methods in the root namespace to refer to definitions in user namespaces. But if we do so, that means we can't proceed with "More builtin methods written in Ruby". - -#### Context of \$LOAD\_PATH and \$LOADED\_FEATURES - -Global variables `$LOAD_PATH` and `$LOADED_FEATURES` control `require` method behaviors. So those namespaces are determined by the loading namespace instead of the current namespace. - -This could potentially conflict with the user's expectations. We should find the solution. - -#### Expose top level methods as a method of the namespace object - -Currently, top level methods in namespaces are not accessible from outside of the namespace. But there might be a use case to call other namespace's top level methods. - -#### Split root and builtin namespace - -NOTE: "builtin" namespace is a different one from the "builtin" namespace in the current implementation - -Currently, the single "root" namespace is the source of classext CoW. And also, the "root" namespace can load additional files after starting main script evaluation by calling methods which contain lines like `require "openssl"`. - -That means, user namespaces can have different sets of definitions according to when it is created. 
- -``` -[root] - | - |----[main] - | - |(require "openssl" called in root) - | - |----[ns1] having OpenSSL - | - |(remove_const called for OpenSSL in root) - | - |----[ns2] without OpenSSL -``` - -This could cause unexpected behavior differences between user namespaces. It should NOT be a problem because user scripts which refer to `OpenSSL` should call `require "openssl"` by themselves. -But in the worst case, a script (without `require "openssl"`) runs well in `ns1`, but doesn't run in `ns2`. This situation looks like a "random failure" to users. - -An option possible to prevent this situation is to have "root" and "builtin" namespaces. - -* root - * The namespace for the Ruby process bootstrap, then the source of CoW - * After starting the main namespace, no code runs in this namespace -* builtin - * The namespace copied from the root namespace at the same time with "main" - * Methods and procs defined in the "root" namespace run in this namespace - * Classes and modules required will be loaded in this namespace - -This design realizes a consistent source of namespace CoW. - -#### Separate cc_tbl and callable_m_tbl, cvc_tbl for less classext CoW - -The fields of `rb_classext_t` contains several cache(-like) data, `cc_tbl`(callcache table), `callable_m_tbl`(table of resolved complemented methods) and `cvc_tbl`(class variable cache table). - -The classext CoW is triggered when the contents of `rb_classext_t` are changed, including `cc_tbl`, `callable_m_tbl`, and `cvc_tbl`. But those three tables are changed by just calling methods or referring class variables. So, currently, classext CoW is triggered much more times than the original expectation. - -If we can move those three tables outside of `rb_classext_t`, the number of copied `rb_classext_t` will be much less than the current implementation. 
- -#### Object Shapes per namespace - -Now the classext CoW requires RClass and `rb_classext_t` to fallback its instance variable management from Object Shapes to the traditional `st_table`. It may have a performance penalty. - -If we can apply Object Shapes on `rb_classext_t` instead of `RClass`, per-namespace classext can have its own shapes, and it may be able to avoid the performance penalty. - -#### Minimize the size of RClass and rb_classext_t - -As described in "Size of RClass and rb_classext_t" section above, the size of RClass and `rb_classext_t` is currently larger than `4 * RVALUE_SIZE` (`20 * VALUE_SIZE`). Now the size is `23 * VALUE_SIZE + 7 bits`. - -The fields possibly removed from `rb_classext_t` are: - -* `cc_tbl`, `callable_m_tbl`, `cvc_tbl` (See the section "Separate cc_tbl and callable_m_tbl, cvc_tbl for less classext CoW" above) -* `ns_super_subclasses`, `module_super_subclasses` - * `RCLASSEXT_SUBCLASSES(RCLASS_EXT_PRIME(RCLASSEXT_SUPER(klass)))->ns_subclasses` can replace it - * These fields are used only in GC, how's the actual performance benefit? - -If we can move or remove those fields, the size satisfies the assertion (`<= 4 * RVALUE_SIZE`). diff --git a/doc/packed_data.rdoc b/doc/packed_data.rdoc deleted file mode 100644 index b33eed58e7..0000000000 --- a/doc/packed_data.rdoc +++ /dev/null @@ -1,706 +0,0 @@ -= Packed \Data - -== Quick Reference - -These tables summarize the directives for packing and unpacking. 
- -=== For Integers - - Directive | Meaning - --------------|--------------------------------------------------------------- - C | 8-bit unsigned (unsigned char) - S | 16-bit unsigned, native endian (uint16_t) - L | 32-bit unsigned, native endian (uint32_t) - Q | 64-bit unsigned, native endian (uint64_t) - J | pointer width unsigned, native endian (uintptr_t) - - c | 8-bit signed (signed char) - s | 16-bit signed, native endian (int16_t) - l | 32-bit signed, native endian (int32_t) - q | 64-bit signed, native endian (int64_t) - j | pointer width signed, native endian (intptr_t) - - S_ S! | unsigned short, native endian - I I_ I! | unsigned int, native endian - L_ L! | unsigned long, native endian - Q_ Q! | unsigned long long, native endian - | (raises ArgumentError if the platform has no long long type) - J! | uintptr_t, native endian (same with J) - - s_ s! | signed short, native endian - i i_ i! | signed int, native endian - l_ l! | signed long, native endian - q_ q! | signed long long, native endian - | (raises ArgumentError if the platform has no long long type) - j! 
| intptr_t, native endian (same with j) - - S> s> S!> s!> | each the same as the directive without >, but big endian - L> l> L!> l!> | S> is the same as n - I!> i!> | L> is the same as N - Q> q> Q!> q!> | - J> j> J!> j!> | - - S< s< S!< s!< | each the same as the directive without <, but little endian - L< l< L!< l!< | S< is the same as v - I!< i!< | L< is the same as V - Q< q< Q!< q!< | - J< j< J!< j!< | - - n | 16-bit unsigned, network (big-endian) byte order - N | 32-bit unsigned, network (big-endian) byte order - v | 16-bit unsigned, VAX (little-endian) byte order - V | 32-bit unsigned, VAX (little-endian) byte order - - U | UTF-8 character - w | BER-compressed integer - -=== For Floats - - Directive | Meaning - ----------|-------------------------------------------------- - D d | double-precision, native format - F f | single-precision, native format - E | double-precision, little-endian byte order - e | single-precision, little-endian byte order - G | double-precision, network (big-endian) byte order - g | single-precision, network (big-endian) byte order - -=== For Strings - - Directive | Meaning - ----------|----------------------------------------------------------------- - A | arbitrary binary string (remove trailing nulls and ASCII spaces) - a | arbitrary binary string - Z | null-terminated string - B | bit string (MSB first) - b | bit string (LSB first) - H | hex string (high nibble first) - h | hex string (low nibble first) - u | UU-encoded string - M | quoted-printable, MIME encoding (see RFC2045) - m | base64 encoded string (RFC 2045) (default) - | (base64 encoded string (RFC 4648) if followed by 0) - P | pointer to a structure (fixed-length string) - p | pointer to a null-terminated string - -=== Additional Directives for Packing - - Directive | Meaning - ----------|---------------------------------------------------------------- - @ | moves to absolute position - X | back up a byte - x | null byte - -=== Additional Directives for Unpacking - - 
Directive | Meaning - ----------|---------------------------------------------------------------- - @ | skip to the offset given by the length argument - X | skip backward one byte - x | skip forward one byte - -== Packing and Unpacking - -Certain Ruby core methods deal with packing and unpacking data: - -- Method Array#pack: - Formats each element in array +self+ into a binary string; - returns that string. -- Method String#unpack: - Extracts data from string +self+, - forming objects that become the elements of a new array; - returns that array. -- Method String#unpack1: - Does the same, but unpacks and returns only the first extracted object. - -Each of these methods accepts a string +template+, -consisting of zero or more _directive_ characters, -each followed by zero or more _modifier_ characters. - -Examples (directive <tt>'C'</tt> specifies 'unsigned character'): - - [65].pack('C') # => "A" # One element, one directive. - [65, 66].pack('CC') # => "AB" # Two elements, two directives. - [65, 66].pack('C') # => "A" # Extra element is ignored. - [65].pack('') # => "" # No directives. - [65].pack('CC') # Extra directive raises ArgumentError. - - 'A'.unpack('C') # => [65] # One character, one directive. - 'AB'.unpack('CC') # => [65, 66] # Two characters, two directives. - 'AB'.unpack('C') # => [65] # Extra character is ignored. - 'A'.unpack('CC') # => [65, nil] # Extra directive generates nil. - 'AB'.unpack('') # => [] # No directives. 
- -The string +template+ may contain any mixture of valid directives -(directive <tt>'c'</tt> specifies 'signed character'): - - [65, -1].pack('cC') # => "A\xFF" - "A\xFF".unpack('cC') # => [65, 255] - -The string +template+ may contain whitespace (which is ignored) -and comments, each of which begins with character <tt>'#'</tt> -and continues up to and including the next following newline: - - [0,1].pack(" C #foo \n C ") # => "\x00\x01" - "\0\1".unpack(" C #foo \n C ") # => [0, 1] - -Any directive may be followed by either of these modifiers: - -- <tt>'*'</tt> - The directive is to be applied as many times as needed: - - [65, 66].pack('C*') # => "AB" - 'AB'.unpack('C*') # => [65, 66] - -- Integer +count+ - The directive is to be applied +count+ times: - - [65, 66].pack('C2') # => "AB" - [65, 66].pack('C3') # Raises ArgumentError. - 'AB'.unpack('C2') # => [65, 66] - 'AB'.unpack('C3') # => [65, 66, nil] - - Note: Directives in <tt>%w[A a Z m]</tt> use +count+ differently; - see {String Directives}[rdoc-ref:packed_data.rdoc@String+Directives]. - -If elements don't fit the provided directive, only least significant bits are encoded: - - [257].pack("C").unpack("C") # => [1] - -== Packing Method - -Method Array#pack accepts optional keyword argument -+buffer+ that specifies the target string (instead of a new string): - - [65, 66].pack('C*', buffer: 'foo') # => "fooAB" - -The method can accept a block: - - # Packed string is passed to the block. - [65, 66].pack('C*') {|s| p s } # => "AB" - -== Unpacking Methods - -Methods String#unpack and String#unpack1 each accept -an optional keyword argument +offset+ that specifies an offset -into the string: - - 'ABC'.unpack('C*', offset: 1) # => [66, 67] - 'ABC'.unpack1('C*', offset: 1) # => 66 - -Both methods can accept a block: - - # Each unpacked object is passed to the block. - ret = [] - "ABCD".unpack("C*") {|c| ret << c } - ret # => [65, 66, 67, 68] - - # The single unpacked object is passed to the block. 
- 'AB'.unpack1('C*') {|ele| p ele } # => 65 - -== \Integer Directives - -Each integer directive specifies the packing or unpacking -for one element in the input or output array. - -=== 8-Bit \Integer Directives - -- <tt>'c'</tt> - 8-bit signed integer - (like C <tt>signed char</tt>): - - [0, 1, 255].pack('c*') # => "\x00\x01\xFF" - s = [0, 1, -1].pack('c*') # => "\x00\x01\xFF" - s.unpack('c*') # => [0, 1, -1] - -- <tt>'C'</tt> - 8-bit unsigned integer - (like C <tt>unsigned char</tt>): - - [0, 1, 255].pack('C*') # => "\x00\x01\xFF" - s = [0, 1, -1].pack('C*') # => "\x00\x01\xFF" - s.unpack('C*') # => [0, 1, 255] - -=== 16-Bit \Integer Directives - -- <tt>'s'</tt> - 16-bit signed integer, native-endian - (like C <tt>int16_t</tt>): - - [513, -514].pack('s*') # => "\x01\x02\xFE\xFD" - s = [513, 65022].pack('s*') # => "\x01\x02\xFE\xFD" - s.unpack('s*') # => [513, -514] - -- <tt>'S'</tt> - 16-bit unsigned integer, native-endian - (like C <tt>uint16_t</tt>): - - [513, -514].pack('S*') # => "\x01\x02\xFE\xFD" - s = [513, 65022].pack('S*') # => "\x01\x02\xFE\xFD" - s.unpack('S*') # => [513, 65022] - -- <tt>'n'</tt> - 16-bit network integer, big-endian: - - s = [0, 1, -1, 32767, -32768, 65535].pack('n*') - # => "\x00\x00\x00\x01\xFF\xFF\x7F\xFF\x80\x00\xFF\xFF" - s.unpack('n*') - # => [0, 1, 65535, 32767, 32768, 65535] - -- <tt>'v'</tt> - 16-bit VAX integer, little-endian: - - s = [0, 1, -1, 32767, -32768, 65535].pack('v*') - # => "\x00\x00\x01\x00\xFF\xFF\xFF\x7F\x00\x80\xFF\xFF" - s.unpack('v*') - # => [0, 1, 65535, 32767, 32768, 65535] - -=== 32-Bit \Integer Directives - -- <tt>'l'</tt> - 32-bit signed integer, native-endian - (like C <tt>int32_t</tt>): - - s = [67305985, -50462977].pack('l*') - # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" - s.unpack('l*') - # => [67305985, -50462977] - -- <tt>'L'</tt> - 32-bit unsigned integer, native-endian - (like C <tt>uint32_t</tt>): - - s = [67305985, 4244504319].pack('L*') - # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" - s.unpack('L*') - # 
=> [67305985, 4244504319] - -- <tt>'N'</tt> - 32-bit network integer, big-endian: - - s = [0,1,-1].pack('N*') - # => "\x00\x00\x00\x00\x00\x00\x00\x01\xFF\xFF\xFF\xFF" - s.unpack('N*') - # => [0, 1, 4294967295] - -- <tt>'V'</tt> - 32-bit VAX integer, little-endian: - - s = [0,1,-1].pack('V*') - # => "\x00\x00\x00\x00\x01\x00\x00\x00\xFF\xFF\xFF\xFF" - s.unpack('v*') - # => [0, 0, 1, 0, 65535, 65535] - -=== 64-Bit \Integer Directives - -- <tt>'q'</tt> - 64-bit signed integer, native-endian - (like C <tt>int64_t</tt>): - - s = [578437695752307201, -506097522914230529].pack('q*') - # => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8" - s.unpack('q*') - # => [578437695752307201, -506097522914230529] - -- <tt>'Q'</tt> - 64-bit unsigned integer, native-endian - (like C <tt>uint64_t</tt>): - - s = [578437695752307201, 17940646550795321087].pack('Q*') - # => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8" - s.unpack('Q*') - # => [578437695752307201, 17940646550795321087] - -=== Platform-Dependent \Integer Directives - -- <tt>'i'</tt> - Platform-dependent width signed integer, - native-endian (like C <tt>int</tt>): - - s = [67305985, -50462977].pack('i*') - # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" - s.unpack('i*') - # => [67305985, -50462977] - -- <tt>'I'</tt> - Platform-dependent width unsigned integer, - native-endian (like C <tt>unsigned int</tt>): - - s = [67305985, -50462977].pack('I*') - # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC" - s.unpack('I*') - # => [67305985, 4244504319] - -- <tt>'j'</tt> - Pointer-width signed integer, native-endian - (like C <tt>intptr_t</tt>): - - s = [67305985, -50462977].pack('j*') - # => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\xFF\xFF\xFF\xFF" - s.unpack('j*') - # => [67305985, -50462977] - -- <tt>'J'</tt> - Pointer-width unsigned integer, native-endian - (like C <tt>uintptr_t</tt>): - - s = [67305985, 4244504319].pack('J*') - # => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\x00\x00\x00\x00" - 
s.unpack('J*') - # => [67305985, 4244504319] - -=== Other \Integer Directives - -- <tt>'U'</tt> - UTF-8 character: - - s = [4194304].pack('U*') - # => "\xF8\x90\x80\x80\x80" - s.unpack('U*') - # => [4194304] - -- <tt>'w'</tt> - BER-encoded integer - (see {BER encoding}[https://en.wikipedia.org/wiki/X.690#BER_encoding]): - - s = [1073741823].pack('w*') - # => "\x83\xFF\xFF\xFF\x7F" - s.unpack('w*') - # => [1073741823] - -=== Modifiers for \Integer Directives - -For the following directives, <tt>'!'</tt> or <tt>'_'</tt> modifiers may be -suffixed as underlying platform’s native size. - -- <tt>'i'</tt>, <tt>'I'</tt> - C <tt>int</tt>, always native size. -- <tt>'s'</tt>, <tt>'S'</tt> - C <tt>short</tt>. -- <tt>'l'</tt>, <tt>'L'</tt> - C <tt>long</tt>. -- <tt>'q'</tt>, <tt>'Q'</tt> - C <tt>long long</tt>, if available. -- <tt>'j'</tt>, <tt>'J'</tt> - C <tt>intptr_t</tt>, always native size. - -Native size modifiers are silently ignored for always native size directives. - -The endian modifiers also may be suffixed in the directives above: - -- <tt>'>'</tt> - Big-endian. -- <tt>'<'</tt> - Little-endian. - -== \Float Directives - -Each float directive specifies the packing or unpacking -for one element in the input or output array. 
- -=== Single-Precision \Float Directives - -- <tt>'F'</tt> or <tt>'f'</tt> - Native format: - - s = [3.0].pack('F') # => "\x00\x00@@" - s.unpack('F') # => [3.0] - -- <tt>'e'</tt> - Little-endian: - - s = [3.0].pack('e') # => "\x00\x00@@" - s.unpack('e') # => [3.0] - -- <tt>'g'</tt> - Big-endian: - - s = [3.0].pack('g') # => "@@\x00\x00" - s.unpack('g') # => [3.0] - -=== Double-Precision \Float Directives - -- <tt>'D'</tt> or <tt>'d'</tt> - Native format: - - s = [3.0].pack('D') # => "\x00\x00\x00\x00\x00\x00\b@" - s.unpack('D') # => [3.0] - -- <tt>'E'</tt> - Little-endian: - - s = [3.0].pack('E') # => "\x00\x00\x00\x00\x00\x00\b@" - s.unpack('E') # => [3.0] - -- <tt>'G'</tt> - Big-endian: - - s = [3.0].pack('G') # => "@\b\x00\x00\x00\x00\x00\x00" - s.unpack('G') # => [3.0] - -A float directive may be infinity or not-a-number: - - inf = 1.0/0.0 # => Infinity - [inf].pack('f') # => "\x00\x00\x80\x7F" - "\x00\x00\x80\x7F".unpack('f') # => [Infinity] - - nan = inf/inf # => NaN - [nan].pack('f') # => "\x00\x00\xC0\x7F" - "\x00\x00\xC0\x7F".unpack('f') # => [NaN] - -== \String Directives - -Each string directive specifies the packing or unpacking -for one byte in the input or output string. - -=== Binary \String Directives - -- <tt>'A'</tt> - Arbitrary binary string (space padded; count is width); - +nil+ is treated as the empty string: - - ['foo'].pack('A') # => "f" - ['foo'].pack('A*') # => "foo" - ['foo'].pack('A2') # => "fo" - ['foo'].pack('A4') # => "foo " - [nil].pack('A') # => " " - [nil].pack('A*') # => "" - [nil].pack('A2') # => " " - [nil].pack('A4') # => " " - - "foo\0".unpack('A') # => ["f"] - "foo\0".unpack('A4') # => ["foo"] - "foo\0bar".unpack('A10') # => ["foo\x00bar"] # Reads past "\0". 
- "foo ".unpack('A') # => ["f"] - "foo ".unpack('A4') # => ["foo"] - "foo".unpack('A4') # => ["foo"] - - russian = "\u{442 435 441 442}" # => "тест" - russian.size # => 4 - russian.bytesize # => 8 - [russian].pack('A') # => "\xD1" - [russian].pack('A*') # => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82" - russian.unpack('A') # => ["\xD1"] - russian.unpack('A2') # => ["\xD1\x82"] - russian.unpack('A4') # => ["\xD1\x82\xD0\xB5"] - russian.unpack('A*') # => ["\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"] - -- <tt>'a'</tt> - Arbitrary binary string (null padded; count is width): - - ["foo"].pack('a') # => "f" - ["foo"].pack('a*') # => "foo" - ["foo"].pack('a2') # => "fo" - ["foo\0"].pack('a4') # => "foo\x00" - [nil].pack('a') # => "\x00" - [nil].pack('a*') # => "" - [nil].pack('a2') # => "\x00\x00" - [nil].pack('a4') # => "\x00\x00\x00\x00" - - "foo\0".unpack('a') # => ["f"] - "foo\0".unpack('a4') # => ["foo\x00"] - "foo ".unpack('a4') # => ["foo "] - "foo".unpack('a4') # => ["foo"] - "foo\0bar".unpack('a4') # => ["foo\x00"] # Reads past "\0". - -- <tt>'Z'</tt> - Same as <tt>'a'</tt>, - except that null is added or ignored with <tt>'*'</tt>: - - ["foo"].pack('Z*') # => "foo\x00" - [nil].pack('Z*') # => "\x00" - - "foo\0".unpack('Z*') # => ["foo"] - "foo".unpack('Z*') # => ["foo"] - "foo\0bar".unpack('Z*') # => ["foo"] # Does not read past "\0". 
- -=== Bit \String Directives - -- <tt>'B'</tt> - Bit string (high byte first): - - ['11111111' + '00000000'].pack('B*') # => "\xFF\x00" - ['10000000' + '01000000'].pack('B*') # => "\x80@" - - ['1'].pack('B0') # => "" - ['1'].pack('B1') # => "\x80" - ['1'].pack('B2') # => "\x80\x00" - ['1'].pack('B3') # => "\x80\x00" - ['1'].pack('B4') # => "\x80\x00\x00" - ['1'].pack('B5') # => "\x80\x00\x00" - ['1'].pack('B6') # => "\x80\x00\x00\x00" - - "\xff\x00".unpack("B*") # => ["1111111100000000"] - "\x01\x02".unpack("B*") # => ["0000000100000010"] - - "".unpack("B0") # => [""] - "\x80".unpack("B1") # => ["1"] - "\x80".unpack("B2") # => ["10"] - "\x80".unpack("B3") # => ["100"] - -- <tt>'b'</tt> - Bit string (low byte first): - - ['11111111' + '00000000'].pack('b*') # => "\xFF\x00" - ['10000000' + '01000000'].pack('b*') # => "\x01\x02" - - ['1'].pack('b0') # => "" - ['1'].pack('b1') # => "\x01" - ['1'].pack('b2') # => "\x01\x00" - ['1'].pack('b3') # => "\x01\x00" - ['1'].pack('b4') # => "\x01\x00\x00" - ['1'].pack('b5') # => "\x01\x00\x00" - ['1'].pack('b6') # => "\x01\x00\x00\x00" - - "\xff\x00".unpack("b*") # => ["1111111100000000"] - "\x01\x02".unpack("b*") # => ["1000000001000000"] - - "".unpack("b0") # => [""] - "\x01".unpack("b1") # => ["1"] - "\x01".unpack("b2") # => ["10"] - "\x01".unpack("b3") # => ["100"] - -=== Hex \String Directives - -- <tt>'H'</tt> - Hex string (high nibble first): - - ['10ef'].pack('H*') # => "\x10\xEF" - ['10ef'].pack('H0') # => "" - ['10ef'].pack('H3') # => "\x10\xE0" - ['10ef'].pack('H5') # => "\x10\xEF\x00" - - ['fff'].pack('H3') # => "\xFF\xF0" - ['fff'].pack('H4') # => "\xFF\xF0" - ['fff'].pack('H5') # => "\xFF\xF0\x00" - ['fff'].pack('H6') # => "\xFF\xF0\x00" - ['fff'].pack('H7') # => "\xFF\xF0\x00\x00" - ['fff'].pack('H8') # => "\xFF\xF0\x00\x00" - - "\x10\xef".unpack('H*') # => ["10ef"] - "\x10\xef".unpack('H0') # => [""] - "\x10\xef".unpack('H1') # => ["1"] - "\x10\xef".unpack('H2') # => ["10"] - "\x10\xef".unpack('H3') # => ["10e"] 
- "\x10\xef".unpack('H4') # => ["10ef"] - "\x10\xef".unpack('H5') # => ["10ef"] - -- <tt>'h'</tt> - Hex string (low nibble first): - - ['10ef'].pack('h*') # => "\x01\xFE" - ['10ef'].pack('h0') # => "" - ['10ef'].pack('h3') # => "\x01\x0E" - ['10ef'].pack('h5') # => "\x01\xFE\x00" - - ['fff'].pack('h3') # => "\xFF\x0F" - ['fff'].pack('h4') # => "\xFF\x0F" - ['fff'].pack('h5') # => "\xFF\x0F\x00" - ['fff'].pack('h6') # => "\xFF\x0F\x00" - ['fff'].pack('h7') # => "\xFF\x0F\x00\x00" - ['fff'].pack('h8') # => "\xFF\x0F\x00\x00" - - "\x01\xfe".unpack('h*') # => ["10ef"] - "\x01\xfe".unpack('h0') # => [""] - "\x01\xfe".unpack('h1') # => ["1"] - "\x01\xfe".unpack('h2') # => ["10"] - "\x01\xfe".unpack('h3') # => ["10e"] - "\x01\xfe".unpack('h4') # => ["10ef"] - "\x01\xfe".unpack('h5') # => ["10ef"] - -=== Pointer \String Directives - -- <tt>'P'</tt> - Pointer to a structure (fixed-length string): - - s = ['abc'].pack('P') # => "\xE0O\x7F\xE5\xA1\x01\x00\x00" - s.unpack('P*') # => ["abc"] - ".".unpack("P") # => [] - ("\0" * 8).unpack("P") # => [nil] - [nil].pack("P") # => "\x00\x00\x00\x00\x00\x00\x00\x00" - -- <tt>'p'</tt> - Pointer to a null-terminated string: - - s = ['abc'].pack('p') # => "(\xE4u\xE5\xA1\x01\x00\x00" - s.unpack('p*') # => ["abc"] - ".".unpack("p") # => [] - ("\0" * 8).unpack("p") # => [nil] - [nil].pack("p") # => "\x00\x00\x00\x00\x00\x00\x00\x00" - -=== Other \String Directives - -- <tt>'M'</tt> - Quoted printable, MIME encoding; - text mode, but input must use LF and output LF; - (see {RFC 2045}[https://www.ietf.org/rfc/rfc2045.txt]): - - ["a b c\td \ne"].pack('M') # => "a b c\td =\n\ne=\n" - ["\0"].pack('M') # => "=00=\n" - - ["a"*1023].pack('M') == ("a"*73+"=\n")*14+"a=\n" # => true - ("a"*73+"=\na=\n").unpack('M') == ["a"*74] # => true - (("a"*73+"=\n")*14+"a=\n").unpack('M') == ["a"*1023] # => true - - "a b c\td =\n\ne=\n".unpack('M') # => ["a b c\td \ne"] - "=00=\n".unpack('M') # => ["\x00"] - - "pre=31=32=33after".unpack('M') # => ["pre123after"] 
- "pre=\nafter".unpack('M') # => ["preafter"] - "pre=\r\nafter".unpack('M') # => ["preafter"] - "pre=".unpack('M') # => ["pre="] - "pre=\r".unpack('M') # => ["pre=\r"] - "pre=hoge".unpack('M') # => ["pre=hoge"] - "pre==31after".unpack('M') # => ["pre==31after"] - "pre===31after".unpack('M') # => ["pre===31after"] - -- <tt>'m'</tt> - Base64 encoded string; - count specifies input bytes between each newline, - rounded down to nearest multiple of 3; - if count is zero, no newlines are added; - (see {RFC 4648}[https://www.ietf.org/rfc/rfc4648.txt]): - - [""].pack('m') # => "" - ["\0"].pack('m') # => "AA==\n" - ["\0\0"].pack('m') # => "AAA=\n" - ["\0\0\0"].pack('m') # => "AAAA\n" - ["\377"].pack('m') # => "/w==\n" - ["\377\377"].pack('m') # => "//8=\n" - ["\377\377\377"].pack('m') # => "////\n" - - "".unpack('m') # => [""] - "AA==\n".unpack('m') # => ["\x00"] - "AAA=\n".unpack('m') # => ["\x00\x00"] - "AAAA\n".unpack('m') # => ["\x00\x00\x00"] - "/w==\n".unpack('m') # => ["\xFF"] - "//8=\n".unpack('m') # => ["\xFF\xFF"] - "////\n".unpack('m') # => ["\xFF\xFF\xFF"] - "A\n".unpack('m') # => [""] - "AA\n".unpack('m') # => ["\x00"] - "AA=\n".unpack('m') # => ["\x00"] - "AAA\n".unpack('m') # => ["\x00\x00"] - - [""].pack('m0') # => "" - ["\0"].pack('m0') # => "AA==" - ["\0\0"].pack('m0') # => "AAA=" - ["\0\0\0"].pack('m0') # => "AAAA" - ["\377"].pack('m0') # => "/w==" - ["\377\377"].pack('m0') # => "//8=" - ["\377\377\377"].pack('m0') # => "////" - - "".unpack('m0') # => [""] - "AA==".unpack('m0') # => ["\x00"] - "AAA=".unpack('m0') # => ["\x00\x00"] - "AAAA".unpack('m0') # => ["\x00\x00\x00"] - "/w==".unpack('m0') # => ["\xFF"] - "//8=".unpack('m0') # => ["\xFF\xFF"] - "////".unpack('m0') # => ["\xFF\xFF\xFF"] - -- <tt>'u'</tt> - UU-encoded string: - - [""].pack("u") # => "" - ["a"].pack("u") # => "!80``\n" - ["aaa"].pack("u") # => "#86%A\n" - - "".unpack("u") # => [""] - "#86)C\n".unpack("u") # => ["abc"] - -== Offset Directives - -- <tt>'@'</tt> - Begin packing at the 
given byte offset; - for packing, null fill or shrink if necessary: - - [1, 2].pack("C@0C") # => "\x02" - [1, 2].pack("C@1C") # => "\x01\x02" - [1, 2].pack("C@5C") # => "\x01\x00\x00\x00\x00\x02" - [*1..5].pack("CCCC@2C") # => "\x01\x02\x05" - - For unpacking, cannot move outside the string: - - "\x01\x00\x00\x02".unpack("C@3C") # => [1, 2] - "\x00".unpack("@1C") # => [nil] - "\x00".unpack("@2C") # Raises ArgumentError. - -- <tt>'X'</tt> - For packing, shrink by the given byte offset: - - [0, 1, 2].pack("CCXC") # => "\x00\x02" - [0, 1, 2].pack("CCX2C") # => "\x02" - - For unpacking, rewind the unpacking position by the given byte offset: - - "\x00\x02".unpack("CCXC") # => [0, 2, 2] - - Cannot move outside the string: - - [0, 1, 2].pack("CCX3C") # Raises ArgumentError. - "\x00\x02".unpack("CX3C") # Raises ArgumentError. - -- <tt>'x'</tt> - Begin packing after the given byte offset; - for packing, null fill if necessary: - - [].pack("x0") # => "" - [].pack("x") # => "\x00" - [].pack("x8") # => "\x00\x00\x00\x00\x00\x00\x00\x00" - - For unpacking, cannot move outside the string: - - "\x00\x00\x02".unpack("CxC") # => [0, 2] - "\x00\x00\x02".unpack("x3C") # => [nil] - "\x00\x00\x02".unpack("x4C") # Raises ArgumentError diff --git a/doc/ractor.md b/doc/ractor.md deleted file mode 100644 index 224e36934b..0000000000 --- a/doc/ractor.md +++ /dev/null @@ -1,772 +0,0 @@ -# Ractor - Ruby's Actor-like concurrent abstraction - -Ractor is designed to provide a parallel execution feature of Ruby without thread-safety concerns. - -## Summary - -### Multiple Ractors in an interpreter process - -You can make multiple Ractors and they run in parallel. - -* `Ractor.new{ expr }` creates a new Ractor and `expr` is run in parallel on a parallel computer. -* The interpreter starts with the first Ractor (called the *main Ractor*). -* If the main Ractor terminates, all other Ractors receive termination requests, similar to how threads behave. 
(when the main thread (the first invoked Thread) terminates, the Ruby interpreter requests all running threads to terminate). -* Each Ractor contains one or more Threads. - * Threads within the same Ractor share a Ractor-wide global lock like GIL (GVL in MRI terminology), so they can't run in parallel (without releasing GVL explicitly in C-level). Threads in different ractors run in parallel. - * The overhead of creating a Ractor is similar to the overhead of creating one Thread. - -### Limited sharing between multiple ractors - -Ractors don't share everything, unlike threads. - -* Most objects are *Unshareable objects*, so you don't need to care about thread-safety problems which are caused by sharing. -* Some objects are *Shareable objects*. - * Immutable objects: frozen objects which don't refer to unshareable-objects. - * `i = 123`: `i` is an immutable object. - * `s = "str".freeze`: `s` is an immutable object. - * `a = [1, [2], 3].freeze`: `a` is not an immutable object because `a` refers to the unshareable object `[2]` (which is not frozen). - * `h = {c: Object}.freeze`: `h` is an immutable object because `h` refers to the Symbol `:c` and the shareable `Object` class object, which is not frozen. - * Class/Module objects - * Special shareable objects - * Ractor object itself. - * And more... - -### Communication between Ractors with `Ractor::Port` - -Ractors communicate with each other and synchronize their execution by exchanging messages. `Ractor::Port` is provided for this communication. - -```ruby -port = Ractor::Port.new - -Ractor.new port do |port| - # Other ractors can send to the port - port << 42 -end - -port.receive # get a message from the port. Only the creator Ractor can receive from the port -#=> 42 -``` - -Each Ractor has its own default port, and `Ractor#send` and `Ractor.receive` use it. - -### Copy & Move semantics to send messages - -To send unshareable objects as messages, objects are copied or moved. - -* Copy: use deep-copy. -* Move: move membership. 
- * Sender can not access the moved object after moving the object. - * Guarantee that at least only 1 Ractor can access the object. - -### Thread-safety - -Ractor helps to write a thread-safe concurrent program, but we can make thread-unsafe programs with Ractors. - -* GOOD: Sharing limitation - * Most objects are unshareable, so we can't make data-racy and race-conditional programs. - * Shareable objects are protected by an interpreter or locking mechanism. -* BAD: Class/Module can violate this assumption - * To make it compatible with old behavior, classes and modules can introduce data-race and so on. - * Ruby programmers should take care if they modify class/module objects on multi Ractor programs. -* BAD: Ractor can't solve all thread-safety problems - * There are several blocking operations (waiting send) so you can make a program which has dead-lock and live-lock issues. - * Some kind of shareable objects can introduce transactions (STM, for example). However, misusing transactions will generate inconsistent state. - -Without Ractor, we need to trace all state-mutations to debug thread-safety issues. -With Ractor, you can concentrate on suspicious code which are shared with Ractors. - -## Creation and termination - -### `Ractor.new` - -* `Ractor.new{ expr }` generates another Ractor. - -```ruby -# Ractor.new with a block creates new Ractor -r = Ractor.new do - # This block will be run in parallel with other ractors -end - -# You can name a Ractor with `name:` argument. -r = Ractor.new name: 'test-name' do -end - -# and Ractor#name returns its name. -r.name #=> 'test-name' -``` - -### Given block isolation - -The Ractor executes given `expr` in a given block. -Given block will be isolated from outer scope by the `Proc#isolate` method (not exposed yet for Ruby users). To prevent sharing unshareable objects between ractors, block outer-variables, `self` and other information are isolated. 
- -`Proc#isolate` is called at Ractor creation time (when `Ractor.new` is called). If given Proc object is not able to isolate because of outer variables and so on, an error will be raised. - -```ruby -begin - a = true - r = Ractor.new do - a #=> ArgumentError because this block accesses `a`. - end - r.join # see later -rescue ArgumentError -end -``` - -* The `self` of the given block is the `Ractor` object itself. - -```ruby -r = Ractor.new do - p self.class #=> Ractor - self.object_id -end -r.value == self.object_id #=> false -``` - -Passed arguments to `Ractor.new()` becomes block parameters for the given block. However, an interpreter does not pass the parameter object references, but send them as messages (see below for details). - -```ruby -r = Ractor.new 'ok' do |msg| - msg #=> 'ok' -end -r.value #=> 'ok' -``` - -```ruby -# almost similar to the last example -r = Ractor.new do - msg = Ractor.receive - msg -end -r.send 'ok' -r.value #=> 'ok' -``` - -### An execution result of given block - -Return value of the given block becomes an outgoing message (see below for details). - -```ruby -r = Ractor.new do - 'ok' -end -r.value #=> `ok` -``` - -Error in the given block will be propagated to the receiver of an outgoing message. - -```ruby -r = Ractor.new do - raise 'ok' # exception will be transferred to the receiver -end - -begin - r.value -rescue Ractor::RemoteError => e - e.cause.class #=> RuntimeError - e.cause.message #=> 'ok' - e.ractor #=> r -end -``` - -## Communication between Ractors - -Communication between Ractors is achieved by sending and receiving messages. There are two ways to communicate with each other. - -* (1) Message sending/receiving via `Ractor::Port` -* (2) Using shareable container objects - * Ractor::TVar gem ([ko1/ractor-tvar](https://github.com/ko1/ractor-tvar)) - * more? - -Users can control program execution timing with (1), but should not control with (2) (only manage as critical section). 
- -For message sending and receiving, there are two types of APIs: push type and pull type. - -* (1) send/receive via `Ractor::Port`. - * `Ractor::Port#send(obj)` (`Ractor::Port#<<(obj)` is an alias) send a message to the port. Ports are connected to the infinite size incoming queue so `Ractor::Port#send` will never block. - * `Ractor::Port#receive` dequeue a message from its own incoming queue. If the incoming queue is empty, `Ractor::Port#receive` calling will block the execution of a thread. -* `Ractor.select()` can wait for the success of `Ractor::Port#receive`. -* You can close `Ractor::Port` by `Ractor::Port#close` only by the creator Ractor of the port. - * If the port is closed, you can't `send` to the port. If `Ractor::Port#receive` is blocked for the closed port, then it will raise an exception. - * When a Ractor is terminated, the Ractor's ports are closed. -* There are 3 ways to send an object as a message - * (1) Send a reference: Sending a shareable object, send only a reference to the object (fast) - * (2) Copy an object: Sending an unshareable object by copying an object deeply (slow). Note that you can not send an object which does not support deep copy. Some `T_DATA` objects (objects whose class is defined in a C extension, such as `StringIO`) are not supported. - * (3) Move an object: Sending an unshareable object reference with a membership. Sender Ractor can not access moved objects anymore (raise an exception) after moving it. Current implementation makes new object as a moved object for receiver Ractor and copies references of sending object to moved object. `T_DATA` objects are not supported. - * You can choose "Copy" and "Move" by the `move:` keyword, `Ractor#send(obj, move: true/false)` and `Ractor.yield(obj, move: true/false)` (default is `false` (COPY)). - -### Wait for multiple Ractors with `Ractor.select` - -You can wait multiple Ractor port's receiving. 
-The return value of `Ractor.select()` is `[port, msg]` where `port` is a ready port and `msg` is received message. - -To make convenient, `Ractor.select` can also accept Ractors to wait the termination of Ractors. -The return value of `Ractor.select()` is `[r, msg]` where `r` is a terminated Ractor and `msg` is the value of Ractor's block. - -Wait for a single ractor (same as `Ractor#value`): - -```ruby -r1 = Ractor.new{'r1'} - -r, obj = Ractor.select(r1) -r == r1 and obj == 'r1' #=> true -``` - -Waiting for two ractors: - -```ruby -r1 = Ractor.new{'r1'} -r2 = Ractor.new{'r2'} -rs = [r1, r2] -as = [] - -# Wait for r1 or r2's Ractor.yield -r, obj = Ractor.select(*rs) -rs.delete(r) -as << obj - -# Second try (rs only contain not-closed ractors) -r, obj = Ractor.select(*rs) -rs.delete(r) -as << obj -as.sort == ['r1', 'r2'] #=> true -``` - -TODO: Current `Ractor.select()` has the same issue of `select(2)`, so this interface should be refined. - -TODO: `select` syntax of go-language uses round-robin technique to make fair scheduling. Now `Ractor.select()` doesn't use it. - -### Closing Ractor's ports - -* `Ractor::Port#close` close the ports (similar to `Queue#close`). - * `port.send(obj)` where `port` is closed, will raise an exception. - * When the queue connected to the port is empty and port is closed, `Ractor::Port#receive` raises an exception. If the queue is not empty, it dequeues an object without exceptions. -* When a Ractor terminates, the ports are closed automatically. 
- -Example (try to get a result from a closed Ractor): - -```ruby -r = Ractor.new do - 'finish' -end -r.join # success (wait for the termination) -r.value # success (will return 'finish') - -# only the first Ractor that succeeds in calling `Ractor#value` can get the result -Ractor.new r do |r| - r.value #=> Ractor::Error -end -``` - -Example (try to send to a closed (terminated) Ractor): - -```ruby -r = Ractor.new do -end - -r.join # wait for termination - -begin - r.send(1) -rescue Ractor::ClosedError - 'ok' -else - 'ng' -end -``` - -### Send a message by copying - -`Ractor::Port#send(obj)` copies `obj` deeply if `obj` is an unshareable object. - -```ruby -obj = 'str'.dup -r = Ractor.new obj do |msg| - # return received msg's object_id - msg.object_id -end - -obj.object_id == r.value #=> false -``` - -Some objects cannot be copied, and trying to send them raises an exception. - -```ruby -obj = Thread.new{} -begin - Ractor.new obj do |msg| - msg - end -rescue TypeError => e - e.message #=> #<TypeError: allocator undefined for Thread> -else - 'ng' # unreachable here -end -``` - -### Send a message by moving - -`Ractor::Port#send(obj, move: true)` moves `obj` to the destination Ractor. -If the source Ractor touches the moved object (for example, by calling a method like `obj.foo()`), an error is raised. - -```ruby -# move with Ractor#send -r = Ractor.new do - obj = Ractor.receive - obj << ' world' -end - -str = 'hello' -r.send str, move: true -modified = r.value #=> 'hello world' - -# str is moved, and accessing str from this Ractor is prohibited - -begin - # Error because it touches moved str. - str << ' exception' # raise Ractor::MovedError -rescue Ractor::MovedError - modified #=> 'hello world' -else - raise 'unreachable' -end -``` - -Some objects cannot be moved, and trying to move them raises an exception. 
- -```ruby -r = Ractor.new do - Ractor.receive -end - -r.send(Thread.new{}, move: true) #=> allocator undefined for Thread (TypeError) -``` - -To achieve the access prohibition for moved objects, _class replacement_ technique is used to implement it. - -### Shareable objects - -The following objects are shareable. - -* Immutable objects - * Small integers, some symbols, `true`, `false`, `nil` (a.k.a. `SPECIAL_CONST_P()` objects in internal) - * Frozen native objects - * Numeric objects: `Float`, `Complex`, `Rational`, big integers (`T_BIGNUM` in internal) - * All Symbols. - * Frozen `String` and `Regexp` objects (their instance variables should refer only shareable objects) -* Class, Module objects (`T_CLASS`, `T_MODULE` and `T_ICLASS` in internal) -* `Ractor` and other special objects which care about synchronization. - -Implementation: Now shareable objects (`RVALUE`) have `FL_SHAREABLE` flag. This flag can be added lazily. - -To make shareable objects, `Ractor.make_shareable(obj)` method is provided. In this case, try to make shareable by freezing `obj` and recursively traversable objects. This method accepts `copy:` keyword (default value is false).`Ractor.make_shareable(obj, copy: true)` tries to make a deep copy of `obj` and make the copied object shareable. - -## Language changes to isolate unshareable objects between Ractors - -To isolate unshareable objects between Ractors, we introduced additional language semantics on multi-Ractor Ruby programs. - -Note that without using Ractors, these additional semantics is not needed (100% compatible with Ruby 2). - -### Global variables - -Only the main Ractor (a Ractor created at starting of interpreter) can access global variables. 
- -```ruby -$gv = 1 -r = Ractor.new do - $gv -end - -begin - r.join -rescue Ractor::RemoteError => e - e.cause.message #=> 'can not access global variables from non-main Ractors' -end -``` - -Note that some special global variables, such as `$stdin`, `$stdout` and `$stderr` are Ractor-local. See [[Bug #17268]](https://bugs.ruby-lang.org/issues/17268) for more details. - -### Instance variables of shareable objects - -Instance variables of classes/modules can be get from non-main Ractors if the referring values are shareable objects. - -```ruby -class C - @iv = 1 -end - -p Ractor.new do - class C - @iv - end -end.value #=> 1 -``` - -Otherwise, only the main Ractor can access instance variables of shareable objects. - -```ruby -class C - @iv = [] # unshareable object -end - -Ractor.new do - class C - begin - p @iv - rescue Ractor::IsolationError - p $!.message - #=> "can not get unshareable values from instance variables of classes/modules from non-main Ractors" - end - - begin - @iv = 42 - rescue Ractor::IsolationError - p $!.message - #=> "can not set instance variables of classes/modules by non-main Ractors" - end - end -end.join -``` - - - -```ruby -shared = Ractor.new{} -shared.instance_variable_set(:@iv, 'str') - -r = Ractor.new shared do |shared| - p shared.instance_variable_get(:@iv) -end - -begin - r.join -rescue Ractor::RemoteError => e - e.cause.message #=> can not access instance variables of shareable objects from non-main Ractors (Ractor::IsolationError) -end -``` - -Note that instance variables for class/module objects are also prohibited on Ractors. - -### Class variables - -Only the main Ractor can access class variables. - -```ruby -class C - @@cv = 'str' -end - -r = Ractor.new do - class C - p @@cv - end -end - - -begin - r.join -rescue => e - e.class #=> Ractor::IsolationError -end -``` - -### Constants - -Only the main Ractor can read constants which refer to the unshareable object. 
- -```ruby -class C - CONST = 'str' -end -r = Ractor.new do - C::CONST -end -begin - r.join -rescue => e - e.class #=> Ractor::IsolationError -end -``` - -Only the main Ractor can define constants which refer to the unshareable object. - -```ruby -class C -end -r = Ractor.new do - C::CONST = 'str' -end -begin - r.join -rescue => e - e.class #=> Ractor::IsolationError -end -``` - -To make multi-ractor supported library, the constants should only refer shareable objects. - -```ruby -TABLE = {a: 'ko1', b: 'ko2', c: 'ko3'} -``` - -In this case, `TABLE` references an unshareable Hash object. So that other ractors can not refer `TABLE` constant. To make it shareable, we can use `Ractor.make_shareable()` like that. - -```ruby -TABLE = Ractor.make_shareable( {a: 'ko1', b: 'ko2', c: 'ko3'} ) -``` - -To make it easy, Ruby 3.0 introduced new `shareable_constant_value` Directive. - -```ruby -# shareable_constant_value: literal - -TABLE = {a: 'ko1', b: 'ko2', c: 'ko3'} -#=> Same as: TABLE = Ractor.make_shareable( {a: 'ko1', b: 'ko2', c: 'ko3'} ) -``` - -`shareable_constant_value` directive accepts the following modes (descriptions use the example: `CONST = expr`): - -* none: Do nothing. Same as: `CONST = expr` -* literal: - * if `expr` consists of literals, replaced to `CONST = Ractor.make_shareable(expr)`. - * otherwise: replaced to `CONST = expr.tap{|o| raise unless Ractor.shareable?(o)}`. -* experimental_everything: replaced to `CONST = Ractor.make_shareable(expr)`. -* experimental_copy: replaced to `CONST = Ractor.make_shareable(expr, copy: true)`. - -Except the `none` mode (default), it is guaranteed that the assigned constants refer to only shareable objects. - -See [doc/syntax/comments.rdoc](syntax/comments.rdoc) for more details. - -## Implementation note - -* Each Ractor has its own thread, it means each Ractor has at least 1 native thread. -* Each Ractor has its own ID (`rb_ractor_t::pub::id`). 
- * In debug mode, all unshareable objects are labeled with the current Ractor's ID, and the label is checked to detect unshareable-object leaks (access to an object from a different Ractor) in the VM. - -## Examples - -### Traditional Ring example in Actor-model - -```ruby -RN = 1_000 -CR = Ractor.current - -r = Ractor.new do - p Ractor.receive - CR << :fin -end - -RN.times{ - r = Ractor.new r do |next_r| - next_r << Ractor.receive - end -} - -p :setup_ok -r << 1 -p Ractor.receive -``` - -### Fork-join - -```ruby -def fib n - if n < 2 - 1 - else - fib(n-2) + fib(n-1) - end -end - -RN = 10 -rs = (1..RN).map do |i| - Ractor.new i do |i| - [i, fib(i)] - end -end - -until rs.empty? - r, v = Ractor.select(*rs) - rs.delete r - p answer: v -end -``` - -### Worker pool - -(1) One ractor has a pool - -```ruby -require 'prime' - -N = 1000 -RN = 10 - -# make RN workers -workers = (1..RN).map do - Ractor.new do |; result_port| - loop do - n, result_port = Ractor.receive - result_port << [n, n.prime?, Ractor.current] - end - end -end - -result_port = Ractor::Port.new -results = [] - -(1..N).each do |i| - if workers.empty? 
- # receive a result - n, result, w = result_port.receive - results << [n, result] - else - w = workers.pop - end - - # send a task to the idle worker ractor - w << [i, result_port] -end - -# receive a result -while results.size != N - n, result, _w = result_port.receive - results << [n, result] -end - -pp results.sort_by{|n, result| n} -``` - -### Pipeline - -```ruby -# pipeline with send/receive - -r3 = Ractor.new Ractor.current do |cr| - cr.send Ractor.receive + 'r3' -end - -r2 = Ractor.new r3 do |r3| - r3.send Ractor.receive + 'r2' -end - -r1 = Ractor.new r2 do |r2| - r2.send Ractor.receive + 'r1' -end - -r1 << 'r0' -p Ractor.receive #=> "r0r1r2r3" -``` - -### Supervise - -```ruby -# ring example again - -r = Ractor.current -(1..10).map{|i| - r = Ractor.new r, i do |r, i| - r.send Ractor.receive + "r#{i}" - end -} - -r.send "r0" -p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1" -``` - -```ruby -# ring example with an error - -r = Ractor.current -rs = (1..10).map{|i| - r = Ractor.new r, i do |r, i| - loop do - msg = Ractor.receive - raise if /e/ =~ msg - r.send msg + "r#{i}" - end - end -} - -r.send "r0" -p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1" -r.send "r0" -p Ractor.select(*rs, Ractor.current) #=> [:receive, "r0r10r9r8r7r6r5r4r3r2r1"] -r.send "e0" -p Ractor.select(*rs, Ractor.current) -#=> -# <Thread:0x000056262de28bd8 run> terminated with exception (report_on_exception is true): -# Traceback (most recent call last): -# 2: from /home/ko1/src/ruby/trunk/test.rb:7:in `block (2 levels) in <main>' -# 1: from /home/ko1/src/ruby/trunk/test.rb:7:in `loop' -# /home/ko1/src/ruby/trunk/test.rb:9:in `block (3 levels) in <main>': unhandled exception -# Traceback (most recent call last): -# 2: from /home/ko1/src/ruby/trunk/test.rb:7:in `block (2 levels) in <main>' -# 1: from /home/ko1/src/ruby/trunk/test.rb:7:in `loop' -# /home/ko1/src/ruby/trunk/test.rb:9:in `block (3 levels) in <main>': unhandled exception -# 1: from /home/ko1/src/ruby/trunk/test.rb:21:in `<main>' -# 
<internal:ractor>:69:in `select': thrown by remote Ractor. (Ractor::RemoteError) -``` - -```ruby -# resend non-error message - -r = Ractor.current -rs = (1..10).map{|i| - r = Ractor.new r, i do |r, i| - loop do - msg = Ractor.receive - raise if /e/ =~ msg - r.send msg + "r#{i}" - end - end -} - -r.send "r0" -p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1" -r.send "r0" -p Ractor.select(*rs, Ractor.current) -[:receive, "r0r10r9r8r7r6r5r4r3r2r1"] -msg = 'e0' -begin - r.send msg - p Ractor.select(*rs, Ractor.current) -rescue Ractor::RemoteError - msg = 'r0' - retry -end - -#=> <internal:ractor>:100:in `send': The incoming-port is already closed (Ractor::ClosedError) -# because r == r[-1] is terminated. -``` - -```ruby -# ring example with supervisor and re-start - -def make_ractor r, i - Ractor.new r, i do |r, i| - loop do - msg = Ractor.receive - raise if /e/ =~ msg - r.send msg + "r#{i}" - end - end -end - -r = Ractor.current -rs = (1..10).map{|i| - r = make_ractor(r, i) -} - -msg = 'e0' # error causing message -begin - r.send msg - p Ractor.select(*rs, Ractor.current) -rescue Ractor::RemoteError - r = rs[-1] = make_ractor(rs[-2], rs.size-1) - msg = 'x0' - retry -end - -#=> [:receive, "x0r9r9r8r7r6r5r4r3r2r1"] -``` diff --git a/doc/reline/face.md b/doc/reline/face.md deleted file mode 100644 index 1fa916123b..0000000000 --- a/doc/reline/face.md +++ /dev/null @@ -1,111 +0,0 @@ -# Face - -With the `Reline::Face` class, you can modify the text color and text decorations in your terminal emulator. -This is primarily used to customize the appearance of the method completion dialog in IRB. 
- -## Usage - -### ex: Change the background color of the completion dialog cyan to blue - -```ruby -Reline::Face.config(:completion_dialog) do |conf| - conf.define :default, foreground: :white, background: :blue - # ^^^^^ `:cyan` by default - conf.define :enhanced, foreground: :white, background: :magenta - conf.define :scrollbar, foreground: :white, background: :blue -end -``` - -If you provide the above code to an IRB session in some way, you can apply the configuration. -It's generally done by writing it in `.irbrc`. - -Regarding `.irbrc`, please refer to the following link: [https://docs.ruby-lang.org/en/master/IRB.html](https://docs.ruby-lang.org/en/master/IRB.html) - -## Available parameters - -`Reline::Face` internally creates SGR (Select Graphic Rendition) code according to the block parameter of `Reline::Face.config` method. - -| Key | Value | SGR Code (numeric part following "\e[")| -|:------------|:------------------|-----:| -| :foreground | :black | 30 | -| | :red | 31 | -| | :green | 32 | -| | :yellow | 33 | -| | :blue | 34 | -| | :magenta | 35 | -| | :cyan | 36 | -| | :white | 37 | -| | :bright_black | 90 | -| | :gray | 90 | -| | :bright_red | 91 | -| | :bright_green | 92 | -| | :bright_yellow | 93 | -| | :bright_blue | 94 | -| | :bright_magenta | 95 | -| | :bright_cyan | 96 | -| | :bright_white | 97 | -| :background | :black | 40 | -| | :red | 41 | -| | :green | 42 | -| | :yellow | 43 | -| | :blue | 44 | -| | :magenta | 45 | -| | :cyan | 46 | -| | :white | 47 | -| | :bright_black | 100 | -| | :gray | 100 | -| | :bright_red | 101 | -| | :bright_green | 102 | -| | :bright_yellow | 103 | -| | :bright_blue | 104 | -| | :bright_magenta | 105 | -| | :bright_cyan | 106 | -| | :bright_white | 107 | -| :style | :reset | 0 | -| | :bold | 1 | -| | :faint | 2 | -| | :italicized | 3 | -| | :underlined | 4 | -| | :slowly_blinking | 5 | -| | :blinking | 5 | -| | :rapidly_blinking | 6 | -| | :negative | 7 | -| | :concealed | 8 | -| | :crossed_out | 9 | - -- The 
value for `:style` can be both a Symbol and an Array - ```ruby - # Single symbol - conf.define :default, style: :bold - # Array - conf.define :default, style: [:bold, :negative] - ``` -- The availability of specific SGR codes depends on your terminal emulator -- You can specify a hex color code to `:foreground` and `:background` color like `foreground: "#FF1020"`. Its availability also depends on your terminal emulator - -## Debugging - -You can see the current Face configuration by `Reline::Face.configs` method - -Example: - -```ruby -irb(main):001:0> Reline::Face.configs -=> -{:default=> - {:default=>{:style=>:reset, :escape_sequence=>"\e[0m"}, - :enhanced=>{:style=>:reset, :escape_sequence=>"\e[0m"}, - :scrollbar=>{:style=>:reset, :escape_sequence=>"\e[0m"}}, - :completion_dialog=> - {:default=>{:foreground=>:white, :background=>:cyan, :escape_sequence=>"\e[0m\e[37;46m"}, - :enhanced=>{:foreground=>:white, :background=>:magenta, :escape_sequence=>"\e[0m\e[37;45m"}, - :scrollbar=>{:foreground=>:white, :background=>:cyan, :escape_sequence=>"\e[0m\e[37;46m"}}} -``` - -## 256-Color and TrueColor - -Reline will automatically detect if your terminal emulator supports truecolor with `ENV['COLORTERM] in 'truecolor' | '24bit'`. When this env is not set, Reline will fallback to 256-color. -If your terminal emulator supports truecolor but does not set COLORTERM env, add this line to `.irbrc`. -```ruby -Reline::Face.force_truecolor -``` diff --git a/doc/security/command_injection.rdoc b/doc/security/command_injection.rdoc new file mode 100644 index 0000000000..d46e42f7be --- /dev/null +++ b/doc/security/command_injection.rdoc @@ -0,0 +1,15 @@ += Command Injection + +Some Ruby core methods accept string data +that includes text to be executed as a system command. + +They should not be called with unknown or unsanitized commands. 
+ +These methods include: + +- Kernel.exec +- Kernel.spawn +- Kernel.system +- {\`command` (backtick method)}[rdoc-ref:Kernel#`] + (also called by the expression <tt>%x[command]</tt>). +- IO.popen (when called with other than <tt>"-"</tt>). diff --git a/doc/security.rdoc b/doc/security/security.rdoc index af9970d336..af9970d336 100644 --- a/doc/security.rdoc +++ b/doc/security/security.rdoc diff --git a/doc/standard_library.md b/doc/standard_library.md index 0c48ac0cdd..782db10c37 100644 --- a/doc/standard_library.md +++ b/doc/standard_library.md @@ -11,6 +11,7 @@ of each. - `MakeMakefile`: A module used to generate a Makefile for C extensions - `RbConfig`: Information about your Ruby configuration and build - `Gem`: A package management framework for Ruby +- `Pathname`: Representation of the name of a file or directory on the filesystem. Pathname is a core class, but only methods that depend on other libraries are provided as a library. ## Extensions @@ -58,7 +59,6 @@ of each. - Time ([GitHub][time]): Extends the Time class with methods for parsing and conversion - Timeout ([GitHub][timeout]): Auto-terminate potentially long-running operations in Ruby - TmpDir ([GitHub][tmpdir]): Extends the Dir class to manage the OS temporary file path -- TSort ([GitHub][tsort]): Topological sorting using Tarjan's algorithm - UN ([GitHub][un]): Utilities to replace common UNIX commands - URI ([GitHub][uri]): A Ruby module providing support for Uniform Resource Identifiers - YAML ([GitHub][yaml]): The Ruby client library for the Psych YAML implementation @@ -75,7 +75,6 @@ of each. - IO#wait ([GitHub][io-wait]): Provides the feature for waiting until IO is readable or writable without blocking. 
- JSON ([GitHub][json]): Implements JavaScript Object Notation for Ruby - OpenSSL ([GitHub][openssl]): Provides SSL, TLS, and general-purpose cryptography for Ruby -- Pathname ([GitHub][pathname]): Representation of the name of a file or directory on the filesystem - Psych ([GitHub][psych]): A YAML parser and emitter for Ruby - StringIO ([GitHub][stringio]): Pseudo-I/O on String objects - StringScanner ([GitHub][strscan]): Provides lexical scanning operations on a String @@ -96,9 +95,7 @@ of each. - [test-unit]: A compatibility layer for MiniTest - [rexml][rexml-doc] ([GitHub][rexml]): An XML toolkit for Ruby - [rss]: A family of libraries supporting various XML-based "feeds" -- [net-ftp]: Support for the File Transfer Protocol - [net-imap]: Ruby client API for the Internet Message Access Protocol -- [net-pop]: Ruby client library for POP3 - [net-smtp]: Simple Mail Transfer Protocol client library for Ruby - [matrix]: Represents a mathematical matrix - [prime]: Prime numbers and factorization library @@ -126,6 +123,8 @@ of each. - [reline][reline-doc] ([GitHub][reline]): GNU Readline and Editline in a pure Ruby implementation - [readline]: Wrapper for the Readline extension and Reline - [fiddle]: A libffi wrapper for Ruby +- [tsort]: Topological sorting using Tarjan's algorithm +- [win32-registry]: Registry accessor library for the Windows platform. ## Tools @@ -164,10 +163,8 @@ of each. [matrix]: https://github.com/ruby/matrix [minitest]: https://github.com/seattlerb/minitest [mutex_m]: https://github.com/ruby/mutex_m -[net-ftp]: https://github.com/ruby/net-ftp [net-http]: https://github.com/ruby/net-http [net-imap]: https://github.com/ruby/net-imap -[net-pop]: https://github.com/ruby/net-pop [net-smtp]: https://github.com/ruby/net-smtp [nkf]: https://github.com/ruby/nkf [observer]: https://github.com/ruby/observer @@ -212,6 +209,7 @@ of each. 
[uri]: https://github.com/ruby/uri [weakref]: https://github.com/ruby/weakref [win32ole]: https://github.com/ruby/win32ole +[win32-registry]: https://github.com/ruby/win32-registry [yaml]: https://github.com/ruby/yaml [zlib]: https://github.com/ruby/zlib diff --git a/doc/string.rb b/doc/string.rb index 9ed97d49f6..4dac94e93a 100644 --- a/doc/string.rb +++ b/doc/string.rb @@ -159,156 +159,17 @@ # - #rstrip, #rstrip!: Strip trailing whitespace. # - #strip, #strip!: Strip leading and trailing whitespace. # -# == +String+ Slices -# -# A _slice_ of a string is a substring selected by certain criteria. -# -# These instance methods utilize slicing: -# -# - String#[] (aliased as String#slice): Returns a slice copied from +self+. -# - String#[]=: Mutates +self+ with the slice replaced. -# - String#slice!: Mutates +self+ with the slice removed and returns the removed slice. -# -# Each of the above methods takes arguments that determine the slice -# to be copied or replaced. -# -# The arguments have several forms. 
-# For a string +string+, the forms are: -# -# - <tt>string[index]</tt> -# - <tt>string[start, length]</tt> -# - <tt>string[range]</tt> -# - <tt>string[regexp, capture = 0]</tt> -# - <tt>string[substring]</tt> -# -# <b><tt>string[index]</tt></b> -# -# When a non-negative integer argument +index+ is given, -# the slice is the 1-character substring found in +self+ at character offset +index+: -# -# 'bar'[0] # => "b" -# 'bar'[2] # => "r" -# 'bar'[20] # => nil -# 'тест'[2] # => "с" -# 'こんにちは'[4] # => "は" -# -# When a negative integer +index+ is given, -# the slice begins at the offset given by counting backward from the end of +self+: -# -# 'bar'[-3] # => "b" -# 'bar'[-1] # => "r" -# 'bar'[-20] # => nil -# -# <b><tt>string[start, length]</tt></b> -# -# When non-negative integer arguments +start+ and +length+ are given, -# the slice begins at character offset +start+, if it exists, -# and continues for +length+ characters, if available: -# -# 'foo'[0, 2] # => "fo" -# 'тест'[1, 2] # => "ес" -# 'こんにちは'[2, 2] # => "にち" -# # Zero length. -# 'foo'[2, 0] # => "" -# # Length not entirely available. -# 'foo'[1, 200] # => "oo" -# # Start out of range. -# 'foo'[4, 2] # => nil -# -# Special case: if +start+ equals the length of +self+, -# the slice is a new empty string: -# -# 'foo'[3, 2] # => "" -# 'foo'[3, 200] # => "" -# -# When a negative +start+ and non-negative +length+ are given, -# the slice begins by counting backward from the end of +self+, -# and continues for +length+ characters, if available: -# -# 'foo'[-2, 2] # => "oo" -# 'foo'[-2, 200] # => "oo" -# # Start out of range. -# 'foo'[-4, 2] # => nil -# -# When a negative +length+ is given, there is no slice: -# -# 'foo'[1, -1] # => nil -# 'foo'[-2, -1] # => nil -# -# <b><tt>string[range]</tt></b> -# -# When a Range argument +range+ is given, -# it creates a substring of +string+ using the indices in +range+. 
-# The slice is then determined as above: -# -# 'foo'[0..1] # => "fo" -# 'foo'[0, 2] # => "fo" -# -# 'foo'[2...2] # => "" -# 'foo'[2, 0] # => "" -# -# 'foo'[1..200] # => "oo" -# 'foo'[1, 200] # => "oo" -# -# 'foo'[4..5] # => nil -# 'foo'[4, 2] # => nil -# -# 'foo'[-4..-3] # => nil -# 'foo'[-4, 2] # => nil -# -# 'foo'[3..4] # => "" -# 'foo'[3, 2] # => "" -# -# 'foo'[-2..-1] # => "oo" -# 'foo'[-2, 2] # => "oo" -# -# 'foo'[-2..197] # => "oo" -# 'foo'[-2, 200] # => "oo" -# -# <b><tt>string[regexp, capture = 0]</tt></b> -# -# When the Regexp argument +regexp+ is given, -# and the +capture+ argument is <tt>0</tt>, -# the slice is the first matching substring found in +self+: -# -# 'foo'[/o/] # => "o" -# 'foo'[/x/] # => nil -# s = 'hello there' -# s[/[aeiou](.)\1/] # => "ell" -# s[/[aeiou](.)\1/, 0] # => "ell" -# -# If the argument +capture+ is provided and not <tt>0</tt>, -# it should be either a capture group index (integer) -# or a capture group name (String or Symbol); -# the slice is the specified capture (see Regexp@Groups and Captures): -# -# s = 'hello there' -# s[/[aeiou](.)\1/, 1] # => "l" -# s[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, "non_vowel"] # => "l" -# s[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, :vowel] # => "e" -# -# If an invalid capture group index is given, there is no slice. -# If an invalid capture group name is given, +IndexError+ is raised. -# -# <b><tt>string[substring]</tt></b> -# -# When the single +String+ argument +substring+ is given, -# it returns the substring from +self+ if found, otherwise +nil+: -# -# 'foo'['oo'] # => "oo" -# 'foo'['xx'] # => nil -# # == What's Here # # First, what's elsewhere. Class +String+: # -# - Inherits from the {Object class}[rdoc-ref:Object@What-27s+Here]. -# - Includes the {Comparable module}[rdoc-ref:Comparable@What-27s+Here]. +# - Inherits from the {Object class}[rdoc-ref:Object@Whats+Here]. +# - Includes the {Comparable module}[rdoc-ref:Comparable@Whats+Here]. 
# # Here, class +String+ provides methods that are useful for: # # - {Creating a \String}[rdoc-ref:String@Creating+a+String]. -# - {Freezing/Unfreezing a \String}[rdoc-ref:String@Freezing-2FUnfreezing]. +# - {Freezing/Unfreezing a \String}[rdoc-ref:String@FreezingUnfreezing]. # - {Querying a \String}[rdoc-ref:String@Querying]. # - {Comparing Strings}[rdoc-ref:String@Comparing]. # - {Modifying a \String}[rdoc-ref:String@Modifying]. @@ -333,10 +194,10 @@ # # _Counts_ # -# - #length (aliased as #size): Returns the count of characters (not bytes). -# - #empty?: Returns whether the length of +self+ is zero. # - #bytesize: Returns the count of bytes. # - #count: Returns the count of substrings matching given strings. +# - #empty?: Returns whether the length of +self+ is zero. +# - #length (aliased as #size): Returns the count of characters (not bytes). # # _Substrings_ # @@ -387,6 +248,7 @@ # - #<<: Returns +self+ concatenated with a given string or integer. # - #append_as_bytes: Returns +self+ concatenated with strings without performing any # encoding validation or conversion. +# - #prepend: Prefixes to +self+ the concatenation of given other strings. # # _Substitution_ # @@ -396,7 +258,7 @@ # - #gsub!: Replaces each substring that matches a given pattern with a given replacement string; # returns +self+ if any changes, +nil+ otherwise. # - #succ! (aliased as #next!): Returns +self+ modified to become its own successor. -# - #initialize_copy (aliased as #replace): Returns +self+ with its entire content replaced by a given string. +# - #replace: Returns +self+ with its entire content replaced by a given string. # - #reverse!: Returns +self+ with its characters in reverse order. # - #setbyte: Sets the byte at a given integer offset to a given value; returns the argument. # - #tr!: Replaces specified characters in +self+ with specified replacement characters; @@ -447,7 +309,6 @@ # - #+: Returns the concatenation of +self+ and a given other string. 
# - #center: Returns a copy of +self+, centered by specified padding. # - #concat: Returns the concatenation of +self+ with given other strings. -# - #prepend: Returns the concatenation of a given other string with +self+. # - #ljust: Returns a copy of +self+ of a given length, right-padded with a given other string. # - #rjust: Returns a copy of +self+ of a given length, left-padded with a given other string. # @@ -461,8 +322,7 @@ # _Substitution_ # # - #dump: Returns a printable version of +self+, enclosed in double-quotes. -# - #undump: Returns a copy of +self+ with all <tt>\xNN</tt> notations replaced by <tt>\uNNNN</tt> notations -# and all escaped characters unescaped. +# - #undump: Inverse of #dump; returns a copy of +self+ with changes of the kinds made by #dump "undone." # - #sub: Returns a copy of +self+ with the first substring matching a given pattern # replaced with a given replacement string. # - #gsub: Returns a copy of +self+ with each substring that matches a given pattern @@ -538,8 +398,10 @@ # - #hex: Returns the integer value of the leading characters, interpreted as hexadecimal digits. # - #oct: Returns the integer value of the leading characters, interpreted as octal digits. # - #ord: Returns the integer ordinal of the first character in +self+. +# - #to_c: Returns the complex value of leading characters, interpreted as a complex number. # - #to_i: Returns the integer value of leading characters, interpreted as an integer. # - #to_f: Returns the floating-point value of leading characters, interpreted as a floating-point number. +# - #to_r: Returns the rational value of leading characters, interpreted as a rational. # # <em>Strings and Symbols</em> # diff --git a/doc/string/aref.rdoc b/doc/string/aref.rdoc new file mode 100644 index 0000000000..a9ab8857bc --- /dev/null +++ b/doc/string/aref.rdoc @@ -0,0 +1,96 @@ +Returns the substring of +self+ specified by the arguments. 
+ +<b>Form <tt>self[offset]</tt></b> + +With non-negative integer argument +offset+ given, +returns the 1-character substring found in self at character offset +offset+: + + 'hello'[0] # => "h" + 'hello'[4] # => "o" + 'hello'[5] # => nil + 'こんにちは'[4] # => "は" + +With negative integer argument +offset+ given, +counts backward from the end of +self+: + + 'hello'[-1] # => "o" + 'hello'[-5] # => "h" + 'hello'[-6] # => nil + +<b>Form <tt>self[offset, size]</tt></b> + +With integer arguments +offset+ and +size+ given, +returns a substring of size +size+ characters (as available) +beginning at character offset specified by +offset+. + +If argument +offset+ is non-negative, +the offset is +offset+: + + 'hello'[0, 1] # => "h" + 'hello'[0, 5] # => "hello" + 'hello'[0, 6] # => "hello" + 'hello'[2, 3] # => "llo" + 'hello'[2, 0] # => "" + 'hello'[2, -1] # => nil + +If argument +offset+ is negative, +counts backward from the end of +self+: + + 'hello'[-1, 1] # => "o" + 'hello'[-5, 5] # => "hello" + 'hello'[-1, 0] # => "" + 'hello'[-6, 5] # => nil + +Special case: if +offset+ equals the size of +self+, +returns a new empty string: + + 'hello'[5, 3] # => "" + +<b>Form <tt>self[range]</tt></b> + +With Range argument +range+ given, +forms substring <tt>self[range.start, range.size]</tt>: + + 'hello'[0..2] # => "hel" + 'hello'[0, 3] # => "hel" + + 'hello'[0...2] # => "he" + 'hello'[0, 2] # => "he" + + 'hello'[0, 0] # => "" + 'hello'[0...0] # => "" + +<b>Form <tt>self[regexp, capture = 0]</tt></b> + +With Regexp argument +regexp+ given and +capture+ as zero, +searches for a matching substring in +self+; +updates {Regexp-related global variables}[rdoc-ref:Regexp@Global+Variables]: + + 'hello'[/ell/] # => "ell" + 'hello'[/l+/] # => "ll" + 'hello'[//] # => "" + 'hello'[/nosuch/] # => nil + +With +capture+ as a positive integer +n+, +returns the +n+th matched group: + + 'hello'[/(h)(e)(l+)(o)/] # => "hello" + 'hello'[/(h)(e)(l+)(o)/, 1] # => "h" + $1 # => "h" + 'hello'[/(h)(e)(l+)(o)/, 2] 
# => "e" + $2 # => "e" + 'hello'[/(h)(e)(l+)(o)/, 3] # => "ll" + 'hello'[/(h)(e)(l+)(o)/, 4] # => "o" + 'hello'[/(h)(e)(l+)(o)/, 5] # => nil + +<b>Form <tt>self[substring]</tt></b> + +With string argument +substring+ given, +returns the matching substring of +self+, if found: + + 'hello'['ell'] # => "ell" + 'hello'[''] # => "" + 'hello'['nosuch'] # => nil + 'こんにちは'['んにち'] # => "んにち" + +Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/aset.rdoc b/doc/string/aset.rdoc new file mode 100644 index 0000000000..98c58b59cc --- /dev/null +++ b/doc/string/aset.rdoc @@ -0,0 +1,179 @@ +Returns +self+ with all, a substring, or none of its contents replaced; +returns the argument +other_string+. + +<b>Form <tt>self[index] = other_string</tt></b> + +With non-negative integer argument +index+ given, +searches for the 1-character substring found in self at character offset index: + + s = 'hello' + s[0] = 'foo' # => "foo" + s # => "fooello" + + s = 'hello' + s[4] = 'foo' # => "foo" + s # => "hellfoo" + + s = 'hello' + s[5] = 'foo' # => "foo" + s # => "hellofoo" + + s = 'hello' + s[6] = 'foo' # Raises IndexError: index 6 out of string. + +With negative integer argument +index+ given, +counts backward from the end of +self+: + + s = 'hello' + s[-1] = 'foo' # => "foo" + s # => "hellfoo" + + s = 'hello' + s[-5] = 'foo' # => "foo" + s # => "fooello" + + s = 'hello' + s[-6] = 'foo' # Raises IndexError: index -6 out of string. + +<b>Form <tt>self[start, length] = other_string</tt></b> + +With integer arguments +start+ and +length+ given, +searches for a substring of size +length+ characters (as available) +beginning at character offset specified by +start+. 
+ +If argument +start+ is non-negative, +the offset is +start+: + + s = 'hello' + s[0, 1] = 'foo' # => "foo" + s # => "fooello" + + s = 'hello' + s[0, 5] = 'foo' # => "foo" + s # => "foo" + + s = 'hello' + s[0, 9] = 'foo' # => "foo" + s # => "foo" + + s = 'hello' + s[2, 0] = 'foo' # => "foo" + s # => "hefoollo" + + s = 'hello' + s[2, -1] = 'foo' # Raises IndexError: negative length -1. + +If argument +start+ is negative, +counts backward from the end of +self+: + + s = 'hello' + s[-1, 1] = 'foo' # => "foo" + s # => "hellfoo" + + s = 'hello' + s[-1, 9] = 'foo' # => "foo" + s # => "hellfoo" + + s = 'hello' + s[-5, 2] = 'foo' # => "foo" + s # => "foollo" + + s = 'hello' + s[-3, 0] = 'foo' # => "foo" + s # => "hefoollo" + + s = 'hello' + s[-6, 2] = 'foo' # Raises IndexError: index -6 out of string. + +Special case: if +start+ equals the length of +self+, +the argument is appended to +self+: + + s = 'hello' + s[5, 3] = 'foo' # => "foo" + s # => "hellofoo" + +<b>Form <tt>self[range] = other_string</tt></b> + +With Range argument +range+ given, +equivalent to <tt>self[range.start, range.size] = other_string</tt>: + + s0 = 'hello' + s1 = 'hello' + s0[0..2] = 'foo' # => "foo" + s1[0, 3] = 'foo' # => "foo" + s0 # => "foolo" + s1 # => "foolo" + + s = 'hello' + s[0...2] = 'foo' # => "foo" + s # => "foollo" + + s = 'hello' + s[0...0] = 'foo' # => "foo" + s # => "foohello" + + s = 'hello' + s[9..10] = 'foo' # Raises RangeError: 9..10 out of range + +<b>Form <tt>self[regexp, capture = 0] = other_string</tt></b> + +With Regexp argument +regexp+ given and +capture+ as zero, +searches for a matching substring in +self+; +updates {Regexp-related global variables}[rdoc-ref:Regexp@Global+Variables]: + + s = 'hello' + s[/l/] = 'L' # => "L" + [$`, $&, $'] # => ["he", "l", "lo"] + s[/eLlo/] = 'owdy' # => "owdy" + [$`, $&, $'] # => ["h", "eLlo", ""] + s[/eLlo/] = 'owdy' # Raises IndexError: regexp not matched. 
+ [$`, $&, $'] # => [nil, nil, nil] + +With +capture+ as a positive integer +n+, +searches for the +n+th matched group: + + s = 'hello' + s[/(h)(e)(l+)(o)/] = 'foo' # => "foo" + [$`, $&, $'] # => ["", "hello", ""] + + s = 'hello' + s[/(h)(e)(l+)(o)/, 1] = 'foo' # => "foo" + s # => "fooello" + [$`, $&, $'] # => ["", "hello", ""] + + s = 'hello' + s[/(h)(e)(l+)(o)/, 2] = 'foo' # => "foo" + s # => "hfoollo" + [$`, $&, $'] # => ["", "hello", ""] + + s = 'hello' + s[/(h)(e)(l+)(o)/, 4] = 'foo' # => "foo" + s # => "hellfoo" + [$`, $&, $'] # => ["", "hello", ""] + + s = 'hello' + # => "hello" + s[/(h)(e)(l+)(o)/, 5] = 'foo # Raises IndexError: index 5 out of regexp. + + s = 'hello' + s[/nosuch/] = 'foo' # Raises IndexError: regexp not matched. + +<b>Form <tt>self[substring] = other_string</tt></b> + +With string argument +substring+ given: + + s = 'hello' + s['l'] = 'foo' # => "foo" + s # => "hefoolo" + + s = 'hello' + s['ll'] = 'foo' # => "foo" + s # => "hefooo" + + s = 'こんにちは' + s['んにち'] = 'foo' # => "foo" + s # => "こfooは" + + s['nosuch'] = 'foo' # Raises IndexError: string not matched. + +Related: see {Modifying}[rdoc-ref:String@Modifying]. diff --git a/doc/string/bytes.rdoc b/doc/string/bytes.rdoc index f4b071f630..16fa8e0bb0 100644 --- a/doc/string/bytes.rdoc +++ b/doc/string/bytes.rdoc @@ -1,8 +1,7 @@ Returns an array of the bytes in +self+: - 'hello'.bytes # => [104, 101, 108, 108, 111] - 'тест'.bytes # => [209, 130, 208, 181, 209, 129, 209, 130] + 'hello'.bytes # => [104, 101, 108, 108, 111] 'こんにちは'.bytes # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175] -Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non--5CString]. +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. 
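Note on the `bytes.rdoc` hunk above: the byte view it documents can be compared with the character view in a few lines. This is a minimal sketch, not part of the commit; the variable names are illustrative, and the round trip through `Array#pack` is an assumption of this note rather than something the diff describes.

```ruby
# Compare the byte view and the character view of the two strings
# used in the bytes.rdoc examples.
ascii = 'hello'
multi = 'こんにちは'

ascii_bytes = ascii.bytes   # one byte per character for ASCII text
multi_bytes = multi.bytes   # three bytes per character for these kana in UTF-8

# Rebuild the multi-byte string from its bytes. pack('C*') yields a binary
# string, so the encoding has to be restored explicitly.
rebuilt = multi_bytes.pack('C*').force_encoding(Encoding::UTF_8)

puts ascii_bytes.inspect
puts multi_bytes.size
puts rebuilt == multi
```

The byte count printed here is the same value `String#bytesize` reports for the string.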
diff --git a/doc/string/bytesize.rdoc b/doc/string/bytesize.rdoc index 5166dd7dc6..8d12a0d454 100644 --- a/doc/string/bytesize.rdoc +++ b/doc/string/bytesize.rdoc @@ -5,9 +5,6 @@ Note that the byte count may be different from the character count (returned by s = 'foo' s.bytesize # => 3 s.size # => 3 - s = 'тест' - s.bytesize # => 8 - s.size # => 4 s = 'こんにちは' s.bytesize # => 15 s.size # => 5 diff --git a/doc/string/bytesplice.rdoc b/doc/string/bytesplice.rdoc index 5689ef4a2b..790f9eb9a0 100644 --- a/doc/string/bytesplice.rdoc +++ b/doc/string/bytesplice.rdoc @@ -20,7 +20,7 @@ And either count may be zero (i.e., specifying an empty string): '0123456789'.bytesplice(0, 0, 'abc') # => "abc0123456789" # Empty target. In the second form, just as in the first, -arugments +offset+ and +length+ determine the target bytes; +arguments +offset+ and +length+ determine the target bytes; argument +str+ _contains_ the source bytes, and the additional arguments +str_offset+ and +str_length+ determine the actual source bytes: @@ -42,7 +42,7 @@ and the source bytes are all of the given +str+: '0123456789'.bytesplice(0...0, 'abc') # => "abc0123456789" # Empty target. In the fourth form, just as in the third, -arugment +range+ determines the target bytes; +argument +range+ determines the target bytes; argument +str+ _contains_ the source bytes, and the additional argument +str_range+ determines the actual source bytes: @@ -63,4 +63,3 @@ and so has character boundaries at offsets 0, 3, 6, 9, 12, and 15. 'こんにちは'.bytesplice(0, 3, 'abc') # => "abcんにちは" 'こんにちは'.bytesplice(1, 3, 'abc') # Raises IndexError. 'こんにちは'.bytesplice(0, 2, 'abc') # Raises IndexError. - diff --git a/doc/string/capitalize.rdoc b/doc/string/capitalize.rdoc new file mode 100644 index 0000000000..3a1a2dcb8b --- /dev/null +++ b/doc/string/capitalize.rdoc @@ -0,0 +1,26 @@ +Returns a string containing the characters in +self+, +each with possibly changed case: + +- The first character made uppercase. 
+- All other characters are made lowercase. + +Examples: + + 'hello'.capitalize # => "Hello" + 'HELLO'.capitalize # => "Hello" + 'straße'.capitalize # => "Straße" # Lowercase 'ß' not changed. + 'STRAẞE'.capitalize # => "Straße" # Uppercase 'ẞ' downcased to 'ß'. + +Some characters (and some character sets) do not have upcase and downcase versions; +see {Case Mapping}[rdoc-ref:case_mapping.rdoc]: + + s = '1, 2, 3, ...' + s.capitalize == s # => true + s = 'こんにちは' + s.capitalize == s # => true + +The casing is affected by the given +mapping+, +which may be +:ascii+, +:fold+, or +:turkic+; +see {Case Mappings}[rdoc-ref:case_mapping.rdoc@Case+Mappings]. + +Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/center.rdoc b/doc/string/center.rdoc index 343f6ba263..b86c8b5916 100644 --- a/doc/string/center.rdoc +++ b/doc/string/center.rdoc @@ -9,7 +9,6 @@ centered and padded on one or both ends with +pad_string+: 'hello'.center(20, '-|') # => "-|-|-|-hello-|-|-|-|" # Some padding repeated. 'hello'.center(10, 'abcdefg') # => "abhelloabc" # Some padding not used. ' hello '.center(13) # => " hello " - 'тест'.center(10) # => " тест " 'こんにちは'.center(10) # => " こんにちは " # Multi-byte characters. If +size+ is less than or equal to the size of +self+, returns an unpadded copy of +self+: diff --git a/doc/string/chars.rdoc b/doc/string/chars.rdoc index 094384271b..47fb01b43a 100644 --- a/doc/string/chars.rdoc +++ b/doc/string/chars.rdoc @@ -1,8 +1,7 @@ Returns an array of the characters in +self+: 'hello'.chars # => ["h", "e", "l", "l", "o"] - 'тест'.chars # => ["т", "е", "с", "т"] 'こんにちは'.chars # => ["こ", "ん", "に", "ち", "は"] ''.chars # => [] -Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non--5CString]. +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. 
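Note on the `chars.rdoc` hunk above: a short sketch, under the assumption that the strings are UTF-8 encoded, of how `String#chars` relates to `String#size` and `Array#join`. The variable names are illustrative and do not come from the commit.

```ruby
# String#chars splits by character, not by byte.
word = 'hello'
kana = 'こんにちは'

word_chars = word.chars
kana_chars = kana.chars

# The number of characters matches String#size, and joining the
# characters gives back a string equal to the original.
puts word_chars.inspect
puts kana_chars.size == kana.size
puts kana_chars.join == kana
puts ''.chars.empty?
```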
diff --git a/doc/string/chomp.rdoc b/doc/string/chomp.rdoc index 6ec7664f6b..4efff5c291 100644 --- a/doc/string/chomp.rdoc +++ b/doc/string/chomp.rdoc @@ -9,7 +9,6 @@ if they are <tt>"\r"</tt>, <tt>"\n"</tt>, or <tt>"\r\n"</tt> "abc\n".chomp # => "abc" "abc\r\n".chomp # => "abc" "abc\n\r".chomp # => "abc\n" - "тест\r\n".chomp # => "тест" "こんにちは\r\n".chomp # => "こんにちは" When +line_sep+ is <tt>''</tt> (an empty string), diff --git a/doc/string/chop.rdoc b/doc/string/chop.rdoc index 2c48e91129..d818ba467a 100644 --- a/doc/string/chop.rdoc +++ b/doc/string/chop.rdoc @@ -3,13 +3,11 @@ Returns a new string copied from +self+, with trailing characters possibly remov Removes <tt>"\r\n"</tt> if those are the last two characters. "abc\r\n".chop # => "abc" - "тест\r\n".chop # => "тест" "こんにちは\r\n".chop # => "こんにちは" Otherwise removes the last character if it exists. 'abcd'.chop # => "abc" - 'тест'.chop # => "тес" 'こんにちは'.chop # => "こんにち" ''.chop # => "" diff --git a/doc/string/chr.rdoc b/doc/string/chr.rdoc index 1ada3854cb..153d5d71c3 100644 --- a/doc/string/chr.rdoc +++ b/doc/string/chr.rdoc @@ -1,7 +1,6 @@ Returns a string containing the first character of +self+: 'hello'.chr # => "h" - 'тест'.chr # => "т" 'こんにちは'.chr # => "こ" ''.chr # => "" diff --git a/doc/string/codepoints.rdoc b/doc/string/codepoints.rdoc index d9586d2e0b..0ad866389e 100644 --- a/doc/string/codepoints.rdoc +++ b/doc/string/codepoints.rdoc @@ -2,8 +2,7 @@ Returns an array of the codepoints in +self+; each codepoint is the integer value for a character: 'hello'.codepoints # => [104, 101, 108, 108, 111] - 'тест'.codepoints # => [1090, 1077, 1089, 1090] 'こんにちは'.codepoints # => [12371, 12435, 12395, 12385, 12399] ''.codepoints # => [] -Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non--5CString]. +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. 
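Note on the `codepoints.rdoc` hunk above: codepoints can be turned back into a string in at least two ways. The sketch below assumes UTF-8 strings; the variable names are illustrative, and pairing `String#codepoints` with `Array#pack` and `String#concat` is this note's own illustration, not something the commit states.

```ruby
kana   = 'こんにちは'
points = kana.codepoints

# Array#pack with the 'U*' directive encodes each integer as a UTF-8 character.
from_pack = points.pack('U*')

# String#concat treats integer arguments as codepoints. Start from a
# mutable copy of the first two characters and append the remaining three.
from_concat = 'こん'.dup.concat(*points.drop(2))

puts points.inspect
puts from_pack == kana
puts from_concat == kana
```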
diff --git a/doc/string/concat.rdoc b/doc/string/concat.rdoc index 2ba0c714af..92ba664b8c 100644 --- a/doc/string/concat.rdoc +++ b/doc/string/concat.rdoc @@ -6,7 +6,6 @@ For each given object +object+ that is an integer, the value is considered a codepoint and converted to a character before concatenation: 'foo'.concat(32, 'bar', 32, 'baz') # => "foo bar baz" # Embeds spaces. - 'те'.concat(1089, 1090) # => "тест" 'こん'.concat(12395, 12385, 12399) # => "こんにちは" Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/count.rdoc b/doc/string/count.rdoc index 092c672d7d..7a3b9f1e21 100644 --- a/doc/string/count.rdoc +++ b/doc/string/count.rdoc @@ -9,10 +9,6 @@ returns the count of instances of that character: s.count('x') # => 0 s.count('') # => 0 - s = 'тест' - s.count('т') # => 2 - s.count('е') # => 1 - s = 'よろしくお願いします' s.count('よ') # => 1 s.count('し') # => 2 diff --git a/doc/string/delete.rdoc b/doc/string/delete.rdoc index e8ff4c0ae4..1827f177e6 100644 --- a/doc/string/delete.rdoc +++ b/doc/string/delete.rdoc @@ -10,10 +10,6 @@ removes all instances of that character: s.delete('x') # => "abracadabra" s.delete('') # => "abracadabra" - s = 'тест' - s.delete('т') # => "ес" - s.delete('е') # => "тст" - s = 'よろしくお願いします' s.delete('よ') # => "ろしくお願いします" s.delete('し') # => "よろくお願います" diff --git a/doc/string/delete_prefix.rdoc b/doc/string/delete_prefix.rdoc index 1135f3d19d..6255e300e3 100644 --- a/doc/string/delete_prefix.rdoc +++ b/doc/string/delete_prefix.rdoc @@ -4,7 +4,6 @@ Returns a copy of +self+ with leading substring +prefix+ removed: 'oof'.delete_prefix('oo') # => "f" 'oof'.delete_prefix('oof') # => "" 'oof'.delete_prefix('x') # => "oof" - 'тест'.delete_prefix('те') # => "ст" 'こんにちは'.delete_prefix('こん') # => "にちは" Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. 
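Note on the `delete_prefix.rdoc` hunk above: a small sketch contrasting the copying method with its mutating counterpart `String#delete_prefix!`. The variable names are illustrative; the bang variant is not touched by this hunk and is included here only for comparison.

```ruby
name = 'oof'

removed   = name.delete_prefix('oo')   # prefix present: it is stripped
unchanged = name.delete_prefix('x')    # prefix absent: content is unchanged

# The receiver itself is never modified by delete_prefix.
puts removed
puts unchanged
puts name

# The mutating variant works in place and returns nil when nothing changed.
copy = name.dup
bang_result = copy.delete_prefix!('x')
puts bang_result.inspect
```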
diff --git a/doc/string/delete_suffix.rdoc b/doc/string/delete_suffix.rdoc index 2fb70ce012..a4d9a80f85 100644 --- a/doc/string/delete_suffix.rdoc +++ b/doc/string/delete_suffix.rdoc @@ -5,7 +5,6 @@ Returns a copy of +self+ with trailing substring <tt>suffix</tt> removed: 'foo'.delete_suffix('foo') # => "" 'foo'.delete_suffix('f') # => "foo" 'foo'.delete_suffix('x') # => "foo" - 'тест'.delete_suffix('ст') # => "те" 'こんにちは'.delete_suffix('ちは') # => "こんに" Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/downcase.rdoc b/doc/string/downcase.rdoc index 0fb67daaeb..d5fffa037b 100644 --- a/doc/string/downcase.rdoc +++ b/doc/string/downcase.rdoc @@ -1,12 +1,20 @@ Returns a new string containing the downcased characters in +self+: - 'Hello, World!'.downcase # => "hello, world!" - 'ТЕСТ'.downcase # => "тест" - 'よろしくお願いします'.downcase # => "よろしくお願いします" + 'HELLO'.downcase # => "hello" + 'STRAẞE'.downcase # => "straße" + 'ПРИВЕТ'.downcase # => "привет" + 'RubyGems.org'.downcase # => "rubygems.org" -Some characters do not have upcased and downcased versions. +Some characters (and some character sets) do not have upcase and downcase versions; +see {Case Mapping}[rdoc-ref:case_mapping.rdoc]: -The casing may be affected by the given +mapping+; -see {Case Mapping}[rdoc-ref:case_mapping.rdoc]. + s = '1, 2, 3, ...' + s.downcase == s # => true + s = 'こんにちは' + s.downcase == s # => true + +The casing is affected by the given +mapping+, +which may be +:ascii+, +:fold+, or +:turkic+; +see {Case Mappings}[rdoc-ref:case_mapping.rdoc@Case+Mappings]. Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. 
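Note on the `downcase.rdoc` hunk above: the new text says the result depends on the `mapping` argument. The sketch below contrasts the default mapping with `:ascii`, which only affects the letters A to Z. Variable names are illustrative, and the `:fold` and `:turkic` mappings are deliberately left out; see the case mapping document for those.

```ruby
word = 'STRAẞE'

full_unicode = word.downcase           # default: full Unicode case mapping
ascii_only   = word.downcase(:ascii)   # only A-Z are downcased, 'ẞ' is kept

puts full_unicode
puts ascii_only

# Cyrillic text is untouched by the :ascii mapping.
cyrillic = 'ПРИВЕТ'
puts cyrillic.downcase(:ascii) == cyrillic

# Scripts without case distinctions come back unchanged.
kana = 'こんにちは'
puts kana.downcase == kana
```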
diff --git a/doc/string/dump.rdoc b/doc/string/dump.rdoc index a5ab0bb42f..7b688c28a6 100644 --- a/doc/string/dump.rdoc +++ b/doc/string/dump.rdoc @@ -1,52 +1,89 @@ -Returns a printable version of +self+, enclosed in double-quotes: +For an ordinary string, this method, +String#dump+, +returns a printable ASCII-only version of +self+, enclosed in double-quotes. - 'hello'.dump # => "\"hello\"" +For a dumped string, method String#undump is the inverse of +String#dump+; +it returns a "restored" version of +self+, +where all the dumping changes have been undone. -Certain special characters are rendered with escapes: +In the simplest case, the dumped string contains the original string, +enclosed in double-quotes; +this example is done in +irb+ (interactive Ruby), which uses method `inspect` to render the results: - '"'.dump # => "\"\\\"\"" - '\\'.dump # => "\"\\\\\"" + s = 'hello' # => "hello" + s.dump # => "\"hello\"" + s.dump.undump # => "hello" -Non-printing characters are rendered with escapes: +Keep in mind that in the second line above: + +- The outer double-quotes are put on by +inspect+, + and _are_ _not_ part of the output of #dump. +- The inner double-quotes _are_ part of the output of +dump+, + and are escaped by +inspect+ because they are within the outer double-quotes. 
+ +To avoid confusion, we'll use this helper method to omit the outer double-quotes: + + def dump(s) + print "String: ", s, "\n" + print "Dumped: ", s.dump, "\n" + print "Undumped: ", s.dump.undump, "\n" + end + +So that for string <tt>'hello'</tt>, we'll see: + + String: hello + Dumped: "hello" + Undumped: hello + +In a dump, certain special characters are escaped: + + String: " + Dumped: "\"" + Undumped: " + + String: \ + Dumped: "\\" + Undumped: \ + +In a dump, unprintable characters are replaced by printable ones; +the unprintable characters are the whitespace characters (other than space itself); +here we see the ordinals for those characters, together with explanatory text: + + h = { + 7 => 'Alert (BEL)', + 8 => 'Backspace (BS)', + 9 => 'Horizontal tab (HT)', + 10 => 'Linefeed (LF)', + 11 => 'Vertical tab (VT)', + 12 => 'Formfeed (FF)', + 13 => 'Carriage return (CR)' + } + +In this example, the dumped output is printed by method #inspect, +and so contains both outer double-quotes and escaped inner double-quotes: s = '' - s << 7 # Alarm (bell). - s << 8 # Back space. - s << 9 # Horizontal tab. - s << 10 # Line feed. - s << 11 # Vertical tab. - s << 12 # Form feed. - s << 13 # Carriage return. - s # => "\a\b\t\n\v\f\r" - s.dump # => "\"\\a\\b\\t\\n\\v\\f\\r\"" - -If +self+ is encoded in UTF-8 and contains Unicode characters, renders Unicode -characters in Unicode escape sequence: - - 'тест'.dump # => "\"\\u0442\\u0435\\u0441\\u0442\"" - 'こんにちは'.dump # => "\"\\u3053\\u3093\\u306B\\u3061\\u306F\"" - -If the encoding of +self+ is not ASCII-compatible (i.e., +self.encoding.ascii_compatible?+ -returns +false+), renders all ASCII-compatible bytes as ASCII characters and all -other bytes as hexadecimal. 
Appends <tt>.dup.force_encoding(\"encoding\")</tt>, where -<tt><encoding></tt> is +self.encoding.name+: - - s = 'hello' - s.encoding # => #<Encoding:UTF-8> - s.dump # => "\"hello\"" - s.encode('utf-16').dump # => "\"\\xFE\\xFF\\x00h\\x00e\\x00l\\x00l\\x00o\".dup.force_encoding(\"UTF-16\")" - s.encode('utf-16le').dump # => "\"h\\x00e\\x00l\\x00l\\x00o\\x00\".dup.force_encoding(\"UTF-16LE\")" - - s = 'тест' - s.encoding # => #<Encoding:UTF-8> - s.dump # => "\"\\u0442\\u0435\\u0441\\u0442\"" - s.encode('utf-16').dump # => "\"\\xFE\\xFF\\x04B\\x045\\x04A\\x04B\".dup.force_encoding(\"UTF-16\")" - s.encode('utf-16le').dump # => "\"B\\x045\\x04A\\x04B\\x04\".dup.force_encoding(\"UTF-16LE\")" - - s = 'こんにちは' - s.encoding # => #<Encoding:UTF-8> - s.dump # => "\"\\u3053\\u3093\\u306B\\u3061\\u306F\"" - s.encode('utf-16').dump # => "\"\\xFE\\xFF0S0\\x930k0a0o\".dup.force_encoding(\"UTF-16\")" - s.encode('utf-16le').dump # => "\"S0\\x930k0a0o0\".dup.force_encoding(\"UTF-16LE\")" - -Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. 
+ h.keys.each {|i| s << i } # => [7, 8, 9, 10, 11, 12, 13] + s # => "\a\b\t\n\v\f\r" + s.dump # => "\"\\a\\b\\t\\n\\v\\f\\r\"" + +If +self+ is encoded in UTF-8 and contains Unicode characters, +each Unicode character is dumped as a Unicode escape sequence: + + String: こんにちは + Dumped: "\u3053\u3093\u306B\u3061\u306F" + Undumped: こんにちは + +If the encoding of +self+ is not ASCII-compatible +(i.e., if <tt>self.encoding.ascii_compatible?</tt> returns +false+), +each ASCII-compatible byte is dumped as an ASCII character, +and all other bytes are dumped as hexadecimal; +also appends <tt>.dup.force_encoding(\"encoding\")</tt>, +where <tt><encoding></tt> is <tt>self.encoding.name</tt>: + + String: hello + Dumped: "\xFE\xFF\x00h\x00e\x00l\x00l\x00o".dup.force_encoding("UTF-16") + Undumped: hello + + String: こんにちは + Dumped: "\xFE\xFF0S0\x930k0a0o".dup.force_encoding("UTF-16") + Undumped: こんにちは diff --git a/doc/string/each_byte.rdoc b/doc/string/each_byte.rdoc index 1f1069863b..642d71e84b 100644 --- a/doc/string/each_byte.rdoc +++ b/doc/string/each_byte.rdoc @@ -5,9 +5,6 @@ returns +self+: 'hello'.each_byte {|byte| a.push(byte) } # Five 1-byte characters. a # => [104, 101, 108, 108, 111] a = [] - 'тест'.each_byte {|byte| a.push(byte) } # Four 2-byte characters. - a # => [209, 130, 208, 181, 209, 129, 209, 130] - a = [] 'こんにちは'.each_byte {|byte| a.push(byte) } # Five 3-byte characters. 
a # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175] diff --git a/doc/string/each_char.rdoc b/doc/string/each_char.rdoc index 5aa85b28ad..2dd56711d3 100644 --- a/doc/string/each_char.rdoc +++ b/doc/string/each_char.rdoc @@ -7,11 +7,6 @@ returns +self+: end a # => ["h", "e", "l", "l", "o"] a = [] - 'тест'.each_char do |char| - a.push(char) - end - a # => ["т", "е", "с", "т"] - a = [] 'こんにちは'.each_char do |char| a.push(char) end diff --git a/doc/string/each_codepoint.rdoc b/doc/string/each_codepoint.rdoc index 0e687082d3..8e4e7545e6 100644 --- a/doc/string/each_codepoint.rdoc +++ b/doc/string/each_codepoint.rdoc @@ -8,11 +8,6 @@ returns +self+: end a # => [104, 101, 108, 108, 111] a = [] - 'тест'.each_codepoint do |codepoint| - a.push(codepoint) - end - a # => [1090, 1077, 1089, 1090] - a = [] 'こんにちは'.each_codepoint do |codepoint| a.push(codepoint) end diff --git a/doc/string/each_grapheme_cluster.rdoc b/doc/string/each_grapheme_cluster.rdoc index 8bc6f78aaa..384cd6967d 100644 --- a/doc/string/each_grapheme_cluster.rdoc +++ b/doc/string/each_grapheme_cluster.rdoc @@ -9,12 +9,6 @@ returns +self+: a # => ["h", "e", "l", "l", "o"] a = [] - 'тест'.each_grapheme_cluster do |grapheme_cluster| - a.push(grapheme_cluster) - end - a # => ["т", "е", "с", "т"] - - a = [] 'こんにちは'.each_grapheme_cluster do |grapheme_cluster| a.push(grapheme_cluster) end diff --git a/doc/string/end_with_p.rdoc b/doc/string/end_with_p.rdoc index fcd9242122..9a95d74fde 100644 --- a/doc/string/end_with_p.rdoc +++ b/doc/string/end_with_p.rdoc @@ -4,7 +4,6 @@ Returns whether +self+ ends with any of the given +strings+: 'foo'.end_with?('bar', 'oo') # => true 'foo'.end_with?('bar', 'baz') # => false 'foo'.end_with?('') # => true - 'тест'.end_with?('т') # => true 'こんにちは'.end_with?('は') # => true Related: see {Querying}[rdoc-ref:String@Querying]. 
diff --git a/doc/string/getbyte.rdoc b/doc/string/getbyte.rdoc index ba1c06fd27..974e21c473 100644 --- a/doc/string/getbyte.rdoc +++ b/doc/string/getbyte.rdoc @@ -16,11 +16,8 @@ Returns +nil+ if +index+ is out of range: More examples: - s = 'тест' - s.bytes # => [209, 130, 208, 181, 209, 129, 209, 130] - s.getbyte(2) # => 208 s = 'こんにちは' s.bytes # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175] s.getbyte(2) # => 147 -Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non--5CString]. +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. diff --git a/doc/string/grapheme_clusters.rdoc b/doc/string/grapheme_clusters.rdoc index 07ea1e318b..ee8b45700e 100644 --- a/doc/string/grapheme_clusters.rdoc +++ b/doc/string/grapheme_clusters.rdoc @@ -16,4 +16,4 @@ Details: s.chars # => ["a", "̈"] # Two characters. s.chars.map {|char| char.ord } # => [97, 776] # Their values. -Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non--5CString]. +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. diff --git a/doc/string/index.rdoc b/doc/string/index.rdoc index cc34bc68e6..c3cff24dac 100644 --- a/doc/string/index.rdoc +++ b/doc/string/index.rdoc @@ -8,10 +8,9 @@ returns the index of the first matching substring in +self+: 'foo'.index('o') # => 1 'foo'.index('oo') # => 1 'foo'.index('ooo') # => nil - 'тест'.index('с') # => 2 # Characters, not bytes. 
'こんにちは'.index('ち') # => 3 -When +pattern is a Regexp, returns the index of the first match in +self+: +When +pattern+ is a Regexp, returns the index of the first match in +self+: 'foo'.index(/o./) # => 1 'foo'.index(/.o/) # => 0 @@ -24,9 +23,6 @@ the returned index is relative to the beginning of +self+: 'bar'.index('r', 2) # => 2 'bar'.index('r', 3) # => nil 'bar'.index(/[r-z]/, 0) # => 2 - 'тест'.index('с', 1) # => 2 - 'тест'.index('с', 2) # => 2 - 'тест'.index('с', 3) # => nil # Offset in characters, not bytes. 'こんにちは'.index('ち', 2) # => 3 With negative integer argument +offset+, selects the search position by counting backward diff --git a/doc/string/insert.rdoc b/doc/string/insert.rdoc index d8252d5ec5..73205f2069 100644 --- a/doc/string/insert.rdoc +++ b/doc/string/insert.rdoc @@ -5,7 +5,6 @@ If the given +index+ is non-negative, inserts +other_string+ at offset +index+: 'foo'.insert(0, 'bar') # => "barfoo" 'foo'.insert(1, 'bar') # => "fbaroo" 'foo'.insert(3, 'bar') # => "foobar" - 'тест'.insert(2, 'bar') # => "теbarст" # Characters, not bytes. 'こんにちは'.insert(2, 'bar') # => "こんbarにちは" If the +index+ is negative, counts backward from the end of +self+ diff --git a/doc/string/inspect.rdoc b/doc/string/inspect.rdoc new file mode 100644 index 0000000000..398a5a74c5 --- /dev/null +++ b/doc/string/inspect.rdoc @@ -0,0 +1,38 @@ +Returns a printable version of +self+, enclosed in double-quotes. + +Most printable characters are rendered simply as themselves: + + 'abc'.inspect # => "\"abc\"" + '012'.inspect # => "\"012\"" + ''.inspect # => "\"\"" + "\u000012".inspect # => "\"\\u000012\"" + 'こんにちは'.inspect # => "\"こんにちは\"" + +But the printable characters double-quote (<tt>'"'</tt>) and backslash (<tt>'\\'</tt>) are escaped: + + '"'.inspect # => "\"\\\"\"" + '\\'.inspect # => "\"\\\\\"" + +Unprintable characters are the {ASCII characters}[https://en.wikipedia.org/wiki/ASCII] +whose values are in range <tt>0..31</tt>, +along with the character whose value is +127+. 
+ +Most of these characters are rendered thus: + + 0.chr.inspect # => "\"\\x00\"" + 1.chr.inspect # => "\"\\x01\"" + 2.chr.inspect # => "\"\\x02\"" + # ... + +A few, however, have special renderings: + + 7.chr.inspect # => "\"\\a\"" # BEL + 8.chr.inspect # => "\"\\b\"" # BS + 9.chr.inspect # => "\"\\t\"" # TAB + 10.chr.inspect # => "\"\\n\"" # LF + 11.chr.inspect # => "\"\\v\"" # VT + 12.chr.inspect # => "\"\\f\"" # FF + 13.chr.inspect # => "\"\\r\"" # CR + 27.chr.inspect # => "\"\\e\"" # ESC + +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. diff --git a/doc/string/intern.rdoc b/doc/string/intern.rdoc new file mode 100644 index 0000000000..c82302b906 --- /dev/null +++ b/doc/string/intern.rdoc @@ -0,0 +1,8 @@ +Returns the Symbol object derived from +self+, +creating it if it did not already exist: + + 'foo'.intern # => :foo + 'こんにちは'.intern # => :こんにちは + +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. + diff --git a/doc/string/length.rdoc b/doc/string/length.rdoc index 544bca269f..eb68edb10c 100644 --- a/doc/string/length.rdoc +++ b/doc/string/length.rdoc @@ -1,12 +1,11 @@ Returns the count of characters (not bytes) in +self+: 'foo'.length # => 3 - 'тест'.length # => 4 - 'こんにちは'.length # => 5 + 'こんにちは'.length # => 5 Contrast with String#bytesize: 'foo'.bytesize # => 3 - 'тест'.bytesize # => 8 - 'こんにちは'.bytesize # => 15 + 'こんにちは'.bytesize # => 15 +Related: see {Querying}[rdoc-ref:String@Querying]. diff --git a/doc/string/ljust.rdoc b/doc/string/ljust.rdoc index 8e23c1fc8f..a8ca62ee76 100644 --- a/doc/string/ljust.rdoc +++ b/doc/string/ljust.rdoc @@ -1,16 +1,13 @@ -Returns a left-justified copy of +self+. 
- -If integer argument +size+ is greater than the size (in characters) of +self+, -returns a new string of length +size+ that is a copy of +self+, -left justified and padded on the right with +pad_string+: +Returns a copy of +self+, left-justified and, if necessary, right-padded with the +pad_string+: 'hello'.ljust(10) # => "hello " ' hello'.ljust(10) # => " hello " 'hello'.ljust(10, 'ab') # => "helloababa" - 'тест'.ljust(10) # => "тест " - 'こんにちは'.ljust(10) # => "こんにちは " + 'こんにちは'.ljust(10) # => "こんにちは " -If +size+ is not greater than the size of +self+, returns a copy of +self+: +If <tt>width <= self.length</tt>, returns a copy of +self+: 'hello'.ljust(5) # => "hello" - 'hello'.ljust(1) # => "hello" + 'hello'.ljust(1) # => "hello" # Does not truncate to width. + +Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/ord.rdoc b/doc/string/ord.rdoc index d586363d44..8c460d3ba4 100644 --- a/doc/string/ord.rdoc +++ b/doc/string/ord.rdoc @@ -2,5 +2,6 @@ Returns the integer ordinal of the first character of +self+: 'h'.ord # => 104 'hello'.ord # => 104 - 'тест'.ord # => 1090 'こんにちは'.ord # => 12371 + +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. diff --git a/doc/string/partition.rdoc b/doc/string/partition.rdoc index ebe575e8eb..b2e620a9fc 100644 --- a/doc/string/partition.rdoc +++ b/doc/string/partition.rdoc @@ -1,24 +1,43 @@ Returns a 3-element array of substrings of +self+. -Matches a pattern against +self+, scanning from the beginning. -The pattern is: +If +pattern+ is matched, returns the array: -- +string_or_regexp+ itself, if it is a Regexp. -- <tt>Regexp.quote(string_or_regexp)</tt>, if +string_or_regexp+ is a string. 
+ [pre_match, first_match, post_match] -If the pattern is matched, returns pre-match, first-match, post-match: +where: - 'hello'.partition('l') # => ["he", "l", "lo"] - 'hello'.partition('ll') # => ["he", "ll", "o"] - 'hello'.partition('h') # => ["", "h", "ello"] - 'hello'.partition('o') # => ["hell", "o", ""] - 'hello'.partition(/l+/) #=> ["he", "ll", "o"] - 'hello'.partition('') # => ["", "", "hello"] - 'тест'.partition('т') # => ["", "т", "ест"] - 'こんにちは'.partition('に') # => ["こん", "に", "ちは"] +- +first_match+ is the first-found matching substring. +- +pre_match+ and +post_match+ are the preceding and following substrings. -If the pattern is not matched, returns a copy of +self+ and two empty strings: +If +pattern+ is not matched, returns the array: - 'hello'.partition('x') # => ["hello", "", ""] + [self.dup, "", ""] -Related: String#rpartition, String#split. +Note that in the examples below, a returned string <tt>'hello'</tt> +is a copy of +self+, not +self+. + +If +pattern+ is a Regexp, performs the equivalent of <tt>self.match(pattern)</tt> +(also setting {matched-data variables}[rdoc-ref:language/globals.md@Matched+Data]): + + 'hello'.partition(/h/) # => ["", "h", "ello"] + 'hello'.partition(/l/) # => ["he", "l", "lo"] + 'hello'.partition(/l+/) # => ["he", "ll", "o"] + 'hello'.partition(/o/) # => ["hell", "o", ""] + 'hello'.partition(/^/) # => ["", "", "hello"] + 'hello'.partition(//) # => ["", "", "hello"] + 'hello'.partition(/$/) # => ["hello", "", ""] + 'hello'.partition(/x/) # => ["hello", "", ""] + +If +pattern+ is not a Regexp, converts it to a string (if it is not already one), +then performs the equivalent of <tt>self.index(pattern)</tt> +(and does _not_ set {matched-data global variables}[rdoc-ref:language/globals.md@Matched+Data]): + + 'hello'.partition('h') # => ["", "h", "ello"] + 'hello'.partition('l') # => ["he", "l", "lo"] + 'hello'.partition('ll') # => ["he", "ll", "o"] + 'hello'.partition('o') # => ["hell", "o", ""] + 'hello'.partition('') # 
=> ["", "", "hello"] + 'hello'.partition('x') # => ["hello", "", ""] + 'こんにちは'.partition('に') # => ["こん", "に", "ちは"] + +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. diff --git a/doc/string/rindex.rdoc b/doc/string/rindex.rdoc new file mode 100644 index 0000000000..2b81c3716d --- /dev/null +++ b/doc/string/rindex.rdoc @@ -0,0 +1,51 @@ +Returns the integer position of the _last_ substring that matches the given argument +pattern+, +or +nil+ if none found. + +When +pattern+ is a string, returns the index of the last matching substring in +self+: + + 'foo'.rindex('f') # => 0 + 'foo'.rindex('o') # => 2 + 'foo'.rindex('oo') # => 1 + 'foo'.rindex('ooo') # => nil + 'こんにちは'.rindex('ち') # => 3 + +When +pattern+ is a Regexp, returns the index of the last match in +self+: + + 'foo'.rindex(/f/) # => 0 + 'foo'.rindex(/o/) # => 2 + 'foo'.rindex(/oo/) # => 1 + 'foo'.rindex(/ooo/) # => nil + +When +offset+ is non-negative, it specifies the maximum position in +self+ +at which a match may start: + + 'foo'.rindex('o', 0) # => nil + 'foo'.rindex('o', 1) # => 1 + 'foo'.rindex('o', 2) # => 2 + 'foo'.rindex('o', 3) # => 2 + +With negative integer argument +offset+, +selects the search position by counting backward from the end of +self+: + + 'foo'.rindex('o', -1) # => 2 + 'foo'.rindex('o', -2) # => 1 + 'foo'.rindex('o', -3) # => nil + 'foo'.rindex('o', -4) # => nil + +The last match is the one that starts at the last possible position, +which is not necessarily the longest match: + + 'foo'.rindex(/o+/) # => 2 + $~ # => #<MatchData "o"> + +To get the last longest match, combine with negative lookbehind: + + 'foo'.rindex(/(?<!o)o+/) # => 1 + $~ # => #<MatchData "oo"> + +Or use String#index with negative lookahead: + + 'foo'.index(/o+(?!.*o)/) # => 1 + $~ # => #<MatchData "oo"> + +Related: see {Querying}[rdoc-ref:String@Querying]. 
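Reviewer note (not part of the patch): the rindex.rdoc text above distinguishes the match that starts at the last possible position from the last longest match, and describes how a non-negative or negative offset bounds where a match may start. The sketch below works through both points on a longer string; last_run is a hypothetical helper written only for this illustration, built on the same negative-lookbehind idea as the documented example.

```ruby
# Illustrative only; not part of the patch.
s = 'hello world hello'

s.rindex('hello')      # last occurrence starts at index 12
s.rindex('hello', 11)  # matches may start no later than index 11, so index 0
s.rindex('hello', -6)  # -6 counts back from the end (17 - 6 = 11), so index 0
s.rindex(/l+/)         # starts at the last possible position, index 15

# last_run is a hypothetical helper: it returns [index, text] for the last
# complete run of +char+ in +str+, or nil when +char+ does not occur.
def last_run(str, char)
  c = Regexp.escape(char)
  index = str.rindex(/(?<!#{c})#{c}+/) # lookbehind forces the run's start
  index && [index, Regexp.last_match(0)]
end

p last_run(s, 'l')     # the final "ll" run, starting at index 14
p last_run('foo', 'o') # the "oo" run, starting at index 1
```

The lookbehind rejects any starting position that is preceded by the same character, so the search settles on the beginning of the run instead of its last character.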
diff --git a/doc/string/rjust.rdoc b/doc/string/rjust.rdoc index 24e7bf3159..acd3f198d4 100644 --- a/doc/string/rjust.rdoc +++ b/doc/string/rjust.rdoc @@ -1,16 +1,17 @@ Returns a right-justified copy of +self+. -If integer argument +size+ is greater than the size (in characters) of +self+, -returns a new string of length +size+ that is a copy of +self+, +If integer argument +width+ is greater than the size (in characters) of +self+, +returns a new string of length +width+ that is a copy of +self+, right justified and padded on the left with +pad_string+: 'hello'.rjust(10) # => " hello" 'hello '.rjust(10) # => " hello " 'hello'.rjust(10, 'ab') # => "ababahello" - 'тест'.rjust(10) # => " тест" 'こんにちは'.rjust(10) # => " こんにちは" -If +size+ is not greater than the size of +self+, returns a copy of +self+: +If <tt>width <= self.size</tt>, returns a copy of +self+: 'hello'.rjust(5, 'ab') # => "hello" 'hello'.rjust(1, 'ab') # => "hello" + +Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/rpartition.rdoc b/doc/string/rpartition.rdoc index d24106fb9f..add95b1f40 100644 --- a/doc/string/rpartition.rdoc +++ b/doc/string/rpartition.rdoc @@ -1,24 +1,47 @@ Returns a 3-element array of substrings of +self+. -Matches a pattern against +self+, scanning backwards from the end. -The pattern is: +Searches +self+ for a match of +pattern+, seeking the _last_ match. -- +string_or_regexp+ itself, if it is a Regexp. -- <tt>Regexp.quote(string_or_regexp)</tt>, if +string_or_regexp+ is a string. 
+If +pattern+ is not matched, returns the array: -If the pattern is matched, returns pre-match, last-match, post-match: + ["", "", self.dup] - 'hello'.rpartition('l') # => ["hel", "l", "o"] - 'hello'.rpartition('ll') # => ["he", "ll", "o"] - 'hello'.rpartition('h') # => ["", "h", "ello"] - 'hello'.rpartition('o') # => ["hell", "o", ""] - 'hello'.rpartition(/l+/) # => ["hel", "l", "o"] - 'hello'.rpartition('') # => ["hello", "", ""] - 'тест'.rpartition('т') # => ["тес", "т", ""] - 'こんにちは'.rpartition('に') # => ["こん", "に", "ちは"] +If +pattern+ is matched, returns the array: -If the pattern is not matched, returns two empty strings and a copy of +self+: + [pre_match, last_match, post_match] - 'hello'.rpartition('x') # => ["", "", "hello"] +where: -Related: String#partition, String#split. +- +last_match+ is the last-found matching substring. +- +pre_match+ and +post_match+ are the preceding and following substrings. + +The pattern used is: + +- +pattern+ itself, if it is a Regexp. +- <tt>Regexp.quote(pattern)</tt>, if +pattern+ is a string. + +Note that in the examples below, a returned string <tt>'hello'</tt> is a copy of +self+, not +self+. 
+ +If +pattern+ is a Regexp, searches for the last matching substring +(also setting {matched-data global variables}[rdoc-ref:language/globals.md@Matched+Data]): + + 'hello'.rpartition(/l/) # => ["hel", "l", "o"] + 'hello'.rpartition(/ll/) # => ["he", "ll", "o"] + 'hello'.rpartition(/h/) # => ["", "h", "ello"] + 'hello'.rpartition(/o/) # => ["hell", "o", ""] + 'hello'.rpartition(//) # => ["hello", "", ""] + 'hello'.rpartition(/x/) # => ["", "", "hello"] + 'こんにちは'.rpartition(/に/) # => ["こん", "に", "ちは"] + +If +pattern+ is not a Regexp, converts it to a string (if it is not already one), +then searches for the last matching substring +(and does _not_ set {matched-data global variables}[rdoc-ref:language/globals.md@Matched+Data]): + + 'hello'.rpartition('l') # => ["hel", "l", "o"] + 'hello'.rpartition('ll') # => ["he", "ll", "o"] + 'hello'.rpartition('h') # => ["", "h", "ello"] + 'hello'.rpartition('o') # => ["hell", "o", ""] + 'hello'.rpartition('') # => ["hello", "", ""] + 'こんにちは'.rpartition('に') # => ["こん", "に", "ちは"] + +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. diff --git a/doc/string/scan.rdoc b/doc/string/scan.rdoc new file mode 100644 index 0000000000..d39b5b6dfa --- /dev/null +++ b/doc/string/scan.rdoc @@ -0,0 +1,35 @@ +Matches a pattern against +self+: + +- If +pattern+ is a Regexp, the pattern used is +pattern+ itself. +- If +pattern+ is a string, the pattern used is <tt>Regexp.quote(pattern)</tt>. + +Generates a collection of matching results +and updates {regexp-related global variables}[rdoc-ref:Regexp@Global+Variables]: + +- If the pattern contains no groups, each result is a matched substring. +- If the pattern contains groups, each result is an array + containing a matched substring for each group. 
+ +With no block given, returns an array of the results: + + 'cruel world'.scan(/\w+/) # => ["cruel", "world"] + 'cruel world'.scan(/.../) # => ["cru", "el ", "wor"] + 'cruel world'.scan(/(...)/) # => [["cru"], ["el "], ["wor"]] + 'cruel world'.scan(/(..)(..)/) # => [["cr", "ue"], ["l ", "wo"]] + 'こんにちは'.scan(/../) # => ["こん", "にち"] + 'abracadabra'.scan('ab') # => ["ab", "ab"] + 'abracadabra'.scan('nosuch') # => [] + +With a block given, calls the block with each result; returns +self+: + + 'cruel world'.scan(/\w+/) {|w| p w } + # => "cruel" + # => "world" + 'cruel world'.scan(/(.)(.)/) {|x, y| p [x, y] } + # => ["c", "r"] + # => ["u", "e"] + # => ["l", " "] + # => ["w", "o"] + # => ["r", "l"] + +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. diff --git a/doc/string/scrub.rdoc b/doc/string/scrub.rdoc index 1a5b1c79d0..314b28c465 100644 --- a/doc/string/scrub.rdoc +++ b/doc/string/scrub.rdoc @@ -1,25 +1,22 @@ Returns a copy of +self+ with each invalid byte sequence replaced by the given +replacement_string+. 
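Reviewer note (not part of the patch): the scan.rdoc text above says that a string pattern is treated as Regexp.quote(pattern) and that a pattern with groups yields one array per match. A small sketch of both behaviors follows; parse_pairs is a hypothetical helper and the sample input is invented for illustration.

```ruby
# Illustrative only; not part of the patch.
# parse_pairs is a hypothetical helper: with two groups in the pattern,
# scan returns [key, value] arrays, which convert directly to a Hash.
def parse_pairs(text)
  text.scan(/(\w+)=(\w+)/).to_h
end

p parse_pairs('host=example port=8080')

# A string pattern is matched literally, so '.' finds only real dots,
# while the Regexp /./ matches every character.
p '1.2.3'.scan('.')
p '1.2.3'.scan(/./)

# With a block, scan yields each result and returns the receiver.
returned = 'a1 b2'.scan(/(\w)(\d)/) {|letter, digit| puts "#{letter} -> #{digit}" }
p returned
```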
-With no block given and no argument, replaces each invalid sequence -with the default replacement string -(<tt>"�"</tt> for a Unicode encoding, <tt>'?'</tt> otherwise): +With no block given, replaces each invalid sequence +with the given +default_replacement_string+ +(by default, <tt>"�"</tt> for a Unicode encoding, <tt>'?'</tt> otherwise): - s = "foo\x81\x81bar" - s.scrub # => "foo��bar" + "foo\x81\x81bar".scrub # => "foo��bar" + "foo\x81\x81bar".force_encoding('US-ASCII').scrub # => "foo??bar" + "foo\x81\x81bar".scrub('xyzzy') # => "fooxyzzyxyzzybar" -With no block given and argument +replacement_string+ given, -replaces each invalid sequence with that string: +With a block given, calls the block with each invalid sequence, +and replaces that sequence with the return value of the block: - "foo\x81\x81bar".scrub('xyzzy') # => "fooxyzzyxyzzybar" + "foo\x81\x81bar".scrub {|sequence| p sequence; 'XYZZY' } # => "fooXYZZYXYZZYbar" -With a block given, replaces each invalid sequence with the value -of the block: - - "foo\x81\x81bar".scrub {|bytes| p bytes; 'XYZZY' } - # => "fooXYZZYXYZZYbar" - -Output: +Output : "\x81" "\x81" + +Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/split.rdoc b/doc/string/split.rdoc index 131c14b83f..8679149003 100644 --- a/doc/string/split.rdoc +++ b/doc/string/split.rdoc @@ -1,99 +1,101 @@ -Returns an array of substrings of +self+ -that are the result of splitting +self+ +Creates an array of substrings by splitting +self+ at each occurrence of the given field separator +field_sep+. -When +field_sep+ is <tt>$;</tt>: +With no arguments given, +splits using the field separator <tt>$;</tt>, +whose default value is +nil+. -- If <tt>$;</tt> is +nil+ (its default value), - the split occurs just as if +field_sep+ were given as a space character - (see below). 
+With no block given, returns the array of substrings: -- If <tt>$;</tt> is a string, - the split occurs just as if +field_sep+ were given as that string - (see below). + 'abracadabra'.split('a') # => ["", "br", "c", "d", "br"] -When +field_sep+ is <tt>' '</tt> and +limit+ is +0+ (its default value), -the split occurs at each sequence of whitespace: +When +field_sep+ is +nil+ or <tt>' '</tt> (a single space), +splits at each sequence of whitespace: - 'abc def ghi'.split(' ') # => ["abc", "def", "ghi"] - "abc \n\tdef\t\n ghi".split(' ') # => ["abc", "def", "ghi"] - 'abc def ghi'.split(' ') # => ["abc", "def", "ghi"] + 'foo bar baz'.split(nil) # => ["foo", "bar", "baz"] + 'foo bar baz'.split(' ') # => ["foo", "bar", "baz"] + "foo \n\tbar\t\n baz".split(' ') # => ["foo", "bar", "baz"] + 'foo bar baz'.split(' ') # => ["foo", "bar", "baz"] ''.split(' ') # => [] -When +field_sep+ is a string different from <tt>' '</tt> -and +limit+ is +0+, -the split occurs at each occurrence of +field_sep+; -trailing empty substrings are not returned: +When +field_sep+ is an empty string, +splits at every character: - 'abracadabra'.split('ab') # => ["", "racad", "ra"] - 'aaabcdaaa'.split('a') # => ["", "", "", "bcd"] - ''.split('a') # => [] - '3.14159'.split('1') # => ["3.", "4", "59"] - '!@#$%^$&*($)_+'.split('$') # => ["!@#", "%^", "&*(", ")_+"] - 'тест'.split('т') # => ["", "ес"] - 'こんにちは'.split('に') # => ["こん", "ちは"] + 'abracadabra'.split('') # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"] + ''.split('') # => [] + 'こんにちは'.split('') # => ["こ", "ん", "に", "ち", "は"] -When +field_sep+ is a Regexp and +limit+ is +0+, -the split occurs at each occurrence of a match; -trailing empty substrings are not returned: +When +field_sep+ is a non-empty string and different from <tt>' '</tt> (a single space), +uses that string as the separator: + + 'abracadabra'.split('a') # => ["", "br", "c", "d", "br"] + 'abracadabra'.split('ab') # => ["", "racad", "ra"] + ''.split('a') # => [] + 
'こんにちは'.split('に') # => ["こん", "ちは"] + +When +field_sep+ is a Regexp, +splits at each occurrence of a matching substring: 'abracadabra'.split(/ab/) # => ["", "racad", "ra"] - 'aaabcdaaa'.split(/a/) # => ["", "", "", "bcd"] - 'aaabcdaaa'.split(//) # => ["a", "a", "a", "b", "c", "d", "a", "a", "a"] '1 + 1 == 2'.split(/\W+/) # => ["1", "1", "2"] + 'abracadabra'.split(//) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"] -If the \Regexp contains groups, their matches are also included +If the \Regexp contains groups, their matches are included in the returned array: '1:2:3'.split(/(:)()()/, 2) # => ["1", ":", "", "", "2:3"] -As seen above, if +limit+ is +0+, -trailing empty substrings are not returned: +Argument +limit+ sets a limit on the size of the returned array; +it also determines whether trailing empty strings are included in the returned array. - 'aaabcdaaa'.split('a') # => ["", "", "", "bcd"] +When +limit+ is zero, +there is no limit on the size of the array, +but trailing empty strings are omitted: -If +limit+ is positive integer +n+, no more than <tt>n - 1-</tt> -splits occur, so that at most +n+ substrings are returned, -and trailing empty substrings are included: + 'abracadabra'.split('', 0) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"] + 'abracadabra'.split('a', 0) # => ["", "br", "c", "d", "br"] # Empty string after last 'a' omitted. - 'aaabcdaaa'.split('a', 1) # => ["aaabcdaaa"] - 'aaabcdaaa'.split('a', 2) # => ["", "aabcdaaa"] - 'aaabcdaaa'.split('a', 5) # => ["", "", "", "bcd", "aa"] - 'aaabcdaaa'.split('a', 7) # => ["", "", "", "bcd", "", "", ""] - 'aaabcdaaa'.split('a', 8) # => ["", "", "", "bcd", "", "", ""] +When +limit+ is a positive integer, +there is a limit on the size of the array (no more than <tt>limit - 1</tt> splits occur), +and trailing empty strings are included: -Note that if +field_sep+ is a \Regexp containing groups, -their matches are in the returned array, but do not count toward the limit. 
+ 'abracadabra'.split('', 3) # => ["a", "b", "racadabra"] + 'abracadabra'.split('a', 3) # => ["", "br", "cadabra"] + 'abracadabra'.split('', 30) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a", ""] + 'abracadabra'.split('a', 30) # => ["", "br", "c", "d", "br", ""] + 'abracadabra'.split('', 1) # => ["abracadabra"] + 'abracadabra'.split('a', 1) # => ["abracadabra"] -If +limit+ is negative, it behaves the same as if +limit+ was zero, -meaning that there is no limit, -and trailing empty substrings are included: +When +limit+ is negative, +there is no limit on the size of the array, +and trailing empty strings are included: - 'aaabcdaaa'.split('a', -1) # => ["", "", "", "bcd", "", "", ""] + 'abracadabra'.split('', -1) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a", ""] + 'abracadabra'.split('a', -1) # => ["", "br", "c", "d", "br", ""] If a block is given, it is called with each substring and returns +self+: - 'abc def ghi'.split(' ') {|substring| p substring } + 'foo bar baz'.split(' ') {|substring| p substring } + +Output : + + "foo" + "bar" + "baz" -Output: +Note that the above example is functionally equivalent to: - "abc" - "def" - "ghi" - => "abc def ghi" + 'foo bar baz'.split(' ').each {|substring| p substring } -Note that the above example is functionally the same as calling +#each+ after -+#split+ and giving the same block. However, the above example has better -performance because it avoids the creation of an intermediate array. Also, -note the different return values. +Output : - 'abc def ghi'.split(' ').each {|substring| p substring } + "foo" + "bar" + "baz" -Output: +But the latter: - "abc" - "def" - "ghi" - => ["abc", "def", "ghi"] +- Has poorer performance because it creates an intermediate array. +- Returns an array (instead of +self+). -Related: String#partition, String#rpartition. +Related: see {Converting to Non-String}[rdoc-ref:String@Converting+to+Non-String]. 
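Reviewer note (not part of the patch): the split.rdoc hunk above describes three regimes for limit (zero, positive, negative). Consistent with the -1 examples in that hunk, a negative limit keeps trailing empty strings, as a large positive limit does. The sketch below puts the regimes side by side on one input; split_with_limits is a hypothetical helper and the sample line is invented for illustration.

```ruby
# Illustrative only; not part of the patch.
# split_with_limits is a hypothetical helper that maps each limit to the
# array produced by String#split with that limit.
def split_with_limits(line, sep, limits)
  limits.to_h {|limit| [limit, line.split(sep, limit)] }
end

results = split_with_limits('a,b,,', ',', [0, 2, 10, -1])
results.each {|limit, fields| puts "limit=#{limit}: #{fields.inspect}" }
```

On this input, limit 0 drops the two trailing empty fields, limit 2 stops after one split and leaves the rest of the string in the last element, and limits 10 and -1 both return all four fields.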
diff --git a/doc/string/squeeze.rdoc b/doc/string/squeeze.rdoc new file mode 100644 index 0000000000..1a38c08b32 --- /dev/null +++ b/doc/string/squeeze.rdoc @@ -0,0 +1,33 @@ +Returns a copy of +self+ with each tuple (doubling, tripling, etc.) of specified characters +"squeezed" down to a single character. + +The tuples to be squeezed are specified by arguments +selectors+, +each of which is a string; +see {Character Selectors}[rdoc-ref:character_selectors.rdoc@Character+Selectors]. + +A single argument may be a single character: + + 'Noooooo!'.squeeze('o') # => "No!" + 'foo bar baz'.squeeze(' ') # => "foo bar baz" + 'Mississippi'.squeeze('s') # => "Misisippi" + 'Mississippi'.squeeze('p') # => "Mississipi" + 'Mississippi'.squeeze('x') # => "Mississippi" # Unused selector character is ignored. + 'бессонница'.squeeze('с') # => "бесонница" + 'бессонница'.squeeze('н') # => "бессоница" + +A single argument may be a string of characters: + + 'Mississippi'.squeeze('sp') # => "Misisipi" + 'Mississippi'.squeeze('ps') # => "Misisipi" # Order doesn't matter. + 'Mississippi'.squeeze('nonsense') # => "Misisippi" # Unused selector characters are ignored. + +A single argument may be a range of characters: + + 'Mississippi'.squeeze('a-p') # => "Mississipi" + 'Mississippi'.squeeze('q-z') # => "Misisippi" + 'Mississippi'.squeeze('a-z') # => "Misisipi" + +Multiple arguments are allowed; +see {Multiple Character Selectors}[rdoc-ref:character_selectors.rdoc@Multiple+Character+Selectors]. + +Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/start_with_p.rdoc b/doc/string/start_with_p.rdoc index 5d1f9f9543..f78edc7fa3 100644 --- a/doc/string/start_with_p.rdoc +++ b/doc/string/start_with_p.rdoc @@ -1,10 +1,9 @@ -Returns whether +self+ starts with any of the given +string_or_regexp+. +Returns whether +self+ starts with any of the given +patterns+. -Matches patterns against the beginning of +self+. 
-For each given +string_or_regexp+, the pattern is: +For each argument, the pattern used is: -- +string_or_regexp+ itself, if it is a Regexp. -- <tt>Regexp.quote(string_or_regexp)</tt>, if +string_or_regexp+ is a string. +- The pattern itself, if it is a Regexp. +- <tt>Regexp.quote(pattern)</tt>, if it is a string. Returns +true+ if any pattern matches the beginning, +false+ otherwise: @@ -12,7 +11,6 @@ Returns +true+ if any pattern matches the beginning, +false+ otherwise: 'hello'.start_with?(/H/i) # => true 'hello'.start_with?('heaven', 'hell') # => true 'hello'.start_with?('heaven', 'paradise') # => false - 'тест'.start_with?('т') # => true 'こんにちは'.start_with?('こ') # => true -Related: String#end_with?. +Related: see {Querying}[rdoc-ref:String@Querying]. diff --git a/doc/string/sub.rdoc b/doc/string/sub.rdoc new file mode 100644 index 0000000000..ff051ea177 --- /dev/null +++ b/doc/string/sub.rdoc @@ -0,0 +1,33 @@ +Returns a copy of self, possibly with a substring replaced. + +Argument +pattern+ may be a string or a Regexp; +argument +replacement+ may be a string or a Hash. + +Varying types for the argument values makes this method very versatile. + +Below are some simple examples; for many more examples, +see {Substitution Methods}[rdoc-ref:String@Substitution+Methods]. + +With arguments +pattern+ and string +replacement+ given, +replaces the first matching substring with the given replacement string: + + s = 'abracadabra' # => "abracadabra" + s.sub('bra', 'xyzzy') # => "axyzzycadabra" + s.sub(/bra/, 'xyzzy') # => "axyzzycadabra" + s.sub('nope', 'xyzzy') # => "abracadabra" + +With arguments +pattern+ and hash +replacement+ given, +replaces the first matching substring with a value from the given replacement hash, or removes it: + + h = {'a' => 'A', 'b' => 'B', 'c' => 'C'} + s.sub('b', h) # => "aBracadabra" + s.sub(/b/, h) # => "aBracadabra" + s.sub(/d/, h) # => "abracaabra" # 'd' removed. 
+ +With argument +pattern+ and a block given, +calls the block with each matching substring; +replaces that substring with the block’s return value: + + s.sub('b') {|match| match.upcase } # => "aBracadabra" + +Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/succ.rdoc b/doc/string/succ.rdoc new file mode 100644 index 0000000000..1b4b936a8e --- /dev/null +++ b/doc/string/succ.rdoc @@ -0,0 +1,52 @@ +Returns the successor to +self+. The successor is calculated by +incrementing characters. + +The first character to be incremented is the rightmost alphanumeric: +or, if no alphanumerics, the rightmost character: + + 'THX1138'.succ # => "THX1139" + '<<koala>>'.succ # => "<<koalb>>" + '***'.succ # => '**+' + 'こんにちは'.succ # => "こんにちば" + +The successor to a digit is another digit, "carrying" to the next-left +character for a "rollover" from 9 to 0, and prepending another digit +if necessary: + + '00'.succ # => "01" + '09'.succ # => "10" + '99'.succ # => "100" + +The successor to a letter is another letter of the same case, +carrying to the next-left character for a rollover, +and prepending another same-case letter if necessary: + + 'aa'.succ # => "ab" + 'az'.succ # => "ba" + 'zz'.succ # => "aaa" + 'AA'.succ # => "AB" + 'AZ'.succ # => "BA" + 'ZZ'.succ # => "AAA" + +The successor to a non-alphanumeric character is the next character +in the underlying character set's collating sequence, +carrying to the next-left character for a rollover, +and prepending another character if necessary: + + s = 0.chr * 3 # => "\x00\x00\x00" + s.succ # => "\x00\x00\x01" + s = 255.chr * 3 # => "\xFF\xFF\xFF" + s.succ # => "\x01\x00\x00\x00" + +Carrying can occur between and among mixtures of alphanumeric characters: + + s = 'zz99zz99' # => "zz99zz99" + s.succ # => "aaa00aa00" + s = '99zz99zz' # => "99zz99zz" + s.succ # => "100aa00aa" + +The successor to an empty +String+ is a new empty +String+: + + ''.succ # => "" + +Related: see 
{Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/sum.rdoc b/doc/string/sum.rdoc index 5de24e6402..22045e5f4d 100644 --- a/doc/string/sum.rdoc +++ b/doc/string/sum.rdoc @@ -1,11 +1,12 @@ -Returns a basic +n+-bit checksum of the characters in +self+; +Returns a basic +n+-bit {checksum}[https://en.wikipedia.org/wiki/Checksum] of the characters in +self+; the checksum is the sum of the binary value of each byte in +self+, modulo <tt>2**n - 1</tt>: 'hello'.sum # => 532 'hello'.sum(4) # => 4 'hello'.sum(64) # => 532 - 'тест'.sum # => 1405 'こんにちは'.sum # => 2582 This is not a particularly strong checksum. + +Related: see {Querying}[rdoc-ref:String@Querying]. diff --git a/doc/string/swapcase.rdoc b/doc/string/swapcase.rdoc new file mode 100644 index 0000000000..4353c8528a --- /dev/null +++ b/doc/string/swapcase.rdoc @@ -0,0 +1,31 @@ +Returns a string containing the characters in +self+, with cases reversed: + +- Each uppercase character is downcased. +- Each lowercase character is upcased. + +Examples: + + 'Hello'.swapcase # => "hELLO" + 'Straße'.swapcase # => "sTRASSE" + 'RubyGems.org'.swapcase # => "rUBYgEMS.ORG" + +The sizes of +self+ and the upcased result may differ: + + s = 'Straße' + s.size # => 6 + s.swapcase # => "sTRASSE" + s.swapcase.size # => 7 + +Some characters (and some character sets) do not have upcase and downcase versions; +see {Case Mapping}[rdoc-ref:case_mapping.rdoc]: + + s = '1, 2, 3, ...' + s.swapcase == s # => true + s = 'こんにちは' + s.swapcase == s # => true + +The casing is affected by the given +mapping+, +which may be +:ascii+, +:fold+, or +:turkic+; +see {Case Mappings}[rdoc-ref:case_mapping.rdoc@Case+Mappings]. + +Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. 
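A small sketch of the swapcase.rdoc points above, assuming a UTF-8 source file (the variable names are illustrative). It shows that swapping case can change the string size, that the operation is therefore not always reversible, and that the +:ascii+ mapping leaves non-ASCII characters alone:

```ruby
s = 'Straße'
swapped = s.swapcase             # => "sTRASSE" (the ß expands to SS)
round_trip = swapped.swapcase    # => "Strasse" (not the original string)
ascii_only = s.swapcase(:ascii)  # Only ASCII letters are swapped; ß is kept.
sizes = [s.size, swapped.size, ascii_only.size]
```

Here +sizes+ should come out as <tt>[6, 7, 6]</tt>.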
diff --git a/doc/string/unicode_normalize.rdoc b/doc/string/unicode_normalize.rdoc new file mode 100644 index 0000000000..5f733c0fb8 --- /dev/null +++ b/doc/string/unicode_normalize.rdoc @@ -0,0 +1,28 @@ +Returns a copy of +self+ with +{Unicode normalization}[https://unicode.org/reports/tr15] applied. + +Argument +form+ must be one of the following symbols +(see {Unicode normalization forms}[https://unicode.org/reports/tr15/#Norm_Forms]): + +- +:nfc+: Canonical decomposition, followed by canonical composition. +- +:nfd+: Canonical decomposition. +- +:nfkc+: Compatibility decomposition, followed by canonical composition. +- +:nfkd+: Compatibility decomposition. + +The encoding of +self+ must be one of: + +- <tt>Encoding::UTF_8</tt>. +- <tt>Encoding::UTF_16BE</tt>. +- <tt>Encoding::UTF_16LE</tt>. +- <tt>Encoding::UTF_32BE</tt>. +- <tt>Encoding::UTF_32LE</tt>. +- <tt>Encoding::GB18030</tt>. +- <tt>Encoding::UCS_2BE</tt>. +- <tt>Encoding::UCS_4BE</tt>. + +Examples: + + "a\u0300".unicode_normalize # => "à" # Lowercase 'a' with grave accent. + "a\u0300".unicode_normalize(:nfd) # => "à" # Same. + +Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/upcase.rdoc b/doc/string/upcase.rdoc new file mode 100644 index 0000000000..ad859e8973 --- /dev/null +++ b/doc/string/upcase.rdoc @@ -0,0 +1,27 @@ +Returns a new string containing the upcased characters in +self+: + + 'hello'.upcase # => "HELLO" + 'straße'.upcase # => "STRASSE" + 'привет'.upcase # => "ПРИВЕТ" + 'RubyGems.org'.upcase # => "RUBYGEMS.ORG" + +The sizes of +self+ and the upcased result may differ: + + s = 'Straße' + s.size # => 6 + s.upcase # => "STRASSE" + s.upcase.size # => 7 + +Some characters (and some character sets) do not have upcase and downcase versions; +see {Case Mapping}[rdoc-ref:case_mapping.rdoc]: + + s = '1, 2, 3, ...' 
+ s.upcase == s # => true + s = 'こんにちは' + s.upcase == s # => true + +The casing is affected by the given +mapping+, +which may be +:ascii+, +:fold+, or +:turkic+; +see {Case Mappings}[rdoc-ref:case_mapping.rdoc@Case+Mappings]. + +Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String]. diff --git a/doc/string/upto.rdoc b/doc/string/upto.rdoc new file mode 100644 index 0000000000..f860fe84fe --- /dev/null +++ b/doc/string/upto.rdoc @@ -0,0 +1,38 @@ +With a block given, calls the block with each +String+ value +returned by successive calls to String#succ; +the first value is +self+, the next is <tt>self.succ</tt>, and so on; +the sequence terminates when value +other_string+ is reached; +returns +self+: + + a = [] + 'a'.upto('f') {|c| a.push(c) } + a # => ["a", "b", "c", "d", "e", "f"] + + a = [] + 'Ж'.upto('П') {|c| a.push(c) } + a # => ["Ж", "З", "И", "Й", "К", "Л", "М", "Н", "О", "П"] + + a = [] + 'よ'.upto('ろ') {|c| a.push(c) } + a # => ["よ", "ら", "り", "る", "れ", "ろ"] + + a = [] + 'a8'.upto('b6') {|c| a.push(c) } + a # => ["a8", "a9", "b0", "b1", "b2", "b3", "b4", "b5", "b6"] + +If argument +exclusive+ is given as a truthy object, the last value is omitted: + + a = [] + 'a'.upto('f', true) {|c| a.push(c) } + a # => ["a", "b", "c", "d", "e"] + +If +other_string+ would not be reached, does not call the block: + + '25'.upto('5') {|s| fail s } + 'aa'.upto('a') {|s| fail s } + +With no block given, returns a new Enumerator: + + 'a8'.upto('b6') # => #<Enumerator: "a8":upto("b6")> + +Related: see {Iterating}[rdoc-ref:String@Iterating]. diff --git a/doc/string/valid_encoding_p.rdoc b/doc/string/valid_encoding_p.rdoc new file mode 100644 index 0000000000..e1db55174a --- /dev/null +++ b/doc/string/valid_encoding_p.rdoc @@ -0,0 +1,8 @@ +Returns whether +self+ is encoded correctly: + + s = 'Straße' + s.valid_encoding? # => true + s.encoding # => #<Encoding:UTF-8> + s.force_encoding(Encoding::ASCII).valid_encoding? 
# => false + +Related: see {Querying}[rdoc-ref:String@Querying]. diff --git a/doc/stringio/each_byte.rdoc b/doc/stringio/each_byte.rdoc new file mode 100644 index 0000000000..708432b69e --- /dev/null +++ b/doc/stringio/each_byte.rdoc @@ -0,0 +1,31 @@ +With a block given, calls the block with each remaining byte in the stream; +positions the stream at end-of-file; +returns +self+: + + bytes = [] + strio = StringIO.new('hello') # Five 1-byte characters. + strio.each_byte {|byte| bytes.push(byte) } + strio.eof? # => true + bytes # => [104, 101, 108, 108, 111] + + bytes = [] + strio = StringIO.new('こんにちは') # Five 3-byte characters. + strio.each_byte {|byte| bytes.push(byte) } + bytes # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175] + +The position in the stream matters: + + bytes = [] + strio = StringIO.new('こんにちは') + strio.getc # => "こ" + strio.pos # => 3 # 3-byte character was read. + strio.each_byte {|byte| bytes.push(byte) } + bytes # => [227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175] + +If at end-of-file, does not call the block: + + strio.eof? # => true + strio.each_byte {|byte| fail 'Boo!' } + strio.eof? # => true + +With no block given, returns a new {Enumerator}[rdoc-ref:Enumerator]. diff --git a/doc/stringio/each_char.rdoc b/doc/stringio/each_char.rdoc new file mode 100644 index 0000000000..bec5ecac3f --- /dev/null +++ b/doc/stringio/each_char.rdoc @@ -0,0 +1,31 @@ +With a block given, calls the block with each remaining character in the stream; +positions the stream at end-of-file; +returns +self+: + + chars = [] + strio = StringIO.new('hello') + strio.each_char {|char| chars.push(char) } + strio.eof? 
# => true + chars # => ["h", "e", "l", "l", "o"] + + chars = [] + strio = StringIO.new('こんにちは') + strio.each_char {|char| chars.push(char) } + chars # => ["こ", "ん", "に", "ち", "は"] + +Stream position matters: + + chars = [] + strio = StringIO.new('こんにちは') + strio.getc # => "こ" + strio.pos # => 3 # 3-byte character was read. + strio.each_char {|char| chars.push(char) } + chars # => ["ん", "に", "ち", "は"] + +When at end-of-stream does not call the block: + + strio.eof? # => true + strio.each_char {|char| fail 'Boo!' } + strio.eof? # => true + +With no block given, returns a new {Enumerator}[rdoc-ref:Enumerator]. diff --git a/doc/stringio/each_codepoint.rdoc b/doc/stringio/each_codepoint.rdoc new file mode 100644 index 0000000000..0d10831142 --- /dev/null +++ b/doc/stringio/each_codepoint.rdoc @@ -0,0 +1,33 @@ +With a block given, calls the block with each successive codepoint from self; +sets the position to end-of-stream; +returns +self+. + +Each codepoint is the integer value for a character; returns self: + + codepoints = [] + strio = StringIO.new('hello') + strio.each_codepoint {|codepoint| codepoints.push(codepoint) } + strio.eof? # => true + codepoints # => [104, 101, 108, 108, 111] + + codepoints = [] + strio = StringIO.new('こんにちは') + strio.each_codepoint {|codepoint| codepoints.push(codepoint) } + codepoints # => [12371, 12435, 12395, 12385, 12399] + +Position in the stream matters: + + codepoints = [] + strio = StringIO.new('こんにちは') + strio.getc # => "こ" + strio.pos # => 3 + strio.each_codepoint {|codepoint| codepoints.push(codepoint) } + codepoints # => [12435, 12395, 12385, 12399] + +When at end-of-stream, the block is not called: + + strio.eof? # => true + strio.each_codepoint {|codepoint| fail 'Boo!' } + strio.eof? # => true + +With no block given, returns a new {Enumerator}[rdoc-ref:Enumerator]. 
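The three iterators documented above differ only in the unit they yield. The following sketch (variable names are illustrative; a UTF-8 source file is assumed) reads the same stream by byte, by character, and by codepoint, using the enumerator returned when no block is given:

```ruby
require 'stringio'

strio = StringIO.new('こんにちは') # Five 3-byte characters.

strio.getc                               # Consume the first character.
remaining_bytes = strio.each_byte.to_a   # The 12 bytes after the first character.

strio.rewind
chars = strio.each_char.to_a             # All five characters.

strio.rewind
codepoints = strio.each_codepoint.to_a   # One integer per character.
at_end = strio.eof?                      # Each iterator leaves the stream at end-of-stream.
```

Each codepoint is the +ord+ of the corresponding character, so <tt>chars.map(&:ord)</tt> should equal +codepoints+.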
diff --git a/doc/stringio/each_line.md b/doc/stringio/each_line.md new file mode 100644 index 0000000000..e29640a12a --- /dev/null +++ b/doc/stringio/each_line.md @@ -0,0 +1,189 @@ +With a block given, calls the block with each remaining line (see "Position" below) in the stream; +returns `self`. + +Leaves stream position at end-of-stream. + +**No Arguments** + +With no arguments given, +reads lines using the default record separator +(global variable `$/`, whose initial value is `"\n"`). + +```ruby +strio = StringIO.new(TEXT) +strio.each_line {|line| p line } +strio.eof? # => true +``` + +Output: + +``` +"First line\n" +"Second line\n" +"\n" +"Fourth line\n" +"Fifth line\n" +``` + +**Argument `sep`** + +With only string argument `sep` given, +reads lines using that string as the record separator: + +```ruby +strio = StringIO.new(TEXT) +strio.each_line(' ') {|line| p line } +``` + +Output: + +``` +"First " +"line\nSecond " +"line\n\nFourth " +"line\nFifth " +"line\n" +``` + +**Argument `limit`** + +With only integer argument `limit` given, +reads lines using the default record separator; +also limits the size (in characters) of each line to the given limit: + +```ruby +strio = StringIO.new(TEXT) +strio.each_line(10) {|line| p line } +``` + +Output: + +``` +"First line" +"\n" +"Second lin" +"e\n" +"\n" +"Fourth lin" +"e\n" +"Fifth line" +"\n" +``` + +**Arguments `sep` and `limit`** + +With arguments `sep` and `limit` both given, +honors both: + +```ruby +strio = StringIO.new(TEXT) +strio.each_line(' ', 10) {|line| p line } +``` + +Output: + +``` +"First " +"line\nSecon" +"d " +"line\n\nFour" +"th " +"line\nFifth" +" " +"line\n" +``` + +**Position** + +As stated above, method `each_line` reads each _remaining_ line in the stream. + +In the examples above each `strio` object starts with its position at beginning-of-stream; +but in other cases the position may be anywhere (see StringIO#pos): + +```ruby +strio = StringIO.new(TEXT) +strio.pos = 30 # Set stream position to character 30. 
+strio.each_line {|line| p line } +``` + +Output: + +``` +" line\n" +"Fifth line\n" +``` + +In all the examples above, the stream position is at the beginning of a character; +in other cases, that need not be so: + +```ruby +s = 'こんにちは' # Five 3-byte characters. +strio = StringIO.new(s) +strio.pos = 3 # At beginning of second character. +strio.each_line {|line| p line } +strio.pos = 4 # At second byte of second character. +strio.each_line {|line| p line } +strio.pos = 5 # At third byte of second character. +strio.each_line {|line| p line } +``` + +Output: + +``` +"んにちは" +"\x82\x93にちは" +"\x93にちは" +``` + +**Special Record Separators** + +Like some methods in class `IO`, StringIO.each honors two special record separators; +see {Special Line Separators}[https://docs.ruby-lang.org/en/master/IO.html#class-IO-label-Special+Line+Separator+Values]. + +```ruby +strio = StringIO.new(TEXT) +strio.each_line('') {|line| p line } # Read as paragraphs (separated by blank lines). +``` + +Output: + +``` +"First line\nSecond line\n\n" +"Fourth line\nFifth line\n" +``` + +```ruby +strio = StringIO.new(TEXT) +strio.each_line(nil) {|line| p line } # "Slurp"; read it all. +``` + +Output: + +``` +"First line\nSecond line\n\nFourth line\nFifth line\n" +``` + +**Keyword Argument `chomp`** + +With keyword argument `chomp` given as `true` (the default is `false`), +removes trailing newline (if any) from each line: + +```ruby +strio = StringIO.new(TEXT) +strio.each_line(chomp: true) {|line| p line } +``` + +Output: + +``` +"First line" +"Second line" +"" +"Fourth line" +"Fifth line" +``` + +With no block given, returns a new {Enumerator}[https://docs.ruby-lang.org/en/master/Enumerator.html]. + + +Related: StringIO.each_byte, StringIO.each_char, StringIO.each_codepoint. 
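The argument forms described in each_line.md can be compared side by side. This is a minimal sketch that uses the enumerator form (no block) to collect results; the local variable `text` stands in for the `TEXT` constant assumed elsewhere in these documents:

```ruby
require 'stringio'

text = "First line\nSecond line\n\nFourth line\nFifth line\n"

# Paragraph mode: the empty-string separator splits at blank lines.
paragraphs = StringIO.new(text).each_line('').to_a

# chomp: true strips the trailing newline from each line.
chomped = StringIO.new(text).each_line(chomp: true).to_a

# An integer limit caps the size of each returned piece.
limited = StringIO.new(text).each_line(10).first(2)
```

Each call starts from a fresh stream, so the position-dependent behavior described under "Position" does not affect the results.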
diff --git a/doc/stringio/getbyte.rdoc b/doc/stringio/getbyte.rdoc new file mode 100644 index 0000000000..148455abf4 --- /dev/null +++ b/doc/stringio/getbyte.rdoc @@ -0,0 +1,24 @@ +Reads and returns the next integer byte (not character) from the stream: + + s = 'foo' + s.bytes # => [102, 111, 111] + strio = StringIO.new(s) + strio.getbyte # => 102 + strio.getbyte # => 111 + strio.getbyte # => 111 + +Returns +nil+ if at end-of-stream: + + strio.eof? # => true + strio.getbyte # => nil + +Returns a byte, not a character: + + s = 'こんにちは' + s.bytes + # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175] + strio = StringIO.new(s) + strio.getbyte # => 227 + strio.getbyte # => 129 + +Related: #each_byte, #ungetbyte, #getc. diff --git a/doc/stringio/getc.rdoc b/doc/stringio/getc.rdoc new file mode 100644 index 0000000000..58ce47c337 --- /dev/null +++ b/doc/stringio/getc.rdoc @@ -0,0 +1,30 @@ +Reads and returns the next character (or byte; see below) from the stream: + + strio = StringIO.new('foo') + strio.getc # => "f" + strio.getc # => "o" + strio.getc # => "o" + +Returns +nil+ if at end-of-stream: + + strio.eof? # => true + strio.getc # => nil + +Returns characters, not bytes: + + strio = StringIO.new('こんにちは') + strio.getc # => "こ" + strio.getc # => "ん" + +In each of the examples above, the stream is positioned at the beginning of a character; +in other cases that need not be true: + + strio = StringIO.new('こんにちは') # Five 3-byte characters. + strio.pos = 3 # => 3 # At beginning of second character; returns character. + strio.getc # => "ん" + strio.pos = 4 # => 4 # At second byte of second character; returns byte. + strio.getc # => "\x82" + strio.pos = 5 # => 5 # At third byte of second character; returns byte. + strio.getc # => "\x93" + +Related: #getbyte, #putc, #ungetc. 
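To make the byte-versus-character distinction above concrete, the following sketch (illustrative variable names; a UTF-8 source file is assumed) reads the start of the same multi-byte stream with each method and records how far the position advances:

```ruby
require 'stringio'

strio = StringIO.new('こんにちは') # Five 3-byte characters.

first_byte = strio.getbyte    # One byte of the first character.
pos_after_byte = strio.pos    # Advanced by one byte.

strio.rewind
first_char = strio.getc       # The whole first character.
pos_after_char = strio.pos    # Advanced by three bytes.
```

The position is always counted in bytes, which is why a single +getc+ moves it by three here.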
diff --git a/doc/stringio/gets.rdoc b/doc/stringio/gets.rdoc new file mode 100644 index 0000000000..4152152a25 --- /dev/null +++ b/doc/stringio/gets.rdoc @@ -0,0 +1,99 @@ +Reads and returns a line from the stream; +returns +nil+ if at end-of-stream. + +Side effects: + +- Increments stream position by the number of bytes read. +- Assigns the return value to global variable <tt>$_</tt>. + +With no arguments given, reads a line using the default record separator +(global variable <tt>$/</tt>, whose initial value is <tt>"\n"</tt>): + + strio = StringIO.new(TEXT) + strio.pos # => 0 + strio.gets # => "First line\n" + strio.pos # => 11 + $_ # => "First line\n" + strio.gets # => "Second line\n" + strio.read # => "\nFourth line\nFifth line\n" + strio.eof? # => true + strio.gets # => nil + + strio = StringIO.new('こんにちは') # Five 3-byte characters. + strio.pos # => 0 + strio.gets # => "こんにちは" + strio.pos # => 15 + +<b>Argument +sep+</b> + +With only string argument +sep+ given, reads a line using that string as the record separator: + + strio = StringIO.new(TEXT) + strio.gets(' ') # => "First " + strio.gets(' ') # => "line\nSecond " + strio.gets(' ') # => "line\n\nFourth " + +<b>Argument +limit+</b> + +With only integer argument +limit+ given, +reads a line using the default record separator; +limits the size (in characters) of each line to the given limit: + + strio = StringIO.new(TEXT) + strio.gets(10) # => "First line" + strio.gets(10) # => "\n" + strio.gets(10) # => "Second lin" + strio.gets(10) # => "e\n" + +<b>Arguments +sep+ and +limit+</b> + +With arguments +sep+ and +limit+ both given, honors both: + + strio = StringIO.new(TEXT) + strio.gets(' ', 10) # => "First " + strio.gets(' ', 10) # => "line\nSecon" + strio.gets(' ', 10) # => "d " + +<b>Position</b> + +As stated above, method +gets+ reads and returns the next line in the stream. 
+ +In the examples above each +strio+ object starts with its position at beginning-of-stream; +but in other cases the position may be anywhere: + + strio = StringIO.new(TEXT) + strio.pos = 12 + strio.gets # => "econd line\n" + +The position need not be at a character boundary: + + strio = StringIO.new('こんにちは') # Five 3-byte characters. + strio.pos = 3 # At beginning of second character. + strio.gets # => "んにちは" + strio.pos = 4 # Within second character. + strio.gets # => "\x82\x93にちは" + +<b>Special Record Separators</b> + +Like some methods in class IO, method +gets+ honors two special record separators; +see {Special Line Separators}[https://docs.ruby-lang.org/en/master/IO.html#class-IO-label-Special+Line+Separator+Values]: + + strio = StringIO.new(TEXT) + strio.gets('') # Read "paragraph" (up to empty line). + # => "First line\nSecond line\n\n" + + strio = StringIO.new(TEXT) + strio.gets(nil) # "Slurp": read all. + # => "First line\nSecond line\n\nFourth line\nFifth line\n" + +<b>Keyword Argument +chomp+</b> + +With keyword argument +chomp+ given as +true+ (the default is +false+), +removes the trailing newline (if any) from the returned line: + + strio = StringIO.new(TEXT) + strio.gets # => "First line\n" + strio.gets(chomp: true) # => "Second line" + +Related: #each_line, #readlines, +{Kernel#puts}[rdoc-ref:Kernel#puts]. diff --git a/doc/stringio/pread.rdoc b/doc/stringio/pread.rdoc new file mode 100644 index 0000000000..2dcbc18ad8 --- /dev/null +++ b/doc/stringio/pread.rdoc @@ -0,0 +1,65 @@ +**Note**: \Method +pread+ is different from other reading methods +in that it does not modify +self+ in any way; +thus, multiple threads may read safely from the same stream. + +Reads up to +maxlen+ bytes from the stream, +beginning at 0-based byte offset +offset+; +returns a string containing the read bytes. + +The returned string: + +- Contains +maxlen+ bytes from the stream, if available; + otherwise contains all available bytes. +- Has encoding +Encoding::ASCII_8BIT+. 
+ +With only arguments +maxlen+ and +offset+ given, +returns a new string: + + english = 'Hello' # Five 1-byte characters. + strio = StringIO.new(english) + strio.pread(3, 0) # => "Hel" + strio.pread(3, 2) # => "llo" + strio.pread(0, 0) # => "" + strio.pread(50, 0) # => "Hello" + strio.pread(50, 2) # => "llo" + strio.pread(50, 4) # => "o" + strio.pread(0, 0).encoding + # => #<Encoding:BINARY (ASCII-8BIT)> + + russian = 'Привет' # Six 2-byte characters. + strio = StringIO.new(russian) + strio.pread(50, 0) # All 12 bytes. + # => "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82" + strio.pread(3, 0) # => "\xD0\x9F\xD1" + strio.pread(3, 3) # => "\x80\xD0\xB8" + strio.pread(0, 0).encoding + # => #<Encoding:BINARY (ASCII-8BIT)> + + japanese = 'こんにちは' # Five 3-byte characters. + strio = StringIO.new(japanese) + strio.pread(50, 0) # All 15 bytes. + # => "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF" + strio.pread(6, 0) # => "\xE3\x81\x93\xE3\x82\x93" + strio.pread(1, 2) # => "\x93" + strio.pread(0, 0).encoding + # => #<Encoding:BINARY (ASCII-8BIT)> + +Raises an exception if +offset+ is out-of-range: + + strio = StringIO.new(english) + strio.pread(5, 50) # Raises EOFError: end of file reached + +With string argument +out_string+ given: + +- Reads as above. +- Overwrites the content of +out_string+ with the read bytes. + +Examples: + + out_string = 'Will be overwritten' + out_string.encoding # => #<Encoding:UTF-8> + result = StringIO.new(english).pread(50, 0, out_string) + result.__id__ == out_string.__id__ # => true + out_string # => "Hello" + out_string.encoding # => #<Encoding:BINARY (ASCII-8BIT)> + diff --git a/doc/stringio/putc.rdoc b/doc/stringio/putc.rdoc new file mode 100644 index 0000000000..4636ffa0db --- /dev/null +++ b/doc/stringio/putc.rdoc @@ -0,0 +1,82 @@ +Replaces one or more bytes at position +pos+ +with bytes of the given argument; +advances the position by the count of bytes written; +returns the argument. 
+ +\StringIO object for 1-byte characters. + + strio = StringIO.new('foo') + strio.pos # => 0 + +With 1-byte argument, replaces one byte: + + strio.putc('b') + strio.string # => "boo" + strio.pos # => 1 + strio.putc('a') # => "a" + strio.string # => "bao" + strio.pos # => 2 + strio.putc('r') # => "r" + strio.string # => "bar" + strio.pos # => 3 + strio.putc('n') # => "n" + strio.string # => "barn" + strio.pos # => 4 + +Fills with null characters if necessary: + + strio.pos = 6 + strio.putc('x') # => "x" + strio.string # => "barn\u0000\u0000x" + strio.pos # => 7 + +With integer argument, replaces one byte with the low-order byte of the integer: + + strio = StringIO.new('foo') + strio.putc(70) + strio.string # => "Foo" + strio.putc(79) + strio.string # => "FOo" + strio.putc(79 + 1024) + strio.string # => "FOO" + +\StringIO object for Multi-byte characters: + + greek = 'αβγδε' # Five 2-byte characters. + strio = StringIO.new(greek) + strio.string# => "αβγδε" + strio.string.b # => "\xCE\xB1\xCE\xB2\xCE\xB3\xCE\xB4\xCE\xB5" + strio.string.bytesize # => 10 + strio.string.chars # => ["α", "β", "γ", "δ", "ε"] + strio.string.size # => 5 + +With 1-byte argument, replaces one byte of the string: + + strio.putc(' ') # 1-byte ascii space. 
+ strio.pos # => 1 + strio.string # => " \xB1βγδε" + strio.string.b # => " \xB1\xCE\xB2\xCE\xB3\xCE\xB4\xCE\xB5" + strio.string.bytesize # => 10 + strio.string.chars # => [" ", "\xB1", "β", "γ", "δ", "ε"] + strio.string.size # => 6 + + strio.putc(' ') + strio.pos # => 2 + strio.string # => " βγδε" + strio.string.b # => " \xCE\xB2\xCE\xB3\xCE\xB4\xCE\xB5" + strio.string.bytesize # => 10 + strio.string.chars # => [" ", " ", "β", "γ", "δ", "ε"] + strio.string.size # => 6 + +With 2-byte argument, replaces two bytes of the string: + + strio.rewind + strio.putc('α') + strio.pos # => 2 + strio.string # => "αβγδε" + strio.string.b # => "\xCE\xB1\xCE\xB2\xCE\xB3\xCE\xB4\xCE\xB5" + strio.string.bytesize # => 10 + strio.string.chars # => ["α", "β", "γ", "δ", "ε"] + strio.string.size # => 5 + +Related: #getc, #ungetc. diff --git a/doc/stringio/read.rdoc b/doc/stringio/read.rdoc new file mode 100644 index 0000000000..46b9fa349f --- /dev/null +++ b/doc/stringio/read.rdoc @@ -0,0 +1,83 @@ +Reads and returns a string containing bytes read from the stream, +beginning at the current position; +advances the position by the count of bytes read. + +With no arguments given, +reads all remaining bytes in the stream; +returns a new string containing bytes read: + + strio = StringIO.new('Hello') # Five 1-byte characters. + strio.read # => "Hello" + strio.pos # => 5 + strio.read # => "" + StringIO.new('').read # => "" + +With non-negative argument +maxlen+ given, +reads +maxlen+ bytes as available; +returns a new string containing the bytes read, or +nil+ if none: + + strio.rewind + strio.read(3) # => "Hel" + strio.read(3) # => "lo" + strio.read(3) # => nil + + russian = 'Привет' # Six 2-byte characters. 
+ russian.b + # => "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82" + strio = StringIO.new(russian) + strio.read(6) # => "\xD0\x9F\xD1\x80\xD0\xB8" + strio.read(6) # => "\xD0\xB2\xD0\xB5\xD1\x82" + strio.read(6) # => nil + + japanese = 'こんにちは' + japanese.b + # => "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF" + strio = StringIO.new(japanese) + strio.read(9) # => "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB" + strio.read(9) # => "\xE3\x81\xA1\xE3\x81\xAF" + strio.read(9) # => nil + +With argument +maxlen+ as +nil+ and string argument +out_string+ given, +reads the remaining bytes in the stream; +clears +out_string+ and writes the bytes into it; +returns +out_string+: + + out_string = 'Will be overwritten' + strio = StringIO.new('Hello') + strio.read(nil, out_string) # => "Hello" + strio.read(nil, out_string) # => "" + +With non-negative argument +maxlen+ and string argument +out_string+ given, +reads up to +maxlen+ bytes from the stream, as available; +clears +out_string+ and writes the bytes into it; +returns +out_string+ if any bytes were read, or +nil+ if none: + + out_string = 'Will be overwritten' + strio = StringIO.new('Hello') + strio.read(3, out_string) # => "Hel" + strio.read(3, out_string) # => "lo" + strio.read(3, out_string) # => nil + + out_string = 'Will be overwritten' + strio = StringIO.new(russian) + strio.read(6, out_string) # => "При" + strio.read(6, out_string) # => "вет" + strio.read(6, out_string) # => nil + strio.rewind + russian.b + # => "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82" + strio.read(3) # => "\xD0\x9F\xD1" + strio.read(3) # => "\x80\xD0\xB8" + + out_string = 'Will be overwritten' + strio = StringIO.new(japanese) + strio.read(9, out_string) # => "こんに" + strio.read(9, out_string) # => "ちは" + strio.read(9, out_string) # => nil + strio.rewind + japanese.b + # => "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF" + strio.read(4) # => "\xE3\x81\x93\xE3" + strio.read(4) # => "\x82\x93\xE3\x81" + +Related: #gets, 
#readlines. diff --git a/doc/stringio/size.rdoc b/doc/stringio/size.rdoc new file mode 100644 index 0000000000..253c612c43 --- /dev/null +++ b/doc/stringio/size.rdoc @@ -0,0 +1,4 @@ +Returns the number of bytes in the string in +self+: + + StringIO.new('hello').size # => 5 # Five 1-byte characters. + StringIO.new('こんにちは').size # => 15 # Five 3-byte characters. diff --git a/doc/stringio/stringio.md b/doc/stringio/stringio.md new file mode 100644 index 0000000000..f81f79cfea --- /dev/null +++ b/doc/stringio/stringio.md @@ -0,0 +1,702 @@ +\Class \StringIO supports accessing a string as a stream, +similar in some ways to [class IO][io class]. + +You can create a \StringIO instance using: + +- StringIO.new: returns a new \StringIO object containing the given string. +- StringIO.open: passes a new \StringIO object to the given block. + +Like an \IO stream, a \StringIO stream has certain properties: + +- **Read/write mode**: whether the stream may be read, written, appended to, etc.; + see [Read/Write Mode][read/write mode]. +- **Data mode**: text-only or binary; + see [Data Mode][data mode]. +- **Encodings**: internal and external encodings; + see [Encodings][encodings]. +- **Position**: where in the stream the next read or write is to occur; + see [Position][position]. +- **Line number**: a special, line-oriented, "position" (different from the position mentioned above); + see [Line Number][line number]. +- **Open/closed**: whether the stream is open or closed, for reading or writing; + see [Open/Closed Streams][open/closed streams]. +- **BOM**: byte order mark; + see [Byte Order Mark][bom (byte order mark)]. + +## About the Examples + +Examples on this page assume that \StringIO has been required: + +```ruby +require 'stringio' +``` + +And that this constant has been defined: + +```ruby +TEXT = <<EOT +First line +Second line + +Fourth line +Fifth line +EOT +``` + +## Stream Properties + +### Read/Write Mode + +#### Summary + +| Mode | Initial Clear? 
| Read | Write | +|:--------------------------:|:--------------:|:--------:|:--------:| +| <tt>'r'</tt>: read-only | No | Anywhere | Error | +| <tt>'w'</tt>: write-only | Yes | Error | Anywhere | +| <tt>'a'</tt>: append-only | No | Error | End only | +| <tt>'r+'</tt>: read/write | No | Anywhere | Anywhere | +| <tt>'w+'</tt>: read-write | Yes | Anywhere | Anywhere | +| <tt>'a+'</tt>: read/append | No | Anywhere | End only | + +Each section below describes a read/write mode. + +Any of the modes may be given as a string or as file constants; +example: + +```ruby +strio = StringIO.new('foo', 'a') +strio = StringIO.new('foo', File::WRONLY | File::APPEND) +``` + +#### `'r'`: Read-Only + +Mode specified as one of: + +- String: `'r'`. +- Constant: `File::RDONLY`. + +Initial state: + +```ruby +strio = StringIO.new('foobarbaz', 'r') +strio.pos # => 0 # Beginning-of-stream. +strio.string # => "foobarbaz" # Not cleared. +``` + +May be read anywhere: + +```ruby +strio.gets(3) # => "foo" +strio.gets(3) # => "bar" +strio.pos = 9 +strio.gets(3) # => nil +``` + +May not be written: + +```ruby +strio.write('foo') # Raises IOError: not opened for writing +``` + +#### `'w'`: Write-Only + +Mode specified as one of: + +- String: `'w'`. +- Constant: `File::WRONLY`. + +Initial state: + +```ruby +strio = StringIO.new('foo', 'w') +strio.pos # => 0 # Beginning of stream. +strio.string # => "" # Initially cleared. +``` + +May be written anywhere (even past end-of-stream): + +```ruby +strio.write('foobar') +strio.string # => "foobar" +strio.rewind +strio.write('FOO') +strio.string # => "FOObar" +strio.pos = 3 +strio.write('BAR') +strio.string # => "FOOBAR" +strio.pos = 9 +strio.write('baz') +strio.string # => "FOOBAR\u0000\u0000\u0000baz" # Null-padded. +``` + +May not be read: + +```ruby +strio.read # Raises IOError: not opened for reading +``` + +#### `'a'`: Append-Only + +Mode specified as one of: + +- String: `'a'`. +- Constant: `File::WRONLY | File::APPEND`. 
+ +Initial state: + +```ruby +strio = StringIO.new('foo', 'a') +strio.pos # => 0 # Beginning-of-stream. +strio.string # => "foo" # Not cleared. +``` + +May be written only at the end; position does not affect writing: + +```ruby +strio.write('bar') +strio.string # => "foobar" +strio.write('baz') +strio.string # => "foobarbaz" +strio.pos = 400 +strio.write('bat') +strio.string # => "foobarbazbat" +``` + +May not be read: + +```ruby +strio.gets # Raises IOError: not opened for reading +``` + +#### `'r+'`: Read/Write + +Mode specified as one of: + +- String: `'r+'`. +- Constant: `File::RDWR`. + +Initial state: + +```ruby +strio = StringIO.new('foobar', 'r+') +strio.pos # => 0 # Beginning-of-stream. +strio.string # => "foobar" # Not cleared. +``` + +May be written anywhere (even past end-of-stream): + +```ruby +strio.write('FOO') +strio.string # => "FOObar" +strio.write('BAR') +strio.string # => "FOOBAR" +strio.write('BAZ') +strio.string # => "FOOBARBAZ" +strio.pos = 12 +strio.write('BAT') +strio.string # => "FOOBARBAZ\u0000\u0000\u0000BAT" # Null-padded. +``` + +May be read anywhere: + +```ruby +strio.pos = 0 +strio.gets(3) # => "FOO" +strio.pos = 6 +strio.gets(3) # => "BAZ" +strio.pos = 400 +strio.gets(3) # => nil +``` + +#### `'w+'`: Read/Write (Initially Clear) + +Mode specified as one of: + +- String: `'w+'`. +- Constant: `File::RDWR | File::TRUNC`. + +Initial state: + +```ruby +strio = StringIO.new('foo', 'w+') +strio.pos # => 0 # Beginning-of-stream. +strio.string # => "" # Truncated. +``` + +May be written anywhere (even past end-of-stream): + +```ruby +strio.write('foobar') +strio.string # => "foobar" +strio.rewind +strio.write('FOO') +strio.string # => "FOObar" +strio.write('BAR') +strio.string # => "FOOBAR" +strio.write('BAZ') +strio.string # => "FOOBARBAZ" +strio.pos = 12 +strio.write('BAT') +strio.string # => "FOOBARBAZ\u0000\u0000\u0000BAT" # Null-padded.
+``` + +May be read anywhere: + +```ruby +strio.rewind +strio.gets(3) # => "FOO" +strio.gets(3) # => "BAR" +strio.pos = 12 +strio.gets(3) # => "BAT" +strio.pos = 400 +strio.gets(3) # => nil +``` + +#### `'a+'`: Read/Append + +Mode specified as one of: + +- String: `'a+'`. +- Constant: `File::RDWR | File::APPEND`. + +Initial state: + +```ruby +strio = StringIO.new('foo', 'a+') +strio.pos # => 0 # Beginning-of-stream. +strio.string # => "foo" # Not cleared. +``` + +May be written only at the end; #rewind; position does not affect writing: + +```ruby +strio.write('bar') +strio.string # => "foobar" +strio.write('baz') +strio.string # => "foobarbaz" +strio.pos = 400 +strio.write('bat') +strio.string # => "foobarbazbat" +``` + +May be read anywhere: + +```ruby +strio.rewind +strio.gets(3) # => "foo" +strio.gets(3) # => "bar" +strio.pos = 9 +strio.gets(3) # => "bat" +strio.pos = 400 +strio.gets(3) # => nil +``` + +### Data Mode + +To specify whether the stream is to be treated as text or as binary data, +either of the following may be suffixed to any of the string read/write modes above: + +- `'t'`: Text; + initializes the encoding as Encoding::UTF_8. +- `'b'`: Binary; + initializes the encoding as Encoding::ASCII_8BIT. + +If neither is given, the stream defaults to text data. + +Examples: + +```ruby +strio = StringIO.new('foo', 'rt') +strio.external_encoding # => #<Encoding:UTF-8> +data = "\u9990\u9991\u9992\u9993\u9994" +strio = StringIO.new(data, 'rb') +strio.external_encoding # => #<Encoding:BINARY (ASCII-8BIT)> +``` + +When the data mode is specified, the read/write mode may not be omitted: + +```ruby +StringIO.new(data, 'b') # Raises ArgumentError: invalid access mode b +``` + +A text stream may be changed to binary by calling instance method #binmode; +a binary stream may not be changed to text. + +### Encodings + +A stream has an encoding; see [Encodings][encodings document]. 
+ +The initial encoding for a new or re-opened stream depends on its [data mode][data mode]: + +- Text: `Encoding::UTF_8`. +- Binary: `Encoding::ASCII_8BIT`. + +These instance methods are relevant: + +- #external_encoding: returns the current encoding of the stream as an `Encoding` object. +- #internal_encoding: returns +nil+; a stream does not have an internal encoding. +- #set_encoding: sets the encoding for the stream. +- #set_encoding_by_bom: sets the encoding for the stream to the stream's BOM (byte order mark). + +Examples: + +```ruby +strio = StringIO.new('foo', 'rt') # Text mode. +strio.external_encoding # => #<Encoding:UTF-8> +data = "\u9990\u9991\u9992\u9993\u9994" +strio = StringIO.new(data, 'rb') # Binary mode. +strio.external_encoding # => #<Encoding:BINARY (ASCII-8BIT)> +strio = StringIO.new('foo') +strio.external_encoding # => #<Encoding:UTF-8> +strio.set_encoding('US-ASCII') +strio.external_encoding # => #<Encoding:US-ASCII> +``` + +### Position + +A stream has a _position_, an integer offset (in bytes) into the stream. +The initial position of a stream is zero. + +#### Getting and Setting the Position + +Each of these methods initializes (to zero) the position of a new or re-opened stream: + +- ::new: returns a new stream. +- ::open: passes a new stream to the block. +- #reopen: re-initializes the stream. + +Each of these methods queries, gets, or sets the position, without otherwise changing the stream: + +- #eof?: returns whether the position is at end-of-stream. +- #pos: returns the position. +- #pos=: sets the position. +- #rewind: sets the position to zero. +- #seek: sets the position. + +Examples: + +```ruby +strio = StringIO.new('foobar') +strio.pos # => 0 +strio.pos = 3 +strio.pos # => 3 +strio.eof? # => false +strio.rewind +strio.pos # => 0 +strio.seek(0, IO::SEEK_END) +strio.pos # => 6 +strio.eof?
# => true +``` + +#### Position Before and After Reading + +Except for #pread, a stream reading method (see [Basic Reading][basic reading]) +begins reading at the current position. + +Except for #pread, a read method advances the position past the read substring. + +Examples: + +```ruby +strio = StringIO.new(TEXT) +strio.string # => "First line\nSecond line\n\nFourth line\nFifth line\n" +strio.pos # => 0 +strio.getc # => "F" +strio.pos # => 1 +strio.gets # => "irst line\n" +strio.pos # => 11 +strio.pos = 24 +strio.gets # => "Fourth line\n" +strio.pos # => 36 + +strio = StringIO.new('こんにちは') # Five 3-byte characters. +strio.pos = 0 # At first byte of first character. +strio.read # => "こんにちは" +strio.pos = 1 # At second byte of first character. +strio.read # => "\x81\x93んにちは" +strio.pos = 2 # At third byte of first character. +strio.read # => "\x93んにちは" +strio.pos = 3 # At first byte of second character. +strio.read # => "んにちは" + +strio = StringIO.new(TEXT) +strio.pos = 15 +a = [] +strio.each_line {|line| a.push(line) } +a # => ["nd line\n", "\n", "Fourth line\n", "Fifth line\n"] +strio.pos # => 47 ## End-of-stream. +``` + +#### Position Before and After Writing + +Each of these methods begins writing at the current position, +and advances the position to the end of the written substring: + +- #putc: writes the given character. +- #write: writes the given objects as strings. +- [Kernel#puts][kernel#puts]: writes given objects as strings, each followed by newline. 
+ +Examples: + +```ruby +strio = StringIO.new('foo') +strio.pos # => 0 +strio.putc('b') +strio.string # => "boo" +strio.pos # => 1 +strio.write('r') +strio.string # => "bro" +strio.pos # => 2 +strio.puts('ew') +strio.string # => "brew\n" +strio.pos # => 5 +strio.pos = 8 +strio.write('foo') +strio.string # => "brew\n\u0000\u0000\u0000foo" +strio.pos # => 11 +``` + +Each of these methods writes _before_ the current position, and decrements the position +so that the written data is next to be read: + +- #ungetbyte: unshifts the given byte. +- #ungetc: unshifts the given character. + +Examples: + +```ruby +strio = StringIO.new('foo') +strio.pos = 2 +strio.ungetc('x') +strio.pos # => 1 +strio.string # => "fxo" +strio.ungetc('x') +strio.pos # => 0 +strio.string # => "xxo" +``` + +This method does not affect the position: + +- #truncate: truncates the stream's string to the given size. + +Examples: + +```ruby +strio = StringIO.new('foobar') +strio.pos # => 0 +strio.truncate(3) +strio.string # => "foo" +strio.pos # => 0 +strio.pos = 500 +strio.truncate(0) +strio.string # => "" +strio.pos # => 500 +``` + +### Line Number + +A stream has a line number, which initially is zero: + +- Method #lineno returns the line number. +- Method #lineno= sets the line number. + +The line number can be affected by reading (but never by writing); +in general, the line number is incremented each time the record separator (default: `"\n"`) is read. 
+ +Examples: + +```ruby +strio = StringIO.new(TEXT) +strio.string # => "First line\nSecond line\n\nFourth line\nFifth line\n" +strio.lineno # => 0 +strio.gets # => "First line\n" +strio.lineno # => 1 +strio.getc # => "S" +strio.lineno # => 1 +strio.gets # => "econd line\n" +strio.lineno # => 2 +strio.gets # => "\n" +strio.lineno # => 3 +strio.gets # => "Fourth line\n" +strio.lineno # => 4 +``` + +Setting the position does not affect the line number: + +```ruby +strio.pos = 0 +strio.lineno # => 4 +strio.gets # => "First line\n" +strio.pos # => 11 +strio.lineno # => 5 +``` + +And setting the line number does not affect the position: + +```ruby +strio.lineno = 10 +strio.pos # => 11 +strio.gets # => "Second line\n" +strio.lineno # => 11 +strio.pos # => 23 +``` + +### Open/Closed Streams + +A new stream is open for either reading or writing, and may be open for both; +see [Read/Write Mode][read/write mode]. + +Each of these methods initializes the read/write mode for a new or re-opened stream: + +- ::new: returns a new stream. +- ::open: passes a new stream to the block. +- #reopen: re-initializes the stream. + +Other relevant methods: + +- #close: closes the stream for both reading and writing. +- #close_read: closes the stream for reading. +- #close_write: closes the stream for writing. +- #closed?: returns whether the stream is closed for both reading and writing. +- #closed_read?: returns whether the stream is closed for reading. +- #closed_write?: returns whether the stream is closed for writing. + +### BOM (Byte Order Mark) + +The string provided for ::new, ::open, or #reopen +may contain an optional [BOM][bom] (byte order mark) at the beginning of the string; +the BOM can affect the stream's encoding. + +The BOM (if provided): + +- Is stored as part of the stream's string. +- Does _not_ immediately affect the encoding. +- Is _initially_ considered part of the stream. 
+ +```ruby +utf8_bom = "\xEF\xBB\xBF" +string = utf8_bom + 'foo' +string.bytes # => [239, 187, 191, 102, 111, 111] +strio = StringIO.new(string, 'rb') +strio.string.bytes.take(3) # => [239, 187, 191] # The BOM. +strio.string.bytes # => [239, 187, 191, 102, 111, 111] # BOM is part of the stored string. +strio.external_encoding # => #<Encoding:BINARY (ASCII-8BIT)> # Default for a binary stream. +strio.gets # => "\xEF\xBB\xBFfoo" # BOM is part of the stream. +``` + +You can call instance method #set_encoding_by_bom to "activate" the stored BOM; +after doing so the BOM: + +- Is _still_ stored as part of the stream's string. +- _Determines_ (and may have changed) the stream's encoding. +- Is _no longer_ considered part of the stream. + +```ruby +strio.set_encoding_by_bom +strio.string.bytes # => [239, 187, 191, 102, 111, 111] # BOM is still part of the stored string. +strio.external_encoding # => #<Encoding:UTF-8> # The new encoding. +strio.rewind # => 0 +strio.gets # => "foo" # BOM is not part of the stream. +``` + +## Basic Stream \IO + +### Basic Reading + +You can read from the stream using these instance methods: + +- #getbyte: reads and returns the next byte. +- #getc: reads and returns the next character. +- #gets: reads and returns all or part of the next line. +- #read: reads and returns all or part of the remaining data in the stream. +- #readlines: reads the remaining data in the stream and returns an array of its lines. +- [Kernel#readline][kernel#readline]: like #gets, but raises an exception if at end-of-stream. + +You can iterate over the stream using these instance methods: + +- #each_byte: reads each remaining byte, passing it to the block. +- #each_char: reads each remaining character, passing it to the block. +- #each_codepoint: reads each remaining codepoint, passing it to the block.
+- #each_line: reads all or part of each remaining line, passing the read string to the block. + +This instance method is useful in a multi-threaded application: + +- #pread: reads and returns all or part of the stream. + +### Basic Writing + +You can write to the stream, advancing the position, using these instance methods: + +- #putc: writes the given character. +- #write: writes the given objects as strings. +- [Kernel#puts][kernel#puts]: writes given objects as strings, each followed by newline. + +You can "unshift" to the stream using these instance methods; +each writes _before_ the current position, and decrements the position +so that the written data is next to be read. + +- #ungetbyte: unshifts the given byte. +- #ungetc: unshifts the given character. + +One more writing method: + +- #truncate: truncates the stream's string to the given size. + +## Line \IO + +Reading: + +- #gets: reads and returns the next line. +- [Kernel#readline][kernel#readline]: like #gets, but raises an exception if at end-of-stream. +- #readlines: reads the remaining data in the stream and returns an array of its lines. +- #each_line: reads each remaining line, passing it to the block. + +Writing: + +- [Kernel#puts][kernel#puts]: writes given objects, each followed by newline. + +## Character \IO + +Reading: + +- #each_char: reads each remaining character, passing it to the block. +- #getc: reads and returns the next character. + +Writing: + +- #putc: writes the given character. +- #ungetc: unshifts the given character. + +## Byte \IO + +Reading: + +- #each_byte: reads each remaining byte, passing it to the block. +- #getbyte: reads and returns the next byte. + +Writing: + +- #ungetbyte: unshifts the given byte. + +## Codepoint \IO + +Reading: + +- #each_codepoint: reads each remaining codepoint, passing it to the block.
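The Codepoint \IO entry above can be illustrated with a minimal sketch. It is not part of the committed page, and it assumes a UTF-8 source file; the variable names are illustrative only:

```ruby
require 'stringio'

strio = StringIO.new('こんにちは') # Five 3-byte characters.
codepoints = []
# Each remaining character is passed to the block as an integer codepoint.
strio.each_codepoint {|codepoint| codepoints.push(codepoint) }
codepoints # => [12371, 12435, 12395, 12385, 12399]
strio.pos  # => 15 # End-of-stream (byte position, not character count).
strio.eof? # => true
```

As with the other reading methods, iteration starts at the current position and leaves the position at end-of-stream.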
+ +[bom]: https://en.wikipedia.org/wiki/Byte_order_mark +[encodings document]: https://docs.ruby-lang.org/en/master/language/encodings_rdoc.html +[io class]: https://docs.ruby-lang.org/en/master/IO.html +[kernel#puts]: https://docs.ruby-lang.org/en/master/Kernel.html#method-i-puts +[kernel#readline]: https://docs.ruby-lang.org/en/master/Kernel.html#method-i-readline + +[basic reading]: rdoc-ref:StringIO@Basic+Reading +[basic writing]: rdoc-ref:StringIO@Basic+Writing +[bom (byte order mark)]: rdoc-ref:StringIO@BOM+Byte+Order+Mark +[data mode]: rdoc-ref:StringIO@Data+Mode +[encodings]: rdoc-ref:StringIO@Encodings +[end-of-stream]: rdoc-ref:StringIO@End-of-Stream +[line number]: rdoc-ref:StringIO@Line+Number +[open/closed streams]: rdoc-ref:StringIO@OpenClosed+Streams +[position]: rdoc-ref:StringIO@Position +[read/write mode]: rdoc-ref:StringIO@ReadWrite+Mode diff --git a/doc/strscan/.document b/doc/strscan/.document new file mode 100644 index 0000000000..b8085a8474 --- /dev/null +++ b/doc/strscan/.document @@ -0,0 +1 @@ +helper_methods.md diff --git a/doc/strscan/link_refs.txt b/doc/strscan/link_refs.txt index 19f6f7ce5c..04c4419982 100644 --- a/doc/strscan/link_refs.txt +++ b/doc/strscan/link_refs.txt @@ -1,5 +1,5 @@ [1]: rdoc-ref:StringScanner@Stored+String -[2]: rdoc-ref:StringScanner@Byte+Position+-28Position-29 +[2]: rdoc-ref:StringScanner@Byte+Position+Position [3]: rdoc-ref:StringScanner@Target+Substring [4]: rdoc-ref:StringScanner@Setting+the+Target+Substring [5]: rdoc-ref:StringScanner@Traversing+the+Target+Substring diff --git a/doc/strscan/methods/get_byte.md b/doc/strscan/methods/get_byte.md index 3208d77158..775226638e 100644 --- a/doc/strscan/methods/get_byte.md +++ b/doc/strscan/methods/get_byte.md @@ -1,6 +1,3 @@ -call-seq: - get_byte -> byte_as_character or nil - Returns the next byte, if available: - If the [position][2] diff --git a/doc/strscan/methods/get_charpos.md b/doc/strscan/methods/get_charpos.md index 954fcf5b44..4de07897dc 100644 --- 
a/doc/strscan/methods/get_charpos.md +++ b/doc/strscan/methods/get_charpos.md @@ -1,6 +1,3 @@ -call-seq: - charpos -> character_position - Returns the [character position][7] (initially zero), which may be different from the [byte position][2] given by method #pos: diff --git a/doc/strscan/methods/get_pos.md b/doc/strscan/methods/get_pos.md index 81bbb2345e..56b1636812 100644 --- a/doc/strscan/methods/get_pos.md +++ b/doc/strscan/methods/get_pos.md @@ -1,6 +1,3 @@ -call-seq: - pos -> byte_position - Returns the integer [byte position][2], which may be different from the [character position][7]: diff --git a/doc/strscan/methods/getch.md b/doc/strscan/methods/getch.md index 3dd70e4c5b..ede1d2b071 100644 --- a/doc/strscan/methods/getch.md +++ b/doc/strscan/methods/getch.md @@ -1,6 +1,3 @@ -call-seq: - getch -> character or nil - Returns the next (possibly multibyte) character, if available: diff --git a/doc/strscan/methods/scan.md b/doc/strscan/methods/scan.md index 22ddd368b6..805c797913 100644 --- a/doc/strscan/methods/scan.md +++ b/doc/strscan/methods/scan.md @@ -1,6 +1,3 @@ -call-seq: - scan(pattern) -> substring or nil - Attempts to [match][17] the given `pattern` at the beginning of the [target substring][3]. diff --git a/doc/strscan/methods/scan_until.md b/doc/strscan/methods/scan_until.md index 9a8c7c02f6..5fb2912a1b 100644 --- a/doc/strscan/methods/scan_until.md +++ b/doc/strscan/methods/scan_until.md @@ -1,6 +1,3 @@ -call-seq: - scan_until(pattern) -> substring or nil - Attempts to [match][17] the given `pattern` anywhere (at any [position][2]) in the [target substring][3]. diff --git a/doc/strscan/methods/set_pos.md b/doc/strscan/methods/set_pos.md index 3b7abe65e3..6a43edeb41 100644 --- a/doc/strscan/methods/set_pos.md +++ b/doc/strscan/methods/set_pos.md @@ -1,7 +1,3 @@ -call-seq: - pos = n -> n - pointer = n -> n - Sets the [byte position][2] and the [character position][11]; returns `n`. 
diff --git a/doc/strscan/methods/skip.md b/doc/strscan/methods/skip.md index 10a329e0e4..7e924b624b 100644 --- a/doc/strscan/methods/skip.md +++ b/doc/strscan/methods/skip.md @@ -1,6 +1,3 @@ -call-seq: - skip(pattern) match_size or nil - Attempts to [match][17] the given `pattern` at the beginning of the [target substring][3]; diff --git a/doc/strscan/methods/skip_until.md b/doc/strscan/methods/skip_until.md index b7dacf6da1..a0ffab0b84 100644 --- a/doc/strscan/methods/skip_until.md +++ b/doc/strscan/methods/skip_until.md @@ -1,13 +1,11 @@ -call-seq: - skip_until(pattern) -> matched_substring_size or nil - Attempts to [match][17] the given `pattern` -anywhere (at any [position][2]) in the [target substring][3]; -does not modify the positions. +anywhere (at any [position][2]) in the [target substring][3]. If the match attempt succeeds: - Sets [match values][9]. +- Sets the [byte position][2] to the end of the matched substring; + may adjust the [character position][7]. - Returns the size of the matched substring. ```rb @@ -42,6 +40,7 @@ If the match attempt fails: - Clears match values. - Returns `nil`. +- Does not update positions. ```rb scanner.skip_until(/nope/) # => nil diff --git a/doc/strscan/methods/terminate.md b/doc/strscan/methods/terminate.md index b03b37d2a2..27f7d41cb1 100644 --- a/doc/strscan/methods/terminate.md +++ b/doc/strscan/methods/terminate.md @@ -1,6 +1,3 @@ -call-seq: - terminate -> self - Sets the scanner to end-of-string; returns +self+: diff --git a/doc/strscan/strscan.md b/doc/strscan/strscan.md index 1211a687c2..bbdeccd75e 100644 --- a/doc/strscan/strscan.md +++ b/doc/strscan/strscan.md @@ -37,7 +37,7 @@ Some examples here assume that certain helper methods are defined: - `match_values_cleared?(scanner)`: Returns whether the scanner's [match values][9] are cleared. -See examples [here][ext/strscan/helper_methods_md.html]. +See examples at [helper methods](doc/strscan/helper_methods.md). 
## The `StringScanner` \Object @@ -204,7 +204,7 @@ put_situation(scanner) ## Target Substring -The target substring is the the part of the [stored string][1] +The target substring is the part of the [stored string][1] that extends from the current [byte position][2] to the end of the stored string; it is always either: @@ -417,7 +417,7 @@ Each of these methods returns a captured match value: | Method | Return After Match | Return After No Match | |-----------------|-----------------------------------------|-----------------------| | #size | Count of captured substrings. | +nil+. | -| #[](n) | <tt>n</tt>th captured substring. | +nil+. | +| #\[\](n) | <tt>n</tt>th captured substring. | +nil+. | | #captures | Array of all captured substrings. | +nil+. | | #values_at(*n) | Array of specified captured substrings. | +nil+. | | #named_captures | Hash of named captures. | <tt>{}</tt>. | diff --git a/doc/syntax.rdoc b/doc/syntax.rdoc index cb427b6f0f..a48c83ff15 100644 --- a/doc/syntax.rdoc +++ b/doc/syntax.rdoc @@ -2,6 +2,9 @@ The Ruby syntax is large and is split up into the following sections: +{Code Layout}[rdoc-ref:syntax/layout.rdoc] :: + Breaking code in lines + Literals[rdoc-ref:syntax/literals.rdoc] :: Numbers, Strings, Arrays, Hashes, etc. diff --git a/doc/syntax/assignment.rdoc b/doc/syntax/assignment.rdoc index 68d4ae97be..3988f82e5f 100644 --- a/doc/syntax/assignment.rdoc +++ b/doc/syntax/assignment.rdoc @@ -9,7 +9,7 @@ Assignment creates a local variable if the variable was not previously referenced. An assignment expression result is always the assigned value, including -{assignment methods}[rdoc-ref:syntax/assignment.rdoc@Assignment+Methods]. +{assignment methods}[rdoc-ref:@Assignment+Methods]. == Local Variable Names @@ -279,7 +279,7 @@ An uninitialized global variable has a value of +nil+. Ruby has some special globals that behave differently depending on context such as the regular expression match variables or that have a side-effect when -assigned to. 
See the {global variables documentation}[rdoc-ref:globals.md] +assigned to. See the {global variables documentation}[rdoc-ref:language/globals.md] for details. == Assignment Methods diff --git a/doc/syntax/calling_methods.rdoc b/doc/syntax/calling_methods.rdoc index bf5916e99a..a24c5fbf1f 100644 --- a/doc/syntax/calling_methods.rdoc +++ b/doc/syntax/calling_methods.rdoc @@ -355,9 +355,8 @@ as one argument: # Prints the object itself: # #<Name:0x00007f9d07bca650 @name="Jane Doe"> -This allows to handle one or many arguments polymorphically. Note also that +nil+ -has NilClass#to_a defined to return an empty array, so conditional unpacking is -possible: +This allows to handle one or many arguments polymorphically. Note also that <tt>*nil</tt> +is unpacked to an empty list of arguments, so conditional unpacking is possible: my_method(*(some_arguments if some_condition?)) @@ -426,7 +425,7 @@ as keyword arguments: name = Name.new('Jane Doe') p(**name) - # Prints: {name: "Jane", last: "Doe"} + # Prints: {first: "Jane", last: "Doe"} Unlike <code>*</code> operator, <code>**</code> raises an error when used on an object that doesn't respond to <code>#to_hash</code>. The one exception is diff --git a/doc/syntax/comments.rdoc b/doc/syntax/comments.rdoc index 00d19d588a..cb6829a984 100644 --- a/doc/syntax/comments.rdoc +++ b/doc/syntax/comments.rdoc @@ -170,7 +170,7 @@ In this mode, all values assigned to constants are made shareable. # shareable_constant_value: experimental_everything FOO = Set[1, 2, {foo: []}] - # same as FOO = Ractor.make_sharable(...) + # same as FOO = Ractor.make_shareable(...) 
# OR same as `FOO = Set[1, 2, {foo: [].freeze}.freeze].freeze` var = [{foo: []}] diff --git a/doc/syntax/layout.rdoc b/doc/syntax/layout.rdoc new file mode 100644 index 0000000000..31e51d9ff1 --- /dev/null +++ b/doc/syntax/layout.rdoc @@ -0,0 +1,118 @@ += Code Layout + +Expressions in Ruby are separated by line breaks: + + x = 1 + y = 2 + z = x + y + +Line breaks are also used as logical separators of the headers of some control structures from their bodies: + + if z > 3 # line break ends the condition and starts the body + puts "more" + end + + while x < 3 # line break ends the condition and starts the body + x += 1 + end + +<tt>;</tt> can be used as an expression separator instead of a line break: + + x = 1; y = 2; z = x + y + if z > 3; puts "more"; end + +Traditionally, expressions separated by <tt>;</tt> are used only in short scripts and experiments. + +In some control structures, there is an optional keyword that can be used instead of a line break to separate their elements: + + # if, elsif, unless and case ... when: 'then' is an optional separator: + + if z > 3 then puts "more" end + + case x + when Numeric then "number" + when String then "string" + else "object" + end + + # while and until: 'do' is an optional separator + while x < 3 do x += 1 end + +Also, line breaks can be skipped in some places where this doesn't create any ambiguity. Note in the example above: no line break is needed before +end+, just as no line break is needed after +else+. + +== Breaking expressions in lines + +One expression might be split into several lines when each line can be unambiguously identified as "incomplete" without the next one.
+ +These work: + + x = # incomplete without something after = + 1 + # incomplete without something after + + 2 + + File.read "test.txt", # incomplete without something after , + encoding: "utf-8" + +These would not: + + # unintended interpretation: + x = 1 # already complete expression + + 2 # interpreted as a separate +2 + + # syntax error: + File.read "test.txt" # already complete expression + , encoding: "utf-8" # attempt to parse as a new expression, SyntaxError + +The exceptions to the rule are lines starting with <tt>.</tt> ("leading dot" style of method calls) or logical operators <tt>&&</tt>/<tt>||</tt> and <tt>and</tt>/<tt>or</tt>: + + # OK, interpreted as a chain of calls + File.read('test.txt') + .strip("\n") + .split("\t") + .sort + + # OK, interpreted as a chain of logical operators: + File.empty?('test.txt') + || File.size('test.txt') < 10 + || File.read('test.txt').strip.empty? + +If the expression is broken into multiple lines in any of the ways described above, comments between separate lines are allowed: + + sum = base_salary + + # see "yearly bonuses section" + yearly_bonus(year) + + # per-employee coefficient is described + # in another module + personal_coeff(employee) + + # We want to short-circuit on empty files + File.empty?('test.txt') + # Or almost empty ones + || File.size('test.txt') < 10 + # Otherwise we check if it is full of spaces + || File.read('test.txt').strip.empty?
+ +Finally, the code can explicitly tell Ruby that the expression is continued on the next line with <tt>\\</tt>: + + # Unusual, but works + File.read "test.txt" \ + , encoding: "utf-8" + + # More regular usage (joins the strings on parsing instead + # of concatenating them in runtime, as + would do): + TEXT = "One pretty long line" \ + "one more long line" \ + "one other line of the text" + +The <tt>\\</tt> works as a parse time line break escape, so with it, comments can not be inserted between the lines: + + TEXT = "line 1" \ + # here would be line 2: + "line 2" + + # This is interpreted as if there was no line break where \ is, + # i.e. the same as + TEXT = "line 1" # here would be line 2: + "line 2" + + puts TEXT #=> "line 1" diff --git a/doc/syntax/literals.rdoc b/doc/syntax/literals.rdoc index 46bb7673f3..c876558d4e 100644 --- a/doc/syntax/literals.rdoc +++ b/doc/syntax/literals.rdoc @@ -3,7 +3,7 @@ Literals create objects you can use in your program. Literals include: * {Boolean and Nil Literals}[#label-Boolean+and+Nil+Literals] -* {Number Literals}[#label-Number+Literals] +* {Numeric Literals}[#label-Numeric+Literals] * {Integer Literals}[#label-Integer+Literals] * {Float Literals}[#label-Float+Literals] @@ -36,7 +36,7 @@ Literals create objects you can use in your program. Literals include: +true+ is a true value. All objects except +nil+ and +false+ evaluate to a true value in conditional expressions. -== Number Literals +== \Numeric Literals === \Integer Literals @@ -547,6 +547,13 @@ with <tt>%w</tt> (non-interpolable) or <tt>%W</tt> (interpolable): # (not nested array). %w[foo[bar baz]qux] # => ["foo[bar", "baz]qux"] +The interpolated string is treated as a single word even if it contains +whitespace. 
+ + s = "bar baz" + %W[foo #{s} zot] #=> ["foo", "bar baz", "zot"] + %W[foo #{"bar baz zot"} qux] # => ["foo", "bar baz zot", "qux"] + The following characters are considered as white spaces to separate words: * space, ASCII 20h (SPC) diff --git a/doc/syntax/methods.rdoc b/doc/syntax/methods.rdoc index 8dafa6bb0c..14810a188f 100644 --- a/doc/syntax/methods.rdoc +++ b/doc/syntax/methods.rdoc @@ -100,6 +100,7 @@ operators. <code>/</code> :: divide <code>%</code> :: modulus division, String#% <code>&</code> :: AND +<code>|</code> :: OR <code>^</code> :: XOR (exclusive OR) <code>>></code> :: right-shift <code><<</code> :: left-shift, append diff --git a/doc/syntax/pattern_matching.rdoc b/doc/syntax/pattern_matching.rdoc index c43919ba14..06aae26d49 100644 --- a/doc/syntax/pattern_matching.rdoc +++ b/doc/syntax/pattern_matching.rdoc @@ -253,11 +253,11 @@ The "rest" part of a pattern also can be bound to a variable: case {a: 1, b: 2} in {a: } | Array + # ^ SyntaxError (variable capture in alternative pattern) "matched: #{a}" else "not matched" end - # SyntaxError (illegal variable in alternative pattern (a)) Variables that start with <code>_</code> are the only exclusions from this rule: diff --git a/doc/syntax/refinements.rdoc b/doc/syntax/refinements.rdoc index 17d5e67c21..80595eb445 100644 --- a/doc/syntax/refinements.rdoc +++ b/doc/syntax/refinements.rdoc @@ -210,43 +210,58 @@ all refinements from the same module are active when a refined method == Method Lookup -When looking up a method for an instance of class +C+ Ruby checks: +Method lookup in Ruby is based on the ancestor chain. 
You can see the +ancestor chain for any object in Ruby by doing: -* If refinements are active for +C+, in the reverse order they were activated: - * The prepended modules from the refinement for +C+ - * The refinement for +C+ - * The included modules from the refinement for +C+ -* The prepended modules of +C+ -* +C+ -* The included modules of +C+ + object.singleton_class.ancestors + # or, if the object does not support a singleton class: + object.class.ancestors -If no method was found at any point this repeats with the superclass of +C+. +The ancestor chain is constructed as follows: -Note that methods in a subclass have priority over refinements in a -superclass. For example, if the method <code>/</code> is defined in a -refinement for Numeric <code>1 / 2</code> invokes the original Integer#/ -because Integer is a subclass of Numeric and is searched before the refinements -for the superclass Numeric. Since the method <code>/</code> is also present -in child +Integer+, the method lookup does not move up to the superclass. +* Subclasses are before superclasses in the ancestor chain. +* Prepended modules are before the class they prepend in the ancestor + chain, in reverse order in which they were prepended. +* Included modules are after the class they are included in in the + ancestor chain, in reverse order in which they were included. + +When looking up a method for an object, Ruby goes through each ancestor: + +* If the class/module has been refined, Ruby will consider the refinements + activated at the point the method was called, in reverse order of + activation. +* Otherwise, Ruby will check the methods of the class/module itself. + +If no method was found at either point, this repeats with the next +ancestor. -However, if a method +foo+ is defined on Numeric in a refinement, <code>1.foo</code> +Note that methods in an earlier ancestor have priority over refinements in a +
For example, if the method <code>/</code> is defined in a +refinement for Numeric, <code>1 / 2</code> invokes the original Integer#/ +because Integer comes before Numeric in the ancestor chain. However, +if a method +foo+ is defined on Numeric in a refinement, <code>1.foo</code> invokes that method since +foo+ does not exist on Integer. == +super+ -When +super+ is invoked method lookup checks: +When +super+ is invoked, method lookup starts: + +* If the method is in a refinement, at the refined class or module +* Otherwise, at the next ancestor + +Method lookup then proceeds as described in the Method Lookup section +above. -* The included modules of the current class. Note that the current class may - be a refinement. -* If the current class is a refinement, the method lookup proceeds as in the - Method Lookup section above. -* If the current class has a direct superclass, the method proceeds as in the - Method Lookup section above using the superclass. +Refinements activated at the call site of a refinement method do not +affect +super+ inside that method. Only refinements activated at the +point +super+ was called affect method lookup for that +super+ call. +You cannot use refinements to insert into the middle of a method +lookup chain, only to insert at the start of a method lookup chain, +unless you control the +super+ call sites. -Note that +super+ in a method of a refinement invokes the method in the -refined class even if there is another refinement which has been activated in -the same context. This is only true for +super+ in a method of a refinement, it -does not apply to +super+ in a method in a module that is included in a refinement. +Note that if you refine a module, the refinement method can call +super+ +to call the method in the module, but the method in the module cannot +call +super+ to continue the method lookup process to further ancestors.
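The lookup rules described in the refinements hunk above can be tried out with a short script. This is only an illustrative sketch and is not part of the diff: the names +Loud+, +Polite+, +Greeter+, +NumericExtras+ and +Demo+ are hypothetical and exist only for this example.

```ruby
# Hypothetical modules and class, used only to inspect the ancestor chain.
module Loud;   end
module Polite; end

class Greeter
  include Polite   # included modules sit after the class in the chain
  prepend Loud     # prepended modules sit before the class in the chain
end

# Loud comes before Greeter, and Greeter comes before Polite.
p Greeter.ancestors.first(3)

# Hypothetical refinement of Numeric, mirroring the example in the text.
module NumericExtras
  refine Numeric do
    def /(other)
      :refined_division
    end

    def foo
      :refined_foo
    end
  end
end

module Demo
  using NumericExtras   # activate the refinement for the rest of this module body

  # Integer is an earlier ancestor than Numeric and defines Integer#/,
  # so the refinement of Numeric#/ is never reached.
  QUOTIENT = 1 / 2

  # Integer does not define foo, so lookup continues to Numeric,
  # where the active refinement supplies the method.
  FOO = 1.foo
end

p Demo::QUOTIENT
p Demo::FOO
```

The refinement is activated with Module#using inside a module body so that the script does not depend on being run as a top-level file.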
== Methods Introspection diff --git a/doc/yarv_frame_layout.md b/doc/yarv_frame_layout.md deleted file mode 100644 index ea8ad013cf..0000000000 --- a/doc/yarv_frame_layout.md +++ /dev/null @@ -1,77 +0,0 @@ -# YARV Frame Layout - -This document is an introduction to what happens on the VM stack as the VM -services calls. The code holds the ultimate truth for this subject, so beware -that this document can become stale. - -We'll walk through the following program, with explanation at selected points -in execution and abridged disassembly listings: - -```ruby -def foo(x, y) - z = x.casecmp(y) -end - -foo(:one, :two) -``` - -First, after arguments are evaluated and right before the `send` to `foo`: - -``` - ┌────────────┐ - putself │ :two │ - putobject :one 0x2 ├────────────┤ - putobject :two │ :one │ -► send <:foo, argc:2> 0x1 ├────────────┤ - leave │ self │ - 0x0 └────────────┘ -``` - -The `put*` instructions have pushed 3 items onto the stack. It's now time to -add a new control frame for `foo`. The following is the shape of the stack -after one instruction in `foo`: - -``` - cfp->sp=0x8 at this point. - 0x8 ┌────────────┐◄──Stack space for temporaries - │ :one │ live above the environment. - 0x7 ├────────────┤ - getlocal x@0 │ < flags > │ foo's rb_control_frame_t -► getlocal y@1 0x6 ├────────────┤◄──has cfp->ep=0x6 - send <:casecmp, argc:1> │ <no block> │ - dup 0x5 ├────────────┤ The flags, block, and CME triple - setlocal z@2 │ <CME: foo> │ (VM_ENV_DATA_SIZE) form an - leave 0x4 ├────────────┤ environment. They can be used to - │ z (nil) │ figure out what local variables - 0x3 ├────────────┤ are below them. - │ :two │ - 0x2 ├────────────┤ Notice how the arguments, now - │ :one │ locals, never moved. This layout - 0x1 ├────────────┤ allows for argument transfer - │ self │ without copying. 
- 0x0 └────────────┘ -``` - -Given that locals have lower address than `cfp->ep`, it makes sense then that -`getlocal` in `insns.def` has `val = *(vm_get_ep(GET_EP(), level) - idx);`. -When accessing variables in the immediate scope, where `level=0`, it's -essentially `val = cfp->ep[-idx];`. - -Note that this EP-relative index has a different basis the index that comes -after "@" in disassembly listings. The "@" index is relative to the 0th local -(`x` in this case). - -## Q&A - -Q: It seems that the receiver is always at an offset relative to EP, - like locals. Couldn't we use EP to access it instead of using `cfp->self`? - -A: Not all calls put the `self` in the callee on the stack. Two - examples are `Proc#call`, where the receiver is the Proc object, but `self` - inside the callee is `Proc#receiver`, and `yield`, where the receiver isn't - pushed onto the stack before the arguments. - -Q: Why have `cfp->ep` when it seems that everything is below `cfp->sp`? - -A: In the example, `cfp->ep` points to the stack, but it can also point to the - GC heap. Blocks can capture and evacuate their environment to the heap. diff --git a/doc/yarvarch.en b/doc/yarvarch.en deleted file mode 100644 index 7a76e25b7e..0000000000 --- a/doc/yarvarch.en +++ /dev/null @@ -1,7 +0,0 @@ -#title YARV: Yet another RubyVM - Software Architecture - -maybe writing. - -* YARV instruction set - -<%= d %> diff --git a/doc/yarvarch.ja b/doc/yarvarch.ja deleted file mode 100644 index 2739ec6b14..0000000000 --- a/doc/yarvarch.ja +++ /dev/null @@ -1,454 +0,0 @@ -#title YARVアーキテクチャ -#set author 日本 Ruby の会 ささだこういち - - -- 2005-03-03(Thu) 00:31:12 +0900 いろいろと書き直し - ----- - -* これは? 
- -[[YARV: Yet Another RubyVM|http://www.atdot.net/yarv]] の 設計メモです。 - - -YARV は、Ruby プログラムのための次の機能を提供します。 - -- Compiler -- VM Generator -- VM (Virtual Machine) -- Assembler -- Dis-Assembler -- (experimental) JIT Compiler -- (experimental) AOT Compiler - - -現在の YARV は Ruby インタプリタの拡張ライブラリとして実装しています。こ -れにより、Ruby インタプリタの必要な機能(パーサ、オブジェクト管理、既存 -の拡張ライブラリ)などがほぼそのまま利用できます。 - -ただし、いくつかのパッチを Ruby インタプリタに当てなければなりません。 - -今後は、Ruby 本体のインタプリタ部分(eval.c)を置き換えることを目指して -開発を継続する予定です。 - - -* Compiler (compile.h, compile.c) - -コンパイラは、Ruby インタプリタのパーサによって生成された構文木(RNode -データによる木)を YARV 命令列に変換します。YARV 命令については後述しま -す。 - -とくに難しいことはしていませんが、スコープなどの開始時にローカル変数の初 -期化などを行い、あとは構文木を辿り変換していきます。 - -変換中は Ruby の Array オブジェクトに YARV 命令オブジェクト、およびオペ -ランドを格納していき、最後に実行できる形に変換します。コンパイラでは、コ -ンパイル中に生成するメモリ領域の管理が問題になることがありますが、YARV -の場合、Ruby インタプリタがすべて面倒をみてくれるのでこの部分は非常に楽 -に作ることができました(ガーベージコレクタによって自動的にメモリ管理をし -てくれるため)。 - -YARV 命令は、命令を示す識別子、オペランドなど、すべて 1 word (マシンで -表現できる自然な値。C 言語ではポインタのサイズ。Ruby インタプリタ用語で -は VALUE のサイズ)で表現されます。そのため、YARV 命令はいわゆる「バイト -コード」ではありません。そのため、YARV の説明などでは「命令列」という用 -語を使っています。 - -1 word であるため、メモリの利用効率は多少悪くなりますが、アクセス速度な -どを考慮すると、本方式が一番いいと考えております。たとえばオペランドをコ -ンスタントプールに格納し、インデックスのみをオペランドで示すことも可能で -すが、間接アクセスになってしまうので性能に影響が出るため、却下しました。 - - -* VM Generator (rb/insns2vm.rb, insns.def) - -rb/insns2vm.rb というスクリプトは、insns.def というファイルを読み込み、 -VM のために必要なファイルを生成します。具体的には、命令を実行する部分を -生成しますが、ほかにもコンパイルに必要な情報、最適化に必要な情報、やアセ -ンブラ、逆アセンブラに必要な情報を示すファイルも生成します。 - - -** 命令記述 - -insns.def には、各命令がどのような命令であるかを記述します。具体的には次 -の情報を記述します。 - -- 命令の名前 -- その命令のカテゴリ、コメント(英語、日本語) -- オペランドの名前 -- その命令実行前にスタックからポップする値 -- その命令実行後にスタックにプッシュする値 -- その命令のロジック(C 言語で記述) - -たとえば、スタックに self をおく putself という命令は次のように記述しま -す。 - -#code -/** - @c put - @e put self. 
- @j self を置く。 - */ -DEFINE_INSN -putself -() -() -(VALUE val) -{ - val = GET_SELF(); -} -#end - -この場合、オペランドと、スタックからポップする値は無いことになります。命 -令終了後、self をスタックトップに置きたいわけですが、それは val という、 -スタックにプッシュする値として宣言しておいた変数に代入しておくことで、こ -れを変換するとスタックトップに置く C プログラムが生成されます。 - -細かいフォーマットは insns.def の冒頭を参照してください。そんなに難しく -ないと思います。 - -insnhelper.h というファイルに、命令ロジックを記述するために必要なマクロ -が定義されています。また、VM の内部構造に関する定義は vm.h というファイ -ルにあります。 - - -* VM (Virtual Machine, vm.h, vm.c) - -VM は、実際にコンパイルした結果生成される YARV 命令列を実行します。まさ -に、この部分が YARV のキモになり、将来的には eval.c をこの VM で置き換え -たいと考えています。 - -現在の Ruby インタプリタで実行できるすべてのことが、この VM で実現できる -ように作っています(現段階ではまだ完全ではありませんが、そうなるべきです)。 - -VM は、単純なスタックマシンとして実装しています。スレッドひとつにスタッ -クひとつを保持します。スタックの領域はヒープから取得するので、柔軟な領域 -設定が可能です。 - - -** レジスタ - -VM は 5 つの仮想的なレジスタによって制御されます。 - -- PC (Program Counter) -- SP (Stack Pointer) -- CFP (Control Frame Pointer) -- LFP (Local Frame Pointer) -- DFP (Dynamic Frame Pointer) - -PC は現在実行中の命令列の位置を示します。SP はスタックトップの位置を示し -ます。CFP、LFP、DFP はそれぞれフレームの情報を示します。詳細は後述します。 - - -** スタックフレーム - -obsolete (update soon) - - -** フレームデザインについての補足 - -Lisp の処理系などをかんがえると、わざわざブロックローカルフレームとメソ -ッドローカルフレームのようなものを用意するのは奇異に見えるかもしれません。 -あるフレームを、入れ子構造にして、ローカル変数のアクセスはその入れ子を外 -側に辿れば必ずたどり着くことができるからです(つまり、lfp は必要ない)。 - -しかし、Ruby ではいくつか状況が違います。まず、メソッドローカルな情報が -あること、具体的にはブロックとself(callee からみると receiver)です。こ -の情報をそれぞれのフレームにもたせるのは無駄です。 - -また、Ruby2.0 からはブロックローカル変数はなくなります(ブロックローカル -引数は残るので、構造自体はあまり変わりません)。そのため、メソッドローカ -ル変数へのアクセスが頻発することが予想されます。 - -このとき、メソッドローカル変数へのアクセスのたびにフレーム(スコープ)の -リストをたどるのは無駄であると判断し、明示的にメソッドローカルスコープと -ブロックフレームを分離し、ブロックフレームからはメソッドローカルフレーム -が lfpレジスタによって容易にアクセスできるようにしました。 - - -** メソッド呼び出しについて - -メソッド呼び出しは、YARV 命令列で記述されたメソッドか、C で記述されたメ -ソッドかによってディスパッチ手法が変わります。 - -YARV 命令列であった場合、上述したスタックフレームを作成して命令を継続し -ます。とくに VM の関数を再帰呼び出すすることは行ないません。 - -C で記述されたメソッドだった場合、単純にその関数を呼び出します(ただし、 -バックトレースを正しく生成するためにメソッド呼び出しの情報を付加してから -行ないます)。 - -このため、VM 用スタックを別途用意したものの、プログラムによってはマシン -スタックを使い切ってしまう可能性があります(C -> Ruby -> C -> ... 
という -呼び出しが続いた場合)。これは、現在では避けられない仕様となっています。 - - -** 例外 - -例外は、Java の JVM と同様に例外テーブルを用意することで実現します。例外 -が発生したら、当該フレームを、例外テーブルを検査します。そこで、例外が発 -生したときの PC の値に合致するエントリがあった場合、そのエントリに従って -動作します。もしエントリが見つからなかった場合、スタックを撒き戻してまた -同様にそのスコープの例外テーブルを検査します。 - -また、break、return(ブロック中)、retry なども同様の仕組みで実現します。 - -*** 例外テーブル - -例外テーブルエントリは具体的には次の情報が格納されています。 - -- 対象とする PC の範囲 -- 対象とする例外の種類 -- もし対象となったときにジャンプする先(種類による) -- もし対象となったときに起動するブロックの iseq - - -*** rescue - -rescue 節はブロックとして実現しています。$! の値を唯一の引数として持ちま -す。 - -#code -begin -rescue A -rescue B -rescue C -end -#end - -は、次のような Ruby スクリプトに変換されます。 - -#code -{|err| - case err - when A === err - when B === err - when C === err - else - raise # yarv の命令では throw - end -} -#end - - -*** ensure - -正常系(例外が発生しなかった場合)と異常系(例外が発生したときなど)の2 -種類の命令列が生成されます。正常系では、ただの連続したコード領域としてコ -ンパイルされます。また、異常系ではブロックとして実装します。最後は必ず -throw 命令で締めることになります。 - - -*** break, return(ブロック中)、retry - -break 文、ブロック中の return 文、retry 文は throw 命令としてコンパイル -されます。どこまで戻るかは、break をフックする例外テーブルのエントリが判 -断します。 - - -** 定数の検索 - -定数という名前なのに、Ruby ではコンパイル時に決定しません。というか、い -つまでも再定義可能になっています。 - -定数アクセスのためのRuby記述は次のようになります。 - -#code -Ruby表現: -expr::ID::...::ID -#end - -これは、yarv命令セットでは次のようになります。 - -#code -(expr) -getconstant ID -... 
-getconstant ID -#end - - -*** 定数検索パス - -もし expr が nil だった場合、定数検索パスに従って定数を検索します。この -挙動は今後 Ruby 2.0 に向けて変更される場合があります。 - -+ クラス、モジュールの動的ネスト関係(プログラムの字面上)をルートまで辿る -+ 継承関係をルート(Object)まで辿る - -このため、クラス、モジュールの動的ネスト関係を保存しなければなりません。 -このために、thread_object には klass_nest_stack というものを用意しました。 -これは、現在のネストの情報を保存します。 - -メソッド定義時、その現在のネスト情報をメソッド定義時に(dupして)加える -ことで、そのメソッドの実行時、そのネスト情報を参照することが可能になりま -す。 - -トップレベルでは、その情報はないことになります。 - -クラス/モジュール定義文実行時は、現在の情報そのものを参照することになり -ます。これは、クラススコープ突入時、その情報をクラス定義文にコピーします -(すでにコピーされていれば、これを行いません)。 - -これにより、動的なネスト情報を統一的に扱うことができます。 - - -** 最適化手法 - -YARV では高速化を目的としているので、さまざまな最適化手法を利用していま -す。詳細は割愛しますが、以下に述べる最適化などを行なっております。 - - -*** threaded code - -GCC の C 言語拡張である値としてのラベルを利用して direct threaded code -を実現しています。 - - -*** Peephole optimization - -いくつかの簡単な最適化をしています。 - - -*** inline method cache - -命令列の中にメソッド検索結果を埋め込みます。 - - -*** inline constant cache - -命令列の中に定数検索結果を埋め込みます。 - - -*** ブロックと Proc オブジェクトの分離 - -ブロック付きメソッド呼び出しが行なわれたときにはすぐにはブロックを Proc -オブジェクトとして生成しません。これにより、必要ない Proc オブジェクトの -生成を抑えています。 - -Proc メソッドは、実際に必要になった時点で作られ、そのときに環境(スコー -プ上に確保された変数など)をヒープに保存します。 - - -*** 特化命令 - -Fixnum 同士の加算などを正直に関数呼び出しによって行なうと、コストがかか -るので、これらのプリミティブな操作を行なうためのメソッド呼び出しは専用命 -令を用意しました。 - - -*** 命令融合 - -複数の命令を 1 命令に変換します。融合命令は opt_insn_unif.def の記述によ -り自動的に生成されます。 - - -*** オペランド融合 - -複数のオペランドを含めた命令を生成します。融合命令は opt_operand.def の -記述によって自動的に生成されます。 - - -*** stack caching - -スタックトップを仮想レジスタに保持するようにします。現在は 2 個の仮想レ -ジスタを想定し、5状態のスタックキャッシングを行ないます。スタックキャッ -シングする命令は自動的に生成されます。 - - -*** JIT Compile - -機械語を切り貼りします。非常に実験的なコードものしか作っておりません。ほ -とんどのプログラムは動きません。 - - -*** AOT Compile - -YARV 命令列を C 言語に変換します。まだ十分な最適化を行なえておりませんが、 -それなりに動きます。rb/aotc.rb がコンパイラです。 - - -* Assembler (rb/yasm.rb) - -YARV 命令列のアセンブラを用意しました。使い方は rb/yasm.rb を参照してく -ださい(まだ、例示してある生成手法のすべてをサポートしているわけではあり -ません)。 - - -* Dis-Assembler (disasm.c) - -YARV 命令列を示すオブジェクト YARVCore::InstructionSequence には disasm -メソッドがあります。これは、命令列を逆アセンブルした文字列を返します。 - - -* YARV 命令セット - -<%= d %> - -* その他 - -** テスト - -test/test_* がテストケースです。一応、ミスなく動くはずです。逆にいうと、 
-このテストに記述されている例ではきちんと動作するということです。 - - -** ベンチマーク - -benchmark/bm_* にベンチマークプログラムがおいてあります。 - - -** 今後の予定 - -まだまだやらなければいけないこと、未実装部分がたくさんありますんでやって -いかなければなりません。一番大きな目標は eval.c を置き換えることでしょう -か。 - - -*** Verifier - -YARV 命令列は、ミスがあっても動かしてしまうため危険である可能性がありま -す。そのため、スタックの利用状態をきちんと事前に検証するようなベリファイ -アを用意しなければならないと考えています。 - - -*** Compiled File の構想 - -Ruby プログラムをこの命令セットにシリアライズしたデータ構造をファイルに -出力できるようにしたいと考えています。これを利用して一度コンパイルした命 -令列をファイルに保存しておけば、次回ロード時にはコンパイルの手間、コスト -を省くことができます。 - - -**** 全体構成 - -次のようなファイル構成を考えていますが、まだ未定です。 - -#code -u4 : 4 byte unsigned storage -u2 : 2 byte unsigned storage -u1 : 1 byte unsigned storage - -every storages are little endian :-) - -CompiledFile{ - u4 magic; - - u2 major; - u2 minor; - - u4 character_code; - - u4 constants_pool_count; - ConstantEntry constants_pool[constants_pool_count]; - - u4 block_count; - blockEntry blocks[block_count]; - - u4 method_count; - MethodEntry methods[method_count]; -} -#end - -Java classfile のパクリ。 - diff --git a/doc/zjit.md b/doc/zjit.md deleted file mode 100644 index 90f890bfa0..0000000000 --- a/doc/zjit.md +++ /dev/null @@ -1,133 +0,0 @@ -# ZJIT: ADVANCED RUBY JIT PROTOTYPE - -## Build Instructions - -To build ZJIT on macOS: -``` -./autogen.sh -./configure --enable-zjit=dev --prefix=$HOME/.rubies/ruby-zjit --disable-install-doc --with-opt-dir="$(brew --prefix openssl):$(brew --prefix readline):$(brew --prefix libyaml)" -make -j miniruby -``` - -## Useful dev commands - -To view YARV output for code snippets: -``` -./miniruby --dump=insns -e0 -``` - -To run code snippets with ZJIT: -``` -./miniruby --zjit -e0 -``` - -You can also try https://www.rubyexplorer.xyz/ to view Ruby YARV disasm output with syntax highlighting -in a way that can be easily shared with other team members. - -## Testing - -Make sure you have a `--enable-zjit=dev` build, and run `brew install cargo-nextest` first. - -### make zjit-check - -This command runs all ZJIT tests: `make zjit-test` and `test/ruby/test_zjit.rb`. 
- -``` -make zjit-check -``` - -### make zjit-test - -This command runs Rust unit tests. - -``` -make zjit-test -``` - -You can also run a single test case by specifying the function name: - -``` -make zjit-test ZJIT_TESTS=test_putobject -``` - -If you expect that your changes cause tests to fail and they do, you can have -`expect-test` fix the expected value for you by putting `UPDATE_EXPECT=1` -before your test command, like so: - -``` -UPDATE_EXPECT=1 make zjit-test ZJIT_TESTS=test_putobject -``` - -Test changes will be reviewed alongside code changes. - -<details> - -<summary>Setting up zjit-test</summary> - -ZJIT uses `cargo-nextest` for Rust unit tests instead of `cargo test`. -`cargo-nextest` runs each test in its own process, which is valuable since -CRuby only supports booting once per process, and most APIs are not thread -safe. Use `brew install cargo-nextest` to install it on macOS, otherwise, refer -to <https://nexte.st/docs/installation/pre-built-binaries/> for installation -instructions. - -Since it uses Cargo, you'll also need a `configure --enable-zjit=dev ...` build -for `make zjit-test`. Since the tests need to link against CRuby, directly -calling `cargo test`, or `cargo nextest` likely won't build. Make sure to -use `make`. - -</details> - -### make zjit-test-all - -``` -make zjit-test-all -``` - -This command runs all Ruby tests under `/test/ruby/` with ZJIT enabled. - -Certain tests are excluded under `/test/.excludes-zjit`. - -### test/ruby/test\_zjit.rb - -This command runs Ruby execution tests. - -``` -make test-all TESTS="test/ruby/test_zjit.rb" -``` - -You can also run a single test case by matching the method name: - -``` -make test-all TESTS="test/ruby/test_zjit.rb -n TestZJIT#test_putobject" -``` - -## ZJIT Glossary - -This glossary contains terms that are helpful for understanding ZJIT. - -Please note that some terms may appear in CRuby internals too but with different meanings. 
- -| Term | Definition | -| --- | -----------| -| HIR | High-level Intermediate Representation. High-level (Ruby semantics) graph representation in static single-assignment (SSA) form | -| LIR | Low-level Intermediate Representation. Low-level IR used in the backend for assembly generation | -| SSA | Static Single Assignment. A form where each variable is assigned exactly once | -| `opnd` | Operand. An operand to an IR instruction (can be register, memory, immediate, etc.) | -| `dst` | Destination. The output operand of an instruction where the result is stored | -| VReg | Virtual Register. A virtual register that gets lowered to physical register or memory | -| `insn_id` | Instruction ID. An index of an instruction in a function | -| `block_id` | The index of a basic block, which effectively acts like a pointer | -| `branch` | Control flow edge between basic blocks in the compiled code | -| `cb` | Code Block. Memory region for generated machine code | -| `entry` | The starting address of compiled code for an ISEQ | -| Patch Point | Location in generated code that can be modified later in case assumptions get invalidated | -| Frame State | Captured state of the Ruby stack frame at a specific point for deoptimization | -| Guard | A run-time check that ensures assumptions are still valid | -| `invariant` | An assumption that JIT code relies on, requiring invalidation if broken | -| Deopt | Deoptimization. Process of falling back from JIT code to interpreter | -| Side Exit | Exit from JIT code back to interpreter | -| Type Lattice | Hierarchy of types used for type inference and optimization | -| Constant Folding | Optimization that evaluates constant expressions at compile time | -| RSP | x86-64 stack pointer register used for native stack operations | -| Register Spilling | Process of moving register values to memory when running out of physical registers | |
