Diffstat (limited to 'doc')
-rw-r--r-- doc/contributing/building_ruby.md | 145
-rw-r--r-- doc/encodings.rdoc | 2
-rw-r--r-- doc/exceptions.md | 362
-rw-r--r-- doc/format_specifications.rdoc | 2
-rw-r--r-- doc/strscan/helper_methods.md | 128
-rw-r--r-- doc/strscan/link_refs.txt | 17
-rw-r--r-- doc/strscan/methods/get_byte.md | 30
-rw-r--r-- doc/strscan/methods/get_charpos.md | 19
-rw-r--r-- doc/strscan/methods/get_pos.md | 14
-rw-r--r-- doc/strscan/methods/getch.md | 43
-rw-r--r-- doc/strscan/methods/scan.md | 51
-rw-r--r-- doc/strscan/methods/scan_until.md | 52
-rw-r--r-- doc/strscan/methods/set_pos.md | 27
-rw-r--r-- doc/strscan/methods/skip.md | 43
-rw-r--r-- doc/strscan/methods/skip_until.md | 49
-rw-r--r-- doc/strscan/methods/terminate.md | 30
-rw-r--r-- doc/strscan/strscan.md | 543
-rw-r--r-- doc/syntax/literals.rdoc | 171
-rw-r--r-- doc/syntax/pattern_matching.rdoc | 8
19 files changed, 1651 insertions, 85 deletions
diff --git a/doc/contributing/building_ruby.md b/doc/contributing/building_ruby.md
index 96cee40cb4..ce844b5026 100644
--- a/doc/contributing/building_ruby.md
+++ b/doc/contributing/building_ruby.md
@@ -8,28 +8,30 @@
For RubyGems, you will also need:
- * OpenSSL 1.1.x or 3.0.x / LibreSSL
- * libyaml 0.1.7 or later
- * zlib
+ * [OpenSSL] 1.1.x or 3.0.x / [LibreSSL]
+ * [libyaml] 0.1.7 or later
+ * [zlib]
If you want to build from the git repository, you will also need:
- * autoconf - 2.67 or later
- * gperf - 3.1 or later
+ * [autoconf] - 2.67 or later
+ * [gperf] - 3.1 or later
* Usually unneeded; only if you edit some source files using gperf
* ruby - 3.0 or later
- * We can upgrade this version to system ruby version of the latest Ubuntu LTS.
+ * We can upgrade this version to system ruby version of the latest
+ Ubuntu LTS.
2. Install optional, recommended dependencies:
- * libffi (to build fiddle)
- * gmp (if you with to accelerate Bignum operations)
- * libexecinfo (FreeBSD)
- * rustc - 1.58.0 or later, if you wish to build
- [YJIT](https://docs.ruby-lang.org/en/master/RubyVM/YJIT.html).
+ * [libffi] (to build fiddle)
+ * [gmp] (if you wish to accelerate Bignum operations)
+ * [rustc] - 1.58.0 or later, if you wish to build
+ [YJIT](rdoc-ref:RubyVM::YJIT).
- If you installed the libraries needed for extensions (openssl, readline, libyaml, zlib) into other than the OS default place,
- typically using Homebrew on macOS, add `--with-EXTLIB-dir` options to `CONFIGURE_ARGS` environment variable.
+ If you installed the libraries needed for extensions (openssl, readline,
+ libyaml, zlib) somewhere other than the OS default location, typically using
+ Homebrew on macOS, add `--with-EXTLIB-dir` options to the `CONFIGURE_ARGS`
+ environment variable.
``` shell
export CONFIGURE_ARGS=""
@@ -38,6 +40,16 @@
done
```
+[OpenSSL]: https://www.openssl.org
+[LibreSSL]: https://www.libressl.org
+[libyaml]: https://github.com/yaml/libyaml/
+[zlib]: https://www.zlib.net
+[autoconf]: https://www.gnu.org/software/autoconf/
+[gperf]: https://www.gnu.org/software/gperf/
+[libffi]: https://sourceware.org/libffi/
+[gmp]: https://gmplib.org
+[rustc]: https://www.rust-lang.org
+
## Quick start guide
1. Download ruby source code:
@@ -46,8 +58,8 @@
1. Build from the tarball:
- Download the latest tarball from [ruby-lang.org](https://www.ruby-lang.org/en/downloads/) and
- extract it. Example for Ruby 3.0.2:
+ Download the latest tarball from the [Download Ruby] page and extract
+ it. Example for Ruby 3.0.2:
``` shell
tar -xzf ruby-3.0.2.tar.gz
@@ -75,7 +87,8 @@
mkdir build && cd build
```
- While it's not necessary to build in a separate directory, it's good practice to do so.
+ While it's not necessary to build in a separate directory, it's good
+ practice to do so.
3. We'll install Ruby in `~/.rubies/ruby-master`, so create the directory:
@@ -89,7 +102,8 @@
../configure --prefix="${HOME}/.rubies/ruby-master"
```
- - Also `-C` (or `--config-cache`) would reduce time to configure from the next time.
+ - Also, `-C` (or `--config-cache`) reduces the time needed to configure
+ on subsequent runs.
5. Build Ruby:
@@ -105,16 +119,24 @@
make install
```
- - If you need to run `make install` with `sudo` and want to avoid document generation with different permissions, you can use
- `make SUDO=sudo install`.
+ - If you need to run `make install` with `sudo` and want to avoid document
+ generation with different permissions, you can use `make SUDO=sudo
+ install`.
+
+[Download Ruby]: https://www.ruby-lang.org/en/downloads/
### Unexplainable Build Errors
-If you are having unexplainable build errors, after saving all your work, try running `git clean -xfd` in the source root to remove all git ignored local files. If you are working from a source directory that's been updated several times, you may have temporary build artifacts from previous releases which can cause build failures.
+If you are having unexplainable build errors, after saving all your work, try
+running `git clean -xfd` in the source root to remove all git ignored local
+files. If you are working from a source directory that's been updated several
+times, you may have temporary build artifacts from previous releases which can
+cause build failures.
## Building on Windows
-The documentation for building on Windows can be found [here](../windows.md).
+The documentation for building on Windows can be found in [a separate
+file](../windows.md).
## More details
@@ -123,8 +145,9 @@ about Ruby's build to help out.
### Running make scripts in parallel
-In GNU make and BSD make implementations, to run a specific make script in parallel, pass the flag `-j<number of processes>`. For instance,
-to run tests on 8 processes, use:
+In GNU make[^caution-gmake-3] and BSD make implementations, to run a specific make script in
+parallel, pass the flag `-j<number of processes>`. For instance, to run tests
+on 8 processes, use:
``` shell
make test-all -j8
@@ -132,7 +155,9 @@ make test-all -j8
We can also set `MAKEFLAGS` to run _all_ `make` commands in parallel.
-Having the right `--jobs` flag will ensure all processors are utilized when building software projects. To do this effectively, you can set `MAKEFLAGS` in your shell configuration/profile:
+Having the right `--jobs` flag will ensure all processors are utilized when
+building software projects. To do this effectively, you can set `MAKEFLAGS` in
+your shell configuration/profile:
``` shell
# On macOS with Fish shell:
@@ -148,11 +173,15 @@ export MAKEFLAGS="--jobs "(nproc)
export MAKEFLAGS="--jobs $(nproc)"
```
+[^caution-gmake-3]: **CAUTION**: GNU make 3 is missing some features for parallel execution; we
+recommend upgrading to GNU make 4 or later.
+
### Miniruby vs Ruby
-Miniruby is a version of Ruby which has no external dependencies and lacks certain features.
-It can be useful in Ruby development because it allows for faster build times. Miniruby is
-built before Ruby. A functional Miniruby is required to build Ruby. To build Miniruby:
+Miniruby is a version of Ruby which has no external dependencies and lacks
+certain features. It can be useful in Ruby development because it allows for
+faster build times. Miniruby is built before Ruby. A functional Miniruby is
+required to build Ruby. To build Miniruby:
``` shell
make miniruby
@@ -160,8 +189,9 @@ make miniruby
## Debugging
-You can use either lldb or gdb for debugging. Before debugging, you need to create a `test.rb`
-with the Ruby script you'd like to run. You can use the following make targets:
+You can use either lldb or gdb for debugging. Before debugging, you need to
+create a `test.rb` with the Ruby script you'd like to run. You can use the
+following make targets:
* `make run`: Runs `test.rb` using Miniruby
* `make lldb`: Runs `test.rb` using Miniruby in lldb
@@ -172,7 +202,8 @@ with the Ruby script you'd like to run. You can use the following make targets:
### Compiling for Debugging
-You should configure Ruby without optimization and other flags that may interfere with debugging:
+You should configure Ruby without optimization and other flags that may
+interfere with debugging:
``` shell
./configure --enable-debug-env optflags="-O0 -fno-omit-frame-pointer"
@@ -180,15 +211,23 @@ You should configure Ruby without optimization and other flags that may interfer
### Building with Address Sanitizer
-Using the address sanitizer (ASAN) is a great way to detect memory issues. It can detect memory safety issues in Ruby itself, and also in any C extensions compiled with and loaded into a Ruby compiled with ASAN.
+Using the address sanitizer (ASAN) is a great way to detect memory issues. It
+can detect memory safety issues in Ruby itself, and also in any C extensions
+compiled with and loaded into a Ruby compiled with ASAN.
``` shell
./autogen.sh
mkdir build && cd build
-../configure CC=clang cflags="-fsanitize=address -fno-omit-frame-pointer -DUSE_MN_THREADS=0" # and any other options you might like
+../configure CC=clang-18 cflags="-fsanitize=address -fno-omit-frame-pointer -DUSE_MN_THREADS=0" # and any other options you might like
make
```
-The compiled Ruby will now automatically crash with a report and a backtrace if ASAN detects a memory safety issue. To run Ruby's test suite under ASAN, issue the following command. Note that this will take quite a long time (over two hours on my laptop); the `RUBY_TEST_TIMEOUT_SCALE` and `SYNTAX_SUGEST_TIMEOUT` variables are required to make sure tests don't spuriously fail with timeouts when in fact they're just slow.
+
+The compiled Ruby will now automatically crash with a report and a backtrace
+if ASAN detects a memory safety issue. To run Ruby's test suite under ASAN,
+issue the following command. Note that this will take quite a long time (over
+two hours on my laptop); the `RUBY_TEST_TIMEOUT_SCALE` and
+`SYNTAX_SUGGEST_TIMEOUT` variables are required to make sure tests don't
+spuriously fail with timeouts when in fact they're just slow.
``` shell
RUBY_TEST_TIMEOUT_SCALE=5 SYNTAX_SUGGEST_TIMEOUT=600 make check
@@ -196,11 +235,30 @@ RUBY_TEST_TIMEOUT_SCALE=5 SYNTAX_SUGGEST_TIMEOUT=600 make check
Please note, however, the following caveats!
-* ASAN will not work properly on any currently released version of Ruby; the necessary support is currently only present on Ruby's master branch (and the whole test suite passes only as of commit [9d0a5148ae062a0481a4a18fbeb9cfd01dc10428](https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/9d0a5148ae062a0481a4a18fbeb9cfd01dc10428))
-* Due to [this bug](https://bugs.ruby-lang.org/issues/20243), Clang generates code for threadlocal variables which doesn't work with M:N threading. Thus, it's necessary to disable M:N threading support at build time for now (with the `-DUSE_MN_THREADS=0` configure argument).
-* Currently, ASAN will only work correctly when using a recent head build of LLVM/Clang - it requires [this bugfix](https://github.com/llvm/llvm-project/pull/75290) related to multithreaded `fork`, which is not yet in any released version. See [here](https://llvm.org/docs/CMake.html) for instructions on how to build LLVM/Clang from source (note you will need at least the `clang` and `compiler-rt` projects enabled). Then, you will need to replace `CC=clang` in the instructions with an explicit path to your built Clang binary.
-* ASAN has only been tested so far with Clang on Linux. It may or may not work with other compilers or on other platforms - please file an issue on [https://bugs.ruby-lang.org](https://bugs.ruby-lang.org) if you run into problems with such configurations (or, to report that they actually work properly!)
-* In particular, although I have not yet tried it, I have reason to believe ASAN will _not_ work properly on macOS yet - the fix for the multithreaded fork issue was actually reverted for macOS (see [here](https://github.com/llvm/llvm-project/commit/2a03854e4ce9bb1bcd79a211063bc63c4657f92c)). Please open an issue on [https://bugs.ruby-lang.org](https://bugs.ruby-lang.org) if this is a problem for you.
+* ASAN will not work properly on any currently released version of Ruby; the
+ necessary support is currently only present on Ruby's master branch (and the
+ whole test suite passes only as of commit [Revision 9d0a5148]).
+* Due to [Bug #20243], Clang generates code for threadlocal variables which
+ doesn't work with M:N threading. Thus, it's necessary to disable M:N
+ threading support at build time for now (with the `-DUSE_MN_THREADS=0`
+ configure argument).
+* ASAN will only work when using Clang version 18 or later - it requires
+ [llvm/llvm-project#75290] related to multithreaded `fork`.
+* ASAN has only been tested so far with Clang on Linux. It may or may not work
+ with other compilers or on other platforms - please file an issue on
+ [Ruby Issue Tracking System] if you run into problems with such configurations
+ (or, to report that they actually work properly!)
+* In particular, although I have not yet tried it, I have reason to believe
+ ASAN will _not_ work properly on macOS yet - the fix for the multithreaded
+ fork issue was actually reverted for macOS (see [llvm/llvm-project#75659]).
+ Please open an issue on [Ruby Issue Tracking System] if this is a problem for
+ you.
+
+[Revision 9d0a5148]: https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/9d0a5148ae062a0481a4a18fbeb9cfd01dc10428
+[Bug #20243]: https://bugs.ruby-lang.org/issues/20243
+[llvm/llvm-project#75290]: https://github.com/llvm/llvm-project/pull/75290
+[llvm/llvm-project#75659]: https://github.com/llvm/llvm-project/pull/75659#issuecomment-1861584777
+[Ruby Issue Tracking System]: https://bugs.ruby-lang.org
## How to measure coverage of C and Ruby code
@@ -217,11 +275,12 @@ make lcov
open lcov-out/index.html
```
-If you need only C code coverage, you can remove `COVERAGE=true` from the above process.
-You can also use `gcov` command directly to get per-file coverage.
+If you need only C code coverage, you can remove `COVERAGE=true` from the
+above process. You can also use `gcov` command directly to get per-file
+coverage.
-If you need only Ruby code coverage, you can remove `--enable-gcov`.
-Note that `test-coverage.dat` accumulates all runs of `make test-all`.
-Make sure that you remove the file if you want to measure one test run.
+If you need only Ruby code coverage, you can remove `--enable-gcov`. Note
+that `test-coverage.dat` accumulates all runs of `make test-all`. Make sure
+that you remove the file if you want to measure one test run.
You can see the coverage result of CI: https://rubyci.org/coverage
diff --git a/doc/encodings.rdoc b/doc/encodings.rdoc
index 97c0d22616..d85099cdbc 100644
--- a/doc/encodings.rdoc
+++ b/doc/encodings.rdoc
@@ -419,7 +419,7 @@ These keyword-value pairs specify encoding options:
hash = {"\u3042" => 'xyzzy'}
hash.default = 'XYZZY'
- s.encode('ASCII', fallback: h) # => "xyzzyfooXYZZY"
+ s.encode('ASCII', fallback: hash) # => "xyzzyfooXYZZY"
def (fallback = "U+%.4X").escape(x)
self % x.unpack("U")
diff --git a/doc/exceptions.md b/doc/exceptions.md
new file mode 100644
index 0000000000..4db2f26c18
--- /dev/null
+++ b/doc/exceptions.md
@@ -0,0 +1,362 @@
+# Exceptions
+
+Ruby code can raise exceptions.
+
+Most often, a raised exception is meant to alert the running program
+that an unusual (i.e., _exceptional_) situation has arisen,
+and may need to be handled.
+
+Code throughout the Ruby core, Ruby standard library, and Ruby gems generates exceptions
+in certain circumstances:
+
+```
+File.open('nope.txt') # Raises Errno::ENOENT: "No such file or directory"
+```
+
+## Raised Exceptions
+
+A raised exception transfers program execution, one way or another.
+
+### Unrescued Exceptions
+
+If an exception is not _rescued_
+(see [Rescued Exceptions](#label-Rescued+Exceptions) below),
+execution transfers to code in the Ruby interpreter
+that prints a message and exits the program (or thread):
+
+```
+$ ruby -e "raise"
+-e:1:in `<main>': unhandled exception
+```
+
+### Rescued Exceptions
+
+An <i>exception handler</i> may determine what is to happen
+when an exception is raised;
+the handler may _rescue_ an exception,
+and may prevent the program from exiting.
+
+A simple example:
+
+```
+begin
+ raise 'Boom!' # Raises an exception, transfers control.
+ puts 'Will not get here.'
+rescue
+ puts 'Rescued an exception.' # Control transferred to here; program does not exit.
+end
+puts 'Got here.'
+```
+
+Output:
+
+```
+Rescued an exception.
+Got here.
+```
+
+An exception handler has several elements:
+
+| Element | Use |
+|-----------------------------|------------------------------------------------------------------------------------------|
+| Begin clause. | Begins the handler and contains the code whose raised exception, if any, may be rescued. |
+| One or more rescue clauses. | Each contains "rescuing" code, which is to be executed for certain exceptions. |
+| Else clause (optional). | Contains code to be executed if no exception is raised. |
+| Ensure clause (optional). | Contains code to be executed whether or not an exception is raised, or is rescued. |
+| <tt>end</tt> statement. | Ends the handler. |
+
+#### Begin Clause
+
+The begin clause begins the exception handler:
+
+- May start with a `begin` statement;
+ see also [Begin-Less Exception Handlers](#label-Begin-Less+Exception+Handlers).
+- Contains code whose raised exception (if any) is covered
+ by the handler.
+- Ends with the first following `rescue` statement.
+
+#### Rescue Clauses
+
+A rescue clause:
+
+- Starts with a `rescue` statement.
+- Contains code that is to be executed for certain raised exceptions.
+- Ends with the first following `rescue`,
+ `else`, `ensure`, or `end` statement.
+
+A `rescue` statement may include one or more classes
+that are to be rescued;
+if none is given, StandardError is assumed.
+
+The rescue clause rescues both the specified class
+(or StandardError if none given) and any of its subclasses
+(see [Built-In Exception Classes](rdoc-ref:Exception@Built-In+Exception+Classes)
+for the hierarchy of Ruby built-in exception classes):
+
+
+```
+begin
+ 1 / 0 # Raises ZeroDivisionError, a subclass of StandardError.
+rescue
+ puts "Rescued #{$!.class}"
+end
+```
+
+Output:
+
+```
+Rescued ZeroDivisionError
+```
+
+If the `rescue` statement specifies an exception class,
+only that class (or one of its subclasses) is rescued;
+this example exits with a ZeroDivisionError,
+which was not rescued because it is not ArgumentError or one of its subclasses:
+
+```
+begin
+ 1 / 0
+rescue ArgumentError
+ puts "Rescued #{$!.class}"
+end
+```
+
+A `rescue` statement may specify multiple classes,
+which means that its code rescues an exception
+of any of the given classes (or their subclasses):
+
+```
+begin
+ 1 / 0
+rescue FloatDomainError, ZeroDivisionError
+ puts "Rescued #{$!.class}"
+end
+```
+
+An exception handler may contain multiple rescue clauses;
+in that case, the first clause that rescues the exception does so,
+and those before and after are ignored:
+
+```
+begin
+ Dir.open('nosuch')
+rescue Errno::ENOTDIR
+ puts "Rescued #{$!.class}"
+rescue Errno::ENOENT
+ puts "Rescued #{$!.class}"
+end
+```
+
+Output:
+
+```
+Rescued Errno::ENOENT
+```
+
+A `rescue` statement may specify a variable
+whose value becomes the rescued exception
+(an instance of Exception or one of its subclasses):
+
+```
+begin
+ 1 / 0
+rescue => x
+ puts x.class
+ puts x.message
+end
+```
+
+Output:
+
+```
+ZeroDivisionError
+divided by 0
+```
+
+In the rescue clause, these global variables are defined:
+
+- `$!`: the current exception instance.
+- `$@`: its backtrace.
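
As a small sketch of these two variables, the following captures both inside a rescue clause. The method name `rescue_globals` is hypothetical and exists only for this illustration:

```ruby
# Returns the values of $! and $@ as seen inside a rescue clause.
def rescue_globals
  begin
    1 / 0
  rescue ZeroDivisionError
    [$!, $@]   # $! is the exception being handled; $@ is its backtrace.
  end
end

error, backtrace = rescue_globals
puts error.class                    # ZeroDivisionError
puts backtrace == error.backtrace   # true
```

In most code, naming the exception with `rescue => x` reads more clearly than the global variables; the globals are shown here only to make their meaning concrete.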
+
+#### Else Clause
+
+The `else` clause:
+
+- Starts with an `else` statement.
+- Contains code that is to be executed if no exception is raised in the begin clause.
+- Ends with the first following `ensure` or `end` statement.
+
+```
+begin
+ puts 'Begin.'
+rescue
+ puts 'Rescued an exception!'
+else
+ puts 'No exception raised.'
+end
+```
+
+Output:
+
+```
+Begin.
+No exception raised.
+```
+
+#### Ensure Clause
+
+The ensure clause:
+
+- Starts with an `ensure` statement.
+- Contains code that is to be executed
+ regardless of whether an exception is raised,
+ and regardless of whether a raised exception is handled.
+- Ends with the first following `end` statement.
+
+```
+def foo(boom: false)
+ puts 'Begin.'
+ raise 'Boom!' if boom
+rescue
+ puts 'Rescued an exception!'
+else
+ puts 'No exception raised.'
+ensure
+ puts 'Always do this.'
+end
+
+foo(boom: true)
+foo(boom: false)
+```
+
+Output:
+
+```
+Begin.
+Rescued an exception!
+Always do this.
+Begin.
+No exception raised.
+Always do this.
+```
+
+#### End Statement
+
+The `end` statement ends the handler.
+
+Code following it is reached only if any raised exception is rescued.
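
A minimal sketch of that rule, using a hypothetical method `after_handler` whose handler rescues only ArgumentError:

```ruby
def after_handler(exception_class)
  begin
    raise exception_class, 'Boom!'
  rescue ArgumentError
    # Only ArgumentError (or one of its subclasses) is rescued here.
  end
  :reached   # Evaluated only if the exception above was rescued.
end

p after_handler(ArgumentError)      # => :reached
begin
  after_handler(TypeError)          # Not rescued inside; propagates to the caller.
rescue TypeError => x
  puts "Propagated: #{x.message}"   # Propagated: Boom!
end
```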
+
+#### Begin-Less \Exception Handlers
+
+As seen above, an exception handler may be implemented with `begin` and `end`.
+
+An exception handler may also be implemented as:
+
+- A method body:
+
+ ```
+ def foo(boom: false) # Serves as beginning of exception handler.
+ puts 'Begin.'
+ raise 'Boom!' if boom
+ rescue
+ puts 'Rescued an exception!'
+ else
+ puts 'No exception raised.'
+ end # Serves as end of exception handler.
+ ```
+
+- A block:
+
+ ```
+ Dir.chdir('.') do |dir| # Serves as beginning of exception handler.
+ raise 'Boom!'
+ rescue
+ puts 'Rescued an exception!'
+ end # Serves as end of exception handler.
+ ```
+
+#### Re-Raising an \Exception
+
+It can be useful to rescue an exception, but allow its eventual effect;
+for example, a program can rescue an exception, log data about it,
+and then "reinstate" the exception.
+
+This may be done via the `raise` method, but in a special way;
+a rescuing clause:
+
+ - Captures an exception.
+ - Does whatever is needed concerning the exception (such as logging it).
+ - Calls method `raise` with no argument,
+ which raises the rescued exception:
+
+```
+begin
+ 1 / 0
+rescue ZeroDivisionError
+ # Do needful things (like logging).
+ raise # Raised exception will be ZeroDivisionError, not RuntimeError.
+end
+```
+
+Output:
+
+```
+ruby t.rb
+t.rb:2:in `/': divided by 0 (ZeroDivisionError)
+ from t.rb:2:in `<main>'
+```
+
+#### Retrying
+
+It can be useful to retry a begin clause;
+for example, if it must access a possibly-volatile resource
+(such as a web page),
+it can be useful to try the access more than once
+(in the hope that it may become available):
+
+```
+retries = 0
+begin
+ puts "Try ##{retries}."
+ raise 'Boom'
+rescue
+ puts "Rescued retry ##{retries}."
+ if (retries += 1) < 3
+ puts 'Retrying'
+ retry
+ else
+ puts 'Giving up.'
+ raise
+ end
+end
+```
+
+```
+Try #0.
+Rescued retry #0.
+Retrying
+Try #1.
+Rescued retry #1.
+Retrying
+Try #2.
+Rescued retry #2.
+Giving up.
+# RuntimeError ('Boom') raised.
+```
+
+Note that the retry re-executes the entire begin clause,
+not just the part after the point of failure.
+
+## Raising an \Exception
+
+Raise an exception with method Kernel#raise.
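
The sketch below shows three common ways to call `raise`. The helper `capture` is hypothetical; it only returns the raised exception so that its class and message can be displayed:

```ruby
# Returns the exception raised by the block, or nil if none was raised.
def capture
  yield
  nil
rescue StandardError => x
  x
end

x = capture { raise 'Boom!' }                      # Message only: RuntimeError.
p [x.class, x.message]                             # => [RuntimeError, "Boom!"]
x = capture { raise ArgumentError }                # Class only: message is the class name.
p [x.class, x.message]                             # => [ArgumentError, "ArgumentError"]
x = capture { raise ArgumentError, 'Bad value' }   # Class and message.
p [x.class, x.message]                             # => [ArgumentError, "Bad value"]
```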
+
+## Custom Exceptions
+
+To provide additional or alternate information,
+you may create custom exception classes;
+each should be a subclass of one of the built-in exception classes:
+
+```
+class MyException < StandardError; end
+```
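
As one possible illustration of "additional information", a custom class can carry extra attributes alongside the message. The class name `ConfigError` and its `key` attribute are made up for this sketch:

```ruby
# Hypothetical custom exception that records which key was missing.
class ConfigError < StandardError
  attr_reader :key

  def initialize(key, message = "Missing configuration key: #{key}")
    super(message)
    @key = key
  end
end

begin
  raise ConfigError.new(:port)
rescue ConfigError => x
  puts x.message   # Missing configuration key: port
  p x.key          # => :port
end
```

Because `ConfigError` inherits from StandardError, a bare `rescue` also rescues it.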
diff --git a/doc/format_specifications.rdoc b/doc/format_specifications.rdoc
index 1111575e74..bdfdc24953 100644
--- a/doc/format_specifications.rdoc
+++ b/doc/format_specifications.rdoc
@@ -233,6 +233,8 @@ Format +argument+ as a single character:
sprintf('%c', 'A') # => "A"
sprintf('%c', 65) # => "A"
+This behaves like String#<<, except for raising ArgumentError instead of RangeError.
+
=== Specifier +d+
Format +argument+ as a decimal integer:
diff --git a/doc/strscan/helper_methods.md b/doc/strscan/helper_methods.md
new file mode 100644
index 0000000000..6555a2ce66
--- /dev/null
+++ b/doc/strscan/helper_methods.md
@@ -0,0 +1,128 @@
+## Helper Methods
+
+These helper methods display values returned by scanner's methods.
+
+### `put_situation(scanner)`
+
+Display scanner's situation:
+
+- Byte position (`#pos`).
+- Character position (`#charpos`).
+- Target string (`#rest`) and size (`#rest_size`).
+
+```
+scanner = StringScanner.new('foobarbaz')
+scanner.scan(/foo/)
+put_situation(scanner)
+# Situation:
+# pos: 3
+# charpos: 3
+# rest: "barbaz"
+# rest_size: 6
+```
+
+### `put_match_values(scanner)`
+
+Display the scanner's match values:
+
+```
+scanner = StringScanner.new('Fri Dec 12 1975 14:39')
+pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
+scanner.match?(pattern)
+put_match_values(scanner)
+# Basic match values:
+# matched?: true
+# matched_size: 11
+# pre_match: ""
+# matched : "Fri Dec 12 "
+# post_match: "1975 14:39"
+# Captured match values:
+# size: 4
+# captures: ["Fri", "Dec", "12"]
+# named_captures: {"wday"=>"Fri", "month"=>"Dec", "day"=>"12"}
+# values_at: ["Fri Dec 12 ", "Fri", "Dec", "12", nil]
+# []:
+# [0]: "Fri Dec 12 "
+# [1]: "Fri"
+# [2]: "Dec"
+# [3]: "12"
+# [4]: nil
+```
+
+### `match_values_cleared?(scanner)`
+
+Returns whether the scanner's match values are all properly cleared:
+
+```
+scanner = StringScanner.new('foobarbaz')
+match_values_cleared?(scanner) # => true
+put_match_values(scanner)
+# Basic match values:
+# matched?: false
+# matched_size: nil
+# pre_match: nil
+# matched : nil
+# post_match: nil
+# Captured match values:
+# size: nil
+# captures: nil
+# named_captures: {}
+# values_at: nil
+# [0]: nil
+scanner.scan(/foo/)
+match_values_cleared?(scanner) # => false
+```
+
+## The Code
+
+```
+def put_situation(scanner)
+ puts '# Situation:'
+ puts "# pos: #{scanner.pos}"
+ puts "# charpos: #{scanner.charpos}"
+ puts "# rest: #{scanner.rest.inspect}"
+ puts "# rest_size: #{scanner.rest_size}"
+end
+```
+
+```
+def put_match_values(scanner)
+ puts '# Basic match values:'
+ puts "# matched?: #{scanner.matched?}"
+ value = scanner.matched_size || 'nil'
+ puts "# matched_size: #{value}"
+ puts "# pre_match: #{scanner.pre_match.inspect}"
+ puts "# matched : #{scanner.matched.inspect}"
+ puts "# post_match: #{scanner.post_match.inspect}"
+ puts '# Captured match values:'
+ puts "# size: #{scanner.size}"
+ puts "# captures: #{scanner.captures}"
+ puts "# named_captures: #{scanner.named_captures}"
+ if scanner.size.nil?
+ puts "# values_at: #{scanner.values_at(0)}"
+ puts "# [0]: #{scanner[0]}"
+ else
+ puts "# values_at: #{scanner.values_at(*(0..scanner.size))}"
+ puts "# []:"
+ scanner.size.times do |i|
+ puts "# [#{i}]: #{scanner[i].inspect}"
+ end
+ end
+end
+```
+
+```
+def match_values_cleared?(scanner)
+ scanner.matched? == false &&
+ scanner.matched_size.nil? &&
+ scanner.matched.nil? &&
+ scanner.pre_match.nil? &&
+ scanner.post_match.nil? &&
+ scanner.size.nil? &&
+ scanner[0].nil? &&
+ scanner.captures.nil? &&
+ scanner.values_at(0..1).nil? &&
+ scanner.named_captures == {}
+end
+```
+
diff --git a/doc/strscan/link_refs.txt b/doc/strscan/link_refs.txt
new file mode 100644
index 0000000000..19f6f7ce5c
--- /dev/null
+++ b/doc/strscan/link_refs.txt
@@ -0,0 +1,17 @@
+[1]: rdoc-ref:StringScanner@Stored+String
+[2]: rdoc-ref:StringScanner@Byte+Position+-28Position-29
+[3]: rdoc-ref:StringScanner@Target+Substring
+[4]: rdoc-ref:StringScanner@Setting+the+Target+Substring
+[5]: rdoc-ref:StringScanner@Traversing+the+Target+Substring
+[6]: https://docs.ruby-lang.org/en/master/Regexp.html
+[7]: rdoc-ref:StringScanner@Character+Position
+[8]: https://docs.ruby-lang.org/en/master/String.html#method-i-5B-5D
+[9]: rdoc-ref:StringScanner@Match+Values
+[10]: rdoc-ref:StringScanner@Fixed-Anchor+Property
+[11]: rdoc-ref:StringScanner@Positions
+[13]: rdoc-ref:StringScanner@Captured+Match+Values
+[14]: rdoc-ref:StringScanner@Querying+the+Target+Substring
+[15]: rdoc-ref:StringScanner@Searching+the+Target+Substring
+[16]: https://docs.ruby-lang.org/en/master/Regexp.html#class-Regexp-label-Groups+and+Captures
+[17]: rdoc-ref:StringScanner@Matching
+[18]: rdoc-ref:StringScanner@Basic+Match+Values
diff --git a/doc/strscan/methods/get_byte.md b/doc/strscan/methods/get_byte.md
new file mode 100644
index 0000000000..2f23be1899
--- /dev/null
+++ b/doc/strscan/methods/get_byte.md
@@ -0,0 +1,30 @@
+call-seq:
+ get_byte -> byte_as_character or nil
+
+Returns the next byte, if available:
+
+- If the [position][2]
+ is not at the end of the [stored string][1]:
+
+ - Returns the next byte.
+ - Increments the [byte position][2].
+ - Adjusts the [character position][7].
+
+ ```
+ scanner = StringScanner.new(HIRAGANA_TEXT)
+ # => #<StringScanner 0/15 @ "\xE3\x81\x93\xE3\x82...">
+ scanner.string # => "こんにちは"
+ [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 1, 1]
+ [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x81", 2, 2]
+ [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 3, 1]
+ [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 4, 2]
+ [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x82", 5, 3]
+ [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 6, 2]
+ ```
+
+- Otherwise, returns `nil`, and does not change the positions.
+
+ ```
+ scanner.terminate
+ [scanner.get_byte, scanner.pos, scanner.charpos] # => [nil, 15, 5]
+ ```
diff --git a/doc/strscan/methods/get_charpos.md b/doc/strscan/methods/get_charpos.md
new file mode 100644
index 0000000000..f77563c860
--- /dev/null
+++ b/doc/strscan/methods/get_charpos.md
@@ -0,0 +1,19 @@
+call-seq:
+ charpos -> character_position
+
+Returns the [character position][7] (initially zero),
+which may be different from the [byte position][2]
+given by method #pos:
+
+```
+scanner = StringScanner.new(HIRAGANA_TEXT)
+scanner.string # => "こんにちは"
+scanner.getch # => "こ" # 3-byte character.
+scanner.getch # => "ん" # 3-byte character.
+put_situation(scanner)
+# Situation:
+# pos: 6
+# charpos: 2
+# rest: "にちは"
+# rest_size: 9
+```
diff --git a/doc/strscan/methods/get_pos.md b/doc/strscan/methods/get_pos.md
new file mode 100644
index 0000000000..56bcef3274
--- /dev/null
+++ b/doc/strscan/methods/get_pos.md
@@ -0,0 +1,14 @@
+call-seq:
+ pos -> byte_position
+
+Returns the integer [byte position][2],
+which may be different from the [character position][7]:
+
+```
+scanner = StringScanner.new(HIRAGANA_TEXT)
+scanner.string # => "こんにちは"
+scanner.pos # => 0
+scanner.getch # => "こ" # 3-byte character.
+scanner.charpos # => 1
+scanner.pos # => 3
+```
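
The relationship between the two positions can be traced over the whole string. This is a self-contained sketch: the constant `HIRAGANA_TEXT` is defined elsewhere, so the sketch assumes its value is the string shown by `scanner.string` above, and the method name `positions_after_each_getch` is hypothetical:

```ruby
require 'strscan'

# Collects [pos, charpos] after each successful call to getch.
def positions_after_each_getch(string)
  scanner = StringScanner.new(string)
  positions = []
  positions << [scanner.pos, scanner.charpos] while scanner.getch
  positions
end

hiragana_text = 'こんにちは'   # Assumed value of HIRAGANA_TEXT.
p hiragana_text.bytesize                      # => 15
p positions_after_each_getch(hiragana_text)   # => [[3, 1], [6, 2], [9, 3], [12, 4], [15, 5]]
```

Each character here occupies 3 bytes, so the byte position advances by 3 while the character position advances by 1.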
diff --git a/doc/strscan/methods/getch.md b/doc/strscan/methods/getch.md
new file mode 100644
index 0000000000..b57732ad7c
--- /dev/null
+++ b/doc/strscan/methods/getch.md
@@ -0,0 +1,43 @@
+call-seq:
+ getch -> character or nil
+
+Returns the next (possibly multibyte) character,
+if available:
+
+- If the [position][2]
+ is at the beginning of a character:
+
+ - Returns the character.
+ - Increments the [character position][7] by 1.
+ - Increments the [byte position][2]
+ by the size (in bytes) of the character.
+
+ ```
+ scanner = StringScanner.new(HIRAGANA_TEXT)
+ scanner.string # => "こんにちは"
+ [scanner.getch, scanner.pos, scanner.charpos] # => ["こ", 3, 1]
+ [scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2]
+ [scanner.getch, scanner.pos, scanner.charpos] # => ["に", 9, 3]
+ [scanner.getch, scanner.pos, scanner.charpos] # => ["ち", 12, 4]
+ [scanner.getch, scanner.pos, scanner.charpos] # => ["は", 15, 5]
+ [scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5]
+ ```
+
+- If the [position][2] is within a multi-byte character
+ (that is, not at its beginning),
+ behaves like #get_byte (returns a 1-byte character):
+
+ ```
+ scanner.pos = 1
+ [scanner.getch, scanner.pos, scanner.charpos] # => ["\x81", 2, 2]
+ [scanner.getch, scanner.pos, scanner.charpos] # => ["\x93", 3, 1]
+ [scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2]
+ ```
+
+- If the [position][2] is at the end of the [stored string][1],
+ returns `nil` and does not modify the positions:
+
+ ```
+ scanner.terminate
+ [scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5]
+ ```
diff --git a/doc/strscan/methods/scan.md b/doc/strscan/methods/scan.md
new file mode 100644
index 0000000000..714fa9910a
--- /dev/null
+++ b/doc/strscan/methods/scan.md
@@ -0,0 +1,51 @@
+call-seq:
+ scan(pattern) -> substring or nil
+
+Attempts to [match][17] the given `pattern`
+at the beginning of the [target substring][3].
+
+If the match succeeds:
+
+- Returns the matched substring.
+- Increments the [byte position][2] by <tt>substring.bytesize</tt>,
+ and may increment the [character position][7].
+- Sets [match values][9].
+
+```
+scanner = StringScanner.new(HIRAGANA_TEXT)
+scanner.string # => "こんにちは"
+scanner.pos = 6
+scanner.scan(/に/) # => "に"
+put_match_values(scanner)
+# Basic match values:
+# matched?: true
+# matched_size: 3
+# pre_match: "こん"
+# matched : "に"
+# post_match: "ちは"
+# Captured match values:
+# size: 1
+# captures: []
+# named_captures: {}
+# values_at: ["に", nil]
+# []:
+# [0]: "に"
+# [1]: nil
+put_situation(scanner)
+# Situation:
+# pos: 9
+# charpos: 3
+# rest: "ちは"
+# rest_size: 6
+```
+
+If the match fails:
+
+- Returns `nil`.
+- Does not increment byte and character positions.
+- Clears match values.
+
+```
+scanner.scan(/nope/) # => nil
+match_values_cleared?(scanner) # => true
+```
diff --git a/doc/strscan/methods/scan_until.md b/doc/strscan/methods/scan_until.md
new file mode 100644
index 0000000000..3b7ff2c3a9
--- /dev/null
+++ b/doc/strscan/methods/scan_until.md
@@ -0,0 +1,52 @@
+call-seq:
+ scan_until(pattern) -> substring or nil
+
+Attempts to [match][17] the given `pattern`
+anywhere (at any [position][2]) in the [target substring][3].
+
+If the match attempt succeeds:
+
+- Sets [match values][9].
+- Sets the [byte position][2] to the end of the matched substring;
+ may adjust the [character position][7].
+- Returns the matched substring.
+
+
+```
+scanner = StringScanner.new(HIRAGANA_TEXT)
+scanner.string # => "こんにちは"
+scanner.pos = 6
+scanner.scan_until(/ち/) # => "にち"
+put_match_values(scanner)
+# Basic match values:
+# matched?: true
+# matched_size: 3
+# pre_match: "こんに"
+# matched : "ち"
+# post_match: "は"
+# Captured match values:
+# size: 1
+# captures: []
+# named_captures: {}
+# values_at: ["ち", nil]
+# []:
+# [0]: "ち"
+# [1]: nil
+put_situation(scanner)
+# Situation:
+# pos: 12
+# charpos: 4
+# rest: "は"
+# rest_size: 3
+```
+
+If the match attempt fails:
+
+- Clears match data.
+- Returns `nil`.
+- Does not update positions.
+
+```
+scanner.scan_until(/nope/) # => nil
+match_values_cleared?(scanner) # => true
+```
diff --git a/doc/strscan/methods/set_pos.md b/doc/strscan/methods/set_pos.md
new file mode 100644
index 0000000000..230177109c
--- /dev/null
+++ b/doc/strscan/methods/set_pos.md
@@ -0,0 +1,27 @@
+call-seq:
+ pos = n -> n
+ pointer = n -> n
+
+Sets the [byte position][2] and the [character position][11];
+returns `n`.
+
+Does not affect [match values][9].
+
+For non-negative `n`, sets the position to `n`:
+
+```
+scanner = StringScanner.new(HIRAGANA_TEXT)
+scanner.string # => "こんにちは"
+scanner.pos = 3 # => 3
+scanner.rest # => "んにちは"
+scanner.charpos # => 1
+```
+
+For negative `n`, counts from the end of the [stored string][1]:
+
+```
+scanner.pos = -9 # => -9
+scanner.pos # => 6
+scanner.rest # => "にちは"
+scanner.charpos # => 2
+```
diff --git a/doc/strscan/methods/skip.md b/doc/strscan/methods/skip.md
new file mode 100644
index 0000000000..656f134c5a
--- /dev/null
+++ b/doc/strscan/methods/skip.md
@@ -0,0 +1,43 @@
+call-seq:
+ skip(pattern) -> match_size or nil
+
+Attempts to [match][17] the given `pattern`
+at the beginning of the [target substring][3].
+
+If the match succeeds:
+
+- Increments the [byte position][2] by <tt>substring.bytesize</tt>,
+ and may increment the [character position][7].
+- Sets [match values][9].
+- Returns the size (bytes) of the matched substring.
+
+```
+scanner = StringScanner.new(HIRAGANA_TEXT)
+scanner.string # => "こんにちは"
+scanner.pos = 6
+scanner.skip(/に/) # => 3
+put_match_values(scanner)
+# Basic match values:
+# matched?: true
+# matched_size: 3
+# pre_match: "こん"
+# matched : "に"
+# post_match: "ちは"
+# Captured match values:
+# size: 1
+# captures: []
+# named_captures: {}
+# values_at: ["に", nil]
+# []:
+# [0]: "に"
+# [1]: nil
+put_situation(scanner)
+# Situation:
+# pos: 9
+# charpos: 3
+# rest: "ちは"
+# rest_size: 6
+
+scanner.skip(/nope/) # => nil
+match_values_cleared?(scanner) # => true
+```
diff --git a/doc/strscan/methods/skip_until.md b/doc/strscan/methods/skip_until.md
new file mode 100644
index 0000000000..5187a4826f
--- /dev/null
+++ b/doc/strscan/methods/skip_until.md
@@ -0,0 +1,49 @@
+call-seq:
+ skip_until(pattern) -> matched_substring_size or nil
+
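+Attempts to [match][17] the given `pattern`
+anywhere (at any [position][2]) in the [target substring][3].
+
+If the match attempt succeeds:
+
+- Sets [match values][9].
+- Sets the [byte position][2] to the end of the matched substring;
+ may adjust the [character position][7].
+- Returns the number of bytes traversed
+ (from the starting position to the end of the matched substring).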
+
+```
+scanner = StringScanner.new(HIRAGANA_TEXT)
+scanner.string # => "こんにちは"
+scanner.pos = 6
+scanner.skip_until(/ち/) # => 6
+put_match_values(scanner)
+# Basic match values:
+# matched?: true
+# matched_size: 3
+# pre_match: "こんに"
+# matched : "ち"
+# post_match: "は"
+# Captured match values:
+# size: 1
+# captures: []
+# named_captures: {}
+# values_at: ["ち", nil]
+# []:
+# [0]: "ち"
+# [1]: nil
+put_situation(scanner)
+# Situation:
+# pos: 12
+# charpos: 4
+# rest: "は"
+# rest_size: 3
+```
+
+If the match attempt fails:
+
+- Clears match values.
+- Returns `nil`.
+
+```
+scanner.skip_until(/nope/) # => nil
+match_values_cleared?(scanner) # => true
+```
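In the successful example above, the return value is 6 although `matched_size` is 3. The following sketch (variable names are ours, using only calls that appear in this document) compares the return value with the change in byte position:

```ruby
require 'strscan'

scanner = StringScanner.new('こんにちは')  # Five 3-byte characters.
scanner.pos = 6                           # Start at "に".
start_pos = scanner.pos

delta = scanner.skip_until(/ち/)          # Passes over "に", then matches "ち".
moved = scanner.pos - start_pos           # Bytes actually traversed.

puts "returned=#{delta} moved=#{moved} matched_size=#{scanner.matched_size}"
```

The returned value tracks the bytes traversed, not the size of the match alone.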
diff --git a/doc/strscan/methods/terminate.md b/doc/strscan/methods/terminate.md
new file mode 100644
index 0000000000..fd55727099
--- /dev/null
+++ b/doc/strscan/methods/terminate.md
@@ -0,0 +1,30 @@
+call-seq:
+ terminate -> self
+
+Sets the scanner to end-of-string;
+returns +self+:
+
+- Sets both [positions][11] to end-of-string.
+- Clears [match values][9].
+
+```
+scanner = StringScanner.new(HIRAGANA_TEXT)
+scanner.string # => "こんにちは"
+scanner.scan_until(/に/)
+put_situation(scanner)
+# Situation:
+# pos: 9
+# charpos: 3
+# rest: "ちは"
+# rest_size: 6
+match_values_cleared?(scanner) # => false
+
+scanner.terminate # => #<StringScanner fin>
+put_situation(scanner)
+# Situation:
+# pos: 15
+# charpos: 5
+# rest: ""
+# rest_size: 0
+match_values_cleared?(scanner) # => true
+```
diff --git a/doc/strscan/strscan.md b/doc/strscan/strscan.md
new file mode 100644
index 0000000000..465cebd4cb
--- /dev/null
+++ b/doc/strscan/strscan.md
@@ -0,0 +1,543 @@
+\Class `StringScanner` supports processing a stored string as a stream;
+this code creates a new `StringScanner` object with string `'foobarbaz'`:
+
+```
+require 'strscan'
+scanner = StringScanner.new('foobarbaz')
+```
+
+## About the Examples
+
+All examples here assume that `StringScanner` has been required:
+
+```
+require 'strscan'
+```
+
+Some examples here assume that these constants are defined:
+
+```
+MULTILINE_TEXT = <<~EOT
+Go placidly amid the noise and haste,
+and remember what peace there may be in silence.
+EOT
+
+HIRAGANA_TEXT = 'こんにちは'
+
+ENGLISH_TEXT = 'Hello'
+```
+
+Some examples here assume that certain helper methods are defined:
+
+- `put_situation(scanner)`:
+ Displays the values of the scanner's
+ methods #pos, #charpos, #rest, and #rest_size.
+- `put_match_values(scanner)`:
+ Displays the scanner's [match values][9].
+- `match_values_cleared?(scanner)`:
+ Returns whether the scanner's [match values][9] are cleared.
+
+See examples [here][ext/strscan/helper_methods_md.html].
+
+## The `StringScanner` \Object
+
+This code creates a `StringScanner` object
+(we'll call it simply a _scanner_),
+and shows some of its basic properties:
+
+```
+scanner = StringScanner.new('foobarbaz')
+scanner.string # => "foobarbaz"
+put_situation(scanner)
+# Situation:
+# pos: 0
+# charpos: 0
+# rest: "foobarbaz"
+# rest_size: 9
+```
+
+The scanner has:
+
+* A <i>stored string</i>, which is:
+
+ * Initially set by StringScanner.new(string) to the given `string`
+ (`'foobarbaz'` in the example above).
+ * Modifiable by methods #string=(new_string) and #concat(more_string).
+ * Returned by method #string.
+
+ More at [Stored String][1] below.
+
+* A _position_;
+ a zero-based index into the bytes of the stored string (_not_ into its characters):
+
+ * Initially set by StringScanner.new to `0`.
+ * Returned by method #pos.
+ * Modifiable explicitly by methods #reset, #terminate, and #pos=(new_pos).
+ * Modifiable implicitly (various traversing methods, among others).
+
+ More at [Byte Position][2] below.
+
+* A <i>target substring</i>,
+ which is a trailing substring of the stored string;
+ it extends from the current position to the end of the stored string:
+
+ * Initially set by StringScanner.new(string) to the given `string`
+ (`'foobarbaz'` in the example above).
+ * Returned by method #rest.
+ * Modified by any modification to either the stored string or the position.
+
+ <b>Most importantly</b>:
+ the searching and traversing methods operate on the target substring,
+ which may be (and often is) less than the entire stored string.
+
+ More at [Target Substring][3] below.
+
+## Stored \String
+
+The <i>stored string</i> is the string stored in the `StringScanner` object.
+
+Each of these methods sets, modifies, or returns the stored string:
+
+| Method | Effect |
+|----------------------|-------------------------------------------------|
+| ::new(string) | Creates a new scanner for the given string. |
+| #string=(new_string) | Replaces the existing stored string. |
+| #concat(more_string) | Appends a string to the existing stored string. |
+| #string | Returns the stored string. |
+
+## Positions
+
+A `StringScanner` object maintains a zero-based <i>byte position</i>
+and a zero-based <i>character position</i>.
+
+Each of these methods explicitly sets positions:
+
+| Method | Effect |
+|--------------------------|----------------------------------------------------------|
+| #reset | Sets both positions to zero (beginning of stored string). |
+| #terminate | Sets both positions to the end of the stored string. |
+| #pos=(new_byte_position) | Sets byte position; adjusts character position. |
+
+### Byte Position (Position)
+
+The byte position (or simply _position_)
+is a zero-based index into the bytes in the scanner's stored string;
+for a new `StringScanner` object, the byte position is zero.
+
+When the byte position is:
+
+* Zero (at the beginning), the target substring is the entire stored string.
+* Equal to the size of the stored string (at the end),
+ the target substring is the empty string `''`.
+
+To get or set the byte position:
+
+* \#pos: returns the byte position.
+* \#pos=(new_pos): sets the byte position.
+
+Many methods use the byte position as the basis for finding matches;
+many others set, increment, or decrement the byte position:
+
+```
+scanner = StringScanner.new('foobar')
+scanner.pos # => 0
+scanner.scan(/foo/) # => "foo" # Match found.
+scanner.pos # => 3 # Byte position incremented.
+scanner.scan(/foo/) # => nil # Match not found.
+scanner.pos # => 3 # Byte position not changed.
+```
+
+Some methods implicitly modify the byte position;
+see:
+
+* [Setting the Target Substring][4].
+* [Traversing the Target Substring][5].
+
+The values of these methods are derived directly from the values of #pos and #string:
+
+- \#charpos: the [character position][7].
+- \#rest: the [target substring][3].
+- \#rest_size: `rest.bytesize`.
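These derivations can be sketched in plain Ruby. This is an illustration only, not a description of how the extension computes the values; the `derived_*` names are ours:

```ruby
require 'strscan'

scanner = StringScanner.new('Helloこんにちは')  # Five 1-byte and five 3-byte characters.
scanner.scan(/Hello/)
scanner.getch                                  # Consumes "こ"; byte position is now 8.

# Everything before the byte position, as a string.
prefix = scanner.string.byteslice(0, scanner.pos)

derived_charpos   = prefix.size                                  # Characters before the position.
derived_rest      = scanner.string.byteslice(scanner.pos..-1)    # From the position to the end.
derived_rest_size = derived_rest.bytesize                        # Counted in bytes.

puts derived_charpos == scanner.charpos
puts derived_rest == scanner.rest
puts derived_rest_size == scanner.rest_size
```

Note that the size is counted in bytes, so it differs from the character count whenever the target substring contains multi-byte characters.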
+
+### Character Position
+
+The character position is a zero-based index into the _characters_
+in the stored string;
+for a new `StringScanner` object, the character position is zero.
+
+\Method #charpos returns the character position;
+its value may not be reset explicitly.
+
+Some methods change (increment or reset) the character position;
+see:
+
+* [Setting the Target Substring][4].
+* [Traversing the Target Substring][5].
+
+Example (string includes multi-byte characters):
+
+```
+scanner = StringScanner.new(ENGLISH_TEXT) # Five 1-byte characters.
+scanner.concat(HIRAGANA_TEXT) # Five 3-byte characters
+scanner.string # => "Helloこんにちは" # Twenty bytes in all.
+put_situation(scanner)
+# Situation:
+# pos: 0
+# charpos: 0
+# rest: "Helloこんにちは"
+# rest_size: 20
+scanner.scan(/Hello/) # => "Hello" # Five 1-byte characters.
+put_situation(scanner)
+# Situation:
+# pos: 5
+# charpos: 5
+# rest: "こんにちは"
+# rest_size: 15
+scanner.getch # => "こ" # One 3-byte character.
+put_situation(scanner)
+# Situation:
+# pos: 8
+# charpos: 6
+# rest: "んにちは"
+# rest_size: 12
+```
+
+## Target Substring
+
+The target substring is the part of the [stored string][1]
+that extends from the current [byte position][2] to the end of the stored string;
+it is always either:
+
+- The entire stored string (byte position is zero).
+- A trailing substring of the stored string (byte position positive).
+
+The target substring is returned by method #rest,
+and its size is returned by method #rest_size.
+
+Examples:
+
+```
+scanner = StringScanner.new('foobarbaz')
+put_situation(scanner)
+# Situation:
+# pos: 0
+# charpos: 0
+# rest: "foobarbaz"
+# rest_size: 9
+scanner.pos = 3
+put_situation(scanner)
+# Situation:
+# pos: 3
+# charpos: 3
+# rest: "barbaz"
+# rest_size: 6
+scanner.pos = 9
+put_situation(scanner)
+# Situation:
+# pos: 9
+# charpos: 9
+# rest: ""
+# rest_size: 0
+```
+
+### Setting the Target Substring
+
+The target substring is set whenever:
+
+* The [stored string][1] is set (position reset to zero; target substring set to stored string).
+* The [byte position][2] is set (target substring adjusted accordingly).
+
+### Querying the Target Substring
+
+This table summarizes (details and examples at the links):
+
+| Method | Returns |
+|------------|-----------------------------------|
+| #rest | Target substring. |
+| #rest_size | Size (bytes) of target substring. |
+
+### Searching the Target Substring
+
+A _search_ method examines the target substring,
+but does not advance the [positions][11]
+or (by implication) shorten the target substring.
+
+This table summarizes (details and examples at the links):
+
+| Method | Returns | Sets Match Values? |
+|-----------------------|-----------------------------------------------|--------------------|
+| #check(pattern) | Matched leading substring or +nil+. | Yes. |
+| #check_until(pattern) | Matched substring (anywhere) or +nil+. | Yes. |
+| #exist?(pattern) | Matched substring (anywhere) end index or +nil+. | Yes. |
+| #match?(pattern) | Size of matched leading substring or +nil+. | Yes. |
+| #peek(size) | Leading substring of given length (bytes). | No. |
+| #peek_byte | Integer leading byte or +nil+. | No. |
+| #rest | Target substring (from byte position to end). | No. |
+
+### Traversing the Target Substring
+
+A _traversal_ method examines the target substring,
+and, if successful:
+
+- Advances the [positions][11].
+- Shortens the target substring.
+
+
+This table summarizes (details and examples at links):
+
+| Method | Returns | Sets Match Values? |
+|----------------------|------------------------------------------------------|--------------------|
+| #get_byte | Leading byte or +nil+. | No. |
+| #getch | Leading character or +nil+. | No. |
+| #scan(pattern) | Matched leading substring or +nil+. | Yes. |
+| #scan_byte | Integer leading byte or +nil+. | No. |
+| #scan_until(pattern) | Matched substring (anywhere) or +nil+. | Yes. |
+| #skip(pattern) | Matched leading substring size or +nil+. | Yes. |
+| #skip_until(pattern) | Position delta to end-of-matched-substring or +nil+. | Yes. |
+| #unscan | +self+. | No. |
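As a rough sketch of how the traversal methods combine, the hypothetical `tokenize` helper below (the helper and its token categories are ours, not part of `StringScanner`) walks a string with #skip, #scan, and #getch until #eos? reports the end:

```ruby
require 'strscan'

# Hypothetical helper: splits text into [type, text] pairs.
def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)                  # Discard leading whitespace.
    if (number = scanner.scan(/\d+/))
      tokens << [:number, number]
    elsif (word = scanner.scan(/[A-Za-z_]\w*/))
      tokens << [:word, word]
    else
      tokens << [:other, scanner.getch]          # Fall back to a single character.
    end
  end
  tokens
end

tokens = tokenize('x = 42 + y1')
p tokens
```

Each successful call shortens the target substring, so the loop ends once the byte position reaches the end of the stored string.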
+
+## Querying the Scanner
+
+Each of these methods queries the scanner object
+without modifying it (details and examples at links):
+
+| Method | Returns |
+|---------------------|----------------------------------|
+| #beginning_of_line? | +true+ or +false+. |
+| #charpos | Character position. |
+| #eos? | +true+ or +false+. |
+| #fixed_anchor? | +true+ or +false+. |
+| #inspect | String representation of +self+. |
+| #pos | Byte position. |
+| #rest | Target substring. |
+| #rest_size | Size of target substring. |
+| #string | Stored string. |
+
+## Matching
+
+`StringScanner` implements pattern matching via Ruby class [Regexp][6],
+and its matching behaviors are the same as Ruby's
+except for the [fixed-anchor property][10].
+
+### Matcher Methods
+
+Each <i>matcher method</i> takes a single argument `pattern`,
+and attempts to find a matching substring in the [target substring][3].
+
+| Method | Pattern Type | Matches Target Substring | Success Return | May Update Positions? |
+|--------------|-------------------|--------------------------|--------------------|-----------------------|
+| #check | Regexp or String. | At beginning. | Matched substring. | No. |
+| #check_until | Regexp. | Anywhere. | Substring. | No. |
+| #match? | Regexp or String. | At beginning. | Match size. | No. |
+| #exist? | Regexp. | Anywhere. | Position delta. | No. |
+| #scan | Regexp or String. | At beginning. | Matched substring. | Yes. |
+| #scan_until | Regexp. | Anywhere. | Substring. | Yes. |
+| #skip | Regexp or String. | At beginning. | Match size. | Yes. |
+| #skip_until | Regexp. | Anywhere. | Position delta. | Yes. |
+
+<br>
+
+Which matcher you choose will depend on:
+
+- Where you want to find a match:
+
+ - Only at the beginning of the target substring:
+ #check, #match?, #scan, #skip.
+ - Anywhere in the target substring:
+ #check_until, #exist?, #scan_until, #skip_until.
+
+- Whether you want to:
+
+ - Traverse, by advancing the positions:
+ #scan, #scan_until, #skip, #skip_until.
+ - Keep the positions unchanged:
+ #check, #check_until, #exist?, #match?.
+
+- What you want for the return value:
+
+ - The matched substring: #check, #check_until, #scan, #scan_until.
+ - The position delta: #exist?, #skip_until.
+ - The match size: #match?, #skip.
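The choices above can be compared side by side. This is a minimal sketch with variable names of our own choosing:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')

# Non-advancing matchers: the positions stay where they are.
checked = scanner.check(/foo/)      # Matched substring at the beginning.
size    = scanner.match?(/foo/)     # Size of that match.
offset  = scanner.exist?(/bar/)     # Byte offset of the end of a match found anywhere.
pos_after_search = scanner.pos

# Advancing matchers: a successful match moves the positions.
scanned = scanner.scan(/foo/)       # Same substring as #check, but traverses it.
skipped = scanner.skip_until(/baz/) # Bytes traversed up to the end of "baz".
pos_after_traversal = scanner.pos

p [checked, size, offset, pos_after_search]
p [scanned, skipped, pos_after_traversal]
```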
+
+### Match Values
+
+The <i>match values</i> in a `StringScanner` object
+generally contain the results of the most recent attempted match.
+
+Each match value may be thought of as:
+
+* _Clear_: Initially, or after an unsuccessful match attempt:
+ usually, `false`, `nil`, or `{}`.
+* _Set_: After a successful match attempt:
+ `true`, string, array, or hash.
+
+Each of these methods clears match values:
+
+- ::new(string).
+- \#reset.
+- \#terminate.
+
+Each of these methods attempts a match based on a pattern,
+and either sets match values (if successful) or clears them (if not):
+
+- \#check(pattern)
+- \#check_until(pattern)
+- \#exist?(pattern)
+- \#match?(pattern)
+- \#scan(pattern)
+- \#scan_until(pattern)
+- \#skip(pattern)
+- \#skip_until(pattern)
+
+#### Basic Match Values
+
+Basic match values are those not related to captures.
+
+Each of these methods returns a basic match value:
+
+| Method | Return After Match | Return After No Match |
+|-----------------|----------------------------------------|-----------------------|
+| #matched? | +true+. | +false+. |
+| #matched_size | Size of matched substring. | +nil+. |
+| #matched | Matched substring. | +nil+. |
+| #pre_match | Substring preceding matched substring. | +nil+. |
+| #post_match | Substring following matched substring. | +nil+. |
+
+<br>
+
+See examples below.
+
+#### Captured Match Values
+
+Captured match values are those related to [captures][16].
+
+Each of these methods returns a captured match value:
+
+| Method | Return After Match | Return After No Match |
+|-----------------|-----------------------------------------|-----------------------|
+| #size | Count of captured substrings. | +nil+. |
+| #[](n) | <tt>n</tt>th captured substring. | +nil+. |
+| #captures | Array of all captured substrings. | +nil+. |
+| #values_at(*n) | Array of specified captured substrings. | +nil+. |
+| #named_captures | Hash of named captures. | <tt>{}</tt>. |
+
+<br>
+
+See examples below.
+
+#### Match Values Examples
+
+Successful basic match attempt (no captures):
+
+```
+scanner = StringScanner.new('foobarbaz')
+scanner.exist?(/bar/)
+put_match_values(scanner)
+# Basic match values:
+# matched?: true
+# matched_size: 3
+# pre_match: "foo"
+# matched : "bar"
+# post_match: "baz"
+# Captured match values:
+# size: 1
+# captures: []
+# named_captures: {}
+# values_at: ["bar", nil]
+# []:
+# [0]: "bar"
+# [1]: nil
+```
+
+Failed basic match attempt (no captures):
+
+```
+scanner = StringScanner.new('foobarbaz')
+scanner.exist?(/nope/)
+match_values_cleared?(scanner) # => true
+```
+
+Successful unnamed capture match attempt:
+
+```
+scanner = StringScanner.new('foobarbazbatbam')
+scanner.exist?(/(foo)bar(baz)bat(bam)/)
+put_match_values(scanner)
+# Basic match values:
+# matched?: true
+# matched_size: 15
+# pre_match: ""
+# matched : "foobarbazbatbam"
+# post_match: ""
+# Captured match values:
+# size: 4
+# captures: ["foo", "baz", "bam"]
+# named_captures: {}
+# values_at: ["foobarbazbatbam", "foo", "baz", "bam", nil]
+# []:
+# [0]: "foobarbazbatbam"
+# [1]: "foo"
+# [2]: "baz"
+# [3]: "bam"
+# [4]: nil
+```
+
+Successful named capture match attempt;
+same as unnamed above, except for #named_captures:
+
+```
+scanner = StringScanner.new('foobarbazbatbam')
+scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
+scanner.named_captures # => {"x"=>"foo", "y"=>"baz", "z"=>"bam"}
+```
+
+Failed unnamed capture match attempt:
+
+```
+scanner = StringScanner.new('somestring')
+scanner.exist?(/(foo)bar(baz)bat(bam)/)
+match_values_cleared?(scanner) # => true
+```
+
+Failed named capture match attempt;
+same as unnamed above, except for #named_captures:
+
+```
+scanner = StringScanner.new('somestring')
+scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
+match_values_cleared?(scanner) # => false
+scanner.named_captures # => {"x"=>nil, "y"=>nil, "z"=>nil}
+```
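The examples above use #exist?, which leaves the positions unchanged. As a further sketch (the input string and variable names are ours), the same match values are available after a traversing match with #scan:

```ruby
require 'strscan'

scanner = StringScanner.new('date: 2024-05-17!')
scanner.skip(/date: /)                      # Advance past the label.
matched = scanner.scan(/(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/)

year_by_name   = scanner[:year]             # Named group, looked up by symbol.
month_by_index = scanner[2]                 # Named groups are also numbered.
all_captures   = scanner.captures           # Every captured substring, in order.
before         = scanner.pre_match          # Stored string before the match.
after          = scanner.post_match         # Stored string after the match.

p [matched, year_by_name, month_by_index, all_captures, before, after]
```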
+
+## Fixed-Anchor Property
+
+Pattern matching in `StringScanner` is the same as Ruby's,
+except for its fixed-anchor property,
+which determines the meaning of `'\A'`:
+
+* `false` (the default): matches the current byte position.
+
+ ```
+ scanner = StringScanner.new('foobar')
+ scanner.scan(/\A./) # => "f"
+ scanner.scan(/\A./) # => "o"
+ scanner.scan(/\A./) # => "o"
+ scanner.scan(/\A./) # => "b"
+ ```
+
+* `true`: matches the beginning of the [stored string][1];
+ never matches unless the byte position is zero:
+
+ ```
+ scanner = StringScanner.new('foobar', fixed_anchor: true)
+ scanner.scan(/\A./) # => "f"
+ scanner.scan(/\A./) # => nil
+ scanner.reset
+ scanner.scan(/\A./) # => "f"
+ ```
+
+The fixed-anchor property is set when the `StringScanner` object is created,
+and may not be modified
+(see StringScanner.new);
+method #fixed_anchor? returns the setting.
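A short sketch (variable names are ours) that puts the two settings side by side after both scanners have advanced to byte position 3:

```ruby
require 'strscan'

default_scanner = StringScanner.new('foobar')
fixed_scanner   = StringScanner.new('foobar', fixed_anchor: true)

# Advance both scanners past "foo"; the byte position is now 3.
default_scanner.scan(/foo/)
fixed_scanner.scan(/foo/)

default_result = default_scanner.scan(/\Abar/) # \A is anchored at the current position.
fixed_result   = fixed_scanner.scan(/\Abar/)   # \A is anchored at byte 0, so no match.
unanchored     = fixed_scanner.scan(/bar/)     # Without \A the match still succeeds.

p [default_scanner.fixed_anchor?, default_result]
p [fixed_scanner.fixed_anchor?, fixed_result, unanchored]
```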
+
diff --git a/doc/syntax/literals.rdoc b/doc/syntax/literals.rdoc
index 0c1e4a434b..6d681419a2 100644
--- a/doc/syntax/literals.rdoc
+++ b/doc/syntax/literals.rdoc
@@ -138,19 +138,18 @@ Also \Rational numbers may be imaginary numbers.
== Strings
-=== \String Literals
-
-The most common way of writing strings is using <tt>"</tt>:
-
- "This is a string."
-
-The string may be many lines long.
-
-Any internal <tt>"</tt> must be escaped:
-
- "This string has a quote: \". As you can see, it is escaped"
-
-Double-quote strings allow escaped characters such as <tt>\n</tt> for
+=== Escape Sequences
+
+Some characters can be represented as escape sequences in
+double-quoted strings,
+character literals,
+here document literals (non-quoted, double-quoted, and with backticks),
+double-quoted symbols,
+double-quoted symbol keys in Hash literals,
+Regexp literals, and
+several percent literals (<tt>%</tt>, <tt>%Q</tt>, <tt>%W</tt>, <tt>%I</tt>, <tt>%r</tt>, <tt>%x</tt>).
+
+They allow escape sequences such as <tt>\n</tt> for
newline, <tt>\t</tt> for tab, etc. The full list of supported escape
sequences are as follows:
@@ -174,11 +173,31 @@ sequences are as follows:
\M-\cx same as above
\c\M-x same as above
\c? or \C-? delete, ASCII 7Fh (DEL)
+ \<newline> continuation line (empty string)
+
+The last one, <tt>\<newline></tt>, represents an empty string instead of a character.
+It is used to fold a line in a string.
+
+=== Double-quoted \String Literals
-Any other character following a backslash is interpreted as the
+The most common way of writing strings is using <tt>"</tt>:
+
+ "This is a string."
+
+The string may be many lines long.
+
+Any internal <tt>"</tt> must be escaped:
+
+ "This string has a quote: \". As you can see, it is escaped"
+
+Double-quoted strings allow the escape sequences described in
+{Escape Sequences}[#label-Escape+Sequences].
+
+In a double-quoted string,
+any other character following a backslash is interpreted as the
character itself.
-Double-quote strings allow interpolation of other values using
+Double-quoted strings allow interpolation of other values using
<tt>#{...}</tt>:
"One plus one is two: #{1 + 1}"
@@ -190,8 +209,14 @@ You can also use <tt>#@foo</tt>, <tt>#@@foo</tt> and <tt>#$foo</tt> as a
shorthand for, respectively, <tt>#{ @foo }</tt>, <tt>#{ @@foo }</tt> and
<tt>#{ $foo }</tt>.
+See also:
+
+* {% and %Q: Interpolable String Literals}[#label-25+and+-25Q-3A+Interpolable+String+Literals]
+
+=== Single-quoted \String Literals
+
Interpolation may be disabled by escaping the "#" character or using
-single-quote strings:
+single-quoted strings:
'#{1 + 1}' #=> "\#{1 + 1}"
@@ -199,6 +224,16 @@ In addition to disabling interpolation, single-quoted strings also disable all
escape sequences except for the single-quote (<tt>\'</tt>) and backslash
(<tt>\\\\</tt>).
+In a single-quoted string,
+any other character following a backslash is interpreted as is:
+a backslash and the character itself.
+
+See also:
+
+* {%q: Non-Interpolable String Literals}[#label-25q-3A+Non-Interpolable+String+Literals]
+
+=== Literal String Concatenation
+
Adjacent string literals are automatically concatenated by the interpreter:
"con" "cat" "en" "at" "ion" #=> "concatenation"
@@ -211,10 +246,12 @@ be concatenated as long as a percent-string is not last.
%q{a} 'b' "c" #=> "abc"
"a" 'b' %q{c} #=> NameError: uninitialized constant q
+=== Character Literal
+
There is also a character literal notation to represent single
character strings, which syntax is a question mark (<tt>?</tt>)
-followed by a single character or escape sequence that corresponds to
-a single codepoint in the script encoding:
+followed by a single character or escape sequence (except continuation line)
+that corresponds to a single codepoint in the script encoding:
?a #=> "a"
?abc #=> SyntaxError
@@ -228,11 +265,6 @@ a single codepoint in the script encoding:
?\C-\M-a #=> "\x81", same as above
?あ #=> "あ"
-See also:
-
-* {%q: Non-Interpolable String Literals}[#label-25q-3A+Non-Interpolable+String+Literals]
-* {% and %Q: Interpolable String Literals}[#label-25+and+-25Q-3A+Interpolable+String+Literals]
-
=== Here Document Literals
If you are writing a large block of text you may use a "here document" or
@@ -283,9 +315,10 @@ its end is a multiple of eight. The amount to be removed is counted in terms
of the number of spaces. If the boundary appears in the middle of a tab, that
tab is not removed.
-A heredoc allows interpolation and escaped characters. You may disable
-interpolation and escaping by surrounding the opening identifier with single
-quotes:
+A heredoc allows interpolation and the escape sequences described in
+{Escape Sequences}[#label-Escape+Sequences].
+You may disable interpolation and escaping by surrounding the opening
+identifier with single quotes:
expected_result = <<-'EXPECTED'
One plus one is #{1 + 1}
@@ -326,12 +359,15 @@ details on what symbols are and when ruby creates them internally.
You may reference a symbol using a colon: <tt>:my_symbol</tt>.
-You may also create symbols by interpolation:
+You may also create symbols with double quotes, which allow interpolation and
+the escape sequences described in {Escape Sequences}[#label-Escape+Sequences]:
:"my_symbol1"
:"my_symbol#{1 + 1}"
+ :"foo\sbar"
-Like strings, a single-quote may be used to disable interpolation:
+Like strings, a single-quote may be used to disable interpolation and
+escape sequences:
:'my_symbol#{1 + 1}' #=> :"my_symbol\#{1 + 1}"
@@ -451,7 +487,12 @@ may use these paired delimiters:
* <tt>(</tt> and <tt>)</tt>.
* <tt>{</tt> and <tt>}</tt>.
* <tt><</tt> and <tt>></tt>.
-* Any other character, as both beginning and ending delimiters.
+* Any other non-alphanumeric ASCII character, as both beginning and ending delimiters.
+
+The delimiters can be escaped with a backslash.
+However, the first four pairs (brackets, parentheses, braces, and
+angle brackets) are allowed without a backslash as long as they are correctly
+paired.
These are demonstrated in the next section.
@@ -460,13 +501,20 @@ These are demonstrated in the next section.
You can write a non-interpolable string with <tt>%q</tt>.
The created string is the same as if you created it with single quotes:
- %[foo bar baz] # => "foo bar baz" # Using [].
- %(foo bar baz) # => "foo bar baz" # Using ().
- %{foo bar baz} # => "foo bar baz" # Using {}.
- %<foo bar baz> # => "foo bar baz" # Using <>.
- %|foo bar baz| # => "foo bar baz" # Using two |.
- %:foo bar baz: # => "foo bar baz" # Using two :.
+ %q[foo bar baz] # => "foo bar baz" # Using [].
+ %q(foo bar baz) # => "foo bar baz" # Using ().
+ %q{foo bar baz} # => "foo bar baz" # Using {}.
+ %q<foo bar baz> # => "foo bar baz" # Using <>.
+ %q|foo bar baz| # => "foo bar baz" # Using two |.
+ %q:foo bar baz: # => "foo bar baz" # Using two :.
%q(1 + 1 is #{1 + 1}) # => "1 + 1 is \#{1 + 1}" # No interpolation.
+ %q[foo[bar]baz] # => "foo[bar]baz" # brackets can be nested.
+ %q(foo(bar)baz) # => "foo(bar)baz" # parentheses can be nested.
+ %q{foo{bar}baz} # => "foo{bar}baz" # braces can be nested.
+ %q<foo<bar>baz> # => "foo<bar>baz" # angle brackets can be nested.
+
+This is similar to a single-quoted string, but only backslashes and
+the specified delimiters can be escaped with a backslash.
=== <tt>% and %Q</tt>: Interpolable String Literals
@@ -476,30 +524,63 @@ or with its alias <tt>%</tt>:
%[foo bar baz] # => "foo bar baz"
%(1 + 1 is #{1 + 1}) # => "1 + 1 is 2" # Interpolation.
+This is similar to a double-quoted string.
+It allows the escape sequences described in
+{Escape Sequences}[#label-Escape+Sequences].
+Other escaped characters (a backslash followed by a character) are
+interpreted as the character itself.
+
=== <tt>%w and %W</tt>: String-Array Literals
-You can write an array of strings with <tt>%w</tt> (non-interpolable)
-or <tt>%W</tt> (interpolable):
+You can write an array of strings as whitespace-separated words
+with <tt>%w</tt> (non-interpolable) or <tt>%W</tt> (interpolable):
%w[foo bar baz] # => ["foo", "bar", "baz"]
%w[1 % *] # => ["1", "%", "*"]
# Use backslash to embed spaces in the strings.
%w[foo\ bar baz\ bat] # => ["foo bar", "baz bat"]
+ %W[foo\ bar baz\ bat] # => ["foo bar", "baz bat"]
%w(#{1 + 1}) # => ["\#{1", "+", "1}"]
%W(#{1 + 1}) # => ["2"]
+ # Nested delimiters evaluate to a flat array of strings
+ # (not a nested array).
+ %w[foo[bar baz]qux] # => ["foo[bar", "baz]qux"]
+
+The following characters are treated as whitespace that separates words:
+
+* space, ASCII 20h (SPC)
+* form feed, ASCII 0Ch (FF)
+* newline (line feed), ASCII 0Ah (LF)
+* carriage return, ASCII 0Dh (CR)
+* horizontal tab, ASCII 09h (TAB)
+* vertical tab, ASCII 0Bh (VT)
+
+The whitespace characters can be escaped with a backslash to make them
+part of a word.
+
+<tt>%W</tt> allows the escape sequences described in
+{Escape Sequences}[#label-Escape+Sequences].
+However, the continuation line <tt>\<newline></tt> is not usable because
+it is interpreted as the escaped newline described above.
+
=== <tt>%i and %I</tt>: Symbol-Array Literals
-You can write an array of symbols with <tt>%i</tt> (non-interpolable)
-or <tt>%I</tt> (interpolable):
+You can write an array of symbols as whitespace-separated words
+with <tt>%i</tt> (non-interpolable) or <tt>%I</tt> (interpolable):
%i[foo bar baz] # => [:foo, :bar, :baz]
%i[1 % *] # => [:"1", :%, :*]
# Use backslash to embed spaces in the symbols.
%i[foo\ bar baz\ bat] # => [:"foo bar", :"baz bat"]
+ %I[foo\ bar baz\ bat] # => [:"foo bar", :"baz bat"]
%i(#{1 + 1}) # => [:"\#{1", :+, :"1}"]
%I(#{1 + 1}) # => [:"2"]
+Whitespace characters and their escapes are interpreted the same way as in
+string-array literals, described in
+{%w and %W: String-Array Literals}[#label-25w+and+-25W-3A+String-Array+Literals].
+
=== <tt>%s</tt>: Symbol Literals
You can write a symbol with <tt>%s</tt>:
@@ -507,6 +588,10 @@ You can write a symbol with <tt>%s</tt>:
%s[foo] # => :foo
%s[foo bar] # => :"foo bar"
+This is non-interpolable.
+Only backslashes and the specified delimiters
+can be escaped with a backslash.
+
=== <tt>%r</tt>: Regexp Literals
You can write a regular expression with <tt>%r</tt>;
@@ -531,4 +616,10 @@ See {Regexp modes}[rdoc-ref:Regexp@Modes] for details.
You can write and execute a shell command with <tt>%x</tt>:
- %x(echo 1) # => "1\n"
+ %x(echo 1) # => "1\n"
+ %x[echo #{1 + 2}] # => "3\n"
+ %x[echo \u0030] # => "0\n"
+
+This is interpolable.
+<tt>%x</tt> allows the escape sequences described in
+{Escape Sequences}[#label-Escape+Sequences].
diff --git a/doc/syntax/pattern_matching.rdoc b/doc/syntax/pattern_matching.rdoc
index e49c09a1f8..6a30380f46 100644
--- a/doc/syntax/pattern_matching.rdoc
+++ b/doc/syntax/pattern_matching.rdoc
@@ -422,7 +422,8 @@ These core and library classes implement deconstruction:
== Guard clauses
-+if+ can be used to attach an additional condition (guard clause) when the pattern matches. This condition may use bound variables:
++if+ can be used to attach an additional condition (guard clause) when the pattern matches in +case+/+in+ expressions.
+This condition may use bound variables:
case [1, 2]
in a, b if b == a*2
@@ -450,6 +451,11 @@ These core and library classes implement deconstruction:
end
#=> "matched"
+Note that the <code>=></code> and +in+ operators cannot have a guard clause.
+The following example is parsed as a standalone expression with modifier +if+.
+
+ [1, 2] in a, b if b == a*2
+
== Appendix A. Pattern syntax
Approximate syntax is: