Diffstat (limited to 'doc')
-rw-r--r-- | doc/contributing/building_ruby.md | 145
-rw-r--r-- | doc/encodings.rdoc | 2
-rw-r--r-- | doc/exceptions.md | 362
-rw-r--r-- | doc/format_specifications.rdoc | 2
-rw-r--r-- | doc/strscan/helper_methods.md | 128
-rw-r--r-- | doc/strscan/link_refs.txt | 17
-rw-r--r-- | doc/strscan/methods/get_byte.md | 30
-rw-r--r-- | doc/strscan/methods/get_charpos.md | 19
-rw-r--r-- | doc/strscan/methods/get_pos.md | 14
-rw-r--r-- | doc/strscan/methods/getch.md | 43
-rw-r--r-- | doc/strscan/methods/scan.md | 51
-rw-r--r-- | doc/strscan/methods/scan_until.md | 52
-rw-r--r-- | doc/strscan/methods/set_pos.md | 27
-rw-r--r-- | doc/strscan/methods/skip.md | 43
-rw-r--r-- | doc/strscan/methods/skip_until.md | 49
-rw-r--r-- | doc/strscan/methods/terminate.md | 30
-rw-r--r-- | doc/strscan/strscan.md | 543
-rw-r--r-- | doc/syntax/literals.rdoc | 171
-rw-r--r-- | doc/syntax/pattern_matching.rdoc | 8
19 files changed, 1651 insertions, 85 deletions
diff --git a/doc/contributing/building_ruby.md b/doc/contributing/building_ruby.md index 96cee40cb4..ce844b5026 100644 --- a/doc/contributing/building_ruby.md +++ b/doc/contributing/building_ruby.md @@ -8,28 +8,30 @@ For RubyGems, you will also need: - * OpenSSL 1.1.x or 3.0.x / LibreSSL - * libyaml 0.1.7 or later - * zlib + * [OpenSSL] 1.1.x or 3.0.x / [LibreSSL] + * [libyaml] 0.1.7 or later + * [zlib] If you want to build from the git repository, you will also need: - * autoconf - 2.67 or later - * gperf - 3.1 or later + * [autoconf] - 2.67 or later + * [gperf] - 3.1 or later * Usually unneeded; only if you edit some source files using gperf * ruby - 3.0 or later - * We can upgrade this version to system ruby version of the latest Ubuntu LTS. + * We can upgrade this version to system ruby version of the latest + Ubuntu LTS. 2. Install optional, recommended dependencies: - * libffi (to build fiddle) - * gmp (if you with to accelerate Bignum operations) - * libexecinfo (FreeBSD) - * rustc - 1.58.0 or later, if you wish to build - [YJIT](https://docs.ruby-lang.org/en/master/RubyVM/YJIT.html). + * [libffi] (to build fiddle) + * [gmp] (if you with to accelerate Bignum operations) + * [rustc] - 1.58.0 or later, if you wish to build + [YJIT](rdoc-ref:RubyVM::YJIT). - If you installed the libraries needed for extensions (openssl, readline, libyaml, zlib) into other than the OS default place, - typically using Homebrew on macOS, add `--with-EXTLIB-dir` options to `CONFIGURE_ARGS` environment variable. + If you installed the libraries needed for extensions (openssl, readline, + libyaml, zlib) into other than the OS default place, typically using + Homebrew on macOS, add `--with-EXTLIB-dir` options to `CONFIGURE_ARGS` + environment variable. 
``` shell export CONFIGURE_ARGS="" @@ -38,6 +40,16 @@ done ``` +[OpenSSL]: https://www.openssl.org +[LibreSSL]: https://www.libressl.org +[libyaml]: https://github.com/yaml/libyaml/ +[zlib]: https://www.zlib.net +[autoconf]: https://www.gnu.org/software/autoconf/ +[gperf]: https://www.gnu.org/software/gperf/ +[libffi]: https://sourceware.org/libffi/ +[gmp]: https://gmplib.org +[rustc]: https://www.rust-lang.org + ## Quick start guide 1. Download ruby source code: @@ -46,8 +58,8 @@ 1. Build from the tarball: - Download the latest tarball from [ruby-lang.org](https://www.ruby-lang.org/en/downloads/) and - extract it. Example for Ruby 3.0.2: + Download the latest tarball from [Download Ruby] page and extract + it. Example for Ruby 3.0.2: ``` shell tar -xzf ruby-3.0.2.tar.gz @@ -75,7 +87,8 @@ mkdir build && cd build ``` - While it's not necessary to build in a separate directory, it's good practice to do so. + While it's not necessary to build in a separate directory, it's good + practice to do so. 3. We'll install Ruby in `~/.rubies/ruby-master`, so create the directory: @@ -89,7 +102,8 @@ ../configure --prefix="${HOME}/.rubies/ruby-master" ``` - - Also `-C` (or `--config-cache`) would reduce time to configure from the next time. + - Also `-C` (or `--config-cache`) would reduce time to configure from the + next time. 5. Build Ruby: @@ -105,16 +119,24 @@ make install ``` - - If you need to run `make install` with `sudo` and want to avoid document generation with different permissions, you can use - `make SUDO=sudo install`. + - If you need to run `make install` with `sudo` and want to avoid document + generation with different permissions, you can use `make SUDO=sudo + install`. + +[Download Ruby]: https://www.ruby-lang.org/en/downloads/ ### Unexplainable Build Errors -If you are having unexplainable build errors, after saving all your work, try running `git clean -xfd` in the source root to remove all git ignored local files. 
If you are working from a source directory that's been updated several times, you may have temporary build artifacts from previous releases which can cause build failures. +If you are having unexplainable build errors, after saving all your work, try +running `git clean -xfd` in the source root to remove all git ignored local +files. If you are working from a source directory that's been updated several +times, you may have temporary build artifacts from previous releases which can +cause build failures. ## Building on Windows -The documentation for building on Windows can be found [here](../windows.md). +The documentation for building on Windows can be found in [the separated +file](../windows.md). ## More details @@ -123,8 +145,9 @@ about Ruby's build to help out. ### Running make scripts in parallel -In GNU make and BSD make implementations, to run a specific make script in parallel, pass the flag `-j<number of processes>`. For instance, -to run tests on 8 processes, use: +In GNU make[^caution-gmake-3] and BSD make implementations, to run a specific make script in +parallel, pass the flag `-j<number of processes>`. For instance, to run tests +on 8 processes, use: ``` shell make test-all -j8 @@ -132,7 +155,9 @@ make test-all -j8 We can also set `MAKEFLAGS` to run _all_ `make` commands in parallel. -Having the right `--jobs` flag will ensure all processors are utilized when building software projects. To do this effectively, you can set `MAKEFLAGS` in your shell configuration/profile: +Having the right `--jobs` flag will ensure all processors are utilized when +building software projects. To do this effectively, you can set `MAKEFLAGS` in +your shell configuration/profile: ``` shell # On macOS with Fish shell: @@ -148,11 +173,15 @@ export MAKEFLAGS="--jobs "(nproc) export MAKEFLAGS="--jobs $(nproc)" ``` +[^caution-gmake-3]: **CAUTION**: GNU make 3 is missing some features for parallel execution, we +recommend to upgrade to GNU make 4 or later. 
+ ### Miniruby vs Ruby -Miniruby is a version of Ruby which has no external dependencies and lacks certain features. -It can be useful in Ruby development because it allows for faster build times. Miniruby is -built before Ruby. A functional Miniruby is required to build Ruby. To build Miniruby: +Miniruby is a version of Ruby which has no external dependencies and lacks +certain features. It can be useful in Ruby development because it allows for +faster build times. Miniruby is built before Ruby. A functional Miniruby is +required to build Ruby. To build Miniruby: ``` shell make miniruby @@ -160,8 +189,9 @@ make miniruby ## Debugging -You can use either lldb or gdb for debugging. Before debugging, you need to create a `test.rb` -with the Ruby script you'd like to run. You can use the following make targets: +You can use either lldb or gdb for debugging. Before debugging, you need to +create a `test.rb` with the Ruby script you'd like to run. You can use the +following make targets: * `make run`: Runs `test.rb` using Miniruby * `make lldb`: Runs `test.rb` using Miniruby in lldb @@ -172,7 +202,8 @@ with the Ruby script you'd like to run. You can use the following make targets: ### Compiling for Debugging -You should configure Ruby without optimization and other flags that may interfere with debugging: +You should configure Ruby without optimization and other flags that may +interfere with debugging: ``` shell ./configure --enable-debug-env optflags="-O0 -fno-omit-frame-pointer" @@ -180,15 +211,23 @@ You should configure Ruby without optimization and other flags that may interfer ### Building with Address Sanitizer -Using the address sanitizer (ASAN) is a great way to detect memory issues. It can detect memory safety issues in Ruby itself, and also in any C extensions compiled with and loaded into a Ruby compiled with ASAN. +Using the address sanitizer (ASAN) is a great way to detect memory issues. 
It +can detect memory safety issues in Ruby itself, and also in any C extensions +compiled with and loaded into a Ruby compiled with ASAN. ``` shell ./autogen.sh mkdir build && cd build -../configure CC=clang cflags="-fsanitize=address -fno-omit-frame-pointer -DUSE_MN_THREADS=0" # and any other options you might like +../configure CC=clang-18 cflags="-fsanitize=address -fno-omit-frame-pointer -DUSE_MN_THREADS=0" # and any other options you might like make ``` -The compiled Ruby will now automatically crash with a report and a backtrace if ASAN detects a memory safety issue. To run Ruby's test suite under ASAN, issue the following command. Note that this will take quite a long time (over two hours on my laptop); the `RUBY_TEST_TIMEOUT_SCALE` and `SYNTAX_SUGEST_TIMEOUT` variables are required to make sure tests don't spuriously fail with timeouts when in fact they're just slow. + +The compiled Ruby will now automatically crash with a report and a backtrace +if ASAN detects a memory safety issue. To run Ruby's test suite under ASAN, +issue the following command. Note that this will take quite a long time (over +two hours on my laptop); the `RUBY_TEST_TIMEOUT_SCALE` and +`SYNTAX_SUGEST_TIMEOUT` variables are required to make sure tests don't +spuriously fail with timeouts when in fact they're just slow. ``` shell RUBY_TEST_TIMEOUT_SCALE=5 SYNTAX_SUGGEST_TIMEOUT=600 make check @@ -196,11 +235,30 @@ RUBY_TEST_TIMEOUT_SCALE=5 SYNTAX_SUGGEST_TIMEOUT=600 make check Please note, however, the following caveats! 
-* ASAN will not work properly on any currently released version of Ruby; the necessary support is currently only present on Ruby's master branch (and the whole test suite passes only as of commit [9d0a5148ae062a0481a4a18fbeb9cfd01dc10428](https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/9d0a5148ae062a0481a4a18fbeb9cfd01dc10428)) -* Due to [this bug](https://bugs.ruby-lang.org/issues/20243), Clang generates code for threadlocal variables which doesn't work with M:N threading. Thus, it's necessary to disable M:N threading support at build time for now (with the `-DUSE_MN_THREADS=0` configure argument). -* Currently, ASAN will only work correctly when using a recent head build of LLVM/Clang - it requires [this bugfix](https://github.com/llvm/llvm-project/pull/75290) related to multithreaded `fork`, which is not yet in any released version. See [here](https://llvm.org/docs/CMake.html) for instructions on how to build LLVM/Clang from source (note you will need at least the `clang` and `compiler-rt` projects enabled). Then, you will need to replace `CC=clang` in the instructions with an explicit path to your built Clang binary. -* ASAN has only been tested so far with Clang on Linux. It may or may not work with other compilers or on other platforms - please file an issue on [https://bugs.ruby-lang.org](https://bugs.ruby-lang.org) if you run into problems with such configurations (or, to report that they actually work properly!) -* In particular, although I have not yet tried it, I have reason to believe ASAN will _not_ work properly on macOS yet - the fix for the multithreaded fork issue was actually reverted for macOS (see [here](https://github.com/llvm/llvm-project/commit/2a03854e4ce9bb1bcd79a211063bc63c4657f92c)). Please open an issue on [https://bugs.ruby-lang.org](https://bugs.ruby-lang.org) if this is a problem for you. 
+* ASAN will not work properly on any currently released version of Ruby; the + necessary support is currently only present on Ruby's master branch (and the + whole test suite passes only as of commit [Revision 9d0a5148]). +* Due to [Bug #20243], Clang generates code for threadlocal variables which + doesn't work with M:N threading. Thus, it's necessary to disable M:N + threading support at build time for now (with the `-DUSE_MN_THREADS=0` + configure argument). +* ASAN will only work when using Clang version 18 or later - it requires + [llvm/llvm-project#75290] related to multithreaded `fork`. +* ASAN has only been tested so far with Clang on Linux. It may or may not work + with other compilers or on other platforms - please file an issue on + [Ruby Issue Tracking System] if you run into problems with such configurations + (or, to report that they actually work properly!) +* In particular, although I have not yet tried it, I have reason to believe + ASAN will _not_ work properly on macOS yet - the fix for the multithreaded + fork issue was actually reverted for macOS (see [llvm/llvm-project#75659]). + Please open an issue on [Ruby Issue Tracking System] if this is a problem for + you. + +[Revision 9d0a5148]: https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/9d0a5148ae062a0481a4a18fbeb9cfd01dc10428 +[Bug #20243]: https://bugs.ruby-lang.org/issues/20243 +[llvm/llvm-project#75290]: https://github.com/llvm/llvm-project/pull/75290 +[llvm/llvm-project#75659]: https://github.com/llvm/llvm-project/pull/75659#issuecomment-1861584777 +[Ruby Issue Tracking System]: https://bugs.ruby-lang.org ## How to measure coverage of C and Ruby code @@ -217,11 +275,12 @@ make lcov open lcov-out/index.html ``` -If you need only C code coverage, you can remove `COVERAGE=true` from the above process. -You can also use `gcov` command directly to get per-file coverage. +If you need only C code coverage, you can remove `COVERAGE=true` from the +above process. 
You can also use `gcov` command directly to get per-file +coverage. -If you need only Ruby code coverage, you can remove `--enable-gcov`. -Note that `test-coverage.dat` accumulates all runs of `make test-all`. -Make sure that you remove the file if you want to measure one test run. +If you need only Ruby code coverage, you can remove `--enable-gcov`. Note +that `test-coverage.dat` accumulates all runs of `make test-all`. Make sure +that you remove the file if you want to measure one test run. You can see the coverage result of CI: https://rubyci.org/coverage diff --git a/doc/encodings.rdoc b/doc/encodings.rdoc index 97c0d22616..d85099cdbc 100644 --- a/doc/encodings.rdoc +++ b/doc/encodings.rdoc @@ -419,7 +419,7 @@ These keyword-value pairs specify encoding options: hash = {"\u3042" => 'xyzzy'} hash.default = 'XYZZY' - s.encode('ASCII', fallback: h) # => "xyzzyfooXYZZY" + s.encode('ASCII', fallback: hash) # => "xyzzyfooXYZZY" def (fallback = "U+%.4X").escape(x) self % x.unpack("U") diff --git a/doc/exceptions.md b/doc/exceptions.md new file mode 100644 index 0000000000..4db2f26c18 --- /dev/null +++ b/doc/exceptions.md @@ -0,0 +1,362 @@ +# Exceptions + +Ruby code can raise exceptions. + +Most often, a raised exception is meant to alert the running program +that an unusual (i.e., _exceptional_) situation has arisen, +and may need to be handled. + +Code throughout the Ruby core, Ruby standard library, and Ruby gems generates exceptions +in certain circumstances: + +``` +File.open('nope.txt') # Raises Errno::ENOENT: "No such file or directory" +``` + +## Raised Exceptions + +A raised exception transfers program execution, one way or another. 
+ +### Unrescued Exceptions + +If an exception not _rescued_ +(see [Rescued Exceptions](#label-Rescued+Exceptions) below), +execution transfers to code in the Ruby interpreter +that prints a message and exits the program (or thread): + +``` +$ ruby -e "raise" +-e:1:in `<main>': unhandled exception +``` + +### Rescued Exceptions + +An <i>exception handler</i> may determine what is to happen +when an exception is raised; +the handler may _rescue_ an exception, +and may prevent the program from exiting. + +A simple example: + +``` +begin + raise 'Boom!' # Raises an exception, transfers control. + puts 'Will not get here.' +rescue + puts 'Rescued an exception.' # Control transferred to here; program does not exit. +end +puts 'Got here.' +``` + +Output: + +``` +Rescued an exception. +Got here. +``` + +An exception handler has several elements: + +| Element | Use | +|-----------------------------|------------------------------------------------------------------------------------------| +| Begin clause. | Begins the handler and contains the code whose raised exception, if any, may be rescued. | +| One or more rescue clauses. | Each contains "rescuing" code, which is to be executed for certain exceptions. | +| Else clause (optional). | Contains code to be executed if no exception is raised. | +| Ensure clause (optional). | Contains code to be executed whether or not an exception is raised, or is rescued. | +| <tt>end</tt> statement. | Ends the handler. ` | + +#### Begin Clause + +The begin clause begins the exception handler: + +- May start with a `begin` statement; + see also [Begin-Less Exception Handlers](#label-Begin-Less+Exception+Handlers). +- Contains code whose raised exception (if any) is covered + by the handler. +- Ends with the first following `rescue` statement. + +#### Rescue Clauses + +A rescue clause: + +- Starts with a `rescue` statement. +- Contains code that is to be executed for certain raised exceptions. 
+- Ends with the first following `rescue`, + `else`, `ensure`, or `end` statement. + +A `rescue` statement may include one or more classes +that are to be rescued; +if none is given, StandardError is assumed. + +The rescue clause rescues both the specified class +(or StandardError if none given) or any of its subclasses; +(see [Built-In Exception Classes](rdoc-ref:Exception@Built-In+Exception+Classes) +for the hierarchy of Ruby built-in exception classes): + + +``` +begin + 1 / 0 # Raises ZeroDivisionError, a subclass of StandardError. +rescue + puts "Rescued #{$!.class}" +end +``` + +Output: + +``` +Rescued ZeroDivisionError +``` + +If the `rescue` statement specifies an exception class, +only that class (or one of its subclasses) is rescued; +this example exits with a ZeroDivisionError, +which was not rescued because it is not ArgumentError or one of its subclasses: + +``` +begin + 1 / 0 +rescue ArgumentError + puts "Rescued #{$!.class}" +end +``` + +A `rescue` statement may specify multiple classes, +which means that its code rescues an exception +of any of the given classes (or their subclasses): + +``` +begin + 1 / 0 +rescue FloatDomainError, ZeroDivisionError + puts "Rescued #{$!.class}" +end +``` + +An exception handler may contain multiple rescue clauses; +in that case, the first clause that rescues the exception does so, +and those before and after are ignored: + +``` +begin + Dir.open('nosuch') +rescue Errno::ENOTDIR + puts "Rescued #{$!.class}" +rescue Errno::ENOENT + puts "Rescued #{$!.class}" +end +``` + +Output: + +``` +Rescued Errno::ENOENT +``` + +A `rescue` statement may specify a variable +whose value becomes the rescued exception +(an instance of Exception or one of its subclasses: + +``` +begin + 1 / 0 +rescue => x + puts x.class + puts x.message +end +``` + +Output: + +``` +ZeroDivisionError +divided by 0 +``` + +In the rescue clause, these global variables are defined: + +- `$!`": the current exception instance. +- `$@`: its backtrace. 
+ +#### Else Clause + +The `else` clause: + +- Starts with an `else` statement. +- Contains code that is to be executed if no exception is raised in the begin clause. +- Ends with the first following `ensure` or `end` statement. + +``` +begin + puts 'Begin.' +rescue + puts 'Rescued an exception!' +else + puts 'No exception raised.' +end +``` + +Output: + +``` +Begin. +No exception raised. +``` + +#### Ensure Clause + +The ensure clause: + +- Starts with an `ensure` statement. +- Contains code that is to be executed + regardless of whether an exception is raised, + and regardless of whether a raised exception is handled. +- Ends with the first following `end` statement. + +``` +def foo(boom: false) + puts 'Begin.' + raise 'Boom!' if boom +rescue + puts 'Rescued an exception!' +else + puts 'No exception raised.' +ensure + puts 'Always do this.' +end + +foo(boom: true) +foo(boom: false) +``` + +Output: + +``` +Begin. +Rescued an exception! +Always do this. +Begin. +No exception raised. +Always do this. +``` + +#### End Statement + +The `end` statement ends the handler. + +Code following it is reached only if any raised exception is rescued. + +#### Begin-Less \Exception Handlers + +As seen above, an exception handler may be implemented with `begin` and `end`. + +An exception handler may also be implemented as: + +- A method body: + + ``` + def foo(boom: false) # Serves as beginning of exception handler. + puts 'Begin.' + raise 'Boom!' if boom + rescue + puts 'Rescued an exception!' + else + puts 'No exception raised.' + end # Serves as end of exception handler. + ``` + +- A block: + + ``` + Dir.chdir('.') do |dir| # Serves as beginning of exception handler. + raise 'Boom!' + rescue + puts 'Rescued an exception!' + end # Serves as end of exception handler. 
+ ``` + +#### Re-Raising an \Exception + +It can be useful to rescue an exception, but allow its eventual effect; +for example, a program can rescue an exception, log data about it, +and then "reinstate" the exception. + +This may be done via the `raise` method, but in a special way; +a rescuing clause: + + - Captures an exception. + - Does whatever is needed concerning the exception (such as logging it). + - Calls method `raise` with no argument, + which raises the rescued exception: + +``` +begin + 1 / 0 +rescue ZeroDivisionError + # Do needful things (like logging). + raise # Raised exception will be ZeroDivisionError, not RuntimeError. +end +``` + +Output: + +``` +ruby t.rb +t.rb:2:in `/': divided by 0 (ZeroDivisionError) + from t.rb:2:in `<main>' +``` + +#### Retrying + +It can be useful to retry a begin clause; +for example, if it must access a possibly-volatile resource +(such as a web page), +it can be useful to try the access more than once +(in the hope that it may become available): + +``` +retries = 0 +begin + puts "Try ##{retries}." + raise 'Boom' +rescue + puts "Rescued retry ##{retries}." + if (retries += 1) < 3 + puts 'Retrying' + retry + else + puts 'Giving up.' + raise + end +end +``` + +``` +Try #0. +Rescued retry #0. +Retrying +Try #1. +Rescued retry #1. +Retrying +Try #2. +Rescued retry #2. +Giving up. +# RuntimeError ('Boom') raised. +``` + +Note that the retry re-executes the entire begin clause, +not just the part after the point of failure. + +## Raising an \Exception + +Raise an exception with method Kernel#raise. 
+ +## Custom Exceptions + +To provide additional or alternate information, +you may create custom exception classes; +each should be a subclass of one of the built-in exception classes: + +``` +class MyException < StandardError; end +``` diff --git a/doc/format_specifications.rdoc b/doc/format_specifications.rdoc index 1111575e74..bdfdc24953 100644 --- a/doc/format_specifications.rdoc +++ b/doc/format_specifications.rdoc @@ -233,6 +233,8 @@ Format +argument+ as a single character: sprintf('%c', 'A') # => "A" sprintf('%c', 65) # => "A" +This behaves like String#<<, except for raising ArgumentError instead of RangeError. + === Specifier +d+ Format +argument+ as a decimal integer: diff --git a/doc/strscan/helper_methods.md b/doc/strscan/helper_methods.md new file mode 100644 index 0000000000..6555a2ce66 --- /dev/null +++ b/doc/strscan/helper_methods.md @@ -0,0 +1,128 @@ +## Helper Methods + +These helper methods display values returned by scanner's methods. + +### `put_situation(scanner)` + +Display scanner's situation: + +- Byte position (`#pos`). +- Character position (`#charpos`) +- Target string (`#rest`) and size (`#rest_size`). 
+ +``` +scanner = StringScanner.new('foobarbaz') +scanner.scan(/foo/) +put_situation(scanner) +# Situation: +# pos: 3 +# charpos: 3 +# rest: "barbaz" +# rest_size: 6 +``` + +### `put_match_values(scanner)` + +Display the scanner's match values: + +``` +scanner = StringScanner.new('Fri Dec 12 1975 14:39') +pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) / +scanner.match?(pattern) +put_match_values(scanner) +# Basic match values: +# matched?: true +# matched_size: 11 +# pre_match: "" +# matched : "Fri Dec 12 " +# post_match: "1975 14:39" +# Captured match values: +# size: 4 +# captures: ["Fri", "Dec", "12"] +# named_captures: {"wday"=>"Fri", "month"=>"Dec", "day"=>"12"} +# values_at: ["Fri Dec 12 ", "Fri", "Dec", "12", nil] +# []: +# [0]: "Fri Dec 12 " +# [1]: "Fri" +# [2]: "Dec" +# [3]: "12" +# [4]: nil +``` + +### `match_values_cleared?(scanner)` + +Returns whether the scanner's match values are all properly cleared: + +``` +scanner = StringScanner.new('foobarbaz') +match_values_cleared?(scanner) # => true +put_match_values(scanner) +# Basic match values: +# matched?: false +# matched_size: nil +# pre_match: nil +# matched : nil +# post_match: nil +# Captured match values: +# size: nil +# captures: nil +# named_captures: {} +# values_at: nil +# [0]: nil +scanner.scan(/foo/) +match_values_cleared?(scanner) # => false +``` + +## The Code + +``` +def put_situation(scanner) + puts '# Situation:' + puts "# pos: #{scanner.pos}" + puts "# charpos: #{scanner.charpos}" + puts "# rest: #{scanner.rest.inspect}" + puts "# rest_size: #{scanner.rest_size}" +end +``` + +``` +def put_match_values(scanner) + puts '# Basic match values:' + puts "# matched?: #{scanner.matched?}" + value = scanner.matched_size || 'nil' + puts "# matched_size: #{value}" + puts "# pre_match: #{scanner.pre_match.inspect}" + puts "# matched : #{scanner.matched.inspect}" + puts "# post_match: #{scanner.post_match.inspect}" + puts '# Captured match values:' + puts "# size: #{scanner.size}" + puts "# 
captures: #{scanner.captures}" + puts "# named_captures: #{scanner.named_captures}" + if scanner.size.nil? + puts "# values_at: #{scanner.values_at(0)}" + puts "# [0]: #{scanner[0]}" + else + puts "# values_at: #{scanner.values_at(*(0..scanner.size))}" + puts "# []:" + scanner.size.times do |i| + puts "# [#{i}]: #{scanner[i].inspect}" + end + end +end +``` + +``` +def match_values_cleared?(scanner) + scanner.matched? == false && + scanner.matched_size.nil? && + scanner.matched.nil? && + scanner.pre_match.nil? && + scanner.post_match.nil? && + scanner.size.nil? && + scanner[0].nil? && + scanner.captures.nil? && + scanner.values_at(0..1).nil? && + scanner.named_captures == {} +end +``` + diff --git a/doc/strscan/link_refs.txt b/doc/strscan/link_refs.txt new file mode 100644 index 0000000000..19f6f7ce5c --- /dev/null +++ b/doc/strscan/link_refs.txt @@ -0,0 +1,17 @@ +[1]: rdoc-ref:StringScanner@Stored+String +[2]: rdoc-ref:StringScanner@Byte+Position+-28Position-29 +[3]: rdoc-ref:StringScanner@Target+Substring +[4]: rdoc-ref:StringScanner@Setting+the+Target+Substring +[5]: rdoc-ref:StringScanner@Traversing+the+Target+Substring +[6]: https://docs.ruby-lang.org/en/master/Regexp.html +[7]: rdoc-ref:StringScanner@Character+Position +[8]: https://docs.ruby-lang.org/en/master/String.html#method-i-5B-5D +[9]: rdoc-ref:StringScanner@Match+Values +[10]: rdoc-ref:StringScanner@Fixed-Anchor+Property +[11]: rdoc-ref:StringScanner@Positions +[13]: rdoc-ref:StringScanner@Captured+Match+Values +[14]: rdoc-ref:StringScanner@Querying+the+Target+Substring +[15]: rdoc-ref:StringScanner@Searching+the+Target+Substring +[16]: https://docs.ruby-lang.org/en/master/Regexp.html#class-Regexp-label-Groups+and+Captures +[17]: rdoc-ref:StringScanner@Matching +[18]: rdoc-ref:StringScanner@Basic+Match+Values diff --git a/doc/strscan/methods/get_byte.md b/doc/strscan/methods/get_byte.md new file mode 100644 index 0000000000..2f23be1899 --- /dev/null +++ b/doc/strscan/methods/get_byte.md @@ -0,0 +1,30 
@@ +call-seq: + get_byte -> byte_as_character or nil + +Returns the next byte, if available: + +- If the [position][2] + is not at the end of the [stored string][1]: + + - Returns the next byte. + - Increments the [byte position][2]. + - Adjusts the [character position][7]. + + ``` + scanner = StringScanner.new(HIRAGANA_TEXT) + # => #<StringScanner 0/15 @ "\xE3\x81\x93\xE3\x82..."> + scanner.string # => "こんにちは" + [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 1, 1] + [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x81", 2, 2] + [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 3, 1] + [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 4, 2] + [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x82", 5, 3] + [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 6, 2] + ``` + +- Otherwise, returns `nil`, and does not change the positions. + + ``` + scanner.terminate + [scanner.get_byte, scanner.pos, scanner.charpos] # => [nil, 15, 5] + ``` diff --git a/doc/strscan/methods/get_charpos.md b/doc/strscan/methods/get_charpos.md new file mode 100644 index 0000000000..f77563c860 --- /dev/null +++ b/doc/strscan/methods/get_charpos.md @@ -0,0 +1,19 @@ +call-seq: + charpos -> character_position + +Returns the [character position][7] (initially zero), +which may be different from the [byte position][2] +given by method #pos: + +``` +scanner = StringScanner.new(HIRAGANA_TEXT) +scanner.string # => "こんにちは" +scanner.getch # => "こ" # 3-byte character. +scanner.getch # => "ん" # 3-byte character. 
+put_situation(scanner) +# Situation: +# pos: 6 +# charpos: 2 +# rest: "にちは" +# rest_size: 9 +``` diff --git a/doc/strscan/methods/get_pos.md b/doc/strscan/methods/get_pos.md new file mode 100644 index 0000000000..56bcef3274 --- /dev/null +++ b/doc/strscan/methods/get_pos.md @@ -0,0 +1,14 @@ +call-seq: + pos -> byte_position + +Returns the integer [byte position][2], +which may be different from the [character position][7]: + +``` +scanner = StringScanner.new(HIRAGANA_TEXT) +scanner.string # => "こんにちは" +scanner.pos # => 0 +scanner.getch # => "こ" # 3-byte character. +scanner.charpos # => 1 +scanner.pos # => 3 +``` diff --git a/doc/strscan/methods/getch.md b/doc/strscan/methods/getch.md new file mode 100644 index 0000000000..b57732ad7c --- /dev/null +++ b/doc/strscan/methods/getch.md @@ -0,0 +1,43 @@ +call-seq: + getch -> character or nil + +Returns the next (possibly multibyte) character, +if available: + +- If the [position][2] + is at the beginning of a character: + + - Returns the character. + - Increments the [character position][7] by 1. + - Increments the [byte position][2] + by the size (in bytes) of the character. 
+ + ``` + scanner = StringScanner.new(HIRAGANA_TEXT) + scanner.string # => "こんにちは" + [scanner.getch, scanner.pos, scanner.charpos] # => ["こ", 3, 1] + [scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2] + [scanner.getch, scanner.pos, scanner.charpos] # => ["に", 9, 3] + [scanner.getch, scanner.pos, scanner.charpos] # => ["ち", 12, 4] + [scanner.getch, scanner.pos, scanner.charpos] # => ["は", 15, 5] + [scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5] + ``` + +- If the [position][2] is within a multi-byte character + (that is, not at its beginning), + behaves like #get_byte (returns a 1-byte character): + + ``` + scanner.pos = 1 + [scanner.getch, scanner.pos, scanner.charpos] # => ["\x81", 2, 2] + [scanner.getch, scanner.pos, scanner.charpos] # => ["\x93", 3, 1] + [scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2] + ``` + +- If the [position][2] is at the end of the [stored string][1], + returns `nil` and does not modify the positions: + + ``` + scanner.terminate + [scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5] + ``` diff --git a/doc/strscan/methods/scan.md b/doc/strscan/methods/scan.md new file mode 100644 index 0000000000..714fa9910a --- /dev/null +++ b/doc/strscan/methods/scan.md @@ -0,0 +1,51 @@ +call-seq: + scan(pattern) -> substring or nil + +Attempts to [match][17] the given `pattern` +at the beginning of the [target substring][3]. + +If the match succeeds: + +- Returns the matched substring. +- Increments the [byte position][2] by <tt>substring.bytesize</tt>, + and may increment the [character position][7]. +- Sets [match values][9]. 
+ +``` +scanner = StringScanner.new(HIRAGANA_TEXT) +scanner.string # => "こんにちは" +scanner.pos = 6 +scanner.scan(/に/) # => "に" +put_match_values(scanner) +# Basic match values: +# matched?: true +# matched_size: 3 +# pre_match: "こん" +# matched : "に" +# post_match: "ちは" +# Captured match values: +# size: 1 +# captures: [] +# named_captures: {} +# values_at: ["に", nil] +# []: +# [0]: "に" +# [1]: nil +put_situation(scanner) +# Situation: +# pos: 9 +# charpos: 3 +# rest: "ちは" +# rest_size: 6 +``` + +If the match fails: + +- Returns `nil`. +- Does not increment byte and character positions. +- Clears match values. + +``` +scanner.scan(/nope/) # => nil +match_values_cleared?(scanner) # => true +``` diff --git a/doc/strscan/methods/scan_until.md b/doc/strscan/methods/scan_until.md new file mode 100644 index 0000000000..3b7ff2c3a9 --- /dev/null +++ b/doc/strscan/methods/scan_until.md @@ -0,0 +1,52 @@ +call-seq: + scan_until(pattern) -> substring or nil + +Attempts to [match][17] the given `pattern` +anywhere (at any [position][2]) in the [target substring][3]. + +If the match attempt succeeds: + +- Sets [match values][9]. +- Sets the [byte position][2] to the end of the matched substring; + may adjust the [character position][7]. +- Returns the matched substring. + + +``` +scanner = StringScanner.new(HIRAGANA_TEXT) +scanner.string # => "こんにちは" +scanner.pos = 6 +scanner.scan_until(/ち/) # => "にち" +put_match_values(scanner) +# Basic match values: +# matched?: true +# matched_size: 3 +# pre_match: "こんに" +# matched : "ち" +# post_match: "は" +# Captured match values: +# size: 1 +# captures: [] +# named_captures: {} +# values_at: ["ち", nil] +# []: +# [0]: "ち" +# [1]: nil +put_situation(scanner) +# Situation: +# pos: 12 +# charpos: 4 +# rest: "は" +# rest_size: 3 +``` + +If the match attempt fails: + +- Clears match data. +- Returns `nil`. +- Does not update positions. 
+ +``` +scanner.scan_until(/nope/) # => nil +match_values_cleared?(scanner) # => true +``` diff --git a/doc/strscan/methods/set_pos.md b/doc/strscan/methods/set_pos.md new file mode 100644 index 0000000000..230177109c --- /dev/null +++ b/doc/strscan/methods/set_pos.md @@ -0,0 +1,27 @@ +call-seq: + pos = n -> n + pointer = n -> n + +Sets the [byte position][2] and the [character position][11]; +returns `n`. + +Does not affect [match values][9]. + +For non-negative `n`, sets the position to `n`: + +``` +scanner = StringScanner.new(HIRAGANA_TEXT) +scanner.string # => "こんにちは" +scanner.pos = 3 # => 3 +scanner.rest # => "んにちは" +scanner.charpos # => 1 +``` + +For negative `n`, counts from the end of the [stored string][1]: + +``` +scanner.pos = -9 # => -9 +scanner.pos # => 6 +scanner.rest # => "にちは" +scanner.charpos # => 2 +``` diff --git a/doc/strscan/methods/skip.md b/doc/strscan/methods/skip.md new file mode 100644 index 0000000000..656f134c5a --- /dev/null +++ b/doc/strscan/methods/skip.md @@ -0,0 +1,43 @@ +call-seq: + skip(pattern) -> match_size or nil + +Attempts to [match][17] the given `pattern` +at the beginning of the [target substring][3]. + +If the match succeeds: + +- Increments the [byte position][2] by <tt>substring.bytesize</tt>, + and may increment the [character position][7]. +- Sets [match values][9]. +- Returns the size (bytes) of the matched substring. 
+ +``` +scanner = StringScanner.new(HIRAGANA_TEXT) +scanner.string # => "こんにちは" +scanner.pos = 6 +scanner.skip(/に/) # => 3 +put_match_values(scanner) +# Basic match values: +# matched?: true +# matched_size: 3 +# pre_match: "こん" +# matched : "に" +# post_match: "ちは" +# Captured match values: +# size: 1 +# captures: [] +# named_captures: {} +# values_at: ["に", nil] +# []: +# [0]: "に" +# [1]: nil +put_situation(scanner) +# Situation: +# pos: 9 +# charpos: 3 +# rest: "ちは" +# rest_size: 6 + +scanner.skip(/nope/) # => nil +match_values_cleared?(scanner) # => true +``` diff --git a/doc/strscan/methods/skip_until.md b/doc/strscan/methods/skip_until.md new file mode 100644 index 0000000000..5187a4826f --- /dev/null +++ b/doc/strscan/methods/skip_until.md @@ -0,0 +1,49 @@ +call-seq: + skip_until(pattern) -> matched_substring_size or nil + +Attempts to [match][17] the given `pattern` +anywhere (at any [position][2]) in the [target substring][3]; +advances the [positions][11] only if a match is found. + +If the match attempt succeeds: + +- Sets [match values][9]. +- Returns the position delta (in bytes) from the starting position to the end of the matched substring. + +``` +scanner = StringScanner.new(HIRAGANA_TEXT) +scanner.string # => "こんにちは" +scanner.pos = 6 +scanner.skip_until(/ち/) # => 6 +put_match_values(scanner) +# Basic match values: +# matched?: true +# matched_size: 3 +# pre_match: "こんに" +# matched : "ち" +# post_match: "は" +# Captured match values: +# size: 1 +# captures: [] +# named_captures: {} +# values_at: ["ち", nil] +# []: +# [0]: "ち" +# [1]: nil +put_situation(scanner) +# Situation: +# pos: 12 +# charpos: 4 +# rest: "は" +# rest_size: 3 +``` + +If the match attempt fails: + +- Clears match values. +- Returns `nil`. 
+ +``` +scanner.skip_until(/nope/) # => nil +match_values_cleared?(scanner) # => true +``` diff --git a/doc/strscan/methods/terminate.md b/doc/strscan/methods/terminate.md new file mode 100644 index 0000000000..fd55727099 --- /dev/null +++ b/doc/strscan/methods/terminate.md @@ -0,0 +1,30 @@ +call-seq: + terminate -> self + +Sets the scanner to end-of-string; +returns +self+: + +- Sets both [positions][11] to end-of-stream. +- Clears [match values][9]. + +``` +scanner = StringScanner.new(HIRAGANA_TEXT) +scanner.string # => "こんにちは" +scanner.scan_until(/に/) +put_situation(scanner) +# Situation: +# pos: 9 +# charpos: 3 +# rest: "ちは" +# rest_size: 6 +match_values_cleared?(scanner) # => false + +scanner.terminate # => #<StringScanner fin> +put_situation(scanner) +# Situation: +# pos: 15 +# charpos: 5 +# rest: "" +# rest_size: 0 +match_values_cleared?(scanner) # => true +``` diff --git a/doc/strscan/strscan.md b/doc/strscan/strscan.md new file mode 100644 index 0000000000..465cebd4cb --- /dev/null +++ b/doc/strscan/strscan.md @@ -0,0 +1,543 @@ +\Class `StringScanner` supports processing a stored string as a stream; +this code creates a new `StringScanner` object with string `'foobarbaz'`: + +``` +require 'strscan' +scanner = StringScanner.new('foobarbaz') +``` + +## About the Examples + +All examples here assume that `StringScanner` has been required: + +``` +require 'strscan' +``` + +Some examples here assume that these constants are defined: + +``` +MULTILINE_TEXT = <<~EOT +Go placidly amid the noise and haste, +and remember what peace there may be in silence. +EOT + +HIRAGANA_TEXT = 'こんにちは' + +ENGLISH_TEXT = 'Hello' +``` + +Some examples here assume that certain helper methods are defined: + +- `put_situation(scanner)`: + Displays the values of the scanner's + methods #pos, #charpos, #rest, and #rest_size. +- `put_match_values(scanner)`: + Displays the scanner's [match values][9]. 
+- `match_values_cleared?(scanner)`: + Returns whether the scanner's [match values][9] are cleared. + +See examples [here][ext/strscan/helper_methods_md.html]. + +## The `StringScanner` \Object + +This code creates a `StringScanner` object +(we'll call it simply a _scanner_), +and shows some of its basic properties: + +``` +scanner = StringScanner.new('foobarbaz') +scanner.string # => "foobarbaz" +put_situation(scanner) +# Situation: +# pos: 0 +# charpos: 0 +# rest: "foobarbaz" +# rest_size: 9 +``` + +The scanner has: + +* A <i>stored string</i>, which is: + + * Initially set by StringScanner.new(string) to the given `string` + (`'foobarbaz'` in the example above). + * Modifiable by methods #string=(new_string) and #concat(more_string). + * Returned by method #string. + + More at [Stored String][1] below. + +* A _position_; + a zero-based index into the bytes of the stored string (_not_ into its characters): + + * Initially set by StringScanner.new to `0`. + * Returned by method #pos. + * Modifiable explicitly by methods #reset, #terminate, and #pos=(new_pos). + * Modifiable implicitly (various traversing methods, among others). + + More at [Byte Position][2] below. + +* A <i>target substring</i>, + which is a trailing substring of the stored string; + it extends from the current position to the end of the stored string: + + * Initially set by StringScanner.new(string) to the given `string` + (`'foobarbaz'` in the example above). + * Returned by method #rest. + * Modified by any modification to either the stored string or the position. + + <b>Most importantly</b>: + the searching and traversing methods operate on the target substring, + which may be (and often is) less than the entire stored string. + + More at [Target Substring][3] below. + +## Stored \String + +The <i>stored string</i> is the string stored in the `StringScanner` object. 
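As a minimal sketch of how the stored string can change after a scanner is created, the following uses only methods named in this document (`#concat`, `#string=`, `#string`, `#pos`, `#rest`). The variable names are illustrative, and the `.dup` is an assumption added so that `#concat` does not mutate a string literal:

```ruby
require 'strscan'

# .dup gives the scanner a mutable copy to append to (illustrative choice).
scanner = StringScanner.new('foo'.dup)
scanner.scan(/fo/)                     # Advance so the position is non-zero.
pos_before_concat = scanner.pos

# #concat appends to the stored string and keeps the position.
scanner.concat('bar')
string_after_concat = scanner.string
pos_after_concat = scanner.pos
rest_after_concat = scanner.rest

# #string= replaces the stored string and resets the position to zero.
scanner.string = 'baz'
pos_after_replace = scanner.pos
rest_after_replace = scanner.rest

p [string_after_concat, pos_after_concat, rest_after_concat] # => ["foobar", 2, "obar"]
p [pos_after_replace, rest_after_replace]                    # => [0, "baz"]
```

Appending leaves the traversal state alone, while replacing the stored string starts the scan over.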
+ +Each of these methods sets, modifies, or returns the stored string: + +| Method | Effect | +|----------------------|-------------------------------------------------| +| ::new(string) | Creates a new scanner for the given string. | +| #string=(new_string) | Replaces the existing stored string. | +| #concat(more_string) | Appends a string to the existing stored string. | +| #string | Returns the stored string. | + +## Positions + +A `StringScanner` object maintains a zero-based <i>byte position</i> +and a zero-based <i>character position</i>. + +Each of these methods explicitly sets positions: + +| Method | Effect | +|--------------------------|----------------------------------------------------------| +| #reset | Sets both positions to zero (beginning of stored string). | +| #terminate | Sets both positions to the end of the stored string. | +| #pos=(new_byte_position) | Sets byte position; adjusts character position. | + +### Byte Position (Position) + +The byte position (or simply _position_) +is a zero-based index into the bytes in the scanner's stored string; +for a new `StringScanner` object, the byte position is zero. + +When the byte position is: + +* Zero (at the beginning), the target substring is the entire stored string. +* Equal to the size of the stored string (at the end), + the target substring is the empty string `''`. + +To get or set the byte position: + +* \#pos: returns the byte position. +* \#pos=(new_pos): sets the byte position. + +Many methods use the byte position as the basis for finding matches; +many others set, increment, or decrement the byte position: + +``` +scanner = StringScanner.new('foobar') +scanner.pos # => 0 +scanner.scan(/foo/) # => "foo" # Match found. +scanner.pos # => 3 # Byte position incremented. +scanner.scan(/foo/) # => nil # Match not found. +scanner.pos # => 3 # Byte position not changed. +``` + +Some methods implicitly modify the byte position; +see: + +* [Setting the Target Substring][4]. 
+* [Traversing the Target Substring][5]. + +The values of these methods are derived directly from the values of #pos and #string: + +- \#charpos: the [character position][7]. +- \#rest: the [target substring][3]. +- \#rest_size: `rest.bytesize`. + +### Character Position + +The character position is a zero-based index into the _characters_ +in the stored string; +for a new `StringScanner` object, the character position is zero. + +\Method #charpos returns the character position; +its value may not be reset explicitly. + +Some methods change (increment or reset) the character position; +see: + +* [Setting the Target Substring][4]. +* [Traversing the Target Substring][5]. + +Example (string includes multi-byte characters): + +``` +scanner = StringScanner.new(ENGLISH_TEXT) # Five 1-byte characters. +scanner.concat(HIRAGANA_TEXT) # Five 3-byte characters. +scanner.string # => "Helloこんにちは" # Twenty bytes in all. +put_situation(scanner) +# Situation: +# pos: 0 +# charpos: 0 +# rest: "Helloこんにちは" +# rest_size: 20 +scanner.scan(/Hello/) # => "Hello" # Five 1-byte characters. +put_situation(scanner) +# Situation: +# pos: 5 +# charpos: 5 +# rest: "こんにちは" +# rest_size: 15 +scanner.getch # => "こ" # One 3-byte character. +put_situation(scanner) +# Situation: +# pos: 8 +# charpos: 6 +# rest: "んにちは" +# rest_size: 12 +``` + +## Target Substring + +The target substring is the part of the [stored string][1] +that extends from the current [byte position][2] to the end of the stored string; +it is always either: + +- The entire stored string (byte position is zero). +- A trailing substring of the stored string (byte position positive). + +The target substring is returned by method #rest, +and its size is returned by method #rest_size. 
+ +Examples: + +``` +scanner = StringScanner.new('foobarbaz') +put_situation(scanner) +# Situation: +# pos: 0 +# charpos: 0 +# rest: "foobarbaz" +# rest_size: 9 +scanner.pos = 3 +put_situation(scanner) +# Situation: +# pos: 3 +# charpos: 3 +# rest: "barbaz" +# rest_size: 6 +scanner.pos = 9 +put_situation(scanner) +# Situation: +# pos: 9 +# charpos: 9 +# rest: "" +# rest_size: 0 +``` + +### Setting the Target Substring + +The target substring is set whenever: + +* The [stored string][1] is set (position reset to zero; target substring set to stored string). +* The [byte position][2] is set (target substring adjusted accordingly). + +### Querying the Target Substring + +This table summarizes (details and examples at the links): + +| Method | Returns | +|------------|-----------------------------------| +| #rest | Target substring. | +| #rest_size | Size (bytes) of target substring. | + +### Searching the Target Substring + +A _search_ method examines the target substring, +but does not advance the [positions][11] +or (by implication) shorten the target substring. + +This table summarizes (details and examples at the links): + +| Method | Returns | Sets Match Values? | +|-----------------------|-----------------------------------------------|--------------------| +| #check(pattern) | Matched leading substring or +nil+. | Yes. | +| #check_until(pattern) | Matched substring (anywhere) or +nil+. | Yes. | +| #exist?(pattern) | Matched substring (anywhere) end index. | Yes. | +| #match?(pattern) | Size of matched leading substring or +nil+. | Yes. | +| #peek(size) | Leading substring of given length (bytes). | No. | +| #peek_byte | Integer leading byte or +nil+. | No. | +| #rest | Target substring (from byte position to end). | No. | + +### Traversing the Target Substring + +A _traversal_ method examines the target substring, +and, if successful: + +- Advances the [positions][11]. +- Shortens the target substring. 
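The difference between a search method and a traversal method can be sketched with `#check` and `#scan`, which take the same pattern and differ only in whether the position moves. This is a minimal illustration; the variable names are not from the original:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')

# Search method: reports the match but leaves the position alone.
checked = scanner.check(/foo/)
pos_after_check = scanner.pos
rest_after_check = scanner.rest

# Traversal method: same match, but the position advances past it,
# so the target substring becomes shorter.
scanned = scanner.scan(/foo/)
pos_after_scan = scanner.pos
rest_after_scan = scanner.rest

# A failed traversal leaves the position where it was.
failed = scanner.scan(/nope/)
pos_after_failure = scanner.pos

p [checked, pos_after_check, rest_after_check] # => ["foo", 0, "foobarbaz"]
p [scanned, pos_after_scan, rest_after_scan]   # => ["foo", 3, "barbaz"]
p [failed, pos_after_failure]                  # => [nil, 3]
```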
+ + +This table summarizes (details and examples at links): + +| Method | Returns | Sets Match Values? | +|----------------------|------------------------------------------------------|--------------------| +| #get_byte | Leading byte or +nil+. | No. | +| #getch | Leading character or +nil+. | No. | +| #scan(pattern) | Matched leading substring or +nil+. | Yes. | +| #scan_byte | Integer leading byte or +nil+. | No. | +| #scan_until(pattern) | Matched substring (anywhere) or +nil+. | Yes. | +| #skip(pattern) | Matched leading substring size or +nil+. | Yes. | +| #skip_until(pattern) | Position delta to end-of-matched-substring or +nil+. | Yes. | +| #unscan | +self+. | No. | + +## Querying the Scanner + +Each of these methods queries the scanner object +without modifying it (details and examples at links) + +| Method | Returns | +|---------------------|----------------------------------| +| #beginning_of_line? | +true+ or +false+. | +| #charpos | Character position. | +| #eos? | +true+ or +false+. | +| #fixed_anchor? | +true+ or +false+. | +| #inspect | String representation of +self+. | +| #pos | Byte position. | +| #rest | Target substring. | +| #rest_size | Size of target substring. | +| #string | Stored string. | + +## Matching + +`StringScanner` implements pattern matching via Ruby class [Regexp][6], +and its matching behaviors are the same as Ruby's +except for the [fixed-anchor property][10]. + +### Matcher Methods + +Each <i>matcher method</i> takes a single argument `pattern`, +and attempts to find a matching substring in the [target substring][3]. + +| Method | Pattern Type | Matches Target Substring | Success Return | May Update Positions? | +|--------------|-------------------|--------------------------|--------------------|-----------------------| +| #check | Regexp or String. | At beginning. | Matched substring. | No. | +| #check_until | Regexp. | Anywhere. | Substring. | No. | +| #match? | Regexp or String. | At beginning. | Updated position. | No. 
| +| #exist? | Regexp. | Anywhere. | Updated position. | No. | +| #scan | Regexp or String. | At beginning. | Matched substring. | Yes. | +| #scan_until | Regexp. | Anywhere. | Substring. | Yes. | +| #skip | Regexp or String. | At beginning. | Match size. | Yes. | +| #skip_until | Regexp. | Anywhere. | Position delta. | Yes. | + +<br> + +Which matcher you choose will depend on: + +- Where you want to find a match: + + - Only at the beginning of the target substring: + #check, #match?, #scan, #skip. + - Anywhere in the target substring: + #check_until, #exist?, #scan_until, #skip_until. + +- Whether you want to: + + - Traverse, by advancing the positions: + #scan, #scan_until, #skip, #skip_until. + - Keep the positions unchanged: + #check, #check_until, #exist?, #match?. + +- What you want for the return value: + + - The matched substring: #check, #check_until, #scan, #scan_until. + - The updated position: #exist?, #match?. + - The position delta: #skip_until. + - The match size: #skip. + +### Match Values + +The <i>match values</i> in a `StringScanner` object +generally contain the results of the most recent attempted match. + +Each match value may be thought of as: + +* _Clear_: Initially, or after an unsuccessful match attempt: + usually, `false`, `nil`, or `{}`. +* _Set_: After a successful match attempt: + `true`, string, array, or hash. + +Each of these methods clears match values: + +- ::new(string). +- \#reset. +- \#terminate. + +Each of these methods attempts a match based on a pattern, +and either sets match values (if successful) or clears them (if not); + +- \#check(pattern) +- \#check_until(pattern) +- \#exist?(pattern) +- \#match?(pattern) +- \#scan(pattern) +- \#scan_until(pattern) +- \#skip(pattern) +- \#skip_until(pattern) + +#### Basic Match Values + +Basic match values are those not related to captures. 
+ +Each of these methods returns a basic match value: + +| Method | Return After Match | Return After No Match | +|-----------------|----------------------------------------|-----------------------| +| #matched? | +true+. | +false+. | +| #matched_size | Size of matched substring. | +nil+. | +| #matched | Matched substring. | +nil+. | +| #pre_match | Substring preceding matched substring. | +nil+. | +| #post_match | Substring following matched substring. | +nil+. | + +<br> + +See examples below. + +#### Captured Match Values + +Captured match values are those related to [captures][16]. + +Each of these methods returns a captured match value: + +| Method | Return After Match | Return After No Match | +|-----------------|-----------------------------------------|-----------------------| +| #size | Count of captured substrings. | +nil+. | +| #[](n) | <tt>n</tt>th captured substring. | +nil+. | +| #captures | Array of all captured substrings. | +nil+. | +| #values_at(*n) | Array of specified captured substrings. | +nil+. | +| #named_captures | Hash of named captures. | <tt>{}</tt>. | + +<br> + +See examples below. 
+ +#### Match Values Examples + +Successful basic match attempt (no captures): + +``` +scanner = StringScanner.new('foobarbaz') +scanner.exist?(/bar/) +put_match_values(scanner) +# Basic match values: +# matched?: true +# matched_size: 3 +# pre_match: "foo" +# matched : "bar" +# post_match: "baz" +# Captured match values: +# size: 1 +# captures: [] +# named_captures: {} +# values_at: ["bar", nil] +# []: +# [0]: "bar" +# [1]: nil +``` + +Failed basic match attempt (no captures); + +``` +scanner = StringScanner.new('foobarbaz') +scanner.exist?(/nope/) +match_values_cleared?(scanner) # => true +``` + +Successful unnamed capture match attempt: + +``` +scanner = StringScanner.new('foobarbazbatbam') +scanner.exist?(/(foo)bar(baz)bat(bam)/) +put_match_values(scanner) +# Basic match values: +# matched?: true +# matched_size: 15 +# pre_match: "" +# matched : "foobarbazbatbam" +# post_match: "" +# Captured match values: +# size: 4 +# captures: ["foo", "baz", "bam"] +# named_captures: {} +# values_at: ["foobarbazbatbam", "foo", "baz", "bam", nil] +# []: +# [0]: "foobarbazbatbam" +# [1]: "foo" +# [2]: "baz" +# [3]: "bam" +# [4]: nil +``` + +Successful named capture match attempt; +same as unnamed above, except for #named_captures: + +``` +scanner = StringScanner.new('foobarbazbatbam') +scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/) +scanner.named_captures # => {"x"=>"foo", "y"=>"baz", "z"=>"bam"} +``` + +Failed unnamed capture match attempt: + +``` +scanner = StringScanner.new('somestring') +scanner.exist?(/(foo)bar(baz)bat(bam)/) +match_values_cleared?(scanner) # => true +``` + +Failed named capture match attempt; +same as unnamed above, except for #named_captures: + +``` +scanner = StringScanner.new('somestring') +scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/) +match_values_cleared?(scanner) # => false +scanner.named_captures # => {"x"=>nil, "y"=>nil, "z"=>nil} +``` + +## Fixed-Anchor Property + +Pattern matching in `StringScanner` is the same as in Ruby's, +except 
for its fixed-anchor property, +which determines the meaning of `'\A'`: + +* `false` (the default): matches the current byte position. + + ``` + scanner = StringScanner.new('foobar') + scanner.scan(/\A./) # => "f" + scanner.scan(/\A./) # => "o" + scanner.scan(/\A./) # => "o" + scanner.scan(/\A./) # => "b" + ``` + +* `true`: matches the beginning of the target substring; + never matches unless the byte position is zero: + + ``` + scanner = StringScanner.new('foobar', fixed_anchor: true) + scanner.scan(/\A./) # => "f" + scanner.scan(/\A./) # => nil + scanner.reset + scanner.scan(/\A./) # => "f" + ``` + +The fixed-anchor property is set when the `StringScanner` object is created, +and may not be modified +(see StringScanner.new); +method #fixed_anchor? returns the setting. + diff --git a/doc/syntax/literals.rdoc b/doc/syntax/literals.rdoc index 0c1e4a434b..6d681419a2 100644 --- a/doc/syntax/literals.rdoc +++ b/doc/syntax/literals.rdoc @@ -138,19 +138,18 @@ Also \Rational numbers may be imaginary numbers. == Strings -=== \String Literals - -The most common way of writing strings is using <tt>"</tt>: - - "This is a string." - -The string may be many lines long. - -Any internal <tt>"</tt> must be escaped: - - "This string has a quote: \". As you can see, it is escaped" - -Double-quote strings allow escaped characters such as <tt>\n</tt> for +=== Escape Sequences + +Some characters can be represented as escape sequences in +double-quoted strings, +character literals, +here document literals (non-quoted, double-quoted, and with backticks), +double-quoted symbols, +double-quoted symbol keys in Hash literals, +Regexp literals, and +several percent literals (<tt>%</tt>, <tt>%Q,</tt> <tt>%W</tt>, <tt>%I</tt>, <tt>%r</tt>, <tt>%x</tt>). + +They allow escape sequences such as <tt>\n</tt> for newline, <tt>\t</tt> for tab, etc. The full list of supported escape sequences are as follows: @@ -174,11 +173,31 @@ sequences are as follows: \M-\cx same as above \c\M-x same as above \c? 
or \C-? delete, ASCII 7Fh (DEL) + \<newline> continuation line (empty string) + +The last one, <tt>\<newline></tt>, represents an empty string instead of a character. +It is used to fold a line in a string. + +=== Double-quoted \String Literals -Any other character following a backslash is interpreted as the +The most common way of writing strings is using <tt>"</tt>: + + "This is a string." + +The string may be many lines long. + +Any internal <tt>"</tt> must be escaped: + + "This string has a quote: \". As you can see, it is escaped" + +Double-quoted strings allow escape sequences described in +{Escape Sequences}[#label-Escape+Sequences]. + +In a double-quoted string, +any other character following a backslash is interpreted as the character itself. -Double-quote strings allow interpolation of other values using +Double-quoted strings allow interpolation of other values using <tt>#{...}</tt>: "One plus one is two: #{1 + 1}" @@ -190,8 +209,14 @@ You can also use <tt>#@foo</tt>, <tt>#@@foo</tt> and <tt>#$foo</tt> as a shorthand for, respectively, <tt>#{ @foo }</tt>, <tt>#{ @@foo }</tt> and <tt>#{ $foo }</tt>. +See also: + +* {% and %Q: Interpolable String Literals}[#label-25+and+-25Q-3A+Interpolable+String+Literals] + +=== Single-quoted \String Literals + Interpolation may be disabled by escaping the "#" character or using -single-quote strings: +single-quoted strings: '#{1 + 1}' #=> "\#{1 + 1}" @@ -199,6 +224,16 @@ In addition to disabling interpolation, single-quoted strings also disable all escape sequences except for the single-quote (<tt>\'</tt>) and backslash (<tt>\\\\</tt>). +In a single-quoted string, +any other character following a backslash is interpreted as is: +a backslash and the character itself. 
+ +See also: + +* {%q: Non-Interpolable String Literals}[#label-25q-3A+Non-Interpolable+String+Literals] + +=== Literal String Concatenation + Adjacent string literals are automatically concatenated by the interpreter: "con" "cat" "en" "at" "ion" #=> "concatenation" @@ -211,10 +246,12 @@ be concatenated as long as a percent-string is not last. %q{a} 'b' "c" #=> "abc" "a" 'b' %q{c} #=> NameError: uninitialized constant q +=== Character Literal + There is also a character literal notation to represent single character strings, which syntax is a question mark (<tt>?</tt>) -followed by a single character or escape sequence that corresponds to -a single codepoint in the script encoding: +followed by a single character or escape sequence (except continuation line) +that corresponds to a single codepoint in the script encoding: ?a #=> "a" ?abc #=> SyntaxError @@ -228,11 +265,6 @@ a single codepoint in the script encoding: ?\C-\M-a #=> "\x81", same as above ?あ #=> "あ" -See also: - -* {%q: Non-Interpolable String Literals}[#label-25q-3A+Non-Interpolable+String+Literals] -* {% and %Q: Interpolable String Literals}[#label-25+and+-25Q-3A+Interpolable+String+Literals] - === Here Document Literals If you are writing a large block of text you may use a "here document" or @@ -283,9 +315,10 @@ its end is a multiple of eight. The amount to be removed is counted in terms of the number of spaces. If the boundary appears in the middle of a tab, that tab is not removed. -A heredoc allows interpolation and escaped characters. You may disable -interpolation and escaping by surrounding the opening identifier with single -quotes: +A heredoc allows interpolation and the escape sequences described in +{Escape Sequences}[#label-Escape+Sequences]. 
+You may disable interpolation and escaping by surrounding the opening +identifier with single quotes: expected_result = <<-'EXPECTED' One plus one is #{1 + 1} @@ -326,12 +359,15 @@ details on what symbols are and when ruby creates them internally. You may reference a symbol using a colon: <tt>:my_symbol</tt>. -You may also create symbols by interpolation: +You may also create symbols using interpolation and the escape sequences described in +{Escape Sequences}[#label-Escape+Sequences] with double quotes: :"my_symbol1" :"my_symbol#{1 + 1}" + :"foo\sbar" -Like strings, a single-quote may be used to disable interpolation: +Like strings, a single-quote may be used to disable interpolation and +escape sequences: :'my_symbol#{1 + 1}' #=> :"my_symbol\#{1 + 1}" @@ -451,7 +487,12 @@ may use these paired delimiters: * <tt>(</tt> and <tt>)</tt>. * <tt>{</tt> and <tt>}</tt>. * <tt><</tt> and <tt>></tt>. -* Any other character, as both beginning and ending delimiters. +* Any other non-alphanumeric ASCII character, as both beginning and ending delimiters. + +The delimiters can be escaped with a backslash. +However, the first four pairs (brackets, parentheses, braces, and +angle brackets) are allowed without a backslash as long as they are correctly +paired. These are demonstrated in the next section. @@ -460,13 +501,20 @@ These are demonstrated in the next section. You can write a non-interpolable string with <tt>%q</tt>. The created string is the same as if you created it with single quotes: - %[foo bar baz] # => "foo bar baz" # Using []. - %(foo bar baz) # => "foo bar baz" # Using (). - %{foo bar baz} # => "foo bar baz" # Using {}. - %<foo bar baz> # => "foo bar baz" # Using <>. - %|foo bar baz| # => "foo bar baz" # Using two |. - %:foo bar baz: # => "foo bar baz" # Using two :. + %q[foo bar baz] # => "foo bar baz" # Using []. + %q(foo bar baz) # => "foo bar baz" # Using (). + %q{foo bar baz} # => "foo bar baz" # Using {}. + %q<foo bar baz> # => "foo bar baz" # Using <>. 
+ %q|foo bar baz| # => "foo bar baz" # Using two |. + %q:foo bar baz: # => "foo bar baz" # Using two :. %q(1 + 1 is #{1 + 1}) # => "1 + 1 is \#{1 + 1}" # No interpolation. + %q[foo[bar]baz] # => "foo[bar]baz" # brackets can be nested. + %q(foo(bar)baz) # => "foo(bar)baz" # parentheses can be nested. + %q{foo{bar}baz} # => "foo{bar}baz" # braces can be nested. + %q<foo<bar>baz> # => "foo<bar>baz" # angle brackets can be nested. + +This is similar to a single-quoted string, but only backslashes and +the specified delimiters can be escaped with a backslash. === <tt>% and %Q</tt>: Interpolable String Literals @@ -476,30 +524,63 @@ or with its alias <tt>%</tt>: %[foo bar baz] # => "foo bar baz" %(1 + 1 is #{1 + 1}) # => "1 + 1 is 2" # Interpolation. +This is similar to a double-quoted string. +It allows the escape sequences described in +{Escape Sequences}[#label-Escape+Sequences]. +Other escaped characters (a backslash followed by a character) are +interpreted as the character. + === <tt>%w and %W</tt>: String-Array Literals -You can write an array of strings with <tt>%w</tt> (non-interpolable) -or <tt>%W</tt> (interpolable): +You can write an array of strings as whitespace-separated words +with <tt>%w</tt> (non-interpolable) or <tt>%W</tt> (interpolable): %w[foo bar baz] # => ["foo", "bar", "baz"] %w[1 % *] # => ["1", "%", "*"] # Use backslash to embed spaces in the strings. %w[foo\ bar baz\ bat] # => ["foo bar", "baz bat"] + %W[foo\ bar baz\ bat] # => ["foo bar", "baz bat"] %w(#{1 + 1}) # => ["\#{1", "+", "1}"] %W(#{1 + 1}) # => ["2"] + # Nested delimiters evaluate to a flat array of strings + # (not a nested array). 
+ %w[foo[bar baz]qux] # => ["foo[bar", "baz]qux"] + +The following characters are treated as whitespace that separates words: + +* space, ASCII 20h (SPC) +* form feed, ASCII 0Ch (FF) +* newline (line feed), ASCII 0Ah (LF) +* carriage return, ASCII 0Dh (CR) +* horizontal tab, ASCII 09h (TAB) +* vertical tab, ASCII 0Bh (VT) + +The whitespace characters can be escaped with a backslash to make them +part of a word. + +<tt>%W</tt> allows the escape sequences described in +{Escape Sequences}[#label-Escape+Sequences]. +However, the continuation line <tt>\<newline></tt> is not usable because +it is interpreted as the escaped newline described above. + === <tt>%i and %I</tt>: Symbol-Array Literals -You can write an array of symbols with <tt>%i</tt> (non-interpolable) -or <tt>%I</tt> (interpolable): +You can write an array of symbols as whitespace-separated words +with <tt>%i</tt> (non-interpolable) or <tt>%I</tt> (interpolable): %i[foo bar baz] # => [:foo, :bar, :baz] %i[1 % *] # => [:"1", :%, :*] # Use backslash to embed spaces in the symbols. %i[foo\ bar baz\ bat] # => [:"foo bar", :"baz bat"] + %I[foo\ bar baz\ bat] # => [:"foo bar", :"baz bat"] %i(#{1 + 1}) # => [:"\#{1", :+, :"1}"] %I(#{1 + 1}) # => [:"2"] +Whitespace characters and their escapes are interpreted the same way as in +the string-array literals described in +{%w and %W: String-Array Literals}[#label-25w+and+-25W-3A+String-Array+Literals]. + === <tt>%s</tt>: Symbol Literals You can write a symbol with <tt>%s</tt>: @@ -507,6 +588,10 @@ You can write a symbol with <tt>%s</tt>: %s[foo] # => :foo %s[foo bar] # => :"foo bar" +This is non-interpolable: +interpolation is not allowed. +Only backslashes and the specified delimiters can be escaped with a backslash. + === <tt>%r</tt>: Regexp Literals You can write a regular expression with <tt>%r</tt>; @@ -531,4 +616,10 @@ See {Regexp modes}[rdoc-ref:Regexp@Modes] for details. 
You can write and execute a shell command with <tt>%x</tt>: - %x(echo 1) # => "1\n" + %x(echo 1) # => "1\n" + %x[echo #{1 + 2}] # => "3\n" + %x[echo \u0030] # => "0\n" + +This is interpolable. +<tt>%x</tt> allows the escape sequences described in +{Escape Sequences}[#label-Escape+Sequences]. diff --git a/doc/syntax/pattern_matching.rdoc b/doc/syntax/pattern_matching.rdoc index e49c09a1f8..6a30380f46 100644 --- a/doc/syntax/pattern_matching.rdoc +++ b/doc/syntax/pattern_matching.rdoc @@ -422,7 +422,8 @@ These core and library classes implement deconstruction: == Guard clauses -+if+ can be used to attach an additional condition (guard clause) when the pattern matches. This condition may use bound variables: ++if+ can be used to attach an additional condition (guard clause) when the pattern matches in +case+/+in+ expressions. +This condition may use bound variables: case [1, 2] in a, b if b == a*2 @@ -450,6 +451,11 @@ These core and library classes implement deconstruction: end #=> "matched" +Note that the <code>=></code> and +in+ operators cannot have a guard clause. +The following example is parsed as a standalone expression with a modifier +if+. + + [1, 2] in a, b if b == a*2 + == Appendix A. Pattern syntax Approximate syntax is: |