author Burdette Lamar <BurdetteLamar@Yahoo.com> 2022-02-24 14:10:49 -0600
committer GitHub <noreply@github.com> 2022-02-24 14:10:49 -0600
commit c19a631c994e3745e821a87cc7eca3f02c33bda7
tree f7b77b05c2c426b5d4cbfd832a6d829cebd05ab2
parent fc7e42a4731b274e5e732de83b720424ba65df59
[DOC] Enhancements for encoding.rdoc (#5578)
Adds sections: String Encoding, Symbol and Regexp Encodings, Filesystem Encoding, Locale Encoding, IO Encodings, External Encoding, Internal Encoding, Script Encoding, Transcoding, Transcoding a String.
Notes
Notes: Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
-rw-r--r-- doc/encoding.rdoc 170
1 files changed, 169 insertions, 1 deletions
diff --git a/doc/encoding.rdoc b/doc/encoding.rdoc
index 6f663b14cd..490066b5df 100644
--- a/doc/encoding.rdoc
+++ b/doc/encoding.rdoc
@@ -132,7 +132,175 @@ returns the \Encoding of the concatenated string, or +nil+ if incompatible:
s1 = "\xa1\xa1".force_encoding('euc-jp') # => "\x{A1A1}"
Encoding.compatible?(s0, s1) # => nil
-==== \Encoding Options
+=== \String \Encoding
+
+A Ruby String object has an encoding that is an instance of class \Encoding.
+The encoding may be retrieved by method String#encoding.
+
+The default encoding for a string literal is the script encoding
+(see Encoding@Script+encoding):
+
+ 's'.encoding # => #<Encoding:UTF-8>
+
+The default encoding for a string created with method String.new is:
+
+- ASCII-8BIT, if no string argument is given.
+- Otherwise, the encoding of the given \String argument;
+ for a string literal, that is the script encoding (see Encoding@Script+encoding).
+
+In any case, an encoding may be specified with keyword argument +encoding+:
+
+ s = String.new(encoding: 'UTF-8') # => ""
+ s.encoding # => #<Encoding:UTF-8>
+ s = String.new('foo', encoding: 'ASCII-8BIT') # => "foo"
+ s.encoding # => #<Encoding:ASCII-8BIT>
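A minimal sketch of these defaults in runnable form, assuming the file has no magic comment (so its script encoding is UTF-8). Note that with no argument, String.new returns a binary (ASCII-8BIT) string, not a UTF-8 one:

```ruby
# Assumes no magic comment, so string literals here are UTF-8.
no_arg   = String.new                                 # no string argument
from_lit = String.new('foo')                          # copies the literal's encoding
explicit = String.new('foo', encoding: 'ISO-8859-1')  # encoding given explicitly

puts no_arg.encoding == Encoding::ASCII_8BIT   # => true
puts from_lit.encoding                         # => UTF-8
puts explicit.encoding                         # => ISO-8859-1
```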
+
+The encoding for a string may be changed:
+
+ s = "R\xC3\xA9sum\xC3\xA9" # => "Résumé"
+ s.encoding # => #<Encoding:UTF-8>
+ s.force_encoding('ISO-8859-1') # => "R\xC3\xA9sum\xC3\xA9"
+ s.encoding # => #<Encoding:ISO-8859-1>
+
+Changing the assigned encoding does not alter the content of the string;
+it changes only the way the content is to be interpreted:
+
+ s # => "R\xC3\xA9sum\xC3\xA9"
+ s.force_encoding('UTF-8') # => "Résumé"
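The following sketch (plain Ruby, assuming only a UTF-8 source file) checks that claim by comparing byte arrays; only the character count changes with the label:

```ruby
s = "R\xC3\xA9sum\xC3\xA9".dup   # dup: work on a mutable copy of the literal
before = s.bytes                  # UTF-8 bytes for "Résumé"

s.force_encoding('ISO-8859-1')    # relabel: same bytes, new interpretation
puts s.bytes == before            # => true
puts s.length                     # => 8 (each byte is one Latin-1 character)

s.force_encoding('UTF-8')         # relabel back
puts s.length                     # => 6 (each 2-byte sequence decodes to é)
```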
+
+The actual content of a string may also be altered;
+see {Transcoding a String}[#label-Transcoding+a+String].
+
+Two useful query methods are String#ascii_only? and String#valid_encoding?:
+
+ s = "abc".force_encoding("UTF-8") # => "abc"
+ s.ascii_only? # => true
+ s = "abc\u{6666}".force_encoding("UTF-8") # => "abc晦"
+ s.ascii_only? # => false
+
+ s = "\xc2\xa1".force_encoding("UTF-8") # => "¡"
+ s.valid_encoding? # => true
+ s = "\xc2".force_encoding("UTF-8") # => "\xC2"
+ s.valid_encoding? # => false
+
+=== \Symbol and \Regexp Encodings
+
+The string stored in a Symbol or Regexp object also has an encoding;
+the encoding may be retrieved by method Symbol#encoding or Regexp#encoding.
+
+The default encoding for these, however, is:
+
+- US-ASCII, if all characters are US-ASCII.
+- The script encoding, otherwise (see Encoding@Script+encoding).
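A quick check of those defaults, assuming a UTF-8 source file with no magic comment:

```ruby
# Only US-ASCII characters: the encoding is US-ASCII.
puts :abc.encoding          # => US-ASCII
puts(/abc/.encoding)        # => US-ASCII

# Any non-ASCII character: the script encoding (UTF-8 here).
puts :"résumé".encoding     # => UTF-8
puts(/résumé/.encoding)     # => UTF-8
```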
+
+=== Filesystem \Encoding
+
+The filesystem encoding is the default \Encoding for a string from the filesystem:
+
+ Encoding.find("filesystem") # => #<Encoding:UTF-8>
+
+=== Locale \Encoding
+
+The locale encoding is the default encoding for a string from the environment,
+other than from the filesystem. Its value depends on the system's locale, so results vary:
+
+ Encoding.find('locale') # => #<Encoding:IBM437>
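Both values, like the default external and internal encodings, depend on the platform and environment. This sketch queries them by name via Encoding.find; since results vary by machine, no expected output is shown:

```ruby
# Special names accepted by Encoding.find; 'internal' yields nil
# when no default internal encoding is set.
%w[filesystem locale external internal].each do |name|
  enc = Encoding.find(name)
  puts "#{name}: #{enc.inspect}"
end
```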
+
+=== \IO Encodings
+
+An IO object (an input/output stream), and by inheritance a File object,
+has at least one, and sometimes two, encodings:
+
+- Its _external_ _encoding_ identifies the encoding of the stream.
+- Its _internal_ _encoding_, if not +nil+, specifies the encoding
+ to be used for the string constructed from the stream.
+
+==== External \Encoding
+
+Bytes read from the stream are decoded into characters via the external encoding;
+by default (that is, if the internal encoding is +nil+),
+those characters become a string whose encoding is set to the external encoding.
+
+The default external encoding is:
+
+- Encoding.default_external (usually UTF-8) for a text stream.
+- ASCII-8BIT for a binary stream.
+
+ f = File.open('t.rus', 'rb')
+ f.external_encoding # => #<Encoding:ASCII-8BIT>
+
+The external encoding may be set by the open option +external_encoding+:
+
+ f = File.open('t.txt', external_encoding: 'ASCII-8BIT')
+ f.external_encoding # => #<Encoding:ASCII-8BIT>
+
+The external encoding may also be set by method IO#set_encoding:
+
+ f = File.open('t.txt')
+ f.set_encoding('ASCII-8BIT')
+ f.external_encoding # => #<Encoding:ASCII-8BIT>
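A minimal end-to-end sketch, assuming a writable temporary directory and that no default internal encoding is set (e.g. Ruby was not started with -E or -U). The file name latin1.txt is hypothetical:

```ruby
require 'tmpdir'

# Write "café" as ISO-8859-1 bytes (é is the single byte 0xE9),
# then read it back in text mode and in binary mode.
latin, raw = Dir.mktmpdir do |dir|
  path = File.join(dir, 'latin1.txt')   # hypothetical file name
  File.binwrite(path, "caf\xE9".b)

  # Text mode, external encoding ISO-8859-1, no internal encoding:
  # the string read is labeled with the external encoding.
  text = File.open(path, external_encoding: 'ISO-8859-1', &:read)

  # Binary mode ('rb'): the external encoding is ASCII-8BIT.
  bin = File.open(path, 'rb', &:read)

  [text, bin]   # Dir.mktmpdir returns the block's value
end

puts latin.encoding                         # => ISO-8859-1
puts latin.valid_encoding?                  # => true
puts raw.encoding == Encoding::ASCII_8BIT   # => true
puts latin.bytes == raw.bytes               # => true (same bytes, different labels)
```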
+
+==== Internal \Encoding
+
+If not +nil+, the internal encoding specifies that the characters read
+from the stream are to be converted to characters in the internal encoding;
+those characters become a string whose encoding is set to the internal encoding.
+
+The default internal encoding is +nil+ (no conversion).
+The internal encoding may be set by the open option +internal_encoding+:
+
+ f = File.open('t.txt', internal_encoding: 'ASCII-8BIT')
+ f.internal_encoding # => #<Encoding:ASCII-8BIT>
+
+The internal encoding may also be set by method IO#set_encoding:
+
+ f = File.open('t.txt')
+ f.set_encoding('UTF-8', 'ASCII-8BIT')
+ f.internal_encoding # => #<Encoding:ASCII-8BIT>
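A sketch of the conversion in practice, assuming a writable temporary directory (the file name is hypothetical): bytes stored as ISO-8859-1 come back as a UTF-8 string when UTF-8 is the internal encoding:

```ruby
require 'tmpdir'

utf8 = Dir.mktmpdir do |dir|
  path = File.join(dir, 'latin1.txt')   # hypothetical file name
  File.binwrite(path, "caf\xE9".b)      # "café" in ISO-8859-1 (4 bytes)

  # External ISO-8859-1, internal UTF-8: bytes are decoded as Latin-1,
  # then converted to UTF-8 as they are read.
  File.open(path, external_encoding: 'ISO-8859-1',
                  internal_encoding: 'UTF-8', &:read)
end

puts utf8.encoding    # => UTF-8
puts utf8 == 'café'   # => true
puts utf8.bytesize    # => 5 (é takes two bytes in UTF-8)
```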
+
+=== Script \Encoding
+
+A Ruby script has a script encoding, which may be retrieved by:
+
+ __ENCODING__ # => #<Encoding:UTF-8>
+
+The default script encoding is UTF-8;
+a Ruby source file may set its script encoding with a magic comment
+on the first line of the file (or second line, if there is a shebang on the first).
+The comment must contain the word +coding+ or +encoding+,
+followed by a colon, optional whitespace, and the name or an alias of the encoding:
+
+ # encoding: ISO-8859-1
+ __ENCODING__ # => #<Encoding:ISO-8859-1>
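To see a magic comment take effect without editing the current file, this sketch writes a tiny script to a temporary file and loads it. The file name and the global variable $script_encoding are hypothetical, chosen only to carry the result back:

```ruby
require 'tmpdir'

enc = Dir.mktmpdir do |dir|
  path = File.join(dir, 'latin1_script.rb')   # hypothetical file name
  # The magic comment is on the first line of the loaded file.
  File.write(path, "# encoding: ISO-8859-1\n$script_encoding = __ENCODING__\n")
  load path            # the parser honors the loaded file's magic comment
  $script_encoding     # hypothetical global used to return the value
end

puts __ENCODING__   # => UTF-8 (this file has no magic comment)
puts enc            # => ISO-8859-1
```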
+
+=== Transcoding
+
+_Transcoding_ is the process of converting the content of a string or stream
+from one encoding to another; unlike String#force_encoding, it changes the underlying bytes.
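A small sketch of the difference between transcoding and relabeling, assuming a UTF-8 source file: String#encode produces new bytes in the target encoding and leaves the original string alone:

```ruby
s = 'Résumé'                      # UTF-8: é takes two bytes, 8 bytes total
latin = s.encode('ISO-8859-1')    # new string with new bytes

puts latin.bytesize                                 # => 6
puts latin.bytes.map { |b| b.to_s(16) }.join(' ')   # => 52 e9 73 75 6d e9
puts s.bytesize                                     # => 8 (original untouched)
puts latin.encode('UTF-8') == s                     # => true (round trip)
```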
+
+==== Transcoding a \String
+
+Each of these methods transcodes a string:
+
+String#encode :: Transcodes a string into a new string
+ according to a given destination encoding,
+ a given or default source encoding, and encoding options.
+
+String#encode! :: Like String#encode,
+ but transcodes the string in place.
+
+String#scrub :: Transcodes a string into a new string
+ by replacing invalid byte sequences
+ with a given or default replacement string.
+
+String#scrub! :: Like String#scrub, but transcodes the string in place.
+
+String#unicode_normalize :: Transcodes a string into a new string
+ according to a given or default (NFC) Unicode normalization form.
+
+String#unicode_normalize! :: Like String#unicode_normalize,
+ but transcodes the string in place.
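A combined sketch of these methods, assuming a UTF-8 source file; the replacement string '?' passed to +encode+ and +scrub+ is an arbitrary choice:

```ruby
# String#encode: characters the target cannot represent are replaced.
ascii = 'café ☕'.encode('US-ASCII', undef: :replace, replace: '?')
puts ascii                        # => caf? ?

# String#encode!: same conversion, but modifies the receiver.
s = 'café'.dup                    # dup: work on a mutable copy of the literal
s.encode!('ISO-8859-1')
puts s.encoding                   # => ISO-8859-1

# String#scrub: replace invalid byte sequences (here a stray 0xFF).
bad = "abc\xFFdef"
puts bad.valid_encoding?          # => false
puts bad.scrub('?')               # => abc?def

# String#unicode_normalize: NFC composes "e" + combining acute into "é".
decomposed = "e\u0301"
composed = decomposed.unicode_normalize(:nfc)
puts composed == "\u00E9"                            # => true
puts [decomposed.length, composed.length].inspect    # => [2, 1]
```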
+
+=== \Encoding Options
A number of methods in the Ruby core accept keyword arguments as encoding options.