Returns a copy of +self+ with
{Unicode normalization}[https://unicode.org/reports/tr15] applied.

Optional argument +form+ defaults to +:nfc+ and must be one of the following symbols
(see {Unicode normalization forms}[https://unicode.org/reports/tr15/#Norm_Forms]):

- +:nfc+: Canonical decomposition, followed by canonical composition.
- +:nfd+: Canonical decomposition.
- +:nfkc+: Compatibility decomposition, followed by canonical composition.
- +:nfkd+: Compatibility decomposition.
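
The difference between canonical and compatibility forms is easiest to see at the code-point level. The sketch below is illustrative only: the helper +codepoints_by_form+ and the two sample strings are hypothetical choices for this example. It prints the code points each form produces for a precomposed letter (U+00E0) and for the "fi" ligature (U+FB01):

```ruby
FORMS = %i[nfc nfd nfkc nfkd]

# Hypothetical helper: returns a hash mapping each normalization form
# to the code points it produces for +str+, formatted as "U+XXXX".
def codepoints_by_form(str)
  FORMS.to_h do |form|
    [form, str.unicode_normalize(form).codepoints.map { |cp| format("U+%04X", cp) }]
  end
end

a_grave     = "\u00E0"  # precomposed LATIN SMALL LETTER A WITH GRAVE
fi_ligature = "\uFB01"  # LATIN SMALL LIGATURE FI

[a_grave, fi_ligature].each do |s|
  puts s.dump
  codepoints_by_form(s).each { |form, cps| puts "  #{form}: #{cps.join(' ')}" }
end
```

The canonical forms (+:nfc+, +:nfd+) leave the ligature untouched, while the compatibility forms (+:nfkc+, +:nfkd+) replace it with the two letters "f" and "i". For U+00E0, the composed forms keep one code point and the decomposed forms split it into "a" plus U+0300.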

The encoding of +self+ must be one of:

- <tt>Encoding::UTF_8</tt>.
- <tt>Encoding::UTF_16BE</tt>.
- <tt>Encoding::UTF_16LE</tt>.
- <tt>Encoding::UTF_32BE</tt>.
- <tt>Encoding::UTF_32LE</tt>.
- <tt>Encoding::GB18030</tt>.
- <tt>Encoding::UCS_2BE</tt>.
- <tt>Encoding::UCS_4BE</tt>.
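
A short sketch of how the encoding requirement plays out in practice, assuming a UTF-16LE string as an accepted case and an ISO-8859-1 string as a rejected one (both chosen only for illustration). The normalized result keeps the receiver's encoding; an unsupported encoding raises an exception, whose class and message the rescue clause prints rather than this example assuming them:

```ruby
# Accepted: a UTF-16LE string containing "a" + COMBINING GRAVE ACCENT.
utf16 = "a\u0300".encode(Encoding::UTF_16LE)
normalized = utf16.unicode_normalize
puts normalized.encoding                                  # UTF-16LE
puts normalized == "\u00E0".encode(Encoding::UTF_16LE)    # true

# Rejected: ISO-8859-1 is not in the list of supported encodings.
latin1 = "caf\u00E9".encode(Encoding::ISO_8859_1)
error = nil
begin
  latin1.unicode_normalize
rescue StandardError => e
  error = e
  puts "#{e.class}: #{e.message}"
end
```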

Examples:

  "a\u0300".unicode_normalize       # => "à"  # Composed into one code point, "\u00E0".
  "a\u0300".unicode_normalize(:nfd) # => "à"  # Looks the same, but stays decomposed: "a\u0300".
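
Strings that render identically can still differ in their code points, so a plain <tt>==</tt> may report them as unequal. A minimal sketch, using two hand-picked spellings of "à", of normalizing both sides to the same form before comparing:

```ruby
composed   = "\u00E0"   # one code point: LATIN SMALL LETTER A WITH GRAVE
decomposed = "a\u0300"  # two code points: "a" + COMBINING GRAVE ACCENT

puts composed == decomposed               # false: the code points differ
puts composed.length, decomposed.length   # 1, then 2

# Normalizing both sides to the same form makes the comparison
# reflect canonical equivalence instead of raw code points.
puts composed.unicode_normalize == decomposed.unicode_normalize              # true
puts composed.unicode_normalize(:nfd) == decomposed.unicode_normalize(:nfd)  # true
```

Which form to use matters less than using the same form on both sides; +:nfc+ (the default) yields the shorter, composed representation.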

Related: see {Converting to New String}[rdoc-ref:String@Converting+to+New+String].