<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ruby.git/enc/unicode.c, branch v4.0.3</title>
<subtitle>The Ruby Programming Language</subtitle>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/'/>
<entry>
<title>Avoid negative character</title>
<updated>2025-10-31T11:49:59+00:00</updated>
<author>
<name>K.Takata</name>
<email>kentkt@csc.jp</email>
</author>
<published>2019-01-25T09:58:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=54b963956b65f8333886e6afe4fb6d73e250148f'/>
<id>54b963956b65f8333886e6afe4fb6d73e250148f</id>
<content type='text'>
Better fix for k-takata/Onigmo#107.

https://github.com/k-takata/Onigmo/commit/85393e4a63223b538529e7095255ce1153c09cff
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Better fix for k-takata/Onigmo#107.

https://github.com/k-takata/Onigmo/commit/85393e4a63223b538529e7095255ce1153c09cff
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix lgtm.com warnings</title>
<updated>2025-10-31T11:49:59+00:00</updated>
<author>
<name>K.Takata</name>
<email>kentkt@csc.jp</email>
</author>
<published>2019-01-25T09:56:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=daf0d6c686e77f6d4d561ba2350f05be28a90ed4'/>
<id>daf0d6c686e77f6d4d561ba2350f05be28a90ed4</id>
<content type='text'>
* Multiplication result may overflow 'int' before it is converted to
  'OnigDistance'.
* Comparison is always true because code &lt;= 122.
* This statement makes ExprStmt unreachable.
* Empty block without comment
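
The first warning is commonly addressed by widening one operand before
multiplying. A minimal sketch, assuming a hypothetical `Distance`
typedef in place of `OnigDistance` and a made-up function name; it is
not the change made in the commit linked below:

```c
/* Hypothetical stand-in for OnigDistance; assumed wider than int. */
typedef unsigned long long Distance;

/* Writing `Distance d = len * count;` would multiply in int and
 * convert only the (possibly overflowed) result. Casting an operand
 * first makes the multiplication itself happen in the wider type.
 * Both arguments are assumed to be non-negative. */
static Distance scaled_distance(int len, int count)
{
    return (Distance)len * (Distance)count;
}
```

With this form, scaled_distance(100000, 100000) no longer depends on
the product fitting in an int.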

https://github.com/k-takata/Onigmo/commit/387ad616c3cb9370f99d2b11198c2135fa07030f
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Multiplication result may overflow 'int' before it is converted to
  'OnigDistance'.
* Comparison is always true because code &lt;= 122.
* This statement makes ExprStmt unreachable.
* Empty block without comment

https://github.com/k-takata/Onigmo/commit/387ad616c3cb9370f99d2b11198c2135fa07030f
</pre>
</div>
</content>
</entry>
<entry>
<title>Add Encoding::UNICODE_VERSION constant</title>
<updated>2025-04-23T05:14:36+00:00</updated>
<author>
<name>Nobuyoshi Nakada</name>
<email>nobu@ruby-lang.org</email>
</author>
<published>2025-04-23T02:22:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=b4417ff66511ef94a80a3b49ba184603b8e85a1b'/>
<id>b4417ff66511ef94a80a3b49ba184603b8e85a1b</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Suppress warnings by gcc 10.1.0-RC-20200430</title>
<updated>2020-05-04T03:28:24+00:00</updated>
<author>
<name>Nobuyoshi Nakada</name>
<email>nobu@ruby-lang.org</email>
</author>
<published>2020-05-04T03:10:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=b7e1eda932c74196d58e6b63644200b764b5453e'/>
<id>b7e1eda932c74196d58e6b63644200b764b5453e</id>
<content type='text'>
* Folding results should not be empty.

  If `OnigCodePointCount(to-&gt;n)` were 0, the `for` loop using `fn`
  wouldn't execute and the `ncs` elements would not be initialized.

  ```
  enc/unicode.c:557:21: warning: 'ncs[0]' may be used uninitialized in this function [-Wmaybe-uninitialized]
    557 |  for (i = 0; i &lt; ncs[0]; i++) {
        |                  ~~~^~~
  ```

* Cast to `enum yytokentype`

  Additional enums for scanner events by ripper are not included
  in `yytokentype`.

  ```
  ripper.y:7274:28: warning: implicit conversion from 'enum &lt;anonymous&gt;' to 'enum yytokentype' [-Wenum-conversion]
  ```
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Folding results should not be empty.

  If `OnigCodePointCount(to-&gt;n)` were 0, the `for` loop using `fn`
  wouldn't execute and the `ncs` elements would not be initialized.

  ```
  enc/unicode.c:557:21: warning: 'ncs[0]' may be used uninitialized in this function [-Wmaybe-uninitialized]
    557 |  for (i = 0; i &lt; ncs[0]; i++) {
        |                  ~~~^~~
  ```

* Cast to `enum yytokentype`

  Additional enums for scanner events by ripper are not included
  in `yytokentype`.

  ```
  ripper.y:7274:28: warning: implicit conversion from 'enum &lt;anonymous&gt;' to 'enum yytokentype' [-Wenum-conversion]
  ```
</pre>
</div>
</content>
</entry>
<entry>
<title>implement special behavior for Georgian for String#capitalize</title>
<updated>2018-12-09T23:14:29+00:00</updated>
<author>
<name>duerst</name>
<email>duerst@b2dd03c8-39d4-4d8f-98ff-823fe69b080e</email>
</author>
<published>2018-12-09T23:14:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=3628eae2e754a7489feebc6f41371d42d2efcf3c'/>
<id>3628eae2e754a7489feebc6f41371d42d2efcf3c</id>
<content type='text'>
The modern Georgian script is special in that it has an 'uppercase'
variant called MTAVRULI which can be used for emphasis of whole words,
for screamy headlines, and so on. However, in contrast to all other
bicameral scripts, the first letter of a word or sentence is never
capitalized. Words with mixed capitalization are not used at all.

We therefore implement special behavior for String#capitalize. Formally,
we define String#capitalize as first applying String#downcase for the
whole string, then using titlecase on the first letter. Because Georgian
defines titlecase as the identity function both for MTAVRULI ('uppercase')
and Mkhedruli (lowercase), this results in String#capitalize being
equivalent to String#downcase for Georgian. This avoids undesirable
mixed case.

* enc/unicode.c: Actual implementation

* string.c: Add mention of this special case for documentation

* test/ruby/enc/test_case_mapping.rb: Add two tests, a general one
  that uses String#capitalize on some (including nonsensical)
  combinations of MTAVRULI and Mkhedruli, and a canary test to
  detect the potential assignment of characters to the currently
  open slots (holes) at U+1CBB and U+1CBC.

* test/ruby/enc/test_case_comprehensive.rb: Tweak generation of
  expectation data.

Together with r65933, this closes issue #14839.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The modern Georgian script is special in that it has an 'uppercase'
variant called MTAVRULI which can be used for emphasis of whole words,
for screamy headlines, and so on. However, in contrast to all other
bicameral scripts, the first letter of a word or sentence is never
capitalized. Words with mixed capitalization are not used at all.

We therefore implement special behavior for String#capitalize. Formally,
we define String#capitalize as first applying String#downcase for the
whole string, then using titlecase on the first letter. Because Georgian
defines titlecase as the identity function both for MTAVRULI ('uppercase')
and Mkhedruli (lowercase), this results in String#capitalize being
equivalent to String#downcase for Georgian. This avoids undesirable
mixed case.

* enc/unicode.c: Actual implementation

* string.c: Add mention of this special case for documentation

* test/ruby/enc/test_case_mapping.rb: Add two tests, a general one
  that uses String#capitalize on some (including nonsensical)
  combinations of MTAVRULI and Mkhedruli, and a canary test to
  detect the potential assignment of characters to the currently
  open slots (holes) at U+1CBB and U+1CBC.

* test/ruby/enc/test_case_comprehensive.rb: Tweak generation of
  expectation data.

Together with r65933, this closes issue #14839.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</pre>
</div>
</content>
</entry>
<entry>
<title>remove obsolete data from unicode.c</title>
<updated>2018-12-06T00:05:08+00:00</updated>
<author>
<name>duerst</name>
<email>duerst@b2dd03c8-39d4-4d8f-98ff-823fe69b080e</email>
</author>
<published>2018-12-06T00:05:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=e824e21beb4135f76a6e0f1e51ad578b53d53847'/>
<id>e824e21beb4135f76a6e0f1e51ad578b53d53847</id>
<content type='text'>
* unicode.c: Remove the arrays onigenc_unicode_GCB_ranges_GAZ,
  onigenc_unicode_GCB_ranges_E_Base, and onigenc_unicode_GCB_ranges_Emoji,
  because they are not needed anymore for Unicode 11.0.0.

* regparse.c: Remove external declarations for above arrays.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* unicode.c: Remove the arrays onigenc_unicode_GCB_ranges_GAZ,
  onigenc_unicode_GCB_ranges_E_Base, and onigenc_unicode_GCB_ranges_Emoji,
  because they are not needed anymore for Unicode 11.0.0.

* regparse.c: Remove external declarations for above arrays.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</pre>
</div>
</content>
</entry>
<entry>
<title>solve the genie/zombie/wrestlers bug</title>
<updated>2018-12-02T10:07:42+00:00</updated>
<author>
<name>duerst</name>
<email>duerst@b2dd03c8-39d4-4d8f-98ff-823fe69b080e</email>
</author>
<published>2018-12-02T10:07:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=a96a594f9978b28d2d374f4a0fc15f5a2224df9b'/>
<id>a96a594f9978b28d2d374f4a0fc15f5a2224df9b</id>
<content type='text'>
enc/unicode.c: - Add U+1F93C (WRESTLERS), U+1F9DE (GENIE), and U+1F9DF
                 (ZOMBIE) to onigenc_unicode_GCB_ranges_E_Base.
               - Add comments with character names.
test/ruby/enc/test_emoji_breaks.rb: Activate tests for genie/zombie/wrestlers.
This closes issue #15343.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66133 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
enc/unicode.c: - Add U+1F93C (WRESTLERS), U+1F9DE (GENIE), and U+1F9DF
                 (ZOMBIE) to onigenc_unicode_GCB_ranges_E_Base.
               - Add comments with character names.
test/ruby/enc/test_emoji_breaks.rb: Activate tests for genie/zombie/wrestlers.
This closes issue #15343.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66133 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</pre>
</div>
</content>
</entry>
<entry>
<title>Added words in the comment at r65088 [ci skip]</title>
<updated>2018-11-30T07:19:49+00:00</updated>
<author>
<name>nobu</name>
<email>nobu@b2dd03c8-39d4-4d8f-98ff-823fe69b080e</email>
</author>
<published>2018-11-30T07:19:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=26771cadc09941ce75cd213f24a5cc9fa2922591'/>
<id>26771cadc09941ce75cd213f24a5cc9fa2922591</id>
<content type='text'>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66103 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66103 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</pre>
</div>
</content>
</entry>
<entry>
<title>deal with ONIGENC_CASE_IS_TITLECASE flag on lowercase characters</title>
<updated>2018-11-25T10:12:45+00:00</updated>
<author>
<name>duerst</name>
<email>duerst@b2dd03c8-39d4-4d8f-98ff-823fe69b080e</email>
</author>
<published>2018-11-25T10:12:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=fc6243a6a6ef4fa1a241169342ad786dd148e3c7'/>
<id>fc6243a6a6ef4fa1a241169342ad786dd148e3c7</id>
<content type='text'>
In the function onigenc_unicode_case_map() in enc/unicode.c, deal
with the case that the ONIGENC_CASE_IS_TITLECASE flag is set on
lowercase characters. This is in preparation for Georgian Mtavruli,
which are uppercase but not titlecase, in Unicode 11.0.0.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65971 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the function onigenc_unicode_case_map() in enc/unicode.c, deal
with the case that the ONIGENC_CASE_IS_TITLECASE flag is set on
lowercase characters. This is in preparation for Georgian Mtavruli,
which are uppercase but not titlecase, in Unicode 11.0.0.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65971 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</pre>
</div>
</content>
</entry>
<entry>
<title>enc/unicode.c: 'a' is bigger than 'A'</title>
<updated>2018-11-16T02:34:00+00:00</updated>
<author>
<name>shyouhei</name>
<email>shyouhei@b2dd03c8-39d4-4d8f-98ff-823fe69b080e</email>
</author>
<published>2018-11-16T02:34:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.ruby-lang.org/ruby.git/commit/?id=6732423b5eb7191e81a23fe929926d50e0e4b39f'/>
<id>6732423b5eb7191e81a23fe929926d50e0e4b39f</id>
<content type='text'>
In ASCII, 'a' is bigger than 'A', which means 'A' - 'a' is a negative
number (-32, to be precise). In C, 'a' and 'A' both have type signed
int (cf: ISO/IEC 9899:1990 section 6.1.3.4), so 'A' - 'a' is also a
signed int: `(signed int)-32`.

The problem is that OnigCodePoint is unsigned int. Adding a negative
number to a variable of type OnigCodePoint (`code` here) introduces an
unintended implicit conversion, `(unsigned)(signed)-32`, which is
4,294,967,264. Adding this value to `code` then wraps around, and the
result ends up being the expected code point.

This series of operations is not a serious problem, but because
`code &gt;= 'a'` holds, we can write `(code - 'a') + 'A'` to avoid it.
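
The difference between the two forms can be sketched in C as follows.
This is only an illustration: the `CodePoint` typedef and both function
names are hypothetical stand-ins, not the code in enc/unicode.c, and
callers are assumed to pass only ASCII lowercase letters.

```c
/* Hypothetical stand-in for OnigCodePoint, which is an unsigned int. */
typedef unsigned int CodePoint;

/* Old form: 'A' - 'a' is the signed int -32. It is converted to
 * unsigned (UINT_MAX - 31) before the addition, which then wraps
 * around modulo UINT_MAX + 1 and lands on code - 32. */
static CodePoint upcase_with_wraparound(CodePoint code)
{
    return code + ('A' - 'a');
}

/* Rerouted form: code - 'a' is computed first and cannot go below
 * zero for a lowercase letter, so no intermediate value wraps. */
static CodePoint upcase_without_wraparound(CodePoint code)
{
    return (code - 'a') + 'A';
}
```

Both functions map 'q' to 'Q'; only the intermediate arithmetic differs.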

See also: https://github.com/k-takata/Onigmo/pull/107


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65752 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In ASCII, 'a' is bigger than 'A', which means 'A' - 'a' is a negative
number (-32, to be precise). In C, 'a' and 'A' both have type signed
int (cf: ISO/IEC 9899:1990 section 6.1.3.4), so 'A' - 'a' is also a
signed int: `(signed int)-32`.

The problem is that OnigCodePoint is unsigned int. Adding a negative
number to a variable of type OnigCodePoint (`code` here) introduces an
unintended implicit conversion, `(unsigned)(signed)-32`, which is
4,294,967,264. Adding this value to `code` then wraps around, and the
result ends up being the expected code point.

This series of operations is not a serious problem, but because
`code &gt;= 'a'` holds, we can write `(code - 'a') + 'A'` to avoid it.

See also: https://github.com/k-takata/Onigmo/pull/107


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65752 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
</pre>
</div>
</content>
</entry>
</feed>
