ruby.git/lib/unicode_normalize, branch ruby_2_7

update to Unicode Version 12.1.0 (beta)

2019-04-05T00:58:51+00:00

Unicode Version 12.1.0 adds a single character, U+32FF SQUARE ERA NAME REIWA,
for the new Japanese era starting on May 1st. Because 12.1.0 will only be
finalized on May 7th, we go with the beta version: further changes to the data we
need are highly unlikely, and we want to make sure Ruby is ready for the new era.

* common.mk: change UNICODE_VERSION to 12.1.0, UNICODE_BETA to YES

* enc/unicode/12.1.0, enc/unicode/12.1.0/casefold.h, enc/unicode/12.1.0/name2ctype.h:
  add directory and generated data files for new version

* lib/unicode_normalize/tables.rb: update for new character

* test/ruby/test_regexp.rb: add test for character property age=12.1

* test/test_unicode_normalize.rb: add test for NFKC decomposition of new character
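What the two new tests check can be sketched from the outside with `String#unicode_normalize` and a `\p{Age=...}` regexp property. This sketch assumes a Ruby whose Unicode data is 12.1.0 or later (Ruby 2.7 onward); it is not the actual test code. U+32FF has only a `<square>` compatibility decomposition to U+4EE4 U+548C, so NFC leaves it alone while NFKC expands it.

```ruby
reiwa = "\u32FF" # U+32FF SQUARE ERA NAME REIWA

# Only a compatibility (<square>) decomposition exists, so NFC keeps the
# character while NFKC replaces it with the two ideographs U+4EE4 U+548C.
nfc  = reiwa.unicode_normalize(:nfc)
nfkc = reiwa.unicode_normalize(:nfkc)
puts nfkc.codepoints.map { |cp| format("U+%04X", cp) }.join(" ") # U+4EE4 U+548C

# Character property age: U+32FF was first assigned in Unicode 12.1,
# so it falls under age=12.1 but not under age=12.0.
puts reiwa.match?(/\p{Age=12.1}/)
puts reiwa.match?(/\p{Age=12.0}/)
```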

This (mostly) completes issue #15195.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

change lib/unicode_normalize/tables.rb to single item per line to make diffs shorter

2019-04-04T23:40:48+00:00

* template/unicode_norm_gen.tmpl: Change formatting of output to produce only a
  single item (or range) for each line to make future diffs shorter and easier
  to understand and check.

* lib/unicode_normalize/tables.rb: output of the above
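To show why one item per line keeps diffs short, here is a sketch with a hypothetical helper `char_class_literal`. It is not the code in template/unicode_norm_gen.tmpl, and the exact layout of tables.rb may differ. The helper emits a regexp character class as concatenated Ruby string literals, one code point or range per line, so extending a range changes exactly one line of output.

```ruby
# Hypothetical helper: build Ruby source for a character-class string,
# one code point or range per line, joined by backslash line continuations.
def char_class_literal(items)
  # BMP code points use \uXXXX; others need \u{...} to stay unambiguous.
  escape = lambda do |cp|
    cp > 0xFFFF ? format('\u{%X}', cp) : format('\u%04X', cp)
  end
  body = items.map do |item|
    text = item.is_a?(Range) ? "#{escape.(item.first)}-#{escape.(item.last)}" : escape.(item)
    %(  "#{text}" \\)
  end
  ["\"[\" \\", *body, %(  "]")].join("\n")
end

old_src = char_class_literal([0xC0..0xC5, 0xC7, 0x32D0..0x32FE])
new_src = char_class_literal([0xC0..0xC5, 0xC7, 0x32D0..0x32FF])
puts new_src
```

Adjacent string literals joined with a trailing backslash are concatenated by Ruby, so the generated source still evaluates to a single string that can be passed to `Regexp.new`.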

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67439 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

update Unicode version (and Emoji version) to 12.0.0

2019-03-06T01:55:19+00:00

- common.mk: set UNICODE_VERSION and UNICODE_EMOJI_VERSION to 12.0.0

- lib/unicode_normalize/tables.rb: update table data to Unicode version 12.0.0

- enc/unicode/12.0.0/casefold.h, enc/unicode/12.0.0/name2ctype.h: add generated
  files for Unicode version 12.0.0
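Once a build includes such an update, the Unicode version the running Ruby was built with can be inspected through `RbConfig::CONFIG`. The sketch below assumes the `UNICODE_VERSION` and `UNICODE_EMOJI_VERSION` keys (the latter may be missing on older Rubies); `unicode_at_least?` is a hypothetical helper, not part of Ruby.

```ruby
require "rbconfig"
require "rubygems" # for Gem::Version (normally loaded already)

unicode_version = RbConfig::CONFIG["UNICODE_VERSION"]
emoji_version   = RbConfig::CONFIG["UNICODE_EMOJI_VERSION"] # may be nil

# Hypothetical helper: does the Unicode data cover at least +required+?
def unicode_at_least?(required, actual = RbConfig::CONFIG["UNICODE_VERSION"])
  Gem::Version.new(actual) >= Gem::Version.new(required)
end

puts "Unicode #{unicode_version}, Emoji #{emoji_version || 'unknown'}"
puts "Unicode 12.0.0 data available: #{unicode_at_least?('12.0.0')}"
```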

This is the main commit for #15321.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67169 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

update to Unicode 11.0.0 (main step, not complete yet)

2018-12-05T08:10:24+00:00

- common.mk: Change Unicode version to 11.0.0, and Emoji version to 11.0
- test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/11.0.0/name2ctype.h:
  Add generated files. Files for Unicode 10.0.0 will be removed once we are
  sure 11.0.0 works.
- lib/unicode_normalize/tables.rb: Updated table.
- regparse.c: Almost completely reimplement grapheme cluster detection in
  function node_extended_grapheme_cluster().
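The effect of the reimplemented grapheme cluster detection can be observed through `\X` in regexps and `String#grapheme_clusters`. This is a sketch of expected behavior under the Unicode 11 segmentation rules (UAX #29), assuming Ruby 2.6 or later; the sample strings are our own choice. Each sample should form exactly one extended grapheme cluster.

```ruby
# Each sample should be a single extended grapheme cluster.
samples = {
  "e\u0301"                         => "e + COMBINING ACUTE ACCENT",
  "\r\n"                            => "CR LF",
  "\u{1F1EF 1F1F5}"                 => "regional indicator pair (flag)",
  "\u{1F468 200D 1F469 200D 1F467}" => "emoji ZWJ sequence (family)",
  "\u1100\u1161\u11A8"              => "Hangul L V T jamo",
}

samples.each do |str, label|
  clusters = str.scan(/\X/)
  agrees = clusters == str.grapheme_clusters
  puts "#{label}: #{clusters.size} cluster(s), grapheme_clusters agrees: #{agrees}"
end
```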


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

lib/*: Prefer require_relative over require, remove explicit extension

2018-11-02T17:52:43+00:00

[#15206] [Fix GH-1976]
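The difference `require_relative` makes can be sketched with a throwaway two-file layout. The module `MyNormalize` and the directory `my_normalize` are hypothetical stand-ins for lib/unicode_normalize/normalize.rb loading its sibling tables.rb. The sibling is resolved against the requiring file's own directory rather than `$LOAD_PATH`, and the `.rb` extension is left implicit.

```ruby
require "tmpdir"

# Hypothetical layout: <tmp>/my_normalize/normalize.rb loads its sibling tables.rb.
Dir.mktmpdir do |dir|
  lib = File.join(dir, "my_normalize")
  Dir.mkdir(lib)
  File.write(File.join(lib, "tables.rb"), <<~RUBY)
    module MyNormalize
      TABLE = { "a" => "A" }.freeze
    end
  RUBY
  File.write(File.join(lib, "normalize.rb"), <<~RUBY)
    # Resolved against this file's directory, not the load path;
    # no explicit ".rb" extension.
    require_relative "tables"

    module MyNormalize
      def self.lookup(char)
        TABLE.fetch(char, char)
      end
    end
  RUBY

  # dir is not on $LOAD_PATH, yet the sibling file is still found.
  require File.join(lib, "normalize")
end

puts MyNormalize.lookup("a")
```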

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65506 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

revert r65091 and r65090 because CI fails

2018-10-16T07:53:37+00:00

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65093 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

update to Unicode 11.0.0 (basic step, not complete yet)

2018-10-16T07:01:55+00:00

- common.mk: Change Unicode version to 11.0.0
- enc/unicode/case-folding.rb, enc/unicode.c: Initial changes to deal with
  Georgian Mtavruli. This should bring us up to the same level as, e.g.,
  Python 3.7, by following the Unicode tables exactly. But it will
  produce undesirable (mixed-case) results for String#capitalize.
  This will be addressed in a later commit.
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/11.0.0/name2ctype.h:
  Add generated files.
- lib/unicode_normalize/tables.rb: Updated table.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

fix range check for Hangul jamo trailers in Unicode normalization

2018-07-28T09:44:33+00:00

* lib/unicode_normalize/normalize.rb: Fix the range check for trailing
  Hangul jamo characters in Unicode normalization. Unlike leading and
  vowel jamos, where LBASE and VBASE are actual characters, a value
  equal to TBASE expresses the absence of a trailing jamo.
  The fix is for technical correctness only; there was no actual bug,
  because the regular expressions in lib/unicode_normalize/tables.rb
  already exclude jamos equal to TBASE from normalization processing.

* test/test_unicode_normalize.rb: Add preventive test
  test_no_trailing_jamo, based on
  https://github.com/python/cpython/commit/d134809cd3764c6a634eab7bb8995e3e2eff14d5,
  in case we ever get a regression.
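The corrected range check can be sketched with the Hangul constants from the Unicode Standard (section 3.12). `trailing_jamo?` and `compose_lvt` are hypothetical helpers for illustration, not the code in normalize.rb. A trailer is valid only if it lies strictly above TBASE, because TBASE itself (TIndex 0) encodes "no trailing jamo"; a character that fails the check is left uncomposed.

```ruby
# Hangul constants from the Unicode Standard, section 3.12.
SBASE  = 0xAC00
LBASE  = 0x1100
VBASE  = 0x1161
TBASE  = 0x11A7
LCOUNT = 19
VCOUNT = 21
TCOUNT = 28

# A real trailing jamo is strictly above TBASE: TBASE itself means "no trailer".
def trailing_jamo?(cp)
  cp > TBASE && cp < TBASE + TCOUNT
end

# Hypothetical helper: compose leading + vowel (+ optional trailing) jamo.
# Returns nil if l or v is not a composable jamo.
def compose_lvt(l, v, t = nil)
  lindex = l.ord - LBASE
  vindex = v.ord - VBASE
  return nil unless lindex.between?(0, LCOUNT - 1) && vindex.between?(0, VCOUNT - 1)

  syllable = SBASE + (lindex * VCOUNT + vindex) * TCOUNT
  if t && trailing_jamo?(t.ord)
    (syllable + t.ord - TBASE).chr(Encoding::UTF_8)
  else
    # No trailer, or one that fails the check: it stays a separate character.
    syllable.chr(Encoding::UTF_8) + t.to_s
  end
end

p compose_lvt("\u1100", "\u1161", "\u11A8")
p compose_lvt("\u1100", "\u1161", "\u11A7")
```

The second call mirrors the situation the preventive test guards against: U+11A7 equals TBASE, so it must not be folded into the syllable.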

This closes issue #14934, thanks to MaLin (Lin Ma) for reporting.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

update Ruby to Unicode 10.0.0

2017-09-06T07:56:41+00:00

- In common.mk, set UNICODE_VERSION to 10.0.0
- Generate and add enc/unicode/10.0.0/casefold.h and
  enc/unicode/10.0.0/name2ctype.h
- Update lib/unicode_normalize/tables.rb

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59759 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

add explanations about status of module UnicodeNormalize

2017-05-09T10:45:46+00:00

In lib/unicode_normalize/normalize.rb, add explanations and clarifications
about the status of the files and the module. [ci skip]
This is in response to discussions at https://github.com/ruby/spec/pull/433
and https://bugs.ruby-lang.org/issues/5481#note-58.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58617 b2dd03c8-39d4-4d8f-98ff-823fe69b080e