ruby.git/enc, branch ruby_2_6

merge revision(s) 67439,67441,67453,67476: [Backport #15740]

2019-04-13T15:01:39+00:00

        change lib/unicode_normalize/tables.rb to single item per line to make diffs shorter

        * template/unicode_norm_gen.tmpl: Change formatting of output to produce only a
          single item (or range) for each line to make future diffs shorter and easier
          to understand and check.

        * lib/unicode_normalize/tables.rb: output of the above
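The idea behind the new output format is easiest to see on a small example. The sketch below is an illustration only, not the actual logic of template/unicode_norm_gen.tmpl. It assumes a hypothetical helper `format_one_per_line` that collapses code points into runs of consecutive values and writes each run on its own line in Ruby `\u` escape notation. The exact text the real template emits may differ.

```ruby
# Hypothetical helper (not from template/unicode_norm_gen.tmpl): render a
# code point in Ruby's \u escape notation, using braces above the BMP.
def escape_cp(cp)
  cp > 0xFFFF ? format('\u{%X}', cp) : format('\u%04X', cp)
end

# Collapse code points into runs of consecutive values and emit one item
# (a single character or a range) per line, so that adding or removing a
# code point only changes the affected line(s) in a diff.
def format_one_per_line(codepoints)
  codepoints.sort.uniq
            .slice_when { |a, b| b != a + 1 }
            .map { |run| run.size == 1 ? escape_cp(run.first) : "#{escape_cp(run.first)}-#{escape_cp(run.last)}" }
            .join("\n")
end

puts format_one_per_line([0x0300, 0x0301, 0x0302, 0x0340, 0x0341, 0x0343, 0x1F468])
```

With one item per line, a new character that extends an existing range touches a single line instead of reflowing a long packed line.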

        update to Unicode Version 12.1.0 (beta)

        Unicode Version 12.1.0 adds a single character, U+32FF SQUARE ERA NAME REIWA,
        for the new Japanese era starting on May 1st. 12.1.0 will be finalized only on
        May 7th, so we go with the beta version because further changes in the data we
        need are highly unlikely, and we want to make sure Ruby is ready for the new era.

        * common.mk: change UNICODE_VERSION to 12.1.0, UNICODE_BETA to YES

        * enc/unicode/12.1.0, enc/unicode/12.1.0/casefold.h, enc/unicode/12.1.0/name2ctype.h:
          add directory and generated data files for new version

        * lib/unicode_normalize/tables.rb: update for new character

        * test/ruby/test_regexp.rb: add test for character property age=12.1

        * test/test_unicode_normalize.rb: add test for NFKC decomposition of new character

        This (mostly) completes issue #15195.
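The two tests mentioned above check NFKC decomposition of the new character and its `age` property. A minimal sketch of the same checks, assuming a Ruby built with Unicode 12.1.0 or later, where `\p{age=N}` is cumulative (it matches characters assigned in version N or earlier):

```ruby
reiwa = "\u32FF"   # SQUARE ERA NAME REIWA, new in Unicode 12.1.0

# NFKC decomposes the square ligature into the two CJK ideographs
# U+4EE4 and U+548C.
p reiwa.unicode_normalize(:nfkc) == "\u4EE4\u548C"   # => true

# The character is covered by age=12.1 but not by age=12.0.
p reiwa.match?(/\A\p{age=12.1}\z/)   # => true
p reiwa.match?(/\A\p{age=12.0}\z/)   # => false
```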

        remove Unicode 12.0.0 related directory and generated files


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_6@67525 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

merge revision(s) 67169,67173,67174: [Backport #15641]

2019-03-06T06:36:32+00:00

	update Unicode version (and Emoji version) to 12.0.0

	- common.mk: set UNICODE_VERSION and UNICODE_EMOJI_VERSION to 12.0.0

	- lib/unicode_normalize/tables.rb: update table data to Unicode version 12.0.0

	- enc/unicode/12.0.0/casefold.h, enc/unicode/12.0.0/name2ctype.h: add generated
	  files for Unicode version 12.0.0
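The values set in common.mk end up embedded in the built interpreter. A small sketch for checking which versions a given Ruby was built with, assuming Ruby 2.6 or later (where both `UNICODE_VERSION` and `UNICODE_EMOJI_VERSION` are present in `RbConfig::CONFIG`):

```ruby
require "rbconfig"

# Unicode and Emoji versions this Ruby was built with.
unicode_version = RbConfig::CONFIG["UNICODE_VERSION"]
emoji_version   = RbConfig::CONFIG["UNICODE_EMOJI_VERSION"]
puts "Unicode #{unicode_version}, Emoji #{emoji_version}"

# Compare numerically rather than as strings, so "12.0.0" ranks above "9.0.0".
at_least_12 = (unicode_version.split(".").map(&:to_i) <=> [12, 0, 0]) >= 0
puts "Unicode 12.0.0 or later: #{at_least_12}"
```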

	This is the main commit for #15321.

	add news about Unicode version update (issue #15321) to NEWS [ci skip]

	delete directory and files related to Unicode version 11.0.0

	this completes and closes feature #15321

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_6@67175 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

implement special behavior for Georgian for String#capitalize

2018-12-09T23:14:29+00:00

The modern Georgian script is special in that it has an 'uppercase'
variant called MTAVRULI which can be used for emphasis of whole words,
for screamy headlines, and so on. However, in contrast to all other
bicameral scripts, Georgian never capitalizes only the first letter of
a word or a sentence. Words with mixed capitalization are not used
at all.

We therefore implement special behavior for String#capitalize. Formally,
we define String#capitalize as first applying String#downcase to the
whole string, then using titlecase on the first letter. Because Georgian
defines titlecase as the identity function both for MTAVRULI ('uppercase')
and Mkhedruli (lowercase), this results in String#capitalize being
equivalent to String#downcase for Georgian. This avoids undesirable
mixed case.
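The resulting behavior can be checked directly. This sketch assumes Ruby 2.6 or later; the sample strings are arbitrary Georgian letters chosen for illustration (Mtavruli U+1C90..U+1CBA map one-to-one onto Mkhedruli U+10D0..U+10FA):

```ruby
mtavruli  = "\u1C92\u1C90\u1C9B\u1CA0"   # Mtavruli ('uppercase')
mkhedruli = "\u10D2\u10D0\u10DB\u10E0"   # the same letters in Mkhedruli
mixed     = "\u10D2\u1C90\u10DB\u1CA0"   # nonsensical mixed case

# capitalize = downcase the whole string, then titlecase the first letter.
# Georgian titlecase is the identity, so the result equals downcase.
[mtavruli, mkhedruli, mixed].each do |s|
  p s.capitalize == mkhedruli   # => true
end

# Contrast: for Latin the first letter really is titlecased.
p "hELLO".capitalize   # => "Hello"
```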

* enc/unicode.c: Actual implementation

* string.c: Add mention of this special case for documentation

* test/ruby/enc/test_case_mapping.rb: Add two tests, a general one
  that uses String#capitalize on some (including nonsensical)
  combinations of MTAVRULI and Mkhedruli, and a canary test to
  detect the potential assignment of characters to the currently
  open slots (holes) at U+1CBB and U+1CBC.

* test/ruby/enc/test_case_comprehensive.rb: Tweak generation of
  expectation data.

Together with r65933, this closes issue #14839.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

delete Unicode 10.0.0 related files, no longer needed [#14802]

2018-12-09T02:02:45+00:00

This line, and those below, will be ignored--

D    enc/unicode/10.0.0


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66295 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

remove obsolete data from unicode.c

2018-12-06T00:05:08+00:00

* unicode.c: Remove the arrays onigenc_unicode_GCB_ranges_GAZ,
  onigenc_unicode_GCB_ranges_E_Base, and onigenc_unicode_GCB_ranges_Emoji,
  because they are not needed anymore for Unicode 11.0.0.

* regparse.c: Remove external declarations for above arrays.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

update to Unicode 11.0.0 (main step, not complete yet)

2018-12-05T08:10:24+00:00

- common.mk: Change Unicode version to 11.0.0, and Emoji version to 11.0
- test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/11.0.0/name2ctype.h:
  Add generated files. Files for Unicode 10.0.0 will be removed once we are
  sure 11.0.0 works.
- lib/unicode_normalize/tables.rb: Updated table.
- regparse.c: Almost completely reimplement grapheme cluster detection in
  function node_extended_grapheme_cluster().
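Extended grapheme clusters are what `\X` in a regexp and `String#grapheme_clusters` return. A small sketch, assuming a Ruby with Unicode 11.0.0 or later segmentation rules; each sample below is several code points but should form a single cluster:

```ruby
samples = {
  "e + combining acute"        => "e\u0301",
  "CR LF"                      => "\r\n",
  "family (ZWJ sequence)"      => "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}",
  "flag (regional indicators)" => "\u{1F1EF}\u{1F1F5}",
}

samples.each do |label, s|
  # grapheme_clusters and scan(/\X/) should agree on the segmentation.
  puts "#{label}: #{s.codepoints.size} code points, " \
       "#{s.grapheme_clusters.size} cluster(s), " \
       "#{s.scan(/\X/).size} \\X match(es)"
end
```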


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

solve the genie/zombie/wrestlers bug

2018-12-02T10:07:42+00:00

enc/unicode.c: - Add U+1F93C (WRESTLERS), U+1F9DE (GENIE), and U+1F9DF (ZOMBIE)
                 to onigenc_unicode_GCB_ranges_E_Base.
               - Add comments with character names.
test/ruby/enc/test_emoji_breaks.rb: Activate tests for genie/zombie/wrestlers.
This closes issue #15343.
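The kind of sequence involved can be checked on a current Ruby. This sketch does not reproduce the original bug: it assumes Unicode 11.0.0 or later, where the outcome comes from the reimplemented grapheme cluster logic rather than from the E_Base table. The sample sequences (base character + ZWJ + gender sign + VS16) are chosen for illustration:

```ruby
zwj_sequences = {
  "man genie"     => "\u{1F9DE}\u200D\u2642\uFE0F",
  "woman zombie"  => "\u{1F9DF}\u200D\u2640\uFE0F",
  "men wrestling" => "\u{1F93C}\u200D\u2642\uFE0F",
}

zwj_sequences.each do |name, seq|
  # Each sequence is 4 code points but should not be broken apart.
  puts "#{name}: #{seq.codepoints.size} code points -> " \
       "#{seq.grapheme_clusters.size} grapheme cluster(s)"
end
```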

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66133 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Added words in the comment at r65088 [ci skip]

2018-11-30T07:19:49+00:00

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66103 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Embed the Emoji version

2018-11-27T06:44:02+00:00

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66023 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

deal with ONIGENC_CASE_IS_TITLECASE flag on lowercase characters

2018-11-25T10:12:45+00:00

In the function onigenc_unicode_case_map() in enc/unicode.c, deal
with the case that the ONIGENC_CASE_IS_TITLECASE flag is set on
lowercase characters. This is in preparation for Georgian Mtavruli,
which are uppercase but not titlecase, in Unicode 11.0.0.
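The distinction matters because for most scripts the titlecase of a lowercase letter is simply its uppercase, while a Mkhedruli letter is its own titlecase and is distinct from its Mtavruli uppercase. A short sketch, assuming Ruby 2.6 or later:

```ruby
an_lower = "\u10D0"   # GEORGIAN LETTER AN (Mkhedruli, lowercase)
an_upper = "\u1C90"   # GEORGIAN MTAVRULI CAPITAL LETTER AN

# Latin: titlecase of a lowercase letter equals its uppercase.
p "a".capitalize == "a".upcase      # => true

# Georgian: uppercase and titlecase differ for the same lowercase letter.
p an_lower.upcase == an_upper       # => true
p an_lower.capitalize == an_lower   # => true
p an_upper.downcase == an_lower     # => true
```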

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65971 b2dd03c8-39d4-4d8f-98ff-823fe69b080e