path: root/enc
AgeCommit message (Collapse)Author
2020-11-22Add string encoding IBM720 alias CP720 (#3803)Lars Kanis
The mapping table is generated from the ICU project: Fixes bug 16233 : Notes: Merged-By: nurse <>
2020-08-27sed -i '/rmodule.h/d'卜部昌平
Notes: Merged:
2020-08-27sed -i '/r_cast.h/d'卜部昌平
Notes: Merged:
2020-08-27sed -i '\,2/extern.h,d'卜部昌平
Notes: Merged:
2020-07-28Use https instead of httpKazuhiro NISHIYAMA
2020-07-10Encode ' as &apos; when using encode(xml: :attr)Jeremy Evans
Fixes [Bug #16922] Notes: Merged:
2020-05-11sed -i 's|ruby/impl|ruby/internal|'卜部昌平
To fix build failures. Notes: Merged:
2020-05-11sed -i s|ruby/3|ruby/impl|g卜部昌平
This shall fix compile errors. Notes: Merged:
2020-05-04Suppress warnings by gcc 10.1.0-RC-20200430Nobuyoshi Nakada
* Folding results should not be empty. If `OnigCodePointCount(to->n)` were 0, `for` loop using `fn` wouldn't execute and `ncs` elements are not initialized. ``` enc/unicode.c:557:21: warning: 'ncs[0]' may be used uninitialized in this function [-Wmaybe-uninitialized] 557 | for (i = 0; i < ncs[0]; i++) { | ~~~^~~ ``` * Cast to `enum yytokentype` Additional enums for scanner events by ripper are not included in `yytokentype`. ``` ripper.y:7274:28: warning: implicit conversion from 'enum <anonymous>' to 'enum yytokentype' [-Wenum-conversion] ```
2020-04-08Merge pull request #2991 from shyouhei/ruby.h卜部昌平
Split ruby.h Notes: Merged-By: shyouhei <>
2020-04-05Suppress warnings: reserved for numbered parameterKazuki Tsujimoto
2020-04-05Added tooldir variableNobuyoshi Nakada
Notes: Merged:
2020-02-07more on NULL versus functions.卜部昌平
Function pointers are not void*. See also ce4ea956d24eab5089a143bba38126f2b11b55b6 8427fca49bd85205f5a8766292dd893f003c0e48
2019-12-26update dependencies卜部昌平
Notes: Merged:
2019-12-26decouple internal.h headers卜部昌平
Saves comitters' daily life by avoid #include-ing everything from internal.h to make each file do so instead. This would significantly speed up incremental builds. We take the following inclusion order in this changeset: 1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very first thing among everything). 2. RUBY_EXTCONF_H if any. 3. Standard C headers, sorted alphabetically. 4. Other system headers, maybe guarded by #ifdef 5. Everything else, sorted alphabetically. Exceptions are those win32-related headers, which tend not be self- containing (headers have inclusion order dependencies). Notes: Merged:
2019-12-24enc/x_emoji.h: fixed dead-links [ci skip]Nobuyoshi Nakada
English version pages seem no longer provided.
2019-12-22Fixed misspellingsNobuyoshi Nakada
Fixed misspellings reported at [Bug #16437], missed and a new typo.
2019-11-18Update dependenciesNobuyoshi Nakada
2019-08-10Init function is need to link staticallyNobuyoshi Nakada
2019-08-10Removed unnecessary headersNobuyoshi Nakada
2019-08-10Use ENC_REPLICATE to copy an encodingNobuyoshi Nakada
2019-08-10Revert "Removed unused includes"Yusuke Endoh
This reverts commit c9eb8f82e9febeb634a23bec6aeea915eb25fe26. The change caused "implicit declaration" warning and actual segfault. ``` /tmp/ruby/v2/src/trunk-gc-asserts/enc/gb2312.c: In function ‘Init_gb2312’: /tmp/ruby/v2/src/trunk-gc-asserts/enc/gb2312.c:6:31: warning: implicit declaration of function ‘rb_enc_find’ [-Wimplicit-function-declaration] rb_enc_register("GB2312", rb_enc_find("EUC-KR")); ^~~~~~~~~~~ /tmp/ruby/v2/src/trunk-gc-asserts/enc/gb2312.c:6:31: warning: passing argument 2 of ‘rb_enc_register’ makes pointer from integer without a cast [-Wint-conversion] <command-line>:0:19: note: expected ‘OnigEncoding {aka const struct OnigEncodingTypeST *}’ but argument is of type ‘int’ /tmp/ruby/v2/src/trunk-gc-asserts/regenc.h:231:12: note: in expansion of macro ‘ONIG_ENC_REGISTER’ extern int ONIG_ENC_REGISTER(const char *, OnigEncoding); ^~~~~~~~~~~~~~~~~ ```
2019-08-09Removed unused includesNobuyoshi Nakada
2019-07-14Include ruby/assert.h in ruby/ruby.h so that assertions can be thereNobuyoshi Nakada
2019-07-14Update dependencies for 369ff79394765ce198ac7cee872a8c739d895aaaTakashi Kokubun
Just copy-pasting diff from
2019-07-14add encoding conversion from/to CESU-8Martin Dürst
Add encoding conversion (transcoding) from UTF-8 to CESU-8 and back. CESU-8 is an encoding similar to UTF-8, but encodes codepoints above U+FFFF as two surrogates, these surrogates again being encoded as if they were UTF-8 codepoints. This preserves the same binary sorting order as in UTF-16. It is also somewhat similar (although not exactly identical) to an encoding used internally by Java. This completes issue #15995. enc/trans/cesu_8.trans: Add encoding conversion from/to CESU-8 test/ruby/test_transcode.rb: Add tests for above
2019-07-09Update dependenciesNobuyoshi Nakada
2019-06-24remove UNREACHABLENARUSE, Yui
2019-06-24Add new encoding CESU-8 [Feature #15931]NARUSE, Yui
2019-04-05remove Unicode 12.0.0 related directory and generated filesduerst
This completes issue #15195. git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-05update to Unicode Version 12.1.0 (beta)duerst
Unicode Version 12.1.0 adds one single character, U+32FF SQUARE ERA NAME REIWA, for the new Japanese era starting on May 1st. 12.1.0 will be finalized only on May 7th, so we go with the beta version because further changes in the data we need are highly unlikely, and we want to make sure Ruby is ready for the new era. * change UNICODE_VERSION to 12.1.0, UNICODE_BETA to YES * enc/unicode/12.1.0, enc/unicode/12.1.0/casefold.h, enc/unicode/12.1.0/name2ctype.h: add directory and generated data files for new version * lib/unicode_normalize/tables.rb: update for new character * test/ruby/test_regexp.rb: add test for character property age=12.1 * test/test_unicode_normalize.rb: add test for NFKC decomposition of new character This (mostly) completes issue #15195. git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-06delete directory and files related to Unicode version 11.0.0duerst
this completes and closes feature #15321 git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-06update Unicode version (and Emoji version) to 12.0.0duerst
- set UNICODE_VERSION and UNICODE_EMOJI_VERSION to 12.0.0 - lib/unicode_normalize/tables.rb: update table data to Unicode version 12.0.0 - enc/unicode/12.0.0/casefold.h, enc/unicode/12.0.0/name2ctype.h: add generated files for Unicode version 12.0.0 This is the main commit for #15321. git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-02-14Removed duplicate dependentsnobu
git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-02-12Update dependencies, internal.h includes ruby.hnobu
git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-09implement special behavior for Georgian for String#capitalizeduerst
The modern Georgian script is special in that it has an 'uppercase' variant called MTAVRULI which can be used for emphasis of whole words, for screamy headlines, and so on. However, in contrast to all other bicameral scripts, there is no usage of capitalizing the first letter in a word or a sentence. Words with mixed capitalization are not used at all. We therefore implement special behavior for String#capitalize. Formally, we define String#capitalize as first applying String#downcase for the whole string, then using titlecase on the first letter. Because Georgian defines titlecase as the identity function both for MTAVRULI ('uppercase') and Mkhedruli (lowercase), this results in String#capitalize being equivalent to String#downcase for Georgian. This avoids undesirable mixed case. * enc/unicode.c: Actual implementation * string.c: Add mention of this special case for documentation * test/ruby/enc/test_case_mapping.rb: Add two tests, a general one that uses String#capitalize on some (including nonsensical) combinations of MTAVRULI and Mkhedruli, and a canary test to detect the potential assignment of characters to the currently open slots (holes) at U+1CBB and U+1CBC. * test/ruby/enc/test_case_comprehensive.rb: Tweak generation of expectation data. Together with r65933, this closes issue #14839. git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-09delete Unicode 10.0.0 related files, no longer needed [#14802]duerst
This line, and those below, will be ignored-- D enc/unicode/10.0.0 git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-06remove obsolete data from unicode.cduerst
* unicode.c: Remove the arrays onigenc_unicode_GCB_ranges_GAZ, onigenc_unicode_GCB_ranges_E_Base, and onigenc_unicode_GCB_ranges_Emoji, because they are not needed anymore for Unicode 11.0.0. * regparse.c: Remove external declarations for above arrays. git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-05update to Unicode 11.0.0 (main step, not complete yet)duerst
- Change Unicode version to 11.0.0, and Emoji version to 11.0 - test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version - enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h: Add generated files. Files for Unicode 10.0.0 will be removed once we are sure 11.0.0 works. - lib/unicode_normalize/tables.rb: Updated table. - regparse.c: Almost completely reimplement grapheme cluster detection in function node_extended_grapheme_cluster(). git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-02solve the genie/zombie/wrestlers bugduerst
enc/unicode.c: - Add U+1F93C (WRESTLERS), U+1F9DE (GENIE), and U+1F9DF to onigenc_unicode_GCB_ranges_E_Base. - Add comments with character names. test/ruby/enc/test_emoji_breaks.rb: Activate tests for genie/zombie/wrestlers. This closes issue #15343. git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-30Added words in the comment at r65088 [ci skip]nobu
git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27Embed the Emoji versionnobu
git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-25deal with ONIGENC_CASE_IS_TITLECASE flag on lowercase charactersduerst
In the function onigenc_unicode_case_map() in enc/unicode.c, deal with the case that the ONIGENC_CASE_IS_TITLECASE flag is set on lowercase characters. This is in preparation for Georgian Mtavruli, which are uppercase but not titlecase, in Unicode 11.0.0. git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-23prepare for Unicode 11.0.0 updateduerst
- enc/unicode/case-folding.rb: - Convert unpredicted case to actual flag setting - Eliminate an unused variable - Change a variable name to avoid a warning git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-16Make some internal functions staticnobu
git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-16enc/unicode.c: 'a' is bigger than 'A'shyouhei
In ASCII, 'a' is bigger than 'A'. Which means 'A' - 'a' is a negative number (-32, to be precise). In C, the type of 'a' and 'A' are signed int (cf: ISO/IEC 9899:1990 section So 'A' - 'a' is also a signed int. It is `(signed int)-32`. The problem is, OnigCodePoint is unsigned int. Adding a negative number to a variable of OnigCodepoint (`code` here) introduces an unintentional cast of `(unsigned)(signed)-32`, which is 4,294,967,264. Adding this value to code then overflows, and the result eventually becomes normal codepoint. The series of operations are not a serious problem but because `code >= 'a'` holds, we can `(code - 'a') + 'A'` to reroute this. See also: git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16revert r65091, r65090 because ci failsduerst
git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16update to Unicode 11.0.0 (basic step, not complete yet)duerst
- Change Unicode version to 11.0.0 - enc/unicode/case-folding.rb, enc/unicode.c: Initial changes to deal with Gregorian Mtavruli. This should bring us up to the same level as e.g. Python 3.7, by following the Unicode tables exactly. But it will produce undesirable (mixed-case) results for String#capitalize. This will be addressed in a later commit. - enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h: Add generated files. - lib/unicode_normalize/tables.rb: Updated table. git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16add some comments to enc/unicode/case-folding.rb [ci skip]duerst
git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16Removed data for old Unicode [ci skip]nobu
git-svn-id: svn+ssh:// b2dd03c8-39d4-4d8f-98ff-823fe69b080e