path: root/test/ruby/enc
Age Commit message (Author)
2024-04-04 Prevent "ambiguous first argument" warnings (Yusuke Endoh)
2024-03-14 Ensure test suite is compatible with --frozen-string-literal (Jean Boussier)
In preparation for https://bugs.ruby-lang.org/issues/20205, making sure the test suite is compatible with frozen string literals makes things easier.
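As a rough illustration of what compatibility with frozen string literals involves (a sketch, not code from the test suite): code that appends to a string literal fails once literals are frozen, and one common fix is to start from an unfrozen copy. The helper name `build_label` is hypothetical, and `"abc".freeze` merely stands in for a frozen literal.

```ruby
# Hypothetical helper; not taken from the Ruby test suite.
def build_label(prefix)
  label = +""                    # unary plus returns an unfrozen string
  label << prefix << "-suffix"   # safe to mutate
  label
end

frozen = "abc".freeze            # stands in for a frozen string literal
frozen_error = nil
begin
  frozen << "d"                  # mutating a frozen string raises
rescue FrozenError => e
  frozen_error = e.class
end

label = build_label("x")
```

Here `label` is `"x-suffix"`, and `frozen_error` is `FrozenError`.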
2023-03-18 Fix handling of 6-byte codepoints in left_adjust_char_head in CESU-8 encoding (Josef Haider)
Notes: Merged: https://github.com/ruby/ruby/pull/7510 Merged-By: nobu <nobu@ruby-lang.org>
2023-02-27 Prefer to use File.foreach instead of IO.foreach (Hiroshi SHIBATA)
Notes: Merged: https://github.com/ruby/ruby/pull/7387
2023-02-27 Prefer to use File.readlines instead of IO.readlines (Hiroshi SHIBATA)
Notes: Merged: https://github.com/ruby/ruby/pull/7387
2022-12-06 add file version check for new Unicode emoji file header (Martin Dürst)
The change in the Unicode emoji file header took place at version 14.0.0, but is needed only from version 15.0.0 because in version 14.0.0, another check is still active.
2022-04-22 Avoid defining the same test class in multiple files (Jeremy Evans)
Should fix issues with parallel testing sometimes not running all tests. This change is best viewed with whitespace changes ignored. Fixes [Bug #18731]
Notes: Merged: https://github.com/ruby/ruby/pull/5839
2022-03-16 Revert "Fix version check to use Emoji version for emoji-variation-sequences.txt" (Martin Dürst)
This reverts commit 48f1e8c5d85043e6adb8e93c94532daa201d42e9.
2022-03-16 Revert "Allow `.0` version mismatch to pass the tests" (Martin Dürst)
This reverts commit fc6e4ce62bfa95b6a0d4d4898e1128c1fce4db8a.
2022-03-16 Allow `.0` version mismatch to pass the tests (Koichi Sasada)
With `make update-unicode`, some tests failed with the following error due to a header mismatch.

* `RbConfig::CONFIG['UNICODE_EMOJI_VERSION']` => 14.0
* the header line is `# emoji-variation-sequences-14.0.0.txt`

So the trailing `.0` is the mismatch. This patch allows an additional `.0` in the header line. Please revert this patch when a correct patch is merged.

```
  1) Error:
TestEmojiBreaks#test_embedded_emoji:
RuntimeError: File Name Mismatch: line: # emoji-variation-sequences-14.0.0.txt, expected filename: emoji-variation-sequences.txt
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:88:in `block (2 levels) in read_data'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:82:in `foreach'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:82:in `block in read_data'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:79:in `each'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:79:in `read_data'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:111:in `all_tests'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:127:in `test_embedded_emoji'

  2) Error:
TestEmojiBreaks#test_mixed_emoji:
RuntimeError: File Name Mismatch: line: # emoji-variation-sequences-14.0.0.txt, expected filename: emoji-variation-sequences.txt
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:88:in `block (2 levels) in read_data'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:82:in `foreach'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:82:in `block in read_data'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:79:in `each'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:79:in `read_data'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:111:in `all_tests'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:139:in `test_mixed_emoji'

  3) Error:
TestEmojiBreaks#test_single_emoji:
RuntimeError: File Name Mismatch: line: # emoji-variation-sequences-14.0.0.txt, expected filename: emoji-variation-sequences.txt
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:88:in `block (2 levels) in read_data'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:82:in `foreach'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:82:in `block in read_data'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:79:in `each'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:79:in `read_data'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:111:in `all_tests'
    /tmp/ruby/v3/src/trunk/test/ruby/enc/test_emoji_breaks.rb:117:in `test_single_emoji'
```
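The tolerance described in the "Allow `.0` version mismatch" entry can be sketched roughly as follows. This is only an illustration of the idea, not the check that test_emoji_breaks.rb actually performs; `header_matches?` and its parameters are hypothetical names.

```ruby
# Hypothetical sketch: accept a header line whose version carries an extra ".0".
# Neither the method name nor its parameters come from the real test file.
def header_matches?(line, basename, version)
  pattern = /\A# #{Regexp.escape(basename)}-#{Regexp.escape(version)}(?:\.0)?\.txt\z/
  pattern.match?(line.chomp)
end

with_extra_zero = header_matches?("# emoji-variation-sequences-14.0.0.txt",
                                  "emoji-variation-sequences", "14.0")
exact_version   = header_matches?("# emoji-variation-sequences-14.0.txt",
                                  "emoji-variation-sequences", "14.0")
other_version   = header_matches?("# emoji-variation-sequences-13.1.txt",
                                  "emoji-variation-sequences", "14.0")
```

Under these assumptions the first two calls return `true` and the last returns `false`.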
2022-03-16 Fix version check to use Emoji version for emoji-variation-sequences.txt (Martin Dürst)
2022-03-14 Fix failures (Kazuhiro NISHIYAMA)
http://ci.rvm.jp/results/trunk-no-mjit@phosphorus-docker/3870646

```
  1) Error:
TestEmojiBreaks#test_single_emoji:
RuntimeError: File Name Mismatch: line: # emoji-variation-sequences-14.0.0.txt, expected filename: emoji-variation-sequences.txt
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:84:in `block (2 levels) in read_data'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:82:in `foreach'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:82:in `block in read_data'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:79:in `each'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:79:in `read_data'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:105:in `all_tests'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:111:in `test_single_emoji'

  2) Error:
TestEmojiBreaks#test_mixed_emoji:
RuntimeError: File Name Mismatch: line: # emoji-variation-sequences-14.0.0.txt, expected filename: emoji-variation-sequences.txt
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:84:in `block (2 levels) in read_data'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:82:in `foreach'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:82:in `block in read_data'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:79:in `each'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:79:in `read_data'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:105:in `all_tests'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:133:in `test_mixed_emoji'

  3) Error:
TestEmojiBreaks#test_embedded_emoji:
RuntimeError: File Name Mismatch: line: # emoji-variation-sequences-14.0.0.txt, expected filename: emoji-variation-sequences.txt
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:84:in `block (2 levels) in read_data'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:82:in `foreach'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:82:in `block in read_data'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:79:in `each'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:79:in `read_data'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:105:in `all_tests'
    /tmp/ruby/v3/src/trunk-no-mjit/test/ruby/enc/test_emoji_breaks.rb:121:in `test_embedded_emoji'

make: *** [uncommon.mk:823: yes-test-all] Error 3
```
2021-12-29 Use omit instead of skip: test/ruby/enc/**/*.rb (Hiroshi SHIBATA)
2021-08-17 Take into account data in emoji-variation-sequences.txt in tests. (Martin Dürst)
The emoji data in emoji-variation-sequences.txt was not used in test/ruby/enc/test_emoji_breaks.rb, for unknown reasons. It turned out that the format of each of the emoji data/test files is slightly different, and that we had not taken into account that empty fields after a semicolon, as present in emoji-variation-sequences.txt, lead to fewer fields than expected when using split. This addresses issue #18027.
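The `split` behavior mentioned above can be seen in a minimal example. The line below is illustrative only and is not copied from the data file; passing a negative limit is one way to keep trailing empty fields, and not necessarily the fix the commit used.

```ruby
# Illustrative line; the real data file has comments and extra spacing.
line = "0023 FE0E;text style;"

default_fields = line.split(";")      # trailing empty field is dropped
all_fields     = line.split(";", -1)  # negative limit keeps trailing empty fields
```

`default_fields` has two elements, while `all_fields` has three, the last being an empty string.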
2021-07-27 Deal with Unicode ranges in the file emoji-sequences.txt (Martin Dürst)
Detect Unicode ranges and loop over them. This fixes issue #18028.
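A minimal sketch of range detection, assuming a code point field is either a space-separated sequence of hex code points or a range written as `XXXX..YYYY`. The helper name `expand_field` is hypothetical and this is not the code in test_emoji_breaks.rb.

```ruby
# Hypothetical helper: turn one code point field into a list of strings.
# A range yields one single-character string per code point in the range;
# a sequence yields one string made of all listed code points.
def expand_field(field)
  if (m = /\A(\h+)\.\.(\h+)\z/.match(field.strip))
    (m[1].hex..m[2].hex).map { |cp| [cp].pack("U") }
  else
    [field.split.map(&:hex).pack("U*")]
  end
end

from_range    = expand_field("231A..231B")
from_sequence = expand_field("0023 FE0F 20E3")
```

With these inputs, `from_range` holds two one-character strings (U+231A and U+231B) and `from_sequence` holds a single three-code-point string.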
2021-07-27 Adjust test/ruby/enc/test_emoji_breaks.rb to handle Emoji Version 13.1 (Martin Dürst)
Deal with the issue that the emoji files in emoji/13.1 have Unicode Emoji version 13.1, but at the same time the files in 13.0.0/ucd/emoji are still at Emoji version 13.0. Specifically:
- Add a version attribute to TestEmojiBreaks::BreakFile
- Take the version for emoji-variation-sequences.txt from the Unicode version, removing the last two characters.
- Improve information in exceptions for file name and version mismatches.
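Removing the last two characters of the Unicode version, as described in the entry above, might look like this. The variable names and the example value are assumptions for illustration only.

```ruby
unicode_version = "13.0.0"                   # example value, not read from RbConfig
emoji_version   = unicode_version[0...-2]    # drop the trailing ".0"
```

For the example value, `emoji_version` is `"13.0"`.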
2021-07-08 Adapt test_emoji_breaks.rb to Unicode 13.0.0/Emoji 13.0 (Martin Dürst)
- Add UNICODE_VERSION,... to deal with new location of some of the emoji-related data files.
- Introduce class BreakFile to handle various file properties.
- Adapt main code to use BreakFile.
2020-01-29 support multi-run for test/ruby/enc/test_regex_casefold.rb (Koichi Sasada)
The tests should not mutate the test data.
2019-06-28 Removed excess spaces (Nobuyoshi Nakada)
2019-06-28 Fixed name conflict between helper classes (Nobuyoshi Nakada)
2019-06-24 Add new encoding CESU-8 [Feature #15931] (NARUSE, Yui)
2019-05-17 Test to disable ASCII-only optimization (Nobuyoshi Nakada)
Examples of why the ASCII-only optimization cannot be applied to multi-byte encodings that have 7-bit trailing bytes. Suggested by @duerst at https://github.com/ruby/ruby/pull/2187#issuecomment-492949218
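One well-known case of a 7-bit trailing byte is Shift_JIS, where the second byte of some two-byte characters falls in the ASCII range. The sketch below only shows the byte layout. It is not the test added by this commit, and the choice of U+8868 is an assumption made for illustration.

```ruby
sjis      = "\u8868".encode("Shift_JIS")   # a two-byte Shift_JIS character
backslash = "\\".encode("Shift_JIS")       # a one-byte ASCII character

sjis_bytes      = sjis.bytes       # second byte equals the backslash byte
backslash_bytes = backslash.bytes
ascii_only      = sjis.ascii_only? # the string is not ASCII-only
```

Here `sjis_bytes` is `[0x95, 0x5C]` and `backslash_bytes` is `[0x5C]`, so a byte-wise shortcut that treats every byte below 0x80 as a standalone ASCII character would misread the trailing byte.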
2018-12-10 add a test to make sure some unassigned codepoints do not get converted (duerst)
In test/ruby/enc/test_case_mapping.rb, add a test to make sure the unassigned codepoints in the Georgian MTAVRULI range (U+1CBB, U+1CBC) do not get converted to unrelated codepoints by String#capitalize. (It turns out that this test was not strictly necessary, because unassigned codepoints are already excluded by the fact that they are not found in the onigenc_unicode_fold_lookup table. So this test only serves to check against future regressions.)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66314 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-09 implement special behavior for Georgian for String#capitalize (duerst)
The modern Georgian script is special in that it has an 'uppercase' variant called MTAVRULI which can be used for emphasis of whole words, for screamy headlines, and so on. However, in contrast to all other bicameral scripts, there is no usage of capitalizing the first letter in a word or a sentence. Words with mixed capitalization are not used at all. We therefore implement special behavior for String#capitalize.

Formally, we define String#capitalize as first applying String#downcase for the whole string, then using titlecase on the first letter. Because Georgian defines titlecase as the identity function both for MTAVRULI ('uppercase') and Mkhedruli (lowercase), this results in String#capitalize being equivalent to String#downcase for Georgian. This avoids undesirable mixed case.

* enc/unicode.c: Actual implementation
* string.c: Add mention of this special case for documentation
* test/ruby/enc/test_case_mapping.rb: Add two tests, a general one that uses String#capitalize on some (including nonsensical) combinations of MTAVRULI and Mkhedruli, and a canary test to detect the potential assignment of characters to the currently open slots (holes) at U+1CBB and U+1CBC.
* test/ruby/enc/test_case_comprehensive.rb: Tweak generation of expectation data.

Together with r65933, this closes issue #14839.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
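The behavior described in this entry can be checked with a short sketch, assuming a Ruby version that includes this change and the Unicode 11 data (Ruby 2.6 or later). The sample word is chosen for illustration and is not taken from the test file.

```ruby
# The same word written in MTAVRULI ("uppercase") and in Mkhedruli (lowercase).
mtavruli  = "\u1C97\u1C91\u1C98\u1C9A\u1C98\u1CA1\u1C98"
mkhedruli = "\u10D7\u10D1\u10D8\u10DA\u10D8\u10E1\u10D8"

# A nonsensical mixed-case word: MTAVRULI first letter, Mkhedruli rest.
mixed = mtavruli[0] + mkhedruli[1..-1]

capitalized_upper = mtavruli.capitalize
capitalized_lower = mkhedruli.capitalize
capitalized_mixed = mixed.capitalize
```

In each case the result equals the all-Mkhedruli form, that is, the same string `String#downcase` would return.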
2018-12-07 replace hardcoded emoji version by RbConfig::CONFIG['UNICODE_EMOJI_VERSION'] (duerst)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66271 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-05 update to Unicode 11.0.0 (main step, not complete yet) (duerst)
- common.mk: Change Unicode version to 11.0.0, and Emoji version to 11.0
- test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h: Add generated files. Files for Unicode 10.0.0 will be removed once we are sure 11.0.0 works.
- lib/unicode_normalize/tables.rb: Updated table.
- regparse.c: Almost completely reimplement grapheme cluster detection in function node_extended_grapheme_cluster().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-04 exclude skin tones as second component in TestEmojiBreaks#test_mixed_emoji (duerst)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66185 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-04 change embedding character in TestEmojiBreaks#test_embedded_emoji (duerst)
In test/ruby/enc/test_emoji_breaks.rb, in method TestEmojiBreaks#test_embedded_emoji, change the surrounding characters from A/Z to the more neutral \t in preparation for upgrade to Unicode 11.0.0.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66180 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
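The idea of embedding an emoji between neutral characters can be sketched as below. The emoji sequence is chosen for illustration and the variable names are assumptions, so this is not the test's actual code.

```ruby
# A ZWJ sequence: MAN, ZERO WIDTH JOINER, WOMAN, ZERO WIDTH JOINER, GIRL.
family   = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"
embedded = "\t#{family}\t"

clusters = embedded.each_grapheme_cluster.to_a
```

The expectation is three grapheme clusters: a tab, the whole emoji sequence, and another tab.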
2018-12-02 solve the genie/zombie/wrestlers bug (duerst)
enc/unicode.c:
- Add U+1F93C (WRESTLERS), U+1F9DE (GENIE), and U+1F9DF to onigenc_unicode_GCB_ranges_E_Base.
- Add comments with character names.
test/ruby/enc/test_emoji_breaks.rb: Activate tests for genie/zombie/wrestlers.
This closes issue #15343.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66133 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-26 improve messages for test failures (duerst)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66010 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-26 add tests for grapheme clusters using Unicode Emoji test data (duerst)
Add file test/ruby/enc/test_emoji_breaks.rb to test String#each_grapheme_cluster against test data provided by Unicode (at https://www.unicode.org/Public/emoji/#{EMOJI_VERSION}/). Lines containing emoji for genies, zombies, and wrestling are ignored because there seems to be a bug (#15343) in the implementation.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65990 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 remove guard against bug #15337, because it is fixed (duerst)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65958 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 add tests using Unicode test data for grapheme clusters (duerst)
Add file test/ruby/enc/test_grapheme_breaks.rb to test String#each_grapheme_cluster and \X extended grapheme cluster matcher in regular expressions against test data provided by Unicode (ucd/auxiliary/GraphemeBreakTest.txt). Some lines in the data file are ignored, as follows:
- Lines with a surrogate, because Ruby doesn't handle these
- The case of "\r\n", because there is a bug (#15337) in the implementation
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65955 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
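The two mechanisms named in this entry, String#each_grapheme_cluster and the `\X` matcher, can be compared on a small hand-written input. This is a sketch that does not read GraphemeBreakTest.txt.

```ruby
# "e" followed by COMBINING ACUTE ACCENT forms one cluster; "x" is another.
text = "e\u0301x"

by_method = text.each_grapheme_cluster.to_a
by_regexp = text.scan(/\X/)
```

Both approaches split the three-code-point string into the same two clusters.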
2017-12-23 fix unicode data directory (nobu)
* test/ruby/enc/test_regex_casefold.rb: fix the search for the Unicode data directory, as in test_case_comprehensive.rb.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61417 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-22 update unicode data files directory (nobu)
* test/ruby/enc/test_case_comprehensive.rb: search the ucd directory first if it exists.
* test/ruby/enc/test_regex_casefold.rb: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61415 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-09 fix UTF-32 valid_encoding? (nobu)
* enc/utf_32be.c (utf32be_mbc_enc_len): check arguments precisely. [ruby-core:79966] [Bug #13292]
* enc/utf_32le.c (utf32le_mbc_enc_len): ditto.
* regenc.h (UNICODE_VALID_CODEPOINT_P): predicate for valid Unicode codepoints.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57816 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
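From the Ruby side, the effect of a precise UTF-32 check can be sketched as follows, assuming a Ruby version that includes this fix. The helper `utf32be` is a hypothetical convenience for building byte strings and is not part of the commit.

```ruby
# Hypothetical helper: build a UTF-32BE string from raw byte values.
def utf32be(*bytes)
  bytes.pack("C*").force_encoding(Encoding::UTF_32BE)
end

valid     = utf32be(0x00, 0x00, 0x00, 0x41)  # U+0041
too_large = utf32be(0x00, 0x11, 0x00, 0x00)  # 0x110000, beyond U+10FFFF
surrogate = utf32be(0x00, 0x00, 0xD8, 0x00)  # U+D800, a surrogate

results = [valid, too_large, surrogate].map(&:valid_encoding?)
```

Only the first string is expected to be reported as validly encoded.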
2017-03-09 test_utf16.rb: refine valid_encoding tests (nobu)
* test/ruby/enc/test_utf16.rb (test_utf16be_valid_encoding): assert all data and use assert_predicate.
* test/ruby/enc/test_utf16.rb (test_utf16le_valid_encoding): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57815 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-03 add tests against regressions for upcoming codepoint reordering in unfolding table (duerst)
* test/ruby/enc/test_case_mapping.rb: Add method test_reorder_unfold to test against problems when reordering codepoints in some entries in CaseUnfold_11_Type CaseUnfold_11_Table.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56968 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-03 change test class name because it is not only about folding (duerst)
* test/ruby/enc/test_case_comprehensive.rb: Change test class name from TestComprehensiveCaseFold to TestComprehensiveCaseMapping because the tests are about mapping in general, not only folding
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56966 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-30 fix uppercasing for U+A64B, CYRILLIC SMALL LETTER MONOGRAPH UK (duerst)
* enc/unicode.c: Add U+A64B to the special cases 03B9 and 03BC at the end of onigenc_unicode_case_map (Bug #12990).
* enc/unicode/case-folding.rb: Add U+A64B to the special cases 03B9 and 03BC. Add a comment pointing to enc/unicode.c. Change warnings to exceptions for unpredicted cases, because this would have been more easily noticed (the warning was not noticed when upgrading to Unicode 9.0.0).
* test/ruby/enc/test_case_comprehensive.rb: Remove temporary exclusion of U+A64B from testing.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56941 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
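The user-visible behavior this fix targets can be checked with a short sketch, assuming a Ruby version that contains the fix. The variable names are illustrative.

```ruby
small   = "\uA64B"  # CYRILLIC SMALL LETTER MONOGRAPH UK
capital = "\uA64A"  # CYRILLIC CAPITAL LETTER MONOGRAPH UK

upcased   = small.upcase
downcased = capital.downcase
```

`upcased` is expected to equal `capital`, and `downcased` to equal `small`.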
2016-11-29 get rid of ambiguous parentheses warnings (nobu)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56937 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-29 Fix erroneous test of target against target (duerst)
* test/ruby/enc/test_case_comprehensive.rb: fix test condition, add a temporary check for U+A64B, the only character where the tests currently fail. (Bug #12990)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56924 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-16 * enc/windows_1254.c, test/ruby/enc/test_case_comprehensive.rb: (duerst)
Implement non-ASCII case conversion for Windows-1254.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56433 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-08-26 test_regex_casefold.rb: skip if no data file (nobu)
* test/ruby/enc/test_regex_casefold.rb (setup): skip with an error message if CaseFolding.txt is not present, instead of printing the message, which causes an unknown command in parallel tests.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56017 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-30 * enc/iso_8859_2.c, test/ruby/enc/test_case_comprehensive.rb: (duerst)
Implement non-ASCII case conversion for ISO-8859-2, by Yushiro Ishii.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55775 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-26 * enc/windows_1257.c, test/ruby/enc/test_case_comprehensive.rb: (duerst)
Implement non-ASCII case conversion for Windows-1257, by Sho Koike.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55752 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-26 * enc/windows_1250.c, test/ruby/enc/test_case_comprehensive.rb: (duerst)
Implement non-ASCII case conversion for Windows-1250, by Sho Koike.
* ChangeLog: Fixed order of previous two entries.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55751 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-26 * enc/windows_1251.c, test/ruby/enc/test_case_comprehensive.rb: (duerst)
Implement non-ASCII case conversion for Windows-1251, by Shunsuke Sato.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55750 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-26 * enc/windows_1251.c, test/ruby/enc/test_case_comprehensive.rb: (duerst)
Implement non-ASCII case conversion for Windows-1251, by Shunsuke Sato.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55749 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-26 * remove trailing spaces. (svn)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55747 b2dd03c8-39d4-4d8f-98ff-823fe69b080e