path: root/tool/transcode-tblgen.rb
Age  Author  Commit message
2009-06-30  nobu  * tool/*: executable.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23909 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-05-05  naruse  Fix: DON'T move in_p because before in_p is replaced by buffered data.
* transcode.c: NOMAP is now multibyte direct map. * transcode.c: remove ASIS. * transcode_data.h: ditto. * tool/transcode-tb (ActionMap#generate_info): remove :asis. * tool/transcode-tb (ActionMap#generate_info): add :nomap0. * enc/trans/utf8_mac.trans: replace :asis by :nomap0. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23344 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-04-26  naruse  * tool/transcode-tb (ActionMap#each_firstbyte):
if :asis collides other mappings, use another. * tool/transcode-tb (ActionMap#generate_info): add :asis for ASIS. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23295 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-04-26  naruse  * tool/transcode-tb (ActionMap#generate_node):
Use ActionMap#gennode instead of generate_node because of initialization. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23293 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-01-14  nobu  * enc/trans/gb18030.trans: get rid of a 1.9 feature for cross
compile. [ruby-core:21345] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21512 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-01-14  duerst  * enc/trans/gb18030.trans, gb18030-tbl.rb:
new Chinese GB18030 transcoding (from Yoshihiro Kambayashi) * test/ruby/test_transcode.rb: added tests for the above (from Yoshihiro Kambayashi) * transcode_data.h, transcode.c, tool/transcode_tblgen.rb: added support for GB18030-specific 4-byte sequences (with Yoshihiro Kambayashi) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21509 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-10-18  duerst  * tool/transcode-tblgen.rb: added set_valid_byte_pattern
to reduce coupling between table generation script and specific encodings. * enc/trans/single_byte.trans: using set_valid_byte_pattern git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19831 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-10-14  duerst  * enc/trans/single_byte.trans: added windows-1252
* enc/trans/windows-1252-tbl.rb: new file (contributed by Yoshihiro Kambayashi) * tool/transcode-tblgen.rb: listed windows-1252 as '1byte' * test/ruby/test_transcode.rb: added test_windows_1252 (contributed by Yoshihiro Kambayashi) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19778 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-15  akr  * transcode_data.h (STR1_LENGTH): defined.
(makeSTR1LEN): defined. * tool/transcode-tblgen.rb: use makeSTR1LEN. generate STR1 for 4 to 259 bytes. * transcode.c (rb_transcoding): new field: output_index. (transcode_restartable0): use STR1_LENGTH. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19366 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
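The "4 to 259 bytes" range in the entry above is what a single length byte with a bias of 4 can express (0..255 plus 4). A minimal Ruby sketch of that arithmetic follows; the helper names echo the macros named in the commit, but their bodies here are an assumption and are not copied from transcode_data.h.

```ruby
# Hypothetical helpers illustrating a biased one-byte length field.
# The bias of 4 is inferred from the "4 to 259 bytes" range in the log entry.
STR1_MIN_LEN = 4
STR1_MAX_LEN = STR1_MIN_LEN + 0xFF # 259

# Encode a string length into the single stored byte (0..255).
def make_str1_len(len)
  unless (STR1_MIN_LEN..STR1_MAX_LEN).cover?(len)
    raise ArgumentError, "length #{len} is outside #{STR1_MIN_LEN}..#{STR1_MAX_LEN}"
  end
  len - STR1_MIN_LEN
end

# Decode the stored byte back into the real string length.
def str1_length(length_byte)
  length_byte + STR1_MIN_LEN
end
```

Under this assumption a 4-byte string is stored with length byte 0 and a 259-byte string with length byte 255; shorter or longer strings are rejected by the range check.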
2008-09-09  akr  * tool/transcode-tblgen.rb (StrSet#hash): cache hash value.
(ActionMap#hash): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19279 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
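Caching a hash value, as the entry above describes for StrSet#hash and ActionMap#hash, is commonly done by memoizing the result on an immutable object. The sketch below shows the general idea with a hypothetical ByteSet class; it is not the StrSet or ActionMap code from the script.

```ruby
# Hypothetical value object whose #hash is computed once and then reused.
# Memoization is only safe because the contents are frozen at construction.
class ByteSet
  attr_reader :patterns

  def initialize(patterns)
    @patterns = patterns.map { |pattern| pattern.dup.freeze }.freeze
  end

  # Compute the hash on first use and return the cached value afterwards.
  def hash
    @hash ||= @patterns.hash
  end

  # Hash lookups use #eql? together with #hash, so the two must agree.
  def eql?(other)
    other.is_a?(ByteSet) && @patterns == other.patterns
  end

  def ==(other)
    eql?(other)
  end
end
```

Objects like this are cheap to use repeatedly as Hash keys, which is where a cached hash value pays off.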
2008-09-08  akr  * include/ruby/encoding.h (rb_econv_asciicompat_encoding): renamed
from rb_econv_stateless_encoding to apply stateless ASCII incompatible encodings such as UTF-16BE. * io.c (make_writeconv): use rb_econv_asciicompat_encoding. * transcode_data.h (rb_transcoder_asciicompat_type_t): renamed from rb_transcoder_stateful_type_t. (rb_transcoder): use rb_transcoder_asciicompat_type_t. * transcode.c: follow the type change. (asciicompat_encoding_i): renamed from stateless_encoding_i. (rb_econv_asciicompat_encoding): renamed from rb_econv_stateless_encoding. (econv_s_asciicompat_encoding): method renamed. * tool/transcode-tblgen.rb: follow the type change. * enc/trans/utf_16_32.trans: follow the type change. rb_from_UTF_16BE to UTF-8 is asciicompat_decoder. rb_from_UTF_16LE to UTF-8 is asciicompat_decoder. rb_from_UTF_32BE to UTF-8 is asciicompat_decoder. rb_from_UTF_32LE to UTF-8 is asciicompat_decoder. UTF-8 to rb_to_UTF_16BE is asciicompat_encoder. UTF-8 to rb_to_UTF_16LE is asciicompat_encoder. UTF-8 to rb_to_UTF_32BE is asciicompat_encoder. UTF-8 to rb_to_UTF_32LE is asciicompat_encoder. * enc/trans/newline.trans: follow the type change. universal newline decoder is asciicompat_converter. * enc/trans/escape.trans: follow the type change. * enc/trans/iso2022.trans: ditto. * enc/trans/japanese.trans: ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19249 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-08  akr  * tool/transcode-tblgen.rb (ArrayCode): less string substitutions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19242 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-07  akr  * tool/transcode-tblgen.rb (transcode_tblgen): log message refined.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19225 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-07  akr  * enc/trans/escape.trans: use transcode_tblgen.
* tool/transcode-tblgen.rb: generate an empty line after str1. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19217 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-07  akr  * tool/transcode-tblgen.rb (ActionMap#str_name): new method to
generate a name based on string content. (ActionMap#gen_str): extracted from generate_info and use str_name. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19216 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-07  akr  * tool/transcode-tblgen.rb (ActionMap#generate_info): use a memo to
avoid duplication for STR1. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19215 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-07  akr  * transcode_data.h (STR1): defined for a string up to 255 bytes.
(STR1_BYTEINDEX): defined. (makeSTR1): defined. * tool/transcode-tblgen.rb: generate STR1. * transcode.c (transcode_restartable0): interpret STR1. * enc/trans/escape.trans (fun_so_escape_xml_chref): removed. STR1 is used instead. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19214 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-07  akr  * tool/transcode-tblgen.rb: o4 is usable only if the first byte is
f0-f7. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19212 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-05  akr  * tool/transcode-tblgen.rb (StrSet.parse): accept upper case
hexadecimal digits. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19164 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-04  nobu  * tool/transcode-tblgen.rb (citrus_decode_mapsrc): support older 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19116 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-03  akr  * transcode_data.h (rb_transcoding): remove stateful field.
add state field. (TRANSCODING_STATE): defined. (rb_transcoder): add fields: state_size, state_init_func, state_fini_func. change rb_transcoding* argument to void*. * transcode.c (transcode_restartable0): use TRANSCODING_STATE for first arguments of transcoder functions. (rb_transcoding_open_by_transcoder): initialize state field. (rb_transcoding_close): finalize state field. * tool/transcode-tblgen.rb: provide state size/init/fini. * enc/trans/newline.trans (universal_newline_init): defined. (fun_so_universal_newline): take void* as a state pointer. (rb_universal_newline): provide state size/init/fini. (rb_crlf_newline): ditto. (rb_cr_newline): ditto. * enc/trans/iso2022.trans (iso2022jp_init): defined. (fun_si_iso2022jp_to_eucjp): take void* as a state pointer. (fun_so_iso2022jp_to_eucjp): ditto. (fun_so_eucjp_to_iso2022jp): ditto. (iso2022jp_reset_sequence_size): ditto. (finish_eucjp_to_iso2022jp): ditto. (rb_ISO_2022_JP_to_EUC_JP): provide state size/init/fini. (rb_EUC_JP_to_ISO_2022_JP): ditto. * enc/trans/utf_16_32.trans (fun_so_from_utf_16be): take void* as a state pointer. (fun_so_to_utf_16be): ditto. (fun_so_from_utf_16le): ditto. (fun_so_to_utf_16le): ditto. (fun_so_from_utf_32be): ditto. (fun_so_to_utf_32be): ditto. (fun_so_from_utf_32le): ditto. (fun_so_to_utf_32le): ditto. (rb_from_UTF_16BE): provide state size/init/fini. (rb_to_UTF_16BE): ditto. (rb_from_UTF_16LE): ditto. (rb_to_UTF_16LE): ditto. (rb_from_UTF_32BE): ditto. (rb_to_UTF_32BE): ditto. (rb_from_UTF_32LE): ditto. (rb_to_UTF_32LE): ditto. * enc/trans/japanese.trans (fun_so_eucjp2sjis): take void* as a state pointer. (fun_so_sjis2eucjp): ditto. (rb_eucjp2sjis): provide state size/init/fini. (rb_sjis2eucjp): provide state size/init/fini. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19096 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-03  akr  * transcode_data.h (WORDINDEX_SHIFT_BITS): defined.
(WORDINDEX2INFO): defined. (INFO2WORDINDEX): defined. * tool/transcode-tblgen.rb: use WORDINDEX2INFO. * transcode.c: use INFO2WORDINDEX. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19093 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
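The macro names in the entry above suggest a shift-based conversion between an index into word_array and the "info" value stored in the table. The Ruby sketch below illustrates such a round trip; the shift width of 2 and both method bodies are assumptions made for illustration, not the definitions from transcode_data.h.

```ruby
# Assumed shift width; the real value lives in transcode_data.h.
WORDINDEX_SHIFT_BITS = 2

# Turn an index into the word array into a stored info value.
# Shifting left keeps the low bits of the result zero.
def wordindex_to_info(word_index)
  word_index << WORDINDEX_SHIFT_BITS
end

# Recover the word-array index from a stored info value.
def info_to_wordindex(info)
  info >> WORDINDEX_SHIFT_BITS
end
```

Storing indexes rather than addresses keeps the generated table free of pointers, which matches the "avoid relocation" motivation given in the 2008-09-01 entries of this log.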
2008-09-03  akr  * transcode_data.h (rb_transcoder): new field: byte_array_length and
word_array_length. * tool/transcode-tblgen.rb (transcode_generated_code): generate byte_array_length and word_array_length. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-03  akr  * tool/transcode-tblgen.rb (ArrayCode): new class.
(ActionMap#gen_array_code): moved to ArrayCode. (ActionMap#numelt_array_code): ditto. (ActionMap#array_code_insert_at_last): ditto. (TRANSCODE_GENERATED_BYTES_CODE): use ArrayCode. (TRANSCODE_GENERATED_WORDS_CODE): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-03  akr  * tool/transcode-tblgen.rb (ActionMap#gen_array_code): extracted from
generate_lookup_node. (ActionMap#numelt_array_code): ditto. (ActionMap#array_code_insert_at_last): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-02  akr  * transcode_data.h (base_element): removed.
(BYTE_LOOKUP): removed. (BYTE_LOOKUP_BASE): don't cast. (BYTE_LOOKUP_INFO): ditto. (PType): unsigned int, instead of uintptr_t. (rb_transcoding): change type of next_field, conv_tree_start and word_array. * tool/transcode-tblgen.rb: generate word_array as array of unsigned int. * transcode.c (transcode_restartable0): follow the above type change. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19070 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-02  akr  * tool/transcode-tblgen.rb: add prefix for byte_array and word_array.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19069 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  akr  * tool/transcode-tblgen.rb: comment removed in generated code.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19047 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  akr  * tool/transcode-tblgen.rb: define TRANSCODE_TABLE_INFO in generated
code. use it in rb_transcoder. * enc/trans/newline.trans: use TRANSCODE_TABLE_INFO. * enc/trans/iso2022.trans: ditto. * enc/trans/utf_16_32.trans: ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19046 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  akr  * tool/transcode-tblgen.rb: record infos and BYTE_LOOKUPs as index of
word_array to avoid relocation. * transcode.c (transcode_restartable0): add word_array to get infos and BYTE_LOOKUPs. * transcode_data.h (BYTE_LOOKUP_INFO): change return type to uintptr_t. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  akr  * tool/transcode-tblgen.rb: don't need to cast offsets array.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  akr  * tool/transcode-tblgen.rb: record offsets array as index of
byte_array to avoid relocation. * transcode.c (transcode_restartable0): add byte_array to get offsets array. * transcode_data.h (BYTE_LOOKUP_BASE): change return type to uintptr_t. (rb_transcoder): add fields: byte_array, word_array and word_size. * enc/trans/newline.trans: follow rb_transcoder change. * enc/trans/iso2022.trans: ditto. * enc/trans/utf_16_32.trans: ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19043 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  akr  * tool/transcode-tblgen.rb: make infos arrays and BYTE_LOOKUPs into
single array. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19042 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  akr  * transcode_data.h (BYTE_LOOKUP): change to uintptr_t array.
(BYTE_LOOKUP_BASE): follow the type change. (BYTE_LOOKUP_INFO): ditto. (PType): ditto. (rb_transcoding): ditto. * tool/transcode-tblgen.rb: follow the type change. * transcode.c: ditto. * enc/trans/newline.trans: ditto. * enc/trans/iso2022.trans: ditto. * enc/trans/utf_16_32.trans: ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19038 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  akr  * tool/transcode-tblgen.rb: gather infos arrays and BYTE_LOOKUPs.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19036 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  akr  * tool/transcode-tblgen.rb: make offsets arrays into single array.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19032 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  akr  * tool/transcode-tblgen.rb: gather offsets array at top.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19031 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  akr  * tool/transcode-tblgen.rb: ValidEncoding['eucJP-ms'] defined.
"\xA2\xAF".encode("utf-8", "eucJP-ms") should raise Encoding::ConversionUndefined, not Encoding::InvalidByteSequence. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19029 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  naruse  * tool/transcode-tblgen.rb (transcode_compile_tree): use the first
mapping when some mappings are given for a character. [ruby-dev:36068] * tool/transcode-tblgen.rb: expandtab. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19018 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-01  usa  * tool/transcode-tblgen.rb: set ERB source filename for error message.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19016 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-08-31  akr  * tool/transcode-tblgen.rb: change "illegal" to "invalid".
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19007 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-08-31  akr  * tool/transcode-tblgen.rb (transcode_generated_code): defined for
generating table at once. (transcode_tblgen): returns an empty string. (transcode_generate_node): ditto. * enc/trans/newline.trans: use transcode_generated_code. * enc/trans/iso2022.trans: ditto. * enc/trans/single_byte.trans: ditto. * enc/trans/utf_16_32.trans: ditto. * enc/trans/japanese.trans: ditto. * enc/trans/korean.trans: ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19006 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-08-31  akr  * tool/transcode-tblgen.rb (citrus_decode_mapsrc): print logging
message on STDERR. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19005 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-08-31  naruse  * tool/transcode-tblgen.rb: add table generator from Citrus maps.
* enc/trans/japanese.trans: use Citrus maps. * enc/trans/CP: add maps from Citrus. * enc/trans/JIS: ditto. * test/ruby/test_transcode.rb: Shift_JIS and EUC-JP don't support IBM extended characters. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19003 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-08-15  akr  * transcode_data.h (rb_transcoder_stateful_type_t): defined.
(rb_transcoder): add field: stateful_type. * tool/transcode-tblgen.rb: generate stateful_type field as stateless_converter. * enc/trans/iso2022.trans: follow rb_transcoder change. * enc/trans/newline.trans: ditto. * enc/trans/utf_16_32.trans: ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18650 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-08-14  akr  * include/ruby/encoding.h (rb_econv_output): declared.
* transcode_data.h (rb_transcoder): add resetsize_func field. * enc/trans/iso2022.trans (iso2022jp_reset_sequence_size): defined. (rb_EUC_JP_to_ISO_2022_JP): provide resetsize_func. * tool/transcode-tblgen.rb: set NULL for resetsize_func. * transcode.c (rb_econv_output): new function for inserting output. (output_replacement_character): use rb_econv_output. (transcode_loop): check return value of output_replacement_character. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18628 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-08-14  akr  * tool/transcode-tblgen.rb: check unexpected actions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18619 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-08-12  matz  * tool/transcode-tblgen.rb (#transcode_tblgen): slight message
improvement. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18529 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-08-12  akr  * transcode_data.h (TRANSCODE_ERROR): removed.
* tool/transcode-tblgen.rb: 8bit byte of ASCII-8BIT is a valid (but unique to ASCII-8BIT) character. * transcode.c (rb_eConversionUndefined): new error. (rb_eInvalidByteSequence): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18524 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-08-11  akr  * transcode_data.h (rb_transcoder): add resetstate_func field for
resetting a state of stateful encoding. * enc/trans/iso2022.trans (rb_EUC_JP_to_ISO_2022_JP): specify finish_eucjp_to_iso2022jp for resetstate_func. * tool/transcode-tblgen.rb: specify NULL for resetstate_func. * transcode.c (output_replacement_character): call resetstate_func before appending the replacement character. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18503 b2dd03c8-39d4-4d8f-98ff-823fe69b080e