/* * Introduction * ************ * * The following notes assume that you are familiar with the YAML specification * (http://yaml.org/spec/cvs/current.html). We mostly follow it, although in * some cases we are less restrictive that it requires. * * The process of transforming a YAML stream into a sequence of events is * divided on two steps: Scanning and Parsing. * * The Scanner transforms the input stream into a sequence of tokens, while the * parser transform the sequence of tokens produced by the Scanner into a * sequence of parsing events. * * The Scanner is rather clever and complicated. The Parser, on the contrary, * is a straightforward implementation of a recursive-descendant parser (or, * LL(1) parser, as it is usually called). * * Actually there are two issues of Scanning that might be called "clever", the * rest is quite straightforward. The issues are "block collection start" and * "simple keys". Both issues are explained below in details. * * Here the Scanning step is explained and implemented. We start with the list * of all the tokens produced by the Scanner together with short descriptions. * * Now, tokens: * * STREAM-START(encoding) # The stream start. * STREAM-END # The stream end. * VERSION-DIRECTIVE(major,minor) # The '%YAML' directive. * TAG-DIRECTIVE(handle,prefix) # The '%TAG' directive. * DOCUMENT-START # '---' * DOCUMENT-END # '...' * BLOCK-SEQUENCE-START # Indentation increase denoting a block * BLOCK-MAPPING-START # sequence or a block mapping. * BLOCK-END # Indentation decrease. * FLOW-SEQUENCE-START # '[' * FLOW-SEQUENCE-END # ']' * BLOCK-SEQUENCE-START # '{' * BLOCK-SEQUENCE-END # '}' * BLOCK-ENTRY # '-' * FLOW-ENTRY # ',' * KEY # '?' or nothing (simple keys). * VALUE # ':' * ALIAS(anchor) # '*anchor' * ANCHOR(anchor) # '&anchor' * TAG(handle,suffix) # '!handle!suffix' * SCALAR(value,style) # A scalar. * * The following two tokens are "virtual" tokens denoting the beginning and the * end of the stream: * * STREAM-START(encoding) * STREAM-END * * We pass the information about the input stream encoding with the * STREAM-START token. * * The next two tokens are responsible for tags: * * VERSION-DIRECTIVE(major,minor) * TAG-DIRECTIVE(handle,prefix) * * Example: * * %YAML 1.1 * %TAG ! !foo * %TAG !yaml! tag:yaml.org,2002: * --- * * The corresponding sequence of tokens: * * STREAM-START(utf-8) * VERSION-DIRECTIVE(1,1) * TAG-DIRECTIVE("!","!foo") * TAG-DIRECTIVE("!yaml","tag:yaml.org,2002:") * DOCUMENT-START * STREAM-END * * Note that the VERSION-DIRECTIVE and TAG-DIRECTIVE tokens occupy a whole * line. * * The document start and end indicators are represented by: * * DOCUMENT-START * DOCUMENT-END * * Note that if a YAML stream contains an implicit document (without '---' * and '...' indicators), no DOCUMENT-START and DOCUMENT-END tokens will be * produced. * * In the following examples, we present whole documents together with the * produced tokens. * * 1. An implicit document: * * 'a scalar' * * Tokens: * * STREAM-START(utf-8) * SCALAR("a scalar",single-quoted) * STREAM-END * * 2. An explicit document: * * --- * 'a scalar' * ... * * Tokens: * * STREAM-START(utf-8) * DOCUMENT-START * SCALAR("a scalar",single-quoted) * DOCUMENT-END * STREAM-END * * 3. Several documents in a stream: * * 'a scalar' * --- * 'another scalar' * --- * 'yet another scalar' * * Tokens: * * STREAM-START(utf-8) * SCALAR("a scalar",single-quoted) * DOCUMENT-START * SCALAR("another scalar",single-quoted) * DOCUMENT-START * SCALAR("yet another scalar",single-quoted) * STREAM-END * * We have already introduced the SCALAR token above. The following tokens are * used to describe aliases, anchors, tag, and scalars: * * ALIAS(anchor) * ANCHOR(anchor) * TAG(handle,suffix) * SCALAR(value,style) * * The following series of examples illustrate the usage of these tokens: * * 1. A recursive sequence: * * &A [ *A ] * * Tokens: * * STREAM-START(utf-8) * ANCHOR("A") * FLOW-SEQUENCE-START * ALIAS("A") * FLOW-SEQUENCE-END * STREAM-END * * 2. A tagged scalar: * * !!float "3.14" # A good approximation. * * Tokens: * * STREAM-START(utf-8) * TAG("!!","float") * SCALAR("3.14",double-quoted) * STREAM-END * * 3. Various scalar styles: * * --- # Implicit empty plain scalars do not produce tokens. * --- a plain scalar * --- 'a single-quoted scalar' * --- "a double-quoted scalar" * --- |- * a literal scalar * --- >- * a folded * scalar * * Tokens: * * STREAM-START(utf-8) * DOCUMENT-START * DOCUMENT-START * SCALAR("a plain scalar",plain) * DOCUMENT-START * SCALAR("a single-quoted scalar",single-quoted) * DOCUMENT-START * SCALAR("a double-quoted scalar",double-quoted) * DOCUMENT-START * SCALAR("a literal scalar",literal) * DOCUMENT-START * SCALAR("a folded scalar",folded) * STREAM-END * * Now it's time to review collection-related tokens. We will start with * flow collections: * * FLOW-SEQUENCE-START * FLOW-SEQUENCE-END * FLOW-MAPPING-START * FLOW-MAPPING-END * FLOW-ENTRY * KEY * VALUE * * The tokens FLOW-SEQUENCE-START, FLOW-SEQUENCE-END, FLOW-MAPPING-START, and * FLOW-MAPPING-END represent the indicators '[', ']', '{', and '}' * correspondingly. FLOW-ENTRY represent the ',' indicator. Finally the * indicators '?' and ':', which are used for denoting mapping keys and values, * are represented by the KEY and VALUE tokens. * * The following examples show flow collections: * * 1. A flow sequence: * * [item 1, item 2, item 3] * * Tokens: * * STREAM-START(utf-8) * FLOW-SEQUENCE-START * SCALAR("item 1",plain) * FLOW-ENTRY * SCALAR("item 2",plain) * FLOW-ENTRY * SCALAR("item 3",plain) * FLOW-SEQUENCE-END * STREAM-END * * 2. A flow mapping: * * { * a simple key: a value, # Note that the KEY token is produced. * ? a complex key: another value, * } * * Tokens: * * STREAM-START(utf-8) * FLOW-MAPPING-START * KEY * SCALAR("a simple key",plain) * VALUE * SCALAR("a value",plain) * FLOW-ENTRY * KEY * SCALAR("a complex key",plain) * VALUE * SCALAR("another value",plain) * FLOW-ENTRY * FLOW-MAPPING-END * STREAM-END * * A simple key is a key which is not denoted by the '?' indicator. Note that * the Scanner still produce the KEY token whenever it encounters a simple key. * * For scanning block collections, the following tokens are used (note that we * repeat KEY and VALUE here): * * BLOCK-SEQUENCE-START * BLOCK-MAPPING-START * BLOCK-END * BLOCK-ENTRY * KEY * VALUE * * The tokens BLOCK-SEQUENCE-START and BLOCK-MAPPING-START denote indentation * increase that precedes a block collection (cf. the INDENT token in Python). * The token BLOCK-END denote indentation decrease that ends a block collection * (cf. the DEDENT token in Python). However YAML has some syntax pecularities * that makes detections of these tokens more complex. * * The tokens BLOCK-ENTRY, KEY, and VALUE are used to represent the indicators * '-', '?', and ':' correspondingly. * * The following examples show how the tokens BLOCK-SEQUENCE-START, * BLOCK-MAPPING-START, and BLOCK-END are emitted by the Scanner: * * 1. Block sequences: * * - item 1 * - item 2 * - * - item 3.1 * - item 3.2 * - * key 1: value 1 * key 2: value 2 * * Tokens: * * STREAM-START(utf-8) * BLOCK-SEQUENCE-START * BLOCK-ENTRY * SCALAR("item 1",plain) * BLOCK-ENTRY * SCALAR("item 2",plain) * BLOCK-ENTRY * BLOCK-SEQUENCE-START * BLOCK-ENTRY * SCALAR("item 3.1",plain) * BLOCK-ENTRY * SCALAR("item 3.2",plain) * BLOCK-END * BLOCK-ENTRY * BLOCK-MAPPING-START * KEY * SCALAR("key 1",plain) * VALUE * SCALAR("value 1",plain) * KEY * SCALAR("key 2",plain) * VALUE * SCALAR("value 2",plain) * BLOCK-END * BLOCK-END * STREAM-END * * 2. Block mappings: * * a simple key: a value # The KEY token is produced here. * ? a complex key * : another value * a mapping: * key 1: value 1 * key 2: value 2 * a sequence: * - item 1 * - item 2 * * Tokens: * * STREAM-START(utf-8) * BLOCK-MAPPING-START * KEY * SCALAR("a simple key",plain) * VALUE * SCALAR("a value",plain) * KEY * SCALAR("a complex key",plain) * VALUE * SCALAR("another value",plain) * KEY * SCALAR("a mapping",plain) * BLOCK-MAPPING-START * KEY * SCALAR("key 1",plain) * VALUE * SCALAR("value 1",plain) * KEY * SCALAR("key 2",plain) * VALUE * SCALAR("value 2",plain) * BLOCK-END * KEY * SCALAR("a sequence",plain) * VALUE * BLOCK-SEQUENCE-START * BLOCK-ENTRY * SCALAR("item 1",plain) * BLOCK-ENTRY * SCALAR("item 2",plain) * BLOCK-END * BLOCK-END * STREAM-END * * YAML does not always require to start a new block collection from a new * line. If the current line contains only '-', '?', and ':' indicators, a new * block collection may start at the current line. The following examples * illustrate this case: * * 1. Collections in a sequence: * * - - item 1 * - item 2 * - key 1: value 1 * key 2: value 2 * - ? complex key * : complex value * * Tokens: * * STREAM-START(utf-8) * BLOCK-SEQUENCE-START * BLOCK-ENTRY * BLOCK-SEQUENCE-START * BLOCK-ENTRY * SCALAR("item 1",plain) * BLOCK-ENTRY * SCALAR("item 2",plain) * BLOCK-END * BLOCK-ENTRY * BLOCK-MAPPING-START * KEY * SCALAR("key 1",plain) * VALUE * SCALAR("value 1",plain) * KEY * SCALAR("key 2",plain) * VALUE * SCALAR("value 2",plain) * BLOCK-END * BLOCK-ENTRY * BLOCK-MAPPING-START * KEY * SCALAR("complex key") * VALUE * SCALAR("complex value") * BLOCK-END * BLOCK-END * STREAM-END * * 2. Collections in a mapping: * * ? a sequence * : - item 1 * - item 2 * ? a mapping * : key 1: value 1 * key 2: value 2 * * Tokens: * * STREAM-START(utf-8) * BLOCK-MAPPING-START * KEY * SCALAR("a sequence",plain) * VALUE * BLOCK-SEQUENCE-START * BLOCK-ENTRY * SCALAR("item 1",plain) * BLOCK-ENTRY * SCALAR("item 2",plain) * BLOCK-END * KEY * SCALAR("a mapping",plain) * VALUE * BLOCK-MAPPING-START * KEY * SCALAR("key 1",plain) * VALUE * SCALAR("value 1",plain) * KEY * SCALAR("key 2",plain) * VALUE * SCALAR("value 2",plain) * BLOCK-END * BLOCK-END * STREAM-END * * YAML also permits non-indented sequences if they are included into a block * mapping. In this case, the token BLOCK-SEQUENCE-START is not produced: * * key: * - item 1 # BLOCK-SEQUENCE-START is NOT produced here. * - item 2 * * Tokens: * * STREAM-START(utf-8) * BLOCK-MAPPING-START * KEY * SCALAR("key",plain) * VALUE * BLOCK-ENTRY * SCALAR("item 1",plain) * BLOCK-ENTRY * SCALAR("item 2",plain) * BLOCK-END */ #include "yaml_private.h" /* * Ensure that the buffer contains the required number of characters. * Return 1 on success, 0 on failure (reader error or memory error). */ #define CACHE(parser,length) \ (parser->unread >= (length) \ ? 1 \ : yaml_parser_update_buffer(parser, (length))) /* * Advance the buffer pointer. */ #define SKIP(parser) \ (parser->mark.index ++, \ parser->mark.column ++, \ parser->unread --, \ parser->buffer.pointer += WIDTH(parser->buffer)) #define SKIP_LINE(parser) \ (IS_CRLF(parser->buffer) ? \ (parser->mark.index += 2, \ parser->mark.column = 0, \ parser->mark.line ++, \ parser->unread -= 2, \ parser->buffer.pointer += 2) : \ IS_BREAK(parser->buffer) ? \ (parser->mark.index ++, \ parser->mark.column = 0, \ parser->mark.line ++, \ parser->unread --, \ parser->buffer.pointer += WIDTH(parser->buffer)) : 0) /* * Copy a character to a string buffer and advance pointers. */ #define READ(parser,string) \ (STRING_EXTEND(parser,string) ? \ (COPY(string,parser->buffer), \ parser->mark.index ++, \ parser->mark.column ++, \ parser->unread --, \ 1) : 0) /* * Copy a line break character to a string buffer and advance pointers. */ #define READ_LINE(parser,string) \ (STRING_EXTEND(parser,string) ? \ (((CHECK_AT(parser->buffer,'\r',0) \ && CHECK_AT(parser->buffer,'\n',1)) ? /* CR LF -> LF */ \ (*((string).pointer++) = (yaml_char_t) '\n', \ parser->buffer.pointer += 2, \ parser->mark.index += 2, \ parser->mark.column = 0, \ parser->mark.line ++, \ parser->unread -= 2) : \ (CHECK_AT(parser->buffer,'\r',0) \ || CHECK_AT(parser->buffer,'\n',0)) ? /* CR|LF -> LF */ \ (*((string).pointer++) = (yaml_char_t) '\n', \ parser->buffer.pointer ++, \ parser->mark.index ++, \ parser->mark.column = 0, \ parser->mark.line ++, \ parser->unread --) : \ (CHECK_AT(parser->buffer,'\xC2',0) \ && CHECK_AT(parser->buffer,'\x85',1)) ? /* NEL -> LF */ \ (*((string).pointer++) = (yaml_char_t) '\n', \ parser->buffer.pointer += 2, \ parser->mark.index ++, \ parser->mark.column = 0, \ parser->mark.line ++, \ parser->unread --) : \ (CHECK_AT(parser->buffer,'\xE2',0) && \ CHECK_AT(parser->buffer,'\x80',1) && \ (CHECK_AT(parser->buffer,'\xA8',2) || \ CHECK_AT(parser->buffer,'\xA9',2))) ? /* LS|PS -> LS|PS */ \ (*((string).pointer++) = *(parser->buffer.pointer++), \ *((string).pointer++) = *(parser->buffer.pointer++), \ *((string).pointer++) = *(parser->buffer.pointer++), \ parser->mark.index ++, \ parser->mark.column = 0, \ parser->mark.line ++, \ parser->unread --) : 0), \ 1) : 0) /* * Public API declarations. */ YAML_DECLARE(int) yaml_parser_scan(yaml_parser_t *parser, yaml_token_t *token); /* * Error handling. */ static int yaml_parser_set_scanner_error(yaml_parser_t *parser, const char *context, yaml_mark_t context_mark, const char *problem); /* * High-level token API. */ YAML_DECLARE(int) yaml_parser_fetch_more_tokens(yaml_parser_t *parser); static int yaml_parser_fetch_next_token(yaml_parser_t *parser); /* * Potential simple keys. */ static int yaml_parser_stale_simple_keys(yaml_parser_t *parser); static int yaml_parser_save_simple_key(yaml_parser_t *parser); static int yaml_parser_remove_simple_key(yaml_parser_t *parser); static int yaml_parser_increase_flow_level(yaml_parser_t *parser); static int yaml_parser_decrease_flow_level(yaml_parser_t *parser); /* * Indentation treatment. */ static int yaml_parser_roll_indent(yaml_parser_t *parser, ptrdiff_t column, ptrdiff_t number, yaml_token_type_t type, yaml_mark_t mark); static int yaml_parser_unroll_indent(yaml_parser_t *parser, ptrdiff_t column); /* * Token fetchers. */ static int yaml_parser_fetch_stream_start(yaml_parser_t *parser); static int yaml_parser_fetch_stream_end(yaml_parser_t *parser); static int yaml_parser_fetch_directive(yaml_parser_t *parser); static int yaml_parser_fetch_document_indicator(yaml_parser_t *parser, yaml_token_type_t type); static int yaml_parser_fetch_flow_collection_start(yaml_parser_t *parser, yaml_token_type_t type); static int yaml_parser_fetch_flow_collection_end(yaml_parser_t *parser, yaml_token_type_t type); static int yaml_parser_fetch_flow_entry(yaml_parser_t *parser); static int yaml_parser_fetch_block_entry(yaml_parser_t *parser); static int yaml_parser_fetch_key(yaml_parser_t *parser); static int yaml_parser_fetch_value(yaml_parser_t *parser); static int yaml_parser_fetch_anchor(yaml_parser_t *parser, yaml_token_type_t type); static int yaml_parser_fetch_tag(yaml_parser_t *parser); static int yaml_parser_fetch_block_scalar(yaml_parser_t *parser, int literal); static int yaml_parser_fetch_flow_scalar(yaml_parser_t *parser, int single); static int yaml_parser_fetch_plain_scalar(yaml_parser_t *parser); /* * Token scanners. */ static int yaml_parser_scan_to_next_token(yaml_parser_t *parser); static int yaml_parser_scan_directive(yaml_parser_t *parser, yaml_token_t *token); static int yaml_parser_scan_directive_name(yaml_parser_t *parser, yaml_mark_t start_mark, yaml_char_t **name); static int yaml_parser_scan_version_directive_value(yaml_parser_t *parser, yaml_mark_t start_mark, int *major, int *minor); static int yaml_parser_scan_version_directive_number(yaml_parser_t *parser, yaml_mark_t start_mark, int *number); static int yaml_parser_scan_tag_directive_value(yaml_parser_t *parser, yaml_mark_t mark, yaml_char_t **handle, yaml_char_t **prefix); static int yaml_parser_scan_anchor(yaml_parser_t *parser, yaml_token_t *token, yaml_token_type_t type); static int yaml_parser_scan_tag(yaml_parser_t *parser, yaml_token_t *token); static int yaml_parser_scan_tag_handle(yaml_parser_t *parser, int directive, yaml_mark_t start_mark, yaml_char_t **handle); static int yaml_parser_scan_tag_uri(yaml_parser_t *parser, int directive, yaml_char_t *head, yaml_mark_t start_mark, yaml_char_t **uri); static int yaml_parser_scan_uri_escapes(yaml_parser_t *parser, int directive, yaml_mark_t start_mark, yaml_string_t *string); static int yaml_parser_scan_block_scalar(yaml_parser_t *parser, yaml_token_t *token, int literal); static int yaml_parser_scan_block_scalar_breaks(yaml_parser_t *parser, int *indent, yaml_string_t *breaks, yaml_mark_t start_mark, yaml_mark_t *end_mark); static int yaml_parser_scan_flow_scalar(yaml_parser_t *parser, yaml_token_t *token, int single); static int yaml_parser_scan_plain_scalar(yaml_parser_t *parser, yaml_token_t *token); /* * Get the next token. */ YAML_DECLARE(int) yaml_parser_scan(yaml_parser_t *parser, yaml_token_t *token) { assert(parser); /* Non-NULL parser object is expected. */ assert(token); /* Non-NULL token object is expected. */ /* Erase the token object. */ memset(token, 0, sizeof(yaml_token_t)); /* No tokens after STREAM-END or error. */ if (parser->stream_end_produced || parser->error) { return 1; } /* Ensure that the tokens queue contains enough tokens. */ if (!parser->token_available) { if (!yaml_parser_fetch_more_tokens(parser)) return 0; } /* Fetch the next token from the queue. */ *token = DEQUEUE(parser, parser->tokens); parser->token_available = 0; parser->tokens_parsed ++; if (token->type == YAML_STREAM_END_TOKEN) { parser->stream_end_produced = 1; } return 1; } /* * Set the scanner error and return 0. */ static int yaml_parser_set_scanner_error(yaml_parser_t *parser, const char *context, yaml_mark_t context_mark, const char *problem) { parser->error = YAML_SCANNER_ERROR; parser->context = context; parser->context_mark = context_mark; parser->problem = problem; parser->problem_mark = parser->mark; return 0; } /* * Ensure that the tokens queue contains at least one token which can be * returned to the Parser. */ YAML_DECLARE(int) yaml_parser_fetch_more_tokens(yaml_parser_t *parser) { int need_more_tokens; /* While we need more tokens to fetch, do it. */ while (1) { /* * Check if we really need to fetch more tokens. */ need_more_tokens = 0; if (parser->tokens.head == parser->tokens.tail) { /* Queue is empty. */ need_more_tokens = 1; } else { yaml_simple_key_t *simple_key; /* Check if any potential simple key may occupy the head position. */ if (!yaml_parser_stale_simple_keys(parser)) return 0; for (simple_key = parser->simple_keys.start; simple_key != parser->simple_keys.top; simple_key++) { if (simple_key->possible && simple_key->token_number == parser->tokens_parsed) { need_more_tokens = 1; break; } } } /* We are finished. */ if (!need_more_tokens) break; /* Fetch the next token. */ if (!yaml_parser_fetch_next_token(parser)) return 0; } parser->token_available = 1; return 1; } /* * The dispatcher for token fetchers. */ static int yaml_parser_fetch_next_token(yaml_parser_t *parser) { /* Ensure that the buffer is initialized. */ if (!CACHE(parser, 1)) return 0; /* Check if we just started scanning. Fetch STREAM-START then. */ if (!parser->stream_start_produced) return yaml_parser_fetch_stream_start(parser); /* Eat whitespaces and comments until we reach the next token. */ if (!yaml_parser_scan_to_next_token(parser)) return 0; /* Remove obsolete potential simple keys. */ if (!yaml_parser_stale_simple_keys(parser)) return 0; /* Check the indentation level against the current column. */ if (!yaml_parser_unroll_indent(parser, parser->mark.column)) return 0; /* * Ensure that the buffer contains at least 4 characters. 4 is the length * of the longest indicators ('--- ' and '... '). */ if (!CACHE(parser, 4)) return 0; /* Is it the end of the stream? */ if (IS_Z(parser->buffer)) return yaml_parser_fetch_stream_end(parser); /* Is it a directive? */ if (parser->mark.column == 0 && CHECK(parser->buffer, '%')) return yaml_parser_fetch_directive(parser); /* Is it the document start indicator? */ if (parser->mark.column == 0 && CHECK_AT(parser->buffer, '-', 0) && CHECK_AT(parser->buffer, '-', 1) && CHECK_AT(parser->buffer, '-', 2) && IS_BLANKZ_AT(parser->buffer, 3)) return yaml_parser_fetch_document_indicator(parser, YAML_DOCUMENT_START_TOKEN); /* Is it the document end indicator? */ if (parser->mark.column == 0 && CHECK_AT(parser->buffer, '.', 0) && CHECK_AT(parser->buffer, '.', 1) && CHECK_AT(parser->buffer, '.', 2) && IS_BLANKZ_AT(parser->buffer, 3)) return yaml_parser_fetch_document_indicator(parser, YAML_DOCUMENT_END_TOKEN); /* Is it the flow sequence start indicator? */ if (CHECK(parser->buffer, '[')) return yaml_parser_fetch_flow_collection_start(parser, YAML_FLOW_SEQUENCE_START_TOKEN); /* Is it the flow mapping start indicator? */ if (CHECK(parser->buffer, '{')) return yaml_parser_fetch_flow_collection_start(parser, YAML_FLOW_MAPPING_START_TOKEN); /* Is it the flow sequence end indicator? */ if (CHECK(parser->buffer, ']')) return yaml_parser_fetch_flow_collection_end(parser, YAML_FLOW_SEQUENCE_END_TOKEN); /* Is it the flow mapping end indicator? */ if (CHECK(parser->buffer, '}')) return yaml_parser_fetch_flow_collection_end(parser, YAML_FLOW_MAPPING_END_TOKEN); /* Is it the flow entry indicator? */ if (CHECK(parser->buffer, ',')) return yaml_parser_fetch_flow_entry(parser); /* Is it the block entry indicator? */ if (CHECK(parser->buffer, '-') && IS_BLANKZ_AT(parser->buffer, 1)) return yaml_parser_fetch_block_entry(parser); /* Is it the key indicator? */ if (CHECK(parser->buffer, '?') && (parser->flow_level || IS_BLANKZ_AT(parser->buffer, 1))) return yaml_parser_fetch_key(parser); /* Is it the value indicator? */ if (CHECK(parser->buffer, ':') && (parser->flow_level || IS_BLANKZ_AT(parser->buffer, 1))) return yaml_parser_fetch_value(parser); /* Is it an alias? */ if (CHECK(parser->buffer, '*')) return yaml_parser_fetch_anchor(parser, YAML_ALIAS_TOKEN); /* Is it an anchor? */ if (CHECK(parser->buffer, '&')) return yaml_parser_fetch_anchor(parser, YAML_ANCHOR_TOKEN); /* Is it a tag? */ if (CHECK(parser->buffer, '!')) return yaml_parser_fetch_tag(parser); /* Is it a literal scalar? */ if (CHECK(parser->buffer, '|') && !parser->flow_level) return yaml_parser_fetch_block_scalar(parser, 1); /* Is it a folded scalar? */ if (CHECK(parser->buffer, '>') && !parser->flow_level) return yaml_parser_fetch_block_scalar(parser, 0); /* Is it a single-quoted scalar? */ if (CHECK(parser->buffer, '\'')) return yaml_parser_fetch_flow_scalar(parser, 1); /* Is it a double-quoted scalar? */ if (CHECK(parser->buffer, '"')) return yaml_parser_fetch_flow_scalar(parser, 0); /* * Is it a plain scalar? * * A plain scalar may start with any non-blank characters except * * '-', '?', ':', ',', '[', ']', '{', '}', * '#', '&', '*', '!', '|', '>', '\'', '\"', * '%', '@', '`'. * * In the block context (and, for the '-' indicator, in the flow context * too), it may also start with the characters * * '-', '?', ':' * * if it is followed by a non-space character. * * The last rule is more restrictive than the specification requires. */ if (!(IS_BLANKZ(parser->buffer) || CHECK(parser->buffer, '-') || CHECK(parser->buffer, '?') || CHECK(parser->buffer, ':') || CHECK(parser->buffer, ',') || CHECK(parser->buffer, '[') || CHECK(parser->buffer, ']') || CHECK(parser->buffer, '{') || CHECK(parser->buffer, '}') || CHECK(parser->buffer, '#') || CHECK(parser->buffer, '&') || CHECK(parser->buffer, '*') || CHECK(parser->buffer, '!') || CHECK(parser->buffer, '|') || CHECK(parser->buffer, '>') || CHECK(parser->buffer, '\'') || CHECK(parser->buffer, '"') || CHECK(parser->buffer, '%') || CHECK(parser->buffer, '@') || CHECK(parser->buffer, '`')) || (CHECK(parser->buffer, '-') && !IS_BLANK_AT(parser->buffer, 1)) || (!parser->flow_level && (CHECK(parser->buffer, '?') || CHECK(parser->buffer, ':')) && !IS_BLANKZ_AT(parser->buffer, 1))) return yaml_parser_fetch_plain_scalar(parser); /* * If we don't determine the token type so far, it is an error. */ return yaml_parser_set_scanner_error(parser, "while scanning for the next token", parser->mark, "found character that cannot start any token"); } /* * Check the list of potential simple keys and remove the positions that * cannot contain simple keys anymore. */ static int yaml_parser_stale_simple_keys(yaml_parser_t *parser) { yaml_simple_key_t *simple_key; /* Check for a potential simple key for each flow level. */ for (simple_key = parser->simple_keys.start; simple_key != parser->simple_keys.top; simple_key ++) { /* * The specification requires that a simple key * * - is limited to a single line, * - is shorter than 1024 characters. */ if (simple_key->possible && (simple_key->mark.line < parser->mark.line || simple_key->mark.index+1024 < parser->mark.index)) { /* Check if the potential simple key to be removed is required. */ if (simple_key->required) { return yaml_parser_set_scanner_error(parser, "while scanning a simple key", simple_key->mark, "could not find expected ':'"); } simple_key->possible = 0; } } return 1; } /* * Check if a simple key may start at the current position and add it if * needed. */ static int yaml_parser_save_simple_key(yaml_parser_t *parser) { /* * A simple key is required at the current position if the scanner is in * the block context and the current column coincides with the indentation * level. */ int required = (!parser->flow_level && parser->indent == (ptrdiff_t)parser->mark.column); /* * A simple key is required only when it is the first token in the current * line. Therefore it is always allowed. But we add a check anyway. */ assert(parser->simple_key_allowed || !required); /* Impossible. */ /* * If the current position may start a simple key, save it. */ if (parser->simple_key_allowed) { yaml_simple_key_t simple_key; simple_key.possible = 1; simple_key.required = required; simple_key.token_number = parser->tokens_parsed + (parser->tokens.tail - parser->tokens.head); simple_key.mark = parser->mark; if (!yaml_parser_remove_simple_key(parser)) return 0; *(parser->simple_keys.top-1) = simple_key; } return 1; } /* * Remove a potential simple key at the current flow level. */ static int yaml_parser_remove_simple_key(yaml_parser_t *parser) { yaml_simple_key_t *simple_key = parser->simple_keys.top-1; if (simple_key->possible) { /* If the key is required, it is an error. */ if (simple_key->required) { return yaml_parser_set_scanner_error(parser, "while scanning a simple key", simple_key->mark, "could not find expected ':'"); } } /* Remove the key from the stack. */ simple_key->possible = 0; return 1; } /* * Increase the flow level and resize the simple key list if needed. */ static int yaml_parser_increase_flow_level(yaml_parser_t *parser) { yaml_simple_key_t empty_simple_key = { 0, 0, 0, { 0, 0, 0 } }; /* Reset the simple key on the next level. */ if (!PUSH(parser, parser->simple_keys, empty_simple_key)) return 0; /* Increase the flow level. */ if (parser->flow_level == INT_MAX) { parser->error = YAML_MEMORY_ERROR; return 0; } parser->flow_level++; return 1; } /* * Decrease the flow level. */ static int yaml_parser_decrease_flow_level(yaml_parser_t *parser) { if (parser->flow_level) { parser->flow_level --; (void)POP(parser, parser->simple_keys); } return 1; } /* * Push the current indentation level to the stack and set the new level * the current column is greater than the indentation level. In this case, * append or insert the specified token into the token queue. * */ static int yaml_parser_roll_indent(yaml_parser_t *parser, ptrdiff_t column, ptrdiff_t number, yaml_token_type_t type, yaml_mark_t mark) { yaml_token_t token; /* In the flow context, do nothing. */ if (parser->flow_level) return 1; if (parser->indent < column) { /* * Push the current indentation level to the stack and set the new * indentation level. */ if (!PUSH(parser, parser->indents, parser->indent)) return 0; #if PTRDIFF_MAX > INT_MAX if (column > INT_MAX) { parser->error = YAML_MEMORY_ERROR; return 0; } #endif parser->indent = (int)column; /* Create a token and insert it into the queue. */ TOKEN_INIT(token, type, mark, mark); if (number == -1) { if (!ENQUEUE(parser, parser->tokens, token)) return 0; } else { if (!QUEUE_INSERT(parser, parser->tokens, number - parser->tokens_parsed, token)) return 0; } } return 1; } /* * Pop indentation levels from the indents stack until the current level * becomes less or equal to the column. For each intendation level, append * the BLOCK-END token. */ static int yaml_parser_unroll_indent(yaml_parser_t *parser, ptrdiff_t column) { yaml_token_t token; /* In the flow context, do nothing. */ if (parser->flow_level) return 1; /* Loop through the intendation levels in the stack. */ while (parser->indent > column) { /* Create a token and append it to the queue. */ TOKEN_INIT(token, YAML_BLOCK_END_TOKEN, parser->mark, parser->mark); if (!ENQUEUE(parser, parser->tokens, token)) return 0; /* Pop the indentation level. */ parser->indent = POP(parser, parser->indents); } return 1; } /* * Initialize the scanner and produce the STREAM-START token. */ static int yaml_parser_fetch_stream_start(yaml_parser_t *parser) { yaml_simple_key_t simple_key = { 0, 0, 0, { 0, 0, 0 } }; yaml_token_t token; /* Set the initial indentation. */ parser->indent = -1; /* Initialize the simple key stack. */ if (!PUSH(parser, parser->simple_keys, simple_key)) return 0; /* A simple key is allowed at the beginning of the stream. */ parser->simple_key_allowed = 1; /* We have started. */ parser->stream_start_produced = 1; /* Create the STREAM-START token and append it to the queue. */ STREAM_START_TOKEN_INIT(token, parser->encoding, parser->mark, parser->mark); if (!ENQUEUE(parser, parser->tokens, token)) return 0; return 1; } /* * Produce the STREAM-END token and shut down the scanner. */ static int yaml_parser_fetch_stream_end(yaml_parser_t *parser) { yaml_token_t token; /* Force new line. */ if (parser->mark.column != 0) { parser->mark.column = 0; parser->mark.line ++; } /* Reset the indentation level. */ if (!yaml_parser_unroll_indent(parser, -1)) return 0; /* Reset simple keys. */ if (!yaml_parser_remove_simple_key(parser)) return 0; parser->simple_key_allowed = 0; /* Create the STREAM-END token and append it to the queue. */ STREAM_END_TOKEN_INIT(token, parser->mark, parser->mark); if (!ENQUEUE(parser, parser->tokens, token)) return 0; return 1; } /* * Produce a VERSION-DIRECTIVE or TAG-DIRECTIVE token. */ static int yaml_parser_fetch_directive(yaml_parser_t *parser) { yaml_token_t token; /* Reset the indentation level. */ if (!yaml_parser_unroll_indent(parser, -1)) return 0; /* Reset simple keys. */ if (!yaml_parser_remove_simple_key(parser)) return 0; parser->simple_key_allowed = 0; /* Create the YAML-DIRECTIVE or TAG-DIRECTIVE token. */ if (!yaml_parser_scan_directive(parser, &token)) return 0; /* Append the token to the queue. */ if (!ENQUEUE(parser, parser->tokens, token)) { yaml_token_delete(&token); return 0; } return 1; } /* * Produce the DOCUMENT-START or DOCUMENT-END token. */ static int yaml_parser_fetch_document_indicator(yaml_parser_t *parser, yaml_token_type_t type) { yaml_mark_t start_mark, end_mark; yaml_token_t token; /* Reset the indentation level. */ if (!yaml_parser_unroll_indent(parser, -1)) return 0; /* Reset simple keys. */ if (!yaml_parser_remove_simple_key(parser)) return 0; parser->simple_key_allowed = 0; /* Consume the token. */ start_mark = parser->mark; SKIP(parser); SKIP(parser); SKIP(parser); end_mark = parser->mark; /* Create the DOCUMENT-START or DOCUMENT-END token. */ TOKEN_INIT(token, type, start_mark, end_mark); /* Append the token to the queue. */ if (!ENQUEUE(parser, parser->tokens, token)) return 0; return 1; } /* * Produce the FLOW-SEQUENCE-START or FLOW-MAPPING-START token. */ static int yaml_parser_fetch_flow_collection_start(yaml_parser_t *parser, yaml_token_type_t type) { yaml_mark_t start_mark, end_mark; yaml_token_t token; /* The indicators '[' and '{' may start a simple key. */ if (!yaml_parser_save_simple_key(parser)) return 0; /* Increase the flow level. */ if (!yaml_parser_increase_flow_level(parser)) return 0; /* A simple key may follow the indicators '[' and '{'. */ parser->simple_key_allowed = 1; /* Consume the token. */ start_mark = parser->mark; SKIP(parser); end_mark = parser->mark; /* Create the FLOW-SEQUENCE-START of FLOW-MAPPING-START token. */ TOKEN_INIT(token, type, start_mark, end_mark); /* Append the token to the queue. */ if (!ENQUEUE(parser, parser->tokens, token)) return 0; return 1; } /* * Produce the FLOW-SEQUENCE-END or FLOW-MAPPING-END token. */ static int yaml_parser_fetch_flow_collection_end(yaml_parser_t *parser, yaml_token_type_t type) { yaml_mark_t start_mark, end_mark; yaml_token_t token; /* Reset any potential simple key on the current flow level. */ if (!yaml_parser_remove_simple_key(parser)) return 0; /* Decrease the flow level. */ if (!yaml_parser_decrease_flow_level(parser)) return 0; /* No simple keys after the indicators ']' and '}'. */ parser->simple_key_allowed = 0; /* Consume the token. */ start_mark = parser->mark; SKIP(parser); end_mark = parser->mark; /* Create the FLOW-SEQUENCE-END of FLOW-MAPPING-END token. */ TOKEN_INIT(token, type, start_mark, end_mark); /* Append the token to the queue. */ if (!ENQUEUE(parser, parser->tokens, token)) return 0; return 1; } /* * Produce the FLOW-ENTRY token. */ static int yaml_parser_fetch_flow_entry(yaml_parser_t *parser) { yaml_mark_t start_mark, end_mark; yaml_token_t token; /* Reset any potential simple keys on the current flow level. */ if (!yaml_parser_remove_simple_key(parser)) return 0; /* Simple keys are allowed after ','. */ parser->simple_key_allowed = 1; /* Consume the token. */ start_mark = parser->mark; SKIP(parser); end_mark = parser->mark; /* Create the FLOW-ENTRY token and append it to the queue. */ TOKEN_INIT(token, YAML_FLOW_ENTRY_TOKEN, start_mark, end_mark); if (!ENQUEUE(parser, parser->tokens, token)) return 0; return 1; } /* * Produce the BLOCK-ENTRY token. */ static int yaml_parser_fetch_block_entry(yaml_parser_t *parser) { yaml_mark_t start_mark, end_mark; yaml_token_t token; /* Check if the scanner is in the block context. */ if (!parser->flow_level) { /* Check if we are allowed to start a new entry. */ if (!parser->simple_key_allowed) { return yaml_parser_set_scanner_error(parser, NULL, parser->mark, "block sequence entries are not allowed in this context"); } /* Add the BLOCK-SEQUENCE-START token if needed. */ if (!yaml_parser_roll_indent(parser, parser->mark.column, -1, YAML_BLOCK_SEQUENCE_START_TOKEN, parser->mark)) return 0; } else { /* * It is an error for the '-' indicator to occur in the flow context, * but we let the Parser detect and report about it because the Parser * is able to point to the context. */ } /* Reset any potential simple keys on the current flow level. */ if (!yaml_parser_remove_simple_key(parser)) return 0; /* Simple keys are allowed after '-'. */ parser->simple_key_allowed = 1; /* Consume the token. */ start_mark = parser->mark; SKIP(parser); end_mark = parser->mark; /* Create the BLOCK-ENTRY token and append it to the queue. */ TOKEN_INIT(token, YAML_BLOCK_ENTRY_TOKEN, start_mark, end_mark); if (!ENQUEUE(parser, parser->tokens, token)) return 0; return 1; } /* * Produce the KEY token. */ static int yaml_parser_fetch_key(yaml_parser_t *parser) { yaml_mark_t start_mark, end_mark; yaml_token_t token; /* In the block context, additional checks are required. */ if (!parser->flow_level) { /* Check if we are allowed to start a new key (not nessesary simple). */ if (!parser->simple_key_allowed) { return yaml_parser_set_scanner_error(parser, NULL, parser->mark, "mapping keys are not allowed in this context"); } /* Add the BLOCK-MAPPING-START token if needed. */ if (!yaml_parser_roll_indent(parser, parser->mark.column, -1, YAML_BLOCK_MAPPING_START_TOKEN, parser->mark)) return 0; } /* Reset any potential simple keys on the current flow level. */ if (!yaml_parser_remove_simple_key(parser)) return 0; /* Simple keys are allowed after '?' in the block context. */ parser->simple_key_allowed = (!parser->flow_level); /* Consume the token. */ start_mark = parser->mark; SKIP(parser); end_mark = parser->mark; /* Create the KEY token and append it to the queue. */ TOKEN_INIT(token, YAML_KEY_TOKEN, start_mark, end_mark); if (!ENQUEUE(parser, parser->tokens, token)) return 0; return 1; } /* * Produce the VALUE token. */ static int yaml_parser_fetch_value(yaml_parser_t *parser) { yaml_mark_t start_mark, end_mark; yaml_token_t token; yaml_simple_key_t *simple_key = parser->simple_keys.top-1; /* Have we found a simple key? */ if (simple_key->possible) { /* Create the KEY token and insert it into the queue. */ TOKEN_INIT(token, YAML_KEY_TOKEN, simple_key->mark, simple_key->mark); if (!QUEUE_INSERT(parser, parser->tokens, simple_key->token_number - parser->tokens_parsed, token)) return 0; /* In the block context, we may need to add the BLOCK-MAPPING-START token. */ if (!yaml_parser_roll_indent(parser, simple_key->mark.column, simple_key->token_number, YAML_BLOCK_MAPPING_START_TOKEN, simple_key->mark)) return 0; /* Remove the simple key. */ simple_key->possible = 0; /* A simple key cannot follow another simple key. */ parser->simple_key_allowed = 0; } else { /* The ':' indicator follows a complex key. */ /* In the block context, extra checks are required. */ if (!parser->flow_level) { /* Check if we are allowed to start a complex value. */ if (!parser->simple_key_allowed) { return yaml_parser_set_scanner_error(parser, NULL, parser->mark, "mapping values are not allowed in this context"); } /* Add the BLOCK-MAPPING-START token if needed. */ if (!yaml_parser_roll_indent(parser, parser->mark.column, -1, YAML_BLOCK_MAPPING_START_TOKEN, parser->mark)) return 0; } /* Simple keys after ':' are allowed in the block context. */ parser->simple_key_allowed = (!parser->flow_level); } /* Consume the token. */ start_mark = parser->mark; SKIP(parser); end_mark = parser->mark; /* Create the VALUE token and append it to the queue. */ TOKEN_INIT(token, YAML_VALUE_TOKEN, start_mark, end_mark); if (!ENQUEUE(parser, parser->tokens, token)) return 0; return 1; } /* * Produce the ALIAS or ANCHOR token. */ static int yaml_parser_fetch_anchor(yaml_parser_t *parser, yaml_token_type_t type) { yaml_token_t token; /* An anchor or an alias could be a simple key. */ if (!yaml_parser_save_simple_key(parser)) return 0; /* A simple key cannot follow an anchor or an alias. */ parser->simple_key_allowed = 0; /* Create the ALIAS or ANCHOR token and append it to the queue. */ if (!yaml_parser_scan_anchor(parser, &token, type)) return 0; if (!ENQUEUE(parser, parser->tokens, token)) { yaml_token_delete(&token); return 0; } return 1; } /* * Produce the TAG token. */ static int yaml_parser_fetch_tag(yaml_parser_t *parser) { yaml_token_t token; /* A tag could be a simple key. */ if (!yaml_parser_save_simple_key(parser)) return 0; /* A simple key cannot follow a tag. */ parser->simple_key_allowed = 0; /* Create the TAG token and append it to the queue. */ if (!yaml_parser_scan_tag(parser, &token)) return 0; if (!ENQUEUE(parser, parser->tokens, token)) { yaml_token_delete(&token); return 0; } return 1; } /* * Produce the SCALAR(...,literal) or SCALAR(...,folded) tokens. */ static int yaml_parser_fetch_block_scalar(yaml_parser_t *parser, int literal) { yaml_token_t token; /* Remove any potential simple keys. */ if (!yaml_parser_remove_simple_key(parser)) return 0; /* A simple key may follow a block scalar. */ parser->simple_key_allowed = 1; /* Create the SCALAR token and append it to the queue. */ if (!yaml_parser_scan_block_scalar(parser, &token, literal)) return 0; if (!ENQUEUE(parser, parser->tokens, token)) { yaml_token_delete(&token); return 0; } return 1; } /* * Produce the SCALAR(...,single-quoted) or SCALAR(...,double-quoted) tokens. */ static int yaml_parser_fetch_flow_scalar(yaml_parser_t *parser, int single) { yaml_token_t token; /* A plain scalar could be a simple key. */ if (!yaml_parser_save_simple_key(parser)) return 0; /* A simple key cannot follow a flow scalar. */ parser->simple_key_allowed = 0; /* Create the SCALAR token and append it to the queue. */ if (!yaml_parser_scan_flow_scalar(parser, &token, single)) return 0; if (!ENQUEUE(parser, parser->tokens, token)) { yaml_token_delete(&token); return 0; } return 1; } /* * Produce the SCALAR(...,plain) token. */ static int yaml_parser_fetch_plain_scalar(yaml_parser_t *parser) { yaml_token_t token; /* A plain scalar could be a simple key. */ if (!yaml_parser_save_simple_key(parser)) return 0; /* A simple key cannot follow a flow scalar. */ parser->simple_key_allowed = 0; /* Create the SCALAR token and append it to the queue. */ if (!yaml_parser_scan_plain_scalar(parser, &token)) return 0; if (!ENQUEUE(parser, parser->tokens, token)) { yaml_token_delete(&token); return 0; } return 1; } /* * Eat whitespaces and comments until the next token is found. */ static int yaml_parser_scan_to_next_token(yaml_parser_t *parser) { /* Until the next token is not found. */ while (1) { /* Allow the BOM mark to start a line. */ if (!CACHE(parser, 1)) return 0; if (parser->mark.column == 0 && IS_BOM(parser->buffer)) SKIP(parser); /* * Eat whitespaces. * * Tabs are allowed: * * - in the flow context; * - in the block context, but not at the beginning of the line or * after '-', '?', or ':' (complex value). */ if (!CACHE(parser, 1)) return 0; while (CHECK(parser->buffer,' ') || ((parser->flow_level || !parser->simple_key_allowed) && CHECK(parser->buffer, '\t'))) { SKIP(parser); if (!CACHE(parser, 1)) return 0; } /* Eat a comment until a line break. */ if (CHECK(parser->buffer, '#')) { while (!IS_BREAKZ(parser->buffer)) { SKIP(parser); if (!CACHE(parser, 1)) return 0; } } /* If it is a line break, eat it. */ if (IS_BREAK(parser->buffer)) { if (!CACHE(parser, 2)) return 0; SKIP_LINE(parser); /* In the block context, a new line may start a simple key. */ if (!parser->flow_level) { parser->simple_key_allowed = 1; } } else { /* We have found a token. */ break; } } return 1; } /* * Scan a YAML-DIRECTIVE or TAG-DIRECTIVE token. * * Scope: * %YAML 1.1 # a comment \n * ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * %TAG !yaml! tag:yaml.org,2002: \n * ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ */ int yaml_parser_scan_directive(yaml_parser_t *parser, yaml_token_t *token) { yaml_mark_t start_mark, end_mark; yaml_char_t *name = NULL; int major, minor; yaml_char_t *handle = NULL, *prefix = NULL; /* Eat '%'. */ start_mark = parser->mark; SKIP(parser); /* Scan the directive name. */ if (!yaml_parser_scan_directive_name(parser, start_mark, &name)) goto error; /* Is it a YAML directive? */ if (strcmp((char *)name, "YAML") == 0) { /* Scan the VERSION directive value. */ if (!yaml_parser_scan_version_directive_value(parser, start_mark, &major, &minor)) goto error; end_mark = parser->mark; /* Create a VERSION-DIRECTIVE token. */ VERSION_DIRECTIVE_TOKEN_INIT(*token, major, minor, start_mark, end_mark); } /* Is it a TAG directive? */ else if (strcmp((char *)name, "TAG") == 0) { /* Scan the TAG directive value. */ if (!yaml_parser_scan_tag_directive_value(parser, start_mark, &handle, &prefix)) goto error; end_mark = parser->mark; /* Create a TAG-DIRECTIVE token. */ TAG_DIRECTIVE_TOKEN_INIT(*token, handle, prefix, start_mark, end_mark); } /* Unknown directive. */ else { yaml_parser_set_scanner_error(parser, "while scanning a directive", start_mark, "found uknown directive name"); goto error; } /* Eat the rest of the line including any comments. */ if (!CACHE(parser, 1)) goto error; while (IS_BLANK(parser->buffer)) { SKIP(parser); if (!CACHE(parser, 1)) goto error; } if (CHECK(parser->buffer, '#')) { while (!IS_BREAKZ(parser->buffer)) { SKIP(parser); if (!CACHE(parser, 1)) goto error; } } /* Check if we are at the end of the line. */ if (!IS_BREAKZ(parser->buffer)) { yaml_parser_set_scanner_error(parser, "while scanning a directive", start_mark, "did not find expected comment or line break"); goto error; } /* Eat a line break. */ if (IS_BREAK(parser->buffer)) { if (!CACHE(parser, 2)) goto error; SKIP_LINE(parser); } yaml_free(name); return 1; error: yaml_free(prefix); yaml_free(handle); yaml_free(name); return 0; } /* * Scan the directive name. * * Scope: * %YAML 1.1 # a comment \n * ^^^^ * %TAG !yaml! tag:yaml.org,2002: \n * ^^^ */ static int yaml_parser_scan_directive_name(yaml_parser_t *parser, yaml_mark_t start_mark, yaml_char_t **name) { yaml_string_t string = NULL_STRING; if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error; /* Consume the directive name. */ if (!CACHE(parser, 1)) goto error; while (IS_ALPHA(parser->buffer)) { if (!READ(parser, string)) goto error; if (!CACHE(parser, 1)) goto error; } /* Check if the name is empty. */ if (string.start == string.pointer) { yaml_parser_set_scanner_error(parser, "while scanning a directive", start_mark, "could not find expected directive name"); goto error; } /* Check for an blank character after the name. */ if (!IS_BLANKZ(parser->buffer)) { yaml_parser_set_scanner_error(parser, "while scanning a directive", start_mark, "found unexpected non-alphabetical character"); goto error; } *name = string.start; return 1; error: STRING_DEL(parser, string); return 0; } /* * Scan the value of VERSION-DIRECTIVE. * * Scope: * %YAML 1.1 # a comment \n * ^^^^^^ */ static int yaml_parser_scan_version_directive_value(yaml_parser_t *parser, yaml_mark_t start_mark, int *major, int *minor) { /* Eat whitespaces. */ if (!CACHE(parser, 1)) return 0; while (IS_BLANK(parser->buffer)) { SKIP(parser); if (!CACHE(parser, 1)) return 0; } /* Consume the major version number. */ if (!yaml_parser_scan_version_directive_number(parser, start_mark, major)) return 0; /* Eat '.'. */ if (!CHECK(parser->buffer, '.')) { return yaml_parser_set_scanner_error(parser, "while scanning a %YAML directive", start_mark, "did not find expected digit or '.' character"); } SKIP(parser); /* Consume the minor version number. */ if (!yaml_parser_scan_version_directive_number(parser, start_mark, minor)) return 0; return 1; } #define MAX_NUMBER_LENGTH 9 /* * Scan the version number of VERSION-DIRECTIVE. * * Scope: * %YAML 1.1 # a comment \n * ^ * %YAML 1.1 # a comment \n * ^ */ static int yaml_parser_scan_version_directive_number(yaml_parser_t *parser, yaml_mark_t start_mark, int *number) { int value = 0; size_t length = 0; /* Repeat while the next character is digit. */ if (!CACHE(parser, 1)) return 0; while (IS_DIGIT(parser->buffer)) { /* Check if the number is too long. */ if (++length > MAX_NUMBER_LENGTH) { return yaml_parser_set_scanner_error(parser, "while scanning a %YAML directive", start_mark, "found extremely long version number"); } value = value*10 + AS_DIGIT(parser->buffer); SKIP(parser); if (!CACHE(parser, 1)) return 0; } /* Check if the number was present. */ if (!length) { return yaml_parser_set_scanner_error(parser, "while scanning a %YAML directive", start_mark, "did not find expected version number"); } *number = value; return 1; } /* * Scan the value of a TAG-DIRECTIVE token. * * Scope: * %TAG !yaml! tag:yaml.org,2002: \n * ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ */ static int yaml_parser_scan_tag_directive_value(yaml_parser_t *parser, yaml_mark_t start_mark, yaml_char_t **handle, yaml_char_t **prefix) { yaml_char_t *handle_value = NULL; yaml_char_t *prefix_value = NULL; /* Eat whitespaces. */ if (!CACHE(parser, 1)) goto error; while (IS_BLANK(parser->buffer)) { SKIP(parser); if (!CACHE(parser, 1)) goto error; } /* Scan a handle. */ if (!yaml_parser_scan_tag_handle(parser, 1, start_mark, &handle_value)) goto error; /* Expect a whitespace. */ if (!CACHE(parser, 1)) goto error; if (!IS_BLANK(parser->buffer)) { yaml_parser_set_scanner_error(parser, "while scanning a %TAG directive", start_mark, "did not find expected whitespace"); goto error; } /* Eat whitespaces. */ while (IS_BLANK(parser->buffer)) { SKIP(parser); if (!CACHE(parser, 1)) goto error; } /* Scan a prefix. */ if (!yaml_parser_scan_tag_uri(parser, 1, NULL, start_mark, &prefix_value)) goto error; /* Expect a whitespace or line break. */ if (!CACHE(parser, 1)) goto error; if (!IS_BLANKZ(parser->buffer)) { yaml_parser_set_scanner_error(parser, "while scanning a %TAG directive", start_mark, "did not find expected whitespace or line break"); goto error; } *handle = handle_value; *prefix = prefix_value; return 1; error: yaml_free(handle_value); yaml_free(prefix_value); return 0; } static int yaml_parser_scan_anchor(yaml_parser_t *parser, yaml_token_t *token, yaml_token_type_t type) { int length = 0; yaml_mark_t start_mark, end_mark; yaml_string_t string = NULL_STRING; if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error; /* Eat the indicator character. */ start_mark = parser->mark; SKIP(parser); /* Consume the value. */ if (!CACHE(parser, 1)) goto error; while (IS_ALPHA(parser->buffer)) { if (!READ(parser, string)) goto error; if (!CACHE(parser, 1)) goto error; length ++; } end_mark = parser->mark; /* * Check if length of the anchor is greater than 0 and it is followed by * a whitespace character or one of the indicators: * * '?', ':', ',', ']', '}', '%', '@', '`'. */ if (!length || !(IS_BLANKZ(parser->buffer) || CHECK(parser->buffer, '?') || CHECK(parser->buffer, ':') || CHECK(parser->buffer, ',') || CHECK(parser->buffer, ']') || CHECK(parser->buffer, '}') || CHECK(parser->buffer, '%') || CHECK(parser->buffer, '@') || CHECK(parser->buffer, '`'))) { yaml_parser_set_scanner_error(parser, type == YAML_ANCHOR_TOKEN ? "while scanning an anchor" : "while scanning an alias", start_mark, "did not find expected alphabetic or numeric character"); goto error; } /* Create a token. */ if (type == YAML_ANCHOR_TOKEN) { ANCHOR_TOKEN_INIT(*token, string.start, start_mark, end_mark); } else { ALIAS_TOKEN_INIT(*token, string.start, start_mark, end_mark); } return 1; error: STRING_DEL(parser, string); return 0; } /* * Scan a TAG token. */ static int yaml_parser_scan_tag(yaml_parser_t *parser, yaml_token_t *token) { yaml_char_t *handle = NULL; yaml_char_t *suffix = NULL; yaml_mark_t start_mark, end_mark; start_mark = parser->mark; /* Check if the tag is in the canonical form. */ if (!CACHE(parser, 2)) goto error; if (CHECK_AT(parser->buffer, '<', 1)) { /* Set the handle to '' */ handle = yaml_malloc(1); if (!handle) goto error; handle[0] = '\0'; /* Eat '!<' */ SKIP(parser); SKIP(parser); /* Consume the tag value. */ if (!yaml_parser_scan_tag_uri(parser, 0, NULL, start_mark, &suffix)) goto error; /* Check for '>' and eat it. */ if (!CHECK(parser->buffer, '>')) { yaml_parser_set_scanner_error(parser, "while scanning a tag", start_mark, "did not find the expected '>'"); goto error; } SKIP(parser); } else { /* The tag has either the '!suffix' or the '!handle!suffix' form. */ /* First, try to scan a handle. */ if (!yaml_parser_scan_tag_handle(parser, 0, start_mark, &handle)) goto error; /* Check if it is, indeed, handle. */ if (handle[0] == '!' && handle[1] != '\0' && handle[strlen((char *)handle)-1] == '!') { /* Scan the suffix now. */ if (!yaml_parser_scan_tag_uri(parser, 0, NULL, start_mark, &suffix)) goto error; } else { /* It wasn't a handle after all. Scan the rest of the tag. */ if (!yaml_parser_scan_tag_uri(parser, 0, handle, start_mark, &suffix)) goto error; /* Set the handle to '!'. */ yaml_free(handle); handle = yaml_malloc(2); if (!handle) goto error; handle[0] = '!'; handle[1] = '\0'; /* * A special case: the '!' tag. Set the handle to '' and the * suffix to '!'. */ if (suffix[0] == '\0') { yaml_char_t *tmp = handle; handle = suffix; suffix = tmp; } } } /* Check the character which ends the tag. */ if (!CACHE(parser, 1)) goto error; if (!IS_BLANKZ(parser->buffer)) { yaml_parser_set_scanner_error(parser, "while scanning a tag", start_mark, "did not find expected whitespace or line break"); goto error; } end_mark = parser->mark; /* Create a token. */ TAG_TOKEN_INIT(*token, handle, suffix, start_mark, end_mark); return 1; error: yaml_free(handle); yaml_free(suffix); return 0; } /* * Scan a tag handle. */ static int yaml_parser_scan_tag_handle(yaml_parser_t *parser, int directive, yaml_mark_t start_mark, yaml_char_t **handle) { yaml_string_t string = NULL_STRING; if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error; /* Check the initial '!' character. */ if (!CACHE(parser, 1)) goto error; if (!CHECK(parser->buffer, '!')) { yaml_parser_set_scanner_error(parser, directive ? "while scanning a tag directive" : "while scanning a tag", start_mark, "did not find expected '!'"); goto error; } /* Copy the '!' character. */ if (!READ(parser, string)) goto error; /* Copy all subsequent alphabetical and numerical characters. */ if (!CACHE(parser, 1)) goto error; while (IS_ALPHA(parser->buffer)) { if (!READ(parser, string)) goto error; if (!CACHE(parser, 1)) goto error; } /* Check if the trailing character is '!' and copy it. */ if (CHECK(parser->buffer, '!')) { if (!READ(parser, string)) goto error; } else { /* * It's either the '!' tag or not really a tag handle. If it's a %TAG * directive, it's an error. If it's a tag token, it must be a part of * URI. */ if (directive && !(string.start[0] == '!' && string.start[1] == '\0')) { yaml_parser_set_scanner_error(parser, "while parsing a tag directive", start_mark, "did not find expected '!'"); goto error; } } *handle = string.start; return 1; error: STRING_DEL(parser, string); return 0; } /* * Scan a tag. */ static int yaml_parser_scan_tag_uri(yaml_parser_t *parser, int directive, yaml_char_t *head, yaml_mark_t start_mark, yaml_char_t **uri) { size_t length = head ? strlen((char *)head) : 0; yaml_string_t string = NULL_STRING; if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error; /* Resize the string to include the head. */ while ((size_t)(string.end - string.start) <= length) { if (!yaml_string_extend(&string.start, &string.pointer, &string.end)) { parser->error = YAML_MEMORY_ERROR; goto error; } } /* * Copy the head if needed. * * Note that we don't copy the leading '!' character. */ if (length > 1) { memcpy(string.start, head+1, length-1); string.pointer += length-1; } /* Scan the tag. */ if (!CACHE(parser, 1)) goto error; /* * The set of characters that may appear in URI is as follows: * * '0'-'9', 'A'-'Z', 'a'-'z', '_', '-', ';', '/', '?', ':', '@', '&', * '=', '+', '$', ',', '.', '!', '~', '*', '\'', '(', ')', '[', ']', * '%'. */ while (IS_ALPHA(parser->buffer) || CHECK(parser->buffer, ';') || CHECK(parser->buffer, '/') || CHECK(parser->buffer, '?') || CHECK(parser->buffer, ':') || CHECK(parser->buffer, '@') || CHECK(parser->buffer, '&') || CHECK(parser->buffer, '=') || CHECK(parser->buffer, '+') || CHECK(parser->buffer, '$') || CHECK(parser->buffer, ',') || CHECK(parser->buffer, '.') || CHECK(parser->buffer, '!') || CHECK(parser->buffer, '~') || CHECK(parser->buffer, '*') || CHECK(parser->buffer, '\'') || CHECK(parser->buffer, '(') || CHECK(parser->buffer, ')') || CHECK(parser->buffer, '[') || CHECK(parser->buffer, ']') || CHECK(parser->buffer, '%')) { /* Check if it is a URI-escape sequence. */ if (CHECK(parser->buffer, '%')) { if (!STRING_EXTEND(parser, string)) goto error; if (!yaml_parser_scan_uri_escapes(parser, directive, start_mark, &string)) goto error; } else { if (!READ(parser, string)) goto error; } length ++; if (!CACHE(parser, 1)) goto error; } /* Check if the tag is non-empty. */ if (!length) { if (!STRING_EXTEND(parser, string)) goto error; yaml_parser_set_scanner_error(parser, directive ? "while parsing a %TAG directive" : "while parsing a tag", start_mark, "did not find expected tag URI"); goto error; } *uri = string.start; return 1; error: STRING_DEL(parser, string); return 0; } /* * Decode an URI-escape sequence corresponding to a single UTF-8 character. */ static int yaml_parser_scan_uri_escapes(yaml_parser_t *parser, int directive, yaml_mark_t start_mark, yaml_string_t *string) { int width = 0; /* Decode the required number of characters. */ do { unsigned char octet = 0; /* Check for a URI-escaped octet. */ if (!CACHE(parser, 3)) return 0; if (!(CHECK(parser->buffer, '%') && IS_HEX_AT(parser->buffer, 1) && IS_HEX_AT(parser->buffer, 2))) { return yaml_parser_set_scanner_error(parser, directive ? "while parsing a %TAG directive" : "while parsing a tag", start_mark, "did not find URI escaped octet"); } /* Get the octet. */ octet = (AS_HEX_AT(parser->buffer, 1) << 4) + AS_HEX_AT(parser->buffer, 2); /* If it is the leading octet, determine the length of the UTF-8 sequence. */ if (!width) { width = (octet & 0x80) == 0x00 ? 1 : (octet & 0xE0) == 0xC0 ? 2 : (octet & 0xF0) == 0xE0 ? 3 : (octet & 0xF8) == 0xF0 ? 4 : 0; if (!width) { return yaml_parser_set_scanner_error(parser, directive ? "while parsing a %TAG directive" : "while parsing a tag", start_mark, "found an incorrect leading UTF-8 octet"); } } else { /* Check if the trailing octet is correct. */ if ((octet & 0xC0) != 0x80) { return yaml_parser_set_scanner_error(parser, directive ? "while parsing a %TAG directive" : "while parsing a tag", start_mark, "found an incorrect trailing UTF-8 octet"); } } /* Copy the octet and move the pointers. */ *(string->pointer++) = octet; SKIP(parser); SKIP(parser); SKIP(parser); } while (--width); return 1; } /* * Scan a block scalar. */ static int yaml_parser_scan_block_scalar(yaml_parser_t *parser, yaml_token_t *token, int literal) { yaml_mark_t start_mark; yaml_mark_t end_mark; yaml_string_t string = NULL_STRING; yaml_string_t leading_break = NULL_STRING; yaml_string_t trailing_breaks = NULL_STRING; int chomping = 0; int increment = 0; int indent = 0; int leading_blank = 0; int trailing_blank = 0; if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error; if (!STRING_INIT(parser, leading_break, INITIAL_STRING_SIZE)) goto error; if (!STRING_INIT(parser, trailing_breaks, INITIAL_STRING_SIZE)) goto error; /* Eat the indicator '|' or '>'. */ start_mark = parser->mark; SKIP(parser); /* Scan the additional block scalar indicators. */ if (!CACHE(parser, 1)) goto error; /* Check for a chomping indicator. */ if (CHECK(parser->buffer, '+') || CHECK(parser->buffer, '-')) { /* Set the chomping method and eat the indicator. */ chomping = CHECK(parser->buffer, '+') ? +1 : -1; SKIP(parser); /* Check for an indentation indicator. */ if (!CACHE(parser, 1)) goto error; if (IS_DIGIT(parser->buffer)) { /* Check that the intendation is greater than 0. */ if (CHECK(parser->buffer, '0')) { yaml_parser_set_scanner_error(parser, "while scanning a block scalar", start_mark, "found an intendation indicator equal to 0"); goto error; } /* Get the intendation level and eat the indicator. */ increment = AS_DIGIT(parser->buffer); SKIP(parser); } } /* Do the same as above, but in the opposite order. */ else if (IS_DIGIT(parser->buffer)) { if (CHECK(parser->buffer, '0')) { yaml_parser_set_scanner_error(parser, "while scanning a block scalar", start_mark, "found an intendation indicator equal to 0"); goto error; } increment = AS_DIGIT(parser->buffer); SKIP(parser); if (!CACHE(parser, 1)) goto error; if (CHECK(parser->buffer, '+') || CHECK(parser->buffer, '-')) { chomping = CHECK(parser->buffer, '+') ? +1 : -1; SKIP(parser); } } /* Eat whitespaces and comments to the end of the line. */ if (!CACHE(parser, 1)) goto error; while (IS_BLANK(parser->buffer)) { SKIP(parser); if (!CACHE(parser, 1)) goto error; } if (CHECK(parser->buffer, '#')) { while (!IS_BREAKZ(parser->buffer)) { SKIP(parser); if (!CACHE(parser, 1)) goto error; } } /* Check if we are at the end of the line. */ if (!IS_BREAKZ(parser->buffer)) { yaml_parser_set_scanner_error(parser, "while scanning a block scalar", start_mark, "did not find expected comment or line break"); goto error; } /* Eat a line break. */ if (IS_BREAK(parser->buffer)) { if (!CACHE(parser, 2)) goto error; SKIP_LINE(parser); } end_mark = parser->mark; /* Set the intendation level if it was specified. */ if (increment) { indent = parser->indent >= 0 ? parser->indent+increment : increment; } /* Scan the leading line breaks and determine the indentation level if needed. */ if (!yaml_parser_scan_block_scalar_breaks(parser, &indent, &trailing_breaks, start_mark, &end_mark)) goto error; /* Scan the block scalar content. */ if (!CACHE(parser, 1)) goto error; while ((int)parser->mark.column == indent && !IS_Z(parser->buffer)) { /* * We are at the beginning of a non-empty line. */ /* Is it a trailing whitespace? */ trailing_blank = IS_BLANK(parser->buffer); /* Check if we need to fold the leading line break. */ if (!literal && (*leading_break.start == '\n') && !leading_blank && !trailing_blank) { /* Do we need to join the lines by space? */ if (*trailing_breaks.start == '\0') { if (!STRING_EXTEND(parser, string)) goto error; *(string.pointer ++) = ' '; } CLEAR(parser, leading_break); } else { if (!JOIN(parser, string, leading_break)) goto error; CLEAR(parser, leading_break); } /* Append the remaining line breaks. */ if (!JOIN(parser, string, trailing_breaks)) goto error; CLEAR(parser, trailing_breaks); /* Is it a leading whitespace? */ leading_blank = IS_BLANK(parser->buffer); /* Consume the current line. */ while (!IS_BREAKZ(parser->buffer)) { if (!READ(parser, string)) goto error; if (!CACHE(parser, 1)) goto error; } /* Consume the line break. */ if (!CACHE(parser, 2)) goto error; if (!READ_LINE(parser, leading_break)) goto error; /* Eat the following intendation spaces and line breaks. */ if (!yaml_parser_scan_block_scalar_breaks(parser, &indent, &trailing_breaks, start_mark, &end_mark)) goto error; } /* Chomp the tail. */ if (chomping != -1) { if (!JOIN(parser, string, leading_break)) goto error; } if (chomping == 1) { if (!JOIN(parser, string, trailing_breaks)) goto error; } /* Create a token. */ SCALAR_TOKEN_INIT(*token, string.start, string.pointer-string.start, literal ? YAML_LITERAL_SCALAR_STYLE : YAML_FOLDED_SCALAR_STYLE, start_mark, end_mark); STRING_DEL(parser, leading_break); STRING_DEL(parser, trailing_breaks); return 1; error: STRING_DEL(parser, string); STRING_DEL(parser, leading_break); STRING_DEL(parser, trailing_breaks); return 0; } /* * Scan intendation spaces and line breaks for a block scalar. Determine the * intendation level if needed. */ static int yaml_parser_scan_block_scalar_breaks(yaml_parser_t *parser, int *indent, yaml_string_t *breaks, yaml_mark_t start_mark, yaml_mark_t *end_mark) { int max_indent = 0; *end_mark = parser->mark; /* Eat the intendation spaces and line breaks. */ while (1) { /* Eat the intendation spaces. */ if (!CACHE(parser, 1)) return 0; while ((!*indent || (int)parser->mark.column < *indent) && IS_SPACE(parser->buffer)) { SKIP(parser); if (!CACHE(parser, 1)) return 0; } if ((int)parser->mark.column > max_indent) max_indent = (int)parser->mark.column; /* Check for a tab character messing the intendation. */ if ((!*indent || (int)parser->mark.column < *indent) && IS_TAB(parser->buffer)) { return yaml_parser_set_scanner_error(parser, "while scanning a block scalar", start_mark, "found a tab character where an intendation space is expected"); } /* Have we found a non-empty line? */ if (!IS_BREAK(parser->buffer)) break; /* Consume the line break. */ if (!CACHE(parser, 2)) return 0; if (!READ_LINE(parser, *breaks)) return 0; *end_mark = parser->mark; } /* Determine the indentation level if needed. */ if (!*indent) { *indent = max_indent; if (*indent < parser->indent + 1) *indent = parser->indent + 1; if (*indent < 1) *indent = 1; } return 1; } /* * Scan a quoted scalar. */ static int yaml_parser_scan_flow_scalar(yaml_parser_t *parser, yaml_token_t *token, int single) { yaml_mark_t start_mark; yaml_mark_t end_mark; yaml_string_t string = NULL_STRING; yaml_string_t leading_break = NULL_STRING; yaml_string_t trailing_breaks = NULL_STRING; yaml_string_t whitespaces = NULL_STRING; int leading_blanks; if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error; if (!STRING_INIT(parser, leading_break, INITIAL_STRING_SIZE)) goto error; if (!STRING_INIT(parser, trailing_breaks, INITIAL_STRING_SIZE)) goto error; if (!STRING_INIT(parser, whitespaces, INITIAL_STRING_SIZE)) goto error; /* Eat the left quote. */ start_mark = parser->mark; SKIP(parser); /* Consume the content of the quoted scalar. */ while (1) { /* Check that there are no document indicators at the beginning of the line. */ if (!CACHE(parser, 4)) goto error; if (parser->mark.column == 0 && ((CHECK_AT(parser->buffer, '-', 0) && CHECK_AT(parser->buffer, '-', 1) && CHECK_AT(parser->buffer, '-', 2)) || (CHECK_AT(parser->buffer, '.', 0) && CHECK_AT(parser->buffer, '.', 1) && CHECK_AT(parser->buffer, '.', 2))) && IS_BLANKZ_AT(parser->buffer, 3)) { yaml_parser_set_scanner_error(parser, "while scanning a quoted scalar", start_mark, "found unexpected document indicator"); goto error; } /* Check for EOF. */ if (IS_Z(parser->buffer)) { yaml_parser_set_scanner_error(parser, "while scanning a quoted scalar", start_mark, "found unexpected end of stream"); goto error; } /* Consume non-blank characters. */ if (!CACHE(parser, 2)) goto error; leading_blanks = 0; while (!IS_BLANKZ(parser->buffer)) { /* Check for an escaped single quote. */ if (single && CHECK_AT(parser->buffer, '\'', 0) && CHECK_AT(parser->buffer, '\'', 1)) { if (!STRING_EXTEND(parser, string)) goto error; *(string.pointer++) = '\''; SKIP(parser); SKIP(parser); } /* Check for the right quote. */ else if (CHECK(parser->buffer, single ? '\'' : '"')) { break; } /* Check for an escaped line break. */ else if (!single && CHECK(parser->buffer, '\\') && IS_BREAK_AT(parser->buffer, 1)) { if (!CACHE(parser, 3)) goto error; SKIP(parser); SKIP_LINE(parser); leading_blanks = 1; break; } /* Check for an escape sequence. */ else if (!single && CHECK(parser->buffer, '\\')) { size_t code_length = 0; if (!STRING_EXTEND(parser, string)) goto error; /* Check the escape character. */ switch (parser->buffer.pointer[1]) { case '0': *(string.pointer++) = '\0'; break; case 'a': *(string.pointer++) = '\x07'; break; case 'b': *(string.pointer++) = '\x08'; break; case 't': case '\t': *(string.pointer++) = '\x09'; break; case 'n': *(string.pointer++) = '\x0A'; break; case 'v': *(string.pointer++) = '\x0B'; break; case 'f': *(string.pointer++) = '\x0C'; break; case 'r': *(string.pointer++) = '\x0D'; break; case 'e': *(string.pointer++) = '\x1B'; break; case ' ': *(string.pointer++) = '\x20'; break; case '"': *(string.pointer++) = '"'; break; case '\'': *(string.pointer++) = '\''; break; case '\\': *(string.pointer++) = '\\'; break; case 'N': /* NEL (#x85) */ *(string.pointer++) = '\xC2'; *(string.pointer++) = '\x85'; break; case '_': /* #xA0 */ *(string.pointer++) = '\xC2'; *(string.pointer++) = '\xA0'; break; case 'L': /* LS (#x2028) */ *(string.pointer++) = '\xE2'; *(string.pointer++) = '\x80'; *(string.pointer++) = '\xA8'; break; case 'P': /* PS (#x2029) */ *(string.pointer++) = '\xE2'; *(string.pointer++) = '\x80'; *(string.pointer++) = '\xA9'; break; case 'x': code_length = 2; break; case 'u': code_length = 4; break; case 'U': code_length = 8; break; default: yaml_parser_set_scanner_error(parser, "while parsing a quoted scalar", start_mark, "found unknown escape character"); goto error; } SKIP(parser); SKIP(parser); /* Consume an arbitrary escape code. */ if (code_length) { unsigned int value = 0; size_t k; /* Scan the character value. */ if (!CACHE(parser, code_length)) goto error; for (k = 0; k < code_length; k ++) { if (!IS_HEX_AT(parser->buffer, k)) { yaml_parser_set_scanner_error(parser, "while parsing a quoted scalar", start_mark, "did not find expected hexdecimal number"); goto error; } value = (value << 4) + AS_HEX_AT(parser->buffer, k); } /* Check the value and write the character. */ if ((value >= 0xD800 && value <= 0xDFFF) || value > 0x10FFFF) { yaml_parser_set_scanner_error(parser, "while parsing a quoted scalar", start_mark, "found invalid Unicode character escape code"); goto error; } if (value <= 0x7F) { *(string.pointer++) = value; } else if (value <= 0x7FF) { *(string.pointer++) = 0xC0 + (value >> 6); *(string.pointer++) = 0x80 + (value & 0x3F); } else if (value <= 0xFFFF) { *(string.pointer++) = 0xE0 + (value >> 12); *(string.pointer++) = 0x80 + ((value >> 6) & 0x3F); *(string.pointer++) = 0x80 + (value & 0x3F); } else { *(string.pointer++) = 0xF0 + (value >> 18); *(string.pointer++) = 0x80 + ((value >> 12) & 0x3F); *(string.pointer++) = 0x80 + ((value >> 6) & 0x3F); *(string.pointer++) = 0x80 + (value & 0x3F); } /* Advance the pointer. */ for (k = 0; k < code_length; k ++) { SKIP(parser); } } } else { /* It is a non-escaped non-blank character. */ if (!READ(parser, string)) goto error; } if (!CACHE(parser, 2)) goto error; } /* Check if we are at the end of the scalar. */ if (CHECK(parser->buffer, single ? '\'' : '"')) break; /* Consume blank characters. */ if (!CACHE(parser, 1)) goto error; while (IS_BLANK(parser->buffer) || IS_BREAK(parser->buffer)) { if (IS_BLANK(parser->buffer)) { /* Consume a space or a tab character. */ if (!leading_blanks) { if (!READ(parser, whitespaces)) goto error; } else { SKIP(parser); } } else { if (!CACHE(parser, 2)) goto error; /* Check if it is a first line break. */ if (!leading_blanks) { CLEAR(parser, whitespaces); if (!READ_LINE(parser, leading_break)) goto error; leading_blanks = 1; } else { if (!READ_LINE(parser, trailing_breaks)) goto error; } } if (!CACHE(parser, 1)) goto error; } /* Join the whitespaces or fold line breaks. */ if (leading_blanks) { /* Do we need to fold line breaks? */ if (leading_break.start[0] == '\n') { if (trailing_breaks.start[0] == '\0') { if (!STRING_EXTEND(parser, string)) goto error; *(string.pointer++) = ' '; } else { if (!JOIN(parser, string, trailing_breaks)) goto error; CLEAR(parser, trailing_breaks); } CLEAR(parser, leading_break); } else { if (!JOIN(parser, string, leading_break)) goto error; if (!JOIN(parser, string, trailing_breaks)) goto error; CLEAR(parser, leading_break); CLEAR(parser, trailing_breaks); } } else { if (!JOIN(parser, string, whitespaces)) goto error; CLEAR(parser, whitespaces); } } /* Eat the right quote. */ SKIP(parser); end_mark = parser->mark; /* Create a token. */ SCALAR_TOKEN_INIT(*token, string.start, string.pointer-string.start, single ? YAML_SINGLE_QUOTED_SCALAR_STYLE : YAML_DOUBLE_QUOTED_SCALAR_STYLE, start_mark, end_mark); STRING_DEL(parser, leading_break); STRING_DEL(parser, trailing_breaks); STRING_DEL(parser, whitespaces); return 1; error: STRING_DEL(parser, string); STRING_DEL(parser, leading_break); STRING_DEL(parser, trailing_breaks); STRING_DEL(parser, whitespaces); return 0; } /* * Scan a plain scalar. */ static int yaml_parser_scan_plain_scalar(yaml_parser_t *parser, yaml_token_t *token) { yaml_mark_t start_mark; yaml_mark_t end_mark; yaml_string_t string = NULL_STRING; yaml_string_t leading_break = NULL_STRING; yaml_string_t trailing_breaks = NULL_STRING; yaml_string_t whitespaces = NULL_STRING; int leading_blanks = 0; int indent = parser->indent+1; if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error; if (!STRING_INIT(parser, leading_break, INITIAL_STRING_SIZE)) goto error; if (!STRING_INIT(parser, trailing_breaks, INITIAL_STRING_SIZE)) goto error; if (!STRING_INIT(parser, whitespaces, INITIAL_STRING_SIZE)) goto error; start_mark = end_mark = parser->mark; /* Consume the content of the plain scalar. */ while (1) { /* Check for a document indicator. */ if (!CACHE(parser, 4)) goto error; if (parser->mark.column == 0 && ((CHECK_AT(parser->buffer, '-', 0) && CHECK_AT(parser->buffer, '-', 1) && CHECK_AT(parser->buffer, '-', 2)) || (CHECK_AT(parser->buffer, '.', 0) && CHECK_AT(parser->buffer, '.', 1) && CHECK_AT(parser->buffer, '.', 2))) && IS_BLANKZ_AT(parser->buffer, 3)) break; /* Check for a comment. */ if (CHECK(parser->buffer, '#')) break; /* Consume non-blank characters. */ while (!IS_BLANKZ(parser->buffer)) { /* Check for 'x:x' in the flow context. TODO: Fix the test "spec-08-13". */ if (parser->flow_level && CHECK(parser->buffer, ':') && !IS_BLANKZ_AT(parser->buffer, 1)) { yaml_parser_set_scanner_error(parser, "while scanning a plain scalar", start_mark, "found unexpected ':'"); goto error; } /* Check for indicators that may end a plain scalar. */ if ((CHECK(parser->buffer, ':') && IS_BLANKZ_AT(parser->buffer, 1)) || (parser->flow_level && (CHECK(parser->buffer, ',') || CHECK(parser->buffer, ':') || CHECK(parser->buffer, '?') || CHECK(parser->buffer, '[') || CHECK(parser->buffer, ']') || CHECK(parser->buffer, '{') || CHECK(parser->buffer, '}')))) break; /* Check if we need to join whitespaces and breaks. */ if (leading_blanks || whitespaces.start != whitespaces.pointer) { if (leading_blanks) { /* Do we need to fold line breaks? */ if (leading_break.start[0] == '\n') { if (trailing_breaks.start[0] == '\0') { if (!STRING_EXTEND(parser, string)) goto error; *(string.pointer++) = ' '; } else { if (!JOIN(parser, string, trailing_breaks)) goto error; CLEAR(parser, trailing_breaks); } CLEAR(parser, leading_break); } else { if (!JOIN(parser, string, leading_break)) goto error; if (!JOIN(parser, string, trailing_breaks)) goto error; CLEAR(parser, leading_break); CLEAR(parser, trailing_breaks); } leading_blanks = 0; } else { if (!JOIN(parser, string, whitespaces)) goto error; CLEAR(parser, whitespaces); } } /* Copy the character. */ if (!READ(parser, string)) goto error; end_mark = parser->mark; if (!CACHE(parser, 2)) goto error; } /* Is it the end? */ if (!(IS_BLANK(parser->buffer) || IS_BREAK(parser->buffer))) break; /* Consume blank characters. */ if (!CACHE(parser, 1)) goto error; while (IS_BLANK(parser->buffer) || IS_BREAK(parser->buffer)) { if (IS_BLANK(parser->buffer)) { /* Check for tab character that abuse intendation. */ if (leading_blanks && (int)parser->mark.column < indent && IS_TAB(parser->buffer)) { yaml_parser_set_scanner_error(parser, "while scanning a plain scalar", start_mark, "found a tab character that violate intendation"); goto error; } /* Consume a space or a tab character. */ if (!leading_blanks) { if (!READ(parser, whitespaces)) goto error; } else { SKIP(parser); } } else { if (!CACHE(parser, 2)) goto error; /* Check if it is a first line break. */ if (!leading_blanks) { CLEAR(parser, whitespaces); if (!READ_LINE(parser, leading_break)) goto error; leading_blanks = 1; } else { if (!READ_LINE(parser, trailing_breaks)) goto error; } } if (!CACHE(parser, 1)) goto error; } /* Check intendation level. */ if (!parser->flow_level && (int)parser->mark.column < indent) break; } /* Create a token. */ SCALAR_TOKEN_INIT(*token, string.start, string.pointer-string.start, YAML_PLAIN_SCALAR_STYLE, start_mark, end_mark); /* Note that we change the 'simple_key_allowed' flag. */ if (leading_blanks) { parser->simple_key_allowed = 1; } STRING_DEL(parser, leading_break); STRING_DEL(parser, trailing_breaks); STRING_DEL(parser, whitespaces); return 1; error: STRING_DEL(parser, string); STRING_DEL(parser, leading_break); STRING_DEL(parser, trailing_breaks); STRING_DEL(parser, whitespaces); return 0; }