Normalized PE Grammar Tree ========================== This file documents the tree generated by the transformational package 'pg::peg::normalize'. The input to the transformation is assumed to be a 'Raw PE Grammar AS Tree', as generated by the PEG frontend, and described in 'doc_grammar.txt'. General information ------------------- * The tree is implemented using 'struct::tree'. * The tree is used as a higher data structures keeping the various parts of a grammar together, which are: Start parsing expression, definitions, and their parsing expressions. The tree nature of the parsing expressions map especially nicely to this data structure. Structure --------- * The root node represents the overall grammar. It has one child node for the start expression, and one child per definition of a nonterminal symbol in the grammar. No other nodes are possible. The order of the children is not specified and an implementation detail. Attributes in the root provide quick access, and the nodes can also be distinguished by the attributes they have and/or their values. * A definition node represents a single nonterminal definition from the grammar. Most of the information describing the definition is stored in attributes of the node. Sole exception is the parsing expression associated with the defined nonterminal symbol. This is represented by an expression subtree, the root of which is the single child of the definition node. * All other nodes represent a parsing expression with the operator stored in the node and its arguments represented by the children of the node. For operators allowing more than one argument the children will be in the same order as specified in the grammar. I.e. the first child represents the first argument to the operator, and so on. Attributes ---------- Name Type Details ---- ---- ------- name string Root only. The name of the grammar represented by the tree. ---- ---- ------- start noderef Root only. Id of the tree node which is the root of the start expression. A child of the root node. Does not intersect with the set of definition nodes. Can be empty, representing a grammar without start expression. ---- ---- ------- definitions Root only. Maps the names (strings) of nonterminal dict symbols to the ids of the tree nodes (noderef) which represents the definition of that symbol. The nodes are all immediate children of the root node. None of them can be the root of the start expression however. The dictionary can be empty, representing a grammar which has no nonterminal symbols. ---- ---- ------- undefined Root only. Maps the name (string) of a nonterminal dict symbol which has no definition in the grammar to a list containting the ids of the tree nodes (noderef) which use the symbol despite that. I.e. if this value is not empty the grammar is invalid and has 'holes'. ==== ==== ======= symbol string Root and definition nodes only. The name of the nonterminal symbol whose definition the node is representing. For root this is '<StartExpression>'. It is defined for root so that some algorithms on expressions can use it as a sentinel. ---- ---- ------- label string Definition nodes only. The name of the input grammar level nonterminal symbol represented by the node. This is normally identical to 'symbol', but can differ for helper definitions introduced by transformations. For such 'symbol' will refer to the generated name of the symbol, and 'label' to the name of the symbol in the grammar the helper belongs to. ---- ---- ------- mode enum Definition nodes only. Values in {value, discard, leaf, match}. Specifies how the defined nonterminal handles the generation of its semantic value during matching. ---- ---- ------- users list Definition nodes only. A list containing the ids of the tree nodes which reference this definition. These nodes are always expression nodes, with operator 'n'. The list can be empty, representing a nonterminal symbol which is defined, but not used anywhere in grammar. ==== ==== ======= op enum All expression nodes. Values in {n, t, .., epsilon, alpha, alnum, x, /, ?, *, +, !, &}. Specifies the operator part of the expression represented by the node. ---- ---- ------- char char Expression nodes with operator 't' (t-Nodes) only. Value is the single character from the grammar to match, as represented by Tcl. I.e. any quoting from the input has been resolved. ---- ---- ------- begin char ..-Nodes only. Values are like 'char' above, the first end char and last character in the range to match. ---- ---- ------- sym string n-Nodes only. The name of the nonterminal symbol to match. ---- ---- ------- def noderef n-Nodes only. The id of the definition node for the nonterminal symbol to match. Can be empty. In that case the node repesents a try to match an undefined nonterminal symbol. The value of 'sym' will be a key in the dictionary of root->undefined, and the id of this node an element in the list associated with that key. ==== ==== ======= at*, to* See 'doc_grammar.txt' for the general definition. All nodes except root. Definition nodes: The span of input covered by the definition. Expression nodes: The span of input covered by the expression. The nodes for the operators dot, alpha, alnum, epsilon have no location information right now. Nodes based on them may have only partial or no information as well. ---- ---- -------