Normalized PE Grammar Tree
==========================

This file documents the tree generated by the transformation package
'pg::peg::normalize'. The input to the transformation is assumed to be
a 'Raw PE Grammar AS Tree', as generated by the PEG frontend and
described in 'doc_grammar.txt'.

General information
-------------------

* The tree is implemented using 'struct::tree'.

* The tree is used as a higher-level data structure keeping together
  the various parts of a grammar: the start parsing expression, the
  definitions, and their parsing expressions. The tree nature of
  parsing expressions maps especially well onto this data structure.

Structure
---------

* The root node represents the overall grammar. It has one child node
  for the start expression, and one child per definition of a
  nonterminal symbol in the grammar. The root has no other children.
  The order of the children is unspecified and an implementation
  detail. Attributes in the root provide quick access to the
  children, which can also be distinguished by the attributes they
  have and/or their values.

* A definition node represents a single nonterminal definition from
  the grammar. Most of the information describing the definition is
  stored in attributes of the node. The sole exception is the parsing
  expression associated with the defined nonterminal symbol. This is
  represented by an expression subtree, the root of which is the
  single child of the definition node.

* All other nodes represent a parsing expression, with the operator
  stored in the node and its arguments represented by the children of
  the node. For operators allowing more than one argument, the
  children are in the same order as the arguments in the grammar,
  i.e. the first child represents the first argument to the operator,
  and so on.
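The three kinds of nodes can be illustrated with a small sketch. It is only an illustration under stated assumptions: the `GrammarTree` class, its `add` method, the node ids and the example grammar (Sign <- '+' / '-', with Sign as the start expression) are all hypothetical, and plain Python dicts stand in for the 'struct::tree' object. It builds a root with one definition child and one start-expression child, and shows that the arguments of '/' keep their grammar order.

```python
# Plain-dict stand-in for the 'struct::tree' object (hypothetical, not its Tcl API):
# node id -> attribute dict, and node id -> ordered list of child ids.
class GrammarTree:
    def __init__(self):
        self.attrs = {"root": {}}
        self.children = {"root": []}
        self._count = 0

    def add(self, parent, attributes):
        """Append a new node as the last child of `parent` and return its id."""
        self._count += 1
        node = f"node{self._count}"
        self.attrs[node] = dict(attributes)
        self.children[node] = []
        self.children[parent].append(node)
        return node


# Illustrative grammar: Sign <- '+' / '-' ; start expression: Sign
t = GrammarTree()

# One root child per definition; its single child roots the expression subtree.
sign = t.add("root", {"symbol": "Sign", "label": "Sign", "mode": "value"})
choice = t.add(sign, {"op": "/"})
t.add(choice, {"op": "t", "char": "+"})  # first argument of '/'
t.add(choice, {"op": "t", "char": "-"})  # second argument of '/'

# One root child for the start expression (here a single n-node).
start = t.add("root", {"op": "n", "sym": "Sign", "def": sign})

t.attrs["root"].update(name="SignGrammar", start=start, symbol="<StartExpression>",
                       definitions={"Sign": sign}, undefined={})

# The arguments appear in grammar order.
print([t.attrs[c]["char"] for c in t.children[choice]])  # ['+', '-']
```

The order in which the root's children are created here is arbitrary, matching the statement that their order is an implementation detail; only the order of operator arguments is significant.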

Attributes
----------

Name	Type	Details
----	----	-------
name	string	Root only. The name of the grammar represented by the
		tree.
----	----	-------
start	noderef	Root only. Id of the tree node which is the root of
		the start expression. This node is a child of the
		root, but never one of the definition nodes. Can be
		empty, representing a grammar without a start
		expression.
----	----	-------
definitions	Root only. Maps the names (strings) of nonterminal
	dict	symbols to the ids of the tree nodes (noderef) which
		represent the definitions of these symbols. The nodes
		are all immediate children of the root node, but none
		of them can be the root of the start expression. The
		dictionary can be empty, representing a grammar
		without nonterminal symbols.
----	----	-------
undefined	Root only. Maps the name (string) of each nonterminal
	dict	symbol which has no definition in the grammar to a
		list containing the ids of the tree nodes (noderef)
		which nevertheless use that symbol. If this value is
		not empty, the grammar is invalid, i.e. it has
		'holes'.
====	====	=======
symbol	string	Root and definition nodes only. The name of the
		nonterminal symbol whose definition the node
		represents. For the root this is '<StartExpression>'.
		It is defined for the root so that some algorithms on
		expressions can use it as a sentinel.
----	----	-------
label	string	Definition nodes only. The name of the nonterminal
		symbol of the input grammar represented by the node.
		This is normally identical to 'symbol', but can
		differ for helper definitions introduced by
		transformations. For such helpers 'symbol' refers to
		the generated name of the helper symbol, and 'label'
		to the name of the grammar symbol the helper belongs
		to.
----	----	-------
mode	enum	Definition nodes only. Values in {value, discard,
		leaf, match}. Specifies how the defined nonterminal
		handles the generation of its semantic value during
		matching.
----	----	-------
users	list	Definition nodes only. A list containing the ids of
		the tree nodes which reference this definition. These
		nodes are always expression nodes, with operator
		'n'. The list can be empty, representing a nonterminal
		symbol which is defined, but not used anywhere in
		the grammar.
====	====	=======
op	enum	All expression nodes. Values in {n, t, .., dot,
		epsilon, alpha, alnum, x, /, ?, *, +, !, &}.
		Specifies the operator part of the expression
		represented by the node.
----	----	-------
char	char	Expression nodes with operator 't' (t-Nodes)
		only. The value is the single character from the
		grammar to match, as represented by Tcl, i.e. any
		quoting from the input has been resolved.
----	----	-------
begin	char	..-Nodes only. Values are like 'char' above: the
end	char	first and last character of the range to match.
----	----	-------
sym	string	n-Nodes only. The name of the nonterminal symbol to
		match.
----	----	-------
def	noderef	n-Nodes only. The id of the definition node for the
		nonterminal symbol to match. Can be empty. In that
		case the node represents an attempt to match an
		undefined nonterminal symbol. The value of 'sym' is
		then a key in the dictionary root->undefined, and the
		id of this node an element of the list associated
		with that key.
====	====	=======
at*, to*	See 'doc_grammar.txt' for the general definition.

		All nodes except root.

		Definition nodes: The span of input covered by the
				  definition.

		Expression nodes: The span of input covered by the
				  expression.

		The nodes for the operators

			dot, alpha, alnum, epsilon

		currently have no location information. Nodes based
		on them may have only partial or no location
		information as well.
----	----	-------
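The cross-references between 'definitions', 'undefined', 'users', 'sym' and 'def' can be recomputed and checked mechanically. The sketch below assumes a hypothetical plain-dict representation of the tree (node ids mapped to attribute dicts and to ordered child lists, root id "root", an empty noderef written as the empty string); the function names `link_symbols`, `compute_users` and `check_root` are made up for illustration and are not part of 'pg::peg::normalize'. It fills 'def' on n-nodes and rebuilds root->undefined, derives 'users' from those links, and checks the root invariants.

```python
# Hypothetical representation: `attrs` maps node id -> attribute dict,
# `children` maps node id -> ordered child ids, the root id is "root",
# and an empty noderef is the empty string.


def link_symbols(attrs):
    """Set 'def' on every n-node and rebuild root->undefined."""
    definitions = attrs["root"]["definitions"]
    undefined = {}
    for node, a in attrs.items():
        if a.get("op") != "n":
            continue
        a["def"] = definitions.get(a["sym"], "")
        if not a["def"]:
            # Use of a symbol without a definition: a 'hole' in the grammar.
            undefined.setdefault(a["sym"], []).append(node)
    attrs["root"]["undefined"] = undefined
    return undefined


def compute_users(attrs):
    """Set 'users' on every definition node from the 'def' of the n-nodes."""
    users = {d: [] for d in attrs["root"]["definitions"].values()}
    for node, a in attrs.items():
        if a.get("op") == "n" and a.get("def"):
            users[a["def"]].append(node)
    for d, nodes in users.items():
        attrs[d]["users"] = nodes
    return users


def check_root(attrs, children):
    """Check the root invariants; return True if the grammar has no holes."""
    root = attrs["root"]
    kids = set(children["root"])
    defs = set(root["definitions"].values())
    start = root["start"]
    if start in defs:
        raise ValueError("start expression is also a definition node")
    # The root's children are exactly the definitions plus the start expression.
    if kids != defs | ({start} if start else set()):
        raise ValueError("unexpected set of root children")
    return not root["undefined"]


# Illustrative grammar: start Word; Word <- Char+ ; Char <- Letter (left undefined).
attrs = {
    "root": {"name": "G", "start": "s", "symbol": "<StartExpression>",
             "definitions": {"Word": "d1", "Char": "d2"}},
    "s": {"op": "n", "sym": "Word"},
    "d1": {"symbol": "Word"},
    "e1": {"op": "+"},
    "e2": {"op": "n", "sym": "Char"},
    "d2": {"symbol": "Char"},
    "e3": {"op": "n", "sym": "Letter"},
}
children = {"root": ["s", "d1", "d2"], "d1": ["e1"], "e1": ["e2"], "d2": ["e3"]}

print(link_symbols(attrs))          # {'Letter': ['e3']}
print(compute_users(attrs))         # {'d1': ['s'], 'd2': ['e2']}
print(check_root(attrs, children))  # False
```

'def' is computed before 'users' because the latter is derived from the former; the start expression's n-node counts as a user of Word, since it too references that definition.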