Saxonica.com

Internal Changes

There has been a change to the TinyTree structure. When a node has a very large number of siblings, finding the parent node from one of the siblings could involve a long search. The search was only necessary in cases where the child was found without remembering the parent - for example, when it was found via the descendant axis. The result was that for trees with a large fan-out (typically XML documents representing relational database tables) a few XPath expressions could experience very poor performance. This problem has been solved by inserting parent pointers into the tree. These are not added to every node, because it's important to save space in the TinyTree, but instead one parent pointer is added for every ten siblings. Many trees rarely see a fan-out as high as ten, in which case these pointers will not appear at all. Note that the parent pointers are counted as extra nodes in the TinyTree statistics.

Another change made to the TinyTree is that it now contains a boolean flag indicating whether the tree contains any namespace nodes (other than the XML namespace). This reduces the cost of operations that copy an element node with all its namespaces.

When delayed evaluation is used for an expression (typically for a local variable), the context that the expression depends on is saved for use when the expression is eventually evaluated. Saxon now saves only those local variables that the expression actually references, whereas it previously saved the whole stack frame. This change has several benefits. Although the cost of doing the copying is small, not making copies of variables means that values can be discarded more quickly from memory by the garbage collector. Also, Saxon has an algorithm that prevents long chains of unevaluated expressions building up, especially during recursive processing: this algorithm is now more sensitive, as it only recognizes a chain where there is a genuine dependency. This prevents unnecessary early evaluation of expressions, which costs time and uses memory. In particular, there were situations where a local variable is declared in a loop, and each time round the loop the stack frame is saved, including the previous value of the same local variable, and so on.

Operations that atomize a sequence of nodes and expect the result to be a sequence containing zero or one atomic values (arithmetic is a typical example) have been speeded up by combining the operations of atomization and cardinality checking, and by providing a new method atomize() on the NodeInfo interface which (unlike the existing getTypedValue() method) returns the actual value rather than an iterator. This removes several iterators from the evaluation pipeline, thus reducing overheads.

When taking input from a DOMSource, Saxon now tests to see whether the supplied document supports DOM level 3 methods, and if it does so, it uses the DOM methods isSameNode() and compareDocumentPosition() when sorting nodes into document order. These have the potential to be more efficient than Saxon's algorithms for comparing document order (though it depends on the implementation, of course).

The RegexIterator class, supporting xsl:analyze-string and the saxon:analyze-string() extension function, has been rewritten to take advantage of the SequenceIterator redesign introduced some while ago, which means that it no longer needs to perform lookahead. This greatly simplifies the design and improves performance. In the course of this simplification, it was found that the last() function was not working on this iterator.

Where the values of constructed text nodes and attributes are known at compile time, Saxon now analyzes the value at compile time; if the value consists of ASCII characters with no special characters that might need escaping, the value is marked as such to save unnecessary work at serialization time. Previously this optimization was done only for attribute values of literal result elements in XSLT. Whitespace text nodes generated as a result of the indent="yes" option also benefit from this optimization.

The Expression.analyze() method, previously used to perform type checking and optimization on expressions in the abstract expression tree, has been split into two methods: typeCheck() and optimize(). This means that this phase of processing has been split into two phases. This has a number of benefits: (a) the optimize() phase is optional, and can be omitted in contexts such as saxon:evaluate() or when debugging; (b) it removes the problem that analyze() was being executed twice in XSLT, once at the XPath expression level and once at the level of a template or function (now, typeCheck() is called once for each XPath expression, and optimize() is called at the template or function level), and (c) it enables the optimization of a parent expression to perform tree rewrites after the type-checking of subordinate expressions, but before they are themselves optimized, which sometimes makes it easier to recognize the subexpressions that are amenable to optimization.

The way the XSLT current() function works has changed. Previously, this was quietly moved to the top level of an XPath expression by virtue of the fact that expressions that don't depend on a loop variable are moved outside the loop. The problem was that as optimization increasingly happens at the level of a template or function, it was difficult to ensure that the call was moved to the right place. There is now an explicit rewrite that replaces the call on current() with a reference to a system-generated variable at the level of an XPath expression as written in XSLT. During this rewrite I discovered bug 1200164, which required a fairly substantial redesign of the way current() works within a pattern.

A set of nested let expressions is now evaluated iteratively rather than by recursion, reducing the use of Java stack space and increasing the scope for the garbage collector to free heap memory. This affects XSLT as well as XQuery, because a sequence of local xsl:variable instructions is compiled into a nested set of let expressions. In 8.4 there was a restriction that tail call optimization was not invoked on an XSLT template whose body was a let expression (in source terms, a sequence of instructions starting with xsl:variable): this restriction has been removed.

Runtime type checking can now be performed within the push pipeline: this typically arises when an "as" attribute is used on xsl:template. Previously this resulted in the value of the template being physically constructed in memory, and then checked against the type. This led to a lot of unnecessary use of memory and copying of intermediate results. The results of the template are now checked on-the-fly. As well as improving performance, this also means that the error messages can be more precise about the location of the code that generated the incorrect content.

Next