Optimizations and performance improvements
Many internal iterators work with a one-item lookahead. This is wasteful if the iteration is not continued to completion, because the iterator has already read one item beyond the one being consumed. This happens, for example, with a numeric predicate such as expr[1], with an existential comparison such as sequenceA = sequenceB, or when converting a sequence to a string or a boolean. This lookahead has been removed for some commonly used iterators, notably the FilterIterator, the MappingIterator, and the TinyTree SiblingIterator.
A consequence is that the hasNext() method of SequenceIterator can now throw an XPathException, since it may now have to do real evaluation work to find out whether another item exists.
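The following sketch contrasts a filter iterator that always holds the next match ready (lookahead) with one that reads its input only when an item is requested. It is an illustration under stated assumptions, not Saxon's code: the PullIterator interface (whose next() returns null at the end of the sequence), CountingSource, LookaheadFilter and LazyFilter are all hypothetical names. The read counter shows how much of the base sequence each style consumes when only the first match is wanted, as with expr[1].

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

class LookaheadDemo {

    /** Hypothetical pull interface: next() returns null at the end of the sequence. */
    interface PullIterator<T> {
        T next();
    }

    /** Base sequence that counts how many items have been read from it. */
    static final class CountingSource<T> implements PullIterator<T> {
        private final Iterator<T> items;
        int reads = 0;
        CountingSource(List<T> items) { this.items = items.iterator(); }
        public T next() {
            if (!items.hasNext()) return null;
            reads++;
            return items.next();
        }
    }

    /** Lookahead style: always holds the next match ready so that hasNext() is cheap. */
    static final class LookaheadFilter<T> {
        private final PullIterator<T> base;
        private final Predicate<T> predicate;
        private T pending;
        LookaheadFilter(PullIterator<T> base, Predicate<T> predicate) {
            this.base = base;
            this.predicate = predicate;
            advance(); // priming: the search for the first match starts immediately
        }
        private void advance() {
            T item;
            do { item = base.next(); } while (item != null && !predicate.test(item));
            pending = item;
        }
        boolean hasNext() { return pending != null; }
        T next() {
            T result = pending;
            advance(); // searches for the following match even if it is never wanted
            return result;
        }
    }

    /** No lookahead: the base is read only when the consumer asks for an item. */
    static final class LazyFilter<T> implements PullIterator<T> {
        private final PullIterator<T> base;
        private final Predicate<T> predicate;
        LazyFilter(PullIterator<T> base, Predicate<T> predicate) {
            this.base = base;
            this.predicate = predicate;
        }
        public T next() {
            T item;
            do { item = base.next(); } while (item != null && !predicate.test(item));
            return item; // null signals the end of the sequence
        }
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Predicate<Integer> even = n -> n % 2 == 0;

        // First item of data[even], computed with each style
        CountingSource<Integer> s1 = new CountingSource<>(data);
        LookaheadFilter<Integer> eager = new LookaheadFilter<>(s1, even);
        Integer first1 = eager.hasNext() ? eager.next() : null;

        CountingSource<Integer> s2 = new CountingSource<>(data);
        Integer first2 = new LazyFilter<>(s2, even).next();

        System.out.println(first1 + " found after " + s1.reads + " reads (lookahead)");
        System.out.println(first2 + " found after " + s2.reads + " reads (no lookahead)");
    }
}
```

With this data both styles return 2, but the lookahead filter has read four base items by the time it hands the match back, while the lookahead-free filter has read only two.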
Deferred evaluation of variables was previously used whenever the expression was a SequenceExpression. It is now used only if the compile-time cardinality of the expression allows more than one item, which means that deferred evaluation will not be used for an expression of the form expr[1]. When deferred evaluation is used, the iterator is no longer primed by calling hasNext(). This means that, for an iterator that does no lookahead, the search for the first item is deferred until the variable is first used, and is not repeated unnecessarily. In addition, if the variable is referenced in a context where only the first item in the sequence is required (for example, to get its value as a boolean or as a string), the value is now saved without evaluating the full sequence.
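A rough model of the deferred-evaluation behaviour described above: nothing is evaluated when the variable is bound, a boolean or string use reads at most one item, and a later full use continues from where the first read stopped. LazyValue and Counting are hypothetical names, and the compile-time cardinality check that decides whether to defer at all is not modelled here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

class DeferredVariableDemo {

    /** Hypothetical holder for a variable whose evaluation is deferred. */
    static final class LazyValue<T> {
        private final Supplier<Iterator<T>> expression; // evaluates the bound expression
        private Iterator<T> iterator;                   // created on first use, not primed early
        private final List<T> saved = new ArrayList<>(); // items read so far

        LazyValue(Supplier<Iterator<T>> expression) { this.expression = expression; }

        private Iterator<T> iterator() {
            if (iterator == null) iterator = expression.get(); // evaluation starts here
            return iterator;
        }

        /** First item only (e.g. for a string or boolean value); reads at most one item. */
        T first() {
            if (saved.isEmpty() && iterator().hasNext()) saved.add(iterator().next());
            return saved.isEmpty() ? null : saved.get(0);
        }

        boolean effectiveBooleanValue() { return first() != null; }

        /** Full sequence: continues from wherever earlier calls stopped. */
        List<T> all() {
            Iterator<T> it = iterator();
            while (it.hasNext()) saved.add(it.next());
            return saved;
        }
    }

    /** Iterator over a list that counts how many items were actually produced. */
    static final class Counting<T> implements Iterator<T> {
        private final Iterator<T> base;
        int produced = 0;
        Counting(List<T> items) { base = items.iterator(); }
        public boolean hasNext() { return base.hasNext(); }
        public T next() { produced++; return base.next(); }
    }

    public static void main(String[] args) {
        Counting<String> source = new Counting<>(List.of("a", "b", "c"));
        LazyValue<String> v = new LazyValue<>(() -> source);
        System.out.println(source.produced);                                  // nothing read at binding time
        System.out.println(v.effectiveBooleanValue() + " " + source.produced); // one item read
        System.out.println(v.all() + " " + source.produced);                   // remainder read on demand
    }
}
```

Caching the items already read, rather than restarting the iterator, is what lets a first-item use and a later full use share a single evaluation.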
I have added an optimization for constructs of the form <xsl:if test="a | b">. Where a union expression is evaluated in a boolean context, it is now treated as if the operator were "or". This potentially avoids the need to sort the two node-sets into document order.
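To illustrate why the boolean-context rewrite helps, the sketch below compares building the union (sorted into document order and de-duplicated) before testing it with testing each operand as an "or". Nodes are modelled as Integer document-order positions and each operand as a Supplier; these modelling choices and all names are hypothetical, not Saxon's.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Supplier;

class UnionBooleanDemo {
    static int evaluations = 0; // how many operands were actually evaluated

    /** Hypothetical operand: a node-set given as document-order positions. */
    static Supplier<List<Integer>> nodes(Integer... positions) {
        return () -> { evaluations++; return Arrays.asList(positions); };
    }

    /** Naive: build a | b in document order (sorted, duplicates removed), then test it. */
    static boolean unionThenTest(Supplier<List<Integer>> a, Supplier<List<Integer>> b) {
        TreeSet<Integer> union = new TreeSet<>(a.get());
        union.addAll(b.get());
        return !union.isEmpty();
    }

    /** Rewritten: boolean(a | b) as boolean(a) or boolean(b); no sort, and b may be skipped. */
    static boolean asOr(Supplier<List<Integer>> a, Supplier<List<Integer>> b) {
        return !a.get().isEmpty() || !b.get().isEmpty();
    }

    public static void main(String[] args) {
        evaluations = 0;
        boolean r1 = unionThenTest(nodes(5, 2), nodes(3, 2));
        int naive = evaluations;
        evaluations = 0;
        boolean r2 = asOr(nodes(5, 2), nodes(3, 2));
        System.out.println(r1 + " / " + r2 + ", operands evaluated: " + naive + " vs " + evaluations);
    }
}
```

Both forms give the same answer because a union is non-empty exactly when at least one operand is non-empty; the "or" form simply never needs the document-order sort.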
There are some changes in the way global variables are handled. At compile time, a hash table is now used instead of a linear search to detect duplicate declarations; this should improve compilation performance for stylesheets with many global variables, especially when many of them are overridden by an importing stylesheet. At run time, evaluation of a global variable is now deferred until its first reference, which will improve execution performance when some global variables are never referenced. Note that this change will be visible if <xsl:message> is used to trace execution, since the order in which global variables are evaluated changes.
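A minimal sketch of both changes, under simplifying assumptions: declarations are looked up by name in a HashMap rather than by scanning earlier declarations, a higher import precedence overrides a lower one, and each variable is evaluated on first reference and then cached. The class names are hypothetical, and the duplicate rule is simplified: it reports an equal-precedence duplicate immediately, ignoring the XSLT rule that a binding with higher precedence makes such duplicates harmless.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class GlobalVariablesDemo {

    /** Hypothetical compiled global variable: evaluated on first reference, then cached. */
    static final class GlobalVariable {
        final String name;
        final int precedence;                // import precedence: higher wins
        private final Supplier<Object> select;
        private Object value;
        private boolean evaluated = false;

        GlobalVariable(String name, int precedence, Supplier<Object> select) {
            this.name = name;
            this.precedence = precedence;
            this.select = select;
        }

        Object value() {
            if (!evaluated) {                // first reference triggers evaluation
                value = select.get();
                evaluated = true;
            }
            return value;
        }
    }

    private final Map<String, GlobalVariable> byName = new HashMap<>();

    /** Compile time: constant-time hash lookup instead of a linear scan. */
    void declare(GlobalVariable v) {
        GlobalVariable existing = byName.get(v.name);
        if (existing == null || v.precedence > existing.precedence) {
            byName.put(v.name, v);           // new, or overrides a lower-precedence declaration
        } else if (v.precedence == existing.precedence) {
            throw new IllegalStateException("Duplicate global variable: " + v.name);
        }                                    // otherwise v is overridden and ignored
    }

    /** Run time: evaluates the variable only if and when it is referenced. */
    Object reference(String name) {
        return byName.get(name).value();
    }

    public static void main(String[] args) {
        int[] evaluations = {0};
        GlobalVariablesDemo globals = new GlobalVariablesDemo();
        globals.declare(new GlobalVariable("used", 1, () -> { evaluations[0]++; return "imported"; }));
        globals.declare(new GlobalVariable("unused", 1, () -> { evaluations[0]++; return "never"; }));
        globals.declare(new GlobalVariable("used", 2, () -> { evaluations[0]++; return "override"; }));
        System.out.println(globals.reference("used") + ", evaluations: " + evaluations[0]);
    }
}
```

Only the overriding declaration of "used" is ever evaluated; the overridden one and the never-referenced "unused" cost nothing at run time.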
A filter expression of the form f[a and b] is now rewritten as f[a][b] when appropriate, to enable an early exit in the case where a is positional: for example, item[position() = 1 and child::desc]. This is done only if a is positional and b is not.
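The effect of the rewrite can be sketched by counting how many items of f each form reads when a is position() = 1. The combined form has to test its predicate against every item, whereas the split form can stop after the first item; because b is not positional, applying it to the single surviving item gives the same result. The method names are hypothetical and the predicate b is an arbitrary stand-in for something like child::desc.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

class PositionalFilterDemo {
    static int itemsRead = 0;

    /** Iterator over the base sequence f that counts the items read. */
    static <T> Iterator<T> counting(List<T> items) {
        Iterator<T> it = items.iterator();
        return new Iterator<T>() {
            public boolean hasNext() { return it.hasNext(); }
            public T next() { itemsRead++; return it.next(); }
        };
    }

    /** f[position() = 1 and b]: the combined predicate is tested on every item. */
    static <T> List<T> combined(Iterator<T> f, Predicate<T> b) {
        List<T> out = new ArrayList<>();
        int position = 0;
        while (f.hasNext()) {
            T item = f.next();
            position++;
            if (position == 1 && b.test(item)) out.add(item);
        }
        return out;
    }

    /** f[position() = 1][b]: the positional filter exits after the first item. */
    static <T> List<T> split(Iterator<T> f, Predicate<T> b) {
        List<T> out = new ArrayList<>();
        if (f.hasNext()) {
            T first = f.next();
            if (b.test(first)) out.add(first); // b is not positional, so it applies unchanged
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> items = List.of("x", "y", "z", "w");
        Predicate<String> b = s -> s.startsWith("x"); // stand-in for child::desc
        itemsRead = 0;
        List<String> r1 = combined(counting(items), b);
        int readCombined = itemsRead;
        itemsRead = 0;
        List<String> r2 = split(counting(items), b);
        System.out.println(r1 + " " + r2 + ", items read: " + readCombined + " vs " + itemsRead);
    }
}
```

If b were itself positional (for example position() = last()), its positions would be relative to the output of f[a] rather than to f, and the two forms could disagree; that is why the rewrite is restricted to a non-positional b.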
A union (or intersection or difference) of two path expressions is now rewritten to do the combination as late as possible: for example, ( /a/b/c | /a/b/d ) is rewritten as ( /a/b/(c|d) ), so that the common prefix /a/b is evaluated only once.
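The union case of this rewrite can be sketched as factoring out the longest run of identical leading steps. In this hypothetical helper, steps are compared as plain strings, so the match is purely syntactic, and paths are assumed to be simple absolute paths with no predicates or "//" (the naive split on "/" would break otherwise). The rewrite is skipped when there is no shared prefix, and each side keeps at least one step after the prefix.

```java
import java.util.Arrays;
import java.util.List;

class UnionFactoringDemo {

    /**
     * Rewrites "/p/x | /p/y" as "/p/(x|y)" when both paths share a non-empty
     * run of identical leading steps. Returns null if the rewrite does not apply.
     */
    static String factorUnion(List<String> left, List<String> right) {
        int common = 0;
        // Leave at least one step on each side so that both operands stay non-empty
        while (common < left.size() - 1 && common < right.size() - 1
                && left.get(common).equals(right.get(common))) {
            common++;
        }
        if (common == 0) return null; // no shared prefix: keep the original union
        String prefix = "/" + String.join("/", left.subList(0, common));
        String l = String.join("/", left.subList(common, left.size()));
        String r = String.join("/", right.subList(common, right.size()));
        return prefix + "/(" + l + "|" + r + ")";
    }

    /** "/a/b/c" becomes [a, b, c]; assumes a simple absolute path with no predicates. */
    static List<String> steps(String path) {
        return Arrays.asList(path.substring(1).split("/"));
    }

    public static void main(String[] args) {
        System.out.println(factorUnion(steps("/a/b/c"), steps("/a/b/d"))); // /a/b/(c|d)
        System.out.println(factorUnion(steps("/a/x"), steps("/b/x")));     // null
    }
}
```

Because steps are compared by text only, two prefixes that are equivalent but written differently are not recognized as identical.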
Note that this is a first small step toward the identification of common subexpressions. The cases in which two subexpressions are recognized as identical are fairly limited; for example, there is no knowledge of which operators are commutative or associative.