The Saxon-SA optimizer causes certain filter expressions to be evaluated using indexing. There have been some changes in the strategy in this release.
There are essentially two kinds of index used: document-level indexes, and variable-level indexes. Document-level indexes are used when the expression being filtered is a path expression rooted at a document node; in this case the index (like an xsl:key index in XSLT) is attached to the document node, and lives as long as the document itself lives. Variable-level indexes are used when the expression being filtered is represented by a variable reference, which may be because it is written in the source as a variable reference, or because it is an expression that Saxon has moved out of a containing loop, thus creating a synthetic variable.
For document-level indexes:
This release reinstates the ability to use document-level indexes in conjunction with the "=" operator and operands whose type is not known statically. The difficulty here is that when searching for an untypedAtomic value "12", say, we must search both for the string "12" and for the number 12 (as an integer, decimal, float or double), because all these cases match. So Saxon builds multiple indexes in this situation, and searches them all. In the vast majority of cases, of course, all the indexed values turn out to have the same dynamic type, but this cannot be predicted in advance unless the source expression declares types (for example by using an explicit cast).
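The multiple-index idea can be illustrated with a small Python sketch. This is only a conceptual model and not Saxon's implementation: the class `MultiTypeIndex`, its two tables, and the use of Python `float` to stand in for all numeric types are assumptions made for illustration.

```python
from collections import defaultdict


class MultiTypeIndex:
    """Toy index that keeps one lookup table per comparison type (hypothetical)."""

    def __init__(self):
        self.by_string = defaultdict(list)  # string value -> nodes
        self.by_number = defaultdict(list)  # numeric value -> nodes

    def add(self, key, node):
        # Numbers go in the numeric table; everything else is indexed as a string.
        if isinstance(key, (int, float)) and not isinstance(key, bool):
            self.by_number[float(key)].append(node)
        else:
            self.by_string[str(key)].append(node)

    def lookup_untyped(self, text):
        # An untyped search value may match as a string or as a number,
        # so both tables are probed and the hits are combined.
        hits = list(self.by_string.get(text, []))
        try:
            hits.extend(self.by_number.get(float(text), []))
        except ValueError:
            pass  # not numeric: only the string table can match
        return hits


index = MultiTypeIndex()
index.add("12", "node-with-string")
index.add(12, "node-with-integer")
index.add(12.0, "node-with-double")
matches = index.lookup_untyped("12")
```

In this sketch a search for the untyped value "12" finds all three nodes, whereas a single string-only or number-only table would miss some of them.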
Previously, a document-level index would only be built for a path expression of the form /a/b/c[use=$value] if the path /a/b/c was a valid XSLT match pattern. This restriction was a consequence of re-using the implementation of xsl:key. The restriction has now been removed, and any absolute path expression may be used here, provided that it cannot create new nodes, and that the nodes it selects are all in the same document.
For variable-level indexes:
A variable-level index is now used when evaluating a general comparison in which one operand can be evaluated outside a containing filter. For example, in the expression A[not(. = B)], where B does not depend on the context node, an indexed variable is now created to hold the contents of B, and each value in A causes an access to the index to determine if B contains an equal value. (This doesn't affect expressions such as A[. = B], where it is A that is indexed.)
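The effect of indexing B in A[not(. = B)] can be sketched in Python as follows. This is a simplified model and not Saxon's code: the function name `filter_not_in` is hypothetical, and plain Python equality on hashable values stands in for the type-sensitive rules of the XPath "=" operator.

```python
def filter_not_in(a_values, b_values):
    """Return the items of A that have no equal item in B, i.e. A[not(. = B)]."""
    # B does not depend on the context item, so it is evaluated and indexed
    # once, outside the filter.
    b_index = set(b_values)
    # Each item of A then costs one index probe rather than a scan of B.
    return [a for a in a_values if a not in b_index]


remaining = filter_not_in([1, 2, 3, 4], [2, 4, 6])
```

Without the index, every item in A would be compared against every item in B; with it, the work is roughly proportional to the size of A plus the size of B.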
Other optimization changes
In previous releases a path expression was not optimized (and therefore never used an index) if the last step returned an atomic value rather than a node. This omission has been rectified.
Saxon now performs variable inlining where appropriate. If a variable is referenced only once, and the reference is not within any kind of loop, then the expression to which the variable is bound is substituted for the variable reference. This avoids the need to create run-time closures in such cases.
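The inlining rule can be sketched on a toy expression tree. Everything here is hypothetical and for illustration only: the node classes (`Var`, `Lit`, `Call`, `Loop`, `Let`) and the function `inline_if_safe` are not Saxon classes, `Loop` stands in for any looping construct, and variable names are assumed to be unique so that shadowing need not be considered.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Lit:
    value: object


@dataclass
class Call:
    fn: str
    args: list


@dataclass
class Loop:
    body: object


@dataclass
class Let:
    name: str
    value: object
    body: object


def count_refs(expr, name, in_loop=False):
    """Return (total references, references inside a loop) for variable `name`."""
    if isinstance(expr, Var):
        hit = expr.name == name
        return int(hit), int(hit and in_loop)
    if isinstance(expr, Call):
        children = expr.args
    elif isinstance(expr, Loop):
        children, in_loop = [expr.body], True
    else:
        children = []
    total = looped = 0
    for child in children:
        t, l = count_refs(child, name, in_loop)
        total += t
        looped += l
    return total, looped


def substitute(expr, name, value):
    """Return a copy of `expr` with references to `name` replaced by `value`."""
    if isinstance(expr, Var):
        return value if expr.name == name else expr
    if isinstance(expr, Call):
        return Call(expr.fn, [substitute(a, name, value) for a in expr.args])
    if isinstance(expr, Loop):
        return Loop(substitute(expr.body, name, value))
    return expr


def inline_if_safe(let):
    """Inline the variable only if it is referenced exactly once, outside any loop."""
    total, looped = count_refs(let.body, let.name)
    if total == 1 and looped == 0:
        return substitute(let.body, let.name, let.value)
    return let
```

A binding whose single reference sits inside a `Loop` is left alone in this sketch, because inlining it would cause the bound expression to be re-evaluated on every iteration.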
Where the specification of a system function states that "if the value of argument N is an empty sequence, the result is X" (regardless of the values of any other arguments), and when argument N is known at compile time to be an empty sequence, then the call to the system function is replaced by X at optimization time.
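As an illustration of this rewrite, the following Python sketch folds a function call to a constant when the decisive argument is statically empty. The rule table, the `UNKNOWN` marker and the function `fold_empty_argument` are assumptions made for the example and do not reflect Saxon's internal data structures; the three rules shown follow the specified behaviour of string-length, upper-case and abs for an empty argument.

```python
EMPTY = ()          # stands for the XPath empty sequence
UNKNOWN = object()  # marks an argument whose value is not known at compile time

# Hypothetical table: function name -> (position of the decisive argument,
# result when that argument is the empty sequence)
EMPTY_ARG_RULES = {
    "string-length": (0, 0),
    "upper-case": (0, ""),
    "abs": (0, EMPTY),
}


def fold_empty_argument(fn, args):
    """Return ('const', X) if the call can be replaced by X, else ('call', fn, args)."""
    rule = EMPTY_ARG_RULES.get(fn)
    if rule is not None:
        position, result = rule
        # Only the decisive argument matters; other arguments are ignored.
        if position < len(args) and args[position] == EMPTY:
            return ("const", result)
    return ("call", fn, args)


folded = fold_empty_argument("string-length", [EMPTY])
kept = fold_empty_argument("upper-case", [UNKNOWN])
```

Here `folded` is a constant, while `kept` remains a function call because its argument is not known at compile time.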