Optimizations and performance improvements
The release introduces the ability to compile a query to Java source code. The facility is available in Saxon-SA, and it applies to the Java platform only.
This is a first release of this capability and it should be treated as a beta release, providing early access for evaluation. Feedback is very welcome. Restrictions include:
- Calls to extension functions (external Java functions) cannot be compiled. This includes calls to the built-in extension functions in the Saxon namespace.
- When the query contains external variables, there is no check that the supplied value conforms to the required type.
- There is very limited support for collations.
- The SequenceType document-node(element(ABC)) is not supported.
- There is no support for substitution groups in a schema-element() SequenceType.
- On most run-time errors, no information is provided linking the error to a location in the source query.
- The saxon:validate-type pragma is not supported.
- There is no separate compilation of modules: the whole query is compiled into a single Java class (sometimes with one or more inner classes).
- For large queries, it is possible that the generated Java code will exceed Java compiler limits.
It's important to have the right expectations for performance. Very often the query will run twice as
fast, but the speed-up factor is quite variable. A great deal of the time is spent in the
run-time library, and operations such as parsing, tree navigation, sorting, and serialization benefit
very little from compilation. Arithmetic computation, on the other hand, improves a lot (especially
when it takes place in a predicate or where condition), and so does
function calling. For simple path expressions such as /a/b/c/d there is no speed-up at all,
but for some queries involving complex recursive function calls the compiled code may be
a factor of four or five faster.
It is hoped to extend this capability to XSLT in a subsequent release. Compiling for .NET is still under consideration, as is direct generation of bytecode rather than Java source code.
For more details see
Indexing
The Saxon-SA optimizer causes certain filter expressions to be evaluated using indexing. There have been some changes in the strategy in this release.
There are essentially two kinds of index used: document-level indexes, and variable-level indexes. Document-level indexes are used when the expression being filtered is a path expression rooted at a document node; in this case the index (like an xsl:key index in XSLT) is attached to the document node, and lives as long as the document itself lives. Variable-level indexes are used when the expression being filtered is represented by a variable reference, which may be because it is written in the source as a variable reference, or because it is an expression that Saxon has moved out of a containing loop, thus creating a synthetic variable.
For document-level indexes:
- This release reinstates the ability to use document-level indexes in conjunction with the "=" operator and operands whose type is not known statically. The difficulty here is that when searching for an untypedAtomic value "12", say, we must search both for the string "12" and for the number 12 (as an integer, decimal, float or double), because all these cases match. So Saxon builds multiple indexes in this situation, and searches them all. In the vast majority of cases, of course, all the indexed values turn out to have the same dynamic type, but this cannot be predicted in advance unless the source expression declares types (for example by using an explicit cast).
- Previously, a document-level index would only be built for a path expression of the form /a/b/c[use=$value] if the path /a/b/c was a valid XSLT match pattern. This restriction was a consequence of re-using the implementation of xsl:key. The restriction has now been removed, and any absolute path expression may be used here, provided that it cannot create new nodes, and that the nodes it selects are all in the same document.
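The multiple-index strategy for untyped operands can be pictured roughly as follows. This is a minimal sketch and not Saxon's actual implementation: the class MultiTypeIndex and its methods are hypothetical names, nodes are represented by plain labels, and all numeric types are simplified to Java double (whose parsing rules differ slightly from the xs:double lexical form).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the Saxon API. */
public class MultiTypeIndex {
    // One index per comparison family: string keys and numeric keys.
    private final Map<String, List<String>> stringIndex = new HashMap<>();
    private final Map<Double, List<String>> numericIndex = new HashMap<>();

    /** Indexes a node (identified by a label) under a string value. */
    public void addString(String value, String nodeLabel) {
        stringIndex.computeIfAbsent(value, k -> new ArrayList<>()).add(nodeLabel);
    }

    /** Indexes a node under a numeric value (simplified to double). */
    public void addNumber(double value, String nodeLabel) {
        numericIndex.computeIfAbsent(value, k -> new ArrayList<>()).add(nodeLabel);
    }

    /** Looks up an untyped search value by probing every index that could match. */
    public List<String> lookupUntyped(String untyped) {
        List<String> result =
                new ArrayList<>(stringIndex.getOrDefault(untyped, List.of()));
        try {
            double d = Double.parseDouble(untyped.trim());
            result.addAll(numericIndex.getOrDefault(d, List.of()));
        } catch (NumberFormatException e) {
            // Not numeric: only the string index can match.
        }
        return result;
    }
}
```

Searching for the untyped value "12" probes both maps, so it finds a node indexed under the string "12" and a node indexed under the number 12. When all indexed values share one type, only one of the maps is ever populated.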
Variable-level indexes:
- A variable-level index is now used when evaluating a general comparison in which one operand can be evaluated outside a containing filter. For example, in the expression A[not(. = B)], where B does not depend on the context node, an indexed variable is now created to hold the contents of B, and each value in A causes an access to the index to determine if B contains an equal value. (This doesn't affect expressions such as A[. = B], where it is A that is indexed.)
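The effect of indexing B in A[not(. = B)] can be sketched in plain Java. This is an illustration under simplifying assumptions, not Saxon code: the class and method names are hypothetical, both sequences hold strings only, and a HashSet stands in for the indexed variable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not part of the Saxon API. */
public class VariableIndexSketch {
    /** Evaluates the equivalent of A[not(. = B)] for string values. */
    public static List<String> notInB(List<String> a, List<String> b) {
        // B does not depend on the context item, so it is evaluated once
        // and indexed before the filter runs.
        Set<String> indexOfB = new HashSet<>(b);
        List<String> result = new ArrayList<>();
        for (String item : a) {
            // One index probe per item of A replaces a scan of B.
            if (!indexOfB.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }
}
```

Without the index, each item of A would be compared against every item of B; with it, the cost per item of A is a single hash lookup.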
In previous releases a path expression was not optimized (and therefore never used an index) if the last step returned an atomic value rather than a node. This omission has been rectified.
Saxon now performs variable inlining where appropriate. If a variable is referenced only once, and the reference is not within any kind of loop, then the expression to which the variable is bound is substituted for the variable reference. This avoids the need to create run-time closures in such cases.
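The inlining rule can be illustrated on a toy expression tree. The sketch below is an assumption-laden model, not Saxon's expression tree: all class and method names are hypothetical, Loop stands for any construct that evaluates its body repeatedly, and nested variable declarations are not modelled.

```java
/** Hypothetical illustration only; not Saxon's expression tree. */
public class InlineSketch {
    public interface Expr {}

    public static final class Lit implements Expr {
        final int value;
        public Lit(int value) { this.value = value; }
    }

    public static final class VarRef implements Expr {
        final String name;
        public VarRef(String name) { this.name = name; }
    }

    public static final class Add implements Expr {
        final Expr left, right;
        public Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    /** Stands for any construct that evaluates its body repeatedly. */
    public static final class Loop implements Expr {
        final Expr body;
        public Loop(Expr body) { this.body = body; }
    }

    /** Counts references to a variable; a reference inside a loop counts as 2 ("many"). */
    public static int countRefs(Expr e, String name, boolean inLoop) {
        if (e instanceof VarRef) {
            if (!((VarRef) e).name.equals(name)) return 0;
            return inLoop ? 2 : 1;
        }
        if (e instanceof Add) {
            Add a = (Add) e;
            return countRefs(a.left, name, inLoop) + countRefs(a.right, name, inLoop);
        }
        if (e instanceof Loop) return countRefs(((Loop) e).body, name, true);
        return 0;
    }

    /** Replaces every reference to the variable with the given expression. */
    public static Expr substitute(Expr e, String name, Expr replacement) {
        if (e instanceof VarRef) return ((VarRef) e).name.equals(name) ? replacement : e;
        if (e instanceof Add) {
            Add a = (Add) e;
            return new Add(substitute(a.left, name, replacement),
                           substitute(a.right, name, replacement));
        }
        if (e instanceof Loop) return new Loop(substitute(((Loop) e).body, name, replacement));
        return e;
    }

    /** Returns the body with the variable inlined, or null when inlining is not appropriate. */
    public static Expr inlineLet(String name, Expr value, Expr body) {
        return countRefs(body, name, false) == 1 ? substitute(body, name, value) : null;
    }

    /** Renders an expression as text, for inspection. */
    public static String show(Expr e) {
        if (e instanceof Lit) return Integer.toString(((Lit) e).value);
        if (e instanceof VarRef) return "$" + ((VarRef) e).name;
        if (e instanceof Add) {
            Add a = (Add) e;
            return "(" + show(a.left) + " + " + show(a.right) + ")";
        }
        return "loop(" + show(((Loop) e).body) + ")";
    }
}
```

A variable bound to 1 + 2 and used once in $x + 5 is rewritten to (1 + 2) + 5. The same variable referenced twice, or referenced inside a Loop, is left alone, because substituting it would repeat the evaluation of the bound expression.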
Where the specification of a system function states that "if the value of argument N is an empty sequence, the result is X" (regardless of the values of any other arguments), and when argument N is known at compile time to be an empty sequence, then the call to the system function is replaced by X at optimization time.
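A rule table is one way to picture this rewrite. The sketch below is hypothetical and does not reflect how Saxon stores this information: the class, the Rule type and the fold method are invented for illustration, and result values are modelled as lists of strings. The two sample rules follow the Functions and Operators specification, under which fn:upper-case returns the zero-length string and fn:abs returns the empty sequence when the argument is an empty sequence.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration only; not part of the Saxon API. */
public class EmptyArgFolding {
    /** If argument number argIndex is empty, the call always yields resultWhenEmpty. */
    public static final class Rule {
        final int argIndex;
        final List<String> resultWhenEmpty;
        public Rule(int argIndex, List<String> resultWhenEmpty) {
            this.argIndex = argIndex;
            this.resultWhenEmpty = resultWhenEmpty;
        }
    }

    // Sample rules only; a real table would cover many more functions.
    private static final Map<String, Rule> RULES = Map.of(
            "upper-case", new Rule(0, List.of("")), // zero-length string
            "abs", new Rule(0, List.of()));         // empty sequence

    /**
     * Returns the constant that replaces the call, if the relevant argument is
     * known at compile time to be empty; otherwise returns Optional.empty().
     */
    public static Optional<List<String>> fold(String functionName,
                                              boolean... argKnownEmpty) {
        Rule rule = RULES.get(functionName);
        if (rule != null
                && rule.argIndex < argKnownEmpty.length
                && argKnownEmpty[rule.argIndex]) {
            return Optional.of(rule.resultWhenEmpty);
        }
        return Optional.empty();
    }
}
```

A call such as fold("abs", true) yields the empty sequence as a constant, while fold("abs", false) yields no replacement and the call is left to be evaluated at run time.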