Optimizations and performance improvements

The release introduces the ability to compile a query to Java source code. The facility is available in Saxon-SA, and it applies to the Java platform only.

This is a first release of this capability and it should be treated as a beta release, providing early access for evaluation. Feedback is very welcome. Restrictions include:

It's important to have the right expectations for performance. Very often the query will run twice as fast, but the speed-up factor is quite variable. A great deal of the time is spent in the run-time library, and operations such as parsing, tree navigation, sorting, and serialization benefit very little from compilation. Arithmetic computation, on the other hand, improves a lot (especially when it takes place in a predicate or a where clause), and so does function calling. For simple path expressions such as /a/b/c/d there is no speed-up at all, but for some queries involving complex recursive function calls the compiled code may be four or five times faster.

It is hoped to extend this capability to XSLT in a subsequent release. Compiling for .NET is still under consideration, as is direct generation of bytecode rather than Java source code.

For more details see Compiling Queries to Java code.

Indexing

The Saxon-SA optimizer causes certain filter expressions to be evaluated using indexing. There have been some changes in the strategy in this release.

There are essentially two kinds of index used: document-level indexes, and variable-level indexes. Document-level indexes are used when the expression being filtered is a path expression rooted at a document node; in this case the index (like an xsl:key index in XSLT) is attached to the document node, and lives as long as the document itself lives. Variable-level indexes are used when the expression being filtered is represented by a variable reference, which may be because it is written in the source as a variable reference, or because it is an expression that Saxon has moved out of a containing loop, thus creating a synthetic variable.
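The benefit of evaluating a filter expression through an index can be sketched in Java. This is a minimal illustration, not Saxon's implementation: the class FilterIndexSketch, the Item type, and all method names are hypothetical. It assumes a filter of the form items[key = $value], and compares a scan on every evaluation with a hash index that is built once and then reused, whether it is held by a document node or by a variable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names are Saxon APIs.
class FilterIndexSketch {

    // Stand-in for a node with a single comparable key value.
    static final class Item {
        final String key;
        final String payload;
        Item(String key, String payload) { this.key = key; this.payload = payload; }
    }

    // Unindexed evaluation of items[key = $value]: every call scans every item.
    static List<Item> filterByScan(List<Item> items, String value) {
        List<Item> result = new ArrayList<>();
        for (Item item : items) {
            if (item.key.equals(value)) {
                result.add(item);
            }
        }
        return result;
    }

    // Build the index once; each key maps to its items in their original order.
    static Map<String, List<Item>> buildIndex(List<Item> items) {
        Map<String, List<Item>> index = new HashMap<>();
        for (Item item : items) {
            index.computeIfAbsent(item.key, k -> new ArrayList<>()).add(item);
        }
        return index;
    }

    // Indexed evaluation: a single hash lookup per call.
    static List<Item> filterByIndex(Map<String, List<Item>> index, String value) {
        return index.getOrDefault(value, Collections.<Item>emptyList());
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        items.add(new Item("a", "first"));
        items.add(new Item("b", "second"));
        items.add(new Item("a", "third"));

        Map<String, List<Item>> index = buildIndex(items);
        System.out.println(filterByScan(items, "a").size());   // 2
        System.out.println(filterByIndex(index, "a").size());  // 2
        System.out.println(filterByIndex(index, "z").size());  // 0
    }
}
```

The index only pays for itself when the same sequence is filtered more than once, which is why the lifetime of the index (the document, or the variable) matters.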

For document-level indexes:

For variable-level indexes:

In previous releases a path expression was not optimized (and therefore never used an index) if the last step returned an atomic value rather than a node. This omission has been rectified.

Saxon now performs variable inlining where appropriate. If a variable is referenced only once, and the reference is not within any kind of loop, then the expression to which the variable is bound is substituted for the variable reference. This avoids the need to create run-time closures in such cases.
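The inlining rule can be sketched on a toy expression tree. This is an illustrative sketch only: the classes below (Literal, VarRef, Call, Loop, Let) are hypothetical and do not reflect Saxon's internal representation, and the sketch assumes variable names are unique, so it ignores shadowing. A let binding is inlined only when its variable is referenced exactly once and that reference is not inside a loop body.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy expression tree for illustration; these classes are hypothetical, not Saxon's.
class InliningSketch {
    abstract static class Expr {}

    static final class Literal extends Expr {
        final String text;
        Literal(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    static final class VarRef extends Expr {
        final String name;
        VarRef(String name) { this.name = name; }
        @Override public String toString() { return "$" + name; }
    }

    static final class Call extends Expr {
        final String function;
        final List<Expr> args;
        Call(String function, Expr... args) { this.function = function; this.args = Arrays.asList(args); }
        @Override public String toString() {
            List<String> parts = new ArrayList<>();
            for (Expr a : args) parts.add(a.toString());
            return function + "(" + String.join(", ", parts) + ")";
        }
    }

    // for $var in sequence return body: the body is evaluated once per item.
    static final class Loop extends Expr {
        final String var; final Expr sequence; final Expr body;
        Loop(String var, Expr sequence, Expr body) { this.var = var; this.sequence = sequence; this.body = body; }
        @Override public String toString() { return "for $" + var + " in " + sequence + " return " + body; }
    }

    // let $var := value return body
    static final class Let extends Expr {
        final String var; final Expr value; final Expr body;
        Let(String var, Expr value, Expr body) { this.var = var; this.value = value; this.body = body; }
        @Override public String toString() { return "let $" + var + " := " + value + " return " + body; }
    }

    // Counts references to `name`; sets loopFlag[0] if any reference is inside a loop body.
    static int countRefs(Expr e, String name, boolean inLoop, boolean[] loopFlag) {
        if (e instanceof VarRef) {
            if (!((VarRef) e).name.equals(name)) return 0;
            if (inLoop) loopFlag[0] = true;
            return 1;
        }
        if (e instanceof Call) {
            int n = 0;
            for (Expr a : ((Call) e).args) n += countRefs(a, name, inLoop, loopFlag);
            return n;
        }
        if (e instanceof Loop) {
            Loop l = (Loop) e;
            // The sequence is evaluated once; only the body repeats.
            return countRefs(l.sequence, name, inLoop, loopFlag) + countRefs(l.body, name, true, loopFlag);
        }
        if (e instanceof Let) {
            Let l = (Let) e;
            return countRefs(l.value, name, inLoop, loopFlag) + countRefs(l.body, name, inLoop, loopFlag);
        }
        return 0; // Literal
    }

    // Returns a copy of `e` with every reference to `name` replaced.
    static Expr substitute(Expr e, String name, Expr replacement) {
        if (e instanceof VarRef) return ((VarRef) e).name.equals(name) ? replacement : e;
        if (e instanceof Call) {
            Call c = (Call) e;
            Expr[] args = new Expr[c.args.size()];
            for (int i = 0; i < args.length; i++) args[i] = substitute(c.args.get(i), name, replacement);
            return new Call(c.function, args);
        }
        if (e instanceof Loop) {
            Loop l = (Loop) e;
            return new Loop(l.var, substitute(l.sequence, name, replacement), substitute(l.body, name, replacement));
        }
        if (e instanceof Let) {
            Let l = (Let) e;
            return new Let(l.var, substitute(l.value, name, replacement), substitute(l.body, name, replacement));
        }
        return e; // Literal
    }

    // Inline only when referenced exactly once and never inside a loop body.
    static Expr inline(Let let) {
        boolean[] inLoop = { false };
        int refs = countRefs(let.body, let.var, false, inLoop);
        return (refs == 1 && !inLoop[0]) ? substitute(let.body, let.var, let.value) : let;
    }

    public static void main(String[] args) {
        Let once = new Let("x", new Call("expensive", new Literal("1")), new Call("f", new VarRef("x")));
        System.out.println(inline(once)); // f(expensive(1))

        Let looped = new Let("x", new Call("expensive", new Literal("1")),
                new Loop("i", new Literal("(1 to 10)"), new Call("g", new VarRef("x"), new VarRef("i"))));
        System.out.println(inline(looped) == looped); // true: left unchanged
    }
}
```

Keeping the binding when the reference sits inside a loop avoids re-evaluating the bound expression on every iteration, which is the case where a variable is still worth its cost.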

Where the specification of a system function states that "if the value of argument N is an empty sequence, the result is X" (regardless of the values of any other arguments), and when argument N is known at compile time to be an empty sequence, then the call to the system function is replaced by X at optimization time.
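As a rough sketch of this rewrite, the following Java fragment folds a function call to a constant when a designated argument is known to be the empty sequence. Everything here is hypothetical: the class EmptyArgFoldingSketch, the Rule type, and the string encoding of expressions are illustrative only and are not Saxon APIs. The entries in the rule table are examples based on the Functions and Operators specification and should be checked against it before being relied on.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: expressions are modelled as plain strings.
class EmptyArgFoldingSketch {

    // Marker for an argument known at compile time to be the empty sequence.
    static final String EMPTY = "()";

    // "If argument `position` (0-based) is empty, the result is `result`."
    static final class Rule {
        final int position;
        final String result;
        Rule(int position, String result) { this.position = position; this.result = result; }
    }

    // Illustrative rule table; entries are assumptions to verify against the spec.
    static final Map<String, Rule> RULES = new HashMap<>();
    static {
        RULES.put("string-length", new Rule(0, "0"));
        RULES.put("upper-case", new Rule(0, "\"\""));
        RULES.put("substring", new Rule(0, "\"\""));
        RULES.put("abs", new Rule(0, EMPTY));
    }

    // Returns the constant result if a rule applies, otherwise the call unchanged.
    static String optimizeCall(String function, List<String> args) {
        Rule rule = RULES.get(function);
        if (rule != null && rule.position < args.size() && EMPTY.equals(args.get(rule.position))) {
            return rule.result; // the other arguments are irrelevant
        }
        return function + "(" + String.join(", ", args) + ")";
    }

    public static void main(String[] args) {
        System.out.println(optimizeCall("string-length", Arrays.asList("()")));   // 0
        System.out.println(optimizeCall("substring", Arrays.asList("()", "$n"))); // ""
        System.out.println(optimizeCall("substring", Arrays.asList("$s", "$n"))); // substring($s, $n)
    }
}
```

The rule applies regardless of the other arguments, so the second argument in the substring example is never examined.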