Optimizations and performance improvements

There has been a substantial change to the way stylesheets are "compiled". In previous releases, the compiled stylesheet was actually a standard tree representation of the source XML stylesheet, with annotations on the nodes to assist efficient execution. In this release, the tree representation of the stylesheet is discarded once compilation is complete, and a custom data structure is used to represent the executable stylesheet.

The compiled stylesheet may now be serialized (using Java serialization), enabling it to be saved on disk, or transferred between machines - this is especially useful in an Enterprise Java Beans environment. A new command, java net.sf.saxon.Compile stylesheet output, is available to compile a stylesheet, and the java net.sf.saxon.Transform command has a new option, -c, which causes the stylesheet parameter to be taken as a compiled stylesheet rather than a source stylesheet. In fact, using compiled stylesheets from the command line does not give a great performance advantage over recompiling them each time they are used, because the elapsed time is dominated by Java initialization rather than by stylesheet compilation; the benefits are more likely to be realized in a high-throughput server-based environment, where it is now possible to use disk caching of stylesheets as an alternative to in-memory caching.
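
The disk-caching idea can be sketched with plain Java serialization. This is only an illustration of the mechanism, not Saxon's implementation: the CompiledStylesheet class, its fields, and the StylesheetDiskCache helper are hypothetical stand-ins invented for the example, and only the java.io and java.nio.file calls are real APIs.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StylesheetDiskCache {

    /** Hypothetical stand-in for an executable stylesheet; NOT a Saxon class. */
    static final class CompiledStylesheet implements Serializable {
        private static final long serialVersionUID = 1L;
        final String systemId;
        final List<String> instructions;

        CompiledStylesheet(String systemId, List<String> instructions) {
            this.systemId = systemId;
            this.instructions = new ArrayList<>(instructions);
        }
    }

    /** Writes the object graph to a file using standard Java serialization. */
    static void save(CompiledStylesheet sheet, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(sheet);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the object graph back, for example on another server. */
    static CompiledStylesheet load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (CompiledStylesheet) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Saves to a temporary file and loads the copy back. */
    static CompiledStylesheet roundTrip(CompiledStylesheet sheet) {
        try {
            Path file = Files.createTempFile("stylesheet", ".bin");
            try {
                save(sheet, file);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CompiledStylesheet original =
                new CompiledStylesheet("books.xsl", Arrays.asList("apply-templates", "value-of"));
        CompiledStylesheet copy = roundTrip(original);
        System.out.println(copy.systemId + " " + copy.instructions);
    }
}
```

A server could keep such files keyed by stylesheet URI and reload them on demand instead of holding every executable stylesheet in memory. Only deserialize files from a trusted location, since Java deserialization executes code paths chosen by the stream's contents.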

These changes bring (or promise) a number of benefits:

The compiled stylesheet is significantly smaller, which is important when a number of compiled stylesheets are cached in a web server.
It is possible to distribute a stylesheet in scrambled form, so that users cannot easily make changes.
Unused parts of the stylesheet, for example template rules in imported modules, are discarded.
The compiled stylesheet is relocatable between servers (e.g. under EJB).
Stylesheet optimizations, by rewriting the tree, become feasible. Until now the Saxon optimizer has only operated at the level of individual XPath expressions. A few simple optimizations have been implemented in this release, e.g. the decision whether to execute xsl:fallback is made entirely at compile-time.

The main drawback is that less of the static context is available during execution. This makes a number of things more difficult, or in some cases impossible:

Diagnostics, such as tracing and debugging, have less information available. For example, variable names are currently not retained in the executable.
Reflexive capabilities become more difficult. The obvious examples are saxon:evaluate and the saxon:allow-avt attribute, which allows dynamic selection of a template in xsl:call-template.

In general I expect that stylesheets will need to be recompiled whenever a new Saxon version is issued, though this may be avoidable in the case of a bug-clearance release.

Stylesheet compilation is a little fragile at this release. It has proved difficult to test it comprehensively. One known restriction is that stylesheets containing saxon:collation declarations cannot be compiled (because these declarations use Java classes that are not serializable). There may be other restrictions: please let me know if you find any.
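
The kind of failure behind the saxon:collation restriction can be reproduced in isolation. The sketch below is an assumption-laden illustration, not Saxon code: PlainComparer, SheetWithCollation and SheetWithoutCollation are hypothetical classes invented here. It shows that Java serialization rejects a whole object graph as soon as one reachable object does not implement Serializable, and how a check of this sort might be used to find such restrictions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class CollationSerializationCheck {

    /** Hypothetical comparer that does not implement Serializable. */
    static final class PlainComparer {
    }

    /** Hypothetical executable stylesheet that holds a non-serializable comparer. */
    static final class SheetWithCollation implements Serializable {
        private static final long serialVersionUID = 1L;
        final PlainComparer comparer = new PlainComparer();
    }

    /** Hypothetical executable stylesheet with only serializable state. */
    static final class SheetWithoutCollation implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name = "plain";
    }

    /**
     * Tries to serialize the object to an in-memory buffer.
     * Returns null on success, or the exception message (which names the
     * offending class) if some reachable object is not serializable.
     */
    static String firstSerializationProblem(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without collation: "
                + firstSerializationProblem(new SheetWithoutCollation()));
        System.out.println("with collation:    "
                + firstSerializationProblem(new SheetWithCollation()));
    }
}
```

Running a check like this over a test stylesheet's executable form is one way to locate the class responsible when compilation fails.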

As part of this change, the stylesheet tree now uses a different NamePool from the source tree. This NamePool is discarded as soon as compilation is complete. Names used in XPath expressions, names of literal result elements and attributes, and names of keys, variables, templates, and functions, are still registered in the NamePool for the source document, but the names of XSLT elements and attributes (e.g. xsl:template, select) no longer appear. This significantly reduces the size of the compiled version of a small stylesheet, and makes loading of the compiled stylesheet correspondingly faster. It also means that names used in the source document are less likely to encounter hashing conflicts in the NamePool, giving a small run-time speed-up.