Optimizations and performance improvements
There has been a substantial change to the way stylesheets are "compiled". In previous releases, the compiled stylesheet was actually a standard tree representation of the source XML stylesheet, with annotations on the nodes to assist efficient execution. In this release, the tree representation of the stylesheet is discarded once compilation is complete, and a custom data structure is used to represent the executable stylesheet.
The compiled stylesheet may now be serialized (using Java serialization), enabling it to be saved on
disk or transferred between machines; this is especially useful in an Enterprise JavaBeans environment.
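To illustrate what "using Java serialization" means here, the sketch below round-trips an object through ObjectOutputStream and ObjectInputStream via a byte array, which could just as well be a file or a network connection. ExecutableStylesheet is a hypothetical stand-in: the notes do not name Saxon's actual executable class or describe its fields.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the executable stylesheet; not Saxon's real class.
class ExecutableStylesheet implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> templates = new ArrayList<>();
}

public class StylesheetTransfer {
    // Serialize to bytes: these could be written to disk or sent to another machine.
    static byte[] toBytes(ExecutableStylesheet sheet) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(sheet);
        }
        return buffer.toByteArray(); // read after close, so all data has been flushed
    }

    // Rebuild an equivalent object from the bytes; the source XSLT is not needed.
    static ExecutableStylesheet fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (ExecutableStylesheet) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutableStylesheet sheet = new ExecutableStylesheet();
        sheet.templates.add("match=/");
        ExecutableStylesheet copy = fromBytes(toBytes(sheet));
        System.out.println(copy.templates); // prints [match=/]
    }
}
```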
A new command, java net.sf.saxon.Compile stylesheet output, is available to compile a stylesheet, and the
java net.sf.saxon.Transform command has a new option -c, which causes the stylesheet parameter to be taken
as a compiled stylesheet rather than a source stylesheet. In practice, using compiled stylesheets from the
command line gives little performance advantage over recompiling them each time they are used, because the
compilation time is dominated by Java initialization; the benefits are more likely to be realized in
a high-throughput server-based environment, where it is now possible to use disk caching of stylesheets as
an alternative to in-memory caching.
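A minimal sketch of disk caching as a second level behind an in-memory cache, assuming the compiled object is Serializable. The class StylesheetCache, the ".ser" file naming, and the compiler function are all hypothetical; in a real server the compiler function would invoke Saxon, which is not shown here.

```java
import java.io.*;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical two-level cache: memory first, then a directory of serialized
// compiled stylesheets, and only then a fresh compilation.
public class StylesheetCache<T extends Serializable> {
    private final Map<String, T> memory = new HashMap<>();
    private final File dir;
    private final Function<String, T> compiler; // stand-in for the real compiler
    int compilations = 0;                       // how often we fell through to compiling

    public StylesheetCache(File dir, Function<String, T> compiler) {
        this.dir = dir;
        this.compiler = compiler;
    }

    public synchronized T get(String name) throws IOException, ClassNotFoundException {
        T sheet = memory.get(name);
        if (sheet != null) return sheet;                  // 1. memory hit
        File file = new File(dir, name + ".ser");
        if (file.exists()) {                              // 2. disk hit: deserialize
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
                @SuppressWarnings("unchecked")
                T loaded = (T) in.readObject();
                sheet = loaded;
            }
        } else {                                          // 3. miss: compile, then persist
            sheet = compiler.apply(name);
            compilations++;
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(sheet);
            }
        }
        memory.put(name, sheet);
        return sheet;
    }

    // Drop the in-memory copy (e.g. under memory pressure); the disk copy remains.
    public synchronized void evictFromMemory(String name) {
        memory.remove(name);
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("xsl-cache").toFile();
        StylesheetCache<String> cache = new StylesheetCache<>(dir, n -> "compiled:" + n);
        cache.get("a.xsl");             // compiles and writes a.xsl.ser
        cache.evictFromMemory("a.xsl"); // simulate eviction from memory
        // Reloaded from disk, so the compilation count stays at 1.
        System.out.println(cache.get("a.xsl") + " after " + cache.compilations + " compilation(s)");
    }
}
```

The design choice is simply that a memory eviction no longer forces a recompilation, only a cheaper deserialization.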
These changes bring (or promise) a number of benefits:
- The compiled stylesheet is significantly smaller, which is important when a number of compiled stylesheets are cached in a web server.
- It is possible to distribute a stylesheet in scrambled form, so that users cannot easily make changes.
- Unused parts of the stylesheet, for example template rules in imported modules, are discarded.
- The compiled stylesheet is relocatable between servers (e.g. under EJB).
- Stylesheet optimizations, by rewriting the tree, become feasible. Until now the Saxon optimizer has only operated at the level of individual XPath expressions. A few simple optimizations have been implemented in this release; for example, the decision whether to execute xsl:fallback is made entirely at compile time.
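The xsl:fallback example can be pictured as a tree rewrite: if the processor recognizes an instruction, its xsl:fallback children are ignored; if not, the fallback children are used instead. The sketch below makes that choice once, when building the executable tree. All class names are hypothetical and "ext:fancy" is an invented extension element; this is not Saxon's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FallbackRewrite {
    interface Instruction { String run(); }

    // An instruction the processor actually implements.
    static final class Known implements Instruction {
        final String name;
        Known(String name) { this.name = name; }
        public String run() { return "ran " + name; }
    }

    // Compile-time node: an element that may be unknown to this processor,
    // together with the instructions inside its xsl:fallback children.
    static final class MaybeUnknown {
        final String name;
        final List<Instruction> fallback;
        MaybeUnknown(String name, List<Instruction> fallback) {
            this.name = name;
            this.fallback = fallback;
        }
    }

    // Resolve the choice at compile time: the executable tree contains either
    // the instruction itself or its fallback, with no run-time test left over.
    static List<Instruction> compile(MaybeUnknown node, Set<String> supported) {
        if (supported.contains(node.name)) {
            return Arrays.<Instruction>asList(new Known(node.name));
        }
        return node.fallback;
    }

    public static void main(String[] args) {
        Set<String> supported = new HashSet<>(Arrays.asList("xsl:value-of"));
        MaybeUnknown node = new MaybeUnknown("ext:fancy",
                Arrays.<Instruction>asList(new Known("xsl:value-of")));
        for (Instruction i : compile(node, supported)) {
            System.out.println(i.run()); // prints "ran xsl:value-of"
        }
    }
}
```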
The main drawback is that less of the static context is available during execution. This makes a number of things more difficult, or in some cases impossible:
- Diagnostics, such as tracing and debugging, have less information available. For example, variable names are currently not retained in the executable.
- Reflexive capabilities become more difficult. The obvious examples are saxon:evaluate and the saxon:allow-avt attribute, which allows dynamic selection of a template in xsl:call-template.
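One common way a compiler ends up with no variable names at run time is by resolving each variable reference to a numbered slot in a stack frame. The sketch below assumes that scheme purely for illustration; the notes only say that variable names are not retained, and every class here is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class SlotAllocation {
    // Compile time: names are mapped to slot numbers. This map belongs to the
    // compiler and is thrown away when compilation is complete.
    static final class Compiler {
        private final Map<String, Integer> slots = new HashMap<>();

        int declare(String name) {
            int slot = slots.size();
            slots.put(name, slot);
            return slot;
        }

        int resolve(String name) {
            Integer slot = slots.get(name);
            if (slot == null) throw new IllegalArgumentException("undeclared variable $" + name);
            return slot;
        }
    }

    // Run time: only the slot number survives, so a tracer or debugger that
    // inspects this object has no name to report.
    static final class VariableReference {
        final int slot;
        VariableReference(int slot) { this.slot = slot; }
        Object evaluate(Object[] frame) { return frame[slot]; }
        @Override public String toString() { return "$<slot " + slot + ">"; }
    }

    public static void main(String[] args) {
        Compiler compiler = new Compiler();
        compiler.declare("title");
        compiler.declare("count");
        VariableReference ref = new VariableReference(compiler.resolve("count"));
        Object[] frame = {"Saxon", 42};
        System.out.println(ref + " = " + ref.evaluate(frame)); // prints "$<slot 1> = 42"
    }
}
```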
In general I expect that stylesheets will need to be recompiled whenever a new Saxon version is issued, though this may be avoidable in the case of a bug-clearance release.
Stylesheet compilation is a little fragile in this release; it has proved difficult to test it
comprehensively. One known restriction is that stylesheets containing saxon:collation declarations
cannot be compiled, because the implementation uses Java classes that are not serializable. There may be
other restrictions: please let me know if you find any.
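The notes do not say which classes are involved, but java.text.Collator is one standard class that does not implement Serializable, so it serves as an assumed example of the failure mode. Serializing any object graph that holds such a field throws NotSerializableException. CollationDeclaration below is hypothetical.

```java
import java.io.*;
import java.text.Collator;
import java.util.Locale;

public class CollatorSerialization {
    // Hypothetical compiled declaration holding a collator. Because Collator is
    // not Serializable, writing this object fails even though the class itself
    // implements Serializable.
    static final class CollationDeclaration implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final Collator collator;

        CollationDeclaration(String name, Collator collator) {
            this.name = name;
            this.collator = collator;
        }
    }

    // Try to serialize an object into a throwaway buffer.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            System.out.println("not serializable: " + e.getMessage()); // names the offending class
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CollationDeclaration decl = new CollationDeclaration("fr", Collator.getInstance(Locale.FRENCH));
        System.out.println(canSerialize(decl)); // false: the collator field cannot be written
    }
}
```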
As part of this change, the stylesheet tree now uses a different NamePool from the source tree. This
NamePool is discarded as soon as compilation is complete. Names
used in XPath expressions, names of literal result elements and attributes, and names of keys, variables,
templates, and functions, are still registered in the NamePool for the source document, but the names
of XSLT elements and attributes (e.g. xsl:template, select) no longer appear.
This significantly reduces the size of the compiled version of a small stylesheet, and makes loading
of the compiled stylesheet correspondingly faster. It also means that names used in the source document
are less likely to encounter hashing conflicts in the NamePool, giving a small run-time speed-up.
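A rough model of why the split helps: with a fixed number of hash buckets, every extra name can only lengthen the chains, so a pool that no longer holds the XSLT names never has more collisions than one that does. The NamePool class below is a deliberately simplified, hypothetical model; Saxon's real NamePool differs in detail, and the name lists are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class NamePoolSketch {
    // Simplified name pool: names are interned into a fixed number of buckets
    // and identified by an integer code.
    static final class NamePool {
        private final List<List<String>> buckets = new ArrayList<>();

        NamePool(int bucketCount) {
            for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
        }

        // Returns the same code for the same name; assumes fewer than 256 names per bucket.
        int allocate(String name) {
            int b = Math.floorMod(name.hashCode(), buckets.size());
            List<String> chain = buckets.get(b);
            int pos = chain.indexOf(name);
            if (pos < 0) {
                chain.add(name);
                pos = chain.size() - 1;
            }
            return (b << 8) | pos; // bucket number plus position in its chain
        }

        // Number of names that had to share a bucket with an earlier name.
        int collisions() {
            int n = 0;
            for (List<String> chain : buckets) n += Math.max(0, chain.size() - 1);
            return n;
        }
    }

    public static void main(String[] args) {
        // Names needed at run time: source-document names plus stylesheet names
        // such as literal result elements, which still go in the source pool.
        String[] runtimeNames = {"book", "chapter", "title", "html", "body"};
        // Names of XSLT elements and attributes, needed only during compilation.
        String[] xsltNames = {"xsl:template", "xsl:apply-templates", "xsl:value-of", "match", "select", "mode"};

        NamePool shared = new NamePool(8); // old scheme: one pool for everything
        for (String n : runtimeNames) shared.allocate(n);
        for (String n : xsltNames) shared.allocate(n);

        NamePool source = new NamePool(8);     // new scheme: run-time names only
        NamePool stylesheet = new NamePool(8); // compile-time names only
        for (String n : runtimeNames) source.allocate(n);
        for (String n : xsltNames) stylesheet.allocate(n);
        stylesheet = null; // discarded once compilation is complete

        System.out.println("collisions, shared pool: " + shared.collisions());
        System.out.println("collisions, source pool: " + source.collisions());
    }
}
```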