Compiling a Stylesheet
Generally, the cost of analyzing the XSLT source code in a stylesheet and preparing it for execution can be high in relation to the cost of actually running the code to transform an individual source document, especially where the stylesheet is large and the source document is small. Saxon provides several capabilities designed to ensure that when you use the same stylesheet repeatedly, you only need to incur this overhead once.
- In simple cases, you can exploit the ability to process an entire directory of source files using a single invocation of the Transform command on the command line.
- Both the JAXP and s9api interfaces separate the process of compiling a stylesheet and the process of using it to transform a source document. (With JAXP the object representing the compiled stylesheet is the javax.xml.transform.Templates object, with s9api it is the XsltExecutable). If you run transformations within a web service then it is always a good idea to cache the compiled form of the stylesheets it uses.
- From Saxon 9.7, it is also possible to export the compiled form of a stylesheet as an XML file (called the stylesheet export file), in much the same way that object code from other languages is saved to filestore, and distributed from developers to users.
- A related capability is the ability in Saxon-EE to generate bytecode (intermediate Java code) to improve the speed of stylesheet execution.
Caching compiled stylesheets in memory
The JAXP interface represents a compiled stylesheet as a Templates object. The Templates object contains the entire stylesheet; all modules must be compiled as a single unit. JAXP was designed before packages were added to the XSLT 3.0 language. The Templates object is thread-safe, so once created it can be used by many transformations running separately in parallel. To use the Templates object to run a transformation of a particular source document, a Transformer object is created. The Transformer is not thread-safe; its transform() method must not be called while a transformation is active. The Transformer can be serially reused, but with Saxon there is no benefit in doing so; better garbage collection generally occurs if a new Transformer is created for each transformation.
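The pattern described above (compile once into a thread-safe Templates object, then create a fresh Transformer for each transformation) might be sketched as follows. This is a minimal illustration rather than a recommended implementation: the TemplatesCache class, its method names, and the sample stylesheet are hypothetical, and the code uses only standard JAXP calls, so it runs with whichever XSLT processor TransformerFactory.newInstance() selects on your classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: compiles each stylesheet once and reuses the Templates object. */
class TemplatesCache {
    /** Sample XSLT 1.0 stylesheet used only for this illustration. */
    static final String GREETING_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/greeting\">Hello, <xsl:value-of select=\"@name\"/>!</xsl:template>"
      + "</xsl:stylesheet>";

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> cache = new ConcurrentHashMap<>();

    /** Returns the compiled stylesheet for the key, compiling it on the first request only. */
    Templates get(String key, String xsltSource) throws TransformerConfigurationException {
        Templates templates = cache.get(key);
        if (templates == null) {
            // TransformerFactory is not guaranteed to be thread-safe, so serialize compilation.
            synchronized (factory) {
                templates = cache.get(key);
                if (templates == null) {
                    templates = factory.newTemplates(new StreamSource(new StringReader(xsltSource)));
                    cache.put(key, templates);
                }
            }
        }
        return templates;
    }

    /** Runs one transformation with a new Transformer created from the cached Templates. */
    String transform(String key, String xsltSource, String xml) throws TransformerException {
        Transformer transformer = get(key, xsltSource).newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        TemplatesCache cache = new TemplatesCache();
        // The stylesheet is compiled on the first call and reused on the second.
        System.out.println(cache.transform("greet", GREETING_XSL, "<greeting name=\"World\"/>"));
        System.out.println(cache.transform("greet", GREETING_XSL, "<greeting name=\"Saxon\"/>"));
    }
}
```

Because the Transformer is created inside transform() and never shared, several threads can call transform() on the same TemplatesCache at once; only the Templates object is shared between them.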
The s9api interface in its original form has a similar design: a compiled stylesheet is represented by an XsltExecutable object, and the instantiation of a stylesheet performing a single transformation by an XsltTransformer object. The s9api interface also adds a third class to the design, namely the XsltCompiler, which holds compile-time options such as the base URI of the stylesheet, values of static parameters, whether to generate bytecode, how to resolve references to modules (xsl:include/xsl:import), what schema definitions to use, and where to report compile-time errors. The XsltCompiler is also thread-safe, though the options in force should not be changed while the compiler is in use. Different XsltCompiler instances with different option settings can run concurrently with each other.
A preliminary implementation of XSLT 3.0 packages appeared in Saxon 9.6, with a much more complete implementation following in Saxon 9.7. A package may consist of a single module, or of a number of modules connected using xsl:include/xsl:import; a package is compiled as a unit, and may have references to other packages (via xsl:use-package) that are compiled independently. To allow independent compilation, there is much stronger control over the interfaces that a package exposes to the outside world, and over the ability of declarations in one package to override another. For example, if a function is declared to return an integer, then when compiling a call to that function, the compiler can be confident that any overriding declaration of the function will still return an integer result.
In the s9api interface, a package is represented by an XsltPackage object. The XsltCompiler has a method compilePackage which returns an XsltPackage if successful. The package may be made available for use by other packages being compiled, in the same or in a different XsltCompiler, by the XsltCompiler's importPackage method. When an xsl:use-package declaration is found while compiling one package, the compiler searches for a matching package among those that have been imported by the XsltCompiler in this way. It is possible to import several different versions of the same package, and the package-version attribute of xsl:use-package determines which of them is loaded.
The XsltPackage object, once created, is immutable and thread-safe. It is tied to a Saxon Configuration (or s9api Processor) but it can be imported by multiple XsltCompiler instances. If a common library package is used by many different stylesheets, it makes sense to define it as a reusable package, since this avoids the cost of compiling the code repeatedly, and avoids the need to keep multiple copies in memory.
JIT Compilation of Template Rules
Sometimes a stylesheet may contain hundreds of template rules to define the processing of elements that never actually appear in the source documents; source documents may use a tiny fraction of the defined vocabulary. In this situation, it is wasteful to compile all these template rules every time the stylesheet is used. This isn't a problem when the stylesheet is compiled once, cached, and used to run a large number of transformations; but it is a problem in a batch workflow where the stylesheet is compiled every time it is used.
To improve the efficiency of this kind of workload, Saxon-EE by default uses just-in-time compilation of template rules. On first reading the stylesheet, all the match patterns are processed and a suitable decision table is constructed; but the body of a template rule is not compiled into executable form until the first time that template rule is matched.
A consequence of this is that static errors (for example, invalid path expressions) in such templates may go undetected if the code is not actually executed.
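The effect of deferring compilation can be illustrated with a small, self-contained sketch. This is not Saxon's implementation: the LazyRules class, its string "rule bodies", and the stand-in compile() method (which treats any body containing "!!" as a static error) are all hypothetical, and exist only to show that a body is compiled on first match and that an error in a rule that is never matched goes unreported.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustration only: lazy "compilation" of rule bodies, not Saxon's actual mechanism. */
class LazyRules {
    /** Counts how many rule bodies have actually been compiled. */
    int compilations = 0;

    /** A rule whose body is kept as source text until the rule is first matched. */
    final class Rule {
        private final String bodySource;
        private Function<String, String> compiled; // stays null until first use

        Rule(String bodySource) {
            this.bodySource = bodySource;
        }

        synchronized String apply(String elementName) {
            if (compiled == null) {
                compiled = compile(bodySource); // static errors surface only at this point
            }
            return compiled.apply(elementName);
        }
    }

    // Built eagerly when rules are registered, like the table of match patterns.
    private final Map<String, Rule> decisionTable = new LinkedHashMap<>();

    /** Hypothetical stand-in for a compiler: rejects bodies containing "!!". */
    Function<String, String> compile(String source) {
        if (source.contains("!!")) {
            throw new IllegalStateException("static error in: " + source);
        }
        compilations++;
        return name -> source.replace("{name}", name);
    }

    /** Registers a rule without compiling its body. */
    void addRule(String elementName, String bodySource) {
        decisionTable.put(elementName, new Rule(bodySource));
    }

    /** Applies the matching rule, compiling its body if this is the first match. */
    String process(String elementName) {
        Rule rule = decisionTable.get(elementName);
        return rule == null ? "" : rule.apply(elementName);
    }

    public static void main(String[] args) {
        LazyRules rules = new LazyRules();
        rules.addRule("para", "<p>{name}</p>");
        rules.addRule("table", "!!broken"); // never matched below, so never reported
        System.out.println(rules.process("para"));
        System.out.println("bodies compiled: " + rules.compilations);
    }
}
```

In this sketch the broken "table" rule causes no failure unless a "table" element is actually processed, which mirrors the caveat about undetected static errors.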
JIT compilation is enabled by default. It can be suppressed from the command line by setting -jit:off. Setting the export, explain, or nogo options also has the side-effect of suppressing JIT compilation. There is also an option available on the XsltCompiler object.
It probably makes sense to suppress JIT compilation in any workload where the compiled stylesheet is cached and used repeatedly.
Exporting Packages
A package, once compiled into an XsltPackage object, can be saved as a stylesheet export file (SEF) using the save() method of the XsltPackage. The generated file is intended to be used for one purpose only, namely for reconstituting the XsltPackage at a different time and place. The format is XML, but its interpretation is not published and should not be considered stable. The file contains a checksum and cannot be loaded in the event of a checksum failure, so modifications to the content are not permitted. The content of the file is sufficiently far removed from the original source that distributing code in this form achieves a useful level of IP protection, though like Java bytecode, it is not intended to resist determined attempts at reverse engineering. Indeed, in the interests of run-time diagnostics, it preserves information such as variable names and line numbers that are not strictly needed at execution time.
The simplest way to generate an export file is from the command line:
java -jar dir/saxon9ee.jar -xsl:stylesheet.xsl -export:stylesheet.sef -nogo
Here, the option -nogo suppresses any attempt to execute the stylesheet.
Additionally, the -relocate:on option can be used to produce an export package which can be deployed to a different location, with a different base URI.
The -target option can be used to specify the edition of Saxon which will be used to run the stylesheet export file. The accepted values are EE|PE|HE|JS|JS2, and the default is EE. For instance, specify -target:HE to produce an export file which can be executed by Saxon-HE (this will suppress the generation of optimized constructs that Saxon-HE cannot execute).
A stylesheet export file for a complete stylesheet (as distinct from a library package) is accepted by any Saxon interface that accepts a source stylesheet. For example, from the command line:
java -jar dir/saxon9ee.jar -xsl:stylesheet.sef -s:source.xml
A stylesheet export file is also needed when using the Saxon-JS product to run transformations in the browser. In this case the export file must be generated with the option -target:JS or -target:JS2 because there are minor differences (for example, for some constructs such as node tests in path expressions and match patterns the export file actually includes fragments of generated JavaScript code to speed evaluation).
When exporting a package, all components (templates, functions, etc) from the packages it uses are also exported. It is possible therefore either to export an individual library package (typically having no dependencies on other packages), or a complete stylesheet (a package together with its tree of dependencies). As well as the s9api interface, packages can also be exported using the -export option on the net.sf.saxon.Transform command line. Packages can similarly be imported either by listing them in the -pack option of net.sf.saxon.Transform, or within s9api by use of the XsltCompiler methods loadLibraryPackage and loadExecutablePackage.
In the case of schema-aware stylesheets, the schema components needed by a stylesheet are not exported along with the stylesheet code. The user of the stylesheet needs to import the required schemas before the stylesheets can be loaded. The schema loaded at execution time must match the schema used when the stylesheet was compiled. Saxon is not draconian about checking this, and many minor changes will cause no trouble (for example, changing the regular expression used in a pattern facet). Structural changes that invalidate the assumptions made during XSLT compilation, however, are likely to cause execution to fail, not necessarily in predictable ways.
The computer on which the stylesheet is executed needs to have a Saxon license of sufficient capability to meet the requirements of the stylesheet. There are two ways this can be achieved. Either the run-time system can have a conventional Saxon license installed in the normal way, or it can take advantage of a license embedded within the exported stylesheet itself. Saxonica offers developers the option of purchasing a "developer master key" which, if installed, will cause all exported stylesheets to contain an embedded license key sufficient to execute the stylesheet in question. An embedded license key applies only to that stylesheet and cannot be used for any other code developed elsewhere; stylesheets that are exported with an embedded license can only be executed "as is", and cannot be incorporated as libraries into larger applications.
Exporting stylesheet packages requires Saxon-EE, optionally with the Developer Master Key if stylesheets with embedded license information are to be exported. From Saxon 9.9, importing stylesheet packages is possible using any Saxon edition, provided that the run-time software and the run-time license key (where needed) support the features used by the stylesheet in question.
There are a small number of cases where a valid stylesheet cannot be exported; but they are very unlikely to be encountered in practice. For example:
- Where the value of a static global variable is initialized by calling fn:load-query-module() (because we cannot export functions containing XQuery-specific constructs such as general FLWOR expressions).
- Where the value of a static global variable is a function item returned from another stylesheet invoked by calling fn:transform().
- Where the stylesheet binds namespaces that include whitespace characters.
Bytecode generation
When a stylesheet package is compiled into its in-memory representation, Saxon-EE by default generates Java bytecode for faster execution of selected parts of the code. The generated bytecode is mixed with interpreted code, each calling the other where appropriate.
From Saxon 9.8, bytecode generation is by default applied only to hotspots, that is, parts of the executable code that are found to be frequently executed. These will often be predicates in filter expressions. The threshold for generating bytecode is configurable. Bytecode generation can be monitored using the -TB option on the command line.
The performance boost achieved by bytecode generation is variable; 25% is typical. The functions and templates that benefit the most are those where the expression tree contains many constructs that are relatively cheap in themselves, such as type conversion, comparisons, and arithmetic. This is because the saving from bytecode generation is mainly not in the cost of performing primitive operations, but in the cost of deciding which operations to perform: so the saving is greater where the number of operations is high relative to their average cost.
There are configuration options to suppress bytecode generation (Feature.GENERATE_BYTE_CODE), to insert debugging logic into the generated bytecode (Feature.DEBUG_BYTE_CODE), and to display the generated bytecode (Feature.DISPLAY_BYTE_CODE). See Configuration Features for more information.
Currently, exported packages do not include bytecode.