Customizing serialization
In XSLT 3.0 there are three ways the result of a stylesheet may be delivered: as a raw sequence (any XDM value), as the result of sequence normalization as defined in the W3C serialization specification, or as serialized output produced after sequence normalization. This is a substantial change, and it has led to considerable internal change in Saxon's output pipeline, some of which is visible as changed API behavior.
At the s9api level, the data passed from an Xslt30Transformer or XQueryEvaluator to a Destination is always a raw sequence (any sequence of items, whether nodes, atomic values, or functions, not wrapped in a document node). This is delivered over Saxon's internal Receiver interface, which allows nodes to be passed either in composed form (as a NodeInfo object along with its entire subtree) or in decomposed form (as a sequence of events like startElement and endElement). There are strict rules on the "well-formedness" of the calls made across this interface, defined in the JavaDoc for class net.sf.saxon.event.RegularSequenceChecker, and it is possible to insert a RegularSequenceChecker into the pipeline to check that the rules are being adhered to.
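To make the idea of decomposed form and a checking stage concrete, the following is a minimal sketch that verifies only that start and end events are properly nested. The class BalancedEventChecker and its method signatures are hypothetical and exist purely for illustration; they are not Saxon's Receiver or RegularSequenceChecker API, and the actual rules are those documented in the JavaDoc.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical checking stage: verifies that element events are properly nested. */
final class BalancedEventChecker {
    private final Deque<String> openElements = new ArrayDeque<>();

    /** Records the start of an element. */
    void startElement(String name) {
        openElements.push(name);
    }

    /** Checks that the element being closed is the most recently opened one. */
    void endElement(String name) {
        if (openElements.isEmpty()) {
            throw new IllegalStateException("endElement(" + name + ") with no open element");
        }
        String expected = openElements.pop();
        if (!expected.equals(name)) {
            throw new IllegalStateException("expected end of " + expected + " but got " + name);
        }
    }

    /** Called at the end of the sequence: every element must have been closed. */
    void close() {
        if (!openElements.isEmpty()) {
            throw new IllegalStateException("unclosed element: " + openElements.peek());
        }
    }

    /** Convenience for trying the checker: each event is "+name" (start) or "-name" (end). */
    static boolean isRegular(String... events) {
        BalancedEventChecker checker = new BalancedEventChecker();
        try {
            for (String e : events) {
                if (e.startsWith("+")) {
                    checker.startElement(e.substring(1));
                } else {
                    checker.endElement(e.substring(1));
                }
            }
            checker.close();
            return true;
        } catch (IllegalStateException ex) {
            return false;
        }
    }
}
```

With this sketch, isRegular("+a", "+b", "-b", "-a") returns true, whereas isRegular("+a", "+b", "-a", "-b") returns false because the end events arrive in the wrong order.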
It is the responsibility of the Destination to perform sequence normalization (if requested) and/or serialization. Because sequence normalization may involve inserting item separators based on the item-separator serialization property, all destinations now have access to the full set of serialization properties.
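As a rough illustration of why a destination needs the serialization properties even when it does not serialize, the following sketch models item-separator handling for a sequence of atomic values only. It is a simplified model, not Saxon code: the class and method names are hypothetical, nodes and functions are not handled, and it assumes the serialization specification's default of a single space between adjacent atomic values when item-separator is absent.

```java
import java.util.List;

/** Toy model of item-separator handling for a sequence of atomic values only. */
final class ItemSeparatorSketch {

    /**
     * Joins the string forms of the items.
     * A null itemSeparator models the property being absent, in which case
     * adjacent atomic values are separated by a single space.
     */
    static String normalizeAtomics(List<?> items, String itemSeparator) {
        String separator = (itemSeparator == null) ? " " : itemSeparator;
        StringBuilder result = new StringBuilder();
        boolean first = true;
        for (Object item : items) {
            if (!first) {
                result.append(separator); // separator goes between items, never before the first
            }
            result.append(item);
            first = false;
        }
        return result.toString();
    }
}
```

For example, normalizeAtomics(List.of(1, "two", 3), null) returns "1 two 3", while passing "," as the separator returns "1,two,3".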
There are a number of implementations of the Destination interface supplied with the product:
- The RawDestination delivers the raw result as an XdmValue.
- The Serializer supports various serialization methods.
- The SAXDestination allows output to a SAX ContentHandler.
- The XMLStreamWriterDestination allows output to an XMLStreamWriter.
- The DOMDestination allows building of a DOM tree.
- The TeeDestination allows forking of the output to two different destinations.
- A destination can be obtained that streams data into an XSLT transformation or a schema validator.
In addition, it is quite possible to write your own implementation of the Destination interface, either from scratch or by subclassing an existing implementation.
Some ContentHandler implementations require a sequence of events corresponding to a well-formed document (that is, one whose document node has exactly one element node and no text nodes among its children). If this is the case, you can specify the additional output property saxon:require-well-formed="yes", which will cause Saxon to report an error if the result tree is not well-formed.
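The condition described above can be sketched as a SAX handler that observes the events it receives. This illustrates the rule only and is not how Saxon implements the check; the class name WellFormedDocumentProbe is hypothetical, and the example uses the standard org.xml.sax classes shipped with the JDK.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative handler that records whether the events form a well-formed document. */
final class WellFormedDocumentProbe extends DefaultHandler {
    private int depth = 0;
    private int topLevelElements = 0;
    private boolean topLevelText = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth == 0) {
            topLevelElements++; // an element child of the document node
        }
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth == 0 && length > 0) {
            topLevelText = true; // a text node child of the document node
        }
    }

    /** True if exactly one top-level element and no top-level text were seen. */
    boolean isWellFormedDocument() {
        return depth == 0 && topLevelElements == 1 && !topLevelText;
    }
}
```

A handler with this kind of requirement would reject, for instance, a result consisting of two sibling elements with no enclosing parent.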
As specified in the JAXP interface, requests to disable or re-enable output escaping are also notified to the content handler by means of special processing instructions. The names of these processing instructions are defined by the constants PI_DISABLE_OUTPUT_ESCAPING and PI_ENABLE_OUTPUT_ESCAPING defined in class javax.xml.transform.Result.
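A content handler that wants to honor these requests might track them as in the following sketch. The class name EscapingAwareHandler and its deliberately minimal escaping routine are assumptions made for illustration; only the two constants come from javax.xml.transform.Result.

```java
import javax.xml.transform.Result;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative handler that switches escaping on and off via the JAXP processing instructions. */
final class EscapingAwareHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private boolean escaping = true;

    @Override
    public void processingInstruction(String target, String data) {
        if (Result.PI_DISABLE_OUTPUT_ESCAPING.equals(target)) {
            escaping = false;
        } else if (Result.PI_ENABLE_OUTPUT_ESCAPING.equals(target)) {
            escaping = true;
        }
        // Any other processing instruction is ignored in this sketch.
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        out.append(escaping ? escape(text) : text);
    }

    /** Minimal escaping of markup-significant characters; '&' must be replaced first. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    String output() {
        return out.toString();
    }
}
```

Character data received between the disable and enable instructions is written through unchanged; everything else is escaped as usual.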
An abstract implementation of the Receiver interface is available in the Emitter class. This class provides additional functionality which is useful if you want to serialize the result to a byte or character output stream. If the Receiver that you supply as an output destination is an instance of Emitter, then it has access to all the serialization parameters supplied in the xsl:output declaration, or made available using the Java API.
See the documentation of class net.sf.saxon.event.Receiver for details of the methods available, or implementations such as HTMLEmitter, XMLEmitter, and TEXTEmitter for the standard output formats supported by Saxon.
It can sometimes be useful to set up a chain of Receivers working as a pipeline. The class ProxyReceiver is supplied as a base for writing a filter that participates in such a pipeline. See the class XMLIndenter, which handles XML indentation, as an example of how to write a ProxyReceiver.
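The same pattern, a stage that forwards every event to the next stage unless a method is overridden, exists in standard SAX as org.xml.sax.helpers.XMLFilterImpl. The following sketch uses that JDK class as an analogy rather than Saxon's own API; UpperCaseTextFilter, TextCollector, and FilterPipelineDemo are hypothetical names, and a real Saxon filter would extend ProxyReceiver instead.

```java
import java.util.Locale;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Filter stage: forwards all events unchanged except character data, which is upper-cased. */
final class UpperCaseTextFilter extends XMLFilterImpl {
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] upper = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
        super.characters(upper, 0, upper.length); // pass the modified event downstream
    }
}

/** Final stage: collects the character data that reaches the end of the pipeline. */
final class TextCollector extends DefaultHandler {
    final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }
}

final class FilterPipelineDemo {
    /** Pushes one element containing the given text through filter -> collector. */
    static String run(String content) {
        TextCollector sink = new TextCollector();
        UpperCaseTextFilter filter = new UpperCaseTextFilter();
        filter.setContentHandler(sink); // connect the filter to the next stage
        try {
            filter.startDocument();
            filter.startElement("", "p", "p", new AttributesImpl());
            filter.characters(content.toCharArray(), 0, content.length());
            filter.endElement("", "p", "p");
            filter.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.text.toString();
    }
}
```

Only the characters event is overridden; the start and end events pass through the filter untouched, which is what keeps such filters short.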
Saxon sets up such a pipeline when an output file is opened, using a class called the SerializerFactory. You can override the standard SerializerFactory with your own subclass, which you can register using the setSerializerFactory() method of the Configuration. The SerializerFactory uses individual methods to create each stage of the pipeline, so you can either override the method that constructs the entire pipeline, or override a method that creates one of its stages. For example, if you want to subclass the XMLEmitter (perhaps to force all non-ASCII characters to be output as hexadecimal character references), you can override the method newXMLEmitter() to return an instance of your own subclass of XMLEmitter, which might override the method writeEscape().
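The escaping logic that such a subclass might apply can be sketched independently of Saxon. The class and method below are hypothetical and do not reproduce the signature of writeEscape(); the sketch handles only the non-ASCII rule and leaves the escaping of markup characters such as < and & to the surrounding code.

```java
import java.util.Locale;

/** Illustrative helper: replaces every non-ASCII character with a hexadecimal character reference. */
final class HexEscapeSketch {

    static String escapeNonAscii(CharSequence chars) {
        StringBuilder sb = new StringBuilder();
        // Iterate by code point so that surrogate pairs yield a single reference.
        chars.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                sb.append("&#x")
                  .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                  .append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }
}
```

For example, escapeNonAscii("caf\u00e9") returns "caf&#xE9;".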
Any number of user-defined attributes may be defined on xsl:output. These attributes must have names in a non-null namespace, which must not be either the XSLT or the Saxon namespace. The value of each attribute is inserted into the Properties object made available to the Emitter handling the output; such properties are ignored by the standard output methods, but can supply arbitrary information to a user-defined output method. The name of the property will be the expanded name of the attribute in JAXP format, for example {http://my-namespace/uri}local-name, and the value will be the value as given, after evaluation as an attribute value template.
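A user-defined output method could look such a property up as in the following sketch, which builds the key in the {uri}local-name form using the JDK's javax.xml.namespace.QName. The class and method names are hypothetical, and how the Properties object reaches your code depends on the output method you write.

```java
import java.util.Properties;
import javax.xml.namespace.QName;

/** Illustrative lookup of a user-defined output property by namespace URI and local name. */
final class UserOutputPropertySketch {

    /** Builds the property key in the form {namespace-uri}local-name. */
    static String propertyKey(String namespaceUri, String localName) {
        return new QName(namespaceUri, localName).toString();
    }

    /** Returns the property value, or null if the attribute was not specified. */
    static String lookup(Properties outputProperties, String namespaceUri, String localName) {
        return outputProperties.getProperty(propertyKey(namespaceUri, localName));
    }
}
```

For example, propertyKey("http://my-namespace/uri", "local-name") returns "{http://my-namespace/uri}local-name".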