Customizing Serialization

In XSLT 3.0 there are three ways the result of a stylesheet may be delivered: as a raw sequence (any XDM value), as the result of sequence normalization as defined in the W3C serialization specification, or as serialized output produced after sequence normalization. This is a substantial change. It has led to considerable internal changes in Saxon's output pipeline, some of which are visible as changed API behavior.

At the s9api level, the data passed from an Xslt30Transformer or XQueryEvaluator to a Destination is always a raw sequence (any sequence of items, whether nodes, atomic values, or functions, not wrapped in a document node). This is delivered over Saxon's internal Receiver interface, which allows nodes to be passed either in composed form (as a NodeInfo object along with its entire subtree), or in decomposed form (as a sequence of events like startElement and endElement). There are strict rules on the "well-formedness" of the calls made across this interface, defined in the Javadoc for class net.sf.saxon.event.RegularSequenceChecker, and it is possible to insert a RegularSequenceChecker into the pipeline to check that the rules are being adhered to.

It is the responsibility of the Destination to perform sequence normalization (if requested) and/or serialization.

Because sequence normalization may involve inserting item separators based on the item-separator serialization property, all destinations now have access to the full set of serialization properties.

There are a number of implementations of the Destination interface supplied with the product:

The RawDestination delivers the raw result as an XdmValue.
The Serializer supports various serialization methods.
The SAXDestination allows output to a SAX ContentHandler.
The XMLStreamWriterDestination allows output to an XMLStreamWriter.
The DOMDestination allows building of a DOM tree.
The TeeDestination allows forking of the output to two different destinations.
A destination can be obtained that streams data into an XSLT transformation or a schema validator.

In addition, it is quite possible to write your own implementation of the Destination interface, either from scratch or by subclassing an existing implementation.

Some ContentHandler implementations require a sequence of events corresponding to a well-formed document (that is, one whose document node has exactly one element node and no text nodes among its children). If this is the case, you can specify the additional output property saxon:require-well-formed="yes", which will cause Saxon to report an error if the result tree is not well-formed.
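The well-formedness condition described above can be sketched as a check over SAX events. The class below is a hypothetical illustration using only the JDK's SAX classes. It is not Saxon's implementation of saxon:require-well-formed, and the class name and error messages are assumptions made for the example.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler (not part of Saxon): rejects an event stream whose
 * document node would not have exactly one element child and no text children.
 */
class WellFormedDocumentChecker extends DefaultHandler {
    private int depth = 0;             // current element nesting depth
    private int topLevelElements = 0;  // element children of the document node

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (depth == 0 && ++topLevelElements > 1) {
            throw new SAXException("More than one top-level element");
        }
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Any text delivered outside an element is a text child of the document node.
        if (depth == 0 && length > 0) {
            throw new SAXException("Text node at top level");
        }
    }

    @Override
    public void endDocument() throws SAXException {
        if (topLevelElements != 1) {
            throw new SAXException("Document has no element child");
        }
    }
}
```

A handler like this could sit in front of a ContentHandler that cannot cope with anything other than a well-formed document, so that the problem is reported early rather than surfacing as a downstream failure.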

As specified in the JAXP interface, requests to disable or re-enable output escaping are also notified to the content handler by means of special processing instructions. The names of these processing instructions are given by the constants PI_DISABLE_OUTPUT_ESCAPING and PI_ENABLE_OUTPUT_ESCAPING in class javax.xml.transform.Result.
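As a minimal sketch of how a content handler might react to these processing instructions, the class below toggles an escaping flag when it sees either target name. The class name, the accumulated output buffer, and the very small escape() routine are assumptions made for illustration. Only the two constants come from the JAXP API.

```java
import javax.xml.transform.Result;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: honors the JAXP output-escaping processing instructions. */
class EscapingAwareHandler extends DefaultHandler {
    private boolean escaping = true;                      // escaping is on by default
    private final StringBuilder out = new StringBuilder();

    @Override
    public void processingInstruction(String target, String data) {
        if (Result.PI_DISABLE_OUTPUT_ESCAPING.equals(target)) {
            escaping = false;
        } else if (Result.PI_ENABLE_OUTPUT_ESCAPING.equals(target)) {
            escaping = true;
        }
        // Other processing instructions are ignored in this sketch.
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        out.append(escaping ? escape(text) : text);
    }

    /** Deliberately minimal escaping of markup-significant characters. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    String output() {
        return out.toString();
    }
}
```

Text received while escaping is disabled is copied through unchanged, and text received at any other time is escaped.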

An abstract implementation of the Receiver interface is available in the Emitter class. This class provides additional functionality which is useful if you want to serialize the result to a byte or character output stream. If the Receiver that you supply as an output destination is an instance of Emitter, then it has access to all the serialization parameters supplied in the xsl:output declaration, or made available using the Java API.

See the documentation of class net.sf.saxon.event.Receiver for details of the methods available, and implementations such as HTMLEmitter, XMLEmitter, and TEXTEmitter for the standard output formats supported by Saxon.

It can sometimes be useful to set up a chain of Receivers working as a pipeline. To write a filter that participates in such a pipeline, the class ProxyReceiver is supplied. See the class XMLIndenter, which handles XML indentation, as an example of how to write a ProxyReceiver.

Saxon sets up such a pipeline when an output file is opened, using a class called the SerializerFactory. You can override the standard SerializerFactory with your own subclass, which you register by calling the setSerializerFactory() method of the Configuration. The factory uses individual methods to create each stage of the pipeline, so you can either override the method that constructs the entire pipeline, or override a method that creates one of its stages. For example, if you want to subclass the XMLEmitter (perhaps to force all non-ASCII characters to be output as hexadecimal character references), you can override the method newXMLEmitter() to return an instance of your own subclass of XMLEmitter, which might override the method writeEscape().
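The character-level transformation mentioned in that example can be sketched independently of Saxon. The helper below is hypothetical: it does not reproduce the signature of writeEscape(), and it leaves markup characters such as < and & untouched. It only shows one way to turn each non-ASCII code point into a hexadecimal character reference, which is the logic such an override would need.

```java
import java.util.Locale;

/** Hypothetical helper: replaces non-ASCII characters with hexadecimal character references. */
class HexCharacterReferences {

    static String escapeNonAscii(CharSequence text) {
        StringBuilder sb = new StringBuilder(text.length());
        // Iterate by code point so that surrogate pairs yield a single reference.
        text.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                sb.append("&#x")
                  .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                  .append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("caf\u00E9"));  // prints caf&#xE9;
    }
}
```

Working in code points rather than char values matters for characters outside the Basic Multilingual Plane, which would otherwise be written as two invalid surrogate references.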

Any number of user-defined attributes may be defined on xsl:output. These attributes must have names in a non-null namespace, which must not be either the XSLT or the Saxon namespace. The value of each such attribute is inserted into the Properties object made available to the Emitter handling the output. These properties are ignored by the standard output methods, but can supply arbitrary information to a user-defined output method. The name of the property is the expanded name of the attribute in JAXP format, for example {http://my-namespace/uri}local-name, and the value is the value as given, after evaluation as an attribute value template.
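The JAXP name format can be produced with the JDK's javax.xml.namespace.QName class, whose toString() method yields the {uri}local-name form for a name in a namespace. The sketch below assumes a user-defined output method that has been handed a java.util.Properties object. The class and method names are hypothetical, and the namespace URI is the example one used above.

```java
import java.util.Properties;
import javax.xml.namespace.QName;

/** Hypothetical helper for reading user-defined xsl:output attributes from a Properties object. */
class UserOutputProperties {

    /** Builds the property key in JAXP format, for example {http://my-namespace/uri}local-name. */
    static String jaxpName(String namespaceUri, String localName) {
        return new QName(namespaceUri, localName).toString();
    }

    /** Returns the value of the user-defined attribute, or null if it was not supplied. */
    static String lookup(Properties props, String namespaceUri, String localName) {
        return props.getProperty(jaxpName(namespaceUri, localName));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("{http://my-namespace/uri}local-name", "some-value");
        System.out.println(lookup(props, "http://my-namespace/uri", "local-name"));
    }
}
```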