saxonica.com

Customizing Serialization

The output of a Saxon stylesheet or query can be directed to a user-defined output filter. This filter can be defined either as a SAX2 ContentHandler or as an implementation of the Saxon interface net.sf.saxon.event.Receiver.

One advantage of using the Saxon classes is that more information is available from the stylesheet, for example the attributes of the xsl:output element; another is that (if you are using the schema-aware version of the product) type annotations are available on element and attribute nodes.

A transformation can be invoked from the Java API using the standard JAXP method transformer.transform(source, result). The second argument must implement the JAXP interface javax.xml.transform.Result. To send output to a SAX ContentHandler, you can wrap the ContentHandler in a JAXP SAXResult object. To send output to a Saxon Receiver (which might also be an Emitter), you can supply the Receiver directly, since the Saxon Receiver interface extends the JAXP Result interface.
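As a minimal sketch using only the standard JAXP and SAX APIs, the code below wraps a user-written ContentHandler in a SAXResult and passes it to transform(). The class names ElementCounter and SaxResultDemo are hypothetical. With Saxon on the classpath, TransformerFactory.newInstance() may return Saxon's factory, but the code itself runs against whichever JAXP processor is configured.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** A trivial user-defined output filter: counts the startElement events it receives. */
class ElementCounter extends DefaultHandler {
    int elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }
}

class SaxResultDemo {
    /** Runs an XSLT transformation and sends the result tree to a ContentHandler. */
    static int countOutputElements(String xslt, String xml) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(xslt)));
        ElementCounter counter = new ElementCounter();
        // transform() needs a javax.xml.transform.Result, so wrap the handler in a SAXResult.
        transformer.transform(new StreamSource(new StringReader(xml)), new SAXResult(counter));
        return counter.elements;
    }
}
```

Applying an identity stylesheet to <a><b/><c/></a> delivers three startElement events to the counter.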

When running XQuery, Saxon offers a similar method on the XQueryExpression object: the run() method. This also takes an argument of type Result, which may be (among other things) a SAXResult or a Saxon Receiver.

Some ContentHandler implementations require a sequence of events corresponding to a well-formed document (that is, one whose document node has exactly one element node and no text nodes among its children). If this is the case, you can specify the additional output property saxon:require-well-formed="yes", which will cause Saxon to report an error if the result tree is not well-formed.
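The constraint enforced by saxon:require-well-formed can be illustrated with a ContentHandler that checks the event stream itself. This is a hypothetical sketch, not Saxon's implementation: WellFormedChecker rejects a second top-level element, any text at the top level, and a document that ends without any element.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical checker: accepts only event streams that form a well-formed document. */
class WellFormedChecker extends DefaultHandler {
    private int depth = 0;          // current element nesting depth
    private boolean seenRoot = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depth == 0) {
            // A second element at depth 0 means the document node has more than one element child.
            if (seenRoot) throw new SAXException("more than one top-level element: " + qName);
            seenRoot = true;
        }
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Any text directly under the document node makes the result not well-formed.
        if (depth == 0 && length > 0) throw new SAXException("text outside the document element");
    }

    @Override
    public void endDocument() throws SAXException {
        if (!seenRoot) throw new SAXException("no document element");
    }
}
```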

As specified in the JAXP interface, requests to disable or re-enable output escaping are also notified to the content handler by means of special processing instructions. The names of these processing instructions are defined by the constants PI_DISABLE_OUTPUT_ESCAPING and PI_ENABLE_OUTPUT_ESCAPING in the interface javax.xml.transform.Result.
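A content handler that writes text itself must react to these processing instructions. In the sketch below, EscapingTextCollector (a hypothetical name) switches escaping of < and & off and on when it sees the two JAXP processing-instruction targets; a real serializer would also handle attributes, other special characters, and the remaining SAX events.

```java
import javax.xml.transform.Result;
import org.xml.sax.helpers.DefaultHandler;

/** Collects character data as XML text, honouring the JAXP output-escaping PIs. */
class EscapingTextCollector extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private boolean escaping = true;

    @Override
    public void processingInstruction(String target, String data) {
        if (Result.PI_DISABLE_OUTPUT_ESCAPING.equals(target)) {
            escaping = false;
        } else if (Result.PI_ENABLE_OUTPUT_ESCAPING.equals(target)) {
            escaping = true;
        }
        // Any other processing instruction is ignored in this sketch.
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (escaping && c == '<') out.append("&lt;");
            else if (escaping && c == '&') out.append("&amp;");
            else out.append(c); // written verbatim when escaping is disabled
        }
    }

    String text() {
        return out.toString();
    }
}
```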

As an alternative to specifying the destination in the transform() or run() methods, the Receiver or ContentHandler to be used may be specified in the method attribute of the xsl:output element, as a fully-qualified class name; for example method="prefix:com.acme.xml.SaxonOutputFilter". The namespace prefix is ignored, but must be present to meet XSLT conformance rules.

An abstract implementation of the Receiver interface is available in the Emitter class. This class provides additional functionality useful if you want to serialize the result to a byte or character output stream. If the Receiver that you supply as an output destination is an instance of Emitter, then it has access to all the serialization parameters supplied in the xsl:output declaration, or made available using the Java API.

See the documentation of the interface net.sf.saxon.event.Receiver for details of the methods available, and implementations such as HTMLEmitter, XMLEmitter, and TEXTEmitter for the standard output formats supported by Saxon.

It can sometimes be useful to set up a chain of Receivers working as a pipeline. Saxon supplies the class ProxyReceiver as a base class for filters that participate in such a pipeline: it passes each event on to the next Receiver, so a subclass need only override the methods for the events it wants to change. The class XMLIndenter, which handles XML indentation, is a good example of how to write a ProxyReceiver.
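ProxyReceiver belongs to Saxon's own event API, so as an analogy only, the sketch below shows the same pass-through pattern with the standard SAX class org.xml.sax.helpers.XMLFilterImpl, which forwards every event to the next ContentHandler unless a method is overridden. UpperCaseNames, NameRecorder, and PipelineDemo are hypothetical names.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Pipeline stage: renames elements to upper case, then forwards the event unchanged otherwise. */
class UpperCaseNames extends XMLFilterImpl {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, localName.toUpperCase(Locale.ROOT), qName.toUpperCase(Locale.ROOT), atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName.toUpperCase(Locale.ROOT), qName.toUpperCase(Locale.ROOT));
    }
}

/** Final stage: records element names in document order. */
class NameRecorder extends DefaultHandler {
    final StringBuilder names = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.append(qName).append(' ');
    }
}

class PipelineDemo {
    static String run(String xml) throws Exception {
        NameRecorder sink = new NameRecorder();
        UpperCaseNames filter = new UpperCaseNames();
        filter.setContentHandler(sink); // each stage forwards its events to the next
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(filter); // the parser feeds the first stage
        reader.parse(new InputSource(new StringReader(xml)));
        return sink.names.toString().trim();
    }
}
```

Parsing <a><b/><c/></a> through this two-stage pipeline records the names A B C.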

Saxon sets up such a pipeline when an output file is opened, using a class called SerializerFactory. The standard implementation is net.sf.saxon.event.SerializerFactory, but you can replace it with your own subclass, which you register using the setSerializerFactory() method of the Configuration. The SerializerFactory uses a separate method to create each stage of the pipeline, so you can override either the method that constructs the entire pipeline or a method that creates just one of its stages. For example, if you want to subclass the XMLEmitter (perhaps to force all non-ASCII characters to be output as hexadecimal character references), you can override the method newXMLEmitter() to return an instance of your own subclass of XMLEmitter, which might in turn override the method writeEscape().
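The exact signatures of newXMLEmitter() and writeEscape() vary between Saxon releases and are not reproduced here. What can be shown portably is the escaping rule such an override might apply. The hypothetical AsciiEscaper below writes the XML special characters as entity references and every character above U+007F as a hexadecimal character reference. It iterates by code point, so a character outside the Basic Multilingual Plane becomes one reference rather than two surrogate halves.

```java
import java.util.Locale;

/** Hypothetical escaping rule: ASCII-only output with hex character references. Not Saxon's code. */
class AsciiEscaper {
    static String escape(CharSequence text) {
        StringBuilder sb = new StringBuilder();
        // codePoints() combines surrogate pairs into a single code point.
        text.codePoints().forEach(cp -> {
            if (cp == '<') sb.append("&lt;");
            else if (cp == '>') sb.append("&gt;");
            else if (cp == '&') sb.append("&amp;");
            else if (cp > 0x7F) {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase(Locale.ROOT)).append(';');
            } else sb.appendCodePoint(cp);
        });
        return sb.toString();
    }
}
```

For example, escape("caf\u00e9 <1>") returns caf&#xE9; &lt;1&gt;.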

Rather than writing an output filter in Java, Saxon also allows you to process the output through another XSLT stylesheet. To do this, simply name the next stylesheet in the saxon:next-in-chain attribute of xsl:output.
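saxon:next-in-chain is a Saxon extension. For comparison, standard JAXP can chain two stylesheets by sending the first transformation's result to a TransformerHandler obtained from a SAXTransformerFactory. The sketch below assumes that the configured TransformerFactory supports SAXTransformerFactory.FEATURE (the code checks this); StylesheetChain is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StylesheetChain {
    /** Applies firstXslt to xml, then applies secondXslt to that result, returning the serialized output. */
    static String run(String firstXslt, String secondXslt, String xml) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TransformerFactory does not support SAX chaining");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;
        // The second stylesheet consumes SAX events and serializes its own result.
        TransformerHandler second = stf.newTransformerHandler(new StreamSource(new StringReader(secondXslt)));
        StringWriter out = new StringWriter();
        second.setResult(new StreamResult(out));
        // The first stylesheet's result tree is delivered to the second as SAX events.
        Transformer first = stf.newTransformer(new StreamSource(new StringReader(firstXslt)));
        first.transform(new StreamSource(new StringReader(xml)), new SAXResult(second));
        return out.toString();
    }
}
```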

Any number of user-defined attributes may be specified on xsl:output. These attributes must have names in a non-null namespace, which must not be either the XSLT or the Saxon namespace. The value of each attribute is inserted into the Properties object made available to the Emitter handling the output; such properties are ignored by the standard output methods, but can supply arbitrary information to a user-defined output method. The name of the property is the expanded name of the attribute in JAXP format, for example {http://my-namespace/uri}local-name, and the value is the value as given, after evaluation as an attribute value template.
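Assuming the output method receives a java.util.Properties object as described above, a user-defined output method could look up one of these attributes by building the {uri}local-name key with javax.xml.namespace.QName, whose toString() produces exactly that form when the namespace URI is non-empty. UserOutputProperties is a hypothetical helper; in the test, the Properties object is filled in by hand rather than by Saxon.

```java
import java.util.Properties;
import javax.xml.namespace.QName;

class UserOutputProperties {
    /** Looks up a user-defined xsl:output attribute by namespace URI and local name. */
    static String lookup(Properties outputProperties, String namespaceUri, String localName) {
        // For a non-empty namespace, QName.toString() returns "{namespaceUri}localName".
        String key = new QName(namespaceUri, localName).toString();
        return outputProperties.getProperty(key); // null if the attribute was not specified
    }
}
```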
