Writing input filters
SaxonJ generally takes its input from a JAXP SAXSource object, which
represents a sequence of SAX events as output by an XML parser. These events are sent
to the internal class ReceivingContentHandler,
which converts them to a slightly
different form and passes them on to a Saxon
Receiver. In a typical scenario, the events are passed through
a pipeline of Receivers, each of which modifies the events in some way.
Examples of the steps on this pipeline include:
- A whitespace stripper, responsible for removing whitespace as directed by the xsl:strip-space and xsl:preserve-space declarations.
- A schema validator, responsible for performing schema validation (which not only validates the input against the schema, but also adds type annotations and expands default values for absent attributes).
- An annotation stripper, responsible for removing type annotations as directed by the input-type-annotations="strip" attribute in a stylesheet.
At the end of this pipeline, the events are typically passed to one of:
- A tree builder, which builds a tree of nodes, ready for query or transformation.
- A streaming XSLT transformation.
- A serializer (to implement an identity transformation).
It is possible to add a user-written filter to the input pipeline. This might be used, for example, to:
- Rename elements or attributes, perhaps changing their namespace.
- Add or remove elements or attributes.
- Strip comments or processing instructions.
- Expand processing instructions (for example, a processing instruction might contain a SQL query to access a database).
- Perform a complete XSLT transformation, streamed or unstreamed.
A filter can either be inserted to process SAX events, before they are converted
to Receiver events, or it can be inserted to process Receiver
events after the conversion.
To filter events at the SAX level, the techniques include:
- Generate the transformation as an XMLFilter using the newXMLFilter() method of the TransformerFactory. This works with XSLT only. A drawback of this approach is that it is not possible to supply parameters to the transformation using standard JAXP facilities. It is possible, however, by casting the XMLFilter to a net.sf.saxon.jaxp.FilterImpl and calling its getTransformer() method, which returns a Transformer object offering the usual setParameter() method.
- Generate the transformation as a SAX ContentHandler using the newTransformerHandler() method. The pipeline stages after the transformation can be added by giving the transformation a SAXResult as its destination. This again is XSLT only.
- Implement the pipeline step before the transformation or query as an XMLFilter, and use this as the XMLReader part of a SAXSource, pretending to be an XML parser. This technique works with both XSLT and XQuery, and it can even be used from the command line, by nominating the XMLFilter as the source parser using the -x option on the command line.
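The last of these techniques, an XMLFilter standing in for the parser in a SAXSource, can be sketched using only the SAX and JAXP classes in the JDK. This is a minimal illustration rather than Saxon-specific code: the class name RenameFilter and the element names para and p are hypothetical, and the JDK's built-in identity Transformer is used here as a stand-in for the transformation or query that would normally consume the source.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class RenameFilterDemo {

    /** Hypothetical filter: renames unprefixed <para> to <p> and drops processing instructions. */
    static class RenameFilter extends XMLFilterImpl {
        RenameFilter(XMLReader parent) {
            super(parent);
        }

        private static String map(String name) {
            return "para".equals(name) ? "p" : name;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, map(local), map(qName), atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            super.endElement(uri, map(local), map(qName));
        }

        @Override
        public void processingInstruction(String target, String data) {
            // Not forwarding the event removes the processing instruction.
        }
    }

    /** Parses the XML through the filter and serializes the filtered events. */
    static String run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();

        // The filter is itself an XMLReader, so it can pose as the parser.
        XMLReader filter = new RenameFilter(parser);
        SAXSource source = new SAXSource(filter, new InputSource(new StringReader(xml)));

        // Stand-in consumer: an identity transformation that serializes the events.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<doc><?skip me?><para>Hello</para></doc>"));
    }
}
```

Because the SAXSource is an ordinary JAXP Source, the same object could in principle be handed to any consumer that accepts one; only the methods for the events being changed need to be overridden, since XMLFilterImpl forwards everything else untouched.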
To insert a filter for Receiver events, it is usual to implement the
filter by extending the class ProxyReceiver, overriding only the methods for those
events that need to be changed. The filter can be injected into the pipeline by supplying
the document in the form of an AugmentedSource and registering the filter on it with
its addFilter() method.
For example, a filter class MyFilter might override the comment()
method to do nothing, which has the effect of stripping comments from the source document.
Filters inserted into the pipeline in this way are applied after any system-defined filters such as the schema validator.
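The idea of a proxy that forwards every event except the ones it overrides can be illustrated without Saxon on the classpath. In the sketch below, EventSink, ForwardingSink, CommentStripper and Recorder are simplified stand-ins invented for this illustration; they are not Saxon's Receiver or ProxyReceiver, whose real methods take different arguments. The sketch only shows why overriding comment() to do nothing strips comments from the stream.

```java
import java.util.ArrayList;
import java.util.List;

class ProxyFilterSketch {

    /** Simplified stand-in for an event-receiving interface (not Saxon's Receiver). */
    interface EventSink {
        void startElement(String name);
        void endElement(String name);
        void characters(String text);
        void comment(String text);
    }

    /** Stand-in for a proxy base class: forwards every event unchanged to the next stage. */
    static class ForwardingSink implements EventSink {
        private final EventSink next;

        ForwardingSink(EventSink next) {
            this.next = next;
        }

        public void startElement(String name) { next.startElement(name); }
        public void endElement(String name) { next.endElement(name); }
        public void characters(String text) { next.characters(text); }
        public void comment(String text) { next.comment(text); }
    }

    /** Overrides only comment(); every other event is still forwarded by the base class. */
    static class CommentStripper extends ForwardingSink {
        CommentStripper(EventSink next) {
            super(next);
        }

        @Override
        public void comment(String text) {
            // Doing nothing here means the comment never reaches the next stage.
        }
    }

    /** Terminal stage that records the events it receives. */
    static class Recorder implements EventSink {
        final List<String> events = new ArrayList<>();

        public void startElement(String name) { events.add("start:" + name); }
        public void endElement(String name) { events.add("end:" + name); }
        public void characters(String text) { events.add("text:" + text); }
        public void comment(String text) { events.add("comment:" + text); }
    }

    /** Pushes a small event sequence through the filter and returns what arrived. */
    static List<String> demo() {
        Recorder end = new Recorder();
        EventSink pipeline = new CommentStripper(end);
        pipeline.startElement("doc");
        pipeline.comment("internal note");
        pipeline.characters("Hello");
        pipeline.endElement("doc");
        return end.events;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [start:doc, text:Hello, end:doc]
    }
}
```

A real filter would extend ProxyReceiver in the same spirit, overriding only the events it needs to change and leaving the rest to the inherited forwarding behaviour.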