Writing input filters
SaxonJ generally takes its input from a JAXP SAXSource object, which represents a sequence of SAX events as output by an XML parser. These events are sent to the internal class ReceivingContentHandler, which converts them to a slightly different format and passes them to a Saxon Receiver. In a typical scenario, the events are passed through a pipeline of Receivers, each of which modifies the events in some way.
Examples of the steps on this pipeline include:
- A whitespace stripper, responsible for removing whitespace as directed by the xsl:strip-space and xsl:preserve-space declarations.
- A schema validator, responsible for performing schema validation (which not only validates the input against the schema, but also adds type annotations and expands default values for absent attributes).
- An annotation stripper, responsible for removing type annotations as directed by the input-type-annotations="strip" attribute in a stylesheet.
At the end of this pipeline, the events are typically passed to one of:
- A tree builder, which builds a tree of nodes, ready for query or transformation.
- A streaming XSLT transformation.
- A serializer (to implement an identity transformation).
It is possible to add a user-written filter to the input pipeline. This might be used, for example, to:
- Rename elements or attributes, perhaps changing their namespace.
- Add or remove elements or attributes.
- Strip comments or processing instructions.
- Expand processing instructions (for example, a processing instruction might contain a SQL query to access a database).
- Perform a complete XSLT transformation, streamed or unstreamed.
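As a rough illustration of the first and third uses in this list, the sketch below renames an element and drops processing instructions at the SAX level. It uses only standard JAXP and SAX classes, so it is not specific to Saxon. The class names RenamingFilterDemo and RenamingFilter, and the element names old and new, are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class RenamingFilterDemo {

    // Hypothetical filter: renames no-namespace <old> elements to <new>
    // and drops every processing instruction.
    static class RenamingFilter extends XMLFilterImpl {
        RenamingFilter(XMLReader parent) {
            super(parent);
        }

        private static boolean isOld(String uri, String localName) {
            return uri.isEmpty() && "old".equals(localName);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            boolean match = isOld(uri, localName);
            super.startElement(uri, match ? "new" : localName, match ? "new" : qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            boolean match = isOld(uri, localName);
            super.endElement(uri, match ? "new" : localName, match ? "new" : qName);
        }

        @Override
        public void processingInstruction(String target, String data) {
            // Not forwarded, so the instruction never reaches the next stage.
        }
    }

    // Parses the XML string through the filter and serializes the filtered events.
    static String run(String xml) {
        try {
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);
            XMLReader filter = new RenamingFilter(parsers.newSAXParser().getXMLReader());

            // The filter takes the place of the XML parser in the SAXSource.
            SAXSource source = new SAXSource(filter, new InputSource(new StringReader(xml)));

            // An identity transformation stands in for the real query or stylesheet.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc><?sql select 1?><old a=\"1\">text</old></doc>"));
    }
}
```

Running main should print the input document with old renamed to new and the processing instruction removed. Comments are not affected, because XMLFilterImpl does not intercept lexical events.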
A filter can either be inserted to process SAX events, before they are converted to Receiver events, or it can be inserted to process Receiver events after the conversion.
To filter events at the SAX level, the techniques include:
- Generate the transformation as an XMLFilter using the newXMLFilter() method of the TransformerFactory. This works with XSLT only. A drawback of this approach is that it is not possible to supply parameters to the transformation using standard JAXP facilities. It is possible, however, by casting the XMLFilter to a net.sf.saxon.jaxp.FilterImpl and calling its getTransformer() method, which returns a Transformer object offering the usual setParameter() method.
- Generate the transformation as a SAX ContentHandler using the newTransformerHandler() method. The pipeline stages after the transformation can be added by giving the transformation a SAXResult as its destination. This again is XSLT only.
- Implement the pipeline step before the transformation or query as an XMLFilter, and use this as the XMLReader part of a SAXSource, pretending to be an XML parser. This technique works with both XSLT and XQuery, and it can even be used from the command line, by nominating the XMLFilter as the source parser using the -x option on the command line.
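The second technique can be sketched with standard JAXP classes alone. In the example below, a stylesheet is compiled into a TransformerHandler, and a SAXResult connects its output to a second handler that stands in for the later pipeline stages. The class name HandlerPipelineDemo, the stylesheet, and the element names are invented for the example, and the cast to SAXTransformerFactory assumes that the configured factory supports SAX, as both the JDK default and Saxon do.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class HandlerPipelineDemo {

    // Hypothetical stylesheet: copies everything, but renames <old> to <new>.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='old'>"
      + "<new><xsl:apply-templates select='@*|node()'/></new>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run(String xml) {
        try {
            // Assumption: the configured factory is a SAXTransformerFactory.
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();

            // Stage 1: the transformation, exposed as a SAX ContentHandler.
            TransformerHandler transform =
                factory.newTransformerHandler(new StreamSource(new StringReader(STYLESHEET)));

            // Stage 2: an identity handler standing in for the later pipeline stages.
            TransformerHandler next = factory.newTransformerHandler();
            next.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            next.setResult(new StreamResult(out));

            // The SAXResult sends the output of stage 1 into stage 2.
            transform.setResult(new SAXResult(next));

            // The parser drives the pipeline by sending its events to stage 1.
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);
            XMLReader parser = parsers.newSAXParser().getXMLReader();
            parser.setContentHandler(transform);
            parser.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc><old a=\"1\">text</old></doc>"));
    }
}
```

In a real pipeline the second handler would be replaced by whatever ContentHandler consumes the transformed events, such as another TransformerHandler compiled from a further stylesheet.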
To insert a filter for Receiver events, it is usual to implement the filter by extending the class ProxyReceiver, overriding only the methods for those events that need to be changed. The filter can be injected into the pipeline by supplying the document in the form of an AugmentedSource: a typical example would be:
Here MyFilter is typically a class that extends ProxyReceiver by overriding some of its methods: for example, you might override the comment() method to do nothing, which has the effect of stripping comments from the source document.
Filters inserted into the pipeline in this way are applied after any system-defined filters such as the schema validator.