JAXP source types
When a user application invokes SaxonJ through the Java API, the source document is supplied as an instance of the JAXP Source interface. This is true whether invoking an XSLT transformation, an XQuery query, or a free-standing XPath expression. Source is essentially a marker interface, so the Source that is supplied must be a kind of Source that Saxon recognizes.
SaxonJ recognizes all three kinds of Source defined in JAXP: a StreamSource, a SAXSource, and a DOMSource.
- When using a StreamSource, note:
  - A StreamSource that wraps an InputStream or Reader can only be used once: it is consumed by use. However, a StreamSource that wraps a File or URI can be used multiple times.
  - Whoever creates an InputStream or Reader is responsible for closing it after use. This means that if Saxon creates an InputStream from a supplied File or URI, it will close that InputStream after use; but if the InputStream is created by the calling application, then the calling application is responsible for closing it. (On some operating systems it is important not to leave unclosed streams lying around.)
  - If the StreamSource wraps an InputStream or Reader, then the base URI of the document is taken from the SystemID property of the StreamSource. If this is not set, then the base URI is unknown, which may cause constructs that require a known base URI to fail.
- When using a SAXSource, note:
  - If no XMLReader is supplied, Saxon will allocate one, based on settings in the Configuration.
  - Processing of the contained InputSource is entirely the responsibility of the XML parser; Saxon is not involved in this.
  - Saxon will modify properties of the supplied XMLReader: it will set the ContentHandler and LexicalHandler so that it can receive the output of parsing, and it will set the ErrorHandler so it can handle parsing errors.
  - Saxon makes no attempt to ensure that processing of a SAXSource or its underlying XMLReader is thread-safe. The same XMLReader should not be used concurrently in multiple threads.
- When using a DOMSource, note:
  - The DOM is not thread-safe, even when used in read-only mode. Saxon therefore synchronizes all its access to DOM methods. However, that's no protection if there are application threads accessing the DOM that aren't using Saxon.
  - Saxon can only handle a DOM that is namespace-aware. If you are building the DOM using JAXP interfaces, be sure to set DocumentBuilderFactory.setNamespaceAware(true) (this is not the default!). Saxon cannot reliably detect whether the DOM is namespace-aware (it gives a warning for some common problems, but not all) and in general, the results of using a non-namespace-aware DOM are unpredictable.
  - If the DOM is created programmatically (rather than being built by parsing lexical XML), then the DOM APIs perform very little checking: for example it is possible to have elements and attributes with invalid names. Saxon makes no attempt to check for such conditions, and may produce unpredictable results.
  - The base URI of the document is taken from the SystemID property of the DOMSource. If this is not set, then the base URI is unknown, which may cause constructs that require a known base URI to fail.
  - Saxon's native TinyTree model is faster than DOM by a factor of 5 to 10 in typical XPath searches. Don't use the DOM with Saxon unless you have a very good reason.
  - From Saxon 9.8, Saxon-EE uses a new mechanism for processing DOM trees, called the Domino model. This involves creating an index of all the nodes in the DOM, providing for faster navigation. Saxon-PE and Saxon-HE continue to use the DOM NodeWrapper model, where DOM methods are used to navigate the tree. A transformation using the Domino model is still slower than Saxon's native TinyTree, but only by a factor of two. It also uses a lot more memory.
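
The notes on StreamSource and DOMSource above can be illustrated with a minimal sketch. It uses only the JDK's built-in JAXP classes and an identity transformation so that it runs on its own; it does not call any Saxon-specific API. The class name, method names, sample XML, and the example.com URIs are all illustrative assumptions, not part of Saxon.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Illustrative sketch only: names and sample data are assumptions.
final class JaxpSourceSketch {

    static final String XML =
        "<doc xmlns='http://example.com/ns'><item>1</item></doc>";

    private static InputStream sampleStream() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    /** Runs an identity transformation and returns the serialized result. */
    static String copy(Source source) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** StreamSource over an application-created stream. */
    static String fromStream() {
        // The application created the stream, so the application closes it;
        // try-with-resources does that here.
        try (InputStream in = sampleStream()) {
            StreamSource source = new StreamSource(in);
            // A stream carries no location of its own, so the system ID
            // (and hence the base URI) is set explicitly.
            source.setSystemId("http://example.com/input.xml");
            // The stream is consumed by this call; the same StreamSource
            // should not be reused for a second run.
            return copy(source);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a DOM with namespace awareness switched on (not the default). */
    static Document parseNamespaceAware() {
        try (InputStream in = sampleStream()) {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** DOMSource with an explicit system ID. */
    static String fromDom() {
        DOMSource source = new DOMSource(parseNamespaceAware());
        source.setSystemId("http://example.com/input.xml");
        return copy(source);
    }

    public static void main(String[] args) {
        System.out.println(fromStream());
        System.out.println(fromDom());
    }
}
```

The Source objects are constructed the same way whichever JAXP-compliant processor consumes them, so the stream-closing, system ID, and namespace-awareness points apply unchanged when the consumer is Saxon.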
Other kinds of Source that are recognized by most Saxon interfaces are:
- TreeInfo: Saxon's TreeInfo holds information about a document (or more generally any tree of nodes), and can be used directly as a Source of a transformation.
- NodeInfo: Saxon's NodeInfo represents a node in a tree, and can be used directly as a Source of a transformation.
- StaxSource: allows a pull parser to be used.
- PullSource: Saxon's internal pull interface.
- EventSource: similar to an XMLReader, but with a much simpler interface, an EventSource has a send() method that sends a stream of events to a Saxon Receiver.
- SaplingDocument: a sapling tree constructed using the sapling construction interface can be used anywhere (within Saxon) that a Source is expected.
Saxon also accepts input from an XMLStreamReader (javax.xml.stream.XMLStreamReader), that is, a StAX pull parser as defined in JSR 173. This is achieved by creating an instance of net.sf.saxon.pull.StaxBridge, supplying the XMLStreamReader using the setXMLStreamReader() method, and wrapping the StaxBridge object in an instance of net.sf.saxon.pull.PullSource, which implements the JAXP Source interface and can be used in any Saxon method that expects a Source. Saxon has been validated with two StAX parsers: the Zephyr parser from Sun (which is supplied as standard with JDK 1.6), and the open-source Woodstox parser from Tatu Saloranta. In Saxonica's experience, Woodstox is the more reliable of the two. However, there is no immediate benefit in using a pull parser to supply Saxon input rather than a push parser; the main use case for using an XMLStreamReader is when the data is supplied from some source other than parsing of lexical XML.
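
As a rough illustration of supplying input from an XMLStreamReader, the sketch below wraps the reader in the standard JAXP javax.xml.transform.stax.StAXSource and copies it through the JDK's identity transformer. This is a swapped-in technique chosen so the example runs with the JDK alone: it does not use the StaxBridge and PullSource route described above, and the class and method names are illustrative assumptions. For simplicity the reader here is created by parsing lexical XML, which is not the main use case identified above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Illustrative sketch only: JDK classes throughout, no Saxon-specific API.
final class StaxInputSketch {

    /** Copies a document supplied through a StAX pull parser. */
    static String copyFromPullParser(String xml) {
        try {
            // A freshly created reader is positioned at START_DOCUMENT,
            // which is one of the states StAXSource accepts.
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                StringWriter out = new StringWriter();
                TransformerFactory.newInstance().newTransformer()
                        .transform(new StAXSource(reader), new StreamResult(out));
                return out.toString();
            } finally {
                // The application created the reader, so it closes it.
                reader.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyFromPullParser("<a><b>x</b></a>"));
    }
}
```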
Nodes in Saxon's implementation of the XPath data model are represented by the interface NodeInfo. A NodeInfo is itself a Source, which means that any method in the API that requires a Source object will accept any implementation of NodeInfo. As discussed in the next section, implementations of NodeInfo are available to wrap Axiom, DOM, DOM4J, JDOM2, or XOM nodes, and in all cases these wrapper objects can be used wherever a Source is required.
Saxon also provides a class net.sf.saxon.lib.AugmentedSource which implements the Source interface. This class encapsulates one of the standard Source objects, and allows additional processing options to be specified. These options include whitespace handling, schema and DTD validation, XInclude processing, error handling, choice of XML parser, and choice of Saxon tree model.
Saxon allows additional Source types to be supported by registering a SourceResolver with the Configuration object. The task of a SourceResolver is to convert a Source that Saxon does not recognize into a Source that it does recognize. For example, this may be done by building the document tree in memory and returning the NodeInfo object representing the root of the tree.
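
The conversion idea can be sketched without Saxon on the classpath. The example below does not implement Saxon's SourceResolver interface: StringSource and toRecognizedSource are hypothetical names introduced purely for illustration. It shows an application-defined Source being turned into a standard StreamSource. Returning null for an unrecognized input is an assumption of this sketch.

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

// Illustrative sketch only: this is NOT Saxon's SourceResolver interface.
final class SourceConversionSketch {

    /** Hypothetical application-defined Source holding XML text in memory. */
    static final class StringSource implements Source {
        private final String xml;
        private String systemId;

        StringSource(String xml) {
            this.xml = xml;
        }

        String getXml() {
            return xml;
        }

        @Override
        public void setSystemId(String systemId) {
            this.systemId = systemId;
        }

        @Override
        public String getSystemId() {
            return systemId;
        }
    }

    /**
     * Hypothetical conversion step: maps a StringSource to a StreamSource,
     * carrying the system ID across so the base URI is not lost.
     * Returns null for any other kind of Source (an assumption of this sketch).
     */
    static Source toRecognizedSource(Source source) {
        if (source instanceof StringSource) {
            StringSource custom = (StringSource) source;
            StreamSource result = new StreamSource(new StringReader(custom.getXml()));
            result.setSystemId(custom.getSystemId());
            return result;
        }
        return null;
    }
}
```

A real resolver registered with the Configuration would perform the same kind of mapping, possibly returning a NodeInfo for a tree built in memory, as the paragraph above describes.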