Pull processing
Saxon 8.3 contains some new classes to support a pull pipeline. At present these should be regarded as preliminary and experimental: they provide some new ways of supplying input to Saxon and reading results from it, but they do not yet play any significant role in the product architecture. The interfaces are likely to change.
A new interface, PullProvider, is included. It is modelled on the XMLStreamReader interface that forms part of StAX (the Streaming API for XML), but modified to use Saxon concepts such as NamePools and SequenceIterators. The interface allows a caller to read an XML document by a sequence of calls on the method next(): each call advances a cursor and makes information about the current position available. Typically, next() reports that it has read the start of an element, a text node, a comment, the end of an element, and so on. Attributes and namespaces are not reported as separate events; instead, information about them is available to the caller as soon as the START_ELEMENT event has been reported.
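The cursor model that PullProvider borrows can be seen with the standard StAX API itself (javax.xml.stream, part of Java SE since version 6). The following sketch is not Saxon code, and CursorDemo and its events method are illustrative names only. It advances the cursor with next() and reads the attributes while the reader is positioned on the START_ELEMENT event, since they are not reported as events of their own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative only: demonstrates the StAX cursor model, not Saxon's PullProvider.
final class CursorDemo {
    // Reads a document by repeated calls to next(), recording one string per event.
    static List<String> events(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<String> out = new ArrayList<>();
            while (reader.hasNext()) {
                int event = reader.next(); // advance the cursor by one event
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Attributes are not separate events: read them while on START_ELEMENT.
                    StringBuilder sb = new StringBuilder("start " + reader.getLocalName());
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        sb.append(' ').append(reader.getAttributeLocalName(i))
                          .append('=').append(reader.getAttributeValue(i));
                    }
                    out.add(sb.toString());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    out.add("end " + reader.getLocalName());
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    out.add("text " + reader.getText());
                } else if (event == XMLStreamConstants.COMMENT) {
                    out.add("comment " + reader.getText());
                }
                // Other event types (processing instructions, DTD, ...) are ignored here.
            }
            reader.close();
            return out;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("cannot parse input", e);
        }
    }
}
```

For example, CursorDemo.events("<a x=\"1\">hi<!--c--><b/></a>") returns the events start a x=1, text hi, comment c, start b, end b, end a.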
In fact, a PullProvider can read any XPath sequence, which may contain both nodes and atomic values. When a node is encountered, the client can "drill down" to get the events within the subtree rooted at that node, or it can skip the node and move on. It is not possible to navigate in arbitrary directions from the node, because the node may have no real existence in memory: this is a streaming interface.
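The drill-down-or-skip pattern can be sketched with a small, hypothetical event model. None of the names below (Ev, Item, SequencePull, SequenceClient, skipToMatchingEnd) should be read as Saxon's API; the sketch only assumes, as described above, that a sequence may interleave atomic values with element subtrees, and that a client positioned on the start of an element can either keep reading into it or jump past it to the matching end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified event kinds; not Saxon's PullProvider constants.
enum Ev { START_ELEMENT, END_ELEMENT, TEXT, ATOMIC_VALUE, END_OF_INPUT }

// One pre-recorded event with its payload (element name, text, or atomic value).
final class Item {
    final Ev ev;
    final Object value;
    Item(Ev ev, Object value) { this.ev = ev; this.value = value; }
}

// Hypothetical cursor over a sequence that mixes atomic values with element subtrees.
final class SequencePull {
    private final List<Item> items;
    private int pos = -1;

    SequencePull(List<Item> items) { this.items = items; }

    Ev next() {
        if (pos < items.size()) pos++;
        return pos < items.size() ? items.get(pos).ev : Ev.END_OF_INPUT;
    }

    Object current() { return items.get(pos).value; }

    // Call while positioned on START_ELEMENT: advance to the matching END_ELEMENT,
    // consuming (but not reporting) everything in between.
    void skipToMatchingEnd() {
        int depth = 1;
        while (depth > 0) {
            Ev e = next();
            if (e == Ev.START_ELEMENT) depth++;
            else if (e == Ev.END_ELEMENT) depth--;
            else if (e == Ev.END_OF_INPUT) throw new IllegalStateException("unbalanced events");
        }
    }
}

final class SequenceClient {
    // Collects atomic values and the text inside <keep> elements; other elements are skipped.
    static List<String> collect(SequencePull p) {
        List<String> out = new ArrayList<>();
        for (Ev e = p.next(); e != Ev.END_OF_INPUT; e = p.next()) {
            switch (e) {
                case ATOMIC_VALUE: out.add("atomic " + p.current()); break;
                case TEXT: out.add("text " + p.current()); break;
                case START_ELEMENT:
                    // Drill into <keep> simply by continuing to call next(); skip anything else.
                    if (!"keep".equals(p.current())) p.skipToMatchingEnd();
                    break;
                default: break; // END_ELEMENT needs no action here
            }
        }
        return out;
    }
}
```

Skipping only counts start and end events to find the matching end, so the client never needs random access to the node, which is what keeps this approach compatible with streaming.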
A class PullSource is available that wraps a PullProvider as a JAXP Source object. This allows any PullProvider to be supplied as input to a transformation or query.
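The JAXP Source interface carries very little beyond a system ID; a processor recognizes a particular kind of Source by its concrete class and reads from it accordingly, which is what makes an adapter such as PullSource possible. The sketch below shows that shape using hypothetical names (EventCursor, CursorSource, SourceDispatch); it is not the PullSource implementation.

```java
import javax.xml.transform.Source;

// Hypothetical pull cursor: next() returns an event code, or -1 at end of input.
interface EventCursor {
    int next();
}

// Hypothetical adapter that lets a pull cursor travel through APIs expecting a JAXP Source.
final class CursorSource implements Source {
    private final EventCursor cursor;
    private String systemId;

    CursorSource(EventCursor cursor) { this.cursor = cursor; }

    EventCursor getCursor() { return cursor; }

    @Override public void setSystemId(String systemId) { this.systemId = systemId; }
    @Override public String getSystemId() { return systemId; }
}

// How a consuming processor might recognize the adapter: by its concrete class.
final class SourceDispatch {
    static String describe(Source source) {
        if (source instanceof CursorSource) {
            // A real processor would now read events via ((CursorSource) source).getCursor().
            return "pull cursor";
        }
        return "unsupported: " + source.getClass().getSimpleName();
    }
}
```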
It is possible to obtain a PullProvider that reads the contents of a Saxon tree, starting at a given node. There are two variants: TreeWalker, which can handle any tree (that is, any implementation of the NodeInfo interface), and TinyTreeWalker, which is optimized for the TinyTree.
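One way a walker can produce pull events from an in-memory tree, starting at an arbitrary node, is to keep an explicit stack of child iterators instead of recursing, so that each call to next() does a bounded amount of work. The sketch uses a hypothetical TreeNode class rather than NodeInfo, and is not a description of how TreeWalker or TinyTreeWalker are actually written.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal tree: an element (name + children) or a text node.
final class TreeNode {
    final String name;  // null for text nodes
    final String text;  // null for elements
    final List<TreeNode> children;

    private TreeNode(String name, String text, List<TreeNode> children) {
        this.name = name; this.text = text; this.children = children;
    }
    static TreeNode element(String name, TreeNode... children) {
        return new TreeNode(name, null, Arrays.asList(children));
    }
    static TreeNode text(String text) {
        return new TreeNode(null, text, Collections.<TreeNode>emptyList());
    }
}

// Produces one event per next() call for the subtree rooted at the start node.
final class Walker {
    // An open element together with an iterator over its remaining children.
    private static final class Frame {
        final TreeNode element;
        final Iterator<TreeNode> remaining;
        Frame(TreeNode element) { this.element = element; this.remaining = element.children.iterator(); }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    private TreeNode start; // reported on the first call, then cleared

    Walker(TreeNode start) { this.start = start; }

    // Returns "start x", "text y" or "end x", or null once the subtree is exhausted.
    String next() {
        TreeNode n;
        if (start != null) {
            n = start;
            start = null;
        } else if (stack.isEmpty()) {
            return null;
        } else {
            Frame top = stack.peek();
            if (!top.remaining.hasNext()) {
                stack.pop();
                return "end " + top.element.name;
            }
            n = top.remaining.next();
        }
        if (n.name == null) {
            return "text " + n.text;
        }
        stack.push(new Frame(n));
        return "start " + n.name;
    }
}
```

Walking TreeNode.element("a", TreeNode.text("hi"), TreeNode.element("b")) yields start a, text hi, start b, end b, end a, and then null.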
It is possible to bridge between Saxon's pull and push interfaces using a PullPushCopier. This reads events from a PullProvider and sends equivalent events to a Receiver.
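The essence of a pull-to-push copier is a loop that drains the pull side and makes one push call per event. In this sketch the pull side is a standard StAX XMLStreamReader, and the push side is a hypothetical, much simplified PushReceiver interface standing in for Saxon's Receiver; the names PushReceiver, MarkupReceiver and PullPushCopy are illustrative only.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical push interface standing in for Saxon's Receiver (much simplified).
interface PushReceiver {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// A receiver that writes the events back out as markup (attributes omitted in this sketch).
final class MarkupReceiver implements PushReceiver {
    private final StringBuilder out = new StringBuilder();
    @Override public void startElement(String name) { out.append('<').append(name).append('>'); }
    @Override public void characters(String text) {
        out.append(text.replace("&", "&amp;").replace("<", "&lt;"));
    }
    @Override public void endElement(String name) { out.append("</").append(name).append('>'); }
    @Override public String toString() { return out.toString(); }
}

final class PullPushCopy {
    // Drains the pull side, making one push call per event.
    static void copy(XMLStreamReader in, PushReceiver out) throws XMLStreamException {
        while (in.hasNext()) {
            switch (in.next()) {
                case XMLStreamConstants.START_ELEMENT: out.startElement(in.getLocalName()); break;
                case XMLStreamConstants.CHARACTERS: out.characters(in.getText()); break;
                case XMLStreamConstants.END_ELEMENT: out.endElement(in.getLocalName()); break;
                default: break; // comments, PIs, etc. are dropped in this sketch
            }
        }
    }

    // Parses a string, copies its events to a MarkupReceiver, and returns the markup.
    static String roundTrip(String xml) {
        MarkupReceiver receiver = new MarkupReceiver();
        try {
            XMLStreamReader in =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            copy(in, receiver);
            in.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("cannot parse input", e);
        }
        return receiver.toString();
    }
}
```

For example, PullPushCopy.roundTrip("<a>hi<b/></a>") returns "<a>hi<b></b></a>".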
A PullProvider that interfaces to a StAX pull parser is also available: this class is called StaxBridge. It has been tested with the pull parsers from BEA and Sun. Both of these parsers are currently early releases and have been found to be rather buggy; they will presumably improve in subsequent versions.
The StaxBridge class is the only class in Saxon that depends on the presence of the StAX API. For this reason, it is not bundled in the general saxon8.jar file; instead, it is included for the time being in the samples directory.
There is no dependency on any particular StAX parser: StaxBridge will pick up whichever parser is on the classpath or has been selected using the relevant Java system properties.
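A bridge of the StaxBridge kind mostly translates the StAX parser's event codes into the pull interface's own vocabulary. The sketch below does that for a hypothetical BridgeEvent enumeration (neither BridgeEvent nor StaxBridgeSketch is Saxon's code). It obtains its parser through XMLInputFactory.newInstance(), which consults, among other places, the javax.xml.stream.XMLInputFactory system property and service-provider configuration before falling back to the platform default, so no particular parser is hard-wired.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical event vocabulary for the bridge; not Saxon's PullProvider constants.
enum BridgeEvent { START_ELEMENT, END_ELEMENT, TEXT, COMMENT, PROCESSING_INSTRUCTION, OTHER, END_OF_INPUT }

final class StaxBridgeSketch {
    private final XMLStreamReader reader;

    StaxBridgeSketch(String xml) { this.reader = open(xml); }

    // Whatever StAX implementation the factory lookup selects is used; none is named here.
    private static XMLStreamReader open(String xml) {
        try {
            return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("cannot create reader", e);
        }
    }

    // Advances the underlying StAX cursor and maps its event code to a BridgeEvent.
    BridgeEvent next() {
        try {
            if (!reader.hasNext()) return BridgeEvent.END_OF_INPUT;
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT: return BridgeEvent.START_ELEMENT;
                case XMLStreamConstants.END_ELEMENT: return BridgeEvent.END_ELEMENT;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE: return BridgeEvent.TEXT;
                case XMLStreamConstants.COMMENT: return BridgeEvent.COMMENT;
                case XMLStreamConstants.PROCESSING_INSTRUCTION: return BridgeEvent.PROCESSING_INSTRUCTION;
                case XMLStreamConstants.END_DOCUMENT: return BridgeEvent.END_OF_INPUT;
                default: return BridgeEvent.OTHER;
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parse error", e);
        }
    }

    // Valid while positioned on START_ELEMENT or END_ELEMENT.
    String localName() { return reader.getLocalName(); }
}
```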
A class PullFilter is available that simply joins two PullProviders end to end. It can be subclassed (in the same way as the XMLFilter class in SAX) to provide a wide variety of components that analyze or modify the event stream. This allows pull pipelines to be built in much the same way as Saxon's existing push pipelines.
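The filter pattern can be sketched with a hypothetical Pull interface whose next() returns an event string, or null at the end of input. The base class passes every event through unchanged, much as SAX's XMLFilterImpl does by default, and subclasses override next() to drop or rewrite events; nesting constructors builds the pipeline. None of these names (Pull, PassThroughFilter, DropComments, UpperCaseText, Pipelines) come from Saxon.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical pull interface: next() returns an event string, or null at end of input.
interface Pull { String next(); }

// Base filter: by default passes every event through unchanged.
class PassThroughFilter implements Pull {
    protected final Pull upstream;
    PassThroughFilter(Pull upstream) { this.upstream = upstream; }
    @Override public String next() { return upstream.next(); }
}

// Subclass that removes comment events from the stream.
final class DropComments extends PassThroughFilter {
    DropComments(Pull upstream) { super(upstream); }
    @Override public String next() {
        String e;
        do { e = upstream.next(); } while (e != null && e.startsWith("comment"));
        return e;
    }
}

// Subclass that upper-cases text content and leaves other events alone.
final class UpperCaseText extends PassThroughFilter {
    UpperCaseText(Pull upstream) { super(upstream); }
    @Override public String next() {
        String e = upstream.next();
        return (e != null && e.startsWith("text "))
            ? "text " + e.substring(5).toUpperCase(Locale.ROOT)
            : e;
    }
}

final class Pipelines {
    // A source that replays a fixed list of events.
    static Pull of(String... events) {
        Iterator<String> it = Arrays.asList(events).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Drains a pipeline into a list.
    static List<String> drain(Pull p) {
        List<String> out = new ArrayList<>();
        for (String e = p.next(); e != null; e = p.next()) out.add(e);
        return out;
    }
}
```

For instance, draining new UpperCaseText(new DropComments(Pipelines.of("start a", "text hi", "comment x", "end a"))) gives start a, text HI, end a. The consumer at the end of the chain drives the whole pipeline by calling next(), which is the mirror image of a push pipeline, where the producer drives.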
An eventual aim of this work is to enable tree-construction expressions to be evaluated in pull mode. This would allow lazy evaluation of trees, in the same way that Saxon already makes heavy use of lazy evaluation of sequences. For example, given a construct such as <e a="{$x}"/> (in either XSLT or XQuery), Saxon would be able to return to the caller a sequence of events (in this case, just a start-element event and an end-element event) without ever building a tree in memory. This is similar to what already happens with the push pipeline when a final result tree is written, especially in XSLT. For XQuery, however, where it is more common to construct many small intermediate trees, being able to switch between pull and push processing for such expressions offers considerable advantages.
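What pull-mode evaluation of <e a="{$x}"/> might look like can be sketched as a provider that simply reports a start-element event followed by an end-element event, with no node ever built. The attribute value is supplied as a Supplier standing in for the evaluation of {$x}, so it is computed only if the consumer asks for it while positioned on the start event. This is a hypothetical illustration of the idea (LazyElementConstructor is an invented name), not a description of how Saxon implements it.

```java
import java.util.function.Supplier;

// Hypothetical sketch: pull-mode evaluation of a constructor like <e a="{$x}"/>.
final class LazyElementConstructor {
    enum Event { START_ELEMENT, END_ELEMENT, END_OF_INPUT }

    private final String elementName;
    private final String attributeName;
    private final Supplier<String> attributeValue; // stands in for evaluating {$x}
    private int state = 0; // 0 = before start, 1 = on START_ELEMENT, 2 = on END_ELEMENT, 3 = done

    LazyElementConstructor(String elementName, String attributeName, Supplier<String> attributeValue) {
        this.elementName = elementName;
        this.attributeName = attributeName;
        this.attributeValue = attributeValue;
    }

    // Emits the two events directly; no tree node is ever created.
    Event next() {
        if (state < 3) state++;
        switch (state) {
            case 1: return Event.START_ELEMENT;
            case 2: return Event.END_ELEMENT;
            default: return Event.END_OF_INPUT;
        }
    }

    String name() { return elementName; }

    // Attribute information is available while positioned on START_ELEMENT;
    // the value expression is evaluated only when requested.
    String attribute(String name) {
        if (state != 1) throw new IllegalStateException("not positioned on START_ELEMENT");
        return attributeName.equals(name) ? attributeValue.get() : null;
    }
}
```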