Burst-mode streaming
The saxon:stream extension function enables burst-mode streaming by reading a source document and delivering
a sequence of element nodes representing selected elements within that document. For example:
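The call might look like the following sketch. It assumes the path expression doc('employees.xml')/*/employee that is quoted later in this section, and it binds the saxon prefix explicitly to the namespace URI used in the examples further down:

```xquery
(: Assumption: the saxon prefix is bound to the Saxon extension namespace. :)
declare namespace saxon = "http://saxon.sf.net/";

(: Delivers the employee children of the outermost element, one at a time. :)
saxon:stream(doc('employees.xml')/*/employee)
```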
This example returns a sequence of employee elements. These elements are parentless, so it is not
possible to navigate from one employee element to others in the file; in fact, only one of them actually exists in memory
at any one time.
The function saxon:stream may be regarded as a pseudo-function. Conceptually, it takes the set of nodes
supplied in its argument and makes a deep copy of each one (the copy operation is needed to make the
employee elements parentless). The resulting sequence of nodes will usually be processed by
an instruction such as xsl:for-each or xsl:iterate, or by a FLWOR expression in XQuery,
which handles the nodes one at a time. The actual implementation of saxon:stream, however, is
rather different, in that it changes the way in which its argument is evaluated: instead of the doc()
function building a tree in the normal way, the path expression doc('employees.xml')/*/employee
is evaluated in streamed mode, which means that it must conform to the subset of XPath syntax that Saxon
can evaluate in streamed mode. For details of this subset, see Streamable path expressions.
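As an illustration of the one-at-a-time processing described above, a FLWOR expression might consume the streamed employee elements as follows. This is a sketch only: the name attribute on employee and the names wrapper element are hypothetical and are not taken from the source document.

```xquery
(: Assumption: the saxon prefix is bound to the Saxon extension namespace. :)
declare namespace saxon = "http://saxon.sf.net/";

<names>{
  (: Each $e is a parentless copy of one employee element, so navigation
     downwards within $e is possible, but not upwards or to its siblings. :)
  for $e in saxon:stream(doc('employees.xml')/*/employee)
  return <name>{ string($e/@name) }</name>  (: @name is a hypothetical attribute :)
}</names>
```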
The facility should not be used if the source document is read more than once in the course
of the query/transformation. There are two reasons for this: firstly, performance will be better in this case if the
document is read into memory; and secondly, when this optimization is used, there is no guarantee that the
doc() function will be stable, that is, that it will return the same results when called
repeatedly with the same URI.
If the path expression cannot be evaluated in streaming mode, execution does not fail; rather, it is evaluated
with an unoptimized copy-of instruction. This will give the same results provided enough memory is available for
this mode of evaluation. To check whether streamed processing is actually being used, set the -t option on the
command line or the FeatureKeys.TIMING option in the configuration API; the output will indicate whether
a particular source document has been processed by building a tree or by streaming.
In XSLT an alternative way of invoking the facility is to use an xsl:copy-of
instruction with the special attribute saxon:read-once="yes". Typically the xsl:copy-of
instruction will form the body of a stylesheet function, which can then be called in the same way
as saxon:stream to deliver the stream of records. This approach has the advantage that the
code is portable to other XSLT processors (saxon:read-once="yes" is an extension attribute,
a processing hint that other XSLT processors are required to ignore).
In XQuery the same effect can be achieved using a pragma (# saxon:read-once #).
Again, processors other than Saxon are required to ignore this pragma.
Example: selective copying
A very simple use of this technique is to make a selective copy of parts of a document.
For example, the following code creates an output document containing all the footnote
elements from the source document that have the attribute @type='endnote':
XSLT example
<xsl:template name="main">
  <footnotes>
    <xsl:sequence select="saxon:stream(doc('thesis.xml')//footnote[@type='endnote'])"
                  xmlns:saxon="http://saxon.sf.net/"/>
  </footnotes>
</xsl:template>
XQuery example
<footnotes>{ saxon:stream(doc('thesis.xml')//footnote[@type='endnote']) }</footnotes>
XSLT example using xsl:copy-of
To allow code to be written in a way that will still work with processors other than Saxon, the facility can also be invoked using extension attributes in XSLT. Using this syntax, the previous example can be written as:
XSLT example
<xsl:template name="main">
  <footnotes>
    <xsl:copy-of select="doc('thesis.xml')//footnote[@type='endnote']"
                 saxon:read-once="yes"
                 xmlns:saxon="http://saxon.sf.net/"/>
  </footnotes>
</xsl:template>
XQuery example using the saxon:stream pragma
In XQuery the pragma saxon:stream is available as an alternative to the
function of the same name, allowing the code to be kept portable. The above example can
be written:
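A minimal sketch of this form, assuming that the pragma wraps the path expression using the standard XQuery extension-expression syntax and that the saxon prefix is bound explicitly:

```xquery
(: Assumption: the saxon prefix is bound to the Saxon extension namespace. :)
declare namespace saxon = "http://saxon.sf.net/";

<footnotes>{
  (: A processor that does not recognize the pragma ignores it and
     evaluates the enclosed expression in the ordinary way. :)
  (# saxon:stream #) {
    doc('thesis.xml')//footnote[@type='endnote']
  }
}</footnotes>
```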
Note the restrictions below on the kind of predicate that may be used.