Burst-mode streaming

The saxon:stream extension function enables burst-mode streaming by reading a source document and delivering a sequence of element nodes representing selected elements within that document. For example:

saxon:stream(doc('employees.xml')/*/employee)

This example returns a sequence of employee elements. These elements are parentless, so it is not possible to navigate from one employee element to others in the file; in fact, only one of them actually exists in memory at any one time.
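To illustrate what "parentless" means in practice, here is a minimal sketch. It assumes a Saxon release that supports saxon:stream and reuses the employees.xml file from the example above. For each delivered employee element, it reports how many parent and following-sibling nodes are reachable. Given the behavior described above, both counts should be zero. The checks and check elements and their attributes are hypothetical names chosen for illustration.

```xml
<xsl:template name="main" xmlns:saxon="http://saxon.sf.net/">
  <!-- Keep the saxon namespace out of the result document -->
  <checks xsl:exclude-result-prefixes="saxon">
    <!-- Each item is a parentless copy of an employee element -->
    <xsl:for-each select="saxon:stream(doc('employees.xml')/*/employee)">
      <!-- ".." and following-sibling::* select nothing, because the
           copy has no parent and therefore no siblings -->
      <check parent="{count(..)}" siblings="{count(following-sibling::*)}"/>
    </xsl:for-each>
  </checks>
</xsl:template>
```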

The function saxon:stream may be regarded as a pseudo-function. Conceptually, it takes the set of nodes supplied in its argument and makes a deep copy of each one (the copy operation is what makes the employee elements parentless). The resulting sequence of nodes will usually be processed by an instruction such as xsl:for-each or xsl:iterate, or by a FLWOR expression in XQuery, which handles the nodes one at a time. The actual implementation of saxon:stream, however, is rather different: it changes the way in which its argument is evaluated. Instead of the doc() function building a tree in the normal way, the path expression doc('employees.xml')/*/employee is evaluated in streamed mode. This means that the expression must conform to the subset of XPath syntax that Saxon can evaluate in streamed mode; for details of this subset, see Streamable path expressions.
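The following sketch shows the kind of consumer described above. An xsl:iterate instruction visits the streamed employee elements one at a time and carries a running total from one iteration to the next. It assumes XSLT 3.0 syntax for xsl:iterate. Purely for illustration, it also assumes that each employee element has a numeric salary attribute. That attribute and the total element are hypothetical.

```xml
<xsl:template name="main"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <total xsl:exclude-result-prefixes="saxon xs">
    <!-- Each employee is delivered as a parentless copy, one at a time -->
    <xsl:iterate select="saxon:stream(doc('employees.xml')/*/employee)">
      <!-- Running total carried from one iteration to the next -->
      <xsl:param name="sum" select="0" as="xs:decimal"/>
      <!-- Emit the final total once the sequence is exhausted -->
      <xsl:on-completion>
        <xsl:value-of select="$sum"/>
      </xsl:on-completion>
      <!-- Hypothetical @salary attribute, assumed to hold a number -->
      <xsl:next-iteration>
        <xsl:with-param name="sum" select="$sum + xs:decimal(@salary)"/>
      </xsl:next-iteration>
    </xsl:iterate>
  </total>
</xsl:template>
```

Because only the current employee and the accumulated $sum are needed at each step, this style of processing fits the one-element-at-a-time model of burst-mode streaming.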

This facility should not be used if the source document is read more than once in the course of the query or transformation, for two reasons. Firstly, performance will be better if the document is read into memory once and the tree is reused. Secondly, when this optimization is used, there is no guarantee that the doc() function will be stable, that is, that it will return the same results when called repeatedly with the same URI.
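When the same document is needed more than once, the alternative is simply to build the tree once, bind it to a variable, and reuse it. Here is a minimal sketch. The summary element, its count attribute, and the choice of what to output are hypothetical.

```xml
<xsl:template name="main">
  <!-- Build the tree once; saxon:stream is deliberately not used,
       because the employee elements are needed twice below -->
  <xsl:variable name="staff" select="doc('employees.xml')/*/employee"/>
  <summary count="{count($staff)}">
    <!-- Second use of the same in-memory nodes -->
    <xsl:copy-of select="$staff[1]"/>
  </summary>
</xsl:template>
```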

If the path expression cannot be evaluated in streamed mode, execution does not fail. Instead, it is evaluated using an unoptimized copy-of instruction, which gives the same results provided enough memory is available to build the whole tree. To check whether streamed processing is actually being used, use the -t option on the command line or set the FeatureKeys.TIMING option through the configuration API. The output indicates whether each source document was processed by building a tree or by streaming.

In XSLT, an alternative way of invoking the facility is to use an <xsl:copy-of> instruction with the special attribute saxon:read-once="yes". Typically the xsl:copy-of instruction forms the body of a stylesheet function, which can then be called in the same way as saxon:stream to deliver the stream of records. This approach has the advantage that the code is portable to other XSLT processors: saxon:read-once="yes" is an extension attribute, a processing hint that other XSLT processors are required to ignore.
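A minimal sketch of such a stylesheet function follows. The function name f:employees, its namespace URI http://example.com/ns/functions, and the employee-count output element are hypothetical. Only the saxon:read-once attribute on xsl:copy-of comes from the description above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:f="http://example.com/ns/functions"
    exclude-result-prefixes="saxon f">

  <!-- Hypothetical wrapper function that delivers the employee records.
       Under Saxon, saxon:read-once="yes" requests streamed evaluation;
       other processors ignore the attribute and build the tree as usual. -->
  <xsl:function name="f:employees" as="element()*">
    <xsl:copy-of select="doc('employees.xml')/*/employee"
                 saxon:read-once="yes"/>
  </xsl:function>

  <!-- The function is called wherever saxon:stream(...) would otherwise appear -->
  <xsl:template name="main">
    <employee-count>
      <xsl:value-of select="count(f:employees())"/>
    </employee-count>
  </xsl:template>
</xsl:stylesheet>
```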

In XQuery the same effect can be achieved using the pragma (# saxon:stream #). Again, processors other than Saxon are required to ignore this pragma.

Example: selective copying

A very simple application of this technique is making a selective copy of parts of a document. For example, the following code creates an output document containing all the footnote elements in the source document that have the attribute @type='endnote':

XSLT example

<xsl:template name="main">
  <footnotes>
    <xsl:sequence select="saxon:stream(doc('thesis.xml')//footnote[@type='endnote'])"
                  xmlns:saxon="http://saxon.sf.net/"/>
  </footnotes>
</xsl:template>

XQuery example

<footnotes>{ saxon:stream(doc('thesis.xml')//footnote[@type='endnote']) }</footnotes>

XSLT example using xsl:copy-of

So that the code still works with processors other than Saxon, the facility can also be invoked in XSLT using an extension attribute. Using this syntax, the previous example can be written as:

XSLT example

<xsl:template name="main">
  <footnotes>
    <xsl:copy-of select="doc('thesis.xml')//footnote[@type='endnote']"
                 saxon:read-once="yes" xmlns:saxon="http://saxon.sf.net/"/>
  </footnotes>
</xsl:template>

XQuery example using the saxon:stream pragma

In XQuery the pragma saxon:stream is available as an alternative to the function of the same name, allowing the code to remain portable. Using the pragma, the previous example can be written as:

<footnotes>{ (# saxon:stream #) { doc('thesis.xml')//footnote[@type='endnote'] } }</footnotes>

Note the restrictions below on the kind of predicate that may be used.