saxon:stream

Provides streamed input.

stream($input as item()*) ➔ item()*

Arguments
	$input	item()*	The input to be streamed
Result		item()*

Namespace

http://saxon.sf.net/

Notes on the Saxon implementation

Available since before Saxon 8.0. Obsolescent in XSLT, since the xsl:stream instruction provides equivalent functionality; but still useful in XQuery.

Changed in 9.6 so that the function delivers snapshots of the selected nodes rather than copies: that is, the delivered nodes include copies of the ancestors and their attributes, as well as the attributes and descendants of the selected node.

Details

Conceptually, this function returns a copy of its input (since Saxon 9.6, a snapshot of each selected node). The intent, however, is to evaluate the supplied argument in "streaming mode", which allows an input document to be processed without building a tree representation of the whole document in memory. As a result, Saxon can process much larger documents than would otherwise be possible.

Documents other than the principal input can be streamed in XQuery using the saxon:stream extension function. It enables burst-mode streaming: it reads a source document and delivers a sequence of element nodes representing selected elements within that document. For example:

for $e in saxon:stream(doc('employees.xml')/*/employee) return <sal>{$e/salary}</sal>

This example returns a sequence of <sal> elements. The result of the saxon:stream call is a sequence of <employee> elements. Each <employee> element is linked to copies of: its attributes and namespaces; its descendants and their attributes; its ancestors and their attributes. But the siblings of the selected node, and of its ancestors, are missing, and any attempt to select them will return an empty sequence. This means it is not possible to navigate from one <employee> element to others in the file; in fact, only one of them actually exists in memory at any one time.
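
The effect of the missing siblings can be seen in a short sketch. It assumes, purely for illustration, that the document element carries a company attribute and that each <employee> has a <name> child; neither appears in the original example.

```xquery
(: The saxon prefix is bound explicitly; the URI is the one listed under Namespace. :)
declare namespace saxon = "http://saxon.sf.net/";

for $e in saxon:stream(doc('employees.xml')/*/employee)
return
  <emp>
    {
      (: descendants of the selected node are part of the snapshot; <name> is hypothetical :)
      $e/name,
      (: ancestors and their attributes are copied too; @company is hypothetical :)
      <company>{ string($e/../@company) }</company>,
      (: siblings are not copied, so this path selects nothing :)
      <later-siblings>{ count($e/following-sibling::employee) }</later-siblings>
    }
  </emp>
```

Under the snapshot semantics described above, the count should be 0 for every employee, while the ancestor attribute remains reachable.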

The function saxon:stream may be regarded as a pseudo-function. Conceptually, it takes the set of nodes supplied in its argument, and makes a snapshot of each one (in the sense of the XSLT 3.0 fn:snapshot() function). The resulting sequence of nodes will usually be processed by an expression such as an XQuery FLWOR expression, which handles the nodes one at a time. The actual implementation of saxon:stream, however, is rather different, in that it changes the way in which its argument is evaluated: instead of the doc() function building a tree in the normal way, the path expression doc('employees.xml')/*/employee is evaluated in streamed mode, which means that it must conform to the subset of XPath syntax that Saxon can evaluate in streamed mode. For more information, see Streaming in XQuery.

The facility should not be used if the source document is read more than once in the course of the query/transformation. There are two reasons for this: firstly, if it is read more than once then performance will be better if the document is read into memory; and secondly, when this optimization is used, there is no guarantee that the doc() function will be stable, that is, that it will return the same results when called repeatedly with the same URI.
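
When a document does have to be read more than once, a plain doc() call avoids both problems. A minimal sketch, reusing the employees.xml example and assuming each <salary> holds a number:

```xquery
(: No saxon:stream here: the tree is built once and both expressions reuse it. :)
let $doc := doc('employees.xml')
return
  <summary>
    <headcount>{ count($doc/*/employee) }</headcount>
    <payroll>{ sum($doc/*/employee/salary) }</payroll>
  </summary>
```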

If the path expression cannot be evaluated in streaming mode, execution does not fail; instead, the expression is evaluated in unstreamed mode. This gives the same results, provided enough memory is available for this mode of evaluation. To check whether streamed processing is actually being used, set the -t option on the command line or the FeatureKeys.TIMING option through the configuration API; the output will indicate whether a particular source document has been processed by building a tree, or by streaming.

The expression used as an argument to the saxon:stream function must consist of:

A call to the document() or doc() function, followed by
A streamable pattern

Streamable patterns use a subset of XPath expression syntax corresponding to the rules for motionless match patterns in XSLT 3.0.
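
As a rough illustration of this shape, the sketch below passes a doc() call followed by a path whose only predicate tests an attribute of the matched element, so the match can be decided without reading the element's content. The dept attribute is hypothetical, and whether a given path is actually streamed should be confirmed with the -t option.

```xquery
declare namespace saxon = "http://saxon.sf.net/";

(: doc() call followed by a pattern-like path; the predicate looks only at an attribute :)
for $e in saxon:stream(doc('employees.xml')/*/employee[@dept = 'sales'])
return <sal>{ $e/salary }</sal>
```

By contrast, a predicate such as [salary > 50000] has to inspect child content before the element can be matched, so it would not be expected to qualify as motionless; as described above, Saxon would then fall back to unstreamed evaluation.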

For further details see Streaming of Large Documents.