saxon:stream

Processes an input document using streaming, that is, without allocating memory to contain the entire document.

stream($input as item()*) ➔ item()*

Arguments
	$input	item()*	The input to be streamed
Result		item()*

Namespace

http://saxon.sf.net/

Notes on the Saxon implementation

Available since before Saxon 8.0. Obsolescent in XSLT, since the xsl:source-document instruction with attribute streamable="yes" provides equivalent functionality; but still useful in XQuery.

Changed in 9.6 so that the function delivers snapshots of the selected nodes rather than copies: that is, the delivered nodes include copies of the ancestors and their attributes, as well as the attributes and descendants of the selected node.

Changed in 9.7 to compile into a call on the XSLT 3.0 xsl:source-document instruction, which leads to minor changes in functionality; most notably, the argument expression must now call the doc() function, and not document().

Details

Conceptually, this function returns a copy of its input. The intent, however, is to evaluate the supplied argument in "streaming mode", which allows an input document to be processed without building a tree representation of the whole document in memory. This allows much larger documents to be processed using Saxon than would otherwise be the case.

When there is a requirement to stream documents other than the principal input, this can be achieved in XQuery using the saxon:stream extension function, which enables burst-mode streaming by reading a source document and delivering a sequence of element nodes representing selected elements within that document. For example:

for $e in saxon:stream(doc('employees.xml')/*/employee) return <sal>{$e/salary}</sal>

This example returns a sequence of <sal> elements. The result of the saxon:stream call is a sequence of <employee> elements. Each <employee> element is linked to copies of: its attributes and namespaces; its descendants and their attributes; its ancestors and their attributes. But the siblings of the selected node, and of its ancestors, are missing, and any attempt to select them will return an empty sequence. This means it is not possible to navigate from one <employee> element to others in the file; in fact, only one of them actually exists in memory at any one time.
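The navigation limits described above can be illustrated with a short sketch. It assumes a hypothetical employees.xml whose root element contains employee children, each with a salary child; the emp element and its attribute names are invented for illustration. The saxon prefix is declared explicitly here, which should be harmless even where it is already bound.

```xquery
declare namespace saxon = "http://saxon.sf.net/";

(: Each $e is a snapshot: its ancestors and descendants are present,
   but its siblings are not. :)
for $e in saxon:stream(doc('employees.xml')/*/employee)
return
  <emp parent="{name($e/..)}"
       siblings="{count($e/preceding-sibling::* | $e/following-sibling::*)}">
    {$e/salary}
  </emp>
```

Going by the description above, the parent attribute should carry the name of the root element (the copied ancestor), while the siblings attribute should be 0 for every employee, however many employee elements the file contains.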

The function saxon:stream may be regarded as a pseudo-function. Conceptually, it takes the set of nodes supplied in its argument, and makes a snapshot of each one (in the sense of the XSLT 3.0 fn:snapshot() function). The resulting sequence of nodes will usually be processed by an expression such as an XQuery FLWOR expression, which handles the nodes one at a time. The actual implementation of saxon:stream, however, is rather different, in that it changes the way in which its argument is evaluated: instead of the doc() function building a tree in the normal way, the path expression doc('employees.xml')/*/employee is evaluated in streamed mode, which means that it must conform to the subset of XPath syntax that Saxon can evaluate in streamed mode. For more information, see Streaming in XQuery.

The facility should not be used if the source document is read more than once in the course of the query/transformation. There are two reasons for this: firstly, if it is read more than once then performance will be better if the document is read into memory; and secondly, when this optimization is used, there is no guarantee that the doc() function will be stable, that is, that it will return the same results when called repeatedly with the same URI.
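As a contrast, a query that needs several passes over the same data is probably better written without saxon:stream. The sketch below assumes the same hypothetical employees.xml; the report element and its attributes are invented for illustration.

```xquery
(: The document is built in memory once and then traversed twice. :)
let $doc := doc('employees.xml')
return
  <report count="{count($doc/*/employee)}"
          total="{sum($doc/*/employee/salary)}"/>
```

This costs memory proportional to the document size, but the document is parsed only once and doc() keeps its normal stability guarantee.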

A call of the form saxon:stream(doc(D)/PATH[PREDICATE]) is translated into the XSLT 3.0 construct <xsl:source-document streamable="yes" href="D"><xsl:sequence select="snapshot(PATH[PREDICATE])"/></xsl:source-document> and is allowed only if that construct is guaranteed-streamable according to the XSLT 3.0 rules. This means that the predicate, if present, must be motionless (it cannot call position(), explicitly or implicitly, and it cannot navigate downwards from the node being tested, either explicitly using the child or descendant axis, or implicitly by getting the string value or typed value of the node).
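The restriction on predicates can be sketched as follows. The dept attribute and the salary threshold are hypothetical. Under the rules above, a predicate that tests only an attribute of the candidate node should qualify as motionless, while any condition that looks at child elements can be moved out of the streamed path and applied to the snapshot instead.

```xquery
declare namespace saxon = "http://saxon.sf.net/";

(: [@dept = 'sales'] inspects only an attribute, so it is motionless.
   The salary test runs against the snapshot, which includes descendants. :)
for $e in saxon:stream(doc('employees.xml')/*/employee[@dept = 'sales'])
where $e/salary > 50000
return <sal>{$e/salary}</sal>

(: Forms that the rules above would be expected to reject inside the streamed path:
     employee[salary > 50000]   navigates down to a child element
     employee[1]                calls position() implicitly
     employee[. = 'Smith']      takes the string value of the node
:)
```

Filtering in the where clause still builds a snapshot for every employee that passes the attribute test, so the more selective the motionless predicate, the less copying is done.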

If the path expression cannot be evaluated in streaming mode, execution fails, unless the configuration option FeatureKeys.STREAMING_FALLBACK is set, in which case it is executed in non-streaming mode (that is, the result is simply the result of evaluating the argument to the function).

For further details see Streaming of Large Documents.