saxon:stream
Processes an input document using streaming, that is, without allocating memory to contain the entire document.
stream($input as item()*) ➔ item()*
Arguments
| $input | item()* | The input to be streamed |
Result
| item()* |
Namespace
http://saxon.sf.net/
Notes on the Saxon implementation
Available since before Saxon 8.0. Obsolescent in XSLT, since the xsl:stream instruction provides equivalent functionality; but still useful in XQuery.
Changed in 9.6 so that the function delivers snapshots of the selected nodes rather than copies: that is, the delivered nodes include copies of the ancestors and their attributes, as well as the attributes and descendants of the selected node.
Changed in 9.7 to compile into a call on the XSLT 3.0 xsl:stream instruction, which leads to minor changes in functionality; most notably, (a) the argument expression must now call the doc() function, and not document(), and (b) there is no fallback to non-streaming mode if the argument expression is not streamable.
Details
Conceptually, this function returns a copy of its input. The intent, however, is to evaluate the supplied argument in "streaming mode", which allows an input document to be processed without building a tree representation of the whole document in memory. This allows Saxon to process much larger documents than would otherwise be possible.
When there is a requirement to stream documents other than the principal input, this can be achieved in XQuery using the saxon:stream extension function, which enables burst-mode streaming by reading a source document and delivering a sequence of element nodes representing selected elements within that document. For example:
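A minimal sketch of such a query, consistent with the description that follows. It assumes a file employees.xml whose outermost element contains employee elements, and that each employee has a salary child; the salary element name is a hypothetical choice made for illustration.

```xquery
declare namespace saxon = "http://saxon.sf.net/";

(: Stream employees.xml, delivering one employee element at a time. :)
(: The salary child element is an assumed name, used only for illustration. :)
for $e in saxon:stream(doc('employees.xml')/*/employee)
return <sal>{ string($e/salary) }</sal>
```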
This example returns a sequence of <sal> elements. The result of the saxon:stream call is a sequence of <employee> elements. Each <employee> element is linked to copies of: its attributes and namespaces; its descendants and their attributes; its ancestors and their attributes. But the siblings of the selected node, and of its ancestors, are missing, and any attempt to select them will return an empty sequence. This means it is not possible to navigate from one <employee> element to others in the file; in fact, only one of them actually exists in memory at any one time.
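The navigation limits described above can be illustrated with a short sketch. It assumes the same employees.xml layout (an outermost element containing employee elements); the check element and its attribute names are hypothetical. Going by the description above, the parent's name should be available on each delivered node, while the sibling count should be zero for every employee.

```xquery
declare namespace saxon = "http://saxon.sf.net/";

(: Each $e is a snapshot: ancestors are present as copies, siblings are not. :)
for $e in saxon:stream(doc('employees.xml')/*/employee)
return
  <check
    parent="{ name($e/..) }"
    siblings="{ count($e/following-sibling::employee) + count($e/preceding-sibling::employee) }"/>
```

A query that needs to compare one employee with another therefore has to carry the required values forward itself, or read the document into memory instead of streaming it.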
The function saxon:stream may be regarded as a pseudo-function. Conceptually, it takes the set of nodes supplied in its argument, and makes a snapshot of each one (in the sense of the XSLT 3.0 fn:snapshot() function). The resulting sequence of nodes will usually be processed by an expression such as an XQuery FLWOR expression, which handles the nodes one at a time. The actual implementation of saxon:stream, however, is rather different, in that it changes the way in which its argument is evaluated: instead of the doc() function building a tree in the normal way, the path expression doc('employees.xml')/*/employee is evaluated in streamed mode, which means that it must conform to a subset of the XPath syntax which Saxon can evaluate in streamed mode. For more information, see Streaming in XQuery.
The facility should not be used if the source document is read more than once in the course of the query/transformation. There are two reasons for this: firstly, if it is read more than once then performance will be better if the document is read into memory; and secondly, when this optimization is used, there is no guarantee that the doc() function will be stable, that is, that it will return the same results when called repeatedly with the same URI.
If the path expression cannot be evaluated in streaming mode, execution fails.
A call of the form saxon:stream(doc(D)/PATH[PREDICATE]) is translated into the XSLT 3.0 construct
<xsl:stream href="D"><xsl:sequence select="snapshot(PATH)[PREDICATE]"/></xsl:stream>
and is allowed only if that construct is guaranteed-streamable according to the XSLT 3.0 rules.
For further details see Streaming of Large Documents.