saxon:stream
Processes an input document using streaming, that is, without allocating memory to contain the entire document.
stream($input as item()*) ➔ item()*
Arguments
| Name | Type | Description |
| $input | item()* | The input to be streamed |
Result
| item()* |
Namespace
http://saxon.sf.net/
Notes on the Saxon implementation
Available since before Saxon 8.0. Obsolescent in XSLT, since the xsl:source-document instruction with attribute streamable="yes" provides equivalent functionality; but still useful in XQuery.
Changed in 9.6 so that the function delivers snapshots of the selected nodes rather than copies: that is, the delivered nodes include copies of the ancestors and their attributes, as well as the attributes and descendants of the selected node.
Changed in 9.7 to compile into a call on the XSLT 3.0 xsl:source-document instruction, which leads to minor changes in functionality; most notably, the argument expression must now call the doc() function, and not document().
Details
Conceptually, this function returns a copy of its input. The intent, however, is to evaluate the supplied argument in "streaming mode", which allows an input document to be processed without building a tree representation of the whole document in memory. This allows Saxon to process much larger documents than would otherwise be possible.
When there is a requirement to stream documents other than the principal input, this can be achieved in XQuery using the saxon:stream extension function, which enables burst-mode streaming by reading a source document and delivering a sequence of element nodes representing selected elements within that document. For example:
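A minimal sketch of such a query, assuming a document employees.xml whose outermost element has <employee> children, each containing a <sal> child (this file layout is an assumption made for illustration):

```xquery
declare namespace saxon = "http://saxon.sf.net/";

(: Stream employees.xml, receiving one employee snapshot at a time,
   and return the sal child of each. :)
for $e in saxon:stream(doc('employees.xml')/*/employee)
return $e/sal
```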
This example returns a sequence of <sal> elements. The result of the saxon:stream call is a sequence of <employee> elements. Each <employee> element is linked to copies of: its attributes and namespaces; its descendants and their attributes; its ancestors and their attributes. But the siblings of the selected node, and of its ancestors, are missing, and any attempt to select them will return an empty sequence. This means it is not possible to navigate from one <employee> element to others in the file; in fact, only one of them actually exists in memory at any one time.
The function saxon:stream may be regarded as a pseudo-function. Conceptually, it takes the set of nodes supplied in its argument, and makes a snapshot of each one (in the sense of the XSLT 3.0 fn:snapshot() function). The resulting sequence of nodes will usually be processed by an expression such as an XQuery FLWOR expression, which handles the nodes one at a time. The actual implementation of saxon:stream, however, is rather different, in that it changes the way in which its argument is evaluated: instead of the doc() function building a tree in the normal way, the path expression doc('employees.xml')/*/employee is evaluated in streamed mode, which means that it must conform to the subset of XPath syntax that Saxon can evaluate in streamed mode. For more information, see Streaming in XQuery.
The facility should not be used if the source document is read more than once in the course of the query/transformation. There are two reasons for this: firstly, if it is read more than once then performance will be better if the document is read into memory; and secondly, when this optimization is used, there is no guarantee that the doc() function will be stable, that is, that it will return the same results when called repeatedly with the same URI.
A call of the form saxon:stream(doc(D)/PATH[PREDICATE]) is translated into the XSLT 3.0 construct

    <xsl:source-document streamable="yes" href="D">
      <xsl:sequence select="snapshot(PATH[PREDICATE])"/>
    </xsl:source-document>

and is allowed only if that construct is guaranteed-streamable according to the XSLT 3.0 rules. This means that the predicate, if present, must be motionless (it cannot call position(), explicitly or implicitly, and it cannot navigate downwards from the node being tested, either explicitly using the child or descendant axis, or implicitly by getting the string value or typed value of the node).
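As an illustration of the motionless-predicate rule, the sketch below filters on an attribute of the node being tested, which requires no downward navigation. The location attribute and its value are hypothetical, chosen only for the example; the commented forms show predicates that, under the rules above, would not be expected to qualify.

```xquery
declare namespace saxon = "http://saxon.sf.net/";

(: Motionless predicate: it inspects only an attribute of the candidate
   employee element. The location attribute is an assumed name. :)
for $e in saxon:stream(doc('employees.xml')/*/employee[@location = 'Germany'])
return $e/sal

(: Not motionless, so not expected to be accepted in streamed mode:
     employee[sal > 1000]     navigates down the child axis
     employee[. = 'Smith']    takes the string value of the node
     employee[1]              depends implicitly on position()
:)
```

A filter that needs the element's content can instead be applied after the call, for example in a where clause of the FLWOR expression, since each delivered snapshot includes the descendants of the selected node.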
If the path expression cannot be evaluated in streaming mode, execution fails, unless the configuration option Feature.STREAMING_FALLBACK is set, in which case it is executed in non-streaming mode (that is, the result is simply the result of evaluating the argument to the function).
For further details see Streaming of Large Documents.