Streaming Templates

Streaming templates allow a document to be processed hierarchically in the classical XSLT style, applying template rules to each element (or other node) in a top-down manner, while scanning the source document in a pure streaming fashion, without building the source tree in memory. Saxon-EE supports this kind of streamed processing, provided the template rules conform to a set of strict guidelines. The facility was introduced in a very simple form in Saxon 9.2, and is greatly enhanced in Saxon 9.3.

Streaming is a property of a mode; a mode can be declared to be streamable, and if it is so declared, then all template rules using that mode must obey the rules for streamability. A mode is declared to be streamable using the top-level stylesheet declaration:

<xsl:mode name="s" streamable="yes"/>

The name attribute is optional; if omitted, the declaration applies to the default (unnamed) mode.

Streamed processing of a source document can be applied either to the principal source document of the transformation, or to a secondary source document read using the doc() or document() function.

To use streaming on the principal source document, the input to the transformation must be supplied in the form of a StreamSource or SAXSource, and the initial mode selected on entry to the transformation must be a streamable mode. In this case there must be no references to the context item in the initializer of any global variable.

Streamed processing of a secondary document is initiated using the instruction:

<xsl:apply-templates select="doc('abc.xml')" mode="s"/>

Here the select attribute must contain a simple call on the doc() or document() function, and the mode (explicit or implicit) must be declared as streamable. The call on doc() or document() can be extended with a streamable selection path, for example select="doc('employee.xml')/*/employee".
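As an illustration, the pieces described so far might be combined as in the following sketch. The named template main, the id attribute, and the name child element are hypothetical, and the value of the version attribute is an assumption (the value that enables these features may differ between Saxon releases).

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Declare the named mode "s" as streamable.
       (version="3.0" above is an assumption; check your Saxon release.) -->
  <xsl:mode name="s" streamable="yes"/>

  <!-- Hypothetical entry point: starts streamed processing of a
       secondary document, selecting employees with a streamable path. -->
  <xsl:template name="main">
    <employees>
      <xsl:apply-templates select="doc('employee.xml')/*/employee" mode="s"/>
    </employees>
  </xsl:template>

  <!-- Template rule in the streamable mode: reads an attribute of the
       context node and makes one downward selection. -->
  <xsl:template match="employee" mode="s">
    <employee id="{@id}">
      <xsl:value-of select="name"/>
    </employee>
  </xsl:template>

</xsl:stylesheet>
```

Because the source document here is read with doc(), the transformation can be started at the named template without supplying a principal source document.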

If a mode is declared as streamable, then it must ONLY be used for streamed processing; it is not possible to apply templates in a streamable mode if the selected nodes are ordinary non-streamed nodes.

Every template rule within a streamable mode must follow strict rules to ensure it can be processed in a streaming manner. The essence of these rules is:

The match pattern for the template rule must be a simple pattern that can be evaluated when positioned at the start tag of an element, without repositioning the stream (but information about the ancestors of the element and their attributes is available). Examples of acceptable patterns are *, para, or para/*.
The body of the template rule must contain at most one expression or instruction that reads the contents below the matched element (that is, children or descendants), and it must process the contents in document order. This expression or instruction will often be one of the following:
- <xsl:apply-templates/>
- <xsl:value-of select="."/>
- <xsl:copy-of select="."/>
- string(.)
- data(.) (explicitly or implicitly)
but this list is not exhaustive. It is possible to process the contents selectively by using a streamable path expression, for example:
- <xsl:apply-templates select="foo"/>
- <xsl:value-of select="a/b/c"/>
- <xsl:copy-of select="x/y"/>
but this effectively means that the content not selected by this path is skipped entirely; the transformation ignores it.
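A minimal sketch of template rules that follow these guidelines is shown below. The element names chapter, title, and para and the attribute id are hypothetical, and the version value is an assumption. Each rule uses a simple match pattern and reads the content below the matched element exactly once.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- version="3.0" is an assumption; the unnamed mode is made streamable -->
  <xsl:mode streamable="yes"/>

  <!-- One downward access: process only the para children, in document
       order. Children not selected by this path are skipped. -->
  <xsl:template match="chapter">
    <chapter id="{@id}">
      <xsl:apply-templates select="para"/>
    </chapter>
  </xsl:template>

  <!-- One downward access: the string value of the matched element. -->
  <xsl:template match="chapter/para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```

In this sketch any title children of chapter never reach the output, which is the skipping behaviour described above.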

The template can access attributes of the context item without restriction, as well as properties such as its name(), local-name(), and base-uri(). It can also access the ancestors of the context item, the attributes of the ancestors, and properties such as the name of an ancestor; but having navigated to an ancestor, it cannot then navigate downwards or sideways, since the siblings and the other descendants of the ancestor are not available while streaming.

The restriction that only one downwards access is allowed makes it an error to use an expression such as price - discount in a streamable template. This problem can often be circumvented by making a copy of the context item. This can be done using an xsl:variable containing an xsl:copy-of instruction, or for convenience it can also be done using the copy-of() function: for example <xsl:value-of select="copy-of(.)/(price - discount)"/>. Taking a copy of the context node requires memory, of course, and should be avoided unless the contents of the node are small.
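The xsl:variable form of this technique might look like the following sketch. The element name item, the variable name this, and the result element net-price are hypothetical, and the version value is an assumption. Note that the variable holds a temporary tree whose child is the copied element, so the path starts with $this/item.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- version="3.0" is an assumption -->
  <xsl:mode name="s" streamable="yes"/>

  <xsl:template match="item" mode="s">
    <!-- The single downward access: copy the whole element into memory. -->
    <xsl:variable name="this">
      <xsl:copy-of select="."/>
    </xsl:variable>
    <!-- The copy is an ordinary in-memory tree, so it can be navigated
         as often as required. -->
    <net-price>
      <xsl:value-of select="$this/item/price - $this/item/discount"/>
    </net-price>
  </xsl:template>

</xsl:stylesheet>
```

The memory cost is proportional to the size of each item element, not to the size of the whole document, which is why the approach suits small repeating units.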

The following rules give further advice on what is allowed and disallowed within the body of a streaming template.

Non-context-sensitive instructions

Instructions and expressions that do not access the context node are allowed without restriction.

This includes:

Instructions that create new nodes, for example literal result elements, xsl:element, and xsl:attribute.
Instructions that declare variables, including temporary trees, if the value of the variable does not depend on the context.
Instructions that process documents other than the streamed document, for example by calling the doc() or document() functions. Provided such processing is not streamed, the full capabilities of the XSLT language can be used.
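The three cases above can be sketched together as follows. The lookup document rates.xml, its rates element and base attribute, and the element names order and order-summary are all hypothetical, and the version value is an assumption.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- version="3.0" is an assumption -->
  <xsl:mode name="s" streamable="yes"/>

  <xsl:template match="order" mode="s">
    <!-- A variable whose value does not depend on the context: it reads
         a separate, non-streamed document (hypothetical file name). -->
    <xsl:variable name="rates" select="doc('rates.xml')"/>

    <!-- Node construction is unrestricted; the non-streamed document
         can be navigated freely. -->
    <order-summary currency="{$rates/rates/@base}">
      <!-- The one permitted downward selection on the streamed input. -->
      <xsl:apply-templates mode="s"/>
    </order-summary>
  </xsl:template>

</xsl:stylesheet>
```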

Access to attributes and ancestors

Access to attributes: there are no restrictions on accessing attributes of the context node, or attributes of its ancestors.

Properties of the context node: there are no restrictions on using functions such as name(), node-name(), or base-uri() to access properties of the context node, or properties of its ancestors, its attributes, or attributes of its ancestors. It is also possible to use the is operator to test the identity of the node, the << and >> operators to test its position in document order, or the instance of operator to test its type. For an attribute node it is possible to use (explicitly or implicitly) the string() function to get its string value and the data() function to get its typed value.

It is not possible to perform navigation from the attributes of the node or from its ancestors, only to access the values of attributes and properties such as the name of the node.

It is not possible to bind a variable to a node in the streamed document (or to pass such a node as a parameter, or return it as a result), because Saxon does not currently include the logic to verify that the way in which the variable is subsequently used is consistent with streaming.
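By way of illustration, the following sketch reads attributes and properties of the context node and of its parent, then makes a single downward selection. The element name para and the attribute id are hypothetical, and the version value is an assumption.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- version="3.0" is an assumption -->
  <xsl:mode name="s" streamable="yes"/>

  <xsl:template match="para" mode="s">
    <!-- @id:     attribute of the context node
         name(..): property of an ancestor
         ../@id:  attribute of an ancestor
         None of these counts as a downward selection. -->
    <para id="{@id}" parent="{name(..)}" parent-id="{../@id}">
      <!-- The one downward selection. -->
      <xsl:value-of select="."/>
    </para>
  </xsl:template>

</xsl:stylesheet>
```

An expression such as ../title would not be acceptable here, because it navigates downwards again after moving to an ancestor.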

Conditional instructions

This category includes xsl:if, xsl:choose, and the XPath if expression. All of these are regarded as special cases of a construct of the form if (condition-1) then action-1 else if (condition-2) then action-2 else ...

The rule is that the conditional must fit one of the following descriptions:

The first condition makes a downward selection, in which case none of the actions and none of the subsequent conditions may make a downward selection.
The first condition makes no downward selection, in which case each of the actions is allowed to make a downward selection (but subsequent conditions must not do so).

So examples of permitted conditionals are:

if (@a = 3) then b else c
if (a = 3) then @b else @c

while the following are not permitted:

if (a = 3) then b else c
<xsl:choose>
  <xsl:when test="a=3">foo</xsl:when>
  <xsl:when test="a=4">bar</xsl:when>
</xsl:choose>
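For comparison, a multi-branch xsl:choose that should satisfy the rule is sketched below: every condition tests only an attribute, so each action is free to make one downward selection. The element name item, the attribute status, and the child element names are hypothetical, and the version value is an assumption.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- version="3.0" is an assumption -->
  <xsl:mode name="s" streamable="yes"/>

  <xsl:template match="item" mode="s">
    <xsl:choose>
      <!-- Conditions use attributes only: no downward selection. -->
      <xsl:when test="@status = 'sale'">
        <xsl:value-of select="sale-price"/>
      </xsl:when>
      <xsl:when test="@status = 'new'">
        <xsl:value-of select="list-price"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="price"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Only one branch is ever executed, so at most one downward selection takes place for each matched element.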

Looping instructions

This applies primarily to xsl:for-each and xsl:iterate. In addition, an XPath expression for $x in SEQ return E is translated to an equivalent xsl:for-each instruction, provided that E does not depend on the context item, position, or size.

The common case is where the select expression and the loop body each make a downward selection, for example:

<xsl:for-each select="employee">
  <salary><xsl:value-of select="salary"/></salary>
</xsl:for-each>

The body of the loop may only make a single downwards selection of this kind.

No sorting is allowed.

If the select expression does not make a downward selection, then the loop body must not perform any navigation from the context node. This is because the same navigation would have to take place more than once, which is inconsistent with streaming.

Saxon handles the case where some reordering of the output is required. This arises when the select expression uses the descendant axis, for example:

<xsl:for-each select=".//section">
  <size><xsl:value-of select="string-length(.)"/></size>
</xsl:for-each>

In this example, given nested sections, the downward selections for each section needed to evaluate string-length() overlap with each other, and the string-length of section 2.1 (say) must be output before that of its children (sections 2.1.1 and 2.1.2, say), even though the computation for the children completes earlier. Saxon achieves this by buffering output results where necessary to achieve the correct ordering.

It is of course quite permissible to call xsl:apply-templates within the body of the xsl:for-each; this will count as the one permitted downward selection.

It is permitted to call position() within the loop, but not last().
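A sketch of a loop that stays within these rules follows. The element names staff and row and the attribute id are hypothetical, and the version value is an assumption. The select expression makes a downward selection, the body makes exactly one more, and position() is used to number the output.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- version="3.0" is an assumption -->
  <xsl:mode name="s" streamable="yes"/>

  <xsl:template match="staff" mode="s">
    <xsl:for-each select="employee">
      <!-- position() and attribute access are allowed; last() would
           require knowing the total count before the stream is read. -->
      <row n="{position()}" id="{@id}">
        <!-- The single downward selection in the loop body. -->
        <xsl:value-of select="salary"/>
      </row>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```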

Sorting, grouping and numbering

Sorting (xsl:sort), grouping (xsl:for-each-group), and numbering (xsl:number) are not supported in streaming mode.