Streaming Templates

Streaming templates allow a document to be processed hierarchically in the classical XSLT style, applying template rules to each element (or other node) in a top-down manner, while scanning the source document in a pure streaming fashion, without building the source tree in memory. Saxon-EE allows streamed processing of a document using template rules, provided the templates conform to a set of strict guidelines.

Streaming is a property of a mode; a mode can be declared to be streamable, and if it is so declared, then all template rules using that mode must obey the rules for streamability. A mode is declared to be streamable using the top-level stylesheet declaration:

<xsl:mode name="s" streamable="yes"/>

The name attribute is optional; if omitted, the declaration applies to the default (unnamed) mode.

Streamed processing of a source document can be applied either to the principal source document of the transformation, or to a secondary source document read using the xsl:stream instruction.

To use streaming on the principal source document, the input to the transformation must be supplied in the form of a StreamSource or SAXSource, and the initial mode selected on entry to the transformation must be a streamable mode. In this case there must be no references to the context item in the initializer of any global variable.
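
As a minimal sketch, assuming an XSLT 3.0-enabled Saxon-EE release that accepts the syntax described on this page, a stylesheet for streaming the principal source document might look like this. The element and attribute names (employee, @id, @salary, summary, high-earner) and the variable $threshold are hypothetical. The default mode is declared streamable so that it can be the initial mode, and the global variable does not refer to the context item.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The unnamed (default) mode is streamable, so it can be the initial mode -->
  <xsl:mode streamable="yes"/>

  <!-- Global variable: its initializer does not reference the context item -->
  <xsl:variable name="threshold" select="50000"/>

  <xsl:template match="/">
    <summary>
      <!-- The one downward selection: process the children in document order -->
      <xsl:apply-templates/>
    </summary>
  </xsl:template>

  <!-- Hypothetical element: only its attributes are read, which is motionless -->
  <xsl:template match="employee">
    <xsl:if test="@salary > $threshold">
      <high-earner id="{@id}"/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The document element (whatever its name) is assumed here to be handled by the built-in template rule, which applies templates to its children.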

Streamed processing of a secondary document is initiated using the instruction:

<xsl:stream href="abc.xml">
  <xsl:apply-templates mode="s"/>
</xsl:stream>

Saxon will also recognize an instruction of the form:

<xsl:apply-templates select="doc('abc.xml')" mode="s"/>

Here the select attribute must contain a simple call to the doc() or document() function, and the mode (explicit or implicit) must be declared as streamable. The call to doc() or document() can be extended with a streamable selection path, for example select="doc('employee.xml')/*/employee".
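
The two forms can be sketched side by side in one stylesheet. The named template main, the file names, and the employee/@name structure are hypothetical, and the sketch assumes the same Saxon-EE syntax as above.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <!-- Hypothetical entry point, invoked as a named template -->
  <xsl:template name="main">
    <result>
      <!-- Form 1: explicit streaming of a secondary document -->
      <xsl:stream href="abc.xml">
        <xsl:apply-templates mode="s"/>
      </xsl:stream>

      <!-- Form 2: a simple doc() call plus a streamable path, in a streamable mode -->
      <xsl:apply-templates select="doc('employee.xml')/*/employee" mode="s"/>
    </result>
  </xsl:template>

  <!-- Rules in mode s must obey the streamability rules -->
  <xsl:template match="employee" mode="s">
    <name><xsl:value-of select="@name"/></name>
  </xsl:template>

</xsl:stylesheet>
```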

If a mode is declared as streamable, then it can be used only for streamed processing: it is not possible to apply templates in a streamable mode to ordinary, non-streamed nodes.

Every template rule within a streamable mode must follow strict rules to ensure it can be processed in a streaming manner. The essence of these rules is:

The match pattern for the template rule must be a simple pattern that can be evaluated when positioned at the start tag of an element, without repositioning the stream (but information about the ancestors of the element and their attributes is available, together with some limited information about their position relative to their siblings). Examples of acceptable patterns are *, para, para[1], and para/*.

If the match pattern includes a boolean predicate, then the predicate must be "motionless", which means that it can be evaluated while the input stream is positioned at the start tag. This means it can reference properties such as name() and base-uri(), and can reference attributes of the element, but cannot reference its children or content.

If the match pattern includes a numeric predicate, then it must be possible to evaluate this by counting either the total number of preceding-sibling elements, or the number of preceding siblings with a given name. Examples of permitted patterns include *[1], p[3], and *:p[2][@class='bold']; disallowed patterns include (descendant::fig)[1], p[@class='bold'][2], and p[last()].
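
The pattern rules above can be illustrated with a few template rules. This is a sketch with hypothetical element names; the rejected patterns in the final comment come from the text above, except para[title], which is added here as an assumed example of a predicate that is not motionless.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <!-- Decidable at the start tag from the element name and its parent -->
  <xsl:template match="section/para" mode="s">
    <para><xsl:apply-templates mode="s"/></para>
  </xsl:template>

  <!-- Numeric predicate: count the preceding siblings named p -->
  <xsl:template match="p[3]" mode="s">
    <third-p><xsl:value-of select="."/></third-p>
  </xsl:template>

  <!-- Numeric predicate first, then a motionless attribute test -->
  <xsl:template match="*:p[2][@class='bold']" mode="s">
    <bold-second><xsl:value-of select="."/></bold-second>
  </xsl:template>

  <!-- Rejected patterns, for comparison:
       p[@class='bold'][2]   counts only the bold siblings
       p[last()]             needs to see the following siblings
       para[title]           predicate reads the element's children -->

</xsl:stylesheet>
```
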

The body of the template rule must contain at most one expression or instruction that reads the contents below the matched element (that is, its children or descendants), and it must process the contents in document order. This expression or instruction will often be one of the following:
- <xsl:apply-templates/>
- <xsl:value-of select="."/>
- <xsl:copy-of select="."/>
- string(.)
- data(.) (explicitly or implicitly)
This list is not exhaustive. It is also possible to process the contents selectively by using a streamable path expression, for example:
- <xsl:apply-templates select="foo"/>
- <xsl:value-of select="a/b/c"/>
- <xsl:copy-of select="x/y"/>
In that case, any content not selected by the path is skipped entirely.
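
A sketch of the one-downward-selection rule, with hypothetical element names. The first two rules each read the content below the matched element exactly once; the commented-out rule would be rejected because its body makes two separate downward selections.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <!-- One downward selection; attributes and the parent's name are free -->
  <xsl:template match="chapter" mode="s">
    <chapter title="{@title}" parent="{name(..)}">
      <xsl:apply-templates mode="s"/>
    </chapter>
  </xsl:template>

  <!-- Selective processing: only chapter/title is read, the rest is skipped -->
  <xsl:template match="book" mode="s">
    <toc>
      <xsl:apply-templates select="chapter/title" mode="s"/>
    </toc>
  </xsl:template>

  <!-- Rejected: two downward selections in one body
  <xsl:template match="item" mode="s">
    <xsl:value-of select="name"/>
    <xsl:value-of select="price"/>
  </xsl:template>
  -->

</xsl:stylesheet>
```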

The template can access attributes of the context item without restriction, as well as properties such as its name(), local-name(), and base-uri(). It can also access the ancestors of the context item, the attributes of the ancestors, and properties such as the name of an ancestor; but having navigated to an ancestor, it cannot then navigate downwards or sideways, since the siblings and the other descendants of the ancestor are not available while streaming.

The restriction that only one downward access is allowed makes it an error to use an expression such as price - discount in a streamable template, because that expression reads two different children of the context node. This problem can often be circumvented by making a copy of the context item with the copy-of() function: for example <xsl:value-of select="copy-of(.)/(price - discount)"/>. Taking a copy of the context node requires memory, of course, and should be avoided unless the contents of the node are small.
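
The copy-of() workaround might be applied like this. The product, price, discount, and @sku names are hypothetical, and the sketch assumes each product element is small enough to copy into memory.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <xsl:template match="product" mode="s">
    <!-- Rejected: price - discount reads two children of the context node
    <net><xsl:value-of select="price - discount"/></net>
    -->

    <!-- Accepted: copy the (small) product element into memory once,
         then navigate freely within the in-memory copy -->
    <net sku="{@sku}">
      <xsl:value-of select="copy-of(.)/(price - discount)"/>
    </net>
  </xsl:template>

</xsl:stylesheet>
```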

Certain constructs using positional filters can be evaluated in streaming mode. For example, it is possible to use <xsl:apply-templates select="*[1]"/>. The filter must be applied to a node test that uses the child axis and selects element nodes. The accepted forms are those that can be written as x[position() op N], where N is an expression that is independent of the focus and is statically known to evaluate to a number, x is a node test using the child axis, and op is one of the operators eq, le, lt, gt, or ge. Alternative forms of this construct such as x[N], remove(x, 1), head(x), tail(x), and subsequence(x, 1, N) are also accepted.
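
The accepted positional forms can be sketched as follows. The element names are hypothetical; each rule applies one of the forms listed above to child elements, using a numeric literal where an N is needed.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <!-- x[position() le N]: N is a literal, so it is independent of the focus -->
  <xsl:template match="top-three" mode="s">
    <xsl:apply-templates select="item[position() le 3]" mode="s"/>
  </xsl:template>

  <!-- head(x): the first item child, equivalent to item[1] -->
  <xsl:template match="first-only" mode="s">
    <xsl:apply-templates select="head(item)" mode="s"/>
  </xsl:template>

  <!-- tail(x): every item child except the first, like remove(item, 1) -->
  <xsl:template match="all-but-first" mode="s">
    <xsl:apply-templates select="tail(item)" mode="s"/>
  </xsl:template>

</xsl:stylesheet>
```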

The following sections give further advice on what is allowed and disallowed within the body of a streaming template.

Non-context-sensitive instructions

Instructions and expressions that do not access the context node are allowed without restriction.

This includes:

Instructions that create new nodes, for example literal result elements, xsl:element and xsl:attribute are allowed without restriction.
Instructions that declare variables, including temporary trees, if the value of the variable does not depend on the context.
Instructions that process documents other than the streamed document, for example by calling the doc() or document() functions. Provided such processing is not streamed, the full capabilities of the XSLT language can be used.
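
A sketch of a streamable template that mixes such context-independent instructions with a single downward selection. The file rates.xml, its rates/@default structure, and the other element names are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <xsl:template match="order" mode="s">
    <!-- Temporary tree: its value does not depend on the context -->
    <xsl:variable name="banner">
      <banner generated="yes"/>
    </xsl:variable>

    <!-- A different, non-streamed document: no streaming restrictions apply -->
    <xsl:variable name="rates" select="doc('rates.xml')/rates"/>

    <order>
      <xsl:attribute name="currency" select="$rates/@default"/>
      <xsl:copy-of select="$banner"/>
      <!-- The single downward selection from the streamed order element -->
      <xsl:apply-templates mode="s"/>
    </order>
  </xsl:template>

</xsl:stylesheet>
```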

Access to attributes and ancestors

Access to attributes: there are no restrictions on accessing attributes of the context node, or attributes of its ancestors, provided that the content of the attribute is atomized. Before allowing access to attributes, the processor needs to check that no further navigation from the attribute is possible.

Properties of the context node: there are no restrictions on using functions such as name(), node-name(), or base-uri() to access properties of the context node, or properties of its ancestors, its attributes, or attributes of its ancestors. It is also possible to use the is operator to test the identity of the node, the << and >> operators to test its position in document order, or the instance of operator to test its type. For attribute nodes it is possible to use (explicitly or implicitly) the string() function to get its string value and the data() function to get its typed value.

It is not possible to perform navigation from the attributes of the node or from its ancestors, only to access the values of attributes and properties such as the name of the node.

It is not possible to bind a variable (or pass a parameter, or return a result) to a node in the streamed document, because Saxon cannot analyse whether the way in which the variable is subsequently used is consistent with streaming.
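
The rules for attributes, ancestors, and variables might be illustrated like this (hypothetical names). Binding the string value of an attribute, as opposed to the attribute node itself, is assumed to be acceptable because the text above rules out only bindings to streamed nodes; this should be verified against the Saxon release in use.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <xsl:template match="item" mode="s">
    <!-- ASSUMPTION: binding an atomized value (a string, not a node) is allowed -->
    <xsl:variable name="code" select="string(@code)"/>

    <entry code="{$code}"
           element="{local-name()}"
           parent="{name(..)}"
           list-id="{../@id}">
      <!-- The one downward selection -->
      <xsl:value-of select="."/>
    </entry>

    <!-- Rejected examples:
         <xsl:variable name="me" select="."/>   binds a streamed node
         <xsl:value-of select="../title"/>      navigates down from an ancestor
         <xsl:value-of select="@ref/.."/>       navigates from an attribute -->
  </xsl:template>

</xsl:stylesheet>
```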

Conditional instructions

This includes xsl:if, xsl:choose, and the XPath if expression. All of these are regarded as special cases of a construct of the form if (condition-1) then action-1 else if (condition-2) then action-2 else ...

The rule is that the conditional must fit one of the following descriptions:

The first condition makes a downward selection, in which case none of the actions and none of the subsequent conditions may make a downward selection (they must all be "motionless").
The first condition makes no downward selection, in which case each of the actions is allowed to make a downward selection (but subsequent conditions must not do so).

So examples of permitted conditionals are:

if (@a = 3) then b else c
if (a = 3) then @b else @c

while the following are not permitted:

if (a = 3) then b else c
<xsl:choose>
  <xsl:when test="a=3">foo</xsl:when>
  <xsl:when test="a=4">bar</xsl:when>
</xsl:choose>
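
Two permitted shapes of xsl:choose, sketched with hypothetical element names. The first matches the case where only the first condition reads downward; it is not a drop-in replacement for the rejected xsl:choose above, since it outputs bar whenever a is not 3. The second uses motionless conditions whose actions each make one downward selection.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <!-- First condition reads downward (a = 3); every later branch is motionless -->
  <xsl:template match="record" mode="s">
    <xsl:choose>
      <xsl:when test="a = 3">foo</xsl:when>
      <xsl:otherwise>bar</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Motionless conditions on attributes; each action reads downward once -->
  <xsl:template match="entry" mode="s">
    <xsl:choose>
      <xsl:when test="@type = 'full'"><xsl:copy-of select="."/></xsl:when>
      <xsl:when test="@type = 'summary'"><xsl:value-of select="title"/></xsl:when>
      <xsl:otherwise><xsl:apply-templates mode="s"/></xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```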

Looping instructions

This applies primarily to xsl:for-each and xsl:iterate. In addition, an XPath expression for $x in SEQ return E is translated to an equivalent xsl:for-each instruction, provided that E does not depend on the context item, position, or size.

The common case is where the select expression and the loop body each make a downward selection, for example:

<xsl:for-each select="employee">
  <salary><xsl:value-of select="salary"/></salary>
</xsl:for-each>

The body of the loop may only make a single downwards selection of this kind.

No sorting is allowed.

If the select expression does not make a downward selection, then the loop body must not perform any navigation from the context node. This is because the same navigation would have to take place more than once, which is inconsistent with streaming.

Saxon also handles cases where some reordering of the output is required. This arises when the select expression uses the descendant axis, for example:

<xsl:for-each select=".//section">
  <size><xsl:value-of select="string-length(.)"/></size>
</xsl:for-each>

In this example, given nested sections, the downward selections for each section needed to evaluate string-length() overlap with each other, and the string-length of section 2.1 (say) must be output before that of its children (sections 2.1.1 and 2.1.2, say), even though the computation for the children completes earlier. Saxon achieves this by buffering output results where necessary to achieve the correct ordering.

It is of course quite permissible to call xsl:apply-templates within the body of the xsl:for-each; this will count as the one permitted downward selection.

It is permitted to call position() within the loop, but not last().
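
A sketch of the permitted loop shapes, with hypothetical element names: a downward select with one further downward selection in the body, and a loop whose body's only downward selection is xsl:apply-templates. Both use position(); neither uses last() or xsl:sort.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <!-- select reads employee children; the body makes one more downward
       selection (salary) and uses position(), but not last() -->
  <xsl:template match="department" mode="s">
    <staff>
      <xsl:for-each select="employee">
        <salary seq="{position()}"><xsl:value-of select="salary"/></salary>
      </xsl:for-each>
    </staff>
  </xsl:template>

  <!-- xsl:apply-templates inside the loop is the one downward selection -->
  <xsl:template match="division" mode="s">
    <xsl:for-each select="team">
      <team n="{position()}" name="{@name}">
        <xsl:apply-templates mode="s"/>
      </team>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```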

Grouping

Saxon supports grouping in streamed mode using <xsl:for-each-group>, with a number of restrictions:

The grouping algorithm must be one of adjacent, starting-with, or ending-with. The group-by method is not supported, because it is intrinsically unstreamable.
Nested grouping is not allowed.
The current group must be bound using the new bind-group attribute, not using the traditional current-group() function.
The body of the xsl:for-each-group instruction must not use the context item (but it can refer to position()).
The body of the xsl:for-each-group instruction must use the bound grouping variable exactly once. It is not allowed to have a conditional expression with references to the grouping variable in each branch of the conditional.
Each item selected by the select expression of xsl:for-each-group is materialized in memory while it is being processed.
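
A sketch of streamed grouping under these restrictions. The exact form of the bind-group attribute is an assumption here (taken to name a variable, $g, holding the current group), and body and h1 are hypothetical element names; check the syntax against the Saxon-EE release in use.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <xsl:template match="body" mode="s">
    <!-- ASSUMPTION: bind-group names the variable that holds the current group -->
    <xsl:for-each-group select="*" group-starting-with="h1" bind-group="g">
      <!-- position() is allowed; the context item is not used -->
      <section n="{position()}">
        <!-- The bound group is referenced exactly once -->
        <xsl:copy-of select="$g"/>
      </section>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```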

Sorting and numbering

Sorting (xsl:sort) and numbering (xsl:number) are not supported in streaming mode.