Saxon Documentation

Streaming of Large Documents

How streaming works

Where necessary, the implementation of this facility uses multithreading. One thread, operating as a push pipeline, reads the source document and reduces it to the nodes selected by the path expression. These nodes are then handed over to the main processing thread, which iterates over them using an XPath pull pipeline. Because multithreading is used, this facility is not used when tracing is enabled. It should also be disabled when using a debugger (there is a method in the Configuration object to achieve this).
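The hand-over between a push thread and a pull iterator described above can be sketched in plain Java. This is an illustrative model only, not Saxon's implementation: the class name PushPullSketch, the select method, the END sentinel, the queue capacity, and the use of strings in place of nodes are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical model: strings stand in for nodes, a prefix test for the path expression.
class PushPullSketch {
    // Sentinel that the push thread enqueues when the source is exhausted.
    private static final Object END = new Object();

    /** Starts a push thread over the source and returns a pull iterator over the matches. */
    static Iterator<String> select(List<String> source, String prefix) {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(4);
        Thread producer = new Thread(() -> {
            try {
                for (String item : source) {
                    if (item.startsWith(prefix)) {
                        queue.put(item); // blocks while the consumer is behind
                    }
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();

        return new Iterator<String>() {
            private Object lookahead; // next item taken from the queue, if any
            private boolean done;

            @Override
            public boolean hasNext() {
                if (done) {
                    return false;
                }
                if (lookahead == null) {
                    try {
                        lookahead = queue.take(); // pull side waits for the push side
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        done = true;
                        return false;
                    }
                    if (lookahead == END) {
                        done = true;
                        lookahead = null;
                        return false;
                    }
                }
                return true;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String result = (String) lookahead;
                lookahead = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("transaction:1", "header", "transaction:2", "footer");
        Iterator<String> selected = select(events, "transaction");
        while (selected.hasNext()) {
            System.out.println(selected.next());
        }
    }
}
```

The bounded queue is the design point worth noting: it stops the reading thread from running far ahead of the processing thread, which is what keeps the number of items in memory small in this model.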

In cases where the entire stylesheet or query can be evaluated in "push" mode (as in the first example above), there is no need for multithreading: the selected nodes are written directly to the current output destination.

Note that a tree is built for each selected node and its subtree. Trees are also built for all nodes selected by the path expression, whether or not they satisfy the filter (those that do not satisfy the filter are immediately discarded from memory). The saving in memory comes when these nodes are processed one at a time, because each subtree can then be discarded as soon as it has been processed. There is no benefit if the stylesheet needs to perform non-serial processing, such as sorting. There is also no benefit if the path expression selects a node that contains most or all of the source document, for example its outermost element.
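The difference between serial and non-serial processing can be illustrated with a small counting model. It is a sketch under stated assumptions, not a measurement of Saxon: int arrays stand in for subtrees, and the class SubtreeRetentionSketch and its methods are hypothetical names that simply count how many subtrees are held at once.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model: each int[] stands in for one selected subtree.
class SubtreeRetentionSketch {

    /** Serial processing: returns {sum of all values, peak number of subtrees held}. */
    static int[] serial(int[][] source) {
        int total = 0;
        int live = 0;
        int peak = 0;
        for (int[] raw : source) {
            int[] subtree = raw.clone(); // "build" one subtree
            live++;
            peak = Math.max(peak, live);
            for (int value : subtree) {
                total += value;          // process it
            }
            live--;                      // nothing else refers to it, so it can be discarded
        }
        return new int[] {total, peak};
    }

    /** Sorting: returns {first value of the smallest subtree, peak number of subtrees held}. */
    static int[] sorted(int[][] source) {
        List<int[]> retained = new ArrayList<>();
        for (int[] raw : source) {
            retained.add(raw.clone());   // every subtree must be kept until the input ends
        }
        int peak = retained.size();
        retained.sort(Comparator.comparingInt((int[] s) -> s[0]));
        int first = retained.isEmpty() ? 0 : retained.get(0)[0];
        return new int[] {first, peak};
    }

    public static void main(String[] args) {
        int[][] source = { {30, 1}, {10, 2}, {20, 3} };
        System.out.println("serial peak: " + serial(source)[1]); // prints "serial peak: 1"
        System.out.println("sorted peak: " + sorted(source)[1]); // prints "sorted peak: 3"
    }
}
```

In this model the serial loop never holds more than one subtree, while the sort cannot emit anything until it holds all of them, which is why sorting gains nothing from streaming.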

Saxon can handle expressions that select nested nodes, for example //section where one section contains another. However, the need to deliver nodes in document order makes the pipeline somewhat turbulent in such cases, increasing memory usage.
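Why nesting costs memory can be seen from a simplified model of the ordering constraint. An outer section is only complete at its end tag, yet it precedes its inner sections in document order, so the inner ones have to be held back until the outer one closes. The sketch below is an assumption-laden illustration rather than Saxon's algorithm: the "start:" and "end:" event strings and the class NestedOrderSketch are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: "start:X" and "end:X" events for matching elements only.
class NestedOrderSketch {
    final List<String> delivered = new ArrayList<>();
    int peakBuffered = 0;

    /** Delivers matched nodes in document order, buffering while any match is still open. */
    void run(List<String> events) {
        List<String> pending = new ArrayList<>(); // matches in order of their start tags
        int depth = 0;                            // number of matches currently open
        for (String event : events) {
            if (event.startsWith("start:")) {
                pending.add(event.substring("start:".length()));
                depth++;
                peakBuffered = Math.max(peakBuffered, pending.size());
            } else if (event.startsWith("end:")) {
                depth--;
                if (depth == 0) {
                    // The outermost match is complete, so everything held can be released.
                    delivered.addAll(pending);
                    pending.clear();
                }
            }
        }
    }

    public static void main(String[] args) {
        NestedOrderSketch sketch = new NestedOrderSketch();
        sketch.run(Arrays.asList("start:A", "start:B", "end:B", "end:A", "start:C", "end:C"));
        System.out.println(sketch.delivered);    // prints [A, B, C]
        System.out.println(sketch.peakBuffered); // prints 2
    }
}
```

With no nesting the buffer in this model never holds more than one node; each level of nesting adds to what must be retained before anything can be delivered.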

Streamed processing of this kind is not actually faster than conventional processing (in fact, when multithreading is required, it may run at only half the speed). Its big advantage is that it saves memory, making it possible to process documents that would otherwise be too large for XSLT to handle. There may also be environments where the multithreading enables greater use of the available processor capacity. To run without this optimization, either change the xsl:copy-of instruction to xsl:sequence, or set saxon:read-once to "no".