Saxonica.com

Internal Changes

The Receiver interface has been changed: instead of a Configuration and a LocationProvider being passed down the pipeline, the context information for the pipeline is now passed as a PipelineConfiguration object. This object provides access to the Configuration and the LocationProvider, as well as other information: currently an ErrorListener and a URIResolver. This means that warnings detected by receivers in the pipeline (for example, serialization errors and validation warnings) can now be properly reported to the ErrorListener associated with a transformation, rather than to the global ErrorListener associated with the Configuration. This change was necessary in order to implement the JAXP Validator interface (which uses a local ErrorHandler), but it has other spin-off benefits. In particular, it means that the information passed down a pipeline can in future be extended by adding new fields to the PipelineConfiguration class, with no impact on the Receiver interface itself.
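The design can be illustrated with a minimal sketch. It does not use Saxon's actual classes: SketchPipelineConfig, SketchReceiver, WarningSink and LengthCheckingReceiver are hypothetical stand-ins, chosen only to show how a single context object lets a receiver report warnings to a per-transformation listener.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-transformation listener (not the JAXP ErrorListener).
interface WarningSink {
    void warning(String message);
}

// Hypothetical context object. New fields can be added here later
// without changing the SketchReceiver interface.
final class SketchPipelineConfig {
    private final String configurationName; // stands in for the global Configuration
    private final WarningSink warningSink;   // listener local to one transformation

    SketchPipelineConfig(String configurationName, WarningSink warningSink) {
        this.configurationName = configurationName;
        this.warningSink = warningSink;
    }

    String getConfigurationName() { return configurationName; }
    WarningSink getWarningSink() { return warningSink; }
}

// Hypothetical pipeline stage: it receives one context object, not several parameters.
interface SketchReceiver {
    void setPipelineConfiguration(SketchPipelineConfig pipe);
    void characters(String text);
}

// Example stage that warns about empty text through the local listener.
final class LengthCheckingReceiver implements SketchReceiver {
    private SketchPipelineConfig pipe;

    public void setPipelineConfiguration(SketchPipelineConfig pipe) {
        this.pipe = pipe;
    }

    public void characters(String text) {
        if (text.isEmpty()) {
            pipe.getWarningSink().warning("empty text node");
        }
    }
}

final class PipelineSketch {
    // Pushes the chunks through one receiver and returns the warnings it reported.
    static List<String> run(String... chunks) {
        List<String> collected = new ArrayList<>();
        SketchPipelineConfig pipe = new SketchPipelineConfig("default", collected::add);
        SketchReceiver receiver = new LengthCheckingReceiver();
        receiver.setPipelineConfiguration(pipe);
        for (String chunk : chunks) {
            receiver.characters(chunk);
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(run("a", "", "b"));
    }
}
```

Because the listener travels inside the context object, two transformations sharing one configuration can each collect their own warnings.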

User-defined functions are now evaluated lazily (that is, if the function returns a sequence, each item in the sequence is evaluated only when it is needed). This has required some changes to the implementation of tail call optimization. There are now two kinds of Closure: the new Closure class is used when the results are needed only once, as when evaluating a function call. The old Closure class is renamed MemoClosure, and is used when the results are likely to be needed more than once, as when evaluating a variable.

Saxon now does static analysis of variable references to identify variables that are never referenced, and variables that are only referenced once. If a variable is only referenced once, then during lazy evaluation of the variable the value will be discarded rather than being retained in memory for subsequent reference. There are now two classes supporting lazy evaluation: Closure is a value that is evaluated when first needed and is immediately discarded from memory, while MemoClosure also defers evaluation, but retains each item in the evaluated sequence once it is known. This analysis is currently done only for local variables and function parameters (not for global variables or XSLT template parameters).
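The difference between the two kinds of lazy value can be sketched as follows. This is an illustration under stated assumptions, not Saxon's implementation: OneShotClosure, MemoizingClosure and ClosureSketch are hypothetical names, and the choice to reject a second read of the one-shot closure is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical one-shot lazy value: evaluation is deferred and nothing is retained.
final class OneShotClosure<T> {
    private Supplier<Iterator<T>> expression;

    OneShotClosure(Supplier<Iterator<T>> expression) {
        this.expression = expression;
    }

    Iterator<T> iterate() {
        if (expression == null) {
            throw new IllegalStateException("already consumed");
        }
        Iterator<T> items = expression.get();
        expression = null; // drop the reference; the items are not stored
        return items;
    }
}

// Hypothetical memoizing lazy value: each item is kept once it has been computed.
final class MemoizingClosure<T> {
    private final Supplier<Iterator<T>> expression;
    private final List<T> reservoir = new ArrayList<>();
    private Iterator<T> source; // created on first demand

    MemoizingClosure(Supplier<Iterator<T>> expression) {
        this.expression = expression;
    }

    Iterator<T> iterate() {
        return new Iterator<T>() {
            private int position = 0;

            public boolean hasNext() {
                if (position < reservoir.size()) {
                    return true;
                }
                if (source == null) {
                    source = expression.get();
                }
                return source.hasNext();
            }

            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                if (position == reservoir.size()) {
                    reservoir.add(source.next()); // compute and remember one more item
                }
                return reservoir.get(position++);
            }
        };
    }
}

final class ClosureSketch {
    // Yields 1, 4, 9, ... up to n squared; computed[0] counts items actually evaluated.
    static Iterator<Integer> squares(int n, int[] computed) {
        return new Iterator<Integer>() {
            private int i = 1;

            public boolean hasNext() { return i <= n; }

            public Integer next() {
                if (i > n) {
                    throw new NoSuchElementException();
                }
                computed[0]++;
                int value = i * i;
                i++;
                return value;
            }
        };
    }

    public static void main(String[] args) {
        int[] computed = {0};
        MemoizingClosure<Integer> memo =
                new MemoizingClosure<>(() -> squares(3, computed));
        for (int pass = 0; pass < 2; pass++) {
            Iterator<Integer> items = memo.iterate();
            while (items.hasNext()) {
                System.out.println(items.next());
            }
        }
        System.out.println("items evaluated: " + computed[0]);
    }
}
```

The memoizing form pays for a reservoir that grows with the sequence, which is why it is only worth using when a second read is likely.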

The algorithm for type-checking (the XPath function call rules) has been rewritten to follow the specification more precisely. The rules have gradually been refined over successive W3C drafts, and although the changes are very minor, the implementation had got a little out of step.

The optimizer now recognizes that certain expressions cannot be moved out of a loop. A classic example is the XQuery expression count(./(for $i in 1 to 5 return <a/>)), which should return 5. Previous Saxon releases moved the element constructor out of the loop, and thus returned the value 1. Similar constructs occur in XSLT in the case of an XPath expression that calls a stylesheet function. At present all calls on user-defined functions (and XSLT templates) are treated as if they might create new nodes. This does not affect expressions that create new nodes in a context where the final result cannot depend on the identity of the new nodes: for example, if a node is created or a function is called within the predicate of a filter expression, this will still be extracted from the loop and evaluated only once.

The same considerations apply to path expressions in which one of the steps constructs new nodes. For example the result of count(a/<x/>) should be equal to the number of a elements selected; in previous Saxon releases it was always 1.
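The effect of node identity on these results can be modelled outside XQuery. The sketch below is an analogy only: HoistingSketch and its Node class are hypothetical, and the identity-based set stands in for the duplicate elimination that a path expression applies to its result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class HoistingSketch {
    // Hypothetical stand-in for a newly constructed element node such as <a/>.
    static final class Node {
        final String name;

        Node(String name) {
            this.name = name;
        }
    }

    // Counts nodes after removing duplicates by identity (not by content).
    static int countDistinctByIdentity(List<Node> nodes) {
        Set<Node> distinct = Collections.newSetFromMap(new IdentityHashMap<>());
        distinct.addAll(nodes);
        return distinct.size();
    }

    // Correct behaviour: the constructor runs on every iteration,
    // so each iteration yields a node with its own identity.
    static int constructInsideLoop(int iterations) {
        List<Node> result = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            result.add(new Node("a"));
        }
        return countDistinctByIdentity(result);
    }

    // Faulty optimization: the constructor is moved out of the loop,
    // so every iteration returns the same node.
    static int constructHoisted(int iterations) {
        Node hoisted = new Node("a");
        List<Node> result = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            result.add(hoisted);
        }
        return countDistinctByIdentity(result);
    }

    public static void main(String[] args) {
        System.out.println("inside loop: " + constructInsideLoop(5));
        System.out.println("hoisted:     " + constructHoisted(5));
    }
}
```

With five iterations the first method counts five distinct nodes and the second counts one, mirroring the difference between the correct result and the one returned by earlier releases.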

The TinyTree data structure now allows a single TinyTree to contain any number of trees rooted either at document nodes or element nodes. Allowing multiple parentless element nodes in a single TinyTree reduces the overhead involved in constructing sequences of elements.
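As a loose illustration of one structure holding several trees, the sketch below stores nodes in document order with a depth for each, and treats every node of depth 0 as a root. ForestSketch and its layout are assumptions made for the example; they are not a description of the actual TinyTree arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical array-backed forest: nodes are appended in document order.
final class ForestSketch {
    private final List<String> names = new ArrayList<>();
    private final List<Integer> depths = new ArrayList<>();
    private final List<Integer> roots = new ArrayList<>();

    // Appends a node and returns its node number. Depth 0 starts a new tree.
    int addNode(String name, int depth) {
        int nodeNr = names.size();
        names.add(name);
        depths.add(depth);
        if (depth == 0) {
            roots.add(nodeNr);
        }
        return nodeNr;
    }

    int rootCount() {
        return roots.size();
    }

    String nameOf(int nodeNr) {
        return names.get(nodeNr);
    }

    // The root of a node is the nearest node at or before it with depth 0.
    int rootOf(int nodeNr) {
        int n = nodeNr;
        while (depths.get(n) > 0) {
            n--;
        }
        return n;
    }

    public static void main(String[] args) {
        ForestSketch forest = new ForestSketch();
        forest.addNode("a", 0);          // first parentless element
        forest.addNode("b", 1);
        forest.addNode("c", 0);          // second parentless element, same structure
        int d = forest.addNode("d", 1);
        System.out.println(forest.rootCount() + " roots; d belongs to "
                + forest.nameOf(forest.rootOf(d)));
    }
}
```

Adding another parentless element costs one more entry in the shared lists rather than a new container object per element.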

Where appropriate, xsl:copy-of now creates virtual copies of nodes, using the new class VirtualCopy. This is simply a reference to the node that was copied, together with sufficient information to give the copy a different node identity from the original. This technique is used in cases where the copy is not being directly written to another tree, for example where it is returned as the value of a variable or function.
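The idea of a virtual copy can be sketched in a few lines. The classes below are hypothetical (SourceNode and VirtualCopyNode are not Saxon's VirtualCopy class or its API); they only show a wrapper that shares the original's content while carrying an identity of its own.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical original node holding the actual content.
final class SourceNode {
    final String name;
    final String text;

    SourceNode(String name, String text) {
        this.name = name;
        this.text = text;
    }
}

// Hypothetical virtual copy: a reference to the original plus a distinct identity.
final class VirtualCopyNode {
    private static final AtomicLong NEXT_ID = new AtomicLong();

    private final SourceNode original;
    private final long identity = NEXT_ID.getAndIncrement();

    VirtualCopyNode(SourceNode original) {
        this.original = original;
    }

    // Content is read through to the original; nothing is duplicated.
    String getName() { return original.name; }
    String getText() { return original.text; }
    SourceNode getOriginal() { return original; }

    // Two copies of the same original are still different nodes.
    boolean isSameNode(VirtualCopyNode other) {
        return identity == other.identity;
    }
}

final class VirtualCopySketch {
    public static void main(String[] args) {
        SourceNode original = new SourceNode("x", "hello");
        VirtualCopyNode first = new VirtualCopyNode(original);
        VirtualCopyNode second = new VirtualCopyNode(original);
        System.out.println(first.getText().equals(second.getText())); // same content
        System.out.println(first.isSameNode(second));                 // different identity
    }
}
```

Creating such a copy takes constant time regardless of the size of the copied subtree, which is the attraction when the copy is never written to another tree.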
