Internal changes
The URI http://www.w3.org/2002/11/query-operators/collation/codepoint is now recognized as the name of the code-point collation; if this URI is specified in calls to sorting or comparison operations, strings will be compared according to their Unicode code-points. Note that this URI is likely to change in subsequent versions of the XPath working drafts.
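As an illustration of what comparing "according to their Unicode code-points" means, here is a minimal Java sketch. The class name CodePointOrder is hypothetical and is not part of Saxon; it only shows how code-point order can differ from the UTF-16 code-unit order used by Java's String.compareTo() when characters outside the Basic Multilingual Plane are involved.

```java
import java.util.Comparator;

/** Hypothetical helper, not a Saxon class: orders strings by Unicode code point. */
final class CodePointOrder {

    /** Compares two strings code point by code point rather than UTF-16 unit by unit. */
    static final Comparator<String> COMPARATOR = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            int i = 0;
            int j = 0;
            while (i < a.length() && j < b.length()) {
                int ca = a.codePointAt(i);
                int cb = b.codePointAt(j);
                if (ca != cb) {
                    return ca < cb ? -1 : 1;
                }
                // Advance by one or two chars, depending on whether a surrogate pair was read.
                i += Character.charCount(ca);
                j += Character.charCount(cb);
            }
            // One string is a prefix of the other: the one with characters left over sorts last.
            return Boolean.compare(i < a.length(), j < b.length());
        }
    };

    public static void main(String[] args) {
        String bmp = "\uFF5E";          // U+FF5E, a single UTF-16 code unit
        String astral = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair
        System.out.println(bmp.compareTo(astral) > 0);           // true: code-unit order
        System.out.println(COMPARATOR.compare(bmp, astral) < 0); // true: code-point order
    }
}
```

For strings made up entirely of characters below U+D800 the two orderings agree, so the difference only shows up with supplementary characters.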
The following changes should not affect you unless you exploit internal interfaces within Saxon.
Parameters to stylesheet functions are now passed by position (in an array of values), not by name.
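The idea of passing parameters by position can be pictured with the following Java sketch. It is not Saxon code: the class PositionalParams and its methods are hypothetical, and it assumes only that a parameter name can be resolved to an array slot once, ahead of time, so that a call needs no lookup by name.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration, not Saxon code: binding function parameters by position. */
final class PositionalParams {

    /** Declared parameter names of one function, in declaration order. */
    private final List<String> paramNames;

    PositionalParams(String... paramNames) {
        this.paramNames = Arrays.asList(paramNames);
    }

    /** Compile time: resolve a parameter name to its slot number once. */
    int slotOf(String name) {
        int slot = paramNames.indexOf(name);
        if (slot < 0) {
            throw new IllegalArgumentException("Unknown parameter: " + name);
        }
        return slot;
    }

    /** Run time: the caller supplies an array of values; a reference is a plain array index. */
    static Object valueAt(Object[] args, int slot) {
        return args[slot];
    }

    public static void main(String[] args) {
        PositionalParams function = new PositionalParams("x", "y");
        int slot = function.slotOf("y");                 // resolved once
        Object[] call = {"first", "second"};             // values in declaration order
        System.out.println(valueAt(call, slot));         // prints second
    }
}
```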
Internally, there has been a change to the processing of literal result elements. XPath expressions contained within attribute value templates on such an element are now processed during the first (prepareAttributes) compilation phase, as with other stylesheet instructions. Type checking happens during the second (validate) phase. A consequence of this change is that user-defined top-level elements are now represented by a different class, DataElement, to prevent their attributes being processed as AVTs.
Type Checking
Changes made in support of XPath type-checking include the following:
-
The general trend is towards doing more of the work at compile time. Where type conversions are necessary, or where it is determined statically that they might be necessary, then the conversions are compiled into the executable expression; if they are not necessary, they are not performed. Similarly, if dynamic type checking is necessary, then it is compiled into the expression; otherwise, it is not performed.
-
Function calls to standard functions are now compiled with knowledge of the signature of the function. The code generated is conditional on whether backwards compatible mode is enabled or not. If the supplied arguments are incompatible with the function signature (that is, if the call cannot possibly succeed) then a static type error is generated. Code to atomize nodes and perform other allowed conversions (e.g. numeric promotion) is compiled into the expression tree. If the supplied value cannot be statically guaranteed to be of the correct type, then type-checking code is generated in the expression tree.
-
The same logic is used for calls to stylesheet functions. In this case, backwards compatible mode is never used, which means there is no implicit conversion of arguments. Calls to stylesheet functions are now statically checked; this is done by means of a fixup process that allows for the fact that the function call can be parsed before the function declaration is encountered.
-
The same logic is used for evaluating keys.
-
Within the implementation of standard functions, arguments are now evaluated without any type conversion: any conversions that are performed are done by the function calling mechanism, using internal tables that represent the signatures of each function.
-
The internal Expression#evaluate() method has been dropped. All implementations and usages of this method have changed to use evaluateItem() or iterate() (or in some cases, lazyEvaluate()), as appropriate.
-
The code for value comparisons and general comparisons has been split into a number of separate classes. These do stricter type checking of their arguments. The decision as to which algorithm to use (hash join, etc.) is now made at compile time, using static information about the types and cardinality of the arguments. But the conversion of untypedAtomic values (which result from atomizing a node with no type annotation) to a string or double (depending on the type of the other argument) is done dynamically. In the final stages of testing I found a design problem in this area: neither the new code nor the code in previous releases handled comparisons such as (U, U, U) = (1, 2, '3') correctly, where U is an untyped value. The problem here is that a mixture of string and numeric comparisons is required. I fixed this for the time being by changing the code so it always does a naive nested-loop comparison. This doesn't appear to have a noticeable effect on performance in most cases: there will be some cases where it is very inefficient, but these don't arise very often.
-
Other classes, notably the code for arithmetic expressions, also do stricter type checking.
-
The code for attribute value templates has been reorganized. The AttributeValueTemplate class is now used only at compile time, and it has therefore been moved to the style package. It no longer acts as a pseudo-XPath expression; instead, compiling the AVT generates a true XPath expression, including calls to concat(), string-join(), and string() where required. These handle all necessary type conversions.
-
The Expression#evaluateAsString() method no longer does conversion of the expression result to a string; the method should only be used where (a) the expression is statically known to return a string or (), and (b) the returned value of () is treated as equivalent to "". In practice, this means that the use of the method is now largely confined to the evaluation of attribute value templates. This method will probably be phased out.
-
The code for xsl:value-of has changed so it now compiles any code needed to convert the supplied expression to a string (or, if the separator attribute is present, a sequence of strings).
-
The code for xsl:sort has changed so that the sort key is converted to the required type using the same rules as those for function arguments. Internally, a new class FixedSortKeyDefinition is introduced to represent a sort key definition that contains no context dependencies, that is, one in which the values of all the parameters such as order, case-order, language, and data-type are known. Sometimes it is possible to create this statically; sometimes (when AVTs are used) it cannot be created until the values of variables are known.
-
Those Saxon extension functions that need special treatment at compile time (specifically, saxon:evaluate, saxon:expression, saxon:parse, and saxon:serialize) are now treated in the same way as system functions.
-
The class SimpleValue has been renamed AtomicValue.
-
The method convert() is now available only on the AtomicValue class; it is no longer available for all values as it was previously. This method implements the logic of the casting rules.
-
Expressions are now parsed in three stages: parsing, context-independent rewriting, and static type analysis. The first stage is done by the ExpressionParser class, the second by calling the simplify() method on the resulting Expression object. The third stage is done by calling the typeCheck() method on the Expression object. In an XSLT context, type information for stylesheet variables and stylesheet functions is added before the typeCheck() method is called. The Expression.make() call only does the first two steps; applications that use this interface must be changed to call typeCheck() as well. The XPath API in package net.sf.saxon.xpath works unchanged.
-
Higher-order expressions, such as path expressions, filter expressions, and "for", "some", and "every" expressions, are now rewritten statically to promote any subexpressions that don't depend on the iteration variables. The effect is that such subexpressions are only evaluated once. This mechanism replaces the previous run-time optimisation based on the concept of expression reduction (at run-time, the expression was replaced with an expression in which the independent sub-expressions were replaced with their value). The new mechanism is done entirely at compile time and is therefore much more economical. Also it avoids doing trivial rewrites, that is, extracting constants and simple variable references.{opt001-004}
-
Run-time expression reduction is still used to eliminate context dependencies in an expression that is being evaluated lazily (always an expression that returns a sequence) and is being held as the value of a variable. When evaluation of such an expression is deferred, it is necessary to make a copy of all aspects of the context that it depends on, and this is done by replacing the expression with a new expression in which all context variables are replaced with their values.
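The naive nested-loop treatment of general comparisons mentioned earlier in this list, for cases such as (U, U, U) = (1, 2, '3'), can be sketched in Java as follows. This is not Saxon's implementation: the class NaiveGeneralComparison and the Untyped stand-in are hypothetical, Java Number and String objects stand in for numeric and string atomic values, and comparing two untyped values as strings is an assumption of the sketch. A failed numeric conversion simply surfaces as a NumberFormatException here.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not Saxon code: naive nested-loop "=" over two sequences. */
final class NaiveGeneralComparison {

    /** Stand-in for an untypedAtomic value: just its string content. */
    static final class Untyped {
        final String text;

        Untyped(String text) {
            this.text = text;
        }
    }

    /** True if any pair (l, r) drawn from the two sequences compares equal. */
    static boolean generalEquals(List<?> left, List<?> right) {
        for (Object l : left) {
            for (Object r : right) {
                if (pairEquals(l, r)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Compares one pair, converting an untyped operand according to the other operand's type. */
    static boolean pairEquals(Object a, Object b) {
        if (a instanceof Untyped && b instanceof Untyped) {
            // Assumption of this sketch: two untyped values are compared as strings.
            return ((Untyped) a).text.equals(((Untyped) b).text);
        }
        if (a instanceof Untyped) {
            return untypedEquals((Untyped) a, b);
        }
        if (b instanceof Untyped) {
            return untypedEquals((Untyped) b, a);
        }
        if (a instanceof Number && b instanceof Number) {
            return ((Number) a).doubleValue() == ((Number) b).doubleValue();
        }
        if (a instanceof String && b instanceof String) {
            return a.equals(b);
        }
        throw new IllegalArgumentException("Operands are not comparable");
    }

    /** The conversion is chosen per pair, which is why one sequence can need both kinds. */
    private static boolean untypedEquals(Untyped u, Object typed) {
        if (typed instanceof Number) {
            // Numeric operand: convert the untyped value to a double.
            return Double.parseDouble(u.text.trim()) == ((Number) typed).doubleValue();
        }
        // String operand: compare the untyped value as a string.
        return u.text.equals(typed.toString());
    }

    public static void main(String[] args) {
        Untyped u = new Untyped("3");
        List<Object> left = Arrays.<Object>asList(u, u, u);
        List<Object> right = Arrays.<Object>asList(1, 2, "3");
        System.out.println(generalEquals(left, right)); // prints true
    }
}
```

Because the conversion is decided separately for every pair, the loop handles a right-hand sequence that mixes numbers and strings, at the cost of comparing every item on the left with every item on the right.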