Internal changes
Saxon now includes its own code for converting doubles and floats to strings. Previously it used the Java code, and then modified the result in cases where the XPath rules differ from the Java rules. This change was prompted by the port to the .NET platform, which produced different (and incorrect) output for these operations. The new code should be more efficient, and it gives results that conform to the specification (in particular, converting a double to a string and back again will always give you the double you started with), but the output may not always be identical to the previous results.
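The round-trip property mentioned above can be checked with a short sketch. It uses the JDK's own `Double.toString` and `Double.parseDouble` rather than Saxon's new conversion code, so the lexical form printed follows the Java rules, not the XPath rules; the class name `DoubleRoundTrip` is purely illustrative.

```java
final class DoubleRoundTrip {

    /** Returns true if converting d to a string and parsing it back yields the same double. */
    static boolean roundTrips(double d) {
        String s = Double.toString(d);        // JDK conversion, not Saxon's
        double back = Double.parseDouble(s);
        // Compare bit patterns so that -0.0 and 0.0 are treated as different values
        return Double.doubleToLongBits(d) == Double.doubleToLongBits(back);
    }

    public static void main(String[] args) {
        double[] samples = {0.1, 1.0e21, 1.0 / 3.0, Double.MIN_VALUE, Double.MAX_VALUE, -0.0};
        for (double d : samples) {
            System.out.println(Double.toString(d) + " round-trips: " + roundTrips(d));
        }
    }
}
```

A conversion routine that satisfies the specification should pass the same kind of check for every finite double, whatever lexical form it chooses.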
Some improvements have been made to the TinyTree structure, as follows:
- The tree no longer uses a single contiguous character array for storage, but instead uses an indexed list of smaller arrays. This is designed to reduce the cost of building large trees, as it saves repeated copying of the character data as the array is enlarged. For documents up to a few megabytes in size, the effect is minor: a small improvement in tree-building time, at the cost of a small increase in execution time. For larger documents, the effect is to improve predictability of performance, by reducing the variations that arise from the overheads of memory management.
- Whitespace-only text nodes are now (in most cases) held in compressed form. This adds a little processing overhead for small documents, but for large documents this is more than compensated by the savings in memory usage, which can be very substantial.
- Saxon now maintains statistics on the average size of the trees constructed during the life of the Java VM. These statistics are used when deciding on the initial amount of space to be allocated to a new tree. This learning mechanism gives substantial throughput improvements for a workload that creates many small trees.
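The first point in the list, storing character data in an indexed list of smaller arrays, can be illustrated with a minimal sketch. This is not the TinyTree code: the class `ChunkedCharStore`, its methods, and the chunk size are all assumptions made for illustration. The point it shows is that growing the store allocates a new chunk instead of copying everything already stored.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkedCharStore {
    private final int chunkSize;                       // hypothetical tuning parameter
    private final List<char[]> chunks = new ArrayList<>();
    private int length = 0;

    ChunkedCharStore(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Appends text; existing chunks are never copied or reallocated. */
    void append(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            int offset = length % chunkSize;
            if (offset == 0) {
                chunks.add(new char[chunkSize]);       // grow by one chunk only
            }
            chunks.get(length / chunkSize)[offset] = text.charAt(i);
            length++;
        }
    }

    char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return chunks.get(index / chunkSize)[index % chunkSize];
    }

    /** Copies the characters in [start, end) into a new String. */
    String substring(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    int length() { return length; }

    int chunkCount() { return chunks.size(); }
}
```

The extra division and modulo on each access is the kind of small execution-time cost that the note describes as the price of cheaper tree building.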
Some performance improvements have been made to the TinyTree code for the following axis, and for the isAncestorOrSelf() test used to support the XSLT key() function when the third argument requests searching within a subtree.
In the NamePool, the limit on the number of prefixes allowed for any given URI has been raised from 256 to 1024, and the implementation has been made more efficient in situations where many different prefixes are used with the same URI (though it still involves some serial searching).
The optimizer now recognizes a wider range of equivalences between expressions involving associative operators. For example, the expressions ((a|b)|c) and (b|(c|a)) are recognized as being interchangeable. Currently the only important place where such equivalences are used is where they appear as steps in a path expression.
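One way to recognize this kind of equivalence for the union operator is to flatten each expression into the set of its operands and compare the sets, which is valid because union is associative, commutative, and idempotent. The sketch below shows that idea only; the `Expr`, `Leaf`, and `Union` types are hypothetical stand-ins, not Saxon's expression classes, and nothing here is claimed to be how the optimizer actually does it.

```java
import java.util.HashSet;
import java.util.Set;

final class UnionEquivalence {

    interface Expr {}

    /** A named operand such as a, b or c (hypothetical stand-in for a real subexpression). */
    static final class Leaf implements Expr {
        final String name;
        Leaf(String name) { this.name = name; }
    }

    /** The binary union operator, left | right. */
    static final class Union implements Expr {
        final Expr left;
        final Expr right;
        Union(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    /** Flattens nested unions into the set of their leaf operands. */
    static Set<String> operands(Expr e) {
        Set<String> out = new HashSet<>();
        collect(e, out);
        return out;
    }

    private static void collect(Expr e, Set<String> out) {
        if (e instanceof Union) {
            Union u = (Union) e;
            collect(u.left, out);
            collect(u.right, out);
        } else {
            out.add(((Leaf) e).name);
        }
    }

    /** Two union expressions are interchangeable if they have the same operand set. */
    static boolean equivalent(Expr x, Expr y) {
        return operands(x).equals(operands(y));
    }

    public static void main(String[] args) {
        Expr a = new Leaf("a"), b = new Leaf("b"), c = new Leaf("c");
        Expr first = new Union(new Union(a, b), c);    // ((a|b)|c)
        Expr second = new Union(b, new Union(c, a));   // (b|(c|a))
        System.out.println(equivalent(first, second));
    }
}
```

The same flattening does not carry over unchanged to operators that are associative but not commutative, where operand order would also have to match.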
The xsl:analyze-string instruction now has an iterate() method. This means it can be evaluated in "pull" mode, making it more efficient, for example, when it is used as the body of a user-defined function.
Calls on extension functions are now marked with a new property identifying them as having potential side-effects; any expression marked with this property is disqualified from being moved out of a loop by the optimizer. Previously the "creative" property was used for this purpose (this property applies to an expression that constructs new nodes), but this proved insufficient as the "creative" property is not passed through to a containing expression that performs atomization.
The DOM interface now allows the supplied DOM object to be a DocumentFragment node.
The code that implements the options case-order="upper-first" and case-order="lower-first" has been rewritten to take account of the fact that two strings that are considered equal except for case differences will not necessarily contain corresponding letters in corresponding positions: for example, one of the strings may contain spaces or punctuation that the underlying collation has deemed insignificant. These options are now available not only on xsl:sort, but also as a query parameter in a collation URI or as an attribute on the saxon:collation declaration in XSLT.
The standard collation URI resolver now makes use of an underlying factory class to create a collation with given properties. There are two versions of this factory class, one for each target platform. User-written collation URI resolvers also have access to these factory classes. The same factory classes are used to implement the saxon:collation declaration.
In error messages, a SequenceType originally written as schema-element(xyz) is now displayed in the same form, instead of displaying Saxon's internal representation of the construct.
The conversion of a DateTimeValue or DateValue to a Java GregorianCalendar has changed so that BC dates are correctly represented.
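As a rough illustration of what representing a BC date in a GregorianCalendar involves, the sketch below sets the calendar's ERA field explicitly. It assumes the input year uses astronomical numbering (year 0 is 1 BC, year -1 is 2 BC); that numbering, the class name `BcCalendar`, and the method `toCalendar` are assumptions for this example and do not describe how DateTimeValue or DateValue store years internally.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class BcCalendar {

    /**
     * Builds a calendar for the given date.
     * Assumption: year uses astronomical numbering, so 0 means 1 BC and -1 means 2 BC.
     * month is 1-based (1 = January).
     */
    static GregorianCalendar toCalendar(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        if (year <= 0) {
            // GregorianCalendar has no year zero: BC years are positive within the BC era
            cal.set(Calendar.ERA, GregorianCalendar.BC);
            cal.set(Calendar.YEAR, 1 - year);
        } else {
            cal.set(Calendar.ERA, GregorianCalendar.AD);
            cal.set(Calendar.YEAR, year);
        }
        cal.set(Calendar.MONTH, month - 1);   // Calendar months are 0-based
        cal.set(Calendar.DAY_OF_MONTH, day);
        return cal;
    }

    public static void main(String[] args) {
        GregorianCalendar cal = toCalendar(-43, 3, 15);
        String era = cal.get(Calendar.ERA) == GregorianCalendar.BC ? "BC" : "AD";
        System.out.println(cal.get(Calendar.YEAR) + " " + era);
    }
}
```

Simply passing a zero or negative year to the YEAR field without setting ERA relies on the calendar's lenient field handling, which is easy to get wrong; setting the era explicitly avoids that.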