Version 8.7.1 (2006-04-13)

This page summarizes the changes in 8.7.1. In addition to the changes listed here, all bugs listed on the SourceForge site under group v8.7.1 have been fixed.

New features

A new extension attribute saxon:allow-all-built-in-types="yes" has been added to enable the use of types such as xs:int which are not permitted by the W3C conformance rules for a Basic XSLT Processor. These types are already allowed by Saxon-SA, of course, but this switch also enables their use with Saxon-B. The particular use case that prompted this extension was Dimitre Novatchev's XPath 2.0 Visualizer tool, which uses dynamically-constructed XSLT stylesheets as a vehicle for exercising XPath expressions.

Saxon-B is now capable of taking as input a source tree that contains typed elements and attributes, provided that the type annotations are restricted to the built-in types. Such input can be supplied, for example, by sending the document to Saxon in the form of a sequence of Receiver events with type annotations included, or by creating a user-defined implementation of the NodeInfo interface. If it is known that all nodes will be untyped, it is worth calling the method Configuration.setAllNodesUntyped(true), because the compiler can exploit this information. This is done automatically when the XSLT or XQuery processor is invoked from the command line with Saxon-B.

A number of new output character encodings are now supported natively, including EUC-JP, EUC-KR, Big5, GB2312, ISO 8859-5, ISO 8859-7, ISO 8859-8, and ISO 8859-9. Thanks to Lauren Ward of Hewlett Packard for supplying these.

The DOM4J object model is now recognized in the same way as DOM, JDOM, and XOM. The code has been lifted from the Orbeon OPS server (it was originally written as a modification of the JDOM support module in Saxon). A few minor bugs have been fixed. Thanks to Erik Bruchez for identifying this opportunity.

The StaxBridge class now has a method that allows you to supply your own XMLStreamReader.

Previously, the default language for format-date() and related functions in XSLT was taken from the Java default locale. This has been changed so that a non-English language is used as the default only if (a) it is the language of the Java default locale, and (b) there is an installed numberer for that language. The effect of this change is to eliminate the warning output [Language: en] produced when the Java default locale is non-English but there is no localized numberer available for that language.
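The two conditions above can be pictured with a short Java sketch. It is only an illustration of the rule as described: the class name, the method chooseDefaultLanguage, and the set of installed numberer languages are all hypothetical and are not part of the Saxon API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the default-language rule; not Saxon code.
class DefaultLanguageChooser {

    // Returns the language code to use when the stylesheet does not specify one.
    // A non-English language is chosen only if (a) it is the language of the
    // supplied default locale and (b) a numberer for it is installed.
    static String chooseDefaultLanguage(Locale defaultLocale, Set<String> installedNumberers) {
        String lang = defaultLocale.getLanguage();
        if (!"en".equals(lang) && installedNumberers.contains(lang)) {
            return lang;
        }
        return "en";
    }

    public static void main(String[] args) {
        // The installed set here is invented for the example.
        Set<String> installed = new HashSet<String>(Arrays.asList("de"));
        System.out.println(chooseDefaultLanguage(Locale.GERMAN, installed));
        System.out.println(chooseDefaultLanguage(Locale.JAPANESE, installed));
    }
}
```

With a rule of this shape, a locale that has no localized numberer falls back to English silently instead of producing a warning.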

Problems fixed

The new code introduced in Saxon 8.7 for converting floating point numbers to strings was found to be unsatisfactory, and has been completely rewritten using a different algorithm.

An optimization used by the schema validator while constructing finite state machines to implement the schema grammar was found to be unsound in a very small number of cases; the optimization has therefore been removed. Unfortunately this means that compiling a schema is now a little slower.

In xsl:analyze-string, a check is now made for error XTDE1150 (regex matches a zero-length string) in the case where the regex is not known until run-time.
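As a rough illustration of what such a run-time check involves, the following Java sketch tests whether a regular expression matches the empty string before it is used. It assumes java.util.regex syntax, which differs in places from the XPath regex dialect, and the class and method names are hypothetical; it is not the code Saxon uses.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a run-time zero-length-match check; not Saxon code.
class ZeroLengthRegexCheck {

    // True if the regex matches the zero-length string.
    static boolean matchesZeroLengthString(String regex) {
        return Pattern.compile(regex).matcher("").matches();
    }

    // Rejects a regex that is only known at run-time if it matches "".
    static void checkRegex(String regex) {
        if (matchesZeroLengthString(regex)) {
            throw new IllegalArgumentException(
                "XTDE1150: regex matches a zero-length string: " + regex);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesZeroLengthString("a*"));
        System.out.println(matchesZeroLengthString("a+"));
    }
}
```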

In XSLT, the attribute stable="yes" or stable="no" is now permitted on xsl:sort. It currently has no effect (sorting is always stable in Saxon). This is conformant behaviour, because the effect of stable="no" is implementation-dependent.
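For readers unfamiliar with the term, a stable sort keeps items with equal sort keys in their original relative order. The Java sketch below shows the property using Collections.sort, which is documented as stable. The "key:value" item format and the class name are invented for the example, and nothing here reflects how Saxon implements xsl:sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustration of sort stability only; not Saxon code.
class StableSortDemo {

    // Extracts the sort key from an item of the form "key:value".
    static String keyOf(String item) {
        return item.substring(0, item.indexOf(':'));
    }

    // Sorts by key only. Items with equal keys keep their input order,
    // because Collections.sort is a stable sort.
    static List<String> sortByKey(List<String> items) {
        List<String> copy = new ArrayList<String>(items);
        Collections.sort(copy, new Comparator<String>() {
            public int compare(String a, String b) {
                return keyOf(a).compareTo(keyOf(b));
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortByKey(Arrays.asList("b:1", "a:1", "b:2", "a:2")));
    }
}
```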

In XQuery, when a ModuleURIResolver is set on the StaticQueryContext for a main module, it is now also used for resolving module imports contained in any transitively-imported library modules.

The "tiny forest" mechanism, whereby a single TinyTree structure is used to hold multiple trees (root nodes) in a sequence, was found not to be working reliably in Saxon 8.7, and has been redesigned to make it more robust. Generally speaking, this mechanism reduces the number of objects that are allocated but increases their size; this may affect the performance profile of some applications.

The XSLT xsl:number instruction now recognizes non-BMP digits in its format string. (This works best with JDK 1.5; there are some restrictions under JDK 1.4)
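To show what a non-BMP digit looks like to Java code, here is a small sketch using the code-point methods that were added to java.lang.Character and String in JDK 1.5. The example character, U+1D7CF (MATHEMATICAL BOLD DIGIT ONE), and the class name are chosen purely for illustration; the sketch does not claim to mirror Saxon's implementation.

```java
// Illustration of handling digits outside the Basic Multilingual Plane; not Saxon code.
class NonBmpDigits {

    // Returns the decimal value of the first code point in s,
    // or -1 if that code point is not a decimal digit.
    static int firstDigitValue(String s) {
        int codePoint = s.codePointAt(0);
        return Character.isDigit(codePoint) ? Character.digit(codePoint, 10) : -1;
    }

    // True if the first code point of s lies outside the BMP, in which
    // case it occupies two char values (a surrogate pair) in the String.
    static boolean startsWithNonBmp(String s) {
        return Character.isSupplementaryCodePoint(s.codePointAt(0));
    }

    public static void main(String[] args) {
        String boldOne = new String(Character.toChars(0x1D7CF));
        System.out.println(boldOne.length());
        System.out.println(startsWithNonBmp(boldOne));
        System.out.println(firstDigitValue(boldOne));
    }
}
```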

The TypeHierarchy object, which holds a cache of type information, is now held as part of the Configuration and no longer as part of the NamePool. This is to avoid memory leaks in cases where one long-lived NamePool was used with many transient Configuration objects. (This happened with the schema-aware product only, because user-defined types held in the TypeHierarchy hold a reference to the Configuration under which they were created.)

A change has been made to the way in which the XSLT current template rule is maintained. This is to implement the rule that when a template is defined using a union pattern, it is treated as a set of template rules with potentially different priorities. The xsl:next-match instruction can therefore invoke the same template more than once. To implement this, the currentTemplate maintained in the context is now a Rule object rather than a Template object.
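The idea of splitting a union pattern into several rules can be sketched in Java as follows. This is a deliberately simplified model: the Rule class here is a hypothetical stand-in rather than Saxon's Rule class, the split on "|" ignores predicates that might themselves contain that character, and the priority function covers only plain element names (0.0) and multi-step or predicated patterns (0.5).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of union-pattern expansion; not Saxon code.
class UnionPatternRules {

    // One rule per branch of the union; all branches share the same template.
    static final class Rule {
        final String pattern;
        final double priority;
        final String templateName;

        Rule(String pattern, double priority, String templateName) {
            this.pattern = pattern;
            this.priority = priority;
            this.templateName = templateName;
        }
    }

    // Simplified default priority: 0.5 for patterns with a "/" or a predicate,
    // 0.0 otherwise. Wildcards and kind tests are not handled in this sketch.
    static double simplifiedDefaultPriority(String pattern) {
        return (pattern.indexOf('/') >= 0 || pattern.indexOf('[') >= 0) ? 0.5 : 0.0;
    }

    // Expands match="p1 | p2 | ..." into one Rule per branch.
    static List<Rule> expand(String matchPattern, String templateName) {
        List<Rule> rules = new ArrayList<Rule>();
        String[] branches = matchPattern.split("\\|");
        for (int i = 0; i < branches.length; i++) {
            String branch = branches[i].trim();
            rules.add(new Rule(branch, simplifiedDefaultPriority(branch), templateName));
        }
        return rules;
    }

    public static void main(String[] args) {
        List<Rule> rules = expand("a | b/c", "t1");
        for (int i = 0; i < rules.size(); i++) {
            Rule r = rules.get(i);
            System.out.println(r.pattern + " " + r.priority + " " + r.templateName);
        }
    }
}
```

Because both rules point at the same template, a node that matches both branches can reach that template twice through xsl:next-match, once for each rule.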

In schema-aware processing, improvements have been made to the type inferencing. The type of a path expression starting with a variable whose static type is document-node(schema-element(x)) is now inferred more precisely, and the cardinality of an expression using the child axis is also now inferred more precisely. This enables better compile-time detection of type errors, and in some cases better optimization.

On .NET, Saxon 8.7.1 is built using IKVM 0.26. The associated version of GNU Classpath fixes a number of bugs, including a serious one involving decimal arithmetic.

W3C language conformance

The component extraction functions get-years-from-duration(), get-months-from-duration(), etc., now operate on any xs:duration value, not only on an xdt:yearMonthDuration or xdt:dayTimeDuration value. (W3C Bugzilla 2934)

The type names dayTimeDuration, yearMonthDuration, untypedAtomic, untyped, and anyAtomicType are now recognized in the xs namespace http://www.w3.org/2001/XMLSchema as well as in the previous xdt namespace (in fact several versions of the xdt namespace are recognized). This situation is transitional: eventually only the XMLSchema namespace will be allowed.

The functions encode-for-uri() and iri-to-uri() have been modified according to the changes agreed in W3C Bugzilla 2457.

Casting from a derived type to a supertype is no longer a no-op. Although I believe that the language specification permits the previous behavior, it was controversial, and it seems better to do something that causes fewer surprises even if it is slower.

In schema-aware XQuery with multiple modules, error XQST0036 (an imported function or variable uses an unknown type) is now reported only if the function or variable is actually referenced in the importing module. See W3C Bugzilla 2546.

XSLT and XQuery error codes have been added for most validation errors. I have also started the process of incorporating XML Schema error codes as mandated by Appendix C of XML Schema Part 1.

Performance tuning

The internal MappingIterator and MappingFunction classes have been subdivided into three pairs of classes that provide different subsets of the functionality: ContextMappingIterator is used when each item being mapped becomes the context item; MappingIterator is used when this is not the case; and ItemMappingIterator is used when the mapping is from one input item to zero-or-one output items. This change was made to reduce code pathlengths in the most commonly used cases.
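The one-to-zero-or-one case is the simplest of the three, and a sketch shows why it allows a shorter code path. The Java below borrows the name ItemMappingIterator from the description above, but the interface, the generic signatures, and the use of null to mean "no output item" are assumptions made for this illustration and should not be read as Saxon's actual classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of a one-to-zero-or-one mapping iterator; not Saxon code.
class ItemMappingSketch {

    // Maps one input item to one output item, or to null for "no output".
    interface ItemMappingFunction<T, R> {
        R map(T item);
    }

    static final class ItemMappingIterator<T, R> implements Iterator<R> {
        private final Iterator<T> base;
        private final ItemMappingFunction<T, R> function;
        private R pending;

        ItemMappingIterator(Iterator<T> base, ItemMappingFunction<T, R> function) {
            this.base = base;
            this.function = function;
            advance();
        }

        // Reads input items until one maps to a non-null result.
        // No nested iterator is needed, unlike a one-to-many mapping.
        private void advance() {
            pending = null;
            while (base.hasNext()) {
                R result = function.map(base.next());
                if (result != null) {
                    pending = result;
                    return;
                }
            }
        }

        public boolean hasNext() {
            return pending != null;
        }

        public R next() {
            if (pending == null) {
                throw new NoSuchElementException();
            }
            R result = pending;
            advance();
            return result;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    // Usage example: squares the even numbers and drops the odd ones.
    static List<Integer> evenSquares(List<Integer> input) {
        Iterator<Integer> it = new ItemMappingIterator<Integer, Integer>(
            input.iterator(),
            new ItemMappingFunction<Integer, Integer>() {
                public Integer map(Integer item) {
                    int n = item.intValue();
                    return (n % 2 == 0) ? Integer.valueOf(n * n) : null;
                }
            });
        List<Integer> out = new ArrayList<Integer>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```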

Handling of decimal values has been speeded up by using the JDK 1.5 method stripTrailingZeros if it is available (Saxon uses an equivalent but slower routine otherwise).
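For reference, the sketch below shows what BigDecimal.stripTrailingZeros does, alongside a hand-written loop that produces the same result for non-zero values. The manual routine is an assumption about what an "equivalent but slower" fallback could look like; it is not the routine Saxon uses, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Illustration of trailing-zero removal on decimals; not Saxon code.
class TrailingZeros {

    // Fast path: the method available from JDK 1.5 onwards.
    static BigDecimal stripWithJdk(BigDecimal value) {
        return value.stripTrailingZeros();
    }

    // Hypothetical fallback: repeatedly divide the unscaled value by ten
    // while the remainder is zero, reducing the scale by one each time.
    static BigDecimal stripManually(BigDecimal value) {
        BigInteger unscaled = value.unscaledValue();
        int scale = value.scale();
        if (unscaled.signum() == 0) {
            return BigDecimal.ZERO;
        }
        BigInteger[] quotientAndRemainder = unscaled.divideAndRemainder(BigInteger.TEN);
        while (quotientAndRemainder[1].signum() == 0) {
            unscaled = quotientAndRemainder[0];
            scale--;
            quotientAndRemainder = unscaled.divideAndRemainder(BigInteger.TEN);
        }
        return new BigDecimal(unscaled, scale);
    }

    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("1.2300");
        System.out.println(stripWithJdk(d).toPlainString());
        System.out.println(stripManually(d).toPlainString());
    }
}
```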

There have been some improvements to the join optimizer in Saxon-SA, allowing hash joins to be used in some situations where they were not used previously.

Improvements have been made to the memoizing optimization used for <xsl:number level="any"/>.

API Change on .NET

In the .NET API, the BaseUri property of an XsltCompiler is now a Uri rather than a String. This change is for compatibility with the BaseUri property of the DocumentBuilder class.
