Functions, operators, and data types for XPath 2.0

Changes in Functions, Operators, and Data Types

The nilled() function has been implemented.

Duration values (xs:duration, xdt:dayTimeDuration, and subtypes thereof) are now maintained to microsecond rather than millisecond precision. Trailing zeros in the fractional part of the seconds value are no longer displayed when converting the value to a string. Note that the precision of dateTime values is still milliseconds, a restriction of the underlying Java classes.

The normalize-unicode() function has been implemented. The normalization code used is based on the code published by the Unicode Consortium, but with changes to avoid the heavy cost of reading the Unicode character database files every time Saxon starts up. The code has also been modified to remove its dependence on routines in the ICU product from IBM: these were all basic utility routines for handling UTF-16 encoding which already had equivalents in Saxon, and it seemed desirable to avoid introducing additional complexity into the software license. The Saxon version of the code has been used to run the conformance tests published by the Unicode Consortium (with the exception of those tests that use characters not permitted in XML), using a test driver written in XSLT. The same code is also used to support the normalization-form serialization property. The normalization forms supported are NFC, NFD, NFKC, and NFKD (as well as the keyword "none"). The "fully-normalized" option is not implemented.

Negative zero is now output (converted to a string) as "-0" rather than "0" as in XPath 1.0.

When an untypedAtomic value is used in an arithmetic expression, for example @price+1, and the untypedAtomic value cannot be cast to a double, Saxon previously returned NaN. It now returns NaN only when processing in backwards compatibility mode, and otherwise reports an error.

The rules for casting from xs:duration to xdt:yearMonthDuration or xdt:dayTimeDuration have been changed to match the current specification. This extracts the relevant components of the value, ignoring the other components: previously Saxon reported an error if the other components were non-zero.

Casting from xs:date to xs:gYear or xs:gYearMonth has been corrected to retain the "era" (BC or AD).

Overflow is now detected and reported when multiplying or dividing a duration by a number. Saxon supports xdt:yearMonthDuration values up to 2^31 months, and xdt:dayTimeDuration values up to 2^63 microseconds.
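
The kind of range check involved can be illustrated with a minimal Java sketch. This is not Saxon's code: the class and method names are hypothetical, and the sketch assumes that a yearMonthDuration is held as a count of months in an int and a dayTimeDuration as a count of microseconds in a long, which is consistent with the limits quoted above.

```java
/** Hypothetical sketch of overflow-checked duration scaling; not Saxon's implementation. */
final class DurationScaleSketch {

    /** Multiplies a month count by a factor, rejecting results outside the int range. */
    static int scaleMonths(int months, double factor) {
        double product = months * factor;
        if (Double.isNaN(product) || product > Integer.MAX_VALUE || product < Integer.MIN_VALUE) {
            throw new ArithmeticException("yearMonthDuration overflow");
        }
        return (int) Math.round(product);
    }

    /** Multiplies a microsecond count by a factor, rejecting results outside the long range. */
    static long scaleMicros(long micros, double factor) {
        double product = micros * factor;
        // 0x1p63 is 2^63 as a double; valid long values lie in [-2^63, 2^63).
        if (Double.isNaN(product) || product >= 0x1p63 || product < -0x1p63) {
            throw new ArithmeticException("dayTimeDuration overflow");
        }
        return Math.round(product);
    }

    public static void main(String[] args) {
        System.out.println(scaleMonths(12, 1.5));
        System.out.println(scaleMicros(1_000_000L, 2.5));
    }
}
```

In this sketch, division by a number can be handled by passing the reciprocal as the factor; a zero divisor then yields an infinite or NaN product, which the same check rejects.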

When converting an xs:duration to a string, zero components are now omitted; a duration of length zero is output as PT0S.
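
The string conversion rules for durations (zero components omitted, PT0S for a zero-length duration, and no trailing zeros in the fractional seconds) can be sketched in Java as follows. This is an illustration only, not Saxon's code: the class and method names are hypothetical, and it assumes a dayTimeDuration stored as a signed count of microseconds.

```java
/** Hypothetical sketch of dayTimeDuration-to-string conversion; not Saxon's implementation. */
final class DurationFormatSketch {

    private static final long MICROS_PER_SECOND = 1_000_000L;
    private static final long MICROS_PER_MINUTE = 60L * MICROS_PER_SECOND;
    private static final long MICROS_PER_HOUR = 60L * MICROS_PER_MINUTE;
    private static final long MICROS_PER_DAY = 24L * MICROS_PER_HOUR;

    static String format(long micros) {
        if (micros == 0) {
            return "PT0S"; // a zero-length duration has a fixed representation
        }
        if (micros == Long.MIN_VALUE) {
            throw new IllegalArgumentException("magnitude not representable in this sketch");
        }
        StringBuilder sb = new StringBuilder();
        if (micros < 0) {
            sb.append('-');
        }
        long rest = Math.abs(micros);
        long days = rest / MICROS_PER_DAY;
        rest %= MICROS_PER_DAY;
        long hours = rest / MICROS_PER_HOUR;
        rest %= MICROS_PER_HOUR;
        long minutes = rest / MICROS_PER_MINUTE;
        rest %= MICROS_PER_MINUTE;
        long seconds = rest / MICROS_PER_SECOND;
        long fraction = rest % MICROS_PER_SECOND;

        sb.append('P');
        if (days != 0) {
            sb.append(days).append('D');
        }
        if (hours != 0 || minutes != 0 || seconds != 0 || fraction != 0) {
            sb.append('T');
            if (hours != 0) {
                sb.append(hours).append('H');
            }
            if (minutes != 0) {
                sb.append(minutes).append('M');
            }
            if (seconds != 0 || fraction != 0) {
                sb.append(seconds);
                if (fraction != 0) {
                    // Pad to six digits, then drop trailing zeros.
                    String digits = Long.toString(MICROS_PER_SECOND + fraction).substring(1);
                    int end = digits.length();
                    while (digits.charAt(end - 1) == '0') {
                        end--;
                    }
                    sb.append('.').append(digits, 0, end);
                }
                sb.append('S');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(0L));
        System.out.println(format(1_500_000L));
        System.out.println(format(MICROS_PER_DAY + 2 * MICROS_PER_HOUR));
    }
}
```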

Saxon previously allowed any string to be used as the value of an xs:anyURI. It now applies checks that the string is valid. Specifically, the string after trimming leading and trailing whitespace must be one of the following: (a) a zero-length string, (b) a string that the java.net.URI class accepts as a valid URI, or (c) a string which after escaping of non-ASCII and other special characters (-_.!~*'()%;/?:@&=+$,#[]) is accepted by the java.net.URI class. This validation is applied only when casting a string to xs:anyURI, or when validating against the schema type xs:anyURI. To prevent problems interoperating with other software, Saxon continues to allow any string to be used as the namespace URI in an xs:QName. An example of an invalid URI is 1:2:3 - the scheme name, which is the part before the first colon, must start with a letter.
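
The three-step check described above can be approximated with a short Java sketch built on java.net.URI. It is not Saxon's code, and the class and method names are hypothetical. The escaping step is an assumption: it percent-encodes the UTF-8 bytes of every character other than ASCII letters, digits, and the punctuation listed above, which is one plausible reading of the rule.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the xs:anyURI validity check; not Saxon's implementation. */
final class AnyUriCheckSketch {

    // Assumption: these characters are left as they are by the escaping step.
    private static final String UNESCAPED = "-_.!~*'()%;/?:@&=+$,#[]";

    static boolean isValidAnyUri(String value) {
        // String.trim() removes all characters up to U+0020, a slight superset of XML whitespace.
        String s = value.trim();
        if (s.isEmpty()) {
            return true;              // (a) zero-length string
        }
        if (parses(s)) {
            return true;              // (b) accepted by java.net.URI as written
        }
        return parses(escape(s));     // (c) accepted after escaping
    }

    private static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Percent-encodes every UTF-8 byte that is not an ASCII letter, digit, or UNESCAPED character. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (c < 128 && (Character.isLetterOrDigit(c) || UNESCAPED.indexOf(c) >= 0)) {
                sb.append((char) c);
            } else {
                sb.append('%')
                  .append(Character.toUpperCase(Character.forDigit(c >> 4, 16)))
                  .append(Character.toUpperCase(Character.forDigit(c & 0xF, 16)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(isValidAnyUri("http://example.com/a b"));
        System.out.println(isValidAnyUri("1:2:3"));
    }
}
```

With this sketch, a string containing a space is accepted under rule (c), while 1:2:3 is rejected because escaping leaves it unchanged and java.net.URI refuses a scheme name that starts with a digit.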

The standard URI resolver (used, for example, to resolve the URIs passed to the document() and doc() functions) now performs escaping of disallowed characters such as spaces. This means the "URI" that is passed can be either a valid URI, or a string that becomes a valid URI when such characters are escaped. (However, spaces cannot be used in contexts where a space-separated list of URIs is required, for example in the xsi:schemaLocation attribute.)

There is now an option (-p on the command line) that causes the standard URI resolver to recognize query parameters supplied in the URI passed to the document() or doc() functions. For example, doc("books.xml?validation=strict") loads the contents of the file books.xml, and applies strict validation. Other options include strip-space=yes which strips all whitespace-only text nodes (regardless of the setting of xsl:strip-space in the stylesheet). Full details are provided in the description of the doc function.
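
To make the shape of such a URI concrete, the following Java sketch separates the document location from its query parameters. It is illustrative only and is not the resolver's code: the class and method names are hypothetical, and the use of "&" or ";" to separate multiple parameters is an assumption not stated above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of splitting a doc() URI into location and parameters. */
final class DocQuerySketch {

    /** Returns the part of the href before the first '?'. */
    static String baseOf(String href) {
        int q = href.indexOf('?');
        return q < 0 ? href : href.substring(0, q);
    }

    /** Returns the name=value pairs after the first '?', in order of appearance. */
    static Map<String, String> parametersOf(String href) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = href.indexOf('?');
        if (q < 0) {
            return params;
        }
        // Assumption: parameters are separated by '&' or ';'.
        for (String pair : href.substring(q + 1).split("[&;]")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String href = "books.xml?validation=strict";
        System.out.println(baseOf(href));
        System.out.println(parametersOf(href));
    }
}
```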

The URI supplied as an argument to the collection() function may now be resolved by a user-defined CollectionURIResolver. Such a resolver may be registered using the setCollectionResolver() method on the Configuration object. The standard CollectionURIResolver behaves as before if the URI supplied to the collection() function identifies an XML catalogue file. As an alternative, however, the URI may be one that refers to a directory, with optional query parameters that filter the files in the directory. For example, collection("file:///c:/temp/docs/?select=*.xml") returns all files in the directory c:/temp/docs that have the file extension ".xml". In addition, the query parameter recurse=yes may be added to expand the directory recursively; the parameter validate=strict or validate=lax may be added to request schema validation; the parameter strip-space=yes may be added to request whitespace-stripping; the parameter on-error=fail|warning|ignore controls the action when processing of a file in the collection fails; the parameter parser=full.class.name selects a parser (SAX XMLReader) to be used to process the files (for example, John Cowan's TagSoup parser may be selected to process ill-formed HTML).
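
The effect of the select and recurse parameters on a directory can be pictured with the following Java sketch, which uses only the standard java.nio.file API. It is not the standard CollectionURIResolver and does not use any Saxon interface: the class and method names are hypothetical, and treating the select value as a java.nio glob is an assumption made for illustration.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of selecting the files that make up a directory collection. */
final class DirectoryCollectionSketch {

    /**
     * Lists regular files under dir whose file name matches the glob pattern.
     * When recurse is true, subdirectories are searched as well.
     */
    static List<Path> select(Path dir, String glob, boolean recurse) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> paths = recurse ? Files.walk(dir) : Files.list(dir)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> matcher.matches(p.getFileName()))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : select(dir, "*.xml", false)) {
            System.out.println(p);
        }
    }
}
```

In this sketch, select(dir, "*.xml", false) plays the role of ?select=*.xml, and passing true for the last argument corresponds to adding recurse=yes.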

The type of the first argument to namespace-uri-for-prefix() is now an optional string. (This change has been agreed by the working groups but is not yet published.)