Functions, operators, and data types for XPath 2.0
The nilled() function has been implemented.
Duration values (xs:duration, xdt:dayTimeDuration, and subtypes thereof) are now maintained to microsecond rather than millisecond precision. Trailing zeros in the fractional part of the seconds value are no longer displayed when converting the value to a string. Note that the precision of dateTime values is still milliseconds, a restriction of the underlying Java classes.
The normalize-unicode() function has been implemented. The normalization code used is based on the code published by the Unicode Consortium, but with changes to avoid the heavy cost of reading the Unicode character database files every time Saxon starts up. The code has also been modified to remove its dependence on routines in the ICU product from IBM: these were all basic utility routines for handling UTF-16 encoding which already had equivalents in Saxon, and it seemed desirable to avoid introducing additional complexity into the software license. The Saxon version of the code has been used to run the conformance tests published by the Unicode Consortium (with the exception of those tests that use characters not permitted in XML), using a test driver written in XSLT. The same code is also used to support the normalization-form serialization property. The normalization forms supported are NFC, NFD, NFKC, and NFKD (as well as the keyword "none"). The "fully-normalized" option is not implemented.
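As a rough illustration of what the four supported forms do, the sketch below uses the JDK's own java.text.Normalizer rather than Saxon's normalization code. The class name NormalizationForms and the helper normalize are hypothetical, and the case-insensitive handling of the form name is an assumption of this sketch.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper: maps the form names listed above onto the JDK normalizer.
// This is NOT Saxon's implementation, only an illustration of the four forms.
final class NormalizationForms {

    static String normalize(String input, String form) {
        // Assumption: the form name is trimmed and compared case-insensitively.
        switch (form.trim().toUpperCase(Locale.ROOT)) {
            case "NFC":  return Normalizer.normalize(input, Normalizer.Form.NFC);
            case "NFD":  return Normalizer.normalize(input, Normalizer.Form.NFD);
            case "NFKC": return Normalizer.normalize(input, Normalizer.Form.NFKC);
            case "NFKD": return Normalizer.normalize(input, Normalizer.Form.NFKD);
            case "NONE": return input;   // the keyword "none" leaves the string untouched
            default:
                throw new IllegalArgumentException("Unsupported normalization form: " + form);
        }
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";   // letter e followed by a combining acute accent
        System.out.println(normalize(decomposed, "NFC").length());   // composed: 1 char
        System.out.println(normalize("\u00e9", "NFD").length());     // decomposed: 2 chars
        System.out.println(normalize("\ufb01", "NFKC"));             // ligature becomes "fi"
    }
}
```

The compatibility forms (NFKC, NFKD) additionally fold characters such as ligatures into their plain equivalents, which is why the last call changes the text and not only its encoding.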
Negative zero is now output (converted to a string) as "-0" rather than "0" as in XPath 1.0.
When an untypedAtomic value is used in an arithmetic expression, for example @price+1, and the untypedAtomic value cannot be cast to a double, Saxon previously returned NaN. It now returns NaN only when processing in backwards compatibility mode, and otherwise reports an error.
The rules for casting from xs:duration to xdt:yearMonthDuration or xdt:dayTimeDuration have been changed to match the current specification. The cast now extracts the relevant components of the value and ignores the other components; previously Saxon reported an error if the other components were non-zero.
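The component-extraction rule can be sketched with the JDK's javax.xml.datatype API. This is an illustration of the rule only, not Saxon's casting code; the class DurationCastSketch and its methods are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;

// Hypothetical sketch: keep only the components relevant to the target type.
final class DurationCastSketch {

    private static DatatypeFactory factory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses the lexical form of an xs:duration, for example "P1Y2M3DT4H". */
    static Duration parse(String lexical) {
        return factory().newDuration(lexical);
    }

    /** Keeps the sign, years and months; days and time components are ignored. */
    static Duration toYearMonth(Duration d) {
        return factory().newDurationYearMonth(d.getSign() >= 0, d.getYears(), d.getMonths());
    }

    /** Keeps the sign, days, hours, minutes and seconds; years and months are ignored. */
    static Duration toDayTime(Duration d) {
        Number s = d.getField(DatatypeConstants.SECONDS);   // may carry a fraction, or be null
        BigDecimal seconds = (s == null) ? BigDecimal.ZERO : new BigDecimal(s.toString());
        return factory().newDuration(d.getSign() >= 0, null, null,
                BigInteger.valueOf(d.getDays()),
                BigInteger.valueOf(d.getHours()),
                BigInteger.valueOf(d.getMinutes()),
                seconds);
    }
}
```

With this sketch, casting P1Y2M3DT4H to a year-month duration keeps one year and two months, and casting it to a day-time duration keeps three days and four hours, with no error raised for the discarded components.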
Casting from xs:date to xs:gYear or xs:gYearMonth has been corrected to retain the "era" (BC or AD).
Overflow is now detected and reported when multiplying or dividing a duration by a number.
Saxon supports xdt:yearMonthDuration values up to 2^31 months, and xdt:dayTimeDuration values up to 2^63 microseconds.
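The following is a minimal sketch of how such overflow might be detected. It assumes a day-time duration is held as a signed 64-bit count of microseconds, which is inferred from the limit quoted above rather than taken from Saxon's source; DurationOverflowSketch and its methods are hypothetical names.

```java
// Hypothetical sketch: overflow detection when scaling a duration by a number.
// Assumption: the duration is stored as a signed 64-bit count of microseconds.
final class DurationOverflowSketch {

    private static final double LIMIT = 0x1p63;   // 2^63 as a double

    /** Multiplies a microsecond count by a factor, reporting overflow as an error. */
    static long multiply(long micros, double factor) {
        return checked(micros * factor);
    }

    /** Divides a microsecond count by a divisor; division by zero is also rejected. */
    static long divide(long micros, double divisor) {
        return checked(micros / divisor);
    }

    private static long checked(double result) {
        // NaN and the infinities fall outside the representable range as well.
        if (Double.isNaN(result) || result >= LIMIT || result < -LIMIT) {
            throw new ArithmeticException("Duration overflow: " + result);
        }
        return Math.round(result);   // round to the nearest whole microsecond
    }
}
```

Because the arithmetic passes through a double, very large microsecond counts lose some precision in this sketch; it shows only the range check, not an exact computation.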
When converting an xs:duration to a string, zero components are now omitted; a duration of length zero is output as PT0S.
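The string conversion rules can be sketched as follows, covering omitted zero components, no trailing zeros in the fractional seconds, and PT0S for a zero-length duration. The sketch covers only the day-time part and assumes the value is available as a count of microseconds; DayTimeDurationFormatter is a hypothetical name, not a Saxon class.

```java
import java.util.Locale;

// Hypothetical formatter for a day-time duration held as microseconds.
final class DayTimeDurationFormatter {

    static String format(long micros) {
        if (micros == 0) {
            return "PT0S";                       // zero-length duration
        }
        if (micros == Long.MIN_VALUE) {
            throw new ArithmeticException("magnitude not representable");
        }
        StringBuilder sb = new StringBuilder();
        if (micros < 0) {
            sb.append('-');
        }
        long abs = Math.abs(micros);
        long totalSeconds = abs / 1_000_000L;
        long fraction = abs % 1_000_000L;        // microseconds within the second
        long days = totalSeconds / 86_400L;
        long hours = (totalSeconds % 86_400L) / 3_600L;
        long minutes = (totalSeconds % 3_600L) / 60L;
        long seconds = totalSeconds % 60L;

        sb.append('P');
        if (days != 0) {
            sb.append(days).append('D');
        }
        if (hours != 0 || minutes != 0 || seconds != 0 || fraction != 0) {
            sb.append('T');
            if (hours != 0) {
                sb.append(hours).append('H');
            }
            if (minutes != 0) {
                sb.append(minutes).append('M');
            }
            if (seconds != 0 || fraction != 0) {
                sb.append(seconds);
                if (fraction != 0) {
                    String digits = String.format(Locale.ROOT, "%06d", fraction);
                    int end = digits.length();
                    while (digits.charAt(end - 1) == '0') {
                        end--;                   // drop trailing zeros of the fraction
                    }
                    sb.append('.').append(digits, 0, end);
                }
                sb.append('S');
            }
        }
        return sb.toString();
    }
}
```

For example, 1,500,000 microseconds formats as PT1.5S, and 90,000 seconds as P1DT1H with no minutes or seconds component.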
Saxon previously allowed any string to be used as the value of an xs:anyURI. It now checks that the string is valid. Specifically, the string after trimming leading and trailing whitespace must be one of the following: (a) a zero-length string, (b) a string that the java.net.URI class accepts as a valid URI, or (c) a string which after escaping of non-ASCII and other special characters (-_.!~*'()%;/?:@&=+$,#[]) is accepted by the java.net.URI class. This validation is applied only when casting a string to xs:anyURI, or when validating against the schema type xs:anyURI. To prevent problems interoperating with other software, Saxon continues to allow any string to be used as the namespace URI in an xs:QName. An example of an invalid URI is 1:2:3, because the scheme name, which is the part before the first colon, must start with a letter.
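The three-way check described above might look roughly like the sketch below. It reads the parenthesised character list as the set of characters left unescaped (together with ASCII letters and digits), which is an assumption chosen because it agrees with the 1:2:3 example. AnyUriCheck and its methods are hypothetical names, not Saxon's API.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the xs:anyURI validity check, using java.net.URI.
final class AnyUriCheck {

    // Assumption: these characters, plus ASCII letters and digits, are not escaped.
    private static final String KEEP = "-_.!~*'()%;/?:@&=+$,#[]";

    static boolean isValidAnyURI(String value) {
        String s = value.trim();                 // trim leading and trailing whitespace
        if (s.isEmpty()) {
            return true;                         // (a) zero-length string
        }
        if (parses(s)) {
            return true;                         // (b) accepted by java.net.URI as it stands
        }
        return parses(escape(s));                // (c) accepted after escaping
    }

    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Percent-encodes the UTF-8 bytes of every character outside the kept set. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean keep = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || (c < 0x80 && KEEP.indexOf(c) >= 0);
            if (keep) {
                sb.append((char) c);
            } else {
                sb.append('%')
                  .append(Character.toUpperCase(Character.forDigit(c >> 4, 16)))
                  .append(Character.toUpperCase(Character.forDigit(c & 0xF, 16)));
            }
        }
        return sb.toString();
    }
}
```

Under this reading, "my file.xml" passes because the space becomes %20, while 1:2:3 is unchanged by escaping and is still rejected for its scheme name.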
The standard URI resolver (used, for example, to resolve the URIs passed to the document() and doc()
functions) now performs escaping of disallowed characters such as spaces. This means the "URI"
that is passed can be either a valid URI, or a string that becomes a valid URI when such characters
are escaped. (However, spaces cannot be used in contexts where a space-separated list of URIs is required, for example in the xsi:schemaLocation attribute.)
There is now an option (-p on the command line) that causes the standard URI resolver to recognize query
parameters supplied in the URI passed to the document() or doc() functions. For example, doc("books.xml?validation=strict") loads the contents of the file books.xml and applies strict validation. Other options include strip-space=yes, which strips all whitespace-only text nodes (regardless of the setting of xsl:strip-space in the stylesheet). Full details are provided in the description of the doc() function.
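To make the idea concrete, here is a small sketch of how a resolver could separate the document location from its query parameters. It is not Saxon's resolver code: DocUriQuery and its methods are hypothetical, and accepting both '&' and ';' as separators between parameters is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits a URI such as "books.xml?validation=strict".
final class DocUriQuery {

    /** Returns the part of the URI before the first '?'. */
    static String baseOf(String uri) {
        int q = uri.indexOf('?');
        return q < 0 ? uri : uri.substring(0, q);
    }

    /** Returns the name=value pairs after the first '?', in order of appearance. */
    static Map<String, String> parametersOf(String uri) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = uri.indexOf('?');
        if (q < 0) {
            return params;
        }
        // Assumption: parameters are separated by '&' or ';'.
        for (String pair : uri.substring(q + 1).split("[&;]")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String uri = "books.xml?validation=strict;strip-space=yes";
        System.out.println(baseOf(uri));         // location of the document to load
        System.out.println(parametersOf(uri));   // options to apply while loading
    }
}
```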
The URI supplied as an argument to the collection() function may now be resolved by a user-defined CollectionURIResolver. Such a resolver may be registered using the setCollectionResolver() method on the Configuration object. The standard CollectionURIResolver behaves as before if the URI supplied to the collection() function identifies an XML catalogue file. As an alternative, however, the URI may be one that refers to a directory, with optional query parameters that filter the files in the directory. For example, collection("file:///c:/temp/docs/?select=*.xml") returns all files in the directory c:/temp/docs that have the file extension ".xml". In addition, the query parameter recurse=yes may be added to expand the directory recursively; the parameter validate=strict or validate=lax may be added to request schema validation; the parameter strip-space=yes may be added to request whitespace-stripping; the parameter on-error=fail|warning|ignore controls the action when processing of a file in the collection fails; and the parameter parser=full.class.name selects a parser (SAX XMLReader) to be used to process the files (for example, John Cowan's TagSoup parser may be selected to process ill-formed HTML).
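The effect of the select and recurse parameters on a directory can be approximated with the JDK's file APIs, as in the sketch below. It shows only the file selection step, not parsing, validation, or error handling, and it is not Saxon's CollectionURIResolver; DirectoryCollectionSketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: choose the files a directory-based collection would contain.
final class DirectoryCollectionSketch {

    /** Lists regular files under dir whose names match the glob, e.g. "*.xml". */
    static List<Path> select(Path dir, String glob, boolean recurse) {
        PathMatcher matcher = dir.getFileSystem().getPathMatcher("glob:" + glob);
        int depth = recurse ? Integer.MAX_VALUE : 1;   // depth 1 means the directory's own files
        try (Stream<Path> files = Files.walk(dir, depth)) {
            return files.filter(Files::isRegularFile)
                        .filter(p -> matcher.matches(p.getFileName()))
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a throwaway directory tree and returns the names selected by "*.xml". */
    static List<String> demo(boolean recurse) {
        try {
            Path dir = Files.createTempDirectory("docs");
            Files.createFile(dir.resolve("a.xml"));
            Files.createFile(dir.resolve("notes.txt"));
            Path sub = Files.createDirectory(dir.resolve("sub"));
            Files.createFile(sub.resolve("b.xml"));
            return select(dir, "*.xml", recurse).stream()
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(false));   // files directly in the directory
        System.out.println(demo(true));    // files in the directory and its subdirectories
    }
}
```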
The type of the first argument to namespace-uri-for-prefix() is now an optional string. (Change agreed by the working groups, not yet published.)