Handling of source documents
To conform with the W3C Data Model specification, an incompatible change has been introduced in this release: by default, "ignorable whitespace" is stripped from source documents before a query or transformation commences.
If the document has a DTD, then ignorable whitespace is whitespace appearing between the child elements of an element that is declared to have element-only content. A validating parser always reports such whitespace as ignorable, but parsers appear to differ as to whether they report it when not validating.
If the document has a schema, then ignorable whitespace is a whitespace text node appearing as a child of an element that has element-only content.
If there is no DTD and no schema, then whitespace is never ignorable.
The presence of an xml:space="preserve" attribute has no effect on this process.
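The DTD case can be observed directly at the parser level. The following is a minimal sketch that uses the standard JAXP SAX API rather than Saxon itself: it counts the characters a parser reports through the ignorableWhitespace() callback. The class name, method name, and sample documents are illustrative assumptions, and the sketch shows only what the parser reports, not how Saxon strips it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Saxon.
public class IgnorableWhitespaceDemo {

    // Element "a" is declared with element-only content, so whitespace
    // between its "b" children is ignorable.
    static final String WITH_DTD =
        "<!DOCTYPE a [<!ELEMENT a (b*)><!ELEMENT b (#PCDATA)>]>\n"
        + "<a>\n <b>x</b>\n <b>y</b>\n</a>";

    // No DTD and no schema: whitespace is never ignorable.
    static final String WITHOUT_DTD = "<a>\n <b>x</b>\n <b>y</b>\n</a>";

    // Returns the total number of characters the parser reports as
    // ignorable whitespace while parsing the supplied document.
    static int countIgnorable(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        SAXParser parser = factory.newSAXParser();
        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                count[0] += length;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("with DTD, validating: " + countIgnorable(WITH_DTD, true));
        System.out.println("no DTD, not validating: " + countIgnorable(WITHOUT_DTD, false));
    }
}
```

The callback may be invoked in several chunks, which is why the sketch sums lengths instead of counting calls.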
This whitespace stripping is additional (and prior) to any whitespace stripping requested using xsl:strip-space in an XSLT stylesheet. It occurs only when a document is built from a StreamSource or SAXSource.
The default behavior can be overridden from the command line. The Transform and Query commands have three new options: -snone strips no whitespace, -signorable strips ignorable whitespace, and -sall strips all whitespace text nodes.
It is also possible to override the default behavior when using the doc() or document() function, if query URI parameters are enabled, by adding the parameter strip=no to the URI.
XML 1.1 support
There is a new option setXMLVersion() in the Configuration, which defaults to 1.0.
This configuration setting affects:
- validation of names used in XQuery and XPath expressions, including names of elements, attributes, functions, variables, and types
- validation of names of constructed elements, attributes, and processing instructions in XQuery and XSLT
- schema validation of values of type NCName, QName, NOTATION, and ID
- permitted names of stylesheet objects such as keys, templates, decimal-formats, output declarations, and output methods
- characters considered valid in the source of an XQuery query
- characters considered valid in the result of the functions codepoints-to-string() and unparsed-text()
- characters considered valid in the result of certain Saxon extension functions
- the way in which line endings in XQuery queries are normalized
- the default version used by the serializer (with output method XML)
The Saxon configuration setting has no effect on the XML parser. If XML 1.1 documents are supplied as input to Saxon, then you MUST call config.setXMLVersion(Configuration.XML11) (or use -1.1 on the command line). Saxon won't necessarily detect the error if you fail to do so (especially if the documents don't use any XML 1.1 features).
Note that this change introduces a few incompatibilities: for example, XQuery will only accept XML 1.1 line endings if the -1.1 flag is set.
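To make the line-ending difference concrete, here is a minimal sketch of the end-of-line normalization rules defined in section 2.11 of the XML 1.0 and XML 1.1 specifications. It is an illustration of those published rules under the assumption that they are what the setting selects between; it is not Saxon's implementation, and the class and method names are hypothetical.

```java
// Hypothetical demo class; not part of Saxon.
public class LineEndingDemo {

    static final char CR = '\r';
    static final char LF = '\n';
    static final char NEL = (char) 0x85;   // NEXT LINE, a line ending only in XML 1.1
    static final char LS = (char) 0x2028;  // LINE SEPARATOR, a line ending only in XML 1.1

    // Normalizes line endings to LF using XML 1.0 rules (xml11 == false)
    // or XML 1.1 rules (xml11 == true).
    static String normalize(String s, boolean xml11) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == CR) {
                // CR LF collapses to one LF in both versions;
                // CR NEL collapses to one LF in XML 1.1 only.
                if (i + 1 < s.length()) {
                    char next = s.charAt(i + 1);
                    if (next == LF || (xml11 && next == NEL)) {
                        i++;
                    }
                }
                out.append(LF);
            } else if (xml11 && (c == NEL || c == LS)) {
                out.append(LF);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String query = "let $x := 1" + NEL + "return $x";
        // Under 1.0 rules the NEL character survives; under 1.1 rules it becomes LF.
        System.out.println(normalize(query, false).indexOf(NEL) >= 0);
        System.out.println(normalize(query, true).indexOf(NEL) >= 0);
    }
}
```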