XML Parsing and Serialization
Parsing
If a SAXSource containing an XMLReader is supplied to Saxon, Saxon now respects the ErrorHandler associated with the XMLReader rather than replacing it with its own.
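As an illustration, the following sketch builds such a SAXSource using only the standard JAXP and SAX APIs. The class name SaxSourceWithErrorHandler, the CollectingErrorHandler class, and the buildSource method are hypothetical names introduced for this example; handing the resulting source to Saxon is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class SaxSourceWithErrorHandler {

    /** Hypothetical handler that records parse problems instead of printing them. */
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            messages.add("warning: " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) {
            messages.add("error: " + e.getMessage());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXParseException {
            messages.add("fatal: " + e.getMessage());
            throw e; // a fatal error stops the parse
        }
    }

    /** Creates an XMLReader, attaches the caller's ErrorHandler, and wraps both in a SAXSource. */
    static SAXSource buildSource(String xml, ErrorHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setErrorHandler(handler);
        return new SAXSource(reader, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        SAXSource source = buildSource("<doc/>", handler);
        // The handler stays attached to the reader carried by the source.
        System.out.println(source.getXMLReader().getErrorHandler() == handler);
        // The source would now be passed to Saxon, for example as the
        // input to a JAXP Transformer (not shown here).
    }
}
```

With the behavior described above, parse errors reported while Saxon reads this source should reach the CollectingErrorHandler rather than a handler installed by Saxon.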
Serialization
Some very basic support for HTML 5 has been added. If the serialization method is "html" and the version is "5.0", a heading <!DOCTYPE HTML> will be output regardless of the doctype-system and doctype-public properties.
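One way to request this combination from Java is through the standard JAXP output property names. The sketch below only assembles the properties; the class name Html5OutputProperties and the method html5Properties are hypothetical, and the doctype-public value is set purely to illustrate a property that, per the description above, has no effect on the heading.

```java
import java.util.Properties;
import javax.xml.transform.OutputKeys;

public class Html5OutputProperties {

    /** Builds serialization properties selecting the html method, version 5.0. */
    static Properties html5Properties() {
        Properties props = new Properties();
        props.setProperty(OutputKeys.METHOD, "html");
        props.setProperty(OutputKeys.VERSION, "5.0");
        // Illustrative only: with version 5.0 this setting is described
        // as not affecting the <!DOCTYPE HTML> heading.
        props.setProperty(OutputKeys.DOCTYPE_PUBLIC, "-//W3C//DTD HTML 4.01//EN");
        return props;
    }

    public static void main(String[] args) {
        Properties props = html5Properties();
        System.out.println(props.getProperty(OutputKeys.METHOD));
        System.out.println(props.getProperty(OutputKeys.VERSION));
    }
}
```

The properties can then be applied to a JAXP Transformer with setOutputProperties before running the transformation.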
A new serialization option saxon:recognize-binary has been added for use with the text output method (only). If set to yes, the processing instructions <?hex XXXX?> and <?b64 XXXX?> will be recognized; the value is taken as a hexBinary or base64 representation of a character string, encoded using the encoding in use by the serializer, and this character string will be output without validating it to ensure it contains valid XML characters. This enables non-XML characters, notably binary zero, to be output. For example, <?hex 0c?> outputs an ASCII form feed.
Also recognized are <?hex.EEEE XXXX?> and <?b64.EEEE XXXX?>, where EEEE is the name of the encoding of the base64 or hexBinary data: for example hex.ascii or b64.utf8.
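The decoding described above can be pictured with a small standalone sketch. This is not Saxon's implementation: the class BinaryPiDecoding and its methods decodeHex and decodeBase64 are hypothetical, and they only show how hexBinary or base64 data plus a named encoding yield a character string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BinaryPiDecoding {

    /** Interprets pairs of hex digits as bytes, then decodes them with the given charset. */
    static String decodeHex(String hex, Charset charset) {
        String digits = hex.trim();
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("hexBinary data needs an even number of digits");
        }
        byte[] bytes = new byte[digits.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, charset);
    }

    /** Decodes base64 data to bytes, then decodes them with the given charset. */
    static String decodeBase64(String b64, Charset charset) {
        byte[] bytes = Base64.getDecoder().decode(b64.trim());
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Corresponds to the data in <?hex.ascii 0c?>: a single form feed character.
        String formFeed = decodeHex("0c", StandardCharsets.US_ASCII);
        System.out.println((int) formFeed.charAt(0));

        // Corresponds to the data in <?b64.utf8 SGk=?>.
        System.out.println(decodeBase64("SGk=", StandardCharsets.UTF_8));
    }
}
```

Because the resulting string is written without XML character validation, a value such as 00 would produce a binary zero in the output.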
A new UTF8 writer, contributed by Tatu Saloranta, is used in place of the standard Java UTF8 writer. The effect is to speed up serialization by around 20%; for a transformation that copies its input to its output, the improvement is about 10% overall.