saxonica.com

Serialization

The serialization property byte-order-mark="yes" is now honored when the selected encoding is utf-16le or utf-16be.
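As an illustration, a minimal stylesheet that requests a byte order mark for UTF-16LE output might look like the sketch below. The identity-style template is only a placeholder chosen for this example; the `encoding` and `byte-order-mark` attributes of `xsl:output` are the part that matters.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the serializer to write a byte order mark before the UTF-16LE content -->
  <xsl:output method="xml" encoding="utf-16le" byte-order-mark="yes"/>

  <!-- Placeholder template: copy the source document unchanged -->
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

With this declaration the output file should begin with the UTF-16LE byte order mark (bytes FF FE).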

The HTML serialization method now uses named entity references in preference to decimal character references for characters outside the specified encoding, where an entity reference is known. Since the list of known entity references is confined to characters in the range xA0 to xFF, this only really affects the outcome when encoding="us-ascii". More detailed control is available using saxon:character-representation, though this may have to change in future because it is technically non-conformant: vendor-defined serialization attributes are no longer allowed to cause behaviour that contradicts the provisions of the serialization specification.
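A small sketch of the case where this change is visible: HTML output with encoding="us-ascii" and a character in the range xA0 to xFF. The element content is invented for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- us-ascii cannot represent U+00E9, so the serializer must escape it -->
  <xsl:output method="html" encoding="us-ascii"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- &#xE9; is LATIN SMALL LETTER E WITH ACUTE, inside the xA0 to xFF range -->
        <p>caf&#xE9;</p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Under the behaviour described above, the paragraph would be expected to serialize as `<p>caf&eacute;</p>` rather than `<p>caf&#233;</p>`.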

Saxon now implements the change to the specification made as a result of bug 3441: with the HTML and XHTML output methods, any generated <meta> element is now produced earlier in the serialization pipeline, which has the effect that characters in this element are subject to substitution by means of character maps.

A new serialization method saxon:xquery is available. This is intended to be useful when generating an XQuery query as the output of a query or stylesheet. This method differs from the XML serialization method in that "<" and ">" characters appearing between curly braces (but not between quotes) in text nodes and attribute nodes are not escaped. The idea is to allow queries to be generated, or to be written within an XML document, and then processed by first serializing them with this output method and then parsing the result with the XQuery parser. For example, the document <a>{$a &lt; '&lt;'}</a> will serialize as <a>{$a < '&lt;'}</a>.
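The following sketch shows how a stylesheet might select this method. It assumes the usual Saxon extension namespace `http://saxon.sf.net/` bound to the prefix `saxon`; the literal result element reuses the example from the paragraph above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Select the Saxon-specific XQuery serialization method -->
  <xsl:output method="saxon:xquery" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <!-- In XSLT 2.0 the curly braces here are ordinary text, not a value template -->
    <a>{$a &lt; '&lt;'}</a>
  </xsl:template>

</xsl:stylesheet>
```

According to the description above, the "<" between the curly braces is written unescaped while the one inside quotes stays escaped, so the result can be handed to the XQuery parser as a query.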

With the XML output method, indentation is now suppressed for any element that is known to have mixed content: specifically, any element that is validated against a user-defined type (not xs:anyType or xs:untyped) that specifies mixed="true" in the schema. No whitespace will be added to the content of such an element. For simplicity, the option applies to all the descendants of the element, even if there are descendants that do not allow mixed content.
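For illustration, a user-defined type of the kind that triggers this behaviour could be declared as in the schema sketch below. The names `para`, `paraType`, `b` and `i` are hypothetical, and the effect applies only when the result element has actually been validated against such a type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="para" type="paraType"/>

  <!-- mixed="true" allows text interleaved with the child elements -->
  <xs:complexType name="paraType" mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="b" type="xs:string"/>
      <xs:element name="i" type="xs:string"/>
    </xs:choice>
  </xs:complexType>

</xs:schema>
```

With indent="yes", a validated `para` element and all of its descendants would then be written without added whitespace, while the rest of the document is indented as usual.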

A new serialization parameter saxon:suppress-indentation is introduced for the XML output method. (It does not affect the HTML or XHTML output methods.) The value of the attribute is a whitespace-separated list of element names, and it works in the same way as cdata-section-elements (for example, values in xsl:output and xsl:result-document are cumulative). Its effect is that no indentation takes place for the children or descendants of any of the named elements (just as if they specified xml:space="preserve"). This option is useful where parts of the output document contain mixed content in which whitespace is significant.
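A minimal sketch of the parameter in use, assuming the Saxon extension namespace `http://saxon.sf.net/` is bound to the prefix `saxon`. The element names `para` and `title` are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Indent the output in general, but leave the content of para and title untouched -->
  <xsl:output method="xml" indent="yes"
      saxon:suppress-indentation="para title"/>

  <!-- Placeholder template: copy the source document unchanged -->
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Because the values are cumulative, an xsl:result-document instruction that names further elements in its own saxon:suppress-indentation attribute adds to this list rather than replacing it.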
