xsl:character-map

The xsl:character-map declaration defines a named character map for use during serialization. The name attribute gives the name of the character map, which can be referenced from the use-character-maps attribute of xsl:output. The xsl:character-map element contains a set of xsl:output-character elements, each of which defines the output representation of a given Unicode character. The character is specified using the character attribute, and the string that is to replace this character on serialization is specified using the string attribute. Both attributes are mandatory.

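The replacement string is output as is, even if it contains special (markup) characters. So, for example, you can define <xsl:output-character character="&#xa0;" string="&amp;nbsp;"/> to ensure that NBSP characters are output using the entity reference &nbsp;. The ampersand has to be written as &amp; inside the attribute because the stylesheet itself must be well-formed XML; the serializer then writes the replacement string unescaped.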
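The following stylesheet is a minimal sketch that puts these pieces together, assuming an XSLT 2.0 processor such as Saxon. The map name nbsp-map and the sample content are made up for illustration. It declares the map, activates it through use-character-maps on xsl:output, and emits a non-breaking space.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical map: serialize NO-BREAK SPACE (U+00A0) as &nbsp;.
       The & is escaped as &amp; so that the stylesheet stays well-formed. -->
  <xsl:character-map name="nbsp-map">
    <xsl:output-character character="&#xa0;" string="&amp;nbsp;"/>
  </xsl:character-map>

  <!-- Activate the map for the principal result document -->
  <xsl:output method="xml" omit-xml-declaration="yes"
              use-character-maps="nbsp-map"/>

  <xsl:template match="/">
    <!-- The result tree contains an ordinary U+00A0 character -->
    <p>10&#xa0;km</p>
  </xsl:template>
</xsl:stylesheet>
```

With this stylesheet the serialized result should be `<p>10&nbsp;km</p>`. Without the character map, the serializer would write the NBSP character itself (or a character reference if the output encoding cannot represent it).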

Character maps allow you to produce output that is not well-formed XML, and thus provide a replacement for disable-output-escaping. A useful technique is to use characters in the Unicode private use area (U+E000 to U+F8FF) as characters that, if present in the result tree, will be mapped to special strings on output. For example, if you want to generate a proprietary XML-like format that uses tags such as <!IF>, <!THEN>, and <!ELSE>, you could represent these tags by the three characters U+E000, U+E001, and U+E002, and map each character to the corresponding tag. You could in turn define these characters as entities so that they can be written symbolically in your stylesheet or source document.
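As a sketch of this technique (the entity names if, then, and else, the map name pseudo-tags, and the element name doc are all hypothetical), the stylesheet below declares the three private-use characters as entities in its internal DTD subset, maps each one to its pseudo-tag, and uses them inside a literal result element. The < in each replacement string has to be written as &lt; within the attribute.

```xml
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!-- Hypothetical readable names for three private-use characters -->
  <!ENTITY if   "&#xE000;">
  <!ENTITY then "&#xE001;">
  <!ENTITY else "&#xE002;">
]>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Each private-use character is replaced by a pseudo-tag on output -->
  <xsl:character-map name="pseudo-tags">
    <xsl:output-character character="&if;"   string="&lt;!IF&gt;"/>
    <xsl:output-character character="&then;" string="&lt;!THEN&gt;"/>
    <xsl:output-character character="&else;" string="&lt;!ELSE&gt;"/>
  </xsl:character-map>

  <xsl:output method="xml" omit-xml-declaration="yes"
              use-character-maps="pseudo-tags"/>

  <xsl:template match="/">
    <!-- In the result tree these are just three ordinary characters -->
    <doc>&if;test&then;yes&else;no</doc>
  </xsl:template>
</xsl:stylesheet>
```

The result would be expected to read `<doc><!IF>test<!THEN>yes<!ELSE>no</doc>`: the real element doc is serialized normally, while the three private-use characters are replaced by strings that are not well-formed XML.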

Character maps are preferred to disable-output-escaping because they do not rely on an intimate interface between the transformation engine and the serializer, and they do not distort the data model. The special characters can happily be stored in a DOM, passed across the SAX interface, or manipulated in any other way, before finally being rendered by the serializer.

Character maps may be assembled from other character maps using the use-character-maps attribute of xsl:character-map. Its value is a space-separated list of the names of other character maps whose mappings are to be included in this character map.
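A sketch of composition follows; all three map names are illustrative. The map html-entities pulls in nbsp-map and dash-map through its use-character-maps attribute and adds one mapping of its own, so referencing html-entities from xsl:output should activate all three mappings.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Two small, hypothetical maps that can also be used on their own -->
  <xsl:character-map name="nbsp-map">
    <xsl:output-character character="&#xa0;" string="&amp;nbsp;"/>
  </xsl:character-map>

  <xsl:character-map name="dash-map">
    <xsl:output-character character="&#x2014;" string="&amp;mdash;"/>
  </xsl:character-map>

  <!-- Combined map: includes both maps above, plus a mapping for U+00A9 -->
  <xsl:character-map name="html-entities"
                     use-character-maps="nbsp-map dash-map">
    <xsl:output-character character="&#xa9;" string="&amp;copy;"/>
  </xsl:character-map>

  <!-- Only the combined map needs to be named here -->
  <xsl:output method="xml" omit-xml-declaration="yes"
              use-character-maps="html-entities"/>
</xsl:stylesheet>
```

Keeping the small maps separate lets other output definitions reuse just the mappings they need.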

Using character maps may be expensive at run-time. Saxon currently makes no special attempts to optimize their use: if character maps are used, then every character that is output will be looked up in a hash table to see if there is a replacement string.