The xsl:character-map declaration defines a named character map for use during serialization. The name attribute gives the name of the character map, which can be referenced from the use-character-maps attribute of xsl:output. The xsl:character-map element contains a set of xsl:output-character elements, each of which defines the output representation of a given Unicode character. The character is specified using the character attribute, and the string that is to replace this character on serialization is specified using the string attribute. Both attributes are mandatory.
The replacement string is output as is, even if it contains special (markup) characters. So, for example, you can define <xsl:output-character character="&#xa0;" string="&amp;nbsp;"/> to ensure that NBSP characters are output using the entity reference &nbsp;.
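As a minimal sketch, assuming an XSLT 2.0 processor such as Saxon, the following stylesheet declares a character map and activates it from xsl:output. The map name nbsp-map and the template body are hypothetical. Note that the string attribute has to escape its own ampersand, because the stylesheet is itself an XML document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Activate the character map for the principal result document.
       The map name "nbsp-map" is a hypothetical example. -->
  <xsl:output method="xml" omit-xml-declaration="yes"
              use-character-maps="nbsp-map"/>

  <!-- Replace every U+00A0 (no-break space) with the text "&nbsp;".
       The ampersand is written as &amp; because this is an XML attribute. -->
  <xsl:character-map name="nbsp-map">
    <xsl:output-character character="&#xa0;" string="&amp;nbsp;"/>
  </xsl:character-map>

  <!-- Illustrative template: the result contains two NBSP characters -->
  <xsl:template match="/">
    <p>Total:&#xa0;42&#xa0;EUR</p>
  </xsl:template>
</xsl:stylesheet>
```

Serialized with this stylesheet, the p element should come out as <p>Total:&nbsp;42&nbsp;EUR</p>. Anything that later parses this as XML needs a declaration of the nbsp entity, for example from an XHTML DTD.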
Character maps allow you to produce output that is not well-formed XML, and they thus provide a replacement facility for disable-output-escaping. A useful technique is to use characters in the Unicode private use area (U+E000 to U+F8FF) as characters which, if present in the result tree, will be mapped to special strings on output. For example, if you want to generate a proprietary XML-like format that uses tags such as <!IF>, <!THEN>, and <!ELSE>, then you could represent these tags by the three characters U+E000, U+E001, and U+E002 and map each character to its tag (you could in turn define the characters as entities so they can be written symbolically in your stylesheet or source document).
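The private-use technique might look like this in practice. This is a sketch under assumptions: the entity names if, then and else, the map name pseudo-tags and the template content are hypothetical; only the choice of U+E000 to U+E002 comes from the text.

```xml
<!DOCTYPE xsl:stylesheet [
  <!-- Symbolic names for three private-use characters (hypothetical names) -->
  <!ENTITY if   "&#xE000;">
  <!ENTITY then "&#xE001;">
  <!ENTITY else "&#xE002;">
]>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"
              use-character-maps="pseudo-tags"/>

  <!-- Each private-use character is written out as a raw, unescaped tag.
       "<" and ">" are escaped here only because this is an attribute value. -->
  <xsl:character-map name="pseudo-tags">
    <xsl:output-character character="&if;"   string="&lt;!IF&gt;"/>
    <xsl:output-character character="&then;" string="&lt;!THEN&gt;"/>
    <xsl:output-character character="&else;" string="&lt;!ELSE&gt;"/>
  </xsl:character-map>

  <!-- The entities expand to single characters in the result tree -->
  <xsl:template match="/">
    <rule>&if;debug&then;verbose&else;quiet</rule>
  </xsl:template>
</xsl:stylesheet>
```

In the result tree the rule element holds ordinary text containing three private-use characters; only the serializer turns them into <!IF>, <!THEN> and <!ELSE>, giving output that is deliberately not well-formed XML.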
Character maps are preferred to disable-output-escaping because they do not rely on an intimate interface between the transformation engine and the serializer, and they do not distort the data model. The special characters can happily be stored in a DOM, passed across the SAX interface, or manipulated in any other way, before finally being rendered by the serializer.
Character maps may be assembled from other character maps using the use-character-maps attribute of xsl:character-map. This contains a space-separated list of the names of other character maps that are to be included in this character map.
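A sketch of composition follows; the map names entities, pseudo-tags and combined are hypothetical. The combined map pulls in both smaller maps and adds one mapping of its own, so xsl:output only needs to name combined. Listing "entities pseudo-tags" directly in the use-character-maps attribute of xsl:output would also work; a composite map is mainly a convenience when the same set is reused.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- A small map of HTML-style entity references (hypothetical name) -->
  <xsl:character-map name="entities">
    <xsl:output-character character="&#xa0;" string="&amp;nbsp;"/>
    <xsl:output-character character="&#xa9;" string="&amp;copy;"/>
  </xsl:character-map>

  <!-- A map for a private-use pseudo-tag (hypothetical name) -->
  <xsl:character-map name="pseudo-tags">
    <xsl:output-character character="&#xE000;" string="&lt;!IF&gt;"/>
  </xsl:character-map>

  <!-- "combined" includes both maps above and adds one mapping of its own -->
  <xsl:character-map name="combined"
                     use-character-maps="entities pseudo-tags">
    <xsl:output-character character="&#xae;" string="&amp;reg;"/>
  </xsl:character-map>

  <!-- Only the composite map needs to be referenced at output time -->
  <xsl:output method="xml" use-character-maps="combined"/>
</xsl:stylesheet>
```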
Using character maps may be expensive at run-time. I have not measured the effect. Saxon currently makes no special attempts to optimize their use: if character maps are used, then every character that is output will be looked up in a hash table to see if there is a replacement string.