net.sf.saxon.serialize.charcode

Interface Summary
Interface	Description
CharacterSet	This interface defines properties of a character set, built in to the Saxon product.

Class Summary
Class	Description
ASCIICharacterSet	This class defines properties of the US-ASCII character set
CharacterSetFactory	This class delivers a CharacterSet object for a given named encoding.
ISO88591CharacterSet	This class defines properties of the ISO-8859-1 character set
JavaCharacterSet	This class establishes properties of a character set that is known to the Java VM but not specifically known to Saxon.
UTF16CharacterSet	A class to hold some static constants and methods associated with processing UTF-16 and surrogate pairs
UTF8CharacterSet	This class defines properties of the UTF-8 character set
XMLCharacterData	This module contains data regarding the classification of characters in XML 1.0 and XML 1.1, and a number of interrogative methods to support queries on this data.

Package net.sf.saxon.serialize.charcode Description

This package provides classes for handling different character sets, especially when serializing the output of a query or transformation.

Most of the classes in this package are implementations of the interface CharacterSet. The sole function of these classes is to determine whether a particular character is present in the character set or not: if not, Saxon has to replace it with a character reference.
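The membership test described above can be illustrated with a minimal, self-contained sketch. It does not use Saxon's classes: the class name CharRefSketch, the method escapeUnencodable, and the use of java.util.function.IntPredicate as a stand-in for a CharacterSet are all assumptions made for illustration. The sketch copies characters that the set contains and writes every other character as a hexadecimal character reference.

```java
import java.util.function.IntPredicate;

public class CharRefSketch {

    // Stand-in for a character-set membership test (hypothetical, not Saxon's API):
    // US-ASCII contains exactly the code points 0x00 to 0x7F.
    public static final IntPredicate ASCII = cp -> cp <= 0x7F;

    // Copies code points that are in the set; replaces the rest with &#x...; references.
    public static String escapeUnencodable(String text, IntPredicate inCharset) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (inCharset.test(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) and U+20AC (euro sign) are outside US-ASCII.
        System.out.println(escapeUnencodable("caf\u00E9 \u20AC", ASCII));
        // prints: caf&#xE9; &#x20AC;
    }
}
```

Iterating over code points rather than char values keeps surrogate pairs together, so a supplementary character produces one reference rather than two.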

The actual translation of Unicode characters to characters in the selected encoding is left to the Java run-time library. (Note that different versions of Java support different sets of encodings, and there is no easy way to find out which encodings are supported in a given installation.)

It is possible to configure Saxon to support additional character sets by writing an implementation of the CharacterSet interface and registering it with the Configuration using the call getCharacterSetFactory().setCharacterSetImplementation().
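As a rough illustration of what such an implementation might look like, the sketch below defines a character set consisting of US-ASCII plus the euro sign. To stay runnable without Saxon on the classpath it uses a local stand-in interface, CharacterSetLike; its two methods, the class AsciiPlusEuro, and the encoding name X-ASCII-EURO are assumptions for illustration and should be checked against the real CharacterSet interface before use.

```java
public class CustomCharacterSetSketch {

    // Local stand-in for the CharacterSet interface (assumed shape, not Saxon's class).
    public interface CharacterSetLike {
        boolean inCharset(int codePoint);   // is this code point representable?
        String getCanonicalName();          // name of the encoding
    }

    // Hypothetical encoding: US-ASCII plus the euro sign U+20AC.
    public static final class AsciiPlusEuro implements CharacterSetLike {
        @Override
        public boolean inCharset(int codePoint) {
            return codePoint <= 0x7F || codePoint == 0x20AC;
        }

        @Override
        public String getCanonicalName() {
            return "X-ASCII-EURO";
        }
    }

    public static void main(String[] args) {
        CharacterSetLike cs = new AsciiPlusEuro();
        System.out.println(cs.getCanonicalName()
                + " contains U+20AC: " + cs.inCharset(0x20AC));
        System.out.println(cs.getCanonicalName()
                + " contains U+00E9: " + cs.inCharset(0x00E9));
        // With Saxon available, an object implementing the real CharacterSet
        // interface would be registered through the call named in the text:
        // getCharacterSetFactory().setCharacterSetImplementation(...).
        // The argument list is not given here and is left unspecified.
    }
}
```

Because the actual byte-level encoding is still performed by the Java run-time library, a character set registered this way is only useful if Java also has an encoder for the same encoding name.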

If an output encoding is requested that Saxon does not recognize, but which the Java platform does recognize, then Saxon attempts to determine which characters the encoding can represent, so that unsupported characters can be written as numeric character references. Saxon wraps the Java Charset object in a JavaCharacterSet object, and tests whether a character is encodable by calling the Java method CharsetEncoder.canEncode(), caching the result locally. Since this mechanism appears to have become reliable in JDK 1.5, it is now used much more widely than before, and most character sets are now supported in Saxon by relying on this mechanism.
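The idea of wrapping a Java Charset and caching the answers from canEncode() can be sketched as follows. This is not the JavaCharacterSet source: the class name CachedEncodableCheck and the HashMap-based cache are assumptions chosen for brevity, and the real class may cache in a different way.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.HashMap;
import java.util.Map;

public class CachedEncodableCheck {

    private final CharsetEncoder encoder;
    // Remembers the answer for each code point already tested (illustrative cache).
    private final Map<Integer, Boolean> cache = new HashMap<>();

    public CachedEncodableCheck(String encodingName) {
        // Charset.forName throws if the Java platform does not know the encoding.
        this.encoder = Charset.forName(encodingName).newEncoder();
    }

    // True if the code point can be represented in the wrapped encoding.
    public boolean inCharset(int codePoint) {
        return cache.computeIfAbsent(codePoint,
                cp -> encoder.canEncode(new String(Character.toChars(cp))));
    }

    public static void main(String[] args) {
        CachedEncodableCheck latin1 = new CachedEncodableCheck("ISO-8859-1");
        System.out.println("U+00E9 encodable: " + latin1.inCharset(0x00E9)); // true
        System.out.println("U+20AC encodable: " + latin1.inCharset(0x20AC)); // false
    }
}
```

A CharsetEncoder is not safe for concurrent use, so a sketch like this would need one instance per thread or external synchronization.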