Interface Summary | |
---|---|
CharacterSet | This interface defines properties of a character set, built in to the Saxon product. |
PluggableCharacterSet | This interface defines properties of a pluggable character set, that is, a user-supplied character set. |
Class Summary | |
---|---|
ASCIICharacterSet | This class defines properties of the US-ASCII character set |
Big5CharacterSet | |
BuggyCharacterSet | This class establishes properties of a character set that is known to the Java VM but not specifically known to Saxon, and whose canEncode() method is not implemented reliably. |
CharacterSetFactory | This class creates a CharacterSet object for a given named encoding. |
CP1250CharacterSet | This class defines properties of the CP1250 Central Europe character set, as defined at http://www.microsoft.com/globaldev/reference/sbcs/1250.htm. |
CP1251CharacterSet | This class defines properties of the CP1251 Cyrillic character set, as defined at http://www.microsoft.com/globaldev/reference/sbcs/1251.htm. |
CP1252CharacterSet | This class defines properties of the CP1252 (Latin 1) character set, as defined at http://www.microsoft.com/globaldev/reference/sbcs/1252.htm. |
CP852CharacterSet | This class defines properties of the CP852 character set. |
EucJPCharacterSet | |
EucKRCharacterSet | |
GB2312CharacterSet | |
ISO88591CharacterSet | This class defines properties of the ISO-8859-1 character set |
ISO88592CharacterSet | This class defines properties of the ISO-8859-2 character set |
ISO88595CharacterSet | This class implements the CharacterSet interface to support the ISO-8859-5 (Latin/Cyrillic) encoding. |
ISO88597CharacterSet | |
ISO88598CharacterSet | |
ISO88599CharacterSet | |
KOI8RCharacterSet | This class defines properties of the KOI8-R Cyrillic character set. |
ShiftJISCharacterSet | |
UnicodeCharacterSet | This class defines properties of the Unicode character set |
UnknownCharacterSet | This class establishes properties of a character set that is known to the Java VM but not specifically known to Saxon |
UTF16 | A class to hold some static constants and methods associated with processing UTF-16 and surrogate pairs. |
XMLCharacterData | This module contains data regarding the classification of characters in XML 1.0 and XML 1.1, and a number of interrogative methods to support queries on this data. |
This package provides classes for handling different output character sets.
The sole function of these classes is to determine whether a particular character is present in the character set or not: if not, Saxon has to replace it with a character reference.
The actual translation of Unicode characters to characters in the selected encoding is left to the Java run-time library. (Note that different versions of Java support different sets of encodings, and there is no easy way to find out which encodings are supported in a given installation).
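As a rough illustration of how such a membership test can drive serialization, the following sketch replaces every code point that the target character set cannot represent with a decimal numeric character reference. It is not Saxon's implementation: the class name CharRefSketch, the method escapeUnrepresentable, and the use of an IntPredicate as a stand-in for a character set are all assumptions made for this example, and ordinary XML markup escaping is deliberately left out.

```java
import java.util.function.IntPredicate;

class CharRefSketch {
    // Hypothetical stand-in for a character set: true if the code point is representable.
    static final IntPredicate ASCII = cp -> cp <= 0x7F;

    // Copies text, writing any code point outside the character set as &#N;
    static String escapeUnrepresentable(String text, IntPredicate inCharset) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // handles surrogate pairs
            if (inCharset.test(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 is outside US-ASCII, so it is written as a character reference.
        System.out.println(escapeUnrepresentable("caf\u00e9", ASCII));
    }
}
```

Iterating by code point rather than by char keeps a surrogate pair together, so a supplementary character yields one reference rather than two.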
It is possible to configure Saxon to support additional character sets by writing an implementation of the PluggableCharacterSet interface, and registering this class as the value of the system property whose name is given by the expression:
OutputKeys.ENCODING + "." + encoding
where "encoding" is the name of the encoding as used in <xsl:output> - for example, iso-8859-10.
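A minimal sketch of the registration step, assuming a user-written implementation of PluggableCharacterSet already exists on the classpath. The class name com.example.Iso885910CharacterSet and the helper propertyNameFor are hypothetical and appear here only for illustration; OutputKeys is the standard javax.xml.transform.OutputKeys class.

```java
import javax.xml.transform.OutputKeys;

class RegisterCharacterSet {
    // Builds the system property name: OutputKeys.ENCODING + "." + encoding
    static String propertyNameFor(String encoding) {
        return OutputKeys.ENCODING + "." + encoding;
    }

    public static void main(String[] args) {
        // Hypothetical implementation class; substitute your own PluggableCharacterSet.
        String implementation = "com.example.Iso885910CharacterSet";

        // Register the class as the value of the system property for this encoding.
        System.setProperty(propertyNameFor("iso-8859-10"), implementation);

        System.out.println(propertyNameFor("iso-8859-10") + " = "
                + System.getProperty(propertyNameFor("iso-8859-10")));
    }
}
```

The same property could presumably be supplied on the command line with a -D option instead of being set in code.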
If an output encoding is requested that Saxon does not recognize, but which the Java platform does recognize, then Saxon attempts to determine which characters the encoding can represent, so that unsupported characters can be written as numeric character references. Saxon uses two approaches to doing this. (The logic for this is in the CharacterSetFactory class.) Where possible, it uses the UnknownCharacterSet class, which tests the availability of individual characters using the Java interrogative encoding.canEncode(). However, some encodings do not implement this method reliably; Saxon attempts to detect this, and represents such encodings instead using the BuggyCharacterSet class. This class attempts to encode each character, and relies on catching an exception when it fails: expensive, but it only happens once for any given character.
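The two approaches can be sketched with the standard java.nio.charset API. This is an illustration under stated assumptions, not the code of UnknownCharacterSet or BuggyCharacterSet: the class EncodabilityProbe and its method names are invented for this example, and the cache is one possible way to ensure the exception cost is paid only once per character.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.HashMap;
import java.util.Map;

class EncodabilityProbe {
    private final Charset charset;
    // Remembers earlier answers so each character is test-encoded at most once.
    private final Map<Integer, Boolean> cache = new HashMap<>();

    EncodabilityProbe(Charset charset) {
        this.charset = charset;
    }

    // First approach: ask the encoder directly.
    boolean askEncoder(int codePoint) {
        return charset.newEncoder().canEncode(new String(Character.toChars(codePoint)));
    }

    // Second approach: attempt the encoding and treat an exception as "not representable".
    boolean tryEncode(int codePoint) {
        return cache.computeIfAbsent(codePoint, cp -> {
            CharsetEncoder encoder = charset.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                encoder.encode(CharBuffer.wrap(Character.toChars(cp)));
                return true;
            } catch (CharacterCodingException e) {
                return false;
            }
        });
    }

    public static void main(String[] args) {
        EncodabilityProbe probe = new EncodabilityProbe(Charset.forName("ISO-8859-1"));
        System.out.println(probe.askEncoder(0x00E9)); // e-acute
        System.out.println(probe.tryEncode(0x20AC));  // euro sign
    }
}
```

A fresh encoder is created for each probe so that a failed encoding attempt cannot leave shared encoder state behind for a later canEncode() call.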
Michael H. Kay
Saxonica Limited
9 February 2005