The encodings supported on input depend entirely on your choice of XML parser.
On output, any encoding supported by the Java VM or the .NET platform (as appropriate) may be used.
The encodings iso-646 and iso646 (in any mixture of upper and lower case) are recognized as synonyms of US-ASCII.
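As an illustration of requesting an output encoding, the following minimal sketch uses the standard JAXP API, which Saxon implements. It runs an identity transformation and sets the encoding through OutputKeys.ENCODING. The class name OutputEncodingDemo and the method serialize are hypothetical names chosen for this example, and TransformerFactory.newInstance() returns whichever JAXP processor is configured on your system, which is not necessarily Saxon.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: serializes an XML string using a requested output encoding.
final class OutputEncodingDemo {

    static byte[] serialize(String xml, String encoding) {
        try {
            // An identity transformer copies the input to the output unchanged,
            // so only the serialization settings affect the result.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // The input contains one non-ASCII character (e with acute accent).
        byte[] bytes = serialize("<p>caf\u00e9</p>", "US-ASCII");
        System.out.println(new String(bytes, StandardCharsets.US_ASCII));
    }
}
```

Because the requested encoding is US-ASCII, the serializer cannot write the accented character directly and has to represent it some other way, typically as a numeric character reference.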
On the Java platform, there are some differences between the character encodings supported by the old java.io package and the new java.nio package. If the requested encoding is not supported by the java.nio package, then all non-ASCII characters will be represented using numeric character references. If the encoding is not supported by the java.io package, then Saxon will revert to using UTF-8 as the actual output encoding.
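The fallback rules above can be sketched as follows. This is not Saxon's code: the class EncodingSupportProbe, the enum Plan, and the method names are hypothetical, and the java.io check is an assumption that simply tries to construct an OutputStreamWriter for the encoding. On recent Java versions the two packages generally share the same set of charsets, so the middle case may never arise in practice.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical sketch of the fallback rules described in the text.
final class EncodingSupportProbe {

    enum Plan { ENCODE_DIRECTLY, ESCAPE_NON_ASCII, FALL_BACK_TO_UTF8 }

    // java.nio offers a direct query for charset support.
    static boolean supportedByNio(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // syntactically invalid names are treated as unsupported
        }
    }

    // java.io has no query method, so probe by constructing a writer.
    // UnsupportedEncodingException is a subclass of IOException.
    static boolean supportedByIo(String name) {
        try (OutputStreamWriter probe =
                 new OutputStreamWriter(new ByteArrayOutputStream(), name)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    static Plan plan(String requested) {
        if (!supportedByIo(requested)) {
            return Plan.FALL_BACK_TO_UTF8;   // bytes cannot be produced at all
        }
        if (!supportedByNio(requested)) {
            return Plan.ESCAPE_NON_ASCII;    // encodability of characters is unknown
        }
        return Plan.ENCODE_DIRECTLY;
    }

    public static void main(String[] args) {
        System.out.println(plan("ISO-8859-1"));
        System.out.println(plan("x-no-such-encoding"));
    }
}
```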
A list of the character encodings supported in the java.nio package can be obtained by using the command java net.sf.saxon.charcode.CharacterSetFactory, with no parameters. Java does not provide any means of determining the list of encodings supported by the java.io package.
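The java.nio list can also be obtained directly from the JDK, without going through Saxon. The sketch below uses the standard method Charset.availableCharsets(); the class name ListNioCharsets and the method available are hypothetical, and the output format is not claimed to match that of the Saxon command.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical utility: prints every charset known to java.nio with its aliases.
final class ListNioCharsets {

    // Returns a map from canonical charset name to Charset object.
    static SortedMap<String, Charset> available() {
        return Charset.availableCharsets();
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Charset> entry : available().entrySet()) {
            // aliases() gives the alternative names accepted for the same charset
            System.out.println(entry.getKey() + " " + entry.getValue().aliases());
        }
    }
}
```

The exact list depends on the Java VM and on which optional charset providers are installed.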
On output, character encoding is a two-stage process. First, Saxon itself has to decide whether a particular character is supported by the chosen encoding. If not, it converts the character to a numeric character reference if it appears in a context where this would be valid; otherwise (for example, if it appears in an element name) it reports an error. Second, the character has to be converted to the appropriate sequence of bytes: this stage is delegated to the Java VM.
For the first stage, Saxon handles certain encodings itself, because this is more efficient and more reliable. If an encoding is used that is known to Java but not known to Saxon, Saxon attempts to discover from the Java VM whether particular characters are encodable or not. The encodings that Saxon recognizes directly (including synonyms) are ASCII, US-ASCII, iso-646, iso646, iso-8859-1, ISO8859_1, iso-8859-2, ISO8859_2, iso-8859-5, ISO8859_5, iso-8859-7, ISO8859_7, iso-8859-8, ISO8859_8, iso-8859-9, ISO8859_9, UTF-8, UTF8, UTF-16, UTF16, KOI8-R, Big5, SJIS, Shift_JIS, EUC_CN, GB2312, EUC-JP, EUC-KR, cp1250, windows-1250, cp1251, windows-1251, cp1252, windows-1252, cp852, windows-852.
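The first stage can be illustrated with the JDK's own facilities for asking whether a character is encodable. The sketch below is an assumption about how such a check might look, not Saxon's implementation: the class CharRefEscaper and the method escapeUnencodable are hypothetical names, and the code covers only text content, where a numeric character reference is always valid. It relies on the standard method CharsetEncoder.canEncode.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch: replaces characters the target encoding cannot represent
// with decimal numeric character references.
final class CharRefEscaper {

    static String escapeUnencodable(String text, String encodingName) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder result = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Work in code points so that surrogate pairs are tested as a unit.
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            String unit = text.substring(i, i + length);
            if (encoder.canEncode(unit)) {
                result.append(unit);
            } else {
                result.append("&#").append(codePoint).append(';');
            }
            i += length;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // e with acute accent is outside US-ASCII; the euro sign is outside ISO-8859-1.
        System.out.println(escapeUnencodable("caf\u00e9", "US-ASCII"));
        System.out.println(escapeUnencodable("\u20ac5", "ISO-8859-1"));
    }
}
```

The second stage, converting the resulting characters to bytes, would then be left to the Java VM, for example through an OutputStreamWriter created for the same encoding.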