Package net.sf.saxon.serialize.charcode
Class UTF8CharacterSet
java.lang.Object
    net.sf.saxon.serialize.charcode.UTF8CharacterSet
All Implemented Interfaces:
    CharacterSet
public final class UTF8CharacterSet extends java.lang.Object implements CharacterSet
This class defines properties of the UTF-8 character set.
-
Method Summary
static int decodeUTF8(byte[] in, int used)
    Decode a UTF-8 character.
static byte[] encode(IntIterator codePoints)
    Static method to generate the UTF-8 representation of a sequence of Unicode codepoints.
java.lang.String getCanonicalName()
    Get the preferred Java name of the character set.
static UTF8CharacterSet getInstance()
    Get the singular instance of this class.
static int getUTF8Encoding(char in, char in2, byte[] out)
    Static method to generate the UTF-8 representation of a Unicode character.
boolean inCharset(int c)
    Determine if a character is present in the character set.
-
Method Detail
-
getInstance
public static UTF8CharacterSet getInstance()
Get the singular instance of this class.
Returns:
the singular instance of this class
-
inCharset
public boolean inCharset(int c)
Description copied from interface: CharacterSet
Determine if a character is present in the character set.
Specified by:
inCharset in interface CharacterSet
Parameters:
c - the codepoint being tested
Returns:
true if the codepoint is supported
-
getCanonicalName
public java.lang.String getCanonicalName()
Description copied from interface: CharacterSet
Get the preferred Java name of the character set. Note that Java in many cases also supports a "historic name".
Specified by:
getCanonicalName in interface CharacterSet
Returns:
the preferred Java name
-
getUTF8Encoding
public static int getUTF8Encoding(char in, char in2, byte[] out)
Static method to generate the UTF-8 representation of a Unicode character.
Parameters:
in - the Unicode character, or the high half of a surrogate pair
in2 - the low half of a surrogate pair (ignored unless the first argument is in the range for a surrogate pair)
out - an array of at least 4 bytes to hold the UTF-8 representation
Returns:
the number of bytes in the UTF-8 representation
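The contract above (one or two UTF-16 code units in, up to four bytes out, byte count returned) can be illustrated with a standalone sketch that uses only the JDK. The class `Utf8EncodeSketch` and the method `utf8Encoding` are hypothetical names chosen to mirror the documented signature; this is not Saxon's implementation, and it does not validate unpaired surrogates.

```java
class Utf8EncodeSketch {

    // Hypothetical stand-in mirroring the documented signature; not Saxon's code.
    static int utf8Encoding(char in, char in2, byte[] out) {
        int cp = in;
        // Combine a surrogate pair into one code point; in2 is ignored otherwise.
        if (Character.isHighSurrogate(in) && Character.isLowSurrogate(in2)) {
            cp = Character.toCodePoint(in, in2);
        }
        if (cp < 0x80) {                        // 1 byte: 0xxxxxxx
            out[0] = (byte) cp;
            return 1;
        } else if (cp < 0x800) {                // 2 bytes: 110xxxxx 10xxxxxx
            out[0] = (byte) (0xC0 | (cp >> 6));
            out[1] = (byte) (0x80 | (cp & 0x3F));
            return 2;
        } else if (cp < 0x10000) {              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out[0] = (byte) (0xE0 | (cp >> 12));
            out[1] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[2] = (byte) (0x80 | (cp & 0x3F));
            return 3;
        } else {                                // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out[0] = (byte) (0xF0 | (cp >> 18));
            out[1] = (byte) (0x80 | ((cp >> 12) & 0x3F));
            out[2] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[3] = (byte) (0x80 | (cp & 0x3F));
            return 4;
        }
    }

    public static void main(String[] args) {
        byte[] out = new byte[4];
        int n = utf8Encoding('\u20AC', '\0', out); // U+20AC EURO SIGN
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < n; i++) {
            hex.append(String.format(" %02X", out[i] & 0xFF));
        }
        System.out.println(n + " bytes:" + hex); // prints: 3 bytes: E2 82 AC
    }
}
```

The four-byte output buffer matches the longest possible UTF-8 sequence, which is why the documented method asks for an array of at least 4 bytes.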
-
encode
public static byte[] encode(IntIterator codePoints)
Static method to generate the UTF-8 representation of a sequence of Unicode codepoints.
Parameters:
codePoints - the sequence of Unicode codepoints: must not include surrogates
Returns:
the UTF-8 encoding of the characters
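As a rough illustration of encoding a whole sequence of codepoints, the sketch below uses a plain `int[]` in place of Saxon's `IntIterator` and relies on the JDK's own UTF-8 encoder. The class `Utf8SequenceSketch` and the method `encodeCodePoints` are hypothetical, and rejecting surrogates with an `IllegalArgumentException` is an assumption made for this example; the documentation above only states that surrogates must not be supplied.

```java
import java.nio.charset.StandardCharsets;

class Utf8SequenceSketch {

    // Hypothetical helper; takes an int[] rather than Saxon's IntIterator.
    static byte[] encodeCodePoints(int[] codePoints) {
        for (int cp : codePoints) {
            // Assumption: reject surrogates and out-of-range values up front.
            boolean surrogate = cp >= 0xD800 && cp <= 0xDFFF;
            if (!Character.isValidCodePoint(cp) || surrogate) {
                throw new IllegalArgumentException(
                        "Not a Unicode scalar value: " + Integer.toHexString(cp));
            }
        }
        // String(int[], int, int) builds a string from code points;
        // getBytes then produces the UTF-8 octets.
        String s = new String(codePoints, 0, codePoints.length);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'H' (1 byte), U+00E9 (2 bytes), U+1F600 (4 bytes)
        byte[] bytes = encodeCodePoints(new int[] {0x48, 0xE9, 0x1F600});
        System.out.println(bytes.length); // prints: 7
    }
}
```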
-
decodeUTF8
public static int decodeUTF8(byte[] in, int used) throws java.lang.IllegalArgumentException
Decode a UTF-8 character.
Parameters:
in - array of bytes representing a single UTF-8 encoded character
used - number of bytes in the array that are actually used
Returns:
the Unicode codepoint of this character
Throws:
java.lang.IllegalArgumentException - if the byte sequence is not a valid UTF-8 representation
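The decoding direction can be sketched in the same way. The class `Utf8DecodeSketch` and the method `decode` below are hypothetical and only mirror the documented parameters and exception type; which inputs Saxon itself rejects is not specified above. This simplified version checks the lead byte, the sequence length, and the continuation bytes, but it does not reject overlong encodings or encoded surrogates.

```java
class Utf8DecodeSketch {

    // Hypothetical stand-in mirroring the documented signature; not Saxon's code.
    static int decode(byte[] in, int used) {
        if (used < 1 || used > 4 || used > in.length) {
            throw new IllegalArgumentException("Invalid length: " + used);
        }
        int b0 = in[0] & 0xFF;
        int cp;       // payload bits accumulated so far
        int expected; // sequence length implied by the lead byte
        if (b0 < 0x80) {
            cp = b0;
            expected = 1;
        } else if ((b0 & 0xE0) == 0xC0) {
            cp = b0 & 0x1F;
            expected = 2;
        } else if ((b0 & 0xF0) == 0xE0) {
            cp = b0 & 0x0F;
            expected = 3;
        } else if ((b0 & 0xF8) == 0xF0) {
            cp = b0 & 0x07;
            expected = 4;
        } else {
            throw new IllegalArgumentException("Invalid UTF-8 lead byte");
        }
        if (expected != used) {
            throw new IllegalArgumentException("Lead byte does not match length");
        }
        for (int i = 1; i < used; i++) {
            int b = in[i] & 0xFF;
            if ((b & 0xC0) != 0x80) { // continuation bytes look like 10xxxxxx
                throw new IllegalArgumentException("Invalid continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        return cp;
    }

    public static void main(String[] args) {
        byte[] euro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC, 0};
        System.out.println(String.format("U+%04X", decode(euro, 3))); // prints: U+20AC
    }
}
```

Passing `used` separately from the array length lets a caller reuse a fixed four-byte buffer for characters of any encoded length.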
-
-