Class UTF16CharacterSet

  • All Implemented Interfaces:
    CharacterSet

    public class UTF16CharacterSet
    extends java.lang.Object
    implements CharacterSet
    A class holding static constants and methods for processing UTF-16 text and surrogate pairs
    • Method Detail

      • getInstance

        public static UTF16CharacterSet getInstance()
        Get the single (singleton) instance of this class
        Returns:
        the singleton instance of this class
      • inCharset

        public boolean inCharset(int c)
        Description copied from interface: CharacterSet
        Determine if a character is present in the character set
        Specified by:
        inCharset in interface CharacterSet
        Parameters:
        c - the codepoint being tested
        Returns:
        true if the codepoint is supported
      • getCanonicalName

        public java.lang.String getCanonicalName()
        Description copied from interface: CharacterSet
        Get the preferred Java name of the character set. Note that Java in many cases also supports a "historic name".
        Specified by:
        getCanonicalName in interface CharacterSet
        Returns:
        the preferred Java name
      • combinePair

        public static int combinePair(char high,
                                      char low)
        Return the non-BMP character (as a Unicode codepoint) represented by a given surrogate pair.
        Parameters:
        high - The high surrogate.
        low - The low surrogate.
        Returns:
        the Unicode codepoint represented by the surrogate pair
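The combination step follows the standard UTF-16 decoding rule: each surrogate carries 10 bits of payload, and together they encode the codepoint minus 0x10000. The sketch below is a minimal illustration of that rule, not Saxon's source; the class name SurrogateDecode is hypothetical, and the method assumes its arguments really are a high and a low surrogate (no validation is done). It cross-checks the result against the JDK's Character.toCodePoint.

```java
// Hypothetical helper class (not part of Saxon): the standard UTF-16 decoding formula.
final class SurrogateDecode {
    private SurrogateDecode() {}

    /** Combine a high surrogate (U+D800..U+DBFF) and a low surrogate (U+DC00..U+DFFF). */
    static int combinePair(char high, char low) {
        // High surrogate supplies the top 10 bits, low surrogate the bottom 10 bits,
        // of (codepoint - 0x10000). Inputs are assumed valid; nothing is checked here.
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        // U+1F600 is encoded in UTF-16 as the pair D83D DE00.
        int cp = combinePair('\uD83D', '\uDE00');
        System.out.printf("U+%X%n", cp); // U+1F600
        // Cross-check against the JDK's own decoder.
        System.out.println(cp == Character.toCodePoint('\uD83D', '\uDE00')); // true
    }
}
```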
      • highSurrogate

        public static char highSurrogate(int ch)
        Return the high surrogate of a non-BMP character
        Parameters:
        ch - The Unicode codepoint of the non-BMP character to be divided.
        Returns:
        the first character in the surrogate pair
      • lowSurrogate

        public static char lowSurrogate(int ch)
        Return the low surrogate of a non-BMP character
        Parameters:
        ch - The Unicode codepoint of the non-BMP character to be divided.
        Returns:
        the second character in the surrogate pair
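highSurrogate and lowSurrogate are the inverse of combinePair. A minimal sketch of the standard UTF-16 encoding rule is shown below; it is an illustration rather than Saxon's implementation, the class name SurrogateEncode is hypothetical, and the input is assumed to lie in U+10000..U+10FFFF (a BMP codepoint would give a meaningless result). The pair is compared with the JDK's Character.toChars.

```java
// Hypothetical helper class (not part of Saxon): the standard UTF-16 encoding formula.
final class SurrogateEncode {
    private SurrogateEncode() {}

    /** High (first) surrogate for a codepoint assumed to be in U+10000..U+10FFFF. */
    static char highSurrogate(int ch) {
        // Top 10 bits of (ch - 0x10000), offset into the high-surrogate block.
        return (char) (((ch - 0x10000) >> 10) + 0xD800);
    }

    /** Low (second) surrogate for a codepoint assumed to be in U+10000..U+10FFFF. */
    static char lowSurrogate(int ch) {
        // Bottom 10 bits of (ch - 0x10000), offset into the low-surrogate block.
        return (char) (((ch - 0x10000) & 0x3FF) + 0xDC00);
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char hi = highSurrogate(cp);
        char lo = lowSurrogate(cp);
        System.out.printf("%X %X%n", (int) hi, (int) lo); // D83D DE00
        // The JDK produces the same pair.
        char[] jdk = Character.toChars(cp);
        System.out.println(jdk[0] == hi && jdk[1] == lo); // true
    }
}
```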
      • isSurrogate

        public static boolean isSurrogate(int c)
        Test whether a given character is a surrogate (high or low)
        Parameters:
        c - the character to test
        Returns:
        true if the character is the high or low half of a surrogate pair
      • isHighSurrogate

        public static boolean isHighSurrogate(int ch)
        Test whether the given character is a high surrogate
        Parameters:
        ch - The character to test.
        Returns:
        true if the character is the first character in a surrogate pair
      • isLowSurrogate

        public static boolean isLowSurrogate(int ch)
        Test whether the given character is a low surrogate
        Parameters:
        ch - The character to test.
        Returns:
        true if the character is the second character in a surrogate pair
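The three surrogate tests reduce to range checks over the surrogate block defined by Unicode: high surrogates occupy U+D800..U+DBFF and low surrogates U+DC00..U+DFFF. The sketch below assumes simple range comparisons; it is not taken from Saxon, and the class name SurrogateTests is hypothetical. The demo walks the UTF-16 code units of a string containing one supplementary character.

```java
// Hypothetical helper class (not part of Saxon): range checks for UTF-16 surrogates.
final class SurrogateTests {
    private SurrogateTests() {}

    /** True for any code unit in U+D800..U+DFFF (either half of a pair). */
    static boolean isSurrogate(int c) {
        return c >= 0xD800 && c <= 0xDFFF;
    }

    /** True for U+D800..U+DBFF, the first unit of a pair. */
    static boolean isHighSurrogate(int ch) {
        return ch >= 0xD800 && ch <= 0xDBFF;
    }

    /** True for U+DC00..U+DFFF, the second unit of a pair. */
    static boolean isLowSurrogate(int ch) {
        return ch >= 0xDC00 && ch <= 0xDFFF;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // "a" followed by U+1F600 stored as a surrogate pair
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            System.out.printf("%04X surrogate=%b high=%b low=%b%n",
                    (int) c, isSurrogate(c), isHighSurrogate(c), isLowSurrogate(c));
        }
    }
}
```

Because every surrogate is either high or low, isSurrogate(c) in this sketch is equivalent to isHighSurrogate(c) || isLowSurrogate(c).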
      • firstInvalidChar

        public static int firstInvalidChar(IntIterator iter,
                                           IntPredicateProxy predicate)
        Test whether all the characters in a sequence satisfy a given predicate (typically, that they are valid XML characters)
        Parameters:
        iter - iterator over the character sequence to be tested
        predicate - the predicate that all characters must satisfy
        Returns:
        the codepoint of the first invalid character in the character sequence (according to the supplied predicate); or -1 if all characters in the character sequence are valid
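The behaviour of firstInvalidChar is a linear scan that stops at the first codepoint failing the predicate. The sketch below illustrates that scan under a stated assumption: it uses the JDK types PrimitiveIterator.OfInt and IntPredicate as stand-ins for Saxon's IntIterator and IntPredicateProxy, whose APIs are not described here. The class name FirstInvalid and the example predicate isXml10Char (the XML 1.0 Char production) are hypothetical additions for illustration.

```java
import java.util.PrimitiveIterator;
import java.util.function.IntPredicate;

// Sketch only: JDK types stand in for Saxon's IntIterator and IntPredicateProxy.
final class FirstInvalid {
    private FirstInvalid() {}

    /** Return the first codepoint that fails the predicate, or -1 if all pass. */
    static int firstInvalidChar(PrimitiveIterator.OfInt iter, IntPredicate predicate) {
        while (iter.hasNext()) {
            int cp = iter.nextInt();
            if (!predicate.test(cp)) {
                return cp; // stop at the first failure
            }
        }
        return -1; // every codepoint satisfied the predicate
    }

    /** XML 1.0 "Char" production, used here as an example predicate. */
    static boolean isXml10Char(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // String.codePoints() yields codepoints, so surrogate pairs arrive already combined.
        String ok = "caf\u00E9 \uD83D\uDE00";
        String bad = "bell" + (char) 7;
        System.out.println(firstInvalidChar(ok.codePoints().iterator(), FirstInvalid::isXml10Char));  // -1
        System.out.println(firstInvalidChar(bad.codePoints().iterator(), FirstInvalid::isXml10Char)); // 7
    }
}
```

Since String.codePoints() passes an unpaired surrogate through as its own value, a lone surrogate such as 0xD800 fails isXml10Char and is reported as the first invalid character.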