Package net.sf.saxon.str
Class Twine24
java.lang.Object
  net.sf.saxon.str.UnicodeString
    net.sf.saxon.str.Twine24
All Implemented Interfaces:
java.lang.Comparable<UnicodeString>, AtomicMatchKey
public class Twine24 extends UnicodeString
Twine24 is a Unicode string that accommodates any codepoint value up to 24 bits. It never includes any surrogates. The length of the string is limited to 2^31-1 codepoints.
-
Field Summary
Fields
Modifier and Type  Field
    Description
protected byte[] bytes
protected int cachedHash
-
Method Summary
Modifier and Type  Method
    Description
int codePointAt(long index)
    Get the code point at a given position in the string.
IntIterator codePoints()
    Get an iterator over the Unicode codepoints in the value.
int compareTo(UnicodeString other)
    Compare this string to another using codepoint comparison.
java.lang.String details()
boolean equals(java.lang.Object o)
    Test whether this string is equal to another under the rules of the codepoint collation.
byte[] getByteArray()
int getWidth()
    Get the number of bits needed to hold all the characters in this string.
int hashCode()
    Compute a hashCode.
long indexOf(int code, long from)
    Get the first position, at or beyond from, where a given codepoint appears in this string.
long indexOf(UnicodeString other, long from)
    Get the first position, at or beyond from, where another string appears as a substring of this string, comparing codepoints.
long indexWhere(java.util.function.IntPredicate predicate, long from)
    Get the position of the first codepoint that satisfies the predicate, starting the search at a given position in the string.
boolean isEmpty()
    Determine whether the string is a zero-length string.
long length()
    Get the length of this string, in codepoints.
int length32()
    Get the length of the string, provided it is less than 2^31 characters.
UnicodeString substring(long start, long end)
    Get a substring of this string (following the rules of String.substring(int, int), but measuring Unicode codepoints rather than 16-bit code units).
java.lang.String toString()
    Display as a string.
Methods inherited from class net.sf.saxon.str.UnicodeString
asAtomic, checkSubstringBounds, concat, economize, estimatedLength, hasSubstring, indexOf, prefix, requireInt, substring, tidy, verifyCharacters
-
Constructor Detail
-
Twine24
protected Twine24(byte[] bytes)
Protected constructor.
Parameters:
bytes - the Unicode characters, three bytes per character
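The three-bytes-per-character layout can be illustrated with a small standalone sketch. The class name `PackedCodepoints` and its methods are hypothetical, and the byte order (most significant byte first) is an assumption: the documentation above does not state the order that `Twine24` uses internally.

```java
// Hypothetical sketch, not Saxon code.
// Assumption: each codepoint is stored most significant byte first.
final class PackedCodepoints {

    /** Pack each codepoint (at most 24 bits) into three consecutive bytes. */
    static byte[] pack(int[] codePoints) {
        byte[] bytes = new byte[codePoints.length * 3];
        for (int i = 0, j = 0; i < codePoints.length; i++) {
            int cp = codePoints[i];
            bytes[j++] = (byte) (cp >> 16); // bits 16..23
            bytes[j++] = (byte) (cp >> 8);  // bits 8..15
            bytes[j++] = (byte) cp;         // bits 0..7
        }
        return bytes;
    }

    /** Read the codepoint stored at a 0-based codepoint index. */
    static int codePointAt(byte[] bytes, int index) {
        int j = index * 3;
        return ((bytes[j] & 0xff) << 16)
                | ((bytes[j + 1] & 0xff) << 8)
                | (bytes[j + 2] & 0xff);
    }

    /** Number of codepoints held in the packed array. */
    static int length(byte[] bytes) {
        return bytes.length / 3;
    }
}
```

With this layout, random access by codepoint index is a constant-time multiplication by three, which is the main attraction of a fixed-width representation.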
-
Twine24
public Twine24(int[] codePoints, int used)
Construct a Twine from an array of codepoints.
Parameters:
codePoints - the codepoints making up the string: must not contain any surrogates (that is, codepoints higher than 65535 must be supplied as a single unit)
-
Twine24
public Twine24(int[] codePoints)
Construct a Twine from an array of codepoints.
Parameters:
codePoints - the codepoints making up the string: must not contain any surrogates (that is, codepoints higher than 65535 must be supplied as a single unit)
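A codepoint array free of surrogate pairs can be obtained from a java.lang.String with the standard `String.codePoints()` method. The sketch below uses only the JDK; the class name `CodepointArrays` is hypothetical, and the call to the `Twine24` constructor appears only as a comment because it needs Saxon on the classpath. It assumes well-formed input: an unpaired surrogate in the source string would pass through unchanged.

```java
// Hypothetical helper, JDK only.
final class CodepointArrays {

    /** Convert a UTF-16 string to full codepoints (surrogate pairs combined). */
    static int[] toCodePoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        int[] cps = toCodePoints(s);
        System.out.println(s.length() + " UTF-16 units, " + cps.length + " codepoints");
        // With Saxon available, the array could then be passed on:
        // UnicodeString u = new Twine24(cps);
    }
}
```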
-
Method Detail
-
getByteArray
public byte[] getByteArray()
-
length
public long length()
Get the length of this string, in codepoints.
Specified by:
length in class UnicodeString
Returns:
the length of the string in Unicode code points
-
length32
public int length32()
Description copied from class: UnicodeString
Get the length of the string, provided it is less than 2^31 characters.
Overrides:
length32 in class UnicodeString
Returns:
the length of the string if it fits within a Java int
-
substring
public UnicodeString substring(long start, long end)
Get a substring of this string (following the rules of String.substring(int, int), but measuring Unicode codepoints rather than 16-bit code units).
Specified by:
substring in class UnicodeString
Parameters:
start - the offset of the first character to be included in the result, counting Unicode codepoints
end - the offset of the first character to be excluded from the result, counting Unicode codepoints
Returns:
the substring
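The difference between codepoint offsets and 16-bit code-unit offsets can be seen with plain JDK strings. The sketch below maps codepoint offsets to code-unit offsets with the standard `String.offsetByCodePoints` method; `substringByCodePoints` is a hypothetical helper, not part of Saxon, and it assumes `0 <= start <= end <= ` the number of codepoints.

```java
// Hypothetical helper, JDK only.
final class CodepointSubstring {

    /** Substring with start (inclusive) and end (exclusive) counted in codepoints. */
    static String substringByCodePoints(String s, int start, int end) {
        int from = s.offsetByCodePoints(0, start);        // code-unit index of 'start'
        int to = s.offsetByCodePoints(from, end - start); // code-unit index of 'end'
        return s.substring(from, to);
    }
}
```

For the string "a\uD83D\uDE00bc" (where the surrogate pair encodes U+1F600), codepoint offsets 1 to 3 select the emoji and "b", whereas `String.substring(1, 3)` selects only the two code units of the emoji.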
-
codePointAt
public int codePointAt(long index) throws java.lang.IndexOutOfBoundsException
Description copied from class: UnicodeString
Get the code point at a given position in the string.
Specified by:
codePointAt in class UnicodeString
Parameters:
index - the given position (0-based)
Returns:
the code point at the given position
Throws:
java.lang.IndexOutOfBoundsException - if the index is out of range
-
indexOf
public long indexOf(int code, long from)
Get the first position, at or beyond from, where a given codepoint appears in this string.
Specified by:
indexOf in class UnicodeString
Parameters:
code - the sought codepoint
from - the position (0-based) where searching is to start (counting in codepoints)
Returns:
the first position where the codepoint is found, or -1 if it is not found
-
indexOf
public long indexOf(UnicodeString other, long from)
Get the first position, at or beyond from, where another string appears as a substring of this string, comparing codepoints.
Overrides:
indexOf in class UnicodeString
Parameters:
other - the other (sought) string
from - the position (0-based) where searching is to start (counting in codepoints)
Returns:
the first position where the substring is found, or -1 if it is not found
-
isEmpty
public boolean isEmpty()
Determine whether the string is a zero-length string. This may be more efficient than testing whether the length is equal to zero.
Overrides:
isEmpty in class UnicodeString
Returns:
true if the string is zero length
-
getWidth
public int getWidth()
Description copied from class: UnicodeString
Get the number of bits needed to hold all the characters in this string.
Specified by:
getWidth in class UnicodeString
Returns:
7 for ASCII characters (possibly not used), 8 for Latin-1, 16 for BMP, 24 for general Unicode.
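As a rough illustration of how such a width could be derived from the largest codepoint in a string, the sketch below returns 8, 16, or 24. The class and method names are hypothetical, and the thresholds (255 for Latin-1, 65535 for the BMP) are an assumption based on those character ranges rather than on the Saxon source. The 7-bit case is omitted.

```java
// Hypothetical sketch, JDK only. Not the Saxon implementation.
final class WidthSketch {

    /** Smallest of 8, 16, or 24 bits that can hold every codepoint in the array. */
    static int widthFor(int[] codePoints) {
        int max = 0;
        for (int cp : codePoints) {
            max = Math.max(max, cp);
        }
        if (max <= 0xFF) {
            return 8;   // fits in Latin-1
        }
        if (max <= 0xFFFF) {
            return 16;  // fits in the Basic Multilingual Plane
        }
        return 24;      // needs the full Unicode range
    }
}
```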
-
codePoints
public IntIterator codePoints()
Get an iterator over the Unicode codepoints in the value. These will always be full codepoints, never surrogates (surrogate pairs are combined where necessary).
Specified by:
codePoints in class UnicodeString
Returns:
a sequence of Unicode codepoints
-
hashCode
public int hashCode()
Compute a hashCode. All implementations of UnicodeString use compatible hash codes and the hashing algorithm is therefore identical to that for java.lang.String. This means that for strings containing astral characters, the hash code needs to be computed by decomposing an astral character into a surrogate pair.
Overrides:
hashCode in class UnicodeString
Returns:
the hash code
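The compatibility requirement can be sketched with JDK calls alone. The class `CodepointHash` below is hypothetical and is not the Saxon implementation; it applies the documented `String.hashCode()` recurrence (`h = 31 * h + c` over UTF-16 code units) to an array of codepoints, splitting each astral codepoint into its surrogate pair first.

```java
// Hypothetical sketch, JDK only.
final class CodepointHash {

    /** Hash codepoints so that the result equals String.hashCode() for the same text. */
    static int hash(int[] codePoints) {
        int h = 0;
        for (int cp : codePoints) {
            if (Character.isBmpCodePoint(cp)) {
                h = 31 * h + cp;
            } else {
                // Astral codepoint: hash the two UTF-16 code units it would occupy.
                h = 31 * h + Character.highSurrogate(cp);
                h = 31 * h + Character.lowSurrogate(cp);
            }
        }
        return h;
    }
}
```

Because the result matches `String.hashCode()`, a codepoint-based string and the equivalent java.lang.String would land in the same hash bucket.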
-
equals
public boolean equals(java.lang.Object o)
Test whether this string is equal to another under the rules of the codepoint collation.
Overrides:
equals in class UnicodeString
Parameters:
o - the value to be compared with this value
Returns:
true if the strings are equal on a codepoint-by-codepoint basis
-
compareTo
public int compareTo(UnicodeString other)
Description copied from class: UnicodeString
Compare this string to another using codepoint comparison.
Specified by:
compareTo in interface java.lang.Comparable<UnicodeString>
Overrides:
compareTo in class UnicodeString
Parameters:
other - the other string
Returns:
-1 if this string comes first, 0 if they are equal, +1 if the other string comes first
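Codepoint comparison can give a different order from `String.compareTo`, which compares 16-bit code units. The sketch below is a hypothetical helper over plain `int[]` arrays, not the Saxon implementation. It assumes that a shorter string which is a prefix of a longer one sorts first.

```java
// Hypothetical sketch, JDK only.
final class CodepointCompare {

    /** Returns -1, 0, or +1 by comparing the arrays codepoint by codepoint. */
    static int compare(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return a[i] < b[i] ? -1 : 1;
            }
        }
        // All shared positions are equal: the shorter array comes first.
        return Integer.signum(a.length - b.length);
    }
}
```

For example, U+FFFF sorts before U+1F600 in codepoint order, but `"\uFFFF".compareTo("\uD83D\uDE00")` is positive because the code unit 0xFFFF is greater than the high surrogate 0xD83D.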
-
toString
public java.lang.String toString()
Display as a string.
Overrides:
toString in class java.lang.Object
-
indexWhere
public long indexWhere(java.util.function.IntPredicate predicate, long from)
Get the position of the first codepoint that satisfies the predicate, starting the search at a given position in the string.
Overrides:
indexWhere in class UnicodeString
Parameters:
predicate - condition that the codepoint must satisfy
from - the position from which the search should start (0-based)
Returns:
the position (0-based) of the first codepoint to match the predicate, or -1 if not found
Throws:
java.lang.UnsupportedOperationException - if the UnicodeString has not been prepared for codePoint access
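A predicate-based search of this kind can be sketched over a plain `int[]` of codepoints. The class `IndexWhereSketch` is hypothetical and is not the Saxon implementation. Treating a negative `from` as 0 is an assumption, since the documentation above does not say how that case is handled.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch, JDK only.
final class IndexWhereSketch {

    /** Position of the first codepoint at or after 'from' that satisfies the predicate, or -1. */
    static long indexWhere(int[] codePoints, IntPredicate predicate, long from) {
        // Assumption: a negative start position is treated as 0.
        for (long i = Math.max(from, 0); i < codePoints.length; i++) {
            if (predicate.test(codePoints[(int) i])) {
                return i;
            }
        }
        return -1;
    }
}
```

A method reference such as `Character::isDigit` or a lambda such as `cp -> cp > 0xFFFF` can be supplied as the predicate.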
-
details
public java.lang.String details()