Package net.sf.saxon.str
Class StringTool
java.lang.Object
    net.sf.saxon.str.StringTool

public class StringTool extends java.lang.Object

Constructor Summary
Constructor     Description
StringTool()

Method Summary
Modifier and Type    Method
    Description
static void appendRepeated(java.lang.StringBuilder builder, char ch, int count)
    Insert repeated occurrences of a given character at the end of a StringBuilder.
static IntIterator codePoints(java.lang.CharSequence value)
    Get an iterator over the codepoints in a CharSequence, typically a String.
static UnicodeString compress(char[] in, int offset, int len, boolean compressWS)
    Attempt to compress a UnicodeString consisting entirely of whitespace.
static boolean containsSurrogates(java.lang.CharSequence str)
    Ask whether a string contains astral characters (represented as surrogate pairs).
static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count)
    Copy from an array of 16-bit characters to an array holding 24-bit characters.
static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count)
    Copy from an array of 8-bit characters to an array holding 16-bit characters.
static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
    Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
static java.lang.String diagnosticDisplay(java.lang.String s)
    Produce a diagnostic representation of the contents of the string.
static int[] expand(UnicodeString s)
    Expand a string into an array of 32-bit characters.
static UnicodeString fromCharSequence(java.lang.CharSequence chars)
    Construct a UnicodeString from a CharSequence, typically a String.
static UnicodeString fromCodePoints(int[] codes, int used)
    Contract an array of integers containing Unicode codepoints into a string.
static UnicodeString fromLatin1(java.lang.String str)
    Construct a UnicodeString from a String that is known to consist entirely of 8-bit Latin-1 characters.
static int getStringLength(java.lang.CharSequence s)
    Get the length of a string, as defined in XPath.
static int lastCodePoint(UnicodeString str)
    Get the last codepoint in a UnicodeString.
static long lastIndexOf(UnicodeString str, int codePoint)
    Get the position of the last occurrence of a given codepoint within a string.
static void prependRepeated(java.lang.StringBuilder builder, char ch, int count)
    Insert repeated occurrences of a given character at the start of a StringBuilder.
static void prependWideChar(java.lang.StringBuilder builder, int ch)
    Insert a wide character (surrogate pair) at the start of a StringBuilder.

Method Detail
-
getStringLength
public static int getStringLength(java.lang.CharSequence s)
Get the length of a string, as defined in XPath. This is not the same as the Java length, because a Unicode surrogate pair counts as a single character.
Parameters:
s - the string whose length is required
Returns:
the length of the string in Unicode code points
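The difference between the Java length and the XPath length can be illustrated with the JDK alone. The following is a minimal sketch, not Saxon's implementation; the class name XPathLengthSketch and the helper codePointLength are hypothetical names introduced for illustration.

```java
public class XPathLengthSketch {

    // Hypothetical helper: counts Unicode code points rather than UTF-16 code units.
    public static int codePointLength(CharSequence s) {
        return Character.codePointCount(s, 0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E (musical symbol G clef) is stored as a surrogate pair in UTF-16.
        String s = "a" + new String(Character.toChars(0x1D11E)) + "b";
        System.out.println(s.length());          // 4 UTF-16 code units
        System.out.println(codePointLength(s));  // 3 code points
    }
}
```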
-
expand
public static int[] expand(UnicodeString s)
Expand a string into an array of 32-bit characters.
Parameters:
s - the string to be expanded
Returns:
an array of integers representing the Unicode code points
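As a rough illustration of what expansion into 32-bit characters means, the sketch below does the same thing for an ordinary CharSequence using only the JDK. It is an assumption-laden stand-in: the real method takes a UnicodeString, and the class name ExpandSketch is hypothetical.

```java
import java.util.Arrays;

public class ExpandSketch {

    // Hypothetical analogue of expand() for a plain CharSequence:
    // one int per code point, so a surrogate pair becomes a single entry.
    public static int[] expand(CharSequence s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "A" + new String(Character.toChars(0x1F600));
        System.out.println(Arrays.toString(expand(s)));  // [65, 128512]
    }
}
```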
-
containsSurrogates
public static boolean containsSurrogates(java.lang.CharSequence str)
Ask whether a string contains astral characters (represented as surrogate pairs).
Parameters:
str - the string to be tested
Returns:
true if the string contains surrogate characters
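A test of this kind can be sketched with Character.isSurrogate from the JDK. This is an illustrative version under the assumption that any surrogate code unit counts; it is not taken from the Saxon source, and the class name SurrogateCheckSketch is hypothetical.

```java
public class SurrogateCheckSketch {

    // Returns true as soon as any UTF-16 surrogate code unit is found.
    public static boolean containsSurrogates(CharSequence str) {
        for (int i = 0; i < str.length(); i++) {
            if (Character.isSurrogate(str.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String astral = new String(Character.toChars(0x1F600));
        System.out.println(containsSurrogates("plain text"));    // false
        System.out.println(containsSurrogates("x" + astral));    // true
    }
}
```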
-
fromCodePoints
public static UnicodeString fromCodePoints(int[] codes, int used)
Contract an array of integers containing Unicode codepoints into a string.
Parameters:
codes - an array of integers representing the Unicode code points
used - the number of items in the array that are actually used
Returns:
the constructed string
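The role of the used parameter (a buffer that is only partly filled) can be shown with the JDK's String(int[], int, int) constructor. This sketch returns a java.lang.String rather than a UnicodeString, and the class name FromCodePointsSketch is hypothetical.

```java
public class FromCodePointsSketch {

    // Builds a String from the first 'used' entries of a code point buffer.
    public static String fromCodePoints(int[] codes, int used) {
        return new String(codes, 0, used);
    }

    public static void main(String[] args) {
        int[] buffer = new int[8];   // over-allocated buffer
        buffer[0] = 'H';
        buffer[1] = 'i';
        int used = 2;                // only two entries are meaningful
        System.out.println(fromCodePoints(buffer, used));  // Hi
    }
}
```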
-
fromCharSequence
public static UnicodeString fromCharSequence(java.lang.CharSequence chars)
Construct a UnicodeString from a CharSequence, typically a String.
Parameters:
chars - the supplied String or CharSequence
Returns:
the equivalent UnicodeString
-
fromLatin1
public static UnicodeString fromLatin1(java.lang.String str)
Construct a UnicodeString from a String that is known to consist entirely of 8-bit Latin-1 characters.
Parameters:
str - the supplied String: the caller warrants that this contains no characters with codepoint higher than 255.
Returns:
the equivalent UnicodeString
-
codePoints
public static IntIterator codePoints(java.lang.CharSequence value)
Get an iterator over the codepoints in a CharSequence, typically a String.
Parameters:
value - the supplied string
Returns:
an IntIterator allowing iteration over the codepoints. Note that the protocol for IntIterator requires exactly one call of IntIterator.hasNext() before every call of IntIterator.next().
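The loop shape that satisfies this protocol is the usual while loop, with one hasNext() test guarding each retrieval. The sketch below uses the JDK's PrimitiveIterator.OfInt as a stand-in, since IntIterator is a Saxon type; the JDK iterator does not itself impose the same restriction. The class name CodePointIterationSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;

public class CodePointIterationSketch {

    // Collects code points with exactly one hasNext() call before each retrieval.
    public static List<Integer> collect(CharSequence value) {
        List<Integer> result = new ArrayList<>();
        PrimitiveIterator.OfInt it = value.codePoints().iterator();
        while (it.hasNext()) {
            result.add(it.nextInt());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collect("ab"));  // [97, 98]
    }
}
```

Calling the retrieval method twice in a row, or testing hasNext() twice without retrieving, would break the stated protocol even if a particular implementation happens to tolerate it.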
-
diagnosticDisplay
public static java.lang.String diagnosticDisplay(java.lang.String s)
Produce a diagnostic representation of the contents of the string.
Parameters:
s - the string
Returns:
a string in which characters other than printable ASCII characters are replaced by \uXXXX escapes
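One plausible way to produce such a representation is sketched below. The printable range (0x20 to 0x7E) and the use of uppercase hexadecimal digits are assumptions made for this sketch, not details confirmed by this page; the class name DiagnosticDisplaySketch is hypothetical.

```java
public class DiagnosticDisplaySketch {

    // Assumption: "printable ASCII" means the range 0x20..0x7E inclusive.
    public static String display(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                // Emit a backslash, the letter u, and four hex digits.
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The tab character (code 9) is replaced by its escape.
        System.out.println(display("a\tb"));
    }
}
```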
-
prependWideChar
public static void prependWideChar(java.lang.StringBuilder builder, int ch)
Insert a wide character (surrogate pair) at the start of a StringBuilder.
Parameters:
builder - the string builder
ch - the codepoint of the character to be inserted
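With the JDK alone, the same effect can be approximated by converting the codepoint to its UTF-16 form and inserting it at offset 0. This is a sketch under that assumption, not Saxon's code; the names PrependWideCharSketch and prependCodePoint are hypothetical.

```java
public class PrependWideCharSketch {

    // Character.toChars yields one char for BMP codepoints and two for astral ones.
    public static void prependCodePoint(StringBuilder builder, int ch) {
        builder.insert(0, Character.toChars(ch));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("x");
        prependCodePoint(sb, 0x1F600);
        System.out.println(sb.length());                       // 3
        System.out.println(sb.codePointAt(0) == 0x1F600);      // true
    }
}
```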
-
prependRepeated
public static void prependRepeated(java.lang.StringBuilder builder, char ch, int count)
Insert repeated occurrences of a given character at the start of a StringBuilder.
Parameters:
builder - the string builder
ch - the character to be inserted
count - the number of repetitions
-
appendRepeated
public static void appendRepeated(java.lang.StringBuilder builder, char ch, int count)
Insert repeated occurrences of a given character at the end of a StringBuilder.
Parameters:
builder - the string builder
ch - the character to be inserted
count - the number of repetitions
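Both repeated-character operations are typically used for padding. The sketch below shows one way to write them with the JDK; it is not the Saxon implementation, and the class name RepeatedCharSketch is hypothetical. The prepend variant builds the run first so that the existing content is shifted only once.

```java
import java.util.Arrays;

public class RepeatedCharSketch {

    public static void appendRepeated(StringBuilder builder, char ch, int count) {
        for (int i = 0; i < count; i++) {
            builder.append(ch);
        }
    }

    public static void prependRepeated(StringBuilder builder, char ch, int count) {
        if (count <= 0) {
            return;  // nothing to insert
        }
        char[] run = new char[count];
        Arrays.fill(run, ch);
        builder.insert(0, run);  // single insertion at the start
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("42");
        prependRepeated(sb, '0', 3);
        appendRepeated(sb, '!', 2);
        System.out.println(sb);  // 00042!!
    }
}
```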
-
lastCodePoint
public static int lastCodePoint(UnicodeString str)
Get the last codepoint in a UnicodeString.
Parameters:
str - the input string
Returns:
the integer value of the last character in the string
Throws:
java.lang.IndexOutOfBoundsException - if the string is empty
-
lastIndexOf
public static long lastIndexOf(UnicodeString str, int codePoint)
Get the position of the last occurrence of a given codepoint within a string.
Parameters:
str - the input string
codePoint - the sought codepoint
Returns:
the zero-based position of the last occurrence of the codepoint within the input string, or -1 if the codepoint does not appear within the string
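The returned position counts codepoints, which can differ from a UTF-16 index once astral characters are involved. The sketch below illustrates the idea over a plain int[] of codepoints instead of a UnicodeString; the class name LastIndexOfSketch is hypothetical.

```java
public class LastIndexOfSketch {

    // Scans backwards over an array of codepoints; returns -1 if absent.
    public static long lastIndexOf(int[] codePoints, int codePoint) {
        for (int i = codePoints.length - 1; i >= 0; i--) {
            if (codePoints[i] == codePoint) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String smile = new String(Character.toChars(0x1F600));
        String s = "a" + smile + "b" + smile + "c";
        int[] cps = s.codePoints().toArray();
        System.out.println(lastIndexOf(cps, 0x1F600));  // 3 (codepoint position)
        System.out.println(s.lastIndexOf(0x1F600));     // 4 (UTF-16 index)
    }
}
```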
-
compress
public static UnicodeString compress(char[] in, int offset, int len, boolean compressWS)
Attempt to compress a UnicodeString consisting entirely of whitespace. This is the first thing we do to an incoming text node.
Parameters:
in - the character array containing the string to be compressed
offset - the start position of the substring we are interested in
len - the length of the substring we are interested in
compressWS - set to true if whitespace compression is to be attempted
Returns:
the compressed sequence if it can be compressed; or the uncompressed UnicodeString otherwise
-
copy8to16
public static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count)
Copy from an array of 8-bit characters to an array holding 16-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array
sourcePos - the position in the source array where copying is to start
dest - the destination array
destPos - the position in the destination array where copying is to start
count - the number of characters (codepoints) to copy
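A widening copy of this kind has one subtlety in Java: bytes are signed, so each value needs masking before conversion to char. The following sketch shows that step; it is an illustration rather than the Saxon source, and the class name Copy8to16Sketch is hypothetical.

```java
public class Copy8to16Sketch {

    // No bounds checks: as in the documented contract, the caller guarantees them.
    public static void copy8to16(byte[] source, int sourcePos,
                                 char[] dest, int destPos, int count) {
        for (int i = 0; i < count; i++) {
            // Mask with 0xFF so that bytes above 0x7F are not sign-extended.
            dest[destPos + i] = (char) (source[sourcePos + i] & 0xFF);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0x41, (byte) 0xE9};  // 'A' and e-acute
        char[] out = new char[2];
        copy8to16(latin1, 0, out, 0, 2);
        System.out.println((int) out[1]);  // 233
    }
}
```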
-
copy8to24
public static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array
sourcePos - the position in the source array where copying is to start
dest - the destination array, using three bytes per codepoint
destPos - the codepoint position (not byte position) in the destination array where copying is to start
count - the number of characters (codepoints) to copy
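Because destPos is a codepoint position, the byte offset into the destination is three times that value. The sketch below makes this explicit. The byte order within each three-byte group (most significant byte first) is an assumption made for illustration and is not stated on this page; the class name Copy8to24Sketch is hypothetical.

```java
public class Copy8to24Sketch {

    // Assumption: each codepoint occupies three bytes, most significant first.
    public static void copy8to24(byte[] source, int sourcePos,
                                 byte[] dest, int destPos, int count) {
        int out = destPos * 3;  // convert codepoint position to byte offset
        for (int i = 0; i < count; i++) {
            dest[out++] = 0;                      // bits 16..23 are zero for 8-bit input
            dest[out++] = 0;                      // bits 8..15 are zero for 8-bit input
            dest[out++] = source[sourcePos + i];  // bits 0..7
        }
    }

    public static void main(String[] args) {
        byte[] src = {0x41, 0x42};
        byte[] dst = new byte[9];       // room for three codepoints
        copy8to24(src, 0, dst, 1, 2);   // start at codepoint position 1
        System.out.println(dst[5]);     // 65
        System.out.println(dst[8]);     // 66
    }
}
```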
-
copy16to24
public static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count)
Copy from an array of 16-bit characters to an array holding 24-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array. The caller is responsible for ensuring that this contains no surrogates
sourcePos - the position in the source array where copying is to start
dest - the destination array
destPos - the position in the destination array where copying is to start
count - the number of characters (codepoints) to copy
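The requirement that the source contains no surrogates follows from the copy being one-to-one: each 16-bit value becomes exactly one 24-bit codepoint. A minimal sketch is given below. It assumes that destPos is a codepoint position, as in copy8to24, and that each three-byte group is stored most significant byte first; neither assumption is confirmed by this page. The class name Copy16to24Sketch is hypothetical.

```java
public class Copy16to24Sketch {

    // Assumptions: destPos counts codepoints; bytes are stored high, middle, low.
    public static void copy16to24(char[] source, int sourcePos,
                                  byte[] dest, int destPos, int count) {
        int out = destPos * 3;
        for (int i = 0; i < count; i++) {
            char c = source[sourcePos + i];
            dest[out++] = 0;                   // bits 16..23: zero without surrogates
            dest[out++] = (byte) (c >> 8);     // bits 8..15
            dest[out++] = (byte) (c & 0xFF);   // bits 0..7
        }
    }

    public static void main(String[] args) {
        char[] src = {'A', (char) 0x20AC};     // 'A' and the euro sign
        byte[] dst = new byte[6];
        copy16to24(src, 0, dst, 0, 2);
        int second = ((dst[3] & 0xFF) << 16) | ((dst[4] & 0xFF) << 8) | (dst[5] & 0xFF);
        System.out.println(second == 0x20AC);  // true
    }
}
```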
-
-