Package net.sf.saxon.str
Class UnicodeBuilder
java.lang.Object
  net.sf.saxon.str.UnicodeBuilder
-
All Implemented Interfaces:
UnicodeWriter, UniStringConsumer
public final class UnicodeBuilder extends java.lang.Object implements UniStringConsumer, UnicodeWriter
Builder class to construct a UnicodeString by appending text incrementally
-
-
Constructor Summary
Constructors
UnicodeBuilder()
Create a Unicode builder with an initial allocation of 256 codepoints
UnicodeBuilder(int allocate)
Create a Unicode builder with an initial space allocation
-
Method Summary
Modifier and Type, Method, Description
UnicodeBuilder accept(UnicodeString chars)
Process a supplied string
UnicodeBuilder append(char ch)
Append a character, which must not be a surrogate.
UnicodeBuilder append(int codePoint)
Append a single unicode character to the content
UnicodeBuilder append(java.lang.CharSequence str)
Append a Java CharSequence to the content.
UnicodeBuilder append(UnicodeString str)
Append a UnicodeString object to the content.
UnicodeBuilder append(IntIterator codePoints)
Append multiple unicode characters to the content
UnicodeBuilder appendAll(SequenceIterator iter)
Append the string values of all the items in a sequence, with no separator
UnicodeBuilder appendLatin(java.lang.String str)
Append a Java string to the content.
void clear()
Reset the contents of this builder to be empty
void close()
Complete the writing of characters to the result.
static byte[] expand(byte[] in, int start, int end, int oldWidth, int newWidth, int allocate)
Expand the width of the characters in a byte array
static byte[] expand1to2(byte[] in, int start, int used, int allocate)
Expand a byte array from 1-byte-per-character to 2-bytes-per-character
static byte[] expand1to3(byte[] in, int start, int used, int allocate)
Expand a byte array from 1-byte-per-character to 3-bytes-per-character
static byte[] expand2to3(byte[] in, int start, int used, int allocate)
Expand a byte array from 2-bytes-per-character to 3-bytes-per-character
static char[] expandBytesToChars(byte[] in, int start, int end)
boolean isEmpty()
Ask whether the content of the builder is empty
long length()
Get the number of codepoints currently in the builder
java.lang.String toString()
Return a string containing the character content of this builder
StringValue toStringItem(AtomicType type)
Construct a StringValue whose value is formed from the contents of this builder
UnicodeString toUnicodeString()
Construct a UnicodeString whose value is formed from the contents of this builder
void trimToSize()
void write(java.lang.String chars)
Process a supplied string
void write(UnicodeString chars)
Process a supplied string
void writeAscii(byte[] content)
Write a supplied string known to consist entirely of ASCII characters, supplied as a byte array
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface net.sf.saxon.str.UnicodeWriter
flush, writeCodePoint, writeRepeatedAscii
-
Methods inherited from interface net.sf.saxon.str.UniStringConsumer
open
-
-
-
-
Constructor Detail
-
UnicodeBuilder
public UnicodeBuilder()
Create a Unicode builder with an initial allocation of 256 codepoints
-
UnicodeBuilder
public UnicodeBuilder(int allocate)
Create a Unicode builder with an initial space allocation
Parameters:
allocate - the initial space allocation, in codepoints (32-bit integers)
-
-
Method Detail
-
append
public UnicodeBuilder append(char ch)
Append a character, which must not be a surrogate. (Method needed for C#, because implicit conversion of char to int isn't supported)
Parameters:
ch - the character
Returns:
this builder, with the new character added
-
append
public UnicodeBuilder append(int codePoint)
Append a single unicode character to the content
Parameters:
codePoint - the unicode codepoint. The caller is responsible for ensuring that this is not a surrogate
Returns:
this builder, with the new character added
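Because append(int codePoint) leaves surrogate checking to the caller, a caller might validate a value before appending it. The following is a minimal sketch using only java.lang.Character; the class CodePointCheck and the method isAppendable are hypothetical names for illustration and are not part of Saxon.

```java
class CodePointCheck {

    /**
     * Returns true if cp is a valid Unicode code point that is not a surrogate.
     * Hypothetical helper: not part of the Saxon API.
     */
    public static boolean isAppendable(int cp) {
        return Character.isValidCodePoint(cp)
                && (cp < Character.MIN_SURROGATE || cp > Character.MAX_SURROGATE);
    }

    public static void main(String[] args) {
        System.out.println(isAppendable(0x1F600)); // supplementary-plane code point
        System.out.println(isAppendable(0xD83D));  // a high surrogate on its own
    }
}
```

A value that passes this check could then be handed to append(int codePoint); a value that fails would need to be rejected or combined with its partner surrogate first.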
-
append
public UnicodeBuilder append(IntIterator codePoints)
Append multiple unicode characters to the content
Parameters:
codePoints - an iterator delivering the codepoints to be added.
Returns:
this builder, with the new characters added
-
appendLatin
public UnicodeBuilder appendLatin(java.lang.String str)
Append a Java string to the content. The caller is responsible for ensuring that this consists entirely of characters in the Latin-1 character set
Parameters:
str - the string to be appended
Returns:
this builder, with the new string added
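Since appendLatin relies on the caller to guarantee Latin-1 content, a caller that is unsure could test the string first. The sketch below assumes that "Latin-1" means every character lies in the range U+0000 to U+00FF; Latin1Check and isLatin1 are hypothetical names, not Saxon methods.

```java
class Latin1Check {

    /**
     * Returns true if every UTF-16 code unit of s is in the range U+0000..U+00FF.
     * Surrogates are above U+00FF, so any supplementary character fails the test.
     * Hypothetical helper: not part of the Saxon API.
     */
    public static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLatin1("caf\u00e9")); // e-acute is U+00E9
        System.out.println(isLatin1("\u20ac"));    // euro sign is U+20AC
    }
}
```

When the check fails, the general append(java.lang.CharSequence str) method is the documented alternative for arbitrary characters.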
-
appendAll
public UnicodeBuilder appendAll(SequenceIterator iter)
Append the string values of all the items in a sequence, with no separator
Parameters:
iter - the sequence of items
Returns:
this builder, with the new items added
-
append
public UnicodeBuilder append(java.lang.CharSequence str)
Append a Java CharSequence to the content. This may contain arbitrary characters, including well-formed surrogate pairs
Parameters:
str - the string to be appended
Returns:
this builder, with the new string added
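A CharSequence counts UTF-16 code units, while this builder is documented as counting codepoints, so a surrogate pair contributes two chars but one codepoint. The sketch below illustrates the difference with the standard CharSequence.codePoints() method; CodePointCount and toCodePoints are hypothetical names, and the sketch makes no claim about how Saxon decodes the input internally.

```java
class CodePointCount {

    /**
     * Decodes a CharSequence into an array of Unicode code points,
     * combining each well-formed surrogate pair into a single value.
     * Hypothetical helper: not part of the Saxon API.
     */
    public static int[] toCodePoints(CharSequence s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00";                  // 'a' followed by U+1F600 as a surrogate pair
        int[] cps = toCodePoints(s);
        System.out.println(s.length());              // length in UTF-16 code units
        System.out.println(cps.length);              // length in code points
        System.out.println(Integer.toHexString(cps[1]));
    }
}
```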
-
append
public UnicodeBuilder append(UnicodeString str)
Append a UnicodeString object to the content.
Parameters:
str - the string to be appended. The length is currently restricted to 2^31.
Returns:
this builder, with the new string added
-
length
public long length()
Get the number of codepoints currently in the builder
Returns:
the size in codepoints
-
isEmpty
public boolean isEmpty()
Ask whether the content of the builder is empty
Returns:
true if the size is zero
-
toUnicodeString
public UnicodeString toUnicodeString()
Construct a UnicodeString whose value is formed from the contents of this builder
Returns:
the constructed UnicodeString
-
toStringItem
public StringValue toStringItem(AtomicType type)
Construct a StringValue whose value is formed from the contents of this builder
Parameters:
type - the required type, for example BuiltInAtomicType.STRING or BuiltInAtomicType.UNTYPED_ATOMIC. The caller warrants that the value is a valid instance of this type. No validation or whitespace normalization is carried out
Returns:
the constructed StringValue
-
toString
public java.lang.String toString()
Return a string containing the character content of this builder
Overrides:
toString in class java.lang.Object
Returns:
the character content of this builder as a Java String
-
clear
public void clear()
Reset the contents of this builder to be empty
-
expand1to2
public static byte[] expand1to2(byte[] in, int start, int used, int allocate)
Expand a byte array from 1-byte-per-character to 2-bytes-per-character
Parameters:
in - the input byte array
start - the start offset in bytes
used - the end offset in bytes
allocate - the number of code points to allow for in the output byte array
Returns:
the new byte array
-
expandBytesToChars
public static char[] expandBytesToChars(byte[] in, int start, int end)
-
expand1to3
public static byte[] expand1to3(byte[] in, int start, int used, int allocate)
Expand a byte array from 1-byte-per-character to 3-bytes-per-character
Parameters:
in - the input byte array
start - the start offset in bytes
used - the end offset in bytes
allocate - the number of code points to allow for in the output byte array
Returns:
the new byte array
-
expand2to3
public static byte[] expand2to3(byte[] in, int start, int used, int allocate)
Expand a byte array from 2-bytes-per-character to 3-bytes-per-character
Parameters:
in - the input byte array
start - the start offset in bytes
used - the end offset in bytes
allocate - the number of code points to allow for in the output byte array
Returns:
the new byte array
-
expand
public static byte[] expand(byte[] in, int start, int end, int oldWidth, int newWidth, int allocate)
Expand the width of the characters in a byte array
Parameters:
in - the input byte array
start - the start offset in bytes
end - the end offset in bytes
oldWidth - the width of the characters (number of bytes per character) in the input array
newWidth - the width of the characters (number of bytes per character) in the output array. If newWidth is less than or equal to oldWidth, the input array is copied; the width is never reduced
allocate - the number of code points to allow for in the output byte array; if zero (or insufficient), the output array will have no spare space for expansion
Returns:
the new byte array
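The idea of widening fixed-width characters can be sketched independently of Saxon. The code below is an illustration under stated assumptions, not Saxon's implementation: it assumes each character is stored most-significant byte first (so widening pads with leading zero bytes), that start and end are byte offsets into the input, and that the output holds max(allocate, character count) characters. WidthExpansionSketch and widen are hypothetical names.

```java
import java.util.Arrays;

class WidthExpansionSketch {

    /**
     * Copies the characters in in[start..end) into a new array, widening each
     * from oldWidth to newWidth bytes. The width is never reduced.
     * ASSUMPTION: big-endian layout within each character; Saxon's actual
     * byte order is not stated in the documentation above.
     */
    public static byte[] widen(byte[] in, int start, int end,
                               int oldWidth, int newWidth, int allocate) {
        int width = Math.max(oldWidth, newWidth);   // never narrower than the input
        int count = (end - start) / oldWidth;       // characters in the input range
        byte[] out = new byte[Math.max(allocate, count) * width];
        int pad = width - oldWidth;                 // leading zero bytes per character
        for (int i = 0; i < count; i++) {
            System.arraycopy(in, start + i * oldWidth, out, i * width + pad, oldWidth);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] oneByte = {0x41, 0x42};              // "AB" at one byte per character
        byte[] twoByte = widen(oneByte, 0, 2, 1, 2, 4);
        System.out.println(Arrays.toString(twoByte));
    }
}
```

With allocate set to 4, the result has room for four 2-byte characters, two of them still unused; with allocate set to 0 the array is exactly as long as the widened content.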
-
accept
public UnicodeBuilder accept(UnicodeString chars)
Process a supplied string
Specified by:
accept in interface UniStringConsumer
Parameters:
chars - the characters to be processed
Returns:
this UniStringConsumer (to allow method chaining)
-
write
public void write(UnicodeString chars)
Description copied from interface: UnicodeWriter
Process a supplied string
Specified by:
write in interface UnicodeWriter
Parameters:
chars - the characters to be processed
-
writeAscii
public void writeAscii(byte[] content) throws java.io.IOException
Write a supplied string known to consist entirely of ASCII characters, supplied as a byte array
Specified by:
writeAscii in interface UnicodeWriter
Parameters:
content - byte array holding ASCII characters only
Throws:
java.io.IOException - if processing fails for any reason
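Because writeAscii expects a byte array that holds ASCII characters only, a caller starting from a Java String might want to verify the content before encoding it. The sketch below uses only the standard library; AsciiBytes and toAsciiBytes are hypothetical names, and throwing IllegalArgumentException is a design choice of this sketch rather than anything Saxon prescribes.

```java
import java.nio.charset.StandardCharsets;

class AsciiBytes {

    /**
     * Encodes s as ASCII bytes, rejecting any character above U+007F.
     * The explicit check matters because String.getBytes with US_ASCII
     * silently substitutes '?' for characters it cannot encode.
     * Hypothetical helper: not part of the Saxon API.
     */
    public static byte[] toAsciiBytes(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                throw new IllegalArgumentException("Non-ASCII character at index " + i);
            }
        }
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] content = toAsciiBytes("<a/>");
        System.out.println(content.length);
    }
}
```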
-
write
public void write(java.lang.String chars) throws java.io.IOException
Process a supplied string
Specified by:
write in interface UnicodeWriter
Parameters:
chars - the characters to be processed
Throws:
java.io.IOException - if processing fails for any reason
-
trimToSize
public void trimToSize()
-
close
public void close()
Complete the writing of characters to the result. The default implementation does nothing.
Specified by:
close in interface UnicodeWriter
Specified by:
close in interface UniStringConsumer
-
-