Package net.sf.saxon.str
This package contains classes used to handle Unicode strings, notably implementations of the
UnicodeString interface, which represents a string as a sequence of directly addressable
Unicode codepoints (without relying on surrogate pairs).
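
To illustrate what codepoint addressing buys over a plain Java string, the following minimal sketch uses only the standard library. The class and method names are hypothetical and are not part of this package; the sketch only contrasts UTF-16 code-unit indexing with codepoint indexing.

```java
// Hypothetical demo class; not part of net.sf.saxon.str.
final class CodepointAddressingDemo {

    // Decode a Java (UTF-16) string into one int per Unicode codepoint,
    // so that index i addresses the i-th codepoint directly.
    static int[] toCodepoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 (stored by Java as a surrogate pair), then 'b'
        String s = "a\uD83D\uDE00b";
        int[] cps = toCodepoints(s);

        System.out.println(s.length());                       // 4 UTF-16 code units
        System.out.println(cps.length);                       // 3 codepoints
        System.out.println(Integer.toHexString(s.charAt(1))); // d83d (half of a pair)
        System.out.println(Integer.toHexString(cps[1]));      // 1f600 (whole codepoint)
    }
}
```

With a plain String, index 1 lands on a high surrogate; with the decoded array, index 1 is the complete codepoint.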

Interface Summary

TwineConsumer
    Interface that accepts a sequence of Unicode codepoints.
UnicodeWriter
    Interface that accepts strings in the form of UnicodeString objects, which are written to some destination.
UniStringConsumer
    Interface that accepts a string in the form of a sequence of CharSequences, which are conceptually concatenated (though in some implementations, the final string may never be materialized in memory).
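
The idea of a consumer that receives a string as a series of chunks, without necessarily building the concatenated result, can be sketched as follows. All names here (ChunkConsumer, CountingConsumer, BufferingConsumer) are hypothetical and chosen for illustration; this is not the UniStringConsumer API, whose method signatures are not shown on this page.

```java
// Hypothetical interface for illustration; not the Saxon UniStringConsumer API.
interface ChunkConsumer {
    // Accept the next chunk of the conceptual string; returns this for chaining.
    ChunkConsumer accept(CharSequence chunk);
}

// Counts codepoints across all chunks without ever concatenating them.
// Assumes no surrogate pair is split across two chunks.
final class CountingConsumer implements ChunkConsumer {
    private long codepoints = 0;

    @Override
    public CountingConsumer accept(CharSequence chunk) {
        codepoints += chunk.codePoints().count();
        return this;
    }

    long codepointCount() {
        return codepoints;
    }
}

// Materializes the concatenation in memory, for comparison.
final class BufferingConsumer implements ChunkConsumer {
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public BufferingConsumer accept(CharSequence chunk) {
        buffer.append(chunk);
        return this;
    }

    String result() {
        return buffer.toString();
    }
}

final class ChunkConsumerDemo {
    public static void main(String[] args) {
        CountingConsumer counter = new CountingConsumer();
        counter.accept("ab").accept("\uD83D\uDE00");
        System.out.println(counter.codepointCount()); // 3

        BufferingConsumer buffering = new BufferingConsumer();
        buffering.accept("foo").accept("bar");
        System.out.println(buffering.result()); // foobar
    }
}
```

Both implementations see the same chunk sequence; only the second one holds the whole string in memory.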

Class Summary

AbstractUniStringConsumer
    This abstract implementation of UniStringConsumer exists largely for C#, as a place to capture the default methods defined in the interface and avoid them proliferating into multiple subclasses.
BMPString
    An implementation of UnicodeString that wraps a Java string which is known to contain no surrogates.
CodepointIterator
    Iterator over a string to produce a sequence of single-character strings.
CompressedWhitespace
    This class provides a compressed representation of a sequence of whitespace characters.
EmptyUnicodeString
    A zero-length Unicode string.
IndentWhitespace
    This class provides a compressed representation of a string used to represent indentation: specifically, an integer number of newlines followed by an integer number of spaces.
LargeTextBuffer
    The segments (other than the last) have a fixed size of 65536 codepoints, which may use one byte per codepoint, two bytes per codepoint, or three bytes per codepoint, depending on the largest codepoint present in the segment.
Slice16
    A Unicode string consisting entirely of 16-bit BMP characters, implemented as a range of an underlying byte array.
Slice24
    A Unicode string consisting of 24-bit characters, implemented as a range of an underlying byte array holding three bytes per codepoint.
Slice8
    A Unicode string consisting entirely of 8-bit characters, implemented as a range of an underlying byte array.
StringConstants
    Contains constants representing some frequently used strings, either as a UnicodeString or in some cases as a byte array.
StringTool
StringView
    An implementation of the CodePoints interface that wraps an ordinary Java string.
ToLower
    Class to perform lowercase conversion.
ToUpper
    Class to perform uppercase conversion.
Twine16
    Twine16 is a Unicode string consisting entirely of codepoints in the range 0-65535 (that is, the basic multilingual plane), excluding surrogates.
Twine24
    Twine24 is a Unicode string that accommodates any codepoint value up to 24 bits.
Twine8
    Twine8 is a Unicode string whose codepoints are all in the range 0-255 (that is, Latin-1).
UnicodeBuilder
    Builder class to construct a UnicodeString by appending text incrementally.
UnicodeChar
    A UnicodeString containing a single codepoint.
UnicodeString
    A UnicodeString is a sequence of Unicode codepoints that supports codepoint addressing.
UnicodeWriterToWriter
    Implementation of UnicodeWriter that converts Unicode strings to ordinary Java strings and sends them to a supplied Writer.
WhitespaceString
    This abstract class represents a couple of different implementations of strings containing whitespace only.
ZenoString
    A ZenoString is an implementation of UnicodeString that comprises a list of segments representing substrings of the total string.
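
Several of the classes above store one, two, or three bytes per codepoint depending on the largest codepoint present. A minimal sketch of that selection rule, using only the standard library, might look like the following. The class name, method names, and the big-endian byte order are assumptions made for illustration; they do not describe how the classes in this package are actually implemented.

```java
// Hypothetical demo class; not part of net.sf.saxon.str.
final class WidthSelectionDemo {

    // Smallest fixed width (1, 2 or 3 bytes) able to hold every codepoint in s.
    // An empty string reports width 1.
    static int bytesPerCodepoint(String s) {
        int max = s.codePoints().max().orElse(0);
        if (max <= 0xFF) return 1;    // 8-bit range (Latin-1)
        if (max <= 0xFFFF) return 2;  // 16-bit range (basic multilingual plane)
        return 3;                     // anything up to 24 bits
    }

    // Pack the codepoints of s into a byte array at that fixed width.
    // Big-endian order is an assumption of this sketch.
    static byte[] pack(String s) {
        int width = bytesPerCodepoint(s);
        int[] cps = s.codePoints().toArray();
        byte[] out = new byte[cps.length * width];
        for (int i = 0; i < cps.length; i++) {
            for (int b = 0; b < width; b++) {
                int shift = 8 * (width - 1 - b);
                out[i * width + b] = (byte) (cps[i] >>> shift);
            }
        }
        return out;
    }

    // Read the codepoint at a given index directly, with no scanning.
    static int codepointAt(byte[] data, int width, int index) {
        int cp = 0;
        for (int b = 0; b < width; b++) {
            cp = (cp << 8) | (data[index * width + b] & 0xFF);
        }
        return cp;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00";                // 'a' followed by U+1F600
        int width = bytesPerCodepoint(s);
        byte[] packed = pack(s);
        System.out.println(width);                 // 3
        System.out.println(packed.length);         // 6
        System.out.println(Integer.toHexString(codepointAt(packed, width, 1))); // 1f600
    }
}
```

Because every codepoint occupies the same number of bytes, the codepoint at position i starts at byte offset i times the width, which is what makes direct addressing cheap.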