Package net.sf.saxon.expr.parser
Class Tokenizer
- java.lang.Object
-
- net.sf.saxon.expr.parser.Tokenizer
-
public final class Tokenizer extends java.lang.Object
Tokenizer for expressions and inputs. This code was originally derived from James Clark's xt, though it has been greatly modified since. See copyright notice at end of file.
-
-
Field Summary
Fields (modifier and type, field, description)
boolean allowSaxonExtensions
Flag to allow Saxon extensions
static int BARE_NAME_STATE
State in which a name is NOT to be merged with what comes next, for example "("
int currentToken
The number identifying the most recently read token
int currentTokenStartOffset
The position in the input expression where the current token starts
java.lang.String currentTokenValue
The string value of the most recently read token
static int DEFAULT_STATE
Initial default state of the Tokenizer
boolean disallowUnionKeyword
Flag to disallow "union" as a synonym for "|" when parsing XSLT 2.0 patterns
static char FULL_WIDTH_GT
static char FULL_WIDTH_LT
java.lang.String input
The string being parsed
int inputOffset
The current position within the input string
boolean isXQuery
Flag to indicate that this is XQuery as distinct from XPath
int languageLevel
XPath language level: e.g. 2.0, 3.0, or 3.1
static char NUL
static int OPERATOR_STATE
State in which the next thing to be read is an operator
static int SEQUENCE_TYPE_STATE
State in which the next thing to be read is a SequenceType
-
Constructor Summary
Constructors (constructor, description)
Tokenizer()
-
Method Summary
Methods (modifier and type, method, description)
void copyTo(Tokenizer u)
Checkpoint the state of this tokenizer so that unbounded lookahead is possible (or, restore the state of the tokenizer from a checkpoint)
int getColumnNumber()
Get the column number of the current token
int getColumnNumber(int offset)
Return the column number corresponding to a given offset in the expression
int getLineNumber()
Get the line number of the current token
int getLineNumber(int offset)
Return the line number corresponding to a given offset in the expression
int getState()
Get the current tokenizer state
void incrementLineNumber(int offset)
Increment the line number, making a record of where in the input string the newline character occurred.
void lookAhead()
Look ahead by one token.
void next()
Get the next token from the input expression.
char nextChar()
Read next character directly.
char peekChar()
Look ahead to see what the next character will be, without changing the current state
void setState(int state)
Set the tokenizer into a special state
boolean thereMightBeAnArrowAhead()
Return true if there is a thin arrow ("->") somewhere beyond the current position.
void tokenize(java.lang.String input, int start, int end)
Prepare a string for tokenization.
void treatCurrentAsOperator()
Force the current token to be treated as an operator if possible
void unreadChar()
Step back one character.
-
-
-
Field Detail
-
FULL_WIDTH_LT
public static final char FULL_WIDTH_LT
- See Also:
- Constant Field Values
-
FULL_WIDTH_GT
public static final char FULL_WIDTH_GT
- See Also:
- Constant Field Values
-
NUL
public static final char NUL
- See Also:
- Constant Field Values
-
DEFAULT_STATE
public static final int DEFAULT_STATE
Initial default state of the Tokenizer
- See Also:
- Constant Field Values
-
BARE_NAME_STATE
public static final int BARE_NAME_STATE
State in which a name is NOT to be merged with what comes next, for example "("
- See Also:
- Constant Field Values
-
SEQUENCE_TYPE_STATE
public static final int SEQUENCE_TYPE_STATE
State in which the next thing to be read is a SequenceType
- See Also:
- Constant Field Values
-
OPERATOR_STATE
public static final int OPERATOR_STATE
State in which the next thing to be read is an operator
- See Also:
- Constant Field Values
-
currentToken
public int currentToken
The number identifying the most recently read token
-
currentTokenValue
public java.lang.String currentTokenValue
The string value of the most recently read token
-
currentTokenStartOffset
public int currentTokenStartOffset
The position in the input expression where the current token starts
-
input
public java.lang.String input
The string being parsed
-
inputOffset
public int inputOffset
The current position within the input string
-
disallowUnionKeyword
public boolean disallowUnionKeyword
Flag to disallow "union" as a synonym for "|" when parsing XSLT 2.0 patterns
-
isXQuery
public boolean isXQuery
Flag to indicate that this is XQuery as distinct from XPath
-
languageLevel
public int languageLevel
XPath language level: e.g. 2.0, 3.0, or 3.1
-
allowSaxonExtensions
public boolean allowSaxonExtensions
Flag to allow Saxon extensions
-
-
Method Detail
-
getState
public int getState()
Get the current tokenizer state
- Returns:
- the current state
-
setState
public void setState(int state)
Set the tokenizer into a special state
- Parameters:
state
- the new state
-
tokenize
public void tokenize(java.lang.String input, int start, int end) throws XPathException
Prepare a string for tokenization. The actual tokens are obtained by calls on next().
- Parameters:
input
- the string to be tokenized
start
- start point within the string
end
- end point within the string (last character not read): -1 means end of string
- Throws:
XPathException
- if a lexical error occurs, e.g. unmatched string quotes
-
next
public void next() throws XPathException
Get the next token from the input expression. The type of token is returned in the currentToken variable, the string value of the token in currentTokenValue.
- Throws:
XPathException
- if a lexical error is detected
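The pull-style protocol described for tokenize() and next() can be illustrated with a small self-contained sketch. MiniTokenizer below is a hypothetical stand-in, not the Saxon class: its token codes (EOF, NAME, NUMBER, SYMBOL) and its lexical rules are assumptions made purely for illustration. Only the field names currentToken, currentTokenValue and currentTokenStartOffset, and the convention that end = -1 means end of string, mirror the documentation above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration only; not net.sf.saxon.expr.parser.Tokenizer. */
final class MiniTokenizer {
    // Token codes are assumptions made for this sketch.
    static final int EOF = 0, NAME = 1, NUMBER = 2, SYMBOL = 3;

    int currentToken = EOF;
    String currentTokenValue = null;
    int currentTokenStartOffset = 0;

    private String input = "";
    private int inputOffset = 0;
    private int inputLength = 0;

    /** Prepare a string for tokenization; end == -1 means "up to the end of the string". */
    void tokenize(String input, int start, int end) {
        this.input = input;
        this.inputOffset = start;
        this.inputLength = (end == -1) ? input.length() : end;
    }

    /** Read the next token into currentToken and currentTokenValue. */
    void next() {
        // Skip whitespace between tokens.
        while (inputOffset < inputLength && Character.isWhitespace(input.charAt(inputOffset))) {
            inputOffset++;
        }
        currentTokenStartOffset = inputOffset;
        if (inputOffset >= inputLength) {
            currentToken = EOF;
            currentTokenValue = null;
            return;
        }
        char c = input.charAt(inputOffset);
        if (Character.isLetter(c)) {
            while (inputOffset < inputLength && Character.isLetterOrDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NAME;
        } else if (Character.isDigit(c)) {
            while (inputOffset < inputLength && Character.isDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NUMBER;
        } else {
            // Any other character is treated as a one-character symbol.
            inputOffset++;
            currentToken = SYMBOL;
        }
        currentTokenValue = input.substring(currentTokenStartOffset, inputOffset);
    }

    /** Drive the tokenizer to EOF and collect the token values. */
    static List<String> allValues(String expression) {
        MiniTokenizer t = new MiniTokenizer();
        t.tokenize(expression, 0, -1);
        List<String> values = new ArrayList<>();
        for (t.next(); t.currentToken != EOF; t.next()) {
            values.add(t.currentTokenValue);
        }
        return values;
    }
}
```

The caller never receives a token object: each call to next() overwrites the public fields, which is why a parser built on this pattern reads currentToken and currentTokenValue immediately after each call.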
-
thereMightBeAnArrowAhead
public boolean thereMightBeAnArrowAhead()
Return true if there is a thin arrow ("->") somewhere beyond the current position. This can be used to eliminate unnecessary lookahead.
- Returns:
- true if a thin arrow is present. Of course, this might be a false positive.
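The idea of a cheap pre-check that tolerates false positives can be sketched as follows. This illustrates the general technique and is not Saxon's implementation: the class ArrowPrecheck and the method mightContainArrow are hypothetical names, and the check simply searches the remaining input for the two characters.

```java
/** Hypothetical helper for illustration; not part of the Saxon API. */
final class ArrowPrecheck {
    /**
     * Returns true if "->" occurs at or after the given offset.
     * The text is not tokenized, so an arrow inside a string literal
     * also counts; this is one way a false positive can arise.
     */
    static boolean mightContainArrow(String input, int offset) {
        return input.indexOf("->", offset) >= 0;
    }
}
```

A false result is reliable (no arrow can follow), so a parser could skip an expensive speculative parse in that case and fall back to full lookahead only when the check returns true.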
-
treatCurrentAsOperator
public void treatCurrentAsOperator()
Force the current token to be treated as an operator if possible
-
lookAhead
public void lookAhead() throws XPathException
Look ahead by one token. This method does the real tokenization work. The method is normally called internally, but the XQuery parser also calls it to resume normal tokenization after dealing with pseudo-XML syntax.
- Throws:
XPathException
- if a lexical error occurs
-
nextChar
public char nextChar()
Read next character directly. Used by the XQuery parser when parsing pseudo-XML syntax.
- Returns:
- the next character from the input, or NUL at the end of the input
-
peekChar
public char peekChar()
Look ahead to see what the next character will be, without changing the current state.
- Returns:
- the next character, or NUL at the end of the input.
-
incrementLineNumber
public void incrementLineNumber(int offset)
Increment the line number, making a record of where in the input string the newline character occurred.
- Parameters:
offset
- the place in the input string where the newline occurred
-
unreadChar
public void unreadChar()
Step back one character. If this steps back to a previous line, adjust the line number. If we have already read off the end of the input, do nothing.
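The character-level operations nextChar(), peekChar() and unreadChar() can be pictured with the following minimal sketch. CharCursor is a hypothetical class, not the Saxon Tokenizer. The value '\0' for NUL is an assumption (the constant's value is not shown here), and the readPastEnd flag is just one plausible reading of "if we have already read off the end of the input, do nothing". Line-number bookkeeping is deliberately left out.

```java
/** Hypothetical character cursor for illustration; not the Saxon Tokenizer. */
final class CharCursor {
    // Assumed sentinel value; the real constant's value is not shown in this document.
    static final char NUL = '\0';

    private final String input;
    private int offset = 0;
    private boolean readPastEnd = false;

    CharCursor(String input) {
        this.input = input;
    }

    /** Consume and return the next character, or NUL at the end of the input. */
    char nextChar() {
        if (offset >= input.length()) {
            readPastEnd = true;
            return NUL;
        }
        return input.charAt(offset++);
    }

    /** Return the next character without consuming it, or NUL at the end of the input. */
    char peekChar() {
        return offset < input.length() ? input.charAt(offset) : NUL;
    }

    /** Step back one character; a no-op at the start or once the cursor has read off the end. */
    void unreadChar() {
        if (readPastEnd || offset == 0) {
            return;
        }
        offset--;
    }
}
```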
-
copyTo
public void copyTo(Tokenizer u)
Checkpoint the state of this tokenizer so that unbounded lookahead is possible (or, restore the state of the tokenizer from a checkpoint).
- Parameters:
u
- When checkpointing, a Tokenizer used simply to hold the state so that it can be restored later. This tokenizer is not capable of active tokenizing because many of its variables are uninitialised. When restoring from a checkpoint, the original tokenizer whose state is to be restored.
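The two directions of copyTo (checkpoint and restore) can be sketched with a reduced, hypothetical scanner. CheckpointScanner and occursAhead are illustrative names, and only two state fields are copied here; which fields the real method copies is not stated in this document.

```java
/** Hypothetical scanner for illustration; not the Saxon Tokenizer. */
final class CheckpointScanner {
    String input;
    int inputOffset;

    /** Copy this scanner's state into u (used both to checkpoint and to restore). */
    void copyTo(CheckpointScanner u) {
        u.input = input;
        u.inputOffset = inputOffset;
    }

    /** Speculatively scan ahead for a character, then restore the original position. */
    static boolean occursAhead(CheckpointScanner s, char wanted) {
        CheckpointScanner checkpoint = new CheckpointScanner();
        s.copyTo(checkpoint);            // checkpoint: live scanner -> holder
        boolean found = false;
        while (s.inputOffset < s.input.length()) {
            if (s.input.charAt(s.inputOffset++) == wanted) {
                found = true;
                break;
            }
        }
        checkpoint.copyTo(s);            // restore: holder -> live scanner
        return found;
    }
}
```

The holder object is never used for scanning; it only carries state, which matches the description of the parameter u as a tokenizer that is "not capable of active tokenizing".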
-
getLineNumber
public int getLineNumber()
Get the line number of the current token
- Returns:
- the line number. Line numbers reported by the tokenizer start at zero.
-
getColumnNumber
public int getColumnNumber()
Get the column number of the current token
- Returns:
- the column number. Column numbers reported by the tokenizer start at zero.
-
getLineNumber
public int getLineNumber(int offset)
Return the line number corresponding to a given offset in the expression
- Parameters:
offset
- the character offset in the expression
- Returns:
- the line number. Line and column numbers reported by the tokenizer start at zero.
-
getColumnNumber
public int getColumnNumber(int offset)
Return the column number corresponding to a given offset in the expression
- Parameters:
offset
- the character offset in the expression
- Returns:
- the column number. Line and column numbers reported by the tokenizer start at zero.
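The relationship between incrementLineNumber(int offset) and the offset-based getLineNumber and getColumnNumber methods can be sketched as follows, assuming the tokenizer keeps a list of the offsets at which newlines occurred. LineIndex is a hypothetical class, and the exact arithmetic is an assumption chosen to produce zero-based results; it is not taken from the Saxon source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical line index for illustration; not the Saxon implementation. */
final class LineIndex {
    private final List<Integer> newlineOffsets = new ArrayList<>();

    /** Record the offset of a newline character; offsets must arrive in increasing order. */
    void incrementLineNumber(int offset) {
        newlineOffsets.add(offset);
    }

    /** Zero-based line number: the count of recorded newlines before the offset. */
    int getLineNumber(int offset) {
        int line = 0;
        while (line < newlineOffsets.size() && newlineOffsets.get(line) < offset) {
            line++;
        }
        return line;
    }

    /** Zero-based column number: the distance from the start of the containing line. */
    int getColumnNumber(int offset) {
        int line = getLineNumber(offset);
        int lineStart = (line == 0) ? 0 : newlineOffsets.get(line - 1) + 1;
        return offset - lineStart;
    }

    /** Build an index by scanning a whole string for newline characters. */
    static LineIndex of(String input) {
        LineIndex index = new LineIndex();
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == '\n') {
                index.incrementLineNumber(i);
            }
        }
        return index;
    }
}
```

Because both results are zero-based, a caller producing human-readable messages would typically add one to each before display.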
-
-