java.lang.Object
- net.sf.saxon.expr.parser.Tokenizer

```
public final class Tokenizer
extends java.lang.Object
```
Tokenizer for expressions and inputs.
This code was originally derived from James Clark's xt, though it has been greatly modified since. See copyright notice at end of file.

Field Summary

Fields
Modifier and Type	Field	Description
`boolean`	`allowSaxonExtensions`	Flag to allow Saxon extensions
`static int`	`BARE_NAME_STATE`	State in which a name is NOT to be merged with what comes next, for example "("
`int`	`currentToken`	The number identifying the most recently read token
`int`	`currentTokenStartOffset`	The position in the input expression where the current token starts
`java.lang.String`	`currentTokenValue`	The string value of the most recently read token
`static int`	`DEFAULT_STATE`	Initial default state of the Tokenizer
`boolean`	`disallowUnionKeyword`	Flag to disallow "union" as a synonym for "\|" when parsing XSLT 2.0 patterns
`static char`	`FULL_WIDTH_GT`
`static char`	`FULL_WIDTH_LT`
`java.lang.String`	`input`	The string being parsed
`int`	`inputOffset`	The current position within the input string
`boolean`	`isXQuery`	Flag to indicate that this is XQuery as distinct from XPath
`int`	`languageLevel`	XPath language level: e.g. 2.0, 3.0, or 3.1
`static char`	`NUL`
`static int`	`OPERATOR_STATE`	State in which the next thing to be read is an operator
`static int`	`SEQUENCE_TYPE_STATE`	State in which the next thing to be read is a SequenceType

Constructor Summary

Constructors
Constructor	Description
`Tokenizer()`	

Method Summary

Modifier and Type	Method	Description
`void`	`copyTo(Tokenizer u)`	Checkpoint the state of this tokenizer so that unbounded lookahead is possible (or, restore the state of the tokenizer from a checkpoint)
`int`	`getColumnNumber()`	Get the column number of the current token
`int`	`getColumnNumber(int offset)`	Return the column number corresponding to a given offset in the expression
`int`	`getLineNumber()`	Get the line number of the current token
`int`	`getLineNumber(int offset)`	Return the line number corresponding to a given offset in the expression
`int`	`getState()`	Get the current tokenizer state
`void`	`incrementLineNumber(int offset)`	Increment the line number, making a record of where in the input string the newline character occurred.
`void`	`lookAhead()`	Look ahead by one token.
`void`	`next()`	Get the next token from the input expression.
`char`	`nextChar()`	Read next character directly.
`char`	`peekChar()`	Look ahead to see what the next character will be, without changing the current state
`void`	`setState(int state)`	Set the tokenizer into a special state
`boolean`	`thereMightBeAnArrowAhead()`	Return true if there is a thin arrow ("->") somewhere beyond the current position.
`void`	`tokenize(java.lang.String input, int start, int end)`	Prepare a string for tokenization.
`void`	`treatCurrentAsOperator()`	Force the current token to be treated as an operator if possible
`void`	`unreadChar()`	Step back one character.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - FULL_WIDTH_LT
```
public static final char FULL_WIDTH_LT
```
    See Also:
    
    Constant Field Values
  - FULL_WIDTH_GT
```
public static final char FULL_WIDTH_GT
```
    See Also:
    
    Constant Field Values
  - NUL
```
public static final char NUL
```
    See Also:
    
    Constant Field Values
  - DEFAULT_STATE
```
public static final int DEFAULT_STATE
```
    Initial default state of the Tokenizer
    
    See Also:
    
    Constant Field Values
  - BARE_NAME_STATE
```
public static final int BARE_NAME_STATE
```
    State in which a name is NOT to be merged with what comes next, for example "("
    
    See Also:
    
    Constant Field Values
  - SEQUENCE_TYPE_STATE
```
public static final int SEQUENCE_TYPE_STATE
```
    State in which the next thing to be read is a SequenceType
    
    See Also:
    
    Constant Field Values
  - OPERATOR_STATE
```
public static final int OPERATOR_STATE
```
    State in which the next thing to be read is an operator
    
    See Also:
    
    Constant Field Values
  - currentToken
```
public int currentToken
```
    The number identifying the most recently read token
  - currentTokenValue
```
public java.lang.String currentTokenValue
```
    The string value of the most recently read token
  - currentTokenStartOffset
```
public int currentTokenStartOffset
```
    The position in the input expression where the current token starts
  - input
```
public java.lang.String input
```
    The string being parsed
  - inputOffset
```
public int inputOffset
```
    The current position within the input string
  - disallowUnionKeyword
```
public boolean disallowUnionKeyword
```
    Flag to disallow "union" as a synonym for "|" when parsing XSLT 2.0 patterns
  - isXQuery
```
public boolean isXQuery
```
    Flag to indicate that this is XQuery as distinct from XPath
  - languageLevel
```
public int languageLevel
```
    XPath language level: e.g. 2.0, 3.0, or 3.1
  - allowSaxonExtensions
```
public boolean allowSaxonExtensions
```
    Flag to allow Saxon extensions
- Constructor Detail
  - Tokenizer
```
public Tokenizer()
```
- Method Detail
  - getState
```
public int getState()
```
    Get the current tokenizer state
    
    Returns:
    
    the current state
  - setState
```
public void setState(int state)
```
    Set the tokenizer into a special state
    
    Parameters:
    
    state - the new state
  - tokenize
```
public void tokenize(java.lang.String input,
                     int start,
                     int end)
              throws XPathException
```
    Prepare a string for tokenization. The actual tokens are obtained by subsequent calls to next().
    
    Parameters:
    
    input - the string to be tokenized
    
    start - start point within the string
    
    end - end point within the string (last character not read): -1 means end of string
    
    Throws:
    
    XPathException - if a lexical error occurs, e.g. unmatched string quotes
  - next
```
public void next()
          throws XPathException
```
    Get the next token from the input expression. The method returns nothing: the token type is stored in the currentToken field, and the token's string value in currentTokenValue.
    
    Throws:
    
    XPathException - if a lexical error is detected
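
    The calling protocol described for tokenize() and next() can be illustrated with a self-contained toy. This is a sketch only, not Saxon's implementation: the class name ToyTokenizer, the token codes EOF, NAME and SYMBOL, and the helper allTokens are all hypothetical, and the lexical rules are reduced to names and single-character symbols. It shows how a caller prepares the input once, then reads results from the public fields after each call to next().

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of the tokenize()/next() protocol; not Saxon code. */
final class ToyTokenizer {
    // Hypothetical token codes, chosen only for this sketch.
    static final int EOF = 0, NAME = 1, SYMBOL = 2;

    int currentToken = EOF;          // type of the most recently read token
    String currentTokenValue = null; // string value of that token
    int currentTokenStartOffset = 0; // offset where that token starts

    private String input;
    private int inputOffset;
    private int inputLength;

    /** Prepare a string for tokenization; end == -1 means "to the end of the string". */
    void tokenize(String input, int start, int end) {
        this.input = input;
        this.inputOffset = start;
        this.inputLength = (end == -1) ? input.length() : end;
    }

    /** Read one token and publish it through the fields above. */
    void next() {
        // Skip whitespace between tokens.
        while (inputOffset < inputLength && Character.isWhitespace(input.charAt(inputOffset))) {
            inputOffset++;
        }
        currentTokenStartOffset = inputOffset;
        if (inputOffset >= inputLength) {
            currentToken = EOF;
            currentTokenValue = null;
            return;
        }
        char c = input.charAt(inputOffset);
        if (Character.isLetter(c)) {
            int begin = inputOffset;
            while (inputOffset < inputLength && Character.isLetterOrDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NAME;
            currentTokenValue = input.substring(begin, inputOffset);
        } else {
            inputOffset++;
            currentToken = SYMBOL;
            currentTokenValue = String.valueOf(c);
        }
    }

    /** Typical caller loop: prepare once, then call next() until EOF. */
    static List<String> allTokens(String expression) {
        ToyTokenizer t = new ToyTokenizer();
        t.tokenize(expression, 0, -1);
        List<String> values = new ArrayList<>();
        for (t.next(); t.currentToken != EOF; t.next()) {
            values.add(t.currentTokenValue);
        }
        return values;
    }
}
```

    The real class additionally throws XPathException on lexical errors and tracks tokenizer state; both are omitted here to keep the protocol visible.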
  - thereMightBeAnArrowAhead
```
public boolean thereMightBeAnArrowAhead()
```
    Return true if there is a thin arrow ("->") somewhere beyond the current position. This can be used to eliminate unnecessary lookahead
    
    Returns:
    
    true if a thin arrow is present. Of course, this might be a false positive.
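
    A cheap pre-check of this kind could be written as a plain substring scan. The sketch below is an assumption about the general approach, not a copy of Saxon's code; the class name ArrowScanSketch and the static two-argument signature are hypothetical. It also shows why a false positive is possible: the scan does not know whether the arrow sits inside a string literal or comment.

```java
/** Hypothetical sketch of a "might there be an arrow ahead" pre-check; not Saxon code. */
final class ArrowScanSketch {
    /**
     * Return true if the two characters "->" occur anywhere at or after inputOffset.
     * The scan is purely textual, so an arrow inside a string literal also matches.
     */
    static boolean thereMightBeAnArrowAhead(String input, int inputOffset) {
        return input.indexOf("->", inputOffset) >= 0;
    }
}
```

    A parser can skip expensive speculative lookahead whenever this returns false, and only pay for it when an arrow may really be present.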
  - treatCurrentAsOperator
```
public void treatCurrentAsOperator()
```
    Force the current token to be treated as an operator if possible
  - lookAhead
```
public void lookAhead()
               throws XPathException
```
    Look ahead by one token. This method does the real tokenization work. The method is normally called internally, but the XQuery parser also calls it to resume normal tokenization after dealing with pseudo-XML syntax.
    
    Throws:
    
    XPathException - if a lexical error occurs
  - nextChar
```
public char nextChar()
```
    Read next character directly. Used by the XQuery parser when parsing pseudo-XML syntax
    
    Returns:
    
    the next character from the input, or NUL at the end of the input
  - peekChar
```
public char peekChar()
```
    Look ahead to see what the next character will be, without changing the current state
    
    Returns:
    
    the next character, or NUL at the end of the input.
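
    The contract shared by nextChar() and peekChar() (consume versus inspect, with NUL signalling the end of the input) can be sketched as follows. The class CharCursorSketch and its offset() accessor are hypothetical, and the value '\0' for NUL is an assumption; this page does not state the constant's value.

```java
/** Hypothetical stand-in for a tokenizer's character-level cursor; not Saxon code. */
final class CharCursorSketch {
    static final char NUL = '\0'; // assumed sentinel value for "no more input"

    private final String input;
    private int inputOffset = 0;

    CharCursorSketch(String input) {
        this.input = input;
    }

    /** Consume and return the next character, or NUL at the end of the input. */
    char nextChar() {
        if (inputOffset >= input.length()) {
            return NUL;
        }
        return input.charAt(inputOffset++);
    }

    /** Return the next character without consuming it, or NUL at the end of the input. */
    char peekChar() {
        return inputOffset < input.length() ? input.charAt(inputOffset) : NUL;
    }

    /** Current position within the input string. */
    int offset() {
        return inputOffset;
    }
}
```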
  - incrementLineNumber
```
public void incrementLineNumber(int offset)
```
    Increment the line number, making a record of where in the input string the newline character occurred.
    
    Parameters:
    
    offset - the place in the input string where the newline occurred
  - unreadChar
```
public void unreadChar()
```
    Step back one character. If this steps back to a previous line, adjust the line number. If we have already read off the end of the input, do nothing.
  - copyTo
```
public void copyTo(Tokenizer u)
```
    Checkpoint the state of this tokenizer so that unbounded lookahead is possible (or, restore the state of the tokenizer from a checkpoint)
    
    Parameters:
    
    u - When checkpointing, a Tokenizer used simply to hold the state so that it can be restored later. This tokenizer is not capable of active tokenizing because many of its variables are uninitialised. When restoring from a checkpoint, the original tokenizer whose state is to be restored.
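
    The checkpoint-and-restore pattern that copyTo supports can be sketched with a reduced state holder. The class CheckpointSketch and its demo method are hypothetical, and only four of the fields listed on this page are copied; the real method presumably copies all of the tokenizer's mutable state.

```java
/** Hypothetical reduction of the copyTo checkpoint pattern; not Saxon code. */
final class CheckpointSketch {
    String input;
    int inputOffset;
    int currentToken;
    String currentTokenValue;

    /** Copy this object's state into u. The same method serves both directions. */
    void copyTo(CheckpointSketch u) {
        u.input = input;
        u.inputOffset = inputOffset;
        u.currentToken = currentToken;
        u.currentTokenValue = currentTokenValue;
    }

    /** Demonstration: advance speculatively, then roll back to the checkpoint. */
    static int demo() {
        CheckpointSketch live = new CheckpointSketch();
        live.input = "a -> b";
        live.inputOffset = 1;

        CheckpointSketch saved = new CheckpointSketch();
        live.copyTo(saved);      // checkpoint: saved only holds state
        live.inputOffset = 4;    // pretend several tokens were read ahead
        saved.copyTo(live);      // restore: live resumes from the checkpoint
        return live.inputOffset; // the checkpointed offset again
    }
}
```

    Using one method for both directions keeps the list of copied fields in a single place, so a checkpoint and its restore cannot drift apart.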
  - getLineNumber
```
public int getLineNumber()
```
    Get the line number of the current token
    
    Returns:
    
    the line number. Line numbers reported by the tokenizer start at zero.
  - getColumnNumber
```
public int getColumnNumber()
```
    Get the column number of the current token
    
    Returns:
    
    the column number. Column numbers reported by the tokenizer start at zero.
  - getLineNumber
```
public int getLineNumber(int offset)
```
    Return the line number corresponding to a given offset in the expression
    
    Parameters:
    
    offset - the character offset in the expression (an index into the input string)
    
    Returns:
    
    the line number. Line and column numbers reported by the tokenizer start at zero.
  - getColumnNumber
```
public int getColumnNumber(int offset)
```
    Return the column number corresponding to a given offset in the expression
    
    Parameters:
    
    offset - the character offset in the expression (an index into the input string)
    
    Returns:
    
    the column number. Line and column numbers reported by the tokenizer start at zero.
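
    One way to map an offset to a zero-based line and column is to record the offset of every newline as it is read, which is the role incrementLineNumber(int offset) plays, and then count the recorded newlines that precede the offset. The sketch below assumes that approach; the class LinePositionSketch is hypothetical and the exact arithmetic in Saxon may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of offset-to-position lookup; not Saxon code. */
final class LinePositionSketch {
    // Offsets of the newline characters seen so far, in ascending order.
    private final List<Integer> newlineOffsets = new ArrayList<>();

    /** Record that a newline character occurred at the given offset. */
    void incrementLineNumber(int offset) {
        newlineOffsets.add(offset);
    }

    /** Zero-based line number: the count of newlines strictly before the offset. */
    int getLineNumber(int offset) {
        int line = 0;
        for (int nl : newlineOffsets) {
            if (nl < offset) {
                line++;
            } else {
                break;
            }
        }
        return line;
    }

    /** Zero-based column number: distance from the start of the containing line. */
    int getColumnNumber(int offset) {
        int lineStart = 0;
        for (int nl : newlineOffsets) {
            if (nl < offset) {
                lineStart = nl + 1; // the line begins just after the newline
            } else {
                break;
            }
        }
        return offset - lineStart;
    }
}
```

    For the input "ab\ncd" the newline is at offset 2, so after incrementLineNumber(2) the character 'c' at offset 3 is on line 1, column 0.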
