public class REMatcher extends Object

    RE r = new RE("a*b");

Once you have done this, you can call either of the RE.match methods to perform matching on a String. For example:

    boolean matched = r.match("aaaab");

will set the boolean matched to true, because the pattern "a*b" matches the string "aaaab". If you were interested in the number of a's matched by the first part of the expression, you could change the expression to "(a*)b". Compiling that expression and matching it against a string such as "xaaaab" then gives results like these:

    RE r = new RE("(a*)b");                  // Compile expression
    boolean matched = r.match("xaaaab");     // Match against "xaaaab"

    String wholeExpr = r.getParen(0);        // wholeExpr will be 'aaaab'
    String insideParens = r.getParen(1);     // insideParens will be 'aaaa'

    int startWholeExpr = r.getParenStart(0); // startWholeExpr will be index 1
    int endWholeExpr = r.getParenEnd(0);     // endWholeExpr will be index 6
    int lenWholeExpr = r.getParenLength(0);  // lenWholeExpr will be 5

    int startInside = r.getParenStart(1);    // startInside will be index 1
    int endInside = r.getParenEnd(1);        // endInside will be index 5
    int lenInside = r.getParenLength(1);     // lenInside will be 4

You can also refer to the contents of a parenthesized expression within the regular expression itself. This is called a 'backreference'. The first backreference in a regular expression is denoted by \1, the second by \2, and so on. So the expression:

    ([0-9]+)=\1

will match any string of the form n=n (such as 0=0 or 2=2). The full regular expression syntax accepted by RE is defined in the XSD 1.1 specification, as modified by the XPath 2.0 or 3.0 specifications.

Line terminators

    // Pre-compiled regular expression "a*b"
    char[] re1Instructions = {
        0x007c, 0x0000, 0x001a, 0x007c, 0x0000, 0x000d, 0x0041,
        0x0001, 0x0004, 0x0061, 0x007c, 0x0000, 0x0003, 0x0047,
        0x0000, 0xfff6, 0x007c, 0x0000, 0x0003, 0x004e, 0x0000,
        0x0003, 0x0041, 0x0001, 0x0004, 0x0062, 0x0045, 0x0000,
        0x0000,
    };
    REProgram re1 = new REProgram(re1Instructions);

You can then construct a regular expression matcher (RE) object from the pre-compiled expression re1 and thus avoid the overhead of compiling the expression at runtime. If you need more dynamic regular expressions, you can construct a single RECompiler object and reuse it to compile each expression. Similarly, you can change the program run by a given matcher object at any time. However, RE and RECompiler are not thread-safe (for efficiency reasons, and because thread safety is rarely needed for these classes), so you will need to construct a separate compiler or matcher object for each thread unless you synchronize access yourself. Once an expression has been compiled into an REProgram object, that REProgram can be safely shared across multiple threads and RE objects.
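The sharing pattern described above (compile once, then give each thread its own matcher) can be sketched with the JDK's java.util.regex package. It is used here purely as a stand-in, on the assumption that its roles line up: Pattern is documented as immutable and thread-safe (like a shared REProgram), while Matcher is not (like RE). This is not REMatcher's API, and the names SharedProgramDemo, matchesWhole and runInParallel are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: java.util.regex stands in for REProgram/REMatcher; names are hypothetical.
public class SharedProgramDemo {
    // Compiled once and shared by all threads (Pattern is immutable and thread-safe),
    // playing the role the text gives to a shared REProgram.
    private static final Pattern SHARED = Pattern.compile("a*b");

    // Each call creates its own Matcher, because Matcher instances are not thread-safe;
    // this mirrors "one matcher object per thread".
    static boolean matchesWhole(String input) {
        Matcher m = SHARED.matcher(input);
        return m.matches();
    }

    // Runs matchesWhole on a small thread pool and collects results in input order.
    static List<Boolean> runInParallel(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String s : inputs) {
                Callable<Boolean> task = () -> matchesWhole(s);
                futures.add(pool.submit(task));
            }
            List<Boolean> results = new ArrayList<>();
            for (Future<Boolean> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runInParallel(Arrays.asList("aaaab", "b", "xaaaab", "aaa")));
    }
}
```

Running main prints [true, true, false, false]: the single shared Pattern is never mutated, so no synchronization is needed, while each task's private Matcher holds the per-match state.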
This library is based on the Apache Jakarta regex library as downloaded on 3 January 2012. Changes have been made to make the grammar and semantics conform to XSD and XPath rules; these changes are listed in source code comments in the RECompiler source code module.
See Also: RECompiler
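The capture-group offsets from the "(a*)b" walkthrough and the ([0-9]+)=\1 backreference can be checked with java.util.regex, used here only as a familiar stand-in for RE. In this sketch Matcher.find() plays the part of an unanchored match and Matcher.matches() that of a whole-string match; the JDK's syntax is close to, but not identical with, the XSD/XPath dialect that RE accepts. GroupDemo, describeGroups and isNEqualsN are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: java.util.regex stands in for RE/REMatcher; names are hypothetical.
public class GroupDemo {
    // Lists each group's text and [start, end) offsets after an unanchored search,
    // analogous to getParen / getParenStart / getParenEnd in the text above.
    static String describeGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int g = 0; g <= m.groupCount(); g++) {
            sb.append(g).append(": '").append(m.group(g)).append("' [")
              .append(m.start(g)).append(", ").append(m.end(g)).append(")\n");
        }
        return sb.toString();
    }

    // \1 must repeat exactly what group 1 captured; matches() requires the whole input to match.
    static boolean isNEqualsN(String input) {
        return Pattern.compile("([0-9]+)=\\1").matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.print(describeGroups("(a*)b", "xaaaab"));
        // 0: 'aaaab' [1, 6)
        // 1: 'aaaa' [1, 5)
        System.out.println(isNEqualsN("2=2") + " " + isNEqualsN("2=3")); // true false
    }
}
```

The offsets agree with the values listed for getParenStart and getParenEnd: the whole match spans indices 1 to 6 (length 5) and group 1 spans 1 to 5 (length 4).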
Constructor and Description |
---|
REMatcher(REProgram program): Construct a matcher for a pre-compiled regular expression from program (bytecode) data. |
Modifier and Type | Method and Description |
---|---|
boolean | anchoredMatch(UnicodeString search): Tests whether the regex matches a string in its entirety, anchored at both ends. |
UnicodeString | getParen(int which): Gets the contents of a parenthesized subexpression after a successful match. |
int | getParenCount(): Returns the number of parenthesized subexpressions available after a successful match. |
int | getParenEnd(int which): Returns the end index of a given paren level. |
int | getParenStart(int which): Returns the start index of a given paren level. |
REProgram | getProgram(): Returns the current regular expression program in use by this matcher object. |
boolean | match(String search): Matches the current regular expression program against a String. |
boolean | match(UnicodeString search, int i): Matches the current regular expression program against a string, starting at a given index. |
protected boolean | matchAt(int i, boolean anchored): Matches the current regular expression program against the current input string, starting at index i of the input string. |
CharSequence | replace(UnicodeString in, UnicodeString replacement): Substitutes a replacement string for matches of this regular expression in another string. |
protected void | setParenEnd(int which, int i): Sets the end of a paren level. |
protected void | setParenStart(int which, int i): Sets the start of a paren level. |
void | setProgram(REProgram program): Sets the current regular expression program used by this matcher object. |
List<UnicodeString> | split(UnicodeString s): Splits a string into a list of strings on regular expression boundaries. |
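The difference between anchoredMatch (the whole string must match), match(search, i) (search from index i), and split (which can yield a leading empty string) can be illustrated with JDK equivalents. This is an analogy, not REMatcher itself: Matcher.matches(), Matcher.find(int) and Pattern.split are swapped in, and Java's own rules (for example, dropping trailing empty strings from split) may differ from REMatcher's. MatchModesDemo and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: JDK Matcher/Pattern methods used as analogues of REMatcher operations.
public class MatchModesDemo {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Whole-string test, comparable in spirit to anchoredMatch.
    static boolean anchored(String s) {
        return DIGITS.matcher(s).matches();
    }

    // Unanchored search beginning at index start, comparable in spirit to match(search, i).
    static boolean searchFrom(String s, int start) {
        return DIGITS.matcher(s).find(start);
    }

    // Splits on the pattern; a non-empty match at index 0 produces a leading empty string.
    static List<String> splitOn(String regex, String s) {
        return Arrays.asList(Pattern.compile(regex).split(s));
    }

    public static void main(String[] args) {
        System.out.println(anchored("abc123"));      // false
        System.out.println(searchFrom("abc123", 0)); // true
        System.out.println(searchFrom("123abc", 3)); // false
        System.out.println(splitOn(",", ",a,b"));    // [, a, b]
    }
}
```

The last line shows the same effect the split documentation warns about: because the pattern matches the very first character of ",a,b", the first element of the result is empty.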
public REMatcher(REProgram program)
Parameters:
program - Compiled regular expression program
See Also: RECompiler
public void setProgram(REProgram program)
Parameters:
program - Regular expression program compiled by RECompiler.
See Also: RECompiler, REProgram
public REProgram getProgram()
See Also: setProgram(net.sf.saxon.regex.REProgram)
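Swapping the program used by an existing matcher, as setProgram and getProgram allow, has a close JDK analogue in Matcher.usePattern and Matcher.pattern. The sketch below uses those JDK methods as a stand-in under that assumption; it is not REMatcher code, and SwapProgramDemo and swapAndMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: Matcher.usePattern / Matcher.pattern stand in for setProgram / getProgram.
public class SwapProgramDemo {
    static String swapAndMatch(String input) {
        Matcher m = Pattern.compile("a*b").matcher(input);
        boolean before = m.matches();

        // Replace the compiled pattern used by this same matcher object.
        m.usePattern(Pattern.compile("x?a+b"));
        m.reset(); // restart from the beginning; usePattern keeps the current position
        boolean after = m.matches();

        // pattern() reports which compiled pattern the matcher now uses.
        return before + " " + after + " " + m.pattern().pattern();
    }

    public static void main(String[] args) {
        System.out.println(swapAndMatch("xaaaab")); // false true x?a+b
    }
}
```

The matcher object and its input are reused; only the compiled program changes, which is the reuse pattern the class description mentions for matcher objects.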
public int getParenCount()
public UnicodeString getParen(int which)
Parameters:
which - Nesting level of subexpression
public final int getParenStart(int which)
Parameters:
which - Nesting level of subexpression
public final int getParenEnd(int which)
Parameters:
which - Nesting level of subexpression
protected final void setParenStart(int which, int i)
Parameters:
which - Which paren level
i - Index in input array
protected final void setParenEnd(int which, int i)
Parameters:
which - Which paren level
i - Index in input array
protected boolean matchAt(int i, boolean anchored)
Parameters:
i - The input string index to start matching at
anchored - true if the regex must match all characters up to the end of the string
public boolean anchoredMatch(UnicodeString search)
Parameters:
search - the string to be matched
public boolean match(UnicodeString search, int i)
Parameters:
search - String to match against
i - Index to start searching at
public boolean match(String search)
Parameters:
search - String to match against
public List<UnicodeString> split(UnicodeString s)
Note that the first string in the resulting list may be empty; this happens when the pattern matches the very first character of the input string.
Parameters:
s - String to split on this regular expression
public CharSequence replace(UnicodeString in, UnicodeString replacement)
Parameters:
in - String to substitute within
replacement - String to substitute for matches of this regular expression

Copyright (c) 2004-2014 Saxonica Limited. All rights reserved.