net.sf.saxon.java
Class JDK14RegexTranslator

java.lang.Object
  extended by net.sf.saxon.regex.RegexTranslator
      extended by net.sf.saxon.regex.SurrogateRegexTranslator
          extended by net.sf.saxon.java.JDK14RegexTranslator

public class JDK14RegexTranslator
extends SurrogateRegexTranslator

This class translates XML Schema regex syntax into JDK 1.4 regex syntax. Author: James Clark, Thai Open Source Software Center Ltd (see the statement at the end of the file). Modified by Michael Kay to (a) integrate the code into Saxon and (b) support the XPath additions to the XML Schema regex syntax.

This version of the regular expression translator treats each half of a surrogate pair as a separate character, translating anything in an XPath regex that can match a non-BMP character into a Java regex that matches the two halves of a surrogate pair independently. This approach doesn't work under JDK 1.5, whose regex engine treats a surrogate pair as a single character.
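
The difference between the two views of a surrogate pair can be observed with the JDK alone. The following is a minimal sketch, not part of Saxon: the class SurrogatePairDemo and its method names are hypothetical, and it assumes a Java 5 or later runtime, where java.util.regex treats a surrogate pair as one character.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it does not use or depend on Saxon.
public class SurrogatePairDemo {

    // U+10000, the first non-BMP code point, stored in UTF-16 as a surrogate pair.
    static final String NON_BMP = "\uD800\uDC00";

    // Number of UTF-16 code units: the "characters" seen by an engine
    // that treats each half of a surrogate pair separately.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points: the "characters" seen by an engine
    // that treats a surrogate pair as a single character.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if the whole string matches exactly one regex "any character".
    static boolean matchesOneDot(String s) {
        return Pattern.matches(".", s);
    }

    // True if the whole string matches exactly two regex "any character" atoms.
    static boolean matchesTwoDots(String s) {
        return Pattern.matches("..", s);
    }

    public static void main(String[] args) {
        System.out.println("code units:  " + codeUnits(NON_BMP));
        System.out.println("code points: " + codePoints(NON_BMP));
        System.out.println("matches '.':  " + matchesOneDot(NON_BMP));
        System.out.println("matches '..': " + matchesTwoDots(NON_BMP));
    }
}
```

On such a runtime the pair counts as two code units but one code point, and it matches a single "." rather than two, which is why a translation that emits one atom per surrogate half no longer matches as intended.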


Nested Class Summary
 
Nested classes/interfaces inherited from class net.sf.saxon.regex.SurrogateRegexTranslator
SurrogateRegexTranslator.BackReference, SurrogateRegexTranslator.CharClass, SurrogateRegexTranslator.CharRange, SurrogateRegexTranslator.Complement, SurrogateRegexTranslator.Dot, SurrogateRegexTranslator.Empty, SurrogateRegexTranslator.Property, SurrogateRegexTranslator.SimpleCharClass, SurrogateRegexTranslator.SingleChar, SurrogateRegexTranslator.WideSingleChar
 
Nested classes/interfaces inherited from class net.sf.saxon.regex.RegexTranslator
RegexTranslator.Range
 
Field Summary
 
Fields inherited from class net.sf.saxon.regex.SurrogateRegexTranslator
categoryCharClasses, subCategoryCharClasses
 
Fields inherited from class net.sf.saxon.regex.RegexTranslator
ALL, captures, caseBlind, curChar, currentCapture, eos, ignoreWhitespace, inCharClassExpr, isXPath, length, NONE, NOT_ALLOWED_CLASS, pos, regExp, result, SOME, SURROGATES1_CLASS, SURROGATES2_CLASS, xmlVersion
 
Constructor Summary
JDK14RegexTranslator()
          Create a regex translator for JDK 1.4
 
Method Summary
static void main(java.lang.String[] args)
          Diagnostic entry point
 void setIgnoreWhitespace(boolean ignore)
          Indicate whether whitespace should be ignored
 java.lang.String translate(java.lang.CharSequence regExp, int xmlVersion, boolean xpath)
          Translates a regular expression in the syntax of XML Schemas Part 2 into a regular expression in the syntax of java.util.regex.Pattern.
protected  boolean translateAtom()
           
 
Methods inherited from class net.sf.saxon.regex.RegexTranslator
absorbSurrogatePair, advance, copyCurChar, expect, highSurrogateRanges, isAsciiAlnum, isBlock, isJavaMetaChar, lowSurrogateRanges, makeException, makeException, parseQuantExact, recede, sortRangeList, translateBranch, translateQuantifier, translateQuantity, translateRegExp, translateTop
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

JDK14RegexTranslator

public JDK14RegexTranslator()
Create a regex translator for JDK 1.4

Method Detail

setIgnoreWhitespace

public void setIgnoreWhitespace(boolean ignore)
Indicate whether whitespace should be ignored

Parameters:
ignore - true if whitespace should be ignored

translate

public java.lang.String translate(java.lang.CharSequence regExp,
                                  int xmlVersion,
                                  boolean xpath)
                           throws RegexSyntaxException
Translates a regular expression in the syntax of XML Schemas Part 2 into a regular expression in the syntax of java.util.regex.Pattern. The translation assumes that the string to be matched against the regex uses surrogate pairs correctly. If the string comes from XML content, a conforming XML parser will automatically check this; if the string comes from elsewhere, it may be necessary to check surrogate usage before matching.

Parameters:
regExp - a CharSequence containing a regular expression in the syntax of XML Schemas Part 2
xmlVersion - integer constant indicating XML 1.0 or XML 1.1
xpath - true if the XPath 2.0 Functions and Operators (F+O) extensions to the schema regex syntax are permitted
Returns:
a String containing a regular expression in the syntax of java.util.regex.Pattern
Throws:
RegexSyntaxException - if regExp is not a regular expression in the syntax of XML Schemas Part 2 or XPath 2.0, as appropriate
See Also:
Pattern, XML Schema Part 2
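
The description of translate notes that input not coming from a conforming XML parser may need its surrogate usage checked before matching. One way to do that is sketched below. This is an illustration only, not a Saxon API: the class SurrogateCheck and the method hasWellFormedSurrogates are hypothetical names, and the sketch relies on Character.isHighSurrogate and Character.isLowSurrogate, which require Java 5 or later.

```java
// Hypothetical helper class; it is not part of Saxon.
public class SurrogateCheck {

    // Returns true if every high surrogate in s is immediately followed by a
    // low surrogate, and no low surrogate appears without a preceding high one.
    static boolean hasWellFormedSurrogates(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half that was just validated
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate is an error.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasWellFormedSurrogates("abc"));
        System.out.println(hasWellFormedSurrogates("\uD800\uDC00"));
        System.out.println(hasWellFormedSurrogates("\uD800"));
    }
}
```

A caller could run such a check on the input string and reject or repair it before applying the compiled pattern.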

translateAtom

protected boolean translateAtom()
                         throws RegexSyntaxException
Specified by:
translateAtom in class RegexTranslator
Throws:
RegexSyntaxException

main

public static void main(java.lang.String[] args)
                 throws RegexSyntaxException
Diagnostic entry point

Parameters:
args - argument 1 is the XPath regex; argument 2 is either xpath or xmlschema
Throws:
RegexSyntaxException


Copyright (c) Saxonica Limited. All rights reserved.