Saxon.Api
Class DocumentBuilder
public class DocumentBuilder
The DocumentBuilder class enables XDM documents to be built from various sources. The class is always instantiated using the NewDocumentBuilder method on the Processor object.
Property Summary |
|
---|---|
Uri | BaseUri The base URI of a document loaded using this DocumentBuilder. |
XQueryExecutable | DocumentProjectionQuery Set a compiled query to be used for implementing document projection. |
bool | DtdValidation Determines whether DTD validation is applied to documents loaded using this DocumentBuilder. |
bool | LineNumbering Determines whether line numbering is enabled for documents loaded using this DocumentBuilder. |
SchemaValidationMode | SchemaValidationMode Determines whether schema validation is applied to documents loaded using this DocumentBuilder, and if so, whether it is strict or lax. |
SchemaValidator | SchemaValidator Property to set and get the SchemaValidator to be used. This determines whether schema validation is applied to an input document and whether type annotations in a supplied document are retained. If no SchemaValidator is supplied, then schema validation does not take place. |
QName | TopLevelElementName The required name of the top level element in a document instance being validated against a schema. |
TreeModel | TreeModel The Tree Model implementation to be used for the constructed document. By default the TinyTree is used. |
WhitespacePolicy | WhitespacePolicy Determines the whitespace stripping policy applied when loading a document using this DocumentBuilder. |
ResourceResolver | XmlDocumentResolver An XmlDocumentResolver used to resolve any URI passed to the Build method. |
XmlResolver | XmlResolver A System.Xml.XmlResolver, which will be used to resolve references to external entities within XML documents being loaded. |
Method Summary |
|
---|---|
XdmNode | Build (Uri uri) Load an XML document, retrieving it via a URI. |
XdmNode | Build (Stream input) Load an XML document supplied as raw (lexical) XML on a Stream. |
XdmNode | Build (TextReader input) Load an XML document supplied using a TextReader. |
XdmNode | Build (XmlReader reader) Load an XML document, delivered using an XmlReader. |
XdmNode | Build (XContainer source) Load a Linq document or element node, supplied as an XContainer, into a Saxon XdmNode. |
XdmNode | Build (XmlNode source) Load an XML DOM document, supplied as an XmlNode, into a Saxon XdmNode. |
XdmNode | Wrap (XmlDocument doc) Wrap an XML DOM document, supplied as an XmlDocument, as a Saxon XdmNode. |
XdmNode | Wrap (XdmNode docWrapper, XmlNode node) Wrap an XML DOM node (other than a document node), as a Saxon XdmNode. |
XdmNode | Wrap (XDocument doc) Wrap a Linq document node, supplied as an XDocument, as a Saxon XdmNode. |
XdmNode | Wrap (XdmNode docWrapper, XNode node) Wrap a Linq element node, as a Saxon XdmNode. |
Property Detail
BaseUri
The base URI of a document loaded using this DocumentBuilder. This is used for resolving any relative URIs appearing within the document, for example in references to DTDs and external entities.
This information is required when the document is loaded from a source that does not provide an intrinsic URI, notably when loading from a Stream or a TextReader.
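As a minimal sketch of the above, the following C# fragment creates a DocumentBuilder from a Processor, sets BaseUri, and loads a document from a TextReader. It assumes the project references the Saxon.Api assembly; the class name, the sample XML, and the example.com URI are hypothetical and chosen only for illustration.

```csharp
using System;
using System.IO;
using Saxon.Api;

public static class BuildFromTextReaderDemo
{
    // Loads XML held in a string. A TextReader has no intrinsic URI,
    // so the base URI is supplied through the BaseUri property.
    public static XdmNode Load(string xml, Uri baseUri)
    {
        var processor = new Processor();
        DocumentBuilder builder = processor.NewDocumentBuilder();
        builder.BaseUri = baseUri;

        using (var reader = new StringReader(xml))
        {
            return builder.Build(reader);
        }
    }

    public static void Main()
    {
        // Hypothetical document and URI, for illustration only.
        XdmNode doc = Load("<order id='1'/>", new Uri("http://example.com/orders/1.xml"));
        Console.WriteLine(doc.BaseUri);
    }
}
```

Setting BaseUri before calling Build matters mainly when the document contains relative references, such as a DTD or external entity declared with a relative system identifier.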
DocumentProjectionQuery
Set a compiled query to be used for implementing document projection. The effect of using this option is that the tree constructed by the DocumentBuilder contains only those parts of the source document that are needed to answer this query. Running this query against the projected document should give the same results as against the raw document, but the projected document typically occupies significantly less memory. It is permissible to run other queries against the projected document, but unless they are carefully chosen, they will give the wrong answer, because the document being used is different from the original.
The query should be written to use the projected document as its initial context item. For example, if the query is //ITEM[COLOR='blue'], then only ITEM elements and their COLOR children will be retained in the projected document.
This facility is only available in Saxon-EE; if the facility is not available, setting this property has no effect.
DtdValidation
Determines whether DTD validation is applied to documents loaded using this DocumentBuilder.
By default, no DTD validation takes place.
LineNumbering
Determines whether line numbering is enabled for documents loaded using this DocumentBuilder.
By default, line numbering is disabled.
Line numbering is not available for all kinds of source: in particular, it is not available when loading from an existing XmlDocument.
The resulting line numbers are accessible to applications using the extension function saxon:line-number() applied to a node.
Line numbers are maintained only for element nodes; the line number returned for any other node will be that of the most recent element.
SchemaValidationMode
Determines whether schema validation is applied to documents loaded using this DocumentBuilder, and if so, whether it is strict or lax. If schema validation is requested and the document is not valid, then the Build method will fail with an exception.
By default, no schema validation takes place.
This option requires Saxon Enterprise Edition (Saxon-EE).
SchemaValidator
Property to set and get the SchemaValidator to be used. This determines whether schema validation is applied to an input document and whether type annotations in a supplied document are retained. If no SchemaValidator is supplied, then schema validation does not take place.
If validation is requested using this mechanism, and the document is not valid, then no exception is raised; it is for the application to handle any invalidity reports from the SchemaValidator.
The supplied SchemaValidator is not actually used directly when a document is built (the SchemaValidator#validate(Source) method is never called). Rather, some of the properties of the SchemaValidator are used to control how schema validation is performed by the DocumentBuilder. The particular properties that take effect include:
- The validation mode (strict or lax)
- The required top-level element declaration (see SchemaValidator#setDocumentElementName(QName))
- The required type of the top-level element (see SchemaValidator#setDocumentElementTypeName(QName))
- The option SchemaValidator#isUseXsiSchemaLocation()
- The option SchemaValidator#isExpandAttributeDefaults()
- Validation parameters set using SchemaValidator#setParameter
- The net.sf.saxon.lib.InvalidityHandler
Properties that currently do NOT have any effect include:
- The option SchemaValidator#isCollectStatistics()
TopLevelElementName
The required name of the top level element in a document instance being validated against a schema.
If this property is set, and if schema validation is requested, then validation will fail unless the outermost element of the document has the required name.
This option requires the schema-aware version of the Saxon product (Saxon-EE).
TreeModel
The Tree Model implementation to be used for the constructed document. By default the TinyTree is used. The main reason for using the LinkedTree alternative is if updating is required (the TinyTree is not updateable).
WhitespacePolicy
Determines the whitespace stripping policy applied when loading a document using this DocumentBuilder.
By default, whitespace text nodes appearing in element-only content are stripped, and all other whitespace text nodes are retained.
If DTD or schema validation is applied, the only permitted setting is WhitespacePolicy#IGNORABLE. Any other value results in an exception from the Build() method.
XmlDocumentResolver
An XmlDocumentResolver used to resolve any URI passed to the Build method.
If a resolver is supplied, it must take total responsibility for resolving all URIs; there is no fallback if it returns null or raises an error. If the resolver is to handle some URIs but delegate the handling of others, this can be achieved by creating a CommonResourceResolver and chaining a DirectResourceResolver.
XmlResolver
A System.Xml.XmlResolver, which will be used to resolve references to external entities within XML documents being loaded (including any external DTD) when the DocumentBuilder allocates an XmlReader.
If no XmlResolver is supplied, the ResourceResolver associated with the Saxon configuration is used (Configuration.getResourceResolver()).
In Saxon releases prior to 11.1, the supplied XmlResolver was also used to resolve any relative URI passed to the DocumentBuilder.Build() method.
Method Detail
Build
Load an XML document supplied as raw (lexical) XML on a Stream.
The document is parsed using the Microsoft System.Xml parser.
Before calling this method, the BaseUri property should be set to identify the base URI of this document, used for resolving any relative URIs contained within it; if it has not been set, the current working directory is assumed.
Note that the Microsoft System.Xml parser does not report whether attributes are defined in the DTD as being of type ID and IDREF. This is true whether or not DTD-based validation is enabled. This means that such attributes are not accessible to the id() and idref() functions.
Parameters:
input - The Stream containing the XML source to be parsed. Closing this stream on completion is the responsibility of the caller.
Returns:
XdmNode, the document node at the root of the tree of the resulting in-memory document.
Build
Load an XML document supplied using a TextReader.
The document is parsed using the Microsoft System.Xml parser.
Before calling this method, the BaseUri property should be set to identify the base URI of this document, used for resolving any relative URIs contained within it; if it has not been set, the current working directory is assumed.
Note that the Microsoft System.Xml parser does not report whether attributes are defined in the DTD as being of type ID and IDREF. This is true whether or not DTD-based validation is enabled. This means that such attributes are not accessible to the id() and idref() functions.
Parameters:
input - The TextReader containing the XML source to be parsed.
Returns:
XdmNode, the document node at the root of the tree of the resulting in-memory document.
Build
Load an XML document, delivered using an XmlReader.
The XmlReader is responsible for parsing the document; this method builds a tree representation of the document (in an internal Saxon format) and returns its document node. The XmlReader is not required to perform validation but it must expand any entity references. Saxon uses the properties of the XmlReader as supplied.
Use of a plain XmlTextReader is discouraged, because it does not expand entity references. This should only be used if you know in advance that the document will contain no entity references (or perhaps if your query or stylesheet is not interested in the content of text and attribute nodes). Instead, with .NET 1.1 use an XmlValidatingReader (with ValidationType set to None). The constructor for XmlValidatingReader is obsolete in .NET 2.0, but the same effect can be achieved by using the Create method of XmlReader with appropriate XmlReaderSettings.
The base URI of the resulting document is taken from the BaseURI property of the XmlReader if this is non-null and non-empty; otherwise it is taken from the BaseUri property of this DocumentBuilder.
Conformance with the W3C specifications requires that the Normalization property of an XmlTextReader should be set to true. However, Saxon does not insist on this.
If the XmlReader performs schema validation, Saxon will ignore any resulting type information. Type information can only be obtained by using Saxon's own schema validator, which will be run if the SchemaValidationMode property is set to Strict or Lax.
Note that the Microsoft System.Xml parser does not report whether attributes are defined in the DTD as being of type ID and IDREF. This is true whether or not DTD-based validation is enabled. This means that such attributes are not accessible to the id() and idref() functions.
Note that setting the XmlResolver property of the DocumentBuilder has no effect when this method is used; if an XmlResolver is required, it must be set on the XmlReader itself.
Parameters:
reader - The XmlReader that supplies the parsed XML source.
Returns:
XdmNode, the document node at the root of the tree of the resulting in-memory document.
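The advice on entity references can be illustrated with a short sketch that uses only System.Xml. It creates a reader through XmlReader.Create with XmlReaderSettings that allow DTD processing, so that an entity declared in the internal subset is expanded in the text the reader delivers. The class name and the sample document are hypothetical. Setting the resolver to null is a choice made for this sample because it has no external entities; it is not a Saxon requirement.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityExpandingReaderDemo
{
    // A small document whose internal DTD subset declares one entity.
    private const string Xml =
        "<!DOCTYPE greeting [<!ENTITY who \"world\">]>" +
        "<greeting>Hello, &who;</greeting>";

    // Creates a non-validating reader that processes the DTD,
    // which is what allows entity references to be expanded.
    public static XmlReader CreateReader(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,   // the default setting rejects any DOCTYPE
            ValidationType = ValidationType.None,  // expansion only, no validation
            XmlResolver = null                     // no external entities in this sample
        };
        return XmlReader.Create(input, settings);
    }

    // Reads the text content of the outermost element.
    public static string ReadGreeting()
    {
        using (XmlReader reader = CreateReader(new StringReader(Xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadGreeting());
    }
}
```

A reader created this way could be passed to Build(XmlReader) in place of the direct read shown in ReadGreeting. If external entities or an external DTD must be fetched, the XmlResolver would need to be set on these settings, since the XmlResolver property of the DocumentBuilder has no effect for this method.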
Build
Load a Linq document or element node, supplied as an XContainer, into a Saxon XdmNode.
The returned document will contain only the subtree rooted at the supplied node.
This method copies the Linq tree to create a Saxon tree. See the Wrap method for an alternative that creates a wrapper around the Linq tree, allowing it to be modified in situ.
Parameters:
source - The Linq document or element node to be copied to form a Saxon tree.
Returns:
XdmNode, the document or element node corresponding to the supplied Linq node. If the supplied source was an XDocument node, the result will be an XDM document node; if it was an XElement node, it will be an XDM element node forming the outermost element of a tree whose root is an XDM document node.
Build
Load an XML DOM document, supplied as an XmlNode, into a Saxon XdmNode.
The returned document will contain only the subtree rooted at the supplied node.
This method copies the DOM tree to create a Saxon tree. See the Wrap method for an alternative that creates a wrapper around the DOM tree, allowing it to be modified in situ.
Parameters:
source - The DOM Node to be copied to form a Saxon tree.
Returns:
XdmNode, the document node at the root of the tree of the resulting in-memory document.
Wrap
Wrap an XML DOM document, supplied as an XmlDocument, as a Saxon XdmNode.
This method must be applied at the level of the Document Node. Unlike the Build method, the original DOM is not copied. This saves memory and time, but it also means that it is not possible to perform operations such as whitespace stripping and schema validation.
Parameters:
doc - The DOM document node to be wrapped.
Returns:
XdmNode, the Saxon document node at the root of the tree of the resulting in-memory document.
Wrap
Wrap an XML DOM node (other than a document node), as a Saxon XdmNode.
Parameters:
docWrapper - The wrapper for the containing DOM document node.
node - The DOM node containing the node to be wrapped.
Returns:
XdmNode, wrapping the supplied DOM node.
Wrap
Wrap a Linq document node, supplied as a System.Xml.Linq.XDocument, as a Saxon XdmNode.
This method must be applied at the level of the Document Node. Unlike the Build method, the original tree is not copied. This saves memory and time, but it also means that it is not possible to perform operations such as whitespace stripping and schema validation.
Parameters:
doc - The Linq document node to be wrapped.
Returns:
XdmNode, the Saxon document node at the root of the tree of the resulting in-memory document.
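The difference between copying and wrapping a Linq tree can be sketched as follows. The fragment assumes the project references the Saxon.Api assembly and uses only the Build(XContainer) and Wrap(XDocument) overloads listed in the method summary. The class name, the sample document, and the example.com base URI are hypothetical, and setting BaseUri here is a precaution rather than a documented requirement.

```csharp
using System;
using System.Xml.Linq;
using Saxon.Api;

public static class LinqBuildAndWrapDemo
{
    private static DocumentBuilder NewBuilder()
    {
        var processor = new Processor();
        DocumentBuilder builder = processor.NewDocumentBuilder();
        // Hypothetical base URI; a Linq tree built in memory has none of its own.
        builder.BaseUri = new Uri("http://example.com/items.xml");
        return builder;
    }

    // Copies the Linq tree into a Saxon tree.
    public static XdmNode Copy(XDocument source)
    {
        return NewBuilder().Build(source);
    }

    // Wraps the Linq tree without copying it.
    public static XdmNode View(XDocument source)
    {
        return NewBuilder().Wrap(source);
    }

    public static void Main()
    {
        var linqDoc = new XDocument(
            new XElement("items", new XElement("item", "blue")));

        XdmNode copied = Copy(linqDoc);
        XdmNode wrapped = View(linqDoc);

        Console.WriteLine(copied.NodeKind);
        Console.WriteLine(wrapped.NodeKind);
    }
}
```

Copying suits cases where options such as whitespace stripping or schema validation are wanted; wrapping avoids the cost of the copy when those options are not needed.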
Build
Load an XML document, retrieving it via a URI.
Note that the type Uri requires an absolute URI.
The URI is dereferenced using the registered XmlResolver.
This method takes no account of any fragment part in the URI.
The role passed to the GetEntity method of the XmlResolver is "application/xml", and the required return type is System.IO.Stream.
The document located via the URI is parsed using the System.Xml parser.
Note that the Microsoft System.Xml parser does not report whether attributes are defined in the DTD as being of type ID and IDREF. This is true whether or not DTD-based validation is enabled. This means that such attributes are not accessible to the id() and idref() functions.
Parameters:
uri - The URI identifying the location where the document can be found. This will also be used as the base URI of the document (regardless of the setting of the BaseUri property).
Returns:
XdmNode, the document node at the root of the tree of the resulting in-memory document.