Saxon.Api

 

 

Class DocumentBuilder


public class DocumentBuilder

The DocumentBuilder class enables XDM documents to be built from various sources. The class is always instantiated using the NewDocumentBuilder method on the Processor object.

Property Summary

 Uri BaseUri

The base URI of a document loaded using this DocumentBuilder. This is used for resolving any relative URIs appearing within the document, for example in references to DTDs and external entities.

 XQueryExecutable DocumentProjectionQuery

Set a compiled query to be used for implementing document projection.

 bool DtdValidation

Determines whether DTD validation is applied to documents loaded using this DocumentBuilder.

 bool LineNumbering

Determines whether line numbering is enabled for documents loaded using this DocumentBuilder.

 SchemaValidationMode SchemaValidationMode

Determines whether schema validation is applied to documents loaded using this DocumentBuilder, and if so, whether it is strict or lax. If schema validation is requested and the document is not valid, then the Build method will fail with an exception.

 SchemaValidator SchemaValidator

Gets or sets the SchemaValidator to be used. This determines whether schema validation is applied to an input document and whether type annotations in a supplied document are retained. If no SchemaValidator is supplied, then schema validation does not take place.

 QName TopLevelElementName

The required name of the top level element in a document instance being validated against a schema.

 TreeModel TreeModel

The Tree Model implementation to be used for the constructed document. By default the TinyTree is used. The main reason for using the LinkedTree alternative is if updating is required (the TinyTree is not updateable).

 WhitespacePolicy WhitespacePolicy

Determines the whitespace stripping policy applied when loading a document using this DocumentBuilder.

 ResourceResolver XmlDocumentResolver

A ResourceResolver used to resolve any URI passed to the Build method.

 XmlResolver XmlResolver

A System.Xml.XmlResolver, which will be used to resolve references to external entities within XML documents being loaded (including any external DTD) when the DocumentBuilder allocates an XmlReader.

 

Method Summary

 XdmNode Build (Uri uri)

Load an XML document, retrieving it via a URI.

 XdmNode Build (Stream input)

Load an XML document supplied as raw (lexical) XML on a Stream.

 XdmNode Build (TextReader input)

Load an XML document supplied using a TextReader.

 XdmNode Build (XmlReader reader)

Load an XML document, delivered using an XmlReader.

 XdmNode Build (XContainer source)

Load a Linq document or element node, supplied as an XContainer, into a Saxon XdmNode.

 XdmNode Build (XmlNode source)

Load an XML DOM document, supplied as an XmlNode, into a Saxon XdmNode.

 XdmNode Wrap (XmlDocument doc)

Wrap an XML DOM document, supplied as an XmlDocument, as a Saxon XdmNode.

 XdmNode Wrap (XdmNode docWrapper, XmlNode node)

Wrap an XML DOM node (other than a document node) as a Saxon XdmNode.

 XdmNode Wrap (XDocument doc)

Wrap a Linq document node, supplied as a System.Xml.Linq.XDocument, as a Saxon XdmNode.

 XdmNode Wrap (XdmNode docWrapper, XNode node)

Wrap a Linq element node as a Saxon XdmNode.

 

Property Detail

BaseUri

public Uri BaseUri {get; set; }

The base URI of a document loaded using this DocumentBuilder. This is used for resolving any relative URIs appearing within the document, for example in references to DTDs and external entities.

This information is required when the document is loaded from a source that does not provide an intrinsic URI, notably when loading from a Stream or a TextReader.
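As an illustration of the point above, the following sketch loads a document from a TextReader, which carries no URI of its own, and supplies the base URI through the DocumentBuilder. The class name BaseUriExample, the method name Load, and the example.com URI are hypothetical; the sketch assumes a Processor can be created with its no-argument constructor.

```csharp
using System;
using System.IO;
using Saxon.Api;

public static class BaseUriExample
{
    // Parses XML held in a string. The StringReader has no intrinsic URI,
    // so the base URI is set on the DocumentBuilder before calling Build.
    public static XdmNode Load(string xml, Uri baseUri)
    {
        var processor = new Processor();
        DocumentBuilder builder = processor.NewDocumentBuilder();
        builder.BaseUri = baseUri;
        using (var reader = new StringReader(xml))
        {
            return builder.Build(reader);
        }
    }

    public static void Main()
    {
        // Hypothetical base URI, used only to resolve relative references.
        XdmNode doc = Load("<root><item/></root>",
                           new Uri("http://example.com/docs/sample.xml"));
        Console.WriteLine(doc.BaseUri);
    }
}
```

Relative references inside the document, such as a relative system identifier in a DOCTYPE declaration, would be resolved against the URI passed to Load.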

DocumentProjectionQuery

public XQueryExecutable DocumentProjectionQuery {get; set; }

Set a compiled query to be used for implementing document projection.

The effect of using this option is that the tree constructed by the DocumentBuilder contains only those parts of the source document that are needed to answer this query. Running this query against the projected document should give the same results as against the raw document, but the projected document typically occupies significantly less memory. It is permissible to run other queries against the projected document, but unless they are carefully chosen, they will give the wrong answer, because the document being used is different from the original.

The query should be written to use the projected document as its initial context item. For example, if the query is //ITEM[COLOR='blue'], then only ITEM elements and their COLOR children will be retained in the projected document.

This facility is only available in Saxon-EE; if the facility is not available, setting this property has no effect.
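A minimal sketch of document projection, assuming the caller supplies a suitably licensed Saxon-EE Processor and an absolute URI for the source document. The class and method names are hypothetical; the query string is the one used in the description above.

```csharp
using System;
using Saxon.Api;

public static class ProjectionExample
{
    // Builds a tree containing only the parts of the source document
    // that the compiled query needs. The Processor is passed in by the
    // caller because projection depends on Saxon-EE being available.
    public static XdmNode LoadProjected(Processor processor, Uri source)
    {
        XQueryCompiler compiler = processor.NewXQueryCompiler();
        XQueryExecutable query = compiler.Compile("//ITEM[COLOR='blue']");

        DocumentBuilder builder = processor.NewDocumentBuilder();
        builder.DocumentProjectionQuery = query;
        return builder.Build(source);
    }
}
```

The returned tree is intended to be queried with the same compiled query; other queries may see a document that differs from the original.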

DtdValidation

public bool DtdValidation {get; set; }

Determines whether DTD validation is applied to documents loaded using this DocumentBuilder.

By default, no DTD validation takes place.

LineNumbering

public bool LineNumbering {get; set; }

Determines whether line numbering is enabled for documents loaded using this DocumentBuilder.

By default, line numbering is disabled.

Line numbering is not available for all kinds of source: in particular, it is not available when loading from an existing XmlDocument.

The resulting line numbers are accessible to applications using the extension function saxon:line-number() applied to a node.

Line numbers are maintained only for element nodes; the line number returned for any other node will be that of the most recent element.

SchemaValidationMode

public SchemaValidationMode SchemaValidationMode {get; set; }

Determines whether schema validation is applied to documents loaded using this DocumentBuilder, and if so, whether it is strict or lax. If schema validation is requested and the document is not valid, then the Build method will fail with an exception.

By default, no schema validation takes place.

This option requires Saxon Enterprise Edition (Saxon-EE).

SchemaValidator

public SchemaValidator SchemaValidator {get; set; }

Gets or sets the SchemaValidator to be used. This determines whether schema validation is applied to an input document and whether type annotations in a supplied document are retained. If no SchemaValidator is supplied, then schema validation does not take place.

If validation is requested using this mechanism, and the document is not valid, then no exception is raised; it is for the application to handle any invalidity reports from the SchemaValidator.

The supplied SchemaValidator is not actually used directly when a document is built (the SchemaValidator#validate(Source) method is never called). Rather, some of the properties of the SchemaValidator are used to control how schema validation is performed by the DocumentBuilder. The particular properties that take effect include:

  • The validation mode (strict or lax)
  • The required top-level element declaration (see SchemaValidator#setDocumentElementName(QName))
  • The required type of the top-level element (see SchemaValidator#setDocumentElementTypeName(QName))
  • The option SchemaValidator#isUseXsiSchemaLocation()
  • The option SchemaValidator#isExpandAttributeDefaults()
  • Validation parameters set using SchemaValidator#setParameter
  • The net.sf.saxon.lib.InvalidityHandler

Properties that currently do NOT have any effect include:

  • The option SchemaValidator#isCollectStatistics()

TopLevelElementName

public QName TopLevelElementName {get; set; }

The required name of the top level element in a document instance being validated against a schema.

If this property is set, and if schema validation is requested, then validation will fail unless the outermost element of the document has the required name.

This option requires the schema-aware version of the Saxon product (Saxon-EE).

TreeModel

public TreeModel TreeModel {get; set; }

The Tree Model implementation to be used for the constructed document. By default the TinyTree is used. The main reason for using the LinkedTree alternative is if updating is required (the TinyTree is not updateable).
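The choice of tree model is made on the DocumentBuilder before the document is built. The sketch below selects the LinkedTree, assuming the TreeModel enumeration exposes a LinkedTree member; the class name, method name, and base URI are hypothetical.

```csharp
using System;
using System.IO;
using Saxon.Api;

public static class TreeModelExample
{
    // Builds the document as a LinkedTree rather than the default TinyTree,
    // for cases where the tree is to be updated later.
    public static XdmNode LoadUpdatable(string xml)
    {
        var processor = new Processor();
        DocumentBuilder builder = processor.NewDocumentBuilder();
        builder.TreeModel = TreeModel.LinkedTree;
        // Hypothetical base URI; a StringReader provides none of its own.
        builder.BaseUri = new Uri("http://example.com/input.xml");
        using (var reader = new StringReader(xml))
        {
            return builder.Build(reader);
        }
    }
}
```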

WhitespacePolicy

public WhitespacePolicy WhitespacePolicy {get; set; }

Determines the whitespace stripping policy applied when loading a document using this DocumentBuilder.

By default, whitespace text nodes appearing in element-only content are stripped, and all other whitespace text nodes are retained.

If DTD or schema validation is applied, the only permitted setting is WhitespacePolicy#IGNORABLE. Any other value results in an exception from the Build() method.

XmlDocumentResolver

public ResourceResolver XmlDocumentResolver {get; set; }

A ResourceResolver used to resolve any URI passed to the Build method.

If a resolver is supplied, it must take total responsibility for resolving all URIs; there is no fallback if it returns null or raises an error. If the resolver is to handle some URIs but delegate the handling of others, this can be achieved by creating a CommonResourceResolver and chaining a DirectResourceResolver.

XmlResolver

public XmlResolver XmlResolver {get; set; }

A System.Xml.XmlResolver, which will be used to resolve references to external entities within XML documents being loaded (including any external DTD) when the DocumentBuilder allocates an XmlReader.

If no XmlResolver is supplied, the ResourceResolver associated with the Saxon configuration is used (Configuration.getResourceResolver()).

In Saxon releases prior to 11.1, the supplied XmlResolver was also used to resolve any relative URI passed to the DocumentBuilder.Build() method.

Method Detail

Build

public XdmNode Build(Uri uri)

Load an XML document, retrieving it via a URI.

Note that the type Uri requires an absolute URI.

The URI is dereferenced using the registered XmlResolver.

This method takes no account of any fragment part in the URI.

The role passed to the GetEntity method of the XmlResolver is "application/xml", and the required return type is System.IO.Stream.

The document located via the URI is parsed using the System.Xml parser.

Note that the Microsoft System.Xml parser does not report whether attributes are defined in the DTD as being of type ID and IDREF. This is true whether or not DTD-based validation is enabled. This means that such attributes are not accessible to the id() and idref() functions.

Parameters:

uri - The URI identifying the location where the document can be found. This will also be used as the base URI of the document (regardless of the setting of the BaseUri property).

Returns:

An XdmNode, the document node at the root of the tree of the resulting in-memory document.

Build

public XdmNode Build(Stream input)

Load an XML document supplied as raw (lexical) XML on a Stream.

The document is parsed using the Microsoft System.Xml parser.

Before calling this method, the BaseUri property should be set to identify the base URI of this document, used for resolving any relative URIs contained within it; if it has not been set, the current working directory is assumed.

Note that the Microsoft System.Xml parser does not report whether attributes are defined in the DTD as being of type ID and IDREF. This is true whether or not DTD-based validation is enabled. This means that such attributes are not accessible to the id() and idref() functions.

Parameters:

input - The Stream containing the XML source to be parsed. Closing this stream on completion is the responsibility of the caller.

Returns:

An XdmNode, the document node at the root of the tree of the resulting in-memory document.

Build

public XdmNode Build(TextReader input)

Load an XML document supplied using a TextReader.

The document is parsed using the Microsoft System.Xml parser.

Before calling this method, the BaseUri property should be set to identify the base URI of this document, used for resolving any relative URIs contained within it; if it has not been set, the current working directory is assumed.

Note that the Microsoft System.Xml parser does not report whether attributes are defined in the DTD as being of type ID and IDREF. This is true whether or not DTD-based validation is enabled. This means that such attributes are not accessible to the id() and idref() functions.

Parameters:

input - The TextReader containing the XML source to be parsed.

Returns:

An XdmNode, the document node at the root of the tree of the resulting in-memory document.

Build

public XdmNode Build(XmlReader reader)

Load an XML document, delivered using an XmlReader.

The XmlReader is responsible for parsing the document; this method builds a tree representation of the document (in an internal Saxon format) and returns its document node. The XmlReader is not required to perform validation but it must expand any entity references. Saxon uses the properties of the XmlReader as supplied.

Use of a plain XmlTextReader is discouraged, because it does not expand entity references. This should only be used if you know in advance that the document will contain no entity references (or perhaps if your query or stylesheet is not interested in the content of text and attribute nodes). Instead, with .NET 1.1 use an XmlValidatingReader (with ValidationType set to None). The constructor for XmlValidatingReader is obsolete in .NET 2.0, but the same effect can be achieved by using the Create method of XmlReader with appropriate XmlReaderSettings.
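The following sketch shows one way to obtain an entity-expanding, non-validating reader through XmlReader.Create and XmlReaderSettings, as suggested above. It uses only System.Xml; the class and method names, the sample document, and the base URI are hypothetical. A reader created this way could then be passed to the Build(XmlReader) method.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityExpandingReader
{
    // Creates a reader that processes the DTD (so that entity declarations
    // are known and references are expanded) without validating.
    public static XmlReader Create(TextReader input, string baseUri)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.None
        };
        return XmlReader.Create(input, settings, baseUri);
    }

    // Reads the text content of the outermost element, to show that
    // entity references arrive already expanded.
    public static string ReadText(string xml)
    {
        using (XmlReader reader = Create(new StringReader(xml),
                                         "http://example.com/doc.xml"))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    public static void Main()
    {
        string xml =
            "<!DOCTYPE greeting [<!ENTITY who \"world\">]>" +
            "<greeting>Hello, &who;!</greeting>";
        Console.WriteLine(ReadText(xml));
    }
}
```

Because Saxon uses the properties of the XmlReader as supplied, any XmlResolver needed for external entities would be set on the XmlReaderSettings rather than on the DocumentBuilder.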

The base URI of the resulting document is taken from the BaseURI property of the XmlReader if this is non-null and non-empty; otherwise it is taken from the BaseUri property of this DocumentBuilder.

Conformance with the W3C specifications requires that the Normalization property of an XmlTextReader should be set to true. However, Saxon does not insist on this.

If the XmlReader performs schema validation, Saxon will ignore any resulting type information. Type information can only be obtained by using Saxon's own schema validator, which will be run if the SchemaValidationMode property is set to Strict or Lax.

Note that the Microsoft System.Xml parser does not report whether attributes are defined in the DTD as being of type ID and IDREF. This is true whether or not DTD-based validation is enabled. This means that such attributes are not accessible to the id() and idref() functions.

Note that setting the XmlResolver property of the DocumentBuilder has no effect when this method is used; if an XmlResolver is required, it must be set on the XmlReader itself.

Parameters:

reader - The XmlReader that supplies the parsed XML source

Returns:

An XdmNode, the document node at the root of the tree of the resulting in-memory document.

Build

public XdmNode Build(XContainer source)

Load a Linq document or element node, supplied as an XContainer, into a Saxon XdmNode.

The returned document will contain only the subtree rooted at the supplied node.

This method copies the Linq tree to create a Saxon tree. See the Wrap method for an alternative that creates a wrapper around the Linq tree, allowing it to be modified in situ.

Parameters:

source - The Linq document or element node to be copied to form a Saxon tree

Returns:

An XdmNode, the document or element node corresponding to the supplied Linq node. If the supplied source was an XDocument node, the result will be an XDM document node; if it was an XElement node, it will be an XDM element node forming the outermost element of a tree whose root is an XDM document node.

Build

public XdmNode Build(XmlNode source)

Load an XML DOM document, supplied as an XmlNode, into a Saxon XdmNode.

The returned document will contain only the subtree rooted at the supplied node.

This method copies the DOM tree to create a Saxon tree. See the Wrap method for an alternative that creates a wrapper around the DOM tree, allowing it to be modified in situ.

Parameters:

source - The DOM Node to be copied to form a Saxon tree

Returns:

An XdmNode, the document node at the root of the tree of the resulting in-memory document.

Wrap

public XdmNode Wrap(XmlDocument doc)

Wrap an XML DOM document, supplied as an XmlDocument, as a Saxon XdmNode.

This method must be applied at the level of the Document Node. Unlike the Build method, the original DOM is not copied. This saves memory and time, but it also means that it is not possible to perform operations such as whitespace stripping and schema validation.
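A brief sketch of wrapping an existing DOM document rather than copying it. The class and method names and the sample document are hypothetical, and the sketch assumes a Processor can be created with its no-argument constructor.

```csharp
using System;
using System.Xml;
using Saxon.Api;

public static class WrapExample
{
    // Wraps the DOM document without copying it. Whitespace stripping and
    // schema validation are not applied to a wrapped document.
    public static XdmNode WrapDom(Processor processor, XmlDocument dom)
    {
        DocumentBuilder builder = processor.NewDocumentBuilder();
        return builder.Wrap(dom);
    }

    public static void Main()
    {
        var dom = new XmlDocument();
        dom.LoadXml("<root><item>1</item></root>");

        XdmNode wrapped = WrapDom(new Processor(), dom);
        Console.WriteLine(wrapped.NodeKind);
    }
}
```

If a copy in Saxon's own tree format is preferred, the Build(XmlNode) method can be called on the same DocumentBuilder instead.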

Parameters:

doc - The DOM document node to be wrapped

Returns:

An XdmNode, the Saxon document node at the root of the tree of the resulting in-memory document.

Wrap

public XdmNode Wrap(XdmNode docWrapper, XmlNode node)

Wrap an XML DOM node (other than a document node) as a Saxon XdmNode.

Parameters:

docWrapper - The wrapper for the containing DOM document node
node - The DOM node to be wrapped

Returns:

An XdmNode, wrapping the supplied DOM node

Wrap

public XdmNode Wrap(XDocument doc)

Wrap a Linq document node, supplied as a System.Xml.Linq.XDocument, as a Saxon XdmNode.

This method must be applied at the level of the Document Node. Unlike the Build method, the original tree is not copied. This saves memory and time, but it also means that it is not possible to perform operations such as whitespace stripping and schema validation.

Parameters:

doc - The Linq document node to be wrapped

Returns:

An XdmNode, the Saxon document node at the root of the tree of the resulting in-memory document.

Wrap

public XdmNode Wrap(XdmNode docWrapper, XNode node)

Wrap a Linq element node as a Saxon XdmNode.

Parameters:

docWrapper - The wrapper of the containing XDocument node
node - The Linq element node to be wrapped

Returns:

An XdmNode, wrapping the supplied Linq element node