Saxonica: Introduction to XML

XML is for publishing, XML is for exchanging data

XML is a notation that represents the logical structure of a document by using a series of tags (known as markup) within the document. These make it easier to automate the processing of the content of the document. It is used both for narrative documents (such as web pages) and for data messages (such as financial transactions).

XML is an open standard, developed by the W3C (World Wide Web Consortium) and adopted by all major software vendors as a standard for structured information exchange.

XML allows document authors to choose their own tags, appropriate to the kind of information represented in the document. An XML Schema (itself a type of XML document) defines your XML vocabulary, i.e. the tags that are used and the rules for what can appear within each tag.

In effect, the tags make the document self-describing. A simple example of XML markup is shown below.

<?xml version="1.0" encoding="UTF-8"?>
<books>
    <book category="reference">
        <publisher>National Geographic</publisher>
        <title>The National Geographic Atlas of the World</title>
        <price>40.00</price>
        <format>hardback</format>
    </book>
    <book category="fiction">
        <author>J. K. Rowling</author>
        <title>Harry Potter and the Philosopher's Stone</title>
        <price>6.99</price>
        <format>paperback</format>
    </book>
</books>
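
To illustrate how markup like this makes automated processing easier, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. The helper name summarize_books and the shape of the returned records are assumptions made for this illustration only; any XML parser on any platform could be used in much the same way.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The sample document from the text, embedded as a string for convenience.
BOOKS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<books>
    <book category="reference">
        <publisher>National Geographic</publisher>
        <title>The National Geographic Atlas of the World</title>
        <price>40.00</price>
        <format>hardback</format>
    </book>
    <book category="fiction">
        <author>J. K. Rowling</author>
        <title>Harry Potter and the Philosopher's Stone</title>
        <price>6.99</price>
        <format>paperback</format>
    </book>
</books>
"""


def summarize_books(xml_text):
    """Parse the document and return one dict per <book> element.

    Hypothetical helper: the field names simply mirror the tags above.
    """
    # Encode to bytes so the parser honours the encoding declaration itself.
    root = ET.fromstring(xml_text.encode("utf-8"))
    records = []
    for book in root.findall("book"):
        records.append(
            {
                "category": book.get("category"),      # attribute value
                "title": book.findtext("title"),       # child element text
                "price": Decimal(book.findtext("price")),
                "format": book.findtext("format"),
            }
        )
    return records


if __name__ == "__main__":
    for record in summarize_books(BOOKS_XML):
        print(record["category"], record["title"], record["price"])
```

Because the tags name each piece of information, the code can ask for "title" or "price" directly instead of guessing at positions within the text.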

XML is now widely used both as a format for maintaining published documents, and as a format for exchanging data between applications, often between different organizations. It became successful because it can handle information of arbitrary complexity, yet at the same time it is essentially a very simple (and therefore inexpensive) technology.

The rules for particular kinds of XML documents can be written and agreed by means of an XML Schema, allowing errors to be detected automatically. Because XML documents are also human-readable, it is easy to find and resolve problems when they occur.
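
The idea of detecting errors automatically can be sketched roughly as follows. Python's standard library does not include an XML Schema validator, so this example swaps in a few hand-written rules in place of a real schema. The rule set and the check_books helper are hypothetical and were chosen to fit the books sample above. A schema-aware processor would instead apply the rules declared in the schema itself.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Assumed rules for this illustration only; a real XML Schema would declare these.
REQUIRED_CHILDREN = ("title", "price", "format")


def check_books(xml_text):
    """Return a list of human-readable problems; an empty list means no errors found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The document is not even well-formed XML.
        return [f"not well-formed: {exc}"]

    problems = []
    if root.tag != "books":
        problems.append(f"root element is <{root.tag}>, expected <books>")

    for index, book in enumerate(root.findall("book"), start=1):
        if book.get("category") is None:
            problems.append(f"book {index}: missing category attribute")
        for name in REQUIRED_CHILDREN:
            if book.find(name) is None:
                problems.append(f"book {index}: missing <{name}>")
        price = book.findtext("price")
        if price is not None:
            try:
                Decimal(price.strip())
            except InvalidOperation:
                problems.append(f"book {index}: price {price!r} is not a number")
    return problems


if __name__ == "__main__":
    broken = "<books><book><title>Untitled</title><price>cheap</price></book></books>"
    for problem in check_books(broken):
        print(problem)
```

Each message names the offending book and the tag involved, which is the practical benefit of readable markup when tracking down a fault.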

XML is also easy to integrate into business applications, thanks to the wide availability of XML parsers on popular platforms such as Java and .NET, and the increasing number of software packages (from relational databases to office desktop applications) that are now XML-enabled "out of the box".