XML briefing
Author | Norman Gray |
Release date | 15 June 1999 |
Starlink Project: CCLRC / Rutherford Appleton Laboratory / PPARC |
Abstract:
This is a short briefing on XML, and its relationship with SGML. It is intended as a brief overview, and pointer to more detailed resources.
The primary XML resources are:
<http://www.w3.org/XML> for the W3C's XML spec
<http://www.ucc.ie/xml/> for the XML FAQ
<http://www.xml.com/xml/pub/axml/axmlintro.html> for the annotated spec
I also have a collection of pointers with links to other important resources.
An SGML document consists of an `SGML declaration' which sets various options, a `document type definition' (DTD) which establishes the syntax of a document type, and a `document instance', which is the actual document.
XML has a single, fixed, SGML declaration, which sets most of the SGML options to `off'. For example, in XML all element names are case sensitive, there is no tag omission, there are some restrictions on the possible syntaxes expressible by the DTD, and more exotic features such as SUBDOC are forbidden. For a more detailed discussion of the differences, see appendix A of the spec.
This means that parsers are easy to write, and there are numerous such parsers available for free.
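As a rough illustration of how strict these rules are in practice, the sketch below feeds three small documents to the non-validating parser in Python's standard library (`xml.etree.ElementTree`). The sample documents and the helper name `is_well_formed` are invented for this example, and the choice of Python is an assumption of the sketch rather than anything specific to XML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents, not taken from any real document type.
samples = {
    "matching case": "<note><to>reader</to></note>",
    # Element names are case sensitive, so </TO> does not close <to>.
    "mismatched case": "<note><to>reader</TO></note>",
    # There is no tag omission, so the missing </to> is an error.
    "omitted end tag": "<note><to>reader</note>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the first document is accepted; an SGML parser working with a suitably permissive declaration and DTD could have accepted constructs like the other two.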
XML dispenses with the SGML declaration; it can dispense with a DTD as well. XML introduces the notion of `well-formed' versus `valid' documents.
If a document has all closing tags present and all elements properly nested, starts with the declaration
<?xml version="1.0" standalone="yes"?>
and writes empty elements as
<empty/>
then it is `well-formed', and may be processed in the absence of a DTD.
A file which has a DTD and which conforms to it (and which will therefore also be well-formed) is `valid'. It may optionally also begin with the XML declaration
<?xml version="1.0"?>
That is, a valid XML document is also a conforming SGML document. This has been made possible by recent subtle, technical changes to the SGML standard.
Those changes have come about because there has been close cooperation between the developers of XML and the wider SGML community. In short, XML is fully legit as SGML.
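To make the well-formed case concrete, here is a minimal sketch of processing a document with no DTD at all, again using Python's standard library. The element names (`observation`, `target`, `calibrated`, `exposure`) are made up for the example; nothing declares them anywhere, which is exactly the point.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names are invented for illustration
# and are not declared in any DTD.
document = """<?xml version="1.0" standalone="yes"?>
<observation>
  <target>M31</target>
  <calibrated/>
  <exposure unit="s">120</exposure>
</observation>"""

root = ET.fromstring(document)

# The parser recovers the element structure from the tags alone.
names = [child.tag for child in root]

# An empty element written as <calibrated/> has no text and no children.
empty = root.find("calibrated")
is_empty = empty.text is None and len(empty) == 0

exposure = root.find("exposure")
print(names)
print(is_empty)
print(exposure.get("unit"), exposure.text)
```

Checking validity, as opposed to well-formedness, would need a validating parser and a DTD, which this sketch deliberately leaves out.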
XML is a real standard (well, there are HTML standards, but no one pays any attention to them).
HTML has a fixed element set, and associates fixed semantics with those elements. XML has neither restriction.
XLink is a draft specification for links in XML. It's closely related to the hyperlinks module of HyTime.
XPointer is a draft specification for location specifiers in XML, so that you can refer, for example, to `the second section beneath the element with id so-and-so'. As with XLink, it's closely related to HyTime.
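The flavour of such a location specifier can be sketched with the small path language built into Python's `xml.etree.ElementTree`. This is not XPointer syntax, only an analogous expression in a different notation, and the document, its ids and its element names are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; ids and element names are made up for the example.
# Without a DTD, "id" here is just an ordinary attribute, not a declared ID.
document = """<report>
  <chapter id="so-and-so">
    <section><title>First</title></section>
    <section><title>Second</title></section>
    <section><title>Third</title></section>
  </chapter>
  <chapter id="another">
    <section><title>Elsewhere</title></section>
  </chapter>
</report>"""

root = ET.fromstring(document)

# "The second section beneath the element with id so-and-so":
#   .//*[@id='so-and-so']  any descendant element carrying that id attribute
#   /section[2]            its second <section> child (positions count from 1)
target = root.find(".//*[@id='so-and-so']/section[2]")
title = target.findtext("title")
print(title)
```

The location is described in terms of the document's structure rather than a byte offset or a pre-placed anchor, which is the idea the XPointer draft develops much further.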
XSL is the stylesheet language for XML. Stylesheets are vital if XML is to be readable when it is served over the web (because it doesn't have the fixed semantics HTML has, XML rendering can't be left entirely to a browser).
The Document Object Model (DOM) is `a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents' [from the spec]. It's a simple set of O-O declarations for querying and manipulating XML documents in simple ways (small subset of DSSSL).
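As a sketch of what `access and update' means in practice, the following uses `xml.dom.minidom`, the lightweight DOM implementation shipped with Python. The catalogue document is hypothetical, and other DOM implementations will differ in detail while offering the same basic calls.

```python
from xml.dom.minidom import parseString

# Hypothetical document, invented for this example.
dom = parseString(
    "<catalogue><star name='Vega'/><star name='Deneb'/></catalogue>"
)

# Query: collect an attribute from every <star> element.
names = [star.getAttribute("name") for star in dom.getElementsByTagName("star")]

# Update: create a new element and attach it to the document element.
new_star = dom.createElement("star")
new_star.setAttribute("name", "Altair")
dom.documentElement.appendChild(new_star)

updated = [star.getAttribute("name") for star in dom.getElementsByTagName("star")]
serialised = dom.documentElement.toxml()
print(names)
print(updated)
print(serialised)
```

The same calls (`getElementsByTagName`, `createElement`, `setAttribute`, `appendChild`) are defined by the DOM interface itself, so the shape of the code carries over to other languages with a DOM binding.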
HyTime is a very high-level standard for associating semantics with SGML DTDs.
DSSSL is the Document Style and Semantics Specification Language. It's a language for writing stylesheets in. Both HyTime and DSSSL are specific to SGML, but have informed the other standards above.
SGML technology will work with XML (as long as the tools conform to the minor technical corrigenda mentioned above).
Because XML is much simpler than fully general SGML, it is much easier to write parsers for it. It is therefore very likely that we will see many XML editors and XML-aware browsers in the months to come.
We should also see XML-aware search engines, potentially finally realising the possibilities offered by highly structured information storage and retrieval.
The development of MathML should help bring maths to the internet.
Norman Gray
3 February 1998