
2.1 Why SGML?

The problem SGML tries to solve is that humans are good at intuiting structure from layout, but computers are exceedingly bad at it. This means that if we want computers to help us with documents - displaying them, transforming them, storing them, searching them - we have to give the computer some help.

A common way of providing this help is through markup systems such as runoff (or its variants), or TeX, or LaTeX. The last of these three has served Starlink very well, and even survived the conversion of Starlink's document set to hypertext, through the tool Star2HTML (see SUN/199), but the problems of the LaTeX system illustrate the advantages of the SGML approach quite well.

LaTeX templates are ill-defined
What is a Starlink document? Although Starlink distributes templates for the principal document types (for example in /star/docs/sun.tex), and distributes style guides such as SGP/28, there is no way either of mandating that certain elements, such as a document number, actually appear, or of ensuring that other features, such as raw TeX, do not. This has three consequences: document authors can never quite be sure that they have produced a `correct' document; since the effective definition of `correct' is `processable by this tool', there is no guarantee that a document which is correct at one time will continue to be so; and tools which search or process such documents must have contingency strategies available for when the elements they hope to use are missing.
The conversion is a hack!
Because the input documents are not well-defined, the tools which process them cannot be designed against any standard, but instead have to rely on heuristics and uncheckable conventions. LaTeX documents are defined to work with one particular tool - the TeX tokeniser and parser - so that using them with any other tool (which is necessary if you want to produce anything other than DVI files[Note 2]) is asking for trouble.
Authors have to know too much.
When you are writing a Star2HTML document, you are riding two horses at once: you must write in that subset of LaTeX which Star2HTML knows about. Authors must therefore frequently be sensitive to how their document will be treated by two separate systems.
Authors can know too much!
Part of the strength of TeX and LaTeX is their flexibility, and it is easy for authors, especially programmer authors, to exploit this when writing documentation; a burst of TeX macro magic can be an effective antidote to demotivation during the longueurs of preparing documentation. However much this flexibility may assist (or even amuse) authors, it inflicts parsing hell on anyone else who wants to do something unforeseen with the document. If you restrict what an author can do, then you limit what a parser has to do to repurpose the document.
Not future-proof
LaTeX is not precisely defined; on the contrary, it is continually being developed, and a completely new version, LaTeX3, is eagerly anticipated (which will, incidentally, be heavily influenced by SGML). Similarly, the LaTeX2HTML package is continually being upgraded, which creates a maintenance headache for Star2HTML (which is based on LaTeX2HTML).

None of these problems is fatal - all of them have manifestly been overcome by the authors and maintainers of the current document set - but taken together they make maintaining the document set cost more time than there is to spare.

SGML addresses these problems effectively.

SGML documents are well-defined
The structure of an SGML document is formally specified in the Document Type Definition (DTD) associated with it. This specifies what features must, which may, and which may not, be present in a document. This means that systems processing the document need work only within a much smaller universe of possibilities.
Future-proof
SGML has had only two backward-compatible amendments since 1986; it is not dependent on any particular tool; it is designed to be used decades hence; and enough hangarfuls of technical documentation have been produced using SGML that, if and when a replacement comes along, it will have good support.
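
To make the `must, may, and may not' of a DTD concrete, here is a minimal sketch of a DTD fragment. Every element name in it (userdoc, docnumber, and so on) is invented for illustration and is not taken from the Starlink DTDs; it shows only the kind of constraint a DTD can express, such as requiring a document number and leaving no room for raw TeX.

```sgml
<!-- Hypothetical DTD fragment: every element name here is invented
     for illustration and is not taken from the Starlink DTDs. -->

<!-- A document MUST contain a number, a title and at least one author,
     MAY contain an abstract, and MUST contain at least one section. -->
<!ELEMENT userdoc   - - (docnumber, title, author+, abstract?, section+)>
<!ELEMENT docnumber - - (#PCDATA)>
<!ELEMENT title     - - (#PCDATA)>
<!ELEMENT author    - - (#PCDATA)>
<!ELEMENT abstract  - - (para+)>
<!ELEMENT section   - - (title, para+)>
<!ELEMENT para      - - (#PCDATA)>

<!-- Nothing else MAY appear: an element that is not declared above,
     such as a hypothetical <rawtex>, makes the document invalid. -->
```

A validating parser given a document that omits docnumber, or that contains an undeclared element, would report an error rather than leave a downstream tool to cope with the omission.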

The most eligible candidate to replace SGML is XML. XML is a cut-down version of SGML, omitting rarely-used or dispensable features, but preserving many of its strengths (Section 8.2). The only serious disadvantage of XML, from our point of view, is its verbosity: it has discarded SGML's markup minimisation features (Section 3.4), and so can be tedious to type.
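
The difference in typing effort can be seen in a small sketch. The list and item elements below are hypothetical, and the declarations and the document instance are shown together only for brevity; the example assumes end-tag omission has been enabled in the SGML declaration.

```sgml
<!-- Hypothetical declarations: the "O" marks the end tag of <item>
     as omissible (this relies on OMITTAG YES in the SGML declaration). -->
<!ELEMENT list - - (item+)>
<!ELEMENT item - O (#PCDATA)>

<!-- SGML instance: each <item> is closed implicitly by the next
     <item> start tag or by the </list> end tag. -->
<list>
<item>displaying
<item>searching
</list>

<!-- The XML equivalent must spell out every end tag:
     <list>
     <item>displaying</item>
     <item>searching</item>
     </list>
-->
```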


The Starlink SGML Set
Starlink System Note 70
Norman Gray, Mark Taylor
21 April 1999. Release DR-0.7-13. Last updated 24 August 2001