SGML is a metalanguage: that is, a language for writing languages in.
This is not as arcane as it sounds. It simply means that you use SGML to write a Document Type Definition (DTD), which defines the abstract structure of a document type; any document which claims to be of that type must then have that syntactic structure.
HTML is a well-known example of a document type defined by an SGML DTD, and its rules are familiar. An HTML document consists of precisely one head element and one body element. The head must contain precisely one title element, and may contain zero or more link elements. The title element has plain characters as its content. The link element has no content, but has four optional attributes..., and so on.
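Rules like these are what a DTD records: each element type gets a content model, and SGML content models behave much like regular expressions over the names of an element's children. The sketch below, which assumes that simplification and is written in Python rather than SGML syntax, encodes the rules above as regular expressions and checks a document tree against them recursively. The `Element` class, the `DTD` table and the `validate` function are hypothetical names; the `p` element is an assumption added only to give `body` some content, and attributes (including those of `link`) are not modelled.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Element:
    """A parsed element: its name, child elements and character content."""
    name: str
    children: list[Element] = field(default_factory=list)
    text: str = ""


# Hypothetical, much-simplified "DTD" for the rules above (Python, not SGML
# syntax). Each entry maps an element name to a pair:
#   (regex over the child names, each followed by a space; character data allowed?)
DTD = {
    "html":  (r"head body ", False),              # exactly one head, then one body
    "head":  (r"(link )*title (link )*", False),  # one title, any number of links
    "title": (r"", True),                         # characters only, no elements
    "link":  (r"", False),                        # no content at all
    "body":  (r"(p )*", False),                   # assumption: body holds only p
    "p":     (r"", True),                         # hypothetical paragraph element
}


def validate(elem: Element) -> list[str]:
    """Return a list of error messages; an empty list means elem conforms."""
    if elem.name not in DTD:
        return [f"undeclared element {elem.name}"]
    pattern, text_allowed = DTD[elem.name]
    errors = []
    child_names = "".join(child.name + " " for child in elem.children)
    if re.fullmatch(pattern, child_names) is None:
        errors.append(f"{elem.name} has invalid children: {child_names.strip() or '(none)'}")
    if elem.text.strip() and not text_allowed:
        errors.append(f"{elem.name} may not contain character data")
    for child in elem.children:
        errors.extend(validate(child))
    return errors


good = Element("html", [
    Element("head", [Element("title", text="My page"), Element("link")]),
    Element("body", [Element("p", text="Hello")]),
])
bad = Element("html", [
    Element("head", [Element("link")]),  # no title: violates the head rule
    Element("body"),
])

print(validate(good))  # []
print(validate(bad))   # ['head has invalid children: link']
```

Because the head pattern allows link elements on either side of the title, it accepts any order while still insisting on exactly one title; a head with two titles, or none, fails the match.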
A document marked up according to a specific DTD is processed by a parser, which reads first the DTD and then the document, and builds an abstract representation of the document. The parser passes this representation to a formatter, which produces output in some form that may, of course, be processed further into a final document. The SGML parser and whichever editor you use are both quite generic, but the formatter is tied to a particular DTD. Figure 1 shows a diagram of this system.
Figure 1: An SGML system
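The division of labour between a generic parser and a DTD-specific formatter can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an SGML system: the standard library's `html.parser.HTMLParser` stands in for the generic parser, but it is not an SGML parser and reads no DTD, so the one piece of DTD knowledge it needs (which elements are empty) is passed in explicitly. The `TreeBuilder` class and `format_text` function are hypothetical names, the nested tuples play the role of the abstract representation, and the plain-text formatter handles only `title`, `link` and an assumed `p` element.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Generic stage: builds an abstract tree of (name, children) tuples.

    It attaches no meaning to any element. The one piece of document-type
    knowledge it needs, which elements are empty, is passed in, standing in
    for the DTD that a real SGML parser would read first.
    """

    def __init__(self, empty_elements):
        super().__init__()
        self.empty = set(empty_elements)
        self.root = ("#document", [])
        self.stack = [self.root]  # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in self.empty:  # empty elements never get an end tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag not in self.empty and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1][1].append(data.strip())


def format_text(node):
    """DTD-specific stage: renders the tree as plain text.

    Only this function knows what title, link and p mean.
    """
    if isinstance(node, str):
        return node
    name, children = node
    if name == "link":
        return ""  # link has no content to render
    inner = "".join(format_text(child) for child in children)
    if name == "title":
        return inner + "\n" + "=" * len(inner) + "\n"  # underlined heading
    if name == "p":
        return inner + "\n"
    return inner  # html, head, body and the document root just pass through


source = ('<html><head><title>Hello</title><link rel="next" href="b.html"></head>'
          "<body><p>First paragraph.</p><p>Second.</p></body></html>")
parser = TreeBuilder(empty_elements={"link"})
parser.feed(source)
parser.close()
print(format_text(parser.root), end="")
```

Replacing `format_text` with a different formatter, for example one that emits LaTeX, would leave `TreeBuilder` untouched, which mirrors the split described above: the parser is generic, and only the formatter depends on the document type.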
For further information on SGML, see Appendix A. For other texts, see the well-known Gentle Introduction to SGML [gentle]. For more detailed information, see the useful but compressed [bradley]; and for an authoritative account, see [goldfarb], which is an exegesis of the standard, [iso8879].