The discussion in Section 3 should be enough to let you produce your own documents but, now or in the future, you may find it useful to be able to read the DTD directly.[Note 11] Once you are familiar with the underlying ideas, the expression of them in the DTD turns out to be agreeably compact and reasonably readable.
My account of the DTD syntax will be rather compressed - see [gentle], or the other references in Section 2.2, for alternatives.
A simple HTML-like DTD could be declared as follows:
<!ELEMENT html O O (head, body, copyright?)>
<!ELEMENT head O O (title & link*)>
<!ELEMENT title - - (#PCDATA)>
<!ELEMENT link - O EMPTY>
<!ELEMENT body O O (p | dl)+>
<!ELEMENT p - O (#PCDATA)>
<!ELEMENT dl - - (dt, dd)+>
<!ELEMENT (dt|dd) - O (#PCDATA)>
<!ELEMENT copyright - - (#PCDATA)>
<!ENTITY % URL "CDATA"
-- The term URL means a CDATA attribute
whose value is a Uniform Resource Locator,
See RFC1808 (June 95) and RFC1738 (Dec 94).
-->
<!ATTLIST link
href %URL #REQUIRED -- URL for linked resource --
rel (next | prev) #IMPLIED -- reverse link types --
>
<!ENTITY amp "&">
And here is a simple document which uses this DTD:
<link href="http://www.astro.gla.ac.uk/users/norman/" rel=next>
<title>This is a title</title>
<p>And here is a paragraph
<dl>
<dt>With a delimited list
<dd>Correctly formed &amp; OK
</dl>
This displays most of the important syntactical features in an SGML DTD, so if we explain it line-by-line, it should illustrate the features you need to make some sense of most DTDs.
<!ELEMENT html O O (head, body, copyright?)>
This `element declaration' declares the
html element. The element type name is followed
by a statement of whether the start and end tags may be omitted if the
parser can infer their presence. The minimisation specifications may
be either `-' (minus), indicating that the corresponding tag is
required, or `O' (letter O), indicating that it may be omitted.
Following this is the `content model' which, in this case, states that
the html element must consist of one head, one
body, and an optional copyright, in that order - the
comma connecting the element
names specifies that they must be in order, and the question mark
following the copyright element indicates that it may occur zero or
one times.
The omission of the start tag is possible in this case, since the
first element in the html element must be a head
element, so whenever the parser finds a head element, it can
know that the html element has begun.
So what is in the head element?
<!ELEMENT head O O (title & link*)>
The head element consists of precisely one title, and zero or
more link elements, in either order. The head tags
can be inferred from the presence of
the title and link elements, and so it is feasible for
us to declare that they may be omitted. The star following the
link token in the content model indicates that this element may
appear zero or more times, and the ampersand declares that the
elements on either side of it must both appear, but can do so in
either order. Note that this content model allows `title',
`title link link...' and `link link...title', but not
`link' or `link title link'.
Finally we have some text:
<!ELEMENT title - - (#PCDATA)>
The title element is very simple: neither the start nor the end tag may be omitted, and it may contain only characters (#PCDATA
stands for `parsed character data') and entity references such as
&amp;.
<!ELEMENT link - O EMPTY>
The link element has no actual content, so it is given a
content model consisting of the reserved word EMPTY. The tag
omission for empty elements is always `- O'. The point of the
link element is to hold its attributes, which we will come to
shortly.
<!ELEMENT body O O (p | dl)+>
The document body consists of paragraph elements, or `delimited lists'. The `or' connector, `|', indicates that either of the
p or dl elements may appear, and the `plus' occurrence
indicator asserts that the group (p|dl) must appear one or more
times. In other words, the body consists of a sequence of p
and dl elements in arbitrary order.
Finally, we start to specify the `interesting' content of the document.
<!ELEMENT dl - - (dt, dd)+>
Like the body itself, the dl element consists of a
sequence of one or more structures. Unlike the body element,
however, the structure is not a list of alternatives, but a sequence.
Where the body element would allow `p p dl p' for example, the
dl element requires that the dt and dd elements
alternate - the repeatable element is the ordered pair of
elements `dt, dd'.
<!ELEMENT (dt|dd) - O (#PCDATA)>
<!ELEMENT p - O (#PCDATA)>
<!ELEMENT copyright - - (#PCDATA)>
The paragraph, list and copyright elements have simple content models. Note that we can specify the structure of more than one element in the same declaration.
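One way to get a feel for the content models above is to notice that the connectors and occurrence indicators behave much like regular expressions over element names. The following Python sketch is an illustration only, and not how an SGML parser works: the names MODELS and matches are invented for this example, the models are translated by hand, and the `&' connector in the head model is written out as its two permitted orders.

```python
import re

# Hand-translated content models from the example DTD. Each child element
# name is written as a token followed by a space, so a list of children
# becomes a string such as "title link link ".
MODELS = {
    # (head, body, copyright?): ',' means "in this order", '?' zero or one
    "html": r"head body (copyright )?",
    # (title & link*): '&' means both operands, in either order
    "head": r"(title (link )*|(link )*title )",
    # (p | dl)+: '|' means either one, '+' means one or more times
    "body": r"((p|dl) )+",
    # (dt, dd)+: the repeated unit is the ordered pair dt, dd
    "dl": r"(dt dd )+",
}


def matches(element, children):
    """Return True if the list of child element names fits the model."""
    text = "".join(name + " " for name in children)
    return re.fullmatch(MODELS[element], text) is not None


print(matches("head", ["link", "link", "title"]))  # True
print(matches("head", ["link", "title", "link"]))  # False
print(matches("body", ["p", "p", "dl", "p"]))      # True
print(matches("dl", ["dt", "dt", "dd"]))           # False
```

The two head checks reproduce the cases discussed earlier: the link elements may all come before or all come after the title, but may not surround it.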
Prior to specifying the attributes for the link element, we
may declare an abbreviation.
<!ENTITY % URL "CDATA"
-- The term URL means a CDATA attribute
whose value is a Uniform Resource Locator,
See RFC1808 (June 95) and RFC1738 (Dec 94).
-->
This declares URL to be a `parameter entity', usable only
within this DTD. The entity reference `%URL' will be
substituted by the string `CDATA' (unparsed character data)
when it is encountered. A DTD may declare an entity more than once,
but any declarations after the first are silently ignored.
Note the structure of the comment in this last declaration: in SGML,
comments may appear only within markup declarations (that is within
`<! ... >'), they start and end with the string
`--', and there may be more than one in a row. Thus you
may legally find `<!>' within an SGML file - this is a
completely empty markup declaration. Such a declaration may have a
single comment within it, as in `<!-- this is a comment
-->', or it may have several, as in
<!-- here -- -- is a comment ------>, which
has three comments within it, the third of which is empty.
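The comment rule can be checked mechanically. The Python sketch below is a simplification for illustration: the function name comments_in is invented here, and it assumes the declaration contains nothing but comments and whitespace, which is true of the three examples just given but not of declarations such as the ENTITY one above.

```python
import re


def comments_in(declaration):
    """Return the text of each comment inside a comment-only declaration."""
    if not (declaration.startswith("<!") and declaration.endswith(">")):
        raise ValueError("not a markup declaration")
    inside = declaration[2:-1]  # strip the '<!' and '>' delimiters
    # Each comment runs from one '--' to the next '--'.
    return re.findall(r"--(.*?)--", inside, flags=re.DOTALL)


print(comments_in("<!>"))
# []
print(comments_in("<!-- here -- -- is a comment ------>"))
# [' here ', ' is a comment ', '']
```

The empty string at the end of the second result is the empty third comment, formed by the final four hyphens.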
Now we declare the attributes for the link element.
<!ATTLIST link
href %URL #REQUIRED -- URL for linked resource --
This declares an href attribute. After expansion of the
%URL entity reference, this attribute is seen to have a
`declared value' of CDATA (unparsed character data), and this
attribute is required to be present, so that the SGML parser will
object if it finds a link element in a document without an
href attribute.
The rel attribute can take only two values:
rel (next | prev) #IMPLIED -- reverse link types --
>
The link element may have the attribute `rel=next' or
`rel=prev', but no other strings. Since this attribute is
`#IMPLIED', it is also permitted to omit it entirely. A
document may even specify this as simply `<link
href="here.html" next>' and the parser will infer that the value
`next' is associated with the attribute name `rel'.
<!ENTITY amp "&">
Entity references (other than parameter entities, which are internal to a DTD) are made using a construction such as
`&entname;'. This presents a problem
if you want to include the ampersand in your text, but this
declaration sets up an entity called `amp', which can be used to
include an ampersand in text by typing `&amp;'. You can
use the same mechanism in your own documents to create shorthand forms for bits of
text you don't want to retype.
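As a rough picture of what happens to entity references, the Python sketch below keeps a table of declared entities, ignores any redeclaration (as described earlier for entities in a DTD), and substitutes each `&name;' reference in a single pass. It is a simplification under stated assumptions: the function names declare and expand are invented for this example, the `sgml' entity is a hypothetical shorthand that is not part of the DTD above, and a real parser handles cases this sketch does not (a reference without its closing semicolon, for instance).

```python
import re


def declare(entities, name, text):
    """Record an entity; a later declaration of the same name is ignored."""
    entities.setdefault(name, text)


def expand(text, entities):
    """Replace each '&name;' reference with its replacement text."""
    def lookup(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError("undeclared entity: " + name)
        return entities[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9.-]*);", lookup, text)


entities = {}
declare(entities, "amp", "&")
# Hypothetical shorthand entity, declared twice; only the first one counts.
declare(entities, "sgml", "Standard Generalized Markup Language")
declare(entities, "sgml", "this second declaration is ignored")

print(expand("Correctly formed &amp; OK", entities))
# Correctly formed & OK
print(expand("A guide to &sgml;", entities))
# A guide to Standard Generalized Markup Language
```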