The discussion in Section 3 should be enough to let you produce your own documents but, now or in the future, you may find it useful to be able to read the DTD directly.[Note 11] Once you are familiar with the underlying ideas, the expression of them in the DTD turns out to be agreeably compact and reasonably readable.
My account of the DTD syntax will be rather compressed - see [gentle], or the other references in Section 2.2, for alternatives.
A simple HTML-like DTD could be declared as follows:
    <!ELEMENT html O O (head, body, copyright?)>
    <!ELEMENT head O O (title & link*)>
    <!ELEMENT title - - (#PCDATA)>
    <!ELEMENT link - O EMPTY>
    <!ELEMENT body O O (p | dl)+>
    <!ELEMENT p - O (#PCDATA)>
    <!ELEMENT dl - - (dt, dd)+>
    <!ELEMENT (dt|dd) - O (#PCDATA)>
    <!ELEMENT copyright - - (#PCDATA)>
    <!ENTITY % URL "CDATA"
        -- The term URL means a CDATA attribute
           whose value is a Uniform Resource Locator,
           See RFC1808 (June 95) and RFC1738 (Dec 94).
        -->
    <!ATTLIST link
        href %URL #REQUIRED -- URL for linked resource --
        rel (next | prev) #IMPLIED -- reverse link types --
        >
    <!ENTITY amp "&">
And here is a simple document which uses this DTD:
    <link href="http://www.astro.gla.ac.uk/users/norman/" rel=next>
    <title>This is a title</title>
    <p>And here is a paragraph
    <dl>
    <dt>With a delimited list
    <dd>Correctly formed &amp; OK
    </dl>
This DTD displays most of the important syntactical features in an SGML DTD, so if we explain it line-by-line, it should illustrate the features you need to make some sense of most DTDs.
    <!ELEMENT html O O (head, body, copyright?)>
This `element declaration' declares the html element. The element type name is followed by a statement of whether the start and end tags may be omitted if the parser can infer their presence. The minimisation specifications may be either `-' (minus), indicating that the corresponding tag is required, or `O' (letter O), indicating that it may be omitted. Following this is the `content model' which, in this case, states that the html element must consist of one head, one body, and an optional copyright, in that order - the comma connecting the element names specifies that they must be in order, and the question mark following the copyright element indicates that it may occur zero or one times.
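
As a rough illustration of what this content model permits, the sketch below translates `(head, body, copyright?)' by hand into a Python regular expression over space-separated element names. The function name matches_html_model and the space-separated encoding are assumptions made for this example only; a real SGML parser does not work on strings in this way.

```python
import re

# Hand translation of the content model (head, body, copyright?):
# the comma becomes plain sequence, and '?' keeps its usual regex meaning.
HTML_MODEL = re.compile(r"head body( copyright)?")

def matches_html_model(children):
    """Return True if the list of child element names satisfies the model."""
    return HTML_MODEL.fullmatch(" ".join(children)) is not None

print(matches_html_model(["head", "body"]))               # True
print(matches_html_model(["head", "body", "copyright"]))  # True
print(matches_html_model(["body", "head"]))               # False
```

The last call fails because the comma connector fixes the order of head and body.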
The omission of the start tag is possible in this case, since the first element in the html element must be a head element, so whenever the parser finds a head element, it knows that the html element has begun.
So what is in the head element?
    <!ELEMENT head O O (title & link*)>
The head element consists of precisely one title, and zero or more link elements, in either order. The head tags can be inferred from the presence of the title and link elements, and so it is feasible for us to declare that they may be omitted. The star following the link token in the content model indicates that this element may appear zero or more times, and the ampersand declares that the elements on either side of it must both appear, but can do so in either order. Note that this content model allows `title', `title link link...' and `link link...title', but not `link' or `link title link'.
Finally we have some text:
    <!ELEMENT title - - (#PCDATA)>
The title element is very simple: neither the start nor the end tag may be omitted, and it may contain only characters (#PCDATA stands for `parsed character data') and entity references such as &amp;.
    <!ELEMENT link - O EMPTY>
The link element has no actual content, so it is given a content model consisting of the reserved word EMPTY. The tag omission for empty elements is always `- O'. The point of the link element is to hold its attributes, which we will come to shortly.
    <!ELEMENT body O O (p | dl)+>
The document body consists of paragraph elements, or `delimited lists'. The `or' connector, `|', indicates that either of the p or dl elements may appear, and the `plus' occurrence indicator asserts that the group (p|dl) must appear one or more times. In other words, the body consists of a sequence of p and dl elements in arbitrary order.
Finally, we start to specify the `interesting' content of the document.
    <!ELEMENT dl - - (dt, dd)+>
Like the body itself, the dl element consists of a sequence of one or more structures. Unlike the body element, however, the structure is not a list of alternatives, but a sequence. Where the body element would allow `p p dl p' for example, the dl element requires that the dt and dd elements alternate - the repeatable element is the ordered pair of elements `dt, dd'.
    <!ELEMENT (dt|dd) - O (#PCDATA)>
    <!ELEMENT p - O (#PCDATA)>
    <!ELEMENT copyright - - (#PCDATA)>
The paragraph, list and copyright elements have simple content models. Note that we can specify the structure of more than one element in the same declaration.
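
To compare the head, body and dl content models side by side, the following sketch translates each of them by hand into a Python regular expression over space-separated element names and checks the sequences mentioned in the text. The MODELS table, the accepts function and the string encoding are all assumptions of this example, not part of any SGML tool.

```python
import re

# Hand translations of three content models into regular expressions
# over space-separated element names (an encoding assumed for this sketch).
MODELS = {
    # (title & link*): both members must occur, in either order; link* is a
    # single member, so the links form one run before or after the title.
    "head": re.compile(r"title( link)*|(link )*title"),
    # (p | dl)+: one or more elements, each of which is either p or dl.
    "body": re.compile(r"(p|dl)( (p|dl))*"),
    # (dt, dd)+: one or more ordered dt/dd pairs.
    "dl": re.compile(r"dt dd( dt dd)*"),
}

def accepts(element, children):
    """Return True if the child sequence satisfies the element's model."""
    return MODELS[element].fullmatch(children) is not None

print(accepts("head", "link link title"))  # True
print(accepts("head", "link title link"))  # False
print(accepts("body", "p p dl p"))         # True
print(accepts("dl", "dt dt dd"))           # False
```

Note how the `&' connector has no direct regular-expression equivalent and has to be expanded into its two possible orderings.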
Prior to specifying the attributes for the link element, we may declare an abbreviation.
    <!ENTITY % URL "CDATA"
        -- The term URL means a CDATA attribute
           whose value is a Uniform Resource Locator,
           See RFC1808 (June 95) and RFC1738 (Dec 94).
        -->
This declares URL to be a `parameter entity', usable only within this DTD. The entity reference `%URL' will be substituted by the string `CDATA' (unparsed character data) when it is encountered. A DTD may declare an entity more than once, but any declarations after the first are silently ignored.
Note the structure of the comment in this last declaration: in SGML, comments may appear only within markup declarations (that is within `<! ... >'), they start and end with the string `--', and there may be more than one in a row. Thus you may legally find `<!>' within an SGML file - this is a completely empty markup declaration. Such a declaration may have a single comment within it, as in `<!-- this is a comment -->', or it may have several, as in `<!-- here -- -- is a comment ------>', which has three comments within it, the third of which is empty.
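
The comment rule can be checked mechanically. The sketch below is a simplified illustration, assuming a declaration that contains only comments and no quoted literals; the function name comments_in_declaration is invented for this example. It pairs up the `--' delimiters inside one markup declaration and returns the text of each comment.

```python
import re

def comments_in_declaration(declaration):
    """List the comments inside one markup declaration '<! ... >'."""
    if not (declaration.startswith("<!") and declaration.endswith(">")):
        raise ValueError("not a markup declaration")
    inside = declaration[2:-1]
    # Each comment runs from one '--' to the next '--'.
    return re.findall(r"--(.*?)--", inside, flags=re.DOTALL)

print(comments_in_declaration("<!>"))
# []
print(comments_in_declaration("<!-- here -- -- is a comment ------>"))
# [' here ', ' is a comment ', '']
```

The final empty string corresponds to the empty third comment, `----'.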
Now we declare the attributes for the link element.
    <!ATTLIST link
        href %URL #REQUIRED -- URL for linked resource --
This declares an href attribute. After expansion of the %URL entity reference, this attribute is seen to have a `declared value' of CDATA (unparsed character data), and this attribute is required to be present, so that the SGML parser will object if it finds a link element in a document without an href attribute.
The rel attribute can take only two values:
        rel (next | prev) #IMPLIED -- reverse link types --
        >
The link element may have the attribute `rel=next' or `rel=prev', but no other strings. Since this attribute is `#IMPLIED', it is also permitted to omit it entirely. A document may even specify this as simply `<link href="here.html" next>' and the parser will infer that the value `next' is associated with the attribute name `rel'.
    <!ENTITY amp "&">
Entity references (other than parameter entities, which are internal to a DTD) are made using a construction such as `&entname;'. This presents a problem if you want to include the ampersand in your text, but this declaration sets up an entity called `amp', which can be used to include an ampersand in text by typing `&amp;'. You can use the same mechanism in your own documents to create shorthand forms for bits of text you don't want to retype.
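
The effect of general entity references can be imitated in a few lines. This is only a sketch under simplifying assumptions: it makes a single pass and does not re-parse replacement text as an SGML parser would. The `amp' entry mirrors the declaration above, while the `sgml' entry and the name expand_entities are hypothetical, added to show a shorthand of the kind just described.

```python
import re

# Entity table for this sketch: 'amp' comes from the DTD above; 'sgml' is a
# hypothetical shorthand of the kind an author might declare.
ENTITIES = {"amp": "&", "sgml": "Standard Generalized Markup Language"}

def expand_entities(text, entities=ENTITIES):
    """Replace each &name; reference with its declared replacement text."""
    def substitute(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError(f"undeclared entity: {name}")
        return entities[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9.-]*);", substitute, text)

print(expand_entities("Correctly formed &amp; OK"))
# Correctly formed & OK
```

An ampersand that is not followed by a name and a semicolon is left untouched by this sketch.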