To avoid ambiguity, this somewhat arcane note needs to be here. The terms `SGML system' and `SGML application' have precise meanings in the SGML world. An SGML system is a program such as Jade or SP which parses SGML. An SGML application is a collection of DTDs and other supporting documents (see the standard [iso8879], clause 4.279). This package is therefore properly referred to as an SGML application, but since this term could be confusing, I will refer to this package instead as the (Starlink) SGML Set.
In fact, there is a version of TeX which produces PDF files directly, and a TeX-to-HTML converter called TeX4ht (see <http://www.tug.org/applications/tex4ht/mn.html>) which writes special DVI files which work with a postprocessor, and so uses the TeX parser to produce HTML indirectly. This really just pushes the hack elsewhere.
Note for pedants: There is a difference between `elements' and `element types': the former are things which appear in documents, with data in them; the latter are the abstract things defined by the DTD. The distinction is not particularly important outside of a DTD, however, so I will not continue to make it in the description of the element types below. It will always be possible to make the distinction from the context anyway.
This is unlike XML, which is likely to be written largely by authoring programs.
Note that this is different from the default SGML minimisation, which would have <em/emphasised text/. In order to be able to write XML-style <ref/> for empty elements, we had to change the form of this particular minimisation option. This much is a convenient shortcut. SGML defines other tag minimisation locutions, so that <p<em/emph></> is legal. This rarely improves readability, can get you into terrible messes, and is part of the `parser hell' which XML was designed to avoid. I mention it purely for completeness; some of these rather more extreme minimisation functions have been disabled in this SGML application.
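To see why the two conventions clash, here is a toy Python sketch of both readings. It uses regular expressions, not a real SGML parser, and the function names are hypothetical; the actual change is made in the parser's configuration, not in anything like this code. Under the default net-enabling start tag, <em/ opens an element and the next / closes it; under the XML-style convention, <ref/> is a complete empty element. Fed an XML-style empty tag, the default reading swallows the following text up to the next slash.

```python
import re

# Toy model only: real SGML parsers take their delimiters from the
# SGML declaration, not from regular expressions like these.

# Default SGML net-enabling start tag: "<name/" opens the element and
# the next "/" (the null end tag) closes it.
NET_ELEMENT = re.compile(r"<(\w+)/([^/]*)/")

# XML-style convention: "<name/>" is a complete, empty element.
EMPTY_ELEMENT = re.compile(r"<(\w+)/>")


def expand_net(text: str) -> str:
    """Rewrite '<em/emphasised text/' as '<em>emphasised text</em>'."""
    return NET_ELEMENT.sub(r"<\1>\2</\1>", text)


def expand_empty(text: str) -> str:
    """Rewrite '<ref/>' as '<ref></ref>'."""
    return EMPTY_ELEMENT.sub(r"<\1></\1>", text)


print(expand_net("<em/emphasised text/"))    # -> <em>emphasised text</em>
print(expand_empty("see <ref/> here"))       # -> see <ref></ref> here
# Under the default reading, an XML-style empty tag swallows
# everything up to the next slash:
print(expand_net("see <ref/> and <em/x/"))   # -> see <ref>> and <em</ref>x/
```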
MathML was considered, but it is neither well supported in browsers nor designed to be easily written by hand.
For example, & inside an <meqnarray> element results in a literal ampersand in the maths, rather than being interpreted as an alignment character. I could go on (you guessed!), but even at this temporal distance, I feel my reader's tolerance for parser detail gurgling down the plughole.
Note for SGML initiates: it does seem a little disappointing that none of SGML's various escaping mechanisms could help here, but the fact that the code text has to remain pretty much inviolate (apart from the possibility of a few spaces here and there) is rather restrictive. Another possibility I considered was making the codebody element have CDATA content, even though that's generally deprecated in the most lurid terms. Far from solving the problem, however, this would make things worse: the </ which ends the element would still be magic; you'd have to have an explicit </codebody> closing the element (no entities recognised); and you'd have to have an explicit <codebody> starting the element, since the *- short reference doesn't seem to permit the element to be ended with the </codebody> end-tag.
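Why the </ stays magic can be sketched in a few lines of Python. This assumes the usual SGML rule for CDATA declared content, in which the content is terminated by the end-tag-open delimiter </ when it is followed by a name start character (approximated here as an ASCII letter); the function name is hypothetical, and real parsers differ in detail. A code sample that happens to contain a closing tag is cut short long before the intended </codebody>.

```python
import re

# Approximation of SGML's rule for CDATA declared content: the content
# ends at the first "</" that is followed by a name start character
# (approximated here as an ASCII letter).
END_TAG_OPEN = re.compile(r"</[A-Za-z]")


def cdata_content(text: str) -> str:
    """Return the prefix of text that a parser would accept as CDATA content."""
    match = END_TAG_OPEN.search(text)
    return text if match is None else text[:match.start()]


code = 'fputs("<b>bold</b>\\n", out);\n</codebody>'
print(cdata_content(code))  # -> fputs("<b>bold
```

The comparison operator in something like if (a </ b) would survive under this approximation, since </ is followed by a space; it is only tag-like text inside the code that truncates the element.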
Unusually, the common misspelling `straightened' is also appropriate here.
In fact the sgml2docs command produces a file called doc.tar, so running either of these commands directly after the other would overwrite the output of the first.
Note for pedants: there is a distinction between document type declaration and definition. The document type definition is the collection of rules which specifies which elements can go where, what attributes they have, and so on; the declaration is the <!DOCTYPE...> invocation at the top of the document file (the `document instance' in SGML parlance) which associates that instance with a particular definition. The abbreviation `DTD' usually refers to the definition, and it is the definition that this section is about.
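As an illustration of the split, the Python sketch below pulls apart a document type declaration. The declaration itself only names the document element and gives a public and/or system identifier, which the parser then resolves (for instance through a catalogue) to the definition. The identifier and file name used here are hypothetical examples, not the real Starlink ones, and the regular expression handles only the simple external-identifier forms.

```python
import re
from typing import NamedTuple, Optional

# Simple forms only:  <!DOCTYPE name PUBLIC "pubid" ["sysid"]>
#                or   <!DOCTYPE name SYSTEM "sysid">
# (internal declaration subsets in [...] are not handled).
DOCTYPE = re.compile(
    r'<!DOCTYPE\s+(?P<name>[\w.-]+)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)")?'
    r'(?:\s+(?:SYSTEM\s+)?"(?P<system>[^"]*)")?',
    re.IGNORECASE,
)


class DoctypeDecl(NamedTuple):
    name: str               # the document (root) element type
    public: Optional[str]   # public identifier, if any
    system: Optional[str]   # system identifier, if any


def parse_doctype(text: str) -> Optional[DoctypeDecl]:
    """Extract the parts of the first document type declaration in text."""
    m = DOCTYPE.search(text)
    if m is None:
        return None
    return DoctypeDecl(m["name"], m["public"], m["system"])


decl = parse_doctype('<!DOCTYPE article PUBLIC "-//Example//DTD Hypothetical Article//EN">')
print(decl)
```

Nothing in the declaration says which elements may go where; that information lives entirely in the definition the identifiers point to.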
Note that this currently doesn't work fully: there's some defect in the HyTime declaration of the docxref element type which I haven't been able to identify.
Although I believe Clark was involved with the specification of the DSSSL standard, he has said (on the DSSSList discussion list, May 1999) that the transformation language has significant weaknesses.
It's a matter of taste whether you prefer Perl or DSSSL. DSSSL, based on Scheme Lisp, is undeniably odd, but I've grown to rather like it. A language with no assignments, no loops, no real sequences of actions, and where absolutely everything is a function, has a certain rather twisted glamour to it, like a nature programme about fish-life four miles down the Marianas Trench.
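For readers who have not met the style, here is a rough Python analogue (not DSSSL, and with a hypothetical Node type and function name) of the kind of code one ends up writing: counting the elements of a given type in a document tree using only a conditional expression, recursion and a fold, with no loops or assignments in the function itself.

```python
from functools import reduce
from typing import NamedTuple, Tuple


class Node(NamedTuple):
    gi: str                            # generic identifier (element type name)
    children: Tuple["Node", ...] = ()  # child elements, in document order


def count_elements(node: Node, gi: str) -> int:
    """Count elements of type gi in the subtree rooted at node.

    No loops and no assignments: the result is built from a conditional
    expression, recursion and a fold over the children.
    """
    return (1 if node.gi == gi else 0) + reduce(
        lambda total, child: total + count_elements(child, gi),
        node.children,
        0,
    )


doc = Node("doc", (Node("p"), Node("sect", (Node("title"), Node("p")))))
print(count_elements(doc, "p"))  # -> 2
```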