The programcode element structure

5.1.2 The programcode element structure

In the case of the Starlink General DTD, described in Section 4, the important features were the meanings of the element types. In the case of the programcode DTD, however, the meanings of the element types are fairly straightforward, and the detail is in the structure of the DTD. It therefore seems best to focus on the structure of the DTD here, leaving the detailed descriptions of the elements, and their attributes, to Appendix D.

The programcode DTD includes essentially all of the paragraph-level elements in the Starlink General DTD: everything that may be included in a paragraph in that DTD may also be included in a paragraph in the programcode DTD, except for `docxref' and `ref', and with the addition of the `funcname' element.

Figure 5 displays the element structure of the programcode DTD. The syntax is that of a DTD - see Appendix A.2 for brief notes on this.

<!ELEMENT programcode   O O (docblock, (codegroup | codereference)+)>
<!ELEMENT codegroup     - O (docblock, routine+)>
<!ELEMENT codereference - O (docblock)>
<!ELEMENT docblock      O O (title, description?,
                             userkeywords?, softwarekeywords?,
                             authorlist?, copyright?, history?)>

<!ELEMENT routine       O O (codeopener?, routineprologue, codebody)>
<!ELEMENT codeopener    O O (#PCDATA)>
<!ELEMENT codebody      O O (#PCDATA)>
<!ELEMENT routineprologue O O (
                                (routinename,   diytopic*)? &
                                (moduletype,    diytopic*)? &
                                (purpose,       diytopic*)? &
                                (description,   diytopic*)  & 
                                (returnvalue,   diytopic*)? &
                                (argumentlist,  diytopic*)? &
                                (parameterlist, diytopic*)? &
                                (authorlist,    diytopic*)? &
                                (history,       diytopic*)? &
                                (usage,         diytopic*)? & 
                                (invocation,    diytopic*)? & 
                                (examplelist,   diytopic*)? &
                                (implementationstatus,  diytopic*)? & 
                                (bugs,          diytopic*)?
                                )>

<!ELEMENT routinename   O O (name, othernames?)>
<!ELEMENT moduletype    - O (#PCDATA)>
<!ELEMENT name          O O (#PCDATA)>
<!ELEMENT othernames    - O (name+)>

<!ELEMENT purpose       - O (%p.model)>
<!ELEMENT title         O O (#PCDATA)>
<!ELEMENT description   - O (%paralist;)>
<!ELEMENT (userkeywords | softwarekeywords)
                                - O (#PCDATA)>
<!ELEMENT returnvalue   - O (%paralist;)>
<!ELEMENT (argumentlist | parameterlist)
                        O O (parameter*)>
<!ELEMENT parameter     - O (name, type, description)>
<!ELEMENT type          - O (#PCDATA)>
<!ELEMENT examplelist   O O ((example,description)+)>
<!ELEMENT example       - O (#PCDATA)>
<!ELEMENT (usage | invocation | implementationstatus | bugs)
                        - O (%paralist;)>
<!ELEMENT diytopic     - O (title, %paralist;)>
<!ELEMENT copyright     - O (%paralist;)>

<!ELEMENT authorlist    O O ((author+ | authorref+), otherauthors?)>
<!ELEMENT otherauthors  - O (author+ | authorref+)>
<!ELEMENT author        - O (name, authornote?)>
<!ELEMENT authornote    - O (%paralist;)>

<!ELEMENT history       O O (change+)>
<!ELEMENT change        - O (%paralist;)>

<!ELEMENT funcname      - - (#PCDATA)>
<!ELEMENT webref - - (%simpletext)+>
<!ELEMENT url - - (#PCDATA)>

Figure 5: Element structure of the programcode DTD

Below, I describe these elements group-by-group. This description concentrates on the structure of the DTD and the relationships between the elements - I have not described the details of the elements or their attributes where these can be found in the detailed element listing in Appendix D.

<!ELEMENT programcode   O O (docblock, (codegroup | codereference)+)>
<!ELEMENT codegroup     - O (docblock, routine+)>
<!ELEMENT codereference - O (docblock)>
<!ELEMENT docblock      O O (title, description?,
                             userkeywords?, softwarekeywords?,
                             authorlist?, copyright?, history?)>

The programcode top-level element, like the codegroup and codereference elements which it contains, starts with a docblock element. This may provide a description, along with author, copyright and change-history information, or it may give as little as a title. At which of these levels the information is provided is up to the author of the documentation. The elements in the docblock must appear in the order specified here.

A codegroup element simply gathers together several related functions (this is deliberately vague); it might therefore represent all the functions defined in one source file, or in one directory of a source tree. A codereference is even vaguer: it documents a relationship between the current programcode document and another one. In the case of the DSSSL DTD, this is mapped to the structure in that language which includes one source file in another; in the case of the Fortran DTD, it could document the dependence of a source file on an `include' file.
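As a rough sketch of how these four elements nest, a document instance might be laid out as follows. The titles and text are invented for illustration, and any attributes which these elements require (see Appendix D) are left out, so treat this as an outline of the structure rather than a document guaranteed to validate.

```sgml
<programcode>
<docblock>
<title>Array utilities
<description>
<p>Routines for allocating and releasing arrays.

<codegroup>
<docblock>
<title>Allocation routines

<routine>
<routineprologue>
<routinename>allocarray
<description>
<p>Allocates an array of a given size.
<codebody>
      ...source code of allocarray...
</codegroup>

<codereference>
<docblock>
<title>Shared constants, documented separately
</programcode>
```

The second and third docblock elements here give only a title, which is the minimum the declaration allows. Most end-tags are left out because the declarations permit them to be omitted.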

<!ELEMENT routine       O O (codeopener?, routineprologue, codebody)>
<!ELEMENT codeopener    O O (#PCDATA)>
<!ELEMENT codebody      O O (#PCDATA)>
<!ELEMENT routineprologue O O (
                                (routinename,   diytopic*)? &
                                (moduletype,    diytopic*)? &
                                (purpose,       diytopic*)? &
                                (description,   diytopic*)  & 
                                (returnvalue,   diytopic*)? &
                                (argumentlist,  diytopic*)? &
                                (parameterlist, diytopic*)? &
                                (authorlist,    diytopic*)? &
                                (history,       diytopic*)? &
                                (usage,         diytopic*)? & 
                                (invocation,    diytopic*)? & 
                                (examplelist,   diytopic*)? &
                                (implementationstatus,  diytopic*)? & 
                                (bugs,          diytopic*)?
                                )>

A routine element documents a function, with arguments, a return value, and the like.

The codebody element is ignored by the processing system, but is still scanned by the parser. This could cause you a problem if there's anything in there which looks like something the parser would be interested in, namely an element start-tag, an entity reference, or a markup declaration. The left angle-bracket and the ampersand are recognised as the start of a tag or of an entity reference only if they are immediately followed by a name-start character (an upper- or lowercase letter); a markup declaration is anything starting with the string <!.... If the parser trips up on something, there are two things you can do. You can make minor edits to your source code to stop things looking like markup (adding a space character will always be enough): <a is the beginning of an element start-tag, but < a, with an interposed space, is not. Alternatively, you can bracket the code in a CDATA marked section (Section 3.5.1), as follows:

* ... end of code prologue
*-<![CDATA[
      ...fortran code including <ignored &markup...
*]]>

Which of these alternatives you prefer is largely a matter of taste, I think, but remember that you'll only have to do this for those source-code files which you include within your documentation. It is undeniable that these strategies are ugly, but something like this is fairly inevitable as the downside of having your source processable by more than one system at once. If both of these are unacceptable to you (on aesthetic grounds if nothing else), then you can always preprocess your sources to strip the code out and leave a `pure' SGML document.[Note 8]

Note: If the codebody element actually contains no code at all (perhaps because the document has been generated by a preprocessor stage), then you should include the attribute `empty' in the start-tag; this has no effect at present, but could become significant if the documents are repurposed in future versions of this set.

The routineprologue element contains all the `meta-information' for the routine, such as authorship, argument list, return value and the like. The declaration here looks particularly complicated, but that is largely due to an unwieldiness in SGML's DTD syntax. This declaration simply states that each of the routinename, purpose, etc., elements may appear at most once, but that each of these elements may be freely interspersed with diytopic elements. Only the description element must appear.
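To make the declaration more concrete, here is a sketch of a prologue which gives its elements in a different order from the declaration and slips in a diytopic along the way. The content is invented, and any required attributes are omitted.

```sgml
<routineprologue>
<purpose>Allocate an array of a given type
<routinename>allocarray
<diytopic>
<title>Thread safety
<p>This routine is not re-entrant.
<description>
<p>Allocates storage for an array and returns a pointer to it.
<returnvalue>
<p>A pointer to the new array, or a null pointer on failure.
</routineprologue>
```

Leaving out the description element, or giving purpose twice, would make the prologue invalid; the other elements could simply be dropped.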

<!ELEMENT routinename   O O (name, othernames?)>
<!ELEMENT moduletype    - O (#PCDATA)>
<!ELEMENT name          O O (#PCDATA)>
<!ELEMENT othernames    - O (name+)>

The routinename element has structure, though in the usual case (<routinename>helloworld) you wouldn't notice this, since the tags of the name element it contains may be omitted. The othernames element is useful when a function has some generic name, say allocarray, plus some specific names, say allocarray_int and allocarray_float. The moduletype element allows you to document that a particular module is a <moduletype>Perl script, for example.
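Using the names from the paragraph above, the generic-plus-specific case might be marked up along these lines (a sketch only, with any attributes omitted):

```sgml
<routinename>
<name>allocarray
<othernames>
<name>allocarray_int
<name>allocarray_float
</routinename>
```

The first name start-tag could equally be left out, as in the short form <routinename>allocarray, because the declaration allows both tags of the name element to be omitted.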

<!ELEMENT purpose       - O (%p.model)>
<!ELEMENT title         O O (#PCDATA)>
<!ELEMENT description   - O (%paralist;)>
<!ELEMENT (userkeywords | softwarekeywords)
                                - O (#PCDATA)>
<!ELEMENT returnvalue   - O (%paralist;)>
<!ELEMENT (argumentlist | parameterlist)
                        O O (parameter*)>
<!ELEMENT parameter     - O (name, type, description)>
<!ELEMENT type          - O (#PCDATA)>
<!ELEMENT examplelist   O O ((example,description)+)>
<!ELEMENT example       - O (#PCDATA)>
<!ELEMENT (usage | invocation | implementationstatus | bugs)
                        - O (%paralist;)>
<!ELEMENT diytopic     - O (title, %paralist;)>
<!ELEMENT copyright     - O (%paralist;)>

The distinction between purpose and description is that purpose is intended for a brief, perhaps one-line, summary of the function, whereas description is intended for a longer discussion.

The description element is used in the docblock, routineprologue, examplelist and parameter elements; authorlist is used in both routineprologue and docblock elements; and name is used in the author, othernames, parameter and routinename elements.

The diytopic element is for other notes which aren't otherwise covered by the element types listed here. It has a very simple structure: a title followed by paragraphs of text.

The distinction between the userkeywords and softwarekeywords elements is that the former is intended to supply keywords to help the final user of the software, whereas the latter is intended to be a home for keywords concerned with the categorisation of the software within the Starlink project.

The text %p.model; indicates that at this point, any of the paragraph-level elements from the Starlink General DTD may be used, with the exception of the `docxref' and `ref' elements, and the addition of the `funcname' element.

The text %paralist; is shorthand for p, (p | tabular)*, or in other words, a sequence of paragraphs and tabular elements.
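As an illustration of the declarations in this group, an argument list and an example list inside a routineprologue could look something like the following. The parameter names, types and example call are made up, and attributes (for instance any which record whether an argument is given or returned) are left out; Appendix D lists what is actually required.

```sgml
<argumentlist>
<parameter>
<name>n
<type>integer
<description>
<p>The number of elements to allocate.
<parameter>
<name>status
<type>integer
<description>
<p>Set to a non-zero value if the allocation fails.
</argumentlist>
<examplelist>
<example>call allocarray_int(100, status)
<description>
<p>Allocates an array of 100 integers.
</examplelist>
```

Each description here consists of a single p element, which is the shortest content that %paralist; allows.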

<!ELEMENT authorlist    O O ((author+ | authorref+), otherauthors?)>
<!ELEMENT otherauthors  - O (author+ | authorref+)>
<!ELEMENT author        - O (name, authornote?)>
<!ELEMENT authornote    - O (%paralist;)>

Each author element must be given an ID. The down-converter which processes the document will assume that authors with the same ID are the same author, and will attempt to assemble a full set of information about that author (i.e., email address, webpage) from the various available author elements with the same ID. It can then, for example, assemble a list of all the authors represented in a collection of code at the top of a codecollection element. You should probably try to make the information given in these scattered author elements consistent, although the down-converter won't enforce this.
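A sketch of an author list might look like this. I am assuming here that the ID is supplied through an attribute named `id'; that attribute name, the ID values and the authors themselves are invented for illustration, so check the attribute list in Appendix D before relying on them.

```sgml
<authorlist>
<author id="ana">
<name>A. N. Author
<authornote>
<p>Original version.
<author id="abc">
<name>A. B. Coder
</authorlist>
```

If another author element with the ID `ana' appeared elsewhere in the document, the down-converter would treat it as describing the same person.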

<!ELEMENT history       O O (change+)>
<!ELEMENT change        - O (p+)>

The history mechanism in programcode documents is intentionally simple, as it merely emulates the list-of-changes style used in the majority of the Starlink code-base. Specifically, it is simpler than the history mechanism in the General DTD (see Section 4.5). The change element has a required date, and a required `author' attribute, which links back to a previous author element.
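A history might therefore be written roughly as follows. The `author' attribute is the one described above; that the date is given in an attribute named `date', and the format of its value, are assumptions on my part, as are the ID `ana' and the change descriptions.

```sgml
<history>
<change author="ana" date="21-APR-1999">
<p>Initial version.
<change author="ana" date="24-AUG-2001">
<p>Corrected the handling of zero-length arrays.
</history>
```

The value of the `author' attribute is expected to match the ID of an author element given earlier in the document.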

<!ELEMENT funcname      - - (#PCDATA)>
<!ELEMENT webref - - (%simpletext)+>
<!ELEMENT url - - (#PCDATA)>

The only unusual element is funcname, which is intended to indicate other functions within the same `world' (vagueness again): these could be language primitives, or other documented functions. At present, this simply functions as a variant of the code element, but the system could be extended in future to generate cross-references for these.

The Starlink SGML Set
Starlink System Note 70
Norman Gray, Mark Taylor
21 April 1999. Release DR-0.7-13. Last updated 24 August 2001