Up: Supporting library programs
Previous: img-eqlist.pl

abstract-star2html.pl - Obtain an SGML summary of a Star2HTML document

Description

This script reads a file marked up using the Star2HTML extensions, plus an HTX index file, and produces a summary of the document marked up using the DocumentSummary DTD. The summary includes the sub*section structure (sections, subsections and so on), the cross-references, and most of the document header. The script also emits a suitable CATALOG line on STDOUT.

Producing an accurate or complete summary of the target document is only a secondary aim of this tool. Its primary requirement is simply to run to completion unattended (within the package's installation script) and produce a valid SGML document, which can be used as a cross-reference target in the General DTD's DOCXREF element without being grossly misleading.

Usage:

abstract-star2html.pl \
--prefix=sun180.htx/ --output=sun180.summary \
/star/docs/sun180.tex /star/docs/sun180.htx/htx.index \
>>CATALOG 2>abstract-warnings

The parsing can cope with \xlabel commands either outside section headings or inside them. It copes with multiple \xlabel commands within a single heading by emitting <label> elements after the heading, and logs a message to STDERR when it discovers this.
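A minimal Python sketch of separating a heading's text from the \xlabel commands inside it, with a warning when there is more than one. It assumes each \xlabel argument contains no nested braces; the function name split_heading is hypothetical, and how the labels are then rendered as DocumentSummary markup is not shown.

```python
import re
import sys

# Assumed form of the command: \xlabel{name} with a brace-free argument.
XLABEL = re.compile(r"\\xlabel\{([^{}]*)\}")


def split_heading(title):
    """Separate a heading's text from any \\xlabel commands it contains.

    Returns (clean_title, labels) and logs to stderr when there is more
    than one label in the heading.
    """
    labels = XLABEL.findall(title)
    # Remove the commands and collapse the whitespace they leave behind.
    clean = " ".join(XLABEL.sub(" ", title).split())
    if len(labels) > 1:
        print(f"warning: {len(labels)} \\xlabel commands in heading {clean!r}",
              file=sys.stderr)
    return clean, labels
```

For example, `split_heading(r"Setup \xlabel{setup} \xlabel{config}")` gives the title `"Setup"` and the labels `["setup", "config"]`, each of which could then become a <label> element after the heading.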

If the parser finds LaTeX markup in a section heading (which happens somewhere in almost every file), it logs a message to STDERR.
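One way to detect such markup is a crude character test, applied to the heading after any \xlabel commands have been removed. This Python sketch treats any backslash, math shift, tie or brace as markup; that heuristic is an assumption, not necessarily the test the script itself uses.

```python
import re
import sys

# Heuristic (an assumption): a backslash, '$', '~', '{' or '}' remaining
# in a heading is taken as evidence of LaTeX markup.
MARKUP = re.compile(r"[\\$~{}]")


def warn_if_markup(title):
    """Log to stderr and return True if the heading text appears to contain markup."""
    if MARKUP.search(title):
        print(f"warning: LaTeX markup in heading {title!r}", file=sys.stderr)
        return True
    return False
```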

The parsing respects \begin{htmlonly}...\end{htmlonly} environments.

The parsing copes with the arguments of the commands it matches (\newcommand{\star...} and \sub*section{...}) spanning more than one line: it concatenates the lines before analysing them.
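Concatenating a command's lines amounts to reading on until its braces balance. Here is a minimal Python sketch of that approach, assuming escaped braces (\{, \}) should be ignored; it does not handle braces inside LaTeX % comments, and the function names are hypothetical.

```python
def brace_depth(text):
    """Net number of unescaped '{' minus '}' in text (LaTeX comments not handled)."""
    depth = 0
    escaped = False
    for ch in text:
        if escaped:
            escaped = False          # character after a backslash, e.g. \{ or \}
        elif ch == "\\":
            escaped = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
    return depth


def join_command(lines, start):
    """Join lines from index start until the braces opened there are balanced.

    Returns (joined_text, index_of_next_unread_line). If the braces never
    balance, it stops at the end of the input rather than looping forever.
    """
    parts = []
    depth = 0
    i = start
    while i < len(lines):
        line = lines[i]
        parts.append(line.strip())
        depth += brace_depth(line)
        i += 1
        if depth <= 0:
            break
    return " ".join(parts), i
```

Given the lines `\section{A rather` and `long title}`, `join_command` returns the single string `\section{A rather long title}`, ready for the same analysis as a one-line heading.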

It emits a warning if an \xlabel doesn't appear in the HTX index file.
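The index lookup, together with the --prefix option described in the argument list, might be sketched in Python as follows. The HTX index file format is not described here, so the sketch assumes it has already been parsed into a dictionary mapping each label to a filename; both that dictionary and the function name are hypothetical.

```python
import sys


def resolve_label(label, index, prefix):
    """Return the URL for an \\xlabel, or None (with a warning) if it is not indexed.

    `index` is a hypothetical dict mapping label -> filename, standing in for
    the parsed HTX index file; `prefix` is the --prefix value.
    """
    filename = index.get(label)
    if filename is None:
        print(f"warning: \\xlabel{{{label}}} not found in HTX index", file=sys.stderr)
        return None
    return prefix + filename
```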

Argument list

input-file-name = Star2HTML file (Given): A file marked up using the Star2HTML extensions to LaTeX2HTML
index-file-name = HTX index file (Given): The index file produced by HTX (see SUN/188)
--prefix=url-prefix = option (Given): The prefix is prepended to each filename in the HTX index, so that the generated URLs are relative to the appropriate root of the document server. That root is set in the DSSSL variable %starlink-document-server%, and might have the value `file:///star/docs/'.
--output=outfile = option (Given): The name of the file to receive the generated output. If omitted, the result goes to STDOUT.
--force = option (Given): If present, any error encountered after this option is processed still terminates the program, but the program returns with a zero exit status.
--version = option (Given): Print the version number and exit.
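The effect of --force on the exit status can be sketched in Python as a wrapper around the program's main routine. This is a simplification under stated assumptions: it honours --force wherever it appears in the argument list, whereas the option above applies only to errors arising after it is processed, and `run` and `main` are hypothetical names.

```python
import sys


def run(argv, main):
    """Call main(argv) and map any exception to an exit status.

    Simplification: --force is honoured wherever it appears in argv.
    """
    force = "--force" in argv
    try:
        main(argv)
    except Exception as exc:  # e.g. a missing input file
        print(f"error: {exc}", file=sys.stderr)
        return 0 if force else 1
    return 0
```

A script entry point would then call `sys.exit(run(sys.argv[1:], main))`, so that an unattended installation step sees success even when the summary could not be generated.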

Return value

Type: file

A file conforming to the DocumentSummary DTD

Authors

Norman Gray

Limitations

The result could benefit from a little hand-editing to insert attribute values of the AUTHOR element, such as the email address, which aren't included in the Star2HTML file; the document should be valid without them, however. These attributes aren't currently used in the cross-reference, so there's no great loss at present.

The script also assumes that the commands it matches appear at the beginning of a line, possibly preceded by whitespace. This isn't just for speed: it also avoids matching references to the \section command within the body of the text.
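The line-start restriction is easy to express with an anchored regular expression. In this Python sketch (the pattern and function name are illustrative, not taken from the script), a sectioning command is recognised only when nothing but whitespace precedes it on the line.

```python
import re

# Match \section{, \subsection{, \subsubsection{ ... only at the start of a
# line, after optional whitespace, so mentions of \section in body text are ignored.
SECTION = re.compile(r"^\s*\\((?:sub)*section)\{")


def section_level(line):
    """Return the command name ('section', 'subsection', ...) or None."""
    m = SECTION.match(line)
    return m.group(1) if m else None
```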

The parsing doesn't attempt to deal with markup in section titles, but it does attempt to detect and warn about it, logging a message to stderr.

There's not a lot of point in working hard to make this code do much better than this, since that would essentially require the sophistication of a full document conversion.

Because folk can do arbitrarily clever things with newcommands, I've had to dumb down the parsing of them. The code here will successfully extract the document number, author, etc., as long as the corresponding newcommand is all on one line. I've had to do the same with \(sub)*section parsing. This will sometimes fail, but the result will then be a thin but valid SGML document, and the code will not spin its wheels indefinitely. If you include the option --force, then even if there's a fatal error, such as an input file not being present, the script will still return with a zero exit status.
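The one-line restriction on newcommands can be pictured as a single regular expression applied line by line. This Python sketch matches only definitions of the form \newcommand{\star...}{...} whose closing brace is on the same line; the command names in the usage note, such as \stardocnumber, are assumed examples rather than a list taken from the script.

```python
import re

# Only one-line definitions like \newcommand{\stardocnumber}{180.1} are matched;
# a definition whose closing brace is on a later line is silently skipped.
NEWCOMMAND = re.compile(r"^\s*\\newcommand\{\\(star\w*)\}\{(.*)\}\s*$")


def header_fields(lines):
    """Collect {command_name: value} for one-line \\newcommand{\\star...} definitions."""
    fields = {}
    for line in lines:
        m = NEWCOMMAND.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields
```

A definition such as `\newcommand{\stardoctitle}{A title` continued on the next line simply yields no entry, which gives the thin but valid result described above rather than an error.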
