Next Up Previous Contents
Next: parse
Up: Upconverter program descriptions
Previous: code2sgml
[ID index][Keyword index]

latex2sgml - Translate a LaTeX Starlink document into SGML.

Description

This program converts a LaTeX Starlink document into an equivalent SGML document conforming to the appropriate SGML DTD (SUN, SSN, etc). If a programcode document containing marked up routine prologues is specified, it will ensure that references to routines therein are handled properly.

The upconversion process is somewhat complicated and error-prone for a number of reasons, and some manual intervention is likely to be required to generate an SGML document which conforms to the appropriate DTD, is semantically equivalent to the LaTeX original, and whose SGML source is clear and maintainable.

Examples

Example 1:

     latex2sgml sun321.tex
        

The named LaTeX document is converted, and the output written to sun321.sgml. Progress messages are printed, including one at the end which says to look in sun321.translog for detailed warnings about the conversion, and indicates whether the resulting document conforms to the appropriate Starlink DTD.

Example 2:

     latex2sgml -R routines.sgml sun321.tex
        

In this example the same conversion is done, but with reference to the document routines.sgml which contains the marked up prologues of the routines which are to be documented in an appendix of the SUN. Appropriately identified references (\ref commands) in the text of the SUN will reference into the routines document (they will be translated into coderef elements).

Output

The program tries to report as much as it can about how the conversion has gone.

The first part of the output to the screen is the result of an invocation of latex2html, and consists of that program's normal, rather verbose, output.

Subsequent screen output is restricted to a brief summary of what stage the conversion is currently at. When conversion is complete (or has encountered a fatal error) a summary is written giving the name, doc.translog, of a log file into which more detailed information has been written. The log file provides line-based reports on items which might need attention. Here is an example of what it might look like:

     2:1:     Beginning l2s -> Starlink final SGML transformation
     2:1:     Verbosity level 5
     5:26:    Apparently duplicate author initials - using ID `auto.id-auth.1'
     5:55:    Ignoring line break in element H1
     4:56:    Eight-bit character ô may cause trouble for LaTeX
                 ...
     4:368:   Extra info from includegraphics ignored:
     4:            clip,height=118mm
     2:377:   References stable - no further passes required
     2:377:   Ending l2s -> Starlink final SGML transformation
     
The first digit on each line is an integer indicating the severity of the message, as follows:
     0   fatal
     1   serious
     2   control information
     3   almost certainly requires attention to the converted document
     4   probably requires attention to the converted document
     5   can probably be ignored
     6   debugging

The second number is the line number of the output SGML file to which the message refers; if there is no second number the line is a continuation of the message on the previous line. The text on the rest of the line is intended to explain what the problem is, or may be.
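
As a rough illustration of the log format described above, the following Python sketch reads translog lines, folds continuation lines into the preceding message, and picks out the messages whose severity suggests the converted document needs attention. It is not part of the upconverter: the names Message, parse_translog and needs_attention are invented here, and the regular expression is inferred only from the example log shown above, so real logs may contain forms it does not handle.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Severity levels that call for a look at the converted document
# (0 fatal, 1 serious, 3 almost certainly, 4 probably); 2 is control
# information, 5 can probably be ignored, 6 is debugging.
ATTENTION = {0, 1, 3, 4}

# "severity:line: text", or "severity: text" for a continuation line.
# Assumption: this pattern is inferred from the example log only.
LINE_RE = re.compile(r"^\s*(\d):(?:(\d+):)?\s*(.*)$")


@dataclass
class Message:
    severity: int
    line: Optional[int]  # line number in the output SGML file
    text: str


def parse_translog(lines: Iterable[str]) -> List[Message]:
    """Turn raw log lines into messages, merging continuation lines."""
    messages: List[Message] = []
    for raw in lines:
        match = LINE_RE.match(raw.rstrip())
        if match is None:
            continue  # blank lines, or anything not in the expected form
        severity = int(match.group(1))
        lineno = match.group(2)
        text = match.group(3)
        if lineno is None and messages:
            # No second number: continuation of the previous message.
            messages[-1].text += "\n" + text
        else:
            line = int(lineno) if lineno is not None else None
            messages.append(Message(severity, line, text))
    return messages


def needs_attention(messages: Iterable[Message]) -> List[Message]:
    """Keep only messages whose severity is in ATTENTION."""
    return [m for m in messages if m.severity in ATTENTION]
```

Given the lines of a log file (for instance from open("sun321.translog")), needs_attention(parse_translog(lines)) returns the messages at levels 0, 1, 3 and 4, each with the SGML line number it refers to.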

Finally, the script attempts to parse the resulting document, and reports whether it conforms to the Starlink DTD. If it does not conform, the program shows how to use the parse command to see the errors.

On exit a file doc.sgml is left in the current directory. If conversion has been entirely successful then intermediate files will be removed, but otherwise some intermediate files will be left since they may be useful for post-mortem analysis and to aid in future runs of the code.

Flags

-R routinesdoc
The argument of the -R flag names a file containing an SGML programcode document to which the main SGML document will refer. A routinelist element referring to this file will be added at the end of the document (note, if this is not the desired place for it you will have to move it by hand), and any references in the LaTeX document to IDs in this file (typically each routine in the file will have an ID with its own name) will become appropriate coderef elements.

LaTeX extensions

As well as understanding the various extra commands provided by latex2html, this program provides a new command

        \sgml{}
     
and a new environment
        \begin{rawsgml} ... \end{rawsgml}
     
for specifying particular SGML to be output at given places in the LaTeX source. They do what you would expect: their contents, after normal expansion of any LaTeX macros, appear in the output SGML document. Note that these are not verbatim-like as far as LaTeX is concerned, so LaTeX-like markup may get modified, though SGML-like markup should not. They may be useful when defining, or redefining, high-level user LaTeX macros in the document.

Internals

This program operates in two stages. First the script l2s.pl is invoked to convert the LaTeX source into SGML conforming to a temporary DTD called `L2S' which preserves the structure of the original but is rather messy (l2s.pl is a harness for a modified version of latex2html, and suffers from the same deficiencies as that program as well as a few of its own). This produces a file with the extension .l2s. Secondly the script trans.pl is invoked to perform an SGML transformation from text marked up in the L2S DTD to the appropriate Starlink document DTD. This second part is a multiple pass process -- normally the first pass produces badly non-conforming SGML, the second pass produces something nearly right, and after three passes the document is stable.
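
The "repeat until stable" idea behind the second stage can be illustrated with a small Python sketch. This is not how trans.pl is implemented: run_until_stable and the toy squeeze transformation are invented purely for illustration, and the only point being made is that a pass is rerun on its own output until a pass changes nothing.

```python
from typing import Callable, Tuple


def run_until_stable(
    transform: Callable[[str], str], text: str, max_passes: int = 10
) -> Tuple[str, int]:
    """Apply transform repeatedly until a pass leaves the text unchanged.

    Returns the stable text and the number of passes run, counting the
    final pass that confirmed nothing changed.
    """
    for passes in range(1, max_passes + 1):
        new_text = transform(text)
        if new_text == text:
            return text, passes
        text = new_text
    raise RuntimeError("text still changing after %d passes" % max_passes)


def squeeze(text: str) -> str:
    """Toy stand-in for a transformation pass: halve runs of spaces."""
    return text.replace("  ", " ")


# Four spaces become two, then one; the third pass changes nothing.
result = run_until_stable(squeeze, "a    b")
print(result)  # ('a b', 3)
```

The max_passes limit is a design choice of the sketch: a transformation that never settles would otherwise loop forever.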

For long documents, all of these stages may be slow, and it may be possible to speed matters up by not repeating unnecessary stages. Therefore the adventurous user may wish to run the l2s.pl and trans.pl scripts by hand, tweaking the intermediate .l2s file in between. If the first part of the conversion has gone very badly the program may warn that the intermediate .l2s document fails to conform to the L2S DTD. This is not good news but may not necessarily scupper the rest of the conversion; in any case running the parse utility on the .l2s file will show where errors are, and may give a clue about how the original LaTeX can be fixed to prevent the problem.

Deficiencies

Output of the program is by no means perfect; as well as the (hopefully fixable) Bugs listed separately, the following problems resulting from the conversion are more or less unavoidable, or at least no fix is planned. They will have to be tackled by modifying the LaTeX input file or the SGML output file by hand.

Tabbing environments
Tabbing environments in the LaTeX original will almost certainly get badly mangled and will have to be recoded as something else (e.g. tabular or verbatim elements).
Ugly LaTeX-notation maths
Maths is embedded in LaTeX notation in the final document within m, mequation or meqnarray elements, and should give the same output as the original. However, macros defined in the LaTeX source (with \newcommand) are expanded before being put into the output, which can lead to verbose and ugly inline LaTeX-notation source. This can be avoided by removing such definitions from the input document before conversion; the definitions can then be put back by hand in the output document within mdefs elements.
LaTeX lengths
Unlike commands, LaTeX lengths are not expanded, so that when a length is reset somewhere in a document and used later, this setting will be wrong. This most frequently causes problems in LaTeX picture environments -- a figurecontent element with attribute notation=latexgraphics may come out with entirely the wrong dimensions. In this case it is necessary to identify by hand where the relevant lengths are set in the LaTeX source and put them into each bit of LaTeX where they are used.
History and author information
The program constructs basic history and authorlist elements but doesn't try to be too clever. You may wish to modify these.
Line breaks
The upconverter sometimes makes a poor job of getting line breaks in the right place. It inserts some and removes some near tags, and the result may be ugly. This might undergo some improvement in future versions. It does not attempt to insert line breaks other than near certain tags, and this can result in overlong lines if long tags have been inserted.
Garbage in, garbage out
Some LaTeX source which has only ever been processed by latex2html and not by LaTeX (e.g. within an htmlonly environment or similar) may not make much sense. Historically latex2html has done the best it can and written non-conforming HTML which browsers do their best to render, and which may coincidentally look perfectly OK, so that the weirdness of the source has never been noticed (some of the SST macros provide fine examples of this, and that case is dealt with explicitly). Other parts may make sense but not in fact look very good in their existing star2html'd form. In either case, if the source presented to the upconverter isn't very good, the output will suffer too. You have been warned.

