7.1 Upconversion from LaTeX

The LaTeX upconverter takes an existing Starlink document (SUN, SSN, ...) and outputs an SGML version of the same document, conforming to the appropriate Starlink DTD. For a number of reasons the upconversion process is somewhat complicated and error-prone, and simply running the upconverter is unlikely to give you a perfect SGML document without further effort. You will probably need to make a few changes to the original LaTeX, run the upconverter, and examine both the log messages it produces and the resulting SGML; you can then either tidy the SGML by hand or make further changes to the LaTeX and repeat the cycle. Some documents need relatively little manual adjustment, others more. Since the upconverter makes use of the latex2html program, if the document cannot be processed successfully by star2html/latex2html, the upconverter is unlikely to make a good job of it either.

At its simplest, a document can be converted using a command like:

latex2sgml sun321.tex
This will produce a fair amount of output, both to the screen and to the file sun321.translog; this output is explained in detail in the description of latex2sgml.

If you're lucky, the program will end by writing a message like the following:

--------------------------------------------------------------------
--- latex2sgml: Parsing final output file sun321.sgml
---             Document sun321.sgml conforms to DTD
---             Removing intermediate files.
--------------------------------------------------------------------
In this case the document is syntactically valid, and you need only check that its semantics are as intended, by looking at the .translog file and at the converted SGML itself.

However, if the output SGML does not conform to the DTD then a message like this will be written:

--------------------------------------------------------------------
--- latex2sgml: Parsing final output file sun321.sgml
---             Document sun321.sgml does not conform to DTD:
---                 Parse errors: 3
---             Use 'parse -s sun321.sgml |& grep -v :W:' to see messages.
---             Leaving intermediate files (sun321.l2s*)
--------------------------------------------------------------------
Following the script's advice and running parse will give output something like:
nsgmls:sun321.sgml:698:86:E: document type does not allow element "figure" here
nsgmls:sun321.sgml:3049:80:E: document type does not allow element "figure" here
nsgmls:sun321.sgml:3074:80:E: document type does not allow element "figure" here
The third and fourth colon-separated fields on each line give the line number and column in the SGML file at which the problem was detected. Such errors sometimes result from deficiencies in the upconverter and sometimes from errors in the original LaTeX file; you will have to either fix the SGML file by hand or modify the LaTeX source accordingly and re-run the upconverter.
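If parse reports many errors, it can help to turn its messages into a sorted list of locations to work through. The Python sketch below assumes the 'nsgmls:FILE:LINE:COLUMN:SEVERITY: message' layout shown above and, like the grep -v :W: in the suggested command, drops warnings. The function names are illustrative only; they are not part of the Starlink tools.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# One nsgmls message, e.g.
#   nsgmls:sun321.sgml:698:86:E: document type does not allow element ...
# Assumes the file name itself contains no colons.
_MSG = re.compile(
    r"^(?P<prog>[^:]+):(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+):"
    r"(?P<sev>[A-Z]):\s*(?P<text>.*)$"
)


class ParseMessage(NamedTuple):
    file: str
    line: int      # line number in the SGML file (1-based)
    column: int
    severity: str  # single letter, e.g. E (error) or W (warning)
    text: str


def parse_message(raw: str) -> Optional[ParseMessage]:
    """Split one message line into its fields; return None if it doesn't fit."""
    m = _MSG.match(raw.strip())
    if m is None:
        return None
    return ParseMessage(m["file"], int(m["line"]), int(m["col"]),
                        m["sev"], m["text"])


def errors_only(lines: Iterable[str]) -> List[ParseMessage]:
    """Drop warnings (like grep -v :W:) and sort the rest by position."""
    parsed = (parse_message(raw) for raw in lines)
    return sorted((m for m in parsed if m is not None and m.severity != "W"),
                  key=lambda m: (m.file, m.line, m.column))


def report(lines: Iterable[str]) -> List[str]:
    """Format the remaining messages as 'file:line:column: text'."""
    return [f"{m.file}:{m.line}:{m.column}: {m.text}" for m in errors_only(lines)]


def source_line(sgml_path: str, line: int) -> str:
    """Return the given 1-based line of the SGML file ('' if out of range)."""
    with open(sgml_path, encoding="utf-8", errors="replace") as f:
        for number, text in enumerate(f, start=1):
            if number == line:
                return text.rstrip("\n")
    return ""
```

You might, for example, save the messages with 'parse -s sun321.sgml >& sun321.errs' (csh syntax, as in the script's advice; the file name is arbitrary), then call report(open("sun321.errs")) and use source_line("sun321.sgml", 698) to inspect each offending line.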

As the output notes, intermediate files (here sun321.l2s*) are left behind when the conversion is not perfect, since they may be useful for debugging. You may wish to delete them by hand if you are not going to use them.

Unfortunately, all sorts of things can go wrong with the conversion process. Preparing the LaTeX document before starting the conversion can protect you against some of them; here is a checklist of useful things to do before attempting a conversion:

Low level formatting
The upconverter will try to ignore most LaTeX and HTML code designed for low level formatting, such as tweaks to intercharacter spacing; such things are properly the job of the downconverters and should not be addressed in the source of the document. LaTeX formatting inside math elements is an exception to this and will be passed through unchanged. However, in some circumstances low level formatting can confuse the upconverter, so it is generally a good idea to remove it from the LaTeX before conversion. In particular, HTML tags output from rawhtml environments may be problematic.

If the document contains user-defined macros (\newcommand or \newenvironment definitions), you can help the conversion by simplifying them so that they express the logical structure of the document rather than its physical layout.

Figures
Figures are usually handled differently in latexonly and htmlonly parts of the original document. For correct processing, any \includegraphics-type command for importing GIF or JPEG files into the HTML source, or PostScript files into the LaTeX, should appear within a figure environment in the LaTeX original. Often, for HTML output, captions are inserted as text under the included image by custom latex2html code in the document; if this duplicates the text of a genuine \caption command, it should be removed. Again, this may best be fixed by modifying the user macros which implement figures.
Subroutine/Command descriptions
Many SUNs contain an appendix with detailed reference descriptions of library subroutines or user commands, usually coded using the SST macros. latex2sgml will attempt to convert these to SGML as part of the main output document, but it will make a pretty messy job of it. The correct approach is to code the routine/command descriptions using the separate programcode DTD and to reference that document using a codecollection element within the main document, as described in Section 5. No converter is provided for generating a programcode document from LaTeX source, but the code2sgml converter will mark up a suitable set of prologues from Fortran or C source files as programcode SGML, as described in the next section. If the resulting document is specified to latex2sgml using the -R flag, latex2sgml will insert appropriate references to the converted prologues and routines in the final main SGML document.
Mathematics macros
In the final document, maths is embedded in LaTeX notation within m, mequation or meqnarray elements, and under normal circumstances any user-defined macros are expanded to produce the LaTeX content of these elements. However, if the user macros were there to make the maths clearer or easier to modify, this expansion is undesirable. To keep macros from being expanded, you can remove the relevant macro definitions (\newcommand and \newenvironment commands) from the LaTeX document before running latex2sgml on it, and later reinstate them in the resulting SGML document within one or more mdefs elements.
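Some items on this checklist can be partly automated. The following Python sketch is a rough, purely lexical scan of a LaTeX source: it reports rawhtml environments, \includegraphics-type commands that do not sit inside a figure (or figure*) environment, and user \newcommand/\newenvironment definitions that you may want to simplify or remove. It is not part of latex2sgml; the names precheck and IMAGE_COMMANDS, and the particular commands listed, are assumptions for illustration. Because it does not expand macros or follow \input, images produced by the document's own figure macros are only caught if you add those macro names to IMAGE_COMMANDS.

```python
import re
from typing import List, Tuple

# Commands treated as "\includegraphics-type" in this sketch.  The list is an
# assumption: extend it with any image-including macros the document defines.
IMAGE_COMMANDS = ("includegraphics", "epsfig", "psfig", "epsfbox", "htmladdimg")
FIGURE_ENVS = ("figure", "figure*")

_TOKEN = re.compile(
    r"\\begin\{(?P<begin>[A-Za-z*]+)\}"
    r"|\\end\{(?P<end>[A-Za-z*]+)\}"
    r"|\\(?P<image>" + "|".join(IMAGE_COMMANDS) + r")(?![A-Za-z])"
    r"|\\(?:re)?new(?:command|environment)\*?\s*\{?\s*(?P<defn>\\?[A-Za-z@]+)"
)


def strip_comments(src: str) -> str:
    """Blank out % comments (but not \\%), keeping line breaks intact."""
    return re.sub(r"(?<!\\)%.*", "", src)


def precheck(src: str) -> List[Tuple[int, str]]:
    """Return (line number, note) pairs for items worth reviewing by hand."""
    src = strip_comments(src)
    notes: List[Tuple[int, str]] = []
    figure_depth = 0  # how many figure environments we are currently inside
    for m in _TOKEN.finditer(src):
        line = src.count("\n", 0, m.start()) + 1
        if m["begin"]:
            if m["begin"] in FIGURE_ENVS:
                figure_depth += 1
            elif m["begin"] == "rawhtml":
                notes.append((line, "rawhtml environment: check the HTML it emits"))
        elif m["end"]:
            if m["end"] in FIGURE_ENVS and figure_depth > 0:
                figure_depth -= 1
        elif m["image"]:
            if figure_depth == 0:
                notes.append((line, f"\\{m['image']} outside a figure environment"))
        elif m["defn"]:
            notes.append((line, f"user definition of {m['defn']}: review or move aside"))
    return notes
```

For example, precheck(open("sun321.tex").read()) returns a list of (line, note) pairs to review before running latex2sgml.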

Even after the best efforts of latex2sgml, some manual intervention will be required to produce a good SGML document from your LaTeX source. This is due to several kinds of problem, including inadequacies in the upconverter, the fact that the LaTeX original may not contain markup corresponding to what needs to go into the SGML output, and inherent differences between what can be expressed in LaTeX and in the Starlink DTD. Lists of known deficiencies and bugs in the converter are given in the detailed description of latex2sgml. Most individual instances of these problems are flagged in the .translog file.

To aid in hand tweaking of the LaTeX source code, the upconverter provides an \sgml{} command and a \begin{rawsgml} ... \end{rawsgml} environment. These do what you would expect: their contents, after normal expansion of any LaTeX macros, appear in the output SGML document. You may find these useful for redefining user macros in the LaTeX original.

As explained in the latex2sgml documentation, the -R flag can be used to specify a document containing associated routine/command descriptions marked up according to the Programcode DTD. In this case an appropriate codecollection element will be inserted as the first appendix and references to IDs defined in the Programcode document will be inserted appropriately as coderef elements.


The Starlink SGML Set
Starlink System Note 70
Norman Gray, Mark Taylor
21 April 1999. Release DR-0.7-13. Last updated 24 August 2001