
sgml2xml - Convert an SGML file to XML, using sx for the document body and additionally converting the entity declarations in the original file's document type declaration subset.

Description

This is essentially a wrapper for sx. Since sx does not rewrite the DTD subset, this script does so itself. It tries to find XML-compatible ways of rewriting the declarations it finds, and prints warnings whenever it sees anything it thinks is dodgy.

Output is to stdout.

Notes:

  • The SGML declaration is not parsed; the script assumes the Reference Concrete Syntax (RCS) throughout.
  • Limited error recovery (the input is supposed to be valid already), so most errors are fatal.
  • Only entity declarations in the subset are processed; element declarations, etc., are discarded with warnings.
  • No automatic rewriting of public identifiers: if the input document is a valid instance of the given DTD, the output document will be one too.
  • Generates system identifiers for external entities which have only public ones. At present, this generates a reference to the catalogue server. Should it instead do a catalogue lookup and substitute what it finds there? That starts to get complicated, and requires command-line queries of the catalogue, which would need either OpenSP or a resurrected version of my catalogue parser. XML's requirement for system identifiers is a pig.
  • Entities which can't be easily translated are discarded with warnings.
  • It's not clear what one should do with SUBDOC entities. The script translates them to XML NDATA entities and warns that it has done so, but the referenced document might well be SGML rather than XML.
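The entity handling described in these notes can be illustrated with the following Python sketch. It is not the script's own code, only a minimal illustration that assumes the Reference Concrete Syntax and simple declarations with no comments inside them. CATALOGUE_BASE is a hypothetical stand-in for the catalogue-server URL, and "sgml" is a hypothetical notation name for rewritten SUBDOC entities (a real XML document would also need a matching NOTATION declaration, which this sketch does not emit). It also does no checking for malformed input.

```python
import re
import sys
from urllib.parse import quote

# Hypothetical catalogue-server URL: the real script's URL is not given here.
CATALOGUE_BASE = "http://catalogue.example.org/pubid?"

LIT = r'(?:"[^"]*"|\'[^\']*\')'   # a quoted literal in either quote style
# Reference Concrete Syntax assumed: '<!' opens a markup declaration, '>' closes
# it, and '>' may only appear inside a quoted literal. Comments are not handled.
DECL = re.compile(r'<!([A-Za-z]+)((?:' + LIT + r'|[^\'">])*)>')
EXTERNAL = re.compile(
    r'(?:PUBLIC\s+(' + LIT + r')(?:\s+(' + LIT + r'))?|SYSTEM\s+(' + LIT + r'))'
    r'(?:\s+(SUBDOC|NDATA\s+[\w.-]+))?$',
    re.IGNORECASE)


def warn(msg):
    print("sgml2xml: warning: " + msg, file=sys.stderr)


def rewrite_entity(body):
    """Return an XML entity declaration for one SGML ENTITY body, or None."""
    m = re.match(r'\s+(%\s+)?([\w.-]+)\s+(.*?)\s*$', body, re.DOTALL)
    if not m:
        return None                      # e.g. #DEFAULT entities
    pct, name, rest = m.groups()
    head = "<!ENTITY " + ("% " if pct else "") + name + " "
    if re.fullmatch(LIT, rest):          # internal entity: copied unchanged
        return head + rest + ">"
    ext = EXTERNAL.match(rest)
    if not ext:                          # CDATA, SDATA, PI entities etc.
        return None
    pubid, sysid, sysonly, dtype = ext.groups()
    if sysonly:
        ident = "SYSTEM " + sysonly
    elif sysid:
        ident = "PUBLIC " + pubid + " " + sysid
    else:                                # XML insists on a system identifier
        url = CATALOGUE_BASE + quote(pubid[1:-1], safe="")
        ident = 'PUBLIC %s "%s"' % (pubid, url)
        warn("entity %s: generated system identifier %s" % (name, url))
    if dtype:
        words = dtype.split()
        if words[0].upper() == "SUBDOC":
            warn("entity %s: SUBDOC rewritten as NDATA; target may be SGML" % name)
            words = ["NDATA", "sgml"]    # hypothetical notation name
        ident += " NDATA " + words[1]    # keywords normalised to upper case
    return head + ident + ">"


def rewrite_subset(subset):
    """Rewrite a declaration subset, keeping only translatable entities."""
    out = []
    for decl in DECL.finditer(subset):
        keyword, body = decl.group(1).upper(), decl.group(2)
        if keyword != "ENTITY":
            warn("discarding %s declaration" % keyword)
            continue
        xml = rewrite_entity(body)
        if xml is None:
            warn("discarding entity declaration:%s" % body)
        else:
            out.append(xml)
    return "\n".join(out)


if __name__ == "__main__":
    print(rewrite_subset('''
<!ENTITY ver "1.0">
<!ENTITY chap1 SYSTEM "chap1.sgml" SUBDOC>
<!ELEMENT p - - (#PCDATA)>
'''))
```

Run as a script, the demo prints the two rewritten entity declarations, and writes warnings about the SUBDOC conversion and the discarded ELEMENT declaration to stderr.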

Argument list
arg1 = Filename (Given)

SGML file to be processed (required)

Authors
Options

--pubid public-identifier
Specify the public identifier to be used to identify the DTD in the output XML file. If it is not given, the public identifier in the input file is reused. This will almost certainly be wrong, but should nonetheless give broadly sensible error messages when the output is parsed, and do no harm if it is not.
--sysid system-identifier
Specify the system identifier to be used to identify the DTD in the output XML file. If it is not given, the input file's system identifier is reused if there is one; otherwise the public identifier in the input file is converted to a system identifier, and that is used.
--declaration decl
The SGML declaration appropriate to the input file. This is specified for the benefit of sx; this script itself pays no attention to it.
--version
Print version information and exit.
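The fallback rules for --pubid and --sysid can be sketched in Python as follows. This is an illustration, not the script's actual implementation. How the script converts a public identifier into a system identifier is not described here, so choose_identifiers assumes a hypothetical catalogue-server URL, CATALOGUE_BASE, in the spirit of the catalogue-server references generated for external entities. The --version string is likewise a placeholder.

```python
import argparse
from urllib.parse import quote

# Hypothetical: the public-to-system identifier conversion is assumed to
# point at a catalogue server; the real URL scheme is not documented here.
CATALOGUE_BASE = "http://catalogue.example.org/pubid?"


def make_parser():
    """Command-line interface mirroring the options listed above."""
    p = argparse.ArgumentParser(prog="sgml2xml")
    p.add_argument("--pubid", help="public identifier for the output DTD")
    p.add_argument("--sysid", help="system identifier for the output DTD")
    p.add_argument("--declaration", help="SGML declaration (used by sx only)")
    p.add_argument("--version", action="version",
                   version="sgml2xml (placeholder version string)")
    p.add_argument("filename", help="SGML file to be processed")
    return p


def choose_identifiers(opt_pubid, opt_sysid, in_pubid, in_sysid):
    """Apply the --pubid/--sysid fallback rules; return (pubid, sysid)."""
    pubid = opt_pubid if opt_pubid is not None else in_pubid
    if opt_sysid is not None:
        sysid = opt_sysid
    elif in_sysid is not None:
        sysid = in_sysid
    elif in_pubid is not None:
        # As described: derived from the input file's public identifier.
        sysid = CATALOGUE_BASE + quote(in_pubid, safe="")
    else:
        raise ValueError("no identifier available for the DTD")
    return pubid, sysid


def doctype(root, pubid, sysid, subset=""):
    """Format an XML document type declaration, with an optional subset."""
    ext = ('SYSTEM "%s"' % sysid if pubid is None
           else 'PUBLIC "%s" "%s"' % (pubid, sysid))
    body = " [\n%s\n]" % subset if subset else ""
    return "<!DOCTYPE %s %s%s>" % (root, ext, body)
```

Formal public identifiers often begin with "-" or "+", so argparse may mistake such a value for an option; passing it as --pubid=-//... avoids that.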

Environment

If the environment variable SX is defined, its value is used as the command to invoke sx; otherwise a default value is used.
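A minimal Python sketch of the SX lookup follows. The actual default command is not stated here, so DEFAULT_SX = "sx" (found on the PATH) is an assumption. Passing the --declaration file to sx ahead of the document is also an assumption about how the declaration reaches sx, not a description of the script.

```python
import os
import shlex
import subprocess
import sys

# Hypothetical default: the real script's built-in value is not given here.
DEFAULT_SX = "sx"


def sx_command(filename, declaration=None, environ=None):
    """Build the sx command line, honouring the SX environment variable."""
    env = os.environ if environ is None else environ
    cmd = shlex.split(env.get("SX", DEFAULT_SX))   # SX may include options
    if declaration:
        # Assumption: the SGML declaration is given as a file preceding the
        # document on the command line; the real script may differ.
        cmd.append(declaration)
    cmd.append(filename)
    return cmd


def run_sx(filename, declaration=None):
    """Run sx and copy the XML it produces to stdout."""
    result = subprocess.run(sx_command(filename, declaration),
                            stdout=subprocess.PIPE, check=True)
    sys.stdout.buffer.write(result.stdout)
```

For instance, with SX set to "/usr/local/bin/sx", sx_command("doc.sgml") returns ["/usr/local/bin/sx", "doc.sgml"].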

Notes

sx is available as part of James Clark's SP package.

References to 'productions' are to the productions of the SGML standard. See <http://www.tiac.net/users/bingham/sgmlsyn/sgmlsyn.htm> for a well-linked version, or <http://www.oreilly.com/people/staff/crism/sgmldefs.html>, or <ftp://ftp.ifi.uio.no/pub/SGML/productions>.

