uk.me.nxg.enormity.esis (enormity 0.1 API)

Package uk.me.nxg.enormity.esis

Writes out a SAX stream in a format based on the sgmls ESIS output.

Interface Summary
EsisWriter	Provides the writing functions needed by an `EsisHandler`.

Class Summary
EsisHandler	Writes out a SAX stream in a format based on the sgmls ESIS output.
EsisParser	A parser which can interpret the pseudo-ESIS syntax of `EsisHandler`.
StreamEsisWriter	Writes ESIS output to a stream, taking care of encodings and line separators.

Package uk.me.nxg.enormity.esis Description

Writes out a SAX stream in a format based on the sgmls ESIS output. The original format is defined by sgmls, and its purpose was to be easy for downstream tools to parse. The purpose here is different: the format turns an XML file into an unambiguous byte stream and, further, permits a normalisation operation which is both well-defined and simple.

The ESIS and SAX models do not overlap completely, so there are some differences. All of them are extensions to ESIS rather than changes to it.

The output consists of a sequence of lines, separated by CR LF (i.e. bytes 0x0d 0x0a). Each line consists of a start character indicating which type of output record it represents, followed immediately by one or more arguments. Each record type always has the same number of arguments, separated by single spaces.
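As a rough illustration of the line format, here is a minimal Java sketch of writing one record. The class `EsisRecordWriter` and its `record` method are hypothetical, not part of the enormity API (`EsisWriter` and `StreamEsisWriter` are the real interfaces). It also assumes that a newline inside text is written as the two characters `\n` and a backslash as `\\`; that convention is inferred from the examples further down rather than stated in this description.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the enormity API.
public class EsisRecordWriter {
    private final OutputStream out;

    public EsisRecordWriter(OutputStream out) {
        this.out = out;
    }

    // Assumed escaping, inferred from the examples: a backslash is doubled
    // and a newline becomes a backslash followed by 'n'.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\n", "\\n");
    }

    // Writes the start character, the arguments separated by single spaces,
    // and a CR LF terminator, all encoded as UTF-8. Only the last argument is
    // escaped, since it is the one that carries free text.
    public void record(char type, String... args) throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.append(type);
        for (int i = 0; i < args.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(i == args.length - 1 ? escape(args[i]) : args[i]);
        }
        sb.append("\r\n");
        out.write(sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        EsisRecordWriter w = new EsisRecordWriter(buf);
        w.record('A', "class", "CDATA", "foo");
        w.record('(', "p");
        w.record('-', " there,\nchum\n");
        w.record(')', "p");
        System.out.print(new String(buf.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Run as is, `main` prints the four lines `Aclass CDATA foo`, `(p`, `- there,\nchum\n` and `)p`, each terminated by CR LF, which matches the corresponding lines of the unnormalised example below.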

Record	Meaning	Origin
`Mprefix uri`	start prefix mapping	extension
`mprefix`	end prefix mapping	extension
`Aattname CDATA value`	declare attribute	ESIS
`Bnamespace localname CDATA value`	declare namespaced attribute	extension
`(name`	start element	ESIS
`[namespace localname`	start namespaced element	extension
`)name`	end element	ESIS
`]namespace localname`	end namespaced element	extension
`-text`	character content	ESIS
`=text`	ignorable whitespace	extension
`?pi data`	processing instruction	ESIS
`Xname`	skipped entity	extension
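Because every record type has a fixed number of arguments, a reader can split a line without any quoting: only the last argument may contain spaces. Here is a minimal sketch of that splitting step, with argument counts read off the table above. `EsisRecordSplitter` is a hypothetical name, and this is not necessarily how `EsisParser` works; it also leaves the last argument escaped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the enormity EsisParser: splits one record line into its arguments.
public class EsisRecordSplitter {
    // Number of space-separated arguments for each start character, taken from the record table.
    private static final Map<Character, Integer> ARITY = Map.ofEntries(
            Map.entry('M', 2), Map.entry('m', 1),
            Map.entry('A', 3), Map.entry('B', 4),
            Map.entry('(', 1), Map.entry('[', 2),
            Map.entry(')', 1), Map.entry(']', 2),
            Map.entry('-', 1), Map.entry('=', 1),
            Map.entry('?', 2), Map.entry('X', 1));

    // Returns the arguments of a record, without its start character. The split
    // limit equals the arity, so the final argument keeps any spaces it contains.
    public static List<String> split(String line) {
        if (line.isEmpty()) {
            throw new IllegalArgumentException("empty record");
        }
        Integer n = ARITY.get(line.charAt(0));
        if (n == null) {
            throw new IllegalArgumentException("unknown record type: " + line.charAt(0));
        }
        return Arrays.asList(line.substring(1).split(" ", n));
    }

    public static void main(String[] args) {
        // prints [urn:namespace, att, CDATA, some value]
        System.out.println(split("Burn:namespace att CDATA some value"));
    }
}
```

The same idea explains why character content needs no quoting at all: a ‘-’ record has exactly one argument, so everything after the start character is the text.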

An important function of this package is to normalise the ESIS output. It does this in the following ways:

Attribute records (‘A’ and ‘B’) are alphabetised on output.
Consecutive ‘character content’ (‘-’) events are merged, and leading and trailing whitespace is trimmed from the merged event; if the result is empty, it is discarded. Ignorable whitespace (‘=’) is discarded.
Start and end prefix mappings (‘M’ and ‘m’) are discarded.
Any processing instruction whose target is `signature` is removed.
All of the output is encoded to bytes as UTF-8.
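The normalisation rules can be sketched as a single pass over the record stream. This is a hypothetical illustration, not the enormity implementation. It works on records already decoded into a start character and unescaped argument text (class `Rec`), and leaves out the final UTF-8 encoding step. It also makes two assumptions the description does not settle: attribute records are alphabetised by comparing whole record lines (so ‘A’ records sort before ‘B’ records), and a dropped record (‘M’, ‘m’, ‘=’ or a `signature` PI) does not interrupt a run of ‘-’ records being merged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the normalisation rules; not the enormity implementation.
public class EsisNormaliser {
    // A decoded record: its start character plus unescaped argument text.
    public static final class Rec {
        final char type;
        final String arg;

        public Rec(char type, String arg) {
            this.type = type;
            this.arg = arg;
        }

        @Override
        public String toString() {
            return type + arg;
        }
    }

    public static List<Rec> normalise(List<Rec> in) {
        List<Rec> out = new ArrayList<>();
        List<Rec> attrs = new ArrayList<>();   // pending 'A'/'B' records
        StringBuilder text = null;             // pending merged '-' content
        for (Rec r : in) {
            char t = r.type;
            // Prefix mappings and ignorable whitespace are dropped. Assumption:
            // dropped records do not interrupt a run of '-' records.
            if (t == 'M' || t == 'm' || t == '=') {
                continue;
            }
            // Processing instructions whose target is "signature" are dropped.
            if (t == '?' && (r.arg.equals("signature") || r.arg.startsWith("signature "))) {
                continue;
            }
            if (t == '-') {                     // merge consecutive character content
                if (text == null) {
                    text = new StringBuilder();
                }
                text.append(r.arg);
                continue;
            }
            flushText(text, out);
            text = null;
            if (t == 'A' || t == 'B') {         // hold attributes until their element starts
                attrs.add(r);
                continue;
            }
            flushAttrs(attrs, out);
            out.add(r);
        }
        flushText(text, out);
        flushAttrs(attrs, out);
        return out;
    }

    // Emits the merged content with leading and trailing whitespace trimmed
    // (trim() covers all XML whitespace characters), unless nothing is left.
    private static void flushText(StringBuilder text, List<Rec> out) {
        if (text != null) {
            String s = text.toString().trim();
            if (!s.isEmpty()) {
                out.add(new Rec('-', s));
            }
        }
    }

    // Emits pending attributes sorted by their whole record line (an assumption).
    private static void flushAttrs(List<Rec> attrs, List<Rec> out) {
        attrs.sort((a, b) -> a.toString().compareTo(b.toString()));
        out.addAll(attrs);
        attrs.clear();
    }
}
```

Buffering attributes until the next non-attribute record keeps them directly in front of the start-element record they belong to, so sorting never moves an attribute onto the wrong element.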

Each start-element record is preceded by the attribute records (‘A’ or ‘B’) for that element.

For example, this process turns the XML:

<doc><ns:p class='foo' xmlns:ns="urn:namespace" ns:att='bar'>Hello</ns:p>
  <p> there,
chum
</p>
</doc>

into the (unnormalised) ESIS form:

(doc
Mns urn:namespace
Aclass CDATA foo
Burn:namespace att CDATA bar
[urn:namespace p
-Hello
]urn:namespace p
mns
-\n  
(p
- there,\nchum\n
)p
-\n
)doc

After normalisation, the same document becomes:

(doc
Aclass CDATA foo
Burn:namespace att CDATA bar
[urn:namespace p
-Hello
]urn:namespace p
(p
-there,\nchum
)p
)doc

In the normalised form, the prefix mappings have been removed (the prefixes are not semantically important), leading and trailing whitespace has been removed from the ‘-’ lines, and all-whitespace ‘-’ records have been removed.
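To connect the record table to the SAX model, here is a minimal sketch of a handler that produces the unnormalised form using only the JDK's `javax.xml.parsers` and `org.xml.sax` APIs. `SaxToEsis` is a hypothetical name, not the enormity `EsisHandler`. It collects records as strings without CR LF terminators and uses the same backslash escaping that the examples suggest (an assumption). A SAX parser is free to split character data across several `characters` calls, so its ‘-’ records may not line up one-for-one with the unnormalised example above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the enormity EsisHandler: turns SAX events into unnormalised records.
public class SaxToEsis extends DefaultHandler {
    public final List<String> records = new ArrayList<>();

    // Assumed escaping, inferred from the examples: double backslashes, escape newlines.
    private static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\n", "\\n");
    }

    @Override public void startPrefixMapping(String prefix, String uri) {
        records.add("M" + prefix + " " + uri);
    }

    @Override public void endPrefixMapping(String prefix) {
        records.add("m" + prefix);
    }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Attribute records come first, then the start-element record.
        for (int i = 0; i < atts.getLength(); i++) {
            String type = atts.getType(i), value = esc(atts.getValue(i));
            if (atts.getURI(i).isEmpty()) {
                records.add("A" + atts.getLocalName(i) + " " + type + " " + value);
            } else {
                records.add("B" + atts.getURI(i) + " " + atts.getLocalName(i) + " " + type + " " + value);
            }
        }
        records.add(uri.isEmpty() ? "(" + localName : "[" + uri + " " + localName);
    }

    @Override public void endElement(String uri, String localName, String qName) {
        records.add(uri.isEmpty() ? ")" + localName : "]" + uri + " " + localName);
    }

    @Override public void characters(char[] ch, int start, int length) {
        records.add("-" + esc(new String(ch, start, length)));
    }

    @Override public void ignorableWhitespace(char[] ch, int start, int length) {
        records.add("=" + esc(new String(ch, start, length)));
    }

    @Override public void processingInstruction(String target, String data) {
        records.add("?" + target + " " + esc(data));
    }

    @Override public void skippedEntity(String name) {
        records.add("X" + name);
    }

    // Namespace-aware parsing, so xmlns attributes arrive as prefix mappings, not attributes.
    public static List<String> convert(String xml) throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true);
        SaxToEsis h = new SaxToEsis();
        f.newSAXParser().parse(new InputSource(new StringReader(xml)), h);
        return h.records;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><ns:p class='foo' xmlns:ns=\"urn:namespace\" ns:att='bar'>Hello</ns:p>\n"
                + "  <p> there,\nchum\n</p>\n</doc>";
        convert(xml).forEach(System.out::println);
    }
}
```

Because SAX reports `startPrefixMapping` just before the matching `startElement`, and `endPrefixMapping` just after the matching `endElement`, the ‘M’ and ‘m’ records fall around the namespaced element in the same order as in the unnormalised example.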