Normalising and signing XML

The Enormity package is for normalising XML, and signing and verifying it using GPG signatures.

The text below explains why XML needs a normalised form, and describes one which is simple to generate, is reversible, and can be signed by GPG in a natural way. The mechanism has been implemented in a Java library.

Contents

Normalising and signing XML
A simple-enough normalisation procedure
A Java library
Downloads

Normalising and signing XML

Both normalising and signing XML appear to be hard problems, given the size and complexity of the work of the W3C Signature working group, which has produced recommendations on creating signatures for XML, as well as on the necessary problem of canonicalising XML prior to signature.

The reason why canonicalisation is necessary is that for each XML document, there is a set of other documents with trivially different syntax, but which mean ‘the same thing’ – that is, they might use single quotes rather than double quotes to delimit attribute values, or have the attributes in a different order, or have ‘unimportant’ whitespace differences between elements, or appear in a different encoding, while still being usefully regarded as the same document. If you read an XML document into a system, either into an application for processing or into an XML database for storage, it might be hard to arrange that the XML document you write out or retrieve is represented by exactly the same sequence of bytes, which is necessary for any more-or-less naive signature to validate.
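A minimal sketch of the problem, assuming a SHA-256 digest as a stand-in for a real signature (the class and method names are hypothetical, and no GPG is involved): two serialisations which differ only in their quote characters hash differently, so a signature over the raw bytes of one would not verify against the other.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical illustration: byte-level digests see syntax, not meaning. */
class ByteFragility {
    /** SHA-256 digest of the UTF-8 bytes of a string (a stand-in for signing). */
    static byte[] digest(String text) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** True only if both serialisations are byte-for-byte identical. */
    static boolean sameBytes(String a, String b) {
        return Arrays.equals(digest(a), digest(b));
    }

    public static void main(String[] args) {
        String original = "<doc><p class='foo'>Hello</p></doc>";
        String rewritten = "<doc><p class=\"foo\">Hello</p></doc>";
        System.out.println(sameBytes(original, rewritten)); // prints "false"
    }
}
```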

In a sturdily-reasoned essay, Peter Gutmann has discussed this, and suggested that the approach used by the W3C WG is fundamentally mistaken: Why XML Security is Broken. It's a reasonably persuasive argument for the general problem, but in the case of a large category of XML documents (and I'm interested in particular in VOEvent packets), the problem is not as hard as this analysis suggests, because we don't have to solve the general problem.

The two key points that Gutmann makes are:

All cryptographic signature mechanisms are designed to sign a bag of bytes. XML documents are not just bags of bytes, so there's a fundamental dislocation here between what's wanted and what's available.
Signature mechanisms are designed to work with streams of bytes, so that it matters from a practical point of view where the signature is located in a byte-stream, and that the system knows from the outset which type of signature it is expected to create or verify.

One solution is not to normalise at all, but instead to regard the on-disk or on-the-wire XML document as the bag of bytes to be signed. This works, but throws away the mutability of XML, which means that if you want to actually do anything with the XML other than simply admire it, and if you want to round-trip the XML into and out of a system which doesn't know about your signature, you're presented with the dilemma of either abandoning the signature, or else worrying about how to reproduce exactly the same bag of bytes when the XML is serialised at some later stage.

It is part of the point of XML that XML documents are not just bags of bytes, and that there is a well-defined distinction between important content and meaninglessly mutable syntax. XML processors and editors freely take advantage of this: it is generally hard to guarantee what flavour of quotes will be written by an XSLT transformer, or that (insignificant) whitespace will be preserved by XML editors. Schemas can make this mutability more pronounced, since amongst other things they can license more extensive syntactic transformations. This syntactic mutability is reflected in the fact that applications typically do not operate on the bytes of a document or stream, but instead on the abstracted content of a document, as exposed via an API such as SAX, DOM, or an XSL node-set. An XML database is free to store an XML document in any way it likes, as long as it produces an equivalent document when required.

These notions have been formalised in the concept of the XML Information Set, which describes all of the information which a complete XML processor must preserve and make available to an application (the ‘canonicalization’ work of the XML Signatures WG is effectively concerned with defining a single serialization of this set).

The XML InfoSet is quite elaborate, and includes many features of an XML document. The SAX model, however, implicitly defines a much simpler information model for XML, with just 11 API functions (in the org.xml.sax.ContentHandler interface). The SAX model is defined in terms of Java, but there are exactly analogous APIs in other languages, which can consequently support the same model. Viewed through a SAX lens, an XML document is a rather simple thing, which is consequently very simple to normalise, serialise and thus sign.

A simple-enough normalisation procedure

What I'm proposing here is a very simple normalisation mechanism, which straightforwardly turns an XML document into a stream of bytes, in a well-defined, streamable and reversible way. The resulting stream can be signed by GPG in a natural way, and the signature embedded into the XML equally naturally.

The simple normalisation turns the XML:

<doc>
<p class='foo'>Hello</p>
  <p> there
chum
</p>
</doc>

into the normalised form:

(doc
Aclass CDATA foo
(p
-Hello
)p
(p
-there\nchum
)p
)doc

This normalised form can then be signed, and the signature reinserted into the original XML, or else included as the parsed XML is passed downstream.

The following document has the same normalised form as the earlier one, but includes a PGP signature block which can be used to verify it:

<doc><p class="foo">Hello</p><p>there
chum</p>
<?signature armor='-----BEGIN PGP SIGNATURE-----
....
-----END PGP SIGNATURE-----'?></doc>
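As a hedged sketch of how a consumer might recover the signature from such a document, the following SAX handler picks out the processing instruction whose target is signature and extracts the value of its armor pseudo-attribute. The class name, the method name and the regular expression are assumptions made for illustration, not the library's API, and verifying the armoured block with GPG is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: finds the armoured signature carried in a PI. */
class SignatureFinder extends DefaultHandler {
    // Matches armor='...' or armor="...", allowing newlines inside the value.
    private static final Pattern ARMOR =
            Pattern.compile("armor=(['\"])(.*?)\\1", Pattern.DOTALL);

    private String armor;

    @Override
    public void processingInstruction(String target, String data) {
        if (!"signature".equals(target) || data == null) {
            return; // some other processing instruction
        }
        Matcher m = ARMOR.matcher(data);
        if (m.find()) {
            armor = m.group(2);
        }
    }

    /** Returns the armoured signature text, or null if the document has none. */
    static String findArmor(String xml) {
        try {
            SignatureFinder finder = new SignatureFinder();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), finder);
            return finder.armor;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

Because the signature travels in a processing instruction, a consumer which does not know about it can simply ignore it.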

The normalisation procedure

The normalisation is done in two steps. First, we define a textual representation of the parsed XML; second, we define a transformation of this output which normalises it.

The textual representation is based on the sgmls ESIS output, which was originally defined in the 1990s by the sgmls program. The original point of the format was that it should be easy for downstream tools to parse. The point here is that it turns an XML file into an unambiguous byte-stream and, further, that it permits a normalisation operation which is both well-defined and simple.

There isn't a complete overlap between the ESIS and the SAX model, so there are some differences. All the differences here are extensions rather than changes.

The output consists of a sequence of lines, separated by CR LF (ie bytes 0xd 0xa). Each line consists of a start character indicating which type of output record it represents, followed by one or more arguments. Each record type takes a fixed number of arguments, separated by single spaces.

Record	Meaning	Origin
`Mprefix uri`	start prefix mapping	extn
`mprefix`	end prefix mapping	extn
`Aattname CDATA value`	declare attribute	ESIS
`Bnamespace localname CDATA value`	declare namespaced attribute	extn
`(name`	start element	ESIS
`[namespace localname`	start namespaced element	extn
`)name`	end element	ESIS
`]namespace localname`	end namespaced element	extn
`-text`	character content	ESIS
`=text`	ignorable whitespace	extn
`?pi data`	processing instruction	ESIS
`Xname`	skipped entity	extn

Each start element event is preceded by the set of attributes on that element.
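The record types above map closely onto the SAX ContentHandler callbacks, which can be sketched as follows. This is an illustration only, not the library's implementation: the class name EsisSketch and the method toRecords are hypothetical, the escaping of backslash, LF and CR is an assumption borrowed from sgmls conventions, and adjacent character chunks are buffered into a single ‘-’ record so that the output does not depend on how a particular parser happens to split text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: turns SAX events into ESIS-style records. */
class EsisSketch extends DefaultHandler {
    private final List<String> records = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    /** Assumed escaping, so that a record never spans more than one line. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\n", "\\n").replace("\r", "\\r");
    }

    /** Writes any buffered text as one '-' record, then the given record (if any). */
    private void emit(String record) {
        if (text.length() > 0) {
            records.add("-" + escape(text.toString()));
            text.setLength(0);
        }
        if (record != null) {
            records.add(record);
        }
    }

    @Override public void startPrefixMapping(String prefix, String uri) { emit("M" + prefix + " " + uri); }
    @Override public void endPrefixMapping(String prefix) { emit("m" + prefix); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Attribute records come first, then the element record itself.
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getURI(i).isEmpty()
                    ? "A" + atts.getLocalName(i)
                    : "B" + atts.getURI(i) + " " + atts.getLocalName(i);
            emit(name + " " + atts.getType(i) + " " + escape(atts.getValue(i)));
        }
        emit(uri.isEmpty() ? "(" + local : "[" + uri + " " + local);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        emit(uri.isEmpty() ? ")" + local : "]" + uri + " " + local);
    }

    @Override public void characters(char[] ch, int start, int length) { text.append(ch, start, length); }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        emit("=" + escape(new String(ch, start, length)));
    }

    @Override
    public void processingInstruction(String target, String data) {
        emit("?" + target + " " + escape(data == null ? "" : data));
    }

    @Override public void skippedEntity(String name) { emit("X" + name); }
    @Override public void endDocument() { emit(null); }

    /** Parses the XML with a namespace-aware parser and returns its records. */
    static List<String> toRecords(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            EsisSketch handler = new EsisSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.records;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

Calling EsisSketch.toRecords on a document string returns the unnormalised records in document order; the parser must be namespace-aware for the ‘[’, ‘]’ and ‘B’ records to appear.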

An important function of the library is to normalise the ESIS output. It does this in the following ways:

Attribute records (‘A’ and ‘B’) are alphabetised on output.
Consecutive ‘character content’ events are merged, and leading and trailing whitespace is trimmed from the resulting merged event. If the resulting event is empty, it is discarded. Ignorable whitespace is... ignored.
Start and end prefix mappings (‘M’ and ‘m’) are discarded.
Any processing instruction which has a ‘target’ of signature is removed.
All of the output is encoded to bytes as UTF-8.
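The rules above can be sketched as a single pass over a list of records. This is an illustration under stated assumptions rather than the library's code: the names NormaliseSketch, normalise and toBytes are hypothetical; each input string is a record type character followed by its unescaped content; attributes are sorted by the whole record string; dropped records do not interrupt a run of text; and the escaping of backslash, LF and CR on output is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the normalisation rules, applied to raw records. */
class NormaliseSketch {
    /** Assumed escaping, so that each record stays on one line. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\n", "\\n").replace("\r", "\\r");
    }

    static List<String> normalise(List<String> events) {
        List<String> out = new ArrayList<>();
        List<String> attributes = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        for (String ev : events) {
            char type = ev.charAt(0);
            if (type == '-') {
                text.append(ev, 1, ev.length()); // merge consecutive character content
                continue;
            }
            boolean signaturePi = ev.equals("?signature") || ev.startsWith("?signature ");
            if (type == '=' || type == 'M' || type == 'm' || signaturePi) {
                continue; // dropped, without interrupting a run of text
            }
            flushText(text, out);
            if (type == 'A' || type == 'B') {
                attributes.add(ev); // held back until the element record arrives
                continue;
            }
            Collections.sort(attributes); // alphabetise (assumed sort key: whole record)
            out.addAll(attributes);
            attributes.clear();
            out.add(ev);
        }
        flushText(text, out);
        return out;
    }

    /** Trims the merged text and keeps it only if something is left. */
    private static void flushText(StringBuilder text, List<String> out) {
        String trimmed = text.toString().trim();
        if (!trimmed.isEmpty()) {
            out.add("-" + trimmed);
        }
        text.setLength(0);
    }

    /** Escapes each record, joins the records with CR LF and encodes them as UTF-8. */
    static byte[] toBytes(List<String> records) {
        List<String> lines = new ArrayList<>();
        for (String r : records) {
            lines.add(escape(r));
        }
        return String.join("\r\n", lines).getBytes(StandardCharsets.UTF_8);
    }
}
```

The byte array returned by toBytes is what would be handed to the signing step. Letting text merge across a removed signature instruction is a design choice of this sketch: it keeps the normalised form the same wherever the signature is inserted.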

The result of this is to turn the XML:

<doc><ns:p class='foo' xmlns:ns="urn:namespace" ns:att='bar'>Hello</ns:p>
  <p> there
chum
</p>
</doc>

into the (unnormalised) ESIS form:

(doc
Mns urn:namespace
Aclass CDATA foo
Burn:namespace att CDATA bar
[urn:namespace p
-Hello
]urn:namespace p
mns
-\n  
(p
- there\nchum\n
)p
-\n
)doc

This has the normalised form:

(doc
Aclass CDATA foo
Burn:namespace att CDATA bar
[urn:namespace p
-Hello
]urn:namespace p
(p
-there\nchum
)p
)doc

In the normalised form, the prefix mappings have been removed (the prefixes are not semantically important), leading and trailing whitespace has been removed from the ‘-’ lines, and all-whitespace ‘-’ records have been removed.

A Java library

This Java library provides support for the various steps involved.

The normalisation is based on the data model implied by Java's well-known SAX streaming API. Very similar APIs exist for other languages, and they closely match the Java one (which is natural, because they all necessarily have a close relationship to the XML Information Set), so there is nothing Java-specific about the normalisation procedure here. The data model does not include lexical information, such as the style of quotes or the order of attributes, so this confounding information is a fortiori not present in the normalised output.
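A small sketch of this point, using only the standard JAXP SAX parser (the class name and the string format of the recorded events are hypothetical): once each element's attributes are put into a sorted map, documents which differ in quoting, attribute order and empty-element syntax yield the same sequence of events.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration: the SAX view carries no lexical detail. */
class LexicalBlindness {
    /** Records element events, with the attributes of each element sorted by name. */
    static List<String> events(String xml) {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                TreeMap<String, String> sorted = new TreeMap<>();
                for (int i = 0; i < atts.getLength(); i++) {
                    sorted.put(atts.getQName(i), atts.getValue(i));
                }
                seen.add("start " + qName + " " + sorted);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                seen.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> a = events("<p a='1' b=\"2\"/>");
        List<String> b = events("<p b='2' a='1'></p>");
        System.out.println(a.equals(b)); // prints "true"
    }
}
```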

This is a rather aggressive normalisation, meaning that it defines a large class of XML documents which are equivalent in the sense that they produce identical normalisations. This normalisation greatly simplifies the problem – by solving a simpler problem than the one the XML Canonicalisation Working Group has set itself. This scheme is practical because there is a large set of XML documents which are practically equivalent, so that we do not have to deal with the complicating generalities required if we wish to preserve the entire XML Information Set.

The distributed jar file includes a command-line application which can be used to experiment with the normalisation and signing functionality.

Downloads

Norman Gray
2012 September 9