Unity: Unity

Author:: Norman Gray <http://nxg.me.uk>

The Unity package provides a parser for unit strings.

NOTE: The library should currently be regarded as beta quality: the implementation and interface may change in response to experience and comments.

This is the Unity parser (version 1.0), a C library that helps parse unit specification strings such as W.mm**-2. An associated Java class library uses the same grammars. For more details, see the library's home page; the source is on Bitbucket.

As well as parsing unit strings, the library can serialise a parsed expression in several formats: the four formats that it can parse, a LaTeX format named latex (which uses the siunitx package), and a debug format which lists the parsed unit in an unambiguous, but not otherwise useful, form.

Parsing Units

You can parse units in any of four syntaxes since, unfortunately, there is no general consensus on a single one. The supported syntaxes (and their names within this library) are as follows:

fits: FITS v3.0, section 4.3, W.D. Pence et al., A&A 524, A42, 2010 doi:10.1051/0004-6361/201015362
ogip: OGIP memo OGIP/93-001, 1993
cds: Standards for Astronomical Catalogues, Version 2.0, section 3.2, 2000
vounits: IVOA VOUnits Proposed Recommendation

Demo

If you want to experiment with the library, build the program src/c/unity (in the distribution):

    % ./unity -icds -oogip 'mm2/s'
    mm**2 /s
    % ./unity -icds -ofits -v mm/s
    mm s-1
    check: all units recognised?           yes
    check: all units recommended?          yes
    check: all units satisfy constraints?  yes
    % ./unity -ifits -ocds -v 'merg/s'
    merg/s
    check: all units recognised?           yes
    check: all units recommended?          no
    check: all units satisfy constraints?  no
    % ./unity -icds -ofits -v 'merg/s'
    merg s-1
    check: all units recognised?           no
    check: all units recommended?          no
    check: all units satisfy constraints?  yes

In the last three invocations, the -v option validates the input string against various constraints. The expression mm/s is completely valid in all the syntaxes. In the FITS syntax, the erg is a recognised unit, but it is deprecated, and it is not permitted to have SI prefixes, so merg fails the constraints check. In the CDS syntax, the erg is neither recognised nor (a fortiori) recommended; since there are no constraints on it in this syntax, it satisfies all of them (this last behaviour is admittedly slightly counterintuitive).
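The three checks can be illustrated with a small Python sketch. This is not the library's implementation: the tables, the prefix list, and the names UNIT_TABLES, resolve and check are all hypothetical, and the tables contain only the handful of units needed to reproduce the outcomes shown in the demo.

```python
# Hypothetical per-syntax unit tables (NOT the library's real data).
# Each known unit records whether it is recommended and whether it
# may carry an SI prefix.
UNIT_TABLES = {
    "fits": {
        "m": {"recommended": True, "prefixes_ok": True},
        "s": {"recommended": True, "prefixes_ok": True},
        "erg": {"recommended": False, "prefixes_ok": False},  # deprecated
    },
    "cds": {
        "m": {"recommended": True, "prefixes_ok": True},
        "s": {"recommended": True, "prefixes_ok": True},
    },
}

SI_PREFIXES = {"m", "k", "M"}  # small illustrative subset


def resolve(symbol, table):
    """Return (prefix, base_unit), or None if the symbol is unknown."""
    if symbol in table:
        return "", symbol
    if symbol[:1] in SI_PREFIXES and symbol[1:] in table:
        return symbol[0], symbol[1:]
    return None


def check(symbols, syntax):
    """Return (recognised, recommended, constraints_satisfied)."""
    table = UNIT_TABLES[syntax]
    recognised = recommended = constraints = True
    for symbol in symbols:
        found = resolve(symbol, table)
        if found is None:
            # Unknown unit: not recognised, hence not recommended, but
            # there are no constraints on it that it could violate.
            recognised = recommended = False
            continue
        prefix, base = found
        if not table[base]["recommended"]:
            recommended = False
        if prefix and not table[base]["prefixes_ok"]:
            constraints = False
    return recognised, recommended, constraints
```

With these toy tables, check(["merg", "s"], "cds") reports the unit as unrecognised yet still satisfying all constraints, which is the vacuous case described above.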

Grammars supported

The four supported grammars have a fair amount in common, but the differences are nonetheless significant enough that they require separate grammars. The important differences are in the number of solidi they allow in a unit specification, and in the symbols they use for products and powers.
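To make the differences in product and power symbols concrete, the following Python sketch writes the same list of (unit, integer power) pairs in a form that each of the grammars in this document accepts. It is only an illustration derived from those grammars: the names STYLES and render are hypothetical, and the output is not claimed to match what the library's own serialisers produce.

```python
# For each syntax: (product separator, power operator,
#                   must negative powers be parenthesised?)
# Read off the grammars in this document; where a grammar allows
# several spellings, one is picked arbitrarily.
STYLES = {
    "fits": (" ", "", False),      # mm2 s-1
    "ogip": (" ", "**", True),     # mm**2 s**(-1)
    "cds": (".", "", False),       # mm2.s-1
    "vounits": (".", "**", False), # mm**2.s**-1
}


def render(units, syntax):
    """Render [(symbol, integer_power), ...] in the given syntax."""
    separator, power_op, paren_negative = STYLES[syntax]
    terms = []
    for symbol, power in units:
        if power == 1:
            terms.append(symbol)
        elif power < 0 and paren_negative:
            terms.append(f"{symbol}{power_op}({power})")
        else:
            terms.append(f"{symbol}{power_op}{power}")
    return separator.join(terms)


for name in STYLES:
    print(name, render([("mm", 2), ("s", -1)], name))
```

The sketch uses negative powers instead of a solidus, which sidesteps the question of how many solidi each grammar permits.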

Current limitations:

The library currently ignores some of the odder unit restrictions (such as the OGIP requirement that 'Crab' can have a 'milli' prefix, but no other SI prefixes).

In the grammars below, the common terminals are as follows:

WHITESPACE: one or more whitespace characters
STAR, DOT: a star or a dot, generally used to indicate multiplication
DIVISION: a slash
STARSTAR, CARET: the former is '**', the latter is '^'; both are used to indicate exponentiation
OPEN_P, CLOSE_P: open and close parentheses
INTEGER, FLOAT: numbers; the syntax of FLOAT is [+-]?[1-9][0-9]*\.[0-9]+, so no exponents are allowed; signed integers have a mandatory leading sign, unsigned integers have none
STRING: a sequence of upper- and lower-case letters

There are some other terminals used in some grammars. See the VOUnits specification for further details.
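As a rough illustration of how the numeric terminals differ, the Python sketch below classifies a string as FLOAT, SIGNED_INTEGER or UNSIGNED_INTEGER. The FLOAT pattern is the one quoted in the list above; the two integer patterns are assumptions inferred from the prose description, and the function name classify_number is hypothetical.

```python
import re

# FLOAT as quoted in the terminal list above.
FLOAT_RE = re.compile(r"[+-]?[1-9][0-9]*\.[0-9]+")
# Assumed patterns: the text only says that the sign is mandatory for
# signed integers and absent for unsigned ones.
SIGNED_INTEGER_RE = re.compile(r"[+-][0-9]+")
UNSIGNED_INTEGER_RE = re.compile(r"[0-9]+")


def classify_number(text):
    """Return the terminal name matching the whole string, or None."""
    if FLOAT_RE.fullmatch(text):
        return "FLOAT"
    if SIGNED_INTEGER_RE.fullmatch(text):
        return "SIGNED_INTEGER"
    if UNSIGNED_INTEGER_RE.fullmatch(text):
        return "UNSIGNED_INTEGER"
    return None


for sample in ["1.5", "-2", "2", "1.5e3", "0.5"]:
    print(sample, classify_number(sample))
```

Note that, taken literally, the quoted FLOAT pattern rejects both 1.5e3 (no exponents) and 0.5 (the first digit must be 1 to 9).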

The FITS grammar

input: complete_expression 
        | scalefactor complete_expression 
        | scalefactor WHITESPACE complete_expression 
        | division unit_expression 
        ;

complete_expression: product_of_units 
        | product_of_units division unit_expression 
        ;

product_of_units: unit_expression 
        | product_of_units product unit_expression 
        ;

unit_expression: term                                 
        // m(2) is m^2, not function application
        | STRING parenthesized_number 
        | function_application 
        | OPEN_P complete_expression CLOSE_P 
        ;

function_application: STRING OPEN_P complete_expression CLOSE_P 
        ;

scalefactor: LIT10 power numeric_power 
        | LIT10 SIGNED_INTEGER 
        ;

division: DIVISION;

term: unit 
        | unit numeric_power 
        | unit power numeric_power 
        ;

unit: STRING 
        ;

power: CARET
        | STARSTAR
        ;

numeric_power: integer 
        | parenthesized_number 
        ;

parenthesized_number: OPEN_P integer CLOSE_P 
        | OPEN_P FLOAT CLOSE_P 
        | OPEN_P integer division UNSIGNED_INTEGER CLOSE_P 
        ;

integer: SIGNED_INTEGER | UNSIGNED_INTEGER;

product: WHITESPACE | STAR | DOT;

The OGIP grammar

input: complete_expression 
        | scalefactor complete_expression 
        | scalefactor WHITESPACE complete_expression 
        ;

complete_expression: product_of_units 
        ;

product_of_units: unit_expression
        | division unit_expression 
        | product_of_units product unit_expression 
        | product_of_units division unit_expression 
        ;

unit_expression: term                                 
        | function_application 
        | OPEN_P complete_expression CLOSE_P 
        ;

function_application: STRING OPEN_P complete_expression CLOSE_P 
        ;

scalefactor: LIT10 power numeric_power 
        | LIT10 
        | FLOAT 
        ;

division: DIVISION | WHITESPACE DIVISION
        | WHITESPACE DIVISION WHITESPACE | DIVISION WHITESPACE;

term: unit 
        | unit power numeric_power 
        ;

unit: STRING 
        ;

power: STARSTAR;

numeric_power: UNSIGNED_INTEGER 
        | FLOAT 
        | parenthesized_number 
        ;

parenthesized_number: OPEN_P integer CLOSE_P 
        | OPEN_P FLOAT CLOSE_P 
        | OPEN_P integer division UNSIGNED_INTEGER CLOSE_P 
        ;

integer: SIGNED_INTEGER | UNSIGNED_INTEGER;

product: WHITESPACE | STAR | WHITESPACE STAR
       | WHITESPACE STAR WHITESPACE | STAR WHITESPACE;

The CDS grammar

This is quite similar to the OGIP grammar, but with more restrictions.

The CDSFLOAT terminal is a string matching the regular expression [0-9]+\.[0-9]+x10[-+][0-9]+ (that is, something resembling 1.5x10+11). The terminals OPEN_SQ and CLOSE_SQ are opening and closing square brackets [...].
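The CDSFLOAT pattern can be tried out with a few lines of Python. This is a sketch only: the function name cdsfloat_value is hypothetical, and converting the matched text to a number is an illustration rather than a description of what the library does with it.

```python
import re

# The CDSFLOAT pattern from the text, with groups added to capture
# the mantissa and the signed exponent.
CDSFLOAT_RE = re.compile(r"([0-9]+\.[0-9]+)x10([-+][0-9]+)")


def cdsfloat_value(text):
    """Return the numeric value of a CDSFLOAT string, or None."""
    match = CDSFLOAT_RE.fullmatch(text)
    if match is None:
        return None
    mantissa, exponent = match.groups()
    # Rewrite as ordinary scientific notation, e.g. "1.5e+11".
    return float(f"{mantissa}e{exponent}")


print(cdsfloat_value("1.5x10+11"))
print(cdsfloat_value("1.5e11"))  # None: not in CDS notation
```

The sign on the exponent is mandatory in this pattern, so a string such as 1.5x1011 is not a CDSFLOAT.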

input: complete_expression 
        | scalefactor complete_expression 
        ;

complete_expression: product_of_units 
        ;

product_of_units: unit_expression
        | division unit_expression 
        | product_of_units product unit_expression 
        | product_of_units division unit_expression 
        ;

unit_expression: term                                 
        | function_application 
        | OPEN_P complete_expression CLOSE_P 
        ;

function_application: OPEN_SQ complete_expression CLOSE_SQ 
        ;

scalefactor: LIT10 power numeric_power 
        | LIT10 SIGNED_INTEGER 
        | UNSIGNED_INTEGER 
        | LIT10 
        | CDSFLOAT 
        | FLOAT 
        ;

division: DIVISION;

term: unit 
        | unit numeric_power 
        ;

unit: STRING 
        | PERCENT 
        ;

power: STARSTAR;

numeric_power: integer 
        ;

integer: SIGNED_INTEGER | UNSIGNED_INTEGER;

product: DOT;

The VOUnits grammar

The VOUFLOAT and QUOTED_STRING features are extensions beyond the other grammars. These aside, this syntax is a strict subset of the FITS and CDS grammars (in the sense that any VOUnits unit string, without these extensions, is also a valid FITS and CDS string), and it is almost a subset of the OGIP grammar, except that it uses the dot for multiplication rather than the star.

The VOUFLOAT terminal is a string matching either of the regular expressions 0\.[0-9]+([eE][+-]?[0-9]+)? or [1-9][0-9]*(\.[0-9]+)?([eE][+-]?[0-9]+)? (that is, something resembling, for example, 0.123 or 1.5e+11). Also QUOTED_STRING is a STRING enclosed in single quotes '...'.
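The two VOUFLOAT alternatives and the QUOTED_STRING terminal can be checked with the following Python sketch. The helper names are hypothetical, and the QUOTED_STRING pattern assumes that STRING means one or more ASCII letters, as in the list of common terminals.

```python
import re

# The two VOUFLOAT alternatives from the text, joined with "|".
VOUFLOAT_RE = re.compile(
    r"0\.[0-9]+([eE][+-]?[0-9]+)?"
    r"|[1-9][0-9]*(\.[0-9]+)?([eE][+-]?[0-9]+)?"
)
# Assumption: STRING is a run of upper- and lower-case ASCII letters.
QUOTED_STRING_RE = re.compile(r"'[A-Za-z]+'")


def is_voufloat(text):
    return VOUFLOAT_RE.fullmatch(text) is not None


def is_quoted_string(text):
    return QUOTED_STRING_RE.fullmatch(text) is not None


for sample in ["0.123", "1.5e+11", "10", "00.5", ".5", "1."]:
    print(sample, is_voufloat(sample))
print(is_quoted_string("'furlong'"))
```

Unlike FLOAT, a VOUFLOAT may carry an exponent and may begin with 0, but only when the 0 is immediately followed by a decimal point.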

input: complete_expression 
        | scalefactor complete_expression 
        ;

complete_expression: product_of_units 
        | product_of_units division unit_expression 
        ;

product_of_units: unit_expression 
        | product_of_units product unit_expression 
        ;

unit_expression: term                                 
        | function_application 
        | OPEN_P complete_expression CLOSE_P 
        ;

function_application: STRING OPEN_P complete_expression CLOSE_P 
        ;

scalefactor: LIT10 power numeric_power 
        | LIT10 
        | VOUFLOAT 
        ;

division: DIVISION;

term: unit 
        | unit power numeric_power 
        ;

unit: STRING 
        | QUOTED_STRING 
        | STRING QUOTED_STRING 
        ;

power: STARSTAR;

numeric_power: integer 
        | parenthesized_number 
        ;

parenthesized_number: OPEN_P integer CLOSE_P 
        | OPEN_P FLOAT CLOSE_P 
        | OPEN_P integer division UNSIGNED_INTEGER CLOSE_P 
        ;

integer: SIGNED_INTEGER | UNSIGNED_INTEGER;

product: DOT;
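
To show how a grammar like the one above translates into code, here is a minimal recursive-descent sketch in Python for a reduced form of the VOUnits grammar. It is not the library's parser (which is written in C from the grammar files): the names tokenize and Parser are hypothetical, and scalefactor, function_application and QUOTED_STRING are left out to keep it short. The result is a flat list of (unit, power) pairs.

```python
import re
from fractions import Fraction

TOKEN_RE = re.compile(r"""
    (?P<STARSTAR>\*\*)
  | (?P<FLOAT>[+-]?[1-9][0-9]*\.[0-9]+)
  | (?P<SIGNED_INTEGER>[+-][0-9]+)
  | (?P<UNSIGNED_INTEGER>[0-9]+)
  | (?P<STRING>[A-Za-z]+)
  | (?P<DOT>\.)
  | (?P<DIVISION>/)
  | (?P<OPEN_P>\()
  | (?P<CLOSE_P>\))
""", re.VERBOSE)


def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens, self.i = tokenize(text), 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def take(self, kind):
        if self.peek() != kind:
            raise ValueError(f"expected {kind}, found {self.peek()}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def parse(self):
        units = self.complete_expression()
        if self.peek() is not None:
            raise ValueError(f"trailing input starting at {self.peek()}")
        return units

    def complete_expression(self):
        # product_of_units, optionally followed by ONE division
        units = self.unit_expression()
        while self.peek() == "DOT":
            self.take("DOT")
            units += self.unit_expression()
        if self.peek() == "DIVISION":
            self.take("DIVISION")
            units += [(u, -p) for u, p in self.unit_expression()]
        return units

    def unit_expression(self):
        if self.peek() == "OPEN_P":
            self.take("OPEN_P")
            units = self.complete_expression()
            self.take("CLOSE_P")
            return units
        unit, power = self.take("STRING"), Fraction(1)  # term
        if self.peek() == "STARSTAR":
            self.take("STARSTAR")
            power = self.numeric_power()
        return [(unit, power)]

    def integer(self):
        if self.peek() == "SIGNED_INTEGER":
            return int(self.take("SIGNED_INTEGER"))
        return int(self.take("UNSIGNED_INTEGER"))

    def numeric_power(self):
        if self.peek() != "OPEN_P":
            return Fraction(self.integer())
        self.take("OPEN_P")  # parenthesized_number
        if self.peek() == "FLOAT":
            power = Fraction(self.take("FLOAT"))
        else:
            power = Fraction(self.integer())
            if self.peek() == "DIVISION":
                self.take("DIVISION")
                power /= int(self.take("UNSIGNED_INTEGER"))
        self.take("CLOSE_P")
        return power


print(Parser("W.mm**-2").parse())
```

Because complete_expression accepts at most one division, a string such as m/s/s is rejected with a ValueError, which is one concrete instance of the limit on the number of solidi.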