As the astronomical information processed within the Virtual Observatory becomes more complex, there is an increasing need for a more formal means of identifying quantities, concepts, and processes not confined to things easily placed in a FITS image, or expressed in a catalogue or a table. We propose that the IVOA adopt a standard format for vocabularies based on the W3C's Resource Description Framework (RDF) and Simple Knowledge Organization System (SKOS). By adopting a standard and simple format, the IVOA will permit different groups to create and maintain their own specialized vocabularies while letting the rest of the astronomical community access, use, and combine them. The use of current, open standards ensures that VO applications will be able to tap into resources of the growing semantic web. Several examples of useful astronomical vocabularies are provided, including work on a common IVOA thesaurus intended to provide a semantic common base for VO applications.
This is (an internal draft of) an IVOA Working Draft. The first release of this document was 2008 February 5.
This document is an IVOA Working Draft for review by IVOA members
and other interested parties.  It is a draft document and may be
updated, replaced, or obsoleted by other documents at any time.  It is
inappropriate to use IVOA Working Drafts as reference materials or to
cite them as other than work in progress.
A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.
We would like to thank the members of the IVOA semantic working group for many interesting ideas and fruitful discussions.
Astronomical information of relevance to the Virtual Observatory (VO) is not confined to quantities easily expressed in a catalogue or a table. Fairly simple things such as position on the sky, brightness in some units, times measured in some frame, redshifts, classifications or other similar quantities are easily manipulated and stored in VOTables and can currently be identified using IVOA Unified Content Descriptors (UCDs) [std:ucd]. However, astrophysical concepts and quantities use a wide variety of names, identifications, classifications and associations, most of which cannot be described or labelled via UCDs.
There are a number of basic forms of organised semantic knowledge of potential use to the VO, ranging from informal folksonomies (where users are free to choose their own labels) at one extreme, to formally structured vocabularies (where the label is drawn from a predefined set of definitions, and which can include relationships between labels) and ontologies (where the domain is captured in a formal data model) at the other. More formal definitions are presented later in this document.
An astronomical ontology is necessary if we are to have a computer (appear to) `understand' something of the domain. There has been some progress towards creating an ontology of astronomical object types [std:ivoa-astro-onto] to meet this need. However there are distinct use cases for letting human users find resources of interest through search and navigation of the information space. The most appropriate technology to meet these use cases derives from the Information Science community, that of controlled vocabularies, taxonomies and thesauri. In the present document, we do not distinguish between controlled vocabularies, taxonomies and thesauri, and use the term vocabulary to represent all three.
One of the best examples of the need for a simple vocabulary within the VO is VOEvent [std:voevent], the VO standard for supporting rapid notification of astronomical events. This standard requires some formalised indication of what a published event is `about', in a formalism which can be used straightforwardly by the developer of relevant services. See 1.2 Use-cases, and the motivation for formalised vocabularies for further discussion.
A number of astronomical vocabularies have been created, with a variety of goals and intended uses. Some examples are detailed below.
The most immediate high-level motivation for this work is the
requirement of the VOEvent standard [std:voevent] for a controlled vocabulary usable in the
VOEvent's <what/> element, which describes what
sort of object the VOEvent packet is describing, in some broadly
intelligible way.  For example a `burst' might be a Gamma Ray Burst
due to the collapse of a star in a distant galaxy, a solar flare, or
the brightening of a stellar or AGN accretion disk, and having an
explicit list of vocabulary terms can help guide the event publisher
into using a term which will be usefully precise for the event's
consumers.  A free-text label can help here (which brings us into the
area modishly known as `folksonomies'), but the astronomical
community, with its systematising instincts, and aware of the benefits
of standardisation, can do better.
Specific use-cases include the following.
We find ourselves in the situation where there are multiple vocabularies in use, describing a broad range of resources of interest to professional and amateur astronomers, and members of the public. These different vocabularies use different terms and different relationships to support the different constituencies they cater for. For example, delta Sct and RR Lyr are terms one would find in a vocabulary aimed at professional astronomers, associated with the notion of variable star; however, one would not find such technical terms in a vocabulary intended to support outreach activities.
One approach to this problem is to create a single consensus vocabulary, which draws terms from the various existing vocabularies to create a new vocabulary which is able to express anything its users might desire. The problem with this is that such an effort would be very expensive, both in terms of time and effort on the part of those creating it, and to the potential users, who have to learn to navigate around it, recognise the new terms, and who have to be supported in using the new terms correctly (or, more often, incorrectly).
The alternative approach to the problem is to evade it, and this is the approach taken in this document. Rather than deprecating the existence of multiple overlapping vocabularies, we embrace it, help interest groups formalise as many of them as are appropriate, and standardise the process of formally declaring the relationships between them. This means that:
The purpose of this proposal is to establish a common format for the grass-roots creation, publishing, use, and manipulation of astronomical vocabularies within the Virtual Observatory, based upon the W3C's SKOS standard. We include as appendices to this proposal formalised versions of a number of existing vocabularies, encoded as SKOS vocabularies [std:skoscore].
In this section, we introduce the concepts of SKOS-based vocabularies. This section includes some best-practice guidelines (see 3.2 Suggested good practices); there are normative requirements on an IVOA vocabulary in 3 Publishing vocabularies (normative).
After extensive online and face-to-face discussions, the authors have brokered a consensus within the IVOA community that formalised vocabularies should be published at least in SKOS (Simple Knowledge Organization System) format, a W3C draft standard application of RDF to the field of knowledge organisation [std:skoscore]. SKOS draws on long experience within the Library and Information Science community, to address a well-defined set of problems to do with the indexing and retrieval of information and resources; as such, it is a close match to the problem this document is addressing.
ISO 5964 [std:iso5964] defines a number of the relevant terms (ISO 5964:1985=BS 6723:1985; see also [std:bs8723-1] and [std:z39.19]), and some of the (lightweight) theoretical background. The only technical distinction relevant to this document is that between `vocabulary' and `thesaurus': BS-8723-1 defines a thesaurus as a
Controlled vocabulary in which concepts are represented by preferred terms, formally organized so that paradigmatic relationships between the concepts are made explicit, and the preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms. NOTE: The purpose of a thesaurus is to guide both the indexer and the searcher to select the same preferred term or combination of preferred terms to represent a given subject. (BS-8723-1, sect. 2.39)
with a similar definition in ISO-5964 sect. 3.16. The paradigmatic relationships in question are those relating a term to a broader, narrower or more generically related term, with an operational definition of broader term which is such that a resource retrieved by a given term will also be retrieved by that term's broader term. This is not a subsumption relationship, as there is no implication that the concept referred to by a narrower term is of the same type as a broader term.
Thus a vocabulary (SKOS or otherwise) is not an ontology. It has lighter and looser semantics than an ontology, and is specialised for the restricted case of resource retrieval. Those interested in ontological analyses can easily transfer the vocabulary relationship information from SKOS to a formal ontological format such as OWL [std:owl].
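The operational definition of a broader term given above can be sketched in a few lines of Python. This is only an illustration: the concept names, the resource identifiers and the function names are assumptions made for the example, not part of any published vocabulary. A search on a term also returns the resources indexed under any of its narrower terms.

```python
# Hypothetical broader-term links: narrower concept -> broader concept.
BROADER = {
    "barredSpiralGalaxy": "spiralGalaxy",
    "spiralGalaxy": "galaxy",
}

# Hypothetical index: resource identifier -> concept it was tagged with.
TAGGED = {
    "image-001": "barredSpiralGalaxy",
    "image-002": "spiralGalaxy",
    "catalogue-003": "galaxy",
}


def ancestors(concept):
    """Yield the concept itself, then each successively broader concept."""
    seen = set()
    while concept is not None and concept not in seen:
        seen.add(concept)  # guards against accidental cycles
        yield concept
        concept = BROADER.get(concept)


def retrieve(term):
    """Return resources tagged with `term` or with any narrower term."""
    return sorted(r for r, tag in TAGGED.items() if term in ancestors(tag))
```

Note that nothing here asserts that a barred spiral galaxy "is a" galaxy in the ontological sense; the links only steer retrieval.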
A published vocabulary in SKOS format consists of a set of concepts. An example concept capturing the vocabulary information about spiral galaxies is provided in the Figure below, with the RDF shown in both RDF/XML [std:rdfxml] and Turtle notation [std:turtle] (Turtle is similar to the more informal N3 notation). The elements of a concept are detailed below.
Figure: examples of SKOS vocabularies
| XML Syntax | Turtle Syntax | |
|---|---|---|
| 
<skos:Concept rdf:about="#spiralGalaxy">
  <skos:prefLabel xml:lang="en">
    spiral galaxy
  </skos:prefLabel>
  <skos:prefLabel xml:lang="de">
    Spiralgalaxie
  </skos:prefLabel>
  <skos:altLabel xml:lang="en">
    spiral nebula
  </skos:altLabel>
  <skos:hiddenLabel xml:lang="en">
    spiral glaxy
  </skos:hiddenLabel>
  <skos:definition xml:lang="en">
    A galaxy having a spiral structure.
  </skos:definition>
  <skos:scopeNote xml:lang="en">
    Spiral galaxies fall into one of 
    three categories: Sa, Sc, and Sd.
  </skos:scopeNote>
  <skos:narrower
    rdf:resource="#barredSpiralGalaxy"/>
  <skos:broader
    rdf:resource="#galaxy"/>
  <skos:related
    rdf:resource="#spiralArm"/>
</skos:Concept>
 | 
<#spiralGalaxy> a skos:Concept;
  skos:prefLabel
    "spiral galaxy"@en, 
    "Spiralgalaxie"@de;
  skos:altLabel "spiral nebula"@en;
  skos:hiddenLabel "spiral glaxy"@en;
  skos:definition """A galaxy having a 
    spiral structure."""@en;
  skos:scopeNote """Spiral galaxies fall
    into one of three categories:
    Sa, Sc, and Sd"""@en;
  skos:narrower <#barredSpiralGalaxy>;
  skos:broader <#galaxy>;
  skos:related <#spiralArm> .
 | 
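A SKOS vocabulary includes the following features.
- A URI identifying the concept (#spiralGalaxy in the Figure above).
- A preferred label in each supported language.
- Optional alternative labels, such as synonyms or aliases in common use, e.g. GRB for "gamma-ray burst", or Spiral nebula for spiral galaxies.
- Optional hidden labels, such as common misspellings, e.g. glaxy for galaxy.
- A definition and, where useful, a scope note.
- Relationships to broader, narrower and related concepts, e.g. a narrower link to the concept representing a barred spiral galaxy.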
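To make the roles of the different labels concrete, here is a minimal Python sketch of the spiral galaxy concept from the Figure. The class and method names are assumptions made for illustration; they do not come from SKOS or from any IVOA library. Preferred labels are used for display, one per language, while alternative and hidden labels only help a search find the concept.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_labels: dict  # language code -> the single preferred label
    alt_labels: set = field(default_factory=set)
    hidden_labels: set = field(default_factory=set)

    def display_label(self, lang, fallback="en"):
        """Preferred label in `lang`, falling back to the `fallback` language."""
        return self.pref_labels.get(lang, self.pref_labels[fallback])

    def matches(self, text):
        """True if `text` equals any label of the concept, hidden ones included."""
        wanted = text.strip().lower()
        labels = set(self.pref_labels.values()) | self.alt_labels | self.hidden_labels
        return wanted in {label.lower() for label in labels}


# The concept shown in the Figure, transcribed by hand.
spiral = Concept(
    uri="#spiralGalaxy",
    pref_labels={"en": "spiral galaxy", "de": "Spiralgalaxie"},
    alt_labels={"spiral nebula"},
    hidden_labels={"spiral glaxy"},
)
```

A user interface would show `spiral.display_label("de")`, yet a query for the misspelt "spiral glaxy" would still find the concept through `matches`.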
In addition to the information about a single concept, a vocabulary can contain information to help users navigate its structure and contents:
- The top concepts of the vocabulary, i.e. those that occur at the top of the vocabulary hierarchy defined by the broader/narrower relationships, can be explicitly stated to make it easier to navigate the vocabulary.
- Concepts that naturally belong together can be grouped into a collection.
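The navigation aids just described can be derived mechanically from the broader links. The following Python sketch assumes a small hand-written hierarchy (the concept names are taken from the spiral galaxy example; the data layout and function names are hypothetical) and computes the top concepts together with the inverse narrower links.

```python
# Hypothetical broader links: concept -> list of directly broader concepts.
BROADER = {
    "galaxy": [],
    "spiralGalaxy": ["galaxy"],
    "barredSpiralGalaxy": ["spiralGalaxy"],
    "spiralArm": [],
}


def top_concepts(broader):
    """Concepts with no broader concept, i.e. the roots of the hierarchy."""
    return sorted(c for c, parents in broader.items() if not parents)


def narrower_index(broader):
    """Invert the broader links to obtain the narrower links."""
    index = {concept: [] for concept in broader}
    for concept, parents in broader.items():
        for parent in parents:
            index.setdefault(parent, []).append(concept)
    return index
```

Declaring the top concepts explicitly in the published vocabulary saves every consumer from repeating this computation.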
There already exist several vocabularies in the domain of astronomy. Instead of attempting to replace all these existing vocabularies, which have been developed for different aims and user groups, we embrace them. This requires a mechanism to relate the concepts in the different vocabularies. The W3C are in the process of developing a standard for relating the concepts in different SKOS vocabularies [std:skosMapping] and when completed this should be reviewed for use by the IVOA.
Four types of relationship are sufficient to capture the relationships between concepts in vocabularies and are similar to those defined for relationships between concepts within a single vocabulary. The relationships are as follows. [TODO] Add specifics to the examples.
- iau93:#SPIRALGALAXY map:exactMatch ivoat:#spiralGalaxy, which states that the spiral galaxy concept in the IAU thesaurus is the same as the spiral galaxy concept in the IVOAT. (Note the use of the external namespaces iau93 and ivoat, which must be defined within the document.)
- iau93:#XXX map:broadMatch ivoat:#YYY, which states that the IVOAT concept YYY is more general than the IAU93 concept XXX.
- iau93:#XXX map:narrowMatch ivoat:#YYY, which states that the IVOAT concept YYY is more specific than the IAU93 concept XXX.
- iau93:#XXX map:relatedMatch ivoat:#YYY, which states that the IAU93 concept XXX has an association with the IVOAT concept YYY.
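An application could use such mapping statements to carry a term from one vocabulary into another. The Python sketch below holds the statements as plain triples; the list contents simply repeat the examples above (including the XXX and YYY placeholders), and the function name is hypothetical. By default only exact matches are followed, since the other relationships change the meaning of the term.

```python
# Mapping statements as (subject, predicate, object) triples.
# XXX and YYY are the same placeholders used in the text above.
MAPPINGS = [
    ("iau93:#SPIRALGALAXY", "map:exactMatch", "ivoat:#spiralGalaxy"),
    ("iau93:#XXX", "map:broadMatch", "ivoat:#YYY"),
]


def translate(concept, predicates=("map:exactMatch",)):
    """Return the concepts that `concept` maps to via the given predicates."""
    return [o for s, p, o in MAPPINGS if s == concept and p in predicates]
```

A caller willing to accept a more general term can pass `predicates=("map:exactMatch", "map:broadMatch")`.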
[TODO:] Enter text regarding the resolution of Issue 7.
A vocabulary which conforms to this IVOA standard has the following features. In this section, the keywords must, should and so on, are to be interpreted as described in [std:rfc2119].
The namespace of the
vocabulary must be dereferenceable on the
web.  That is, typing the namespace URL into a web browser will
produce human-readable documentation about the vocabulary.  In
addition, the namespace URL should
return the RDF version of the vocabulary if it is retrieved with an
HTTP Accept header of application/rdf+xml.
Rationale: These prescriptions are intended to be compatible with the patterns described in [berrueta08] and [sauermann07], and vocabulary distributors should follow these patterns where possible.
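From the client side, the requirement on the Accept header might look as follows. This is a sketch using only the Python standard library; the namespace URL is a placeholder, not the address of a real vocabulary, and the request is built but deliberately not sent.

```python
from urllib.request import Request


def rdf_request(namespace_url):
    """Build (but do not send) a request asking for the RDF/XML version."""
    return Request(namespace_url, headers={"Accept": "application/rdf+xml"})


# Placeholder namespace URL, used purely for illustration.
req = rdf_request("http://example.org/vocabulary/")

# Passing `req` to urllib.request.urlopen() would perform the retrieval;
# a browser requesting the same URL would instead receive the
# human-readable documentation.
```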
The files defining a vocabulary, including those of superseded versions, should remain permanently available. There is no requirement that the namespace URL be at any particular location, although the IVOA web pages, or the online sections of the A&A journal would likely be suitable archival locations.
Vocabularies must be made available for distribution as SKOS RDF files, in either RDF/XML [std:rdfxml] or Turtle [std:turtle] format; vocabularies should be made available in both formats. See issue [distformat-2].
A publisher may make available documentation and supporting files in other formats.
Rationale: this does imply that the vocabulary source files can only realistically be parsed using an RDF parser. An alternative is to require that vocabularies be distributed using a subset of RDF/XML which can also be naively handled as traditional XML; however as well as creating an extra standardisation requirement, this would make it effectively infeasible to write out the distribution version of the vocabulary using an RDF or general SKOS tool.
To be decided. There are interactions with 'long-term availability' and 'dereferenceable namespace', since this implies that the vocabulary version should be manifestly encoded in the namespace URI. See issue [versioning-3].
This standard does not place any restrictions on the format of the files managed by the maintenance process, as long as the distributed files are as specified above. See issue [masterformat-1].
This standard imposes a number of requirements on conformant vocabularies (see section 3 Publishing vocabularies (normative)). In this section we list a number of good practices that IVOA vocabularies should abide by. Some of the prescriptions below are more specific than good-practice guidelines for vocabularies in general.
- Concept tokens should be human-readable (e.g. spiralGalaxy, not "t1234567"); tokens should preferably be created via a direct conversion from the preferred label via removal/translation of non-token characters (see above) and sub-token separation via capitalization of the first sub-token character (e.g. the label "My favorite idea-label #42" is converted into "MyFavoriteIdeaLabel42").
- Labels should be in the singular form, e.g. spiral galaxy, not "spiral galaxies". Open issue.
- Each concept should have a definition (skos:definition) that constitutes a short description of the concept which could be adopted by an application using the vocabulary. Each concept should have additional documentation in standard SKOS or Dublin Core format. Note the distinction between description and SKOS scope-note.
- Relationships (broader, narrower, related) between concepts should be present, but are not required; if used, they should be complete (thus all broader links have corresponding narrower links in the referenced entries and related entries link each other).
- TopConcept entries (see above) should be declared and normally consist of those concepts that do not have any broader relationships (i.e. not at a subordinate position in the hierarchy).
- Publishers may provide mappings between their vocabularies and other commonly used vocabularies. These should be external to the defining vocabulary document so that the vocabulary can be used independently of the publisher's mappings. Open issue.
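The label-to-token conversion suggested in the list above can be sketched in Python. The function name is hypothetical, and the exact rule is an assumption: the text does not say how the first sub-token or digits should be treated, so this sketch leaves the first sub-token as written and upper-cases the first letter of each later one, a reading chosen because it reproduces the three tokens quoted in this document.

```python
import re


def label_to_token(label):
    """Convert a preferred label into a vocabulary token (assumed rule)."""
    # Sub-tokens are the runs of letters and digits; everything else is dropped.
    parts = re.findall(r"[A-Za-z0-9]+", label)
    if not parts:
        raise ValueError("label contains no token characters")

    def capitalise(part):
        # Upper-case the first letter, skipping over any leading digits.
        return re.sub(r"[A-Za-z]", lambda m: m.group(0).upper(), part, count=1)

    # The first sub-token keeps its case; later ones are capitalised.
    return parts[0] + "".join(capitalise(p) for p in parts[1:])
```

With this reading, "spiral galaxy" becomes "spiralGalaxy" and the UCD word "em.IR.8-15um" becomes "emIR815Um". A vocabulary that prefers lower-case-only tokens would need a different rule.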
These suggestions are by no means trivial: there was considerable discussion within the semantic working group on many of these topics, particularly about token formats (some wanted lower-case only), and singular versus plural forms of the labels (different traditions exist within the international library science community). Obviously, no publisher of an astronomical vocabulary has to adopt these rules, but the adoption of these rules will make it easier to use the vocabulary in external generic VO applications. However, VO applications should be developed to accept any vocabulary that complies with the latest SKOS standard [std:skoscore].
The intent of having the IVOA adopt SKOS as the preferred format for astronomical vocabularies is to encourage the creation and management of diverse vocabularies by competent astronomical groups, so that users of the VO and related resources can benefit directly and dynamically without the intervention of the IAU or IVOA. However, we felt it important to provide several examples of vocabularies in the SKOS format as part of the proposal, to illustrate their simplicity and power, and to provide an immediate vocabulary basis for VO applications.
See also issue [vocabset-5]. The identification of sections as normative or informative depends on the outcome of this issue.
We provide a set of SKOS files representing the vocabularies which have been developed, and mappings between them. These can be downloaded at the URL
http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/vocabularies-0.04/vocab-0.04.tar.gz
Not yet: instead go to http://code.google.com/p/volute/downloads/list
[To be expanded:] there are no mappings at the moment. Also, the vocabularies are all in a single language, though translations of the IAU93 thesaurus are available. See also issue 8
This vocabulary is presented as a simple example of an astronomical vocabulary for a very particular purpose, e.g. handling constellation information like that commonly encountered in variable star research. For example, SS Cygni is a cataclysmic variable located in the constellation Cygnus. The name of the star uses the genitive form Cygni, but the alternate label SS Cyg uses the standard abbreviation Cyg. Given the constellation vocabulary, all of these forms are recorded together in a computer-manipulable format. `Incorrect' forms should probably be represented in SKOS `hidden labels'.
The <skos:ConceptScheme> contains a single <skos:TopConcept>, constellation
| XML Syntax | Turtle Syntax | |
|---|---|---|
| 
<skos:Concept rdf:about="#constellation">
  <skos:inScheme rdf:resource=""/>
  <skos:prefLabel>
    constellation
  </skos:prefLabel>
  <skos:definition>
    IAU-sanctioned constellation names
  </skos:definition>
  <skos:narrower rdf:resource="#Andromeda"/>
  ...
  <skos:narrower rdf:resource="#Vulpecula"/>
</skos:Concept>
 | 
<#constellation> a :Concept;
  :inScheme <>;
  :prefLabel "constellation";
  :definition "IAU-sanctioned constellation names";
  :narrower <#Andromeda>;
  ...
  :narrower <#Vulpecula> .
 | 
and the entry for Cygnus is
| 
<skos:Concept rdf:about="#Cygnus">
  <skos:inScheme rdf:resource=""/>
  <skos:prefLabel>Cygnus</skos:prefLabel>
  <skos:definition>Cygnus</skos:definition>
  <skos:altLabel>Cygni</skos:altLabel>
  <skos:altLabel>Cyg</skos:altLabel>
  <skos:broader rdf:resource="#constellation"/>
  <skos:scopeNote>
    Cygnus is nominative form; the alternative
    labels are the genitive and short forms
  </skos:scopeNote>
</skos:Concept>
 | 
<#Cygnus> a :Concept;
  :inScheme <>;
  :prefLabel "Cygnus";
  :definition "Cygnus";
  :altLabel "Cygni";
  :altLabel "Cyg";
  :broader <#constellation>;
  :scopeNote """Cygnus is nominative form;
    the alternative labels are the genitive and
    short forms""" .
 | 
Note that SKOS alone does not permit the distinct differentiation of genitive forms and abbreviations, but the use of alternate labels is more than adequate for processing by VO applications, where the difference between SS Cygni, SS Cyg, and the incorrect form SS Cygnus is probably irrelevant.
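As an illustration of how an application might exploit these labels, the Python sketch below resolves the constellation part of a variable-star designation. The lookup table is a hand-written stand-in for the labels of the Cygnus entry, and the function name is hypothetical; a real application would build the table from the SKOS file.

```python
# Every label of the Cygnus concept, preferred or alternative,
# pointing back at the concept token.
CONSTELLATION_LABELS = {
    "Cygnus": "Cygnus",  # preferred label (nominative)
    "Cygni": "Cygnus",   # alternative label (genitive)
    "Cyg": "Cygnus",     # alternative label (abbreviation)
}


def constellation_of(star_name):
    """Return the constellation token for a designation such as 'SS Cyg'."""
    parts = star_name.split()
    if not parts:
        return None
    # Assume the constellation is the last word of the designation.
    return CONSTELLATION_LABELS.get(parts[-1])
```

All three spellings of the star's name lead to the same concept, which is exactly the behaviour the alternate labels are meant to provide.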
This vocabulary is a set of keywords made available on a web page by the publisher of the journal. The intended usage of the vocabulary is to tag articles with descriptive keywords to aid searching for articles on a particular topic.
The keywords are organised into categories which have been modelled as hierarchical relationships. Additionally, some of the keywords are grouped into collections, which has been mirrored in the SKOS version. The vocabulary contains no definitions, alternative labels, scope notes, or related links, as these are not provided in the original keyword list.
This vocabulary is published by the IVOA to allow images to be tagged with keywords that are relevant for the public. It consists of a set of keywords organised into an enumerated hierarchical structure. Each term consists of a taxonomic number and a label. There are no alternative labels, definitions, scope notes, or cross references.
When converting the AOIM into SKOS, it was decided to model the taxonomic number as an alternative label. Since some terms are duplicated, the token for a term consists of the full hierarchical location of the term. Thus, it is possible to distinguish between
Planet -> Feature -> Surface -> Canyon
and
Planet -> Satellite -> Feature -> Surface -> Canyon
which have the tokens PlanetFeatureSurfaceCanyon and
PlanetSatelliteFeatureSurfaceCanyon respectively.
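The construction of these tokens amounts to concatenating the labels along the path. A minimal Python sketch, assuming the path is written with "->" separators as above and that each label is a single word (the function name is hypothetical):

```python
def aoim_token(path):
    """Concatenate the labels along a hierarchical path into one token."""
    return "".join(part.strip() for part in path.split("->"))
```

Because the whole path is encoded, the two Canyon terms receive different tokens even though their labels are identical.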
The UCD standard is an officially sanctioned and managed vocabulary of the IVOA. The normative document is a simple text file containing entries consisting of tokens (e.g. em.IR), a short description, and usage information (syntax codes which permit UCD tokens to be concatenated). The form of the tokens implies a natural hierarchy: em.IR.8-15um is obviously a narrower term than em.IR, which in turn is narrower than em.
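The hierarchy implied by the token form can be made explicit mechanically. In the Python sketch below (the function name is an assumption), the broader term of a UCD word is obtained by dropping its last dot-separated component.

```python
def ucd_broader(token):
    """Return the broader UCD word, or None for a top-level word."""
    head, sep, _ = token.rpartition(".")
    return head if sep else None
```

Applying the function repeatedly to em.IR.8-15um walks up through em.IR to em, after which it returns None.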
Given the structure of the UCD1+ vocabulary, the natural
translation to SKOS consists of preferred labels equal to the original
tokens (the UCD1 words include dashes and periods), vocabulary tokens
created using guidelines in section 3.2 Suggested good practices (e.g., "emIR815Um" for
em.IR.8-15um), direct use of the definitions, and the syntax codes
placed in usage documentation: <skos:scopeNote>UCD syntax code: P</skos:scopeNote>
NOTE: THIS IS THE FORMAT I USED IN MY VERSION - MAY NOT BE THE SAME AS NORMAN'S [FVH]
Note that the SKOS document containing the UCD1+ vocabulary does NOT constitute the official version: the normative document is still the text list. However, in the long term, the IVOA may decide to make the SKOS version normative, since the SKOS version contains all of the information contained in the original text document but has the advantage of being in a standard format easily read and used by any application on the semantic web.
The IAU Thesaurus consists of concepts with mostly capitalized labels and a rich set of thesaurus relationships (BF for "broader form", NF for "narrower form", and RF for "related form"). The thesaurus also contains U (for "use") and UF ("use for") relationships. In a SKOS model of a vocabulary these are captured as alternative labels. A separate document contains translations of the vocabulary terms in five languages: English, French, German, Italian, and Spanish. Enumerable concepts are plural (e.g. SPIRAL GALAXIES) and non-enumerable concepts are singular (e.g. STABILITY). Finally, there are some usage hints like "combine with other".
In converting the IAU Thesaurus to SKOS, we have been as faithful as possible to the original format of the thesaurus. Thus, preferred labels have been kept in their uppercase format.
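The correspondence between the thesaurus relationship codes and SKOS properties can be summarised in a short Python sketch. The record layout (term, code, target) and the function name are assumptions made for illustration, not the format of the thesaurus files; in particular, the sketch assumes that "A U B" means "use B instead of A", so that A becomes an alternative label of B.

```python
# Thesaurus relationship code -> SKOS property, as described in the text.
RELATION_TO_SKOS = {
    "BF": "skos:broader",
    "NF": "skos:narrower",
    "RF": "skos:related",
    "UF": "skos:altLabel",  # "use for": the target is an alternative label
}


def to_skos(term, code, target):
    """Translate one thesaurus relationship into a SKOS-style triple."""
    if code == "U":
        # "use": the term itself is a non-preferred label of the target.
        return (target, "skos:altLabel", term)
    return (term, RELATION_TO_SKOS[code], target)
```

Upper-case labels are passed through unchanged, in keeping with the decision to stay faithful to the original format.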
The IAU Thesaurus has been unmaintained since its initial production in 1993; it is therefore significantly out of date in places. This vocabulary is published for the sake of completeness, and to make the link between the evolving vocabulary work and any uses of the 1993 vocabulary which come to light. We do not expect to make any future maintenance changes to this vocabulary, and would expect the IVOAT vocabulary, based on this one, to be used instead (see section 4.6 Towards an IVOA Thesaurus).
While it is true that the adoption of SKOS will make it easy to publish and access different astronomical vocabularies, the fact is that there is no vocabulary which makes it easy to jump-start the use of vocabularies in generic astrophysical VO applications: each of the previously developed vocabularies has its own limits and biases. For example, the IAU Thesaurus provides a large number of entries, copious relationships, and translations to four other languages, but there are no definitions, many concepts are now only useful for historical purposes (e.g. many photographic or historical instrument entries), some of the relationships are false or outdated, and many important or newer concepts and their common abbreviations are missing.
Despite its faults, the IAU Thesaurus constitutes a very extensive vocabulary which could easily serve as the basis vocabulary once we have removed its most egregious faults and extended it to cover the most obvious semantic holes. To this end, a heavily revised IAU thesaurus is in preparation for use within the IVOA and other astronomical contexts. The goal is to provide a general vocabulary foundation to which other, more specialized, vocabularies can be added as needed, and to provide a good lingua franca for the creation of vocabulary mappings.
Revision: 46 Date: 2008-02-05 17:27:16 +0000 (Tue, 05 Feb 2008)