
Vocabularies in the Virtual Observatory, v0.01

IVOA Note/Working Draft, 2007 December 6 [DRAFT $Revision: 12 $]

Working Group
Semantics
This version
http://www.ivoa.net/Documents/ivoa-thesaurus-0.01
Latest version
http://www.ivoa.net/Documents/ivoa-thesaurus-0.01
Editors
TBD
Authors
Alasdair J G Gray, Norman Gray, Frederick V Hessman and Andrea Preite Martinez

Abstract

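This Note proposes that the various astronomical vocabularies in use within the Virtual Observatory be formalised and published as SKOS vocabularies, and that the relationships between them be declared formally, rather than replaced by a single consensus vocabulary. [This abstract might need some expansion.]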

Status of this document

This is an IVOA Note/Working Draft. The first release of this document was 2007 December 6.

This document is an IVOA Working Draft for review by IVOA members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use IVOA Working Drafts as reference materials or to cite them as other than work in progress.

A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.

Acknowledgments

None so far.


1 Introduction

1.1 Vocabularies in astronomy

Astronomical information of relevance to the Virtual Observatory (VO) is not confined to quantities easily expressed in a catalogue or a table. Fairly simple things like position on the sky, brightness in some units, times measured in some frame, redshifts, classifications or other similar quantities are easily manipulated and stored in VOTables, and can now be identified using IVOA UCDs [std:ucd]. However, astrophysical concepts and quantities are described by a wide variety of names, identifications, classifications and associations, most of which cannot be described or labelled with UCDs.

Although there has been some progress towards creating an ontology of astronomical object types [std:ivoa-astro-onto] (an ontology is a systematic formal description of a set of concepts and their relations with each other), such a formal approach may not be necessary here, and may be counterproductive [AG Not sure counterproductive is the right argument here. Ontologies do not meet all of the navigation and retrieval use cases.]. An ontology is necessary if we want a computer to (appear to) `understand' something of a domain. In the present case, however, we are more concerned with the related but distinct problem of helping human users find resources of interest, and for that the most appropriate technology comes from the Information Science community: controlled vocabularies, taxonomies and thesauri.

One of the best examples of the need for a simple vocabulary within the VO is VOEvent [std:voevent], the VO standard for handling astronomical events. If someone broadcasts, or `publishes', the occurrence of an event, the implication is that someone else will want to respond to it; but no institution is interested in all possible events, so some standardised information about what the event `is about' is necessary, in a form which ensures that the parties can communicate effectively. If a `burst' is announced, is it a Gamma Ray Burst due to the collapse of a star in a distant galaxy, a solar flare, or the brightening of a stellar or AGN accretion disk? If a publisher doesn't use the label one might have expected, how is one to guess what other equivalent labels might have been used?
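To make the label-guessing problem concrete, the following minimal Python sketch shows how a vocabulary that records one preferred label and several alternative (`lead-in') labels per concept lets a subscriber map whatever label a publisher used onto a single concept. The `Concept` class, the concept identifiers and the labels are all hypothetical, invented for illustration; they are not taken from VOEvent or from any published IVOA vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A vocabulary concept: one preferred label plus alternative labels (hypothetical model)."""
    uri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)


def build_label_index(concepts):
    """Map every case-folded label (preferred or alternative) to its concept URI."""
    index = {}
    for c in concepts:
        for label in [c.pref_label, *c.alt_labels]:
            index[label.casefold()] = c.uri
    return index


# Hypothetical concepts; URIs and labels are illustrative only.
concepts = [
    Concept("#GammaRayBurst", "gamma-ray burst", ["GRB", "gamma ray burst"]),
    Concept("#SolarFlare", "solar flare", ["flare (solar)"]),
]
index = build_label_index(concepts)


def resolve(label):
    """Return the concept URI for a label, or None if the vocabulary does not list it."""
    return index.get(label.casefold())


print(resolve("GRB"))    # #GammaRayBurst
print(resolve("burst"))  # None: a bare `burst' is ambiguous, so it is not listed
```

The bare label `burst' deliberately resolves to nothing: a vocabulary helps precisely by forcing the publisher to choose among the listed concepts rather than leaving the subscriber to guess.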

There have been a number of attempts to create astronomical vocabularies (in the present note we will not need to distinguish vocabularies, taxonomies and thesauri, and will use the term `vocabulary' for all three cases).

1.2 Formalising and managing multiple vocabularies

We find ourselves in the situation where there are multiple vocabularies in use, describing a broad range of resources of interest to professional and amateur astronomers, and members of the public. These different vocabularies use different terms and different relationships to support the different constituencies they cater for. For example, `delta Sct' and `RR Lyr' are terms one would hope to find in a vocabulary aimed at professional astronomers, associated with the notion of `variable star'; one would hope not to find such technical terms in a vocabulary intended to support outreach activities.

One approach to this problem is to create a single consensus vocabulary, drawing terms from the various existing vocabularies, which is able to express anything its users might desire. The problem is that such an effort would be very expensive, both for those creating it, in time and effort, and for its potential users, who would have to learn to navigate it and recognise the new terms, and who would have to be supported in using those terms correctly (or, more often, incorrectly).

The alternative approach to the problem is to evade it, and this is the approach taken in this Draft. Rather than deprecating the existence of multiple overlapping vocabularies, we embrace it, formalise all of them, and formally declare the relationships between them. This means that:

To this end we present in this Draft formalised versions of a number of existing vocabularies, encoded as SKOS vocabularies [std:skoscore].
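In the SKOS family of specifications, relationships between concepts in different vocabularies are expressed with mapping properties such as exactMatch and broadMatch. The Python sketch below illustrates how declared mappings let a query phrased in one vocabulary be carried over into another. The `pro:` and `out:` prefixes stand for an imagined professional vocabulary and an imagined outreach vocabulary, and the mapping triples and the `translate` function are hypothetical; no mappings are yet published with this Draft.

```python
# Hypothetical mapping triples (subject, property, object) between two vocabularies.
# "pro:" is an imagined professional vocabulary, "out:" an imagined outreach one.
MAPPINGS = [
    ("pro:VariableStar", "exactMatch", "out:VariableStar"),
    ("pro:RRLyr", "broadMatch", "out:VariableStar"),
    ("pro:deltaSct", "broadMatch", "out:VariableStar"),
]


def translate(concept, mappings=MAPPINGS):
    """Return the target-vocabulary concepts that `concept` maps onto.

    Exact matches are returned if any are declared; otherwise fall back to
    broader matches, which lose precision but stay within the target vocabulary.
    """
    exact = [o for s, p, o in mappings if s == concept and p == "exactMatch"]
    if exact:
        return exact
    return [o for s, p, o in mappings if s == concept and p == "broadMatch"]


print(translate("pro:RRLyr"))    # ['out:VariableStar']
print(translate("pro:Cepheid"))  # []: no mapping has been declared
```

A professional term such as `RR Lyr' thus need not appear in the outreach vocabulary at all: the broadMatch declaration lets an outreach user searching for `variable star' be offered the same resources.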

2 Formalising the Vocabularies

After a number of online and face-to-face discussions, the authors brokered a consensus within the IVOA community that the published formats of formalised vocabularies should include at least SKOS (Simple Knowledge Organization System), a W3C draft standard application of RDF to the field of knowledge organisation [std:skoscore]. SKOS draws on long experience within the Library and Information Science community to address a well-defined set of problems concerning the indexing and retrieval of information and resources; as such, it is a close match to the problem this working group is addressing.
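As an illustration of what a formalised vocabulary entry looks like, the following Python sketch writes two concepts using the SKOS Core namespace (http://www.w3.org/2004/02/skos/core#) and its skos:Concept, skos:prefLabel, skos:altLabel and skos:broader terms. It emits the Turtle syntax for compactness; the same triples could equally be serialised as RDF/XML. The helper functions, the http://example.org/vocab# URIs and the labels are hypothetical and are not part of the vocabulary files distributed with this Draft.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def turtle_literal(text, lang="en"):
    """Quote a string as a Turtle literal with a language tag, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_turtle(uri, pref_label, alt_labels=(), broader=()):
    """Return a Turtle block describing one skos:Concept (hypothetical helper)."""
    po = [f"skos:prefLabel {turtle_literal(pref_label)}"]
    po += [f"skos:altLabel {turtle_literal(a)}" for a in alt_labels]
    po += [f"skos:broader <{b}>" for b in broader]
    body = " ;\n    ".join(po)
    return f"<{uri}> a skos:Concept ;\n    {body} ."


def turtle_document(blocks):
    """Prefix declaration followed by concept blocks separated by blank lines."""
    return f"@prefix skos: <{SKOS_NS}> .\n\n" + "\n\n".join(blocks) + "\n"


# Hypothetical example URIs; a real vocabulary would use its own namespace.
vocab = turtle_document([
    concept_to_turtle("http://example.org/vocab#VariableStar", "variable star"),
    concept_to_turtle("http://example.org/vocab#RRLyr", "RR Lyr",
                      alt_labels=["RR Lyrae star"],
                      broader=["http://example.org/vocab#VariableStar"]),
])
print(vocab)
```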

ISO 5964 [std:iso5964] defines a number of the relevant terms (ISO 5964:1985=BS 6723:1985; see also [std:bs8723-1] and [std:z39.19]), and some of the (lightweight) theoretical background. The only technical distinction relevant to this document is that between `vocabulary' and `thesaurus': BS-8723-1 defines a thesaurus as a

controlled vocabulary in which concepts are represented by preferred terms, formally organized so that paradigmatic relationships between the concepts are made explicit, and the preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms. NOTE: The purpose of a thesaurus is to guide both the indexer and the searcher to select the same preferred term or combination of preferred terms to represent a given subject. (BS-8723-1, sect. 2.39)

with a similar definition in ISO-5964 sect. 3.16. The paradigmatic relationships in question are those relating a term to a `broader', `narrower' or more generically `related' term, with an operational definition of `broader term' which is such that a resource retrieved by a given term will also be retrieved by that term's `broader term'. This is not a subsumption relationship, as there is no implication that the concept referred to by a narrower term is of the same type as a broader term.
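The operational definition of `broader term' can be sketched as query expansion: a search on a term also returns resources indexed under any term that reaches it through broader links. The thesaurus fragment and resource index below are hypothetical. The sketch follows chains of broader links transitively, which is an assumption about how a retrieval system would apply the definition, not something this Draft specifies.

```python
from collections import defaultdict, deque

# Hypothetical thesaurus fragment: term -> its broader terms.
BROADER = {
    "RR Lyr": ["variable star"],
    "delta Sct": ["variable star"],
    "variable star": ["star"],
}

# Hypothetical index: resource -> terms an indexer assigned to it.
INDEX = {
    "catalogue-A": ["RR Lyr"],
    "catalogue-B": ["delta Sct"],
    "image-C": ["star"],
}


def narrower_closure(term, broader=BROADER):
    """Return `term` plus every term that reaches it via one or more broader links."""
    narrower = defaultdict(set)
    for t, bs in broader.items():
        for b in bs:
            narrower[b].add(t)
    seen, queue = {term}, deque([term])
    while queue:  # breadth-first walk down the narrower links
        for n in narrower[queue.popleft()]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen


def retrieve(term, index=INDEX, broader=BROADER):
    """Resources indexed under `term` or under any (transitively) narrower term."""
    terms = narrower_closure(term, broader)
    return sorted(r for r, ts in index.items() if terms.intersection(ts))


print(retrieve("variable star"))  # ['catalogue-A', 'catalogue-B']
print(retrieve("star"))           # ['catalogue-A', 'catalogue-B', 'image-C']
```

Note that the sketch never asks whether an RR Lyr catalogue `is a' star; it only follows the declared links. This is the sense in which the broader/narrower relationship is a retrieval relationship rather than subsumption.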

Thus a vocabulary (SKOS or otherwise) is not an ontology. It has lighter and looser semantics than an ontology, and is specialised for the restricted case of resource retrieval.

What is to be the format of the `master' files? SKOS or mildly-formatted plain text?

2.1 SKOS files (normative)

We provide a set of SKOS files representing the vocabularies which have been developed, and mappings between them. These can be downloaded at the URL

http://www.ivoa.net/Documents/ivoa-thesaurus-0.01/vocab-0.01.tar.gz

To be expanded: there are no mappings at the moment. Also, the vocabularies are all in a single language, though translations of the IAU93 thesaurus are available.

Appendices

Bibliography

[lortet94] M-C Lortet, S Borde, and F Ochsenbein.
Second reference dictionary of the nomenclature of celestial objects. Astron. Ap. Supp., 107, pp. 193-218, 1994. [Online].
[lortet94a] M-C Lortet, S Borde, and F Ochsenbein.
The second reference dictionary of the nomenclature of celestial objects (solar system excluded), Volumes I and II. Technical Report 24, Centre de Données astronomiques de Strasbourg, 1994. [Online].
[preitemartinez07] Andrea Preite Martinez and Soizick Lesteven.
Astronomical keywords in the era of the virtual observatory. IVOA Note, IVOA, 2007. [Online].
[std:bs8723-1] Structured vocabularies for information retrieval - guide - definitions, symbols and abbreviations (BS 8723-1:2005).
British Standard, 2005.
[std:iso5964] Documentation - guidelines for the establishment and development of multilingual thesauri (ISO 5964:1985=BS6723:1985).
International Standard, 1985.
[std:ivoa-astro-onto] L. Cambrésy, S. Derriere, P. Padovani, A. Preite Martinez, and A. Richard.
Ontology of astronomical object types. IVOA Working Draft, 2007. [Online].
[std:rtml] Remote telescope markup language.
Web page. [Online].
[std:skoscore] Alistair Miles and Dan Brickley.
SKOS Core Guide. W3C Working Draft, November 2005. [Online].
[std:ucd] Sébastien Derriere, Norman Gray, Robert Mann, Andrea Preite Martinez, Jonathan McDowell, Thomas McGlynn, François Ochsenbein, Pedro Osuna, Guy Rixon, and Roy Williams.
UCD (Unified Content Descriptor) - moving to UCD1+. [Online, cited July 2005].
[std:voevent] Sky event reporting metadata (VOEvent).
IVOA Recommendation, 2006. [Online].
[std:z39.19] Guidelines for the construction, format and management of monolingual thesauri (ANSI/NISO Z39.19-1993=ISO 2788:1986).
American National Standard, 1993.

$Revision: 12 $ $Date: 2007-12-06 17:28:48 +0000 (Thu, 06 Dec 2007) $