Use SKOS. This might need some expansion...
This is an IVOA Note/Working Draft. The first release of this document was 2007 December 6.
This document is an IVOA Working Draft for review by IVOA members
and other interested parties. It is a draft document and may be
updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use IVOA Working Drafts as reference materials or to
cite them as other than work in progress.
A list of current IVOA Recommendations and other technical
documents can be found at http://www.ivoa.net/Documents/ .
None so far.
Astronomical information of relevance to the Virtual Observatory (VO) is not confined to quantities easily expressed in a catalogue or a table. Fairly simple quantities such as position on the sky, brightness in some units, times measured in some frame, redshifts, classifications and other similar quantities are easily manipulated and stored in VOTables, and can now be identified using IVOA UCDs [std:ucd]. However, astrophysical concepts are expressed through a wide variety of names, identifications, classifications and associations, most of which cannot be described or labelled via UCDs.
Although there has been some progress towards creating an ontology of astronomical object types [std:ivoa-astro-onto] (an ontology is a systematic formal description of a set of concepts and their relations with each other), such a formal approach may not be necessary, and may be counterproductive [AG Not sure counterproductive is the right argument here. Ontologies do not meet all of the navigation and retrieval use cases.]. An ontology is necessary if we are to have a computer (appear to) `understand' something of a domain. In the present case, however, we are more concerned with the related but distinct problem of letting human users find resources of interest. The most appropriate technology for that problem comes from the Information Science community: controlled vocabularies, taxonomies and thesauri.
One of the best examples of the need for a simple vocabulary within the VO is VOEvent [std:voevent], the VO standard for handling astronomical events: if someone broadcasts, or `publishes', the occurrence of an event, the implication is that someone else is going to want to respond to it, but no institution is interested in all possible events, so some standardised information about what the event `is about' is necessary, in a form which ensures that the parties can communicate effectively. If a `burst' is announced, is it a Gamma Ray Burst due to the collapse of a star in a distant galaxy, a solar flare, or the brightening of a stellar or AGN accretion disk? If a publisher doesn't use the label one might have expected, how is one to guess what other equivalent labels might have been used?
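The label problem just described is what the preferred and alternative (`lead-in') labels of a thesaurus are meant to solve. The Python sketch below assumes a vocabulary in which each concept carries one preferred label and any number of alternative labels, corresponding to skos:prefLabel and skos:altLabel. The concept identifiers, the labels and the helper names build_index and resolve are all hypothetical and are not taken from any of the vocabularies in this Draft.

```python
import unicodedata

# Hypothetical thesaurus fragment: one preferred label and any number of
# alternative (lead-in) labels per concept, mirroring skos:prefLabel and
# skos:altLabel. The "ex:" identifiers are invented for illustration.
CONCEPTS = {
    "ex:GammaRayBurst": {"pref": "gamma ray burst", "alt": ["GRB", "gamma-ray burst"]},
    "ex:SolarFlare": {"pref": "solar flare", "alt": ["flare (solar)"]},
}


def normalise(label):
    """Unicode-normalise, case-fold and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", label).casefold().split())


def build_index(concepts):
    """Map every normalised preferred or alternative label to its concept."""
    index = {}
    for concept, labels in concepts.items():
        for label in [labels["pref"], *labels["alt"]]:
            index[normalise(label)] = concept
    return index


def resolve(label, index):
    """Return the concept for a label, or None if the label is not in the vocabulary."""
    return index.get(normalise(label))


INDEX = build_index(CONCEPTS)
print(resolve("GRB", INDEX))                 # ex:GammaRayBurst
print(resolve("  Gamma-Ray  Burst", INDEX))  # ex:GammaRayBurst
print(resolve("burst", INDEX))               # None
```

A bare `burst' resolves to nothing in this sketch. That is arguably the right outcome: a publisher should be prompted to choose a specific concept, and the sketch does not guess one on their behalf.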
There have been a number of attempts to create astronomical vocabularies (in the present note we will not need to distinguish vocabularies, taxonomies and thesauri, and will use the term `vocabulary' for all three cases).
We find ourselves in the situation where there are multiple vocabularies in use, describing a broad range of resources of interest to professional and amateur astronomers, and members of the public. These different vocabularies use different terms and different relationships to support the different constituencies they cater for. For example, `delta Sct' and `RR Lyr' are terms one would hope to find in a vocabulary aimed at professional astronomers, associated with the notion of `variable star'; one would hope not to find such technical terms in a vocabulary intended to support outreach activities.
One approach to this problem is to create a single consensus vocabulary, which draws terms from the various existing vocabularies to create a new vocabulary which is able to express anything its users might desire. The problem with this is that such an effort would be very expensive: both in terms of time and effort on the part of those creating it, and to the potential users, who have to learn to navigate around it, recognise the new terms, and who have to be supported in using the new terms correctly (or, more often, incorrectly).
The alternative approach to the problem is to evade it, and this is the approach taken in this Draft. Rather than deprecating the existence of multiple overlapping vocabularies, we embrace it, formalise all of them, and formally declare the relationships between them. This means that:
To this end we present in this Draft formalised versions of a number of existing vocabularies, encoded as SKOS vocabularies [std:skoscore].
After a number of online and face-to-face discussions, the authors brokered a consensus within the IVOA community that the published formats of formalised vocabularies should include at least SKOS (Simple Knowledge Organization System) [std:skoscore]. SKOS is an application of RDF to the field of knowledge organisation and is currently a W3C draft standard. It draws on long experience within the Library and Information Science community in addressing a well-defined set of problems to do with the indexing and retrieval of information and resources. As such, it is a close match to the problem this working group is addressing.
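To show what a SKOS encoding looks like in practice, the Python sketch below emits a single concept in the Turtle serialisation of RDF. It uses the SKOS Core namespace http://www.w3.org/2004/02/skos/core#. The example.org concept URIs and the labels are hypothetical, and a production tool would more likely use an RDF library than hand-built strings.

```python
SKOS_CORE = "http://www.w3.org/2004/02/skos/core#"


def turtle_string(text):
    """Escape backslashes, double quotes and newlines for a Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def concept_to_turtle(uri, pref_label, alt_labels=(), broader=(), lang="en"):
    """Serialise one skos:Concept, with its labels and broader links, as Turtle."""
    statements = [f"<{uri}> a skos:Concept",
                  f"    skos:prefLabel {turtle_string(pref_label)}@{lang}"]
    statements += [f"    skos:altLabel {turtle_string(alt)}@{lang}" for alt in alt_labels]
    statements += [f"    skos:broader <{b}>" for b in broader]
    # Predicate-object pairs about the same subject are separated by " ;".
    body = " ;\n".join(statements) + " ."
    return f"@prefix skos: <{SKOS_CORE}> .\n\n{body}\n"


print(concept_to_turtle(
    "http://example.org/vocab#RRLyr",
    "RR Lyr",
    alt_labels=["RR Lyrae star"],
    broader=["http://example.org/vocab#VariableStar"],
))
```

The output states that the hypothetical concept is a skos:Concept with one preferred label, one alternative label and one broader concept. These are the same ingredients a thesaurus entry has in the BS 8723 / ISO 5964 sense.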
ISO 5964 [std:iso5964] defines a number of the relevant terms (ISO 5964:1985=BS 6723:1985; see also [std:bs8723-1] and [std:z39.19]), and some of the (lightweight) theoretical background. The only technical distinction relevant to this document is that between `vocabulary' and `thesaurus': BS-8723-1 defines a thesaurus as a
controlled vocabulary in which concepts are represented by preferred terms, formally organized so that paradigmatic relationships between the concepts are made explicit, and the preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms. NOTE: The purpose of a thesaurus is to guide both the indexer and the searcher to select the same preferred term or combination of preferred terms to represent a given subject. (BS-8723-1, sect. 2.39)
with a similar definition in ISO-5964 sect. 3.16. The paradigmatic relationships in question are those relating a term to a `broader', `narrower' or more generically `related' term. `Broader term' has an operational definition: any resource retrieved by a given term will also be retrieved by that term's broader term. This is not a subsumption relationship, as there is no implication that the concept referred to by a narrower term is of the same type as the concept referred to by its broader term.
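The operational definition of `broader term' can be made concrete. The Python sketch below uses a hypothetical vocabulary fragment and a hypothetical set of indexed resources. It posts every resource under its own index terms and also under all of their transitively broader terms, so a query on a broader term also retrieves resources indexed with a narrower one. The SolarFlare to Sun link is included deliberately. A solar flare is not a kind of Sun, which is the non-subsumption case just described, yet retrieval still behaves as intended.

```python
from collections import defaultdict

# Hypothetical vocabulary fragment: each concept -> its broader concepts.
BROADER = {
    "deltaSct": {"PulsatingVariable"},
    "RRLyr": {"PulsatingVariable"},
    "PulsatingVariable": {"VariableStar"},
    "VariableStar": set(),
    "SolarFlare": {"Sun"},  # broader, but a flare is not a kind of Sun
    "Sun": set(),
}

# Hypothetical resources and the concepts they were indexed with.
INDEXED = {
    "paper-1": {"deltaSct"},
    "paper-2": {"RRLyr"},
    "paper-3": {"VariableStar"},
    "paper-4": {"SolarFlare"},
}


def broader_closure(concept, broader=BROADER):
    """Return the concept plus everything reachable via broader links."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen


def build_postings(indexed=INDEXED):
    """Post each resource under its index terms and all of their broader terms."""
    postings = defaultdict(set)
    for resource, concepts in indexed.items():
        for concept in concepts:
            for term in broader_closure(concept):
                postings[term].add(resource)
    return postings


POSTINGS = build_postings()
print(sorted(POSTINGS["VariableStar"]))  # ['paper-1', 'paper-2', 'paper-3']
print(sorted(POSTINGS["deltaSct"]))      # ['paper-1']
print(sorted(POSTINGS["Sun"]))           # ['paper-4']
```

Expanding broader terms at indexing time keeps queries to a single lookup. The alternative of expanding narrower terms at query time gives the same results.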
Thus a vocabulary (SKOS or otherwise) is not an ontology. It has lighter and looser semantics than an ontology, and is specialised for the restricted case of resource retrieval.
What is to be the format of the `master' files? SKOS or mildly-formatted plain text?
We provide a set of SKOS files representing the vocabularies which have been developed, and mappings between them. These can be downloaded at the URL
http://www.ivoa.net/Documents/ivoa-thesaurus-0.01/vocab-0.01.tar.gz
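Once mappings between vocabularies are declared, a client could translate a term from one vocabulary into another, for example from a professional vocabulary into an outreach one. The sketch below is hypothetical throughout. The pro: and pub: prefixes, the concepts and the mappings themselves are invented. It merely borrows the names of the SKOS mapping properties exactMatch and broadMatch, whose namespace has differed between SKOS drafts. It prefers an exact match and falls back to a broader one.

```python
# Hypothetical mapping triples: (source concept, mapping property, target concept).
# "pro:" stands for a professional vocabulary and "pub:" for an outreach
# vocabulary; both prefixes and all concepts are invented for illustration.
MAPPINGS = [
    ("pro:deltaSct", "skos:broadMatch", "pub:VariableStar"),
    ("pro:RRLyr", "skos:broadMatch", "pub:VariableStar"),
    ("pro:Supernova", "skos:exactMatch", "pub:Supernova"),
]

# Order in which mapping properties are tried: equivalence first, then broader.
PREFERENCE = ("skos:exactMatch", "skos:broadMatch")


def translate(concept, target_prefix, mappings=MAPPINGS):
    """Return the best-matching concept in the target vocabulary, or None."""
    candidates = [(prop, target) for source, prop, target in mappings
                  if source == concept and target.startswith(target_prefix)]
    for wanted in PREFERENCE:
        for prop, target in candidates:
            if prop == wanted:
                return target
    return None


print(translate("pro:deltaSct", "pub:"))   # pub:VariableStar
print(translate("pro:Supernova", "pub:"))  # pub:Supernova
print(translate("pro:Blazar", "pub:"))     # None
```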
To be expanded: there are no mappings at the moment. Also, the vocabularies are all in a single language, though translations of the IAU93 thesaurus are available.
$Revision: 12 $ $Date: 2007-12-06 17:28:48 +0000 (Thu, 06 Dec 2007) $