As the astronomical information processed within the Virtual Observatory becomes more complex, there is an increasing need for a more formal means of identifying quantities, concepts, and processes not confined to things easily placed in a FITS image, or expressed in a catalogue or a table. We propose that the IVOA adopt a standard format for vocabularies based on the W3C's Resource Description Framework (RDF) and Simple Knowledge Organisation System (SKOS). By adopting a standard and simple format, the IVOA will permit different groups to create and maintain their own specialised vocabularies while letting the rest of the astronomical community access, use, and combine them. The use of current, open standards ensures that VO applications will be able to tap into resources of the growing semantic web. Several examples of useful astronomical vocabularies are provided, including work on a common IVOA thesaurus intended to provide a semantic common base for VO applications.
This is an IVOA Working Draft. The first release of this document was 2008 March 20.
This document is an IVOA Working Draft for review by IVOA members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use IVOA Working Drafts as reference materials or to cite them as other than “work in progress”.
A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.
We would like to thank the members of the IVOA semantic working group for many interesting ideas and fruitful discussions.
Astronomical information of relevance to the Virtual Observatory (VO) is not confined to quantities easily expressed in a catalogue or a table. Fairly simple things such as position on the sky, brightness in some units, times measured in some frame, redshifts, classifications or other similar quantities are easily manipulated and stored in VOTables and can currently be identified using IVOA Unified Content Descriptors (UCDs) [std:ucd]. However, astrophysical concepts and quantities use a wide variety of names, identifications, classifications and associations, most of which cannot be described or labelled via UCDs.
There are a number of basic forms of organised semantic knowledge of potential use to the VO, ranging from informal “folksonomies” (where users are free to choose their own labels) at one extreme, to formally structured “vocabularies” (where the label is drawn from a predefined set of definitions, and which can include relationships between labels) and “ontologies” (where the domain is captured in a formal data model) at the other. More formal definitions are presented later in this document.
An astronomical ontology is necessary if we are to have a computer (appear to) “understand” something of the domain. There has been some progress towards creating an ontology of astronomical object types [std:ivoa-astro-onto] to meet this need. However there are distinct use cases for letting human users find resources of interest through search and navigation of the information space. The most appropriate technology to meet these use cases derives from the Information Science community, that of controlled vocabularies, taxonomies and thesauri. In the present document, we do not distinguish between controlled vocabularies, taxonomies and thesauri, and use the term vocabulary to represent all three.
One of the best examples of the need for a simple vocabulary within the VO is VOEvent [std:voevent], the VO standard for supporting rapid notification of astronomical events. This standard requires some formalised indication of what a published event is “about”, in a formalism which can be used straightforwardly by the developer of relevant services. See 1.2. Use-cases, and the motivation for formalised vocabularies for further discussion.
A number of astronomical vocabularies have been created, with a variety of goals and intended uses. Some examples are detailed below.
The most immediate high-level motivation for this work is the requirement of the VOEvent standard [std:voevent] for a controlled vocabulary usable in the VOEvent's <Why/> and <What/> elements, which describe what sort of object the VOEvent packet is describing, in some broadly intelligible way. For example a “burst” might be a gamma-ray burst due to the collapse of a star in a distant galaxy, a solar flare, or the brightening of a stellar or AGN accretion disk, and having an explicit list of vocabulary terms can help guide the event publisher into using a term which will be usefully precise for the event's consumers. A free-text label can help here (which brings us into the domain sometimes referred to as folksonomies), but the astronomical community, with a culture sympathetic to international agreement, can do better.
The purpose of this proposal is to establish a set of conventions for the creation, publication, use, and manipulation of astronomical vocabularies within the Virtual Observatory, based upon the W3C's SKOS standard. We include as appendices to this proposal formalised versions of a number of existing vocabularies, encoded as SKOS vocabularies [std:skosref].
Specific use-cases include the following.
The goal of this standard is to show how vocabularies can be easily expressed in an interoperable and computer-manipulable format, and the sole normative section of this Recommendation (namely section 3. Publishing vocabularies (normative)) contains requirements and suggestions intended to promote this. Four example vocabularies that have previously been expressed using non-standardized formats – namely the A&A keyword list, the IAU and AOIM thesauri, and UCD1 – are included below as illustrations of how simple it is to publish them in SKOS, without losing any of the information of the original source vocabularies.
It is not a goal of this standard, as it is not a goal of SKOS, to produce knowledge-engineering artefacts which can support elaborate machine reasoning – such artefacts would be very valuable, but require much more expensive work on ontologies. As the supernova use-case above illustrates, even simple vocabularies can support useful machine reasoning.
It is also not a goal of this standard to produce new vocabularies, or substantially alter existing ones; instead, the vocabularies included below in section 4. Example vocabularies (informative) are directly derived from existing vocabularies (the exceptions are the IVOAT vocabulary, which is ultimately intended to be a significant update to the IAU-93 original, and the constellations vocabulary, which is intended to be purely didactic). It therefore follows that the ambiguities, redundancies and incompleteness of the source vocabularies are faithfully represented in the distributed SKOS vocabularies. We hope that this formalisation process will create greater visibility and broader use for the various vocabularies, and that this will guide the maintenance efforts of the curating groups.
The reason for both of these limitations is that vocabularies are extremely expensive to produce, maintain and deploy, and we must therefore rely on such vocabularies as have been developed, and attached as metadata to resources, by others. Such vocabularies are less rich or less coherent than we might prefer, but they are widely enough deployed to be useful. We hope that the set of example vocabularies we have provided will build on this deployment, by providing material which is useful out of the box.
We find ourselves in the situation where there are multiple vocabularies in use, describing a broad range of resources of interest to professional and amateur astronomers, and members of the public. These different vocabularies use different terms and different relationships to support the different constituencies they cater for. For example, “delta Sct” and “RR Lyr” are terms one would find in a vocabulary aimed at professional astronomers, associated with the notion of “variable star”; however one would not find such technical terms in a vocabulary intended to support outreach activities.
One approach to this problem is to create a single consensus vocabulary, which draws terms from the various existing vocabularies to create a new vocabulary which is able to express anything its users might desire. The problem with this is that such an effort would be very expensive, both in terms of time and effort on the part of those creating it, and to the potential users, who have to learn to navigate around it, recognise the new terms, and who have to be supported in using the new terms correctly (or, more often, incorrectly).
The alternative approach to the problem is to evade it, and this is the approach taken in this document. Rather than deprecating the existence of multiple overlapping vocabularies, we embrace it, help interest groups formalise as many of them as are appropriate, and standardise the process of formally declaring the relationships between them. This means that:
In this section, we introduce the concepts of SKOS-based vocabularies, and the technology of mapping between them. We describe some additional requirements for IVOA vocabularies in the next section, 3. Publishing vocabularies (normative).
After extensive online and face-to-face discussions, the authors have brokered a consensus within the IVOA community that formalised vocabularies should be published at least in SKOS (Simple Knowledge Organisation System) format, a W3C draft standard application of RDF to the field of knowledge organisation [std:skosref]. SKOS draws on long experience within the Library and Information Science community, to address a well-defined set of problems to do with the indexing and retrieval of information and resources; as such, it is a close match to the problem this document is addressing.
ISO 5964 [std:iso5964] defines a number of the relevant terms (ISO 5964:1985=BS 6723:1985; see also [std:bs8723-1] and [std:z39.19]), and some of the (lightweight) theoretical background. The only technical distinction relevant to this document is that between vocabulary and thesaurus: BS-8723-1 defines a thesaurus as a
Controlled vocabulary in which concepts are represented by preferred terms, formally organized so that paradigmatic relationships between the concepts are made explicit, and the preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms. (BS-8723-1, sect. 2.39)
with a similar definition in ISO-5964 sect. 3.16. The paradigmatic relationships in question are those relating a term to a “broader”, “narrower” or more generically “related” term. These notions have an operational definition: any resource retrieved as a result of a search on a given term will also be retrievable through a search on that term's “broader term” (“narrower” is a simple inverse, so that for any pair of terms, if A skos:broader B, then B skos:narrower A; a term may have multiple narrower and broader terms). This is not a subsumption relationship, as there is no implication that the concept referred to by a narrower term is of the same type as a broader term.
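The operational definition above can be illustrated with a minimal Python sketch. The toy concepts, the resource names and the decision to follow narrower links transitively are all assumptions made for illustration; they are not part of this document or of SKOS itself.

```python
from collections import defaultdict

# Hypothetical toy vocabulary: each pair reads "A skos:broader B".
BROADER = [
    ("barredSpiralGalaxy", "spiralGalaxy"),
    ("spiralGalaxy", "galaxy"),
    ("ellipticalGalaxy", "galaxy"),
]

# Hypothetical resources, each indexed under exactly one concept.
INDEX = {
    "ngc1300.fits": "barredSpiralGalaxy",
    "m51.fits": "spiralGalaxy",
    "m87.fits": "ellipticalGalaxy",
}


def narrower_map(broader_pairs):
    """Invert 'A broader B' statements into 'B narrower A'."""
    narrower = defaultdict(set)
    for a, b in broader_pairs:
        narrower[b].add(a)
    return narrower


def expand(term, narrower):
    """Return the term plus every concept reachable through narrower links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(narrower.get(current, ()))
    return seen


def search(term):
    """Find resources indexed under the term or any of its narrower terms."""
    concepts = expand(term, narrower_map(BROADER))
    return sorted(name for name, concept in INDEX.items() if concept in concepts)
```

In this sketch a search on `galaxy` also returns everything found by a search on `spiralGalaxy`, which is the retrieval behaviour the definition asks for; nothing here asserts that a barred spiral galaxy "is a" galaxy in the ontological sense.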
Thus a vocabulary (SKOS or otherwise) is not an ontology. It has lighter and looser semantics than an ontology, and is specialised for the restricted case of resource retrieval. Those interested in ontological analyses can easily transfer the vocabulary relationship information from SKOS to a formal ontological format such as OWL [std:owl].
The purpose of a thesaurus is to help users find resources they might be interested in, be they library books, image archives, or VOEvent packets.
A published vocabulary in SKOS format consists of a set of “concepts” – an example concept capturing the vocabulary information about spiral galaxies is provided in the Figure below, with the RDF shown in both RDF/XML [std:rdfxml] and Turtle notation [std:turtle] (Turtle is similar to the more informal Notation3). The elements of a concept are detailed below.
Figure: examples of SKOS vocabularies
XML Syntax | Turtle Syntax | |
---|---|---|
<skos:Concept rdf:about="#spiralGalaxy"> <skos:prefLabel xml:lang="en"> spiral galaxy </skos:prefLabel> <skos:prefLabel xml:lang="de"> Spiralgalaxie </skos:prefLabel> <skos:altLabel xml:lang="en"> spiral nebula </skos:altLabel> <skos:hiddenLabel xml:lang="en"> spiral glaxy </skos:hiddenLabel> <skos:definition xml:lang="en"> A galaxy having a spiral structure. </skos:definition> <skos:scopeNote xml:lang="en"> Spiral galaxies fall into one of three categories: Sa, Sc, and Sd. </skos:scopeNote> <skos:narrower rdf:resource="#barredSpiralGalaxy"/> <skos:broader rdf:resource="#galaxy"/> <skos:related rdf:resource="#spiralArm"/> </skos:Concept> |
<#spiralGalaxy> a skos:Concept; skos:prefLabel "spiral galaxy"@en, "Spiralgalaxie"@de; skos:altLabel "spiral nebula"@en; skos:hiddenLabel "spiral glaxy"@en; skos:definition """A galaxy having a spiral structure."""@en; skos:scopeNote """Spiral galaxies fall into one of three categories: Sa, Sc, and Sd"""@en; skos:narrower <#barredSpiralGalaxy>; skos:broader <#galaxy>; skos:related <#spiralArm> . |
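As a rough illustration of how an application might use the three kinds of label in the figure, the following Python sketch matches a user's search string against them. The dictionary layout and the function names are assumptions of this sketch, not a SKOS API; only the label values are taken from the figure.

```python
# Labels copied from the spiral galaxy concept in the figure; the
# dictionary layout itself is a hypothetical in-memory representation.
CONCEPT = {
    "uri": "#spiralGalaxy",
    "prefLabel": {"en": "spiral galaxy", "de": "Spiralgalaxie"},
    "altLabel": {"en": ["spiral nebula"]},
    "hiddenLabel": {"en": ["spiral glaxy"]},
}


def match_kind(query, concept, lang="en"):
    """Return which kind of label the query matches, or None."""
    # Normalise case and collapse runs of whitespace before comparing.
    q = " ".join(query.lower().split())
    if not q:
        return None
    if concept["prefLabel"].get(lang, "").lower() == q:
        return "prefLabel"
    for kind in ("altLabel", "hiddenLabel"):
        labels = concept[kind].get(lang, [])
        if q in (label.lower() for label in labels):
            return kind
    return None


def display_label(concept, lang="en"):
    """Only the preferred label is shown to users; hidden labels never are."""
    return concept["prefLabel"].get(lang)
```

The misspelling "spiral glaxy" still finds the concept through its hidden label, while the interface would present the result under the preferred label "spiral galaxy".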
A SKOS vocabulary includes the following features.
In addition to the information about a single concept, a vocabulary can contain information to help users navigate its structure and contents:
There already exist several vocabularies in the domain of astronomy. Instead of attempting to replace all these existing vocabularies, which have been developed for different aims and different user groups, we embrace them. This requires a mechanism to relate the concepts in the different vocabularies.
Part of the SKOS standard [std:skosref] allows a concept in one vocabulary to be related to a concept in another vocabulary. There are four types of relationship provided to capture the relationships between concepts in vocabularies, which are similar to those defined for relationships between concepts within a single vocabulary. The types of mapping relationships are:
AAkeys:#Cosmology skos:exactMatch aoim:#Cosmology
which states that the cosmology concept in the A&A Keywords is the same as the cosmology concept in the AOIM. (Note the use of the external namespaces AAkeys and aoim, which must be defined within the document.)
AAkeys:#Moon skos:broadMatch aoim:#PlanetSatellite
which states that the AOIM concept Planet Satellite is a more general term than the A&A Keywords concept Moon.
AAkeys:#IsmClouds skos:narrowMatch aoim:#NebulaAppearanceDarkMolecularCloud
which states that the AOIM concept Nebula Appearance Dark Molecular Cloud is more specific than the A&A Keywords concept ISM Clouds.
AAkeys:#BlackHolePhysics skos:relatedMatch aoim:#StarEvolutionaryStageBlackHole
which states that the A&A Keywords concept Black Hole Physics has an association with the AOIM concept Star Evolutionary Stage Black Hole.
The semantic mapping relationships have certain properties.
The broadMatch relationship has the narrowMatch relationship as its inverse and the exactMatch and relatedMatch relationships are symmetrical. The consequence of these properties is that if you have a mapping from concept A in one vocabulary to concept B in another vocabulary then you can infer a mapping from concept B to concept A.
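The inference described above can be sketched in a few lines of Python. The triples reuse the example mappings from this section; the tuple representation and the function name `infer_inverse` are assumptions of this sketch.

```python
# Inverse of each mapping relationship: broadMatch and narrowMatch are
# inverses of each other, exactMatch and relatedMatch are symmetrical.
INVERSE = {
    "exactMatch": "exactMatch",
    "relatedMatch": "relatedMatch",
    "broadMatch": "narrowMatch",
    "narrowMatch": "broadMatch",
}

# The example mappings from the text, as (subject, relationship, object).
MAPPINGS = [
    ("AAkeys:#Cosmology", "exactMatch", "aoim:#Cosmology"),
    ("AAkeys:#Moon", "broadMatch", "aoim:#PlanetSatellite"),
    ("AAkeys:#IsmClouds", "narrowMatch",
     "aoim:#NebulaAppearanceDarkMolecularCloud"),
    ("AAkeys:#BlackHolePhysics", "relatedMatch",
     "aoim:#StarEvolutionaryStageBlackHole"),
]


def infer_inverse(mappings):
    """For every mapping A -> B, derive the implied mapping B -> A."""
    return [(obj, INVERSE[rel], subj) for subj, rel, obj in mappings]
```

Applying `infer_inverse` to the A&A-to-AOIM mappings yields the AOIM-to-A&A mappings, so a publisher only needs to state each mapping in one direction.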
At the time of writing, the SKOS document is still a draft, and may or may not end up with support for mappings in the core document rather than in a companion document. This section of this Working Draft, and other references to mappings below, should therefore be considered provisional until it becomes clear how best to implement the eventual SKOS guidelines.
The document [kendall08] discusses good practice for managing RDF vocabularies. At the time of writing (2008 May) this is still an editor's draft, and it itself notes that good practice in this area is not yet fully stable, so our recommendations here are necessarily tentative, and in some places restricted to the relatively small vocabularies (100s to 1000s of terms) we expect to encounter in the VO. We expect to adjust or enhance this advice in future editions of this Recommendation, as best practice evolves, or as we gain more experience with the relevant vocabularies.
We must distinguish between versions of a vocabulary, and versions of the description of a vocabulary. In the former case, we are concerned with the presence or absence of certain concepts, such as “star” or “GRB”, and expect that there will be some reasonably stable relationship between the concept URI and the real-world concept it refers to. In the latter case, we are concerned with the technicalities of associating a concept URI with its labels, its description, and with other related concept URIs. While it is true that there are epistemological commitments involved in the simple act of naming (and the terms “GRB” and “planet” remind us that there is knowledge implicit within a name), it is the latter case that generally represents the knowledge we have of an object, and it is this knowledge which we must version.
In consequence, the concept URIs should not carry version information. The partial exception to this is when a vocabulary undergoes a major restructuring, as a result of the terms in it becoming significantly incoherent – for example, we might imagine the IAU93 thesaurus being updated to form an IAU 200x thesaurus – but in this case we should regard the result as a new vocabulary, rather than simply an adjusted version of an old one.
The SKOS vocabulary has all of its terms appearing in the same unversioned base namespace (the year and month in that namespace are a consequence of the management of the W3C URI namespace, and are not connected to SKOS versions), and a term, once there, is not removed [kendall08] (there seems to be no discussion of this in a SKOS document, as opposed to commentary on SKOS). Successive versions of the vocabulary description describe the vocabulary terms as “unstable”, “testing”, “stable” or “deprecated”.
The Dublin Core namespaces are managed in a similar way [dc:namespaces]. The namespace URIs, which act as common prefixes to the DC terms, and which are defined using a “hash URI” strategy, in RDF terms, have no version numbers, so that the namespace for the DC terms vocabulary is http://purl.org/dc/terms/. Terms such as http://purl.org/dc/terms/extent then 302-redirect to a URL which, for administrative convenience, happens to contain a release date, but which resolves to RDF which defines the unversioned term http://purl.org/dc/terms/extent. This file includes the following content (translated into Turtle from the original RDF/XML for legibility).
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix dcam: <http://purl.org/dc/dcam/> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    <http://purl.org/dc/terms/>
        dcterms:title """DCMI Namespace for metadata terms in the http://purl.org/dc/terms/ namespace"""@en-us;
        rdfs:comment """To comment on this schema, please contact dcmifb@dublincore.org.""";
        dcterms:publisher "The Dublin Core Metadata Initiative"@en-us;
        dcterms:modified "2008-01-14" .

    dcterms:extent
        rdfs:label "Extent"@en-us;
        rdfs:comment "The size or duration of the resource."@en-us;
        rdfs:isDefinedBy <http://purl.org/dc/terms/>;
        dcterms:issued "2000-07-11";
        dcterms:modified "2008-01-14";
        a rdf:Property;
        dcterms:hasVersion <http://dublincore.org/usage/terms/history/#extent-003>;
        rdfs:range dcterms:SizeOrDuration;
        rdfs:subPropertyOf <http://purl.org/dc/elements/1.1/format>, dcterms:format .

    ...
This includes the definition of the (unversioned) http://purl.org/dc/terms/extent concept, along with semantic knowledge about the concept (rdfs:subPropertyOf) as of 2008-01-14, plus other editorial (dcterms:modified) and definitional (rdfs:isDefinedBy) metadata.
A vocabulary which conforms to this IVOA standard has the following features. In this section, the keywords must, should and so on, are to be interpreted as described in [std:rfc2119].
The namespace of the vocabulary must be dereferenceable on the web. That is, typing the namespace URL into a web browser will produce human-readable documentation about the vocabulary. In addition, the namespace URL should return an RDF version of the vocabulary if it is retrieved with one of the RDF MIME types in the HTTP Accept header. At the time of writing, the only fully standardised RDF MIME type is application/rdf+xml for RDF/XML, but text/rdf+n3 and text/turtle are the proposed types for Notation3 [notation3] and Turtle [std:turtle], respectively.
Rationale: These prescriptions are intended to be compatible with the patterns described in [berrueta08] and [sauermann08], and vocabulary distributors should follow these patterns where possible.
The files defining a vocabulary, including those of superseded versions, should remain permanently available. There is no requirement that the namespace URL be at any particular location, although the IVOA web pages, or a journal publisher's web pages, would likely be suitable archival locations.
Vocabularies must be made available for distribution as SKOS RDF files, in either RDF/XML [std:rdfxml] or Turtle [std:turtle] format; vocabularies should be made available in both formats. As an alternative to Turtle, vocabularies may be made available in that subset of Notation3 [notation3] which is compatible with Turtle; if Turtle or Notation3 is being served, it is prudent to support both text/rdf+n3 and text/turtle as MIME types in the Accept header of the HTTP request. See issue [distformat-2].
A publisher may make available RDF in other formats, or other supporting files. A publisher must make available at least some human-readable documentation – see section 3.3. Good practices when serving vocabularies on the web for a discussion of the mechanics here.
Rationale: this does imply that the vocabulary source files can only realistically be parsed using an RDF parser. An alternative is to require that vocabularies be distributed using a subset of RDF/XML which can also be naively handled as traditional XML; however as well as creating an extra standardisation requirement, this would make it effectively infeasible to write out the distribution version of the vocabulary using an RDF or general SKOS tool.
The vocabulary namespace should not be versioned, but it should be easy to retrieve earlier versions of the RDF describing the vocabulary. See the discussion in section 2.4. Vocabulary versions for the rationale for this, and see section 3.3. Good practices when serving vocabularies on the web for a discussion of its implications for the way that vocabularies are served on the web.
This Recommendation does not place any restrictions on the format of the files managed by the maintenance process, as long as the distributed files are as specified above. See issue [masterformat-1].
This standard imposes a number of requirements on conformant vocabularies (see 3.1. Requirements). In this section we list a number of good practices that IVOA vocabularies should abide by. Some of the prescriptions below are more specific than good-practice guidelines for vocabularies in general.
The adoption of the following guidelines will make it easier to use vocabularies in generic VO applications. However, VO applications should be able to accept any vocabulary that complies with the latest SKOS standard [std:skosref] (this is a syntactical requirement, and does not imply that an application will necessarily understand the terms in an alien vocabulary, although the presence of mappings to a known vocabulary should allow it to derive some benefit).
spiralGalaxy, not t1234567); tokens should preferably be created via a direct conversion from the preferred label via removal/translation of non-token characters (see above) and sub-token separation via capitalisation of the first sub-token character (for example the label My favourite idea-label #42 is converted into MyFavouriteIdeaLabel42).
"galaxies"@en, but "astronomy"@en; thesaurus practice in other European languages uses the singular for all cases.
skos:definition) that constitutes a short description of the concept which could be adopted by an application using the vocabulary. Each concept should have additional documentation using SKOS Notes or Dublin Core terms as appropriate (see [std:skosref]). In practice, this requirement is rather difficult to satisfy, since pre-existing structured vocabularies, being converted to SKOS, frequently provide only labels, and not fuller descriptions or scope notes.
<skos:changeNote> and the like, and these are elaborated in the (currently draft) note [kendall08]. Publishers should respect such good maintenance practices as are available.
The W3C Interest Group Note Cool URIs for the Semantic Web [sauermann08] presents guidelines for the effective use of URIs when serving web documents and concepts on the Semantic Web. When providing vocabularies to the VO, we recommend that publishers conform to these guidelines in general. We make some further observations below.
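As an aside on the identifier guideline earlier in this section, the conversion from a preferred label to a concept token might be sketched as follows. This is only one possible reading of the guideline: the function name `label_to_token` is hypothetical, and the sketch assumes that the permitted token characters are the ASCII letters and digits and that characters after the first of each sub-token are left unchanged.

```python
import re


def label_to_token(label):
    """Turn a preferred label into a concept token.

    Assumption of this sketch: any run of characters other than ASCII
    letters and digits separates sub-tokens and is dropped.
    """
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", label) if p]
    # Capitalise only the first character of each sub-token.
    return "".join(p[0].upper() + p[1:] for p in parts)
```

With these assumptions, the label `My favourite idea-label #42` becomes `MyFavouriteIdeaLabel42`, matching the example in the guideline. Labels containing accented or other non-ASCII characters would need an explicit translation step that this sketch does not attempt.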
The “Cool URIs” guidelines describe a number of desirable features of URIs in this context, namely simplicity, stability and manageability. Section 4.5 of the document describes these features as follows (quoted directly).
http://id.example.com/alice, eases later migration of the URI-handling subsystem.
We endorse this advice in this Recommendation: VO vocabularies should use URIs which have these properties. The advice in the third point is a general point about maintaining the general URI namespace on a particular server, and is not about versioning vocabulary namespaces.
The “Cool URIs” document also describes two broad strategies for making these URIs available on the web, which they name 303 URIs and hash URIs (see the document, section 4, for descriptions). They note that the hash URI strategy “should be preferred for rather small and stable sets of resources that evolve together. The ideal case[s] are RDF Schema vocabularies and OWL ontologies, where the terms are often used together, and the number of terms is unlikely to grow out of control in the future.” Since this is the case for the (relatively small) SKOS vocabularies this Recommendation discusses, and since an application will generally want to use the complete vocabulary rather than only single concepts, we suggest that vocabularies conformant to this Recommendation should be distributed as hash URI ones.
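To make the practical consequence of the hash URI strategy concrete, the Python sketch below splits concept URIs into the document that must be fetched and the fragment naming the concept. The concept names are taken from the A&A examples in this document, and the function names are hypothetical.

```python
from urllib.parse import urldefrag

# Namespace of the A&A keywords vocabulary, as given in this document.
AAKEYS = "http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/rdf/AAkeys"


def document_for(concept_uri):
    """Split a hash URI into (document to fetch, concept fragment)."""
    document, fragment = urldefrag(concept_uri)
    return document, fragment


def documents_needed(concept_uris):
    """Distinct documents a client must retrieve for the given concepts."""
    return sorted({document_for(uri)[0] for uri in concept_uris})
```

Because the fragment is never sent to the server, a client resolving `AAkeys#Cosmology` and `AAkeys#Moon` retrieves the same single vocabulary document, which suits small vocabularies that are normally used as a whole.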
Common to the two strategies above is the insistence that the vocabulary URIs are HTTP URIs which are retrievable on the web – they differ only in the practicalities of achieving this. The strategies also share the expectation that the vocabulary URIs are retrievable both as RDF (machine-readable) and as HTML (providing documentation for humans). We elevate this to a requirement of this Recommendation: vocabulary terms must be HTTP URIs which must be dereferenceable as both RDF and HTML using the mechanism appropriate to the URI naming strategy.
While [sauermann08] discusses the design of the URIs naming concepts, it says little about the mechanics of making these available on the web. We refer vocabulary publishers to the recipe advice contained in [berrueta08], which we illustrate here in the case of the hash URI strategy.
The A&A vocabulary has the namespace http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/rdf/AAkeys. In accordance with the above guidelines, this namespace URI is dereferenceable, and if you enter the URI into a web browser, you will end up at a page describing the vocabulary. The way this works can be illustrated by using curl to dereference the URI (URIs are cropped for legibility):

    % curl --head http://[...]/rdf/AAkeys
    HTTP/1.1 303 See Other
    Date: Thu, 08 May 2008 14:07:12 GMT
    Server: Apache
    Location: http://[...]/rdf/vocabularies-2008-05-08/AAkeys/AAkeys.html
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

The server has responded to the HTTP request for the URI with a 303 response, and a Location header, pointing to the HTML representation of this thing.
If we instead request an RDF representation, by stating a desired MIME type in the HTTP Accept header, we get a slightly different response:

    % curl --head -H accept:text/turtle http://[...]/rdf/AAkeys
    HTTP/1.1 303 See Other
    Date: Thu, 08 May 2008 14:11:28 GMT
    Server: Apache
    Location: http://[...]/rdf/vocabularies-2008-05-08/AAkeys/AAkeys.ttl
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

This is also a 303 response, but the Location header this time points to an RDF file in Turtle syntax, which we can now retrieve normally.

    % curl --include http://[...]/rdf/vocabularies-2008-05-08/AAkeys/AAkeys.ttl
    HTTP/1.1 200 OK
    Date: Thu, 08 May 2008 14:13:35 GMT
    Server: Apache
    Content-Type: text/turtle; charset=utf-8

    @base <http://[...]/rdf/AAkeys> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix : <http://www.w3.org/2004/02/skos/core#> .

    <> dc:created "2008-05-08" ;
        dc:title "Vocabulary for Astronomy & Astrophysics Journal keywords (Version wd-1.0)"@en ;
        a :ConceptScheme ;
        # and so on...

Note that the base URI in the returned RDF still refers to the unversioned concept names.
This behaviour is controlled by (in this case) an Apache .htaccess file which looks like this:

    AddType application/rdf+xml .rdf
    # The MIME type for .n3 should be text/rdf+n3, not application/n3:
    # see MIME notes at <http://www.w3.org/2000/10/swap/doc/changes.html>
    #
    # The MIME type for Turtle is text/turtle, though this has not
    # completed its registration: see
    # <http://www.w3.org/TeamSubmission/turtle/#sec-mediaReg>
    AddType text/rdf+n3 .n3
    AddType text/turtle .ttl

    # For Charset types, see <http://www.iana.org/assignments/character-sets>
    AddCharset UTF-8 .n3
    AddCharset UTF-8 .ttl

    RewriteEngine On
    # This will match the directory where this file is located.
    RewriteBase /users/norman/ivoa/vocabularies/rdf

    RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
    RewriteRule ^(AAkeys|AOIM|UCD|IVOAT|IAUT93)$ vocabularies-2008-05-08/$1/$1.rdf [R=303]

    RewriteCond %{HTTP_ACCEPT} text/rdf\+n3 [OR]
    RewriteCond %{HTTP_ACCEPT} application/n3 [OR]
    RewriteCond %{HTTP_ACCEPT} text/turtle
    RewriteRule ^(AAkeys|AOIM|UCD|IVOAT|IAUT93)$ vocabularies-2008-05-08/$1/$1.ttl [R=303]

    # No accept conditions: make the .html version the default
    RewriteRule ^(AAkeys|AOIM|UCD|IVOAT|IAUT93)$ vocabularies-2008-05-08/$1/$1.html [R=303]

These various RewriteRule statements examine the content of the HTTP Accept header, and return 303-redirections to the appropriate actual resource.
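For readers less familiar with mod_rewrite, the decision logic of these rules can be paraphrased in Python. This is only an illustrative model, not a replacement for the server configuration: the function name `redirect_for` is hypothetical, and, like the RewriteCond patterns, the sketch does a plain pattern search on the Accept header and ignores any quality values.

```python
import re

# Vocabulary names and release directory, taken from the RewriteRule lines.
VOCABULARIES = ("AAkeys", "AOIM", "UCD", "IVOAT", "IAUT93")
RELEASE_DIR = "vocabularies-2008-05-08"

# Each rule: (Accept patterns, any of which may match; file extension).
# Rules are tried in order; the last one has no conditions and is the default.
RULES = [
    ((r"application/rdf\+xml",), "rdf"),
    ((r"text/rdf\+n3", r"application/n3", r"text/turtle"), "ttl"),
    ((), "html"),
]


def redirect_for(name, accept=""):
    """Return (status, location relative to RewriteBase), or None."""
    if name not in VOCABULARIES:
        return None  # no rule matches; the request is not rewritten
    for patterns, extension in RULES:
        if not patterns or any(re.search(p, accept) for p in patterns):
            return 303, f"{RELEASE_DIR}/{name}/{name}.{extension}"
```

A request with `Accept: text/turtle` is sent to the `.ttl` file, while a browser, or a client sending a generic `*/*` header, falls through to the HTML documentation, as in the two curl transcripts.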
Note that the namespace remains unversioned throughout the maintenance history of this vocabulary, even though the actual RDF files being returned might change as labels or relationships are adjusted. Previous versions of the vocabulary RDF will remain available, though they will no longer be served by dereferencing the namespace URL.
The intent of having the IVOA adopt SKOS as the preferred format for astronomical vocabularies is to encourage the creation and management of diverse vocabularies by competent astronomical groups, so that users of the VO and related resources can benefit directly and dynamically without the intervention of the IAU or IVOA. However, we felt it important to provide several examples of vocabularies in the SKOS format as part of the proposal, to illustrate their simplicity and power, and to provide an immediate vocabulary basis for VO applications.
The vocabularies described below are included, as SKOS files, in the distributed version of this standard. These vocabularies have stable URLs, and may be cited and used indefinitely. These vocabularies will not, however, be developed as part of the maintenance of this standard. Interested groups, within and outwith the IVOA, are encouraged to take these as a starting point and absorb them within existing processes.
The exceptions to this rule are the constellation vocabulary, provided here mainly for didactic purposes, and the proposed IVOA Thesaurus, which is being developed as a separate project and whose aim is to provide a corrected, more user-friendly, more complete, and updated version of the 1993 IAU thesaurus. Although work on the IVOA Thesaurus is on-going, the fact that it is largely based on the IAU thesaurus means that it is already a very useful resource, so a usable snapshot of this vocabulary will be published with the other examples.
We provide a set of SKOS files representing the vocabularies which have been developed, and mappings between them. These vocabularies have base URIs starting http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/rdf, and can be downloaded at the URL
http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/rdf/vocabularies-2008-05-08.tar.gz
This vocabulary is presented as a simple example of an astronomical vocabulary for a very particular purpose, such as handling constellation information like that commonly encountered in variable star research. For example, “SS Cygni” is a cataclysmic variable located in the constellation “Cygnus”. The name of the star uses the genitive form “Cygni”, but the alternate label “SS Cyg” uses the standard abbreviation “Cyg”. Given the constellation vocabulary, all of these forms are recorded together in a computer-manipulatable format. Various incorrect forms should probably be represented in SKOS hidden labels.
The <skos:ConceptScheme> contains a single <skos:TopConcept>, “constellation”:

XML syntax:

    <skos:Concept rdf:about="#constellation">
      <skos:inScheme rdf:resource=""/>
      <skos:prefLabel>constellation</skos:prefLabel>
      <skos:definition>IAU-sanctioned constellation names</skos:definition>
      <skos:narrower rdf:resource="#Andromeda"/>
      ...
      <skos:narrower rdf:resource="#Vulpecula"/>
    </skos:Concept>

Turtle syntax:

    <#constellation> a :Concept;
        :inScheme <>;
        :prefLabel "constellation";
        :definition "IAU-sanctioned constellation names";
        :narrower <#Andromeda>;
        ...
        :narrower <#Vulpecula>.

and the entry for “Cygnus” is

XML syntax:

    <skos:Concept rdf:about="#Cygnus">
      <skos:inScheme rdf:resource=""/>
      <skos:prefLabel>Cygnus</skos:prefLabel>
      <skos:definition>Cygnus</skos:definition>
      <skos:altLabel>Cygni</skos:altLabel>
      <skos:altLabel>Cyg</skos:altLabel>
      <skos:broader rdf:resource="#constellation"/>
      <skos:scopeNote>Cygnus is nominative form; the alternative labels are the genitive and short forms</skos:scopeNote>
    </skos:Concept>

Turtle syntax:

    <#Cygnus> a :Concept;
        :inScheme <>;
        :prefLabel "Cygnus";
        :definition "Cygnus";
        :altLabel "Cygni";
        :altLabel "Cyg";
        :broader <#constellation>;
        :scopeNote """Cygnus is nominative form; the alternative labels are the genitive and short forms""" .

Note that SKOS alone does not distinguish genitive forms from abbreviations, but alternate labels are more than adequate for processing by VO applications, where the difference between “SS Cygni”, “SS Cyg”, and the incorrect form “SS Cygnus” is probably irrelevant.
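As a rough illustration of how an application might use these labels, the following Python sketch resolves any recorded form to its concept. The dictionary merely transcribes the labels of the “Cygnus” entry; the names CONCEPT_LABELS, concept_for_label and constellation_of_star are hypothetical, and a real application would more likely read the labels from the SKOS file with an RDF library.

```python
# Labels transcribed from the "Cygnus" concept; other constellations omitted.
CONCEPT_LABELS = {
    "#Cygnus": {"prefLabel": "Cygnus", "altLabels": ["Cygni", "Cyg"]},
}


def build_label_index(concepts):
    """Map every preferred and alternative label (lower-cased) to its concept."""
    index = {}
    for concept, labels in concepts.items():
        for label in [labels["prefLabel"], *labels["altLabels"]]:
            index[label.lower()] = concept
    return index


LABEL_INDEX = build_label_index(CONCEPT_LABELS)


def concept_for_label(label):
    """Return the concept identifier for a label, or None if it is unknown."""
    return LABEL_INDEX.get(label.strip().lower())


def constellation_of_star(star_name):
    """Resolve the constellation part of a name such as "SS Cygni".

    Assumes the constellation is the last whitespace-separated word.
    """
    parts = star_name.split()
    return concept_for_label(parts[-1]) if parts else None
```

With this index, “SS Cygni”, “SS Cyg” and “SS Cygnus” all resolve to the same concept, which is the behaviour described above.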
Namespace: http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/rdf/AAkeys.
This vocabulary is a set of keywords maintained jointly by the publishers of the journals Astronomy and Astrophysics (A&A), Monthly Notices of the Royal Astronomical Society (MNRAS) and the Astrophysical Journal (ApJ). As noted in the introduction, an analysis of these keywords [preitemartinez07] indicates that the different journals are slightly inconsistent with each other; we have rather arbitrarily used the list from the A&A web site. The intended usage of the vocabulary is to tag articles with descriptive keywords to aid searching for articles on a particular topic.
The keywords are organised into categories, which have been modelled as hierarchical relationships. Additionally, some of the keywords are grouped into collections, and this grouping has been mirrored in the SKOS version. The vocabulary contains no definitions or related links, as these are not provided in the original keyword list, and it contains only the handful of alternative labels and scope notes present in that list.
Namespace: http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/rdf/AOIM.
This vocabulary is published by the IVOA to allow images to be tagged with keywords that are relevant for the public. It consists of a set of keywords organised into an enumerated hierarchical structure. Each term consists of a taxonomic number and a label. There are no definitions, scope notes, or cross references.
When converting the AOIM into SKOS, it was decided to model the taxonomic number as an alternative label. Since some terms are duplicated, the token for a term consists of the full hierarchical location of the term. Thus, it is possible to distinguish between
Planet -> Feature -> Surface -> Canyon
and
Planet -> Satellite -> Feature -> Surface -> Canyon
which have the tokens PlanetFeatureSurfaceCanyon and PlanetSatelliteFeatureSurfaceCanyon respectively.
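The token construction described here amounts to concatenating the labels along the path from the top of the hierarchy. A minimal Python sketch follows; the function names are hypothetical, and it assumes each label is a single word, as in the two examples, since the handling of multi-word labels is not described in this section.

```python
def aoim_token(path):
    """Build a token from the hierarchical location of a term.

    `path` is the sequence of labels from the top of the hierarchy
    down to the term itself.
    """
    return "".join(path)


def parse_path(text):
    """Split a path written as 'Planet -> Feature -> ...' into its labels."""
    return [label.strip() for label in text.split("->")]
```

For example, aoim_token(parse_path("Planet -> Satellite -> Feature -> Surface -> Canyon")) gives PlanetSatelliteFeatureSurfaceCanyon, which differs from the token of the Canyon term that sits directly under Planet.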
Namespace: http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/rdf/UCD.
The UCD standard is an officially sanctioned and managed vocabulary of the IVOA. The normative document is a simple text file containing entries consisting of tokens (for example em.IR), a short description, and usage information (“syntax codes” which permit UCD tokens to be concatenated). The form of the tokens implies a natural hierarchy: em.IR.8-15um is obviously a narrower term than em.IR, which in turn is narrower than em.
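The hierarchy implied by the dotted form of the tokens can be recovered mechanically. The Python sketch below is one way to do so; the function names are hypothetical, and it assumes that every dotted prefix of a UCD word is itself a word in the list, which should be checked against the normative text file before the result is used as a broader relationship.

```python
def broader_ucd(word):
    """Return the next broader UCD word, or None for a top-level word."""
    parent, separator, _ = word.rpartition(".")
    return parent if separator else None


def ucd_ancestors(word):
    """Return all broader words, nearest first."""
    ancestors = []
    parent = broader_ucd(word)
    while parent is not None:
        ancestors.append(parent)
        parent = broader_ucd(parent)
    return ancestors
```

Applied to em.IR.8-15um, ucd_ancestors returns em.IR followed by em, matching the hierarchy described above.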
Given the structure of the UCD1+ vocabulary, the natural translation to SKOS consists of preferred labels equal to the original tokens (the UCD1 words include dashes and periods), vocabulary tokens created using the guidelines in 3.2. Good practices of vocabulary design (for example, “emIR815Um” for em.IR.8-15um), direct use of the definitions, and the syntax codes placed in usage documentation: <skos:scopeNote>UCD syntax code: P</skos:scopeNote>
Note that the SKOS document containing the UCD1+ vocabulary does NOT constitute the official version: the normative document is still the text list. However, in the long term, the IVOA may decide to make the SKOS version normative, since it contains all of the information in the original text document but has the advantage of being in a standard format easily read and used by any application on the semantic web, whilst still being usable in the current ways.
Namespace: http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/rdf/IAUT93.
The IAU Thesaurus consists of concepts with mostly capitalised labels and a rich set of thesaurus relationships (“BT” for "broader term", “NT” for “narrower term”, and “RT” for “related term”). The thesaurus also contains “U” (for “use”) and “UF” (“use for”) relationships. In a SKOS model of a vocabulary these are captured as alternative labels. A separate document contains translations of the vocabulary terms in five languages: English, French, German, Italian, and Spanish. Enumerable concepts are plural (for example “SPIRAL GALAXIES”) and non-enumerable concepts are singular (for example “STABILITY”). Finally, there are some usage hints like “combine with other”, which have been modelled as scope notes.
In converting the IAU Thesaurus to SKOS, we have been as faithful as possible to the original format of the thesaurus. Thus, preferred labels have been kept in their uppercase format.
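The correspondence between the thesaurus relationship codes and SKOS properties can be sketched as follows. This is an illustration under stated assumptions rather than the conversion actually used: the function name, the data layout and the sample relationships are hypothetical, and treating a “U” entry as an alternative label of the term it points to is one reading of the statement that “U” and “UF” are captured as alternative labels.

```python
# Thesaurus codes that translate directly into SKOS semantic relations.
RELATION_TO_SKOS = {
    "BT": "skos:broader",
    "NT": "skos:narrower",
    "RT": "skos:related",
}


def entry_to_statements(term, relations):
    """Translate one thesaurus entry into (subject, property, value) tuples.

    `relations` maps a relationship code to a list of target terms.
    """
    if "U" in relations:
        # A "use" entry is a non-preferred term: record it as an
        # alternative label of each term that should be used instead.
        return [(target, "skos:altLabel", term) for target in relations["U"]]
    # Preferred labels keep the uppercase form of the original thesaurus.
    statements = [(term, "skos:prefLabel", term)]
    for code, targets in relations.items():
        for target in targets:
            if code in RELATION_TO_SKOS:
                statements.append((term, RELATION_TO_SKOS[code], target))
            elif code == "UF":
                statements.append((term, "skos:altLabel", target))
            else:
                raise ValueError(f"unknown relationship code: {code}")
    return statements
```

For instance, a made-up entry for “SPIRAL GALAXIES” with a single “BT” relationship to “GALAXIES” would yield a preferred label statement and one skos:broader statement.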
The IAU Thesaurus has been unmaintained since its initial production in 1993; it is therefore significantly out of date in places. This vocabulary is published for the sake of completeness, and to make the link between the evolving vocabulary work and any uses of the 1993 vocabulary which come to light. We do not expect to make any future maintenance changes to this vocabulary, and would expect the IVOAT vocabulary, based on this one, to be used instead (see 4.6. Towards an IVOA Thesaurus).
While the adoption of SKOS will make it easy to publish and access different astronomical vocabularies, there is as yet no vocabulary which makes it easy to jump-start the use of vocabularies in generic astrophysical VO applications: each of the previously developed vocabularies has its own limits and biases. For example, the IAU Thesaurus provides a large number of entries, copious relationships, and translations to four other languages, but there are no definitions, many concepts are now only useful for historical purposes (for example many photographic or historical instrument entries), some of the relationships are false or outdated, and many important or newer concepts and their common abbreviations are missing.
Despite its faults, the IAU Thesaurus constitutes a very extensive vocabulary which could easily serve as the basis vocabulary once we have removed its most egregious faults and extended it to cover the most obvious semantic holes. To this end, a heavily revised IAU thesaurus is in preparation for use within the IVOA and other astronomical contexts. The goal is to provide a general vocabulary foundation to which other, more specialised, vocabularies can be added as needed, and to provide a good “lingua franca” for the creation of vocabulary mappings.
Part of the motivation for formalising vocabularies within the VO is to support mapping between vocabularies, so that an application which understands, or can natively process, one vocabulary, can use a mapping to provide at least partial support for data described using another vocabulary. The SKOS document ...is still a draft, and may or may not end up with support for mappings in the core document rather than in a companion document.
Example mappings to appear: see issue [mappings-6].
Revision: 420 Date: 2008-05-08 16:25:14 +0100 (Thu, 08 May 2008)