NAME

Thesaurus::RDF -- The RDF Thesaurus descriptor standard


SYNOPSIS

read it!


DESCRIPTION

This document describes an RDF representation of the terms of a thesaurus. The definition of a thesaurus follows that of the NISO Z39.19 specification. This specification is intended as a method for thesaurus servers to transfer all or part of a thesaurus to an application.

RDF/XML was chosen as the method of transfer because thesaurus terms and their relationships can be specified naturally under the RDF framework, and because the RDF and XML tools under development promise a rich application environment for RDF/XML resources. Please refer to RDF at http://www.w3.org/RDF and XML at http://www.w3.org/XML

It is also hoped that, by using the RDF framework, other vocabularies can be used to further the development of thesaurus servers.

Although RDF is the intended format, since RDF parsers are not yet readily available, a development standard is required. This should allow developers to test applications using an XML parser alone. The rules for this development standard are:

  1. All resources must be specified with the typedNode production specification.

  2. RDF schema propertyTypes must be specified as XML attributes. Z39.19 schema propertyTypes must be specified as XML elements.

  3. This vocabulary must have the default namespace designation. The RDF vocabulary must use the RDF: prefix.

  4. All propertyTypes related to Term linking must use an RDF:resource specification.
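
As a rough illustration of testing with an XML parser alone, the following Python sketch checks rules 1 and 4 against a small document. It rests on several assumptions that are not part of this specification: the namespaces are declared with xmlns attributes (Python's ElementTree is namespace-aware and needs them), the namespace URIs are borrowed from the example in this document, and the names check_rules and split_tag are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Namespace URIs borrowed from this document's example (an assumption).
RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax/"
Z19_NS = "http://ceres.ca.gov/thesaurus/"

TERM_TYPES = {"Descriptor", "Category", "EntryTerm"}
LINKING = {"IC", "CAT", "UF", "TT", "BT", "RT", "NT", "USE"}

# A conforming sample; xmlns declarations are an assumption of this sketch.
GOOD = """<RDF:RDF xmlns:RDF="http://www.w3.org/TR/WD-rdf-syntax/"
         xmlns="http://ceres.ca.gov/thesaurus/">
 <Descriptor RDF:id="010101">
  <Label>Ecosystems</Label>
  <BT>
   <Label>Biosphere</Label>
   <Descriptor RDF:resource="0101"/>
  </BT>
 </Descriptor>
</RDF:RDF>"""


def split_tag(tag):
    """Split an ElementTree '{uri}local' tag into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def check_rules(xml_text):
    """Return a list of rule violations found in xml_text (hypothetical helper)."""
    problems = []
    for term in ET.fromstring(xml_text):
        uri, local = split_tag(term.tag)
        # Rule 1: every resource is a typed node of a known Term sub-type.
        if uri != Z19_NS or local not in TERM_TYPES:
            problems.append(f"unexpected top-level element: {local}")
            continue
        term_id = term.get(f"{{{RDF_NS}}}id")
        for prop in term:
            _, prop_name = split_tag(prop.tag)
            if prop_name not in LINKING:
                continue
            # Rule 4: a linking propertyType must carry an RDF:resource.
            if not any(c.get(f"{{{RDF_NS}}}resource") for c in prop):
                problems.append(f"{local} {term_id}: {prop_name} has no RDF:resource")
    return problems


if __name__ == "__main__":
    print(check_rules(GOOD))
```

Rules 2 and 3 concern attribute placement and prefix choice, which a namespace-aware parser normalizes away, so this sketch does not attempt to check them.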

Resources in this transfer specification are all typed as instances of 'Descriptor', 'Category', or 'EntryTerm'. These are all sub-types of 'Term'. These definitions are those contained in the Z39.19 standard. This implementation uses typedNode production forms for all these resources.

The following propertyTypes can be specified for the Terms.

  SN            Scope notes for the Descriptor
  CN            Cataloger notes for the Descriptor
  HN            Historical Notes for the Term
  Source        Indication of the source of the Term
  Status        Indication of the status of the Term    

The linking propertyTypes listed below are each resources carrying two propertyTypes: a Label and a Term (Descriptor | EntryTerm | Category). They should not be given an id. If the Label is not used, the client may use the id of the Term resource as a label.

  IC            Descriptor within a Category
  CAT           Category of the Descriptor
  UF            EntryTerm for which the Descriptor is preferred 
  TT            Topmost Term(s) for the Descriptor
  BT            Broader Term(s) for the Descriptor
  RT            Related Term for a Descriptor
  NT            Narrower Term for the Descriptor
  USE           Descriptor that is preferred to the EntryTerm

Typically, the linking propertyTypes include an RDF:resource attribute to identify the resource being linked to, rather than specifying it within the propertyType.
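
A client might read the linking propertyTypes along the following lines. This Python sketch is an illustration only: it assumes xmlns-style namespace declarations with the URIs used in this document's example, and iter_links is a hypothetical name. It applies the label fallback described above, using the id of the Term resource when no Label is supplied.

```python
import xml.etree.ElementTree as ET

# Namespace URIs borrowed from this document's example (an assumption).
RDF = "{http://www.w3.org/TR/WD-rdf-syntax/}"
Z19 = "{http://ceres.ca.gov/thesaurus/}"
LINKING = ("IC", "CAT", "UF", "TT", "BT", "RT", "NT", "USE")

# Sample term: the UF link has a Label, the BT link deliberately has none.
SAMPLE = """<RDF:RDF xmlns:RDF="http://www.w3.org/TR/WD-rdf-syntax/"
         xmlns="http://ceres.ca.gov/thesaurus/">
 <Descriptor RDF:id="0102112">
  <Label>Soils</Label>
  <UF>
   <Label>Vegetable mold</Label>
   <EntryTerm RDF:resource="x02"/>
  </UF>
  <BT>
   <Descriptor RDF:resource="0102"/>
  </BT>
 </Descriptor>
</RDF:RDF>"""


def iter_links(root):
    """Yield (source_id, property, target_id, label) for each linking property.

    Links are grouped by property name in LINKING order, not document order.
    """
    for term in root:
        source_id = term.get(RDF + "id")
        for name in LINKING:
            for prop in term.findall(Z19 + name):
                # The target is the child element that carries RDF:resource.
                target = next((c for c in prop if c.get(RDF + "resource")), None)
                if target is None:
                    continue  # term specified inline; not handled in this sketch
                target_id = target.get(RDF + "resource")
                # Fall back to the Term id when the Label is missing or empty.
                label = prop.findtext(Z19 + "Label") or target_id
                yield source_id, name, target_id, label


if __name__ == "__main__":
    for link in iter_links(ET.fromstring(SAMPLE)):
        print(link)
```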


Example Syntax

This is the syntax for representing thesaurus terms. A major simplification is that the identifier for each term is also how the term should be displayed by the application.

 <?xml version="1.0"?>
 <?xml:namespace ns='http://www.w3.org/TR/WD-rdf-syntax/' prefix='RDF' ?>
 <?xml:namespace ns='http://www.w3.org/TR/WD-rdf-schema/' prefix='RDFS' ?>
 <?xml:namespace ns='http://ceres.ca.gov/thesaurus/' prefix='Z19' ?>

 <RDF:RDF>

 <Category RDF:id="01">
 <Label>Natural Environment</Label>
 <IC>
        <Label>Biosphere</Label>
        <Descriptor RDF:resource="0101"/>
 </IC>
 <IC>
        <Label>Lithosphere</Label>
        <Descriptor RDF:resource="0102"/>
 </IC>
 </Category>

 <Descriptor RDF:id="0101">
  <Label>Biosphere</Label>
  <CAT>
        <Label>Natural Environment</Label>
        <Category RDF:resource="01"/>
  </CAT>
  <TT>
        <Label>Biosphere</Label>
        <Descriptor RDF:resource="0101"/>
  </TT>
  <NT>
        <Label>Ecosystems</Label>
        <Descriptor RDF:resource="010101"/>
  </NT>
 </Descriptor>

 <Descriptor RDF:id="010101">
  <Label>Ecosystems</Label>
  <TT>
        <Label>Biosphere</Label>
        <Descriptor RDF:resource="0101"/>
  </TT>
  <BT>
        <Label>Biosphere</Label>
        <Descriptor RDF:resource="0101"/>
  </BT>
 </Descriptor>

 <Descriptor RDF:id="0102">
  <Label>Lithosphere</Label>
  <CAT>
        <Label>Natural Environment</Label>
        <Category RDF:resource="01"/>
  </CAT>
  <TT>
        <Label>Lithosphere</Label>
        <Descriptor RDF:resource="0102"/>
  </TT>
  <NT>
        <Label>Earth crust</Label>
        <Descriptor RDF:resource="0102111"/>
  </NT>
  <NT>
        <Label>Soils</Label>
        <Descriptor RDF:resource="0102112"/>
  </NT>
 </Descriptor>

 <Descriptor RDF:id="0102111">
  <Label>Earth crust</Label>
  <TT>
        <Label>Lithosphere</Label>
        <Descriptor RDF:resource="0102"/>
  </TT>
  <BT>
        <Label>Lithosphere</Label>
        <Descriptor RDF:resource="0102"/>
  </BT>
 </Descriptor>

 <Descriptor RDF:id="0102112">
  <Label>Soils</Label>
  <UF>
        <Label>Vegetable mold</Label>
        <EntryTerm RDF:resource="x02"/>
  </UF>
  <TT>
        <Label>Lithosphere</Label>
        <Descriptor RDF:resource="0102"/>
  </TT>
  <BT>
        <Label>Lithosphere</Label>
        <Descriptor RDF:resource="0102"/>
  </BT>
 </Descriptor>

 <EntryTerm RDF:id="x02">
  <Label>Vegetable mold</Label>
  <USE>
        <Label>Soils</Label>
        <Descriptor RDF:resource="0102112"/>
  </USE>
 </EntryTerm>

 </RDF:RDF>
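
To show how an application might use such a transfer, the Python sketch below builds an indented outline by following NT links. It is a sketch under stated assumptions: the sample is an abridged copy of the example above rewritten with xmlns declarations so that ElementTree can parse it, the names load_terms and outline are hypothetical, and the NT links are assumed to contain no cycles.

```python
import xml.etree.ElementTree as ET

# Namespace URIs borrowed from the example above (an assumption).
RDF = "{http://www.w3.org/TR/WD-rdf-syntax/}"
Z19 = "{http://ceres.ca.gov/thesaurus/}"

# Abridged copy of the example, with xmlns declarations added.
SAMPLE = """<RDF:RDF xmlns:RDF="http://www.w3.org/TR/WD-rdf-syntax/"
         xmlns="http://ceres.ca.gov/thesaurus/">
 <Descriptor RDF:id="0102">
  <Label>Lithosphere</Label>
  <NT><Label>Earth crust</Label><Descriptor RDF:resource="0102111"/></NT>
  <NT><Label>Soils</Label><Descriptor RDF:resource="0102112"/></NT>
 </Descriptor>
 <Descriptor RDF:id="0102111">
  <Label>Earth crust</Label>
 </Descriptor>
 <Descriptor RDF:id="0102112">
  <Label>Soils</Label>
 </Descriptor>
</RDF:RDF>"""


def load_terms(xml_text):
    """Map each term id to (display label, list of narrower-term ids)."""
    terms = {}
    for term in ET.fromstring(xml_text):
        term_id = term.get(RDF + "id")
        narrower = [d.get(RDF + "resource")
                    for d in term.findall(f"{Z19}NT/{Z19}Descriptor")]
        # Use the id as the label when the term has no Label of its own.
        terms[term_id] = (term.findtext(Z19 + "Label") or term_id, narrower)
    return terms


def outline(terms, term_id, depth=0):
    """Return indented lines for term_id and every term narrower than it."""
    # Terms outside the transferred part are shown by their id alone.
    label, narrower = terms.get(term_id, (term_id, []))
    lines = ["  " * depth + label]
    for child in narrower:
        lines.extend(outline(terms, child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(load_terms(SAMPLE), "0102")))
```

Because a server may send only part of a thesaurus, the sketch tolerates links to terms that are not present in the document rather than treating them as errors.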


Simple Schema Syntax

This schema does not include the proper description of linking Property Types in the thesaurus.

  <rdf:RDF
     xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#">

  <rdfs:Class ID="Term">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#Resource"/>
  </rdfs:Class>

  <rdf:PropertyType ID="HN">
    <rdf:domain rdf:resource="#Term"/>
    <rdf:range rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#String"/>
  </rdf:PropertyType>

  <rdf:PropertyType ID="Source">
    <rdf:domain rdf:resource="#Term"/>
    <rdf:range rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#String"/>
  </rdf:PropertyType>

  <rdf:PropertyType ID="Status">
    <rdf:domain rdf:resource="#Term"/>
    <rdf:range rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#String"/>
  </rdf:PropertyType>

  <rdfs:Class ID="Category">
    <rdfs:subClassOf rdf:resource="#Term"/>
  </rdfs:Class>

  <rdf:PropertyType ID="Descriptor">
    <rdf:domain rdf:resource="#Category"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>

  <rdfs:Class ID="Descriptor">
    <rdfs:subClassOf rdf:resource="#Term"/>
  </rdfs:Class>

  <rdf:PropertyType ID="SN">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#String"/>
  </rdf:PropertyType>

  <rdf:PropertyType ID="CN">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#String"/>
  </rdf:PropertyType>

  <rdf:PropertyType ID="CAT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Category"/>
  </rdf:PropertyType>

  <rdf:PropertyType ID="TT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>

  <rdf:PropertyType ID="BT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>

  <rdf:PropertyType ID="RT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>

  <rdf:PropertyType ID="NT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>

  <rdf:PropertyType ID="LT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>

  <rdf:PropertyType ID="UF">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#EntryTerm"/>
  </rdf:PropertyType>

  <rdfs:Class ID="EntryTerm">
    <rdfs:subClassOf rdf:resource="#Term"/>
  </rdfs:Class>

  <rdf:PropertyType ID="USE">
    <rdf:domain rdf:resource="#EntryTerm"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>

  </rdf:RDF>


Unresolved Issues

How to specify links within the document versus links outside the document is not yet fully specified. Also, if the server base is expecting to parse an RDF query, what is a good disconnect for a term as a single resource?

Along the same lines, we need to specify a mechanism for a server to send back only part of what was requested, and to let the client know how to continue from where it left off. I think this could be a sort of generic query response vocabulary.

We need a small vocabulary for errors as well. This could be part of the query response.

This vocabulary should be recast in an inheritance framework, where the most basic form is something like 'wordlist' -> 'glossary' -> 'thesaurus' -> 'word net'. Then a thesaurus isa glossary isa wordlist. If we do that, a single query specification should work for everyone.


RDF Issues To Be Incorporated

 Failure behavior and error behavior should be defined.
 Continuation must be addressed.
 Include a method for getting only the count of hits.
 Include multiple hierarchies in the query set.
 Talk about the ability to add new properties to an RDF record.