Thesaurus::RDF -- The RDF Thesaurus descriptor standard
read it!
This document describes an RDF representation of the terms of a thesaurus. The definition of a thesaurus follows that of the NISO Z39.19 specification. This specification is intended as a method for thesaurus servers to transfer all or part of a thesaurus to an application.
RDF/XML was chosen as the method of transfer because thesaurus terms and their relationships can be naturally specified under the RDF framework, and because the RDF and XML tools under development promise a rich application environment for RDF/XML resources. Please refer to RDF at http://www.w3.org/RDF and XML at http://www.w3.org/XML
It is also hoped that, by using the RDF framework, other vocabularies can be used to further the development of thesaurus servers.
Although RDF is the intended format, since RDF parsers are not yet readily available, a development standard is required. This should allow developers to test applications using an XML parser alone. The rules for this development standard are:
All resources must be specified with the typedNode production specification.
RDF schema propertyTypes must be specified as XML attributes. Z39.19 schema propertyTypes must be specified as XML elements.
This vocabulary must have the default namespace designation. The RDF vocabulary must use the RDF: prefix.
All propertyTypes related to Term linking must use an RDF:resource specification.
Resources in this transfer specification are all typed as instances of 'Descriptor', 'Category', or 'EntryTerm'. These are all sub-types of 'Term'. These definitions are those contained in the Z39.19 standard. This implementation uses typedNode production forms for all these resources.
The following propertyTypes can be specified for the Terms.
SN      Scope notes for the Descriptor
CN      Cataloger notes for the Descriptor
HN      Historical notes for the Term
Source  Indication of the source of the Term
Status  Indication of the status of the Term
Each of these propertyTypes is a resource with two propertyTypes of its own: a Label and a Term (Descriptor | EntryTerm | Category). These resources should not be given an id. If the Label is not used, the client may use the id of the Term resource as a label.
IC   Descriptor within a Category
CAT  Category of the Descriptor
UF   EntryTerm for which the Descriptor is preferred
TT   Topmost Term(s) for the Descriptor
BT   Broader Term(s) for the Descriptor
RT   Related Term for a Descriptor
NT   Narrower Term for the Descriptor
USE  Descriptor that is preferred to the EntryTerm
Typically, the linking propertyTypes include an RDF:resource attribute to identify the resource being linked to, rather than specifying that resource within the propertyType.
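The linking structure described above can be read with an XML parser alone. The following is a minimal sketch in Python, not a definitive implementation. It assumes the namespaces are declared with xmlns attributes so that a namespace-aware parser such as xml.etree.ElementTree accepts the input. The function name term_links and the sample fragment are illustrative and not part of this specification.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from this document; declaring them with xmlns
# attributes is an assumption made so that ElementTree can parse the input.
RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax/"
Z19_NS = "http://ceres.ca.gov/thesaurus/"

LINK_TYPES = ("IC", "CAT", "UF", "TT", "BT", "RT", "NT", "USE")
TERM_TYPES = ("Descriptor", "Category", "EntryTerm")

# Illustrative fragment: the BT link deliberately omits its Label.
SAMPLE = f"""<RDF:RDF xmlns:RDF="{RDF_NS}" xmlns="{Z19_NS}">
  <Descriptor RDF:id="0102112">
    <Label>Soils</Label>
    <UF>
      <Label>Vegetable mold</Label>
      <EntryTerm RDF:resource="x02"/>
    </UF>
    <BT>
      <Descriptor RDF:resource="0102"/>
    </BT>
  </Descriptor>
</RDF:RDF>"""


def term_links(term):
    """Return (link type, label, target id) for each linking propertyType."""
    links = []
    for link_type in LINK_TYPES:
        for link in term.findall(f"{{{Z19_NS}}}{link_type}"):
            target = None
            for term_type in TERM_TYPES:
                node = link.find(f"{{{Z19_NS}}}{term_type}")
                if node is not None:
                    target = node.get(f"{{{RDF_NS}}}resource")
                    break
            # Fall back to the id of the linked Term when no Label is given.
            label = link.findtext(f"{{{Z19_NS}}}Label") or target
            links.append((link_type, label, target))
    return links


root = ET.fromstring(SAMPLE)
soils = root.find(f"{{{Z19_NS}}}Descriptor")
for link in term_links(soils):
    print(link)
```

Links are reported in the order of LINK_TYPES rather than document order. A client that cares about document order could iterate over the child elements directly instead.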
This is the syntax for representing thesaurus terms. A major simplification is that the identifier for each term is also how the term should be displayed by the application.
<?xml version="1.0"?>
<?xml:namespace ns='http://www.w3.org/TR/WD-rdf-syntax/' prefix='RDF' ?>
<?xml:namespace ns='http://www.w3.org/TR/WD-rdf-schema/' prefix='RDFS' ?>
<?xml:namespace ns='http://ceres.ca.gov/thesaurus/' prefix='Z19' ?>
<RDF:RDF>
  <Category RDF:id="01">
    <Label>Natural Environment</Label>
    <IC>
      <Label>Biosphere</Label>
      <Descriptor RDF:resource="0101"/>
    </IC>
    <IC>
      <Label>Lithosphere</Label>
      <Descriptor RDF:resource="0102"/>
    </IC>
  </Category>
  <Descriptor RDF:id="0101">
    <Label>Biosphere</Label>
    <CAT>
      <Label>Natural Environment</Label>
      <Category RDF:resource="01"/>
    </CAT>
    <TT>
      <Label>Biosphere</Label>
      <Descriptor RDF:resource="0101"/>
    </TT>
    <NT>
      <Label>Ecosystems</Label>
      <Descriptor RDF:resource="010101"/>
    </NT>
  </Descriptor>
  <Descriptor RDF:id="010101">
    <Label>Ecosystems</Label>
    <TT>
      <Label>Biosphere</Label>
      <Descriptor RDF:resource="0101"/>
    </TT>
    <BT>
      <Label>Biosphere</Label>
      <Descriptor RDF:resource="0101"/>
    </BT>
  </Descriptor>
  <Descriptor RDF:id="0102">
    <Label>Lithosphere</Label>
    <CAT>
      <Label>Natural environment</Label>
      <Category RDF:resource="01"/>
    </CAT>
    <TT>
      <Label>Lithosphere</Label>
      <Descriptor RDF:resource="0102"/>
    </TT>
    <NT>
      <Label>Earth crust</Label>
      <Descriptor RDF:resource="0102111"/>
    </NT>
    <NT>
      <Label>Soils</Label>
      <Descriptor RDF:resource="0102112"/>
    </NT>
  </Descriptor>
  <Descriptor RDF:id="0102111">
    <Label>Earth crust</Label>
    <TT>
      <Label>Lithosphere</Label>
      <Descriptor RDF:resource="0102"/>
    </TT>
    <BT>
      <Label>Lithosphere</Label>
      <Descriptor RDF:resource="0102"/>
    </BT>
  </Descriptor>
  <Descriptor RDF:id="0102112">
    <Label>Soils</Label>
    <UF>
      <Label>Vegetable mold</Label>
      <EntryTerm RDF:resource="x02"/>
    </UF>
    <TT>
      <Label>Lithosphere</Label>
      <Descriptor RDF:resource="0102"/>
    </TT>
    <BT>
      <Label>Lithosphere</Label>
      <Descriptor RDF:resource="0102"/>
    </BT>
  </Descriptor>
  <EntryTerm RDF:id="x02">
    <Label>Vegetable mold</Label>
    <USE>
      <Label>Soils</Label>
      <Descriptor RDF:resource="0102112"/>
    </USE>
  </EntryTerm>
</RDF:RDF>
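As an illustration of how an application might consume a transfer like the one above, the following Python sketch builds an index of terms and follows BT links upward. It is a sketch under stated assumptions, not a reference implementation. The document used here is a reduced copy of the example, with xmlns attributes substituted for the xml:namespace processing instructions so that xml.etree.ElementTree can parse it. The helper names q, build_index and top_terms are hypothetical. The comparison against TT reflects what the example shows (each Descriptor's TT is the term reached by following its BT links), and is not a rule stated by this specification.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax/"
Z19_NS = "http://ceres.ca.gov/thesaurus/"

# Reduced copy of the example; the xmlns attributes are an assumption.
DOC = f"""<RDF:RDF xmlns:RDF="{RDF_NS}" xmlns="{Z19_NS}">
  <Descriptor RDF:id="0101">
    <Label>Biosphere</Label>
    <TT><Label>Biosphere</Label><Descriptor RDF:resource="0101"/></TT>
    <NT><Label>Ecosystems</Label><Descriptor RDF:resource="010101"/></NT>
  </Descriptor>
  <Descriptor RDF:id="010101">
    <Label>Ecosystems</Label>
    <TT><Label>Biosphere</Label><Descriptor RDF:resource="0101"/></TT>
    <BT><Label>Biosphere</Label><Descriptor RDF:resource="0101"/></BT>
  </Descriptor>
</RDF:RDF>"""


def q(ns, name):
    """Build the {namespace}name form that ElementTree uses."""
    return f"{{{ns}}}{name}"


def build_index(root):
    """Map each term id to its label and its BT/NT/TT target ids."""
    index = {}
    for term in root:
        entry = {"label": term.findtext(q(Z19_NS, "Label"))}
        for rel in ("BT", "NT", "TT"):
            path = f"{q(Z19_NS, rel)}/{q(Z19_NS, 'Descriptor')}"
            entry[rel] = [t.get(q(RDF_NS, "resource")) for t in term.findall(path)]
        index[term.get(q(RDF_NS, "id"))] = entry
    return index


def top_terms(index, term_id):
    """Follow BT links upward until reaching terms with no broader term.

    Assumes every BT target is present in the index and that the BT
    links contain no cycles.
    """
    broader = index[term_id]["BT"]
    if not broader:
        return [term_id]
    tops = []
    for parent in broader:
        tops.extend(top_terms(index, parent))
    return tops


index = build_index(ET.fromstring(DOC))
for term_id, entry in index.items():
    # Compare the TT values in the document with the BT chain.
    print(term_id, entry["label"], top_terms(index, term_id) == entry["TT"])
```

Because a server may transfer only part of a thesaurus, a real client would need to handle BT targets that are missing from the index. This sketch leaves that case out.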
This schema does not include the proper description of linking Property Types in the thesaurus.
<rdf:RDF
    xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
    xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#">

  <!-- Term: base class of Category, Descriptor and EntryTerm -->
  <rdfs:Class ID="Term">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#Resource"/>
  </rdfs:Class>
  <rdf:PropertyType ID="HN">
    <rdf:domain rdf:resource="#Term"/>
    <rdf:range rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#String"/>
  </rdf:PropertyType>
  <rdf:PropertyType ID="Source">
    <rdf:domain rdf:resource="#Term"/>
    <rdf:range rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#String"/>
  </rdf:PropertyType>
  <rdf:PropertyType ID="Status">
    <rdf:domain rdf:resource="#Term"/>
    <rdf:range rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#String"/>
  </rdf:PropertyType>

  <!-- Category and the propertyType that links it to a Descriptor -->
  <rdfs:Class ID="Category">
    <rdfs:subClassOf rdf:resource="#Term"/>
  </rdfs:Class>
  <rdf:PropertyType ID="Descriptor">
    <rdf:domain rdf:resource="#Category"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>

  <!-- Descriptor, its notes, and its linking propertyTypes -->
  <rdfs:Class ID="Descriptor">
    <rdfs:subClassOf rdf:resource="#Term"/>
  </rdfs:Class>
  <rdf:PropertyType ID="SN">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#String"/>
  </rdf:PropertyType>
  <rdf:PropertyType ID="CN">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="http://www.w3.org/TR/WD-rdf-syntax#String"/>
  </rdf:PropertyType>
  <rdf:PropertyType ID="CAT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Category"/>
  </rdf:PropertyType>
  <rdf:PropertyType ID="TT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>
  <rdf:PropertyType ID="BT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>
  <rdf:PropertyType ID="RT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>
  <rdf:PropertyType ID="NT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>
  <rdf:PropertyType ID="LT">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>
  <rdf:PropertyType ID="UF">
    <rdf:domain rdf:resource="#Descriptor"/>
    <rdf:range rdf:resource="#EntryTerm"/>
  </rdf:PropertyType>

  <!-- EntryTerm and the propertyType that points to its preferred Descriptor -->
  <rdfs:Class ID="EntryTerm">
    <rdfs:subClassOf rdf:resource="#Term"/>
  </rdfs:Class>
  <rdf:PropertyType ID="USE">
    <rdf:domain rdf:resource="#EntryTerm"/>
    <rdf:range rdf:resource="#Descriptor"/>
  </rdf:PropertyType>
</rdf:RDF>
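The domain and range declarations in the schema above can be used by a client to sanity-check the links it receives. The following Python sketch transcribes the linking propertyTypes into a lookup table. The names LINK_SCHEMA and check_link are hypothetical, and the exact-match comparison is a simplification that ignores sub-typing. IC is left out because the schema does not declare it.

```python
# Domain and range of each linking propertyType, transcribed from the
# schema above. IC is absent because the schema does not declare it.
LINK_SCHEMA = {
    "CAT": ("Descriptor", "Category"),
    "TT": ("Descriptor", "Descriptor"),
    "BT": ("Descriptor", "Descriptor"),
    "RT": ("Descriptor", "Descriptor"),
    "NT": ("Descriptor", "Descriptor"),
    "LT": ("Descriptor", "Descriptor"),
    "UF": ("Descriptor", "EntryTerm"),
    "USE": ("EntryTerm", "Descriptor"),
}


def check_link(subject_type, link_type, object_type):
    """Return True if a link respects the declared domain and range."""
    if link_type not in LINK_SCHEMA:
        raise KeyError(f"propertyType {link_type!r} is not declared in the schema")
    domain, range_ = LINK_SCHEMA[link_type]
    # Exact type match only; sub-typing is not considered in this sketch.
    return subject_type == domain and object_type == range_


print(check_link("Descriptor", "UF", "EntryTerm"))
print(check_link("EntryTerm", "UF", "Descriptor"))
```

The first call describes a Descriptor pointing at an EntryTerm through UF, which matches the schema. The second reverses the two types and does not.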
How to specify links within the document versus links outside the document is still not fully specified. Also, if the server is expected to parse an RDF query, what is a good boundary for treating a term as a single resource?
Along these lines, we need to specify a mechanism for a server to send back only part of the results for a request, and to let the client know how to continue from where it left off. I think this could be a sort of generic query response vocabulary.
We need a small vocabulary for errors as well. This could be part of the query response.
This vocabulary should be built within more of an inheritance framework, where the most basic vocabulary is something like 'wordlist' -> 'glossary' -> 'thesaurus' -> 'word net', so that a thesaurus is-a glossary is-a wordlist. If we do that, then a single query specification should work for everyone.
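One way to picture the inheritance idea is as a small class hierarchy in which every vocabulary answers the same query call. The Python sketch below is purely illustrative. The class names, the query method, and the choice of what each level adds (definitions for a glossary, BT links for a thesaurus) are assumptions and are not defined by this specification. The definition strings are placeholders.

```python
class WordList:
    """The most basic vocabulary: a set of words."""

    def __init__(self, words):
        self.words = set(words)

    def query(self, word):
        """Return what this vocabulary knows about a word, or None."""
        if word not in self.words:
            return None
        return {"word": word}


class Glossary(WordList):
    """A word list whose words also carry definitions (an assumption)."""

    def __init__(self, definitions):
        super().__init__(definitions)  # the dict keys become the words
        self.definitions = dict(definitions)

    def query(self, word):
        result = super().query(word)
        if result is not None:
            result["definition"] = self.definitions[word]
        return result


class Thesaurus(Glossary):
    """A glossary whose words are also linked by BT relationships."""

    def __init__(self, definitions, broader):
        super().__init__(definitions)
        self.broader = dict(broader)

    def query(self, word):
        result = super().query(word)
        if result is not None:
            result["BT"] = self.broader.get(word, [])
        return result


vocabularies = [
    WordList(["Soils"]),
    Glossary({"Soils": "placeholder definition"}),
    Thesaurus({"Soils": "placeholder definition"}, {"Soils": ["Lithosphere"]}),
]
for vocabulary in vocabularies:
    # The same call works at every level; richer levels return more fields.
    print(type(vocabulary).__name__, vocabulary.query("Soils"))
```

With this shape, a client written against the wordlist query still works when it is handed a thesaurus, which is the property the paragraph above is after.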
Failure behavior and error behavior should be defined.
Continuation must be addressed.
Include a method for getting only the count of hits.
Include multiple hierarchies in the query set.
Talk about the ability to add new properties to an RDF record.