Purpose: To construct 1) an integrated controlled environmental vocabulary, and 2) the tools to use it for creation of metadata and construction of queries, both on-line and in stand-alone systems.
Background:
Both the California Environmental Resources Evaluation System (CERES), a program of the California Resources Agency, and the United States Geological Survey Biological Resources Division (USGS/BRD) are working to develop methods for the description, discovery, and exchange of environmental information. CERES has made significant advances in developing a browsable and searchable index of information on the World Wide Web (http://ceres.ca.gov), and these advances are potentially useful to the National Biological Information Infrastructure (NBII) project.
Currently, the CERES program is involved in development of a key component, a controlled vocabulary database and user environment that will: a) provide pick list(s) for selection of keywords, synonyms, and related concepts for use in metadata and queries, b) provide a hierarchical organization of information to serve as a browsing structure in information discovery, and c) allow simultaneous browsing and comparison of terms as presented in multiple standard thesauri in use nationally and internationally to index environmental information.
The USGS/BRD has a similar project underway to develop a vocabulary that can be used to describe and access biological information via the NBII.
CERES and USGS/BRD will collaborate to develop a core vocabulary that is applicable at the national (and ultimately the international) level and to which region-specific terms may be appended to form a California environmental thesaurus. The core vocabulary set will serve as a starting point for other state-level vocabularies, which may be more detailed and more specific to regional needs and descriptions. The California vocabulary set will serve as a template and guide for other states' implementations of controlled vocabularies and thesauri.
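The core-plus-extension arrangement described above can be pictured as a layered lookup in which a state vocabulary inherits every core term and adds its own. The following is a minimal Python sketch under that assumption; the class name, the sample terms, and the definitions are hypothetical illustrations and are not drawn from the CERES or USGS/BRD databases.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Vocabulary:
    """A named set of terms that may extend a parent (core) vocabulary."""
    name: str
    terms: dict = field(default_factory=dict)  # lowercase term -> definition
    parent: Optional["Vocabulary"] = None

    def lookup(self, term):
        """Return the definition, checking this layer first, then its parent."""
        key = term.lower()
        if key in self.terms:
            return self.terms[key]
        return self.parent.lookup(term) if self.parent else None

    def all_terms(self):
        """Every term visible from this layer, including inherited ones."""
        inherited = self.parent.all_terms() if self.parent else set()
        return inherited | set(self.terms)


# Hypothetical sample data: a shared core and a California extension.
core = Vocabulary("core", {
    "wetlands": "Sample definition of wetlands.",
    "watersheds": "Sample definition of watersheds.",
})
california = Vocabulary(
    "california",
    {"vernal pools": "Sample definition of vernal pools."},
    parent=core,
)

print(california.lookup("Wetlands"))   # inherited from the core layer
print(core.lookup("vernal pools"))     # None: extension terms stay out of the core
print(sorted(california.all_terms()))
```

Keeping the extension as a separate layer, rather than copying core terms into it, means that a revised core set is visible to every state-level vocabulary without further edits.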
THESAURUS PROJECT PART I
Title: Integrated Environmental Thesaurus
Description:
CERES and USGS/BRD will collaborate on the development of a common, integrated controlled vocabulary (Thesaurus), using national and international standards for the construction of thesauri as guidance. At a minimum, the Thesaurus will represent the national and state perspectives (and, as necessary, the international perspective). An extension to this Thesaurus will provide a California-specific perspective. The intention is that this new Thesaurus be useful as a model for other states, which requires agreement on the terminology used in the core (most generally applicable) sections of the Thesaurus. The Thesaurus will be a collaborative effort, with terms and definitions supplied by both the CERES and USGS/BRD staffs but maintained in the CERES database. Major components of this activity will include the following four tasks:
Task A- Compilation of the Thesaurus Contents
1. Identify core terms from selected environmental thesauri to represent the national perspective as well as the state perspective (and as necessary the international perspective) and an extension keyword set that provides for a California-specific application.
2. Construct a hierarchical structure for organization of the Thesaurus. This involves selecting, researching, soliciting, and compiling an appropriate arrangement of preferred terms, broader terms, narrower terms, related terms, and entry terms, together with their definitions. The structure will be developed by drawing on structures found in existing thesauri and by consulting experts from state and federal natural resources agencies and university programs.
3. Provide linkages from terms in the Thesaurus to controlled vocabularies that contain synonymous or related terms.
4. Add needed terms for the core set and the California extension that are not found in existing thesauri.
5. Provide linkages from appropriate terms in the Thesaurus to other thesauri, thesaurus sections, or other vocabularies where there is an identified need for further specificity, such as in taxonomic (primarily to the USGS/BRD-supported Integrated Taxonomic Information System [ITIS]), medical, and chemical terminologies.
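The term arrangement and linkages listed in Task A can be sketched as a small data structure. The Python below is illustrative only and assumes Python 3.9 or later: the class names, method names, sample terms, and the placeholder external identifier are all hypothetical, and the actual CERES database schema is not described in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term with its relationships (hypothetical layout)."""
    label: str
    definition: str = ""
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)
    entry_terms: set = field(default_factory=set)      # non-preferred synonyms
    external_links: dict = field(default_factory=dict)  # vocabulary -> identifier


class Thesaurus:
    def __init__(self):
        self.terms = {}          # preferred label -> Term
        self._entry_index = {}   # lowercase entry term -> preferred label

    def add(self, label, definition="", entry_terms=()):
        term = Term(label, definition, entry_terms=set(entry_terms))
        self.terms[label] = term
        for alt in entry_terms:
            self._entry_index[alt.lower()] = label
        return term

    def link_broader(self, narrower, broader):
        """Record a broader/narrower pair in both directions."""
        self.terms[narrower].broader.add(broader)
        self.terms[broader].narrower.add(narrower)

    def link_related(self, a, b):
        """Record a symmetric related-term pair."""
        self.terms[a].related.add(b)
        self.terms[b].related.add(a)

    def resolve(self, text):
        """Map a preferred or entry term to its preferred label, else None."""
        lowered = text.lower()
        for label in self.terms:
            if label.lower() == lowered:
                return label
        return self._entry_index.get(lowered)

    def ancestors(self, label):
        """All broader terms reachable from label, nearest first."""
        seen, queue = [], sorted(self.terms[label].broader)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(sorted(self.terms[current].broader))
        return seen


# Hypothetical sample content.
t = Thesaurus()
t.add("Ecosystems")
t.add("Aquatic ecosystems")
t.add("Wetlands", entry_terms=["Marshlands"])
t.add("Riparian areas")
t.link_broader("Aquatic ecosystems", "Ecosystems")
t.link_broader("Wetlands", "Aquatic ecosystems")
t.link_related("Wetlands", "Riparian areas")
# Placeholder identifier, not a real record in any external vocabulary.
t.terms["Wetlands"].external_links["other-vocabulary"] = "PLACEHOLDER-ID"

print(t.resolve("marshlands"))
print(t.ancestors("Wetlands"))
```

Storing each broader/narrower pair reciprocally keeps the hierarchy browsable in both directions, and the entry-term index lets a pick list accept a synonym while recording the preferred term.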
Task B- Review, Testing and Acceptance
1. Coordinate input, review, and collaboration with other organizations and projects involved in vocabulary research as it applies to environmental information technology. CERES will coordinate the state-level review and testing of the core and extended thesaurus, and USGS/BRD will represent the national level for review and testing of the core keyword set.
2. Establish and implement a method for approval and acceptance by target users and other organizations involved in vocabulary research at both the state and national levels.
Task C- Maintenance
1. Develop and test a methodology for maintaining and updating the Thesaurus and modifying the Thesaurus vocabulary as indicated by the review and testing. New versions of the core keyword set and structure will require coordinated review by CERES and USGS/BRD and representative users of their respective constituencies.
Task D- Publication
1. Publish the results of this project in a refereed journal and present them at one or more appropriate conferences.
THESAURUS PROJECT PART II
Title: Thesaurus Application Tool Set
Description:
The USGS/BRD and the California Resources Agency will collaborate on the development of software tools for navigating and searching the thesaurus database and ITIS, and for using the vocabulary terms as keywords in metadata records, queries, and other NBII applications. CERES will contribute its existing beta version of the Thesaurus Browser software. The major components and deliverables of this activity are to:
1. Enhance and refine the CERES Thesaurus Browser for browsing the multi-thesaurus database and any linked thesaurus databases. Responsibility: CERES staff.
2. Develop a general-purpose thesaurus application programming interface (API) that will broker communications between NBII applications and both the Thesaurus and ITIS. The API will allow queries to be passed from the various NBII applications to the Thesaurus and ITIS, and vocabulary terms (also known as keywords) to be passed from the Thesaurus and ITIS back to the applications. The API should be generic enough to be applicable to other target applications. Responsibility: CERES and USGS/BRD staff.
3. Develop database drivers for the Thesaurus API. The drivers will enable the API to communicate with the Thesaurus, ITIS, and the Library of Congress Subject Headings (LCSH) database. API drivers for other vocabulary databases could be added in the future. Responsibilities: Thesaurus and LCSH drivers: CERES staff; ITIS driver: USGS/BRD staff.
4. Modify NBII applications, such as Metamaker, NBII Website search tools, SIS, the Publications Database, and the Products Database, to allow communication with the Thesaurus API. These modifications will enable the NBII applications to use Thesaurus data in their functions and operations. Responsibility: USGS/BRD staff.
5. Develop an export format, along with the documentation necessary for import into other software that supports thesaurus management, development of metadata, and searching. At a minimum, this format will integrate with the CERES and BRD/NBII/Metamaker environments. Responsibility: CERES staff.
6. Publish the results of this project in a refereed journal and present them at one or more appropriate conferences. Responsibility: CERES and USGS/BRD staff.
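The broker-and-driver arrangement described in items 2 and 3 above can be sketched as follows. This is a minimal Python illustration under stated assumptions (Python 3.9 or later): the class names, the method names `register`, `query`, and `search`, and the in-memory stand-in driver are hypothetical, and the sample term lists are illustrative data rather than the contents of the Thesaurus or ITIS. A real driver would query its database instead of a list.

```python
from abc import ABC, abstractmethod


class VocabularyDriver(ABC):
    """Adapter between the API and one vocabulary source (hypothetical)."""

    @abstractmethod
    def search(self, query: str) -> list:
        """Return the terms from this source that match the query."""


class InMemoryDriver(VocabularyDriver):
    """Stand-in driver backed by a list; used here only for illustration."""

    def __init__(self, terms):
        self._terms = list(terms)

    def search(self, query):
        needle = query.lower()
        return [t for t in self._terms if needle in t.lower()]


class ThesaurusAPI:
    """Passes one query to the registered drivers and collects their terms."""

    def __init__(self):
        self._drivers = {}

    def register(self, source_name, driver):
        """Add a driver; further vocabulary sources can be registered later."""
        self._drivers[source_name] = driver

    def query(self, text, sources=None):
        """Return {source name: matching terms} for the chosen sources."""
        names = sources if sources is not None else list(self._drivers)
        return {name: self._drivers[name].search(text) for name in names}


# Illustrative sample data only.
api = ThesaurusAPI()
api.register("thesaurus", InMemoryDriver(
    ["Wetlands", "Wetland restoration", "Watersheds"]))
api.register("itis", InMemoryDriver(
    ["Quercus lobata", "Quercus agrifolia"]))

print(api.query("wetland"))
print(api.query("quercus", sources=["itis"]))
```

Because calling applications see only `query`, a driver for another vocabulary database can be registered without changing those applications, which is the extensibility that item 3 anticipates.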