TDWG/GBIF GUID-1 Workshop Report
================================

The Taxonomic Databases Working Group (TDWG) and the Global Biodiversity Information Facility (GBIF) completed their first Workshop on Globally Unique Identifiers for Biodiversity Informatics (GUID-1) at the National Evolutionary Synthesis Center (NESCent), Durham, NC, USA on Feb 1-3, 2006.

Motivation
==========

A GUID framework is foundational in facilitating systems interoperability in biodiversity informatics. It meets the need for a universally adopted system for assigning and recognizing identifiers in the domain. A GUID system will help to manage and cross-link the many different types of entities that are manipulated analytically in biodiversity informatics and will improve interoperability with other related life sciences domains, such as bioinformatics and ecology.

The Group
=========

The workshop delegates consisted of a representative cross-section of domain experts from around the world (see http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID1Participants).

Goals
=====

The goals of the workshop were to:
* Discuss the requirements for globally unique identifiers for biodiversity informatics
* Select an optimal GUID technology (LSID, DOI, Handles or other)
* Begin to identify key parameters for implementing an effective system
* Investigate the use of an RDF-based metadata architecture for GUIDs
* Form working groups to address key identified issues before the GUID-2 workshop

Outcomes
========

* Life Science Identifiers (LSID) seem the most appropriate GUID strategy in biodiversity informatics.
* The use of LSIDs does not preclude the use of other technologies where appropriate.
* LSID authorities must use the Domain Name System (DNS) to support identifier resolution. (The LSID specification allows for other resolution mechanisms, but DNS is currently the only mechanism in use.)
* Although it is not possible to prevent multiple data providers from issuing alternate identifiers resolving to the same data record, the community should develop processes and tools to coordinate issuing of single identifiers for some classes of data (e.g. taxon names).
* Metadata should be provided as RDF serialized as XML and should exploit existing vocabularies such as Dublin Core wherever these are in wide use.
* The LSID getData method should be used only where it is possible and appropriate to return an unchanging series of bytes. In other cases only the LSID getMetadata method should be used. (This reflects the use of the terms “data” and “metadata” in connection with LSIDs.)
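
As an illustration of the LSID and DNS outcomes above, the following is a minimal Python sketch and not part of the workshop output. It assumes the usual LSID form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]` and the DNS SRV service name `_lsid._tcp.<authority>` described in the LSID specification. The example identifier and the names `LSID`, `parse_lsid` and `srv_query_name` are hypothetical, chosen only for this example.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    # The "urn" and "lsid" prefixes are treated as case-insensitive.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not a well-formed LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    # Assumption of this sketch: the authority is a DNS name, so lower-case it.
    return LSID(authority.lower(), namespace, object_id, revision)


def srv_query_name(lsid: LSID) -> str:
    """Name of the DNS SRV record a client would query to locate the resolver."""
    return f"_lsid._tcp.{lsid.authority}"


if __name__ == "__main__":
    example = parse_lsid("urn:lsid:example.org:names:12345")  # hypothetical LSID
    print(example)
    print(srv_query_name(example))
```

A client would then query DNS for that SRV record to find the host and port of the authority's resolution service, and call getMetadata (or getData, for unchanging byte streams) against it.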

Justifications
==============

The main criteria leading to the selection of LSID technology were:
* The cost model of DOI. That technology is predicated on the idea that a revenue stream can be constructed for the identified objects, typically sufficient to defray the cost. This is not the case for most, if not all, of the objects that are likely to be identified in our systems.
* The more dynamic nature of LSIDs, which does not require prior registration of every individual identifier before use.
* The open nature of the LSID protocol and software stack, and the ease of implementing LSIDs on different platforms.
See http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID1Report for a more detailed comparison of GUID technologies.

Work Plan
=========

There are still many issues to address before our community can fully implement an identifier system based on LSIDs. The workshop addressed a number of specific issues and formed working groups to take on the following tasks:
* Developing white papers to address best practices and key infrastructure questions.
* Prototyping activities.

The Infrastructure Working Group
================================

This group was formed to address the key issues regarding the deployment of LSID as the GUID technology for biodiversity informatics. The mandate of this working group is to identify required or desirable policies and infrastructure components to ensure robust, long-term operation of shared GUIDs.
The following activities were identified:
1. Specify minimal standards (including tools and services) for GUID issuance.
2. Investigate long-term archival of LSIDs and associated data and metadata.
3. Investigate establishing an (optional) central registration authority.
4. Investigate establishing a repository for data and (orphan) datasets with GUIDs.
5. Investigate the feasibility, existing actors and requirements for a “Publication Bank” (a resource to act as a central registry of taxonomic literature and its digital representations, including assigning GUIDs to each publication).
6. Clarify the distinction between GUIDs assigned to data objects and to conceptual entities.
7. Investigate 3rd-party annotation and link-out mechanisms.
8. Develop materials to communicate with the wider community.
9. Develop best practices for assigning resolver namespaces for LSIDs.
10. Perform a review of the LSID specification to identify possible enhancements.
11. Perform a gap analysis of LSID software.
The outcomes of this group will be a series of white papers addressing the key infrastructure issues. Those will be reviewed during the second GUID meeting later this year.

The Prototyping Working Group
=============================

Our community must experiment with LSID technology and ontology engineering if we are to implement a production-quality LSID system. The working group will develop prototypes of test cases to test aspects of a GUID infrastructure.
The group will develop test LSID resolvers using data objects provided by each domain, such as names, specimens, and concepts. This activity will also help involve (and train) the community in developing appropriate RDF ontologies, leading to concrete recommendations and implementations.
The potential prototypes to be developed and their respective conveners are:
1. LSID resolver for taxon names, developed by nomenclators using the IPNI database and an RDF version of TCS-Names. Group responsible: Roger Hyam, Sally Hinchcliffe, Paul Kirk.
2. LSID resolver for specimens using DarwinCore (also ABCD?): Steve Perry.
3. LSID resolver for taxon concepts by SEEK using TCS.
4. LSID resolver for observations by SEEK using EML.
5. LSID resolver for character data: Damian Barnier, Kevin Thiele.
6. LSID resolver for images: Greg Riccardi (MorphBank), Bob Morris.
The taxon names resolver has the highest priority.
Prototypes will address one or more of the following (but may not be full implementations of an LSID resolution service):
* Hardware and software (including LSID stack)
* RDFS/OWL vocabulary for domain
* Data mapping between local data store scheme and shared ontology
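
To give a feel for the data-mapping item above, here is a minimal Python sketch that turns a record from a local data store into RDF/XML metadata using Dublin Core elements. It is an illustration only: the field names, the `FIELD_TO_DC` mapping, the sample record and the LSID are all hypothetical, and a real prototype would map to whatever shared ontology the working group agrees on.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from local field names to Dublin Core elements.
FIELD_TO_DC = {
    "full_name": "title",
    "author": "creator",
    "year": "date",
}


def record_to_rdf_xml(lsid: str, record: dict) -> str:
    """Serialize the mapped fields of a local record as an RDF/XML description."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # The LSID itself is the subject of the description.
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": lsid}
    )
    for field, dc_term in FIELD_TO_DC.items():
        value = record.get(field)
        if value is not None:  # unmapped or missing fields are simply skipped
            ET.SubElement(desc, f"{{{DC_NS}}}{dc_term}").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    local_record = {"full_name": "Aus bus", "author": "Smith", "year": 1900}
    print(record_to_rdf_xml("urn:lsid:example.org:names:12345", local_record))
```

Keeping the mapping in a plain dictionary separates the local schema from the shared vocabulary, so the same function could serve several data stores with different field names.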
Other important tasks identified by this group are:
* Development of ontologies to represent metadata for the various domains, coordinated by TDWG TAG with the help of experienced ontology engineers.
* Setting up a live LSID server to perform scalability testing.
* A project to demonstrate the potential of LSID-based integration of data for a particular group (ants), covering LSIDs, taxonomic literature, specimens, images, names, and sequences from GenBank: Rod Page.
* Using the SEEK Taxon resolution server (alpha) for testing.
This group will have 3 months to work on the specified tasks before preparing for the second GUID workshop.

Next Workshop: GUID-2
=====================

The TDWG Infrastructure Project is planning a second GUID workshop in late May or early June, 2006. At the time of writing a venue has not been decided.
The second workshop should cover the following:
* Review of the material produced between the workshops by both working groups (prototypes and white papers).
* Summary of the lessons learned in the process.
* Identify open issues and devise specific work plans to address them.
* Draft concrete recommendations on GUIDs for production systems – in general and for each specific domain (names, specimens, concepts, images, etc).
Information about GUID-2 will be distributed as soon as possible.

Other Resources
===============

* See http://wiki.gbif.org/guidwiki for more on the TDWG GUID effort.
* See http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID1Minutes for the workshop minutes and presentations.
* See http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID1Report for a full report, or download it in pdf format from the following link: http://wiki.gbif.org/guidwiki/images/GUID-1Report.pdf

--------------------
Ricardo Pereira
Software Engineer
Taxonomic Databases Working Group (TDWG)
TDWG Infrastructure Project (TIP)