Rich,
Thanks for this. I fully agree that we need to separate the core plan for infrastructure from the different subdomain matters.
Reading what you have said, I would like to suggest a way for us to keep our focus in these discussions. I want us to achieve something workable at the end of this, and we need to avoid getting tied up in different issues that are certainly important to TDWG and to GBIF, but which will over-extend us right now.
I think that it IS really important for us to discuss what the key objects in nomenclature and taxonomy are that would need GUIDs (names, epithets, concepts as published uses, concepts as delineations of sets of organisms, etc.). These discussions will help us to understand how GUIDs are likely to be used. We need to make enough progress on these points to feel sure that we will be able to support the various ways that GUIDs could possibly be used in our domain.
However, we do not need to settle all of these debates as part of this GUID project. We need to agree on one or more GUID models and associated infrastructure that would be sufficient to support the range of possible uses we identify. The specific application of this GUID model to the different sub-domains is another issue. I think that TDWG must work through all of these questions, but it should be done as part of the different working groups (Biological Collections Data, Taxonomic Names, Structured Descriptive Data, Images, etc.).
I think therefore that we should focus on getting the following outputs from the GUID discussions:
1. A list of use cases for identifiers that we should be able to support if we are to meet expectations. We should be sure that there is no major use case that cannot be supported by our proposed implementation. This includes understanding anything that would be essential for a system of name management based entirely on epithets, as well as one based on binomials (and all of the possible layers that Rich listed, from a plain text string up to a fully documented concept). We may finally decide that from a GUID management standpoint all of these classes of data need exactly the same behaviour, but we need a good range of examples if we are to be confident on this point.
2. A specification of the format and expected behaviour for one of our GUIDs independently of the exact type of data being identified, along with a plan to implement any associated infrastructure.
3. A set of guidelines or rules for the TDWG working groups or others to apply to make use of these GUIDs to support their own objects. I believe that the actual decisions here need to be made by those who are developing the different exchange standards. For example, Yde's suggestions about fundamentally different expectations among zoologists and botanists need to be addressed in the TCS group. As another example (taken from one of Sally's posts), the TCS group could recommend a set of organisations to be responsible for GUIDs for names within different taxonomic groups (IPNI, ZooBank, Index Fungorum, etc.). The GUID specification should clearly explain the implications of adopting GUIDs for any particular class of data, as well as any infrastructure work that is required (e.g. registration of the data class in some ontology).
One way of looking at this is that I support Rod's point that we are developing an infrastructure on top of which appropriate mappings can be performed. It's up to different subcommunities to work out the most appropriate way to use that infrastructure to make their data linkages as robust as possible.
This is all a bit wordy, but I don't have time to make it shorter right now.
Thanks,
Donald
---------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
---------------------------------------------------------------
-----Original Message-----
From: Taxonomic Databases Working Group GUID Project [mailto:TDWG-GUID@LISTSERV.NHM.KU.EDU] On Behalf Of Richard Pyle
Sent: 04 November 2005 03:53
To: TDWG-GUID@LISTSERV.NHM.KU.EDU
Subject: Re: Topic 3: GUIDs for Taxon Names and Taxon Concepts
Hi Kevin,
I am trying to understand how you imagine a name guid system will work. When you say "what gets a name guid?", what data/database do you think will get the guid - a central name repository? Are you thinking that all records in various databases around the world that are referring to the "same" name will have the same "Name Guid"? For resolvable GUIDs this will mean the name will always resolve to the central repository data.
I think that would be WONDERFUL -- and might even be feasible for zoological names once ZooBank is up and running. But that's tangential to my main point -- which is that TDWG Standards should establish a clear definition for what a "Name" is. At the very basic level, we need to be able to understand the difference between, say, a Specimen GUID vs. a Publication GUID vs. a Taxon Name GUID. Granted, the resolution service should take care of this distinction (a GUID is, after all, a *G*UID), in which case we wouldn't really need to distinguish between these domains. We really need only one GUID system for all object domains (specimens, publications, names, concepts, etc.).
But Donald's question of 29 October and the TDWG GUID Wiki imply that there are different issues to consider for different GUID domains (e.g., "GUIDsForCollectionsAndSpecimens" vs. "GUIDsForTaxonNamesAndTaxonConcepts").
So, *IF* we are partitioning GUID domains (rather than letting the resolution service determine the difference between a specimen and, say, a publication), and *IF* we are thinking about "GUIDsForTaxonNamesAndTaxonConcepts", then I believe it is critical to come up with clear definitions for what the scope of a "Name" is. If TDWG does not impose some standards of this sort, then cross-walking different datasets will be every bit as much a nightmare as it currently is. I would see that as a most unfortunate outcome for the broader goal of global biological data exchange.
Maybe I misunderstood your question?
Aloha, Rich