Topic 3: GUIDs for Taxon Names and Taxon Concepts

Mon Nov 7 09:45:08 CET 2005

Rich,

Thanks for this.  I fully agree that we need to separate the core plan for
infrastructure from the different subdomain matters.

Reading what you have said, I would like to suggest a way for us to keep our
focus in these discussions.  I want us to achieve something workable at the
end of this, and we need to avoid getting tied up in different issues that
are certainly important to TDWG and to GBIF, but which will over-extend us
right now.

I think that it IS really important for us to discuss what the key objects
in nomenclature and taxonomy are that would need GUIDs (names, epithets,
concepts as published uses, concepts as delineations of sets of organisms,
etc.).  These discussions will help us to understand how GUIDs are likely to
be used.  We need to make enough progress on these points to feel sure that
we will be able to support the various ways that GUIDs could possibly be
used in our domain.

However, we do not need to solve all of the debates as part of this GUID
project.  We need to agree on one or more GUID models and associated
infrastructure that would be sufficient to support the range of possible
uses we identify.  The specific application of this GUID model to the
different sub-domains is another issue.  I think that TDWG must work through
all of these questions, but it should be done as part of the different
working groups (Biological Collections Data, Taxonomic Names, Structured
Descriptive Data, Images, etc.).

I think therefore that we should focus on getting the following outputs from
the GUID discussions:

1. A list of use cases for identifiers that we should be able to support if
we are to meet expectations.  We should be sure that there is no major use
case that cannot be supported by our proposed implementation.  This includes
understanding anything that would be essential for a system of name
management based entirely on epithets, as well as one based on binomials
(and all of the possible layers that Rich listed from a plain text string up
to a fully documented concept).  We may finally decide that from a GUID
management standpoint all of these classes of data need exactly the same
behaviour, but we need a good range of examples if we are to be confident on
this point.

2. A specification of the format and expected behaviour for one of our GUIDs
independently of the exact type of data being identified, along with a plan
to implement any associated infrastructure.

3. A set of guidelines or rules for the TDWG working groups or others to
apply to make use of these GUIDs to support their own objects.  I believe
that the actual decisions here need to be made by those who are developing
the different exchange standards.  For example, Yde's suggestions about
fundamentally different expectations among zoologists and botanists need to
be addressed in the TCS group.  As another example (taken from one of
Sally's posts), the TCS group could recommend a set of organisations to be
responsible for GUIDs for names within different taxonomic groups (IPNI,
ZooBank, Index Fungorum, etc.).  The GUID specification should clearly
explain the implications of adopting GUIDs for any particular class of data,
as well as any infrastructure work that is required (e.g. registration of
the data class in some ontology).

One way of looking at this is that I support Rod's point that we are
developing an infrastructure on top of which appropriate mappings can be
performed.  It's up to different subcommunities to work out the most
appropriate way to use that infrastructure to make their data linkages as
robust as possible.

This is all a bit wordy, but I don't have time to make it shorter right now.

Thanks,

Donald

---------------------------------------------------------------
Donald Hobern (dhobern at gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
---------------------------------------------------------------

-----Original Message-----
From: Taxonomic Databases Working Group GUID Project
[mailto:TDWG-GUID at LISTSERV.NHM.KU.EDU] On Behalf Of Richard Pyle
Sent: 04 November 2005 03:53
To: TDWG-GUID at LISTSERV.NHM.KU.EDU
Subject: Re: Topic 3: GUIDs for Taxon Names and Taxon Concepts

Hi Kevin,

> I am trying to understand how you imagine a name guid
> system will work.  When you say "what gets a name guid?",
> what data/database do you think will get the guid - a
> central name repository?  Are you thinking that all
> records in various databases around the world that are
> referring to the "same" name will have the same "Name Guid"?
> For resolvable GUIDs this will mean the name will always
> resolve to the central repository data.

I think that would be WONDERFUL -- and might even be feasible for zoological
names once ZooBank is up and running.  But that's tangential to my main
point -- which is that TDWG Standards should establish a clear definition
for what a "Name" is.  At the very basic level, we need to be able to
understand the difference between, say, a Specimen GUID vs. a Publication
GUID vs. a Taxon Name GUID.  Granted, the resolution service should take
care of this distinction (a GUID is, after all, a *G*UID), in which case we
wouldn't really need to distinguish between these domains.  We really need
only one GUID system for all object domains (specimens, publications, names,
concepts, etc.).
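
A minimal sketch of that single-system idea, assuming a toy in-memory
resolver: every object gets the same kind of GUID, and only the resolution
step reports which domain it belongs to.  The class names, the `Resolver`
API and the use of random UUIDs are all hypothetical illustrations, not
anything TDWG has specified, and "Aus bus" is only a placeholder name.

```python
import uuid
from dataclasses import dataclass

# Hypothetical object domains; the thread does not fix this list.
OBJECT_CLASSES = {"Specimen", "Publication", "TaxonName", "TaxonConcept"}


@dataclass(frozen=True)
class ResolvedObject:
    guid: str
    object_class: str  # reported by the resolver, not encoded in the GUID
    data: dict


class Resolver:
    """Toy stand-in for a resolution service (assumption, not a real API)."""

    def __init__(self):
        self._records = {}

    def register(self, object_class, data):
        # One GUID format for every domain: the identifier itself is opaque.
        if object_class not in OBJECT_CLASSES:
            raise ValueError(f"unknown object class: {object_class}")
        guid = str(uuid.uuid4())
        self._records[guid] = ResolvedObject(guid, object_class, dict(data))
        return guid

    def resolve(self, guid):
        # Resolution is the only place where the domain becomes visible.
        return self._records[guid]


resolver = Resolver()
name_guid = resolver.register("TaxonName", {"label": "Aus bus"})
specimen_guid = resolver.register("Specimen", {"label": "placeholder specimen"})

for guid in (name_guid, specimen_guid):
    record = resolver.resolve(guid)
    print(record.object_class, record.data["label"])
```

Under this arrangement a consumer never needs to know in advance whether a
GUID points at a name or a specimen; it asks the resolver.  Partitioning
GUIDs by domain would instead require agreed definitions of each domain up
front, which is the case the next paragraphs address.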

But Donald's question of 29 October and the TDWG GUID Wiki imply that there
are different issues to consider for different GUID domains (e.g.,
"GUIDsForCollectionsAndSpecimens" vs. "GUIDsForTaxonNamesAndTaxonConcepts").

So, *IF* we are partitioning GUID domains (rather than letting the
resolution service determine the difference between a specimen and, say, a
publication), and *IF* we are thinking about
"GUIDsForTaxonNamesAndTaxonConcepts", then I believe it is critical to come
up with clear definitions for what the scope of a "Name" is.  If TDWG does
not impose some standards of this sort, then cross-walking different
datasets will be every bit as much a nightmare as it currently is.  I would
see that as a most unfortunate outcome for the broader goal of global
biological data exchange.

Maybe I misunderstood your question?

Aloha,
Rich