Topic 3: GUIDs for Taxon Names and Taxon Concepts

Thu Nov 3 23:15:55 CET 2005

There seems to be a great deal of confusion and misunderstanding in this
discussion, and I apologize to whatever extent I am to blame. Several big
issues are crossing paths.  I don't know why the "centralized" issue cropped
up, but it is, in my mind, tangential to (what should be) the main thrust of
this discussion.

> The idea that only the resolution system needs to be able to
> distinguish between specimens, taxon names, etc., seems unfortunate.

I agree.

> Assigning GUIDs solely to basionyms strikes me as crazy -- I'd suggest
> most taxa aren't known by their basionyms. I'd advocate GUIDs for every
> name string (with the possible exception of orthographic variants,
> spelling mistakes, etc.).

By "Namestring", do you include author, or just taxonomic nomenclatural
elements?  In either case, why even bother establishing a GUID for taxonomic
names at all?  Why not just use the namestring itself?

Moreover, I don't think anyone has suggested that GUIDs be assigned "solely"
to basionyms.  The point is, there shouldn't be six different "units" to
which GUIDs are assigned to represent "taxon names" [basionyms,
combinations, namestrings, namestrings with authors, concept definitions,
name usage instances, etc.]
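
To make that concrete, here is a minimal Python sketch of how the same
handful of records turns into a different number of "name" objects depending
on which unit gets the GUID. Everything in it is invented for illustration:
the placeholder names "Aus bus" and "Xus bus", the record fields, and the
function names are assumptions, not the schema of any existing database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    # Hypothetical fields, invented for this illustration only.
    genus: str
    epithet: str
    author: str
    basionym_genus: str


def key_for(rec: NameRecord, unit: str) -> str:
    """Return the identity of a record under one definition of a "name"."""
    if unit == "namestring":
        return f"{rec.genus} {rec.epithet}"
    if unit == "namestring_with_author":
        return f"{rec.genus} {rec.epithet} {rec.author}"
    if unit == "basionym":
        return f"{rec.basionym_genus} {rec.epithet}"
    raise ValueError(f"unknown unit: {unit}")


def count_objects(records, unit: str) -> int:
    """Count how many distinct GUIDs this unit would require."""
    return len({key_for(r, unit) for r in records})


# One original name, one new combination, and the same combination
# cited with a different author string.
RECORDS = [
    NameRecord("Aus", "bus", "Smith", "Aus"),
    NameRecord("Xus", "bus", "(Smith)", "Aus"),
    NameRecord("Xus", "bus", "(Smith) Jones", "Aus"),
]

for unit in ("basionym", "namestring", "namestring_with_author"):
    print(unit, count_objects(RECORDS, unit))
```

The three records collapse to one object if the unit is the basionym, two if
it is the bare namestring, and three if the author string is included. Two
databases that pick different units will not have the same set of things to
assign GUIDs to.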

> Lastly, "imposing standards" is the wrong way to think about this.
> Standards win support if they work, and are adopted. I'd suggest this
> stuff will happen if people make compelling applications that others
> make use of, not because TDWG decides what should be done.

O.K., my choice of the word "imposing" was bad.  I should have said
something like "implementing".  The one lesson that the web has taught us
more vividly than decentralization is the power of standards for information
exchange. We could have all invented our own version of DarwinCore for
exposing specimen records to the internet, and allowed natural selection to
take its course such that only the most useful few versions survived -- but
I don't think that would have been the most effective path to where we are
now (and where we could be soon, once DarwinCore V2 is ratified and mapped
against ABCD elements).

> So, my final question is, what is wrong with having each taxonomic
> database serve their own GUIDs for their own data (using an agreed
> format, such as RDF), and where possible GUIDs from different sources
> are mapped (e.g., a name string in IPNI to one in uBio). Users employ
> whatever GUID they find useful -- at least we then know what digital
> object they are referring to.

To me, GUIDs are only valuable and serve the information exchange needs of
our community if they are, to some extent at least, reusable.  I do not
understand the value of assigning GUIDs to the records in my taxonomic
database, if I am the only one who will ever use them. I understand the
practical value of assigning "local" GUIDs to my taxon name records
independently from other databases, if the long-term goal is to eventually
map my GUIDs to the corresponding "equivalent" GUIDs of other databases.
But that brings me back to the original point I've been trying to make:
that mapping can only be done effectively if we have equivalent objects to
map.  Right now, there are a half-dozen competing ideas about what a "name"
object is (or should be).  Having spent the past year beating my head
against a wall trying to map my local taxonomic database records to ITIS
TSNs, I can say from first-hand experience how utterly INefficient it can
be to create such cross-walks when a "Name" record in ITIS means something
different from a "Name" record in my database.
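
A rough Python sketch of the cross-walk problem, under stated assumptions:
the two toy "databases", their identifiers (L1, R1, ...), their fields, and
the crosswalk function are all hypothetical and do not describe how ITIS or
any other real system is structured. The local database keeps one record per
basionym; the remote one keeps one record per combination.

```python
from collections import defaultdict


def crosswalk(local, remote, key):
    """Map each local id to the remote ids that share the same key."""
    remote_index = defaultdict(list)
    for rid, rec in remote.items():
        remote_index[key(rec)].append(rid)
    return {
        lid: sorted(remote_index.get(key(rec), []))
        for lid, rec in local.items()
    }


# Hypothetical local database: one record per basionym.
local_db = {
    "L1": {"name": "Aus bus", "basionym": "Aus bus"},
    "L2": {"name": "Cus dus", "basionym": "Cus dus"},
}

# Hypothetical remote database: one record per combination.
remote_db = {
    "R1": {"name": "Aus bus", "basionym": "Aus bus"},
    "R2": {"name": "Xus bus", "basionym": "Aus bus"},
}

by_basionym = crosswalk(local_db, remote_db, key=lambda r: r["basionym"])
by_name = crosswalk(local_db, remote_db, key=lambda r: r["name"])

print(by_basionym)  # L1 maps to two remote records
print(by_name)      # L1 maps to only one of them
```

Neither mapping is wrong, but neither is one-to-one: the answer to "which
remote record is my L1?" depends on what each side means by a "name". The
mapping only becomes mechanical when both sides agree on the unit.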

Donald started out this discussion by suggesting that "Taxon Name" objects
might be thought of as fundamentally different from "Taxon Concept" objects
(in an analogous way to how publication objects are fundamentally different
from specimen objects).  If you agree with this, then it would be nice if we
could find a standard line to draw between these two objects, so that when
we later try to cross-walk our respective datasets, we're mapping apples to
apples, and oranges to oranges (rather than apples to bananas and then back
to oranges -- which aptly represents what I'm having to resort to in mapping
my taxonomic data records to ITIS records).

> If we think of the scientific literature, many paper have at least two
> GUIDs (DOI and PubMed), both of which are useful, and which serve
> different purposes.

What is the value of expanding that to potentially hundreds, or thousands of
GUIDs for each paper -- one GUID from each database that has a table of
literature records?  Wouldn't our information exchange be much more
efficient if we all adopted (or at least mapped to) either the DOI or the
PubMed GUID (or both)? If so, why doesn't this same logic apply to taxon
names?

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html