Topic 3: GUIDs for Taxon Names and Taxon Concepts

Sun Oct 30 00:07:22 CEST 2005

Thank you, Donald, for starting the discussion thread for which I have been
waiting (not so) patiently for a very long time, and in a context that might,
for the first time, lead to some meaningful resolution. Of course, I
wholeheartedly endorse your approach to distinguishing "names" from
"concepts" as different informational objects, and also support the basic
notion that a "Concept Object" can be conveniently and reliably represented
as a combination of a "Name" and some sort of documented usage of the Name
(usually in the form of a publication).

Before I provide my own answers to your specific questions, though, I want
to underscore what I feel is a fundamentally important issue that needs to
be addressed early on in any serious discussion of GUIDs for taxonomic
names.  There is no broad agreement on what a unit "Name" really is, or
should be.  Consider the following list:

 1. Pomacanthidae
 2. Pomacanthinae
 3. Centropyge
 4. Xiphypops
 5. Centropyge (Xiphypops)
 6. Centropyge flavicaudus
 7. Centropyge flavicauda
 8. Xiphypops flavicaudus
 9. Centropyge (Xiphypops) flavicauda
10. Centropyge fisheri
11. Centropyge fisheri flavicauda
12. Centropyge (Xiphypops) fisheri flavicauda

How many Name-GUIDs would be needed for the above list?  From one
perspective there would be twelve GUIDs -- one for each "namestring".  In
ITIS, there would be ten TSNs (#9 would not receive a separate TSN from #7,
nor would #12 receive a separate TSN from #11).  From the botanical
perspective (imagining these as botanical names), there would be at least
seven (#6 & #7 would be spelling variants of the same "name", and I don't
believe that #9 and #12 would be treated as different "names" from #7 and
#11, respectively), and perhaps eight (not sure if #1 & #2 would be the same
or different "names", the former being at rank Family, and the latter
Subfamily).  From the zoological perspective, there may be only five: [1+2],
[3], [4+5], [6+7+8+9+11+12], [10] (the various flavors of each "Name" unit
would be considered attributes of the usage -- i.e., tied to the Concept
object).
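
To make the counting concrete, here is a rough Python sketch that derives
three of the tallies above (twelve, ten, five) from the same list. The
grouping rules are simplifying assumptions chosen only to reproduce those
counts for this particular list; they are not ITIS's actual rules or a
complete reading of the zoological Code, and the function names are
hypothetical.

```python
import re

NAMESTRINGS = [
    "Pomacanthidae", "Pomacanthinae", "Centropyge", "Xiphypops",
    "Centropyge (Xiphypops)", "Centropyge flavicaudus",
    "Centropyge flavicauda", "Xiphypops flavicaudus",
    "Centropyge (Xiphypops) flavicauda", "Centropyge fisheri",
    "Centropyge fisheri flavicauda",
    "Centropyge (Xiphypops) fisheri flavicauda",
]

def has_epithet(namestring):
    # A lowercase final element means a species-group name is present.
    return namestring.split()[-1][0].islower()

def itis_like_key(namestring):
    # Assumption: an interpolated subgenus does not make a species-level
    # namestring distinct, so "(Subgenus)" is dropped when an epithet follows.
    if has_epithet(namestring):
        return re.sub(r"\s*\([^)]*\)", "", namestring)
    return namestring

def zoological_unit(namestring):
    # Assumption: the "Name" unit is the terminal element only.
    tokens = namestring.replace("(", "").replace(")", "").split()
    last = tokens[-1]
    if last[0].islower():
        # Crude gender normalisation: flavicaudus / flavicauda -> flavicaud
        return ("species-group", re.sub(r"(us|um|a)$", "", last))
    if len(tokens) == 1 and re.search(r"(idae|inae)$", last):
        # Family and Subfamily names share one stem.
        return ("family-group", re.sub(r"(idae|inae)$", "", last))
    # Genus or subgenus: the last uninomial element.
    return ("genus-group", last)

namestring_count = len(set(NAMESTRINGS))
itis_like_count = len({itis_like_key(n) for n in NAMESTRINGS})
zoological_count = len({zoological_unit(n) for n in NAMESTRINGS})

print(namestring_count, itis_like_count, zoological_count)
```

The same twelve strings give three different answers depending only on the
key function, which is exactly the decision that has to be made before GUIDs
are assigned.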

Before a GUID system can be implemented for taxon names, there needs to be a
clear definition of what "unit" of name should receive a unique GUID, vs.
what textual elements represent attributes of a usage (~concept) instance.
No definition is perfectly unambiguous in all cases, but I think it's
important that the broader community adopt a SINGLE definition of what a
Name unit is. Having separate systems for Botany vs. Zoology vs. whatever
would, I think, go a very long way toward defeating the purpose of
establishing taxon name GUIDs in the first place.

Now on to the specific questions:

> Is your data organised using taxon names or to taxon concepts?

I use Taxon concepts as the core unit, with only one series of ID #s (32-bit
integers). Name IDs are derived from a defined subset of Concept IDs (the
original description usage instance for each name). For a full explanation,
see: www.phyloinformatics.org/pdf/1.pdf

Note: I would NOT recommend this approach (name IDs derived from a subset of
concept IDs) for GUIDs.  It works WONDERFULLY and elegantly for my Taxonomer
application, where ID numbers are always passed in context.  But for
universally accessed GUIDs, there may be ambiguity whether ID#12345
references the concept asserted within the original description of a name,
or just the concept-less name object.
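
A minimal sketch of that ambiguity, in Python. This is not the actual
Taxonomer schema; the record layout, the field names, and the placeholder
names "Aus bus" / "Xus bus" are all hypothetical. The kind-tagged identifier
at the end is just one possible way to remove the ambiguity, not something
proposed above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Usage:
    """One Name-usage instance (~concept); a single integer ID series."""
    usage_id: int
    namestring: str
    reference: str                            # placeholder, not a real citation
    original_usage_id: Optional[int] = None   # None: this IS the original description

def name_id(usage: Usage) -> int:
    # The Name ID is the ID of the original-description usage record.
    if usage.original_usage_id is None:
        return usage.usage_id
    return usage.original_usage_id

original = Usage(12345, "Aus bus", "hypothetical original description")
later = Usage(20001, "Xus bus", "hypothetical later treatment",
              original_usage_id=12345)

# Out of context, the bare integer 12345 is both the concept asserted in the
# original description and the concept-less name.
concept_reference = original.usage_id
name_reference = name_id(later)
ambiguous = concept_reference == name_reference

@dataclass(frozen=True)
class TypedId:
    """Assumed remedy: carry the kind of object along with the number."""
    kind: str    # "name" or "concept"
    value: int

name_guid = TypedId("name", 12345)
concept_guid = TypedId("concept", 12345)
print(ambiguous, name_guid == concept_guid)
```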

> Do you assign any reusable identifiers to taxon names or concepts
> (i.e. identifiers used in more than one database)?

I guess it depends on what you mean by "one database".  I think the best
answer to your question for the "databases" I manage is "yes".

> If so, what is the process in assigning new identifiers for additional
> taxa and for accommodating taxonomic change?

New names & concepts are created from multiple sources, and identifiers are
assigned automatically within a single, common taxon data table accessed by
all sources via the network.  Because records represent Name-usage
instances, they never need to change (except for correcting data
entry/transcription errors). Changing taxonomies are documented
automatically simply by virtue of the fact that each usage is treated as a
separate record, so the data table creates a history of alternate usages
over time.  A single internal "current use" taxonomy is established by
selecting a single usage record for each "Name" (sensu zoological
perspective), representing the specific usage that we feel got it "right".
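
As an illustration only, the usage-history idea might be sketched like this
in Python. The records, years, name units, and function names are invented
placeholders ("Aus", "Xus", "bus", "cus"), not data or code from our system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UsageRecord:
    usage_id: int
    name_unit: str     # the "Name" sensu zoological perspective
    namestring: str    # how the name was rendered in this usage
    year: int

# Records are only ever appended; an existing usage is never rewritten.
USAGES = [
    UsageRecord(1, "bus", "Aus bus", 1900),
    UsageRecord(2, "cus", "Aus cus", 1920),
    UsageRecord(3, "bus", "Xus bus", 1950),
    UsageRecord(4, "bus", "Aus cus bus", 1990),
]

# The "current use" taxonomy: one selected usage record per Name unit.
CURRENT_USE = {"bus": 4, "cus": 2}

def history(name_unit):
    """All usages of one Name unit, oldest first."""
    matches = [u for u in USAGES if u.name_unit == name_unit]
    return sorted(matches, key=lambda u: u.year)

def current_namestring(name_unit):
    """The namestring of the usage selected as 'current' for this Name unit."""
    chosen_id = CURRENT_USE[name_unit]
    by_id = {u.usage_id: u for u in USAGES}
    return by_id[chosen_id].namestring

print([u.namestring for u in history("bus")])
print(current_namestring("bus"))
```

Changing the preferred taxonomy means pointing CURRENT_USE at a different
usage record; the history itself is untouched.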

> Where are these identifiers used (other organizations,
> databases, data exchange, recording forms, etc.)?

At this moment, they are used only internally within our institution. Soon,
they will be shared among partners of the Pacific Basin Information Node
(PBIN) -- part of the U.S. National Biological Information Infrastructure
(NBII).

> Do you use identifiers from any external classification
> within your database?

Not sure what this means, exactly, but we do cross-map our IDs to other IDs
(e.g., ITIS TSNs, Catalog of Fishes ID numbers, etc.). And the nature of our
data structure (tracking usage instances) automatically keeps track of
multiple classifications.

> Would there be any social or technical roadblocks to
> replacing these identifiers with a single identifier
> that was guaranteed to be unique?

Not really -- depending on how a Name "unit" is scoped (as per my discussion
above).

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html