Re: Topic 3: GUIDs for Taxon Names and Taxon Concepts
Thank you, Donald, for starting the discussion thread for which I have been waiting (not so) patiently for a very long time, and in a context that might, for the first time, lead to some meaningful resolution. Of course, I wholeheartedly endorse your approach to distinguishing "names" from "concepts" as different informational objects, and also support the basic notion that a "Concept Object" can be conveniently and reliably represented as a combination of a "Name" and some sort of documented usage of the Name (usually in the form of a publication).
Before I provide my own answers to your specific questions, though, I want to underscore what I feel is a fundamentally important issue that needs to be addressed early on in any serious discussion of GUIDs for taxonomic names. There is no broad agreement on what a unit "Name" really is, or should be. Consider the following list:
1. Pomacanthidae
2. Pomacanthinae
3. Centropyge
4. Xiphypops
5. Centropyge (Xiphypops)
6. Centropyge flavicaudus
7. Centropyge flavicauda
8. Xiphypops flavicaudus
9. Centropyge (Xiphypops) flavicauda
10. Centropyge fisheri
11. Centropyge fisheri flavicauda
12. Centropyge (Xiphypops) fisheri flavicauda
How many Name-GUIDs would be needed for the above list? From one perspective there would be twelve GUIDs -- one for each "namestring". In ITIS, there would be ten TSNs (#9 would not receive a separate TSN from #7, nor would #12 receive a separate TSN from #11). From the botanical perspective (imagining these as botanical names), there would be at least seven (#6 & #7 would be spelling variants of the same "name", and I don't believe that #9 and #12 would be treated as different "names" from #7 and #11, respectively), and perhaps eight (not sure if #1 & #2 would be the same or different "names", the former being at rank Family, and the latter Subfamily). From the zoological perspective, there may be only five: [1+2], [3], [4+5], [6+7+8+9+11+12], [10] (the various flavors of each "Name" unit would be considered attributes of the usage -- i.e., tied to the Concept object).
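To make the counting concrete, here is a rough sketch of the three perspectives that are fully spelled out above (namestring, ITIS, zoological), each expressed as a partition of the twelve list numbers. The dictionary and function names are my own invention for illustration, and the botanical perspective is left out because its grouping is not fully pinned down above.

```python
# The twelve namestrings, keyed by their number in the list above.
NAMESTRINGS = {
    1: "Pomacanthidae",
    2: "Pomacanthinae",
    3: "Centropyge",
    4: "Xiphypops",
    5: "Centropyge (Xiphypops)",
    6: "Centropyge flavicaudus",
    7: "Centropyge flavicauda",
    8: "Xiphypops flavicaudus",
    9: "Centropyge (Xiphypops) flavicauda",
    10: "Centropyge fisheri",
    11: "Centropyge fisheri flavicauda",
    12: "Centropyge (Xiphypops) fisheri flavicauda",
}

# Each perspective partitions the same twelve strings into "Name" units.
# Groupings follow the discussion above; the keys are hypothetical labels.
PERSPECTIVES = {
    # One unit per distinct string.
    "namestring": [[n] for n in NAMESTRINGS],
    # #9 shares a TSN with #7, and #12 shares a TSN with #11.
    "itis": [[1], [2], [3], [4], [5], [6], [7, 9], [8], [10], [11, 12]],
    # [1+2], [3], [4+5], [6+7+8+9+11+12], [10]
    "zoological": [[1, 2], [3], [4, 5], [6, 7, 8, 9, 11, 12], [10]],
}


def count_units(groups):
    """Return the number of Name units, after checking that the groups
    cover every namestring exactly once."""
    members = sorted(n for group in groups for n in group)
    if members != sorted(NAMESTRINGS):
        raise ValueError("groups must cover each namestring exactly once")
    return len(groups)


for label, groups in PERSPECTIVES.items():
    print(label, count_units(groups))
```

The same twelve strings yield twelve, ten, or five identifiers depending only on which partition is chosen, which is the reason the unit needs to be defined before any GUIDs are issued.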
Before a GUID system can be implemented for taxon names, there needs to be a clear definition of what "unit" of name should receive a unique GUID, vs. what textual elements represent attributes of a usage (~concept) instance. No definition is perfectly unambiguous in all cases, but I think it's important that the broader community adopt a SINGLE definition of what a Name unit is. Having separate systems for Botany vs. Zoology vs. whatever would, I think, go a very long way toward defeating the purpose of establishing taxon name GUIDs in the first place.
Now on to the specific questions:
Is your data organised using taxon names or taxon concepts?
I use Taxon concepts as the core unit, with only one series of ID #s (32-bit integers). Name IDs are derived from a defined subset of Concept IDs (the original description usage instance for each name). For a full explanation, see: www.phyloinformatics.org/pdf/1.pdf
Note: I would NOT recommend this approach (name IDs derived from a subset of concept IDs) for GUIDs. It works WONDERFULLY and elegantly for my Taxonomer application, where ID numbers are always passed in context. But for universally accessed GUIDs, there may be ambiguity as to whether ID#12345 references the concept asserted within the original description of a name, or just the concept-less name object.
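A minimal sketch of that ambiguity, assuming a much simplified record layout (this is not the actual Taxonomer schema; the class, field names, record contents, and source labels are all placeholders invented for illustration):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UsageInstance:
    """One name-usage (concept) record. Hypothetical layout."""
    record_id: int                     # single ID series for all records
    namestring: str
    source: str                        # placeholder for the documented usage
    original_id: Optional[int] = None  # None means: this IS the original description

    @property
    def name_id(self) -> int:
        # The Name ID is simply the ID of the original-description record.
        return self.record_id if self.original_id is None else self.original_id


# Illustrative records only; which string is the original usage is assumed.
RECORDS = {
    r.record_id: r
    for r in [
        UsageInstance(12345, "Centropyge flavicauda",
                      "original description (placeholder)"),
        UsageInstance(12346, "Centropyge fisheri flavicauda",
                      "later publication (placeholder)", original_id=12345),
    ]
}


def resolve(kind: str, record_id: int) -> str:
    """Resolve an ID only when the caller states what kind of object it means."""
    record = RECORDS[record_id]
    if kind == "concept":
        return f"{record.namestring} sensu {record.source}"
    if kind == "name":
        if record.name_id != record_id:
            raise ValueError(f"{record_id} is not an original-description record")
        return f"name unit {record_id} (no concept implied)"
    raise ValueError(f"unknown kind: {kind!r}")


print(resolve("concept", 12345))
print(resolve("name", 12345))
```

Within one application the "kind" is supplied by context, so a bare integer is enough. A bare 12345 handed around as a universal GUID carries no such context, and the two calls above return different objects for the same number.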
Do you assign any reusable identifiers to taxon names or concepts (i.e. identifiers used in more than one database)?
I guess it depends on what you mean by "one database". I think the best answer to your question for the "databases" I manage is "yes".
If so, what is the process in assigning new identifiers for additional taxa and for accommodating taxonomic change?
New names & concepts are created from multiple sources, and identifiers are assigned automatically within a single, common taxon data table accessed by all sources via the network. Because records represent Name-usage instances, they never need to change (except for correcting data entry/transcription errors). Changing taxonomies are documented automatically simply by virtue of the fact that each usage is treated as a separate record, so the data table creates a history of alternate usages over time. A single internal "current use" taxonomy is established by selecting a single usage record for each "Name" (sensu zoological perspective), representing the specific usage that we feel got it "right".
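The process just described could be sketched roughly as follows. This is an illustration under simplifying assumptions, not our actual table design: the class, its method names, and the source labels are hypothetical, and the choice of "current" usage in the example is arbitrary.

```python
class UsageTable:
    """Append-only table of name-usage records with one ID series."""

    def __init__(self):
        self._rows = {}     # usage_id -> (name_id, namestring, source)
        self._current = {}  # name_id -> usage_id chosen as "current use"
        self._next_id = 1

    def add_usage(self, namestring, source, name_id=None):
        """Append a usage record; IDs are assigned automatically.
        With name_id=None the record is treated as an original description
        and its own ID doubles as the Name ID."""
        usage_id = self._next_id
        self._next_id += 1
        if name_id is None:
            name_id = usage_id
        elif name_id not in self._rows or self._rows[name_id][0] != name_id:
            raise ValueError("name_id must point at an original-description record")
        self._rows[usage_id] = (name_id, namestring, source)
        return usage_id

    def history(self, name_id):
        """All usages of one Name, in order of entry."""
        return [(uid, row[1], row[2])
                for uid, row in sorted(self._rows.items())
                if row[0] == name_id]

    def set_current(self, usage_id):
        """Pick the single usage that defines the current taxonomy for its Name."""
        self._current[self._rows[usage_id][0]] = usage_id

    def current_taxonomy(self):
        return {nid: self._rows[uid][1] for nid, uid in self._current.items()}


table = UsageTable()
first = table.add_usage("Centropyge flavicaudus", "source A (placeholder)")
later = table.add_usage("Centropyge fisheri flavicauda",
                        "source B (placeholder)", name_id=first)
table.set_current(later)
print(table.history(first))
print(table.current_taxonomy())
```

Changing the accepted taxonomy here means pointing `set_current` at a different existing record (or appending a new one); no existing usage record is edited, so the history of alternate usages stays intact.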
Where are these identifiers used (other organizations, databases, data exchange, recording forms, etc.)?
At this moment, they are used only internally within our institution. Soon, they will be shared among partners of the Pacific Basin Information Node (PBIN) -- part of the U.S. National Biological Information Infrastructure (NBII).
Do you use identifiers from any external classification within your database?
Not sure what this means, exactly, but we do cross-map our IDs to other IDs (e.g., ITIS TSNs, Catalog of Fishes ID numbers, etc.). And the nature of our data structure (tracking usage instances) automatically keeps track of multiple classifications.
Would there be any social or technical roadblocks to replacing these identifiers with a single identifier that was guaranteed to be unique?
Not really -- depending on how a Name "unit" is scoped (as per my discussion above).
Aloha, Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html