[tdwg-content] Producing a global taxon register (was:ITIS TSNID to uBio NamebankIDs mapping)

Sun Jun 5 21:35:37 CEST 2011

One of the complicating factors in producing a GTR is the human, real-world element.  Suppose all the correct name-taxon relationships generated by taxonomists could be modeled to the satisfaction of every use case out there. That seems nigh impossible, since list traffic of the last few months suggests that a few use cases are orthogonal. Even then, the only facts that can be employed in a digital/online repository are digital ones. Mental and print-only concepts are out of reach for computers. The vast majority of taxonomic facts are still in paper form, even after all the work of BHL. And the digitized facts are practically never comprehensive: they frequently leave out the really useful details for a taxon concept, and those suboptimal name-taxon facts then have the inevitable human errors randomly mixed in.

I think we need techniques and processes that work with the real name and taxon data that are available, with all their imperfections and gaps, to arrive at a "best guess" about taxon relationships, rather than expecting ideal data to be available to populate an ideal GTR. We need better algorithms, models, and services that work with fuzzy concepts, probabilities, quality indices, and possibly sensitivity analyses to deal with missing, inaccurate, and conflicting name and taxon data sources.
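
To make the "best guess" idea concrete, here is a minimal sketch in Python, not a proposal for any actual service. Everything in it is an assumption made for illustration: the placeholder names ("Aus bus" and so on), the per-source quality index between 0 and 1, the 0.85 similarity cutoff, and the simple weighted vote. It tolerates misspelled name strings through fuzzy matching, weights each source's assertion by its quality index, and reports the winning accepted name with the share of the weighted vote it received.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a, b):
    """Fuzzy string similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_guess(query, assertions, min_similarity=0.85):
    """Return (accepted_name, confidence) for a query name string.

    assertions: iterable of (name_string, asserted_accepted_name, quality),
    where quality is a hypothetical 0..1 quality index for the source.
    Confidence is the winner's share of the total weighted vote.
    """
    scores = defaultdict(float)
    for name, accepted, quality in assertions:
        sim = similarity(query, name)
        if sim >= min_similarity:
            # A close but imperfect string match counts for less.
            scores[accepted] += sim * quality
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # nothing usable: report that, don't guess
    winner = max(scores, key=scores.get)
    return winner, scores[winner] / total


# Hypothetical, conflicting, partly misspelled source data.
ASSERTIONS = [
    ("Aus bus", "Aus bus", 0.9),   # high-quality source
    ("Aus bus", "Xus bus", 0.6),   # conflicting opinion
    ("Aus buss", "Aus bus", 0.5),  # misspelled record, weaker source
    ("Cus dus", "Cus dus", 0.9),   # unrelated name, filtered out
]

if __name__ == "__main__":
    print(best_guess("Aus bus", ASSERTIONS))
```

A sensitivity analysis in the same spirit would rerun best_guess while varying the quality indices or the similarity cutoff, to see whether the winning name changes.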

Chuck

Chuck Miller
VP-IT & CIO
Missouri Botanical Garden
4344 Shaw Boulevard
Saint Louis, MO 63110 USA

On Jun 5, 2011, at 12:14 PM, "Richard Pyle" <deepreef at bishopmuseum.org> wrote:

>> * a GTR - global taxon register - is something else entirely, at least
>> if the term is taken literally. It would be indispensable if the purpose
>> "to index all usages of all names in all sources" is to be realized.
> 
> Yes, that would be nice.  But as Tony indicated, that would be impractical
> for the foreseeable future.  Especially when you consider that "all sources"
> encompasses not only "all publications" (including popular books and
> magazine articles, newspaper articles, etc., etc.), but also all unpublished
> sources (museum specimen labels, field notebooks, personal correspondence,
> etc., etc.).  The GNUB model is designed to accommodate any & all of these,
> but a proactive attempt to populate it to that extent would represent an
> unrealistic amount of effort.
> 
> However, an enormous benefit would be achieved if a select subset of "all
> usages of all names in all sources" was targeted.  For example, the first
> priority for populating GNUB will be:
> 
>> a complete nomenclatural index
>> (inventorying all nomenclatural acts), 
> 
> And the next step would be:
> 
>> moving towards
>> lists of currently accepted names 
> 
> That is, capturing the specific usage instances for each that reflect a
> modern taxonomic landscape.  Of course, there is more than one
> interpretation of the "modern taxonomic landscape" (i.e., different opinions
> about how to structure the HCAL). Therefore, you need a spectrum of modern
> usage instances to capture all of the popular HCAL perspectives.
> 
>> Names and taxa are quite different things and they are interconnected
>> in a complex way.
> 
> I don't think that the interconnection is all that complex.  In the same way
> that nomenclature and biology intersect at the type specimen, names and taxa
> intersect at the Taxon Name Usage instance.  The analogy is reasonably good.
> A scientific name is "anchored" to the biological world through the type
> specimen. Likewise, a taxon concept is anchored to a name through a taxon
> name usage instance.  Not all taxon name usage instances rise to the level
> of an explicit or implicit taxon concept definition.  However, all taxon
> concept definitions exist in the form of a Taxon Name Usage instance.
> 
> The problem, as Tony alluded to, is that TNU instances are so abundant that
> it can be overwhelming to contemplate the TNU universe in its entirety.
> Dave Remsen referred to TNUs as the "individual molecules" of taxonomy. When
> we look at a physical object, we don't think of it in terms of an assemblage
> of individual molecules; we abstract it to the entire object.  This is why
> we have so many databases that focus on the HCAL -- it's much more direct to
> capture the entire object (in this case, taxon concept), than to enumerate
> all of the molecules that comprise it.  
> 
> But unlike physical objects and their constituent molecules, there are
> "special" TNUs that stand out from all the rest.  Capturing a few of these
> "special" TNUs will allow us to get most of the benefit in representing the
> parts of the taxon concept we're interested in.  As already noted, these
> "special" TNUs include all the relevant nomenclatural acts for all of the
> names that have been associated with that taxon concept, as well as the main
> concept definitions (e.g., published taxonomic treatments that may or may
> not carry nomenclatural acts with them).  In other words, unlike trying to
> describe a physical object by enumerating its individual molecules, we can
> capture the majority of our interest in taxon names and concepts by
> enumerating only a small fraction of the TNUs (i.e., the aforementioned
> "special" ones).
> 
> Aloha,
> Rich
> 
> 
> Richard L. Pyle, PhD
> Database Coordinator for Natural Sciences
> Associate Zoologist in Ichthyology
> Dive Safety Officer
> Department of Natural Sciences, Bishop Museum
> 1525 Bernice St., Honolulu, HI 96817
> Ph: (808)848-4115, Fax: (808)847-8252
> email: deepreef at bishopmuseum.org
> http://hbs.bishopmuseum.org/staff/pylerichard.html