[tdwg-content] DwC taxonomic terms
pete.devries at gmail.com
Fri Sep 4 04:31:34 CEST 2009
Your proposed plan does not actually give researchers what they need for
large scale analysis.
They need to know what is "meant" by a particular identifier.
In the mosquito community we have a split over whether to adopt
*Ochlerotatus* as a genus.
For some this changed *Aedes triseriatus* to *Ochlerotatus triseriatus*;
others refuse to adopt the new name.
Are these the same or different things?
Under your scheme they are different things because the idea that an entity
is a species is merged
with the particular taxonomic placement of that entity.
How does your proposal solve this?
What is needed is a linked data identifier that resolves to data that help
determine those instances of
*Aedes triseriatus* and *Ochlerotatus triseriatus* that are the same, and
those instances that are different.
In reference to the earlier discussion on separating identifiers from
resolution, how will a user determine
if occurrences tagged with the *Aedes triseriatus* UUID or LSID and those
tagged with the *Ochlerotatus*
*triseriatus* LSID are referring to the same species?
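A minimal sketch, in Python, of the kind of lookup being asked for here. Everything in it is hypothetical: the identifiers, the CONCEPT_OF mapping, and the same_species helper are invented for illustration and are not real LSIDs, UUIDs, or DwC terms.

```python
from typing import Optional

# Hypothetical name identifiers mapped to a hypothetical species-concept
# identifier. None of these are real LSIDs or URIs.
CONCEPT_OF = {
    "name:Aedes_triseriatus": "concept:triseriatus",
    "name:Ochlerotatus_triseriatus": "concept:triseriatus",
    "name:Aedes_aegypti": "concept:aegypti",
}


def same_species(name_id_a: str, name_id_b: str) -> Optional[bool]:
    """Return True/False when both names resolve to a concept, else None."""
    concept_a = CONCEPT_OF.get(name_id_a)
    concept_b = CONCEPT_OF.get(name_id_b)
    if concept_a is None or concept_b is None:
        # With only a name string and no concept link, the question
        # cannot be answered at all.
        return None
    return concept_a == concept_b
```

With a mapping like this, the two *triseriatus* names compare as the same species even though their name identifiers differ, while a name with no concept link yields no answer.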
The proposed solution leaves users with just a name and no clear way of
determining what the person identifying
the specimen actually meant. The original species description is amazingly
Most non-taxonomists don't care that much about what particular genus
something is in. They care that
the specimens they collected with malaria parasites are linked to other
specimens of the same species.
At those times when they do care, they want a quick way to look up the current name
(*i.e. phylogenetic hypothesis*)
that can remain linked to their data.
If you leave in the TaxonConceptID, then users have a choice of filling it
in or ignoring it. For those that would
like to use something like this, it will dramatically improve data
integration and move disagreements about
name changes into the background. That change, I think, would improve the
relationship between taxonomists
and other biological scientists.
There were a number of other issues in previous emails that suggested that
the taxonomic community
has chosen to rehash informatics issues that have already been thoroughly
discussed by the scientific
informatics community. What is somewhat alarming is that they seem to have
come to completely
Also the thread on "trust" seemed particularly misinformed. If the writer
intended to imply that by going to
the current GBIF site they can "trust" the data, they are wrong. I see no
mechanism on the GBIF home
page that allows me to determine that this is the "real" GBIF site.
This is not meant to disparage GBIF, but to clarify the discussion. In fact
the person who seems to be
the most concerned with "trust" does not have any way to authenticate that
his highly touted resolution
service is the "real" one.
I suspect that the "trust" issue was either particularly uninformed or a
smoke screen for a different issue
which may be about data and services from cronies vs. data and services from
If you don't trust a particular provider, you can just remove those URIs
from your data store by filtering by
"context" or reification.
Pete DeVries <http://spiders.entomology.wisc.edu/pjd/index.html>
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries at wisc.edu
GeoSpecies Knowledge Base <http://species.geospecies.org/>
About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
On Wed, Sep 2, 2009 at 2:53 PM, Richard Pyle <deepreef at bishopmuseum.org> wrote:
> Greetings (again)...
> With a slightly more rested brain, I'll provide some more specific feedback
> on the DwC Taxonomy terms. I'll use John's Aug 25 proposed list of terms &
> definitions as a starting point.
> (Tim -- go get a cup of coffee before continuing....)
> > taxonID: An identifier for a specific taxon-related name usage (a
> > Taxon record). May be a global unique identifier or an identifier
> > specific to the data set.
> As I said in my previous post, I worry that "taxon" is too familiar, and has
> too many meanings, such that, without reviewing the definition, people may
> jump to the wrong conclusion about what sort of data object should be
> resolved through this ID. As klunky as it is, I feel it better to be
> unambiguous and use something like "taxonNameUsageID". This is the term GNUB
> has adopted; and while GNUB is still in early draft form, it took literally
> decades of deliberation to finally arrive at that term. If GNA & GNUB gain
> the traction that many of us are hoping they will, I believe that the term
> "TaxonNameUsage" will become much more familiar to managers of taxonomic
> data in the future. Thus, I would propose:
> taxonNameUsageID: An identifier for a specific taxon-related name usage
> instance (a particular name as it is used within the context of a
> publication or other documentation source). May be a global unique
> identifier or an identifier specific to the data set.
> > acceptedTaxonID: A unique identifier for the acceptedTaxon.
> I'm not exactly sure what this is supposed to represent, but I gather that
> it is used in cases where the taxon name for this record is not regarded as
> the accepted taxon name. Stan wrote:
> > In the context of an identification, yes, a taxon is asserted
> > to be valid/accepted by the identifier (at the time), but not
> > all identifications are accepted by the data manager, so that
> > last statement isn't always true. Also not all taxa are
> > accepted/valid within a classification (if it includes
> > synonymous taxa).
> If this is the purpose for the "acceptedTaxonID" (and I agree it's
> to represent this), then I think we need to be more explicit about what is
> meant by accepted. For example, consider these three different meanings
> (I'll use the terms provided by John, rather than my recommended terms):
> 1. Accepted in the sense of name orthography
> A specimen was identified as "Centropyge loricula", so the TaxonID resolves
> to this name. The data manager knows that the correct orthography is
> "Centropyge loriculus", so acceptedTaxonID resolves to that name.
> 2. Accepted in the sense of subjective synonymy
> A specimen was identified as "Centropyge flammeus", so the TaxonID resolves
> to this name. The data manager follows modern literature in treating this
> name as a junior synonym of C. loriculus, so acceptedTaxonID resolves to
> "Centropyge loriculus".
> 3. Accepted in the sense of Concept Circumscription
> A specimen was identified as "Centropyge loriculus" and the TaxonID resolves
> to the usage instance of "Centropyge loriculus Günther 1874 sec Woods &
> Schultz 1953", but the data manager feels this is not the most appropriate
> circumscription for the taxon represented by the specimens, so
> acceptedTaxonID resolves to the usage instance of "Centropyge loriculus
> Günther 1874 sec Allen 1975".
> In my mind, all three of these would be appropriate use cases for
> acceptedTaxonID; but I suspect some people would not regard #3 as
> appropriate. As long as taxonID and acceptedTaxonID both point to Usage
> instances, it doesn't really matter, because a resolved Usage Instance
> record will provide the full set of metadata to do whatever comparison
> (orthography/synonymy/circumscription) the consumer of the record wishes to
> do. However, I do think the definition of the term should address these
> different possible resolutions of meaning.
> The draft GNUB structure (which I can send to anyone who is interested) has
> a field called "ValidUsageID", which is a recursive foreign key to the same
> or a different Usage Instance, and is used explicitly for synonymy
> (#2 in the above list). Best to explain by example:
> Each row below represents a Taxon Name Usage Instance, and "VUID" refers to
> the ValidUsageID:
> TNUID  Reference             VUID  FullName
> 1      Günther 1874          1     Centropyge loriculus
> 2      Woods & Schultz 1953  2     Centropyge flammeus
> 3      Allen 1975            3     Centropyge loriculus
> 4      Allen 1975            3     Centropyge flammeus
> For the first three records, TNUID=VUID. This means that each of those
> publications treated each of those names as a valid species. By contrast,
> TNUID 4 has VUID 3 (i.e., TNUID<>VUID), which means that Allen 1975 treated
> the name "Centropyge flammeus" as a junior synonym of "Centropyge
> loriculus". Note that in the GNUB data model, the VUID link must point to a
> TNUID within the same Reference. For example, in row #4, VUID=3; not 1. In
> simplest terms, row #4 translates to "Allen 1975 regarded Centropyge
> flammeus as a junior synonym of Centropyge loriculus." In other words, this
> relationship applies specifically to use-case #2 in the list above.
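The four rows above can be modeled as a small self-referencing table. The sketch below is only an illustration in Python: the Usage class and the describe helper are hypothetical names and are not taken from the draft GNUB schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    tnuid: int        # Taxon Name Usage Instance ID
    reference: str    # publication in which the name is used
    vuid: int         # ValidUsageID: recursive key to a Usage
    full_name: str


# The four example rows from the message.
USAGES = [
    Usage(1, "Günther 1874", 1, "Centropyge loriculus"),
    Usage(2, "Woods & Schultz 1953", 2, "Centropyge flammeus"),
    Usage(3, "Allen 1975", 3, "Centropyge loriculus"),
    Usage(4, "Allen 1975", 3, "Centropyge flammeus"),
]
BY_ID = {u.tnuid: u for u in USAGES}


def describe(usage: Usage) -> str:
    """Spell out what a row says about validity or synonymy."""
    if usage.tnuid == usage.vuid:
        return f"{usage.reference} treated {usage.full_name} as a valid species"
    valid = BY_ID[usage.vuid]
    return (f"{usage.reference} regarded {usage.full_name} "
            f"as a junior synonym of {valid.full_name}")
```

Calling describe(BY_ID[4]) yields the same sentence as the plain-language reading of row #4, and the row it points to (TNUID 3) shares its Reference.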
> As for the term itself, my recommendation would depend on which of the
> use-case examples listed above the term "acceptedTaxonID" is intended to
> represent. If it is really only meant for Use-case #2 (synonymy), then I
> would recommend following GNUB with "validUsageID". However, I think it's
> probably best to leave the scope of meaning of the term open to any of these
> use-cases, in which case I would recommend the term "acceptedUsageID". But
> in either case, I think the definition needs to be more explicit.
> > higherTaxonID: A unique identifier for the taxon that is the parent of
> > the scientificName.
> Again, why not be explicit? Following the "taxon" root-stem approach, this
> should probably be "parentTaxonID". In the GNUB data model, the field used
> for this exact same purpose is "ParentUsageID". So, accordingly, my
> recommendation for the DwC term would be "parentUsageID".
> > originalTaxonID: A unique identifier for the basionym (botany),
> > basonym (bacteriology), or replacement of the scientificName.
> I wrestled with this term a lot when developing the Taxonomer data model,
> and launched several threads on Taxacom about it, and discussed it
> extensively with many database nerds and taxonomy nerds of all Code
> flavors. "Protologue" was the closest existing term to what this term is
> intended for, but the problem with "Protologue" (a term familiar to
> botanical taxonomists) is that it may be spread across more than one
> publication. As I understand it, it's the set of Usage Instances that
> collectively fulfill the criteria for a name being validly published. I
> finally decided on the term "Protonym". Although I later discovered that
> this word had been defined in a different way in the context of fungi
> taxonomy, I was assured by Paul Kirk (curator of Index Fungorum) that my use
> of the term should take precedence. Consequently, the term we use in GNUB
> (Paul is one of the original architects of GNUB) is "ProtonymID".
> I'm not necessarily pushing for DwC to adopt this term; however, I am
> reasonably confident that GNUB will retain it, and depending on the future
> success of GNUB, it may end up becoming solidified in our community. As
> such, I think "protonymID" is the best term to use for DwC. However, if
> this is not acceptable, then I would suggest "originalUsageID" as a more
> explicit alternative.
> > scientificName: The taxon name (with date and authorship information
> > if applicable). When forming part of an Identification, this should be
> > the name in the lowest level taxonomic rank that can be determined.
> > This term should not contain Identification qualifications, which
> > should instead be supplied in the IdentificationQualifier term.
> This is probably fine, but it sort of depends on where DwC settles on the
> definition of "acceptedTaxon(ID)/acceptedUsage(ID)". If the scope includes
> orthographic variants, then the definition of scientificName should be
> expanded to explicitly refer to "exact orthography" (which may or may not
> match the orthography represented by acceptedXXX). In GNUB, each usage has
> a field called "VerbatimNameString", which is intended to capture the exact
> string of characters (as best as can be represented via UTF-8) that appears
> in the publication/reference. However, I don't think this is necessary for
> DwC. But I do think the definition of scientificName should make comment
> > acceptedTaxon: The currently valid (zoological) or accepted
> > (botanical) name for the scientificName.
> This definition suggests that this term applies only to my use-case #2
> (synonymies). As described earlier, in GNUB (which was initially developed
> by two botanists and one zoologist), the term "valid" was used instead of
> "accepted". Either one will do, but I think it makes sense to follow GNUB.
> In any case, I would propose the following:
> If the intent is only for taxonomic synonymies (use-case 2), then go with
> "validUsage" to be consistent with GNUB, and recommend that a full
> usage-instance string ("Centropyge loriculus Günther 1874 sec Allen 1975")
> be provided, if available.
> If the intent is less specific, and is open to
> orthographic/synonym/circumscription relationships, then go with
> "acceptedUsage" (with the same full usage-instance string)
> > higherTaxon: The taxon that is the parent of the scientificName.
> Again, I would go with "parentUsage", and recommend the full usage-instance
> > originalTaxon: The basionym (botany), basonym (bacteriology), or
> > replacement of the scientificName.
> As per above, I would go with "protonym" (which need only be a name-string,
> such as "Centropyge loriculus Günther 1874"); but if not protonym, then
> > higherClassification: A list (concatenated and separated) of the names
> > for the taxonomic ranks less specific than that given in the
> > scientificName.
> I'm fine with this.
> > kingdom, phylum, class, order, family, genus, subgenus,
> > specificEpithet, infraspecificEpithet - all unchanged.
> Fine by me.
> > taxonRank: The taxonomic rank of the scientificName. Recommended best
> > practice is to use a controlled vocabulary.
> Fine by me.
> > verbatimTaxonRank: The verbatim original taxonomic rank of the
> I think this is OK, but I'm not entirely sure how strictly the term
> "verbatim" is applied. For example, should this be verbatim as it appears
> on the specimen label or original database record (e.g., "f." if it says
> "f."; "forma" if it says "forma", etc.)? Or does it just mean the
> "interpreted" rank (i.e., convert "f." to "forma")? My inclination is the
> former; but for most names (i.e., those without explicit rank qualifiers
> embedded within the name-string), this would be blank. For example, all
> species and higher ranks would be blank, because nobody explicitly writes
> "species" when listing a species name. To a zoologist, a subspecies name
> looks like "Centropyge loriculus flammeus", but to a botanist it looks like
> "Centropyge loriculus subsp. flammeus". Sensu stricto, the use of the word
> "verbatim" would imply that the zoologist would leave this item empty, but
> the botanist would enter "subsp." Do I interpret this correctly? Or (as I
> suspect), do I misunderstand the purpose of this item?
> > scientificNameAuthorship, nomenclaturalCode - unchanged
> Fine by me.
> > taxonPublicationID: A unique identifier for the publication of the Taxon.
> Presumably this would be the publication to which the specific usage
> instance for taxonID/taxonNameUsageID is anchored. If so, then I think the
> definition needs to be expanded. As written, some people might interpret
> the publication as always being the original publication (i.e., the "Günther
> 1874" of "Centropyge loriculus Günther 1874 sec Allen 1975"). Others might
> (more correctly, in my view) interpret it as the concept definition
> publication (i.e., the "Allen 1975" of "Centropyge loriculus Günther 1874
> sec Allen 1975").
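The two readings differ only in which half of the usage-instance string they pick. A minimal sketch in Python, assuming the string always uses a literal " sec " separator as in the examples here; the split_usage_string helper is hypothetical.

```python
from typing import Optional, Tuple


def split_usage_string(usage: str) -> Tuple[str, Optional[str]]:
    """Split 'Name Author Year sec Reference' at the first ' sec '.

    Returns (name_with_original_authorship, sec_reference); the second
    element is None when the string carries no 'sec' part.
    """
    name_part, separator, sec_part = usage.partition(" sec ")
    if not separator:
        return usage.strip(), None
    return name_part.strip(), sec_part.strip()


name, sec = split_usage_string(
    "Centropyge loriculus Günther 1874 sec Allen 1975"
)
```

Here name carries the original authorship ("Günther 1874") and sec carries the concept-defining reference ("Allen 1975"), which is why the definition needs to say which one taxonPublicationID points to.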
> > taxonPublication: A reference for the publication of the Taxon.
> Same comment as above.
> > taxonomicStatus, nomenclaturalStatus, taxonAccordingTo, taxonRemarks,
> > vernacularName - unchanged.
> I'm fine with all of these except possibly taxonAccordingTo, which I need
> to think about some more.
> Sorry for the long post -- I'm just making up for having not been part of
> this discussion earlier. I am more than happy to help draft revised
> definitions for all of these terms, but only after we resolve their
> scope & meaning.
> By the way, where do I find the current draft definitions for all these
> terms? When I go to http://code.google.com/p/darwincore/wiki/Taxon, I only
> see definitions for three of the terms.
> Richard L. Pyle, PhD
> Database Coordinator for Natural Sciences
> and Associate Zoologist in Ichthyology
> Department of Natural Sciences, Bishop Museum
> 1525 Bernice St., Honolulu, HI 96817
> Ph: (808)848-4115, Fax: (808)847-8252
> email: deepreef at bishopmuseum.org