Greetings,
I only just now subscribed to this list, and I apologize for not coming into the discussion earlier. I've just finished reading this complete thread in the archives, and I want to make a few comments while it's fresh in mind (as fresh as one's mind can be at 3:30am). I will expand (expound?) some more tomorrow.
I understand the desire to stick with familiar terms, but in the case of taxonomy, "familiar" can be a bad thing. Several critical words (e.g., "name", "concept", "taxon", etc.) mean slightly (sometimes not so slightly) different things to different people. As such, relying on the term itself to inform users of what the term represents (without referring them to the definition) can lead to disparate applications of the term in provided datasets. In my mind, it's better to use a less-familiar term that requires users to consult the definition, reducing the chance of misapplication, than to save end users the trouble of consulting the definition by using a familiar term that is defined differently by different people. A familiar term leaves too much opportunity for a person to jump to the wrong conclusion about what content is expected in association with it.
One of the biggest complaints I had about TCS 1.0 was the distinction between "name" and "concept". It was always my intent to try to suppress or eliminate this distinction in TCS 2.0; so I'm less eager than Stan is to cling to the terms as they exist in TCS 1.0.
Don't get me wrong -- I have been wrestling with data modelling of taxon names and taxon concepts since about 1990, so I am *VERY* familiar with what people mean when they distinguish names from concepts. But at an informatics level, I think Markus was absolutely right when he defined the "usage" as the most granular (and convenient) data object you can use to refer to either taxon concepts or taxon names. Our community has struggled with what to call this "thing" for a long time. Walter called it "Potential Taxon". I first started calling it "TaxonRef" (short for "Taxon Reference", based on pretty-much the same logic Dave Remsen alluded to). Then I started calling it "Assertion" (sensu: http://systbio.org/files/phyloinformatics/1.pdf). James Ytow had something similar called "Appearance"; but after years of conversations with him, we finally established that his "Appearance" was something slightly different (actually more granular). Others have called it a "Treatment" or "Taxon Treatment".
In developing GNUB, we finally settled on "TaxonNameUsage", because that was both explicit, and generic (and also wouldn't likely be confused with anything else in our field). Yes, it's cumbersome, but I think it represents the right balance of self-describing but without potentially disparate preconceived notions.
The definition is the usage or application of a taxon name within a particular documented context. "Documented context" is mostly published literature, but can also include any other forms of documentation, such as correspondence, unpublished manuscripts, single-copy documents such as field notes, specimen identification tags, etc. This is the core unit of information that GNUB will index and assign shared GUIDs to. I am absolutely convinced that it will become the standard currency for referencing taxon names and concepts.
The point that Markus was trying to make is that a TaxonNameUsage instance carries both an implied (or explicit) taxon concept circumscription, and also the nomenclatural metadata associated with how that circumscription was labelled (i.e., the "name"). This doesn't mean it's ambiguous, because it is what it is: a discrete Usage Instance. The difference is in what set of metadata is harvested from the identified Usage Instance. For example, consider:
1. Aus bus Smith 1950 sec Smith 1950
2. Aus bus Smith 1950 sec Jones 1960
3. Xus bus (Smith 1950) sec Brown 1970
We have one species epithet, and three TaxonNameUsage instances (TNUs). #1 is the original taxonomic description of the species "bus", which was originally combined with the genus "Aus". This has both a name part (Aus bus Smith 1950), and an implied taxon concept (sec Smith 1950). If I resolve any of these TNUs for nomenclatural information, I get a genus name, species epithet, an author, and a year. If I resolve them for taxon concept information, I get all the name bits plus the according to stuff ("sec" stuff); plus any other taxon concept information that is resolvable through that particular usage instance. Thus, whether you're interested in the nomenclature or the concept circumscription, you get both (explicitly) from referencing a TNU instance.
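To make the "resolve for nomenclature vs. resolve for concept" distinction concrete, here is a minimal Python sketch of the three usages above. The class, its field names, and the `nomenclature()`/`concept()` methods are hypothetical illustrations, not the GNUB schema; the point is only that one TNU record can answer both kinds of question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonNameUsage:
    """Hypothetical minimal record for one TaxonNameUsage (TNU) instance."""
    genus: str
    epithet: str
    author: str
    year: int
    sec: str                           # the "according to" reference of this usage
    original_combination: bool = True  # False -> authority shown in parentheses

    def nomenclature(self) -> dict:
        """Resolve the TNU for nomenclatural metadata only."""
        return {"genus": self.genus, "epithet": self.epithet,
                "author": self.author, "year": self.year}

    def concept(self) -> dict:
        """Resolve the TNU for concept metadata: the name bits plus 'sec'."""
        return {**self.nomenclature(), "sec": self.sec}

    def label(self) -> str:
        authority = f"{self.author} {self.year}"
        if not self.original_combination:
            authority = f"({authority})"
        return f"{self.genus} {self.epithet} {authority} sec {self.sec}"


usages = [
    TaxonNameUsage("Aus", "bus", "Smith", 1950, "Smith 1950"),
    TaxonNameUsage("Aus", "bus", "Smith", 1950, "Jones 1960"),
    TaxonNameUsage("Xus", "bus", "Smith", 1950, "Brown 1970",
                   original_combination=False),
]
for u in usages:
    print(u.label())
```

Usages #1 and #2 return identical `nomenclature()` results but different `concept()` results, which is exactly the difference in "what set of metadata is harvested" from each instance.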
I'm too tired to write any more now, but I plan to expand on this tomorrow, with specific reference to the DwC terms and in the context of GNUB.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
Greetings (again)...
With a slightly more rested brain, I'll provide some more specific feedback on the DwC Taxonomy terms. I'll use John's Aug 25 proposed list of terms & definitions as a starting point.
(Tim -- go get a cup of coffee before continuing....)
taxonID: An identifier for a specific taxon-related name usage (a Taxon record). May be a global unique identifier or an identifier specific to the data set.
As I said in my previous post, I worry that "taxon" is too familiar, and has too many meanings such that, without reviewing the definition, people may jump to the wrong conclusion about what sort of data object should be resolved through this ID. As clunky as it is, I feel it is better to be unambiguous and use something like "taxonNameUsageID". This is the term GNUB has adopted; and while GNUB is still in early draft form, it took literally decades of deliberation to finally arrive at that term. If GNA & GNUB gain the traction that many of us are hoping they will, I believe that the term "TaxonNameUsage" will become much more familiar to managers of taxonomic data in the future. Thus, I would propose:
taxonNameUsageID: An identifier for a specific taxon-related name usage instance (a particular name as it is used within the context of a particular publication or other documentation source). May be a global unique identifier or an identifier specific to the data set.
acceptedTaxonID: A unique identifier for the acceptedTaxon.
I'm not exactly sure what this is supposed to represent, but I gather that it is used in cases where the taxon name for this record is not regarded as the accepted taxon name. Stan wrote:
In the context of an identification, yes, a taxon is asserted to be valid/accepted by the identifier (at the time), but not all identifications are accepted by the data manager, so that last statement isn't always true. Also not all taxa are accepted/valid within a classification (if it includes synonymous taxa).
If this is the purpose for the "acceptedTaxonID" (and I agree it's important to represent this), then I think we need to be more explicit about what is meant by accepted. For example, consider these three different meanings (I'll use the terms provided by John, rather than my recommended terms):
1. Accepted in the sense of name orthography: A specimen was identified as "Centropyge loricula", so the TaxonID resolves to this name. The data manager knows that the correct orthography is "Centropyge loriculus", so acceptedTaxonID resolves to that name.
2. Accepted in the sense of subjective synonymy: A specimen was identified as "Centropyge flammeus", so the TaxonID resolves to this name. The data manager follows modern literature in treating this name as a junior synonym of C. loriculus, so acceptedTaxonID resolves to "Centropyge loriculus".
3. Accepted in the sense of Concept Circumscription: A specimen was identified as "Centropyge loriculus" and the TaxonID resolves to the usage instance of "Centropyge loriculus Günther 1874 sec Woods & Schultz 1953", but the data manager feels this is not the most appropriate circumscription for the taxon represented by the specimens, so acceptedTaxonID resolves to the usage instance of "Centropyge loriculus Günther 1874 sec Allen 1975".
In my mind, all three of these would be appropriate use cases for acceptedTaxonID; but I suspect some people would not regard #3 as appropriate. As long as taxonID and acceptedTaxonID both point to Usage instances, it doesn't really matter, because a resolved Usage Instance record will provide the full set of metadata to do whatever comparison (orthography/synonymy/circumscription) the consumer of the record wishes to do. However, I do think the definition of the term should address these different possible resolutions of meaning.
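As a rough illustration of how a consumer might tell the three cases apart once both IDs resolve to usage instances, here is a deliberately simplified Python heuristic. It assumes each resolved usage exposes its name string, a key for the name's original description (`protonym_id`, hypothetical), and its "sec" reference; the protonym keys below are made-up values, not real identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    name: str          # name string exactly as used
    protonym_id: int   # hypothetical key for the original description of the name
    sec: str = ""      # reference the usage is anchored to ("" if unknown)


def accepted_relationship(usage: Usage, accepted: Usage) -> str:
    """Classify how `accepted` relates to `usage` (deliberately simplified)."""
    if usage == accepted:
        return "same usage"
    if usage.protonym_id != accepted.protonym_id:
        return "synonymy"          # two different names treated as one taxon
    if usage.name != accepted.name:
        # Same name, different spelling. A new genus combination would also
        # land here and would need a further check in real data.
        return "orthography"
    return "circumscription"       # same name string, different concept (sec)


allen = Usage("Centropyge loriculus", 1, "Allen 1975")
print(accepted_relationship(Usage("Centropyge loricula", 1), allen))
print(accepted_relationship(Usage("Centropyge flammeus", 2), allen))
print(accepted_relationship(Usage("Centropyge loriculus", 1, "Woods & Schultz 1953"), allen))
```

The three calls correspond to use-cases #1, #2 and #3 and print `orthography`, `synonymy` and `circumscription` respectively.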
The draft GNUB structure (which I can send to anyone who is interested) has a field called "ValidUsageID", which is a recursive foreign key to the same or a different Usage Instance, and is used explicitly for synonym treatments (#2 in the above list). Best to explain by example:
Each row below represents a Taxon Name Usage Instance, and "VUID" refers to ValidUsageID.
TNUID  Reference             VUID  FullName
====================================================
1      Günther 1874          1     Centropyge loriculus
2      Woods & Schultz 1953  2     Centropyge flammeus
3      Allen 1975            3     Centropyge loriculus
4      Allen 1975            3     Centropyge flammeus
====================================================
For the first three records, TNUID=VUID. This means that each of those publications treated each of those names as a valid species. By contrast, TNUID 4 has VUID 3 (i.e., TNUID<>VUID), which means that Allen 1975 treated the name "Centropyge flammeus" as a junior synonym of "Centropyge loriculus". Note that in the GNUB data model, the VUID link must point to a TNUID within the same Reference. For example, in row #4, VUID=3, not 1. In simplest terms, row #4 translates to "Allen 1975 regarded Centropyge flammeus as a junior synonym of Centropyge loriculus." In other words, this relationship applies specifically to use-case #2 in the list above.
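The ValidUsageID table can be sketched in a few lines of Python. This is a hypothetical in-memory version, not GNUB code: it checks the rule that a VUID must point to a usage in the same Reference (which is why row 4 points to 3 rather than 1), and spells out every row where TNUID differs from VUID as a synonymy statement.

```python
from typing import NamedTuple


class TNU(NamedTuple):
    tnuid: int
    reference: str
    vuid: int        # ValidUsageID: recursive foreign key to a TNUID
    full_name: str


rows = [
    TNU(1, "Günther 1874", 1, "Centropyge loriculus"),
    TNU(2, "Woods & Schultz 1953", 2, "Centropyge flammeus"),
    TNU(3, "Allen 1975", 3, "Centropyge loriculus"),
    TNU(4, "Allen 1975", 3, "Centropyge flammeus"),
]


def bad_valid_usage_links(rows: list[TNU]) -> list[int]:
    """Return TNUIDs whose VUID is missing or points outside the same Reference."""
    by_id = {r.tnuid: r for r in rows}
    return [r.tnuid for r in rows
            if r.vuid not in by_id or by_id[r.vuid].reference != r.reference]


def synonymy_statements(rows: list[TNU]) -> list[str]:
    """Spell out every row where TNUID != VUID as a synonymy statement."""
    by_id = {r.tnuid: r for r in rows}
    return [f"{r.reference} regarded {r.full_name} as a junior synonym of "
            f"{by_id[r.vuid].full_name}"
            for r in rows if r.tnuid != r.vuid]


print(bad_valid_usage_links(rows))   # []
print(synonymy_statements(rows))
```

If row 4 pointed at TNUID 1 instead, `bad_valid_usage_links` would flag it, because usage 1 belongs to Günther 1874 rather than Allen 1975.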
As for the term itself, my recommendation would depend on which of the three use-case examples listed above the term "acceptedTaxonID" is intended to represent. If it is really only meant for Use-case #2 (synonymy), then I would recommend following GNUB with "validUsageID". However, I think it's probably best to leave the scope of meaning of the term open to any of these use-cases, in which case I would recommend the term "acceptedUsageID". But in either case, I think the definition needs to be more explicit.
higherTaxonID: A unique identifier for the taxon that is the parent of the scientificName.
Again, why not be explicit? Following the "taxon" root-stem approach, this should probably be "parentTaxonID". In the GNUB data model, the field used for this exact same purpose is "ParentUsageID". So, accordingly, my recommendation for the DwC term would be "parentUsageID".
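Because parentUsageID is another recursive link between usage instances, a provider could walk it upward to produce the higherClassification string. The sketch below assumes a simple in-memory dictionary of hypothetical usage records and uses " | " as the separator; both are assumptions for illustration, not anything DwC or GNUB prescribes.

```python
from typing import Optional

# Hypothetical usage records: usage_id -> (name, parent_usage_id)
USAGES: dict[int, tuple[str, Optional[int]]] = {
    10: ("Pomacanthidae", None),
    11: ("Centropyge", 10),
    12: ("Centropyge loriculus", 11),
}


def higher_classification(usage_id: int,
                          usages: dict[int, tuple[str, Optional[int]]] = USAGES,
                          sep: str = " | ") -> str:
    """Walk parentUsageID links upward and join ancestor names, highest rank first."""
    names: list[str] = []
    seen: set[int] = set()
    parent = usages[usage_id][1]
    while parent is not None:
        if parent in seen:                       # guard against a cyclic chain
            raise ValueError(f"cycle in parentUsageID chain at {parent}")
        seen.add(parent)
        name, parent = usages[parent]
        names.append(name)
    return sep.join(reversed(names))


print(higher_classification(12))   # Pomacanthidae | Centropyge
```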
originalTaxonID: A unique identifier for the basionym (botany), basonym (bacteriology), or replacement of the scientificName.
I wrestled with this term a lot when developing the Taxonomer data model, launched several threads on Taxacom about it, and discussed it extensively with many database nerds and taxonomy nerds of all Code flavors. "Protologue" was the closest existing term to what this term is intended for, but the problem with "Protologue" (a term familiar to botanical taxonomists) is that it may be spread across more than one publication. As I understand it, it's the set of Usage Instances that collectively fulfill the criteria for a name being validly published. I finally decided on the term "Protonym". Although I later discovered that this word had been defined in a different way in the context of fungal taxonomy, I was assured by Paul Kirk (curator of Index Fungorum) that my use of the term should take precedence. Consequently, the term we use in GNUB (Paul is one of the original architects of GNUB) is "ProtonymID".
I'm not necessarily pushing for DwC to adopt this term; however, I am reasonably confident that GNUB will retain it, and depending on the future success of GNUB, it may end up becoming solidified in our community. As such, I think "protonymID" is the best term to use for DwC. However, if this is not acceptable, then I would suggest "originalUsageID" as a more explicit alternative.
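One practical payoff of a protonymID is that every usage of a name can be gathered regardless of the genus it was later placed in. A minimal Python sketch, reusing the Aus/Xus example from the first message plus one made-up extra row; it assumes protonymID is a key to the usage that is the original description (so usage 1 is its own protonym), and all IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical usage rows: (usage_id, name as used, sec reference, protonym_id)
usages = [
    (1, "Aus bus", "Smith 1950", 1),
    (2, "Aus bus", "Jones 1960", 1),
    (3, "Xus bus", "Brown 1970", 1),
    (4, "Aus cus", "Jones 1960", 4),   # an unrelated, made-up name
]


def usages_by_protonym(rows):
    """Group usage labels by the original description (protonym) they go back to."""
    groups = defaultdict(list)
    for _usage_id, name, sec, protonym_id in rows:
        groups[protonym_id].append(f"{name} sec {sec}")
    return dict(groups)


for protonym_id, labels in usages_by_protonym(usages).items():
    print(protonym_id, labels)
```

"Aus bus" and "Xus bus" end up in the same group because they share a protonym, even though the genus differs.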
scientificName: The taxon name (with date and authorship information if applicable). When forming part of an Identification, this should be the name in the lowest level taxonomic rank that can be determined. This term should not contain Identification qualifications, which should instead be supplied in the IdentificationQualifier term.
This is probably fine, but it sort of depends on where DwC settles on the definition of "acceptedTaxon(ID)/acceptedUsage(ID)". If the scope includes orthographic variants, then the definition of scientificName should be expanded to explicitly refer to "exact orthography" (which may or may not match the orthography represented by acceptedXXX). In GNUB, each usage has a field called "VerbatimNameString", which is intended to capture the exact string of characters (as best as can be represented via UTF-8) that appeared in the publication/reference. However, I don't think this is necessary for DwC. But I do think the definition of scientificName should make comment on orthography.
acceptedTaxon: The currently valid (zoological) or accepted (botanical) name for the scientificName.
This definition suggests that this term applies only to my use-case #2 (synonymies). As described earlier, in GNUB (which was initially developed by two botanists and one zoologist), the term "valid" was used instead of "accepted". Either one will do, but I think it makes sense to follow GNUB. In any case, I would propose the following:
If the intent is only for taxonomic synonymies (use-case 2), then go with "validUsage" to be consistent with GNUB, and recommend that a full usage-instance string ("Centropyge loriculus Günther 1874 sec Allen 1975") be provided, if available.
If the intent is less specific, and is open to orthographic/synonym/circumscription relationships, then go with "acceptedUsage" (with the same full usage-instance string)
higherTaxon: The taxon that is the parent of the scientificName.
Again, I would go with "parentUsage", and recommend the full usage-instance string.
originalTaxon: The basionym (botany), basonym (bacteriology), or replacement of the scientificName.
As per above, I would go with "protonym" (which need only be a name-string, such as "Centropyge loriculus Günther 1874"); but if not protonym, then "originalUsage".
higherClassification: A list (concatenated and separated) of the names for the taxonomic ranks less specific than that given in the scientificName.
I'm fine with this.
kingdom, phylum, class, order, family, genus, subgenus, specificEpithet, infraspecificEpithet - all unchanged.
Fine by me.
taxonRank: The taxonomic rank of the scientificName. Recommended best practice is to use a controlled vocabulary.
Fine by me.
verbatimTaxonRank: The verbatim original taxonomic rank of the scientificName.
I think this is OK, but I'm not entirely sure how strictly the term "verbatim" is applied. For example, should this be verbatim as it appears on the specimen label or original database record (e.g., "f." if it says "f."; "forma" if it says "forma", etc.)? Or does it just mean the "interpreted" rank (i.e., convert "f." to "forma")? My inclination is the former; but for most names (i.e., those without explicit rank qualifiers embedded within the name-string), this would be blank. For example, all species and higher ranks would be blank, because nobody explicitly writes "species" when listing a species name. To a zoologist, a subspecies name looks like "Centropyge loriculus flammeus", but to a botanist it looks like "Centropyge loriculus subsp. flammeus". Sensu stricto, the use of the word "verbatim" would imply that the zoologist would leave this item empty, but the botanist would enter "subsp." Do I interpret this correctly? Or (as I suspect) do I misunderstand the purpose of this item?
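The strict reading of "verbatim" versus the "interpreted" reading can be shown with a small Python sketch. The marker table is illustrative and not exhaustive, and the functions are hypothetical helpers, not a proposed DwC rule; note that under the strict reading the zoological trinomial yields an empty verbatim rank.

```python
# Rank markers that may be embedded in a name string (illustrative, not exhaustive).
RANK_MARKERS = {
    "subsp.": "subspecies", "ssp.": "subspecies",
    "var.": "variety", "subvar.": "subvariety",
    "f.": "forma", "forma": "forma",
}


def verbatim_rank(name: str) -> str:
    """Return the rank marker exactly as written, or '' if the string has none.

    Assumes `name` carries no authorship; e.g. the author abbreviation "L. f."
    would otherwise be misread as a forma marker.
    """
    for token in name.split():
        if token in RANK_MARKERS:
            return token
    return ""


def interpreted_rank(name: str) -> str:
    """Map the verbatim marker to a controlled rank term ('' if none)."""
    return RANK_MARKERS.get(verbatim_rank(name), "")


print(repr(verbatim_rank("Centropyge loriculus flammeus")))        # ''
print(repr(verbatim_rank("Centropyge loriculus subsp. flammeus")))  # 'subsp.'
print(interpreted_rank("Centropyge loriculus subsp. flammeus"))     # subspecies
```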
scientificNameAuthorship, nomenclaturalCode - unchanged
Fine by me.
taxonPublicationID: A unique identifier for the publication of the Taxon.
Presumably this would be the publication to which the specific usage instance for taxonID/taxonNameUsageID is anchored. If so, then I think the definition needs to be expanded. As written, some people might interpret the publication as always being the original publication (i.e., the "Günther 1874" of "Centropyge loriculus Günther 1874 sec Allen 1975"). Others might (more correctly, in my view) interpret it as the concept definition publication (i.e., the "Allen 1975" of "Centropyge loriculus Günther 1874 sec Allen 1975").
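The two readings correspond to two different parts of a full usage-instance string. A simplified Python sketch that splits such a string into the name, the authority (pointing to the original publication) and the "sec" reference (the concept-defining publication). The parsing rule is an assumption for illustration: the authority is taken to start at the first token after the genus that begins with an uppercase letter or "(", so subgenera in parentheses are not handled.

```python
def split_usage(usage: str) -> dict:
    """Split 'Genus epithet Author Year sec Reference' into name, authority and sec.

    Simplified: the authority starts at the first token after the genus that
    begins with an uppercase letter or '('; subgenera in parentheses are not handled.
    """
    name_part, found, sec = usage.partition(" sec ")
    tokens = name_part.split()
    start = next((i for i, t in enumerate(tokens[1:], 1)
                  if t[0].isupper() or t[0] == "("), len(tokens))
    return {
        "name": " ".join(tokens[:start]),
        "authority": " ".join(tokens[start:]),  # original-description citation
        "sec": sec if found else "",            # concept-defining publication
    }


print(split_usage("Centropyge loriculus Günther 1874 sec Allen 1975"))
```

Under the first reading, taxonPublication would follow the "authority" part (Günther 1874); under the second, it would follow the "sec" part (Allen 1975).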
taxonPublication: A reference for the publication of the Taxon.
Same comment as above.
taxonomicStatus, nomenclaturalStatus, taxonAccordingTo, taxonRemarks, vernacularName - unchanged.
I'm fine with all of these except possibly taxonAccordingTo, which I need to think about some more.
Sorry for the long post -- I'm just making up for having not been part of this discussion earlier. I am more than happy to help draft revised definitions for all of these terms, but only after we resolve their intended scope & meaning.
By the way, where do I find the current draft definitions for all these terms? When I go to http://code.google.com/p/darwincore/wiki/Taxon, I only see definitions for three of the terms.
Aloha, Rich
Richard, Your proposed plan does not actually give researchers what they need for large scale analysis.
They need to know what is "meant" by a particular identifier.
In the mosquito community we have a split between those who have adopted *Ochlerotatus* as a genus and those who have not. For some this changed *Aedes triseriatus* to *Ochlerotatus triseriatus*; others refuse to adopt the new name.
Are these the same or different things?
Under your scheme they are different things because the idea that an entity is a species is merged with the particular taxonomic placement of that entity.
How does your proposal solve this?
What is needed is a linked data identifier that resolves to data that help determine those instances of *Aedes triseriatus* and *Ochlerotatus triseriatus* that are the same, and those instances that are different.
In reference to the earlier discussion on separating identifiers from resolution, how will a user determine if occurrences tagged with the *Aedes triseriatus* UUID or LSID and those tagged with the *Ochlerotatus triseriatus* LSID are referring to the same species?
The proposed solution leaves users with just a name and no clear way of determining what the person identifying the specimen actually meant. The original species description is amazingly non-informative.
Most non-taxonomists don't care that much about what particular genus something is in. They care that the specimens they collected with malaria parasites are linked to other specimens of the same species. When they do care, they want a quick way to look up the current name (i.e., phylogenetic hypothesis) that can remain linked to their data.
If you leave in the TaxonConceptID, then users have a choice of filling it in or ignoring it. For those who would like to use something like this, it will dramatically improve data integration and move disagreements about name changes into the background. That is a change that, I think, would improve the relationship between taxonomists and other biological scientists.
There were a number of other issues in previous emails that suggested that the taxonomic community has chosen to rehash informatics issues that have already been thoroughly discussed by the scientific informatics community. What is somewhat alarming is that they seem to have come to completely opposite conclusions.
Also the thread on "trust" seemed particularly misinformed. If the writer intended to imply that by going to the current GBIF site they can "trust" the data, they are wrong. I see no mechanism on the GBIF home page that allows me to determine that this is the "real" GBIF site.
This is not meant to disparage GBIF, but to clarify the discussion. In fact the person who seems to be the most concerned with "trust" does not have any way to authenticate that his highly touted resolution service is the "real" one.
I suspect that the "trust" issue was either particularly uninformed or a smoke screen for a different issue which may be about data and services from cronies vs. data and services from non-cronies.
If you don't trust a particular provider, you can just remove those URIs from your data store by filtering by "context" or reification.
Respectfully,
- Pete
--------------------------------------------------------------- Pete DeVries http://spiders.entomology.wisc.edu/pjd/index.html Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 Email: pdevries@wisc.edu GeoSpecies Knowledge Base http://species.geospecies.org/ About the GeoSpecies Knowledge Base http://about.geospecies.org/ ------------------------------------------------------------
On Wed, Sep 2, 2009 at 2:53 PM, Richard Pyle <deepreef@bishopmuseum.org> wrote:
Greetings (again)...
With a slightly more rested brain, I'll provide some more specific feedback on the DwC Taxonomy terms. I'll use John's Aug 25 proposed list of terms & definitions as a starting point.
(Tim -- go get a cup of coffee before continuing....)
taxonID: An identifier for a specific taxon-related name usage (a Taxon record). May be a global unique identifier or an identifier specific to the data set.
As I said in my previous post, I worry that "taxon" is too familiar, and has too many meanings such that, without reviewing the definition, people may jump to the wrong conclusion about what sort of data object should be resolved through this ID. As klunky as it is, I feel it better to be unambiguous and use something like "taxonNameUsageID" This is the term GNUB has adopted; and while GNUB is still in early draft form, it took literally decades of deliberation to finally arrive at that term. If GNA & GNUB gain the traction that many of us are hoping it will, I believe that the term "TaxonNameUsage" will become much more familiar to managers of taxonomic data in the future. Thus, I would propose:
taxonNameUsageID: An identifier for a specific taxon-related name usage instance (a particular name as it is used within the context of a particular publication or other documentation source). May be a global unique identifier or an identifier specific to the data set.
acceptedTaxonID: A unique identifier for the acceptedTaxon.
I'm not exactly sure what this is supposed to represent, but I gather that it is used in cases where the taxon name for this record is not regarded as the accepted taxon name. Stan wrote:
In the context of an identification, yes, a taxon is asserted to be valid/accepted by the identifier (at the time), but not all identifications are accepted by the data manager, so that last statement isn't always true. Also not all taxa are accepted/valid within a classification (if it includes synonymous taxa).
If this is the purpose for the "acceptedTaxonID" (and I agree it's important to represent this), then I think we need to be more explicit about what is meant by accepted. For example, consider these three different meanings (I'll use the terms provided by John, rather than my recommended terms):
- Accepted in the sense of name orthograpgy
A specimen was identified as "Centropyge loricula", so the TaxonID resolves to this name. The data manager knows that the correct orthography is "Centropyge loriculus", so acceptedTaxonID resolves to that name.
- Accepted in the sense of subjective synonymy
A specimen was identified as "Centropyge flammeus", so the TaxonID resolves to this name. The data manager follows modern literature in treating this name as a junior synonym of C. loriculus, so acceptedTaxonID resolves to "Centropyge loriculus".
- Accepted in the sense of Concept Circumscription
A specimen was identified as "Centropyge loriculus" and the TaxonID resolves to the usage instance of "Centropyge loriculus Günther 1874 sec Woods & Schultz 1953", but the data manager feels this is not the most appropriate circumscription for the taxon represented by the specimens, so acceptedTaxonID resolves to the usage instance of "Centropyge loriculus Günther 1874 sec Allen 1975".
In my mind, all three of these would be appropriate use cases for acceptedTaxonID; but I suspect some people would not regard #3 as appropriate. As long as taxonID and acceptedTaxonID both point to Usage instances, it doesn't really matter, because a resolved Usage Instance record will provide the full set of metadata to do whatever comparison (orthography/synonymy/circumscription) the consumer of the record wishes to do. However, I do think the definition of the term should address these different possible resolutions of meaning.
The draft GNUB structure (which I can send to anyone who is interested) has a field called "ValidUsageID", which is a recursive foreign key to the same or a different Usage Instance, and is used explicitly for synonym treatments (#2 in the above list). Best to explain by example:
Each row below represents a Taxon Name Usage Instance, and "VUID" refers to ValidUsageID.
TNUID Reference VUID FullName
1 Günther 1874 1 Centropyge loriculus 2 Woods&Schultz 1953 2 Centropyge flammeus 3 Allen 1975 3 Centropyge loriculus 4 Allen 1975 3 Centropyge flammeus ====================================================
For the first three records, TNUID=VUID. This means that each of those publications treated each of those names as a valid species. By contrast, TNUID 4 has VUID 3 (i.e., TNUID<>VUID), which means that Allen 1975 treated the name "Centropyge flammeus" as a junior synonym of "Centropyge loriculus". Note that in the GNUB data model, the TNUID link must point to TNUID within the Reference. For example, in row #4, TNUID=3; not 1. In simplest terms, row #4 translates to "Allen 1975 regarded Centropyge flammeus as a junior synonym of Centropyge loriculus." In other words, this relationship applies specifically to use-case #2 in the list above.
As for the term itself, my recommendation would depend on which of the three use-case examples listed above the term "acceptedTaxonID" is intended to represent. If it is really only meant for Use-case #2 (synonymy), then I would recommend following GNUB with "validUsageID". However, I think it's probably best to leave the scope of meaning of the term open to any of these use-cases, in which case I would recommend the term "acceptedUsageID". But in either case, I think the definition needs to be more explicit.
higherTaxonID: A unique identifier for the taxon that is the parent of the scientificName.
Again, why not be explicit? Following the "taxon" root-stem approach, this should probably be "parentTaxonID". In the GNUB data model, the field used for this exact same purpose is "ParentUsageID". So, accordingly, my recommendation for the DwC term wothld be "parentUsageID".
originalTaxonID: A unique identifier for the basionym (botany), basonym (bacteriology), or replacement of the scientificName.
I wrestled with this term a lot when developing the Taxonomer data model, and launched several threads on Taxacom about it, and discussed it extensively with many database nerds and taxononmy nerds of all Code flavors. "Protologue" was the closes existing term to what this term is intended for, but the problem with "Protologue" (a term familiar to botanical taxonomists) is that it may be spread across more than one publication. As I understand it, it's the set of Usage Instances that collectively fulfill the criteria for a name being validly published. I finally decided on the term "Protonym". Although I later discovered that this word had been defined in a different way in the context of fungi taxonomy, I was assured by Paul Kirk (curator of Index Fungorum) that my use of the term should take precedence. Consequently, the term we use in GNUB (Paul is one of the original architects of GNUB) is "ProtonymID".
I'm not necessarily pushing for DwC to adopt this term; however, I am reasonably confident that GNUB will retin it, and depending on the future success of GNUB, it may end up becoming solidified in our community. As such, I think "protonymID" is the best term to use for DwC. However, if this is not acceptable, then I would suggest "originalUsageID" as a more explicit alternative.
scientificName: The taxon name (with date and authorship information if applicable). When forming part of an Identification, this should be the name in the lowest level taxonomic rank that can be determined. This term should not contain Identification qualifications, which should instead be supplied in the IdentificationQualifier term.
This is probably fine, but it sort of depends on where DwC settles on the definition of "acceptedTaxon(ID)/acceptedUsage(ID)". If the scope includes orthographic variants, then the definition of scientificName should be expanded to explicitly refer to "exact orthography" (which may or may not match the orthography represented by acceptedXXX). In GNUB, each usage has a field called "VerbatimNameString", which is intended to capture the exact string of characters (as best as can be represented via UTF-8) that appeared in the publcation/reference. However, I don't think this is necessary for DwC. But I do think the definition of scientificName should make comment on orthography.
acceptedTaxon: The currently valid (zoological) or accepted (botanical) name for the scientificName.
This definition suggests that this term applies only to my use-case #2 (synonymies). As described earlier, in GNUB (which was initially developed by two botanists and one zoologist), the term "valid" was used instead of "accepted". Either one will do, but I think it makes sense to follow GNUB. In any case, I would propose the following:
If the intent is only for taxonomic synonymies (use-case 2), then go with "validUsage" to be consistent with GNUB, and recommend that a full usage-instance string ("Centropyge loriculus Günther 1874 sec Allen 1975") be provided, if available.
If the intent is less specific, and is open to orthographic/synonym/circumscription relationships, then go with "acceptedUsage" (with the same full usage-instance string)
higherTaxon: The taxon that is the parent of the scientificName.
Again, I would go with "parentUsage", and recommend the full usage-instance string.
originalTaxon: The basionym (botany), basonym (bacteriology), or replacement of the scientificName..
As per above, I would go with "protonym" (which need only be a name-string, such as "Centropyge loriculus Günther 1874"); but if not protonym, then "originalUsage".
higherClassification: A list (concatenated and separated) of the names for the taxonomic ranks less specific than that given in the scientificName.
I'm fine with this.
kingdom, phylum, class, order, family, genus, subgenus, specificEpithet, infraspecificEpithet - all unchanged.
Fine by me.
taxonRank: The taxonomic rank of the scientificName. Recommended best practice is to use a controlled vocabulary.
Fine by me.
verbatimTaxonRank: The verbatim original taxonomic rank of the
scientificName.
I think this is OK, but I'm not entirely sure how strictly the term "verbatim" is applied. For example, should this be verbatim as it appears on the specimen label or original database record (e.g., "f." if it says "f."; "forma" if it says "forma", etc.) Or, does it just mean the "interpreted" rank (i.e., convert "f." to "forma"). My inclination is the former; but for most names (i.e., those without explicit rank qualifiers embedded within the name-string), this would be blank. For example, all species and higher ranks would be blank, because nobody explicitly writes "species" when listing a species name. To a zoologist, a subspecies name looks like "Centropyge loriculus flammeus", but to a botanist it looks like "Centropyge loriculus subsp. flammeus". Sensu stricto, the use of the word "verbatim" would imply that the zoologist would leave this item empty, but the botanist would enter "subsp." Do I interpret this correctly? Or (as I suspect), do I misunderstand the purpose of this item.
scientificNameAuthorship, nomenclaturalCode - unchanged
Fine by me.
taxonPublicationID: A unique identifier for the publication of the Taxon.
Presumably this would be the publication to which the specific usage instance for taxonID/taxonNameUsageID is anchored. If so, then I think the definition needs to be expanded. As written, some people might interpret the publication as always being the original publication (i.e., the "Günther 1874" of "Centropyge loriculus Günther 1874 sec Allen 1975"). Others might (more correctly, in my view) interpret it as the concept definition publication (i.e., the "Allen 1975" of "Centropyge loriculus Günther 1874 sec Allen 1975").
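The ambiguity is easy to see once a full usage-instance string is split into its parts. The following Python sketch assumes a simple "Genus species Author Year sec Source" layout with a single " sec " separator; real name strings (trinomials, parenthetical authors, "sensu", etc.) would need a proper parser.

```python
from typing import NamedTuple


class UsageString(NamedTuple):
    name: str          # e.g. "Centropyge loriculus"
    authorship: str    # original publication, e.g. "Günther 1874"
    according_to: str  # concept-defining publication, e.g. "Allen 1975"


def parse_usage(usage: str) -> UsageString:
    """Split 'Genus species Author Year sec Source' into its three parts.

    Assumes a binomial followed by authorship and at most one ' sec '.
    """
    name_part, _, according_to = usage.partition(" sec ")
    tokens = name_part.split()
    return UsageString(" ".join(tokens[:2]), " ".join(tokens[2:]), according_to)


u = parse_usage("Centropyge loriculus Günther 1874 sec Allen 1975")
print(u.authorship)    # Günther 1874
print(u.according_to)  # Allen 1975
```

The two candidate meanings of taxonPublication correspond to the authorship field and the according-to field respectively.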
taxonPublication: A reference for the publication of the Taxon.
Same comment as above.
taxonomicStatus, nomenclaturalStatus, taxonAccordingTo, taxonRemarks, vernacularName - unchanged.
I'm fine with all of these except possibly taxonAccordingTo, which I need to think about some more.
Sorry for the long post -- I'm just making up for having not been part of this discussion earlier. I am more than happy to help draft revised definitions for all of these terms, but only after we resolve their intended scope & meaning.
By the way, where do I find the current draft definitions for all these terms? When I go to http://code.google.com/p/darwincore/wiki/Taxon, I only see definitions for three of the terms.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Hi Peter,
Thank you for your email, but it seems evident that either I failed to adequately explain how my proposed solution would work, or I fail to understand what you would expect to extract from resolving a TaxonConceptID. Perhaps you could describe for me what a TaxonConceptID would resolve to (or point me to an example), and how it would be more effective at addressing the needs of researchers? Also, can you be more explicit about what you mean by "things" in the question, "Are these the same or different things?" By "things", do you mean that both refer to the same original description of the species epithet "triseriatus", or do you mean that they represent the same taxon concept circumscription? I suspect you mean the latter, in which case it would also apply to two different references to "Aedes triseriatus".
Addressing the former, perhaps I didn't explain that a fundamental component of an object resolved through a TaxonNameUsageID is a link to the Protonym (~Basionym), and as such a link to a TaxonNameUsageID for Aedes triseriatus would itself cross-link to all TaxonNameUsage instances of Ochlerotatus triseriatus, thereby revealing the congruency of the "triseriatus" epithet. As for the latter, services built on top of TaxonNameUsage instances would go the next step and resolve whether or not two different references to the name "Aedes triseriatus" (or one to A. triseriatus and one to O. triseriatus) apply to the same taxon concept circumscription.
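A minimal Python sketch of the first step described above: if every usage record carries a link to its Protonym, usages filed under different genera can be grouped by the epithet they share. The record layout, identifiers, and reference labels are hypothetical placeholders, not GNUB's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonNameUsage:
    usage_id: str     # hypothetical TaxonNameUsageID
    full_name: str    # the name as it was used in the reference
    reference: str    # placeholder citation label
    protonym_id: str  # link to the usage that originally established the epithet


usages = [
    TaxonNameUsage("tnu-1", "Aedes triseriatus", "Reference A", "protonym-triseriatus"),
    TaxonNameUsage("tnu-2", "Ochlerotatus triseriatus", "Reference B", "protonym-triseriatus"),
    TaxonNameUsage("tnu-3", "Aedes triseriatus", "Reference C", "protonym-triseriatus"),
]


def share_protonym(a: TaxonNameUsage, b: TaxonNameUsage) -> bool:
    """True when two usages trace back to the same original name."""
    return a.protonym_id == b.protonym_id


def usages_by_protonym(records):
    """Group usage IDs by protonym, exposing a shared epithet across genera."""
    groups = defaultdict(list)
    for usage in records:
        groups[usage.protonym_id].append(usage.usage_id)
    return dict(groups)


print(share_protonym(usages[0], usages[1]))  # True
```

Sharing a protonym only establishes nomenclatural congruence; whether two usages share the same circumscription is the separate, later step handled by services built on top of these records.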
To be sure, these services do not exist yet, but they are unambiguously in development right now, and such an infrastructure for taxon name resolution was identified as a high priority at the eBiosphere conference, so there is now at least a reasonably clear roadmap to developing and implementing this infrastructure. By contrast, I'm not sure I've ever found two people who have exactly the same notion of what kind of object a TaxonConceptID would resolve to, and how its associated metadata would answer the question you pose below about Aedes triseriatus vs. Ochlerotatus triseriatus. (This is why it would be helpful for me to see a specific example of a TaxonConceptID, and what it resolves to.)
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
________________________________
From: Peter DeVries [mailto:pete.devries@gmail.com]
Sent: Thursday, September 03, 2009 4:32 PM
To: Richard Pyle
Cc: tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] DwC taxonomic terms
Richard,
Your proposed plan does not actually give researchers what they need for large scale analysis.
They need to know what is "meant" by a particular identifier.
In the mosquito community we have a split between those who have adopted Ochlerotatus as a genus and those who have not. For some, this changed Aedes triseriatus to Ochlerotatus triseriatus; others refuse to adopt the new name.
Are these the same or different things?
Under your scheme they are different things because the idea that an entity is a species is merged with the particular taxonomic placement of that entity.
How does your proposal solve this?
What is needed is a linked data identifier that resolves to data that help determine those instances of Aedes triseriatus and Ochlerotatus triseriatus that are the same, and those instances that are different.
In reference to the earlier discussion on separating identifiers from resolution, how will a user determine if occurrences tagged with the Aedes triseriatus UUID or LSID and those tagged with the Ochlerotatus triseriatus LSID are referring to the same species?
The proposed solution leaves users with just a name and no clear way of determining what the person identifying the specimen actually meant. The original species description is amazingly non-informative.
Most non-taxonomists don't care that much about what particular genus something is in. They care that the specimens they collected with malaria parasites are linked to other specimens of the same species. When they do care, they want a quick way to look up the current name (i.e., the current phylogenetic hypothesis) that can remain linked to their data.
If you leave in the TaxonConceptID, then users have a choice of filling it in or ignoring it. For those who would like to use something like this, it will dramatically improve data integration and move disagreements about name changes into the background: a change that, I think, would improve the relationship between taxonomists and other biological scientists.
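A minimal Python sketch of what an optional TaxonConceptID would let a data consumer do, assuming (hypothetically) that providers tag occurrences with a shared concept identifier. The key name and all identifiers are invented placeholders.

```python
from collections import defaultdict

# Hypothetical occurrence records; all identifiers are invented placeholders.
occurrences = [
    {"occurrenceID": "occ-1", "scientificName": "Aedes triseriatus",
     "taxonConceptID": "concept-42"},
    {"occurrenceID": "occ-2", "scientificName": "Ochlerotatus triseriatus",
     "taxonConceptID": "concept-42"},
    {"occurrenceID": "occ-3", "scientificName": "Aedes triseriatus",
     "taxonConceptID": None},  # provider chose to leave the optional term blank
]


def group_by_concept(records):
    """Group occurrence IDs by taxonConceptID, skipping records without one."""
    groups = defaultdict(list)
    for rec in records:
        concept = rec.get("taxonConceptID")
        if concept:
            groups[concept].append(rec["occurrenceID"])
    return dict(groups)


print(group_by_concept(occurrences))  # {'concept-42': ['occ-1', 'occ-2']}
```

Records under different genus placements are linked through the concept ID, while a record without one is simply left out of the grouping rather than guessed at.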
There were a number of other issues in previous emails that suggested that the taxonomic community has chosen to rehash informatics issues that have already been thoroughly discussed by the scientific informatics community. What is somewhat alarming is that they seem to have come to completely opposite conclusions.
Also the thread on "trust" seemed particularly misinformed. If the writer intended to imply that by going to the current GBIF site they can "trust" the data, they are wrong. I see no mechanism on the GBIF home page that allows me to determine that this is the "real" GBIF site.
This is not meant to disparage GBIF, but to clarify the discussion. In fact the person who seems to be the most concerned with "trust" does not have any way to authenticate that his highly touted resolution service is the "real" one.
I suspect that the "trust" issue was either particularly uninformed or a smoke screen for a different issue which may be about data and services from cronies vs. data and services from non-cronies.
If you don't trust a particular provider, you can just remove those URIs from your data store by filtering by "context" or reification.
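Filtering by context can be sketched without committing to a particular triple store: treat each statement as a (subject, predicate, object, context) quad, where the context records which provider asserted it. All URIs below are invented placeholders on example.org; a real RDF store with named graphs would do this natively.

```python
# Each statement is a (subject, predicate, object, context) quad; the context
# names the provider that asserted it. All URIs are invented placeholders.
quads = [
    ("http://example.org/occ/1", "http://example.org/term/identifiedAs",
     "http://example.org/name/1", "http://provider-a.example.org/"),
    ("http://example.org/occ/1", "http://example.org/term/identifiedAs",
     "http://example.org/name/2", "http://provider-b.example.org/"),
]


def drop_untrusted(statements, untrusted_contexts):
    """Keep only statements whose context is not on the untrusted list."""
    return [q for q in statements if q[3] not in untrusted_contexts]


kept = drop_untrusted(quads, {"http://provider-b.example.org/"})
print(len(kept))  # 1
```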
Respectfully,
- Pete
--------------------------------------------------------------- Pete DeVries http://spiders.entomology.wisc.edu/pjd/index.html Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 Email: pdevries@wisc.edu GeoSpecies Knowledge Base http://species.geospecies.org/ About the GeoSpecies Knowledge Base http://about.geospecies.org/ ------------------------------------------------------------
On Wed, Sep 2, 2009 at 2:53 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
-- --------------------------------------------------------------- Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 ------------------------------------------------------------
Dear All,
After a series of off-list conversations with Peter DeVries, Dave Remsen, and others; and thanks to John W. for pointing me to the active list of terms, I would like to offer some additional thoughts on the "Core Taxon" terms; but before I do, I want to make sure I understand how the existing terms are intended to be used.
From the perspective of an Occurrence (specimen/observation/etc.) record represented through DwC, it seems to me that there are three sets of name/taxon terms:
1. "As Identified" [Information about how the record is currently identified.]
   - scientificName
   - scientificNameID
   - scientificNameAuthorship
   - taxonAccordingTo
   - taxonAccordingToID
2. "As originally established" [Information about the original name as established under the Code]
   - originalTaxonName
   - originalTaxonNameID
   - namePublishedIn
   - namePublishedInID
3. "Opinion of Data Provider" [Information about how the data provider interprets the correct name.]
   - acceptedTaxon
   - acceptedTaxonID
I'm not entirely certain which "set" of names the following terms would apply to:
- rank
- verbatimRank
- higherTaxonName
- higherTaxonNameID
- higherClassification
- kingdom
- phylum
- class
- order
- family
- genus
- subgenus
- specificEpithet
- infraspecificEpithet
According to the current draft spreadsheet (http://spreadsheets.google.com/pub?key=tZ3c04UGzRgalNxZMmcijcQ&output=ht...) , it seems that the first two apply specifically to the "scientificName", and therefore belong in the first set (i.e., rank according to how it was identified; not necessarily how the Data Provider now treats it, or what the original rank was). I assume the rest all apply to "Opinion of Data Provider"; but this is not explicitly stated.
For example, consider the specimen BPBM 13492. It was most recently identified as "Centropyge flavicauda Fraser-Brunner 1933". Our current treatment of this species is as a junior synonym of "Centropyge fisheri (Snyder 1904)". The original description of "fisheri" by Snyder (1904) placed it in the genus "Holacanthus".
I'm assuming that I would present this record via DwC using the above terms as follows:
1. As Identified:
scientificName: Centropyge flavicauda
scientificNameID: http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=53548
scientificNameAuthorship: Fraser-Brunner 1933
taxonAccordingTo: Allen, G.R. 1980. Butterfly and angelfishes of the world. Volume II. Mergus Publishers. Pp. 149-352.
taxonAccordingToID: http://research.calacademy.org/research/ichthyology/catalog/getref.asp?id=22764
2. As originally established:
- originalTaxonName: Centropyge flavicauda Fraser-Brunner 1933
- originalTaxonNameID: http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=53548
- namePublishedIn: Fraser-Brunner, A. 1933. A revision of the chaetodont fishes of the subfamily Pomacanthinae. Proceedings of the General Meetings for Scientific Business of the Zoological Society of London 1933 (pt 3, no. 30): 543-599, Pl. 1.
- namePublishedInID: http://research.calacademy.org/research/ichthyology/catalog/getref.asp?id=671
3. Opinion of Data Provider:
acceptedTaxon: Centropyge fisheri
acceptedTaxonID: http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=53548
If my assumptions are correct, then "specificEpithet" would be "fisheri", not "flavicauda" -- correct?
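The question can be made concrete with a small Python sketch: the atomized genus/specificEpithet values differ depending on which name in the record they are derived from. The flat record below uses only the name strings from the example (IDs and citations omitted); the sketch illustrates the question and does not settle which name the terms should describe.

```python
# Hypothetical flat record for BPBM 13492, using the name strings above.
record = {
    "catalogNumber": "BPBM 13492",
    "scientificName": "Centropyge flavicauda",
    "scientificNameAuthorship": "Fraser-Brunner 1933",
    "originalTaxonName": "Centropyge flavicauda Fraser-Brunner 1933",
    "acceptedTaxon": "Centropyge fisheri",
}


def epithets(name: str) -> dict:
    """Split a binomial (optionally followed by authorship) into atomized terms."""
    genus, _, rest = name.partition(" ")
    return {"genus": genus, "specificEpithet": rest.split()[0] if rest else ""}


print(epithets(record["scientificName"]))  # {'genus': 'Centropyge', 'specificEpithet': 'flavicauda'}
print(epithets(record["acceptedTaxon"]))   # {'genus': 'Centropyge', 'specificEpithet': 'fisheri'}
```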
Once I get a sense from this list whether I am interpreting the terms correctly (or not), I'll offer some specific comments on the taxon terms.
Aloha, Rich
Rich, as usual no time to write a long mail, but I wanted to quickly respond to your 3 intended uses below. The idea is that every one of them has a dwc:scientificName term and potentially also the other terms you listed at the end, like rank.
originalTaxonNameID and acceptedTaxonID are still properties of the described dwc:scientificName and act like foreign keys linking one name/taxon to another. So if you have some sort of synonym (indicated by dwc:taxonomicStatus), the dwc:acceptedTaxonID will point to what is considered the accepted taxon, while originalTaxonNameID will point to the original name record. The verbatim (non-ID) versions of these two terms do essentially the same, but are based on name-string matching. They are not meant to replace the use of dwc:scientificName in a record.
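A minimal Python sketch of the foreign-key behaviour described above, applied to the BPBM 13492 names. The taxonIDs, the status strings, and the convention that a missing key means "this record itself" are assumptions for illustration; the status vocabulary was still subject to change.

```python
# Hypothetical taxon records keyed by taxonID; IDs and status values are placeholders.
taxa = {
    "t1": {"scientificName": "Holacanthus fisheri Snyder 1904",
           "taxonomicStatus": "synonym",
           "acceptedTaxonID": "t2", "originalTaxonNameID": None},
    "t2": {"scientificName": "Centropyge fisheri (Snyder 1904)",
           "taxonomicStatus": "accepted",
           "acceptedTaxonID": None, "originalTaxonNameID": "t1"},
    "t3": {"scientificName": "Centropyge flavicauda Fraser-Brunner 1933",
           "taxonomicStatus": "synonym",
           "acceptedTaxonID": "t2", "originalTaxonNameID": None},
}


def accepted_name(taxon_id: str) -> str:
    """Follow acceptedTaxonID; a record without one is taken as accepted."""
    target = taxa[taxon_id]["acceptedTaxonID"] or taxon_id
    return taxa[target]["scientificName"]


def original_name(taxon_id: str) -> str:
    """Follow originalTaxonNameID; a record without one is its own original."""
    target = taxa[taxon_id]["originalTaxonNameID"] or taxon_id
    return taxa[target]["scientificName"]


print(accepted_name("t3"))  # Centropyge fisheri (Snyder 1904)
print(original_name("t2"))  # Holacanthus fisheri Snyder 1904
```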
Maybe it's best to look at the examples Dave put together (the tax/nom status columns are subject to change):
http://code.google.com/p/gbif-ecat/wiki/GNAsynonymsExample
Markus
On Sep 10, 2009, at 5:47 AM, Richard Pyle wrote:
Thanks Markus -- this is very helpful. I'll need to wrap my head around what is meant. However, it would be useful if you or someone could show me how I would populate a DwC record for the sample I gave:
Specimen BPBM 13492. Last identified as "Centropyge flavicauda Fraser-Brunner 1933". We (provider) treat this species as a synonym of "Centropyge fisheri (Snyder 1904)". The original description of "fisheri" by Snyder (1904) placed it in the genus "Holacanthus".
I'll take a look at the example, and see if I can understand from that.
Thanks, Rich
-----Original Message-----
From: "Markus Döring (GBIF)" [mailto:mdoering@gbif.org]
Sent: Thursday, September 10, 2009 11:03 AM
To: Richard Pyle
Cc: tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] DwC taxonomic terms
On Sep 10, 2009, at 5:47 AM, Richard Pyle wrote:
Dear All,
After a series of off-list conversations with Peter DeVries, Dave Remsen, and others; and thanks to John W. for pointing me to the active list of terms, I would like to offer some additional
thoughts
on the "Core Taxon" terms; but before I do, I want to make sure I understand how the existing terms are intended to be used.
From the perspective of an Occurrence (specimen/observation/etc.) record represented through DwC, it seems to me that there are three sets of name/taxon terms:
- "As Identified"
[Information about how the record is currently identified.]
- scientificName
- scientificNameID
- scientificNameAuthorship
- taxonAccordingTo
- taxonAccordingToID
- "As originally established"
[Information about the original name as established under the Code]
- originalTaxonName
- originalTaxonNameID
- namePublishedIn
- namePublishedInID
- "Opinion of Data Provider"
[Information about how the data provider interprets the correct name.]
- acceptedTaxon
- acceptedTaxonID
I'm not entirely certain which "set" of names the following terms would apply to:
- rank
- verbatimRank
- higherTaxonName
- higherTaxonNameID
- higherClassification
- kingdom
- phylum
- class
- order
- family
- genus
- subgenus
- specificEpithet
- infraspecificEpithet
According to the current draft spreadsheet (http://spreadsheets.google.com/pub?key=tZ3c04UGzRgalNxZMmcijcQ&output=html), it seems that the first two apply specifically to the "scientificName", and therefore belong in the first set (i.e., rank according to how it was identified; not necessarily how the Data Provider now treats it, or what the original rank was). I assume the rest all apply to "Opinion of Data Provider"; but this is not explicitly stated.
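The grouping described above can be written down as a small lookup table. This is only a sketch of Rich's *interpretation* as stated in this message (three sets, rank/verbatimRank under "As Identified", everything else under "Opinion of Data Provider"); the function name `set_for_term` is hypothetical and nothing here is an agreed DwC rule.

```python
# Term groupings as interpreted in this message; NOT an official DwC rule.
TERM_SETS = {
    "As Identified": [
        "scientificName", "scientificNameID", "scientificNameAuthorship",
        "taxonAccordingTo", "taxonAccordingToID",
        # Per the draft spreadsheet, these two describe scientificName.
        "rank", "verbatimRank",
    ],
    "As originally established": [
        "originalTaxonName", "originalTaxonNameID",
        "namePublishedIn", "namePublishedInID",
    ],
    "Opinion of Data Provider": [
        "acceptedTaxon", "acceptedTaxonID",
        # Assumed (not stated in the definitions) to follow the accepted taxon.
        "higherTaxonName", "higherTaxonNameID", "higherClassification",
        "kingdom", "phylum", "class", "order", "family",
        "genus", "subgenus", "specificEpithet", "infraspecificEpithet",
    ],
}


def set_for_term(term: str) -> str:
    """Return the set a term belongs to under this (hypothetical) interpretation."""
    for set_name, terms in TERM_SETS.items():
        if term in terms:
            return set_name
    raise KeyError(f"{term!r} is not assigned to any set")


print(set_for_term("verbatimRank"))     # As Identified
print(set_for_term("specificEpithet"))  # Opinion of Data Provider
```

Writing it out this way makes the open question concrete: every term must land in exactly one set, and the definitions would need to say which.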
For example, consider the specimen BPBM 13492. It was most recently identified as "Centropyge flavicauda Fraser-Brunner 1933". Our current treatment of this species is as a junior synonym of "Centropyge fisheri (Snyder 1904)". The original description "fisheri" by Snyder (1904) placed it in the genus "Holacanthus".
I'm assuming that I would present this record via DwC using the above terms as follows:
- As Identified:
- scientificName: Centropyge flavicauda
- scientificNameID: http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=53548
- scientificNameAuthorship: Fraser-Brunner 1933
- taxonAccordingTo: Allen, G.R. 1980. Butterfly and angelfishes of the world. Volume II. Mergus Publishers. Pp. 149-352.
- taxonAccordingToID: http://research.calacademy.org/research/ichthyology/catalog/getref.asp?id=22764
- As originally established:
- originalTaxonName: Centropyge flavicauda Fraser-Brunner 1933
- originalTaxonNameID: http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=53548
- namePublishedIn: Fraser-Brunner, A. 1933. A revision of the chaetodont fishes of the subfamily Pomacanthinae. Proceedings of the General Meetings for Scientific Business of the Zoological Society of London 1933 (pt 3, no. 30): 543-599, Pl. 1.
- namePublishedInID: http://research.calacademy.org/research/ichthyology/catalog/getref.asp?id=671
- Opinion of Data Provider:
- acceptedTaxon: Centropyge fisheri
- acceptedTaxonID: http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=53548
If my assumptions are correct, then "specificEpithet" would be "fisheri", not "flavicauda" -- correct?
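The question above is really "which name are the atomized terms read against?". A minimal sketch, assuming a plain "Genus epithet [Author Year]" string (the helper `specific_epithet` is hypothetical), shows that the same record yields two different answers depending on the source name:

```python
def specific_epithet(binomial: str) -> str:
    """Return the second word of a binomial, ignoring any authorship after it.

    Assumes a plain "Genus epithet [Author Year]" string; subgenera in
    parentheses, hybrid signs, and infraspecific names are not handled.
    """
    parts = binomial.split()
    if len(parts) < 2:
        raise ValueError(f"not a binomial: {binomial!r}")
    return parts[1]


record = {
    "scientificName": "Centropyge flavicauda",  # as identified
    "acceptedTaxon": "Centropyge fisheri",      # provider's opinion
}

print(specific_epithet(record["scientificName"]))  # flavicauda
print(specific_epithet(record["acceptedTaxon"]))   # fisheri
```

Because both readings are mechanically valid, the term definitions themselves have to say which name specificEpithet describes.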
Once I get a sense from this list whether I am interpreting the terms correctly (or not), I'll offer some specific comments on the taxon terms.
Aloha, Rich
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
In the case of a specimen, I would just create an occurrence record like this, using verbatim terms:
dwc:occurrenceID=BPBM-13492
dwc:collectionCode=BPBM
dwc:catalogNumber=13492
dwc:scientificName=Centropyge flavicauda Fraser-Brunner 1933
dwc:acceptedTaxon=Centropyge fisheri (Snyder 1904)
If I understand you correctly, the original name is the one for the accepted name. So I cannot state this in the above record, as it would then mean the original name of C. flavicauda.
I would have to create another taxon record:
dwc:scientificName=Centropyge fisheri (Snyder 1904)
dwc:acceptedTaxon=Centropyge fisheri (Snyder 1904)
dwc:originalName=Holacanthus fisheri Snyder 1904
The problem here is that I don't think it is a good idea to mix occurrence and taxon records in one dataset. But they could easily be separate datasets for specimens and taxa.
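Keeping specimens and taxa in separate datasets means the verbatim acceptedTaxon string on the occurrence has to be matched against the taxon dataset's scientificName. A minimal sketch of that name-string join, using the two records above; the whitespace/case `normalize` step is a hypothetical choice, and real name matching would need far more care, which is one reason the ID terms are attractive:

```python
from typing import Dict, List, Optional

occurrences: List[Dict[str, str]] = [
    {
        "occurrenceID": "BPBM-13492",
        "collectionCode": "BPBM",
        "catalogNumber": "13492",
        "scientificName": "Centropyge flavicauda Fraser-Brunner 1933",
        "acceptedTaxon": "Centropyge fisheri (Snyder 1904)",
    },
]

taxa: List[Dict[str, str]] = [
    {
        "scientificName": "Centropyge fisheri (Snyder 1904)",
        "acceptedTaxon": "Centropyge fisheri (Snyder 1904)",
        "originalName": "Holacanthus fisheri Snyder 1904",
    },
]


def normalize(name: str) -> str:
    # Hypothetical normalization: collapse whitespace and ignore case.
    return " ".join(name.split()).lower()


def taxon_for(occurrence: Dict[str, str],
              taxa: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Find the taxon record whose scientificName matches the occurrence's acceptedTaxon."""
    wanted = normalize(occurrence["acceptedTaxon"])
    for taxon in taxa:
        if normalize(taxon["scientificName"]) == wanted:
            return taxon
    return None  # no string match: the link is simply lost


match = taxon_for(occurrences[0], taxa)
print(match["originalName"])  # Holacanthus fisheri Snyder 1904
```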
Also, you could use ID terms instead of the verbatim ones, which is less error-prone and easier to grasp:
dwc:taxonID=431
dwc:scientificName=Centropyge flavicauda Fraser-Brunner 1933
dwc:taxonomicStatus=synonym
dwc:acceptedTaxonID=432

dwc:taxonID=432
dwc:scientificName=Centropyge fisheri (Snyder 1904)
dwc:taxonomicStatus=accepted
dwc:originalNameID=433

dwc:taxonID=433
dwc:scientificName=Holacanthus fisheri Snyder 1904
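With IDs, the three taxon records above behave like rows in a table linked by foreign keys. A minimal sketch of following those links, using the example IDs 431-433 from this message (the helper names `accepted_id` and `original_name` are hypothetical). It also treats an acceptedTaxonID that points at itself as "already accepted" and guards against cycles:

```python
from typing import Dict, Optional

TAXA: Dict[str, Dict[str, str]] = {
    "431": {"scientificName": "Centropyge flavicauda Fraser-Brunner 1933",
            "taxonomicStatus": "synonym", "acceptedTaxonID": "432"},
    "432": {"scientificName": "Centropyge fisheri (Snyder 1904)",
            "taxonomicStatus": "accepted", "originalNameID": "433"},
    "433": {"scientificName": "Holacanthus fisheri Snyder 1904"},
}


def accepted_id(taxon_id: str, taxa: Dict[str, Dict[str, str]]) -> str:
    """Follow acceptedTaxonID links until reaching a record that is its own accepted taxon."""
    seen = set()
    while True:
        next_id = taxa[taxon_id].get("acceptedTaxonID")
        if next_id is None or next_id == taxon_id:
            return taxon_id
        if taxon_id in seen:
            raise ValueError(f"acceptedTaxonID cycle at {taxon_id}")
        seen.add(taxon_id)
        taxon_id = next_id


def original_name(taxon_id: str, taxa: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the scientificName of the record's original name, if one is linked."""
    original = taxa[taxon_id].get("originalNameID")
    return taxa[original]["scientificName"] if original else None


acc = accepted_id("431", TAXA)
print(TAXA[acc]["scientificName"])  # Centropyge fisheri (Snyder 1904)
print(original_name(acc, TAXA))     # Holacanthus fisheri Snyder 1904
print(original_name("431", TAXA))   # None
```

The last line reflects the point made earlier in this message: the originalNameID hangs off the accepted taxon's record, not off C. flavicauda.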
On Sep 10, 2009, at 11:11 PM, Richard Pyle wrote:
Thanks Markus -- this is very helpful. I'll need to wrap my head around what is meant. However, it would be useful if you or someone could show me how I would populate a DwC record for the sample I gave:
Specimen BPBM 13492. Last identified as "Centropyge flavicauda Fraser-Brunner 1933". We (provider) treat this species as a synonym of "Centropyge fisheri (Snyder 1904)". The original description "fisheri" by Snyder (1904) placed it in the genus "Holacanthus".
Thanks!
dwc:occurrenceID=BPBM-13492
dwc:collectionCode=BPBM
dwc:catalogNumber=13492
dwc:scientificName=Centropyge flavicauda Fraser-Brunner 1933
dwc:acceptedTaxon=Centropyge fisheri (Snyder 1904)
Actually, according to the current definitions, you would need to split up scientificName into:
dwc:scientificName=Centropyge flavicauda
dwc:scientificNameAuthorship=Fraser-Brunner 1933
...which is a bit out of phase with:
dwc:acceptedTaxon=Centropyge fisheri (Snyder 1904) [allowed by the current definitions]
One of my suggestions would be to treat these in a consistent fashion; something like:
- scientificName
- scientificNameAuthorship
- acceptedName
- acceptedNameAuthorship
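One practical payoff of parallel term pairs is that consumers can treat both names the same way. A minimal sketch, where acceptedName and acceptedNameAuthorship are the *proposed* terms above (not in the current draft) and `full_name` is a hypothetical helper:

```python
from typing import Optional


def full_name(name: str, authorship: Optional[str] = None) -> str:
    """Join an atomized name and its authorship into one display string."""
    return f"{name} {authorship}" if authorship else name


# Hypothetical record using the proposed parallel term pairs.
record = {
    "scientificName": "Centropyge flavicauda",
    "scientificNameAuthorship": "Fraser-Brunner 1933",
    "acceptedName": "Centropyge fisheri",
    "acceptedNameAuthorship": "(Snyder 1904)",
}

print(full_name(record["scientificName"], record["scientificNameAuthorship"]))
# Centropyge flavicauda Fraser-Brunner 1933
print(full_name(record["acceptedName"], record["acceptedNameAuthorship"]))
# Centropyge fisheri (Snyder 1904)
```

With one combined term and one split pair, the same code would need a special case for each name.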
If I understand you correctly, the original name is the one for the accepted name. So I cannot state this in the above record, as it would then mean the original name of C. flavicauda.
Right -- that's another one of the things I'm getting at. Does originalName apply to what it is identified as, or does it apply to the acceptedTaxon? If they are different, then which one is implied by the originalName? I gather from your statement above that they apply explicitly to the scientificName (not acceptedTaxon), so that should probably be explicitly indicated in the definitions.
I would have to create another taxon record:
dwc:scientificName=Centropyge fisheri (Snyder 1904)
dwc:acceptedTaxon=Centropyge fisheri (Snyder 1904)
dwc:originalName=Holacanthus fisheri Snyder 1904
Right -- so presumably these would be returned by resolving scientificNameID that is included on the specimen record, and would not themselves be included within the resultset for the specimen record. In other words, the specimen record would give me scientificNameID, and resolving that ID would give me the three pieces of information you list above -- correct?
The problem here is that I don't think it is a good idea to mix occurrence and taxon records in one dataset.
Agreed!! That's actually the real point I was heading towards. We need the terms, and I think they all belong in dwc, but we need to be clear to people in what context those terms should be used. It's not clear to me how the data providers will know which terms to populate for occurrence records, and which are intended only for taxon name records. Is there some sort of specification within DwC that makes this distinction? My apologies for cluttering the list if it exists, and I simply missed it.
But they could easily be separate datasets for specimen and taxa.
Also, you could use ID terms instead of the verbatim ones, which is less error-prone and easier to grasp:
Yes, exactly.
Aloha, Rich
P.S. I am perfectly happy to do the work on writing the definitions, but I don't want to do that if I misunderstand the intended purpose of these terms.
Comments inline.
On Thu, Sep 10, 2009 at 3:19 PM, Richard Pyle <deepreef@bishopmuseum.org> wrote:
Thanks!
dwc:occurrenceID=BPBM-13492
dwc:collectionCode=BPBM
dwc:catalogNumber=13492
dwc:scientificName=Centropyge flavicauda Fraser-Brunner 1933
dwc:acceptedTaxon=Centropyge fisheri (Snyder 1904)
Actually, according to the current definitions, you would need to split up scientificName into:
dwc:scientificName=Centropyge flavicauda
dwc:scientificNameAuthorship=Fraser-Brunner 1933
...which is a bit out of phase with:
dwc:acceptedTaxon=Centropyge fisheri (Snyder 1904) [allowed by the current definitions]
One of my suggestions would be to treat these in a consistent fashion; something like:
- scientificName
- scientificNameAuthorship
- acceptedName
- acceptedNameAuthorship
Authorship is permissible in scientificName. The current working definition for scientificName is:
"The taxon name (with date and authorship information if applicable). When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term."
while the current working definition of acceptedTaxon is:
"The full scientific name of the currently valid (zoological) or accepted (botanical) taxon."
So, they are already consistent.
If I understand you correctly, the original name is the one for the accepted name. So I cannot state this in the above record, as it would then mean the original name of C. flavicauda.
Right -- that's another one of the things I'm getting at. Does originalName apply to what it is identified as, or does it apply to the acceptedTaxon? If they are different, then which one is implied by the originalName? I gather from your statement above that they apply explicitly to the scientificName (not acceptedTaxon), so that should probably be explicitly indicated in the definitions.
The current working definition for originalTaxonName already contains the reference to scientificName, but clearly this description could be improved:
"The name originally given to a taxon when it was first correctly/legitimately published under the rules of the appropriate code. The basionym (botany) or basonym (bacteriology) of the scientificName or the senior/earlier homonym for replaced names."
I would have to create another taxon record:
dwc:scientificName=Centropyge fisheri (Snyder 1904)
dwc:acceptedTaxon=Centropyge fisheri (Snyder 1904)
dwc:originalName=Holacanthus fisheri Snyder 1904
Right -- so presumably these would be returned by resolving scientificNameID that is included on the specimen record, and would not themselves be included within the resultset for the specimen record. In other words, the specimen record would give me scientificNameID, and resolving that ID would give me the three pieces of information you list above -- correct?
The specimen record could contain all three of those fields populated with the values shown, as well as the scientificNameID, the acceptedTaxonNameID, and the originalTaxonNameID; however, the specimen record would not be required to have any of them.
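"Could contain, but not required" maps naturally onto a record type whose taxon fields are all optional. A minimal sketch; the class `OccurrenceTaxonTerms` and the identifier value are hypothetical, and the field names follow the term list in this thread (acceptedTaxonID rather than acceptedTaxonNameID):

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class OccurrenceTaxonTerms:
    """Taxon-related terms a specimen record MAY carry; none is required.

    A hypothetical illustration, not part of Darwin Core.
    """
    scientificName: Optional[str] = None
    scientificNameID: Optional[str] = None
    acceptedTaxon: Optional[str] = None
    acceptedTaxonID: Optional[str] = None
    originalTaxonName: Optional[str] = None
    originalTaxonNameID: Optional[str] = None

    def populated(self) -> List[str]:
        """Names of the terms that actually carry a value, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


# A provider without identifiers publishes only the verbatim names...
verbatim_only = OccurrenceTaxonTerms(
    scientificName="Centropyge flavicauda Fraser-Brunner 1933",
    acceptedTaxon="Centropyge fisheri (Snyder 1904)",
)
# ...while another publishes only an identifier (hypothetical value) to resolve later.
id_only = OccurrenceTaxonTerms(scientificNameID="hypothetical-name-id-1")

print(verbatim_only.populated())  # ['scientificName', 'acceptedTaxon']
print(id_only.populated())        # ['scientificNameID']
```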
The problem here is that I don't think it is a good idea to mix occurrence and taxon records in one dataset.
Agreed!! That's actually the real point I was heading towards. We need the terms, and I think they all belong in dwc, but we need to be clear to people in what context those terms should be used. It's not clear to me how the data providers will know which terms to populate for occurrence records, and which are intended only for taxon name records. Is there some sort of specification within DwC that makes this distinction? My apologies for cluttering the list if it exists, and I simply missed it.
No, by design and happily, DwC defers implementation to implementors. I see perfectly good use cases for passing or storing occurrence records with the full taxon information already resolved (think GBIF Index).
But they could easily be separate datasets for specimen and taxa.
Also, you could use ID terms instead of the verbatim ones, which is less error-prone and easier to grasp:
Yes, exactly.
You could, and your colleague could do so without the IDs. We can't force anyone to publish what they don't have.
Aloha, Rich
P.S. I am perfectly happy to do the work on writing the definitions, but I don't want to do that if I misunderstand the intended purpose of these terms.
I think we need better access to the spreadsheet at http://spreadsheets.google.com/pub?key=tZ3c04UGzRgalNxZMmcijcQ&output=ht... or we need to move the work to http://code.google.com/p/darwincore/wiki/Taxon until it gets fully resolved and included in the post-public review version I am eager to release.
Authorship is permissible in scientificName. The current working definition for scientificName is:
Sorry! My bad.
"The full scientific name of the currently valid (zoological) or accepted (botanical) taxon."
So, they are already consistent.
Agreed they are conceptually consistent -- but the wording of the definitions ought to be consistent as well.
The specimen record could contain all three of those fields populated with the values shown, as well as the scientificNameID, the acceptedTaxonNameID, and the originalTaxonNameID; however, the specimen record would not be required to have any of them.
OK, thanks.
No, by design and happily, DwC defers implementation to implementors. I see perfectly good use cases for passing or storing occurrence records with the full taxon information already resolved (think GBIF Index).
OK, fair enough -- but I think the definitions need to be tightened up a bit, and the terms should follow consistent patterns, to make it easier to ensure that two different providers put the same sort of information under the same terms.
I think we need better access to the spreadsheet at http://spreadsheets.google.com/pub?key=tZ3c04UGzRgalNxZMmcijcQ&output=html or we need to move the work to http://code.google.com/p/darwincore/wiki/Taxon until it gets fully resolved and included in the post-public review version I am eager to release.
Which do you prefer? I'm happy to spend the time and do the work, as needed.
Aloha, Rich
participants (4)
- "Markus Döring (GBIF)"
- John R. WIECZOREK
- Peter DeVries
- Richard Pyle