Hello all,
I have an issue that I would like some comment on...
We have some data that covers Taxa, Names and Concept relationships, e.g.:
- A Taxon table that contains the nomenclatural details + accepted name + parent name
- Concept + relationship tables that contain details about the name + references where the name has been used in a taxonomic sense (i.e. not nomenclatural information) - this is specifically a link between the Name and a Reference
We have fairly permanent IDs for the Taxon Name (nomenclatural) and the Concepts, but I now want to consider an ID that covers the whole Taxon (i.e. the nomenclatural data + taxon rank + parent name + accepted name, etc., as "we" understand them). This is probably equivalent to the taxonID in DwC.
The problem is that this tends to be much more dynamic data. In this particular case we have aggregated data from a variety of providers and are continually revising it, and as we revise the data, details such as the accepted name may change. This troubles me a bit, because it could be seen as fundamentally changing the definition of the object behind the taxonID. However, I suspect this is a common situation: revision and tidying of aggregated datasets must be quite common.
I would prefer NOT to change the taxonID every time we revise that data (taking the angle that these changes are corrections, and so do not change the object itself). Should it be OK to have an object type like this that is likely to change, but to keep its ID permanent, i.e. to accept that some object types are quite dynamic?
The only other option is to maintain a hideous version audit trail, which would probably hinder use of the data more than it would benefit the end user by providing "stability".
Any thoughts?
Kevin
________________________________ Please consider the environment before printing this email Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Hi Kevin,
It seems fine to me, Kevin, as long as the intention remains the same. I would imagine slowly changing content behind the object (revisions in higher names, perhaps) and not complete rewrites of the content.
Is there anything inside the Taxon that you can always rely on (something immutable)? E.g. a constant ID from the nomenclatural organisation level, or perhaps some lexical group ID? Perhaps you could select one reference ID from the nomenclatural or concept level as the Taxon ID, and then promote the Taxon as a dynamically assembled object with the best representation (at request time) for ConceptX. This is then more analogous to getTaxonFor(conceptX) than getTaxonById(X).
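The getTaxonFor(conceptX) idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the `TaxonView` class, the in-memory `current_revisions` store and the `concept:1` identifier are all hypothetical, not part of any existing system or of DwC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonView:
    """Snapshot of a taxon as assembled at request time (hypothetical type)."""
    concept_id: str      # the stable key that never changes
    accepted_name: str   # may change between revisions
    parent_name: str     # may change between revisions
    rank: str


# Hypothetical in-memory store holding the latest revision per concept ID.
current_revisions = {
    "concept:1": {"accepted_name": "Aus bus", "parent_name": "Aus", "rank": "species"},
}


def get_taxon_for(concept_id: str) -> TaxonView:
    """Assemble the best current representation for a stable concept ID."""
    return TaxonView(concept_id=concept_id, **current_revisions[concept_id])


before = get_taxon_for("concept:1")

# A revision moves the species to another genus; the key is left untouched.
current_revisions["concept:1"] = {
    "accepted_name": "Xus bus", "parent_name": "Xus", "rank": "species",
}
after = get_taxon_for("concept:1")

print(before.concept_id == after.concept_id)  # the ID is unchanged
print(before.accepted_name, "->", after.accepted_name)
```

The point of the design is that consumers hold on to the stable key only, and always receive whatever the provider currently considers the best representation behind it.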
The alternative to what you are suggesting would presumably be creating a new ID for any change, which seems like it would make it difficult to keep anything in sync, unless I can subscribe to and understand your changes constantly.
Cheers, Tim
On Jul 5, 2010, at 5:43 AM, Kevin Richards wrote:
_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
I think that the basic problem is, as with a taxon "name", there are as many notions for what is meant by a taxon "concept" as there are people who have uttered the words.
I think that Peter D. has a good example of what a taxonID might resolve to. But the question is, which bits of metadata can change under the same taxonID, and which bits would prompt the generation of a new taxonID? This is the fundamental problem I've always had with creating permanent GUIDs for "taxon concepts" (by anyone's definition, let alone *everyone's* definition). I've had a similar question concerning ITIS TSNs. It's still not entirely clear to me when something generates a new TSN, vs. when something represents a correction or amendment (e.g., via a comment) to an existing TSN.
So, for example, suppose authority "X" has a well-defined and metadata-rich taxonID for their concept of "Aus bus". Later on, they decide that the species "bus" should be moved to the genus "Xus". None of the other metadata (occurrence records, diagnostic characters, associated DNA sequences, etc.) has changed. *Only* the genus placement has changed. Should Authority "X" brand a new taxonID for this? In my mind, no. Because in my mind, the species concept is defined only by the stuff related to "bus". Changing the genus doesn't affect the contents or concept of "bus"; it only changes the concept circumscription for the genus "Xus" (and for "Aus", if it is retained as a valid genus that no longer includes the species "bus").
I doubt, however, that there would be universal agreement about this.
Rich
_____
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Tim Robertson (GBIF) Sent: Sunday, July 04, 2010 7:09 PM To: Kevin Richards Cc: tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] Taxon Concept dilemma
This is why I'm very uncomfortable with the entire notion of "taxonID". The main reason I'm pushing so hard for taxonNameUsageIDs (à la GNUB) is that these are the "atoms" (as Dave R. calls them) of both nomenclature *and* most existing concept definitions. If we can get permanent and widely shared/re-used IDs on these "atoms", then we can assemble the complex molecules from them. Someone's notion of a taxon concept then becomes a set of TNUIDs. I have mixed feelings about branding these sets with permanent GUIDs; but if we did, this is what I imagine taxonID in DwC would (ultimately) represent. If we want to archive the sets for posterity, then we can certainly brand them with IDs. But I tend to think these can instead be dynamic services that assemble the sets either algorithmically or through the fingertips of experts.
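To make the "atoms and molecules" picture concrete, here is a minimal sketch that treats a taxon concept as nothing more than a set of usage identifiers. The `tnu:` identifiers, the function names and the relationship labels are assumptions made for illustration; they are not GNUB or DwC terms.

```python
from typing import FrozenSet, Iterable


def assemble_concept(tnu_ids: Iterable[str]) -> FrozenSet[str]:
    """Build a concept as an unordered, immutable set of usage IDs."""
    return frozenset(tnu_ids)


def compare_concepts(a: FrozenSet[str], b: FrozenSet[str]) -> str:
    """Describe how concept a relates to concept b by set comparison."""
    if a == b:
        return "congruent"
    if a > b:          # a is a proper superset of b
        return "includes"
    if a < b:          # a is a proper subset of b
        return "is included in"
    if a & b:          # some, but not all, usages are shared
        return "overlaps"
    return "excludes"


broad = assemble_concept(["tnu:1", "tnu:2", "tnu:3"])
reordered = assemble_concept(["tnu:3", "tnu:1", "tnu:2"])
narrow = assemble_concept(["tnu:1", "tnu:2"])
other = assemble_concept(["tnu:3", "tnu:4"])

print(compare_concepts(broad, reordered))
print(compare_concepts(broad, narrow))
print(compare_concepts(narrow, other))
```

Because the set itself carries the definition, a service could assemble it on demand, and a permanent ID would only be needed if a particular set had to be archived.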
So...I guess before we do anything, we need to get a common sense for what is intended to be represented by taxonID. I suspect my own view is not shared by all (or even most).
Rich
_____
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Kevin Richards Sent: Sunday, July 04, 2010 5:44 PM To: tdwg-content@lists.tdwg.org Subject: [tdwg-content] Taxon Concept dilemma
Rich, Kevin,
As one of the main supporters of taxonID, I think the final decision about what stable piece of information it reflects sits with the publisher/author of the data. The problem Kevin describes is very common and comes down to the difficulty of describing what information really is stable in taxonomy. The name may change, as may the classification, the textual description and the distribution, and it's probably impossible to tell automatically whether anything significant has changed or whether only small "corrections" have been made. Even for humans it's a challenge to compare several textual descriptions and decide whether they describe the same thing or not.
I guess this is where some people see the conceptID coming into play to give some long-term stability, but I still don't see much gain in another ID if you cannot tell which pieces of the information behind that ID are stable over time. I still believe that with scientificNameID and taxonID we are well equipped to deal with our data. I tend to think that whoever publishes the data should think carefully about how they assign and change taxonIDs, and should try to announce what they consider stable, what is versioned, and what can change at any time without changing the meaning, i.e. the ID. But I would assume this might differ between databases focusing on different aspects of taxonomy.
For purely automatically aggregated data in Checklist Bank I decided to assign stable taxonIDs to entire lexical groups of names that have at least one qualified name, i.e. one with proper authorship. The classification, spelling and authorship of the preferred, representative name for that group might change, but at least you have some definition of stability, which is really hard to define. For lexical name groups that contain only bare canonical names I am assigning volatile IDs right now, as these might change dramatically with more knowledge about them. I don't know if this could also apply to Kevin's problem: do you treat the same qualified name in multiple concepts? If that's the case then you surely need something else to define a stable taxonID.
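The rule described above (stable IDs for lexical groups with at least one qualified name, volatile IDs otherwise) might look something like the sketch below. It is not the Checklist Bank implementation; the `Name` class, the helper names and the example authorship string are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Name:
    """One member of a lexical group (hypothetical structure)."""
    canonical: str
    authorship: Optional[str] = None


def is_qualified(name: Name) -> bool:
    """A name counts as qualified when it carries non-empty authorship."""
    return bool(name.authorship and name.authorship.strip())


def id_stability(group: List[Name]) -> str:
    """Return 'stable' if any name in the group is qualified, else 'volatile'."""
    return "stable" if any(is_qualified(n) for n in group) else "volatile"


# Authorship below is invented purely for the example.
with_author = [Name("Aus bus"), Name("Aus bus", "Smith, 1900")]
bare_only = [Name("Aus bus"), Name("Aus bus", "  ")]

print(id_stability(with_author))
print(id_stability(bare_only))
```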
Markus
On Jul 5, 2010, at 9:26, Richard Pyle wrote:
Is it worth listing properties that do or do not impact the concept?
Scientific Name presents at least three cases:
1. The name doesn't change at all but the circumscription does, warranting a concept change (a new taxonID).
If you recall the case of Vireo solitarius I used in the 13 June mail, the same name refers to different circumscriptions. One method for distinguishing these is based on the (to borrow from Rich's terminology) "protonym count" of the material included in the circumscription. There are some problems with this (unintended or temporally based omissions), but it provides one pretty good basis for making gross comparisons.
2. The name changes but this, in itself, does not warrant a concept change (a new taxonID) because the circumscription does not change.
When a species is transferred from one genus to another, there is no direct impact on circumscription. Thus, if two names share a common protonymID, there may not be a need to change the taxon ID. In this case, we are back to 1 above.
3. The name changes and this, in itself, requires a new taxonID.
If a taxon is renamed and the new name does not share the same protonymID as the previous name, it should be given a new concept ID. There may be exceptions to this (for replacement names perhaps?) but surely the rule of priority would only result in this case if the circumscription (concept) changed.
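The three cases above can be summarised as a small decision function. This is a sketch only: the function name and parameters are hypothetical, it assumes someone has already judged whether the circumscription changed, and it does not handle the possible exceptions (such as replacement names) mentioned in case 3.

```python
def needs_new_taxon_id(
    old_protonym_id: str,
    new_protonym_id: str,
    circumscription_changed: bool,
) -> bool:
    """Decide whether a revision warrants a new taxonID under the three cases."""
    # Case 1: the circumscription changed, whatever happened to the name.
    if circumscription_changed:
        return True
    # Case 2: the name may differ (e.g. a genus transfer) but it still traces
    # back to the same protonym, and the circumscription is unchanged.
    if old_protonym_id == new_protonym_id:
        return False
    # Case 3: the new name is based on a different protonym.
    return True


# Same name, revised circumscription (case 1).
print(needs_new_taxon_id("protonym:10", "protonym:10", True))
# Genus transfer with unchanged circumscription (case 2).
print(needs_new_taxon_id("protonym:10", "protonym:10", False))
# Renamed to a name with a different protonym (case 3).
print(needs_new_taxon_id("protonym:10", "protonym:27", False))
```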
Higher taxonomy
Is it generally agreed that higher taxonomy does not, in itself, impact the concept (circumscription) and therefore different classifications of a taxon are not criteria for a concept identifier change?
- David
On Jul 5, 2010, at 3:53 AM, Markus Döring wrote:
Higher taxonomy
Is it generally agreed that higher taxonomy does not, in itself, impact the concept (circumscription) and therefore different classifications of a taxon are not criteria for a concept identifier change?
Yes and no, and I think this is the trouble with determining whether the circumscription was changed or not (your point 2 or 3).
In many cases, transferring a species into a new genus does implicitly change the description, because a set of previously unrecorded or misinterpreted characters has been studied and the genus description is now implicitly correct for these transferred species.
It may be that the diagnostic characters are still the same; it may also be that earlier, previously undetected, ways of misapplying a name no longer apply.
For example, conidiogenesis in imperfect fungi is now routinely observed, and transferring a species into a genus with defined conidiogenesis in the last 50 years has generally meant that the range of potential identifications (correct or misapplied) changed drastically. This occurs at all higher levels, including but not limited to genus.
I therefore have doubts about whether the distinction between 2 and 3 should be made a priori; a system in which a name change requires a new ID, and in which the reasoning is based on these IDs, could be more flexible to adjust.
Gregor
Hi Gregor,
While I certainly agree there are many cases where both things happen in the same publication (i.e., the circumscription for the species "bus" changes, at the same time "bus" is moved from the genus "Aus" to the genus "Xus"); I see these as two unrelated things.
A classic (but rare) example of this would be when two different species, which are, respectively, the type species of two different genera, are synonymized as the same species. In this case, a new species-level concept is established at the same time that a genus-level name change is forced (one of the two genera becomes a junior synonym of the other genus). But this represents two different things, in my mind.
To change the circumscription of "bus" implies that the set of organisms included/implied within circumscription has either expanded or contracted. Doing so may inspire a taxonomist to synonymize genera, or to move a species circumscription from within one genus circumscription to another genus circumscription. But the act of doing either of these things (which directly affects the circumscriptions of the genera involved), does not have any impact on the circumscription of the included species.
The point is, the species may change genera as a result of re-defining the species circumscription; but the mere act of moving a species from one genus to another does not, by itself, affect the species circumscription (but it certainly does affect the circumscriptions of the affected genera). Thus, a change to the "GenusName" metadata for the taxonID of a species does not require a new taxonID. But there are occasions where a change in a species circumscription (and hence the need for a new taxonID) will also (coincidentally) involve a change to the "GenusName" metadata for one or more of the affected taxonIDs.
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Gregor Hagedorn Sent: Monday, July 05, 2010 3:55 AM To: David Remsen (GBIF) Cc: tdwg-content@lists.tdwg.org Mailing List; Kevin Richards Subject: Re: [tdwg-content] Taxon Concept dilemma
Higher taxonomy
Is it generally agreed that higher taxonomy does not, in itself, impact the concept (circumscription) and therefore different classifications of a taxon are not criteria for a concept
identifier
change?
Yes and no, and I think this is the trouble with determining whether the circumscription was changed or not (your point 2 or 3).
In many cases, transferring a species into a new genus does implicitly change the description, because a set of previously unrecorded or misinterpreted characters has been studied and the genus description is now implicitly correct for these transferred species.
It may be that the diagnostic characters are still the same, it may be that previous option to misapply a name (previously undetected) now no longer apply.
For example, conidiogenesis in imperfect fungi is now routinely observed, and transferring a species into a genus with defined conidiogenesis in the last 50 years generally meant that the range of potential identification (correct or misapplied) was drastically changed. This occurs on all higher levels, not only but including genus.
I have therefore doubts whether the distinction between 2 and 3 should be made a priori; a system where a name change requires a new ID and where the reasoning is based on these IDs could be more flexible to adjust.
Gregor _______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
While I certainly agree there are many cases where both things happen in the same publication (i.e., the circumscription for the species "bus" changes, at the same time "bus" is moved from the genus "Aus" to the genus "Xus"); I see these as two unrelated things.
I agree in theory. However, in practice, this is hard to distinguish. A clear case is that a replacement epithet does NOT change circumscription. But we already disagree about merging two species: I don't think it necessarily changes the circumscription; it may only correct a previous oversight (i.e. that one circumscription is a subset of the other). But who in practice establishes whether the larger circumscription is the older or the younger name?
My main concern is however changes of the higher taxon. If a species was originally placed in a genus circumscribed as leaf spots producing elliptical spores on a specific host plant, and is moved into a genus with defined conidiogenesis, we have changed the circumscription, and previous "identifications" may now be misapplied names.
The central point is that in practice, significant experience and knowledge about past practices is required, to determine whether a new taxon ID should be given or not. Computer typists will get it wrong and experts may disagree about this.
My conclusion is that if every change in genus name is given a new ID, the system is manageable after assignment of ID. Computers can synonymize IDs (sameAs), but they cannot retrospectively split them. :-)
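The asymmetry described here (IDs can be merged after the fact, but not split) can be sketched with a union-find structure. This is only an illustration: the class name `SameAsIndex` and the `taxon:N` identifiers are hypothetical, not part of any TDWG or DwC specification.

```python
class SameAsIndex:
    """Union-find over taxon identifiers. It supports merging (sameAs) only;
    there is deliberately no operation for splitting an ID once issued."""

    def __init__(self):
        self._parent = {}

    def find(self, taxon_id):
        # An ID seen for the first time is its own representative.
        self._parent.setdefault(taxon_id, taxon_id)
        root = taxon_id
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited ID straight at the root.
        while self._parent[taxon_id] != root:
            self._parent[taxon_id], taxon_id = root, self._parent[taxon_id]
        return root

    def assert_same_as(self, a, b):
        # Record a sameAs assertion between two previously separate IDs.
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a, b):
        return self.find(a) == self.find(b)


index = SameAsIndex()
# Two IDs issued "when in doubt" are later judged to be the same thing.
index.assert_same_as("taxon:1", "taxon:2")
```

If a single ID had been shared by both usages from the start, nothing in the data would record which usages belong to which side of a later split.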
Gregor
Hi Gregor,
I agree in theory. However, in practice, this is hard to distinguish. A clear case is that a replacement epithet does NOT change circumscription.
I don't think there is any need to distinguish anything; I think it's purely a function of logic. *If* we define the "essence" of what a taxonID represents as the circumscribed set of individuals (living, dead, and yet-to-be-born) implied to be contained within the taxon circumscription, then it seems to me that a new taxonID is needed only when that circumscribed set changes (expands or shrinks). Because taxa are nested sets, the placement of a circumscribed set from one parent set into another parent set doesn't affect the contents of the child set, and hence (in this definition of what is represented by a taxonID), doesn't trigger a need for a new taxonID.
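A minimal sketch of that rule, assuming (unrealistically) that the circumscribed set can be listed explicitly. The `Taxon` record, its field names and the specimen labels are hypothetical stand-ins; a real circumscription includes individuals that cannot be enumerated.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Taxon:
    taxon_id: str
    name: str
    parent_name: str
    members: frozenset  # hypothetical stand-in for the circumscribed set


def needs_new_taxon_id(before, after):
    # Only a change in the contents of the set matters under this definition;
    # name and parent are treated as metadata.
    return before.members != after.members


aus_bus = Taxon("t1", "Aus bus", "Aus", frozenset({"specimen-1", "specimen-2"}))
# Moved to another genus: same set of individuals, different parent.
moved = replace(aus_bus, name="Xus bus", parent_name="Xus")
# Circumscription expanded: one more individual is included.
widened = replace(aus_bus, members=aus_bus.members | {"specimen-3"})
```

Here `needs_new_taxon_id(aus_bus, moved)` is false and `needs_new_taxon_id(aus_bus, widened)` is true.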
I think the crux of the issue hinges on the premise of the above; that is: *are* we defining the "thing" represented by a taxonID as the circumscribed set of individuals? If so, then I think we can be confident that a new taxonID is only needed when the contents of the circumscribed set change. But I'm not sure we all agree that's what a taxonID is intended to represent.
But we already disagree about merging two species: I don't think it necessarily changes the circumscription; it may only correct a previous oversight (i.e. that one circumscription is a subset of the other). But who in practice establishes whether the larger circumscription is the older or the younger name?
I think we are both in agreement on this (I mis-spoke/mis-wrote in my reply to Dave). I think it depends on whether the merge changes the implied members of the circumscribed set. For example:
Suppose Smith describes Aus bus to represent all the individuals of a population in the Marshall Islands.
Then Jones comes along and describes Aus cus to represent all the individuals of a population in the Hawaiian Islands, and he lists all the morphometric differences that distinguishes his new Aus cus from Smith's Aus bus.
Then Brown comes along and decides that Aus cus is a junior synonym of Aus bus, and that Aus bus occurs in both the Hawaiian Islands and the Marshall Islands.
Clearly, in this case, we have three taxon concepts (= three different circumscribed sets of individuals, and need three taxonIDs): One for Aus bus sensu stricto (of Smith), one for Aus cus sensu stricto (of Jones), and one for Aus bus sensu lato (of Brown).
Now, suppose instead that Jones was unaware of Smith's description of Aus bus from the Marshall Islands, and he describes Aus cus based on material from the same Marshall Islands population on which Smith based his description.
In this case, I agree with you -- we only have one circumscribed set of organisms, and hence need only one taxonID to represent it, despite the fact that two protonymIDs (=two type specimens) exist within that set.
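The two scenarios above can be sketched by keying taxonIDs on the circumscribed set rather than on the name. The function `assign_taxon_ids` and the `taxon-N` ID format are hypothetical, and each circumscription is reduced to a set of population labels purely for illustration.

```python
def assign_taxon_ids(usages):
    """Give each (author, name) usage an ID shared by identical circumscriptions."""
    ids_by_set = {}
    result = {}
    for author, name, populations in usages:
        key = frozenset(populations)
        if key not in ids_by_set:
            # First time this exact set of populations is seen: mint a new ID.
            ids_by_set[key] = f"taxon-{len(ids_by_set) + 1}"
        result[(author, name)] = ids_by_set[key]
    return result


# First scenario: three different circumscribed sets.
case_one = assign_taxon_ids([
    ("Smith", "Aus bus", {"Marshall Islands"}),
    ("Jones", "Aus cus", {"Hawaiian Islands"}),
    ("Brown", "Aus bus", {"Marshall Islands", "Hawaiian Islands"}),
])

# Second scenario: Jones re-describes the same Marshall Islands population.
case_two = assign_taxon_ids([
    ("Smith", "Aus bus", {"Marshall Islands"}),
    ("Jones", "Aus cus", {"Marshall Islands"}),
])
```

The first scenario produces three distinct IDs, the second a single ID shared by two names.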
Unfortunately, a common situation is more along the lines of the following:
Smith describes Aus bus to represent all the individuals of a population in the Marshall Islands.
Then Jones comes along and describes Aus cus to represent all the individuals of a population in the Hawaiian Islands, and he is *completely unaware* of the existence of Smith's Aus bus.
Then Brown comes along and decides that the populations in Hawaii and Marshall Islands are *clearly* and *unambiguously* the same species, so synonymizes the two names.
In this case, how many circumscriptions are involved? It all hinges on my use of the word "implied" in the notion of the "implied members of the circumscribed set".
This is why the only way we're going to be able to establish RelationshipAssertions (sensu TCS) is via third-party assertions. In other words, someone is going to have to assert an opinion over whether the implied members of Smith's Aus bus would have included the population in Hawaii, and whether the implied set of Jones' Aus cus would have included the population in the Marshall Islands.
In any case, the point is I think you are right -- the synonymization of two names does not necessarily imply that a new taxonID is needed.
As to your comment:
My main concern is however changes of the higher taxon. If a species was originally placed in a genus circumscribed as leaf spots producing elliptical spores on a specific host plant, and is moved into a genus with defined conidiogenesis, we have changed the circumscription, and previous "identifications" may now be misapplied names.
Yes -- but the new taxonID is not needed because the genus name changed; the new taxonID is needed because the implied set of members of the child set changed (due to the new character-based definition). In other words, the act of changing the genus didn't change the circumscription scope; the change in the character definition of the child taxon changed it. Even if it seems these are the same thing (i.e., that the change in character-based definition was triggered by the genus change), they are still different logical actions.
The central point is that in practice, significant experience and knowledge about past practices is required, to determine whether a new taxon ID should be given or not. Computer typists will get it wrong and experts may disagree about this.
I agree that we require significant experience and knowledge about past practices/etc. to make assertions about the logical relationships between two circumscribed sets represented by two taxonID values. But not for the reason you suggest -- rather, for the reason I described above.
My conclusion is that if every change in genus name is given a new ID, the system is manageable after assignment of ID. Computers can synonymize IDs (sameAs), but they cannot retrospectively split them. :-)
Right -- and this comes back to my fundamental problem with the notion of a taxonID. In order for it to be useful, we need a very clear and shared understanding of the entity that is represented by the taxonID. What we need to do is atomize the IDs -- and this is best done through the TaxonNameUsage object (which is relatively simple to define). So in answer to your last quote above, in my mind *every* TNU is a "potential taxon" (sensu Berendsohn), so *these* are the IDs we should be using and re-using when we define taxon concepts. My vague understanding of a taxonID is that it is established to represent a set of tnuIDs. If it were that simple, then taxonID really just becomes a subclass of tnuID. However, I don't think it's that simple, because the definition of a taxonID is not limited to simply sets of tnuIDs -- it can also be established through member-specimens, morphological characters, genetic characters, etc. Although I think it's logically possible to inherit all these things *through* tnuIDs (after all, anytime anyone ever asserts that a taxon is defined by a set of specimens or characters, a new tnuID is born); we're a long way from having all that sort of information parsed and digitized (as per Kevin's earlier comment).
So, for purely *practical* reasons, I think the main justification for even defining "taxonID" (as more than just a subclass of tnuID), is that our content base for tnuID's is still small, and the inheritance of things like specimens and characters through TNU instances is not yet well-developed. For this reason -- alluded to by Kevin -- I see a practical value to having a taxonID.
But we *still* need to define what it is!
Aloha, Rich
Kevin et al,
I too believe it is important that we keep our taxonomic events distinct. Taxonomy goes to the trouble to work this way so we should at least try to capture events at this minimum level of granularity. Our ability to handle future taxonomic change depends upon it. The power in the name usage model derives from the simplicity of persistent management of taxonomic events at the level of our most basic reusable components. Aggregation of concepts can be just that: with new facts, arrangements or placements documented by subsequent events linked to pre-existing taxonomic entities by assertion - same as, congruent to ... just like the real world. The key here is taxon[event]Id.
Our systems of nomenclature are designed to provide stability in naming in the face of ongoing taxonomic change. In this codified scientific system revisionary treatments circumscribe taxa without prejudice and names are reapplied according to the distribution and priority of types. Nomenclatural re-use should not be confused therefore with taxonomic equivalence and the fortuitous fact that this is often the case does not provide the necessary licence to presume existence of nomenclatural (objective) species when it comes to documenting these events.
If scientific support of taxonomy is a requirement of our systems then persistent identifiers for persistent taxonomic statements are mandatory. And for online treatments, at least, subsequent revision at any level needs be similarly identified and linked from what came before - ideally, reusing by citation, existing profile and classification elements. The simple advantage of doing this is always being up-to-date; of being scientifically useful and semantically meaningful; of providing stability and forward reference for value-added effort ... credibility even, in the face of taxonomic change.
We must assume, as Gregor has noted, that in the absence of explicit statements to the contrary all taxonomic events are unique ... and if they aren't, we simply have two that are the same.
Greg
Excellent summary, Greg!
We must assume, as Gregor has noted, that in the absence of explicit statements to the contrary all taxonomic events are unique ... and if they aren't, we simply have two that are the same.
Yes. This turns my vague efforts of gaining an operational perspective into an actual operationally useful instruction to those who have to assign IDs. :-) When in doubt, a (potentially) duplicate ID must be assigned!
Gregor
It sounds like we're in agreement on what needs to be done! It's just a semantic problem of labelling the identifier we use to represent this granular "thing". The several elements in DwC that include the word "Usage" are examples of these identifiers; but what is (still) not clear to me is what taxonID is intended for.
Rich
Well, before scientificNameID and in particular taxonConceptID were added to Darwin Core, taxonID was intended to be a flexible identifier for a taxonomic or nomenclatural unit of your choice. Just as occurrenceID doesn't prescribe whether we are talking about a specimen, an observation or something else, the taxonID was just a primary key to a scientific-name-related "object". I'm deliberately trying not to use taxon, name, concept or name usage here, as any of these were intended to be identifiable with the taxonID. As "taxon"ID seems, for taxonomists, to refer only to taxa and not names, we had a problem finding an adequate name and, as you might remember, changed the terminology quite a few times (usageID, nameID, speciesID, nameUsageID) before finally ending up with taxonID again. But now that there is also a taxonConceptID and scientificNameID, things are harder to define, I suppose.
Markus
One other way to say it is that every Darwin Core Class (Occurrence, Location, Event, Taxon, Identification...) has an identifier. taxonID is no more than an identifier for the Taxon Class.
Actually this sounds very similar to ontology evolution best practices. For example, for OBO ontologies*, changes of a class label do not trigger an ID change. Changing the class definition (except for typographical corrections), however, does, even if the label stays the same. I.e., the ID is for the definition, not the label (term name). If classes are moved to a different parent, it will trigger an ID change too because it will inevitably mean a change of the definition.
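As a rough sketch of the rule as stated in this message (not of any actual OBO tooling), the decision could look like the following. The `OntologyClass` record, the `typo_fix_only` flag and the example strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyClass:
    label: str
    definition: str
    parent: str


def id_change_required(old, new, typo_fix_only=False):
    # Re-parenting is treated as a change of definition.
    if old.parent != new.parent:
        return True
    # A reworded definition needs a new ID unless a curator has flagged the
    # edit as a purely typographical correction.
    if old.definition != new.definition and not typo_fix_only:
        return True
    # A label change alone never changes the ID.
    return False


original = OntologyClass("leaf spot", "A lesion on a leaf.", "lesion")
relabelled = OntologyClass("leaf-spot", "A lesion on a leaf.", "lesion")
redefined = OntologyClass("leaf spot", "A necrotic lesion on a leaf.", "lesion")
reparented = OntologyClass("leaf spot", "A lesion on a leaf.", "symptom")
```

Whether an edit is "only typographical" is a human judgement, which is why it is passed in as a flag rather than computed.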
Just a thought that occurred to me - not sure this is useful here :-)
-hilmar
(*) I don't know actually whether such ontology engineering best practices exist for the TDWG ontologies too.
On Jul 5, 2010, at 9:31 AM, David Remsen (GBIF) wrote:
Is it worth listing properties that do or do not impact the concept?
Scientific Name presents at least three cases:
1. The name doesn't change at all but the circumscription does, warranting a concept change (a new taxonID).
If you recall the case of Vireo solitarius I used in the 13 June mail, the same name refers to different circumscriptions. One method for distinguishing these is based on the (to borrow from Rich's terminology) "protonym count" of the material included in the circumscription. There are some problems with this (unintended or temporally-based omissions) but it provides one pretty good basis for making gross comparisons.
2. The name changes but this, in itself, does not warrant a concept change (a new taxonID) because the circumscription does not change.
When a species is transferred from one genus to another, there is no direct impact on circumscription. Thus, if two names share a common protonymID, there may not be a need to change the taxonID. In this case, we are back to 1 above.
3. The name changes and this, in itself, requires a new taxonID.
If a taxon is renamed and the new name does not share the same protonymID as the previous name, it should be given a new concept ID. There may be exceptions to this (for replacement names perhaps?) but surely the rule of priority would only result in this case if the circumscription (concept) changed.
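The three cases can be sketched as a single decision function. This is an illustration under stated assumptions: the `Usage` record and its fields are hypothetical, and the circumscription is approximated by the set of protonymIDs included in it (the "protonym count" idea above), which has the omissions problem already noted.

```python
from collections import namedtuple

# Hypothetical record: the name in use, the protonym that name is based on,
# and the set of protonymIDs included in the circumscription.
Usage = namedtuple("Usage", ["name", "protonym_id", "included_protonyms"])


def new_taxon_id_needed(old, new):
    if old.included_protonyms != new.included_protonyms:
        return True  # case 1: circumscription changed, whatever the name does
    if old.name != new.name and old.protonym_id != new.protonym_id:
        return True  # case 3: renamed to a name based on a different protonym
    return False     # case 2 (e.g. genus transfer), or nothing changed


before = Usage("Aus bus", "p1", frozenset({"p1"}))
genus_transfer = Usage("Xus bus", "p1", frozenset({"p1"}))
lumped = Usage("Aus bus", "p1", frozenset({"p1", "p2"}))
renamed = Usage("Aus dus", "p3", frozenset({"p1"}))
```

The replacement-name exception mentioned above is not handled here; under this sketch a replacement name would fall into case 3.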
Higher taxonomy
Is it generally agreed that higher taxonomy does not, in itself, impact the concept (circumscription) and therefore different classifications of a taxon are not criteria for a concept identifier change?
- David
On Jul 5, 2010, at 3:53 AM, Markus Döring wrote:
Rich, Kevin, as one of the main supporters of the taxonID: to me, the final decision about what stable piece of information it reflects sits with the publisher/author of the data. The problem Kevin describes is very common and comes down to the difficulty of describing what information really is stable in taxonomy. The name may change, as may the classification, the textual description and the distribution, and it's probably impossible to tell automatically whether anything significant has changed or only small "corrections" have been made. Even for humans it's a challenge to compare several textual descriptions and decide whether they describe the same thing or not. I guess this is where some people see the conceptID coming into play to give some long-term stability, but I still don't see much gain in another ID if you cannot tell which pieces of the information behind that ID are stable over time. I still believe that with scientificNameID and taxonID we are well equipped to deal with our data. I tend to think that whoever publishes the data should think carefully about how they assign and change taxonIDs, and should try to announce what they consider stable, what is versioned, and what can change at any time without changing the meaning, i.e. the ID. But I would assume this might differ between databases focusing on different aspects of taxonomy.
For purely automatically aggregated data in Checklist Bank I decided to assign stable taxonIDs to entire lexical groups of names that have at least one qualified name, i.e. one with proper authorship. The classification, the spelling and the authorship of the preferred, representative name for that group might change, but at least you have some definition of stability, which is really hard to define. For lexical name groups that contain only bare canonical names I am assigning volatile IDs right now, as these might change dramatically with more knowledge about them. I don't know if this could also apply to Kevin's problem: do you treat the same qualified name in multiple concepts? If that's the case then surely you need something else to define a stable taxonID.
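A rough sketch of that stable-versus-volatile policy follows. It is not Checklist Bank's actual code: the function `assign_group_id`, the `stable:`/`volatile:` prefixes, the dictionary layout of a name and the example names are all assumptions made for illustration.

```python
import uuid


def assign_group_id(group_key, names, stable_ids):
    """Return (taxon_id, is_stable) for one lexical group of names.

    `names` is a list of dicts with an optional "authorship" entry;
    `stable_ids` is a dict that persists stable IDs between aggregation runs.
    """
    if any(name.get("authorship") for name in names):
        # At least one qualified name: reuse the ID issued earlier, if any.
        if group_key not in stable_ids:
            stable_ids[group_key] = f"stable:{uuid.uuid4()}"
        return stable_ids[group_key], True
    # Only bare canonical names: issue a throwaway ID on every run.
    return f"volatile:{uuid.uuid4()}", False


registry = {}
qualified = [{"canonical": "Aus bus", "authorship": "Smith, 1900"},
             {"canonical": "Aus bus"}]
bare = [{"canonical": "Aus cus"}]

first_run = assign_group_id("aus bus", qualified, registry)
second_run = assign_group_id("aus bus", qualified, registry)
bare_first = assign_group_id("aus cus", bare, registry)
bare_second = assign_group_id("aus cus", bare, registry)
```

The qualified group keeps the same ID across runs, while the bare canonical group receives a fresh ID each time.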
Markus
On Jul 5, 2010, at 9:26, Richard Pyle wrote:
This is why I'm very uncomfortable with the entire notion of "taxonID". The main reason I'm pushing so hard for taxonNameUsageIDs (a la GNUB) is that these are the "atoms" (as Dave R. calls them) of both nomenclature *and* most existing concept definitions. If we can get permanent and widely shared/re-used IDs on these "atoms", then we can assemble the complex molecules from them. Someone's notion of a taxon concept then becomes a set of TNUIDs. I have mixed feelings about branding these sets with permanent GUIDs; but if we did, this is what I imagine taxonID in DwC would (ultimately) represent. If we want to archive the sets for posterity, then we can certainly brand them with IDs. But I tend to think these can instead be dynamic services, that assemble the sets either algorithmically, or through the fingertips of experts.
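A minimal sketch of a concept as a set of TNU identifiers, assembled on demand rather than stored under its own GUID. The `TaxonNameUsage` fields and the selection rule (include every usage whose protonym an expert has accepted) are assumptions for illustration; GNUB's actual data model may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonNameUsage:
    tnu_id: str
    protonym_id: str
    name: str
    reference: str


def assemble_concept(usages, accepted_protonyms):
    """Return the concept as a frozenset of TNU IDs, computed when asked for."""
    return frozenset(
        usage.tnu_id for usage in usages
        if usage.protonym_id in accepted_protonyms
    )


usages = [
    TaxonNameUsage("tnu-1", "p1", "Aus bus", "Smith"),
    TaxonNameUsage("tnu-2", "p2", "Aus cus", "Jones"),
    TaxonNameUsage("tnu-3", "p1", "Aus bus", "Brown"),
]

# An expert who treats Aus cus as a synonym of Aus bus accepts both protonyms.
lumped = assemble_concept(usages, {"p1", "p2"})
# An expert who keeps them apart accepts only the protonym of Aus bus.
split = assemble_concept(usages, {"p1"})
```

Because the result is a plain set, two concepts can be compared for equality, overlap or containment without either one carrying a permanent identifier.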
So...I guess before we do anything, we need to get a common sense for what is intended to be represented by taxonID. I suspect my own view is not shared by all (or even most).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org ] On Behalf Of Kevin Richards Sent: Sunday, July 04, 2010 5:44 PM To: tdwg-content@lists.tdwg.org Subject: [tdwg-content] Taxon Concept dilemma
Hello all,
I have an issue that I would like some comment on…
We have some data that covers Taxa, Names and Concept relationships. Eg
- A Taxon table that contains the nomenclatural details + accepted name + parent name
- Concept + relationship tables that contain details about the name + references where the name has been used in a taxonomic sense (ie not nomenclatural information) – this is specifically a link between the Name and a Reference
We have fairly permanent Ids for the Taxon Name (nomenclatural) and the Concepts, but I now want to consider the ID to cover the whole Taxon (ie the Nomenclatural data + taxon rank + parent name + accepted name, etc, as “we” understand them). (Probably equivalent to the taxonID in Dwc)
The problem is this tends to be much more dynamic data – ie, in this particular case we have aggregated data from a variety of providers and are in continual revision of this data - as we revise the data the details such as the accepted name may change – this troubles me a bit, because this could be seen as fundamentally changing the definition of the object behind the taxonID. However, I suspect this is a common case that people find themselves in – ie revision/tidying of aggregated datasets must be quite common.
I would prefer to NOT change the taxonID every time we revise that data (taking the angle that these changes are corrections, so are not changing the object itself). Should it be OK to have an object type like this, that is likely to change, but keep the ID permanent for it – ie accept that some object types are quite dynamic?
The only other option is to maintain a hideous version audit trail, that probably hinders the use of the data more than it benefits the end user by providing “stability”.
Any thoughts?
Kevin
On Jul 5, 2010, at 4:59 PM, Richard Pyle wrote:
Hi Hilmar,
If classes are moved to a different parent, it will trigger an ID change too because it will inevitably mean a change of the definition.
Do you mean it triggers an ID change to the parent, or to the child, or both?
The child. In an ontology, a class' definition is (or at least ought to be) independent of its children.
However, a class being moved to another parent might be triggered by the definition of that new parent having changed. In that case, obviously both IDs would change.
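One way to picture the rule described above, as a sketch only: assume a made-up scheme in which a class ID is computed from the class's own definition plus its parent's ID, and never from its children. The function `class_id` and the definition strings are hypothetical and do not come from any particular ontology toolkit.

```python
import hashlib


def class_id(definition: str, parent_id: str) -> str:
    """Hypothetical ID scheme: the ID covers a class's own definition and its parent."""
    payload = f"{parent_id}\n{definition}"
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:10]


# Two top-level classes with placeholder definitions.
root_a = class_id("anything in branch A", parent_id="")
root_b = class_id("anything in branch B", parent_id="")

child_under_a = class_id("child definition", root_a)
# Moving the child to another parent changes the child's ID ...
child_under_b = class_id("child definition", root_b)
# ... while both parents keep theirs, because no child appears in their payload.

print(child_under_a != child_under_b)
print(root_a == class_id("anything in branch A", parent_id=""))
```

If the new parent's definition had itself been rewritten, its ID would change as well, and the child's ID would change with it.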
-hilmar
Hi Dave,
I agree with #1 and #2, but I can't think of an example for #3 that doesn't fall into your #2 (assuming we're in agreement that taxonID = circumscription).
You already mentioned the circumstance of a replacement name, whereby the protonymID changes but the circumscription remains the same. Another example is that a nomenclaturist might discover an older name than the one most people use for the concept. In that case, the circumscription remains the same (because if the older name replaces the current name, by definition the protonymID for the older name falls within the circumscription represented by the current name), but the older ("senior") protonymID (and hence the terminal epithet) changes. Thus, in this case, the name changes but the circumscription remains the same (your #2).
A more common example of what I think you mean by #3 is that the circumscription represented by protonymID 1 is subjectively synonymized with the circumscription represented by protonymID 2, which results in a new circumscription (the union of the original two) that didn't previously exist, and hence requires a new taxonID. I agree that a new taxonID is needed, but this isn't so much a case where a new taxonID is needed *because* a name changed. Rather, a new concept was defined (causing the need for a new taxonID), and as a result the name that was originally applied to one of the concepts (i.e., that represented by protonymID 1) changes. Stated another way: the set of organisms included in the original circumscription represented by protonymID 1 is now represented by protonymID 2. Of course, the original set of organisms contained within the original circumscription represented by protonymID 2 is still represented by protonymID 2.
But the point is, I can't see any example of where a name change *causes* the need for a new taxonID. I only see a (partial) name change as the *result* of a newly defined (combined) taxon concept. Maybe this is just semantics; but I would rephrase your #3 as "Two different concepts represented by two different names are synonymized, causing the need for a new taxonID, which is represented by only one of the original two names."
I definitely agree with your point about the higher taxonomy (including genus placement) not affecting the circumscription; and hence if taxonID=circumscription, then there would be no need for a new taxonID when the higher classification or genus combination changes. But I'm not sure everyone agrees with this.
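The synonymy case can be made concrete with a small sketch. It assumes, purely for illustration, that a circumscription is represented by the set of protonymIDs it contains and that the taxonID is derived from that set; `taxon_id` and the `protonym:` strings are hypothetical.

```python
def taxon_id(circumscription: frozenset) -> str:
    """Hypothetical: derive the taxonID from the protonymIDs inside the circumscription."""
    return "taxon:" + "+".join(sorted(circumscription))


concept_1 = frozenset({"protonym:1"})
concept_2 = frozenset({"protonym:2"})

# Subjective synonymy: the new circumscription is the union of the original two.
merged = concept_1 | concept_2

# A genus transfer or reclassification leaves the set untouched, so nothing changes.
reclassified = frozenset(concept_2)

print(taxon_id(merged))
print(taxon_id(reclassified) == taxon_id(concept_2))
```

The merged set did not exist before, so it gets an ID that matches neither original, whichever of the two names ends up being applied to it.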
Rich
-----Original Message----- From: David Remsen (GBIF) [mailto:dremsen@gbif.org] Sent: Monday, July 05, 2010 3:32 AM To: Markus Döring Cc: David Remsen (GBIF); Richard Pyle; Kevin Richards; tdwg-content@lists.tdwg.org Mailing List Subject: Re: [tdwg-content] Taxon Concept dilemma
Is it worth listing properties that do or do not impact the concept?
Scientific Name presents at least three cases:
1. The name doesn't change at all but the circumscription does, warranting a concept change (a new taxonID).
If you recall the case of Vireo solitarius I used in the 13 June mail, the same name refers to different circumscriptions. One method for distinguishing these is based on the (to borrow from Rich's terminology) "protonym count" of the material included in the circumscription. There are some problems with this (unintended or temporally-based omissions) but it provides one pretty good basis for making gross comparisons.
2. The name changes but this, in itself, does not warrant a concept change (a new taxonID) because the circumscription does not change.
When a species is transferred from one genus to another, there is no direct impact on circumscription. Thus, if two names share a common protonymID, there may not be a need to change the taxonID. In this case, we are back to 1 above.
3. The name changes and this, in itself, requires a new taxonID.
If a taxon is renamed and the new name does not share the same protonymID as the previous name, it should be given a new concept ID. There may be exceptions to this (for replacement names perhaps?) but surely the rule of priority would only result in this case if the circumscription (concept) changed.
Higher taxonomy
Is it generally agreed that higher taxonomy does not, in itself, impact the concept (circumscription) and therefore different classifications of a taxon are not criteria for a concept identifier change?
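The cases listed above could be turned into a small decision helper. This is a sketch under the assumption that taxonID follows the circumscription and that a circumscription can be compared by the protonyms it includes; `Usage`, `classify_change` and all names and IDs below are placeholders, not real records.

```python
from typing import NamedTuple


class Usage(NamedTuple):
    scientific_name: str            # name string as currently applied
    included_protonyms: frozenset   # hypothetical protonymIDs inside the circumscription


def classify_change(old: Usage, new: Usage) -> str:
    same_name = old.scientific_name == new.scientific_name
    same_scope = old.included_protonyms == new.included_protonyms
    if same_scope:
        # Covers genus transfers and other renames that leave the scope alone.
        return "keep taxonID"
    if same_name:
        return "new taxonID (same name, different circumscription)"
    return "new taxonID (circumscription changed along with the name)"


before = Usage("Aus bus", frozenset({"p:1"}))
print(classify_change(before, Usage("Xus bus", frozenset({"p:1"}))))
print(classify_change(before, Usage("Aus bus", frozenset({"p:1", "p:2"}))))
print(classify_change(before, Usage("Xus cus", frozenset({"p:1", "p:2"}))))
```

Higher classification is deliberately absent from `Usage`, which matches the position that it should not affect the identifier.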
- David
On Jul 5, 2010, at 3:53 AM, Markus Döring wrote:
Rich, Kevin, as one of the main supporters of the taxonID: to me, the final decision about which stable piece of information it reflects sits with the publisher/author of the data. The problem Kevin describes is very common and comes down to the difficulty of describing what information really is stable in taxonomy. The name may change, as may the classification, the textual description and the distribution, and it's probably impossible to tell automatically whether anything significant has changed or whether only small "corrections" have been made. Even for humans it's a challenge to compare several textual descriptions and decide whether it's the same thing or not. I guess this is where some people see the conceptID coming into play to give some long-term stability, but I still don't see much gain in another ID if you cannot tell which pieces of the information behind that ID are stable over time. I still believe that with scientificNameID and taxonID we are well equipped to deal with our data. I tend to think someone who publishes the data should think carefully about how they assign and change taxonIDs, and try to announce what they consider stable, what is versioned and what can change at any time without changing the meaning, i.e. the ID. But I would assume this might differ between databases focussing on different aspects of taxonomy.
For purely automatically aggregated data in Checklist Bank I decided to assign stable taxonIDs to entire lexical groups of names that have at least one qualified name, i.e. one with proper authorship. The classification, the spelling and the authorship of the preferred, representative name for that group might change, but at least you have some definition of stability, which is really hard to define. For lexical name groups that only contain bare canonical names I am assigning volatile IDs right now, as these might change dramatically with more knowledge about them. I don't know if this could also apply to Kevin's problem: do you treat the same qualified name in multiple concepts? If that's the case then surely you need something else to define a stable taxonID.
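The stable-versus-volatile rule for lexical groups might look something like the sketch below. It is not Checklist Bank code: `NameString`, `assign_id`, the ID format and the example names and authorship are all invented for illustration.

```python
import uuid
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NameString:
    canonical: str
    authorship: str = ""   # empty means a bare canonical name


def is_qualified(name: NameString) -> bool:
    return bool(name.authorship.strip())


def assign_id(group: List[NameString]) -> Tuple[str, str]:
    """Return (id, stability) for one lexical group; the ID format is made up."""
    stability = "stable" if any(is_qualified(n) for n in group) else "volatile"
    return f"{stability}:{uuid.uuid4()}", stability


# Placeholder names: one group has a name with authorship, the other does not.
with_author = [NameString("Aus bus"), NameString("Aus bus", "Smith, 1900")]
bare_only = [NameString("Aus bus")]

print(assign_id(with_author)[1])
print(assign_id(bare_only)[1])
```

The point of carrying the stability flag alongside the ID is that consumers can see which identifiers the publisher is prepared to stand behind.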
Markus
Hi Rich
I think I didn't select my words carefully enough, and in fact your description is in line with what I was saying and is clearer.
DR
I completely agree with you Rich about the 'atoms', and nailing those down first. This is the ideal situation of course, and could, I suspect, be applied to all name datasets that exist if we persisted enough. There are some cases where they don't have any details about the 'atoms', in which case I suppose a huge effort would be required on behalf of specialists to untangle the mess, but it is doable.
I think it would be too restrictive to not apply IDs to the "sets of TNU IDs" though. Especially as this "set" is what is used in a lot of cases to relate to things like distribution, biostatus, vernacular names, etc.
Perhaps we need a general rule though to not allow a Taxon without a base TNU :-)
I am also fairly uncomfortable with applying IDs to the reasonably dynamic Taxon, but so long as we tie them to TNUs, we will always have the building blocks to work with.
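The suggested rule (no Taxon without a base TNU) and the idea of a permanent ID on a revisable record could be sketched as below. The class `Taxon`, its field names and the sample values are assumptions made for illustration, not an existing schema.

```python
from dataclasses import dataclass


@dataclass
class Taxon:
    taxon_id: str              # kept permanent, even when details are revised
    tnu_ids: frozenset         # the TNU building blocks the taxon is tied to
    accepted_name: str = ""    # revisable without touching taxon_id

    def __post_init__(self) -> None:
        # Enforce the rule: a Taxon must rest on at least one TNU.
        if not self.tnu_ids:
            raise ValueError("a Taxon needs at least one base TNU")


t = Taxon("taxon:1", frozenset({"tnu:1"}), accepted_name="Aus bus")
t.accepted_name = "Xus bus"    # a correction; the ID stays the same
print(t.taxon_id)

try:
    Taxon("taxon:2", frozenset())
except ValueError as err:
    print(err)
```

Because every Taxon carries its TNU IDs, a consumer who distrusts the dynamic record can always fall back to the underlying usages.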
Kevin
participants (9)
- David Remsen (GBIF)
- greg whitbread
- Gregor Hagedorn
- Hilmar Lapp
- John Wieczorek
- Kevin Richards
- Markus Döring
- Richard Pyle
- Tim Robertson (GBIF)