Taxon Concept dilemma

Kevin Richards

4 Jul 2010

20:43

Hello all,

I have an issue that I would like some comment on...

We have some data that covers Taxa, Names and Concept relationships. E.g.:

- A Taxon table that contains the nomenclatural details + accepted name + parent name
- Concept + relationship tables that contain details about the name + references where the name has been used in a taxonomic sense (i.e. not nomenclatural information). This is specifically a link between the Name and a Reference.

We have fairly permanent IDs for the Taxon Name (nomenclatural) and the Concepts, but I now want to consider the ID to cover the whole Taxon (i.e. the nomenclatural data + taxon rank + parent name + accepted name, etc., as "we" understand them). This is probably equivalent to the taxonID in DwC.

The problem is that this tends to be much more dynamic data. In this particular case we have aggregated data from a variety of providers and are in continual revision of this data. As we revise the data, details such as the accepted name may change. This troubles me a bit, because it could be seen as fundamentally changing the definition of the object behind the taxonID. However, I suspect this is a common case that people find themselves in, i.e. revision/tidying of aggregated datasets must be quite common.

I would prefer NOT to change the taxonID every time we revise that data (taking the angle that these changes are corrections, so are not changing the object itself). Should it be OK to have an object type like this, that is likely to change, but keep the ID permanent for it, i.e. accept that some object types are quite dynamic?

The only other option is to maintain a hideous version audit trail, which probably hinders the use of the data more than it benefits the end user by providing "stability".

Any thoughts?

Kevin

________________________________
Please consider the environment before printing this email
Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz


Tim Robertson (GBIF)

4 Jul

22:09

Hi Kevin,

It seems fine to me, as long as the intention remains the same. I would imagine slow-changing content behind the object (revisions in higher names, perhaps) and not complete rewrites of the content.

Is there anything inside the Taxon that you can always rely on (something immutable)? E.g. a constant ID from the nomenclatural organisation level, or perhaps some lexical group ID? Perhaps you could somehow select one reference ID from the nomenclatural or concept level as the Taxon ID, and then you could promote the Taxon as a dynamically assembled object with the best representation (at request time) for ConceptX. This is then more analogous to getTaxonFor(conceptX) than getTaxonById(X).

The alternative to what you are suggesting would presumably be creating a new ID for any change, which seems like it would make it difficult to keep anything in sync, unless I can subscribe to and understand your changes constantly.

Cheers,
Tim
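Tim's getTaxonFor(conceptX) idea can be sketched roughly as below. This is only an illustration under stated assumptions: the table names, the ID strings ("name:1", "ref:1", "concept:X") and the `get_taxon_for` function are all hypothetical and do not come from any real service or from Kevin's database. The point it shows is that the stable key is the concept ID, while the Taxon is assembled from whatever the curated data says at request time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    """A taxon view assembled at request time; it has no stored ID of its own."""
    concept_id: str      # the stable key the view was assembled for
    name: str
    rank: str
    accepted_name: str
    parent_name: str


# Hypothetical in-memory tables standing in for the provider's database.
NAMES = {"name:1": {"name": "Aus bus", "rank": "species"}}
CONCEPTS = {"concept:X": {"name_id": "name:1", "reference_id": "ref:1"}}
# Curated, frequently revised details, keyed by the stable concept ID.
CURRENT_VIEW = {"concept:X": {"accepted_name": "Aus bus", "parent_name": "Aus"}}


def get_taxon_for(concept_id: str) -> Taxon:
    """Assemble the current best representation of the taxon for a concept."""
    concept = CONCEPTS[concept_id]
    name = NAMES[concept["name_id"]]
    view = CURRENT_VIEW[concept_id]
    return Taxon(
        concept_id=concept_id,
        name=name["name"],
        rank=name["rank"],
        accepted_name=view["accepted_name"],
        parent_name=view["parent_name"],
    )
```

If the curators later revise `CURRENT_VIEW["concept:X"]`, a second call to `get_taxon_for("concept:X")` returns different details under the same key, so nothing that links to "concept:X" has to be re-synchronised.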

_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content

Richard Pyle

5 Jul

00:36

I think that the basic problem is that, as with a taxon "name", there are as many notions of what is meant by a taxon "concept" as there are people who have uttered the words. I think that Peter D. has a good example of what a taxonID might resolve to. But the question is, which bits of metadata can change under the same taxonID, and which bits would prompt the generation of a new taxonID?

This is the fundamental problem I've always had with creating permanent GUIDs for "taxon concepts" (by anyone's definition, let alone *everyone's* definition). I've had a similar question concerning ITIS TSNs. It's still not entirely clear to me when something generates a new TSN, vs. when something represents a correction or amendment (e.g., via a comment) to an existing TSN.

So, for example, suppose authority "X" has a well-defined and metadata-rich taxonID for their concept of "Aus bus". Later on, they decide that the species "bus" should be moved to the genus "Xus". None of the other metadata (occurrence records, diagnostic characters, associated DNA sequences, etc.) have changed. *Only* the genus placement has changed. Should authority "X" brand a new taxonID for this?

In my mind, no. Because in my mind, the species concept is defined only by the stuff related to "bus". Changing the genus doesn't affect the contents or concept of "bus"; it only changes the concept circumscription for the genus "Xus" (and for "Aus", if it is retained as a valid genus that no longer includes the species "bus").

I doubt, however, that there would be universal agreement about this.

Rich
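Rich's "Aus bus" to "Xus bus" example amounts to a rule for which fields may change under an existing taxonID. A minimal sketch of that one policy follows; the field names and the choice of which fields count as concept-defining are assumptions made for illustration, and, as Rich notes, other people would draw the line elsewhere.

```python
# Fields assumed (for this sketch only) to define the species concept itself.
# Genus placement is deliberately left out, following Rich's view.
CONCEPT_DEFINING_FIELDS = (
    "epithet",
    "occurrence_records",
    "diagnostic_characters",
    "dna_sequences",
)


def needs_new_taxon_id(old: dict, new: dict) -> bool:
    """Return True if any concept-defining field differs between two revisions."""
    return any(old.get(field) != new.get(field) for field in CONCEPT_DEFINING_FIELDS)


# Hypothetical record for authority X's concept of "Aus bus".
before = {
    "genus": "Aus",
    "epithet": "bus",
    "occurrence_records": ("occ:1", "occ:2"),
    "diagnostic_characters": ("char:1",),
    "dna_sequences": ("seq:1",),
}
# The only revision: the species is moved to the genus "Xus".
after = {**before, "genus": "Xus"}

print(needs_new_taxon_id(before, after))  # False: the taxonID is kept
```

Under a stricter policy, adding "genus" to `CONCEPT_DEFINING_FIELDS` would make the same revision mint a new taxonID, which is exactly the disagreement the thread is about.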

Richard Pyle

00:26

This is why I'm very uncomfortable with the entire notion of "taxonID". The main reason I'm pushing so hard for taxonNameUsageIDs (a la GNUB) is that these are the "atoms" (as Dave R. calls them) of both nomenclature *and* most existing concept definitions. If we can get permanent and widely shared/re-used IDs on these "atoms", then we can assemble the complex molecules from them.

Someone's notion of a taxon concept then becomes a set of TNUIDs. I have mixed feelings about branding these sets with permanent GUIDs; but if we did, this is what I imagine taxonID in DwC would (ultimately) represent. If we want to archive the sets for posterity, then we can certainly brand them with IDs. But I tend to think these can instead be dynamic services that assemble the sets either algorithmically or through the fingertips of experts.

So... I guess before we do anything, we need to get a common sense of what is intended to be represented by taxonID. I suspect my own view is not shared by all (or even most).

Rich
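Rich's picture of a concept as a set of TNUIDs could look something like the sketch below. Everything here is hypothetical: the usage records, the "treated_as" field, and in particular the content-derived `archive_id` are assumptions made to illustrate the idea, not anything GNUB or DwC defines. It shows a set assembled algorithmically on request, plus one possible way to brand a set with an ID only when it is archived.

```python
import hashlib
from typing import FrozenSet, Iterable


def assemble_concept(usages: Iterable[dict], accepted_name: str) -> FrozenSet[str]:
    """Dynamically assemble a concept as the set of usage IDs treated under one name."""
    return frozenset(u["tnu_id"] for u in usages if u["treated_as"] == accepted_name)


def archive_id(tnu_ids: FrozenSet[str]) -> str:
    """Derive an ID from the set's members (an assumed scheme, for archiving only)."""
    joined = "\n".join(sorted(tnu_ids))  # sort so member order never matters
    return "tnuset:" + hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


# Hypothetical taxon name usages, each with its own permanent ID.
USAGES = [
    {"tnu_id": "tnu:1", "name": "Aus bus", "treated_as": "Xus bus"},
    {"tnu_id": "tnu:2", "name": "Xus bus", "treated_as": "Xus bus"},
    {"tnu_id": "tnu:3", "name": "Aus cus", "treated_as": "Aus cus"},
]

concept = assemble_concept(USAGES, "Xus bus")
```

With an ID derived from the members, any change in membership yields a different archived ID, while the permanent IDs on the individual usages stay put. That is one reading of Rich's preference for stable "atoms" and dynamic "molecules".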