Re: [tdwg-tag] [tdwg-content] canonicalScientificName
Apart from the fact that I can barely bring myself to care about plants ;) I suspect that the vast majority of names do not present these problems. Why do we let edge cases determine what we do?
Regards
Rod
On 14 Mar 2012, at 19:10, Dmitry Mozzherin wrote:
To add to Gregor's post
Common usage of scientificNameAuthorship assumes that a name is a linear construct, which is true in many cases. In general, however, a scientific name is a tree, and this is most obvious with the names of hybrids. So when people apply solutions such as splitting a name into a canonical form and an authorship, it won't work for everybody and everything.
Dima
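To illustrate the point about names being trees, here is a minimal Python sketch. The class names (`Name`, `HybridFormula`), the `canonical` helper and the taxon names "Aus bus" and "Aus cus" with their authors are all made up for illustration; none of this comes from Darwin Core or from any existing parser.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Name:
    """A single (non-hybrid) name: its epithets plus its own authorship."""
    canonical: str
    authorship: str = ""

    def render(self) -> str:
        return f"{self.canonical} {self.authorship}".strip()


@dataclass
class HybridFormula:
    """A hybrid formula: a node whose two children are themselves names."""
    left: "NameNode"
    right: "NameNode"

    def render(self) -> str:
        return f"{self.left.render()} × {self.right.render()}"


NameNode = Union[Name, HybridFormula]


def canonical(node: NameNode) -> str:
    """Walk the tree and drop the authorship at every leaf."""
    if isinstance(node, Name):
        return node.canonical
    return f"{canonical(node.left)} × {canonical(node.right)}"


# Hypothetical placeholder names, not real taxa.
hybrid = HybridFormula(Name("Aus bus", "Smith"), Name("Aus cus", "(Jones) Brown"))
print(hybrid.render())
print(canonical(hybrid))
```

In the rendered string the first authorship sits in the middle of the name, so a single "canonical part plus trailing authorship" split cannot represent it; the tree has to be walked instead.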
On Wed, Mar 14, 2012 at 2:58 PM, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
- When providers had a text blob for the name, separate from the text blob for the authorship, they could concatenate the two for presentation in scientificName, and also provide the authorship bit in scientificNameAuthorship, and the consumers could easily strip the authorship from scientificName to produce the functional equivalent of a canonical name.
This does not work for autonyms in botany, i.e. the infraspecific taxa where the infraspecific epithet and the specific epithet are the same. Here the author citation comes before the infraspecific epithet. Inconvenient, but it rules out the simplest solution.
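A minimal Python sketch of the "strip the authorship" approach and of why autonyms defeat it. The function name `strip_authorship` is hypothetical, and the autonym string is assembled from the Calamagrostis names used elsewhere in this thread purely as an illustration.

```python
def strip_authorship(scientific_name: str, authorship: str) -> str:
    """Remove a trailing authorship string from a full name, if present."""
    name = scientific_name.strip()
    author = authorship.strip()
    if author and name.endswith(author):
        return name[: -len(author)].rstrip()
    # The authorship is not at the end, so the simple approach gives up.
    return name


# Works when the authorship is a suffix of the full name.
simple = strip_authorship("Calamagrostis stricta (Timm) Koeler", "(Timm) Koeler")

# Autonym: the authorship sits before the infraspecific epithet,
# so nothing is stripped and the result is not a canonical name.
autonym = strip_authorship(
    "Calamagrostis stricta (Timm) Koeler subsp. stricta", "(Timm) Koeler"
)
print(simple)
print(autonym)
```

Removing the authorship wherever it occurs in the string would handle this one case, but it would still assume a single authorship per name.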
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 Skype: rdmpage AIM: rodpage1962@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
On 14 March 2012 20:17, Roderic Page r.page@bio.gla.ac.uk wrote:
Apart from the fact that I can barely bring myself to care about plants ;) I suspect that the vast majority of names do not present these problems. Why do we let edge cases determine what we do?
The first argument is valid, the second not. A high fraction of plant names have infraspecific taxa, and by necessity any such name has one autonym plus one or several separate infraspecific names.
Yes, usage of infraspecific names is considerably lower than usage of specific names, but it is not an edge case. Even hybrids are not an edge case - I really wish I could forget about them, but they always crop up and annoy me...
Gregor
Sure, for prototypes and proof-of-concept work, edge cases can be set aside. But some people and institutions also use Darwin Core for exchanging data between big databases. It is a simple and powerful tool for that, especially coupled with DwC/A.
When records don't fit in the standard terms and you cannot afford to lose the information, the only solution is to add your own custom field. I am clearly happier when I find the term I need in the standard.
As canonicalScientificName is both useful enough and clearly defined I vote for adding it to the standard.
Regards,
Simon.
Hi all,
Now that a "formal" proposal for a new term has been added [1], should this discussion continue here or on the issue tracker? To my knowledge that is the only forum that will be used for a new term when considering if it will get accepted.
When records don't fit in the standard terms and you cannot afford to lose the information, the only solution is to add your own custom field. I am clearly happier when I find the term I need in the standard.
I'm not sure this applies in this case, Simon. Is there really a scenario where you would put something in canonicalScientificName that you *could not* put in dwc:scientificName and would require a new term?
As canonicalScientificName is both useful enough and clearly defined I vote for adding it to the standard.
I've actually challenged this and believe it is not clearly defined [1], as it does not detail what to do when you are filling both, nor how a consumer would deal with apparently conflicting information. While the same is true for dwc:genus etc., this one is *so close* to dwc:scientificName, and in many cases could legitimately have the same content, that guidelines are going to be needed for consistent use. I fear a repeat of the consumption mess caused by conflicting abcd:catalogNumber and abcd:catalogNumberNumeric unwittingly being used inconsistently across resources.
I suspect that the vast majority of names do not present these problems. Why do we let edge cases determine what we do?
Rod, for the vast majority of names a couple of regular expressions suffice anyway - right? We don't need a new term to deal with the easy stuff, and for the hard stuff this term doesn't help anyway.
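As a rough Python sketch of what "a couple of regular expressions" might look like for the easy cases: the pattern, the list of rank markers and the function name `simple_canonical` are assumptions made for illustration, not part of any Darwin Core tool.

```python
import re
from typing import Optional

# Genus, optional species epithet, optional rank marker plus infraspecific
# epithet. Anything that follows (authorship, year) is simply ignored.
SIMPLE_NAME = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+(?P<species>[a-z][a-z-]+))?"
    r"(?:\s+(?P<rank>subsp\.|var\.|f\.)\s+(?P<infra>[a-z][a-z-]+))?"
)


def simple_canonical(scientific_name: str) -> Optional[str]:
    """Return the leading name parts, or None if the string does not fit."""
    match = SIMPLE_NAME.match(scientific_name.strip())
    if match is None:
        return None
    parts = [match.group(g) for g in ("genus", "species", "rank", "infra")]
    return " ".join(p for p in parts if p)


easy = simple_canonical("Calamagrostis stricta (Timm) Koeler")
ranked = simple_canonical("Calamagrostis stricta subsp. stricta")
# Authorship between the epithets: the infraspecific parts are silently lost.
hard = simple_canonical(
    "Calamagrostis stricta (Timm) Koeler subsp. stricta (Timm) Koeler "
    "var. borealis (Laestadius) Hartman"
)
print(easy, ranked, hard, sep="\n")
```

The last call returns only the binomial, which is the kind of hard case where a regular expression stops being enough and a real name parser is needed.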
Cheers, Tim
[1] http://code.google.com/p/darwincore/issues/detail?id=150&colspec=ID%20Ty...
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
Hi all,
Since most of the discussion has already happened on the tdwg-content and tdwg-tag mailing lists, can't we continue here? I created a link to both from the issue page: http://code.google.com/p/darwincore/issues/detail?id=150#c2
I have been stuck in a meeting all day, while Tim Robertson wrote some convincing arguments against creating a canonicalScientificName term (http://code.google.com/p/darwincore/issues/detail?id=150#c1), as did Rich Pyle (email March 14 17:20 GMT-10:00). I will need some time to think about these. :-) I will try to write a coherent response tomorrow.
Peter
Hi all,
The reason why I proposed a canonicalScientificName was to make it easier for data users.
As Tim Robertson points out [1], adding this term will probably make it more complicated, without much benefit. Gregor Hagedorn [2] explains that canonicalScientificName is not a solution for some edge cases (like hybrids) and infraspecific names without a rank marker don't mean much in botany. Rich Pyle [3] points out that more requests will probably be proposed (canonicalScientificNameWithoutRanks, canonicalScientificNameWithoutInfrageneric), demonstrated by Gregor's remark [2].
I now agree with all of these. The only thing I'd like to refine is the definition for genus [4], for which I issued a request: http://code.google.com/p/darwincore/issues/detail?id=151
See my reasoning in the link above. Basically: as I explained before (and Markus Döring before me), under the current definition the genus gets populated with the genus name of the accepted taxon for a synonym, while the specificEpithet and infraspecificEpithet are not. I think this is counterintuitive and confusing. I would populate it with the "genus name of the scientificName", which I think is how most people interpret it anyway. Advantages:
1. Agreeing with the refined definition for genus won't affect most of our applications and data (only those that deliberately put the accepted genus for synonyms).
2. No new term canonicalScientificName.
3. No need to update the useful definition for scientificName (verbose all the way if you can).
4. Publishers and aggregators *have the option* to provide an easier-to-use name via genus, specificEpithet and infraspecificEpithet. None of these carries an authorship, so creating a canonicalScientificName under the proposed definition is as easy as TRIM(genus+" "+specificEpithet+" "+infraspecificEpithet).
5. The above statement only applies to genera, species and infraspecific taxa, but this is the bulk of our data. This method cannot be applied to infrageneric taxa and higher taxa, but as Rich pointed out [3], there are alternative methods for this.
6. Aggregators can ignore genus, specificEpithet and infraspecificEpithet for heterogeneous networks and use parsers to deal with scientificNames. Stripping out the scientificNameAuthorship or using a simple regular expression sometimes won't be enough of course, e.g. "Calamagrostis stricta (Timm) Koeler subsp. stricta (Timm) Koeler var. borealis (Laestadius) Hartman" and those pesky hybrids. The good thing is that once they have done the work, they can actually express that data in Darwin Core (see point 4).
7. Timon lepidus won't complain [5]. :-)
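The TRIM expression in point 4 could be written in Python roughly as follows. The function name `canonical_from_parts` and the example records are illustrative assumptions; only the three Darwin Core term names come from the discussion above.

```python
def canonical_from_parts(record: dict) -> str:
    """Join genus, specificEpithet and infraspecificEpithet, skipping blanks."""
    terms = ("genus", "specificEpithet", "infraspecificEpithet")
    values = (record.get(term) or "" for term in terms)
    # Joining only the non-empty parts avoids the stray spaces that a
    # plain concatenation leaves when one of the fields is empty.
    return " ".join(v.strip() for v in values if v.strip())


species = canonical_from_parts({"genus": "Timon", "specificEpithet": "lepidus"})
infraspecific = canonical_from_parts(
    {
        "genus": "Calamagrostis",
        "specificEpithet": "stricta",
        "infraspecificEpithet": "borealis",
    }
)
genus_only = canonical_from_parts({"genus": "Timon", "specificEpithet": None})
print(species, infraspecific, genus_only, sep="\n")
```

As point 5 notes, this only covers genera, species and infraspecific taxa; the sketch does nothing for infrageneric or higher taxa, and it omits rank markers just as the TRIM expression does.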
Regards,
Peter
[1] http://code.google.com/p/darwincore/issues/detail?id=150#c1
[2] http://code.google.com/p/darwincore/issues/detail?id=150#c3
[3] http://lists.tdwg.org/pipermail/tdwg-tag/2012-March/002487.html
[4] http://rs.tdwg.org/dwc/terms/index.htm#genus
[5] https://plus.google.com/114672072317054763788/posts/Nph2ksggNZW
Peter, Are you now withdrawing your formal request to add the term canonicalScientificName to DwC? I had forwarded your request to John Wieczorek to notify him of the initiation of the change process on Google Code.
Frankly, I was hoping for a broader comment period involving more of the actual “users” that have been mentioned in the various threads, particularly more of the plant name users, to put some more balance into the discussion. Then, let the process go through to an up or down decision on the request based on that (hopefully) broader discussion. Discounting hybrid names as “an edge case” seemed a bit edgy and worthy of some more reaction from actual users/consumers of plant name data before being taken as consensus.
But, are you withdrawing the request?
Thanks, Chuck
Hi Chuck and others,
I submitted a formal request [1] to add the term canonicalScientificName to hopefully reach a consensus from the TDWG community: either we add it or we don't. This term keeps popping up and I think it would be good if we reached a formal decision [2]. So I'm all for keeping the discussion open and involving more actual users. Given the reactions (pro and con) it wouldn't help much if I retract my request now I think.
Personally, I am very happy with the responses I have received so far. I now think that the need for the term (with the definition I proposed) is no longer justified, on the condition that we refine the definition for genus, which I also formally requested [3]. Darwin Core is a community standard and I just want to improve it where I can: Is there a way to share canonical names? Is it even necessary? The eventual solution (don't do anything, add canonicalScientificName as is or altered, or change the definition for genus) is a community decision.
Cheers,
Peter
[1] http://code.google.com/p/darwincore/issues/detail?id=150
[2] http://rs.tdwg.org/dwc/terms/history/decisions/index.htm
[3] http://code.google.com/p/darwincore/issues/detail?id=151
-- Peter Desmet Biodiversity Informatics Manager Canadensys - www.canadensys.net
Université de Montréal Biodiversity Centre 4101 rue Sherbrooke est Montreal, QC, H1X2B2 Canada
Phone: 514-343-6111 #82354 Fax: 514-343-2288 Email: peter.desmet@umontreal.ca / peter.desmet.cubc@gmail.com Skype: anderhalv Public profile: http://www.linkedin.com/in/peterdesmet
participants (6)
- Chuck Miller
- Gregor Hagedorn
- Peter Desmet
- Roderic Page
- Simon Chagnoux
- Tim Robertson [GBIF]