Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]
(I'll send this to tdwg-content rather than tdwg-tag)
We are sorting through this issue now. The TDWG vocabulary defines a property "nameComplete", which is the name without authority. It states that the name with authority belongs in a dcterms:title field:
"The complete uninomial, binomial or trinomial name without any authority or year components. Every TaxonName should have a DublinCore:title property that contains the complete name string including authors and year (where appropriate)."
I'm a tad uneasy about this: a vocabulary expressing an opinion on fields that are outside of its own URI namespace. What if my name objects are more than simply names, and I want the dcterms:title to reflect data that the TDWG vocabulary doesn't know about? I might want my title, for instance, to announce that a record is a draft or test record. In particular, our titles will be including the nomenclatural status ("nom. cons." etc.) - does that mean that we are not conforming to the standard?
On 18/11/2010, at 2:31 PM, Tony.Rees@csiro.au wrote:
Dear TDWG-persons,
I note that DwC "scientificName" as defined at http://rs.tdwg.org/dwc/terms/#scientificName is supposed to include authorship; however, this can also be supplied separately in the field "scientificNameAuthorship". Under that scenario, should authorship also be included in the scientificName field, or omitted?
I can see arguments either way for this - if the authorship is included in the same field as the rest of the scientific name, then that single value is more meaningful and better for e.g. homonym disambiguation. On the other hand, it then requires parsing to get the scientific name without the authority, which if done incorrectly could introduce errors.
Any advice from the persons designing or using this field for data exchange would be appreciated.
Regards - Tony
Tony Rees Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000) e-mail: Tony.Rees@csiro.au Manager, OBIS Australia regional node, http://www.obis.org.au/ Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
------ If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments.
Please consider the environment before printing this email.
------
Tony, at GBIF we recommend including the authorship/year for a name, to have the most complete full name. This is useful for several reasons:
- homonym disambiguation
- getting a better idea of your taxonomic concept
- not having to deal with autonyms when reassembling an atomised name
And even if you omit the authorship from a name, you will need some name parsing/cleaning if you are after consistency across various datasets. The infraspecific ranks are treated differently, some people supply more than 2 epithets, some include the subgenus, etc.
I think it would be useful though to have a canonicalName field that only takes "Genus specificEpithet infraspecificEpithet" and no more.
Markus
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
some quick additions to my previous mail in haste:
I am referring to the new Darwin Core as referred to by Tony, not the ontology/TDWG vocabulary, which predates the latest Darwin Core.
When dealing with hybrid formulas and informal conceptual hints like sensu stricto/lato, a full namestring is also useful. For determination-derived artifacts like cf. or aff., Darwin Core has an identificationQualifier term: http://rs.tdwg.org/dwc/terms/index.htm#identificationQualifier
And there really is no need for a canonicalName term as I suggested in my previous mail, as we have the 3 parts (Genus, specificEpithet, infraspecificEpithet) already as atoms.
Markus
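A minimal sketch of the point about atoms, assuming a record is a plain dictionary keyed by the Darwin Core terms genus, specificEpithet and infraspecificEpithet. The function name canonical_name and the "Author, 1900" authorship are placeholders invented for illustration, not part of Darwin Core or any GBIF tool.

```python
def canonical_name(record):
    """Join the non-empty name atoms with single spaces.

    Authorship, rank markers and qualifiers are deliberately ignored,
    so the result is only the "Genus specificEpithet infraspecificEpithet"
    string discussed in the thread.
    """
    atoms = ("genus", "specificEpithet", "infraspecificEpithet")
    parts = [(record.get(term) or "").strip() for term in atoms]
    return " ".join(p for p in parts if p)


# Hypothetical record; the authorship is a placeholder, not a real citation.
record = {
    "scientificName": "Vombatus ursinus ursinus Author, 1900",
    "genus": "Vombatus",
    "specificEpithet": "ursinus",
    "infraspecificEpithet": "ursinus",
    "scientificNameAuthorship": "Author, 1900",
}

print(canonical_name(record))
```

The full scientificName stays in the record untouched, so a consumer can use whichever form suits it without parsing.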
This is an issue we are struggling with now. Getting from the data items and flags to a correctly laid out name string is not at all trivial.
For botanical names, if the third term of the name is not a ssp, then you need the rank: A-us b-us var. d-us
There may also be a hybrid mark, which may appear .... actually, I need to confirm this: I think it may appear right at the front, or it may appear in front of the terminal epithet - I'm not sure whether it replaces the rank code or has to appear on one side of it:
X A-us b-us var. d-us
A-us b-us var. X d-us
A-us b-us X var. d-us
To correctly compose botanical names, there is a rule that is different from the zoological rule: for autonyms, the botanists prefer that the authority string appear after the "root" name, not after the whole name:
zoological - Vombatus ursinus ursinus Mike
botanical - Pinus L. pinus
And so you need to know a) is the name an autonym? and b) is it botanical?
Cultivar names may be introduced with a pseudo-rank of "cv." or by putting the cultivar name in quotes. Cultivar names are not italicised. And this is not even to begin discussing hybrids and grafts. Oh - and I believe that sometimes zoologists like the family name in square brackets in front of the name. And there's also nomenclatural status/qualifier: "nom. cons." etc.
And so on and so forth. Lord only knows how virologists name their taxa.
The difficulty is: we want our data to be usable by web applications, which is why we produce JSON. It's not sensible to expect that every JavaScript programmer is going to get this stuff right. We cannot simply give enough data that - if you know all the rules - you can get it correct. What we have concluded is that our data needs to have an item in it that will permit a programmer to easily render the name correctly, and that this needs to be separate from the fields as data.
There are a couple of options so far:
* an array or RDF "list" of components, each component being an object with a string and some sort of indicator as to whether it should be italicised or not
* a format string into which the components of the name are substituted. For instance: the format string for a subspecies might be "{G} {s} {e}" (e for epithet), whereas that for a form or variety would be "{G} {s} {r} {e}". We would wind up with - hopefully - a manageable list of formats.
* an XHTML literal (rdf:parseType="Literal"), making use of span elements and css classes to permit finer control over formatting. XHTML is the applicable standard for formatted text. We would use <i> tags where appropriate, so that with no css at all the scientific name still comes out correctly. Thus:
<span class="boaContent name scientificName trinomialName">
  <i class="G scientificNameComponent">Vombatus</i>
  <i class="S scientificNameComponent">ursinus</i>
  <i class="SSP epithet scientificNameComponent">ursinus</i>
</span>
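The three options can be compared in a rough Python sketch. Everything here is an assumption made for illustration: the dictionary keys, the short role codes, the simplified class names and the choice to leave the rank marker unitalicised are not taken from any existing schema.

```python
from html import escape

# Option 1: a list of components, each flagged italic or not.
# The keys and the short "role" codes are hypothetical.
components = [
    {"text": "A-us", "role": "G", "italic": True},
    {"text": "b-us", "role": "S", "italic": True},
    {"text": "var.", "role": "R", "italic": False},  # assumed: rank marker is upright
    {"text": "d-us", "role": "E", "italic": True},
]


def render_plain(parts):
    """Plain-text name: the component strings joined by single spaces."""
    return " ".join(p["text"] for p in parts)


def render_xhtml(parts):
    """Option 3 derived from option 1: <i> for italic parts, <span> otherwise."""
    out = []
    for p in parts:
        tag = "i" if p["italic"] else "span"
        out.append(f'<{tag} class="{escape(p["role"])}">{escape(p["text"])}</{tag}>')
    return '<span class="scientificName">' + " ".join(out) + "</span>"


# Option 2: one format string per rank, using the placeholders from the mail.
FORMATS = {
    "subspecies": "{G} {s} {e}",
    "variety": "{G} {s} {r} {e}",
}


def render_format(rank, **fields):
    """Substitute the name components into the format string for this rank."""
    return FORMATS[rank].format(**fields)


print(render_plain(components))
print(render_xhtml(components))
print(render_format("subspecies", G="Vombatus", s="ursinus", e="ursinus"))
```

The component list carries enough information to produce both the plain string and the markup, whereas the format string only yields plain text; that may be a reason to prefer the list if italics matter to the consumer.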
It has just been gently explained to me that Pinus pinus is not an autonym, although Pinus pinus pinus is. I suppose this underscores the point that IT people building systems and webpages out of this data will tend not to get it right if just given the data fields.
Also gently, botanists generally don't do Pinus pinus or Pinus pinus pinus. We do Pinus patula var. patula (or Pinus patula subsp. patula). These are autonyms that are not published as such but come into existence 'automagically' when another variety or subspecies is described. They do not actually serve any useful purpose other than to alert you that there are other varieties or subspecies in this species to be aware of and that you are not dealing with them in this case.
In the hypothetical instance above, you could assume that 'Pinus patula' referred to Pinus patula var. patula and you might be right. But it might also refer to the range of variation covered by the other varieties as well. To resolve this you really need some other contextual information, such as whether you are dealing with the broader concept or the narrower one before or after the other components were excised from or added to the mix.
If you were going to invent a taxonomic and nomenclatural system from scratch, with the benefit of hindsight and the absence of legacy practice, there is no way on earth you would ever do it like this... :)
jim
Just quickly Paul, botanists would *never* say Pinus L. pinus, or Pinus pinus pinus L. If we needed the author string (and I admit we grossly overuse it where it is not necessary, probably because it makes things look scientific and important) we would go for something like Pinus patula (personWhoCreatedTheEpithet) personWhoMovedTheEpithetIntoThisGenus. For some inexplicable reason zoologists throw away the parentheses and the stuff following.
For an autonym, from the Glossary in the 2006 botanical code, p. 484: "autonym: ... specific epithet repeated without an author citation as the final epithet in the name of ... an infraspecific taxon name that included the type of the adopted, legitimate name of the ... species ..."
Thus, if you wanted to render an autonymic name with author, it would be something like:
Pinus patula authorString var. patula
The autonymic infraspecies epithet appears never to have an author; the authorship is implied from the authorship of the species epithet.
jim
p.s. also, avoid modelling hybrids and hybrid formulae - therein lies madness and putrefaction of the spirit...
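Jim's rendering rule for botanical autonyms can be sketched roughly as follows. This is a simplification under stated assumptions: an autonym is detected only by the infraspecific epithet repeating the specific epithet, the non-autonym branch assumes the author follows the final epithet, and the function names and the "speciesAuthor"/"varietyAuthor" strings are placeholders.

```python
def is_autonym(specific_epithet, infraspecific_epithet):
    """Simplified test: the final epithet repeats the specific epithet."""
    return bool(infraspecific_epithet) and specific_epithet == infraspecific_epithet


def render_botanical(genus, species, rank_marker, infra,
                     species_author="", infra_author=""):
    """Lay out an infraspecific botanical name with its author string.

    For an autonym the author of the species epithet is placed after the
    species epithet and the repeated epithet carries no author at all.
    Otherwise (assumed here) the author follows the final epithet.
    """
    if is_autonym(species, infra):
        parts = [genus, species, species_author, rank_marker, infra]
    else:
        parts = [genus, species, rank_marker, infra, infra_author]
    return " ".join(p for p in parts if p)


print(render_botanical("Pinus", "patula", "var.", "patula",
                       species_author="authorString"))
print(render_botanical("A-us", "b-us", "var.", "d-us",
                       species_author="speciesAuthor",
                       infra_author="varietyAuthor"))
```

Note that any author passed for the infraspecific epithet of an autonym is silently dropped, which matches the glossary wording that the repeated epithet is cited without an author.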
Here's something a zoologist has written, though (Cavalier-Smith, 1993):
[Class] "Ebridea Lemmermann, 1901 emend. Deflandre, 1936 stat. nov. Loeblich III, 1970 orthog. emend."
I'm sure a parser will have fun deducing what of this forms the authority...
Cheers - Tony
On Fri, Nov 19, 2010 at 5:07 PM, Tony.Rees@csiro.au wrote:
Here's something a zoologist has written, though (Cavalier-Smith, 1993):
Who are these people? And what are they doing on my interweb?
[Class] "Ebridea Lemmermann, 1901 emend. Deflandre, 1936 stat. nov. Loeblich III, 1970 orthog. emend."
I'm sure a parser will have fun deducing what of this forms the authority...
Well, this parser thinks it has worked out the "name" is Ebridea.. as for the rest? It's just "stuff"... :)
jim _________________ Jim Croft ~ jim.croft@gmail.com ~ +61-2-62509499 ~ http://www.google.com/profiles/jim.croft 'A civilized society is one which tolerates eccentricity to the point of doubtful sanity.' - Robert Frost, poet (1874-1963)
Please send URIs, not attachments: http://www.gnu.org/philosophy/no-word-attachments.html
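The "parser" Jim describes amounts to taking the first token as the name and treating the remainder as opaque. A minimal sketch, assuming a uninomial at the start of the string; the function name split_uninomial is made up for illustration, and nothing here attempts to interpret the authority.

```python
def split_uninomial(name_string):
    """Return (name, rest): the first whitespace-delimited token and the remainder."""
    name, _, rest = name_string.strip().partition(" ")
    return name, rest


full = ("Ebridea Lemmermann, 1901 emend. Deflandre, 1936 "
        "stat. nov. Loeblich III, 1970 orthog. emend.")
name, stuff = split_uninomial(full)
print(name)
print(stuff)
```

This only works because the example is a single-word name; for binomials and trinomials the same split would cut the name in the wrong place.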
having the full name AND the basic parts is quite useful sometimes...
All,
Wow, I disappear to Las Vegas for a couple days, and now I don't know what is more surreal: Vegas, or this thread (my money is on Vegas, though I am not a betting man).
Couple points:
From Tony: Well, that sounds fine to me, however you may note that the ICZN Code at least expressly states that authorship is *not* part of the scientific name:
Don't drag the Code into this. The Code doesn't govern DwC terms.
From Jim: For some inexplicable reason zoologists throw away the parentheses and the stuff following.
For some inexplicable reason, botanists feel the need to track ALL new combinations as though they were nomenclatural (rather than taxonomic) events; whereas zoologists more surgically track only the ones that generate homonyms. :-)
More from Jim:
I think it is a really bad move to attempt to redefine "name" so as to include the name metadata to achieve some degree of name resolution
I think it's a REALLY bad move to even try to come up with a unified definition of "name" at all in our context. Doing so is an unholy abomination, a lexical atrocity, an affront to logic and an insult to the natural order of the cosmos and any deity conceived by humankind. (to coin a phrase)
The First Commandment of biodiversity informatics communication is:
"Thou shalt not useth the unqualified term 'name' with the expectation that he or she upon whose ears (or eyes) it falls will not completely misunderstandth thy point."
Rod wrote:
I'm with Jim. For the love of God let's keep things clean and simple.
If clean and simple is the goal, then DwC:scientificName should be defined as the complete set of textual elements useful for recognizing a unique scientific name (or formula of names, in the case of hybrids). If the name is already parsed in the source database, then populate the record with the parsed elements in their respective DwC terms accordingly, and form scientificName as a reasonably standard(ish) concatenation of the full set of elements, to form a string as just defined. If the name is not already parsed in the source database, then provide the complete text string (as just defined) verbatim.
*THAT* is about as simple as it is going to get, I'm afraid.
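A rough sketch of that rule in Python, for what it is worth. The function name, the dictionary layout and the "verbatimName" key are assumptions made for illustration; only the dwc term names come from Darwin Core. Parsed elements are concatenated when present, otherwise the verbatim string is passed through untouched:

```python
def build_scientific_name(record):
    """Form scientificName from parsed terms if present, else use the verbatim string."""
    # Order of concatenation for a simple species-level name (an assumption).
    order = ["genus", "specificEpithet", "scientificNameAuthorship"]
    parts = [record[term] for term in order if record.get(term)]
    if parts:
        return " ".join(parts)
    # Source database holds only an unparsed string: provide it as-is.
    return record.get("verbatimName", "")


parsed = {
    "genus": "Philander",
    "specificEpithet": "opossum",
    "scientificNameAuthorship": "Linnaeus, 1758",
}
print(build_scientific_name(parsed))
```

Infraspecific ranks, hybrid formulae and the like would need more elements and more rules than this sketch covers.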
But Rod was talking about "fields", so maybe he's talking about a database model, rather than DwC terms (which, I think, this thread is mostly about).
The Model we're taking for GNUB establishes a record for every "NameElement". For example, if there was a usage representing "Centaurea affinis Friv. ssp. affinis var. affinis", there would be four TaxonNameUsage records (one for each NameElement: "Centaurea" as used at the rank of genus; "affinis" as used at the rank of species; "affinis" as used at the rank of subspecies; and "affinis" at the rank of variety). We inherit the authorship for each NameElement through its associated Protonym link. In this case, we know it to be an autonym, because each of the last three (infrageneric) NameElements (epithets) happens to share the same Protonym.
Besides the parsed NameElement, GNUB also includes fields (for each NameElement) for:
VerbatimNameString (actual string of characters used to represent the most complete form of the name, inclusive of authorships, prefixes, suffixes, etc.)
TaxonRank (controlled vocabulary)
VerbatimRank (the actual rank they declared it to be within the usage instance, if some obscure-ish rank from the 18th century that is not among, but easily mapped directly to, one of the Controlled Vocabulary items for TaxonRank)
CorrectedNameElement (in case the usage was not in compliance with the Code; such as feminine adjectival epithet combined with a masculine genus, or other such Code-correctable anomalies)
Because the names are fully atomized there can very easily be generated a "standard" name-form, with or without authorship and/or standardized prefixes & suffixes & such.
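To make the atomized idea concrete, here is a small Python sketch. It is not the GNUB schema: the class, its fields and the integer protonym identifiers are hypothetical stand-ins. It shows one record per NameElement and the autonym check described above (all epithets below genus sharing one Protonym):

```python
from dataclasses import dataclass


@dataclass
class NameElement:
    text: str
    rank: str
    protonym_id: int  # hypothetical identifier of the linked Protonym


def is_autonym(elements):
    """True when two or more epithets below genus all link to the same Protonym."""
    epithets = [e for e in elements if e.rank != "genus"]
    return len(epithets) >= 2 and len({e.protonym_id for e in epithets}) == 1


usage = [
    NameElement("Centaurea", "genus", 1),
    NameElement("affinis", "species", 2),
    NameElement("affinis", "subspecies", 2),
    NameElement("affinis", "variety", 2),
]
print(is_autonym(usage))
```

With the elements held separately like this, a name string with or without authorship is just a choice of which pieces to join.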
Simple? HELL no. Powerful? You betcha.
Aloha, Rich
This seems to be one of those threads where we seem hell bent on making things as complicated as possible.
I think Bob Morris was pointing out, in the vast majority of cases biologists use binomials without author names quite happily, and manage to get by just fine. To a first approximation nobody using any of the databases we construct will care about authorship. If they did, we'd be in trouble, because our databases represent this in various ways (comma after author name versus no comma), and some have invented spurious authorships based on chresonyms (that is, the "authorship" is someone who used the name, not the original author, see http://en.wikipedia.org/wiki/Chresonym).
For all the potential ambiguity, people will rely on naked scientific names, so it seems to me to be obvious that anybody exporting data in this area needs to provide a field that contains just the name. Failure to do this makes consuming the data harder than it needs to be, and that would be a mistake.
By all means add additional information in other fields, but doesn't
dwc:scientificName=Philander opossum
dwc:scientificNameAuthorship=Linnaeus, 1758
pretty much cover what most people need? The vast majority of people consuming data will want just the name, so make that front and centre. The single most important value shouldn't be one people have to construct from the data.
Regards
Rod
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 AIM: rodpage1962@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
To complete the circle ...
http://www.nhm.ac.uk/hosted_sites/tdwg/plants.html =
Plant Taxonomic Database Standards No. 3
greg
-- Greg Whitbread Australian National Botanic Gardens Australian National Herbarium +61 2 62509482 ghw@anbg.gov.au
From: tdwg-content-bounces@lists.tdwg.org on behalf of greg whitbread Sent: Sun 21-11-2010 11:22
To complete the circle ...
Plant Taxonomic Database Standards No. 3
*** I have not looked at this in detail, but a truly outrageous error immediately jumps out, where it says "The full name of an intergeneric hybrid has in addition an "x" (lower case alphabetic x symbol) preceding the generic name as a generic hybrid marker. "
"The full name of a named interspecific hybrid or chimaera has in addition an "x" (lower case alphabetic x) or "+" plus sign) preceding the species epithet"
There is no conceivable ambiguity in "Art. H.1.1. Hybridity is indicated by the use of the multiplication sign × or by the addition of the prefix notho-¹ to the term denoting the rank of the taxon."
There never has been a "(lower case alphabetic x)" allowed, except where there is force majeure. "Rec. H.3A.2. If the multiplication sign is not available it should be approximated by a lower case letter x (not italicized)."
(BTW, there is no such thing as a "species epithet" in botany; it is a "specific epithet").
Paul van Rijckevorsel
From: dipteryx@freeler.nl [mailto:dipteryx@freeler.nl] Sent: Mon 22-11-2010 13:18
*** After looking at this paper a little more closely I see this is not the brightest thing I could have said.
There are three main issues with this paper (besides a lack of rigour in the use of terms):
1) it is fifteen to twenty years out of date (it is dated 1994),
2) it represents a meeting of three worlds: a) name strings found in databases, b) names governed by the ICBN and ICNCP, c) the standards applied by the TDWG; and it is not always clear which item or usage belongs to which world,
3) it is a little confused in its focus (what it does deal with and what it does not deal with).
Paul van Rijckevorsel
From: tdwg-content-bounces@lists.tdwg.org on behalf of Roderic Page Sent: Sun 21-11-2010 9:58
[...]
I think Bob Morris was pointing out, in the vast majority of cases biologists use binomials without author names quite happily, and manage to get by just fine.
*** And so they should, as that is how a system of nomenclature is designed to work, no matter what Code applies. * * *
For all the potential ambiguity, people will rely on naked scientific names,
*** The only ambiguity here is that the circumscription / definition of the taxon is not mentioned (this is fine where it is automatically implied, but often this is not the case). The nomenclatural author is just a (fleeting) detail, to be adjusted as needed. * * *
[...] so it seems to me to be obvious that anybody exporting data in this area needs to provide a field that contains just the name. Failure to do this makes consuming the data harder than it needs to be, and that would be a mistake.
By all means add additional information in other fields, but doesn't
dwc:scientificName=Philander opossum
dwc:scientificNameAuthorship=Linnaeus, 1758
pretty much cover what most people need? The vast majority of people consuming data will want just the name, so make that front and centre. The single most important value shouldn't be one people have to construct from the data.
*** It looks that way to me, also.
Paul van Rijckevorsel
My comments inline:
On 11/21/2010 4:58 AM, Roderic Page wrote:
This seems to be one of those threads where we seem hell bent on making things as complicated as possible.
It's probably more accurate to say that, for better or worse, there are multiple discussions going on. One set of issues relates to the DarwinCore representation of names, and at least one other has to do with use cases that will - arguably - require more semantic resolution than that offered by names. It seemed to me that Bob's comments touched upon both sets of issues.
I think Bob Morris was pointing out, in the vast majority of cases biologists use binomials without author names quite happily, and manage to get by just fine.
I think that's a contentious, and possibly not agenda-free, view. It reflects the "reluctance to go deeper" that I mentioned.
To a first approximation nobody using any of the databases we construct will care about authorship.
Also not necessarily a given, at least not in any of the major use cases that we struggled with in the SEEK project (e.g. predict future mammal species distributions in the Americas based on MaNIS records and climate modeling).
If they did, we'd be in trouble, because our databases represent this in various ways (comma after author name versus no comma), and some have invented spurious authorships based on chresonyms (that is, the "authorship" is someone who used the name, not the original author, see http://en.wikipedia.org/wiki/Chresonym).
As Peterson & Navarro-Sigüenza (1999) show in at least one restricted case, we are in trouble making inferences about conservation priorities based just on binomials.
For all the potential ambiguity, people will rely on naked scientific names, so it seems to me to be obvious that anybody exporting data in this area needs to provide a field that contains just the name. Failure to do this makes consuming the data harder than it needs to be, and that would be a mistake.
As an account of common practice the premise is accurate, and the conclusions based just on that premise are well taken too. But where's the other part that covers cases in which peoples' reliance on naked scientific names is problematic?
By all means add additional information in other fields, but doesn't
dwc:scientificName=Philander opossum
dwc:scientificNameAuthorship=Linnaeus, 1758
pretty much cover what most people need? The vast majority of people consuming data will want just the name, so make that front and centre.
In one of the most thorough analyses of this issue to date, Geoffroy and Berendsohn (2003) found that the name and concept of about 1500 German moss taxa remained stable in only 13.3% of the examined cases, across a dozen treatments from 1927 to 2000. That's one of the most dramatic results from an essentially non-replicated study, but I find it hard to dismiss when we talk about name-based data labeling and integration.
Geoffroy, M., Berendsohn, W.G. 2003. The concept problem in taxonomy: importance, components, approaches. Schrift. Veget. 39, 5-14.
Regards,
Nico
Dear all,
Nico Franz just wrote:
It's probably more accurate to say that, for better or worse, there are multiple discussions going on.
Correct - and returning to my original question, there appear to be 2 contrasting views:
(1) Include authority and other strictly "non canonical name" info in DwC:scientificName as available (as exemplified by Markus, Rich Pyle, also present DwC specification) - however now the canonical elements must be obtained by re-parsing the supplied scientificName content, or supplied separately in DwC:genus, DwC:specificEpithet, etc.
(2) Omit authority and other strictly "non canonical name" info from DwC:scientificName since this can be supplied elsewhere e.g. in DwC:scientificNameAuthorship, and makes the strictly canonical name information available directly rather than having to re-parse the DwC:scientificName element (Rod, Hilmar, Catalogue of Life format, Dmitry (?), also my practice for the last 8+ years although possibly not correct).
In my initial email my thought was that (1) would be an acceptable solution provided that the canonical information was supplied in (e.g. at species level) DwC:genus and DwC:specificEpithet. However I now realise that this solution will not scale, as per the following use case:
Currently I am preparing around 1.9 million records for export in DwCA format. E.g. my record for Philander opossum Linnaeus, 1758 (the previously cited worked example taxon from http://code.google.com/p/darwincore/wiki/Examples#Taxonomic_Treatment,_norma...) reads as follows (paraphrased from the relevant row in my csv file):
DwC:taxonId=mam10000822
DwC:scientificName=Philander opossum
DwC:scientificNameAuthorship=Linnaeus, 1758
DwC:taxonRank=species
DwC:taxonomicStatus=accepted
DwC:nomenclaturalStatus=available
DwC:nameAccordingTo=CoL2006/ITS
DwC:originalNameUsageID=
DwC:namePublishedIn=
DwC:acceptedNameUsageID=mam10000822
DwC:parentNameUsage=Philander
DwC:parentNameUsageID=mam1001153
DwC:taxonRemarks=
dc:modified=21-09-2006
DwC:nomenclaturalCode=ICZN
This follows model (2) above.
Initially I thought that in order to convert into model (1) as recommended, all I would have to do would be to add 2 elements,
DwC:genus=Philander
DwC:specificEpithet=opossum
and concatenate (add in) the authority into the DwC:scientificName element. However this is not the total solution, since my file also includes other ranks i.e. genus (not an issue), family, order, class, phylum and kingdom, each of which would then be required to be populated for an entry of that rank, but will basically be blank for entries of all other ranks (since the hierarchy is available by traversing DwC:parentNameUsageID and following that trail upwards). This means that my "table" of currently 1.9m rows x 15 columns then becomes 1.9m rows x 22 columns, quite an overhead for data transfer and subsequent ingestion/parsing into another system. Of course if I had additional ranks too e.g. subgenus, subfamily, infraorder and the rest the size blows out even more - and in any case, with the exception of subgenus, there are no Darwin core elements for other intermediate ranks as far as I can see.
So, I am now beginning to think that the case for a new element DwC:canonicalName or equivalent is strengthened - all I would need is to put the scientific name without authority into that element, the scientific name with authority into DwC:scientificName and the problem is solved in the most efficient manner; also serving the needs of both arguments for either interpretation (1) or interpretation (2) above.
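A minimal sketch of what that conversion might look like for one row, in Python. Note that canonicalName is only the element proposed in this thread, not an existing Darwin Core term, and the function name and dictionary layout are illustrative assumptions:

```python
def to_two_field_form(record):
    """Return a copy with the authorless name in canonicalName and the full string in scientificName."""
    out = dict(record)  # leave the caller's record untouched
    canonical = record["scientificName"]  # model (2): name held without authority
    authorship = record.get("scientificNameAuthorship", "")
    out["canonicalName"] = canonical
    out["scientificName"] = f"{canonical} {authorship}".strip()
    return out


row = {
    "taxonId": "mam10000822",
    "scientificName": "Philander opossum",
    "scientificNameAuthorship": "Linnaeus, 1758",
}
print(to_two_field_form(row))
```

Under this sketch each row gains one column rather than the seven needed to go from 15 to 22, and the same approach works at any rank since nothing depends on rank-specific elements.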
If others agree, is there then a case for going this route, and adding the relevant additional element to DwC?
Regards - Tony
Thanks Tony for bringing this back. I think I tend to support the idea of a new canonicalName term, see below.
On Nov 22, 2010, at 3:50, Tony.Rees@csiro.au wrote:
Correct - and returning to my original question, there appear to be 2 contrasting views:
(1) Include authority and other strictly "non canonical name" info in DwC:scientificName as available (as exemplified by Markus, Rich Pyle, also present DwC specification) - however now the canonical elements must be obtained by re-parsing the supplied scientificName content, or supplied separately in DwC:genus, DwC:specificEpithet, etc.
(2) Omit authority and other strictly "non canonical name" info from DwC:scientificName since this can be supplied elsewhere e.g. in DwC:scientificNameAuthorship, and makes the strictly canonical name information available directly rather than having to re-parse the DwC:scientificName element (Rod, Hilmar, Catalogue of Life format, Dmitry (?), also my practice for the last 8+ years although possibly not correct).
Yes, that's the current choice one has with dwc. I think I have stressed before the problems with #2 when dealing with more than simple binomials.
Another point of confusion that needs clarification is the role of the higher taxon terms in dwc - you touch on it below too. In the case of synonyms, does dwc:genus hold the genus of the synonym name, or the accepted genus the synonym is classified in? The term definition says: "The full scientific name of the genus in which the taxon is classified." This is consistent with all other higher taxon terms in Darwin Core that represent the taxonomic hierarchy. A quick example:
dwc:scientificName=Pinus abies L.
dwc:genus=Picea
dwc:taxonomicStatus=homotypic synonym
dwc:acceptedNameUsage=Picea abies (L.) H.Karst
If we accept this view, then there really is no way to express the canonical name and I would indeed vote for having a new dwc:canonicalName term. Without doubt the canonical form of a name is the most important string when first dealing with a name and trying to align it with names from other sources. And we surely shouldn't require a name parser to be used for this very frequent use case.
That leads me to another question. Does the canonical name string for an infraspecific taxon include the rank marker? Ideally I think it shouldn't, as the main point of having a canonical name string is to have a string that is highly similar across different sources. Removing the rank marker not only avoids spelling variations; zoologists also deal pretty much only with subspecies, and so they don't have to use a rank marker.
Zoological example: http://www.marinespecies.org/aphia.php?p=taxdetails&id=293570
dwc:scientificName=Clupea pallasii marisalbi Berg, 1923
dwc:taxonRank=subspecies
dwc:canonicalName=Clupea pallasii marisalbi
dwc:scientificNameAuthorship=Berg, 1923
Botanic example: http://wp6-cichorieae.e-taxonomy.eu/portal/?q=cdm_dataportal/taxon/800e92ea-...
dwc:scientificName=Lactuca macrophylla subsp. uralensis (Rouy) N. Kilian & Greuter
dwc:taxonRank=subspecies
dwc:canonicalName=Lactuca macrophylla uralensis
dwc:scientificNameAuthorship=(Rouy) N. Kilian & Greuter
dwc:scientificName=Mulgedium macrophyllum var. hispidum (Ledeb.) Korsh.
dwc:taxonRank=variety
dwc:canonicalName=Mulgedium macrophyllum hispidum
dwc:scientificNameAuthorship=(Ledeb.) Korsh.
dwc:taxonomicStatus=heterotypic synonym
dwc:acceptedNameUsage=Lactuca macrophylla subsp. uralensis (Rouy) N. Kilian & Greuter
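For a source that already holds the name in atomized form, the canonical strings in these examples could be produced without any parser. A small Python sketch follows; the function and its parameter names are illustrative assumptions, and canonicalName itself is only the term proposed here:

```python
def canonical_name(genus, specific_epithet, infraspecific_epithet=None):
    """Join the name parts with single spaces, leaving out rank marker and authorship."""
    parts = [genus, specific_epithet, infraspecific_epithet]
    return " ".join(p for p in parts if p)


print(canonical_name("Clupea", "pallasii", "marisalbi"))
print(canonical_name("Lactuca", "macrophylla", "uralensis"))
print(canonical_name("Philander", "opossum"))
```

A string built this way no longer distinguishes ranks, so dwc:taxonRank would have to travel alongside it.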
[...]
and concatenate (add in) the authority into the DwC:scientificName element. However this is not the total solution, since my file also includes other ranks i.e. genus (not an issue), family, order, class, phylum and kingdom, each of which would then be required to be populated for an entry of that rank, but will basically be blank for entries of all other ranks (since the hierarchy is available by traversing DwC:parentNameUsageID and following that trail upwards). This means that my "table" of currently 1.9m rows x 15 columns then becomes 1.9m rows x 22 columns, quite an overhead for data transfer and subsequent ingestion/parsing into another system. Of course if I had additional ranks too e.g. subgenus, subfamily, infraorder and the rest the size blows out even more - and in any case, with the exception of subgenus, there are no Darwin core elements for other intermediate ranks as far as I can see.
I'm not sure I understand correctly. dwc:scientificName is used for ANY rank, not only infrageneric ones. You don't have to use the higher taxon terms at all if you already use the adjacency format via DwC:parentNameUsageID.
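To illustrate the adjacency format, here is a minimal Python sketch that recovers the classification by following parentNameUsageID upwards. The function name and the in-memory dictionary are assumptions for illustration, and the two-row table is a toy: a real file would continue up through family, order and so on to kingdom.

```python
def classification(taxon_id, taxa):
    """Follow parentNameUsageID links upward; return names ordered from root to taxon."""
    path = []
    seen = set()  # guards against accidental cycles in the data
    current = taxon_id
    while current and current not in seen:
        seen.add(current)
        row = taxa.get(current)
        if row is None:  # parent not present in this table
            break
        path.append(row["scientificName"])
        current = row.get("parentNameUsageID")
    return list(reversed(path))


taxa = {
    "mam10000822": {"scientificName": "Philander opossum", "parentNameUsageID": "mam1001153"},
    "mam1001153": {"scientificName": "Philander", "parentNameUsageID": None},
}
print(classification("mam10000822", taxa))
```

The rank of each ancestor comes from its own row, so intermediate ranks need no dedicated column.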
So, I am now beginning to think that the case for a new element DwC:canonicalName or equivalent is strengthened - all I would need is to put the scientific name without authority into that element, the scientific name with authority into DwC:scientificName and the problem is solved in the most efficient manner; also serving the needs of both arguments for either interpretation (1) or interpretation (2) above.
If others agree, is there then a case for going this route, and adding the relevant additional element to DwC?
Tony, I do agree and also think this solves all problems discussed here so far! As a recommendation both scientificName and canonicalName
From: tdwg-content-bounces@lists.tdwg.org on behalf of Markus Döring Sent: Mon 22-11-2010 12:03
[...]
That leads me to another question. Does the canonical name string for an infraspecific taxon include the rank marker? Ideally I think it shouldn't as the main point for having a canonical name string is to have a string that is highly similar across different sources. Removing the rank marker not only avoids spelling variations, but also zoologists pretty much only deal with subspecies and they don't have to use a rank marker.
[...]
*** From a nomenclatural point of view this can be argued either way. The scientific name is indeed "Lactuca macrophylla uralensis" and if there are two such names, based on different types, these are homonyms (irrespective of rank). However, the name cannot be rendered this way, as "Lactuca macrophylla subsp. uralensis" and "Lactuca macrophylla var. uralensis" are different things.
Also keep in mind that the same issue can also be found for subdivisions of genera, e.g. "Euphorbia subg. Euphorbia", etc.
Paul van Rijckevorsel
Thanks, Markus.
Just one comment on a paragraph where I may not have expressed myself too clearly:
You wrote:
I'm not sure if I correctly understand. dwc:scientificName is used for ANY rank, not only infrageneric ones. You don't have to use the higher taxon terms at all if you already use the adjacency format via DwC:parentNameUsageID.
What I was saying was that e.g.
dwc:scientificName=Pinus abies L.
dwc:taxonRank=species
dwc:genus=Picea
dwc:specificEpithet=abies
may work alright as an alternative to the suggested canonicalName; however, the following has no workaround:
dwc:scientificName=Crustacea Brünnich, 1772
dwc:taxonRank=subphylum
would then need: //dwc:subphylum=Crustacea// under this model, but it does not exist
(same for most other intermediate ranks)
i.e. there is no dwc pre-formatted element for intermediate ranks between kingdom/phylum/class/order/family/genus, but there would be plenty of canonical names at these ranks.
Regards - Tony
________________________________________ From: Markus Döring [m.doering@mac.com] Sent: Monday, 22 November 2010 10:03 PM To: Rees, Tony (CMAR, Hobart) Cc: nico.franz@upr.edu; tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?
Thanks Tony for bringing this back. I think I tend to support the idea of a new canonicalName term, see below.
On Nov 22, 2010, at 3:50, Tony.Rees@csiro.au wrote:
Correct - and returning to my original question, there appear to be 2 contrasting views:
(1) Include authority and other strictly "non canonical name" info in DwC:scientificName as available (as exemplified by Markus, Rich Pyle, also present DwC specification) - however now the canonical elements must be obtained by re-parsing the supplied scientificName content, or supplied separately in DwC:genus, DwC:specificEpithet, etc.
(2) Omit authority and other strictly "non canonical name" info from DwC:scientificName since this can be supplied elsewhere e.g. in DwC: scientificNameAuthorship, and makes the strictly canonical name information available directly rather than having to re-parse the DwC:scientificName element (Rod, Hilmar, Catalogue of Life format, Dmitry (?), also my practice for the last 8+ years although possibly not correct).
Yes, thats the current choice one has with dwc. Problems with #2 when dealing with not only simple binomials I think I have stressed before.
Another confusion that should need clarification is actually the role of the higher taxon terms in dwc - you touch on it below too. In case of synonyms does dwc:genus actually hold the genus of the synonym name or is it the accepted genus the synonym is classified to? If you look at the term definition it says: "The full scientific name of the genus in which the taxon is classified." This is consistent with all other higher taxon terms in darwin core that represent the taxonomic hierarchy. A quick example:
dwc:scientifcName=Pinus abies L. dwc:genus=Picea dwc:taxonomicStatus=homotypic synonym dwc:acceptedNameUsage=Picea abies (L.) H.Karst
If we accept this view, then there really is no way to express the canonical name and I would indeed vote for having a new dwc:canonicalName term. With no doubt the canonical form of a name is the most important string when first dealing with a name and trying to align it with names from other sources. And we surely shouldnt require a name parser to be used for this very frequent use case.
That leads me to another question. Does the canonical name string for an infraspecific taxon include the rank marker? Ideally I think it shouldnt as the main point for having a canonical name string is to have a string that is highly similar across different sources. Removing the rank marker not only avoids spelling variations, but also zoologists pretty much only deal with subspecies and there dont have to use a rank marker.
Zoological example: http://www.marinespecies.org/aphia.php?p=taxdetails&id=293570
dwc:scientifcName=Clupea pallasii marisalbi Berg, 1923 dwc:taxonRank=subspecies dwc:canonicalName=Clupea pallasii marisalbi dwc:scientifcNameAuthorship=Berg, 1923
Botanic example: http://wp6-cichorieae.e-taxonomy.eu/portal/?q=cdm_dataportal/taxon/800e92ea-...
dwc:scientificName=Lactuca macrophylla subsp. uralensis (Rouy) N. Kilian & Greuter
dwc:taxonRank=subspecies
dwc:canonicalName=Lactuca macrophylla uralensis
dwc:scientificNameAuthorship=(Rouy) N. Kilian & Greuter

dwc:scientificName=Mulgedium macrophyllum var. hispidum (Ledeb.) Korsh.
dwc:taxonRank=variety
dwc:canonicalName=Mulgedium macrophyllum hispidum
dwc:scientificNameAuthorship=(Ledeb.) Korsh.
dwc:taxonomicStatus=heterotypic synonym
dwc:acceptedNameUsage=Lactuca macrophylla subsp. uralensis (Rouy) N. Kilian & Greuter
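The rank-marker question above can be illustrated with a minimal sketch. It assumes the input is already a name without authorship, and the function name and the list of markers are hypothetical and certainly not exhaustive; dwc:canonicalName itself is only a proposed term in this thread.

```python
# Illustrative only: the marker list is an assumption, not a community standard.
RANK_MARKERS = {"subsp.", "ssp.", "var.", "subvar.", "f.", "subf."}


def canonical_name(name_without_authorship: str) -> str:
    """Drop infraspecific rank markers from a name that carries no authorship."""
    parts = name_without_authorship.split()
    return " ".join(p for p in parts if p.lower() not in RANK_MARKERS)


# Botanical names lose their marker; the zoological trinomial is unchanged.
print(canonical_name("Lactuca macrophylla subsp. uralensis"))
print(canonical_name("Mulgedium macrophyllum var. hispidum"))
print(canonical_name("Clupea pallasii marisalbi"))
```

With the marker removed, botanical and zoological infraspecific names end up in the same three-part shape, which is the point of having a string that is highly similar across sources.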
In my initial email my thought was that (1) would be an acceptable solution provided that the canonical information was supplied in (e.g. at species level) DwC:genus and DwC:specificEpithet. However I now realise that this solution will not scale, as per the following use case:
Currently I am preparing around 1.9 million records for export in DwCA format. E.g. my record for Philander opossum Linnaeus, 1758 (the previously cited worked example taxon from http://code.google.com/p/darwincore/wiki/Examples#Taxonomic_Treatment,_norma...) reads as follows (paraphrased from the relevant row in my csv file):
DwC:taxonId=mam10000822
DwC:scientificName=Philander opossum
DwC:scientificNameAuthorship=Linnaeus, 1758
DwC:taxonRank=species
DwC:taxonomicStatus=accepted
DwC:nomenclaturalStatus=available
DwC:nameAccordingTo=CoL2006/ITS
DwC:originalNameUsageID=
DwC:namePublishedIn=
DwC:acceptedNameUsageID=mam10000822
DwC:parentNameUsage=Philander
DwC:parentNameUsageID=mam1001153
DwC:taxonRemarks=
dc:modified=21-09-2006
DwC:nomenclaturalCode=ICZN
This follows model (2) above.
Initially I thought that in order to convert into model (1) as recommended, all I would have to do would be to add 2 elements,
DwC:genus=Philander
DwC:specificEpithet=opossum
and concatenate (add in) the authority into the DwC:scientificName element. However this is not the total solution, since my file also includes other ranks i.e. genus (not an issue), family, order, class, phylum and kingdom, each of which would then be required to be populated for an entry of that rank, but will basically be blank for entries of all other ranks (since the hierarchy is available by traversing DwC:parentNameUsageID and following that trail upwards). This means that my "table" of currently 1.9m rows x 15 columns then becomes 1.9m rows x 22 columns, quite an overhead for data transfer and subsequent ingestion/parsing into another system. Of course if I had additional ranks too e.g. subgenus, subfamily, infraorder and the rest the size blows out even more - and in any case, with the exception of subgenus, there are no Darwin core elements for other intermediate ranks as far as I can see.
I'm not sure I understand correctly. dwc:scientificName is used for ANY rank, not only infrageneric ones. You don't have to use the higher taxon terms at all if you already use the adjacency format via DwC:parentNameUsageID.
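As a rough sketch of what the adjacency format buys you, the classification can be recovered by following DwC:parentNameUsageID upwards instead of filling one column per rank. The two mam... identifiers come from the record above; the family row, its identifier "fam-example" and the function name are assumptions added purely for illustration.

```python
# Hypothetical in-memory table keyed by taxon ID.
# Only the two "mam..." IDs appear in the thread; "fam-example" is made up.
rows = {
    "mam10000822": {"scientificName": "Philander opossum",
                    "taxonRank": "species",
                    "parentNameUsageID": "mam1001153"},
    "mam1001153": {"scientificName": "Philander",
                   "taxonRank": "genus",
                   "parentNameUsageID": "fam-example"},
    "fam-example": {"scientificName": "Didelphidae",
                    "taxonRank": "family",
                    "parentNameUsageID": ""},
}


def classification(taxon_id, table):
    """Walk the parentNameUsageID trail upwards and collect (rank, name) pairs."""
    lineage = []
    seen = set()  # guards against accidental cycles in the data
    current = taxon_id
    while current and current in table and current not in seen:
        seen.add(current)
        row = table[current]
        lineage.append((row["taxonRank"], row["scientificName"]))
        current = row["parentNameUsageID"]
    return lineage


for rank, name in classification("mam10000822", rows):
    print(rank, name)
```

Because each row names only its parent, intermediate ranks such as subphylum or infraorder need no dedicated column; they simply appear as extra steps in the walk.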
So, I am now beginning to think that the case for a new element DwC:canonicalName or equivalent is strengthened - all I would need is to put the scientific name without authority into that element, the scientific name with authority into DwC:scientificName and the problem is solved in the most efficient manner; also serving the needs of both arguments for either interpretation (1) or interpretation (2) above.
If others agree, is there then a case for going this route, and adding the relevant additional element to DwC?
Tony, I do agree and also think this solves all problems discussed here so far! As a recommendation both scientificName and canonicalName
Regards - Tony
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
I'm not sure I understand correctly. dwc:scientificName is used for ANY rank, not only infrageneric ones. You don't have to use the higher taxon terms at all if you already use the adjacency format via DwC:parentNameUsageID.
What I was saying was that e.g.
dwc:scientificName=Pinus abies L.
dwc:rank=species
dwc:genus=Picea
dwc:specificEpithet=abies
may work alright as an alternative to the suggested canonicalName,
however the following has no workaround:
dwc:scientificName=Crustacea Brünnich, 1772
dwc:rank=subphylum
would then need: //dwc:subphylum=Crustacea// under this model, but it does not exist
(same for most other intermediate ranks)
Ah, perfectly right of course!
Assuming we would add canonicalName and we use genus for the classification - is there any purpose left for specific- and infraspecificEpithet?
i.e. there is no dwc pre-formatted element for intermediate ranks between kingdom/phylum/class/order/family/genus, but there would be plenty of canonical names at these ranks.
Regards - Tony
From: Markus Döring [m.doering@mac.com] Sent: Monday, 22 November 2010 10:03 PM To: Rees, Tony (CMAR, Hobart) Cc: nico.franz@upr.edu; tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?
Thanks Tony for bringing this back. I think I tend to support the idea of a new canonicalName term, see below.
On Nov 22, 2010, at 3:50, Tony.Rees@csiro.au wrote:
Correct - and returning to my original question, there appear to be 2 contrasting views:
(1) Include authority and other strictly "non canonical name" info in DwC:scientificName as available (as exemplified by Markus, Rich Pyle, also present DwC specification) - however now the canonical elements must be obtained by re-parsing the supplied scientificName content, or supplied separately in DwC:genus, DwC:specificEpithet, etc.
(2) Omit authority and other strictly "non canonical name" info from DwC:scientificName since this can be supplied elsewhere e.g. in DwC:scientificNameAuthorship, and makes the strictly canonical name information available directly rather than having to re-parse the DwC:scientificName element (Rod, Hilmar, Catalogue of Life format, Dmitry (?), also my practice for the last 8+ years although possibly not correct).
Yes, that's the current choice one has with dwc. I think I have stressed before the problems with #2 when dealing with more than simple binomials.
TCS includes the element "Uninomial" (under the "CanonicalName" node), to address all names consisting of a single "part" (=single "NameElement" in GNUB-speak); including names at the rank of genus. I don't remember exactly whether names at the rank of genus are supposed to be represented in both "Uninomial" and "Genus", but I guess it doesn't really matter.
The addition of "Uninomial" to DwC would effectively solve the problem of representing names not among the "main" ranks.
Rich
Hi Rich, thanks for the suggestion.
"uninomial" would equal "canonicalName" for ranks subgenus and above, but not for species and below, while canonicalName (or scientificNameCanonical if you prefer) covers all cases, which is why I think it is preferable, especially as the majority of names in circulation are at species level and below I think...
Atomising further, i.e. splitting a binomial or polynomial into genus, species, infraspecies, is actually a separate activity with its own rationale, I would say.
Just my personal view, of course...
Regards - Tony
The cleanest way to do it is simply to have Rank, NameElement and parentNameUsageID, and be done with it (maybe with the addition of verbatimNameString for purists). But that assumes that providers have parsed data, which they often do not. Maybe with services like those associated with GNI, the time of databases with unparsed names data is drawing to a close. Or, maybe if GNUB gets a foothold, we'll solve all the problems via a simple actionable persistent identifier.
But until that time, dwc needs to find a balance between users who want pre-parsed data, and providers who do not have pre-parsed data.
I think dwc *almost* accommodates both worlds, as long as scientificName is defined as "the complete set of textual elements useful for recognizing a unique scientific name"; which is either concatenated by the provider with parsed data, or simply "provided" by the provider with unparsed data.
What we seem to be arguing about now is how many different forms of a "formatted" name do we want?
With or without authorship?
With or without year?
With or without infraspecific prefixes ("var.", "f." etc.)?
With or without infrageneric name(s)?
With or without italics codes?
With or without qualifiers like "cf.", "aff.", etc.?
Etc.
Etc.
Etc.
There are potentially dozens of different terms we could define to accommodate every particular niche-need.
Personally, I think that the existing "scientificName" should be split into two different terms:
fullScientificNameStringWithAuthorship and verbatimNameString
The first would be a concatenated text string assembled from parsed bits, according to a community standard concatenation form.
The second would be the literal text string as it appeared in the original source.
Otherwise, we could argue forever about which of the dozen possible forms we think DwC needs a term for.
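To make the "how many forms" point concrete, here is a small Python sketch of a formatter that assembles a name string from parsed bits. The function `format_name`, its parameters, and the joining rules are hypothetical; no community standard concatenation form is implied. Even with only three on/off switches there are already eight possible output forms.

```python
from typing import Optional


def format_name(
    genus: str,
    specific_epithet: Optional[str] = None,
    infraspecific_marker: Optional[str] = None,   # e.g. "var.", "f."
    infraspecific_epithet: Optional[str] = None,
    authorship: Optional[str] = None,
    year: Optional[str] = None,
    *,
    with_authorship: bool = True,
    with_year: bool = True,
    with_marker: bool = True,
) -> str:
    """Concatenate parsed name parts; each flag switches one 'form' on or off."""
    parts = [genus]
    if specific_epithet:
        parts.append(specific_epithet)
    if infraspecific_epithet:
        if with_marker and infraspecific_marker:
            parts.append(infraspecific_marker)
        parts.append(infraspecific_epithet)
    if with_authorship and authorship:
        # Assumed convention: "Author, year" when a year is supplied.
        parts.append(f"{authorship}, {year}" if with_year and year else authorship)
    return " ".join(parts)


print(format_name("Agalinis", "purpurea", authorship="(L.) Pennell"))
print(format_name("Agalinis", "purpurea", authorship="(L.) Pennell",
                  with_authorship=False))
```

A verbatimNameString, by contrast, would never pass through such a function: it is stored exactly as found in the source.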
Rich
more informative might be
fullScientificNameStringWithAuthorshipIfYouHaveItAndTooBadForYouIfYouDont
more normative might be
fullScientificNameStringWithAuthorshipIfYouHaveItAndShameOnYouIfYouDont
On Mon, Nov 22, 2010 at 3:21 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Correction on my previous post...
I said:
fullScientificNameStringWithAuthorship
What I meant was:
fullScientificNameStringWithAuthorshipIfYouHaveIt
Rich
more informative might be
fullScientificNameStringWithAuthorshipIfYouHaveItAndTooBadForYouIfYouDont
Yes, but the key question is: "TooBad" for whom?
For the Provider? (i.e., if you don't have the Authorship information, then don't even bother giving us the record)
Or, for the User? (i.e., here's all I got; so too bad if you also wanted Authorship details).
more normative might be
fullScientificNameStringWithAuthorshipIfYouHaveItAndShameOnYouIfYouDont
Yes, but as I imagine many of these name-strings will be emerging from BHL OCR text, in most cases we'll only be casting shame on people who are no longer living.
Rich
Hi Rich, all,
You wrote:
Otherwise, we could argue forever about which of the dozen possible forms we think DwC needs a term for.
No, I think that is muddying the waters (with respect of course...). I simply made the case for "canonicalName" - aka scientific name without authorship - as a valuable adjunct to "scientificName", for users who can supply both, and consumers who would otherwise have to generate the former from the latter algorithmically. Markus and Dima probably represent the main "consumers" here, and I, if you like, can represent a "provider" (although I wear other "consumer" hats on occasion as well). Basically, if a "canonicalName" field does not exist, I will just omit this information, which seems sub-optimal since it all exists pre-parsed and manually verified in my system, and someone else will then have to do the job again...
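For a sense of what "generate the former from the latter algorithmically" can involve on the consumer side, here is a deliberately naive Python sketch. The function `naive_canonical`, the marker list, and the placeholder name "Aus bus de Example" are all made up for illustration; this is not how any real name parser works, and the last call shows it getting a name wrong.

```python
import re

# Hypothetical, incomplete list of infraspecific rank markers.
RANK_MARKERS = {"subsp.", "ssp.", "var.", "f."}


def naive_canonical(name: str) -> str:
    """Keep the first token plus following all-lowercase tokens; stop at anything else."""
    tokens = name.split()
    if not tokens:
        return ""
    kept = [tokens[0]]
    for tok in tokens[1:]:
        if tok in RANK_MARKERS:
            continue                      # drop "var.", "f." and similar
        if re.fullmatch(r"[a-z][a-z-]*", tok):
            kept.append(tok)              # looks like an epithet
        else:
            break                         # assume authorship starts here
    return " ".join(kept)


print(naive_canonical("Agalinis purpurea (L.) Pennell"))
# A lowercase author particle is mistaken for an epithet:
print(naive_canonical("Aus bus de Example"))
```

The second call returns "Aus bus de", which is the kind of error a provider-supplied, manually verified canonical form would avoid.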
Regards - Tony
While I haven't seen them all, I have seen and had to understand a good number of biodiversity databases including many focused on managing species lists in one form or another. Names are represented in these three forms.
1. Completely unparsed where the entire verbose name text is in a single field corresponding to dwc:scientificName. In some databases this means just a scientific name as many databases don't hold authorship information.
2. Semi-parsed where the canonical name is separated from the authorship information corresponding to the proposed canonicalName and dwc:scientificNameAuthorship
3. Fully parsed into atoms (genus, specific epithet, infraspecific rank, infraspecies, authorship) corresponding to the incomplete set of dwc atomic elements already in existence. This form is the most problematic because 1) it isn't always clear from the parts how the actual complete name is intended to be represented, 2) there are so many structural exceptions and complexities that many more 'atoms' need to be described to effectively enable it to be used, and 3) there is the problematic definition of the use of Genus as described by Markus that conflicts with atomising synonyms.
It makes sense to maintain the separation of name and authorship in data sources that already do this, but I'm not convinced a canonicalName element is required. It seems that it is suggested because it makes it easier to consume the data, but it also means it's more confusing for a typical data manager or biologist to produce it. I have a database with binomials alone. How many data managers or biologists will map them to canonicalName before scientificName? I know we want to avoid testing different conditions when we use the data, but we will have to in either case.
Maybe we shouldn't add canonical name but rather something more specific to the concatenated form, like
dwc:scientificNameWithAuthorshipAndOtherBits
dwc:scientificName
dwc:scientificNameAuthorship
I'd know what to do then
DR
David Remsen wrote:
Maybe we shouldn't add canonical name but rather something more specific to the concatenated form like dwc:scientificNameWithAuthorshipAndOtherBits dwc:scientificName dwc:scientificNameAuthorship
If by "dwc:scientificName" you mean with authorship omitted, that is fine, however it would need the dwc definition to be altered...
Then at least folk would/should know which field to populate. However, the mandatory yes/no issue would also have to be addressed: at present I think dwc:scientificName is the only taxonomy-related element that is mandatory; all others are optional. Under your scenario it would then maybe be either one of the first 2 fields, or both as available, I guess?
Regards - Tony
Tony
I did indeed mean that scientificName and authorship could be used in the following way:
1. "Agalinis purpurea" -> scientificName ("Agalinis purpurea") - where a canonical form of the name with no authorship is in the source data
2. "Agalinis purpurea (L.) Pennell" -> scientificName ("Agalinis purpurea (L.) Pennell") - where an unparsed name+author is in the source data
3. "Agalinis purpurea" AND "(L.) Pennell" -> scientificName ("Agalinis purpurea") + scientificNameAuthorship ("(L.) Pennell") - where a semi-parsed name + author is in the source data
4. "Agalinis" AND "purpurea" AND "(L.) Pennell" -> scientificName ("Agalinis purpurea") + scientificNameAuthorship ("(L.) Pennell") - where a fully atomised name is in the source data and the 'name' parts are concatenated to make a proper canonical name.
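The four cases above can be sketched as a single mapping function. This is only an illustration in Python: the function `to_dwc` and its keyword arguments are hypothetical, and the output keys simply reuse the two DwC term names discussed in this thread.

```python
from typing import Dict, Optional


def to_dwc(
    *,
    name: Optional[str] = None,          # whole name string as held in the source
    authorship: Optional[str] = None,    # separate authorship, if the source has it
    genus: Optional[str] = None,         # atomised parts, if the source has them
    specific_epithet: Optional[str] = None,
) -> Dict[str, str]:
    """Map source data to scientificName / scientificNameAuthorship without parsing."""
    if name is None:
        if genus is None:
            raise ValueError("need either a name string or at least a genus")
        # Case 4: concatenate the atomised 'name' parts.
        name = " ".join(p for p in (genus, specific_epithet) if p)
    # Cases 1 and 2: the string is passed through exactly as supplied.
    record = {"scientificName": name}
    if authorship:
        # Cases 3 and 4: authorship travels in its own field.
        record["scientificNameAuthorship"] = authorship
    return record


print(to_dwc(name="Agalinis purpurea"))
print(to_dwc(name="Agalinis purpurea (L.) Pennell"))
print(to_dwc(name="Agalinis purpurea", authorship="(L.) Pennell"))
print(to_dwc(genus="Agalinis", specific_epithet="purpurea", authorship="(L.) Pennell"))
```

Note that case 2 yields a scientificName that includes authorship while cases 1, 3 and 4 do not, so the content of the field differs between providers.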
Cases 3 and 4 require modification of the definition at http://rs.tdwg.org/dwc/terms/index.htm#scientificName to be something like
"The full scientific name, which may include authorship and date information if known..." with the implicit intention that it is not REQUIRED to parse or semi-parse an unparsed name in order to properly share it.
David
Hi David,
It seems to me that your suggestion is still not quite ideal, in that sometimes just the dwc:scientificName element will be picked up and passed around and the content will not be consistent between those suppliers who concatenate the available authority info and those who do not. That suggests to me that an extra field for known canonicalName if this can be supplied is still desirable - but I am not sure if I am alone in thinking this...
Regards - Tony
________________________________________ From: David Remsen (GBIF) [dremsen@gbif.org] Sent: Tuesday, 23 November 2010 11:15 PM To: Rees, Tony (CMAR, Hobart) Cc: David Remsen (GBIF); deepreef@bishopmuseum.org; m.doering@mac.com; tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?
Tony
I did indeed mean that scientificName and authorship could be used in the following way
1. "Agalinis purpurea" -> scientificName ("Agalinis purpurea") - where a canonical form of the name with no authorship in the source data
2. "Agalinis purpurea (L.) Pennell" -> scientificName ("Agalinis purpurea (L.) Pennell" ) - where a unparsed name+author is in the source data
3. "Agalinis purpurea" AND "(L.) Pennell" -> scientificName ("Agalinis purpurea") + scientificNameAuthorship ("(L.) Pennell") - where a semi-parsed name + author is in the source data
4. "Agalinis" AND purpurea" AND "(L.) Pennell" > scientificName ("Agalinis purpurea") + scientificNameAuthorship ("(L.) - where a fully atomised name is in the source data and the 'name' parts concatenated to make a proper canonical name.
Cases 3 and 4 require modification of the definition at http://rs.tdwg.org/dwc/terms/index.htm#scientificName to be something like
"The full scientific name, which may include authorship and date information if known..." with the implicit intention that it is not REQUIRED to parse or semi-parse an unparsed name in order to properly share it.
David
On Nov 23, 2010, at 12:35 PM, Tony.Rees@csiro.au wrote:
David Remsen wrote:
Maybe we shouldnt add canonical name but rather something more specific to the concatenated form like dwc:scientificNameWithAuthorshipAndOtherBits dwc:scientificName dwc:scientificNameAuthorship
If by "dwc:scientificName" you mean with authorship omitted, that is fine, however it would need the dwc definition to be altered...
Then at least folk would/should know which field to populate. However the mandatory yes/no issue would also have to be addressed - at present I think dwc:scientificName is the only taxonomy related element that is mandatory, all others are optional. Under your scenario it would then maybe be one of either of the first 2 fields, or both as available, I guess?
Regards - Tony
From: David Remsen (GBIF) [dremsen@gbif.org] Sent: Tuesday, 23 November 2010 7:47 PM To: Rees, Tony (CMAR, Hobart) Cc: David Remsen (GBIF); deepreef@bishopmuseum.org; m.doering@mac.com; tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?
While I haven't seen them all, I have seen and had to understand a good number of biodiversity databases including many focused on managing species lists in one form or another. Names are represented in these three forms.
- Completely unparsed where the entire verbose name text is in a
single field corresponding to dwc:scientificName. In some databases this means just a scientific name as many databases don't hold authorship information.
- Semi-parsed where the canonical name is separated from the
authorship information corresponding to the proposed canonicalName and dwc:scientificNameAuthorship
- Fully parsed into atoms (genus, specific epithet, infraspecific
rank, infraspecies, authorship) corresponding to the incomplete set of dwc atomic elements already in existence. This form is the most problematic because 1) it isn't always clear from the parts how the actual complete name is intended to be represented and 2) there are so many structural exceptions and complexities that many more 'atoms' need to be described to effectively enable it to be used. 3) there is the problematic definition of the use of Genus as described by Markus that conflicts with atomising synonyms.
It makes sense to maintain the separation of name and authorship in data sources that already do this but Im not convinced a canonicalName element is required. It seems that it is suggested so that it makes it easier to consume the data but it also means its more confusing for a typical data manager or biologist to produce it. I have a database with binomials alone. How many data managers or biologists will map them to canonicalName before scientificName? I know we want to avoid testing different conditions when we use the data but we will have to in either case.
Maybe we shouldnt add canonical name but rather something more specific to the concatenated form like
dwc:scientificNameWithAuthorshipAndOtherBits dwc:scientificName dwc:scientificNameAuthorship
I'd know what to do then
DR
On Nov 22, 2010, at 11:18 PM, Tony.Rees@csiro.au Tony.Rees@csiro.au wrote:
Hi Rich, all,
You wrote: .
Otherwise, we could argue forever about which of the dozen possible forms we think DwC needs a term for.
No, I think that is muddying the waters (with respect of course...) I simply made the case for "canonicalName" - aka scientific name without authorship - as a valuable adjunct to "scientificName", for users who can supply both, and consumers who would otherwise have to generate the former from the latter algorithmically. Markus, Dima probably represent the main "consumers" here and I if you like can represent a "provider" (although I wear other "consumer" hats on occasion as well). Basically if a "canonicalName" field does not exist, I will just omit to provide this information, which seems sub- optimal since it all exists pre-parsed and manually verified in my system, and someone else will then have to do the job again...
Regards - Tony
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 23 November 2010 7:06 AM To: Rees, Tony (CMAR, Hobart); m.doering@mac.com Cc: tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?
"uninomial" would equal "canonicalName" for ranks subgenus and above, but not for species and below, while canonicalName (or scientificNameCanonical if you prefer) covers all cases, which is why I think it is preferable, especially as the majority of names in circulation are at species level and below, I think...
Atomising further, i.e. a binomial or polynomial into genus, species, infraspecies, is actually a separate activity with its own rationale, I would say.
Just my personal view, of course...
The cleanest way to do it is to simply have Rank, NameElement and parentNameUsageID, and be done with it (maybe with the addition of verbatimNameString for purists). But that assumes that providers have parsed data, which they often do not. Maybe with services like those associated with GNI, the time of databases with unparsed names data is drawing to a close. Or, maybe if GNUB gets a foot-hold, we'll solve all the problems via a simply actionable persistent identifier.
But until that time, dwc needs to find a balance between users who want pre-parsed data, and providers who do not have pre-parsed data.
I think dwc *almost* accommodates both worlds, as long as scientificName is defined as "the complete set of textual elements useful for recognizing a unique scientific name"; which is either concatenated by the provider with parsed data, or simply "provided" by the provider with unparsed data.
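The two provider situations described above can be sketched as follows. This is only an illustration under assumptions: the `ParsedName` class, its field names (loosely modelled on DwC terms) and the join-with-spaces convention are all hypothetical, not a community standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedName:
    # Hypothetical record layout for a provider that holds parsed data.
    genus: str
    specific_epithet: Optional[str] = None
    rank_marker: Optional[str] = None            # e.g. "var.", "subsp."
    infraspecific_epithet: Optional[str] = None
    authorship: Optional[str] = None


def concatenate(name: ParsedName) -> str:
    """Assemble a full name string from parsed parts, skipping empty ones."""
    parts = [
        name.genus,
        name.specific_epithet,
        name.rank_marker,
        name.infraspecific_epithet,
        name.authorship,
    ]
    return " ".join(p for p in parts if p)


def scientific_name(parsed: Optional[ParsedName], verbatim: Optional[str]) -> str:
    """Concatenate when parsed data exists, otherwise pass the string through."""
    if parsed is not None:
        return concatenate(parsed)
    return verbatim or ""


parsed = ParsedName("Homo", "sapiens", authorship="Linnaeus, 1758")
print(scientific_name(parsed, None))
print(scientific_name(None, "Homo sapiens Linnaeus, 1758"))
```

Either route yields a single string for scientificName; only the provider with parsed data can also supply the separate pieces.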
What we seem to be arguing about now is how many different forms of a "formatted" name do we want?
With or without authorship?
With or without year?
With or without infraspecific prefixes ("var.", "f." etc.)?
With or without infrageneric name(s)?
With or without italics codes?
With or without qualifiers like "cf.", "aff.", etc.?
Etc.
Etc.
Etc.
There are potentially dozens of different terms we could define to accommodate every particular niche-need.
Personally, I think that the existing "scientificName" should be split into two different terms:
fullScientificNameStringWithAuthorship and verbatimNameString
The first would be a concatenated text string assembled from parsed bits, according to a community standard concatenation form.
The second would be the literal text string as it appeared in the original source.
Otherwise, we could argue forever about which of the dozen possible forms we think DwC needs a term for.
Rich
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
I think it would be useful though to have a canonicalName field that only takes "Genus specificEpithet infraspecificEpithet" and no more.
(I think we need to add the rank string. Same epithet for subsp. and var. is not unusual in Botany/Mycology. Gregor)
Candidly, I would love to get rid of the infraspecific rank thing in botany and work with tri- or perhaps even poly- nominals, but history, practice and the code are against it.
I don't like to say nice things about zoologists, but I think they got this one right. :)
Jim
On Friday, November 19, 2010, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
I think it would be useful though to have a canonicalName field that only takes "Genus specificEpithet infraspecificEpithet" and no more.
(I think we need to add the rank string. Same epithet for subsp. and var. is not unusual in Botany/Mycology. Gregor)
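Gregor's point about the rank string can be shown with a small sketch. The records use the placeholder name "Aus bus" with a shared epithet "cus" at two ranks; they are not real taxa, and the helper names are hypothetical.

```python
# Two hypothetical infraspecific names that share the same epithet
# but differ in rank: (genus, specific epithet, rank marker, infraspecific epithet).
records = [
    ("Aus", "bus", "subsp.", "cus"),
    ("Aus", "bus", "var.", "cus"),
]


def canonical_without_rank(genus, species, rank, infra):
    # "Genus specificEpithet infraspecificEpithet" and no more.
    return f"{genus} {species} {infra}"


def canonical_with_rank(genus, species, rank, infra):
    # Same form, with the rank string added between the epithets.
    return f"{genus} {species} {rank} {infra}"


without_rank = {canonical_without_rank(*r) for r in records}
with_rank = {canonical_with_rank(*r) for r in records}

print(len(without_rank))  # the two names collapse into one string
print(len(with_rank))     # the two names stay distinct
```

Without the rank marker the two records become indistinguishable, which is why a purely trinomial canonical form is not enough for botanical or mycological data.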
participants (13)
- "Markus Döring (GBIF)"
- Bob Morris
- David Remsen (GBIF)
- dipteryx@freeler.nl
- greg whitbread
- Gregor Hagedorn
- Jim Croft
- Markus Döring
- Nico Franz
- Paul Murray
- Richard Pyle
- Roderic Page
- Tony.Rees@csiro.au