Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]

Paul Murray

17 Nov 2010

19:44

(I'll send this to tdwg-content rather than tdwg-tag)

We are sorting through this issue now. The TDWG vocabulary defines a property "nameComplete" which is the name without authority. It states that the name with authority belongs in a dcterms:title field:

"The complete uninomial, binomial or trinomial name without any authority or year components. Every TaxonName should have a DublinCore:title property that contains the complete name string including authors and year (where appropriate)."

I'm a tad uneasy about this: a vocabulary expressing an opinion on fields that are outside of its own URI namespace. What if my name objects are more than simply names, and I want the dcterms:title to reflect data that the TDWG vocabulary doesn't know about? I might want my title, for instance, to announce that a record is a draft or test record. In particular, our titles will be including the nomenclatural status ("nom. cons." etc) - does that mean that we are not conforming to the standard?

On 18/11/2010, at 2:31 PM, <Tony.Rees@csiro.au> wrote:

...

Dear TDWG-persons,

I note that DwC "scientificName" as defined at http://rs.tdwg.org/dwc/terms/#scientificName is supposed to include authorship, however this can also be supplied separately in the field "scientificNameAuthorship". Under that scenario, should authorship be also included in the scientificName field, or omitted?

I can see arguments either way for this - if the authorship is included in the same field as the rest of the scientific name, then that single value is more meaningful and better for e.g. homonym disambiguation. On the other hand, it then requires parsing to get the scientific name without the authority, which if done incorrectly could introduce errors.

Any advice from the persons designing or using this field for data exchange would be appreciated.

Regards - Tony

Tony Rees Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000) e-mail: Tony.Rees@csiro.au Manager, OBIS Australia regional node, http://www.obis.org.au/ Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566

_______________________________________________ tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag

------ If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments. Please consider the environment before printing this email. ------
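Tony's question can be made concrete with a small sketch. It assumes a record that carries both `scientificName` (with authorship) and `scientificNameAuthorship`, and that the authorship is a literal suffix of the full name; the helper `strip_authorship` and the sample values are hypothetical and not part of Darwin Core.

```python
def strip_authorship(scientific_name: str, authorship: str) -> str:
    """Return the name without authorship when the authorship is a literal suffix.

    No parsing is attempted: if the authorship string is empty or does not
    appear at the end of the name, the name is returned unchanged.
    """
    name = scientific_name.strip()
    author = authorship.strip()
    if author and name.endswith(author):
        return name[: -len(author)].rstrip()
    return name


# Hypothetical record using the two Darwin Core terms discussed above.
record = {
    "scientificName": "Aus bus Smith, 1900",
    "scientificNameAuthorship": "Smith, 1900",
}

print(strip_authorship(record["scientificName"], record["scientificNameAuthorship"]))
```

When both fields are supplied, a consumer can recover the bare name this way without guessing where the authority starts. The sketch does nothing useful when the authorship sits in the middle of the string, so it is not a general answer.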


Markus Döring

18 Nov

01:05


Tony,

at GBIF we recommend including the authorship/year for a name to have the most complete full name. This is useful for several reasons:

- homonym disambiguation
- getting a better idea of your taxonomic concept
- not having to deal with autonyms when reassembling an atomised name

And even if you omit the authorship from a name you will need some name parsing/cleaning if you are after consistency across various datasets. The infraspecific ranks are treated differently, some people supply more than 2 epithets, some include the subgenus, etc.

I think it would be useful though to have a canonicalName field that only takes "Genus specificEpithet infraspecificEpithet" and no more.

Markus

On Nov 18, 2010, at 4:44, Paul Murray wrote:

...


"Markus Döring (GBIF)"

02:07


some quick additions to my previous mail in haste:

I am referring to the new Darwin Core as referred to by Tony, not the ontology/TDWG vocabulary which predates the latest Darwin Core.

When dealing with hybrid formulas and informal conceptual hints like sensu stricto/lato a full name string is also useful. For determination-derived artifacts like cf. or aff., Darwin Core has an identificationQualifier term: http://rs.tdwg.org/dwc/terms/index.htm#identificationQualifier

And there really is no need for a canonicalName term as I suggested below, as we have the 3 parts (genus, specificEpithet, infraspecificEpithet) already as atoms.

Markus

On Nov 18, 2010, at 10:05, Markus Döring wrote:

...

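As a minimal sketch of the point about atoms: if genus, specificEpithet and infraspecificEpithet are supplied separately, the "Genus specificEpithet infraspecificEpithet" string can be derived on demand. The function name `canonical_name` is an assumption for illustration, and rank markers, hybrids and authorship are deliberately ignored.

```python
from typing import Optional


def canonical_name(
    genus: str,
    specific_epithet: Optional[str] = None,
    infraspecific_epithet: Optional[str] = None,
) -> str:
    """Join whichever of the three atoms are present, separated by single spaces."""
    parts = [genus, specific_epithet, infraspecific_epithet]
    return " ".join(p.strip() for p in parts if p and p.strip())


print(canonical_name("Vombatus", "ursinus", "ursinus"))
print(canonical_name("Ebridea"))
```

This only covers the plain uninomial, binomial and trinomial cases.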

Paul Murray

18:08


...


This is an issue we are struggling with now. Getting from the data items and flags to a correctly laid out name string is not at all trivial.

For botanical names, if the third term of the name is not a ssp, then you need the rank:

A-us b-us var. d-us

There may also be a hybrid mark, which may appear .... actually, I need to confirm this: I think it may appear right at the front, or it may appear in front of the terminal epithet - I'm not sure whether it replaces the rank code or has to appear on one side of it:

X A-us b-us var. d-us
A-us b-us var. X d-us
A-us b-us X var. d-us

To correctly compose botanical names, there is a rule that is different from the zoological rule: for autonyms, the botanists prefer that the authority string appear after the "root" name, not after the whole name:

zoological - Vombatus ursinus ursinus Mike
botanical - Pinus L. pinus

And so you need to know a) is the name an autonym? and b) is it botanical?

Cultivar names may be introduced with a pseudo-rank of "cv." or by putting the cultivar name in quotes. Cultivar names are not italicised. And this is not even to begin discussing hybrids and grafts. Oh - and I believe that sometimes zoologists like the family name in square brackets in front of the name. And there's also nomenclatural status/qualifier: "nom. cons." etc.

And so on and so forth. Lord only knows how virologists name their taxa.

The difficulty is: we want our data to be useable by web applications, which is why we produce JSON. It's not sensible to expect that every JavaScript programmer is going to get this stuff right. We cannot simply give enough data that - if you know all the rules - you can get it correct. What we have concluded is that our data needs to have an item in it that will permit a programmer to easily render the name correctly, and that this needs to be separate from the fields as data.

There are a couple of options so far:

* an array or RDF "list" of components, each component being an object with a string and some sort of indicator as to whether it should be italicised or not

* a format string into which the components of the name are substituted. For instance: the format string for a subspecies might be "{G} {s} {e}" (e for epithet), whereas that for a form or variety would be "{G} {s} {r} {e}". We would wind up with - hopefully - a manageable list of formats.

* an XHTML literal (rdf:parseType="literal"), making use of span elements and CSS classes to permit finer control over formatting. XHTML is the applicable standard for formatted text. We would use tags where appropriate, so that with no CSS at all the scientific name still comes out correctly. Thus: Vombatus ursinus ursinus
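The first two options can be sketched together. This is only an illustration under stated assumptions: the `FORMATS` table, the function names, and the choice to italicise everything except the rank marker are hypothetical, and the `<i>` markup is one possible rendering rather than the XHTML literal described above.

```python
from html import escape

# Hypothetical table of format strings, using the placeholders from the message:
# G = genus, s = species epithet, r = rank marker, e = final epithet.
FORMATS = {
    "subspecies": "{G} {s} {e}",
    "variety": "{G} {s} {r} {e}",
}


def render_plain(kind, genus, species, epithet, rank=""):
    """Substitute the name parts into the format string for this kind of name."""
    return FORMATS[kind].format(G=genus, s=species, r=rank, e=epithet)


def components(kind, genus, species, epithet, rank=""):
    """Return (text, italic) pairs in display order, driven by the same format string."""
    values = {
        "G": (genus, True),
        "s": (species, True),
        "r": (rank, False),  # assumption: the rank marker is not italicised
        "e": (epithet, True),
    }
    return [values[token.strip("{}")] for token in FORMATS[kind].split()]


def render_html(parts):
    """Render (text, italic) pairs, wrapping italic parts in <i> elements."""
    return " ".join(
        f"<i>{escape(text)}</i>" if italic else escape(text) for text, italic in parts
    )


print(render_plain("variety", "A-us", "b-us", "d-us", "var."))
print(render_html(components("variety", "A-us", "b-us", "d-us", "var.")))
```

A consumer that only has the component list never needs to know the nomenclatural rules; it just walks the list and applies the italic flag.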

Paul Murray

20:06


It has just been gently explained to me that Pinus pinus is not an autonym, although Pinus pinus pinus is. I suppose this underscores the point that IT people building systems and webpages out of this data will tend not to get it right if just given the data fields.

On 19/11/2010, at 1:08 PM, Paul Murray wrote:

...


Jim Croft

21:17


Also gently, botanists generally don't do Pinus pinus or Pinus pinus pinus. We do Pinus patula var. patula (or Pinus patula subsp. patula). These are autonyms that are not published as such but come into existence 'automagically' when another variety or subspecies is described. They do not actually serve any useful purpose other than to alert you that there are other varieties or subspecies in this species to be aware of and that you are not dealing with them in this case.

In the hypothetical instance above, you could assume that 'Pinus patula' referred to Pinus patula var. patula and you might be right. But it might also refer to the range of variation covered by the other varieties as well. To resolve this you really need some other contextual information, such as whether you are dealing with the broader concept or the narrower one before or after the other components were excised from or added to the mix.

If you were going to invent a taxonomic and nomenclatural system from scratch, with the benefit of hindsight and the absence of legacy practice, there is no way on earth you would ever do it like this... :)

jim

On Fri, Nov 19, 2010 at 3:06 PM, Paul Murray <pmurray@anbg.gov.au> wrote:

...


-- _________________ Jim Croft ~ jim.croft@gmail.com ~ +61-2-62509499 ~ http://www.google.com/profiles/jim.croft 'A civilized society is one which tolerates eccentricity to the point of doubtful sanity.' - Robert Frost, poet (1874-1963) Please send URIs, not attachments: http://www.gnu.org/philosophy/no-word-attachments.html

Jim Croft

21:59


Just quickly Paul, a botanist would *never* say Pinus L. pinus, or Pinus pinus pinus L. If we needed the author string (and I admit we grossly overuse it where it is not necessary, probably because it makes things look scientific and important) we would go for something like Pinus patula (personWhoCreatedTheEpithet) personWhoMovedTheEpithetIntoThisGenus. For some inexplicable reason zoologists throw away the parentheses and the stuff following.

For an autonym, from the Glossary in the 2006 botanical code, p. 484: "autonym: ... specific epithet repeated without an author citation as the final epithet in the name of ... an infraspecific taxon name that included the type of the adopted, legitimate name of the ... species ..."

Thus, if you wanted to render an autonymic name with author, it would be something like:

Pinus patula authorString var. patula

The autonymic infraspecies epithet appears never to have an author; the authorship is implied from the authorship of the species epithet.

jim

p.s. also, avoid modelling hybrids and hybrid formulae - therein lies madness and putrefaction of the spirit...

On Fri, Nov 19, 2010 at 1:08 PM, Paul Murray <pmurray@anbg.gov.au> wrote:

...

zoological - Vombatus ursinus ursinus Mike botanical - Pinus L. pinus
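The rendering Jim describes for botanical autonyms could be sketched roughly as follows. The function name, its parameters, and the test "infraspecific epithet equals species epithet" as the autonym check are assumptions for illustration; hybrids, cultivars and nomenclatural status are left out entirely.

```python
def render_botanical_infraspecific(
    genus, species, rank, infra, species_author="", infra_author=""
):
    """Lay out a botanical infraspecific name with its author string.

    For an autonym the author follows the species epithet and the final
    epithet carries no author; otherwise the author follows the final epithet.
    """
    is_autonym = infra == species  # simplifying assumption, see lead-in
    if is_autonym:
        parts = [genus, species, species_author, rank, infra]
    else:
        parts = [genus, species, rank, infra, infra_author]
    return " ".join(p for p in parts if p)


print(render_botanical_infraspecific(
    "Pinus", "patula", "var.", "patula", species_author="authorString"
))
```

Any author passed for the final epithet of an autonym is simply ignored here, in line with the glossary definition quoted above.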

Tony.Rees@csiro.au

22:07


Here's something a zoologist has written, though (Cavalier-Smith, 1993):

[Class] "Ebridea Lemmermann, 1901 emend. Deflandre, 1936 stat. nov. Loeblich III, 1970 orthog. emend."

I'm sure a parser will have fun deducing what of this forms the authority...

Cheers - Tony


Jim Croft

22:14


On Fri, Nov 19, 2010 at 5:07 PM, <Tony.Rees@csiro.au> wrote:

...

Here's something a zoologist has written, though (Cavalier-Smith, 1993):

Who are these people? And what are they doing on my interweb?

...

[Class] "Ebridea Lemmermann, 1901 emend. Deflandre, 1936 stat. nov. Loeblich III, 1970 orthog. emend."

I'm sure a parser will have fun deducing what of this forms the authority...

Well, this parser thinks it has worked out the "name" is Ebridea.. as for the rest? It's just "stuff"... :)

jim
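In the same spirit, here is a deliberately naive sketch of a "name versus stuff" split. The heuristic (keep the first token plus any following all-lowercase alphabetic tokens) and the function name `split_name` are assumptions for illustration only, not a real name parser.

```python
def split_name(name_string):
    """Split a name string into (name, rest) using a crude token heuristic.

    The first token is taken as the name; following tokens are added while
    they are purely alphabetic and lowercase. Everything else is "rest".
    """
    tokens = name_string.split()
    if not tokens:
        return "", ""
    name = [tokens[0]]
    for token in tokens[1:]:
        if token.isalpha() and token.islower():
            name.append(token)
        else:
            break
    rest = tokens[len(name):]
    return " ".join(name), " ".join(rest)


example = (
    "Ebridea Lemmermann, 1901 emend. Deflandre, 1936 "
    "stat. nov. Loeblich III, 1970 orthog. emend."
)
name, stuff = split_name(example)
print(name)
print(stuff)
```

The heuristic breaks quickly: it stops at a rank marker such as "var.", so "Pinus patula var. patula" is cut after the species epithet, and it would swallow a lowercase author particle. That is the kind of error that supplying both the full name and the atomised parts avoids.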

"Markus Döring (GBIF)"

19 Nov

00:00


having the full name AND the basic parts is quite useful sometimes...

On Nov 19, 2010, at 7:07, Tony.Rees@csiro.au wrote:

...

Here's something a zoologist has written, though (Cavalier-Smith, 1993):

[Class] "Ebridea Lemmermann, 1901 emend. Deflandre, 1936 stat. nov. Loeblich III, 1970 orthog. emend."

I'm sure a parser will have fun deducing what of this forms the authority...

Cheers - Tony

...
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content- bounces@lists.tdwg.org] On Behalf Of Jim Croft Sent: Friday, 19 November 2010 4:59 PM To: Paul Murray Cc: tdwg-content@lists.tdwg.org List; "Markus Döring (GBIF)" Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]

Just quickly Paul, a botanist would *never* say Pinus L. pinus, or Pinus pinus pinus L. If we needed the author string (and I admit we grossly overuse it where it is not necessary, probably because it makes things look scientific and important) we would go for something like Pinus patula (personWhoCreatedTheEpithet) personWhoMovedTheEpithetIntoThisGenus. For some inexplicable reason zoologists throw away the parentheses and the stuff following.

For an autonym, from the Glossary in the 2006 botanical code, p. 484: "autonym: ... specific epithet repeated without an author citation as the final epithet in the name of ... an infraspecific taxon name that included the type of the adopted, legitimate name of the ... species ..."

Thus, if you wanted to render an autonymic name with author, it would be something like:

Pinus patula authorString var. patula

The autonymic infraspecies epithet appears never to have an author; the authorship is implied from the authorship of the species epithet.

jim

p.s. also, avoid modelling hybrids and hybrid formulae - therein lies madness and putrefaction of the spirit...

On Fri, Nov 19, 2010 at 1:08 PM, Paul Murray <pmurray@anbg.gov.au> wrote:

...
zoological - Vombatus ursinus ursinus Mike
botanical - Pinus L. pinus

_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content

Richard Pyle

20 Nov

18:15

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]

All, Wow, I disappear to Las Vegas for a couple days, and now I don't know what is more surreal: Vegas, or this thread (my money is on Vegas, though I am not a betting man). Couple points:

...

From Tony: Well, that sounds fine to me, however you may note that the ICZN Code at least expressly states that authorship is *not* part of the scientific name:

Don't drag the Code into this. The Code doesn't govern DwC terms.

...

From Jim: For some inexplicable reason zoologists throw away the parentheses and the stuff following.

For some inexplicable reason, botanists feel the need to track ALL new combinations as though they were nomenclatural (rather than taxonomic) events; whereas zoologists more surgically track only the ones that generate homonyms. :-)

More from Jim:

...

I think it is a really bad move to attempt to redefine "name" so as to include the name metadata to achieve some degree of name resolution

I think it's a REALLY bad move to even try to come up with a unified definition of "name" at all in our context. Doing so is an unholy abomination, a lexical atrocity, an affront to logic and an insult to the natural order of the cosmos and any deity conceived by humankind. (to coin a phrase)

The First Commandment of biodiversity informatics communication is: "Thou shalt not useth the unqualified term 'name' with the expectation that he or she upon whose ears (or eyes) it falls will not completely misunderstandth thy point."

Rod wrote:

...

I'm with Jim. For the love of God let's keep things clean and simple.

If clean and simple is the goal, then DwC:scientificName should be defined as the complete set of textual elements useful for recognizing a unique scientific name (or formula of names, in the case of hybrids). If the name is already parsed in the source database, then populate the record with the parsed elements in their respective DwC terms accordingly, and form scientificName as a reasonably standard(ish) concatenation of the full set of elements, to form a string as just defined. If the name is not already parsed in the source database, then provide the complete text string (as just defined) verbatim. *THAT* is about as simple as it is going to get, I'm afraid.

But Rod was talking about "fields", so maybe he's talking about a database model, rather than DwC terms (which, I think, this thread is mostly about). The Model we're taking for GNUB establishes a record for every "NameElement". For example, if there was a usage representing "Centaurea affinis Friv. ssp. affinis var. Affinis", there would be four TaxonNameUsage records (one for each NameElement: "Centaurea" as used at the rank of genus; "affinis" as used at the rank of species; "affinis" as used at the rank of subspecies; and "Affinis" at the rank of variety). We inherit the authorship for each NameElement through its associated Protonym link. In this case, we know it to be an autonym, because each of the last three (infrageneric) NameElements (epithets) happens to share the same Protonym.

Besides the parsed NameElement, GNUB also includes fields (for each NameElement) for:

VerbatimNameString (actual string of characters used to represent the most complete form of the name, inclusive of authorships, prefixes, suffixes, etc.)
TaxonRank (controlled vocabulary)
VerbatimRank (the actual rank they declared it to be within the usage instance, if some obscure-ish rank from the 18th century that is not among, but easily mapped directly to, one of the Controlled Vocabulary items for TaxonRank)
CorrectedNameElement (in case the usage was not in compliance with the Code; such as feminine adjectival epithet combined with a masculine genus, or other such Code-correctable anomalies)

Because the names are fully atomized there can very easily be generated a "standard" name-form, with or without authorship and/or standardized prefixes & suffixes & such.

Simple? HELL no. Powerful? You betcha.

Aloha, Rich
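To make the atomized approach a little more concrete, here is a minimal Python sketch of how a "standard" name-form might be generated from NameElement records. It is not GNUB code: the class shapes, the RANK_MARKERS mapping, the rendering rule (authorship shown once, after the species epithet, unless a lower epithet has a different Protonym) and the genus authorship "L." are all assumptions made for illustration, and the epithets are written in lower case.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Protonym:
    """Original establishment of a name element (hypothetical shape)."""
    epithet: str
    authorship: str


@dataclass(frozen=True)
class NameElement:
    text: str           # the element as used, e.g. "affinis"
    rank: str           # controlled-vocabulary rank
    protonym: Protonym  # link through which authorship is inherited


# Assumed, deliberately incomplete mapping from rank to rank marker.
RANK_MARKERS = {"subspecies": "ssp.", "variety": "var."}


def is_autonym(elements: List[NameElement]) -> bool:
    """True if all epithets below genus share one Protonym."""
    epithets = [e for e in elements if e.rank != "genus"]
    return len(epithets) > 1 and len({e.protonym for e in epithets}) == 1


def render(elements: List[NameElement], with_authorship: bool = True) -> str:
    """Concatenate elements into one string, optionally with authorship."""
    parts = []
    species_protonym = None
    for e in elements:
        if e.rank in RANK_MARKERS:
            parts.append(RANK_MARKERS[e.rank])
        parts.append(e.text)
        if e.rank == "species":
            species_protonym = e.protonym
        if not with_authorship or e.rank == "genus":
            continue  # genus authorship is left out of this rendering
        # Autonymic epithets (same Protonym as the species) get no author.
        if e.rank == "species" or e.protonym != species_protonym:
            parts.append(e.protonym.authorship)
    return " ".join(parts)


centaurea = Protonym("Centaurea", "L.")
affinis = Protonym("affinis", "Friv.")
usage = [
    NameElement("Centaurea", "genus", centaurea),
    NameElement("affinis", "species", affinis),
    NameElement("affinis", "subspecies", affinis),
    NameElement("affinis", "variety", affinis),
]

print(render(usage))
print(render(usage, with_authorship=False))
print(is_autonym(usage))
```

The same four records yield both the string with authorship and the bare string, which is the point of atomizing: neither form has to be parsed back out of the other.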

Roderic Page

21 Nov

00:58

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]

This seems to be one of those threads where we seem hell bent on making things as complicated as possible.

I think Bob Morris was pointing out, in the vast majority of cases biologists use binomials without author names quite happily, and manage to get by just fine. To a first approximation nobody using any of the databases we construct will care about authorship. If they did, we'd be in trouble, because our databases represent this in various ways (comma after author name versus no comma), and some have invented spurious authorships based on chresonyms (that is, the "authorship" is someone who used the name, not the original author, see http://en.wikipedia.org/wiki/Chresonym).

For all the potential ambiguity, people will rely on naked scientific names, so it seems to me to be obvious that anybody exporting data in this area needs to provide a field that contains just the name. Failure to do this makes consuming the data harder than it needs to be, and that would be a mistake.

By all means add additional information in other fields, but doesn't

dwc:scientificName=Philander opossum
dwc:scientificNameAuthorship=Linnaeus, 1758

pretty much cover what most people need? The vast majority of people consuming data will want just the name, so make that front and centre. The single most important value shouldn't be one people have to construct from the data.

Regards

Rod

--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK

Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 AIM: rodpage1962@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html

greg whitbread

02:22

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]

To complete the circle ...

http://www.nhm.ac.uk/hosted_sites/tdwg/plants.html = Plant Taxonomic Database Standards No. 3

greg

-- Greg Whitbread Australian National Botanic Gardens Australian National Herbarium +61 2 62509482 ghw@anbg.gov.au

dipteryx＠freeler.nl

22 Nov

04:18

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]

From: tdwg-content-bounces@lists.tdwg.org on behalf of greg whitbread Sent: Sun 21-11-2010 11:22

To complete the circle ... http://www.nhm.ac.uk/hosted_sites/tdwg/plants.html = Plant Taxonomic Database Standards No. 3

*** I have not looked at this in detail, but a truly outrageous error immediately jumps out, where it says

"The full name of an intergeneric hybrid has in addition an "x" (lower case alphabetic x symbol) preceding the generic name as a generic hybrid marker. "

"The full name of a named interspecific hybrid or chimaera has in addition an "x" (lower case alphabetic x) or "+" plus sign) preceding the species epithet"

There is no conceivable ambiguity in "Art. H.1.1. Hybridity is indicated by the use of the multiplication sign × or by the addition of the prefix notho-¹ to the term denoting the rank of the taxon."

There never has been a "(lower case alphabetic x)" allowed, except where there is force majeure. "Rec. H.3A.2. If the multiplication sign is not available it should be approximated by a lower case letter x (not italicized)."

(BTW, there is no such thing as a "species epithet" in botany; it is a "specific epithet").

Paul van Rijckevorsel
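For data that arrives with the lower-case "x" approximation, a provider could normalise the marker on export. The following is only a sketch under stated assumptions: the function name is hypothetical, it handles only a free-standing "x" separated by whitespace, and the two example names are illustrations that do not come from this thread.

```python
import re

MULTIPLICATION_SIGN = "\u00d7"  # the sign prescribed by Art. H.1.1

# A lone lower-case "x" at the start of the string or after whitespace,
# and followed by whitespace. An "x" glued to the next word is NOT handled.
_X_MARKER = re.compile(r"(?<!\S)x(?=\s)")


def normalize_hybrid_marker(name: str) -> str:
    """Replace a free-standing 'x' hybrid marker with the multiplication sign."""
    return _X_MARKER.sub(MULTIPLICATION_SIGN, name)


print(normalize_hybrid_marker("Mentha x piperita"))
print(normalize_hybrid_marker("x Cupressocyparis leylandii"))
print(normalize_hybrid_marker("Xanthium strumarium"))
```

Names that merely begin with "X" or already carry the multiplication sign pass through unchanged.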

dipteryx＠freeler.nl

23 Nov

00:41

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]


*** After looking at this paper a little more closely I see this is not the brightest thing I could have said. There are three main issues with this paper (besides a lack of rigour in the use of terms):

1) it is fifteen to twenty years out of date (it is dated 1994),
2) it represents a meeting of three worlds
   a) name strings found in databases
   b) names governed by the ICBN and ICNCP
   c) the standards applied by the TDWG
   and it is not always clear which item or which usage belongs to which world,
3) it is a little confused in its focus (what it does deal with and what it does not deal with).

Paul van Rijckevorsel

dipteryx＠freeler.nl

21 Nov

03:25

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]

From: tdwg-content-bounces@lists.tdwg.org on behalf of Roderic Page Sent: Sun 21-11-2010 9:58 [...]

...

I think Bob Morris was pointing out, in the vast majority of cases biologists use binomials without author names quite happily, and manage to get by just fine.

*** And so they should, as that is how a system of nomenclature is designed to work, no matter what Code applies. * * *

...

For all the potential ambiguity, people will rely on naked scientific names,

*** The only ambiguity here is that the circumscription / definition of the taxon is not mentioned (this is fine where it is automatically implied, but often this is not the case). The nomenclatural author is just a (fleeting) detail, to be adjusted as needed. * * *

...

[...] so it seems to me to be obvious that anybody exporting data in this area needs to provide a field that contains just the name. Failure to do this makes consuming the data harder than it needs to be, and that would be a mistake.

...

By all means add additional information in other fields, but doesn't

...

dwc:scientificName=Philander opossum dwc:scientificNameAuthorship=Linnaeus, 1758

...

pretty much cover what most people need? The vast majority of people consuming data will want just the name, so make that front and centre. The single most important value shouldn't be one people have to construct from the data.

*** It looks that way to me, also. Paul van Rijckevorsel

Nico Franz

16:37

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]

My comments inline: On 11/21/2010 4:58 AM, Roderic Page wrote:

...

This seems to be one of those threads where we seem hell bent on making things as complicated as possible.

It's probably more accurate to say that, for better or worse, there are multiple discussions going on. One set of issues relates to the DarwinCore representation of names, and at least one other has to do with use cases that will - arguably - require more semantic resolution than that offered by names. It seemed to me that Bob's comments touched upon both sets of issues.

I think Bob Morris was pointing out, in the vast majority of cases biologists use binomials without author names quite happily, and manage to get by just fine.

I think that's a contentious, and possibly not agenda-free, view. It reflects the "reluctance to go deeper" that I mentioned.

To a first approximation nobody using any of the databases we construct will care about authorship.

Also not necessarily a given, at least not in any of the major use cases that we struggled with in the SEEK project (e.g. predict future mammal species distributions in the Americas based on MaNIS records and climate modeling).

If they did, we'd be in trouble, because our databases represent this in various ways (comma after author name versus no comma), and some have invented spurious authorships based on chresonyms (that is, the "authorship" is someone who used the name, not the original author, see http://en.wikipedia.org/wiki/Chresonym).

As Peterson & Navarro-Sigüenza (1999) show in at least one restricted case, we are in trouble making inferences about conservation priorities based just on binomials.

For all the potential ambiguity, people will rely on naked scientific names, so it seems to me to be obvious that anybody exporting data in this area needs to provide a field that contains just the name. Failure to do this makes consuming the data harder than it needs to be, and that would be a mistake.

As an account of common practice the premise is accurate, and the conclusions based just on that premise are well taken too. But where's the other part that covers cases in which people's reliance on naked scientific names is problematic?

By all means add additional information in other fields, but doesn't

dwc:scientificName=Philander opossum
dwc:scientificNameAuthorship=Linnaeus, 1758

pretty much cover what most people need? The vast majority of people consuming data will want just the name, so make that front and centre.

In one of the most thorough analyses of this issue to date, Geoffroy and Berendsohn (2003) found that the name and concept of about 1500 German moss taxa remained stable in only 13.3% of the examined cases, across a dozen treatments from 1927 to 2000. That's one of the most dramatic results from an essentially non-replicated study, but I find it hard to dismiss when we talk about name-based data labeling and integration.

Geoffroy, M., Berendsohn, W.G. 2003. The concept problem in taxonomy: importance, components, approaches. Schrift. Veget. 39, 5-14.

Regards, Nico


Tony.Rees＠csiro.au

18:50

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Dear all, Nico Franz just wrote:

...

It's probably more accurate to say that, for better or worse, there are multiple discussions going on.

Correct - and returning to my original question, there appear to be 2 contrasting views:

(1) Include authority and other strictly "non canonical name" info in DwC:scientificName as available (as exemplified by Markus, Rich Pyle, also present DwC specification) - however now the canonical elements must be obtained by re-parsing the supplied scientificName content, or supplied separately in DwC:genus, DwC:specificEpithet, etc.

(2) Omit authority and other strictly "non canonical name" info from DwC:scientificName since this can be supplied elsewhere e.g. in DwC:scientificNameAuthorship, and makes the strictly canonical name information available directly rather than having to re-parse the DwC:scientificName element (Rod, Hilmar, Catalogue of Life format, Dmitry (?), also my practice for the last 8+ years although possibly not correct).

In my initial email my thought was that (1) would be an acceptable solution provided that the canonical information was supplied in (e.g. at species level) DwC:genus and DwC:specificEpithet. However I now realise that this solution will not scale, as per the following use case:

Currently I am preparing around 1.9 million records for export as DwCA format. E.g. my record for Philander opossum Linnaeus, 1758 (the previous cited worked example taxon from http://code.google.com/p/darwincore/wiki/Examples#Taxonomic_Treatment,_norma...) reads as follows (paraphrased from the relevant row in my csv file):

DwC:taxonId=mam10000822
DwC:scientificName=Philander opossum
DwC:scientificNameAuthorship=Linnaeus, 1758
DwC:taxonRank=species
DwC:taxonomicStatus=accepted
DwC:nomenclaturalStatus=available
DwC:nameAccordingTo=CoL2006/ITS
DwC:originalNameUsageID=
DwC:namePublishedIn=
DwC:acceptedNameUsageID=mam10000822
DwC:parentNameUsage=Philander
DwC:parentNameUsageID=mam1001153
DwC:taxonRemarks=
dc:modified=21-09-2006
DwC:nomenclaturalCode=ICZN

This follows model (2) above.

Initially I thought that in order to convert into model (1) as recommended, all I would have to do would be to add 2 elements,

DwC:genus=Philander
DwC:specificEpithet=opossum

and concatenate (add in) the authority into the DwC:scientificName element. However this is not the total solution, since my file also includes other ranks i.e. genus (not an issue), family, order, class, phylum and kingdom, each of which would then be required to be populated for an entry of that rank, but will basically be blank for entries of all other ranks (since the hierarchy is available by traversing DwC:parentNameUsageID and following that trail upwards). This means that my "table" of currently 1.9m rows x 15 columns then becomes 1.9m rows x 22 columns, quite an overhead for data transfer and subsequent ingestion/parsing into another system. Of course if I had additional ranks too e.g. subgenus, subfamily, infraorder and the rest the size blows out even more - and in any case, with the exception of subgenus, there are no Darwin core elements for other intermediate ranks as far as I can see.

So, I am now beginning to think that the case for a new element DwC:canonicalName or equivalent is strengthened - all I would need is to put the scientific name without authority into that element, the scientific name with authority into DwC:scientificName and the problem is solved in the most efficient manner; also serving the needs of both arguments for either interpretation (1) or interpretation (2) above.

If others agree, is there then a case for going this route, and adding the relevant additional element to DwC?

Regards - Tony
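As a rough illustration of the proposal, the sketch below converts a model (2) row into one that carries both strings, and compares the table sizes mentioned above. It assumes the row is held as a plain Python dict keyed by term name; "canonicalName" is the proposed term under discussion, not an existing Darwin Core term, and the function name is hypothetical.

```python
def add_canonical_name(row: dict) -> dict:
    """Return a copy of a model (2) row with both name strings populated.

    Assumes row["scientificName"] holds the name WITHOUT authority and
    row["scientificNameAuthorship"] holds the authority (possibly empty).
    """
    out = dict(row)
    canonical = row["scientificName"].strip()
    authorship = row.get("scientificNameAuthorship", "").strip()
    out["canonicalName"] = canonical  # proposed term, not part of DwC
    out["scientificName"] = f"{canonical} {authorship}".strip()
    return out


record = {
    "taxonID": "mam10000822",
    "scientificName": "Philander opossum",
    "scientificNameAuthorship": "Linnaeus, 1758",
    "taxonRank": "species",
}
converted = add_canonical_name(record)
print(converted["scientificName"])
print(converted["canonicalName"])

# Table sizes in cells for the three layouts discussed in the message.
ROWS = 1_900_000
cells_now = ROWS * 15                # current export, model (2)
cells_with_rank_columns = ROWS * 22  # model (1) with one column per major rank
cells_with_canonical = ROWS * 16     # model (1) plus a single canonicalName column
print(cells_now, cells_with_rank_columns, cells_with_canonical)
```

Under these assumptions the single extra column costs 1.9 million cells, against 13.3 million for the seven rank columns.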

Markus Döring

22 Nov

03:03

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Thanks Tony for bringing this back. I think I tend to support the idea of a new canonicalName term, see below. On Nov 22, 2010, at 3:50, Tony.Rees@csiro.au wrote:

...

Correct - and returning to my original question, there appear to be 2 contrasting views:

(1) Include authority and other strictly "non canonical name" info in DwC:scientificName as available (as exemplified by Markus, Rich Pyle, also present DwC specification) - however now the canonical elements must be obtained by re-parsing the supplied scientificName content, or supplied separately in DwC:genus, DwC:specificEpithet, etc.

(2) Omit authority and other strictly "non canonical name" info from DwC:scientificName since this can be supplied elsewhere e.g. in DwC: scientificNameAuthorship, and makes the strictly canonical name information available directly rather than having to re-parse the DwC:scientificName element (Rod, Hilmar, Catalogue of Life format, Dmitry (?), also my practice for the last 8+ years although possibly not correct).

Yes, that's the current choice one has with dwc. Problems with #2 when dealing with not only simple binomials I think I have stressed before.

Another confusion that needs clarification is actually the role of the higher taxon terms in dwc - you touch on it below too. In case of synonyms does dwc:genus actually hold the genus of the synonym name or is it the accepted genus the synonym is classified to? If you look at the term definition it says: "The full scientific name of the genus in which the taxon is classified." This is consistent with all other higher taxon terms in Darwin Core that represent the taxonomic hierarchy. A quick example:

dwc:scientificName=Pinus abies L.
dwc:genus=Picea
dwc:taxonomicStatus=homotypic synonym
dwc:acceptedNameUsage=Picea abies (L.) H.Karst

If we accept this view, then there really is no way to express the canonical name and I would indeed vote for having a new dwc:canonicalName term. Without doubt the canonical form of a name is the most important string when first dealing with a name and trying to align it with names from other sources. And we surely shouldn't require a name parser to be used for this very frequent use case.

That leads me to another question. Does the canonical name string for an infraspecific taxon include the rank marker? Ideally I think it shouldn't, as the main point of having a canonical name string is to have a string that is highly similar across different sources. Removing the rank marker not only avoids spelling variations, but also zoologists pretty much only deal with subspecies and they don't have to use a rank marker.

Zoological example: http://www.marinespecies.org/aphia.php?p=taxdetails&id=293570

dwc:scientificName=Clupea pallasii marisalbi Berg, 1923
dwc:taxonRank=subspecies
dwc:canonicalName=Clupea pallasii marisalbi
dwc:scientificNameAuthorship=Berg, 1923

Botanic example: http://wp6-cichorieae.e-taxonomy.eu/portal/?q=cdm_dataportal/taxon/800e92ea-...

dwc:scientificName=Lactuca macrophylla subsp. uralensis (Rouy) N. Kilian & Greuter
dwc:taxonRank=subspecies
dwc:canonicalName=Lactuca macrophylla uralensis
dwc:scientificNameAuthorship=(Rouy) N. Kilian & Greuter

dwc:scientificName=Mulgedium macrophyllum var. hispidum (Ledeb.) Korsh.
dwc:taxonRank=variety
dwc:canonicalName=Mulgedium macrophyllum hispidum
dwc:scientificNameAuthorship=(Ledeb.) Korsh.
dwc:taxonomicStatus=heterotypic synonym
dwc:acceptedNameUsage=Lactuca macrophylla subsp. uralensis (Rouy) N. Kilian & Greuter
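If the provider already holds the name in atomized form, the canonical string in the examples above can be assembled without any parser. A minimal sketch, assuming the three parts are those of the name itself (not of the accepted classification, given the dwc:genus ambiguity just described); the function and parameter names are hypothetical:

```python
from typing import Optional


def canonical_name(
    genus_part: str,
    specific_epithet: Optional[str] = None,
    infraspecific_epithet: Optional[str] = None,
) -> str:
    """Join the name parts with single spaces; no rank marker, no authorship."""
    parts = [genus_part, specific_epithet, infraspecific_epithet]
    return " ".join(p.strip() for p in parts if p and p.strip())


print(canonical_name("Clupea", "pallasii", "marisalbi"))
print(canonical_name("Lactuca", "macrophylla", "uralensis"))
print(canonical_name("Mulgedium", "macrophyllum", "hispidum"))
```

The zoological and botanical names come out in the same shape, which is the property wanted for aligning sources; the rank itself stays available in dwc:taxonRank.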

...

In my initial email my thought was that (1) would be an acceptable solution provided that the canonical information was supplied in (e.g. at species level) DwC:genus and DwC:specificEpithet. However I now realise that this solution will not scale, as per the following use case:

Currently I am preparing around 1.9 million records for export as DwCA format. E.g. my record for Philander opossum Linnaeus, 1758 (the previous cited worked example taxon from http://code.google.com/p/darwincore/wiki/Examples#Taxonomic_Treatment,_norma...) reads as follows (paraphrased from the relevant row in my csv file):

DwC:taxonId=mam10000822 DwC:scientificName=Philander opossum DwC:scientificNameAuthorship=Linnaeus, 1758 DwC:taxonRank=species DwC:taxonomicStatus=accepted DwC:nomenclaturalStatus=available DwC:nameAccordingTo=CoL2006/ITS DwC:originalNameUsageID= DwC:namePublishedIn= DwC:acceptedNameUsageID=mam10000822 DwC:parentNameUsage=Philander DwC:parentNameUsageID=mam1001153 DwC:taxonRemarks= dc:modified=21-09-2006 DwC:nomenclaturalCode=ICZN

This follows model (2) above.

Initially I thought that in order to convert into model (1) as recommended, all I would have to do would be to add 2 elements,

DwC:genus=Philander DwC:specificEpithet=opossum

and concatenate (add in) the authority into the DwC:scientificName element. However this is not the total solution, since my file also includes other ranks i.e. genus (not an issue), family, order, class, phylum and kingdom, each of which would then be required to be populated for an entry of that rank, but will basically be blank for entries of all other ranks (since the hierarchy is available by traversing DwC:parentNameUsageID and following that trail upwards). This means that my "table" of currently 1.9m rows x 15 columns then becomes 1.9m rows x 22 columns, quite an overhead for data transfer and subsequent ingestion/parsing into another system. Of course if I had additional ranks too e.g. subgenus, subfamily, infraorder and the rest the size blows out even more - and in any case, with the exception of subgenus, there are no Darwin core elements for other intermediate ranks as far as I can see.

I'm not sure I understand correctly. dwc:scientificName is used for ANY rank, not only infrageneric ones. You don't have to use the higher taxon terms at all if you already use the adjacency format via DwC:parentNameUsageID.
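For readers less familiar with the adjacency format, this sketch shows how a consumer might rebuild the classification by following parentNameUsageID upwards. It assumes the rows have been loaded into a dict keyed by taxonID; the function name is hypothetical, and the genus row is given an empty parent only because its real parent is not shown in the thread.

```python
def classification(taxon_id: str, rows_by_id: dict) -> list:
    """Walk parentNameUsageID links from a taxon up to the root.

    Returns (taxonRank, scientificName) pairs, lowest rank first.
    A 'seen' set guards against accidental cycles in the data.
    """
    chain = []
    seen = set()
    current = rows_by_id.get(taxon_id)
    while current is not None and current["taxonID"] not in seen:
        seen.add(current["taxonID"])
        chain.append((current["taxonRank"], current["scientificName"]))
        parent_id = current.get("parentNameUsageID")
        current = rows_by_id.get(parent_id) if parent_id else None
    return chain


rows = {
    "mam10000822": {
        "taxonID": "mam10000822",
        "scientificName": "Philander opossum",
        "taxonRank": "species",
        "parentNameUsageID": "mam1001153",
    },
    "mam1001153": {
        "taxonID": "mam1001153",
        "scientificName": "Philander",
        "taxonRank": "genus",
        "parentNameUsageID": "",  # parent not shown in the thread
    },
}

print(classification("mam10000822", rows))
```

Because every rank is just another row, intermediate ranks such as subphylum need no dedicated column.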

...

So, I am now beginning to think that the case for a new element DwC:canonicalName or equivalent is strengthened - all I would need is to put the scientific name without authority into that element, the scientific name with authority into DwC:scientificName and the problem is solved in the most efficient manner; also serving the needs of both arguments for either interpretation (1) or interpretation (2) above.

If others agree, is there then a case for going this route, and adding the relevant additional element to DwC?

Tony, I do agree and also think this solves all problems discussed here so far! As a recommendation both scientificName and canonicalName

...

Regards - Tony


dipteryx＠freeler.nl

04:01

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

From: tdwg-content-bounces@lists.tdwg.org on behalf of Markus Döring Sent: Mon 22-11-2010 12:03 [...]

...

That leads me to another question. Does the canonical name string for an infraspecific taxon include the rank marker? Ideally I think it shouldn't, as the main point of having a canonical name string is to have a string that is highly similar across different sources. Removing the rank marker not only avoids spelling variations, but also zoologists pretty much only deal with subspecies and they don't have to use a rank marker. [...]

***

...

From a nomenclatural point of view this can be argued either way. The scientific name is indeed "Lactuca macrophylla uralensis" and if there are two such names, based on different types, these are homonyms (irrespective of rank). However, the name cannot be rendered this way, as "Lactuca macrophylla subsp. uralensis" and "Lactuca macrophylla var. uralensis" are different things.

Also keep in mind that the same issue can also be found for subdivisions of genera, e.g. "Euphorbia subg. Euphorbia", etc. Paul van Rijckevorsel
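The trade-off can be checked mechanically: strip the rank markers and see which distinct names fall onto the same string. This is only a sketch; the list of markers in the regular expression is an assumed, incomplete set, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed, incomplete list of botanical rank markers.
_RANK_MARKER = re.compile(r"\s+(?:subsp|ssp|var|subvar|f|subg|sect)\.\s+")


def strip_rank_markers(name: str) -> str:
    """Remove rank markers from a name that carries no authorship."""
    return _RANK_MARKER.sub(" ", name)


def collisions(names) -> dict:
    """Map each marker-free string to the distinct names that produce it,
    keeping only strings produced by more than one name."""
    groups = defaultdict(set)
    for name in names:
        groups[strip_rank_markers(name)].add(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


names = [
    "Lactuca macrophylla subsp. uralensis",
    "Lactuca macrophylla var. uralensis",
    "Clupea pallasii marisalbi",
]
print(collisions(names))
```

Any non-empty result marks names that a marker-free canonical string cannot tell apart without also consulting the rank.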

Tony.Rees＠csiro.au

04:02

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Thanks, Markus. Just one comment on a paragraph where I may not have expressed myself too clearly: You wrote:

...

Im not sure if I correctly understand. dwc:scientificName is used for ANY rank, not only infrageneric ones. You dont have to use the higher taxon terms at all if you already use the adjacency format via DwC:parentNameUsageID.

What I was saying was that e.g.

dwc:scientificName=Pinus abies L.
dwc:taxonRank=species
dwc:genus=Picea
dwc:specificEpithet=abies

may work alright as an alternative to the suggested canonicalName, however the following has no workaround:

dwc:scientificName=Crustacea Brünnich, 1772
dwc:taxonRank=subphylum

would then need:

//dwc:subphylum=Crustacea//

under this model, but it does not exist (same for most other intermediate ranks) i.e. there is no dwc pre-formatted element for intermediate ranks between kingdom/phylum/class/order/family/genus, but there would be plenty of canonical names at these ranks.

Regards - Tony

________________________________________ From: Markus Döring [m.doering@mac.com] Sent: Monday, 22 November 2010 10:03 PM To: Rees, Tony (CMAR, Hobart) Cc: nico.franz@upr.edu; tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Thanks Tony for bringing this back. I think I tend to support the idea of a new canonicalName term, see below. On Nov 22, 2010, at 3:50, Tony.Rees@csiro.au wrote:

...

Correct - and returning to my original question, there appear to be 2 contrasting views:

(1) Include authority and other strictly "non canonical name" info in DwC:scientificName as available (as exemplified by Markus, Rich Pyle, also present DwC specification) - however now the canonical elements must be obtained by re-parsing the supplied scientificName content, or supplied separately in DwC:genus, DwC:specificEpithet, etc.

(2) Omit authority and other strictly "non canonical name" info from DwC:scientificName since this can be supplied elsewhere e.g. in DwC:scientificNameAuthorship, and makes the strictly canonical name information available directly rather than having to re-parse the DwC:scientificName element (Rod, Hilmar, Catalogue of Life format, Dmitry (?), also my practice for the last 8+ years although possibly not correct).

...

In my initial email my thought was that (1) would be an acceptable solution provided that the canonical information was supplied in (e.g. at species level) DwC:genus and DwC:specificEpithet. However I now realise that this solution will not scale, as per the following use case:

Currently I am preparing around 1.9 million records for export in DwCA format. E.g. my record for Philander opossum Linnaeus, 1758 (the previously cited worked example taxon from http://code.google.com/p/darwincore/wiki/Examples#Taxonomic_Treatment,_normalised) reads as follows (paraphrased from the relevant row in my csv file):

DwC:taxonId=mam10000822
DwC:scientificName=Philander opossum
DwC:scientificNameAuthorship=Linnaeus, 1758
DwC:taxonRank=species
DwC:taxonomicStatus=accepted
DwC:nomenclaturalStatus=available
DwC:nameAccordingTo=CoL2006/ITS
DwC:originalNameUsageID=
DwC:namePublishedIn=
DwC:acceptedNameUsageID=mam10000822
DwC:parentNameUsage=Philander
DwC:parentNameUsageID=mam1001153
DwC:taxonRemarks=
dc:modified=21-09-2006
DwC:nomenclaturalCode=ICZN

This follows model (2) above.

Initially I thought that in order to convert into model (1) as recommended, all I would have to do would be to add 2 elements,

DwC:genus=Philander DwC:specificEpithet=opossum

and concatenate (add in) the authority into the DwC:scientificName element. However this is not the total solution, since my file also includes other ranks i.e. genus (not an issue), family, order, class, phylum and kingdom, each of which would then be required to be populated for an entry of that rank, but will basically be blank for entries of all other ranks (since the hierarchy is available by traversing DwC:parentNameUsageID and following that trail upwards). This means that my "table" of currently 1.9m rows x 15 columns then becomes 1.9m rows x 22 columns, quite an overhead for data transfer and subsequent ingestion/parsing into another system. Of course if I had additional ranks too e.g. subgenus, subfamily, infraorder and the rest the size blows out even more - and in any case, with the exception of subgenus, there are no Darwin core elements for other intermediate ranks as far as I can see.
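To put rough numbers on the overhead described above, here is a small arithmetic sketch. The 15 and 22 column counts and the 1.9 million rows come from the message; the 16-column case is an assumption, namely that a single added canonicalName column would be enough. Counting cells is only a crude proxy for transfer size, since mostly blank columns compress well.

```python
ROWS = 1_900_000  # record count quoted in the message


def cell_count(rows: int, columns: int) -> int:
    """Number of cells in a rows x columns table."""
    return rows * columns


current = cell_count(ROWS, 15)         # model (2) export as it stands
model_1 = cell_count(ROWS, 22)         # plus genus, specificEpithet and five higher-rank columns
with_canonical = cell_count(ROWS, 16)  # assumption: one added canonicalName column

print(f"current: {current:,} cells")
print(f"model (1): {model_1:,} cells ({model_1 - current:,} extra)")
print(f"canonicalName: {with_canonical:,} cells ({with_canonical - current:,} extra)")
```

Under these assumptions the seven extra columns add 13.3 million cells, most of them blank for any given row, while a single extra column adds 1.9 million.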

...

So, I am now beginning to think that the case for a new element DwC:canonicalName or equivalent is strengthened - all I would need is to put the scientific name without authority into that element, the scientific name with authority into DwC:scientificName and the problem is solved in the most efficient manner; also serving the needs of both arguments for either interpretation (1) or interpretation (2) above.
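As a minimal sketch of what that proposal would mean for a provider whose data already follows model (2): the authority-free string moves into the proposed canonicalName, and scientificName becomes the concatenation with the authorship. Note that canonicalName is only the term proposed in this thread, not an existing Darwin Core term, and the function name and plain-dict record layout are hypothetical.

```python
def with_canonical_name(record: dict) -> dict:
    """Return a copy of a model (2) record carrying both name forms.

    Assumes record["scientificName"] holds the name WITHOUT authorship
    and record["scientificNameAuthorship"] holds the authorship, if any.
    """
    canonical = record["scientificName"].strip()
    authorship = record.get("scientificNameAuthorship", "").strip()
    out = dict(record)  # leave the input record untouched
    out["canonicalName"] = canonical  # proposed term, not part of DwC
    out["scientificName"] = f"{canonical} {authorship}".strip()
    return out


# Field values taken from the example record in the message.
model_2 = {
    "taxonId": "mam10000822",
    "scientificName": "Philander opossum",
    "scientificNameAuthorship": "Linnaeus, 1758",
    "taxonRank": "species",
}
converted = with_canonical_name(model_2)
print(converted["scientificName"])
print(converted["canonicalName"])
```

Going in this direction needs no name parser at all, which is the point being made: concatenation is trivial, whereas recovering the canonical string from a combined one is not.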

If others agree, is there then a case for going this route, and adding the relevant additional element to DwC?

Tony, I do agree and also think this solves all problems discussed here so far! As a recommendation both scientificName and canonicalName

...

Regards - Tony

_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content

Markus Döring

06:30

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

...

...
I'm not sure if I understand correctly. dwc:scientificName is used for ANY rank, not only infrageneric ones. You don't have to use the higher taxon terms at all if you already use the adjacency format via DwC:parentNameUsageID.

What I was saying was that e.g.

dwc:scientificName=Pinus abies L. dwc:rank=species dwc:genus=Picea dwc:specificEpithet=abies

may work alright as an alternative to the suggested canonicalName,

however the following has no workaround:

dwc:scientificName=Crustacea Brünnich, 1772 dwc:rank=subphylum would then need: //dwc:subphylum=Crustacea// under this model, but it does not exist

(same for most other intermediate ranks)

Ah, perfectly right of course!

Assuming we would add canonicalName and we use genus for the classification - is there any purpose left for specific- and infraspecificEpithet?

...

i.e. there is no dwc pre-formatted element for intermediate ranks between kingdom/phylum/class/order/family/genus, but there would be plenty of canonical names at these ranks.

Regards - Tony

________________________________________ From: Markus Döring [m.doering@mac.com] Sent: Monday, 22 November 2010 10:03 PM To: Rees, Tony (CMAR, Hobart) Cc: nico.franz@upr.edu; tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Thanks Tony for bringing this back. I think I tend to support the idea of a new canonicalName term, see below.

On Nov 22, 2010, at 3:50, Tony.Rees@csiro.au wrote:

...
Correct - and returning to my original question, there appear to be 2 contrasting views:

(1) Include authority and other strictly "non canonical name" info in DwC:scientificName as available (as exemplified by Markus, Rich Pyle, also present DwC specification) - however now the canonical elements must be obtained by re-parsing the supplied scientificName content, or supplied separately in DwC:genus, DwC:specificEpithet, etc.

(2) Omit authority and other strictly "non canonical name" info from DwC:scientificName since this can be supplied elsewhere e.g. in DwC:scientificNameAuthorship, and makes the strictly canonical name information available directly rather than having to re-parse the DwC:scientificName element (Rod, Hilmar, Catalogue of Life format, Dmitry (?), also my practice for the last 8+ years although possibly not correct).

Yes, that's the current choice one has with dwc. Problems with #2 when dealing with not only simple binomials I think I have stressed before.

Another point of confusion that needs clarification is the role of the higher taxon terms in dwc - you touch on it below too. In the case of synonyms, does dwc:genus actually hold the genus of the synonym name, or is it the accepted genus the synonym is classified under? If you look at the term definition it says: "The full scientific name of the genus in which the taxon is classified." This is consistent with all other higher taxon terms in Darwin Core that represent the taxonomic hierarchy. A quick example:

dwc:scientificName=Pinus abies L. dwc:genus=Picea dwc:taxonomicStatus=homotypic synonym dwc:acceptedNameUsage=Picea abies (L.) H.Karst

If we accept this view, then there really is no way to express the canonical name and I would indeed vote for having a new dwc:canonicalName term. Without doubt the canonical form of a name is the most important string when first dealing with a name and trying to align it with names from other sources. And we surely shouldn't require a name parser to be used for this very frequent use case.

That leads me to another question. Does the canonical name string for an infraspecific taxon include the rank marker? Ideally I think it shouldn't, as the main point of having a canonical name string is to have a string that is highly similar across different sources. Removing the rank marker not only avoids spelling variations; zoologists also deal pretty much only with subspecies and therefore don't have to use a rank marker.

Zoological example: http://www.marinespecies.org/aphia.php?p=taxdetails&id=293570

dwc:scientificName=Clupea pallasii marisalbi Berg, 1923 dwc:taxonRank=subspecies dwc:canonicalName=Clupea pallasii marisalbi dwc:scientificNameAuthorship=Berg, 1923

Botanic example: http://wp6-cichorieae.e-taxonomy.eu/portal/?q=cdm_dataportal/taxon/800e92ea-496b-4368-abf9-9ae12f7f40d1/synonymy

dwc:scientificName=Lactuca macrophylla subsp. uralensis (Rouy) N. Kilian & Greuter dwc:taxonRank=subspecies dwc:canonicalName=Lactuca macrophylla uralensis dwc:scientificNameAuthorship=(Rouy) N. Kilian & Greuter

dwc:scientificName=Mulgedium macrophyllum var. hispidum (Ledeb.) Korsh. dwc:taxonRank=variety dwc:canonicalName=Mulgedium macrophyllum hispidum dwc:scientificNameAuthorship=(Ledeb.) Korsh. dwc:taxonomicStatus=heterotypic synonym dwc:acceptedNameUsage=Lactuca macrophylla subsp. uralensis (Rouy) N. Kilian & Greuter
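The rank-marker-free canonical form used in these examples can be sketched as below. This is an illustration under stated assumptions, not a name parser: it assumes the authorship is supplied separately and appears verbatim at the end of the full name, and the RANK_MARKERS set is a short hypothetical list rather than a complete one. It does not handle subdivisions of genera such as "Euphorbia subg. Euphorbia".

```python
# Illustrative, incomplete set of infraspecific rank markers (assumption).
RANK_MARKERS = {"subsp.", "ssp.", "var.", "subvar.", "f.", "subf."}


def canonical_name(scientific_name: str, authorship: str = "") -> str:
    """Drop a trailing authorship string and any rank markers from a name."""
    name = scientific_name.strip()
    authorship = authorship.strip()
    # Remove the authorship only when it is a literal suffix of the name.
    if authorship and name.endswith(authorship):
        name = name[: -len(authorship)].rstrip()
    # Keep every remaining token that is not a rank marker.
    return " ".join(tok for tok in name.split() if tok not in RANK_MARKERS)


examples = [
    ("Clupea pallasii marisalbi Berg, 1923", "Berg, 1923"),
    ("Lactuca macrophylla subsp. uralensis (Rouy) N. Kilian & Greuter",
     "(Rouy) N. Kilian & Greuter"),
    ("Mulgedium macrophyllum var. hispidum (Ledeb.) Korsh.", "(Ledeb.) Korsh."),
]
for full_name, author in examples:
    print(canonical_name(full_name, author))
```

Because the marker is dropped, "subsp. uralensis" and "var. uralensis" would collapse to the same string, so dwc:taxonRank has to travel with the canonical name to keep them apart.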

...
In my initial email my thought was that (1) would be an acceptable solution provided that the canonical information was supplied in (e.g. at species level) DwC:genus and DwC:specificEpithet. However I now realise that this solution will not scale, as per the following use case:

Currently I am preparing around 1.9 million records for export as DwCA format. E.g. my record for Philander opossum Linnaeus, 1758 (the previous cited worked example taxon from http://code.google.com/p/darwincore/wiki/Examples#Taxonomic_Treatment,_norma...) reads as follows (paraphrased from the relevant row in my csv file):

DwC:taxonId=mam10000822
DwC:scientificName=Philander opossum
DwC:scientificNameAuthorship=Linnaeus, 1758
DwC:taxonRank=species
DwC:taxonomicStatus=accepted
DwC:nomenclaturalStatus=available
DwC:nameAccordingTo=CoL2006/ITS
DwC:originalNameUsageID=
DwC:namePublishedIn=
DwC:acceptedNameUsageID=mam10000822
DwC:parentNameUsage=Philander
DwC:parentNameUsageID=mam1001153
DwC:taxonRemarks=
dc:modified=21-09-2006
DwC:nomenclaturalCode=ICZN

This follows model (2) above.

Initially I thought that in order to convert into model (1) as recommended, all I would have to do would be to add 2 elements,

DwC:genus=Philander DwC:specificEpithet=opossum

and concatenate (add in) the authority into the DwC:scientificName element. However this is not the total solution, since my file also includes other ranks i.e. genus (not an issue), family, order, class, phylum and kingdom, each of which would then be required to be populated for an entry of that rank, but will basically be blank for entries of all other ranks (since the hierarchy is available by traversing DwC:parentNameUsageID and following that trail upwards). This means that my "table" of currently 1.9m rows x 15 columns then becomes 1.9m rows x 22 columns, quite an overhead for data transfer and subsequent ingestion/parsing into another system. Of course if I had additional ranks too e.g. subgenus, subfamily, infraorder and the rest the size blows out even more - and in any case, with the exception of subgenus, there are no Darwin core elements for other intermediate ranks as far as I can see.

I'm not sure if I understand correctly. dwc:scientificName is used for ANY rank, not only infrageneric ones. You don't have to use the higher taxon terms at all if you already use the adjacency format via DwC:parentNameUsageID.

...
So, I am now beginning to think that the case for a new element DwC:canonicalName or equivalent is strengthened - all I would need is to put the scientific name without authority into that element, the scientific name with authority into DwC:scientificName and the problem is solved in the most efficient manner; also serving the needs of both arguments for either interpretation (1) or interpretation (2) above.

If others agree, is there then a case for going this route, and adding the relevant additional element to DwC?

Tony, I do agree and also think this solves all problems discussed here so far! As a recommendation both scientificName and canonicalName

...
Regards - Tony


Richard Pyle

10:35

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

TCS includes the element "Uninomial" (under the "CanonicalName" node), to address all names consisting of a single "part" (=single "NameElement" in GNUB-speak); including names at the rank of genus. I don't remember exactly whether names at the rank of genus are supposed to be represented in both "Uninomial" and "Genus", but I guess it doesn't really matter. The addition of "Uninomial" to DwC would effectively solve the problem of representing names not among the "main" ranks.

Rich


Tony.Rees＠csiro.au

11:44

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Hi Rich, thanks for the suggestion.

"Uninomial" would equal "canonicalName" for ranks subgenus and above, but not for species and below, while canonicalName (or scientificNameCanonical if you prefer) covers all cases, which is why I think it is preferable, especially as the majority of names in circulation are at species level and below, I think...

Atomising further, i.e. splitting a binomial or polynomial into genus, species, infraspecies, is actually a separate activity with its own rationale, I would say.

Just my personal view, of course...

Regards - Tony

________________________________________ From: Richard Pyle [deepreef@bishopmuseum.org] Sent: Tuesday, 23 November 2010 5:35 AM To: 'Markus Döring'; Rees, Tony (CMAR, Hobart) Cc: tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

TCS includes the element "Uninomial" (under the "CanonicalName" node), to address all names consisting of a single "part" (=single "NameElement" in GNUB-speak); including names at the rank of genus. I don't remember exactly whether names at the rank of genus are supposed to be represented in both "Uninomial" and "Genus", but I guess it doesn't really matter. The addition of "Uninomial" to DwC would effectively solve the problem of representing names not among the "main" ranks.

Rich


Richard Pyle

12:06

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

...

"Uninomial" would equal "canonicalName" for ranks subgenus and above, but not for species and below, while canonicalName (or scientificNameCanonical if you prefer) covers all cases, which is why I think it is preferable, especially as the majority of names in circulation are at species level and below, I think...

Atomising further, i.e. splitting a binomial or polynomial into genus, species, infraspecies, is actually a separate activity with its own rationale, I would say.

Just my personal view, of course...

The cleanest way to do it is to simply have Rank, NameElement and parentNameUsageID, and be done with it (maybe with the addition of verbatimNameString for purists). But that assumes that providers have parsed data, which they often do not. Maybe with services like those associated with GNI, the time of databases with unparsed names data is drawing to a close. Or, maybe if GNUB gets a foot-hold, we'll solve all the problems via a simply actionable persistent identifier.

But until that time, dwc needs to find a balance between users who want pre-parsed data, and providers who do not have pre-parsed data.

I think dwc *almost* accommodates both worlds, as long as scientificName is defined as "the complete set of textual elements useful for recognizing a unique scientific name"; which is either concatenated by the provider with parsed data, or simply "provided" by the provider with unparsed data.

What we seem to be arguing about now is how many different forms of a "formatted" name do we want?

With or without authorship?

With or without year?

With or without infraspecific prefixes ("var.", "f." etc.)?

With or without infrageneric name(s)?

With or without italics codes?

With or without qualifiers like "cf.", "aff.", etc.?

Etc.

Etc.

Etc.

There are potentially dozens of different terms we could define to accommodate every particular niche-need.

Personally, I think that the existing "scientificName" should be split into two different terms:

fullScientificNameStringWithAuthorship

and

verbatimNameString

The first would be a concatenated text string assembled from parsed bits, according to a community standard concatenation form.

The second would be the literal text string as it appeared in the original source.

Otherwise, we could argue forever about which of the dozen possible forms we think DwC needs a term for.

Rich
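
To make the idea of "a concatenated text string assembled from parsed bits" concrete, here is a minimal sketch. No community standard concatenation form is defined in this thread, so the ordering used here (genus, specific epithet, rank marker, infraspecific epithet, authorship) and the function name `assemble_name` are assumptions for illustration only. The names "Aus", "bus", "cus" and "Smith" are placeholders, not real taxa.

```python
from typing import Optional


def assemble_name(
    genus: str,
    specific_epithet: Optional[str] = None,
    rank_marker: Optional[str] = None,
    infraspecific_epithet: Optional[str] = None,
    authorship: Optional[str] = None,
) -> str:
    """Join parsed name parts into one display string (assumed ordering)."""
    parts = [genus]
    if specific_epithet:
        parts.append(specific_epithet)
    if infraspecific_epithet:
        # The rank marker ("var.", "f." ...) is only meaningful with an epithet.
        if rank_marker:
            parts.append(rank_marker)
        parts.append(infraspecific_epithet)
    if authorship:
        parts.append(authorship)
    return " ".join(parts)


print(assemble_name("Agalinis", "purpurea", authorship="(L.) Pennell"))
print(assemble_name("Aus", "bus", "var.", "cus", "Smith"))  # placeholder names
```

A provider with unparsed data would skip this step entirely and pass the literal string through, which is the role Rich gives to verbatimNameString.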

Richard Pyle

12:21

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Correction on my previous post... I said:

...

fullScientificNameStringWithAuthorship

What I meant was:

fullScientificNameStringWithAuthorshipIfYouHaveIt

Rich

Bob Morris

12:37

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

more informative might be

fullScientificNameStringWithAuthorshipIfYouHaveItAndTooBadForYouIfYouDont

more normative might be

fullScientificNameStringWithAuthorshipIfYouHaveItAndShameOnYouIfYouDont

On Mon, Nov 22, 2010 at 3:21 PM, Richard Pyle <deepreef@bishopmuseum.org> wrote:

...


-- Robert A. Morris Emeritus Professor of Computer Science UMASS-Boston 100 Morrissey Blvd Boston, MA 02125-3390 Associate, Harvard University Herbaria email: morris.bob@gmail.com web: http://bdei.cs.umb.edu/ web: http://etaxonomy.org/mw/FilteredPush http://www.cs.umb.edu/~ram phone (+1) 857 222 7992 (mobile)

Richard Pyle

12:52

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

...

more informative might be

fullScientificNameStringWithAuthorshipIfYouHaveItAndTooBadForYouIfYouDont

Yes, but the key question is: "TooBad" for whom? For the Provider? (i.e., if you don't have the Authorship information, then don't even bother giving us the record) Or, for the User? (i.e., here's all I got; so too bad if you also wanted Authorship details).

...

more normative might be

fullScientificNameStringWithAuthorshipIfYouHaveItAndShameOnYouIfYouDont

Yes, but as I imagine many of these name-strings will be emerging from BHL OCR text, in most cases we'll only be casting shame on people who are no longer living.

Rich

Tony.Rees＠csiro.au

14:18

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Hi Rich, all,

You wrote:

...

Otherwise, we could argue forever about which of the dozen possible forms we think DwC needs a term for.

No, I think that is muddying the waters (with respect of course...)

I simply made the case for "canonicalName" - aka scientific name without authorship - as a valuable adjunct to "scientificName", for users who can supply both, and consumers who would otherwise have to generate the former from the latter algorithmically. Markus and Dima probably represent the main "consumers" here, and I, if you like, can represent a "provider" (although I wear other "consumer" hats on occasion as well).

Basically, if a "canonicalName" field does not exist, I will just omit this information, which seems sub-optimal since it all exists pre-parsed and manually verified in my system, and someone else will then have to do the job again...

Regards - Tony
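
To illustrate what generating the canonical form "algorithmically" can involve, here is a deliberately naive sketch. It is not the parser used by GNI, GBIF or any other service. The heuristic (keep the first token, then keep lower-case tokens until something else appears) and the name `naive_canonical` are assumptions for this example. "Aus bus" and "Smith" are placeholder names.

```python
import re

# A token counts as an epithet or rank marker if it is all lower-case,
# optionally hyphenated, optionally ending in a dot ("var.", "f.").
_LOWERCASE_TOKEN = re.compile(r"^[a-z][a-z-]*\.?$")


def naive_canonical(scientific_name: str) -> str:
    """Strip authorship by cutting at the first non-lower-case token."""
    tokens = scientific_name.split()
    if not tokens:
        return ""
    kept = [tokens[0]]  # genus or other uninomial
    for token in tokens[1:]:
        if not _LOWERCASE_TOKEN.match(token):
            break  # assume authorship starts here
        kept.append(token)
    return " ".join(kept)


print(naive_canonical("Agalinis purpurea (L.) Pennell"))  # Agalinis purpurea
print(naive_canonical("Aus bus de Smith"))                # Aus bus de  (wrong)
```

The second call shows the kind of error at issue: a lower-case particle in an author name is mistaken for an epithet. A provider who already holds the verified canonical form avoids this class of mistake by supplying it directly.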

...

-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 23 November 2010 7:06 AM To: Rees, Tony (CMAR, Hobart); m.doering@mac.com Cc: tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?


David Remsen (GBIF)

23 Nov

00:47

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

While I haven't seen them all, I have seen and had to understand a good number of biodiversity databases, including many focused on managing species lists in one form or another. Names are represented in these three forms.

1. Completely unparsed, where the entire verbose name text is in a single field corresponding to dwc:scientificName. In some databases this means just a scientific name, as many databases don't hold authorship information.

2. Semi-parsed, where the canonical name is separated from the authorship information, corresponding to the proposed canonicalName and dwc:scientificNameAuthorship.

3. Fully parsed into atoms (genus, specific epithet, infraspecific rank, infraspecies, authorship), corresponding to the incomplete set of dwc atomic elements already in existence. This form is the most problematic because 1) it isn't always clear from the parts how the actual complete name is intended to be represented, 2) there are so many structural exceptions and complexities that many more 'atoms' need to be described to effectively enable it to be used, and 3) there is the problematic definition of the use of Genus as described by Markus that conflicts with atomising synonyms.

It makes sense to maintain the separation of name and authorship in data sources that already do this, but I'm not convinced a canonicalName element is required. It seems that it is suggested so that it makes it easier to consume the data, but it also means it's more confusing for a typical data manager or biologist to produce it. I have a database with binomials alone. How many data managers or biologists will map them to canonicalName before scientificName? I know we want to avoid testing different conditions when we use the data, but we will have to in either case.

Maybe we shouldn't add canonical name but rather something more specific to the concatenated form, like

dwc:scientificNameWithAuthorshipAndOtherBits
dwc:scientificName
dwc:scientificNameAuthorship

I'd know what to do then.

DR

On Nov 22, 2010, at 11:18 PM, <Tony.Rees@csiro.au> wrote:

...


Tony.Rees＠csiro.au

03:35

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

David Remsen wrote:

...

Maybe we shouldn't add canonical name but rather something more specific to the concatenated form like dwc:scientificNameWithAuthorshipAndOtherBits dwc:scientificName dwc:scientificNameAuthorship

If by "dwc:scientificName" you mean with authorship omitted, that is fine; however, it would need the dwc definition to be altered...

Then at least folk would/should know which field to populate. However, the mandatory yes/no issue would also have to be addressed - at present I think dwc:scientificName is the only taxonomy-related element that is mandatory; all others are optional. Under your scenario it would then maybe be either of the first 2 fields, or both as available, I guess?

Regards - Tony

________________________________________ From: David Remsen (GBIF) [dremsen@gbif.org] Sent: Tuesday, 23 November 2010 7:47 PM To: Rees, Tony (CMAR, Hobart) Cc: David Remsen (GBIF); deepreef@bishopmuseum.org; m.doering@mac.com; tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

...


David Remsen (GBIF)

04:15

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Tony,

I did indeed mean that scientificName and authorship could be used in the following way:

1. "Agalinis purpurea" -> scientificName ("Agalinis purpurea") - where a canonical form of the name with no authorship is in the source data

2. "Agalinis purpurea (L.) Pennell" -> scientificName ("Agalinis purpurea (L.) Pennell") - where an unparsed name+author is in the source data

3. "Agalinis purpurea" AND "(L.) Pennell" -> scientificName ("Agalinis purpurea") + scientificNameAuthorship ("(L.) Pennell") - where a semi-parsed name + author is in the source data

4. "Agalinis" AND "purpurea" AND "(L.) Pennell" -> scientificName ("Agalinis purpurea") + scientificNameAuthorship ("(L.) Pennell") - where a fully atomised name is in the source data and the 'name' parts are concatenated to make a proper canonical name.

Cases 3 and 4 require modification of the definition at http://rs.tdwg.org/dwc/terms/index.htm#scientificName to be something like "The full scientific name, which may include authorship and date information if known..." with the implicit intention that it is not REQUIRED to parse or semi-parse an unparsed name in order to properly share it.

David

On Nov 23, 2010, at 12:35 PM, <Tony.Rees@csiro.au> wrote:

...
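
David's four cases can be sketched as one small mapping function. This is only an illustration of the proposal as stated in his message. The function name `to_dwc`, its parameters and the use of a plain dictionary for the output record are assumptions for the example.

```python
from typing import Dict, Optional


def to_dwc(
    name: Optional[str] = None,
    authorship: Optional[str] = None,
    genus: Optional[str] = None,
    specific_epithet: Optional[str] = None,
) -> Dict[str, str]:
    """Map whatever the source holds onto scientificName (+ authorship)."""
    if name is None:
        # Case 4: fully atomised source, so concatenate the name parts only.
        atoms = [part for part in (genus, specific_epithet) if part]
        if not atoms:
            raise ValueError("no name information supplied")
        name = " ".join(atoms)
    record = {"scientificName": name}
    if authorship:
        # Cases 3 and 4: authorship travels in its own element.
        record["scientificNameAuthorship"] = authorship
    return record


# Case 1: canonical name only.
print(to_dwc(name="Agalinis purpurea"))
# Case 2: unparsed name + author, passed through untouched.
print(to_dwc(name="Agalinis purpurea (L.) Pennell"))
# Case 3: semi-parsed.
print(to_dwc(name="Agalinis purpurea", authorship="(L.) Pennell"))
# Case 4: fully atomised.
print(to_dwc(genus="Agalinis", specific_epithet="purpurea", authorship="(L.) Pennell"))
```

Note that cases 2 and 3 put different strings into scientificName for the same name, so a consumer reading only that element cannot tell which convention a provider followed.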


Tony.Rees＠csiro.au

11:40

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Hi David,

It seems to me that your suggestion is still not quite ideal, in that sometimes just the dwc:scientificName element will be picked up and passed around, and the content will not be consistent between those suppliers who concatenate the available authority info and those who do not. That suggests to me that an extra field for a known canonicalName, where this can be supplied, is still desirable - but I am not sure if I am alone in thinking this...

Regards - Tony

________________________________________ From: David Remsen (GBIF) [dremsen@gbif.org] Sent: Tuesday, 23 November 2010 11:15 PM To: Rees, Tony (CMAR, Hobart) Cc: David Remsen (GBIF); deepreef@bishopmuseum.org; m.doering@mac.com; tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

...

David Remsen wrote:

...
Maybe we shouldnt add canonical name but rather something more specific to the concatenated form like dwc:scientificNameWithAuthorshipAndOtherBits dwc:scientificName dwc:scientificNameAuthorship

If by "dwc:scientificName" you mean with authorship omitted, that is fine, however it would need the dwc definition to be altered...

Then at least folk would/should know which field to populate. However the mandatory yes/no issue would also have to be addressed - at present I think dwc:scientificName is the only taxonomy related element that is mandatory, all others are optional. Under your scenario it would then maybe be one of either of the first 2 fields, or both as available, I guess?

Regards - Tony

________________________________________ From: David Remsen (GBIF) [dremsen@gbif.org] Sent: Tuesday, 23 November 2010 7:47 PM To: Rees, Tony (CMAR, Hobart) Cc: David Remsen (GBIF); deepreef@bishopmuseum.org; m.doering@mac.com; tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

While I haven't seen them all, I have seen and had to understand a good number of biodiversity databases including many focused on managing species lists in one form or another. Names are represented in these three forms.

1. Completely unparsed where the entire verbose name text is in a single field corresponding to dwc:scientificName. In some databases this means just a scientific name as many databases don't hold authorship information.

2. Semi-parsed where the canonical name is separated from the authorship information corresponding to the proposed canonicalName and dwc:scientificNameAuthorship

3. Fully parsed into atoms (genus, specific epithet, infraspecific rank, infraspecies, authorship) corresponding to the incomplete set of dwc atomic elements already in existence. This form is the most problematic because 1) it isn't always clear from the parts how the actual complete name is intended to be represented, 2) there are so many structural exceptions and complexities that many more 'atoms' need to be described to effectively enable it to be used, and 3) there is the problematic definition of the use of Genus as described by Markus that conflicts with atomising synonyms.

It makes sense to maintain the separation of name and authorship in data sources that already do this, but I'm not convinced a canonicalName element is required. It seems that it is suggested so that it makes it easier to consume the data, but it also means it's more confusing for a typical data manager or biologist to produce it. I have a database with binomials alone. How many data managers or biologists will map them to canonicalName before scientificName? I know we want to avoid testing different conditions when we use the data, but we will have to in either case.
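As a rough illustration of the kind of condition-testing a consumer ends up doing, here is a minimal Python sketch. It assumes records arrive as dicts keyed by the DwC term names; the function name full_name and the "ends with" check are illustrative assumptions, not part of any standard.

```python
def full_name(rec):
    """Return name plus authorship, whichever way the provider filled the fields."""
    name = rec["scientificName"].strip()
    authorship = (rec.get("scientificNameAuthorship") or "").strip()

    # Append the authorship only when it is supplied separately and is not
    # already the tail of scientificName (some providers put it in both).
    if authorship and not name.endswith(authorship):
        return f"{name} {authorship}"
    return name
```

The same check would be needed whether or not a canonicalName term exists, since a consumer cannot know in advance which fields a given provider populated.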

Maybe we shouldn't add canonical name but rather something more specific to the concatenated form like

dwc:scientificNameWithAuthorshipAndOtherBits
dwc:scientificName
dwc:scientificNameAuthorship

I'd know what to do then

DR

On Nov 22, 2010, at 11:18 PM, <Tony.Rees@csiro.au> wrote:

...
Hi Rich, all,

You wrote:

...
Otherwise, we could argue forever about which of the dozen possible forms we think DwC needs a term for.

No, I think that is muddying the waters (with respect of course...). I simply made the case for "canonicalName" - aka scientific name without authorship - as a valuable adjunct to "scientificName", for users who can supply both, and consumers who would otherwise have to generate the former from the latter algorithmically. Markus and Dima probably represent the main "consumers" here, and I, if you like, can represent a "provider" (although I wear other "consumer" hats on occasion as well). Basically, if a "canonicalName" field does not exist, I will just omit this information, which seems suboptimal since it all exists pre-parsed and manually verified in my system, and someone else will then have to do the job again...
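To make the "generate the former from the latter algorithmically" step concrete, here is a deliberately naive Python sketch of what a consumer might try. It is a simple token heuristic invented for illustration, not the parser used by GNI or any other service; the names RANK_MARKERS, EPITHET and naive_canonical are hypothetical, and "Aus bus" is a placeholder name.

```python
import re

# Hypothetical, incomplete list of infraspecific rank markers.
RANK_MARKERS = {"subsp.", "ssp.", "var.", "subvar.", "f.", "forma"}
# An epithet is assumed to be an all-lowercase word (hyphens allowed).
EPITHET = re.compile(r"^[a-z][a-z-]+$")


def naive_canonical(full_name):
    """Keep the first word, then lowercase epithets and rank markers,
    and stop at the first token that looks like authorship."""
    tokens = full_name.split()
    if not tokens:
        return ""
    kept = [tokens[0]]
    for token in tokens[1:]:
        if token in RANK_MARKERS or EPITHET.match(token):
            kept.append(token)
        else:
            break
    return " ".join(kept)
```

With this sketch, naive_canonical("Agalinis purpurea (L.) Pennell") gives "Agalinis purpurea", but an author name that starts in lowercase trips it up: naive_canonical("Aus bus de Candolle") gives "Aus bus de". That is the sort of error a provider-supplied, manually verified value would avoid.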

Regards - Tony

...
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 23 November 2010 7:06 AM To: Rees, Tony (CMAR, Hobart); m.doering@mac.com Cc: tdwg-content@lists.tdwg.org; dmozzherin@eol.org Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

...
"uninomial" would equal "canonicalName" for ranks subgenus and above, but not for species and below, while canonicalName (or scientificNameCanonical if you prefer) covers all cases, which is why I think it is preferable, especially as the majority of names in circulation are at species level and below I think...

Atomising further, i.e. a binomial or polynomial into genus, species, infraspecies, is actually a separate activity with its own rationale, I would say.

Just my personal view, of course...

The cleanest way to do it is to simply have Rank, NameElement and parentNameUsageID, and be done with it (maybe with the addition of verbatimNameString for purists). But that assumes that providers have parsed data, which they often do not. Maybe with services like those associated with GNI, the time of databases with unparsed names data is drawing to a close. Or, maybe if GNUB gets a foothold, we'll solve all the problems via a simply actionable persistent identifier.
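A minimal Python sketch of how a name could be rebuilt from just Rank, NameElement and parentNameUsageID. The record layout, the rank labels, the BELOW_GENUS set and the function name assemble_name are all assumptions made for illustration, and the sketch assumes the parent of a species is its genus (no subgenus or section in between).

```python
# Ranks whose names are combinations and need the parent chain (assumed labels).
BELOW_GENUS = {"species", "subspecies", "variety", "form"}


def assemble_name(usage_id, usages):
    """Walk parentNameUsageID links upward until a rank at or above genus."""
    current = usages[usage_id]
    elements = [current["nameElement"]]
    while current["rank"] in BELOW_GENUS:
        current = usages[current["parentNameUsageID"]]
        elements.append(current["nameElement"])
    # Elements were collected child-first, so reverse them for display.
    return " ".join(reversed(elements))


usages = {
    "g1": {"rank": "genus", "nameElement": "Agalinis", "parentNameUsageID": None},
    "s1": {"rank": "species", "nameElement": "purpurea", "parentNameUsageID": "g1"},
}
```

Here assemble_name("s1", usages) yields "Agalinis purpurea" and assemble_name("g1", usages) yields "Agalinis". Authorship, rank markers and anything else would still have to come from other fields.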

But until that time, dwc needs to find a balance between users who want pre-parsed data, and providers who do not have pre-parsed data.

I think dwc *almost* accommodates both worlds, as long as scientificName is defined as "the complete set of textual elements useful for recognizing a unique scientific name"; which is either concatenated by the provider with parsed data, or simply "provided" by the provider with unparsed data.

What we seem to be arguing about now is how many different forms of a "formatted" name do we want?

With or without authorship?

With or without year?

With or without infraspecific prefixes ("var.", "f." etc.)?

With or without infrageneric name(s)?

With or without italics codes?

With or without qualifiers like "cf.", "aff.", etc.?

Etc.

Etc.

Etc.

There are potentially dozens of different terms we could define to accommodate every particular niche-need.

Personally, I think that the existing "scientificName" should be split into two different terms:

fullScientificNameStringWithAuthorship

And

verbatimNameString

The first would be a concatenated text string assembled from parsed bits, according to a community standard concatenation form.

The second would be the literal text string as it appeared in the original source.

Otherwise, we could argue forever about which of the dozen possible forms we think DwC needs a term for.

Rich

_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content

Gregor Hagedorn

19 Nov

01:07

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]

...

I think it would be useful though to have a canonicalName field that only takes "Genus specificEpithet infraspecificEpithet" and no more.

(I think we need to add the rank string. Same epithet for subsp. and var. is not unusual in Botany/Mycology. Gregor)
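A small Python sketch of why the rank string matters. The function name canonical_name and its parameters are hypothetical, and "Aus bus cus" is a placeholder name rather than a real taxon.

```python
def canonical_name(genus, specific_epithet, infraspecific_epithet=None, rank_marker=None):
    """Join the name parts, inserting the rank marker before the infraspecific epithet."""
    parts = [genus, specific_epithet]
    if infraspecific_epithet:
        if rank_marker:
            parts.append(rank_marker)
        parts.append(infraspecific_epithet)
    return " ".join(parts)


# Without the rank string, a subspecies and a variety sharing an epithet collide.
without_rank = canonical_name("Aus", "bus", "cus")
as_subspecies = canonical_name("Aus", "bus", "cus", "subsp.")
as_variety = canonical_name("Aus", "bus", "cus", "var.")
```

Both infraspecific taxa reduce to "Aus bus cus" when the marker is dropped, whereas "Aus bus subsp. cus" and "Aus bus var. cus" stay distinct.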

Jim Croft

13:39

New subject: [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]

Candidly, I would love to get rid of the infraspecific rank thing in botany and work with tri- or perhaps even poly- nominals, but history, practice and the code are against it. I don't like to say nice things about zoologists, but I think they got this one right. :)

Jim

On Friday, November 19, 2010, Gregor Hagedorn <g.m.hagedorn@gmail.com> wrote:

...

...
I think it would be useful though to have a canonicalName field that only takes "Genus specificEpithet infraspecificEpithet" and no more.

(I think we need to add the rank string. Same epithet for subsp. and var. is not unusual in Botany/Mycology. Gregor)

34 comments

participants (13)

"Markus Döring (GBIF)"
Bob Morris
David Remsen (GBIF)
dipteryx＠freeler.nl
greg whitbread
Gregor Hagedorn
Jim Croft
Markus Döring
Nico Franz
Paul Murray
Richard Pyle
Roderic Page
Tony.Rees＠csiro.au