The basic problem, I think, is that neither the Codes, nor the vast majority of taxonomists, regard the Authorship details as being part of a "scientific name". Hence, the inclusion of name-bits and authorship bits in the same text string does not resonate well with the term "scientificName". I think most of us in the taxonomy world would be willing to overlook this semantic dissonance, except for the fact that "scientificName" is so crucial to any DwC or DwCA record by virtue of it being required (Is this true, BTW? I don't see anywhere in the DwC documentation where this requirement is stated explicitly).
I think it's also true that it's much easier and more reliable to concatenate [scientificName]+' '+[scientificNameAuthorship] at the client side than it is to parse something like "Aus bus deBruen" into [genus] | [specificEpithet] | [scientificNameAuthorship], especially when "deBruen" might be misinterpreted as an infraspecificEpithet.
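To make the asymmetry concrete, here is a minimal Python sketch. The helper names `join_name` and `candidate_parses` are hypothetical, not part of any DwC tooling, and the parser deliberately handles only three-token strings. Joining is a one-liner; splitting "Aus bus deBruen" on whitespace leaves two readings that cannot be told apart without outside knowledge.

```python
def join_name(scientific_name: str, authorship: str) -> str:
    """Client-side join: name bits, one space, then authorship (if present)."""
    return f"{scientific_name} {authorship}".strip()


def candidate_parses(name_string: str) -> list[dict[str, str]]:
    """Return every reading a whitespace-only parser cannot rule out.

    Hypothetical helper: handles only three-token strings such as
    "Aus bus deBruen", which is enough to show the ambiguity.
    """
    tokens = name_string.split()
    if len(tokens) != 3:
        raise ValueError("this sketch only handles three-token name strings")
    genus, epithet, third = tokens
    return [
        # Reading 1: the third token is the author.
        {"genus": genus, "specificEpithet": epithet,
         "scientificNameAuthorship": third},
        # Reading 2: the third token is an infraspecific epithet.
        {"genus": genus, "specificEpithet": epithet,
         "infraspecificEpithet": third},
    ]


print(join_name("Aus bus", "deBruen"))
for parse in candidate_parses("Aus bus deBruen"):
    print(parse)
```

A real parser would bring in capitalization rules, rank markers, and author lists to choose between the readings; the point here is only that the join needs none of that.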
So, forgetting the existing DwC terms for a moment, I think what we ultimately want is the ability to pass a complete/verbatim name-string, and also pass the parsed bits. As already stated, it's much easier to generate the former from the latter, than the other way around; thus, to be provider-friendly, if either is required, it should be the complete/verbatim version. So, what I think we need is something like:
verbatimScientificName: As I suggested in an earlier post, this would be "the complete set of textual elements useful for recognizing a unique scientific name", exactly as they appear in the original source.
uninomialNameElement: Used for all names at the rank of genus and above; would also replace "genus" in DwC.
infragenericNameElement: Better term for "subgenus".
specificEpithet: As in existing DwC.
infraspecificEpithet: As in existing DwC.
scientificNameAuthorship: As in existing DwC.
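As a rough illustration of how a record built from these terms might look, here is a minimal Python sketch. The `NameRecord` class and its `to_row` method are hypothetical, and three of the field names (verbatimScientificName, uninomialNameElement, infragenericNameElement) are the proposals above, not existing DwC terms. The verbatim string is the only required field; the parsed bits are optional.

```python
from dataclasses import dataclass, asdict


@dataclass
class NameRecord:
    # Required: the complete name string exactly as it appears in the source.
    verbatimScientificName: str
    # Optional parsed bits; empty string means "not supplied".
    uninomialNameElement: str = ""
    infragenericNameElement: str = ""
    specificEpithet: str = ""
    infraspecificEpithet: str = ""
    scientificNameAuthorship: str = ""

    def to_row(self) -> dict:
        """Keep only the fields that actually carry a value."""
        return {term: value for term, value in asdict(self).items() if value}


record = NameRecord(
    verbatimScientificName="Aus bus deBruen",
    uninomialNameElement="Aus",
    specificEpithet="bus",
    scientificNameAuthorship="deBruen",
)
print(record.to_row())
```

A provider holding only the verbatim string could still emit a valid record, which is the provider-friendly property argued for above.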
I don't really agree with Tony on the "clutter" argument for introducing a single "canonicalName" term to replace the parsed uninomialNameElement [aka "genus"], infragenericNameElement [aka "subgenus"], specificEpithet, and infraspecificEpithet. (Side question to Tony -- would canonicalName include "var.", "f." etc., hence obviating the need for TaxonRank as well?) After all, "Aus bus xus" requires exactly the same number of bytes as "Aus,bus,xus" in a DwCA file. Of course, if verbatimScientificName [aka scientificName] is required, we'd have redundancy and hence a doubling of bytes. However, with verbatimScientificName defined as above, the information would not really be redundant: the parsed bits would be defined as representing the Code-corrected version of the name, and the verbatimScientificName will often differ from a canonical concatenation of the parsed bits according to some standard format/formula.
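The byte arithmetic can be checked with a short Python sketch, assuming a comma-delimited, UTF-8 encoded file (the `byte_len` helper is hypothetical). It compares one joined column, three parsed columns, and a row carrying both.

```python
def byte_len(text: str) -> int:
    """Size of the string in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


parts = ["Aus", "bus", "xus"]

canonical = " ".join(parts)            # one column: "Aus bus xus"
parsed_columns = ",".join(parts)       # three columns: "Aus,bus,xus"
both = ",".join([canonical] + parts)   # joined string plus the parsed columns

print(byte_len(canonical), byte_len(parsed_columns), byte_len(both))
```

The first two sizes are equal because each space is simply swapped for a delimiter; the third is roughly double, which is the redundancy cost of requiring the full string alongside the parsed bits.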
So, to me, the main questions to answer are:
1) How does the existing DwC/DwCA structure fail to meet the needs of providers and/or users, in terms of loss of information, potential for misrepresentation of information, or inefficient or ineffective transfer of information (i.e. overburdening either the provider or the client)?
2) What are the most effective and least disruptive ways to correct the failures identified in #1 above, in terms of re-defining existing terms, vs. introducing new (and potentially redundant) terms, vs. a complete new set of terms that may be semantically less confusing to taxonomists (as above)?
Aloha, Rich
-----Original Message-----
From: Markus Döring [mailto:m.doering@mac.com]
Sent: Wednesday, November 24, 2010 2:30 AM
To: Richard Pyle
Cc: Tony.Rees@csiro.au; Chuck.Miller@mobot.org; tdwg-content@lists.tdwg.org; dmozzherin@eol.org
Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?
I just had a quick look at the first few thousand data records coming into OBIS for my region (Australia). Just about every supplier who includes authority as dwc:scientificNameAuthor has used dwc:scientificName "incorrectly" i.e., for the canonical name not the canonical name + author. This data then flows into GBIF, ALA, etc. and circulates in this form. So "users" are already ignoring the definition of dwc:scientificName in practice, it would seem, with no apparent ill effects (?) - not sure whether this is good or bad, hence the title of my original question which prompted this thread...
OK, so here's the question:
Is it more disruptive to re-define dwc:scientificName to explicitly exclude authorship?
That's definitely something I'd like to avoid! We really need one place to keep the most explicit form of the name.
From seeing real data coming in, I would coin the definition for scientificName that it should *contain the most complete, verbatim name string*. If you happen to have only a canonical, use the canonical. If you happen to have canonical + authorship parsed, join them if you can (it's usually not a simple concatenation, beware).
Markus
Or, is it more disruptive to leave the existing (loose) definition of scientificName intact, and create more term(s) with more precise meanings, which we feel can help facilitate sharing of information?
Rich
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content