[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad?

Richard Pyle deepreef at bishopmuseum.org
Wed Nov 24 21:43:58 CET 2010


The basic problem, I think, is that neither the Codes, nor the vast majority
of taxonomists, regard the Authorship details as being part of a "scientific
name".  Hence, the inclusion of name-bits and authorship bits in the same
text string does not resonate well with the term "scientificName".  I think
most of us in the taxonomy world would be willing to overlook this semantic
dissonance, except for the fact that "scientificName" is so crucial to any
DwC or DwCA record by virtue of it being required (Is this true, BTW? I
don't see anywhere in the DwC documentation where this requirement is stated
explicitly).

I think it's also true that it's much easier and more reliable to
concatenate [scientificName]+' '+[scientificNameAuthorship] at the client
side than it is to parse something like "Aus bus deBruen" into [genus] |
[specificEpithet] | [scientificNameAuthorship], especially when "deBruen"
might be misinterpreted as an infraspecificEpithet.
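
As a rough illustration of that asymmetry, here is a minimal Python sketch.
The function names join_name and naive_parse are hypothetical (they are not
part of DwC or any library), and the parser is deliberately simplistic: it
only splits on whitespace and guesses by position, which is exactly how
"deBruen" ends up in the wrong field.

```python
# Hypothetical helpers for illustration only; not part of DwC or any library.

def join_name(scientific_name: str, authorship: str) -> str:
    """Concatenate name and authorship with a single space, skipping empty parts."""
    return " ".join(part for part in (scientific_name, authorship) if part)


def naive_parse(name_string: str) -> dict:
    """Deliberately naive parser: split on whitespace and assign by position."""
    tokens = name_string.split()
    fields = ["genus", "specificEpithet", "infraspecificEpithet"]
    parsed = dict(zip(fields, tokens[:3]))
    # Whatever is left over after three tokens is assumed to be authorship.
    parsed["scientificNameAuthorship"] = " ".join(tokens[3:])
    return parsed


joined = join_name("Aus bus", "deBruen")
guess = naive_parse("Aus bus deBruen")

print(joined)  # the easy direction
print(guess)   # "deBruen" lands in infraspecificEpithet, authorship is empty
```

A real parser would of course do better than this, but it would still need
rules or lookups that the plain concatenation never needs.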

So, forgetting the existing DwC terms for a moment, I think what we
ultimately want is the ability to pass a complete/verbatim name-string, and
also pass the parsed bits.  As already stated, it's much easier to generate
the former from the latter, than the other way around; thus, to be
provider-friendly, if either is required, it should be the complete/verbatim
version. So, what I think we need is something like:

verbatimScientificName
As I suggested in an earlier post, this would be "the complete set of
textual elements useful for recognizing a unique scientific name", exactly
as they appear in the original source.

uninomialNameElement
Used for all names at the rank of genus and above; would also replace
"genus" in DwC.

infragenericNameElement 
Better term for "subgenus".

specificEpithet
As in existing DwC.

infraspecificEpithet
As in existing DwC.

scientificNameAuthorship
As in existing DwC.
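
To make the shape of such a record concrete, here is a small Python sketch.
The class name NameRecord is hypothetical, and the choice to make only
verbatimScientificName mandatory is my assumption, following the reasoning
above that if anything is required it should be the verbatim string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameRecord:
    """Hypothetical record using the term names proposed above."""
    # Assumed to be the only required element (complete string as in the source).
    verbatimScientificName: str
    # Parsed bits are all optional; a provider fills in what it has.
    uninomialNameElement: Optional[str] = None
    infragenericNameElement: Optional[str] = None
    specificEpithet: Optional[str] = None
    infraspecificEpithet: Optional[str] = None
    scientificNameAuthorship: Optional[str] = None


# A provider that has parsed the name can say explicitly that "deBruen"
# is the authorship and not an infraspecific epithet.
record = NameRecord(
    verbatimScientificName="Aus bus deBruen",
    uninomialNameElement="Aus",
    specificEpithet="bus",
    scientificNameAuthorship="deBruen",
)

# A provider with nothing but the raw string can still send a valid record.
minimal = NameRecord(verbatimScientificName="Aus bus deBruen")
```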

I don't really agree with Tony on the "clutter" argument for introducing a
single "canonicalName" term to replace the parsed uninomialNameElement [aka
"genus"], infragenericNameElement [aka "subgenus"], specificEpithet, and
infraspecificEpithet. (Side question to Tony -- would canonicalName include
"var.", "f." etc., hence obviating the need for TaxonRank as well?)
After all, "Aus bus xus" requires exactly the same number of bytes as
"Aus,bus,xus" in a DwCA file. Of course, if verbatimScientificName [aka
scientificName] is required, we'd have redundancy and hence doubling of
bytes. However, if defined as verbatimScientificName as above, it would not
really be redundant information if the parsed bits were defined as
representing the Code-corrected version of the name, because the
verbatimScientificName will often differ from a canonical concatenation of
the parsed bits according to some standard format/formula.
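
The byte-count point is easy to check, as is the point that a verbatim
string need not equal the canonical join of the parsed bits. In the Python
sketch below, the verbatim value "A. bus xus" is a made-up example of how a
source might abbreviate the genus; it is not taken from any real dataset.

```python
# One combined field versus three comma-delimited fields.
space_form = "Aus bus xus"
comma_form = "Aus,bus,xus"

space_bytes = len(space_form.encode("utf-8"))
comma_bytes = len(comma_form.encode("utf-8"))
same_size = space_bytes == comma_bytes

# Made-up verbatim string (assumption): the source abbreviated the genus.
verbatim = "A. bus xus"
canonical = " ".join(["Aus", "bus", "xus"])
differs = verbatim != canonical

print(space_bytes, comma_bytes, same_size)
print(differs)
```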

So, to me, the main questions to answer are:

1) How does the existing DwC/DwCA structure fail to meet the needs of
providers and/or users, in terms of loss of information, potential for
misrepresentation of information, or inefficient or ineffective transfer of
information (i.e. overburdening either the provider or the client)?

2) What are the most effective and least disruptive ways to correct the
failures identified in #1 above, in terms of re-defining existing terms, vs.
introducing new (and potentially redundant) terms, vs. a complete new set of
terms that may be semantically less confusing to taxonomists (as above)?

Aloha,
Rich

> -----Original Message-----
> From: Markus Döring [mailto:m.doering at mac.com]
> Sent: Wednesday, November 24, 2010 2:30 AM
> To: Richard Pyle
> Cc: Tony.Rees at csiro.au; Chuck.Miller at mobot.org; tdwg-
> content at lists.tdwg.org; dmozzherin at eol.org
> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in
> DwCscientificName: good or bad?
> 
> >> I just had a quick look at the first few thousand data records coming
> >> into OBIS for my region (Australia). Just about every supplier who
> >> includes authority as dwc:scientificNameAuthor has used
> >> dwc:scientificName "incorrectly" i.e., for the canonical name not the
> >> canonical name + author. This data then flows into GBIF, ALA, etc.
> >> and circulates in this form. So "users" are already ignoring the
> >> definition of dwc:scientificName in practice, it would seem, with no
> >> apparent ill effects (?) - not sure whether this is good or bad,
> >> hence the title of my original question which prompted this thread...
> >
> > OK, so here's the question:
> >
> > Is it more disruptive to re-define dwc:scientificName to explicitly
> > exclude authorship?
> That's definitely something I'd like to avoid!
> We really need one place to keep the most explicit form of the name.
> 
> From seeing real data coming in I would coin the definition for
> scientificName that it should *contain the most complete, verbatim name
> string*. If you happen to have only a canonical, use the canonical. If you
> happen to have canonical + authorship parsed, join them if you can (it's
> usually not a simple concatenation, beware).
> 
> Markus
> 
> 
> 
> 
> >
> > Or, is it more disruptive to leave the existing (loose) definition of
> > scientificName intact, and create more term(s) with more precise
> > meanings, which we feel can help facilitate sharing of information?
> >
> > Rich
> >
> >
> > _______________________________________________
> > tdwg-content mailing list
> > tdwg-content at lists.tdwg.org
> > http://lists.tdwg.org/mailman/listinfo/tdwg-content



