Several name-parsing services already exist to provide this functionality:

http://tools.gbif.org/nameparser/
http://gni.globalnames.org/parsers/new

My personal philosophy regarding data sharing is that, whenever practical, the burden should be placed on the enabling infrastructure and not on the person wanting to share their data. Put enough impediments in the way and the pipes will stay empty.

I prefer a loose definition for using scientificName, one that recommends parsing authority information into the authorship element but does not require it. The parsing services could be made more easily available and even incorporated into publishing tools. I think we will be dealing with this issue whether canonicalName is available or not.

For GBIF at least, if authorship issues like this were the biggest data processing problem we faced, we would gladly take it. If you want to see what we are actually dealing with at the moment, for example the values in specimen and observation data that are supplied in the dwc:Family element, it's a bit of an eye-opener, and I have a file!

DR

On Nov 24, 2010, at 4:46 AM, Peter DeVries wrote:

My vote would be to clarify the use of scientific name to not include authorship as Rich suggests.

Perhaps a partial solution would be for the GNI or GBIF to provide a web service that end users could use to clean and parse their names into their dwc:scientificName and authorship parts. (They probably have something close to this already.)

For ease of use, the system could output something like this:

Puma concolor <tab>  (Linnaeus 1771) 

In the process, they could flag potentially incorrect uses of parentheses, etc.:

Puma concolor <tab>  Linnaeus 1771 <tab> Note: potentially incorrect authorship - parentheses missing

or

Felis concolor <tab>  Linnaeus 1771 <tab> Note: do you mean "Puma concolor  (Linnaeus 1771)"?
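
To make the tab-separated output above concrete, here is a minimal Python sketch of the splitting step only. It is not the GBIF or GNI parser; the function names `split_name` and `to_tsv` and the regular expression are assumptions made up for illustration. The heuristic treats a capitalised genus plus up to two lowercase epithets as the name and everything after that as authorship, so it will misread authors whose names begin with a lowercase particle (e.g. "de Candolle"), and it cannot flag missing parentheses, which would need reference data.

```python
import re

# Heuristic only (an assumption, not how GBIF/GNI parse names):
# a capitalised genus followed by up to two lowercase epithets,
# then whatever remains is treated as the authorship string.
_CANONICAL = re.compile(
    r"^\s*([A-Z][a-z]+(?:\s+[a-z][a-z-]+){0,2})\s*(.*?)\s*$"
)


def split_name(name: str) -> tuple[str, str]:
    """Split a name string into (canonical name, authorship)."""
    match = _CANONICAL.match(name)
    if match is None:
        # Not recognisable as a Latin name; return it unparsed.
        return name.strip(), ""
    # Collapse any runs of whitespace inside the canonical part.
    canonical = " ".join(match.group(1).split())
    return canonical, match.group(2)


def to_tsv(name: str) -> str:
    """Format one name as 'canonical<TAB>authorship'."""
    canonical, authorship = split_name(name)
    return f"{canonical}\t{authorship}"


if __name__ == "__main__":
    for raw in ("Puma concolor (Linnaeus 1771)", "Felis concolor Linnaeus 1771"):
        print(to_tsv(raw))
```

A real service would replace the regular expression with a proper name parser and add the lookup needed to produce the "do you mean" notes.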

A beneficial side effect would be that everyone has a more normalized and accurate species list.

Respectfully,

- Pete

On Tue, Nov 23, 2010 at 6:40 PM, <Tony.Rees@csiro.au> wrote:
Rich,

No need to apologise... Actually it affects the aggregators in two respects: one is the larger vs. more compact data representation, the other is the present inconsistency in what real-world data providers actually supply in the "scientificName" element. If it were clearer that this element was for sciname + author, and the sciname without author had its own dedicated element, the incoming data might be a lot more consistent.

Basically it is the present "scientificNameAuthor" element which is clouding the issue - people see this and then think they do not need to add the author into "scientificName" as well, although as previously stated by Markus this is technically incorrect according to the DwC spec (and I can see the argument for keeping it that way, so as to capture as much info as possible in that field).

Cheers - Tony


> -----Original Message-----
> From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
> Sent: Wednesday, 24 November 2010 11:27 AM
> To: Rees, Tony (CMAR, Hobart); Chuck.Miller@mobot.org; dremsen@gbif.org
> Cc: tdwg-content@lists.tdwg.org; dmozzherin@eol.org
> Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in
> DwCscientificName: good or bad?
>
>
> OK, understood.
>
> But I guess my next question would be: is this really "bloat"?  Isn't the
> cost of the bloat much less than the value of providing fully parsed
> content?
>
> I now understand what I think is a large part of the basis for our
> (perhaps non-existent?) disagreement: I'm thinking of dwc terms in the
> abstract sense, whereas you are thinking in terms of more practical issues
> such as the MB size of your DwCA files.  This also clarifies for me why you
> keep saying that it's really a question for the big aggregators (which I
> now understand and agree with).
>
> Sorry if I was misunderstanding where you are coming from on this!
>
> Aloha,
> Rich
>




--
---------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
------------------------------------------------------------