My vote would be to clarify that scientific name should not include authorship, as Rich suggests.
Perhaps a partial solution would be for the GNI or GBIF to provide a web service that end users could use to clean and parse their names into their dwc:scientificName and authorship parts. (They probably have something close to this already.)
For ease of use, the system could output something like this:
Puma concolor <tab> (Linnaeus 1771)
In the process, it could flag potentially incorrect uses of parentheses, etc.:
Puma concolor <tab> Linnaeus 1771 <tab> Note: Potentially incorrect authorship - parentheses missing
or
Felis concolor <tab> Linnaeus 1771 <tab> Note: Do you mean "Puma concolor (Linnaeus 1771)"?
A beneficial side effect would be that everyone has a more normalized and accurate species list.
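To make the idea concrete, here is a rough sketch in Python of what such a parser might do. It is only an illustration, not anything GNI or GBIF actually provides: the function names (parse_name, to_tsv) and the two tiny lookup tables are made up for this example, and a real service would consult a proper names index and handle far more cases than a simple binomial.

```python
import re

# Hypothetical reference data for illustration only; a real service would
# query a names index instead of these hard-coded tables.
# Current combination -> genus in which the species was originally described.
ORIGINAL_GENUS = {"Puma concolor": "Felis"}
# Older combination -> currently accepted name string.
ACCEPTED_NAME = {"Felis concolor": "Puma concolor (Linnaeus 1771)"}

# Genus plus species epithet, followed by whatever remains as authorship.
NAME_RE = re.compile(r"^\s*([A-Z][a-z]+ [a-z-]+)\s*(.*?)\s*$")


def parse_name(raw):
    """Split a binomial name string into (canonical name, authorship, note)."""
    match = NAME_RE.match(raw)
    if match is None:
        return raw.strip(), "", "Note: Could not parse name"
    canonical, authorship = match.groups()
    has_parens = authorship.startswith("(") and authorship.endswith(")")
    note = ""
    if canonical in ACCEPTED_NAME:
        # The combination is known under a different accepted name.
        note = 'Note: Do you mean "%s"?' % ACCEPTED_NAME[canonical]
    elif authorship and canonical in ORIGINAL_GENUS and not has_parens:
        # The species was moved to another genus, so the author
        # citation is expected to be in parentheses.
        note = "Note: Potentially incorrect authorship - parentheses missing"
    return canonical, authorship, note


def to_tsv(raw):
    """Format the parsed parts as one tab-separated line; omit an empty note."""
    canonical, authorship, note = parse_name(raw)
    fields = [canonical, authorship]
    if note:
        fields.append(note)
    return "\t".join(fields)


if __name__ == "__main__":
    for name in ("Puma concolor (Linnaeus 1771)",
                 "Puma concolor Linnaeus 1771",
                 "Felis concolor Linnaeus 1771"):
        print(to_tsv(name))
```

The name and the authorship always stay in the first two columns, and the note is an optional third column, so a provider could load the output straight into a spreadsheet and review only the flagged rows.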
Respectfully,
- Pete
On Tue, Nov 23, 2010 at 6:40 PM, Tony.Rees@csiro.au wrote:
Rich,
No need to apologise... Actually it affects the aggregators in two respects: one is the larger vs. more compact data representation, the other is the present inconsistency about what is actually expected/supplied in practice by real-world data providers in the present "scientificName" element. If it was clearer that this was for sciname + author, and the sciname without author had its own dedicated element, the incoming data would (or at least might) be a lot more consistent.
Basically it is the present "scientificNameAuthorship" element which is clouding the issue - people see this and then think they do not need to add the author to "scientificName" as well, although as previously stated by Markus this is technically incorrect according to the DwC spec (and I can see the argument for keeping it that way, so as to capture as much info as possible in that field).
Cheers - Tony
-----Original Message-----
From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
Sent: Wednesday, 24 November 2010 11:27 AM
To: Rees, Tony (CMAR, Hobart); Chuck.Miller@mobot.org; dremsen@gbif.org
Cc: tdwg-content@lists.tdwg.org; dmozzherin@eol.org
Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?
OK, understood.
But I guess my next question would be: is this really "bloat"? Isn't the cost of the bloat much less than the value of providing fully parsed content?
I now understand what I think is a large part of the basis for our (perhaps non-existent?) disagreement: I'm thinking of DwC terms in the abstract sense, whereas you are thinking in terms of more practical issues such as the size in MB of your DwCA files. This also clarifies for me why you keep saying that it's really a question for the big aggregators (which I now understand and agree with).
Sorry if I was misunderstanding where you are coming from on this!
Aloha, Rich
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content