This all sounds like it's getting terribly complicated, and the combined discussion of atomised parts vs canonical/full names is confusing me.
For the first part, I still think the available parsing tools make 99.8% of all cases tractable, but if we want to be explicit and not run these services all the time, then:
For datasets that separate the name from the authorship, we make sure it's clear that this separation is retained. The definition of scientificName must change: it doesn't make sense to me to concatenate two elements that are already split. The parts go into:
1. scientificName + scientificNameAuthorship
For datasets with only a scientificName, the name goes into:
2. scientificName
For datasets with the scientificName and authorship in a single field, we have two choices:
3a. scientificName (in which case we must be able to detect and split off the authorship; we also need to detect the canonical form in case 2)
3b. scientificNameWithAuthorship (rather than a canonicalName term, which is confusing, we use a less ambiguous term like this)
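To make the cases concrete, here is a minimal Python sketch of how a publisher might route source records into terms under this proposal. The `NameRecord` container, `map_record` and `naive_split` are hypothetical illustrations, and scientificNameWithAuthorship is the proposed term from 3b, not an existing Darwin Core term. The splitter is a deliberately naive heuristic standing in for a real name parser.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical record holding the terms discussed above.
# scientificNameWithAuthorship is the *proposed* term from option 3b.
@dataclass
class NameRecord:
    scientificName: Optional[str] = None
    scientificNameAuthorship: Optional[str] = None
    scientificNameWithAuthorship: Optional[str] = None


# Naive heuristic (assumption): authorship starts at the first token after the
# genus that begins with an uppercase letter or "(". It fails on subgenera in
# parentheses and on lowercase author particles such as "de" or "van".
AUTHOR_START = re.compile(r"^[A-Z(]")


def naive_split(full_name: str) -> Tuple[str, Optional[str]]:
    """Split 'Genus epithet Author' into (canonical name, authorship or None)."""
    tokens = full_name.split()
    for i, tok in enumerate(tokens[1:], start=1):
        if AUTHOR_START.match(tok):
            return " ".join(tokens[:i]), " ".join(tokens[i:])
    return full_name, None


def map_record(name: str, authorship: Optional[str] = None,
               combined: bool = False, option: str = "3a") -> NameRecord:
    """Route a source name into terms following cases 1, 2, 3a and 3b."""
    if option not in ("3a", "3b"):
        raise ValueError("option must be '3a' or '3b'")
    if authorship is not None:
        # Case 1: the source already separates name and authorship; keep them apart.
        return NameRecord(scientificName=name, scientificNameAuthorship=authorship)
    if combined and option == "3b":
        # Option 3b: the combined string goes into the explicit proposed term.
        return NameRecord(scientificNameWithAuthorship=name)
    # Case 2 and option 3a: the string goes into scientificName as-is, so a
    # consumer must detect (e.g. with naive_split) whether authorship is present.
    return NameRecord(scientificName=name)
```

Note that case 2 and option 3a produce identical records, so a consumer cannot tell them apart without running detection on every scientificName value; option 3b avoids that by making the combined form explicit.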
It seems to me that 3b is more explicit about what we intend by adding canonicalName.
DR
On Nov 25, 2010, at 2:03 AM, Tony.Rees@csiro.au wrote:
Quoting Rich Pyle:
At this point, though, I really don't have a good sense for how best to proceed.
Aloha, Rich
Maybe an answer would be to use TCS rather than DwC for exchanging purely taxonomic data? How about creating a TCSA format for bulk transfer, or is this not a great thought... (I'm not that familiar with TCS.)
One problem is that it is often desirable to include some non-taxonomic information along with the names, e.g. distribution/habitat codes.
Just an idea; I don't know whether it solves the residual DwC issues anyway,
Cheers - Tony