[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad? [SEC=UNCLASSIFIED]

Thu Nov 25 09:11:08 CET 2010

The denormalised single Linnaean rank terms are very, very helpful for sharing occurrence data.
They are the primary means to distinguish between homonyms when only a canonical name is given.
And they are found in many denormalised sources like spreadsheets. No doubt these are needed!
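
To make the homonym point concrete, here is a small Python sketch. The record layout (plain dicts with DwC-style keys) and the helper name split_homonyms are assumptions for illustration only, not anything defined by Darwin Core. Oenanthe is a genus name used both for wheatears (birds) and for water dropworts (plants), so the canonical name alone cannot separate them, while a kingdom column can.

```python
from collections import defaultdict

# Hypothetical denormalised occurrence records; keys mimic DwC term names.
records = [
    {"scientificName": "Oenanthe", "kingdom": "Animalia", "family": "Muscicapidae"},
    {"scientificName": "Oenanthe", "kingdom": "Plantae", "family": "Apiaceae"},
    {"scientificName": "Mytilus edulis", "kingdom": "Animalia", "family": "Mytilidae"},
]


def split_homonyms(rows, rank="kingdom"):
    """Group rows by canonical name, then by the value of one named rank column."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row["scientificName"]][row.get(rank)].append(row)
    # Keep only names that occur under more than one value of the chosen rank.
    return {
        name: dict(by_rank)
        for name, by_rank in grouped.items()
        if len(by_rank) > 1
    }


homonyms = split_homonyms(records)
print(sorted(homonyms["Oenanthe"]))  # the kingdoms this name occurs in
```

Without the rank column the two Oenanthe rows would be indistinguishable; with it they fall into separate groups.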

And yes, dwc:genus and dwc:subgenus are, by definition, for the *classification*, not the parsed name (even though the two are mostly the same).

As far as I can tell the dwc changes we are discussing are still the same. Either:

 A) add a canonicalName term
or
 B) add an atomised term for genus/uninomial + infrageneric/uninomial

I think both options are workable.
A single canonical name, if given correctly, is very straightforward to parse, so personally I think this is easier than having multiple terms.
For the name part terms I think I would agree with Chuck that a single uninomial can be used for genus or infrageneric ranks.
As a canonical binomial would *not* include a subgenus or section, there is no need to have that parsed information as a term.
In case the scientificName actually IS the subgenus, the uninomial can be used.
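
A rough Python sketch of why a clean canonical name is easy to parse. It assumes the string really is canonical: no authorship, no parenthesised subgenus, no rank markers, no hybrid signs. The function name parse_canonical and the returned keys are hypothetical, not proposed DwC terms.

```python
def parse_canonical(name):
    """Split a clean canonical name into its parts.

    Assumes whitespace-separated parts with no authorship, rank markers,
    parenthesised subgenus or hybrid signs (an assumption of this sketch).
    """
    parts = name.split()
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not a canonical name: {name!r}")
    keys = ("genusOrUninomial", "specificEpithet", "infraspecificEpithet")
    parsed = dict.fromkeys(keys)     # every key present, defaulting to None
    parsed.update(zip(keys, parts))  # fill in as many parts as were given
    return parsed


print(parse_canonical("Mytilus edulis"))
print(parse_canonical("Mytilus"))
```

A name that is itself a genus or a subgenus simply comes back with only the uninomial filled in, which is the case where a single uninomial term is enough.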

Markus

On Nov 25, 2010, at 8:36, David Remsen (GBIF) wrote:

> Rich
> 
> Your two statements below don't jibe well in this case.   Putting  
> random concatenations of higher taxa into dwc:higherClassification  
> would make for a real mess.   Having only the basic named Linnaean  
> ranks does ignore all of the intermediate ranks but it supports  
> conformity at least for those in a way that higherClassification  
> cannot as you lose the associated rank term.   It also supports what I  
> think is a fairly substantial bloc of data that exists in a denormal  
> form with only (or nearly only) the basic Linnaean ranks in named rank  
> columns.   Concatenating these into dwc:higherClassification would be  
> lossy in this case.
> 
> My real concern, however, would be in trying to subsequently line up  
> multiple datasets where there were omissions in some higher ranks so  
> that the concatenations were abbreviated.   In other words
> 
> Bivalvia:Mytilidae:Mytilus edulis
> Mollusca:Mytiloidea:Mytilus: Mytilus edulis
> Animalia: Mollusca:Mytiloidea:Mytilidae:Mytilus: Mytilus edulis
> 
> See http://code.google.com/p/gbif-ecat/wiki/Nom5ExampleMytilusedulis   
> for a real world example and, ignoring the other inherent messes,   
> imagine trying to deal with this with no higher rank columns for  
> context and all those nulls removed (no fair keeping the delimiters  
> for them either).
> 
> DR
> 
> 
>> 
>> On 25/11/2010, at 11:56 AM, Richard Pyle wrote:
>> 
>>> One golden rule of data management that I often
>>> tell people is that it's often better to be consistent than  
>>> correct.  That
>>> is, something that's consistently incorrect can be corrected easily.
> 
> 
>> Right -- you mean in the sense of Family, Order, Class, etc.   
>> Personally, I
>> think it would be "ideal" to eliminate these individual fields and  
>> just use
>> dwc:higherClassification for this purpose.  People with normalised  
>> data can
>> represent it properly via parentNameUsage[ID] -- with the  
>> understanding that
>> all names with a rank lower than genus would include the genus name as
>> uninomial.
> 