[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad? [SEC=UNCLASSIFIED]

Thu Nov 25 09:21:22 CET 2010

... a much, much bigger use case for having Linnaean rank terms, other than homonym disambiguation, is actually fuzzy matching of misspelled canonicals!
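
To make the fuzzy-matching use case concrete, here is a minimal Python sketch. It assumes a small, hypothetical reference list of (canonical name, family) pairs and uses the standard library's difflib; the function name, the list and the 0.8 cutoff are illustrative assumptions, not part of DwC or of any GBIF tool. The only point is that a rank term supplied with the record narrows the candidates a misspelled canonical is compared against.

```python
import difflib

# Hypothetical reference list of (canonical name, family) pairs.
REFERENCE = [
    ("Mytilus edulis", "Mytilidae"),
    ("Mytilus galloprovincialis", "Mytilidae"),
    ("Modiolus modiolus", "Mytilidae"),
    ("Mya arenaria", "Myidae"),
]


def fuzzy_match(name, family=None, cutoff=0.8):
    """Return the closest reference canonical, or None if nothing is close.

    If a family is given, only reference names in that family are candidates.
    """
    candidates = [n for n, f in REFERENCE if family is None or f == family]
    hits = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return hits[0] if hits else None


# A misspelled canonical, matched within the family given on the record.
print(fuzzy_match("Mytilus edulus", family="Mytilidae"))
# The same misspelling finds nothing when the record claims another family.
print(fuzzy_match("Mytilus edulus", family="Myidae"))
```

In a real index the candidate list would be large, so restricting it by family or another rank term also cuts down on false positives from similar-looking names elsewhere in the classification.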

On Nov 25, 2010, at 9:11, Markus Döring wrote:

> The denormalised single Linnaean rank terms are very, very helpful for sharing occurrence data.
> They are the primary means to distinguish between homonyms when only a canonical name is given.
> And they are found in many denormalised sources like spreadsheets. No doubt these are needed!
> 
> And yes, dwc:genus and dwc:subgenus, according to their definitions, are for the *classification*, not the parsed name (even though the two are mostly the same).
> 
> As far as I can tell the dwc changes we are discussing are still the same. Either:
> 
> A) add a canonicalName term
> or
> B) add an atomised term for genus/uninomial + infrageneric/uninomial
> 
> I think both options are a way to go.
> A single canonical name, if given correctly, is very straightforward to parse, so personally I think this is easier than having multiple terms.
> For the name part terms I think I would agree with Chuck that a single uninomial can be used for genus or infrageneric ranks.
> As a canonical binomial would *not* include a subgenus or section, there is no need to have that parsed information as a term.
> In case the scientificName actually IS the subgenus, the uninomial can be used.
> 
> 
> Markus
> 
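
The claim that a correctly given canonical name is easy to parse can be sketched in a few lines of Python. This is a sketch under assumptions: the canonical name is taken to be one to three whitespace-separated parts with no authorship, subgenus or rank marker, and the key names (genusOrUninomial and so on) are hypothetical labels for illustration, not existing DwC terms.

```python
def parse_canonical(name):
    """Split a canonical name into its parts.

    Assumes a clean canonical: no authorship, no subgenus in brackets,
    no rank markers. The key names are illustrative only.
    """
    parts = name.split()
    if len(parts) == 1:
        # A lone uninomial: a genus, an infrageneric name or a higher taxon.
        return {"genusOrUninomial": parts[0]}
    if len(parts) == 2:
        return {"genusOrUninomial": parts[0], "specificEpithet": parts[1]}
    if len(parts) == 3:
        return {
            "genusOrUninomial": parts[0],
            "specificEpithet": parts[1],
            "infraspecificEpithet": parts[2],
        }
    raise ValueError(f"not a clean canonical name: {name!r}")


print(parse_canonical("Mytilus edulis"))
```

Anything that does not fit the one-to-three-part pattern is rejected rather than guessed at, which is the "if given correctly" condition.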
> On Nov 25, 2010, at 8:36, David Remsen (GBIF) wrote:
> 
>> Rich
>> 
>> Your two statements below don't jibe well in this case. Putting random
>> concatenations of higher taxa into dwc:higherClassification would make
>> for a real mess. Having only the basic named Linnaean ranks does ignore
>> all of the intermediate ranks, but it supports conformity at least for
>> those in a way that higherClassification cannot, as you lose the
>> associated rank term. It also supports what I think is a fairly
>> substantial bloc of data that exists in a denormalised form with only
>> (or nearly only) the basic Linnaean ranks in named rank columns.
>> Concatenating these into dwc:higherClassification would be lossy in
>> this case.
>> 
>> My real concern, however, would be in trying to subsequently line up
>> multiple datasets where there were omissions in some higher ranks so
>> that the concatenations were abbreviated. In other words:
>> 
>> Bivalvia:Mytildae:Mytilus edulis
>> Mollusca:Mytiloidea:Mytilus: Mytilus edulis
>> Animalia: Mollusca:Mytiloidea:Mytildae:Mytilus: Mytilus edulis
>> 
>> See http://code.google.com/p/gbif-ecat/wiki/Nom5ExampleMytilusedulis
>> for a real-world example and, ignoring the other inherent messes,
>> imagine trying to deal with this with no higher rank columns for
>> context and all those nulls removed (no fair keeping the delimiters
>> for them either).
>> 
>> DR
>> 
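
The alignment problem described above can be shown with a short Python sketch. The three concatenated paths are copied from the message as posted (spelling included). The rank assigned to each name in the second half is my own assumption, added only to show what named rank columns preserve; none of this code comes from the thread.

```python
# The three concatenated paths from the message above, spelling as posted.
PATHS = [
    "Bivalvia:Mytildae:Mytilus edulis",
    "Mollusca:Mytiloidea:Mytilus: Mytilus edulis",
    "Animalia: Mollusca:Mytiloidea:Mytildae:Mytilus: Mytilus edulis",
]


def split_path(path):
    """Split a concatenated classification on ':' and trim whitespace."""
    return [part.strip() for part in path.split(":")]


# Positional alignment fails: the first token of each path is a different rank.
first_tokens = [split_path(p)[0] for p in PATHS]

# The same higher taxa held in named rank columns (rank assignment assumed).
RECORDS = [
    {"class": "Bivalvia", "family": "Mytildae"},
    {"phylum": "Mollusca", "superfamily": "Mytiloidea", "genus": "Mytilus"},
    {"kingdom": "Animalia", "phylum": "Mollusca", "superfamily": "Mytiloidea",
     "family": "Mytildae", "genus": "Mytilus"},
]


def column(records, rank):
    """Read one rank across all records; a missing rank is an explicit None."""
    return [r.get(rank) for r in records]


family_column = column(RECORDS, "family")

print(first_tokens)   # ['Bivalvia', 'Mollusca', 'Animalia']
print(family_column)  # ['Mytildae', None, 'Mytildae']
```

With the rank kept as a key, the gap in the second record is an explicit null and the other two families still line up. With the bare strings there is nothing to say which position holds which rank.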
>> 
>>> 
>>> On 25/11/2010, at 11:56 AM, Richard Pyle wrote:
>>> 
>>>> One golden rule of data management that I often tell people is that
>>>> it's often better to be consistent than correct. That is, something
>>>> that's consistently incorrect can be corrected easily.
>> 
>> 
>>> Right -- you mean in the sense of Family, Order, Class, etc.
>>> Personally, I think it would be "ideal" to eliminate these individual
>>> fields and just use dwc:higherClassification for this purpose. People
>>> with normalised data can represent it properly via
>>> parentNameUsage[ID] -- with the understanding that all names with a
>>> rank lower than genus would include the genus name as uninomial.
>> 