[tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad? [SEC=UNCLASSIFIED]
"Markus Döring (GBIF)"
mdoering at gbif.org
Thu Nov 25 09:21:22 CET 2010
... a much, much bigger use case for having Linnaean rank terms other than homonym disambiguation is actually fuzzy matching of misspelled canonicals!
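
To illustrate the point, here is a minimal sketch of fuzzy matching a misspelled canonical name, using a family value to narrow the candidates. The reference list, the function name fuzzy_match and the 0.8 cutoff are all assumptions made up for this example, and difflib from the Python standard library stands in for whatever matcher a real system would use.

```python
from difflib import get_close_matches

# Hypothetical reference list of (family, canonical name) pairs.
# Illustrative only, not taken from any real checklist.
REFERENCE = [
    ("Mytilidae", "Mytilus edulis"),
    ("Mytilidae", "Mytilus galloprovincialis"),
    ("Mytilidae", "Modiolus modiolus"),
    ("Muridae", "Mus musculus"),
]


def fuzzy_match(canonical, family=None, cutoff=0.8):
    """Return the closest reference name, or None if nothing is similar enough.

    When `family` is given, only names in that family are considered, so the
    rank term acts as context for the fuzzy comparison.
    """
    candidates = [
        name for fam, name in REFERENCE if family is None or fam == family
    ]
    hits = get_close_matches(canonical, candidates, n=1, cutoff=cutoff)
    return hits[0] if hits else None


match_in_family = fuzzy_match("Mytilus edulus", family="Mytilidae")
match_wrong_family = fuzzy_match("Mytilus edulus", family="Muridae")
```

With the higher rank supplied, the misspelling is only ever compared against names from the same group, which keeps the cutoff meaningful and avoids accidental hits on similar-looking names elsewhere in the classification.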
On Nov 25, 2010, at 9:11, Markus Döring wrote:
> the denormalised single Linnean Rank terms are very, very helpful for sharing occurrence data.
> They are the primary means to distinguish between homonyms when only a canonical name is given.
> And they are found in many denormalised sources like spreadsheets. No doubt these are needed!
>
> And yes, dwc:genus and dwc:subgenus are, according to their definitions, for the *classification*, not the parsed name (even though these are mostly the same).
>
> As far as I can tell the dwc changes we are discussing are still the same. Either:
>
> A) add a canonicalName term
> or
> B) add an atomised term for genus/uninomial + infrageneric/uninomial
>
> I think both options are a way to go.
> A single canonical name, if given correctly, is very straightforward to parse, so personally I think this is easier than having multiple terms.
> For the name part terms I think I would agree with Chuck that a single uninomial can be used for genus or infrageneric ranks.
> As a canonical binomial would *not* include a subgenus or section, there is no need to have that parsed information as a term.
> In case the scientificName actually IS the subgenus, the uninomial can be used.
>
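
As a rough sketch of why a clean canonical name is easy to parse: assuming the string carries no authorship, no subgenus in parentheses and no rank markers, splitting on whitespace is enough. The names ParsedName and parse_canonical are hypothetical, chosen for this example only.

```python
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    uninomial: str                        # genus, or any other single-word name
    specific_epithet: Optional[str]
    infraspecific_epithet: Optional[str]


def parse_canonical(name: str) -> ParsedName:
    """Split a canonical name into at most three parts.

    Assumes the input is already canonical: no authorship, no parenthesised
    subgenus, no rank markers such as "var." or "subsp.".
    """
    parts = name.split()
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not a canonical name: {name!r}")
    # Pad with None so a uninomial or binomial still fills all three fields.
    padded = parts + [None] * (3 - len(parts))
    return ParsedName(*padded)


binomial = parse_canonical("Mytilus edulis")
uninomial = parse_canonical("Mytilus")
```

The single uninomial field holds whichever one-word name is given, genus or otherwise, so a separate subgenus field is not needed in this sketch.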
>
> Markus
>
> On Nov 25, 2010, at 8:36, David Remsen (GBIF) wrote:
>
>> Rich
>>
>> Your two statements below don't jibe well in this case. Putting
>> random concatenations of higher taxa into dwc:higherClassification
>> would make for a real mess. Having only the basic named Linnaean
>> ranks does ignore all of the intermediate ranks but it supports
>> conformity at least for those in a way that higherClassification
>> cannot as you lose the associated rank term. It also supports what I
>> think is a fairly substantial bloc of data that exists in a denormalised
>> form with only (or nearly only) the basic Linnaean ranks in named rank
>> columns. Concatenating these into dwc:higherClassification would be
>> lossy in this case.
>>
>> My real concern, however, would be in trying to subsequently line up
>> multiple datasets where there were omissions in some higher ranks so
>> that the concatenations were abbreviated. In other words
>>
>> Bivalvia:Mytilidae:Mytilus edulis
>> Mollusca:Mytiloidea:Mytilus: Mytilus edulis
>> Animalia: Mollusca:Mytiloidea:Mytilidae:Mytilus: Mytilus edulis
>>
>> See http://code.google.com/p/gbif-ecat/wiki/Nom5ExampleMytilusedulis
>> for a real world example and, ignoring the other inherent messes,
>> imagine trying to deal with this with no higher rank columns for
>> context and all those nulls removed (no fair keeping the delimiters
>> for them either).
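
A small sketch of the alignment problem, using the three concatenations from the example (with the family spelled Mytilidae). The rank assigned to each name in RECORDS is an assumption added for illustration; the concatenated strings themselves carry no such information.

```python
# The three concatenated classifications from the example.
PATHS = [
    "Bivalvia:Mytilidae:Mytilus edulis",
    "Mollusca:Mytiloidea:Mytilus: Mytilus edulis",
    "Animalia: Mollusca:Mytiloidea:Mytilidae:Mytilus: Mytilus edulis",
]


def split_path(path):
    """Split a delimited classification string into its names."""
    return [part.strip() for part in path.split(":")]


split_paths = [split_path(p) for p in PATHS]
lengths = [len(parts) for parts in split_paths]

# The first position holds a different rank in every record, so the
# columns cannot be lined up by position alone.
first_position = [parts[0] for parts in split_paths]

# The same data with named rank columns (ranks assumed for this sketch).
# Missing ranks are simply absent, and the rest stay comparable.
RECORDS = [
    {"class": "Bivalvia", "family": "Mytilidae"},
    {"phylum": "Mollusca", "superfamily": "Mytiloidea", "genus": "Mytilus"},
    {"kingdom": "Animalia", "phylum": "Mollusca", "superfamily": "Mytiloidea",
     "family": "Mytilidae", "genus": "Mytilus"},
]
phyla = [record.get("phylum") for record in RECORDS]
families = [record.get("family") for record in RECORDS]
```

Here first_position mixes three different ranks, while phyla and families can be compared directly across the datasets, with None marking the omissions.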
>>
>> DR
>>
>>
>>>
>>> On 25/11/2010, at 11:56 AM, Richard Pyle wrote:
>>>
>>>> One golden rule of data management that I often tell people is that
>>>> it's often better to be consistent than correct. That is, something
>>>> that's consistently incorrect can be corrected easily.
>>
>>
>>> Right -- you mean in the sense of Family, Order, Class, etc. Personally,
>>> I think it would be "ideal" to eliminate these individual fields and just
>>> use dwc:higherClassification for this purpose. People with normalised
>>> data can represent it properly via parentNameUsage[ID] -- with the
>>> understanding that all names with a rank lower than genus would include
>>> the genus name as uninomial.
>>
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>