[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad?

David Remsen (GBIF) dremsen at gbif.org
Wed Nov 24 13:29:44 CET 2010


What you said. Yeah, that's what I meant.

On Nov 24, 2010, at 1:20 PM, Markus Döring wrote:

>>> B) ...If they concatenate the authorship string with the taxon name
>>> string when populating dwc:scientificName, then the consumer has no
>>> easy way of extracting the name bits from the authorship bits
>>
>> Exactly! Wearing my data consumer hat, the first thing I need to do  
>> with current dwc:scientificName content from multiple sources is try
>> to generate canonical names by stripping off what appear to be  
>> authorities (hopefully successfully but not guaranteed). If there  
>> was an extra field populated in all or even a subset of cases, this  
>> task would not be required.
>>
>> So, I think the main driver for this has to be the large-scale
>> data consumers - GBIF, OBIS (with which I am associated),
>> EOL, ALA etc. If they would find such a field useful, that is the
>> real test. In my other incarnation as a data supplier, I can
>> concatenate everything into scientificName as per the present DwC
>> spec, no problem; it is just a lossy export by the time it is
>> received, as far as I am concerned.
>
> From GBIF's point of view there is no problem at all with using the
> full scientific name as it is.
> In fact my preferred solution would be to only have to look into
> scientificName and nowhere else! Fewer options are better.
>
> Also nearly all datasets have a mix of canonical and "qualified"  
> scientific names, so I am sure they will find it hard to populate  
> canonicalName only with canonicals and scientificName only with  
> names with authorship. I bet in the end we would still have to check
> for all options, dealing with canonicals in scientificName and
> potential inconsistencies between canonicalName +
> authorship and scientificName. It would also be harder to define a
> single required term. If I supply the canonicalName already, do I
> still have to populate the scientificName? Even if I only have the
> canonical? If I have a non-parsed full name, how will I be able to
> fill the canonical? From my point of view it's not getting any easier.
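
The checks Markus lists could be sketched as follows. This is only an
illustration: canonicalName is the term proposed in this thread, not an
existing DwC term, and check_name_fields and its return labels are
hypothetical names made up for the example.

```python
def _norm(value):
    """Collapse runs of whitespace and treat None as an empty string."""
    return " ".join((value or "").split())


def check_name_fields(record):
    """Classify how the name fields of one record relate to each other.

    `record` is a plain dict keyed by term name. "canonicalName" is the
    proposed (hypothetical) term discussed in the thread.
    """
    sci = _norm(record.get("scientificName"))
    canon = _norm(record.get("canonicalName"))
    auth = _norm(record.get("scientificNameAuthorship"))

    if not sci and not canon:
        return "missing"
    if not canon:
        return "scientificName only"
    if not sci:
        return "canonicalName only"
    if sci == canon:
        # The provider put the bare canonical name into scientificName.
        return "canonical in scientificName"
    if auth and sci == f"{canon} {auth}":
        return "consistent"
    return "inconsistent"
```

A consumer would still need a branch for every one of these outcomes,
which is the point being made above.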
>
> Markus
>
>
>
>
>>> -----Original Message-----
>>> From: Richard Pyle [mailto:deepreef at bishopmuseum.org]
>>> Sent: Wednesday, 24 November 2010 8:53 AM
>>> To: 'Chuck Miller'; Rees, Tony (CMAR, Hobart); dremsen at gbif.org
>>> Cc: tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>>> Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in
>>> DwCscientificName: good or bad?
>>>
>>>
>>>> What is the specific objection to adding canonicalName to DwC
>>>> as an optional element, other than the fact it makes DwC one
>>>> thing larger?
>>>
>>> I don't have an objection to it per se, but I'd like to feel more
>>> certain that I understand exactly what it is, and what it is intended
>>> to achieve, that is not already achievable with existing terms and/or
>>> couldn't be more achievable with an alternative solution. I think
>>> there is value in avoiding feature-creep with DwC, except when we can
>>> solve a real problem with the existing terms. I agree there is a
>>> problem there, but I'm still struggling to understand exactly what
>>> specific problem that something like canonicalName will solve.
>>>
>>>> There are databases which do not have their names parsed and
>>>> provide whatever they have recorded as ScientificName.  But,
>>>> there are also databases which do have parsed names and could
>>>> provide this more narrowly defined element, in addition to
>>>> the ScientificName.  Those databases could make use of a
>>>> dwc:canonicalName element in their data exchange or query response.
>>>
>>> Right -- but the point is this: if the data are already parsed, where
>>> is the failure of the existing DwC terms in providing the desired
>>> service?  We've already identified one of those: i.e., that
>>> "intermediate" uninomial ranks not supported by existing DwC terms
>>> don't have a place to put the canonical form of the name (other than
>>> scientificName, which isn't currently intended or required to be
>>> canonical). So yes, that's a clear problem in need of a solution. But
>>> is a generic canonicalName term really going to solve that
>>> efficiently/effectively? What other problems might canonicalName
>>> solve?
>>>
>>>> What we don't have and I think never will have is perfectly
>>>> consistent names data from every database in the world.  One
>>>> reason is a mountain of inconsistently recorded legacy data
>>>> from decades past that stands in the way of perfection.
>>>> Another is variation in convention or tradition for a variety
>>>> of reasons that have been explored in these recent threads.
>>>> So, I think the pragmatic approach is to accept the
>>>> inconsistencies and work around them.
>>>
>>> Agreed!  And my questions are:
>>>
>>> 1) What specific problems with existing DwC do we wish to solve?
>>> 2) How best to solve them?
>>>
>>> I'll list two examples for #1:
>>>
>>> A) Representing the canonical (sans-authorship) form of a uninomial
>>> name at a rank not already represented by existing rank-specific DwC
>>> terms (kingdom, phylum, class, order, family, genus)
>>> Because the current definition of dwc:scientificName allows
>>> (optionally) the inclusion of authorship information, there is no
>>> clean way to represent a uninomial name in a way that expressly
>>> excludes authorship -- except if the uninomial name happens to be
>>> represented at the rank of kingdom, phylum, class, order, family, or
>>> genus.
>>>
>>> B) Content providers who have authorship data in a separate field from
>>> taxon name data, but who have not parsed the bits of a taxon name
>>> string
>>> In this case, the provider cannot provide the parsed bits of the name,
>>> but can provide a (sort of) canonicalName string separately from an
>>> authorship string.  If they concatenate the authorship string with the
>>> taxon name string when populating dwc:scientificName, then the
>>> consumer has no easy way of extracting the name bits from the
>>> authorship bits (unless the provider also provides
>>> dwc:scientificNameAuthorship, which could be exactly removed from the
>>> dwc:scientificName value, yielding what the provider would have
>>> otherwise provided as canonicalName). Or, as David suggested, in this
>>> case the Authorship text would not be concatenated with
>>> scientificName.
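
The "exact removal" mentioned in case B can be sketched as below. This is
a minimal illustration under stated assumptions, not a name parser: it
assumes the authorship string appears verbatim at the end of
dwc:scientificName, and strip_authorship is a hypothetical helper name.

```python
def strip_authorship(scientific_name, authorship):
    """Return scientific_name without its trailing authorship string.

    Returns None when the authorship cannot be removed exactly, so the
    caller knows it has to fall back to some other strategy.
    """
    name = (scientific_name or "").strip()
    author = (authorship or "").strip()
    if not author:
        # No authorship supplied: we cannot tell whether the name is
        # already canonical, so we do not guess.
        return None
    if name.endswith(author) and len(name) > len(author):
        return name[: -len(author)].rstrip()
    return None


# Example usage with the two fields supplied side by side.
print(strip_authorship("Puma concolor (Linnaeus, 1771)", "(Linnaeus, 1771)"))
```

Names where the authorship sits in the middle of the string (for
instance before an infraspecific epithet) are not handled and yield
None, which is where a real parser would still be needed.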
>>>
>>> I would like to know some other problems that could be solved with the
>>> addition of a canonicalName term before I start commenting on #2.
>>>
>>> Aloha,
>>> Rich
>>>
>>
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
>
