[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad?

"Markus Döring (GBIF)" mdoering at gbif.org
Wed Nov 24 15:36:10 CET 2010


Not immediately at hand, I am afraid.
I'll see what I can do and post them later.

Markus


On Nov 24, 2010, at 14:38, Chuck Miller wrote:

> Markus,
> Very good. I was curious though about the difference in sciname formation between the plant and animal kingdoms. Do you have those counts?
> 
> Thanks,
> Chuck
> 
> 
> 
> On Nov 24, 2010, at 6:56 AM, "Markus Döring (GBIF)" <mdoering at gbif.org> wrote:
> 
>> Chuck,
>> we see all sorts of things you can imagine in scientificName. For occurrence records the vast majority is the canonical form though - with an empty scientificNameAuthorship. I'd think they mostly don't have the authorship information captured in their systems.
>> 
>> Some recent statistics I did on the latest 269 million occurrence records for taxonomy can be seen here:
>> http://code.google.com/p/gbif-occurrencestore/wiki/TaxonomicIntegration#Statistics
>> 
>> We have roughly 3.5 million distinct scientific names. 
>> Parsing them into their canonical form leaves only 2.1 million, only a few of them being monomials (95,000 names representing 14.3 million occurrence records).
>> 
>> Not surprisingly, zoological names often contain the year while botanical ones often contain the authorship.
>> You will find four-part names and multiple authorships in the same name for different parts, e.g. a species authorship and a subspecies one.
>> 
>> Markus
>> 
>> 
>> On Nov 24, 2010, at 0:16, Chuck Miller wrote:
>> 
>>> Dave,
>>> The botanical folks often include the authors with their names. What do
>>> the data records coming into GBIF from herbarium collections look like?
>>> Do they mostly include or omit the authors in scientificName? 
>>> 
>>> Chuck
>>> 
>>> -----Original Message-----
>>> From: Tony.Rees at csiro.au [mailto:Tony.Rees at csiro.au] 
>>> Sent: Tuesday, November 23, 2010 5:09 PM
>>> To: Chuck Miller; deepreef at bishopmuseum.org
>>> Cc: dremsen at gbif.org; tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>>> Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in
>>> DwCscientificName: good or bad?
>>> 
>>> Hi all,
>>> 
>>> I just had a quick look at the first few thousand data records coming
>>> into OBIS for my region (Australia). Just about every supplier who
>>> includes authority as dwc:scientificNameAuthorship has used
>>> dwc:scientificName "incorrectly" i.e., for the canonical name not the
>>> canonical name + author. This data then flows into GBIF, ALA, etc. and
>>> circulates in this form. So "users" are already ignoring the definition
>>> of dwc:scientificName in practice, it would seem, with no apparent ill
>>> effects (?) - not sure whether this is good or bad, hence the title of
>>> my original question which prompted this thread...
>>> 
>>> - Tony
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Chuck Miller [mailto:Chuck.Miller at mobot.org]
>>>> Sent: Wednesday, 24 November 2010 9:56 AM
>>>> To: Richard Pyle
>>>> Cc: Rees, Tony (CMAR, Hobart); dremsen at gbif.org;
>>>> tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>>>> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in
>>>> DwCscientificName: good or bad?
>>>> 
>>>> Rich,
>>>> I gather your reason would be because it's unclear if anyone would 
>>>> actually use a canonicalName element? That is, it's unneeded. So, 
>>>> following on, who says they need a dwc:canonicalName element?
>>>> 
>>>> You said you worry about feature creep. I suppose I worry about 
>>>> semantic creep. Extending the meaning of a term makes it more 
>>>> universal, but in a data world it increases the variability of the 
>>>> data that may be found attached to the term in some dataset. 
>>>> Imprecision in terms can create a lot of data quality headaches. Is
>>>> that acceptable?
>>>> 
>>>> Chuck
>>>> 
>>>> 
>>>> 
>>>> On Nov 23, 2010, at 3:52 PM, "Richard Pyle" <deepreef at bishopmuseum.org> wrote:
>>>> 
>>>>> 
>>>>>> What is the specific objection to adding canonicalName to DwC as an
>>>>>> optional element, other than the fact it makes DwC one thing
>>>>>> larger?
>>>>> 
>>>>> I don't have an objection to it per se, but I'd like to feel more
>>>>> certain that I understand exactly what it is, and what it is intended
>>>>> to achieve, that is not already achievable with existing terms and/or
>>>>> couldn't be more achievable with an alternative solution. I think
>>>>> there is value in avoiding feature-creep with DwC, except when we can
>>>>> solve a real problem with the existing terms. I agree there is a
>>>>> problem there, but I'm still struggling to understand exactly what
>>>>> specific problem that something like canonicalName will solve.
>>>>> 
>>>>>> There are databases which do not have their names parsed and 
>>>>>> provide whatever they have recorded as ScientificName.  But, there 
>>>>>> are also databases which do have parsed names and could provide 
>>>>>> this more narrowly defined element, in addition to the 
>>>>>> ScientificName.  Those databases could make use of a 
>>>>>> dwc:canonicalName element in their data exchange or query response.
>>>>> 
>>>>> Right -- but the point is this: if the data are already parsed,
>>>>> where is the failure of the existing DwC terms in providing the
>>>>> desired service? We've already identified one of those: i.e., that
>>>>> "intermediate" uninomial ranks not supported by existing DwC terms
>>>>> don't have a place to put the canonical form of the name (other than
>>>>> scientificName, which isn't currently intended or required to be
>>>>> canonical). So yes, that's a clear problem in need of a solution.
>>>>> But is a generic canonicalName term really going to solve that
>>>>> efficiently/effectively? What other problems might canonicalName
>>>>> solve?
>>>>> 
>>>>>> What we don't have and I think never will have is perfectly
>>>>>> consistent names data from every database in the world.  One reason
>>>>>> is a mountain of inconsistently recorded legacy data from decades
>>>>>> past that stands in the way of perfection.
>>>>>> Another is variation in convention or tradition for a variety of
>>>>>> reasons that have been explored in these recent threads.
>>>>>> So, I think the pragmatic approach is to accept the inconsistencies
>>>>>> and work around them.
>>>>> 
>>>>> Agreed!  And my questions are:
>>>>> 
>>>>> 1) What specific problems with existing DwC do we wish to solve?
>>>>> 2) How best to solve them?
>>>>> 
>>>>> I'll list two examples for #1:
>>>>> 
>>>>> A) Representing the canonical (sans-authorship) form of a uninomial
>>>>> name at a rank not already represented by existing rank-specific DwC
>>>>> terms (kingdom, phylum, class, order, family, genus).
>>>>> Because the current definition of dwc:scientificName allows
>>>>> (optionally) the inclusion of authorship information, there is no
>>>>> clean way to represent a uninomial name in a way that expressly
>>>>> excludes authorship -- except if the uninomial name happens to be
>>>>> represented at the rank of kingdom, phylum, class, order, family, or
>>>>> genus.
>>>>> 
>>>>> B) Content providers who have authorship data in a separate field
>>>>> from taxon name data, but who have not parsed the bits of a taxon
>>>>> name string.
>>>>> In this case, the provider cannot provide the parsed bits of the
>>>>> name, but can provide a (sort of) canonicalName string separately
>>>>> from an authorship string.  If they concatenate the authorship string
>>>>> with the taxon name string when populating dwc:scientificName, then
>>>>> the consumer has no easy way of extracting the name bits from the
>>>>> authorship bits (unless the provider also provides
>>>>> dwc:scientificNameAuthorship, which could be exactly removed from the
>>>>> dwc:scientificName value, yielding what the provider would have
>>>>> otherwise provided as canonicalName). Or, as David suggested, in this
>>>>> case the Authorship text would not be concatenated with
>>>>> scientificName.
>>>>> 
>>>>> I would like to know some other problems that could be solved with
>>>>> the addition of a canonicalName term before I start commenting on
>>>>> #2.
>>>>> 
>>>>> Aloha,
>>>>> Rich
>>>>> 
>>>>> 
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>> 
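
Rich's point B above mentions that, when a provider supplies both dwc:scientificName and dwc:scientificNameAuthorship, the authorship could be "exactly removed" from the name string. A minimal sketch of that idea follows. The function name strip_authorship is hypothetical and not part of Darwin Core or any GBIF tool, and the sketch assumes the authorship appears verbatim at the end of the name. It would not handle the names Markus describes that carry separate authorships for different parts.

```python
from typing import Optional


def strip_authorship(scientific_name: str, authorship: Optional[str]) -> str:
    """Remove a trailing authorship string from a scientific name.

    Hypothetical helper: only an exact, whitespace-separated suffix
    is removed. Otherwise the trimmed name is returned unchanged.
    """
    name = scientific_name.strip()
    author = (authorship or "").strip()
    if author and name.endswith(author):
        head = name[: len(name) - len(author)]
        # Require whitespace before the authorship so that a short
        # author string cannot match the tail of an epithet.
        if head and head[-1].isspace():
            return head.rstrip()
    return name


if __name__ == "__main__":
    print(strip_authorship("Quercus alba L.", "L."))
    print(strip_authorship("Puma concolor (Linnaeus, 1771)", "(Linnaeus, 1771)"))
```

When the authorship field is empty, as Markus reports for most occurrence records, the name is passed through untouched, so the consumer still cannot tell whether it was canonical to begin with.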


