[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad?

Chuck Miller Chuck.Miller at mobot.org
Wed Nov 24 14:38:48 CET 2010


Markus,
Very good. I was curious, though, about the difference in scientificName formation between the plant and animal kingdoms. Do you have those counts?

Thanks,
Chuck



On Nov 24, 2010, at 6:56 AM, "Markus Döring (GBIF)" <mdoering at gbif.org> wrote:

> Chuck,
> we see every sort of thing you can imagine in scientificName. For occurrence records, though, the vast majority is the canonical form, with an empty scientificNameAuthorship. I'd guess most providers simply don't have the authorship information captured in their systems.
> 
> Some recent taxonomic statistics I compiled on the latest 269 million occurrence records can be seen here:
> http://code.google.com/p/gbif-occurrencestore/wiki/TaxonomicIntegration#Statistics
> 
> We have roughly 3.5 million distinct scientific names.
> Parsing them into their canonical form leaves only 2.1 million, only a few of which are monomials (95,000 names representing 14.3 million occurrence records).
> 
> Not surprisingly, zoological names often contain the year, while botanical ones often contain the authorship.
> You will find four-part names, and multiple authorships in the same name for different parts, e.g. a species authorship and a subspecies one.
> 
> Markus
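Markus's point that collapsing distinct verbatim strings into canonical forms shrinks the name count can be illustrated with a minimal Python sketch. The function names naive_canonical and count_distinct are hypothetical, and the heuristic is deliberately simplistic: it is not GBIF's name parser, and it will mishandle hybrids, lowercase author particles such as "de" or "van", and many other real-world cases. It does show the two patterns Markus mentions: trailing author/year blocks, and a second authorship before an infraspecific part.

```python
import re

# Hypothetical, deliberately naive heuristic; NOT the GBIF name parser.
# A token starting with a capital letter, "(", "[" or a digit is treated
# as the start of an authorship/year block.
_AUTHOR_START = re.compile(r"[A-Z(\[0-9]")
# Rank markers that may follow an authorship block and introduce another epithet.
# "f." is left out because it is ambiguous with "filius" in author strings.
_RANK_MARKERS = {"subsp.", "ssp.", "var.", "subvar.", "forma"}


def naive_canonical(name: str) -> str:
    """Drop author/year tokens, keeping genus, epithets and rank markers."""
    tokens = name.split()
    if not tokens:
        return ""
    kept = [tokens[0]]  # genus or uninomial
    in_authorship = False
    for tok in tokens[1:]:
        if tok in _RANK_MARKERS:
            in_authorship = False  # a new (infraspecific) part of the name begins
            kept.append(tok)
        elif in_authorship:
            continue  # skip the rest of an author/year block
        elif _AUTHOR_START.match(tok):
            in_authorship = True  # first token of an author/year block
        else:
            kept.append(tok)  # lowercase epithet
    return " ".join(kept)


def count_distinct(names):
    """Return (distinct verbatim strings, distinct naive canonical forms)."""
    verbatim = set(names)
    return len(verbatim), len({naive_canonical(n) for n in verbatim})
```

On real data the gap between the two counts depends heavily on the quality of the parser; this sketch only demonstrates the mechanism.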
> 
> 
> On Nov 24, 2010, at 0:16, Chuck Miller wrote:
> 
>> Dave,
>> The botanical folks often include the authors with their names. What do
>> the data records coming into GBIF from herbarium collections look like?
>> Do they mostly include or omit the authors in scientificName? 
>> 
>> Chuck
>> 
>> -----Original Message-----
>> From: Tony.Rees at csiro.au [mailto:Tony.Rees at csiro.au] 
>> Sent: Tuesday, November 23, 2010 5:09 PM
>> To: Chuck Miller; deepreef at bishopmuseum.org
>> Cc: dremsen at gbif.org; tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>> Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in
>> DwCscientificName: good or bad?
>> 
>> Hi all,
>> 
>> I just had a quick look at the first few thousand data records coming
>> into OBIS for my region (Australia). Just about every supplier who
>> includes authority as dwc:scientificNameAuthorship has used
>> dwc:scientificName "incorrectly" i.e., for the canonical name not the
>> canonical name + author. This data then flows into GBIF, ALA, etc. and
>> circulates in this form. So "users" are already ignoring the definition
>> of dwc:scientificName in practice, it would seem, with no apparent ill
>> effects (?) - not sure whether this is good or bad, hence the title of
>> my original question which prompted this thread...
>> 
>> - Tony
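One way to check Tony's observation against a batch of records is to tally how each record splits the name and the authorship. This is a minimal sketch, assuming records are Python dicts keyed by Darwin Core term names; the category labels and the functions authorship_usage and tally_authorship_usage are hypothetical, and the check only detects authorship appended at the very end of scientificName (authorship placed before an infraspecific epithet is not recognised).

```python
from collections import Counter


def authorship_usage(record: dict) -> str:
    """Classify one record (hypothetical category labels)."""
    name = (record.get("scientificName") or "").strip()
    author = (record.get("scientificNameAuthorship") or "").strip()
    if not author:
        return "no scientificNameAuthorship"
    if name.endswith(" " + author):
        # Full name with authorship, as the dwc:scientificName definition describes.
        return "authorship included in scientificName"
    # The pattern Tony describes: canonical name only, authorship kept separately.
    return "canonical scientificName, separate authorship"


def tally_authorship_usage(records) -> Counter:
    """Count how many records fall into each category."""
    return Counter(authorship_usage(r) for r in records)
```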
>> 
>> 
>>> -----Original Message-----
>>> From: Chuck Miller [mailto:Chuck.Miller at mobot.org]
>>> Sent: Wednesday, 24 November 2010 9:56 AM
>>> To: Richard Pyle
>>> Cc: Rees, Tony (CMAR, Hobart); dremsen at gbif.org;
>>> tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>>> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in
>>> DwCscientificName: good or bad?
>>> 
>>> Rich,
>>> I gather your reason is that it's unclear whether anyone would
>>> actually use a canonicalName element? That is, it's unneeded. So,
>>> following on, who says they need a dwc:canonicalName element?
>>> 
>>> You said you worry about feature creep. I suppose I worry about 
>>> semantic creep. Extending the meaning of a term makes it more 
>>> universal, but in a data world it increases the variability of the 
>>> data that may be found attached to the term in some dataset. 
>>> Imprecision in terms can create a lot of data quality headaches. Is
>>> that acceptable?
>>> 
>>> Chuck
>>> 
>>> 
>>> 
>>> On Nov 23, 2010, at 3:52 PM, "Richard Pyle" <deepreef at bishopmuseum.org> wrote:
>>> 
>>>> 
>>>>> What is the specific objection to adding canonicalName to DwC as an
>>>>> optional element, other than the fact it makes DwC one thing
>>>>> larger?
>>>> 
>>>> I don't have an objection to it per se, but I'd like to feel more
>>>> certain that I understand exactly what it is, and what it is intended
>>>> to achieve that is not already achievable with existing terms and/or
>>>> couldn't be better achieved with an alternative solution. I think there
>>>> is value in avoiding feature creep with DwC, except when we cannot
>>>> solve a real problem with the existing terms. I agree there is a
>>>> problem there, but I'm still struggling to understand exactly what
>>>> specific problem something like canonicalName will solve.
>>>> 
>>>>> There are databases which do not have their names parsed and 
>>>>> provide whatever they have recorded as ScientificName.  But, there 
>>>>> are also databases which do have parsed names and could provide 
>>>>> this more narrowly defined element, in addition to the 
>>>>> ScientificName.  Those databases could make use of a 
>>>>> dwc:canonicalName element in their data exchange or query response.
>>>> 
>>>> Right -- but the point is this: if the data are already parsed, where
>>>> is the failure of the existing DwC terms in providing the desired
>>>> service? We've already identified one such failure: "intermediate"
>>>> uninomial ranks not supported by existing DwC terms don't have a place
>>>> to put the canonical form of the name (other than scientificName, which
>>>> isn't currently intended or required to be canonical). So yes, that's
>>>> a clear problem in need of a solution. But is a generic canonicalName
>>>> term really going to solve that efficiently and effectively? What other
>>>> problems might canonicalName solve?
>>>> 
>>>>> What we don't have and I think never will have is perfectly
>>>>> consistent names data from every database in the world.  One reason
>>>>> is a mountain of inconsistently recorded legacy data from decades
>>>>> past that stands in the way of perfection.
>>>>> Another is variation in convention or tradition for a variety of
>>>>> reasons that have been explored in these recent threads.
>>>>> So, I think the pragmatic approach is to accept the inconsistencies
>>>>> and work around them.
>>>> 
>>>> Agreed!  And my questions are:
>>>> 
>>>> 1) What specific problems with existing DwC do we wish to solve?
>>>> 2) How best to solve them?
>>>> 
>>>> I'll list two examples for #1:
>>>> 
>>>> A) Representing the canonical (sans-authorship) form of a uninomial
>>>> name at a rank not already represented by existing rank-specific DwC
>>>> terms (kingdom, phylum, class, order, family, genus). Because the
>>>> current definition of dwc:scientificName allows (optionally) the
>>>> inclusion of authorship information, there is no clean way to
>>>> represent a uninomial name in a way that expressly excludes authorship,
>>>> except if the uninomial name happens to be represented at the rank of
>>>> kingdom, phylum, class, order, family, or genus.
>>>> 
>>>> B) Content providers who have authorship data in a separate field from
>>>> taxon name data, but who have not parsed the bits of a taxon name
>>>> string. In this case, the provider cannot provide the parsed bits of
>>>> the name, but can provide a (sort of) canonicalName string separately
>>>> from an authorship string. If they concatenate the authorship string
>>>> with the taxon name string when populating dwc:scientificName, then
>>>> the consumer has no easy way of extracting the name bits from the
>>>> authorship bits (unless the provider also provides
>>>> dwc:scientificNameAuthorship, which could be exactly removed from the
>>>> dwc:scientificName value, yielding what the provider would otherwise
>>>> have provided as canonicalName). Or, as David suggested, in this case
>>>> the authorship text would not be concatenated with scientificName.
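Rich's two examples can be made concrete with a minimal Python sketch, assuming records are dicts keyed by Darwin Core term names (taxonRank, scientificName, and the rank-specific terms). Both helper names are hypothetical. For example A, authorship_free_uninomial can only guarantee an authorship-free uninomial when taxonRank is one of the six ranks with a dedicated term; for anything else (a subfamily, say) it returns None, which is exactly the gap he describes. For example B, derive_canonical removes scientificNameAuthorship from the end of scientificName only when it is an exact trailing match, and returns None otherwise (for instance when scientificName is already canonical, or when the authorship sits before an infraspecific epithet).

```python
from typing import Optional

# Rank-specific Darwin Core terms listed in example A.
RANK_TERMS = ("kingdom", "phylum", "class", "order", "family", "genus")


def authorship_free_uninomial(record: dict) -> Optional[str]:
    """Example A: return an authorship-free uninomial only if a rank term holds it."""
    rank = (record.get("taxonRank") or "").strip().lower()
    if rank in RANK_TERMS:
        return record.get(rank)
    # Other ranks (subfamily, tribe, ...) only have scientificName, which may
    # or may not include authorship, so nothing can be guaranteed.
    return None


def derive_canonical(scientific_name: str, authorship: str) -> Optional[str]:
    """Example B: strip an exactly matching trailing authorship string."""
    name = scientific_name.strip()
    author = authorship.strip()
    if not author:
        return None  # no authorship supplied, so nothing can be removed
    suffix = " " + author
    if name.endswith(suffix):
        return name[: -len(suffix)].rstrip()
    return None  # not an exact trailing match: cannot derive safely
```

The exact-match requirement is a deliberate choice: any fuzzier matching would reintroduce the guesswork that a separate authorship field is meant to avoid.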
>>>> 
>>>> I would like to know some other problems that could be solved with
>>>> the addition of a canonicalName term before I start commenting on #2.
>>>> 
>>>> Aloha,
>>>> Rich
>>>> 
>>>> 
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
> 

