[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad?
Chuck Miller
Chuck.Miller at mobot.org
Wed Nov 24 14:38:48 CET 2010
Markus,
Very good. I was curious though about the difference in sciname formation between the plant and animal kingdoms. Do you have those counts?
Thanks,
Chuck
On Nov 24, 2010, at 6:56 AM, "Markus Döring (GBIF)" <mdoering at gbif.org> wrote:
> Chuck,
> we see all sorts of things you can imagine in scientificName. For occurrence records the vast majority is the canonical form though - with an empty scientificNameAuthorship. I'd think they mostly don't have the authorship information captured in their system.
>
> Some recent statistics I did on the latest 269 million occurrence records for taxonomy can be seen here:
> http://code.google.com/p/gbif-occurrencestore/wiki/TaxonomicIntegration#Statistics
>
> We have roughly 3.5 million distinct scientific names.
> Parsing them into their canonical form leaves only 2.1 million, only a few of them being monomials (95,000 names representing 14.3 million occurrence records).
>
> Not surprisingly zoological names often contain the year while botanical ones often contain the authorship.
> You will find four-part names and multiple authorships in the same name for different parts, e.g. a species authorship and a subspecies one.
>
> Markus
>
>
> On Nov 24, 2010, at 0:16, Chuck Miller wrote:
>
>> Dave,
>> The botanical folks often include the authors with their names. What do
>> the data records coming into GBIF from herbarium collections look like?
>> Do they mostly include or omit the authors in scientificName?
>>
>> Chuck
>>
>> -----Original Message-----
>> From: Tony.Rees at csiro.au [mailto:Tony.Rees at csiro.au]
>> Sent: Tuesday, November 23, 2010 5:09 PM
>> To: Chuck Miller; deepreef at bishopmuseum.org
>> Cc: dremsen at gbif.org; tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>> Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in
>> DwCscientificName: good or bad?
>>
>> Hi all,
>>
>> I just had a quick look at the first few thousand data records coming
>> into OBIS for my region (Australia). Just about every supplier who
>> includes authority as dwc:scientificNameAuthorship has used
>> dwc:scientificName "incorrectly" i.e., for the canonical name not the
>> canonical name + author. This data then flows into GBIF, ALA, etc. and
>> circulates in this form. So "users" are already ignoring the definition
>> of dwc:scientificName in practice, it would seem, with no apparent ill
>> effects (?) - not sure whether this is good or bad, hence the title of
>> my original question which prompted this thread...
>>
>> - Tony
>>
>>
>>> -----Original Message-----
>>> From: Chuck Miller [mailto:Chuck.Miller at mobot.org]
>>> Sent: Wednesday, 24 November 2010 9:56 AM
>>> To: Richard Pyle
>>> Cc: Rees, Tony (CMAR, Hobart); dremsen at gbif.org;
>>> tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>>> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in
>>> DwCscientificName: good or bad?
>>>
>>> Rich,
>>> I gather your reason would be because it's unclear if anyone would
>>> actually use a canonicalName element? That is, it's unneeded. So,
>>> following on, who says they need a dwc:canonicalName element?
>>>
>>> You said you worry about feature creep. I suppose I worry about
>>> semantic creep. Extending the meaning of a term makes it more
>>> universal, but in a data world it increases the variability of the
>>> data that may be found attached to the term in some dataset.
>>> Imprecision in terms can create a lot of data quality headaches. Is
>>> that acceptable?
>>>
>>> Chuck
>>>
>>>
>>>
>>> On Nov 23, 2010, at 3:52 PM, "Richard Pyle"
>>> <deepreef at bishopmuseum.org>
>>> wrote:
>>>
>>>>
>>>>> What is the specific objection to adding canonicalName to DwC as an
>>>>> optional element, other than the fact it makes DwC one thing
>>>>> larger?
>>>>
>>>> I don't have an objection to it per se, but I'd like to feel more
>>>> certain that I understand exactly what it is, and what it is intended
>>>> to achieve, that is not already achievable with existing terms and/or
>>>> couldn't be more achievable with an alternative solution. I think
>>>> there is value in avoiding feature-creep with DwC, except when we
>>>> can solve a real problem with the existing terms. I agree there is a
>>>> problem there, but I'm still struggling to understand exactly what
>>>> specific problem that something like canonicalName will solve.
>>>>
>>>>> There are databases which do not have their names parsed and
>>>>> provide whatever they have recorded as ScientificName. But, there
>>>>> are also databases which do have parsed names and could provide
>>>>> this more narrowly defined element, in addition to the
>>>>> ScientificName. Those databases could make use of a
>>>>> dwc:canonicalName element in their data exchange or query response.
>>>>
>>>> Right -- but the point is this: if the data are already parsed,
>>>> where is the failure of the existing DwC terms in providing the
>>>> desired service? We've already identified one of those: i.e., that
>>>> "intermediate" uninomial ranks not supported by existing DwC terms
>>>> don't have a place to put the canonical form of the name (other than
>>>> scientificName, which isn't currently intended or required to be
>>>> canonical). So yes, that's a clear problem in need of a solution.
>>>> But is a generic canonicalName term really going to solve that
>>>> efficiently/effectively? What other problems might canonicalName
>>>> solve?
>>>>
>>>>> What we don't have and I think never will have is perfectly
>>>>> consistent names data from every database in the world. One reason
>>>>> is a mountain of inconsistently recorded legacy data from decades
>>>>> past that stands in the way of perfection.
>>>>> Another is variation in convention or tradition for a variety of
>>>>> reasons that have been explored in these recent threads.
>>>>> So, I think the pragmatic approach is to accept the inconsistencies
>>>>> and work around them.
>>>>
>>>> Agreed! And my questions are:
>>>>
>>>> 1) What specific problems with existing DwC do we wish to solve?
>>>> 2) How best to solve them?
>>>>
>>>> I'll list two examples for #1:
>>>>
>>>> A) Representing the canonical (sans-authorship) form of a uninomial
>>>> name at a rank not already represented by existing rank-specific DwC
>>>> terms (kingdom, phylum, class, order, family, genus).
>>>> Because the current definition of dwc:scientificName allows
>>>> (optionally) the inclusion of authorship information, there is no
>>>> clean way to represent a uninomial name in a way that expressly
>>>> excludes authorship -- except if the uninomial name happens to be
>>>> represented at the rank of kingdom, phylum, class, order, family, or
>>>> genus.
>>>>
>>>> B) Content providers who have authorship data in a separate field
>>>> from taxon name data, but who have not parsed the bits of a taxon
>>>> name string.
>>>> In this case, the provider cannot provide the parsed bits of the
>>>> name, but can provide a (sort of) canonicalName string separately
>>>> from an authorship string. If they concatenate the authorship string
>>>> with the taxon name string when populating dwc:scientificName, then
>>>> the consumer has no easy way of extracting the name bits from the
>>>> authorship bits (unless the provider also provides
>>>> dwc:scientificNameAuthorship, which could be exactly removed from
>>>> the dwc:scientificName value, yielding what the provider would have
>>>> otherwise provided as canonicalName). Or, as David suggested, in
>>>> this case the Authorship text would not be concatenated with
>>>> scientificName.
>>>>
>>>> I would like to know some other problems that could be solved with
>>>> the addition of a canonicalName term before I start commenting on
>>>> #2.
>>>>
>>>> Aloha,
>>>> Rich
>>>>
>>>>
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
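
Rich's example B above says that when a provider supplies both dwc:scientificName and dwc:scientificNameAuthorship, the authorship string "could be exactly removed" from the name to recover the canonical form. Here is a minimal sketch of that idea in Python. It assumes the authorship appears as an exact trailing suffix of the name. The function name strip_authorship and the sample records are hypothetical illustrations, not part of Darwin Core or of any GBIF tool.

```python
from typing import Optional


def strip_authorship(scientific_name: str, authorship: Optional[str]) -> str:
    """Return scientific_name with a trailing authorship string removed.

    Assumption: the authorship, if present in the name, is an exact suffix.
    If authorship is empty or is not such a suffix, the name is returned
    unchanged apart from whitespace normalisation.
    """
    # Collapse runs of whitespace so the suffix comparison is not thrown
    # off by double spaces or stray leading/trailing blanks.
    name = " ".join(scientific_name.split())
    author = " ".join((authorship or "").split())
    if author and name.endswith(" " + author):
        return name[: -len(author)].rstrip()
    return name


# Hypothetical records illustrating the three situations in the thread.
records = [
    # Authorship concatenated into scientificName and also given separately.
    {"scientificName": "Puma concolor (Linnaeus, 1771)",
     "scientificNameAuthorship": "(Linnaeus, 1771)"},
    # scientificName already canonical, authorship given separately.
    {"scientificName": "Puma concolor",
     "scientificNameAuthorship": "(Linnaeus, 1771)"},
    # Authorship concatenated but not given separately: nothing to remove.
    {"scientificName": "Quercus alba L.",
     "scientificNameAuthorship": ""},
]

for rec in records:
    canonical = strip_authorship(rec["scientificName"],
                                 rec["scientificNameAuthorship"])
    print(f'{rec["scientificName"]!r} -> {canonical!r}')
```

The third record shows the case Rich describes as having "no easy way" out: without a separate authorship value, this approach leaves the name as it was. The sketch also does not attempt names with several authorships for different parts, as Markus mentions; those would need a real name parser.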