[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad?
"Markus Döring (GBIF)"
mdoering at gbif.org
Wed Nov 24 15:36:10 CET 2010
Not immediately at hand I am afraid.
I'll see what I can do and post them later.
Markus
On Nov 24, 2010, at 14:38, Chuck Miller wrote:
> Markus,
> Very good. I was curious though about the difference in sciname formation between the plant and animal kingdoms. Do you have those counts?
>
> Thanks,
> Chuck
>
>
>
> On Nov 24, 2010, at 6:56 AM, "Markus Döring (GBIF)" <mdoering at gbif.org> wrote:
>
>> Chuck,
>> we see all sorts of things you can imagine in scientificName. For occurrence records, though, the vast majority use the canonical form, with an empty scientificNameAuthorship. I'd think most providers just don't have the authorship information captured in their systems.
>>
>> Some recent taxonomy statistics I compiled on the latest 269 million occurrence records can be seen here:
>> http://code.google.com/p/gbif-occurrencestore/wiki/TaxonomicIntegration#Statistics
>>
>> We have roughly 3.5 million distinct scientific names.
>> Parsing them into their canonical form leaves only 2.1 million, only a few of which are monomials (95,000 names representing 14.3 million occurrence records).
>>
>> Not surprisingly, zoological names often contain the year while botanical ones often contain the authorship.
>> You will find four-part names and multiple authorships in the same name for different parts, e.g. a species authorship and a subspecies one.
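To illustrate how many distinct verbatim strings can collapse into far fewer canonical forms, here is a rough Python sketch. The helpers `naive_canonical` and `count_names` are hypothetical, and the token heuristic (keep the capitalised first word plus following lower-case epithets or rank markers, stop at the first capitalised, bracketed, or numeric token) is an assumption for illustration only; it is not the name parser GBIF used for the statistics above, and it is fooled by lower-case author particles such as "de".

```python
import re

# Heuristic: a lower-case epithet or a rank marker such as "subsp." or "var.".
_LOWER_TOKEN = re.compile(r"^[a-z][a-z-]*\.?$")


def naive_canonical(name: str) -> str:
    """Keep the first word plus the lower-case tokens that follow it.

    Stops at the first token that is capitalised, bracketed, or numeric,
    which is where authorship or a year usually begins. Rank markers are
    kept in the output. Not a real name parser.
    """
    tokens = name.split()
    if not tokens:
        return ""
    kept = [tokens[0]]
    for token in tokens[1:]:
        if not _LOWER_TOKEN.match(token):
            break  # authorship, year, or something the heuristic can't handle
        kept.append(token)
    return " ".join(kept)


def count_names(verbatim_names):
    """Return (distinct verbatim strings, distinct canonical forms)."""
    distinct = set(verbatim_names)
    return len(distinct), len({naive_canonical(n) for n in distinct})


if __name__ == "__main__":
    sample = [
        "Puma concolor",
        "Puma concolor (Linnaeus, 1771)",
        "Puma concolor Linnaeus 1771",
        "Abies alba Mill.",
        "Abies alba",
    ]
    print(count_names(sample))  # five verbatim strings, two canonical forms
```

A real parser has to handle hybrids, four-part names, and multiple authorships for different name parts, which this heuristic does not attempt.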
>>
>> Markus
>>
>>
>> On Nov 24, 2010, at 0:16, Chuck Miller wrote:
>>
>>> Dave,
>>> The botanical folks often include the authors with their names. What do
>>> the data records coming into GBIF from herbarium collections look like?
>>> Do they mostly include or omit the authors in scientificName?
>>>
>>> Chuck
>>>
>>> -----Original Message-----
>>> From: Tony.Rees at csiro.au [mailto:Tony.Rees at csiro.au]
>>> Sent: Tuesday, November 23, 2010 5:09 PM
>>> To: Chuck Miller; deepreef at bishopmuseum.org
>>> Cc: dremsen at gbif.org; tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>>> Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in
>>> DwCscientificName: good or bad?
>>>
>>> Hi all,
>>>
>>> I just had a quick look at the first few thousand data records coming
>>> into OBIS for my region (Australia). Just about every supplier who
>>> includes authority as dwc:scientificNameAuthorship has used
>>> dwc:scientificName "incorrectly" i.e., for the canonical name not the
>>> canonical name + author. This data then flows into GBIF, ALA, etc. and
>>> circulates in this form. So "users" are already ignoring the definition
>>> of dwc:scientificName in practice, it would seem, with no apparent ill
>>> effects (?) - not sure whether this is good or bad, hence the title of
>>> my original question which prompted this thread...
>>>
>>> - Tony
>>>
>>>
>>>> -----Original Message-----
>>>> From: Chuck Miller [mailto:Chuck.Miller at mobot.org]
>>>> Sent: Wednesday, 24 November 2010 9:56 AM
>>>> To: Richard Pyle
>>>> Cc: Rees, Tony (CMAR, Hobart); dremsen at gbif.org;
>>>> tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>>>> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in
>>>> DwCscientificName: good or bad?
>>>>
>>>> Rich,
>>>> I gather your reason is that it's unclear whether anyone would
>>>> actually use a canonicalName element? That is, it's unneeded. So,
>>>> following on, who says they need a dwc:canonicalName element?
>>>>
>>>> You said you worry about feature creep. I suppose I worry about
>>>> semantic creep. Extending the meaning of a term makes it more
>>>> universal, but in a data world it increases the variability of the
>>>> data that may be found attached to the term in some dataset.
>>>> Imprecision in terms can create a lot of data quality headaches. Is
>>>> that acceptable?
>>>>
>>>> Chuck
>>>>
>>>>
>>>>
>>>> On Nov 23, 2010, at 3:52 PM, "Richard Pyle"
>>>> <deepreef at bishopmuseum.org>
>>>> wrote:
>>>>
>>>>>
>>>>>> What is the specific objection to adding canonicalName to DwC as an
>>>>>> optional element, other than the fact that it makes DwC one thing
>>>>>> larger?
>>>>>
>>>>> I don't have an objection to it per se, but I'd like to feel more
>>>>> certain that I understand exactly what it is, and what it is intended
>>>>> to achieve that is not already achievable with existing terms and/or
>>>>> couldn't be better achieved with an alternative solution. I think
>>>>> there is value in avoiding feature creep with DwC, except when we
>>>>> can't solve a real problem with the existing terms. I agree there is a
>>>>> problem there, but I'm still struggling to understand exactly what
>>>>> specific problem something like canonicalName will solve.
>>>>>
>>>>>> There are databases which do not have their names parsed and
>>>>>> provide whatever they have recorded as ScientificName. But, there
>>>>>> are also databases which do have parsed names and could provide
>>>>>> this more narrowly defined element, in addition to the
>>>>>> ScientificName. Those databases could make use of a
>>>>>> dwc:canonicalName element in their data exchange or query response.
>>>>>
>>>>> Right -- but the point is this: if the data are already parsed, where
>>>>> is the failure of the existing DwC terms in providing the desired
>>>>> service? We've already identified one of those: i.e., that
>>>>> "intermediate" uninomial ranks not supported by existing DwC terms
>>>>> don't have a place to put the canonical form of the name (other than
>>>>> scientificName, which isn't currently intended or required to be
>>>>> canonical). So yes, that's a clear problem in need of a solution. But
>>>>> is a generic canonicalName term really going to solve that
>>>>> efficiently/effectively? What other problems might canonicalName
>>>>> solve?
>>>>>
>>>>>> What we don't have and I think never will have is perfectly
>>>>>> consistent names data from every database in the world. One reason
>>>>>> is a mountain of inconsistently recorded legacy data from decades
>>>>>> past that stands in the way of perfection.
>>>>>> Another is variation in convention or tradition for a variety of
>>>>>> reasons that have been explored in these recent threads.
>>>>>> So, I think the pragmatic approach is to accept the inconsistencies
>>>>>> and work around them.
>>>>>
>>>>> Agreed! And my questions are:
>>>>>
>>>>> 1) What specific problems with existing DwC do we wish to solve?
>>>>> 2) How best to solve them?
>>>>>
>>>>> I'll list two examples for #1:
>>>>>
>>>>> A) Representing the canonical (sans-authorship) form of a uninomial
>>>>> name at a rank not already represented by existing rank-specific DwC
>>>>> terms (kingdom, phylum, class, order, family, genus). Because the
>>>>> current definition of dwc:scientificName allows (optionally) the
>>>>> inclusion of authorship information, there is no clean way to
>>>>> represent a uninomial name in a way that expressly excludes
>>>>> authorship -- except if the uninomial name happens to be represented
>>>>> at the rank of kingdom, phylum, class, order, family, or genus.
>>>>>
>>>>> B) Content providers who have authorship data in a separate field
>>>>> from taxon name data, but who have not parsed the bits of a taxon
>>>>> name string. In this case, the provider cannot provide the parsed
>>>>> bits of the name, but can provide a (sort of) canonicalName string
>>>>> separately from an authorship string. If they concatenate the
>>>>> authorship string with the taxon name string when populating
>>>>> dwc:scientificName, then the consumer has no easy way of extracting
>>>>> the name bits from the authorship bits (unless the provider also
>>>>> provides dwc:scientificNameAuthorship, which could be exactly removed
>>>>> from the dwc:scientificName value, yielding what the provider would
>>>>> have otherwise provided as canonicalName). Or, as David suggested, in
>>>>> this case the authorship text would not be concatenated with
>>>>> scientificName.
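The exact-removal idea in case B can be sketched in Python. The function name `canonical_from_dwc` is hypothetical, and the sketch assumes the provider built dwc:scientificName by appending the dwc:scientificNameAuthorship value after whitespace; when that assumption does not hold, the string is returned unchanged rather than guessed at.

```python
def canonical_from_dwc(scientific_name: str, authorship: str) -> str:
    """Remove a trailing authorship string from a scientificName value.

    Assumes scientificName == "<name> <authorship>". The suffix is only
    removed when it matches exactly and is separated from the name by
    whitespace; otherwise the (whitespace-trimmed) input is returned.
    """
    name = scientific_name.strip()
    author = authorship.strip()
    if author and name.endswith(author):
        head = name[: -len(author)]
        # Guard against partial-word matches such as "ill." inside "Mill.".
        if head.endswith((" ", "\t")):
            return head.rstrip()
    return name


if __name__ == "__main__":
    print(canonical_from_dwc("Puma concolor (Linnaeus, 1771)", "(Linnaeus, 1771)"))
    print(canonical_from_dwc("Abies alba", "Mill."))  # already canonical
```

Returning the input untouched when the suffix does not match keeps the function safe to run over mixed data, where some providers concatenate authorship and others (as David suggested) do not.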
>>>>>
>>>>> I would like to know some other problems that could be solved with
>>>>> the addition of a canonicalName term before I start commenting on #2.
>>>>>
>>>>> Aloha,
>>>>> Rich
>>>>>
>>>>>
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>