[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad?
David Remsen (GBIF)
dremsen at gbif.org
Wed Nov 24 13:29:44 CET 2010
what you said. yeah, that's what I meant.
On Nov 24, 2010, at 1:20 PM, Markus Döring wrote:
>>> B) ...If they concatenate the authorship string with the taxon name
>>> string when populating dwc:scientificName, then the consumer has
>>> no easy way of extracting the name bits from the authorship bits
>>
>> Exactly! Wearing my data consumer hat, the first thing I need to do
>> with current dwc:scientificName content from multiple sources is try
>> to generate canonical names by stripping off what appear to be
>> authorities (hopefully successfully but not guaranteed). If there
>> was an extra field populated in all or even a subset of cases, this
>> task would not be required.
>>
>> So, I think the main driver for this has to be from the large
>> scale data consumers - GBIF, OBIS (with which I am associated),
>> EOL, ALA etc. - if they would find such a field useful that is the
>> real test. In my other incarnation as a data supplier, I can
>> concatenate everything into scientificname as per the present DwC
>> spec, no problem, it just is a lossy export when it is received as
>> far as I am concerned.
>
> From GBIF's point of view there is no problem at all with using the
> full scientific name as it is.
> In fact my preferred solution would be to only have to look into
> scientificName and nowhere else! Fewer options are better.
>
> Also nearly all datasets have a mix of canonical and "qualified"
> scientific names, so I am sure they will find it hard to populate
> canonicalName only with canonicals and scientificName only with
> names with authorship. I bet finally we would still have to check
> for all options, dealing with canonicals in scientificName,
> potentially having inconsistencies between canonicalName +
> authorship and scientificName. It would also be harder to define a
> single required term. If I supply the canonicalName already, do I
> still have to populate the scientificName? Even if I only have the
> canonical? If I have an unparsed full name, how will I be able to
> fill the canonical? From my point of view it's not getting any easier.
>
> Markus
>
>
>
>
>>> -----Original Message-----
>>> From: Richard Pyle [mailto:deepreef at bishopmuseum.org]
>>> Sent: Wednesday, 24 November 2010 8:53 AM
>>> To: 'Chuck Miller'; Rees, Tony (CMAR, Hobart); dremsen at gbif.org
>>> Cc: tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>>> Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in
>>> DwCscientificName: good or bad?
>>>
>>>
>>>> What is the specific objection to adding canonicalName to DwC
>>>> as an optional element, other than the fact it makes DwC one
>>>> thing larger?
>>>
>>> I don't have an objection to it per se, but I'd like to feel more
>>> certain that I understand exactly what it is, and what it is
>>> intended to achieve, that is not already achievable with existing
>>> terms and/or couldn't be more achievable with an alternative
>>> solution. I think there is value in avoiding feature-creep with
>>> DwC, except when we can solve a real problem with the existing
>>> terms. I agree there is a problem there, but I'm still struggling
>>> to understand exactly what specific problem that something like
>>> canonicalName will solve.
>>>
>>>> There are databases which do not have their names parsed and
>>>> provide whatever they have recorded as ScientificName. But,
>>>> there are also databases which do have parsed names and could
>>>> provide this more narrowly defined element, in addition to
>>>> the ScientificName. Those databases could make use of a
>>>> dwc:canonicalName element in their data exchange or query response.
>>>
>>> Right -- but the point is this: if the data are already parsed,
>>> where is the failure of the existing DwC terms in providing the
>>> desired service? We've already identified one of those: i.e., that
>>> "intermediate" uninomial ranks not supported by existing DwC terms
>>> don't have a place to put the canonical form of the name (other
>>> than scientificName, which isn't currently intended or required to
>>> be canonical). So yes, that's a clear problem in need of a
>>> solution. But is a generic canonicalName term really going to
>>> solve that efficiently/effectively? What other problems might
>>> canonicalName solve?
>>>
>>>> What we don't have and I think never will have is perfectly
>>>> consistent names data from every database in the world. One
>>>> reason is a mountain of inconsistently recorded legacy data
>>>> from decades past that stands in the way of perfection.
>>>> Another is variation in convention or tradition for a variety
>>>> of reasons that have been explored in these recent threads.
>>>> So, I think the pragmatic approach is to accept the
>>>> inconsistencies and work around them.
>>>
>>> Agreed! And my questions are:
>>>
>>> 1) What specific problems with existing DwC do we wish to solve?
>>> 2) How best to solve them?
>>>
>>> I'll list two examples for #1:
>>>
>>> A) Representing the canonical (sans-authorship) form of a
>>> uninomial name at a rank not already represented by existing
>>> rank-specific DwC terms (kingdom, phylum, class, order, family,
>>> genus)
>>> Because the current definition of dwc:scientificName allows
>>> (optionally) the inclusion of authorship information, there is no
>>> clean way to represent a uninomial name in a way that expressly
>>> excludes authorship -- except if the uninomial name happens to be
>>> represented at the rank of kingdom, phylum, class, order, family,
>>> or genus.
>>>
>>> B) Content providers who have authorship data in a separate field
>>> from taxon name data, but who have not parsed the bits of a taxon
>>> name string
>>> In this case, the provider cannot provide the parsed bits of the
>>> name, but can provide a (sort of) canonicalName string separately
>>> from an authorship string. If they concatenate the authorship
>>> string with the taxon name string when populating
>>> dwc:scientificName, then the consumer has no easy way of
>>> extracting the name bits from the authorship bits (unless the
>>> provider also provides dwc:scientificNameAuthorship, which could
>>> be exactly removed from the dwc:scientificName value, yielding
>>> what the provider would have otherwise provided as canonicalName).
>>> Or, as David suggested, in this case the Authorship text would not
>>> be concatenated with scientificName.
>>>
>>> I would like to know some other problems that could be solved
>>> with the addition of a canonicalName term before I start
>>> commenting on #2.
>>>
>>> Aloha,
>>> Rich
>>>
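Rich's case B above notes that when a provider supplies
dwc:scientificNameAuthorship alongside a concatenated
dwc:scientificName, the authorship can be removed exactly. A minimal
Python sketch of that idea follows. The function name strip_authorship
is hypothetical (it is not part of Darwin Core or any library), and the
sketch only handles authorship that appears as a trailing suffix of the
name.

```python
from typing import Optional


def strip_authorship(scientific_name: str, authorship: Optional[str]) -> str:
    """Remove a trailing authorship string from a scientific name.

    Hypothetical helper, for illustration only. If the authorship is
    missing, or is not a suffix of the name, the (whitespace-normalised)
    name is returned unchanged.
    """
    # Collapse runs of whitespace so the comparison is not defeated by spacing.
    name = " ".join(scientific_name.split())
    author = " ".join((authorship or "").split())

    # Only strip when the authorship is an exact trailing match.
    if author and name.endswith(" " + author):
        return name[: -len(author)].rstrip()
    return name


# Concatenated name plus separate authorship: the suffix is removed.
print(strip_authorship("Puma concolor (Linnaeus, 1771)", "(Linnaeus, 1771)"))
# Authorship not concatenated (David's suggestion): name is left as is.
print(strip_authorship("Puma concolor", "(Linnaeus, 1771)"))
```

When the authorship is not a suffix, this sketch cannot tell whether
the name was already canonical or whether the two fields disagree,
which is the kind of inconsistency Markus describes; those records
would still need a name parser or manual review.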