[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad?

Richard Pyle deepreef at bishopmuseum.org
Tue Nov 23 22:52:38 CET 2010


> What is the specific objection to adding canonicalName to DwC 
> as an optional element, other than the fact it makes DwC one 
> thing larger?

I don't have an objection to it per se, but I'd like to feel more certain
that I understand exactly what it is, and what it is intended to achieve,
that is not already achievable with existing terms and/or couldn't be more
achievable with an alternative solution. I think there is value in avoiding
feature-creep in DwC, except where a new term solves a real problem with the
existing terms. I agree there is a problem here, but I'm still struggling
to understand exactly what specific problem that something like
canonicalName will solve.

> There are databases which do not have their names parsed and 
> provide whatever they have recorded as ScientificName.  But, 
> there are also databases which do have parsed names and could 
> provide this more narrowly defined element, in addition to 
> the ScientificName.  Those databases could make use of a  
> dwc:canonicalName element in their data exchange or query response.

Right -- but the point is this: if the data are already parsed, where is the
failure of the existing DwC terms in providing the desired service?  We've
already identified one of those: i.e., that "intermediate" uninomial ranks
not supported by existing DwC terms don't have a place to put the canonical
form of the name (other than scientificName, which isn't currently intended
or required to be canonical). So yes, that's a clear problem in need of a
solution. But is a generic canonicalName term really going to solve that
efficiently/effectively? What other problems might canonicalName solve?

> What we don't have and I think never will have is perfectly 
> consistent names data from every database in the world.  One 
> reason is a mountain of inconsistently recorded legacy data 
> from decades past that stands in the way of perfection.  
> Another is variation in convention or tradition for a variety 
> of reasons that have been explored in these recent threads. 
> So, I think the pragmatic approach is to accept the 
> inconsistencies and work around them.

Agreed!  And my questions are:

1) What specific problems with existing DwC do we wish to solve?
2) How best to solve them?

I'll list two examples for #1:

A) Representing the canonical (sans-authorship) form of a uninomial name at
a rank not already represented by existing rank-specific DwC terms (kingdom,
phylum, class, order, family, genus)
Because the current definition of dwc:scientificName allows (optionally) the
inclusion of authorship information, there is no clean way to represent a
uninomial name in a way that expressly excludes authorship -- except if the
uninomial name happens to be represented at the rank of kingdom, phylum,
class, order, family, or genus.
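
To make the gap in (A) concrete, here is a minimal Python sketch. It assumes
only the six rank-specific terms listed above; the mapping and the function
name canonical_slot are hypothetical, made up for illustration, and are not
part of DwC or of any library.

```python
# Rank-specific DwC terms named in this message that can hold a bare
# (authorship-free) uninomial. Illustrative mapping only.
RANK_TERMS = {
    "kingdom": "dwc:kingdom",
    "phylum": "dwc:phylum",
    "class": "dwc:class",
    "order": "dwc:order",
    "family": "dwc:family",
    "genus": "dwc:genus",
}


def canonical_slot(rank):
    """Return the rank-specific term for a uninomial at `rank`, or None."""
    return RANK_TERMS.get(rank.strip().lower())


print(canonical_slot("family"))     # dwc:family
print(canonical_slot("subfamily"))  # None
```

A None result is the problem case: the only remaining place for the name is
dwc:scientificName, which may or may not include authorship.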

B) Content providers who have authorship data in a separate field from taxon
name data, but who have not parsed the bits of a taxon name string
In this case, the provider cannot provide the parsed bits of the name, but
can provide a (sort of) canonicalName string separately from an authorship
string.  If they concatenate the authorship string with the taxon name
string when populating dwc:scientificName, then the consumer has no easy way
of extracting the name bits from the authorship bits (unless the provider
also provides dwc:scientificNameAuthorship, which could be exactly removed
from the dwc:scientificName value, yielding what the provider would have
otherwise provided as canonicalName). Or, as David suggested, in this case
the Authorship text would not be concatenated with scientificName.
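
The "exact removal" in (B) might look something like the Python sketch below.
It assumes the provider appended the authorship string verbatim at the end of
scientificName; the function name strip_authorship is hypothetical. Names
with authorship embedded mid-string (for example, after the species epithet
of an infraspecific name) would not match and are reported as None rather
than guessed at.

```python
def strip_authorship(scientific_name, authorship):
    """Remove a trailing authorship string from a name string.

    Returns the remaining name, or None when the authorship is not an
    exact suffix of the name (so the consumer cannot separate them safely).
    """
    name = scientific_name.strip()
    author = authorship.strip()
    if not author:
        # Nothing to remove; the name is returned as provided.
        return name
    if name.endswith(author) and len(name) > len(author):
        return name[: -len(author)].rstrip()
    return None


print(strip_authorship("Homo sapiens Linnaeus, 1758", "Linnaeus, 1758"))
# Homo sapiens
print(strip_authorship("Homo sapiens L.", "Linnaeus, 1758"))
# None
```

The second call shows why the removal only works when both fields come from
the same source string: any difference in abbreviation or punctuation leaves
the consumer with no reliable split.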

I would like to know some other problems that could be solved with the
addition of a canonicalName term before I start commenting on #2.

Aloha,
Rich



