[tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Chuck Miller Chuck.Miller at mobot.org
Tue Nov 23 23:56:15 CET 2010


Rich,
I gather your reason is that it's unclear whether anyone would actually use a canonicalName element? That is, it's unneeded. So, following on, who says they need a dwc:canonicalName element?

You said you worry about feature creep. I suppose I worry about semantic creep. Extending the meaning of a term makes it more universal, but in a data world it increases the variability of the data that may be found attached to the term in some dataset. Imprecision in terms can create a lot of data quality headaches. Is that acceptable?

Chuck



On Nov 23, 2010, at 3:52 PM, "Richard Pyle" <deepreef at bishopmuseum.org> wrote:

> 
>> What is the specific objection to adding canonicalName to DwC 
>> as an optional element, other than the fact it makes DwC one 
>> thing larger?
> 
> I don't have an objection to it per se, but I'd like to feel more certain
> that I understand exactly what it is, and what it is intended to achieve,
> that is not already achievable with existing terms and/or couldn't be more
> achievable with an alternative solution. I think there is value in avoiding
> feature-creep with DwC, except when we can solve a real problem with the
> existing terms. I agree there is a problem there, but I'm still struggling
> to understand exactly what specific problem that something like
> canonicalName will solve.
> 
>> There are databases which do not have their names parsed and 
>> provide whatever they have recorded as ScientificName.  But, 
>> there are also databases which do have parsed names and could 
>> provide this more narrowly defined element, in addition to 
>> the ScientificName.  Those databases could make use of a  
>> dwc:canonicalName element in their data exchange or query response.
> 
> Right -- but the point is this: if the data are already parsed, where is the
> failure of the existing DwC terms in providing the desired service?  We've
> already identified one of those: i.e., that "intermediate" uninomial ranks
> not supported by existing DwC terms don't have a place to put the canonical
> form of the name (other than scientificName, which isn't currently intended
> or required to be canonical). So yes, that's a clear problem in need of a
> solution. But is a generic canonicalName term really going to solve that
> efficiently/effectively? What other problems might canonicalName solve?
> 
>> What we don't have and I think never will have is perfectly 
>> consistent names data from every database in the world.  One 
>> reason is a mountain of inconsistently recorded legacy data 
>> from decades past that stands in the way of perfection.  
>> Another is variation in convention or tradition for a variety 
>> of reasons that have been explored in these recent threads. 
>> So, I think the pragmatic approach is to accept the 
>> inconsistencies and work around them.
> 
> Agreed!  And my questions are:
> 
> 1) What specific problems with existing DwC do we wish to solve?
> 2) How best to solve them?
> 
> I'll list two examples for #1:
> 
> A) Representing the canonical (sans-authorship) form of a uninomial name at
> a rank not already represented by existing rank-specific DwC terms (kingdom,
> phylum, class, order, family, genus)
> Because the current definition of dwc:scientificName allows (optionally) the
> inclusion of authorship information, there is no clean way to represent a
> uninomial name in a way that expressly excludes authorship -- except if the
> uninomial name happens to be represented at the rank of kingdom, phylum,
> class, order, family, or genus.
> 
> B) Content providers who have authorship data in a separate field from taxon
> name data, but who have not parsed the bits of a taxon name string
> In this case, the provider cannot provide the parsed bits of the name, but
> can provide a (sort of) canonicalName string separately from an authorship
> string.  If they concatenate the authorship string with the taxon name
> string when populating dwc:scientificName, then the consumer has no easy way
> of extracting the name bits from the authorship bits (unless the provider
> also provides dwc:scientificNameAuthorship, which could be removed exactly
> from the dwc:scientificName value, yielding what the provider would have
> otherwise provided as canonicalName). Or, as David suggested, in this case
> the Authorship text would not be concatenated with scientificName.
> 
> I would like to know some other problems that could be solved with the
> addition of a canonicalName term before I start commenting on #2.
> 
> Aloha,
> Rich
> 
> 
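For what it's worth, the workaround described under example B in the quoted message (removing the dwc:scientificNameAuthorship value from the end of the dwc:scientificName value) might look roughly like the sketch below. The function name strip_authorship and the whitespace normalization are assumptions made for illustration; nothing here is defined by DwC, and it only works when the provider concatenated the authorship string verbatim at the end of the name.

```python
from typing import Optional


def strip_authorship(scientific_name: str, authorship: str) -> Optional[str]:
    """Return scientific_name with a trailing authorship string removed.

    Hypothetical helper: returns None when the authorship string is not
    found verbatim at the end of the name, because guessing where the
    name stops and the authorship starts is exactly the hard problem.
    """
    # Collapse runs of whitespace so "Abies alba  Mill." still matches.
    name = " ".join(scientific_name.split())
    author = " ".join(authorship.split())

    if not author:
        # No authorship supplied: the name is taken as given.
        return name

    if len(name) > len(author) and name.endswith(author):
        prefix = name[: -len(author)]
        # Require a space at the boundary so a partial word is never cut.
        if prefix.endswith(" "):
            return prefix.rstrip()

    return None


if __name__ == "__main__":
    print(strip_authorship("Abies alba Mill.", "Mill."))
    print(strip_authorship("Centropyge fisheri (Snyder, 1904)", "(Snyder, 1904)"))
    print(strip_authorship("Abies alba", "Mill."))
```

When the two values do not line up (different abbreviations, punctuation, or an authorship string that was never concatenated), the helper returns None rather than guessing, and the consumer is left with the parsing problem the thread describes.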

