[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad?

Tony.Rees at csiro.au Tony.Rees at csiro.au
Wed Nov 24 00:09:03 CET 2010


Hi all,

I just had a quick look at the first few thousand data records coming into OBIS for my region (Australia). Just about every supplier who includes authority as dwc:scientificNameAuthor has used dwc:scientificName "incorrectly", i.e., for the canonical name rather than the canonical name + author. This data then flows into GBIF, ALA, etc. and circulates in this form. So "users" are already ignoring the definition of dwc:scientificName in practice, it would seem, with no apparent ill effects (?) - not sure whether this is good or bad, hence the title of my original question which prompted this thread...
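A rough sketch of how one might count this over a batch of records, assuming each record is a plain dict keyed by term name. The function names, the category labels, and the default author key are assumptions made for illustration, not part of any OBIS or GBIF tooling, and the two sample records are made up.

```python
from collections import Counter


def classify_record(record, author_key="scientificNameAuthorship"):
    """Say how a record fills scientificName relative to its author field.

    `author_key` is an assumption; suppliers may use a different field name.
    """
    name = " ".join((record.get("scientificName") or "").split())
    author = " ".join((record.get(author_key) or "").split())
    if not author:
        return "no_author_supplied"
    if name.endswith(author):
        return "name_includes_author"
    return "canonical_only"


def summarize(records, author_key="scientificNameAuthorship"):
    """Count records per category."""
    return Counter(classify_record(r, author_key) for r in records)


# Illustrative records only, not taken from any real dataset.
sample = [
    {"scientificName": "Carcharodon carcharias",
     "scientificNameAuthorship": "(Linnaeus, 1758)"},
    {"scientificName": "Carcharodon carcharias (Linnaeus, 1758)",
     "scientificNameAuthorship": "(Linnaeus, 1758)"},
]
print(summarize(sample))
```

A simple suffix test like this will miss names where the authorship sits in the middle of the string, so the counts are only indicative.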

- Tony
 

> -----Original Message-----
> From: Chuck Miller [mailto:Chuck.Miller at mobot.org]
> Sent: Wednesday, 24 November 2010 9:56 AM
> To: Richard Pyle
> Cc: Rees, Tony (CMAR, Hobart); dremsen at gbif.org;
> tdwg-content at lists.tdwg.org; dmozzherin at eol.org
> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in
> DwCscientificName: good or bad?
> 
> Rich,
> I gather your reason would be because it's unclear if anyone would
> actually use a canonicalName element? That is, it's unneeded. So,
> following on, who says they need a dwc:canonicalName element?
> 
> You said you worry about feature creep. I suppose I worry about semantic
> creep. Extending the meaning of a term makes it more universal, but in a
> data world it increases the variability of the data that may be found
> attached to the term in some dataset. Imprecision in terms can create a
> lot of data quality headaches. Is that acceptable?
> 
> Chuck
> 
> 
> 
> On Nov 23, 2010, at 3:52 PM, "Richard Pyle" <deepreef at bishopmuseum.org>
> wrote:
> 
> >
> >> What is the specific objection to adding canonicalName to DwC
> >> as an optional element, other than the fact it makes DwC one
> >> thing larger?
> >
> > I don't have an objection to it per se, but I'd like to feel more certain
> > that I understand exactly what it is, and what it is intended to achieve,
> > that is not already achievable with existing terms and/or couldn't be more
> > achievable with an alternative solution. I think there is value in avoiding
> > feature-creep with DwC, except when we can solve a real problem with the
> > existing terms. I agree there is a problem there, but I'm still struggling
> > to understand exactly what specific problem that something like
> > canonicalName will solve.
> >
> >> There are databases which do not have their names parsed and
> >> provide whatever they have recorded as ScientificName.  But,
> >> there are also databases which do have parsed names and could
> >> provide this more narrowly defined element, in addition to
> >> the ScientificName.  Those databases could make use of a
> >> dwc:canonicalName element in their data exchange or query response.
> >
> > Right -- but the point is this: if the data are already parsed, where is the
> > failure of the existing DwC terms in providing the desired service? We've
> > already identified one of those: i.e., that "intermediate" uninomial ranks
> > not supported by existing DwC terms don't have a place to put the canonical
> > form of the name (other than scientificName, which isn't currently intended
> > or required to be canonical). So yes, that's a clear problem in need of a
> > solution. But is a generic canonicalName term really going to solve that
> > efficiently/effectively? What other problems might canonicalName solve?
> >
> >> What we don't have and I think never will have is perfectly
> >> consistent names data from every database in the world.  One
> >> reason is a mountain of inconsistently recorded legacy data
> >> from decades past that stands in the way of perfection.
> >> Another is variation in convention or tradition for a variety
> >> of reasons that have been explored in these recent threads.
> >> So, I think the pragmatic approach is to accept the
> >> inconsistencies and work around them.
> >
> > Agreed!  And my questions are:
> >
> > 1) What specific problems with existing DwC do we wish to solve?
> > 2) How best to solve them?
> >
> > I'll list two examples for #1:
> >
> > A) Representing the canonical (sans-authorship) form of a uninomial name at
> > a rank not already represented by existing rank-specific DwC terms (kingdom,
> > phylum, class, order, family, genus)
> > Because the current definition of dwc:scientificName allows (optionally) the
> > inclusion of authorship information, there is no clean way to represent a
> > uninomial name in a way that expressly excludes authorship -- except if the
> > uninomial name happens to be represented at the rank of kingdom, phylum,
> > class, order, family, or genus.
> >
> > B) Content providers who have authorship data in a separate field from taxon
> > name data, but who have not parsed the bits of a taxon name string
> > In this case, the provider cannot provide the parsed bits of the name, but
> > can provide a (sort of) canonicalName string separately from an authorship
> > string.  If they concatenate the authorship string with the taxon name
> > string when populating dwc:scientificName, then the consumer has no easy way
> > of extracting the name bits from the authorship bits (unless the provider
> > also provides dwc:scientificNameAuthorship, which could be exactly removed
> > from the dwc:scientificName value, yielding what the provider would have
> > otherwise provided as canonicalName). Or, as David suggested, in this case
> > the Authorship text would not be concatenated with scientificName.
> >
> > I would like to know some other problems that could be solved with the
> > addition of a canonicalName term before I start commenting on #2.
> >
> > Aloha,
> > Rich
> >
> >

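For Rich's case B, where the provider supplies both dwc:scientificName and dwc:scientificNameAuthorship, the "exact removal" he describes could look something like the sketch below. The function name `strip_authorship` and its return conventions are assumptions for illustration only; it handles just the simple case where the authorship is a trailing suffix of the name string.

```python
def strip_authorship(scientific_name, authorship):
    """Return the name with a trailing authorship string removed.

    Returns the name unchanged if the authorship does not occur in it,
    and None if the authorship occurs somewhere other than the end,
    since a plain suffix removal cannot handle that case safely.
    """
    # Collapse runs of whitespace so spacing differences do not block a match.
    name = " ".join(scientific_name.split())
    author = " ".join(authorship.split())
    if not author or author not in name:
        return name
    if name.endswith(author):
        return name[: len(name) - len(author)].rstrip()
    return None  # authorship embedded mid-string; leave it to the caller


print(strip_authorship("Carcharodon carcharias (Linnaeus, 1758)",
                       "(Linnaeus, 1758)"))
```

If the provider has followed the practice Tony describes and left authorship out of dwc:scientificName, the function simply hands the name back as is.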
