[tdwg-content] [tdwg-tag] Inclusion of authorship in DwCscientificName: good or bad? [SEC=UNCLASSIFIED]

Thu Nov 25 09:56:40 CET 2010

Wait – now I’m confused.  How else would one represent the genus in the
WoRMS example?  Surely no one would put “Pomatomus” as the genus for a
record representing the binomial "Gasterosteus saltatrix".  Would they???

I agree with Dave -- the definition of the term "genus" is not right.  It
should follow the template for the definition of specificEpithet:

"The genus part of the scientificName."

Also, I don't like the inclusion of the genus within the definition of
the subgenus term.  I would change it to:

"The subgenus part of the scientificName."

I think the problem is that these definitions do not account for the idea
that records representing synonyms would be passed around using these terms.
They were originally created as taxonomic attributes of occurrence records,
rather than as primary taxon name records. 

Obviously, terms like taxonomicStatus and acceptedNameUsage[ID] acknowledge
that synonyms would be passed around, but I don't think the definitions of
the rank-specific terms were updated accordingly.

It gets a bit fuzzy for names above the rank of genus.  For example, if
someone passes a family name represented as a synonym of a different family
name; what is put for the "family" term?  The synonym family name, or the
valid family name?  At first blush, I would think the synonym (literal)
family name.  However, that would require a change to the definition of the
dwc:family term (and all the other higher-rank terms).

Rich

From: tdwg-content-bounces at lists.tdwg.org
[mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of David Remsen
(GBIF)
Sent: Wednesday, November 24, 2010 10:39 PM
To: tdwg-content at lists.tdwg.org List
Cc: Paul Murray
Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in
DwCscientificName: good or bad? [SEC=UNCLASSIFIED]

dwc:Genus

I think the definition "The full scientific name of the genus in which the
taxon is classified." is incomplete and only makes sense for valid/accepted
taxon names.  I think the definition should be changed so that the dwc:genus
refers to the genus part of the name.   For some synonyms,  the genus part
is different.   In this case,  why should the genus part refer to the genus
of the accepted/valid taxon?  It is already linked to that taxon via other
methods and would inherit that information through the link.   It's just an
opportunity to create integrity conflicts as well as an opportunity to lose
some valuable additional information.   
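
A minimal sketch of the point above, assuming two hypothetical records
(the identifiers "t1" and "t2" and the helper accepted_genus are made up
for illustration; the keys are Darwin Core term names). The synonym keeps
its literal genus part, and the genus of the accepted taxon is obtained
through the link instead of being duplicated on the synonym record:

```python
# Hypothetical records keyed by taxonID; "t1" and "t2" are invented IDs.
records = {
    "t1": {
        "taxonID": "t1",
        "scientificName": "Pomatomus saltatrix",
        "genus": "Pomatomus",
        "taxonomicStatus": "accepted",
        "acceptedNameUsageID": None,
    },
    "t2": {
        "taxonID": "t2",
        "scientificName": "Gasterosteus saltatrix",
        "genus": "Gasterosteus",  # literal genus part of the name
        "taxonomicStatus": "synonym",
        "acceptedNameUsageID": "t1",
    },
}


def accepted_genus(record, index):
    """Return the genus of the accepted taxon by following acceptedNameUsageID."""
    accepted_id = record.get("acceptedNameUsageID")
    # An accepted record has no link, so it answers for itself.
    target = index[accepted_id] if accepted_id else record
    return target["genus"]


synonym = records["t2"]
print(synonym["genus"], "->", accepted_genus(synonym, records))
```

Nothing is lost this way: the literal genus stays on the synonym record and
the accepted genus remains reachable through acceptedNameUsageID.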

Consider this record in the WoRMS database concerning my favorite fish:

http://www.marinespecies.org/aphia.php?p=taxdetails&id=301162

Note that the parent (genus) for this synonym is the literal, nominal parent
genus, not the genus for the valid name.  Given the degree of homonymy
among the genera this could provide useful and explicit linking to a parent
genus record, particularly if it were included, like in this case, in the
source dataset.  The value in either case is limited in the larger
aggregate world due to the recommendation that dwc:Genus, like all the named
higher taxon elements, be canonical.

On a related note then,  I would recommend that for synonyms,  the more
normal and enriched dwc:parentNameUsageID should be used to retain this
information.   In other words,  normally, a synonym is linked to the
accepted/valid taxon via acceptedNameUsageID and dwc:parentNameUsageID is
null.  In this case, however, it should be used to 

DR

On Nov 25, 2010, at 9:11 AM, Markus Döring wrote:

The denormalised single Linnean Rank terms are very, very helpful for
sharing occurrence data.
They are the primary means to distinguish between homonyms when only a
canonical name is given.
And they are found in many denormalised sources like spreadsheets. No doubt
these are needed!

And yes, dwc:genus and dwc:subgenus according to the definition are for the
*classification*, not the parsed name (even though this is mostly the same).

As far as I can tell the dwc changes we are discussing are still the same.
Either:

A) add a canonicalName term
or
B) add an atomised term for genus/uninomial + infrageneric/uninomial

I think both options are a way to go.
A single canonical name, if given correctly, is very straightforward to
parse, so personally I think this is easier than having multiple terms.
For the name part terms I think I would agree with Chuck that a single
uninomial can be used for genus or infrageneric ranks.
As a canonical binomial would *not* include a subgenus or section, there is
no need to have that parsed information as a term.
In case the scientificName actually IS the subgenus, the uninomial can be
used.
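
To illustrate how little parsing a correctly given canonical name needs,
here is a minimal sketch. It assumes the name carries no authorship, no
subgenus in parentheses and no rank markers; the function name
parse_canonical and the "uninomial" key are hypothetical (the uninomial
term is only a proposal in this thread):

```python
def parse_canonical(name):
    """Split a canonical name into its parts.

    Assumes a clean canonical name: one to three whitespace-separated
    words, without authorship, subgenus or rank markers.
    """
    parts = name.split()
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not a canonical name: {name!r}")
    if len(parts) == 1:
        # A genus, subgenus or higher taxon name on its own.
        return {"uninomial": parts[0]}
    result = {"genus": parts[0], "specificEpithet": parts[1]}
    if len(parts) == 3:
        result["infraspecificEpithet"] = parts[2]
    return result


print(parse_canonical("Mytilus edulis"))
print(parse_canonical("Mytilus"))
```

Anything that does not fit the assumed shape is rejected, which is the
"if given correctly" condition made explicit.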

Markus

On Nov 25, 2010, at 8:36, David Remsen (GBIF) wrote:

Rich

Your two statements below don't jibe well in this case.  Putting
random concatenations of higher taxa into dwc:higherClassification
would make for a real mess.  Having only the basic named Linnaean
ranks does ignore all of the intermediate ranks, but it supports
conformity at least for those in a way that higherClassification
cannot, as you lose the associated rank term.  It also supports what I
think is a fairly substantial bloc of data that exists in a denormalised
form with only (or nearly only) the basic Linnaean ranks in named rank
columns.  Concatenating these into dwc:higherClassification would be
lossy in this case.

My real concern, however, would be in trying to subsequently line up  
multiple datasets where there were omissions in some higher ranks so  
that the concatenations were abbreviated.   In other words

Bivalvia:Mytilidae:Mytilus edulis
Mollusca:Mytiloidea:Mytilus: Mytilus edulis
Animalia: Mollusca:Mytiloidea:Mytilidae:Mytilus: Mytilus edulis

See http://code.google.com/p/gbif-ecat/wiki/Nom5ExampleMytilusedulis   
for a real world example and, ignoring the other inherent messes,   
imagine trying to deal with this with no higher rank columns for  
context and all those nulls removed (no fair keeping the delimiters  
for them either).
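
The alignment problem can be sketched as follows. This is only an
illustration: the helper names split_path and column are hypothetical, the
three strings are the ones above with whitespace tidied, and the rank
assigned to each name in the second half is my own reading of the example:

```python
# The three concatenated classifications, whitespace normalised.
paths = [
    "Bivalvia:Mytilidae:Mytilus edulis",
    "Mollusca:Mytiloidea:Mytilus:Mytilus edulis",
    "Animalia:Mollusca:Mytiloidea:Mytilidae:Mytilus:Mytilus edulis",
]


def split_path(path):
    """Split a colon-delimited classification into trimmed names."""
    return [part.strip() for part in path.split(":")]


# Position 0 holds a class, a phylum and a kingdom respectively, so the
# lists cannot be lined up by position once the nulls are removed.
first_names = [split_path(p)[0] for p in paths]

# The same data in named rank columns (Darwin Core rank terms as keys).
# Mytiloidea, a superfamily, has no column among the basic ranks.
rows = [
    {"class": "Bivalvia", "family": "Mytilidae",
     "scientificName": "Mytilus edulis"},
    {"phylum": "Mollusca", "genus": "Mytilus",
     "scientificName": "Mytilus edulis"},
    {"kingdom": "Animalia", "phylum": "Mollusca", "family": "Mytilidae",
     "genus": "Mytilus", "scientificName": "Mytilus edulis"},
]


def column(rows, rank):
    """Return one rank across all rows, with None where it was omitted."""
    return [row.get(rank) for row in rows]


print(first_names)
print(column(rows, "phylum"))
```

With named columns a missing rank shows up as an explicit gap; in the
concatenated form the gap silently shifts every later name.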

DR

On 25/11/2010, at 11:56 AM, Richard Pyle wrote:

One golden rule of data management that I often tell people is that it's
often better to be consistent than correct.  That is, something that's
consistently incorrect can be corrected easily.

Right -- you mean in the sense of Family, Order, Class, etc.  Personally, I
think it would be "ideal" to eliminate these individual fields and just use
dwc:higherClassification for this purpose.  People with normalised data can
represent it properly via parentNameUsage[ID] -- with the understanding that
all names with a rank lower than genus would include the genus name as
uninomial.
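
For readers who want to see how the normalised and concatenated forms
relate, here is a minimal sketch that derives a higherClassification
string by walking parentNameUsageID links. The identifiers, the function
name higher_classification and the ";" separator are assumptions made for
illustration, not something prescribed in this thread:

```python
# Hypothetical normalised records; the numeric IDs are invented.
taxa = {
    "1": {"scientificName": "Animalia", "parentNameUsageID": None},
    "2": {"scientificName": "Mollusca", "parentNameUsageID": "1"},
    "3": {"scientificName": "Bivalvia", "parentNameUsageID": "2"},
    "4": {"scientificName": "Mytilidae", "parentNameUsageID": "3"},
    "5": {"scientificName": "Mytilus", "parentNameUsageID": "4"},
    "6": {"scientificName": "Mytilus edulis", "parentNameUsageID": "5"},
}


def higher_classification(taxon_id, index, separator=";"):
    """Join the names of all ancestors, highest rank first."""
    names = []
    seen = set()
    parent_id = index[taxon_id]["parentNameUsageID"]
    while parent_id is not None:
        if parent_id in seen:
            # Guard against circular parent links in bad data.
            raise ValueError(f"cycle detected at {parent_id!r}")
        seen.add(parent_id)
        parent = index[parent_id]
        names.append(parent["scientificName"])
        parent_id = parent["parentNameUsageID"]
    return separator.join(reversed(names))


print(higher_classification("6", taxa))
```

The derived string carries no rank labels, so going from the normalised
form to the concatenated one is easy while the reverse is not.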
