Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

23 Nov 2010

      Tony

I did indeed mean that scientificName and authorship could be used in  
the following way

1.  "Agalinis purpurea"  -> scientificName ("Agalinis purpurea")
   - where a canonical form of the name, with no authorship, is in the
source data

2.  "Agalinis purpurea (L.) Pennell"  -> scientificName ("Agalinis  
purpurea (L.) Pennell" )
- where an unparsed name+author is in the source data

3.  "Agalinis purpurea" AND "(L.) Pennell"  -> scientificName  
("Agalinis purpurea") + scientificNameAuthorship ("(L.) Pennell")
- where a semi-parsed name + author is in the source data

4. "Agalinis" AND "purpurea" AND "(L.) Pennell"  -> scientificName
("Agalinis purpurea") + scientificNameAuthorship ("(L.) Pennell")
- where a fully atomised name is in the source data and the 'name'
parts are concatenated to make a proper canonical name.
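
A minimal sketch of the four cases above, in Python. The function to_dwc and its parameter names are hypothetical and not part of Darwin Core; only scientificName and scientificNameAuthorship are terms taken from this thread.

```python
def to_dwc(name=None, genus=None, epithet=None, authorship=None):
    """Map a source record onto scientificName / scientificNameAuthorship.

    Hypothetical helper: the parameters describe what the source database
    holds, they are not Darwin Core terms.
    """
    if name is None:
        # Case 4: fully atomised source; concatenate the name parts only.
        name = " ".join(part for part in (genus, epithet) if part)
    record = {"scientificName": name}
    if authorship:
        # Cases 3 and 4: authorship is held separately in the source.
        record["scientificNameAuthorship"] = authorship
    return record


# Case 1: canonical name only
print(to_dwc(name="Agalinis purpurea"))
# Case 2: unparsed name + author, passed through untouched
print(to_dwc(name="Agalinis purpurea (L.) Pennell"))
# Case 3: semi-parsed
print(to_dwc(name="Agalinis purpurea", authorship="(L.) Pennell"))
# Case 4: fully atomised
print(to_dwc(genus="Agalinis", epithet="purpurea", authorship="(L.) Pennell"))
```

Note that case 2 is never parsed: the provider shares whatever string it holds, which is the point of the proposed change to the definition.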

Cases 3 and 4 require modification of the definition at
http://rs.tdwg.org/dwc/terms/index.htm#scientificName to be something like

"The full scientific name, which may include authorship and date  
information if known..." with the implicit intention that it is not  
REQUIRED to parse or semi-parse an unparsed name in order to properly  
share it.

David

On Nov 23, 2010, at 12:35 PM, <Tony.Rees@csiro.au> wrote:
...
David Remsen wrote:
...
Maybe we shouldn't add canonical name but rather something more
specific to the concatenated form like
 dwc:scientificNameWithAuthorshipAndOtherBits
 dwc:scientificName
 dwc:scientificNameAuthorship
If by "dwc:scientificName" you mean with authorship omitted, that is  
fine, however it would need the dwc definition to be altered...
Then at least folk would/should know which field to populate.  
However the mandatory yes/no issue would also have to be addressed -  
at present I think dwc:scientificName is the only taxonomy related  
element that is mandatory, all others are optional. Under your  
scenario it would then maybe be one of either of the first 2 fields,  
or both as available, I guess?
Regards - Tony
________________________________________
From: David Remsen (GBIF) [dremsen@gbif.org]
Sent: Tuesday, 23 November 2010 7:47 PM
To: Rees, Tony (CMAR, Hobart)
Cc: David Remsen (GBIF); deepreef@bishopmuseum.org;  
m.doering@mac.com; tdwg-content@lists.tdwg.org; dmozzherin@eol.org
Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in  
DwC scientificName: good or bad?
While I haven't seen them all,  I have seen and had to understand a
good number of biodiversity databases including many focused on
managing species lists in one form or another.   Names are represented
in these three forms.
1. Completely unparsed where the entire verbose name text is in a
single field corresponding to dwc:scientificName.   In some databases
this means just a scientific name as many databases don't hold
authorship information.
2. Semi-parsed where the canonical name is separated from the
authorship information corresponding to the proposed canonicalName and
dwc:scientificNameAuthorship
3. Fully parsed into atoms (genus, specific epithet, infraspecific
rank, infraspecies, authorship) corresponding to the incomplete set of
dwc atomic elements already in existence.    This form is the most
problematic because 1) it isn't always clear from the parts how the
actual complete name is intended to be represented, 2) there are so
many structural exceptions and complexities that many more 'atoms'
need to be described to effectively enable it to be used, and 3) there
is the problematic definition of the use of Genus as described by
Markus that conflicts with atomising synonyms.
It makes sense to maintain the separation of name and authorship in
data sources that already do this but I'm not convinced a canonicalName
element is required.   It seems that it is suggested so that it makes
it easier to consume the data but it also means it's more confusing for
a typical data manager or biologist to produce it.   I have a database
with binomials alone.  How many data managers or biologists will map
them to canonicalName before scientificName?   I know we want to avoid
testing different conditions when we use the data but we will have to
in either case.
Maybe we shouldn't add canonical name but rather something more
specific to the concatenated form like
dwc:scientificNameWithAuthorshipAndOtherBits
dwc:scientificName
dwc:scientificNameAuthorship
I'd know what to do then
DR
On Nov 22, 2010, at 11:18 PM, <Tony.Rees@csiro.au> wrote:
...
Hi Rich, all,
You wrote:
...
Otherwise, we could argue forever about which of the dozen possible
forms we think DwC needs a term for.
No, I think that is muddying the waters (with respect of course...)
I simply made the case for "canonicalName" - aka scientific name
without authorship - as a valuable adjunct to "scientificName", for
users who can supply both, and consumers who would otherwise have to
generate the former from the latter algorithmically. Markus and Dima
probably represent the main "consumers" here and I, if you like, can
represent a "provider" (although I wear other "consumer" hats on
occasion as well). Basically if a "canonicalName" field does not
exist, I will just omit to provide this information, which seems
sub-optimal since it all exists pre-parsed and manually verified in my
system, and someone else will then have to do the job again...
Regards - Tony
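
To illustrate the work a consumer faces when no name-without-authorship field is supplied, here is a deliberately naive sketch in Python. The function naive_canonical, the RANK_MARKERS set, and the stopping rule are all assumptions made for illustration; this is not how any parser mentioned in this thread works. "Aus bus var. cus Smith" is a made-up placeholder name.

```python
# Assumed, incomplete set of infraspecific rank markers.
RANK_MARKERS = {"subsp.", "ssp.", "var.", "f."}


def naive_canonical(full_name):
    """Guess the name-without-authorship from an unparsed name string."""
    tokens = full_name.split()
    if not tokens:
        return ""
    kept = [tokens[0]]  # genus or uninomial, taken on trust
    for token in tokens[1:]:
        if token in RANK_MARKERS:
            continue  # drop "var.", "f.", etc. and keep reading
        if token.islower() and token.replace("-", "").isalpha():
            kept.append(token)  # all-lowercase word: looks like an epithet
        else:
            break  # anything else: assume the authorship starts here
    return " ".join(kept)


print(naive_canonical("Agalinis purpurea (L.) Pennell"))
print(naive_canonical("Aus bus var. cus Smith"))
```

The heuristic breaks easily: a lowercase author particle such as "van" would be kept as if it were an epithet, and authorship placed between the species and infraspecific epithets would cut the name short. A provider who already holds the verified parts avoids all of this guesswork.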
...
-----Original Message-----
From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
Sent: Tuesday, 23 November 2010 7:06 AM
To: Rees, Tony (CMAR, Hobart); m.doering@mac.com
Cc: tdwg-content@lists.tdwg.org; dmozzherin@eol.org
Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC
scientificName: good or bad?
...
"uninomial" would equal "canonicalName" for ranks subgenus
and above, but not for species and below, while canonicalName
(or scientificNameCanonical if you prefer) covers all cases,
which is why I think it is preferable, especially as the
majority of names in circulation are at species level and
below I think...
Atomising further i.e. a binomial or polynomial into genus,
species, infraspecies is actually a separate activity with its
own rationale, I would say.
Just my personal view, of course...
The cleanest way to do it is to simply have Rank, NameElement and
parentNameUsageID, and be done with it (maybe with the addition of
verbatimNameString for purists).  But that assumes that providers have
parsed data, which they often do not.  Maybe with services like those
associated with GNI, the time of databases with unparsed names data is
drawing to a close.  Or, maybe if GNUB gets a foot-hold, we'll solve all
the problems via a simply actionable persistent identifier.
But until that time, dwc needs to find a balance between users who want
pre-parsed data, and providers who do not have pre-parsed data.
I think dwc *almost* accommodates both worlds, as long as scientificName is
defined as "the complete set of textual elements useful for recognizing a
unique scientific name"; which is either concatenated by the provider with
parsed data, or simply "provided" by the provider with unparsed data.
What we seem to be arguing about now is how many different forms of a
"formatted" name do we want?
With or without authorship?
With or without year?
With or without infraspecific prefixes ("var.", "f." etc.)?
With or without infrageneric name(s)?
With or without italics codes?
With or without qualifiers like "cf.", "aff.", etc.?
Etc.
Etc.
Etc.
There are potentially dozens of different terms we could define to
accommodate every particular niche-need.
Personally, I think that the existing "scientificName" should be split
into two different terms:
fullScientificNameStringWithAuthorship
And
verbatimNameString
The first would be a concatenated text string assembled from parsed bits,
according to a community standard concatenation form.
The second would be the literal text string as it appeared in the original
source.
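
As a rough sketch of what assembling a name string from parsed bits could look like, the Python below joins the parts under a few of the "with or without" switches listed earlier. The function format_name, its parameters, and the "Author, year" layout are assumptions for illustration only; no community standard concatenation form is defined in this thread. "Aus bus var. cus" is a made-up placeholder name.

```python
def format_name(genus, epithet, rank_marker=None, infra_epithet=None,
                authorship=None, year=None,
                with_authorship=True, with_year=True, with_rank_marker=True):
    """Concatenate parsed name parts into one display string (hypothetical)."""
    parts = [genus, epithet]
    if infra_epithet:
        if rank_marker and with_rank_marker:
            parts.append(rank_marker)  # e.g. "var." or "f."
        parts.append(infra_epithet)
    if authorship and with_authorship:
        author = authorship
        if year and with_year:
            # Assumed layout: year follows the author after a comma.
            author = f"{authorship}, {year}"
        parts.append(author)
    return " ".join(parts)


print(format_name("Agalinis", "purpurea", authorship="(L.) Pennell"))
print(format_name("Agalinis", "purpurea", authorship="(L.) Pennell",
                  with_authorship=False))
print(format_name("Aus", "bus", rank_marker="var.", infra_epithet="cus",
                  with_rank_marker=False))
```

Each keyword switch corresponds to one of the possible forms; every additional switch doubles the number of distinct strings that could be produced, which is why defining a separate term for each form does not scale.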
Otherwise, we could argue forever about which of the dozen possible
forms we think DwC needs a term for.
Rich
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content