[tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Chuck Miller Chuck.Miller at mobot.org
Tue Nov 23 21:00:08 CET 2010


What is the specific objection to adding canonicalName to DwC as an
optional element, other than the fact it makes DwC one thing larger?

There are databases which do not have their names parsed and provide
whatever they have recorded as scientificName. But there are also
databases which do have parsed names and could provide this more
narrowly defined element, in addition to the scientificName. Those
databases could make use of a dwc:canonicalName element in their data
exchange or query response.

What we don't have and I think never will have is perfectly consistent
names data from every database in the world.  One reason is a mountain
of inconsistently recorded legacy data from decades past that stands in
the way of perfection.  Another is variation in convention or tradition
for a variety of reasons that have been explored in these recent
threads. So, I think the pragmatic approach is to accept the
inconsistencies and work around them.
  
Chuck

-----Original Message-----
From: tdwg-content-bounces at lists.tdwg.org
[mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of
Tony.Rees at csiro.au
Sent: Tuesday, November 23, 2010 1:40 PM
To: dremsen at gbif.org
Cc: tdwg-content at lists.tdwg.org; dmozzherin at eol.org
Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC
scientificName: good or bad?

Hi David,

It seems to me that your suggestion is still not quite ideal, in that
sometimes just the dwc:scientificName element will be picked up and
passed around and the content will not be consistent between those
suppliers who concatenate the available authority info and those who do
not. That suggests to me that an extra field for canonicalName, where
this is known and can be supplied, is still desirable - but I am not
sure if I am alone in thinking this...
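To make the inconsistency concrete: a consumer that wants the name without authorship has to branch on what each supplier sent. The Python sketch below is one way it might do so. It is illustrative only: `canonicalName` is the element proposed in this thread, not an existing DwC term, and the suffix-stripping step assumes that a supplied `scientificNameAuthorship` is repeated verbatim at the end of `scientificName` when a supplier concatenates it.

```python
from typing import Optional


def name_without_authorship(record: dict) -> Optional[str]:
    """Return the name without authorship, or None if it cannot be told apart.

    'canonicalName' is the element proposed in this thread (hypothetical);
    'scientificName' and 'scientificNameAuthorship' are existing DwC terms.
    """
    # 1. Trust an explicitly supplied canonical name.
    canonical = (record.get("canonicalName") or "").strip()
    if canonical:
        return canonical

    name = (record.get("scientificName") or "").strip()
    authorship = (record.get("scientificNameAuthorship") or "").strip()

    if authorship:
        # 2. Supplier concatenated authorship onto scientificName: strip it
        #    (assumes it is repeated verbatim as a suffix).
        if name.endswith(authorship):
            name = name[: -len(authorship)].strip()
        # 3. Otherwise authorship was kept separate and the name is already bare.
        return name or None

    # 4. No authorship element: scientificName may or may not include
    #    authorship, and the consumer cannot tell without parsing it.
    return None
```

When only `scientificName` arrives, the function returns `None`, which is exactly the gap the proposed element would fill.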

Regards - Tony

________________________________________
From: David Remsen (GBIF) [dremsen at gbif.org]
Sent: Tuesday, 23 November 2010 11:15 PM
To: Rees, Tony (CMAR, Hobart)
Cc: David Remsen (GBIF); deepreef at bishopmuseum.org; m.doering at mac.com;
tdwg-content at lists.tdwg.org; dmozzherin at eol.org
Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC
scientificName: good or bad?

Tony

I did indeed mean that scientificName and authorship could be used in
the following way:

1. "Agalinis purpurea" -> scientificName ("Agalinis purpurea")
   - where a canonical form of the name, with no authorship, is in the
     source data

2. "Agalinis purpurea (L.) Pennell" -> scientificName ("Agalinis
   purpurea (L.) Pennell")
   - where an unparsed name + author is in the source data

3. "Agalinis purpurea" AND "(L.) Pennell" -> scientificName ("Agalinis
   purpurea") + scientificNameAuthorship ("(L.) Pennell")
   - where a semi-parsed name + author is in the source data

4. "Agalinis" AND "purpurea" AND "(L.) Pennell" -> scientificName
   ("Agalinis purpurea") + scientificNameAuthorship ("(L.) Pennell")
   - where a fully atomised name is in the source data and the 'name'
     parts are concatenated to make a proper canonical name.
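The four cases can be sketched as a provider-side mapping. This is a hypothetical Python illustration: the parameter names stand for source-database columns and are not DwC terms, and case 4 handles binomials only, for brevity.

```python
from typing import Optional


def to_dwc(name: Optional[str] = None,
           authorship: Optional[str] = None,
           genus: Optional[str] = None,
           specific_epithet: Optional[str] = None) -> dict:
    """Map one source record onto dwc:scientificName (+ dwc:scientificNameAuthorship).

    Parameter names are hypothetical source columns, not DwC terms.
    """
    record = {}
    if genus and specific_epithet:
        # Case 4: fully atomised; concatenate only the 'name' parts.
        record["scientificName"] = f"{genus} {specific_epithet}"
    elif name:
        # Cases 1-3: pass the stored name string through unchanged,
        # whether or not it already contains authorship.
        record["scientificName"] = name
    else:
        raise ValueError("record has no name data to share")
    if authorship:
        # Cases 3 and 4: authorship travels in its own element.
        record["scientificNameAuthorship"] = authorship
    return record


# Case 4 from the list above:
print(to_dwc(genus="Agalinis", specific_epithet="purpurea", authorship="(L.) Pennell"))
# {'scientificName': 'Agalinis purpurea', 'scientificNameAuthorship': '(L.) Pennell'}
```

Note that case 2 leaves authorship inside scientificName, which is why the definition change below is needed.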

Cases 3 and 4 require modification of the definition at
http://rs.tdwg.org/dwc/terms/index.htm#scientificName
  to be something like

"The full scientific name, which may include authorship and date
information if known..." with the implicit intention that it is not
REQUIRED to parse or semi-parse an unparsed name in order to properly
share it.

David

On Nov 23, 2010, at 12:35 PM, <Tony.Rees at csiro.au> wrote:

> David Remsen wrote:
>
>> Maybe we shouldn't add canonical name but rather something more
>> specific to the concatenated form, like
>> dwc:scientificNameWithAuthorshipAndOtherBits
>>  dwc:scientificName
>>  dwc:scientificNameAuthorship
>
> If by "dwc:scientificName" you mean with authorship omitted, that is
> fine; however, it would need the dwc definition to be altered...
>
> Then at least folk would/should know which field to populate.
> However, the mandatory yes/no issue would also have to be addressed -
> at present I think dwc:scientificName is the only taxonomy-related
> element that is mandatory; all others are optional. Under your
> scenario it would then be either of the first two fields, or both
> as available, I guess?
>
> Regards - Tony
>
> ________________________________________
> From: David Remsen (GBIF) [dremsen at gbif.org]
> Sent: Tuesday, 23 November 2010 7:47 PM
> To: Rees, Tony (CMAR, Hobart)
> Cc: David Remsen (GBIF); deepreef at bishopmuseum.org; m.doering at mac.com;
> tdwg-content at lists.tdwg.org; dmozzherin at eol.org
> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC 
> scientificName: good or bad?
>
> While I haven't seen them all,  I have seen and had to understand a 
> good number of biodiversity databases including many focused on
> managing species lists in one form or another.   Names are represented
> in these three forms.
>
> 1. Completely unparsed where the entire verbose name text is in a
> single field corresponding to dwc:scientificName.   In some databases
> this means just a scientific name as many databases don't hold 
> authorship information.
>
> 2. Semi-parsed, where the canonical name is separated from the
> authorship information, corresponding to the proposed canonicalName and
> dwc:scientificNameAuthorship
>
> 3. Fully parsed into atoms (genus, specific epithet, infraspecific
> rank, infraspecies, authorship) corresponding to the incomplete set of
> dwc atomic elements already in existence. This form is the most
> problematic because 1) it isn't always clear from the parts how the
> actual complete name is intended to be represented, 2) there are so
> many structural exceptions and complexities that many more 'atoms'
> need to be described to effectively enable it to be used, and 3) there
> is the problematic definition of the use of Genus, as described by
> Markus, that conflicts with atomising synonyms.
>
> It makes sense to maintain the separation of name and authorship in
> data sources that already do this, but I'm not convinced a canonicalName
> element is required. It seems to be suggested because it makes it
> easier to consume the data, but it also means it's more confusing for
> a typical data manager or biologist to produce it. I have a database
> with binomials alone. How many data managers or biologists will map
> them to canonicalName before scientificName? I know we want to avoid
> testing different conditions when we use the data, but we will have to
> in either case.
>
> Maybe we shouldn't add canonical name but rather something more
> specific to the concatenated form, like
>
> dwc:scientificNameWithAuthorshipAndOtherBits
> dwc:scientificName
> dwc:scientificNameAuthorship
>
> I'd know what to do then
>
> DR
>
> On Nov 22, 2010, at 11:18 PM, <Tony.Rees at csiro.au> 
> <Tony.Rees at csiro.au> wrote:
>
>> Hi Rich, all,
>>
>> You wrote:
>>> Otherwise, we could argue forever about which of the dozen possible 
>>> forms we think DwC needs a term for.
>>
>> No, I think that is muddying the waters (with respect, of course...). I
>> simply made the case for "canonicalName" - aka scientific name
>> without authorship - as a valuable adjunct to "scientificName", for 
>> users who can supply both, and consumers who would otherwise have to 
>> generate the former from the latter algorithmically. Markus, Dima 
>> probably represent the main "consumers" here, and I, if you like, can
>> represent a "provider" (although I wear other "consumer" hats on
>> occasion as well). Basically, if a "canonicalName" field does not
>> exist, I will simply not provide this information, which seems
>> suboptimal since it all exists pre-parsed and manually verified in my
>> system, and someone else will then have to do the job again...
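For illustration, the kind of algorithmic regeneration a consumer would otherwise need might start like the naive Python heuristic below. It is not how any particular service does it: the set of rank markers is an assumption, and lower-case author particles such as "de" or "van" would be wrongly kept as epithets, which is part of why a pre-parsed, manually verified canonical name is worth having.

```python
import re

# Assumed set of lower-case rank markers that belong to the name, not the authorship.
RANK_MARKERS = {"subsp.", "ssp.", "var.", "subvar.", "f.", "forma"}
# An epithet is taken to be an all-lower-case word, optionally hyphenated.
EPITHET = re.compile(r"[a-z][a-z-]*")


def naive_canonical(scientific_name: str) -> str:
    """Strip authorship from a name string by a simple (and fallible) heuristic.

    Keeps the first token (uninomial or genus), then any lower-case epithets
    and rank markers, and stops at the first other token, e.g. "(L.)" or an
    author's capitalised surname.
    """
    tokens = scientific_name.split()
    if not tokens:
        return ""
    kept = [tokens[0]]
    for token in tokens[1:]:
        if EPITHET.fullmatch(token) or token in RANK_MARKERS:
            kept.append(token)
        else:
            break
    return " ".join(kept)
```

With this heuristic, "Agalinis purpurea (L.) Pennell" yields "Agalinis purpurea".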
>>
>> Regards - Tony
>>
>>
>>> -----Original Message-----
>>> From: Richard Pyle [mailto:deepreef at bishopmuseum.org]
>>> Sent: Tuesday, 23 November 2010 7:06 AM
>>> To: Rees, Tony (CMAR, Hobart); m.doering at mac.com
>>> Cc: tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>>> Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in 
>>> DwC
>>> scientificName: good or bad?
>>>
>>>> "uninomial" would equal "canonicalName" for ranks subgenus and
>>>> above, but not for species and below, while canonicalName (or
>>>> scientificNameCanonical if you prefer) covers all cases, which is
>>>> why I think it is preferable, especially as the majority of names in
>>>> circulation are at species level and below, I think...
>>>>
>>>> Atomising further, i.e. a binomial or polynomial into genus, species
>>>> and infraspecies, is actually a separate activity with its own
>>>> rationale, I would say.
>>>>
>>>> Just my personal view, of course...
>>>
>>> The cleanest way to do it is to simply have Rank, NameElement and
>>> parentNameUsageID, and be done with it (maybe with the addition of
>>> verbatimNameString for purists). But that assumes that providers
>>> have parsed data, which they often do not. Maybe with services like
>>> those associated with GNI, the days of databases with unparsed name
>>> data are drawing to a close. Or, maybe if GNUB gets a foothold,
>>> we'll solve all the problems via a simply actionable persistent
>>> identifier.
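A rough Python sketch of the Rank / NameElement / parentNameUsageID model: the canonical name is rebuilt by walking parent links up to the genus. Only parentNameUsageID is an existing DwC term here; the field names, the set of name-building ranks and the rank markers are assumptions, subgenera are ignored, and the walk assumes each parent link points at the name's own species or genus (the Genus/synonym issue mentioned elsewhere in this thread would break that).

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed: ranks whose NameElement appears in a binomial or trinomial.
NAME_BUILDING_RANKS = {"genus", "species", "subspecies", "variety", "form"}
# Assumed rank markers written before infraspecific elements.
RANK_MARKERS = {"subspecies": "subsp.", "variety": "var.", "form": "f."}


@dataclass
class NameUsage:
    name_usage_id: str
    rank: str
    name_element: str
    parent_name_usage_id: Optional[str] = None


def assemble_name(usage_id: str, usages: Dict[str, NameUsage]) -> str:
    """Rebuild a canonical name by walking parentNameUsageID up to the genus."""
    start = usages[usage_id]
    if start.rank not in NAME_BUILDING_RANKS:
        # Uninomials above genus (family, order, ...) are just their element.
        return start.name_element
    parts = []
    current: Optional[NameUsage] = start
    while current is not None:
        if current.rank in NAME_BUILDING_RANKS:
            parts.append(current.name_element)
            marker = RANK_MARKERS.get(current.rank)
            if marker:
                parts.append(marker)
        if current.rank == "genus":
            break
        parent_id = current.parent_name_usage_id
        current = usages.get(parent_id) if parent_id else None
    # Parts were collected leaf-first, so reverse them.
    return " ".join(reversed(parts))


usages = {
    "1": NameUsage("1", "genus", "Agalinis"),
    "2": NameUsage("2", "species", "purpurea", "1"),
}
print(assemble_name("2", usages))  # Agalinis purpurea
```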
>>>
>>> But until that time, dwc needs to find a balance between users who 
>>> want pre-parsed data, and providers who do not have pre-parsed data.
>>>
>>> I think dwc *almost* accommodates both worlds, as long as
>>> scientificName is defined as "the complete set of textual elements 
>>> useful for recognizing a unique scientific name"; which is either 
>>> concatenated by the provider with parsed data, or simply "provided" 
>>> by the provider with unparsed data.
>>>
>>> What we seem to be arguing about now is how many different forms of
>>> a "formatted" name we want.
>>>
>>> With or without authorship?
>>>
>>> With or without year?
>>>
>>> With or without infraspecific prefixes ("var.", "f." etc.)?
>>>
>>> With or without infrageneric name(s)?
>>>
>>> With or without italics codes?
>>>
>>> With or without qualifiers like "cf.", "aff.", etc.?
>>>
>>> Etc.
>>>
>>> Etc.
>>>
>>> Etc.
>>>
>>> There are potentially dozens of different terms we could define to 
>>> accommodate every particular niche-need.
>>>
>>> Personally, I think that the existing "scientificName" should be 
>>> split into two different terms:
>>>
>>> fullScientificNameStringWithAuthorship
>>> And
>>> verbatimNameString
>>>
>>> The first would be a concatenated text string assembled from parsed 
>>> bits, according to a community standard concatenation form.
>>>
>>> The second would be the literal text string as it appeared in the 
>>> original source.
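The two proposed terms could be populated side by side as in the Python sketch below. The concatenation rule used (name parts, then authorship, separated by single spaces) is an assumed stand-in for the "community standard concatenation form", which is not spelled out here; both term names are Rich's proposal, not current DwC terms.

```python
from typing import Dict, List, Optional


def split_name_terms(parsed_parts: List[str], authorship: Optional[str],
                     verbatim: str) -> Dict[str, str]:
    """Populate the two proposed terms from one source record.

    The concatenation rule is an assumption, not a published standard.
    """
    # Drop empty parts and normalise whitespace around each piece.
    pieces = [p.strip() for p in parsed_parts if p and p.strip()]
    if authorship and authorship.strip():
        pieces.append(authorship.strip())
    return {
        # Assembled from parsed bits in the assumed standard form.
        "fullScientificNameStringWithAuthorship": " ".join(pieces),
        # Kept exactly as it appeared in the source, untouched.
        "verbatimNameString": verbatim,
    }
```

The verbatim string is carried through even where it differs in spacing or punctuation from the assembled form, which is the point of keeping the two apart.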
>>>
>>> Otherwise, we could argue forever about which of the dozen possible 
>>> forms we think DwC needs a term for.
>>>
>>> Rich
>>>
>>
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>