[tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Chuck Miller Chuck.Miller at mobot.org
Tue Nov 23 22:23:18 CET 2010


Yeh.  What's the binomial for a kill-joy?

Chuck

-----Original Message-----
From: Bob Morris [mailto:morris.bob at gmail.com] 
Sent: Tuesday, November 23, 2010 2:21 PM
To: Chuck Miller
Cc: Tony.Rees at csiro.au; dremsen at gbif.org; tdwg-content at lists.tdwg.org; dmozzherin at eol.org
Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Aw, Chuck, you are such a kill-joy.  We should never do anything until it is perfectly consistent.  :-)


On Tue, Nov 23, 2010 at 3:00 PM, Chuck Miller <Chuck.Miller at mobot.org> wrote:
> What is the specific objection to adding canonicalName to DwC as an 
> optional element, other than the fact it makes DwC one thing larger?
>
> There are databases which do not have their names parsed and provide 
> whatever they have recorded as ScientificName.  But, there are also 
> databases which do have parsed names and could provide this more 
> narrowly defined element, in addition to the ScientificName.  Those 
> databases could make use of a  dwc:canonicalName element in their data 
> exchange or query response.
>
> What we don't have and I think never will have is perfectly consistent 
> names data from every database in the world.  One reason is a mountain 
> of inconsistently recorded legacy data from decades past that stands 
> in the way of perfection.  Another is variation in convention or 
> tradition for a variety of reasons that have been explored in these 
> recent threads. So, I think the pragmatic approach is to accept the 
> inconsistencies and work around them.
>
> Chuck
>
> -----Original Message-----
> From: tdwg-content-bounces at lists.tdwg.org
> [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of 
> Tony.Rees at csiro.au
> Sent: Tuesday, November 23, 2010 1:40 PM
> To: dremsen at gbif.org
> Cc: tdwg-content at lists.tdwg.org; dmozzherin at eol.org
> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC
> scientificName: good or bad?
>
> Hi David,
>
> It seems to me that your suggestion is still not quite ideal, in that 
> sometimes just the dwc:scientificName element will be picked up and 
> passed around and the content will not be consistent between those 
> suppliers who concatenate the available authority info and those who 
> do not. That suggests to me that an extra field for known 
> canonicalName if this can be supplied is still desirable - but I am 
> not sure if I am alone in thinking this...
>
> Regards - Tony
>
> ________________________________________
> From: David Remsen (GBIF) [dremsen at gbif.org]
> Sent: Tuesday, 23 November 2010 11:15 PM
> To: Rees, Tony (CMAR, Hobart)
> Cc: David Remsen (GBIF); deepreef at bishopmuseum.org; m.doering at mac.com; 
> tdwg-content at lists.tdwg.org; dmozzherin at eol.org
> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC
> scientificName: good or bad?
>
> Tony
>
> I did indeed mean that scientificName and authorship could be used in 
> the following way
>
> 1.  "Agalinis purpurea"  -> scientificName ("Agalinis purpurea")
>   - where a canonical form of the name with no authorship is in the 
> source data
>
> 2.  "Agalinis purpurea (L.) Pennell"  -> scientificName ("Agalinis 
> purpurea (L.) Pennell" )
> - where an unparsed name+author is in the source data
>
> 3.  "Agalinis purpurea" AND "(L.) Pennell"  -> scientificName 
> ("Agalinis purpurea") + scientificNameAuthorship ("(L.) Pennell")
> - where a semi-parsed name + author is in the source data
>
> 4. "Agalinis" AND "purpurea" AND "(L.) Pennell"  -> scientificName 
> ("Agalinis purpurea") + scientificNameAuthorship ("(L.) Pennell")
> - where a fully atomised name is in the source data and the 'name'
> parts are concatenated to make a proper canonical name.
>
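A minimal sketch of the four mappings David lists above, assuming a source record arrives as some combination of a whole name string, separate genus and epithet parts, and a separate authorship string. The function name `to_dwc` and its parameters are hypothetical; only the two DwC term names come from the thread.

```python
from typing import Optional


def to_dwc(
    name: Optional[str] = None,
    genus: Optional[str] = None,
    epithet: Optional[str] = None,
    authorship: Optional[str] = None,
) -> dict:
    """Map one source record to DwC terms (illustrative only)."""
    # Case 4: fully atomised source, so join the name parts into one string.
    if name is None and genus is not None:
        name = " ".join(part for part in (genus, epithet) if part)

    # Cases 1 and 2: whatever single string the source holds is passed
    # through unchanged, with or without embedded authorship.
    record = {"scientificName": name}

    # Cases 3 and 4: authorship held separately goes into its own term.
    if authorship:
        record["scientificNameAuthorship"] = authorship
    return record


print(to_dwc(name="Agalinis purpurea (L.) Pennell"))
print(to_dwc(genus="Agalinis", epithet="purpurea", authorship="(L.) Pennell"))
```

Note that nothing here parses an unparsed name: case 2 is shared as-is, which is the point of the proposed change to the definition.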
> Cases 3 and 4 require modification of the definition at 
> http://rs.tdwg.org/dwc/terms/index.htm#scientificName
>  to be something like
>
> "The full scientific name, which may include authorship and date 
> information if known..." with the implicit intention that it is not 
> REQUIRED to parse or semi-parse an unparsed name in order to properly 
> share it.
>
> David
>
> On Nov 23, 2010, at 12:35 PM, <Tony.Rees at csiro.au> wrote:
>
>> David Remsen wrote:
>>
>>> Maybe we shouldn't add canonical name but rather something more 
>>> specific to the concatenated form like 
>>> dwc:scientificNameWithAuthorshipAndOtherBits
>>>  dwc:scientificName
>>>  dwc:scientificNameAuthorship
>>
>> If by "dwc:scientificName" you mean with authorship omitted, that is 
>> fine, however it would need the dwc definition to be altered...
>>
>> Then at least folk would/should know which field to populate.
>> However the mandatory yes/no issue would also have to be addressed - 
>> at present I think dwc:scientificName is the only taxonomy related 
>> element that is mandatory, all others are optional. Under your 
>> scenario it would then maybe be one of either of the first 2 fields, 
>> or both as available, I guess?
>>
>> Regards - Tony
>>
>> ________________________________________
>> From: David Remsen (GBIF) [dremsen at gbif.org]
>> Sent: Tuesday, 23 November 2010 7:47 PM
>> To: Rees, Tony (CMAR, Hobart)
>> Cc: David Remsen (GBIF); deepreef at bishopmuseum.org; 
>> m.doering at mac.com; tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC
>> scientificName: good or bad?
>>
>> While I haven't seen them all,  I have seen and had to understand a 
>> good number of biodiversity databases including many focused on 
>> managing species lists in one form or another.   Names are 
>> represented in these three forms.
>>
>> 1. Completely unparsed where the entire verbose name text is in a 
>> single field corresponding to dwc:scientificName.   In some databases 
>> this means just a scientific name as many databases don't hold 
>> authorship information.
>>
>> 2. Semi-parsed where the canonical name is separated from the 
>> authorship information corresponding to the proposed canonicalName 
>> and dwc:scientificNameAuthorship
>>
>> 3. Fully parsed into atoms (genus, specific epithet, infraspecific 
>> rank, infraspecies, authorship) corresponding to the incomplete set 
>> of dwc atomic elements already in existence.    This form is the most 
>> problematic because 1) it isn't always clear from the parts how the 
>> actual complete name is intended to be represented, 2) there are so 
>> many structural exceptions and complexities that many more 'atoms'
>> need to be described to effectively enable it to be used, and 3) there 
>> is the problematic definition of the use of Genus as described by 
>> Markus that conflicts with atomising synonyms.
>>
>> It makes sense to maintain the separation of name and authorship in 
>> data sources that already do this but I'm not convinced a 
>> canonicalName element is required.   It seems that it is suggested so 
>> that it makes it easier to consume the data but it also means it's 
>> more confusing for a typical data manager or biologist to produce it.   
>> I have a database with binomials alone.  How many data managers or 
>> biologists will map them to canonicalName before scientificName?   I 
>> know we want to avoid testing different conditions when we use the 
>> data but we will have to in either case.
>>
>> Maybe we shouldn't add canonical name but rather something more 
>> specific to the concatenated form like
>>
>> dwc:scientificNameWithAuthorshipAndOtherBits
>> dwc:scientificName
>> dwc:scientificNameAuthorship
>>
>> I'd know what to do then
>>
>> DR
>>
>> On Nov 22, 2010, at 11:18 PM, <Tony.Rees at csiro.au> 
>> <Tony.Rees at csiro.au> wrote:
>>
>>> Hi Rich, all,
>>>
>>> You wrote:
>>>
>>>> Otherwise, we could argue forever about which of the dozen possible 
>>>> forms we think DwC needs a term for.
>>>
>>> No, I think that is muddying the waters (with respect of course...). 
>>> I simply made the case for "canonicalName" - aka scientific name 
>>> without authorship - as a valuable adjunct to "scientificName", for 
>>> users who can supply both, and consumers who would otherwise have to 
>>> generate the former from the latter algorithmically. Markus and Dima 
>>> probably represent the main "consumers" here and I, if you like, can 
>>> represent a "provider" (although I wear other "consumer" hats on 
>>> occasion as well). Basically if a "canonicalName" field does not 
>>> exist, I will just omit to provide this information, which seems 
>>> sub-optimal since it all exists pre-parsed and manually verified in my 
>>> system, and someone else will then have to do the job again...
>>>
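To illustrate what "generate the former from the latter algorithmically" costs a consumer, here is a deliberately naive sketch that strips authorship from a scientificName string. It is not the parser used by any of the services mentioned in this thread; the function name, the rank-marker list and the epithet pattern are all assumptions made for the example.

```python
import re

# Assumption: epithets are lowercase words, optionally hyphenated.
EPITHET = re.compile(r"^[a-z][a-z-]+$")
# Assumption: a small, incomplete set of infraspecific rank markers.
RANK_MARKERS = {"var.", "subsp.", "ssp.", "f."}


def naive_canonical(scientific_name: str) -> str:
    """Keep the leading genus plus any following lowercase epithets."""
    tokens = scientific_name.split()
    if not tokens:
        return ""
    kept = [tokens[0]]  # first token is taken to be the genus or uninomial
    for token in tokens[1:]:
        if token in RANK_MARKERS:
            continue  # drop the marker, keep scanning for the next epithet
        if not EPITHET.match(token):
            break  # first token that is not an epithet starts the authorship
        kept.append(token)
    return " ".join(kept)


print(naive_canonical("Agalinis purpurea (L.) Pennell"))
```

A heuristic like this breaks on authorship that begins with a lowercase word (for example "de" or "ex"), on hybrid formulas and on qualifiers such as "cf.", which is the argument for letting providers that already hold a verified canonical form supply it directly.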
>>> Regards - Tony
>>>
>>>
>>>> -----Original Message-----
>>>> From: Richard Pyle [mailto:deepreef at bishopmuseum.org]
>>>> Sent: Tuesday, 23 November 2010 7:06 AM
>>>> To: Rees, Tony (CMAR, Hobart); m.doering at mac.com
>>>> Cc: tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>>>> Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC
>>>> scientificName: good or bad?
>>>>
>>>>> "uninomial" would equal "canonicalName" for ranks subgenus and 
>>>>> above, but not for species and below, while canonicalName (or 
>>>>> scientificNameCanonical if you prefer) covers all cases, which is 
>>>>> why I think it is preferable, especially as the majority of names 
>>>>> in circulation are at species level and below I think...
>>>>>
>>>>> Atomising further i.e. a binomial or polynomial into genus, 
>>>>> species, infraspecies is actually a separate activity with its own 
>>>>> rationale, I would say.
>>>>>
>>>>> Just my personal view, of course...
>>>>
>>>> The cleanest way to do it is to simply have Rank, NameElement and 
>>>> parentNameUsageID, and be done with it (maybe with the addition of 
>>>> verbatimNameString for purists).  But that assumes that providers 
>>>> have parsed data, which they often do not.  Maybe with services 
>>>> like those associated with GNI, the time of databases with unparsed 
>>>> names data is drawing to a close.  Or, maybe if GNUB gets a foot-hold, 
>>>> we'll solve all the problems via a simply actionable persistent 
>>>> identifier.
>>>>
>>>> But until that time, dwc needs to find a balance between users who 
>>>> want pre-parsed data, and providers who do not have pre-parsed data.
>>>>
>>>> I think dwc *almost* accommodates both worlds, as long as 
>>>> scientificName is defined as "the complete set of textual elements 
>>>> useful for recognizing a unique scientific name"; which is either 
>>>> concatenated by the provider with parsed data, or simply "provided"
>>>> by the provider with unparsed data.
>>>>
>>>> What we seem to be arguing about now is how many different forms of 
>>>> a "formatted" name do we want?
>>>>
>>>> With or without authorship?
>>>>
>>>> With or without year?
>>>>
>>>> With or without infraspecific prefixes ("var.", "f." etc.)?
>>>>
>>>> With or without infrageneric name(s)?
>>>>
>>>> With or without italics codes?
>>>>
>>>> With or without qualifiers like "cf.", "aff.", etc.?
>>>>
>>>> Etc.
>>>>
>>>> Etc.
>>>>
>>>> Etc.
>>>>
>>>> There are potentially dozens of different terms we could define to 
>>>> accommodate every particular niche-need.
>>>>
>>>> Personally, I think that the existing "scientificName" should be 
>>>> split into two different terms:
>>>>
>>>> fullScientificNameStringWithAuthorship
>>>> And
>>>> verbatimNameString
>>>>
>>>> The first would be a concatenated text string assembled from parsed 
>>>> bits, according to a community standard concatenation form.
>>>>
>>>> The second would be the literal text string as it appeared in the 
>>>> original source.
>>>>
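A small sketch of the split Rich proposes above: one string assembled from parsed bits in a fixed order, and one literal string kept exactly as the source had it. The class and field names are hypothetical, and the concatenation order shown is only an assumed convention, since the thread does not define the community standard form.

```python
from dataclasses import dataclass


@dataclass
class ParsedName:
    """Parsed name parts plus the untouched source string (illustrative)."""
    genus: str
    specific_epithet: str = ""
    rank_marker: str = ""            # e.g. "var.", only if the source has one
    infraspecific_epithet: str = ""
    authorship: str = ""
    verbatim_name_string: str = ""   # literal text as it appeared in the source


def full_scientific_name_string_with_authorship(n: ParsedName) -> str:
    """Join the non-empty parts with single spaces, in a fixed order."""
    parts = [
        n.genus,
        n.specific_epithet,
        n.rank_marker,
        n.infraspecific_epithet,
        n.authorship,
    ]
    return " ".join(p for p in parts if p)


name = ParsedName(
    genus="Agalinis",
    specific_epithet="purpurea",
    authorship="(L.) Pennell",
    verbatim_name_string="Agalinis purpurea(L.)Pennell",
)
print(full_scientific_name_string_with_authorship(name))
print(name.verbatim_name_string)
```

A provider with unparsed data would fill only the verbatim field, while a provider with parsed data could supply both.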
>>>> Otherwise, we could argue forever about which of the dozen possible 
>>>> forms we think DwC needs a term for.
>>>>
>>>> Rich
>>>>
>>>
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>
>



--
Robert A. Morris
Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: morris.bob at gmail.com
web: http://bdei.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)

