[tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Bob Morris morris.bob at gmail.com
Tue Nov 23 21:20:51 CET 2010


Aw, Chuck, you are such a kill-joy.  We should never do anything until
it is perfectly consistent.  :-)


On Tue, Nov 23, 2010 at 3:00 PM, Chuck Miller <Chuck.Miller at mobot.org> wrote:
> What is the specific objection to adding canonicalName to DwC as an
> optional element, other than the fact it makes DwC one thing larger?
>
> There are databases which do not have their names parsed and provide
> whatever they have recorded as ScientificName.  But, there are also
> databases which do have parsed names and could provide this more
> narrowly defined element, in addition to the ScientificName.  Those
> databases could make use of a  dwc:canonicalName element in their data
> exchange or query response.
>
> What we don't have and I think never will have is perfectly consistent
> names data from every database in the world.  One reason is a mountain
> of inconsistently recorded legacy data from decades past that stands in
> the way of perfection.  Another is variation in convention or tradition
> for a variety of reasons that have been explored in these recent
> threads. So, I think the pragmatic approach is to accept the
> inconsistencies and work around them.
>
> Chuck
>
> -----Original Message-----
> From: tdwg-content-bounces at lists.tdwg.org
> [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of
> Tony.Rees at csiro.au
> Sent: Tuesday, November 23, 2010 1:40 PM
> To: dremsen at gbif.org
> Cc: tdwg-content at lists.tdwg.org; dmozzherin at eol.org
> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC
> scientificName: good or bad?
>
> Hi David,
>
> It seems to me that your suggestion is still not quite ideal, in that
> sometimes just the dwc:scientificName element will be picked up and
> passed around and the content will not be consistent between those
> suppliers who concatenate the available authority info and those who do
> not. That suggests to me that an extra field for known canonicalName if
> this can be supplied is still desirable - but I am not sure if I am
> alone in thinking this...
>
> Regards - Tony
>
> ________________________________________
> From: David Remsen (GBIF) [dremsen at gbif.org]
> Sent: Tuesday, 23 November 2010 11:15 PM
> To: Rees, Tony (CMAR, Hobart)
> Cc: David Remsen (GBIF); deepreef at bishopmuseum.org; m.doering at mac.com;
> tdwg-content at lists.tdwg.org; dmozzherin at eol.org
> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC
> scientificName: good or bad?
>
> Tony
>
> I did indeed mean that scientificName and authorship could be used in
> the following way
>
> 1.  "Agalinis purpurea"  -> scientificName ("Agalinis purpurea")
>   - where a canonical form of the name with no authorship is in the
> source data
>
> 2.  "Agalinis purpurea (L.) Pennell"  -> scientificName ("Agalinis
> purpurea (L.) Pennell" )
> - where an unparsed name+author is in the source data
>
> 3.  "Agalinis purpurea" AND "(L.) Pennell"  -> scientificName ("Agalinis
> purpurea") + scientificNameAuthorship ("(L.) Pennell")
> - where a semi-parsed name + author is in the source data
>
> 4. "Agalinis" AND "purpurea" AND "(L.) Pennell"  -> scientificName
> ("Agalinis purpurea") + scientificNameAuthorship ("(L.) Pennell")
> - where a fully atomised name is in the source data and the 'name'
> parts are concatenated to make a proper canonical name.
>
> Cases 3 and 4 require modification of the definition at
> http://rs.tdwg.org/dwc/terms/index.htm#scientificName
>  to be something like
>
> "The full scientific name, which may include authorship and date
> information if known..." with the implicit intention that it is not
> REQUIRED to parse or semi-parse an unparsed name in order to properly
> share it.
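
A minimal Python sketch of the four cases above. The source keys ("name", "authorship", "genus", "specificEpithet", "infraspecificEpithet") and the function name are hypothetical and only stand in for whatever columns a provider database actually has. Only scientificName and scientificNameAuthorship come from the thread.

```python
def to_dwc(record: dict) -> dict:
    """Map one source record to the two Darwin Core name terms (illustrative only)."""
    authorship = (record.get("authorship") or "").strip()
    if record.get("genus"):
        # Case 4: fully atomised source; join the name parts into a canonical name.
        parts = [
            record.get("genus"),
            record.get("specificEpithet"),
            record.get("infraspecificEpithet"),
        ]
        name = " ".join(p.strip() for p in parts if p and p.strip())
    else:
        # Cases 1-3: pass the name string through as-is, with no attempt to parse it.
        name = (record.get("name") or "").strip()
    out = {"scientificName": name}
    if authorship:
        # Cases 3 and 4: authorship is held separately in the source.
        out["scientificNameAuthorship"] = authorship
    return out


print(to_dwc({"name": "Agalinis purpurea (L.) Pennell"}))
print(to_dwc({"genus": "Agalinis", "specificEpithet": "purpurea",
              "authorship": "(L.) Pennell"}))
```

Note that case 2 is never parsed here, which matches the stated intention that a provider is not required to parse a name in order to share it.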
>
> David
>
> On Nov 23, 2010, at 12:35 PM, <Tony.Rees at csiro.au> wrote:
>
>> David Remsen wrote:
>>
>>> Maybe we shouldn't add canonical name but rather something more
>>> specific to the concatenated form like
>>> dwc:scientificNameWithAuthorshipAndOtherBits
>>>  dwc:scientificName
>>>  dwc:scientificNameAuthorship
>>
>> If by "dwc:scientificName" you mean with authorship omitted, that is
>> fine, however it would need the dwc definition to be altered...
>>
>> Then at least folk would/should know which field to populate.
>> However the mandatory yes/no issue would also have to be addressed -
>> at present I think dwc:scientificName is the only taxonomy related
>> element that is mandatory, all others are optional. Under your
>> scenario it would then maybe be one of either of the first 2 fields,
>> or both as available, I guess?
>>
>> Regards - Tony
>>
>> ________________________________________
>> From: David Remsen (GBIF) [dremsen at gbif.org]
>> Sent: Tuesday, 23 November 2010 7:47 PM
>> To: Rees, Tony (CMAR, Hobart)
>> Cc: David Remsen (GBIF); deepreef at bishopmuseum.org; m.doering at mac.com;
>> tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>> Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC
>> scientificName: good or bad?
>>
>> While I haven't seen them all,  I have seen and had to understand a
>> good number of biodiversity databases including many focused on
>> managing species lists in one form or another.   Names are represented
>> in these three forms.
>>
>> 1. Completely unparsed where the entire verbose name text is in a
>> single field corresponding to dwc:scientificName.   In some databases
>> this means just a scientific name as many databases don't hold
>> authorship information.
>>
>> 2. Semi-parsed where the canonical name is separated from the
>> authorship information corresponding to the proposed canonicalName and
>> dwc:scientificNameAuthorship
>>
>> 3. Fully parsed into atoms (genus, specific epithet, infraspecific
>> rank, infraspecies, authorship) corresponding to the incomplete set of
>> dwc atomic elements already in existence.    This form is the most
>> problematic because 1) it isn't always clear from the parts how the
>> actual complete name is intended to be represented, 2) there are so
>> many structural exceptions and complexities that many more 'atoms'
>> need to be described to effectively enable it to be used, and 3) there
>> is the problematic definition of the use of Genus as described by
>> Markus that conflicts with atomising synonyms.
>>
>> It makes sense to maintain the separation of name and authorship in
>> data sources that already do this but I'm not convinced a canonicalName
>> element is required.   It seems that it is suggested so that it makes
>> it easier to consume the data but it also means it's more confusing for
>> a typical data manager or biologist to produce it.   I have a database
>> with binomials alone.  How many data managers or biologists will map
>> them to canonicalName before scientificName?   I know we want to avoid
>> testing different conditions when we use the data but we will have to
>> in either case.
>>
>> Maybe we shouldn't add canonical name but rather something more
>> specific to the concatenated form like
>>
>> dwc:scientificNameWithAuthorshipAndOtherBits
>> dwc:scientificName
>> dwc:scientificNameAuthorship
>>
>> I'd know what to do then
>>
>> DR
>>
>> On Nov 22, 2010, at 11:18 PM, <Tony.Rees at csiro.au>
>> <Tony.Rees at csiro.au> wrote:
>>
>>> Hi Rich, all,
>>>
>>> You wrote:
>>>
>>>> Otherwise, we could argue forever about which of the dozen possible
>>>> forms we think DwC needs a term for.
>>>
>>> No, I think that is muddying the waters (with respect of course...) I
>>> simply made the case for "canonicalName" - aka scientific name
>>> without authorship - as a valuable adjunct to "scientificName", for
>>> users who can supply both, and consumers who would otherwise have to
>>> generate the former from the latter algorithmically. Markus and Dima
>>> probably represent the main "consumers" here and I, if you like, can
>>> represent a "provider" (although I wear other "consumer" hats on
>>> occasion as well). Basically if a "canonicalName" field does not
>>> exist, I will just omit to provide this information, which seems
>>> sub-optimal since it all exists pre-parsed and manually verified in my
>>> system, and someone else will then have to do the job again...
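
To make concrete what "generate the former from the latter algorithmically" costs a consumer, here is a deliberately naive Python heuristic. It is not how any existing name parser works. The function name and the list of rank markers are assumptions made for illustration.

```python
# Assumed, incomplete list of infraspecific rank markers.
RANK_MARKERS = {"subsp.", "ssp.", "var.", "subvar.", "f.", "subf."}


def naive_canonical(scientific_name: str) -> str:
    """Guess the name without authorship from a verbose name string.

    Keeps the first token (genus or uninomial), then every following token
    that is a rank marker or a lowercase alphabetic word (an epithet), and
    stops at the first token that looks like authorship.
    """
    tokens = scientific_name.split()
    if not tokens:
        return ""
    kept = [tokens[0]]
    for tok in tokens[1:]:
        is_epithet = tok[0].islower() and tok.replace("-", "").isalpha()
        if tok in RANK_MARKERS or is_epithet:
            kept.append(tok)
        else:
            break  # first capitalised or parenthesised token: assume authorship
    return " ".join(kept)


print(naive_canonical("Agalinis purpurea (L.) Pennell"))  # Agalinis purpurea
```

The heuristic breaks on easy cases: an author that starts with a lowercase particle such as "de" is kept as if it were an epithet, "f." is ambiguous between forma and filius, and a subgenus in parentheses stops the scan early. That fragility is the argument for letting a provider with verified parsed data supply the value directly.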
>>>
>>> Regards - Tony
>>>
>>>
>>>> -----Original Message-----
>>>> From: Richard Pyle [mailto:deepreef at bishopmuseum.org]
>>>> Sent: Tuesday, 23 November 2010 7:06 AM
>>>> To: Rees, Tony (CMAR, Hobart); m.doering at mac.com
>>>> Cc: tdwg-content at lists.tdwg.org; dmozzherin at eol.org
>>>> Subject: RE: [tdwg-content] [tdwg-tag] Inclusion of authorship in
>>>> DwC
>>>> scientificName: good or bad?
>>>>
>>>>> "uninomial" would equal "canonicalName" for ranks subgenus and
>>>>> above, but not for species and below, while canonicalName (or
>>>>> scientificNameCanonical if you prefer) covers all cases, which is
>>>>> why I think it is preferable, especially as the majority of names in
>>>>> circulation are at species level and below I think...
>>>>>
>>>>> Atomising further i.e. a binomial or polynomial into genus, species,
>>>>> infraspecies is actually a separate activity with its own rationale,
>>>>> I would say.
>>>>>
>>>>> Just my personal view, of course...
>>>>
>>>> The cleanest way to do it is to simply have Rank, NameElement and
>>>> parentNameUsageID, and be done with it (maybe with the addition of
>>>> verbatimNameString for purists).  But that assumes that providers
>>>> have parsed data, which they often do not.  Maybe with services like
>>>> those associated with GNI, the time of databases with unparsed names
>>>> data is drawing to a close.  Or, maybe if GNUB gets a foot-hold,
>>>> we'll solve all the problems via a simply actionable persistent
>>>> identifier.
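
A rough Python sketch of the Rank / NameElement / parentNameUsageID model described above, showing how a name string can be rebuilt by walking parent links. The record type, the rank vocabulary and the function name are assumptions for illustration and do not reflect an agreed DwC structure.

```python
from typing import Dict, List, NamedTuple, Optional

# Assumed set of ranks whose names need the parent chain to be complete.
BELOW_GENUS = {"species", "subspecies", "variety", "form"}


class NameUsage(NamedTuple):
    id: str
    rank: str
    name_element: str
    parent_id: Optional[str] = None  # plays the role of parentNameUsageID


def name_elements(usage_id: str, usages: Dict[str, NameUsage]) -> List[str]:
    """Collect name elements from the usage up to its genus, in display order."""
    chain: List[str] = []
    seen = set()  # guards against accidental cycles in parent links
    current = usages.get(usage_id)
    while current is not None and current.id not in seen:
        seen.add(current.id)
        chain.append(current.name_element)
        if current.rank not in BELOW_GENUS:
            break  # genus or higher: the name is complete
        current = usages.get(current.parent_id)
    return chain[::-1]


usages = {
    u.id: u
    for u in [
        NameUsage("g1", "genus", "Agalinis"),
        NameUsage("s1", "species", "purpurea", "g1"),
    ]
}
print(" ".join(name_elements("s1", usages)))  # Agalinis purpurea
```

The sketch only works when every provider already holds parsed data, which is the limitation noted above.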
>>>>
>>>> But until that time, dwc needs to find a balance between users who
>>>> want pre-parsed data, and providers who do not have pre-parsed data.
>>>>
>>>> I think dwc *almost* accommodates both worlds, as long as
>>>> scientificName is defined as "the complete set of textual elements
>>>> useful for recognizing a unique scientific name"; which is either
>>>> concatenated by the provider with parsed data, or simply "provided"
>>>> by the provider with unparsed data.
>>>>
>>>> What we seem to be arguing about now is how many different forms of
>>>> a "formatted" name do we want?
>>>>
>>>> With or without authorship?
>>>>
>>>> With or without year?
>>>>
>>>> With or without infraspecific prefixes ("var.", "f." etc.)?
>>>>
>>>> With or without infrageneric name(s)?
>>>>
>>>> With or without italics codes?
>>>>
>>>> With or without qualifiers like "cf.", "aff.", etc.?
>>>>
>>>> Etc.
>>>>
>>>> Etc.
>>>>
>>>> Etc.
>>>>
>>>> There are potentially dozens of different terms we could define to
>>>> accommodate every particular niche-need.
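
The combinatorics can be illustrated with a small Python sketch in which each "with or without" question becomes one flag on a single formatter. The dictionary keys ("author", "year", "rankMarker" and so on) and the function name are hypothetical. The "Author, year" layout follows common zoological practice and is not a convention defined in this thread.

```python
def format_name(parts, with_authorship=True, with_year=True, with_rank_marker=True):
    """Concatenate parsed name parts; each flag switches one optional piece."""
    tokens = [parts.get("genus"), parts.get("specificEpithet")]
    if parts.get("infraspecificEpithet"):
        if with_rank_marker:
            tokens.append(parts.get("rankMarker"))  # e.g. "var." or "f."
        tokens.append(parts["infraspecificEpithet"])
    if with_authorship and parts.get("author"):
        author = parts["author"]
        if with_year and parts.get("year"):
            author = f"{author}, {parts['year']}"
        tokens.append(author)
    return " ".join(t for t in tokens if t)


parts = {"genus": "Homo", "specificEpithet": "sapiens",
         "author": "Linnaeus", "year": "1758"}

# Every combination of the three flags, collapsed to the distinct strings.
variants = {
    format_name(parts, a, y, r)
    for a in (True, False)
    for y in (True, False)
    for r in (True, False)
}
for v in sorted(variants):
    print(v)
```

Three flags already give eight combinations, which collapse to three distinct strings for this simple binomial. Each further option (italics codes, qualifiers, infrageneric names) doubles the number of combinations again.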
>>>>
>>>> Personally, I think that the existing "scientificName" should be
>>>> split into two different terms:
>>>>
>>>> fullScientificNameStringWithAuthorship
>>>> And
>>>> verbatimNameString
>>>>
>>>> The first would be a concatenated text string assembled from parsed
>>>> bits, according to a community standard concatenation form.
>>>>
>>>> The second would be the literal text string as it appeared in the
>>>> original source.
>>>>
>>>> Otherwise, we could argue forever about which of the dozen possible
>>>> forms we think DwC needs a term for.
>>>>
>>>> Rich
>>>>
>>>
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>
>



-- 
Robert A. Morris
Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: morris.bob at gmail.com
web: http://bdei.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)

