<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">
<TITLE>Re: [tdwg-content] proposed term: dwc:verbatimScientificName</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>I feel somewhat confused by this. The actual situation looks<BR>
eminently simple to me:<BR>
<BR>
1) In a database there sits a text string (in any of a bewildering<BR>
number of shapes). The straightforward thing would be to call<BR>
this something like<BR>
<BR>
NameInDatabase: Dalbergia baroni Baker <BR>
or<BR>
VerbatimNameInDatabase: Dalbergia baroni Baker<BR>
<BR>
2) There also is the scientific name, as governed by the relevant<BR>
nomenclatural Code. The straightforward thing would be to call this<BR>
<BR>
ScientificName: Dalbergia baronii<BR>
<BR>
3) Finally, there is whatever intermediate result the algorithms in use<BR>
produce from the harvested text string. I suppose I would call this<BR>
something like<BR>
<BR>
ParserResult: Dalbergia baroni<BR>
or<BR>
ParserResultForScientificName: Dalbergia baroni<BR>
<BR>
Obviously, it would be nice if algorithms did exist which could convert<BR>
a text string into a scientific name, but this still lies in the future.<BR>
<BR>
In the meantime, I do not see why all the complexity needs to come in.<BR>
<BR>
Paul van Rijckevorsel<BR>
<BR>
<BR>
-----Original message-----<BR>
From: tdwg-content-bounces@lists.tdwg.org on behalf of David Remsen (GBIF)<BR>
Sent: Wed 8 Dec 2010 17:09<BR>
To: tdwg-content@lists.tdwg.org List<BR>
Subject: [tdwg-content] proposed term: dwc:verbatimScientificName<BR>
<BR>
Markus and I wanted to consolidate the issues related to the <BR>
current use and definition of scientificName, which have been the focus <BR>
of last week's discussion, in as simple a way as we can, and to leave it <BR>
with a simple proposal that we will add to the issue tracker on the <BR>
project site.<BR>
<BR>
1. We propose that a new term, dwc:verbatimScientificName, carry the <BR>
existing definition for dwc:scientificName, and<BR>
2. that dwc:scientificName follow the more accepted convention that is <BR>
better represented by the earlier proposed definition for Canonical Name.<BR>
<BR>
The intention is to enable data publishers to distinguish unparsed, <BR>
complex scientific names from more cleanly separated scientific name <BR>
data. This will relieve consumers of these data from testing each <BR>
instance of a name for one of these two conditions.<BR>
<BR>
Here are the definitions for the two existing terms that have been <BR>
part of the discussion:<BR>
<BR>
dwc:scientificName - The full scientific name, with authorship and <BR>
date information if known. When forming part of an Identification, <BR>
this should be the name in lowest level taxonomic rank that can be <BR>
determined. This term should not contain identification <BR>
qualifications, which should instead be supplied in the <BR>
IdentificationQualifier term.<BR>
<BR>
dwc:scientificNameAuthorship - The authorship information for the <BR>
scientificName formatted according to the conventions of the <BR>
applicable nomenclaturalCode.<BR>
<BR>
Here are terms and definitions used in the following 5 source data <BR>
configurations we came up with. They don't have to be exact for this <BR>
purpose.<BR>
<BR>
canonical name - the nomenclatural components of a scientific name <BR>
without authorship information.<BR>
authorship - the authorship information that follows a scientific name.<BR>
verbatim name - the verbatim text stored in a source database when it <BR>
differs from or combines the two definitions above. This is somewhat <BR>
broader than the definition of scientificName.<BR>
<BR>
We identified the following configurations in a source database and <BR>
how they would be mapped to the existing terms. In cases 4 and 5 we <BR>
also propose how we would map these were there a 3rd available term <BR>
(called 'mapping b:')<BR>
<BR>
When a source database contains:<BR>
<BR>
1. canonical names only<BR>
<BR>
Mapping: canonical name -> dwc:scientificName<BR>
<BR>
2. canonical name and authorship in two fields<BR>
<BR>
Mapping: canonical name -> dwc:scientificName /<BR>
authorship -> dwc:scientificNameAuthorship<BR>
<BR>
3. verbatim name only<BR>
<BR>
Mapping: verbatim name -> dwc:scientificName<BR>
<BR>
4. all three: canonical name, authorship, and verbatim name in three <BR>
different columns<BR>
<BR>
Mapping a: verbatim name -> dwc:scientificName /<BR>
authorship -> dwc:scientificNameAuthorship<BR>
<BR>
Mapping b: canonical name -> dwc:scientificName /<BR>
authorship -> dwc:scientificNameAuthorship /<BR>
verbatim name -> dwc:verbatimScientificName<BR>
<BR>
5. a mix of canonical and verbatim names in a single column<BR>
<BR>
Mapping a: verbatim name + canonical names -> dwc:scientificName<BR>
<BR>
Mapping b: verbatim name + canonical names -> <BR>
dwc:verbatimScientificName<BR>
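<BR>
The five configurations above can be sketched as a small lookup table.<BR>
This is only an illustration, not part of any Darwin Core tooling: the<BR>
source-column labels, the constant names and the functions mapping_for and<BR>
apply_mapping are hypothetical, and cases 1 to 3 are kept exactly as listed<BR>
above because no 'mapping b' is given for them.<BR>
<BR>
```python
from typing import Dict

# Hypothetical labels for the source columns described in the message.
CANONICAL = "canonical name"
AUTHORSHIP = "authorship"
VERBATIM = "verbatim name"
MIXED = "verbatim name + canonical names"

# Darwin Core terms named in the message (the third one is the proposed term).
SCI = "dwc:scientificName"
AUTH = "dwc:scientificNameAuthorship"
VERB = "dwc:verbatimScientificName"

# Cases 1-3 as listed, plus 'mapping a' for cases 4 and 5 (two existing terms only).
EXISTING_TERMS: Dict[int, Dict[str, str]] = {
    1: {CANONICAL: SCI},
    2: {CANONICAL: SCI, AUTHORSHIP: AUTH},
    3: {VERBATIM: SCI},
    4: {VERBATIM: SCI, AUTHORSHIP: AUTH},
    5: {MIXED: SCI},
}

# 'Mapping b' for cases 4 and 5, assuming the proposed third term is available.
WITH_VERBATIM_TERM: Dict[int, Dict[str, str]] = {
    4: {CANONICAL: SCI, AUTHORSHIP: AUTH, VERBATIM: VERB},
    5: {MIXED: VERB},
}


def mapping_for(case: int, third_term_available: bool) -> Dict[str, str]:
    """Return the source-column to term mapping for one configuration."""
    if third_term_available and case in WITH_VERBATIM_TERM:
        return WITH_VERBATIM_TERM[case]
    return EXISTING_TERMS[case]


def apply_mapping(row: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename the source columns present in `row` to their mapped terms."""
    return {term: row[column] for column, term in mapping.items() if column in row}
```
<BR>
For a case 4 row with made-up values, apply_mapping(row, mapping_for(4, True))<BR>
keeps all three source columns apart, whereas mapping_for(4, False) has to<BR>
put the verbatim string into dwc:scientificName and drops the canonical name.<BR>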
<BR>
Summary - with the current two terms we are left with no choice but to <BR>
support both canonical and verbatim names in a single term, which <BR>
makes consuming these data difficult.<BR>
<BR>
We propose that a new term, dwc:verbatimScientificName, carry the <BR>
existing definition for dwc:scientificName and that dwc:scientificName <BR>
follow the more accepted convention that is better represented by the <BR>
definition for Canonical Name.<BR>
<BR>
Best,<BR>
David Remsen / Markus Döring<BR>
<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>