<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
I do not have an opinion on this issue but wanted to note that the
TaxonName part of the TDWG Ontology appears to be fully functional.
Given that the TaxonName and TaxonConcept ontologies are based on
TCS, there may be existing terms (based on TCS) with stable URIs to
represent exactly what people want to say. They wouldn't be Darwin
Core terms, but they would be defined and have stable URIs
nonetheless. For example,<br>
<br>
<a class="moz-txt-link-freetext" href="http://rs.tdwg.org/ontology/voc/TaxonName#nameComplete">http://rs.tdwg.org/ontology/voc/TaxonName#nameComplete</a><br>
which can be abbreviated tn:nameComplete<br>
where tn:=<a class="moz-txt-link-freetext" href="http://rs.tdwg.org/ontology/voc/TaxonName#">http://rs.tdwg.org/ontology/voc/TaxonName#</a><br>
<br>
is defined as "The complete uninomial, binomial or trinomial name
without any authority or year components."<br>
<br>
Thus one could mark up data as <br>
&lt;tn:nameComplete&gt;Homo sapiens&lt;/tn:nameComplete&gt;<br>
and theoretically this would have meaning to the extent to which
people take the TDWG Ontology seriously. But that is a different
item for discussion...<br>
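<br>
As a rough illustration of that abbreviation, here is a minimal Python
sketch that expands tn:nameComplete back to its full URI. Only the tn:
binding comes from the text above; the PREFIXES table and the expand
function are names made up for this example and are not part of the
TDWG Ontology.<br>
<pre>
```python
# Hypothetical prefix table; only the tn: binding is taken from the message.
PREFIXES = {"tn": "http://rs.tdwg.org/ontology/voc/TaxonName#"}


def expand(curie, prefixes=PREFIXES):
    """Expand an abbreviated name such as 'tn:nameComplete' to a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in prefixes or not local:
        raise ValueError("unknown prefix or missing local name: " + curie)
    return prefixes[prefix] + local


print(expand("tn:nameComplete"))
# prints http://rs.tdwg.org/ontology/voc/TaxonName#nameComplete
```
</pre>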
<br>
Steve<br>
<br>
To view the RDF, see:
<a class="moz-txt-link-freetext" href="http://code.google.com/p/tdwg-ontology/source/browse/trunk/ontology/voc/TaxonName.owl">http://code.google.com/p/tdwg-ontology/source/browse/trunk/ontology/voc/TaxonName.owl</a><br>
and
<a class="moz-txt-link-freetext" href="http://code.google.com/p/tdwg-ontology/source/browse/trunk/ontology/voc/TaxonConcept.owl">http://code.google.com/p/tdwg-ontology/source/browse/trunk/ontology/voc/TaxonConcept.owl</a><br>
<br>
On 3/14/2012 3:15 PM, Kennedy, Jessie wrote:
<blockquote
cite="mid:78F1C759D89E1C44A6DD4C06B9A4B270010386C4F4B4@E2K7MBX.napier-mail.napier.ac.uk"
type="cite">
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Peter,
you just described what TCS offered. This was all covered
in the discussion on TCS (along with many other things that have
been discussed recently).<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">The
guide to using it covers some of the thinking behind these
issues, I think:<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><a
moz-do-not-send="true"
href="http://www.tdwg.org/fileadmin/subgroups/tnc/User_Guide.pdf">http://www.tdwg.org/fileadmin/subgroups/tnc/User_Guide.pdf</a>
<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"
lang="EN-US">From:</span></b><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"
lang="EN-US"> <a class="moz-txt-link-abbreviated" href="mailto:tdwg-tag-bounces@lists.tdwg.org">tdwg-tag-bounces@lists.tdwg.org</a>
[<a class="moz-txt-link-freetext" href="mailto:tdwg-tag-bounces@lists.tdwg.org">mailto:tdwg-tag-bounces@lists.tdwg.org</a>] <b>On Behalf Of </b>Peter
Desmet<br>
<b>Sent:</b> 14 March 2012 20:11<br>
<b>To:</b> Paul Kirk<br>
<b>Cc:</b> TDWG content mailing list; Donald Hobern (GBIF);
TDWG TAG mailing list; Christian Gendreau; dev Developers<br>
<b>Subject:</b> Re: [tdwg-tag] [tdwg-content] Canonical name
parsing<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hi Paul,<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Higher taxon: "Magnoliidae Novák ex
Takhtajan" (a subclass).<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">- scientificName: Magnoliidae Novák ex
Takhtajan<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">- taxonRank: subclass<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">But there are no terms to share the
canonical name "Magnoliidae". The only available options are
kingdom, phylum, class, order, family, genus, subgenus,
specificEpithet, infraspecificEpithet, none of which are
appropriate.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Solution:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">- canonicalScientificName: Magnoliidae<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Infrageneric taxon: "<span
style="font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#222222">Abies
sect. Amabilis (Matzenko) Farjon &amp; Rushforth" (a
section)</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#222222">-
scientificName: Abies sect. Amabilis (Matzenko) Farjon
&amp; Rushforth</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#222222">-
taxonRank: section</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#222222">-
genus: Abies</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#222222">But
there are no terms to share "Abies Amabilis", "Abies sect.
Amabilis", "Abies section Amabilis" or even "Amabilis". </span>The
only available options are kingdom, phylum, class, order,
family, genus, subgenus, specificEpithet,
infraspecificEpithet, none of which are appropriate. Why we
have subgenus but not <b>infragenericEpithet</b> is
another issue; with the latter I would at least be able to share "Amabilis".<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Solution:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">- canonicalScientificName: Abies Amabilis<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">- taxonRank: section<o:p></o:p></p>
</div>
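<div>
<p class="MsoNormal">A minimal Python sketch of how such a
canonicalScientificName might be derived when the authorship string is
known separately (for example from the Darwin Core term
scientificNameAuthorship). This is an illustration under stated
assumptions, not the definition proposed in issue 150: the function
name canonical_name and the short RANK_MARKERS list are made up for
the example, and real-world names need far more care.</p>
<pre>
```python
# Assumed, deliberately incomplete list of rank markers.
RANK_MARKERS = {"subgen.", "sect.", "subsect.", "ser.", "subsp.", "var.", "f."}


def canonical_name(scientific_name, authorship):
    """Drop a trailing authorship string and any rank markers.

    Both arguments are plain strings; authorship may be empty.
    """
    name = scientific_name.strip()
    authorship = authorship.strip()
    if authorship and name.endswith(authorship):
        name = name[: len(name) - len(authorship)].strip()
    words = [w for w in name.split() if w not in RANK_MARKERS]
    return " ".join(words)


print(canonical_name("Magnoliidae Novák ex Takhtajan", "Novák ex Takhtajan"))
# prints Magnoliidae
print(canonical_name("Abies sect. Amabilis (Matzenko) Farjon & Rushforth",
                     "(Matzenko) Farjon & Rushforth"))
# prints Abies Amabilis
```
</pre>
</div>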
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Peter<o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On Wed, Mar 14, 2012 at 14:37, Paul
Kirk &lt;<a moz-do-not-send="true"
href="mailto:p.kirk@cabi.org">p.kirk@cabi.org</a>&gt;
wrote:<o:p></o:p></p>
<div>
<div>
<p><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">'For
higher taxa or infrageneric taxa, these terms are
not sufficient' ... why?<o:p></o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"> <o:p></o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">Paul<o:p></o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"> <o:p></o:p></span></p>
<div class="MsoNormal" style="text-align:center"
align="center"><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">
<hr align="center" size="2" width="100%"></span></div>
<div>
<div>
<p class="MsoNormal"><b><span
style="font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;;color:black">From:</span></b><span
style="font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;;color:black">
<a moz-do-not-send="true"
href="mailto:tdwg-tag-bounces@lists.tdwg.org"
target="_blank">tdwg-tag-bounces@lists.tdwg.org</a>
[<a moz-do-not-send="true"
href="mailto:tdwg-tag-bounces@lists.tdwg.org"
target="_blank">tdwg-tag-bounces@lists.tdwg.org</a>]
on behalf of Peter Desmet [<a
moz-do-not-send="true"
href="mailto:peter.desmet@umontreal.ca"
target="_blank">peter.desmet@umontreal.ca</a>]<br>
<b>Sent:</b> 14 March 2012 18:26<br>
<b>To:</b> Richard Pyle<br>
<b>Cc:</b> TDWG content mailing list; Donald
Hobern (GBIF); dev Developers; Christian
Gendreau; TDWG TAG mailing list<o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span
style="font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;;color:black"><br>
<b>Subject:</b> Re: [tdwg-tag] [tdwg-content]
Canonical name parsing<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<div>
<div>
<p class="MsoNormal">Rich, <o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I wish those terms were
sufficient, but as mentioned in the
justification for <span
style="font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#222222"><a
moz-do-not-send="true"
href="http://code.google.com/p/darwincore/issues/detail?id=150"
target="_blank">http://code.google.com/p/darwincore/issues/detail?id=150</a>:</span><o:p></o:p></p>
</div>
<div>
<pre style="max-width:80em;white-space:pre-wrap"><span style="font-size:9.0pt">genus, specificEpithet, infraspecificEpithet: concatenated, these terms are identical to the canonicalScientificName for genera, species and infraspecific taxa. For higher taxa or infrageneric taxa, these terms are not sufficient. In addition, there is some ambiguity regarding the genus definition: for synonyms, is it the accepted genus or the genus that is part of the synonym name? See: <a moz-do-not-send="true" href="http://lists.tdwg.org/pipermail/tdwg-content/2010-November/002052.html" target="_blank"><span style="color:#0000CC">http://lists.tdwg.org/pipermail/tdwg-content/2010-November/002052.html</span></a>. In the former case, the genus cannot be used to concatenate a canonicalScientificName.<o:p></o:p></span></pre>
</div>
<div>
<p class="MsoNormal">To give an example for a
higher taxon:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">scientificName: Magnoliidae
Novák ex Takhtajan<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">taxonRank: subclass<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">There is no place to
share the canonical name "Magnoliidae" for
this taxon.<o:p></o:p></p>
</div>
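<div>
<p class="MsoNormal">The gap can be sketched in a few lines of Python.
This is only an illustration: the function name concatenate_parts and
the dictionary layout of the records are assumptions made for the
example, not part of Darwin Core. Joining genus, specificEpithet and
infraspecificEpithet reproduces the canonical name of a species, but
yields nothing for a record that carries only scientificName and
taxonRank.</p>
<pre>
```python
def concatenate_parts(record):
    """Join genus, specificEpithet and infraspecificEpithet, skipping blanks."""
    keys = ("genus", "specificEpithet", "infraspecificEpithet")
    return " ".join(record[k] for k in keys if record.get(k))


# Hypothetical records, keyed by Darwin Core term names.
species = {"genus": "Homo", "specificEpithet": "sapiens"}
subclass = {"scientificName": "Magnoliidae Novák ex Takhtajan",
            "taxonRank": "subclass"}

print(repr(concatenate_parts(species)))   # prints 'Homo sapiens'
print(repr(concatenate_parts(subclass)))  # prints ''
```
</pre>
</div>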
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Peter<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On Wed, Mar 14, 2012 at
14:13, Richard Pyle &lt;<a
moz-do-not-send="true"
href="mailto:deepreef@bishopmuseum.org"
target="_blank">deepreef@bishopmuseum.org</a>&gt;
wrote:<o:p></o:p></p>
<p class="MsoNormal"><br>
I guess the parts that confuse me are:<br>
<br>
1) What providers are able to produce a
canonicalScientificName as per Peter’s
definition, but are unable to provide the
pre-parsed elements of genus | subgenus |
specificEpithet | infraspecificEpithet?<br>
<br>
2) What consumers could make use of a
canonicalScientificName as per Peter’s
definition, but are unable to make (even
better) use of the pre-parsed elements of
genus | subgenus | specificEpithet |
infraspecificEpithet?<br>
<br>
Aloha,<br>
Rich<br>
<br>
<br>
<br>
<br>
From: <a moz-do-not-send="true"
href="mailto:tdwg-content-bounces@lists.tdwg.org"
target="_blank">tdwg-content-bounces@lists.tdwg.org</a>
[mailto:<a moz-do-not-send="true"
href="mailto:tdwg-content-bounces@lists.tdwg.org"
target="_blank">tdwg-content-bounces@lists.tdwg.org</a>]
On Behalf Of Peter Desmet<br>
Sent: Wednesday, March 14, 2012 7:03 AM<br>
To: Donald Hobern (GBIF)<br>
Cc: TDWG content mailing list; Christian
Gendreau; Tim Robertson [GBIF]; TDWG TAG
mailing list; dev Developers<br>
Subject: Re: [tdwg-content] Canonical name
parsing<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal"
style="margin-bottom:12.0pt"><br>
Hi Donald,<br>
<br>
scientificName, with its current
definition [1], is a great term and
should continue to be used as such.
As with most Darwin Core terms, it
offers flexibility, so it's not an
impediment to publishing data. In the
GBIF context, this term is considered
mandatory: records without it are
ignored during indexing (I believe).
All of this can stay.<br>
<br>
canonicalScientificName would be an
additional term with a clear rule (see
my proposed definition [2]). This is
the case for other Darwin Core terms
as well, such as
decimalLatitude [3],
minimumElevationInMeters [4] or
countryCode [5]. They serve as a
ready-to-use addition/alternative to
verbatimLatitude [6],
verbatimElevation [7] and country [8]
respectively. These terms don't stop
anyone from publishing data, but data
publishers who can provide this kind
of information have the choice to do
so. It would be the same for
canonicalScientificName.<br>
<br>
And yes, an aggregator like GBIF can
play an important role in providing
consistent data to its users and
figuring out what they really need,
but not all data is consumed that way.
In addition, I hope a user would be
able to download cleaned data from the
GBIF portal as Darwin Core. Wouldn't
it be nice if the parsed
canonicalScientificName created by
GBIF could be provided in its proper
term? There are users out there who
want this!<br>
<br>
Regards,<br>
<br>
Peter<br>
<br>
[1] <a moz-do-not-send="true"
href="http://rs.tdwg.org/dwc/terms/index.htm#scientificName"
target="_blank">http://rs.tdwg.org/dwc/terms/index.htm#scientificName</a><br>
[2] <a moz-do-not-send="true"
href="http://code.google.com/p/darwincore/issues/detail?id=150"
target="_blank">http://code.google.com/p/darwincore/issues/detail?id=150</a><br>
[3] <a moz-do-not-send="true"
href="http://rs.tdwg.org/dwc/terms/index.htm#decimalLatitude"
target="_blank">http://rs.tdwg.org/dwc/terms/index.htm#decimalLatitude</a><br>
[4] <a moz-do-not-send="true"
href="http://rs.tdwg.org/dwc/terms/index.htm#minimumElevationInMeters"
target="_blank">http://rs.tdwg.org/dwc/terms/index.htm#minimumElevationInMeters</a><br>
[5] <a moz-do-not-send="true"
href="http://rs.tdwg.org/dwc/terms/index.htm#countryCode"
target="_blank">http://rs.tdwg.org/dwc/terms/index.htm#countryCode</a><br>
[6] <a moz-do-not-send="true"
href="http://rs.tdwg.org/dwc/terms/index.htm#verbatimLatitude"
target="_blank">http://rs.tdwg.org/dwc/terms/index.htm#verbatimLatitude</a><br>
[7] <a moz-do-not-send="true"
href="http://rs.tdwg.org/dwc/terms/index.htm#verbatimElevation"
target="_blank">http://rs.tdwg.org/dwc/terms/index.htm#verbatimElevation</a><br>
[8] <a moz-do-not-send="true"
href="http://rs.tdwg.org/dwc/terms/index.htm#country"
target="_blank">http://rs.tdwg.org/dwc/terms/index.htm#country</a><br>
<br>
On Wed, Mar 14, 2012 at 11:19, Donald
Hobern (GBIF) &lt;<a
moz-do-not-send="true"
href="mailto:dhobern@gbif.org"
target="_blank">dhobern@gbif.org</a>&gt;
wrote:<br>
><br>
> Hi Peter.<br>
><br>
> I certainly agree that
aggregators only represent one use
case here but, having seen a lot of
the mess of real-world data, I don't
believe that simply adding a new term
will fix this problem for the users
you describe. To get the results you
want, we would need a sufficiently
large majority of data sets to follow
the rules perfectly that we could
ignore those that were non-conformant.
This would mean we should mandate
that every data set must use the new
element (with or without the existing
scientificName element) and that they
must present scientific names in the
expected way (or else have their data
considered non-compliant). Until now,
the philosophy on publishing Darwin
Core data has been to make it as easy
as possible for data providers to
expose their data, even at the expense
of greater complexity for consumers.
I suspect that we would have a lot
less data available for use now if we
had taken a more stringent approach.<br>
><br>
> In some ways, this proposal
reminds me of the structures in ABCD
which seek to offer users verbatim and
more normalised ways to represent
several types of information. This
actually makes consuming all the
possible forms of such data very
complex, since a record may contain
all variant forms or just any one of
them. If multiple forms are
available, which one should be
considered the primary version?<br>
><br>
> I suspect that things may also
get complicated as soon as you discuss
botanical subspecies, varieties,
subvarieties, forms and subforms.
There are recommended ways to
abbreviate the rank markers in these
cases but some variation can be
expected.<br>
><br>
> Of course aggregators should be
providing more robust services for
accessing exactly what you want in a
consistent, predictable way and I
would suggest that the best place to
attack the problem is to define
exactly what a typical user needs to
see and then for GBIF and similar
projects to work on delivering
predictable data downloads and web
services that clean out all of these
nomenclatural inconsistencies - and
perhaps also add value in other ways
such as augmenting the data with
associated environmental values (as
the Atlas of Living Australia does).
This would allow us all to work
together on developing a consistent
and predictable algorithm for handling
interpretation of name strings,
including synonymy, misspellings,
virus names and everything else that
makes this such a difficult problem.<br>
><br>
> Best wishes,<br>
><br>
> Donald<br>
><br>
>
----------------------------------------------------------------------<br>
> Donald Hobern - GBIF Director - <a
moz-do-not-send="true"
href="mailto:dhobern@gbif.org"
target="_blank">dhobern@gbif.org</a><br>
> Global Biodiversity Information
Facility <a moz-do-not-send="true"
href="http://www.gbif.org/"
target="_blank">http://www.gbif.org/</a><br>
> GBIF Secretariat,
Universitetsparken 15, DK-2100
Copenhagen Ø, Denmark<br>
> Tel: <a moz-do-not-send="true"
href="tel:%2B45%203532%201471"
target="_blank">+45 3532 1471</a>
Mob: <a moz-do-not-send="true"
href="tel:%2B45%202875%201471"
target="_blank">+45 2875 1471</a>
Fax: <a moz-do-not-send="true"
href="tel:%2B45%202875%201480"
target="_blank">+45 2875 1480</a><br>
>
----------------------------------------------------------------------<br>
><br>
><br>
> -----Original Message-----<br>
> From: <a moz-do-not-send="true"
href="mailto:peter.desmet.cubc@gmail.com" target="_blank">peter.desmet.cubc@gmail.com</a>
[mailto:<a moz-do-not-send="true"
href="mailto:peter.desmet.cubc@gmail.com"
target="_blank">peter.desmet.cubc@gmail.com</a>]
On Behalf Of Peter Desmet<br>
> Sent: Wednesday, March 14, 2012
3:41 PM<br>
> To: Tim Robertson [GBIF]<br>
> Cc: Donald Hobern (GBIF); dev
Developers; TDWG content mailing list;
TDWG TAG mailing list; Christian
Gendreau<br>
> Subject: Re: Canonical name
parsing<br>
><br>
> Hi Tim,<br>
><br>
> I agree, aggregators like GBIF
and Canadensys will have to deal with
clean and dirty data in each field
anyway: they need code libraries to
deal with this and it is good that
these are being developed. But, that
doesn't help someone who wants to use
data from a Darwin Core Archive with
his data in Excel or a Roderic Page
who wants to get things done for a
prototype.<br>
> Having to use Java libraries or
even the Name Parser [1] (though both<br>
> great) is a barrier to data use.
Darwin Core (Archives) is not only
used for machine to machine
interaction, humans use it too, and I
think we should allow easy hacking (I
mean this in the good sense),
especially for something as important
as the scientific name.<br>
> In addition, as a data publisher
(e.g. for our VASCAN checklist [2]) I<br>
> *do* have the information to
provide a clean and simple to use
canonicalScientificName, but I just
can't share it via the otherwise
excellent biodiversity sharing
standard Darwin Core. I think that's a
pity.<br>
><br>
> Peter<br>
><br>
> [1] <a moz-do-not-send="true"
href="http://tools.gbif.org/nameparser/"
target="_blank">http://tools.gbif.org/nameparser/</a><br>
> [2] <a moz-do-not-send="true"
href="http://data.canadensys.net/vascan"
target="_blank">http://data.canadensys.net/vascan</a><br>
><br>
> PS: Yes, Canadensys will use the
GBIF interpretation libraries. Since
we develop in Java as well, using
those libraries is as easy as the
proverbial "one line of code". We're
looking forward to testing them and
providing patches to enhance them.
Open source FTW! :-)<br>
><br>
><br>
> On Wed, Mar 14, 2012 at 07:32,
Tim Robertson [GBIF] &lt;<a
moz-do-not-send="true"
href="mailto:trobertson@gbif.org"
target="_blank">trobertson@gbif.org</a>&gt;
wrote:<br>
> > Hi Peter,<br>
> ><br>
> > I'm replying off the TDWG
list, since it is a bit of a tangent
to your discussion. If you feel it is
relevant, please CC the list again.<br>
> ><br>
> > At GBIF, as you know, we have
to interpret content of widely varying
quality. I tend to agree with Donald
that this would not really help in
consumption, as in my experience we
will have to deal with both clean and
dirty data in each field *anyway* when
this is used at network scale. I
would rather see us evolve the
interpretation libraries to handle all
the corner cases, which we need to
develop anyway. We already do a
pretty decent job at extracting
canonicals. This is further enhanced
when you couple the extracted
canonical with a fuzzy match against
the "authoritative names" we can now
index thanks to the availability of
checklists in DwC-A format.<br>
> ><br>
> > I know you are a Java shop.
Are you using the GBIF interpretation
libraries [1] at the moment? If not,
is there a reason why you don't?<br>
> > They are used in all GBIF
projects (portal, checklistbank etc),
and the more we enhance them, the
better it is for everyone. We have a
significant test coverage [2,3] and
there have been quite some man months
(years?) spent already in their
development and with some real regular
expression experts (most notably
Markus D. and Dave M.). All our work
is Maven-ized, versioned and available
in our Maven repository [4].<br>
> ><br>
> > I hope these are interesting
to you. We would welcome any patches
to enhance them, or assistance in
identifying the corner cases and
capturing those as unit tests.<br>
> ><br>
> > Hope this helps,<br>
> > Tim<br>
> ><br>
> > [1]
<a moz-do-not-send="true"
href="http://code.google.com/p/gbif-ecat/source/browse/trunk/ecat-common/src/main/java/org/gbif/ecat/parser/NameParser.java"
target="_blank">http://code.google.com/p/gbif-ecat/source/browse/trunk/ecat-common/src/main/java/org/gbif/ecat/parser/NameParser.java</a><br>
> > [2]
<a moz-do-not-send="true"
href="http://code.google.com/p/gbif-ecat/source/browse/trunk/ecat-common/src/test/java/org/gbif/ecat/parser/NameParserTest.java"
target="_blank">http://code.google.com/p/gbif-ecat/source/browse/trunk/ecat-common/src/test/java/org/gbif/ecat/parser/NameParserTest.java</a><br>
> > [3]
<a moz-do-not-send="true"
href="http://code.google.com/p/gbif-ecat/source/browse/trunk/ecat-common/src/#src%2Ftest%2Fresources"
target="_blank">http://code.google.com/p/gbif-ecat/source/browse/trunk/ecat-common/src/#src%2Ftest%2Fresources</a><br>
> > [4]
<a moz-do-not-send="true"
href="http://repository.gbif.org/index.html#nexus-search;quick%7Eecat-common"
target="_blank">http://repository.gbif.org/index.html#nexus-search;quick~ecat-common</a><br>
> ><br>
><br>
><br>
><br>
> --<br>
> Peter Desmet<br>
> Biodiversity Informatics Manager<br>
> Canadensys - <a
moz-do-not-send="true"
href="http://www.canadensys.net"
target="_blank">www.canadensys.net</a><br>
><br>
> Université de Montréal
Biodiversity Centre<br>
> 4101 rue Sherbrooke est<br>
> Montreal, QC, H1X2B2<br>
> Canada<br>
><br>
> Phone: <a moz-do-not-send="true"
href="tel:514-343-6111%20%2382354"
target="_blank">514-343-6111 #82354</a><br>
> Fax: <a moz-do-not-send="true"
href="tel:514-343-2288"
target="_blank">514-343-2288</a><br>
> Email: <a moz-do-not-send="true"
href="mailto:peter.desmet@umontreal.ca" target="_blank">peter.desmet@umontreal.ca</a>
/ <a moz-do-not-send="true"
href="mailto:peter.desmet.cubc@gmail.com"
target="_blank">peter.desmet.cubc@gmail.com</a><br>
> Skype: anderhalv<br>
> Public profile: <a
moz-do-not-send="true"
href="http://www.linkedin.com/in/peterdesmet"
target="_blank">http://www.linkedin.com/in/peterdesmet</a><br>
><br>
><br>
>
_______________________________________________<br>
> tdwg-content mailing list<br>
> <a moz-do-not-send="true"
href="mailto:tdwg-content@lists.tdwg.org"
target="_blank">tdwg-content@lists.tdwg.org</a><br>
> <a moz-do-not-send="true"
href="http://lists.tdwg.org/mailman/listinfo/tdwg-content"
target="_blank">http://lists.tdwg.org/mailman/listinfo/tdwg-content</a><br>
<br>
<o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
_______________________________________________<br>
tdwg-tag mailing list<br>
<a moz-do-not-send="true"
href="mailto:tdwg-tag@lists.tdwg.org">tdwg-tag@lists.tdwg.org</a><br>
<a moz-do-not-send="true"
href="http://lists.tdwg.org/mailman/listinfo/tdwg-tag"
target="_blank">http://lists.tdwg.org/mailman/listinfo/tdwg-tag</a><o:p></o:p></p>
</div>
</div>
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu">http://bioimages.vanderbilt.edu</a>
</pre>
</body>
</html>