<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">

<TITLE>Re: [tdwg-content] proposed term: dwc:verbatimScientificName</TITLE>

</HEAD>

<BODY>

<!-- Converted from text/plain format -->


<P><FONT SIZE=2>It does not look so to me; this is perhaps the first 5%<BR>

of the way. The point is that a parser will never convert<BR>

a text string into a scientific name (except by accident);<BR>

at most it will label several components, opening the<BR>

way for a more sophisticated algorithm (although that<BR>

'more sophisticated algorithm' would probably work just as<BR>

well without a parser).<BR>

<BR>

It does look viable to build such a 'more sophisticated<BR>

algorithm' now: the spellcheckers built into so much<BR>

software these days would do the job if they were<BR>

loaded with a sufficiently comprehensive vocabulary.<BR>

However, I don't see any sign of that happening.<BR>

<BR>

Paul van Rijckevorsel<BR>

<BR>

<BR>

-----Original message-----<BR>

From: David Remsen (GBIF) [<A HREF="mailto:dremsen@gbif.org">mailto:dremsen@gbif.org</A>]<BR>

Sent: Thu 9 Dec 2010 16:05<BR>

To: Bob Morris<BR>

CC: David Remsen (GBIF); dipteryx@freeler.nl; tdwg-content@lists.tdwg.org<BR>

Subject: Re: [tdwg-content] proposed term: dwc:verbatimScientificName<BR>

<BR>

The GBIF web parser service is a good step in that direction.<BR>

<BR>

<A HREF="http://tools.gbif.org/nameparser/">http://tools.gbif.org/nameparser/</A><BR>

<BR>

Select the test names and review the extended output.<BR>

<BR>

David<BR>

On Dec 9, 2010, at 3:09 PM, Bob Morris wrote:<BR>

<BR>

&gt;&gt; ...<BR>

&gt;<BR>

&gt;&gt; Obviously, it would be nice if algorithms did exist which could convert<BR>

&gt;&gt; a text string into a scientific name, but this still lies in the future.<BR>

&gt;<BR>

&gt; For those of us attempting to populate databases with information<BR>

&gt; extracted from published literature, the future is now. It seems to me<BR>

&gt; that normalizing the extraction to some standardized form \before/<BR>

&gt; putting it in the database is more robust than forcing the parsing to<BR>

&gt; be done afterwards. So we need rules for those forms, and an<BR>

&gt; unambiguous way in our metadata to cite which rules have been<BR>

&gt; followed.&nbsp; In a previous post my p.s. also whined about a similar need<BR>

&gt; for born-digital taxonomic treatments.<BR>

&gt;<BR>

&gt; Bob<BR>

&gt;<BR>

&gt;<BR>

&gt;<BR>

&gt;<BR>

&gt; --<BR>

&gt; Robert A. Morris<BR>

&gt; Emeritus Professor&nbsp; of Computer Science<BR>

&gt; UMASS-Boston<BR>

&gt; 100 Morrissey Blvd<BR>

&gt; Boston, MA 02125-3390<BR>

&gt; Associate, Harvard University Herbaria<BR>

&gt; email: morris.bob@gmail.com<BR>

&gt; web: <A HREF="http://bdei.cs.umb.edu/">http://bdei.cs.umb.edu/</A><BR>

&gt; web: <A HREF="http://etaxonomy.org/mw/FilteredPush">http://etaxonomy.org/mw/FilteredPush</A><BR>

&gt; <A HREF="http://www.cs.umb.edu/~ram">http://www.cs.umb.edu/~ram</A><BR>

&gt; phone (+1) 857 222 7992 (mobile)<BR>


&gt;<BR>

<BR>

<BR>

</FONT>

</P>


</BODY>

</HTML>