<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">
<TITLE>Re: [tdwg-content] proposed term: dwc:verbatimScientificName</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>It does not look that way to me; this is perhaps the first 5%<BR>
of the way. The point is that a parser will never convert<BR>
a text string into a scientific name (except by accident),<BR>
but will at most label several components, opening the<BR>
way for a more sophisticated algorithm (although the<BR>
'more sophisticated algorithm' would probably work just as<BR>
well without a parser).<BR>
<BR>
It does look viable to build such a 'more sophisticated<BR>
algorithm' now, as the spellcheckers that are built into<BR>
so much software these days would do the job, if they<BR>
were loaded with a sufficiently comprehensive vocabulary.<BR>
However, I don't see any sign of that happening.<BR>
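<BR>
As a rough illustration of the spellchecker idea, here is a minimal sketch.<BR>
It assumes Python's standard difflib module as a stand-in for a spellchecker;<BR>
the four-name VOCABULARY and the suggest_names helper are hypothetical,<BR>
and a real vocabulary would have to be far more comprehensive.<BR>
<BR>
```python
import difflib

# Hypothetical, tiny vocabulary; a real one would hold a comprehensive name list.
VOCABULARY = [
    "Quercus robur",
    "Quercus rubra",
    "Fagus sylvatica",
    "Dipteryx odorata",
]


def suggest_names(verbatim, vocabulary=VOCABULARY, cutoff=0.8):
    """Return up to three vocabulary entries that closely resemble the input."""
    # Collapse runs of whitespace so spacing differences do not affect the score.
    cleaned = " ".join(verbatim.split())
    # get_close_matches ranks candidates by similarity ratio, best first.
    return difflib.get_close_matches(cleaned, vocabulary, n=3, cutoff=cutoff)
```
<BR>
Calling suggest_names("Quercus robor") ranks "Quercus robur" first, but it<BR>
only proposes candidates; deciding which name was meant is still left open.<BR>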
<BR>
Paul van Rijckevorsel<BR>
<BR>
<BR>
-----Original message-----<BR>
From: David Remsen (GBIF) [<A HREF="mailto:dremsen@gbif.org">mailto:dremsen@gbif.org</A>]<BR>
Sent: Thu 9-12-2010 16:05<BR>
To: Bob Morris<BR>
CC: David Remsen (GBIF); dipteryx@freeler.nl; tdwg-content@lists.tdwg.org<BR>
Subject: Re: [tdwg-content] proposed term: dwc:verbatimScientificName<BR>
<BR>
The GBIF web parser service is a good step in that direction.<BR>
<BR>
<A HREF="http://tools.gbif.org/nameparser/">http://tools.gbif.org/nameparser/</A><BR>
<BR>
Select the test names and review the extended output.<BR>
<BR>
David<BR>
On Dec 9, 2010, at 3:09 PM, Bob Morris wrote:<BR>
<BR>
>> ...<BR>
><BR>
>> Obviously, it would be nice if algorithms did exist which could convert<BR>
>> a text string into a scientific name, but this still lies in the future.<BR>
><BR>
> For those of us attempting to populate databases with information<BR>
> extracted from published literature, the future is now. It seems to me<BR>
> that normalizing the extraction to some standardized form \before/<BR>
> putting it in the database is more robust than forcing the parsing to<BR>
> be done afterwards. So we need rules for those forms, and an<BR>
> unambiguous way in our metadata to cite which rules have been<BR>
> followed. In a previous post my p.s. also whined about a similar need<BR>
> for born-digital taxonomic treatments.<BR>
><BR>
> Bob<BR>
><BR>
> --<BR>
> Robert A. Morris<BR>
> Emeritus Professor of Computer Science<BR>
> UMASS-Boston<BR>
> 100 Morrissey Blvd<BR>
> Boston, MA 02125-3390<BR>
> Associate, Harvard University Herbaria<BR>
> email: morris.bob@gmail.com<BR>
> web: <A HREF="http://bdei.cs.umb.edu/">http://bdei.cs.umb.edu/</A><BR>
> web: <A HREF="http://etaxonomy.org/mw/FilteredPush">http://etaxonomy.org/mw/FilteredPush</A><BR>
> <A HREF="http://www.cs.umb.edu/~ram">http://www.cs.umb.edu/~ram</A><BR>
> phone (+1) 857 222 7992 (mobile)<BR>
> _______________________________________________<BR>
> tdwg-content mailing list<BR>
> tdwg-content@lists.tdwg.org<BR>
> <A HREF="http://lists.tdwg.org/mailman/listinfo/tdwg-content">http://lists.tdwg.org/mailman/listinfo/tdwg-content</A><BR>
><BR>
<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>