It does not look so to me; this is perhaps the first 5%
of the way. The point is that a parser will never convert
a text string into a scientific name (except by accident).
At most it will label several components, opening the
way for a more sophisticated algorithm (although that
'more sophisticated algorithm' would probably work just as
well without a parser).
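
To illustrate what "labelling components" amounts to, here is a minimal
sketch in Python. The pattern and the function name label_components are
hypothetical and deliberately naive; this is not how the GBIF parser works,
only an assumed stand-in for the general idea.

```python
import re

# Hypothetical, deliberately naive pattern: a capitalised genus, a lowercase
# epithet, and anything that follows is treated as authorship.
NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+(?P<epithet>[a-z-]+)(?:\s+(?P<authorship>.+))?$"
)

def label_components(text):
    """Label the parts of a name-like string; return None if it does not fit."""
    match = NAME_PATTERN.match(text.strip())
    return match.groupdict() if match else None

# A correctly spelled name and a misspelled one are labelled in the same way:
labelled = label_components("Quercus robur L.")
misspelt = label_components("Quercus robar L.")
```

Both calls return labelled components, because the pattern only checks the
shape of the string. Nothing here establishes that the string is actually
a scientific name.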
It does look viable to build such a 'more sophisticated
algorithm' now, as the spellcheckers that are built into
so much software these days would do the job, if they
were loaded with a sufficiently comprehensive vocabulary.
However, I don't see any sign of that happening.
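
As a rough sketch of the spellchecker idea, the following Python uses the
standard library's difflib to match a string against a vocabulary. The
three-name list, the function name suggest_names, and the 0.8 cutoff are
assumptions for illustration only; a usable vocabulary would have to be
far more comprehensive.

```python
import difflib

# Assumed toy vocabulary; a real one would need to be far more comprehensive.
KNOWN_NAMES = ["Quercus robur", "Quercus rubra", "Fagus sylvatica"]

def suggest_names(text, vocabulary=KNOWN_NAMES, cutoff=0.8):
    """Return up to three vocabulary entries that closely resemble `text`,
    best match first; an empty list means nothing was similar enough."""
    return difflib.get_close_matches(text, vocabulary, n=3, cutoff=cutoff)

suggestions = suggest_names("Quercus robar")
```

The best suggestion for the misspelled input is "Quercus robur". Note that
no parsing into components is needed for this to work; the quality of the
result depends almost entirely on the vocabulary.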
Paul van Rijckevorsel
-----Original Message-----
From: David Remsen (GBIF) [mailto:dremsen@gbif.org]
Sent: Thu 9-12-2010 16:05
To: Bob Morris
CC: David Remsen (GBIF); dipteryx@freeler.nl; tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] proposed term: dwc:verbatimScientificName
The GBIF web parser service is a good step in that direction.
http://tools.gbif.org/nameparser/
Select the test names and review the extended output.
David
On Dec 9, 2010, at 3:09 PM, Bob Morris wrote:
>> ...
>
>> Obviously, it would be nice if algorithms did exist which could convert
>> a text string into a scientific name, but this still lies in the future.
>
> For those of us attempting to populate databases with information
> extracted from published literature, the future is now. It seems to me
> that normalizing the extraction to some standardized form \before/
> putting it in the database is more robust than forcing the parsing to
> be done afterwards. So we need rules for those forms, and an
> unambiguous way in our metadata to cite which rules have been
> followed. In a previous post my p.s. also whined about a similar need
> for born-digital taxonomic treatments.
>
> Bob
>
>
>
>
> --
> Robert A. Morris
> Emeritus Professor of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
> Associate, Harvard University Herbaria
> email: morris.bob@gmail.com
> web: http://bdei.cs.umb.edu/
> web: http://etaxonomy.org/mw/FilteredPush
> http://www.cs.umb.edu/~ram
> phone (+1) 857 222 7992 (mobile)