[tdwg-content] proposed term: dwc:verbatimScientificName

dipteryx at freeler.nl
Thu Dec 9 17:32:17 CET 2010


It does not look that way to me; this is perhaps the first 5%
of the way. The point is that a parser will never convert
a text string into a scientific name (except by accident);
at most it will label several components, opening the
way for a more sophisticated algorithm (although the
'more sophisticated algorithm' would probably work just as
well without a parser).
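To illustrate the difference between labeling components and recognizing a name, here is a minimal sketch of a component labeler. It is a hypothetical regular expression (not the GBIF parser or any published grammar) that assumes a simple layout: genus, optional specific epithet, optional infraspecific rank and epithet, and then everything else treated as authorship.

```python
import re
from typing import Optional

# Hypothetical pattern: genus, optional specific epithet, optional
# infraspecific rank + epithet, then whatever remains is labeled authorship.
NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+(?P<specific_epithet>[a-z][a-z-]+))?"
    r"(?:\s+(?P<rank>subsp\.|var\.|f\.)\s+(?P<infraspecific_epithet>[a-z][a-z-]+))?"
    r"(?:\s+(?P<authorship>.+))?$"
)


def label_components(verbatim: str) -> Optional[dict]:
    """Label the parts of a name string; return None if nothing fits."""
    match = NAME_PATTERN.match(verbatim.strip())
    if match is None:
        return None
    # Keep only the components that were actually found.
    return {key: value for key, value in match.groupdict().items() if value is not None}


if __name__ == "__main__":
    print(label_components("Dipteryx odorata (Aubl.) Willd."))
    print(label_components("Quercus Robur"))
```

The second call labels "Robur" as authorship simply because it is capitalized. The labels look plausible but nothing has checked that they form a real name, which is the gap described above.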

It does look viable to build such a 'more sophisticated
algorithm' now: the spellcheckers built into so much
software these days would do the job if they were loaded
with a sufficiently comprehensive vocabulary. However,
I don't see any sign of that happening.
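A spellchecker-style matcher of the kind described can be sketched with Python's standard `difflib.get_close_matches`, which ranks candidates by similarity. The vocabulary below is a hypothetical four-entry stand-in; as noted, the real difficulty is assembling a comprehensive one, and the `cutoff` value of 0.8 is an arbitrary assumption.

```python
import difflib
from typing import List

# Hypothetical, tiny vocabulary; a usable one would have to be far larger.
VOCABULARY = [
    "Dipteryx odorata",
    "Dipteryx alata",
    "Quercus robur",
    "Poa annua",
]


def suggest(verbatim: str, vocabulary: List[str], cutoff: float = 0.8) -> List[str]:
    """Return vocabulary entries similar to the input, best match first."""
    # Collapse runs of whitespace so spacing differences don't lower the score.
    cleaned = " ".join(verbatim.split())
    return difflib.get_close_matches(cleaned, vocabulary, n=3, cutoff=cutoff)


if __name__ == "__main__":
    print(suggest("Dipterix  odorata", VOCABULARY))
    print(suggest("Quercus robor", VOCABULARY))
```

Unlike the labeling approach, this needs no parser at all: it only asks which known name the string most resembles, so its quality depends entirely on the vocabulary.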

Paul van Rijckevorsel


-----Original message-----
From: David Remsen (GBIF) [mailto:dremsen at gbif.org]
Sent: Thu 9-12-2010 16:05
To: Bob Morris
CC: David Remsen (GBIF); dipteryx at freeler.nl; tdwg-content at lists.tdwg.org
Subject: Re: [tdwg-content] proposed term: dwc:verbatimScientificName
 
The GBIF web parser service is a good step in that direction.

http://tools.gbif.org/nameparser/

Select the test names and review the extended output.

David
On Dec 9, 2010, at 3:09 PM, Bob Morris wrote:

>> ...
>
>> Obviously, it would be nice if algorithms did exist which could convert
>> a text string into a scientific name, but this still lies in the future.
>
> For those of us attempting to populate databases with information
> extracted from published literature, the future is now. It seems to me
> that normalizing the extraction to some standardized form \before/
> putting it in the database is more robust than forcing the parsing to
> be done afterwards. So we need rules for those forms, and an
> unambiguous way in our metadata to cite which rules have been
> followed.  In a previous post my p.s. also whined about a similar need
> for born-digital taxonomic treatments.
>
> Bob
>
>
>
>
> -- 
> Robert A. Morris
> Emeritus Professor of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
> Associate, Harvard University Herbaria
> email: morris.bob at gmail.com
> web: http://bdei.cs.umb.edu/
> web: http://etaxonomy.org/mw/FilteredPush
> http://www.cs.umb.edu/~ram
> phone (+1) 857 222 7992 (mobile)
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
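Bob's suggestion, normalizing an extracted name to a standardized form before it goes into the database and recording which rules were followed, might be sketched as below. The rule set identifier and the three rules are hypothetical placeholders for whatever citable rules a community might agree on; the verbatim string is kept alongside the normalized one, in the spirit of the proposed dwc:verbatimScientificName term.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical identifier for the rule set applied below.
RULESET_ID = "example-name-normalization/v1"

# Hypothetical rule: map variant rank abbreviations to a single form.
RANK_VARIANTS = {"ssp.": "subsp.", "ssp": "subsp.", "subsp": "subsp.", "var": "var."}


@dataclass(frozen=True)
class NormalizedName:
    verbatim: str    # the string exactly as extracted from the source
    normalized: str  # the string after applying the rules
    ruleset: str     # which rules were applied, for the metadata


def normalize(verbatim: str) -> NormalizedName:
    # Rule 1: Unicode NFC, so visually identical strings compare equal.
    text = unicodedata.normalize("NFC", verbatim)
    # Rule 2: trim and collapse whitespace.
    tokens = text.split()
    # Rule 3: standardize rank abbreviations.
    tokens = [RANK_VARIANTS.get(token, token) for token in tokens]
    return NormalizedName(verbatim=verbatim, normalized=" ".join(tokens), ruleset=RULESET_ID)


if __name__ == "__main__":
    print(normalize("  Poa  annua ssp. minor "))
```

Storing the rule set identifier with each record is what lets a later consumer tell which normalizations were applied, rather than having to re-parse the string afterwards.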



