[tdwg-content] iDigBio Augmenting OCR October Workshop, February Hackathon Invitation

Deb Paul dpaul at fsu.edu
Wed Aug 22 21:54:33 CEST 2012

Invitation to Augmented OCR Best Practices Workshop and Hack-a-thon Planning

iDigBio (https://www.idigbio.org/) is running a workshop (October 1-2, 
2012) and hack-a-thon (February 2013) to identify best practices and 
develop tools to get information from museum labels into computers.

We are seeking individuals to participate in the "iDigBio Augmenting 
OCR" workshop on October 1-2. The objective of the workshop is to 
improve OCR output and the algorithms that subsequently process it, so 
that the content of biological collection specimen labels and notes can 
be extracted and inserted into a database efficiently and accurately for 
future use. Participants in the October workshop plan to narrow the 
hack-a-thon focus down to specific programmatic goals for the software 
developers who will work at the hack-a-thon in February 2013.

Most broadly, digitization can be described in four main steps: create 
an image; convert the image to text using Optical Character Recognition 
(OCR) and/or human typists; break the text into semantically useful 
fields such as family, scientific name, collector, date collected, 
location, habitat, and growth habit; and finally format this 
information for loading into a database. The 
participants will help to identify and collect images that are 
representative of those that will be needed by the biology community. 
This collection of images will serve as the working set for developers 
in the February Hack-a-thon.
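
As a rough illustration of the third and fourth steps described above, 
here is a minimal Python sketch. It is not a tool produced by the 
working group. The label text, the "Coll.:", "Date:" and "Loc.:" 
prefixes, the field names, and the rule that a family name ends in 
"aceae" are all assumptions made up for this example; real labels are 
far less regular.

```python
import json
import re

# Hypothetical OCR output for one specimen label (invented for illustration).
OCR_TEXT = """Asteraceae
Solidago altissima L.
Coll.: J. Smith
Date: 14 Sep 1987
Loc.: Leon County, Florida"""

# Assumed line prefixes and the field each one maps to.
PREFIXES = {"Coll.:": "collector", "Date:": "date_collected", "Loc.:": "location"}


def parse_label(text):
    """Step 3: split raw label text into named fields (naive, line-based)."""
    record = {"family": None, "scientific_name": None,
              "collector": None, "date_collected": None, "location": None}
    unlabeled = []
    for line in (ln.strip() for ln in text.splitlines()):
        if not line:
            continue
        for prefix, field in PREFIXES.items():
            if line.startswith(prefix):
                record[field] = line[len(prefix):].strip()
                break
        else:
            unlabeled.append(line)
    # Assumption: a single word ending in "aceae" is a plant family name,
    # and the first other unlabeled line is the scientific name.
    for line in unlabeled:
        if record["family"] is None and re.fullmatch(r"[A-Z][a-z]+aceae", line):
            record["family"] = line
        elif record["scientific_name"] is None:
            record["scientific_name"] = line
    return record


def to_json(record):
    """Step 4: serialize the record so it can be loaded into a database."""
    return json.dumps(record, sort_keys=True)
```

Calling to_json(parse_label(OCR_TEXT)) gives one JSON object per label. 
Fields the parser cannot find stay empty (None) rather than being 
guessed.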

The October workshop participants plan to identify OCR output products 
that will be useful for the community as well as metrics that help 
evaluate how well different automation approaches produce these 
products. These metrics may include measures of OCR accuracy, of the 
accuracy of automated error correction, and of how effectively text is 
broken into meaningful semantic units, using measures such as 
precision, recall, and F-score. We seek biologists, programmers, and 
others involved in the digitization process to take part in the October 
workshop, help plan the February hack-a-thon, and participate in the 
hack-a-thon itself.
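
For readers less familiar with precision, recall, and F-score, the 
following Python sketch shows one way they could be computed for the 
fields extracted from a single label. It assumes, purely for 
illustration, that a hand-keyed record is available as the reference 
and that a field counts as correct only on an exact match. The function 
name and field names are hypothetical, and the workshop may well settle 
on different definitions.

```python
def field_scores(predicted, gold):
    """Compare automatically extracted fields to hand-keyed fields.

    Both arguments are dicts mapping field name to value. Empty or
    missing values are ignored. Returns (precision, recall, f_score).
    """
    pred_items = {(k, v) for k, v in predicted.items() if v}
    gold_items = {(k, v) for k, v in gold.items() if v}
    # A field is a true positive only if name and value both match.
    true_pos = len(pred_items & gold_items)
    precision = true_pos / len(pred_items) if pred_items else 0.0
    recall = true_pos / len(gold_items) if gold_items else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score here is the balanced F1: harmonic mean of the two.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Invented example: four reference fields, three extracted, two correct.
gold = {"family": "Asteraceae", "scientific_name": "Solidago altissima L.",
        "collector": "J. Smith", "date_collected": "14 Sep 1987"}
predicted = {"family": "Asteraceae", "scientific_name": "Solidago altissirna L.",
             "collector": "J. Smith"}
precision, recall, f_score = field_scores(predicted, gold)
```

In this invented example the scientific name contains an OCR error 
("rn" for "m"), so two of the three extracted fields are correct 
(precision 2/3) and two of the four reference fields are recovered 
(recall 1/2).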

Anyone can view our wish list (http://tinyurl.com/OCRHackathonWishList) 
of some possible goals for optimizing the machine and natural language 
processing algorithms used on OCR output from specimen labels. If you 
are interested in participating or would like to know more, please 
email Debbie Paul (dpaul at fsu.edu) as soon as possible. The deadline 
to participate in the October 1-2 workshop is Thursday, August 30th.

Looking forward to your participation,
From all of us in the iDigBio Augmenting OCR Working Group

Please forward to other interested listservs - thanks!

Deborah Paul
User Services, iDigBio
Institute for Digital Information, iDigInfo
Florida State University
Tallahassee, Florida 32308
