The Open University and the Natural History Museum are seeking a postdoctoral researcher for one year (approx. 28k pa) to work on concept extraction from scanned taxonomic literature.
Scanned texts contain errors introduced by imperfect OCR and other sources, so techniques are required that are robust in the face of such errors. The successful applicant will develop techniques that use typographical and contextual cues to identify and tag relevant document content.
The ideal candidate would have a PhD (or equivalent experience) and experience in one or more of the following:
- natural language processing/information extraction/information retrieval, in particular from noisy data;
- image analysis and feature extraction;
- document layout (reverse-engineering a DTD);
- XML for mark-up and term annotation;
- broad familiarity with biological systematics.
Good programming skills are essential, as is the ability to learn quickly. Applications from candidates with a background in the biological sciences who can demonstrate appropriate computing skills are encouraged.
For a project description, see http://editwebrevisions.info/content/jobs
Apply through http://www3.open.ac.uk/employment, or email the Recruitment Secretary at MCS-Recruitment_at_open.ac.uk, quoting the reference number. Closing date: 16th October 2008.
For enquiries about the research project, please contact: David Morse [d.r.morse_at_open.ac.uk].