[tdwg] Research post in concept extraction

Dave Roberts workpackage6 at googlemail.com
Tue Sep 30 15:51:27 CEST 2008

The Open University and the Natural History Museum are seeking a  
postdoctoral researcher for a year (approx. 28k pa) to work on  
concept extraction from scanned taxonomic literature.

Scanned texts contain errors introduced by imperfect OCR and other  
sources, so techniques are required that are robust in the face of  
such errors. The successful applicant will develop techniques that  
use typographical and contextual cues to identify and tag relevant  
document content.

The ideal candidate would have a PhD (or equivalent experience), and  
experience in one or more of the following:
-	natural language processing/information extraction/information  
retrieval, in particular from noisy data;
-	image analysis and feature extraction;
-	document layout (reverse-engineering a DTD);
-	XML for mark-up and term annotation;
-	broad familiarity with biological systematics.

Good programming skills are essential, as is the ability to learn  
quickly. Applications from candidates with a background in the  
biological sciences who can demonstrate appropriate computing skills  
are encouraged.

For project description see http://editwebrevisions.info/content/jobs

Apply through http://www3.open.ac.uk/employment, or email the
Recruitment Secretary at MCS-Recruitment_at_open.ac.uk quoting the  
reference number. Closing date: 16th October 2008.

For enquiries about the research project, please contact: David Morse

Dr D.McL. Roberts,        Tel: +44 (0)20 7942 5086
European Distributed Institute of Taxonomy Project,
Coordinator WorkPackage 6 (Unifying Revisionary Taxonomy),
Dept. Zoology,
The Natural History Museum,
Cromwell Road,
London        SW7 5BD
Great Britain             Email: dmr at nomencurator.org
Web page:  http://www.editwebrevisions.info/
Web page:  http://www.e-taxonomy.eu/
In general, it is easier to get forgiveness than to get permission.
