When I'm mapping specimen codes to GBIF I have one query interface and one return format. If I have to go to individual providers then all bets are off. If I'm lucky, the provider supports something like linked data, so I can work out how to retrieve the data (as opposed to a human-friendly web page). More likely, I expect we will face all sorts of formats. For example, today I discovered records in GenBank that are linked to a tissue database with web pages like this:

So, I have to write code to scrape this page and get the bit I need (the voucher code). Really? In this day and age? On the one hand it's great that this information exists, but if it's not computer-readable then it's much harder to integrate the data.
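To make the cost concrete, here is a minimal Python sketch of that kind of scraping, using only the standard library's `html.parser`. It assumes, hypothetically, that the tissue page lays out the voucher as a two-column table row (`<td>Voucher</td><td>...</td>`); the sample HTML and the code `ABC 12345` are made up for illustration. A real page will have its own layout, and the scraper breaks as soon as that layout changes, which is exactly the problem.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td> cell, in document order.

    Hypothetical assumption: the voucher appears as <td>Voucher</td><td>CODE</td>.
    """

    def __init__(self):
        super().__init__()
        self.cells = []        # text of each completed <td>
        self._in_cell = False  # currently inside a <td>?
        self._buffer = []      # text fragments for the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self.cells.append("".join(self._buffer).strip())
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._buffer.append(data)


def extract_voucher(html, label="Voucher"):
    """Return the text of the cell following the first cell labelled `label`, or None."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    for i, text in enumerate(parser.cells[:-1]):
        # Tolerate a trailing colon and differences in case ("Voucher:", "VOUCHER").
        if text.rstrip(":").strip().lower() == label.lower():
            return parser.cells[i + 1]
    return None


# Made-up page fragment standing in for a tissue database record.
sample = """
<table>
  <tr><td>Tissue ID</td><td>T-0001</td></tr>
  <tr><td>Voucher:</td><td>ABC 12345</td></tr>
</table>
"""
print(extract_voucher(sample))  # ABC 12345
```

Compare this with a provider that exposes the same record as structured data: there the voucher is a named field, and none of this guesswork about table layout is needed.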

Even if we use standard vocabularies we can still have problems. BigDig found a whole range of different versions of Darwin Core in the wild (see http://bigdig.ecoforge.net/wiki/SchemaStatus), and I suspect this is one of the sources of GBIF's problems (whoever decided that catalogNumber and catalogNumberText were a good idea has a lot to answer for).
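One way to cope with version drift is to map every field name a harvester encounters onto a single canonical term before doing anything else. The Python sketch below does that with a hand-maintained alias table. The table is an illustrative assumption, not a complete or authoritative map of Darwin Core versions; names are compared case-insensitively after dropping any namespace prefix such as `dwc:`, and the first non-empty value (in input order) wins when several aliases are present.

```python
# Hypothetical alias table: canonical term -> variant names seen across versions.
# Illustrative only; a real table would be built from the schemas actually in use.
ALIASES = {
    "catalogNumber": ["catalogNumber", "catalogNumberText", "catalogNumberNumeric"],
    "institutionCode": ["institutionCode"],
    "collectionCode": ["collectionCode"],
}


def _key(name):
    """Drop any namespace prefix (e.g. 'dwc:') and lower-case the local name."""
    return name.split(":")[-1].lower()


# Reverse lookup: normalized variant name -> canonical term.
_LOOKUP = {_key(v): canon for canon, variants in ALIASES.items() for v in variants}


def normalize_record(record):
    """Split a record into (canonical fields, unmapped fields).

    The first non-empty value for each canonical term wins, in input order.
    """
    mapped, unmapped = {}, {}
    for field, value in record.items():
        canon = _LOOKUP.get(_key(field))
        if canon is None:
            unmapped[field] = value
        elif canon not in mapped and value not in (None, ""):
            mapped[canon] = str(value).strip()
    return mapped, unmapped


# Made-up record mixing capitalisation, a namespace prefix and an older field name.
rec = {"dwc:InstitutionCode": "ABC", "CatalogNumberText": "12345", "Locality": "somewhere"}
print(normalize_record(rec))
```

Keeping the unmapped fields separate, rather than silently dropping them, makes it easier to spot new variants that need adding to the table.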

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page@bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962@aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html