[tdwg-tag] Specimen identifiers [SEC=UNCLASSIFIED]
r.page at bio.gla.ac.uk
Wed Feb 29 18:49:12 CET 2012
One further reason for centralisation (again, not "instead of" but "as well as") is consistency of metadata.
When I'm mapping specimen codes to GBIF I have one query interface and one return format. If I have to go to individual providers then all bets are off. Perhaps I'm lucky and the provider supports something like linked data, so I can figure out how to retrieve data (as opposed to a human-friendly web page). But instead I expect we will have all sorts of formats. For example, today I discovered records in GenBank that are linked to a tissue database with web pages like this:
http://collections.nhm.ku.edu/KU_Tissue/detail.jsp?record=367 (from sequence http://www.ncbi.nlm.nih.gov/nuccore/FJ215165 )
So, I have to write code to scrape this page and get the bit I need (the voucher code). Really? In this day and age? On the one hand it's great that this information exists, but if it's not computer readable then it makes it harder to integrate the data.
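To give a feel for what that scraping involves, here is a minimal sketch in Python. The page markup is an assumption: I am pretending the detail page is a table of label/value cells with a "Voucher:" label, and the sample HTML and voucher value below are made up for illustration, not copied from the KU page. The names LabelledCellParser and extract_field are hypothetical too.

```python
from html.parser import HTMLParser


class LabelledCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.cells.append("".join(self._buf).strip())
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._buf.append(data)


def extract_field(html, label):
    """Return the text of the cell that follows the cell named `label`."""
    parser = LabelledCellParser()
    parser.feed(html)
    parser.close()
    wanted = label.rstrip(":").lower()
    for i, text in enumerate(parser.cells[:-1]):
        if text.rstrip(":").lower() == wanted:
            return parser.cells[i + 1]
    return None  # label not found on the page


# Made-up markup and values, standing in for a real detail page.
SAMPLE = """<table>
<tr><td>Tissue number:</td><td>367</td></tr>
<tr><td>Voucher:</td><td>EXAMPLE 0001</td></tr>
</table>"""

voucher = extract_field(SAMPLE, "Voucher")
```

Every provider with a different page layout needs its own version of this, and each one breaks as soon as the page is redesigned.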
Even if we use standard vocabularies we can still have problems. BigDig found a whole range of different versions of Darwin Core in the wild (see http://bigdig.ecoforge.net/wiki/SchemaStatus ), and I suspect this is one of the sources of GBIF's problems (whoever decided that catalogNumber and catalogNumberText were a good idea has a lot to answer for).
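As a rough illustration of the extra work this creates for a consumer, here is a Python sketch that looks for the catalogue number under more than one field name. Only catalogNumber and catalogNumberText come from the discussion above; the order of preference, the record-as-dictionary shape, and the function name catalog_number are my assumptions.

```python
# Field names that may carry the catalogue number, in order of preference.
# The preference order is an assumption, not something any schema defines.
CATALOG_NUMBER_ALIASES = ("catalogNumber", "catalogNumberText")


def catalog_number(record):
    """Return the first non-empty catalogue number in `record`, or None."""
    for key in CATALOG_NUMBER_ALIASES:
        value = record.get(key)
        if value is not None and str(value).strip():
            return str(value).strip()
    return None
```

A record such as {"catalogNumberText": " 367 "} yields "367", and a record with neither field yields None. The real list of variants is likely longer, which is rather the point.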
This is one reason I argue that we want both centralisation and decentralisation.
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK
Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962 at aim.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html