GUIDs, LSIDs, and metadata
Not much is happening on the list side of things, so in the interest of sparking discussion here are a few thoughts.
1. GUIDs by themselves are trivial. We are awash in them (book ISBNs, GenBank accession numbers, etc.). Software developers generate them all the time for things like Windows components, Firefox extensions, web objects, etc. There are tools for making these, e.g. here's one: AAF813DE-21E0-11DA-A940-000D93425524.
2. The key is to link GUIDs to information, and for that information to be in a predictable form. For example, DOIs are widely used GUIDs, but when you resolve a DOI you have no idea what to expect. You might get a PDF or HTML view of a manuscript, or just an abstract, or a page asking for money to view a manuscript. The format of the response varies widely.
3. Of course, GUIDs ARE vital. The DiGIR protocol's biggest weakness, in my opinion, is that it fails to provide GUIDs. Although it does provide information in a standard form (Darwin Core), the user has no way of getting a GUID. I briefly toyed with an interim solution for a project I'm working on. A DiGIR GUID would be
digir.fieldmuseum.org:80/digir/DiGIR.php:MammalsDwC2:158106
which is the address of the DiGIR provider, the resource name, and the specimen number (in this case, the specimen is FMNH 158106). This plan was scuppered by the fact that more than one specimen can have the same specimen code. For example, the Museum of Vertebrate Zoology has three specimens with the code MVZ 148946, corresponding to the taxa Chaetodipus baileyi baileyi, Calidris mauri, and Rana cascadae. A DiGIR request for specimen MVZ 148946 returns three totally different specimens!
4. I like LSIDs (despite the overhead of setting them up), but for me the main attraction is their use of metadata in RDF. This opens up a world of tools from the Semantic Web community, such as triple stores (databases for RDF). One can harvest metadata and store it in a "knowledge base." As this knowledge base grows we can uncover new facts. For example, NCBI doesn't know that Gliricidia ehrenbergii and Hybosema ehrenbergii are synonyms, whereas IPNI does. If these databases output RDF we can extract this information. If you have IBM's LaunchPad and Internet Explorer 6, or Firefox with my LSID extension, then this link (lsidres:urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:1108320-2) displays RDF for one of IPNI's records for Gliricidia ehrenbergii (readers without any of these tools can view the raw RDF at http://ipni.org.lsid.zoology.gla.ac.uk/authority/metadata?lsid=urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:1108320-2 ). This RDF has links to LSIDs for nomenclatural synonyms of this name, and if you follow those you encounter Hybosema ehrenbergii. Hence, armed with consistent metadata, one can make inferences about names.
5. Another attraction of RDF is that it sidesteps the need for the huge, bloated XML schemas that seem to bedevil the field at the moment. RDF tends to be simple and flat, and there are a number of existing vocabularies we can draw on (e.g., http://www.w3.org/2003/01/geo/).
6. I must confess I regard taxonomic concepts as a potential black hole. I understand the arguments in favour, I just don't buy that this is a tractable problem. I also think it is largely going to be of historical interest as more and more data become linked to specimens and to things like DNA barcodes. The fact that reconciling even two taxonomic classifications can be a major undertaking does not bode well for this project. For some more general thoughts on this issue, see http://shirky.com/writings/ontology_overrated.html (a taxonomic classification is an ontology).
7. I think the first priority for assigning GUIDs is museum specimens. For taxon names (if not concepts) this is trivial, given that most name databases have their own, internally unique ids (but not all -- those databases that use names as primary keys, or which don't expose integer identifiers will need to rethink their design).
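As an illustration of point 1, Python's standard `uuid` module is one such tool. The sketch below parses the example GUID from point 1, which is a version 1 (time-based) UUID, and decodes the timestamp embedded in it. The function `describe` and the constant name are illustrative choices introduced here, not part of any standard API.

```python
import datetime
import uuid

# The example GUID quoted in point 1.
EXAMPLE = uuid.UUID("AAF813DE-21E0-11DA-A940-000D93425524")

# 100-nanosecond ticks between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_TO_UNIX_TICKS = 0x01B21DD213814000


def describe(u: uuid.UUID) -> str:
    """Return a short description of a UUID; for version 1, include the UTC date it was minted."""
    if u.version == 1:
        # u.time is the 60-bit timestamp, counted in 100 ns ticks since the UUID epoch.
        seconds = (u.time - UUID_TO_UNIX_TICKS) / 10_000_000
        minted = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
        return f"version 1 (time-based), minted {minted:%Y-%m-%d}"
    return f"version {u.version}"


print(describe(EXAMPLE))       # version 1 (time-based), minted 2005-09-10
print(describe(uuid.uuid4()))  # version 4
```

Version 1 UUIDs also carry a node field (often the generating machine's network address), so a random version 4 UUID from `uuid.uuid4()` is an option when embedding time and host details is undesirable.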
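The collision problem in point 3 can be reproduced with a small sketch that builds the provider:resource:specimen-code identifier and groups records by it. The FMNH values come from point 3. The MVZ provider address and resource name are hypothetical placeholders, and serving all three MVZ 148946 records from a single resource is an assumption made for illustration.

```python
from collections import defaultdict


def digir_guid(provider: str, resource: str, specimen_code: str) -> str:
    """Join provider address, resource name and specimen code into one identifier."""
    return f"{provider}:{resource}:{specimen_code}"


# (provider, resource, specimen code, taxon). The FMNH row uses the values from point 3
# (taxon omitted). The MVZ provider and resource are hypothetical placeholders.
RECORDS = [
    ("digir.fieldmuseum.org:80/digir/DiGIR.php", "MammalsDwC2", "158106", None),
    ("mvz.example.org/digir/DiGIR.php", "MVZ_HYPOTHETICAL", "148946", "Chaetodipus baileyi baileyi"),
    ("mvz.example.org/digir/DiGIR.php", "MVZ_HYPOTHETICAL", "148946", "Calidris mauri"),
    ("mvz.example.org/digir/DiGIR.php", "MVZ_HYPOTHETICAL", "148946", "Rana cascadae"),
]


def find_collisions(records):
    """Return {identifier: [taxa]} for every identifier shared by more than one record."""
    by_id = defaultdict(list)
    for provider, resource, code, taxon in records:
        by_id[digir_guid(provider, resource, code)].append(taxon)
    return {guid: taxa for guid, taxa in by_id.items() if len(taxa) > 1}


for guid, taxa in find_collisions(RECORDS).items():
    print(f"{guid} is not unique: {taxa}")
```

Any identifier scheme built on specimen codes would need a check like this, or an extra disambiguating component, before its identifiers could be treated as globally unique.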
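The synonym inference in point 4 can be sketched with a toy in-memory "knowledge base" of (subject, predicate, object) triples. Everything here is illustrative. The predicate names are hypothetical stand-ins for whatever vocabulary IPNI's RDF actually uses. Only the IPNI LSID for Gliricidia ehrenbergii comes from the text; the other two LSIDs are made up. Records are linked to a name by exact string match, and synonymy is treated as symmetric. A real system would use a proper triple store rather than a Python set.

```python
from collections import deque

# Hypothetical predicates standing in for a provider's actual RDF vocabulary.
SYNONYM = "hypothetical:hasNomenclaturalSynonym"
NAME = "hypothetical:nameComplete"

IPNI_GLIRICIDIA = "urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:1108320-2"
IPNI_HYBOSEMA = "urn:lsid:example.org:names:hybosema-ehrenbergii"    # hypothetical LSID
NCBI_GLIRICIDIA = "urn:lsid:example.org:ncbi:gliricidia-ehrenbergii"  # hypothetical LSID

# Triples harvested from two sources; the NCBI record carries no synonymy.
TRIPLES = {
    (IPNI_GLIRICIDIA, NAME, "Gliricidia ehrenbergii"),
    (IPNI_GLIRICIDIA, SYNONYM, IPNI_HYBOSEMA),
    (IPNI_HYBOSEMA, NAME, "Hybosema ehrenbergii"),
    (NCBI_GLIRICIDIA, NAME, "Gliricidia ehrenbergii"),
}


def names_of(subject):
    """All name strings attached to a record."""
    return {o for s, p, o in TRIPLES if s == subject and p == NAME}


def synonyms(name):
    """Names reachable from `name` through chains of synonym links, followed in both directions."""
    start = {s for s, p, o in TRIPLES if p == NAME and o == name}
    seen, queue = set(start), deque(start)
    while queue:
        node = queue.popleft()
        for s, p, o in TRIPLES:
            if p != SYNONYM:
                continue
            if s == node:
                neighbour = o
            elif o == node:
                neighbour = s
            else:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return {n for record in seen for n in names_of(record)} - {name}


print(synonyms("Gliricidia ehrenbergii"))  # {'Hybosema ehrenbergii'}
```

Because the NCBI record shares the name string, a query starting from it also reaches Hybosema ehrenbergii, which is the kind of cross-database inference point 4 describes.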
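As an example of the flat structure mentioned in point 5, a specimen's locality can be written as a couple of plain statements using the W3C geo vocabulary (namespace `http://www.w3.org/2003/01/geo/wgs84_pos#`, properties `lat` and `long`). The sketch below serialises them as N-Triples. The specimen LSID and coordinates are hypothetical, and the literal escaping covers only backslashes, quotes and newlines rather than the full N-Triples rules.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def literal(value: str) -> str:
    """Quote a plain literal, escaping backslashes, double quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def ntriples(subject: str, properties: dict) -> str:
    """Serialise one subject's (predicate URI -> literal value) pairs as N-Triples lines."""
    return "\n".join(
        f"<{subject}> <{predicate}> {literal(value)} ." for predicate, value in properties.items()
    )


# Hypothetical specimen LSID and coordinates, for illustration only.
SPECIMEN = "urn:lsid:example.org:specimens:0001"
print(ntriples(SPECIMEN, {GEO + "lat": "41.88", GEO + "long": "-87.63"}))
```

Each line is a complete, self-describing statement, so no document-level schema is needed to interpret it.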
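Point 7 notes that most name databases already have internally unique ids. The sketch below shows how such a key might be wrapped as an LSID and how an LSID string splits back into its parts, assuming the syntax urn:lsid:authority:namespace:object[:revision]. The authority `names.example.org` and namespace `name` are hypothetical. The parser is deliberately simplified and does not validate characters or empty components.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    """urn:lsid:<authority>:<namespace>:<object>[:<revision>]"""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def mint(authority: str, namespace: str, internal_id: int) -> LSID:
    """Wrap a database's internal integer key as an LSID (no revision)."""
    return LSID(authority, namespace, str(internal_id))


def parse(text: str) -> LSID:
    """Split an LSID string into its components; raise ValueError if it is malformed."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


print(mint("names.example.org", "name", 12345))  # urn:lsid:names.example.org:name:12345
print(parse("urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:1108320-2").object_id)  # 1108320-2
```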
Regards
Rod
Professor Roderic D. M. Page Editor, Systematic Biology DEEB, IBLS Graham Kerr Building University of Glasgow Glasgow G12 8QP United Kingdom
Phone: +44 141 330 4778 Fax: +44 141 330 2792 email: r.page@bio.gla.ac.uk web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/