Re: GUIDs, LSIDs, and metadata
On 12 Sep 2005, at 08:40, Peter Dawyndt wrote:
It seems that the solutions worked into the StrainInfo.net portal have much in common with the problems encountered when integrating taxonomic names into a single coherent system. In this context I also recommend that the Taxonomic Databases Working Group take a look at the experimental work done by George Garrity of Bergey's Manual Trust to work bacterial taxonomic names into the DOI framework. After all, it seems to me that the DOI framework currently offers a far more extensive set of software solutions and organisational arrangements than LSIDs do at present. An essential thing that is missing in the latter framework seems to be a well-thought-out business plan to guarantee the long-term survival of the GUID system. It also seems a bit like reinventing the wheel to me to overlook a system that has already gone through the 'proof-of-principle' stage. We already have a morbid growth of identifiers piling up in our information systems, so it would be unwise to put our effort into the proliferation of network standards that cover the same domain.
DOIs as presently implemented have limitations, especially the lack of any way of knowing what kind of output the DOI will yield. The LSID specification requires that any LSID resolve to RDF metadata, and also provides an explicit mechanism for downloading associated data. For example, the LSID urn:lsid:sid.zoology.gla.ac.uk:id:6 resolves to this metadata:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:e="urn:lsid:sid.zoology.gla.ac.uk:predicates:"
         xmlns:a="http://purl.org/dc/elements/1.1/#"
         xmlns:d="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:b="http://purl.org/dc/elements/1.1/"
         xmlns:c="urn:lsid:i3c.org:predicates:"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="urn:lsid:sid.zoology.gla.ac.uk:id:6">
    <c:storedas rdf:resource="urn:lsid:sid.zoology.gla.ac.uk:id:6"/>
    <rdf:type rdf:resource="urn:lsid:i3c.org:types:content"/>
    <b:format>image/tiff</b:format>
    <d:label>6.TIF</d:label>
    <e:sem rdf:resource="urn:lsid:sid.zoology.gla.ac.uk:predicates:sem"/>
    <a:title>Image of Dennyus hirundinis</a:title>
    <a:date>1997-05-23</a:date>
    <a:creator>Vince Smith</a:creator>
  </rdf:Description>
</rdf:RDF>
The "storedas" tag tells us there is data associated with this LSID, and the "format" tag tells us it is a TIFF image. If you resolve this LSID in Firefox (lsidres:urn:lsid:sid.zoology.gla.ac.uk:id:6), you'll be able to get the TIFF image. On a Mac OS X machine, or a PC with QuickTime installed, the image will load in the browser window; otherwise you will be prompted to save it.
The point of all of this is that I can write scripts or programs that can make use of LSIDs, for example, by aggregating them into a local knowledge base. To my mind, this is where the true power of LSIDs becomes apparent.
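As a rough sketch of the kind of script I have in mind, the Python below reads the metadata shown above and files it in a small local knowledge base. It assumes the RDF has already been fetched by some LSID resolver client (the fetching step is not shown), and the names summarise and knowledge_base are purely illustrative. The metadata is trimmed to the fields the script uses.

```python
import xml.etree.ElementTree as ET

# Namespace URIs exactly as they appear in the metadata above.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"        # used for format
DC_HASH = "http://purl.org/dc/elements/1.1/#"  # used for title and creator
I3C = "urn:lsid:i3c.org:predicates:"

# Metadata copied from the message above, trimmed to the fields used here.
METADATA = b"""<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:a="http://purl.org/dc/elements/1.1/#"
         xmlns:b="http://purl.org/dc/elements/1.1/"
         xmlns:c="urn:lsid:i3c.org:predicates:"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="urn:lsid:sid.zoology.gla.ac.uk:id:6">
    <c:storedas rdf:resource="urn:lsid:sid.zoology.gla.ac.uk:id:6"/>
    <b:format>image/tiff</b:format>
    <a:title>Image of Dennyus hirundinis</a:title>
    <a:creator>Vince Smith</a:creator>
  </rdf:Description>
</rdf:RDF>"""


def summarise(rdf_xml):
    """Pull out the fields a script needs to decide what an LSID points to."""
    root = ET.fromstring(rdf_xml)
    desc = root.find(f"{{{RDF}}}Description")
    if desc is None:
        raise ValueError("no rdf:Description element found")

    def text_of(namespace, tag):
        element = desc.find(f"{{{namespace}}}{tag}")
        return element.text if element is not None else None

    return {
        "lsid": desc.get(f"{{{RDF}}}about"),
        # A storedas element signals that data (not just metadata) is available.
        "has_data": desc.find(f"{{{I3C}}}storedas") is not None,
        "format": text_of(DC, "format"),
        "title": text_of(DC_HASH, "title"),
        "creator": text_of(DC_HASH, "creator"),
    }


# Hypothetical local knowledge base: a dictionary keyed by LSID.
knowledge_base = {}
record = summarise(METADATA)
knowledge_base[record["lsid"]] = record
print(record)
```

Because every LSID is expected to return RDF, the same function can be pointed at any other LSID without knowing in advance what it identifies.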
I think there are at least three distinct issues to be considered:
1. Persistence, which is a social issue
2. Assigning and resolving GUIDs
3. What gets served when a GUID is resolved
I think we can de-couple these, in the sense that we could use a handle system like that upon which DOIs are based, but adopt the RDF metadata approach of LSIDs.
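To make the separation concrete, here is a minimal Python sketch in which assigning and resolving identifiers lives in one place and the choice of what to serve lives in another. Everything in it is an assumption for illustration: InMemoryRegistry merely stands in for whatever resolution service is used (it is not the API of a handle server), the identifier hdl:example/6 is made up, and to_rdf emits only a couple of Dublin Core fields.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


class InMemoryRegistry:
    """Issue 2: assigning and resolving GUIDs (hypothetical stand-in)."""

    def __init__(self):
        self._records = {}

    def assign(self, guid, record):
        if guid in self._records:
            raise ValueError(f"{guid} is already assigned")
        self._records[guid] = record

    def resolve(self, guid):
        return self._records[guid]


def to_rdf(guid, record):
    """Issue 3: what gets served. RDF/XML, regardless of the GUID scheme."""
    fields = "\n".join(
        f"    <dc:{name}>{escape(value)}</dc:{name}>"
        for name, value in sorted(record.items())
    )
    return (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
        '         xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
        f"  <rdf:Description rdf:about={quoteattr(guid)}>\n"
        f"{fields}\n"
        "  </rdf:Description>\n"
        "</rdf:RDF>"
    )


registry = InMemoryRegistry()
# Made-up, handle-style identifier; the field values echo the example above.
registry.assign("hdl:example/6", {"title": "Image of Dennyus hirundinis",
                                  "format": "image/tiff"})

served = to_rdf("hdl:example/6", registry.resolve("hdl:example/6"))
print(served)
```

Swapping the registry for a different resolution mechanism would leave to_rdf untouched, which is the sense in which the two issues can be handled independently. Persistence (issue 1) is not something code can guarantee, so it does not appear here.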
This thread has also raised the issue of mapping between multiple GUIDs. I think it is inevitable that we will have to deal with this, especially as there already exist major databases containing taxonomic information. For example, consider the task of mapping between mammalian names in DiGIR providers and those used in GenBank (a relatively straightforward problem). In some lucky cases where we have specimen information in GenBank we can tie the two together that way, but for other names and sequences we aren't this lucky. If our databases are distributed, and run by organisations with different goals and agendas (I doubt biodiversity rates highly in NCBI's list of things to do), we will have to deal with this.
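The "lucky case" can be sketched in a few lines of Python: where a sequence record carries a specimen voucher, match it against the catalogue numbers held by a specimen provider. All records, field names, identifiers and accession numbers below are made-up placeholders, not real DiGIR or GenBank data, and real voucher strings would need far more careful normalisation than this.

```python
def normalise(code):
    """Crude normalisation: upper-case and keep only letters and digits."""
    return "".join(ch for ch in code.upper() if ch.isalnum())


def link_by_specimen(provider_records, sequence_records):
    """Pair sequence accessions with provider GUIDs via shared specimen codes."""
    by_specimen = {}
    for rec in provider_records:
        by_specimen.setdefault(normalise(rec["catalog_number"]), []).append(rec)

    linked, unlinked = [], []
    for seq in sequence_records:
        voucher = seq.get("specimen_voucher")
        matches = by_specimen.get(normalise(voucher), []) if voucher else []
        if matches:
            for rec in matches:
                linked.append((seq["accession"], rec["guid"]))
        else:
            # No specimen information, or no match: the unlucky case.
            unlinked.append(seq["accession"])
    return linked, unlinked


# Placeholder records for illustration only.
providers = [
    {"guid": "urn:lsid:example.org:specimens:1", "catalog_number": "EX 1234"},
]
sequences = [
    {"accession": "XX000001", "specimen_voucher": "ex-1234"},
    {"accession": "XX000002", "specimen_voucher": None},
]

linked, unlinked = link_by_specimen(providers, sequences)
print(linked)    # [('XX000001', 'urn:lsid:example.org:specimens:1')]
print(unlinked)  # ['XX000002']
```

The unlinked list is the part that needs name-based matching or manual curation, which is where the real work lies.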
Further reading:
[1] P. Dawyndt, M. Vancanneyt, H. De Meyer & J. Swings (2005). Knowledge Accumulation and Resolution of Data Inconsistencies during the Integration of Microbial Information Sources. IEEE Transactions on Knowledge and Data Engineering 17(8), 1111-1126. http://doi.ieeecomputersociety.org/10.1109/TKDE.2005.131
[2] P. Dawyndt, M. Vancanneyt & J. Swings (2004). On the integration of microbial information. WFCC Newsletter 38, 19-34. http://wdcm.nig.ac.jp/wfcc/NEWSLETTER/newsletter38/a3.pdf
[3] P. Dawyndt, B. De Baets, X. Zhou, J. Ma & J. Swings. StrainInfo.net: Holding a wealth of downstream information on microbial resources right in our hands. http://www.cpdr.ucl.ac.be/bioinf/papers/bioinf/bioinf_dawyndt.pdf
Also check out the background document and discussion papers that came out of the specialist workshop on "Exploring and exploiting microbiological commons: contributions of bioinformatics and intellectual property rights in sharing biological information" at http://lmg.ugent.be/bioinf-ipr/
Interesting reading.
Rod
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/