GUIDs, LSIDs, and metadata

Roderic Page r.page at BIO.GLA.AC.UK
Mon Sep 12 10:35:59 CEST 2005


On 12 Sep 2005, at 08:40, Peter Dawyndt wrote:
>
> It seems that the solutions worked into the StrainInfo.net portal have many
> common grounds with the problems encountered with the integration of taxonomic
> names into a single coherent system. In this context I also recommend the
> Taxonomic Databases Working Group to take a look at the experimental work done
> by George Garrity of Bergey's Manual Trust to work bacterial taxonomic names
> into the DOI framework. After all, it seems to me that the DOI framework
> currently offers a far more extended framework of software solutions and
> organisational issues that outreach those of the LSIDs at present. An essential
> thing that is missing in the latter framework seems to be a well-thought about
> business plan to guarantee the long-term survival of the GUID system. Also it
> seems a bit like reinventing the wheel to me to overlook a system that has
> already gone through the 'proof-of-principle' stage. We already have a morbid
> growth of identifiers that are piling up our information systems, so it would be
> unwise to put our effort into the proliferation of network standards that cover
> the same domain.


DOIs as presently implemented have limitations, especially the lack of
any way of knowing what kind of output a DOI will yield. The LSID
specification requires that any LSID resolve to RDF metadata, and it
also provides an explicit mechanism for downloading associated data.
For example, the LSID urn:lsid:sid.zoology.gla.ac.uk:id:6 resolves to
this metadata:

<?xml version="1.0" encoding="utf-8"?><rdf:RDF
xmlns:e="urn:lsid:sid.zoology.gla.ac.uk:predicates:"
xmlns:a="http://purl.org/dc/elements/1.1/#"
xmlns:d="http://www.w3.org/2000/01/rdf-schema#"
xmlns:b="http://purl.org/dc/elements/1.1/"
xmlns:c="urn:lsid:i3c.org:predicates:"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 >
<rdf:Description rdf:about="urn:lsid:sid.zoology.gla.ac.uk:id:6">
<c:storedas rdf:resource="urn:lsid:sid.zoology.gla.ac.uk:id:6"/>
<rdf:type rdf:resource="urn:lsid:i3c.org:types:content"/>
<b:format>image/tiff</b:format>
<d:label>6.TIF</d:label>
<e:sem rdf:resource="urn:lsid:sid.zoology.gla.ac.uk:predicates:sem"/>
<a:title>Image of Dennyus hirundinis</a:title>
<a:date>1997-05-23</a:date>
<a:creator>Vince Smith</a:creator>
</rdf:Description>
</rdf:RDF>

The "storedas" tag tells us there is data associated with this LSID,
and the "format" tag tells us it is a TIFF image. If you resolve this
LSID in Firefox (lsidres:urn:lsid:sid.zoology.gla.ac.uk:id:6), you'll be
able to get the TIFF image. On a Mac OS X machine, or a PC with
QuickTime installed, the image will load in the browser window;
otherwise you will be prompted to save the image.
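
To make the reading of those two tags concrete, here is a minimal Python
sketch that parses the metadata shown above with the standard library
and reports whether data is attached and in what format. The function
name "summarize" and the prefix names in NS are my own choices for
illustration, not part of any LSID toolkit, and the XML declaration is
omitted from the embedded string.

```python
import xml.etree.ElementTree as ET

# Metadata copied from the message above (XML declaration omitted).
RDF_XML = """<rdf:RDF
xmlns:e="urn:lsid:sid.zoology.gla.ac.uk:predicates:"
xmlns:a="http://purl.org/dc/elements/1.1/#"
xmlns:d="http://www.w3.org/2000/01/rdf-schema#"
xmlns:b="http://purl.org/dc/elements/1.1/"
xmlns:c="urn:lsid:i3c.org:predicates:"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 >
<rdf:Description rdf:about="urn:lsid:sid.zoology.gla.ac.uk:id:6">
<c:storedas rdf:resource="urn:lsid:sid.zoology.gla.ac.uk:id:6"/>
<rdf:type rdf:resource="urn:lsid:i3c.org:types:content"/>
<b:format>image/tiff</b:format>
<d:label>6.TIF</d:label>
<e:sem rdf:resource="urn:lsid:sid.zoology.gla.ac.uk:predicates:sem"/>
<a:title>Image of Dennyus hirundinis</a:title>
<a:date>1997-05-23</a:date>
<a:creator>Vince Smith</a:creator>
</rdf:Description>
</rdf:RDF>"""

# Prefix names here are arbitrary; only the namespace URIs matter.
# Note that the sample uses two Dublin Core URIs (with and without "#").
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dch": "http://purl.org/dc/elements/1.1/#",
    "i3c": "urn:lsid:i3c.org:predicates:",
}


def summarize(rdf_xml):
    """Return the LSID, whether data is attached, and its format and title."""
    root = ET.fromstring(rdf_xml)
    desc = root.find("rdf:Description", NS)
    return {
        "lsid": desc.get("{%s}about" % NS["rdf"]),
        "has_data": desc.find("i3c:storedas", NS) is not None,
        "format": desc.findtext("dc:format", namespaces=NS),
        "title": desc.findtext("dch:title", namespaces=NS),
    }


if __name__ == "__main__":
    print(summarize(RDF_XML))
```

A script can use "has_data" and "format" to decide whether it is worth
fetching the associated data at all, without guessing from the
identifier itself.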

The point of all of this is that I can write scripts or programs that
can make use of LSIDs, for example, by aggregating them into a local
knowledge base. To my mind, this is where the true power of LSIDs
becomes apparent.
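
One way such aggregation could look, as a sketch only: flatten each RDF
description into (subject, predicate, object) triples and pool them in
a small in-memory store that can be queried. The class KnowledgeBase,
the function triples_from_rdf, and the second record (under
example.org) are hypothetical and exist purely for illustration; a real
system would more likely use a proper RDF library and triple store.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# First record: trimmed from the metadata in this message.
# Second record: entirely made up, to show aggregation across sources.
RECORDS = [
    """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:b="http://purl.org/dc/elements/1.1/"
         xmlns:c="urn:lsid:i3c.org:predicates:">
      <rdf:Description rdf:about="urn:lsid:sid.zoology.gla.ac.uk:id:6">
        <c:storedas rdf:resource="urn:lsid:sid.zoology.gla.ac.uk:id:6"/>
        <b:format>image/tiff</b:format>
      </rdf:Description>
    </rdf:RDF>""",
    """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:b="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="urn:lsid:example.org:id:1">
        <b:format>image/jpeg</b:format>
      </rdf:Description>
    </rdf:RDF>""",
]


def triples_from_rdf(rdf_xml):
    """Flatten each rdf:Description into (subject, predicate, object) tuples."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join the parts.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get("{%s}resource" % RDF_NS)
            if obj is None:
                obj = (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


class KnowledgeBase:
    """A toy in-memory triple store."""

    def __init__(self):
        self.triples = set()

    def add(self, triples):
        self.triples.update(triples)

    def query(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching every field that is not None."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


if __name__ == "__main__":
    kb = KnowledgeBase()
    for doc in RECORDS:
        kb.add(triples_from_rdf(doc))
    for triple in kb.query(predicate="http://purl.org/dc/elements/1.1/format"):
        print(triple)
```

Because every LSID yields RDF, the same loop works no matter which
authority issued the identifier.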


I think there are at least three distinct issues to be considered:

1. Persistence, which is a social issue

2. Assigning and resolving GUIDs

3. What gets served when a GUID is resolved

I think we can de-couple these, in the sense that we could use a handle
system like that upon which DOIs are based, but adopt the RDF metadata
approach of LSIDs.

This thread has also raised the issue of mapping between multiple
GUIDs. I think it is inevitable that we will have to deal with this,
especially as there already exist major databases containing taxonomic
information. For example, consider the task of mapping between
mammalian names in DiGIR providers, and those used in GenBank (a
relatively straightforward problem). In some lucky cases where we have
specimen information in GenBank we can tie the two together that way,
but for other names/sequences we aren't this lucky. If our databases
are distributed, and run by organisations with different goals and
agendas (I doubt biodiversity rates highly in NCBI's list of things to
do), we will have to deal with this.
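
The "lucky" case, where specimen information ties two identifiers
together, can be sketched as a simple join on a shared specimen code.
Everything in this example is an assumption for illustration: the
record layout, the field names "guid", "specimen" and "voucher", and
all of the identifiers are invented, and neither DiGIR nor GenBank is
being queried.

```python
def link_by_specimen(provider_records, sequence_records):
    """Pair GUIDs from two sources that cite the same specimen code.

    Returns (links, unmatched): links is a list of
    (provider_guid, sequence_guid) pairs, and unmatched lists the
    sequence GUIDs that could not be tied to any provider record.
    """
    by_specimen = {rec["specimen"]: rec["guid"] for rec in provider_records}
    links, unmatched = [], []
    for rec in sequence_records:
        voucher = rec.get("voucher")
        if voucher is not None and voucher in by_specimen:
            links.append((by_specimen[voucher], rec["guid"]))
        else:
            unmatched.append(rec["guid"])
    return links, unmatched


# Made-up records standing in for a specimen provider and a sequence database.
PROVIDER = [
    {"guid": "urn:lsid:example.org:specimen:1", "specimen": "EX 1001"},
    {"guid": "urn:lsid:example.org:specimen:2", "specimen": "EX 1002"},
]
SEQUENCES = [
    {"guid": "example-seq:0001", "voucher": "EX 1001"},
    {"guid": "example-seq:0002", "voucher": None},
]

if __name__ == "__main__":
    links, unmatched = link_by_specimen(PROVIDER, SEQUENCES)
    print("linked:", links)
    print("unmatched:", unmatched)
```

The "unmatched" list is the part that needs real work: for those
records the mapping has to come from names or other evidence.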


>
> Further reading:
>
> [1] P. Dawyndt, M. Vancanneyt, H. De Meyer & J. Swings (2005) Knowledge
> Accumulation and Resolution of Data Inconsistencies during the Integration of
> Microbial Information Sources 17(8), 1111-1126.
> http://doi.ieeecomputersociety.org/10.1109/TKDE.2005.131
>
> [2] P. Dawyndt, M. Vancanneyt & J. Swings (2004). On the integration of
> microbial information. WFCC Newsletter 38, 19-34.
> http://wdcm.nig.ac.jp/wfcc/NEWSLETTER/newsletter38/a3.pdf
>
> [3] P. Dawyndt, B. De Baets, X. Zhou, J. Ma & J. Swings. StrainInfo.net: Holding
> a wealth of downstream information on microbial resources right in our hands.
> http://www.cpdr.ucl.ac.be/bioinf/papers/bioinf/bioinf_dawyndt.pdf
>
> Also check out the background document and discussion papers that came out of
> the specialist workshop on "Exploring and exploiting microbiological commons:
> contributions of bioinformatics and intellectual property rights in sharing
> biological information" at http://lmg.ugent.be/bioinf-ipr/


Interesting reading.

Rod



Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page at bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/



