Re: GUIDs, LSIDs, and metadata

12 Sep 2005

      On 12 Sep 2005, at 08:40, Peter Dawyndt wrote:
...
It seems that the solutions worked into the StrainInfo.net portal have
many common grounds with the problems encountered with the integration
of taxonomic names into a single coherent system. In this context I also
recommend the Taxonomic Databases Working Group to take a look at the
experimental work done by George Garrity of Bergey's Manual Trust to
work bacterial taxonomic names into the DOI framework. After all, it
seems to me that the DOI framework currently offers a far more extended
framework of software solutions and organisational issues that outreach
those of the LSIDs at present. An essential thing that is missing in the
latter framework seems to be a well-thought about business plan to
guarantee the long-term survival of the GUID system. Also it seems a bit
like reinventing the wheel to me to overlook a system that has already
gone through the 'proof-of-principle' stage. We already have a morbid
growth of identifiers that are piling up our information systems, so it
would be unwise to put our effort into the proliferation of network
standards that cover the same domain.

DOIs as presently implemented have limitations, especially the lack of
any way of knowing what kind of output the DOI will yield. LSIDs
specify that any LSID will resolve to RDF metadata, and also provide
an explicit mechanism for downloading associated data. For example, the
LSID urn:lsid:sid.zoology.gla.ac.uk:id:6 resolves to this metadata:

<?xml version="1.0" encoding="utf-8"?><rdf:RDF
xmlns:e="urn:lsid:sid.zoology.gla.ac.uk:predicates:"
xmlns:a="http://purl.org/dc/elements/1.1/#"
xmlns:d="http://www.w3.org/2000/01/rdf-schema#"
xmlns:b="http://purl.org/dc/elements/1.1/"
xmlns:c="urn:lsid:i3c.org:predicates:"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
...
<rdf:Description rdf:about="urn:lsid:sid.zoology.gla.ac.uk:id:6">
<c:storedas rdf:resource="urn:lsid:sid.zoology.gla.ac.uk:id:6"/>
<rdf:type rdf:resource="urn:lsid:i3c.org:types:content"/>
<b:format>image/tiff</b:format>
<d:label>6.TIF</d:label>
<e:sem rdf:resource="urn:lsid:sid.zoology.gla.ac.uk:predicates:sem"/>
<a:title>Image of Dennyus hirundinis</a:title>
<a:date>1997-05-23</a:date>
<a:creator>Vince Smith</a:creator>
</rdf:Description>
</rdf:RDF>

The "storedas" tag tells us there is data associated with this LSID,
the "format" tag tells us it is a TIFF image. If you resolve this LSID
in Firefox (lsidres:urn:lsid:sid.zoology.gla.ac.uk:id:6), you'll be
able to get the TIFF image (on a Mac OS X machine, or a PC with
Quicktime installed, the image will load in the browser window,
otherwise you will be prompted to save the image).
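
As a rough sketch of how a script might read those two tags, the
following uses Python's standard xml.etree.ElementTree. The SAMPLE
string is a trimmed, hand-closed copy of the metadata above (the elided
part of the original is simply left out, which is an assumption about
what is safe to drop), and the function name summarize is made up for
illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the metadata shown above.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
I3C = "urn:lsid:i3c.org:predicates:"

# Trimmed copy of the metadata; the root tag is closed by hand here
# because the original listing is elided (assumption, not the real reply).
SAMPLE = (
    '<rdf:RDF'
    ' xmlns:b="http://purl.org/dc/elements/1.1/"'
    ' xmlns:c="urn:lsid:i3c.org:predicates:"'
    ' xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description rdf:about="urn:lsid:sid.zoology.gla.ac.uk:id:6">'
    '<c:storedas rdf:resource="urn:lsid:sid.zoology.gla.ac.uk:id:6"/>'
    '<b:format>image/tiff</b:format>'
    '</rdf:Description>'
    '</rdf:RDF>'
)


def summarize(rdf_xml):
    """Map each described LSID to whether it has data and in what format."""
    root = ET.fromstring(rdf_xml)
    summary = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        lsid = desc.get(f"{{{RDF}}}about")
        stored = desc.find(f"{{{I3C}}}storedas")
        fmt = desc.find(f"{{{DC}}}format")
        summary[lsid] = {
            "has_data": stored is not None,
            "format": fmt.text if fmt is not None else None,
        }
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A client could use the "has_data" flag to decide whether it is worth
asking the resolver for the data itself, and the format to decide how
to handle it.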

The point of all of this is that I can write scripts or programs that
can make use of LSIDs, for example, by aggregating them into a local
knowledge base. To my mind, this is where the true power of LSIDs
becomes apparent.
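
To make "aggregating them into a local knowledge base" a little more
concrete, here is a minimal sketch that stores metadata as
subject/predicate/object triples in an in-memory SQLite table. The
triples are copied by hand from the metadata above; the table layout
and the function names (open_store, add_triples, subjects_with) are
assumptions for illustration, not part of any LSID software.

```python
import sqlite3

LSID = "urn:lsid:sid.zoology.gla.ac.uk:id:6"

# Triples copied by hand from the metadata above (prefixes expanded).
TRIPLES = [
    (LSID, "http://purl.org/dc/elements/1.1/format", "image/tiff"),
    (LSID, "http://purl.org/dc/elements/1.1/#title", "Image of Dennyus hirundinis"),
    (LSID, "http://purl.org/dc/elements/1.1/#creator", "Vince Smith"),
]


def open_store():
    """Create an empty in-memory triple store."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE triples ("
        "subject TEXT, predicate TEXT, object TEXT, "
        "UNIQUE (subject, predicate, object))"
    )
    return db


def add_triples(db, triples):
    """Add triples, silently skipping ones that are already stored."""
    db.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)
    db.commit()


def subjects_with(db, predicate, obj):
    """Return every subject that has the given predicate/object pair."""
    rows = db.execute(
        "SELECT subject FROM triples WHERE predicate = ? AND object = ? "
        "ORDER BY subject",
        (predicate, obj),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = open_store()
    add_triples(store, TRIPLES)
    print(subjects_with(store, "http://purl.org/dc/elements/1.1/format", "image/tiff"))
```

Because every LSID returns RDF, metadata from different authorities
can be loaded into the same table and queried together.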

I think there are at least three distinct issues to be considered:

1. Persistence, which is a social issue

2. Assigning and resolving GUIDs

3. What gets served when a GUID is resolved

I think we can de-couple these, in the sense that we could use a handle
system like that upon which DOIs are based, but adopt the RDF metadata
approach of LSIDs.
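
One way to picture this de-coupling is the sketch below: how a GUID is
resolved is pluggable per identifier scheme, while what gets served is
always RDF. Everything in it is hypothetical (the MetadataService
class, the "doi:" prefix convention, and the stub resolvers that return
canned strings instead of contacting a handle server or an LSID
authority).

```python
class MetadataService:
    """Routes a GUID to a scheme-specific resolver; all return RDF text."""

    def __init__(self):
        self._resolvers = {}

    def register(self, prefix, resolver):
        # prefix identifies the GUID scheme, e.g. "urn:lsid:" or "doi:".
        self._resolvers[prefix.lower()] = resolver

    def metadata_for(self, guid):
        for prefix, resolver in self._resolvers.items():
            if guid.lower().startswith(prefix):
                return resolver(guid)
        raise LookupError(f"no resolver registered for {guid!r}")


def stub_rdf(guid):
    """Stand-in for a real lookup: wrap the GUID in a minimal RDF document."""
    return (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
        f'<rdf:Description rdf:about="{guid}"/>'
        "</rdf:RDF>"
    )


if __name__ == "__main__":
    service = MetadataService()
    service.register("urn:lsid:", stub_rdf)  # would talk to an LSID authority
    service.register("doi:", stub_rdf)       # would talk to a handle server
    print(service.metadata_for("urn:lsid:sid.zoology.gla.ac.uk:id:6"))
    print(service.metadata_for("doi:10.1109/TKDE.2005.131"))
```

A client written against metadata_for does not need to know which
resolution mechanism sits behind a given identifier.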

This thread has also raised the issue of mapping between multiple
GUIDs. I think it is inevitable that we will have to deal with this,
especially as there already exist major databases containing taxonomic
information. For example, consider the task of mapping between
mammalian names in DiGIR providers, and those used in GenBank (a
relatively straightforward problem). In some lucky cases where we have
specimen information in GenBank we can tie the two together that way,
but for other names/sequences we aren't this lucky. If our databases
are distributed, and run by organisations with different goals and
agendas (I doubt biodiversity rates highly in NCBI's list of things to
do), we will have to deal with this.
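
The "lucky case" can be sketched as a join on a shared specimen
voucher. The record layout and every identifier below are invented
placeholders (they are not real DiGIR or GenBank fields or values);
the point is only that records without a voucher fall through and need
some other mapping strategy.

```python
def link_by_specimen(provider_records, sequence_records):
    """Pair sequence-database taxa with provider names that share a voucher.

    Returns (links, unmatched): links is a list of (taxon_id, name_id)
    pairs, unmatched lists taxon ids with no usable voucher match.
    """
    names_by_voucher = {}
    for rec in provider_records:
        names_by_voucher.setdefault(rec["voucher"], []).append(rec["name_id"])

    links, unmatched = [], []
    for seq in sequence_records:
        voucher = seq.get("voucher")
        names = names_by_voucher.get(voucher, []) if voucher else []
        if names:
            links.extend((seq["taxon_id"], name_id) for name_id in names)
        else:
            unmatched.append(seq["taxon_id"])
    return links, unmatched


if __name__ == "__main__":
    # Placeholder data, made up purely for illustration.
    providers = [
        {"name_id": "provider:NAME-A", "voucher": "VOUCHER-001"},
        {"name_id": "provider:NAME-B", "voucher": "VOUCHER-002"},
    ]
    sequences = [
        {"taxon_id": "seqdb:TAXON-1", "voucher": "VOUCHER-001"},
        {"taxon_id": "seqdb:TAXON-2", "voucher": None},
    ]
    print(link_by_specimen(providers, sequences))
```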
...
Further reading:
[1] P. Dawyndt, M. Vancanneyt, H. De Meyer & J. Swings (2005). Knowledge
Accumulation and Resolution of Data Inconsistencies during the
Integration of Microbial Information Sources. IEEE Transactions on
Knowledge and Data Engineering 17(8), 1111-1126.
http://doi.ieeecomputersociety.org/10.1109/TKDE.2005.131
[2] P. Dawyndt, M. Vancanneyt & J. Swings (2004). On the integration of
microbial information. WFCC Newsletter 38, 19-34.
http://wdcm.nig.ac.jp/wfcc/NEWSLETTER/newsletter38/a3.pdf
[3] P. Dawyndt, B. De Baets, X. Zhou, J. Ma & J. Swings.
StrainInfo.net: Holding a wealth of downstream information on microbial
resources right in our hands.
http://www.cpdr.ucl.ac.be/bioinf/papers/bioinf/bioinf_dawyndt.pdf
Also check out the background document and discussion papers that came
out of the specialist workshop on "Exploring and exploiting
microbiological commons: contributions of bioinformatics and
intellectual property rights in sharing biological information" at
http://lmg.ugent.be/bioinf-ipr/

Interesting reading.

Rod

Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page@bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
