Re: identifiers for geologic samples

30 Jan 2006

      Bob is confusing me with somebody who reads standards (what they?).

Firstly, I think having an Internet domain name in an identifier pretty  
much kills any claim to be semantically opaque, regardless of what one  
does in the namespace and identifier parts of the LSID.

Secondly, it seems to me that the reality is that LSID resolution  
depends on the DNS, and on users mucking with SRV records. Section 13  
of the spec goes on at great length about the DNS. The spec discusses  
an alternative (Dynamic Delegation Discovery Service -- DDDS) that  
relies on a central authority (lsidauthority.org), which as far as I  
can tell doesn't exist (unlike the equivalent for handles).
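
To make the DNS dependence concrete, here is a rough Python sketch of what I understand to be the first step of resolution: split the LSID into its parts and derive the DNS SRV name to query from the authority. The "_lsid._tcp" service label is my reading of the spec's DNS section, the function and class names (parse_lsid, srv_query_name, Lsid) are made up for illustration, and the example LSID is the one quoted further down this thread. No DNS query is actually issued.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID: urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(lsid: str) -> Lsid:
    """Split an LSID string into its colon-separated components."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {lsid!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty component in LSID: {lsid!r}")
    return Lsid(*parts[2:])


def srv_query_name(lsid: str) -> str:
    """Name of the SRV record a resolver would look up (assumed label)."""
    return f"_lsid._tcp.{parse_lsid(lsid).authority}"


if __name__ == "__main__":
    print(srv_query_name("urn:lsid:gbif.org:BioGUID:12345"))
```

Note that the only input to the lookup is the domain name embedded in the identifier, which is exactly the part I'd argue is not opaque.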

Handles and ARK in particular have thought about life after/without the  
DNS, but LSIDs seem only to pay lip service to this.

I'm not very anti LSIDs, indeed arguably I've spent more time than most  
(publicly) playing with them (leaving aside the guys at IBM, EBI, etc.  
who programmed the underlying code). My rethink came about when Donat  
Agosti started urging me to look at setting up GUIDs for AntBase  
publications. DOIs are out because of cost. Do I recommend LSIDs? Well,  
does Donat want to muck with the DNS (or get a systems admin to do it)? And  
once he's done so, who can use his LSIDs anyway? Nobody really, apart  
from script jockeys like me. Much as it pains me, I don't think  
I can tell Donat to use LSIDs.

So I took a peek at some other alternatives (mostly covered already on  
the TDWG-GUID wiki, but see my notes at  
http://ispecies.blogspot.com/2006/01/identifiers-for-publications.html),  
and in terms of speed and ease of use was going to  
suggest static URLs would work fine for Donat's purposes, at least in  
the short term (especially as we'd like to get something working ASAP  
because a lot of things down the line depend on this). However, then I  
got hold of the handle system, had a play, and after the usual pain of  
dealing with Java on Fedora Core 4, got it working. This caused me to  
rethink handles.

To my mind, the big plus of LSIDs lies in the explicit definition of an  
interface for getting metadata and for getting data. DOIs and Handles  
are essentially a free-for-all (do they resolve to data, metadata, will  
what I see depend on whether I'm at home or at my place of work, will I  
need a subscription to see the data, etc.). The actual identifier part  
of the LSID spec I'm less struck on.

Now, we could make sure our handles resolved to something meaningful in  
a standard way, and get most of the cool features of LSIDs for free. A  
very simple, lightweight way to do this is have them simply resolve to  
RDF in XML format. This could be transformed into HTML or whatever.  
Indeed, with modern web browsers, we could serve XML and have the  
browser automatically render it in HTML. This would mean users could  
see HTML, but scripts could access RDF, using exactly the same point of  
access. For a demo, see the handle hdl:2254/20971, which can be  
resolved here: http://hdl.handle.net/2254/20971 . What you'll see in a  
modern browser is HTML, but if you "view source" it's just RDF. Call me  
sad, but I find this pretty neat.
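
For the script side of this, a minimal Python sketch of the idea: turn the hdl: form into the proxy URL given above, and check that whatever comes back is RDF in XML. The function names (handle_to_url, is_rdf_xml) and the sample document are invented for illustration; in particular, the xml-stylesheet line is only my assumption about how a browser would be told to render the XML as HTML, and the sample is not the actual content of that handle. Nothing here touches the network.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROXY = "http://hdl.handle.net/"

# Invented stand-in for what a handle might resolve to.
SAMPLE_RDF = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="display.xsl"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="hdl:2254/20971">
    <dc:title>Placeholder title</dc:title>
  </rdf:Description>
</rdf:RDF>
"""


def handle_to_url(handle: str) -> str:
    """Map 'hdl:prefix/suffix' (or bare 'prefix/suffix') to the proxy URL."""
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]
    if "/" not in handle:
        raise ValueError(f"handle needs prefix/suffix: {handle!r}")
    return PROXY + handle


def is_rdf_xml(text: str) -> bool:
    """True if the text parses as XML and its root element is rdf:RDF."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    return root.tag == f"{{{RDF_NS}}}RDF"


if __name__ == "__main__":
    print(handle_to_url("hdl:2254/20971"))
    print(is_rdf_xml(SAMPLE_RDF))
```

A script would fetch the URL and hand the body to an RDF parser; a person would open the same URL in a browser. Same point of access either way.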

All I'm suggesting is that, as cool as LSIDs are, and as much as I have  
devoted a lot of effort to getting some working, I'm not convinced they  
are the way to go.

My final question in all this is: if LSIDs are really so cool, why is  
it that the EBI and NCBI have no publicly available LSID services (or  
even private ones)?

Why is it that apart from some ecologists in Wisconsin, the only people  
taking LSIDs seriously (by which I mean, writing code) as far as I can  
see are BioMoby and MyGrid, both largely academic projects that are  
cool, but are not major data providers?

Thus endeth the rant.

Rod

On 30 Jan 2006, at 03:04, Bob Morris wrote:
...
...
From my understanding of the LSID spec, LSIDs do not rely on the DNS.  
Rather one particular mechanism for discovering resolution services  
depends on the DNS, and nothing in the LSID specification requires the  
use of that mechanism, convenient as it presently may be. Future  
resolution service discovery mechanisms can use the existing LSID as  
they please provided only they meet the specification of a resolution  
service discovery service.
Also, LSIDs are required by the spec to be semantically opaque.  
Though, this has some exceptions, and semantic opacity is narrowly  
defined, I would say that except for resolution service discovery  
services---and such services that use DNS are narrowly constrained by  
the spec---, those applications that ascribe meaning to parts of an  
LSID are probably guilty of violating the spec and perhaps don't  
deserve that much sympathy.
cf Section 8 and Section 13.3 of  
http://www.omg.org/docs/dtc/04-05-01.pdf
I hope that those who argue against LSIDs on either of the above two  
grounds will place in the wiki (or point me to where it already is)  
how I am misreading the spec. If I am reading it correctly, I don't  
understand how the arguments Rod puts forth here would lead to  
rejection of LSID whatever other disadvantages it may have compared to  
alternatives.
This is a familiar sounding point and maybe somebody answered me the  
last time I whined about it, long ago in a mailing list far, far away.  
My apologies if so.
Bob Morris
On 1/28/06, Roderic Page <r.page@bio.gla.ac.uk> wrote:
On 28 Jan 2006, at 01:02, Richard Pyle wrote:
...
...
The more I think about it, the more I think this is the sort of system
that would work well for our field.  A centralized issuer (which could
issue blocks of thousands or millions of numbers at a time),
The major problem I see with this is that a central registry may be a
rate limiting step because it has to allocate blocks. It would also
decide the format of the last part of the identifier (which the
provider might not find desirable), and it may well lead to lots of
wasted identifiers (e.g., it allocates 100,000 to me, but I use 3 of
them).
Would it not be better to devolve this? You can still have a central
registry. For example, Handles and DOIs work by having a central
registry for the prefix (e.g., "1018") and the provider is responsible
for allocating the suffix locally.
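
As a small illustration of that division of labour, a Python sketch that splits a handle into the centrally registered prefix and the locally allocated suffix. The function name split_handle is made up, and splitting at the first "/" is my understanding of the handle syntax rather than something stated in this thread.

```python
from typing import Tuple


def split_handle(handle: str) -> Tuple[str, str]:
    """Return (prefix, suffix) for 'hdl:prefix/suffix' or 'prefix/suffix'."""
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]
    # Split at the first slash only; the suffix is the provider's business
    # and may itself contain slashes.
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"expected prefix/suffix, got {handle!r}")
    return prefix, suffix


if __name__ == "__main__":
    print(split_handle("hdl:2254/20971"))
```

The registry only ever needs to know about the first element; everything after the slash is up to the provider.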
...
I'm not sure how wise it would be to create a new syntax standard,
rather than go with one of the ones we've discussed.  But if (for
example) using LSID, I personally think it would be preferable to
establish a highly generic form, such as:
urn:lsid:gbif.org:BioGUID:12345
Without wishing to preempt some of the things I'm going to present at
the workshop, I'm going off LSIDs a little because of their reliance on
the Internet DNS. Apart from the hassle of mucking with the DNS records
to set them up (I suspect not every provider is going to find this easy
to do), it assumes that the Internet in its present form is going to be
here forever, and it also embeds information in the identifier (e.g.,
"gbif.org") that currently has meaning, but over time may lose
meaning, or worse, be positively misleading (say if GBIF goes belly up
and somebody else serves the data).
Handles (including DOIs) and ARK have no information in the identifier
(perhaps not strictly true for some DOIs, but that's by choice not
design), and also in principle don't need the internet. In the future
some other mode of information transport may come along, and they could
still be used.
While it might be hard to imagine the Internet and the DNS going away,
if anybody has a 5 1/4" floppy lying around, they'll be aware of how
hard it is to get information off it these days as 5 1/4" drives are
scarce as hen's teeth -- the only one in my department is in an old PC
that is connected to the network. The digital library community seem
particularly sensitive to these issues, which is perhaps why they use
handles, DOIs, and ARK.
Regards
Rod
----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page@bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:   http://systematicbiology.org
Search for taxon names at  
http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org
