identifiers for geologic samples

Roderic Page r.page at BIO.GLA.AC.UK
Mon Jan 30 08:25:00 CET 2006


Bob is confusing me with somebody who reads standards (what are they?).

Firstly, I think having an Internet domain name in an identifier pretty  
much kills any claim to be semantically opaque, regardless of what one  
does in the namespace and identifier parts of the LSID.

Secondly, it seems to me that the reality is that LSID resolution
depends on the DNS, and on users mucking with SRV records. Section 13
of the spec goes on at great length about the DNS. The spec discusses
an alternative (the Dynamic Delegation Discovery System, DDDS) that
relies on a central authority (lsidauthority.org), which as far as I
can tell doesn't exist (unlike the equivalent for handles).
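To make the DNS dependence concrete, here is a minimal Python sketch (not taken from any LSID library) that splits an LSID into its parts and builds the DNS name a client would query for an SRV record. It assumes the `urn:lsid:authority:namespace:object[:revision]` layout and the `_lsid._tcp` service label from the spec's DNS discussion, and it ignores the spec's escaping rules; the SRV lookup itself would need a DNS library and is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    """Components of urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts (escaping rules not handled)."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def srv_query_name(lsid: LSID) -> str:
    """DNS name whose SRV record would point at the resolution service.

    Assumes the _lsid._tcp service label; no network lookup is done here.
    """
    return f"_lsid._tcp.{lsid.authority}"


if __name__ == "__main__":
    lsid = parse_lsid("urn:lsid:gbif.org:BioGUID:12345")
    print(lsid)
    print(srv_query_name(lsid))
```

Everything after that SRV name depends on whoever controls the authority's DNS records, which is the crux of the objection.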

Handles and ARK in particular have thought about life after/without the  
DNS, but LSIDs seem only to pay lip service to this.

I'm not really anti-LSID; indeed, arguably I've spent more time than most
(publicly) playing with them (leaving aside the guys at IBM, EBI, etc.
who programmed the underlying code). My rethink came about when Donat
Agosti started urging me to look at setting up GUIDs for AntBase
publications. DOIs are out because of cost. Do I recommend LSIDs? Well,
does Donat want to muck with the DNS (or get a systems admin to do it)?
And once he's done so, who can use his LSIDs anyway? Nobody really, apart
from script jockeys like me. Much as it pains me, I don't think
I can tell Donat to use LSIDs.

So I took a peek at some other alternatives (mostly covered already on
the TDWG-GUID wiki, but see my notes at
http://ispecies.blogspot.com/2006/01/identifiers-for-publications.html),
and in terms of speed and ease of use was going to suggest that static
URLs would work fine for Donat's purposes, at least in the short term
(especially as we'd like to get something working ASAP, because a lot
of things down the line depend on this). However, I then got hold of
the handle system, had a play, and after the usual pain of dealing
with Java on Fedora Core 4, got it working. This caused me to rethink
handles.

To my mind, the big plus of LSIDs lies in the explicit definition of an  
interface for getting metadata and for getting data. DOIs and Handles  
are essentially a free-for-all (do they resolve to data, metadata, will  
what I see depend on whether I'm at home or at my place of work, will I  
need a subscription to see the data, etc.). The actual identifier part  
of the LSID spec I'm less struck on.
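The value of a fixed interface can be sketched in a few lines of Python. The method names `get_metadata` and `get_data` are hypothetical, only loosely echoing the getMetadata/getData operations in the LSID spec, and the in-memory class is a toy stand-in for a real resolution service. The point is that a client written against `Resolver` knows what kind of thing each call returns, which a bare DOI or handle does not promise.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class Resolver(ABC):
    """A fixed contract: every identifier answers the same two questions."""

    @abstractmethod
    def get_metadata(self, identifier: str) -> str:
        """Return a description of the object (for example RDF/XML)."""

    @abstractmethod
    def get_data(self, identifier: str) -> bytes:
        """Return the object itself (for example a PDF or a data file)."""


class InMemoryResolver(Resolver):
    """Toy implementation backed by a dictionary, for illustration only."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[str, bytes]] = {}

    def register(self, identifier: str, metadata: str, data: bytes) -> None:
        self._records[identifier] = (metadata, data)

    def get_metadata(self, identifier: str) -> str:
        return self._records[identifier][0]

    def get_data(self, identifier: str) -> bytes:
        return self._records[identifier][1]
```

Because `Resolver` is abstract, a provider cannot ship a resolver that silently omits one of the two operations.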

Now, we could make sure our handles resolved to something meaningful in
a standard way, and get most of the cool features of LSIDs for free. A
very simple, lightweight way to do this is to have them resolve to
RDF in XML format. This could be transformed into HTML or whatever.
Indeed, with modern web browsers, we could serve XML and have the  
browser automatically render it in HTML. This would mean users could  
see HTML, but scripts could access RDF, using exactly the same point of  
access. For a demo, see the handle hdl:2254/20971, which can be  
resolved here: http://hdl.handle.net/2254/20971 . What you'll see in a  
modern browser is HTML, but if you "view source" it's just RDF. Call me  
sad, but I find this pretty neat.
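One common way to get "HTML for people, RDF for scripts" from a single URL is an `xml-stylesheet` processing instruction: browsers apply the referenced XSLT, while XML parsers skip it and see plain RDF. The Python sketch below assumes that mechanism; the stylesheet name and title are hypothetical, and it shows only the document a handle's target URL might serve, not the handle registration itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def handle_to_url(handle: str) -> str:
    """Map 'hdl:2254/20971' to the hdl.handle.net proxy URL."""
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    return f"http://hdl.handle.net/{handle}"


def rdf_document(subject_url: str, title: str, stylesheet: str) -> str:
    """RDF/XML that a browser renders via XSLT but a script reads as RDF."""
    return (
        '<?xml version="1.0"?>\n'
        f'<?xml-stylesheet type="text/xsl" href={quoteattr(stylesheet)}?>\n'
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">\n'
        f"  <rdf:Description rdf:about={quoteattr(subject_url)}>\n"
        f"    <dc:title>{escape(title)}</dc:title>\n"
        "  </rdf:Description>\n"
        "</rdf:RDF>\n"
    )


def read_title(document: str) -> str:
    """What a script does: ignore the stylesheet and read the RDF directly."""
    root = ET.fromstring(document)
    return root.find(f"{{{RDF_NS}}}Description/{{{DC_NS}}}title").text


if __name__ == "__main__":
    url = handle_to_url("hdl:2254/20971")
    doc = rdf_document(url, "Hypothetical example record", "rdf2html.xsl")
    print(doc)
    print(read_title(doc))
```

The same bytes serve both audiences, so there is no need for separate "human" and "machine" endpoints.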

All I'm suggesting is that, as cool as LSIDs are, and as much as I have  
devoted a lot of effort to getting some working, I'm not convinced they  
are the way to go.

My final question in all this is: if LSIDs are really so cool, why is  
it that the EBI and NCBI have no publicly available LSID services (or  
even private ones)?

Why is it that apart from some ecologists in Wisconsin, the only people  
taking LSIDs seriously (by which I mean, writing code) as far as I can  
see are BioMoby and MyGrid, both largely academic projects that are  
cool, but are not major data providers?

Thus endeth the rant.

Rod







On 30 Jan 2006, at 03:04, Bob Morris wrote:

> From my understanding of the LSID spec, LSIDs do not rely on the DNS.
> Rather one particular mechanism for discovering resolution services  
> depends on the DNS, and nothing in the LSID specification requires the  
> use of that mechanism, convenient as it presently may be. Future  
> resolution service discovery mechanisms can use the existing LSID as  
> they please provided only they meet the specification of a resolution  
> service discovery service.
>
> Also, LSIDs are required by the spec to be semantically opaque.
> Though this has some exceptions, and semantic opacity is narrowly
> defined, I would say that, except for resolution service discovery
> services (and such services that use DNS are narrowly constrained by
> the spec), those applications that ascribe meaning to parts of an
> LSID are probably guilty of violating the spec and perhaps don't
> deserve that much sympathy.
>
> cf Section 8 and Section 13.3 of  
> http://www.omg.org/docs/dtc/04-05-01.pdf
>
> I hope that those who argue against LSIDs on either of the above two  
> grounds will place in the wiki (or point me to where it already is)  
> how I am misreading the spec. If I am reading it correctly, I don't
> understand how the arguments Rod puts forth here would lead to  
> rejection of LSID whatever other disadvantages it may have compared to  
> alternatives.
>
> This is a familiar sounding point and maybe somebody answered me the  
> last time I whined about it, long ago in a mailing list far, far away.  
> My apologies if so.
>
> Bob Morris
>
>
> On 1/28/06, Roderic Page <r.page at bio.gla.ac.uk> wrote:
>> On 28 Jan 2006, at 01:02, Richard Pyle wrote:
>>
>> > The more I think about it, the more I think this is the sort of system
>> > that would work well for our field.  A centralized issuer (which could
>> > issue blocks of thousands or millions of numbers at a time),
>>
>> The major problem I see with this is that a central registry may be a
>> rate-limiting step because it has to allocate blocks; it would also
>> decide the format of the last part of the identifier (which the
>> provider might not find desirable); and it may well lead to lots of
>> wasted identifiers (e.g., it allocates 100,000 to me, but I use 3 of
>> them).
>>
>> Would it not be better to devolve this? You can still have a central
>> registry. For example, Handles and DOIs work by having a central
>> registry for the prefix (e.g., "1018") and the provider is responsible
>> for allocating the suffix locally.
>>
>>
>> > I'm not sure how wise it would be to create a new syntax standard,
>> > rather than go with one of the ones we've discussed.  But if (for
>> > example) using LSID, I personally think it would be preferable to
>> > establish a highly generic form, such as:
>> >
>> > urn:lsid:gbif.org:BioGUID:12345
>>
>> Without wishing to preempt some of the things I'm going to present at
>> the workshop, I'm going off LSIDs a little because of their reliance on
>> the Internet DNS. Apart from the hassle of mucking with the DNS records
>> to set them up (I suspect not every provider is going to find this easy
>> to do), it assumes that the Internet in its present form is going to be
>> here forever, and it also embeds information in the identifier (e.g.,
>> "gbif.org") that currently has meaning, but over time may lose
>> meaning, or worse, be positively misleading (say if GBIF goes belly up
>> and somebody else serves the data).
>>
>> Handles (including DOIs) and ARK have no information in the identifier
>> (perhaps not strictly true for some DOIs, but that's by choice, not
>> design), and also in principle don't need the Internet. In the future
>> some other mode of information transport may come along, and they could
>> still be used.
>>
>> While it might be hard to imagine the Internet and the DNS going away,
>> if anybody has a 5 1/4" floppy lying around, they'll be aware of how
>> hard it is to get information off it these days as 5 1/4" drives are
>> scarce as hens' teeth; the only one in my department is in an old PC
>> that is connected to the network. The digital library community seem
>> particularly sensitive to these issues, which is perhaps why they use
>> handles, DOIs, and ARK.
>>
>> Regards
>>
>> Rod
>>
>>
>>
>> ------------------------------------------------------------------------------------------------------------------
>> Professor Roderic D. M. Page
>> Editor, Systematic Biology
>> DEEB, IBLS
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QP
>> United Kingdom
>>
>> Phone:    +44 141 330 4778
>> Fax:      +44 141 330 2792
>> email:    r.page at bio.gla.ac.uk
>> web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>>
>> Subscribe to Systematic Biology through the Society of Systematic
>> Biologists Website:   http://systematicbiology.org
>> Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
>> Find out what we know about a species at http://ispecies.org
>>

More information about the tdwg-tag mailing list