identifiers for geologic samples
Roderic Page
r.page at BIO.GLA.AC.UK
Mon Jan 30 08:25:00 CET 2006
Bob is confusing me with somebody who reads standards (what they?).
Firstly, I think having an Internet domain name in an identifier pretty
much kills any claim to be semantically opaque, regardless of what one
does in the namespace and identifier parts of the LSID.
Secondly, it seems to me that the reality is that LSID resolution
depends on the DNS, and on users mucking with SRV records. Section 13
of the spec goes on at great lengths about the DNS. The spec discusses
an alternative (Dynamic Delegation Discovery Service -- DDDS) that
relies on a central authority (lsidauthority.org), which as far as I
can tell doesn't exist (unlike the equivalent for handles).
Handles and ARK in particular have thought about life after/without the
DNS, but LSIDs seem only to pay lip service to this.
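
To make the DNS dependence concrete, here is a minimal Python sketch of the first step a client takes under the DNS-based discovery mechanism: pull the authority out of the LSID and form the SRV record name to query. The function names are made up for illustration, the `_lsid._tcp` label is the one the spec's DNS section describes as far as I can tell, and the example LSID is the generic form suggested further down this thread. No lookup is actually performed.

```python
def parse_lsid(lsid):
    """Split an LSID into its parts.

    Expected shape: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
    """
    parts = lsid.split(":")
    if len(parts) not in (5, 6):
        raise ValueError("not an LSID: %r" % lsid)
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID: %r" % lsid)
    if not all(parts[2:5]):
        raise ValueError("empty authority, namespace or object: %r" % lsid)
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def srv_query_name(lsid):
    """Return the DNS name whose SRV record points at the resolution service."""
    # DNS names are case-insensitive, so normalise the authority to lower case.
    authority = parse_lsid(lsid)["authority"].lower()
    return "_lsid._tcp." + authority


if __name__ == "__main__":
    print(srv_query_name("urn:lsid:gbif.org:BioGUID:12345"))
```

The point of the sketch is that the domain name embedded in the identifier is exactly what drives the lookup, so whoever runs that domain has to publish the SRV record.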
I'm not very anti LSIDs, indeed arguably I've spent more time than most
(publicly) playing with them (leaving aside the guys at IBM, EBI, etc.
who programmed the underlying code). My rethink came about when Donat
Agosti started urging me to look at setting up GUIDs for AntBase
publications. DOIs are out because of cost. Do I recommend LSIDs? Well,
does Donat want to muck with the DNS (or get a systems admin to do it)? And
once he's done so, who can use his LSIDs anyway? Nobody really, apart
from script jockeys like me. Much as it pains me, I don't think
I can tell Donat to use LSIDs.
So I took a peek at some other alternatives (mostly covered already on
the TDWG-GUID wiki, but see my notes at
http://ispecies.blogspot.com/2006/01/identifiers-for-publications.html),
and in terms of speed and ease of use I was going to suggest that static
URLs would work fine for Donat's purposes, at least in the short term
(especially as we'd like to get something working ASAP
because a lot of things down the line depend on this). However, then I
got hold of the handle system, had a play, and after the usual pain of
dealing with Java on Fedora Core 4, got it working. This caused me to
rethink handles.
To my mind, the big plus of LSIDs lies in the explicit definition of an
interface for getting metadata and for getting data. DOIs and Handles
are essentially a free-for-all (do they resolve to data, metadata, will
what I see depend on whether I'm at home or at my place of work, will I
need a subscription to see the data, etc.). The actual identifier part
of the LSID spec I'm less struck on.
Now, we could make sure our handles resolved to something meaningful in
a standard way, and get most of the cool features of LSIDs for free. A
very simple, lightweight way to do this is to have them simply resolve to
RDF in XML format. This could be transformed into HTML or whatever.
Indeed, with modern web browsers, we could serve XML and have the
browser automatically render it in HTML. This would mean users could
see HTML, but scripts could access RDF, using exactly the same point of
access. For a demo, see the handle hdl:2254/20971, which can be
resolved here: http://hdl.handle.net/2254/20971 . What you'll see in a
modern browser is HTML, but if you "view source" it's just RDF. Call me
sad, but I find this pretty neat.
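
A rough sketch of how such a record might be produced, in Python. It assumes the browser rendering is done with an `xml-stylesheet` processing instruction pointing at an XSLT file (one common way to do it, not necessarily what the demo above uses). The stylesheet name `record.xsl`, the title, and the choice of Dublin Core elements are all placeholders.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use readable prefixes instead of the ns0/ns1 defaults when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def build_record(handle, title, stylesheet_href="record.xsl"):
    """Return an RDF/XML document describing the object behind a handle.

    stylesheet_href is a hypothetical XSLT file a browser could use to
    render the same document as HTML; scripts can simply ignore it.
    """
    url = "http://hdl.handle.net/" + handle
    root = ET.Element("{%s}RDF" % RDF_NS)
    desc = ET.SubElement(
        root, "{%s}Description" % RDF_NS, {"{%s}about" % RDF_NS: url}
    )
    ET.SubElement(desc, "{%s}identifier" % DC_NS).text = "hdl:" + handle
    ET.SubElement(desc, "{%s}title" % DC_NS).text = title
    body = ET.tostring(root, encoding="unicode")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<?xml-stylesheet type="text/xsl" href="%s"?>\n' % stylesheet_href
        + body
        + "\n"
    )


if __name__ == "__main__":
    # The title is a made-up placeholder, not the real record's metadata.
    print(build_record("2254/20971", "Example publication title"))
```

A script fetching this gets plain RDF and can skip the processing instruction, while a browser that honours it applies the stylesheet, so both use the same point of access.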
All I'm suggesting is that, as cool as LSIDs are, and as much as I have
devoted a lot of effort to getting some working, I'm not convinced they
are the way to go.
My final question in all this is: if LSIDs are really so cool, why is
it that the EBI and NCBI have no publicly available LSID services (or
even private ones)?
Why is it that apart from some ecologists in Wisconsin, the only people
taking LSIDs seriously (by which I mean, writing code) as far as I can
see are BioMoby and MyGrid, both largely academic projects that are
cool, but are not major data providers?
Thus endeth the rant.
Rod
On 30 Jan 2006, at 03:04, Bob Morris wrote:
> From my understanding of the LSID spec, LSIDs do not rely on the DNS.
> Rather one particular mechanism for discovering resolution services
> depends on the DNS, and nothing in the LSID specification requires the
> use of that mechanism, convenient as it presently may be. Future
> resolution service discovery mechanisms can use the existing LSID as
> they please provided only they meet the specification of a resolution
> service discovery service.
>
> Also, LSIDs are required by the spec to be semantically opaque.
> Though, this has some exceptions, and semantic opacity is narrowly
> defined, I would say that except for resolution service discovery
> services---and such services that use DNS are narrowly constrained by
> the spec---, those applications that ascribe meaning to parts of an
> LSID are probably guilty of violating the spec and perhaps don't
> deserve that much sympathy.
>
> cf Section 8 and Section 13.3 of
> http://www.omg.org/docs/dtc/04-05-01.pdf
>
> I hope that those who argue against LSIDs on either of the above two
> grounds will place in the wiki (or point me to where it already is)
> how I am misreading the spec. If I am reading it correctly, I don't
> understand how the arguments Rod puts forth here would lead to
> rejection of LSID whatever other disadvantages it may have compared to
> alternatives.
>
> This is a familiar sounding point and maybe somebody answered me the
> last time I whined about it, long ago in a mailing list far, far away.
> My apologies if so.
>
> Bob Morris
>
>
> On 1/28/06, Roderic Page <r.page at bio.gla.ac.uk> wrote:
>> On 28 Jan 2006, at 01:02, Richard Pyle wrote:
>>
>> > The more I think about it, the more I think this is the sort of
>> > system that would work well for our field. A centralized issuer
>> > (which could issue blocks of thousands or millions of numbers at a
>> > time),
>>
>> The major problem I see with this is that a central registry may be a
>> rate-limiting step because it has to allocate blocks. It would also
>> decide the format of the last part of the identifier (which the
>> provider might not find desirable), and it may well lead to lots of
>> wasted identifiers (e.g., it allocates 100,000 to me, but I use 3 of
>> them).
>>
>> Would it not be better to devolve this? You can still have a central
>> registry. For example, Handles and DOIs work by having a central
>> registry for the prefix (e.g., "1018") and the provider is
>> responsible for allocating the suffix locally.
>>
>>
>> > I'm not sure how wise it would be to create a new syntax standard,
>> > rather than go with one of the ones we've discussed. But if (for
>> > example) using LSID, I personally think it would be preferable to
>> > establish a highly generic form, such as:
>> >
>> > urn:lsid:gbif.org:BioGUID:12345
>>
>> Without wishing to preempt some of the things I'm going to present at
>> the workshop, I'm going off LSIDs a little because of their reliance
>> on the Internet DNS. Apart from the hassle of mucking with the DNS
>> records to set them up (I suspect not every provider is going to find
>> this easy to do), it assumes that the Internet in its present form is
>> going to be here forever, and it also embeds information in the
>> identifier (e.g., "gbif.org") that currently has meaning, but over
>> time may lose meaning, or worse, be positively misleading (say if GBIF
>> goes belly up and somebody else serves the data).
>>
>> Handles (including DOIs) and ARK have no information in the identifier
>> (perhaps not strictly true for some DOIs, but that's by choice not
>> design), and also in principle don't need the internet. In the future
>> some other mode of information transport may come along, and they
>> could still be used.
>>
>> While it might be hard to imagine the Internet and the DNS going away,
>> if anybody has a 5 1/4" floppy lying around, they'll be aware of how
>> hard it is to get information off it these days, as 5 1/4" drives are
>> as scarce as hen's teeth -- the only one in my department is in an old PC
>> that is connected to the network. The digital library community seem
>> particularly sensitive to these issues, which is perhaps why they use
>> handles, DOIs, and ARK.
>>
>> Regards
>>
>> Rod
>>
>>
>>
>> ----------------------------------------------------------------------
>> Professor Roderic D. M. Page
>> Editor, Systematic Biology
>> DEEB, IBLS
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QP
>> United Kingdom
>>
>> Phone: +44 141 330 4778
>> Fax: +44 141 330 2792
>> email: r.page at bio.gla.ac.uk
>> web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>>
>> Subscribe to Systematic Biology through the Society of Systematic
>> Biologists Website: http://systematicbiology.org
>> Search for taxon names at
>> http://darwin.zoology.gla.ac.uk/~rpage/portal/
>> Find out what we know about a species at http://ispecies.org
>>