Re: identifiers for geologic samples
Bob is confusing me with somebody who reads standards (what they?).
Firstly, I think having an Internet domain name in an identifier pretty much kills any claim to be semantically opaque, regardless of what one does in the namespace and identifier parts of the LSID.
Secondly, it seems to me that the reality is that LSID resolution depends on the DNS, and on users mucking with SRV records. Section 13 of the spec goes on at great length about the DNS. The spec discusses an alternative (Dynamic Delegation Discovery Service -- DDDS) that relies on a central authority (lsidauthority.org), which as far as I can tell doesn't exist (unlike the equivalent for handles).
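To make the DNS dependence concrete: as far as I can tell from the spec, a client locates an authority's resolution service by asking the DNS for an SRV record named after the authority part of the LSID. Here is a rough Python sketch of how that query name might be derived. The function name is made up for illustration, the `_lsid._tcp.` prefix is my reading of the convention rather than a quotation from the spec, and no actual DNS lookup is performed.

```python
def lsid_srv_query_name(lsid: str) -> str:
    """Return the DNS SRV name a client would look up for an LSID's authority.

    Illustrative helper only: assumes the `_lsid._tcp.<authority>` convention.
    """
    parts = lsid.split(":")
    # An LSID looks like urn:lsid:<authority>:<namespace>:<object>[:<revision>]
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    authority = parts[2].lower()  # DNS names are case-insensitive
    return f"_lsid._tcp.{authority}"


print(lsid_srv_query_name("urn:lsid:gbif.org:BioGUID:12345"))
```

Whoever owns the authority's domain has to publish that SRV record, which is the "mucking with the DNS" step.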
Handles and ARK in particular have thought about life after/without the DNS, but LSIDs seem only to pay lip service to this.
I'm not very anti LSIDs; indeed, arguably I've spent more time than most (publicly) playing with them (leaving aside the guys at IBM, EBI, etc. who programmed the underlying code). My rethink came about when Donat Agosti started urging me to look at setting up GUIDs for AntBase publications. DOIs are out because of cost. Do I recommend LSIDs? Well, does Donat want to muck with the DNS (or get a systems admin to do it)? And once he's done so, who can use his LSIDs anyway? Nobody really, apart from script jockeys like me. Much as it pains me, I don't think I can tell Donat to use LSIDs.
So I took a peek at some other alternatives (mostly covered already on the TDWG-GUID wiki, but see my notes at http://ispecies.blogspot.com/2006/01/identifiers-for-publications.html), and in terms of speed and ease of use I was going to suggest that static URLs would work fine for Donat's purposes, at least in the short term (especially as we'd like to get something working ASAP because a lot of things down the line depend on this). However, then I got hold of the handle system, had a play, and after the usual pain of dealing with Java on Fedora Core 4, got it working. This caused me to rethink handles.
To my mind, the big plus of LSIDs lies in the explicit definition of an interface for getting metadata and for getting data. DOIs and Handles are essentially a free-for-all (do they resolve to data, metadata, will what I see depend on whether I'm at home or at my place of work, will I need a subscription to see the data, etc.). The actual identifier part of the LSID spec I'm less struck on.
Now, we could make sure our handles resolved to something meaningful in a standard way, and get most of the cool features of LSIDs for free. A very simple, lightweight way to do this is to have them resolve to RDF in XML format. This could be transformed into HTML or whatever. Indeed, with modern web browsers, we could serve XML and have the browser automatically render it in HTML. This would mean users could see HTML, but scripts could access RDF, using exactly the same point of access. For a demo, see the handle hdl:2254/20971, which can be resolved here: http://hdl.handle.net/2254/20971 . What you'll see in a modern browser is HTML, but if you "view source" it's just RDF. Call me sad, but I find this pretty neat.
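To illustrate the "one access point, two audiences" idea, here is a small Python sketch of the script side. The RDF below is a made-up sample, not the actual content served for hdl:2254/20971; the stylesheet name, the Dublin Core fields, and the function name are all assumptions. A browser would follow the xml-stylesheet instruction and show HTML, whereas a script just ignores it and reads the RDF.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML of the kind a handle might resolve to.
# The xml-stylesheet line is what lets a browser render it as HTML.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="record.xsl"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="hdl:2254/20971">
    <dc:title>Example publication title</dc:title>
    <dc:creator>Example Author</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def read_metadata(rdf_xml: str) -> dict:
    """Map each described resource to a dict of its simple literal properties."""
    root = ET.fromstring(rdf_xml.encode("utf-8"))
    records = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        # Strip the namespace from each child tag, keeping only the local name.
        records[about] = {
            child.tag.split("}", 1)[1]: (child.text or "").strip()
            for child in desc
        }
    return records


print(read_metadata(SAMPLE))
```

A real client would fetch the XML from the handle's URL first, and a proper RDF parser would be a better choice than plain XML parsing for anything beyond flat records like this one.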
All I'm suggesting is that, as cool as LSIDs are, and as much as I have devoted a lot of effort to getting some working, I'm not convinced they are the way to go.
My final question in all this is: if LSIDs are really so cool, why is it that the EBI and NCBI have no publicly available LSID services (or even private ones)?
Why is it that, apart from some ecologists in Wisconsin, the only people taking LSIDs seriously (by which I mean, writing code) as far as I can see are BioMoby and MyGrid, both largely academic projects that are cool but are not major data providers?
Thus endeth the rant.
Rod
On 30 Jan 2006, at 03:04, Bob Morris wrote:
From my understanding of the LSID spec, LSIDs do not rely on the DNS.
Rather one particular mechanism for discovering resolution services depends on the DNS, and nothing in the LSID specification requires the use of that mechanism, convenient as it presently may be. Future resolution service discovery mechanisms can use the existing LSID as they please provided only they meet the specification of a resolution service discovery service.
Also, LSIDs are required by the spec to be semantically opaque. Though this has some exceptions, and semantic opacity is narrowly defined, I would say that except for resolution service discovery services (and such services that use DNS are narrowly constrained by the spec), those applications that ascribe meaning to parts of an LSID are probably guilty of violating the spec and perhaps don't deserve that much sympathy.
cf Section 8 and Section 13.3 of http://www.omg.org/docs/dtc/04-05-01.pdf
I hope that those who argue against LSIDs on either of the above two grounds will place in the wiki (or point me to where it already is) how I am misreading the spec. If I am reading it correctly, I don't understand how the arguments Rod puts forth here would lead to rejection of LSID, whatever other disadvantages it may have compared to alternatives.
This is a familiar sounding point and maybe somebody answered me the last time I whined about it, long ago in a mailing list far, far away. My apologies if so.
Bob Morris
On 1/28/06, Roderic Page r.page@bio.gla.ac.uk wrote:
On 28 Jan 2006, at 01:02, Richard Pyle wrote:
The more I think about it, the more I think this is the sort of system that would work well for our field. A centralized issuer (which could issue blocks of thousands or millions of numbers at a time),
The major problem I see with this is that a central registry may be a rate-limiting step because it has to allocate blocks; it would also decide the format of the last part of the identifier (which the provider might not find desirable), and it may well lead to lots of wasted identifiers (e.g., it allocates 100,000 to me, but I use 3 of them).
Would it not be better to devolve this? You can still have a central registry. For example, Handles and DOIs work by having a central registry for the prefix (e.g., "1018") and the provider is responsible for allocating the suffix locally.
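As a quick illustration of that division of labour, here is a Python sketch that splits a handle into the centrally registered prefix and the locally assigned suffix. The function name is made up, and the only rule assumed is that the first "/" separates the two parts.

```python
from typing import Tuple


def split_handle(handle: str) -> Tuple[str, str]:
    """Split a handle into (prefix, suffix) at the first '/'.

    The prefix is what the central registry hands out; everything
    after the first '/' is up to the provider.
    """
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]  # drop the optional "hdl:" scheme label
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return prefix, suffix


print(split_handle("hdl:2254/20971"))
```

Because only the prefix is registered centrally, the provider can mint as many or as few suffixes as it likes, in whatever format it likes, without going back to the registry.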
I'm not sure how wise it would be to create a new syntax standard, rather than go with one of the ones we've discussed. But if (for example) using LSID, I personally think it would be preferable to establish a highly generic form, such as:
urn:lsid:gbif.org:BioGUID:12345
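For reference, the parts of an identifier like this can be pulled apart with a few lines of Python. This is only a sketch: the class and function names are made up, and it assumes the layout urn:lsid:authority:namespace:object with an optional trailing revision.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str   # e.g. an Internet domain name
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Parse urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"wrong number of fields: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing urn:lsid: prefix: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty authority, namespace or object: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


print(parse_lsid("urn:lsid:gbif.org:BioGUID:12345"))
```

Note that the first field it returns is the domain name, which is the part at issue in the reply that follows.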
Without wishing to preempt some of the things I'm going to present at the workshop, I'm going off LSIDs a little because of their reliance on the Internet DNS. Apart from the hassle of mucking with the DNS records to set them up (I suspect not every provider is going to find this easy to do), it assumes that the Internet in its present form is going to be here forever, and it also embeds information in the identifier (e.g., "gbif.org") that currently has meaning, but over time may lose meaning, or worse, be positively misleading (say if GBIF goes belly up and somebody else serves the data).
Handles (including DOIs) and ARK have no information in the identifier (perhaps not strictly true for some DOIs, but that's by choice not design), and also in principle don't need the internet. In the future some other mode of information transport may come along, and they could still be used.
While it might be hard to imagine the Internet and the DNS going away, if anybody has a 5 1/4" floppy lying around, they'll be aware of how hard it is to get information off it these days, as 5 1/4" drives are scarce as hen's teeth -- the only one in my department is in an old PC that is connected to the network. The digital library community seem particularly sensitive to these issues, which is perhaps why they use handles, DOIs, and ARK.
Regards
Rod
--
Professor Roderic D. M. Page Editor, Systematic Biology DEEB, IBLS Graham Kerr Building University of Glasgow Glasgow G12 8QP United Kingdom
Phone: +44 141 330 4778 Fax: +44 141 330 2792 email: r.page@bio.gla.ac.uk web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/ Find out what we know about a species at http://ispecies.org