[Tdwg-tag] Why we should not use LSID

Fri May 5 12:31:26 CEST 2006

(Steve/my comments are also collated on: 
http://wiki.tdwg.org/twiki/bin/view/TAG/WhyWeShouldNotUseLSIDs)

Steve wrote:

> d1.) PURLs require a central resolver which is a single point of failure

I see no difference between LSID and PURL here. We can set up a single or 
multiple LSID authorities, or a single or multiple PURL services.

> d2.) There are no conventions about what to expect when you resolve a PURL

> ... my point is that PURL only provides for 
> the possibility of persistence through indirection.  We're not 
> interested solely in indirection.  We want to build a set of services on 
> top of whatever GUID system we select.  This set of services requires 
> common agreement on what you get when you resolve a GUID.  The LSID spec 
> attempts to address this issue by splitting the universe into data and 
> metadata and strongly suggesting the use of RDF for metadata.  There is 
> no agreement on what you get when you resolve a PURL, and even if we 
> came to agreement within our community there's no software in place to 
> help us enforce these conventions.

LSIDs provide only a partial solution: no agreement exists on which metadata 
you get from an LSID service, so GBIF/TDWG has to design that entirely 
themselves. So you may not know what kind of data are behind an LSID.

What I propose is to simply use HTTP content negotiation in combination with 
PURLs to achieve exactly the same system. Within our network, we simply agree 
that a service contacted with some "Accept: xml-metadata" header (MY EXAMPLE, 
IS THERE ALREADY SOMETHING FOR THIS PURPOSE?) should return metadata, and with 
any other Accept header the data.

To me this gives all the advantages of LSIDs without the disadvantages. WOULD 
THIS WORK?
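
A minimal sketch of the content negotiation convention proposed above, in 
Python. The media type "application/rdf+xml" stands in for the placeholder 
"xml-metadata" and is an assumption, not an agreed community convention; the 
function names are hypothetical. The idea is only that an explicit request for 
the metadata type yields metadata and everything else yields the data.

```python
# Assumed metadata media type; the original message leaves this open.
METADATA_TYPE = "application/rdf+xml"


def accepted_types(accept_header):
    """Return the media types named in an Accept header, skipping q=0 entries."""
    types = []
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable weight as "not acceptable"
        if q > 0:
            types.append(media_type)
    return types


def choose_representation(accept_header):
    """Return 'metadata' only if the client explicitly lists METADATA_TYPE."""
    if METADATA_TYPE in accepted_types(accept_header):
        return "metadata"
    return "data"
```

With this rule, an ordinary browser request (for example "text/html, */*;q=0.8") 
gets the data, and only clients that know the convention get metadata.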

> d3.) PURLs may be easy to consume but they're not necessarily easy to 
> produce

> ...  This means that the PURL resolver 
> should provide a remote service (software interface) for registering a 
> new PURL, in part to facilitate automated registration of a large number 
> of identifiers.  

It seems to me that it should be simple enough to mimic the good sides of LSIDs 
and simply create a PURL resolver that only registers namespaces, but not IDs. 
That is, I register the namespace "LIAS" with purl.gbif.net as a synonym for 
"www.LIAS.net/names/webservice". If I try to resolve 
"purl.gbif.net/LIAS/149872", the PURL service will issue a rewrite/forward to 
"www.LIAS.net/names/webservice/149872".

I believe this is a very thin management shell over existing forwarding 
modules, e.g. in Apache.
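
The namespace-only forwarding described above can be sketched in a few lines 
of Python. This is an illustration under assumptions, not an existing PURL 
implementation: the registry dictionary, the function name, and the "http://" 
scheme on the target are all hypothetical.

```python
# Hypothetical registry: one entry per namespace, none per individual ID.
NAMESPACES = {
    "LIAS": "http://www.LIAS.net/names/webservice",
}


def resolve(path, namespaces=NAMESPACES):
    """Map '/<namespace>/<id>' to the provider URL, or None if not resolvable."""
    namespace, _, local_id = path.strip("/").partition("/")
    base = namespaces.get(namespace)
    if base is None or not local_id:
        return None
    # The resolver never inspects the ID; it only rewrites the prefix.
    return base.rstrip("/") + "/" + local_id


print(resolve("/LIAS/149872"))
# prints http://www.LIAS.net/names/webservice/149872
```

A real service would answer with an HTTP redirect to the returned URL; the 
point here is only that the resolver stores namespaces, not identifiers.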

In fact, when registering a purl namespace, a data provider may even decide:

a) I want to handle metadata content negotiation myself
b) metadata requests should go to a different service 
("www.LIAS.net/names/metadata/149872").
c) I cannot handle metadata, but I can register common metadata applicable to 
all objects within the registered namespace (the purl service could then report 
these common metadata).
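
The three options (a) to (c) could be recorded per namespace roughly as 
follows. This is a sketch only; the class, field names, and mode strings are 
hypothetical, and the "http://" scheme on the example URLs is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    data_base: str
    # "provider" = option (a), "separate" = option (b), "common" = option (c)
    metadata_mode: str = "provider"
    metadata_base: str = ""
    common_metadata: dict = field(default_factory=dict)


def route(ns, local_id, wants_metadata):
    """Return ('redirect', url) or ('respond', metadata) for one request."""
    if not wants_metadata or ns.metadata_mode == "provider":
        # (a) the provider negotiates content itself, so always forward there
        return ("redirect", f"{ns.data_base}/{local_id}")
    if ns.metadata_mode == "separate":
        # (b) metadata requests go to a different service
        return ("redirect", f"{ns.metadata_base}/{local_id}")
    if ns.metadata_mode == "common":
        # (c) the PURL service reports metadata shared by the whole namespace
        return ("respond", ns.common_metadata)
    raise ValueError(f"unknown metadata mode: {ns.metadata_mode}")


lias = Namespace(
    data_base="http://www.LIAS.net/names/webservice",
    metadata_mode="separate",
    metadata_base="http://www.LIAS.net/names/metadata",
)
print(route(lias, "149872", wants_metadata=True))
# prints ('redirect', 'http://www.LIAS.net/names/metadata/149872')
```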

> problem by registering what OCLC calls a "partial redirection" 
> (http://purl.oclc.org/docs/inet96.html#partial).  I don't consider 
> partial redirects to be GUIDs because they allow the use of a domain as 
> a prefix for a localized URL hierarchy.  In order to guarantee that I 
> don't mess up your PURLs, the OCLC PURL resolver requires authentication 
> in order to register a new PURL.  Authentication systems aren't easy to 
> implement or support.

This seems to be close to what I describe. I do not see how this differs from 
LSIDs, though. The authority will in many cases only be such a partial 
redirection service, unable to verify that ALL possible LSIDs remain valid.

> sorted out.  For instance, with equality testing, do we want to be able 
> to have software say that two things are equal if their GUIDs are 
> bitwise identical?  If two GUIDs are not bitwise identical, can they 
> refer to the same object?  Do we require that different versions of the 
> same object have the same GUID, different GUIDs with a relationship 
> between them asserted in metadata, or the same base GUID with a 
> different version component tacked onto the end?  What about different 
> representations (formats) of the same thing (say an XML and an RDF 
> version)?  Can they have the same GUID?  How does our object equality 
> testing by GUID choice affect our choice of how to do versioning? How do 
> we actually compose a compound object out of simple related objects?  
> All of these questions require careful consideration and are affected by 
> our choice of a GUID system. 

Yes, but I believe all are similarly problematic with LSID systems.

After all, it looks almost as if GBIF will be one of very few systems to adopt 
LSIDs. So in our community, instead of relying on a urn:lsid prefix, we may 
just as well rely on a convention that all PURL services must start with 
"http://purl.".

> I'm not against inventing something new that's essentially a set of 
> restrictions on top of PURL.  Maybe we could get the best of both worlds 
> -- the simplicity of PURL with the conventions of LSID.

To me this seems to be quite achievable.

For the last two years I always thought LSIDs were truly nice; I am only just 
changing my mind. Most LSID conventions are really good, including the 
separation into authority, namespace, and ID, the idea of separating data and 
metadata, and the ability to recognize (whether for software or human 
management) which URI is meant to be permanent. I believe we can easily mimic 
all of these based on PURL (I say this with almost no technical 
understanding...)
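
As a rough illustration of how these conventions might carry over, the 
following Python sketch recognizes a persistent URL by an "http://purl." 
prefix and splits it into authority, namespace, and ID. The function name and 
the exact path layout (namespace first, ID second) are assumptions made for 
this example, not an agreed specification.

```python
from urllib.parse import urlsplit


def parse_persistent_url(url):
    """Return (authority, namespace, local_id), or None if the URL does not
    follow the assumed 'http://purl.<domain>/<namespace>/<id>' convention."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme != "http" or not host.startswith("purl."):
        return None
    namespace, _, local_id = parts.path.strip("/").partition("/")
    if not namespace or not local_id:
        return None
    return (host, namespace, local_id)


print(parse_persistent_url("http://purl.gbif.net/LIAS/149872"))
# prints ('purl.gbif.net', 'LIAS', '149872')
```

Software (or a human) can then tell from the URL alone that it is intended to 
be permanent, much as the urn:lsid prefix signals this for LSIDs.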

Gregor

----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn at bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19           Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203