[Tdwg-tag] Why we should not use LSID

Thu May 4 13:41:41 CEST 2006

PURLs are centrally managed indirection-through-redirection over HTTP.  
Because resolution is only an HTTP call away, PURLs are both easy to 
understand and very easy to consume.  PURLs are also powerful because 
anything that can be assigned a URL can have a PURL (maybe that makes 
them too powerful). 

There are some advantages to PURLs:
a1.) PURLs are easy to consume
a2.) PURLs require a central resolver which may provide greater 
reliability than a network with many LSID authorities
a3.) PURLs make it easy to solve the "single resource change in 
custodianship" problem

And I see some disadvantages to PURLs:
d1.) PURLs require a central resolver which is a single point of failure
d2.) There are no conventions about what to expect when you resolve a PURL
d3.) PURLs may be easy to consume but they're not easy to produce
d4.) PURLs can't be distinguished from URLs by software

I'll address each with a sentence or two.

a1.) PURLs are easy to consume

Because PURLs rely on simple HTTP GET, they are trivial to resolve.  One 
can use a web browser to manually resolve a PURL or use any of a large 
number of programs or software libraries for fetching URL contents via 
HTTP GET.  This is the primary advantage of PURLs.
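
A minimal sketch of how little code resolution takes, using only Python's standard library. The function name, the `opener` hook (there so the function can be tried without network access), and the example.org address in the usage note are assumptions made for illustration; none of them come from purl.org or any PURL toolkit.

```python
import urllib.request


def resolve_purl(purl, opener=None, timeout=10.0):
    """Return the URL that a PURL finally redirects to.

    `opener` is an illustrative hook; by default a standard urllib
    opener is built, which follows HTTP redirects automatically.
    """
    if opener is None:
        opener = urllib.request.build_opener()
    with opener.open(purl, timeout=timeout) as response:
        # Once redirects have been followed, `url` holds the final location.
        return response.url
```

Calling `resolve_purl("http://purl.example.org/specimen/42")` (a made-up address) would issue the GET, follow whatever redirect the resolver answers with, and hand back the current location.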

a2.)  PURLs require a central resolver which may provide greater 
reliability than a network with many LSID authorities

If we assume that it's equally likely that a given GUID could be 
resolved by any of the resolvers on the network, then the reliability of 
the GUID network reduces to the average resolver reliability.  If it 
turns out that there are 100 LSID resolvers but at any given time 20 are 
likely to be non-responsive, then it's quite possible that a PURL-based 
network with a single well-managed resolver (98 % uptime) could provide 
better quality of service than an LSID-based network.
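
The arithmetic behind this comparison can be sketched as follows. It assumes, as above, that every resolver is equally likely to be the one responsible for a given GUID, and it models the "20 out of 100 down" case as a snapshot in which each resolver is either up (1.0) or down (0.0). The function name is made up for the example.

```python
def network_availability(resolver_uptimes):
    """Chance that a randomly chosen GUID resolves, assuming each resolver
    is equally likely to be the one responsible for it."""
    return sum(resolver_uptimes) / len(resolver_uptimes)


# Snapshot of 100 LSID resolvers, 20 of which are not responding.
lsid_network = [1.0] * 80 + [0.0] * 20

# A single, well-managed central PURL resolver with 98 % uptime.
purl_network = [0.98]

print(f"LSID network: {network_availability(lsid_network):.2f}")
print(f"PURL network: {network_availability(purl_network):.2f}")
```

Under these assumptions the LSID network answers 80 % of requests and the central resolver 98 %. The comparison flips as soon as the individual LSID authorities are run better than the central service.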

a3.) PURLs make it easy to solve the "single resource change in 
custodianship" problem

If the ownership or location of a data object changes, its PURL wouldn't 
change; it would merely redirect you to the new location of the object.  
This is a potential problem with LSIDs because change in custodianship 
of an entire authority is easy to deal with, but change in custodianship 
of a single identified object is difficult to handle.
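
A toy illustration of the point, with an in-memory table standing in for the central resolver. The class, its method names, and all example.org addresses are hypothetical; a real resolver would answer with HTTP redirects rather than dictionary lookups.

```python
class PurlRegistry:
    """Toy in-memory stand-in for a central PURL resolver."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, target_url):
        """Mint a new PURL; refuse to overwrite an existing one."""
        if purl in self._targets:
            raise ValueError(f"{purl} is already registered")
        self._targets[purl] = target_url

    def reassign(self, purl, new_target_url):
        """Point an existing PURL at a new location."""
        if purl not in self._targets:
            raise KeyError(purl)
        self._targets[purl] = new_target_url

    def resolve(self, purl):
        """Return the current location behind a PURL."""
        return self._targets[purl]


registry = PurlRegistry()
specimen = "http://purl.example.org/specimen/42"
registry.register(specimen, "http://museum-a.example.org/db/42")

# Custody of this one specimen record moves to another institution.
# Only the registry entry changes; the identifier stays the same.
registry.reassign(specimen, "http://museum-b.example.org/holdings/42")
print(registry.resolve(specimen))
```

Anyone who stored the PURL keeps a working identifier, and no other record held by the first institution is affected.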

d1.) PURLs require a central resolver which is a single point of failure

A PURL resolver acts as a centralized registry.  While a single PURL 
resolver may provide better reliability than a distributed network of 
LSID resolvers, centralization comes at a cost.  A central PURL resolver 
is a single point of failure.  To guard against failure, the community 
must guarantee that the organization hosting the resolver will be funded 
over time and that it will work to prevent hardware issues, network 
outages, denial of service attacks, etc.  The community may also demand 
that the organization that hosts the PURL resolver provide technical 
support. 

d2.) There are no conventions about what to expect when you resolve a PURL

Under the OCLC's purl.org resolver, there are no conventions about what 
you get when you resolve a PURL.  A PURL can point to a chunk of RDF 
describing a particular specimen, a DVD rip of a Bollywood movie, a 
second PURL that redirects to the first PURL in an endless loop, or a 
web application that returns no content but sends a signal to your fancy 
new networked coffee machine telling it to make a double espresso.  Some 
of these examples are silly, but my point is that PURL only provides for 
the possibility of persistence through indirection.  We're not 
interested solely in indirection.  We want to build a set of services on 
top of whatever GUID system we select.  This set of services requires 
common agreement on what you get when you resolve a GUID.  The LSID spec 
attempts to address this issue by splitting the universe into data and 
metadata and strongly suggesting the use of RDF for metadata.  There is 
no agreement on what you get when you resolve a PURL, and even if we 
came to agreement within our community there's no software in place to 
help us enforce these conventions.
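
To make "software to enforce conventions" a little more concrete, here is a rough sketch of two checks such software might perform: refusing redirect loops and refusing anything that is not an agreed media type. The rule set is invented for the example (choosing RDF/XML is an assumption, not a community decision), and the redirect table is an in-memory stand-in for real HTTP responses.

```python
# Assumption for illustration: the community agrees that GUIDs resolve to RDF/XML.
AGREED_MEDIA_TYPES = {"application/rdf+xml"}


class ConventionError(Exception):
    """Raised when a resolution breaks the (hypothetical) community rules."""


def follow_redirects(purl, redirects, max_hops=10):
    """Follow a {source: target} redirect table and return the final URL.

    Loops and overly long chains are rejected instead of followed forever.
    """
    seen = set()
    current = purl
    while current in redirects:
        if current in seen:
            raise ConventionError(f"redirect loop detected at {current}")
        if len(seen) >= max_hops:
            raise ConventionError(f"more than {max_hops} redirects from {purl}")
        seen.add(current)
        current = redirects[current]
    return current


def check_media_type(media_type):
    """Reject resources that are not in an agreed format."""
    if media_type not in AGREED_MEDIA_TYPES:
        raise ConventionError(f"unexpected media type: {media_type}")


# Two PURLs that point at each other, as in the endless-loop example above.
looping = {
    "http://purl.example.org/a": "http://purl.example.org/b",
    "http://purl.example.org/b": "http://purl.example.org/a",
}
try:
    follow_redirects("http://purl.example.org/a", looping)
except ConventionError as error:
    print(error)
```

Nothing in PURL itself performs checks like these; they would have to live in client libraries or in the resolver that the community operates.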

d3.) PURLs may be easy to consume but they're not necessarily easy to 
produce

PURLs are easy to resolve but hard to register.  A central PURL resolver 
has to provide functionality for registering PURLs and 
assigning/reassigning live URLs to them.  It's simple to envision a 
web-based form for registering PURLs (see 
http://www.purl.org/maint/choose.html), but I imagine that most of the 
time new PURLs will be requested by a piece of software that's trying to 
publish a large number of resources.  This means that the PURL resolver 
should provide a remote service (software interface) for registering a 
new PURL, in part to facilitate automated registration of a large number 
of identifiers.  Interestingly enough, I don't think the OCLC PURL 
resolver implementation provides this functionality.  I imagine that 
most people who want to register a large number of PURLs work around the 
problem by registering what OCLC calls a "partial redirection" 
(http://purl.oclc.org/docs/inet96.html#partial).  I don't consider 
partial redirects to be GUIDs because they allow the use of a domain as 
a prefix for a localized URL hierarchy.  In order to guarantee that I 
don't mess up your PURLs, the OCLC PURL resolver requires authentication 
in order to register a new PURL.  Authentication systems aren't easy to 
implement or support.
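
For readers who haven't met partial redirection, the following sketch shows the idea as I understand it from the OCLC document linked above: a single registered prefix maps onto a target prefix, and whatever follows the prefix is appended unchanged. The function name, paths, and example.org addresses are made up for illustration.

```python
def resolve_partial(purl_path, partial_redirects):
    """Resolve a PURL path against registered prefixes.

    The longest matching prefix wins, and the unmatched remainder of the
    path is appended to that prefix's target. Returns None if nothing matches.
    """
    for prefix in sorted(partial_redirects, key=len, reverse=True):
        if purl_path.startswith(prefix):
            return partial_redirects[prefix] + purl_path[len(prefix):]
    return None


# One registration covers every identifier under the prefix.
partial_redirects = {
    "/net/herbarium/": "http://herbarium.example.org/specimens/",
}

print(resolve_partial("/net/herbarium/B100001", partial_redirects))
print(resolve_partial("/net/herbarium/anything/at/all", partial_redirects))
```

The second call shows why this looks more like a relocatable URL hierarchy than a set of individually minted identifiers: any suffix resolves to something, whether or not anyone ever assigned it.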

d4.) PURLs can't be distinguished from URLs by software

Most GUID systems come with a set of assumptions about when and how it's 
appropriate to use a GUID.  In addition to distributed resolution we 
might want to use GUIDs for things like equality testing, versioning, or 
object composition.  Each of these uses raises questions that need to be 
sorted out.  For instance, with equality testing, do we want to be able 
to have software say that two things are equal if their GUIDs are 
bitwise identical?  If two GUIDs are not bitwise identical, can they 
refer to the same object?  Do we require that different versions of the 
same object have the same GUID, different GUIDs with a relationship 
between them asserted in metadata, or the same base GUID with a 
different version component tacked onto the end?  What about different 
representations (formats) of the same thing (say an XML and an RDF 
version)?  Can they have the same GUID?  How does our object equality 
testing by GUID choice affect our choice of how to do versioning? How do 
we actually compose a compound object out of simple related objects?  
All of these questions require careful consideration and are affected by 
our choice of a GUID system. 

I guess what I'm trying to say is that we're not interested in GUIDs for 
the sake of GUIDs alone, but instead require them for specific uses that 
extend beyond simple naming and resolution.  I hope that we'll examine 
some of these questions and come to agreement on our conventions for 
GUID use.  Once we have these conventions (either because they're 
embedded in the GUID scheme we choose or because we've arrived at them 
during meetings and documented them appropriately), we'll need to write 
software that operates on these assumptions and enforces these 
conventions.  That software will have to be able to distinguish a GUID 
from a non-GUID because we can do certain things with GUIDd objects that 
we can't do with non-GUIDd ones.  With PURL this is problematic because 
a piece of software cannot easily distinguish a PURL from a URL yet they 
probably ought to be treated differently.
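
A small sketch of the difference. An LSID announces itself through its `urn:lsid:authority:namespace:object[:revision]` syntax, so a simple pattern can pick it out. A PURL is syntactically just an HTTP URL, so the same check cannot tell it apart from any other web address. The function name and the sample identifiers are invented for the example, and the pattern is a simplification rather than a full validation of the LSID syntax.

```python
import re
from urllib.parse import urlparse

# urn:lsid:<authority>:<namespace>:<object>[:<revision>]
LSID_PATTERN = re.compile(
    r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(?::[^:\s]+)?$", re.IGNORECASE
)


def classify(identifier):
    """Say what software can infer from the identifier string alone."""
    if LSID_PATTERN.match(identifier):
        return "lsid"
    if urlparse(identifier).scheme in ("http", "https"):
        return "url"
    return "unknown"


print(classify("urn:lsid:example.org:specimens:42"))    # recognisably a GUID
print(classify("http://purl.example.org/specimen/42"))  # a PURL ...
print(classify("http://www.example.org/index.html"))    # ... looks like any URL
```

Software that wants to treat PURLs specially would need extra knowledge, such as a list of known resolver hosts, which is exactly the kind of convention PURL does not supply.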

I'm not a huge fan of LSID.  I think a URN-based identification system 
introduces a barrier to entry for some.  I think the SOAP/web services 
stuff in the LSID spec and the Java toolkit from IBM introduce another 
barrier.  PURL may be easier to use (at least for resolution), but it 
doesn't go as far as LSID in laying the groundwork for a network of 
services that can at the very least share data, if not actually help 
researchers do something interesting with it.

I'm not against inventing something new that's essentially a set of 
restrictions on top of PURL.  Maybe we could get the best of both worlds 
-- the simplicity of PURL with the conventions of LSID.

-Steve

Döring, Markus wrote:

>Hello, 
>please see my comments inline below. I will use the term PURL not only in the purl.org sense, but also for any simple way of creating stable URLs through centralized URL redirection. With that in mind, I can't see any relevant benefits of LSIDs that are not shared by PURLs. Considering the potential problems we might run into with any software framework (not only RDF) that includes resolving, I am strongly in favor of simple URLs.
>
>--
>Markus
> 
>
>-----Original Message-----
>
>>From: tdwg-tag-bounces at lists.tdwg.org [mailto:tdwg-tag-bounces at lists.tdwg.org] On Behalf Of Kevin Richards
>>Sent: Wednesday, 3 May 2006 13:45
>>To: tdwg-tag at lists.tdwg.org
>>Subject: Re: [Tdwg-tag] Why we should not use LSID
>>
>>Roger
>>
>>I agree that PURLs are a perfectly good option for our GUID needs, and that they would probably be one of the easier technologies to get "working".
>>
>>Like you I really had to think again to work out the benefits of LSIDs over PURLs, especially considering the disadvantage you have mentioned.
>>
>>Some of the benefits of LSIDs include:
>>- clearly separate data and metadata services (as you have mentioned)
>>    
>>
>
>MD: From what I've understood from the GUID group, almost only metadata is used, though. So if we deal with metadata only, then it's not a big practical difference, at least.
>
>
>  
>
>>- separation from domain names - as far as I understand, the PURL still requires domain name resolution of the actual ID url to obtain the resolution server address - this ties you to a particular url format
>>    
>>
>
>MD: We could easily set up a redirection service http://purl.gbif.net/AUTHORITY/whatever that redirects to wherever you want to keep your resolver. Only the authority part of the URL needs to be centrally managed.
>
>MD: This leads me to a question about LSIDs which I never understood. LSIDs are bound to domain name resolution, and their guarantee of global uniqueness is heavily based on DNS. So to me, a central body keeping track of LSID authorities is required to guarantee lifelong uniqueness of LSID URNs. If "bgbm.org" is later owned by someone else who also wants to set up an LSID authority, how would he know there was already one under that domain? He could be reissuing the same URN (LSID) again. That's exactly what people use as an argument against URLs, but it's also true for LSIDs as far as I understand the technology.
>
>
>  
>
>>- LSID assigning service can be managed by provider organisation ("ownership" of data and IDs is often high on a data provider's requirements list)
>>    
>>
>
>MD: so can PURLs
>
>
>  
>
>>- LSIDs provide a "standard" technology for resolving and serving up data objects - ie every provider will have the LSID authority services running on their server that will serve up data and metadata (+ other services if required) in the same way, for every provider
>>    
>>
>
>MD: URLs are even more standard I would think. Take Apache and there you go.
>
>
>  
>
>>- related to the previous point, a standard mechanism for third party annotations of LSIDs is provided with every LSID server implementation
>>    
>>
>
>MD: Annotea (for RDF) uses simple HTTP. As Rod said pingbacks are a way to go as well (over http). And I am sure there are many other standards existing for URLs.
>
>
>  
>
>>- same URN LSID can be used for resolution of http, ftp, soap and tcp protocols (unsure how PURLs handle this?) ...other cool stuff, I'm sure, that I can't think of right now - too late at night
>>    
>>
>
>MD: True, but is that needed?
>
>
>  
>
>>Probably best to avoid LSIDs for RDF class identifiers etc, but do the semantic web tools you are talking about have no way of recognising different URL resolution types - I'm wondering if you can "plug in" LSID resolution into these tools?
>>    
>>
>
>MD: that would surely be good. I have no experience with RDF frameworks, but everywhere I look I see URIs that are in fact URLs.
>
>
>
>  
>
>>Kevin
>>
>>
>>
>>    
>>
>>>>>Roger Hyam <roger at tdwg.org> 05/03/06 10:29 PM >>>
>>>>>          
>>>>>
>>Hi Rod,
>>
>> From the meeting report - which I am struggling to get back to - these two bullet points sum it up I think
>>
>>· There are certain things for which LSIDs are not appropriate. It would be legal to use them for RDF resource identifiers for controlled vocabularies and XML Schema locations BUT we would have to extend existing software libraries to do this, which is not desirable.
>>
>>· *Recommendation:* LSIDs are not used for controlled vocabularies, ontologies or XML Schema locations. LSIDs should be used to refer to instances.
>>
>>Basically it was felt that if we used LSIDs for things like rdfs:Class definitions then any library that went off to fetch the definitions automatically would have to be extended so that it understood LSID resolution. On the other hand it was felt that use of LSIDs for real resources (things we are actually describing like specimens and people) was fine. Once an ontology is loaded then it is all fine though so to an extent this may be a false problem.
>>
>>We spent a long time talking about what is part of the ontology and what isn't and went round in circles (please let's not do it again). Basically class and property descriptions should be URL type URIs but instance URIs can be LSIDs. If you want to define the genus /Rhododendron/ as being an OWL DL class retrieved remotely then you should probably give it a URL. If you want to define it as a data item then use an LSID.
>>
>>I think Gregor's worries (correct me if I am wrong Gregor) are that in SDD (possibly our whole domain) many things could be considered classes and properties. i.e. Things you want your reasoner to use in the reasoning rather than simply reason about. In this case it may be better to have URLs for everything.
>>
>>There is a niggling doubt (in my mind) that we may come across 'cool' 
>>tools and libraries that assume that *all* resource URIs are URLs and that we would not be able to use them or would need to extend them if we use LSIDs. Imagine a semantic web browser where you click on a node and it fetches the associated resource to expand itself.
>>
>>I do occasionally struggle to see the advantages of LSIDs as GUIDs over just conventions for use of URLs but these may be matters of personal faith.  Another bullet point in the report says:
>>
>>· *Recommendation:* GUIDs Group should issue a document clearly justifying adoption of GUID technology. The advantages need to be clearly explained.
>>
>>I'll try and get this report out ASAP but it looks very similar to the wiki page here:
>>
>>http://wiki.tdwg.org/twiki/bin/view/TAG/TagMeeting1ReportDraft
>>
>>Obviously would be grateful for your thoughts.
>>
>>Roger
>>
>>
>>
>>Roderic Page wrote:
>>    
>>
>>>Dear Gregor,
>>>
>>>For the benefit of those not at TAG 1, can you please explain why 
>>>"LSIDs are not interoperable with semantic web technologies"?
>>>
>>>Regards
>>>
>>>Rod
>>>
>>>On 2 May 2006, at 16:44, Gregor Hagedorn wrote:
>>>
>>>  
>>>      
>>>
>>>>Note that part of my concern about the use of concept when talking 
>>>>about classes/properties/data elements is that I more and more 
>>>>believe we will want to use ontology reasoners for uses other than 
>>>>software design, i.e. as part of what we currently consider data 
>>>>(taxon names, concepts, rank hierarchy, parts of organisms, 
>>>>properties of organisms, etc.). All these are ontological concepts, 
>>>>and efforts such as www.plantontology.org do use OWL to reason on them.
>>>>
>>>>The SDD presentation (the one not held in EDI, attached) contained 
>>>>some examples of how we might want to query our data - in ways that 
>>>>OWL-for-software-design seems not to cover - and which using LSIDs 
>>>>would even prevent.
>>>>
>>>>Please discuss:
>>>>
>>>>http://wiki.tdwg.org/twiki/bin/view/TAG/WhyWeShouldNotUseLSIDs
>>>>
>>>>http://wiki.tdwg.org/twiki/bin/view/TAG/UsePURLsAsGUIDs
>>>>
>>>>Gregor
>>>>----------------------------------------------------------
>>>>Gregor Hagedorn (G.Hagedorn at bba.de)
>>>>Institute for Plant Virology, Microbiology, and Biosafety Federal 
>>>>Research Center for Agriculture and Forestry (BBA)
>>>>Königin-Luise-Str. 19           Tel: +49-30-8304-2220
>>>>14195 Berlin, Germany           Fax: +49-30-8304-2203
>>>>
>>>>   ---- File information -----------
>>>>     File:  SDD-TAG1.ppt
>>>>     Date:  23 Apr 2006, 18:10
>>>>     Size:  1056768 bytes.
>>>>     Type:  Unknown
>>>><SDD-TAG1.ppt>_______________________________________________
>>>>Tdwg-tag mailing list
>>>>Tdwg-tag at lists.tdwg.org
>>>>http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org
>>>>
>>>>    
>>>>        
>>>>
>>>----------------------------------------
>>>Professor Roderic D. M. Page
>>>Editor, Systematic Biology
>>>DEEB, IBLS
>>>Graham Kerr Building
>>>University of Glasgow
>>>Glasgow G12 8QP
>>>United Kingdom
>>>
>>>Phone:    +44 141 330 4778
>>>Fax:      +44 141 330 2792
>>>email:    r.page at bio.gla.ac.uk
>>>web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>>reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>>>
>>>Subscribe to Systematic Biology through the Society of Systematic 
>>>Biologists Website:  http://systematicbiology.org Search for taxon 
>>>names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
>>>Find out what we know about a species: http://ispecies.org Rod's rants 
>>>on phyloinformatics: http://iphylo.blogspot.com
>>>
>>>
>>-- 
>>
>>-------------------------------------
>> Roger Hyam
>> Technical Architect
>> Taxonomic Databases Working Group
>>-------------------------------------
>> http://www.tdwg.org
>> roger at tdwg.org
>> +44 1578 722782
>>-------------------------------------
>>
>>