Re: [Tdwg-tag] Why we should not use LSID

4 May 2006

      Just when everything seemed settled... ;-)

For those wanting to revisit all this, there's also a nice series of  
presentations at http://www.dcc.ac.uk/events/pi-2005/ .

The ARK identifier is an example where appending symbols to the  
identifier determines what you get (e.g., '?' for metadata). I guess  
one could do something similar for PURLs.
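
As a rough illustration of that convention, here is a minimal Python
sketch of a resolver front end that treats a trailing '?' as a request
for metadata. The function name and the example PURL are hypothetical;
only the '?' suffix is taken from the ARK convention mentioned above.

```python
from typing import Tuple


def split_request(identifier: str) -> Tuple[str, str]:
    """Return (bare identifier, what the caller is asking for).

    Assumption: a single trailing '?' means "give me metadata", in the
    spirit of ARK; anything else is a request for the object itself.
    """
    if identifier.endswith("?"):
        return identifier[:-1], "metadata"
    return identifier, "object"


# Hypothetical BioPURL, used only for illustration.
purl = "http://purl.example.org/specimen/1234"
wants_object = split_request(purl)
wants_metadata = split_request(purl + "?")
```

A real deployment would still have to agree on what "metadata" means
(RDF, for instance); the sketch only shows the dispatch step.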

Why not have, *ahem* "BioPURLs" (wince), that is, PURLs deployed by the  
biodiversity community with conventions for returning what we want?

If we go down the PURL route, I wonder whether we won't eventually
converge on Handles/DOIs, which already have administration tools in
place. Ultimately, centralisation requires good tools and good support.

In any event, is it possible to separate GUIDs from the whole metadata
side of things? And given that every GUID system currently in
operation uses (or can use) URLs, can't we postpone deciding on this
until we have a few test systems built and a real idea of what's
involved? In a sense: wrap everything in URLs, so that the GUID is
either the URL or embedded in the URL, then see what happens.
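
One way to read "the GUID is either the URL or embedded in the URL" is
sketched below in Python. The rule used here (a path that is itself a
URN counts as the embedded GUID) and both example URLs are assumptions
made for illustration, not an agreed convention.

```python
from urllib.parse import urlsplit


def guid_from_url(url: str) -> str:
    """Return the GUID carried by a URL.

    Assumption: if the path is itself a URN (starts with "urn:"), the
    GUID is that embedded URN; otherwise the whole URL is the GUID.
    """
    path = urlsplit(url).path.lstrip("/")
    if path.lower().startswith("urn:"):
        return path
    return url


# Hypothetical resolver wrapping an LSID, and a hypothetical PURL.
embedded = guid_from_url(
    "http://resolver.example.org/urn:lsid:example.org:specimens:1234"
)
plain = guid_from_url("http://purl.example.org/specimen/1234")
```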

And yes, I'm sure this pretty much contradicts everything in my earlier  
posts...

Regards

Rod

On 4 May 2006, at 12:41, Steven Perry wrote:
...
PURLs are centrally managed indirection-through-redirection over HTTP.
Because resolution is only an HTTP call away, PURLs are both easy to
understand and very easy to consume.  PURLs are also powerful because
anything that can be assigned a URL can have a PURL (maybe that makes
them too powerful).
There are some advantages to PURLs:
a1.) PURLs are easy to consume
a2.) PURLs require a central resolver which may provide greater
reliability than a network with many LSID authorities
a3.) PURLs make it easy to solve the "single resource change in
custodianship" problem
And I see some disadvantages to PURLs:
d1.) PURLs require a central resolver which is a single point of
failure
d2.) There are no conventions about what to expect when you resolve a
PURL
d3.) PURLs may be easy to consume but they're not necessarily easy to
produce
d4.) PURLs can't be distinguished from URLs by software
I'll address each with a sentence or two.
a1.) PURLs are easy to consume
Because PURLs rely on simple HTTP GET, they are trivial to resolve. One
can use a web browser to manually resolve a PURL or use any of a large
number of programs or software libraries for fetching URL contents via
HTTP GET. This is the primary advantage of PURLs.
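
To make the "any library will do" point concrete, here is a minimal
Python sketch using only the standard library. Because no real resolver
is assumed, the example starts a toy resolver on localhost (ToyResolver
and its /purl/ and /data/ paths are invented for the demo); the part a
consumer would actually write is the short resolve() function.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyResolver(BaseHTTPRequestHandler):
    """Stand-in for a PURL resolver: /purl/x redirects to /data/x."""

    def do_GET(self):
        if self.path.startswith("/purl/"):
            # Indirection through redirection: answer with a 302.
            self.send_response(302)
            self.send_header("Location", "/data/" + self.path[len("/purl/"):])
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"record for " + self.path.encode("ascii")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def resolve(purl):
    """Resolve a PURL with plain HTTP GET; redirects are followed for us."""
    # An empty ProxyHandler avoids sending the localhost demo via a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(purl, timeout=5) as response:
        return response.geturl(), response.read().decode("ascii")


server = HTTPServer(("127.0.0.1", 0), ToyResolver)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port
final_url, body = resolve(base + "/purl/specimen-1")
server.shutdown()
server.server_close()
```

The client never needs to know that a redirect happened; it only sees
the final location and the content found there.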
a2.)  PURLs require a central resolver which may provide greater
reliability than a network with many LSID authorities
If we assume that it's equally likely that a given GUID could be
resolved by any of the resolvers on the network, then the reliability
of the GUID network reduces to the average resolver reliability. If it
turns out that there are 100 LSID resolvers but at any given time 20
are likely to be non-responsive (80% availability on average), then
it's quite possible that a PURL-based network with a single
well-managed resolver (98% uptime) could provide better quality of
service than an LSID-based network.
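
The arithmetic behind that comparison can be written out as a small
Python sketch. The figures are the illustrative ones from the paragraph
above (100 resolvers with 20 down, one resolver at 98% uptime), not
measurements, and the function name is made up for the example.

```python
def expected_availability(uptimes):
    """Chance that a GUID chosen uniformly across resolvers is resolvable.

    Assumption (as in the text): every resolver is equally likely to be
    the one that has to answer, so the network figure is the plain mean.
    """
    return sum(uptimes) / len(uptimes)


# 100 LSID authorities, 20 of them non-responsive at a given moment.
lsid_network = [1.0] * 80 + [0.0] * 20
# One well-managed central PURL resolver at 98% uptime.
purl_network = [0.98]

lsid_availability = expected_availability(lsid_network)
purl_availability = expected_availability(purl_network)
```

Under these assumptions the LSID network answers about 80% of requests
and the single PURL resolver about 98%.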
a3.) PURLs make it easy to solve the "single resource change in
custodianship" problem
If the ownership or location of a data object changes, its PURL
wouldn't change; it would merely redirect you to the new location of
the object. This is a potential problem with LSIDs because change in
custodianship of an entire authority is easy to deal with, but change
in custodianship of a single identified object is difficult to handle.
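
A minimal Python sketch of why the single-object case is easy under
indirection: the resolver holds a table from PURL to current location,
and a change of custodian is one update to that table. The class, its
method names and all URLs are hypothetical.

```python
class RedirectTable:
    """Toy model of a central resolver's PURL -> live URL table."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, target):
        if purl in self._targets:
            raise ValueError("PURL already registered: " + purl)
        self._targets[purl] = target

    def move(self, purl, new_target):
        """Record a change of custodian; the PURL itself never changes."""
        if purl not in self._targets:
            raise KeyError(purl)
        self._targets[purl] = new_target

    def resolve(self, purl):
        return self._targets[purl]


table = RedirectTable()
purl = "http://purl.example.org/specimen/1234"
table.register(purl, "http://museum-a.example.org/db/1234")
before = table.resolve(purl)
# The specimen record moves to another institution.
table.move(purl, "http://museum-b.example.org/records/1234")
after = table.resolve(purl)
```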
d1.) PURLs require a central resolver which is a single point of  
failure
A PURL resolver acts as a centralized registry. While a single PURL
resolver may provide better reliability than a distributed network of
LSID resolvers, centralization comes at a cost. A central PURL resolver
is a single point of failure. To guard against failure, the community
must guarantee that the organization hosting the resolver will be
funded over time and that it will work to prevent hardware issues,
network outages, denial of service attacks, etc. The community may also
demand that the organization that hosts the PURL resolver provide
technical support.
d2.) There are no conventions about what to expect when you resolve a  
PURL
Under the OCLC's purl.org resolver, there are no conventions about what
you get when you resolve a PURL. A PURL can point to a chunk of RDF
describing a particular specimen, a DVD rip of a Bollywood movie, a
second PURL that redirects to the first PURL in an endless loop, or a
web application that returns no content but sends a signal to your
fancy new networked coffee machine telling it to make a double
espresso. Some of these examples are silly, but my point is that PURL
only provides for the possibility of persistence through indirection.
We're not interested solely in indirection. We want to build a set of
services on top of whatever GUID system we select. This set of services
requires common agreement on what you get when you resolve a GUID. The
LSID spec attempts to address this issue by splitting the universe into
data and metadata and strongly suggesting the use of RDF for metadata.
There is no agreement on what you get when you resolve a PURL, and even
if we came to agreement within our community there's no software in
place to help us enforce these conventions.
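
As one small example of the kind of enforcement software that would be
needed, here is a Python sketch that follows a chain of redirects and
refuses endless loops such as the PURL-to-PURL case above. The redirect
table is an in-memory stand-in for real HTTP redirects, and the function
name, hop limit and URLs are all assumptions.

```python
def follow(redirects, start, max_hops=10):
    """Follow redirects from `start`; raise if the chain loops or is too long."""
    seen = []
    current = start
    while current in redirects:
        if current in seen:
            raise RuntimeError("redirect loop at " + current)
        if len(seen) >= max_hops:
            raise RuntimeError("too many redirects from " + start)
        seen.append(current)
        current = redirects[current]
    return current


# A well-behaved PURL and a pair of PURLs that point at each other.
redirects = {
    "http://purl.example.org/ok": "http://data.example.org/specimen.rdf",
    "http://purl.example.org/a": "http://purl.example.org/b",
    "http://purl.example.org/b": "http://purl.example.org/a",
}

good = follow(redirects, "http://purl.example.org/ok")
try:
    follow(redirects, "http://purl.example.org/a")
    loop_detected = False
except RuntimeError:
    loop_detected = True
```

Checking what comes back at the end of the chain (RDF or not, data or
metadata) would be a further convention layered on top of this.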
d3.) PURLs may be easy to consume but they're not necessarily easy to
produce
PURLs are easy to resolve but hard to register. A central PURL resolver
has to provide functionality for registering PURLs and
assigning/reassigning live URLs to them. It's simple to envision a
web-based form for registering PURLs (see
http://www.purl.org/maint/choose.html), but I imagine that most of the
time new PURLs will be requested by a piece of software that's trying
to publish a large number of resources. This means that the PURL
resolver should provide a remote service (software interface) for
registering a new PURL, in part to facilitate automated registration of
a large number of identifiers. Interestingly enough, I don't think the
OCLC PURL resolver implementation provides this functionality. I
imagine that most people who want to register a large number of PURLs
work around the problem by registering what OCLC calls a "partial
redirection" (http://purl.oclc.org/docs/inet96.html#partial). I don't
consider partial redirects to be GUIDs because they allow the use of a
domain as a prefix for a localized URL hierarchy. In order to guarantee
that I don't mess up your PURLs, the OCLC PURL resolver requires
authentication in order to register a new PURL. Authentication systems
aren't easy to implement or support.
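
For readers unfamiliar with partial redirection, the idea can be
sketched in a few lines of Python: only a prefix is registered, and
whatever follows the prefix is appended to the target. The function
name, the prefix and the target URL are hypothetical, and the
longest-prefix rule is an assumption of this sketch rather than a
statement about the OCLC implementation.

```python
def resolve_partial(prefixes, path):
    """Resolve `path` against registered prefixes; None if nothing matches."""
    matches = [prefix for prefix in prefixes if path.startswith(prefix)]
    if not matches:
        return None
    best = max(matches, key=len)  # assumed rule: longest prefix wins
    # The unmatched remainder of the path is passed through unchanged.
    return prefixes[best] + path[len(best):]


# One registration covers an entire local URL hierarchy.
prefixes = {"/net/herbarium/": "http://data.example.org/records/"}

hit = resolve_partial(prefixes, "/net/herbarium/specimen/42")
miss = resolve_partial(prefixes, "/net/other/specimen/42")
```

This shows why a single registration can stand in for thousands of
identifiers, and also why the individual identifiers under the prefix
are never known to the central resolver.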
d4.) PURLs can't be distinguished from URLs by software
Most GUID systems come with a set of assumptions about when and how
it's appropriate to use a GUID. In addition to distributed resolution
we might want to use GUIDs for things like equality testing,
versioning, or object composition. Each of these uses raises questions
that need to be sorted out. For instance, with equality testing, do we
want to be able to have software say that two things are equal if their
GUIDs are bitwise identical? If two GUIDs are not bitwise identical,
can they refer to the same object? Do we require that different
versions of the same object have the same GUID, different GUIDs with a
relationship between them asserted in metadata, or the same base GUID
with a different version component tacked onto the end? What about
different representations (formats) of the same thing (say an XML and
an RDF version)? Can they have the same GUID? How does our choice of
object equality testing by GUID affect our choice of how to do
versioning? How do we actually compose a compound object out of simple
related objects? All of these questions require careful consideration
and are affected by our choice of a GUID system.
I guess what I'm trying to say is that we're not interested in GUIDs
for the sake of GUIDs alone, but instead require them for specific uses
that extend beyond simple naming and resolution. I hope that we'll
examine some of these questions and come to agreement on our
conventions for GUID use. Once we have these conventions (either
because they're embedded in the GUID scheme we choose or because we've
arrived at them during meetings and documented them appropriately),
we'll need to write software that operates on these assumptions and
enforces these conventions. That software will have to be able to
distinguish a GUID from a non-GUID because we can do certain things
with GUIDd objects that we can't do with non-GUIDd ones. With PURL this
is problematic because a piece of software cannot easily distinguish a
PURL from a URL yet they probably ought to be treated differently.
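
The difference can be seen in a short Python sketch. An LSID has a
recognisable shape (urn:lsid:authority:namespace:object, optionally
followed by a revision), so a pattern match is enough; a PURL is
syntactically just a URL, so software needs outside knowledge, here
assumed to be a list of known resolver hosts. The pattern is a
simplification of the LSID syntax, and the host list and example
identifiers are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Simplified LSID shape: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
LSID_PATTERN = re.compile(
    r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(?::[^:\s]+)?$", re.IGNORECASE
)


def is_lsid(identifier):
    """An LSID can be recognised from its syntax alone."""
    return LSID_PATTERN.match(identifier) is not None


def is_purl(identifier, resolver_hosts):
    """A PURL has no syntax of its own; we must be told which hosts resolve."""
    return urlsplit(identifier).hostname in resolver_hosts


known_resolvers = {"purl.example.org"}  # assumed community convention

lsid = "urn:lsid:example.org:specimens:1234"
purl = "http://purl.example.org/specimen/1234"
page = "http://www.example.org/about.html"

checks = (
    is_lsid(lsid),
    is_lsid(purl),
    is_purl(purl, known_resolvers),
    is_purl(page, known_resolvers),
)
```

Without the agreed host list, the second function has nothing to go on:
the PURL and the ordinary web page look the same to software.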
I'm not a huge fan of LSID. I think a URN-based identification system
introduces a barrier to entry for some. I think the SOAP/web services
stuff in the LSID spec and the Java toolkit from IBM introduce another
barrier. PURL may be easier to use (at least for resolution), but it
doesn't go as far as LSID in laying the groundwork for a network of
services that can at the very least share data, if not actually help
researchers do something interesting with it.
I'm not against inventing something new that's essentially a set of
restrictions on top of PURL. Maybe we could get the best of both worlds
-- the simplicity of PURL with the conventions of LSID.
-Steve
Döring, Markus wrote:
...
Hello,
please see my comments inline below. I will use "PURL" not only in the
purl.org sense, but also to mean any simple way of creating stable URLs
through centralized URL redirection. Seen that way, I can't see any
relevant benefits of LSIDs that are not shared by PURLs. Considering
the potential problems we might run into with any software framework
(not only RDF) that has to resolve identifiers, I am strongly in favor
of simple URLs.
--
Markus
-----Original Message-----
...
From: tdwg-tag-bounces@lists.tdwg.org
[mailto:tdwg-tag-bounces@lists.tdwg.org] On behalf of Kevin Richards
Sent: Wednesday, 3 May 2006 13:45
To: tdwg-tag@lists.tdwg.org
Subject: Re: [Tdwg-tag] Why we should not use LSID
Roger
I agree that PURLs are a perfectly good option for our GUID needs,  
and that they would probably be one of the easier technologies to  
get "working".
Like you, I really had to think again to work out the benefits of
LSIDs over PURLs, especially considering the disadvantage you have
mentioned.
Some of the benefits of LSIDs include:
- clearly separate data and metadata services (as you have mentioned)
MD: From what I've understood from the GUID group, almost only
metadata is used anyway. So if we deal with metadata only, then it's
not a big practical difference.
...
- separation from domain names - as far as I understand, the PURL  
still requires domain name resolution of the actual ID url to obtain  
the resolution server address - this ties you to a particular url  
format
MD: We could easily set up a redirection service
http://purl.gbif.net/AUTHORITY/whatever that redirects to wherever
you want to keep your resolver. Only the authority part of the URL
needs to be centrally managed.
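
A minimal Python sketch of such a service, assuming the central table
maps each authority name to the base URL of that provider's own
resolver. The table contents, the provider URL and the function name
are hypothetical; purl.gbif.net/AUTHORITY/whatever is the proposed
pattern, not an existing service.

```python
def redirect_target(path, authorities):
    """Map /AUTHORITY/whatever to the provider's resolver, or None if unknown."""
    authority, _, rest = path.lstrip("/").partition("/")
    base = authorities.get(authority)
    if base is None:
        return None
    # Everything after the authority is passed through to the provider.
    return base + rest


# Only this table would need central management (hypothetical entry).
authorities = {"bgbm.org": "http://resolver.example.org/bgbm/"}

known = redirect_target("/bgbm.org/whatever", authorities)
unknown = redirect_target("/nobody.example/whatever", authorities)
```

The central service stays small because it never sees individual
identifiers, only the authority names.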
MD: This leads me to a question about LSIDs which I never understood.
LSIDs are bound to domain name resolution, and their guarantee of
global uniqueness is heavily based on DNS. So to me a central body
keeping track of LSID authorities is required to guarantee lifelong
uniqueness of LSID URNs. If "bgbm.org" comes to be owned by someone
else who also wants to set up an LSID authority, how does he know
there was one already under that domain? He could be reissuing the
same URN (LSID) again. That's exactly what people use as an argument
against URLs, but it's also true for LSIDs as far as I understand the
technology.
...
- LSID assigning service can be managed by provider organisation  
("ownership" of data and IDs is often high on a data provider's  
requirements list)
MD: so can PURLs
...
- LSIDs provide a "standard" technology for resolving and serving up  
data objects - ie every provider will have the LSID authority  
services running on their server that will serve up data and  
metadata (+ other services if required) in the same way, for every  
provider
MD: URLs are even more standard I would think. Take Apache and there  
you go.
...
- related to the previous point, a standard mechanism for third  
party annotations of LSIDs is provided with every LSID server  
implementation
MD: Annotea (for RDF) uses simple HTTP. As Rod said pingbacks are a  
way to go as well (over http). And I am sure there are many other  
standards existing for URLs.
...
- same URN LSID can be used for resolution over http, ftp, soap and
tcp protocols (unsure how PURLs handle this?) ...other cool stuff,
I'm sure, that I can't think of right now - too late at night
MD: true. but is that needed?
...
Probably best to avoid LSIDs for RDF class identifiers etc, but do
the semantic web tools you are talking about have no way of
recognising different URL resolution types? I'm wondering if you
can "plug in" LSID resolution into these tools.
MD: that would surely be good. I have no experience with RDF  
frameworks, but everywhere I look I see URIs that are in fact URLs.
...
Kevin
...
...
...
Roger Hyam <roger@tdwg.org> 05/03/06 10:29 PM >>>
Hi Rod,
From the meeting report - which I am struggling to get back to -  
these two bullet points sum it up I think
· There are certain things for which LSIDs are not appropriate. It
  would be legal to use them for RDF resource identifiers for
  controlled vocabularies and XML Schema locations BUT we would have
  to extend existing software libraries to do this, which is not
  desirable.
· *Recommendation:* LSIDs are not used for controlled vocabularies,
  ontologies or XML Schema locations. LSIDs should be used to refer
  to instances.
Basically it was felt that if we used LSIDs for things like  
rdfs:Class definitions then any library that went off to fetch the  
definitions automatically would have to be extended so that it  
understood LSID resolution. On the other hand it was felt that use  
of LSIDs for real resources (things we are actually describing like  
specimens and people) was fine. Once an ontology is loaded then it  
is all fine though so to an extent this may be a false problem.
We spent a long time talking about what is part of the ontology and
what isn't and went round in circles (please let's not do it again).
Basically class and property descriptions should be URL type URIs
but instance URIs can be LSIDs. If you want to define the genus
/Rhododendron/ as being an OWL DL class retrieved remotely then you
should probably give it a URL. If you want to define it as a data
item then use an LSID.
I think Gregor's worries (correct me if I am wrong Gregor) are that  
in SDD (possibly our whole domain) many things could be considered  
classes and properties. i.e. Things you want your reasoner to use in  
the reasoning rather than simply reason about. In this case it may  
be better to have URLs for everything.
There is a niggling doubt (in my mind) that we may come across 'cool'
tools and libraries that assume that *all* resource URIs are URLs
and that we would not be able to use them or would need to extend
them if we use LSIDs. Imagine a semantic web browser where you click
on a node and it fetches the associated resource to expand itself.
I do occasionally struggle to see the advantages of LSIDs as GUIDs  
over just conventions for use of URLs but these may be matters of  
personal faith.  Another bullet point in the report says:
· *Recommendation:* GUIDs Group should issue a document clearly
  justifying adoption of GUID technology. The advantages need to be
  clearly explained.
I'll try and get this report out ASAP but it looks very similar to  
the wiki page here:
http://wiki.tdwg.org/twiki/bin/view/TAG/TagMeeting1ReportDraft
Obviously would be grateful for your thoughts.
Roger
Roderic Page wrote:
...
Dear Gregor,
For the benefit of those not at TAG 1, can you please explain why
"LSIDs are not interoperable with semantic web technologies"?
Regards
Rod
On 2 May 2006, at 16:44, Gregor Hagedorn wrote:
...
Note that part of my concern about the use of concept when talking
about classes/properties/data elements is that I more and more
believe we will want to use ontology reasoners for uses other than
software design, i.e. as part of what we currently consider data
(taxon names, concepts, rank hierarchy, parts of organisms,
properties of organisms, etc.). All these are ontological concepts,
and efforts such as www.plantontology.org do use OWL to reason on them.
The SDD presentation (the one not held in EDI, attached) contained
some examples of how we might want to query our data - in ways that
OWL-for-software-design seems not to cover - and which using LSIDs
would even prevent.
Please discuss:
http://wiki.tdwg.org/twiki/bin/view/TAG/WhyWeShouldNotUseLSIDs
http://wiki.tdwg.org/twiki/bin/view/TAG/UsePURLsAsGUIDs
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety Federal
Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19           Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203
---- File information -----------
    File:  SDD-TAG1.ppt
    Date:  23 Apr 2006, 18:10
    Size:  1056768 bytes.
    Type:  Unknown
<SDD-TAG1.ppt>
_______________________________________________
Tdwg-tag mailing list
Tdwg-tag@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org
--
----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page@bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com
--
-------------------------------------
Roger Hyam
Technical Architect
Taxonomic Databases Working Group
-------------------------------------
http://www.tdwg.org
roger@tdwg.org
+44 1578 722782
-------------------------------------