Re: [tdwg-tag] SourceForge LSID project websites broken - role for TDWG?

7 Apr 2009

A few random comments:

Donald wrote:
...
InstitutionCode/CollectionCode/CatalogueNumber triple and to the  
three main substitutable elements in an LSID.  Some systems such as  
DOI may obscure the whoGeneratedTheData
This assumes that it's good to have lots of metadata embedded in the  
identifier. This level of "branding" might be fine for specimens  
(assuming each data provider has the ability to serve their own data),  
but what about shared identifiers such as taxon names -- I suspect  
having to "choose a brand" is going to be an obstacle to adoption for  
just the identifiers that we most need to share. Identifiers such as  
DOIs have less branding (although publishers have managed to attach  
branding significance to the few digits after the "10." prefix).

Note also that DOIs (and Handles) can be queried for metadata; see  
Tony Hammond's OpenHandle project  
(http://www.crossref.org/CrossTech/2008/10/the_last_mile.html and  
http://code.google.com/p/openhandle/), so we don't need to embed  
this in the actual identifier itself.
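
To make that concrete, here is a minimal sketch of how little a DOI itself carries: a registrant prefix and an opaque suffix, with everything else looked up from the resolver. It assumes the Handle proxy offers a REST-style lookup at https://hdl.handle.net/api/handles/<handle>; that endpoint, the function names, and the DOI 10.9999/example.123 are illustrative assumptions, not something taken from OpenHandle or from this thread.

```python
from urllib.parse import quote

# Assumed endpoint of the Handle proxy's REST-style lookup (not from the thread).
HANDLE_API = "https://hdl.handle.net/api/handles/"


def split_doi(doi):
    """Split a DOI into its registrant prefix and opaque suffix."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def metadata_url(doi):
    """Build the URL a client would fetch to ask the resolver about a DOI."""
    prefix, suffix = split_doi(doi)
    # Keep the slash between prefix and suffix; escape anything unusual in the suffix.
    return HANDLE_API + prefix + "/" + quote(suffix, safe="")


if __name__ == "__main__":
    # Hypothetical DOI, used only to show the shape of the identifier.
    print(split_doi("10.9999/example.123"))
    print(metadata_url("10.9999/example.123"))
```

The only "branding" visible to the client is the prefix; who generated the data, and where it lives now, comes back in the response rather than being baked into the string.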
...
1. The "urn:lsid:" part of the identifier serves as a clear  
statement of intent which is not present with an HTTP URI.  We could  
mandate that ONLY http://purl.tdwg.org/ URIs count as GUIDs in our  
domain and that e.g. http://www.csiro.au/ URIs cannot do
Yes, but intent matters little unless backed up by actual services.
...
You are also correct that the big issue with this is the question of  
ownership.  Quite frankly, if we had believed in 2006 that  
institutions would be prepared to cede responsibility for handling  
their identifiers to a third party, the recommendations from the  
TDWG workshops would probably have been rather different.  Part of  
the reason for adopting LSIDs was because institutions did not seem  
to want to use an identifier which might imply that a third-party  
was responsible for the data.
If commercial rivals who are usually at each other's throats (e.g.,  
publishers) can get over these issues and form CrossRef, surely  
biodiversity providers can do the same. That we can't suggests that the  
field hasn't yet bought into the idea of global identifiers and sharing.

Roger wrote:
...
I have said in the past "If persistence is important to you then keep
your own copy." This is how it has worked for 100s of years in the
library community. If the reason for having a centralised resolution
mechanism is to try and support persistence then the centralised
service should actually cache metadata (not data). I would imagine a
scalable infrastructure would be quite simple to implement. Data could
be stored in a Lucene index or Hadoop cluster or something. It would
only be a very large hash table and only keep the latest version of
the RDF.
This sounds a lot like CrossRef to me. Cache the metadata and provide  
services on top. Deja vu all over again.
...
No normal person is going to read this or type it in. I am afraid that
when people started using UUIDs in LSIDs it blew the sociological
argument for LSIDs out of the water for me. I had carefully designed
BCI identifiers to be human readable and writable like this
Yep. Plus the irony of having a globally unique identifier (the UUID)  
as part of another globally unique identifier (LSID), which is then  
part of another identifier (the HTTP proxied version of the LSID).  
We're not making things easy for ourselves.
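
The nesting is easy to see if you pull such an identifier apart. The sketch below assumes the usual urn:lsid:authority:namespace:object[:revision] layout; the example LSID, the proxy prefix http://lsid.tdwg.org/, and the function names are assumptions chosen for illustration only.

```python
import uuid

# Assumed HTTP proxy prefix; any LSID proxy would be used the same way.
PROXY = "http://lsid.tdwg.org/"


def parse_lsid(lsid):
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def object_is_uuid(lsid):
    """True if the object part is itself a globally unique identifier (a UUID)."""
    obj = parse_lsid(lsid)["object"]
    try:
        uuid.UUID(obj)
        return True
    except ValueError:
        return False


def proxied(lsid):
    """Wrap the LSID in an HTTP URI, giving the third layer of identifier."""
    return PROXY + lsid


if __name__ == "__main__":
    # Hypothetical LSID whose object part is a UUID.
    example = "urn:lsid:example.org:names:550e8400-e29b-41d4-a716-446655440000"
    print(parse_lsid(example))
    print(object_is_uuid(example))
    print(proxied(example))
```

Three layers, each claiming global uniqueness, and only the innermost one is doing any real work.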

So, for the sake of a straw man, why don't we:

1. Use DOIs/Handles, assigned by a central agency

2. Provide a central set of services running on top of these  
identifiers, modelled upon CrossRef but specific to our data types.  
Among the services are an HTTP proxy that supports 303 redirects (a la  
linked data)

3. The central service monitors data availability and has a "league  
table" of performance (or some related measure of data quality). It  
has a central cache to ensure data consumers are minimally affected if  
a provider goes offline.
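
As a rough sketch of how points 2 and 3 fit together, here is an in-memory model of such a central service: it answers with a 303 redirect to the provider when the provider is up, falls back to its cached metadata when it is not, and keeps a league table of availability. Every class, method, and identifier name here is hypothetical, and a real service would sit behind an HTTP server and probe providers itself rather than being told whether they are online.

```python
from dataclasses import dataclass, field


@dataclass
class CentralResolver:
    """In-memory model of the proposed central service (all names hypothetical)."""
    registry: dict = field(default_factory=dict)  # identifier -> provider URL
    cache: dict = field(default_factory=dict)     # identifier -> last metadata seen
    checks: dict = field(default_factory=dict)    # provider URL -> [successes, attempts]

    def register(self, identifier, provider_url, metadata):
        """Record where an identifier lives and cache its current metadata."""
        self.registry[identifier] = provider_url
        self.cache[identifier] = metadata

    def record_check(self, provider_url, ok):
        """Record the outcome of one availability probe for a provider."""
        stats = self.checks.setdefault(provider_url, [0, 0])
        stats[0] += 1 if ok else 0
        stats[1] += 1

    def league_table(self):
        """Providers ranked by fraction of successful probes, best first."""
        ranked = [(url, ok / total) for url, (ok, total) in self.checks.items()]
        return sorted(ranked, key=lambda row: row[1], reverse=True)

    def resolve(self, identifier, provider_online):
        """Return (status, payload): 303 to the provider, or the cached metadata."""
        if identifier not in self.registry:
            return 404, None
        if provider_online:
            return 303, self.registry[identifier]
        return 200, self.cache[identifier]


if __name__ == "__main__":
    service = CentralResolver()
    service.register("10.9999/abc", "http://provider.example/abc", {"title": "x"})
    print(service.resolve("10.9999/abc", provider_online=True))
    print(service.resolve("10.9999/abc", provider_online=False))
```

The point of the fallback branch is that a consumer still gets an answer when the provider is down, which is the persistence argument for caching metadata centrally.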

If we are wedded to HTTP then LSIDs don't make much sense. If we have  
concerns about HTTP-based identifiers, then why not use a system that  
has already proved itself (DOI/Handle)? Surely we need a better  
argument than the "Concorde fallacy" that we've invested so much  
effort in LSIDs so far that it's too late to stop...

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
DEEB, FBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page@bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962@aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html