[tdwg-tag] SourceForge LSID project websites broken - role for TDWG?

Roderic Page r.page at bio.gla.ac.uk
Tue Apr 7 17:19:27 CEST 2009

A few random comments:

Donald wrote:

> InstitutionCode/CollectionCode/CatalogueNumber triple and to the  
> three main substitutable elements in an LSID.  Some systems such as  
> DOI may obscure the whoGeneratedTheData

This assumes that it's good to have lots of metadata embedded in the  
identifier. This level of "branding" might be fine for specimens  
(assuming each data provider has the ability to serve their own data),  
but what about shared identifiers such as taxon names -- I suspect  
having to "choose a brand" is going to be an obstacle to adoption for  
just the identifiers that we most need to share. Identifiers such as  
DOIs have less branding (although publishers have managed to attach  
branding significance to the few digits after the "10." prefix).
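To make Donald's mapping concrete, an LSID can be split into its authority, namespace and object parts. For specimens these would typically carry something like the institution, collection and catalogue number. The sketch below is minimal and assumes the `urn:lsid:authority:namespace:object[:revision]` layout. The `LSID` class, the `parse_lsid` function and the example identifier are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LSID:
    authority: str                  # e.g. the institution's domain name
    namespace: str                  # e.g. a collection within that institution
    object_id: str                  # e.g. a catalogue number
    revision: Optional[str] = None  # optional fifth component


def parse_lsid(lsid: str) -> LSID:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = lsid.split(":")
    # Expect "urn", "lsid", then three or four further components.
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)
```

For example, `parse_lsid("urn:lsid:example.org:herbarium:12345")` yields authority `example.org`, namespace `herbarium` and object `12345`. All the "branding" lives in the identifier string itself.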

Note also that DOIs (and Handles) can be queried for metadata; see
Tony Hammond's OpenHandle project (http://www.crossref.org/CrossTech/2008/10/the_last_mile.html
and http://code.google.com/p/openhandle/), so we don't need to embed
this in the identifier itself.
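The following is a rough illustration of asking a Handle for its metadata instead of reading that metadata out of the identifier string. It assumes the JSON interface that the hdl.handle.net proxy exposes under `/api/handles/`, which should be checked against the current Handle.net documentation. That interface is not OpenHandle's own format. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the Handle.net proxy serves handle records as JSON at this path.
HANDLE_API = "https://hdl.handle.net/api/handles/"


def handle_api_url(handle: str) -> str:
    """Build the lookup URL for a DOI or Handle such as '10.1000/1'."""
    # Keep the '/' between prefix and suffix; percent-escape anything unusual.
    return HANDLE_API + quote(handle, safe="/")


def values_by_type(record: dict) -> dict:
    """Group the typed values (URL, EMAIL, ...) found in a handle record."""
    out: dict = {}
    for v in record.get("values", []):
        # Some types (e.g. HS_ADMIN) may carry a structured value, not a string.
        out.setdefault(v["type"], []).append(v["data"]["value"])
    return out


def fetch_handle_record(handle: str) -> dict:
    """Network call: fetch and decode the JSON record for one handle."""
    with urlopen(handle_api_url(handle)) as resp:
        return json.load(resp)
```

With network access, `values_by_type(fetch_handle_record("10.1000/1")).get("URL")` would list the registered target URLs. The identifier stays opaque while the metadata comes from a service.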

> 1. The "urn:lsid:" part of the identifier serves as a clear  
> statement of intent which is not present with an HTTP URI.  We could  
> mandate that ONLY http://purl.tdwg.org/ URIs count as GUIDs in our  
> domain and that e.g. http://www.csiro.au/ URIs cannot do

Yes, but intent matters little unless backed up by actual services.

> You are also correct that the big issue with this is the question of  
> ownership.  Quite frankly, if we had believed in 2006 that  
> institutions would be prepared to cede responsibility for handling  
> their identifiers to a third party, the recommendations from the  
> TDWG workshops would probably have been rather different.  Part of  
> the reason for adopting LSIDs was because institutions did not seem  
> to want to use an identifier which might imply that a third-party  
> was responsible for the data.

If commercial rivals who are usually at each other's throats (e.g., publishers)
can get over these issues and form CrossRef, surely biodiversity  
providers can get over this issue. That we can't suggests that the  
field hasn't bought into the idea of global identifiers and sharing yet.

Roger wrote:

> I have said in the past "If persistence is important to you then keep
> your own copy." This is how it has worked for 100s of years in the
> library community. If the reason for having a centralised resolution
> mechanism is to try and support persistence then the centralised
> service should actually cache metadata (not data). I would imagine a
> scalable infrastructure would be quite simple to implement. Data could
> be stored in a Lucene index or Hadoop cluster or something. It would
> only be a very large hash table and only keep the latest version of
> the RDF.

This sounds a lot like CrossRef to me. Cache the metadata and provide  
services on top. Deja vu all over again.
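Here is a minimal sketch of the cache Roger describes. A plain in-memory dictionary stands in for a Lucene index or Hadoop cluster. It is one hash table keyed by identifier and holds only the latest RDF harvested for each. `MetadataCache` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedRecord:
    rdf: str           # serialized RDF metadata for the identifier
    fetched_at: float  # when it was harvested (seconds since the epoch)


class MetadataCache:
    """Central metadata cache: one large hash table, latest version only."""

    def __init__(self) -> None:
        self._table: Dict[str, CachedRecord] = {}

    def store(self, identifier: str, rdf: str, fetched_at: float) -> None:
        current = self._table.get(identifier)
        # Keep only the newest harvest; an older copy arriving late is ignored.
        if current is None or fetched_at >= current.fetched_at:
            self._table[identifier] = CachedRecord(rdf, fetched_at)

    def lookup(self, identifier: str) -> Optional[str]:
        record = self._table.get(identifier)
        return record.rdf if record else None

    def __len__(self) -> int:
        return len(self._table)
```

Because only the latest version is kept, storage grows with the number of identifiers, not with the number of harvests.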

> No normal person is going to read this or type it in. I am afraid that
> when people started using UUIDs in LSIDs it blew the sociological
> argument for LSIDs out of the water for me. I had carefully designed
> BCI identifiers to be human readable and writable like this

Yep. Plus the irony of having a globally unique identifier (the UUID)  
as part of another globally unique identifier (LSID), which is then  
part of another identifier (the HTTP proxied version of the LSID).  
We're not making things easy for ourselves.
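This sketch shows the nesting concretely. It mints a UUID, wraps it in an LSID, and then wraps that LSID in an HTTP-proxied form. The authority, namespace and proxy base URL are hypothetical placeholders, not real services.

```python
import uuid

# Hypothetical values for illustration only.
AUTHORITY = "example.org"
NAMESPACE = "specimens"
PROXY = "http://lsid.example.org/"  # stand-in for an HTTP resolver that proxies LSIDs


def mint_nested_identifier(object_uuid: uuid.UUID) -> dict:
    """Return the three layers: UUID, LSID containing it, HTTP URI containing that."""
    lsid = f"urn:lsid:{AUTHORITY}:{NAMESPACE}:{object_uuid}"
    return {"uuid": str(object_uuid), "lsid": lsid, "http": PROXY + lsid}


if __name__ == "__main__":
    for layer, value in mint_nested_identifier(uuid.uuid4()).items():
        print(f"{layer:>4}: {value}")
```

Each layer is a globally unique identifier in its own right. The outermost one is the only form a browser can actually follow.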

So, for the sake of a straw man, why don't we:

1. Use DOIs/Handles, assigned by a central agency

2. Provide a central set of services running on top of these
identifiers, modelled upon CrossRef but specific to our data types.
Among these services would be an HTTP proxy that supports 303 redirects
(à la Linked Data)

3. The central service monitors data availability and has a "league  
table" of performance (or some related measure of data quality). It  
has a central cache to ensure data consumers are minimally affected if  
a provider goes offline.
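A rough sketch of points 2 and 3 follows, under several assumptions. Identifiers are routed to providers by prefix, as with DOI/Handle prefixes. Provider health comes from an injected `is_up` check. `CentralResolver`, `Provider` and the example URLs are all hypothetical. The resolver answers with a 303 See Other when the provider is reachable and falls back to the central cache when it is not. It also counts successes to build the league table.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Provider:
    base_url: str
    checks: int = 0
    successes: int = 0

    @property
    def availability(self) -> float:
        return self.successes / self.checks if self.checks else 0.0


@dataclass
class CentralResolver:
    providers: Dict[str, Provider]   # identifier prefix -> provider serving it
    is_up: Callable[[str], bool]     # injected health check on a provider URL
    cache: Dict[str, str] = field(default_factory=dict)  # identifier -> cached RDF

    def resolve(self, identifier: str) -> Tuple[int, Dict[str, str], Optional[str]]:
        """Return (HTTP status, headers, body) for a request about `identifier`."""
        prefix = identifier.split("/", 1)[0]
        provider = self.providers.get(prefix)
        if provider is None:
            return 404, {}, None
        provider.checks += 1
        if self.is_up(provider.base_url):
            provider.successes += 1
            # Linked-data style: 303 See Other points at a document about the thing.
            return 303, {"Location": provider.base_url + identifier}, None
        if identifier in self.cache:
            # Provider offline: serve the centrally cached metadata instead.
            return 200, {"Content-Type": "application/rdf+xml"}, self.cache[identifier]
        return 503, {}, None

    def league_table(self) -> List[Tuple[float, str]]:
        """Providers ranked by observed availability, best first."""
        return sorted(((p.availability, name) for name, p in self.providers.items()),
                      reverse=True)
```

Counting availability per request is the simplest measure one could choose. A real service would more likely probe providers on a schedule and could fold in other data-quality signals.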

If we are wedded to HTTP then LSIDs don't make much sense. If we have  
concerns about HTTP-based identifiers, then why not use a system that  
has already proved itself (DOI/Handle)? Surely we need a better
argument than the "Concorde fallacy": that we've invested so much
effort in LSIDs that it's too late to stop now...



Roderic Page
Professor of Taxonomy
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962 at aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
