A few random comments:
Donald wrote:
> InstitutionCode/CollectionCode/CatalogueNumber triple and to the three main substitutable elements in an LSID. Some systems such as DOI may obscure the whoGeneratedTheData
This assumes that it's good to have lots of metadata embedded in the identifier. This level of "branding" might be fine for specimens (assuming each data provider has the ability to serve their own data), but what about shared identifiers such as taxon names -- I suspect having to "choose a brand" is going to be an obstacle to adoption for just the identifiers that we most need to share. Identifiers such as DOIs have less branding (although publishers have managed to attach branding significance to the few digits after the "10." prefix).
Note also that DOIs (and Handles) can be queried for metadata, see Tony Hammond's OpenHandle project (http://www.crossref.org/CrossTech/2008/10/the_last_mile.html and http://code.google.com/p/openhandle/), so we don't need to embed this in the actual identifier itself.
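To make the point about the "10." prefix concrete, here is a minimal sketch of how little a DOI carries in its own syntax: a directory indicator ("10"), a registrant code, and an opaque suffix. The function name and the example DOI are made up for illustration; only the prefix/suffix layout comes from the DOI syntax itself.

```python
def split_doi(doi):
    """Split a DOI into directory indicator, registrant code, and suffix.

    Only the registrant code says anything about who registered the
    identifier; everything else about the object lives in the metadata
    held by the resolver.
    """
    prefix, slash, suffix = doi.partition("/")
    if not slash or not suffix:
        raise ValueError(f"not a DOI (no suffix): {doi!r}")
    directory, dot, registrant = prefix.partition(".")
    if directory != "10" or not dot or not registrant:
        raise ValueError(f"not a DOI (bad prefix): {doi!r}")
    return {"directory": directory, "registrant": registrant, "suffix": suffix}


# Hypothetical DOI, used only to show the three parts.
print(split_doi("10.1234/abc-123"))
```

The registrant code is the only place a "brand" can hide, which is why the few digits after "10." are all a publisher has to attach significance to.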
> - The "urn:lsid:" part of the identifier serves as a clear statement of intent which is not present with an HTTP URI. We could mandate that ONLY http://purl.tdwg.org/ URIs count as GUIDs in our domain and that e.g. http://www.csiro.au/ URIs cannot do
Yes, but intent matters little unless backed up by actual services.
> You are also correct that the big issue with this is the question of ownership. Quite frankly, if we had believed in 2006 that institutions would be prepared to cede responsibility for handling their identifiers to a third party, the recommendations from the TDWG workshops would probably have been rather different. Part of the reason for adopting LSIDs was because institutions did not seem to want to use an identifier which might imply that a third-party was responsible for the data.
If commercial rivals usually at each other's throats (e.g., publishers) can get over these issues and form CrossRef, surely biodiversity providers can get over this issue. That we can't suggests that the field hasn't bought into the idea of global identifiers and sharing yet.
Roger wrote:
> I have said in the past "If persistence is important to you then keep your own copy." This is how it has worked for 100s of years in the library community. If the reason for having a centralised resolution mechanism is to try and support persistence then the centralised service should actually cache metadata (not data). I would imagine a scalable infrastructure would be quite simple to implement. Data could be stored in a Lucene index or Hadoop cluster or something. It would only be a very large hash table and only keep the latest version of the RDF.
This sounds a lot like CrossRef to me. Cache the metadata and provide services on top. Deja vu all over again.
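The "very large hash table" that keeps only the latest RDF could be sketched roughly as below. This is an in-memory toy, not a proposal for the real storage layer (Lucene, Hadoop, or anything else); the class and method names, the integer version numbers, and the example identifier are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    version: int  # assumed: providers supply a monotonically increasing version
    rdf: str      # the serialised metadata, stored as-is


class MetadataCache:
    """Identifier -> latest metadata only; older versions are discarded."""

    def __init__(self):
        self._records = {}

    def put(self, identifier, version, rdf):
        """Store the record unless we already hold the same or a newer version."""
        current = self._records.get(identifier)
        if current is not None and current.version >= version:
            return False
        self._records[identifier] = CachedRecord(version, rdf)
        return True

    def get(self, identifier):
        """Return the cached RDF, or None if the identifier is unknown."""
        record = self._records.get(identifier)
        return record.rdf if record is not None else None


cache = MetadataCache()
cache.put("10.9999/specimen-1", 1, "<rdf:RDF><!-- first harvest --></rdf:RDF>")
cache.put("10.9999/specimen-1", 2, "<rdf:RDF><!-- second harvest --></rdf:RDF>")
print(cache.get("10.9999/specimen-1"))
```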
> No normal person is going to read this or type it in. I am afraid that when people started using UUIDs in LSIDs it blew the sociological argument for LSIDs out of the water for me. I had carefully designed BCI identifiers to be human readable and writable like this
Yep. Plus the irony of having a globally unique identifier (the UUID) as part of another globally unique identifier (LSID), which is then part of another identifier (the HTTP proxied version of the LSID). We're not making things easy for ourselves.
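The nesting is easy to see if you peel one of these apart. A rough sketch, assuming the usual urn:lsid:authority:namespace:object layout; the proxy host, authority, and namespace below are placeholders, and the UUID is just a well-known example value, not a real record.

```python
import uuid
from urllib.parse import urlparse


def unwrap(proxied_url):
    """Peel an HTTP-proxied LSID back to the UUID at its core."""
    # Layer 1: the HTTP URL. The LSID is everything after the proxy host.
    lsid = urlparse(proxied_url).path.lstrip("/")

    # Layer 2: the LSID itself, urn:lsid:authority:namespace:object[:revision].
    parts = lsid.split(":")
    if len(parts) < 5 or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"no LSID found in {proxied_url!r}")
    authority, namespace, object_id = parts[2:5]

    # Layer 3: the object part, here assumed to be a UUID (raises if it is not).
    return {
        "url": proxied_url,
        "lsid": lsid,
        "authority": authority,
        "namespace": namespace,
        "uuid": uuid.UUID(object_id),
    }


# Placeholder proxy and authority; three identifiers for one object.
layers = unwrap(
    "http://proxy.example.org/urn:lsid:example.org:names:"
    "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
)
for name in ("url", "lsid", "uuid"):
    print(name, "->", layers[name])
```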
So, for the sake of a straw man, why don't we:
1. Use DOIs/Handles, assigned by a central agency
2. Provide a central set of services running on top of these identifiers, modelled upon CrossRef but specific to our data types. Among the services are an HTTP proxy that supports 303 redirects (a la linked data)
3. The central service monitors data availability and has a "league table" of performance (or some related measure of data quality). It has a central cache to ensure data consumers are minimally affected if a provider goes offline.
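Points 2 and 3 could be sketched along the following lines: the proxy answers with a 303 redirect while the provider is up, falls back to the central cache when it is not, and keeps a tally that feeds the league table. Everything here is hypothetical (class name, registry layout, the injected availability check, the choice of 503 when nothing is cached); it is a straw man in code, not a design.

```python
class CentralResolver:
    """Toy central proxy: 303 to the provider, cached copy if it is offline."""

    def __init__(self, registry, cache, is_online):
        self.registry = registry    # identifier -> (provider name, provider URL)
        self.cache = cache          # identifier -> cached metadata (e.g. RDF)
        self.is_online = is_online  # callable: provider URL -> bool
        self.checks = {}            # provider name -> [successes, attempts]

    def resolve(self, identifier):
        """Return (status, headers, body) for a request on this identifier."""
        entry = self.registry.get(identifier)
        if entry is None:
            return 404, {}, ""
        provider, url = entry

        # Every resolution doubles as an availability check for the league table.
        up = self.is_online(url)
        tally = self.checks.setdefault(provider, [0, 0])
        tally[0] += int(up)
        tally[1] += 1

        if up:
            # Linked-data style: 303 See Other, pointing at the provider's copy.
            return 303, {"Location": url}, ""
        if identifier in self.cache:
            # Provider offline: serve the centrally cached metadata instead.
            return 200, {"Content-Type": "application/rdf+xml"}, self.cache[identifier]
        return 503, {}, ""

    def league_table(self):
        """Providers ranked by the fraction of checks that found them online."""
        rows = [(name, ok / total) for name, (ok, total) in self.checks.items()]
        return sorted(rows, key=lambda row: row[1], reverse=True)
```

In a real service the availability check would be a scheduled probe rather than a side effect of resolution, but the shape of the data is the same.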
If we are wedded to HTTP then LSIDs don't make much sense. If we have concerns about HTTP-based identifiers, then why not use a system that has already proved itself (DOI/Handle)? Surely we need a better argument than the "Concorde fallacy" that we've invested so much effort in LSIDs so far it's too late to stop...
Regards
Rod
---------------------------------------------------------
Roderic Page
Professor of Taxonomy
DEEB, FBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page@bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962@aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html