[tdwg-tag] SourceForge LSID project websites broken - role for TDWG?
r.page at bio.gla.ac.uk
Tue Apr 7 17:19:27 CEST 2009
A few random comments:
> InstitutionCode/CollectionCode/CatalogueNumber triple and to the
> three main substitutable elements in an LSID. Some systems such as
> DOI may obscure the whoGeneratedTheData
This assumes that it's good to have lots of metadata embedded in the
identifier. This level of "branding" might be fine for specimens
(assuming each data provider has the ability to serve their own data),
but what about shared identifiers such as taxon names -- I suspect
having to "choose a brand" is going to be an obstacle to adoption for
just the identifiers that we most need to share. Identifiers such as
DOIs have less branding (although publishers have managed to attach
branding significance to the few digits after the "10." prefix).
Note also that DOIs (and Handles) can be queried for metadata, see
Tony Hammond's OpenHandle project (http://www.crossref.org/CrossTech/2008/10/the_last_mile.html
and http://code.google.com/p/openhandle/), so we don't need to embed
this in the actual identifier itself.
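
To illustrate how little a DOI carries in the identifier itself, here is a minimal Python sketch that splits one into its registrant prefix and its opaque suffix. The function name split_doi and the sample value are only for illustration; anything beyond these two parts would have to come from a metadata lookup rather than from the string.

```python
def split_doi(doi):
    """Split a DOI into (registrant prefix, suffix).

    The prefix is the "10.xxxx" part that publishers treat as a brand;
    the suffix is opaque and says nothing about who holds the data.
    """
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


# Sample value, used purely as an example.
prefix, suffix = split_doi("10.1000/182")
print(prefix, suffix)
```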
> 1. The "urn:lsid:" part of the identifier serves as a clear
> statement of intent which is not present with an HTTP URI. We could
> mandate that ONLY http://purl.tdwg.org/ URIs count as GUIDs in our
> domain and that e.g. http://www.csiro.au/ URIs cannot do
Yes, but intent matters little unless backed up by actual services.
> You are also correct that the big issue with this is the question of
> ownership. Quite frankly, if we had believed in 2006 that
> institutions would be prepared to cede responsibility for handling
> their identifiers to a third party, the recommendations from the
> TDWG workshops would probably have been rather different. Part of
> the reason for adopting LSIDs was because institutions did not seem
> to want to use an identifier which might imply that a third-party
> was responsible for the data.
If commercial rivals usually at each other's throats (e.g., publishers)
can get over these issues and form CrossRef, surely biodiversity
providers can get over this issue. That we can't suggests that the
field hasn't bought into the idea of global identifiers and sharing yet.
> I have said in the past "If persistence is important to you then keep
> your own copy." This is how it has worked for 100s of years in the
> library community. If the reason for having a centralised resolution
> mechanism is to try and support persistence then the centralised
> service should actually cache metadata (not data). I would imagine a
> scalable infrastructure would be quite simple to implement. Data could
> be stored in a Lucene index or Hadoop cluster or something. It would
> only be a very large hash table and only keep the latest version of
> the RDF.
This sounds a lot like CrossRef to me. Cache the metadata and provide
services on top. Deja vu all over again.
> No normal person is going to read this or type it in. I am afraid that
> when people started using UUIDs in LSIDs it blew the sociological
> argument for LSIDs out of the water for me. I had carefully designed
> BCI identifiers to be human readable and writable like this
Yep. Plus the irony of having a globally unique identifier (the UUID)
as part of another globally unique identifier (LSID), which is then
part of another identifier (the HTTP proxied version of the LSID).
We're not making things easy for ourselves.
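
The nesting can be made concrete with a small Python sketch that unwraps a proxied LSID layer by layer. It assumes the usual urn:lsid:authority:namespace:object[:revision] layout; the proxy base URL, the authority, and the UUID below are hypothetical values chosen for the example.

```python
import uuid

# Hypothetical proxy base, used only for illustration.
PROXY_BASE = "http://lsid.example.org/"


def parse_lsid(lsid):
    """Split urn:lsid:authority:namespace:object[:revision] into its fields."""
    parts = lsid.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def identifier_layers(proxied_url):
    """Return the identifiers nested in a proxied LSID, outermost first."""
    if not proxied_url.startswith(PROXY_BASE):
        raise ValueError("not a URL under the assumed proxy")
    lsid = proxied_url[len(PROXY_BASE):]
    layers = [proxied_url, lsid]
    obj = parse_lsid(lsid)["object"]
    try:
        # Innermost layer, present only if the object id is itself a UUID.
        layers.append(str(uuid.UUID(obj)))
    except ValueError:
        pass  # object id is not a UUID, so there are only two layers
    return layers


example = (PROXY_BASE
           + "urn:lsid:example.org:specimens:"
           + "123e4567-e89b-12d3-a456-426614174000")
for layer in identifier_layers(example):
    print(layer)
```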
So, for the sake of a straw man, why don't we:
1. Use DOIs/Handles, assigned by a central agency
2. Provide a central set of services running on top of these
identifiers, modelled upon CrossRef but specific to our data types.
Among the services are an HTTP proxy that supports 303 redirects (a la
3. The central service monitors data availability and has a "league
table" of performance (or some related measure of data quality). It
has a central cache to ensure data consumers are minimally affected if
a provider goes offline.
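
A rough Python sketch of how points 2 and 3 might fit together, under stated assumptions: the class CentralResolver, its field names, and the example identifier and URL are all hypothetical, and the availability check is passed in as a function rather than done over the network. It is not a description of how CrossRef or any Handle service is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class CentralResolver:
    registry: dict = field(default_factory=dict)  # identifier -> provider URL
    cache: dict = field(default_factory=dict)     # identifier -> last known metadata
    checks: dict = field(default_factory=dict)    # provider URL -> [successes, attempts]

    def record_check(self, url, ok):
        """Update the availability counts behind the league table."""
        stats = self.checks.setdefault(url, [0, 0])
        stats[0] += int(ok)
        stats[1] += 1

    def resolve(self, identifier, provider_is_up):
        """Return (status, headers, body) for a lookup of identifier."""
        url = self.registry.get(identifier)
        if url is None:
            return 404, {}, None
        ok = provider_is_up(url)
        self.record_check(url, ok)
        if ok:
            # Normal case: 303 redirect to the provider's own copy.
            return 303, {"Location": url}, None
        if identifier in self.cache:
            # Provider offline: serve the centrally cached metadata.
            return 200, {}, self.cache[identifier]
        return 503, {}, None

    def league_table(self):
        """Providers ranked by fraction of successful checks, best first."""
        return sorted(
            ((ok / n, url) for url, (ok, n) in self.checks.items()),
            reverse=True,
        )


resolver = CentralResolver(
    registry={"hdl:12345/abc": "http://provider.example.org/abc"},
    cache={"hdl:12345/abc": {"title": "cached record"}},
)
print(resolver.resolve("hdl:12345/abc", lambda url: True))
print(resolver.resolve("hdl:12345/abc", lambda url: False))
print(resolver.league_table())
```

The point of the sketch is that the redirect, the cache, and the league table all hang off one lookup, so the consumer sees the same identifier whether or not the provider is up.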
If we are wedded to HTTP then LSIDs don't make much sense. If we have
concerns about HTTP-based identifiers, then why not use a system that
has already proved itself (DOI/Handle)? Surely we need a better
argument than the "Concorde fallacy" that we've invested so much
effort in LSIDs so far it's too late to stop...
Professor of Taxonomy
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK
Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962 at aim.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html