A few comments on semantic opacity...
1. My examples ("urn:lsid:csiro.tdwg.org:anic:12345") were deliberately transparent (or at least translucent) to make it easier to follow the example, but I would have no real problem with them having a form more like "urn:lsid:bio-id.org:9876:12345".
2. I think a single-minded drive towards semantic opacity would be as quixotic and self-destructive as anything we could do. UUIDs are nicely opaque, and we could build a DOI-like system which maps individual UUIDs to their current locations. Such an approach would be painful and an administrative nightmare. I also suspect that such opaque identifiers would be resisted by most users. If we step away from such a pure implementation, the alternatives all embed some kind of semantic cue which makes the system operate better. The form of a DOI encodes relevant data on the source of the object. PURLs and LSIDs do the same. The point with semantic opacity in the LSID specification is that it is not possible for a client to make inferences about the location of data based on the subelements within the LSID. It is up to the resolver implementations to determine how to return the data. Once this point is accepted, I would in fact say that the presence of some semantic clues within the identifier text is a good thing. The clues may for various reasons no longer conform to the reality of how the metadata are managed, but a user may still rapidly glean some indication of whether an identifier is worth resolving (it may indicate that it relates to a nomenclatural record, or that it, at least originally, was minted by some respected source). I see such clues as having the same kind of value which has enabled Linnaean nomenclature to persist so long. My preference for LSIDs would therefore be for them to be like the ones Roger minted for BCI.
3. I also note that this discussion has suggested remarkable near-unanimity from many people in their distaste for LSIDs. However, I fear that the level of agreement would be little higher if we were discussing DOIs or PURLs. Some of the objections have been that LSIDs do not fit well with the key technologies of the semantic web and that something more like PURLs would be the right course to follow. Other objections have related to the semantic near-transparency of many LSIDs or the absence of strongly centralised support, with the implication that something more like DOIs would be better. Both arguments have value, but they point in different directions. The various identifier schemes make up a landscape within which no identifier scheme represents an adaptive peak in all contexts. We need to develop applicability statements for how to use several of these schemes as alternatives for biodiversity data, and we need to identify the drivers which may guide different providers to different schemes for different purposes.
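To make the opacity point in comment 2 concrete, here is a minimal Python sketch. It assumes only the general urn:lsid:authority:namespace:object[:revision] layout; the names Lsid and parse_lsid are invented for illustration and do not come from any LSID library, and the parsing is deliberately simplified (no escaping or case rules). It shows that the "translucent" and the more opaque example identifiers are structurally identical to a client, which can split them into components but has no basis for inferring where the data live.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Hypothetical container for the components of an LSID."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID into its components without interpreting them.

    Simplified sketch: assumes colon-separated fields and ignores
    escaping and case-sensitivity rules.
    """
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    # The revision field is optional; treat an empty one as absent.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(authority, namespace, object_id, revision)


# Both example identifiers from comment 1 parse the same way. A human may
# read "anic" as a hint about the collection, but the client code treats
# every component as an opaque string and leaves location to the resolver.
translucent = parse_lsid("urn:lsid:csiro.tdwg.org:anic:12345")
opaque = parse_lsid("urn:lsid:bio-id.org:9876:12345")
print(translucent)
print(opaque)
```

The semantic cue is there for the human reader only; nothing in the client logic depends on it.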
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Entomology, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352
Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
-----Original Message-----
From: Bob Morris [mailto:morris.bob@gmail.com]
Sent: Wednesday, 8 April 2009 2:04 AM
To: Roderic Page
Cc: Hobern, Donald (Entomology, Black Mountain); Roger Hyam; tdwg-tag@lists.tdwg.org
Subject: Re: [tdwg-tag] SourceForge LSID project websites broken - role for TDWG?
A few non-random comments on Rod's random comments on Donald's proposal.
On Tue, Apr 7, 2009 at 11:19 AM, Roderic Page <r.page@bio.gla.ac.uk> wrote:
A few random comments:
Donald wrote:
InstitutionCode/CollectionCode/CatalogueNumber triple and to the three main substitutable elements in an LSID. Some systems such as DOI may obscure the whoGeneratedTheData
Rod responded:
This assumes that it's good to have lots of metadata embedded in the identifier. This level of "branding" might be fine for specimens (assuming each data provider has the ability to serve their own data), but what about shared identifiers such as taxon names -- I suspect having to "choose a brand" is going to be an obstacle to adoption for just the identifiers that we most need to share. Identifiers such as DOIs have less branding (although publishers have managed to attach branding significance to the few digits after the "10." prefix).
Bob cites: "LSIDs are intended to be semantically opaque, in that the LSID assigned to a resource should not be counted on to describe the characteristics or attributes of the resource that the LSID refers to. The users of the LSIDs are permitted to use individual components (as specified elsewhere in this document) of LSIDs - although the LSID component parts themselves should be treated as opaque pieces of the identifier." LSID spec, Section 8.
It's regrettable that the LSID spec is so poorly written that it permits the useless term "should". Alas, I suppose that leaves room for argument with my position that LSIDs with embedded metadata are not LSIDs--they are something else based on the LSID syntax. There's nothing inherently wrong with, oh, say, a Handles implementation based on prefacing LSID syntax with something controlled. See below.
Rod remarks:
Note also that DOIs (and Handles) can be queried for metadata, see Tony Hammond's OpenHandle project (http://www.crossref.org/CrossTech/2008/10/the_last_mile.html and http://code.google.com/p/openhandle/), so we don't need to embed this in the actual identifier itself.
Bob replies: DOIs \are/ Handles. This is the (unstated?) reason that http://wiki.tdwg.org/twiki/bin/view/GUID/TechnologyComparison is filled with comparisons of the form "DOI: Same as Handles".
DOI is an implementation of Handles, with the additional treatment of things about which Handles is silent. See http://www.doi.org/factsheets/DOIHandle.html. When I read that document casually, I come to the initial conclusion that Donald's proposal is essentially doing the same kind of extension to Handles (possibly a Good Thing if correct), except for allowing metadata in the identifier (yech!).
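As a rough illustration of "DOIs are Handles", the Python sketch below assumes only the shared prefix/suffix syntax and the convention that DOI prefixes begin with "10.". The helper names split_handle and is_doi are made up for this example, and the non-DOI handle string is invented purely for contrast. No network access is involved; the sketch only shows that the same splitting logic serves both kinds of identifier.

```python
from typing import Tuple


def split_handle(identifier: str) -> Tuple[str, str]:
    """Split a Handle (or DOI) into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = identifier.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"expected prefix/suffix, got {identifier!r}")
    return prefix, suffix


def is_doi(identifier: str) -> bool:
    """Assumption: a DOI is a Handle whose prefix starts with '10.'."""
    prefix, _suffix = split_handle(identifier)
    return prefix.startswith("10.")


# Example identifiers: the first is a DOI, the second is an invented
# non-DOI handle used only to show the contrast.
for identifier in ("10.1000/182", "20.500.99999/example"):
    prefix, suffix = split_handle(identifier)
    kind = "DOI" if is_doi(identifier) else "plain Handle"
    print(f"{identifier}: prefix={prefix} suffix={suffix} ({kind})")
```

The suffix is treated as an opaque string in both cases; any "branding" sits in the prefix, which is the part Rod's comment refers to.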
--Bob