[tdwg-content] IPNI versioned LSIDs for names [SEC=UNCLASSIFIED]

Fri Jan 7 01:32:10 CET 2011

On 06/01/2011, at 6:16 PM, Peter DeVries wrote:

>  IPNI seemed to work.  However, I was somewhat appalled to observe that
> they seem to change the revision identifier any time that they change
> any part of the metadata.  That renders the LSID useless as a permanent
> GUID for the name and I believe is inconsistent with the design of LSIDs
> where the revision is only supposed to change if the underlying data
> itself (NOT metadata) changes.
> 
> It is my understanding that this is exactly how LSID's are supposed to work and one reason I don't like them.

My understanding is that this is not the case, and I'd suggest that the approach taken by this resolution service may not be entirely correct. The specification mandates: "It is allowed to have an LSID representing [I'd have preferred 'identifying'] abstract entities or concepts. If an LSID represents real data, the LSID resolution service must resolve always the same set [I'd have preferred the word 'sequence'] of bytes representing the data".

To make sense of this, we must discuss the distinction between 'data' and 'metadata' in this context. This distinction is the same as the linked data "information resource"/"other resource" distinction and is old news for a lot of people here.

A linked data URI or an LSID may serve as the name of a real-world thing: a specimen, Fluffy the Tiger at London Zoo, the idea of 'Fabaceae' expressed in Mabberley's Plant Book Edn 3. In those cases, we cannot stuff the actual real-world thing down the fibre-optic pipe: it either won't fit (in the case of Fluffy the Tiger) or is rather too abstract (in the case of a taxonomic name).

However, we do have an enormous amount of information *about* the things identified by that LSID or URI, and we can serve it up as RDF. We might provide you with Fluffy's weight and age, we might provide you the name's parts, some accepted representations of it, and the id of its publication.

For things like this, there is no requirement that the result you get back be byte-for-byte identical. As Peter points out - that is actually a bit pointless.

On the other hand, we may have documents and media which most certainly can be stuffed down the pipe: pdfs, audio clips, what have you. These things are LSID "data", linked data "information resources". The requirement in the world of LSIDs is that the data must always be byte-for-byte identical, and that's where version numbers come into play.

The point of confusion is that the RDF metadata is also "stuff that can be put down the pipe" - you could understand it as data rather than metadata if you chose. The crucial point is that urn:lsid:zoo.uk:individual:Fluffy is not the name of some particular chunk of RDF, it is the name of Fluffy over in that cage there. It's obvious in the case of Fluffy, less obvious in the case of "Australhypopus flagellifer Fain & Friend, 1984", but the distinction holds. A name, or a taxon, is not the same thing as a chunk of RDF describing it. The spec does not at all mandate that the description - the metadata - be static.
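To make the data/metadata distinction concrete, here is a minimal sketch of a toy in-memory resolver. It is an illustration only, not the real LSID resolution API: the class `ToyResolver` and its method names are hypothetical (they loosely echo the spec's getData and getMetadata operations), and the policy LSID and all payloads are made up. Data bytes, once published for an LSID, may never change; metadata may be replaced freely.

```python
class ToyResolver:
    """Hypothetical in-memory resolver; not a real LSID library."""

    def __init__(self):
        self._data = {}       # LSID -> bytes, immutable once published
        self._metadata = {}   # LSID -> description, free to change

    def publish_data(self, lsid: str, payload: bytes) -> None:
        # Republishing identical bytes is harmless; different bytes are refused.
        existing = self._data.get(lsid)
        if existing is not None and existing != payload:
            raise ValueError("data for an LSID must not change; mint a new revision")
        self._data[lsid] = payload

    def set_metadata(self, lsid: str, description: str) -> None:
        # No immutability check: metadata is allowed to evolve.
        self._metadata[lsid] = description

    def get_data(self, lsid: str) -> bytes:
        # An LSID naming an abstract or real-world thing has no data bytes.
        return self._data.get(lsid, b"")

    def get_metadata(self, lsid: str) -> str:
        return self._metadata[lsid]


resolver = ToyResolver()
fluffy = "urn:lsid:zoo.uk:individual:Fluffy"

# Fluffy is a tiger, not a byte sequence: metadata only, and it may be updated.
resolver.set_metadata(fluffy, "weight: made-up value A")
resolver.set_metadata(fluffy, "weight: made-up value B")

# A PDF is real data, so its bytes are frozen under its (versioned) LSID.
policy_1998 = "urn:lsid:zoo.uk:policy:handling:1998"
resolver.publish_data(policy_1998, b"%PDF placeholder bytes")
```

Under these assumptions, updating Fluffy's description never touches the identifier, while an attempt to publish different bytes under `policy_1998` raises an error.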

The LSID version identifier seems to me a way of mitigating the "data must be static" problem, of handling it when people update an image or a PDF document that is named with an LSID. For example: the zoo might formally publish an infectious materials handling policy that gets updated from time to time, and wants to have an LSID referring not simply to "the policy", but to the particular PDF document whose publishing is an important act by the zoo. The version mechanism allows you to have a persistent LSID for "the current policy" or perhaps "the collection of these important policy documents", while also allowing you to have a different LSID for the PDF document promulgated in 1998. The first LSID has no data - it refers to an abstract thing - but its metadata will indicate which versioned LSID is the current one.
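The relationship between a versioned LSID and its unversioned parent can be read off the identifier itself. The sketch below assumes the usual `urn:lsid:authority:namespace:object[:revision]` layout; the `Lsid` type, the helper names, and the zoo policy identifiers are hypothetical and exist only for illustration.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None  # None means "no version given"


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its components (illustrative, not a full validator)."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def unversioned(lsid: Lsid) -> str:
    """Return the LSID for the abstract thing, dropping any revision."""
    return f"urn:lsid:{lsid.authority}:{lsid.namespace}:{lsid.object_id}"


# Hypothetical identifiers for the zoo's policy and its 1998 edition.
series = parse_lsid("urn:lsid:zoo.uk:policy:infectious-materials")
edition = parse_lsid("urn:lsid:zoo.uk:policy:infectious-materials:1998")
```

Here `series` has no revision and would carry only metadata, while `edition` names one fixed PDF; `unversioned(edition)` gives back the identifier of the series it belongs to.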

(Without using LSID version numbers, another solution for this is to make use of namespaces: zoo.documentseries and zoo.pdfs, for instance. The point of LSID version numbers is that you can see a relationship between the PDF and the series it is part of by looking at the LSID itself. HTTP URIs can be structured as deeply as you like and the problem does not arise.)

This might be a sensible way for IPNI to have its cake and eat it too if the goal of versioning is to keep all of the old versions available. But if the version business at the IPNI resolution service is simply - I hesitate to suggest it - a misunderstanding of the spec, then perhaps it ought to be fixed.
