[tdwg-guid] LSID metadata persistence (or lack thereof)

Sally Hinchcliffe S.Hinchcliffe at rbgkew.org.uk
Fri Jul 13 17:04:00 CEST 2007

I think it is right that metadata is not immutable. Basically we 
would all like everyone else's metadata to stay the same, but we 
want to be able to change our own metadata at will (correcting 
errors, improving data quality, adding new information, etc.).

Versioning has to be the answer for this, for those databases (e.g. 
IPNI) that can support it. However, all of the conversations I've had 
regarding LSIDs and versioning have said that:
1. Versioning in LSIDs is more or less deprecated
2. Versioning in LSIDs is for Data, not Metadata

Accordingly, for IPNI, given that we have versioning, we bodged in a 
'versioned as' field in the metadata. That way we can both give you a 
version number AND supply you with a (hopefully unchanged) copy of the 
metadata as it was at the version you had, if that's what you want.
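
To make the 'versioned as' idea concrete, here is a minimal in-memory sketch in Python. It is not the IPNI implementation: the class name, the method names, the `versioned_as` key and the example LSID are all made up for illustration, and metadata is assumed to be a plain dictionary rather than RDF.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedMetadataStore:
    """Hypothetical store that keeps every metadata revision per LSID."""

    _revisions: dict = field(default_factory=dict)

    def update(self, lsid: str, metadata: dict) -> int:
        """Append a new revision and return its 1-based version number."""
        history = self._revisions.setdefault(lsid, [])
        history.append(dict(metadata))  # copy, so later edits by the caller don't leak in
        return len(history)

    def get_metadata(self, lsid: str, version: Optional[int] = None) -> dict:
        """Return the latest revision, or the one requested, tagged with its version."""
        history = self._revisions[lsid]
        number = len(history) if version is None else version
        if not 1 <= number <= len(history):
            raise KeyError(f"{lsid} has no version {number}")
        record = dict(history[number - 1])
        record["versioned_as"] = number  # assumed field name, for illustration only
        return record


# Example LSID is invented; it does not identify a real record.
lsid = "urn:lsid:example.org:names:1"
store = VersionedMetadataStore()
first = store.update(lsid, {"name": "Poa anua"})    # original, with a typo
second = store.update(lsid, {"name": "Poa annua"})  # corrected later
```

A client that noted `versioned_as` when it first fetched the record can ask for that version again later, while everyone else gets the corrected metadata by default.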

But that's anticipating Ricardo's next thread so I'll leave it there.


>     Hi there folks,
>     As Chuck mentioned a few weeks ago, we do have a few outstanding 
> issues to address regarding LSIDs. I would like to discuss those one by 
> one, in an orderly manner, and reach consensus as much as we can. Then 
> we can sum them up in a TDWG standard, possibly by or shortly after the 
> Bratislava conference.
>     The first issue I would like to discuss is *LSID metadata 
> persistence*. First, let me remind you of a corollary established by the 
> LSID specification:
>             *Corollary 1:* LSIDs are not guaranteed to be resolvable 
> indefinitely.
>     In other words, there is no guarantee that one will always be able 
> to retrieve the data associated with an LSID, as the authority may choose 
> (or be forced) to stop resolving it.
>     Second, let me distinguish the kind of persistence I'm talking 
> about from two other related concepts (which we'll not discuss in this 
> thread):
>         1) *Persistence of Assignment: *Once assigned to an object, an 
> LSID is indefinitely associated with it. The same LSID cannot be 
> assigned to another object. Ever! The LSID may not be resolvable 
> anymore, but it cannot be assigned to another object. This is 
> established by the LSID specification.
>         2) *Persistence of LSID Data: *The data associated with an LSID 
> (i.e., the byte stream returned by the LSID getData call) must never 
> change. Although the LSID may not be resolvable anymore (according to 
> corollary 1), the data associated with an LSID must never ever change. 
> That's defined by the LSID spec, too.
>     What I want to discuss here is the *persistence of LSID metadata* 
> (what is returned by the getMetadata call) or the lack thereof.
>     A use case associated with *metadata persistence* is when someone 
> collects observation records (and implicitly, their determinations) and 
> runs an experiment (a model or simulation) with them. This person may want 
> to record the identifiers of the points used so that someone using the 
> results of that experiment may refer back to the primary data, to 
> validate or repeat the experiment.
>     The bad news is that the LSID identification scheme (or any other GUID 
> scheme that I know of) was not designed to guarantee *metadata persistence*, 
> and thus it cannot implement the use case above by itself. To implement 
> that use case, the specification would have to *guarantee* that the 
> metadata (which we are using here as data) is immutable. But it doesn't.
>     Most of us wish that metadata were persistent, but it isn't. Many 
> things can change in the metadata: a new determination, a corrected 
> misspelling, and so on. We just cannot guarantee that the metadata 
> will look the way it did some time ago.
>     We then reach the following conclusion.
>             *Corollary 2:* LSID metadata is neither immutable nor persistent.
>     The consequence of this corollary is that, if you need to refer back 
> to a piece of information (metadata) associated with an LSID, exactly as 
> it was when you got it, you *must make a copy of it*, or arrange that 
> someone else make that copy for you.
>     In other words, a client cannot *assume* that the metadata 
> associated with an LSID today will be the same tomorrow. If the client 
> does assume that, it may be relying on a false assumption and its output 
> may be flawed.
>     If we are not happy with that conclusion, we may develop an 
> additional component in our architecture, an archive of some sort, to 
> handle (meta)data persistence. That is exactly what the STD-DOI project 
> (http://www.std-doi.de/) and SEEK (http://seek.ecoinformatics.org) have 
> done to some extent.
>     While we cannot guarantee that LSID metadata is persistent or 
> immutable, we can definitely document how the metadata has changed 
> through *metadata versioning*. That's the topic of the next thread. We 
> will move on to discuss *metadata versioning* as soon as we are done 
> with *metadata persistence*.
>     Cheers,
> Ricardo
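
Corollary 2 in the quoted message implies that a client needing metadata exactly as retrieved must keep its own copy. One possible way to do that, sketched in Python: store the copy together with a retrieval time and a hash, so a later fetch can be compared against it. The function names and the example LSID are hypothetical, and the sketch assumes the metadata has already been parsed into a JSON-serialisable dictionary, which is a simplification of the RDF a resolver would normally return.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(metadata: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(metadata, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot(lsid: str, metadata: dict) -> dict:
    """Return a self-contained copy of the metadata as it looked at retrieval time."""
    return {
        "lsid": lsid,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "sha256": _digest(metadata),
        "metadata": json.loads(json.dumps(metadata)),  # deep copy via round trip
    }


def has_changed(saved: dict, current_metadata: dict) -> bool:
    """True if the metadata fetched now differs from the saved snapshot."""
    return _digest(current_metadata) != saved["sha256"]


# Invented example record; the LSID and values are not real data.
saved = snapshot("urn:lsid:example.org:specimens:42",
                 {"determination": "Quercus robur"})
unchanged = has_changed(saved, {"determination": "Quercus robur"})
redetermined = has_changed(saved, {"determination": "Quercus petraea"})
```

The snapshot is what an experiment would cite alongside the LSID. The hash only tells you that something changed, not what, so the saved copy itself is still needed to reproduce the original inputs.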

*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk
