New subject: [tdwg-guid] LSID metadata persistence (or lack thereof)

13 Jul 2007


      Hi there folks,

    As Chuck mentioned a few weeks ago, we do have a few outstanding 
issues to address regarding LSIDs. I would like to discuss those one by 
one, in an orderly manner, and reach consensus as much as we can. Then 
we can sum them up in a TDWG standard, possibly by or shortly after the 
Bratislava conference.

    The first issue I would like to discuss is *LSID metadata 
persistence*. First, let me remind you of a corollary established by the 
LSID specification:

            *Corollary 1:* _LSIDs are not guaranteed to be resolvable 
indefinitely._

    In other words, there is no guarantee that one will always be able 
to retrieve the data associated with an LSID, as the authority may choose 
(or be forced) not to resolve an LSID anymore.

    Second, let me distinguish the kind of persistence I'm talking 
about from two other related concepts (which we'll not discuss in this 
thread):

        1) *Persistence of Assignment: *Once assigned to an object, an 
LSID is indefinitely associated with it. The same LSID cannot be 
assigned to another object. Ever! The LSID may not be resolvable 
anymore, but it cannot be assigned to another object. This is 
established by the LSID specification.

        2) *Persistence of LSID Data: *The data associated with an LSID 
(i.e., the byte stream returned by the LSID getData call) must never 
change. Although the LSID may not be resolvable anymore (according to 
corollary 1), the data associated with an LSID must never ever change. 
That's defined by the LSID spec, too.

    What I want to discuss here is the *persistence of LSID metadata* 
(what is returned by the getMetadata call) or the lack thereof.

    A use case associated with *metadata persistence* is when someone 
collects observation records (and implicitly, their determinations) and 
runs an experiment (a model or simulation) with them. This person may want 
to record the identifiers of the points used so that someone using the 
results of that experiment may refer back to the primary data, to 
validate or repeat the experiment.

    The bad news is that the LSID identification scheme (or any other GUID 
that I know of) was not designed to guarantee *metadata persistence*, 
and thus it cannot support the use case above by itself. To implement 
that use case, the specification would have to *guarantee* that the 
metadata (which we are using here as data) is immutable. But it doesn't.

    Most of us wish that metadata were persistent, but it isn't. Many 
things can change in the metadata: a new determination, a corrected 
misspelling, and so on. We just cannot guarantee that the metadata 
will look the way it did some time ago.

    We then reach the following conclusion.
  
            *Corollary 2: *LSID metadata is neither immutable nor persistent.

    The consequence of this corollary is that, if you need to refer back 
to a piece of information (metadata) associated with an LSID, exactly as 
it was when you got it, you *must make a copy of it*, or arrange that 
someone else make that copy for you.
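
    As a rough illustration of what "making a copy" could look like on 
the client side, here is a minimal Python sketch. It assumes the client 
has already obtained the metadata as text by whatever means it uses to 
call getMetadata; the function names, the record layout, and the example 
LSID and metadata below are hypothetical and are not part of the LSID 
specification.

```python
import hashlib
import json
from datetime import datetime, timezone


def snapshot_metadata(lsid, metadata_text, retrieved_at=None):
    """Build a record of the metadata exactly as it was when retrieved.

    `metadata_text` is whatever the client received from getMetadata;
    how it was fetched is outside the scope of this sketch.
    """
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc)
    return {
        "lsid": lsid,
        "retrieved_at": retrieved_at.isoformat(),
        # The digest lets a later reader confirm the copy was not altered.
        "sha256": hashlib.sha256(metadata_text.encode("utf-8")).hexdigest(),
        "metadata": metadata_text,
    }


def save_snapshot(record, path):
    """Write the record to a local JSON file (a hypothetical archive)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)


# Hypothetical LSID and metadata, for illustration only.
example = snapshot_metadata(
    "urn:lsid:example.org:specimens:1234",
    "determination: Puma concolor",
)
print(example["lsid"], example["retrieved_at"], example["sha256"][:12])
```

    Storing the retrieval time next to the copy matters because the LSID 
alone says nothing about which state of the metadata was used.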

    In other words, a client cannot *assume* that the metadata 
associated with an LSID today will be the same tomorrow. If the client 
does assume that, it may be relying on a false assumption and its output 
may be flawed.
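
    One way a client might avoid relying on that assumption is to record 
a fingerprint of the metadata when it is used and compare it against a 
fresh retrieval later. The sketch below is only an illustration under 
that assumption: the function names and the sample metadata strings are 
made up, and the metadata is passed in as plain text rather than fetched 
from a real resolver.

```python
import hashlib


def fingerprint(metadata_text):
    """Return a SHA-256 hex digest of the metadata text."""
    return hashlib.sha256(metadata_text.encode("utf-8")).hexdigest()


def metadata_changed(recorded_fingerprint, current_metadata_text):
    """True if the current metadata differs from what was recorded."""
    return fingerprint(current_metadata_text) != recorded_fingerprint


# Hypothetical metadata for the same LSID at two points in time.
then = "determination: Puma concolor"
now = "determination: Puma concolor couguar"

recorded = fingerprint(then)
print(metadata_changed(recorded, then))  # False
print(metadata_changed(recorded, now))   # True
```

    Note that a byte-level comparison like this also flags changes that 
do not alter the meaning (for example, a different serialization of the 
same RDF), so it detects that something changed, not what changed.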

    If we are not happy with that conclusion, we may develop an 
additional component in our architecture, an archive of some sort, to 
handle (meta)data persistence. That is exactly what the STD-DOI project 
(http://www.std-doi.de/) and SEEK (http://seek.ecoinformatics.org) have 
done to some extent.

    While we cannot guarantee that LSID metadata is persistent or 
immutable, we can definitely document how the metadata has changed 
through *metadata versioning*. That's the topic of the next thread. We 
will move on to discuss *metadata versioning* as soon as we are done 
with *metadata persistence*.

    Cheers,

Ricardo
