[tdwg-guid] LSID metadata persistence (or lack thereof)

Fri Jul 13 19:29:07 CEST 2007

Ricardo,

Looking at this definition: "Persistence of LSID Data: The data
associated with an LSID (i.e., the byte stream returned by the LSID
getData call) must never change."

Perhaps this is a more straightforward way to conceive of LSIDs. The LSID
goes with a byte stream, and it's that byte stream that must stay the
same. So, if there is a byte stream associated with a collection that
needs to stay the same, then whatever that byte stream happens to be is
the data that gets an LSID assigned to it. That seems a much clearer
definition of what is data and what is metadata than the question of
which is the primary object and all that.

So we can create a new definition in the context of LSIDs: data is a
byte stream that is persistent, never changes, and can have an LSID.
Metadata is a byte stream that is non-persistent, might change, and is
only associated with an LSID.

The institution that assigns an LSID can make its own decision about
whether the byte stream being provided is persistent or non-persistent.
By assigning an LSID to any byte stream, whatever it is, the institution
is declaring it to be data and persistent.

So, in the example given of an observation record with a determination
that needs to remain fixed and unchanged, by assigning an LSID to that
observation+determination it would be "declared to be data" and
unchangeable.  A different determination would then be different data
with a different LSID.  That would provide a solution for those who want
to employ it.  Others could choose not to use it.
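A minimal Python sketch of the distinction drawn above, assuming a toy in-memory authority. The class, its method names, and the authority and namespace strings are hypothetical, and this is not the LSID resolution protocol itself (there, getData and getMetadata are remote service calls). Each byte stream gets a freshly minted LSID and is never overwritten, while metadata stays editable, so a different determination ends up as different data under a different LSID.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyAuthority:
    """Hypothetical in-memory LSID authority, for illustration only.

    Data bytes are write-once per LSID; metadata may be edited freely.
    """
    authority: str = "example.org"   # hypothetical authority name
    namespace: str = "observations"  # hypothetical namespace
    _data: dict = field(default_factory=dict)
    _metadata: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def assign(self, data: bytes, metadata: dict) -> str:
        """Mint a fresh LSID for this byte stream; identifiers are never reused."""
        lsid = f"urn:lsid:{self.authority}:{self.namespace}:{next(self._ids)}"
        self._data[lsid] = bytes(data)         # immutable copy: the data never changes
        self._metadata[lsid] = dict(metadata)  # mutable: may be corrected later
        return lsid

    def get_data(self, lsid: str) -> bytes:
        return self._data[lsid]

    def get_metadata(self, lsid: str) -> dict:
        # Return a copy: what the caller gets today may differ tomorrow.
        return dict(self._metadata[lsid])

    def update_metadata(self, lsid: str, **changes) -> None:
        """Metadata is mutable; there is deliberately no update_data method."""
        self._metadata[lsid].update(changes)


auth = ToyAuthority()
# Same observation, two different determinations: two distinct data objects.
first = auth.assign(b"obs-42;det=Quercus robur", {"collector": "J. Smith"})
second = auth.assign(b"obs-42;det=Quercus petraea", {"collector": "J. Smith"})
print(first)   # urn:lsid:example.org:observations:1
print(second)  # urn:lsid:example.org:observations:2

# Correcting the metadata leaves the data under `first` untouched.
auth.update_metadata(first, collector="John Smith")
```

The design choice here is simply to offer no way of modifying data under an existing LSID: a changed byte stream forces a new assignment, and because identifiers come from a counter they are never handed out twice.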

Chuck

________________________________

From: tdwg-guid-bounces at lists.tdwg.org
[mailto:tdwg-guid-bounces at lists.tdwg.org] On Behalf Of Ricardo Pereira
Sent: Friday, July 13, 2007 9:47 AM
To: tdwg-guid at lists.tdwg.org
Subject: [tdwg-guid] LSID metadata persistence (or lack thereof)

    Hi there folks,

    As Chuck mentioned a few weeks ago, we do have a few outstanding
issues to address regarding LSIDs. I would like to discuss those one by
one, in an orderly manner, and reach consensus as much as we can. Then
we can sum them up in a TDWG standard, possibly by or shortly after the
Bratislava conference.

    The first issue I would like to discuss is LSID metadata
persistence. First, let me remind you of a corollary established by the
LSID specification:

            Corollary 1: LSIDs are not guaranteed to be resolvable
indefinitely. 

    In other words, there is no guarantee that one will always be able
to retrieve the data associated with an LSID, because the authority may
choose (or be forced) to stop resolving it.

    Second, let me distinguish the kind of persistence I'm talking
about from two other related concepts (which we will not discuss in this
thread):

        1) Persistence of Assignment: Once assigned to an object, an
LSID is indefinitely associated with it. The same LSID cannot be
assigned to another object. Ever! The LSID may not be resolvable
anymore, but it cannot be assigned to another object. This is
established by the LSID specification.

        2) Persistence of LSID Data: The data associated with an LSID
(i.e., the byte stream returned by the LSID getData call) must never
change. Although the LSID may not be resolvable anymore (according to
corollary 1), the data associated with an LSID must never ever change.
That's defined by the LSID spec, too. 

    What I want to discuss here is the persistence of LSID metadata
(what is returned by the getMetadata call) or the lack thereof.

    A use case associated with metadata persistence is when someone
collects observation records (and, implicitly, their determinations) and
runs an experiment (a model or simulation) with them. This person may
want to record the identifiers of the points used so that someone using
the results of that experiment can refer back to the primary data to
validate or repeat the experiment.

    The bad news is that the LSID identification scheme (like any other
GUID scheme that I know of) was not designed to guarantee metadata
persistence, and thus it cannot implement the use case above by itself.
To implement that use case, the specification would have to guarantee
that the metadata (which we are using here as data) is immutable. But it
doesn't.

    Most of us wish that metadata were persistent, but it isn't. Many
things can change in the metadata: a new determination, a corrected
misspelling, and so on. We simply cannot guarantee that the metadata
will look the way it did some time ago.

    We then reach the following conclusion.

            Corollary 2: LSID metadata is neither immutable nor persistent.

    The consequence of this corollary is that, if you need to refer back
to a piece of information (metadata) associated with an LSID exactly as
it was when you got it, you must make a copy of it, or arrange for
someone else to make that copy for you.
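    A minimal Python sketch of making that copy on the client side, assuming the client already holds the raw bytes returned by getMetadata (the fetch itself is not shown). The names MetadataSnapshot, take_snapshot and has_changed, and the sample LSID and metadata strings, are hypothetical. The snapshot records when the metadata was retrieved plus a SHA-256 digest, so results can cite exactly what was used and the client can later tell whether the authority's metadata has since changed.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MetadataSnapshot:
    """Hypothetical client-side copy of LSID metadata exactly as retrieved."""
    lsid: str
    retrieved_at: str  # ISO 8601 UTC timestamp of the retrieval
    content: bytes     # the metadata bytes as returned at that moment
    sha256: str        # hex digest of content, handy for citing in results


def take_snapshot(lsid: str, metadata: bytes) -> MetadataSnapshot:
    """Copy the metadata now, since the authority's copy may change later."""
    return MetadataSnapshot(
        lsid=lsid,
        retrieved_at=datetime.now(timezone.utc).isoformat(),
        content=bytes(metadata),
        sha256=hashlib.sha256(metadata).hexdigest(),
    )


def has_changed(snap: MetadataSnapshot, current: bytes) -> bool:
    """True if the metadata returned today differs from the saved copy."""
    return hashlib.sha256(current).hexdigest() != snap.sha256


old = b"det=Quercus robur; collector=J. Smith"
snap = take_snapshot("urn:lsid:example.org:observations:1", old)
# Later, the authority records a new determination in the metadata.
new = b"det=Quercus petraea; collector=J. Smith"
print(has_changed(snap, new))  # True
print(has_changed(snap, old))  # False
```

    A digest alone only detects change; it cannot recover the old content, which is why the snapshot keeps the bytes themselves rather than just the hash.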

    In other words, a client cannot assume that the metadata associated
with an LSID today will be the same tomorrow. If the client does assume
that, it may be relying on a false assumption and its output may be
flawed.

    If we are not happy with that conclusion, we may develop an
additional component in our architecture, an archive of some sort, to
handle (meta)data persistence. That is exactly what the STD-DOI project
(http://www.std-doi.de/) and SEEK (http://seek.ecoinformatics.org) have
done to some extent.

    While we cannot guarantee that LSID metadata is either persistent or
immutable, we can definitely document how the metadata has changed
through metadata versioning. That's the topic of the next thread, which
we will move on to as soon as we are done with metadata persistence.

    Cheers,

Ricardo
