Bob, That's more or less what I was trying to say. LSID = same bytes. Metadata about LSID = varying bytes. That's a very simple definition. It is divorced from the question of primary object, first class object, or any of that stuff. Just simply same bytes.
I don't know what to say about serving up LSID data bytes from a database via XML through DOM. Dave is suggesting that could cause more trouble than it saves, and I can see that point: you can't control how reused code works in the future, so the bytes might change. Maybe an LSID provider should never do that.
But, is the LSID getData call supposed to return data bytes in XML form? That is, not the metadata, the data.
Chuck
-----Original Message-----
From: Bob Morris [mailto:morris.bob@gmail.com]
Sent: Friday, July 13, 2007 3:08 PM
To: Paul Kirk
Cc: Chuck Miller; Dave Vieglais; tdwg-guid@lists.tdwg.org
Subject: Re: [tdwg-guid] LSID metadata persistence (or lack thereof)[Scanned]
This entire discussion confuses me. The LSID standard is published. Why is there a discussion of what an LSID should be? The standard requires the data, as defined by the return of getData, to be identical for all resolutions of the LSID. From page 9 of the LSID spec:
"bytes getData (LSID lsid)
bytes getDataByRange (LSID lsid, integer start, integer length)
Metadata_response getMetadata (LSID lsid, string[] accepted_formats)
Metadata_response getMetadataSubset (LSID lsid, string[] accepted_formats, string selector)

The data retrieval services may implement all of the methods, or only methods for retrieving data, or only methods for retrieving associated metadata. The same LSID named data object must be resolved always to the same set of bytes. Therefore, all of the data retrieval services return the same results for the same LSID. The user has, however, the choice of which one of these to utilize depending on its location, known quality of service and other attributes. With metadata, the situation is different. Each data retrieval service can provide different metadata for the same LSID."
This doesn't seem very ambiguous to me, and doesn't have anything to do with imperfect storage of data or anything else about the physical or electronic world. If two calls to getData() with the same argument on two occasions to possibly two different resolution services do not yield the same set of bytes, then one or the other or both of those is not executing a compliant service response. Unless this discussion is really "Shall we call something other than the return of getData by the term 'data associated with the LSID'?", there seems to be nothing to discuss.
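As a rough illustration of that compliance test, here is a minimal Python sketch. It assumes the bytes returned by getData from each resolution service are already in hand; the helper names and the sample byte strings are hypothetical and not part of the LSID spec or any LSID client library.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def same_data(response_a: bytes, response_b: bytes) -> bool:
    """True when two getData responses are byte-for-byte identical."""
    return digest(response_a) == digest(response_b)


# Hypothetical responses for the same LSID from different services.
from_service_1 = b"Homo sapiens"
from_service_2 = b"Homo sapiens"
from_service_3 = b"Homo  sapiens"  # one extra space, so different bytes

print(same_data(from_service_1, from_service_2))  # True
print(same_data(from_service_1, from_service_3))  # False
```

Comparing digests rather than the raw bytes is only a convenience: a digest recorded at one time can be checked against a later response without keeping the full payload.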
Bob
On 7/13/07, Paul Kirk p.kirk@cabi.org wrote:
In an imperfect world there is no such thing as an 'identical-byte-stream' because the technology we use is imperfect ... the disk controllers which manage our bytes and the disk we use to store our bytes have recognized error rates. Perhaps I'm being a pedant in the above analysis, but I was almost persuaded that, except for digital objects (images, sounds) which can be data, all other 'things' (names, specimen accession numbers) had to be metadata. This to me makes no sense in the real but imperfect world we live in. An LSID assigned to a name (e.g. Homo sapiens) is assigned to the name as data, not metadata. What is 'identical' here is that if the spelling has to change for any reason, the new spelling gets a new LSID and the now incorrect spelling gets deprecated (but is still resolvable) with a pointer to the correct spelling/LSID in the metadata.
OK?
Paul
From: tdwg-guid-bounces@lists.tdwg.org on behalf of Chuck Miller
Sent: Fri 13/07/2007 19:03
To: Dave Vieglais
Cc: tdwg-guid@lists.tdwg.org
Subject: RE: [tdwg-guid] LSID metadata persistence (or lack thereof)[Scanned]
Dave,
What you say is true. But I think we already have too many variations, subtleties, and reinterpretations which are endlessly debated.
The LSID standard would be simple, clear and consistent if we used the identical-byte-stream definition. The LSID would uniquely tag a persistent byte stream. A persistent byte stream is always the same thing without any further explanation or clarification.
The provider of an LSID byte-stream would need to commit to keeping that byte-stream persistent and not represent it in multiple ways, even though technically they could. If they can't commit to that, then it can't be an LSID byte-stream.
And in the name of simplicity and clarity, if they had to provide different byte-stream representations then they would have to assign a different LSID to each and use "SameAs" metadata.
Chuck
-----Original Message-----
From: Dave Vieglais [mailto:vieglais@ku.edu]
Sent: Friday, July 13, 2007 12:42 PM
To: Chuck Miller
Cc: Ricardo Pereira; tdwg-guid@lists.tdwg.org
Subject: Re: [tdwg-guid] LSID metadata persistence (or lack thereof)
Hi Ricardo, Chuck,
Asserting that the byte stream returned as data associated with an LSID should never change is perhaps a bit confusing from a programmatic view. There are, for example, many ways to represent data in XML that are identical from an information content point of view, but whose byte streams could be very different.
Perhaps it might be better to state something like "the canonical representation of the data associated with an LSID must not change", or something to that effect?
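To make the point concrete, here is a small Python sketch. It assumes Python 3.8 or later, whose standard library offers xml.etree.ElementTree.canonicalize (an implementation of C14N 2.0). The two sample documents are invented for illustration, and nothing here implies that the LSID spec defines a canonical form.

```python
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+

# Two hypothetical serializations of the same record: attribute order and
# quote style differ, the information content does not.
doc_a = '<name rank="species" genus="Homo">Homo sapiens</name>'
doc_b = "<name genus='Homo' rank='species'>Homo sapiens</name>"

# The raw byte streams differ ...
print(doc_a.encode("utf-8") == doc_b.encode("utf-8"))  # False

# ... but their canonical forms are identical.
print(canonicalize(doc_a) == canonicalize(doc_b))  # True
```

Under a rule like this, a provider would commit to the canonical form staying fixed rather than to one particular serialization.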
Dave V.
On Jul 14, 2007, at 05:29, Chuck Miller wrote:
Ricardo,
Looking at this definition: "Persistence of LSID Data: The data associated with an LSID (i.e., the byte stream returned by the LSID getData call) must never change"
Perhaps this is a more straightforward way to conceive LSIDs. The LSID goes with a byte stream. It's that byte stream that must stay the same. So, if there is a byte stream associated with a collection that needs to stay the same, then whatever that byte stream happens to be is the data that gets an LSID assigned to it. That sure seems a clearer definition of what is data and what is metadata, rather than the issue of primary object and all that.
So we can create a new definition in the context of LSIDs: Data is a byte stream that is persistent, never changes and can have an LSID. Metadata is a byte stream that is non-persistent, might change and is only associated with an LSID.
The institution that assigns an LSID can make its own decision about whether the byte stream being provided is persistent or non-persistent. By assigning an LSID to any byte stream, whatever it is, the institution is declaring it to be data and persistent.
So, in the example given of an observation record with a determination that needs to remain fixed and unchanged, by assigning an LSID to that observation+determination it would be "declared to be data" and unchangeable. A different determination would then be different data with a different LSID. That would provide a solution for those who want to employ it. Others could choose not to use it.
Chuck
From: tdwg-guid-bounces@lists.tdwg.org [mailto:tdwg-guid-bounces@lists.tdwg.org] On Behalf Of Ricardo Pereira
Sent: Friday, July 13, 2007 9:47 AM
To: tdwg-guid@lists.tdwg.org
Subject: [tdwg-guid] LSID metadata persistence (or lack thereof)
Hi there folks,
As Chuck mentioned a few weeks ago, we do have a few outstanding issues to address regarding LSIDs. I would like to discuss those one by one, in an orderly manner, and reach consensus as much as we can. Then we can sum them up in a TDWG standard, possibly by or shortly after the Bratislava conference.
The first issue I would like to discuss is LSID metadata persistence. First, let me remind you of a corollary established by the LSID specification:
Corollary 1: LSIDs are not guaranteed to be resolvable indefinitely.
In other words, there is no guarantee that one will always be able to retrieve the data associated with an LSID, as the authority may choose (or be forced) not to resolve an LSID anymore.
Second, let me distinguish the kind of persistence I'm talking about from two other related concepts (which we'll not discuss in this thread):
1) Persistence of Assignment: Once assigned to an object, an LSID is indefinitely associated with it. The same LSID cannot be assigned to another object. Ever! The LSID may not be resolvable anymore, but it cannot be assigned to another object. This is established by the LSID specification.
2) Persistence of LSID Data: The data associated with an LSID (i.e., the byte stream returned by the LSID getData call) must never change. Although the LSID may not be resolvable anymore (according to corollary 1), the data associated with an LSID must never ever change. That's defined by the LSID spec, too.
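These two rules, together with corollary 1, can be sketched as a toy in-memory authority in Python. The class, its method names and the example.org LSIDs are all hypothetical; this is an illustration of the rules under those assumptions, not how any real LSID authority is implemented.

```python
class LsidRegistry:
    """Toy in-memory authority; not part of any LSID library."""

    def __init__(self):
        self._data = {}        # LSID -> bytes, written exactly once
        self._retired = set()  # LSIDs that are no longer resolved

    def assign(self, lsid: str, data: bytes) -> None:
        # Persistence of assignment and of data: an LSID is bound to one
        # byte stream once, and is never rebound, even after retirement.
        if lsid in self._data:
            raise ValueError(f"{lsid} is already assigned")
        self._data[lsid] = data

    def retire(self, lsid: str) -> None:
        # Corollary 1: the authority may stop resolving an LSID.
        if lsid not in self._data:
            raise KeyError(lsid)
        self._retired.add(lsid)

    def get_data(self, lsid: str) -> bytes:
        if lsid in self._retired:
            raise LookupError(f"{lsid} is no longer resolvable")
        return self._data[lsid]


registry = LsidRegistry()
registry.assign("urn:lsid:example.org:names:1", b"Homo sapien")
# A corrected byte stream needs a new LSID; the old one cannot be reused.
registry.assign("urn:lsid:example.org:names:2", b"Homo sapiens")
try:
    registry.assign("urn:lsid:example.org:names:1", b"Homo sapiens")
except ValueError as err:
    print(err)  # urn:lsid:example.org:names:1 is already assigned
```

Note that retiring an LSID in this sketch only stops resolution; the identifier stays bound and cannot be handed to another object.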
What I want to discuss here is the persistence of LSID metadata (what is returned by the getMetadata call) or the lack thereof.
A use case associated with metadata persistence is when someone collects observation records (and implicitly, their determinations) and runs an experiment (a model or simulation) with them. This person may want to record the identifiers of the points used so that someone using the results of that experiment may refer back to the primary data, to validate or repeat the experiment.
The bad news is that the LSID identification scheme (or any other GUID that I know of) was not designed to guarantee metadata persistence, and thus it cannot implement the use case above by itself. To implement that use case, the specification would have to guarantee that the metadata (which we are using here as data) is immutable. But it doesn't.
Most of us wish that metadata was persistent, but it isn't. Many things can change in the metadata: a new determination, a mispeling that is corrected, many things. We just cannot guarantee that the metadata will look as it did some time ago.
We then reach the following conclusion.
Corollary 2: LSID metadata is neither immutable nor persistent.
The consequence of this corollary is that, if you need to refer back to a piece of information (metadata) associated with an LSID, exactly as it was when you got it, you must make a copy of it, or arrange that someone else make that copy for you.
In other words, a client cannot assume that the metadata associated with an LSID today will be the same tomorrow. If the client does assume that, it may be relying on a false assumption and its output may be flawed.
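One way a client might keep such a copy is sketched below in Python. It assumes the metadata bytes have already been fetched through getMetadata by some other means; the MetadataSnapshot type, the helper functions, the example LSID and the sample metadata are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MetadataSnapshot:
    """Client-side copy of the metadata exactly as it was received."""
    lsid: str
    retrieved_at: datetime
    sha256: str
    content: bytes


def snapshot(lsid: str, metadata: bytes) -> MetadataSnapshot:
    """Freeze the metadata bytes together with a timestamp and a digest."""
    return MetadataSnapshot(
        lsid=lsid,
        retrieved_at=datetime.now(timezone.utc),
        sha256=hashlib.sha256(metadata).hexdigest(),
        content=metadata,
    )


def has_changed(earlier: MetadataSnapshot, current_metadata: bytes) -> bool:
    """True when the metadata served now differs from the stored copy."""
    return hashlib.sha256(current_metadata).hexdigest() != earlier.sha256


# Hypothetical metadata for one observation, before and after a new
# determination is recorded by the provider.
lsid = "urn:lsid:example.org:observations:42"
used_in_experiment = snapshot(lsid, b"determination: Puma concolor")
served_later = b"determination: Puma yagouaroundi"

print(has_changed(used_in_experiment, served_later))  # True
print(used_in_experiment.content)  # the copy the experiment relied on
```

The stored copy, not the live getMetadata response, is then what a later reader would use to validate or repeat the experiment.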
If we are not happy with that conclusion, we may develop an additional component in our architecture, an archive of some sort, to handle (meta)data persistence. That is exactly what the STD-DOI project (http://www.std-doi.de/) and SEEK (http://seek.ecoinformatics.org) have done to some extent.
While we cannot guarantee that LSID metadata is persistent or immutable, we can definitely document how the metadata have changed through metadata versioning. That's the topic of the next thread. We will move on to discuss metadata versioning as soon as we are done with metadata persistence.
Cheers,
Ricardo
tdwg-guid mailing list tdwg-guid@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-guid