[tdwg-guid] Immutability of LSID data

Richard Pyle deepreef at bishopmuseum.org
Tue Jul 17 03:05:26 CEST 2007


 

Hi Ricardo,

I certainly agree with the direction you want to take the discussion in, but
I do want to make a couple of comments:

> I wasn't the one who came up with the LSID spec, but I suppose 
> that those methods were specifically designed to handle sequence 
> data (DNA and protein data). The getDataByRange method in 
> particular was designed to allow clients to refer to very 
> specific subsets of those sequences.
> 
> No doubt that this is all very useful for the bioinformatics folks, 
> but as we've seen in previous discussions, it is not as useful 
> for us in the biodiversity (and ecological) informatics communities. 
> The main reason is that some of our data is represented in XML, 
> which cannot be serialized as the very same stream of bytes every
> time. But it may still be helpful to use the getData call to retrieve
> such data.

I am dubious that we will eventually find much use for the getData() call
for any non-digital objects (which, in my understanding, includes many of
the things we want to exchange data about).  However, I think that the
getData() call *does* have value for a non-trivial portion of data objects
that *are* of interest to us in the biodiversity informatics community.
First of all, the data generated by the bioinformatics folks are of interest
to our community, and will increasingly become so as time moves on. But I
think the getData() call could also be of value to other objects of interest
to us as well. Examples include cropped regions of image files, individual
pages of multi-page scanned paper document files, specific segments of video
files, specific segments of audio files, among others.  Obviously, this
would depend on the nature of the binary file itself, such that a contiguous
block of bytes extracted from within the complete binary file would
represent a meaningful, render-able unit of information (possibly not).  But
the point is, I do believe that getData() does have potential use to us in
our data domain for certain tasks.
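
To make the byte-range idea concrete, here is a minimal sketch in Python. It
is only an illustration under stated assumptions: the in-memory STORE, the
example LSID, and the (start, length) parameters are all hypothetical, and the
function names are merely modelled on the getData and getDataByRange method
names mentioned above rather than taken from any real LSID library.

```python
# Hypothetical store: each LSID maps to one fixed, never-changing byte string.
# The LSID and the "page" layout below are made up for illustration.
STORE = {
    "urn:lsid:example.org:scans:42": b"PAGE1....PAGE2....PAGE3....",
}


def get_data(lsid: str) -> bytes:
    """Return the complete byte string identified by the LSID."""
    return STORE[lsid]


def get_data_by_range(lsid: str, start: int, length: int) -> bytes:
    """Return a contiguous block of `length` bytes beginning at `start`.

    Assumption: ranges are given as (start offset, length) and must lie
    entirely inside the stored bytes.
    """
    data = STORE[lsid]
    if start < 0 or length < 0 or start + length > len(data):
        raise ValueError("range outside the stored bytes")
    return data[start:start + length]


if __name__ == "__main__":
    lsid = "urn:lsid:example.org:scans:42"
    # Each made-up "page" is 9 bytes long, so the second page starts at 9.
    print(get_data_by_range(lsid, 9, 9))
```

A range like this only stays meaningful if the underlying bytes never change,
which is why it fits static binary files and not records that are regenerated
on each request.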

More fundamentally, however, I want to echo something you said in an earlier
post: "We should not try to return something in the LSID getData() call just
for the sake of it."  In other words, if our LSIDs identify something other
than a *static* binary data file (which, in my mind, a database record or a
dynamically generated XML file usually does *not* represent -- for the reasons
that Dave and others have already pointed out), then we should find ways to
make use of the information as returned via getMetadata().
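
As a rough illustration of why a dynamically generated XML file is not a
static byte stream, the Python sketch below compares two serializations of
the same record. The element name, attributes, and taxon name are invented
for the example, and same_content() is a deliberately simple hypothetical
check that only looks at a single flat element.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two hypothetical serializations of the same record; only the attribute
# order differs, as might happen between two runs of an XML generator.
FIRST = b'<taxon id="1" rank="species">Aus bus</taxon>'
SECOND = b'<taxon rank="species" id="1">Aus bus</taxon>'


def digest(payload: bytes) -> str:
    """SHA-256 of the exact bytes, i.e. what a byte-level contract sees."""
    return hashlib.sha256(payload).hexdigest()


def same_content(x: bytes, y: bytes) -> bool:
    """Compare tag, attributes, and text of the root element only."""
    ex, ey = ET.fromstring(x), ET.fromstring(y)
    return (ex.tag, ex.attrib, ex.text) == (ey.tag, ey.attrib, ey.text)


if __name__ == "__main__":
    print("bytes identical:  ", digest(FIRST) == digest(SECOND))
    print("content identical:", same_content(FIRST, SECOND))
```

The two payloads carry the same information but differ at the byte level, so
a client that relies on receiving identical bytes for a given LSID would see
them as different data.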

I am still a bit confused as to why we need to define all of these
"semantically immutable" rules just to allow us to make use of the getData()
call for objects that are not static binary files.  I can certainly
understand a set of rules that our community defines in terms of managing
versioning of metadata and dealing with the sorts of use-cases that Matt
describes, but I don't see why we can't layer that on top of the existing
LSID specs and methods, rather than "bend" the existing LSID spec and/or
develop new methods.  Like Bryan said:

But I definitely think it would be a mistake to re-define what "data" means in
the context of LSIDs (i.e., to allow mutability in certain cases, and in so
doing fail to fulfill the contract for serving LSIDs).

Aloha,
Rich




