[tdwg-guid] Immutability of LSID data

Mon Jul 16 16:51:01 CEST 2007

I am not sure I completely follow, Bob, but I think you are pointing
out an important issue for "semantic immutability" versus
"byte/bit-level immutability". If a client retrieves data from two
different sources under a byte-level immutability contract, a simple
equivalence test should be able to verify the byte-level equivalence.
Under the semantic immutability contract, a more complex test for
equivalence would be required, one that fits, for example, the
mime-type.
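
As a rough sketch of the two kinds of test (Python 3.8 or later), the
byte-level check is plain equality, while the semantic check here
assumes the payload is XML and that "semantically the same" can be
approximated by comparing canonical (C14N) forms. The function names
and the sample records are made up for illustration; a real semantic
test would depend on the mime-type and on what the contract actually
promises.

```python
import xml.etree.ElementTree as ET


def byte_equal(a: bytes, b: bytes) -> bool:
    """Byte-level contract: payloads must match bit for bit."""
    return a == b


def xml_semantically_equal(a: bytes, b: bytes) -> bool:
    """Semantic contract, XML payloads only (an assumption of this sketch).

    Canonicalization sorts attributes and, with strip_text=True, drops
    leading/trailing whitespace in text, so formatting differences vanish.
    """
    canon_a = ET.canonicalize(a.decode("utf-8"), strip_text=True)
    canon_b = ET.canonicalize(b.decode("utf-8"), strip_text=True)
    return canon_a == canon_b


# Hypothetical payloads returned by two services for the same LSID.
from_service_1 = b'<record id="1" source="notebook"><time>2.5</time></record>'
from_service_2 = b'<record source="notebook" id="1">\n  <time>2.5</time>\n</record>'

print(byte_equal(from_service_1, from_service_2))
print(xml_semantically_equal(from_service_1, from_service_2))
```

The two payloads differ only in attribute order and whitespace, so a
client holding to the byte-level contract would see a violation where
a client holding to the semantic contract would not.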

In practice I do not think this is an issue. If clients act on
blind faith under either contract they would not test the
equivalence. In fact they would usually only retrieve a particular
LSID from one service. The blind faith client would process the data  
as if the data provider is following the contract and no more. The  
client could not assume byte-level immutability when there is only  
semantic immutability because it may indeed break the client code.  
A byte-level representation cached from one call cannot be compared
byte for byte with data that is only semantically immutable. If XML
is carried in the data, all operations must be consistent with XML
operations. I do not see this as a problem.
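
One way a caching client could stay consistent with XML operations is
to remember a digest of the canonical form rather than of the raw
bytes. This is only a sketch under the assumption that the payload is
XML; the class name, method name and the LSID below are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict


def semantic_digest(xml_bytes: bytes) -> str:
    """SHA-256 of the canonical XML form, not of the raw bytes."""
    canon = ET.canonicalize(xml_bytes.decode("utf-8"), strip_text=True)
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


class SemanticCache:
    """Remembers one canonical digest per LSID (hypothetical helper)."""

    def __init__(self) -> None:
        self._digests: Dict[str, str] = {}

    def check(self, lsid: str, payload: bytes) -> bool:
        """True if payload agrees with the first payload seen for lsid."""
        digest = semantic_digest(payload)
        # setdefault stores the digest on first sight, else returns the old one.
        previous = self._digests.setdefault(lsid, digest)
        return previous == digest


cache = SemanticCache()
lsid = "urn:lsid:example.org:specimens:1"  # made-up LSID for illustration

print(cache.check(lsid, b'<r a="1" b="2"/>'))     # first sight, stored
print(cache.check(lsid, b'<r b="2" a="1"></r>'))  # same XML, different bytes
print(cache.check(lsid, b'<r a="1" b="3"/>'))     # attribute value changed
```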

Since in the biodiversity community LSID data payloads would be about  
a large variety of objects, clients would always need to check the  
data types before most processing operations. The data type  
information would be encoded in the metadata but could also be  
segregated by service provider (but even there for good form the  
metadata should encode the data type.) The metadata needs to encode  
both the physical layout of the bits and "use" (there must be a  
better word). For example, the data could be a Darwin Core record, a
Dublin Core record or SDD. All are XML but the legal operations over
that XML are different depending on the "use". Some clients could
just pass the data through without being concerned about this but
other clients would need to process accordingly, perhaps ignoring
types they know nothing about.
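
A minimal sketch of such a client, assuming the metadata arrives as a
simple dictionary with a "format" key (physical layout) and a "use"
key. Those key names, the "DarwinCore"/"DublinCore" labels and the
handler functions are all hypothetical; the LSID metadata would carry
this information in whatever vocabulary the community agrees on.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[bytes], str]


def handle_darwin_core(payload: bytes) -> str:
    # Placeholder processing: just count the child elements of the root.
    root = ET.fromstring(payload)
    return f"Darwin Core record with {len(root)} child elements"


def handle_dublin_core(payload: bytes) -> str:
    root = ET.fromstring(payload)
    return f"Dublin Core record with {len(root)} child elements"


# Keyed on (physical layout, "use"); both labels are assumptions.
HANDLERS: Dict[Tuple[str, str], Handler] = {
    ("text/xml", "DarwinCore"): handle_darwin_core,
    ("text/xml", "DublinCore"): handle_dublin_core,
}


def process(metadata: Dict[str, str], payload: bytes) -> Optional[str]:
    """Check the declared type first; ignore types we know nothing about."""
    key = (metadata.get("format", ""), metadata.get("use", ""))
    handler = HANDLERS.get(key)
    if handler is None:
        return None  # unknown type: skip rather than guess
    return handler(payload)


print(process({"format": "text/xml", "use": "DarwinCore"}, b"<r><a/><b/></r>"))
print(process({"format": "text/xml", "use": "SDD"}, b"<x/>"))
```

Both payloads are XML, but only the one whose "use" the client
recognizes gets processed; the SDD payload is passed over untouched.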

------

Unrelated to Bob's comment, I would like to add a point about data
that is digital from birth vs. data that is made digital.

What is data and what is metadata has no relation to being digital or  
not. There was data and metadata long before there were computers.  
Galileo, studying the time objects take to move down an inclined
plane, collected data: the time, distance, angle and mass of the
objects. At least the time and the distance recorded in his notebooks
are data. If we re-represent his data from the notebook in digital
format in 2007 so we can process it in an Excel spreadsheet it is
still the same data. If we just take a photo of the book we might
have a different beast but as long as we leave his numbers as numbers
it is the same data. The metadata about the inclined plane experiment
would include information about the apparatus used. For example he
might have bells that ring at different locations/distances along the
inclined plane, and it might be made of a wooden frame with brass
rails. All this metadata tells us about the data; it is data about
the data. Similar
arguments can be made about specimens. A digital representation of a  
specimen is still data. No one is arguing that the specimen is a  
species or a species concept. A specimen glued to paper or in a photo  
can be assigned to a species concept, meaning someone has said this  
is an X. As such we can treat it as an exemplar of X. If it is a type  
we can even say it is a very good example of X but it does not cover  
the entire concept of X. The image of the specimen can be data. We  
need not treat it as metadata just because it is digital or because
there is an object or event in the world that is the primary
representation. That Galileo's numbers also exist in the notebook does
not make the numbers in the computer any less data. We will want to
add metadata to the digital numbers to tell the user that they came  
from Galileo's notebook.

Bryan
-- 
--------------------------------------------------------------------
   P. Bryan Heidorn
   Graduate School of Library and Information Science
   University of Illinois at Urbana-Champaign
   pheidorn at uiuc.edu
   (V)217/ 244-7792     (F)217/ 244-3302
   http://www.uiuc.edu/goto/heidorn
   Online Calendar: http://www.uiuc.edu/goto/heidorncalendar

On Jul 16, 2007, at 9:01 AM, Bob Morris wrote:

> On 7/16/07, Ricardo Pereira <ricardo at tdwg.org> wrote:
>>
> One thing that is wrong with it is that if a conforming client
> acquires the data with a getData call from two different sources, and
> they return different byte strings, then the client is permitted to
> signal an error and possibly break an application that exercises a
> blind faith in the power of "semantic immutability".
>
>
>>  b) Some may claim that caching of LSIDs and the associated data
>> would be impossible. But since the data is always "semantically
>> immutable", what's wrong with caching it?
>>
>
> -- 
> Robert A. Morris
> Professor of Computer Science
> UMASS-Boston
> ram at cs.umb.edu
> http://bdei.cs.umb.edu/
> http://www.cs.umb.edu/~ram
> http://www.cs.umb.edu/~ram/calendar.html
> phone (+1)617 287 6466
> _______________________________________________
> tdwg-guid mailing list
> tdwg-guid at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-guid