[tdwg-guid] Immutability of LSID data

Bob Morris morris.bob at gmail.com
Mon Jul 16 18:06:26 CEST 2007

There is no way to guarantee that a particular application which
passes an LSID to another application can expect anything other

On 7/16/07, P. Bryan Heidorn <pheidorn at uiuc.edu> wrote:
> I am not sure I follow completely, Bob, but I think you are pointing
> out an important issue: "semantic immutability" versus "byte/bit-level
> immutability". If a client retrieves data from two different
> sources under a byte-level immutability contract, a simple equivalence
> test should be able to verify the byte-level equivalence. Under the
> semantic immutability contract, a more complex test for equivalence
> would be required, one that fits, for example, the MIME type.
> In practice I do not think this is an issue. If clients act under
> blind faith under either contract they would not test the
> equivalence. In fact they would usually only retrieve a particular
> LSID from one service. The blind faith client would process the data
> as if the data provider is following the contract and no more. The
> client could not assume byte-level immutability when there is only
> semantic immutability because it may indeed break the client code.
> A cached byte-level representation of data from one call cannot be
> compared byte for byte with semantically immutable data. If XML is
> carried in the data, all operations must be consistent with XML
> operations. I do not see this
> as a problem.
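
To make the two contracts concrete, here is a minimal Python sketch, assuming the payloads returned by two getData calls are small XML documents. The function names and sample records are hypothetical, and using XML canonicalization (C14N) as the "semantic" test is only one possible reading of semantic immutability, not something the LSID specification prescribes.

```python
import xml.etree.ElementTree as ET


def bytes_equal(a: bytes, b: bytes) -> bool:
    """Byte-level contract: the two payloads must be identical byte strings."""
    return a == b


def xml_semantically_equal(a: bytes, b: bytes) -> bool:
    """One possible semantic test (an assumption): compare canonical XML forms.

    Canonicalization normalizes attribute order and, with strip_text=True,
    leading/trailing whitespace in text nodes. Requires Python 3.8+.
    """
    canon_a = ET.canonicalize(xml_data=a.decode("utf-8"), strip_text=True)
    canon_b = ET.canonicalize(xml_data=b.decode("utf-8"), strip_text=True)
    return canon_a == canon_b


# Hypothetical payloads for the same LSID from two different sources.
payload_1 = b'<record id="1" source="x"><name>Quercus alba</name></record>'
payload_2 = b'<record source="x" id="1">\n  <name>Quercus alba</name>\n</record>'
payload_3 = b'<record id="1" source="x"><name>Quercus rubra</name></record>'

print(bytes_equal(payload_1, payload_2))             # byte-level test
print(xml_semantically_equal(payload_1, payload_2))  # canonical-form test
print(xml_semantically_equal(payload_1, payload_3))  # different content
```

A client that cached payload_1 and later received payload_2 would see a mismatch under the byte-level test but not under the canonical-form test, which is the situation the thread is arguing about.
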
> Since in the biodiversity community LSID data payloads would be about
> a large variety of objects, clients would always need to check the
> data types before most processing operations. The data type
> information would be encoded in the metadata but could also be
> segregated by service provider (but even there for good form the
> metadata should encode the data type.) The metadata needs to encode
> both the physical layout of the bits and "use" (there must be a
> better word). For example, the data could be a Darwin Core record, a
> Dublin Core record or SDD. All are XML, but the legal operations over
> that XML are different depending on the "use". Some clients could
> just pass the data through without being concerned about this, but other
> clients would need to process accordingly, perhaps ignoring types they
> know nothing about.
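
The type check described above could be sketched roughly as follows. This is an illustration only: the metadata keys "format" and "use", the handler names, and the registry are all hypothetical and are not taken from any LSID or TDWG specification.

```python
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[bytes], str]


def handle_darwin_core(data: bytes) -> str:
    # Placeholder for operations that are legal on a Darwin Core record.
    return f"Darwin Core record ({len(data)} bytes)"


def handle_dublin_core(data: bytes) -> str:
    # Placeholder for operations that are legal on a Dublin Core record.
    return f"Dublin Core record ({len(data)} bytes)"


# Hypothetical registry keyed by (physical layout, "use").
HANDLERS: Dict[Tuple[str, str], Handler] = {
    ("application/xml", "darwin-core"): handle_darwin_core,
    ("application/xml", "dublin-core"): handle_dublin_core,
}


def process(metadata: Dict[str, str], data: bytes) -> Optional[str]:
    """Dispatch on the declared type; return None for types we do not know."""
    key = (metadata.get("format", ""), metadata.get("use", ""))
    handler = HANDLERS.get(key)
    if handler is None:
        return None  # unknown type: ignore it rather than guess
    return handler(data)


print(process({"format": "application/xml", "use": "darwin-core"}, b"<dwc/>"))
print(process({"format": "application/xml", "use": "sdd"}, b"<sdd/>"))
```

Both payloads here are XML, but only the one whose declared "use" has a registered handler is processed; the other is skipped, matching the "ignore types it knows nothing about" behaviour.
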
> ------
> Unrelated to Bob's comment, I would like to add a point about
> digital-from-birth versus made-digital data.
> What is data and what is metadata has no relation to being digital or
> not. There was data and metadata long before there were computers.
> Galileo, studying the time objects take to move down an inclined plane,
> collected data: the time, distance, angle and mass of the objects. At
> least the time and the distance recorded in his notebooks are data.
> If we re-represent his data from the notebook in digital format in
> 2007 so we can process it in an Excel spreadsheet, it is still the
> same data. If we just take a photo of the book we might have a
> different beast, but as long as we leave his numbers as numbers it is
> the same data. The metadata about the inclined plane experiment would
> include information about the apparatus used. For example, he might
> have had bells that rang at different locations/distances along the
> inclined plane, and it might have been made of a wooden frame with
> brass rails. All this metadata tells us about the data; it is data
> about the data. Similar
> arguments can be made about specimens. A digital representation of a
> specimen is still data. No one is arguing that the specimen is a
> species or a species concept. A specimen glued to paper or in a photo
> can be assigned to a species concept, meaning someone has said this
> is an X. As such we can treat it as an exemplar of X. If it is a type
> we can even say it is a very good example of X but it does not cover
> the entire concept of X. The image of the specimen can be data. We
> need not treat it as metadata just because it is digital or because
> there is an object or event in the world that is the primary
> representation. That Galileo's numbers also exist in the notebook does
> not make the numbers in the computer any less data. We will want to
> add metadata to the digital numbers to tell the user that they came
> from Galileo's notebook.
> Bryan
> --
> --------------------------------------------------------------------
>    P. Bryan Heidorn
>    Graduate School of Library and Information Science
>    University of Illinois at Urbana-Champaign
>    pheidorn at uiuc.edu
>    (V)217/ 244-7792     (F)217/ 244-3302
>    http://www.uiuc.edu/goto/heidorn
>    Online Calendar: http://www.uiuc.edu/goto/heidorncalendar
> On Jul 16, 2007, at 9:01 AM, Bob Morris wrote:
> > On 7/16/07, Ricardo Pereira <ricardo at tdwg.org> wrote:
> >>
> > One thing that is wrong with it is that if a conforming client
> > acquires the data with a getData call from two different sources, and
> > they return different byte strings, then the client is permitted to
> > signal an error and possibly break an application that exercises a
> > blind faith in the power of "semantic immutability".
> >
> >
> >>  b) Some may claim that caching of LSIDs and the associated data
> >> would be impossible. But since the data is always "semantically
> >> immutable", what's wrong with caching it?
> >>
> >
> > --
> > Robert A. Morris
> > Professor of Computer Science
> > UMASS-Boston
> > ram at cs.umb.edu
> > http://bdei.cs.umb.edu/
> > http://www.cs.umb.edu/~ram
> > http://www.cs.umb.edu/~ram/calendar.html
> > phone (+1)617 287 6466
> > _______________________________________________
> > tdwg-guid mailing list
> > tdwg-guid at lists.tdwg.org
> > http://lists.tdwg.org/mailman/listinfo/tdwg-guid

Robert A. Morris
Professor of Computer Science
ram at cs.umb.edu
phone (+1)617 287 6466
