[tdwg-guid] LSID metadata persistence (or lack thereof)

P. Bryan Heidorn pheidorn at uiuc.edu
Fri Jul 13 22:38:52 CEST 2007


This is stated clearly along with Bob's comments.

My complication about semantics is unnecessary for the LSID definition.

LSID data does not change, meaning it always has the same bit pattern
for a given LSID.
Since XML allows different bit-level expressions for equivalent
records, there is a mismatch with the LSID mechanism. The community
can live with this as long as additional constraints are put on
the generation of XML-based records. For the sake of simplicity, keep
the fixed bit-level expression. The existing metadata mechanism
handles the semantics of interpreting the data (sorry to use the
word "semantics", but it is nothing really special, just a definition
of the "meaning" of the data). The semantics are relevant because they
tell what can be done with the data. Some data is internally defined,
but not all, so the metadata mechanism in LSID covers the cases that
are not.
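
As a rough illustration of the XML mismatch described above (a sketch
only, not anything taken from the LSID spec): the two records below are
hypothetical and carry the same information, yet their bytes differ, so
they could not both be the data for one LSID. One possible "additional
constraint" is to canonicalize every record before it is served. The
sketch assumes Python 3.8 or later and uses the standard library's
xml.etree.ElementTree.canonicalize (C14N 2.0), chosen here purely for
illustration.

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Two hypothetical records: same information, but different attribute
# order, quoting style, and whitespace inside the start tag.
record_a = '<name rank="species" code="ICZN">Aus bus</name>'
record_b = "<name  code='ICZN'   rank='species' >Aus bus</name>"


def digest(text: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 bytes of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Compared as raw byte streams, the two records are different objects.
raw_equal = digest(record_a) == digest(record_b)

# After canonicalization, both serialize to the same byte stream.
canonical_equal = (
    digest(canonicalize(record_a)) == digest(canonicalize(record_b))
)

print(raw_equal, canonical_equal)
```

Canonicalization only removes serialization differences such as
attribute order and quoting. Any change to the content itself still
produces a different byte stream and would therefore need a new LSID.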

Everything is fine and there is nothing to work out... except how to  
actually use the mechanism.

-- 
--------------------------------------------------------------------
   P. Bryan Heidorn
   Graduate School of Library and Information Science
   University of Illinois at Urbana-Champaign
   pheidorn at uiuc.edu
   (V)217/ 244-7792     (F)217/ 244-3302
   http://www.uiuc.edu/goto/heidorn
   Online Calendar: http://www.uiuc.edu/goto/heidorncalendar


On Jul 13, 2007, at 3:11 PM, Richard Pyle wrote:

>
> Thanks to Ricardo for starting this very timely discussion.  I've been
> following LSIDs for a long time now, and have attended both GBIF GUID
> workshops, and had some very detailed conversations with Ben Szekely about
> this very issue, and I think I have a pretty good handle on it.  And, it's
> really not that complicated.
>
> The byte-stream (bit sequence) for the data of a given LSID cannot change,
> according to the LSID spec.  The "meaning" of the data is irrelevant in this
> context -- what matters is the actual sequence of 1's and 0's.  If you have
> a TIFF image file that represents a 12-megapixel image, and you change one
> bit of one pixel of that image file, you cannot use the same LSID to
> represent it.  If you package it into a ZIP file, that ZIP file is a new
> bytestream and could not be returned as the data for that LSID assigned to
> the TIFF image data object.
>
> If we want to change this specification, then we are not using LSIDs anymore
> -- we are using something like "TDWG identifiers that look an awful lot like
> LSIDs, but really aren't LSIDs".  I think that's the last thing this
> community should do.
>
> The "data" for LSIDs should be an unambiguous digital object.  Species names
> are not digital objects.  They are not even physical objects.  In fact, they
> aren't even text objects (the text string of a species "name", as defined by
> any of the nomenclatural codes, is a property or attribute of the
> name-object -- not the name-object itself).  Species names are "abstract" or
> "conceptual" objects -- with no inherent digital manifestation, and not even
> any inherent physical manifestation.  The LSID spec accommodates such
> objects in the form of "data-less" LSIDs -- that is, LSIDs with zero "data"
> content (only metadata).
>
> Please, let's not get bogged down in alternate definitions of the words
> "data" and "metadata".  I swear, the single greatest impediment to progress
> in biodiversity informatics (by far) in my opinion has been human-language
> semantics.  I had to qualify the word "semantics" in the previous sentence
> with "human-language", because even the very word "semantics" has more than
> one meaning in our conversations (I almost used the word "vocabulary"
> instead of "semantics", but of course that word, too, has another meaning
> within our various conversations).  We could fill a small dictionary with
> words that have more than one meaning in different contexts ("concept",
> "type", "class", "synonym", and worst of all, "name" -- among many others).
>
> So, when we speak of "data" and "metadata" in the context of LSIDs, let us
> please use those words specifically in the context of their well-defined
> meaning as related to LSIDs.
>
> And in this LSID sense of the word "data", many of our objects (taxon names,
> taxon concepts, locality descriptions, specimens, agents, bibliographic
> citations, etc.) simply have no "data", because none of these things have
> any inherent digital manifestation.  We could concatenate what would
> otherwise be LSID-metadata for one of these non-digital objects (e.g., a
> database record) into a single byte-stream, and define this as "data" tied
> to a particular LSID, but then a new LSID would need to be issued every time
> someone wanted to change that bytestream (e.g., convert it from ASCII to
> Unicode, or change the meaning, rendering, or content of one of the
> concatenated metadata elements).  For this, and other reasons, I think this
> is a bad approach.
>
> Instead, I think we should embrace LSIDs *WITH* data (sensu LSID spec) in
> cases where it makes sense to do so (e.g., image files, PDFs, perhaps DNA
> sequences represented as an ASCII character stream or some other specified
> standard binary format), and embrace LSIDs *WITHOUT* data (only metadata)
> -- as accommodated in the LSID spec -- for most of the non-digital objects
> we want to exchange information about (taxon names, taxon concepts,
> locality descriptions, specimens, agents, bibliographic citations, etc.).
>
> Getting back to the intended topic of this discussion (metadata
> persistence), I frankly am very happy that there is no requirement for
> metadata persistence in the LSID spec (if there was a requirement for
> persistence, then you might as well package it all up as data, then use the
> embedded versioning component of LSIDs or some other mechanism for issuing
> new LSIDs that are cross-linked to each other in an appropriate way).
>
> I believe the answer to Ricardo's example is better addressed in the next
> discussion, concerning methods for data versioning.  I think the answer to
> this issue (persistence of metadata) necessarily must be solved via that
> discussion (versioning), so maybe we should discuss the versioning issue
> first.
>
> Aloha,
> Rich
>
> Richard L. Pyle, PhD
> Database Coordinator for Natural Sciences
>   and Associate Zoologist in Ichthyology
> Department of Natural Sciences, Bishop Museum
> 1525 Bernice St., Honolulu, HI 96817
> Ph: (808)848-4115, Fax: (808)847-8252
> email: deepreef at bishopmuseum.org
> http://hbs.bishopmuseum.org/staff/pylerichard.html