Thanks to Ricardo for starting this very timely discussion. I've been following LSIDs for a long time now, have attended both GBIF GUID workshops, and have had some very detailed conversations with Ben Szekely about this very issue, so I think I have a pretty good handle on it. And it's really not that complicated.
The byte-stream (bit sequence) for the data of a given LSID cannot change, according to the LSID spec. The "meaning" of the data is irrelevant in this context -- what matters is the actual sequence of 1's and 0's. If you have a TIFF image file that represents a 12-megapixel image, and you change one bit of one pixel of that image file, you cannot use the same LSID to represent it. If you package it into a ZIP file, that ZIP file is a new byte-stream and could not be returned as the data for the LSID assigned to the TIFF image data object.
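To make the byte-stream point concrete, here is a minimal Python sketch. It assumes a SHA-256 digest as a convenient way to compare two byte-streams (the LSID spec does not require hashing; that is my choice for illustration), and it uses a short made-up byte string as a stand-in for the TIFF file. The function names are hypothetical.

```python
import hashlib
import io
import zipfile


def digest(data: bytes) -> str:
    """Fingerprint of a byte-stream; equal digests mean equal bytes (for practical purposes)."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def zip_bytes(name: str, data: bytes) -> bytes:
    """Package data into an in-memory ZIP archive and return the archive's bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return buf.getvalue()


# Stand-in for the bytes of a TIFF image file (assumption: any bytes will do here).
original = bytes(range(256)) * 4

one_bit_changed = flip_bit(original, 1000)
zipped = zip_bytes("image.tif", original)

# Neither of these is the same byte-stream as the original, so neither
# could be returned as the data for the original's LSID.
print("one bit changed, same bytes?", digest(one_bit_changed) == digest(original))
print("zipped, same bytes?         ", digest(zipped) == digest(original))

# The ZIP still *contains* the original image, which is exactly why
# "meaning" is the wrong test: only the bytes themselves count.
unzipped = zipfile.ZipFile(io.BytesIO(zipped)).read("image.tif")
print("unzipped equals original?   ", unzipped == original)
```

The ZIP case is the instructive one: the content is fully recoverable, yet the byte-stream handed back is different, so it would need its own LSID.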
If we want to change this specification, then we are not using LSIDs anymore -- we are using something like "TDWG identifiers that look an awful lot like LSIDs, but really aren't LSIDs". I think that's the last thing this community should do.
The "data" for LSIDs should be an unambiguous digital object. Species names are not digital objects. They are not even physical objects. In fact, they aren't even text objects (the text string of a species "name", as defined by any of the nomenclatural codes, is a property or attribute of the name-object -- not the name-object itself). Species names are "abstract" or "conceptual" objects -- with no inherent digital manifestation, and not even any inherent physical manifestation. The LSID spec accommodates such objects in the form of "data-less" LSIDs -- that is, LSIDs with zero "data" content (only metadata).
Please, let's not get bogged down in alternate definitions of the words "data" and "metadata". I swear, the single greatest impediment to progress in biodiversity informatics (by far) in my opinion has been human-language semantics. I had to qualify the word "semantics" in the previous sentence with "human-language", because even the very word "semantics" has more than one meaning in our conversations (I almost used the word "vocabulary" instead of "semantics", but of course that word, too, has another meaning within our various conversations). We could fill a small dictionary with words that have more than one meaning in different contexts ("concept", "type", "class", "synonym", and worst of all, "name" -- among many others).
So, when we speak of "data" and "metadata" in the context of LSIDs, let us please use those words specifically in the context of their well-defined meaning as related to LSIDs.
And in this LSID sense of the word "data", many of our objects (taxon names, taxon concepts, locality descriptions, specimens, agents, bibliographic citations, etc.) simply have no "data", because none of these things have any inherent digital manifestation. We could concatenate what would otherwise be LSID-metadata for one of these non-digital objects (e.g., a database record) into a single byte-stream, and define this as "data" tied to a particular LSID, but then a new LSID would need to be issued every time someone wanted to change that byte-stream (e.g., convert it from ASCII to Unicode, or change the meaning, rendering, or content of one of the concatenated metadata elements). For this, and other reasons, I think this is a bad approach.
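A small Python sketch of why the concatenation approach is brittle. Everything here is illustrative: the record fields, the "Aus bus" placeholder name, the `serialize` helper and its key=value format are all assumptions of mine, and UTF-16 stands in for "Unicode" because plain ASCII text encodes to identical bytes in UTF-8.

```python
def serialize(record: dict) -> str:
    """Concatenate a record's fields into one text string (hypothetical format)."""
    return "|".join(f"{key}={record[key]}" for key in sorted(record))


# Hypothetical database record for a non-digital object (a taxon name).
record = {"name": "Aus bus", "rank": "species", "year": "1900"}
text = serialize(record)

# Same text, two encodings: the characters match, the bytes do not.
ascii_stream = text.encode("ascii")
utf16_stream = text.encode("utf-16-le")

# Same encoding, one corrected field: again a different byte-stream.
edited_stream = serialize(dict(record, year="1901")).encode("ascii")

print("re-encoded, same bytes?", ascii_stream == utf16_stream)
print("edited, same bytes?    ", ascii_stream == edited_stream)
```

If the concatenated record were declared to be the "data", both the re-encoding and the one-field correction would each force a new LSID, even though the taxon name being described has not changed at all.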
Instead, I think we should embrace LSIDs *WITH* data (sensu LSID spec) in cases where it makes sense to do so (e.g., image files, PDFs, perhaps DNA sequences represented as an ASCII character stream or some other specified standard binary format), and embrace LSIDs *WITHOUT* data (only metadata) -- as accommodated in the LSID spec -- for most of the non-digital objects we want to exchange information about (taxon names, taxon concepts, locality descriptions, specimens, agents, bibliographic citations, etc.).
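One way to picture the two kinds of LSID is the following Python sketch. The `LsidObject` class, its field names, and the example identifiers under `example.org` are hypothetical; the only things taken from the discussion above are that data is a fixed byte-stream, that a data-less LSID has zero data content, and that metadata is allowed to change.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LsidObject:
    lsid: str
    data: bytes = b""  # fixed byte-stream; zero bytes models a "data-less" LSID
    metadata: Dict[str, str] = field(default_factory=dict)  # free to change

    def has_data(self) -> bool:
        return len(self.data) > 0


# LSID WITH data: an inherently digital object such as an image file.
image = LsidObject(
    lsid="urn:lsid:example.org:images:1",
    data=b"(bytes of the image file would go here)",
    metadata={"format": "TIFF"},
)

# LSID WITHOUT data: a conceptual object such as a taxon name.
taxon_name = LsidObject(
    lsid="urn:lsid:example.org:names:1",
    metadata={"name": "Aus bus", "rank": "species"},
)

# Metadata can be corrected or extended without issuing a new identifier.
taxon_name.metadata["year"] = "1900"

print(image.has_data(), taxon_name.has_data())
```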
Getting back to the intended topic of this discussion (metadata persistence), I frankly am very happy that there is no requirement for metadata persistence in the LSID spec. If there were such a requirement, you might as well package it all up as data, then use the embedded versioning component of LSIDs or some other mechanism for issuing new LSIDs that are cross-linked to each other in an appropriate way.
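For anyone who has not looked at the versioning component closely: an LSID has the form `urn:lsid:<authority>:<namespace>:<object>`, optionally followed by `:<revision>`. The Python sketch below splits an LSID into those parts. It is deliberately minimal (it does no character-level validation), the `Lsid` type and `parse_lsid` function are hypothetical names, and the `example.org` identifiers are made up.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID carries no revision part


def parse_lsid(text: str) -> Lsid:
    """Split an LSID into its components; raise ValueError if the shape is wrong."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(parts[2], parts[3], parts[4], revision)


unversioned = parse_lsid("urn:lsid:example.org:names:12345")
versioned = parse_lsid("urn:lsid:example.org:names:12345:2")

# Same authority, namespace and object; only the revision differs.
print(unversioned)
print(versioned)
```

Note that the two strings are still two distinct LSIDs; any relationship between revisions has to be expressed some other way, which is part of what the versioning discussion needs to settle.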
I believe Ricardo's example is better addressed in the next discussion, concerning methods for data versioning. I think this issue (persistence of metadata) can only be resolved via that discussion (versioning), so maybe we should discuss the versioning issue first.
Aloha, Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html