Sigh. Sorry I did not reply earlier. My time was eaten up with proposal writing and the like.

I agree that we are talking about digital data for the LSID, and I did not intend to insinuate otherwise in my prior message. People are putting Galileo's data online in digital format and do need unique identifiers; it just is not biodiversity data. The issues have been addressed many times and it is important to learn from past experience.

We can say that it is important to be able to ensure the properties of the data service, so that the digesting process can make assumptions about the data. In Java and many other languages there is a bit-level equivalence operator such as "==". This is relevant to the concept of homonyms. Hannu pointed out that it is nice to be able to make assumptions about the nature of the data being delivered. You can, for example, know that you can use "==" in your program and assume it should return true if the service is following the rules (of bit-level immutability). When we say two things are equivalent in these languages we mean "equivalent" under the language's operators.

The LSID getData function is defined in these terms, which is very reasonable for many forms of data, including molecular sequences (except that genetic matching algorithms frequently treat a genetic sequence and its complement as equivalent because they are both halves of the same double helix). So, I would guess that even the molecular community who defined LSID might have people who are unhappy with the current definition.

In some languages we are allowed to overload operators such as "==" with our own definition of equivalence. The language designers did this because people often need different definitions of equivalence, particularly for complex data types. In many programming tasks, bit-level equivalence is not needed and is indeed problematic.
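To make the difference concrete, here is a minimal sketch in Python of bit-level equality next to an overloaded "==". It is only an illustration: the Strand class and the reverse_complement helper are names I made up, nothing here comes from the LSID spec, and I am assuming that "complement" means the reverse complement of the opposite strand.

```python
# Hypothetical illustration only: Strand and reverse_complement are
# invented names, not part of LSID or of any sequence library.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(bases: str) -> str:
    """Return the opposite strand of the double helix, read in reverse."""
    return bases.translate(_COMPLEMENT)[::-1]


class Strand:
    """A DNA strand whose == treats a strand and its complement as equal."""

    def __init__(self, bases: str):
        self.bases = bases.upper()

    def __eq__(self, other):
        if not isinstance(other, Strand):
            return NotImplemented
        # Overloaded equivalence: same strand, or the other half of the helix.
        return other.bases in (self.bases, reverse_complement(self.bases))

    def __hash__(self):
        # Equal objects must hash alike, so hash a canonical form.
        return hash(min(self.bases, reverse_complement(self.bases)))


# Bit-level view: the two byte streams differ, as getData would see them.
bytes_equal = b"ATGC" == b"GCAT"

# Domain view: the two strands describe the same double helix.
strands_equal = Strand("ATGC") == Strand("GCAT")

print(bytes_equal, strands_equal)
```

The byte comparison is the guarantee getData gives today; the overloaded comparison is the kind of rule a separate, looser function would have to spell out.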
So RDF and software such as DOM define equivalence not as bit-level matching of 1's and 0's in a particular order but as a higher-order construct. So, we can have a born-digital object that describes a species of plant.

"<leaf><arrangement>alternate</arrangement><length unit="mm">10</length></leaf>"

is equivalent to

"<leaf><length unit="mm">10</length><arrangement>alternate</arrangement></leaf>"

There are applications in biodiversity informatics where bit-level equivalence is useful, so I support keeping getData's requirement of bit-level equivalence. Other branches of biodiversity informatics, however, would benefit from a different definition of equivalence. This can be handled with an LSID extension as a new function. Who pays for development of this new function is important. We can roll out a more constrained standard with getData as is and add the new getDataRepresentationallyEquivalent later.

So, let's move ahead, adopt LSID and start using it for the cases where bit-level equivalence is acceptable, and either expand it later or develop a different standard to give unique identifiers for the other applications.

--
--------------------------------------------------------------------
P. Bryan Heidorn
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
pheidorn@uiuc.edu
(V)217/ 244-7792 (F)217/ 244-3302
http://www.uiuc.edu/goto/heidorn
Online Calendar: http://www.uiuc.edu/goto/heidorncalendar

On Jul 16, 2007, at 8:17 PM, Richard Pyle wrote:
Hi Bryan,
What is data and what is metadata has no relation to being digital or not. There was data and metadata long before there were computers.
Again, we are coming back to this communication problem. I agree with you in the context of the words "data" and "metadata" as most of us probably define them. But we are talking about LSIDs, and so we should follow the definitions of these words in the context of the LSID spec. It may be terribly unfortunate that the LSID spec defines "data" differently from how most of us would use that word -- just as it is terribly unfortunate that a "named concept" has essentially nothing to do with either a taxon "concept" or a taxon "name", or that a "Class" written in C++ has no relationship to the "Class" Mammalia, or that a data "type" has nothing to do with a "type" specimen, or the fact that all of these "homonyms" cause problems that are different from the sorts of problems created by taxonomic "homonyms" -- among dozens of other frustrating language barriers we have.
However, in the context of LSIDs, which is what we are now discussing, the word "data" does indeed unambiguously refer to a digital/binary bytestream, and *not* the kind of "data" that Galileo collected.
Aloha, Rich