From my perspective several of these issues are pretty clear. Let me attempt answers below and see if we're all in agreement.
Chuck Miller wrote:
I have been pondering this question of what exactly is meant by a GUID since Donald's first call for input.
The term GUID is one we started using in SEEK when looking for a solution to the identity and resolution problems that we saw looming for the Taxonomic Concept Standard. Dave Thau's presentation on this (linked on the GUID wiki) defines this pretty well and explores the issues.
First, is it correct that GUID stands for Globally Unique Identifier?
Second, what do we mean by Globally? What do we mean by Unique?
"globally unique" means simply that an identifier that is issued can only have one valid interpretation across all possible systems. Regardless of the mechanism used to resolve the identifier, the object that the id 'identifies' will be bit-for-bit identical. However, note that a given object can legitimately have more than one identifier.
What do we mean by Identifier? Or, specifically what are we identifying?
This is a bit trickier. We will clearly be identifying several different types of objects, some of which are physical (e.g., specimens), and some of which are digital (e.g., observation data). On the specimen side, resolving the identifier and retrieving the 'data' makes no sense because the 'data' is a physical object that cannot be electronically transported. On the 'digital' side, it makes sense to resolve and retrieve the data. There are some tricky issues dealing with the granularity of the identifier for digital data (does the identifier point at a tuple in an entity, at a whole entity, or at multiple entities?). In addition, you still have the very thorny issue of what is data and what is metadata. I'll write another note regarding this issue.
Matt
I believe we are far from consensus on all three of those definitions, even the meaning of globally. And, I believe we will be going around in circles on LSIDs, ARNs, and such until we get the expectation more clearly defined.
What I hear in most of the discussions so far are descriptions of a GLID (Globally Locatable IDentifier). In the Internet world, GLIDs started with the URL (Uniform Resource Locator), which has evolved into the URI (Uniform Resource Identifier) concept. Another form of URI is the URN (Uniform Resource Name), which enables a persistent name, independent of server location. This is the kind of thing I think we want and are discussing in this GUID thread.
I think we should draw a distinction between GUID and GLID.
An identifier of a thing can be globally unique without stating its location. But, again, it raises the question of what the definition of unique is. An ISBN identifies a book "uniquely", but there may be millions of "unique" copies of it. Similarly, duplicate sheets of a collected plant specimen are all from the same "unique" organism and may each even be referred to by the same "unique" collector and number. But each sheet itself is also unique. We need a clear definition.
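To make the two levels of "unique" concrete, here is a minimal sketch in Python. It is only an illustration, not a proposal: the `Sheet` class, its field names, and the choice of a random UUID as the sheet-level identifier are all assumptions made for this example, and the collector, number, and herbarium values are made up.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sheet:
    """One physical herbarium sheet (hypothetical model for illustration)."""
    collector: str
    number: str
    herbarium: str
    # Assumption: each physical sheet gets its own random identifier.
    sheet_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @property
    def gathering_key(self):
        # Collector plus number identifies the gathering, not the sheet,
        # so duplicate sheets share this key.
        return (self.collector, self.number)


# Two duplicate sheets from the same gathering, held in different places.
sheet_a = Sheet("A. Collector", "1234", "Herbarium A")
sheet_b = Sheet("A. Collector", "1234", "Herbarium B")

same_gathering = sheet_a.gathering_key == sheet_b.gathering_key
same_sheet = sheet_a.sheet_id == sheet_b.sheet_id
```

Both identifiers are "unique", but they answer different questions: `gathering_key` says which collecting event a sheet came from, while `sheet_id` says which physical object is in hand. A definition of GUID would need to say which of these it means.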
An identifier of a place can also be globally unique, like a URL. But, being able to go to that place requires a global infrastructure to handle the addressing.
Where it gets really messy is when we want an identifier of a thing that is unique but can move around to different places, like a URN. The addressing has to work like an administrative assistant who keeps tabs on where the staff is currently located so she can direct phone calls to them. Without the administrative assistant, people who move around can't be contacted. It looks like a lot of what LSID, ARN, and such are about is "administrative assistant" addressing schemes: how to navigate to the entity through layers of address abstraction. But in each case it raises the issue of who/where the administrative assistant is, on top of the question of the addressing scheme itself.
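The "administrative assistant" idea can be sketched as a small lookup table that maps a persistent name to its current location. This is a rough illustration under stated assumptions, not how LSID or any other scheme actually works: the `NameResolver` class, its methods, and the example names and addresses are all hypothetical.

```python
class NameResolver:
    """Hypothetical 'administrative assistant': persistent name -> current location."""

    def __init__(self):
        self._location_by_name = {}

    def register(self, name, location):
        # A name may be issued only once; it never changes afterwards.
        if name in self._location_by_name:
            raise ValueError(f"name already issued: {name}")
        self._location_by_name[name] = location

    def move(self, name, new_location):
        # The thing moved; only the assistant's record is updated.
        if name not in self._location_by_name:
            raise KeyError(name)
        self._location_by_name[name] = new_location

    def resolve(self, name):
        # Callers only ever need the name; the location is looked up on demand.
        return self._location_by_name[name]


resolver = NameResolver()
# Made-up name and addresses, for illustration only.
resolver.register("urn:example:specimen-1", "http://server-a.example.org/specimen/1")
before = resolver.resolve("urn:example:specimen-1")
resolver.move("urn:example:specimen-1", "http://server-b.example.org/records/1")
after = resolver.resolve("urn:example:specimen-1")
```

The name stays the same while the location changes, which is the persistence we want. The sketch also shows the open question: everything depends on one `resolver` instance, so someone has to decide who runs it and where it lives.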
Shouldn't we get these definitions and expectations nailed down first? Then look at solutions?
Chuck Miller Chief Information Officer Missouri Botanical Garden 4344 Shaw Boulevard Saint Louis, Missouri 63119 Phone: 1-314-577-9419 Cell: 1-314-614-6952
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Matt Jones jones@nceas.ucsb.edu Ph: 907-789-0496 National Center for Ecological Analysis and Synthesis (NCEAS) UC Santa Barbara http://www.nceas.ucsb.edu/ecoinformatics ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~