Nozomi Ytow wrote:
Matt Jones wrote:
The term GUID is one we started using in SEEK when looking for a solution to the identity and resolution problems that we saw looming for the Taxonomic Concept Standard. Dave Thau's presentation on this (linked on the GUID wiki) defines this pretty well and explores the issues.
Here we see a typical problem with 'identifier' and 'identity'. Do you mean identity of an object (a unique thing, in which case we don't need an identifier because it is the thing itself) or equivalence of data (there can be multiple data objects having the same value)? Where we need a GUID we can't rely on identity, in my understanding.
I think you may be taking a more literal definition of the word 'identifier' than is often used. For example, in an ER model people frequently use a numeric value as a surrogate for a compound primary key and call that numeric value the 'identifier'. What they mean is that the numeric value can be used as a proxy for the compound key values, and that either of these uniquely points to a single tuple in the entity. That they call the surrogate an 'identifier' does not imply that it is equivalent to the whole tuple, only that it is a proxy or surrogate for the tuple. I believe this is the sense in which we have been using the term 'identifier' in these GUID discussions.
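As a minimal sketch of the surrogate-key idea, the following uses Python's built-in sqlite3 module. The table and column names (specimen, specimen_id, institution, collection, catalog_number) and the sample values are hypothetical, chosen only for illustration; the point is that the numeric surrogate and the compound key both select the same single tuple, while neither is the tuple itself.

```python
import sqlite3

# Hypothetical schema: a specimen is naturally keyed by the compound
# (institution, collection, catalog_number); specimen_id is the surrogate.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE specimen (
        specimen_id    INTEGER PRIMARY KEY,
        institution    TEXT NOT NULL,
        collection     TEXT NOT NULL,
        catalog_number TEXT NOT NULL,
        taxon          TEXT,
        UNIQUE (institution, collection, catalog_number)
    )
""")
conn.execute(
    "INSERT INTO specimen (institution, collection, catalog_number, taxon) "
    "VALUES (?, ?, ?, ?)",
    ("XYZ", "Herps", "12345", "Rana pipiens"),
)

# Look the tuple up by its surrogate 'identifier' ...
by_surrogate = conn.execute(
    "SELECT * FROM specimen WHERE specimen_id = ?", (1,)
).fetchone()

# ... and by the compound key it stands in for.
by_compound = conn.execute(
    "SELECT * FROM specimen "
    "WHERE institution = ? AND collection = ? AND catalog_number = ?",
    ("XYZ", "Herps", "12345"),
).fetchone()

# Both lookups point at the same single tuple.
print(by_surrogate == by_compound)
```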
"globally unique" means simply that an identifier that is issued can only have one valid interpretation across all possible systems.
What do you mean by valid? Suppose there is a data object in a data provider's database. A GBIF portal holds a copy made when a user last accessed the data object. The data provider then changes its contents for some reason after that last access through the GBIF portal. What is the valid interpretation of these data objects? The provider's one?
By definition, the provider should have no capability to change the bits behind an identifier -- doing so would eliminate the persistence and replicability properties. Any changes or corrections to the bits must at least be accompanied by a version increment in the ID, or we lose all of the advantages of having the ID.
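One way to picture the version-increment rule is a store that never overwrites the bits behind an issued identifier and instead issues a new versioned identifier for every change. This is only a sketch under assumptions: the VersionedStore class, the "base.version" identifier syntax, and the example identifier are all hypothetical and are not taken from any GUID proposal discussed here.

```python
class VersionedStore:
    """Toy store: bits behind an issued identifier never change."""

    def __init__(self):
        self._objects = {}  # full identifier -> immutable bytes
        self._latest = {}   # base identifier -> latest version number

    def publish(self, base_id: str, data: bytes) -> str:
        """Store data under a new version of base_id and return the full id."""
        version = self._latest.get(base_id, 0) + 1
        full_id = f"{base_id}.{version}"  # hypothetical 'base.version' syntax
        self._objects[full_id] = bytes(data)
        self._latest[base_id] = version
        return full_id

    def resolve(self, full_id: str) -> bytes:
        """Return exactly the bits that were published under full_id."""
        return self._objects[full_id]


store = VersionedStore()
first_id = store.publish("example:obs-42", b"count=10")
# A correction does not replace the old bits; it gets a new identifier.
second_id = store.publish("example:obs-42", b"count=11")

print(first_id, store.resolve(first_id))
print(second_id, store.resolve(second_id))
```

A portal that cached the object under the first identifier still holds a valid copy of that identifier's content; the corrected content is simply a different, newer identifier.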
Regardless of the mechanism used to resolve the identifier, the object that the id 'identifies' will be bit-for-bit identical.
So you mean equivalence, not identity. If it is bit-for-bit equivalence, why do you need a GUID? The content IS the GUID as you defined it.
I need the GUID because I should not have to download a 1GB hyperspectral image to compare it against my own collection to see if I already have it. The GUID serves as a convenient handle or proxy for the content, which is very useful in many situations, not the least of which is collections management for digital data.
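The collections-management use can be sketched as a comparison of identifiers alone, with no content transferred. The function name ids_to_fetch and the identifier strings are hypothetical, and the sketch assumes that equal identifiers imply bit-for-bit identical content, as described above.

```python
def ids_to_fetch(remote_ids, local_ids):
    """Return the remote identifiers not already held locally, in order."""
    held = set(local_ids)
    return [rid for rid in remote_ids if rid not in held]


# Hypothetical identifiers for large images; only the short handles are
# compared, never the image content itself.
local_holdings = {"example:image.1", "example:image.2"}
remote_catalog = ["example:image.1", "example:image.3"]

missing = ids_to_fetch(remote_catalog, local_holdings)
print(missing)
```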
There are some tricky issues dealing with granularity of the identifier for digital data (does the identifier point at a tuple in an entity, or at a whole entity, or at multiple entities).
Do you mean your bit-for-bit GUID also requires a scope disambiguator? Isn't it assigned to a data object, i.e. a unit to be handled as a chunk?
Yes, it is assigned to a data object. However, the different applications that I can conceive of would want to use differently grained objects. As Peter pointed out, the choice of the level at which you assign an identifier is application-dependent, and we have many applications in mind as far as I can see. So we have the conundrum -- which applications represent our use-cases for deciding the granularity at which we assign identifiers? Specimen-collection management applications? Ecological data management applications? Taxonomic name resolution? Time-series analysis of taxonomic observation data across worldwide sites? Species-prediction modeling based on specimen collection data? Species-prediction modeling based on ecological survey data? The list goes on and on, and each may have different application requirements for the granularity of the identifiers.
It may be better to use other words such as 'global disambiguator' or 'distinguisher', because we do not mean identity by identifier.
I think we do mean identity. The identifier is a label that can be used as a handle to reference the object.
Matt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matt Jones
jones@nceas.ucsb.edu
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara
http://www.nceas.ucsb.edu/ecoinformatics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~