What do we mean by GUID?
jones at NCEAS.UCSB.EDU
Wed Oct 12 09:14:55 CEST 2005
Nozomi Ytow wrote:
> Matt Jones wrote:
>>The term GUID is one we started using in SEEK when looking for a
>>solution to the identity and resolution problems that we saw looming for
>>the Taxonomic Concept Standard. Dave Thau's presentation on this
>>(linked on the GUID wiki) defines this pretty well and explores the issues.
> Here we see a typical problem with identifiers and identity. Do you
> mean identity of an object (a unique thing, so we don't need
> identifier because it is the thing) or equivalence of data (there can
> be multiple data objects having the same value)? Where we need GUID
> we can't rely on identity, in my understanding.
I think you may be taking a more literal definition of the word
'identifier' than is often used. For example, in an ER model people
frequently use a numeric value as a surrogate for a compound primary key
and call that numeric value the 'identifier'. What they mean is that
the numeric value can be used as a proxy for the compound key values, and
that either of these uniquely points to a single tuple in the entity.
That they call the surrogate an 'identifier' does not imply that it is
equivalent to the whole tuple, only that it is a proxy or surrogate for
the tuple. I believe this is the sense in which we have been using the
term 'identifier' in these GUID discussions.
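To make the surrogate-key sense of "identifier" concrete, here is a minimal sketch assuming a hypothetical specimen table whose compound key is (institution, collection, catalog number). The class and method names are illustrative only. The surrogate integer stands in for the compound key in both directions, but it is not the tuple itself.

```python
from itertools import count
from typing import Dict, Tuple

# Hypothetical compound primary key: (institution, collection, catalog number).
CompoundKey = Tuple[str, str, str]


class SurrogateRegistry:
    """Illustrative registry mapping each distinct compound key to a numeric surrogate."""

    def __init__(self) -> None:
        self._next = count(1)
        self._by_key: Dict[CompoundKey, int] = {}
        self._by_id: Dict[int, CompoundKey] = {}

    def identifier_for(self, key: CompoundKey) -> int:
        # Reuse an existing surrogate so one compound key never gets two identifiers.
        if key not in self._by_key:
            new_id = next(self._next)
            self._by_key[key] = new_id
            self._by_id[new_id] = key
        return self._by_key[key]

    def key_for(self, surrogate: int) -> CompoundKey:
        # The surrogate points back to exactly one compound key, hence one tuple.
        return self._by_id[surrogate]
```

Either the integer or the compound key selects the same single row, which is all that calling the integer an "identifier" claims.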
>>"globally unique" means simply that an identifier that is issued can
>>only have one valid interpretation across all possible systems.
> What do you mean by valid? Suppose a data object is in a data provider's
> database. A GBIF portal keeps its copy from when a user last accessed the
> data object. The data provider changes its contents for some reason
> after the last access through the GBIF portal. What is the
> valid interpretation of these data objects? The provider's one?
By definition, the provider should have no capability to change the bits
behind an identifier -- allowing such changes would eliminate the
persistence and replicability properties. Any change or correction to the
bits must at least be accompanied by a version increment in the ID, or we
lose all of the advantages of having the ID.
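One way to picture the "version increment" rule is a store with no overwrite operation. This is a hypothetical sketch, not a proposed GUID scheme: identifiers are assumed to be (id, version) pairs, and a correction can only be published as a new version, so the bytes behind any existing pair never change.

```python
from typing import Dict, Tuple


class VersionedStore:
    """Hypothetical store in which the bytes behind an (id, version) pair are immutable."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, int], bytes] = {}
        self._latest: Dict[str, int] = {}

    def publish(self, object_id: str, data: bytes) -> Tuple[str, int]:
        # Every publication, including a correction, receives the next version number.
        version = self._latest.get(object_id, 0) + 1
        self._objects[(object_id, version)] = data
        self._latest[object_id] = version
        return object_id, version

    def resolve(self, object_id: str, version: int) -> bytes:
        # There is deliberately no method that overwrites an existing (id, version).
        return self._objects[(object_id, version)]
```

Under this assumption, the portal's cached copy in the scenario above remains a valid interpretation of version 1, while the provider's corrected data is a distinct version 2.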
>>Regardless of the mechanism used to resolve the identifier, the object
>>that the id 'identifies' will be bit-for-bit identical.
> So you mean equivalence, not identity. If it is bit-for-bit
> equivalence, why do you need a GUID? The contents ARE the GUID
> you defined.
I need the GUID because I should not have to download a 1GB
hyperspectral image to compare it against my own collection to see if I
already have it. The GUID serves as a convenient handle or proxy for
the content, which is very useful in many situations, not the least of
which is collections management for digital data.
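The collections-management use can be sketched as follows, assuming a hypothetical local catalog keyed by opaque GUID strings. The post does not say how GUIDs are minted. Recording a SHA-256 checksum alongside each GUID is an added assumption, shown only as one way to confirm that a copy obtained from any resolver is bit-for-bit the registered object. The digest is not used as the identifier.

```python
import hashlib
from typing import Dict


class LocalCollection:
    """Hypothetical catalog: GUID -> SHA-256 digest of the bytes that GUID names."""

    def __init__(self) -> None:
        self._digests: Dict[str, str] = {}

    def add(self, guid: str, data: bytes) -> None:
        self._digests[guid] = hashlib.sha256(data).hexdigest()

    def already_have(self, guid: str) -> bool:
        # Answered from the catalog alone; the (possibly 1 GB) object is never fetched.
        return guid in self._digests

    def verify(self, guid: str, data: bytes) -> bool:
        # True only if the supplied bytes match those registered under this GUID.
        return self._digests.get(guid) == hashlib.sha256(data).hexdigest()
```

The membership check compares short handles rather than gigabytes of content, which is the convenience the GUID provides.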
>>There are some tricky issues
>>dealing with granularity of the identifier for digital data (does the
>>identifier point at a tuple in an entity, or at a whole entity, or at
> Do you mean your bit-for-bit GUID also requires a scope disambiguator?
> Isn't it assigned to a data object, i.e. unit to be handled as a
Yes, it is assigned to a data object. However, different applications
that I can conceive of would want to use differently grained objects. As
Peter pointed out, the level at which you assign an identifier is
application-dependent, and we have many applications in mind as far as I
can see. So we have the conundrum -- which applications represent our
use-cases for deciding the granularity at which we assign identifiers?
Specimen-collection management applications? Ecological data management
applications? Taxonomic name resolution? Time-series analysis of
taxonomic observation data across worldwide sites? Species-prediction
modeling based on specimen collection data? Species-prediction modeling
based on ecological survey data? The list goes on and on, and each may
have different application requirements for the granularity of the
> It may be better to use other words, such as globally unique disambiguator
> or distinguisher, because we do not mean identity by identifier.
I think we do mean identity. The identifier is a label that can be used
as a handle to reference the object.
jones at nceas.ucsb.edu
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara http://www.nceas.ucsb.edu/ecoinformatics