What do we mean by GUID?

Matt Jones jones at NCEAS.UCSB.EDU
Wed Oct 12 09:14:55 CEST 2005


Nozomi Ytow wrote:
> Matt Jones wrote:
>
>
>>The term GUID is one we started using in SEEK when looking for a
>>solution to the identity and resolution problems that we saw looming for
>>the Taxonomic Concept Standard.  Dave Thau's presentation on this
>>(linked on the GUID wiki) defines this pretty well and explores the issues.
>
>
> Here we see a typical trouble with identifier and identity.  Do you
> mean identity of an object (a unique thing, so we don't need
> identifier because it is the thing) or equivalence of data (there can
> be multiple data objects having the same value)?  Where we need GUID
> we can't rely on identity, in my understanding.
I think you may be taking a more literal definition of the word
'identifier' than is often used.  For example, in an ER model people
frequently use a numeric value as a surrogate for a compound primary key
and call that numeric value the 'identifier'.  What they mean is that
the numeric value can be used as a proxy for the compound key values, and
that either of these uniquely points to a single tuple in the entity.
That they call the surrogate an 'identifier' does not imply that it is
equivalent to the whole tuple, only that it is a proxy or surrogate for
the tuple.  I believe this is the sense in which we have been using the
term 'identifier' in these GUID discussions.
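
As a minimal sketch of that surrogate-key usage, here is a Python example
using the standard sqlite3 module.  The table name, column names, and
sample values are hypothetical and chosen only for illustration; they do
not come from any schema discussed on this list.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE specimen (
        id             INTEGER PRIMARY KEY,  -- numeric surrogate 'identifier'
        institution    TEXT NOT NULL,
        collection     TEXT NOT NULL,
        catalog_number TEXT NOT NULL,
        UNIQUE (institution, collection, catalog_number)  -- compound key
    )
    """
)

compound_key = ("XYZ", "Birds", "A-42")
cur = conn.execute(
    "INSERT INTO specimen (institution, collection, catalog_number) "
    "VALUES (?, ?, ?)",
    compound_key,
)
surrogate = cur.lastrowid

# Either the surrogate or the compound key points to the same single tuple.
by_id = conn.execute(
    "SELECT institution, collection, catalog_number FROM specimen WHERE id = ?",
    (surrogate,),
).fetchone()
by_key = conn.execute(
    "SELECT id FROM specimen "
    "WHERE institution = ? AND collection = ? AND catalog_number = ?",
    compound_key,
).fetchone()
```

The surrogate is not the tuple itself; it is only a shorter handle that
resolves to the same row as the compound key.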

>
>
>
>>"globally unique" means simply that an identifier that is issued can
>>only have one valid interpretation across all possible systems.
>
>
> What do you mean by valid?  Suppose there is a data object in a data
> provider's database.  A GBIF portal holds a copy made when a user last
> accessed the data object.  The data provider changes its contents for
> some reason after the last access through the GBIF portal.  What is the
> valid interpretation of these data objects?  The provider's one?
By definition, the provider should have no capability to change the bits
behind an identifier -- doing so would eliminate the persistence and
replicability properties.  Any changes or corrections to the bits must be
at least accompanied by a version increment in the ID or we lose all of
the advantages of having the ID.
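
A rough sketch of that rule in Python follows.  The Registry class, its
method names, and the "base.version" form of the identifier are all
assumptions made for illustration, not part of any GUID scheme proposed
here.

```python
class Registry:
    """Toy registry: the bits behind an issued identifier never change."""

    def __init__(self):
        self._objects = {}  # full identifier "base.version" -> bytes
        self._latest = {}   # base -> highest version issued so far

    def publish(self, base, data):
        """Store data under a new version of base and return the full ID."""
        version = self._latest.get(base, 0) + 1
        self._latest[base] = version
        guid = f"{base}.{version}"
        self._objects[guid] = bytes(data)
        return guid

    def resolve(self, guid):
        """Return the exact bytes that were published under guid."""
        return self._objects[guid]


registry = Registry()
first = registry.publish("specimen-123", b"Puma concolor")
# A correction gets a new version; the old identifier still resolves
# to the original bits.
second = registry.publish("specimen-123", b"Puma concolor (corrected)")
```

There is deliberately no update method: the only way to change content is
to publish it under a new identifier.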

>
>
>>Regardless of the mechanism used to resolve the identifier, the object
>>that the id 'identifies' will be bit-for-bit identical.
>
>
> So you mean equivalence, not identity.  If it is bit-for-bit
> equivalence, why do you need GUID?  The contents IS the GUID
> you defined.
I need the GUID because I should not have to download a 1GB
hyperspectral image to compare it against my own collection to see if I
already have it.  The GUID serves as a convenient handle or proxy for
the content, which is very useful in many situations, not the least of
which is collections management for digital data.
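
To illustrate, here is a small Python sketch of checking a remote listing
against a local collection by identifier alone.  The function name and the
sample identifiers are hypothetical, and the comparison is only sound if
the bit-for-bit guarantee described above holds.

```python
def needs_download(remote_ids, local_ids):
    """Return the remote identifiers not already held locally, in order."""
    held = set(local_ids)
    return [guid for guid in remote_ids if guid not in held]


# Hypothetical identifiers; no image content is transferred to compare them.
local = ["img-0001.1", "img-0002.1"]
remote = ["img-0001.1", "img-0002.2", "img-0003.1"]
missing = needs_download(remote, local)
```

Only the objects named in missing would need to be fetched; everything
else is already known to be identical to what is held locally.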

>
>
>
>>There are some tricky issues
>>dealing with granularity of the identifier for digital data (does the
>>identifier point at a tuple in an entity, or at a whole entity, or at
>>multiple entities).
>
>
> Do you mean your bit-for-bit GUID requires a scope disambiguator also?
> Isn't it assigned to a data object, i.e. unit to be handled as a
> chunk?
Yes, it is assigned to a data object.  However, different applications
that I can conceive would want to use differently grained objects.  As
Peter pointed out, the choice of what level you assign an identifier is
application-dependent, and we have many applications in mind as far as I
can see.  So we have the conundrum -- which applications represent our
use-cases for deciding the granularity at which we assign identifiers?
Specimen-collection management applications?  Ecological data management
applications?  Taxonomic name resolution?  Time-series analysis of
taxonomic observation data across worldwide sites?  Species-prediction
modeling based on specimen collection data?  Species-prediction modeling
based on ecological survey data?  The list goes on and on, and each may
have different application requirements for the granularity of the
identifiers.

>
>
> It may be better to use other words such as global disambiguator
> or distinguisher, because we do not mean identity by identifier.
I think we do mean identity.  The identifier is a label that can be used
as a handle to reference the object.

Matt

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matt Jones
jones at nceas.ucsb.edu
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara     http://www.nceas.ucsb.edu/ecoinformatics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



