Re: Topic 1: What do we mean by "GUID"?

12 Oct 2005

      --- Kevin Richards <RichardsK@LANDCARERESEARCH.CO.NZ>
wrote:
...
From my computer-oriented viewpoint I consider a
GUID in our discussion domain to be an identifier
(a character string that represents an object) that
"points" to a particular record in a database or
file on a computer.  The idea is that the ID is
globally unique - ie there is no other identifier in
the world that is the same, but this is not easy to
guarantee.  I think the main aim here is to ensure
it is unique within the domain for which it was
intended (and this is the main reason for using an
existing GUID system such as ARK).
I think my main point here is that the GUID must
represent a digital object (eg database record) and
cannot represent a physical object (ie you cannot
transfer the physical object via the Internet).  A
record in a database may refer to a physical object,
however the GUID will refer to the database record
and the physical object will be "described" in the
database record and referred to perhaps by a
physical address/location.
I tend to agree with this suggestion that GUIDs
should apply to the records served through the
GBIF/TDWG network. Physical objects, for example the
specimens in an institution, already have their own
identifiers, which are not GUIDs and serve mainly for
internal use. These identifiers are already used in
DarwinCore or ABCD data served to GBIF to link to
physical objects or to additional information
available elsewhere.
...
GUIDs should be assigned to any record/file/etc that
will be served up to external users.
Could GBIF/TDWG in this context not play the role of a
ticketing service, serving "GUIDs" for the UNIT level
data to the providers?

For example if a provider wants to provide 2000 new
records to GBIF, 2000 unique identifiers (like
accession numbers) will be assigned to his 2000
records?
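
A minimal sketch of the ticketing idea raised above, assuming a
single central issuer that hands out sequential blocks of
identifiers. The class name `TicketService`, the `example-guid`
prefix and the provider labels are hypothetical and are not taken
from any GBIF or TDWG system.

```python
import threading


class TicketService:
    """Hypothetical central issuer of identifier blocks."""

    def __init__(self, prefix):
        self._prefix = prefix
        self._next = 1                  # next unissued serial number
        self._lock = threading.Lock()   # guards concurrent requests
        self.issued = []                # log of (provider, first, last)

    def issue(self, provider, count):
        """Reserve `count` consecutive serials for `provider`."""
        if count < 1:
            raise ValueError("count must be at least 1")
        with self._lock:
            start = self._next
            self._next += count
            self.issued.append((provider, start, start + count - 1))
        return [f"{self._prefix}:{n}" for n in range(start, start + count)]


service = TicketService("example-guid")
batch = service.issue("provider-a", 2000)
print(len(batch), batch[0], batch[-1])
```

Because every serial is reserved under one lock, two providers can
never receive the same identifier. Uniqueness here depends entirely
on there being one issuer, which is the design question the
paragraph above is asking.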

The interlinking and localisation business could then
be dealt with at another level using the
"description" of the record.

Regards

Pat
...
I think the ARK article does cover most of the
issues surrounding GUIDs, except implementation
specific issues such as who the authorities should
be and what form/granularity the data to be served
up should be in.  I still favour LSIDs where the
resolution of an LSID works in well with the DNS
system, and perhaps because they are actually
intended for the life sciences domain.  The "problem
definition" of the GUID as described below in
Donald's email seems to sum up the requirements of a
GUID to me.  I'm not sure that "statements of
commitment" are a job for the GUID itself, but they
should be implied.  Implied commitments for LSIDs
include byte-identical data every time and infinite
persistence of the data (a big ask I know).
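
To illustrate the point about LSIDs and DNS, here is a minimal
sketch that splits an LSID into its parts and builds the DNS SRV
name a resolver would look up. It assumes the usual
`urn:lsid:authority:namespace:object[:revision]` layout and the
`_lsid._tcp.<authority>` SRV convention as I understand them from
the LSID specification. The example identifier and the helper names
are hypothetical, and no network lookup is performed.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str           # DNS name of the issuing authority
    namespace: str
    object_id: str
    revision: Optional[str]  # None when no revision is given


def parse_lsid(text):
    """Split an LSID string into its components."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


def srv_query_name(lsid):
    """DNS SRV name to query for the authority's resolution service."""
    return f"_lsid._tcp.{lsid.authority}"


lsid = parse_lsid("urn:lsid:example.org:specimens:12345")
print(lsid, srv_query_name(lsid))
```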
Kevin
...
...
...
dhobern@GBIF.ORG 12/10/2005 3:37 a.m. >>>
[ I will be trying to provide some structure to
discussions in this mailing list by raising specific
topics and looking for comments.  Please keep the
Topic number in responses ]

Topic 1: What do we mean by GUID?

The most fundamental thing that we need to
establish as we consider a GUID implementation is a
definition for "GUID" in this context.  We have been
using a number of terms to describe the identifiers
we need (unique, resolvable, persistent, etc.).

I've been spending some time following up on Rod
Page's recommendation that we consider the use of
Archival Resource Keys (ARK) from the California
Digital Library (see
http://wiki.gbif.org/guidwiki/wikka.php?wakka=ARK).
The CDL web site includes an excellent overview of
this GUID model, which also serves as an excellent
introduction to the issues involved.  I would urge
you all to read this document (it's only nine pages
long!):

http://www.cdlib.org/inside/diglib/ark/arkcdl.pdf

This document arrives at the following problem
definition for persistent, actionable identifiers:

The goal: long-term actionable identifiers.

Requirement: that identifiers deliver you to objects
(where feasible).

Requirement: that identifiers deliver you to object
metadata.

Desirable: each object should wear its own
identifier.

Requirement: that identifiers deliver you to
statements of commitment.

The problem: URLs break for some objects (that is,
associations between URLs and objects are not
maintained), and we have no way to tell which ones
will or won't break.

Why URLs break: because objects are moved, removed,
and replaced - completely normal activities - and the
provider in each case demonstrates insufficient
commitment to update indirection tables, or to plan
identifier assignment carefully. Persistence is in
the mission of few organizations.

Conventional hypothesis: use indirect names (PURLs,
URNs, Handles) instead of URLs; what worked for DNS
should work for digital object references.  Wrong.
Indirection is spectacularly successful and elegant
in DNS, but it's a side issue in the provision of
digital object persistence.

This document clearly identifies issues around
provider service commitments as the key problem that
needs solving.  The construction of ARKs seeks to
address this in a couple of ways.  It separates the
role of Name Assigning Authority (i.e. who initially
assigns the identifier) from that of the Name Mapping
Authority (i.e. who is able to map the identifier to
the data object at any particular time).  It also
defines a simple standard relationship between three
things: the data object, the metadata for the object,
and a commitment statement from the provider as to
what aspects of persistence are guaranteed.

ARK is a technology that we have not really
considered up to this point.  My question for
discussion is what, if anything, is missing or wrong
about the problem definition provided in this
document?  If we agree that it provides a crisp
definition of what we need, that in itself will be a
major step forward.

Please provide your thoughts.

Donald
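
As a rough illustration of the ARK model described in the message
above, the sketch below separates the Name Mapping Authority (the
host part, which may change) from the Name Assigning Authority
number and the name (which should not), and derives the three
related addresses for object, metadata and commitment statement. It
assumes the `ark:/NAAN/Name` layout and the `?` and `??` suffixes as
I recall them from the CDL document. The URLs and helper names are
hypothetical.

```python
from typing import NamedTuple


class Ark(NamedTuple):
    nma: str    # Name Mapping Authority: current resolver, replaceable
    naan: str   # Name Assigning Authority Number: fixed at assignment
    name: str   # name assigned by that authority


def parse_ark(url):
    """Split an ARK URL into mapping authority, NAAN and name."""
    marker = "ark:/"
    idx = url.find(marker)
    if idx == -1:
        raise ValueError(f"no ARK label in {url!r}")
    nma = url[:idx].rstrip("/")
    rest = url[idx + len(marker):].rstrip("?")
    naan, sep, name = rest.partition("/")
    if not sep or not naan or not name:
        raise ValueError(f"incomplete ARK in {url!r}")
    return Ark(nma, naan, name)


def ark_urls(ark, nma=None):
    """Addresses for the object, its metadata and the commitment."""
    base = f"{nma or ark.nma}/ark:/{ark.naan}/{ark.name}"
    return {
        "object": base,
        "metadata": base + "?",      # brief metadata record
        "commitment": base + "??",   # provider's commitment statement
    }


ark = parse_ark("http://example.org/ark:/12345/abc678")
print(ark_urls(ark))
print(ark_urls(ark, nma="http://mirror.example.net")["object"])
```

The second call shows the point of the separation: the mapping
authority can be swapped without touching the assigned part of the
identifier.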
---------------------------------------------------------------
...
Donald Hobern (dhobern@gbif.org)
Programme Officer for Data Access and Database
Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483   Mobile: +45-28751483   Fax:
+45-35321480
---------------------------------------------------------------
...
...

Patricia Mergen