Topic 1: What do we mean by "GUID"?

Kevin Richards RichardsK at LANDCARERESEARCH.CO.NZ
Thu Oct 13 10:38:58 CEST 2005


From my computer-oriented viewpoint I consider a GUID in our discussion domain to be an identifier (a character string that represents an object) that "points" to a particular record in a database or file on a computer.  The idea is that the ID is globally unique - ie there is no other identifier in the world that is the same, but this is not easy to guarantee.  I think the main aim here is to ensure it is unique within the domain for which it was intended (and this is the main reason for using an existing GUID system such as ARK).
 
I think my main point here is that the GUID must represent a digital object (eg database record) and cannot represent a physical object (ie you cannot transfer the physical object via the Internet).  A record in a database may refer to a physical object; however, the GUID will refer to the database record, and the physical object will be "described" in the database record and referred to perhaps by a physical address/location.
 
GUIDs should be assigned to any record/file/etc that will be served up to external users.
 
I think the ARK article does cover most of the issues surrounding GUIDs, except implementation-specific issues such as who the authorities should be and what form/granularity the data to be served up should be in.  I still favour LSIDs, because the resolution of an LSID works in well with the DNS system, and perhaps because they are actually intended for the life sciences domain.  The "problem definition" of the GUID as described below in Donald's email seems to sum up the requirements of a GUID to me.  I'm not sure that "statements of commitment" are a job for the GUID itself, but they should be implied.  Implied commitments for LSIDs include byte-identical data every time and infinite persistence of the data (a big ask I know).
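
As a rough illustration of why LSID resolution fits well with DNS, here is a minimal Python sketch. It assumes the usual LSID layout, urn:lsid:authority:namespace:object with an optional revision, where the authority is normally a DNS domain name and a resolver locates the authority's service through a DNS SRV lookup. The names Lsid, parse_lsid and srv_query_name, and the example identifier, are hypothetical and chosen only for this sketch; no network lookup is performed.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str            # usually a DNS domain name owned by the issuer
    namespace: str            # grouping chosen by the authority
    object_id: str            # identifier of the record within the namespace
    revision: Optional[str]   # optional version component


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(p == "" for p in parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def srv_query_name(lsid: Lsid) -> str:
    """Name of the DNS SRV record a resolver would query for this authority."""
    return f"_lsid._tcp.{lsid.authority}"


# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:specimens:12345")
print(example.authority, srv_query_name(example))
```

The point of the sketch is only that the authority part of the identifier doubles as the DNS name used to find the resolving service; the persistence and byte-identity commitments are promises made by the provider and are not visible in the identifier itself.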
 
Kevin


>>> dhobern at GBIF.ORG 12/10/2005 3:37 a.m. >>>

[ I will be trying to provide some structure to discussions in this mailing list by raising specific topics and looking for comments.  Please keep the Topic number in responses ]

Topic 1: What do we mean by GUID?

The most fundamental thing that we need to establish as we consider a GUID implementation is a definition for "GUID" in this context.  We have been using a number of terms to describe the identifiers we need (unique, resolvable, persistent, etc.).

I've been spending some time following up on Rod Page's recommendation that we consider the use of Archival Resource Keys (ARK) from the California Digital Library (see http://wiki.gbif.org/guidwiki/wikka.php?wakka=ARK).  The CDL web site includes an excellent overview of this GUID model, which also serves as an excellent introduction to the issues involved.  I would urge you all to read this document (it's only nine pages long!):

http://www.cdlib.org/inside/diglib/ark/arkcdl.pdf

This document arrives at the following problem definition for persistent, actionable identifiers:

1. The goal: long-term actionable identifiers.
   a. Requirement: that identifiers deliver you to objects (where feasible).
   b. Requirement: that identifiers deliver you to object metadata.
   c. Desirable: each object should wear its own identifier.
   d. Requirement: that identifiers deliver you to statements of commitment.
2. The problem: URLs break for some objects (that is, associations between URLs and objects are not maintained), and we have no way to tell which ones will or won't break.
3. Why URLs break: because objects are moved, removed, and replaced - completely normal activities - and the provider in each case demonstrates insufficient commitment to update indirection tables, or to plan identifier assignment carefully. Persistence is in the mission of few organizations.
4. Conventional hypothesis: use indirect names (PURLs, URNs, Handles) instead of URLs; what worked for DNS should work for digital object references.  Wrong. Indirection is spectacularly successful and elegant in DNS, but it's a side issue in the provision of digital object persistence.

This document clearly identifies issues around provider service commitments as the key problem that needs solving.  The construction of ARKs seeks to address this in a couple of ways.  It separates the role of Name Assigning Authority (i.e. who initially assigns the identifier) from that of the Name Mapping Authority (i.e. who is able to map the identifier to the data object at any particular time).  It also defines a simple standard relationship between three things: the data object, the metadata for the object, and a commitment statement from the provider as to what aspects of persistence are guaranteed.

ARK is a technology that we have not really considered up to this point.  My question for discussion is what, if anything, is missing or wrong about the problem definition provided in this document?  If we agree that it provides a crisp definition of what we need, that in itself will be a major step forward.

Please provide your thoughts.

Donald
 
---------------------------------------------------------------
Donald Hobern (dhobern at gbif.org)
Programme Officer for Data Access and Database Interoperability 
Global Biodiversity Information Facility Secretariat 
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
--------------------------------------------------------------- 
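
The separation of Name Assigning Authority from Name Mapping Authority described in the message above can be sketched in a few lines of Python. The sketch assumes the ARK layout from the CDL document, http://NMA/ark:/NAAN/Name, and the convention that appending "?" asks for metadata and "??" asks for the provider's commitment statement. The class and function names, the host names and the identifier are all hypothetical, and 12345 is used only as a placeholder assigning-authority number.

```python
from typing import NamedTuple


class Ark(NamedTuple):
    nma: str    # Name Mapping Authority: host that currently resolves the ARK
    naan: str   # Name Assigning Authority Number: who originally minted the name
    name: str   # the name the assigning authority gave to the object


def parse_ark(url: str) -> Ark:
    """Split 'http://NMA/ark:/NAAN/Name' into mapping authority, NAAN and name."""
    prefix, sep, rest = url.partition("/ark:/")
    if not sep:
        raise ValueError(f"no 'ark:/' label in {url!r}")
    nma = prefix.split("://", 1)[-1]      # drop the scheme, keep the host
    naan, _, name = rest.partition("/")
    if not nma or not naan or not name:
        raise ValueError(f"incomplete ARK: {url!r}")
    return Ark(nma, naan, name)


def core_id(ark: Ark) -> str:
    """The persistent part of the identifier, independent of who serves it."""
    return f"ark:/{ark.naan}/{ark.name}"


def object_url(ark: Ark) -> str:
    return f"http://{ark.nma}/{core_id(ark)}"


def metadata_url(ark: Ark) -> str:
    return object_url(ark) + "?"       # request for object metadata


def commitment_url(ark: Ark) -> str:
    return object_url(ark) + "??"      # request for the commitment statement


def rehost(ark: Ark, new_nma: str) -> Ark:
    """Move resolution to another mapping authority; the core id is unchanged."""
    return ark._replace(nma=new_nma)


original = parse_ark("http://example.org/ark:/12345/x54xz321")
moved = rehost(original, "archive.example.net")
print(core_id(original) == core_id(moved))
print(metadata_url(moved))
```

The design point is that only the host in front of "ark:/" changes when responsibility for serving the object moves, while the part minted by the assigning authority stays fixed; whether that alone delivers persistence still depends on the provider's commitment.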



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
WARNING: This email and any attachments may be confidential and/or
privileged. They are intended for the addressee only and are not to be read,
used, copied or disseminated by anyone receiving them in error.  If you are
not the intended recipient, please notify the sender by return email and
delete this message and any attachments.

The views expressed in this email are those of the sender and do not
necessarily reflect the official views of Landcare Research.

Landcare Research
http://www.landcareresearch.co.nz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




More information about the tdwg-tag mailing list