guids

Kevin Richards RichardsK at LANDCARERESEARCH.CO.NZ
Tue Sep 27 17:25:42 CEST 2005


A few thoughts...
 
To me, the only way I can see GUIDs being used effectively is to assign a GUID to every data object that will be exposed outside the local system where it is stored, e.g. database records, image files and document files. I can't see how you could assign GUIDs to specimens themselves, as the specimen itself cannot be returned via a query (unless it is a loan request?).  This will result in a lot of GUIDs, distributed through a lot of systems.  That may seem a concern, because objects in different systems can refer to the "same" object but have different GUIDs.  But it is not really a problem: all we have achieved at this stage is giving every data object in all systems a globally unique ID, and one that is hopefully resolvable.  Whether an object refers to the same "entity"/"concept" as another object does not matter at this point.  (By "concept" I don't mean "taxon concept".)
 
Objects that refer to the same "concept", such as the same taxonomic name represented in two different systems, need to be "cross-referenced" in some other way.  One way this could be done is by building up a table of mappings between systems, where the qualified person in each of those systems has decided the objects refer to the same "concept".  These mappings can then be returned as metadata for either of the GUIDs in the two systems, as "other data objects that refer to the same thing".  This will result in a fairly complex global network of mappings that will be difficult to maintain.  Another way to do this is to have a central authoritative table of "concepts" with GUIDs that all other synonymic objects in other systems point to.  Then, when an object is accessed via its GUID, part of its metadata will be the central "concept" GUID, so it can be compared with objects in other systems that carry the same "concept" GUID.  This will also allow queries to be executed against the different systems, for example "give me all objects with the central concept GUID x".
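
To make the two cross-referencing options above a little more concrete, here is a rough Python sketch.  Everything in it is hypothetical: the example GUIDs, the names `same_as` and `objects_for_concept`, and in particular the assumption that pairwise "same concept" mappings can be followed transitively (a curator might not agree with that).  It is only meant to show how the two lookups differ, not how any real system does it.

```python
# Hypothetical GUIDs for the same taxonomic name held in three systems.
A = "urn:lsid:system-a.example:names:101"
B = "urn:lsid:system-b.example:names:7"
C = "urn:lsid:system-c.example:names:55"

# Option 1: pairwise mappings, each one asserted by a qualified person.
pairwise = {frozenset((A, B)), frozenset((B, C))}


def same_as(guid, mappings):
    """Return every GUID reachable from `guid` by following pairwise mappings.

    Assumption: "refers to the same concept" is treated as transitive.
    """
    seen = {guid}
    todo = [guid]
    while todo:
        current = todo.pop()
        for pair in mappings:
            if current in pair:
                for other in pair - {current}:
                    if other not in seen:
                        seen.add(other)
                        todo.append(other)
    return seen - {guid}


# Option 2: every object's metadata carries the GUID of a central concept.
CONCEPT = "urn:lsid:central.example:concepts:9001"
concept_of = {A: CONCEPT, B: CONCEPT, C: CONCEPT}


def objects_for_concept(concept_guid, index):
    """Answer "give me all objects with the central concept GUID x"."""
    return sorted(g for g, c in index.items() if c == concept_guid)
```

With option 1, finding everything equivalent to A means walking the mapping network (A to B, then B to C); with option 2 it is a single lookup on the central concept GUID, which is the maintenance difference described above.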
 
It seems DOIs and LSIDs are the preferred options at this stage.  I personally prefer LSIDs for several reasons, including their standard URN format and the fixed 1:1 mapping of objects to IDs.  I see that DOIs, through the Handle System, allow you to query for "different" objects of an entity depending on what you want.  E.g. for a specimen, you could query for the image of that specimen.  This allows multiple objects to be accessed through a single GUID (this also seems to be how a lot of people I have talked to "view" a GUID system as working).  I can see this being useful in the biodiversity informatics world, but I can also see problems it may cause: each of the objects accessed through the single DOI should have its own GUID, otherwise it will end up being referenced from somewhere "through" another object, and the maintenance of the GUIDs then becomes more challenging.
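
As an illustration of the "standard URN format" point, an LSID has the shape urn:lsid:authority:namespace:object with an optional trailing revision.  The sketch below splits one into its parts.  The function name `parse_lsid`, the `LSID` tuple and the example identifiers are made up for illustration, and the validation is deliberately minimal.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"LSID has an empty component: {text!r}")
    # An absent or empty sixth component means "no revision".
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)
```

For example, `parse_lsid("urn:lsid:example.org:specimens:12345")` gives authority `example.org`, namespace `specimens`, object `12345` and no revision.  Each such identifier names exactly one object, which is the 1:1 mapping I mean.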
 
My main concern with LSIDs is the requirement that an LSID "must return the exact same set of bytes every time".  This can be overcome, however, by providing all the data in the metadata of the LSID (which seems a bit backward) and returning, as the data of the LSID, only a "label/name" that will never change.  Otherwise, the versioning component of LSIDs can be used to handle changes within the data.
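
A toy sketch of the two workarounds, in Python.  It is an in-memory stand-in, not an LSID resolver: `ToyLsidStore`, its methods and `next_revision` are invented names, and treating the revision as an integer counter (with a missing revision counted as 1) is purely my assumption.

```python
class ToyLsidStore:
    """Data is write-once per LSID; metadata is free to change."""

    def __init__(self):
        self._data = {}      # lsid -> bytes, never overwritten
        self._metadata = {}  # lsid -> dict, may be updated

    def register(self, lsid, data, metadata):
        if lsid in self._data:
            raise ValueError("data for an existing LSID must not change")
        self._data[lsid] = bytes(data)
        self._metadata[lsid] = dict(metadata)

    def get_data(self, lsid):
        return self._data[lsid]

    def get_metadata(self, lsid):
        # Return a copy so callers cannot modify the stored record.
        return dict(self._metadata[lsid])

    def update_metadata(self, lsid, **changes):
        self._metadata[lsid].update(changes)


def next_revision(lsid):
    """Return the LSID for the next revision (assumes integer revisions)."""
    parts = lsid.split(":")
    if len(parts) == 5:
        # Assumption: an LSID without a revision counts as revision 1.
        return lsid + ":2"
    return ":".join(parts[:5] + [str(int(parts[5]) + 1)])
```

In the first workaround the bytes returned by `get_data` are just the stable label, and everything that might be edited lives in the metadata.  In the second, changed data is registered under the LSID returned by `next_revision`, leaving the old LSID and its bytes untouched.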
 
Kevin Richards


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
WARNING: This email and any attachments may be confidential and/or
privileged. They are intended for the addressee only and are not to be read,
used, copied or disseminated by anyone receiving them in error.  If you are
not the intended recipient, please notify the sender by return email and
delete this message and any attachments.

The views expressed in this email are those of the sender and do not
necessarily reflect the official views of Landcare Research.

Landcare Research
http://www.landcareresearch.co.nz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



