Different reasons for different GUIDs (Was: GUIDs, LSIDs, and metadata)

Donald Hobern dhobern at GBIF.ORG
Tue Sep 13 13:45:24 CEST 2005


Richard Pyle wrote:

> One paradigm might have each major database create its own LSID:
>
> urn:lsid:catalogoffishes.org:SPNO:123456
> urn:lsid:gbif.org:ECAT:876543
> urn:lsid:itis.gov:TSN:567890
>
> But then we're burdened with the task of cross-mapping each of these, and
> also preserving the legacy IDs into perpetuity after we've eventually
> converged on a single taxon name GUID system.

This indicates a general issue that we need to consider and which may make
the whole problem a little simpler for us in some areas.

There are different situations (use cases) in which we may require GUIDs,
and the service characteristics required of those GUIDs may not be the same
in every case.

It seems reasonable for us to think of a system which associates GUIDs with
every data object shared by anyone using TDWG standards.  These objects will
represent a wide range of different types of record (specimen, collection,
taxon name, taxon concept, synonymy assertion, publication, author, user,
etc.).  Use of genuinely _global_ identifiers in all of these cases will
have the following benefits:

1. Users can reliably request associated data items simply by knowing the
GUID assigned to them by the data provider.  For example, I can view a
specimen record and then request an associated taxon concept record
identified within the specimen record by GUID.

2. It may simplify a data provider's own task in providing cross-references
between data elements which are all under their own control.

This is a compelling set of reasons for using GUIDs, and for their being
resolvable.  Such GUIDs should be relatively simple to manage and we could
put together many different solutions that would satisfy these requirements.
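To illustrate why benefits 1 and 2 alone are cheap, here is a minimal sketch of one such solution. It assumes a hypothetical provider that mints LSID-style GUIDs under an authority name it alone controls ("museum.example.org" is made up), and it uses an in-memory dict in place of a real LSID resolution service. No global coordination is needed, because uniqueness comes from the authority name.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Provider:
    """Hypothetical data provider that mints and resolves its own GUIDs."""

    authority: str  # assumed: a DNS name the provider controls
    _records: dict = field(default_factory=dict)
    _counter: count = field(default_factory=lambda: count(1))

    def mint(self, namespace: str, data: dict) -> str:
        # Uniqueness comes only from the provider's own authority name,
        # so no global registry is consulted.
        guid = f"urn:lsid:{self.authority}:{namespace}:{next(self._counter)}"
        self._records[guid] = data
        return guid

    def resolve(self, guid: str) -> dict:
        # Benefit 1: anyone holding the GUID can request the record.
        # (An in-memory dict stands in for a real resolution service.)
        return self._records[guid]


museum = Provider("museum.example.org")
concept = museum.mint("Concept", {"name": "Aus bus", "sensu": "publicationC"})
# Benefit 2: the specimen cross-references the concept by GUID alone.
specimen = museum.mint("Specimen", {"catalogue": "X-1", "concept": concept})
linked = museum.resolve(museum.resolve(specimen)["concept"])
print(specimen)        # urn:lsid:museum.example.org:Specimen:2
print(linked["name"])  # Aus bus
```

Nothing here stops a second provider from minting a different GUID for the same taxon concept. That is acceptable for benefits 1 and 2, but not for benefit 3.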

There is however a further requirement that may only apply in certain
situations and for certain classes of data: support for _different_ data
providers to re-use the _same_ GUID if they are sharing data for the same
real-world object (collection, specimen, etc.).  For example, institutionA is
the first body to reference a taxon concept published under the name Aus bus
in publicationC.  If institutionB subsequently wishes to reference Aus bus
sensu publicationC, they would need to find and reuse the same identifier.
The benefit from this would be as follows:

3. Users can assume that data records relate to the same real-world object
if and only if they share the same GUID.

I think that we have generally assumed that all our GUID uses will require
this additional function, but this is the part that will get very expensive
to implement.  I believe that we should take a case-by-case decision on
whether the additional infrastructure is really required to support this
level of integration.
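A rough sketch shows where the extra cost of benefit 3 comes from. It assumes a hypothetical shared registry that every provider must query before assigning a GUID, plus an assumed matching rule (case- and whitespace-insensitive comparison of name and publication) that decides when two requests describe the same real-world object. In practice, agreeing on that matching rule is the hard part, and this sketch glosses over it.

```python
import uuid


class GlobalRegistry:
    """Hypothetical shared registry consulted before any GUID is assigned."""

    def __init__(self) -> None:
        self._by_key = {}  # matching key -> GUID
        self.lookups = 0   # round trips paid by providers

    def get_or_mint(self, key: tuple) -> str:
        # Every provider must make this call before assigning a GUID.
        self.lookups += 1
        if key not in self._by_key:
            self._by_key[key] = f"urn:uuid:{uuid.uuid4()}"
        return self._by_key[key]


def normalize(name: str, publication: str) -> tuple:
    # Assumed matching rule: case- and whitespace-insensitive comparison.
    return (" ".join(name.split()).lower(), " ".join(publication.split()).lower())


registry = GlobalRegistry()
guid_a = registry.get_or_mint(normalize("Aus bus", "publicationC"))   # institutionA
guid_b = registry.get_or_mint(normalize("Aus  bus", "PublicationC"))  # institutionB
print(guid_a == guid_b)   # True
print(registry.lookups)   # 2
```

Both institutions end up with the same GUID only because both consulted the registry and their descriptions happened to match under the agreed rule. Every assignment becomes a round trip to shared infrastructure.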

As an example, I can see excellent reasons for us at least to try for
benefit 3 for specimen records, and also probably in some way for taxon
names and publications (whatever actual form the identifiers for these may
take).  However (I think, like Rod) I am not totally convinced that we will
need to manage GUIDs offering benefit 3 for taxon concepts.  We should use
GUIDs (with benefits 1 and 2) for taxon concepts, and our taxon concepts
should reference names and publications which probably have the stronger
kind of GUIDs (with benefits 1-3).  The coupling of a name GUID and a
publication GUID would then be a unique combination that would still allow
concepts to be compared.  However, there is probably too little to gain
from a system in which strong GUIDs are assigned to every taxon concept that
any provider wishes to reference.  Such a system would require every data
provider to check a global registry to make sure that a particular taxon
concept did not already have a GUID.
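The suggested compromise can be sketched as follows. Taxon concepts keep provider-local ("weak") GUIDs, but each one references strong name and publication GUIDs, and two concepts are compared by that (name, publication) pair. The class, function and all identifiers below are hypothetical and illustrate the idea only. They are not a TDWG or GBIF schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    """Hypothetical taxon concept record; all identifiers are made up."""

    guid: str              # weak GUID: benefits 1 and 2 only
    name_guid: str         # strong GUID shared across providers (assumed)
    publication_guid: str  # strong GUID shared across providers (assumed)

    def concept_key(self) -> tuple:
        # The (name, publication) pair stands for "name sensu publication".
        return (self.name_guid, self.publication_guid)


def same_concept(a: TaxonConcept, b: TaxonConcept) -> bool:
    # Compare concepts without needing a globally shared concept GUID.
    return a.concept_key() == b.concept_key()


NAME = "urn:lsid:names.example.org:Name:42"
PUB = "urn:lsid:pubs.example.org:Pub:7"
a = TaxonConcept("urn:lsid:institutionA.example.org:Concept:1", NAME, PUB)
b = TaxonConcept("urn:lsid:institutionB.example.org:Concept:99", NAME, PUB)
print(a.guid == b.guid)    # False
print(same_concept(a, b))  # True
```

Only names and publications need the registry-backed GUIDs from the previous sketch. Concept GUIDs can be minted locally by each provider.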

This is just an example, but indicates that we need to think hard about the
use cases for each data object class and its associated GUIDs.

Donald

---------------------------------------------------------------
Donald Hobern (dhobern at gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
---------------------------------------------------------------




More information about the tdwg-tag mailing list