
Richard Pyle wrote:
One paradigm might have each major database create its own LSID:
urn:lsid:catalogoffishes.org:SPNO:123456
urn:lsid:gbif.org:ECAT:876543
urn:lsid:itis.gov:TSN:567890
But then we're burdened with the task of cross-mapping each of these, and also preserving the legacy IDs in perpetuity after we've eventually converged on a single taxon name GUID system.
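
For reference, the three identifiers quoted above follow the general LSID pattern urn:lsid:<authority>:<namespace>:<object>, optionally followed by a revision. The following is only a minimal sketch of splitting such a string into its parts; the Lsid type and parse_lsid function are hypothetical names used for illustration and are not part of any TDWG or LSID library.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Components of an LSID (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


# The three example identifiers from the quoted message.
examples = [
    "urn:lsid:catalogoffishes.org:SPNO:123456",
    "urn:lsid:gbif.org:ECAT:876543",
    "urn:lsid:itis.gov:TSN:567890",
]
parsed = [parse_lsid(e) for e in examples]
```

Each database keeps its own authority and namespace, so nothing in the identifiers themselves says whether 123456, 876543 and 567890 refer to the same name. That is the cross-mapping burden described above.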
This indicates a general issue that we need to consider, and one which may make the whole problem a little simpler for us in some areas. There are different situations (use cases) in which we may require GUIDs, and the service characteristics of GUIDs may not always be the same.

It seems reasonable for us to think of a system which associates GUIDs with every data object shared by anyone using TDWG standards. These objects will represent a wide range of different types of record (specimen, collection, taxon name, taxon concept, synonymy assertion, publication, author, user, etc.). Use of genuinely _global_ identifiers in all of these cases will have the following benefits:

1. Users can reliably request associated data items simply by knowing the GUID assigned to them by the data provider. For example, I can view a specimen record and then request an associated taxon concept record identified within the specimen record by GUID.

2. It may simplify a data provider's own task in providing cross-references between data elements which are all under their own control.

This is a compelling set of reasons for using GUIDs, and for their being resolvable. Such GUIDs should be relatively simple to manage, and we could put together many different solutions that would satisfy these requirements.

There is however a further requirement that may only apply in certain situations and for certain classes of data: support for _different_ data providers to re-use the _same_ GUID if they are sharing data for the same real-world object (collection, specimen, etc.). For example, suppose institutionA is the first body to reference a taxon concept published under the name Aus bus in publicationC. If institutionB subsequently wishes to reference Aus bus sensu publicationC, it would need to find and reuse the same identifier. The benefit from this would be as follows:

3. Users can assume that data records relate to the same real-world object if and only if they share the same GUID.
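
To make the extra cost of benefit 3 concrete, here is a minimal sketch of the "find and reuse" step, assuming a single shared registry that every provider consults before minting an identifier. The ConceptRegistry class and its find_or_mint method are hypothetical, the in-memory dictionary merely stands in for a global service, and the urn:uuid form is only a placeholder for whatever identifier scheme is eventually chosen.

```python
import uuid
from typing import Dict, Tuple


class ConceptRegistry:
    """Hypothetical shared registry; a dict stands in for a global service."""

    def __init__(self) -> None:
        # Maps (name, publication) to the identifier already issued for it.
        self._ids: Dict[Tuple[str, str], str] = {}

    def find_or_mint(self, name: str, publication: str) -> str:
        """Return the existing identifier, minting one only on first use."""
        key = (name, publication)
        if key not in self._ids:
            self._ids[key] = f"urn:uuid:{uuid.uuid4()}"
        return self._ids[key]


registry = ConceptRegistry()

# institutionA is the first to reference Aus bus sensu publicationC.
id_a = registry.find_or_mint("Aus bus", "publicationC")

# institutionB later references the same concept and gets the same identifier.
id_b = registry.find_or_mint("Aus bus", "publicationC")

# A different publication yields a different concept identifier.
id_other = registry.find_or_mint("Aus bus", "publicationD")
```

Benefits 1 and 2 need none of this: a provider can mint identifiers locally. Benefit 3 holds only if every provider goes through the same lookup, which is where the infrastructure cost arises.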

I think that we have generally assumed that all our GUID uses will require this additional function, but this is the part that will get very expensive to implement. I believe that we should take a case-by-case decision on whether the additional infrastructure is really required to support this level of integration.

As an example, I can see excellent reasons for us at least to try for benefit 3 for specimen records, and also probably in some way for taxon names and publications (whatever actual form the identifiers for these may take). However (I think, like Rod) I am not totally convinced that we will need to manage GUIDs offering benefit 3 for taxon concepts. We should use GUIDs (with benefits 1 and 2) for taxon concepts, and our taxon concepts should reference names and publications which probably have the stronger kind of GUIDs (with benefits 1-3). The coupling of a name GUID and a publication GUID would then be a unique combination that would still allow concepts to be compared.

However, there is probably insufficient benefit for us to gain from having a system where strong GUIDs are assigned to every taxon concept that any provider wishes to reference. Such a system would require every data provider to check a global registry to make sure that any particular taxon concept did not already have a GUID.

This is just an example, but indicates that we need to think hard about the use cases for each data object class and its associated GUIDs.

Donald

---------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
---------------------------------------------------------------