Two kinds of GUIDs?

Roderic Page at BIO.GLA.AC.UK
Fri Nov 4 11:18:13 CET 2005

In my view we do this:

1. If you create a database, you have your own primary keys for whatever
information is "yours".

2. If you expose your data to the outside world, you do so as GUIDs:

<my database server>:<my data type>:<my id>

  (see, it's easy)

3. If you refer to any external data you store the GUID for that data.
That way if you and I refer to a GenBank sequence in our databases, we
both store the same GUID.

GUIDs in the above format automatically show provenance (where they
came from).
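
As a rough illustration of the three-part format above, here is a minimal
Python sketch. The Guid type, the parse_guid helper, and the example server
name are all hypothetical, and it assumes the server and data type parts
never contain a colon themselves.

```python
from typing import NamedTuple


class Guid(NamedTuple):
    """Hypothetical GUID of the form <server>:<data type>:<id>."""
    server: str      # who issued the identifier (the provenance)
    data_type: str   # kind of object, e.g. "taxon" or "sequence"
    local_id: str    # the issuing database's own primary key

    def __str__(self) -> str:
        return f"{self.server}:{self.data_type}:{self.local_id}"


def parse_guid(text: str) -> Guid:
    """Split a GUID string into its three parts.

    Only the first two colons are treated as separators, so the local id
    may itself contain colons (an assumption of this sketch).
    """
    parts = text.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a <server>:<type>:<id> GUID: {text!r}")
    return Guid(*parts)


# Expose a local primary key to the outside world (point 2).
exposed = str(Guid("db.example.org", "taxon", "42"))

# Two databases that store the same external GUID (point 3) can be joined
# by plain string equality; the provenance is readable from the GUID.
mine = "genbank.example.org:sequence:X00001"
yours = "genbank.example.org:sequence:X00001"
same_object = mine == yours
issuer = parse_guid(mine).server
```

The local primary key stays whatever the database owner likes; only the
exposed form needs the server and data type in front of it.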

The notion of a local GUID is nonsensical (by definition).

The notion of "Universal" GUIDs confounds inference (do you and I mean
the same thing?) with having a globally unique, actionable identifier.

I guess I'm trying to say that I don't think this is as hard as it's
being made out to be.

GUIDs enable you to explicitly identify objects, that's all. This is a
huge step towards integrating databases, but doesn't solve the mapping
issue. Yes, mapping different GUIDs can be tricky, but this is an
entirely separate question. I think there's a huge confusion between
GUIDs and some sort of global taxonomy -- they are not the same thing.

So, if you want to refer to a taxon in your database, what do you do?
You use a GUID from your favourite taxonomic database. If GBIF set up a
project to map GUIDs from different databases (good luck to them), you
could use that. This is a separate question -- but by using GUIDs I
know what you are referring to (as an object), and so do others.
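
To make the separation concrete, here is a small Python sketch of a mapping
layer that sits beside the GUIDs rather than inside them. The GuidMapping
class and the example GUIDs are hypothetical, and treating "same as" as
transitive is itself an assumption (a taxonomic judgement the mapping
project would have to make), not something the GUIDs provide.

```python
from collections import defaultdict


class GuidMapping:
    """Hypothetical mapping service, kept separate from the GUIDs."""

    def __init__(self) -> None:
        # Each GUID points at the GUIDs someone has asserted it matches.
        self._same_as: dict[str, set[str]] = defaultdict(set)

    def assert_same(self, a: str, b: str) -> None:
        """Record a curator's claim that two GUIDs denote the same thing."""
        self._same_as[a].add(b)
        self._same_as[b].add(a)

    def equivalents(self, guid: str) -> set[str]:
        """Return every GUID reachable from `guid` through asserted links."""
        seen = {guid}
        stack = [guid]
        while stack:
            current = stack.pop()
            for other in self._same_as.get(current, ()):
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        return seen


mapping = GuidMapping()
mapping.assert_same("dbA.example.org:taxon:1", "dbB.example.org:taxon:9")
mapping.assert_same("dbB.example.org:taxon:9", "dbC.example.org:taxon:5")

# A GUID nobody has mapped still identifies its object perfectly well.
unmapped = mapping.equivalents("dbD.example.org:taxon:7")
linked = mapping.equivalents("dbA.example.org:taxon:1")
```

Nothing in the identifiers changes when a mapping is added or withdrawn,
which is the sense in which the two problems are separate.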



On 4 Nov 2005, at 10:55, Yde de Jong wrote:

> Dear all,
> Considering the last messages.
> In my understanding (sorry for repeating part of your input) we can
> distinguish two kinds of GUIDs:
> (1) Universal GUIDs like the ISBN numbers of publications and the
> GenBank codes of sequences.
> (2) Local GUIDs, which are uniquely linked to objects in your (local)
> database, but whose metadata and metadata structure are
> standardised (in our case by GBIF).
> The Universal GUIDs we need for unambiguous cross-linking of
> databases, because we can't match databases efficiently otherwise
> (e.g. semantically). Indeed in the future everyone should add a
> column to their taxon table and cross-reference to a universal GUID
> system (= nomenclator) which keeps the standard GUIDs for each name
> (let's leave 'what's in the name' for the moment).
> For me it's clear that there is a difference between the use of a name
> within a concept and the name itself. Such a universal GUIDs system
> doesn't need to deal with the use of names and therefore not with
> concepts. Actually names within a nomenclator should never change,
> only the content should grow through time (like sequence data in
> GenBank).
> Local GUIDs are important for GBIF to show the origin of data. This
> is essential not only for a proper acknowledgement, but also to
> identify possible duplications (e.g. Fauna Europaea data sets are
> being implemented in many other databases, GBIF needs to have a tool
> to detect such duplications).
> In addition those local GUIDs can be used, when cross-mapped with
> universal GUIDs, to provide concepts for the GBIF portal in a
> workable way. Meaning that a GBIF portal user can choose which species
> concept (e.g. that of the CoL) he/she would like to use for
> connecting the requested biodiversity data.
> Kind regards,
> Yde
