In my view we do this:
1. If you create a database, you have your own primary keys for whatever information is "yours".
2. If you expose your data to the outside world, you do so as GUIDs, e.g.
<my database server>:<my data type>:<my id>
(see, it's easy)
3. If you refer to any external data you store the GUID for that data. That way if you and I refer to a GenBank sequence in our databases, we both store the same GUID.
GUIDs in the above format automatically show provenance (where they came from).
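As a minimal sketch of the server:type:id scheme above (the class name, field names, and example values are all assumptions made for illustration, not part of any existing system), building and parsing such an identifier could look like this:

```python
from typing import NamedTuple


class Guid(NamedTuple):
    server: str     # database server that issued the identifier
    data_type: str  # kind of object, e.g. "sequence" or "taxon"
    local_id: str   # the issuing database's own primary key

    def __str__(self) -> str:
        return f"{self.server}:{self.data_type}:{self.local_id}"


def parse_guid(text: str) -> Guid:
    # Split on the first two colons only, so a local id may itself contain ":".
    parts = text.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a server:type:id identifier: {text!r}")
    return Guid(*parts)


# Hypothetical values: the server name and the id are made up.
mine = Guid("genbank.example.org", "sequence", "AB000001")
yours = parse_guid("genbank.example.org:sequence:AB000001")
```

Here mine and yours compare equal because both databases stored the same string, and yours.server gives the provenance without any extra lookup.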
The notion of a local GUID is nonsensical (by definition).
The notion of "Universal" GUIDs confounds inference (do you and I mean the same thing?) with having a globally unique, actionable identifier.
I guess I'm trying to say that I don't think this is as hard as it's being made out to be.
GUIDs enable you to explicitly identify objects; that's all. This is a huge step towards integrating databases, but it doesn't solve the mapping issue. Yes, mapping different GUIDs can be tricky, but this is an entirely separate question. I think there's a huge confusion between GUIDs and some sort of global taxonomy -- they are not the same thing.
So, if you want to refer to a taxon in your database, what do you do? You use a GUID from your favourite taxonomic database. If GBIF set up a project to map GUIDs from different databases (good luck to them), you could use that. This is a separate question -- but by using GUIDs I know what you are referring to (as an object), and so do others.
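To illustrate why identification and mapping are separate questions, here is a rough sketch. All the identifiers are invented, and the mapping table stands in for whatever a GBIF-style mapping project might publish; it is an assumption, not a description of an existing service.

```python
# Hypothetical GUIDs in server:type:id form; none of these are real identifiers.
my_taxon = "taxa-a.example.org:taxon:1001"
your_taxon = "taxa-b.example.org:taxon:xyz"

# A separate mapping project would assert which GUIDs denote the same taxon.
# Each entry is an unordered pair of GUIDs judged to be equivalent.
same_taxon = {frozenset({my_taxon, your_taxon})}


def refers_to_same_taxon(a: str, b: str, mappings: set) -> bool:
    # Identical GUIDs always match; different GUIDs match only if mapped.
    return a == b or frozenset({a, b}) in mappings


without_mapping = refers_to_same_taxon(my_taxon, your_taxon, set())
with_mapping = refers_to_same_taxon(my_taxon, your_taxon, same_taxon)
```

Without the mapping table the two GUIDs are simply different objects, each still unambiguous on its own; the table adds the inference on top.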
Regards
Rod
On 4 Nov 2005, at 10:55, Yde de Jong wrote:
Dear all,
Regarding the last few messages:
In my understanding (sorry for repeating part of your input) we can distinguish two kinds of GUIDs:
(1) Universal GUIDs like the ISBN numbers of publications and the GenBank codes of sequences.
(2) Local GUIDs, which are uniquely linked to objects in your (local) database, but whose metadata and metadata structure are standardised (in our case by GBIF).
We need the Universal GUIDs for unambiguous cross-linking of databases, because we can't match databases efficiently otherwise (e.g. semantically). Indeed, in the future everyone should add a column to their taxon table and cross-reference to a universal GUID system (=nomenclator) which keeps the standard GUID for each name (let's leave 'what's in the name' for the moment).
For me it's clear that there is a difference between the use of a name within a concept and the name itself. Such a universal GUID system doesn't need to deal with the use of names and therefore not with concepts. Actually, names within a nomenclator should never change; only the content should grow through time (like sequence data in GenBank).
Local GUIDs are important for GBIF to show the origin of data. This is essential not only for a proper acknowledgement, but also to identify possible duplications (e.g. Fauna Europaea data sets are being implemented in many other databases, GBIF needs to have a tool to detect such duplications).
In addition, those local GUIDs can be used, when cross-mapped with universal GUIDs, to provide concepts for the GBIF portal in a workable way. This means that a GBIF portal user can choose which species concept (e.g. that of the CoL) he/she would like to use for connecting the requested biodiversity data.
Kind regards,
Yde
----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org