Re: Globally Unique Identifier

30 Sep 2004

      ...
...
What about duplicate specimens?  Although a specimen may be MO 1234,
K 5678 and P AABB, they may in fact all be SMITH 10001 and duplicates
of the exact same specimen, not different specimens. Is that one GUID
or 3?

In my view, we would assign only ONE GUID, which represents the
actual, physical specimen.  That this one specimen has multiple
catalog numbers assigned to it is simply additional information
associated with that one specimen (in the same way that many specimens
may have more than one taxonomic name applied to them, by different
investigators at different times).
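
A minimal sketch of the one-GUID view described above, in Python. The
class name, field names, and the use of a random UUID as the GUID are
assumptions made for illustration only; nothing here is taken from DwC,
ABCD, or any GBIF schema.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Specimen:
    """One physical specimen, identified by a single GUID (hypothetical model)."""
    collector: str
    collector_number: str
    # Assumption: a random UUID stands in for whatever GUID scheme is chosen.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Catalog numbers are additional information: (institution, number) pairs.
    catalog_numbers: list = field(default_factory=list)
    # Taxonomic names applied over time are additional information as well.
    determinations: list = field(default_factory=list)


def build_index(specimens):
    """Map every (institution, number) pair to the GUID of its specimen."""
    index = {}
    for specimen in specimens:
        for key in specimen.catalog_numbers:
            index[key] = specimen.guid
    return index


# The example from the message: three catalog numbers, one specimen.
smith = Specimen("SMITH", "10001")
smith.catalog_numbers += [("MO", "1234"), ("K", "5678"), ("P", "AABB")]
index = build_index([smith])
```

Under these assumptions, looking up any of the three catalog numbers in
`index` returns the same GUID.
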
I agree on the multiple catalogue numbers, but I believe multiple
database records of a single specimen will still exist. I myself am
not involved in collection curation, but in evaluating the
information therein (specifically, we work on organism interactions);
we have a database of now close to 200 000 fungal host-parasite
records. Some express an opinion without further citation, others
express an opinion backed up by a voucher specimen, with all the
information that would be found in collection databases. GBIF seems
to have no place for such data so far, and it would be difficult to
provide, since we usually have none of
[InstitutionCode]+[CollectionCode]+[CatalogNumber] (which is
different from the problem of duplicate CatalogNumbers that you
discuss). Still, what kind of data is that? What kind of data is
created if a Ph.D. student digitizes the specimen records used for a
taxonomic revision in a database that is specific to that revision?

Bottom line: The physical specimen does exist, but in the foreseeable
future all GUIDs will be attached to data, not to the specimen.
The only exception is where it is indeed possible to attach the GUID
to the specimen itself; then this could be cited.

But then we have descriptions, and for description concepts
(characters, structures, states, modifiers, etc.) we also need GUIDs
to allow federating descriptions that use a common terminology. We
have discussed this in SDD on and off (specifically, we are proposing
to prefer semantically neutral identifiers, and propose a simple
optional mechanism called debugid/debugref to enrich data with
calculated, semantically meaningful identifiers to facilitate
debugging), but at the moment SDD is really waiting for a more general
and common solution.
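
The idea of pairing a semantically neutral identifier with a calculated,
readable one can be illustrated roughly as follows. This is only a
sketch in Python and not the SDD syntax: the dictionary layout, the
function names, and the way the readable label is derived are all
assumptions; only the name "debugid" is borrowed from the text above.

```python
import re
import uuid


def slug(text):
    """Reduce a label to lowercase words joined by underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def new_concept(kind, label):
    """Create a description concept with an opaque, meaning-free id."""
    return {"id": uuid.uuid4().hex, "kind": kind, "label": label}


def with_debug_id(concept):
    """Return a copy enriched with a calculated, human-readable debugid.

    The debugid is derived data: it can be dropped or recomputed at any
    time, and nothing should ever resolve references through it.
    """
    enriched = dict(concept)
    enriched["debugid"] = f"{concept['kind']}_{slug(concept['label'])}"
    return enriched


leaf_shape = new_concept("character", "Leaf shape")
leaf_shape_debug = with_debug_id(leaf_shape)
```

The neutral `id` stays stable if the label is later corrected, while the
readable value exists only to make the data easier to inspect.
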

So this discussion is highly relevant to descriptions as well. My
main point is: what we are really interested in with GBIF, in the
end, is knowledge, not physical possession. If we limit our thinking
about the GBIF system to the very special case of institutionalized
collections (as both DwC and ABCD in my opinion currently do), or
names governed by a nomenclatural code, I believe we may later have
to rearchitect.

BTW, partly because of these differences between institutional
collection customs and knowledge publication customs, I vote against
a strongly central system. The LSID authority (lsid.gbif.net) and
namespace (with no or low semantics) should be managed by GBIF, but
not the ids/versions. GBIF may provide a service to generate them,
but should accept any locally generated ID and trust the generator to
manage uniqueness.
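
As a rough illustration of this split, the sketch below builds an LSID
in Python from a centrally managed authority and namespace plus a
locally generated object id. The general form
urn:lsid:authority:namespace:object[:revision] follows the LSID
specification and the authority is the one named above; the function
name, the "specimen" namespace, and the use of a random UUID as the
local id are assumptions for the example.

```python
import uuid

AUTHORITY = "lsid.gbif.net"  # authority named in the message above


def make_lsid(namespace, object_id=None, revision=None):
    """Build urn:lsid:<authority>:<namespace>:<object>[:<revision>].

    If no object id is supplied, a random UUID is generated locally, so
    uniqueness rests with the generator rather than a central registry.
    """
    if object_id is None:
        object_id = uuid.uuid4().hex
    parts = [AUTHORITY, namespace, str(object_id)]
    if revision is not None:
        parts.append(str(revision))
    for part in parts:
        # A colon inside a component would change the meaning of the URN.
        if not part or ":" in part:
            raise ValueError(f"invalid LSID component: {part!r}")
    return ":".join(["urn", "lsid"] + parts)


# "specimen" is a hypothetical namespace used only for this example.
local_lsid = make_lsid("specimen")
fixed_lsid = make_lsid("specimen", "abc123", revision=2)
```

A central service could offer the same function, but nothing in this
scheme requires ids to be issued by it.
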

Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Koenigin-Luise-Str. 19          Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203

Often wrong but never in doubt!