> >which is different from the problem having duplicate CatalogNumbers you discuss

> >The physical specimen does exist, but in the foreseeable future all data GUIDs will be attached to data, not to the specimen.

Duplicate specimens occur because the collector collected multiple samples of the same organism and sent them to other institutions.  The duplicate specimens themselves probably have different CatalogNumbers in each institution.  The specimen database records reflect the actual specimens.  Therefore, the specimen database records, when combined from multiple institutions, contain duplicates of the same organism.  But the duplication can be recognized only by comparing either the collector and collector's number or the date and location.

One key use of GBIF-merged specimen records is to count or plot the number of organisms in an area.  When a wide net is thrown around the globe, the duplicate records are caught and return overstated counts.  Ideally a GUID would identify a single unique organism record and enable duplicates to be identified, but I can see no easy way for that to occur within LSID.
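As a rough illustration of the counting problem, here is a minimal Python sketch that groups merged records by collector and collector's number before counting. The field names and the "Jones 42" record are hypothetical and are not taken from any GBIF or Darwin Core schema; the three Smith 10001 records mirror the MO/K/P example quoted below. Real data would need much more careful normalization of collector names.

```python
from collections import defaultdict

# Hypothetical merged records; field names are illustrative only.
records = [
    {"institution": "MO", "catalog_number": "1234", "collector": "Smith", "collector_number": "10001"},
    {"institution": "K", "catalog_number": "5678", "collector": "SMITH", "collector_number": "10001"},
    {"institution": "P", "catalog_number": "AABB", "collector": "Smith ", "collector_number": "10001"},
    {"institution": "MO", "catalog_number": "1235", "collector": "Jones", "collector_number": "42"},
]


def group_by_gathering(recs):
    """Group records that share a collector and collector's number."""
    groups = defaultdict(list)
    for rec in recs:
        # Naive normalization (assumption): trim whitespace, ignore case.
        key = (rec["collector"].strip().lower(), rec["collector_number"].strip())
        groups[key].append(rec)
    return dict(groups)


groups = group_by_gathering(records)
print("records:", len(records))    # raw count, duplicates included
print("gatherings:", len(groups))  # count after grouping duplicates
```

The grouping key lives entirely in the record content, not in any identifier, which is why a GUID assigned per database record does not by itself reveal the duplication.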

Chuck Miller
CIO
Missouri Botanical Garden

-----Original Message-----
From: Gregor Hagedorn [mailto:G.Hagedorn@BBA.DE]
Sent: Thursday, September 30, 2004 6:22 AM
To: TDWG-SDD@LISTSERV.NHM.KU.EDU
Subject: Re: Globally Unique Identifier


> > What about duplicate specimens?  Although a specimen may be MO 1234,
> > K5678 and P AABB, they may in fact all be SMITH 10001 and duplicates
> > of the exact same specimen, not different specimens. Is that one
> > GUID or 3?
>
> In my view, we would assign only ONE GUID, which represents the
> actual, physical specimen.  That this one specimen has multiple
> catalog number assigned to it is simply additional information
> associated with that one specimen (in the same way that many specimens
> may have more than one taxonomic name applied to it, by different
> investigators at different times).

I agree on the multiple catalogue numbers, but I believe multiple database records of specimens will still exist. I myself am not involved in collection curation, but in evaluating the information therein (specifically we work on organism interactions), and we have a database of now close to 200 000 fungal host-parasite records. Some express opinion without further citation, others express opinion backed up by a voucher specimen that carries all the information that would be found in collection databases. GBIF seems to have no place for such data so far - and it would be difficult to provide, since we usually have none of "[InstitutionCode]+[CollectionCode]+[CatalogNumber]" (which is different from the problem having duplicate CatalogNumbers you discuss). Still, what kind of data is that? What kind of data is created if a Ph.D. student digitizes the specimen records used for a taxonomic revision in a database that is specific to that revision?

Bottom line: The physical specimen does exist, but in the foreseeable future all data GUIDs will be attached to data, not to the specimen. The only exception is where it is indeed possible to attach the GUID to the specimen; then this could be cited.

But then we have descriptions, and for description concepts (characters, structures, states, modifiers, etc.) we also need GUIDs to allow federating descriptions that use a common terminology. We have discussed this in SDD on and off (specifically we are proposing to prefer semantically neutral identifiers, and propose a simple optional mechanism called debugid/debugref to enrich data with calculated, semantically meaningful identifiers to facilitate debugging) - but at the moment SDD really waits for a more general and common solution.

So this discussion is highly relevant to descriptions as well. My main point is: what we are really interested in with GBIF, in the end, is knowledge, not physical possession. If we limit our thinking of the GBIF system to the very special case of institutionalized collections (as both DwC and ABCD in my opinion currently do), or names governed by a nomenclatural code, I believe we may later have to rearchitect.

BTW, partly because of these differences between institutional collection customs and knowledge publication customs, I vote against a strongly central system. The LSID authority (lsid.gbif.net) and namespace (with no or low semantics) should be managed by GBIF, but not the ids/versions. GBIF may provide a service to generate them, but should accept any locally generated ID and trust the generator to manage uniqueness.
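To make the split concrete, here is a minimal Python sketch of a locally generated identifier placed under a centrally managed authority and namespace. It assumes the general LSID layout urn:lsid:authority:namespace:object[:revision]; the namespace "ns1" and the helper make_lsid are hypothetical, and the use of a random UUID as the object id is only one possible way for a local generator to manage uniqueness.

```python
import uuid

AUTHORITY = "lsid.gbif.net"  # authority named in this message


def make_lsid(namespace, object_id=None, revision=None):
    """Build an LSID string; the object id is generated locally if omitted."""
    if object_id is None:
        # Semantically neutral id (assumption: a random UUID is acceptable).
        object_id = uuid.uuid4().hex
    parts = [namespace, object_id]
    if revision is not None:
        parts.append(str(revision))
    for part in parts:
        # The colon is the LSID field separator, so it cannot occur in a field.
        if ":" in part or not part:
            raise ValueError(f"invalid LSID component: {part!r}")
    return ":".join(["urn", "lsid", AUTHORITY] + parts)


# "ns1" is a hypothetical low-semantics namespace.
print(make_lsid("ns1"))
print(make_lsid("ns1", "abc", revision=2))
```

In this sketch only the authority and the namespace would need central management; the object id and revision are produced without contacting any central service.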

Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Koenigin-Luise-Str. 19          Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203

Often wrong but never in doubt!