Globally Unique Identifier

Bob Morris ram at CS.UMB.EDU
Thu Sep 30 10:43:46 CEST 2004


Chuck Miller wrote:

>>>which is different from the problem of having duplicate CatalogNumbers you
>>>discuss
>
>>>The physical specimen does exist, but in the foreseeable future all data
>>>GUIDs will be attached to data, not to the specimen.
>
>Duplicate specimens occur because the collector collected multiple samples
>of the same organism and sent them to other institutions. The duplicate
>specimens themselves probably have different CatalogNumbers in each
>institution, and the specimen database records reflect the actual specimens.
>Therefore, when specimen database records from multiple institutions are
>combined, they contain duplicates of the same organism. But the duplication
>can be recognized only by looking at either the Collector and Collector's
>number or the date/location.
>
>One key use of GBIF-merged specimen records is to count or plot the number
>of organisms in an area. When a wide net is thrown around the globe, the
>duplicate records are caught as well, and the resulting counts are
>overstated. Ideally a GUID would identify a single unique organism record
>and enable duplicates to be identified, but I can see no easy way for that
>to occur within LSID.
>
>
>
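A minimal Python sketch of the duplicate-recognition heuristic Chuck describes, assuming each merged record carries the collector, the collector's number, the collecting date, and the locality. The `SpecimenRecord` class, the helpers `duplicate_key` and `group_duplicates`, and the dates and localities in the sample data are hypothetical; only the catalog numbers MO 1234, K 5678, P AABB and the collection SMITH 10001 come from this thread. Records that share a key are treated as duplicates of one organism, so counting keys rather than rows avoids the overstated counts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SpecimenRecord:
    # Hypothetical flattened record, loosely modelled on Darwin Core-style fields.
    institution_code: str
    catalog_number: str
    collector: str
    collector_number: Optional[str]
    event_date: str
    locality: str


def _norm(text: str) -> str:
    """Collapse whitespace and case so 'Smith' and 'SMITH ' compare equal."""
    return " ".join(text.split()).upper()


def duplicate_key(rec: SpecimenRecord) -> Tuple[str, ...]:
    """Prefer collector + collector's number; fall back to collector + date + locality."""
    if rec.collector_number:
        return ("number", _norm(rec.collector), _norm(rec.collector_number))
    return ("event", _norm(rec.collector), rec.event_date, _norm(rec.locality))


def group_duplicates(
    records: List[SpecimenRecord],
) -> Dict[Tuple[str, ...], List[SpecimenRecord]]:
    """Map each presumed organism (duplicate key) to all catalog records of it."""
    groups: Dict[Tuple[str, ...], List[SpecimenRecord]] = defaultdict(list)
    for rec in records:
        groups[duplicate_key(rec)].append(rec)
    return dict(groups)


# Hypothetical sample data: three institutional duplicates of SMITH 10001, one other record.
records = [
    SpecimenRecord("MO", "1234", "Smith", "10001", "1998-05-02", "Ozark Plateau"),
    SpecimenRecord("K", "5678", "SMITH", "10001", "1998-05-02", "Ozark plateau"),
    SpecimenRecord("P", "AABB", "smith", "10001", "1998-05-02", "Ozark Plateau"),
    SpecimenRecord("MO", "2222", "Jones", None, "2001-07-14", "Big Sur"),
]

groups = group_duplicates(records)
# prints: naive count: 4, after merging duplicates: 2
print(f"naive count: {len(records)}, after merging duplicates: {len(groups)}")
for key, recs in groups.items():
    print(key, [f"{r.institution_code} {r.catalog_number}" for r in recs])
```

The heuristic is only as good as the source data: a reused or mistyped collector's number will merge distinct collections or split true duplicates.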

There is some discussion of how this might happen with LSID in
http://efgblade.cs.umb.edu/twiki/bin/view/BDEI/AbstractEntities

Please feel free to contribute to it.

Bob

>Chuck Miller
>CIO
>Missouri Botanical Garden
>
>-----Original Message-----
>From: Gregor Hagedorn [mailto:G.Hagedorn at BBA.DE]
>Sent: Thursday, September 30, 2004 6:22 AM
>To: TDWG-SDD at LISTSERV.NHM.KU.EDU
>Subject: Re: Globally Unique Identifier
>
>
>
>
>>>What about duplicate specimens?  Although a specimen may be MO 1234,
>>>K5678 and P AABB, they may in fact all be SMITH 10001 and duplicates
>>>of the exact same specimen, not different specimens. Is that one
>>>GUID or 3?
>>>
>>>
>>In my view, we would assign only ONE GUID, which represents the
>>actual, physical specimen. That this one specimen has multiple
>>catalog numbers assigned to it is simply additional information
>>associated with that one specimen (in the same way that many specimens
>>may have more than one taxonomic name applied to them, by different
>>investigators at different times).
>>
>>
>
>I agree on the multiple catalogue numbers, but I believe there will still be
>multiple database records of specimens. I myself am not involved in
>collection curation but in evaluating the information therein (specifically,
>we work on organism interactions); we have a database of now close to 200 000
>fungal host-parasite records. Some express an opinion without further
>citation, others express an opinion backed up by a voucher specimen that
>contains all the information that would be found in collection databases.
>GBIF seems to have no place for such data so far, and it would be difficult
>to provide, since we usually have none of
>"[InstitutionCode]+[CollectionCode]+[CatalogNumber]" (which is different
>from the problem of having duplicate CatalogNumbers you discuss). Still, what
>kind of data is that? What kind of data is created if a Ph.D. student
>digitizes the specimen records used for a taxonomic revision in a database
>that is specific to that revision?
>
>Bottom line: The physical specimen does exist, but in the foreseeable future
>all data GUIDs will be attached to data, not to the specimen. The only
>exception is where it is indeed possible to attach the GUID to the specimen;
>then that GUID could be cited.
>
>But then we have descriptions, and for description concepts (characters,
>structures, states, modifiers, etc.) we also need GUIDs to allow federating
>descriptions that use a common terminology. We have discussed this in SDD on
>and off (specifically, we propose to prefer semantically neutral
>identifiers, plus a simple optional mechanism called debugid/debugref to
>enrich data with calculated, semantically meaningful identifiers to
>facilitate debugging), but at the moment SDD is really waiting for a more
>general and common solution.
>
>So this discussion is highly relevant to descriptions as well. My main point
>is: what we are ultimately interested in with GBIF is knowledge, not
>physical possession. If we limit our thinking about the GBIF system to the
>very special case of institutionalized collections (as, in my opinion, both
>DwC and ABCD currently do), or to names governed by a nomenclatural code, I
>believe we may later have to re-architect.
>
>BTW, partly because of these differences between institutional collection
>customs and knowledge publication customs, I vote against a strongly
>centralized system. The LSID authority (lsid.gbif.net) and namespace (with
>no or low semantics) should be managed by GBIF, but not the IDs/versions.
>GBIF may provide a service to generate them, but should accept any locally
>generated ID and trust the generator to manage uniqueness.
>
>Gregor
>----------------------------------------------------------
>Gregor Hagedorn (G.Hagedorn at bba.de)
>Institute for Plant Virology, Microbiology, and Biosafety
>Federal Research Center for Agriculture and Forestry (BBA)
>Koenigin-Luise-Str. 19          Tel: +49-30-8304-2220
>14195 Berlin, Germany           Fax: +49-30-8304-2203
>
>Often wrong but never in doubt!
>
>
>
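Gregor's proposal splits an LSID into a centrally managed part (authority and namespace) and a locally generated part (the object identifier and its revision). A minimal Python sketch of that split, assuming the standard `urn:lsid:<authority>:<namespace>:<object>[:<revision>]` form; `lsid.gbif.net` is taken from the message, while the namespace label `ns0042`, the `Lsid` type, and the helpers `mint_local` and `parse` are hypothetical. Uniqueness within the namespace rests entirely on the local generator (here a random UUID), which is the trust the proposal asks GBIF to extend.

```python
import re
import uuid
from typing import NamedTuple, Optional

# Centrally managed authority, as proposed in the message above.
AUTHORITY = "lsid.gbif.net"

_LSID_RE = re.compile(
    r"^urn:lsid:(?P<authority>[^:]+):(?P<namespace>[^:]+)"
    r":(?P<object_id>[^:]+)(?::(?P<revision>[^:]+))?$",
    re.IGNORECASE,
)


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        base = f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"
        return base if self.revision is None else f"{base}:{self.revision}"


def mint_local(namespace: str, revision: Optional[str] = None) -> Lsid:
    """Hypothetical local generator: the provider, not GBIF, guarantees uniqueness.

    A random UUID (hex form, so it contains no ':') serves as the opaque object ID.
    """
    return Lsid(AUTHORITY, namespace, uuid.uuid4().hex, revision)


def parse(text: str) -> Lsid:
    """Split an LSID string into its parts; raise ValueError if it is not an LSID."""
    m = _LSID_RE.match(text)
    if m is None:
        raise ValueError(f"not an LSID: {text!r}")
    return Lsid(**m.groupdict())


lsid = mint_local("ns0042", revision="1")
print(lsid)         # urn:lsid:lsid.gbif.net:ns0042:<random hex>:1
print(parse(str(lsid)) == lsid)
```

Keeping both the namespace label and the object ID free of meaning, as here, means the identifier does not have to change when catalog numbers or institutional details do.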

--
Robert A. Morris
Professor of Computer Science
UMASS-Boston
ram at cs.umb.edu
http://www.cs.umb.edu/efg
http://www.cs.umb.edu/~ram
phone (+1)617 287 6466



