Topic 2: GUIDs for Collections and Specimens

Roderic Page r.page at BIO.GLA.AC.UK
Tue Oct 25 12:56:09 CEST 2005


Apologies to Dave for being slightly dense; I now understand the
proposed solution, which works fine for MVZ records (and others). I
should have paid more attention to the XML I was getting back, which
would have shown me what I needed to know.
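
A minimal sketch of why adding CollectionCode removes the MVZ 148946
collision discussed in the quoted thread. The species names come from
the thread; the CollectionCode values ("Mammals", "Birds", "Herps"),
the colon-separated key format, and the helper names are assumptions
made for illustration, not what MVZ or DiGIR actually returns.

```python
from collections import defaultdict


def specimen_key(institution_code, collection_code, catalog_number):
    """Join the three Darwin Core fields into one colon-separated key."""
    parts = (institution_code, collection_code, catalog_number)
    return ":".join(str(part).strip() for part in parts)


# Species names are from the thread; CollectionCode values are assumed.
records = [
    {"InstitutionCode": "MVZ", "CollectionCode": "Mammals",
     "CatalogNumber": "148946",
     "ScientificName": "Chaetodipus baileyi baileyi"},
    {"InstitutionCode": "MVZ", "CollectionCode": "Birds",
     "CatalogNumber": "148946",
     "ScientificName": "Calidris mauri"},
    {"InstitutionCode": "MVZ", "CollectionCode": "Herps",
     "CatalogNumber": "148946",
     "ScientificName": "Rana cascadae"},
]


def group_by(rows, key_func):
    """Map each key to the list of scientific names that share it."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_func(row)].append(row["ScientificName"])
    return dict(groups)


# Two-part key: all three specimens collapse onto one identifier.
without_collection = group_by(
    records,
    lambda r: r["InstitutionCode"] + ":" + r["CatalogNumber"],
)

# Three-part key: each specimen gets its own identifier.
with_collection = group_by(
    records,
    lambda r: specimen_key(
        r["InstitutionCode"], r["CollectionCode"], r["CatalogNumber"]
    ),
)
```

With the two-part key the three records share one identifier; with
the three-part key each record has its own.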

This temporary hack isn't a great solution to the problem of GUIDs for
specimens, but works for me at present.  The only pain is trying to map
classic specimen codes such as "MVZ 193037" from GenBank records onto
the correct specimen, but that's another story.
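
As a rough illustration of the mapping problem, the sketch below
splits a classic specimen code into an institution acronym and a
catalogue number. The regular expression and the function name are
assumptions; real GenBank voucher strings vary far more than this
pattern allows. Note that the code alone carries no CollectionCode,
which is exactly why the mapping is painful.

```python
import re

# Letters (acronym), optional separator, then digits (catalogue number).
# This pattern is an assumption and covers only the simplest codes.
SPECIMEN_CODE = re.compile(r"^\s*([A-Za-z]+)[\s:-]*(\d+)\s*$")


def parse_specimen_code(text):
    """Return (institution, catalog_number), or None if no match."""
    match = SPECIMEN_CODE.match(text)
    if match is None:
        return None
    return match.group(1).upper(), match.group(2)


mvz = parse_specimen_code("MVZ 193037")
fmnh = parse_specimen_code("FMNH158106")
```

The collection still has to be found some other way, for example
from the taxonomic name on the GenBank record.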

To perhaps be slightly more relevant to this discussion, the issue of
multiple identifiers for the "same" information keeps coming up. For
example, information on an MVZ specimen may be retrieved using DiGIR (as
an XML document), directly from the MVZ web site (as an HTML document,
with a different specimen id from the DiGIR record), or through GBIF
(with yet another id). For my own purposes I'm linking the different
representations, so that my database knows about them (for the
technically minded, I'm using RDF so the link is made using the
"owl:sameAs" property). In the case of specimens I'm guessing the
information is usually the "same" (typically it is ultimately served by
the same source database), but in other cases it can be very different
(e.g., publications where resolving a PubMed id and a DOI lead to very
different digital documents).
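
A minimal sketch of linking several identifiers for the "same" record
and writing the links out as N-Triples. It uses the OWL property
owl:sameAs (http://www.w3.org/2002/07/owl#sameAs). The three
example.org URIs are placeholders, not real DiGIR, MVZ, or GBIF
identifiers, and the class is an illustration rather than the
database described above.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


class EquivalenceIndex:
    """Group identifiers that have been declared equivalent."""

    def __init__(self):
        self._parent = {}

    def _find(self, uri):
        """Return the representative identifier for uri's group."""
        self._parent.setdefault(uri, uri)
        while self._parent[uri] != uri:
            # Path halving keeps later lookups short.
            self._parent[uri] = self._parent[self._parent[uri]]
            uri = self._parent[uri]
        return uri

    def link(self, a, b):
        """Declare that a and b identify the same thing."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a, b):
        """True if a and b have been linked, directly or indirectly."""
        return self._find(a) == self._find(b)

    def triples(self):
        """One owl:sameAs statement per non-representative identifier."""
        lines = []
        for uri in sorted(self._parent):
            root = self._find(uri)
            if uri != root:
                lines.append(f"<{root}> <{OWL_SAME_AS}> <{uri}> .")
        return lines


# Placeholder URIs standing in for the three representations.
digir = "http://example.org/digir/MVZ/Mammals/148946"
html = "http://example.org/mvz/specimen?id=12345"
gbif = "http://example.org/gbif/occurrence/67890"

index = EquivalenceIndex()
index.link(digir, html)
index.link(html, gbif)
```

Because the links are transitive, asserting sameness between a PubMed
id and a DOI that resolve to very different documents would merge
them as well, so this kind of link is best reserved for cases where
the representations really are interchangeable.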


Regards

Rod

On 21 Oct 2005, at 00:18, Dave Vieglais wrote:

> Hi Rod,
> I was just pointing out that if you include CollectionCode in your
> example then you would not have the duplication of records that occurs
> in the example.  The combination of InstitutionCode, CollectionCode,
> and CatalogNumber should provide a GUID to a specimen record.  So to
> slightly modify your example
>
> DiGIR provider URL : resource : CollectionCode : specimen code
>
> will generally be sufficient, but in some cases, a single server
> resource may offer records from several institutions, hence:
>
> DiGIR provider URL : resource : InstitutionCode : CollectionCode :
> specimen code
>
> would be unique.  It would be a simple matter to extend DiGIR slightly
> to support direct resolution of such an identifier.  Perhaps something
> like:
>
> http://some.server/digir.php?id=resource/InstitutionCode/CollectionCode/CatalogNumber
>
> would be sufficient to identify a single record and retrieve its
> digital representation as well.
>
> regards,
>   Dave V.
>
> Roderic Page wrote:
>> My point is that it isn't always done (and the MVZ example concerns
>> totally different specimens, rather than preparations of the same
>> specimen). My aim is not to criticise DiGIR and Darwin Core
>> specifically (although the absence of a GUID is a major weakness), but
>> simply to provide a concrete example where digital records for totally
>> different specimens are not clearly distinguished. In the MVZ example,
>> one could retrieve the record for the desired specimen if one searched
>> on the taxonomic name, but this is cumbersome -- ideally I want a GUID
>> that can be resolved to the appropriate specimen independent of any
>> other information. DiGIR can do this, so long as DiGIR providers use
>> different resource names for different collections.
>>
>> Regards
>>
>> Rod
>>
>>
>>
>> On 20 Oct 2005, at 23:11, Dave Vieglais wrote:
>>
>>> Hi Roderic,
>>> In general, for records retrieved from data sources exposed using the
>>> Darwin Core one should be able to combine InstitutionCode,
>>> CollectionCode and CatalogNumber to provide unique identifiers for
>>> those records.  This is not always the case, however; the most common
>>> exception is probably the presence of records for different
>>> preparations of the same specimen.
>>>
>>> regards,
>>>   Dave V.
>>>
>>> Roderic Page wrote:
>>>
>>>> As a consumer of specimen GUIDs, I've found specimens to be
>>>> frustrating to deal with as individual collections don't guarantee
>>>> uniqueness of identifiers (Donald's point 2 below). For example, in
>>>> the absence of specimen GUIDs (such as LSIDs) I'd hoped to use a
>>>> three-part identifier based on the DiGIR provider, e.g.
>>>>
>>>> DiGIR provider URL : resource : specimen code
>>>>
>>>> Hence,
>>>>
>>>> digir.fieldmuseum.org/digir/DiGIR.php:MammalsDwC2:FMNH158106
>>>> \-----------------------------------/ \---------/ \--------/
>>>>               provider                 resource    specimen
>>>>
>>>> identifies specimen FMNH 158106 of Tatera robusta at the Field
>>>> Museum in Chicago. The idea behind this crude hack is that the
>>>> identifier can be resolved (there's enough information in the
>>>> identifier to retrieve the record, see for example
>>>> http://darwin.zoology.gla.ac.uk/~rpage/hacks/2/index.html ).
>>>>
>>>> To my horror, if I do this for MVZ 148946, I get three specimens
>>>> back, one each for Chaetodipus baileyi baileyi, Calidris mauri, and
>>>> Rana cascadae. This is an instance where the same specimen code is
>>>> being used in three different collections (mammals, birds, and
>>>> herps). I guess MVZ could have avoided this by using a different
>>>> name for the 'resource' field for each collection.
>>>>
>>>> I offer this as an example of where GUIDs are vital if we are to
>>>> avoid linking to the wrong information, and also where individual
>>>> providers need to ensure that the identifiers they generate are
>>>> unique.
>>>>
>>>> Regards
>>>>
>>>> Rod
>>>>
>>>>
>>>> Professor Roderic D. M. Page
>>>> Editor, Systematic Biology
>>>> DEEB, IBLS
>>>> Graham Kerr Building
>>>> University of Glasgow
>>>> Glasgow G12 8QP
>>>> United Kingdom
>>>>
>>>> Phone: +44 141 330 4778
>>>> Fax: +44 141 330 2792
>>>> email: r.page at bio.gla.ac.uk
>>>> web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>>> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>>>>
>>>> Subscribe to Systematic Biology through the Society of Systematic
>>>> Biologists Website: http://systematicbiology.org
>>>> Search for taxon names at
>>>> http://darwin.zoology.gla.ac.uk/~rpage/portal/
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>
>

More information about the tdwg-tag mailing list