Re: Topic 2: GUIDs for Collections and Specimens
Apologies to Dave for being slightly dense, I now understand the proposed solution, which works fine for MVZ records (and others). I should have paid more attention to the XML I was getting back, which would have shown me what I needed to know.
This temporary hack isn't a great solution to the problem of GUIDs for specimens, but works for me at present. The only pain is trying to map classic specimen codes such as "MVZ 193037" from GenBank records onto the correct specimen, but that's another story.
To perhaps be slightly more relevant to this discussion, the issue of multiple identifiers for the "same" information keeps coming up. For example, information on an MVZ specimen may be retrieved using DiGIR (as an XML document), directly from the MVZ web site (as an HTML document, with a different specimen id from the DiGIR record), or through GBIF (with yet another id). For my own purposes I'm linking the different representations, so that my database knows about them (for the technically minded, I'm using RDF so the link is made using the "owl:sameAs" property). In the case of specimens I'm guessing the information is usually the "same" (typically it is ultimately served by the same source database), but in other cases it can be very different (e.g., publications where a PubMed id and a DOI resolve to very different digital documents).
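A rough sketch of how such sameAs links could be used to group the different representations of one specimen. This is not the database code referred to above: the three `urn:example:` identifiers are made-up placeholders for the DiGIR, MVZ web site, and GBIF records, and the triples are held as plain Python tuples rather than in an RDF store. The only real name is the URI of the sameAs property, which lives in the OWL namespace.

```python
# URI of the sameAs property (OWL namespace).
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical identifiers for three representations of one specimen.
triples = [
    ("urn:example:digir:record", OWL_SAME_AS, "urn:example:mvz-web:record"),
    ("urn:example:mvz-web:record", OWL_SAME_AS, "urn:example:gbif:record"),
]


def same_as_clusters(triples):
    """Group identifiers connected by sameAs links (simple union-find)."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk up
        # to the root, shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            parent[find(subject)] = find(obj)

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


clusters = same_as_clusters(triples)
for cluster in clusters:
    print(sorted(cluster))
```

Because sameAs is treated here as symmetric and transitive, the DiGIR and GBIF placeholders end up in the same group even though no triple links them directly.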
Regards
Rod
On 21 Oct 2005, at 00:18, Dave Vieglais wrote:
Hi Rod, I was just pointing out that if you include CollectionCode in your example then you would not have the duplication of records that occurs in the example. The combination of InstitutionCode, CollectionCode, and CatalogNumber should provide a GUID to a specimen record. So, to slightly modify your example:
DiGIR provider URL : resource : CollectionCode : specimen code
will generally be sufficient, but in some cases, a single server resource may offer records from several institutions, hence:
DiGIR provider URL : resource : InstitutionCode : CollectionCode : specimen code
would be unique. It would be a simple matter to extend DiGIR slightly to support direct resolution of such an identifier. Perhaps something like:
http://some.server/digir.php?id=resource/InstitutionCode/CollectionCode/CatalogNumber
would be sufficient to identify a single record and retrieve its digital representation as well.
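As a minimal sketch of the scheme described above, the composite key and the proposed URL could be assembled as follows. The function names are hypothetical, the `:` separator follows the pattern shown earlier in this message, and the URL layout is only the proposal made here; DiGIR itself does not resolve such identifiers. The collection code "Mammals" in the usage lines is an assumed value for illustration.

```python
from urllib.parse import quote


def specimen_key(institution_code, collection_code, catalog_number):
    """Join the three Darwin Core fields into one composite key."""
    parts = [str(p).strip() for p in
             (institution_code, collection_code, catalog_number)]
    if not all(parts):
        raise ValueError("InstitutionCode, CollectionCode and "
                         "CatalogNumber are all required")
    return ":".join(parts)


def resolver_url(base, resource, institution_code, collection_code,
                 catalog_number):
    """Build a URL in the proposed (hypothetical) id=... layout."""
    segments = (resource, institution_code, collection_code, catalog_number)
    # Percent-encode each segment so that a "/" or space inside a code
    # cannot be mistaken for a path separator.
    path = "/".join(quote(str(s), safe="") for s in segments)
    return f"{base}?id={path}"


# "Mammals" is an assumed collection code, used only for illustration.
print(specimen_key("MVZ", "Mammals", "148946"))
print(resolver_url("http://some.server/digir.php",
                   "resource", "MVZ", "Mammals", "148946"))
```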
regards, Dave V.
Roderic Page wrote:
My point is that it isn't always done (and the MVZ example concerns totally different specimens, rather than preparations of the same specimen). My aim is not to criticise DiGIR and Darwin Core specifically (although the absence of a GUID is a major weakness), but simply to provide a concrete example where digital records for totally different specimens are not clearly distinguished. In the MVZ example, one could retrieve the record for the desired specimen if one searched on the taxonomic name, but this is cumbersome -- ideally I want a GUID that can be resolved to the appropriate specimen independent of any other information. DiGIR can do this, so long as DiGIR providers use different resource names for different collections.
Regards
Rod
On 20 Oct 2005, at 23:11, Dave Vieglais wrote:
Hi Roderic, In general, for records retrieved from data sources exposed using the Darwin Core one should be able to combine InstitutionCode, CollectionCode and CatalogNumber to provide unique identifiers for those records. This is not always the case, however. The most common exception is probably the presence of records for different preparations of the same specimen.
regards, Dave V.
Roderic Page wrote:
As a consumer of specimen GUIDs, I've found specimens to be frustrating to deal with as individual collections don't guarantee uniqueness of identifiers (Donald's point 2 below). For example, in the absence of specimen GUIDs (such as LSIDs) I'd hoped to use a three part identifier based on the DiGIR provider, e.g.
DiGIR provider URL : resource : specimen code
Hence,
digir.fieldmuseum.org/digir/DiGIR.php:MammalsDwC2:FMNH158106
\-----------------------------------/ \---------/ \--------/
              provider                 resource    specimen
identifies specimen FMNH 158106 of Tatera robusta at the Field Museum in Chicago. The idea behind this crude hack is that the identifier can be resolved (there's enough information in the identifier to retrieve the record, see for example http://darwin.zoology.gla.ac.uk/~rpage/hacks/2/index.html ).
To my horror, if I do this for MVZ 148946, I get three specimens back, one each for Chaetodipus baileyi baileyi, Calidris mauri, and Rana cascadae. This is an instance where the same specimen code is being used in three different collections (mammals, birds, and herps). I guess MVZ could have avoided this by using a different name for the 'resource' field for each collection.
I offer this as an example of where GUIDs are vital if we are to avoid linking to the wrong information, and also where individual providers need to ensure that the identifiers they generate are unique.
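The MVZ 148946 collision can be illustrated with a small check for keys that match more than one record. This is only a sketch: the three records are reduced to the fields needed here, the taxon names are those reported above, and the CollectionCode values ("Mammals", "Birds", "Herps") are assumed labels, not the values MVZ actually serves.

```python
from collections import defaultdict

# Simplified stand-ins for the three records returned for MVZ 148946.
# The CollectionCode values are assumptions made for this example.
records = [
    {"InstitutionCode": "MVZ", "CollectionCode": "Mammals",
     "CatalogNumber": "148946",
     "ScientificName": "Chaetodipus baileyi baileyi"},
    {"InstitutionCode": "MVZ", "CollectionCode": "Birds",
     "CatalogNumber": "148946", "ScientificName": "Calidris mauri"},
    {"InstitutionCode": "MVZ", "CollectionCode": "Herps",
     "CatalogNumber": "148946", "ScientificName": "Rana cascadae"},
]


def ambiguous_keys(records, fields):
    """Return every key (built from `fields`) shared by more than one record."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[f] for f in fields)
        groups[key].append(rec["ScientificName"])
    return {key: names for key, names in groups.items() if len(names) > 1}


# Without the collection, one key matches all three specimens.
without_collection = ambiguous_keys(
    records, ["InstitutionCode", "CatalogNumber"])
# With the collection included, no key is shared.
with_collection = ambiguous_keys(
    records, ["InstitutionCode", "CollectionCode", "CatalogNumber"])

print(without_collection)
print(with_collection)
```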
Regards
Rod
Professor Roderic D. M. Page Editor, Systematic Biology DEEB, IBLS Graham Kerr Building University of Glasgow Glasgow G12 8QP United Kingdom
Phone: +44 141 330 4778 Fax: +44 141 330 2792 email: r.page@bio.gla.ac.uk web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/