[tdwg-tag] Specimen identifiers [SEC=UNCLASSIFIED]

Paul Murray pmurray at anbg.gov.au
Mon Feb 27 04:27:48 CET 2012


On 25/02/2012, at 4:29 AM, Dean Pentcheff wrote:

> This is directly in response to Rod's response to Paul. I think the two of you may have just articulated nearly the same idea, though you seem not to think you did.
> 
> Paul envisions institutions each declaring their own URI-creating formula (to resolve down to a specimen at that institution), promulgated at a "forum" location.
> 
> Rod envisions URI formulation as happening at a GBIFesque centralized site.
> 
> If Paul's forum were GBIF (or similar), with an added function that GBIF (or similar) renegotiates any institutional declaration that collides with a pre-existing declaration, does that map to the same thing for both of you?

Well, if institutions are assigning URIs with their own domain names in them, or if GBIF is handing out URI prefixes that the institutions use, then collisions wouldn't be an issue.

As a technical person, perhaps I don't quite see things from the point of view of institutions whose interest in the web stops at having a pretty website, as someone suggested. It seems to me the easiest thing in the world to spark up a server and say "these are our URIs". But if people are outsourcing their web presence, then I can appreciate that creating a SemWeb presence might not seem as easy a thing to do to them. This is also the case for people who live in large institutions with byzantine rules about what may and may not go on the corporate websites.

If there are places where the issuing of IDs to specimens is as chaotic as Rod describes, well - there is a flip side to what I was saying earlier (that the people who create the numbers can easily create URIs): if the people who create the numbers have bits and bobs all over the place, then an external institution like GBIF is not going to be able to sort it out remotely. Someone has to be on the ground, treading the dusty caverns under the museum, their feeble yellowish torch beam counterpoint to the flickering and burned-out bluish fluorescent lights above, flicking the spiders away and copying labels into their iPad and working out what's what, trying not to accidentally kick over the skeletons.

Or the equivalent in cyberspace - the forgotten databases with their cryptic column names, distant echoes of those hidden recesses where the specimen boxes are packed.

A start might be: 

* GBIF issues URI prefixes to people/institutions that want them. A system for doing this would need to be decided on, and that will involve (shudder) people.  
* GBIF advises the institution on setting up the namespace under that prefix, trying to make the point that URIs should be persistent, unique, and all those good things.
* GBIF acts as a registry for these namespaces, a place to declare "if you have a specimen record from collection X, then for sem-web purposes the URI should look like *this*" - allowing all that legacy data to be knitted together.

The GBIF webserver might manage incoming http requests by
* holding some very basic, minimal data - even just a dcterms:title and nothing else
* or, 303 redirecting to the institution's own webserver (much in the manner of a PURL server) according to rules expressed simply as a regular expression find/replace.
* or, fetching the RDF from the institution's server, and ADDING some RDF facts of its own to the result
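
The second option (a 303 redirect driven by a regular expression find/replace) might look something like the sketch below. This is only an illustration: the rule table, the `resolve` function and the target host `specimens.example.org` are all made up, and the "uq" path layout is just the one floated later in this message.

```python
import re

# Hypothetical rule table: (pattern matched against the request path,
# replacement template). The target host is invented for illustration.
REDIRECT_RULES = [
    (re.compile(r"^/uq/([^/]+)/(\d+)$"),
     r"http://specimens.example.org/\1/record/\2"),
]


def resolve(path):
    """Return (status, location) for an incoming request path.

    The first rule whose pattern matches wins and yields a 303 redirect;
    if nothing matches, the answer is a 404 with no location.
    """
    for pattern, template in REDIRECT_RULES:
        if pattern.match(path):
            # The pattern is anchored at both ends, so sub() rewrites
            # the whole path in one go.
            return 303, pattern.sub(template, path)
    return 404, None
```

A real service would hang this off whatever web framework is in use and send the location back in a `Location` header; the point is only that each institution's rule is one pattern and one template.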

This third option means that the GBIF database can serve as a central spot where movements of specimens (i.e., the assignment of a new accession number) can be recorded. Hopefully not the only spot, though. Best practice is always to serve up the initial and the immediately prior URI along with any URI you give to the specimen. (This only makes sense for RDF, though: you can't just "add" things to a nicely formatted HTML page.)
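
As a rough sketch of that "adding facts" step, assume the RDF is held as a plain set of (subject, predicate, object) tuples rather than in a real RDF library. The function names are hypothetical. `dcterms:replaces` is a real Dublin Core term, but using it for "immediately prior URI" is my assumption, and the "initial URI" predicate is invented outright.

```python
DCTERMS_REPLACES = "http://purl.org/dc/terms/replaces"
# Hypothetical predicate, invented for this sketch; not a real vocabulary term.
INITIAL_URI = "http://example.org/vocab/initialUri"


def history_triples(current_uri, earlier_uris):
    """Build the extra facts the registry would add.

    earlier_uris lists the specimen's previous URIs, oldest first.
    Returns a set of (subject, predicate, object) tuples naming the
    initial URI and the immediately prior URI.
    """
    triples = set()
    if earlier_uris:
        triples.add((current_uri, INITIAL_URI, earlier_uris[0]))
        triples.add((current_uri, DCTERMS_REPLACES, earlier_uris[-1]))
    return triples


def augment(institution_triples, current_uri, earlier_uris):
    """Union the institution's own triples with the registry's history facts."""
    return set(institution_triples) | history_triples(current_uri, earlier_uris)
```

Because an RDF graph is just a set of statements, the merge is a set union, which is exactly why this works for RDF and not for an HTML page.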

To make all this happen, you would want some sort of usable machine-to-machine service, and you'll have to manage authentication. (Passwords are a bit of a pain - perhaps a cryptographic certificate given out when the namespace prefix is assigned? Easy enough to do.) You'll want a test/staging service and a real service …

It's a fair bit of work, come to think of it, just on the technical side, and this is without starting on the "part-of" issues.

----------------
(Perhaps "uri.gbif.org" as the virtual host name? http://uri.gbif.org/institution-code/collection-id/number. We'd also like a URI for "the list of institutions" and for each institution "the list of collections". Perhaps reserve "meta"? Thus http://uri.gbif.org/uq/meta, http://uri.gbif.org/uq/collectionX/meta as the well-known locations for README information.)
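
A minimal sketch of that layout, assuming the usual http:// scheme with the "uri.gbif.org" host suggested above. The function names are hypothetical, and treating "meta" as a forbidden institution or collection code is just one way of "reserving" it.

```python
from urllib.parse import quote

BASE = "http://uri.gbif.org"   # proposed virtual host, not a live service
RESERVED = "meta"              # reserved path segment for README information


def _segment(value):
    """Percent-encode one path segment and refuse the reserved word."""
    value = str(value)
    if value == RESERVED:
        raise ValueError(f"'{RESERVED}' is reserved and cannot be used as a code")
    return quote(value, safe="")


def specimen_uri(institution, collection, number):
    """institution-code/collection-id/number, as proposed above."""
    return "/".join([BASE, _segment(institution), _segment(collection), _segment(number)])


def meta_uri(institution, collection=None):
    """Well-known README location for an institution or one of its collections."""
    parts = [BASE, _segment(institution)]
    if collection is not None:
        parts.append(_segment(collection))
    parts.append(RESERVED)
    return "/".join(parts)
```

For example, `specimen_uri("uq", "collectionX", 12345)` gives `http://uri.gbif.org/uq/collectionX/12345`, and `meta_uri("uq")` gives `http://uri.gbif.org/uq/meta`.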

(Allocation of URIs would cover more than just specimens. Here at biodiversity.org.au, we use dotted names rather than slashes for our namespaces, meaning that our URIs have natural LSID equivalents. I think LSID components can have slashes, so urn:lsid:uri.gbif.org:uq/collectionX:12345)
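
The URI-to-LSID mapping in that example could be sketched as below. It rests on the same unverified guess that a slash is acceptable inside the LSID namespace component; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def uri_to_lsid(uri):
    """Map http://host/a/b/number to urn:lsid:host:a/b:number.

    The host becomes the LSID authority, the last path segment becomes
    the object id, and everything in between becomes the namespace
    (slashes kept, which assumes LSID namespaces allow them).
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("need at least a namespace and an object id in the path")
    namespace = "/".join(segments[:-1])
    object_id = segments[-1]
    return f"urn:lsid:{parts.netloc}:{namespace}:{object_id}"
```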





