[tdwg-tag] Specimen identifiers [SEC=UNCLASSIFIED]

Dean Pentcheff pentcheff at gmail.com
Fri Feb 24 18:29:55 CET 2012


This is directly in response to Rod's response to Paul. I think the two of
you may have just articulated nearly the same idea, though you seem not to
think you did.

Paul envisions institutions each declaring their own URI-creating formula
(to resolve down to a specimen at that institution), promulgated at a
"forum" location.

Rod envisions URI formulation as happening at a GBIFesque centralized site.

If Paul's forum were GBIF (or similar), with an added function that GBIF
(or similar) renegotiates any institutional declaration that collides with
a pre-existing declaration, does that map to the same thing for both of you?
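
One way to picture that combined model is a central registry that accepts each institution's declaration and flags any collision for renegotiation. The following is only a minimal sketch, not a description of any existing GBIF service: the class name `NamespaceRegistry`, the `{accession}` placeholder, and the example.org prefix in the usage note are all assumptions made for illustration.

```python
from urllib.parse import quote


class NamespaceCollision(Exception):
    """Raised when a declaration clashes with an earlier one."""


class NamespaceRegistry:
    """Hypothetical central forum holding institutional URI declarations."""

    def __init__(self):
        self._by_code = {}      # institution code -> URI template
        self._by_template = {}  # URI template -> institution code

    def declare(self, code, template):
        # The template must say where the accession number goes.
        if "{accession}" not in template:
            raise ValueError("template needs an {accession} placeholder")
        # Re-declaring an identical pair is harmless.
        if self._by_code.get(code) == template:
            return
        # A reused code or a reused template is a collision that the
        # registry would have to renegotiate with the institution.
        if code in self._by_code:
            raise NamespaceCollision(f"code {code!r} is already declared")
        if template in self._by_template:
            owner = self._by_template[template]
            raise NamespaceCollision(f"template already used by {owner!r}")
        self._by_code[code] = template
        self._by_template[template] = code

    def uri_for(self, code, accession):
        # Percent-encode the accession number so it is safe in a URI path.
        safe_accession = quote(str(accession), safe="")
        return self._by_code[code].replace("{accession}", safe_accession)
```

With this sketch, `declare("TIU", "http://example.org/tiu/{accession}")` followed by `uri_for("TIU", 1217)` builds the specimen URI, while a second, different declaration for "TIU" raises `NamespaceCollision` instead of silently overwriting the first.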

-Dean
-- 
Dean Pentcheff
pentcheff at gmail.com
dpentche at nhm.org

On Fri, Feb 24, 2012 at 12:23 AM, Roderic Page <r.page at bio.gla.ac.uk> wrote:

> Dear Paul,
>
> A few quick comments.
>
> Constructing URLs from specimen codes is a nice ideal, but in practice
> breaks down because museum acronyms are not globally unique, and specimen
> codes are not always unique within institutions (this is a big issue for
> vertebrate collections where the same code may be used for a fish, a
> herp, a mammal, and a bird). So we need ways to disambiguate these. The
> Darwin Core triplet I've been complaining about on my blog is one attempt
> to do this by using collectionCodes as part of the specimen code. But these
> are not terribly stable (a lot of the duplication in GBIF is due to museums
> mucking about with collection codes).
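
The ambiguity described above can be made concrete with a small sketch. The records below are invented (the institution code "XYZ" and its collection codes are not real data). The sketch groups them first by institution code plus catalogue number, then by the full Darwin Core triplet of institutionCode, collectionCode and catalogNumber.

```python
from collections import defaultdict

# Invented records: (institutionCode, collectionCode, catalogNumber).
records = [
    ("XYZ", "Fish", "1234"),
    ("XYZ", "Herps", "1234"),
    ("XYZ", "Birds", "1234"),
    ("XYZ", "Birds", "5678"),
]


def find_ambiguous(records, key):
    """Return the keys shared by more than one record."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Without the collection code, three specimens share one identifier.
without_collection = find_ambiguous(records, key=lambda r: (r[0], r[2]))

# With the full triplet, every record is distinct.
with_collection = find_ambiguous(records, key=lambda r: r)
```

The triplet only disambiguates for as long as the collection codes stay fixed. If a museum renames "Herps", the same specimen acquires a second triplet, which is the instability noted above.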
>
> I personally don't hold out much hope for museums being able to develop
> and maintain rules for converting specimen codes into URIs. Let's be
> realistic, most museums have no idea about the web beyond creating pretty
> public interfaces. There are DiGiR servers at major museums running on
> machines with no domain name, just an IP address.
>
> I suspect it's going to be easier to delegate resolving specimens to
> something like GBIF. As a data consumer, I'd much prefer going to one place
> and getting the codes resolved, rather than have to first figure out where
> to go to find out the rule.  If I want metadata for a scientific article I
> go to CrossRef, not the individual publisher. Distributed begets
> centralised.
>
> I think not insisting on resolvable identifiers is a big mistake. It's
> like saying it's OK to publish source code without bothering to check
> whether it compiles. If they don't have to resolve I can
> publish any identifier I want (witness the number of "fake" LSIDs in the
> wild) and I've made zero commitment that it means anything. And you've
> taken away the ability of the user to test whether your identifier is
> meaningful, and thus build any degree of trust. The acid test of whether
> you are serious is whether your identifiers are "live." The minute we say
> it's OK for them to be unresolvable we are buggered.
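
The "acid test" above can be automated by a data consumer. This is a minimal sketch using only the Python standard library. The function name `is_live` and the injectable `opener` parameter are illustrative assumptions, and "live" is taken here to mean nothing more than "a HEAD request ends in a non-error HTTP status".

```python
import urllib.request


def is_live(uri, opener=urllib.request.urlopen, timeout=10):
    """Return True if the identifier resolves to a non-error response."""
    try:
        # Building the request can raise ValueError for a malformed URI.
        request = urllib.request.Request(uri, method="HEAD")
        # urlopen follows redirects (including 303) and raises HTTPError,
        # a subclass of OSError, for 4xx and 5xx responses.
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        return False
```

The `opener` argument exists only so the check can be exercised without network access. In ordinary use it is left at its default.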
>
> Regards
>
> Rod
>
>
>
>
> On 24 Feb 2012, at 06:14, Paul Murray wrote:
>
>
> On 23/02/2012, at 9:37 PM, Roderic Page wrote:
>
> I've recently written a number of posts on the implications of the lack
> of specimen-level identifiers, which makes it very hard to link different
> sources of data together, such as GBIF and Genbank
> http://iphylo.blogspot.com/2012/02/linking-gbif-and-genbank.html , and
> is also a factor in creating duplicate records in GBIF
> http://iphylo.blogspot.com/2012/02/how-many-specimens-does-gbif-really.html
>
>
> This is definitely an issue. In AFD (which is not a specimen database), we
> hold a "museum code" and an "accession number" for type specimens.
> Ideally, I would like to be able to get from these two fields to a URI.
>
> For instance, given the data
>
>   nameT:           Holothuria bivittata Mitsukuri, 1912
>   typeTypeT:       Syntype
>   museumT:         TIU
>   museumDesc:      Tokyo Imperial University, Tokyo, Japan
>   accessonNo:      1217
>   materialElement:
>   latLong:
>   locality:        Okinawa, Riu Kiu and Yayeyana Ils, Japan
>   comments:
>
>   nameT:           Holothuria bivittata Mitsukuri, 1912
>   typeTypeT:       Syntype
>   museumT:         TIU
>   museumDesc:      Tokyo Imperial University, Tokyo, Japan
>   accessonNo:      1218
>   materialElement:
>   latLong:
>   locality:
>   comments:
>
> I would like the AFD type specimen records (which are anonymous nodes in
> our profile data) to point to "
> http://collections.tiu.edu.jp/colleciton-X/1217" (or whatever), which
> could be generated from the data we already have. The key is the individual
> institutions holding collections.
>
> The only way I can imagine this happening is for each institution with
> collections to state "you construct URIs from our accession numbers like
> so". With that declaration, stores exposing data (such as the boa silos)
> can perform the mapping when the news reaches them. Once this is in place,
> anyone handling (for instance) TIU accession numbers can publish correct
> URIs in their RDF. Most particularly, other institutions accepting
> specimens from TIU could publish that their new URI for the item is
> "owl:sameAs" the TIU one. And the whole thing begins to knit together.
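
The mapping and the "owl:sameAs" link described above can be sketched as follows. Both URI templates are hypothetical (neither TIU nor any receiving institution has declared them), and "NEW" is a made-up code for the institution accepting the specimen. The output is a single N-Triples statement.

```python
from urllib.parse import quote

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical declarations; neither URI prefix is real.
TEMPLATES = {
    "TIU": "http://collections.example.org/tiu/{accession}",
    "NEW": "http://specimens.example.net/{accession}",
}


def specimen_uri(museum_code, accession):
    """Build a specimen URI from a museum code and an accession number."""
    safe_accession = quote(str(accession), safe="")
    return TEMPLATES[museum_code].format(accession=safe_accession)


def same_as_triple(new_uri, original_uri):
    """Return an N-Triples line stating that the two URIs co-refer."""
    return f"<{new_uri}> <{OWL_SAME_AS}> <{original_uri}> ."


original = specimen_uri("TIU", 1217)
transferred = specimen_uri("NEW", "A-0042")
print(same_as_triple(transferred, original))
```

Nothing here requires either URI to resolve. The triple is well-formed as soon as both namespaces have been declared.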
>
> Importantly: it is not necessary to actually make these URIs resolvable.
> Hopefully, one day there *would* be something at that URL which would issue
> a 303 redirect, but the existence of the identifier as an identifier
> doesn't rely on it. All that is needed is that commitment to the namespace
> on the part of the issuer.
>
> My point is first, that this can be done in stages, and doesn't depend on
> everybody implementing a big and expensive solution right away or in
> synchrony; and second, that we don't need a top-down assignment of
> identifiers. A bottom-up solution can work. Perhaps the main thing missing
> is a forum on which an institution can announce its creation and assignment
> of a URI namespace for persistent identifiers.
>
> Having said all that, Rod's point is about identification of individuals.
> An accession number is put on a "token"; of course, a given individual may
> have many "tokens". A case in point is this record in AFD:
>
>   nameT:           Bregmaceros pseudolanceolatus Torii, Javonillo & Ozawa, 2004
>   typeTypeT:       Paratype
>   museumT:         URM
>   museumDesc:      University of the Ryukyus, Nishihara, Okinawa, Japan
>   accessonNo:      P. 12156, 27508–27511, 29172, 29620, 33056
>   materialElement:
>   latLong:
>   locality:
>   comments:
>
> The type specimen has 8 URM accession numbers, and there's really no way
> around that.
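
The count of eight comes from expanding the range 27508–27511 in the accession field. Here is a minimal sketch of that expansion. The function name is hypothetical, and treating the leading "P." as a collection prefix to be dropped is an assumption based on this one record.

```python
import re


def expand_accessions(field):
    """Expand a comma-separated accession field with ranges into numbers."""
    # Drop a leading prefix such as "P." (assumed to be a collection prefix).
    body = re.sub(r"^\s*[A-Za-z]+\.\s*", "", field)
    numbers = []
    for part in body.split(","):
        part = part.strip()
        if not part:
            continue
        # Accept an en dash or a plain hyphen as the range separator.
        bounds = re.split(r"\s*[–-]\s*", part)
        if len(bounds) == 2:
            start, end = int(bounds[0]), int(bounds[1])
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers


accessions = expand_accessions("P. 12156, 27508–27511, 29172, 29620, 33056")
print(len(accessions))  # 8
```

Each of the eight numbers would get its own token-level URI under a scheme like the one above, which is why a separate identifier for the individual is still needed.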
>
> Even then, however, the question of identifying the individuals comes down
> to the same solution: if it's to happen, then it will have to be done by
> the curators of the collections - it's only the curators who actually know
> what items are from the same individual. A third party generating UUIDs for
> all these things just isn't going to work out - they won't get it right.
> What is needed is for the curator to announce, for instance, "individuals
> shall be identified by http://specimens.mymuseum.edu/<collection
> id>/<collector's field number for the individual>". It really doesn't
> matter how the URIs are done, as long as it's consistent, persistent, and
> public.
>
>
>
>
>
> ---------------------------------------------------------
> Roderic Page
> Professor of Taxonomy
> Institute of Biodiversity, Animal Health and Comparative Medicine
> College of Medical, Veterinary and Life Sciences
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QQ, UK
>
> Email: r.page at bio.gla.ac.uk
> Tel: +44 141 330 4778
> Fax: +44 141 330 2792
> AIM: rodpage1962 at aim.com
> Facebook: http://www.facebook.com/profile.php?id=1112517192
> Twitter: http://twitter.com/rdmpage
> Blog: http://iphylo.blogspot.com
> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>
>
>
>