Yes. And I would qualify what you said as follows:
On Sat, Feb 25, 2012 at 4:55 AM, Roderic Page <r.page@bio.gla.ac.uk> wrote:
Dear Dean,
In essence, yes, so long as we:
a) avoid collisions due to non-unique acronyms (hence we can't automatically
generate URIs from specimen codes without some fussing)
It's the function of the centralizing agency to ensure this when they
accept a "listing formula" from an organization. If the URI-generating
formula could result in a collision with an existing listing, the
formula would have to be renegotiated before being accepted for the
registry.
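A minimal sketch of that acceptance check, under stated assumptions: the class and method names, the "{accession}" placeholder, and the example.org addresses are all hypothetical, and "collision" is taken to mean either a reused acronym or an overlapping URI prefix. A real registry would likely need a richer notion of collision than this.

```python
class ListingCollision(Exception):
    """Raised when a proposed listing formula clashes with an accepted one."""


class FormulaRegistry:
    """Hypothetical central registry of per-institution URI formulas."""

    PLACEHOLDER = "{accession}"  # assumed placeholder syntax

    def __init__(self):
        self._templates = {}  # normalised acronym -> URI template

    @classmethod
    def _prefix(cls, template):
        # Everything before the placeholder is the URI space being claimed.
        return template.split(cls.PLACEHOLDER, 1)[0]

    def register(self, acronym, template):
        """Accept a listing formula, or raise so it can be renegotiated."""
        if self.PLACEHOLDER not in template:
            raise ValueError("template must contain " + self.PLACEHOLDER)
        key = acronym.strip().upper()
        prefix = self._prefix(template)
        for other_key, other_template in self._templates.items():
            other_prefix = self._prefix(other_template)
            if key == other_key:
                raise ListingCollision(f"acronym {key} is already listed")
            # Two formulas collide if one URI space contains the other.
            if prefix.startswith(other_prefix) or other_prefix.startswith(prefix):
                raise ListingCollision(f"URI space overlaps with {other_key}")
        self._templates[key] = template

    def template_for(self, acronym):
        """Return the accepted template for an acronym, or None."""
        return self._templates.get(acronym.strip().upper())
```

Rejecting at registration time keeps the burden on the registry rather than on every data consumer, which is the division of labour described above.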
b) realise that we can't necessarily unpack a URI and use that to locate the
specimen (often we could, sometimes we won't be able to, in this sense the
identifiers are "opaque")
Yep. I'm all in favor of opaque, non-information-bearing identifiers.
The moment you accuse the text of the identifier of having intrinsic
meaning, you accept all the ugliness of figuring out how to "update"
the identifier when the underlying data are updated. [Real-life case
in point: some departments in our institution had a system of minting
specimen IDs based on the year of collection plus other digits. With
some frequency we discover that the specimens were actually collected
in some other year. So either: (a) we change the identifier
(unacceptable for all the reasons we know and love); or (b) we know
that we cannot trust the year-part of any identifier (so we used this
formula why?).]
c) avoid changing the URI if a specimen moves collection/institution or if
the host institution relabels it. Once minted the identifier doesn't change
(because that will break any links to it, defeating the point of having the
URIs).
Yes. It's supposed to be a non-data-bearing opaque identifier. In the
worse (but inevitable) case where specimens get additional
identifiers, or get subsampled into additional identifiable pieces,
there has to be a "synonymy" service that would (perhaps recursively)
return the other relevant identifiers. That would be (cough, cough)
trivial to implement as long as any subsequent identifier assignment
includes a reference to the already-existing identifier.
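To make the "trivial" claim concrete, here is a rough sketch of such a synonymy lookup. It assumes every new identifier is recorded together with the existing identifier it refers to; the class and method names are invented for illustration. It also treats subsamples and re-labels alike as one connected set, which a real service might want to distinguish.

```python
from collections import defaultdict, deque


class SynonymyService:
    """Hypothetical lookup of all identifiers linked to a given one."""

    def __init__(self):
        self._links = defaultdict(set)  # identifier -> directly linked identifiers

    def assign(self, new_id, existing_id):
        """Record that new_id was assigned with a reference to existing_id."""
        self._links[new_id].add(existing_id)
        self._links[existing_id].add(new_id)

    def related(self, identifier):
        """Return every identifier reachable from the given one, sorted."""
        seen = {identifier}
        queue = deque([identifier])
        while queue:
            current = queue.popleft()
            # .get() avoids creating empty entries for unknown identifiers.
            for neighbour in self._links.get(current, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        seen.discard(identifier)
        return sorted(seen)
```

The traversal is iterative rather than literally recursive, but it follows chains of any length, so an identifier assigned three relabelings later still leads back to the original.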
Regards
Rod
On 24 Feb 2012, at 17:29, Dean Pentcheff wrote:
This is directly in response to Rod's response to Paul. I think the two of
you may have just articulated nearly the same idea, though you seem not to
think you did.
Paul envisions institutions each declaring their own URI-creating formula
(to resolve down to a specimen at that institution), promulgated at a
"forum" location.
Rod envisions URI formulation as happening at a GBIFesque centralized site.
If Paul's forum were GBIF (or similar), with an added function that GBIF (or
similar) renegotiates any institutional declaration that collides with a
pre-existing declaration, does that map to the same thing for both of you?
-Dean
--
Dean Pentcheff
pentcheff@gmail.com
dpentche@nhm.org
On Fri, Feb 24, 2012 at 12:23 AM, Roderic Page <r.page@bio.gla.ac.uk> wrote:
Dear Paul,
A few quick comments.
Constructing URLs from specimen codes is a nice ideal, but in practice it
breaks down because museum acronyms are not globally unique, and specimen
codes are not always unique within institutions (this is a big issue for
vertebrate collections where the same code may be used for a fish, a herp,
a mammal, and a bird). So we need ways to disambiguate these. The Darwin
Core triplet I've been complaining about on my blog is one attempt to do
this by using collectionCodes as part of the specimen code. But these are
not terribly stable (a lot of the duplication in GBIF is due to museums
mucking about with collection codes).
I personally don't hold out much hope for museums being able to develop
and maintain rules for converting specimen codes into URIs. Let's be
realistic, most museums have no idea about the web beyond creating pretty
public interfaces. There are DiGiR servers at major museums running on
machines with no domain name, just an IP address.
I suspect it's going to be easier to delegate resolving specimens to
something like GBIF. As a data consumer, I'd much prefer going to one place
and getting the codes resolved, rather than having to first figure out where
to go to find out the rule. If I want metadata for a scientific article I
go to CrossRef, not the individual publisher. Distributed begets
centralised.
I think not insisting on resolvable identifiers is a big mistake. It's
like saying it's OK to publish source code that you haven't actually
bothered to check whether it compiles. If they don't have to resolve I can
publish any identifier I want (witness the number of "fake" LSIDs in the
wild) and I've made zero commitment that it means anything. And you've taken
away the ability of the user to test whether your identifier is meaningful,
and thus build any degree of trust. The acid test of whether you are serious
is whether your identifiers are "live." The minute we say it's OK for them
to be unresolvable we are buggered.
Regards
Rod
On 24 Feb 2012, at 06:14, Paul Murray wrote:
On 23/02/2012, at 9:37 PM, Roderic Page wrote:
I've recently written a number of posts on the implications of the lack
of specimen-level identifiers, which makes it very hard to link different
sources of data together, such as GBIF and GenBank
http://iphylo.blogspot.com/2012/02/linking-gbif-and-genbank.html , and is
also a factor in creating duplicate records in GBIF
http://iphylo.blogspot.com/2012/02/how-many-specimens-does-gbif-really.html
This is definitely an issue. In AFD (which is not a specimen database), we
hold a "museum code" and an "accession number" for type specimens. Ideally,
I would like to be able to get from these two fields to a URI.
For instance, given the data
nameT | typeTypeT | museumT | museumDesc | accessonNo | materialElement | latLong | locality | comments
Holothuria bivittata Mitsukuri, 1912 | Syntype | TIU | Tokyo Imperial University, Tokyo, Japan | 1217 | | | Okinawa, Riu Kiu and Yayeyana Ils, Japan |
Holothuria bivittata Mitsukuri, 1912 | Syntype | TIU | Tokyo Imperial University, Tokyo, Japan | 1218 | | | |
I would like the AFD type specimen records (which are anonymous nodes in
our profile data) to point to
"http://collections.tiu.edu.jp/colleciton-X/1217" (or whatever), which could
be generated from the data we already have. The key is the individual
institutions holding collections.
The only way I can imagine this happening is for each institution with
collections to state "you construct URIs from our accession numbers like
so". With that declaration, stores exposing data (such as the boa silos) can
perform the mapping when the news reaches them. Once this is in place,
anyone handling (for instance) TIU accession numbers can publish correct
URIs in their RDF. Most particularly, other institutions accepting specimens
from TIU could publish that their new URI for the item is "owl:sameAs" the
TIU one. And the whole thing begins to knit together.
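As a rough illustration of the mapping described here: the sketch below assumes a declaration is just a URI template keyed by museum code. The template, the example.org addresses, and the function names are hypothetical; only the owl:sameAs property URI is the real one.

```python
from urllib.parse import quote

# Hypothetical declarations: museum code -> URI template announced by the holder.
FORMULAS = {
    "TIU": "http://collections.example.org/tiu/{accession}",
}

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def specimen_uri(museum_code, accession_no, formulas=FORMULAS):
    """Build a specimen URI, or return None if no formula has been declared yet."""
    template = formulas.get(museum_code)
    if template is None:
        return None  # the "news" has not reached us; keep the record anonymous
    # Percent-encode the accession number so it is safe as a path segment.
    return template.format(accession=quote(str(accession_no), safe=""))


def same_as_triple(new_uri, original_uri):
    """Return one N-Triples line stating that two URIs name the same item."""
    return f"<{new_uri}> <{OWL_SAME_AS}> <{original_uri}> ."
```

With this, an AFD record holding museum code "TIU" and accession number 1217 maps to a URI as soon as the declaration is known, and an institution receiving the specimen can publish a same_as_triple linking its own URI to that one.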
Importantly: it is not necessary to actually make these URIs resolvable.
Hopefully, one day there *would* be something at that URL which would issue
a 303 redirect, but the existence of the identifier as an identifier doesn't
rely on it. All that is needed is that commitment to the namespace on the
part of the issuer.
My point is first, that this can be done in stages, and doesn't depend on
everybody implementing a big and expensive solution right away or in
synchrony; and second, that we don't need a top-down assignment of
identifiers. A bottom-up solution can work. Perhaps the main thing missing
is a forum on which an institution can announce its creation and assignment
of a URI namespace for persistent identifiers.
Having said all that, Rod's point is about identification of individuals.
An accession number is put on a "token"; of course, a given individual may
have many "tokens". A case in point is this record in AFD:
nameT | typeTypeT | museumT | museumDesc | accessonNo | materialElement | latLong | locality | comments
Bregmaceros pseudolanceolatus Torii, Javonillo & Ozawa, 2004 | Paratype | URM | University of the Ryukyus, Nishihara, Okinawa, Japan | P. 12156, 27508–27511, 29172, 29620, 33056 | | | |
The type specimen has 8 URM accession numbers, and there's really no way
around that.
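For anyone checking the count, a small sketch of how that accession field expands into individual numbers. It assumes the leading "P." prefix applies to every number and that a dash denotes an inclusive range; both are guesses about the notation, and the function name is made up for illustration.

```python
import re


def expand_accessions(text):
    """Expand a field like 'P. 12156, 27508–27511' into individual accession numbers."""
    # An optional alphabetic prefix (e.g. "P.") is assumed to apply to all numbers.
    prefix_match = re.match(r"\s*([A-Za-z]+\.?)\s*", text)
    prefix = prefix_match.group(1) if prefix_match else ""
    body = text[prefix_match.end():] if prefix_match else text

    numbers = []
    for part in body.split(","):
        part = part.strip()
        if not part:
            continue
        # Accept either an en dash or a plain hyphen as the range separator.
        bounds = re.split(r"\s*[–-]\s*", part)
        if len(bounds) == 2:
            start, end = int(bounds[0]), int(bounds[1])
            numbers.extend(range(start, end + 1))  # ranges are inclusive
        else:
            numbers.append(int(part))
    return [f"{prefix} {n}".strip() for n in numbers]
```

Applied to the URM field above, this yields eight entries, matching the count given; whether those eight tokens come from one individual is still something only the curator can say.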
Even then, however, the question of identifying the individuals comes down
to the same solution: if it's to happen, then it will have to be done by the
curators of the collections - it's only the curators who actually know what
items are from the same individual. A third party generating UUIDs for all
these things just isn't going to work out - they won't get it right. What is
needed is for the curator to announce, for instance, "individuals shall be
identified by http://specimens.mymuseum.edu/<collection id>/<collector's
field number for the individual>". It really doesn't matter how the URIs are
done, as long as it's consistent, persistent, and public.
---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962@aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
_______________________________________________
tdwg-tag mailing list
tdwg-tag@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag