This is directly in response to Rod&#39;s response to Paul. I think the two of you may have just articulated nearly the same idea, though you seem not to think you did.<div><br></div><div>Paul envisions institutions each declaring their own URI-creating formula (to resolve down to a specimen at that institution), promulgated at a &quot;forum&quot; location.</div>


<div><br></div><div>Rod envisions URI formulation as happening at a GBIFesque centralized site.</div><div><br></div><div>If Paul&#39;s forum were GBIF (or similar), with an added function that GBIF (or similar) renegotiates any institutional declaration that collides with a pre-existing declaration, does that map to the same thing for both of you?</div>
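<div><br></div><div>To make the question concrete, here is a minimal sketch of such a forum, assuming each institution declares a URI template containing an {accession} placeholder. The class name Forum, the exception name, and the example.org URLs are all hypothetical. The sketch only rejects a colliding declaration; the renegotiation itself would be a human step that follows the rejection.</div>

```python
class NamespaceCollision(Exception):
    """Raised when a declaration clashes with an earlier one."""


class Forum:
    """Hypothetical central registry of institutional URI formulas."""

    def __init__(self):
        self._by_code = {}    # institution code mapped to its URI template
        self._by_prefix = {}  # URI prefix mapped to the code that owns it

    def declare(self, code, template):
        """Record a formula, refusing any that collides with an earlier one."""
        if "{accession}" not in template:
            raise ValueError("template needs an {accession} placeholder")
        prefix = template.split("{accession}")[0]
        if code in self._by_code:
            raise NamespaceCollision(f"code {code} is already declared")
        owner = self._by_prefix.get(prefix)
        if owner is not None:
            raise NamespaceCollision(f"prefix {prefix} already belongs to {owner}")
        self._by_code[code] = template
        self._by_prefix[prefix] = code

    def resolve(self, code, accession):
        """Build the specimen URI from the declared formula."""
        return self._by_code[code].format(accession=accession)


forum = Forum()
forum.declare("TIU", "http://example.org/tiu/{accession}")  # made-up URL
print(forum.resolve("TIU", "1217"))
```

<div>Under this sketch the declarations stay institutional (Paul), while lookup and collision checking happen in one place (Rod).</div>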


<div><br clear="all">-Dean<br>-- <br>Dean Pentcheff<br><a href="mailto:pentcheff@gmail.com">pentcheff@gmail.com</a><br><a href="mailto:dpentche@nhm.org">dpentche@nhm.org</a><br>

<br><div class="gmail_quote">On Fri, Feb 24, 2012 at 12:23 AM, Roderic Page <span dir="ltr">&lt;<a href="mailto:r.page@bio.gla.ac.uk">r.page@bio.gla.ac.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div style="word-wrap:break-word">Dear Paul,<div><br></div><div>A few quick comments. </div><div><br></div><div>Constructing URLs from specimen codes is a nice ideal, but in practice it breaks down because museum acronyms are not globally unique, and specimen codes are not always unique within institutions (this is a big issue for vertebrate collections where the same code may be used for a fish, a herp, a mammal, and a bird). So we need ways to disambiguate these. The Darwin Core triplet I&#39;ve been complaining about on my blog is one attempt to do this by using collectionCodes as part of the specimen code. But these are not terribly stable (a lot of the duplication in GBIF is due to museums mucking about with collection codes).</div>


<div><br></div><div>I personally don&#39;t hold out much hope for museums being able to develop and maintain rules for converting specimen codes into URIs. Let&#39;s be realistic, most museums have no idea about the web beyond creating pretty public interfaces. There are DiGiR servers at major museums running on machines with no domain name, just an IP address. </div>


<div><br></div><div>I suspect it&#39;s going to be easier to delegate resolving specimens to something like GBIF. As a data consumer, I&#39;d much prefer going to one place and getting the codes resolved, rather than have to first figure out where to go to find out the rule. If I want metadata for a scientific article I go to CrossRef, not the individual publisher. Distributed begets centralised.</div>


<div><br></div><div>I think not insisting on resolvable identifiers is a big mistake. It&#39;s like saying it&#39;s OK to publish source code without bothering to check whether it compiles. If they don&#39;t have to resolve I can publish any identifier I want (witness the number of &quot;fake&quot; LSIDs in the wild) and I&#39;ve made zero commitment that it means anything. And you&#39;ve taken away the ability of the user to test whether your identifier is meaningful, and thus build any degree of trust. The acid test of whether you are serious is whether your identifiers are &quot;live.&quot; The minute we say it&#39;s OK for them to be unresolvable we are buggered. </div>


<div><br></div><div>Regards</div><div><br></div><div>Rod</div><div><br></div><div><br></div><div><br></div><div><div><div class="h5"><br><div><div>On 24 Feb 2012, at 06:14, Paul Murray wrote:</div><br><blockquote type="cite">


<div><br>On 23/02/2012, at 9:37 PM, Roderic Page wrote:<br><br><blockquote type="cite">I&#39;ve recently written a number of posts on the implications of the lack of specimen-level identifiers, which makes it very hard to link different sources of data together, such as GBIF and Genbank <a href="http://iphylo.blogspot.com/2012/02/linking-gbif-and-genbank.html" target="_blank">http://iphylo.blogspot.com/2012/02/linking-gbif-and-genbank.html</a> , and is also a factor in creating duplicate records in GBIF <a href="http://iphylo.blogspot.com/2012/02/how-many-specimens-does-gbif-really.html" target="_blank">http://iphylo.blogspot.com/2012/02/how-many-specimens-does-gbif-really.html</a><br>


</blockquote><br>This is definitely an issue. In AFD (which is not a specimen database), we hold a &quot;museum code&quot; and an &quot;accession number&quot; for type specimens. Ideally, I would like to be able to get from these two fields to a URI.<br>


<br>For instance, given the data<br>
<table>
<tr><th>nameT</th><th>typeTypeT</th><th>museumT</th><th>museumDesc</th><th>accessonNo</th><th>materialElement</th><th>latLong</th><th>locality</th><th>comments</th></tr>
<tr><td>Holothuria bivittata Mitsukuri, 1912</td><td>Syntype</td><td>TIU</td><td>Tokyo Imperial University, Tokyo, Japan</td><td>1217</td><td></td><td></td><td>Okinawa, Riu Kiu and Yayeyana Ils, Japan</td><td></td></tr>
<tr><td>Holothuria bivittata Mitsukuri, 1912</td><td>Syntype</td><td>TIU</td><td>Tokyo Imperial University, Tokyo, Japan</td><td>1218</td><td></td><td></td><td></td><td></td></tr>
</table>


<br>I would like the AFD type specimen records (which are anonymous nodes in our profile data) to point to &quot;<a href="http://collections.tiu.edu.jp/colleciton-X/1217" target="_blank">http://collections.tiu.edu.jp/colleciton-X/1217</a>&quot; (or whatever), which could be generated from the data we already have. The key is the individual institutions holding collections.<br>


<br>The only way I can imagine this happening is for each institution with collections to state &quot;you construct URIs from our accession numbers like so&quot;. With that declaration, stores exposing data (such as the boa silos) can perform the mapping when the news reaches them. Once this is in place, anyone handling (for instance) TIU accession numbers can publish correct URIs in their RDF. Most particularly, other institutions accepting specimens from TIU could publish that their new URI for the item is &quot;owl:sameAs&quot; the TIU one. And the whole thing begins to knit together.<br>
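<br>As a rough sketch of that mapping, assuming each institution has declared a URI template with an {accession} placeholder: the URI_RULES table, both function names, and the example.org URLs below are made up for illustration, and the owl:sameAs statement is returned as a plain (subject, predicate, object) tuple rather than serialized RDF.<br>

```python
from urllib.parse import quote

# Full IRI of the owl:sameAs property.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical declarations; no real institution has published these rules.
URI_RULES = {
    "TIU": "http://example.org/tiu/{accession}",
    "URM": "http://example.org/urm/{accession}",
}


def specimen_uri(museum_code, accession):
    """Expand the declared rule for one museum code and accession number."""
    # Percent-encode the accession so spaces and slashes cannot break the URI.
    return URI_RULES[museum_code].format(accession=quote(accession, safe=""))


def same_as_triple(new_uri, original_uri):
    """State that the receiving institution's URI names the same item."""
    return (new_uri, OWL_SAME_AS, original_uri)


# An institution that accepted TIU item 1217 and re-accessioned it as 55.
print(same_as_triple(specimen_uri("URM", "55"), specimen_uri("TIU", "1217")))
```

<br>Nothing here requires either URI to resolve yet; the mapping only needs the declared rule.<br>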


<br>Importantly: it is not necessary to actually make these URIs resolvable. Hopefully, one day there *would* be something at that URL which would issue a 303 redirect, but the existence of the identifier as an identifier doesn&#39;t rely on it. All that is needed is that commitment to the namespace on the part of the issuer.<br>


<br>My point is first, that this can be done in stages, and doesn&#39;t depend on everybody implementing a big and expensive solution right away or in synchrony; and second, that we don&#39;t need a top-down assignment of identifiers. A bottom-up solution can work. Perhaps the main thing missing is a forum on which an institution can announce its creation and assignment of a URI namespace for persistent identifiers.<br>


<br>Having said all that, Rod&#39;s point is about identification of individuals. An accession number is put on a &quot;token&quot;, and of course a given individual may have many &quot;tokens&quot;. A case in point is this record in AFD:<br>


<br>
<table>
<tr><th>nameT</th><th>typeTypeT</th><th>museumT</th><th>museumDesc</th><th>accessonNo</th><th>materialElement</th><th>latLong</th><th>locality</th><th>comments</th></tr>
<tr><td>Bregmaceros pseudolanceolatus Torii, Javonillo &amp; Ozawa, 2004</td><td>Paratype</td><td>URM</td><td>University of the Ryukyus, Nishihara, Okinawa, Japan</td><td>P. 12156, 27508–27511, 29172, 29620, 33056</td><td></td><td></td><td></td><td></td></tr>
</table>


<br>The type specimen has 8 URM accession numbers, and there&#39;s really no way around that.<br><br>Even then, however, the question of identifying the individuals comes down to the same solution: if it&#39;s to happen, then it will have to be done by the curators of the collections - it&#39;s only the curators who actually know what items are from the same individual. A third party generating UUIDs for all these things just isn&#39;t going to work out - they won&#39;t get it right. What is needed is for the curator to announce, for instance, &quot;individuals shall be identified by <a href="http://specimens.mymuseum.edu" target="_blank">http://specimens.mymuseum.edu</a>/&lt;collection id&gt;/&lt;collector&#39;s field number for the individual&gt;&quot;. It really doesn&#39;t matter how the URIs are done, as long as it&#39;s consistent, persistent, and public.<br>


<br></div></blockquote></div><br></div></div><div>

<span style="text-indent:0px;letter-spacing:normal;font-variant:normal;text-align:-webkit-auto;font-style:normal;font-weight:normal;line-height:normal;border-collapse:separate;text-transform:none;font-size:medium;white-space:normal;font-family:Helvetica;word-spacing:0px">---------------------------------------------------------<br>


Roderic Page<br>Professor of Taxonomy<br>Institute of Biodiversity, Animal Health and Comparative Medicine<br>College of Medical, Veterinary and Life Sciences<br>Graham Kerr Building<br>University of Glasgow<br>Glasgow G12 8QQ, UK<br>


<br>Email: <a href="mailto:r.page@bio.gla.ac.uk" target="_blank">r.page@bio.gla.ac.uk</a><br>Tel: <a href="tel:%2B44%20141%20330%204778" value="+441413304778" target="_blank">+44 141 330 4778</a><br>Fax: <a href="tel:%2B44%20141%20330%202792" value="+441413302792" target="_blank">+44 141 330 2792</a><br>


AIM: <a href="mailto:rodpage1962@aim.com" target="_blank">rodpage1962@aim.com</a><br>Facebook: <a href="http://www.facebook.com/profile.php?id=1112517192" target="_blank">http://www.facebook.com/profile.php?id=1112517192</a><br>


Twitter: <a href="http://twitter.com/rdmpage" target="_blank">http://twitter.com/rdmpage</a><br>Blog: <a href="http://iphylo.blogspot.com" target="_blank">http://iphylo.blogspot.com</a><br>Home page: <a href="http://taxonomy.zoology.gla.ac.uk/rod/rod.html" target="_blank">http://taxonomy.zoology.gla.ac.uk/rod/rod.html</a><br>


</span>

</div>

<br></div></div><br>_______________________________________________<br>

tdwg-tag mailing list<br>

<a href="mailto:tdwg-tag@lists.tdwg.org">tdwg-tag@lists.tdwg.org</a><br>

<a href="http://lists.tdwg.org/mailman/listinfo/tdwg-tag" target="_blank">http://lists.tdwg.org/mailman/listinfo/tdwg-tag</a><br>

<br></blockquote></div><br></div>