
I certainly agree; but we've been talking about shared identifiers for more than twenty years now, and that hasn't gotten us anywhere either. In fact, the focus on that goal may have hampered or delayed the development of the "matrix". I used to be a HUGE proponent of shared identifiers, back in the day. It seemed like the obvious answer to data integration. But I've come to realize that the silos exist, each silo has its own way of doing things that relies on its own internal identifiers, and in most cases the silos don't have the resources (or the motivation) to update their systems to incorporate shared identifiers. Speaking as someone who manages the natural sciences data resources for a Museum, I CERTAINLY wouldn't want to make that sort of investment unless there were universally acknowledged identifiers for the data objects I manage (which, for most of our classes of data objects, there decidedly are not). I'd much rather leave my legacy systems in place, then build a small indexing system that cross-links my own identifiers to the ones that exist "out there" (TSNs, LSIDs, DOIs, OCLC, ISSN/ISBN, etc.). If others do the same, then I see that as a first step towards building a bridge between the silos.

By far the most arduous part of the process is reconciling one's own identifiers against the identifiers of other data sources. This is particularly messy for items that are otherwise identified with "noisy text", like taxon names, people's names, place names, and literature citations. Collectively, these represent by far the bulk of the records that overlap among datasets, and hence are the objects we stand to gain the most from by sharing identifiers. This work would have to be done anyway, regardless of whether we aimed for a shared-identifier approach or a cross-mapped identifier approach. With the mapped-identifier approach, that's almost all the work that needs to be done.
With the shared-identifier approach, it's only part of the work that would need to be done; the other part is upgrading all the applications used by all the silos to incorporate the shared identifiers. And, of course, there's the problem of actually converging on what the shared identifiers are. It is indeed wishful thinking that by building the infrastructure, the identifiers will coalesce. However, it is also wishful thinking that all the players will ever come to an agreement on what the "one true" identifier is for shared objects, and moreover devote the necessary resources to convert existing systems to accommodate them. I've come to believe that the former wishful thinking is more plausible (if only slightly) than the latter. Also, the latter has had more time to demonstrate its infeasibility.

Aloha,
Rich

From: Roderic Page [mailto:r.page@bio.gla.ac.uk]
Sent: Thursday, February 23, 2012 11:54 AM
To: TDWG TAG
Cc: Kevin Richards; Richard Pyle
Subject: Re: [tdwg-tag] Specimen identifiers

Dear Rich,

I guess I'd argue the reverse, in that pumping data out with unique identifiers demonstrably doesn't get us very far. We've had "globally unique", "persistent", and "actionable" identifiers for years (LSIDs, URLs, DOIs, etc.) and very little to show for it. In other words, there isn't a biodiversity informatics "matrix" to "plug in" to.

Building a "cross-mapping service" is not necessarily simple, again because the individual data providers rarely use existing identifiers for things outside their domain. Hence we have text strings for literature when perfectly good identifiers exist. The benefits of the "matrix" come from the links, and we aren't providing them.

The notion that this is all going to magically coalesce at some unspecified point in the future strikes me as wishful thinking. Someone is soon going to point out that the Emperor has no clothes...
Regards
Rod

On 23 Feb 2012, at 21:09, Richard Pyle wrote:

Hi All,

As I've said many times before, the "shared" bit is useful, but far less important than the "globally unique", "persistent", and "actionable" bits. As Kevin says, we can handle the non-shared GUIDs (as long as they meet the other three criteria) by simply building a cross-mapping service; but that's only useful to the extent that the identifiers are truly unique, persistent, and actionable (in that order of importance). Once we have a real infrastructure for integrating the silos that achieves a critical mass of adoption, I'm sure our community will eventually converge toward shared identifiers (specifically, towards the ones that are most robustly persistent and provide the best services when actioned upon), and the superfluous identifiers will eventually fade into historical metadata (like NODC numbers in the context of ITIS). But without an infrastructure to get people to come out of their silos and "plug in" to the biodiversity informatics "matrix", it's unlikely that we'll ever get to the point of collapsing multiple GUIDs for the same object into a single shared GUID.

Aloha,
Rich

-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Kevin Richards
Sent: Thursday, February 23, 2012 9:41 AM
To: Roderic Page; TDWG TAG
Subject: Re: [tdwg-tag] Specimen identifiers

I agree Rod, it would be ideal to have unique, shared identifiers for specimens, and for as many other types of data as possible. The problem here is the "shared" bit. This is what most people hope for, and hoped would come out of all the GUID and vocabulary work that has been done. But you know how hard it is to get different projects, organisations, and datasets to really share IDs. It's pretty much impossible, so I have moved on from this dream and hope to solve this instead through linkages and linked-data-type approaches.
Another problem is what the identifier refers to. As someone (I think Rich) said in a recent post, two different people may apply the same identifier to slightly different things, e.g. to the "name" of a person, or to the "person" itself. This is another barrier to reuse of shared identifiers. You may think that specimens should be very simple, since it is just a specimen that you refer to, but there can be subtle differences. For example, if someone has data about the accessioned physical specimen and another has an image of that specimen, they could both reasonably say that they are discussing the same specimen and so give these two "different" objects the same identifier.

Kevin

-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Roderic Page
Sent: Thursday, 23 February 2012 11:38 p.m.
To: TDWG TAG
Subject: [tdwg-tag] Specimen identifiers

I've recently written a number of posts on the implications of the lack of specimen-level identifiers. This lack makes it very hard to link different sources of data together, such as GBIF and GenBank (http://iphylo.blogspot.com/2012/02/linking-gbif-and-genbank.html), and is also a factor in creating duplicate records in GBIF (http://iphylo.blogspot.com/2012/02/how-many-specimens-does-gbif-really.html).

I know this is something of a hobby horse of mine, but we can have all the wonderful ontologies and vocabularies we want; if we don't have globally unique, shared identifiers to glue this stuff together, we are going to find ourselves making yet more silos...
Regards
Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962@aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html

_______________________________________________
tdwg-tag mailing list
tdwg-tag@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag

Please consider the environment before printing this email

Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz

This message is only intended for the addressee named above. Its contents may be privileged or otherwise protected. Any unauthorized use, disclosure or copying of this message or its contents is prohibited. If you have received this message by mistake, please notify us immediately by reply mail or by collect telephone call. Any personal opinions expressed in this message do not necessarily represent the views of the Bishop Museum.