[tdwg-tag] Specimen identifiers
Richard Pyle
deepreef at bishopmuseum.org
Thu Feb 23 23:12:52 CET 2012
I certainly agree; but we've been talking about shared identifiers for more
than twenty years now, and that hasn't gotten us anywhere either. In fact,
the focus on that goal may have hampered or delayed the development of the
"matrix".
I used to be a HUGE proponent of shared identifiers, back in the day. It
seemed like the obvious answer to data integration. But I've come to
realize that the silos exist, and each silo has its own way of doing things
that relies on its own internal identifiers, and in most cases, the silos
don't have the resources (or the motivation) to update their systems to
incorporate shared identifiers. Speaking as someone who manages the natural
sciences data resources for a Museum, I CERTAINLY wouldn't want to make that
sort of investment unless there were universally acknowledged identifiers
for the data objects I manage (which, for most of our classes of data
objects, there decidedly are not). I'd much rather leave my legacy systems in
place, and then build a small indexing system that cross-links my own
identifiers to the ones that exist "out there" (TSNs, LSIDs, DOIs, OCLC,
ISSN/ISBN, etc.). If others do the same, then I see that as a first step
towards building a bridge between the silos.
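As a rough illustration of the kind of small indexing system described above, here is a minimal sketch in Python. It assumes an in-memory index that sits beside the legacy system and leaves it untouched. The class name, method names, and every identifier value are hypothetical and chosen only for the example.

```python
from collections import defaultdict


class CrossLinkIndex:
    """Cross-links local identifiers to identifiers minted elsewhere."""

    def __init__(self):
        # local id -> set of (scheme, external id)
        self._by_local = defaultdict(set)
        # (scheme, external id) -> set of local ids
        self._by_external = defaultdict(set)

    def link(self, local_id, scheme, external_id):
        """Record that a local record and an external identifier refer to the same thing."""
        key = (scheme, external_id)
        self._by_local[local_id].add(key)
        self._by_external[key].add(local_id)

    def externals_for(self, local_id):
        """All external identifiers known for one local record."""
        return sorted(self._by_local.get(local_id, set()))

    def locals_for(self, scheme, external_id):
        """All local records cross-linked to one external identifier."""
        return sorted(self._by_external.get((scheme, external_id), set()))


# Hypothetical identifiers, for illustration only.
index = CrossLinkIndex()
index.link("local:taxon/42", "TSN", "0000001")
index.link("local:taxon/42", "LSID", "urn:lsid:example.org:names:1")
index.link("local:ref/7", "DOI", "10.0000/example.1")

print(index.externals_for("local:taxon/42"))
print(index.locals_for("DOI", "10.0000/example.1"))
```

The index works in both directions, so a silo can answer "what is my record called out there?" as well as "which of my records does this outside identifier point to?" without changing its internal keys.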
By far, the most arduous part in the process is reconciling one's own
identifiers against the identifiers of other data sources. This is
particularly messy for items that are otherwise identified with "noisy text"
- like taxon names, people names, place names, and literature citations
(which, collectively, represent by far the bulk of the records that overlap
among datasets, and hence are where we stand to gain the most from
sharing the same identifiers). This work would have to be done anyway -
regardless of whether we aimed for a shared identifier approach, or a
cross-mapped identifier approach. With the mapped-identifier approach,
that's almost all the work that needs to be done. With the shared
identifier approach, it's only part of the work that would need to be done -
the other part is to upgrade all the applications used by all the silos to
incorporate the shared identifiers. And, of course, there's the problem of
actually converging on what the shared identifiers are.
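To make the reconciliation step concrete, here is a minimal sketch of how candidate cross-links might be proposed for "noisy text" labels. It uses approximate string matching from Python's standard difflib module as a stand-in for a real reconciliation workflow. The normalization rules, the 0.85 cutoff, and the record identifiers are all assumptions made for the example, and the output is meant for human review, not automatic acceptance.

```python
import difflib
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def propose_matches(local_records, external_records, cutoff=0.85):
    """Return {local_id: [(external_id, score), ...]}, best candidates first."""
    normalized_external = {
        ext_id: normalize(label) for ext_id, label in external_records.items()
    }
    proposals = {}
    for local_id, label in local_records.items():
        target = normalize(label)
        scored = []
        for ext_id, ext_label in normalized_external.items():
            score = difflib.SequenceMatcher(None, target, ext_label).ratio()
            if score >= cutoff:
                scored.append((ext_id, round(score, 3)))
        proposals[local_id] = sorted(scored, key=lambda pair: -pair[1])
    return proposals


# Hypothetical record identifiers; the labels are only sample text.
local = {
    "loc-1": "Chromis abyssus.",      # stray punctuation
    "loc-2": "Chromis abysus",        # misspelling
    "loc-3": "Centropyge narcosis",   # no counterpart in the external set
}
external = {
    "ext-A": "Chromis abyssus",
    "ext-B": "Chromis circumaurea",
}

proposals = propose_matches(local, external)
for local_id, candidates in proposals.items():
    print(local_id, candidates)
```

Whichever approach is taken, shared or cross-mapped, something like this pass over the text labels has to happen, and the accepted pairs are what a cross-link index would then store.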
It is indeed wishful thinking that by building the infrastructure, the
identifiers will coalesce. However, it is also wishful thinking that all
the players will ever come to an agreement on what the "one true" identifier
is for shared objects, and moreover devote the necessary resources to
convert existing systems to accommodate them. I've come to believe that the
former wishful thinking is more plausible (if only slightly) than the latter
wishful thinking. Also, the latter has had more time to demonstrate its
infeasibility.
Aloha,
Rich
From: Roderic Page [mailto:r.page at bio.gla.ac.uk]
Sent: Thursday, February 23, 2012 11:54 AM
To: TDWG TAG
Cc: Kevin Richards; Richard Pyle
Subject: Re: [tdwg-tag] Specimen identifiers
Dear Rich,
I guess I'd argue the reverse, in that pumping data out with unique
identifiers demonstrably doesn't get us very far. We've had "globally
unique", "persistent", and "actionable" identifiers for years (LSIDs, URLs,
DOIs, etc.) and very little to show for it. In other words, there isn't a
biodiversity informatics "matrix" to "plug in" to. Building a "cross-mapping
service" is not necessarily simple, again because the individual data
providers rarely use existing identifiers for things outside their domain.
Hence we have text strings for literature when perfectly good identifiers
exist.
The benefits of the "matrix" come from the links, and we aren't providing
them. The notion this is all going to magically coalesce at some unspecified
point in the future strikes me as wishful thinking. Someone is soon going to
point out that the Emperor has no clothes...
Regards
Rod
On 23 Feb 2012, at 21:09, Richard Pyle wrote:
Hi All,
As I've said many times before, the "shared" bit is useful, but far less
important than the "globally unique", "persistent", and "actionable" bits.
As Kevin says, we can handle the non-shared GUIDs (as long as they meet the
other three criteria) by simply building a cross-mapping service; but that's
only useful to the extent that the identifiers are truly unique, persistent,
and actionable (in that order of importance).
Once we have a real infrastructure that achieves critical mass of adoption
for integrating the silos, then I'm sure eventually our community will
converge toward shared identifiers (specifically, towards the ones that are
most robustly persistent, and provide the best services when actioned upon),
and the superfluous identifiers will eventually fade into becoming
historical metadata (like NODC numbers in the context of ITIS).
But without an infrastructure to get people to come out of their silos and
"plug in" to the biodiversity informatics "matrix", it's unlikely that we'll
ever get to the point of collapsing multiple identical GUIDs into a single
shared GUID for the same object.
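One way to picture the collapsing step: once cross-mappings assert that two GUIDs denote the same object, the GUIDs can be grouped into clusters, and any member of a cluster could later be promoted to the shared one. The sketch below uses a simple union-find structure in Python. The class, its method names, and the identifier strings are hypothetical, and nothing in the thread prescribes this particular technique.

```python
class IdentifierClusters:
    """Groups identifiers that have been asserted to denote the same object."""

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        """Return the representative of ident's cluster, registering ident if new."""
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving: point ident at its grandparent as we walk up.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def assert_same(self, a, b):
        """Record that identifiers a and b refer to the same object."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def cluster_of(self, ident):
        """All identifiers currently known to be equivalent to ident."""
        root = self._find(ident)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


# Hypothetical identifiers, for illustration only.
clusters = IdentifierClusters()
clusters.assert_same("urn:lsid:example.org:specimen:1", "http://example.org/specimen/1")
clusters.assert_same("http://example.org/specimen/1", "local:0001")
clusters.assert_same("local:0002", "http://example.org/specimen/2")

print(clusters.cluster_of("local:0001"))
```

Equivalence here is transitive, so two silos that never mapped to each other directly still end up in the same cluster if both mapped to a common third identifier.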
Aloha,
Rich
-----Original Message-----
From: tdwg-tag-bounces at lists.tdwg.org [mailto:tdwg-tag-bounces at lists.tdwg.org] On Behalf Of Kevin Richards
Sent: Thursday, February 23, 2012 9:41 AM
To: Roderic Page; TDWG TAG
Subject: Re: [tdwg-tag] Specimen identifiers
I agree Rod, it would be ideal to have unique, shared identifiers for
specimens, and as many other types of data as possible.
The problem here is the "shared" bit. This is what most people hope for and
hoped would come out of all the GUID and vocabulary work that has been
done. But you know how hard it is to get different projects, organisations,
datasets to really share IDs. Pretty much impossible, so I have moved on
from this dream and hope to solve this more by linkages, linked data type
approaches instead.
Another problem is what the identifier refers to. As someone (I think Rich)
said in a recent post, two different people may apply the same identifier to
slightly different things - eg to the "name" of a person, or to the "person"
itself. This is another barrier to reuse of shared identifiers.
You may think that specimens should be very simple, it is just a specimen
that you refer to, but there can be subtle differences, for example if
someone has data about the accessioned physical specimen and another has
an image of that specimen - they could both well say that they are discussing
the same specimen so give these two "different" objects the same identifier.
Kevin
-----Original Message-----
From: tdwg-tag-bounces at lists.tdwg.org [mailto:tdwg-tag-bounces at lists.tdwg.org] On Behalf Of Roderic Page
Sent: Thursday, 23 February 2012 11:38 p.m.
To: TDWG TAG
Subject: [tdwg-tag] Specimen identifiers
I've recently written a number of posts on the implications of the lack of
specimen-level identifiers, which makes it very hard to link different sources
of data together, such as GBIF and GenBank
http://iphylo.blogspot.com/2012/02/linking-gbif-and-genbank.html , and is
also a factor in creating duplicate records in GBIF
http://iphylo.blogspot.com/2012/02/how-many-specimens-does-gbif-really.html
I know this is something of a hobby horse of mine, but we can have all the
wonderful ontologies and vocabularies we want, if we don't have globally
unique, shared identifiers to glue this stuff together we are going to find
ourselves making yet more silos...
Regards
Rod
---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK
Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962 at aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
_______________________________________________
tdwg-tag mailing list
tdwg-tag at lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag