Re: [tdwg-tag] Specimen identifiers

23 Feb 2012

I certainly agree; but we've been talking about shared identifiers for more
than twenty years now, and that hasn't gotten us anywhere either.  In fact,
the focus on that goal may have hampered or delayed the development of the
"matrix".

I used to be a HUGE proponent of shared identifiers, back in the day.  It
seemed like the obvious answer to data integration.  But I've come to
realize that the silos exist, and each silo has its own way of doing things
that relies on their own internal identifiers, and in most cases, the silos
don't have the resources (or the motivation) to update their systems to
incorporate shared identifiers. Speaking as someone who manages the natural
sciences data resources for a Museum, I CERTAINLY wouldn't want to make that
sort of investment unless there were universally acknowledged identifiers
for the data objects I manage (which, for most of our classes of data
objects, there decidedly are not). I'd much rather leave my legacy systems in
place, then build a small indexing system that cross-links my own
identifiers to the ones that exist "out there" (TSNs, LSIDs, DOIs, OCLC,
ISSN/ISBN, etc.).  If others do the same, then I see that as a first step
towards building a bridge between the silos.
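
As a rough illustration of the "small indexing system" described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the class name CrossMapIndex, its method names, the "local:" identifier format, and the placeholder TSN and DOI values are hypothetical and do not describe any existing system.

```python
from collections import defaultdict


class CrossMapIndex:
    """Hypothetical two-way index between local and external identifiers."""

    def __init__(self):
        # local id -> set of (scheme, external id) pairs
        self._by_local = defaultdict(set)
        # (scheme, external id) -> set of local ids
        self._by_external = defaultdict(set)

    def link(self, local_id, scheme, external_id):
        """Record that a local identifier corresponds to an external one."""
        key = (scheme, external_id)
        self._by_local[local_id].add(key)
        self._by_external[key].add(local_id)

    def external_ids(self, local_id):
        """All external identifiers known for a local identifier."""
        return sorted(self._by_local.get(local_id, ()))

    def local_ids(self, scheme, external_id):
        """All local identifiers mapped to one external identifier."""
        return sorted(self._by_external.get((scheme, external_id), ()))


# Placeholder values only; none of these identifiers are real.
index = CrossMapIndex()
index.link("local:taxon/1234", "TSN", "0000001")
index.link("local:ref/77", "DOI", "10.9999/example.1")
```

The legacy system keeps using its own keys; only the index knows about the outside identifiers, so nothing in the existing applications has to change.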

By far, the most arduous part in the process is reconciling one's own
identifiers against the identifiers of other data sources.  This is
particularly messy for items that are otherwise identified with "noisy text"
- like taxon names, people names, place names, and literature citations
(which, collectively, represent by far the bulk of the records that overlap
among datasets, and hence are the items for which we stand to gain the most
by sharing the same identifiers).  This work would have to be done anyway -
regardless of whether we aimed for a shared identifier approach, or a
cross-mapped identifier approach.  With the mapped-identifier approach,
that's almost all the work that needs to be done.  With the shared
identifier approach, it's only part of the work that would need to be done -
the other part is to upgrade all the applications used by all the silos to
incorporate the shared identifiers.  And, of course, there's the problem with
actually converging on what the shared identifiers are.
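
To make the reconciliation step for "noisy text" a little more concrete, here is a minimal Python sketch of exact matching on normalized strings. It is only a first pass under stated assumptions: the function names normalize and candidate_matches are hypothetical, the inputs are assumed to be plain dictionaries of identifier to label, and real reconciliation of taxon names, people names, and citations would need fuzzy matching and human review on top of this.

```python
import re
import unicodedata


def normalize(text):
    """Reduce a noisy label to a crude comparison key."""
    # Split accented characters into base letter plus combining mark,
    # then drop anything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lower-case and collapse every run of punctuation or whitespace.
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_only.lower())
    return cleaned.strip()


def candidate_matches(local_records, external_records):
    """Map each local id to the external ids whose labels normalize equally.

    Both arguments are assumed to be dicts of identifier -> text label.
    """
    by_key = {}
    for ext_id, label in external_records.items():
        by_key.setdefault(normalize(label), []).append(ext_id)
    return {
        local_id: by_key.get(normalize(label), [])
        for local_id, label in local_records.items()
    }
```

An empty list for a local record means no candidate was found, and more than one entry means the match is ambiguous; both cases are where the arduous manual work comes in.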

It is indeed wishful thinking that by building the infrastructure, the
identifiers will coalesce.  However, it is also wishful thinking that all
the players will ever come to an agreement on what the "one true" identifier
is for shared objects, and moreover devote the necessary resources to
convert existing systems to accommodate them.  I've come to believe that the
former wishful thinking is more plausible (if only slightly) than the latter
wishful thinking.  Also, the latter has had more time to demonstrate its
infeasibility.

Aloha,

Rich

From: Roderic Page [mailto:r.page@bio.gla.ac.uk] 
Sent: Thursday, February 23, 2012 11:54 AM
To: TDWG TAG
Cc: Kevin Richards; Richard Pyle
Subject: Re: [tdwg-tag] Specimen identifiers

Dear Rich,

I guess I'd argue the reverse, in that pumping data out with unique
identifiers demonstrably doesn't get us very far. We've had "globally
unique", "persistent", and "actionable"  identifiers for years (LSIDs, URLs,
DOIs, etc.) and very little to show for it. In other words, there isn't a
biodiversity informatics "matrix" to "plug in" to. Building a "cross-mapping
service" is not necessarily simple, again because the individual data
providers rarely use existing identifiers for things outside their domain.
Hence we have text strings for literature when perfectly good identifiers
exist.

The benefits of the "matrix" come from the links, and we aren't providing
them. The notion this is all going to magically coalesce at some unspecified
point in the future strikes me as wishful thinking. Someone is soon going to
point out that the Emperor has no clothes...

Regards

Rod

On 23 Feb 2012, at 21:09, Richard Pyle wrote:

Hi All,

As I've said many times before, the "shared" bit is useful, but far less
important than the "globally unique", "persistent", and "actionable" bits.
As Kevin says, we can handle the non-shared GUIDs (as long as they meet the
other three criteria) by simply building a cross-mapping service; but that's
only useful to the extent that the identifiers are truly unique, persistent,
and actionable (in that order of importance).

Once we have a real infrastructure that achieves critical mass of adoption
for integrating the silos, then I'm sure eventually our community will
converge toward shared identifiers (specifically, towards the ones that are
most robustly persistent, and provide the best services when actioned upon),
and the superfluous identifiers will eventually fade into becoming
historical metadata (like NODC numbers in the context of ITIS).

But without an infrastructure to get people to come out of their silos and
"plug in" to the biodiversity informatics "matrix", it's unlikely that we'll
ever get to the point of collapsing multiple identical GUIDs into a single
shared GUID for the same object.

Aloha,
Rich

-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Kevin Richards
Sent: Thursday, February 23, 2012 9:41 AM
To: Roderic Page; TDWG TAG
Subject: Re: [tdwg-tag] Specimen identifiers

I agree Rod, it would be ideal to have unique, shared identifiers for
specimens, and as many other types of data as possible.

The problem here is the "shared" bit.  This is what most people hope for and
hoped would come out of all the GUID and vocabulary work that has been done.
But you know how hard it is to get different projects, organisations,
datasets to really share IDs.  Pretty much impossible, so I have moved on
from this dream and hope to solve this more by linkages, linked data type
approaches instead.

Another problem is what the identifier refers to.  As someone (I think Rich)
said in a recent post, two different people may apply the same identifier to
slightly different things - eg to the "name" of a person, or to the "person"
itself.  This is another barrier to reuse of shared identifiers.

You may think that specimens should be very simple, it is just a specimen
that you refer to, but there can be subtle differences, for example if
someone has data about the accessioned physical specimen and another has
an image of that specimen - they could both well say that they are discussing
the same specimen so give these two "different" objects the same identifier.

Kevin

-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Roderic Page
Sent: Thursday, 23 February 2012 11:38 p.m.
To: TDWG TAG
Subject: [tdwg-tag] Specimen identifiers

I've recently written a number of posts on the implications of the lack of
specimen-level identifiers, which makes it very hard to link different sources
of data together, such as GBIF and GenBank
http://iphylo.blogspot.com/2012/02/linking-gbif-and-genbank.html , and is
also a factor in creating duplicate records in GBIF
http://iphylo.blogspot.com/2012/02/how-many-specimens-does-gbif-really.html

I know this is something of a hobby horse of mine, but we can have all the
wonderful ontologies and vocabularies we want; if we don't have globally
unique, shared identifiers to glue this stuff together we are going to find
ourselves making yet more silos...

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page@bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962@aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html

_______________________________________________
tdwg-tag mailing list
tdwg-tag@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag
