[tdwg-tag] Specimen identifiers [SEC=UNCLASSIFIED]

Sat Feb 25 13:46:59 CET 2012

Dear Bob,

Perhaps I used "trust" a little too loosely. It's not so much whether I trust your content (a whole separate question); it's whether I can trust your identifiers.

Put another way, if identifiers are cheap to create, and there's no expectation that they resolve, then we can end up with identifiers that have no value, in which case why would I use them?

I "trust" DOIs because they tend to work, they cost money, and the agencies that issue them frown on them not working. Hence it's unlikely that someone is going to use one to identify some data and make no commitment that the identifier will resolve, and that it will resolve to something useful. Given that, I'm more confident of linking my data to a DOI than, say, a URL from a publisher's web site.

Given that I want to link stuff together I am reliant on using other people's identifiers to make those links. If those identifiers are labile then my hard work may be all for nought. So I need some way of judging whether an identifier is likely to persist or not (this may influence whether I decide to rely on the external resource being around, or whether I cache it locally, for example).
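
A rough sketch of how such a judgement might begin: a liveness check that decides between relying on the external resource and caching it locally. The function names `to_url` and `resolves`, the "doi:" prefix convention, and the use of a HEAD request are assumptions made for illustration, not anything proposed in this thread.

```python
import urllib.error
import urllib.request


def to_url(identifier: str) -> str:
    """Map an identifier to a URL to try (assumed convention: 'doi:' prefix)."""
    if identifier.lower().startswith("doi:"):
        # DOIs are resolved through the doi.org proxy.
        return "https://doi.org/" + identifier[4:]
    return identifier


def resolves(identifier: str, opener=urllib.request.urlopen,
             timeout: float = 10.0) -> bool:
    """Return True if the identifier currently resolves over HTTP(S).

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        # Request() raises ValueError for strings that are not URLs at all.
        request = urllib.request.Request(to_url(identifier), method="HEAD")
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # Unreachable host, HTTP error status, or malformed identifier.
        return False
```

A single successful check only shows that the identifier resolves today; judging persistence would need repeated checks over time. Some servers also reject HEAD requests, so a False here is a prompt to look closer (or to cache locally) rather than proof that the identifier is dead.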

So I guess I'm using the resolvability of identifiers as a proxy for whether to take someone seriously or not. If you can't be bothered to make them resolvable, then you clearly don't value your own content, and therefore why should I?

Regards

Rod

On 24 Feb 2012, at 15:30, Bob Morris wrote:

> On Fri, Feb 24, 2012 at 3:23 AM, Roderic Page <r.page at bio.gla.ac.uk> wrote:
>> [...]
>> 
>> 
>> I think not insisting on resolvable identifiers is a big mistake. It's like
>> saying it's OK to publish source code that you haven't actually bothered to
>> check whether it compiles. If they don't have to resolve I can publish any
>> identifier I want (witness the number of "fake" LSIDs in the wild) and I've
>> made zero commitment that it means anything. And you've taken away the
>> ability of the user to test whether your identifier is meaningful, and thus
>> build any degree of trust. The acid test of whether you are serious is
>> whether your identifiers are "live." The minute we say it's OK for them to
>> be unresolvable we are buggered.
> 
> Rod-
> 
> First, please forgive me for accusing you of thinking like a human.
> :-)  Well of course I mean, thinking like a human about problems which
> have to be solved by machines.
> 
> Second, I agree with you that identifiers should be resolvable, but it
> is neither universally necessary nor does it always solve the problems
> one hopes.  IMO for science data, the desire for resolution and
> dereferencing arises from the replicability practices in science that
> essentially require that original data supporting claims should always
> be examinable by third parties. But to a machine, this is not the only
> way, and sometimes not even the best way, to solve some problems that
> dereferencing solves.  One alternative in information science lies in
> the theory and practice of software trust relationships, which,
> happily, often models the similarly named human theory and practice.
> 
> For example, to start with your analogy, there are important real
> cases where it is not actually necessary to compile your code to come
> to a belief that it is compilable.  That case is the one where the
> source code has been generated by another program that is "known" to
> generate only compilable code.
> 
> Closer to the discussion at hand,  consider a message that arrives at
> a software agent and whose content is, in human terms:
> 
> 1. The URI http;//md5.hash/fb3d0c347e2c602f4ec650c0e777c1d3
> designates specimen with accession number 3251 at the Harvard
> University Herbaria.
> 2. There are no other specimens at the Harvard University Herbaria
> with that accession number and never have been.
> 3. As of Fri Feb 24 14:36:45 UTC 2012 the most recent determination
> carried in the Harvard records for this specimen is Aus bus.
> 4. My name is Roderic Page
> http;//md5.hash/7cee01cb3cff705f850d15c357767ca0 and I approved this
> message.
> 5. This message has MD5 hash code 88f1c348afea5082f1f375910fe814f3 .
> 
> Even if NONE of the identifiers in the above are resolvable or
> dereferencable, and whether or not there is a dereferencable
> identifier at all for the specimen mentioned, there are scenarios in
> which the above kind of message is at least as trustworthy as
> information delivered via an http request based on an identifier for
> the specimen itself.
> 
> Going to the primary sources is a time honored scientific and
> scholarly practice---and following a community's human practices can,
> if done with great care, produce more usable and trustworthy software
> than following the practices of software engineers--- but so is the
> use of trusted secondary sources, and the latter serve many purposes
> well. Hey, Rod, why do you think I read iPhylo at all?  :-)  Anyway,
> on the internet,  \all/ acquisition of data and information is
> mediated by software, so in the end, trust by humans or software in
> the assertions about the real world that are delivered on the internet
> should never depend alone on whether the identifiers are resolvable
> and dereferencable.
> 
> Inside joke: in the message above, assertion 5 is the only one that
> would always have a very low probability of being correct.  Why?
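
One likely reading of the joke, offered as an interpretation rather than the author's stated answer: the digest in assertion 5 is part of the text being hashed, so writing it into the message changes the hash it is supposed to report. A minimal sketch of that effect follows; the helper `md5_hex` and the shortened message body are hypothetical stand-ins.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text (UTF-8 encoded) as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical stand-in for assertions 1-4 of the message.
body = (
    "1. The URI designates specimen with accession number 3251.\n"
    "4. My name is Roderic Page and I approved this message.\n"
)

# Hash the body, then embed that digest as assertion 5.
embedded = md5_hex(body)
message = body + f"5. This message has MD5 hash code {embedded} .\n"

# Embedding the digest changed the text, so the whole message almost
# certainly hashes to a different value and assertion 5 no longer holds.
claim_holds = md5_hex(message) == embedded

# A digest kept outside the hashed text can still be checked by a recipient.
detached_ok = md5_hex(body) == embedded
```

For the claim to hold, the message would have to be a fixed point of the hash, which an author has no practical way of constructing; this is why digests are normally carried alongside the content they cover rather than inside it.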
> 
> The Wikipedia plot summary of Borges' "The Library of Babel" ends with
> the wonderful paragraph, perhaps appropriate to tdwg-tag:
> 
> "Despite — indeed, because of — this glut of information, all books
> are totally useless to the reader, leaving the librarians in a state
> of suicidal despair. This leads some librarians to superstitions and
> cult-like behaviour, such as the "Purifiers", who arbitrarily destroy
> books they deem nonsense as they scour through the library seeking the
> "Crimson Hexagon" and its illustrated, magical books. Another is the
> belief that since all books exist in the library, somewhere one of the
> books must be a perfect index of the library's contents; some even
> believe that a messianic figure known as the "Man of the Book" has
> read it, and they travel through the library seeking him."
>      http://en.wikipedia.org/w/index.php?title=Special:Cite&page=The_Library_of_Babel&id=473083545
> 
> ---
> 
> Bob Morris
> -- 
> Robert A. Morris
> 
> Emeritus Professor  of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
> 
> IT Staff
> Filtered Push Project
> Harvard University Herbaria
> Harvard University
> 
> email: morris.bob at gmail.com
> web: http://efg.cs.umb.edu/
> web: http://etaxonomy.org/mw/FilteredPush
> http://www.cs.umb.edu/~ram
> ===
> The content of this communication is made entirely on my
> own behalf and in no way should be deemed to express
> official positions of The University of Massachusetts at Boston or
> Harvard University.
> 

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962 at aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
