[tdwg-tag] Specimen identifiers [SEC=UNCLASSIFIED]

Bob Morris morris.bob at gmail.com
Fri Feb 24 16:30:16 CET 2012

On Fri, Feb 24, 2012 at 3:23 AM, Roderic Page <r.page at bio.gla.ac.uk> wrote:
> [...]
> I think not insisting on resolvable identifiers is a big mistake. It's like
> saying it's OK to publish source code that you haven't actually bothered to
> check whether it compiles. If they don't have to resolve I can publish any
> identifier I want (witness the number of "fake" LSIDs in the wild) and I've
> made zero commitment that it means anything. And you've taken away the
> ability of the user to test whether your identifier is meaningful, and thus
> build any degree of trust. The acid test of whether you are serious is
> whether your identifiers are "live." The minute we say it's OK for them to
> be unresolvable we are buggered.


First, please forgive me for accusing you of thinking like a human.
:-)  Well, of course, I mean thinking like a human about problems that
have to be solved by machines.

Second, I agree with you that identifiers should be resolvable, but
resolvability is neither universally necessary nor always a solution
to the problems one hopes it will solve.  IMO for science data, the
desire for resolution and dereferencing arises from the replicability
practices in science that essentially require that original data
supporting claims should always be examinable by third parties. But to
a machine, this is not the only way, and sometimes not even the best
way, to solve some of the problems that dereferencing solves.  One
alternative in information science lies in the theory and practice of
software trust relationships, which, happily, often model the
similarly named human theory and practice.

For example, to start with your analogy, there are important real
cases where it is not actually necessary to compile your code to come
to a belief that it is compilable.  That case is the one where the
source code has been generated by another program that is "known" to
generate only compilable code.

Closer to the discussion at hand,  consider a message that arrives at
a software agent and whose content is, in human terms:

  1. The URI http://md5.hash/fb3d0c347e2c602f4ec650c0e777c1d3
     designates the specimen with accession number 3251 at the Harvard
     University Herbaria.
  2. There are no other specimens at the Harvard University Herbaria
     with that accession number and never have been.
  3. As of Fri Feb 24 14:36:45 UTC 2012 the most recent determination
     carried in the Harvard records for this specimen is Aus bus.
  4. My name is Roderic Page
     http://md5.hash/7cee01cb3cff705f850d15c357767ca0 and I approved
     this.
  5. This message has MD5 hash code 88f1c348afea5082f1f375910fe814f3.

Even if NONE of the identifiers in the above are resolvable or
dereferenceable, and whether or not there is a dereferenceable
identifier at all for the specimen mentioned, there are scenarios in
which the above kind of message is at least as trustworthy as
information delivered via an http request based on an identifier for
the specimen itself.
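
One such scenario can be sketched in Python. This is only an
illustration under stated assumptions: the receiving agent is assumed
to share a secret key with a sender it already trusts, and the key,
the function names, and the choice of HMAC-SHA256 (rather than a bare
MD5 digest) are all hypothetical and not part of the message above.
The agent accepts the assertions because the tag verifies, without
resolving any identifier in the body.

```python
import hashlib
import hmac

# Hypothetical shared secret, agreed out of band with a trusted sender.
SHARED_KEY = b"example-key-agreed-out-of-band"


def sign(body: bytes, key: bytes = SHARED_KEY) -> str:
    """Return a hex HMAC-SHA256 tag computed over the message body."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def is_trusted(body: bytes, tag: str, key: bytes = SHARED_KEY) -> bool:
    """Accept the body only if the tag matches.

    Nothing in the body is dereferenced; trust comes from the key alone.
    """
    return hmac.compare_digest(sign(body, key), tag)


# Illustrative message body paraphrasing assertions 1-3.
message = (
    b"Specimen with accession number 3251 at the Harvard University "
    b"Herbaria; most recent determination: Aus bus."
)
tag = sign(message)

print(is_trusted(message, tag))                 # unmodified message
print(is_trusted(message + b" (edited)", tag))  # altered message
```

The design point is that the check depends on who vouches for the
message, not on whether anything inside it can be fetched.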

Going to the primary sources is a time-honored scientific and
scholarly practice (and following a community's human practices can,
if done with great care, produce more usable and trustworthy software
than following the practices of software engineers), but so is the
use of trusted secondary sources, and the latter serve many purposes
well. Hey, Rod, why do you think I read iPhylo at all?  :-)  Anyway,
on the internet, \all/ acquisition of data and information is
mediated by software, so in the end, trust by humans or software in
the assertions about the real world that are delivered on the internet
should never depend solely on whether the identifiers are resolvable
and dereferenceable.

Inside joke: in the message above, assertion 5 is the only one that
would always have a very low probability of being correct.  Why?
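
One possible reading of the joke, sketched in Python: if assertion 5
is meant to cover the whole message, including assertion 5 itself,
then writing the digest into the message changes the digest. The
template text and function names below are hypothetical stand-ins, not
the actual message.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical stand-in for a message whose last line states its own hash.
TEMPLATE = (
    "4. My name is Roderic Page and I approved this.\n"
    "5. This message has MD5 hash code {digest}.\n"
)


def is_self_describing(digest: str) -> bool:
    """True only if the message embedding `digest` really hashes to it."""
    return md5_hex(TEMPLATE.format(digest=digest)) == digest


# Embedding a digest alters the text, so the digest has to be recomputed,
# which alters the text again, and so on.
first_pass = md5_hex(TEMPLATE.format(digest="0" * 32))
second_pass = md5_hex(TEMPLATE.format(digest=first_pass))

print(first_pass == second_pass)
print(is_self_describing("88f1c348afea5082f1f375910fe814f3"))
```

A sender would need a fixed point of the hash function to make the
claim true, and stumbling on one by chance is about as likely as
guessing a 128-bit value.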

The Wikipedia plot summary of Borges' "The Library of Babel" ends with
the wonderful paragraph, perhaps appropriate to tdwg-tag:

"Despite — indeed, because of — this glut of information, all books
are totally useless to the reader, leaving the librarians in a state
of suicidal despair. This leads some librarians to superstitions and
cult-like behaviour, such as the "Purifiers", who arbitrarily destroy
books they deem nonsense as they scour through the library seeking the
"Crimson Hexagon" and its illustrated, magical books. Another is the
belief that since all books exist in the library, somewhere one of the
books must be a perfect index of the library's contents; some even
believe that a messianic figure known as the "Man of the Book" has
read it, and they travel through the library seeking him."


Bob Morris
Robert A. Morris

Emeritus Professor  of Computer Science
100 Morrissey Blvd
Boston, MA 02125-3390

IT Staff
Filtered Push Project
Harvard University Herbaria
Harvard University

email: morris.bob at gmail.com
web: http://efg.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
The content of this communication is made entirely on my
own behalf and in no way should be deemed to express
official positions of The University of Massachusetts at Boston or
Harvard University.
