Questions and Suggestions to the ongoing discussion

Mon Sep 19 09:08:35 CEST 2005

Hi Michael,

> a) Growth of the specimen
> b) Sampling of the specimen
> c) Labeling of the specimen (locality, date, leg, etc.)
> d) Conserving of the specimen
> e) Determination
> f) Digitisation (in house database)
> g) Linking up the information to a database network
> h) Analysis of the linked up specimen information
> i) Revision (loop back to point f))
>
> Just two simple questions:
> Where exactly should an artificial GUID be implemented?

I would suspect somewhere between steps f & g (I'd lean more towards f).
But in many institutions (and in my own taxonomic work), step c is merged
with step f; and may come before or after steps d & e.

> What is the potential impact on the rest of the product life cycle?

I would hope that it would result in increased efficiency & accuracy of
steps g & h.

> Learning a lot of Latin names with a meaning is much easier for
> humans than learning some obscure artificial codes.
> Just try to remember ten 5-digit numbers! And no, not everybody is
> eager to run around on a field trip equipped like
> a high-tech soldier.

I think this is one of the most misunderstood aspects of GUIDs.  They should
*NOT* be optimized for human reading. I personally don't think they should
ever even be viewed by a pair of human eyes (except the occasional database
manager). The scientific names (and associated information) are what the
human should read.  The GUIDs are intended for computers, which are
extraordinarily adept at remembering billions of 5-digit numbers, let alone
10. But I would rather see the GUIDs as 64-bit (roughly 10^19) numbers.
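
To illustrate the division of labour described above, here is a minimal
Python sketch in which the identifier is an opaque number handled only by
software, while the person sees the name. The registry, the function names,
the record fields, and the choice of a random 64-bit integer are all
assumptions made for illustration, not a proposed standard.

```python
import secrets

# Hypothetical in-memory registry: opaque identifier -> human-readable record.
registry = {}

def register(name, author, year):
    """Assign a random 64-bit identifier to a name record and return it."""
    guid = secrets.randbits(64)
    while guid in registry:  # a collision is extremely unlikely, but check anyway
        guid = secrets.randbits(64)
    registry[guid] = {"name": name, "author": author, "year": year}
    return guid

def display(guid):
    """What a person sees: the name and authorship, never the identifier."""
    rec = registry[guid]
    return f"{rec['name']} {rec['author']}, {rec['year']}"

guid = register("Homo sapiens", "Linnaeus", 1758)
print(display(guid))  # Homo sapiens Linnaeus, 1758
```

Nobody on a field trip would ever need to memorise or type the value of
`guid`; it only travels between databases.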

> Just a simple statement, as Peter already pointed out, by using
> Linnean code carefully and with high quality within
> our databases, we have already a working GUID in place! The thing
> Linne has already invented. IMHO it is only a
> question of database quality, whether the info given is a real GUID
> or only a part of it.

I think it's less a question of database quality than of taxonomic quality.
With a few very noteworthy exceptions, the taxonomic information we need for
consistent reliance on "natural key" unique identifiers for names
(Nomenclatural Code + Monomial/Binomial/Trinomial + Author(s)/Year/Page) is
not organized to the point where this complex key can be used as a unique
identifier by a computer with any reasonable degree of certainty and
repeatability. As I said above, the GUIDs should be optimized for computer
usage, not human usage.
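
A small Python sketch of why the composite natural key is fragile for a
computer: the same name, recorded with two common ways of writing the
author, yields two different keys, whereas a shared opaque identifier
matches exactly. The `natural_key` helper, the field order, the sample
values, and the identifier value are hypothetical and chosen only for
illustration.

```python
def natural_key(code, name, author, year, page):
    """Build the composite 'natural key' as a single string."""
    return "|".join([code, name, author, str(year), str(page)])

# The same name as two hypothetical databases might record it.
key_a = natural_key("ICZN", "Homo sapiens", "Linnaeus", 1758, 20)
key_b = natural_key("ICZN", "Homo sapiens", "L.", 1758, 20)

# An exact string comparison treats these as two different names.
print(key_a == key_b)  # False

# With a shared opaque identifier (made-up value), the match is exact
# no matter how each database spells the author.
SHARED_ID = 1234567890123456789
record_a = {"id": SHARED_ID, "key": key_a}
record_b = {"id": SHARED_ID, "key": key_b}
print(record_a["id"] == record_b["id"])  # True
```

Fuzzy matching could reconcile some of these variants, but only with a
degree of uncertainty that an exact identifier comparison avoids.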

> Therefore I have the following suggestions for the 1.5 Million $
> project:
>
> 1) Build a schema, that enforces better overall quality, but not only
> in the taxon information.
[...]
> 2) Help and convince database holders to raise the quality of their
> contributions
> 3) Enhance the database wrappers to speed up the network (I had to
> look into this for a GBIF Austria portal
> and found out, that the software can be improved in terms of speed
> just by rewriting some parts of it in C).

All excellent suggestions, and all of which would, I firmly believe, be
enhanced through the establishment of universal GUIDs for
biologically-relevant data objects.

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html