Topic 3: GUIDs for Taxon Names and Taxon Concepts

Roderic Page at BIO.GLA.AC.UK
Fri Nov 4 09:39:34 CET 2005

To try to clarify my previous post (not a good idea to write before
the coffee has kicked in), here is a straw man. What's wrong with the
following approach?

1. Data providers serve up their own records, each identified by a GUID.

2. At a minimum a "record" is metadata about that object, using
existing vocabulary as much as possible. This means Dublin Core for
basic stuff (title, creator, etc.), PRISM for bibliographic details,
Basic Geo (WGS84) for geographic co-ordinates, Darwin Core for other
specimen stuff, etc.

3. Such records may refer to other objects (e.g., a sequence may refer
to a publication and a museum voucher specimen), all by way of GUIDs.

4. Each record carries a Creative Commons license specifying what you
can do with the data. The present system where museums have some text
saying what can and can't be done is too clumsy, and putting the GBIF
data usage agreement in the way of searching GBIF is way too

If we do this (and we're pretty close to this already), then I think
we've got the core of a useful resource, partly because all of this is
technically easy to do and is already being done in other areas. So,
why not just do it?
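
To make points 1-4 concrete, here is a minimal sketch of what such a
record might look like. The class, the field names, the GUIDs, and all
values are made up for illustration; only the vocabulary prefixes (dc,
dwc, geo) and the Creative Commons license URL refer to real things.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One provider record (points 1-4). Field names are illustrative."""
    guid: str                                       # point 1: the record's own GUID
    metadata: dict                                  # point 2: existing vocabularies
    references: list = field(default_factory=list)  # point 3: GUIDs of other objects
    license: str = ""                               # point 4: Creative Commons license URL


def unresolved_references(records):
    """Return referenced GUIDs that no record in `records` identifies."""
    known = {r.guid for r in records}
    referenced = {g for r in records for g in r.references}
    return sorted(referenced - known)


# All GUIDs and values below are hypothetical.
CC_BY = "http://creativecommons.org/licenses/by/2.5/"

specimen = Record(
    guid="urn:example:specimen:0001",
    metadata={
        "dc:title": "Voucher specimen of Apomys datae",
        "dwc:catalogNumber": "0001",
        "geo:lat": "16.5",
        "geo:long": "121.0",
    },
    license=CC_BY,
)

sequence = Record(
    guid="urn:example:sequence:0001",
    metadata={"dc:title": "Example sequence", "dc:creator": "A. N. Other"},
    references=["urn:example:specimen:0001", "urn:example:publication:0001"],
    license=CC_BY,
)

# The publication is referenced but served by some other provider.
print(unresolved_references([specimen, sequence]))
```

The point of the helper is only that a reference is just a GUID: the
sequence record does not need to embed the publication, it needs to
point at it, and somebody else can serve that record.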


I think the role of GUIDs is to (a) unambiguously identify digital
objects, and (b) tell us where to get that digital record.

Many of the issues that crop up here are, in my opinion, not really
about GUIDs. Regarding the issue of taxonomic concepts, I think this is
something of a red herring. If concepts are essentially a pairing of
name and use, then given a GUID for a name and a publication, there
will be huge numbers of these, and they will occur in all sorts of
domains (taxonomy, ecology, development, conservation, medicine, etc.).
They will also exist in databases (e.g., GenBank, TreeBASE, etc.). But
if they are pairings of names and usage (e.g., publication, database),
then given a name GUID and a usage GUID, do we need anything else,
really? Names, publications, and specimens are primary, concepts are

It seems to me the core notion of a concept is knowing what somebody
meant when using a name. This is (a) a problem of inference, and (b)
probably intractable in most cases (who knows what ecologist "x" meant
by species "y" in 1910, regardless of what classification he or she may
have claimed to be using).
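
As a sketch of the "name plus usage" view of a concept discussed
above, a concept needs nothing beyond two GUIDs. The type name, the
helper, and the GUIDs are assumptions for illustration, not an existing
schema.

```python
from typing import NamedTuple


class NameUsage(NamedTuple):
    """A 'concept' as nothing more than a (name GUID, usage GUID) pair."""
    name_guid: str   # GUID of the taxon name
    usage_guid: str  # GUID of the publication or database using that name


def usages_of(name_guid, pairs):
    """List the usage GUIDs recorded for one name GUID, sorted."""
    return sorted(p.usage_guid for p in pairs if p.name_guid == name_guid)


# Hypothetical GUIDs: one name used in a paper and in a database.
pairs = [
    NameUsage("urn:example:name:apomys-datae", "urn:example:publication:0001"),
    NameUsage("urn:example:name:apomys-datae", "urn:example:database:0001"),
    NameUsage("urn:example:name:other", "urn:example:publication:0001"),
]

print(usages_of("urn:example:name:apomys-datae", pairs))
```

Nothing here says what the name meant in each usage; the pair only
records that the usage exists, which is all the GUIDs can give us.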

One example where this inference is more straightforward, especially
given 1-4 above, is if observations are linked to specimens. For
example, in TreeBASE there are various occurrences of "Apomys datae" in
different studies. Given that these names are linked to sequences,
which are linked to specimens, we can infer that these different
occurrences are the same thing. This kind of inference is made much
easier when data is openly available. I also think that inference of
concepts will be domain-specific, localised to particular questions,
and not necessarily something taxonomic databases themselves need to
support. Do we seriously think we can build a database of the usage of
all names and what they meant? I suggest this knowledge will emerge
over time on the foundations of 1-4 above. Since taxonomists don't
control how names are used, and probably use names a lot less than
other biologists (after all, we provide names so biologists can
communicate), I don't think it is the role of taxonomists to document
every concept.
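
The specimen-based inference described above can be sketched as
following links from name occurrences to sequences to specimens, then
grouping occurrences that end at the same specimen. The function name,
the study labels, and every GUID are hypothetical; this is not how
TreeBASE itself stores anything.

```python
from collections import defaultdict


def group_by_specimen(occurrence_to_sequence, sequence_to_specimen):
    """Group name occurrences by the voucher specimen they trace back to.

    Occurrences whose sequence has no known specimen are left out,
    because nothing can be inferred about them from these links.
    """
    groups = defaultdict(set)
    for occurrence, sequence_guid in occurrence_to_sequence.items():
        specimen_guid = sequence_to_specimen.get(sequence_guid)
        if specimen_guid is not None:
            groups[specimen_guid].add(occurrence)
    return dict(groups)


# Hypothetical data: (study, name) occurrences linked to sequence GUIDs.
occurrence_to_sequence = {
    ("study-A", "Apomys datae"): "urn:example:sequence:0001",
    ("study-B", "Apomys datae"): "urn:example:sequence:0002",
    ("study-C", "Apomys datae"): "urn:example:sequence:0003",
}

# Two of the sequences come from the same voucher; the third has no link.
sequence_to_specimen = {
    "urn:example:sequence:0001": "urn:example:specimen:0001",
    "urn:example:sequence:0002": "urn:example:specimen:0001",
}

groups = group_by_specimen(occurrence_to_sequence, sequence_to_specimen)
for specimen_guid, occurrences in groups.items():
    print(specimen_guid, sorted(occurrences))
```

Here the occurrences in study-A and study-B land on the same specimen,
so we can infer they are the same thing. Study-C stays unresolved,
which is the honest answer when the links are not available.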

The KISS principle (Keep It Simple Stupid) would seem to apply here.
Why not focus on something that is achievable, has value, and can be
linked to what people in other areas are doing (such as bioinformatics,
the Semantic Web, etc.)?

In other words, maybe it's time to stop thinking like taxonomists...

Professor Roderic D. M. Page
Editor, Systematic Biology
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792


