[tdwg-content] most GUIDs/URIs for names/taxon stuff not ready for prime time

Roderic Page r.page at bio.gla.ac.uk
Sun Jan 9 11:24:19 CET 2011


Why aren't identifiers reused?
----------------------------------------

Because in most cases they offer no added value. If I have an ITIS TSN
there's not much I can do with it. I can get a name from ITIS (with
some vague assurance that this name is accepted - or not - with no
evidence for this assertion). I can think of only two taxonomic  
identifiers that have real value and get much reuse, NCBI taxonomy ids  
and uBio NameBankIDs. NCBI ids are reused because they underpin the  
genomics databases, and genomics does real computational biology, and  
makes extensive reuse of data (as exemplified by the annual Nucleic  
Acids Research database issue). uBio NameBankIDs get reused because  
uBio has lots of names, and provides services for discovering those  
names in text (see, e.g., their use by BHL).

Few taxonomic name databases provide a compelling reason for anyone to  
use their identifiers, most being digital sinks (you go there, get an  
identifier for a name, and nothing else).


Why UUIDs?
----------------

UUIDs are ugly, and solve a problem that for the most part we don't  
have. They are ideal for minting globally unique identifiers in a  
distributed system, but we don't have distributed systems. Catalogue  
of Life uses UUIDs, but these are centrally created (I suspect using  
MySQL's UUID function, given how similar the UUIDs are to each other).  
ZooBank uses them, but it is not (yet) a distributed system. If the  
Catalogue of Life were genuinely a distributed system UUIDs would make  
sense, but that's not actually how it works.
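
To make the "similar to each other" point concrete, here is a minimal Python sketch. It assumes the centrally minted identifiers are version 1 (time-based) UUIDs, which is what MySQL's UUID() is documented to return; whether the Catalogue of Life actually mints them that way remains a guess. The node value is a made-up stand-in for a single server, not anything taken from a real database.

```python
import uuid

# Hypothetical 48-bit node id standing in for one central server.
CENTRAL_NODE = 0x001122334455


def mint_central(n):
    """Mint n version 1 (time-based) UUIDs, all from the same node."""
    return [uuid.uuid1(node=CENTRAL_NODE) for _ in range(n)]


def mint_distributed(n):
    """Mint n version 4 (random) UUIDs, as independent minters might."""
    return [uuid.uuid4() for _ in range(n)]


central = mint_central(3)
distributed = mint_distributed(3)

# Version 1 UUIDs from one node all end in the same 12 hex digits.
for u in central:
    print("v1:", u)

# Version 4 UUIDs are random apart from the version and variant bits.
for u in distributed:
    print("v4:", u)
```

The shared tail in the first batch is the sort of family resemblance that suggests a single minter rather than a distributed one.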

I think users would cope with UUIDs if the databases using them
provided clear value. For example, MusicBrainz uses UUIDs
(http://musicbrainz.org/artist/563201cb-721c-4cfb-acca-c1ba69e3d1fb.html),
as does Mendeley, the latter hiding them from users via human-readable
URLs. Given that we have obvious user-friendly candidates for
URLs (taxonomic names), it would be trivial to hide UUIDs behind names
(making homonyms distinct by adding authorship, or whatever it took to
make them unique as strings).
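
A rough sketch of that idea in Python, under stated assumptions: the registry class, the slug rules, and the example names and authorships are all hypothetical, invented for illustration. A name gets a plain slug where possible, and a homonym gets its authorship appended so the string stays unique, while the UUID stays out of sight.

```python
import re
import uuid


def slugify(name, authorship=""):
    """Turn a name (plus optional authorship) into a URL-friendly slug."""
    text = f"{name} {authorship}".strip().lower()
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


class NameRegistry:
    """Hypothetical registry that maps readable slugs to hidden UUIDs."""

    def __init__(self):
        self._uuid_by_slug = {}

    def register(self, name, authorship):
        """Register a name and return the slug a user would see."""
        slug = slugify(name)
        if slug in self._uuid_by_slug:
            # Homonym: disambiguate by appending the authorship.
            slug = slugify(name, authorship)
        if slug in self._uuid_by_slug:
            raise ValueError(f"cannot make a unique slug for {name!r}")
        self._uuid_by_slug[slug] = uuid.uuid4()
        return slug

    def resolve(self, slug):
        """Look up the UUID behind a slug."""
        return self._uuid_by_slug[slug]


registry = NameRegistry()
# Made-up homonyms, not real taxa.
first = registry.register("Exemplum", "Smith, 1900")
second = registry.register("Exemplum", "Jones, 1950")
print(first, "->", registry.resolve(first))
print(second, "->", registry.resolve(second))
```

In this sketch the first name registered keeps the bare slug; a real service might prefer to give every homonym its authorship, which is a policy choice rather than a technical one.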

What, if anything, is a taxonomic name?
----------------------------------------------------

In my experience, when non-taxonomists meet taxonomists things get  
ugly. For example, a publisher wanting to mark up taxonomic names in  
text might ask taxonomists how to do this, and within minutes the
taxonomists are off into discussions of namestrings versus usages  
versus concepts and pretty soon the publisher deeply regrets ever  
asking the question. I've been at meetings where the look in  
publishers' eyes said "run away, run away".

Part of the reason we have multiple databases is that different
projects are capturing different things (roughly speaking, uBio is
mostly about namestrings, Catalogue of Life is about concepts, IPNI
and ZooBank are about first usage of a name, etc.).

Most users outside our field won't give a damn about the niceties of  
these distinctions, yet we persist in discussing them ad nauseam.  
Until we provide a single, very simple service that takes a name  
string and hides all this complexity (unless the user chooses to see  
the gory details) while still providing useful information, we will be  
stuck in multiple identifier hell. The tragedy is that we've never had
more people genuinely interested in linking to names than at present  
-- publishers are desperately trying to add "semantic value" to their  
content, and we are spectacularly ill-equipped to deliver this (and  
it's our own fault).

I rather suspect we're rapidly approaching the point where users  
outside taxonomy will simply say "to hell with these taxonomists,  
let's just use Wikipedia and be done with it."

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962 at aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html








