How GUIDs will be used

Roderic Page at BIO.GLA.AC.UK
Mon Jan 30 11:00:02 CET 2006

> I would imagine that the former group would lean toward provider-assigned
> GUIDs and emphasis on fast/easy implementation; whereas the latter group
> would lean towards centralized GUID assignment (or at least coordination of
> GUID assignment) and emphasis on developing software tools to cross-walk
> multiple independent datasets with broadly overlapping content of shared
> data objects, in order to minimize redundant GUID assignment to shared
> objects.

To me centralisation is a red rag to a bull, especially as the objects of
interest (names and concepts) are things we might reasonably disagree
over. Why not let users decide this? By which I mean: if a provider
comes up with a comprehensive list of names with good supporting
metadata, users will gravitate towards using them. There will also be a
"market" for people building services that map between GUIDs (I'm
thinking of making one for TreeBASE, for example). Why centralise this
activity? I see the point that multiple GUIDs for the same thing can be
a pain (for papers we have DOIs, PubMed ids, Google Scholar ids, DSpace
handles, etc.), but in the end centralised GUID assignment reeks of
committees, etc.; in other words, impediments to actually getting
things done. I agree that software tools to "cross-walk multiple
independent datasets with broadly overlapping data objects" would be
very nice, but let's separate this from centralising GUID assignment.
One of the lessons of the web, IMHO, is that centralisation doesn't



> Aloha,
> Rich
>> -----Original Message-----
>> From: Taxonomic Databases Working Group GUID Project
>> [mailto:TDWG-GUID at LISTSERV.NHM.KU.EDU] On Behalf Of Sally Hinchcliffe
>> Sent: Sunday, January 29, 2006 11:13 PM
>> Subject: How GUIDs will be used
>> Something I was thinking about over the weekend (I really must get a
>> life).
>> I was just reading the California Digital Library paper on ARK
>> identifiers (I think it got circulated many moons ago, yes I am just
>> getting caught up on my 'homework' now...) and the following
>> assertion stood out: (page 3)
>> '[W]hat we're looking for are persistent actionable identifiers,
>> where an actionable identifier is one that widely available tools
>> such as web browsers can use to convert a simple "click" into access
>> to the object ...'
>> This strikes me as only half true. For specimens, yes, most of the
>> use of GUIDs will be ultimately to allow a user to get their hands on
>> either the specimen itself or an electronic object sufficiently
>> informative about the specimen (e.g. a picture or a grid ref) that
>> they can do something scientific with it. In that case it mostly
>> doesn't matter if one specimen has multiple different ids (due to
>> aggregations, splittings, derivatives, sampling etc.) as long as all
>> of those ids lead back to the same thing.
>> But for the more abstract things like names, concepts and maybe even
>> descriptive terms, I can imagine the GUIDs themselves becoming widely
>> used without there ever (or very rarely) being a reference made back
>> to the source of the id because there's not much more _there_ (other
>> than confirmation that no mistakes have been made in transmitting the
>> id). The real benefit of _these_ GUIDs is in being able to make
>> machine-computable statements about x being the same as or different
>> to or composed of y. In this case, multiple GUIDs referring to the
>> same thing does become a problem because each duplication dilutes the
>> amount of information carried by that GUID as to whether x truly is
>> different from y, and reduces the chance that two of the same things
>> will carry the same id.
>> Does this make sense? It suggests that the delegation model for
>> specimens (e.g. just let each collection issue its own ids for
>> anything it likes) might have to differ from the one for names,
>> concepts or other largely abstract objects (where we might want to
>> explicitly divvy out the domains to particular nomenclators, concept
>> banks and so on).
>> Sally
>> *** Sally Hinchcliffe
>> *** Computer section, Royal Botanic Gardens, Kew
>> *** tel: +44 (0)20 8332 5708
>> *** S.Hinchcliffe at
Professor Roderic D. M. Page
Editor, Systematic Biology
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792