Re: How GUIDs will be used

29 Jan 2006

Thank you, Sally! I have been beating the drums on a related point for
years now: the difference between data objects singularly associated with
particular organisations, people or projects (like specimens and datasets)
vs. data objects likely to be shared by many (like taxon names & concepts,
reference citations, and named localities). Your observation of other
practical differences relating to the impact of GUID duplication further
emphasizes the distinction.

Perhaps the two break-out sections at the workshop should be split according
to "GUIDs for data objects not likely to be duplicated across multiple
providers" (existing "specimens" group), vs. "GUIDs likely to be shared by
multiple providers" (existing taxon names/concepts group).

I would imagine that the former group would lean toward provider-assigned
GUIDs and an emphasis on fast, easy implementation, whereas the latter group
would lean toward centralized GUID assignment (or at least coordinated GUID
assignment) and an emphasis on developing software tools to cross-walk
multiple independent datasets whose content of shared data objects broadly
overlaps, in order to minimize redundant GUID assignment to shared
objects.
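To make the "coordinated assignment" idea concrete, here is a minimal Python sketch of a cross-walk step that reuses an existing GUID whenever an incoming name matches one already registered, and mints a new one only otherwise. Everything in it is a hypothetical illustration, not any existing TDWG tool: the `NameRegistry` class, the `normalize` matching rule (lowercase plus whitespace collapsing), the in-memory dictionary, and the `urn:uuid:` form of the minted identifiers are all assumptions.

```python
import re
import uuid


def normalize(name: str) -> str:
    """Hypothetical matching key: lowercase and collapse whitespace.

    Real name matching would need much more (authorship, rank,
    orthographic variants); this is only a placeholder rule.
    """
    return re.sub(r"\s+", " ", name).strip().lower()


class NameRegistry:
    """Toy central registry handing out one GUID per normalized name."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}

    def guid_for(self, name: str) -> str:
        key = normalize(name)
        # Reuse the GUID already assigned to a matching name, if any;
        # otherwise mint a new UUID-based identifier (an assumed scheme).
        if key not in self._by_key:
            self._by_key[key] = f"urn:uuid:{uuid.uuid4()}"
        return self._by_key[key]

    def __len__(self) -> int:
        return len(self._by_key)


if __name__ == "__main__":
    registry = NameRegistry()
    # Two providers with overlapping (hypothetical) name lists.
    ids_a = [registry.guid_for(n) for n in ["Aus bus", "Aus  cus"]]
    ids_b = [registry.guid_for(n) for n in ["aus bus", "Dus eus"]]
    print(ids_a[0] == ids_b[0])  # True: the shared name gets one GUID
    print(len(registry))  # 3 distinct names, not 4
```

The design choice being illustrated is simply "look up before you mint": redundancy is avoided at assignment time rather than reconciled afterwards, which is only as good as the matching rule behind `normalize`.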

Aloha,
Rich
...
-----Original Message-----
From: Taxonomic Databases Working Group GUID Project
[mailto:TDWG-GUID@LISTSERV.NHM.KU.EDU]On Behalf Of Sally Hinchcliffe
Sent: Sunday, January 29, 2006 11:13 PM
To: TDWG-GUID@LISTSERV.NHM.KU.EDU
Subject: How GUIDs will be used
Something I was thinking about over the weekend (I really must get a
life):
I was just reading the California Digital Library paper on ARK
identifiers (I think it was circulated many moons ago; yes, I am just
getting caught up on my 'homework' now...), and the following
assertion on page 3 stood out:
'[W]hat we're looking for are persistent actionable identifiers,
where an actionable identifier is one that widely available tools
such as web browsers can use to convert a simple "click" into access
to the object ...'
This strikes me as only half true. For specimens, yes, most uses of
GUIDs will ultimately be to let a user get their hands on either the
specimen itself or an electronic object sufficiently informative about
the specimen (e.g. a picture or a grid ref) that they can do something
scientific with it. In that case it mostly doesn't matter if one
specimen has multiple different IDs (due to aggregations, splittings,
derivatives, sampling etc.), as long as all of those IDs lead back to
the same thing.
But for the more abstract things like names, concepts and maybe even
descriptive terms, I can imagine the GUIDs themselves becoming widely
used without a reference ever (or only very rarely) being made back
to the source of the ID, because there's not much more _there_ (other
than confirmation that no mistakes were made in transmitting the
ID). The real benefit of _these_ GUIDs is in being able to make
machine-computable statements about x being the same as, different
from, or composed of y. In this case, having multiple GUIDs refer to
the same thing does become a problem, because each duplication dilutes
the information a GUID carries about whether x truly is different
from y, and reduces the chance that two references to the same thing
will carry the same ID.
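A minimal Python sketch of why duplicates hurt machine comparison: two GUIDs minted independently for the same name compare as different until someone records an explicit "same as" assertion linking them. The sketch keeps such assertions in a union-find structure so that equivalence is transitive. The `Equivalences` class, its method names, and the example identifiers are hypothetical illustrations, not part of any GUID specification.

```python
class Equivalences:
    """Union-find over GUIDs, recording 'x is the same as y' assertions."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, guid: str) -> str:
        # Unseen GUIDs start out as their own group.
        self._parent.setdefault(guid, guid)
        root = guid
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited GUID straight at the root.
        while guid != root:
            next_guid = self._parent[guid]
            self._parent[guid] = root
            guid = next_guid
        return root

    def assert_same(self, a: str, b: str) -> None:
        """Record that GUIDs a and b denote the same object."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[ra] = rb

    def same(self, a: str, b: str) -> bool:
        """True if a and b are identical or linked by recorded assertions."""
        return a == b or self._find(a) == self._find(b)


if __name__ == "__main__":
    eq = Equivalences()
    # Hypothetical GUIDs minted by two providers for the same name.
    name_a = "urn:example:provider-a:name:123"
    name_b = "urn:example:provider-b:name:987"
    print(name_a == name_b)  # False: naive comparison says "different"
    eq.assert_same(name_a, name_b)
    print(eq.same(name_a, name_b))  # True once the equivalence is recorded
```

Every duplicate GUID implies at least one extra assertion like this that somebody has to discover and publish; until it exists, software comparing the raw identifiers will draw the wrong conclusion.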
Does this make sense? It suggests that the delegation model for
specimens (e.g. just let each collection issue its own IDs for
anything it likes) might have to differ from the one for names,
concepts or other largely abstract objects (where we might want to
explicitly divvy out the domains to particular nomenclators, concept
banks and so on).
Sally
*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe@rbgkew.org.uk

Richard Pyle
