Re: How GUIDs will be used

30 Jan 2006

I certainly agree that collaborative assignment of identifiers is one of the
most critical issues we need to address in our discussions.

Here are some of my thoughts attempting to approach the problem.

If identifier I1 refers to data object O1 and identifier I2 refers to data
object O2, our minimal requirement for GUIDs would be that:

A)   I1 == I2 --> O1 == O2 (for all relevant purposes)

We need to decide under which circumstances we want this to be a
bidirectional implication, i.e. that:

B)   O1 == O2 --> I1 == I2
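
As a rough illustration of conditions A and B, here is a minimal Python
sketch that checks both properties over a set of (identifier, object)
assignments. The function name, the identifier strings, and the use of a
plain name string as a stand-in for "the same object" are all assumptions
made for the example; deciding when two objects really are equal is the
hard part and is not modelled here.

```python
from collections import defaultdict


def check_guid_properties(assignments):
    """Check conditions A and B over (identifier, object_key) pairs.

    holds_a: every identifier refers to exactly one object
             (I1 == I2 --> O1 == O2).
    holds_b: every object has exactly one identifier
             (O1 == O2 --> I1 == I2).
    """
    objects_by_id = defaultdict(set)
    ids_by_object = defaultdict(set)
    for identifier, object_key in assignments:
        objects_by_id[identifier].add(object_key)
        ids_by_object[object_key].add(identifier)
    holds_a = all(len(objs) == 1 for objs in objects_by_id.values())
    holds_b = all(len(ids) == 1 for ids in ids_by_object.values())
    return holds_a, holds_b


# Hypothetical identifiers: two providers each issue their own
# identifier for the same name object.
assignments = [
    ("ipni:0001", "Aus bus Smith 1955"),
    ("mobot:9001", "Aus bus Smith 1955"),
    ("ipni:0002", "Aus xus Brown 1935"),
]
print(check_guid_properties(assignments))  # (True, False)
```

In this example A holds and B does not, which is the situation we would
be in if each provider issued identifiers for its own records.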

I believe that we will need to address this question separately for
different classes of data.  We need to relate the objects we identify to
clearly defined data classes.  If a nomenclatural GUID is actually the
identifier for a record in a nomenclatural database rather than for the
associated nomenclatural event, we may not have to worry about getting IPNI
and MOBOT to use the same identifier for their records.  The possible
downside is that such an approach is an unambitious one that only solves a
subset of our problems.

We should consider what it would take for us to devise a system that could
support (enforce?) bidirectional inference in those cases in which it
matters to us.  It seems pretty clear to me that such a system would operate
by layering further standards and best practices on top of the actual
identifier system.

In general we need to think hard about management of GUIDs for each data
class.  I suggest that the goal of these workshops is to allow us to select
a framework of identifiers that will meet our needs, but that TDWG and
others should then develop applicability statements which define exactly how
we expect to use them in different contexts.  These statements would in each
case also make explicit the semantic and operational characteristics that a
provider is expected to support.

Donald

---------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
---------------------------------------------------------------

-----Original Message-----
From: Taxonomic Databases Working Group GUID Project
[mailto:TDWG-GUID@LISTSERV.NHM.KU.EDU] On Behalf Of Roderic Page
Sent: 30 January 2006 13:37
To: TDWG-GUID@LISTSERV.NHM.KU.EDU
Subject: Re: How GUIDs will be used

I'm not worried about centralised taxonomy, I'm simply wondering who is
going to do all this work of deciding what GUID gets allocated for,
say, a name (and yes, we DO need GUIDs for names).

Yes, in some cases things are simple. For example, we could simply ask
uBio to store every name string (which is pretty much what they are
doing already), and use their ids as the basis of name GUIDs. But
mapping between some of the "higher-level" name databases is not
trivial.

Are IPNI and MOBOT going to sit down, go through their databases, and
match things up? Are we then going to do the same thing with IPNI,
MOBOT, NCBI and TreeBASE? Will we wait until this is done before
assigning GUIDs? Mapping between databases can be contentious (is this
name really the same as that name, how do we know, etc.), and current
attempts to do it, such as NCBI's LinkOut, which uses names, are riddled
with errors. It seems this is knowledge that will evolve over time.

In the same vein, I suggest that we are likely to make more progress
if we have resolvable GUIDs now so that major data sources open their
data up, then we use data mining tools to go in and find mappings,
inconsistencies, etc. Many of these things can be computed, i.e. can be
automated. Being open could encourage anybody to have a go at examining
mappings.
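
To give a sense of what "can be computed" might mean, here is a
deliberately naive Python sketch that proposes candidate mappings by
grouping records whose name strings agree after simple normalisation.
The record layout, provider ids, and normalisation rule are assumptions
for illustration only; real matching would need to cope with authors,
spelling variants, and homonyms.

```python
import re
from collections import defaultdict


def normalise(name):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def candidate_mappings(records):
    """Group (provider, local_id, name) records by normalised name.

    Returns only the groups that span more than one provider, i.e.
    candidate cross-database mappings for someone to examine.
    """
    groups = defaultdict(list)
    for provider, local_id, name in records:
        groups[normalise(name)].append((provider, local_id))
    return {
        key: members
        for key, members in groups.items()
        if len({provider for provider, _ in members}) > 1
    }


# Hypothetical records; the local ids are made up.
records = [
    ("IPNI", "i-1", "Aus bus Smith, 1955"),
    ("MOBOT", "m-7", "Aus  bus Smith 1955"),
    ("TreeBASE", "t-3", "Aus xus Brown 1935"),
]
print(candidate_mappings(records))
# {'aus bus smith 1955': [('IPNI', 'i-1'), ('MOBOT', 'm-7')]}
```

The output is a list of candidates, not assertions of identity, which
fits the idea that these mappings are knowledge that evolves over time.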

I'm probably being wildly naive, but I think concern for getting it
"right" might get in the way of getting it "done".

Ducks incoming flames/brickbats/etc.

Regards

Rod

On 30 Jan 2006, at 11:59, Richard Pyle wrote:
...
Hi Rod,
...
To me centralisation is red rag to a bull, especially as the objects of
interest (names and concepts) are things we might reasonably disagree
over.

Please don't misunderstand what I'm talking about here. Of course we
might reasonably disagree over which names to regard as valid and which
to regard as synonyms.  We will also disagree about the scope of
organisms to include within the circumscription of a taxon concept.
However, in most cases, we will not disagree that Smith (1955) described
the species "bus", and placed it in the genus "Aus" (i.e., the taxon
name object "Aus bus Smith 1955"); or that Jones (1975) regarded Smith's
"Aus bus" as a junior synonym of Brown's (1935) "Aus xus" (i.e., the
taxon concept object "Aus xus Brown 1935 SEC Jones 1975"; the
circumscription of which includes the taxon concept object "Aus bus
Smith 1955 SEC Smith 1955").

Centralizing the issuance of GUIDs for things like taxon name objects
and concepts/usage instances does NOT, in any way, centralize
"taxonomy". It simply serves to avoid issuing 150 different GUIDs for
the taxon name object "Aus bus Smith 1955" -- one GUID from each of 150
different data providers that happen to list that name in their
taxonomic authority table.
...
Why not let users decide this, by which I mean, if a provider comes up
with a comprehensive list of names with good supporting metadata, users
will gravitate towards using them. There will also be a "market" for
people building services that map between GUIDs (I'm thinking of making
one for TreeBASE, for example). Why centralise this activity?

So that we don't need a "market" for services to cross-map duplicate
GUIDs that never needed to be created in the first place.  Instead, we
should "market" services that utilize a common/shared set of GUIDs for
objective name objects (and concept/usage objects) to assist
*taxonomy*. (And, in the shorter term, market tools that allow data
providers to cross-map their internal taxonomic authorities to shared
GUIDs.)

We certainly can't eliminate duplicates, but at least we can try to
minimize the unnecessary duplicates. I spend an inordinate chunk of my
time doing two things that I should not have to do: 1) cross-mapping
large datasets to a common shared authority (like taxon names); and 2)
cleaning up the database messes created by earlier workers who were
pressed for time, and opted for the quick & dirty solution.

Frankly, I'm not sure why we even need GUIDs for things like Taxon
Names, other than to mitigate these two kinds of problems. I thought the
point was to facilitate electronic information flow.  How have we
facilitated electronic information flow if you assign one GUID for "Aus
bus Smith 1955", and I assign another GUID to the same taxon name, and a
pair of human eyes is required to ascertain that they are, indeed, two
pointers to the same abstract data object?
...
I see the point that multiple GUIDs for the same thing can be a pain
(for papers we have DOIs, PubMed ids, Google Scholar ids, DSpace
handles, etc.), but in the end centralised GUID assignment reeks of
committees, etc., in other words, impediments to actually getting
things done.

Again, please do not confuse the idea of centralized (or at least
coordinated) issuance of GUIDs for unambiguously shared data objects
(like taxon name objects), with some sort of ill-advised centralized
effort for a "shared taxonomy".  I have not seen anybody in recent years
even suggest the possibility of the latter.
...
I agree that software tools to "cross-walk multiple independent
datasets with broadly overlapping data objects" would be very nice, but
let's separate this from centralising GUID assignment. One of the
lessons of the web, IMHO, is that centralisation doesn't scale.

You can't scale much bigger than the global pool of IP addresses which
are, ultimately, issued in blocks in a coordinated, semi-centralized way
(not altogether unlike a model of GUID issuance that I have previously
suggested).

Aloha,
Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html
------------------------------------------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page@bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org
