How GUIDs will be used

Mon Jan 30 12:36:42 CET 2006

I'm not worried about centralised taxonomy, I'm simply wondering who is
going to do all this work of deciding what GUID gets allocated for,
say, a name (and yes, we DO need GUIDs for names).

Yes, in some cases things are simple. For example, we could simply ask
uBio to store every name string (which is pretty much what they are
doing already), and use their ids as the basis of name GUIDs. But
mapping between some of the "higher-level" name databases is not
trivial.

Are IPNI and MOBOT going to sit down, go through their databases, and
match things up? Are we then going to do the same thing with IPNI,
MOBOT, NCBI, and TreeBASE? Will we wait until this is done before
assigning GUIDs? Mapping between databases can be contentious (is this
name really the same as that name, how do we know, etc.), and I should
point out that current attempts to do this, such as NCBI's LinkOut,
which uses names, are riddled with errors. It seems this is knowledge
that will evolve over time.

In the same vein, I suggest that we are likely to make more progress
if we have resolvable GUIDs now, so that major data sources open their
data up, and then use data mining tools to go in and find mappings,
inconsistencies, etc. Many of these things can be computed, i.e. can be
automated. Being open could encourage anybody to have a go at examining
mappings.
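
To make "can be computed" a little more concrete, here is a minimal
sketch of the kind of thing I have in mind. It is only an illustration:
the provider labels, local ids and records are invented, and matching on
a crudely normalised name string is an assumption standing in for
whatever matching rules a real service would need.

```python
import re
from collections import defaultdict


def normalise(name: str) -> str:
    """Crude normalisation: lower-case, drop punctuation, collapse spaces."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def candidate_mappings(*sources):
    """Group (source, local id) pairs whose normalised names are identical.

    Each source is a (label, {local_id: name_string}) pair. Only names
    that occur in more than one source are returned.
    """
    groups = defaultdict(list)
    for label, records in sources:
        for local_id, name in records.items():
            groups[normalise(name)].append((label, local_id))
    return {
        key: ids
        for key, ids in groups.items()
        if len({label for label, _ in ids}) > 1
    }


# Hypothetical records from two hypothetical providers.
provider_a = {"A:1": "Aus bus Smith, 1955", "A:2": "Aus xus Brown 1935"}
provider_b = {"B:17": "Aus bus Smith 1955", "B:18": "Aus cus Jones 1975"}

mappings = candidate_mappings(("A", provider_a), ("B", provider_b))
print(mappings)
```

The output is a list of candidate mappings, not a verdict. Anything the
script pairs up still needs checking, and anything it misses (spelling
variants, different author abbreviations) needs cleverer matching, but
none of that can start until the ids are resolvable and the data open.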

I'm probably being wildly naive, but I think concern for getting it
"right" might get in the way of getting it "done".

Ducks incoming flames/brickbats/etc.

Regards

Rod

On 30 Jan 2006, at 11:59, Richard Pyle wrote:

> Hi Rod,
>
>> To me centralisation is red rag to a bull, especially as the objects
>> of interest (names and concepts) are things we might reasonably
>> disagree over.
>
> Please don't misunderstand what I'm talking about here. Of course we
> might reasonably disagree over which names to regard as valid and
> which to regard as synonyms. We will also disagree about the scope of
> organisms to include within the circumscription of a taxon concept.
> However, in most cases, we will not disagree that Smith (1955)
> described the species "bus", and placed it in the genus "Aus" (i.e.,
> the taxon name object "Aus bus Smith 1955"); or that Jones (1975)
> regarded Smith's "Aus bus" as a junior synonym of Brown's (1935)
> "Aus xus" (i.e., the taxon concept object "Aus xus Brown 1935 SEC
> Jones 1975"; the circumscription of which includes the taxon concept
> object "Aus bus Smith 1955 SEC Smith 1955").
>
> Centralizing the issuance of GUIDs for things like taxon name objects
> and concepts/usage instances does NOT, in any way, centralize
> "taxonomy". It simply serves to avoid issuing 150 different GUIDs for
> the taxon name object "Aus bus Smith 1955" -- one GUID from each of
> 150 different data providers that happen to list that name in their
> taxonomic authority table.
>
>> Why not let users decide this, by which I mean, if a provider
>> comes up with a comprehensive list of names with good supporting
>> metadata, users will gravitate towards using them. There will also be
>> a "market" for people building services that map between GUIDs (I'm
>> thinking of making one for TreeBASE, for example). Why centralise
>> this activity?
>
> So that we don't need a "market" for services to cross-map duplicate
> GUIDs that never needed to be created in the first place. Instead, we
> should "market" services that utilize a common/shared set of GUIDs for
> objective name objects (and concept/usage objects) to assist
> *taxonomy*. (And, in the shorter term, market tools that allow data
> providers to cross-map their internal taxonomic authorities to shared
> GUIDs.)
>
> We certainly can't eliminate duplicates, but at least we can try to
> minimize the unnecessary duplicates. I spend an inordinate chunk of my
> time doing two things that I should not have to do: 1) cross-mapping
> large datasets to a common shared authority (like taxon names); and
> 2) cleaning up the database messes created by earlier workers who were
> pressed for time, and opted for the quick & dirty solution.
>
> Frankly, I'm not sure why we even need GUIDs for things like Taxon
> Names, other than to mitigate these two kinds of problems. I thought
> the point was to facilitate electronic information flow. How have we
> facilitated electronic information flow if you assign one GUID for
> "Aus bus Smith 1955", and I assign another GUID to the same taxon
> name, and a pair of human eyes is required to ascertain that they are,
> indeed, two pointers to the same abstract data object?
>
>> I see the point that multiple GUIDs for the same thing can be
>> a pain (for papers we have DOIs, PubMed ids, Google Scholar ids,
>> DSpace handles, etc.), but in the end centralised GUID assignment
>> reeks of committees, etc., in other words, impediments to actually
>> getting things done.
>
> Again, please do not confuse the idea of centralized (or at least
> coordinated) issuance of GUIDs for unambiguously shared data objects
> (like taxon name objects), with some sort of ill-advised centralized
> effort for a "shared taxonomy". I have not seen anybody in recent
> years even suggest the possibility of the latter.
>
>> I agree that software tools to "cross-walk multiple
>> independent datasets with broadly overlapping data objects" would be
>> very nice, but let's separate this from centralising GUID assignment.
>> One of the lessons of the web, IMHO, is that centralisation doesn't
>> scale.
>
> You can't scale much bigger than the global pool of IP addresses,
> which are, ultimately, issued in blocks in a coordinated,
> semi-centralized way (not altogether unlike a model of GUID issuance
> that I have previously suggested).
>
> Aloha,
> Rich
>
> Richard L. Pyle, PhD
> Database Coordinator for Natural Sciences
> Department of Natural Sciences, Bishop Museum
> 1525 Bernice St., Honolulu, HI 96817
> Ph: (808)848-4115, Fax: (808)847-8252
> email: deepreef at bishopmuseum.org
> http://hbs.bishopmuseum.org/staff/pylerichard.html
>
>
------------------------------------------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page at bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org