Re: How GUIDs will be used

30 Jan 2006


Hi all,
having started this hare ...
...
I'm not worried about centralised taxonomy, I'm simply wondering who is
going to do all this work of deciding what GUID gets allocated for,
say, a name (and yes, we DO need GUIDs for names).
Well, I think one of the consensus positions we were converging on was
that nomenclators who issue ids (IPNI, IF) could continue to do so,
and that those ids, suitably qualified, would form the basis of a GUID
system for names. Whether these would be the _sole_ id for that name
or whether some other rival nomenclator might spring up issuing its
own ids (and if somebody wants to do plants let me know!!) was not
really settled.
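
To make "suitably qualified" concrete, here is a minimal sketch of one way a nomenclator's local id could be prefixed with its issuing authority. The `authority:local_id` form, the `QualifiedId` type and the `qualify` helper are all made up for illustration; no format had been agreed at this point.

```python
from typing import NamedTuple


class QualifiedId(NamedTuple):
    """A local id plus the authority that issued it (hypothetical scheme)."""
    authority: str  # issuing nomenclator, e.g. "IPNI"
    local_id: str   # the id exactly as the nomenclator issued it

    def __str__(self) -> str:
        # Illustrative serialisation only, not an agreed GUID syntax.
        return f"{self.authority.lower()}:{self.local_id}"


def qualify(authority: str, local_id: str) -> QualifiedId:
    """Combine an authority and a local id into one qualified identifier."""
    authority = authority.strip()
    local_id = str(local_id).strip()
    if not authority or not local_id:
        raise ValueError("both authority and local id are required")
    return QualifiedId(authority.upper(), local_id)


# The same local id issued by two authorities stays distinct once qualified.
ipni = qualify("IPNI", "12345-1")
other = qualify("TROPICOS", "12345-1")
print(ipni, other, ipni == other)
```

The point of the prefix is only that ids from different issuers cannot collide; it says nothing about whether the two names are the same.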
...
Yes, in some cases things are simple. For example, we could simply ask
uBio to store every name string (which is pretty much what they are
doing already), and use their ids as the basis of name GUIDs. But
mapping between some of the "higher-level" name databases is not
trivial.
Are IPNI and MOBOT going to sit down and go through their databases and
match things up? Are we then going to do the same thing with IPNI,
MOBOT, NCBI and TreeBASE? Will we wait until this is done before
assigning GUIDs? And given that mapping between databases can be
contentious (is this name really the same as that name, how do we know,
etc.) -- and I should point out that current attempts to do this, such
as NCBI's LinkOut, which uses names, are riddled with errors -- it seems
this is knowledge that will evolve over time.
Curiously enough, in answer to your first rhetorical question, IPNI &
MOBOT _ARE_ going to map up our ids for entirely other reasons. We
are staring into the black hole of this particular task and I know
that when we have done so the fact 'IPNI id 12345-1' is the same
name as 'TROPICOS 34509' will be a hard won and expensive fact and
we aren't going to want to lose it.
...
In the same vein, I suggest that we are likely to make more progress
if we have resolvable GUIDs now so that major data sources open their
data up, and then use data mining tools to go in and find mappings,
inconsistencies, etc. Many of these things can be computed, i.e. can be
automated. Being open could encourage anybody to have a go at examining
mappings.
I do think the GUIDs for names should be resolvable, and as soon as
an agreed technology is selected I'll be putting in place plans to
have IPNI implement it. What I'm saying is that resolvability is (to
some users) secondary to being able to state facts like the one above - x
is the same as y - in a stable way without having to go down the long
road of fuzzy string matching, different author abbreviations, Latin
gender endings, etc.
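
As a rough illustration of stating "x is the same as y" against ids rather than name strings, here is a small sketch. The `SameAsStore` class and the `authority:id` strings are made up for the example, and treating "same as" as transitive is an assumption that real, contentious mappings may not always justify.

```python
class SameAsStore:
    """Records 'same name' assertions between ids (illustrative sketch).

    Uses a union-find structure, so assertions are treated as transitive.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, guid):
        # Walk up to the representative id of this group.
        self._parent.setdefault(guid, guid)
        root = guid
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point visited ids straight at the root.
        while self._parent[guid] != root:
            self._parent[guid], guid = root, self._parent[guid]
        return root

    def assert_same(self, a, b):
        """Record the fact that ids a and b denote the same name."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def is_same(self, a, b):
        """True only if a and b have been linked, directly or indirectly."""
        if a not in self._parent or b not in self._parent:
            return a == b
        return self._find(a) == self._find(b)


store = SameAsStore()
# The fact from this message, written with hypothetical qualified ids.
store.assert_same("ipni:12345-1", "tropicos:34509")
print(store.is_same("tropicos:34509", "ipni:12345-1"))
```

No string matching is involved: once the fact is recorded against stable ids, it survives changes in spelling, author abbreviation or gender ending on either side.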
...
I'm probably being wildly naive, but I think concern for getting it
"right" might get in the way of getting it "done".
Ducks incoming flames/brickbats/etc.
not from me ... looking forward to discussing this one in person in a
couple of days
Sally
*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe@rbgkew.org.uk