How GUIDs will be used

Mon Jan 30 13:41:34 CET 2006

Hi all,
having started this hare ...

> I'm not worried about centralised taxonomy, I'm simply wondering who is
> going to do all this work of deciding what GUID gets allocated for,
> say, a name (and yes, we DO need GUIDs for names).
>
Well, I think one of the consensus positions we were converging on was
that nomenclators who issue ids (IPNI, IF) could continue to do so,
and that those ids, suitably qualified, would form the basis of a GUID
system for names. Whether these would be the _sole_ id for that name,
or whether some other rival nomenclator might spring up issuing its
own ids (and if somebody wants to do plants let me know!!), was not
really settled.
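
To make "suitably qualified" a bit more concrete, here is a minimal
Python sketch. The `authority:local_id` form and the `qualify` helper
are purely illustrative assumptions on my part; no identifier
technology or syntax has been agreed yet.

```python
def qualify(authority: str, local_id: str) -> str:
    """Prefix a nomenclator's local id with the issuing authority.

    The "authority:local_id" layout is a hypothetical convention,
    used here only to show that the local id is kept intact.
    """
    authority = authority.strip().lower()
    local_id = local_id.strip()
    if not authority or not local_id:
        raise ValueError("both authority and local id are required")
    return f"{authority}:{local_id}"


# The nomenclator keeps issuing its own ids; only the prefix is new.
ipni_guid = qualify("IPNI", "12345-1")
```

The point of the sketch is only that the issuing body's own id survives
unchanged inside the qualified form, so nobody has to renumber anything.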

> Yes, in some cases things are simple. For example, we could simply ask
> uBio to store every name string (which is pretty much what they are
> doing already), and use their ids as the basis of name GUIDs. But
> mapping between some of the "higher-level" name databases is not
> trivial.
>
> Are IPNI and MOBOT going to sit down and go through their databases and
> match things up, are we then going to do the same thing with IPNI,
> MOBOT, NCBI  and TreeBASE? Will we wait until this is done before
> assigning GUIDs? And given that mapping between databases can be
> contentious (is this name really the same as that name, how do we know,
> etc.) -- and I should point out that current attempts to do this, such
> as NCBI's LinkOut, which uses names, are riddled with errors -- it seems
> this is knowledge that will evolve over time.
>
Curiously enough, in answer to your first rhetorical question, IPNI &
MOBOT _ARE_ going to map up our ids, for entirely other reasons. We
are staring into the black hole of this particular task, and I know
that when we have done so the fact that 'IPNI id 12345-1' is the same
name as 'TROPICOS 34509' will be a hard-won and expensive one, and
we aren't going to want to lose it.

> In the same vein, I suggest that we are likely to make more progress
> if we have resolvable GUIDs now so that major data sources open their
> data up, then we use data mining tools to go in and find mappings,
> inconsistencies, etc. Many of these things can be computed, i.e. can be
> automated. Being open could encourage anybody to have a go at examining
> mappings.
>
I do think the GUIDs for names should be resolvable, and as soon as
an agreed technology is selected I'll be putting in place plans to
have IPNI implement it. What I'm saying is that resolvability is (to
some users) secondary to being able to state facts like the one above
(x is the same as y) in a stable way, without having to go down the
long road of fuzzy string matching, different author abbreviations,
Latin gender endings, etc.
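
As a rough illustration of what "stating x is the same as y in a stable
way" could look like, here is a small Python sketch. The
`SameNameRegistry` class and the `ipni:` / `tropicos:` id forms are
hypothetical, not an agreed design; the only point is that the lookup
compares ids and never touches name strings.

```python
class SameNameRegistry:
    """Records 'id A denotes the same name as id B' assertions.

    Uses a simple union-find structure, so assertions chain:
    if A = B and B = C are recorded, A = C follows.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, guid):
        # An id never seen before is its own representative.
        self._parent.setdefault(guid, guid)
        while self._parent[guid] != guid:
            # Path halving keeps later lookups short.
            self._parent[guid] = self._parent[self._parent[guid]]
            guid = self._parent[guid]
        return guid

    def assert_same(self, a, b):
        """Store the fact that ids a and b refer to the same name."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def is_same(self, a, b):
        """True if a recorded chain of assertions links a and b."""
        return self._find(a) == self._find(b)


registry = SameNameRegistry()
registry.assert_same("ipni:12345-1", "tropicos:34509")
print(registry.is_same("tropicos:34509", "ipni:12345-1"))
```

Once the expensive matching has been done by hand, the result lives in
the registry as a plain id-to-id fact and does not have to be rederived
from author abbreviations or spelling variants.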

> I'm probably being wildly naive, but I think concern for getting it
> "right" might get in the way of getting it "done".
>
> Ducks incoming flames/brickbats/etc.
>
Not from me ... looking forward to discussing this one in person in a
couple of days.

Sally

*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk