Hi all, having started this hare ...
I'm not worried about centralised taxonomy; I'm simply wondering who is going to do all this work of deciding what GUID gets allocated for, say, a name (and yes, we DO need GUIDs for names).
Well, I think one of the consensus positions we were converging on was that nomenclators who issue ids (IPNI, IF) could continue to do so, and that those ids, suitably qualified, would form the basis of a GUID system for names. Whether these would be the _sole_ id for that name, or whether some other rival nomenclator might spring up issuing its own ids (and if somebody wants to do plants, let me know!!), was not really settled.
Yes, in some cases things are simple. For example, we could simply ask uBio to store every name string (which is pretty much what they are doing already), and use their ids as the basis of name GUIDs. But mapping between some of the "higher-level" name databases is not trivial.
Are IPNI and MOBOT going to sit down, go through their databases and match things up? Are we then going to do the same thing with IPNI, MOBOT, NCBI and TreeBASE? Will we wait until this is done before assigning GUIDs? And given that mapping between databases can be contentious (is this name really the same as that name, how do we know, etc.), and I should point out that current attempts to do this, such as NCBI's LinkOut, which uses names, are riddled with errors, it seems this is knowledge that will evolve over time.
Curiously enough, in answer to your first rhetorical question, IPNI & MOBOT _ARE_ going to map our ids to each other, for entirely other reasons. We are staring into the black hole of this particular task, and I know that when we have done so, the fact that 'IPNI id 12345-1' is the same name as 'TROPICOS 34509' will be a hard-won and expensive fact, and we aren't going to want to lose it.
In the same vein, I suggest that we are likely to make more progress if we have resolvable GUIDs now, so that major data sources open their data up; then we use data mining tools to go in and find mappings, inconsistencies, etc. Many of these things can be computed, i.e. automated. Being open could encourage anybody to have a go at examining mappings.
I do think the GUIDs for names should be resolvable, and as soon as an agreed technology is selected I'll be putting in place plans for IPNI to implement it. What I'm saying is that this is (to some users) secondary to being able to state facts like the one above (x is the same as y) in a stable way, without having to go down the long road of fuzzy string matching, different author abbreviations, Latin gender endings, etc.
I'm probably being wildly naive, but I think concern for getting it "right" might get in the way of getting it "done".
Ducks incoming flames/brickbats/etc.
Not from me ... looking forward to discussing this one in person in a couple of days.

Sally
*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe@rbgkew.org.uk