On 30 Jan 2006, at 13:41, Sally Hinchcliffe wrote:
Hi all, having started this hare ...
I'm not worried about centralised taxonomy, I'm simply wondering who is going to do all this work of deciding what GUID gets allocated for, say, a name (and yes, we DO need GUIDs for names).
Well, I think one of the consensus positions we were converging on was that nomenclators who issue ids (IPNI, IF) could continue to do so, and that those ids, suitably qualified, would form the basis of a GUID system for names. Whether these would be the _sole_ id for that name or whether some other rival nomenclator might spring up issuing its own ids (and if somebody wants to do plants let me know!!) was not really settled.
Maybe I've been influenced too much by Google, but more and more I think that in order to scale up to the task in hand we need to break things into bite-size bits, automate as much as possible, and avoid manual intervention if we can (i.e., use it sparingly where it matters -- expertise is rare and expensive). Likewise, I suspect attempts to settle on sole ids from on high will not be productive. Rather, if a nomenclator offers a good service then people will use it.
Yes, in some cases things are simple. For example, we could simply ask uBio to store every name string (which is pretty much what they are doing already), and use their ids as the basis of name GUIDs. But mapping between some of the "higher-level" name databases is not trivial.
Are IPNI and MOBOT going to sit down and go through their databases and match things up? Are we then going to do the same thing with IPNI, MOBOT, NCBI and TreeBASE? Will we wait until this is done before assigning GUIDs? And given that mapping between databases can be contentious (is this name really the same as that name, how do we know, etc.) -- and I should point out that current attempts to do this, such as NCBI's LinkOut, which uses names, are riddled with errors -- it seems this is knowledge that will evolve over time.
Curiously enough, in answer to your first rhetorical question, IPNI & MOBOT _ARE_ going to map up our ids for entirely other reasons. We are staring into the black hole of this particular task and I know that when we have done so the fact 'IPNI id 12345-1' is the same name as 'TROPICOS 34509' will be a hard-won and expensive fact and we aren't going to want to lose it.
Nor should you have to. IPNI can include in the metadata it serves for IPNI id 12345-1 that it is the same as TROPICOS 34509 (and vice versa). In the same way, in the LSIDs I serve I include mappings between databases where possible (e.g., uBio records linked to ITIS, NCBI linked to ITIS, MOBOT, and TreeBASE).
In the same vein, I suggest that we are likely to make more progress if we have resolvable GUIDs now so that major data sources open their data up, and then use data mining tools to go in and find mappings, inconsistencies, etc. Many of these things can be computed, i.e., can be automated. Being open could encourage anybody to have a go at examining mappings.
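To make that concrete, here is a rough sketch of how a client might pool "same as" statements served by several providers into groups of equivalent ids. The record layout (`id`, `same_as`), the `PROVIDER:local-id` spelling, and the id `IPNI:99999-1` are assumptions made up for illustration; they are not what IPNI, TROPICOS, or any LSID authority actually serves.

```python
from collections import defaultdict


def group_same_as(records):
    """Group identifiers into sorted lists of ids asserted to denote the same name."""
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for rec in records:
        find(rec["id"])
        for other in rec.get("same_as", []):
            union(rec["id"], other)

    groups = defaultdict(set)
    for ident in list(parent):
        groups[find(ident)].add(ident)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical metadata records; only the first mapping comes from this thread.
records = [
    {"id": "IPNI:12345-1", "same_as": ["TROPICOS:34509"]},
    {"id": "TROPICOS:34509", "same_as": ["IPNI:12345-1"]},
    {"id": "IPNI:99999-1", "same_as": []},
]
groups = group_same_as(records)
```

The point of the union-find is only that the hard-won fact is recorded once, by whoever established it, and everybody downstream can pick it up mechanically.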
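One example of an inconsistency that could be computed rather than curated: a mapping that one database asserts but the other does not reciprocate. The sketch below assumes the mappings have already been harvested into a plain dictionary; the ids and the dictionary layout are invented for illustration.

```python
def find_one_way_mappings(same_as):
    """Return (a, b) pairs where a claims to equal b but b does not claim a."""
    problems = []
    for a, targets in same_as.items():
        for b in targets:
            if a not in same_as.get(b, set()):
                problems.append((a, b))
    return sorted(problems)


# Hypothetical harvested mappings: each id maps to the ids it says it equals.
same_as = {
    "uBio:1": {"ITIS:10"},
    "ITIS:10": {"uBio:1"},
    "NCBI:7": {"ITIS:10"},  # ITIS:10 does not point back to NCBI:7
}
one_way = find_one_way_mappings(same_as)
```

A one-way mapping is not necessarily an error, but it is a cheap flag for where an expert's time might be best spent.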
I do think the GUIDs for names should be resolvable, and as soon as an agreed technology is selected I'll be putting in place plans to have IPNI implement it. What I'm saying is that this is (to some users) secondary to being able to state facts like the one above - x is the same as y - in a stable way without having to go down the long road of fuzzy string matching, different author abbreviations, Latin gender endings, etc.
Everybody has different needs, and resources are limited. To me one appeal of GUIDs is that I can have a list of them (say, 10,000) and fire them off to a GUID resolver, and get back all the info I need (for example, I could dump the metadata straight into a database and do some work). If IPNI has mappings to MOBOT, then with the MOBOT GUIDs I can extract what I need from them. Going to a web site and doing it manually is agony. Of course, for a few cases a web site is the way to do it.
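The workflow I have in mind looks roughly like the sketch below: hand a list of GUIDs to a resolver and dump whatever comes back into a local database. No resolver technology has been agreed, so `resolve` here is a stand-in (a dictionary lookup over made-up records), and the table layout is likewise an assumption; a real client would call whatever resolution service is eventually chosen.

```python
import json
import sqlite3


def harvest(guids, resolve, conn):
    """Resolve each GUID and store its metadata; return the GUIDs that failed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (guid TEXT PRIMARY KEY, body TEXT)"
    )
    failed = []
    for guid in guids:
        try:
            record = resolve(guid)
        except KeyError:
            # The stand-in resolver raises KeyError for unknown GUIDs.
            failed.append(guid)
            continue
        conn.execute(
            "INSERT OR REPLACE INTO metadata VALUES (?, ?)",
            (guid, json.dumps(record)),
        )
    conn.commit()
    return failed


# Stand-in for a real resolver: hypothetical records keyed by GUID.
FAKE_RESOLVER = {
    "IPNI:12345-1": {"same_as": ["TROPICOS:34509"]},
    "TROPICOS:34509": {"same_as": ["IPNI:12345-1"]},
}

conn = sqlite3.connect(":memory:")
failed = harvest(
    ["IPNI:12345-1", "TROPICOS:34509", "IPNI:0-0"],
    FAKE_RESOLVER.__getitem__,
    conn,
)
stored = conn.execute("SELECT COUNT(*) FROM metadata").fetchone()[0]
```

With 10,000 GUIDs the loop is the same; only the resolver behind `resolve` changes.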
I'm probably being wildly naive, but I think concern for getting it "right" might get in the way of getting it "done".
Ducks incoming flames/brickbats/etc.
not from me ... looking forward to discussing this one in person in a couple of days Sally *** Sally Hinchcliffe *** Computer section, Royal Botanic Gardens, Kew *** tel: +44 (0)20 8332 5708 *** S.Hinchcliffe@rbgkew.org.uk
----------------------------------------
Professor Roderic D. M. Page Editor, Systematic Biology DEEB, IBLS Graham Kerr Building University of Glasgow Glasgow G12 8QP United Kingdom
Phone: +44 141 330 4778 Fax: +44 141 330 2792 email: r.page@bio.gla.ac.uk web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/ Find out what we know about a species at http://ispecies.org