How GUIDs will be used

Mon Jan 30 14:13:26 CET 2006

On 30 Jan 2006, at 13:41, Sally Hinchcliffe wrote:

> Hi all,
> having started this hare ...
>
>> I'm not worried about centralised taxonomy, I'm simply wondering who
>> is going to do all this work of deciding what GUID gets allocated for,
>> say, a name (and yes, we DO need GUIDs for names).
>>
> Well I think one of the consensus positions we were converging on was
> that nomenclators who issue ids (IPNI, IF) could continue to do so,
> and that those ids, suitably qualified, would form the basis of a guid
> system for names. Whether these would be the _sole_ id for that name
> or whether some other rival nomenclator might spring up issuing its
> own ids (and if somebody wants to do plants let me know!!) was not
> really settled.

Maybe I've been influenced too much by Google, but more and more I
think that in order to scale up to the task in hand we need to break
things into bite size bits, automate as much as possible, and avoid
manual intervention if we can (i.e., use it sparingly where it matters
-- expertise is rare and expensive). Likewise, I suspect attempts to
settle on sole ids from on high will not be productive. Rather, if a
nomenclator offers a good service then people will use it.

>
>> Yes, in some cases things are simple. For example, we could simply ask
>> uBio to store every name string (which is pretty much what they are
>> doing already), and use their ids as the basis of name GUIDs. But
>> mapping between some of the "higher-level" name databases is not
>> trivial.
>>
>> Are IPNI and MOBOT going to sit down and go through their databases
>> and match things up? Are we then going to do the same thing with IPNI,
>> MOBOT, NCBI and TreeBASE? Will we wait until this is done before
>> assigning GUIDs? And given that mapping between databases can be
>> contentious (is this name really the same as that name, how do we
>> know, etc.) -- and I should point out that current attempts to do
>> this, such as NCBI's LinkOut, which uses names, are riddled with
>> errors -- it seems this is knowledge that will evolve over time.
>>
> Curiously enough, in answer to your first rhetorical question, IPNI &
> MOBOT _ARE_ going to map up our ids for entirely other reasons. We
> are staring into the black hole of this particular task and I know
> that when we have done so the fact 'IPNI id 12345-1' is the same
> name as 'TROPICOS 34509' will be a hard-won and expensive fact and
> we aren't going to want to lose it.
>

Nor should you have to. IPNI can include in the metadata it serves for
IPNI id 12345-1 that it is the same as TROPICOS 34509 (and vice versa).
In the same way, in the LSIDs I serve I include mappings between
databases where possible (e.g., uBio records linked to ITIS, NCBI
linked to ITIS, MOBOT, and TreeBASE).
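
A minimal sketch of the idea of serving mappings as part of each
identifier's metadata. Everything here is an assumption made for
illustration: the "IPNI:12345-1" style of writing ids, the "sameAs"
field name, and the MappingStore class are hypothetical, and are not
taken from IPNI, TROPICOS, or any LSID specification.

```python
from collections import defaultdict


class MappingStore:
    """Symmetric 'same as' links between ids issued by different databases."""

    def __init__(self):
        self._links = defaultdict(set)

    def add_same_as(self, a, b):
        # Record the link in both directions, so that either database
        # can report it when asked about its own id.
        self._links[a].add(b)
        self._links[b].add(a)

    def metadata(self, guid):
        # Hypothetical metadata record: the id plus its known equivalents.
        return {"id": guid, "sameAs": sorted(self._links.get(guid, ()))}


store = MappingStore()
# The example pair quoted in this thread, written in an assumed
# "database:id" form.
store.add_same_as("IPNI:12345-1", "TROPICOS:34509")

ipni_record = store.metadata("IPNI:12345-1")
tropicos_record = store.metadata("TROPICOS:34509")
```

Because the link is stored once and reported from both sides, the
expensive fact survives no matter which database a user starts from.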

>> In the same vein, I suggest that we are likely to make more progress
>> if we have resolvable GUIDs now so that major data sources open their
>> data up, then we use data mining tools to go in and find mappings,
>> inconsistencies, etc. Many of these things can be computed, i.e. can
>> be automated. Being open could encourage anybody to have a go at
>> examining mappings.
>>
> I do think the GUIDs for names should be resolvable, and as soon as
> an agreed technology is selected I'll be putting in place plans to
> have IPNI implement it. What I'm saying is that that is (to some
> users) secondary to being able to state facts like the one above - x
> is the same as y - in a stable way without having to go down the long
> road of fuzzy string matching, different author abbreviations, Latin
> gender endings, etc.

Everybody has different needs, and resources are limited. To me one
appeal of GUIDs is that I can have a list of them (say, 10,000) and
fire them off to a GUID resolver, and get back all the info I need (for
example, I could dump the metadata straight into a database and do some
work). If IPNI has mappings to MOBOT, then with the MOBOT GUIDs I can
extract what I need from them. Going to a web site and doing it
manually is agony. Of course, for a few cases a web site is the way to
do it.
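
The batch workflow described above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
resolver is a stand-in function backed by a small dictionary (no real
resolver service or network call is used), the "sameAs" field and the
"database:id" form of the ids are hypothetical, and "IPNI:99999-9" is a
made-up id included to show an unresolvable entry.

```python
import json
import sqlite3


def resolve_batch(guids, resolve):
    """Resolve each distinct GUID once; return (records, failed)."""
    records, failed = {}, []
    for guid in dict.fromkeys(guids):  # de-duplicate, keeping order
        try:
            records[guid] = resolve(guid)
        except KeyError:
            failed.append(guid)
    return records, failed


def store_records(conn, records):
    """Dump the resolved metadata straight into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (guid TEXT PRIMARY KEY, body TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO metadata VALUES (?, ?)",
        [(guid, json.dumps(meta)) for guid, meta in records.items()],
    )
    conn.commit()


def linked_guids(records, prefix):
    """Collect mapped ids from another database, ready for a second batch."""
    return sorted(
        other
        for other in {o for meta in records.values() for o in meta.get("sameAs", [])}
        if other.startswith(prefix)
    )


# Stand-in for a GUID resolver: a dictionary lookup, not a real service.
FAKE_SOURCE = {"IPNI:12345-1": {"sameAs": ["TROPICOS:34509"]}}


def fake_resolve(guid):
    return FAKE_SOURCE[guid]  # raises KeyError for unknown ids


guids = ["IPNI:12345-1", "IPNI:12345-1", "IPNI:99999-9"]
records, failed = resolve_batch(guids, fake_resolve)

conn = sqlite3.connect(":memory:")
store_records(conn, records)

# Ids to send to the other database's resolver in a follow-up batch.
follow_up = linked_guids(records, "TROPICOS:")
```

With a real resolver swapped in for fake_resolve, the same loop would
serve for a list of 10,000 ids, and the follow_up list is what would be
sent on to the second database.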

>
>> I'm probably being wildly naive, but I think concern for getting it
>> "right" might get in the way of getting it "done".
>>
>> Ducks incoming flames/brickbats/etc.
>>
> not from me ... looking forward to discussing this one in person in a
> couple of days
> Sally
> *** Sally Hinchcliffe
> *** Computer section, Royal Botanic Gardens, Kew
> *** tel: +44 (0)20 8332 5708
> *** S.Hinchcliffe at rbgkew.org.uk
>
>
------------------------------------------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page at bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org