Re: Topic 3: GUIDs for Taxon Names and Taxon Concepts
And here's what I've got regarding actual use of taxon names / concepts within Kew.
Taxon GUIDs
Again, a combined response including our LCD database, the Monocot Checklist, Generic Index, SEPASAL, Flora Zambesiaca and IPNI
1. Is your data organised using taxon names or taxon concepts?
General: Most of our systems were designed before concepts were thought of and therefore take a simplified approach to taxa in general. Where we use internal checklists, like the various collection taxon lists (lookups), we're actually using concepts in a way, but they are our own concepts, not necessarily identifiable published ones, and they are not explicitly stated in any useful way. These lists will often contain a basic synonymy to keep databases up to date with recent revisions.
In HerbCat what is usually being recorded is a determination (i.e. in some people's eyes a concept, but an unpublished one), and we record the name given, the name of the determiner and the date the determination was made.
The Monocot Checklist creates concepts and cites bibliographic references that support or dissent from these concepts.
SEPASAL has very rich data, with bibliographic references for almost every piece of information, including synonymy and vernacular names, so multiple concepts could probably be extracted from it with some pain.
The African Floras such as FZ also create concepts and give synonymy and descriptions. They may cite other concepts, but usually just nominal ones (although you get the occasional 'sensu' in there).
IPNI explicitly contains only original concepts.
The Generic Index does not explicitly contain concepts but does give synonymy at the generic level.
2. Do you assign reusable identifiers to taxon names or concepts (i.e. identifiers used in more than one database)?
Almost all collections at Kew store identifiers linked to the Generic Index for genus names. We are beginning to link some databases to IPNI via the IPNI id. E.g. HerbCat can link to the name via IPNI, in which case it stores the IPNI id, but this is not required. The Monocot Checklist exposes its ids for names for use by whoever wants them.
3. If so what is the process in assigning new identifiers for additional taxa and for accommodating taxonomic change?
New identifiers are usually generated internally and automatically by the database system concerned. IPNI does not include taxonomy, so there is no process for taxonomic change, but _any_ edit to an IPNI record results in the version number being incremented. For other systems, taxonomy could be changed quite substantially (names raised or sunk, new descriptions, new distributions etc.) without a new id being generated or any sort of versioning. This includes the Generic Index, which means that databases which link to the Generic Index via genus number might suddenly find that their specimens have been moved wholesale into another family ... (Sinking or raising a genus should not have an effect on the name, although the linking system might generate a warning that the genus is no longer accepted the next time the record is edited.)
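The "any edit bumps the version" behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `NameRecord` class, its fields and the `id:version` string format are hypothetical and are not taken from IPNI's actual schema or identifier format.

```python
from dataclasses import dataclass


@dataclass
class NameRecord:
    """Hypothetical name record: a stable id plus a version counter."""
    record_id: str    # stable identifier; never changes on edit
    full_name: str
    version: int = 1  # incremented on every edit, however small

    def edit(self, **changes):
        """Apply changes to the record, then bump the version once."""
        for attr, value in changes.items():
            if attr in ("record_id", "version"):
                raise ValueError(f"{attr} cannot be edited directly")
            if not hasattr(self, attr):
                raise AttributeError(attr)
            setattr(self, attr, value)
        self.version += 1

    def versioned_id(self):
        """Combine id and version (hypothetical format)."""
        return f"{self.record_id}:{self.version}"


# Made-up id and name, for illustration only.
rec = NameRecord("example-1", "Aus bus L.")
rec.edit(full_name="Aus bus (L.) Smith")
```

A database that stores `versioned_id()` rather than the bare id can tell when the record it linked to has since been edited, which is exactly what the unversioned systems above cannot do.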
4. Where are these identifiers used?
Mostly internally, although I hope IPNI ids will become more widely used in the future. We send out data from IPNI and from various checklists (e.g. the Monocot Checklist) with the ids attached so that people can use them if they wish, although we have no idea whether they are used. Data sent out from IPNI comes with a rider that the ids be retained, so that when any changes are fed back to us they can be easily dealt with.
5. Do you use identifiers from any external classification within your database?
The Monocot Checklist retains both TROPICOS and IPNI ids. Otherwise no, although perhaps we should.
6. Would there be any social or technical roadblocks to replacing these identifiers with a single identifier that was guaranteed to be unique?
- This would result in an enormous amount of reprogramming and relinking of legacy data for almost all our databases at Kew. This would mean that any use of GUIDs would have to be waived for existing records at the least.
- Some names (on herbarium labels for example) may not be possible to match (although fuzzy matching and so on may help with this). There are also all the usual problems of sp.A and sp.B to contend with.
- A lot of databasing goes on in the field or under time pressure, and if there was any sort of central server which had to be linked to (and which could be down or slow) this would be a barrier to data entry. It would only be workable if names could be entered offline and validated & linked later.
- Having automatic updating of taxonomy through these links (e.g. if a taxon is sunk into another one) could lead to instability of the data in linked databases. Linked records would have to be flagged or warned that taxonomy had been changed, rather than having the names changed automatically, which would lead to specimens becoming unfindable or information on labels being different from the information held in databases.
None of these are unsurmountable ... but careful use cases would have to be drawn up!
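As a starting point for such a use case, here is a minimal sketch of the "enter offline, validate & link later, flag rather than overwrite" approach suggested above. Everything in it is an assumption made for illustration: the `SpecimenEntry` fields, the `link_entries` function and the two plain dictionaries standing in for a name service are hypothetical, and exact string matching is used where a real system would probably want fuzzy matching.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecimenEntry:
    name_as_entered: str                  # verbatim name from the label; never overwritten
    linked_name_id: Optional[str] = None  # filled in later, once a name source is reachable
    needs_review: bool = False
    note: str = ""


def link_entries(entries, name_index, status_by_id):
    """Link offline entries to name ids after the fact.

    name_index:   dict of name string -> identifier (hypothetical lookup)
    status_by_id: dict of identifier -> "accepted" or "synonym" (hypothetical)
    """
    for entry in entries:
        name_id = name_index.get(entry.name_as_entered.strip())
        if name_id is None:
            # Unmatched names (sp.A, misspellings) stay usable but unlinked.
            entry.needs_review = True
            entry.note = "no match found; left unlinked"
            continue
        entry.linked_name_id = name_id
        if status_by_id.get(name_id) == "synonym":
            # Warn about the taxonomic change; do not rewrite the name.
            entry.needs_review = True
            entry.note = "name no longer accepted; check taxonomy"
    return entries
```

Data entry never waits on the central server, and the name on the label stays exactly as recorded, so a change in taxonomy shows up as a flag for a person to resolve rather than as a silently altered record.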
Have a merry Christmas / Saturnalia / New Year / festival of your choice
Sally
*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe@rbgkew.org.uk