Topic 3: GUIDs for Taxon Names and Taxon Concepts

Sally Hinchcliffe S.Hinchcliffe at KEW.ORG
Thu Dec 22 15:40:56 CET 2005


And here's what I've got regarding actual use of taxon names /
concepts within Kew.

Taxon GUIDs

Again, this is a combined response covering our LCD database, the
Monocot Checklist, Generic Index, SEPASAL, Flora Zambesiaca and IPNI.

1. Is your data organised using taxon names or taxon concepts?

General: Most of our systems were designed before concepts were
thought of and therefore take a simplified approach to taxa in
general.
Where we use internal checklists, like the various collection taxon
lists (lookups), we are in a way using concepts, but they are our own
concepts, not necessarily identifiable published ones, and they are
not explicitly stated in any useful way. These lists will often
contain a basic synonymy to keep databases up to date with recent
revisions.
In HerbCat what is usually being recorded is a determination (i.e. in
some people's eyes a concept - but an unpublished one) and we record
both the name given and the name of the determiner and the date it
was made.
The Monocot Checklist creates concepts and cites bibliographic
references in support or dissension from these concepts.
SEPASAL has very rich data with bibliographic references for almost
every piece of information, including synonymy and vernacular names,
so multiple concepts could probably be extracted from it with some
pain.
The African Floras such as FZ also create concepts and give synonymy
and descriptions and may cite other concepts but usually just nominal
ones (although you get the occasional 'sensu' in there).
IPNI explicitly contains only original concepts.
The Generic Index does not explicitly contain concepts but does give
synonymy at the generic level.

2. Do you assign reusable identifiers to taxon names or concepts
(i.e. identifiers used in more than one database)?
Almost all collections at Kew store identifiers linked to the Generic
Index for genus names.
We are beginning to link some databases to IPNI via the IPNI id. E.g.
HerbCat can link to the name via IPNI and then stores the IPNI id,
but this is not required.
The Monocot Checklist exposes its ids for names for use by whoever
wants them.

3. If so what is the process in assigning new identifiers for
additional taxa and for accommodating taxonomic change?

New identifiers are usually generated internally and automatically by
the database system concerned.
IPNI does not include taxonomy so there is no process for taxonomic
change, but _any_ edit to an IPNI record results in the version
number being incremented.
For other systems, taxonomy could be changed quite substantially
(names raised or sunk, new descriptions, new distributions etc.)
without a new id being generated or any sort of versioning. This
includes the Generic Index, which means that databases which link to
the Generic Index via genus number might suddenly find that their
specimens have been moved wholesale into another family ... (sinking
or raising a genus should not have an effect on the name, although
the linking system might generate a warning that the genus is no
longer accepted the next time the record is edited).
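
To make the contrast between IPNI and the other systems concrete, here
is a minimal Python sketch. It is not how any Kew system is actually
implemented: the VersionedRecord class, its fields and the sample
values are all hypothetical. The only behaviour taken from the
description above is that any edit increments the version number while
the id stays the same.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedRecord:
    """Hypothetical name record: the id is stable, the version counts edits."""
    record_id: str
    data: dict = field(default_factory=dict)
    version: int = 1

    def edit(self, **changes):
        # Any edit at all bumps the version; the id never changes.
        self.data.update(changes)
        self.version += 1
        return self.version


# Made-up placeholder values, not real records.
rec = VersionedRecord("example-1", {"genus": "Aus", "family": "Aaceae"})
rec.edit(family="Baceae")
print(rec.record_id, rec.version)
```

A system without versioning would apply the same change and leave no
trace, so a database linking to the record by id would have no way to
notice that the family had changed underneath it.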

4. Where are these identifiers used?
Mostly internally, although I hope IPNI ids will become more widely
used in the future. We send out data from IPNI and from various
checklists (e.g. the Monocot Checklist) with the ids attached so that
people can use them if they wish, although we have no idea whether
they are used. Data sent out from IPNI comes with a rider that the
ids be retained, so that when any changes are fed back to us they can
be easily dealt with.
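
The value of retaining the ids can be sketched roughly as follows. This
is an illustration only, assuming records are plain dictionaries with a
hypothetical "id" field; the function name, field names and sample
values are invented and do not reflect the real IPNI data format.

```python
def apply_feedback(master, returned):
    """Split returned records into updates matched by id and unmatched rows."""
    updates, unmatched = {}, []
    for row in returned:
        rid = row.get("id")
        if rid in master:
            # Keep only the fields that differ from the master copy.
            diff = {k: v for k, v in row.items()
                    if k != "id" and master[rid].get(k) != v}
            if diff:
                updates[rid] = diff
        else:
            # No id (or an unknown one): someone has to match this by hand.
            unmatched.append(row)
    return updates, unmatched


# Made-up placeholder data.
master = {"n1": {"name": "Aus bus", "author": "Smith"}}
returned = [
    {"id": "n1", "name": "Aus bus", "author": "Smith & Jones"},
    {"name": "Aus cus", "author": "Brown"},  # id was dropped by the recipient
]
updates, unmatched = apply_feedback(master, returned)
print(updates)
print(unmatched)
```

Rows that come back with their id can be compared field by field
against the original; rows without one fall back to manual matching.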

5. Do you use identifiers from any external classification within
your database?

The Monocot Checklist retains both TROPICOS and IPNI ids. Otherwise
no, although perhaps we should.

6. Would there be any social or technical roadblocks to replacing
these identifiers with a single identifier that was guaranteed to be
unique?

- This would result in an enormous amount of reprogramming and
relinking of legacy data for almost all our databases at Kew. This
would mean that any use of GUIDs would have to be waived for existing
records at the least.
- Some names (on herbarium labels for example) may not be possible to
match (although fuzzy matching and so on may help with this). There
are also all the usual problems of sp.A and sp.B to contend with.
- A lot of databasing goes on in the field or under time pressure and
if there was any sort of central server which had to be linked to
(and which could be down or slow) this would be a barrier to data
entry. It would only be workable if names could be entered offline
and validated & linked later.
- Having automatic updating of taxonomy through these links (e.g. if a
taxon is sunk into another one) could lead to instability of the data
in linked databases; they would have to be flagged or warned that
taxonomy had been changed, rather than having the names changed
automatically, leading to specimens becoming unfindable or
information on labels being different from the information held in
databases.
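
As a rough illustration of entering names offline and linking them
later, here is a small Python sketch. Everything in it is an
assumption made for the example: the REFERENCE list, the identifiers
and the link_later function are hypothetical, and the standard library
difflib module simply stands in for whatever fuzzy matching a real
system would use. Names are stored exactly as typed; a fuzzy match is
only offered as a suggestion to be checked, never applied
automatically.

```python
import difflib

# Hypothetical reference list of name -> identifier, available only when online.
REFERENCE = {"Aus bus": "id-1", "Aus cus": "id-2"}


def link_later(pending, reference=REFERENCE, cutoff=0.8):
    """Try to attach an identifier to names that were typed in offline."""
    results = []
    for name in pending:
        if name in reference:
            results.append((name, reference[name], "exact"))
            continue
        close = difflib.get_close_matches(name, list(reference), n=1, cutoff=cutoff)
        if close:
            # Suggest a link, but do not overwrite what was entered.
            results.append((name, reference[close[0]], "fuzzy: check"))
        else:
            # Things like "sp. A" stay unlinked until someone resolves them.
            results.append((name, None, "unmatched"))
    return results


entered_offline = ["Aus bus", "Aus buss", "sp. A"]
results = link_later(entered_offline)
for row in results:
    print(row)
```

Keeping the entered name alongside the suggested id means the label
and the database record still agree, whatever happens to the taxonomy
at the other end of the link.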

None of these are unsurmountable ... but careful use cases would have
to be drawn up!

Have a merry Christmas / Saturnalia / New Year / festival of your
choice
Sally
*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk



