Apologies that this has taken so long ... I have even forgotten the topic number. Taxon GUIDs to follow.
Guids - specimens
This is a combined response covering our Herbarium Catalogue, Seed Bank and Living Collections. There are other specimen-holding collections within the gardens, but broadly the same answers would apply to them.
1. What identifiers (and how many per specimen) get assigned to specimens in your organisation or domain (field numbers, catalogue numbers, etc.)?
HerbCat: collector number (collection-event level), barcode (per sheet; multi-part collections get multiple barcodes, joined together by a nominated 'top sheet'), bottle number (spirit collections), mycology number (mycological collections).
Living Collection (LCD): Each accession has the following identifiers. Accession key: unique to each accession (an accession may cover a single plant, if a tree, or a number of plants, if something smaller like a bulb). Collection key: each collection (i.e. collecting event) can contain one or many accessions. A collection holds information specific to the batch in which the accessions arrived, for example from an expedition. Taxon key: specific to each Living Collections taxon. As you might guess, the relationship between accession and taxon is many-to-one.
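To make those relationships concrete, here is a minimal sketch of the three LCD keys as Python dataclasses. The class and field names and the sample values are hypothetical illustrations, not the actual LCD schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    taxon_key: int  # system-scope key
    name: str


@dataclass(frozen=True)
class Collection:
    collection_key: int  # one collecting event / incoming batch
    source: str          # e.g. the expedition the batch came from


@dataclass(frozen=True)
class Accession:
    accession_key: str   # institution-scope key
    collection_key: int  # many accessions -> one collection
    taxon_key: int       # many accessions -> one taxon


def accessions_for_taxon(accessions, taxon_key):
    """Return every accession recorded against one taxon (many-to-one)."""
    return [a for a in accessions if a.taxon_key == taxon_key]


# Hypothetical sample data: two accessions from the same batch, same taxon.
batch = Collection(collection_key=1, source="hypothetical expedition")
taxon = Taxon(taxon_key=10, name="Hypothetical species")
accessions = [
    Accession("A-0001", batch.collection_key, taxon.taxon_key),
    Accession("A-0002", batch.collection_key, taxon.taxon_key),
]
```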
Seedbank (MSB): Serial Number is the unique identifier per seed sample specimen. There is also a batch number which relates to a batch of seeds coming from a group of collections.
2. What is the scope of uniqueness for each of these identifiers?
All: Collector numbers are in theory unique to their collector, although not always in practice. Not all collectors number their collections, and collector names are not necessarily unique or consistently cited. One collection event will give rise to multiple duplicates, which could be of different types of material: for example, fresh material used for DNA, a dried voucher and a spirit collection.
Herbcat: Kew's barcodes are in theory globally unique, at least in the herbarium world, as they consist of a prefix (K, taken from Index Herbariorum) plus a unique number. Other numbers (mycology and spirit collection) are unique only within the database in which they are stored.
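A barcode built this way can be split back into its scope (the herbarium) and its local number, whereas a bare mycology or bottle number carries no scope at all. A minimal sketch, assuming barcodes are an upper-case acronym followed directly by digits (the sample value is hypothetical, and the real digit count is not specified here):

```python
import re

# Assumption: an Index Herbariorum acronym (capital letters) immediately
# followed by digits. Any run of digits is accepted; the local number is
# kept as a string so that leading zeros survive.
BARCODE_RE = re.compile(r"([A-Z]+)(\d+)")


def split_barcode(barcode: str) -> tuple:
    """Split a herbarium barcode into (herbarium acronym, local number)."""
    match = BARCODE_RE.fullmatch(barcode.strip())
    if match is None:
        raise ValueError(f"not a prefix+number barcode: {barcode!r}")
    return match.group(1), match.group(2)
```

For example, `split_barcode("K000123456")` returns `("K", "000123456")`, while an unprefixed number such as `"12345"` is rejected because it has no scope.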
LCD: Accession key is institution scope. Collection key is the same. Taxon key is system scope (we are currently discussing the possibility of having one Kew-wide taxon system, but it's currently no more than a bright idea ...). Accession keys might be used in other systems, e.g. if plants are grown from seed bank seeds or if material is sent to the herbarium or to the labs for DNA analysis.
MSB: Serial Number: unique within SBD. Batch Number: unique within SBD for each batch of collections.
3. Can you explain the life cycle of these identifiers?
All: Collector number: assigned in the field by the collector to the entire collection (which will give rise to duplicates both at Kew and at other institutions). When databasing an existing specimen, the collector's number and the collector name are taken off the label.
Herbcat: Barcodes come from preprinted rolls of barcodes which are assigned to specimens as the specimens are databased. The barcode label is then glued onto the sheet and the barcode scanned into the database. If a specimen has multiple sheets or other parts then it is given multiple barcodes and the specimen is grouped in the database with its constituent parts using a joining table. This 'specimen' does not have any externally visible id. Mycology numbers are automatically generated by the database and printed on the specimen label. Spirit collection bottle numbers are printed on the bottle label and (I think?) generated by the database.
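A minimal sketch of the joining-table idea, assuming one row per sheet barcode linked to an internal (not externally visible) specimen id, with one row flagged as the nominated top sheet. The table layout, names and barcodes are hypothetical, not the HerbCat schema:

```python
from collections import defaultdict

# Hypothetical joining table rows: (internal specimen id, sheet barcode, is top sheet)
joining_table = [
    (501, "K000000001", True),
    (501, "K000000002", False),
    (501, "K000000003", False),
    (502, "K000000004", True),
]


def group_parts(rows):
    """Map internal specimen id -> (top sheet barcode, list of all its barcodes)."""
    parts = defaultdict(list)
    top = {}
    for specimen_id, barcode, is_top in rows:
        parts[specimen_id].append(barcode)
        if is_top:
            top[specimen_id] = barcode
    return {sid: (top.get(sid), barcodes) for sid, barcodes in parts.items()}


def sibling_barcodes(rows, barcode):
    """Given one scanned barcode, return the barcodes of the other parts."""
    for _top, barcodes in group_parts(rows).values():
        if barcode in barcodes:
            return [b for b in barcodes if b != barcode]
    return []  # barcode not in any multi-part group
```

Scanning any one sheet is then enough to find the rest of the specimen, even though the grouping itself has no external id.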
LCD: Taxon key is assigned by the system on creation of new taxa. The collection key is assigned automatically on creation of a new accession and remains associated with that accession for its lifetime, following any splits, transfers or other actions that the accession undergoes. The accession key is assigned on creation and is either automatically assigned or user-defined. Splitting or propagating an accession (creating a new accession that retains some of the former's fields) results in a new accession key (again automatic or user-defined), while the original accession persists and keeps its current key.
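A minimal sketch of the split/propagate rule, assuming frozen dataclasses and a hypothetical `AUTO-nnnnnn` format for automatically assigned keys. Which fields the new accession copies from its source is also an assumption here:

```python
import itertools
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical automatic key generator; the real LCD key format is not described.
_counter = itertools.count(1)


@dataclass(frozen=True)
class Accession:
    accession_key: str
    collection_key: int  # follows the accession through splits and transfers
    taxon_key: int


def split_accession(source: Accession, new_key: Optional[str] = None):
    """Split or propagate an accession.

    The source accession persists with its current key; the new accession
    gets a user-defined key if one is given, otherwise an automatic one,
    and (by assumption here) copies every other field, including
    collection_key.
    """
    key = new_key if new_key is not None else f"AUTO-{next(_counter):06d}"
    child = replace(source, accession_key=key)
    return source, child
```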
MSB: Serial Number: Assigned automatically by SBD. Incremental but with a check digit at the end. Batch Number: Assigned automatically by SBD when required. Incremental.
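The check-digit algorithm SBD actually uses isn't described here, so the sketch below substitutes the common Luhn (mod 10) scheme purely for illustration. It appends a check digit to an incremented base number, and validates a serial by recomputing that digit:

```python
def luhn_check_digit(number: str) -> int:
    """Compute a Luhn (mod 10) check digit for a string of digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the
    # rightmost one (the check digit will be appended to its right).
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def next_serial(last_base: int) -> str:
    """Increment the base number and append its check digit."""
    base = str(last_base + 1)
    return base + str(luhn_check_digit(base))


def is_valid_serial(serial: str) -> bool:
    """True if the last digit matches the check digit of the rest."""
    return (
        serial.isdigit()
        and len(serial) > 1
        and luhn_check_digit(serial[:-1]) == int(serial[-1])
    )
```

For example, `next_serial(41)` gives `"422"`. A single mistyped digit fails validation, which is the main point of having a check digit on a number that is keyed in by hand.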
4. Can you give examples of how these identifiers are used to retrieve the specimen and/or information on the specimen?
Herbcat: The database can be searched by collector name and number, or by barcode, to retrieve information on the specimen; family, genus, species and country are then used to locate the specimen in the Herbarium cupboards. Barcodes are also used to name the image files made of each specimen and will be used to search our picture index. The database can also be searched by bottle number; spirit collection bottles are stored in cupboards ordered by bottle number. Finally, the database can be searched by mycology number, although I'm not sure how these specimens are then arranged in the herbarium cupboards.
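A minimal sketch of those HerbCat retrieval paths, using in-memory dictionaries in place of the database. The record fields mirror the text; the class and function names, the sample data and the `.jpg` naming convention are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SheetRecord:
    barcode: str
    collector: str
    collector_number: str
    family: str
    genus: str
    species: str
    country: str


def build_indexes(records):
    """Index records by barcode and by (collector, collector number)."""
    by_barcode = {r.barcode: r for r in records}
    by_collector = {}
    for r in records:
        # Collector + number is not guaranteed unique, so keep a list per key.
        by_collector.setdefault((r.collector, r.collector_number), []).append(r)
    return by_barcode, by_collector


def cupboard_location(record):
    """The fields used to find a sheet in the cupboards."""
    return (record.family, record.genus, record.species, record.country)


def image_filename(barcode, extension="jpg"):
    # Hypothetical convention: the image file is named after the sheet's barcode.
    return f"{barcode}.{extension}"


records = [
    SheetRecord("K000000001", "Smith, J.", "123", "Fagaceae", "Quercus", "robur", "UK"),
    SheetRecord("K000000002", "Smith, J.", "123", "Fagaceae", "Quercus", "robur", "UK"),
]
by_barcode, by_collector = build_indexes(records)
```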
LCD: The LivColl system can search on collector, taxon and accession key. The accession keys are also used on the garden labels and are visible to the public.
MSB: The SBD interface is based around entry of Serial Numbers. The physical location(s) of the specimen within the seed bank can be found using the Serial Number. Queries can be made using a reporting tool to retrieve information using the Batch Number.
5. Would there be any social or technical roadblocks to replacing these identifiers with a single identifier that was guaranteed to be unique?
All: Collectors and their numbers would be difficult to turn into GUIDs and keep track of since, as far as I know, there is no single database holding information about all collectors, or even all botanical collectors (it would be great if there were, though). Standard forms (e.g. Surname, initials) might go a long way towards identifying collectors more or less uniquely, but they would not be foolproof.
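As a rough illustration of the 'Surname, initials' idea, here is a minimal normaliser. The accepted input patterns are assumptions, and, as noted above, the result is a matching aid rather than a unique identifier:

```python
import re


def standard_collector_form(name: str) -> str:
    """Reduce a collector name to a 'Surname, I.I.' standard form.

    Accepts e.g. 'John Smith', 'J. Smith', 'Smith, John' or 'Smith, J.'.
    Not foolproof: different collectors can share a standard form, and
    multi-word surnames (e.g. 'de Candolle') would need special handling.
    """
    name = " ".join(name.split())  # collapse runs of whitespace
    if not name:
        raise ValueError("empty collector name")
    if "," in name:
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        *given_parts, surname = name.split(" ")
        given = " ".join(given_parts)
    # Take the first letter of each given name, splitting on spaces, dots and hyphens.
    initials = "".join(f"{p[0].upper()}." for p in re.split(r"[\s.\-]+", given) if p)
    return f"{surname}, {initials}" if initials else surname
```

With this, 'John Smith', 'J. Smith' and 'Smith, John' all reduce to 'Smith, J.', which is exactly why it helps with matching but cannot guarantee uniqueness.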
Herbcat: Bottle numbers and mycology numbers are used in publications; I'm not sure whether barcodes have been cited yet, but I assume they might be in future. There would be a lot of additional curatorial work to replace or add labels on the specimens with the new numbers. Spirit specimens require special labels to fit on small bottles (for this reason, bottle numbers are currently printed on the specimen label rather than on a separate label). I'm not sure how a new numbering system would affect the spirit collection, as it is arranged in the cupboards according to the current numbering. Changes to software systems would be required to accept a new format for barcodes, bottle numbers and mycology numbers. Barcode numbers are also used to name the stored digital images of the specimens, and these files would need to be renamed. (None of these should prove to be a problem if the GUID were a combination, i.e. took a prefix to indicate the institution and collection and then continued with the existing numbers. The GUID could then be generated from the barcode alone, given the rule for building a GUID from a barcode.)
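A minimal sketch of that 'GUID wrapper' idea, assuming a hypothetical `urn:example:specimen` prefix followed by institution, collection and the existing local number. Because the wrapper is purely a function of those parts, anyone who knows the barcode (and the rule) can regenerate the GUID. The prefix and collection codes are illustrative, not an agreed scheme; a name-based (version 5) UUID over the same string is shown as an alternative:

```python
import uuid

# Hypothetical prefix; not a registered or agreed namespace.
GUID_PREFIX = "urn:example:specimen"


def wrap_local_id(institution: str, collection: str, local_id: str) -> str:
    """Build a GUID by prefixing an existing number with institution and collection."""
    for part in (institution, collection, local_id):
        if not part or ":" in part:
            raise ValueError(f"invalid identifier component: {part!r}")
    return f"{GUID_PREFIX}:{institution}:{collection}:{local_id}"


def unwrap(guid: str) -> tuple:
    """Recover (institution, collection, local id) from a wrapped GUID."""
    if not guid.startswith(GUID_PREFIX + ":"):
        raise ValueError(f"not a wrapped GUID: {guid!r}")
    institution, collection, local_id = guid[len(GUID_PREFIX) + 1:].split(":")
    return institution, collection, local_id


def wrap_as_uuid5(institution: str, collection: str, local_id: str) -> uuid.UUID:
    """Alternative: a deterministic name-based UUID over the same components.

    Also regenerable from the barcode, but opaque to human readers.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, wrap_local_id(institution, collection, local_id))
```

For example, `wrap_local_id("K", "HerbCat", "000123456")` gives `"urn:example:specimen:K:HerbCat:000123456"`. The same wrapper would cover MSB Serial Numbers with a collection code such as `"MSB"`.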
MSB: There would be opposition from users who have used the Serial Number system for many years. There would be lots of technical issues with tracking existing collections and rewriting code. (As above, as long as the Serial number continued within a Guid 'wrapper' a lot of the objections cited here could be avoided)
6. In the case of subsamples from a specimen, can you identify issues around associating the sample and its information with the source specimen and its information?
HerbCat: If this is about specimens with multiple parts, I have come to the conclusion that these should be linked but also treated as separate entities, each with its own identifier and associated data. Any attempt to define a subset of data common to the various parts seems destined to fail at the first exception that arises. The issue of updating the data of related parts also needs to be addressed (i.e. what happens when a new determination is made on one of the parts). At the moment this is left entirely to the curators, relying on them manually notifying the curators of related parts. An automated notification that gives the option to accept or reject changes is one of the requirements for the Herbarium Catalogue, but has not been implemented yet. There is also a difficulty in identifying parts during the curation and digitisation of historical collections, and users often need to detach parts that had initially been considered part of the main specimen.
(This is also complicated by some of the specimens in the Herbarium being vouchers for specimens in other collections, i.e. the Living Collections, DNA material and the Seed Bank, plus of course duplicates in other herbaria.) Making sure changes in names and identification status propagate outwards to the related collections would, to me, be one of the big pluses of having a global identifier, if it could join related subsamples (however defined) from the original collection event.
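The accept/reject notification requirement could work roughly as below: a new determination is applied to the part it was made on, and each linked part receives a pending proposal instead of being overwritten. This is a minimal sketch; the `links` mapping (which might be derived from a shared collection-event identifier) and all names are assumptions, not an existing HerbCat feature:

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    identifier: str     # each part keeps its own identifier and data
    determination: str
    pending: list = field(default_factory=list)  # (source id, proposed name)


def propose_determination(parts_by_id, links, source_id, new_name):
    """Apply a new determination to one part and notify its linked parts.

    Linked parts are not changed automatically; each gets a pending
    proposal that its curator can accept or reject.
    """
    parts_by_id[source_id].determination = new_name
    for other_id in links.get(source_id, ()):
        parts_by_id[other_id].pending.append((source_id, new_name))


def review(part, accept: bool):
    """Accept or reject the oldest pending proposal on a part."""
    _source_id, new_name = part.pending.pop(0)
    if accept:
        part.determination = new_name
```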
LCD: I can't think of any sub-samples of accessions.
MSB: Sub-samples: Herbarium Vouchers can be easily traced within SBD using the Serial Number as can Grow Events. When Grow Events are entered into the Living Collection Database, the SBD Serial Number is stored with the record. Regenerations of seed samples can be traced using a parental Serial Number.
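Tracing regenerations through parental Serial Numbers amounts to walking a tree of parent links. A minimal sketch, assuming the links are available as a dict from a regenerated sample's Serial Number to its parent's (function names and the serials in the test are hypothetical):

```python
def lineage(serial, parent_of):
    """Follow parental Serial Numbers back to the original collection.

    parent_of maps a regenerated sample's Serial Number to its parent's;
    original samples have no entry. Returns [serial, parent, ..., original].
    """
    chain = [serial]
    seen = {serial}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:
            raise ValueError(f"cycle in parent links at {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain


def descendants(serial, parent_of):
    """All samples regenerated, directly or indirectly, from one serial.

    Assumes the parent links form a tree (no cycles).
    """
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    result, stack = [], [serial]
    while stack:
        for child in children.get(stack.pop(), []):
            result.append(child)
            stack.append(child)
    return result
```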
Follow up questions:
1. How are your specimens organised into larger identifiable sets (collections, named collections, databases, institutions, etc.)?
Herbcat: Besides the main Herbarium collection we have two major named collections (the Spirit and Mycology collections), which are kept separately from the main Herbarium collection. The spirit collection is stored in a separate room/cupboards and has been incorporated into the Herbarium Catalogue, where it's identified by the preparation code and collection name fields. The mycology collection is kept and databased separately for now. There are also other minor named collections (e.g. the Wallace collection, the Palms collection, the Orchid Herbarium and so on). Most of these are kept in separate parts of the Herbarium and are still stored in various datasets, some of which have been incorporated into the Herbarium Catalogue along with the main herbarium collection. The main herbarium collection has been databased in various datasets in the past; some of these have been incorporated into the Herbarium Catalogue, while new digitisers are currently databasing directly into HerbCat. We do record details of duplicates/other specimens held in other institutions: for these we record the owning herbarium's acronym and, if it is available on the specimen, its specimen identifier as well.
LCD: Only the collection - groups of accessions based on how they were acquired. Used only internally and more as a way of normalising information (e.g. lats and longs at the collection site) than as any fixed concept.
2. What identifiers get assigned to each of these sets in your organisation or domain?
Apart from 'K' (the IH acronym for Kew as a whole), there are no internal collection identifiers within Kew beyond the individual names of the databases that hold the various collections (e.g. HerbCat, HerbTrack (mycology), LCD (Living Collections), MSB (Seed Bank) and so on), which are only really visible internally.
*** Sally Hinchcliffe *** Computer section, Royal Botanic Gardens, Kew *** tel: +44 (0)20 8332 5708 *** S.Hinchcliffe@rbgkew.org.uk