GUIDS for specimens

Sally Hinchcliffe S.Hinchcliffe at KEW.ORG
Thu Dec 22 14:56:37 CET 2005


Apologies that this has taken so long ... I have even forgotten the
topic number. Taxon GUIDs to follow.

Guids - specimens

This is a combined response covering our Herbarium catalogue, Seed
Bank, and Living collections. There are other specimen-holding
collections within the gardens, but more or less the same answers
would apply to them.

1. What identifiers (how many per specimen) get assigned to specimens
in your organisation or domain (field numbers, catalogue number etc.)?

HerbCat: collector number (collection event level), barcode (per
sheet; multi-part collections get multiple barcodes, joined together
by a nominated 'top sheet'), bottle number (spirit collections),
mycology number (mycological collections).

Living Collection (LCD):
Each accession has the following identifiers:
Accession key - unique to each accession (an Accession may cover a
single plant if a tree or a number of plants if something smaller
like a bulb)
Collection key - each collection (i.e. collecting event) can contain
one or many accessions. A collection contains information specific to
the batch in which the accession arrived, for example from an
expedition.
Taxon key - specific to each living collection taxon. Many-to-one
between accession and taxon, as you might guess.

Seedbank (MSB):
Serial Number is the unique identifier per seed sample specimen.
There is also a batch number which relates to a batch of seeds coming
from a group of collections.

2. What is the scope of uniqueness for each of these identifiers?

All: Collector numbers are in theory unique to their collector, but
in practice not always. Not all collectors number their collections.
Collector names are not necessarily unique or consistently cited. One
collection event will give rise to multiple duplicates - which could
be of different types of material, for example fresh material used
for DNA + a dried voucher + a spirit collection.

Herbcat: Kew's barcodes are theoretically globally unique, at least
in the Herbarium world, as they consist of a prefix (K, taken from
Index Herbariorum) plus a unique number.
Other numbers (Mycology and spirit collection) are unique to the
database they are stored in.
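
To make the prefix-plus-number structure concrete, here is a minimal
sketch in Python. The function names and the 9-digit zero padding are
my own assumptions for illustration, not the actual HerbCat format.

```python
import re

# Assumed shape: an Index Herbariorum acronym (capital letters) followed
# by digits. The padding width below is an illustrative assumption.
BARCODE_RE = re.compile(r"^([A-Z]+)(\d+)$")


def make_barcode(herbarium_code: str, number: int, width: int = 9) -> str:
    """Join a herbarium acronym and a locally unique number."""
    return f"{herbarium_code}{number:0{width}d}"


def split_barcode(barcode: str) -> tuple[str, int]:
    """Split a barcode back into (herbarium acronym, number)."""
    match = BARCODE_RE.match(barcode)
    if match is None:
        raise ValueError(f"not a prefixed barcode: {barcode!r}")
    return match.group(1), int(match.group(2))
```

The number only has to be unique within the herbarium; the acronym is
what would carry the uniqueness across institutions.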

LCD: Accession key is institution scope. Collection is the same.
Taxon is system-scope (we are currently discussing the possibility of
having one Kew-wide taxon system but it's currently no more than a
bright idea ...). Accession keys might be used in other systems - e.g.
if plants are grown from the seed bank seeds or if material is sent
to the herbarium or to the labs for DNA analysis.

MSB:
Serial Number: Unique within SBD
Batch Number: Unique within SBD for each batch of collections.

3. Can you explain the life cycle of these identifiers?

All: Collector number - These are assigned in the field by the
collector to the entire collection (which will give rise to
duplicates both at Kew and at other institutions).
When databasing from an existing specimen, the collector's number and
the collector name are taken off the label.

Herbcat: Barcodes come from preprinted rolls of barcodes which are
assigned to specimens as the specimens are databased. The barcode
label is then glued onto the sheet and the barcode scanned into the
database. If a specimen has multiple sheets or other parts then it is
given multiple barcodes and the specimen is grouped in the database
with its constituent parts using a joining table. This 'specimen'
does not have any externally visible id.
Mycology numbers are automatically generated by the database and
printed on the specimen label.
Spirit collection bottle numbers are printed on the bottle label and
(I think?) generated by the database.
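
The grouping of barcoded parts under an internal 'specimen' could be
sketched roughly as below. This uses an in-memory SQLite database; all
table names, column names and barcode values are hypothetical and are
not the real HerbCat schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Internal grouping record; its id is never shown outside the database.
    CREATE TABLE specimen (specimen_id INTEGER PRIMARY KEY);
    -- One row per barcoded sheet or other part.
    CREATE TABLE part (barcode TEXT PRIMARY KEY);
    -- Joining table linking parts to their specimen.
    CREATE TABLE specimen_part (
        specimen_id  INTEGER NOT NULL REFERENCES specimen(specimen_id),
        barcode      TEXT NOT NULL REFERENCES part(barcode),
        is_top_sheet INTEGER NOT NULL DEFAULT 0
    );
""")

# Made-up example: one specimen spread over three sheets.
conn.execute("INSERT INTO specimen (specimen_id) VALUES (1)")
for barcode, is_top in [("K000000101", 1), ("K000000102", 0), ("K000000103", 0)]:
    conn.execute("INSERT INTO part (barcode) VALUES (?)", (barcode,))
    conn.execute(
        "INSERT INTO specimen_part (specimen_id, barcode, is_top_sheet) "
        "VALUES (1, ?, ?)",
        (barcode, is_top),
    )


def sibling_barcodes(conn, barcode):
    """Return every barcode in the same specimen, top sheet first."""
    rows = conn.execute(
        """
        SELECT other.barcode
        FROM specimen_part AS mine
        JOIN specimen_part AS other ON other.specimen_id = mine.specimen_id
        WHERE mine.barcode = ?
        ORDER BY other.is_top_sheet DESC, other.barcode
        """,
        (barcode,),
    ).fetchall()
    return [row[0] for row in rows]
```

Scanning any one sheet is then enough to find the rest of the
specimen, even though the specimen itself has no visible id.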

LCD:
Taxon key is assigned by the system on creation of new taxa. The
collection key is assigned automatically on creation of a new
accession and remains associated with that accession for its
lifetime. It follows any splits, transfers, or other actions that the
accession undergoes.
The accession key is assigned on creation and is either automatically
assigned or user defined. Splitting or propagating an accession
(creating a new accession and retaining some of the former's fields)
will result in the creation of a new accession key (again automatic
or user-defined) and the persistence of the current accession which
retains its current key.
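
As a rough illustration of the split behaviour described above: the
new accession gets its own key (automatic or user-defined), the
collection key travels with it, and the original accession is left
untouched. The class, the key formats and the counter are assumptions
made for the sketch, not how LCD is implemented.

```python
from dataclasses import dataclass, replace
from itertools import count
from typing import Optional

# Stand-in for the system's automatic key generator (hypothetical).
_auto_keys = count(1000)


@dataclass(frozen=True)
class Accession:
    accession_key: str
    collection_key: str
    taxon_key: str


def split_accession(parent: Accession, new_key: Optional[str] = None) -> Accession:
    """Create a new accession from `parent`, keeping its other fields.

    The key is user-defined when `new_key` is given, otherwise automatic.
    The parent object is not modified and keeps its current key.
    """
    key = new_key if new_key is not None else f"AUTO-{next(_auto_keys)}"
    return replace(parent, accession_key=key)
```

For example, `split_accession(parent, "1969-12345A")` would return an
accession with the same collection and taxon keys as `parent` but the
user-supplied accession key.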

MSB:
Serial Number: Assigned automatically by SBD. Incremental but with a
check digit at the end.
Batch Number: Assigned automatically by SBD when required.
Incremental.
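
An incremental number with a trailing check digit can be sketched as
follows. I do not know which check digit scheme SBD actually uses; the
Luhn algorithm is used here purely as a well-known example, and the
function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:  # double every second digit, rightmost first
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def next_serial(previous_counter: int) -> str:
    """Increment the counter and append its check digit."""
    body = str(previous_counter + 1)
    return body + str(luhn_check_digit(body))


def is_valid_serial(serial: str) -> bool:
    """Check that the last digit matches the digits before it."""
    if not serial.isdigit() or len(serial) < 2:
        return False
    return luhn_check_digit(serial[:-1]) == int(serial[-1])
```

The point of the check digit is that a mistyped serial number is
usually rejected at entry rather than silently matching another
sample.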


4. Can you give examples of how these identifiers are used to
retrieve the specimen and/or information on the specimen?

Herbcat: Database can be searched by collector name and number, or by
barcode, to retrieve information on the specimen; family, genus,
species and country are then used to locate the specimen in the
Herbarium cupboards.
Barcodes are also used to name image files made of each specimen and
will be used to search our picture index.
Database can be searched by bottle number. Spirit collection bottles
are stored in cupboards ordered by bottle number.
Database can be searched by mycology number. Not sure how these are
then arranged in the herbarium cupboards.

LCD: The LivColl system can search on collector, taxon and accession
key. The accession keys are also used on the garden labels and are
visible to the public.

MSB:
The SBD interface is based around entry of Serial Numbers. The
physical location(s) of the specimen within the seed bank can be
found using Serial Number. Queries can be made using a reporting tool
to retrieve information using Batch Number.


5. Would there be any social or technical roadblocks to replacing
these identifiers with a single identifier that was guaranteed to be
unique?

All: Collectors and their numbers would be difficult to turn into
GUIDs and keep track of because, as far as I know, there is no single
database holding information about all collectors or even all
botanical collectors (it would be great if there were, though).
Standard forms (e.g. Surname, initials) might go a long way towards
identifying collectors more-or-less uniquely but it would not be
foolproof.
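
A standard form of the 'Surname, initials' kind might be derived
along these lines. This is only a sketch under simple assumptions
(single-word surnames when no comma is present, made-up names), and
the function name is hypothetical.

```python
import re


def standard_form(name: str) -> str:
    """Reduce 'A. B. Smith' or 'Smith, Anne Beth' to 'Smith, A.B.'."""
    name = name.strip()
    if "," in name:
        # 'Surname, given names' order.
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        # 'Given names Surname' order; assumes a single-word surname.
        *given_parts, surname = name.split()
        given = " ".join(given_parts)
    words = re.findall(r"[^\W\d_]+", given)  # runs of letters only
    initials = "".join(f"{word[0].upper()}." for word in words)
    return f"{surname}, {initials}" if initials else surname
```

It also shows why this is not foolproof: 'Smith, Anne Beth' and
'Smith, Alan Brian' both reduce to 'Smith, A.B.', so the standard form
alone cannot tell two such collectors apart.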

Herbcat:
Bottle number and mycology number are used in publications. I am not
sure whether barcodes have been cited yet, but I assume they might be
in future.
There would be a lot of additional curatorial work to replace / add
labels on the specimens with the new numbers. Spirit specimens
require special labels to fit on small bottles (bottle numbers are
currently printed on the specimen label rather than on a separate
label for this reason).
Not sure how the new numbering system would impact on the spirit
collections as this is arranged in the cupboards according to the
current numbering.
Changes to software systems would be required to accept a new format
for barcodes, bottle numbers and mycology numbers.
Barcode numbers are also used to name the digital images of the
specimens, and these files would need to be renamed.
(None of these should prove to be a problem if the GUID was a
combination - i.e. took a prefix to indicate the institution and
collection and then continued with the existing numbers. This would
mean that the GUID could be generated if all you knew was the barcode
and the rule for generating a GUID from a given barcode.)
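
The 'combination' idea could look something like the sketch below,
where the existing number is kept intact inside a prefix for
institution and collection. The colon-separated layout and the
function names are assumptions of mine, not a proposed standard.

```python
def wrap_identifier(institution: str, collection: str, local_id: str) -> str:
    """Build a combined identifier around an existing local number."""
    for part in (institution, collection, local_id):
        if not part or ":" in part:
            raise ValueError(f"empty part or part containing ':': {part!r}")
    return f"{institution}:{collection}:{local_id}"


def unwrap_identifier(guid: str) -> tuple[str, str, str]:
    """Recover (institution, collection, local id) from the combined form."""
    institution, collection, local_id = guid.split(":")
    return institution, collection, local_id
```

Because the mapping is purely mechanical in both directions, existing
labels, image file names and publications that cite the local number
would stay valid.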

MSB:
There would be opposition from users who have used the Serial Number
system for many years. There would be lots of technical issues with
tracking existing collections and rewriting code.
(As above, as long as the Serial Number continued within a GUID
'wrapper', a lot of the objections cited here could be avoided.)


6. In the case of subsamples from a specimen, can you identify issues
around associating the sample and associated information with the
source specimen and associated information?

HerbCat: If this is about specimens with multiple parts I have come
to the conclusion that these should be linked, but also dealt with as
separate entities, each with their own identifier and associated
data. Any attempt to define a subset of common data between the
various parts seems destined to fail at the first arising exception.
The issue of updating data of related parts needs to be addressed as
well (i.e. what happens when a new determination is done on one of
the parts). At the moment this is left entirely to the curators, who
manually notify the curators of related parts. An automated
notification that gives the option to accept or reject changes is one
of the requirements for the Herbarium Catalogue, but has not been
implemented yet.
There is also a difficulty in identifying parts during the curation
and digitisation of historical collections, and users often need to
detach parts that had initially been considered part of the main
specimen.

(This is also complicated by some of the specimens in the Herbarium
being vouchers for specimens in other collections, i.e. the Living
Collections, DNA material, Seed Bank, plus of course duplicates in
other herbaria.) Making sure changes in names and identification
status propagate outwards to the related collections would (to me) be
one of the big pluses in having a global identifier, if it could join
related subsamples (however defined) from the original collection
event.

LCD: Can't think of any sub-samples of accessions.

MSB: Sub-samples: Herbarium Vouchers can be easily traced within SBD
using the Serial Number as can Grow Events. When Grow Events are
entered into the Living Collection Database, the SBD Serial Number is
stored with the record. Regenerations of seed samples can be traced
using a parental Serial Number.
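
Tracing a regenerated sample back through its parental Serial Numbers
amounts to following a chain of parent links. A minimal sketch,
assuming the links are available as a simple mapping (the mapping, the
function name and the serial numbers used with it are hypothetical):

```python
from typing import Optional


def trace_lineage(serial: str, parent_of: dict[str, Optional[str]]) -> list[str]:
    """Return the sample followed by its ancestors, oldest last.

    `parent_of` maps a serial number to its parental serial number,
    or to None for a sample that was collected rather than regenerated.
    """
    lineage = [serial]
    seen = {serial}
    current = parent_of.get(serial)
    while current is not None:
        if current in seen:  # guard against bad data forming a loop
            raise ValueError(f"cycle in parental links at {current!r}")
        lineage.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return lineage
```

With `parent_of = {"300": "200", "200": "100", "100": None}`, tracing
"300" walks back through "200" to the original sample "100".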


Follow up questions:

1. How are your specimens organised into larger identifiable sets
(collections, named collections databases, institutions, etc.)?

Herbcat: Besides the main Herbarium Collection we have 2 major named
collections (Spirit and Mycology collection). These are kept
separately from the main Herbarium collections. The spirit collection
is stored in a separate room / cupboards and has been incorporated
into the Herbarium Catalogue, where it is identified by the
preparation code and collection name fields.
The mycology collection is kept and databased separately for now.
There are then other minor named collections (e.g. the Wallace
collection, the Palms collection, the Orchid Herbarium, and so on).
Most of these are kept in separate parts of the Herbarium and still
stored in various datasets, some of which have been incorporated into
the Herbarium Catalogue with the main herbarium collection.
The main herbarium collection has been databased in various datasets
in the past. Some of these have been incorporated into the Herbarium
Catalogue, while new digitisers are currently databasing directly in
HerbCat.
We do record details of duplicates / other specimens in other
institutions. For these we record the owning herbarium acronym and,
if available on the specimen, the specimen identifier as well.

LCD: Only the collection - groups of accessions based on how they
were acquired. Used only internally and more as a way of normalising
information (e.g. lats and longs at the collection site) than as any
fixed concept.

2. What identifiers get assigned to each of these sets in your
organisation or domain?

Apart from 'K' - the IH acronym for Kew as a whole - there are no
internal collection identifiers within Kew, beyond the individual
names of the databases which hold the various collections (e.g.
HerbCat, HerbTrack (mycology), LCD (Living collections), MSB
(Seedbank) and so on), which are only really visible internally.

*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk



