Thanks, Chuck, for this detailed
response. You are quite right that we need to be clear what we mean by
“specimen”. Your clarification of MOBOT’s use of
identifiers shows not only that there are many identifiers in use, but also
that they may apply to any of a series of increasingly refined objects (or sets of
objects), and that there are good reasons for wanting to be able to identify
each item in that series. If we think of this in software modeling terms,
each of these could be a separate object which could be manipulated and
referenced independently of the others.
Different communities within biological
collections will clearly have different series of identifiable objects.
For example, an entomological collection could have the following series:
(Survey?) -> Contents of an
(malaise/light/water/etc.) trap -> Individual insect -> Insect part
(genitalia preparation, leg removed for DNA analysis) -> (DNA preparation?)
Handling of plankton samples, culture
collections and seedbank accessions will be different again. Within
botanical collections, is there any attempt to indicate that two separate
collecting events relate to the same plant or clonal population?
Depending on the needs and purpose of an
individual collection, it may track different items in these series.
Individual insects may be part of a numbered series or have their own numbers.
As Chuck suggests, this means that it is
not clear that we have a single common definition of “specimen”
that would be accepted by all of us. My use of the word
“subsample” and the phrase “identifiable set” in my
original question was an attempt to recognise that one group’s specimen
may be seen by another group as just a part of a specimen or as a set of
specimens. The ABCD Schema uses the general term Unit to reflect the
variation between different items recorded by different providers.
It seems to me that there are various ways
that we can try to handle this:
As we consider the use of GUIDs, I would
really also like us to think about the fourth of these options. Any “Unit”
(or whatever else we may use as a generic term for a biological item being
recorded) can be identified as belonging to a particular class of objects identified
within a shared ontology. We can do this by having an element whose value
must be the identifier for an object class registered in the ontology. This
allows an institution to make an assertion that one record relates to an
individual dead organism and that another relates to a tissue sample, and for
those assertions to be ones that software applications can process. Better
still, the presence of GUIDs for each of these records would allow us to add an
extra element to the tissue sample record that securely identifies the specimen
from which it was taken.
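To make this concrete, here is a minimal sketch of how such records might look. Everything in it is an assumption made for illustration: the `urn:example:` identifiers, the two object classes, and the field names `unit_class` and `derived_from` are hypothetical and are not taken from ABCD, Darwin Core or any registered ontology.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical shared ontology: class identifier -> human-readable meaning.
ONTOLOGY = {
    "urn:example:class:WholeOrganism": "Individual dead organism",
    "urn:example:class:TissueSample": "Tissue sample taken from an organism",
}


@dataclass(frozen=True)
class Unit:
    guid: str                            # globally unique identifier for this record
    unit_class: str                      # must be a class registered in the ontology
    derived_from: Optional[str] = None   # GUID of the unit this one was taken from

    def __post_init__(self) -> None:
        # Reject assertions that software could not interpret.
        if self.unit_class not in ONTOLOGY:
            raise ValueError(f"unregistered object class: {self.unit_class}")


def source_chain(guid: str, units: Dict[str, Unit]) -> List[Unit]:
    """Follow derived_from links from a unit back towards its original source."""
    chain: List[Unit] = []
    seen = set()
    current: Optional[str] = guid
    # Stop at the root, at an unknown GUID, or if a cycle is detected.
    while current is not None and current in units and current not in seen:
        seen.add(current)
        unit = units[current]
        chain.append(unit)
        current = unit.derived_from
    return chain


specimen = Unit("urn:example:unit:0001", "urn:example:class:WholeOrganism")
tissue = Unit(
    "urn:example:unit:0002",
    "urn:example:class:TissueSample",
    derived_from=specimen.guid,
)
units = {u.guid: u for u in (specimen, tissue)}
chain = source_chain(tissue.guid, units)
```

Here `chain` holds the tissue sample followed by the specimen it came from, so an application can work out both what kind of thing each record describes and how the records relate, without needing a single agreed meaning for “specimen”.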
The bottom line here is that we certainly
need to do some work to make sure that we know what we are talking about when
we speak of a “specimen” (or any other similar term), but that we
can use a combination of GUIDs and a shared ontology to transcend the
difficulties this could present, and to construct subtle and informative webs
of information.
Donald
---------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100
Tel: +45-35321483
---------------------------------------------------------------
From:
Sent: 22 October 2005 00:40
To:
Subject: Re: Topic 2: GUIDs for
Collections and Specimens
I am responding to Donald’s
questions as they apply at
As several have described, there are
multiple layers of identification that occur with specimens, particularly
botanical specimens.
Our physical herbarium specimens are
structured in a hierarchy, starting from the original plant that was collected
down to individual pieces with labels.
COLLECTION
Identification begins at collection.
Multiple “samples” are usually taken from one plant, or an entire
small plant may be taken. A collector’s number is assigned to the
sample in the collector’s field book along with notes, and the samples are also
numbered. Samples of other plants of the same kind may also be taken with
different numbers assigned to each in the field book and on the sample.
Samples may be made up of multiple pieces - leaves and stems, fruits,
seeds, bark, etc. – some may be dried, others left wet. All of the
pieces/samples of the one plant described in one numbered field book entry
belong to the one organism noted by the collector.
PREPARATION
The pieces of dried or wet samples are
shipped back to MBG with their identifying numbers. Nowadays, the
information from the field book is recorded in Tropicos including the
collector’s number. A unique TropicosID number is assigned
in the database to the specimen or “sample” and the data from the field
book is recorded including the collector’s name and number. Accession
numbers are assigned to each of the pieces of the sample that will be
“mounted” in a different way. A mounting sheet has the
accession number pre-printed on the sheet and the number applies to whatever is
mounted on the sheet. But a separate large fruit from the same plant
would, for instance, be put in a bag and assigned a different accession number.
Nowadays, these accession numbers are also recorded in Tropicos. A
label is printed for the sheet and duplicate labels are printed for each of the
related “accessions”. They are all the same label with the
TropicosID and collector’s number on them.
DUPLICATES
Labels are also printed for the
“duplicate” samples but no accession numbers are assigned to them
and they are not mounted. The duplicates may be sent unmounted to
specialists for determination or to other herbaria. The identification of these
samples/specimens is what is printed on the included label – which
includes Tropicos ID, Collector’s Name and Collector’s Number.
The receiving institution may or may not assign additional numbers, mount
the sample on a sheet, database it, etc. Totally up to them.
MOUNTING
The flat pieces are mounted on the sheets;
large samples may require multiple sheets for one copy. Large things (fruits,
bark, branches) may be put into bags or other holding methods. A barcode
number is attached to the sheet and any additional pieces/accessions and
recorded in Tropicos. A different barcode is on each piece or accession.
So, barcodes have a one-to-one match to accession numbers. The duplicate
printed labels are also attached to the sheet and any related
pieces/accessions. If an attached barcode comes off and is lost, a new,
replacement barcode is attached and updated in Tropicos.
The use of Lead Collector’s Last
Name and field book (also called catalog) number is very common in botany
– e.g. CROAT 10100. The collector-number method is frequently used
in reference literature, with the addition of the Index Herbariorum code for
the institution where the specimen was seen or obtained. Duplicates of
CROAT 10100 could be at MO, K, P, F, etc. and those sheets may have different
accession numbers or no accession number at all.
Donald’s Questions:
On one mounted specimen sheet at MBG are the following numbers/identifiers:
- Accession number (100% unique)
- Barcode number (100% unique)
- Tropicos ID (applies to all accessions and barcodes for one sample/specimen)
- Collector’s name and number (applies to all accessions, barcodes,
  TropicosIDs, and duplicate samples/labels sent to other institutions from the
  original collected organism)
All of these numbers are recorded in the Tropicos database.
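The relationship between these four identifiers can be sketched as a small data structure. This is only an illustration of the hierarchy described here, not the actual Tropicos schema: the class and field names are hypothetical, and the TropicosID, accession numbers and barcodes in the example are made-up values (only CROAT 10100 comes from this message).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Accession:
    accession_number: str  # pre-printed on the sheet, or assigned to a bag
    barcode: str           # one-to-one with the accession; may be replaced if lost


@dataclass
class Specimen:
    tropicos_id: int       # one per collected sample/specimen in the database
    collector: str         # lead collector's last name
    collector_number: str  # number from the collector's field book
    accessions: List[Accession] = field(default_factory=list)

    def collector_key(self) -> Tuple[str, str]:
        # The number alone is not unique; it needs the collector's name.
        return (self.collector.upper(), self.collector_number)


def replace_barcode(specimen: Specimen, accession_number: str, new_barcode: str) -> Accession:
    """Attach a replacement barcode to an accession whose barcode was lost."""
    for accession in specimen.accessions:
        if accession.accession_number == accession_number:
            accession.barcode = new_barcode
            return accession
    raise KeyError(accession_number)


# Made-up identifier values for one specimen with a sheet and a bagged fruit.
croat_10100 = Specimen(
    tropicos_id=1,
    collector="Croat",
    collector_number="10100",
    accessions=[Accession("A-0001", "BC-0001"), Accession("A-0002", "BC-0002")],
)
replace_barcode(croat_10100, "A-0002", "BC-0099")
```

The point of the sketch is that the barcode can change while the accession number, TropicosID and collector key stay fixed, which matters when choosing which of them a GUID should be tied to.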
I attempted to describe this above.
Collector’s numbers are commonly unique to a collector and don’t repeat across notebooks,
but the numbers are not unique in themselves and are only unique when combined
with the collector’s name.
Accession numbers and barcodes are unique to the sheet/bag they are attached to, are
one-to-one with each other, and are unique within the institution.
TropicosID is unique within the database and the institution and is supposed to be
one-to-one with collector/collector number.
Lead collector last name plus number is unique within the database and within the
institution but not unique globally.
Described at the beginning.
The primary search for specimens in Tropicos is by collector name and number.
Technically, it would require addition of an “alias” identifier and additional
programming to enable searching on the alias.
Since there are 4 identifiers in hierarchical relationship, which of them could be
the “single” identifier? This goes to my continuing question
of “what are we trying to identify”? The original specimen
(and its duplicates), a specific sheet, a specific part of a sheet, or part of
a specimen in an alcohol bottle separate from the sheet?
By subsample, are we referring to the occurrence of “duplicates” of
the original organism or rather to the pieces of it, like bark, fruit, leaves?
What constitutes the “specimen” versus the sample? We
really need to sharpen the language in these discussions to eliminate the
round-robin responses that occur as everyone states their opinion of what they
think the terms mean but no one decides exactly the definition to be used by
everyone.
The biggest issue to me is that there are no standards for identification of
anything below the level of the original collecting event and even the
collector name + number is just a common practice in botany, not a “standard”
and not universal by any means. The term “accession” means
different things to different institutions. Accession number at MBG
refers to an associated part of a specimen, not the whole specimen. Does
catalog number mean the same thing everywhere? To some it means the
collector’s number.
I suppose another issue is that because of the common practice in botany of
collecting duplicate samples and sending them around to other institutions, any
worldwide count of databased specimens that does not account for these
duplicates will overstate the real number.
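As a rough illustration of the overcount, the sketch below groups records from several herbaria by collector name plus number. The records are invented for the example (reusing CROAT 10100 and the herbarium codes MO, K and P mentioned earlier), the field names are hypothetical, and matching on name plus number is only a heuristic, since that pairing is common practice rather than a standard.

```python
from typing import Dict, List, Tuple

# Invented records, as they might be pooled from three institutional databases.
records: List[Dict[str, str]] = [
    {"institution": "MO", "collector": "Croat", "number": "10100"},
    {"institution": "K", "collector": "CROAT", "number": "10100"},
    {"institution": "P", "collector": "Croat ", "number": "10100"},
    {"institution": "MO", "collector": "Croat", "number": "10101"},
]


def gathering_key(record: Dict[str, str]) -> Tuple[str, str]:
    """Normalise collector name and number so duplicates fall under one key."""
    return (record["collector"].strip().upper(), record["number"].strip())


def count_gatherings(pooled: List[Dict[str, str]]) -> int:
    """Count distinct collector/number pairs rather than databased records."""
    return len({gathering_key(r) for r in pooled})


naive_count = len(records)                  # every databased sheet counted once
deduplicated_count = count_gatherings(records)
```

With these four records the naive count is 4, while only 2 distinct collections are represented. Real data would need far more careful name matching than the simple normalisation used here.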
The subject of specimen identifiers is somewhat linked to
that of collection identifiers, since Darwin Core and the ABCD Schema have used
institution and collection codes together with catalogue numbers to identify
specimens in the absence of GUIDs. It would also be useful here to
collect information on the following:
We don’t separate our collections into sets; they are all part of one
herbarium collection.
Accessions combine into one specimen.
Duplicate specimens can be at other institutions.
We do record the institutions where we know duplicates of a specimen are located but we do
not record the other institutions’ catalog numbers.
Previously discussed.