Re: Topic 2: GUIDs for Collections and Specimens
I am responding to Donald's questions as they apply at Missouri Botanical Garden.
As several have described, there are multiple layers of identification that occur with specimens, particularly botanical specimens.
Our physical herbarium specimens are structured in a hierarchy, starting from the original plant that was collected down to individual pieces with labels.
COLLECTION
Identification begins at collection. Multiple "samples" are usually taken from one plant or an entire small plant may be taken, a collector's number is assigned to the sample in the collector's field book along with notes and samples also numbered. Samples of other plants of the same kind may also be taken with different numbers assigned to each in the field book and on the sample. Samples may be made up of multiple pieces - leaves and stems, fruits, seeds, bark, etc. - some may be dried, others left wet. All of the pieces/samples of the one plant described in one numbered field book entry belong to the one organism noted by the collector.
PREPARATION
The pieces of dried or wet samples are shipped back to MBG with their identifying numbers. Nowadays, the information from the field book is recorded in Tropicos including the collector's number. A unique TropicosID number is assigned in database to the specimen or "sample" and the data from the field book is recorded including the collector's name and number. Accession numbers are assigned to each of the pieces of the sample that will be "mounted" in a different way. A mounting sheet has the accession number pre-printed on the sheet and the number applies to whatever is mounted on the sheet. But, a separate large fruit from the same plant would be put in a bag for instance and assigned a different accession number. Nowadays, these accession numbers are also recorded in Tropicos. A label is printed for the sheet and duplicate labels are printed for each of the related "accessions". They are all the same label with the TropicosID and collector's number on them.
DUPLICATES
Labels are also printed for the "duplicate" samples but no accession numbers are assigned to them and they are not mounted. The duplicates may be sent unmounted to specialists for determination or to other herbaria. The identification of these samples/specimens is what is printed on the included label - which includes Tropicos ID, Collector's Name and Collector's Number. The receiving institution may or may not assign additional numbers, mount the sample on a sheet, database it, etc. Totally up to them.
MOUNTING
The flat pieces are mounted on the sheets, large samples may require multiple sheets for one copy. Large things (fruits, bark, branches) may be put into bags or other holding methods. A barcode number is attached to the sheet and any additional pieces/accessions and recorded in Tropicos. A different barcode is on each piece or accession. So, barcodes have a one-to-one match to accession numbers. The duplicate printed labels are also attached to the sheet and any related pieces/accessions. If an attached barcode comes off and is lost, a new, replacement barcode is attached and updated in Tropicos.
The use of Lead Collector's Last Name and field book (also called catalog) number is very common in botany - eg. CROAT 10100. The collector-number method is frequently used in reference literature plus the addition of the Index Herbariorium code for the institution where the specimen was seen or gotten from. Duplicates of CROAT 10100 could be at MO, K, P, F, etc. and those sheets may have different accession numbers or no accession number at all.
Donald's Questions:
1. What identifiers (how many per specimen) get assigned to specimens in your organisation or domain (field numbers, catalogue numbers, etc.)?
On one mounted specimen sheet at MBG are the following numbers/identifiers:
- Accession number (100% unique)
- Barcode number (100% unique)
- Tropicos ID (applies to all accessions and barcodes for one sample/specimen)
- Collector's name and number (applies to all accessions, barcodes, TropicosIDs, and duplicate samples/labels sent to other institutions from the original collected organism)
All of these numbers are recorded in the Tropicos database.
2. What is the scope of uniqueness for each of these identifiers (notebook page, collector, database, institution, global, etc.)?
I attempted to describe this above.
Collector's numbers are commonly unique to a collector and don't repeat across notebooks, but the numbers are not unique themselves and are only unique when combined with Collector's name
Accession numbers and barcodes are unique to the sheet/bag they are attached to and are one-to-one with each other and are unique within the institution
TropicosID is unique within the database and the institution and is supposed to be one-to-one with collector/collector number.
Lead collector last name plus number is unique within the database and within the institution but not unique globally.
3. Can you explain the life cycle of each of these identifiers (who assigns them, how they are subsequently tracked)?
Described at the beginning.
4. Can you give examples of how these identifiers are used to retrieve the specimen and/or information on the specimen?
The primary search for specimens in Tropicos is by collector name and number.
5. Would there be any social or technical roadblocks to replacing these identifiers with a single identifier that was guaranteed to be unique?
Technically, it would require addition of an "alias" identifier and additional programming to enable searching on the alias.
Since there are 4 identifiers in hierarchical relationship, which of them could be the "single" identifier? This goes to my continuing question of "what are we trying to identify"? The original specimen (and its duplicates), a specific sheet, a specific part of a sheet, or part of a specimen in an alcohol bottle separate from the sheet?
6. In the case of subsamples from a specimen, can you identify issues around associating the sample and associated information with the source specimen and associated information?
By subsample, are we referring to the occurrence of "duplicates" of the original organism or rather to the pieces of it, like bark, fruit, leaves? What constitutes the "specimen" versus the sample? We really need to sharpen the language in these discussions to eliminate the round-robin responses that occur as everyone states their opinion of what they think the terms mean but no one decides exactly the definition to be used by everyone.
The biggest issue to me is that there are no standards for identification of anything below the level of the original collecting event and even the collector name + number is just a common practice in botany, not a "standard" and not universal by any means. The term "accession" means different things to different institutions. Accession number at MBG refers to an associated part of a specimen, not the whole specimen. Does catalog number mean the same thing everywhere? To some it means the collector's number.
I suppose another issue is that because of the common practice in botany of collecting duplicate samples and sending them around to other institutions, any worldwide count of databased specimens that does not account for these duplicates will overstate the real number.
The subject of specimen identifiers is somewhat linked to that of collection identifiers, since Darwin Core and the ABCD Schema have used institution and collection codes together with catalogue numbers to identify specimens in the absence of GUIDs. It would also be useful here to collect information on the following:
7. How are your specimens organised into larger identifiable sets (collections, named collections, databases, institutions, etc.)?
We don't separate our collections into sets, they are all part of one herbarium collection.
Accessions combine into one specimen.
Duplicate specimens can be at other institutions.
We do record the institutions where we know duplicates of a specimen are located but we do not record the other institution's catalog numbers
8. What identifiers get assigned to each of these sets in your organization or domain (institution codes, collection codes, Index Herbarium acronyms, etc.)? 9. Can you explain the life cycle of each of these identifiers (who assigns them, how they are subsequently tracked)? 10. Can you give examples of how these identifiers are used to locate the set and/or information on the set? 11. Would there be any social or technical roadblocks to replacing these identifiers with a single identifier that was guaranteed to be unique?
Previously discussed.
participants (1)
-
Chuck Miller