John Wieczorek has made an additional proposal
(http://lists.tdwg.org/pipermail/tdwg-content/2011-July/002574.html) to
resolve the issue of "Evidence" by creating a class called
"CollectionObject". I believe that his intended meaning for the term is
exactly what I have intended in the past when I used the term "token".
I believe that the proposal for the CollectionObject class is
intimately related to the question of the definition of
Individual/BiologicalEntity, because I think that the competency
questions Rich wants to address through the Individual/BiologicalEntity
class are a subset of what I would consider to be the competency
questions for the CollectionObject class.
Because of TDWG's historical roots in the collections community,
"occurrences" have been subconsciously or even explicitly linked with
the evidence that documents them. For example, PreservedSpecimens have
been considered a subclass of Occurrence in the Darwin Core type
vocabulary. But in the lengthy tdwg-content discussion of 2009-10
(summarized at
http://code.google.com/p/darwin-sw/wiki/TdwgContentEmailSummary), there
seemed to be a consensus that an Occurrence is a record of an
Individual at a particular Time and Location. That Occurrence is an
entity independent of the evidence that serves to document it, which
could include one (or perhaps zero) to many PreservedSpecimens, Images,
text files recording MachineObservations, etc. Separating the
Occurrence from its documenting evidence makes it easier to be explicit
about the number and types of evidence that document the Occurrence.
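A minimal Python sketch of that separation, assuming nothing about any actual Darwin Core or DSW serialization: the class names, field names, and all values below are hypothetical. The Occurrence records an Individual at a time and place, and its documenting evidence is a separate list of any length (including zero), so counting evidence by type is trivial.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Evidence:
    # Hypothetical stand-in for one documenting item
    # (a PreservedSpecimen, an Image, a MachineObservation file, ...).
    identifier: str
    evidence_type: str


@dataclass
class Occurrence:
    # An Individual at a particular Time and Location. The Occurrence
    # refers to its evidence but is not itself the evidence.
    individual: str
    event_date: str
    locality: str
    evidence: List[Evidence] = field(default_factory=list)

    def evidence_count_by_type(self) -> Dict[str, int]:
        # Explicit about the number and types of documenting evidence.
        counts: Dict[str, int] = {}
        for item in self.evidence:
            counts[item.evidence_type] = counts.get(item.evidence_type, 0) + 1
        return counts


occ = Occurrence("individual-1", "2011-07-15", "Locality X")
occ.evidence.append(Evidence("image-1", "StillImage"))
occ.evidence.append(Evidence("specimen-1", "PreservedSpecimen"))
occ.evidence.append(Evidence("specimen-2", "PreservedSpecimen"))
print(occ.evidence_count_by_type())  # {'StillImage': 1, 'PreservedSpecimen': 2}
```

An Occurrence with no evidence attached is still a valid object here, which mirrors the "(or perhaps zero)" case.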
So I would define competency question #1 for the proposed
CollectionObject class as the ability to:
1. document an Occurrence (which I will refer to as the "Occurrence"
use of evidence).
However, I would also assert that CollectionObject need not be
restricted to documenting an Occurrence, but that it may also:
2. document the existence of an Individual/BiologicalEntity, aggregates
that include it, and pieces that came from it (which I will refer to as
the "Provenance" use of evidence)
and
3. support an Identification to a particular Taxon (the
"Identification" use of evidence).
[note: see http://code.google.com/p/darwin-sw/wiki/ClassToken for
further discussion of this]
If the CollectionObject is a whole dead organism serving as a museum
specimen, then it might simultaneously address all three of these
questions. The time and place listed on the specimen label provide the
information about the Occurrence, the dead organism serves as proof of
its own existence, and determinations typed or written on the label are
vouched for by the presence of the dead organism. In this case the
need to separate these three uses of CollectionObject may not be
obvious.
But consider a more complex situation of the sort that BiSciCol would
like to be able to handle. A wildebeest calf is digitally photographed
in South Africa as it is being captured. The calf is shipped to a zoo
in England, where a blood sample is taken. Part of this blood sample
is stored in liquid nitrogen, and part is used for a DNA extraction.
The DNA is sequenced and used in a molecular phylogeny project. Here
we have five pieces of evidence (each of which could be assigned a GUID):
the digital StillImage, the calf itself in the zoo (a LivingSpecimen),
the blood sample, the DNA sample, and the DNA sequence in digital text
form. All five of these pieces of evidence could potentially reside as
CollectionObjects in different physical or electronic repositories.
Competency question 1 (Occurrence)
StillImage: timestamp and embedded GPS metadata associated with the
image document the time and place where the calf was located at the
time of capture.
LivingSpecimen: the collection record that the zoo keeps for the calf
documents the time and place where the calf was located at the time of
capture.
(the other three pieces of evidence do not provide information about
any time and place where the calf was located)
Competency question 2 (Provenance)
StillImage: a foaf:depiction of the calf (Individual/BiologicalEntity)
LivingSpecimen: owl:sameAs the calf (Individual/BiologicalEntity)
blood sample: dcterms:isPartOf the calf (Individual/BiologicalEntity)
DNA sample: dcterms:isPartOf the blood sample which dcterms:isPartOf
the calf (Individual/BiologicalEntity)
DNA sequence: [sequencedFrom] the DNA sample which dcterms:isPartOf the
blood sample which dcterms:isPartOf the calf
(Individual/BiologicalEntity)
(all five pieces of evidence support the existence of the calf, and the
provenance of each piece of evidence can be traced back to the calf)
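The provenance chains above can be followed mechanically. The Python sketch below stores each evidence item's link to the thing it came from and walks the links back to the calf. The local identifiers are hypothetical; "sequencedFrom" stays a placeholder exactly as in the bracketed term above, and the image link is stored in the image-to-calf direction as foaf:depicts (the inverse of foaf:depiction).

```python
from typing import Dict, List, Tuple

# evidence item -> (relation, thing it derives from); identifiers are hypothetical
PROVENANCE: Dict[str, Tuple[str, str]] = {
    "still_image": ("foaf:depicts", "calf"),  # inverse of foaf:depiction
    "living_specimen": ("owl:sameAs", "calf"),
    "blood_sample": ("dcterms:isPartOf", "calf"),
    "dna_sample": ("dcterms:isPartOf", "blood_sample"),
    "dna_sequence": ("sequencedFrom", "dna_sample"),  # placeholder, not a real term
}


def trace(item: str, links: Dict[str, Tuple[str, str]] = PROVENANCE) -> List[str]:
    """Follow provenance links from an item until reaching a thing with no outgoing link."""
    path = [item]
    seen = {item}
    while path[-1] in links:
        _, parent = links[path[-1]]
        if parent in seen:
            raise ValueError(f"provenance cycle at {parent}")
        path.append(parent)
        seen.add(parent)
    return path


for item in PROVENANCE:
    print(" -> ".join(trace(item)))
```

Every chain ends at the calf, which is the sense in which all five pieces of evidence support its existence.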
Competency question 3 (Identification)
As part of the phylogenetic analysis, the DNA sequence could serve as
evidence for assigning the calf to a particular taxon.
A mammal expert in Australia might examine the digital StillImage via
the web and assert that the calf is a wildebeest.
Another mammal expert in England might examine the calf at the zoo and
assert that the calf is a wildebeest (or perhaps look at the calf in
the zoo when it is full grown and also look at the StillImage taken at
the time it was captured and make the assertion based on two forms of
Evidence).
(the DNA sample itself, apart from its sequence, probably wouldn't be
used as evidence for an Identification; the blood sample might be, if
the cytology were distinctive)
-----------
I would assert that the bottom line here is that an entity that falls
within the proposed CollectionObject class would need to address at
least one of these three competency questions. It would not be
necessary for it to address all three. Instances of the CollectionObject
class could be connected through object properties to Occurrences,
Individuals/BiologicalEntities, or Identifications, and they could have
the data properties that we typically assign to collected items, such
as catalogNumber, collectionCode, preparation, etc. John has suggested
moving DwC terms for such properties from under Occurrence to the
proposed CollectionObject class
(http://lists.tdwg.org/pipermail/tdwg-content/2011-July/002574.html).
As John has noted, this is a significant change, but I believe that it
is an important one if one is to accept the distinction between
Occurrences and the evidence that documents them, a distinction that is
imperative in more complex cases such as the one I outlined above.
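The "at least one of three" criterion can be written down as a check. In this Python sketch the uses assigned to each piece of wildebeest evidence follow the three competency-question sections above; the identifiers and function names are hypothetical, and the check is only the necessary condition stated here, not a full definition of the class.

```python
from enum import Enum
from typing import Dict, Set


class Use(Enum):
    OCCURRENCE = "Occurrence"
    PROVENANCE = "Provenance"
    IDENTIFICATION = "Identification"


# Uses of each piece of evidence in the wildebeest example (hypothetical identifiers).
USES: Dict[str, Set[Use]] = {
    "still_image": {Use.OCCURRENCE, Use.PROVENANCE, Use.IDENTIFICATION},
    "living_specimen": {Use.OCCURRENCE, Use.PROVENANCE, Use.IDENTIFICATION},
    "blood_sample": {Use.PROVENANCE},  # Identification only if cytology were distinctive
    "dna_sample": {Use.PROVENANCE},
    "dna_sequence": {Use.PROVENANCE, Use.IDENTIFICATION},
}


def meets_minimum_criterion(uses: Set[Use]) -> bool:
    # Necessary (not sufficient) condition: at least one of the three uses.
    # Addressing all three is not required.
    return len(uses) >= 1


def evidence_for(use: Use) -> Set[str]:
    # Which pieces of evidence address a given competency question.
    return {name for name, uses in USES.items() if use in uses}


print(all(meets_minimum_criterion(u) for u in USES.values()))
print(sorted(evidence_for(Use.OCCURRENCE)))
```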
----------
I want to end this email by returning to Rich's outlook on the
"Individual" issue. In his various posts, it seems to me that much of
what he wants to accomplish through the "Individual" class falls within
what I've defined here as competency question 2 (tracking provenance of
resources, in particular physical things that are, include, or are
taken from living organisms). It seems like the CollectionObject class
is fully capable of doing much of what he wants to accomplish. Just
get rid of the term "Individual"; as Paul Murray has noted
(http://lists.tdwg.org/pipermail/tdwg-content/2011-July/002615.html), it
already has a different meaning. Use "CollectionObject" to address
Rich's provenance competency questions (tracking and connecting
collected objects) and "BiologicalEntity" to address mine (joining
zero-to-many Occurrences to zero-to-many forms of evidence to
zero-to-many Identifications). So then what IS an actual individual
organism like the wildebeest calf? It is a BiologicalEntity if it has
been documented as an Occurrence or assigned an Identification. It is
a CollectionObject if it was collected for a zoo, or shot and mounted
in a museum. Or it can be both simultaneously if it is both documented
and collected. If none of these things were done, then it's neither a
BiologicalEntity nor a CollectionObject - it's simply a wildebeest
calf. Define the class/type of the thing by the properties that you
wish to assert for it (or the competency questions that you can answer
for it).
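"Define the class by the properties you assert" works much like rdfs:domain inference, where asserting a property implies a type for its subject. A minimal Python sketch under that assumption: catalogNumber and collectionCode are the DwC terms proposed to move to CollectionObject, while "hasOccurrence" and "hasIdentification" are made-up names standing in for the Occurrence and Identification links.

```python
from typing import Dict, Set

# Property -> class that asserting it implies (in the spirit of rdfs:domain).
# "hasOccurrence" and "hasIdentification" are hypothetical property names.
DOMAIN: Dict[str, str] = {
    "hasOccurrence": "BiologicalEntity",
    "hasIdentification": "BiologicalEntity",
    "catalogNumber": "CollectionObject",
    "collectionCode": "CollectionObject",
}


def infer_types(asserted: Set[str]) -> Set[str]:
    # A thing gets a type only from the properties asserted for it;
    # with no relevant assertions it is simply a wildebeest calf.
    return {DOMAIN[p] for p in asserted if p in DOMAIN}


print(infer_types(set()))
print(sorted(infer_types({"hasOccurrence", "catalogNumber"})))
```

A calf that was both documented and collected picks up both types, and one that was neither gets none.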
With regard to the issue of taxonomically heterogeneous entities:
tracking the provenance of taxonomically heterogeneous
CollectionObjects and CollectionObjects that are pieces of organisms is
not really a big deal. A fish fin isPartOf an individual fish, which
isPartOf a jar of mixed fish, which isPartOf a marine trawl. However,
tracking and reconciling Identifications of taxonomically heterogeneous
collections of things and their subsamples (the second part of what Rich
wanted to accomplish) is a more complex task that I cannot at this point
wrap my head around (see the "Additional Comments" at
http://code.google.com/p/darwin-sw/wiki/TaxonomicHeterogeneity for more
commentary on this).
Steve
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu