New subject: Background for the Individual class proposal. 3. Should an Individual also be a Collecting Unit?

13 Nov 2010

In the first and second installments of this series, I have tried to show 
that the class Individual as I have proposed it is a central part of a 
fully denormalized Darwin Core model.  Its connective role allows for 
one-to-many relationships between itself and both the Occurrence and 
Identification classes (see 
http://bioimages.vanderbilt.edu/pages/full-model.jpg).  I have also 
pointed out that in that role, it has very few properties.  The reason 
for this is described in detail on p.26 of my Biodiversity Informatics 
paper (https://journals.ku.edu/index.php/jbi/article/view/3664), but in 
summary the only way we can actually find out anything about an 
individual organism is through some kind of observation or collection, 
which is exactly what happens in an Occurrence.  Thus things that we 
"know" about Individuals generally are directly or indirectly associated 
with Occurrences, not with the instances of Individual themselves.
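
As a rough illustration of that connective role, here is a minimal Python sketch.  The class layout, the field names (individual_id, recorded_by, scientific_name, and so on), and the sample values are all hypothetical and are not Darwin Core terms; the only point is that the Individual carries little more than an identifier, while what we know hangs off its Occurrences and Identifications.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Occurrence:
    # Hypothetical fields: what we "know" about an organism is recorded here.
    occurrence_id: str
    recorded_by: str
    event_date: str


@dataclass
class Identification:
    # Hypothetical fields for a determination applied to the Individual.
    identified_by: str
    scientific_name: str


@dataclass
class Individual:
    # The node itself has very few properties: an identifier plus the
    # one-to-many links to Occurrences and Identifications.
    individual_id: str
    occurrences: List[Occurrence] = field(default_factory=list)
    identifications: List[Identification] = field(default_factory=list)


# Made-up example data: one Individual documented twice and determined once.
tree = Individual("ind-1")
tree.occurrences.append(Occurrence("occ-1", "observer A", "2010-05-01"))
tree.occurrences.append(Occurrence("occ-2", "observer B", "2010-10-15"))
tree.identifications.append(Identification("determiner A", "Genus species"))
```

Because both lists live on the Individual, a later Identification automatically applies to every Occurrence grouped under it without touching the Occurrence records.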

Rich has suggested that we should consider whether some properties that 
are currently properties of Occurrence should be moved into the proposed 
Individual class.  It is good to think about this, because we do want to 
have an economy of classes and terms (no point in having two classes for 
something when one would do), and because the mental image that we have 
about an individual organism does include aspects of both the proposed 
Individual class and the part of the ASC diagram called "Collecting 
Unit".  There are a number of ways of approaching this problem.  The 
first approach, which is the way the discussion developed on the email 
list, is to just try moving terms from Occurrence to the proposed 
Individual class and to see whether that would "work" or not.  As the 
discussion progressed, I began to feel increasingly uncomfortable with 
this process, but wasn't sure why.  After I went back to the ASC 
diagram, it became clear to me what the problem was.  I believe that 
the question is really being framed incorrectly.  What I have proposed 
for the class Individual is precisely what I have described in the 
previous posts: for it to serve as a node connecting Occurrences to 
Identifications.  What I think Rich wants to recognize is the section of 
the ASC model called Collecting Unit and the boxes below it: Unsorted 
Lot, Lot, Specimen, and Specimen Component (I'm not sure exactly what 
"Derived Object" is - maybe things like images of specimens?).  If I am 
correct in understanding what Rich wants, then the question boils down 
to: can or should my proposed class be the same as (or possibly include) 
the section on the ASC diagram called Collecting Unit?  I think that I 
have a pretty clear idea in my mind what Individual as I have defined it 
means, so my task has been to try to understand what exactly a 
CollectingUnit is and what properties it should have.  Then I can approach 
the question of congruence with "my" Individual.  If all things that we 
would want to fold within CollectingUnit share properties that can be 
placed within the Individual class, then they are congruent and should 
be the same thing.  If some or most properties that we want to fit 
within CollectingUnit don't fit the defined purpose of the Individual 
class, then they should be two separate classes. 

Because the ASC model was developed by the museum community, I think 
that its creators were primarily concerned with handling dead 
specimens.  However, as Rich has correctly pointed out, the distinction 
between dead and living CollectingUnits is probably artificial.  Rather, 
both living and preserved specimens may be instances of the same class 
which have a different value for some "live/dead" property (see 
http://code.google.com/p/darwincore/issues/detail?id=91).  So for the 
moment, I'm assuming that a CollectingUnit can be either living or 
preserved.  The case of preserved specimens is fairly straightforward.  
They have their origin in a single Occurrence that happens at a single 
Event (what I called a "resource creation event" in my Biodiversity 
Informatics paper).  Living specimens are more complex.  They may 
originate when the whole organism is collected from the wild and moved 
to a zoo or botanical garden (John's wildebeest calf).  In that case 
there is a clear "resource creation event" if we call the living 
specimen a resource that is distinct from the organism when it was in 
the wild.  In some cases, the living specimen is born in captivity, 
grown from a seed, or propagated vegetatively from a cutting.  In that 
case, there is also a definable event when the living specimen 
originated.  What was really driving me crazy was this:
http://bioimages.vanderbilt.edu/vanderbilt/7-314
The Bicentennial oak is a tree that is growing in Vanderbilt's 
arboretum.  It seemed to me that it was a living specimen because it is 
now a part of a collection of trees (the arboretum).  But it is over 230 
years old and Vanderbilt itself is only 137 years old. So clearly nobody 
captured, moved, or planted it to make it a part of the arboretum.  For 
a while I tried to define it out of being a living specimen, but then I 
realized that the thing that made it different from other old trees that 
are standing around Nashville is that it has been accessioned.  In other 
words, when the tree was claimed as a part of the arboretum, assigned an 
identifier (7-314), and added to the arboretum database, it became a 
living specimen in addition to being just a normal tree.  The event of 
calling the tree a part of the arboretum, assigning it an identifier, 
and adding it to the arboretum database is the Occurrence that marks the 
creation of the thing "living specimen".  At that point it can have any 
attribute that other Occurrences have and it is then capable of serving 
as evidence for the Occurrence because anybody can examine it at will.  
The "claimed as a part of the arboretum" part is important, because I 
can go out into the woods and collect information about a tree there, 
assign it an identifier, and add it to my database, but that doesn't 
make it a living specimen because I don't assert that I have any control 
over it or that I can guarantee anyone that I can verify its status at 
will.  If I band a bird and release it, I have assigned it an identifier 
and hopefully will be able to track it over time, but I can't claim it 
is a living specimen because I don't claim to exert control over it.  
That's different from John's wildebeest calf which is in a pen and can be 
observed at will.  It is similar to a maize plant in a field in Iowa 
which was cultivated by a human, but has no curator who is making sure 
that it can be found again and that it won't be harvested and ground up 
into wildebeest food without his or her knowledge. 

If I think about all of the kinds of things that I would like to put 
into the spot on the ASC diagram labeled "Collecting Unit" (including 
things like the Bicentennial Oak that was never "collected" by anybody), 
the one thing that they all seem to have in common is this aspect of 
being "accessioned".  So I would assert that in a general model, 
"AccessionedUnit" would be a better name than "CollectingUnit".  Some of 
the terms that I think should come out of Occurrence (such as 
preparations and disposition) could apply to any AccessionedUnit. 

So that brings me back to the question of whether this thing that I'm 
calling AccessionedUnit (which is sitting in the spot on the ASC diagram 
where Collecting Unit was originally) can or should be considered the 
same as what I have proposed to be the class dwc:Individual.  The 
decision on this should not be made based on what we "think" an 
Individual should be, but rather on what we need it to be to fulfill the 
role that we have assigned it in our model.  With that in mind, it might 
be better for the moment to change the name dwc:Individual to 
dwc:ResamplingUnitHavingDetermination because that is what it needs to 
do according to its current definition and location in the model diagram 
(I'm considering resampling to be the documentation of multiple 
Occurrences).  The question then becomes: should AccessionedUnit be 
considered the same as ResamplingUnitHavingDetermination because they 
share the same properties (i.e. are described by the same terms)?  To me 
the answer is clearly "no".  It is very likely that an AccessionedUnit 
will never be associated with more than one Occurrence (i.e. be 
resampled), particularly if it is dead and has been put in a museum 
collection.  It is possible that the thing referred to by an 
AccessionedUnit might be documented by multiple Occurrences if it is 
alive (like the Bicentennial Oak), but that is not an intrinsic property 
of an AccessionedUnit in the same way that preparations or disposition 
would be.  On the other hand it is also quite clear that many 
"ResamplingUnitHavingDetermination"s will never become accessioned.  
That would include the banded bird, a tree photographed in the forest, 
or a whale observed swimming in the ocean.  The longer I think about 
this, the more convinced I am that making a distinction between 
AccessionedUnit and ResamplingUnitHavingDetermination is the best course 
of action.

Having made a decision about this based on functional need and shared 
properties, it is still helpful for me to try to develop a mental image 
of what these two things are.  In my mind, I imagine the 
ResamplingUnitHavingDetermination (which I will henceforth return to 
calling dwc:Individual) to be an entity having a homogeneous taxonomic 
identity.  It has some moment when it came into existence as a living 
thing (by being born, planted, or founded) although we will never know 
when that moment was unless an Occurrence happens that allows us to 
document that Event.  The Individual remains an entity as long as it has 
the potential to be documented as an Occurrence.  That doesn't 
necessarily mean that it must be alive.  But if it decomposes, or is 
preserved and put into a collection, it no longer is capable of being 
resampled (i.e. documented by an Occurrence).  Thus a fossil that is 
dead for a million years and is sitting in some stratum still fits my 
mental image of an Individual.  If it gets chipped out of the rock and 
put in a museum, there would no longer be any point in documenting 
another Occurrence for it since there would be no useful Location or 
GeologicalContext information to be gained from that.  A roadside 
population of herbaceous plants having homogeneous taxonomic identity 
would be an Individual from the first time it was capable of being 
sampled (when it was founded) and would end being an Individual when it 
was extirpated by some road construction crew and was no longer capable 
of being documented by an Occurrence.  A wolf pack would be a similar case.

My mental image of AccessionedUnit is an entity that comes into 
existence when some human person or institution takes control of it, 
assigns it an identifier, and keeps records of it.  I think I would 
never see it as coming to an end.  Even if it is lost or destroyed, it 
would continue to exist as long as the person or institution maintains 
its record.  It would just have dwc:disposition "lost" or "destroyed".  
It could be a dead, preserved specimen in a jar or glued to a sheet of 
paper, a living wildebeest calf in a zoo, or even a field sampling plot 
in a park as long as the park exerts control and ownership over it and 
maintains records about it.  It could not be any wild, free-ranging 
animal or plant.  It could not be roadkill left on the side of the road 
to decompose.  It could not be a photograph of a wildebeest calf in the 
zoo, or the sound recording of the wildebeest calf's grunt. It COULD be 
a tissue sample from the wildebeest calf or from the roadkill.  The 
critical thing is that it is a physical artifact originating from a 
living thing that has been cataloged and placed under human control.  I 
think this is the kind of thing that Rich wanted to be able to define 
when he wanted to broaden the definition of Individual. 

For any entity having an origin as a living thing (in my mental image), 
its status as an Individual is independent of its status as an 
AccessionedUnit.  If the entity is removed and preserved in its entirety 
(fish killed and put in a jar of formaldehyde), it ceases to exist as a 
dwc:Individual and begins to exist as an AccessionedUnit.  If a branch 
is removed from a tree or one plant pulled from a roadside population to 
become specimens, the removed part becomes an AccessionedUnit while the 
dwc:Individual continues to exist.  In the case of the Bicentennial Oak 
or a permanent sampling plot, the entity simultaneously exists as both 
an AccessionedUnit and a dwc:Individual.  In terms of metadata records, 
the establishment of any AccessionedUnit is an Occurrence (grouped under 
the Individual) having a property of recordedBy.  Whether or not 
subsequent Occurrences are possible for the Individual depends on 
whether the act of creating the AccessionedUnit has rendered subsequent 
sampling irrelevant. 

I agree with the point that was made previously that no specific 
taxonomic level should be placed in the definition of Individual.  That 
would allow for the possibility that Individuals could contain several 
different lower level taxa as long as the Individual is homogeneous at 
the taxonomic level at which the determination is applied.  I am open to 
suggestion for how this could be accomplished.  Somehow there needs to 
be a value for a term like "individualScope" that allows one to make the 
kind of inferences about duplicates that I described previously.  Maybe 
one controlled value for "individualScope" should be "DuplicateLevel" 
meaning that the Individual is homogeneous in taxonomic identity to the 
level at which a taxonomist would collect multiple specimens and call 
them duplicates.  That would get us out of the problem of deciding 
whether the several grass stems we collect and send off to different 
herbaria are actually the same biological individual or clones connected 
by underground stems.  Other possible levels could be 
"BiologicalIndividual" for things known to be single biological 
individuals, and "Heterogeneous" for things that are known or suspected to 
be mixtures of lower level taxa but for which it is convenient to assign 
a determination at a higher taxonomic level at which we know the mixture 
to be homogeneous. 
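
To make the idea of a controlled vocabulary concrete, here is a minimal Python sketch.  The three values are the ones suggested above; the enum itself and the function finer_determination_transfers are hypothetical, and the rule the function encodes is only one possible reading of the duplicate inference, not a settled definition.

```python
from enum import Enum


class IndividualScope(Enum):
    # The three candidate controlled values named in the text.
    BIOLOGICAL_INDIVIDUAL = "BiologicalIndividual"
    DUPLICATE_LEVEL = "DuplicateLevel"
    HETEROGENEOUS = "Heterogeneous"


def finer_determination_transfers(scope: IndividualScope) -> bool:
    """Hypothetical rule: can a new, more specific determination made on
    one specimen be assumed to apply to the other specimens taken from
    the same Individual?

    Assumed to hold when the Individual is homogeneous at least to the
    level at which a taxonomist would call specimens duplicates.
    """
    return scope in (
        IndividualScope.BIOLOGICAL_INDIVIDUAL,
        IndividualScope.DUPLICATE_LEVEL,
    )


# Grass stems sent to different herbaria: we do not have to decide whether
# they are one biological individual or clones connected underground.
grass_scope = IndividualScope("DuplicateLevel")
mixed_lot_scope = IndividualScope("Heterogeneous")
```

Under this reading, finer_determination_transfers(grass_scope) is true, while a "Heterogeneous" Individual only supports determinations at the higher level at which the mixture is known to be homogeneous.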

For AccessionedUnit, I think there should also be an 
accessionedUnitScope term.  I defer to the museum people on this, but 
the boxes in the ASC diagram (unsorted lot, lot (presumably 
homogeneous), specimen (presumably one biological individual), and 
specimen component) could be a starting point.  The "partOf" and 
"hasPart" properties could be used to relate AccessionedUnits that are 
related to each other.  Relating these various levels of 
AccessionedUnits to levels of Individual above "DuplicateLevel" is going 
to be tricky, but if people want to do this, I'm sure there is a way to 
represent the relationships in RDF. 
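
As a rough sketch of how "partOf" and "hasPart" might tie units of different scope together, the following Python uses plain (subject, predicate, object) tuples in place of a real RDF library.  The identifiers (unit:lot1 and so on), the spelling of the scope values, and the helper functions are hypothetical; only the ASC box names and the two property names come from the discussion above.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Made-up units: a tissue sample taken from a specimen that belongs to a lot.
triples: Set[Triple] = {
    ("unit:lot1", "accessionedUnitScope", "Lot"),
    ("unit:spec1", "accessionedUnitScope", "Specimen"),
    ("unit:tissue1", "accessionedUnitScope", "SpecimenComponent"),
    ("unit:spec1", "partOf", "unit:lot1"),
    ("unit:tissue1", "partOf", "unit:spec1"),
}


def add_inverse_has_part(graph: Set[Triple]) -> Set[Triple]:
    """Return a new graph with a hasPart triple for every partOf triple."""
    inverses = {(o, "hasPart", s) for (s, p, o) in graph if p == "partOf"}
    return graph | inverses


def containing_units(graph: Set[Triple], unit: str) -> List[str]:
    """Follow partOf links upward from a unit, nearest container first.

    Assumes each unit is partOf at most one other unit and that the
    links contain no cycles.
    """
    found: List[str] = []
    current = unit
    while True:
        parents = [o for (s, p, o) in graph if s == current and p == "partOf"]
        if not parents:
            return found
        current = parents[0]
        found.append(current)
```

Storing only the "partOf" direction and deriving "hasPart" keeps the two properties from drifting out of sync; whether that is the right trade-off is for the people who maintain such records to decide.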

THE BOTTOM LINE
I believe that the proposed definition for the DwC class Individual 
should stand as it is (i.e. as a node to connect multiple Occurrences to 
multiple Identifications).  To allow Identifications for Individuals 
that are homogeneous at higher taxonomic levels, we also need a term 
like dwc:individualScope.  I believe that there needs to be a separate 
class that represents what I've described here as "AccessionedUnit" 
which also has some kind of scope property.  I am not going to propose a 
name for this thing or propose what properties belong with it.  Rich and 
the herbarium/museum/botanical garden/zoo people need to decide and 
propose that.  AccessionedUnit then becomes one of several types of 
evidence that can be used to support an Occurrence, with 
dctype:StillImage, dctype:Sound, dctype:Text as other possibilities.  
Darwin Core does not need to define their properties and types since 
others (MRTG, DCMI) have already done so.  We then need two more terms: 
one to relate the evidence to the Occurrence and one to relate the 
Occurrence to the evidence (I would suggest "hasEvidence" and 
"isEvidenceFor" as possibilities).  If we can do these things, I think 
we could say that a general (i.e. denormalized enough to satisfy 
everyone who is dissatisfied at the present moment) Darwin Core model is 
"complete" to the "left" of Identification on the 
http://bioimages.vanderbilt.edu/pages/full-model.jpg diagram.  I'm not 
going to touch the Taxon side right now. 
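
The pair of evidence terms could be sketched in Python as follows.  "hasEvidence" and "isEvidenceFor" are the names suggested above; the classes, the identifier fields, the link helper, and the sample values are hypothetical and are meant only to show the two directions being recorded together.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    evidence_id: str
    # e.g. "AccessionedUnit", "dctype:StillImage", "dctype:Sound", "dctype:Text"
    evidence_type: str
    # Identifier of the Occurrence this item documents ("isEvidenceFor").
    is_evidence_for: Optional[str] = None


@dataclass
class Occurrence:
    occurrence_id: str
    # Identifiers of the items that support this Occurrence ("hasEvidence").
    has_evidence: List[str] = field(default_factory=list)


def link(occurrence: Occurrence, evidence: Evidence) -> None:
    """Record both directions of the relationship in one step."""
    occurrence.has_evidence.append(evidence.evidence_id)
    evidence.is_evidence_for = occurrence.occurrence_id


# Made-up example: one Occurrence supported by a specimen and a photograph.
occ = Occurrence("occ-1")
specimen = Evidence("unit-1", "AccessionedUnit")
photo = Evidence("img-1", "dctype:StillImage")
link(occ, specimen)
link(occ, photo)
```

Here the AccessionedUnit is just one evidence type among several, which is the role proposed for it above.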

Whether or not action is taken on creating a class for what I'm calling 
"AccessionedUnit", there is no reason to hold up action on my Individual 
class proposal if people agree with the points I've made here.

Steve

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
