[tdwg-content] Background for the Individual class proposal. 3. Should an Individual also be a Collecting Unit?

Richard Pyle deepreef at bishopmuseum.org
Sat Nov 13 20:49:21 CET 2010


Hi Steve,

I'm starting with your Part 3, as I have limited time right now, and this
was the one most relevant to our discussions. 

First, a clarification: Several times in earlier posts I made reference to
"BiologicalObject" in the ASC model.  I now realize that was my term, in one
of my early databases, that corresponded to "Collecting Unit" in ASC. Sorry
for the confusion.

> What I have proposed for the class 
> Individual is precisely what I have described in the previous 
> posts: for it to serve as a node connecting Occurrences to 
> Identifications.  

I think we're both in agreement on this.  The only real question is the
scope of allowable and distinguishable things that can serve this
function.  I think the only issue on scoping Individual boils down to this:

A whole specimen is collected. Two tissue samples are taken from it.  The
(remainder of the) whole specimen is preserved as a voucher, and the two
tissue samples are sent to two different institutions that specialize in DNA
sequencing and analysis.  A morphologist establishes an Identification
instance for the whole specimen, and each of the molecular-based
institutions assigns its own Identification instance to its respective
tissue sample, based on DNA sequences of different genes. (In this example,
it doesn't matter whether the Identifications are all the same and
reinforcing, or different and competing.)

To me, this is the crux of the issue:  Given that each of the three
institutions will generate a record in their respective database to
represent the physical object that they are managing, should we [Option A]
treat these as three separate instances of the "Individual" class, which
link to each other (and hence share Identification instances) via
parent/child relationships (e.g., the two tissue-sample Individuals as
"child" instances of the whole-specimen Individual)?

Or, [Option B] do we establish an abstract "Individual" instance to which
all Identification instances are linked, and for which these three objects
are representative "tokens"?

In my mind, both allow us to represent the information we want to represent.
I think Option B might be better in terms of representing the "reality" we
wish to model; but I think Option A is overwhelmingly more practical in the
"reality" of how data of this sort are actually managed by the people who
manage them.
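
To make the comparison concrete, here is a rough Python sketch of the two
options applied to the voucher-plus-two-tissues example.  It is only an
illustration under my own assumptions: every class, field, and identifier
name below is made up and is not a DwC or ASC term.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Identification:
    identified_by: str  # hypothetical label for who made the determination
    basis: str          # hypothetical label for the evidence used


# Option A: every managed physical object is its own Individual record,
# and tissue samples point back to the whole specimen as their parent.
@dataclass
class IndividualA:
    id: str
    parent: Optional["IndividualA"] = None
    identifications: List[Identification] = field(default_factory=list)

    def root(self) -> "IndividualA":
        # Walk up the parent chain to the whole-specimen record.
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def shared_identifications(node, all_nodes):
    # Gather Identifications from every record with the same root.
    top = node.root()
    return [i for n in all_nodes if n.root() is top for i in n.identifications]


# Option B: one abstract Individual holds all Identifications; the
# physical objects are just "tokens" that represent it.
@dataclass
class Token:
    id: str
    held_by: str


@dataclass
class IndividualB:
    id: str
    tokens: List[Token] = field(default_factory=list)
    identifications: List[Identification] = field(default_factory=list)


ids = [
    Identification("morphologist", "morphology"),
    Identification("lab 1", "DNA sequence, gene 1"),
    Identification("lab 2", "DNA sequence, gene 2"),
]

# Option A: three Individual records, linked parent/child.
voucher = IndividualA("voucher", identifications=[ids[0]])
tissue1 = IndividualA("tissue-1", parent=voucher, identifications=[ids[1]])
tissue2 = IndividualA("tissue-2", parent=voucher, identifications=[ids[2]])
option_a = shared_identifications(tissue1, [voucher, tissue1, tissue2])

# Option B: one abstract Individual, three tokens.
abstract = IndividualB(
    "individual-1",
    tokens=[Token("voucher", "museum"), Token("tissue-1", "lab 1"),
            Token("tissue-2", "lab 2")],
    identifications=list(ids),
)
option_b = abstract.identifications
```

Either way, starting from any one of the three objects you can reach all
three Identifications; the difference is whether you get there by walking
parent/child links (A) or by going through one shared abstract record (B).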

> If I think about all of the kinds of things that I would like 
> to put into the spot on the ASC diagram labeled "Collecting 
> Unit" (including things like the Bicentennial Oak that was 
> never "collected" by anybody), the one thing that they all 
> seem to have in common is this aspect of being "accessioned". 
>  So I would assert that in a general model, "AccessionedUnit" 
> would be a better name than "CollectingUnit".  Some of the 
> terms that I think should come out of Occurrence (such as 
> preparations and disposition) could apply to any AccessionedUnit.

I don't think that's quite right.  Certainly, the ASC model was designed
specifically for DeadSpecimen data management; but I think conceptually a
CollectingUnit is not intrinsically defined as an Accessioned object
(definitely don't use the word "Accession" as part of the term -- it is
too loaded a term).  I think of a CollectingUnit as what I called
"BiologicalObject".  That is, a physical object consisting of biological
material (or mineralized representations of biological material, in the case
of fossils).  Whether it is alive or dead, accessioned or not, captive or in
the wild ... are all attributes of a BiologicalObject, but do not *define*
what a BiologicalObject is.  The definition of a BiologicalObject is that it
is an object, primarily consisting of biological stuff (or mineralized
biological stuff).

> The decision on this should 
> not be made based on what we "think" an Individual should be, 
> but rather on what we need it to be to fulfill the role that 
> we have assigned it in our model.  

I completely agree!!  But the problem is that there are two, sometimes
competing, sets of needs: the needs of the data consumers, who want to use
this information for answering biological questions; and the needs of the
data providers, who are constrained by the nature of the data they manage.
The "art" in DwC is in reconciling gaps between these needs in an elegant
and simple way.

> With that in mind, it 
> might be better for the moment to change the name 
> dwc:Individual to dwc:ResamplingUnitHavingDetermination
> because that is what it needs to do according to its current 
> definition and location in the model diagram (I'm considering 
> resampling to be the documentation of multiple Occurrences).  

As we've discussed before, I think "Resampling" is only one benefit of an
Individual class, and should not be the defining metric of an Individual.
I'd be more comfortable following along with
dwc:SampledUnitHavingDetermination; because some of these Units that need to
have Determinations are resampled multiple times, and some are sampled only
once.

> The question then becomes: should AccessionedUnit be 
> considered the same as ResamplingUnitHavingDetermination 
> because they share the same properties (i.e. are described by 
> the same terms)?  To me the answer is clearly "no".  

I would agree.  But I don't think that's the right question.  To me, the
right question is:

Should the scope of SampledUnitHavingDetermination include all instances of
BiologicalObject; or only a subset of them? (i.e., Should
SampledUnitHavingDetermination *include* BiologicalObject, or *overlap with*
BiologicalObject?)

> It is 
> very likely that an AccessionedUnit will never be associated 
> with more than one Occurrence (i.e. be resampled), 

Agreed! But the vast, vast majority of things that have Determinations and
are associated with Occurrences are only sampled once.  Even if things
sampled only once were a small (but non-trivial) minority, there would
still be a solid rationale for not restricting the scope of Individual to
only those things that are sampled more than once.

> Having made a decision about this based on functional need 
> and shared properties, it is still helpful for me to try to 
> develop a mental image of what these two things are.  In my 
> mind, I imagine the ResamplingUnitHavingDetermination (which 
> I will henceforth return to calling dwc:Individual) to be an 
> entity having a homogeneous taxonomic identity.  It has some 
> moment when it came into existence as a living thing (by 
> being born, planted, or founded) although we will never know 
> when that moment was unless an Occurrence happens that allows 
> us to document that Event.  The Individual remains an entity 
> as long as it has the potential to be documented as an 
> Occurrence.  That doesn't necessarily mean that it must be 
> alive.  But if it decomposes, or is preserved and put into a 
> collection, it no longer is capable of being resampled (i.e. 
> documented by an Occurrence).  Thus a fossil that is dead for 
> a million years and is sitting in some stratum still fits my 
> mental image of an Individual.  If it gets chipped out of the 
> rock and put in a museum, there would no longer be any point 
> in documenting another Occurrence for it since there would be 
> no useful Location or GeologicalContext information to be 
> gained from that.  A roadside population of herbaceous plants 
> having homogenous taxonomic identity would be an Individual 
> from the first time it was capable of being sampled (when it 
> was founded) and would end being an Individual when it was 
> extirpated by some road construction crew and was no longer 
> capable of being documented by an Occurrence.  A wolf pack 
> would be a similar case.
> 
> My mental image of AccessionedUnit is an entity that comes 
> into existence when some human person or institution takes 
> control of it, assigns it an identifier, and keeps records of 
> it.  I think I would never see it as coming to an end.  Even 
> if it is lost or destroyed, it would continue to exist as 
> long as the person or institution maintains its record.  It 
> would just have dwc:disposition "lost" or "destroyed".
> It could be a dead, preserved specimen in a jar or glued to a 
> sheet of paper, a living wildebeest calf in a zoo, or even a 
> field sampling plot in a park as long as the park exerts 
> control and ownership over it and maintains records about it. 
>  It could not be any wild, free-ranging animal or plant.  It 
> could not be roadkill left on the side of the road to 
> decompose.  It could not be a photograph of a wildebeest calf 
> in the zoo, or the sound recording of the wildebeest calf's 
> grunt. It COULD be a tissue sample from the wildebeest calf 
> or from the roadkill.  The critical thing is that it is a 
> physical artifact originating from a living thing that has 
> been cataloged and placed under human control.  I think this 
> is the kind of thing that Rich wanted to be able to define 
> when he wanted to broaden the definition of Individual.
> 
> For any entity having an origin as a living thing (in my 
> mental image), its status as an Individual is independent of 
> its status as an AccessionedUnit.  If the entity is removed 
> and preserved in its entirety (fish killed and put in a jar 
> of formaldehyde), it ceases to exist as a dwc:Individual and 
> begins to exist as an AccessionedUnit.  If a branch is 
> removed from a tree or one plant pulled from a roadside 
> population to become specimens, the removed part becomes an 
> AccessionedUnit while the dwc:Individual continues to exist.  
> In the case of the Bicentennial Oak or a permanent sampling 
> plot, the entity simultaneously exists as both an 
> AccessionedUnit and a dwc:Individual.  In terms of metadata 
> records, the establishment of any AccessionedUnit is an 
> Occurrence (grouped under the Individual) having a property 
> of recordedBy.  Whether or not subsequent Occurrences are 
> possible for the Individual depends on whether the act of 
> creating the AccessionedUnit has rendered subsequent sampling 
> irrelevant.
> 
> I agree with the point that was made previously that no 
> specific taxonomic level should be placed in the definition 
> of Individual.  That would allow for the possibility that 
> Individuals could contain several different lower level taxa 
> as long as the Individual is homogeneous at the taxonomic 
> level at which the determination is applied.  I am open to 
> suggestion for how this could be accomplished.  Somehow there 
> needs to be a value for a term like "individualScope" that 
> allows one to make the kind of inferences about duplicates 
> that I described previously.  Maybe one controlled value for 
> "individualScope" should be "DuplicateLevel"
> meaning that the Individual is homogeneous in taxonomic 
> identity to the level at which a taxonomist would collect 
> multiple specimens and call them duplicates.  That would get 
> us out of the problem of deciding whether the several grass 
> stems we collect and send off to different herbaria are 
> actually the same biological individual or clones connected 
> by underground stems.  Other possible levels could be 
> "BiologicalIndividual" for things known to be single 
> biological individuals, and "Heterogeneous" for things that 
> are known or suspected to be mixtures of lower level taxa but 
> for which it is convenient to assign a determination at a 
> higher taxonomic level at which we know the mixture to be homogeneous.
> 
> For AccessionedUnit, I think there should also be an 
> accessionedUnitScope term.  I defer to the museum people on 
> this, but the boxes in the ASC diagram (unsorted lot, lot 
> (presumably homogeneous), specimen (presumably one biological 
> individual), and specimen component) could be a starting 
> point.  The "partOf" and "hasPart" properties could be used 
> to relate AccessionedUnits that are related to each other.  
> Relating these various levels of AccessionedUnits to levels 
> of Individual above "DuplicateLevel" is going to be tricky, 
> but if people want to do this, I'm sure there is a way to 
> represent the relationships in RDF.
> 
> THE BOTTOM LINE
> I believe that the proposed definition for the DwC class 
> Individual should stand as it is (i.e. as a node to connect 
> multiple Occurrences to multiple Identifications).  To allow 
> Identifications for Individuals that are homogeneous at 
> higher taxonomic levels, we also need a term like 
> dwc:individualScope.  I believe that there needs to be a 
> separate class that represents what I've described here as 
> "AccessionedUnit"
> which also has some kind of scope property.  I am not going 
> to propose a name for this thing or propose what properties 
> belong with it.  Rich and the herbarium/museum/botanical 
> garden/zoo people need to decide and propose that.  
> AccessionedUnit then becomes one of several types of evidence 
> that can be used to support an Occurrence, with 
> dctype:StillImage, dctype:Sound, dctype:Text as other possibilities.
> Darwin Core does not need to define their properties and 
> types since others (MRTG, DCMI) have already done so.  We 
> then need two more terms:
> one to relate the evidence to the Occurrence and one to 
> relate the Occurrence to the evidence (I would suggest 
> "hasEvidence" and "isEvidenceFor" as possibilities).  If we 
> can do these things, I think we could say that a general 
> (i.e. denormalized enough to satisfy everyone who is 
> dissatisfied at the present moment) Darwin Core model is 
> "complete" to the "left" of Identification on the 
> http://bioimages.vanderbilt.edu/pages/full-model.jpg diagram. 
>  I'm not going to touch the Taxon side right now.
> 
> Whether or not action is taken on creating a class for what 
> I'm calling "AccessionedUnit", there is no reason to hold up 
> action on my Individual class proposal if people agree with 
> the points I've made here.
> 
> Steve
> 
> --
> Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt 
> University Dept. of Biological Sciences
> 
> postal mail address:
> VU Station B 351634
> Nashville, TN  37235-1634,  U.S.A.
> 
> delivery address:
> 2125 Stevenson Center
> 1161 21st Ave., S.
> Nashville, TN 37235
> 
> office: 2128 Stevenson Center
> phone: (615) 343-4582,  fax: (615) 343-6707 
> http://bioimages.vanderbilt.edu
> 