Re: [tdwg-content] Why it matters what kind of things we include in the definition of Individual

6 Nov 2010

Rich,
I have done a couple read-throughs of your posts and I had two immediate 
comments.  The first is that I think we want to accomplish many of the 
same things here and that the problem really is what we want to call 
things, not what we want to accomplish.  So I'm encouraged by that.  I 
think that what I need to do is to get a piece of paper out and try to 
map out what you are saying and what I'm saying.  I think they will turn 
out to be mostly congruent but with different labels. 

The second thing is that the point of recounting that story had nothing 
to do with the rank of the tree (species, subspecies, or whatever).  I 
have no opinion on lumping, splitting, or whether species, subspecies, 
etc. actually exist or not or whether it is better to call something a 
species, subspecies, or variety.  I really don't care what kind of name 
you apply to the whole organism.  The point I was trying to make with 
the story was that on the scale we've been talking about from the entire 
biosphere to populations to individual organisms to parts of organisms 
to molecules, the individual organism is the point at which we no longer 
have to worry that further subdivisions might not share a common 
Identification.  That is what I'm saying is "special" about the whole 
organism level (vs. parts).  If you know that pieces came from the 
same whole organism, then you can be confident that an identification 
that is assigned to any of the pieces down to any level of further 
subdivision will be the same as an identification assigned to any other 
piece.  Thus it is superfluous to assign separate identifications to 
every piece when you can simply assign a single identification to the 
whole organism and infer that that identification applies to all of the 
pieces.   This is assuming that you have all of the pieces from that 
organism.  If you have some of the pieces and somebody else has some of 
them, of course the two of you would assign separate identifications to 
your sets of pieces (unless you had synchronized databases that "knew" 
that you were both talking about the same organism - one of the points 
of having an identifier for individual organisms is so you can do 
that).  I think you pretty much said the same thing below using 
different words.
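
A minimal sketch of the inference described above, assuming a hypothetical
`Organism` record that lists the identifiers of its derived pieces.  The class
name, field names, identifiers, and the taxon name are illustrative only and
are not Darwin Core terms.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Organism:
    """A whole organism: the level at which one identification covers all parts."""
    organism_id: str
    identification: Optional[str] = None
    pieces: List[str] = field(default_factory=list)  # identifiers of derived pieces


def identification_for_piece(piece_id: str, organisms: List[Organism]) -> Optional[str]:
    """Infer a piece's identification from the whole organism it came from."""
    for organism in organisms:
        if piece_id in organism.pieces:
            return organism.identification
    return None  # source organism unknown, so nothing can be inferred


# Hypothetical example data.
tree = Organism("tree-1", "Quercus alba", ["leaf-a", "twig-b", "dna-c"])
print(identification_for_piece("twig-b", [tree]))
```

Only the whole organism carries an identification here; every piece looks it
up rather than storing its own copy.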

I'm going to digest what you wrote for a while before I make further 
comments.
Steve

Richard Pyle wrote:
...
Hi Steve,
I've finally had time to carefully read your recent series of emails on the
acceptable scope of "Individual".
It has become somewhat apparent that we each support the establishment of
the class "Individual" in DWC for different reasons, as evidenced by our
different perspectives on what the acceptable scope of an "Individual" can
be.  I tend to think of "Individual" in the context of the ASC model's
"BiologicalObject"; whereas you tend to see it more in terms of an
"organismal" individual.
DwC began as a very-much PreservedSpecimen-oriented exercise.  In order to
include non-PreservedSpecimen instances of biodiversity data, the attributes
of PreservedSpecimen were largely folded into the core class "Occurrence".
I am a HUGE fan of broadening the scope of data that can be represented and
exchanged via DwC, so I mostly saw this as a Good Thing.  But I always had a
pang of apprehension for representing PreservedSpecimens as Occurrences,
because whereas both HumanObservations and PreservedSpecimens bear
Occurrence-related information, and this Occurrence-related information is
one of the most popular uses of DwC content (e.g., maps, modelling),
PreservedSpecimens are much more than "Occurrence".  Things like DNA
sequences, morphological characteristics, preservation methods, storage
details, loan information, and so on are all kinds of information that
people holding the data associate with a PreservedSpecimen and share via
DwC, but it seems somewhat convoluted to represent these as attributes of an
Occurrence.
I had supported the notion of a class "Individual" in large part to serve as
a conceptual object on to which many of these things would be more
appropriately attached as attributes than to Occurrence.  My concern now is
that the pendulum is swinging too far in the other direction.  In
other words, the move from supporting PreservedSpecimen data almost
exclusively, to supporting more general biodiversity data, may be swinging
further into a realm where it fails to support Specimen data adequately.  As
I said, I am very much a supporter of "big tent" DwC, and I would hate to
see objects in DwC scoped in such a way that it unnecessarily excludes
content representation.
So I guess what I'm trying to say is, that the less the proposed class
"Individual" can solve what I see as problems with DwC, the less supportive
of it I become.
Before I get into the nitty gritty, I want to dispense with your "splitter"
example.  "Splitters" work at the rank of species every bit as much (even
more so) than at the rank of subspecies.  There are analogous stories where
the hyper-splitter would treat different parts of the same organism as
different taxa at the rank of species.  My point is, this story does not, in
my mind, in any way support the exclusion of "Individuals" being identified
to taxa at ranks above or below the (yes, I'll say it) "arbitrary" rank of
species.  As far as I'm concerned, limiting an "Individual" to be only those
things we can confidently assign to a taxon at the rank of "species" is a
non-starter.  I could fill this email with reasons why, but I think I've
already done that in previous emails, so no need to repeat here.
But I do concede there is a rational basis for not treating "parts" of an
organism as distinct Individuals.  I'm not yet completely convinced,
however.  To be persuaded that subcomponents (parts) of a single "organism"
should not be represented through records of the proposed DwC "Individual"
class, I'll need to believe that the potential harm/confusion in doing so
(still not clear on what that is) cannot be easily mitigated by
filtering with an "individualScope" property.
OK, so I'll try to address each of your reasons why you think that the scope
of instances of the proposed "Individual" class should not include units
below a "single organism".
...
if you consider the comment, which describes the primary
function of Individual: "Instances of this class can serve
the purpose of connecting one or more instances of the
Darwin Core class Occurrence to one or more instances of the
Darwin Core class Identification" it becomes clear that
making parts of organisms Individuals defeats this primary
purpose for the term.
I'm not sure I agree with that last statement.  In other words, I don't see
how the purpose of "Individual" is defeated if the lower limit of the scope
of "Individual" is a whole organism.
Setting aside the cases where "whole organism" can be a bit ambiguous
(corals, sponges, fungi, etc.), suppose we only have a preserved part of an
organism -- a Herbarium specimen, for example.  It's common practice to have
multiple samples of the same plant preserved as different
PreservedSpecimens, sometimes housed in different institutions.  A large
problem that several initiatives (ADBC, BiSciCol, Virtual Herbarium, etc.)
are trying to solve, is the problem of linking these disparate
PreservedSpecimens (as well as the tissue samples derived therefrom)
together. Different collections that house multiple specimens from the same
individual plant (but don't yet realize it), would presumably each establish
an instance of "Individual" to represent their specimen data via DwC. Thus,
each of the individual PreservedSpecimens would have its own unique value of
dwc:individualID.  The question then becomes, how do we aggregate these
instances of Individuals to represent the "same thing"?
In my way of thinking, where "Individual" is functionally equivalent to the
ASC model's "BiologicalObject" (I'm waiting for Stan to jump in and tell me
I'm wrong on this), then the original dwc:individualID records would
continue to exist as their own distinct record, with
dwc:individualScope="PartOfOrganism", with their own distinct associated
data for preservation method, linked photos, etc., etc.  They would be
aggregated by the establishment of a new instance of "Individual", with its
own value for dwc:individualID, with dwc:individualScope="WholeOrganism".
The various Individual instances where dwc:individualScope="PartOfOrganism"
would be aggregated when they each establish an "isPartOf" or "derivedFrom"
relationship with the single Individual instance where
dwc:individualScope="WholeOrganism".  The same model could apply to tissue
samples, and other derived bits of a whole organism.  As long as the
dwc:individualScope value is properly applied, then it should be easy to
apply appropriate reasoning logic. No?
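
A minimal sketch of this aggregation, assuming a hypothetical `Individual`
record with an `individual_id`, a `scope`, and an optional `is_part_of` link.
The field names and identifiers are illustrative; "individualScope" is the
property proposed in this thread, not an established term.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Individual:
    individual_id: str
    scope: str                        # "WholeOrganism" or "PartOfOrganism"
    is_part_of: Optional[str] = None  # individual_id of the parent, if known


def parts_of(whole_id: str, records: List[Individual]) -> List[Individual]:
    """Return the part-scoped records that point at the given whole organism."""
    return [
        r for r in records
        if r.scope == "PartOfOrganism" and r.is_part_of == whole_id
    ]


# Two herbaria each hold a sheet from the same plant (hypothetical identifiers).
records = [
    Individual("herbA:123", "PartOfOrganism", is_part_of="plant:1"),
    Individual("herbB:987", "PartOfOrganism", is_part_of="plant:1"),
    Individual("plant:1", "WholeOrganism"),
]
print([r.individual_id for r in parts_of("plant:1", records)])
```

Each original record keeps its own identifier and data; the new whole-organism
record only acts as the node the parts point to.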
How, then, would you represent this sort of information if the class
Individual were not allowed to be applied to less-than whole organism
instances?  I gather that the dwc:individualID values established by the
different collections for parts of the same whole organism would each
effectively refer to the same whole organism, so you would link them
together via "sameAs" relationships?
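
For comparison, a sketch of the "sameAs" alternative guessed at above, assuming
the links are simple pairs of identifiers.  Grouping is done with a small
union-find; the function name and identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def same_as_groups(ids: Iterable[str],
                   same_as_pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group identifiers that are linked, directly or transitively, by sameAs."""
    ids = list(ids)
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


groups = same_as_groups(
    ["herbA:123", "herbB:987", "herbC:555"],
    [("herbA:123", "herbB:987")],
)
print(sorted(sorted(g) for g in groups))
```

Under this approach no new record is created for the whole organism; the
existing identifiers are simply treated as equivalent.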
...
The major selling point for having Individuals at all is
to get out of the business of applying determinations to
all of the pieces of evidence such as specimens, images,
sounds, etc. that get collected from the same biological
individual through multiple Occurrences.
For me the main selling point of the Individual class is to remove
information that does not intrinsically belong to an "Occurrence" out of
that class, and into a more appropriate class.
...
This has the benefit that if one applies an Identification
to the Individual, all physical and information resources
that are derived from the individual automatically get
associated with the Identification and hence the taxonomic
information referenced by the Identification.  If we call
preserved specimens that are pieces of organism Individuals
having a value of individualScope="part", then do we do
the same thing to them as we do with Individuals at higher
levels, namely apply Identifications to them?
If appropriate, yes.  By "appropriate", I mean if you are a herbarium, and
have a specimen in your collection, and you don't know if other specimens
from the same individual whole plant exist in other collections, then you
assign it an individualID, and scope it as "PartOfOrganism". You attach a
taxon Identification to it (of course), because you have nothing else to
attach the Identification to.  If later it is discovered that another
specimen in another herbarium had a different dwc:individualID assigned to
it (with its own Identification), then you establish a semantic link
between them (either by aggregating them under a new Individual instance
with scope "WholeOrganism", or by "sameAs" relationships as I imagine you
would suggest).  In either case, you've got two Identification instances
applying to the same WholeOrganism, which have exactly the same relationship
to each other as any Individual instance with more than one Identification.
That is, the Identifications either compete with each other (if different
taxa are implicated), or they reinforce each other (if the same taxon is
implicated). Using my approach (establishing a new Individual instance with
scope "WholeOrganism"), it's fairly easy to rationalize, because you simply
impose the logic that parent Individual instances scoped as "WholeOrganism"
inherit the Identifications of their constituent parts, and treat them
accordingly.
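
The inheritance rule just described can be sketched as follows, assuming a
hypothetical mapping from part identifiers to their whole organism and a list
of (part, taxon) identifications.  All names and taxon labels are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def inherited_identifications(part_to_whole: Dict[str, str],
                              part_identifications: Iterable[Tuple[str, str]]
                              ) -> Dict[str, Set[str]]:
    """Collect the taxa asserted for each whole organism via its parts."""
    by_whole: Dict[str, Set[str]] = defaultdict(set)
    for part_id, taxon in part_identifications:
        by_whole[part_to_whole[part_id]].add(taxon)
    return dict(by_whole)


def status(taxa: Set[str]) -> str:
    """One distinct taxon means the identifications reinforce; more means they compete."""
    return "reinforce" if len(taxa) == 1 else "compete"


part_to_whole = {"herbA:123": "plant:1", "herbB:987": "plant:1"}
agreeing = inherited_identifications(
    part_to_whole, [("herbA:123", "Taxon X"), ("herbB:987", "Taxon X")]
)
conflicting = inherited_identifications(
    part_to_whole, [("herbA:123", "Taxon X"), ("herbB:987", "Taxon Y")]
)
print(status(agreeing["plant:1"]), status(conflicting["plant:1"]))
```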
So, where might it not be appropriate?  Well, suppose I collect a fish, and
establish it as a WholeOrganism PreservedSpecimen instance of Individual.
Then I derive from it a tissue sample, that I assign a new Individual
instance for, with scope "PartOfOrganism".  In that case, the child would
probably not receive its own Identification instance at all; rather, it
would inherit the Identification instance from its parent.  But then suppose
I send that tissue off to Kansas, where it is accessioned in the tissue
repository there, and then sequenced.  Suppose the sequence then yields a
competing Identification, different from the one assigned to the
WholeOrganism.  What I want to have happen is that this competing
Identification instance becomes known to me, the holder of the
WholeSpecimen.  Conversely, if an expert re-identifies the WholeSpecimen, I
would like to see that new Identification instance transferred to the
derived Individuals that are "PartOfOrganism".
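
One way to sketch this two-way visibility, assuming a hypothetical
`derived_from` mapping (child identifier to parent identifier) that contains no
cycles.  An identification attached anywhere in a derivation chain is reported
to every record that shares the same root.  Identifiers and taxon labels are
illustrative.

```python
from typing import Dict, List, Tuple


def root_of(record_id: str, derived_from: Dict[str, str]) -> str:
    """Walk derivedFrom links up to the record that has no parent."""
    while record_id in derived_from:
        record_id = derived_from[record_id]
    return record_id


def identifications_visible_to(record_id: str,
                               derived_from: Dict[str, str],
                               identifications: List[Tuple[str, str]]
                               ) -> List[Tuple[str, str]]:
    """Return every (record, taxon) identification made within the same derivation tree."""
    root = root_of(record_id, derived_from)
    return [
        (rid, taxon) for rid, taxon in identifications
        if root_of(rid, derived_from) == root
    ]


# A tissue sample derived from a whole fish, later re-identified by sequencing.
derived_from = {"tissue:1": "fish:1"}
identifications = [("fish:1", "Taxon A"), ("tissue:1", "Taxon B")]

# The holder of the whole specimen sees the competing identification ...
print(identifications_visible_to("fish:1", derived_from, identifications))
# ... and the tissue repository sees identifications made on the whole specimen.
print(identifications_visible_to("tissue:1", derived_from, identifications))
```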
I *think* I understand how you would manage these things if instances of the
class "Individual" were not allowed to apply to anything less than a
WholeOrganism, but it would be better if you described it in your own words.
...
If so, then we are back in the business of assigning
Identifications to all of our derivative resources
rather than the biological individuals from which
they came.
I don't think so.  A photograph and a DNA sequence are *not* individuals.
They are reflections of individuals. Very much like morphological character
states scored for a particular WholeOrganism are not Individuals.  These are
clearly different classes of things, because they are not formed of physical
biological material. The "essence" that unites everything from a population
to a single cell extracted from a multicellular organism is that all of them
represent biological material.  The distinction between "WholeOrganism" and
"PartOfOrganism" is reasonably clear in most cases, but not all cases.  And
to me, it seems to be a lesser offense in such cases to have to decide
arbitrarily whether something falls into one of two different classes of
thing, vs. whether it gets scored as one of two alternate scope terms (e.g.,
"WholeOrganism" vs. "PartOfOrganism").
...
If we just say that we'll skip assigning separate
Identifications to the derivative resources, then
we have something that doesn't fit the functional
role for which Individual was designed.
That assumes that the *only* functional role of an Individual is to join an
Occurrence to an Identification. As I have described above and elsewhere, I
do not see this as the *only* functional role of an Individual.
...
In that case an "Individual" which is an organism
part is such a different thing that one might as
well call it as something else (i.e. a PreservedSpecimen).
I don't think "PreservedSpecimen" is the appropriate alternative.  This term
can certainly apply to parts of an organism as well as whole organisms, etc.
I think the alternative to including parts within the scope of Individual is
to establish something new, like "DerivedIndividual", or "IndividualPart".
But like I said, it seems dangerous to me to establish a new class for
something that transitionally overlaps with another class.  There is no
overlap between the scope of "Taxon" and the scope of "Location".  Indeed, I
can't think of a single other case among the DwC classes where one would
have to think carefully about which class a particular datum belongs to.  But
if you wanted to treat Populations through Whole organisms as one class, and
derived components of Whole organisms as a separate class, I can think of
many examples where there is potential overlap between the two.
...
The case of a whole organism (live as a LivingSpecimen
or dead as a PreservedSpecimen) is different because in
that case we would have a single resource serving as the
evidence (the whole organism itself).
Evidence of what?  Occurrence?  I guess this comes back to my original
point, and my reason for supporting an Individual class, which is that
specimens serve the function of much more than evidence of occurrence.  (So
do images and HumanObservations and most other things of that sort -- but
that's a topic for another thread).
...
By definition, there can't be many of those (there
would just be one) and it would already have an
Identification assigned to it, because it is the same
Individual that it is providing evidence for.
So there is no superfluous assignment of
Identifications in that case.
In principle, I tend to agree -- but as we have discussed before, DwC is an
exchange standard, and as such necessarily serves as a compromise between
the way data "are", and the way data "ought to be".
...
I have had the tendency of thinking that the tokens
supported the Occurrence, but there does not need
to be just one purpose for the token.  They also
support the existence of the Individual.
Yes, exactly!
...
This should probably make you happy, because the
pieces of the Individual (preserved specimens,
tissue samples) would be derived from the Individual.
Yup! :-)
...
I have created a number of similar charts showing
how these relationships could apply to various types of tokens:
I'll need to digest these some more before commenting.
I guess I'm still having difficulty understanding how you envision placing
properties/attributes of tokens into records represented via DwC.  I'll need
to spend some more time thinking through what a token is, how it maps to
fields and tables in my database, and how I structure their specific
properties into DwC terms.
But what I'm not sure I understand is how any of this supports your
contention that the scope of "Individual" should not be allowed to apply to
parts of a WholeOrganism.
...
...
This also assumes that dwc:catalogNumber and dwc:otherCatalogNumbers be
re-assigned to Record-level terms. Was there some reason this isn't
appropriate?
I think it is appropriate because they should be usable with at
least two classes: Individual (for living specimens) and
Occurrences (e.g. preserved specimens, images)
I think this gets to the heart of the difference you and I have in viewing
the function of "Individual".  My *primary* reason for supporting it is to
get properties/attributes of PreservedSpecimen *out* of the Occurrence
class.
...
...
And, a mechanism to track series of "derived from" Individuals.  The ASC
model covered this, I think (right, Stan?)
I didn't see it in the flow chart, but it could be there somewhere.
I don't have the chart in front of me now, but I'm fairly certain that
BiologicalObject can be a child of another BiologicalObject, and the scope
included things like Lot, individual whole organism, part of organism, etc.
...
The risk that we make the definition of Individual so broad
that it can't perform any of the functions it was
defined to serve.  We've already lost one of them (the ability to infer
duplicates) when I agreed to the broader definition, but that's the subject
of another post.
These are some principles that I always try to keep in mind when discussing
these things:
- DwC is a data exchange standard, not so much a physical data model.
- There is a necessary balance between structuring DwC around how data
actually exist in content-provider databases, and how data *should* be
represented in a normalised world
- When in doubt, DwC should be accommodating, rather than restrictive --
especially when more restrictive needs can be met via associated data
filtering
There are other principles as well, but these are the ones I keep having to
remind myself of.
I think that what I have suggested above is very unrestrictive.  We let
evidence be the type of things that they are (PreservedSpecimens,
Individuals, StillImages, SoundRecordings, DNA sequences, etc.).  We don't
determine their type by what we want to use them for.  That was the mistake
that I made in the Biodiversity Informatics paper.  If we follow this
approach, then a StillImage can fill any role that we want: evidence that an
Occurrence happened, information to support an Identification, a character
for a visual key, a logo, etc. We let it fulfill those roles by giving it an
identifier and connecting it to other resources using appropriate terms
(hasEvidence, derivedFrom, mrtg:attributionLogoURL, etc.).
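
A small sketch of this idea, assuming resources are linked by plain
(subject, predicate, object) triples.  The identifiers and the direction in
which each predicate is used are assumptions made for illustration, not
definitions taken from any vocabulary.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# The image is typed by what it is (a StillImage); its roles come from its links.
triples: List[Triple] = [
    ("image:42", "rdf:type", "StillImage"),
    ("occurrence:1", "hasEvidence", "image:42"),
    ("image:42", "derivedFrom", "individual:7"),
    ("project:9", "mrtg:attributionLogoURL", "image:42"),
]


def roles_of(resource: str, triples: List[Triple]) -> List[str]:
    """List the linking predicates a resource takes part in, excluding its type."""
    return sorted({
        p for s, p, o in triples
        if resource in (s, o) and p != "rdf:type"
    })


print(roles_of("image:42", triples))
```

The same image record serves three roles without its type changing.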
I think maybe so.  Maybe the appropriate course
of action here as well is to let people try
different approaches out and if they turn out
to work and be needed, then we talk about
applying them to Darwin Core.
Ultimately, I think people will use it in accordance with what terms are
nested within it -- which is why I think it's important to have this
conversation we're having now.
As I indicated at an earlier time, I think that there are very few terms
that should be properties of Individual since it is primarily a node that
connects Occurrences to Identifications (and I guess now to derived tokens).
Aloha,
Rich
Looking forward to responses!  But I don't think development of these ideas
should hold up the proposal for the class Individual, which can stand on its
own with its current (revised) definition.
Steve
-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu

Steve Baskauf