[tdwg-content] Why it matters what kind of things we include in the definition of Individual

Richard Pyle deepreef at bishopmuseum.org
Sat Nov 6 22:43:46 CET 2010


Hi Steve,

I've finally had time to carefully read your recent series of emails on the
acceptable scope of "Individual".

It has become somewhat apparent that we each support the establishment of
the class "Individual" in DWC for different reasons, as evidenced by our
different perspectives on what the acceptable scope of an "Individual" can
be.  I tend to think of "Individual" in the context of the ASC model's
"BiologicalObject"; whereas you tend to see it more in terms of an
"organismal" individual.

DwC began as a very-much PreservedSpecimen-oriented exercise.  In order to
include non-PreservedSpecimen instances of biodiversity data, the attributes
of PreservedSpecimen were largely folded into the core class "Occurrence".
I am a HUGE fan of broadening the scope of data that can be represented and
exchanged via DwC, so I mostly saw this as a Good Thing.  But I always had a
pang of apprehension for representing PreservedSpecimens as Occurrences,
because whereas both HumanObservations and PreservedSpecimens bear
Occurrence-related information, and this Occurrence-related information is
one of the most popular uses of DwC content (e.g., maps, modelling),
PreservedSpecimens are much more than "Occurrence".  Things like DNA
sequences, morphological characteristics, preservation methods, storage
details, loan information, and so on are all kinds of information that
people holding the data associate with a PreservedSpecimen and share via
DwC, but it seems somewhat convoluted to represent these as attributes of an
Occurrence.

I had supported the notion of a class "Individual" in large part to serve as
a conceptual object on to which many of these things would be more
appropriately attached as attributes than to Occurrence.  My concern now is
that the pendulum is swinging too far in the other direction.  In
other words, the move from supporting PreservedSpecimen data almost
exclusively, to supporting more general biodiversity data, may be swinging
further into a realm where it fails to support Specimen data adequately.  As
I said, I am very much a supporter of "big tent" DwC, and I would hate to
see objects in DwC scoped in such a way that it unnecessarily excludes
content representation.

So I guess what I'm trying to say is, that the less the proposed class
"Individual" can solve what I see as problems with DwC, the less supportive
of it I become.

Before I get into the nitty gritty, I want to dispense with your "splitter"
example.  "Splitters" work at the rank of species every bit as much as (even
more so than) at the rank of subspecies.  There are analogous stories where
the hyper-splitter would treat different parts of the same organism as
different taxa at the rank of species.  My point is, this story does not, in
my mind, in any way support the exclusion of "Individuals" being identified
to taxa at ranks above or below the (yes, I'll say it) "arbitrary" rank of
species.  As far as I'm concerned, limiting an "Individual" to be only those
things we can confidently assign to a taxon at the rank of "species" is a
non-starter.  I could fill this email with reasons why, but I think I've
already done that in previous emails, so no need to repeat here.

But I do concede there is a rational basis for not treating "parts" of an
organism as distinct Individuals.  I'm not yet completely convinced,
however.  To be persuaded that subcomponents (parts) of a single "organism"
should not be represented through records of the proposed DwC "Individual"
class, I'll need to believe that the potential harm/confusion in doing so
(still not clear on what that is) cannot be easily mitigated by
filtering with an "individualScope" property.

OK, so I'll try to address each of your reasons why you think that the scope
of instances of the proposed "Individual" class should not include units
below a "single organism".

> if you consider the comment, which describes the primary 
> function of Individual: "Instances of this class can serve 
> the purpose of connecting one or more instances of the 
> Darwin Core class Occurrence to one or more instances of the 
> Darwin Core class Identification" it becomes clear that 
> making parts of organisms Individuals defeats this primary 
> purpose for the term.   

I'm not sure I agree with that last statement.  In other words, I don't see
how the purpose of "Individual" is defeated if the lower limit of the scope
of "Individual" falls below a whole organism.

Setting aside the cases where "whole organism" can be a bit ambiguous
(corals, sponges, fungi, etc.), suppose we only have a preserved part of an
organism -- a Herbarium specimen, for example.  It's common practice to have
multiple samples of the same plant preserved as different
PreservedSpecimens, sometimes housed in different institutions.  A large
problem that several initiatives (ADBC, BiSciCol, Virtual Herbarium, etc.)
are trying to solve, is the problem of linking these disparate
PreservedSpecimens (as well as the tissue samples derived therefrom)
together. Different collections that house multiple specimens from the same
individual plant (but don't yet realize it), would presumably each establish
an instance of "Individual" to represent their specimen data via DwC. Thus,
each of the individual PreservedSpecimens would have its own unique value of
dwc:individualID.  The question then becomes, how do we aggregate these
instances of Individuals to represent the "same thing"?

In my way of thinking, where "Individual" is functionally equivalent to the
ASC model's "BiologicalObject" (I'm waiting for Stan to jump in and tell me
I'm wrong on this), then the original dwc:individualID records would
continue to exist as their own distinct record, with
dwc:individualScope="PartOfOrganism", with their own distinct associated
data for preservation method, linked photos, etc., etc.  They would be
aggregated by the establishment of a new instance of "Individual", with its
own value for dwc:individualID, with dwc:individualScope="WholeOrganism".
The various Individual instances where dwc:individualScope="PartOfOrganism"
would be aggregated when they each establish an "isPartOf" or "derivedFrom"
relationship with the single Individual instance where
dwc:individualScope="WholeOrganism".  The same model could apply to tissue
samples, and other derived bits of a whole organism.  As long as the
dwc:individualScope value is properly applied, then it should be easy to
apply appropriate reasoning logic. No?
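
The aggregation described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: the Individual record shape,
the is_part_of field, and the urn:example identifiers are all hypothetical
and are not part of DwC or the ASC model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Individual:
    """Hypothetical stand-in for a record of the proposed Individual class."""
    individual_id: str                 # plays the role of dwc:individualID
    scope: str                         # "WholeOrganism" or "PartOfOrganism"
    is_part_of: Optional[str] = None   # assumed "isPartOf" link to a whole


def group_parts_by_whole(records):
    """Map each WholeOrganism ID to the IDs of the parts that point at it."""
    wholes = {r.individual_id for r in records if r.scope == "WholeOrganism"}
    grouped = defaultdict(list)
    for r in records:
        if r.scope == "PartOfOrganism" and r.is_part_of in wholes:
            grouped[r.is_part_of].append(r.individual_id)
    return dict(grouped)


# Two herbaria hold sheets from the same plant; a third sheet is not yet linked.
records = [
    Individual("urn:example:plant-1", "WholeOrganism"),
    Individual("urn:example:herbA-sheet-17", "PartOfOrganism", "urn:example:plant-1"),
    Individual("urn:example:herbB-sheet-03", "PartOfOrganism", "urn:example:plant-1"),
    Individual("urn:example:herbC-sheet-99", "PartOfOrganism"),
]

grouped = group_parts_by_whole(records)

# Consumers who only want whole organisms filter on the scope value.
whole_only = [r.individual_id for r in records if r.scope == "WholeOrganism"]
```

The original part records stay untouched; only the link to the new
WholeOrganism record is added, and the scope value is what makes the
filtering possible.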

How, then, would you represent this sort of information if the class
Individual were not allowed to be applied to less-than whole organism
instances?  I gather that the dwc:individualID values established by the
different collections for parts of the same whole organism would each
effectively refer to the same whole organism, so you would link them
together via "sameAs" relationships?
	
> The major selling point for having Individuals at all is 
> to get out of the business of applying determinations to 
> all of the pieces of evidence such as specimens, images, 
> sounds, etc. that get collected from the same biological 
> individual through multiple Occurrences.  

For me the main selling point of the Individual class is to remove
information that does not intrinsically belong to an "Occurrence" out of
that class, and into a more appropriate class.

> This has the benefit that if one applies an Identification 
> to the Individual, all physical and information resources 
> that are derived from the individual automatically get 
> associated with the Identification and hence the taxonomic 
> information referenced by the Identification.  If we call 
> preserved specimens that are pieces of organism Individuals 
> having a value of individualScope="part", then do we do 
> the same thing to them as we do with Individuals at higher 
> levels, namely apply Identifications to them?  

If appropriate, yes.  By "appropriate", I mean if you are a herbarium, and
have a specimen in your collection, and you don't know if other specimens
from the same individual whole plant exist in other collections, then you
assign it an individualID, and scope it as "PartOfOrganism". You attach a
taxon Identification to it (of course), because you have nothing else to
attach the Identification to.  If later it is discovered that another
specimen in another herbarium had a different dwc:individualID assigned to
it (with its own Identification), then you establish a semantic link
between them (either by aggregating them under a new Individual instance
with scope "WholeOrganism", or by "sameAs" relationships as I imagine you
would suggest).  In either case, you've got two Identification instances
applying to the same WholeOrganism, which have exactly the same relationship
to each other as any Individual instance with more than one Identification.
That is, the Identifications either compete with each other (if different
taxa are implicated), or they reinforce each other (if the same taxon is
implicated). Using my approach (establishing a new Individual instance with
scope "WholeOrganism"), it's fairly easy to rationalize, because you simply
impose the logic that parent Individual instances scoped as "WholeOrganism"
inherit the Identifications of their constituent parts, and treat them
accordingly.
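
A minimal sketch of that inheritance rule, assuming a hypothetical in-memory
representation (the class, the parts list, and the "Taxon A"/"Taxon B"
placeholder names are mine, not DwC terms):

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """Hypothetical record; identifications holds taxon names applied directly."""
    individual_id: str
    scope: str
    identifications: list = field(default_factory=list)
    parts: list = field(default_factory=list)  # constituent PartOfOrganism records


def inherited_identifications(whole):
    """A WholeOrganism's own Identifications plus those of its parts."""
    result = list(whole.identifications)
    for part in whole.parts:
        result.extend(part.identifications)
    return result


def classify(identifications):
    """Same taxon everywhere reinforces; more than one taxon competes."""
    taxa = set(identifications)
    if not taxa:
        return "unidentified"
    return "reinforcing" if len(taxa) == 1 else "competing"


sheet_a = Individual("urn:example:herbA-sheet-17", "PartOfOrganism", ["Taxon A"])
sheet_b = Individual("urn:example:herbB-sheet-03", "PartOfOrganism", ["Taxon A"])
plant = Individual("urn:example:plant-1", "WholeOrganism", parts=[sheet_a, sheet_b])

status_before = classify(inherited_identifications(plant))

# The second herbarium later re-determines its sheet as a different taxon.
sheet_b.identifications.append("Taxon B")
status_after = classify(inherited_identifications(plant))
```

The point is only that the WholeOrganism record ends up in the same position
as any Individual carrying more than one Identification.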

So, where might it not be appropriate?  Well, suppose I collect a fish, and
establish it as a WholeOrganism PreservedSpecimen instance of Individual.
Then I derive from it a tissue sample, that I assign a new Individual
instance for, with scope "PartOfOrganism".  In that case, the child would
probably not receive its own Identification instance at all; rather, it
would inherit the Identification instance from its parent.  But then suppose
I send that tissue off to Kansas, where it is accessioned in the tissue
repository there, and then sequenced.  Suppose the sequence then yields a
competing Identification, different from the one assigned to the
WholeOrganism.  What I want to have happen is that this competing
Identification instance becomes known to me, the holder of the
WholeSpecimen.  Conversely, if an expert re-identifies the WholeSpecimen, I
would like to see that new Identification instance transferred to the
derived Individuals that are "PartOfOrganism".
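
The downward half of this (a derived part picking up whatever is asserted on
the thing it came from) might look something like the sketch below. The
derived_from field and the placeholder taxon names are assumptions for
illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Individual:
    """Hypothetical record with an assumed "derivedFrom" link to its parent."""
    individual_id: str
    scope: str
    derived_from: Optional["Individual"] = None
    identifications: list = field(default_factory=list)


def effective_identifications(ind):
    """Own Identifications first, then those of each ancestor up the chain."""
    seen = []
    node = ind
    while node is not None:
        for taxon in node.identifications:
            if taxon not in seen:
                seen.append(taxon)
        node = node.derived_from
    return seen


fish = Individual("urn:example:fish-1", "WholeOrganism", identifications=["Taxon A"])
tissue = Individual("urn:example:tissue-1", "PartOfOrganism", derived_from=fish)

# The tissue has no Identification of its own, so it inherits the parent's.
inherited = effective_identifications(tissue)

# An expert re-identifies the whole fish; the tissue sees it with no extra work.
fish.identifications.append("Taxon B")
after_reidentification = effective_identifications(tissue)

# Sequencing the tissue yields its own, competing Identification.
tissue.identifications.append("Taxon C")
after_sequencing = effective_identifications(tissue)
```

Getting the sequence-based Identification back to the holder of the whole
fish is the upward direction, which needs the parent to know about (or be
able to query for) its derived records.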

I *think* I understand how you would manage these things if instances of the
class "Individual" were not allowed to apply to anything less than a
WholeOrganism, but it would be better if you described it in your own words.

> If so, then we are back in the business of assigning 
> Identifications to all of our derivative resources 
> rather than the biological individuals from which 
> they came.  

I don't think so.  A photograph and a DNA sequence are *not* individuals.
They are reflections of individuals. Very much like morphological character
states scored for a particular WholeOrganism are not Individuals.  These are
clearly different classes of things, because they are not formed of physical
biological material. The "essence" that unites everything from a population
to a single cell extracted from a multicellular organism is that all of them
represent biological material.  The distinction between "WholeOrganism" and
"PartOfOrganism" is reasonably clear in most cases, but not all cases.  And
to me, it seems to be a lesser offense in such cases to have to decide
arbitrarily whether something falls into one of two different classes of
thing, vs. whether it gets scored as one of two alternate scope terms (e.g.,
"WholeOrganism" vs. "PartOfOrganism").

> If we just say that we'll skip assigning separate 
> Identifications to the derivative resources, then 
> we have something that doesn't fit the functional 
> role for which Individual was designed.  

That assumes that the *only* functional role of an Individual is to join an
Occurrence to an Identification. As I have described above and elsewhere, I
do not see this as the *only* functional role of an Individual.

> In that case an "Individual" which is an organism 
> part is such a different thing that one might as 
> well call it as something else (i.e. a PreservedSpecimen).  

I don't think "PreservedSpecimen" is the appropriate alternative.  This term
can certainly apply to parts of an organism as well as whole organisms, etc.
I think the alternative to including parts within the scope of Individual is
to establish something new, like "DerivedIndividual", or "IndividualPart".
But like I said, it seems dangerous to me to establish a new class for
something that transitionally overlaps with another class.  There is no
overlap between the scope of "Taxon" and the scope of "Location".  Indeed, I
can't think of a single other case among the DwC classes where one would
have to think carefully about which class a particular datum belonged to.  But
if you wanted to treat Populations through Whole organisms as one class, and
derived components of Whole organisms as a separate class, I can think of
many examples where there is potential overlap between the two.

> The case of a whole organism (live as a LivingSpecimen 
> or dead as a PreservedSpecimen) is different because in 
> that case we would have a single resource serving as the 
> evidence (the whole organism itself).  

Evidence of what?  Occurrence?  I guess this comes back to my original
point, and my reason for supporting an Individual class, which is that
specimens serve the function of much more than evidence of occurrence.  (So
do images and HumanObservations and most other things of that sort -- but
that's a topic for another thread).

> By definition, there can't be many of those (there 
> would just be one) and it would already have an 
> Identification assigned to it, because it is the same 
> Individual that it is providing evidence for.  
> So there is no superfluous assignment of 
> Identifications in that case.  

In principle, I tend to agree -- but as we have discussed before, DwC is an
exchange standard, and as such necessarily serves as a compromise between
the way data "are", and the way data "ought to be".
	
> I have had the tendency of thinking that the tokens 
> supported the Occurrence, but there does not need 
> to be just one purpose for the token.  They also 
> support the existence of the Individual.  

Yes, exactly!

> This should probably make you happy, because the 
> pieces of the Individual (preserved specimens, 
> tissue samples) would be derived from the Individual. 

Yup! :-)

> I have created a number of similar charts showing 
> how these relationships could apply to various types of tokens:

I'll need to digest these some more before commenting.

I guess I'm still having difficulty understanding how you envision placing
properties/attributes of tokens into records represented via DwC.  I'll need
to spend some more time thinking through what a token is, how it maps to
fields and tables in my database, and how I structure their specific
properties into DwC terms.

But what I'm not sure I understand is how any of this supports your
contention that the scope of "Individual" should not be allowed to apply to
parts of a WholeOrganism.

> > This also assumes that dwc:catalogNumber and dwc:otherCatalogNumbers be
> > re-assigned to Record-level terms. Was there some reason this isn't
> > appropriate?
> 
> I think it is appropriate because they should be usable with at 
> least two classes: Individual (for living specimens) and 
> Occurrences (e.g. preserved specimens, images)

I think this gets to the heart of the difference you and I have in viewing
the function of "Individual".  My *primary* reason for supporting it is to
get properties/attributes of PreservedSpecimen *out* of the Occurrence
class.

> > And, a mechanism to track series of "derived from" Individuals.  The ASC
> > model covered this, I think (right, Stan?)
> 
> I didn't see it in the flow chart, but it could be there somewhere. 

I don't have the chart in front of me now, but I'm fairly certain that
BiologicalObject can be a child of another BiologicalObject, and the scope
included things like Lot, individual whole organism, part of organism, etc.

> The risk that we make the definition of Individual so broad 
> that it can't perform any of the functions it was 
> defined to serve.  We've already lost one of them (the ability to infer
> duplicates) when I agreed to the broader definition, but that's the subject
> of another post.

These are some principles that I always try to keep in mind when discussing
these things:

- DwC is a data exchange standard, not so much a physical data model.
- There is a necessary balance between structuring DwC around how data
actually exist in content-provider databases, and how data *should* be
represented in a normalised world.
- When in doubt, DwC should be accommodating, rather than restrictive --
especially when more restrictive needs can be met via associated data
filtering.
  
There are other principles as well, but these are the ones I keep having to
remind myself of.
  
I think that what I have suggested above is very unrestrictive.  We let
evidence be the type of things that they are (PreservedSpecimens,
Individuals, StillImages, SoundRecordings, DNA sequences, etc.).  We don't
determine their type by what we want to use them for.  That was the mistake
that I made in the Biodiversity Informatics paper.  If we follow this
approach, then a StillImage can fill any role that we want: evidence that an
Occurrence happened, information to support an Identification, a character
for a visual key, a logo, etc. We let it fulfill those roles by giving it an
identifier and connecting it to other resources using appropriate terms
(hasEvidence, derivedFrom, mrtg:attributionLogoURL, etc.).

	
  
I think maybe so.  Maybe the appropriate course 
of action here as well is to let people try 
different approaches out and if they turn out 
to work and be needed, then we talk about 
applying them to Darwin Core.
    

Ultimately, I think people will use it in accordance with what terms are
nested within it -- which is why I think it's important to have this
conversation we're having now.
  
As I indicated at an earlier time, I think that there are very few terms
that should be properties of Individual since it is primarily a node that
connects Occurrences to Identifications (and I guess now to derived tokens).


Aloha,
Rich


  
Looking forward to responses!  But I don't think development of these ideas
should hold up the proposal for the class Individual, which can stand on its
own with its current (revised) definition.
Steve


  


-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt



