[tdwg-content] Why it matters what kind of things we include in the definition of Individual

Steve Baskauf steve.baskauf at vanderbilt.edu
Sun Nov 7 05:53:23 CET 2010


Rich,
I have done a couple read-throughs of your posts and I had two immediate 
comments.  The first is that I think we want to accomplish many of the 
same things here and that the problem really is what we want to call 
things, not what we want to accomplish.  So I'm encouraged by that.  I 
think that what I need to do is to get a piece of paper out and try to 
map out what you are saying and what I'm saying.  I think they will turn 
out to be mostly congruent but with different labels. 

The second thing is that the point of recounting that story had nothing 
to do with the rank of the tree (species, subspecies, or whatever).  I 
have no opinion on lumping, splitting, or whether species, subspecies, 
etc. actually exist or not or whether it is better to call something a 
species, subspecies, or variety.  I really don't care what kind of name 
you apply to the whole organism.  The point I was trying to make with 
the story was that on the scale we've been talking about from the entire 
biosphere to populations to individual organisms to parts of organisms 
to molecules, the individual organism is the point at which we no longer 
have to worry that further subdivisions might not share a common 
Identification.  That is what I'm saying is "special" about the whole 
organism level (vs. parts).  If you know that pieces came from the 
same whole organism, then you can be confident that an identification 
that is assigned to any of the pieces down to any level of further 
subdivision will be the same as an identification assigned to any other 
piece.  Thus it is superfluous to assign separate identifications to 
every piece when you can simply assign a single identification to the 
whole organism and infer that that identification applies to all of the 
pieces.   This is assuming that you have all of the pieces from that 
organism.  If you have some of the pieces and somebody else has some of 
them, of course the two of you would assign separate identifications to 
your sets of pieces (unless you had synchronized databases that "knew" 
that you were both talking about the same organism - one of the points 
of having an identifier for individual organisms is so you can do 
that).  I think you pretty much said the same thing below using 
different words.
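
To make that inference concrete, here is a minimal Python sketch.  The 
class names Organism and Piece and the "Taxon A"/"Taxon B" labels are 
illustrative assumptions, not Darwin Core terms.  The identification is 
stored once on the organism and every piece reads it from there.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Organism:
    organism_id: str                       # hypothetical identifier for the whole organism
    identification: Optional[str] = None   # placeholder taxon label


@dataclass
class Piece:
    piece_id: str
    organism: Organism

    @property
    def identification(self) -> Optional[str]:
        # A piece stores no identification of its own; it reads the organism's.
        return self.organism.identification


tree = Organism("organism-1")
leaf = Piece("leaf-1", tree)
twig = Piece("twig-1", tree)

tree.identification = "Taxon A"
first = (leaf.identification, twig.identification)

# Re-identifying the organism changes what every piece reports.
tree.identification = "Taxon B"
second = (leaf.identification, twig.identification)
```

This only works when all of the pieces are known to point at the same 
organism record, which is the case the identifier is meant to support.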

I'm going to digest what you wrote for a while before I make further 
comments.
Steve

Richard Pyle wrote:
> Hi Steve,
>
> I've finally had time to carefully read your recent series of emails on the
> acceptable scope of "Individual".
>
> It has become somewhat apparent that we each support the establishment of
> the class "Individual" in DWC for different reasons, as evidenced by our
> different perspectives on what the acceptable scope of an "Individual" can
> be.  I tend to think of "Individual" in the context of the ASC model's
> "BiologicalObject"; whereas you tend to see it more in terms of an
> "organismal" individual.
>
> DwC began as a very-much PreservedSpecimen-oriented exercise.  In order to
> include non-PreservedSpecimen instances of biodiversity data, the attributes
> of PreservedSpecimen were largely folded into the core class "Occurrence".
> I am a HUGE fan of broadening the scope of data that can be represented and
> exchanged via DwC, so I mostly saw this as a Good Thing.  But I always had a
> pang of apprehension for representing PreservedSpecimens as Occurrences,
> because whereas both HumanObservations and PreservedSpecimens bear
> Occurrence-related information, and this Occurrence-related information is
> one of the most popular uses of DwC content (e.g., maps, modelling),
> PreservedSpecimens are much more than "Occurrence".  Things like DNA
> sequences, morphological characteristics, preservation methods, storage
> details, loan information, and so on are all kinds of information that
> people holding the data associate with a PreservedSpecimen and share via
> DwC, but it seems somewhat convoluted to represent these as attributes of an
> Occurrence.
>
> I had supported the notion of a class "Individual" in large part to serve as
> a conceptual object on to which many of these things would be more
> appropriately attached as attributes than to Occurrence.  My concern now is
> that the pendulum is swinging too far in the other direction.  In
> other words, the move from supporting PreservedSpecimen data almost
> exclusively, to supporting more general biodiversity data, may be swinging
> further into a realm where it fails to support Specimen data adequately.  As
> I said, I am very much a supporter of "big tent" DwC, and I would hate to
> see objects in DwC scoped in such a way that it unnecessarily excludes
> content representation.
>
> So I guess what I'm trying to say is, that the less the proposed class
> "Individual" can solve what I see as problems with DwC, the less supportive
> of it I become.
>
> Before I get into the nitty gritty, I want to dispense with your "splitter"
> example.  "Splitters" work at the rank of species every bit as much (even
> more so) than at the rank of subspecies.  There are analogous stories where
> the hyper-splitter would treat different parts of the same organism as
> different taxa at the rank of species.  My point is, this story does not, in
> my mind, in any way support the exclusion of "Individuals" being identified
> to taxa at ranks above or below the (yes, I'll say it) "arbitrary" rank of
> species.  As far as I'm concerned, limiting an "Individual" to be only those
> things we can confidently assign to a taxon at the rank of "species" is a
> non-starter.  I could fill this email with reasons why, but I think I've
> already done that in previous emails, so no need to repeat here.
>
> But I do concede there is a rational basis for not treating "parts" of an
> organism as distinct Individuals.  I'm not yet completely convinced,
> however.  To be persuaded that subcomponents (parts) of a single "organism"
> should not be represented through records of the proposed DwC "Individual"
> class, I'll need to believe that the potential harm/confusion in doing so
> (still not clear on what that is) cannot be easily mitigated by
> filtering with an "individualScope" property.
>
> OK, so I'll try to address each of your reasons why you think that the scope
> of instances of the proposed "Individual" class should not include units
> below a "single organism".
>
>   
>> if you consider the comment, which describes the primary
>> function of Individual: "Instances of this class can serve
>> the purpose of connecting one or more instances of the
>> Darwin Core class Occurrence to one or more instances of the
>> Darwin Core class Identification" it becomes clear that
>> making parts of organisms Individuals defeats this primary
>> purpose for the term.
>>     
>
> I'm not sure I agree with that last statement.  In other words, I don't see
> how the purpose of "Individual" is defeated if the lower limit of the scope
> of "Individual" falls below a whole organism.
>
> Setting aside the cases where "whole organism" can be a bit ambiguous
> (corals, sponges, fungi, etc.), suppose we only have a preserved part of an
> organism -- a Herbarium specimen, for example.  It's common practice to have
> multiple samples of the same plant preserved as different
> PreservedSpecimens, sometimes housed in different institutions.  A large
> problem that several initiatives (ADBC, BiSciCol, Virtual Herbarium, etc.)
> are trying to solve, is the problem of linking these disparate
> PreservedSpecimens (as well as the tissue samples derived therefrom)
> together. Different collections that house multiple specimens from the same
> individual plant (but don't yet realize it), would presumably each establish
> an instance of "Individual" to represent their specimen data via DwC. Thus,
> each of the individual PreservedSpecimens would have its own unique value of
> dwc:individualID.  The question then becomes, how do we aggregate these
> instances of Individuals to represent the "same thing"?
>
> In my way of thinking, where "Individual" is functionally equivalent to the
> ASC model's "BiologicalObject" (I'm waiting for Stan to jump in and tell me
> I'm wrong on this), then the original dwc:individualID records would
> continue to exist as their own distinct record, with
> dwc:individualScope="PartOfOrganism", with their own distinct associated
> data for preservation method, linked photos, etc., etc.  They would be
> aggregated by the establishment of a new instance of "Individual", with its
> own value for dwc:individualID, with dwc:individualScope="WholeOrganism".
> The various Individual instances where dwc:individualScope="PartOfOrganism"
> would be aggregated when they each establish an "isPartOf" or "derivedFrom"
> relationship with the single Individual instance where
> dwc:individualScope="WholeOrganism".  The same model could apply to tissue
> samples, and other derived bits of a whole organism.  As long as the
> dwc:individualScope value is properly applied, then it should be easy to
> apply appropriate reasoning logic. No?
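
The aggregation described in the quoted paragraph above can be sketched 
in Python as follows.  This is only an illustration under stated 
assumptions: the Individual dataclass, the is_part_of field, the helper 
functions, and the example identifiers are all hypothetical and are not 
part of Darwin Core.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Individual:
    individual_id: str                 # stands in for dwc:individualID
    scope: str                         # "WholeOrganism" or "PartOfOrganism"
    is_part_of: Optional[str] = None   # individual_id of the parent, if known


def aggregate(parts: List[Individual], whole_id: str) -> Individual:
    """Create a WholeOrganism record and point each part record at it."""
    for part in parts:
        if part.scope != "PartOfOrganism":
            raise ValueError(f"{part.individual_id} is not scoped as a part")
    whole = Individual(whole_id, "WholeOrganism")
    for part in parts:
        part.is_part_of = whole.individual_id
    return whole


def parts_of(records: List[Individual], whole_id: str) -> List[str]:
    """Return the IDs of all records linked to the given whole organism."""
    return [r.individual_id for r in records if r.is_part_of == whole_id]


def only_scope(records: List[Individual], scope: str) -> List[Individual]:
    """Filter records by scope, e.g. to keep whole organisms only."""
    return [r for r in records if r.scope == scope]


# Two herbaria each hold a sheet from the same plant (made-up identifiers).
sheet_a = Individual("herbarium-A:0001", "PartOfOrganism")
sheet_b = Individual("herbarium-B:7731", "PartOfOrganism")
plant = aggregate([sheet_a, sheet_b], "plant-42")
records = [sheet_a, sheet_b, plant]
```

The part records keep their own identifiers and data; only the link to 
the new parent record is added, and a consumer that wants whole 
organisms only can filter on scope.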
>
> How, then, would you represent this sort of information if the class
> Individual were not allowed to be applied to less-than whole organism
> instances?  I gather that the dwc:individualID values established by the
> different collections for parts of the same whole organism would each
> effectively refer to the same whole organism, so you would link them
> together via "sameAs" relationships?
>
>   
>> The major selling point for having Individuals at all is
>> to get out of the business of applying determinations to
>> all of the pieces of evidence such as specimens, images,
>> sounds, etc. that get collected from the same biological
>> individual through multiple Occurrences.
>>     
>
> For me the main selling point of the Individual class is to remove
> information that does not intrinsically belong to an "Occurrence" out of
> that class, and into a more appropriate class.
>
>   
>> This has the benefit that if one applies an Identification
>> to the Individual, all physical and information resources
>> that are derived from the individual automatically get
>> associated with the Identification and hence the taxonomic
>> information referenced by the Identification.  If we call
>> preserved specimens that are pieces of organism Individuals
>> having a value of individualScope="part", then do we do
>> the same thing to them as we do with Individuals at higher
>> levels, namely apply Identifications to them?
>>     
>
> If appropriate, yes.  By "appropriate", I mean if you are a herbarium, and
> have a specimen in your collection, and you don't know if other specimens
> from the same individual whole plant exist in other collections, then you
> assign it an individualID, and scope it as "PartOfOrganism". You attach a
> taxon Identification to it (of course), because you have nothing else to
> attach the Identification to.  If later it is discovered that another
> specimen in another herbarium had a different dwc:individualID assigned to
> it (with its own Identification), then you establish a semantic link
> between them (either by aggregating them under a new Individual instance
> with scope "WholeOrganism", or by "sameAs" relationships as I imagine you
> would suggest).  In either case, you've got two Identification instances
> applying to the same WholeOrganism, which have exactly the same relationship
> to each other as any Individual instance with more than one Identification.
> That is, the Identifications either compete with each other (if different
> taxa are implicated), or they reinforce each other (if the same taxon is
> implicated). Using my approach (establishing a new Individual instance with
> scope "WholeOrganism"), it's fairly easy to rationalize, because you simply
> impose the logic that parent Individual instances scoped as "WholeOrganism"
> inherit the Identifications of their constituent parts, and treat them
> accordingly.
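
A small Python sketch of that inheritance rule, under stated 
assumptions: the function names, the part identifiers, and the 
"Taxon X"/"Taxon Y" labels are made up for illustration.  The parent 
collects the Identifications of its parts and reports whether they 
reinforce or compete with each other.

```python
from collections import defaultdict
from typing import Dict, List


def inherit_identifications(parts: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each taxon to the part records that were identified as that taxon."""
    by_taxon = defaultdict(list)
    for part_id, taxa in parts.items():
        for taxon in taxa:
            by_taxon[taxon].append(part_id)
    return dict(by_taxon)


def agreement(inherited: Dict[str, List[str]]) -> str:
    """Classify the inherited identifications of one whole organism."""
    if not inherited:
        return "unidentified"
    # One taxon overall means the parts agree; more than one means they compete.
    return "reinforcing" if len(inherited) == 1 else "competing"


same = inherit_identifications({"sheet-A": ["Taxon X"], "sheet-B": ["Taxon X"]})
differ = inherit_identifications({"sheet-A": ["Taxon X"], "sheet-B": ["Taxon Y"]})
```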
>
> So, where might it not be appropriate?  Well, suppose I collect a fish, and
> establish it as a WholeOrganism PreservedSpecimen instance of Individual.
> Then I derive from it a tissue sample, that I assign a new Individual
> instance for, with scope "PartOfOrganism".  In that case, the child would
> probably not receive its own Identification instance at all; rather, it
> would inherit the Identification instance from its parent.  But then suppose
> I send that tissue off to Kansas, where it is accessioned in the tissue
> repository there, and then sequenced.  Suppose the sequence then yields a
> competing Identification, different from the one assigned to the
> WholeOrganism.  What I want to have happen is that this competing
> Identification instance becomes known to me, the holder of the
> WholeSpecimen.  Conversely, if an expert re-identifies the WholeSpecimen, I
> would like to see that new Identification instance transferred to the
> derived Individuals that are "PartOfOrganism".
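
One way to get that two-way visibility is to treat every record 
derived from the same whole organism as sharing one pool of 
Identifications.  The Python sketch below is an assumption-laden 
illustration, not a proposed DwC mechanism: the derived_from mapping, 
the function names, and all identifiers and taxon labels are 
hypothetical.

```python
from typing import Dict, List, Tuple


def root_of(individual_id: str, derived_from: Dict[str, str]) -> str:
    """Follow derivedFrom links upward until a record with no parent is reached."""
    seen = set()
    while individual_id in derived_from:
        if individual_id in seen:
            raise ValueError("cycle in derivedFrom links")
        seen.add(individual_id)
        individual_id = derived_from[individual_id]
    return individual_id


def visible_identifications(
    individual_id: str,
    derived_from: Dict[str, str],
    identifications: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return (source record, taxon) pairs from every record sharing the same root."""
    root = root_of(individual_id, derived_from)
    pairs = []
    for source, taxa in identifications.items():
        if root_of(source, derived_from) == root:
            pairs.extend((source, taxon) for taxon in taxa)
    return sorted(pairs)


# A tissue sample derived from a whole fish; an unrelated second fish.
derived_from = {"tissue-1": "fish-1"}
identifications = {
    "fish-1": ["Taxon A"],      # identification of the whole specimen
    "tissue-1": ["Taxon B"],    # competing identification from the sequence
    "fish-2": ["Taxon C"],      # unrelated record, must not leak in
}

seen_by_whole = visible_identifications("fish-1", derived_from, identifications)
seen_by_tissue = visible_identifications("tissue-1", derived_from, identifications)
```

Both holders see the same pair of Identifications, so a competing 
result from the tissue is visible to the holder of the whole specimen 
and a re-identification of the whole specimen is visible to the tissue 
repository.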
>
> I *think* I understand how you would manage these things if instances of the
> class "Individual" were not allowed to apply to anything less than a
> WholeOrganism, but it would be better if you described it in your own words.
>
>   
>> If so, then we are back in the business of assigning
>> Identifications to all of our derivative resources
>> rather than the biological individuals from which
>> they came.
>>     
>
> I don't think so.  A photograph and a DNA sequence are *not* individuals.
> They are reflections of individuals. Very much like morphological character
> states scored for a particular WholeOrganism are not Individuals.  These are
> clearly different classes of things, because they are not formed of physical
> biological material. The "essence" that unites everything from a population
> to a single cell extracted from a multicellular organism is that all of them
> represent biological material.  The distinction between "WholeOrganism" and
> "PartOfOrganism" is reasonably clear in most cases, but not all cases.  And
> to me, it seems to be a lesser offense in such cases to have to decide
> arbitrarily whether something falls into one of two different classes of
> thing, vs. whether it gets scored as one of two alternate scope terms (e.g.,
> "WholeOrganism" vs. "PartOfOrganism").
>
>   
>> If we just say that we'll skip assigning separate
>> Identifications to the derivative resources, then
>> we have something that doesn't fit the functional
>> role for which Individual was designed.
>>     
>
> That assumes that the *only* functional role of an Individual is to join an
> Occurrence to an Identification. As I have described above and elsewhere, I
> do not see this as the *only* functional role of an Individual.
>
>   
>> In that case an "Individual" which is an organism
>> part is such a different thing that one might as
>> well call it as something else (i.e. a PreservedSpecimen).
>>     
>
> I don't think "PreservedSpecimen" is the appropriate alternative.  This term
> can certainly apply to parts of an organism as well as whole organisms, etc.
> I think the alternative to including parts within the scope of Individual is
> to establish something new, like "DerivedIndividual", or "IndividualPart".
> But like I said, it seems dangerous to me to establish a new class for
> something that transitionally overlaps with another class.  There is no
> overlap between the scope of "Taxon" and the scope of "Location".  Indeed, I
> can't think of a single other case among the DwC classes where one would
> have to think carefully about which class a particular datum belonged.  But
> if you wanted to treat Populations through Whole organisms as one class, and
> derived components of Whole organisms as a separate class, I can think of
> many examples where there is potential overlap between the two.
>
>   
>> The case of a whole organism (live as a LivingSpecimen
>> or dead as a PreservedSpecimen) is different because in
>> that case we would have a single resource serving as the
>> evidence (the whole organism itself).
>>     
>
> Evidence of what?  Occurrence?  I guess this comes back to my original
> point, and my reason for supporting an Individual class, which is that
> specimens serve the function of much more than evidence of occurrence.  (So
> do images and HumanObservations and most other things of that sort -- but
> that's a topic for another thread).
>
>   
>> By definition, there can't be many of those (there
>> would just be one) and it would already have an
>> Identification assigned to it, because it is the same
>> Individual that it is providing evidence for.
>> So there is no superfluous assignment of
>> Identifications in that case.
>>     
>
> In principle, I tend to agree -- but as we have discussed before, DwC is an
> exchange standard, and as such necessarily serves as a compromise between
> the way data "are", and the way data "ought to be".
>
>   
>> I have had the tendency of thinking that the tokens
>> supported the Occurrence, but there does not need
>> to be just one purpose for the token.  They also
>> support the existence of the Individual.
>>     
>
> Yes, exactly!
>
>   
>> This should probably make you happy, because the
>> pieces of the Individual (preserved specimens,
>> tissue samples) would be derived from the Individual.
>>     
>
> Yup! :-)
>
>   
>> I have created a number of similar charts showing
>> how these relationships could apply to various types of tokens:
>>     
>
> I'll need to digest these some more before commenting.
>
> I guess I'm still having difficulty understanding how you envision placing
> properties/attributes of tokens into records represented via DwC.  I'll need
> to spend some more time thinking through what a token is, how it maps to
> fields and tables in my database, and how I structure their specific
> properties into DwC terms.
>
> But what I'm not sure I understand is how any of this supports your
> contention that the scope of "Individual" should not be allowed to apply to
> parts of a WholeOrganism.
>
>   
>>> This also assumes that dwc:catalogNumber and dwc:otherCatalogNumbers be
>>> re-assigned to Record-level terms. Was there some reason this isn't
>>> appropriate?
>>>       
>> I think it is appropriate because they should be usable with at
>> least two classes: Individual (for living specimens) and
>> Occurrences (e.g. preserved specimens, images)
>>     
>
> I think this gets to the heart of the difference you and I have in viewing
> the function of "Individual".  My *primary* reason for supporting it is to
> get properties/attributes of PreservedSpecimen *out* of the Occurrence
> class.
>
>   
>>> And, a mechanism to track series of "derived from" Individuals.  The ASC
>>> model covered this, I think (right, Stan?)
>>>       
>> I didn't see it in the flow chart, but it could be there somewhere.
>>     
>
> I don't have the chart in front of me now, but I'm fairly certain that
> BiologicalObject can be a child of another BiologicalObject, and the scope
> included things like Lot, individual whole organism, part of organism, etc.
>
>   
>> The risk that we make the definition of Individual so broad
>> that it can't perform any of the functions it was
>> defined to serve.  We've already lost one of them (the ability to infer
>> duplicates) when I agreed to the broader definition, but that's the subject
>> of another post.
>>     
>
> These are some principles that I always try to keep in mind when discussing
> these things:
>
> - DwC is a data exchange standard, not so much a physical data model.
> - There is a necessary balance between structuring DwC around how data
> actually exist in content-provider databases, and how data *should* be
> represented in a normalised world
> - When in doubt, DwC should be accommodating, rather than restrictive --
> especially when more restrictive needs can be met via associated data
> filtering
>
> There are other principles as well, but these are the ones I keep having to
> remind myself of.
>
> I think that what I have suggested above is very unrestrictive.  We let
> evidence be the type of things that they are (PreservedSpecimens,
> Individuals, StillImages, SoundRecordings, DNA sequences, etc.).  We don't
> determine their type by what we want to use them for.  That was the mistake
> that I made in the Biodiversity Informatics paper.  If we follow this
> approach, then a StillImage can fill any role that we want: evidence that an
> Occurrence happened, information to support an Identification, a character
> for a visual key, a logo, etc. We let it fulfill those roles by giving it an
> identifier and connecting it to other resources using appropriate terms
> (hasEvidence, derivedFrom, mrtg:attributionLogoURL, etc.).
>
>
>
> I think maybe so.  Maybe the appropriate course
> of action here as well is to let people try
> different approaches out and if they turn out
> to work and be needed, then we talk about
> applying them to Darwin Core.
>
>
> Ultimately, I think people will use it in accordance to what terms are
> nested within it -- which is why I think it's important to have this
> conversation we're having now.
>
> As I indicated at an earlier time, I think that there are very few terms
> that should be properties of Individual since it is primarily a node that
> connects Occurrences to Identifications (and I guess now to derived tokens).
>
>
> Aloha,
> Rich
>
>
>
> Looking forward to responses!  But I don't think development of these ideas
> should hold up the proposal for the class Individual, which can stand on its
> own with its current (revised) definition.
> Steve
>

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
