[tdwg-content] Plea for competency questions. Was Re: New terms need resolution: "Individual"

Steve Baskauf steve.baskauf at vanderbilt.edu
Tue Jul 26 20:09:01 CEST 2011

I am back in the land of Internet access again after several weeks of 
glorious lack of it. 

There have been a number of posts regarding John Wieczorek's proposed 
resolution of the "class Individual" proposal 
(http://code.google.com/p/darwincore/issues/detail?id=69) through the 
creation of the "BiologicalEntity" class.  
In these posts two general issues were raised:

I. Competency questions, i.e. how will the creation of this class help 
us do something useful?
II. What sorts of resources should be instances of this class, i.e. how 
should the class be defined?

These general questions were previously discussed at length between 
October 2009 and February 2011 concurrently with general questions about 
the meaning of "Event", "Occurrence", "Taxon" and other existing DwC 
classes.  Since it requires many hours to review the many tdwg-content 
posts during that period, I will refer to a summary at 
http://code.google.com/p/darwin-sw/wiki/TdwgContentEmailSummary for 
anyone who is interested in skimming the main points of that discussion. 

When I proposed the Individual class, I wanted to explicitly recognize a 
class of resources that would facilitate three things, which I suppose 
are "competency questions" of a sort:
1. To allow for linking multiple Occurrence records that involved the 
same organism at different times and/or places (i.e. the intended 
purpose of the existing dwc:individualID term; i.e. to facilitate 
resampling).  Examples would be mark/recapture, radio tracking, 
photo-identification of whales, tracking the status of a sessile 
organism over time, etc.
2. To allow for the linking or grouping of multiple forms of evidence 
associated with the same organism which may have been collected at the 
same time and place (multiple forms of documentation for a single 
occurrence) or during several Occurrences.  Examples would be 
collections of several images, several specimens, or both images and 
specimens from the same organism. 
3. To link multiple Identifications of the same individual organism, 
particularly when these Identifications were based on different pieces 
of evidence arising from the same individual.  For example, if 
"duplicate" specimens from the same organism ended up in different 
museums they might be assigned different Identifications.  An 
Identification asserted for one specimen would apply to other specimens 
from the same individual organism even if that Identification wasn't 
explicitly assigned to the other specimens.  I have referred to this 
function as "inferring duplicates" - it could also be called "tracking 
duplicates" if the two samples were known to have arisen from the same 
individual organism before they were identified rather than after.
(Since I'm not really up on the technical definition of "competency 
question" please excuse me if I'm misapplying the term to mean "what we 
want something to be able to do".)
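
To make the three functions above a bit more concrete, here is a minimal 
sketch in Python.  It is only an illustration under my own assumptions: 
the class names, field names, and records are made up for this example 
and are not Darwin Core terms, except that individual_id is meant to 
play the role of dwc:individualID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    occurrence_id: str
    individual_id: str      # plays the role of dwc:individualID
    event_date: str         # ISO 8601 date, so string order is date order
    evidence: tuple = ()    # identifiers of images, specimens, etc.


@dataclass(frozen=True)
class Identification:
    individual_id: str      # the Identification is attached to the Individual
    taxon: str
    based_on: str           # identifier of the evidence that was examined


def occurrences_of(individual_id, occurrences):
    """Function 1: all Occurrences of one individual, ordered by date."""
    hits = [o for o in occurrences if o.individual_id == individual_id]
    return sorted(hits, key=lambda o: o.event_date)


def evidence_of(individual_id, occurrences):
    """Function 2: all evidence from one individual, across its Occurrences."""
    return [e for o in occurrences_of(individual_id, occurrences)
            for e in o.evidence]


def identifications_of(individual_id, identifications):
    """Function 3: all taxa asserted for one individual, whatever the evidence."""
    return sorted({i.taxon for i in identifications
                   if i.individual_id == individual_id})


# Made-up records for illustration only.
occurrences = [
    Occurrence("occ-2", "ind-1", "2011-06-01", ("image-2",)),
    Occurrence("occ-1", "ind-1", "2010-05-01", ("image-1", "specimen-A")),
    Occurrence("occ-3", "ind-2", "2011-06-01", ("specimen-B",)),
]
identifications = [
    Identification("ind-1", "Taxon X", "specimen-A"),
    Identification("ind-1", "Taxon Y", "image-2"),
]

print([o.occurrence_id for o in occurrences_of("ind-1", occurrences)])
print(evidence_of("ind-1", occurrences))
print(identifications_of("ind-1", identifications))
```

The only thing the Individual contributes here is a shared identifier; 
everything else hangs off the Occurrence and Identification records.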

As the discussion progressed, it became clear that others wanted the 
proposed Individual class to do other things as well.  In particular, 
Rich Pyle articulated a desire to provide a mechanism for grouping in a 
hierarchical manner pieces of physical evidence that were derived from 
living organisms.  These could include aggregates of organisms, 
organisms, and pieces of organisms.  After some rather heated 
discussion, Kevin Richards put his finger on the critical distinction 
between what I wanted the Individual class to do and what Rich wanted: 
"it seems like Steve's idea for the Individual more closely resembles a 
many-to-many joining table in a database (ie doesn't serve much use 
other than connecting two tables/classes together - and doesn't normally 
relate to a 'real world' type of object).  Whereas it seems Rich's idea 
is to relate it more to 'real-world' objects, such as samples, 
re-samples, etc, to allow tracking and connectability of the 
observed/collected/processed individuals..." 
After thinking about this for a very long time, I've become convinced 
that Kevin was on the right track.  It seems to me that what we have 
here is two different sets of "competency questions" which define sets 
of entities that overlap but which are not congruent.  An actual single, 
live organism can serve both as a unit for resampling and "attachment" 
of Identifications, AND as an organizational unit that is part of and 
which has parts that are biological samples.  Some other entities, such 
as cohesive packs/herds and clonal organisms can also serve both 
purposes.  Other entities cannot: it doesn't make sense to resample dead 
organisms or pieces of organisms, and an identification applied to part 
of a taxonomically heterogeneous unit (e.g. a mixed flock of birds) 
cannot be reliably inferred to apply to another part of the same unit.  
Neither intended purpose/definition of "Individual" (the many-to-many 
database join or the mechanism for hierarchical grouping of physical 
evidence) is intrinsically "wrong"; they simply facilitate different 
competency questions.

So I believe that question II (how do you define the 
class?) is better answered by saying "instances of the class are 
resources that facilitate the competency questions" than by basing the 
definition on a philosophical discussion of what people imagine an 
"Individual" or "BiologicalEntity" to be (I think I'm agreeing with the 
point Bob was trying to make).  I think that at a minimum, functions 1 
and 2 (defining an entity to which multiple occurrences can be linked 
and to which multiple forms of evidence can be linked) must be 
accommodated by the Individual/BiologicalEntity class; the utility of 
those two functions is certainly implied by the already-existing term 
dwc:individualID . 

Personally, I would like for the third function ("inferring duplicates"; 
linking multiple Identifications to the same entity and being assured 
that all Identifications of the same Individual would apply to all 
artifacts associated with that Individual) to be accommodated by the 
definition, but the current state of the discussion is unclear on this 
point.  The sticking point is related to the definition of "taxonomic 
homogeneity".  When I used that term, I intended for it to mean that the 
entity is believed to be homogeneous to the lowest possible level in the 
way that one knows that two branches from the same tree or two parts of 
the same clonal organism are guaranteed to have the same taxonomic 
identity at every level.  This is NOT the way Rich was using the term - 
he explained that he intended that a "taxonomically homogeneous" 
biological entity could be an aggregate of organisms that are known to 
be heterogeneous at a lower taxonomic level but that for whatever reason 
were identified at a higher taxonomic level common to all organisms 
(e.g. we know that the five fish in this jar are different species of 
fish, but we are going to identify the lot as class=Actinopterygii for 
the time being).  However, Cam and I have previously defined such 
entities as "taxonomically heterogeneous" (see 
http://code.google.com/p/darwin-sw/wiki/TaxonomicHeterogeneity ).  I 
would prefer that "taxonomic homogeneity" be restricted to the 
definition that I intended simply because I don't know of any other 
thing to call something that is believed to be taxonomically homogeneous 
to the lowest possible level.  The reason why this is important is that 
if an Individual/BiologicalEntity is allowed to be taxonomically 
heterogeneous (sensu darwin-sw) then one cannot infer that discovered 
"duplicates" share all Identifications given to any duplicate pieces of 
evidence, whereas if an Individual/BiologicalEntity is required to be 
taxonomically homogeneous (sensu darwin-sw, NOT sensu Pyle) then one can 
make that inference.  This is a circumstance that occurs fairly 
regularly when comparing specimens in different herbaria - one discovers 
two "duplicate" specimens that have been assigned different 
Identifications and one infers that both Identifications apply to the 
source tree, clump of moss, small population of herbs, etc. (i.e. 
Individual/BiologicalEntity).  On the other hand, if a taxonomically 
heterogeneous (sensu darwin-sw) marine trawl sample is subdivided into 
two samples that are sent to two museums, one could not safely infer 
that an Identification made based on one sample could be applied to the 
other sample.  The two subsamples may contain sets of fish that can be 
identified to taxa that are at a lower level than the initial 
Identification of the conglomerate sample and which are not the same. 
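
The difference between the two cases can be sketched in a few lines of 
Python.  This is a hypothetical illustration, not a proposed 
implementation: the Entity class, its "homogeneous" flag (meaning 
taxonomically homogeneous sensu darwin-sw), and the placeholder taxon 
names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    homogeneous: bool   # taxonomically homogeneous, sensu darwin-sw
    # evidence identifier -> taxa asserted directly for that evidence
    identifications: dict = field(default_factory=dict)


def applicable_identifications(entity, evidence_id):
    """Return the taxa that can be taken to apply to one piece of evidence."""
    if not entity.homogeneous:
        # Heterogeneous entity: only Identifications made directly on this
        # piece of evidence can safely be applied to it.
        return sorted(set(entity.identifications.get(evidence_id, [])))
    # Homogeneous entity: an Identification of any "duplicate" is inferred
    # to apply to every piece of evidence from the same entity.
    taxa = set()
    for asserted in entity.identifications.values():
        taxa.update(asserted)
    return sorted(taxa)


# Made-up examples: a tree with duplicates in two herbaria, and a trawl
# sample split between two museums.
tree = Entity("tree-1", True,
              {"specimen-A": ["Taxon X"], "specimen-B": ["Taxon Y"]})
trawl = Entity("trawl-1", False,
               {"sample-1": ["Taxon X"], "sample-2": ["Taxon Y"]})

print(applicable_identifications(tree, "specimen-A"))
print(applicable_identifications(trawl, "sample-1"))
```

For the tree, both Identifications come back for either specimen; for 
the trawl, each subsample keeps only the Identification made on it.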

So I'm not saying that the definition of Individual/BiologicalEntity 
MUST facilitate my competency question 3.  What I'm saying is that we 
MUST make it clear whether or not we intend for the proposed class to be 
restricted to taxonomically homogeneous entities (sensu darwin-sw) 
because that will determine whether the class will facilitate competency 
question 3 or not.  It would be a bad thing for different people to have 
different understandings about the restrictiveness of the term.  If 
there is a consensus that an Individual/BiologicalEntity should be 
taxonomically homogeneous, then to some extent that provides a practical 
functional definition of what kinds of things should qualify as 
Individuals/BiologicalEntities.  A herd of caribou, a clump of moss, a 
tree, a small uniform patch of herbaceous plants, a coral head, a tissue 
culture sample, and a network of slime mold would be 
Individuals/BiologicalEntities because there is a reasonable expectation 
that they are taxonomically homogeneous.  If there is a consensus that 
an Individual/BiologicalEntity need NOT be taxonomically homogeneous, 
then pretty much any kind of thing involving life would qualify: all of 
the animals in Yellowstone Park, all the jars sitting on shelves in the 
Smithsonian Institution, the Great Barrier Reef, etc. If the definition 
is broadened to that level, then I'm left wondering what competency 
questions the proposed class could still serve.

This email has now reached or exceeded the length that many 
people will take the time to read.  So I will draw it to a close and 
post a separate email on the topic of competency questions for John's 
proposed class "CollectionObject", which I believe addresses Rich's desire 
to track "real-world" objects (samples, re-samples, etc.). 


Bob Morris wrote:
> There is a series of jokes, and an entire TV quiz show, essentially
> starting from the meme "What is the question to which the answer is
> <X>".  Now, I am not a biologist (surprise!), so it is  likely that
> domain ignorance  leaves me unable to understand whether all the
> postings in the thread about new DwC term resolution  are arguing from
> the same set of questions their authors hope to have answered by a
> resolution of the term "Individual".  (It's even a little unclear to
> me whether everybody has the same notion of "resolution of a term",
> but that's a whole different discussion, which would contain a lot of
> uses of  "rdf:type" and the contentious "rdfs:domain").
> I speculate that lengthy term definition debates would be shorter if
> they started with agreement on competency questions for the term.
> Competency questions are sort of usage scenarios cast as questions.
> See http://marinemetadata.org/references/competencyquestionsoverview .
> Bob Morris
> On Thu, Jul 14, 2011 at 2:41 AM, Richard Pyle <deepreef at bishopmuseum.org> wrote:
>> My turn to disagree (strongly, in this case).  It's not an instance of a
>> taxon, it's an instance of an Organism.  A taxon is merely a non-factual
>> (i.e., opinion-based)  attribute of an organism, secondarily associated via
>> an Identification instance.
>> I could probably be comfortable with "OrganismInstance"; but in that case,
>> why not just "Organism" as Paul suggested?  Isn't "Instance" sort of implied
>> by all the classes?
>> I am certainly open to debate about where the "upper boundary" of an
>> instance of this class, and I agree that "population" could be interpreted
>> more as a low level of "taxon", rather than a high level of "organism".  But
>> I certainly don't think that instances of this class should be limited to a
>> singular organism.  Would a coral head then constitute thousands of
>> instances of this class?  Surely such colonies could be collapsed into a
>> single instance of this class.  And the same would likely also be useful for
>> colonies of insects (ants, termites, bees, etc.), as well as small groups
>> (pack of wolves, pod of whales, etc.); not to mention a specimen "lot" in a
>> Museum collection.
>> I agree it should have only *one* taxon, but that there should be no upper
>> limit on the rank of this taxon. If more than one taxon is identified, then
>> there needs to be a separate instance of this class for each identified
>> taxon.  But this only applies when multiple taxa are acknowledged -- it does
>> NOT restrict multiple taxa being linked to the same instance via multiple
>> identifications when there is a difference of opinion about what the correct
>> taxon identity should be.  In other words, an instance of this class may be
>> identified as "A" *or* "B", but could not legitimately be identified as "A"
>> *and* "B" simultaneously (except, perhaps in the case of hybrids, but that's
>> another situation altogether).
>> More later.
>> Aloha,
>> Rich
>>> -----Original Message-----
>>> From: tdwg-content-bounces at lists.tdwg.org [mailto:tdwg-content-
>>> bounces at lists.tdwg.org] On Behalf Of Gregor Hagedorn
>>> Sent: Tuesday, July 12, 2011 10:09 AM
>>> To: Steven J. Baskauf
>>> Cc: tdwg-content at lists.tdwg.org
>>> Subject: Re: [tdwg-content] New terms need resolution: "Individual"
>>>> represent a single taxon.  I think that Individual is probably not a
>>>> good name due to confusion with the technical use of that term
>>> elsewhere.
>>> TaxonInstance seems to me to be perhaps most precise.
>>> Personally I have a problem merging individual with population, since
>>> population -> metapopulation -> subspecies form a continuum in my
>>> understanding. But I am quite willing to be pragmatical :-)
>>> Gregor
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content

Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
