[tdwg-content] Plea for competency questions. Was Re: New terms need resolution: "Individual"
Steve Baskauf
steve.baskauf at vanderbilt.edu
Tue Jul 26 20:09:01 CEST 2011
I am back in the land of Internet access again after several weeks of
glorious lack of it.
There have been a number of posts regarding John Wieczorek's proposed
resolution of the "class Individual" proposal
(http://code.google.com/p/darwincore/issues/detail?id=69) through the
creation of the "BiologicalEntity" class
(http://lists.tdwg.org/pipermail/tdwg-content/2011-July/002575.html).
In these posts, two general issues were raised:
I. Competency questions, i.e. how will the creation of this class help
us do something useful?
II. What sorts of resources should be instances of this class, i.e. how
should the class be defined?
These general questions were previously discussed at length between
October 2009 and February 2011 concurrently with general questions about
the meaning of "Event", "Occurrence", "Taxon" and other existing DwC
classes. Since it requires many hours to review the many tdwg-content
posts during that period, I will refer to a summary at
http://code.google.com/p/darwin-sw/wiki/TdwgContentEmailSummary for
anyone who is interested in skimming the main points of that discussion.
When I proposed the Individual class, I wanted to explicitly recognize a
class of resources that would facilitate three things, which I suppose
are "competency questions" of a sort:
1. To allow for linking multiple Occurrence records that involved the
same organism at different times and/or places (i.e. the intended
purpose of the existing dwc:individualID term; i.e. to facilitate
resampling). Examples would be mark/recapture, radio tracking,
photo-identification of whales, tracking the status of a sessile
organism over time, etc.
2. To allow for the linking or grouping of multiple forms of evidence
associated with the same organism which may have been collected at the
same time and place (multiple forms of documentation for a single
occurrence) or during several Occurrences. Examples would be
collections of several images, several specimens, or both images and
specimens from the same organism.
3. To link multiple Identifications of the same individual organism,
particularly when these Identifications were based on different pieces
of evidence arising from the same individual. For example, if
"duplicate" specimens from the same organism ended up in different
museums they might be assigned different Identifications. An
Identification asserted for one specimen would apply to other specimens
from the same individual organism even if that Identification wasn't
explicitly assigned to the other specimens. I have referred to this
function as "inferring duplicates" - it could also be called "tracking
duplicates" if the two samples were known to have arisen from the same
individual organism before they were identified rather than after.
(Since I'm not really up on the technical definition of "competency
question" please excuse me if I'm misapplying the term to mean "what we
want something to be able to do".)
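The three functions above can be sketched in code. This is only a
minimal illustration, assuming a toy in-memory model: the class names,
field names, record identifiers, and taxon labels below are all
hypothetical and are not part of Darwin Core. The only idea borrowed
from the text is that a shared individual identifier (the role played by
dwc:individualID) is what ties Occurrences, evidence, and
Identifications together.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Identification:
    taxon: str          # hypothetical taxon label
    identified_by: str  # hypothetical identifier of the determiner


@dataclass
class Evidence:
    evidence_id: str
    kind: str           # e.g. "image" or "specimen"
    identifications: List[Identification] = field(default_factory=list)


@dataclass
class Occurrence:
    occurrence_id: str
    individual_id: str  # plays the role of dwc:individualID
    event_date: str     # ISO 8601, so string order is date order
    evidence: List[Evidence] = field(default_factory=list)


def occurrences_of(individual_id: str,
                   occurrences: List[Occurrence]) -> List[Occurrence]:
    """Function 1: every Occurrence of one organism, oldest first."""
    matches = [o for o in occurrences if o.individual_id == individual_id]
    return sorted(matches, key=lambda o: o.event_date)


def evidence_of(individual_id: str,
                occurrences: List[Occurrence]) -> List[Evidence]:
    """Function 2: all evidence from one organism, across Occurrences."""
    return [e for o in occurrences_of(individual_id, occurrences)
            for e in o.evidence]


def identifications_of(individual_id: str,
                       occurrences: List[Occurrence]) -> Set[str]:
    """Function 3: every taxon asserted for any evidence of the organism."""
    return {i.taxon for e in evidence_of(individual_id, occurrences)
            for i in e.identifications}


# Hypothetical records: one tree sampled twice, with two "duplicate"
# specimens that received different Identifications in two museums.
records = [
    Occurrence("occ-2", "tree-1", "2011-06-01",
               [Evidence("img-1", "image")]),
    Occurrence("occ-1", "tree-1", "2010-06-01", [
        Evidence("spec-1", "specimen",
                 [Identification("Taxon A", "museum X")]),
        Evidence("spec-2", "specimen",
                 [Identification("Taxon B", "museum Y")]),
    ]),
    Occurrence("occ-3", "tree-2", "2010-06-01"),
]
```

With these records, identifications_of("tree-1", records) collects both
"Taxon A" and "Taxon B", even though neither specimen was explicitly
given the other's Identification.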
As the discussion progressed, it became clear that others wanted the
proposed Individual class to do other things as well. In particular,
Rich Pyle articulated a desire to provide a mechanism for grouping in a
hierarchical manner pieces of physical evidence that were derived from
living organisms. These could include aggregates of organisms,
organisms, and pieces of organisms. After some rather heated
discussion, Kevin Richards put his finger on the critical distinction
between what I wanted the Individual class to do and what Rich wanted:
"it seems like Steve's idea for the Individual more closely resembles a
many-to-many joining table in a database (ie doesn't serve much use
other than connecting two tables/classes together - and doesn't normally
relate to a 'real world' type of object). Whereas it seems Rich's idea
is to relate it more to 'real-world' objects, such as samples,
re-samples, etc, to allow tracking and connectability of the
observed/collected/processed individuals..."
(http://lists.tdwg.org/pipermail/tdwg-content/2010-November/001956.html).
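Kevin's joining-table reading can be made concrete with a small sketch.
It assumes a hypothetical relational layout (the table names, column
names, and values are invented for illustration and come from no
existing DwC implementation) in which the Individual carries no
attributes of its own and exists only so that Occurrences and
Identifications can be joined through it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The Individual has a key and nothing else: it only connects rows.
    CREATE TABLE individual (individual_id TEXT PRIMARY KEY);
    CREATE TABLE occurrence (
        occurrence_id TEXT PRIMARY KEY,
        individual_id TEXT REFERENCES individual(individual_id),
        event_date    TEXT
    );
    CREATE TABLE identification (
        identification_id TEXT PRIMARY KEY,
        individual_id     TEXT REFERENCES individual(individual_id),
        taxon             TEXT
    );
""")

# Hypothetical data: one organism, two Occurrences, two Identifications.
conn.execute("INSERT INTO individual VALUES (?)", ("ind-1",))
conn.executemany(
    "INSERT INTO occurrence VALUES (?, ?, ?)",
    [("occ-1", "ind-1", "2010-05-01"), ("occ-2", "ind-1", "2011-05-01")],
)
conn.executemany(
    "INSERT INTO identification VALUES (?, ?, ?)",
    [("id-1", "ind-1", "Taxon A"), ("id-2", "ind-1", "Taxon B")],
)

# Joining through the shared key relates every Occurrence of the
# organism to every Identification of it (many-to-many).
rows = conn.execute("""
    SELECT o.occurrence_id, i.taxon
    FROM occurrence AS o
    JOIN identification AS i ON o.individual_id = i.individual_id
    ORDER BY o.occurrence_id, i.taxon
""").fetchall()
```

Here rows holds four pairs, one for each combination of Occurrence and
Identification. In this reading nothing about the "real-world" object is
recorded; the hierarchical tracking that Rich describes would need
additional structure.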
After thinking about this for a very long time, I've become convinced
that Kevin was on the right track. It seems to me that what we have
here is two different sets of "competency questions" which define sets
of entities that overlap but which are not congruent. An actual single,
live organism can serve both as a unit for resampling and "attachment"
of Identifications, AND as an organizational unit that is part of and
which has parts that are biological samples. Some other entities, such
as cohesive packs/herds and clonal organisms can also serve both
purposes. Other entities cannot: it doesn't make sense to resample dead
organisms or pieces of organisms, and an identification applied to part
of a taxonomically heterogeneous unit (e.g. a mixed flock of birds)
cannot be reliably inferred to apply to another part of the same unit.
Neither intended purpose/definition of "Individual" (the many-to-many
database join or the mechanism for hierarchical grouping of physical
evidence) is intrinsically "wrong"; they simply facilitate different
competency questions.
So I believe that question II (how do you define the
class?) is better answered by saying "instances of the class are
resources that facilitate the competency questions" than by basing the
definition on a philosophical discussion of what people imagine an
"Individual" or "BiologicalEntity" to be (I think I'm agreeing with the
point Bob was trying to make). I think that at a minimum, functions 1
and 2 (defining an entity to which multiple occurrences can be linked
and to which multiple forms of evidence can be linked) must be
accommodated by the Individual/BiologicalEntity class; the utility of
those two functions is certainly implied by the already-existing term
dwc:individualID .
Personally, I would like for the third function ("inferring duplicates";
linking multiple Identifications to the same entity and being assured
that all Identifications of the same Individual would apply to all
artifacts associated with that Individual) to be accommodated by the
definition, but the current state of the discussion is unclear on this
point. The sticking point is related to the definition of "taxonomic
homogeneity". When I used that term, I intended for it to mean that the
entity is believed to be homogeneous to the lowest possible level in the
way that one knows that two branches from the same tree or two parts of
the same clonal organism are guaranteed to have the same taxonomic
identity at every level. This is NOT the way Rich was using the term -
he explained that he intended that a "taxonomically homogeneous"
biological entity could be an aggregate of organisms that are known to
be heterogeneous at a lower taxonomic level but that for whatever reason
were identified at a higher taxonomic level common to all organisms
(e.g. we know that the five fish in this jar are different species of
fish, but we are going to identify the lot as class=Actinopterygii for
the time being). However, Cam and I have previously defined such
entities as "taxonomically heterogeneous" (see
http://code.google.com/p/darwin-sw/wiki/TaxonomicHeterogeneity ). I
would prefer that "taxonomic homogeneity" be restricted to the
definition that I intended simply because I don't know of any other
thing to call something that is believed to be taxonomically homogeneous
to the lowest possible level. The reason why this is important is that
if an Individual/BiologicalEntity is allowed to be taxonomically
heterogeneous (sensu darwin-sw) then one cannot infer that discovered
"duplicates" share all Identifications given to any duplicate pieces of
evidence, whereas if an Individual/BiologicalEntity is required to be
taxonomically homogeneous (sensu darwin-sw, NOT sensu Pyle) then one can
make that inference. This is a circumstance that occurs fairly
regularly when comparing specimens in different herbaria - one discovers
two "duplicate" specimens that have been assigned different
Identifications and one infers that both Identifications apply to the
source tree, clump of moss, small population of herbs, etc. (i.e.
Individual/BiologicalEntity). On the other hand, if a taxonomically
heterogeneous (sensu darwin-sw) marine trawl sample is subdivided into
two samples that are sent to two museums, one could not safely infer
that an Identification made based on one sample could be applied to the
other sample. The two subsamples may contain sets of fish that can be
identified to taxa that are at a lower level than the initial
Identification of the conglomerate sample and which are not the same.
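The difference between the two cases can be expressed as a rule in code.
The sketch below is only an illustration under stated assumptions: the
Entity class, its homogeneous flag, and the specimen and taxon labels
are hypothetical, and the flag is taken to mean homogeneity sensu
darwin-sw. Identifications are pooled across duplicates only when that
flag is set.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Entity:
    entity_id: str
    homogeneous: bool  # taxonomically homogeneous sensu darwin-sw
    # specimen id -> taxa asserted directly for that specimen
    asserted: Dict[str, Set[str]] = field(default_factory=dict)


def inferred_identifications(entity: Entity) -> Dict[str, Set[str]]:
    """Return the taxa that can be taken to apply to each specimen."""
    if not entity.homogeneous:
        # Heterogeneous unit: no inference, each specimen keeps only
        # the Identifications asserted for it directly.
        return {sid: set(taxa) for sid, taxa in entity.asserted.items()}
    # Homogeneous unit: every Identification applies to every specimen.
    pooled = set().union(*entity.asserted.values())
    return {sid: set(pooled) for sid in entity.asserted}


# Hypothetical herbarium duplicates taken from a single tree.
tree = Entity("tree-1", True, {
    "herbarium-X-001": {"Taxon A"},
    "herbarium-Y-417": {"Taxon B"},
})

# Hypothetical trawl sample split between two museums.
trawl = Entity("trawl-7", False, {
    "museum-X-jar": {"Taxon C"},
    "museum-Y-jar": {"Taxon D"},
})

tree_result = inferred_identifications(tree)
trawl_result = inferred_identifications(trawl)
```

For the tree, both specimens end up carrying both Identifications; for
the trawl sample, each jar keeps only its own.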
So I'm not saying that the definition of Individual/BiologicalEntity
MUST facilitate my competency question 3. What I'm saying is that we
MUST make it clear whether or not we intend for the proposed class to be
restricted to taxonomically homogeneous entities (sensu darwin-sw)
because that will determine whether the class will facilitate competency
question 3 or not. It would be a bad thing for different people to have
different understandings about the restrictiveness of the term. If
there is a consensus that an Individual/BiologicalEntity should be
taxonomically homogeneous, then to some extent that provides a practical
functional definition of what kinds of things should qualify as
Individuals/BiologicalEntities. A herd of caribou, a clump of moss, a
tree, a small uniform patch of herbaceous plants, a coral head, a tissue
culture sample, and a network of slime mold would be
Individuals/BiologicalEntities because there is a reasonable expectation
that they are taxonomically homogeneous. If there is a consensus that
an Individual/BiologicalEntity need NOT be taxonomically homogeneous,
then pretty much any kind of thing involving life would qualify: all of
the animals in Yellowstone Park, all the jars sitting on shelves in the
Smithsonian Institution, the Great Barrier Reef, etc. If the definition
is broadened to that level, then I'm left wondering what competency
questions the proposed class could still serve.
This email is now at or has exceeded the length of an email that many
people will take the time to read. So I will draw it to a close and
post a separate email on the topic of competency questions for John's
proposed class "CollectionObject", which I believe addresses Rich's desire
to track "real-world" objects (samples, re-samples, etc.).
Steve
Bob Morris wrote:
> There is a series of jokes, and an entire TV quiz show, essentially
> starting from the meme "What is the question to which the answer is
> <X>". Now, I am not a biologist (surprise!), so it is likely that
> domain ignorance leaves me unable to understand whether all the
> postings in the thread about new DwC term resolution are arguing from
> the same set of questions their authors hope to have answered by a
> resolution of the term "Individual". (It's even a little unclear to
> me whether everybody has the same notion of "resolution of a term",
> but that's a whole different discussion, which would contain a lot of
> uses of "rdf:type" and the contentious "rdfs:domain").
>
> I speculate that lengthy term definition debates would be shorter if
> they started with agreement on competency questions for the term.
> Competency questions are sort of usage scenarios cast as questions.
> See http://marinemetadata.org/references/competencyquestionsoverview .
>
> Bob Morris
>
>
>
> On Thu, Jul 14, 2011 at 2:41 AM, Richard Pyle <deepreef at bishopmuseum.org> wrote:
>
>> My turn to disagree (strongly, in this case). It's not an instance of a
>> taxon, it's an instance of an Organism. A taxon is merely a non-factual
>> (i.e., opinion-based) attribute of an organism, secondarily associated via
>> an Identification instance.
>>
>> I could probably be comfortable with "OrganismInstance"; but in that case,
>> why not just "Organism" as Paul suggested? Isn't "Instance" sort of implied
>> by all the classes?
>>
>> I am certainly open to debate about where the "upper boundary" of an
>> instance of this class, and I agree that "population" could be interpreted
>> more as a low level of "taxon", rather than a high level of "organism". But
>> I certainly don't think that instances of this class should be limited to a
>> singular organism. Would a coral head then constitute thousands of
>> instances of this class? Surely such colonies could be collapsed into a
>> single instance of this class. And the same would likely also be useful for
>> colonies of insects (ants, termites, bees, etc.), as well as small groups
>> (pack of wolves, pod of whales, etc.); not to mention a specimen "lot" in a
>> Museum collection.
>>
>> I agree it should have only *one* taxon, but that there should be no upper
>> limit on the rank of this taxon. If more than one taxon is identified, then
>> there needs to be a separate instance of this class for each identified
>> taxon. But this only applies when multiple taxa are acknowledged -- it does
>> NOT restrict multiple taxa being linked to the same instance via multiple
>> identifications when there is a difference of opinion about what the correct
>> taxon identity should be. In other words, an instance of this class may be
>> identified as "A" *or* "B", but could not legitimately be identified as "A"
>> *and* "B" simultaneously (except, perhaps in the case of hybrids, but that's
>> another situation altogether).
>>
>> More later.
>>
>> Aloha,
>> Rich
>>
>>
>>> -----Original Message-----
>>> From: tdwg-content-bounces at lists.tdwg.org [mailto:tdwg-content-
>>> bounces at lists.tdwg.org] On Behalf Of Gregor Hagedorn
>>> Sent: Tuesday, July 12, 2011 10:09 AM
>>> To: Steven J. Baskauf
>>> Cc: tdwg-content at lists.tdwg.org
>>> Subject: Re: [tdwg-content] New terms need resolution: "Individual"
>>>
>>>
>>>> represent a single taxon. I think that Individual is probably not a
>>>> good name due to confusion with the technical use of that term
>>>>
>>> elsewhere.
>>>
>>> TaxonInstance seems to me to be perhaps most precise.
>>> Personally I have a problem merging individual with population, since
>>> population -> metapopulation -> subspecies form a continuum in my
>>> understanding. But I am quite willing to be pragmatical :-)
>>>
>>> Gregor
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu