[tdwg-content] Treatise on Occurrence, tokens, and basisOfRecord

Tue Oct 26 08:32:01 CEST 2010

Hi Steve,

I read every word, and don't really disagree with anything.  I think the key
point is this:

"separate the token (evidence) from the Occurrence no matter what kind of
evidence the token is"

This is the main reason I've been quick to support your notion of a separate
"Individual" (sensu lato) class, and why certain properties of Occurrence
should port over to that Class.

Aloha,
Rich

________________________________

	From: tdwg-content-bounces at lists.tdwg.org
[mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Steve Baskauf
	Sent: Monday, October 25, 2010 7:32 PM
	To: Cam Webb
	Cc: tdwg-content at lists.tdwg.org
	Subject: Re: [tdwg-content] Treatise on Occurrence, tokens, and
basisOfRecord

	This is a composite response to several posts by Cam, Rich, and Jim.

	This thread has been extremely enlightening to me for several
reasons.  One is that as a "right brain" type person, the evolving diagram
of the relationships among the Darwin Core classes (i.e.
http://bioimages.vanderbilt.edu/pages/token-explicit.gif) has really
clarified some things in my mind.  The other reason is that the thread has
convinced me that the best approach is to clearly separate things
conceptually and avoid "overloading" the terms and classes by expecting them
to simultaneously accomplish too many different things.  Although this
overloading may be convenient from the standpoint of how we like to think
about "things" (a.k.a. resources), it causes problems when we try to
explicitly define the properties and relationships of those resources.  In
particular, I'm thinking of trying to have classes both "be" and "do" two
things at once.

	Some of the disagreement that has emerged regarding Occurrences
comes from what we (based on our different personal experiences) think that
an Occurrence should BE.  I think that a more productive approach would be
to ask "what do we want Occurrences to DO?"  I will illustrate that approach
with the case of the proposed class Individual, then try to see what this
approach tells us about how Occurrences should be defined.  

	Initially, I wanted to think of instances of the proposed class
Individual as actual biological individuals.  That was, in most cases, what
I was interested in tracking.  However, when I considered what I wanted the
record for an Individual to "do", I realized that many times it was useful to
consider an "individual" to include small populations of organisms of the
same taxon (species or lower rank if it exists; assume this when I say
"taxon" here).  Sometimes this was just convenient and sometimes it was
necessary (like in the case of moss) because I couldn't tell where one
biological individual ended and another began.  When I began to try to map
out what I meant by an Individual (in terms of diagrams or in RDF), it
became clear to me that what I really was interested in was a way to connect
multiple Occurrences to (possibly) multiple determinations.  That's why I
included in my paper's title "... as resource relationship nodes", i.e. as a
way to connect those things.  Since the beginning of this recent thread, it
has been even clearer that the functional approach to defining Individuals
defines them better than any conceptual idea that I had about what an
individual was.  The consensus definition of an Occurrence seemed to be
something like "a record that a taxon representative occurred at a
particular location at a particular time".  "Taxon representative" could
legitimately include any unit that could reliably be said to represent a single
taxon, from a single biological organism to a small group as long as one
could be reasonably sure that all of the biological individuals in that
group were of the same taxon.  If (as someone noted) the group of biological
individuals got big enough that it included (perhaps by accident) several
species, then it was too large and needed to be split into smaller groups
where only a single taxon was included.  If that group were to be resampled
at a later time (as individualID was designed to facilitate), then the group
would need to have some kind of stability (like plants growing together or a
stable herd of animals).  The point I'm trying to get at here is that the
useful way of defining Individual is to define it in a way that it "does"
what we want: connect Occurrences to Determinations in a way that allows for
resampling (which is functionally equivalent to saying multiple Occurrences
per Individual).  That is far more productive than trying to make a
philosophical argument about what constitutes an individual, or what we
would like for an individual to "be".  
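
	A minimal sketch of the "relationship node" idea described above, in Python. The class and field names (Individual, occurrences, determinations, is_resampled) and all sample values are hypothetical illustrations, not Darwin Core terms or an agreed implementation. The only point is that the Individual carries links, and that resampling amounts to more than one Occurrence per Individual.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Determination:
    taxon: str          # name asserted by whoever made the determination
    determined_by: str


@dataclass
class Occurrence:
    occurrence_id: str
    event_date: str     # when the taxon representative was recorded


@dataclass
class Individual:
    """Hypothetical relationship node: it only links other records."""
    individual_id: str
    occurrences: List[Occurrence] = field(default_factory=list)
    determinations: List[Determination] = field(default_factory=list)

    def is_resampled(self) -> bool:
        # Resampling is modeled as more than one Occurrence per Individual.
        return len(self.occurrences) > 1


# A stable patch of moss treated as one "individual", recorded twice.
moss_patch = Individual("ind-001")
moss_patch.occurrences.append(Occurrence("occ-1", "2009-05-01"))
moss_patch.occurrences.append(Occurrence("occ-2", "2010-05-01"))
moss_patch.determinations.append(Determination("Taxon A", "Determiner 1"))
```

	Whether the "individual" is one organism or a small single-taxon group makes no difference to this structure, which is the sense in which the functional definition sidesteps the philosophical one.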

	Applying this approach to Occurrence, we should ask the functional
question "What do we want Occurrences to do?" rather than "What do we think
that they are?"  Let's return to the diagram which seems to be the current
favorite model: http://bioimages.vanderbilt.edu/pages/token-explicit.gif .
If the "consensus" definition of an Occurrence is that it tells us that a
taxon representative was at a particular location at a particular time (and
if we accept that Event represents a time and a Location), then what we want
Occurrence to "do" is to act as a node that connects an Event to an
Individual (i.e. the taxon representative).  There also seems to be a
consensus that we would, if possible, like to associate Occurrence records
with evidence that supports them (called "tokens" by me).  Thus we can
expand the description of what we want an Occurrence to "do" to include
connecting one or more tokens to an Event and an Individual.  I submit that
we should really forget about whether we think that specimens are somehow
more representative of the Individual than sounds, photos, etc. or not.  The
bottom line is that what we need an Occurrence record to do is to act as a
conceptual resource that connects an Event, an Individual, and zero to many
tokens (or one to many tokens if a memory is considered a token).  
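
	The functional definition of Occurrence given above can be sketched the same way. This is an illustration under assumptions: the Python names (Event, Token, kind, is_documented) and the sample values are hypothetical, and the model takes the "zero to many tokens" reading rather than counting a memory as a token.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    event_date: str
    location: str


@dataclass
class Individual:
    individual_id: str


@dataclass
class Token:
    token_id: str
    kind: str  # "specimen", "image", "sound", ...; no kind is privileged


@dataclass
class Occurrence:
    """Hypothetical node joining one Event, one Individual, 0..n tokens."""
    occurrence_id: str
    event: Event
    individual: Individual
    tokens: List[Token] = field(default_factory=list)

    def is_documented(self) -> bool:
        # True when at least one piece of evidence is attached.
        return bool(self.tokens)


event = Event("2010-10-25", "hypothetical site")
undocumented = Occurrence("occ-1", event, Individual("ind-001"))
sighting = Occurrence("occ-2", event, Individual("ind-002"))
sighting.tokens.append(Token("img-1", "image"))
```

	A specimen and a photograph enter this structure in exactly the same way, as entries in the tokens list, so nothing in the Occurrence itself depends on the kind of evidence.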

	By this functional definition, we can clearly say what an Occurrence
is (a resource of the type dwctype:Occurrence) and say what its properties
are (ones that always have a one-to-one relationship with a single
occurrence, such as recordedBy).  If we take a philosophical approach to
defining an occurrence and say that specimen metadata should be included
with Occurrence metadata because somehow specimens better represent the
individuals than "representations" like images, then we have a mess.  We
would have to say that an Occurrence has dwctype:Occurrence, but that it's
also a resource of dwctype:PreservedSpecimen, except of course if it's an
observation, in which case it's NOT also dwctype:PreservedSpecimen.  We
would have to say that Occurrences can always have a recordedBy property,
but that only some of them will have a dwc:preparations or dwc:disposition
property.  It seems to me that it would be far
simpler and semantically clearer to just say that an occurrence is a
dwctype:Occurrence with properties that only occurrences have, that a
specimen is a dwctype:PreservedSpecimen with properties that only specimens
have, and that an image is a dctype:StillImage with properties that MRTG
says it has.  In other words, separate the token (evidence) from the
Occurrence no matter what kind of evidence the token is.  
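
	One way to picture the separation argued for above is a per-type property check. This is only a sketch: the ALLOWED table and the misplaced_properties function are hypothetical, the property lists contain just the terms named in this message, and dctype:StillImage is left out because its properties are whatever MRTG specifies.

```python
# Hypothetical mapping from resource type to the properties that belong to it.
# Only the terms mentioned in this message are listed; it is not a full schema.
ALLOWED = {
    "dwctype:Occurrence": {"recordedBy"},
    "dwctype:PreservedSpecimen": {"preparations", "disposition"},
}


def misplaced_properties(resource_type, properties):
    """Return the properties that do not belong to the given resource type."""
    return sorted(set(properties) - ALLOWED[resource_type])


# An "overloaded" Occurrence record that also carries specimen metadata.
overloaded = misplaced_properties(
    "dwctype:Occurrence", ["recordedBy", "preparations"]
)

# The same metadata, placed on the specimen token instead.
separated = misplaced_properties(
    "dwctype:PreservedSpecimen", ["preparations", "disposition"]
)
```

	With the token kept separate, each type has a fixed property list and no "sometimes it has this, sometimes it doesn't" cases arise on the Occurrence.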

	I was thinking about walking out onto the dwctype:LivingSpecimen
minefield tonight (because I think it is related to this issue), but decided
that I would rather hold off until somebody who was involved in the
development of the current and previous incarnations of DwC explains exactly
what dwc:basisOfRecord is for (since LivingSpecimen is a controlled value
for basisOfRecord).  I think there is danger of me blowing myself up (i.e.
making an idiot of myself) if I don't know the answer to that question
first.  However, since those people may not be reading the detailed posts,
I'm going to post that question as a separate item.

	Steve