[tdwg-content] Treatise on Occurrence, tokens, and basisOfRecord

Steve Baskauf steve.baskauf at vanderbilt.edu
Tue Oct 26 07:31:37 CEST 2010


This is a composite response to several posts by Cam, Rich, and Jim. 

This thread has been extremely enlightening to me for several reasons.  
One is that as a "right brain" type person, the evolving diagram of the 
relationships among the Darwin Core classes (i.e. 
http://bioimages.vanderbilt.edu/pages/token-explicit.gif) has really 
clarified some things in my mind.  The other reason is that the thread 
has convinced me that the best approach is to clearly separate things 
conceptually and avoid "overloading" the terms and classes by expecting 
them to simultaneously accomplish too many different things.  Although 
this overloading may be convenient from the standpoint of how we like to 
think about "things" (a.k.a. resources), it causes problems when we try 
to explicitly define the properties and relationships of those 
resources.  In particular, I'm thinking of trying to have classes both 
"be" and "do" two things at once.

Some of the disagreement that has emerged regarding Occurrences comes 
from what we (based on our different personal experiences) think that an 
Occurrence should BE.  I think that a more productive approach would be 
to ask "what do we want Occurrences to DO?"  I will illustrate that 
approach with the case of the proposed class Individual, then try to see 
what this approach tells us about how Occurrences should be defined. 

Initially, I wanted to think of instances of the proposed class 
Individual as actual biological individuals.  That was, in most cases, 
what I was interested in tracking.  However, when I considered what I 
wanted the record for an Individual to "do" I realized that many times 
it was useful to consider an "individual" to include small populations 
of organisms of the same taxon (species or lower rank if it exists; 
assume this when I say "taxon" here).  Sometimes this was just 
convenient and sometimes it was necessary (like in the case of moss) 
because I couldn't tell where one biological individual ended and 
another began.  When I began to try to map out what I meant by an 
Individual (in terms of diagrams or in RDF), it became clear to me that 
what I really was interested in was a way to connect multiple 
Occurrences to (possibly) multiple determinations.  That's why I 
included in my paper's title "... as resource relationship nodes", i.e. 
as a way to connect those things.  Since the beginning of this recent 
thread, it has been even clearer that the functional approach to 
defining Individuals defines them better than any conceptual idea that I 
had about what an individual was.  The consensus definition of an 
Occurrence seemed to be something like "a record that a taxon 
representative occurred at a particular location at a particular time".  
"Taxon representative" could legitimately include any unit that could 
reliably be said to represent a single taxon, from a single biological 
organism to a small group as long as one could be reasonably sure that 
all of the biological individuals in that group were of the same taxon.  
If (as someone noted) the group of biological individuals got big enough 
that it included (perhaps by accident) several species, then it was too 
large and needed to be split into smaller groups where only a single 
taxon was included.  If that group were to be resampled at a later time 
(as individualID was designed to facilitate), then the group would need 
to have some kind of stability (like plants growing together or a stable 
herd of animals).  The point I'm trying to get at here is that the 
useful way of defining Individual is to define it so that it 
"does" what we want: connect Occurrences to Determinations in a way that 
allows for resampling (which is functionally equivalent to saying 
multiple Occurrences per Individual).  That is far more productive than 
trying to make a philosophical argument about what constitutes an 
individual, or what we would like for an individual to "be". 
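
The functional view of Individual described above can be sketched in 
code.  This is only a minimal illustration, not part of Darwin Core: the 
Python class names, field names, and sample values are all hypothetical, 
and the sketch assumes that every Determination made for an Individual 
applies to each of its Occurrences.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Determination:
    taxon: str           # name applied to the Individual (placeholder values below)
    identified_by: str


@dataclass
class Occurrence:
    occurrence_id: str
    event_date: str      # when the taxon representative was recorded


@dataclass
class Individual:
    """Node that connects many Occurrences to many Determinations."""
    individual_id: str
    occurrences: List[Occurrence] = field(default_factory=list)
    determinations: List[Determination] = field(default_factory=list)

    def determinations_for(self, occurrence_id: str) -> List[Determination]:
        """Return the Determinations reachable from one Occurrence via this node."""
        if any(o.occurrence_id == occurrence_id for o in self.occurrences):
            return list(self.determinations)
        return []


# A moss patch treated as one "individual" and resampled a year later.
moss_patch = Individual("ind-1")
moss_patch.occurrences.append(Occurrence("occ-1", "2009-05-01"))
moss_patch.occurrences.append(Occurrence("occ-2", "2010-05-03"))  # resampling
moss_patch.determinations.append(Determination("Taxon A", "Identifier 1"))
moss_patch.determinations.append(Determination("Taxon B", "Identifier 2"))

# Both determinations apply to either occurrence, because they hang off the node.
for det in moss_patch.determinations_for("occ-2"):
    print(det.taxon, det.identified_by)
```

Nothing in the sketch says whether "ind-1" is one organism or a small 
group; it only records what the node connects.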

Applying this approach to Occurrence, we should ask the functional 
question "What do we want Occurrences to do?" rather than "What do we 
think that they are?"  Let's return to the diagram which seems to be the 
current favorite model: 
http://bioimages.vanderbilt.edu/pages/token-explicit.gif .  If the 
"consensus" definition of an Occurrence is that it tells us that a taxon 
representative was at a particular location at a particular time (and if 
we accept that Event represents a time and a Location), then what we 
want Occurrence to "do" is to act as a node that connects an Event to an 
Individual (i.e. the taxon representative).  There also seems to be a 
consensus that we would, if possible, like to associate Occurrence 
records with evidence that supports them (called "tokens" by me).  Thus 
we can expand the description of what we want an Occurrence to "do" to 
include connecting one or more tokens to an Event and an Individual.  I 
submit that we should really forget about whether we think that 
specimens are somehow more representative of the Individual than sounds, 
photos, etc. or not.  The bottom line is that what we need an Occurrence 
record to do is to act as a conceptual resource that connects an Event, 
an Individual, and zero to many tokens (or one to many tokens if a 
memory is considered a token). 
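
As a rough sketch of that functional definition, an Occurrence can be 
written as a record that holds exactly one Event, exactly one 
Individual, and a list of zero or more tokens.  The Python names and 
sample values below are hypothetical and are not Darwin Core terms, 
apart from the type names quoted as strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    location: str
    date: str


@dataclass(frozen=True)
class Token:
    token_id: str
    token_type: str   # e.g. "PreservedSpecimen", "StillImage", "Sound"


@dataclass
class Occurrence:
    """Connects one Event, one Individual, and zero to many tokens."""
    occurrence_id: str
    individual_id: str
    event: Event
    recorded_by: str
    tokens: List[Token] = field(default_factory=list)

    def is_documented(self) -> bool:
        """True if at least one token (piece of evidence) is attached."""
        return len(self.tokens) > 0


visit = Event(location="Site 1", date="2010-10-25")

# A human observation: an Occurrence with no token at all.
sighting = Occurrence("occ-10", "ind-1", visit, recorded_by="Observer 1")

# The same kind of record backed by a specimen and an image.
# Neither token type is privileged; both are just evidence.
collected = Occurrence(
    "occ-11", "ind-2", visit, recorded_by="Observer 1",
    tokens=[Token("tok-1", "PreservedSpecimen"), Token("tok-2", "StillImage")],
)

print(sighting.is_documented(), collected.is_documented())
```

The specimen and the image sit in the same list, so the Occurrence 
itself looks the same whatever kind of evidence supports it.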

By this functional definition, we can clearly say what an Occurrence is 
(a resource of the type dwctype:Occurrence) and say what its properties 
are (ones that always have a one-to-one relationship with a single 
occurrence, such as recordedBy).  If we take a philosophical approach to 
defining an occurrence and say that specimen metadata should be included 
with Occurrence metadata because somehow specimens better represent the 
individuals than "representations" like images, then we have a mess.  We 
would have to say that an Occurrence is of type dwctype:Occurrence, but 
that it's also a resource of type dwctype:PreservedSpecimen, except of 
course if it's an observation, in which case it's NOT also 
dwctype:PreservedSpecimen.  We would have to say that Occurrences can 
always have a recordedBy property, but that they will sometimes have a 
dwc:preparations or a dwc:disposition property and sometimes 
won't.  It seems to me that it would be far simpler and semantically 
clearer to just say that an occurrence is a dwctype:Occurrence with 
properties that only occurrences have, that a specimen is a 
dwctype:PreservedSpecimen with properties that only specimens have, and 
that an image is a dctype:StillImage with properties that MRTG says it 
has.  In other words, separate the token (evidence) from the Occurrence 
no matter what kind of evidence the token is. 
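
The difference between the two approaches can be illustrated with a 
small check.  The property-to-class assignments in this sketch are an 
assumption made only to illustrate the separation argued for above; they 
are not domain declarations taken from Darwin Core, and the function 
name is hypothetical.

```python
from typing import Dict, List

# Assumed assignment of each property to the one class that may carry it.
PROPERTY_DOMAIN: Dict[str, str] = {
    "dwc:recordedBy": "dwctype:Occurrence",
    "dwc:preparations": "dwctype:PreservedSpecimen",
    "dwc:disposition": "dwctype:PreservedSpecimen",
}


def misplaced_properties(resource_type: str, properties: Dict[str, str]) -> List[str]:
    """List properties whose assumed class differs from the resource's type.

    Properties with no entry in PROPERTY_DOMAIN are not checked.
    """
    return sorted(
        name for name in properties
        if PROPERTY_DOMAIN.get(name) not in (None, resource_type)
    )


# Overloaded record: specimen metadata stored on the Occurrence itself.
overloaded = {"dwc:recordedBy": "Observer 1", "dwc:preparations": "pressed"}
print(misplaced_properties("dwctype:Occurrence", overloaded))

# Separated records: each resource carries only its own properties.
occurrence = {"dwc:recordedBy": "Observer 1"}
specimen = {"dwc:preparations": "pressed", "dwc:disposition": "in collection"}
print(misplaced_properties("dwctype:Occurrence", occurrence))
print(misplaced_properties("dwctype:PreservedSpecimen", specimen))
```

With the token separated from the Occurrence, the check needs no 
special case for observations, since an observation simply has no 
specimen resource attached.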

I was thinking about walking out onto the dwctype:LivingSpecimen 
minefield tonight (because I think it is related to this issue), but 
decided that I would rather hold off until somebody who was involved in 
the development of the current and previous incarnations of DwC explains 
exactly what dwc:basisOfRecord is for (since LivingSpecimen is a 
controlled value for basisOfRecord).  I think there is danger of me 
blowing myself up (i.e. making an idiot of myself) if I don't know the 
answer to that question first.  However, since those people may not be 
reading the detailed posts, I'm going to post that question as a 
separate item.

Steve

Cam Webb wrote:
> Dear Steve,
>
> Thanks for this clear and compelling argument in favor of Occurrences 
> being different from the tokens created in their documentation.
>
>   
>> So I'll return to the basic question: is the consensus for modeling the 
>> relationship between an Occurrence and associated token(s) the assumed 
>> token model:
>> ...
>> or the explicit token model
>>     
>
> Having no long personal history of use of Occurrence, and with respect for 
> the huge amount of work that crafting the DwC terms must have taken, but 
> having tried semantic modeling (in a previous post) using the overloaded 
> term Occurrence, I for one vote for the latter, as conceptually clearer. 
> A specimen is then a Specimen, an image an Image, and so on.
>
> But then what exactly are the Occurrences themselves?  From Richard Pyle:
>
>    ``So, an Occurrence is the intersection of an Individual and an Event.
>    An Event is a Location+Time[+other metadata].  Each Event may have
>    multiple Occurrences (i.e., one for each distinct Individual at the same
>    Location+Time).  Also, an Individual may have multiple Occurrences (one
>    for each Event at which the same Individual was documented).''
>
> So the Occurrence is the Individual _itself_ bounded by space and time, 
> the latter data currently recorded in the Event class.  What I then want 
> to ask is, 1. do the terms for clearly defining the bounds of the 
> Occurrence already exist?  There exist terms for spatial uncertainty: 
> dwc:coordinateUncertaintyInMeters, and coarse ones for temporal bounds: 
> startDayOfYear + endDayOfYear, but not for temporal uncertainty, or 
> spatial bounds (but see Pete's 
> http://lod.taxonconcept.org/ontology/dwc_area.owl).  Also, 2. if there was 
> a consensus for moving to the `explicit token' model, should the 
> space-time bounds of the Occurrence still be contained in an associated 
> (often blank) Event, or accepted as properties of the Occurrence itself 
> (e.g., occurrenceDate, occurrenceDuration, occurrenceLocation, 
> occurrenceRadius)?  I would support the latter.
>
> Finally, 3. if there was a consensus for moving to the `explicit token' 
> model, and a human observation was a token-less Occurrence, would we best 
> specify who made the observation with dwc:recordedBy and what the 
> observation was with dwc:occurrenceRemarks, or would it be better to 
> create a second new token (along with `Physical specimens') that was an 
> explicit Observation class, that would link explicitly to, say, an 
> external observational ontology (i.e., OBOE)?  The issue of GUIDs for 
> non-physical observations comes up, but this could still be solved in 
> various ways.
>
> Stepping back from the details for a moment, and reading some of the 
> replies to Steve's post that have come in, I am wondering how many readers 
> are thinking, ``the need for a semantic web standard for biodiversity 
> information might be better achieved by a deep fork of Darwin Core, 
> adopting new Classes and explicit domains and ranges for each term, to 
> create a `Darwin SW,' rather than by an effort to evolve Darwin Core 
> itself.''  I'm sure the question of forking Darwin Core has come up 
> before, and I'm sure the discussion was passionate!
>
> Best,
>
> Cam

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
