[tdwg-content] Treatise on Occurrence, tokens, and basisOfRecord

Mon Oct 25 10:37:03 CEST 2010

Dear Steve,

Thanks for this clear and compelling argument in favor of Occurrences 
being different from the tokens created in their documentation.

> So I'll return to the basic question: is the consensus for modeling the 
> relationship between an Occurrence and associated token(s) the assumed 
> token model:
> ...
> or the explicit token model

I have no long personal history of using Occurrence, and I respect the 
huge amount of work that crafting the DwC terms must have taken.  But 
having tried semantic modeling with the overloaded term Occurrence (in a 
previous post), I for one vote for the latter as conceptually clearer. 
A specimen is then a Specimen, an image an Image, and so on.

But then what exactly are the Occurrences themselves?  From Richard Pyle:

   ``So, an Occurrence is the intersection of an Individual and an Event.
   An Event is a Location+Time[+other metadata].  Each Event may have
   multiple Occurrences (i.e., one for each distinct Individual at the same
   Location+Time).  Also, an Individual may have multiple Occurrences (one
   for each Event at which the same Individual was documented).''
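
To make Richard's definition concrete for myself, here is a minimal Python 
sketch of Occurrence as the pairing of an Individual with an Event.  The 
class and field names (individual_id, location, event_date) and the sample 
records are my own illustrative assumptions, not DwC terms or real data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Individual:
    individual_id: str  # hypothetical identifier, not a DwC term


@dataclass(frozen=True)
class Event:
    location: str      # stands in for Location (assumed free text here)
    event_date: date   # stands in for Time


@dataclass(frozen=True)
class Occurrence:
    # An Occurrence is one Individual documented at one Event.
    individual: Individual
    event: Event


# Invented sample data, purely for illustration.
tree = Individual("ind-1")
bird = Individual("ind-2")
visit_1 = Event("plot A", date(2010, 10, 1))
visit_2 = Event("plot A", date(2010, 10, 20))

occurrences = [
    Occurrence(tree, visit_1),
    Occurrence(bird, visit_1),  # second Individual at the same Event
    Occurrence(tree, visit_2),  # same Individual at a later Event
]


def occurrences_at(event):
    """All Occurrences recorded at a given Event."""
    return [o for o in occurrences if o.event == event]


def occurrences_of(individual):
    """All Occurrences of a given Individual."""
    return [o for o in occurrences if o.individual == individual]
```

Here occurrences_at(visit_1) and occurrences_of(tree) each return two 
records, which is the many-to-many reading of the quoted text.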

So the Occurrence is the Individual _itself_, bounded by space and time, 
with the latter data currently recorded in the Event class.  This raises 
some questions.  1. Do the terms for clearly defining the bounds of the 
Occurrence already exist?  There exist terms for spatial uncertainty 
(dwc:coordinateUncertaintyInMeters) and coarse ones for temporal bounds 
(startDayOfYear + endDayOfYear), but not for temporal uncertainty or 
spatial bounds (but see Pete's 
http://lod.taxonconcept.org/ontology/dwc_area.owl).  2. If there were 
a consensus for moving to the `explicit token' model, should the 
space-time bounds of the Occurrence still be contained in an associated 
(often blank) Event, or accepted as properties of the Occurrence itself 
(e.g., occurrenceDate, occurrenceDuration, occurrenceLocation, 
occurrenceRadius)?  I would support the latter.
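
As a rough sketch of what bounds carried on the Occurrence itself might 
look like, assuming the four example property names above: the Python 
types, the (latitude, longitude) pair, and the radius in meters are all 
my assumptions, since none of these properties exist in DwC.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass(frozen=True)
class Occurrence:
    individual_id: str                         # hypothetical identifier
    occurrence_date: datetime                  # assumed: start of the interval
    occurrence_duration: timedelta             # assumed: length of the interval
    occurrence_location: Tuple[float, float]   # assumed: (lat, lon) in degrees
    occurrence_radius_m: float                 # assumed: spatial bound in meters

    def time_bounds(self) -> Tuple[datetime, datetime]:
        """Start and end of the interval in which the Individual was documented."""
        return (self.occurrence_date,
                self.occurrence_date + self.occurrence_duration)


# Invented example record: a two-hour observation with a 50 m spatial bound.
occ = Occurrence(
    individual_id="ind-1",
    occurrence_date=datetime(2010, 10, 25, 10, 0),
    occurrence_duration=timedelta(hours=2),
    occurrence_location=(1.0, 110.0),
    occurrence_radius_m=50.0,
)
start, end = occ.time_bounds()
```

With the bounds on the Occurrence, no separate (and possibly blank) Event 
record is needed to answer where and when the Individual was documented.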

Finally, 3. if there were a consensus for moving to the `explicit token' 
model, and a human observation were a token-less Occurrence, would we best 
specify who made the observation with dwc:recordedBy and what the 
observation was with dwc:occurrenceRemarks?  Or would it be better to 
create a second new token (along with `Physical specimens'), an explicit 
Observation class that would link to, say, an external observational 
ontology (e.g., OBOE)?  The issue of GUIDs for non-physical observations 
comes up, but this could be solved in various ways.
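
A hedged sketch of the second option, with Observation as an explicit 
token class alongside physical specimens.  All class and field names are 
hypothetical, the ontology link is left as an unfilled optional string, and 
minting a random UUID URN is only one of the possible ways to give a 
non-physical observation a GUID.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    token_id: str  # GUID for the token (hypothetical field name)


@dataclass
class PhysicalSpecimen(Token):
    catalog_number: str


@dataclass
class Observation(Token):
    recorded_by: str                      # who made the observation
    remarks: str                          # what was observed
    ontology_term: Optional[str] = None   # assumed slot for an external ontology link


@dataclass
class Occurrence:
    occurrence_id: str
    tokens: List[Token] = field(default_factory=list)


def new_observation(recorded_by: str, remarks: str) -> Observation:
    """Create an Observation token with a freshly minted UUID URN as its GUID."""
    return Observation(
        token_id=uuid.uuid4().urn,  # e.g. 'urn:uuid:...'; one possible GUID scheme
        recorded_by=recorded_by,
        remarks=remarks,
    )


# Invented example: an Occurrence documented only by a human observation.
obs = new_observation("A. Observer", "seen perched on a branch")
occ = Occurrence(occurrence_id="occ-1", tokens=[obs])
```

Under this sketch every Occurrence has at least one token, so the 
`token-less' case disappears, at the cost of one more class.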

Stepping back from the details for a moment, and reading some of the 
replies to Steve's post that have come in, I am wondering how many readers 
are thinking, ``the need for a semantic web standard for biodiversity 
information might be better met by a deep fork of Darwin Core, 
adopting new Classes and explicit domains and ranges for each term, to 
create a `Darwin SW,' rather than by an effort to evolve Darwin Core 
itself.''  I'm sure the question of forking Darwin Core has come up 
before, and I'm sure the discussion was passionate!

Best,

Cam