Re: [tdwg-content] Treatise on Occurrence, tokens, and basisOfRecord

2 Nov 2010


      Two points (sorry about the pun) of information. First, DwC supports
geometries (all of Rich's examples, and many more) through dwc:footprintWKT
as Well-known Text. Second, there are no startDate and endDate terms in DwC.
The single eventDate term is meant to comply with ISO 8601, which is capable
of expressing not just dates, but also intervals, among other expressions of
time. eventDate is one of the DwC terms that does have an elaboration in the
DwC wiki pages, at http://code.google.com/p/darwincore/wiki/Event#eventDate.
Note that those who would express eventDate in an application profile as
conforming to the W3C xs:dateTime will be over-restrictive. The
constraints on xs:dateTime can be found at
http://www.w3.org/TR/xmlschema11-2/#dateTime.
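
To illustrate the difference, here is a minimal Python sketch that reads an
eventDate either as a single ISO 8601 date/date-time or as a "start/end"
interval. The function names are hypothetical and are not part of DwC. The
sketch leans on Python's fromisoformat(), so it handles only complete calendar
dates and date-times; reduced-precision values such as "2010-10" and
duration-based intervals are valid ISO 8601 but are not covered here. An
xs:dateTime validator would reject the interval form entirely.

```python
from datetime import date, datetime


def _parse_instant(text):
    """Parse a complete ISO 8601 calendar date or date-time."""
    if "T" in text:
        return datetime.fromisoformat(text)
    return date.fromisoformat(text)


def parse_event_date(value):
    """Return (start, end); end is None when the value is not an interval.

    ISO 8601 separates the two ends of an interval with a solidus ("/").
    """
    start_text, sep, end_text = value.strip().partition("/")
    start = _parse_instant(start_text)
    end = _parse_instant(end_text) if sep else None
    return start, end


# A single day and an interval are both acceptable eventDate values.
single = parse_event_date("2010-11-02")
interval = parse_event_date("2010-10-25/2010-11-02")
```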

On Mon, Oct 25, 2010 at 12:26 PM, Richard Pyle <deepreef@bishopmuseum.org> wrote:
...
Hi Cam,
...
What I then want to ask is, 1. do the terms for
clearly defining the bounds of the Occurrence already exist?
There exist terms for spatial uncertainty:
dwc:coordinateUncertaintyInMeters, and coarse ones for
temporal bounds:
startDayOfYear + endDayOfYear, but not for temporal
uncertainty, or spatial bounds (but see Pete's
http://lod.taxonconcept.org/ontology/dwc_area.owl).
The question of temporal uncertainty is an excellent one, and after years of
struggling, I have no "elegant" solution ("elegant" in this case being a
high degree of capturing what we want to capture, with a minimal set of
attributes in a simple structure).  The problem is that given a range
represented by startDate and endDate (how I handle it in my database), the
interpretation may be any of the following:
- Event occurred at a singular (imprecisely known) point between startDate
and endDate
- Event occurred at multiple points between startDate and endDate
- Event occurred continuously beginning startDate and ending endDate
To overcome this, I added two additional fields:
verbatimDate: Used for historical datasets to record the verbatim date
information (useful for things like "Summer 1984")
dateRemarks: Some sort of text description of what is meant by the startDate
and endDate values
Of course, neither of these date qualifiers is of much use in the semantic
context.  So what we may need is some sort of controlled vocabulary for
"dateRangeQualifier" or something, that indicates how to interpret a date
range given only dateStart and dateEnd.
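
A minimal Python sketch of what such a qualifier might look like, covering the
three interpretations listed above. The vocabulary values, class names, and
field names are all hypothetical; none of them exist in DwC. The sketch also
shows that collapsing the range into a single ISO 8601 interval string loses
the qualifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DateRangeQualifier(Enum):
    """Hypothetical controlled vocabulary for interpreting a date range."""
    SINGLE_POINT = "singlePointWithinRange"        # one imprecisely known moment
    MULTIPLE_POINTS = "multiplePointsWithinRange"  # several moments in the range
    CONTINUOUS = "continuousThroughoutRange"       # ongoing from start to end


@dataclass
class DateRange:
    start_date: str                      # ISO 8601 calendar date
    end_date: str                        # ISO 8601 calendar date
    qualifier: Optional[DateRangeQualifier] = None
    verbatim_date: Optional[str] = None  # e.g. the original label text
    date_remarks: Optional[str] = None   # free-text explanation

    def event_date(self) -> str:
        """Collapse to one ISO 8601 value; the qualifier is not carried over."""
        if self.start_date == self.end_date:
            return self.start_date
        return f"{self.start_date}/{self.end_date}"
```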
Another solution would be something analogous to my approach for handling
the second part of your question: spatial bounds.
In my universe of datasets, I have lat/lon coordinates that fall into one of
several types:
- single point with uncertainty
- two points representing a transect line (commonly used for plankton tows)
- two points representing two corners of a bounding box
- series of multiple points representing a non-straight line (e.g., a river,
road, or a non-straight survey path)
- series of multiple points representing a polygon
In addition to the fact that there are 1...n points to represent the
bounding of a place, there may be multiple re-interpretations of a point or
set of points, when retroactively georeferencing locality data.
So...the model I came up with looks something like this (using my same
ASCII-art notation as before)
Location--<CoordinateSet--<Coordinate
               |
       CoordinateSetType
A Coordinate minimally consists of a decimalLatitude, decimalLongitude, and
Sequence.
CoordinateSetType is a controlled vocabulary that defines the five types
listed above (point, transect, boundingBox, line, polygon).
Each CoordinateSet consists of 1, 2, or >2 Coordinates, depending on the
CoordinateSetType.
Attached to "CoordinateSet" is all the MaNIS-style metadata for who/when/how
the coordinate was derived.
The reason for the 1:M Location:CoordinateSet is to allow for multiple
interpretations of retroactively established coordinates (e.g., following
the MaNIS protocol).
Whether or not things like Datum and Uncertainty are attached to
CoordinateSet or Coordinate depends on how much flexibility you need for
capturing heterogeneous Datum or Uncertainty values within a particular set
(e.g., if certain nodes on a polygon are more precise than other nodes).  I
would definitely put Datum on CoordinateSet, and probably also put
uncertainty there as well (which assumes that the same datum applies to each
point in a set, and also that uncertainty is consistent for each point in a
set).
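
The model described above could be sketched in Python roughly as follows. This
is one possible reading of the ASCII diagram, not the actual database schema:
the class and field names are assumptions, Datum and Uncertainty are placed on
CoordinateSet as suggested, and the to_wkt() method (which assumes WKT's usual
x y = longitude latitude order and does not handle boxes that cross the
antimeridian) is added only to show how each set type might be written as
Well-known Text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CoordinateSetType(Enum):
    POINT = "point"
    TRANSECT = "transect"
    BOUNDING_BOX = "boundingBox"
    LINE = "line"
    POLYGON = "polygon"


@dataclass
class Coordinate:
    decimal_latitude: float
    decimal_longitude: float
    sequence: int


def _fmt(xy):
    """Format (x, y) pairs as a WKT coordinate list."""
    return ", ".join(f"{x} {y}" for x, y in xy)


@dataclass
class CoordinateSet:
    set_type: CoordinateSetType
    coordinates: List[Coordinate]
    geodetic_datum: Optional[str] = None    # one datum for the whole set
    uncertainty_m: Optional[float] = None   # one uncertainty for the whole set

    def __post_init__(self):
        # Enforce 1, 2, or >2 coordinates depending on the set type.
        n = len(self.coordinates)
        two_point = (CoordinateSetType.TRANSECT, CoordinateSetType.BOUNDING_BOX)
        if self.set_type is CoordinateSetType.POINT:
            ok = n == 1
        elif self.set_type in two_point:
            ok = n == 2
        else:
            ok = n > 2
        if not ok:
            raise ValueError(f"{self.set_type.value} cannot have {n} coordinate(s)")

    def to_wkt(self) -> str:
        pts = sorted(self.coordinates, key=lambda c: c.sequence)
        xy = [(c.decimal_longitude, c.decimal_latitude) for c in pts]
        if self.set_type is CoordinateSetType.POINT:
            return f"POINT ({_fmt(xy)})"
        if self.set_type in (CoordinateSetType.TRANSECT, CoordinateSetType.LINE):
            return f"LINESTRING ({_fmt(xy)})"
        if self.set_type is CoordinateSetType.BOUNDING_BOX:
            # Expand two opposite corners into the four corners of the box.
            (x1, y1), (x2, y2) = xy
            xmin, xmax = sorted((x1, x2))
            ymin, ymax = sorted((y1, y2))
            xy = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
        # WKT polygon rings must end on their starting vertex.
        ring = xy if xy[0] == xy[-1] else xy + [xy[0]]
        return f"POLYGON (({_fmt(ring)}))"


@dataclass
class Location:
    location_id: str
    # 1:M, so that several georeferencing interpretations can coexist.
    coordinate_sets: List[CoordinateSet] = field(default_factory=list)
```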
This pretty much covers all spatial bounding protocols, and it's not too
difficult to derive a "point" coordinate from any of the other four, for
purposes of "dumbing the data down" to fit into DwC.  There are some
problems, but they are not important enough to go into now.
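
One crude way to do that derivation, sketched in Python under stated
assumptions: take the center of the latitude/longitude extent as the point,
and use the largest great-circle distance from that center to any vertex
(plus any uncertainty already recorded) as the uncertainty radius. The
function names are hypothetical, the Earth is treated as a sphere, and sets
that cross the antimeridian or a pole are not handled. This is only one of
several reasonable methods and is not necessarily the one meant above.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def derive_point(vertices, uncertainty_m=0.0):
    """Reduce a list of (lat, lon) vertices to a point plus uncertainty."""
    lats = [lat for lat, _ in vertices]
    lons = [lon for _, lon in vertices]
    lat_c = (min(lats) + max(lats)) / 2
    lon_c = (min(lons) + max(lons)) / 2
    radius = max(haversine_m(lat_c, lon_c, lat, lon) for lat, lon in vertices)
    return {
        "decimalLatitude": lat_c,
        "decimalLongitude": lon_c,
        "coordinateUncertaintyInMeters": radius + uncertainty_m,
    }
```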
But the point is, you could conceivably handle dates using a similar
structure -- but the cases where parsing the precise date information is
important are so few, that we probably don't need a semantic structure for
it, and can simply capture it in a human-readable dateRemarks.
Now....obviously this is all in data modelling space, not necessarily
DwC-space.  But I think it is useful to discuss how databases capture this
kind of information at the source when trying to figure out how best to
simplify it for a content exchange protocol.
...
Also, 2.
if there was a consensus for moving to the `explicit token'
model, should the space-time bounds of the Occurrence still
be contained in an associated (often blank) Event, or
accepted as properties of the Occurrence itself (e.g.,
occurrenceDate, occurrenceDuration, occurrenceLocation,
occurrenceRadius)?  I would support the latter.
I would support the former.  I'm not sure I understand why Event is "often
blank".  If there is any space-time information, then Event is not blank.
In the context of the data I manage, it makes much more sense (in a DwC
context) to capture Events and Locations as distinct classes, than
representing multiple tokens for the same Occurrence.  Even if we don't
establish an individual class, dwc:individualID within the Occurrence class
allows us to deal with both the "same-organism-at-multiple-events"
situation, and the "multiple-tokens-for-same-organism-at-same-event"
situation.
...
Finally, 3. if there was a consensus for moving to the
`explicit token'
model, and a human observation was a token-less Occurrence,
would we best specify who made the observation with
dwc:recordedBy and what the observation was with
dwc:occurrenceRemarks, or would it be better to create a
second new token (along with `Physical specimens') that was
an explicit Observation class, that would link explicitly to,
say, an external observational ontology (i.e., OBOE)?  The
issue of GUIDs for non-physical observations comes up, but
this could still be solved in various ways.
I would favor at the very least a "place-holder" or "implied" token for a
human observation.  It's functionally analogous to the situation where a
photo was taken, but then accidentally destroyed or lost.  The only
difference between an image and a memory is that the image is generally more
durable, and is more easily and precisely conveyed from person to person.
...
Stepping back from the details for a moment, and reading some
of the replies to Steve's post that have come in, I am
wondering how many readers are thinking, ``the need for a
semantic web standard for biodiversity information might be
better achieved by a deep fork of Darwin Core, adopting new
Classes and explicit domains and ranges for each term, to
create a `Darwin SW,' rather than by an effort to evolve
Darwin Core itself.''  I'm sure the question of forking
Darwin Core has come up before, and I'm sure the discussion
was passionate!
To the extent that I understand both DwC and the semantic web, this seems to
me to be the most parsimonious approach.
Aloha,
Rich

John Wieczorek