Two points (sorry about the pun) of information. First, DwC supports geometries (all of Rich's examples, and many more) through dwc:footprintWKT as Well-known Text. Second, there are no startDate and endDate terms in DwC. The single eventDate term is meant to comply with ISO 8601, which is capable of expressing not just dates, but also intervals, among other expressions of time. eventDate is one of the DwC terms that does have an elaboration in the DwC wiki pages, at <a href="http://code.google.com/p/darwincore/wiki/Event#eventDate">http://code.google.com/p/darwincore/wiki/Event#eventDate</a>. Note that those who would express eventDate in an application profile as conforming to the W3C xs:dateTime will be over-restrictive. The constraints on xs:dateTime can be found at <a href="http://www.w3.org/TR/xmlschema11-2/#dateTime">http://www.w3.org/TR/xmlschema11-2/#dateTime</a>.<br>
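<br>
As a rough illustration of both points, here is a minimal Python sketch. The helper names footprint_wkt and event_date, and the coordinate-set type labels, are assumptions made for this example; they are not part of DwC or of any library. The first helper renders Rich's five kinds of coordinate set as Well-known Text of the sort dwc:footprintWKT can carry. The second renders a start/end pair as one ISO 8601 value; the interval form (start and end separated by a solidus) is valid ISO 8601 but would not validate as xs:dateTime.<br>
<pre>
```python
from datetime import date


def footprint_wkt(set_type, coords):
    """Render (latitude, longitude) pairs as a WKT string.

    Hypothetical helper; set_type is one of the five labels used in
    Rich's message: point, transect, boundingBox, line, polygon.
    """
    def fmt(points):
        # WKT lists x before y, that is, longitude before latitude.
        return ", ".join(f"{lon} {lat}" for lat, lon in points)

    if set_type == "point":
        return f"POINT ({fmt(coords[:1])})"
    if set_type in ("transect", "line"):
        return f"LINESTRING ({fmt(coords)})"
    if set_type == "boundingBox":
        (lat1, lon1), (lat2, lon2) = coords
        # Expand the two corners into a closed four-sided ring.
        ring = [(lat1, lon1), (lat1, lon2), (lat2, lon2),
                (lat2, lon1), (lat1, lon1)]
        return f"POLYGON (({fmt(ring)}))"
    if set_type == "polygon":
        ring = list(coords)
        if ring[0] != ring[-1]:
            ring.append(ring[0])  # WKT polygon rings must be closed
        return f"POLYGON (({fmt(ring)}))"
    raise ValueError("unknown coordinate set type: " + set_type)


def event_date(start, end=None):
    """Render one date, or a start/end pair, as a single ISO 8601 value."""
    if end is None or end == start:
        return start.isoformat()
    if start > end:
        raise ValueError("end precedes start")
    # ISO 8601 writes an interval as start/end, separated by a solidus.
    return start.isoformat() + "/" + end.isoformat()


print(footprint_wkt("transect", [(21.3, -157.9), (21.4, -157.8)]))
print(event_date(date(1984, 6, 21), date(1984, 9, 22)))
```
</pre>
The sketch does not handle bounding boxes that cross the antimeridian, and it says nothing about datum, which would still have to be stated separately.<br>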
<br><div class="gmail_quote">On Mon, Oct 25, 2010 at 12:26 PM, Richard Pyle <span dir="ltr"><<a href="mailto:deepreef@bishopmuseum.org">deepreef@bishopmuseum.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hi Cam,<br>
<div class="im"><br>
> What I then want to ask is, 1. do the terms for<br>
> clearly defining the bounds of the Occurrence already exist?<br>
> There exist terms for spatial uncertainty:<br>
> dwc:coordinateUncertaintyInMeters, and coarse ones for<br>
> temporal bounds:<br>
> startDayOfYear + endDayOfYear, but not for temporal<br>
> uncertainty, or spatial bounds (but see Pete's<br>
> <a href="http://lod.taxonconcept.org/ontology/dwc_area.owl" target="_blank">http://lod.taxonconcept.org/ontology/dwc_area.owl</a>).<br>
<br>
</div>The question of temporal uncertainty is an excellent one, and after years of<br>
struggling, I have no "elegant" solution ("elegant" in this case being a<br>
high degree of capturing what we want to capture, with a minimal set of<br>
attributes in a simple structure). The problem is that given a range<br>
represented by startDate and endDate (how I handle it in my database), the<br>
interpretation may be any of the following:<br>
<br>
- Event occurred at a singular (imprecisely known) point between startDate<br>
and endDate<br>
- Event occurred at multiple points between startDate and endDate<br>
- Event occurred continuously beginning startDate and ending endDate<br>
<br>
To overcome this, I added two additional fields:<br>
<br>
verbatimDate: Used for historical datasets to record the verbatim date<br>
information (useful for things like "Summer 1984")<br>
dateRemarks: Some sort of text description of what is meant by the startDate<br>
and endDate values<br>
<br>
Of course, neither of these date qualifiers is of much use in the semantic<br>
context. So what we may need is some sort of controlled vocabulary for<br>
"dateRangeQualifier" or something, that indicates how to interpret a date<br>
range given only startDate and endDate.<br>
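<br>
One way such a vocabulary might look, sketched in Python. The three qualifier values mirror the three interpretations listed above; their names, and the DateRange class itself, are hypothetical and not part of DwC.<br>
<pre>
```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class DateRangeQualifier(Enum):
    """Hypothetical controlled vocabulary for interpreting a date range."""
    SINGLE_POINT_WITHIN = "singlePointWithin"        # one imprecisely known moment
    MULTIPLE_POINTS_WITHIN = "multiplePointsWithin"  # several separate moments
    CONTINUOUS = "continuous"                        # the whole span


_PHRASES = {
    DateRangeQualifier.SINGLE_POINT_WITHIN: "at one imprecisely known point between",
    DateRangeQualifier.MULTIPLE_POINTS_WITHIN: "at multiple points between",
    DateRangeQualifier.CONTINUOUS: "continuously between",
}


@dataclass(frozen=True)
class DateRange:
    start: date
    end: date
    qualifier: DateRangeQualifier
    verbatim_date: str = ""  # e.g. "Summer 1984", kept as recorded

    def describe(self):
        """Spell out the range in the sense given by its qualifier."""
        return "Event occurred {} {} and {}".format(
            _PHRASES[self.qualifier],
            self.start.isoformat(),
            self.end.isoformat(),
        )


summer = DateRange(date(1984, 6, 21), date(1984, 9, 22),
                   DateRangeQualifier.CONTINUOUS, "Summer 1984")
print(summer.describe())
```
</pre>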
<br>
Another solution would be something analogous to my approach for handling<br>
the second part of your question: spatial bounds.<br>
<br>
In my universe of datasets, I have lat/lon coordinates that fall into one of<br>
several types:<br>
<br>
- single point with uncertainty<br>
- two points representing a transect line (commonly used for plankton tows)<br>
- two points representing two corners of a bounding box<br>
- series of multiple points representing a non-straight line (e.g., a river,<br>
road, or a non-straight survey path)<br>
- series of multiple points representing a polygon<br>
<br>
In addition to the fact that there are 1...n points to represent the<br>
bounding of a place, there may be multiple re-interpretations of a point or<br>
set of points, when retroactively georeferencing locality data.<br>
<br>
So...the model I came up with looks something like this (using my same<br>
ASCII-art notation as before)<br>
<br>
Location--<CoordinateSet--<Coordinate<br>
|<br>
CoordinateSetType<br>
<br>
A Coordinate minimally consists of a decimalLatitude, decimalLongitude, and<br>
Sequence.<br>
<br>
CoordinateSetType is a controlled vocabulary that defines the five types<br>
listed above (point, transect, boundingBox, line, polygon).<br>
<br>
Each CoordinateSet consists of 1, 2, or >2 Coordinates, depending on the<br>
CoordinateSetType.<br>
<br>
Attached to "CoordinateSet" is all the MaNIS-style metadata for who/when/how<br>
the coordinate was derived.<br>
<br>
The reason for the 1:M Location:CoordinateSet is to allow for multiple<br>
interpretations of retroactively established coordinates (e.g., following<br>
the MaNIS protocol).<br>
<br>
Whether or not things like Datum and Uncertainty are attached to<br>
CoordinateSet or Coordinate depends on how much flexibility you need for<br>
capturing heterogenous Datum or Uncertainty values within a particular set<br>
(e.g., if certain nodes on a polygon are more precise than other nodes). I<br>
would definitely put Datum on CoordinateSet, and probably also put<br>
uncertainty there as well (which assumes that the same datum applies to each<br>
point in a set, and also that uncertainty is consistent for each point in a<br>
set).<br>
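<br>
The model described above could be sketched roughly as follows in Python. This is only an illustration: the class and field names are assumptions, the WGS84 default is a placeholder, and Datum and uncertainty are placed on CoordinateSet as suggested in the preceding paragraph.<br>
<pre>
```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed number of Coordinates per CoordinateSetType: (minimum, maximum).
# A maximum of None means "no upper limit".
COORDINATE_COUNTS = {
    "point": (1, 1),
    "transect": (2, 2),
    "boundingBox": (2, 2),
    "line": (3, None),
    "polygon": (3, None),
}


@dataclass
class Coordinate:
    decimal_latitude: float
    decimal_longitude: float
    sequence: int  # position of this point within its set


@dataclass
class CoordinateSet:
    set_type: str
    coordinates: List[Coordinate]
    datum: str = "WGS84"  # placeholder default; one datum for the whole set
    uncertainty_in_meters: Optional[float] = None
    georeferenced_by: str = ""  # stand-in for the who/when/how metadata

    def __post_init__(self):
        if self.set_type not in COORDINATE_COUNTS:
            raise ValueError("unknown CoordinateSetType: " + self.set_type)
        low, high = COORDINATE_COUNTS[self.set_type]
        n = len(self.coordinates)
        too_few = low > n
        too_many = high is not None and n > high
        if too_few or too_many:
            raise ValueError(f"{self.set_type} cannot have {n} coordinate(s)")
        # Keep the points in their declared order.
        self.coordinates = sorted(self.coordinates, key=lambda c: c.sequence)


@dataclass
class Location:
    locality: str
    # 1:M, so that several interpretations of one locality can coexist.
    coordinate_sets: List[CoordinateSet] = field(default_factory=list)
```
</pre>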
<br>
This pretty much covers all spatial bounding protocols, and it's not too<br>
difficult to derive a "point" coordinate from any of the other four, for<br>
purposes of "dumbing the data down" to fit into DwC. There are some<br>
problems, but they are not important enough to go into now.<br>
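<br>
One possible way to derive such a "point", sketched in Python under loud assumptions: the point is the plain mean of the vertices (not a true centroid), distances are great-circle distances on a sphere of radius 6371 km, and sets that cross the antimeridian are not handled. The function names are made up for this example.<br>
<pre>
```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def point_radius(coords):
    """Collapse (latitude, longitude) pairs to a point plus a radius.

    Returns (latitude, longitude, radius_in_meters), where the radius is
    the distance from the mean point to the farthest vertex.
    """
    if not coords:
        raise ValueError("at least one coordinate is required")
    lat = sum(p[0] for p in coords) / len(coords)
    lon = sum(p[1] for p in coords) / len(coords)
    radius = max(haversine_m(lat, lon, p_lat, p_lon)
                 for p_lat, p_lon in coords)
    return lat, lon, radius


print(point_radius([(0.0, 0.0), (0.0, 2.0)]))
```
</pre>
The radius here covers only the extent of the geometry; any uncertainty already attached to the original coordinates would have to be added on top before using it as a dwc:coordinateUncertaintyInMeters value.<br>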
<br>
But the point is, you could conceivably handle dates using a similar<br>
structure -- but the cases where parsing the precise date information is<br>
important are so few, that we probably don't need a semantic structure for<br>
it, and can simply capture it in a human-readable dateRemarks.<br>
<br>
Now...obviously this is all in data modelling space, not necessarily<br>
DwC-space. But I think it is useful to discuss how databases capture this<br>
kind of information at the source when trying to figure out how best to<br>
simplify it for a content exchange protocol.<br>
<div class="im"><br>
> Also, 2.<br>
> if there was a consensus for moving to the `explicit token'<br>
> model, should the space-time bounds of the Occurrence still<br>
> be contained in an associated (often blank) Event, or<br>
> accepted as properties of the Occurrence itself (e.g.,<br>
> occurrenceDate, occurrenceDuration, occurrenceLocation,<br>
> occurrenceRadius)? I would support the latter.<br>
<br>
</div>I would support the former. I'm not sure I understand why Event is "often<br>
blank". If there is any space-time information, then Event is not blank.<br>
In the context of the data I manage, it makes much more sense (in a DwC<br>
context) to capture Events and Locations as distinct classes, than<br>
representing multiple tokens for the same Occurrence. Even if we don't<br>
establish an individual class, dwc:individualID within the Occurrence class<br>
allows us to deal with both the "same-organism-at-multiple-events"<br>
situation, and the "multiple-tokens-for-same-organism-at-same-event"<br>
situation.<br>
<div class="im"><br>
> Finally, 3. if there was a consensus for moving to the<br>
> `explicit token'<br>
> model, and a human observation was a token-less Occurrence,<br>
> would we best specify who made the observation with<br>
> dwc:recordedBy and what the observation was with<br>
> dwc:occurrenceRemarks, or would it be better to create a<br>
> second new token (along with `Physical specimens') that was<br>
> an explicit Observation class, that would link explicitly to,<br>
> say, an external observational ontology (i.e., OBOE)? The<br>
> issue of GUIDs for non-physical observations comes up, but<br>
> this could still be solved in various ways.<br>
<br>
</div>I would favor at the very least a "place-holder" or "implied" token for a<br>
human observation. It's functionally analogous to the situation where a<br>
photo was taken, but then accidentally destroyed or lost. The only<br>
difference between an image and a memory is that the image is generally more<br>
durable, and is more easily and precisely conveyed from person to person.<br>
<div class="im"><br>
> Stepping back from the details for a moment, and reading some<br>
> of the replies to Steve's post that have come in, I am<br>
> wondering how many readers are thinking, ``the need for a<br>
> semantic web standard for biodiversity information might be<br>
> better achieved by a deep fork of Darwin Core, adopting new<br>
> Classes and explicit domains and ranges for each term, to<br>
> create a `Darwin SW,' rather than by an effort to evolve<br>
> Darwin Core itself.'' I'm sure the question of forking<br>
> Darwin Core has come up before, and I'm sure the discussion<br>
> was passionate!<br>
<br>
</div>To the extent that I understand both DwC and the semantic web, this seems to<br>
me to be the most parsimonious approach.<br>
<br>
Aloha,<br>
Rich<br>
<div><div></div><div class="h5"><br>
<br>
</div></div></blockquote></div><br>