[tdwg-content] Data capture software and Darwin Core
justin at steventon.com
Tue Jul 23 05:02:33 CEST 2013
Hi John, Markus & Rich,
I’m adding Louis, who has expressed interest in this topic.
I did look at the CSV+XML option, but concluded that it was a bit strange to
have two different text formats. I do see that it is sufficient to have just
the CSV as long as the first line header is properly specified. Adding JSON
into the dynamicProperties is good for portability (and we may well use
this), but it means readers are more complex.
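
To make the trade-off concrete, here is a minimal Python sketch of the CSV-only approach, with any extra fields packed into dynamicProperties as a JSON string. The header uses a small subset of Darwin Core term names. The extra field names (airTemperatureC, cloudCover) and the function names are made up for illustration and are not part of any standard or of CyberTracker. The reader shows the added complexity: it has to parse JSON inside a CSV cell.

```python
import csv
import io
import json

# Darwin Core term names used as the CSV header (a small illustrative subset).
FIELDS = ["occurrenceID", "basisOfRecord", "eventDate",
          "decimalLatitude", "decimalLongitude", "scientificName",
          "dynamicProperties"]

def write_simple_dwc(records):
    """Serialize records to CSV text.

    Keys that are not in FIELDS are packed into dynamicProperties as JSON.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        row = {k: rec[k] for k in FIELDS if k in rec}
        extras = {k: v for k, v in rec.items() if k not in FIELDS}
        row["dynamicProperties"] = json.dumps(extras, sort_keys=True) if extras else ""
        writer.writerow(row)
    return out.getvalue()

def read_simple_dwc(text):
    """Parse the CSV back, decoding dynamicProperties from JSON into a dict."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = row.get("dynamicProperties") or ""
        row["dynamicProperties"] = json.loads(raw) if raw else {}
        rows.append(row)
    return rows

# Hypothetical sighting with two free-form weather fields.
records = [{
    "occurrenceID": "s1",
    "basisOfRecord": "HumanObservation",
    "eventDate": "2013-07-21T10:32:00",
    "decimalLatitude": -33.9,
    "decimalLongitude": 18.4,
    "scientificName": "Panthera leo",
    "airTemperatureC": 21.5,
    "cloudCover": "clear",
}]
csv_text = write_simple_dwc(records)
```

Note that the CSV layer turns the typed columns into strings on the way back in, while values inside the JSON string keep their types.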
My feeling is that I need to think more deeply about the problem space. For
many years, we’ve focused on providing an easy way for people to get data
from the field. This means software that places few restrictions on defining
the semantics of the data: users enter a bunch of fields and off they go.
They are free to separate contextual and observational data into distinct
records if that’s what makes sense to them. However, we must then have a
consistent way of inferring the meaning later. In some cases this is easy,
but in others not so much.
Ultimately I’m trying to avoid a situation where users have to design their
field user interface to be Darwin Core friendly.
From: gtuco.btuco at gmail.com [mailto:gtuco.btuco at gmail.com] On Behalf Of John
Sent: Monday, July 22, 2013 4:19 AM
To: Markus Döring (GBIF)
Cc: Justin Steventon; TDWG Content Mailing List
Subject: Re: [tdwg-content] Data capture software and Darwin Core
And yet, if you want to publish structured weather information along with
the occurrence information, you can do so with key:value pairs, or even a
JSON string, in the term called dynamicProperties, even within Simple Darwin
Core (see http://rs.tdwg.org/dwc/terms/index.htm#dynamicProperties and
Just want to confirm my perception of Event, which, as Rich says, adds time
to the Location, but also adds information associated with the methods
(samplingProtocol, samplingEffort, fieldNotes). The distinction between
eventRemarks and fieldNotes may seem a little tenuous, but the intention is
to have fieldNotes be the as-close-to-verbatim documentation actually taken
in the field - ideally a URL to a digital version of the document.
On Mon, Jul 22, 2013 at 5:26 AM, "Markus Döring (GBIF)" <mdoering at gbif.org>
When mapping your data to Simple Darwin Core you do not need to think about
classes; it's a flat, single record.
Using basisOfRecord to distinguish between the two kinds of records is
exactly what this DwC term is for.
When you say sightings could represent the weather, though, I am not sure
you really want to publish all your records as Darwin Core. There should be
a species (observation) of some sort involved to make up a simple darwin
PS: Did you consider using the newer Darwin Core Archive format instead of
It would also allow you to bundle a dataset metadata file (e.g. EML) that
can be used to describe the different methods used to generate the data
mdoering at gbif.org
On 22.07.2013, at 07:10, Justin Steventon wrote:
> Hi Rich,
> Thank you, this certainly does clarify the intent.
> Snapped was indeed referring to taking a GPS reading, date and time.
Therefore it makes more sense as an Event rather than a Location. In the
interim basisOfRecord (HumanObservation and MachineObservation) is a good
way to distinguish these.
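
As a rough illustration of that interim approach, the Python sketch below tags each flat record with a basisOfRecord value according to how it was captured. The capture-kind labels ("sighting", "timer_track") and the function name are assumptions made for this example; only the Darwin Core term names and the two basisOfRecord values come from the discussion.

```python
# Assumed mapping from capture kind to the basisOfRecord values named above.
BASIS_BY_KIND = {
    "sighting": "HumanObservation",       # manually captured by a person
    "timer_track": "MachineObservation",  # snapped automatically by the device
}

def to_flat_record(kind, timestamp, lat, lon):
    """Build one flat record (a dict keyed by Darwin Core term names)."""
    if kind not in BASIS_BY_KIND:
        raise ValueError(f"unknown capture kind: {kind!r}")
    return {
        "basisOfRecord": BASIS_BY_KIND[kind],
        "eventDate": timestamp,
        "decimalLatitude": lat,
        "decimalLongitude": lon,
    }

# Example: one automatically snapped point and one manual sighting.
snap = to_flat_record("timer_track", "2013-07-22T07:10:00", -33.90, 18.40)
sighting = to_flat_record("sighting", "2013-07-22T07:12:30", -33.91, 18.41)
```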
> Aggregating the timer tracks into higher level structures is a longer term
goal. Good luck at the next meeting. I’ll continue to track changes as this
> From: Richard Pyle [mailto:deepreef at bishopmuseum.org]
> Sent: Sunday, July 21, 2013 4:07 PM
> To: 'Justin Steventon'; tdwg-content at lists.tdwg.org
> Subject: RE: [tdwg-content] Data capture software and Darwin Core
> Hi Justin,
> These questions strike at the heart of some of what I think are the key
unresolved aspects of DarwinCore – aspects that I hope will be the focus of
some specific attention at the next TDWG meeting.
> I will provide some answers from my own personal perspective.
> In my mind, Location = Place (and all metadata associated with describing
a place in three-dimensional space).
> Event adds the fourth dimension of Time (i.e., Place + Time). Depending
on who you talk to, Event may also include metadata related to “who” (which
doesn’t necessarily need to be a human – it might involve telemetry devices
as well). And, assuming there is some sort of sampling activity associated
with the Event, there may be some metadata related to that sampling and its
> I’m not entirely sure what you mean by “snapped” for Timer tracks. Does
“snapped” refer to capturing images or some other sort of data-capturing
protocol? Or does “snapped” simply mean that you log a timestamp
and Lat/Long coordinates? If the latter, then I would treat each node on
the Timer track (i.e., each “snap”) as representing an Event (Location +
Time). If something other than the simple logging of Lat + Long + Time
happens at each “snap”, then that opens up another set of issues which I’d
be happy to comment on.
> Likewise, presuming that each Sighting comes with its own Lat/Long
(Location) data, as well as a time, then these, too, would represent Events.
But anything documented at those Events (e.g., sightings of an individual
organism) would represent Occurrence instances. Non-biological
documentations (such as weather) would probably best be represented as
properties of the Event. In DwC you’d probably express those in
dwc:fieldNotes or dwc:eventRemarks.
> A different (and equally legitimate) interpretation of how to represent
Timer tracks in DwC would be to represent the entire Track as a single
Event, capturing the “Location” component as a sequence of Lat/Long points
(effectively describing a linear path as a single location), or as a simple
polygon (bounding box or point+radius), and the “time” component as a range
from the time of capture for the first “snap” to the time of the last
> My own personal approach (which extends beyond what DwC:Event class is
currently set up to accommodate) would be to do both for your timer tracks.
That is, represent one Event as the entire track, with the Location
described either with an ordered array of Lat/Long points or as a bounding
box or point+radius that describes the smallest rectangle or circle that
encompasses all of the points, and a range of min-max timestamps to represent
the Time component of the Event. Then I would capture each “snap” on the
track as a distinct Event (in our data model, we support hierarchical
events, so the individual points would be referenced as “child” events of
the “parent” Event representing the entire track).
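
One way to picture the "do both" approach is the Python sketch below: a whole track becomes one parent event with a bounding box and a min-max time range, and each snap becomes a child event that refers back to it. The class names, field names and ID scheme are hypothetical and are not Darwin Core terms or part of any existing data model. The bounding box is a naive min/max over coordinates, so it assumes the track does not cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

@dataclass
class Snap:
    """One timer-track point: a timestamp plus a Lat/Long reading."""
    time: datetime
    lat: float
    lon: float

@dataclass
class Event:
    """Illustrative event record; parent_event_id links a child to its track."""
    event_id: str
    parent_event_id: Optional[str]
    start: datetime
    end: datetime
    bbox: Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)

def track_to_events(track_id, snaps):
    """Return (parent_event, child_events) for one timer track."""
    if not snaps:
        raise ValueError("a track needs at least one snap")
    ordered = sorted(snaps, key=lambda s: s.time)
    lats = [s.lat for s in ordered]
    lons = [s.lon for s in ordered]
    # Parent: smallest rectangle around all points, first-to-last time range.
    parent = Event(track_id, None, ordered[0].time, ordered[-1].time,
                   (min(lats), min(lons), max(lats), max(lons)))
    # Children: one point event per snap, numbered in time order.
    children = [
        Event(f"{track_id}-{i}", track_id, s.time, s.time,
              (s.lat, s.lon, s.lat, s.lon))
        for i, s in enumerate(ordered, start=1)
    ]
    return parent, children

# Example with three snaps supplied out of order.
snaps = [
    Snap(datetime(2013, 7, 21, 10, 5), -33.95, 18.45),
    Snap(datetime(2013, 7, 21, 10, 0), -33.90, 18.40),
    Snap(datetime(2013, 7, 21, 10, 10), -33.92, 18.50),
]
parent, children = track_to_events("track1", snaps)
```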
> I’m not sure if that helps or only confuses things; but perhaps after the
next TDWG meeting we might have more clarity and/or consensus on these
> From: tdwg-content-bounces at lists.tdwg.org
[mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Justin Steventon
> Sent: Sunday, July 21, 2013 10:32 AM
> To: tdwg-content at lists.tdwg.org
> Subject: [tdwg-content] Data capture software and Darwin Core
> Hi folks,
> Just starting to get into Darwin Core. Thanks for any help and suggestions
you can provide.
> I’m the builder of a data capture application (using PDAs and smart
phones) called CyberTracker (http://www.cybertracker.org). We want to create
a feature to export to Simple Darwin Core as XML.
> We have two kinds of data: timer tracks and sightings. Timer tracks are
automatically snapped at regular intervals and only contain a timestamp and
location. Sightings are manually captured data and vary quite a bit. For
example, they could represent the weather or a direct sighting of an animal.
> It seems clear that a sighting maps directly to an “Event”.
> If we have a long list of timer track points, should these show up as many
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org