Data capture software and Darwin Core
Hi folks,
Just starting to get into Darwin Core. Thanks for any help and suggestions you can provide.
I'm the builder of a data capture application (using PDAs and smart phones) called CyberTracker (http://www.cybertracker.org). We want to create a feature to export to Simple Darwin Core as XML.
We have two kinds of data: timer tracks and sightings. Timer tracks are automatically snapped at regular intervals and only contain a timestamp and location. Sightings are manually captured data and vary quite a bit. For example, they could represent the weather or a direct sighting of an animal.
It seems clear that a sighting maps directly to an "Event".
If we have a long list of timer track points, should these show up as many "Location" records?
Regards,
-Justin
Hi Justin,
These questions strike at the heart of some of what I think are the key unresolved aspects of DarwinCore - aspects that I hope will be the focus of some specific attention at the next TDWG meeting.
I will provide some answers from my own personal perspective.
In my mind, Location = Place (and all metadata associated with describing a place in three-dimensional space).
Event adds the fourth dimension of Time (i.e., Place + Time). Depending on who you talk to, Event may also include metadata related to "who" (which doesn't necessarily need to be a human - it might involve telemetry devices as well). And, assuming there is some sort of sampling activity associated with the Event, there may be some metadata related to that sampling and its methodology.
I'm not entirely sure what you mean by "snapped" for Timer tracks. Does "snapped" refer to capturing images or some other sort of data-capturing protocol? Or does "snapped" simply mean that you log a timestamp and Lat/Long coordinates? If the latter, then I would treat each node on the Timer track (i.e., each "snap") as representing an Event (Location + Time). If something other than the simple logging of Lat + Long + Time happens at each "snap", then that opens up another set of issues which I'd be happy to comment on.
Likewise, presuming that each Sighting comes with its own Lat/Long (Location) data, as well as a time, then these, too, would represent Events. But anything documented at those Events (e.g., sightings of an individual organism) would represent Occurrence instances. Non-biological documentations (such as weather) would probably best be represented as properties of the Event. In DwC you'd probably express those in dwc:fieldNotes or dwc:eventRemarks.
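A minimal Python sketch of the mapping described above may help: each "snap" becomes an Event-style record (place + time), and a sighting becomes an Event plus, when an organism was seen, an Occurrence that points back to it. The function names, the ID scheme, and the sample coordinates and taxon are assumptions made for illustration; they come neither from CyberTracker nor from the Darwin Core documentation.

```python
from datetime import datetime, timezone


def snap_to_event(track_id, index, when, lat, lon):
    """Turn one timer-track "snap" into an Event-style record (place + time)."""
    return {
        "eventID": f"{track_id}:{index}",  # hypothetical ID scheme
        "eventDate": when.isoformat(),
        "decimalLatitude": lat,
        "decimalLongitude": lon,
    }


def sighting_to_records(sighting_id, when, lat, lon, taxon=None, weather=None):
    """Return (event, occurrence); occurrence is None when no organism was seen."""
    event = snap_to_event("sighting", sighting_id, when, lat, lon)
    if weather:
        # Non-biological documentation is kept as a free-text property of the Event.
        event["eventRemarks"] = weather
    occurrence = None
    if taxon:
        occurrence = {
            "occurrenceID": f"occ:{sighting_id}",  # hypothetical ID scheme
            "eventID": event["eventID"],
            "scientificName": taxon,
        }
    return event, occurrence


# Illustrative values only.
when = datetime(2013, 7, 21, 10, 32, tzinfo=timezone.utc)
event, occurrence = sighting_to_records(
    7, when, -33.92, 18.42, taxon="Panthera leo", weather="overcast"
)
```

A weather-only sighting would simply be called without `taxon`, which yields an Event with `eventRemarks` and no Occurrence.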
A different (and equally legitimate) interpretation of how to represent Timer tracks in DwC would be to represent the entire Track as a single Event, capturing the "Location" component as a sequence of Lat/Long points (effectively describing a linear path as a single location), or as a simple polygon (bounding box or point+radius), and the "time" component as a range from the time of capture for the first "snap" to the time of the last "snap".
My own personal approach (which extends beyond what the DwC Event class is currently set up to accommodate) would be to do both for your timer tracks. That is, represent one Event as the entire track, with the Location described either as an ordered array of Lat/Long points or as a bounding box or point+radius that describes the smallest rectangle or circle encompassing all of the points, and with a range of min-max timestamps to represent the Time component of the Event. Then I would capture each "snap" on the track as a distinct Event (in our data model, we support hierarchical events, so the individual points would be referenced as "child" events of the "parent" Event representing the entire track).
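The "do both" approach could be sketched in Python roughly as follows. This is an illustration under assumptions, not an established Darwin Core pattern: the `parentEventID` key used to link child to parent and the `bbox` key are chosen here for the example, so check which terms the Darwin Core version you target actually defines. The path is written as a WKT LINESTRING (which needs at least two points) and the time component as an ISO 8601 interval.

```python
from datetime import datetime, timezone


def track_to_parent_event(track_id, points):
    """Summarise a whole track; points is a list of (datetime, lat, lon) in capture order."""
    times = [t for t, _, _ in points]
    lats = [lat for _, lat, _ in points]
    lons = [lon for _, _, lon in points]
    # WKT coordinates are written x y, i.e. longitude first, then latitude.
    path = ", ".join(f"{lon} {lat}" for _, lat, lon in points)
    return {
        "eventID": track_id,
        "eventDate": f"{min(times).isoformat()}/{max(times).isoformat()}",
        "footprintWKT": f"LINESTRING ({path})",
        # Smallest enclosing rectangle (min lon, min lat, max lon, max lat);
        # "bbox" is a helper key for this sketch, not a Darwin Core term.
        "bbox": (min(lons), min(lats), max(lons), max(lats)),
    }


def track_to_child_events(track_id, points):
    """One child Event per snap, each pointing at the parent track Event."""
    return [
        {
            "eventID": f"{track_id}:{i}",  # hypothetical ID scheme
            "parentEventID": track_id,  # assumed linking key
            "eventDate": t.isoformat(),
            "decimalLatitude": lat,
            "decimalLongitude": lon,
        }
        for i, (t, lat, lon) in enumerate(points)
    ]


# Illustrative values only.
points = [
    (datetime(2013, 7, 21, 10, 0, tzinfo=timezone.utc), -33.92, 18.42),
    (datetime(2013, 7, 21, 10, 5, tzinfo=timezone.utc), -33.93, 18.43),
]
parent = track_to_parent_event("track-1", points)
children = track_to_child_events("track-1", points)
```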
I'm not sure if that helps or only confuses things; but perhaps after the next TDWG meeting we might have more clarity and/or consensus on these issues.
Aloha,
Rich
In reply to: Justin Steventon (via tdwg-content-bounces@lists.tdwg.org), sent Sunday, July 21, 2013 10:32 AM, to tdwg-content@lists.tdwg.org, subject: [tdwg-content] Data capture software and Darwin Core
Hi Rich,
Thank you, this certainly does clarify the intent.
"Snapped" was indeed referring to taking a GPS reading, date and time. Therefore it makes more sense as an Event rather than a Location. In the interim, basisOfRecord (HumanObservation and MachineObservation) is a good way to distinguish these.
Aggregating the timer tracks into higher level structures is a longer term goal. Good luck at the next meeting. I'll continue to track changes as this evolves.
Regards,
-Justin
In reply to: Richard Pyle (deepreef@bishopmuseum.org), sent Sunday, July 21, 2013 4:07 PM, to 'Justin Steventon'; tdwg-content@lists.tdwg.org, subject: RE: [tdwg-content] Data capture software and Darwin Core
Thanks, Justin. Keep in mind that the description I sent represents my own interpretation of these DwC class terms - not necessarily shared by others (and, thus, not necessarily representative of the true "intent" of the dwc terms).
Aloha,
Rich
In reply to: Justin Steventon (justin@steventon.com), sent Sunday, July 21, 2013 7:11 PM, to 'Richard Pyle'; tdwg-content@lists.tdwg.org, subject: RE: [tdwg-content] Data capture software and Darwin Core
Hi Justin,
When mapping your data to Simple Darwin Core you do not need to think about classes; it's a flat, single record. Using basisOfRecord to distinguish between the two kinds of records is exactly what this dwc term is for.
When you say sightings could represent the weather, though, I am not sure you really want to publish all your records as Darwin Core. There should be a species (observation) of some sort involved to make up a Simple Darwin Core record.
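As a rough Python illustration of the "flat, single record" idea, the sketch below writes one CSV row per record, uses basisOfRecord to tell manual sightings from automatic track points, and can skip records that carry no taxon. The column selection, the `require_taxon` switch, and the sample values are assumptions made for this example.

```python
import csv
import io

# A small, illustrative subset of Darwin Core terms used as column names.
FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "scientificName",
]


def to_simple_dwc_csv(records, require_taxon=True):
    """Serialise flat records to CSV text with a header row of term names."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        if require_taxon and not rec.get("scientificName"):
            continue  # e.g. weather-only sightings or bare track points
        writer.writerow({name: rec.get(name, "") for name in FIELDS})
    return buf.getvalue()


# Illustrative values only.
records = [
    {
        "occurrenceID": "s1",
        "basisOfRecord": "HumanObservation",
        "eventDate": "2013-07-21T10:32:00Z",
        "decimalLatitude": -33.92,
        "decimalLongitude": 18.42,
        "scientificName": "Panthera leo",
    },
    {
        "occurrenceID": "t1",
        "basisOfRecord": "MachineObservation",
        "eventDate": "2013-07-21T10:33:00Z",
        "decimalLatitude": -33.93,
        "decimalLongitude": 18.43,
    },
]
csv_text = to_simple_dwc_csv(records)
```

Passing `require_taxon=False` keeps the track points in the export, with `scientificName` left empty.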
best, Markus
PS: Did you consider using the newer Darwin Core Archive format instead of XML? It would also allow you to bundle a dataset metadata file (e.g. EML) that can be used to describe the different methods used to generate the data.
http://rs.tdwg.org/dwc/terms/guides/text/
http://www.gbif.org/informatics/standards-and-tools/publishing-data/data-standards/darwin-core-archives/
-- Markus Döring Senior Developer GBIF Secretariat mdoering@gbif.org
On 22.07.2013, at 07:10, Justin Steventon wrote:
And yet, if you want to publish structured weather information along with the occurrence information, you can do so with key:value pairs, or even a JSON string, in the term called dynamicProperties, even within Simple Darwin Core (see http://rs.tdwg.org/dwc/terms/index.htm#dynamicProperties and http://rs.tdwg.org/dwc/terms/simple/index.htm#domore).
Just want to confirm my perception of Event, which, as Rich says, adds time to the Location, but also adds information associated with the methods (samplingProtocol, samplingEffort, fieldNotes). The distinction between eventRemarks and fieldNotes may seem a little tenuous, but the intention is to have fieldNotes be the as-close-to verbatim documentation actually taken in the field - ideally a URL to a digital version of the document.
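For illustration, packing structured weather data into dynamicProperties as a JSON string might look like this in Python. The weather keys and values are invented for the example, and the helper name is hypothetical.

```python
import json


def with_weather(record, weather):
    """Return a copy of a flat record with weather stored in dynamicProperties."""
    out = dict(record)
    # sort_keys gives a stable string, which makes exported files easier to diff.
    out["dynamicProperties"] = json.dumps(weather, sort_keys=True)
    return out


# Illustrative values only.
record = {"occurrenceID": "s1", "scientificName": "Panthera leo"}
enriched = with_weather(record, {"temperatureCelsius": 24.5, "cloudCover": "partial"})
```

The original record is left unchanged; only the returned copy carries the extra field.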
On Mon, Jul 22, 2013 at 5:26 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
Hi John, Markus & Rich,
I’m adding Louis, who has expressed interest in this topic.
I did look at the CSV+XML option, but concluded that it was a bit strange to have two different text formats. I do see that it is sufficient to have just the CSV as long as the first line header is properly specified. Adding JSON into the dynamicProperties is good for portability (and we may well use this), but it means readers are more complex.
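The extra reader complexity can be seen in a small Python sketch: after the ordinary CSV parsing, every row needs a second decoding step for dynamicProperties, plus a fallback for values that are not JSON (for example plain key:value text). The function name and the `raw` fallback key are assumptions made for this example.

```python
import csv
import io
import json


def read_simple_dwc(text):
    """Parse CSV text with a header row and decode dynamicProperties per row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = row.get("dynamicProperties") or ""
        try:
            row["dynamicProperties"] = json.loads(raw) if raw else {}
        except json.JSONDecodeError:
            # Not JSON: keep the original text rather than dropping it.
            row["dynamicProperties"] = {"raw": raw}
        rows.append(row)
    return rows


# Illustrative input only.
sample = (
    "occurrenceID,dynamicProperties\n"
    's1,"{""cloudCover"": ""partial""}"\n'
    "s2,\n"
    "s3,cloudCover:partial\n"
)
rows = read_simple_dwc(sample)
```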
My feeling is that I need to think more deeply about the problem space. For many years, we’ve focused on providing an easy way for people to get data from the field. This means software that places few restrictions on defining the semantics of the data: users enter a bunch of fields and off they go. They are free to separate contextual and observational data into distinct records if that’s what makes sense to them. However, we must then have a consistent way of inferring the meaning later. In some cases this is easy, but in others not so much.
Ultimately I’m trying to avoid a situation where users have to design their field user interface to be Darwin Core friendly.
Regards,
-Justin
In reply to: John Wieczorek (via gtuco.btuco@gmail.com), sent Monday, July 22, 2013 4:19 AM, to Markus Döring (GBIF), cc: Justin Steventon; TDWG Content Mailing List, subject: Re: [tdwg-content] Data capture software and Darwin Core
Participants (4): "Markus Döring (GBIF)", John Wieczorek, Justin Steventon, Richard Pyle