provenance chains and DwC:recordedBy
All,
In preparation for the TDWG bioblitz, I'd like to configure our Spotter tool (http://spire.umbc.edu/spotter) to compose DwC records. Currently, it uses an observation ontology that we whipped up a few years ago (here's an illustrative record: http://spire.umbc.edu/spotter/observation/data.php?record=1534).
For the most part, the mapping is straightforward. However, I'm wondering about two terms: "hasObserver" and "hasReporter". We distinguished between these two terms to accommodate situations where a student makes an observation, but her teacher reports it. Similarly, in a bioblitz event, one model is that a survey team leader will fill out and submit a spreadsheet listing the observations made by members of the survey team.
Both of these terms seem to map to DwC:recordedBy. According to http://rs.tdwg.org/dwc/terms/#recordedBy, "The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first." So if we simply listed the observer followed by the reporter, we would comply with the spec. Of course, the ordering would be lost in typical RDF representations, since triples are considered unordered. And whether in RDF or in text, the distinction between observer and reporter would be pretty much lost.
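A minimal Python sketch of the mapping described above. The `Observation` class and the `to_recorded_by`/`from_recorded_by` helpers are hypothetical, and the " | " separator is an assumption (later Darwin Core guidance recommends it between list items). It shows that the concatenated recordedBy keeps the names and their order but not the roles, and that once each name becomes its own triple, even the order is gone.

```python
from dataclasses import dataclass

# Hypothetical internal record; the fields mirror hasObserver / hasReporter.
@dataclass
class Observation:
    observer: str
    reporter: str

SEPARATOR = " | "  # assumed list separator for dwc:recordedBy

def to_recorded_by(obs: Observation) -> str:
    """Flatten observer and reporter into one recordedBy string, observer first."""
    names = [obs.observer]
    if obs.reporter and obs.reporter != obs.observer:
        names.append(obs.reporter)
    return SEPARATOR.join(names)

def from_recorded_by(value: str) -> list[str]:
    """Split recordedBy back into names; which one observed and which reported is not recoverable."""
    return value.split(SEPARATOR)

obs = Observation(observer="Student A", reporter="Teacher B")
recorded_by = to_recorded_by(obs)
print(recorded_by)                    # Student A | Teacher B
print(from_recorded_by(recorded_by))  # ['Student A', 'Teacher B']

# One triple per name: a set of triples has no order, so "observer first" is lost too.
triples = {("urn:example:occ1", "dwc:recordedBy", name)
           for name in from_recorded_by(recorded_by)}
```

Only the position in the string hints at the roles, and that hint disappears as soon as the names are stored as separate triples.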
Since one of the goals of the bioblitz is figuring out good ways to use DwC in citizen science, I'm interested in opinions on whether we should preserve the observer/reporter distinction, and include these non-DwC terms in the bioblitz data profile.
Many thanks - Joel.
Hi,
If I understand correctly from http://rs.tdwg.org/dwc/terms/#recordedBy, you would concatenate the observer and then the reporter into one single string that will be transferred in recordedBy. Which should go first, the observer or the reporter, I don't know; in the case of the bioblitz I would say the observer (it is all about giving credit to the people in the field, no?). Of course this is a lossy transformation, as the concatenated list will not tell you who is who.
I feel I am missing something...
Javier de la Torre www.vizzuality.com
On Aug 4, 2010, at 4:12 PM, joel sachs wrote: [...]
Javier,
Exactly - moving from our observation ontology to Darwin Core is lossy. In terms of the TDWG bioblitz, it would be nice to use TDWG standards. But it would also be nice to maintain the provenance of observations. I guess what I'm getting at is: What does it mean to be a Darwin Core record on the semantic web, where a single document often mixes terms from multiple vocabularies?
I'd like the answer to be: "Don't worry about it. Use DwC terms when necessary, but don't necessarily use DwC terms."
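That answer can be illustrated with a single RDF description that mixes vocabularies: the DwC term a Darwin Core consumer expects, alongside the finer-grained observation terms. A minimal Python sketch that writes N-Triples by hand (no RDF library); the `OBS` namespace, the record URI, and the names are hypothetical placeholders, not the actual Spotter ontology.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"
OBS = "http://example.org/observation#"  # hypothetical stand-in for the observation ontology namespace

def literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal (escapes backslash, double quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def to_ntriples(subject: str, pairs: list[tuple[str, str]]) -> list[str]:
    """One N-Triples line per (predicate URI, literal value) pair."""
    return [f"<{subject}> <{predicate}> {literal(value)} ." for predicate, value in pairs]

record = "http://example.org/record/1"  # hypothetical record URI
lines = to_ntriples(record, [
    (DWC + "recordedBy", "Student A"),  # what a DwC-only consumer looks for
    (OBS + "hasObserver", "Student A"),  # finer-grained roles from the other vocabulary
    (OBS + "hasReporter", "Teacher B"),
])
print("\n".join(lines))
```

A consumer that only knows Darwin Core can ignore the `OBS` triples and still find recordedBy; one that also knows the observation vocabulary gets the roles back.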
I think the obvious approach is to capture both the observer and reporter of the data where appropriate, and maintain this distinction in the data store. When we have to generate "pure" DwC records (for example, for harvesting by GBIF), we map to DwC (sometimes in a lossy manner). But is that obvious to everyone, or just to me?
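A minimal Python sketch of that approach: the store keeps both roles (plus anything DwC has no term for, such as a survey team name), and a separate export step projects each record onto a flat set of DwC terms for a harvester. The class, the three exported terms, and the " | " separator are assumptions for illustration, not taken from any harvester's specification.

```python
import csv
import io
from dataclasses import dataclass, field

@dataclass
class StoredObservation:
    """Hypothetical internal record that keeps the observer/reporter distinction."""
    occurrence_id: str
    scientific_name: str
    observer: str
    reporter: str
    extra: dict = field(default_factory=dict)  # not exported: DwC has no term for these

DWC_FIELDS = ["occurrenceID", "scientificName", "recordedBy"]

def to_dwc_row(rec: StoredObservation) -> dict:
    """Lossy projection onto DwC terms: both roles collapse into recordedBy, `extra` is dropped."""
    names = [rec.observer] + ([rec.reporter] if rec.reporter != rec.observer else [])
    return {
        "occurrenceID": rec.occurrence_id,
        "scientificName": rec.scientific_name,
        "recordedBy": " | ".join(names),
    }

def export_csv(records) -> str:
    """Write the "pure" DwC view as CSV; the store itself is left untouched."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DWC_FIELDS)
    writer.writeheader()
    for rec in records:
        writer.writerow(to_dwc_row(rec))
    return buf.getvalue()

store = [StoredObservation("urn:example:1", "Example species", "Student A", "Teacher B",
                           extra={"team": "Survey team 3"})]
print(export_csv(store))
```

The loss happens only in the exported view; the roles are still available in the store for any consumer that can use them.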
Thanks again - Joel.
On Thu, 5 Aug 2010, Javier de la Torre wrote: [...]
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
On Fri, Aug 6, 2010 at 8:16 AM, joel sachs jsachs@csee.umbc.edu wrote:
I think the obvious approach is to capture both the observer and reporter of the data where appropriate, and maintain this distinction in the data store. When we have to generate "pure" DwC records (for example, for harvesting by GBIF), we map to DwC (sometimes in a lossy manner). But is that obvious to everyone, or just to me?
Sounds pretty darn obvious to me. :)
You might want to capture the birthdays of your collectors' children, so you can send them nice gifts. Darwin Core is probably not going to support that in the near future.
In the work I'm doing, there is a) a set of data, and b) a number of representations of that data. The elements in each don't map one to one.
///ark Web Applications Developer Center for Applied Biodiversity Informatics California Academy of Sciences
(I simply cannot get used to mailing lists where replying to the list is not the default!)
+1
Javier www.vizzuality.com
On 06/08/2010, at 19:59, Mark Wilden mark@mwilden.com wrote: [...]
An alternative is to regard assertions of other roles as an annotation of the DwC record. This would let you assert other roles in any vocabulary you choose as long as you can signal what that vocabulary might be to any application that knows how to consume it. That should cover your recordedBy, as well as the children of the recordedBy agent as Mark requires :-)
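The annotation idea can be sketched as leaving the DwC record unchanged and attaching each extra role as a separate assertion tagged with the vocabulary it uses, so a consumer keeps only what it knows how to interpret. A minimal Python sketch; the `RoleAnnotation` class, the vocabulary URIs, and the record identifier are hypothetical and do not follow any particular annotation specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RoleAnnotation:
    """Hypothetical annotation asserting one extra role about a DwC record."""
    target: str      # identifier of the annotated DwC record
    vocabulary: str  # namespace URI telling consumers how to read `role`
    role: str
    agent: str

OBS_VOCAB = "http://example.org/observation#"  # hypothetical vocabulary namespace

annotations = [
    RoleAnnotation("urn:example:occ1", OBS_VOCAB, "hasObserver", "Student A"),
    RoleAnnotation("urn:example:occ1", OBS_VOCAB, "hasReporter", "Teacher B"),
    RoleAnnotation("urn:example:occ1", "http://example.org/other#", "hasChild", "Child C"),
]

def roles_for(record_id, known_vocabularies, annotations):
    """Return (role, agent) pairs for one record, skipping vocabularies the consumer cannot interpret."""
    return [(a.role, a.agent) for a in annotations
            if a.target == record_id and a.vocabulary in known_vocabularies]

print(roles_for("urn:example:occ1", {OBS_VOCAB}, annotations))
# [('hasObserver', 'Student A'), ('hasReporter', 'Teacher B')]
```

The DwC record itself still carries only recordedBy; the roles travel alongside it rather than inside it.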
A proposed charter for a TDWG Biodiversity Data Annotations Interest Group is nearing the end of its internal review, soon to emerge open for public membership and community activism.
On Fri, Aug 6, 2010 at 11:16 AM, joel sachs jsachs@csee.umbc.edu wrote: [...]
participants (4)
- Bob Morris
- Javier de la Torre
- joel sachs
- Mark Wilden