Darwin Core: proposed new terms for expressing sample data
Dear All,
GBIF is committed to exploring ways in which the IPT and Darwin Core Archive format can be extended for publishing sample-based data sets. In association with the EU BON project [1], a customised version of the IPT [2] has been deployed to test this using a special type of Darwin Core Archive in which the core is an "Event" with associated taxon occurrences in an "Occurrence" extension.
The Darwin Core vocabulary already provides a rich set of terms, many of them relevant for describing sample-based data. Synthesising several sources of input (a GBIF-organised workshop on sample data, May 2013 [3]; discussions on the TDWG mailing list in late 2013; internal discussion among EU BON project partners), we identified five new terms relating to sample data as essential. The complete model, including these new terms, is fully described with examples in the online document "Publishing sample data using the GBIF IPT" [4].
As a first step towards ratification, we would like to register the new terms in the DwC Google Code tracker [5] if there are no major objections on this list. The five terms are:
1. quantity: the number or enumeration value of the quantityType (e.g., individuals, biomass, biovolume, BraunBlanquetScale) per samplingUnit or a percentage measure recorded for the sample.
2. quantityType: the entity being referred to by quantity, e.g., individuals, biomass, %species, scale type.
3. samplingGeometry: an indication of what kind of space was sampled; select from point, line, area or volume.
4. samplingUnit: the unit of measurement used for reporting the quantity in the sample, e.g., minute, hour, day, metre, metre^2, metre^3. It is combined with quantity and quantityType to provide the complete measurement, e.g., 9 individuals per day, 4 biomass-gm per metre^2.
5. eventSeriesID: an identifier for a set of events that are associated in some way, e.g., a monitoring series; may be a global unique identifier or an identifier specific to the series.
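As a rough illustration of how quantity, quantityType and samplingUnit might combine into a complete measurement, here is a minimal Python sketch. The class name `SampleQuantity`, its field names and the `describe` method are hypothetical and not part of the proposal; only the three term names and the two example measurements come from the definitions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleQuantity:
    """Hypothetical container for three of the proposed terms."""

    quantity: float      # proposed term: quantity
    quantity_type: str   # proposed term: quantityType
    sampling_unit: str   # proposed term: samplingUnit

    def describe(self) -> str:
        # Join the three values into one readable measurement string.
        return f"{self.quantity:g} {self.quantity_type} per {self.sampling_unit}"


# The two example measurements given in the samplingUnit definition.
records = [
    SampleQuantity(9, "individuals", "day"),
    SampleQuantity(4, "biomass-gm", "metre^2"),
]

for record in records:
    print(record.describe())
```

This sketch does not cover the percentage case mentioned under quantity, where no samplingUnit may apply.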
Best regards,
Éamonn
[1] http://eubon.eu
[3] http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
[4] http://links.gbif.org/sample_data_model
[5] https://code.google.com/p/darwincore/issues/list
____________________________________________________
Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama@gbif.org),
Senior Programme Officer for Interoperability,
Global Biodiversity Information Facility Secretariat,
Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK
Phone: +45 3532 1494; Fax: +45 3532 1480
Hi Éamonn --- I am curious about the outcomes presented in the SIGS paper, in particular, this portion of the paper:
"Solutions without introducing an event core in Darwin Core Archives: During the review of the solutions for the uses cases, it became apparent that either model could be applied to every use case. The core and extensions bore a complementary relationship and between them could express all the required information. The core simply provided the central anchor in the star schema from which to join the additional information. Therefore, using the Occurrence core, well established in the GBIF network through uptake of the IPT, seemed more appropriate than inventing CollectingEvent as an additional core type."
That SIGS paper lists both John Wieczorek and you as authors, along with many luminaries from across the biodiversity standards spectrum. Given the above, it's curious to see the EventCore come back again, along with a specific IPT version to support it.
So I see two issues conflated in the post you just made. One is the need for an EventCore at all, and the nature of relating Event and Occurrence/Material Sample. The second is the introduction of new terms, which seemingly have arrived after debate on similar terms (but framed around abundance) stalled a year ago. To my mind, both require some further discussion, because I don't (necessarily) see TDWG community coherence around either issue.
Best, Rob
Hi Rob,
Thank you for the feedback. I have tried to address the two main issues you raise below. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project which includes a task on developing/enhancing tools and standards for data sharing with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG/EU BON contributors overlapped) and this is the reason we are presenting the outcomes here for further consideration.
*Event core*
As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation, but as things progressed the data wrangling pushed the model back towards the Event core. We went through the exercise of mapping multiple test datasets in an iterative process spanning several months' work. In the end, we found that an Event core better matched the typical sample data we were dealing with, and that it allowed a measurement-or-fact extension to be included for the efficient expression of environmental information associated with the event.
The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is the type of the core, and therefore the kind of record to which you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. With an Occurrence core, those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesn’t stop anyone from using the Occurrence core if they so wish. It just provides a different option for datasets that better fit an Event core model.
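To make the repetition argument concrete, the following Python sketch compares how many event-level fact rows are stored under each layout. All identifiers and values are invented for illustration, and the helper `flatten_to_occurrence_core` is hypothetical; it is a simplified picture of the star schema, not a description of how the IPT actually writes archives.

```python
# Hypothetical measurement-or-fact rows attached once to an Event core record.
event_facts = [
    {"eventID": "E1", "measurementType": "water temperature", "measurementValue": "12.5"},
    {"eventID": "E1", "measurementType": "salinity", "measurementValue": "34"},
]

# Hypothetical Occurrence extension rows pointing back to the same event.
occurrences = [
    {"eventID": "E1", "occurrenceID": "O1", "scientificName": "Gadus morhua"},
    {"eventID": "E1", "occurrenceID": "O2", "scientificName": "Clupea harengus"},
    {"eventID": "E1", "occurrenceID": "O3", "scientificName": "Pleuronectes platessa"},
]


def flatten_to_occurrence_core(occurrence_rows, fact_rows):
    """Repeat every event-level fact for each occurrence of that event,
    as an Occurrence core layout would require."""
    flattened = []
    for occ in occurrence_rows:
        for fact in fact_rows:
            if fact["eventID"] == occ["eventID"]:
                flattened.append(
                    {
                        "occurrenceID": occ["occurrenceID"],
                        "measurementType": fact["measurementType"],
                        "measurementValue": fact["measurementValue"],
                    }
                )
    return flattened


flat_facts = flatten_to_occurrence_core(occurrences, event_facts)
print(len(event_facts))  # fact rows needed with an Event core
print(len(flat_facts))   # fact rows needed with an Occurrence core
```

With two facts and three occurrences, the Event core layout stores two fact rows, while the flattened Occurrence core layout stores six; the gap grows with the number of occurrences per event.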
I want to stress that we are not building a “specific IPT version” to support an Event core but, rather, we adapted the IPT so that it can be configured to support any generic “core + extension” format to enable its use for exploration of more data formats. This is part of the core codebase and there were no custom forks of the IPT for this work. Our view at GBIF is that if there are significant numbers of data publishers who are keen to adopt, promote and use a (any) format, and the tools can be configured to do so, then we should support it, and, if necessary, use a custom namespace.
*New terms around abundance*
Yes, the discussion on TDWG did fade out, but it was clear that the term “abundance” as recommended by the SIGS report (along with abundanceAsPercent) was confusing many when we were looking for term(s) that reported quantitative measures of organisms in a sample. It also became clear we would need to be able to state the type of quantity being measured. An alternative suggestion for using the MeasurementOrFact class was immediately shot down.
As some of our main use cases were coming from the EU BON project, discussion shifted to that forum and consensus formed about the currently proposed terms. It was within this group that the additional terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we began testing with sample data sets.
Best regards,
Éamonn
Hello,
I would just like to comment on *event core*. I've been doing a lot of work translating published data into Darwin Core. During that process I've wished several times that I could use Event as core. I am happy to hear about that proposed change. It will make it easier to model the data I am working with.
Anne
Same here. Events are central to the work that we do.
Aloha,
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Anne Thessen Sent: Wednesday, August 20, 2014 2:59 AM To: tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hello I would just like to comment on *event core*. I've been doing a lot of work translating published data into Darwin Core. During that process I've wished several times that I could use Event as core. I am happy to hear about that proposed change. It will make it easier to model the data I am working with. Anne
On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
Hi Rob,
Thank you for the feedback. I have tried to address the two main issues you raise below. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project which includes a task on developing/enhancing tools and standards for data sharing with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG/EU BON contributors overlapped) and this is the reason we are presenting the outcomes here for further consideration.
*Event core*
As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation but as things progressed the data wrangling pushed the model back towards the Event core. We actually went through the exercise of mapping multiple test datasets in an iterative process spanning several months' work. In the end, we found that using an Event core better matched the typical sample data we were dealing with, allowing use of a measurement-or-fact extension to be included for the efficient expression of environmental information associated with the event. The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is what type the core has and therefore to which kind of records you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. When picking the Occurrence core those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesnt stop anyone from using the Occurrence core if they so wish. This just provides a different option for datasets that better fit an Event core model.
I want to stress that we are not building a specific IPT version to support an Event core but, rather, we adapted the IPT so that it can be configured to support any generic core + extension format to enable its use for exploration of more data formats. This is part of the core codebase and there were no custom forks of the IPT for this work. Our view at GBIF is that if there are significant numbers of data publishers who are keen to adopt, promote and use a (any) format, and the tools can be configured to do so, then we should support it, and, if necessary, use a custom namespace.
*New terms around abundance*
Yes, the discussion on TDWG did fade out but it was clear that the term abundance as recommended by the SIGS report (along with abundanceAsPercent) was confusing many when we were looking for term(s) that reported quantitative measures of organisms in a sample. It also became clear we would need to be able to state the type of quantity being measured. An alternative suggestion for using the MeasurementsOrFact class was immediately shot down.
As some of our main use cases were coming from the EU BON project, discussion shifted to that forum and consensus formed about the currently proposed terms. It was within this group that the additional terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we began testing with sample data sets.
Best regards,
Éamonn
From: robgur@gmail.com [mailto:robgur@gmail.com] On Behalf Of Robert Guralnick Sent: 19 August 2014 16:56 To: Éamonn Ó Tuama [GBIF] Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hi Éamonn --- I am curious about the outcomes presented in the SIGS paper, in particular, this portion of the paper:
"Solutions without introducing an event core in Darwin Core Archives: During the review of the solutions for the uses cases, it became apparent that either model could be applied to every use case. The core and extensions bore a complementary relationship and between them could express all the required information. The core simply provided the central anchor in the star schema from which to join the additional information. Therefore, using the Occurrence core, well established in the GBIF network through uptake of the IPT, seemed more appropriate than inventing CollectingEvent as an additional core type."
That SIGS paper has John Wieczorek and you both as authors, including many luminaries across the biodiversity standards spectrum. Given the above, its curious to see the EventCore come back again, along with a specific IPT version to support it.
So I see two issues, conflated, in this post you just made. One is the need for an EventCore at all, and the nature of relating Event and Occurrence/Material Sample. The second is the introduction of new terms, which seemingly have arrived after debate on similar terms - but framed around abundance - stalled a year ago. To my mind, these both require some further discussion, because I don't (necessarily) see TDWG community coherence around either issue?
Best, Rob
On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF] eotuama@gbif.org wrote:
Dear All,
GBIF is committed to exploring ways in which the IPT and Darwin Core Archive format can be extended for publishing sample-based data sets. In association with the EU BON project [1], a customised version of the IPT [2] has been deployed to test this using a special type of Darwin Core Archive in which the core is an Event with associated taxon occurrences in an Occurrence extension.
The Darwin Core vocabulary already provides a rich set of terms with many relevant for describing sample-based data. Synthesising several sources of input (GBIF organised workshop on sample data, May 2013 [3], discussions on the TDWG mailing list in late 2013; internal discussion among EU BON project partners), five new terms relating to sample data were identified as essential. The complete model including these new terms are fully described with examples in the online document Publishing sample data using the GBIF IPT [4].
As a first step towards ratification, we would like to register the new terms in the DwC Google Code tracker [5] if there are no major objections on this list. The five terms are:
1. quantity: the number or enumeration value of the quantityType (e.g., individuals, biomass, biovolume, BraunBlanquetScale) per samplingUnit or a percentage measure recorded for the sample.
2. quantityType: : the entity being referred to by quantity, e.g., individuals, biomass, %species, scale type.
3. samplingGeometry: an indication of what kind of space was sampled; select from point, line, area or volume.
4. samplingUnit: the unit of measurement used for reporting the quantity in the sample, e.g., minute, hour, day, metre, metre^2, metre^3. It is combined with quantity and quantityType to provide the complete measurement, e.g., 9 individuals per day, 4 biomass-gm per metre^2.
5. eventSeriesID: an identifier for a set of events that are associated in some way, e.g., a monitoring series; may be a global unique identifier or an identifier specific to the series.
Best regards,
Éamonn
[1] http://eubon.eu
[3] http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
[4] http://links.gbif.org/sample_data_model http://links.gbif.org/sample_data_model
[5] https://code.google.com/p/darwincore/issues/list
____________________________________________________
Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama@gbif.org),
Senior Programme Officer for Interoperability,
Global Biodiversity Information Facility Secretariat,
Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK
Phone: +45 3532 1494; Fax: +45 3532 1480
_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Éamonn et al. --- Thanks for the clarifications. I think these help a ton but it raises a couple more questions for me.
1) I am surprised that you plan to use the MeasurementOrFact extension in relation to the Event core, which seems like a novel (or perhaps awkward or unintended?) mechanism for capturing environmental data, while the same extension was not seen as relevant for describing samples. Can you explain more about the thinking there?
2) There may be a subtle issue here in extending "Event" to be more what you call a "Sampling Event Core". My read of this is that Darwin Core serves as a way to deal with point occurrences, and Event reflects the context of a single capture event (whether a single observation or a bulk sample capture). The changes recommended seem to dramatically extend and change that meaning? It's simply a question that I don't have an answer to, but is Darwin Core the right vehicle to start capturing repeated measures of biomass values from trawls? I don't have an answer but man, terms like quantityType (as a property of occurrence?) give me pause.
3) Is Sampling Unit a controlled vocabulary? For another project, I have looked through (and captured scope, effort and completeness measures from) a large number of published biotic area inventories. The vast majority of these are measured in units like bucket hours or trap nights. Is a "bucket" part of SamplingGeometry or Sampling Unit? I'd be happy to send along the many examples of how biotic inventories of an area are completed, and perhaps it might be good to see how those might be represented using the terms you are proposing?
Best, Rob
On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Same here – Events are central to the work that we do.
Aloha,
Rich
*From:* tdwg-content-bounces@lists.tdwg.org [mailto: tdwg-content-bounces@lists.tdwg.org] *On Behalf Of *Anne Thessen *Sent:* Wednesday, August 20, 2014 2:59 AM *To:* tdwg-content@lists.tdwg.org
*Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hello I would just like to comment on *event core*. I've been doing a lot of work translating published data into Darwin Core. During that process I've wished several times that I could use Event as core. I am happy to hear about that proposed change. It will make it easier to model the data I am working with. Anne
On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
Hi Rob,
Thank you for the feedback. I have tried to address the two main issues you raise below. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project which includes a task on developing/enhancing tools and standards for data sharing with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG/EU BON contributors overlapped) and this is the reason we are presenting the outcomes here for further consideration.
**Event core**
As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation but as things progressed the data wrangling pushed the model back towards the Event core. We actually went through the exercise of mapping multiple test datasets in an iterative process spanning several months' work. In the end, we found that using an Event core better matched the typical sample data we were dealing with, allowing use of a measurement-or-fact extension to be included for the efficient expression of environmental information associated with the event. The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is what type the core has and therefore to which kind of records you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. When picking the Occurrence core those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesn’t stop anyone from using the Occurrence core if they so wish. This just provides a different option for datasets that better fit an Event core model.
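To make the difference between the two layouts concrete, the following Python sketch joins an Event core with an Occurrence extension and a measurement-or-fact extension, then flattens the result into Occurrence-core style rows. The field names are existing Darwin Core terms, but the data values and the function name `flatten_to_occurrence_core` are invented for illustration and do not come from the IPT or any of the test datasets.

```python
# Event core: one row per sampling event (hypothetical values).
events = [
    {"eventID": "trawl-1", "eventDate": "2014-08-01", "locality": "Station A"},
]

# Occurrence extension: each row points back to its event via eventID.
occurrences = [
    {"eventID": "trawl-1", "occurrenceID": "occ-1", "scientificName": "Gadus morhua"},
    {"eventID": "trawl-1", "occurrenceID": "occ-2", "scientificName": "Clupea harengus"},
]

# Measurement-or-fact extension attached to the event, stored once.
event_facts = [
    {"eventID": "trawl-1", "measurementType": "water temperature",
     "measurementValue": "11.5", "measurementUnit": "degrees Celsius"},
]


def flatten_to_occurrence_core(events, occurrences, event_facts):
    """Join the star schema into one row per occurrence and event-level fact."""
    events_by_id = {e["eventID"]: e for e in events}
    facts_by_event = {}
    for fact in event_facts:
        facts_by_event.setdefault(fact["eventID"], []).append(fact)

    rows = []
    for occ in occurrences:
        event = events_by_id[occ["eventID"]]
        # An occurrence whose event has no facts still yields one row.
        for fact in facts_by_event.get(occ["eventID"], [{}]):
            rows.append({**event, **occ, **fact})
    return rows


flat_rows = flatten_to_occurrence_core(events, occurrences, event_facts)
print(len(event_facts), "fact row(s) with an Event core")
print(len(flat_rows), "row(s) repeating the same fact with an Occurrence core")
```

In this toy case the single temperature reading is stored once under the Event core but is copied onto both occurrence rows after flattening, which is the repetition described above.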
I want to stress that we are not building a “specific IPT version” to support an Event core but, rather, we adapted the IPT so that it can be configured to support any generic “core + extension” format to enable its use for exploration of more data formats. This is part of the core codebase and there were no custom forks of the IPT for this work. Our view at GBIF is that if there are significant numbers of data publishers who are keen to adopt, promote and use a (any) format, and the tools can be configured to do so, then we should support it, and, if necessary, use a custom namespace.
**New terms around abundance**
Yes, the discussion on TDWG did fade out, but it was clear that the term “abundance” as recommended by the SIGS report (along with abundanceAsPercent) was confusing many when we were looking for term(s) that reported quantitative measures of organisms in a sample. It also became clear we would need to be able to state the type of quantity being measured. An alternative suggestion for using the MeasurementOrFact class was immediately shot down.
As some of our main use cases were coming from the EU BON project, discussion shifted to that forum and consensus formed about the currently proposed terms. It was within this group that the additional terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we began testing with sample data sets.
Best regards,
Éamonn
*From:* robgur@gmail.com [mailto:robgur@gmail.com] *On Behalf Of* Robert Guralnick *Sent:* 19 August 2014 16:56 *To:* Éamonn Ó Tuama [GBIF] *Cc:* TDWG Content Mailing List *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hi Éamonn --- I am curious about the outcomes presented in the SIGS paper, in particular, this portion of the paper:
"Solutions without introducing an event core in Darwin Core Archives: During the review of the solutions for the uses cases, it became apparent that either model could be applied to every use case. The core and extensions bore a complementary relationship and between them could express all the required information. The core simply provided the central anchor in the star schema from which to join the additional information. Therefore, using the Occurrence core, well established in the GBIF network through uptake of the IPT, seemed more appropriate than inventing CollectingEvent as an additional core type."
That SIGS paper has John Wieczorek and you both as authors, along with many luminaries across the biodiversity standards spectrum. Given the above, it's curious to see the EventCore come back again, along with a specific IPT version to support it.
So I see two issues, conflated, in this post you just made. One is the need for an EventCore at all, and the nature of relating Event and Occurrence/Material Sample. The second is the introduction of new terms, which seemingly have arrived after debate on similar terms - but framed around abundance - stalled a year ago. To my mind, these both require some further discussion, because I don't (necessarily) see TDWG community coherence around either issue?
Best, Rob
--
Anne E. Thessen, Ph.D.
The Data Detektiv, Owner and Founder
Ronin Institute, Research Scholar
443.225.9185
Hi Rob,
I would like to respond to your item number 2. From my perspective, I deal with lots of published descriptions of taxa. The text might say something like "I saw species A in the Chesapeake Bay, the Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The biomass range obviously corresponds to at least three different occurrences, but how to divide the biomass data? I would love to be able to have an *event* to attach it all to. There are almost two different levels of events - a sampling event and a "study event". The "study event" would correspond to the type of event I would like to use in the above example. It may not be ideal, but for the old literature that might be the best we can do.
I have to admit that I don't know enough about trawl data to understand why an event core would be a problem. It seems that the trawl would be an event and each biomass measure (of each fish) would be attached to a separate occurrence which is attached to that event. Am I understanding this wrong?
btw - I found a workaround for the example I gave, so it's not impossible to model with the current structure....
Anne
Anne -- I don't know the answers! These are questions for Eamonn. I would presume that a sample could be a jumble of species or even just water or soil samples, and biomass would refer to that sample - but maybe that isn't a use case being considered? The examples given in the longer document all link an event_id to species name and some measure of quantity for that species (to the species, not an individual specimen), so I assume that is the prevailing (or only) case? Best, Rob
On Wed, Aug 20, 2014 at 11:56 AM, Anne Thessen annethessen@gmail.com wrote:
Hi Rob I would like to respond to your item number 2. From my perspective, I deal with lots of published descriptions of taxa. The text might say something like "I saw species A in the Chesapeake Bay, the Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The biomass range obviously corresponds to at least three different occurrences, but how to divide the biomass data? I would love to be able to have an *event* to attach it all to. There is almost two different levels of events - a sampling event and a "study event". The "study event" would correspond to the type of event I would like to use in the above example. It may not be ideal, but for the old literature that might be the best we can do. I have to admit that I don't know enough about trawl data to understand why an event core would be a problem. It seems that the trawl would be an event and each biomass measure (of each fish) would be attached to a separate occurrence which is attached to that event. Am I understanding this wrong? btw - I found a workaround for the example I gave, so it's not impossible to model with the current structure.... Anne
On 8/20/2014 1:16 PM, Robert Guralnick wrote:
Éamonn et al. --- Thanks for the clarifications. I think these help a ton but it raises a couple more questions for me.
- I am surprised that you plan to use of MeasurementorFact extension
in relation to the Event core, which seems like a novel (or perhaps awkward or unintended?) mechanism for capturing environmental data, but the same extension was not be seen as relevant for describing samples? Can you explain more about the thinking there?
- There may be a subtle issue here extending "Event" to be more what
you call a "Sampling Event Core". My read of this is that Darwin Core serves as a way to deal with point occurrences and Event reflects the context of a single capture event (whether a single observation, or a bulk sample capture). The changes recommended seem to dramatically extend and change that meaning? Its simply a question that I don't have answer to, but is Darwin Core, the right vehicle to start capturing repeated measures of biomass values from trawls? I don't have answer but man, terms like quantityType (as a property of occurrence?) give me pause.
- Is Sampling Unit a controlled vocabulary? For another project, I
have looked through - and captured scope, effort and completeness measures from - a large number of published biotic area inventories. The vast majorities of these are measured in units like bucket hours, or trap nights. Is a "bucket" part of SamplingGeometry or Sampling Unit? I'd be happy to send along all the many examples of how biotic inventories of an area are completed and perhaps it might be good to see how those might be represented using the terms you are proposing?
Best, Rob
On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle <deepreef@bishopmuseum.org
wrote:
Same here – Events are central to the work that we do.
Aloha,
Rich
*From:* tdwg-content-bounces@lists.tdwg.org [mailto: tdwg-content-bounces@lists.tdwg.org] *On Behalf Of *Anne Thessen *Sent:* Wednesday, August 20, 2014 2:59 AM *To:* tdwg-content@lists.tdwg.org
*Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hello I would just like to comment on *event core*. I've been doing a lot of work translating published data into Darwin Core. During that process I've wished several times that I could use Event as core. I am happy to hear about that proposed change. It will make it easier to model the data I am working with. Anne
On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
Hi Rob,
Thank you for the feedback. I have tried to address the two main issues you raise below. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project which includes a task on developing/enhancing tools and standards for data sharing with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG/EU BON contributors overlapped) and this is the reason we are presenting the outcomes here for further consideration.
**Event core**
As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation but as things progressed the data wrangling pushed the model back towards the Event core. We actually went through the exercise of mapping multiple test datasets in an iterative process spanning several months' work. In the end, we found that using an Event core better matched the typical sample data we were dealing with, allowing use of a measurement-or-fact extension to be included for the efficient expression of environmental information associated with the event. The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is what type the core has and therefore to which kind of records you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. When picking the Occurrence core those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesn’t stop anyone from using the Occurrence core if they so wish. This just provides a different option for datasets that better fit an Event core model.
I want to stress that we are not building a “specific IPT version” to support an Event core but, rather, we adapted the IPT so that it can be configured to support any generic “core + extension” format to enable its use for exploration of more data formats. This is part of the core codebase and there were no custom forks of the IPT for this work. Our view at GBIF is that if there are significant numbers of data publishers who are keen to adopt, promote and use a (any) format, and the tools can be configured to do so, then we should support it, and, if necessary, use a custom namespace.
**New terms around abundance**
Yes, the discussion on TDWG did fade out but it was clear that the term “abundance” as recommended by the SIGS report (along with abundanceAsPercent) was confusing many when we were looking for term(s) that reported quantitative measures of organisms in a sample. It also became clear we would need to be able to state the type of quantity being measured. An alternative suggestion for using the MeasurementsOrFact class was immediately shot down.
As some of our main use cases were coming from the EU BON project, discussion shifted to that forum and consensus formed about the currently proposed terms. It was within this group that the additional terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we began testing with sample data sets.
Best regards,
Éamonn
*From:* robgur@gmail.com [mailto:robgur@gmail.com robgur@gmail.com] *On Behalf Of *Robert Guralnick *Sent:* 19 August 2014 16:56 *To:* Éamonn Ó Tuama [GBIF] *Cc:* TDWG Content Mailing List *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hi Éamonn --- I am curious about the outcomes presented in the SIGS paper, in particular, this portion of the paper:
"Solutions without introducing an event core in Darwin Core Archives: During the review of the solutions for the uses cases, it became apparent that either model could be applied to every use case. The core and extensions bore a complementary relationship and between them could express all the required information. The core simply provided the central anchor in the star schema from which to join the additional information. Therefore, using the Occurrence core, well established in the GBIF network through uptake of the IPT, seemed more appropriate than inventing CollectingEvent as an additional core type."
That SIGS paper has John Wieczorek and you both as authors, along with many luminaries across the biodiversity standards spectrum. Given the above, it's curious to see the EventCore come back again, along with a specific IPT version to support it.
So I see two issues, conflated, in this post you just made. One is the need for an EventCore at all, and the nature of relating Event and Occurrence/Material Sample. The second is the introduction of new terms, which seemingly have arrived after debate on similar terms - but framed around abundance - stalled a year ago. To my mind, these both require some further discussion, because I don't (necessarily) see TDWG community coherence around either issue?
Best, Rob
On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF] eotuama@gbif.org wrote:
Dear All,
GBIF is committed to exploring ways in which the IPT and Darwin Core Archive format can be extended for publishing sample-based data sets. In association with the EU BON project [1], a customised version of the IPT [2] has been deployed to test this using a special type of Darwin Core Archive in which the core is an “Event” with associated taxon occurrences in an “Occurrence” extension.
The Darwin Core vocabulary already provides a rich set of terms with many relevant for describing sample-based data. Synthesising several sources of input (GBIF organised workshop on sample data, May 2013 [3], discussions on the TDWG mailing list in late 2013; internal discussion among EU BON project partners), five new terms relating to sample data were identified as essential. The complete model including these new terms are fully described with examples in the online document “Publishing sample data using the GBIF IPT” [4].
As a first step towards ratification, we would like to register the new terms in the DwC Google Code tracker [5] if there are no major objections on this list. The five terms are:
*quantity*: the number or enumeration value of the quantityType (e.g., individuals, biomass, biovolume, BraunBlanquetScale) per samplingUnit or a percentage measure recorded for the sample.
*quantityType*: the entity being referred to by quantity, e.g., individuals, biomass, %species, scale type.
*samplingGeometry*: an indication of what kind of space was sampled; select from point, line, area or volume.
*samplingUnit*: the unit of measurement used for reporting the quantity in the sample, e.g., minute, hour, day, metre, metre^2, metre^3. It is combined with quantity and quantityType to provide the complete measurement, e.g., 9 individuals per day, 4 biomass-gm per metre^2.
*eventSeriesID*: an identifier for a set of events that are associated in some way, e.g., a monitoring series; may be a global unique identifier or an identifier specific to the series.
Best regards,
Éamonn
[1] http://eubon.eu
[3] http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
[4] http://links.gbif.org/sample_data_model
[5] https://code.google.com/p/darwincore/issues/list
*Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama@gbif.org),*
*Senior Programme Officer for Interoperability,*
*Global Biodiversity Information Facility Secretariat,*
*Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK*
*Phone: +45 3532 1494; Fax: +45 3532 1480*
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Anne E. Thessen, Ph.D.
The Data Detektiv, Owner and Founder
Ronin Institute, Research Scholar
443.225.9185
Rob,
this proposal is really for monitoring surveys, not to be confused with material samples like environmental or tissue samples, which have a distinct new dwc class, MaterialSample.
We tend to overload the term sampling a lot, and it helps to treat material samples differently from pure observational "sampling". That is why the existing Event class was used as the core and classic Occurrence records as extensions. A classic example is a vegetation survey where each plot represents an Event record and each recorded species in that plot will be an Occurrence extension record with a given quantity. Darwin Core already offers individualCount to specify quantity, but it is a very specific way of measuring "abundance" restricted to only some use cases. Abiotic measurements about the plot (e.g. soil type, pH, temperature) can be published using the measurements or facts extension linked to the Event core.
Markus
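As a rough illustration of the vegetation survey case described above, the following Python sketch lays out one plot as an Event core record, its species as Occurrence extension rows, and its abiotic values as measurement-or-fact rows, all linked by eventID. Every identifier, species name and value is invented for illustration, and quantity and quantityType are the proposed (not ratified) terms from this thread.

```python
import csv
import io

# Hypothetical plot survey: every identifier, name and value here is made up.
events = [
    {"eventID": "plot-1", "eventDate": "2014-06-01", "samplingProtocol": "vegetation plot"},
]
occurrences = [
    {"eventID": "plot-1", "scientificName": "Species A", "quantity": "9", "quantityType": "individuals"},
    {"eventID": "plot-1", "scientificName": "Species B", "quantity": "4", "quantityType": "biomass"},
]
measurements = [
    {"eventID": "plot-1", "measurementType": "pH", "measurementValue": "6.5", "measurementUnit": ""},
    {"eventID": "plot-1", "measurementType": "temperature", "measurementValue": "14.2", "measurementUnit": "degree C"},
]


def to_csv(rows):
    """Serialise a list of same-keyed dicts as CSV text (one table per data file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_for_event(event_id, rows):
    """Return the extension rows that point at a given core Event record."""
    return [row for row in rows if row["eventID"] == event_id]


print(to_csv(events))
print(to_csv(occurrences))
print(to_csv(measurements))
```

With this layout the plot-level facts are stored once per Event; with an Occurrence core they would have to be repeated on each of the occurrence rows.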
On 20 Aug 2014, at 20:08, Robert Guralnick Robert.Guralnick@colorado.edu wrote:
Anne -- I don't know the answers! These are questions for Eamonn. I would presume that a sample could be a jumble of species or even just water or soil samples, and biomass would refer to that sample - but maybe that isn't a use case being considered? The examples given in the longer document all link an event_id to species name and some measure of quantity for that species (to the species, not an individual specimen), so I assume that is the prevailing (or only) case? Best, Rob
On Wed, Aug 20, 2014 at 11:56 AM, Anne Thessen annethessen@gmail.com wrote:
Hi Rob,
I would like to respond to your item number 2. From my perspective, I deal with lots of published descriptions of taxa. The text might say something like "I saw species A in the Chesapeake Bay, the Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The biomass range obviously corresponds to at least three different occurrences, but how to divide the biomass data? I would love to be able to have an *event* to attach it all to. There are almost two different levels of events - a sampling event and a "study event". The "study event" would correspond to the type of event I would like to use in the above example. It may not be ideal, but for the old literature that might be the best we can do.
I have to admit that I don't know enough about trawl data to understand why an event core would be a problem. It seems that the trawl would be an event and each biomass measure (of each fish) would be attached to a separate occurrence which is attached to that event. Am I understanding this wrong?
btw - I found a workaround for the example I gave, so it's not impossible to model with the current structure....
Anne
On 8/20/2014 1:16 PM, Robert Guralnick wrote:
Éamonn et al. --- Thanks for the clarifications. I think these help a ton but it raises a couple more questions for me.
I am surprised that you plan to use the MeasurementOrFact extension in relation to the Event core, which seems like a novel (or perhaps awkward or unintended?) mechanism for capturing environmental data, but the same extension was not seen as relevant for describing samples? Can you explain more about the thinking there?
There may be a subtle issue here extending "Event" to be more what you call a "Sampling Event Core". My read of this is that Darwin Core serves as a way to deal with point occurrences and Event reflects the context of a single capture event (whether a single observation, or a bulk sample capture). The changes recommended seem to dramatically extend and change that meaning? It's simply a question that I don't have an answer to, but is Darwin Core the right vehicle to start capturing repeated measures of biomass values from trawls? I don't have an answer but man, terms like quantityType (as a property of occurrence?) give me pause.
Is Sampling Unit a controlled vocabulary? For another project, I have looked through - and captured scope, effort and completeness measures from - a large number of published biotic area inventories. The vast majority of these are measured in units like bucket hours, or trap nights. Is a "bucket" part of SamplingGeometry or Sampling Unit? I'd be happy to send along all the many examples of how biotic inventories of an area are completed and perhaps it might be good to see how those might be represented using the terms you are proposing?
Best, Rob
On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle deepreef@bishopmuseum.org wrote: Same here – Events are central to the work that we do.
Aloha,
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Anne Thessen Sent: Wednesday, August 20, 2014 2:59 AM To: tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hello I would just like to comment on *event core*. I've been doing a lot of work translating published data into Darwin Core. During that process I've wished several times that I could use Event as core. I am happy to hear about that proposed change. It will make it easier to model the data I am working with. Anne
On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
Hi Rob,
Thank you for the feedback. I have tried to address the two main issues you raise below. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project which includes a task on developing/enhancing tools and standards for data sharing with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG/EU BON contributors overlapped) and this is the reason we are presenting the outcomes here for further consideration.
*Event core*
As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation but as things progressed the data wrangling pushed the model back towards the Event core. We actually went through the exercise of mapping multiple test datasets in an iterative process spanning several months' work. In the end, we found that using an Event core better matched the typical sample data we were dealing with, allowing use of a measurement-or-fact extension to be included for the efficient expression of environmental information associated with the event. The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is what type the core has and therefore to which kind of records you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. When picking the Occurrence core those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesn’t stop anyone from using the Occurrence core if they so wish. This just provides a different option for datasets that better fit an Event core model.
I want to stress that we are not building a “specific IPT version” to support an Event core but, rather, we adapted the IPT so that it can be configured to support any generic “core + extension” format to enable its use for exploration of more data formats. This is part of the core codebase and there were no custom forks of the IPT for this work. Our view at GBIF is that if there are significant numbers of data publishers who are keen to adopt, promote and use a (any) format, and the tools can be configured to do so, then we should support it, and, if necessary, use a custom namespace.
*New terms around abundance*
Yes, the discussion on TDWG did fade out but it was clear that the term “abundance” as recommended by the SIGS report (along with abundanceAsPercent) was confusing many when we were looking for term(s) that reported quantitative measures of organisms in a sample. It also became clear we would need to be able to state the type of quantity being measured. An alternative suggestion for using the MeasurementsOrFact class was immediately shot down.
As some of our main use cases were coming from the EU BON project, discussion shifted to that forum and consensus formed about the currently proposed terms. It was within this group that the additional terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we began testing with sample data sets.
Best regards,
Éamonn
Some time ago I tried to write an inventory of all the different ways in which occurrences are published with dwc archives. It includes the Occurrence, Taxon, Event and MaterialSample cores with various extensions: https://github.com/mdoering/dwca-examples/blob/master/README.md
It might need a little update now, but hopefully it is still useful to get a complete overview.
Markus
Hi Rob, Anne, Rich,
I think Markus has answered your question as to why we opted for an Event core which is being used in the sense described by Anne and Rich. For any event, you can have a list of species in an Occurrence extension and for each species, you can include quantity and quantityType, e.g., biomass, etc. The proposed term eventSeriesID was intended for linking together related events, although it now looks like parentEventID might be a better, more flexible term. The measurementOrFact extension is a good fit for capturing environmental information relating to an event. See, e.g., the Gialova Lagoon brackish water invertebrate test data set [1] where a set of 18 environmental variables, including temp, pH, Rdx, particulate organic matter, dissolved oxygen, salinity, chlorophyll-a were measured for each sampling station-sampling period combination. An example mapping is:
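| Id | measurementType | measurementValue | measurementUnit | measurementRemarks |
|----|-----------------|------------------|-----------------|--------------------|
| IA | Tmp (sed) | 21.5 | degree C | Tmp (sed): temperature at the bottom surface |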
*Controlled vocabularies*
Ideally, the values for samplingUnit and quantityType would be selected from controlled vocabularies. This is, effectively, what we do by presenting a small list of values in a drop-down menu. The current values are what we derived from example data sets and discussion, but they can undoubtedly be extended and improved.
We capture “bucket”-type measures through a combination of samplingEffort, samplingGeometry and samplingUnit. For example, a pitfall trap (in a point location) left out for 16 days might have samplingEffort: 16, samplingGeometry: point and samplingUnit: day. Three m^2 quadrats in a shore survey might have samplingEffort: 3, samplingGeometry: area and samplingUnit: m^2.
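The two examples above could be encoded along the following lines. This is only a sketch: the class `SamplingDescription` and its validation are assumptions made for illustration, and the four allowed geometry values are taken from the proposed samplingGeometry definition.

```python
from dataclasses import dataclass

# Values listed in the proposed samplingGeometry definition.
SAMPLING_GEOMETRIES = {"point", "line", "area", "volume"}


@dataclass(frozen=True)
class SamplingDescription:
    """Hypothetical record combining samplingEffort, samplingGeometry, samplingUnit."""
    sampling_effort: float
    sampling_geometry: str
    sampling_unit: str

    def __post_init__(self):
        # Reject geometries outside the small controlled list.
        if self.sampling_geometry not in SAMPLING_GEOMETRIES:
            raise ValueError(f"unknown samplingGeometry: {self.sampling_geometry!r}")

    def summary(self) -> str:
        return f"{self.sampling_effort:g} {self.sampling_unit} ({self.sampling_geometry})"


pitfall_trap = SamplingDescription(16, "point", "day")     # trap left out for 16 days
shore_quadrats = SamplingDescription(3, "area", "m^2")     # three m^2 quadrats

print(pitfall_trap.summary())
print(shore_quadrats.summary())
```

Under this encoding a "bucket" would not be a valid samplingGeometry, so a measure such as bucket hours would need the bucket to be expressed elsewhere, which is the kind of case the compilation requested below would help to test.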
It would be very useful to see your compilation of scope, effort and completeness measures to see if we can express them in our model and/or if we need to reconsider our approach.
Éamonn
[1] http://eubon-ipt.gbif.org/resource.do?r=ionian-brackish-lagoon
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Markus Döring Sent: 20 August 2014 23:47 To: Robert Guralnick Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Rob,
this proposal is really for monitoring surveys, not to be confused with material samples like environmental or tissue samples, which have a distinct new dwc class MaterialSample.
We tend to overload the term sampling a lot and it helps to treat material samples differently from pure observational "sampling". That is why the existing Event class was used as the core and classic Occurrence records as extensions. A classic example is a vegetation survey where each plot represents an Event record and each recorded species in that plot will be an Occurrence extension record with a given quantity. Darwin Core already offers individualCount to specify quantity, but it is a very specific way of measuring "abundance" restricted to only some use cases. Abiotic measurements about the plot (e.g. soil type, pH, temperature) can be published using the measurements or facts extension linked to the Event core.
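The vegetation survey layout described above can be sketched in Python as two record sets joined on eventID. All identifiers and values here (plot names, species names, quantities) are made up for illustration; only the term names eventID, eventDate, scientificName, quantity and quantityType are taken from Darwin Core or the proposal.

```python
from collections import defaultdict

# Event core: one record per plot (hypothetical values).
events = [
    {"eventID": "plot-1", "eventDate": "2014-06-01"},
    {"eventID": "plot-2", "eventDate": "2014-06-01"},
]

# Occurrence extension: one record per species recorded in a plot.
occurrences = [
    {"eventID": "plot-1", "scientificName": "Species A", "quantity": "12", "quantityType": "individuals"},
    {"eventID": "plot-1", "scientificName": "Species B", "quantity": "3", "quantityType": "individuals"},
    {"eventID": "plot-2", "scientificName": "Species A", "quantity": "7", "quantityType": "individuals"},
]


def occurrences_by_event(events, occurrences):
    """Group extension records under the core record they point to."""
    grouped = defaultdict(list)
    for occ in occurrences:
        grouped[occ["eventID"]].append(occ)
    # Keep every event, even one with no recorded species.
    return {ev["eventID"]: grouped.get(ev["eventID"], []) for ev in events}


for event_id, occs in occurrences_by_event(events, occurrences).items():
    names = ", ".join(o["scientificName"] for o in occs)
    print(f"{event_id}: {names}")
```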
Markus
On 20 Aug 2014, at 20:08, Robert Guralnick Robert.Guralnick@colorado.edu wrote:
Anne -- I don't know the answers! These are questions for Eamonn. I would presume that a sample could be a jumble of species or even just water or soil samples, and biomass would refer to that sample - but maybe that isn't a use case being considered? The examples given in the longer document all link an event_id to species name and some measure of quantity for that species (to the species, not an individual specimen), so I assume that is the prevailing (or only) case?
Best, Rob
On Wed, Aug 20, 2014 at 11:56 AM, Anne Thessen annethessen@gmail.com wrote:
Hi Rob I would like to respond to your item number 2. From my perspective, I deal with lots of published descriptions of taxa. The text might say something like "I saw species A in the Chesapeake Bay, the Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The biomass range obviously corresponds to at least three different occurrences, but how to divide the biomass data? I would love to be able to have an *event* to attach it all to. There are almost two different levels of events - a sampling event and a "study event". The "study event" would correspond to the type of event I would like to use in the above example. It may not be ideal, but for the old literature that might be the best we can do. I have to admit that I don't know enough about trawl data to understand why an event core would be a problem. It seems that the trawl would be an event and each biomass measure (of each fish) would be attached to a separate occurrence which is attached to that event. Am I understanding this wrong? btw - I found a workaround for the example I gave, so it's not impossible to model with the current structure.... Anne
On 8/20/2014 1:16 PM, Robert Guralnick wrote:
Éamonn et al. --- Thanks for the clarifications. I think these help a ton but it raises a couple more questions for me.
1) I am surprised that you plan to use the MeasurementOrFact extension in relation to the Event core, which seems like a novel (or perhaps awkward or unintended?) mechanism for capturing environmental data, but the same extension was not seen as relevant for describing samples? Can you explain more about the thinking there?
2) There may be a subtle issue here extending "Event" to be more what you call a "Sampling Event Core". My read of this is that Darwin Core serves as a way to deal with point occurrences and Event reflects the context of a single capture event (whether a single observation, or a bulk sample capture). The changes recommended seem to dramatically extend and change that meaning? It's simply a question that I don't have an answer to, but is Darwin Core the right vehicle to start capturing repeated measures of biomass values from trawls? I don't have an answer but man, terms like quantityType (as a property of occurrence?) give me pause.
3) Is Sampling Unit a controlled vocabulary? For another project, I have looked through - and captured scope, effort and completeness measures from - a large number of published biotic area inventories. The vast majority of these are measured in units like bucket hours, or trap nights. Is a "bucket" part of SamplingGeometry or Sampling Unit? I'd be happy to send along all the many examples of how biotic inventories of an area are completed and perhaps it might be good to see how those might be represented using the terms you are proposing?
Best, Rob
On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Same here - Events are central to the work that we do.
Aloha,
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Anne Thessen Sent: Wednesday, August 20, 2014 2:59 AM To: tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hello I would just like to comment on *event core*. I've been doing a lot of work translating published data into Darwin Core. During that process I've wished several times that I could use Event as core. I am happy to hear about that proposed change. It will make it easier to model the data I am working with. Anne
On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
Hi Rob,
Thank you for the feedback. I have tried to address the two main issues you raise below. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project which includes a task on developing/enhancing tools and standards for data sharing with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG/EU BON contributors overlapped) and this is the reason we are presenting the outcomes here for further consideration.
*Event core*
As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation but as things progressed the data wrangling pushed the model back towards the Event core. We actually went through the exercise of mapping multiple test datasets in an iterative process spanning several months' work. In the end, we found that using an Event core better matched the typical sample data we were dealing with, allowing use of a measurement-or-fact extension to be included for the efficient expression of environmental information associated with the event. The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is what type the core has and therefore to which kind of records you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. When picking the Occurrence core those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesn't stop anyone from using the Occurrence core if they so wish. This just provides a different option for datasets that better fit an Event core model.
I want to stress that we are not building a “specific IPT version” to support an Event core but, rather, we adapted the IPT so that it can be configured to support any generic “core + extension” format to enable its use for exploration of more data formats. This is part of the core codebase and there were no custom forks of the IPT for this work. Our view at GBIF is that if there are significant numbers of data publishers who are keen to adopt, promote and use a (any) format, and the tools can be configured to do so, then we should support it, and, if necessary, use a custom namespace.
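To make the repetition argument concrete, the following Python sketch counts measurement rows under each layout. The function name and the occurrence identifiers are hypothetical, and the salinity value is invented for illustration; only the Tmp (sed) value of 21.5 degree C comes from the example mapping given on this list.

```python
def flatten_to_occurrence_core(facts, occurrences):
    """Copy every event-level fact onto every occurrence record.

    This mimics what an Occurrence core would require: the facts have no
    Event record to hang from, so each one is repeated per occurrence.
    """
    rows = []
    for occ in occurrences:
        for fact in facts:
            rows.append({"occurrenceID": occ["occurrenceID"], **fact})
    return rows


# Event-level facts for one sampling event (second value is hypothetical).
event_facts = [
    {"measurementType": "Tmp (sed)", "measurementValue": "21.5", "measurementUnit": "degree C"},
    {"measurementType": "salinity", "measurementValue": "30", "measurementUnit": "psu"},
]

# Three species recorded in that event (hypothetical identifiers).
event_occurrences = [{"occurrenceID": f"occ-{i}"} for i in range(1, 4)]

event_core_rows = len(event_facts)  # facts stored once, linked to the event
occurrence_core_rows = len(flatten_to_occurrence_core(event_facts, event_occurrences))

print(f"Event core: {event_core_rows} fact rows")
print(f"Occurrence core: {occurrence_core_rows} fact rows")
```

With 18 environmental variables and many species per event, as in the Gialova Lagoon test data set, the difference grows in proportion to the number of occurrences per event.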
*New terms around abundance*
Yes, the discussion on TDWG did fade out but it was clear that the term “abundance” as recommended by the SIGS report (along with abundanceAsPercent) was confusing many when we were looking for term(s) that reported quantitative measures of organisms in a sample. It also became clear we would need to be able to state the type of quantity being measured. An alternative suggestion for using the MeasurementsOrFact class was immediately shot down.
As some of our main use cases were coming from the EU BON project, discussion shifted to that forum and consensus formed about the currently proposed terms. It was within this group that the additional terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we began testing with sample data sets.
Best regards,
Éamonn
From: robgur@gmail.com [mailto:robgur@gmail.com] On Behalf Of Robert Guralnick Sent: 19 August 2014 16:56 To: Éamonn Ó Tuama [GBIF] Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hi Éamonn --- I am curious about the outcomes presented in the SIGS paper, in particular, this portion of the paper:
"Solutions without introducing an event core in Darwin Core Archives: During the review of the solutions for the uses cases, it became apparent that either model could be applied to every use case. The core and extensions bore a complementary relationship and between them could express all the required information. The core simply provided the central anchor in the star schema from which to join the additional information. Therefore, using the Occurrence core, well established in the GBIF network through uptake of the IPT, seemed more appropriate than inventing CollectingEvent as an additional core type."
That SIGS paper has John Wieczorek and you both as authors, including many luminaries across the biodiversity standards spectrum. Given the above, it's curious to see the EventCore come back again, along with a specific IPT version to support it.
So I see two issues, conflated, in this post you just made. One is the need for an EventCore at all, and the nature of relating Event and Occurrence/Material Sample. The second is the introduction of new terms, which seemingly have arrived after debate on similar terms - but framed around abundance - stalled a year ago. To my mind, these both require some further discussion, because I don't (necessarily) see TDWG community coherence around either issue?
Best, Rob
This proposal is treading on ground that is quite similar to other observations and measurements standards for data exchange that are already mature, in particular:
* OGC Observations and Measurements (http://www.opengeospatial.org/standards/om)
* Extensible Observation Ontology (OBOE; https://semtools.ecoinformatics.org/oboe)
The former is a standard and broadly deployed, whereas the latter is part of a research program in the use of ontologies for measurements. Through collaboration between the two projects, they've been modified to be reasonably isomorphic, but O&M uses an XML serialization while OBOE uses an OWL-DL serialization. They largely express the same measurements and sampling model once one gets beyond the terminology differences.
So, I'm wondering if it makes much sense to extend Darwin Core, which is at heart an Occurrence exchange syntax, into this measurements area that is well represented by these other existing specifications? I'm curious to hear why people would even want to do this. And if we do go down this path, won't we just end up with a new syntax that does essentially what O&M and OBOE do now?
Matt
On Thu, Aug 21, 2014 at 12:22 AM, Éamonn Ó Tuama [GBIF] eotuama@gbif.org wrote:
Hi Rob, Anne, Rich,
I think Markus has answered your question as to why we opted for an Event core which is being used in the sense described by Anne and Rich. For any event, you can have a list of species in an Occurrence extension and for each species, you can include quantity and quantityType, e.g., biomass, etc. The proposed term eventSeriesID was intended for linking together related events, although it now looks like parentEventID might be a better, more flexible term. The measurementOrFact extension is a good fit for capturing environmental information relating to an event. See, e.g., the Gialova Lagoon brackish water invertebrate test data set [1] where a set of 18 environmental variables, including temp, pH, Rdx, particulate organic matter, dissolved oxygen, salinity, chlorophyll-a were measured for each sampling station-sampling period combination. An example mapping is:
Id measurementType measurementValue measurementUnit measurementRemarks
IA Tmp (sed) 21.5 degree C Tmp (sed): temperature at the bottom surface
**Controlled vocabularies**
Ideally, the values for samplingUnit and quantityType would be selected from controlled vocabularies. This is, effectively, what we do by presenting a small list of values in a drop-down menu. The current values are what we derived for example data sets and discussion but they can undoubtedly be extended and improved.
We capture “bucket” type measures through a combination of samplingEffort, samplingGeometry and samplingUnit. For example, a pitfall trap (in a point location) left out for 16 days might have samplingEffort: 16, samplingGeometry: point and samplingUnit: day. Three m^2 quadrats in a shore survey might have samplingEffort: 3, samplingGeometry: area and samplingUnit: m^2.
It would be very useful to see your compilation of scope, effort and completeness measures to see if we can express them in our model and/or if we need to reconsider our approach.
Éamonn
[1] http://eubon-ipt.gbif.org/resource.do?r=ionian-brackish-lagoon
*From:* tdwg-content-bounces@lists.tdwg.org [mailto: tdwg-content-bounces@lists.tdwg.org] *On Behalf Of *Markus Döring *Sent:* 20 August 2014 23:47 *To:* Robert Guralnick
*Cc:* TDWG Content Mailing List *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Rob,
this proposal if for monitoring surveys really, not to be confused with material samples like environmental or tissue samples which have a distinct new dwc class MaterialSample.
We tend to overload the term sampling a lot and it helps treating material samples different from pure observational "sampling". That is why the existing Event class was used as the core and classic Occurrence records as extensions. A classic example is a vegetation survey where each plot represents an Event record and each recorded species in that plot will be an Occurrence extension record with a given quantity. Darwin Core already offers individualCount to specify quantity, but it is a very specific way of measuring "abundance" restricted to only some use cases. Abiotic measurements about the plot (e.g. soil type, pH, temperature) can be published using the measurements or facts extension linked to the Event core.
Markus
On 20 Aug 2014, at 20:08, Robert Guralnick Robert.Guralnick@colorado.edu wrote:
Anne -- I don't know the answers! These are questions for Eamonn. I would presume that a sample could be a jumble of species or even just water or soil samples, and biomass would refer to that sample - but maybe that isn't a use case being considered? The examples given in the longer document all link an event_id to species name and some measure of quantity for that species (to the species, not an individual specimen), so I assume that is the prevailing (or only) case?
Best, Rob
On Wed, Aug 20, 2014 at 11:56 AM, Anne Thessen annethessen@gmail.com wrote:
Hi Rob I would like to respond to your item number 2. From my perspective, I deal with lots of published descriptions of taxa. The text might say something like "I saw species A in the Chesapeake Bay, the Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The biomass range obviously corresponds to at least three different occurrences, but how to divide the biomass data? I would love to be able to have an *event* to attach it all to. There is almost two different levels of events - a sampling event and a "study event". The "study event" would correspond to the type of event I would like to use in the above example. It may not be ideal, but for the old literature that might be the best we can do. I have to admit that I don't know enough about trawl data to understand why an event core would be a problem. It seems that the trawl would be an event and each biomass measure (of each fish) would be attached to a separate occurrence which is attached to that event. Am I understanding this wrong? btw - I found a workaround for the example I gave, so it's not impossible to model with the current structure.... Anne
On 8/20/2014 1:16 PM, Robert Guralnick wrote:
Éamonn et al. --- Thanks for the clarifications. I think these help a ton but it raises a couple more questions for me.
- I am surprised that you plan to use of MeasurementorFact extension in
relation to the Event core, which seems like a novel (or perhaps awkward or unintended?) mechanism for capturing environmental data, but the same extension was not be seen as relevant for describing samples? Can you explain more about the thinking there?
- There may be a subtle issue here extending "Event" to be more what you
call a "Sampling Event Core". My read of this is that Darwin Core serves as a way to deal with point occurrences and Event reflects the context of a single capture event (whether a single observation, or a bulk sample capture). The changes recommended seem to dramatically extend and change that meaning? Its simply a question that I don't have answer to, but is Darwin Core, the right vehicle to start capturing repeated measures of biomass values from trawls? I don't have answer but man, terms like quantityType (as a property of occurrence?) give me pause.
- Is Sampling Unit a controlled vocabulary? For another project, I have
looked through - and captured scope, effort and completeness measures from
- a large number of published biotic area inventories. The vast majorities
of these are measured in units like bucket hours, or trap nights. Is a "bucket" part of SamplingGeometry or Sampling Unit? I'd be happy to send along all the many examples of how biotic inventories of an area are completed and perhaps it might be good to see how those might be represented using the terms you are proposing?
Best, Rob
On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Same here – Events are central to the work that we do.
Aloha,
Rich
*From:* tdwg-content-bounces@lists.tdwg.org [mailto: tdwg-content-bounces@lists.tdwg.org] *On Behalf Of *Anne Thessen *Sent:* Wednesday, August 20, 2014 2:59 AM *To:* tdwg-content@lists.tdwg.org
*Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hello I would just like to comment on *event core*. I've been doing a lot of work translating published data into Darwin Core. During that process I've wished several times that I could use Event as core. I am happy to hear about that proposed change. It will make it easier to model the data I am working with. Anne
On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
Hi Rob,
Thank you for the feedback. I have tried to address the two main issues you raise below. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project which includes a task on developing/enhancing tools and standards for data sharing with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG/EU BON contributors overlapped) and this is the reason we are presenting the outcomes here for further consideration.
**Event core**
As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation but as things progressed the data wrangling pushed the model back towards the Event core. We actually went through the exercise of mapping multiple test datasets in an iterative process spanning several months' work. In the end, we found that using an Event core better matched the typical sample data we were dealing with, allowing use of a measurement-or-fact extension to be included for the efficient expression of environmental information associated with the event. The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is what type the core has and therefore to which kind of records you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. When picking the Occurrence core those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesn’t stop anyone from using the Occurrence core if they so wish. This just provides a different option for datasets that better fit an Event core model.
I want to stress that we are not building a “specific IPT version” to support an Event core but, rather, we adapted the IPT so that it can be configured to support any generic “core + extension” format to enable its use for exploration of more data formats. This is part of the core codebase and there were no custom forks of the IPT for this work. Our view at GBIF is that if there are significant numbers of data publishers who are keen to adopt, promote and use a (any) format, and the tools can be configured to do so, then we should support it, and, if necessary, use a custom namespace.
**New terms around abundance**
Yes, the discussion on TDWG did fade out but it was clear that the term “abundance” as recommended by the SIGS report (along with abundanceAsPercent) was confusing many when we were looking for term(s) that reported quantitative measures of organisms in a sample. It also became clear we would need to be able to state the type of quantity being measured. An alternative suggestion for using the MeasurementsOrFact class was immediately shot down.
As some of our main use cases were coming from the EU BON project, discussion shifted to that forum and consensus formed about the currently proposed terms. It was within this group that the additional terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we began testing with sample data sets.
Best regards,
Éamonn
*From:* robgur@gmail.com [mailto:robgur@gmail.com robgur@gmail.com] *On Behalf Of *Robert Guralnick *Sent:* 19 August 2014 16:56 *To:* Éamonn Ó Tuama [GBIF] *Cc:* TDWG Content Mailing List *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hi Éamonn --- I am curious about the outcomes presented in the SIGS paper, in particular, this portion of the paper:
"Solutions without introducing an event core in Darwin Core Archives: During the review of the solutions for the uses cases, it became apparent that either model could be applied to every use case. The core and extensions bore a complementary relationship and between them could express all the required information. The core simply provided the central anchor in the star schema from which to join the additional information. Therefore, using the Occurrence core, well established in the GBIF network through uptake of the IPT, seemed more appropriate than inventing CollectingEvent as an additional core type."
That SIGS paper has John Wieczorek and you both as authors, including many luminaries across the biodiversity standards spectrum. Given the above, its curious to see the EventCore come back again, along with a specific IPT version to support it.
So I see two issues, conflated, in this post you just made. One is
the need for an EventCore at all, and the nature of relating Event and Occurrence/Material Sample. The second is the introduction of new terms, which seemingly have arrived after debate on similar terms - but framed around abundance - stalled a year ago. To my mind, these both require some further discussion, because I don't (necessarily) see TDWG community coherence around either issue?
Best, Rob
On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF] eotuama@gbif.org wrote:
Dear All,
GBIF is committed to exploring ways in which the IPT and Darwin Core Archive format can be extended for publishing sample-based data sets. In association with the EU BON project [1], a customised version of the IPT [2] has been deployed to test this using a special type of Darwin Core Archive in which the core is an “Event” with associated taxon occurrences in an “Occurrence” extension.
The Darwin Core vocabulary already provides a rich set of terms with many relevant for describing sample-based data. Synthesising several sources of input (GBIF organised workshop on sample data, May 2013 [3], discussions on the TDWG mailing list in late 2013; internal discussion among EU BON project partners), five new terms relating to sample data were identified as essential. The complete model including these new terms are fully described with examples in the online document “Publishing sample data using the GBIF IPT” [4].
As a first step towards ratification, we would like to register the new terms in the DwC Google Code tracker [5] if there are no major objections on this list. The five terms are:
*quantity*: the number or enumeration value of the quantityType (e.g., individuals, biomass, biovolume, BraunBlanquetScale) per samplingUnit or a percentage measure recorded for the sample.
*quantityType*: the entity being referred to by quantity, e.g., individuals, biomass, %species, scale type.
*samplingGeometry*: an indication of what kind of space was sampled; select from point, line, area or volume.
*samplingUnit*: the unit of measurement used for reporting the quantity in the sample, e.g., minute, hour, day, metre, metre^2, metre^3. It is combined with quantity and quantityType to provide the complete measurement, e.g., 9 individuals per day, 4 biomass-gm per metre^2.
*eventSeriesID*: an identifier for a set of events that are associated in some way, e.g., a monitoring series; may be a global unique identifier or an identifier specific to the series.
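As a rough illustration of how quantity, quantityType and samplingUnit might combine into the complete measurement described above, here is a minimal Python sketch. The function name format_measurement and the treatment of a missing samplingUnit (for percentage or scale values) are assumptions made for this example, not part of the proposal.

```python
from typing import Optional, Union


def format_measurement(quantity: Union[int, float, str],
                       quantity_type: str,
                       sampling_unit: Optional[str] = None) -> str:
    """Combine the three proposed terms into one readable measurement.

    Hypothetical helper: the proposal defines the terms, not this function.
    """
    if sampling_unit is None:
        # Assumption: percentage or scale values carry no "per unit" part.
        return f"{quantity} {quantity_type}"
    return f"{quantity} {quantity_type} per {sampling_unit}"


# The two combinations given as examples in the samplingUnit definition.
print(format_measurement(9, "individuals", "day"))
print(format_measurement(4, "biomass-gm", "metre^2"))
```

Keeping the three values in separate fields, and only joining them for display, leaves each one available for filtering or for checking against a controlled vocabulary.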
Best regards,
Éamonn
[1] http://eubon.eu
[2] http://eubon-ipt.gbif.org
[3] http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
[4] http://links.gbif.org/sample_data_model
[5] https://code.google.com/p/darwincore/issues/list
Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama@gbif.org),
Senior Programme Officer for Interoperability,
Global Biodiversity Information Facility Secretariat,
Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK
Phone: +45 3532 1494; Fax: +45 3532 1480
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Anne E. Thessen, Ph.D.
The Data Detektiv, Owner and Founder
Ronin Institute, Research Scholar
443.225.9185
Hi Matt,
I’ll take the chance to make a few quick comments here, because I believe that this work is of massive importance.
Clearly DwC should avoid trying to duplicate well-standardised models and protocols. However at the same time, there is enormous value for producers and consumers of DwC to benefit from richer data on the events and methods associated with individual species occurrences. I have never seen DwC as purely an Occurrence exchange syntax. I see it (from GBIF’s standpoint) more closely as a mechanism for diverse parties to pool the evidence they have for the occurrence of any species including associated information and/or actionable links to associated information. Users coming from this perspective certainly need (and are demanding) access to all the evidence that can be mobilized to serve as supporting evidence and they also need the ability to understand the significance of these records. Abundance measures, levels of effort, use of consistent methods and redetection of individual organisms are all part of this. DwC should be able to transmit as much data as publishers choose to share on such aspects as part of their publishing of DwC. Users of DwC carrying out species modeling, threat assessment or community analyses will benefit from rapid ways to filter data for those which derive from standardized sampling events, to understand relative abundance within samples, etc. Many publishers of DwC are currently sharing stripped-down subsets of data and wish to give more information on these points. Users are certainly demanding it.
The challenge is finding the sweet spot, the achievable, non-destructive overlap between DwC and the proper domain of models better designed to handle the representation of complex systems outside DwC’s current domain. If this is done correctly, there should be paths that enable us to generate O&E (and maybe OBOE) compatible data from data that publishers only serve as augmented DwC.
I’ll also note that this has been a prominent area of discussion now for several years. Many of us believe strongly that this is one of the most important ways in which we need to close arbitrary gaps between data silos. It’s a prominent part of the GBIF work programme for 2014-2016.
Very best wishes,
Donald
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Matt Jones Sent: Friday, August 22, 2014 4:52 AM To: Éamonn Ó Tuama [GBIF] Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
This proposal is treading on ground that is quite similar to other observations and measurements standards for data exchange that are already mature, in particular:
* OGC Observations and Measurements (http://www.opengeospatial.org/standards/om)
* Extensible Observation Ontology (OBOE; https://semtools.ecoinformatics.org/oboe)
The former is a standard and broadly deployed, whereas the latter is part of a research program in the use of ontologies for measurements. Through collaboration between the two projects, they've been modified to be reasonably isomorphic, but O&M uses an XML serialization while OBOE uses an OWL-DL serialization. They largely express the same measurements and sampling model once one gets beyond the terminology differences.
So, I'm wondering if it makes much sense to extend Darwin Core, which is at heart an Occurrence exchange syntax, into this measurements area that is well represented by these other existing specifications? I'm curious to hear why people would even want to do this. And if we do go down this path, won't we just end up with a new syntax that does essentially what O&M and OBOE do now?
Matt
On Thu, Aug 21, 2014 at 12:22 AM, Éamonn Ó Tuama [GBIF] eotuama@gbif.org wrote:
Hi Rob, Anne, Rich,
I think Markus has answered your question as to why we opted for an Event core which is being used in the sense described by Anne and Rich. For any event, you can have a list of species in an Occurrence extension and for each species, you can include quantity and quantityType, e.g., biomass, etc. The proposed term eventSeriesID was intended for linking together related events, although it now looks like parentEventID might be a better, more flexible term. The measurementOrFact extension is a good fit for capturing environmental information relating to an event. See, e.g., the Gialova Lagoon brackish water invertebrate test data set [1] where a set of 18 environmental variables, including temp, pH, Rdx, particulate organic matter, dissolved oxygen, salinity, chlorophyll-a were measured for each sampling station-sampling period combination. An example mapping is:
Id | measurementType | measurementValue | measurementUnit | measurementRemarks
IA | Tmp (sed) | 21.5 | degree C | Tmp (sed): temperature at the bottom surface
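To make the mapping concrete, the following Python sketch reads a measurement-or-fact row and groups it under the core Event id it refers to. It uses only the single row quoted above; the tab-separated layout, the lower-case "id" column name and the function name measurements_by_event are assumptions for illustration and are not taken from the IPT.

```python
import csv
import io

# One measurement-or-fact row, as quoted above, in an assumed tab-separated layout.
MOF_TSV = (
    "id\tmeasurementType\tmeasurementValue\tmeasurementUnit\tmeasurementRemarks\n"
    "IA\tTmp (sed)\t21.5\tdegree C\tTmp (sed): temperature at the bottom surface\n"
)


def measurements_by_event(tsv_text):
    """Group measurement rows under the core Event id they point to."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        grouped.setdefault(row["id"], []).append(row)
    return grouped


facts = measurements_by_event(MOF_TSV)
for fact in facts["IA"]:
    print(fact["measurementType"], fact["measurementValue"], fact["measurementUnit"])
```

Each of the 18 environmental variables would become one more row with the same id, so the event itself never has to be restated.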
*Controlled vocabularies*
Ideally, the values for samplingUnit and quantityType would be selected from controlled vocabularies. This is, effectively, what we do by presenting a small list of values in a drop-down menu. The current values are what we derived for example data sets and discussion but they can undoubtedly be extended and improved.
We capture “bucket” type measures through a combination of samplingEffort, samplingGeometry and samplingUnit. For example, a pitfall trap (in a point location) left out for 16 days might have samplingEffort: 16, samplingGeometry: point and samplingUnit: day. Three m^2 quadrats in a shore survey might have samplingEffort: 3, samplingGeometry: area and samplingUnit: m^2.
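The two examples above could be written down roughly as follows. This is only a sketch: the SamplingEffort class and its describe method are hypothetical names, and the validation simply enforces the four samplingGeometry values listed in the proposal.

```python
from dataclasses import dataclass

# The four values offered for samplingGeometry in the proposal.
SAMPLING_GEOMETRIES = {"point", "line", "area", "volume"}


@dataclass(frozen=True)
class SamplingEffort:
    """Hypothetical container for the three terms used together."""
    effort: float   # samplingEffort
    geometry: str   # samplingGeometry
    unit: str       # samplingUnit

    def __post_init__(self):
        if self.geometry not in SAMPLING_GEOMETRIES:
            raise ValueError(f"unknown samplingGeometry: {self.geometry!r}")

    def describe(self) -> str:
        return f"{self.effort:g} {self.unit} ({self.geometry})"


# Pitfall trap left out for 16 days; three m^2 quadrats in a shore survey.
pitfall = SamplingEffort(effort=16, geometry="point", unit="day")
quadrats = SamplingEffort(effort=3, geometry="area", unit="m^2")
print(pitfall.describe())
print(quadrats.describe())
```

A compound unit such as "trap nights" does not fit this three-field shape directly, which is the kind of case the compilation of effort measures would help to test.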
It would be very useful to see your compilation of scope, effort and completeness measures to see if we can express them in our model and/or if we need to reconsider our approach.
Éamonn
[1] http://eubon-ipt.gbif.org/resource.do?r=ionian-brackish-lagoon
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Markus Döring Sent: 20 August 2014 23:47 To: Robert Guralnick
Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Rob,
this proposal is for monitoring surveys really, not to be confused with material samples like environmental or tissue samples which have a distinct new dwc class MaterialSample.
We tend to overload the term sampling a lot, and it helps to treat material samples differently from pure observational "sampling". That is why the existing Event class was used as the core and classic Occurrence records as extensions. A classic example is a vegetation survey where each plot represents an Event record and each recorded species in that plot will be an Occurrence extension record with a given quantity. Darwin Core already offers individualCount to specify quantity, but it is a very specific way of measuring "abundance" restricted to only some use cases. Abiotic measurements about the plot (e.g. soil type, pH, temperature) can be published using the measurements or facts extension linked to the Event core.
Markus
On 20 Aug 2014, at 20:08, Robert Guralnick Robert.Guralnick@colorado.edu wrote:
Anne -- I don't know the answers! These are questions for Eamonn. I would presume that a sample could be a jumble of species or even just water or soil samples, and biomass would refer to that sample - but maybe that isn't a use case being considered? The examples given in the longer document all link an event_id to species name and some measure of quantity for that species (to the species, not an individual specimen), so I assume that is the prevailing (or only) case?
Best, Rob
On Wed, Aug 20, 2014 at 11:56 AM, Anne Thessen annethessen@gmail.com wrote:
Hi Rob I would like to respond to your item number 2.
From my perspective, I deal with lots of published descriptions of taxa. The text might say something like "I saw species A in the Chesapeake Bay, the Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The biomass range obviously corresponds to at least three different occurrences, but how to divide the biomass data? I would love to be able to have an *event* to attach it all to. There are almost two different levels of events - a sampling event and a "study event". The "study event" would correspond to the type of event I would like to use in the above example. It may not be ideal, but for the old literature that might be the best we can do.
I have to admit that I don't know enough about trawl data to understand why an event core would be a problem. It seems that the trawl would be an event and each biomass measure (of each fish) would be attached to a separate occurrence which is attached to that event. Am I understanding this wrong? btw - I found a workaround for the example I gave, so it's not impossible to model with the current structure.... Anne
On 8/20/2014 1:16 PM, Robert Guralnick wrote:
Éamonn et al. --- Thanks for the clarifications. I think these help a ton but it raises a couple more questions for me.
1) I am surprised that you plan to use the MeasurementOrFact extension in relation to the Event core, which seems like a novel (or perhaps awkward or unintended?) mechanism for capturing environmental data, but the same extension was not seen as relevant for describing samples? Can you explain more about the thinking there?
2) There may be a subtle issue here extending "Event" to be more what you call a "Sampling Event Core". My read of this is that Darwin Core serves as a way to deal with point occurrences and Event reflects the context of a single capture event (whether a single observation, or a bulk sample capture). The changes recommended seem to dramatically extend and change that meaning? It's simply a question that I don't have an answer to, but is Darwin Core the right vehicle to start capturing repeated measures of biomass values from trawls? I don't have an answer but man, terms like quantityType (as a property of occurrence?) give me pause.
3) Is Sampling Unit a controlled vocabulary? For another project, I have looked through - and captured scope, effort and completeness measures from - a large number of published biotic area inventories. The vast majority of these are measured in units like bucket hours, or trap nights. Is a "bucket" part of SamplingGeometry or Sampling Unit? I'd be happy to send along all the many examples of how biotic inventories of an area are completed and perhaps it might be good to see how those might be represented using the terms you are proposing?
Best, Rob
On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Same here – Events are central to the work that we do.
Aloha,
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Anne Thessen Sent: Wednesday, August 20, 2014 2:59 AM To: tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hello I would just like to comment on *event core*. I've been doing a lot of work translating published data into Darwin Core. During that process I've wished several times that I could use Event as core. I am happy to hear about that proposed change. It will make it easier to model the data I am working with. Anne
On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
Hi Rob,
Thank you for the feedback. I have tried to address the two main issues you raise below. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project which includes a task on developing/enhancing tools and standards for data sharing with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG/EU BON contributors overlapped) and this is the reason we are presenting the outcomes here for further consideration.
*Event core*
As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation but as things progressed the data wrangling pushed the model back towards the Event core. We actually went through the exercise of mapping multiple test datasets in an iterative process spanning several months' work. In the end, we found that using an Event core better matched the typical sample data we were dealing with, allowing use of a measurement-or-fact extension to be included for the efficient expression of environmental information associated with the event. The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is what type the core has and therefore to which kind of records you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. When picking the Occurrence core those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesn’t stop anyone from using the Occurrence core if they so wish. This just provides a different option for datasets that better fit an Event core model.
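The point about repeated facts can be seen in a small Python sketch. All record values below (plot-1, Species A to C, the date) are invented placeholders, and to_occurrence_core is a hypothetical helper; the sketch only shows what happens when an Event core with an Occurrence extension is flattened into Occurrence core rows.

```python
# Hypothetical Event core record: event-level facts are stated once.
event = {
    "eventID": "plot-1",
    "eventDate": "2014-08-19",
    "samplingProtocol": "vegetation plot",
}

# Hypothetical Occurrence extension rows pointing back to the event.
occurrences = [
    {"eventID": "plot-1", "scientificName": "Species A", "quantity": 9, "quantityType": "individuals"},
    {"eventID": "plot-1", "scientificName": "Species B", "quantity": 4, "quantityType": "individuals"},
    {"eventID": "plot-1", "scientificName": "Species C", "quantity": 1, "quantityType": "individuals"},
]


def to_occurrence_core(event, occurrences):
    """Flatten Event core + Occurrence extension into Occurrence core rows."""
    rows = []
    for occ in occurrences:
        if occ["eventID"] == event["eventID"]:
            # Every event-level field is copied onto every occurrence row.
            rows.append({**event, **occ})
    return rows


flat = to_occurrence_core(event, occurrences)
repeated = sum(
    1 for row in flat for key in ("eventDate", "samplingProtocol") if key in row
)
print(len(flat), "occurrence rows;", repeated, "repeated event values")
```

Both layouts carry the same information; the difference is only where the event-level values live and how often they are restated.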
I want to stress that we are not building a “specific IPT version” to support an Event core but, rather, we adapted the IPT so that it can be configured to support any generic “core + extension” format to enable its use for exploration of more data formats. This is part of the core codebase and there were no custom forks of the IPT for this work. Our view at GBIF is that if there are significant numbers of data publishers who are keen to adopt, promote and use a (any) format, and the tools can be configured to do so, then we should support it, and, if necessary, use a custom namespace.
*New terms around abundance*
Yes, the discussion on TDWG did fade out but it was clear that the term “abundance” as recommended by the SIGS report (along with abundanceAsPercent) was confusing many when we were looking for term(s) that reported quantitative measures of organisms in a sample. It also became clear we would need to be able to state the type of quantity being measured. An alternative suggestion for using the MeasurementOrFact class was immediately shot down.
As some of our main use cases were coming from the EU BON project, discussion shifted to that forum and consensus formed about the currently proposed terms. It was within this group that the additional terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we began testing with sample data sets.
Best regards,
Éamonn
From: robgur@gmail.com [mailto:robgur@gmail.com] On Behalf Of Robert Guralnick Sent: 19 August 2014 16:56 To: Éamonn Ó Tuama [GBIF] Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hi Éamonn --- I am curious about the outcomes presented in the SIGS paper, in particular, this portion of the paper:
"Solutions without introducing an event core in Darwin Core Archives: During the review of the solutions for the uses cases, it became apparent that either model could be applied to every use case. The core and extensions bore a complementary relationship and between them could express all the required information. The core simply provided the central anchor in the star schema from which to join the additional information. Therefore, using the Occurrence core, well established in the GBIF network through uptake of the IPT, seemed more appropriate than inventing CollectingEvent as an additional core type."
That SIGS paper has John Wieczorek and you both as authors, including many luminaries across the biodiversity standards spectrum. Given the above, it's curious to see the EventCore come back again, along with a specific IPT version to support it.
So I see two issues, conflated, in this post you just made. One is the need for an EventCore at all, and the nature of relating Event and Occurrence/Material Sample. The second is the introduction of new terms, which seemingly have arrived after debate on similar terms - but framed around abundance - stalled a year ago. To my mind, these both require some further discussion, because I don't (necessarily) see TDWG community coherence around either issue?
Best, Rob
On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF] eotuama@gbif.org wrote:
Dear All,
GBIF is committed to exploring ways in which the IPT and Darwin Core Archive format can be extended for publishing sample-based data sets. In association with the EU BON project [1], a customised version of the IPT [2] has been deployed to test this using a special type of Darwin Core Archive in which the core is an “Event” with associated taxon occurrences in an “Occurrence” extension.
The Darwin Core vocabulary already provides a rich set of terms with many relevant for describing sample-based data. Synthesising several sources of input (GBIF organised workshop on sample data, May 2013 [3], discussions on the TDWG mailing list in late 2013; internal discussion among EU BON project partners), five new terms relating to sample data were identified as essential. The complete model including these new terms is fully described with examples in the online document “Publishing sample data using the GBIF IPT” [4].
As a first step towards ratification, we would like to register the new terms in the DwC Google Code tracker [5] if there are no major objections on this list. The five terms are:
1. quantity: the number or enumeration value of the quantityType (e.g., individuals, biomass, biovolume, BraunBlanquetScale) per samplingUnit or a percentage measure recorded for the sample.
2. quantityType: the entity being referred to by quantity, e.g., individuals, biomass, %species, scale type.
3. samplingGeometry: an indication of what kind of space was sampled; select from point, line, area or volume.
4. samplingUnit: the unit of measurement used for reporting the quantity in the sample, e.g., minute, hour, day, metre, metre^2, metre^3. It is combined with quantity and quantityType to provide the complete measurement, e.g., 9 individuals per day, 4 biomass-gm per metre^2.
5. eventSeriesID: an identifier for a set of events that are associated in some way, e.g., a monitoring series; may be a global unique identifier or an identifier specific to the series.
Best regards,
Éamonn
[1] http://eubon.eu
[2] http://eubon-ipt.gbif.org
[3] http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
[4] http://links.gbif.org/sample_data_model
[5] https://code.google.com/p/darwincore/issues/list
____________________________________________________
Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama@gbif.org),
Senior Programme Officer for Interoperability,
Global Biodiversity Information Facility Secretariat,
Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK
Phone: +45 3532 1494; Fax: +45 3532 1480
Just a quick comment on Events – we use a hierarchical event model, such that there might be an Event defined for an expedition, with child events for particular legs of an expedition, and grandchild events for dives, and possibly great-grandchild events for individual collecting events within a dive.
In our context, Events are simply the intersection of place and time, with the implication that “something” noteworthy happened at that place and time (and typically including metadata about “who”). The “time” is represented as a range, and the “location” can be anything from a precise GPS coordinate to “Planet Earth”. Events are created at a level of granularity of place and time commensurate with what the “something” is. For example, some events may span many years across a large geographic area, or in a very precise place across a fraction of a second. The degree of nesting events hierarchically is also flexible, commensurate with a human interpretation of how the data should be structured. The “something” can certainly be the sampling of material in nature, but it’s certainly not limited to that.
None of that really addresses Rob’s questions (the answers to which I am likewise interested in), but I thought I’d add this to the pot.
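For readers who want to picture the nesting, here is a minimal Python sketch of a hierarchical event model of this kind. It assumes each event stores only a pointer to its parent (in the spirit of the parentEventID term mentioned earlier in the thread); the identifiers and the lineage function are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical events keyed by eventID; the value is the parent's eventID.
EVENTS: Dict[str, Optional[str]] = {
    "expedition-1": None,        # top-level event
    "leg-1": "expedition-1",     # child: a leg of the expedition
    "dive-1": "leg-1",           # grandchild: a dive
    "collecting-1": "dive-1",    # great-grandchild: a collecting event
}


def lineage(event_id: str, events: Dict[str, Optional[str]]) -> List[str]:
    """Return event_id followed by its ancestors, up to the root event."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = event_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in event hierarchy at {current!r}")
        seen.add(current)
        chain.append(current)
        current = events[current]
    return chain


print(lineage("collecting-1", EVENTS))
```

Because the depth is not fixed, the same structure serves a two-level survey and a four-level expedition alike.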
Aloha,
Rich
From: Anne Thessen [mailto:annethessen@gmail.com] Sent: Wednesday, August 20, 2014 7:56 AM To: Robert Guralnick; Eamonn O Tuama Cc: TDWG Content Mailing List; Richard Pyle Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hi Rob I would like to respond to your item number 2.
From my perspective, I deal with lots of published descriptions of taxa. The text might say something like "I saw species A in the Chesapeake Bay, the Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The biomass range obviously corresponds to at least three different occurrences, but how to divide the biomass data? I would love to be able to have an *event* to attach it all to. There is almost two different levels of events - a sampling event and a "study event". The "study event" would correspond to the type of event I would like to use in the above example. It may not be ideal, but for the old literature that might be the best we can do.
I have to admit that I don't know enough about trawl data to understand why an event core would be a problem. It seems that the trawl would be an event and each biomass measure (of each fish) would be attached to a separate occurrence which is attached to that event. Am I understanding this wrong? btw - I found a workaround for the example I gave, so it's not impossible to model with the current structure.... Anne
On 8/20/2014 1:16 PM, Robert Guralnick wrote:
Éamonn et al. --- Thanks for the clarifications. I think these help a ton but it raises a couple more questions for me.
1) I am surprised that you plan to use of MeasurementorFact extension in relation to the Event core, which seems like a novel (or perhaps awkward or unintended?) mechanism for capturing environmental data, but the same extension was not be seen as relevant for describing samples? Can you explain more about the thinking there?
2) There may be a subtle issue here extending "Event" to be more what you call a "Sampling Event Core". My read of this is that Darwin Core serves as a way to deal with point occurrences and Event reflects the context of a single capture event (whether a single observation, or a bulk sample capture). The changes recommended seem to dramatically extend and change that meaning? Its simply a question that I don't have answer to, but is Darwin Core, the right vehicle to start capturing repeated measures of biomass values from trawls? I don't have answer but man, terms like quantityType (as a property of occurrence?) give me pause.
3) Is Sampling Unit a controlled vocabulary? For another project, I have looked through - and captured scope, effort and completeness measures from - a large number of published biotic area inventories. The vast majorities of these are measured in units like bucket hours, or trap nights. Is a "bucket" part of SamplingGeometry or Sampling Unit? I'd be happy to send along all the many examples of how biotic inventories of an area are completed and perhaps it might be good to see how those might be represented using the terms you are proposing?
Best, Rob
On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Same here – Events are central to the work that we do.
Aloha,
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Anne Thessen Sent: Wednesday, August 20, 2014 2:59 AM To: tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hello I would just like to comment on *event core*. I've been doing a lot of work translating published data into Darwin Core. During that process I've wished several times that I could use Event as core. I am happy to hear about that proposed change. It will make it easier to model the data I am working with. Anne
On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
Hi Rob,
Thank you for the feedback. I have tried to address the two main issues you raise below. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project which includes a task on developing/enhancing tools and standards for data sharing with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG/EU BON contributors overlapped) and this is the reason we are presenting the outcomes here for further consideration.
*Event core*
As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation but as things progressed the data wrangling pushed the model back towards the Event core. We actually went through the exercise of mapping multiple test datasets in an iterative process spanning several months' work. In the end, we found that using an Event core better matched the typical sample data we were dealing with, allowing use of a measurement-or-fact extension to be included for the efficient expression of environmental information associated with the event. The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is what type the core has and therefore to which kind of records you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. When picking the Occurrence core those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesn’t stop anyone from using the Occurrence core if they so wish. This just provides a different option for datasets that better fit an Event core model.
I want to stress that we are not building a “specific IPT version” to support an Event core but, rather, we adapted the IPT so that it can be configured to support any generic “core + extension” format to enable its use for exploration of more data formats. This is part of the core codebase and there were no custom forks of the IPT for this work. Our view at GBIF is that if there are significant numbers of data publishers who are keen to adopt, promote and use a (any) format, and the tools can be configured to do so, then we should support it, and, if necessary, use a custom namespace.
*New terms around abundance*
Yes, the discussion on TDWG did fade out but it was clear that the term “abundance” as recommended by the SIGS report (along with abundanceAsPercent) was confusing many when we were looking for term(s) that reported quantitative measures of organisms in a sample. It also became clear we would need to be able to state the type of quantity being measured. An alternative suggestion for using the MeasurementsOrFact class was immediately shot down.
As some of our main use cases were coming from the EU BON project, discussion shifted to that forum and consensus formed about the currently proposed terms. It was within this group that the additional terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we began testing with sample data sets.
Best regards,
Éamonn
From: robgur@gmail.com [mailto:robgur@gmail.com] On Behalf Of Robert Guralnick Sent: 19 August 2014 16:56 To: Éamonn Ó Tuama [GBIF] Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hi Éamonn --- I am curious about the outcomes presented in the SIGS paper, in particular, this portion of the paper:
"Solutions without introducing an event core in Darwin Core Archives: During the review of the solutions for the uses cases, it became apparent that either model could be applied to every use case. The core and extensions bore a complementary relationship and between them could express all the required information. The core simply provided the central anchor in the star schema from which to join the additional information. Therefore, using the Occurrence core, well established in the GBIF network through uptake of the IPT, seemed more appropriate than inventing CollectingEvent as an additional core type."
That SIGS paper has John Wieczorek and you both as authors, including many luminaries across the biodiversity standards spectrum. Given the above, its curious to see the EventCore come back again, along with a specific IPT version to support it.
So I see two issues, conflated, in this post you just made. One is the need for an EventCore at all, and the nature of relating Event and Occurrence/Material Sample. The second is the introduction of new terms, which seemingly have arrived after debate on similar terms - but framed around abundance - stalled a year ago. To my mind, these both require some further discussion, because I don't (necessarily) see TDWG community coherence around either issue?
Best, Rob
On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF] eotuama@gbif.org wrote:
Dear All,
GBIF is committed to exploring ways in which the IPT and Darwin Core Archive format can be extended for publishing sample-based data sets. In association with the EU BON project [1], a customised version of the IPT [2] has been deployed to test this using a special type of Darwin Core Archive in which the core is an “Event” with associated taxon occurrences in an “Occurrence” extension.
The Darwin Core vocabulary already provides a rich set of terms, many of them relevant for describing sample-based data. Synthesising several sources of input (GBIF organised workshop on sample data, May 2013 [3]; discussions on the TDWG mailing list in late 2013; internal discussion among EU BON project partners), five new terms relating to sample data were identified as essential. The complete model including these new terms is fully described with examples in the online document “Publishing sample data using the GBIF IPT” [4].
As a first step towards ratification, we would like to register the new terms in the DwC Google Code tracker [5] if there are no major objections on this list. The five terms are:
1. quantity: the number or enumeration value of the quantityType (e.g., individuals, biomass, biovolume, BraunBlanquetScale) per samplingUnit or a percentage measure recorded for the sample.
2. quantityType: the entity being referred to by quantity, e.g., individuals, biomass, %species, scale type.
3. samplingGeometry: an indication of what kind of space was sampled; select from point, line, area or volume.
4. samplingUnit: the unit of measurement used for reporting the quantity in the sample, e.g., minute, hour, day, metre, metre^2, metre^3. It is combined with quantity and quantityType to provide the complete measurement, e.g., 9 individuals per day, 4 biomass-gm per metre^2.
5. eventSeriesID: an identifier for a set of events that are associated in some way, e.g., a monitoring series; may be a global unique identifier or an identifier specific to the series.
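How quantity, quantityType and samplingUnit combine into one complete measurement can be sketched as below. This is a minimal illustration only: the `SampleQuantity` class and its `describe` method are hypothetical names, and treating a missing samplingUnit as the percentage case is an assumption, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleQuantity:
    """Hypothetical container for the three proposed terms."""
    quantity: float                       # proposed term: quantity
    quantity_type: str                    # proposed term: quantityType
    sampling_unit: Optional[str] = None   # proposed term: samplingUnit (assumed optional)

    def describe(self) -> str:
        # ":g" drops a trailing ".0" so 9.0 is rendered as "9".
        value = f"{self.quantity:g}"
        if self.sampling_unit is None:
            # Assumption: no samplingUnit means a plain or percentage measure.
            return f"{value} {self.quantity_type}"
        return f"{value} {self.quantity_type} per {self.sampling_unit}"


print(SampleQuantity(9, "individuals", "day").describe())
print(SampleQuantity(4, "biomass-gm", "metre^2").describe())
```

The two calls reproduce the examples given in the definition of samplingUnit: "9 individuals per day" and "4 biomass-gm per metre^2".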
Best regards,
Éamonn
[1] http://eubon.eu
[2] http://eubon-ipt.gbif.org
[3] http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
[4] http://links.gbif.org/sample_data_model
[5] https://code.google.com/p/darwincore/issues/list
____________________________________________________
Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama@gbif.org),
Senior Programme Officer for Interoperability,
Global Biodiversity Information Facility Secretariat,
Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK
Phone: +45 3532 1494; Fax: +45 3532 1480
_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Hi Rich,
This is where the eventSeriesID comes in, to group related events. I have always wondered whether we should instead use a parentEventID term to capture arbitrary nesting levels. That would match your use case closely, right?
Markus
On 20 Aug 2014, at 20:56, Richard Pyle deepreef@bishopmuseum.org wrote:
Just a quick comment on Events – we use a hierarchical event model, such that there might be an Event defined for an expedition, with child events for particular legs of an expedition, and grandchild events for dives, and possibly great-grandchild events for individual collecting events within a dive.
In our context, Events are simply the intersection of place and time, with the implication that “something” noteworthy happened at that place and time (and typically including metadata about “who”). The “time” is represented as a range, and the “location” can be anything from a precise GPS coordinate to “Planet Earth”. Events are created at a level of granularity of place and time commensurate with what the “something” is. For example, some events may span many years across a large geographic area, or in a very precise place across a fraction of a second. The degree of nesting events hierarchically is also flexible, commensurate with a human interpretation of how the data should be structured. The “something” can certainly be the sampling of material in nature, but it’s certainly not limited to that.
None of that really addresses Rob’s questions (the answers to which I am likewise interested in), but I thought I’d add this to the pot.
Aloha, Rich
From: Anne Thessen [mailto:annethessen@gmail.com] Sent: Wednesday, August 20, 2014 7:56 AM To: Robert Guralnick; Eamonn O Tuama Cc: TDWG Content Mailing List; Richard Pyle Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hi Rob,
I would like to respond to your item number 2. From my perspective, I deal with lots of published descriptions of taxa. The text might say something like "I saw species A in the Chesapeake Bay, the Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The biomass range obviously corresponds to at least three different occurrences, but how to divide the biomass data? I would love to be able to have an *event* to attach it all to. There are almost two different levels of events: a sampling event and a "study event". The "study event" would correspond to the type of event I would like to use in the above example. It may not be ideal, but for the old literature that might be the best we can do.
I have to admit that I don't know enough about trawl data to understand why an event core would be a problem. It seems that the trawl would be an event and each biomass measure (of each fish) would be attached to a separate occurrence which is attached to that event. Am I understanding this wrong?
btw - I found a workaround for the example I gave, so it's not impossible to model with the current structure....
Anne
On 8/20/2014 1:16 PM, Robert Guralnick wrote:
Éamonn et al. --- Thanks for the clarifications. I think these help a ton but it raises a couple more questions for me.
1) I am surprised that you plan to use the MeasurementOrFact extension in relation to the Event core, which seems like a novel (or perhaps awkward or unintended?) mechanism for capturing environmental data, while the same extension was not seen as relevant for describing samples. Can you explain more about the thinking there?
2) There may be a subtle issue here in extending "Event" to be more what you call a "Sampling Event Core". My read of this is that Darwin Core serves as a way to deal with point occurrences and Event reflects the context of a single capture event (whether a single observation, or a bulk sample capture). The changes recommended seem to dramatically extend and change that meaning? It's simply a question that I don't have an answer to, but is Darwin Core the right vehicle to start capturing repeated measures of biomass values from trawls? I don't have an answer but man, terms like quantityType (as a property of occurrence?) give me pause.
3) Is Sampling Unit a controlled vocabulary? For another project, I have looked through - and captured scope, effort and completeness measures from - a large number of published biotic area inventories. The vast majority of these are measured in units like bucket hours, or trap nights. Is a "bucket" part of SamplingGeometry or Sampling Unit? I'd be happy to send along all the many examples of how biotic inventories of an area are completed, and perhaps it might be good to see how those might be represented using the terms you are proposing?
Best, Rob
On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Same here – Events are central to the work that we do.
Aloha, Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Anne Thessen Sent: Wednesday, August 20, 2014 2:59 AM To: tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hello,
I would just like to comment on *event core*. I've been doing a lot of work translating published data into Darwin Core. During that process I've wished several times that I could use Event as core. I am happy to hear about that proposed change. It will make it easier to model the data I am working with.
Anne
On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
Hi Rob,
Thank you for the feedback. I have tried to address the two main issues you raise below. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project which includes a task on developing/enhancing tools and standards for data sharing with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG/EU BON contributors overlapped) and this is the reason we are presenting the outcomes here for further consideration.
*Event core*
As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation but as things progressed the data wrangling pushed the model back towards the Event core. We actually went through the exercise of mapping multiple test datasets in an iterative process spanning several months' work. In the end, we found that using an Event core better matched the typical sample data we were dealing with, allowing use of a measurement-or-fact extension to be included for the efficient expression of environmental information associated with the event. The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is what type the core has and therefore to which kind of records you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. When picking the Occurrence core, those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesn’t stop anyone from using the Occurrence core if they so wish. This just provides a different option for datasets that better fit an Event core model.
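The repetition that comes with an Occurrence core can be sketched as follows. This is only an illustration under stated assumptions: the identifiers, measurement values and function names are hypothetical and are not taken from the EU BON test datasets; only the field names (eventID, occurrenceID, scientificName, measurementType, measurementValue) are existing Darwin Core terms.

```python
# Hypothetical sampling event (one trawl) with two site-level facts.
event_facts = [
    {"eventID": "trawl-1", "measurementType": "water temperature", "measurementValue": "12.5"},
    {"eventID": "trawl-1", "measurementType": "salinity", "measurementValue": "35"},
]

# Hypothetical occurrences recorded during that event.
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "trawl-1", "scientificName": "Gadus morhua"},
    {"occurrenceID": "occ-2", "eventID": "trawl-1", "scientificName": "Clupea harengus"},
    {"occurrenceID": "occ-3", "eventID": "trawl-1", "scientificName": "Pleuronectes platessa"},
]


def fact_rows_event_core(event_facts):
    """Event core: each fact row is keyed by eventID and stored once."""
    return list(event_facts)


def fact_rows_occurrence_core(event_facts, occurrences):
    """Occurrence core: fact rows are keyed by occurrenceID, so every
    event-level fact is repeated for each occurrence of that event."""
    rows = []
    for occ in occurrences:
        for fact in event_facts:
            if fact["eventID"] == occ["eventID"]:
                rows.append({
                    "occurrenceID": occ["occurrenceID"],
                    "measurementType": fact["measurementType"],
                    "measurementValue": fact["measurementValue"],
                })
    return rows


print(len(fact_rows_event_core(event_facts)))                    # 2
print(len(fact_rows_occurrence_core(event_facts, occurrences)))  # 6
```

With two event-level facts and three occurrences, the Event core layout stores two fact rows while the Occurrence core layout needs six.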
I want to stress that we are not building a “specific IPT version” to support an Event core but, rather, we adapted the IPT so that it can be configured to support any generic “core + extension” format to enable its use for exploration of more data formats. This is part of the core codebase and there were no custom forks of the IPT for this work. Our view at GBIF is that if there are significant numbers of data publishers who are keen to adopt, promote and use a (any) format, and the tools can be configured to do so, then we should support it, and, if necessary, use a custom namespace.
*New terms around abundance*
Yes, the discussion on TDWG did fade out but it was clear that the term “abundance” as recommended by the SIGS report (along with abundanceAsPercent) was confusing many when we were looking for term(s) that reported quantitative measures of organisms in a sample. It also became clear we would need to be able to state the type of quantity being measured. An alternative suggestion for using the MeasurementOrFact class was immediately shot down.
As some of our main use cases were coming from the EU BON project, discussion shifted to that forum and consensus formed about the currently proposed terms. It was within this group that the additional terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we began testing with sample data sets.
Best regards, Éamonn
-- Anne E. Thessen, Ph.D. The Data Detektiv, Owner and Founder Ronin Institute, Research Scholar 443.225.9185
YES! parentEventID would PERFECTLY match our use case. The problem with eventSeriesID is that it implies an EventSeries Class, which is different in nature than an Event. We have found that nesting events hierarchically (via parentEventID) is more flexible and simpler (and in some ways more powerful).
Aloha,
Rich
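The nesting described here (expedition, leg, dive, collecting event) can be sketched with a single parent pointer per event. This is a minimal sketch, assuming parentEventID behaves as suggested in this thread; the event identifiers, the `label` field and the `ancestors` helper are hypothetical.

```python
# Hypothetical event records; each points to its parent via parentEventID.
events = {
    "expedition-1": {"parentEventID": None, "label": "expedition"},
    "leg-1": {"parentEventID": "expedition-1", "label": "leg"},
    "dive-1": {"parentEventID": "leg-1", "label": "dive"},
    "collect-1": {"parentEventID": "dive-1", "label": "collecting event"},
}


def ancestors(event_id, events):
    """Return the chain of parent event IDs, nearest parent first."""
    chain = []
    seen = {event_id}
    parent = events[event_id]["parentEventID"]
    while parent is not None:
        if parent in seen:
            # Guard against malformed data in which events form a cycle.
            raise ValueError(f"cycle detected at {parent!r}")
        seen.add(parent)
        chain.append(parent)
        parent = events[parent]["parentEventID"]
    return chain


print(ancestors("collect-1", events))
```

Any depth of nesting is handled by the same loop, and no separate EventSeries class is needed: a grouping is simply another event that happens to have children.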
From: Markus Döring [mailto:m.doering@mac.com] Sent: Wednesday, August 20, 2014 11:32 AM To: Richard Pyle Cc: Anne Thessen; Robert Guralnick; Éamonn Ó Tuama; TDWG Content Mailing List Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
Hi Rich,
This is where the eventSeriesID comes in, to group related events. I have always wondered whether we should instead use a parentEventID term to capture arbitrary nesting levels. That would match your use case closely, right?
Markus
participants (8)
- Anne Thessen
- Donald Hobern [GBIF]
- Markus Döring
- Markus Döring
- Matt Jones
- Richard Pyle
- Robert Guralnick
- Éamonn Ó Tuama [GBIF]