[tdwg-content] Darwin Core: proposed new terms for expressing sample data

Éamonn Ó Tuama [GBIF] eotuama at gbif.org
Wed Aug 20 13:04:38 CEST 2014


Hi Rob,

 

Thank you for the feedback. I have tried to address below the two main issues you raise. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project, which includes a task on developing/enhancing tools and standards for data sharing, with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners, whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG/EU BON contributors overlapped), and this is the reason we are presenting the outcomes here for further consideration.

 

*Event core*

As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation, but as things progressed the data wrangling pushed the model back towards the Event core. We went through the exercise of mapping multiple test datasets in an iterative process spanning several months. In the end, we found that an Event core better matched the typical sample data we were dealing with, because it allows a measurement-or-fact extension to be included for the efficient expression of environmental information associated with the event. The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is the type of the core, and therefore the kind of records to which you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. With an Occurrence core, those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesn’t stop anyone from using the Occurrence core if they so wish. It just provides a different option for datasets that better fit an Event core model.
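
To make the repetition point concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the records, identifiers and values are invented, the function names are hypothetical, and nothing here is taken from the IPT codebase. It simply counts how many measurement-or-fact rows are needed when event-level facts hang from an Event core versus when they must be keyed on each occurrence.

```python
# Hypothetical sample data: one sampling event, two event-level facts,
# three occurrences. All identifiers and values are invented.
event_facts = [
    {"eventID": "E1", "measurementType": "water temperature",
     "measurementValue": "14.2", "measurementUnit": "degC"},
    {"eventID": "E1", "measurementType": "pH",
     "measurementValue": "7.1", "measurementUnit": ""},
]
occurrences = [
    {"occurrenceID": "O1", "eventID": "E1", "scientificName": "Gammarus pulex"},
    {"occurrenceID": "O2", "eventID": "E1", "scientificName": "Asellus aquaticus"},
    {"occurrenceID": "O3", "eventID": "E1", "scientificName": "Baetis rhodani"},
]


def facts_for_event_core(event_facts):
    """Event core: each fact row is keyed on eventID and stored once."""
    return [dict(fact) for fact in event_facts]


def facts_for_occurrence_core(event_facts, occurrences):
    """Occurrence core: fact rows must be keyed on occurrenceID, so every
    event-level fact is copied to each occurrence of that event."""
    rows = []
    for occ in occurrences:
        for fact in event_facts:
            if fact["eventID"] == occ["eventID"]:
                row = {k: v for k, v in fact.items() if k != "eventID"}
                row["occurrenceID"] = occ["occurrenceID"]
                rows.append(row)
    return rows


event_rows = facts_for_event_core(event_facts)
occurrence_rows = facts_for_occurrence_core(event_facts, occurrences)
print(len(event_rows), "fact rows with an Event core")
print(len(occurrence_rows), "fact rows with an Occurrence core")
```

With two facts and three occurrences this gives 2 rows in the first case and 6 in the second; the gap grows with the number of occurrences per event.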

 

I want to stress that we are not building a “specific IPT version” to support an Event core. Rather, we adapted the IPT so that it can be configured to support any generic “core + extension” format, which enables its use for exploring more data formats. This is part of the core codebase, and there were no custom forks of the IPT for this work. Our view at GBIF is that if significant numbers of data publishers are keen to adopt, promote and use a format (any format), and the tools can be configured to do so, then we should support it and, if necessary, use a custom namespace.

 

*New terms around abundance*

Yes, the discussion on TDWG did fade out, but it was clear that the term “abundance” as recommended by the SIGS report (along with abundanceAsPercent) was confusing many when we were looking for term(s) that reported quantitative measures of organisms in a sample. It also became clear that we would need to be able to state the type of quantity being measured. An alternative suggestion for using the MeasurementOrFact class was immediately shot down.

As some of our main use cases were coming from the EU BON project, discussion shifted to that forum and consensus formed about the currently proposed terms. It was within this group that the additional terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we began testing with sample data sets.

 

Best regards,

Éamonn

 

 

From: robgur at gmail.com [mailto:robgur at gmail.com] On Behalf Of Robert Guralnick
Sent: 19 August 2014 16:56
To: Éamonn Ó Tuama [GBIF]
Cc: TDWG Content Mailing List
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data

 

 

  Hi Éamonn --- I am curious about the outcomes presented in the SIGS paper, in particular, this portion of the paper:  

 

"Solutions without introducing an event core in Darwin Core Archives:  During the review of the solutions for the uses cases, it became apparent that either model could be applied to every use case. The core and extensions bore a complementary relationship and between them could express all the required information. The core simply provided the central anchor in the star schema from which to join the additional information. Therefore, using the Occurrence core, well established in the GBIF network through uptake of the IPT, seemed more appropriate than inventing CollectingEvent as an additional core type."

 

   That SIGS paper has John Wieczorek and you both as authors, along with many luminaries across the biodiversity standards spectrum.  Given the above, it's curious to see the EventCore come back again, along with a specific IPT version to support it.  

 

    So I see two issues, conflated, in this post you just made.  One is the need for an EventCore at all, and the nature of relating Event and Occurrence/Material Sample.  The second is the introduction of new terms, which seemingly have arrived after debate on similar terms - but framed around abundance - stalled a year ago.  To my mind, these both require some further discussion, because I don't (necessarily) see TDWG community coherence around either issue?

 

Best, Rob

 

 

On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF] <eotuama at gbif.org> wrote:

Dear All,

 

GBIF is committed to exploring ways in which the IPT and Darwin Core Archive format can be extended for publishing sample-based data sets. In association with the EU BON project [1], a customised version of the IPT [2] has been deployed to test this using a special type of Darwin Core Archive in which the core is an “Event” with associated taxon occurrences in an “Occurrence” extension.

 

The Darwin Core vocabulary already provides a rich set of terms, many of them relevant for describing sample-based data. Synthesising several sources of input (GBIF organised workshop on sample data, May 2013 [3]; discussions on the TDWG mailing list in late 2013; internal discussion among EU BON project partners), five new terms relating to sample data were identified as essential. The complete model including these new terms is fully described with examples in the online document “Publishing sample data using the GBIF IPT” [4].

 

As a first step towards ratification, we would like to register the new terms in the DwC Google Code tracker [5] if there are no major objections on this list. The five terms are:

 

1.      quantity: the number or enumeration value of the quantityType (e.g., individuals, biomass, biovolume, BraunBlanquetScale) per samplingUnit or a percentage measure recorded for the sample.

 

2.      quantityType: the entity being referred to by quantity, e.g., individuals, biomass, %species, scale type.

 

3.      samplingGeometry: an indication of what kind of space was sampled; select from point, line, area or volume.

 

4.      samplingUnit: the unit of measurement used for reporting the quantity in the sample, e.g., minute, hour, day, metre, metre^2, metre^3.  It is combined with quantity and quantityType to provide the complete measurement, e.g., 9 individuals per day,  4 biomass-gm per metre^2.

 

5.      eventSeriesID: an identifier for a set of events that are associated in some way, e.g., a monitoring series; may be a global unique identifier or an identifier specific to the series.
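
As a rough sketch of how three of these terms might combine into the "complete measurement" described under samplingUnit, the Python below assembles the two examples given in the list. The term names and the four samplingGeometry values come from the definitions above; the function names and the dictionary layout are hypothetical and are not part of any Darwin Core tooling.

```python
# Allowed values as listed in the samplingGeometry definition above.
SAMPLING_GEOMETRIES = {"point", "line", "area", "volume"}


def check_sampling_geometry(value):
    """Return value unchanged if it is one of the four listed geometries."""
    if value not in SAMPLING_GEOMETRIES:
        raise ValueError(
            "samplingGeometry must be one of: "
            + ", ".join(sorted(SAMPLING_GEOMETRIES))
        )
    return value


def describe_measurement(record):
    """Combine quantity, quantityType and samplingUnit into one phrase
    (hypothetical helper; the record is a plain dict keyed on term names)."""
    return "{:g} {} per {}".format(
        record["quantity"], record["quantityType"], record["samplingUnit"]
    )


# The two examples from the samplingUnit definition.
per_day = {"quantity": 9, "quantityType": "individuals", "samplingUnit": "day"}
per_area = {"quantity": 4, "quantityType": "biomass-gm", "samplingUnit": "metre^2"}

print(describe_measurement(per_day))
print(describe_measurement(per_area))
print(check_sampling_geometry("area"))
```

The two records print as "9 individuals per day" and "4 biomass-gm per metre^2", matching the wording in the samplingUnit definition.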

 

 

Best regards,

 

Éamonn

 

[1] http://eubon.eu 

[2] http://eubon-ipt.gbif.org 

[3] http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640

[4] http://links.gbif.org/sample_data_model

[5] https://code.google.com/p/darwincore/issues/list 

 

 

 

____________________________________________________

Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama at gbif.org), 

Senior Programme Officer for Interoperability, 

Global Biodiversity Information Facility Secretariat, 

Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK

Phone: +45 3532 1494; Fax: +45 3532 1480

 


_______________________________________________
tdwg-content mailing list
tdwg-content at lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content

 
