[tdwg-content] Darwin Core: proposed new terms for expressing sample data

Anne Thessen annethessen at gmail.com
Wed Aug 20 14:58:50 CEST 2014

I would just like to comment on *event core*.
I've been doing a lot of work translating published data into Darwin 
Core. During that process I've wished several times that I could use 
Event as core. I am happy to hear about that proposed change. It will 
make it easier to model the data I am working with.

On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
> Hi Rob,
> Thank you for the feedback. I have tried to address the two main 
> issues you raise below. At the outset, I would like to emphasise that 
> much of this work is taking place in the context of the EU BON project 
> which includes a task on developing/enhancing tools and standards for 
> data sharing with a particular focus on the IPT for publishing 
> sample-based data. So, we were constrained by the need to publish 
> sample-based data sets in the Darwin Core Archive format and to 
> demonstrate practical application using a working prototype. When the 
> discussion on the TDWG list faded out, we took it to our EU BON 
> partners whose requirements were essential input to further 
> development. We recognise that these discussions took place away from 
> TDWG (although the TDWG/EU BON contributors overlapped) and this is 
> the reason we are presenting the outcomes here for further 
> consideration.
> **Event core**
> As the SIGS report indicated, sample data can be modelled in Darwin 
> Core Archives using either Occurrence or Event as core. This was the 
> starting point for our evaluation but as things progressed the data 
> wrangling pushed the model back towards the Event core. We actually 
> went through the exercise of mapping multiple test datasets in 
> an iterative process spanning several months' work. In the end, we 
> found that using an Event core better matched the typical sample data 
> we were dealing with, allowing a measurement-or-fact extension 
> to be included for the efficient expression of environmental 
> information associated with the event. The choice comes down to an 
> Occurrence core or an Event core + Occurrence extension. In both 
> cases, the true observation records are Occurrences. The big 
> difference is what type the core has and therefore to which kind of 
> records you can attach further facts and extra information with DwC-A 
> extensions. Many sampling datasets have very rich information about 
> the site and event, so it is very natural to hang facts from an Event 
> core. When picking the Occurrence core, those facts would have to be 
> repeated for each and every occurrence record. Moreover, our approach 
> doesn't stop anyone from using the Occurrence core if they so 
> wish. This just provides a different option for datasets that better 
> fit an Event core model.
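
To illustrate the point about repetition, here is a minimal Python sketch. It is not taken from the IPT or from any GBIF tool. The keys (eventID, occurrenceID, measurementType and so on) are Darwin Core term names, but every value and the helper flatten_to_occurrence_core are invented for illustration. The sketch starts from an Event core with an Occurrence extension and a measurement-or-fact extension, then re-expresses the same data with Occurrence as core.

```python
# Hypothetical sample data: keys are Darwin Core term names, values are invented.
events = [
    {"eventID": "E1", "eventDate": "2014-06-01", "locality": "Plot A"},
]

# Measurement-or-fact rows attached to the Event core via eventID.
event_facts = [
    {"eventID": "E1", "measurementType": "water temperature",
     "measurementValue": "12.5", "measurementUnit": "degrees Celsius"},
    {"eventID": "E1", "measurementType": "salinity",
     "measurementValue": "34", "measurementUnit": "PSU"},
]

# Occurrence extension rows, also pointing back to the Event core.
occurrences = [
    {"eventID": "E1", "occurrenceID": "O1", "scientificName": "Species one"},
    {"eventID": "E1", "occurrenceID": "O2", "scientificName": "Species two"},
    {"eventID": "E1", "occurrenceID": "O3", "scientificName": "Species three"},
]


def flatten_to_occurrence_core(events, event_facts, occurrences):
    """Re-express the data with Occurrence as core (assumed helper).

    Extensions can only hang off the core, so every event-level fact
    has to be copied onto each occurrence of that event.
    """
    event_by_id = {event["eventID"]: event for event in events}
    core_rows = []
    fact_rows = []
    for occ in occurrences:
        # The event fields are merged into each occurrence record.
        core_rows.append({**event_by_id[occ["eventID"]], **occ})
        for fact in event_facts:
            if fact["eventID"] == occ["eventID"]:
                row = {k: v for k, v in fact.items() if k != "eventID"}
                row["occurrenceID"] = occ["occurrenceID"]
                fact_rows.append(row)
    return core_rows, fact_rows


occ_core, occ_facts = flatten_to_occurrence_core(events, event_facts, occurrences)
print(len(event_facts), "fact rows with an Event core")
print(len(occ_facts), "fact rows with an Occurrence core")
```

With two event-level facts and three occurrences, the Event core layout stores two fact rows, while the Occurrence core layout needs six.
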
> I want to stress that we are not building a "specific IPT version" to 
> support an Event core but, rather, we adapted the IPT so that it can 
> be configured to support any generic "core + extension" format to 
> enable its use for exploration of more data formats.  This is part of 
> the core codebase and there were no custom forks of the IPT for this 
> work.  Our view at GBIF is that if there are significant numbers of 
> data publishers who are keen to adopt, promote and use a (any) format, 
> and the tools can be configured to do so, then we should support it, 
> and, if necessary, use a custom namespace.
> **New terms around abundance**
> Yes, the discussion on TDWG did fade out but it was clear that the 
> term "abundance" as recommended by the SIGS report (along with 
> abundanceAsPercent) was confusing many when we were looking for 
> term(s) that reported quantitative measures of organisms in a sample. 
> It also became clear we would need to be able to state the type of 
> quantity being measured. An alternative suggestion for using the 
> MeasurementsOrFact class was immediately shot down.
> As some of our main use cases were coming from the EU BON project, 
> discussion shifted to that forum and consensus formed about the 
> currently proposed terms. It was within this group that the additional 
> terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed 
> and where we began testing with sample data sets.
> Best regards,
> Éamonn
> *From:* robgur at gmail.com *On Behalf Of* Robert Guralnick
> *Sent:* 19 August 2014 16:56
> *To:* Éamonn Ó Tuama [GBIF]
> *Cc:* TDWG Content Mailing List
> *Subject:* Re: [tdwg-content] Darwin Core: proposed new terms for 
> expressing sample data
>   Hi Éamonn --- I am curious about the outcomes presented in the SIGS 
> paper, in particular, this portion of the paper:
> "Solutions without introducing an event core in Darwin Core Archives: 
>  During the review of the solutions for the use cases, it became 
> apparent that either model could be applied to every use case. The 
> core and extensions bore a complementary relationship and between them 
> could express all the required information. The core simply provided 
> the central anchor in the star schema from which to join the 
> additional information. Therefore, using the Occurrence core, well 
> established in the GBIF network through uptake of the IPT, seemed more 
> appropriate than inventing CollectingEvent as an additional core type."
> That SIGS paper has John Wieczorek and you both as authors, 
> along with many luminaries across the biodiversity standards spectrum. 
>  Given the above, it's curious to see the EventCore come back again, 
> along with a specific IPT version to support it.
>     So I see two issues, conflated, in this post you just made.  One 
> is the need for an EventCore at all, and the nature of relating Event 
> and Occurrence/Material Sample.  The second is the introduction of new 
> terms, which seemingly have arrived after debate on similar terms - 
> but framed around abundance - stalled a year ago.  To my mind, these 
> both require some further discussion, because I don't (necessarily) 
> see TDWG community coherence around either issue?
> Best, Rob
> On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF] 
> <eotuama at gbif.org> wrote:
> Dear All,
> GBIF is committed to exploring ways in which the IPT and Darwin Core 
> Archive format can be extended for publishing sample-based data sets. 
> In association with the EU BON project [1], a customised version of 
> the IPT [2] has been deployed to test this using a special type of 
> Darwin Core Archive in which the core is an "Event" with associated 
> taxon occurrences in an "Occurrence" extension.
> The Darwin Core vocabulary already provides a rich set of terms with 
> many relevant for describing sample-based data. Synthesising several 
> sources of input (GBIF organised workshop on sample data, May 2013 
> [3], discussions on the TDWG mailing list in late 2013; internal 
> discussion among EU BON project partners), five new terms relating to 
> sample data were identified as essential. The complete model, including 
> these new terms, is fully described with examples in the online 
> document "Publishing sample data using the GBIF IPT" [4].
> As a first step towards ratification, we would like to register the 
> new terms in the DwC Google Code tracker [5] if there are no major 
> objections on this list. The five terms are:
> 1. *quantity*: the number or enumeration value of the quantityType 
> (e.g., individuals, biomass, biovolume, BraunBlanquetScale) per 
> samplingUnit or a percentage measure recorded for the sample.
> 2. *quantityType*: the entity being referred to by quantity, e.g., 
> individuals, biomass, %species, scale type.
> 3. *samplingGeometry*: an indication of what kind of space was sampled; 
> select from point, line, area or volume.
> 4. *samplingUnit*: the unit of measurement used for reporting the 
> quantity in the sample, e.g., minute, hour, day, metre, metre^2, 
> metre^3.  It is combined with quantity and quantityType to provide the 
> complete measurement, e.g., 9 individuals per day, 4 biomass-gm per 
> metre^2.
> 5. *eventSeriesID*: an identifier for a set of events that are 
> associated in some way, e.g., a monitoring series; may be a global 
> unique identifier or an identifier specific to the series.
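
As a rough illustration of how quantity, quantityType and samplingUnit combine into a complete measurement, here is a minimal Python sketch. The class SampleQuantity, its field names and its describe method are assumptions made for this example and are not part of Darwin Core or the IPT. The sketch handles numeric quantities only, although the proposal also allows enumeration values such as a Braun-Blanquet scale. The pairing of a geometry with each example is invented.

```python
from dataclasses import dataclass

# The four choices listed for the proposed samplingGeometry term.
ALLOWED_GEOMETRIES = {"point", "line", "area", "volume"}


@dataclass
class SampleQuantity:
    """Hypothetical container for the proposed sample terms."""
    quantity: float          # proposed term: quantity
    quantity_type: str       # proposed term: quantityType
    sampling_unit: str       # proposed term: samplingUnit
    sampling_geometry: str   # proposed term: samplingGeometry

    def __post_init__(self):
        # Reject geometries outside the proposed list.
        if self.sampling_geometry not in ALLOWED_GEOMETRIES:
            raise ValueError(
                f"samplingGeometry must be one of {sorted(ALLOWED_GEOMETRIES)}"
            )

    def describe(self) -> str:
        # quantity + quantityType + samplingUnit give the full measurement.
        return f"{self.quantity:g} {self.quantity_type} per {self.sampling_unit}"


# The two examples given in the term definitions.
per_day = SampleQuantity(9, "individuals", "day", "point")
per_area = SampleQuantity(4, "biomass-gm", "metre^2", "area")
print(per_day.describe())
print(per_area.describe())
```

The two calls reproduce the examples from the samplingUnit definition: "9 individuals per day" and "4 biomass-gm per metre^2".
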
> Best regards,
> Éamonn
> [1] http://eubon.eu
> [2] http://eubon-ipt.gbif.org
> [3] http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
> [4] http://links.gbif.org/sample_data_model
> [5] https://code.google.com/p/darwincore/issues/list
> ____________________________________________________
> /Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama at gbif.org), /
> /Senior Programme Officer for Interoperability, /
> /Global Biodiversity Information Facility Secretariat, /
> /Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK/
> /Phone: +45 3532 1494; Fax: +45 3532 1480/
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content

Anne E. Thessen, Ph.D.
The Data Detektiv, Owner and Founder
Ronin Institute, Research Scholar
