[tdwg-content] Darwin Core: proposed news terms for expressing sample data

Richard Pyle deepreef at bishopmuseum.org
Thu Aug 21 00:31:28 CEST 2014

YES!  parentEventID would PERFECTLY match our use case.  The problem with
eventSeriesID is that it implies an EventSeries “Class”, which is different
in nature than an “Event”.  We have found that nesting events hierarchically
(via parentEventID) is more flexible and simpler (and in some ways more






From: Markus Döring [mailto:m.doering at mac.com] 
Sent: Wednesday, August 20, 2014 11:32 AM
To: Richard Pyle
Cc: Anne Thessen; Robert Guralnick; Éamonn Ó Tuama; TDWG Content Mailing
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing
sample data


Hi Rich,


this is where the eventSeriesID comes in to group related events. I always
wondered if we should rather use a parentEventID term instead to capture
arbitrary nesting levels. That would match your use case a lot, right?





On 20 Aug 2014, at 20:56, Richard Pyle <deepreef at bishopmuseum.org> wrote:

Just a quick comment on Events – we use a hierarchical event model, such
that there might be an Event defined for an expedition, with child events
for particular legs of an expedition, and grandchild events for dives, and
possibly great-grandchild events for individual collecting events within a


In our context, Events are simply the intersection of place and time, with
the implication that “something” noteworthy happened at that place and time
(and typically including metadata about “who”).  The “time” is represented
as a range, and the “location” can be anything from a precise GPS coordinate
to “Planet Earth”.  Events are created at a level of granularity of place
and time commensurate with what the “something” is.  For example, some
events may span many years across a large geographic area, or in a very
precise place across a fraction of a second. The degree of nesting events
hierarchically is also flexible, commensurate with a human interpretation of
how the data should be structured.  The “something” can certainly be the
sampling of material in nature, but it’s certainly not limited to that.
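
A minimal sketch of the nesting described above, assuming each event record carries its own eventID plus a parentEventID pointing at the enclosing event. The identifiers ("exp-1", "leg-1", and so on) and the helper name lineage are hypothetical, chosen only for illustration; parentEventID is the term suggested in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    event_id: str                   # eventID
    parent_event_id: Optional[str]  # parentEventID as suggested in this thread; None for a root
    label: str


def lineage(events: Dict[str, Event], event_id: str) -> List[str]:
    """Return eventIDs from the given event up to its root, following parentEventID."""
    chain = [event_id]
    seen = {event_id}
    parent = events[event_id].parent_event_id
    while parent is not None:
        if parent in seen:
            # Guard against a malformed hierarchy that loops back on itself.
            raise ValueError(f"cycle in parentEventID chain at {parent!r}")
        chain.append(parent)
        seen.add(parent)
        parent = events[parent].parent_event_id
    return chain


# Hypothetical records: expedition > leg > dive > collecting event.
EVENTS = {
    e.event_id: e
    for e in [
        Event("exp-1", None, "expedition"),
        Event("leg-1", "exp-1", "leg of the expedition"),
        Event("dive-1", "leg-1", "dive"),
        Event("coll-1", "dive-1", "collecting event"),
    ]
}

print(" > ".join(reversed(lineage(EVENTS, "coll-1"))))
```

Because the depth is not fixed, the same structure would cover a two-level survey or a five-level expedition without a separate "series" class.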


None of that really addresses Rob’s questions (the answers to which I am
likewise interested in), but I thought I’d add this to the pot.






From: Anne Thessen [mailto:annethessen at gmail.com] 
Sent: Wednesday, August 20, 2014 7:56 AM
To: Robert Guralnick; Eamonn O Tuama
Cc: TDWG Content Mailing List; Richard Pyle
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing
sample data


Hi Rob
I would like to respond to your item number 2.
From my perspective, I deal with lots of published descriptions of taxa. The
text might say something like "I saw species A in the Chesapeake Bay, the
Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The
biomass range obviously corresponds to at least three different occurrences,
but how to divide the biomass data? I would love to be able to have an
*event* to attach it all to. There are almost two different levels of events
- a sampling event and a "study event". The "study event" would correspond
to the type of event I would like to use in the above example. It may not be
ideal, but for the old literature that might be the best we can do.
I have to admit that I don't know enough about trawl data to understand why
an event core would be a problem. It seems that the trawl would be an event
and each biomass measure (of each fish) would be attached to a separate
occurrence which is attached to that event. Am I understanding this wrong?
btw - I found a workaround for the example I gave, so it's not impossible to
model with the current structure....

On 8/20/2014 1:16 PM, Robert Guralnick wrote:


Éamonn et al. --- Thanks for the clarifications.  I think these help a ton,
but they raise a couple more questions for me.  


1)   I am surprised that you plan to use the MeasurementOrFact extension in
relation to the Event core, which seems like a novel (or perhaps awkward or
unintended?) mechanism for capturing environmental data, but the same
extension was not seen as relevant for describing samples? Can you
explain more about the thinking there?


2)  There may be a subtle issue here extending "Event" to be more what you
call a "Sampling Event Core".  My read of this is that Darwin Core serves as
a way to deal with point occurrences and Event reflects the context of a
single capture event (whether a single observation, or a bulk sample
capture).  The changes recommended seem to dramatically extend and change
that meaning?  It's simply a question that I don't have an answer to, but is
Darwin Core the right vehicle to start capturing repeated measures of
biomass values from trawls?   I don't have an answer, but man, terms like
quantityType (as a property of occurrence?) give me pause.  


3)  Is Sampling Unit a controlled vocabulary? For another project, I have
looked through - and captured scope, effort and completeness measures from -
a large number of published biotic area inventories.  The vast majority of
these are measured in units like bucket hours, or trap nights.  Is a
"bucket" part of SamplingGeometry or Sampling Unit?  I'd be happy to send
along all the many examples of how biotic inventories of an area are
completed and perhaps it might be good to see how those might be represented
using the terms you are proposing? 


Best, Rob






On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle
<deepreef at bishopmuseum.org> wrote:

Same here – Events are central to the work that we do.





From: tdwg-content-bounces at lists.tdwg.org
[mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Anne Thessen
Sent: Wednesday, August 20, 2014 2:59 AM
To: tdwg-content at lists.tdwg.org

Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing
sample data


I would just like to comment on *event core*.
I've been doing a lot of work translating published data into Darwin Core.
During that process I've wished several times that I could use Event as
core. I am happy to hear about that proposed change. It will make it easier
to model the data I am working with.

On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:

Hi Rob,


Thank you for the feedback. I have tried to address the two main issues you
raise below. At the outset, I would like to emphasise that much of this work
is taking place in the context of the EU BON project which includes a task
on developing/enhancing tools and standards for data sharing with a
particular focus on the IPT for publishing sample-based data. So, we were
constrained by the need to publish sample-based data sets in the Darwin Core
Archive format and to demonstrate practical application using a working
prototype. When the discussion on the TDWG list faded out, we took it to our
EU BON partners whose requirements were essential input to further
development. We recognise that these discussions took place away from TDWG
(although the TDWG/EU BON contributors overlapped) and this is the reason we
are presenting the outcomes here for further consideration.


*Event core*

As the SIGS report indicated, sample data can be modelled in Darwin Core
Archives using either Occurrence or Event as core. This was the starting
point for our evaluation but as things progressed the data wrangling pushed
the model back towards the Event core. We actually went through the exercise
of mapping multiple test datasets in an iterative process spanning several
months' work. In the end, we found that using an Event core better matched
the typical sample data we were dealing with, allowing a
measurement-or-fact extension to be included for the efficient expression of
environmental information associated with the event. The choice comes down
to an Occurrence core or an Event core + Occurrence extension. In both
cases, the true observation records are Occurrences. The big difference is
what type the core has and therefore to which kind of records you can attach
further facts and extra information with DwC-A extensions. Many sampling
datasets have very rich information about the site and event, so it is very
natural to hang facts from an Event core. When picking the Occurrence core
those facts would have to be repeated for each and every occurrence record.
Moreover, our approach doesn’t stop anyone from using the Occurrence core if
they so wish. This just provides a different option for datasets that better
fit an Event core model.
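
To make the trade-off concrete, here is a rough sketch of the two layouts using plain Python dictionaries in place of the archive's data files. The column names are existing Darwin Core terms; the record values ("trawl-1", "Species A", the temperature reading) and the function name flatten_to_occurrence_core are invented for illustration and do not come from any test dataset.

```python
# Event core: one row per sampling event.
event_core = [
    {"eventID": "trawl-1", "eventDate": "2014-08-20", "locality": "hypothetical site"},
]

# Occurrence extension: rows point back to the core via eventID.
occurrence_ext = [
    {"eventID": "trawl-1", "occurrenceID": "occ-1", "scientificName": "Species A"},
    {"eventID": "trawl-1", "occurrenceID": "occ-2", "scientificName": "Species B"},
]

# Measurement-or-fact extension: environmental facts hang off the event once.
event_facts = [
    {
        "eventID": "trawl-1",
        "measurementType": "water temperature",
        "measurementValue": "12.5",
        "measurementUnit": "degrees Celsius",
    },
]


def flatten_to_occurrence_core(events, occurrences):
    """Rebuild the Occurrence-core layout: event fields are copied onto every occurrence."""
    by_id = {e["eventID"]: e for e in events}
    return [{**by_id[o["eventID"]], **o} for o in occurrences]


flat = flatten_to_occurrence_core(event_core, occurrence_ext)
for row in flat:
    print(row["occurrenceID"], row["eventDate"], row["locality"])
```

With the Event core, the date and locality are stated once; in the flattened form they are repeated on each occurrence row, and the event-level facts would have to be repeated in the same way.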


I want to stress that we are not building a “specific IPT version” to
support an Event core but, rather, we adapted the IPT so that it can be
configured to support any generic “core + extension” format to enable its
use for exploration of more data formats.  This is part of the core codebase
and there were no custom forks of the IPT for this work.  Our view at GBIF
is that if there are significant numbers of data publishers who are keen to
adopt, promote and use a (any) format, and the tools can be configured to do
so, then we should support it, and, if necessary, use a custom namespace.


*New terms around abundance*

Yes, the discussion on TDWG did fade out but it was clear that the term
“abundance” as recommended by the SIGS report (along with
abundanceAsPercent) was confusing many when we were looking for term(s) that
reported quantitative measures of organisms in a sample. It also became
clear we would need to be able to state the type of quantity being measured.
An alternative suggestion for using the MeasurementsOrFact class was
immediately shot down.

As some of our main use cases were coming from the EU BON project,
discussion shifted to that forum and consensus formed about the currently
proposed terms. It was within this group that the additional terms
(samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we
began testing with sample data sets.


Best regards,




From: robgur at gmail.com [mailto:robgur at gmail.com] On Behalf Of Robert
Sent: 19 August 2014 16:56
To: Éamonn Ó Tuama [GBIF]
Cc: TDWG Content Mailing List
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing
sample data



  Hi Éamonn --- I am curious about the outcomes presented in the SIGS paper,
in particular, this portion of the paper:  


"Solutions without introducing an event core in Darwin Core Archives:
During the review of the solutions for the uses cases, it became apparent
that either model could be applied to every use case. The core and
extensions bore a complementary relationship and between them could express
all the required information. The core simply provided the central anchor in
the star schema from which to join the additional information. Therefore,
using the Occurrence core, well established in the GBIF network through
uptake of the IPT, seemed more appropriate than inventing CollectingEvent as
an additional core type."


   That SIGS paper has John Wieczorek and you both as authors, along with
many luminaries across the biodiversity standards spectrum.  Given the
above, it's curious to see the EventCore come back again, along with a
specific IPT version to support it.  


    So I see two issues, conflated, in this post you just made.  One is the
need for an EventCore at all, and the nature of relating Event and
Occurrence/Material Sample.  The second is the introduction of new terms,
which seemingly have arrived after debate on similar terms - but framed
around abundance - stalled a year ago.  To my mind, these both require some
further discussion, because I don't (necessarily) see TDWG community
coherence around either issue?


Best, Rob



On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF]
<eotuama at gbif.org> wrote:

Dear All,


GBIF is committed to exploring ways in which the IPT and Darwin Core Archive
format can be extended for publishing sample-based data sets. In association
with the EU BON project [1], a customised version of the IPT [2] has been
deployed to test this using a special type of Darwin Core Archive in which
the core is an “Event” with associated taxon occurrences in an “Occurrence”


The Darwin Core vocabulary already provides a rich set of terms with many
relevant for describing sample-based data. Synthesising several sources of
input (GBIF organised workshop on sample data, May 2013 [3], discussions on
the TDWG mailing list in late 2013; internal discussion among EU BON project
partners), five new terms relating to sample data were identified as
essential. The complete model, including these new terms, is fully described
with examples in the online document “Publishing sample data using the GBIF
IPT” [4].


As a first step towards ratification, we would like to register the new
terms in the DwC Google Code tracker [5] if there are no major objections on
this list. The five terms are:


1.      quantity: the number or enumeration value of the quantityType (e.g.,
individuals, biomass, biovolume, BraunBlanquetScale) per samplingUnit or a
percentage measure recorded for the sample.


2.      quantityType: the entity being referred to by quantity, e.g.,
individuals, biomass, %species, scale type.


3.      samplingGeometry: an indication of what kind of space was sampled;
select from point, line, area or volume.


4.      samplingUnit: the unit of measurement used for reporting the
quantity in the sample, e.g., minute, hour, day, metre, metre^2, metre^3.
It is combined with quantity and quantityType to provide the complete
measurement, e.g., 9 individuals per day, 4 biomass-gm per metre^2.


5.      eventSeriesID: an identifier for a set of events that are associated
in some way, e.g., a monitoring series; may be a global unique identifier or
an identifier specific to the series.
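
As an illustration of how quantity, quantityType and samplingUnit might combine into the complete measurement, and how samplingGeometry could be restricted to the four values listed, here is a small sketch. The class name SampleQuantity, its describe method and the geometry assigned to each example are assumptions made for this sketch, not part of the proposal; the percentage case mentioned under quantity is not handled.

```python
from dataclasses import dataclass

# The four values listed for samplingGeometry in the proposal.
VALID_GEOMETRIES = {"point", "line", "area", "volume"}


@dataclass
class SampleQuantity:
    quantity: float         # proposed term: quantity
    quantity_type: str      # proposed term: quantityType
    sampling_unit: str      # proposed term: samplingUnit
    sampling_geometry: str  # proposed term: samplingGeometry

    def __post_init__(self):
        if self.sampling_geometry not in VALID_GEOMETRIES:
            raise ValueError(
                f"samplingGeometry must be one of {sorted(VALID_GEOMETRIES)}, "
                f"got {self.sampling_geometry!r}"
            )

    def describe(self) -> str:
        """Combine the three terms into a reading such as '9 individuals per day'."""
        return f"{self.quantity:g} {self.quantity_type} per {self.sampling_unit}"


# The two examples given in the samplingUnit definition; the geometries are guesses.
count = SampleQuantity(9, "individuals", "day", "point")
biomass = SampleQuantity(4, "biomass-gm", "metre^2", "area")

print(count.describe())
print(biomass.describe())
```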



Best regards,




[1] http://eubon.eu

[2] http://eubon-ipt.gbif.org


[4] http://links.gbif.org/sample_data_model

[5] https://code.google.com/p/darwincore/issues/list





Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama at gbif.org),

Senior Programme Officer for Interoperability,

Global Biodiversity Information Facility Secretariat,

Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK

Phone: +45 3532 1494; Fax: +45 3532 1480


tdwg-content mailing list
tdwg-content at lists.tdwg.org



Anne E. Thessen, Ph.D.
The Data Detektiv, Owner and Founder
Ronin Institute, Research Scholar
443.225.9185

