[tdwg-content] Darwin Core: proposed news terms for expressing sample data

Anne Thessen annethessen at gmail.com
Wed Aug 20 19:56:22 CEST 2014

Hi Rob
I would like to respond to your item number 2.
From my perspective, I deal with lots of published descriptions of 
taxa. The text might say something like "I saw species A in the 
Chesapeake Bay, the Adriatic Sea and the Indian Ocean and the biomass is 
5 - 9 grams". The biomass range obviously corresponds to at least three 
different occurrences, but how to divide the biomass data? I would love 
to be able to have an *event* to attach it all to. There are almost two 
different levels of events - a sampling event and a "study event". The 
"study event" would correspond to the type of event I would like to use 
in the above example. It may not be ideal, but for the old literature 
that might be the best we can do.
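A minimal sketch of the "study event" idea from the example above, under
loud assumptions: StudyEvent, min_value and max_value are hypothetical
names for illustration only and are not Darwin Core terms. The reported
biomass range hangs off the event, and the three occurrences simply
belong to it, so nothing has to be divided up.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    occurrence_id: str
    scientific_name: str
    locality: str


@dataclass
class RangeMeasurement:
    # Hypothetical structure: a range reported once for the whole study.
    measurement_type: str
    min_value: float
    max_value: float
    unit: str


@dataclass
class StudyEvent:
    # Hypothetical "study event": one published description.
    event_id: str
    occurrences: list = field(default_factory=list)
    measurements: list = field(default_factory=list)


study = StudyEvent(event_id="study-1")

# One occurrence per locality named in the published text.
localities = ["Chesapeake Bay", "Adriatic Sea", "Indian Ocean"]
for i, place in enumerate(localities, start=1):
    study.occurrences.append(Occurrence(f"occ-{i}", "species A", place))

# The 5 - 9 gram range is attached to the event, not to any one occurrence.
study.measurements.append(RangeMeasurement("biomass", 5.0, 9.0, "g"))
```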
I have to admit that I don't know enough about trawl data to understand 
why an event core would be a problem. It seems that the trawl would be 
an event and each biomass measure (of each fish) would be attached to a 
separate occurrence which is attached to that event. Am I understanding 
this wrong?
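To make that reading of the trawl case concrete, here is a rough sketch.
The keys eventID, occurrenceID, samplingProtocol, scientificName,
measurementType, measurementValue and measurementUnit follow existing
Darwin Core term names; the identifiers and biomass values are made-up
placeholders, and biomass_for_event is a hypothetical helper, not part
of any tool.

```python
# One trawl is one event.
event = {"eventID": "trawl-1", "samplingProtocol": "trawl"}

# Each fish is an occurrence that points back at the trawl.
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "trawl-1", "scientificName": "fish A"},
    {"occurrenceID": "occ-2", "eventID": "trawl-1", "scientificName": "fish B"},
]

# Each biomass measure points at its own occurrence (placeholder values).
measurements = [
    {"occurrenceID": "occ-1", "measurementType": "biomass",
     "measurementValue": 120.0, "measurementUnit": "g"},
    {"occurrenceID": "occ-2", "measurementType": "biomass",
     "measurementValue": 80.0, "measurementUnit": "g"},
]


def biomass_for_event(event_id):
    """Sum the biomass of every occurrence attached to one event."""
    ids = {o["occurrenceID"] for o in occurrences if o["eventID"] == event_id}
    return sum(
        m["measurementValue"]
        for m in measurements
        if m["occurrenceID"] in ids and m["measurementType"] == "biomass"
    )


total = biomass_for_event(event["eventID"])
```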
btw - I found a workaround for the example I gave, so it's not 
impossible to model with the current structure....

On 8/20/2014 1:16 PM, Robert Guralnick wrote:
> Éamonn et al. --- Thanks for the clarifications.  I think these help a 
> ton but it raises a couple more questions for me.
> 1)   I am surprised that you plan to use the MeasurementOrFact 
> extension in relation to the Event core, which seems like a novel (or 
> perhaps awkward or unintended?) mechanism for capturing environmental 
> data, but the same extension was not seen as relevant for 
> describing samples? Can you explain more about the thinking there?
> 2)  There may be a subtle issue here extending "Event" to be more what 
> you call a "Sampling Event Core".  My read of this is that Darwin Core 
> serves as a way to deal with point occurrences and Event reflects the 
> context of a single capture event (whether a single observation, or a 
> bulk sample capture).  The changes recommended seem to dramatically 
> extend and change that meaning?  It's simply a question that I don't 
> have an answer to, but is Darwin Core the right vehicle to start 
> capturing repeated measures of biomass values from trawls?   I don't 
> have an answer but man, terms like quantityType (as a property of 
> occurrence?) give me pause.
> 3)  Is Sampling Unit a controlled vocabulary? For another project, I 
> have looked through - and captured scope, effort and completeness 
> measures from - a large number of published biotic area inventories. 
>  The vast majority of these are measured in units like bucket hours, 
> or trap nights.  Is a "bucket" part of SamplingGeometry or Sampling 
> Unit?  I'd be happy to send along all the many examples of how biotic 
> inventories of an area are completed and perhaps it might be good to 
> see how those might be represented using the terms you are proposing?
> Best, Rob
> On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle 
> <deepreef at bishopmuseum.org> wrote:
>     Same here – Events are central to the work that we do.
>     Aloha,
>     Rich
>     *From:* tdwg-content-bounces at lists.tdwg.org *On Behalf Of* Anne
>     Thessen
>     *Sent:* Wednesday, August 20, 2014 2:59 AM
>     *To:* tdwg-content at lists.tdwg.org
>     *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for
>     expressing sample data
>     Hello
>     I would just like to comment on *event core*.
>     I've been doing a lot of work translating published data into
>     Darwin Core. During that process I've wished several times that I
>     could use Event as core. I am happy to hear about that proposed
>     change. It will make it easier to model the data I am working with.
>     Anne
>     On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
>         Hi Rob,
>         Thank you for the feedback. I have tried to address the two
>         main issues you raise below. At the outset, I would like to
>         emphasise that much of this work is taking place in the
>         context of the EU BON project which includes a task on
>         developing/enhancing tools and standards for data sharing with
>         a particular focus on the IPT for publishing sample-based
>         data. So, we were constrained by the need to publish
>         sample-based data sets in the Darwin Core Archive format and
>         to demonstrate practical application using a working
>         prototype. When the discussion on the TDWG list faded out, we
>         took it to our EU BON partners whose requirements were
>         essential input to further development. We recognise that
>         these discussions took place away from TDWG (although the
>         TDWG/EU BON contributors overlapped) and this is the reason we
>         are presenting  the outcomes here for further consideration.
>         **Event core**
>         As the SIGS report indicated, sample data can be modelled in
>         Darwin Core Archives using either Occurrence or Event as core.
>         This was the starting point for our evaluation but as things
>         progressed the data wrangling pushed the model back towards
>         the Event core. We actually went through the exercise of
>         mapping multiple test datasets in an iterative process
>         spanning several months' work. In the end, we found that using
>         an Event core better matched the typical sample data we were
>         dealing with, allowing a measurement-or-fact extension
>         to be included for the efficient expression of environmental
>         information associated with the event. The choice comes down
>         to an Occurrence core or an Event core + Occurrence extension.
>         In both cases, the true observation records are Occurrences.
>         The big difference is what type the core has and therefore to
>         which kind of records you can attach further facts and extra
>         information with DwC-A extensions. Many sampling datasets have
>         very rich information about the site and event, so it is very
>         natural to hang facts from an Event core. When picking the
>         Occurrence core those facts would have to be repeated for each
>         and every occurrence record. Moreover, our approach doesn’t
>         stop anyone from using the Occurrence core if they so
>         wish. This just provides a different option for datasets that
>         better fit an Event core model.
>         I want to stress that we are not building a “specific IPT
>         version” to support an Event core but, rather, we adapted the
>         IPT so that it can be configured to support any generic “core
>         + extension” format to enable its use for exploration of more
>         data formats.  This is part of the core codebase and there
>         were no custom forks of the IPT for this work.  Our view at
>         GBIF is that if there are significant numbers of data
>         publishers who are keen to adopt, promote and use a (any)
>         format, and the tools can be configured to do so, then we
>         should support it, and, if necessary, use a custom namespace.
>         **New terms around abundance**
>         Yes, the discussion on TDWG did fade out but it was clear that
>         the term “abundance”  as recommended by the SIGS report (along
>         with abundanceAsPercent) was confusing many when we were
>         looking for term(s) that reported quantitative measures of
>         organisms in a sample. It also became clear we would need to
>         be able to state the type of quantity being measured. An
>         alternative suggestion for using the MeasurementOrFact class
>         was immediately shot down.
>         As some of our main use cases were coming from the EU BON
>         project, discussion shifted to that forum and consensus formed
>         about the currently proposed terms. It was within this group
>         that the additional terms (samplingGeometry, samplingUnit,
>         eventSeriesID) were proposed and where we began testing with
>         sample data sets.
>         Best regards,
>         Éamonn
>         *From:* robgur at gmail.com *On Behalf Of* Robert Guralnick
>         *Sent:* 19 August 2014 16:56
>         *To:* Éamonn Ó Tuama [GBIF]
>         *Cc:* TDWG Content Mailing List
>         *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms
>         for expressing sample data
>           Hi Éamonn --- I am curious about the outcomes presented in
>         the SIGS paper, in particular, this portion of the paper:
>         "Solutions without introducing an event core in Darwin Core
>         Archives:  During the review of the solutions for the uses
>         cases, it became apparent that either model could be applied
>         to every use case. The core and extensions bore a
>         complementary relationship and between them could express all
>         the required information. The core simply provided the central
>         anchor in the star schema from which to join the additional
>         information. Therefore, using the Occurrence core, well
>         established in the GBIF network through uptake of the IPT,
>         seemed more appropriate than inventing CollectingEvent as an
>         additional core type."
>            That SIGS paper has John Wieczorek and you both as authors,
>         including many luminaries across the biodiversity standards
>         spectrum.  Given the above, it's curious to see the EventCore
>         come back again, along with a specific IPT version to support it.
>             So I see two issues, conflated, in this post you just
>         made.  One is the need for an EventCore at all, and the nature
>         of relating Event and Occurrence/Material Sample.  The second
>         is the introduction of new terms, which seemingly have arrived
>         after debate on similar terms - but framed around abundance -
>         stalled a year ago.  To my mind, these both require some
>         further discussion, because I don't (necessarily) see TDWG
>         community coherence around either issue?
>         Best, Rob
>         On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF]
>         <eotuama at gbif.org> wrote:
>         Dear All,
>         GBIF is committed to exploring ways in which the IPT and
>         Darwin Core Archive format can be extended for publishing
>         sample-based data sets. In association with the EU BON project
>         [1], a customised version of the IPT [2] has been deployed to
>         test this using a special type of Darwin Core Archive in which
>         the core is an “Event” with associated taxon occurrences in an
>         “Occurrence” extension.
>         The Darwin Core vocabulary already provides a rich set of
>         terms with many relevant for describing sample-based data.
>         Synthesising several sources of input (GBIF organised workshop
>         on sample data, May 2013 [3], discussions on the TDWG mailing
>         list in late 2013; internal discussion among EU BON project
>         partners), five new terms relating to sample data were
>         identified as essential. The complete model, including these
>         new terms, is fully described with examples in the online
>         document “Publishing sample data using the GBIF IPT” [4].
>         As a first step towards ratification, we would like to
>         register the new terms in the DwC Google Code tracker [5] if
>         there are no major objections on this list. The five terms are:
>         1. *quantity*: the number or enumeration value of the
>         quantityType (e.g., individuals, biomass, biovolume,
>         BraunBlanquetScale) per samplingUnit or a percentage measure
>         recorded for the sample.
>         2. *quantityType*: the entity being referred to by quantity,
>         e.g., individuals, biomass, %species, scale type.
>         3. *samplingGeometry*: an indication of what kind of space was
>         sampled; select from point, line, area or volume.
>         4. *samplingUnit*: the unit of measurement used for reporting
>         the quantity in the sample, e.g., minute, hour, day, metre,
>         metre^2, metre^3.  It is combined with quantity and
>         quantityType to provide the complete measurement, e.g., 9
>         individuals per day,  4 biomass-gm per metre^2.
>         5. *eventSeriesID*: an identifier for a set of events that are
>         associated in some way, e.g., a monitoring series; may be a
>         global unique identifier or an identifier specific to the series.
>         Best regards,
>         Éamonn
>         [1] http://eubon.eu
>         [2] http://eubon-ipt.gbif.org
>         [3]
>         http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
>         [4] http://links.gbif.org/sample_data_model
>         [5] https://code.google.com/p/darwincore/issues/list
>         ____________________________________________________
>         /Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama at gbif.org), /
>         /Senior Programme Officer for Interoperability, /
>         /Global Biodiversity Information Facility Secretariat, /
>         /Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK/
>         /Phone: +45 3532 1494; Fax: +45 3532 1480/
>         _______________________________________________
>         tdwg-content mailing list
>         tdwg-content at lists.tdwg.org
>         http://lists.tdwg.org/mailman/listinfo/tdwg-content
>     -- 
>     Anne E. Thessen, Ph.D.
>     The Data Detektiv, Owner and Founder
>     Ronin Institute, Research Scholar
>     443.225.9185

Anne E. Thessen, Ph.D.
The Data Detektiv, Owner and Founder
Ronin Institute, Research Scholar
