[tdwg-content] Darwin Core: proposed new terms for expressing sample data

Robert Guralnick Robert.Guralnick at colorado.edu
Wed Aug 20 20:08:15 CEST 2014


  Anne -- I don't know the answers!  These are questions for Eamonn.  I
would presume that a sample could be a jumble of species or even just water
or soil samples, and biomass would refer to that sample - but maybe that
isn't a use case being considered?  The examples given in the longer
document all link an event_id to species name and some measure of quantity
for that species (to the species, not an individual specimen), so I assume
that is the prevailing (or only) case?
Best, Rob



On Wed, Aug 20, 2014 at 11:56 AM, Anne Thessen <annethessen at gmail.com>
wrote:

>  Hi Rob
> I would like to respond to your item number 2.
> From my perspective, I deal with lots of published descriptions of taxa.
> The text might say something like "I saw species A in the Chesapeake Bay,
> the Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The
> biomass range obviously corresponds to at least three different
> occurrences, but how to divide the biomass data? I would love to be able to
> have an *event* to attach it all to. There are almost two different levels
> of events - a sampling event and a "study event". The "study event" would
> correspond to the type of event I would like to use in the above example.
> It may not be ideal, but for the old literature that might be the best we
> can do.
> I have to admit that I don't know enough about trawl data to understand
> why an event core would be a problem. It seems that the trawl would be an
> event and each biomass measure (of each fish) would be attached to a
> separate occurrence which is attached to that event. Am I understanding
> this wrong?
> btw - I found a workaround for the example I gave, so it's not impossible
> to model with the current structure....
> Anne
>
>
> On 8/20/2014 1:16 PM, Robert Guralnick wrote:
>
>
> Éamonn et al. --- Thanks for the clarifications.  I think these help a ton
> but they raise a couple more questions for me.
>
>  1)   I am surprised that you plan to use the MeasurementOrFact extension
> in relation to the Event core, which seems like a novel (or perhaps awkward
> or unintended?) mechanism for capturing environmental data, but the same
> extension was not seen as relevant for describing samples? Can you
> explain more about the thinking there?
>
>  2)  There may be a subtle issue here extending "Event" to be more what
> you call a "Sampling Event Core".  My read of this is that Darwin Core
> serves as a way to deal with point occurrences and Event reflects the
> context of a single capture event (whether a single observation, or a bulk
> sample capture).  The changes recommended seem to dramatically extend and
> change that meaning?  It's simply a question that I don't have an answer to,
> but is Darwin Core the right vehicle to start capturing repeated measures
> of biomass values from trawls?  I don't have an answer, but man, terms like
> quantityType (as a property of occurrence?) give me pause.
>
>  3)  Is Sampling Unit a controlled vocabulary? For another project, I
> have looked through - and captured scope, effort and completeness measures
> from - a large number of published biotic area inventories.  The vast
> majority of these are measured in units like bucket hours or trap
> nights.  Is a "bucket" part of SamplingGeometry or Sampling Unit?  I'd be
> happy to send along all the many examples of how biotic inventories of an
> area are completed and perhaps it might be good to see how those might be
> represented using the terms you are proposing?
>
>  Best, Rob
>
>
>
>
>
>
>  On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle <deepreef at bishopmuseum.org> wrote:
>
>>  Same here – Events are central to the work that we do.
>>
>>
>>
>> Aloha,
>>
>> Rich
>>
>>
>>
>> *From:* tdwg-content-bounces at lists.tdwg.org [mailto:
>> tdwg-content-bounces at lists.tdwg.org] *On Behalf Of *Anne Thessen
>> *Sent:* Wednesday, August 20, 2014 2:59 AM
>> *To:* tdwg-content at lists.tdwg.org
>>
>> *Subject:* Re: [tdwg-content] Darwin Core: proposed new terms for
>> expressing sample data
>>
>>
>>
>> Hello
>> I would just like to comment on *event core*.
>> I've been doing a lot of work translating published data into Darwin
>> Core. During that process I've wished several times that I could use Event
>> as core. I am happy to hear about that proposed change. It will make it
>> easier to model the data I am working with.
>> Anne
>>
>> On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
>>
>> Hi Rob,
>>
>>
>>
>> Thank you for the feedback. I have tried to address the two main issues
>> you raise below. At the outset, I would like to emphasise that much of this
>> work is taking place in the context of the EU BON project which includes a
>> task on developing/enhancing tools and standards for data sharing with a
>> particular focus on the IPT for publishing sample-based data. So, we were
>> constrained by the need to publish sample-based data sets in the Darwin
>> Core Archive format and to demonstrate practical application using a
>> working prototype. When the discussion on the TDWG list faded out, we took
>> it to our EU BON partners whose requirements were essential input to
>> further development. We recognise that these discussions took place away
>> from TDWG (although the TDWG/EU BON contributors overlapped) and this is
>> the reason we are presenting the outcomes here for further consideration.
>>
>>
>>
>> **Event core**
>>
>> As the SIGS report indicated, sample data can be modelled in Darwin Core
>> Archives using either Occurrence or Event as core. This was the starting
>> point for our evaluation but as things progressed the data wrangling pushed
>> the model back towards the Event core. We actually went through the
>> exercise of mapping multiple test datasets in an iterative process spanning
>> several months' work. In the end, we found that using an Event core better
>> matched the typical sample data we were dealing with, because it allowed a
>> measurement-or-fact extension to be used for the efficient expression
>> of environmental information associated with the event. The choice comes
>> down to an Occurrence core or an Event core + Occurrence extension. In both
>> cases, the true observation records are Occurrences. The big difference is
>> what type the core has and therefore to which kind of records you can
>> attach further facts and extra information with DwC-A extensions. Many
>> sampling datasets have very rich information about the site and event, so
>> it is very natural to hang facts from an Event core. When picking the
>> Occurrence core, those facts would have to be repeated for each and every
>> occurrence record. Moreover, our approach doesn’t stop anyone from using
>> the Occurrence core if they so wish. This just provides a different option
>> for datasets that better fit an Event core model.
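The repetition described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the IPT's implementation: the Event and Occurrence classes and the flatten_to_occurrence_core helper are hypothetical, the field names borrow Darwin Core term names, the MeasurementOrFact extension is simplified to a dictionary of measurementType to value, and all values are invented. It shows environmental facts stored once on an Event-core record and then copied onto every row when the same data are flattened into an Occurrence core.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    # Hypothetical Occurrence-extension row; names follow Darwin Core terms.
    occurrenceID: str
    scientificName: str
    individualCount: int


@dataclass
class Event:
    # Hypothetical Event-core record; names follow Darwin Core terms.
    eventID: str
    eventDate: str
    decimalLatitude: float
    decimalLongitude: float
    # Simplified stand-in for MeasurementOrFact rows: measurementType -> measurementValue.
    facts: dict = field(default_factory=dict)
    # Stand-in for the Occurrence-extension rows that point back to this event.
    occurrences: list = field(default_factory=list)


def flatten_to_occurrence_core(event):
    """Turn one Event-core record into Occurrence-core rows.

    Every event-level field and fact is copied onto each occurrence row."""
    rows = []
    for occ in event.occurrences:
        row = {
            "eventID": event.eventID,
            "eventDate": event.eventDate,
            "decimalLatitude": event.decimalLatitude,
            "decimalLongitude": event.decimalLongitude,
        }
        row.update(event.facts)  # environmental facts repeated per occurrence
        row.update(vars(occ))    # then the occurrence's own fields
        rows.append(row)
    return rows


# Invented example: one trawl (the event) with two taxa caught.
trawl = Event(
    "ev-1", "2014-08-20", 55.7, 12.6,
    facts={"water temperature": 14.2, "salinity": 8.1},
    occurrences=[
        Occurrence("occ-1", "Gadus morhua", 12),
        Occurrence("occ-2", "Clupea harengus", 340),
    ],
)
for row in flatten_to_occurrence_core(trawl):
    print(row)
```

With the Event core the two environmental facts are held once; in the flattened Occurrence-core form they appear on every occurrence row, so the duplication grows with the number of taxa recorded per event.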
>>
>>
>>
>> I want to stress that we are not building a “specific IPT version” to
>> support an Event core but, rather, we adapted the IPT so that it can be
>> configured to support any generic “core + extension” format to enable its
>> use for exploration of more data formats.  This is part of the core
>> codebase and there were no custom forks of the IPT for this work.  Our view
>> at GBIF is that if there are significant numbers of data publishers who are
>> keen to adopt, promote and use a (any) format, and the tools can be
>> configured to do so, then we should support it, and, if necessary, use a
>> custom namespace.
>>
>>
>>
>> **New terms around abundance**
>>
>> Yes, the discussion on TDWG did fade out but it was clear that the term
>> “abundance” as recommended by the SIGS report (along with
>> abundanceAsPercent) was confusing many when we were looking for term(s)
>> that reported quantitative measures of organisms in a sample. It also
>> became clear we would need to be able to state the type of quantity being
>> measured. An alternative suggestion for using the MeasurementOrFact class
>> was immediately shot down.
>>
>> As some of our main use cases were coming from the EU BON project,
>> discussion shifted to that forum and consensus formed about the currently
>> proposed terms. It was within this group that the additional terms
>> (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we
>> began testing with sample data sets.
>>
>>
>>
>> Best regards,
>>
>> Éamonn
>>
>>
>>
>>
>>
>> *From:* robgur at gmail.com [mailto:robgur at gmail.com] *On
>> Behalf Of *Robert Guralnick
>> *Sent:* 19 August 2014 16:56
>> *To:* Éamonn Ó Tuama [GBIF]
>> *Cc:* TDWG Content Mailing List
>> *Subject:* Re: [tdwg-content] Darwin Core: proposed new terms for
>> expressing sample data
>>
>>
>>
>>
>>
>>   Hi Éamonn --- I am curious about the outcomes presented in the SIGS
>> paper, in particular, this portion of the paper:
>>
>>
>>
>> "Solutions without introducing an event core in Darwin Core Archives:
>>  During the review of the solutions for the use cases, it became apparent
>> that either model could be applied to every use case. The core and
>> extensions bore a complementary relationship and between them could express
>> all the required information. The core simply provided the central anchor
>> in the star schema from which to join the additional information.
>> Therefore, using the Occurrence core, well established in the GBIF network
>> through uptake of the IPT, seemed more appropriate than inventing
>> CollectingEvent as an additional core type."
>>
>>
>>
>>    That SIGS paper lists both you and John Wieczorek as authors, along with
>> many luminaries across the biodiversity standards spectrum.  Given the
>> above, it's curious to see the EventCore come back again, along with a
>> specific IPT version to support it.
>>
>>
>>
>>     So I see two issues, conflated, in this post you just made.  One is
>> the need for an EventCore at all, and the nature of relating Event and
>> Occurrence/Material Sample.  The second is the introduction of new terms,
>> which seemingly have arrived after debate on similar terms - but framed
>> around abundance - stalled a year ago.  To my mind, these both require some
>> further discussion, because I don't (necessarily) see TDWG community
>> coherence around either issue?
>>
>>
>>
>> Best, Rob
>>
>>
>>
>>
>>
>> On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF] <eotuama at gbif.org>
>> wrote:
>>
>> Dear All,
>>
>>
>>
>> GBIF is committed to exploring ways in which the IPT and Darwin Core
>> Archive format can be extended for publishing sample-based data sets. In
>> association with the EU BON project [1], a customised version of the IPT
>> [2] has been deployed to test this using a special type of Darwin Core
>> Archive in which the core is an “Event” with associated taxon occurrences
>> in an “Occurrence” extension.
>>
>>
>>
>> The Darwin Core vocabulary already provides a rich set of terms, many of them
>> relevant for describing sample-based data. Synthesising several sources of
>> input (a GBIF-organised workshop on sample data, May 2013 [3]; discussions on
>> the TDWG mailing list in late 2013; internal discussion among EU BON
>> project partners), five new terms relating to sample data were identified
>> as essential. The complete model, including these new terms, is fully
>> described with examples in the online document “Publishing sample data
>> using the GBIF IPT” [4].
>>
>>
>>
>> As a first step towards ratification, we would like to register the new
>> terms in the DwC Google Code tracker [5] if there are no major objections
>> on this list. The five terms are:
>>
>>
>>
>> 1.      *quantity*: the number or enumeration value of the quantityType
>> (e.g., individuals, biomass, biovolume, BraunBlanquetScale) per
>> samplingUnit or a percentage measure recorded for the sample.
>>
>>
>>
>> 2.      *quantityType*: the entity being referred to by quantity,
>> e.g., individuals, biomass, %species, scale type.
>>
>>
>>
>> 3.      *samplingGeometry*: an indication of what kind of space was
>> sampled; select from point, line, area or volume.
>>
>>
>>
>> 4.      *samplingUnit*: the unit of measurement used for reporting the
>> quantity in the sample, e.g., minute, hour, day, metre, metre^2, metre^3.
>> It is combined with quantity and quantityType to provide the complete
>> measurement, e.g., 9 individuals per day, 4 biomass-gm per metre^2.
>>
>>
>>
>> 5.      *eventSeriesID*: an identifier for a set of events that are
>> associated in some way, e.g., a monitoring series; may be a globally unique
>> identifier or an identifier specific to the series.
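A minimal Python sketch of how the five proposed terms might sit together on one record. It assumes a single flattened record layout (the proposal does not say whether samplingGeometry belongs to the event or the occurrence), the SampleRecord class and events_by_series helper are hypothetical, and the species and values are invented. The measurement() method reproduces the combinations given in the samplingUnit definition, and the samplingGeometry check uses the four values listed for that term.

```python
from collections import defaultdict
from dataclasses import dataclass

# Values listed for samplingGeometry in the proposal.
SAMPLING_GEOMETRIES = {"point", "line", "area", "volume"}


@dataclass
class SampleRecord:
    """Hypothetical flattened record using the five proposed terms."""
    eventID: str
    eventSeriesID: str
    samplingGeometry: str
    scientificName: str
    quantity: float
    quantityType: str
    samplingUnit: str

    def __post_init__(self):
        # Reject geometries outside the four proposed values.
        if self.samplingGeometry not in SAMPLING_GEOMETRIES:
            raise ValueError(
                f"samplingGeometry must be one of {sorted(SAMPLING_GEOMETRIES)}"
            )

    def measurement(self) -> str:
        """Combine quantity, quantityType and samplingUnit, e.g. '9 individuals per day'."""
        q = self.quantity
        if float(q).is_integer():
            q = int(q)  # print 9 rather than 9.0
        return f"{q} {self.quantityType} per {self.samplingUnit}"


def events_by_series(records):
    """Group eventIDs by eventSeriesID, e.g. the visits of one monitoring series."""
    series = defaultdict(list)
    for r in records:
        if r.eventID not in series[r.eventSeriesID]:
            series[r.eventSeriesID].append(r.eventID)
    return dict(series)


records = [
    SampleRecord("ev-1", "series-A", "point", "Erithacus rubecula", 9, "individuals", "day"),
    SampleRecord("ev-2", "series-A", "area", "Calluna vulgaris", 4, "biomass-gm", "metre^2"),
]
print(records[0].measurement())   # 9 individuals per day
print(records[1].measurement())   # 4 biomass-gm per metre^2
print(events_by_series(records))  # {'series-A': ['ev-1', 'ev-2']}
```

Keeping quantity, quantityType and samplingUnit as separate fields, rather than one free-text string, is what lets a consumer compare or aggregate values that share the same type and unit.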
>>
>>
>>
>>
>>
>> Best regards,
>>
>>
>>
>> Éamonn
>>
>>
>>
>> [1] http://eubon.eu
>>
>> [2] http://eubon-ipt.gbif.org
>>
>> [3]
>> http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
>>
>> [4] http://links.gbif.org/sample_data_model
>>
>> [5] https://code.google.com/p/darwincore/issues/list
>>
>>
>>
>>
>>
>>
>>
>> ____________________________________________________
>>
>> *Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama at gbif.org), *
>>
>> *Senior Programme Officer for Interoperability, *
>>
>> *Global Biodiversity Information Facility Secretariat, *
>>
>> *Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK*
>>
>> *Phone:  +45 3532 1494; Fax:  +45 3532 1480*
>>
>>
>>
>>
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>  --
>>
>> Anne E. Thessen, Ph.D.
>>
>> The Data Detektiv, Owner and Founder
>>
>> Ronin Institute, Research Scholar
>>
>> 443.225.9185
>>
>>
>>
>>
>
>
>