[tdwg-content] Darwin Core: proposed news terms for expressing sample data
Anne Thessen
annethessen at gmail.com
Wed Aug 20 19:56:22 CEST 2014
Hi Rob
I would like to respond to your item number 2.
From my perspective, I deal with lots of published descriptions of
taxa. The text might say something like "I saw species A in the
Chesapeake Bay, the Adriatic Sea and the Indian Ocean and the biomass is
5 - 9 grams". The biomass range obviously corresponds to at least three
different occurrences, but how to divide the biomass data? I would love
to be able to have an *event* to attach it all to. There are almost two
different levels of events - a sampling event and a "study event". The
"study event" would correspond to the type of event I would like to use
in the above example. It may not be ideal, but for the old literature
that might be the best we can do.
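
A minimal sketch of that "study event" idea, in plain Python rather than an actual Darwin Core Archive. The class and field names (StudyEvent, Fact, event_id and so on) are assumptions made up for illustration, not Darwin Core terms. The only point is that the 5 - 9 gram range is stored once, on the event, and each of the three occurrences reaches it through the event instead of getting an invented share of it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Fact:
    # One measurement-or-fact style row (field names are illustrative).
    measurement_type: str
    min_value: float
    max_value: float
    unit: str


@dataclass
class StudyEvent:
    event_id: str
    facts: List[Fact] = field(default_factory=list)


@dataclass
class Occurrence:
    occurrence_id: str
    event_id: str  # points back to the study event
    taxon: str
    locality: str


# The published statement: species A at three localities, biomass 5 - 9 grams.
study = StudyEvent("study-1", [Fact("biomass", 5.0, 9.0, "g")])
localities = ["Chesapeake Bay", "Adriatic Sea", "Indian Ocean"]
occurrences = [
    Occurrence(f"occ-{i}", study.event_id, "species A", place)
    for i, place in enumerate(localities, start=1)
]


def facts_for(occurrence: Occurrence, events: Dict[str, StudyEvent]) -> List[Fact]:
    """Look up the facts shared by every occurrence of the same event."""
    return events[occurrence.event_id].facts


events = {study.event_id: study}
for occ in occurrences:
    for fact in facts_for(occ, events):
        print(f"{occ.locality}: {fact.measurement_type} "
              f"{fact.min_value}-{fact.max_value} {fact.unit}")
```

The range stays a range here; nothing in the sketch claims to know which locality produced which value.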
I have to admit that I don't know enough about trawl data to understand
why an event core would be a problem. It seems that the trawl would be
an event and each biomass measure (of each fish) would be attached to a
separate occurrence which is attached to that event. Am I understanding
this wrong?
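
To make the question concrete, here is a rough sketch of the structure as I picture it, again in plain Python and not a real archive. eventID, occurrenceID and scientificName are existing Darwin Core terms; quantity and quantityType are the terms proposed further down this thread; the helper functions and all of the values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Event core: one row per trawl (values are invented).
events = [
    {"eventID": "trawl-001"},
]

# Occurrence extension: one row per fish, joined to the trawl by eventID.
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "trawl-001",
     "scientificName": "species A", "quantity": 850.0, "quantityType": "biomass-gm"},
    {"occurrenceID": "occ-2", "eventID": "trawl-001",
     "scientificName": "species B", "quantity": 350.0, "quantityType": "biomass-gm"},
]


def group_by_event(rows: List[dict]) -> Dict[str, List[dict]]:
    """Collect occurrence rows under the eventID they point to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["eventID"]].append(row)
    return dict(grouped)


def total_quantity(rows: List[dict], quantity_type: str) -> float:
    """Sum the quantity of one quantityType over a set of occurrence rows."""
    return sum(r["quantity"] for r in rows if r["quantityType"] == quantity_type)


by_event = group_by_event(occurrences)
for event in events:
    rows = by_event.get(event["eventID"], [])
    print(event["eventID"], len(rows), total_quantity(rows, "biomass-gm"))
```

If trawl data does not fit this shape, it would help to see which part of the join breaks down.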
btw - I found a workaround for the example I gave, so it's not
impossible to model with the current structure....
Anne
On 8/20/2014 1:16 PM, Robert Guralnick wrote:
>
> Éamonn et al. --- Thanks for the clarifications. I think these help a
> ton but it raises a couple more questions for me.
>
> 1) I am surprised that you plan to use the MeasurementOrFact
> extension in relation to the Event core, which seems like a novel (or
> perhaps awkward or unintended?) mechanism for capturing environmental
> data, but the same extension was not seen as relevant for
> describing samples? Can you explain more about the thinking there?
>
> 2) There may be a subtle issue here extending "Event" to be more what
> you call a "Sampling Event Core". My read of this is that Darwin Core
> serves as a way to deal with point occurrences and Event reflects the
> context of a single capture event (whether a single observation, or a
> bulk sample capture). The changes recommended seem to dramatically
> extend and change that meaning? It's simply a question that I don't
> have an answer to, but is Darwin Core the right vehicle to start
> capturing repeated measures of biomass values from trawls? I don't
> have an answer but man, terms like quantityType (as a property of
> occurrence?) give me pause.
>
> 3) Is Sampling Unit a controlled vocabulary? For another project, I
> have looked through - and captured scope, effort and completeness
> measures from - a large number of published biotic area inventories.
> The vast majority of these are measured in units like bucket hours,
> or trap nights. Is a "bucket" part of SamplingGeometry or Sampling
> Unit? I'd be happy to send along all the many examples of how biotic
> inventories of an area are completed and perhaps it might be good to
> see how those might be represented using the terms you are proposing?
>
> Best, Rob
>
>
>
>
>
> On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle
> <deepreef at bishopmuseum.org> wrote:
>
> Same here – Events are central to the work that we do.
>
> Aloha,
>
> Rich
>
> *From:* tdwg-content-bounces at lists.tdwg.org
> [mailto:tdwg-content-bounces at lists.tdwg.org] *On Behalf Of* Anne
> Thessen
> *Sent:* Wednesday, August 20, 2014 2:59 AM
> *To:* tdwg-content at lists.tdwg.org
>
>
> *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for
> expressing sample data
>
> Hello
> I would just like to comment on *event core*.
> I've been doing a lot of work translating published data into
> Darwin Core. During that process I've wished several times that I
> could use Event as core. I am happy to hear about that proposed
> change. It will make it easier to model the data I am working with.
> Anne
>
> On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
>
> Hi Rob,
>
> Thank you for the feedback. I have tried to address the two
> main issues you raise below. At the outset, I would like to
> emphasise that much of this work is taking place in the
> context of the EU BON project which includes a task on
> developing/enhancing tools and standards for data sharing with
> a particular focus on the IPT for publishing sample-based
> data. So, we were constrained by the need to publish
> sample-based data sets in the Darwin Core Archive format and
> to demonstrate practical application using a working
> prototype. When the discussion on the TDWG list faded out, we
> took it to our EU BON partners whose requirements were
> essential input to further development. We recognise that
> these discussions took place away from TDWG (although the
> TDWG/EU BON contributors overlapped) and this is the reason we
> are presenting the outcomes here for further consideration.
>
> **Event core**
>
> As the SIGS report indicated, sample data can be modelled in
> Darwin Core Archives using either Occurrence or Event as core.
> This was the starting point for our evaluation but as things
> progressed the data wrangling pushed the model back towards
> the Event core. We actually went through the exercise of
> mapping multiple test datasets in an iterative process
> spanning several months' work. In the end, we found that using
> an Event core better matched the typical sample data we were
> dealing with, allowing a measurement-or-fact extension
> to be included for the efficient expression of environmental
> information associated with the event. The choice comes down
> to an Occurrence core or an Event core + Occurrence extension.
> In both cases, the true observation records are Occurrences.
> The big difference is what type the core has and therefore to
> which kind of records you can attach further facts and extra
> information with DwC-A extensions. Many sampling datasets have
> very rich information about the site and event, so it is very
> natural to hang facts from an Event core. When picking the
> Occurrence core those facts would have to be repeated for each
> and every occurrence record. Moreover, our approach doesn’t
> stop anyone from using the Occurrence core if they so
> wish. This just provides a different option for datasets that
> better fit an Event core model.
>
> I want to stress that we are not building a “specific IPT
> version” to support an Event core but, rather, we adapted the
> IPT so that it can be configured to support any generic “core
> + extension” format to enable its use for exploration of more
> data formats. This is part of the core codebase and there
> were no custom forks of the IPT for this work. Our view at
> GBIF is that if there are significant numbers of data
> publishers who are keen to adopt, promote and use a (any)
> format, and the tools can be configured to do so, then we
> should support it, and, if necessary, use a custom namespace.
>
> **New terms around abundance**
>
> Yes, the discussion on TDWG did fade out but it was clear that
> the term “abundance” as recommended by the SIGS report (along
> with abundanceAsPercent) was confusing many when we were
> looking for term(s) that reported quantitative measures of
> organisms in a sample. It also became clear we would need to
> be able to state the type of quantity being measured. An
> alternative suggestion for using the MeasurementOrFact class
> was immediately shot down.
>
> As some of our main use cases were coming from the EU BON
> project, discussion shifted to that forum and consensus formed
> about the currently proposed terms. It was within this group
> that the additional terms (samplingGeometry, samplingUnit,
> eventSeriesID) were proposed and where we began testing with
> sample data sets.
>
> Best regards,
>
> Éamonn
>
> *From:* robgur at gmail.com [mailto:robgur at gmail.com]
> *On Behalf Of* Robert Guralnick
> *Sent:* 19 August 2014 16:56
> *To:* Éamonn Ó Tuama [GBIF]
> *Cc:* TDWG Content Mailing List
> *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms
> for expressing sample data
>
> Hi Éamonn --- I am curious about the outcomes presented in
> the SIGS paper, in particular, this portion of the paper:
>
> "Solutions without introducing an event core in Darwin Core
> Archives: During the review of the solutions for the uses
> cases, it became apparent that either model could be applied
> to every use case. The core and extensions bore a
> complementary relationship and between them could express all
> the required information. The core simply provided the central
> anchor in the star schema from which to join the additional
> information. Therefore, using the Occurrence core, well
> established in the GBIF network through uptake of the IPT,
> seemed more appropriate than inventing CollectingEvent as an
> additional core type."
>
> That SIGS paper has John Wieczorek and you both as authors,
> including many luminaries across the biodiversity standards
> spectrum. Given the above, it's curious to see the EventCore
> come back again, along with a specific IPT version to support it.
>
> So I see two issues, conflated, in this post you just
> made. One is the need for an EventCore at all, and the nature
> of relating Event and Occurrence/Material Sample. The second
> is the introduction of new terms, which seemingly have arrived
> after debate on similar terms - but framed around abundance -
> stalled a year ago. To my mind, these both require some
> further discussion, because I don't (necessarily) see TDWG
> community coherence around either issue?
>
> Best, Rob
>
> On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF]
> <eotuama at gbif.org> wrote:
>
> Dear All,
>
> GBIF is committed to exploring ways in which the IPT and
> Darwin Core Archive format can be extended for publishing
> sample-based data sets. In association with the EU BON project
> [1], a customised version of the IPT [2] has been deployed to
> test this using a special type of Darwin Core Archive in which
> the core is an “Event” with associated taxon occurrences in an
> “Occurrence” extension.
>
> The Darwin Core vocabulary already provides a rich set of
> terms with many relevant for describing sample-based data.
> Synthesising several sources of input (GBIF organised workshop
> on sample data, May 2013 [3], discussions on the TDWG mailing
> list in late 2013; internal discussion among EU BON project
> partners), five new terms relating to sample data were
> identified as essential. The complete model, including these
> new terms, is fully described with examples in the online
> document “Publishing sample data using the GBIF IPT” [4].
>
> As a first step towards ratification, we would like to
> register the new terms in the DwC Google Code tracker [5] if
> there are no major objections on this list. The five terms are:
>
> 1. *quantity*: the number or enumeration value of the
> quantityType (e.g., individuals, biomass, biovolume,
> BraunBlanquetScale) per samplingUnit or a percentage measure
> recorded for the sample.
>
> 2. *quantityType*: the entity being referred to by quantity,
> e.g., individuals, biomass, %species, scale type.
>
> 3. *samplingGeometry*: an indication of what kind of space was
> sampled; select from point, line, area or volume.
>
> 4. *samplingUnit*: the unit of measurement used for reporting
> the quantity in the sample, e.g., minute, hour, day, metre,
> metre^2, metre^3. It is combined with quantity and
> quantityType to provide the complete measurement, e.g., 9
> individuals per day, 4 biomass-gm per metre^2.
>
> 5. *eventSeriesID*: an identifier for a set of events that are
> associated in some way, e.g., a monitoring series; may be a
> globally unique identifier or an identifier specific to the series.
>
> Best regards,
>
> Éamonn
>
> [1] http://eubon.eu
>
> [2] http://eubon-ipt.gbif.org
>
> [3]
> http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
>
> [4] http://links.gbif.org/sample_data_model
>
> [5] https://code.google.com/p/darwincore/issues/list
>
> ____________________________________________________
>
> /Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama at gbif.org), /
>
> /Senior Programme Officer for Interoperability, /
>
> /Global Biodiversity Information Facility Secretariat, /
>
> /Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK/
>
> /Phone: +45 3532 1494; Fax: +45 3532 1480/
>
>
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org <mailto:tdwg-content at lists.tdwg.org>
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
>
>
>
>
>
>
> --
>
> Anne E. Thessen, Ph.D.
>
> The Data Detektiv, Owner and Founder
>
> Ronin Institute, Research Scholar
>
> 443.225.9185
>
>
>
>
--
Anne E. Thessen, Ph.D.
The Data Detektiv, Owner and Founder
Ronin Institute, Research Scholar
443.225.9185