[tdwg-content] Darwin Core: proposed news terms for expressing sample data
Anne Thessen
annethessen at gmail.com
Wed Aug 20 14:58:50 CEST 2014
Hello
I would just like to comment on *event core*.
I've been doing a lot of work translating published data into Darwin
Core. During that process I've wished several times that I could use
Event as core. I am happy to hear about that proposed change. It will
make it easier to model the data I am working with.
Anne
On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
>
> Hi Rob,
>
> Thank you for the feedback. I have tried to address the two main
> issues you raise below. At the outset, I would like to emphasise that
> much of this work is taking place in the context of the EU BON project
> which includes a task on developing/enhancing tools and standards for
> data sharing with a particular focus on the IPT for publishing
> sample-based data. So, we were constrained by the need to publish
> sample-based data sets in the Darwin Core Archive format and to
> demonstrate practical application using a working prototype. When the
> discussion on the TDWG list faded out, we took it to our EU BON
> partners whose requirements were essential input to further
> development. We recognise that these discussions took place away from
> TDWG (although the TDWG/EU BON contributors overlapped) and this is
> the reason we are presenting the outcomes here for further
> consideration.
>
> **Event core**
>
> As the SIGS report indicated, sample data can be modelled in Darwin
> Core Archives using either Occurrence or Event as core. This was the
> starting point for our evaluation but as things progressed the data
> wrangling pushed the model back towards the Event core. We actually
> went through the exercise of mapping multiple test datasets in
> an iterative process spanning several months' work. In the end, we
> found that using an Event core better matched the typical sample data
> we were dealing with, because it allows a measurement-or-fact extension
> to be attached for the efficient expression of environmental
> information associated with the event. The choice comes down to an
> Occurrence core or an Event core + Occurrence extension. In both
> cases, the true observation records are Occurrences. The big
> difference is what type the core has and therefore to which kind of
> records you can attach further facts and extra information with DwC-A
> extensions. Many sampling datasets have very rich information about
> the site and event, so it is very natural to hang facts from an Event
> core. When picking the Occurrence core those facts would have to be
> repeated for each and every occurrence record. Moreover, our approach
> doesn't stop anyone from using the Occurrence core if they so
> wish. This just provides a different option for datasets that better
> fit an Event core model.
>
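The difference between the two layouts described above can be sketched in a few lines of Python. This is only an illustration, not the IPT's implementation: the identifiers, taxa and measurement values are made up, and flatten_to_occurrence_core is a hypothetical helper. Only the term names (eventID, occurrenceID, measurementType and so on) come from Darwin Core.

```python
# Illustrative data only: IDs, names and values are invented for this sketch.

# Measurement-or-fact rows attached once to the Event core via eventID.
event_facts = [
    {"eventID": "E1", "measurementType": "water temperature",
     "measurementValue": "14.2", "measurementUnit": "degrees Celsius"},
    {"eventID": "E1", "measurementType": "salinity",
     "measurementValue": "35", "measurementUnit": "PSU"},
]

# Occurrence extension rows, each pointing back to its sampling event.
occurrences = [
    {"eventID": "E1", "occurrenceID": "O1", "scientificName": "Species A"},
    {"eventID": "E1", "occurrenceID": "O2", "scientificName": "Species B"},
    {"eventID": "E1", "occurrenceID": "O3", "scientificName": "Species C"},
]


def flatten_to_occurrence_core(event_facts, occurrences):
    """Hypothetical helper: re-key event-level facts to each occurrence.

    With an Occurrence core, extension rows can only hang off occurrence
    records, so every event-level fact is repeated once per occurrence.
    """
    facts_by_event = {}
    for fact in event_facts:
        facts_by_event.setdefault(fact["eventID"], []).append(fact)

    rows = []
    for occ in occurrences:
        for fact in facts_by_event.get(occ["eventID"], []):
            row = {k: v for k, v in fact.items() if k != "eventID"}
            row["occurrenceID"] = occ["occurrenceID"]
            rows.append(row)
    return rows


flat_facts = flatten_to_occurrence_core(event_facts, occurrences)
print(len(event_facts), "fact rows with an Event core")
print(len(flat_facts), "fact rows with an Occurrence core")
```

The number of fact rows grows with the number of occurrences per event in the Occurrence core layout, while the Event core layout stores each fact once.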
> I want to stress that we are not building a "specific IPT version" to
> support an Event core but, rather, we adapted the IPT so that it can
> be configured to support any generic "core + extension" format to
> enable its use for exploration of more data formats. This is part of
> the core codebase and there were no custom forks of the IPT for this
> work. Our view at GBIF is that if there are significant numbers of
> data publishers who are keen to adopt, promote and use a (any) format,
> and the tools can be configured to do so, then we should support it,
> and, if necessary, use a custom namespace.
>
> **New terms around abundance**
>
> Yes, the discussion on TDWG did fade out but it was clear that the
> term "abundance" as recommended by the SIGS report (along with
> abundanceAsPercent) was confusing many when we were looking for
> term(s) that reported quantitative measures of organisms in a sample.
> It also became clear we would need to be able to state the type of
> quantity being measured. An alternative suggestion for using the
> MeasurementOrFact class was immediately shot down.
>
> As some of our main use cases were coming from the EU BON project,
> discussion shifted to that forum and consensus formed about the
> currently proposed terms. It was within this group that the additional
> terms (samplingGeometry, samplingUnit, eventSeriesID) were proposed
> and where we began testing with sample data sets.
>
> Best regards,
>
> Éamonn
>
> *From:* robgur at gmail.com *On Behalf Of* Robert Guralnick
> *Sent:* 19 August 2014 16:56
> *To:* Éamonn Ó Tuama [GBIF]
> *Cc:* TDWG Content Mailing List
> *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for
> expressing sample data
>
> Hi Éamonn --- I am curious about the outcomes presented in the SIGS
> paper, in particular, this portion of the paper:
>
> "Solutions without introducing an event core in Darwin Core Archives:
> During the review of the solutions for the use cases, it became
> apparent that either model could be applied to every use case. The
> core and extensions bore a complementary relationship and between them
> could express all the required information. The core simply provided
> the central anchor in the star schema from which to join the
> additional information. Therefore, using the Occurrence core, well
> established in the GBIF network through uptake of the IPT, seemed more
> appropriate than inventing CollectingEvent as an additional core type."
>
> That SIGS paper lists both John Wieczorek and you as authors,
> along with many luminaries across the biodiversity standards spectrum.
> Given the above, it's curious to see the EventCore come back again,
> along with a specific IPT version to support it.
>
> So I see two issues, conflated, in this post you just made. One
> is the need for an EventCore at all, and the nature of relating Event
> and Occurrence/Material Sample. The second is the introduction of new
> terms, which seemingly have arrived after debate on similar terms -
> but framed around abundance - stalled a year ago. To my mind, both
> require some further discussion, because I don't (necessarily)
> see TDWG community coherence around either issue.
>
> Best, Rob
>
> On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF]
> <eotuama at gbif.org> wrote:
>
> Dear All,
>
> GBIF is committed to exploring ways in which the IPT and Darwin Core
> Archive format can be extended for publishing sample-based data sets.
> In association with the EU BON project [1], a customised version of
> the IPT [2] has been deployed to test this using a special type of
> Darwin Core Archive in which the core is an "Event" with associated
> taxon occurrences in an "Occurrence" extension.
>
> The Darwin Core vocabulary already provides a rich set of terms,
> many of them relevant for describing sample-based data. From a
> synthesis of several sources of input (a GBIF-organised workshop on
> sample data, May 2013 [3]; discussions on the TDWG mailing list in
> late 2013; internal discussion among EU BON project partners), five
> new terms relating to sample data were identified as essential. The
> complete model, including these new terms, is fully described with
> examples in the online document "Publishing sample data using the
> GBIF IPT" [4].
>
> As a first step towards ratification, we would like to register the
> new terms in the DwC Google Code tracker [5] if there are no major
> objections on this list. The five terms are:
>
> 1. *quantity*: the number or enumeration value of the quantityType
> (e.g., individuals, biomass, biovolume, BraunBlanquetScale) per
> samplingUnit, or a percentage measure recorded for the sample.
>
> 2. *quantityType*: the entity being referred to by quantity, e.g.,
> individuals, biomass, %species, scale type.
>
> 3. *samplingGeometry*: an indication of what kind of space was sampled;
> select from point, line, area or volume.
>
> 4. *samplingUnit*: the unit of measurement used for reporting the
> quantity in the sample, e.g., minute, hour, day, metre, metre^2,
> metre^3. It is combined with quantity and quantityType to provide the
> complete measurement, e.g., 9 individuals per day, 4 biomass-gm per
> metre^2.
>
> 5. *eventSeriesID*: an identifier for a set of events that are
> associated in some way, e.g., a monitoring series; may be a global
> unique identifier or an identifier specific to the series.
>
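How the proposed terms combine into a complete measurement can be sketched as follows. This is a minimal illustration under stated assumptions: the terms are proposals rather than ratified Darwin Core, describe_measurement is a hypothetical helper, and the eventSeriesID value is invented. The two sample records mirror the examples given in the samplingUnit definition above.

```python
# Controlled values for samplingGeometry, as listed in the proposal.
SAMPLING_GEOMETRIES = {"point", "line", "area", "volume"}


def describe_measurement(record):
    """Hypothetical helper: render quantity, quantityType and samplingUnit.

    samplingUnit is treated as optional here, since a quantity may also be
    a percentage measure recorded for the sample as a whole.
    """
    geometry = record.get("samplingGeometry")
    if geometry is not None and geometry not in SAMPLING_GEOMETRIES:
        raise ValueError(f"unexpected samplingGeometry: {geometry!r}")

    text = f'{record["quantity"]} {record["quantityType"]}'
    unit = record.get("samplingUnit")
    if unit:
        text += f" per {unit}"
    return text


# Illustrative records; "series-1" is a made-up eventSeriesID.
records = [
    {"eventSeriesID": "series-1", "quantity": 9,
     "quantityType": "individuals", "samplingUnit": "day",
     "samplingGeometry": "point"},
    {"eventSeriesID": "series-1", "quantity": 4,
     "quantityType": "biomass-gm", "samplingUnit": "metre^2",
     "samplingGeometry": "area"},
]

descriptions = [describe_measurement(r) for r in records]
for line in descriptions:
    print(line)
```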
> Best regards,
>
> Éamonn
>
> [1] http://eubon.eu
>
> [2] http://eubon-ipt.gbif.org
>
> [3]
> http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
>
> [4] http://links.gbif.org/sample_data_model
>
> [5] https://code.google.com/p/darwincore/issues/list
>
> ____________________________________________________
>
> /Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama at gbif.org), /
>
> /Senior Programme Officer for Interoperability, /
>
> /Global Biodiversity Information Facility Secretariat, /
>
> /Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK/
>
> /Phone: +45 3532 1494; Fax: +45 3532 1480/
>
>
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Anne E. Thessen, Ph.D.
The Data Detektiv, Owner and Founder
Ronin Institute, Research Scholar
443.225.9185