Hi Ramona,
The idea of applying domains and ranges to the proposed properties was never entertained because, as you point out, their absence is a feature designed into DwC to make it maximally re-usable. Given the (deliberate) lack of semantic rigour in Darwin Core, the question comes down to how well suited it is for use in Darwin Core Archives as an exchange format for what we have been referring to as "sample-based" data. I would like to re-emphasise that in doing this we are not trying to establish how data should be captured or modeled, only one way in which they can be exposed to maximize discoverability and reuse, even if that is only a subset or view of some aspects of a data set. We are most definitely not trying to shoe-horn the complexity of the OGC O&M / OBOE model on to DwC.
We had already envisaged using an additional flag such as "sample" to indicate the nature of the evidence for the data, but that will require addressing the need, recognised by TDWG in Florence, to replace basisOfRecord with a new hierarchical vocabulary for evidence(Type), since that is what we mostly (mis)use basisOfRecord for. That was an omission from our description, so thanks for highlighting it.
Your examples of how the vagueness of domain and range values for the proposed terms can lead to ambiguities in interpretation are valid from a strict ontological standpoint, but the same criticism can probably be levelled at all DwC properties. For example, DwC itself does not enforce an obligatory pairing of latitude and longitude values, so if one is missing, you are left without a location. So, yes, quantity and quantityType need to co-occur for interpretation, as do samplingUnit and samplingEffort, and both pairings need to be present in order to interpret the figures correctly. If either pairing is incomplete, at most you will be able to say that there was an occurrence of taxon X at the event location and that it was recorded as part of a sampling event that used a particular protocol. The term "quantity" is not about a count of toads in a museum jar; rather, it refers to the number of occurrences per samplingUnit, e.g., 9 toads were recorded per m² (the samplingUnit). In the end, our approach will only work with good documentation and guidance, by making the IPT as user-friendly as possible, possibly doing completeness checks, etc.
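To make the pairing rule concrete, here is a minimal sketch of the kind of completeness check a publishing tool might run. It assumes each record is a flat dict of term name to value; the function names and the example values are hypothetical and are not part of the IPT or of the proposal itself.

```python
# Hypothetical completeness check for the proposed paired terms.
# A "record" is assumed to be a flat dict of term name -> value.

PAIRS = [("quantity", "quantityType"), ("samplingUnit", "samplingEffort")]


def _present(record: dict, term: str) -> bool:
    """A term counts as present if it has a non-empty value (0 is allowed)."""
    value = record.get(term)
    return value is not None and value != ""


def incomplete_pairs(record: dict) -> list:
    """Return the pairs where one member is filled in and the other is not."""
    return [(a, b) for a, b in PAIRS
            if _present(record, a) != _present(record, b)]


def is_quantitative(record: dict) -> bool:
    """True only when both pairings are complete, so the figures can be read."""
    return all(_present(record, t) for pair in PAIRS for t in pair)


if __name__ == "__main__":
    # Made-up example values, echoing "9 toads per m2" from the text.
    rec = {"scientificName": "Bufo bufo", "quantity": "9",
           "quantityType": "individuals", "samplingUnit": "m2"}
    print(incomplete_pairs(rec))  # -> [('samplingUnit', 'samplingEffort')]
    print(is_quantitative(rec))   # -> False
```

A record that fails the check still documents an occurrence at the event location; it just cannot be read quantitatively.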
I'm still looking to the BCO to provide a comprehensive framework for observations/samples, but, as you point out, it's not there yet. Unfortunately, your suggestion of using EML as an exchange format for sample data does not meet our needs. We already include EML for general metadata, but the whole point of DwC-A is that we can map to a standardised vocabulary, in this case particularly for basic quantitative information about a data set associated with a particular sampling protocol.
This discussion has been very useful and has prompted us to review our proposal to ensure we apply the least disruptive solution given the needs of the GBIF/EU BON community.
Éamonn
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Ramona Walls
Sent: 02 September 2014 05:39
To: TDWG Content Mailing List
Subject: Re: [tdwg-content] tdwg-content Digest, Vol 63, Issue 14
Thank you, Simon, for that explanation and the links. They were very helpful. Amen to the point: "There is no 'sample' class, because it is such an overloaded word (noun, verb, statistical sample vs ex-situ sample, etc)." The documents you shared highlight the very important point that OGC and OBO-E were designed specifically to describe observations.
Darwin Core, on the other hand, was designed to capture information about "taxa, their occurrence in nature as documented by observations, specimens, samples, and related information" [1]. As such, observations are not central to Darwin Core, but rather are included as evidence of the occurrence of a taxon in nature. It works for communicating basic information about an observation or other evidence of a taxon's occurrence, but I think it would be misusing and abusing DwC to try to shoe-horn the complexity of observation data/metadata into it. It also does a disservice to the communities who have spent so much time developing OGC and OBO-E.
Eamonn, this is not meant to discredit the work that you and your colleagues have done to develop a DwC archive schema for sampling data. I think it is an important step toward developing a comprehensive framework for biodiversity data, and just by proposing it, we have moved a step in the right direction (even if I disagree about adopting it). Your point that OBO-E is far more complex is true, and we may have to adopt more terms if we want to describe observation data accurately in DwC. On the other hand, we do not necessarily need to adopt every aspect of OBO-E to exchange observation data.
What the BCO participants -- and thanks to all the GBIF people who have participated! -- are trying to do is build a framework that can work across many (not necessarily all) types of biodiversity data, including specimen collection and observations, while considering existing efforts such as DwC, MIxS, OBO-E, and OBO Foundry ontologies. We started with specimens, but the intention has always been to link to observation data as well [2]. Although the full BCO model probably will be large and complex, we fully intend to offer views that are basically subsets of the ontology filtered for applications. This is regular practice now in application ontologies. Views make it possible to provide a controlled vocabulary for data annotation without burdening annotators with a confusing array of terms and logical definitions.
However, the point that BCO is not yet ready for your needs is correct, and I would never tell anyone to just "hold on to your data until the ontology is ready". Did you examine the possibility of using EML as an exchange format for the sampling/survey related data? DwC-A already has an EML component, so I wonder if some combination of an occurrence core with an extended EML document (based on OGC) would work.
Ramona
[1] http://rs.tdwg.org/dwc/index.htm
[2] http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0089606
------------------------------------------------------
Ramona L. Walls, Ph.D.
Scientific Analyst, The iPlant Collaborative, University of Arizona
Research Associate, Bio5 Institute, University of Arizona
Laboratory Research Associate, New York Botanical Garden
On Fri, Aug 29, 2014 at 3:00 AM, <tdwg-content-request@lists.tdwg.org> wrote:
Send tdwg-content mailing list submissions to
tdwg-content@lists.tdwg.org
To subscribe or unsubscribe via the World Wide Web, visit
http://lists.tdwg.org/mailman/listinfo/tdwg-content
or, via email, send a message with subject or body 'help' to
tdwg-content-request@lists.tdwg.org
You can reach the person managing the list at
tdwg-content-owner@lists.tdwg.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of tdwg-content digest..."
Today's Topics:
1. Re: Darwin Core: proposed news terms for expressing sample
data (Simon.Cox@csiro.au)
2. Re: tdwg-content Digest, Vol 63, Issue 6 (sigh) (Éamonn)
----------------------------------------------------------------------
Message: 1
Date: Fri, 29 Aug 2014 07:58:14 +0000
From: <Simon.Cox@csiro.au>
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for
expressing sample data
To: <tdwg-content@lists.tdwg.org>
Message-ID:
<2A7346E8D9F62D4CA8D78387173A054A5FFF5801@exmbx04-cdc.nexus.csiro.au>
Content-Type: text/plain; charset="utf-8"
G'day again TDWGers:
Matt passed on links to this thread to me and suggested I comment, as I was the author of the O&M standard (published as ISO 19156:2011 and OGC Abstract Spec Topic 20).
For those who are not aware of this work, there is a short Wikipedia page http://en.wikipedia.org/wiki/Observations_and_Measurements whose main value is it has links to a number of more detailed resources.
Probably the richest of these is another Wiki page at CSIRO https://www.seegrid.csiro.au/wiki/AppSchemas/ObservationsAndSampling which hasn't been updated much recently, but at least has some diagrams embedded.
As Matt and others have hinted, as a result of a workshop at NCEAS a few years ago, there were some tweaks to allow it to meet some of the requirements identified in OBOE, just in time to beat the ISO deadline!
O&M includes a generic model for 'Sampling Features': those artefacts that are created to assist the observation process, but would not otherwise exist or be of much interest.
Things like specimens, transects, sections, quadrats, scenes and swaths, drillholes, flightlines, trajectories, ships tracks, etc.
Because it is a generic standard, you won?t find things with names familiar to any particular discipline, and there are a lot of stub classes for supporting information which need filling out for specific applications.
But the intention is that it provides a framework for a discipline or community to specialize for their purposes, while retaining some topology and perhaps terminology (maybe just as super-classes) that help with information sharing across discipline boundaries.
The main properties of a sampling feature are
- The sampledFeature: the domain object that it is being used to characterize
- Related sampling features: other features related to the observational strategy
- Related observations: observation events that use this sampling feature (for which another generic model is provided)
We've generally found it helpful in teasing apart observational records and protocols in a variety of environmental science applications, and others have applied it in oceans, meteorology, even air-traffic control!
The primary classification of sampling features in O&M is by topological dimension (point, curve, surface, solid), because these are commonly used and afford common processing methods.
'Specimen' is the other concrete sampling-feature type.
There is no 'sample' class, because it is such an overloaded word (noun, verb, statistical sample vs ex-situ sample, etc).
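The structure described above (a generic sampling feature with a sampled feature, related sampling features and related observations, classified primarily by topological dimension, plus a Specimen type) can be sketched roughly as follows. This is a loose, hypothetical Python rendering for illustration only; it is not the normative ISO 19156 UML or its GML/OWL encodings, and collapsing the spatial types into one class with a `dimension` attribute is a simplification for brevity.

```python
from dataclasses import dataclass, field

# Loose, hypothetical rendering of the generic sampling-feature model;
# not the normative ISO 19156 UML or its GML/OWL encodings.


@dataclass
class SamplingFeature:
    name: str
    sampled_feature: str  # the domain object this feature helps characterize
    related_sampling_features: list = field(default_factory=list)
    related_observations: list = field(default_factory=list)  # observation IDs


@dataclass
class SpatialSamplingFeature(SamplingFeature):
    # Primary classification is by topological dimension.
    dimension: int = 0  # 0 point, 1 curve, 2 surface, 3 solid


@dataclass
class Specimen(SamplingFeature):
    """The other concrete sampling-feature type."""


SHAPES = {0: "point", 1: "curve", 2: "surface", 3: "solid"}


def classify(feature: SamplingFeature) -> str:
    if isinstance(feature, Specimen):
        return "specimen"
    if isinstance(feature, SpatialSamplingFeature):
        return SHAPES[feature.dimension]
    raise ValueError(f"{feature.name} has no concrete sampling-feature type")


if __name__ == "__main__":
    # Made-up example: a quadrat related to a transect across the same plot.
    quadrat = SpatialSamplingFeature("Q1", "grassland plot", dimension=2)
    transect = SpatialSamplingFeature("T1", "grassland plot",
                                      related_sampling_features=[quadrat],
                                      dimension=1)
    print(classify(transect), classify(quadrat))  # -> curve surface
```

Note that nothing here is called "sample": quadrats, transects and specimens are all modelled as sampling features.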
O&M and its Sampling Feature model was designed in UML.
As Matt notes, the original implementation in the OGC context was in XML, using GML: http://schemas.opengis.net/samplingSpecimen/2.0/specimen.xsd and http://schemas.opengis.net/samplingSpatial/2.0/spatialSamplingFeature.xsd .
However, it has been implemented in other ways: there is an OWL2/RDFS representation at http://def.seegrid.csiro.au/isotc211/iso19156/2011/sampling which is linked in with OWL versions of a bunch of the other ISO standards, and therefore probably makes too many commitments for the faint-hearted; see the paper from ISWC 2013 here http://ceur-ws.org/Vol-1063/paper1.pdf
O&M was also one of the core inputs to the W3C Semantic Sensor Network ontology, reported here: http://www.w3.org/2005/Incubator/ssn/wiki/Incubator_Report though that focussed on the sensors and observations side of the equation, and hardly deals with sampling.
Hope this helps.
>> Date: Thu, 21 Aug 2014 18:52:06 -0800
>> From: Matt Jones <jones@nceas.ucsb.edu>
>> Subject: Re: [tdwg-content] Darwin Core: proposed news terms for
>> expressing sample data
>> To: Éamonn Ó Tuama [GBIF] <eotuama@gbif.org>
>> Cc: TDWG Content Mailing List <tdwg-content@lists.tdwg.org>
>> Message-ID:
>> <CAFSW8xkx7uRP9PC2g3=JT_VJanqujH8nPXoz8GXwh+JwKw5Ccw@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> This proposal is treading on ground that is quite similar to other
>> observations and measurements standards for data exchange that are
>> already mature, in particular:
>>
>> * OGC Observations and Measurements (http://www.opengeospatial.org/standards/om)
>> * Extensible Observation Ontology (OBOE; https://semtools.ecoinformatics.org/oboe)
>>
>> The former is a standard and broadly deployed, whereas the latter is part
>> of a research program in the use of ontologies for measurements. Through
>> collaboration between the two projects, they've been modified to be
>> reasonably isomorphic, but O&M uses an XML serialization while OBOE uses
>> an OWL-DL serialization. They largely express the same measurements and
>> sampling model once one gets beyond the terminology differences.
>>
>> So, I'm wondering if it makes much sense to extend Darwin Core, which is
>> at heart an Occurrence exchange syntax, into this measurements area that
>> is well represented by these other existing specifications? I'm curious
>> to hear why people would even want to do this. And if we do go down this
>> path, won't we just end up with a new syntax that does essentially what
>> O&M and OBOE do now?
>>
>> Matt
------------------------------
Message: 2
Date: Fri, 29 Aug 2014 10:15:17 +0200
From: Éamonn Ó Tuama [GBIF] <eotuama@gbif.org>
Subject: Re: [tdwg-content] tdwg-content Digest, Vol 63, Issue 6
(sigh)
To: "'Ramona Walls'" <rlwalls2008@gmail.com>, "'Robert Guralnick'"
<Robert.Guralnick@colorado.edu>
Cc: 'TDWG Content Mailing List' <tdwg-content@lists.tdwg.org>, 'John
Deck' <jdeck@berkeley.edu>
Message-ID: <003301cfc361$5e4f19c0$1aed4d40$@org>
Content-Type: text/plain; charset="utf-8"
Neither am I an authority on OBOE but I also was prompted to look to see if there was a way to re-use some of its properties for the proposed Darwin Core terms. But the OBOE model is far more complex and it seems to me we would have to adopt much more than a few terms and also have to deal with the restrictions regarding domains and ranges that go with them.
To answer Rob's questions: 1) at GBIF, yes - we feel there is a role for use of the Darwin Core Archive format in exchanging at least some "views" of sample-based data for the reasons already given. 2) the solution is not to invent new terms carelessly but I am not aware of equivalent terms in other published vocabularies - maybe we will uncover some in a wider public review?
Regarding the notion of Audubon Core as an extension of Darwin Core (because it imports many of its terms), I find the concept of 'application profiles' as defined by the Dublin Core Metadata Initiative an interesting way of looking at aggregates of terms drawn from different vocabularies each defined in their own namespace. However, as the guidelines [1] state "By definition, Dublin Core application profiles "use" properties that have been defined somewhere -- i.e., somewhere outside of the profile itself". So the creation of Audubon Core as a "set of vocabularies" [2] involved defining any required new terms in an Audubon Core namespace and importing existing terms from a range of other published vocabularies.
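As a rough illustration of the application-profile idea, a profile can be thought of as a list of namespaced terms, some defined in the profile's own namespace and the rest "used" from vocabularies defined elsewhere. The sketch below assumes that model; the namespace table and the three example terms are illustrative only, not the actual Audubon Core term list.

```python
# Toy model of an application profile: a list of namespaced terms, some
# "used" from existing vocabularies, some defined in the profile's own
# namespace. The namespace table and term list are illustrative only.
NAMESPACES = {
    "dcterms": "http://purl.org/dc/terms/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "ac": "http://rs.tdwg.org/ac/terms/",
}


def expand(curie: str) -> str:
    """Turn a prefixed name like 'dwc:scientificName' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return NAMESPACES[prefix] + local


def split_profile(terms: list, own_prefix: str) -> tuple:
    """Separate terms the profile defines itself from terms it reuses."""
    own, reused = [], []
    for term in terms:
        target = own if term.split(":", 1)[0] == own_prefix else reused
        target.append(expand(term))
    return own, reused


if __name__ == "__main__":
    profile = ["dcterms:creator", "dwc:scientificName", "ac:accessURI"]
    own, reused = split_profile(profile, own_prefix="ac")
    print(own)     # terms defined in the profile's own namespace
    print(reused)  # terms defined "somewhere outside of the profile itself"
```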
Éamonn
[1] http://dublincore.org/documents/2008/11/03/profile-guidelines/
[2] http://terms.tdwg.org/wiki/Audubon_Core_Term_List
From: Ramona Walls [mailto:rlwalls2008@gmail.com]
Sent: 29 August 2014 06:35
To: Robert Guralnick
Cc: John Deck; Éamonn Ó Tuama [GBIF]; TDWG Content Mailing List
Subject: Re: [tdwg-content] tdwg-content Digest, Vol 63, Issue 6 (sigh)
I also really do appreciate GBIF's pressing need to serve survey/sample data, and I don't have a major problem with the idea of an Event core or even adding new terms to DwC (in principle). Rather, I am urging caution in how we proceed with it.
Éamonn, in response to your statement "Once the BCO model is available for uptake, it should be possible to develop a mapping between it and the simple DwC sample model", I will respond: possible? Maybe. Easy? No way. Unambiguous? Probably impossible. The problem I foresee is that once the "simple DwC sample model" is in place, people will start using it to do all kinds of not-so-simple things, and the mapping will become muddled. We have ample evidence that this is the case with existing Darwin Core archives.
Going back to the five new terms that Éamonn proposed, I would like to see if we can link them NOW to existing ontology terms (as others have proposed), thus making their semantics explicit from the start while still allowing GBIF/EU-BON to proceed with the work they need to do. This will not prevent people from misusing terms, but it may at least make mapping easier later. Where a term can't be mapped to an existing term, BCO curators would be willing to help develop a term or set of terms that can convey the meaning required, or to work with other ontology developers to get the terms added elsewhere.
Trying to be constructive, I attempted a quick-and-dirty preliminary mapping of the five terms (quantity, quantity type, sampling geometry, sampling unit, and event series ID), bearing in mind that I am not an authority on OBOE or OGC ontologies. [Aside: Based on what I know, OGC ontologies are not yet sufficiently developed to provide the semantics we need, but I would love for someone to show me otherwise.]
A serious problem with mapping these terms to existing ontologies is that some of them do NOT map to a single ontology term (namely, quantity, quantity type, and sampling geometry). This is evidence that the proposed terms could indeed be interpreted in multiple ways and further supports the argument that it would not be easy to retrospectively add them to a semantic framework at some later date.
I think there is a path forward that would allow for both the expressiveness of OBOE and other ontologies and convenience of standard exchange formats.
Ramona
On Thu, Aug 28, 2014 at 8:58 AM, Robert Guralnick <Robert.Guralnick@colorado.edu> wrote:
Hi all. OK, I think the scope of the issue is quite clear. Let me summarize: 1) As Éamonn and the rest of GBIF have made quite clear, "GBIF is faced with the immediate task of making sample-based data discoverable and accessible using its current ecosystem of tools" given a funding mandate from EU-BON. 2) The solution to this problem is to develop an Event core and to promote new terms to Darwin Core to make this happen. I will note a small inconsistency here: the current ecosystem of standards and tools is Darwin Core (as it stands) and publishing systems such as the IPT. That ecosystem of tools includes mechanisms to extend Darwin Core where needed, via extensions. The current ecosystem of tools doesn't include new Cores or new DwC terms, does it?
So this leads nicely into the contentious issue(s) and the places where there seems to be discussion. These have to do with the nature of the suggested changes and their scope, both in terms of an Event core and DwC term additions. Leaving aside the Event core for now, the key questions about term additions to Darwin Core that seem to be at the heart of this are: 1) Is the intent of Darwin Core to model surveys, which usually involve multiple kinds and types of sampling over multiple sites using multiple methods? 2) Is the solution to invent new terms for Darwin Core when terms already exist from other efforts? Shouldn't we work with those existing efforts to ensure interoperability?
I appreciate the efforts of GBIF here fully, and am personally torn because on the one hand, I fully agree with the goal of extending Darwin Core to better represent richer biodiversity data. On the other hand, I worry about process here and how to make that happen in a way that isn't too hasty or locks us into just the opposite of what I think many of us want with regards to sharing data more broadly than within just one ecosystem of tools.
Best, Rob
On Thu, Aug 28, 2014 at 6:30 AM, John Deck <jdeck@berkeley.edu> wrote:
I see the rationale for enabling this in Darwin Core Archives and adding the new terms. However, back to what Matt Jones brought up: "won't we just end up with a new syntax that does essentially what O&M and OBOE do now?".
We should include explicit references to existing terms/definitions that encapsulate what we're talking about; e.g., in our MaterialSample proposal last year we linked to an existing term in OBI, which has a much richer description and context for MaterialSample than what we considered (https://code.google.com/p/darwincore/issues/detail?id=167)
Have we explored the possibility of doing this with OBOE? I'm not suggesting we adopt OBOE wholesale, but it seems like we have a good opportunity to enable better semantic linking with that effort.
John
On Thu, Aug 28, 2014 at 4:23 AM, Éamonn Ó Tuama [GBIF] <eotuama@gbif.org> wrote:
Thanks, Ramona and Rob.
I'd like to add a few points following on Markus's reply.
I think your pressing of the need for a robust semantic model for
biodiversity sample/survey data is incontestable; we do need one and it
should enable rich data integration once it is defined and the tools and
data standards to support it become available. However, GBIF is faced with
the immediate task of making sample-based data discoverable and accessible
using its current ecosystem of tools (IPT) and exchange standards (DwC;
EML). Waiting for a functional, implementable semantic model and the tools
and support services for it is just not an option for us right now.
We have already spent considerable time analysing the merits of
Occurrence core vs Event core and have opted for an Event core for reasons
previously given. I don't believe we are trying to reconfigure Event ("an
action that occurs at a place and during a period of time"), and regardless
of whether we use Occurrence or Event, the need for some additional terms
arises (e.g., quantity, quantityType, samplingGeometry, samplingUnit). Once
the BCO model is available for uptake, it should be possible to develop a
mapping between it and the simple DwC sample model.
So GBIF's stance is that we need to take a two-pronged approach by exploring
how the IPT and DwC-A can be adapted for publishing sample-based data in the
near term while supporting the work of TDWG and groups such as the BCO in
advancing biodiversity informatics. GBIF has already engaged in the work of
the BCO and will continue to do so.
Éamonn
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Markus Döring
Sent: 28 August 2014 12:44
To: Ramona Walls
Cc: TDWG Content Mailing List
Subject: Re: [tdwg-content] tdwg-content Digest, Vol 63, Issue 6
Hi Ramona & Rob,
The Event proposal does not try to change the semantics of an Event, it just
uses the existing Darwin Core Event "class" at the core in Darwin Core
archives. The actual change proposed is simply adding 3 new terms to the
Event "group" to better share information about sampling methods & efforts,
extending the existing limited capabilities of Darwin Core which already has
the terms dwc:samplingProtocol and dwc:samplingEffort. It also proposes 2
new terms for dealing with quantity of Occurrences, something that has been
discussed since 2012, when I proposed a new abundance term [2].
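A minimal sketch of what an Event-core archive looks like once it is read, assuming one row per sampling event in the core and one row per taxon in an Occurrence extension. The rows are made up; the column names are existing Darwin Core terms (eventID, samplingProtocol, samplingEffort, scientificName) plus terms proposed in this thread (samplingUnit, quantity, quantityType). In an actual DwC-A the core and extension are separate delimited files whose linking column is declared in the archive's meta.xml; here plain dicts keyed on eventID stand in for that, and `star_join` is a hypothetical helper, not part of any GBIF library.

```python
from collections import defaultdict

# Made-up rows. Column names are existing Darwin Core terms plus terms
# proposed in this thread (samplingUnit, quantity, quantityType).
events = [
    {"eventID": "E1", "samplingProtocol": "quadrat count",
     "samplingEffort": "4 quadrats", "samplingUnit": "m2"},
    {"eventID": "E2", "samplingProtocol": "quadrat count",
     "samplingEffort": "4 quadrats", "samplingUnit": "m2"},
]
occurrences = [
    {"eventID": "E1", "scientificName": "Bufo bufo",
     "quantity": "9", "quantityType": "individuals"},
    {"eventID": "E1", "scientificName": "Rana temporaria",
     "quantity": "2", "quantityType": "individuals"},
]


def star_join(core, extension, key="eventID"):
    """Group extension rows under their core row (one event -> many occurrences)."""
    by_key = defaultdict(list)
    for row in extension:
        by_key[row[key]].append(row)
    # An event with no occurrence rows keeps an empty list.
    return {c[key]: (c, by_key.get(c[key], [])) for c in core}


if __name__ == "__main__":
    for event_id, (event, occs) in star_join(events, occurrences).items():
        for occ in occs:
            print(f"{event_id}: {occ['quantity']} {occ['quantityType']} of "
                  f"{occ['scientificName']} per {event['samplingUnit']} "
                  f"({event['samplingProtocol']})")
```

The sampling-method terms live once on the event row, while the per-taxon quantities live on the extension rows, which is the split the proposal relies on.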
In general, the application of Darwin Core is not at all limited to specimens and
observations. It is already used for sharing taxonomic datasets, and its
definition and goals are broad. Let me cite some of the introduction to Darwin
Core [1]:
What is the Darwin Core?
The Darwin Core is body of standards. It includes a glossary of terms (in
other contexts these might be called properties, elements, fields, columns,
attributes, or concepts) intended to facilitate the sharing of information
about biological diversity by providing reference definitions, examples, and
commentaries. The Darwin Core is primarily based on taxa, their occurrence
in nature as documented by observations, specimens, samples, and related
information.
Motivation: The Darwin Core standard was originally conceived to facilitate
the discovery, retrieval, and integration of information about modern
biological specimens, their spatiotemporal occurrence, and their supporting
evidence housed in collections (physical or digital). The Darwin Core today
is broader in scope and more versatile. It is meant to provide a stable
standard reference for sharing information on biological diversity. As a
glossary of terms, the Darwin Core is meant to provide stable semantic
definitions with the goal of being maximally reusable in a variety of
contexts.
Markus
[1] http://rs.tdwg.org/dwc/index.htm
[2] https://code.google.com/p/darwincore/issues/detail?id=142
--
Markus Döring
Software Developer
Global Biodiversity Information Facility (GBIF)
mdoering@gbif.org
http://www.gbif.org
On 27 Aug 2014, at 18:57, Ramona Walls <rlwalls2008@gmail.com> wrote:
> I think it is important to consider the purpose of both Darwin Core and
> DwC archives in deciding whether or not to expand them, but we should use
> that consideration to address the question at hand, which is whether or not
> to add an Event core and additional properties to describe events.
>
> Describing the exchange format before the semantics is the wrong way to
> go, given that we now have a framework for developing semantics. Expanding
> Darwin Core before we adequately model survey data is bound to lead to
> problems later, when we try to retro-fit the semantics to Darwin Core Event
> archives. This is exactly the problem we are running into now with Occurrence
> archives, and we have the opportunity to avoid it.
>
> I suggest we first use existing ontologies to model survey data, then deal
> with if and how to exchange that information in DwC-A. This is what I was
> hinting at in my first email, but should have said more explicitly.
>
> Ramona
>
>
>
> On Wed, Aug 27, 2014 at 9:14 AM, Robert Guralnick <Robert.Guralnick@colorado.edu> wrote:
>
> It may be a sensible view for Darwin Core Archives and their intended
> use, but Tim's email suggests we should be putting the method of delivery
> ahead of the standard that delivers that content. If this was just about
> DwC-As, why not develop a survey extension that links each occurrence to
> information about the survey process using the existing star-schema methods
> we have in place? Why are we discussing adding terms to the Darwin Core or
> trying to fully reconfigure what we call an Event? That is what is on the
> table, not DwC-As and how we use them. Or am I missing something?
>
> Best, Rob
>
>
>
> On Wed, Aug 27, 2014 at 10:01 AM, Ramona Walls <rlwalls2008@gmail.com> wrote:
> Thanks, Tim, and yes, DwC-A as a view (but not necessarily the primary
> archive) of data seems like the right point of view.
>
> Ramona
>
>
>
> On Wed, Aug 27, 2014 at 1:58 AM, Tim Robertson <trobertson@gbif.org> wrote:
> Hi Ramona,
>
> Those are good points, and I'd like to come back to the original thinking
> behind the DwC-A.
>
> It was designed and intended to be a simple way of exposing a complete
> view of a dataset, primarily for building sophisticated indexes, inventories
> and allowing basic analytics (e.g. GBIF.org being one sophisticated index).
> We found that the star schema provided the flexibility to do a lot, and with
> the bundled metadata (e.g. EML) was enough to trace provenance and allow
> users to determine if the dataset might be fit for various uses. In many
> cases this represents the complete (i.e. lossless) view of a dataset.
>
> What we are discussing here are far richer datasets, where shoe-horning
> content into the star schema becomes lossy for some, although we're finding
> other cases where it is indeed lossless. I believe we should be looking to
> harmonise ontologies / models etc. as you mention, but in parallel we should
> define one or more star schema views that can still be used for discovery /
> reporting / basic analytical purposes, and not for long-term archival of the
> dataset. The dataset would then have the canonical rich form and an
> additional DwC-A view. What I write here is applicable to all content types,
> of course.
>
> Please also note that many people put supplementary files in the DwC-A
> which are ignored by DwC-A readers but could be a way of keeping the richer
> view in the bundle. If you wish, you can describe those supplementary
> files in the EML document.
>
> Does this gel with the view of others as well?
>
> Cheers,
> Tim
>
>
>
> On 27 Aug 2014, at 02:55, Ramona Walls <rlwalls2008@gmail.com> wrote:
>
>> I think Matt hit the nail on the head. Although Darwin Core can be used
>> to exchange survey data, it lacks the semantics and structure necessary to
>> archive the data without loss of information. I think the biodiversity
>> community would be better served devoting energy to harmonizing existing
>> technologies such as OGC, OBOE, and BCO, not to mention the many databases
>> for storing plot or survey data. The goal should be to preserve the data in
>> the most informative manner possible.
>>
>> There is a strong case for wanting to search across all evidence for
>> occurrences, including surveys and point occurrences, so I can see possible
>> demand for a tool that would extract occurrences from survey data to a DwC
>> archive. However, I am very concerned that making a DwC archive the primary
>> exchange format for survey or plot data commits us to a path of losing
>> information from the start, for all but the simplest sampling schemas.
>>
>> Ramona
>>
>>
>>
>> On Fri, Aug 22, 2014 at 3:00 AM, <tdwg-content-request@lists.tdwg.org> wrote:
>>
>>
>> Today's Topics:
>>
>> 1. Re: Darwin Core: proposed news terms for expressing sample
>> data (Matt Jones)
>> 2. Re: Darwin Core: proposed news terms for expressing sample
>> data (Donald Hobern [GBIF])
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Thu, 21 Aug 2014 18:52:06 -0800
>> From: Matt Jones <jones@nceas.ucsb.edu>
>> Subject: Re: [tdwg-content] Darwin Core: proposed news terms for
>> expressing sample data
>> To: Éamonn Ó Tuama [GBIF] <eotuama@gbif.org>
>> Cc: TDWG Content Mailing List <tdwg-content@lists.tdwg.org>
>> Message-ID:
>> <CAFSW8xkx7uRP9PC2g3=JT_VJanqujH8nPXoz8GXwh+JwKw5Ccw@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> This proposal is treading on ground that is quite similar to other
>> observations and measurements standards for data exchange that are
already
>> mature, in particular:
>>
>> * OGC Observations and Measurements (http://www.opengeospatial.org/standards/om)
>> * Extensible Observation Ontology (OBOE; https://semtools.ecoinformatics.org/oboe)
>>
>> The former is a standard and broadly deployed, whereas the latter is part
>> of a research program in the use of ontologies for measurements. Through
>> collaboration between the two projects, they've been modified to be
>> reasonably isomorphic, but O&M uses an XML serialization while OBOE uses an
>> OWL-DL serialization. They largely express the same measurements and
>> sampling model once one gets beyond the terminology differences.
>>
>> So, I'm wondering if it makes much sense to extend Darwin Core, which is at
>> heart an Occurrence exchange syntax, into this measurements area that is
>> well represented by these other existing specifications? I'm curious to
>> hear why people would even want to do this. And if we do go down this
>> path, won't we just end up with a new syntax that does essentially what O&M
>> and OBOE do now?
>>
>> Matt
>>
>>
>>
>> On Thu, Aug 21, 2014 at 12:22 AM, Éamonn Ó Tuama [GBIF] <eotuama@gbif.org>
>> wrote:
>>
>> > Hi Rob, Anne, Rich,
>> >
>> >
>> >
>> > I think Markus has answered your question as to why we opted for an Event
>> > core which is being used in the sense described by Anne and Rich. For any
>> > event, you can have a list of species in an Occurrence extension and for
>> > each species, you can include quantity and quantityType, e.g., biomass,
>> > etc. The proposed term eventSeriesID was intended for linking together
>> > related events, although it now looks like parentEventID might be a better,
>> > more flexible term. The measurementOrFact extension is a good fit for
>> > capturing environmental information relating to an event. See, e.g., the
>> > Gialova Lagoon brackish water invertebrate test data set [1] where a set
>> > of 18 environmental variables, including temp, pH, Rdx, particulate organic
>> > matter, dissolved oxygen, salinity and chlorophyll-a, were measured for each
>> > sampling station-sampling period combination. An example mapping is:
>> >
>> >
>> >
>> > Id | measurementType | measurementValue | measurementUnit | measurementRemarks
>> > IA | Tmp (sed)       | 21.5             | degree C        | Tmp (sed): temperature at the bottom surface
>> >
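Here is a minimal Python sketch of how the mapping above might be serialised as a tab-delimited MeasurementOrFact extension file. It assumes the identifier column holds the ID of the Event core record the measurement belongs to. The file layout and the `write_mof` helper are illustrative assumptions, not the IPT's actual export code.

```python
import csv
import io

# Column names: an identifier linking to the Event core, then MeasurementOrFact terms.
MOF_FIELDS = ["id", "measurementType", "measurementValue",
              "measurementUnit", "measurementRemarks"]

# One row taken from the Gialova Lagoon example mapping.
rows = [
    {"id": "IA", "measurementType": "Tmp (sed)", "measurementValue": "21.5",
     "measurementUnit": "degree C",
     "measurementRemarks": "Tmp (sed): temperature at the bottom surface"},
]

def write_mof(rows):
    """Serialise MeasurementOrFact rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=MOF_FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(write_mof(rows))
```

Because each row carries only the event identifier, any number of environmental variables can hang off a single event without repeating the event itself.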
>> >
>> >
>> > **Controlled vocabularies**
>> >
>> > Ideally, the values for samplingUnit and quantityType would be selected
>> > from controlled vocabularies. This is, effectively, what we do by
>> > presenting a small list of values in a drop-down menu. The current values
>> > are what we derived from example data sets and discussion but they can
>> > undoubtedly be extended and improved.
>> >
>> >
>> >
>> > We capture "bucket" type measures through a combination of samplingEffort,
>> > samplingGeometry and samplingUnit. For example, a pitfall trap (in a point
>> > location) left out for 16 days might have samplingEffort: 16,
>> > samplingGeometry: point and samplingUnit: day. Three m^2 quadrats in a
>> > shore survey might have samplingEffort: 3, samplingGeometry: area and
>> > samplingUnit: m^2.
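The pitfall and quadrat examples above can be sketched as a small record type. This is a hypothetical Python container and is not part of Darwin Core or the IPT. It checks samplingGeometry against the four values listed in the proposal (point, line, area, volume) and assumes samplingEffort is a positive number.

```python
from dataclasses import dataclass

# samplingGeometry values as listed in the proposal.
GEOMETRIES = {"point", "line", "area", "volume"}

@dataclass(frozen=True)
class SamplingEffort:
    """Hypothetical container pairing samplingEffort, samplingGeometry and samplingUnit."""
    samplingEffort: float
    samplingGeometry: str
    samplingUnit: str

    def __post_init__(self):
        # Reject geometries outside the proposed controlled list.
        if self.samplingGeometry not in GEOMETRIES:
            raise ValueError(f"unknown samplingGeometry: {self.samplingGeometry!r}")
        if self.samplingEffort <= 0:
            raise ValueError("samplingEffort must be positive")

    def describe(self) -> str:
        # Print whole numbers without a trailing ".0".
        effort = self.samplingEffort
        value = int(effort) if float(effort).is_integer() else effort
        return f"{value} {self.samplingUnit} ({self.samplingGeometry})"

pitfall = SamplingEffort(16, "point", "day")
quadrats = SamplingEffort(3, "area", "m^2")
print(pitfall.describe())   # 16 day (point)
print(quadrats.describe())  # 3 m^2 (area)
```

Under this sketch, a value such as "bucket" is rejected as a geometry. That is one possible reading of how bucket-hours would split across the three terms, and it is not settled in the thread.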
>> >
>> >
>> >
>> > It would be very useful to see your compilation of scope, effort and
>> > completeness measures to see if we can express them in our model and/or if
>> > we need to reconsider our approach.
>> >
>> >
>> >
>> > Éamonn
>> >
>> >
>> >
>> > [1] http://eubon-ipt.gbif.org/resource.do?r=ionian-brackish-lagoon
>> >
>> >
>> >
>> > *From:* tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] *On Behalf Of *Markus Döring
>> > *Sent:* 20 August 2014 23:47
>> > *To:* Robert Guralnick
>> >
>> > *Cc:* TDWG Content Mailing List
>> > *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for
>> > expressing sample data
>> >
>> >
>> >
>> > Rob,
>> >
>> >
>> >
>> > this proposal is for monitoring surveys really, not to be confused with
>> > material samples like environmental or tissue samples which have a distinct
>> > new dwc class MaterialSample.
>> >
>> >
>> >
>> > We tend to overload the term sampling a lot and it helps to treat material
>> > samples differently from pure observational "sampling". That is why the
>> > existing Event class was used as the core and classic Occurrence records as
>> > extensions. A classic example is a vegetation survey where each plot
>> > represents an Event record and each recorded species in that plot will be
>> > an Occurrence extension record with a given quantity. Darwin Core already
>> > offers individualCount to specify quantity, but it is a very specific way
>> > of measuring "abundance" restricted to only some use cases. Abiotic
>> > measurements about the plot (e.g. soil type, pH, temperature) can be
>> > published using the measurements or facts extension linked to the Event
>> > core.
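Here is a rough Python sketch of the star schema Markus describes, using hypothetical in-memory rows. There is one Event core record per plot, and Occurrence and MeasurementOrFact extension rows point back to it. In this sketch the link is an `eventID` key on each row. In an actual Darwin Core Archive, extension files are linked through the core id column declared in meta.xml. Species names and values are placeholders.

```python
from collections import defaultdict

# Hypothetical Event core: one record per vegetation plot.
events = [{"eventID": "plot-1", "samplingProtocol": "vegetation plot"}]

# Hypothetical Occurrence extension rows, one per recorded species in the plot.
occurrences = [
    {"eventID": "plot-1", "scientificName": "Species A",
     "quantity": 12, "quantityType": "individuals"},
    {"eventID": "plot-1", "scientificName": "Species B",
     "quantity": 3, "quantityType": "individuals"},
]

# Hypothetical MeasurementOrFact extension rows with abiotic facts about the plot.
measurements = [
    {"eventID": "plot-1", "measurementType": "pH", "measurementValue": 6.4},
    {"eventID": "plot-1", "measurementType": "soil type", "measurementValue": "loam"},
]

def group_by_event(rows):
    """Index extension rows by the eventID of the core record they attach to."""
    index = defaultdict(list)
    for row in rows:
        index[row["eventID"]].append(row)
    return index

occ_by_event = group_by_event(occurrences)
mof_by_event = group_by_event(measurements)

for event in events:
    eid = event["eventID"]
    names = [o["scientificName"] for o in occ_by_event[eid]]
    facts = {m["measurementType"]: m["measurementValue"] for m in mof_by_event[eid]}
    print(eid, names, facts)
```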
>> >
>> >
>> >
>> > Markus
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On 20 Aug 2014, at 20:08, Robert Guralnick <Robert.Guralnick@colorado.edu>
>> > wrote:
>> >
>> >
>> >
>> >
>> >
>> > Anne -- I don't know the answers! These are questions for Eamonn. I
>> > would presume that a sample could be a jumble of species or even just water
>> > or soil samples, and biomass would refer to that sample - but maybe that
>> > isn't a use case being considered? The examples given in the longer
>> > document all link an event_id to species name and some measure of quantity
>> > for that species (to the species, not an individual specimen), so I assume
>> > that is the prevailing (or only) case?
>> >
>> > Best, Rob
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Aug 20, 2014 at 11:56 AM, Anne Thessen <annethessen@gmail.com>
>> > wrote:
>> >
>> > Hi Rob
>> > I would like to respond to your item number 2.
>> > From my perspective, I deal with lots of published descriptions of taxa.
>> > The text might say something like "I saw species A in the Chesapeake Bay,
>> > the Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The
>> > biomass range obviously corresponds to at least three different
>> > occurrences, but how to divide the biomass data? I would love to be able to
>> > have an *event* to attach it all to. There are almost two different levels
>> > of events - a sampling event and a "study event". The "study event" would
>> > correspond to the type of event I would like to use in the above example.
>> > It may not be ideal, but for the old literature that might be the best we
>> > can do.
>> > I have to admit that I don't know enough about trawl data to understand
>> > why an event core would be a problem. It seems that the trawl would be an
>> > event and each biomass measure (of each fish) would be attached to a
>> > separate occurrence which is attached to that event. Am I understanding
>> > this wrong?
>> > btw - I found a workaround for the example I gave, so it's not impossible
>> > to model with the current structure....
>> > Anne
>> >
>> >
>> >
>> > On 8/20/2014 1:16 PM, Robert Guralnick wrote:
>> >
>> >
>> >
>> > Éamonn et al. --- Thanks for the clarifications. I think these help a ton
>> > but they raise a couple more questions for me.
>> >
>> >
>> >
>> > 1) I am surprised that you plan to use the MeasurementOrFact extension in
>> > relation to the Event core, which seems like a novel (or perhaps awkward or
>> > unintended?) mechanism for capturing environmental data, but the same
>> > extension was not seen as relevant for describing samples? Can you
>> > explain more about the thinking there?
>> >
>> >
>> >
>> > 2) There may be a subtle issue here extending "Event" to be more what you
>> > call a "Sampling Event Core". My read of this is that Darwin Core serves
>> > as a way to deal with point occurrences and Event reflects the context of a
>> > single capture event (whether a single observation, or a bulk sample
>> > capture). The changes recommended seem to dramatically extend and change
>> > that meaning? It's simply a question that I don't have an answer to, but is
>> > Darwin Core the right vehicle to start capturing repeated measures of
>> > biomass values from trawls? I don't have an answer but man, terms like
>> > quantityType (as a property of occurrence?) give me pause.
>> >
>> >
>> >
>> > 3) Is Sampling Unit a controlled vocabulary? For another project, I have
>> > looked through - and captured scope, effort and completeness measures from
>> > - a large number of published biotic area inventories. The vast majority
>> > of these are measured in units like bucket hours, or trap nights. Is a
>> > "bucket" part of SamplingGeometry or Sampling Unit? I'd be happy to send
>> > along all the many examples of how biotic inventories of an area are
>> > completed and perhaps it might be good to see how those might be
>> > represented using the terms you are proposing?
>> >
>> >
>> >
>> > Best, Rob
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle <deepreef@bishopmuseum.org>
>> > wrote:
>> >
>> > Same here - Events are central to the work that we do.
>> >
>> >
>> >
>> > Aloha,
>> >
>> > Rich
>> >
>> >
>> >
>> > *From:* tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] *On Behalf Of *Anne Thessen
>> > *Sent:* Wednesday, August 20, 2014 2:59 AM
>> > *To:* tdwg-content@lists.tdwg.org
>> >
>> >
>> > *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for
>> > expressing sample data
>> >
>> >
>> >
>> > Hello
>> > I would just like to comment on *event core*.
>> > I've been doing a lot of work translating published data into Darwin Core.
>> > During that process I've wished several times that I could use Event as
>> > core. I am happy to hear about that proposed change. It will make it easier
>> > to model the data I am working with.
>> > Anne
>> >
>> > On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
>> >
>> > Hi Rob,
>> >
>> >
>> >
>> > Thank you for the feedback. I have tried to address the two main issues
>> > you raise below. At the outset, I would like to emphasise that much of this
>> > work is taking place in the context of the EU BON project which includes a
>> > task on developing/enhancing tools and standards for data sharing with a
>> > particular focus on the IPT for publishing sample-based data. So, we were
>> > constrained by the need to publish sample-based data sets in the Darwin
>> > Core Archive format and to demonstrate practical application using a
>> > working prototype. When the discussion on the TDWG list faded out, we took
>> > it to our EU BON partners whose requirements were essential input to
>> > further development. We recognise that these discussions took place away
>> > from TDWG (although the TDWG/EU BON contributors overlapped) and this is
>> > the reason we are presenting the outcomes here for further consideration.
>> >
>> >
>> >
>> > **Event core**
>> >
>> > As the SIGS report indicated, sample data can be modelled in Darwin Core
>> > Archives using either Occurrence or Event as core. This was the starting
>> > point for our evaluation but as things progressed the data wrangling pushed
>> > the model back towards the Event core. We actually went through the
>> > exercise of mapping multiple test datasets in an iterative process spanning
>> > several months' work. In the end, we found that using an Event core better
>> > matched the typical sample data we were dealing with, allowing a
>> > measurement-or-fact extension to be included for the efficient expression
>> > of environmental information associated with the event. The choice comes
>> > down to an Occurrence core or an Event core + Occurrence extension. In both
>> > cases, the true observation records are Occurrences. The big difference is
>> > what type the core has and therefore to which kind of records you can
>> > attach further facts and extra information with DwC-A extensions. Many
>> > sampling datasets have very rich information about the site and event, so
>> > it is very natural to hang facts from an Event core. When picking the
>> > Occurrence core those facts would have to be repeated for each and every
>> > occurrence record. Moreover, our approach doesn't stop anyone from using
>> > the Occurrence core if they so wish. This just provides a different option
>> > for datasets that better fit an Event core model.
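The following hypothetical Python sketch shows the repetition described above. It flattens one Event core record and its Occurrence extension rows into Occurrence-core-style rows. The event fields and values are placeholders. The point is only that every event-level value gets copied into every occurrence row.

```python
# Hypothetical Event core record carrying site- and event-level facts.
event = {"eventID": "E1", "eventDate": "2014-06-01",
         "habitat": "salt marsh", "samplingProtocol": "quadrat"}

# Hypothetical Occurrence extension rows belonging to that event.
occurrences = [
    {"occurrenceID": "E1-1", "scientificName": "Species A", "quantity": 9},
    {"occurrenceID": "E1-2", "scientificName": "Species B", "quantity": 4},
]

def flatten_to_occurrence_core(event, occurrences):
    """Copy every event field into each occurrence, as an Occurrence core would need."""
    return [{**event, **occ} for occ in occurrences]

flat = flatten_to_occurrence_core(event, occurrences)
repeated = len(event) * len(flat)  # event values written once per occurrence row
print(len(flat), repeated)  # 2 8
```

With an Event core, those four event values are stored once. The repeated count grows linearly with the number of species recorded per event.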
>> >
>> >
>> >
>> > I want to stress that we are not building a "specific IPT version" to
>> > support an Event core but, rather, we adapted the IPT so that it can be
>> > configured to support any generic "core + extension" format to enable its
>> > use for exploration of more data formats. This is part of the core
>> > codebase and there were no custom forks of the IPT for this work. Our view
>> > at GBIF is that if there are significant numbers of data publishers who are
>> > keen to adopt, promote and use a (any) format, and the tools can be
>> > configured to do so, then we should support it, and, if necessary, use a
>> > custom namespace.
>> >
>> >
>> >
>> > **New terms around abundance**
>> >
>> > Yes, the discussion on TDWG did fade out but it was clear that the term
>> > "abundance" as recommended by the SIGS report (along with
>> > abundanceAsPercent) was confusing many when we were looking for term(s)
>> > that reported quantitative measures of organisms in a sample. It also
>> > became clear we would need to be able to state the type of quantity being
>> > measured. An alternative suggestion for using the MeasurementsOrFact class
>> > was immediately shot down.
>> >
>> > As some of our main use cases were coming from the EU BON project,
>> > discussion shifted to that forum and consensus formed about the currently
>> > proposed terms. It was within this group that the additional terms
>> > (samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we
>> > began testing with sample data sets.
>> >
>> >
>> >
>> > Best regards,
>> >
>> > Éamonn
>> >
>> >
>> >
>> >
>> >
>> > *From:* robgur@gmail.com [mailto:robgur@gmail.com] *On Behalf Of *Robert Guralnick
>> > *Sent:* 19 August 2014 16:56
>> > *To:* Éamonn Ó Tuama [GBIF]
>> > *Cc:* TDWG Content Mailing List
>> > *Subject:* Re: [tdwg-content] Darwin Core: proposed news terms for
>> > expressing sample data
>> >
>> >
>> >
>> >
>> >
>> > Hi Éamonn --- I am curious about the outcomes presented in the SIGS
>> > paper, in particular, this portion of the paper:
>> >
>> >
>> >
>> > "Solutions without introducing an event core in Darwin Core Archives:
>> > During the review of the solutions for the uses cases, it became apparent
>> > that either model could be applied to every use case. The core and
>> > extensions bore a complementary relationship and between them could express
>> > all the required information. The core simply provided the central anchor
>> > in the star schema from which to join the additional information.
>> > Therefore, using the Occurrence core, well established in the GBIF network
>> > through uptake of the IPT, seemed more appropriate than inventing
>> > CollectingEvent as an additional core type."
>> >
>> >
>> >
>> > That SIGS paper has both John Wieczorek and you as authors, along with
>> > many luminaries across the biodiversity standards spectrum. Given the
>> > above, it's curious to see the EventCore come back again, along with a
>> > specific IPT version to support it.
>> >
>> >
>> >
>> > So I see two issues, conflated, in this post you just made. One is
>> > the need for an EventCore at all, and the nature of relating Event and
>> > Occurrence/Material Sample. The second is the introduction of new terms,
>> > which seemingly have arrived after debate on similar terms - but framed
>> > around abundance - stalled a year ago. To my mind, these both require some
>> > further discussion, because I don't (necessarily) see TDWG community
>> > coherence around either issue?
>> >
>> >
>> >
>> > Best, Rob
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF] <eotuama@gbif.org>
>> > wrote:
>> >
>> > Dear All,
>> >
>> >
>> >
>> > GBIF is committed to exploring ways in which the IPT and Darwin Core
>> > Archive format can be extended for publishing sample-based data sets. In
>> > association with the EU BON project [1], a customised version of the IPT
>> > [2] has been deployed to test this using a special type of Darwin Core
>> > Archive in which the core is an "Event" with associated taxon occurrences
>> > in an "Occurrence" extension.
>> >
>> >
>> >
>> > The Darwin Core vocabulary already provides a rich set of terms, many of
>> > them relevant for describing sample-based data. Synthesising several
>> > sources of input (a GBIF-organised workshop on sample data, May 2013 [3];
>> > discussions on the TDWG mailing list in late 2013; internal discussion
>> > among EU BON project partners), five new terms relating to sample data
>> > were identified as essential. The complete model including these new terms
>> > is fully described with examples in the online document "Publishing sample
>> > data using the GBIF IPT" [4].
>> >
>> >
>> >
>> > As a first step towards ratification, we would like to register the new
>> > terms in the DwC Google Code tracker [5] if there are no major objections
>> > on this list. The five terms are:
>> >
>> >
>> >
>> > 1. *quantity*: the number or enumeration value of the quantityType
>> > (e.g., individuals, biomass, biovolume, BraunBlanquetScale) per
>> > samplingUnit or a percentage measure recorded for the sample.
>> >
>> >
>> >
>> > 2. *quantityType*: the entity being referred to by quantity,
>> > e.g., individuals, biomass, %species, scale type.
>> >
>> >
>> >
>> > 3. *samplingGeometry*: an indication of what kind of space was
>> > sampled; select from point, line, area or volume.
>> >
>> >
>> >
>> > 4. *samplingUnit*: the unit of measurement used for reporting the
>> > quantity in the sample, e.g., minute, hour, day, metre, metre^2, metre^3.
>> > It is combined with quantity and quantityType to provide the complete
>> > measurement, e.g., 9 individuals per day, 4 biomass-gm per metre^2.
>> >
>> >
>> >
>> > 5. *eventSeriesID*: an identifier for a set of events that are
>> > associated in some way, e.g., a monitoring series; may be a globally
>> > unique identifier or an identifier specific to the series.
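The samplingUnit definition above says it combines with quantity and quantityType to give one complete measurement. Here is a minimal Python sketch of that composition. It assumes numeric quantities and free-text quantityType and samplingUnit values. `describe_quantity` is a hypothetical helper, and percentage or scale-type quantities are not handled.

```python
from typing import Optional

def describe_quantity(quantity: Optional[float], quantity_type: Optional[str],
                      sampling_unit: Optional[str]) -> Optional[str]:
    """Combine quantity, quantityType and samplingUnit into one readable measure.

    Returns None if any of the three is missing, since the figure cannot be
    interpreted on its own.
    """
    if quantity is None or not quantity_type or not sampling_unit:
        return None
    # Print whole numbers without a trailing ".0".
    value = int(quantity) if float(quantity).is_integer() else quantity
    return f"{value} {quantity_type} per {sampling_unit}"

print(describe_quantity(9, "individuals", "day"))     # 9 individuals per day
print(describe_quantity(4, "biomass-gm", "metre^2"))  # 4 biomass-gm per metre^2
print(describe_quantity(9, "individuals", None))      # None
```

Returning None rather than a partial string reflects the point that a quantity without its type and unit cannot be read correctly.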
>> >
>> >
>> >
>> >
>> >
>> > Best regards,
>> >
>> >
>> >
>> > Éamonn
>> >
>> >
>> >
>> > [1] http://eubon.eu
>> >
>> > [2] http://eubon-ipt.gbif.org
>> >
>> > [3] http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
>> >
>> > [4] http://links.gbif.org/sample_data_model
>> >
>> > [5] https://code.google.com/p/darwincore/issues/list
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > ____________________________________________________
>> >
>> > *Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama@gbif.org), *
>> >
>> > *Senior Programme Officer for Interoperability, *
>> >
>> > *Global Biodiversity Information Facility Secretariat, *
>> >
>> > *Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK*
>> >
>> > *Phone: +45 3532 1494; Fax: +45 3532 1480*
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > tdwg-content mailing list
>> > tdwg-content@lists.tdwg.org
>> > http://lists.tdwg.org/mailman/listinfo/tdwg-content
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Anne E. Thessen, Ph.D.
>> >
>> > The Data Detektiv, Owner and Founder
>> >
>> > Ronin Institute, Research Scholar
>> >
>> > 443.225.9185
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Fri, 22 Aug 2014 08:54:07 +0200
>> From: "Donald Hobern [GBIF]" <dhobern@gbif.org>
>> Subject: Re: [tdwg-content] Darwin Core: proposed news terms for
>> expressing sample data
>> To: "'Matt Jones'" <jones@nceas.ucsb.edu>, 'Éamonn Ó Tuama [GBIF]'
>> <eotuama@gbif.org>
>> Cc: 'TDWG Content Mailing List' <tdwg-content@lists.tdwg.org>
>> Message-ID: <003401cfbdd5$de52b640$9af822c0$@gbif.org>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi Matt,
>>
>>
>>
>> I'll take the chance to make a few quick comments here, because I believe that this work is of massive importance.
>>
>>
>>
>> Clearly DwC should avoid trying to duplicate well-standardised models and protocols. However, at the same time, there is enormous value for producers and consumers of DwC in benefiting from richer data on the events and methods associated with individual species occurrences. I have never seen DwC as purely an Occurrence exchange syntax. I see it (from GBIF's standpoint) more closely as a mechanism for diverse parties to pool the evidence they have for the occurrence of any species, including associated information and/or actionable links to associated information. Users coming from this perspective certainly need (and are demanding) access to all the evidence that can be mobilized, and they also need the ability to understand the significance of these records. Abundance measures, levels of effort, use of consistent methods and redetection of individual organisms are all part of this. DwC should be able to transmit as much data as publishers choose to share on such aspects as part of their publishing of DwC. Users of DwC carrying out species modeling, threat assessment or community analyses will benefit from rapid ways to filter data for those which derive from standardized sampling events, to understand relative abundance within samples, etc. Many publishers of DwC are currently sharing stripped-down subsets of data and wish to give more information on these points. Users are certainly demanding it.
>>
>>
>>
>> The challenge is finding the sweet spot, the achievable, non-destructive overlap between DwC and the proper domain of models better designed to handle the representation of complex systems outside DwC's current domain. If this is done correctly, there should be paths that enable us to generate O&M (and maybe OBOE) compatible data from data that publishers only serve as augmented DwC.
>>
>>
>>
>> I'll also note that this has been a prominent area of discussion now for several years. Many of us believe strongly that this is one of the most important ways in which we need to close arbitrary gaps between data silos. It's a prominent part of the GBIF work programme for 2014-2016.
>>
>>
>>
>> Very best wishes,
>>
>>
>>
>> Donald
>>
>>
>>
>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Matt Jones
>> Sent: Friday, August 22, 2014 4:52 AM
>> To: Éamonn Ó Tuama [GBIF]
>> Cc: TDWG Content Mailing List
>> Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
>>
>>
>>
>> This proposal is treading on ground that is quite similar to other observations and measurements standards for data exchange that are already mature, in particular:
>>
>>
>>
>> * OGC Observations and Measurements (http://www.opengeospatial.org/standards/om)
>>
>> * Extensible Observation Ontology (OBOE; https://semtools.ecoinformatics.org/oboe)
>>
>>
>>
>> The former is a standard and broadly deployed, whereas the latter is part of a research program in the use of ontologies for measurements. Through collaboration between the two projects, they've been modified to be reasonably isomorphic, but O&M uses an XML serialization while OBOE uses an OWL-DL serialization. They largely express the same measurements and sampling model once one gets beyond the terminology differences.
>>
>>
>>
>> So, I'm wondering if it makes much sense to extend Darwin Core, which is at heart an Occurrence exchange syntax, into this measurements area that is well represented by these other existing specifications? I'm curious to hear why people would even want to do this. And if we do go down this path, won't we just end up with a new syntax that does essentially what O&M and OBOE do now?
>>
>>
>>
>> Matt
>>
>>
>>
>>
>>
>> On Thu, Aug 21, 2014 at 12:22 AM, Éamonn Ó Tuama [GBIF] <eotuama@gbif.org> wrote:
>>
>> Hi Rob, Anne, Rich,
>>
>>
>>
>> I think Markus has answered your question as to why we opted for an Event core which is being used in the sense described by Anne and Rich. For any event, you can have a list of species in an Occurrence extension and for each species, you can include quantity and quantityType, e.g., biomass, etc. The proposed term eventSeriesID was intended for linking together related events, although it now looks like parentEventID might be a better, more flexible term. The measurementOrFact extension is a good fit for capturing environmental information relating to an event. See, e.g., the Gialova Lagoon brackish water invertebrate test data set [1] where a set of 18 environmental variables, including temp, pH, Rdx, particulate organic matter, dissolved oxygen, salinity and chlorophyll-a, were measured for each sampling station-sampling period combination. An example mapping is:
>>
>>
>>
>> Id | measurementType | measurementValue | measurementUnit | measurementRemarks
>> IA | Tmp (sed)       | 21.5             | degree C        | Tmp (sed): temperature at the bottom surface
>>
>>
>>
>> *Controlled vocabularies*
>>
>> Ideally, the values for samplingUnit and quantityType would be selected from controlled vocabularies. This is, effectively, what we do by presenting a small list of values in a drop-down menu. The current values are what we derived from example data sets and discussion but they can undoubtedly be extended and improved.
>>
>>
>>
>> We capture "bucket" type measures through a combination of samplingEffort, samplingGeometry and samplingUnit. For example, a pitfall trap (in a point location) left out for 16 days might have samplingEffort: 16, samplingGeometry: point and samplingUnit: day. Three m^2 quadrats in a shore survey might have samplingEffort: 3, samplingGeometry: area and samplingUnit: m^2.
>>
>>
>>
>> It would be very useful to see your compilation of scope, effort and completeness measures to see if we can express them in our model and/or if we need to reconsider our approach.
>>
>>
>>
>> Éamonn
>>
>>
>>
>> [1] http://eubon-ipt.gbif.org/resource.do?r=ionian-brackish-lagoon
>>
>>
>>
>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Markus Döring
>> Sent: 20 August 2014 23:47
>> To: Robert Guralnick
>>
>>
>> Cc: TDWG Content Mailing List
>> Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
>>
>>
>>
>> Rob,
>>
>>
>>
>> this proposal is for monitoring surveys really, not to be confused with material samples like environmental or tissue samples which have a distinct new dwc class MaterialSample.
>>
>>
>>
>> We tend to overload the term sampling a lot and it helps to treat material samples differently from pure observational "sampling". That is why the existing Event class was used as the core and classic Occurrence records as extensions. A classic example is a vegetation survey where each plot represents an Event record and each recorded species in that plot will be an Occurrence extension record with a given quantity. Darwin Core already offers individualCount to specify quantity, but it is a very specific way of measuring "abundance" restricted to only some use cases. Abiotic measurements about the plot (e.g. soil type, pH, temperature) can be published using the measurements or facts extension linked to the Event core.
>>
>>
>>
>> Markus
>>
>>
>>
>>
>>
>>
>>
>> On 20 Aug 2014, at 20:08, Robert Guralnick <Robert.Guralnick@colorado.edu> wrote:
>>
>>
>>
>>
>>
>> Anne -- I don't know the answers! These are questions for Eamonn. I would presume that a sample could be a jumble of species or even just water or soil samples, and biomass would refer to that sample - but maybe that isn't a use case being considered? The examples given in the longer document all link an event_id to species name and some measure of quantity for that species (to the species, not an individual specimen), so I assume that is the prevailing (or only) case?
>>
>> Best, Rob
>>
>>
>>
>>
>>
>> On Wed, Aug 20, 2014 at 11:56 AM, Anne Thessen <annethessen@gmail.com> wrote:
>>
>> Hi Rob
>> I would like to respond to your item number 2.
>> From my perspective, I deal with lots of published descriptions of taxa. The text might say something like "I saw species A in the Chesapeake Bay, the Adriatic Sea and the Indian Ocean and the biomass is 5 - 9 grams". The biomass range obviously corresponds to at least three different occurrences, but how to divide the biomass data? I would love to be able to have an *event* to attach it all to. There are almost two different levels of events - a sampling event and a "study event". The "study event" would correspond to the type of event I would like to use in the above example. It may not be ideal, but for the old literature that might be the best we can do.
>> I have to admit that I don't know enough about trawl data to understand why an event core would be a problem. It seems that the trawl would be an event and each biomass measure (of each fish) would be attached to a separate occurrence which is attached to that event. Am I understanding this wrong?
>> btw - I found a workaround for the example I gave, so it's not impossible to model with the current structure....
>> Anne
>>
>>
>>
>> On 8/20/2014 1:16 PM, Robert Guralnick wrote:
>>
>>
>>
>> Éamonn et al. --- Thanks for the clarifications. I think these help a ton but they raise a couple more questions for me.
>>
>>
>>
>> 1) I am surprised that you plan to use the MeasurementOrFact extension in relation to the Event core, which seems like a novel (or perhaps awkward or unintended?) mechanism for capturing environmental data, but the same extension was not seen as relevant for describing samples? Can you explain more about the thinking there?
>>
>>
>>
>> 2) There may be a subtle issue here extending "Event" to be more what you call a "Sampling Event Core". My read of this is that Darwin Core serves as a way to deal with point occurrences and Event reflects the context of a single capture event (whether a single observation, or a bulk sample capture). The changes recommended seem to dramatically extend and change that meaning? It's simply a question that I don't have an answer to, but is Darwin Core the right vehicle to start capturing repeated measures of biomass values from trawls? I don't have an answer but man, terms like quantityType (as a property of occurrence?) give me pause.
>>
>>
>>
>> 3) Is Sampling Unit a controlled vocabulary? For another project, I have looked through - and captured scope, effort and completeness measures from - a large number of published biotic area inventories. The vast majority of these are measured in units like bucket hours, or trap nights. Is a "bucket" part of SamplingGeometry or Sampling Unit? I'd be happy to send along all the many examples of how biotic inventories of an area are completed and perhaps it might be good to see how those might be represented using the terms you are proposing?
>>
>>
>>
>> Best, Rob
>>
>> On Wed, Aug 20, 2014 at 10:16 AM, Richard Pyle
<deepreef@bishopmuseum.org> wrote:
>>
>> Same here - Events are central to the work that we do.
>>
>>
>>
>> Aloha,
>>
>> Rich
>>
>>
>>
>> From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Anne Thessen
>> Sent: Wednesday, August 20, 2014 2:59 AM
>> To: tdwg-content@lists.tdwg.org
>> Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
>>
>>
>>
>> Hello
>> I would just like to comment on *event core*.
>> I've been doing a lot of work translating published data into Darwin
Core. During that process I've wished several times that I could use Event
as core. I am happy to hear about that proposed change. It will make it
easier to model the data I am working with.
>> Anne
>>
>> On 8/20/2014 7:04 AM, Éamonn Ó Tuama [GBIF] wrote:
>>
>> Hi Rob,
>>
>>
>>
>> Thank you for the feedback. Below, I have tried to address the two main issues you raise. At the outset, I would like to emphasise that much of this work is taking place in the context of the EU BON project, which includes a task on developing/enhancing tools and standards for data sharing, with a particular focus on the IPT for publishing sample-based data. So, we were constrained by the need to publish sample-based data sets in the Darwin Core Archive format and to demonstrate practical application using a working prototype. When the discussion on the TDWG list faded out, we took it to our EU BON partners, whose requirements were essential input to further development. We recognise that these discussions took place away from TDWG (although the TDWG and EU BON contributors overlapped), and this is the reason we are presenting the outcomes here for further consideration.
>>
>>
>>
>> *Event core*
>>
>> As the SIGS report indicated, sample data can be modelled in Darwin Core Archives using either Occurrence or Event as core. This was the starting point for our evaluation, but as things progressed, the data wrangling pushed the model back towards the Event core. We went through the exercise of mapping multiple test datasets in an iterative process spanning several months' work. In the end, we found that an Event core better matched the typical sample data we were dealing with, and it allowed a measurement-or-fact extension to be attached for the efficient expression of environmental information associated with the event. The choice comes down to an Occurrence core or an Event core + Occurrence extension. In both cases, the true observation records are Occurrences. The big difference is which type of record sits in the core, and therefore to which kind of records you can attach further facts and extra information with DwC-A extensions. Many sampling datasets have very rich information about the site and event, so it is very natural to hang facts from an Event core. With the Occurrence core, those facts would have to be repeated for each and every occurrence record. Moreover, our approach doesn't stop anyone from using the Occurrence core if they so wish; it just provides a different option for datasets that better fit an Event core model.
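A minimal sketch of the trade-off described above, assuming simplified Python records rather than the actual DwC-A text files: an Event holds its event-level facts once, and flattening it into Occurrence-core rows copies those facts onto every occurrence. The helper flatten_to_occurrence_core, the fact names waterTemperature_C and salinity_psu, and the example haul are hypothetical; the remaining keys reuse Darwin Core term names.

```python
from dataclasses import dataclass, field

# Simplified, hypothetical records. Keys such as eventID, eventDate,
# samplingProtocol, occurrenceID and scientificName are Darwin Core term
# names; "facts" stands in for MeasurementOrFact rows attached to the event.


@dataclass
class Occurrence:
    occurrenceID: str
    scientificName: str


@dataclass
class Event:
    eventID: str
    eventDate: str
    samplingProtocol: str
    facts: dict = field(default_factory=dict)        # stored once per event
    occurrences: list = field(default_factory=list)  # Occurrence extension rows


def flatten_to_occurrence_core(event: Event) -> list:
    """Hypothetical helper: denormalise one event into Occurrence-core rows.

    Every event-level value, including each fact, is copied onto every
    occurrence row; this is the repetition an Event core avoids.
    """
    rows = []
    for occ in event.occurrences:
        row = {
            "occurrenceID": occ.occurrenceID,
            "scientificName": occ.scientificName,
            "eventID": event.eventID,
            "eventDate": event.eventDate,
            "samplingProtocol": event.samplingProtocol,
        }
        row.update(event.facts)  # event facts repeated on each row
        rows.append(row)
    return rows


if __name__ == "__main__":
    haul = Event(
        eventID="ev-1",
        eventDate="2014-08-20",
        samplingProtocol="bottom trawl",
        facts={"waterTemperature_C": 11.5, "salinity_psu": 34.9},  # hypothetical
        occurrences=[
            Occurrence("occ-1", "Gadus morhua"),
            Occurrence("occ-2", "Pleuronectes platessa"),
            Occurrence("occ-3", "Merlangius merlangus"),
        ],
    )
    rows = flatten_to_occurrence_core(haul)
    stored_once = len(haul.facts)
    stored_flat = sum(key in row for row in rows for key in haul.facts)
    print(stored_once, stored_flat)  # 2 6
```

With three occurrences and two event facts, the flattened form stores six fact values instead of two, and the gap grows with every additional occurrence in the event.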
>>
>>
>>
>> I want to stress that we are not building a "specific IPT version" to support an Event core. Rather, we adapted the IPT so that it can be configured to support any generic "core + extension" format, enabling its use for exploring more data formats. This is part of the core codebase, and there were no custom forks of the IPT for this work. Our view at GBIF is that if significant numbers of data publishers are keen to adopt, promote and use a (any) format, and the tools can be configured to do so, then we should support it and, if necessary, use a custom namespace.
>>
>>
>>
>> *New terms around abundance*
>>
>> Yes, the discussion on TDWG did fade out, but it was clear that the term "abundance" as recommended by the SIGS report (along with abundanceAsPercent) confused many people when we were looking for term(s) to report quantitative measures of organisms in a sample. It also became clear that we would need to be able to state the type of quantity being measured. An alternative suggestion to use the MeasurementOrFact class was immediately shot down.
>>
>> As some of our main use cases were coming from the EU BON project,
discussion shifted to that forum and consensus formed about the currently
proposed terms. It was within this group that the additional terms
(samplingGeometry, samplingUnit, eventSeriesID) were proposed and where we
began testing with sample data sets.
>>
>>
>>
>> Best regards,
>>
>> Éamonn
>>
>>
>>
>>
>>
>> From: robgur@gmail.com [mailto:robgur@gmail.com] On Behalf Of Robert
Guralnick
>> Sent: 19 August 2014 16:56
>> To: Éamonn Ó Tuama [GBIF]
>> Cc: TDWG Content Mailing List
>> Subject: Re: [tdwg-content] Darwin Core: proposed news terms for
expressing sample data
>>
>>
>>
>>
>>
>> Hi Éamonn --- I am curious about the outcomes presented in the SIGS paper, in particular this portion of the paper:
>>
>>
>>
>> "Solutions without introducing an event core in Darwin Core Archives:
During the review of the solutions for the uses cases, it became apparent
that either model could be applied to every use case. The core and
extensions bore a complementary relationship and between them could express
all the required information. The core simply provided the central anchor in
the star schema from which to join the additional information. Therefore,
using the Occurrence core, well established in the GBIF network through
uptake of the IPT, seemed more appropriate than inventing CollectingEvent as
an additional core type."
>>
>>
>>
>> That SIGS paper lists both John Wieczorek and you as authors, along with many luminaries across the biodiversity standards spectrum. Given the above, it's curious to see the EventCore come back again, along with a specific IPT version to support it.
>>
>>
>>
>> So I see two issues conflated in the post you just made. One is the need for an EventCore at all, and the nature of the relationship between Event and Occurrence/Material Sample. The second is the introduction of new terms, which seem to have arrived after debate on similar terms (framed around abundance) stalled a year ago. To my mind, both require further discussion, because I don't (necessarily) see TDWG community coherence around either issue.
>>
>>
>>
>> Best, Rob
>>
>>
>>
>>
>>
>> On Tue, Aug 19, 2014 at 6:11 AM, Éamonn Ó Tuama [GBIF] <eotuama@gbif.org> wrote:
>>
>> Dear All,
>>
>>
>>
>> GBIF is committed to exploring ways in which the IPT and the Darwin Core Archive format can be extended for publishing sample-based data sets. In association with the EU BON project [1], a customised version of the IPT [2] has been deployed to test this, using a special type of Darwin Core Archive in which the core is an "Event" with associated taxon occurrences in an "Occurrence" extension.
>>
>>
>>
>> The Darwin Core vocabulary already provides a rich set of terms, many of them relevant for describing sample-based data. Synthesising several sources of input (a GBIF-organised workshop on sample data, May 2013 [3]; discussions on the TDWG mailing list in late 2013; internal discussion among EU BON project partners), we identified five new terms relating to sample data as essential. The complete model, including these new terms, is fully described with examples in the online document "Publishing sample data using the GBIF IPT" [4].
>>
>>
>>
>> As a first step towards ratification, we would like to register the new
terms in the DwC Google Code tracker [5] if there are no major objections on
this list. The five terms are:
>>
>>
>>
>> 1. quantity: the number or enumeration value of the quantityType
(e.g., individuals, biomass, biovolume, BraunBlanquetScale) per samplingUnit
or a percentage measure recorded for the sample.
>>
>>
>>
>> 2. quantityType: the entity being referred to by quantity, e.g., individuals, biomass, %species, scale type.
>>
>>
>>
>> 3. samplingGeometry: an indication of what kind of space was
sampled; select from point, line, area or volume.
>>
>>
>>
>> 4. samplingUnit: the unit of measurement used for reporting the
quantity in the sample, e.g., minute, hour, day, metre, metre^2, metre^3.
It is combined with quantity and quantityType to provide the complete
measurement, e.g., 9 individuals per day, 4 biomass-gm per metre^2.
>>
>>
>>
>> 5. eventSeriesID: an identifier for a set of events that are associated in some way, e.g., a monitoring series; may be a globally unique identifier or an identifier specific to the series.
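The definitions above make quantity, quantityType and samplingUnit meaningful only together (e.g., 9 individuals per day). A minimal sketch of the kind of completeness check a publishing tool might run, assuming records are plain Python dicts keyed by the proposed term names; describe_quantity and check_geometry are hypothetical helpers, and the terms were still only proposed at this point.

```python
from typing import Optional

# Term names and samplingGeometry values follow the proposal above;
# they are not taken from a ratified Darwin Core release.
SAMPLING_GEOMETRIES = {"point", "line", "area", "volume"}


def describe_quantity(record: dict) -> Optional[str]:
    """Hypothetical helper: return a readable measure such as
    '9 individuals per day', or None when quantity, quantityType or
    samplingUnit is missing, since the figure cannot be interpreted
    without all three."""
    parts = [record.get(k) for k in ("quantity", "quantityType", "samplingUnit")]
    if any(p in (None, "") for p in parts):
        return None
    quantity, quantity_type, unit = parts
    return f"{quantity} {quantity_type} per {unit}"


def check_geometry(record: dict) -> bool:
    """Hypothetical helper: True if samplingGeometry is absent or is one
    of the values listed in the proposal."""
    geometry = record.get("samplingGeometry")
    return geometry is None or geometry in SAMPLING_GEOMETRIES


if __name__ == "__main__":
    complete = {"quantity": 9, "quantityType": "individuals",
                "samplingUnit": "day", "samplingGeometry": "point"}
    partial = {"quantity": 4, "quantityType": "biomass-gm"}  # no samplingUnit
    print(describe_quantity(complete))  # 9 individuals per day
    print(describe_quantity(partial))   # None
    print(check_geometry(complete))     # True
```

A record that fails the check still documents an occurrence at the event; only the quantitative figure is left uninterpreted.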
>>
>>
>>
>>
>>
>> Best regards,
>>
>>
>>
>> Éamonn
>>
>>
>>
>> [1] http://eubon.eu
>>
>> [2] http://eubon-ipt.gbif.org
>>
>> [3] http://www.standardsingenomics.org/index.php/sigen/article/view/sigs.4898640
>>
>> [4] http://links.gbif.org/sample_data_model
>>
>> [5] https://code.google.com/p/darwincore/issues/list
>>
>>
>>
>>
>>
>>
>>
>> ____________________________________________________
>>
>> Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama@gbif.org),
>>
>> Senior Programme Officer for Interoperability,
>>
>> Global Biodiversity Information Facility Secretariat,
>>
>> Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK
>>
>> Phone: +45 3532 1494; Fax: +45 3532 1480
>>
>>
>>
>>
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content@lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>
>> --
>> Anne E. Thessen, Ph.D.
>> The Data Detektiv, Owner and Founder
>> Ronin Institute, Research Scholar
>> 443.225.9185
>>
>>
>>
>>
>>
>> End of tdwg-content Digest, Vol 63, Issue 6
>> *******************************************
>>
--
John Deck
(541) 321-0689
------------------------------
End of tdwg-content Digest, Vol 63, Issue 14
********************************************