Re: [tdwg-content] tdwg-content Digest, Vol 63, Issue 14

3 Sep 2014

Hi Éamonn,

Thanks for the feedback. You are quite right that the criticism of
ambiguous interpretation of terms extends to many DwC terms. I was not
trying to single out the proposed terms in this regard, merely to show how
the ambiguity that exists across many DwC terms could also plague the
proposed ones. As you indicate, good documentation - including definitions
and instructions that are as explicit as possible - is key to ensuring that
the terms you suggest are used properly. In my opinion, there is no shame in
very long definitions - better to say exactly what you mean than worry
about brevity!

I look forward to discussing this with you in person at TDWG.

Ramona

------------------------------------------------------
Ramona L. Walls, Ph.D.
Scientific Analyst, The iPlant Collaborative, University of Arizona
Research Associate, Bio5 Institute, University of Arizona
Laboratory Research Associate, New York Botanical Garden

On Wed, Sep 3, 2014 at 7:44 AM, Éamonn Ó Tuama [GBIF] <eotuama@gbif.org>
wrote:
...
Hi Ramona,
The idea of applying domains and ranges to the proposed properties was
never entertained because, as you point out, the absence of these is a
feature designed into DwC to make it maximally re-usable. Given the
(deliberate) lack of semantic rigour in Darwin Core, the question comes
down to how well suited it is for use in Darwin Core Archives as an
exchange format for what we have been referring to as "sample-based" data.
I would like to re-emphasise that in doing this we are not trying to
establish how data should be captured or modeled, only one way that they
should be exposed to maximize discoverability and reuse, whether that be
only a subset/view of some aspects of a data set. We are most definitely
not trying to shoe-horn the complexity of the OGC O&M / OBOE model on to
DwC.
We had already envisaged use of some additional flag like "sample" to
indicate the nature of the evidence for the data but that will require
addressing the issue recognised by TDWG in Florence of the need to replace
basisOfRecord with some new hierarchical vocabulary for evidence(Type) as
this is what we mostly (mis)use basisOfRecord for. That was an omission
from our description - so thanks for highlighting it.
Your examples of how the vagueness of domain and range values for the
proposed terms can lead to ambiguities in interpretation are valid from a
strict ontological approach but can probably be levelled at all DwC
properties, e.g., DwC itself does not enforce an obligatory pairing of lat
and long values so if one is missing, then you are left without a location.
So, yes, quantity and quantityType need to co-occur for interpretation as
do samplingUnit and samplingEffort, and both pairings need to be present in
order to interpret the figures correctly. If any of the pairings are
incomplete, at most you will be able to say there was an occurrence of
taxon X at the event location and it was recorded as part of a sampling
event that used a particular protocol. The term "quantity" is not about a
count of toads in a museum jar, rather it refers to the number of
occurrences per samplingUnit, in this case, e.g., 9 toads were recorded per
m^2 (the samplingUnit). In the end, our approach will only work with good
documentation and guidance, by making the IPT as user-friendly as possible,
possibly doing completeness checks, etc.
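As a rough illustration of the kind of completeness check mentioned above,
here is a minimal Python sketch. It is not part of the IPT or of DwC: the
function name, the dict-based record layout and the example values are all
assumptions made for illustration. The term pairings are the ones discussed
in this message, plus decimalLatitude/decimalLongitude for the lat/long case.

```python
# Hypothetical sketch only: not an IPT or Darwin Core API.
# The pairings follow the discussion above; the record layout is assumed.
REQUIRED_PAIRS = [
    ("quantity", "quantityType"),
    ("samplingUnit", "samplingEffort"),
    ("decimalLatitude", "decimalLongitude"),
]


def incomplete_pairs(record):
    """Return the pairs in which exactly one of the two terms has a value."""
    def has_value(term):
        value = record.get(term)
        return value is not None and str(value).strip() != ""

    return [
        (a, b) for a, b in REQUIRED_PAIRS
        if has_value(a) != has_value(b)
    ]


# Made-up example record: samplingEffort is missing, so the
# samplingUnit/samplingEffort pairing is incomplete.
record = {
    "scientificName": "Bufo bufo",
    "quantity": "9",
    "quantityType": "individuals",
    "samplingUnit": "m^2",
}
print(incomplete_pairs(record))
```

A record flagged this way could still be published as a plain occurrence at
the event location; only the quantitative interpretation would be withheld.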
I'm still looking to the BCO to provide a comprehensive framework for
observations/samples. But, as you point out, it's not there yet.
Unfortunately, your suggestion of using EML as an exchange format for
sample data does not meet our needs. We already include EML for general
metadata but the whole point of DwC-A is that we can map to some
standardised vocabulary, particularly in this case, for basic quantitative
information around a data set associated with a particular sampling
protocol.
This discussion has been very useful and has prompted us to review our
proposal to ensure we apply the most minimally disruptive solution given
the needs of the GBIF/EU BON community.
Éamonn
*From:* tdwg-content-bounces@lists.tdwg.org [mailto:
tdwg-content-bounces@lists.tdwg.org] *On Behalf Of *Ramona Walls
*Sent:* 02 September 2014 05:39
*To:* TDWG Content Mailing List
*Subject:* Re: [tdwg-content] tdwg-content Digest, Vol 63, Issue 14
Thank you, Simon, for that explanation and the links. They were very
helpful. Amen to the point: "There is no 'sample' class, because it is such
an overloaded word (noun, verb, statistical sample vs ex-situ sample,
etc)." The documents you shared highlight the very important point that OGC
and OBO-E were designed specifically to describe observations.
Darwin Core, on the other hand, was designed to capture information about
"taxa, their occurrence in nature as documented by observations, specimens,
samples, and related information" [1]. As such, observations are not
central to Darwin Core, but rather are included as evidence of the
occurrence of a taxon in nature. It works for communicating basic
information about an observation or other evidence of a taxon's occurrence,
but I think it would be mis-using and abusing DwC to try to shoe-horn the
complexity of observation data/metadata into it. It also does some
dis-service to the communities who have spent so much time developing OGC
and OBO-E.
Éamonn, this is not meant to discredit the work that you and your
colleagues have done to develop a DwC archive schema for sampling data. I
think it is an important step toward developing a comprehensive framework
for biodiversity data, and just by proposing it, we have moved a step in
the right direction (even if I disagree about adopting it). Your point that
OBO-E is far more complex is true, and we may have to adopt more terms if
we want to describe observation data accurately in DwC. On the other hand,
we do not necessarily need to adopt every aspect of OBO-E to exchange
observation data.
What the BCO participants -- and thanks to all the GBIF people who have
participated! -- are trying to do is build a framework that can work across
many (not necessarily all) types of biodiversity data, including specimen
collection and observations, while considering existing efforts such as
DwC, MIxS, OBO-E, and OBO Foundry ontologies. We started with specimens,
but the intention has always been to link to observation data as well [2].
Although the full BCO model probably will be large and complex, we fully
intend to offer views that are basically subsets of the ontology filtered
for applications. This is regular practice now in application ontologies.
Views make it possible to provide a controlled vocabulary for data
annotation without burdening annotators with a confusing array of terms and
logical definitions.
However, the point that BCO is not yet ready for your needs is correct,
and I would never tell anyone to just "hold on to your data until the
ontology is ready".  Did you examine the possibility of using EML as an
exchange format for the sampling/survey related data? DwC-A already has an
EML component, so I wonder if some combination of an occurrence core with
an extended EML document (based on OGC) would work.
Ramona
[1] http://rs.tdwg.org/dwc/index.htm
[2]
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0089606
------------------------------------------------------
Ramona L. Walls, Ph.D.
Scientific Analyst, The iPlant Collaborative, University of Arizona
Research Associate, Bio5 Institute, University of Arizona
Laboratory Research Associate, New York Botanical Garden
On Fri, Aug 29, 2014 at 3:00 AM, <tdwg-content-request@lists.tdwg.org>
wrote:
Today's Topics:
   1. Re: Darwin Core: proposed news terms for expressing sample
      data (Simon.Cox@csiro.au)
   2. Re: tdwg-content Digest, Vol 63, Issue 6 (sigh) (Éamonn)
----------------------------------------------------------------------
Message: 1
Date: Fri, 29 Aug 2014 07:58:14 +0000
From: <Simon.Cox@csiro.au>
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for
        expressing sample data
To: <tdwg-content@lists.tdwg.org>
Message-ID:
        <
2A7346E8D9F62D4CA8D78387173A054A5FFF5801@exmbx04-cdc.nexus.csiro.au>
Content-Type: text/plain; charset="utf-8"
G'day again TDWGers:
Matt passed on links to this thread to me and suggested I comment, as I
was the author of the O&M standard (published as ISO 19156:2011 and OGC
Abstract Spec Topic 20).
For those who are not aware of this work, there is a short Wikipedia page
http://en.wikipedia.org/wiki/Observations_and_Measurements whose main
value is it has links to a number of more detailed resources.
Probably the richest of these is another Wiki page at CSIRO
https://www.seegrid.csiro.au/wiki/AppSchemas/ObservationsAndSampling
which hasn't been updated much recently, but at least has some diagrams
embedded.
As Matt and others have hinted, as a result of a workshop at NCEAS a few
years ago, there were some tweaks to allow it to meet some of the
requirements identified in OBOE, just in time to beat the ISO deadline!
O&M includes a generic model for 'Sampling Features' - being those
artefacts that are created to assist the observation process, but would not
otherwise exist or be of very much interest.
Things like specimens, transects, sections, quadrats, scenes and swaths,
drillholes, flightlines, trajectories, ships tracks, etc.
Because it is a generic standard, you won't find things with names
familiar to any particular discipline, and there are a lot of stub classes
for supporting information which need filling out for specific applications.
But the intention is that it provides a framework for a discipline or
community to specialize for their purposes, while retaining some topology
and perhaps terminology (maybe just as super-classes) that help with
information sharing across discipline boundaries.
The main properties of a sampling feature are
- The sampledFeature - being the domain object which it is being
  used to characterize
- Related sampling features - other features related to the
  observational strategy
- Related observations - observation events that use this
  sampling feature (for which another generic model is provided)
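To make the three properties above a little more concrete, here is a minimal
Python sketch. It is illustrative only and is not the normative O&M
(ISO 19156) definition: the class layout, attribute names and example values
are assumptions chosen to mirror the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SamplingFeature:
    """Illustrative stand-in for a sampling feature; not the O&M schema."""
    name: str
    # The domain object this sampling feature is used to characterize.
    sampled_feature: str
    # Other sampling features related to the observational strategy.
    related_sampling_features: List["SamplingFeature"] = field(default_factory=list)
    # Observation events that use this sampling feature (plain labels here).
    related_observations: List[str] = field(default_factory=list)


# Made-up example: a quadrat placed along a transect in the same wetland.
transect = SamplingFeature(name="transect-1", sampled_feature="wetland A")
quadrat = SamplingFeature(
    name="quadrat-1",
    sampled_feature="wetland A",
    related_sampling_features=[transect],
    related_observations=["toad count, visit 1"],
)
print(quadrat.related_sampling_features[0].name)
```

In a real application the observations would be objects in their own right
rather than strings, following the separate generic observation model.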
We've generally found it helpful in teasing apart observational records
and protocols in a variety of environmental science applications, and others
have applied it in oceans, meteorology, even air-traffic control!
The primary classification of sampling features in O&M is by topological
dimension (point, curve, surface, solid), because these are commonly used
and afford common processing methods.
'Specimen' is the other concrete sampling-feature type.
There is no 'sample' class, because it is such an overloaded word (noun,
verb, statistical sample vs ex-situ sample, etc).
O&M and its Sampling Feature model were designed in UML.
As Matt notes, the original implementation in the OGC context was in
XML, using GML:
http://schemas.opengis.net/samplingSpecimen/2.0/specimen.xsd and
http://schemas.opengis.net/samplingSpatial/2.0/spatialSamplingFeature.xsd
However, it has been implemented other ways: there is an OWL2/RDFS
representation at
http://def.seegrid.csiro.au/isotc211/iso19156/2011/sampling which is
linked in with OWL versions of a bunch of the other ISO standards, and
therefore probably makes too many commitments for the faint-hearted - see
paper from ISWC 2013 here http://ceur-ws.org/Vol-1063/paper1.pdf
O&M was also one of the core inputs to the W3C Semantic Sensor Network
ontology, reported here:
http://www.w3.org/2005/Incubator/ssn/wiki/Incubator_Report though that
focussed on the sensors and observations side of the equation, and hardly
deals with sampling.
Hope this helps.
...
...
Date: Thu, 21 Aug 2014 18:52:06 -0800
From: Matt Jones <jones@nceas.ucsb.edu>
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for
        expressing sample data
To: Éamonn Ó Tuama [GBIF] <eotuama@gbif.org>
Cc: TDWG Content Mailing List <tdwg-content@lists.tdwg.org>
Message-ID:
<CAFSW8xkx7uRP9PC2g3=JT_VJanqujH8nPXoz8GXwh+JwKw5Ccw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
This proposal is treading on ground that is quite similar to other
observations and measurements standards for data exchange that are already
mature, in particular:
* OGC Observations and Measurements
  (http://www.opengeospatial.org/standards/om)
* Extensible Observation Ontology (OBOE;
  https://semtools.ecoinformatics.org/oboe)
The former is a standard and broadly deployed, whereas the latter is part
of a research program in the use of ontologies for measurements. Through
collaboration between the two projects, they've been modified to be
reasonably isomorphic, but O&M uses an XML serialization while OBOE uses an
OWL-DL serialization. They largely express the same measurements and
sampling model once one gets beyond the terminology differences.
So, I'm wondering if it makes much sense to extend Darwin Core, which is at
heart an Occurrence exchange syntax, into this measurements area that is
well represented by these other existing specifications? I'm curious to
hear why people would even want to do this. And if we do go down this
path, won't we just end up with a new syntax that does essentially what O&M
and OBOE do now?
Matt
Matt