Hi Éamonn,
Thanks for the feedback. You are quite right that the criticism of ambiguous interpretation of terms extends to many DwC terms. I was not trying to single out the proposed terms in this regard, merely to show how the ambiguity that exists across many DwC terms could also plague the proposed ones. As you indicate, good documentation - including definitions and instructions that are as explicit as possible - is key to ensuring that the terms you suggest are used properly. In my opinion, there is no shame in very long definitions - better to say exactly what you mean than worry about brevity!
I look forward to discussing this with you in person at TDWG.
Ramona
------------------------------------------------------ Ramona L. Walls, Ph.D. Scientific Analyst, The iPlant Collaborative, University of Arizona Research Associate, Bio5 Institute, University of Arizona Laboratory Research Associate, New York Botanical Garden
On Wed, Sep 3, 2014 at 7:44 AM, Éamonn Ó Tuama [GBIF] eotuama@gbif.org wrote:
Hi Ramona,
The idea of applying domains and ranges to the proposed properties was never entertained because, as you point out, the absence of these is a feature designed into DwC to make it maximally re-usable. Given the (deliberate) lack of semantic rigour in Darwin Core, the question comes down to how well suited it is for use in Darwin Core Archives as an exchange format for what we have been referring to as "sample-based" data. I would like to re-emphasise that in doing this we are not trying to establish how data should be captured or modeled, only one way that they should be exposed to maximize discoverability and reuse, whether that be only a subset/view of some aspects of a data set. We are most definitely not trying to shoe-horn the complexity of the OGC O&M / OBOE model on to DwC.
We had already envisaged use of some additional flag like "sample" to indicate the nature of the evidence for the data but that will require addressing the issue recognised by TDWG in Florence of the need to replace basisOfRecord with some new hierarchical vocabulary for evidence(Type) as this is what we mostly (mis)use basisOfRecord for. That was an omission from our description - so thanks for highlighting it.
Your examples of how the vagueness of domain and range values for the proposed terms can lead to ambiguities in interpretation are valid from a strict ontological approach but can probably be levelled at all DwC properties, e.g., DwC itself does not enforce an obligatory pairing of lat and long values so if one is missing, then you are left without a location. So, yes, quantity and quantityType need to co-occur for interpretation as do samplingUnit and samplingEffort, and both pairings need to be present in order to interpret the figures correctly. If any of the pairings are incomplete, at most you will be able to say there was an occurrence of taxon X at the event location and it was recorded as part of a sampling event that used a particular protocol. The term "quantity" is not about a count of toads in a museum jar, rather it refers to the number of occurrences per samplingUnit, in this case, e.g., 9 toads were recorded per m^2 (the samplingUnit). In the end, our approach will only work with good documentation and guidance, by making the IPT as user-friendly as possible, possibly doing completeness checks, etc.
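A completeness check of the kind mentioned above could look roughly like the following. This is a minimal sketch only, assuming each record is a plain dictionary keyed by the proposed term names; the function names, the record layout, and the example values are hypothetical and are not part of the IPT or any GBIF tool.

```python
# Hypothetical completeness check for the proposed paired terms.
# Nothing here is an IPT or GBIF API; names and values are illustrative.

PAIRED_TERMS = [
    ("quantity", "quantityType"),
    ("samplingUnit", "samplingEffort"),
]


def _present(record, term):
    """A term counts as present if it has a non-empty value."""
    value = record.get(term)
    return value is not None and str(value).strip() != ""


def incomplete_pairs(record):
    """Return the term pairs for which at least one member is missing."""
    return [
        pair for pair in PAIRED_TERMS
        if not all(_present(record, term) for term in pair)
    ]


def is_quantitatively_interpretable(record):
    """True only when both pairings are fully populated."""
    return not incomplete_pairs(record)


# Example record: samplingEffort is empty, so the second pairing is incomplete.
record = {
    "quantity": "9",
    "quantityType": "individuals",
    "samplingUnit": "m^2",
    "samplingEffort": "",
}
print(incomplete_pairs(record))
```

A record that fails this check could still be published as a plain occurrence at the event location, in line with the fallback described above.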
I'm still looking to the BCO to provide a comprehensive framework for observations/samples. But, as you point out, it's not there yet. Unfortunately, your suggestion of using EML as an exchange format for sample data does not meet our needs. We already include EML for general metadata but the whole point of DwC-A is that we can map to some standardised vocabulary, particularly in this case, for basic quantitative information around a data set associated with a particular sampling protocol.
This discussion has been very useful and has prompted us to review our proposal to ensure we apply the most minimally disruptive solution given the needs of the GBIF/EU BON community.
Éamonn
*From:* tdwg-content-bounces@lists.tdwg.org [mailto: tdwg-content-bounces@lists.tdwg.org] *On Behalf Of *Ramona Walls *Sent:* 02 September 2014 05:39 *To:* TDWG Content Mailing List *Subject:* Re: [tdwg-content] tdwg-content Digest, Vol 63, Issue 14
Thank you, Simon, for that explanation and the links. They were very helpful. Amen to the point: "There is no 'sample' class, because it is such an overloaded word (noun, verb, statistical sample vs ex-situ sample, etc)." The documents you shared highlight the very important point that OGC and OBO-E were designed specifically to describe observations.
Darwin Core, on the other hand, was designed to capture information about "taxa, their occurrence in nature as documented by observations, specimens, samples, and related information" [1]. As such, observations are not central to Darwin Core, but rather are included as evidence of the occurrence of a taxon in nature. It works for communicating basic information about an observation or other evidence of a taxon's occurrence, but I think it would be mis-using and abusing DwC to try to shoe-horn the complexity of observation data/metadata into it. It also does some dis-service to the communities who have spent so much time developing OGC and OBO-E.
Eamonn, this is not meant to discredit the work that you and your colleagues have done to develop a DwC archive schema for sampling data. I think it is an important step toward developing a comprehensive framework for biodiversity data, and just by proposing it, we have moved a step in the right direction (even if I disagree about adopting it). Your point that OBO-E is far more complex is true, and we may have to adopt more terms if we want to describe observation data accurately in DwC. On the other hand, we do not necessarily need to adopt every aspect of OBO-E to exchange observation data.
What the BCO participants -- and thanks to all the GBIF people who have participated! -- are trying to do is build a framework that can work across many (not necessarily all) types of biodiversity data, including specimen collection and observations, while considering existing efforts such as DwC, MIxS, OBO-E, and OBO Foundry ontologies. We started with specimens, but the intention has always been to link to observation data as well [2]. Although the full BCO model probably will be large and complex, we fully intend to offer views that are basically subsets of the ontology filtered for applications. This is regular practice now in application ontologies. Views make it possible to provide a controlled vocabulary for data annotation without burdening annotators with a confusing array of terms and logical definitions.
However, the point that BCO is not yet ready for your needs is correct, and I would never tell anyone to just "hold on to your data until the ontology is ready". Did you examine the possibility of using EML as an exchange format for the sampling/survey related data? DwC-A already has an EML component, so I wonder if some combination of an occurrence core with an extended EML document (based on OGC) would work.
Ramona
[1] http://rs.tdwg.org/dwc/index.htm
[2] http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0089606
Ramona L. Walls, Ph.D. Scientific Analyst, The iPlant Collaborative, University of Arizona Research Associate, Bio5 Institute, University of Arizona Laboratory Research Associate, New York Botanical Garden
On Fri, Aug 29, 2014 at 3:00 AM, tdwg-content-request@lists.tdwg.org wrote:
Today's Topics:
- Re: Darwin Core: proposed news terms for expressing sample data (Simon.Cox@csiro.au)
- Re: tdwg-content Digest, Vol 63, Issue 6 (sigh) (Éamonn)
Message: 1
Date: Fri, 29 Aug 2014 07:58:14 +0000
From: Simon.Cox@csiro.au
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
To: tdwg-content@lists.tdwg.org
Message-ID: <2A7346E8D9F62D4CA8D78387173A054A5FFF5801@exmbx04-cdc.nexus.csiro.au>
Content-Type: text/plain; charset="utf-8"
G'day again TDWGers:
Matt passed on links to this thread to me and suggested I comment, as I was the author of the O&M standard (published as ISO 19156:2011 and OGC Abstract Spec Topic 20).
For those who are not aware of this work, there is a short Wikipedia page http://en.wikipedia.org/wiki/Observations_and_Measurements whose main value is it has links to a number of more detailed resources. Probably the richest of these is another Wiki page at CSIRO https://www.seegrid.csiro.au/wiki/AppSchemas/ObservationsAndSampling which hasn't been updated much recently, but at least has some diagrams embedded. As Matt and others have hinted, as a result of a workshop at NCEAS a few years ago, there were some tweaks to allow it to meet some of the requirements identified in OBOE, just in time to beat the ISO deadline!
O&M includes a generic model for "Sampling Features" - being those artefacts that are created to assist the observation process, but would not exist and have very much interest otherwise. Things like specimens, transects, sections, quadrats, scenes and swaths, drillholes, flightlines, trajectories, ships tracks, etc. Because it is a generic standard, you won't find things with names familiar to any particular discipline, and there are a lot of stub classes for supporting information which need filling out for specific applications. But the intention is that it provides a framework for a discipline or community to specialize for their purposes, while retaining some topology and perhaps terminology (maybe just as super-classes) that help with information sharing across discipline boundaries. The main properties of a sampling feature are:
- The sampledFeature - being the domain object which it is being used to characterize
- Related sampling features - other features related to the observational strategy
- Related observations - observation events that use this sampling feature (for which another generic model is provided)
We've generally found it helpful in teasing apart observational records and protocols in a variety of environmental science applications, and others have applied it in oceans, meteorology, even air-traffic control!
The primary classification of sampling features in O&M is by topological dimension (point, curve, surface, solid), because these are commonly used and afford common processing methods. "Specimen" is the other concrete sampling-feature type. There is no "sample" class, because it is such an overloaded word (noun, verb, statistical sample vs ex-situ sample, etc).
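The shape of the sampling-feature model described above can be sketched as a small data structure. This is a loose illustration only: the class and field names are simplified stand-ins chosen for readability, not the normative names from the O&M UML or its XML/OWL encodings, and the example values are invented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SamplingShape(Enum):
    """Topological dimension of a spatial sampling feature (value = dimension)."""
    POINT = 0
    CURVE = 1
    SURFACE = 2
    SOLID = 3


@dataclass
class SamplingFeature:
    identifier: str
    # The domain object this feature is used to characterize.
    sampled_feature: str
    # None when the feature is not classified by topological dimension.
    shape: Optional[SamplingShape] = None
    # Other features related to the observational strategy.
    related_sampling_features: List["SamplingFeature"] = field(default_factory=list)
    # Identifiers of observation events that use this sampling feature.
    related_observations: List[str] = field(default_factory=list)


@dataclass
class Specimen(SamplingFeature):
    """The other concrete sampling-feature type; note there is no 'Sample' class."""


# Illustrative values only: a quadrat (a surface) and a specimen taken within it.
quadrat = SamplingFeature("quadrat-1", sampled_feature="pond margin",
                          shape=SamplingShape.SURFACE)
specimen = Specimen("specimen-1", sampled_feature="pond margin")
quadrat.related_sampling_features.append(specimen)
quadrat.related_observations.append("observation-1")
print(quadrat.shape.name, len(quadrat.related_sampling_features))
```

A discipline-specific profile would specialize these generic classes rather than rename them, which is the information-sharing point made above.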
O&M and its Sampling Feature model was designed in UML. As Matt notes, the original implementation in the OGC context was in XML, using GML http://schemas.opengis.net/samplingSpecimen/2.0/specimen.xsd and http://schemas.opengis.net/samplingSpatial/2.0/spatialSamplingFeature.xsd . However, it has been implemented other ways: there is an OWL2/RDFS representation at http://def.seegrid.csiro.au/isotc211/iso19156/2011/sampling which is linked in with OWL versions of a bunch of the other ISO standards, and therefore probably makes too many commitments for the faint hearted - see the paper from ISWC 2013 here http://ceur-ws.org/Vol-1063/paper1.pdf . O&M was also one of the core inputs to the W3C Semantic Sensor Network ontology, reported here: http://www.w3.org/2005/Incubator/ssn/wiki/Incubator_Report though that focussed on the sensors and observations side of the equation, and hardly deals with sampling.
Hope this helps.
Date: Thu, 21 Aug 2014 18:52:06 -0800
From: Matt Jones <jones@nceas.ucsb.edu>
Subject: Re: [tdwg-content] Darwin Core: proposed news terms for expressing sample data
To: Éamonn Ó Tuama [GBIF] <eotuama@gbif.org>
Cc: TDWG Content Mailing List <tdwg-content@lists.tdwg.org>
Message-ID: <CAFSW8xkx7uRP9PC2g3=JT_VJanqujH8nPXoz8GXwh+JwKw5Ccw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
This proposal is treading on ground that is quite similar to other observations and measurements standards for data exchange that are already mature, in particular:
- OGC Observations and Measurements (http://www.opengeospatial.org/standards/om)
- Extensible Observation Ontology (OBOE; https://semtools.ecoinformatics.org/oboe)
The former is a standard and broadly deployed, whereas the latter is part of a research program in the use of ontologies for measurements. Through collaboration between the two projects, they've been modified to be reasonably isomorphic, but O&M uses an XML serialization while OBOE uses an OWL-DL serialization. They largely express the same measurements and sampling model once one gets beyond the terminology differences.
So, I'm wondering if it makes much sense to extend Darwin Core, which is at heart an Occurrence exchange syntax, into this measurements area that is well represented by these other existing specifications? I'm curious to hear why people would even want to do this. And if we do go down this path, won't we just end up with a new syntax that does essentially what O&M and OBOE do now?
Matt