[tdwg-content] Darwin Core: proposed new terms for expressing sample data

Simon.Cox at csiro.au Simon.Cox at csiro.au
Tue Sep 2 06:31:55 CEST 2014

Hi Ramona - 

I understand your concern, though I would counter that the only real reason to collect and curate a specimen is to support observations, either contemporaneously or at some future time.
So it could be seen as slightly perverse to suggest that a model for specimens and samples could be divorced from the notion of observations. 

FWIW I'm right now trying to develop a simplified SamplingFeatures ontology, still conceptually based on the ISO 19156 model but with no commitments to marginal ontologies (i.e. lifting it out of the ISO 19100 ghetto). This has led me to consider re-using more standard ontologies. W3C PROV-O is interesting: since a lot of the information you would want to record about a specimen concerns its provenance, it probably makes sense to align with PROV. Currently I have:

sam:Specimen  a           owl:Class ;
        rdfs:comment      "A Specimen is a physical sample, obtained for observation(s) normally carried out ex-situ, sometimes in a laboratory."^^xsd:string ;
        rdfs:label        "Specimen"@en ;
        rdfs:subClassOf   prov:Entity , sam:SamplingFeature ;
        rdfs:subClassOf   [ a                owl:Restriction ;
                            owl:cardinality  "1"^^xsd:nonNegativeInteger ;
                            owl:onProperty   sam:samplingTime
                          ] .
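
As a rough illustration only, an instance conforming to the class above might look like the following. The `sam:` and `ex:` namespace IRIs are placeholders (the real ones are not given here), the individuals and timestamps are hypothetical, and typing `sam:samplingTime` as `xsd:dateTime` is an assumption, since its range is not stated above. `prov:wasGeneratedBy`, `prov:Activity` and `prov:endedAtTime` are standard PROV-O terms.

```turtle
@prefix sam:  <http://example.org/sam#> .      # placeholder namespace (assumption)
@prefix ex:   <http://example.org/data/> .     # hypothetical example data
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# A specimen with exactly one sampling time, as the cardinality restriction requires.
ex:specimen42  a                    sam:Specimen ;
        sam:samplingTime     "2014-08-15T10:30:00Z"^^xsd:dateTime ;
        prov:wasGeneratedBy  ex:collectingEvent7 .

# Because sam:Specimen is a prov:Entity, the collecting act can be a prov:Activity.
ex:collectingEvent7  a              prov:Activity ;
        prov:endedAtTime     "2014-08-15T10:30:00Z"^^xsd:dateTime .
```

The idea is that the specimen's provenance (who collected it, when, and by what process) hangs off ordinary PROV relations rather than needing bespoke properties.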


Date: Mon, 1 Sep 2014 20:38:33 -0700
From: Ramona Walls <rlwalls2008 at gmail.com>
Subject: Re: [tdwg-content] tdwg-content Digest, Vol 63, Issue 14
To: TDWG Content Mailing List <tdwg-content at lists.tdwg.org>

Thank you, Simon, for that explanation and the links. They were very
helpful. Amen to the point: "There is no 'sample' class, because it is such
an overloaded word (noun, verb, statistical sample vs ex-situ sample,
etc)." The documents you shared highlight the very important point that OGC
and OBO-E were designed specifically to describe observations.

Darwin Core, on the other hand, was designed to capture information about
"taxa, their occurrence in nature as documented by observations, specimens,
samples, and related information" [1]. As such, observations are not
central to Darwin Core, but rather are included as evidence of the
occurrence of a taxon in nature. It works for communicating basic
information about an observation or other evidence of a taxon's occurrence,
but I think it would be mis-using and abusing DwC to try to shoe-horn the
complexity of observation data/metadata into it. It also does some
dis-service to the communities who have spent so much time developing OGC
and OBO-E.

Eamonn, this is not meant to discredit the work that you and your
colleagues have done to develop a DwC archive schema for sampling data. I
think it is an important step toward developing a comprehensive framework
for biodiversity data, and just by proposing it, we have moved a step in
the right direction (even if I disagree about adopting it). Your point that
OBO-E is far more complex is true, and we may have to adopt more terms if
we want to describe observation data accurately in DwC. On the other hand,
we do not necessarily need to adopt every aspect of OBO-E to exchange
observation data.

What the BCO participants -- and thanks to all the GBIF people who have
participated! -- are trying to do is build a framework that can work across
many (not necessarily all) types of biodiversity data, including specimen
collection and observations, while considering existing efforts such as
DwC, MIxS, OBO-E, and OBO Foundry ontologies. We started with specimens,
but the intention has always been to link to observation data as well [2].
Although the full BCO model probably will be large and complex, we fully
intend to offer views that are basically subsets of the ontology filtered
for applications. This is regular practice now in application ontologies.
Views make it possible to provide a controlled vocabulary for data
annotation without burdening annotators with a confusing array of terms and
logical definitions.

However, the point that BCO is not yet ready for your needs is correct, and
I would never tell anyone to just "hold on to your data until the ontology
is ready".  Did you examine the possibility of using EML as an exchange
format for the sampling/survey related data? DwC-A already has an EML
component, so I wonder if some combination of an occurrence core with an
extended EML document (based on OGC) would work.


[1] http://rs.tdwg.org/dwc/index.htm

Ramona L. Walls, Ph.D.
Scientific Analyst, The iPlant Collaborative, University of Arizona
Research Associate, Bio5 Institute, University of Arizona
Laboratory Research Associate, New York Botanical Garden
