[tdwg-content] New Darwin Core terms proposed relating to material samples

Richard Pyle deepreef at bishopmuseum.org
Fri May 31 06:24:21 CEST 2013


Thanks Ramona;

 

Actually, the basic elements of our data model precede DwC by quite a bit.
What we’ve tried to do, however, is mold the data model to be more
compatible with DwC, to make the task of mapping for data export & exchange
that much easier.  Of course, DwC is not (and is not intended to be) a data
model in any sense of the word; however, it’s impossible to avoid
representing core elements of a bona-fide data model within DwC.  This is
especially true when it comes to each of the “ID” terms (and
doubly-especially true when the “ID” terms correlate to class terms).  The
existence of an “ID” term implies that some class of object exists to which
an “ID” value is applied.  The “ID” value itself is never useful
data/metadata – it is just a way to reference a data record that
(presumably) contains properties that can be expressed as data/metadata for
the object represented by the “ID” value.

 

This was all well-understood when the original DwC was being drafted; but as
it evolved into its current iteration (with the addition of all the “ID”
terms), it has been drawn ever more (in some ways subtly, and in some ways
not-so-subtly) in the direction of a data model.

 

Of course, what we all (desperately!) need is a robust ontology that fits
our world.  The task is not easy in part because our data domain is not so
easily modeled, in part because different sections of our broader community
have different priorities, and in part because there is always a delicate
balance between developing a model or ontology that is practical and useful
for the data providers and consumers, and one that is robust and detailed
and flexible, to allow asking questions of the data that were never even
considered at the time the model/ontology was conceived.  The parallel
experience in database modeling is normalization (as Paul Kirk likes to say:
“Normalize until it hurts; then de-normalize until it works”).

 

The original DwC was completely flat.  The current DwC moved into the
direction of more complex structures by clustering terms into classes and
sprinkling with “ID” terms.  It even tip-toed into RDF-space with
dwc:ResourceRelationship.  I think that’s definitely an improvement, but it
still must strike a delicate balance between the needs by some to represent
a robust data model, and the needs by others to have a simple/practical
mechanism to exchange biodiversity data in a standard form.  It will never
be all things to all people; but at least it is enough things to enough
people that it represents an important “flag pole” around which our
community has (more or less) successfully rallied.

 

Hmmmm... Now I’ve forgotten what my point was.  I guess I was just in a
ramblin’ mood. Well... sorry about the bandwidth!

 

Aloha,

Rich

 

From: Ramona Walls [mailto:rlwalls2008 at gmail.com] 
Sent: Thursday, May 30, 2013 6:05 PM
To: Richard Pyle
Cc: Jason Holmberg; TDWG Content Mailing List; John Deck; Robert Whitton
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to
material samples

 

Jason,

Thanks for sharing how you have been using the Darwin Core terms. I am
intrigued by the data structure you have developed. It is quite interesting
how both you and Rich have adapted the DwC to fit your specific needs. While
I am often troubled by the vagueness of DwC, I guess in some ways it is that
vagueness that allows it to be used in many different applications. Of course, I
don't think vagueness is necessary for wide application, or a good thing for
data exchange, but it does seem to be working for a lot of different
purposes.

Ramona

 

On Thu, May 30, 2013 at 3:07 PM, Richard Pyle <deepreef at bishopmuseum.org>
wrote:

Hi Jason,

 

Many thanks for this input.  If I understand you correctly, then you are
using “Encounter” as equivalent to what we have been using “Occurrence” for.
That is, by our definition, an “Occurrence” is the instance representing an
intersection between an Event (i.e., where, when, who, etc.) and what we
have been calling an “Individual” (i.e., what); and the properties we attach
to the Occurrence are the “how” bits (including things like size, etc.).

 

In my mind, the essence of an “Individual” is the collective physical
material of the individual.  If I see a fish on a reef, its “Occurrence” on
that reef and at that time exists (and is worth documenting) regardless of
whether I took an image of it (what we would call “Evidence”), or whether I
took a tissue sample from it, or whether I collected and killed the whole
damn thing.  To me, the “essence” of the individual – or its occurrence at
an event – is unaffected by what I end up doing to it.  By extension,
following a hierarchical model of “individual”, a sub-sample
(materialSample) extracted from it is just another instance of “Individual”.
This is why I generally think of “materialSample” (if it were represented as
a class – which it is currently not proposed for DWC) as a subclass of a
broader concept (e.g., “material”, but what I have naively been referring to
as “Individual”).

 

That part of our model has proven to be very stable and effective for
representing the information as we want it.

 

Where it gets complicated is instances of taxonomically heterogeneous
objects treated as a single “individual” – which (in my mind) includes such
things as soil samples.

 

In that context, I see (and agree with John and others) that really it’s a
separate axis of classification from what I have called “Individual”.

 

I don’t expect that to make a lot of sense (I barely understand it myself).

 

Aloha,

Rich

 

From: Jason Holmberg [mailto:holmbergius at gmail.com] 
Sent: Thursday, May 30, 2013 11:28 AM
To: Richard Pyle
Cc: Ramona Walls; TDWG Content Mailing List; John Deck; Robert Whitton


Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to
material samples

 

Hi everyone.

 

List lurker here. DWC has been a great inspiration in my work, so I hope I
can contribute some small amount of insight on the "individual" and
"material sample" threads. I have no grand thoughts on the subject, but I
can tell you how the DWC has inspired my own information architecture for
open source mark-recapture software:

 

http://www.ecoceanusa.org/shepherd/doku.php?id=manual:2.0.x:1_overview

 

I felt the very clear need for a distinct Individual Class and to separate
that from the concept of a sample taken from nature. When reviewing DWC, I
interpreted Occurrence.individualCount to be somewhat contradictory to
Occurrence.individualID, so I created a one-individual-at-a-point-in-time
class called Encounter that reuses quite a bit of DWC.Occurrence. Occurrence
I then broadened to include the potential for multiple marked individuals.

 

I present this as neither "right" nor "good" (though it has worked very
well for us). I just present it as a practical example from mark-recapture
in which we have tried to adhere to DWC in order to expose data to GBIF,
iOBIS, etc. The concepts of "material sample" and "individual" are very
important to us, and this is how we have defined them.

 

 

Cheers,


Jason Holmberg
ECOCEAN Whale Shark Photo-identification Library
http://www.whaleshark.org

Please consider adopting a shark to support our mission:
http://www.whaleshark.org/adoptashark.jsp

 

On Wed, May 29, 2013 at 4:13 PM, Richard Pyle <deepreef at bishopmuseum.org>
wrote:

Hi Ramona,

 

Yes, I agree, and thanks.  I’ve always felt that there has been a trend
towards trying to push too much “ontology” (or other semantic meaning) onto
DWC terms and classes, when DWC was fundamentally intended to represent a
mechanism for data exchange; not a mechanism to describe the ontological
landscape of biodiversity data.  The only reason I brought this up now (and,
I think, why we discussed it in 2010), is that the term “individualID” in
DWC sort of hinted that something like “Individual” was the “forgotten
class” for DWC.  I sincerely hope that BCO and DSW gain more traction (and,
ideally, harmony between them) than earlier attempts at developing
ontologies in this space have met – and clearly that is the right path
forward.

 

My main concern for this thread (and the reason I engaged in it), was to:

1)      Find out the status of the discussions that began in 2010; and

2)      Clarify where the current materialSample proposal overlaps, or does
not overlap, with that earlier effort.

 

Steve has very adequately answered the first question, and you, John, and
others have answered the second, and I’m happy with both sets of answers.

 

I’m sorry for the voluminous exchange, but I felt the discussion was both
important, and very helpful (certainly to me).

 

Aloha,

Rich

 

From: Ramona Walls [mailto:rlwalls2008 at gmail.com] 
Sent: Wednesday, May 29, 2013 1:03 PM
To: Richard Pyle
Cc: John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List;
Robert Whitton


Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to
material samples

 

Hi Rich,

Sorry I didn't mention this sooner, but your emails were also helpful to me
in describing an important and generalizable use case. 

I don't know whether or not the TDWG community is ready to deal with the
level of abstraction we are talking about, but my assessment is that whether
or not they are ready, the Darwin Core is not constructed to deal with it.
That is why (among other reasons) we started work on the BCO, and perhaps
one reason why Steve and others developed DSW. 

Our goal with the material sample proposal was not to overhaul DwC, but to
work within the DwC framework to make it more compatible with other
standards such as MIxS. That is why we tried to keep our proposal fairly
narrowly focused.

Ramona

 

On Wed, May 29, 2013 at 3:21 PM, Richard Pyle <deepreef at bishopmuseum.org>
wrote:

Thanks, Ramona – this is an *extremely* helpful email! It helps clear things
up a lot in my mind.

 

Just to be clear, what I am looking for is the notion of a defined physical
object (what I think you mean by “material entity”), and I explicitly mean
the material entity itself.  Yes, there is information (properties) that
relate to that material entity, but to me that is a separate issue.  What I
would like to see clearly defined is the class representing the material
(physical) entity – which seems to me to be a superclass of what
materialSample is intended to represent.

 

Perhaps our (TDWG/DWC) community is not yet ready to deal with this level of
abstraction (unfortunately, I absolutely have to, because “Occurrence” is
simply way too overloaded a class for me to use independently of what I have
been calling “individual” and what I have been calling “Evidence”).  In that
case, I guess the best thing to do is accept materialSample as a
basisOfRecord for Occurrence and move on.  But this is more or less the same
thing that happened the last time we engaged in this conversation (2 years
or so ago), and I was hoping this conversation about materialSample could
leverage progress on the larger issue.

 

As I’ve said before, the last thing I want to do is confuse or otherwise
slow down the process of incorporating the term “materialSample” into DWC.
It’s just that I saw enough overlap with that “other” issue, that I was
hoping we could find a reasonable pathway forward on both.

 

Thanks again for the very helpful comments.

 

Aloha,

Rich

 

From: Ramona Walls [mailto:rlwalls2008 at gmail.com] 
Sent: Wednesday, May 29, 2013 9:14 AM
To: Richard Pyle
Cc: John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List;
Robert Whitton


Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to
material samples

 

Rich,

I now understand more fully what you are asking for (a clear definition
goes a long way!). A material sample, as we discussed it at the Kansas and
Oxford workshops, does indeed need to be physically removed from its
environment. This is also the case with the OBI term material sample, which,
as a subclass of OBI:specimen, is the output of some collecting process. It
is true that the concept of material sample could be defined to include sampling
in an observational sense, but that is not how it is defined at this point.
Based on this, material sample is NOT the term you are looking for, which you
defined as:

"The category of information pertaining to the physical basis of a sampling,
subsampling, or observational event. In biological collections, the
[SuperclassTerm] is typically a defined group of organisms, a single whole
organism, or a part of a whole organism that is collected or otherwise
documented in nature, and either preserved, destructively processed, or
documented through some form of Evidence (such as images or reported visual
observations)."

 

What you have defined is a category of information (whatever that may be)
that pertains to some material entity. Not the material entity itself, but
information about that entity. The "SuperclassTerm" you refer to in the
definition sounds an awful lot like a material entity from the Basic Formal
Ontology, which is used for defining material sample in OBI and the
Bio-collections Ontology.

Ramona

 

On Wed, May 29, 2013 at 11:51 AM, Richard Pyle <deepreef at bishopmuseum.org>
wrote:

Many thanks, John.  This is extremely helpful!

 

First of all, in the context of a distinct term for basisOfRecord, I see
absolutely no problem with adding the term “MaterialSample”. I fully support
your proposal (although if this is simply a basisOfRecord term to be used
alongside Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen,
HumanObservation, MachineObservation; does it need a defined “ID” term? Do
all the others have defined “ID” terms?). 

 

However, I’m excited by this conversation because I think we are very close
to solving a bigger problem (which was the focus of the 2010 discussion on
this list around “IndividualOrganism”).

 

This bigger problem involves the need for a defined “concept” (I’m
hesitating to say “class”), and an associated “ID”, in dwc that refers to
the physical/material basis of an Occurrence.  We don’t yet have a term for
this concept in dwc (“IndividualID” hinted at the need for one, but that
term was not well-defined, and the term itself seems to cause confusion).
As Steve Baskauf and I have both been advocating for the establishment of a
new class in dwc for exactly this purpose, I just want to make sure that
we’re on the same page about what each concept is.  The more I understand
about what you need for “materialSample”, the more convinced I am that both
of our needs can be met with the same concept.

 

I am perfectly happy to adopt the term “MaterialSample”, but I guess it all
boils down to this: In order for something to be a “MaterialSample”, must it
necessarily be removed from nature?   

 

If the answer is “No”, then I think we can merge the two concepts into one.

 

If the answer is “Yes”, then I think “materialSample” is best characterized
as a subclass of what Steve and I have been pushing for (setting aside, for
the moment, the additional complexity of taxonomically homogeneous vs.
heterogeneous).

 

In the latter case, I would define the superclass (whatever term is used for
it), along the lines of:

 

"The category of information pertaining to the physical basis of a sampling,
subsampling, or observational event. In biological collections, the
[SuperclassTerm] is typically a defined group of organisms, a single whole
organism, or a part of a whole organism that is collected or otherwise
documented in nature, and either preserved, destructively processed, or
documented through some form of Evidence (such as images or reported visual
observations)."

 

Aloha,

Rich

 

 

From: jdeck88 at gmail.com [mailto:jdeck88 at gmail.com] On Behalf Of John Deck
Sent: Wednesday, May 29, 2013 4:01 AM
To: Richard Pyle
Cc: Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton;
Ramona Walls


Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to
material samples

 

Since the original proposal was from a group of folks, we decided to put our
heads together to construct a general response to the various issues and
ideas expressed on this thread. 

 

John Deck for Rob Guralnick, Ramona Walls, and John Wieczorek

 

In the text of the issue submitted for MaterialSample
(https://code.google.com/p/darwincore/issues/detail?id=167) we noted cases
where the current basisOfRecord terms pertaining to the Occurrence class
(Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen,
HumanObservation, MachineObservation) do not adequately cover certain cases,
including: environmental sample (for metagenomic analysis), transcriptomes
(measuring genes but not taxa), and destructive samples (e.g. tissues
destructively sampled in order to generate genomic DNA).  The term we
borrowed from OBI (http://purl.obolibrary.org/obo/OBI_0000747) is broad
enough to be utilized across various cases that fulfill our criteria while
still maintaining a consistent, clear and human understandable meaning.  For
our purposes, we can think of “Material Sample” as any type of matter that
we can use in order to derive further evidence needed for identifications and
taxa, whether it is taxonomically homogeneous, heterogeneous, a single
individual, sets of individuals, or populations. 

 

How is MaterialSample different from Individual?  The intent of individualID
is fairly clear:  since an Occurrence represents an organism at a place and
time (per Markus’ email), the individualID term allows us to assign an
instance identifier for a particular organism that can be present in
multiple events. MaterialSampleID, on the other hand, is intended to allow
users to say that the basis of an occurrence is a material entity (i.e.
matter) that has been sampled according to some particular method. Whether
or not this material entity is an individual (sensu individualID in DwC) is
an independent axis of classification. As was already pointed out, there is
no restriction on specifying that an occurrence is associated with more than
one type, so any occurrence can have both an individualID and a
materialSampleID.
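
As a rough illustration of the "independent axis" point above, here is a minimal Python sketch of a single occurrence record carrying both identifiers. The term names follow the discussion (materialSampleID is the proposed term); the identifier values and the describe() helper are made up for this example and are not part of Darwin Core.

```python
# Hypothetical record; all identifier values are invented for illustration.
occurrence = {
    "occurrenceID": "occ-001",
    "basisOfRecord": "MaterialSample",
    "individualID": "ind-017",      # the organism, reusable across events
    "materialSampleID": "ms-204",   # the sampled matter itself
}


def describe(record: dict) -> tuple:
    """Return (refers_to_individual, refers_to_material_sample).

    The two flags are computed independently, mirroring the idea that
    the two identifiers classify an occurrence along separate axes.
    """
    return (
        bool(record.get("individualID")),
        bool(record.get("materialSampleID")),
    )


print(describe(occurrence))
```

A record with neither identifier, or with only one of them, passes through the same helper unchanged; nothing forces the two to go together.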

 

We maintain our position on the proposal for MaterialSample as a value for
the basisOfRecord, with an associated materialSampleID to identify instances
of them. Per Steve’s initial comments, we have already withdrawn the
proposal for a MaterialSample class distinct from that in the Darwin Core
type vocabulary, which should make it easier to evaluate the implications of
what we’re discussing.  

 

********************

 

NOTES, MaterialSample from OBI:


OBI has fairly broad definitions of samples & specimens that are meant to be
utilized across many different scientific activities.  Material Sample is
defined as a “material entity that has the material sample role”, while a
material sample role is defined as “a specimen role borne by a material
entity that is the output of a material sampling process”, and a material
sampling process is “a specimen gathering process with the objective to
obtain a specimen that is representative of the input material entity”.  

 

 

 

 

On Mon, May 27, 2013 at 11:59 PM, Richard Pyle <deepreef at bishopmuseum.org>
wrote:

Hi Markus,

Great question!  Particularly because this is exactly the sort of use case
we designed our model around.


> if you take a tissue sample of the same tree every year, would the identifier
> in individualID be the same for all of them or be different? With the current
> dwc:individualID definition it would be the same for all samples. If I
> understand you correctly each sample would have its own "individual"
> identifier in your proposal? I can't see how you can collapse these two things
> into one definition.

No, that is not how we would handle it.

In our model, there would be one IndividualID to represent the tree,
spanning the time period beginning (more or less) when the seed was
germinated, until the time at which the entire physical structure of the
tree was disintegrated.  It is an individual tree.

There would be multiple Occurrence instances, for each time that someone
observed or sampled or otherwise wished to document some condition of that
tree. All of these Occurrence instances would refer to the same individualID
value (i.e., the "tree").  In the example above, this means there would be a
different Occurrence instance for each year that a sample is taken --
because in each case, an assertion that the full tree existed at a certain
time and place can be made (I understand that trees tend not to move around
very much, so the Location for each event associated with each Occurrence
would, in this case, remain the same; but the other Event properties -- such
as eventID, samplingProtocol, samplingEffort, eventDate, eventTime,
startDayOfYear, endDayOfYear, year, month, day, verbatimEventDate, habitat,
fieldNumber, fieldNotes, eventRemarks -- would be documented accordingly for
each sampling Occurrence instance).

Suppose that the tree is visited every month, but only sampled once per
year.  In that case, there would be an Occurrence record for every monthly
visit.  In other words, an Occurrence instance exists regardless of whether
a physical sample was made or not.  Any in-situ images made of the tree
would likewise be associated with the specific Occurrence instance, and each
image would represent a separate instance of "Evidence".
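
To make the monthly-visit case concrete, here is a small Python sketch, assuming one hypothetical year in which the tree is visited every month and sampled only in May. The field names other than individualID and occurrenceID, and all the identifier values, are invented for the example.

```python
from collections import Counter

# Twelve hypothetical monthly visits to one tree; a sample is taken in May only.
visits = [
    {
        "occurrenceID": f"occ-2013-{month:02d}",
        "individualID": "tree-1",   # same individual for every visit
        "sampled": month == 5,      # illustrative flag, not a DwC term
    }
    for month in range(1, 13)
]

# Every visit yields an Occurrence, whether or not anything was collected.
occurrences_per_individual = Counter(v["individualID"] for v in visits)

# Only the visits where material was taken give rise to a sample.
sampled_visits = [v["occurrenceID"] for v in visits if v["sampled"]]

print(occurrences_per_individual["tree-1"], sampled_visits)
```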

Now, let's focus on the annual samplings.  Every time a new sample is taken
from the tree, at least one new instance of Individual (with a unique
individualID value) is created to represent the sample.  This sample
(individual instance) may be a "gathering" (set of multiple individual
specimens gathered at the same time), or it may be a single specimen, or it
may be simply a tissue sample intended for destructive analysis.  In any
case, it's a new individual instance derived from the "parent" individual
instance representing the whole tree.  In our implementation, "Individual"
can be hierarchical, such that a whole-organism tree can be sub-sampled with
many "child" instances of "gatherings" (say, one gathering each year), and
each gathering may have multiple child "specimen" individuals (e.g.
individual botanical sheets created from the multiple items of a single
gathering), and each specimen may have further "child" subsamples extracted
for DNA analysis (or whatever), and the hierarchy can continue on down to
whatever derivatives that people feel a need to keep track of (e.g.,
aliquot).

The point is, all Individual instances are well-defined physical objects (or
well-defined sets of physical objects), and they can be arranged in an
n-tiered hierarchy.
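
One simple way to picture that hierarchy is a child-to-parent lookup that can be walked from any derivative back up to the whole organism. This is only a sketch; the identifiers and the lineage() helper are hypothetical and do not come from our actual implementation.

```python
from typing import Dict, List, Optional

# Hypothetical identifiers: each Individual points at its parent Individual.
parent_of: Dict[str, Optional[str]] = {
    "tree": None,                 # the whole organism has no parent
    "gathering-2013": "tree",     # one year's gathering
    "sheet-A": "gathering-2013",  # a botanical sheet from that gathering
    "dna-extract-1": "sheet-A",   # a subsample taken for DNA analysis
}


def lineage(individual_id: str) -> List[str]:
    """Walk from an Individual up to the root of its hierarchy."""
    chain = [individual_id]
    while parent_of[chain[-1]] is not None:
        chain.append(parent_of[chain[-1]])
    return chain


print(lineage("dna-extract-1"))
# ['dna-extract-1', 'sheet-A', 'gathering-2013', 'tree']
```

Because the depth is not fixed, the same walk works for a two-level case (tree and tissue sample) and for longer chains down to an aliquot.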

Moreover, each Individual that can be characterized as a "sample" (what we
refer to as a "CollectionObject") may also have a property value for
"CollectionOccurrenceID" -- which refers to the specific Occurrence instance
at which the sample was obtained.

So, for example, if the tree is visited on May 27, 2013 and a specimen
(sample) is taken, then:
1) An Event instance is generated to represent the event where the tree was
visited;
2) An Occurrence instance is generated, which refers to the new EventID and
the existing IndividualID for the whole tree, and includes whatever other
Occurrence properties are relevant for the tree at the time of this
Occurrence;
3) An Individual instance is generated for the specimen, which has a
property value for parentIndividualID that refers to the individualID for
the whole tree, and a property value for collectionOccurrenceID that refers
to the Occurrence instance where the specimen was collected.
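
The three steps can be sketched in Python roughly as follows. The class and field names are loose renderings of the terms used above (EventID, IndividualID, parentIndividualID, collectionOccurrenceID); the identifier values are invented, and this is an illustration rather than our actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    event_id: str
    event_date: str


@dataclass
class Individual:
    individual_id: str
    parent_individual_id: Optional[str] = None      # set for derivatives
    collection_occurrence_id: Optional[str] = None  # set for samples


@dataclass
class Occurrence:
    occurrence_id: str
    event_id: str
    individual_id: str


# The whole tree already exists as an Individual before the visit.
tree = Individual("ind-tree")

# 1) An Event for the visit.
event = Event("evt-2013-05-27", "2013-05-27")

# 2) An Occurrence linking the new Event to the existing tree.
occurrence = Occurrence("occ-2013-05-27", event.event_id, tree.individual_id)

# 3) A new Individual for the specimen, pointing back at both the tree
#    and the Occurrence at which it was collected.
specimen = Individual(
    "ind-specimen-1",
    parent_individual_id=tree.individual_id,
    collection_occurrence_id=occurrence.occurrence_id,
)

print(specimen)
```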

So, to summarize the answer to your question:
- There are multiple Occurrence instances that refer to the same Individual
instance representing the whole tree (and, hence can be collapsed to the
same IndividualID value).
- Any Individual can have derivatives that are themselves unique Individual
instances.
- Individuals are arranged hierarchically, and certain properties can be
inherited up or down the hierarchy, depending on the properties and their
associated logical constraints.

At some point, I will assemble a set of other specific use cases, and how we
manage them through our use of the "Individual" instance (although I will
probably not use the word "Individual", as this seems to cause too much
confusion in these discussions).

Aloha,
Rich





 

-- 
John Deck
(541) 321-0689

 

 

