[tdwg-content] New Darwin Core terms proposed relating to material samples

Richard Pyle deepreef at bishopmuseum.org
Fri May 31 21:56:41 CEST 2013


OK, thanks.  Now I understand it.  This is all related to the taxonomic
homogeneous/heterogeneous thing.  One thing I should caution, using your
example data below:

 

If we assume that JohnDeckGutSample1 was extracted from JohnDeck after
JohnDeck was extracted from nature, then we have to be careful about
inferring that the organism Bacteria501 has an occurrence related to the
place & time where JohnDeck was extracted from nature.  In other words, we
can’t reliably connect Bacteria501 with the Occurrence of JohnDeck in
nature, because Bacteria501 might have entered the gut of JohnDeck at some
later time (e.g., during decomposition).

 

I also disagree that the location where the gut sample was taken is
fundamentally different from where the organism was extracted from nature.
We definitely need to be able to distinguish Occurrences representing
“natural” place+time+organism instances from “artificial” instances.
However, you can’t simply say “extracting a tissue in a lab is fundamentally
different from extracting an organism in nature”.  The reason is that there
is a very rich spectrum between those two end points, and no clear place
along that spectrum where a line can be drawn.  What about a specimen of a
“naturalized” species in a certain location?  What about an organism taken
from nature that was born of parents that were brought to that place by
humans?  What about the organisms that were themselves brought by humans,
then released, then recaptured?  What about captive organisms or plants in a
person’s garden?  This spectrum continues all the way down to extracting, in
a lab at Berkeley, a gut sample from a specimen collected in Moorea.  The
degree of “naturalness” of an Occurrence is certainly important, but it’s
not Boolean, and it’s only one axis of interest, so we shouldn’t simply
assume dc:location represents some kinds of locations, but not others.

 

I think the basic problem is that, as has already been stated, DwC emerged
from the collections-based world, where one specimen = one occurrence, and
that occurrence is naturally regarded as being the occurrence when+where the
specimen was extracted from nature.  Now that we have such a diversity of
data we are trying to manage, this Occurrence-centric approach (with its
overloaded notion of an “Occurrence”) is being stretched to the breaking
point.

 

I think we should trend towards leaving DwC as a simple data exchange
paradigm, and focus these more complex conversations on a next-gen ontology
for biodiversity data.  I realize that’s already happening; but it seems
like the “center of mass” for conversation should shift from DwC to the
biodiversity ontology domain.

 

Aloha,

Rich

 

From: jdeck88 at gmail.com [mailto:jdeck88 at gmail.com] On Behalf Of John Deck
Sent: Friday, May 31, 2013 9:17 AM
To: Richard Pyle
Cc: TDWG Content Mailing List; Robert Whitton; Ramona Walls
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to
material samples

 

Since it was a gut sample, we'll be seeing lots of stuff in there, maybe
even 1000 different taxa.  Each one can be a distinct occurrence, just as
JohnDeckOccurrence124 and JohnDeckOccurrence125 are different taxa.

 

Now, in the sense of Event/location, I'm taking the location details to be
those from when it was isolated from nature, and thus the location would be
the same as the WholeOrganism location.  In our recent workshop on this same
issue in Copenhagen we went around on this for a while but decided that we
should take "location" to mean the location at which whatever parent
organism was isolated from nature.  Certainly the location where the gut
contents were extracted (e.g. in the lab) is important too, but that is
something different and not represented by dc:location or in DwC in general.
This is something of an interpretation of the actual term, but since our
implementation model of DwCA is still using "occurrence" at the core we
probably don't have much choice, since GBIF does not use a graph-based
parser.  The other option is to represent this all using event/location at
the core of a DwCA, but we felt this introduced yet more complexities and
breaks other cases where we would want to hang data off of occurrence (e.g.
Identification), and would not be able to do so if event/location lived at
the core.

 

John

 

On Fri, May 31, 2013 at 12:02 PM, Richard Pyle <deepreef at bishopmuseum.org>
wrote:

Thanks, John – this is REALLY helpful!

 

A couple questions – can you expand a bit on the differences between
JohnDeckOccurrence123, 124, and 125?  I’m assuming that JohnDeckOccurrence123
is associated with the Event representing the time & place when
JohnDeckTissueSample1 was removed from JohnDeck.  I’m guessing that
JohnDeckOccurrence124 is associated with the Event representing the time &
place when JohnDeckGutSample1 was removed from JohnDeck.  What I don’t
understand is why there needs to be a JohnDeckOccurrence125.  What
Occurrence does that represent?  Later you suggest that JohnDeck
(WholeOrganism) was extracted from nature. Is the extraction-from-nature
Occurrence one of these three Occurrences?

 

What you describe below is consistent with our approach to treating
materialSample as a subclass of Individual (assuming a hierarchical
Individual, which means that ParentIndividualID of both
JohnDeckTissueSample1 and JohnDeckGutSample1 is IndividualID=JohnDeck).  The
nice thing about the hierarchical approach is that it deals with the problem
you describe in the last paragraph.
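
As a rough illustration of the hierarchical Individual idea, here is a minimal Python sketch that walks ParentIndividualID links from a sample up to its top-level Individual. The dictionary layout and the function name root_individual are assumptions made for this example only; they are not part of DwC or of any existing implementation.

```python
from typing import Dict, Optional

# Hypothetical parent links. ParentIndividualID is the term proposed in this
# thread; storing the links in a plain dict is only for illustration.
PARENT_OF: Dict[str, Optional[str]] = {
    "JohnDeck": None,
    "JohnDeckTissueSample1": "JohnDeck",
    "JohnDeckGutSample1": "JohnDeck",
}


def root_individual(individual_id: str) -> str:
    """Follow ParentIndividualID links up to the top-level Individual."""
    seen = set()
    current = individual_id
    while PARENT_OF[current] is not None:
        if current in seen:
            # Guard against malformed data with circular parent links.
            raise ValueError(f"cycle in parent links at {current}")
        seen.add(current)
        current = PARENT_OF[current]
    return current
```

With links like these, properties recorded once on the parent (for example, where it was extracted from nature) could be looked up for any derived sample instead of being repeated on each record.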

 

Rich

 

 

From: jdeck88 at gmail.com [mailto:jdeck88 at gmail.com] On Behalf Of John Deck
Sent: Friday, May 31, 2013 7:54 AM
To: Richard Pyle
Cc: Markus Döring; Jason Holmberg; TDWG Content Mailing List; Robert
Whitton; Ramona Walls


Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to
material samples

 

Yep--- that reference point for aggregation can be really powerful:  To
provide a working example of how these identifiers would work, and how they
can act to aggregate data elements, consider the following:

IndividualID = JohnDeck

MaterialSampleID = JohnDeckTissueSample1

OccurrenceID = JohnDeckOccurrence123

Taxon = "Homo sapiens"

 

IndividualID = JohnDeck

MaterialSampleID = JohnDeckGutSample1

OccurrenceID = JohnDeckOccurrence124

Taxon = "Bacteria500"

 

IndividualID = JohnDeck

MaterialSampleID = JohnDeckGutSample1

OccurrenceID = JohnDeckOccurrence125

Taxon = "Bacteria501"

 

JohnDeckTissueSample1 is representative of the Individual itself, while
JohnDeckGutSample1 is still associated with the same Individual, but notice
that the taxon has changed and it is a new Occurrence as well.  This approach
allows for some sense to be constructed using a flat file approach if
desired.  Providing a Material Sample BoR for OccurrenceIDs 124 and 125
provides further context.  Meanwhile, we can consider the implications for,
for example, habitat descriptions (... for JohnDeckOccurrence123 maybe I'd
put http://purl.obolibrary.org/obo/ENVO_01000193, "temperate grassland
biome"), but the distinct occurrence records for the gut samples could be
listed as (http://purl.obolibrary.org/obo/ENVO_01000162, "organ").
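
To make the aggregation idea concrete, the following is a minimal Python sketch that treats the three example records as flat rows and groups their OccurrenceIDs by a chosen identifier. The field names follow the example in this message; the helper name group_by and the list-of-dicts layout are assumptions for illustration, not an existing tool.

```python
from collections import defaultdict
from typing import Dict, List

# The three example records from this message, written as flat rows.
RECORDS = [
    {"IndividualID": "JohnDeck", "MaterialSampleID": "JohnDeckTissueSample1",
     "OccurrenceID": "JohnDeckOccurrence123", "Taxon": "Homo sapiens"},
    {"IndividualID": "JohnDeck", "MaterialSampleID": "JohnDeckGutSample1",
     "OccurrenceID": "JohnDeckOccurrence124", "Taxon": "Bacteria500"},
    {"IndividualID": "JohnDeck", "MaterialSampleID": "JohnDeckGutSample1",
     "OccurrenceID": "JohnDeckOccurrence125", "Taxon": "Bacteria501"},
]


def group_by(records: List[dict], key: str) -> Dict[str, List[str]]:
    """Collect OccurrenceIDs under each distinct value of the given field."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["OccurrenceID"])
    return dict(groups)
```

Grouping on MaterialSampleID puts Occurrences 124 and 125 under JohnDeckGutSample1, while grouping on IndividualID gathers all three under JohnDeck.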

 

Another use for the identifier MaterialSampleID -- let's assume we've
expressed an equivalent identifier for a GenBank sample using
MIxS:source_mat_id, a term which references the same OBI:MaterialSample
we're referencing, which allows the two to be linked.  If they're URIs we
can model this in RDF using the MaterialSampleIDs as either subjects or
objects... this gets us a step closer to representing contextual information
in GenBank and DwC without duplicating metadata across systems (GenBank for
sequencing metadata; DwC for environmental context).
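
A minimal sketch of what "subjects or objects" could look like, using plain Python tuples rather than an RDF library. Every URI and every "ex:" predicate below is a made-up placeholder for this example (including the stand-in for MIxS:source_mat_id); none of them are real identifiers.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Placeholder URIs under example.org; not real identifiers.
GUT_SAMPLE = "http://example.org/sample/JohnDeckGutSample1"
TISSUE_SAMPLE = "http://example.org/sample/JohnDeckTissueSample1"

TRIPLES: List[Triple] = [
    # MaterialSampleID as the object of a DwC-side statement.
    ("http://example.org/occurrence/JohnDeckOccurrence124",
     "ex:materialSampleID", GUT_SAMPLE),
    # The same URI as the object of a sequence-side statement
    # (ex:source_mat_id is a hypothetical stand-in for MIxS:source_mat_id).
    ("http://example.org/sequence/record1", "ex:source_mat_id", GUT_SAMPLE),
    # MaterialSampleID as the subject of a statement.
    (GUT_SAMPLE, "ex:derivedFrom", "http://example.org/individual/JohnDeck"),
    # A statement about a different sample, to show the filter is selective.
    ("http://example.org/occurrence/JohnDeckOccurrence123",
     "ex:materialSampleID", TISSUE_SAMPLE),
]


def linked_to(uri: str, triples: List[Triple]) -> List[Triple]:
    """Return every triple that mentions uri as subject or object."""
    return [t for t in triples if t[0] == uri or t[2] == uri]
```

Because both systems point at the same sample URI, a lookup on that one identifier returns the occurrence, the sequence record, and the parent Individual without copying metadata between them.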

There are some issues with this approach of course.  For example, if we
provide a lat/lng for an occurrence that is a gut sample, are we taking the
lat/lng where the gut sample was removed from the organism (which may be
different from where a parent organism was isolated from nature)?  In this
case, we need to assume that we're referring to where the parent organism
was isolated from nature, to be consistent with DwC and implementations in
use.  However, the notion of habitat should vary with the occurrence of the
actual organism (e.g. "organ" vs. "temperate grassland biome").  Thus, we
can still aggregate properties around MaterialSample BoRs that are useful,
but we need to think carefully about what exactly the properties mean that
we assign to these things... but this is no different from issues we've
encountered between other BoRs (Fossil, PreservedSpecimen, or
Human/MachineObservation).

 

John

 

On Thu, May 30, 2013 at 11:48 PM, Richard Pyle <deepreef at bishopmuseum.org>
wrote:

Yes, that’s a fair point!  In a sense, the ID has intrinsic value on its own
if for no other reason than to represent a reference point for aggregation.

 

Nevertheless, I still maintain that if it fulfills that purpose, then it
implies a “thing” (around which other “things” are aggregated), and I can’t
imagine such a “thing” that we would care about for aggregating purposes,
about which we would not associate other property values. 

 

I say all this quite deliberately in reference to “dwc:individualID”, of
course. :-)

 

Aloha,

Rich

 

 

From: Markus Döring [mailto:m.doering at mac.com] 
Sent: Thursday, May 30, 2013 7:56 PM
To: Jason Holmberg
Cc: Richard Pyle; TDWG Content Mailing List; Robert Whitton; John Deck;
Ramona Walls


Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to
material samples

 

The id value is actually very useful and the only trustworthy way of
grouping records, e.g. all occurrences of the same whale.

Markus 


_______________________________________________
tdwg-content mailing list
tdwg-content at lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content





 

-- 
John Deck
(541) 321-0689


