[tdwg-content] New Darwin Core terms proposed relating to material samples

John Deck jdeck at berkeley.edu
Fri May 31 22:45:59 CEST 2013


"..I think we should trend towards leaving DwC as a simple data exchange
paradigm, and focus these more complex conversations on a next-gen ontology
for biodiversity data.  I realize that’s already happening; but it seems
like the “center of mass” for conversation should shift from DwC to the
biodiversity ontology domain."

Agree!!


On Fri, May 31, 2013 at 12:56 PM, Richard Pyle <deepreef at bishopmuseum.org> wrote:

> OK, thanks.  Now I understand it.  This is all related to the taxonomic
> homogeneous/heterogeneous thing.  One thing I should caution, using your
> example data below:
>
> If we assume that JohnDeckGutSample1 was extracted from JohnDeck after
> JohnDeck was extracted from nature, then we have to be careful about
> inferring that the organism Bacteria501 has an occurrence related to the
> place & time where JohnDeck was extracted from nature.  In other words, we
> can’t reliably connect Bacteria501 with the Occurrence of JohnDeck in
> nature, because Bacteria501 might have entered the gut of JohnDeck at some
> later time (e.g., during decomposition).
>
> I also disagree that the location where the gut sample was taken is
> fundamentally different from where the organism was extracted from nature.  We
> definitely need to be able to distinguish Occurrences representing
> “natural” place+time+organism instances from “artificial” instances.
> However, you can’t simply say “extracting a tissue in a lab is
> fundamentally different from extracting an organism in nature.”  The reason
> is that there is a very rich spectrum between those two end points, and no
> clear place along that spectrum where a line can be drawn.  What about a
> specimen of a “naturalized” species in a certain location?  What about an
> organism taken from nature that was born of parents that were brought to
> that place by humans?  What about the organisms that were themselves
> brought by humans, then released, then recaptured?  What about captive
> organisms or plants in a person’s garden?  This spectrum continues all the
> way down to extracting a gut sample from a specimen collected in Moorea, in
> a lab at Berkeley. The degree of “naturalness” of an Occurrence is
> certainly important, but it’s not Boolean, and it’s only one axis of
> interest, so we shouldn’t simply assume dc:location represents some kinds
> of locations, but not others.
>
> I think the basic problem is that, as has already been stated, DwC emerged
> from the collections-based world, where one specimen = one occurrence, and
> that occurrence is naturally regarded as being the occurrence when+where
> the specimen was extracted from nature.  Now that we have such a diversity
> of data we are trying to manage, this Occurrence-centric approach (with its
> overloaded notion of an “Occurrence”) is being stretched to the breaking
> point.
>
> I think we should trend towards leaving DwC as a simple data exchange
> paradigm, and focus these more complex conversations on a next-gen ontology
> for biodiversity data.  I realize that’s already happening; but it seems
> like the “center of mass” for conversation should shift from DwC to the
> biodiversity ontology domain.
>
> Aloha,
>
> Rich
>
> *From:* jdeck88 at gmail.com [mailto:jdeck88 at gmail.com] *On Behalf Of *John
> Deck
> *Sent:* Friday, May 31, 2013 9:17 AM
> *To:* Richard Pyle
> *Cc:* TDWG Content Mailing List; Robert Whitton; Ramona Walls
>
> *Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to
> material samples
>
> Since it was a gut sample, we'll be seeing lots of stuff in there, maybe
> even 1000 different taxa.  Each one can be a distinct occurrence, just as
> JohnDeckOccurrence124 and JohnDeckOccurrence125 are different taxa.
>
> Now, in the sense of Event/location, I'm taking the location details to be
> those from when it was isolated from nature, and thus the location would be
> the same as the whole-organism location.  In our recent workshop on this same
> issue in Copenhagen we went around on this for a while but
> decided that we should take "location" to mean the location at which
> whatever parent organism was isolated in nature.  Certainly the location
> where the gut contents were extracted (e.g. in the lab) is important too,
> but that is something different and not represented by dc:location or in
> DwC in general.  This is something of an interpretation of the actual term,
> but since our implementation model of DwCA is still using "occurrence"
> at the core we probably don't have much choice, since GBIF does not use a
> graph-based parser.  The other option is to represent this all using
> event/location at the core of a DwCA, but we felt this introduced yet more
> complexities and breaks other cases where we would want to hang data off of
> occurrence (e.g. Identification), and would not be able to do so if
> event/location lived at the core.
>
> John
>
> On Fri, May 31, 2013 at 12:02 PM, Richard Pyle <deepreef at bishopmuseum.org>
> wrote:
>
> Thanks, John – this is REALLY helpful!
>
> A couple questions – can you expand a bit on the differences between
> JohnDeckOccurrence123, 124, and 125?  I’m assuming that
> JohnDeckOccurrence123 is associated with the Event representing the time &
> place when JohnDeckTissueSample1 was removed from JohnDeck.  I’m guessing
> that JohnDeckOccurrence124 is associated with the Event representing the
> time & place when JohnDeckGutSample1 was removed from JohnDeck.  What I
> don’t understand is why there needs to be a JohnDeckOccurrence125.  What
> Occurrence does that represent?  Later you suggest that JohnDeck
> (WholeOrganism) was extracted from nature. Is the extraction-from-nature
> Occurrence one of these three Occurrences?
>
> What you describe below is consistent with our approach to treating
> materialSample as a subclass of Individual (assuming a hierarchical
> Individual, which means that ParentIndividualID of both
> JohnDeckTissueSample1 and JohnDeckGutSample1 is IndividualID=JohnDeck).
> The nice thing about the hierarchical approach is that it deals with the
> problem you describe in the last paragraph.
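
To make the hierarchical idea concrete, here is a minimal sketch, assuming
each sample is stored as an Individual with a ParentIndividualID pointing at
the organism it came from. The dictionary layout, the "event" field, and the
placeholder event string are assumptions for illustration; none of them are
DwC terms or part of the proposal itself.

```python
# Hypothetical table of hierarchical Individuals.  "parent" plays the role of
# ParentIndividualID; "event" is an assumed placeholder for the Event in which
# the whole organism was extracted from nature.
individuals = {
    "JohnDeck": {"parent": None, "event": "extraction-from-nature (placeholder)"},
    "JohnDeckTissueSample1": {"parent": "JohnDeck", "event": None},
    "JohnDeckGutSample1": {"parent": "JohnDeck", "event": None},
}


def root_individual(individual_id, table):
    """Follow parent links until an Individual with no parent is reached."""
    seen = set()
    current = individual_id
    while table[current]["parent"] is not None:
        if current in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError("cycle in parent links at " + current)
        seen.add(current)
        current = table[current]["parent"]
    return current


def extraction_event(individual_id, table):
    """Return the event recorded on the root organism of a sample."""
    return table[root_individual(individual_id, table)]["event"]


gut_root = root_individual("JohnDeckGutSample1", individuals)
gut_event = extraction_event("JohnDeckGutSample1", individuals)
```

Under these assumptions both samples resolve to JohnDeck, so the
place-and-time question is answered once, on the parent organism, rather
than separately on each derived sample.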
>
> Rich
>
> *From:* jdeck88 at gmail.com [mailto:jdeck88 at gmail.com] *On Behalf Of *John
> Deck
> *Sent:* Friday, May 31, 2013 7:54 AM
> *To:* Richard Pyle
> *Cc:* Markus Döring; Jason Holmberg; TDWG Content Mailing List; Robert
> Whitton; Ramona Walls
>
> *Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to
> material samples
>
> Yep--- that reference point for aggregation can be really powerful:  To
> provide a working example of how these identifiers would work, and how they
> can act to aggregate data elements, consider the following:
>
> IndividualID = JohnDeck
> MaterialSampleID = JohnDeckTissueSample1
> OccurrenceID = JohnDeckOccurrence123
> Taxon = "Homo sapiens"
>
> IndividualID = JohnDeck
> MaterialSampleID = JohnDeckGutSample1
> OccurrenceID = JohnDeckOccurrence124
> Taxon = "Bacteria500"
>
> IndividualID = JohnDeck
> MaterialSampleID = JohnDeckGutSample1
> OccurrenceID = JohnDeckOccurrence125
> Taxon = "Bacteria501"
>
> JohnDeckTissueSample1 is representative of the Individual itself, while
> JohnDeckGutSample1 is still associated with the same Individual, but notice
> that the taxon has changed and it is a new Occurrence as well.  This approach
> allows for some sense to be constructed using a flat file approach if
> desired.  Providing a Material Sample BoR for OccurrenceIDs 124 and 125
> provides further context.  Meanwhile, we can consider the implications of,
> for example, habitat descriptions (... for JohnDeckOccurrence123 maybe I'd
> put http://purl.obolibrary.org/obo/ENVO_01000193, "temperate grassland
> biome"), but the distinct occurrence records for the gut samples could be
> listed as (http://purl.obolibrary.org/obo/ENVO_01000162, "organ").
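
As a rough illustration of the flat-file aggregation described above, the
three example records can be grouped on either identifier. This is only a
sketch: the list-of-dicts layout and the group_by helper are assumptions,
not part of any DwC tooling.

```python
from collections import defaultdict

# The three example records from this thread, one dict per flat-file row.
records = [
    {"IndividualID": "JohnDeck", "MaterialSampleID": "JohnDeckTissueSample1",
     "OccurrenceID": "JohnDeckOccurrence123", "Taxon": "Homo sapiens"},
    {"IndividualID": "JohnDeck", "MaterialSampleID": "JohnDeckGutSample1",
     "OccurrenceID": "JohnDeckOccurrence124", "Taxon": "Bacteria500"},
    {"IndividualID": "JohnDeck", "MaterialSampleID": "JohnDeckGutSample1",
     "OccurrenceID": "JohnDeckOccurrence125", "Taxon": "Bacteria501"},
]


def group_by(rows, key):
    """Hypothetical helper: map each value of `key` to its OccurrenceIDs."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["OccurrenceID"])
    return dict(groups)


# All three occurrences hang off the same Individual ...
by_individual = group_by(records, "IndividualID")
# ... while the two bacterial occurrences share one gut sample.
by_sample = group_by(records, "MaterialSampleID")
```

Grouping on IndividualID pulls all three occurrences together, while
grouping on MaterialSampleID separates the tissue sample from the gut
sample and keeps the two bacterial occurrences side by side.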
>
> Another use for the identifier MaterialSampleID -- let's assume we've
> expressed an equivalent identifier for a GenBank sample using
> MIxS:source_mat_id, a term which references the same OBI:MaterialSample
> we're referencing.  If they're URIs we can model this in RDF
> using the MaterialSampleIDs as either subjects or objects... this gets us
> a step closer to representing contextual information in GenBank and DwC
> without duplicating metadata across systems (GenBank for sequencing metadata;
> DwC for environmental context).
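
A small sketch of what "subjects or objects" could look like, written as
N-Triples strings with no RDF library. Every example.org URI below,
including the derivedFrom predicate, is a made-up placeholder; only
owl:sameAs is a real term, and using it for this link is itself an
assumption rather than something the proposal specifies.

```python
# Real predicate URI from OWL; whether it is the right link here is an assumption.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Hypothetical predicate and identifiers, for illustration only.
DERIVED_FROM = "http://example.org/terms/derivedFrom"
DWC_SAMPLE = "http://example.org/sample/JohnDeckGutSample1"
GENBANK_SAMPLE = "http://example.org/genbank/source_mat_id/JohnDeckGutSample1"
OCCURRENCE = "http://example.org/occurrence/JohnDeckOccurrence124"


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose three parts are all URIs."""
    return "<{}> <{}> <{}> .".format(subject, predicate, obj)


triples = [
    # MaterialSampleID as the object: an occurrence points at its sample.
    triple(OCCURRENCE, DERIVED_FROM, DWC_SAMPLE),
    # MaterialSampleID as the subject: the sample points at its GenBank twin.
    triple(DWC_SAMPLE, OWL_SAME_AS, GENBANK_SAMPLE),
]

for line in triples:
    print(line)
```

With a link like the second statement in place, the environmental context
stays on the DwC side and the sequencing metadata stays on the GenBank
side, joined only through the shared sample identifier.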
>
> There are some issues with this approach, of course.  For example, if we
> provide a lat/lng for an occurrence that is a gut sample, are we taking the
> lat/lng where the gut sample was removed from the organism (which may be
> different from where a parent organism was isolated from nature)?  In this
> case, we need to assume that we're referring to where the parent organism was
> isolated from nature to be consistent with DwC and implementations in use.
> However, the notion of habitat should vary with the occurrence of the
> actual organism (e.g. "organ" vs. "temperate grassland biome").  Thus, we
> can still aggregate properties around MaterialSample BoRs that are useful,
> but we need to think carefully about what exactly the properties mean that
> we assign to these things.... but this is no different than issues we've
> encountered between other BoRs (Fossil, PreservedSpecimen, or
> Human/MachineObservation).
>
> John
>
> On Thu, May 30, 2013 at 11:48 PM, Richard Pyle <deepreef at bishopmuseum.org>
> wrote:
>
> Yes, that’s a fair point!  In a sense, the ID has intrinsic value on its
> own if for no other reason than to represent a reference point for
> aggregation.
>
> Nevertheless, I still maintain that if it fulfills that purpose, then it
> implies a “thing” (around which other “things” are aggregated), and I can’t
> imagine such a “thing” that we would care about for aggregating purposes,
> with which we would not associate other property values.
>
> I say all this quite deliberately in reference to “dwc:individualID”, of
> course…. :-)
>
> Aloha,
>
> Rich
>
> *From:* Markus Döring [mailto:m.doering at mac.com]
> *Sent:* Thursday, May 30, 2013 7:56 PM
> *To:* Jason Holmberg
> *Cc:* Richard Pyle; TDWG Content Mailing List; Robert Whitton; John Deck;
> Ramona Walls
>
> *Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to
> material samples
>
> The id value is actually very useful and the only trustworthy way of
> grouping records, e.g. all occurrences of the same whale.
>
> Markus
>
>
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
> --
> John Deck
> (541) 321-0689


-- 
John Deck
(541) 321-0689