Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Thanks Ramona;
Actually, the basic elements of our data model precede DwC by quite a bit. What we've tried to do, however, is mold the data model to be more compatible with DwC, to make the task of mapping for data export & exchange that much easier. Of course, DwC is not (and is not intended to be) a data model in any sense of the word; however, it's impossible to avoid representing core elements of a bona-fide data model within DwC. This is especially true when it comes to each of the "ID" terms (and doubly-especially true when the "ID" terms correlate to class terms). The existence of an "ID" term implies that some class of object exists to which an "ID" value is applied. The "ID" value itself is never useful data/metadata; it is just a way to reference a data record that (presumably) contains properties that can be expressed as data/metadata for the object represented by the "ID" value.
This was all well-understood when the original DwC was being drafted; but as it evolved into its current iteration (with the addition of all the ID terms), it has been drawn ever more (in some ways subtly, and in some ways not-so-subtly) in the direction of a data model.
Of course, what we all (desperately!) need is a robust ontology that fits our world. The task is not easy, in part because our data domain is not so easily modeled, in part because different sections of our broader community have different priorities, and in part because there is always a delicate balance between developing a model or ontology that is practical and useful for the data providers and consumers, and one that is robust and detailed and flexible, to allow asking questions of the data that were never even considered at the time the model/ontology was conceived. The parallel experience in database modeling is normalization (as Paul Kirk likes to say: "Normalize until it hurts; then de-normalize until it works").
The original DwC was completely flat. The current DwC moved in the direction of more complex structures by clustering terms into classes and sprinkling with "ID" terms. It even tip-toed into RDF-space with dwc:ResourceRelationship. I think that's definitely an improvement, but it still must strike a delicate balance between the needs of some to represent a robust data model, and the needs of others to have a simple/practical mechanism to exchange biodiversity data in a standard form. It will never be all things to all people; but at least it is enough things to enough people that it represents an important "flag pole" around which our community has (more or less) successfully rallied.
Hmmmm... Now I've forgotten what my point was. I guess I was just in a ramblin' mood. Well... sorry about the bandwidth!
Aloha,
Rich
From: Ramona Walls [mailto:rlwalls2008@gmail.com] Sent: Thursday, May 30, 2013 6:05 PM To: Richard Pyle Cc: Jason Holmberg; TDWG Content Mailing List; John Deck; Robert Whitton Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Jason,
Thanks for sharing how you have been using the Darwin Core terms. I am intrigued by the data structure you have developed. It is quite interesting how both you and Rich have adapted the DwC to fit your specific needs. While I am often troubled by the vagueness of DwC, I guess in some ways it is that vagueness that allows it to be used in many different applications. Of course, I don't think vagueness is necessary for wide application, or a good thing for data exchange, but it does seem to be working for a lot of different purposes.
Ramona
On Thu, May 30, 2013 at 3:07 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Jason,
Many thanks for this input. If I understand you correctly, then you are using "Encounter" as equivalent to what we have been using "Occurrence" for. That is, by our definition, an "Occurrence" is the instance representing an intersection between an Event (i.e., where, when, who, etc.) and what we have been calling an "Individual" (i.e., what); and the properties we attach to the Occurrence are the "how" bits (including things like size, etc.).
In my mind, the essence of an "Individual" is the collective physical material of the individual. If I see a fish on a reef, its "Occurrence" on that reef and at that time exists (and is worth documenting) regardless of whether I took an image of it (what we would call "Evidence"), or whether I took a tissue sample from it, or whether I collected and killed the whole damn thing. To me, the "essence" of the individual, or its occurrence at an event, is unaffected by what I end up doing to it. By extension, following a hierarchical model of "individual", a sub-sample (materialSample) extracted from it is just another instance of "Individual". This is why I generally think of "materialSample" (if it were represented as a class, which is currently not proposed for DWC) as a subclass of a broader concept (e.g., "material", but what I have naively been referring to as "Individual").
That part of our model has proven to be very stable and effective for representing the information as we want it.
Where it gets complicated is instances of taxonomically heterogeneous objects treated as a single "individual", which (in my mind) includes such things as soil samples.
In that context, I see (and agree with) John and others that really it's a separate axis of classification from what I have called "Individual".
I don't expect that to make a lot of sense (I barely understand it myself).
Aloha,
Rich
From: Jason Holmberg [mailto:holmbergius@gmail.com] Sent: Thursday, May 30, 2013 11:28 AM To: Richard Pyle Cc: Ramona Walls; TDWG Content Mailing List; John Deck; Robert Whitton
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Hi everyone.
List lurker here. DWC has been a great inspiration in my work, so I hope I can contribute some small amount of insight on the "individual" and "material sample" threads. I have no grand thoughts on the subject, but I can tell you how the DWC has inspired my own information architecture for open source mark-recapture software:
http://www.ecoceanusa.org/shepherd/doku.php?id=manual:2.0.x:1_overview
I felt the very clear need for a distinct Individual Class and to separate that from the concept of a sample taken from nature. When reviewing DWC, I interpreted Occurrence.individualCount to be somewhat contradictory to Occurrence.individualID, so I created a one-individual-at-a-point-in-time class called Encounter that reuses quite a bit of DWC.Occurrence. Occurrence I then broadened to include the potential for multiple marked individuals.
I neither present this as "right" nor "good" (though they have worked very well for us). I just present it as a practical example from mark-recapture in which we have tried to adhere to DWC in order to expose data to GBIF, iOBIS, etc. The concepts of "material sample" and "individual" are very important to us, and this is how we have defined them.
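A minimal sketch of the Encounter/Occurrence split described above, not the actual code of the mark-recapture software. The class and field names (`Encounter`, `Occurrence`, `encounter_id`, `individual_count`, and so on) are hypothetical, and treating the count as the number of encounters is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Encounter:
    """One individual at one point in time (hypothetical fields)."""
    encounter_id: str
    event_date: str
    # None until the animal has been matched to a known marked individual.
    individual_id: Optional[str] = None


@dataclass
class Occurrence:
    """A broadened occurrence that may group several encounters."""
    occurrence_id: str
    encounters: List[Encounter] = field(default_factory=list)

    @property
    def individual_count(self) -> int:
        # Assumption: one encounter per animal seen, so the count is derived.
        return len(self.encounters)

    @property
    def individual_ids(self) -> List[str]:
        # Only encounters already matched to a marked individual carry an ID.
        return sorted({e.individual_id for e in self.encounters if e.individual_id})
```

With this arrangement the count and the individual identifiers no longer compete on a single record: the count lives on the grouping object, and each identifier lives on exactly one encounter.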
Cheers,
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
Please consider adopting a shark to support our mission: http://www.whaleshark.org/adoptashark.jsp
On Wed, May 29, 2013 at 4:13 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Ramona,
Yes, I agree, and thanks. I've always felt that there has been a trend towards trying to push too much "ontology" (or other semantic meaning) onto DWC terms and classes, when DWC was fundamentally intended to represent a mechanism for data exchange; not a mechanism to describe the ontological landscape of biodiversity data. The only reason I brought this up now (and, I think, why we discussed it in 2010), is that the term "individualID" in DWC sort of hinted that something like "Individual" was the "forgotten class" for DWC. I sincerely hope that BCO and DSW gain more traction (and, ideally, harmony between them) than earlier attempts at developing ontologies in this space have met; clearly that is the right path forward.
My main concern for this thread (and the reason I engaged in it), was to:
1) Find out the status of the discussions that began in 2010; and
2) Clarify where the current materialSample proposal overlaps, or does not overlap, with that earlier effort.
Steve has very adequately answered the first question, and you, John, and others have answered the second, and I'm happy with both sets of answers.
I'm sorry for the voluminous exchange, but I felt the discussion was both important and very helpful (certainly to me).
Aloha,
Rich
From: Ramona Walls [mailto:rlwalls2008@gmail.com] Sent: Wednesday, May 29, 2013 1:03 PM To: Richard Pyle Cc: John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Hi Rich,
Sorry I didn't mention this sooner, but your emails were also helpful to me in describing an important and generalizable use case.
I don't know whether or not the TDWG community is ready to deal with the level of abstraction we are talking about, but my assessment is that whether or not they are ready, the Darwin Core is not constructed to deal with it. That is why (among other reasons) we started work on the BCO, and perhaps one reason why Steve and others developed DSW.
Our goal with the material sample proposal was not to overhaul DwC, but to work within the DwC framework to make it more compatible with other standards such as MIxS. That is why we tried to keep our proposal fairly narrowly focused.
Ramona
On Wed, May 29, 2013 at 3:21 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Thanks, Ramona; this is an *extremely* helpful email! It helps clear things up a lot in my mind.
Just to be clear, what I am looking for is the notion of a defined physical object (what I think you mean by "material entity"), and I explicitly mean the material entity itself. Yes, there is information (properties) that relate to that material entity, but to me that is a separate issue. What I would like to see clearly defined is the class representing the material (physical) entity, which seems to me to be a superclass of what materialSample is intended to represent.
Perhaps our (TDWG/DWC) community is not yet ready to deal with this level of abstraction (unfortunately, I absolutely have to, because Occurrence is simply way too overloaded a class for me to use independently of what I have been calling "Individual" and what I have been calling "Evidence"). In that case, I guess the best thing to do is accept materialSample as a basisOfRecord for Occurrence and move on. But this is more or less the same thing that happened the last time we engaged in this conversation (2 years or so ago), and I was hoping this conversation about materialSample could leverage progress on the larger issue.
As I've said before, the last thing I want to do is confuse or otherwise slow down the process of incorporating the term materialSample into DWC. It's just that I saw enough overlap with that other issue that I was hoping we could find a reasonable pathway forward on both.
Thanks again for the very helpful comments.
Aloha,
Rich
From: Ramona Walls [mailto:rlwalls2008@gmail.com] Sent: Wednesday, May 29, 2013 9:14 AM To: Richard Pyle Cc: John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Rich,
I now understand more fully what you are asking for (a clear definition goes a long way!). A material sample, as we discussed it at the Kansas and Oxford workshops, does indeed need to be physically removed from its environment. This is also the case with the OBI term "material sample", which, as a subclass of OBI:specimen, is the output of some collecting process. It is true that the concept of material sample could be defined to include sampling in an observational sense, but that is not how it is defined at this point. Based on this, material sample is NOT the term you are looking for, which you defined as:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
What you have defined is a category of information (whatever that may be) that pertains to some material entity. Not the material entity itself, but information about that entity. The "SuperclassTerm" you refer to in the definition sounds an awful lot like a material entity from the Basic Formal Ontology, which is used for defining material sample in OBI and the Bio-collections Ontology.
Ramona
On Wed, May 29, 2013 at 11:51 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Many thanks, John. This is extremely helpful!
First of all, in the context of a distinct term for basisOfRecord, I see absolutely no problem with adding the term MaterialSample. I fully support your proposal (although if this is simply a basisOfRecord term to be used alongside Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation; does it need a defined ID term? Do all the others have defined ID terms?).
However, I'm excited by this conversation because I think we are very close to solving a bigger problem (which was the focus of the 2010 discussion on this list around IndividualOrganism).
This bigger problem involves the need for a defined concept (I'm hesitating to say "class"), and an associated ID, in dwc that refers to the physical/material basis of an Occurrence. We don't yet have a term for this concept in dwc (IndividualID hinted at the need for one, but that term was not well-defined, and the term itself seems to cause confusion). As Steve Baskauf and I have both been advocating for the establishment of a new class in dwc for exactly this purpose, I just want to make sure that we're on the same page about what each concept is. The more I understand about what you need for materialSample, the more convinced I am that both of our needs can be met with the same concept.
I am perfectly happy to adopt the term MaterialSample, but I guess it all boils down to this: In order for something to be a MaterialSample, must it necessarily be removed from nature?
If the answer is "No", then I think we can merge the two concepts into one.
If the answer is "Yes", then I think materialSample is best characterized as a subclass of what Steve and I have been pushing for (setting aside, for the moment, the additional complexity of taxonomically homogeneous vs. heterogeneous).
In the latter case, I would define the superclass (whatever term is used for it), along the lines of:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
Aloha,
Rich
From: jdeck88@gmail.com [mailto:jdeck88@gmail.com] On Behalf Of John Deck Sent: Wednesday, May 29, 2013 4:01 AM To: Richard Pyle Cc: Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton; Ramona Walls
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Since the original proposal was from a group of folks, we decided to put our heads together to construct a general response to the various issues and ideas expressed on this thread.
John Deck for Rob Guralnick, Ramona Walls, and John Wieczorek
In the text of the issue submitted for MaterialSample (https://code.google.com/p/darwincore/issues/detail?id=167) we noted cases where the current basisOfRecord terms pertaining to the Occurrence class (Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation) do not adequately cover certain cases, including: environmental sample (for metagenomic analysis), transcriptomes (measuring genes but not taxa), and destructive samples (e.g. tissues destructively sampled in order to generate genomic DNA). The term we borrowed from OBI (http://purl.obolibrary.org/obo/OBI_0000747) is broad enough to be utilized across various cases that fulfill our criteria while still maintaining a consistent, clear and human understandable meaning. For our purposes, we can think of Material Sample as any type of matter that we can use in order to derive further evidence needed for identifications and taxa, whether it is taxonomically homogeneous, heterogeneous, a single individual, sets of individuals, or populations.
How is MaterialSample different from Individual? The intent of individualID is fairly clear: since an Occurrence represents an organism at a place and time (per Markus' email), the individualID term allows us to assign an instance identifier for a particular organism that can be present in multiple events. MaterialSampleID, on the other hand, is intended to allow users to say that the basis of an occurrence is a material entity (i.e. matter) that has been sampled according to some particular method. Whether or not this material entity is an individual (sensu individualID in DwC) is an independent axis of classification. As was already pointed out, there is no restriction on specifying that an occurrence is associated with more than one type, so any occurrence can have both an individualID and a materialSampleID.
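As a rough illustration of the "independent axis" point, here is a sketch of a flat occurrence record carrying both identifiers. The term names are the ones discussed in this thread (materialSampleID as proposed); the identifier values and the helper `classification_axes` are made up for the example.

```python
# A flat, DwC-style record expressed as a plain dict. All values are invented.
occurrence = {
    "occurrenceID": "occ-001",
    "basisOfRecord": "MaterialSample",
    "individualID": "tree-42",                 # which organism this is about
    "materialSampleID": "sample-2013-05-27",   # which sampled matter backs the record
}


def classification_axes(record):
    """Report the two axes separately; neither one implies the other."""
    return {
        "refers_to_individual": bool(record.get("individualID")),
        "is_material_sample": bool(record.get("materialSampleID")),
    }
```

A soil sample would typically populate only materialSampleID, a repeat sighting of a marked animal only individualID, and a tissue sample from a known tree both.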
We maintain our position on the proposal for MaterialSample as a value for the basisOfRecord, with an associated materialSampleID to identify instances of them. Per Steve's initial comments, we have already withdrawn the proposal for a MaterialSample class distinct from that in the Darwin Core type vocabulary, which should make it easier to evaluate the implications of what we're discussing.
********************
NOTES, MaterialSample from OBI:
OBI has fairly broad definitions of samples & specimens that are meant to be utilized across many different scientific activities. Material Sample is defined as a material entity that has the material sample role, while a material sample role is defined as a specimen role borne by a material entity that is the output of a material sampling process, and a material sampling process is a specimen gathering process with the objective to obtain a specimen that is representative of the input material entity.
On Mon, May 27, 2013 at 11:59 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Markus,
Great question! Particularly because this is exactly the sort of use case we designed our model around.
> if you take a tissue sample of the same tree every year, would the identifier in individualID be the same for all of them or be different? With the current dwc:individualID definition it would be the same for all samples. If I understand you correctly each sample would have its own "individual" identifier in your proposal? I can't see how you can collapse these two things into one definition.
No, that is not how we would handle it.
In our model, there would be one IndividualID to represent the tree, spanning the time period beginning (more or less) when the seed was germinated, until the time at which the entire physical structure of the tree was disintegrated. It is an individual tree.
There would be multiple Occurrence instances, for each time that someone observed or sampled or otherwise wished to document some condition of that tree. All of these Occurrence instances would refer to the same individualID value (i.e., the "tree"). In the example above, this means there would be a different Occurrence instance for each year that a sample is taken -- because in each case, an assertion that the full tree existed at a certain time and place can be made (I understand that trees tend not to move around very much, so the Location for each event associated with each Occurrence would, in this case, remain the same; but the other Event properties -- such as eventID, samplingProtocol, samplingEffort, eventDate, eventTime, startDayOfYear, endDayOfYear, year, month, day, verbatimEventDate, habitat, fieldNumber, fieldNotes, eventRemarks -- would be documented accordingly for each sampling Occurrence instance).
Suppose that the tree is visited every month, but only sampled once per year. In that case, there would be an Occurrence record for every monthly visit. In other words, an Occurrence instance exists regardless of whether a physical sample was made or not. Any in-situ images made of the tree would likewise be associated with the specific Occurrence instance, and each image would represent a separate instance of "Evidence".
Now, let's focus on the annual samplings. Every time a new sample is taken from the tree, at least one new instance of Individual (with a unique individualID value) is created to represent the sample. This sample (individual instance) may be a "gathering" (set of multiple individual specimens gathered at the same time), or it may be a single specimen, or it may be simply a tissue sample intended for destructive analysis. In any case, it's a new individual instance derived from the "parent" individual instance representing the whole tree. In our implementation, "Individual" can be hierarchical, such that a whole-organism tree can be sub-sampled with many "child" instances of "gatherings" (say, one gathering each year), and each gathering may have multiple child "specimen" individuals (e.g. individual botanical sheets created from the multiple items of a single gathering), and each specimen may have further "child" subsamples extracted for DNA analysis (or whatever), and the hierarchy can continue on down to whatever derivatives that people feel a need to keep track of (e.g., aliquot).
The point is, all Individual instances are well-defined physical objects (or well-defined sets of physical objects), and they can be arranged in an n-tiered hierarchy.
Moreover, each Individual that can be characterized as a "sample" (what we refer to as a "CollectionObject") may also have a property value for "CollectionOccurrenceID" -- which refers to the specific Occurrence instance at which the sample was obtained.
So, for example, if the tree is visited on May 27, 2013 and a specimen (sample) is taken, then:
1) An Event instance is generated to represent the event where the tree was visited;
2) An Occurrence instance is generated, which refers to the new EventID and the existing IndividualID for the whole tree, and includes whatever other Occurrence properties are relevant for the tree at the time of this Occurrence; and
3) An Individual instance is generated for the specimen, which has a property value for parentIndividualID that refers to the individualID for the whole tree, and a property value for collectionOccurrenceID that refers to the Occurrence instance where the specimen was collected.
So, to summarize the answer to your question:
- There are multiple Occurrence instances that refer to the same Individual instance representing the whole tree (and, hence can be collapsed to the same IndividualID value).
- Any Individual can have derivatives that are themselves unique Individual instances.
- Individuals are arranged hierarchically, and certain properties can be inherited up or down the hierarchy, depending on the properties and their associated logical constraints.
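The tree example can be sketched roughly as follows. This is only an illustration under assumptions, not our actual schema: the Python class layout, the snake_case field names, and the helpers `lineage` and `occurrences_for` are hypothetical, and only the ideas of a parent link (parentIndividualID) and a collecting-occurrence link (collectionOccurrenceID) come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    event_id: str
    event_date: str


@dataclass
class Occurrence:
    occurrence_id: str
    event_id: str        # the where/when
    individual_id: str   # the what


@dataclass
class Individual:
    individual_id: str
    parent_individual_id: Optional[str] = None      # link to the "parent" Individual
    collection_occurrence_id: Optional[str] = None  # Occurrence at which it was sampled


def lineage(individual_id: str, individuals: Dict[str, Individual]) -> List[str]:
    """Walk parent links from a derivative up to the root Individual."""
    chain: List[str] = []
    seen = set()
    current_id: Optional[str] = individual_id
    while current_id is not None and current_id not in seen:  # guard against cycles
        seen.add(current_id)
        chain.append(current_id)
        current_id = individuals[current_id].parent_individual_id
    return chain


def occurrences_for(individual_id: str, occurrences: List[Occurrence]) -> List[str]:
    """All Occurrence IDs that refer to the given Individual."""
    return [o.occurrence_id for o in occurrences if o.individual_id == individual_id]


# The May 27, 2013 visit: one Event, one Occurrence of the whole tree,
# one specimen Individual derived from the tree, and one further aliquot.
event = Event("evt-2013-05-27", "2013-05-27")
tree = Individual("tree-1")
visit = Occurrence("occ-1", event.event_id, tree.individual_id)
specimen = Individual("specimen-1", parent_individual_id="tree-1",
                      collection_occurrence_id="occ-1")
aliquot = Individual("aliquot-1", parent_individual_id="specimen-1")
individuals = {i.individual_id: i for i in (tree, specimen, aliquot)}
```

Here `lineage("aliquot-1", individuals)` walks aliquot, specimen, tree, and a later monthly visit would simply add another Occurrence pointing at `tree-1` without creating any new Individual.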
At some point, I will assemble a set of other specific use cases, and how we manage them through our use of the "Individual" instance (although I will probably not use the word "Individual", as this seems to cause too much confusion in these discussions).
Aloha, Rich
> The "ID" value itself is never useful data/metadata - it is just a way to reference a data record that (presumably) contains properties that can be expressed as data/metadata for the object represented by the "ID" value.
Too true! I have been in so many conversations wherein complex ID schemes for wildlife population members have been generated, and my response has always been "The ID does not matter so long as it is unique."
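A tiny sketch of that principle, using Python's standard `uuid` module. The in-memory `records` store and the helper names `new_opaque_id` and `register` are hypothetical; the point is only that the identifier carries no meaning and all useful data lives in the record it points to.

```python
import uuid

# Hypothetical in-memory store: opaque ID -> properties of the thing identified.
records = {}


def new_opaque_id() -> str:
    """Generate an identifier that encodes nothing about the object."""
    return str(uuid.uuid4())


def register(properties: dict) -> str:
    """Store the properties under a fresh opaque ID and return that ID."""
    record_id = new_opaque_id()
    records[record_id] = properties
    return record_id
```

Anything one might be tempted to encode in the ID (site, year, sex) goes into the properties instead, where it can be corrected later without changing the identifier.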
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
Please consider adopting a shark to support our mission: http://www.whaleshark.org/adoptashark.jsp
On Fri, May 31, 2013 at 12:24 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Thanks Ramona;****
Actually, the basic elements of our data mode precede DwC by quite a bit. What we’ve tried to do, however, is mold the data model to be more compatible with DwC, to make the task of mapping for data export & exchange that much easier. Of course, DwC is not (and is not intend to be) a data model in any sense of the word; however, it’s impossible to avoid representing core elements of a bona-fide data model within DwC. This is especially true when it comes to each of the “ID” terms (and doubly-especially true when the “ID” terms correlate to class terms). The existence of an “ID” term implies that some class of object exists to which an “ID” value is applied. The “ID” value itself is never useful data/metadata – it is just a way to reference a data record that (presumably) contains properties that can be expressed as data/metadata for the object represented by the “ID” value.****
This was all well-understood when the original DwC was being drafted; but as it evolved into its current iteration (with the addition of all the “ID” terms), it has been drawn ever more (in some ways subtly, and in some ways not-so-subtly) in the direction of a data model.****
Of course, what we all (desperately!) need is a robust ontology that fits our world. The task is not easy in part because our data domain is not so easily modeled, in part because different sections of our broader community have different priorities, and in part because there is always a delicate balance between developing a model or ontology that is practical and useful for the data providers and consumers, with one that is robust and detailed and flexible, to allow asking questions of the data that were never even considered at the time the model/ontology were conceived. The parallel experience in database modeling is normalization (as Paul Kirk likes to say: “Normalize until it hurts; then de-normalize until it works”).****
The original DwC was completely flat. The current DwC moved into the direction of more complex structures by clustering terms into classes and sprinkling with “ID” terms. It even tip-toed into RDF-space with dwc:ResourceRelationship. I think that’s definitely an improvement, but it still must strike a delicate balance between the needs by some to represent a robust data model, and the needs by others to have a simple/practical mechanism to exchange biodiversity data in a standard form. It will never be all things to all people; but at least it is enough things to enough people that it represents an important “flag pole” around which our community has (more or less) successfully rallied.****
Hmmmm…. Now I’ve forgotten what my point was. I guess I was just in a ramblin’ mood. Well….sorry about the bandwidth!****
Aloha,****
Rich****
*From:* Ramona Walls [mailto:rlwalls2008@gmail.com] *Sent:* Thursday, May 30, 2013 6:05 PM *To:* Richard Pyle *Cc:* Jason Holmberg; TDWG Content Mailing List; John Deck; Robert Whitton
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples****
Jason,
Thanks for sharing how you have been using the Darwin Core terms. I am intrigued by the data structure you have developed. It is quite interesting how both you and Rich have adapted the DwC to fit your specific needs. While I am often troubled by the vagueness of DwC, I guess in some ways it is that vagueness allows it to be used in many different applications. Of course, I don't think vagueness is necessary for wide application, or a good thing for data exchange, but it does seem to be working for a lot of different purposes.****
Ramona****
On Thu, May 30, 2013 at 3:07 PM, Richard Pyle deepreef@bishopmuseum.org wrote:****
Hi Jason,****
Many thanks for this input. If I understand you correctly, then you are using “Encounter” as equivalent to what we have been using “Occurrence” for. That is, by our definition, an “Occurrence” is the instance representing an intersection between an Event (i.e., where, when, who, etc.) and what we have been calling an “Individual” (i.e., what); and the properties we attach to the Occurrence are the “how” bits (including things like size, etc.).****
In my mind, the essence of an “Individual” is the collective physical material of the individual. If I see a fish on a reef, its “Occurrence” on that reef and at that time exists (and is worth documenting) regardless of whether I took an image of it (what we would call “Evidence”), or whether I took a tissue sample from it, or whether I collected and killed the whole damn thing. To me, the “essence” of the individual – or its occurrence at an event – is unaffected by what I end up doing to it. By extension, following a hierarchical model of “individual”, a sub-sample (materialSample) extracted from it is just another instance of “Individual”. This is why I generally think of “materialSample” (if it were represented as a class – which it is currently not propsed for DWC) as a subclass of a broader concept (e.g., “material”, but what I have naively been referring to as “Individual”).****
That part of our model has proven to be very stable and effective for representing the information as we want it.****
Where it gets complicated is instances of taxonomically heterogeneous objects treated as a single “individual” – which (in my mind) includes such things as soil samples.****
In that contect, I see (and agree) with John and others that really it’s a separate axis of classification from what I have called “Individual”.****
I don’t expect that to make a lot of sense (I barely understand it myself).
Aloha,****
Rich****
*From:* Jason Holmberg [mailto:holmbergius@gmail.com] *Sent:* Thursday, May 30, 2013 11:28 AM *To:* Richard Pyle *Cc:* Ramona Walls; TDWG Content Mailing List; John Deck; Robert Whitton** **
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples****
Hi everyone.****
List lurker here. DWC has been a great inspiration in my work, so I hope I can contribute some small amount of insight on the "individual" and "material sample" threads. I have no grand thoughts on the subject, but I can tell you how the DWC has inspired my own information architecture for open source mark-recapture software:****
http://www.ecoceanusa.org/shepherd/doku.php?id=manual:2.0.x:1_overview****
I felt the very clear need for a distinct Individual Class and to separate that from the concept of a sample taken from nature. When reviewing DWC, I interpreted Occurrence.individualCount to be somewhat contradictory to Occurrence.individualID, so I created a one-individual-at-a-point-in-time class called Encounter that reuses quite a bit of DWC.Occurrence. Occurrence I then broadened to include the potential for multiple marked individuals.****
I neither present this as "right" nor "good" (though they have worked very well for us). I just present it as a practical example from mark-recapture in which we have tried to adhere to DWC in order to expose data to GBIF, iOBIS, etc. The concepts of "material sample" and "individual" are very important to us, and this is how we have defined them.****
Cheers,
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
Please consider adopting a shark to support our mission: http://www.whaleshark.org/adoptashark.jsp
On Wed, May 29, 2013 at 4:13 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Ramona,
Yes, I agree, and thanks. I’ve always felt that there has been a trend towards trying to push too much “ontology” (or other semantic meaning) onto DWC terms and classes, when DWC was fundamentally intended to represent a mechanism for data exchange, not a mechanism to describe the ontological landscape of biodiversity data. The only reason I brought this up now (and, I think, why we discussed it in 2010), is that the term “individualID” in DWC sort of hinted that something like “Individual” was the “forgotten class” for DWC. I sincerely hope that BCO and DSW gain more traction (and, ideally, harmony between them) than earlier attempts at developing ontologies in this space have met – and clearly that is the right path forward.
My main concern for this thread (and the reason I engaged in it) was to:
- Find out the status of the discussions that began in 2010; and
- Clarify where the current materialSample proposal overlaps, or does not overlap, with that earlier effort.
Steve has very adequately answered the first question, and you, John, and others have answered the second, and I’m happy with both sets of answers.
I’m sorry for the voluminous exchange, but I felt the discussion was both important, and very helpful (certainly to me).
Aloha,
Rich
*From:* Ramona Walls [mailto:rlwalls2008@gmail.com] *Sent:* Wednesday, May 29, 2013 1:03 PM *To:* Richard Pyle *Cc:* John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Hi Rich,
Sorry I didn't mention this sooner, but your emails were also helpful to me in describing an important and generalizable use case.
I don't know whether or not the TDWG community is ready to deal with the level of abstraction we are talking about, but my assessment is that whether or not they are ready, the Darwin Core is not constructed to deal with it. That is why (among other reasons) we started work on the BCO, and perhaps one reason why Steve and others developed DSW.
Our goal with the material sample proposal was not to overhaul DwC, but to work within the DwC framework to make it more compatible with other standards such as MIxS. That is why we tried to keep our proposal fairly narrowly focused.
Ramona
On Wed, May 29, 2013 at 3:21 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Thanks, Ramona – this is an **extremely** helpful email! It helps clear things up a lot in my mind.
Just to be clear, what I am looking for is the notion of a defined physical object (what I think you mean by “material entity”), and I explicitly mean the material entity itself. Yes, there is information (properties) that relate to that material entity, but to me that is a separate issue. What I would like to see clearly defined is the class representing the material (physical) entity – which seems to me to be a superclass of what materialSample is intended to represent.
Perhaps our (TDWG/DWC) community is not yet ready to deal with this level of abstraction (unfortunately, I absolutely have to, because “Occurrence” is simply way too overloaded a class for me to use independently of what I have been calling “individual” and what I have been calling “Evidence”). In that case, I guess the best thing to do is accept materialSample as a basisOfRecord for Occurrence and move on. But this is more or less the same thing that happened the last time we engaged in this conversation (2 years or so ago), and I was hoping this conversation about materialSample could leverage progress on the larger issue.
As I’ve said before, the last thing I want to do is confuse or otherwise slow down the process of incorporating the term “materialSample” into DWC. It’s just that I saw enough overlap with that “other” issue, that I was hoping we could find a reasonable pathway forward on both.
Thanks again for the very helpful comments.
Aloha,
Rich
*From:* Ramona Walls [mailto:rlwalls2008@gmail.com] *Sent:* Wednesday, May 29, 2013 9:14 AM *To:* Richard Pyle *Cc:* John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Rich,
I now understand more fully what you are asking for (a clear definition goes a long way!). A material sample, as we discussed it at the Kansas and Oxford workshops, does indeed need to be physically removed from its environment. This is also the case with the OBI term material sample, which, as a subclass of OBI:specimen, is the output of some collecting process. It is true that the concept of material sample could be defined to include sampling in an observational sense, but that is not how it is defined at this point. Based on this, material sample is NOT the term you are looking for, which you defined as:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
What you have defined is a category of information (whatever that may be) that pertains to some material entity. Not the material entity itself, but information about that entity. The "SuperclassTerm" you refer to in the definition sounds an awful lot like a material entity from the Basic Formal Ontology, which is used for defining material sample in OBI and the Bio-collections Ontology.
Ramona
On Wed, May 29, 2013 at 11:51 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Many thanks, John. This is extremely helpful!
First of all, in the context of a distinct term for basisOfRecord, I see absolutely no problem with adding the term “MaterialSample”. I fully support your proposal (although if this is simply a basisOfRecord term to be used alongside Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation; does it need a defined “ID” term? Do all the others have defined “ID” terms?).
However, I’m excited by this conversation because I think we are very close to solving a bigger problem (which was the focus of the 2010 discussion on this list around “IndividualOrganism”).
This bigger problem involves the need for a defined “concept” (I’m hesitating to say “class”), and an associated “ID”, in dwc that refers to the physical/material basis of an Occurrence. We don’t yet have a term for this concept in dwc (“IndividualID” hinted at the need for one, but that term was not well-defined, and the term itself seems to cause confusion). As Steve Baskauf and I have both been advocating for the establishment of a new class in dwc for exactly this purpose, I just want to make sure that we’re on the same page about what each concept is. The more I understand about what you need for “materialSample”, the more convinced I am that both of our needs can be met with the same concept.
I am perfectly happy to adopt the term “MaterialSample”, but I guess it all boils down to this: In order for something to be a “MaterialSample”, must it necessarily be removed from nature?
If the answer is “No”, then I think we can merge the two concepts into one.
If the answer is “Yes”, then I think “materialSample” is best characterized as a subclass of what Steve and I have been pushing for (setting aside, for the moment, the additional complexity of taxonomically homogeneous vs. heterogeneous).
In the latter case, I would define the superclass (whatever term is used for it), along the lines of:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
Aloha,
Rich
*From:* jdeck88@gmail.com [mailto:jdeck88@gmail.com] *On Behalf Of* John Deck *Sent:* Wednesday, May 29, 2013 4:01 AM *To:* Richard Pyle *Cc:* Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton; Ramona Walls
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Since the original proposal was from a group of folks, we decided to put our heads together to construct a general response to the various issues and ideas expressed on this thread.
John Deck for Rob Guralnick, Ramona Walls, and John Wieczorek
In the text of the issue submitted for MaterialSample (https://code.google.com/p/darwincore/issues/detail?id=167) we noted cases where the current basisOfRecord terms pertaining to the Occurrence class (Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation) do not adequately cover certain cases, including: environmental samples (for metagenomic analysis), transcriptomes (measuring genes but not taxa), and destructive samples (e.g. tissues destructively sampled in order to generate genomic DNA). The term we borrowed from OBI (http://purl.obolibrary.org/obo/OBI_0000747) is broad enough to be utilized across various cases that fulfill our criteria while still maintaining a consistent, clear and human-understandable meaning. For our purposes, we can think of “Material Sample” as any type of matter that we can use in order to derive further evidence needed for identifications and taxa, whether it is taxonomically homogeneous or heterogeneous, a single individual, a set of individuals, or a population.
How is MaterialSample different from Individual? The intent of individualID is fairly clear: since an Occurrence represents an organism at a place and time (per Markus’ email), the individualID term allows us to assign an instance identifier for a particular organism that can be present in multiple events. MaterialSampleID, on the other hand, is intended to allow users to say that the basis of an occurrence is a material entity (i.e. matter) that has been sampled according to some particular method. Whether or not this material entity is an individual (sensu individualID in DwC) is an independent axis of classification. As was already pointed out, there is no restriction on specifying that an occurrence is associated with more than one type, so any occurrence can have both an individualID and a materialSampleID.
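To make the "independent axis" point concrete, here is a small Python sketch of a single occurrence record carrying both identifiers. The term names come from this thread (materialSampleID is the proposed term, not yet an adopted one); the identifier values and the helper `classification_axes` are made up for illustration.

```python
# A flat occurrence record keyed by Darwin Core term names.
# All identifier values below are invented placeholders.
occurrence = {
    "occurrenceID": "urn:example:occurrence:1",
    "basisOfRecord": "MaterialSample",
    "individualID": "urn:example:individual:tree-42",
    "materialSampleID": "urn:example:sample:leaf-tissue-7",
}


def classification_axes(record):
    """Report which of the two independent axes a record makes use of."""
    return {
        "refers_to_individual": "individualID" in record,
        "based_on_material_sample": "materialSampleID" in record,
    }
```

A record may use either axis, both, or neither; the presence of one identifier says nothing about the other.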
We maintain our position on the proposal for MaterialSample as a value for the basisOfRecord, with an associated materialSampleID to identify instances of them. Per Steve’s initial comments, we have already withdrawn the proposal for a MaterialSample class distinct from that in the Darwin Core type vocabulary, which should make it easier to evaluate the implications of what we’re discussing.
NOTES, MaterialSample from OBI:
OBI has fairly broad definitions of samples & specimens that are meant to be utilized across many different scientific activities. Material Sample is defined as a “*material entity that has the material sample role*”, while a material sample role is defined as “*a specimen role borne by a material entity that is the output of a material sampling process*”, and a material sampling process is “*a specimen gathering process with the objective to obtain a specimen that is representative of the input material entity*”.
On Mon, May 27, 2013 at 11:59 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Markus,
Great question! Particularly because this is exactly the sort of use case we designed our model around.
> if you take a tissue sample of the same tree every year, would the identifier in individualID be the same for all of them or be different? With the current dwc:individualID definition it would be the same for all samples. If I understand you correctly each sample would have its own "individual" identifier in your proposal? I can't see how you can collapse these two things into one definition.
No, that is not how we would handle it.
In our model, there would be one IndividualID to represent the tree, spanning the time period beginning (more or less) when the seed was germinated, until the time at which the entire physical structure of the tree was disintegrated. It is an individual tree.
There would be multiple Occurrence instances, for each time that someone observed or sampled or otherwise wished to document some condition of that tree. All of these Occurrence instances would refer to the same individualID value (i.e., the "tree"). In the example above, this means there would be a different Occurrence instance for each year that a sample is taken -- because in each case, an assertion that the full tree existed at a certain time and place can be made (I understand that trees tend not to move around very much, so the Location for each event associated with each Occurrence would, in this case, remain the same; but the other Event properties -- such as eventID, samplingProtocol, samplingEffort, eventDate, eventTime, startDayOfYear, endDayOfYear, year, month, day, verbatimEventDate, habitat, fieldNumber, fieldNotes, eventRemarks -- would be documented accordingly for each sampling Occurrence instance).
Suppose that the tree is visited every month, but only sampled once per year. In that case, there would be an Occurrence record for every monthly visit. In other words, an Occurrence instance exists regardless of whether a physical sample was made or not. Any in-situ images made of the tree would likewise be associated with the specific Occurrence instance, and each image would represent a separate instance of "Evidence".
Now, let's focus on the annual samplings. Every time a new sample is taken from the tree, at least one new instance of Individual (with a unique individualID value) is created to represent the sample. This sample (individual instance) may be a "gathering" (set of multiple individual specimens gathered at the same time), or it may be a single specimen, or it may be simply a tissue sample intended for destructive analysis. In any case, it's a new individual instance derived from the "parent" individual instance representing the whole tree. In our implementation, "Individual" can be hierarchical, such that a whole-organism tree can be sub-sampled with many "child" instances of "gatherings" (say, one gathering each year), and each gathering may have multiple child "specimen" individuals (e.g. individual botanical sheets created from the multiple items of a single gathering), and each specimen may have further "child" subsamples extracted for DNA analysis (or whatever), and the hierarchy can continue on down to whatever derivatives that people feel a need to keep track of (e.g., aliquot).
The point is, all Individual instances are well-defined physical objects (or well-defined sets of physical objects), and they can be arranged in an n-tiered hierarchy.
Moreover, each Individual that can be characterized as a "sample" (what we refer to as a "CollectionObject") may also have a property value for "CollectionOccurrenceID" -- which refers to the specific Occurrence instance at which the sample was obtained.
So, for example, if the tree is visited on May 27, 2013 and a specimen (sample) is taken, then:
1) An Event instance is generated to represent the event where the tree was visited;
2) An Occurrence instance is generated, which refers to the new EventID and the existing IndividualID for the whole tree, and includes whatever other Occurrence properties are relevant for the tree at the time of this Occurrence;
3) An Individual instance is generated for the specimen, which has a property value for parentIndividualID that refers to the individualID for the whole tree, and a property value for collectionOccurrenceID that refers to the Occurrence instance where the specimen was collected.
So, to summarize the answer to your question:
- There are multiple Occurrence instances that refer to the same Individual instance representing the whole tree (and, hence can be collapsed to the same IndividualID value).
- Any Individual can have derivatives that are themselves unique Individual instances.
- Individuals are arranged hierarchically, and certain properties can be inherited up or down the hierarchy, depending on the properties and their associated logical constraints.
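The tree example above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not our actual implementation: the class layout, the snake_case field names (mirroring parentIndividualID and collectionOccurrenceID), the identifier strings, and the helper `lineage` are all invented for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    event_id: str
    event_date: str


@dataclass
class Occurrence:
    occurrence_id: str
    event_id: str        # the Event at which the Individual was documented
    individual_id: str   # the Individual that was documented


@dataclass
class Individual:
    individual_id: str
    parent_individual_id: Optional[str] = None      # None for a whole organism
    collection_occurrence_id: Optional[str] = None  # set only for samples


def lineage(individuals: Dict[str, Individual], individual_id: str) -> List[str]:
    """Return IDs from the given Individual up to the root of its hierarchy."""
    chain: List[str] = []
    current: Optional[str] = individual_id
    while current is not None:
        chain.append(current)
        current = individuals[current].parent_individual_id
    return chain


# The May 27, 2013 visit: one Event, one Occurrence, one new child Individual.
tree = Individual("tree-1")
event = Event("event-2013-05-27", "2013-05-27")
occurrence = Occurrence("occ-2013-05-27", event.event_id, tree.individual_id)
specimen = Individual(
    "specimen-1",
    parent_individual_id=tree.individual_id,
    collection_occurrence_id=occurrence.occurrence_id,
)
individuals = {i.individual_id: i for i in (tree, specimen)}
```

Walking `lineage(individuals, "specimen-1")` climbs from the specimen to the whole tree, and the same walk works for any depth of sub-sampling (gathering, sheet, tissue, aliquot).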
At some point, I will assemble a set of other specific use cases, and how we manage them through our use of the "Individual" instance (although I will probably not use the word "Individual", as this seems to cause too much confusion in these discussions).
Aloha, Rich
-- John Deck (541) 321-0689
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
> It will never be all things to all people; but at least it is enough things to enough people that it represents an important “flag pole” around which our community has (more or less) successfully rallied.
Too true! DWC allows me (as a programmer and info. architect) to move beyond laborious and endless discussions of data definitions for a new project and rather point over at DWC and say "Let's use this standard as a starting point. It has a rich history behind it, so let's build on its wisdom and move the project ahead faster."
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
Please consider adopting a shark to support our mission: http://www.whaleshark.org/adoptashark.jsp
On Fri, May 31, 2013 at 1:41 AM, Jason Holmberg holmbergius@gmail.com wrote:
> The “ID” value itself is never useful data/metadata – it is just a way to reference a data record that (presumably) contains properties that can be expressed as data/metadata for the object represented by the “ID” value.
Too true! I have been in so many conversations wherein complex ID schemes for wildlife population members have been generated, and my response has always been "The ID does not matter so long as it is unique."
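As one illustration of an identifier that carries no meaning beyond being unique, a random UUID from Python's standard library would do. The function name `new_individual_id` is hypothetical, and nothing in this thread prescribes UUIDs in particular.

```python
import uuid


def new_individual_id() -> str:
    """Mint an opaque identifier; it encodes nothing about the animal."""
    return str(uuid.uuid4())
```

Anything worth knowing about the animal (species, sex, first sighting) then lives in the record the ID points to, not in the ID string.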
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
Please consider adopting a shark to support our mission: http://www.whaleshark.org/adoptashark.jsp
On Fri, May 31, 2013 at 12:24 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Thanks Ramona;
Actually, the basic elements of our data model precede DwC by quite a bit. What we’ve tried to do, however, is mold the data model to be more compatible with DwC, to make the task of mapping for data export & exchange that much easier. Of course, DwC is not (and is not intended to be) a data model in any sense of the word; however, it’s impossible to avoid representing core elements of a bona-fide data model within DwC. This is especially true when it comes to each of the “ID” terms (and doubly-especially true when the “ID” terms correlate to class terms). The existence of an “ID” term implies that some class of object exists to which an “ID” value is applied. The “ID” value itself is never useful data/metadata – it is just a way to reference a data record that (presumably) contains properties that can be expressed as data/metadata for the object represented by the “ID” value.
This was all well-understood when the original DwC was being drafted; but as it evolved into its current iteration (with the addition of all the “ID” terms), it has been drawn ever more (in some ways subtly, and in some ways not-so-subtly) in the direction of a data model.
Of course, what we all (desperately!) need is a robust ontology that fits our world. The task is not easy in part because our data domain is not so easily modeled, in part because different sections of our broader community have different priorities, and in part because there is always a delicate balance between developing a model or ontology that is practical and useful for the data providers and consumers, and one that is robust and detailed and flexible, to allow asking questions of the data that were never even considered at the time the model/ontology was conceived. The parallel experience in database modeling is normalization (as Paul Kirk likes to say: “Normalize until it hurts; then de-normalize until it works”).
The original DwC was completely flat. The current DwC moved in the direction of more complex structures by clustering terms into classes and sprinkling with “ID” terms. It even tip-toed into RDF-space with dwc:ResourceRelationship. I think that’s definitely an improvement, but it still must strike a delicate balance between the needs by some to represent a robust data model, and the needs by others to have a simple/practical mechanism to exchange biodiversity data in a standard form. It will never be all things to all people; but at least it is enough things to enough people that it represents an important “flag pole” around which our community has (more or less) successfully rallied.
Hmmmm…. Now I’ve forgotten what my point was. I guess I was just in a ramblin’ mood. Well….sorry about the bandwidth!
Aloha,
Rich
*From:* Ramona Walls [mailto:rlwalls2008@gmail.com] *Sent:* Thursday, May 30, 2013 6:05 PM *To:* Richard Pyle *Cc:* Jason Holmberg; TDWG Content Mailing List; John Deck; Robert Whitton
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Jason,
Thanks for sharing how you have been using the Darwin Core terms. I am intrigued by the data structure you have developed. It is quite interesting how both you and Rich have adapted the DwC to fit your specific needs. While I am often troubled by the vagueness of DwC, I guess in some ways it is that vagueness that allows it to be used in many different applications. Of course, I don't think vagueness is necessary for wide application, or a good thing for data exchange, but it does seem to be working for a lot of different purposes.
Ramona
On Thu, May 30, 2013 at 3:07 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Jason,
Many thanks for this input. If I understand you correctly, then you are using “Encounter” as equivalent to what we have been using “Occurrence” for. That is, by our definition, an “Occurrence” is the instance representing an intersection between an Event (i.e., where, when, who, etc.) and what we have been calling an “Individual” (i.e., what); and the properties we attach to the Occurrence are the “how” bits (including things like size, etc.).
In my mind, the essence of an “Individual” is the collective physical material of the individual. If I see a fish on a reef, its “Occurrence” on that reef and at that time exists (and is worth documenting) regardless of whether I took an image of it (what we would call “Evidence”), or whether I took a tissue sample from it, or whether I collected and killed the whole damn thing. To me, the “essence” of the individual – or its occurrence at an event – is unaffected by what I end up doing to it. By extension, following a hierarchical model of “individual”, a sub-sample (materialSample) extracted from it is just another instance of “Individual”. This is why I generally think of “materialSample” (if it were represented as a class – which it is currently not proposed for DWC) as a subclass of a broader concept (e.g., “material”, but what I have naively been referring to as “Individual”).
That part of our model has proven to be very stable and effective for representing the information as we want it.
Where it gets complicated is instances of taxonomically heterogeneous objects treated as a single “individual” – which (in my mind) includes such things as soil samples.
In that context, I see (and agree with) John and others that it is really a separate axis of classification from what I have called “Individual”.
I don’t expect that to make a lot of sense (I barely understand it myself).
Aloha,
Rich
*From:* Jason Holmberg [mailto:holmbergius@gmail.com] *Sent:* Thursday, May 30, 2013 11:28 AM *To:* Richard Pyle *Cc:* Ramona Walls; TDWG Content Mailing List; John Deck; Robert Whitton*
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples****
Hi everyone.****
List lurker here. DWC has been a great inspiration in my work, so I hope I can contribute some small amount of insight on the "individual" and "material sample" threads. I have no grand thoughts on the subject, but I can tell you how the DWC has inspired my own information architecture for open source mark-recapture software:****
http://www.ecoceanusa.org/shepherd/doku.php?id=manual:2.0.x:1_overview***
I felt the very clear need for a distinct Individual Class and to separate that from the concept of a sample taken from nature. When reviewing DWC, I interpreted Occurrence.individualCount to be somewhat contradictory to Occurrence.individualID, so I created a one-individual-at-a-point-in-time class called Encounter that reuses quite a bit of DWC.Occurrence. Occurrence I then broadened to include the potential for multiple marked individuals.****
I neither present this as "right" nor "good" (though they have worked very well for us). I just present it as a practical example from mark-recapture in which we have tried to adhere to DWC in order to expose data to GBIF, iOBIS, etc. The concepts of "material sample" and "individual" are very important to us, and this is how we have defined them.
Cheers,
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
Please consider adopting a shark to support our mission: http://www.whaleshark.org/adoptashark.jsp****
On Wed, May 29, 2013 at 4:13 PM, Richard Pyle deepreef@bishopmuseum.org wrote:****
Hi Ramona,****
Yes, I agree, and thanks. I’ve always felt that there has been a trend towards trying to push too much “ontology” (or other semantic meaning) onto DWC terms and classes, when DWC was fundamentally intended to represent an mechanism for data exchange; not a mechanism to describe the ontological landscape of biodiversity data. The only reason I brought this up now (and, I think, why we discussed it in 2010), is that the term “individualID” in DWC sort of hinted that something like “Individual” was the “forgotten class” for DWC. I sincerely hope that BCO and DSW gain more traction (and, ideally, harmony between them) than earlier attempts at developing ontologies in this space have met – and clearly that is the right path forward.****
My main concern for this thread (and the reason I engaged in it), was to:
Find out the status of the discussions that began in 2010; and***
Clarify where the current materialSample proposal overlaps, or
does not overlap, with that earlier effort.****
Steve has very adequately answered the first question, and you, John, and others have answered the second, and I’m happy with both sets of answers.
I’m sorry for the voluminous exchange, but I felt the discussion was both important, and very helpful (certainly to me).****
Aloha,****
Rich****
*From:* Ramona Walls [mailto:rlwalls2008@gmail.com] *Sent:* Wednesday, May 29, 2013 1:03 PM *To:* Richard Pyle *Cc:* John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton****
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples****
Hi Rich,
Sorry I didn't mention this sooner, but your emails were also helpful to me in describing an important and generalizable use case.
I don't know whether or not the TDWG community is ready to deal with the level of abstraction we are talking about, but my assessment is that whether or not they are ready, the Darwin Core is not constructed to deal with it. That is why (among other reasons) we started work on the BCO, and perhaps one reason why Steve and others developed DSW. ****
Our goal with the material sample proposal was not to overhaul DwC, but to work within the DwC framework to make it more compatible with other standards such as MIxS. That is why we tried to keep our proposal fairly narrowly focused.****
Ramona****
On Wed, May 29, 2013 at 3:21 PM, Richard Pyle deepreef@bishopmuseum.org wrote:****
Thanks, Ramona – this is an **extremely** helpful email! It helps clear things up a lot in my mind.****
Just to be clear, what I am looking for is the notion of a defined physical object (what I think you mean by “material entity”), and I explicitly mean the material entity itself. Yes, there is information (properties) that relate to that material entity, but to me that is a separate issue. What I would like to see clearly defined is the class representing the material (physical) entity – which seems to me to be a superclass of what materialSample is intended to represent.****
Perhaps our (TDWG/DWC) community is not yet ready to deal with this level of abstraction (unfortunately, I absolutely have to, because “Occurrence” is simply way too overloaded a class for me to use independently of what I have been calling “individual” and what I have been calling “Evidence”). In that case, I guess the best thing to do is accept materialSample as a basisOfRecord for Occurrence and move on. But this is more or less the same thing that happened the last time we engaged in this conversation (2 years or so ago), and I was hoping this conversation about materialSample could leverage progress on the larger issue.****
As I’ve said before, the last thing I want to do is confuse or otherwise slow down the process of incorporating the term “materialSample” into DWC. It’s just that I saw enough overlap with that “other” issue, that I was hoping we could find a reasonable pathway forward on both.****
Thanks again for the very helpful comments.****
Aloha,****
Rich****
*From:* Ramona Walls [mailto:rlwalls2008@gmail.com] *Sent:* Wednesday, May 29, 2013 9:14 AM *To:* Richard Pyle *Cc:* John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton****
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples****
Rich,****
I now understand more fully what you are asking for ( a clear definition goes a long way!). A material sample, as we discussed it at the Kansas and Oxford workshops, does indeed need to be physically removed from its environment. This is also the case with the OBI term material sample, which, as a subclass of OBI:specimen is the output of some collecting process. It is true that concept of material sample could be defined to include sampling in an observational sense, but that is not how it is defined at this point. Based on this, material sample is NOT the term you are looking for or defined as :****
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."****
What you have defined is a category of information (whatever that may be) that pertains to some material entity. Not the material entity itself, but information about that entity. The "SuperclassTerm" you refer to in the definition sounds an awful lot like a material entity from the Basic Formal Ontology, which is used for defining material sample in OBI and the Bio-collections Ontology.****
Ramona****
On Wed, May 29, 2013 at 11:51 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Many thanks, John. This is extremely helpful!
First of all, in the context of a distinct term for basisOfRecord, I see absolutely no problem with adding the term “MaterialSample”. I fully support your proposal (although if this is simply a basisOfRecord term to be used alongside Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation; does it need a defined “ID” term? Do all the others have defined “ID” terms?).
However, I’m excited by this conversation because I think we are very close to solving a bigger problem (which was the focus of the 2010 discussion on this list around “IndividualOrganism”).
This bigger problem involves the need for a defined “concept” (I’m hesitating to say “class”), and an associated “ID”, in dwc that refers to the physical/material basis of an Occurrence. We don’t yet have a term for this concept in dwc (“IndividualID” hinted at the need for one, but that term was not well-defined, and the term itself seems to cause confusion). As Steve Baskauf and I have both been advocating for the establishment of a new class in dwc for exactly this purpose, I just want to make sure that we’re on the same page about what each concept is. The more I understand about what you need for “materialSample”, the more convinced I am that both of our needs can be met with the same concept.
I am perfectly happy to adopt the term “MaterialSample”, but I guess it all boils down to this: In order for something to be a “MaterialSample”, must it necessarily be removed from nature?
If the answer is “No”, then I think we can merge the two concepts into one.
If the answer is “Yes”, then I think “materialSample” is best characterized as a subclass of what Steve and I have been pushing for (setting aside, for the moment, the additional complexity of taxonomically homogeneous vs. heterogeneous).
In the latter case, I would define the superclass (whatever term is used for it) along the lines of:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
Aloha,
Rich
From: jdeck88@gmail.com [mailto:jdeck88@gmail.com] On Behalf Of John Deck Sent: Wednesday, May 29, 2013 4:01 AM To: Richard Pyle Cc: Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton; Ramona Walls
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Since the original proposal was from a group of folks, we decided to put our heads together to construct a general response to the various issues and ideas expressed on this thread.
John Deck for Rob Guralnick, Ramona Walls, and John Wieczorek
In the text of the issue submitted for MaterialSample (https://code.google.com/p/darwincore/issues/detail?id=167) we noted cases where the current basisOfRecord terms pertaining to the Occurrence class (Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation) do not adequately cover certain cases, including: environmental sample (for metagenomic analysis), transcriptomes (measuring genes but not taxa), and destructive samples (e.g. tissues destructively sampled in order to generate genomic DNA). The term we borrowed from OBI (http://purl.obolibrary.org/obo/OBI_0000747) is broad enough to be utilized across various cases that fulfill our criteria while still maintaining a consistent, clear and human-understandable meaning. For our purposes, we can think of “Material Sample” as any type of matter that we can use in order to derive further evidence needed for identifications, and taxa, whether it is taxonomically homogenous, heterogenous, a single individual, sets of individuals, or populations.
How is MaterialSample different from Individual? The intent of individualID is fairly clear: since an Occurrence represents an organism at a place and time (per Markus’ email), the individualID term allows us to assign an instance identifier for a particular organism that can be present in multiple events. MaterialSampleID, on the other hand, is intended to allow users to say that the basis of an occurrence is a material entity (i.e. matter) that has been sampled according to some particular method. Whether or not this material entity is an individual (sensu individualID in DwC) is an independent axis of classification. As was already pointed out, there is no restriction on specifying that an occurrence is associated with more than one type, so any occurrence can have both an individualID and a materialSampleID.
We maintain our position on the proposal for MaterialSample as a value for the basisOfRecord, with an associated materialSampleID to identify instances of them. Per Steve’s initial comments, we have already withdrawn the proposal for a MaterialSample class distinct from that in the Darwin Core type vocabulary, which should make it easier to evaluate the implications of what we’re discussing.
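To make the "independent axis" point concrete, here is a minimal sketch in Python of a flat record that carries both an individualID and a materialSampleID. The identifier values, the `check_record` helper, and the rule that a MaterialSample record should carry a materialSampleID are assumptions made for illustration; they are not part of Darwin Core or of the proposal text.

```python
# basisOfRecord values named in this thread, plus the proposed MaterialSample.
BASIS_OF_RECORD = {
    "Occurrence", "PreservedSpecimen", "LivingSpecimen", "FossilSpecimen",
    "HumanObservation", "MachineObservation", "MaterialSample",
}

# A flat, DwC-style record. All identifier values are made up.
record = {
    "occurrenceID": "occ-42",
    "basisOfRecord": "MaterialSample",
    "individualID": "ind-7",      # which organism
    "materialSampleID": "ms-19",  # which sampled piece of matter
}

def check_record(rec):
    """Return a list of problems found in a record (hypothetical helper)."""
    problems = []
    basis = rec.get("basisOfRecord")
    if basis not in BASIS_OF_RECORD:
        problems.append(f"unknown basisOfRecord: {basis!r}")
    # Assumed rule for this sketch only: a MaterialSample needs its own ID.
    if basis == "MaterialSample" and not rec.get("materialSampleID"):
        problems.append("MaterialSample record lacks a materialSampleID")
    return problems

problems = check_record(record)
```

Nothing in the check ties individualID to materialSampleID, which mirrors the statement that the two are independent: a record may have either, both, or neither.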
NOTES, MaterialSample from OBI:
OBI has fairly broad definitions of samples & specimens that are meant to be utilized across many different scientific activities. Material Sample is defined as a “material entity that has the material sample role”, while a material sample role is defined as “a specimen role borne by a material entity that is the output of a material sampling process”, and a material sampling process is “a specimen gathering process with the objective to obtain a specimen that is representative of the input material entity”.
On Mon, May 27, 2013 at 11:59 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Markus,
Great question! Particularly because this is exactly the sort of use case we designed our model around.
if you take a tissue sample of the same tree every year, would the identifier in individualID be the same for all of them or be different? With the current dwc:individualID definition it would be the same for all samples. If I understand you correctly each sample would have its own "individual" identifier in your proposal? I can't see how you can collapse these two things into one definition.
No, that is not how we would handle it.
In our model, there would be one IndividualID to represent the tree, spanning the time period beginning (more or less) when the seed was germinated, until the time at which the entire physical structure of the tree was disintegrated. It is an individual tree.
There would be multiple Occurrence instances, for each time that someone observed or sampled or otherwise wished to document some condition of that tree. All of these Occurrence instances would refer to the same individualID value (i.e., the "tree"). In the example above, this means there would be a different Occurrence instance for each year that a sample is taken -- because in each case, an assertion that the full tree existed at a certain time and place can be made (I understand that trees tend not to move around very much, so the Location for each event associated with each Occurrence would, in this case, remain the same; but the other Event properties -- such as eventID, samplingProtocol, samplingEffort, eventDate, eventTime, startDayOfYear, endDayOfYear, year, month, day, verbatimEventDate, habitat, fieldNumber, fieldNotes, eventRemarks -- would be documented accordingly for each sampling Occurrence instance).
Suppose that the tree is visited every month, but only sampled once per year. In that case, there would be an Occurrence record for every monthly visit. In other words, an Occurrence instance exists regardless of whether a physical sample was made or not. Any in-situ images made of the tree would likewise be associated with the specific Occurrence instance, and each image would represent a separate instance of "Evidence".
Now, let's focus on the annual samplings. Every time a new sample is taken from the tree, at least one new instance of Individual (with a unique individualID value) is created to represent the sample. This sample (individual instance) may be a "gathering" (set of multiple individual specimens gathered at the same time), or it may be a single specimen, or it may be simply a tissue sample intended for destructive analysis. In any case, it's a new individual instance derived from the "parent" individual instance representing the whole tree. In our implementation, "Individual" can be hierarchical, such that a whole-organism tree can be sub-sampled with many "child" instances of "gatherings" (say, one gathering each year), and each gathering may have multiple child "specimen" individuals (e.g. individual botanical sheets created from the multiple items of a single gathering), and each specimen may have further "child" subsamples extracted for DNA analysis (or whatever), and the hierarchy can continue on down to whatever derivatives that people feel a need to keep track of (e.g., aliquot).
The point is, all Individual instances are well-defined physical objects (or well-defined sets of physical objects), and they can be arranged in an n-tiered hierarchy.
Moreover, each Individual that can be characterized as a "sample" (what we refer to as a "CollectionObject") may also have a property value for "CollectionOccurrenceID" -- which refers to the specific Occurrence instance at which the sample was obtained.
So, for example, if the tree is visited on May 27, 2013 and a specimen (sample) is taken, then:
1) An Event instance is generated to represent the event where the tree was visited;
2) An Occurrence instance is generated, which refers to the new EventID, and the existing IndividualID for the whole tree, and includes whatever other Occurrence properties are relevant for the tree at the time of this Occurrence;
3) An Individual instance is generated for the specimen, which has a property value for parentIndividualID that refers to the individualID for the whole tree, and a property value for collectionOccurrenceID that refers to the Occurrence instance where the specimen was collected.
So, to summarize the answer to your question:
- There are multiple Occurrence instances that refer to the same Individual instance representing the whole tree (and hence can be collapsed to the same IndividualID value).
- Any Individual can have derivatives that are themselves unique Individual instances.
- Individuals are arranged hierarchically, and certain properties can be inherited up or down the hierarchy, depending on the properties and their associated logical constraints.
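The tree example can be sketched in a few lines of Python. This is only an illustration of the relationships described above, not the actual implementation: the class layout, the snake_case field names, the identifier values, and the `ancestors` helper are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Event:
    event_id: str
    event_date: str

@dataclass
class Occurrence:
    occurrence_id: str
    event_id: str        # the visit
    individual_id: str   # what was observed or sampled at that visit

@dataclass
class Individual:
    individual_id: str
    parent_individual_id: Optional[str] = None      # set on derived samples
    collection_occurrence_id: Optional[str] = None  # where the sample was taken

# The whole tree exists independently of any visit.
tree = Individual(individual_id="ind-tree")

# 1) an Event for the visit, 2) an Occurrence tying the Event to the tree,
# 3) a new Individual for the specimen taken at that Occurrence.
event = Event(event_id="evt-2013-05-27", event_date="2013-05-27")
occurrence = Occurrence("occ-1", event.event_id, tree.individual_id)
specimen = Individual(
    individual_id="ind-specimen-1",
    parent_individual_id=tree.individual_id,
    collection_occurrence_id=occurrence.occurrence_id,
)

def ancestors(individual: Individual, registry: Dict[str, Individual]) -> List[str]:
    """Follow parent links upward and return the ancestor IDs, nearest first."""
    chain = []
    current = individual
    while current.parent_individual_id is not None:
        current = registry[current.parent_individual_id]
        chain.append(current.individual_id)
    return chain

registry = {ind.individual_id: ind for ind in (tree, specimen)}
lineage = ancestors(specimen, registry)
```

Further derivatives (a DNA extract, an aliquot) would just be more Individual instances whose parent_individual_id points at the specimen, which gives the n-tiered hierarchy without any extra classes.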
At some point, I will assemble a set of other specific use cases, and how we manage them through our use of the "Individual" instance (although I will probably not use the word "Individual", as this seems to cause too much confusion in these discussions).
Aloha,
Rich
-- John Deck (541) 321-0689
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
The id value is actually very useful and the only trustworthy way of grouping records, e.g. all occurrences of the same whale.
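As a small illustration of that grouping, the following Python sketch collects occurrence records by their individualID. The records and identifier values are invented for the example, and `group_by_individual` is a hypothetical helper, not part of any Darwin Core tooling.

```python
from collections import defaultdict

# Made-up occurrence records; only the identifier needs to match, not the
# descriptive fields, for two records to be treated as the same whale.
occurrences = [
    {"occurrenceID": "occ-1", "individualID": "whale-A", "eventDate": "2011-07-02"},
    {"occurrenceID": "occ-2", "individualID": "whale-B", "eventDate": "2012-01-15"},
    {"occurrenceID": "occ-3", "individualID": "whale-A", "eventDate": "2013-05-30"},
]

def group_by_individual(records):
    """Map each individualID to the occurrenceIDs that reference it."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["individualID"]].append(rec["occurrenceID"])
    return dict(groups)

sightings = group_by_individual(occurrences)
```

The value of the ID is opaque here, which is consistent with the point quoted below: what matters is that it is unique and reused consistently.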
Markus
On 31.05.2013 at 07:41, Jason Holmberg holmbergius@gmail.com wrote:
The “ID” value itself is never useful data/metadata – it is just a way to reference a data record that (presumably) contains properties that can be expressed as data/metadata for the object represented by the “ID” value.
Too true! I have been in so many conversations wherein complex ID schemes for wildlife population members have been generated, and my response has always been "The ID does not matter so long as it is unique."
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
Please consider adopting a shark to support our mission: http://www.whaleshark.org/adoptashark.jsp
On Fri, May 31, 2013 at 12:24 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Thanks Ramona;
Actually, the basic elements of our data model precede DwC by quite a bit. What we’ve tried to do, however, is mold the data model to be more compatible with DwC, to make the task of mapping for data export & exchange that much easier. Of course, DwC is not (and is not intended to be) a data model in any sense of the word; however, it’s impossible to avoid representing core elements of a bona-fide data model within DwC. This is especially true when it comes to each of the “ID” terms (and doubly-especially true when the “ID” terms correlate to class terms). The existence of an “ID” term implies that some class of object exists to which an “ID” value is applied. The “ID” value itself is never useful data/metadata – it is just a way to reference a data record that (presumably) contains properties that can be expressed as data/metadata for the object represented by the “ID” value.
This was all well-understood when the original DwC was being drafted; but as it evolved into its current iteration (with the addition of all the “ID” terms), it has been drawn ever more (in some ways subtly, and in some ways not-so-subtly) in the direction of a data model.
Of course, what we all (desperately!) need is a robust ontology that fits our world. The task is not easy in part because our data domain is not so easily modeled, in part because different sections of our broader community have different priorities, and in part because there is always a delicate balance between developing a model or ontology that is practical and useful for the data providers and consumers, with one that is robust and detailed and flexible, to allow asking questions of the data that were never even considered at the time the model/ontology were conceived. The parallel experience in database modeling is normalization (as Paul Kirk likes to say: “Normalize until it hurts; then de-normalize until it works”).
The original DwC was completely flat. The current DwC moved into the direction of more complex structures by clustering terms into classes and sprinkling with “ID” terms. It even tip-toed into RDF-space with dwc:ResourceRelationship. I think that’s definitely an improvement, but it still must strike a delicate balance between the needs by some to represent a robust data model, and the needs by others to have a simple/practical mechanism to exchange biodiversity data in a standard form. It will never be all things to all people; but at least it is enough things to enough people that it represents an important “flag pole” around which our community has (more or less) successfully rallied.
Hmmmm…. Now I’ve forgotten what my point was. I guess I was just in a ramblin’ mood. Well….sorry about the bandwidth!
Aloha,
Rich
From: Ramona Walls [mailto:rlwalls2008@gmail.com] Sent: Thursday, May 30, 2013 6:05 PM To: Richard Pyle Cc: Jason Holmberg; TDWG Content Mailing List; John Deck; Robert Whitton
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Jason,
Thanks for sharing how you have been using the Darwin Core terms. I am intrigued by the data structure you have developed. It is quite interesting how both you and Rich have adapted the DwC to fit your specific needs. While I am often troubled by the vagueness of DwC, I guess in some ways it is that vagueness that allows it to be used in many different applications. Of course, I don't think vagueness is necessary for wide application, or a good thing for data exchange, but it does seem to be working for a lot of different purposes.
Ramona
On Thu, May 30, 2013 at 3:07 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Jason,
Many thanks for this input. If I understand you correctly, then you are using “Encounter” as equivalent to what we have been using “Occurrence” for. That is, by our definition, an “Occurrence” is the instance representing an intersection between an Event (i.e., where, when, who, etc.) and what we have been calling an “Individual” (i.e., what); and the properties we attach to the Occurrence are the “how” bits (including things like size, etc.).
In my mind, the essence of an “Individual” is the collective physical material of the individual. If I see a fish on a reef, its “Occurrence” on that reef and at that time exists (and is worth documenting) regardless of whether I took an image of it (what we would call “Evidence”), or whether I took a tissue sample from it, or whether I collected and killed the whole damn thing. To me, the “essence” of the individual – or its occurrence at an event – is unaffected by what I end up doing to it. By extension, following a hierarchical model of “individual”, a sub-sample (materialSample) extracted from it is just another instance of “Individual”. This is why I generally think of “materialSample” (if it were represented as a class – which it is currently not proposed for DWC) as a subclass of a broader concept (e.g., “material”, but what I have naively been referring to as “Individual”).
That part of our model has proven to be very stable and effective for representing the information as we want it.
Where it gets complicated is instances of taxonomically heterogeneous objects treated as a single “individual” – which (in my mind) includes such things as soil samples.
In that context, I see (and agree with) John and others that really it’s a separate axis of classification from what I have called “Individual”.
I don’t expect that to make a lot of sense (I barely understand it myself).
Aloha,
Rich
From: Jason Holmberg [mailto:holmbergius@gmail.com] Sent: Thursday, May 30, 2013 11:28 AM To: Richard Pyle Cc: Ramona Walls; TDWG Content Mailing List; John Deck; Robert Whitton
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Hi everyone.
List lurker here. DWC has been a great inspiration in my work, so I hope I can contribute some small amount of insight on the "individual" and "material sample" threads. I have no grand thoughts on the subject, but I can tell you how the DWC has inspired my own information architecture for open source mark-recapture software:
http://www.ecoceanusa.org/shepherd/doku.php?id=manual:2.0.x:1_overview
I felt the very clear need for a distinct Individual Class and to separate that from the concept of a sample taken from nature. When reviewing DWC, I interpreted Occurrence.individualCount to be somewhat contradictory to Occurrence.individualID, so I created a one-individual-at-a-point-in-time class called Encounter that reuses quite a bit of DWC.Occurrence. Occurrence I then broadened to include the potential for multiple marked individuals.
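A rough Python sketch of that arrangement follows. It is a guess at the shape of the idea rather than the actual Shepherd Project schema: the class fields, the identifier values, and the derived `individual_count` property are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Encounter:
    """Exactly one individual at one point in time."""
    encounter_id: str
    individual_id: str
    event_date: str

@dataclass
class Occurrence:
    """Broadened to hold the encounters of several marked individuals."""
    occurrence_id: str
    encounters: List[Encounter] = field(default_factory=list)

    @property
    def individual_count(self) -> int:
        # Assumed here to be derived from the distinct individuals present.
        return len({enc.individual_id for enc in self.encounters})

occ = Occurrence(
    occurrence_id="occ-1",
    encounters=[
        Encounter("enc-1", "shark-A", "2013-05-30"),
        Encounter("enc-2", "shark-B", "2013-05-30"),
    ],
)
count = occ.individual_count
```

Splitting the two classes this way removes the tension between a single individualID and an individualCount greater than one on the same record.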
I neither present this as "right" nor "good" (though they have worked very well for us). I just present it as a practical example from mark-recapture in which we have tried to adhere to DWC in order to expose data to GBIF, iOBIS, etc. The concepts of "material sample" and "individual" are very important to us, and this is how we have defined them.
Cheers,
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
Please consider adopting a shark to support our mission: http://www.whaleshark.org/adoptashark.jsp
On Wed, May 29, 2013 at 4:13 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Ramona,
Yes, I agree, and thanks. I’ve always felt that there has been a trend towards trying to push too much “ontology” (or other semantic meaning) onto DWC terms and classes, when DWC was fundamentally intended to represent a mechanism for data exchange; not a mechanism to describe the ontological landscape of biodiversity data. The only reason I brought this up now (and, I think, why we discussed it in 2010), is that the term “individualID” in DWC sort of hinted that something like “Individual” was the “forgotten class” for DWC. I sincerely hope that BCO and DSW gain more traction (and, ideally, harmony between them) than earlier attempts at developing ontologies in this space have met – and clearly that is the right path forward.
My main concern for this thread (and the reason I engaged in it), was to:
1) Find out the status of the discussions that began in 2010; and
2) Clarify where the current materialSample proposal overlaps, or does not overlap, with that earlier effort.
Steve has very adequately answered the first question, and you, John, and others have answered the second, and I’m happy with both sets of answers.
I’m sorry for the voluminous exchange, but I felt the discussion was both important, and very helpful (certainly to me).
Aloha,
Rich
From: Ramona Walls [mailto:rlwalls2008@gmail.com] Sent: Wednesday, May 29, 2013 1:03 PM To: Richard Pyle Cc: John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Hi Rich,
Sorry I didn't mention this sooner, but your emails were also helpful to me in describing an important and generalizable use case.
I don't know whether or not the TDWG community is ready to deal with the level of abstraction we are talking about, but my assessment is that whether or not they are ready, the Darwin Core is not constructed to deal with it. That is why (among other reasons) we started work on the BCO, and perhaps one reason why Steve and others developed DSW.
Our goal with the material sample proposal was not to overhaul DwC, but to work within the DwC framework to make it more compatible with other standards such as MIxS. That is why we tried to keep our proposal fairly narrowly focused.
Ramona
On Wed, May 29, 2013 at 3:21 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Thanks, Ramona – this is an *extremely* helpful email! It helps clear things up a lot in my mind.
Just to be clear, what I am looking for is the notion of a defined physical object (what I think you mean by “material entity”), and I explicitly mean the material entity itself. Yes, there is information (properties) that relate to that material entity, but to me that is a separate issue. What I would like to see clearly defined is the class representing the material (physical) entity – which seems to me to be a superclass of what materialSample is intended to represent.
Perhaps our (TDWG/DWC) community is not yet ready to deal with this level of abstraction (unfortunately, I absolutely have to, because “Occurrence” is simply way too overloaded a class for me to use independently of what I have been calling “individual” and what I have been calling “Evidence”). In that case, I guess the best thing to do is accept materialSample as a basisOfRecord for Occurrence and move on. But this is more or less the same thing that happened the last time we engaged in this conversation (2 years or so ago), and I was hoping this conversation about materialSample could leverage progress on the larger issue.
As I’ve said before, the last thing I want to do is confuse or otherwise slow down the process of incorporating the term “materialSample” into DWC. It’s just that I saw enough overlap with that “other” issue, that I was hoping we could find a reasonable pathway forward on both.
Thanks again for the very helpful comments.
Aloha,
Rich
From: Ramona Walls [mailto:rlwalls2008@gmail.com] Sent: Wednesday, May 29, 2013 9:14 AM To: Richard Pyle Cc: John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Rich,
I now understand more fully what you are asking for (a clear definition goes a long way!). A material sample, as we discussed it at the Kansas and Oxford workshops, does indeed need to be physically removed from its environment. This is also the case with the OBI term material sample, which, as a subclass of OBI:specimen, is the output of some collecting process. It is true that the concept of material sample could be defined to include sampling in an observational sense, but that is not how it is defined at this point. Based on this, material sample is NOT the term you are looking for, or the one you defined as:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
What you have defined is a category of information (whatever that may be) that pertains to some material entity. Not the material entity itself, but information about that entity. The "SuperclassTerm" you refer to in the definition sounds an awful lot like a material entity from the Basic Formal Ontology, which is used for defining material sample in OBI and the Bio-collections Ontology.
Ramona
On Wed, May 29, 2013 at 11:51 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Many thanks, John. This is extremely helpful!
First of all, in the context of a distinct term for basisOfRecord, I see absolutely no problem with adding the term “MaterialSample”. I fully support your proposal (although if this is simply a basisOfRecord term to be used alongside Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation; does it need a defined “ID” term? Do all the others have defined “ID” terms?).
However, I’m excited by this conversation because I think we are very close to solving a bigger problem (which was the focus of the 2010 discussion on this list around “IndividualOrganism”).
This bigger problem involves the need for a defined “concept” (I’m hesitating to say “class”), and an associated “ID”, in dwc that refers to the physical/material basis of an Occurrence. We don’t yet have a term for this concept in dwc (“IndividualID” hinted at the need for one, but that term was not well-defined, and the term itself seems to cause confusion). As Steve Baskauf and I have both been advocating for the establishment of a new class in dwc for exactly this purpose, I just want to make sure that we’re on the same page about what each concept is. The more I understand about what you need for “materialSample”, the more convinced I am that both of our needs can be met with the same concept.
I am perfectly happy to adopt the term “MaterialSample”, but I guess it all boils down to this: In order for something to be a “MaterialSample”, must it necessarily be removed from nature?
If the answer is “No”, then I think we can merge the two concepts into one.
If the answer is “Yes”, then I think “materialSample” is best characterized as a subclass of what Steve and I have been pushing for (setting aside, for the moment, the additional complexity of taxonomically homogeneous vs. heterogeneous).
In the latter case, I would define the superclass (whatever term is used for it), along the lines of:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
Aloha,
Rich
From: jdeck88@gmail.com [mailto:jdeck88@gmail.com] On Behalf Of John Deck Sent: Wednesday, May 29, 2013 4:01 AM To: Richard Pyle Cc: Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton; Ramona Walls
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Since the original proposal was from a group of folks, we decided to put our heads together to construct a general response to the various issues and ideas expressed on this thread.
John Deck for Rob Guralnick, Ramona Walls, and John Wieczorek
In the text of the issue submitted for MaterialSample (https://code.google.com/p/darwincore/issues/detail?id=167) we noted cases where the current basisOfRecord terms pertaining to the Occurrence class (Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation) do not adequately cover certain cases, including: environmental sample (for metagenomic analysis), transcriptomes (measuring genes but not taxa), and destructive samples (e.g. tissues destructively sampled in order to generate genomic DNA). The term we borrowed from OBI (http://purl.obolibrary.org/obo/OBI_0000747) is broad enough to be utilized across various cases that fulfill our criteria while still maintaining a consistent, clear and human-understandable meaning. For our purposes, we can think of “Material Sample” as any type of matter that we can use in order to derive further evidence needed for identifications, and taxa, whether it is taxonomically homogenous, heterogenous, a single individual, sets of individuals, or populations.
How is MaterialSample different from Individual? The intent of individualID is fairly clear: since an Occurrence represents an organism at a place and time (per Markus’ email), the individualID term allows us to assign an instance identifier for a particular organism that can be present in multiple events. MaterialSampleID, on the other hand, is intended to allow users to say that the basis of an occurrence is a material entity (i.e. matter) that has been sampled according to some particular method. Whether or not this material entity is an individual (sensu individualID in DwC) is an independent axis of classification. As was already pointed out, there is no restriction on specifying that an occurrence is associated with more than one type, so any occurrence can have both an individualID and a materialSampleID.
We maintain our position on the proposal for MaterialSample as a value for the basisOfRecord, with an associated materialSampleID to identify instances of them. Per Steve’s initial comments, we have already withdrawn the proposal for a MaterialSample class distinct from that in the Darwin Core type vocabulary, which should make it easier to evaluate the implications of what we’re discussing.
NOTES, MaterialSample from OBI:
OBI has fairly broad definitions of samples & specimens that are meant to be utilized across many different scientific activities. Material Sample is defined as a “material entity that has the material sample role”, while a material sample role is defined as “ a specimen role borne by a material entity that is the output of a material sampling process”, and a material sampling process is “a specimen gathering process with the objective to obtain a specimen that is representative of the input material entity”.
On Mon, May 27, 2013 at 11:59 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Markus,
Great question! Particularly because this is exactly the sort of use case we designed our model around.
if you take a tissue sample of the same tree every year, would the identifier in individualID be the same for all of them or be different? With the current dwc:individualID definition it would be the same for all samples. If I understand you correctly each sample would have its own "individual" identifier in your proposal? I can't see how you can collapse these two things into one definition.
No, that is not how we would handle it.
In our model, there would be one IndividualID to represent the tree, spanning the time period beginning (more or less) when the seed was germinated, until the time at which the entire physical structure of the tree was disintegrated. It is an individual tree.
There would be multiple Occurrence instances, for each time that someone observed or sampled or otherwise wished to document some condition of that tree. All of these Occurrence instances would refer to the same individualID value (i.e., the "tree"). In the example above, this means there would be a different Occurrence instance for each year that a sample is taken -- because in each case, an assertion that the full tree existed at a certain time and place can be made (I understand that trees tend not to move around very much, so the Location for each event associated with each Occurrence would, in this case, remain the same; but the other Event properties -- such as eventID, samplingProtocol, samplingEffort, eventDate, eventTime, startDayOfYear, endDayOfYear, year, month, day, verbatimEventDate, habitat, fieldNumber, fieldNotes, eventRemarks -- would be documented accordingly for each sampling Occurrence instance).
Suppose that the tree is visited every month, but only sampled once per year. In that case, there would be an Occurrence record for every monthly visit. In other words, an Occurrence instance exists regardless of whether a physical sample was made or not. Any in-situ images made of the tree would likewise be associated with the specific Occurrence instance, and each image would represent a separate instance of "Evidence".
Now, let's focus on the annual samplings. Every time a new sample is taken from the tree, at least one new instance of Individual (with a unique individualID value) is created to represent the sample. This sample (individual instance) may be a "gathering" (set of multiple individual specimens gathered at the same time), or it may be a single specimen, or it may be simply a tissue sample intended for destructive analysis. In any case, it's a new individual instance derived from the "parent" individual instance representing the whole tree. In our implementation, "Individual" can be hierarchical, such that a whole-organism tree can be sub-sampled with many "child" instances of "gatherings" (say, one gathering each year), and each gathering may have multiple child "specimen" individuals (e.g. individual botanical sheets created from the multiple items of a single gathering), and each specimen may have further "child" subsamples extracted for DNA analysis (or whatever), and the hierarchy can continue on down to whatever derivatives that people feel a need to keep track of (e.g., aliquot).
The point is, all Individual instances are well-defined physical objects (or well-defined sets of physical objects), and they can be arranged in an n-tiered hierarchy.
Moreover, each Individual that can be characterized as a "sample" (what we refer to as a "CollectionObject") may also have a property value for "CollectionOccurrenceID" -- which refers to the specific Occurrence instance at which the sample was obtained.
So, for example, if the tree is visited on May 27, 2013 and a specimen (sample) is taken, then:
1) An Event instance is generated to represent the event where the tree was visited;
2) An Occurrence instance is generated, which refers to the new EventID and the existing IndividualID for the whole tree, and includes whatever other Occurrence properties are relevant for the tree at the time of this Occurrence;
3) An Individual instance is generated for the specimen, which has a property value for parentIndividualID that refers to the individualID for the whole tree, and a property value for collectionOccurrenceID that refers to the Occurrence instance where the specimen was collected.
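The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual implementation described in the message: the class layout, the function name `record_sampled_visit`, and all ID values are hypothetical, while the property names (eventID, individualID, parentIndividualID, collectionOccurrenceID) follow the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Event:
    event_id: str
    event_date: str


@dataclass
class Occurrence:
    occurrence_id: str
    event_id: str       # the visit (where/when)
    individual_id: str  # the whole tree (what)


@dataclass
class Individual:
    individual_id: str
    parent_individual_id: Optional[str] = None      # set for derived samples
    collection_occurrence_id: Optional[str] = None  # set for collected samples


def record_sampled_visit(tree: Individual, event_date: str, n: int
                         ) -> Tuple[Event, Occurrence, Individual]:
    """Hypothetical helper: one visit to the tree during which a specimen is taken."""
    # 1) an Event for the visit
    event = Event(event_id=f"event-{n}", event_date=event_date)
    # 2) an Occurrence linking the new Event to the existing whole-tree Individual
    occurrence = Occurrence(f"occ-{n}", event.event_id, tree.individual_id)
    # 3) a new Individual for the specimen, derived from the tree
    specimen = Individual(
        individual_id=f"{tree.individual_id}/specimen-{n}",
        parent_individual_id=tree.individual_id,
        collection_occurrence_id=occurrence.occurrence_id,
    )
    return event, occurrence, specimen


tree = Individual("tree-1")  # illustrative ID; any unique value would do
event, occurrence, specimen = record_sampled_visit(tree, "2013-05-27", 1)
```

A monthly visit without sampling would stop after step 2: the Event and Occurrence are created, but no new Individual.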
So, to summarize the answer to your question:
- There are multiple Occurrence instances that refer to the same Individual instance representing the whole tree (and hence can be collapsed to the same IndividualID value).
- Any Individual can have derivatives that are themselves unique Individual instances.
- Individuals are arranged hierarchically, and certain properties can be inherited up or down the hierarchy, depending on the properties and their associated logical constraints.
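To make the hierarchy and inheritance points concrete, here is a minimal Python sketch. It assumes that a property missing on a derived Individual may be looked up on its ancestors; which properties are actually inheritable, and in which direction, is not specified in this message. The IDs, the species name, and the helper names `root_of` and `inherited` are all illustrative.

```python
from typing import Dict, Optional

# Hypothetical hierarchy: whole tree -> yearly gathering -> herbarium sheet -> DNA aliquot.
individuals: Dict[str, Dict[str, Optional[str]]] = {
    "tree-1": {"parent": None, "scientificName": "Quercus alba"},
    "gathering-2013": {"parent": "tree-1"},
    "sheet-A": {"parent": "gathering-2013"},
    "dna-aliquot-1": {"parent": "sheet-A"},
}


def root_of(individual_id: str) -> str:
    """Follow parent links up to the top-level Individual (the whole tree)."""
    current = individual_id
    while individuals[current]["parent"] is not None:
        current = individuals[current]["parent"]
    return current


def inherited(individual_id: str, prop: str) -> Optional[str]:
    """Return the nearest value of `prop`, looking at the Individual, then its ancestors."""
    current: Optional[str] = individual_id
    while current is not None:
        record = individuals[current]
        if prop in record:
            return record[prop]
        current = record["parent"]
    return None


top = root_of("dna-aliquot-1")
name = inherited("dna-aliquot-1", "scientificName")
```

Here the aliquot carries no name of its own, so the lookup climbs three levels and takes the value recorded on the whole tree.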
At some point, I will assemble a set of other specific use cases, and how we manage them through our use of the "Individual" instance (although I will probably not use the word "Individual", as this seems to cause too much confusion in these discussions).
Aloha, Rich
-- John Deck (541) 321-0689
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Oh, I agree with that. But the value itself...whether it be "Bob" or "12345"...is of little consequence. I have seen schemes trying to encode year, location, research group, etc.
But then the whale shows up in another area, in another year, and is recorded by a different group. Ultimately, uniqueness is the only requirement of the ID value, and the actual format of the value matters little. Any amount of classification is better handled by another field, which also allows for faster and easier processing (SQL, programming language, etc.).
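As a rough illustration of that point, the Python sketch below gives the individual an opaque identifier and keeps year, locality, and research group in separate fields. The field names and values are made up for the example, and using a random UUID is just one way to get uniqueness, not something prescribed anywhere in this thread.

```python
import uuid
from collections import defaultdict


def new_individual_id() -> str:
    """Opaque identifier: unique, but encodes nothing about the animal."""
    return str(uuid.uuid4())


whale = new_individual_id()

# The same individual, seen in different years and areas by different groups.
# Classification lives in ordinary fields, so the ID never has to change.
encounters = [
    {"individualID": whale, "year": 2011, "locality": "Area A", "group": "Team 1"},
    {"individualID": whale, "year": 2013, "locality": "Area B", "group": "Team 2"},
]

by_individual = defaultdict(list)
for encounter in encounters:
    by_individual[encounter["individualID"]].append(encounter)

years_seen = sorted(e["year"] for e in by_individual[whale])
```

Had the year or the group been encoded in the ID, the second encounter would have needed either a misleading ID or a new one.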
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
Please consider adopting a shark to support our mission: http://www.whaleshark.org/adoptashark.jsp
On Fri, May 31, 2013 at 1:55 AM, Markus Döring m.doering@mac.com wrote:
The id value is actually very useful and the only trustworthy way of grouping records, e.g. all occurrences of the same whale.
Markus
On 31.05.2013 at 07:41, Jason Holmberg holmbergius@gmail.com wrote:
The “ID” value itself is never useful data/metadata – it is just a way to reference a data record that (presumably) contains properties that can be expressed as data/metadata for the object represented by the “ID” value.
Too true! I have been in so many conversations wherein complex ID schemes for wildlife population members have been generated, and my response has always been "The ID does not matter so long as it is unique."
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
On Fri, May 31, 2013 at 12:24 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Thanks Ramona;
Actually, the basic elements of our data model precede DwC by quite a bit. What we’ve tried to do, however, is mold the data model to be more compatible with DwC, to make the task of mapping for data export & exchange that much easier. Of course, DwC is not (and is not intended to be) a data model in any sense of the word; however, it’s impossible to avoid representing core elements of a bona-fide data model within DwC. This is especially true when it comes to each of the “ID” terms (and doubly-especially true when the “ID” terms correlate to class terms). The existence of an “ID” term implies that some class of object exists to which an “ID” value is applied. The “ID” value itself is never useful data/metadata – it is just a way to reference a data record that (presumably) contains properties that can be expressed as data/metadata for the object represented by the “ID” value.
This was all well-understood when the original DwC was being drafted; but as it evolved into its current iteration (with the addition of all the “ID” terms), it has been drawn ever more (in some ways subtly, and in some ways not-so-subtly) in the direction of a data model.
Of course, what we all (desperately!) need is a robust ontology that fits our world. The task is not easy in part because our data domain is not so easily modeled, in part because different sections of our broader community have different priorities, and in part because there is always a delicate balance between developing a model or ontology that is practical and useful for the data providers and consumers, and one that is robust and detailed and flexible, to allow asking questions of the data that were never even considered at the time the model/ontology was conceived. The parallel experience in database modeling is normalization (as Paul Kirk likes to say: “Normalize until it hurts; then de-normalize until it works”).
The original DwC was completely flat. The current DwC moved in the direction of more complex structures by clustering terms into classes and sprinkling with “ID” terms. It even tip-toed into RDF-space with dwc:ResourceRelationship. I think that’s definitely an improvement, but it still must strike a delicate balance between the needs by some to represent a robust data model, and the needs by others to have a simple/practical mechanism to exchange biodiversity data in a standard form. It will never be all things to all people; but at least it is enough things to enough people that it represents an important “flag pole” around which our community has (more or less) successfully rallied.
Hmmmm…. Now I’ve forgotten what my point was. I guess I was just in a ramblin’ mood. Well….sorry about the bandwidth!
Aloha,
Rich
From: Ramona Walls [mailto:rlwalls2008@gmail.com] Sent: Thursday, May 30, 2013 6:05 PM To: Richard Pyle Cc: Jason Holmberg; TDWG Content Mailing List; John Deck; Robert Whitton Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Jason,
Thanks for sharing how you have been using the Darwin Core terms. I am intrigued by the data structure you have developed. It is quite interesting how both you and Rich have adapted the DwC to fit your specific needs. While I am often troubled by the vagueness of DwC, I guess in some ways it is that vagueness that allows it to be used in many different applications. Of course, I don't think vagueness is necessary for wide application, or a good thing for data exchange, but it does seem to be working for a lot of different purposes.
Ramona
On Thu, May 30, 2013 at 3:07 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Jason,
Many thanks for this input. If I understand you correctly, then you are using “Encounter” as equivalent to what we have been using “Occurrence” for. That is, by our definition, an “Occurrence” is the instance representing an intersection between an Event (i.e., where, when, who, etc.) and what we have been calling an “Individual” (i.e., what); and the properties we attach to the Occurrence are the “how” bits (including things like size, etc.).
In my mind, the essence of an “Individual” is the collective physical material of the individual. If I see a fish on a reef, its “Occurrence” on that reef and at that time exists (and is worth documenting) regardless of whether I took an image of it (what we would call “Evidence”), or whether I took a tissue sample from it, or whether I collected and killed the whole damn thing. To me, the “essence” of the individual – or its occurrence at an event – is unaffected by what I end up doing to it. By extension, following a hierarchical model of “individual”, a sub-sample (materialSample) extracted from it is just another instance of “Individual”. This is why I generally think of “materialSample” (if it were represented as a class – which is not currently proposed for DWC) as a subclass of a broader concept (e.g., “material”, but what I have naively been referring to as “Individual”).
That part of our model has proven to be very stable and effective for representing the information as we want it.
Where it gets complicated is instances of taxonomically heterogeneous objects treated as a single “individual” – which (in my mind) includes such things as soil samples.
In that context, I see, and agree with John and others, that it’s really a separate axis of classification from what I have called “Individual”.
I don’t expect that to make a lot of sense (I barely understand it myself).
Aloha,
Rich
From: Jason Holmberg [mailto:holmbergius@gmail.com] Sent: Thursday, May 30, 2013 11:28 AM To: Richard Pyle Cc: Ramona Walls; TDWG Content Mailing List; John Deck; Robert Whitton Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Hi everyone.
List lurker here. DWC has been a great inspiration in my work, so I hope I can contribute some small amount of insight on the "individual" and "material sample" threads. I have no grand thoughts on the subject, but I can tell you how the DWC has inspired my own information architecture for open source mark-recapture software:
http://www.ecoceanusa.org/shepherd/doku.php?id=manual:2.0.x:1_overview
I felt the very clear need for a distinct Individual Class and to separate that from the concept of a sample taken from nature. When reviewing DWC, I interpreted Occurrence.individualCount to be somewhat contradictory to Occurrence.individualID, so I created a one-individual-at-a-point-in-time class called Encounter that reuses quite a bit of DWC.Occurrence. Occurrence I then broadened to include the potential for multiple marked individuals.
I neither present this as "right" nor "good" (though they have worked very well for us). I just present it as a practical example from mark-recapture in which we have tried to adhere to DWC in order to expose data to GBIF, iOBIS, etc. The concepts of "material sample" and "individual" are very important to us, and this is how we have defined them.
Cheers,
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
On Wed, May 29, 2013 at 4:13 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Ramona,
Yes, I agree, and thanks. I’ve always felt that there has been a trend towards trying to push too much “ontology” (or other semantic meaning) onto DWC terms and classes, when DWC was fundamentally intended to represent a mechanism for data exchange; not a mechanism to describe the ontological landscape of biodiversity data. The only reason I brought this up now (and, I think, why we discussed it in 2010), is that the term “individualID” in DWC sort of hinted that something like “Individual” was the “forgotten class” for DWC. I sincerely hope that BCO and DSW gain more traction (and, ideally, harmony between them) than earlier attempts at developing ontologies in this space have met – and clearly that is the right path forward.
My main concern for this thread (and the reason I engaged in it) was to:
1) Find out the status of the discussions that began in 2010; and
2) Clarify where the current materialSample proposal overlaps, or does not overlap, with that earlier effort.
Steve has very adequately answered the first question, and you, John, and others have answered the second, and I’m happy with both sets of answers.
I’m sorry for the voluminous exchange, but I felt the discussion was both important, and very helpful (certainly to me).
Aloha,
Rich
From: Ramona Walls [mailto:rlwalls2008@gmail.com] Sent: Wednesday, May 29, 2013 1:03 PM To: Richard Pyle Cc: John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Hi Rich,
Sorry I didn't mention this sooner, but your emails were also helpful to me in describing an important and generalizable use case.
I don't know whether or not the TDWG community is ready to deal with the level of abstraction we are talking about, but my assessment is that whether or not they are ready, the Darwin Core is not constructed to deal with it. That is why (among other reasons) we started work on the BCO, and perhaps one reason why Steve and others developed DSW.
Our goal with the material sample proposal was not to overhaul DwC, but to work within the DwC framework to make it more compatible with other standards such as MIxS. That is why we tried to keep our proposal fairly narrowly focused.
Ramona
On Wed, May 29, 2013 at 3:21 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Thanks, Ramona – this is an extremely helpful email! It helps clear things up a lot in my mind.
Just to be clear, what I am looking for is the notion of a defined physical object (what I think you mean by “material entity”), and I explicitly mean the material entity itself. Yes, there is information (properties) that relate to that material entity, but to me that is a separate issue. What I would like to see clearly defined is the class representing the material (physical) entity – which seems to me to be a superclass of what materialSample is intended to represent.
Perhaps our (TDWG/DWC) community is not yet ready to deal with this level of abstraction (unfortunately, I absolutely have to, because “Occurrence” is simply way too overloaded a class for me to use independently of what I have been calling “individual” and what I have been calling “Evidence”). In that case, I guess the best thing to do is accept materialSample as a basisOfRecord for Occurrence and move on. But this is more or less the same thing that happened the last time we engaged in this conversation (2 years or so ago), and I was hoping this conversation about materialSample could leverage progress on the larger issue.
As I’ve said before, the last thing I want to do is confuse or otherwise slow down the process of incorporating the term “materialSample” into DWC. It’s just that I saw enough overlap with that “other” issue, that I was hoping we could find a reasonable pathway forward on both.
Thanks again for the very helpful comments.
Aloha,
Rich
From: Ramona Walls [mailto:rlwalls2008@gmail.com] Sent: Wednesday, May 29, 2013 9:14 AM To: Richard Pyle Cc: John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Rich,
I now understand more fully what you are asking for (a clear definition goes a long way!). A material sample, as we discussed it at the Kansas and Oxford workshops, does indeed need to be physically removed from its environment. This is also the case with the OBI term material sample, which, as a subclass of OBI:specimen, is the output of some collecting process. It is true that the concept of material sample could be defined to include sampling in an observational sense, but that is not how it is defined at this point. Based on this, material sample is NOT the term you are looking for, which you defined as:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
What you have defined is a category of information (whatever that may be) that pertains to some material entity. Not the material entity itself, but information about that entity. The "SuperclassTerm" you refer to in the definition sounds an awful lot like a material entity from the Basic Formal Ontology, which is used for defining material sample in OBI and the Bio-collections Ontology.
Ramona
On Wed, May 29, 2013 at 11:51 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Many thanks, John. This is extremely helpful!
First of all, in the context of a distinct term for basisOfRecord, I see absolutely no problem with adding the term “MaterialSample”. I fully support your proposal (although if this is simply a basisOfRecord term to be used alongside Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation; does it need a defined “ID” term? Do all the others have defined “ID” terms?).
However, I’m excited by this conversation because I think we are very close to solving a bigger problem (which was the focus of the 2010 discussion on this list around “IndividualOrganism”).
This bigger problem involves the need for a defined “concept” (I’m hesitating to say “class”), and an associated “ID”, in dwc that refers to the physical/material basis of an Occurrence. We don’t yet have a term for this concept in dwc (“IndividualID” hinted at the need for one, but that term was not well-defined, and the term itself seems to cause confusion). As Steve Baskauf and I have both been advocating for the establishment of a new class in dwc for exactly this purpose, I just want to make sure that we’re on the same page about what each concept is. The more I understand about what you need for “materialSample”, the more convinced I am that both of our needs can be met with the same concept.
I am perfectly happy to adopt the term “MaterialSample”, but I guess it all boils down to this: In order for something to be a “MaterialSample”, must it necessarily be removed from nature?
If the answer is “No”, then I think we can merge the two concepts into one.
If the answer is “Yes”, then I think “materialSample” is best characterized as a subclass of what Steve and I have been pushing for (setting aside, for the moment, the additional complexity of taxonomically homogeneous vs. heterogeneous).
In the latter case, I would define the superclass (whatever term is used for it), along the lines of:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
Aloha,
Rich
From: jdeck88@gmail.com [mailto:jdeck88@gmail.com] On Behalf Of John Deck Sent: Wednesday, May 29, 2013 4:01 AM To: Richard Pyle Cc: Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton; Ramona Walls Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Since the original proposal was from a group of folks, we decided to put our heads together to construct a general response to the various issues and ideas expressed on this thread.
John Deck for Rob Guralnick, Ramona Walls, and John Wieczorek
In the text of the issue submitted for MaterialSample (https://code.google.com/p/darwincore/issues/detail?id=167) we noted cases where the current basisOfRecord terms pertaining to the Occurrence class (Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation) do not adequately cover certain cases, including: environmental sample (for metagenomic analysis), transcriptomes (measuring genes but not taxa), and destructive samples (e.g. tissues destructively sampled in order to generate genomic DNA). The term we borrowed from OBI (http://purl.obolibrary.org/obo/OBI_0000747) is broad enough to be utilized across various cases that fulfill our criteria while still maintaining a consistent, clear and human-understandable meaning. For our purposes, we can think of “Material Sample” as any type of matter that we can use in order to derive further evidence needed for identifications and taxa, whether it is taxonomically homogeneous, heterogeneous, a single individual, sets of individuals, or populations.
How is MaterialSample different from Individual? The intent of individualID is fairly clear: since an Occurrence represents an organism at a place and time (per Markus’ email), the individualID term allows us to assign an instance identifier for a particular organism that can be present in multiple events. MaterialSampleID, on the other hand, is intended to allow users to say that the basis of an occurrence is a material entity (i.e. matter) that has been sampled according to some particular method. Whether or not this material entity is an individual (sensu individualID in DwC) is an independent axis of classification. As was already pointed out, there is no restriction on specifying that an occurrence is associated with more than one type, so any occurrence can have both an individualID and a materialSampleID.
We maintain our position on the proposal for MaterialSample as a value for the basisOfRecord, with an associated materialSampleID to identify instances of them. Per Steve’s initial comments, we have already withdrawn the proposal for a MaterialSample class distinct from that in the Darwin Core type vocabulary, which should make it easier to evaluate the implications of what we’re discussing.
NOTES, MaterialSample from OBI:
OBI has fairly broad definitions of samples & specimens that are meant to be utilized across many different scientific activities. Material Sample is defined as a “material entity that has the material sample role”, while a material sample role is defined as “a specimen role borne by a material entity that is the output of a material sampling process”, and a material sampling process is “a specimen gathering process with the objective to obtain a specimen that is representative of the input material entity”.
Yes, that’s a fair point! In a sense, the ID has intrinsic value on its own if for no other reason than to represent a reference point for aggregation.
Nevertheless, I still maintain that if it fulfills that purpose, then it implies a “thing” (around which other “things” are aggregated), and I can’t imagine such a “thing” that we would care about for aggregating purposes, about which we would not associate other property values.
I say all this quite deliberately in reference to “dwc:individualID”, of course…. :-)
Aloha,
Rich
From: Markus Döring [mailto:m.doering@mac.com] Sent: Thursday, May 30, 2013 7:56 PM To: Jason Holmberg Cc: Richard Pyle; TDWG Content Mailing List; Robert Whitton; John Deck; Ramona Walls Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
The id value is actually very useful and the only trustworthy way of grouping records, e.g. all occurrences of the same whale.
Markus
Yep--- that reference point for aggregation can be really powerful: To provide a working example of how these identifiers would work, and how they can act to aggregate data elements, consider the following:
IndividualID = JohnDeck MaterialSampleID = JohnDeckTissueSample1 OccurrenceID = JohnDeckOccurrence123 Taxon = "Homo sapiens"
IndividualID = JohnDeck MaterialSampleID = JohnDeckGutSample1 OccurrenceID = JohnDeckOccurrence124 Taxon = "Bacteria500"
IndividualID = JohnDeck MaterialSampleID = JohnDeckGutSample1 OccurrenceID = JohnDeckOccurrence125 Taxon = "Bacteria501"
JohnDeckTissueSample1 is representative of the Individual itself, while JohnDeckGutSample1 is still associated with the same Individual, but notice that the taxon has changed and that it is a new Occurrence as well. This approach allows some sense to be constructed using a flat-file approach if desired. Providing a Material Sample BoR for OccurrenceIDs 124 and 125 provides further context. Meanwhile, we can consider the implications for, for example, habitat descriptions: for JohnDeckOccurrence123 maybe I'd put http://purl.obolibrary.org/obo/ENVO_01000193 ("temperate grassland biome"), but the distinct occurrence records for the gut samples could be listed as http://purl.obolibrary.org/obo/ENVO_01000162 ("organ").
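The aggregation described here can be tried out on the three flat records directly. The following Python sketch uses the identifier and taxon values from the example above; the dictionary keys and the `group_by` helper are just illustrative choices, not part of any DwC tooling.

```python
from collections import defaultdict
from typing import Dict, List

# The three flat records from the example above.
records = [
    {"individualID": "JohnDeck", "materialSampleID": "JohnDeckTissueSample1",
     "occurrenceID": "JohnDeckOccurrence123", "taxon": "Homo sapiens"},
    {"individualID": "JohnDeck", "materialSampleID": "JohnDeckGutSample1",
     "occurrenceID": "JohnDeckOccurrence124", "taxon": "Bacteria500"},
    {"individualID": "JohnDeck", "materialSampleID": "JohnDeckGutSample1",
     "occurrenceID": "JohnDeckOccurrence125", "taxon": "Bacteria501"},
]


def group_by(rows: List[Dict[str, str]], key: str) -> Dict[str, List[Dict[str, str]]]:
    """Collect flat records under each distinct value of one identifier field."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


by_individual = group_by(records, "individualID")
by_sample = group_by(records, "materialSampleID")

# All occurrences tied to one individual, and all taxa found in one sample.
occurrences_of_john = [r["occurrenceID"] for r in by_individual["JohnDeck"]]
taxa_in_gut_sample = sorted(r["taxon"] for r in by_sample["JohnDeckGutSample1"])
```

Grouping on individualID pulls all three occurrences together, while grouping on materialSampleID separates the tissue sample from the two taxa detected in the gut sample.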
Another use for the identifier MaterialSampleID: let's assume we've expressed an equivalent identifier for a GenBank sample using MIxS:source_mat_id, a term which references the same OBI:MaterialSample we're referencing. If they're URIs, we can model this in RDF using the MaterialSampleIDs as either subjects or objects. This gets us a step closer to representing contextual information in GenBank and DwC without duplicating metadata across systems (GenBank for sequencing metadata; DwC for environmental context).
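To make the subject/object idea concrete, here is a minimal sketch that stores triples as plain Python tuples rather than using an RDF library. Every URI under example.org and every "ex:" predicate name is a made-up placeholder for illustration; none of them is a real Darwin Core, MIxS or OBI term.

```python
# Hypothetical URIs; none of these is a real identifier.
INDIVIDUAL = "http://example.org/individual/JohnDeck"
SAMPLE = "http://example.org/sample/JohnDeckGutSample1"
OCCURRENCE = "http://example.org/occurrence/JohnDeckOccurrence124"
SEQUENCE = "http://example.org/sequence/seq1"

# Triples as (subject, predicate, object). Predicate names are placeholders.
triples = [
    (SAMPLE, "ex:sampledFrom", INDIVIDUAL),        # sample URI as subject
    (OCCURRENCE, "ex:materialSample", SAMPLE),     # sample URI as object (DwC side)
    (SEQUENCE, "ex:sourceMaterial", SAMPLE),       # sample URI as object (sequence side)
]


def about(uri, graph):
    """Return every triple in which the URI appears as subject or object."""
    return [t for t in graph if t[0] == uri or t[2] == uri]


for subject, predicate, obj in about(SAMPLE, triples):
    print(subject, predicate, obj)
```

Because both the occurrence record and the sequence record point at the same sample URI, each system can keep its own metadata and the shared identifier does the joining.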
There are some issues with this approach, of course. For example, if we provide a lat/lng for an occurrence that is a gut sample, are we taking the lat/lng where the gut sample was removed from the organism (which may be different from where the parent organism was isolated from nature)? In this case, we need to assume that we're referring to where the parent organism was isolated from nature, to be consistent with DwC and the implementations in use. However, the notion of habitat should vary with the occurrence of the actual organism (e.g. "organ" vs. "temperate grassland biome"). Thus, we can still aggregate useful properties around MaterialSample BoRs, but we need to think carefully about exactly what the properties we assign to these things mean. This is no different from the issues we've encountered with other BoRs (Fossil, PreservedSpecimen, or Human/MachineObservation).
John
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Thanks, John, this is REALLY helpful!
A couple of questions: can you expand a bit on the differences between JohnDeckOccurrence123, 124, and 125? I’m assuming that JohnDeckOccurrence123 is associated with the Event representing the time & place when JohnDeckTissueSample1 was removed from JohnDeck. I’m guessing that JohnDeckOccurrence124 is associated with the Event representing the time & place when JohnDeckGutSample1 was removed from JohnDeck. What I don’t understand is why there needs to be a JohnDeckOccurrence125. What Occurrence does that represent? Later you suggest that JohnDeck (WholeOrganism) was extracted from nature. Is the extraction-from-nature Occurrence one of these three Occurrences?
What you describe below is consistent with our approach of treating materialSample as a subclass of Individual (assuming a hierarchical Individual, which means that the ParentIndividualID of both JohnDeckTissueSample1 and JohnDeckGutSample1 is IndividualID=JohnDeck). The nice thing about the hierarchical approach is that it deals with the problem you describe in the last paragraph.
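A minimal sketch of the hierarchical idea, assuming each sample is stored as an Individual with a ParentIndividualID pointing at the organism it came from. The mapping and the function name root_individual are hypothetical and only illustrate the lookup; they are not taken from any existing implementation.

```python
# Hypothetical parent links: each sample is treated as an Individual whose
# ParentIndividualID points at the organism it was taken from.
parent_of = {
    "JohnDeckTissueSample1": "JohnDeck",
    "JohnDeckGutSample1": "JohnDeck",
    "JohnDeck": None,  # the whole organism has no parent
}


def root_individual(individual_id, parents):
    """Follow ParentIndividualID links up to the top-level Individual."""
    seen = set()
    current = individual_id
    while parents.get(current) is not None:
        if current in seen:
            raise ValueError(f"cycle in parent links at {current}")
        seen.add(current)
        current = parents[current]
    return current


print(root_individual("JohnDeckGutSample1", parent_of))
```

With links like these, a property recorded once on the parent (such as where the whole organism was collected) can be looked up for any sample, while each sample can still carry its own values.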
Rich
Since it was a gut sample, we'll be seeing lots of stuff in there, maybe even 1000 different taxa. Each one can be a distinct occurrence, just as JohnDeckOccurrence124 and JohnDeckOccurrence125 are different taxa.
Now, in the sense of Event/location, I'm taking the location details to be those of when it was isolated from nature, and thus the location would be the same as the whole organism's location. At our recent workshop in Copenhagen we went around on this same issue for a while, but decided that we should take "location" to mean the location at which whatever parent organism was isolated in nature. Certainly the location where the gut contents were extracted (e.g. in the lab) is important too, but that is something different and is not represented by dc:location or in DwC in general. This is something of an interpretation of the actual term, but since our implementation model of DwCA still uses "occurrence" at the core, we probably don't have much choice, since GBIF does not use a graph-based parser. The other option is to represent all of this using event/location at the core of a DwCA, but we felt this introduced yet more complexity and broke other cases where we would want to hang data off of occurrence (e.g. Identification), which we would not be able to do if event/location lived at the core.
John
OK, thanks. Now I understand it. This is all related to the taxonomic homogeneous/heterogeneous thing. One thing I should caution about, using your example data below:
If we assume that JohnDeckGutSample1 was extracted from JohnDeck after JohnDeck was extracted from nature, then we have to be careful about inferring that the organism Bacteria501 has an occurrence related to the place & time where JohnDeck was extracted from nature. In other words, we can’t reliably connect Bacteria501 with the Occurrence of JohnDeck in nature, because Bacteria501 might have entered the gut of JohnDeck at some later time (e.g., during decomposition).
I also disagree that the location where the gut sample was taken is fundamentally different from where the organism was extracted from nature. We definitely need to be able to distinguish Occurrences representing “natural” place+time+organism instances from “artificial” instances. However, you can’t simply say “extracting a tissue in a lab is fundamentally different from extracting an organism in nature”. The reason is that there is a very rich spectrum between those two end points, and no clear place along that spectrum where a line can be drawn. What about a specimen of a “naturalized” species in a certain location? What about an organism taken from nature that was born of parents that were brought to that place by humans? What about organisms that were themselves brought by humans, then released, then recaptured? What about captive organisms or plants in a person’s garden? This spectrum continues all the way down to extracting a gut sample, in a lab at Berkeley, from a specimen collected in Moorea. The degree of “naturalness” of an Occurrence is certainly important, but it’s not Boolean, and it’s only one axis of interest, so we shouldn’t simply assume dc:location represents some kinds of locations but not others.
I think the basic problem is that, as has already been stated, DwC emerged from the collections-based world, where one specimen = one occurrence, and that occurrence is naturally regarded as being the occurrence when+where the specimen was extracted from nature. Now that we have such a diversity of data we are trying to manage, this Occurrence-centric approach (with its overloaded notion of an “Occurrence”) is being stretched to the breaking point.
I think we should trend towards leaving DwC as a simple data exchange paradigm, and focus these more complex conversations on a next-gen ontology for biodiversity data. I realize that’s already happening; but it seems like the “center of mass” for conversation should shift from DwC to the biodiversity ontology domain.
Aloha,
Rich
"..I think we should trend towards leaving DwC as a simple data exchange paradigm, and focus these more complex conversations on a next-gen ontology for biodiversity data. I realize that’s already happening; but it seems like the “center of mass” for conversation should shift from DwC to the biodiversity ontology domain."
Agree!!
On Fri, May 31, 2013 at 12:56 PM, Richard Pyle deepreef@bishopmuseum.orgwrote:
OK, thanks. Now I understand it. This is all related to the taxonomic homogeneous/heterogeneous thing. One thing I should caution, using your example data below:****
If we assume that JohnDeckGutSample1 was extracted from JohnDeck after JohnDeck was extracted from nature, then we have to be careful about inferring that the organism Bacteria501 has an occurrence related to the place & time where JohnDeck was extracted from nature. In other words, we can’t reliably connect Bacteria501 with the Occurrence of JohnDeck in nature, because Bacteria501 might have entered the gut of JohnDeck at some later time (e.g., during decomposition).****
I also disagree that the location where the gut sample was taken is fundamentally different where the organism was extracted from nature. We definitely need be able to distinguish between Occurrences representing “natural” place+time+organism instances, from “articifical” instances. However, you can’t simply say “extracting a tissue in a lab is fundamentally different from extracting an organism in nature” The reason is that there is a very rich spectrum between those two end points, and no clear place along that spectrum where a line can be drawn. What about a specimen of a “naturalized” species in a certain location? What about an organism taken from nature that was born of parents that were brought to that place by humans? What about the organisms that were themselves brought by humans, then released, then recaptured? What about captive organisms or plans in a person’s garden? This spectrum continues all the way down to extracting a gut sample from a specimen collected in Moorea, in a lab at Berkeley. The degree of “naturalness” of an Occurrence is certainly important, but it’s not Boolean, and it’s only one axis of interest, so we shouldn’t simply assume dc:location represents some kinds of locations, but not others.****
I think the basic problem is that, as has already been stated, DwC emerged from the collections-based world, where one specimen = one occurrence, and that occurrence is naturally regarded as being the occurrence when+where the specimen was extracted from nature. Now that we have such a diversity of data we are trying to manage, this Occurrence-centric approach (with its overloaded notion of an “Occurrence”) is being stretched to the breaking point.****
I think we should trend towards leaving DwC as a simple data exchange paradigm, and focus these more complex conversations on a next-gen ontology for biodiversity data. I realize that’s already happening; but it seems like the “center of mass” for conversation should shift from DwC to the biodiversity ontology domain.****
Aloha,****
Rich****
*From:* jdeck88@gmail.com [mailto:jdeck88@gmail.com] *On Behalf Of *John Deck *Sent:* Friday, May 31, 2013 9:17 AM *To:* Richard Pyle *Cc:* TDWG Content Mailing List; Robert Whitton; Ramona Walls
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples****
Since it was a gut sample, we'll be seeing lots of stuff in there, maybe even 1000 different taxa each one can be a distinct occurrence as JohnDeckOccurrence124 and JohnDeckOccurrence125 are different taxa. ****
Now, in the sense of Event/location i'm taking the location details to be when it was isolated from nature and thus the location would be the same as the wholeorganism location. In our recent workshop on this same issue recently in Copenhagen we went around about this issue for awhile but decided that we should take "location" to mean the the location at which whatever parent organism was isolated in nature. Certainly the location where the gut contents were extracted (e.g. in the lab) is important too but that is something different and not represented by dc:location or in dwc in general. This is something of an interpretation of the actual term but it since our implementation model of DwCA is still using "occcurrence" at the core we probably don't have much choice since GBIF does not use a graph-based parser. The other option is to represent this all using event/location at the core of a DwCA but felt this introduced yet more complexities and breaks other cases where we would want to hang data off of occurrence (e.g. Identification), and would not be able to do so if event/location lived at the core.****
John****
On Fri, May 31, 2013 at 12:02 PM, Richard Pyle deepreef@bishopmuseum.org wrote:****
Thanks, John – this is REALLY helpful!****
A couple questions – can you expand a bit on the differences between JohnDeckOccurrence123, 124,and 125? I’m assuming that JohnDeckOccurrence123 is associated with the Event representing the time & place when JohnDeckTissueSample1 was removed from JohnDeck. I’m guessing that JohnDeckOccurrence124 is associated with the Event representing the time & place when JohnDeckGutSample1 was removed from JohnDeck. What I don’t understand is why there needs to be a JohnDeckOccurrence125. What Occurrence does that represent? Later you suggest that JohnDeck (WholeOrganism) was extracted from nature. Is the extraction-from-nature Occurrence one of these three Occurrences?****
What you describe below is consistent with our approach to treating materialSample as a subclass of Individual (assuming a hierarchical Individual, which means that ParentIndividualID of both JohnDeckTissueSample1 and JohnDeckGutSample1 is IndividualID=JohnDeck). The nice thing about the hierarchical approach is that deals with the problem you describe in the last paragraph.****
Rich****
*From:* jdeck88@gmail.com [mailto:jdeck88@gmail.com] *On Behalf Of *John Deck *Sent:* Friday, May 31, 2013 7:54 AM *To:* Richard Pyle *Cc:* Markus Döring; Jason Holmberg; TDWG Content Mailing List; Robert Whitton; Ramona Walls****
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples****
Yep--- that reference point for aggregation can be really powerful: To provide a working example of how these identifiers would work, and how they can act to aggregate data elements, consider the following:****
IndividualID = JohnDeck****
MaterialSampleID = JohnDeckTissueSample1****
OccurrenceID = JohnDeckOccurrence123****
Taxon = "Homo sapiens"****
IndividualID = JohnDeck****
MaterialSampleID = JohnDeckGutSample1****
OccurrenceID = JohnDeckOccurrence124****
Taxon = "Bacteria500"****
IndividualID = JohnDeck****
MaterialSampleID = JohnDeckGutSample1****
OccurrenceID = JohnDeckOccurrence125****
Taxon = "Bacteria501"****
JohnDeckTissueSample1 is representative of the Individual itself, while JohnDeckGutSample1 is still associated with the same Individual but notice the taxon has changed and it is a new Occurrence as well. This approach allows for some sense to be constructed using a flat file approach if desired. Providing a Material Sample BoR for OccurrenceID's 124 and 125 provides further context. Meanwhile, we can consider the implications of, for example, habitat descriptions (... for JohnDeckOccurrence123 maybe i'd put http://purl.obolibrary.org/obo/ENVO_01000193, "temperate grassland biome") but the distinct occurrence records for the gut samples could be listed as (http://purl.obolibrary.org/obo/ENVO_01000162, "organ").****
Another use for the identifier MaterialSampleID -- let's assume we've expressed an equivalent identifier for a GenBank sample using MIxS:source_mat_id, a term which references the same OBI:MaterialSample we're referencing. If they're URIs we can model this in RDF using the MaterialSampleIDs as either subjects or objects... this gets us a step closer to representing contextual information in GenBank and DwC without duplicating metadata across systems (GenBank for sequencing metadata; DwC for environmental context).
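A rough sketch of the subject/object idea, using plain Python tuples in place of an RDF library. Everything concrete here is an assumption for illustration: the example.org URIs are placeholders, and the prefixed predicate strings are shorthand, not asserted term URIs from DwC, MIxS, or OBI.

```python
# Hypothetical URIs: example.org stands in for real, resolvable identifiers.
SAMPLE = "http://example.org/sample/JohnDeckGutSample1"
OCCURRENCE = "http://example.org/occurrence/JohnDeckOccurrence124"
SEQUENCE = "http://example.org/sequence/seq-0001"

# Triples as (subject, predicate, object). The predicate strings are
# illustrative shorthand only, not verified term URIs.
triples = [
    (OCCURRENCE, "dwc:materialSampleID", SAMPLE),  # sample used as object
    (SEQUENCE, "mixs:source_mat_id", SAMPLE),      # sample used as object
    (SAMPLE, "rdf:type", "obi:MaterialSample"),    # sample used as subject
]


def linked_to(sample, graph):
    """Return every subject that points at `sample`, sorted for stable output."""
    return sorted(s for s, _, o in graph if o == sample)


print(linked_to(SAMPLE, triples))
```

Because both the occurrence record and the sequence record point at the same sample URI, the sequencing metadata and the environmental context can be joined without copying either into the other system.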
There are some issues with this approach of course. For example, if we provide a lat/lng for an occurrence that is a gut sample, are we taking the lat/lng where the gut sample was removed from the organism (which may be different from where the parent organism was isolated from nature)? In this case, we need to assume that we're referring to where the parent organism was isolated from nature to be consistent with DwC and implementations in use. However, the notion of habitat should vary with the occurrence of the actual organism (e.g. "organ" vs. "temperate grassland biome"). Thus, we can still aggregate properties around MaterialSample BoRs that are useful, but we need to think carefully about what exactly the properties mean that we assign to these things... but this is no different than issues we've encountered between other BoRs (Fossil, PreservedSpecimen, or Human/MachineObservation).
John
On Thu, May 30, 2013 at 11:48 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Yes, that’s a fair point! In a sense, the ID has intrinsic value on its own if for no other reason than to represent a reference point for aggregation.
Nevertheless, I still maintain that if it fulfills that purpose, then it implies a “thing” (around which other “things” are aggregated), and I can’t imagine such a “thing” that we would care about for aggregating purposes, about which we would not associate other property values.
I say all this quite deliberately in reference to “dwc:individualID”, of course…. :-)
Aloha,
Rich
*From:* Markus Döring [mailto:m.doering@mac.com] *Sent:* Thursday, May 30, 2013 7:56 PM *To:* Jason Holmberg *Cc:* Richard Pyle; TDWG Content Mailing List; Robert Whitton; John Deck; Ramona Walls
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
The id value is actually very useful and the only trustworthy way of grouping records, e.g. all occurrences of the same whale.
Markus
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
-- John Deck (541) 321-0689
I'm going to go ahead and leave this subject line since its a convenient way to group related emails. However, this email really isn't about the material sample proposal (which I support given that dwctype:MaterialSample seems to me to be well-enough defined and to have a clear use).
I wanted to comment on what Rich said about "a robust ontology" and modeling in our data domain. I feel like the discussion here has demonstrated that there are a number of groups in our constituency who have an interest in modeling complex datasets that involve relating multiple observations/sampling incidents and keeping track of the relationships among sets of derived objects. So developing a consensus model is really important if we hope to integrate such datasets and facilitate "asking questions of the data" which probably will in many cases mean having the ability to construct queries that will "work" across these diverse datasets.
What I have an issue with is equating the development of a consensus data model with the development of a robust ontology. In a previous email, Rich hoped that DSW might be harmonized with BCO. I really am not sure that is possible and is perhaps not even desirable. DSW and BCO are in my mind apples and oranges.
Although we've called DSW an "ontology" because it's written in OWL and uses some of the constraints present in OWL to restrict how the DSW terms can be used, it really is fundamentally a data model, not an ontology. The basis of DSW (outlined at http://code.google.com/p/darwin-sw/wiki/RelationshipToExistingModels ) was pretty much laid out in Rich's email http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001703.html based on the ASC model as modified by the discussion of "individual" in the 2010 discussion. The DSW model says that one to many Events can happen at one Location, one to many Occurrences can be documented during one Event, one Individual can be recorded in one to many Occurrences, etc. The DSW model does NOT define (ontologically) what a Location, Event, Occurrence, or Individual is (other than in the documentary text) or how they are related to each other ontologically (except to say that the class instances can be connected through DSW object properties, e.g. <dsw:IndividualOrganism instance> dsw:hasOccurrence <dwc:Occurrence instance>). DSW is designed to describe (and to some extent restrict) how its users should organize their data to allow them to aggregate their data with other DSW users and to allow queries to be constructed that will produce consistent results across providers.
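The cardinalities described above can be sketched as plain Python dataclasses. This is only an illustration of the "data model" reading of DSW, not DSW itself: the class and field names are assumptions chosen to echo the terms in the text, and the IDs are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Occurrence:
    occurrence_id: str


@dataclass
class Event:
    event_id: str
    # One Event can document one to many Occurrences.
    occurrences: List[Occurrence] = field(default_factory=list)


@dataclass
class Location:
    location_id: str
    # One Location can host one to many Events.
    events: List[Event] = field(default_factory=list)


@dataclass
class IndividualOrganism:
    individual_id: str
    # Loosely mirrors dsw:hasOccurrence: one Individual, one to many Occurrences.
    has_occurrence: List[Occurrence] = field(default_factory=list)


# Hypothetical data: one individual recorded during two events at one location.
first = Occurrence("occ-1")
second = Occurrence("occ-2")
reef = Location("loc-1", events=[Event("evt-1", [first]), Event("evt-2", [second])])
whale = IndividualOrganism("ind-1", has_occurrence=[first, second])

# Every Occurrence reachable from the Location, via its Events.
at_reef = [o.occurrence_id for e in reef.events for o in e.occurrences]
print(at_reef)
```

Note that nothing in the sketch says what a Location or an Occurrence is; it only constrains how instances connect, which is the distinction being drawn between a data model and an ontology.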
In contrast, BCO (browse at http://bioportal.bioontology.org/ontologies/49826?p=terms ) uses rdfs:subClassOf, owl:someValuesFrom, and other properties to define how its classes are related to each other and to restrict what kinds of resources are allowed to fall within their defined classes. This makes it very useful for clearly defining what classes are in an ontological sense, but it's not a particularly efficient way of organizing class relationships in a database. For example, if you wanted to explain how an identification process was related to an organism, you could say that an identification process is a subclass of a process, which is a subclass of an occurrent, which is a subclass of an entity, which has the subclass continuant, which has the subclass independent continuant, which has the subclass material entity, which has the subclass object, which has the subclass organism or virus or viroid. If you wanted to do some kind of logical reasoning about organisms or identification processes, this would be great, but if you wanted to relate an Identification instance with an IndividualOrganism instance in a database, you would be better off just using dsw:IndividualOrganism dsw:hasIdentification dwc:Identification than stringing a connection between the two class instances using the eight or so subClassOf relationships that connect <identification process> instances with <organism or virus or viroid> instances. The latter would not be "de-normalize until it works".
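To make the "eight or so" count concrete, here is a small Python sketch that encodes only the subclass chain named in the paragraph above and counts the subClassOf edges between the two classes. The dictionary is transcribed from the prose, not loaded from BCO, and the helper names are hypothetical.

```python
# Child -> parent links for the chain described in the text (not read from BCO).
SUBCLASS_OF = {
    "identification process": "process",
    "process": "occurrent",
    "occurrent": "entity",
    "continuant": "entity",
    "independent continuant": "continuant",
    "material entity": "independent continuant",
    "object": "material entity",
    "organism or virus or viroid": "object",
}


def ancestors(cls):
    """Return [cls, parent, grandparent, ...] up to the root class."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def hops_between(a, b):
    """Count subClassOf edges on the path from a to b via their nearest shared ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    shared = next(c for c in up_a if c in up_b)
    return up_a.index(shared) + up_b.index(shared)


hops = hops_between("identification process", "organism or virus or viroid")
print(hops)  # 8
```

Three edges lead up from identification process to entity and five lead down to organism or virus or viroid, compared with the single dsw:hasIdentification link in the data-model approach.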
I think we clearly need a mechanism for defining and clarifying the relationships among material samples, organisms, specimens, material entities, populations, etc. and BCO or something like it is probably the best way to do that clearly. But I don't think that the resulting ontology is going to be a data model like DSW or ASC. I think a consensus ontology and a consensus data model would both be very useful, but I don't think they will or should be expected to be one and the same thing.
Steve
Richard Pyle wrote:
Thanks Ramona;
Actually, the basic elements of our data model precede DwC by quite a bit. What we've tried to do, however, is mold the data model to be more compatible with DwC, to make the task of mapping for data export & exchange that much easier. Of course, DwC is not (and is not intended to be) a data model in any sense of the word; however, it's impossible to avoid representing core elements of a bona-fide data model within DwC. This is especially true when it comes to each of the "ID" terms (and doubly-especially true when the "ID" terms correlate to class terms). The existence of an "ID" term implies that some class of object exists to which an "ID" value is applied. The "ID" value itself is never useful data/metadata -- it is just a way to reference a data record that (presumably) contains properties that can be expressed as data/metadata for the object represented by the "ID" value.
This was all well-understood when the original DwC was being drafted; but as it evolved into its current iteration (with the addition of all the "ID" terms), it has been drawn ever more (in some ways subtly, and in some ways not-so-subtly) in the direction of a data model.
Of course, what we all (desperately!) need is a robust ontology that fits our world. The task is not easy in part because our data domain is not so easily modeled, in part because different sections of our broader community have different priorities, and in part because there is always a delicate balance between developing a model or ontology that is practical and useful for the data providers and consumers, with one that is robust and detailed and flexible, to allow asking questions of the data that were never even considered at the time the model/ontology were conceived. The parallel experience in database modeling is normalization (as Paul Kirk likes to say: "Normalize until it hurts; then de-normalize until it works").
The original DwC was completely flat. The current DwC moved into the direction of more complex structures by clustering terms into classes and sprinkling with "ID" terms. It even tip-toed into RDF-space with dwc:ResourceRelationship. I think that's definitely an improvement, but it still must strike a delicate balance between the needs by some to represent a robust data model, and the needs by others to have a simple/practical mechanism to exchange biodiversity data in a standard form. It will never be all things to all people; but at least it is enough things to enough people that it represents an important "flag pole" around which our community has (more or less) successfully rallied.
Hmmmm.... Now I've forgotten what my point was. I guess I was just in a ramblin' mood. Well....sorry about the bandwidth!
Aloha,
Rich
*From:* Ramona Walls [mailto:rlwalls2008@gmail.com] *Sent:* Thursday, May 30, 2013 6:05 PM *To:* Richard Pyle *Cc:* Jason Holmberg; TDWG Content Mailing List; John Deck; Robert Whitton *Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Jason,
Thanks for sharing how you have been using the Darwin Core terms. I am intrigued by the data structure you have developed. It is quite interesting how both you and Rich have adapted the DwC to fit your specific needs. While I am often troubled by the vagueness of DwC, I guess in some ways it is that vagueness that allows it to be used in many different applications. Of course, I don't think vagueness is necessary for wide application, or a good thing for data exchange, but it does seem to be working for a lot of different purposes.
Ramona
On Thu, May 30, 2013 at 3:07 PM, Richard Pyle <deepreef@bishopmuseum.org> wrote:
Hi Jason,
Many thanks for this input. If I understand you correctly, then you are using "Encounter" as equivalent to what we have been using "Occurrence" for. That is, by our definition, an "Occurrence" is the instance representing an intersection between an Event (i.e., where, when, who, etc.) and what we have been calling an "Individual" (i.e., what); and the properties we attach to the Occurrence are the "how" bits (including things like size, etc.).
In my mind, the essence of an "Individual" is the collective physical material of the individual. If I see a fish on a reef, its "Occurrence" on that reef and at that time exists (and is worth documenting) regardless of whether I took an image of it (what we would call "Evidence"), or whether I took a tissue sample from it, or whether I collected and killed the whole damn thing. To me, the "essence" of the individual -- or its occurrence at an event -- is unaffected by what I end up doing to it. By extension, following a hierarchical model of "individual", a sub-sample (materialSample) extracted from it is just another instance of "Individual". This is why I generally think of "materialSample" (if it were represented as a class -- which it is currently not proposed for DWC) as a subclass of a broader concept (e.g., "material", but what I have naively been referring to as "Individual").
That part of our model has proven to be very stable and effective for representing the information as we want it.
Where it gets complicated is instances of taxonomically heterogeneous objects treated as a single "individual" -- which (in my mind) includes such things as soil samples.
In that context, I see (and agree) with John and others that really it's a separate axis of classification from what I have called "Individual".
I don't expect that to make a lot of sense (I barely understand it myself).
Aloha,
Rich
*From:* Jason Holmberg [mailto:holmbergius@gmail.com] *Sent:* Thursday, May 30, 2013 11:28 AM *To:* Richard Pyle *Cc:* Ramona Walls; TDWG Content Mailing List; John Deck; Robert Whitton
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Hi everyone.
List lurker here. DWC has been a great inspiration in my work, so I hope I can contribute some small amount of insight on the "individual" and "material sample" threads. I have no grand thoughts on the subject, but I can tell you how the DWC has inspired my own information architecture for open source mark-recapture software:
http://www.ecoceanusa.org/shepherd/doku.php?id=manual:2.0.x:1_overview
I felt the very clear need for a distinct Individual Class and to separate that from the concept of a sample taken from nature. When reviewing DWC, I interpreted Occurrence.individualCount to be somewhat contradictory to Occurrence.individualID, so I created a one-individual-at-a-point-in-time class called Encounter that reuses quite a bit of DWC.Occurrence. Occurrence I then broadened to include the potential for multiple marked individuals.
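A loose Python sketch of the Encounter/Occurrence split described above. It is not taken from the mark-recapture software itself; the class names follow the paragraph, while the fields and the `individual_count` method are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Encounter:
    """One marked individual at one point in time."""
    encounter_id: str
    individual_id: str
    date: str


@dataclass
class Occurrence:
    """Broadened to hold encounters of several marked individuals."""
    occurrence_id: str
    encounters: List[Encounter] = field(default_factory=list)

    def individual_count(self) -> int:
        # Derived from the encounters instead of being stored separately.
        return len({e.individual_id for e in self.encounters})


# Hypothetical data: two marked individuals seen together.
sighting = Occurrence("occ-1", [
    Encounter("enc-1", "shark-A", "2013-05-30"),
    Encounter("enc-2", "shark-B", "2013-05-30"),
])
print(sighting.individual_count())
```

With this arrangement an individual count and an individual ID no longer compete on the same record: the ID lives on each Encounter and the count is computed on the Occurrence.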
I neither present this as "right" nor "good" (though they have worked very well for us). I just present it as a practical example from mark-recapture in which we have tried to adhere to DWC in order to expose data to GBIF, iOBIS, etc. The concepts of "material sample" and "individual" are very important to us, and this is how we have defined them.
Cheers,
Jason Holmberg ECOCEAN Whale Shark Photo-identification Library http://www.whaleshark.org
Please consider adopting a shark to support our mission: http://www.whaleshark.org/adoptashark.jsp
On Wed, May 29, 2013 at 4:13 PM, Richard Pyle <deepreef@bishopmuseum.org> wrote:
Hi Ramona,
Yes, I agree, and thanks. I've always felt that there has been a trend towards trying to push too much "ontology" (or other semantic meaning) onto DWC terms and classes, when DWC was fundamentally intended to represent a mechanism for data exchange; not a mechanism to describe the ontological landscape of biodiversity data. The only reason I brought this up now (and, I think, why we discussed it in 2010), is that the term "individualID" in DWC sort of hinted that something like "Individual" was the "forgotten class" for DWC. I sincerely hope that BCO and DSW gain more traction (and, ideally, harmony between them) than earlier attempts at developing ontologies in this space have met -- and clearly that is the right path forward.
My main concern for this thread (and the reason I engaged in it), was to:
1) Find out the status of the discussions that began in 2010; and
2) Clarify where the current materialSample proposal overlaps, or does not overlap, with that earlier effort.
Steve has very adequately answered the first question, and you, John, and others have answered the second, and I'm happy with both sets of answers.
I'm sorry for the voluminous exchange, but I felt the discussion was both important, and very helpful (certainly to me).
Aloha,
Rich
*From:* Ramona Walls [mailto:rlwalls2008@gmail.com] *Sent:* Wednesday, May 29, 2013 1:03 PM *To:* Richard Pyle *Cc:* John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Hi Rich,
Sorry I didn't mention this sooner, but your emails were also helpful to me in describing an important and generalizable use case.
I don't know whether or not the TDWG community is ready to deal with the level of abstraction we are talking about, but my assessment is that whether or not they are ready, the Darwin Core is not constructed to deal with it. That is why (among other reasons) we started work on the BCO, and perhaps one reason why Steve and others developed DSW.
Our goal with the material sample proposal was not to overhaul DwC, but to work within the DwC framework to make it more compatible with other standards such as MIxS. That is why we tried to keep our proposal fairly narrowly focused.
Ramona
On Wed, May 29, 2013 at 3:21 PM, Richard Pyle <deepreef@bishopmuseum.org> wrote:
Thanks, Ramona -- this is an **extremely** helpful email! It helps clear things up a lot in my mind.
Just to be clear, what I am looking for is the notion of a defined physical object (what I think you mean by "material entity"), and I explicitly mean the material entity itself. Yes, there is information (properties) that relate to that material entity, but to me that is a separate issue. What I would like to see clearly defined is the class representing the material (physical) entity -- which seems to me to be a superclass of what materialSample is intended to represent.
Perhaps our (TDWG/DWC) community is not yet ready to deal with this level of abstraction (unfortunately, I absolutely have to, because "Occurrence" is simply way too overloaded a class for me to use independently of what I have been calling "individual" and what I have been calling "Evidence"). In that case, I guess the best thing to do is accept materialSample as a basisOfRecord for Occurrence and move on. But this is more or less the same thing that happened the last time we engaged in this conversation (2 years or so ago), and I was hoping this conversation about materialSample could leverage progress on the larger issue.
As I've said before, the last thing I want to do is confuse or otherwise slow down the process of incorporating the term "materialSample" into DWC. It's just that I saw enough overlap with that "other" issue, that I was hoping we could find a reasonable pathway forward on both.
Thanks again for the very helpful comments.
Aloha,
Rich
*From:* Ramona Walls [mailto:rlwalls2008@gmail.com] *Sent:* Wednesday, May 29, 2013 9:14 AM *To:* Richard Pyle *Cc:* John Deck; Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Rich,
I now understand more fully what you are asking for (a clear definition goes a long way!). A material sample, as we discussed it at the Kansas and Oxford workshops, does indeed need to be physically removed from its environment. This is also the case with the OBI term material sample, which, as a subclass of OBI:specimen, is the output of some collecting process. It is true that the concept of material sample could be defined to include sampling in an observational sense, but that is not how it is defined at this point. Based on this, material sample is NOT the term you are looking for or defined as:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
What you have defined is a category of information (whatever that may be) that pertains to some material entity. Not the material entity itself, but information about that entity. The "SuperclassTerm" you refer to in the definition sounds an awful lot like a material entity from the Basic Formal Ontology, which is used for defining material sample in OBI and the Bio-collections Ontology.
Ramona
On Wed, May 29, 2013 at 11:51 AM, Richard Pyle <deepreef@bishopmuseum.org> wrote:
Many thanks, John. This is extremely helpful!
First of all, in the context of a distinct term for basisOfRecord, I see absolutely no problem with adding the term "MaterialSample". I fully support your proposal (although if this is simply a basisOfRecord term to be used alongside Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation; does it need a defined "ID" term? Do all the others have defined "ID" terms?).
However, I'm excited by this conversation because I think we are very close to solving a bigger problem (which was the focus of the 2010 discussion on this list around "IndividualOrganism").
This bigger problem involves the need for a defined "concept" (I'm hesitating to say "class"), and an associated "ID", in dwc that refers to the physical/material basis of an Occurrence. We don't yet have a term for this concept in dwc ("IndividualID" hinted at the need for one, but that term was not well-defined, and the term itself seems to cause confusion). As Steve Baskauf and I have both been advocating for the establishment of a new class in dwc for exactly this purpose, I just want to make sure that we're on the same page about what each concept is. The more I understand about what you need for "materialSample", the more convinced I am that both of our needs can be met with the same concept.
I am perfectly happy to adopt the term "MaterialSample", but I guess it all boils down to this: In order for something to be a "MaterialSample", must it necessarily be removed from nature?
If the answer is "No", then I think we can merge the two concepts into one.
If the answer is "Yes", then I think "materialSample" is best characterized as a subclass of what Steve and I have been pushing for (setting aside, for the moment, the additional complexity of taxonomically homogeneous vs. heterogeneous).
In the latter case, I would define the superclass (whatever term is used for it), along the lines of:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
Aloha,
Rich
*From:* jdeck88@gmail.com [mailto:jdeck88@gmail.com] *On Behalf Of *John Deck *Sent:* Wednesday, May 29, 2013 4:01 AM *To:* Richard Pyle *Cc:* Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton; Ramona Walls
*Subject:* Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Since the original proposal was from a group of folks, we decided to put our heads together to construct a general response to the various issues and ideas expressed on this thread.
John Deck for Rob Guralnick, Ramona Walls, and John Wieczorek
In the text of the issue submitted for MaterialSample (https://code.google.com/p/darwincore/issues/detail?id=167) we noted cases where the current basisOfRecord terms pertaining to the Occurrence class (Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation) do not adequately cover certain cases, including: environmental sample (for metagenomic analysis), transcriptomes (measuring genes but not taxa), and destructive samples (e.g. tissues destructively sampled in order to generate genomic DNA). The term we borrowed from OBI (http://purl.obolibrary.org/obo/OBI_0000747) is broad enough to be utilized across various cases that fulfill our criteria while still maintaining a consistent, clear and human understandable meaning. For our purposes, we can think of "Material Sample" as any type of matter that we can use in order to derive further evidence needed for identifications, and taxa, whether it is taxonomically homogeneous, heterogeneous, a single individual, sets of individuals, or populations.
How is MaterialSample different from Individual? The intent of individualID is fairly clear: since an Occurrence represents an organism at a place and time (per Markus' email), the individualID term allows us to assign an instance identifier for a particular organism that can be present in multiple events. MaterialSampleID, on the other hand, is intended to allow users to say that the basis of an occurrence is a material entity (i.e. matter) that has been sampled according to some particular method. Whether or not this material entity is an individual (sensu individualID in DwC) is an independent axis of classification. As was already pointed out, there is no restriction on specifying that an occurrence is associated with more than one type, so any occurrence can have both an individualID and a materialSampleID.
We maintain our position on the proposal for MaterialSample as a value for the basisOfRecord, with an associated materialSampleID to identify instances of them. Per Steve's initial comments, we have already withdrawn the proposal for a MaterialSample class distinct from that in the Darwin Core type vocabulary, which should make it easier to evaluate the implications of what we're discussing.
NOTES, MaterialSample from OBI:
OBI has fairly broad definitions of samples & specimens that are meant to be utilized across many different scientific activities. Material Sample is defined as a "/material entity that has the material sample role/", while a material sample role is defined as " /a specimen role borne by a material entity that is the output of a material sampling process/", and a material sampling process is "/a specimen gathering process with the objective to obtain a specimen that is representative of the input material entity/".
On Mon, May 27, 2013 at 11:59 PM, Richard Pyle <deepreef@bishopmuseum.org> wrote:
Hi Markus,
Great question! Particularly because this is exactly the sort of use case we designed our model around.
if you take a tissue sample of the same tree every year, would the identifier in individualID be the same for all of them or be different? With the current dwc:individualID definition it would be the same for all samples. If I understand you correctly each sample would have its own "individual" identifier in your proposal? I can't see how you can collapse these two things into one definition.
No, that is not how we would handle it.
In our model, there would be one IndividualID to represent the tree, spanning the time period beginning (more or less) when the seed was germinated, until the time at which the entire physical structure of the tree was disintegrated. It is an individual tree.
There would be multiple Occurrence instances, for each time that someone observed or sampled or otherwise wished to document some condition of that tree. All of these Occurrence instances would refer to the same individualID value (i.e., the "tree"). In the example above, this means there would be a different Occurrence instance for each year that a sample is taken -- because in each case, an assertion that the full tree existed at a certain time and place can be made (I understand that trees tend not to move around very much, so the Location for each event associated with each Occurrence would, in this case, remain the same; but the other Event properties -- such as eventID, samplingProtocol, samplingEffort, eventDate, eventTime, startDayOfYear, endDayOfYear, year, month, day, verbatimEventDate, habitat, fieldNumber, fieldNotes, eventRemarks -- would be documented accordingly for each sampling Occurrence instance).
Suppose that the tree is visited every month, but only sampled once per year. In that case, there would be an Occurrence record for every monthly visit. In other words, an Occurrence instance exists regardless of whether a physical sample was made or not. Any in-situ images made of the tree would likewise be associated with the specific Occurrence instance, and each image would represent a separate instance of "Evidence".
Now, let's focus on the annual samplings. Every time a new sample is taken from the tree, at least one new instance of Individual (with a unique individualID value) is created to represent the sample. This sample (individual instance) may be a "gathering" (set of multiple individual specimens gathered at the same time), or it may be a single specimen, or it may be simply a tissue sample intended for destructive analysis. In any case, it's a new individual instance derived from the "parent" individual instance representing the whole tree. In our implementation, "Individual" can be hierarchical, such that a whole-organism tree can be sub-sampled with many "child" instances of "gatherings" (say, one gathering each year), and each gathering may have multiple child "specimen" individuals (e.g. individual botanical sheets created from the multiple items of a single gathering), and each specimen may have further "child" subsamples extracted for DNA analysis (or whatever), and the hierarchy can continue on down to whatever derivatives that people feel a need to keep track of (e.g., aliquot).
The point is, all Individual instances are well-defined physical objects (or well-defined sets of physical objects), and they can be arranged in an n-tiered hierarchy.
Moreover, each Individual that can be characterized as a "sample" (what we refer to as a "CollectionObject") may also have a property value for "CollectionOccurrenceID" -- which refers to the specific Occurrence instance at which the sample was obtained.
So, for example, if the tree is visited on May 27, 2013 and a specimen (sample) is taken, then:
1) An Event instance is generated to represent the event where the tree was visited;
2) An Occurrence instance is generated, which refers to the new EventID, and the existing IndividualID for the whole tree, and includes whatever other Occurrence properties are relevant for the tree at the time of this Occurrence;
3) An Individual instance is generated for the specimen, which has a property value for parentIndividualID that refers to the individualID for the whole tree, and a property value for collectionOccurrenceID that refers to the Occurrence instance where the specimen was collected.
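The three steps for the May 27, 2013 visit can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual implementation being described: the dataclasses, field names, and ID strings are all hypothetical, chosen to echo parentIndividualID and collectionOccurrenceID from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    event_id: str
    event_date: str


@dataclass
class Occurrence:
    occurrence_id: str
    event_id: str
    individual_id: str


@dataclass
class Individual:
    individual_id: str
    parent_individual_id: Optional[str] = None
    collection_occurrence_id: Optional[str] = None


# The whole tree already exists as an Individual before the visit.
tree = Individual("tree-1")

# 1) An Event instance for the visit.
visit = Event("event-2013-05-27", "2013-05-27")

# 2) An Occurrence linking the new Event to the existing whole-tree Individual.
sighting = Occurrence("occ-2013-05-27", visit.event_id, tree.individual_id)

# 3) A new Individual for the specimen, pointing at its parent and at the
#    Occurrence during which it was collected.
specimen = Individual(
    "specimen-1",
    parent_individual_id=tree.individual_id,
    collection_occurrence_id=sighting.occurrence_id,
)

print(specimen)
```

A monthly visit without sampling would stop after step 2: the Occurrence exists whether or not a specimen Individual is ever created.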
So, to summarize the answer to your question:
- There are multiple Occurrence instances that refer to the same Individual instance representing the whole tree (and, hence can be collapsed to the same IndividualID value).
- Any Individual can have derivatives that are themselves unique Individual instances.
- Individuals are arranged hierarchically, and certain properties can be inherited up or down the hierarchy, depending on the properties and their associated logical constraints.
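One possible reading of the hierarchy and of downward inheritance, sketched in Python. The rules for which properties are inherited, and in which direction, are not spelled out in the message, so this sketch assumes the simplest case: a derivative takes a property from its nearest ancestor that records one. All IDs and the "Taxon A" value are placeholders.

```python
# Child individualID -> parentIndividualID (hypothetical IDs).
PARENT = {
    "gathering-2013": "tree-1",
    "sheet-A": "gathering-2013",
    "dna-extract-1": "sheet-A",
}

# Properties recorded directly on an Individual (placeholder values).
PROPERTIES = {
    "tree-1": {"taxon": "Taxon A"},
}


def lineage(individual_id):
    """Return [individual, parent, grandparent, ...] up to the whole organism."""
    chain = [individual_id]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def inherited(individual_id, name):
    """Look up a property on the individual, else on its nearest ancestor."""
    for ancestor in lineage(individual_id):
        value = PROPERTIES.get(ancestor, {}).get(name)
        if value is not None:
            return value
    return None


print(lineage("dna-extract-1"))
print(inherited("dna-extract-1", "taxon"))
```

Inheritance in the other direction (for example, a determination made on a DNA extract informing the parent tree) would need its own rule and is left out of this sketch.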
At some point, I will assemble a set of other specific use cases, and how we manage them through our use of the "Individual" instance (although I will probably not use the word "Individual", as this seems to cause too much confusion in these discussions).
Aloha, Rich
Many thanks, Steve.
What I have an issue with is equating the development of a consensus data model with the development of a robust ontology. In a previous email, Rich hoped that DSW might be harmonized with BCO. I really am not sure that is possible and is perhaps not even desirable. DSW and BCO are in my mind apples and oranges.
I agree. Through this discussion, I have since come to see my earlier position as unrealistic and naïve; and, in fact, not even necessarily desirable (as Steve indicates). So no resistance from me on that point.
Although we've called DSW an "ontology" because it's written in OWL and uses some of the constraints present in OWL to restrict how the DSW terms can be used, it really is fundamentally a data model, not an ontology.
The basis of DSW (outlined at http://code.google.com/p/darwin-sw/wiki/RelationshipToExistingModels) was pretty much laid out in Rich's email http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001703.html based on the ASC model as modified by the discussion of "individual" in the 2010 discussion.
The DSW model says that one to many Events can happen at one Location, one to many Occurrences can be documented during one Event, one Individual can be recorded in one to many Occurrences, etc. The DSW model does NOT define (ontologically) what a Location, Event, Occurrence, or Individual is (other than in the documentary text) or how they are related to each other ontologically (except to say that the class instances can be connected through DSW object properties, e.g. <dsw:IndividualOrganism instance> dsw:hasOccurrence <dwc:Occurrence instance>).
DSW is designed to describe (and to some extent restrict) how its users should organize their data to allow them to aggregate their data with other DSW users and to allow queries to be constructed that will produce consistent results across providers.
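To make the one-to-many point concrete, here is a small sketch of how such links might be checked once data from several providers are pooled. It uses plain Python tuples rather than a real RDF library, the instance IDs are invented for illustration, and the helper functions are hypothetical; only the property name dsw:hasOccurrence is taken from the text above.

```python
from collections import defaultdict

# (subject, predicate, object) triples; the IDs are made up for illustration.
triples = [
    ("ind-tree", "dsw:hasOccurrence", "occ-1"),
    ("ind-tree", "dsw:hasOccurrence", "occ-2"),
    ("ind-bird", "dsw:hasOccurrence", "occ-3"),
]


def occurrences_by_individual(triples):
    """Group Occurrence IDs under the Individual that was recorded."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "dsw:hasOccurrence":
            grouped[subject].append(obj)
    return dict(grouped)


def shared_occurrences(triples):
    """Return Occurrence IDs linked to more than one Individual, which would
    break the 'one Individual, one to many Occurrences' pattern."""
    owners = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "dsw:hasOccurrence":
            owners[obj].add(subject)
    return sorted(occ for occ, inds in owners.items() if len(inds) > 1)
```

One Individual with several Occurrences passes this check; an Occurrence claimed by two Individuals is reported. This is the sense in which the model restricts how data are organized without saying anything ontological about what an Individual or an Occurrence is.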
Let me just say here that, while I agree with everything you say above (i.e., that DSW is more of a data model with some ontological characteristics, than a proper ontology), I see DSW as an EXTREMELY valuable step in the right direction. Back when Steve first posted all that information to the google code site, we printed up a copy of the diagram at the top of this page: https://code.google.com/p/darwin-sw/ on large-format paper, and it remains posted on the wall in the office that Rob and I share as a guide. We now have our own diagram (which is currently sketched on a whiteboard right next to the DSW diagram), which is conceptually almost identical, but with some extensions and additional features (e.g., many-to-many relationship between Occurrence and "Evidence" (=token); hierarchical Locations, Events, Evidence, and Individuals; etc.).
But like Steve, what we have is a data model, not an ontology.
I think we clearly need a mechanism for defining and clarifying the relationships among material samples, organisms, specimens, material entities, populations, etc. and BCO or something like it is probably the best way to do that clearly. But I don't think that the resulting ontology is going to be a data model like DSW or ASC. I think a consensus ontology and a consensus data model would both be very useful, but I don't think they will or should be expected to be one and the same thing.
OK, I think this is an extremely important point. So, I guess the question is: which should we focus on? Data model, or ontology? The obvious answer is "Both". However, if it is "Both", then the historical trend is that one class of people tends to converge around the ontology, and another class of people tends to converge around the data model (both classes being subclasses of the superclass "biodiversityDataNerd") -- which is sort of the predicament we're in right now. My earlier comment about moving the center of mass of the discussion was an effort to build some bridges between these two (currently largely non-connected) conversations.
I have a lot of experience thinking about data models, and a lot to contribute on that topic. I have very little experience thinking about ontologies, and very little to contribute on that topic (my definition of "ontology" is the one Roger Hyam showed at TDWG a few years ago: "Ontology: blah blah blah"). But I also recognize the strong need for these groups to co-mingle more than they have been. We definitely need an ontology to allow reasoning across the information stored in our data models; but it's not unusual for me to see pieces of biodiversity ontologies that could have benefitted from some better insight on how the biodiversity data are modeled (though this may have been limited to early biodiversity ontology efforts -- I haven't kept up lately).
All of this rambling to ask: What do we do next? Do we need to stop talking about DWC and start talking about..... what? Data Modeling? Ontology? Both? Separately? Concurrently? On this list? On a Wiki somewhere...? I really have no idea or opinion about where we go from here -- as long as it's not the same old circular conversation (also, I'd rather it not be "nowhere").
Aloha, Rich
Rich,
I would also like to see this conversation continue, although like you, I'm not sure about the venue and strategy. I suspect that we are close to saturating the tdwg-content email list - at a certain level of traffic, some subscribers start to zone out. I think this discussion would be appropriate on the RDF Task Group (technically RDF/OWL Task Group) email list, although ontology development doesn't have to be done in OWL and data models don't have to be expressed as RDF. It's my understanding that the RDF TG has been charged with "examining the implications of adding new classes to the Darwin Core Type Vocabulary in the broader context of clarification of the relationships among classes in the biodiversity realm" (see Background at http://code.google.com/p/tdwg-rdf/ ), which sounds a lot like just what we've been talking about.
The RDF TG has a functioning Wiki and email infrastructure as well as Subversion capabilities and a couple of RDF sandboxes for testing. Also, a subset of the TG members/subscribers (not including me) are OBO Ontology/BioPortal savvy. The disadvantage of moving the discussion to the RDF TG email list is that it might exclude people with data modeling experience who are on tdwg-content, but not on the RDF TG email list (but anyone who is interested can be added to the email list).
Alternatively, the discussion could just continue here... Steve
participants (5)
- Jason Holmberg
- John Deck
- Markus Döring
- Richard Pyle
- Steve Baskauf