Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Hi Ramona,
I apologize for the long emails, but this stuff is complex and unfortunately requires lots of words (to avoid - or at least minimize - misunderstanding). I will try to keep my responses to your points short.
Using the word "individual" to describe collections of organisms - whether they are taxonomically homogenous or heterogeneous - makes no sense.
Yes, I know it is just a label, but seriously, just make a better label.
Yes, I agree. But it's what we already have in DWC (http://rs.tdwg.org/dwc/terms/index.htm#individualID). I have no problem using a different term, but before we choose terms, we should first define what the concepts are.
A single organism and a collection of organisms are fundamentally different things.
Actually, not really different things. Many natural history collections maintain their specimens as "lots", which may have a single individual specimen, or multiple specimens. Regardless of whether it's a single specimen or multiple specimens, the basic properties are the same (same collecting event, same taxonomic identification, and many other identical properties). This becomes especially true for colonial organisms (like corals, where the "individual" could be interpreted as a single polyp). It's also true for other use cases we deal with that are outside the DWC/TDWG scope.
If you need a class that can cover both of them under certain circumstances, you need to use a logical definition to define the circumstances (just like the class material sample does by using the criterion of having a material sample role). In order to do this, you also need to have separate classes for individual organism and collection of organisms.
We have tried to do this by distinguishing instances as "Lot" or "Whole Organism" -- which could be thought of as distinct subclasses (though again, they generally share the same properties). The same is true for tissue samples, and other "parts".
I agree whole-heartedly with the need to clearly track stakeholders' needs for different classes of things, using a logical system to decide how these things relate to one another, examining alternative systems for creating the classes of things, and testing them against use cases (Steve's points 1-4).
This is precisely what we are trying to do with the bio-collections ontology (BCO). The suggestion to use the term material sample came out of just such a process. It is important to remember that the stakeholders include more than just the community using DwC.
It seems we are all in full agreement on these points. In my case, I am especially in agreement with the last point, as much of our thinking has been independent of the TDWG/DWC thinking, but still keeping that set of use-cases in mind.
Aloha, Rich
Rich,
Thanks for clarifying that "individual" as you are using the term corresponds to dsw:IndividualOrganism rather than THeE. I didn't read carefully enough. With regard to the term "Individual", as you note, its origin is from the term dwc:individualID. The original DwC term addition proposal was to create the term dwc:Individual to follow the pattern of the other ID terms in DwC. When Cam and I were writing DSW in Web Ontology Language (OWL), we realized that "individual" had a particular meaning in OWL: it effectively means "instance". So that made creating a class called dsw:Individual particularly confusing. For that reason, Cam suggested dsw:IndividualOrganism to indicate that we were talking about individuals sensu organisms rather than individuals sensu OWL. As you know, we never intended for it to apply only to individual organisms.
I think that pretty much everybody agrees that "individual" is a confusing term name for a number of reasons. If at some point there is a DwC term which corresponds to what we are talking about (TaxonomicallyHomogeneousEntity, THoE, or whatever), the solution may be to deprecate dwc:individualID and change it to dwc:taxonomicallyHomogeneousEntityID or whatever corresponds to the new class name with "ID" tacked on the end. For convenience, in this email I'll refer to "Individual" with the understanding that it's not a good name.
Although there is potentially significant overlap between the proposed dwc:MaterialSample class and Individual, I think that there are at least two ways that they differ significantly. One is that I'm pretty sure that there is no requirement that a dwc:MaterialSample must be a biological material (i.e. derived from a living thing). I think that it's pretty clear from what Rich has said that Individual (to include the range from tissue samples up to herds) must consist of biological materials. The other is that a material sample must be physically sampled (i.e. removed from the environment and subjected to some kind of processing). An important feature of an Individual (at least to me!) is that it can be observed, photographed, or recorded without necessarily having all or part of it being removed from its environment and subjected to processing. My reading of the definition of dwc:individualID (http://rs.tdwg.org/dwc/terms/index.htm#individualID ) is that the "individual or named group of individual organisms" (e.g. "Orca J 23") might be observed repeatedly without physical sampling. The definition of dwc:individualID says "resampling", but I think "sampling" there was being used more broadly than just "physically removing part or all of the organism".
I've been thinking about whether it is a problem for DwC type vocabulary terms to overlap. There is nothing in the current definitions of the type vocabulary terms that requires its classes to be disjoint. I think it is possible that something could be both a dwctype:PreservedSpecimen and a dwctype:FossilSpecimen, and if dwctype:MaterialSample is accepted as a term there would undoubtedly be things that were both dwctype:PreservedSpecimen and dwctype:MaterialSample. So I don't think it is necessarily a problem if there is overlap between dwctype:MaterialSample and an Individual class. Certainly RDF allows a resource to have two (or more) rdf:type declarations.
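To make the point about overlapping types concrete, here is a minimal sketch in Python. It assumes nothing beyond the standard library and represents triples as plain tuples rather than using an RDF toolkit; the `ex:` identifiers and the `ex:Individual` class name are hypothetical and only stand in for whatever the class ends up being called.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library assumed.
RDF_TYPE = "rdf:type"

# Hypothetical resources: one specimen typed twice, one observed-only individual.
triples = {
    ("ex:specimen42", RDF_TYPE, "dwctype:PreservedSpecimen"),
    ("ex:specimen42", RDF_TYPE, "dwctype:MaterialSample"),
    ("ex:orcaJ23", RDF_TYPE, "ex:Individual"),
}


def types_of(subject, graph):
    """Return every rdf:type asserted for a subject, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == RDF_TYPE)


specimen_types = types_of("ex:specimen42", triples)
orca_types = types_of("ex:orcaJ23", triples)
```

Nothing in this structure prevents the same resource from carrying two types; whether data providers would apply them consistently is a separate question.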
With regard to Ramona's objection "Making one root class to cover lots of different types of entities is poor ontological practice", I would just note that Darwin Core is a "glossary of terms ... intended to facilitate the sharing of information about biological diversity" (http://www.tdwg.org/standards/450/ ) and that the mission of TDWG in general is to "Develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms" (http://www.tdwg.org/about-tdwg/ ) and not ontology building per se. So in the context of TDWG and Darwin Core, the primary criterion for judging a proposed term is whether it effectively facilitates the sharing and exchange of information about biological diversity, and not whether it fits well into an ontology. Don't get me wrong - I'm fully in support of ontology-building as a means to clarify the relationships among entities of interest to TDWG. What I'm saying is that there will probably be terms in DwC that have a utilitarian purpose in promoting data exchange that will never be part of an ontology. It is possible (perhaps likely) that Individual will be such a term. It was once described as "more of a database join than a real thing" (or something like that) which is perhaps an overstatement because it does correspond roughly to a certain set of real things. But I think its purpose is really more for linking sets of resources that have shared properties and shared connections to identifications, observations, etc.
Steve
Thanks, Steve. I'll keep my responses as minimal as possible.
I think that pretty much everybody agrees that "individual" is a confusing term name for a number of reasons.
And I agree as well. I hope it's clear that we latched on to that term only because, at the time, it was the closest term to what we needed, and in many cases it's better to stick with an existing (even if potentially confusing) term than it is to invent a new (but nearly identical) term -- which risks creating even more confusion. As I have said, I think we should sort out the concepts first, then we should debate about the appropriate terms to label the concepts.
Although there is potentially significant overlap between the proposed dwc:MaterialSample class and Individual, I think that there are at least two ways that they differ significantly. One is that I'm pretty sure that there is no requirement that a dwc:MaterialSample must be a biological material (i.e. derived from a living thing).
In my mind, this doesn't really count as a "difference", because in our model, an "Individual" does not need to be biological material either. However, I concede that this is a bastardization of the original intent of dwc:individualID, so I'm ready to completely abandon the term "individual". My real concern is that we do not maintain parallel and largely overlapping terms. Following your suggestion, I would therefore recommend that we move towards deprecating dwc:individualID, and do one of the following:
1) Replace it with dwc:materialSampleID and establish a new materialSample class; or
2) Replace it with [someOtherLabel]ID and establish a new someOtherLabel class
But as I keep saying, the most important thing I think we need to discuss is whether the original intent of dwc:individualID:
"An identifier for an individual or named group of individual organisms represented in the Occurrence. Meant to accommodate resampling of the same individual or group for monitoring purposes. May be a global unique identifier or an identifier specific to a data set."
...encompasses (or should be redefined to encompass) what is needed to accommodate the needs of the proposed dwc:materialSample:
"The category of information pertaining to the physical results of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed."
Are we better served by a single class of thing that includes both definitions (with defined subclasses as needed)? Or are we better off with two completely separate classes of things? For reasons that would require far too much text to describe here, I strongly favor the former.
I can't express a preference for how best to label this concept (or these separate concepts) until I know what the concepts are.
I think that it's pretty clear from what Rich has said that Individual (to include the range from tissue samples up to herds) must consist of biological materials.
No, that's not correct. In our model, "Individual" includes things that are decidedly *not* biological. Biological things are a subset of instances of our "Individual".
The other is that a material sample must be physically sampled (i.e. removed from the environment and subjected to some kind of processing). An important feature of an Individual (at least to me!) is that it can be observed, photographed, or recorded without necessarily having all or part of it being removed from its environment and subjected to processing.
Yes -- exactly. I think we need a superclass that can be applied to physical objects (either biological, or non-biological). A subset of these things are biological. Another subset of these things are extracted from nature. Another subset is subsampled and used for some sort of destructive or non-destructive analysis.
There is nothing in the current definitions of the type vocabulary terms that require its classes to be disjoint. I think it is possible that something could be both a dwctype:PreservedSpecimen and a dwctype:FossilSpecimen, and if dwctype:MaterialSample is accepted as a term there would undoubtedly be things that were both dwctype:PreservedSpecimen and dwctype:MaterialSample. So I don't think it is necessarily a problem if there is overlap between dwctype:MaterialSample and an Individual class. Certainly RDF allows a resource to have two (or more) rdf:type declarations.
Sure -- but we are setting ourselves up for chaos if we leave it open for individual providers to apply one class or another to the same physical thing.
Aloha, Rich
Rich,
if you take a tissue sample of the same tree every year, would the identifier in individualID be the same for all of them or be different? With the current dwc:individualID definition it would be the same for all samples. If I understand you correctly, each sample would have its own "individual" identifier in your proposal? I can't see how you can collapse these two things into one definition.
best, Markus
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Hi Markus,
Great question! Particularly because this is exactly the sort of use case we designed our model around.
if you take a tissue sample of the same tree every year, would the identifier in individualID be the same for all of them or be different? With the current dwc:individualID definition it would be the same for all samples. If I understand you correctly, each sample would have its own "individual" identifier in your proposal? I can't see how you can collapse these two things into one definition.
No, that is not how we would handle it.
In our model, there would be one IndividualID to represent the tree, spanning the time period beginning (more or less) when the seed was germinated, until the time at which the entire physical structure of the tree was disintegrated. It is an individual tree.
There would be multiple Occurrence instances, for each time that someone observed or sampled or otherwise wished to document some condition of that tree. All of these Occurrence instances would refer to the same individualID value (i.e., the "tree"). In the example above, this means there would be a different Occurrence instance for each year that a sample is taken -- because in each case, an assertion that the full tree existed at a certain time and place can be made (I understand that trees tend not to move around very much, so the Location for each event associated with each Occurrence would, in this case, remain the same; but the other Event properties -- such as eventID, samplingProtocol, samplingEffort, eventDate, eventTime, startDayOfYear, endDayOfYear, year, month, day, verbatimEventDate, habitat, fieldNumber, fieldNotes, eventRemarks -- would be documented accordingly for each sampling Occurrence instance).
Suppose that the tree is visited every month, but only sampled once per year. In that case, there would be an Occurrence record for every monthly visit. In other words, an Occurrence instance exists regardless of whether a physical sample was made or not. Any in-situ images made of the tree would likewise be associated with the specific Occurrence instance, and each image would represent a separate instance of "Evidence".
Now, let's focus on the annual samplings. Every time a new sample is taken from the tree, at least one new instance of Individual (with a unique individualID value) is created to represent the sample. This sample (individual instance) may be a "gathering" (set of multiple individual specimens gathered at the same time), or it may be a single specimen, or it may be simply a tissue sample intended for destructive analysis. In any case, it's a new individual instance derived from the "parent" individual instance representing the whole tree. In our implementation, "Individual" can be hierarchical, such that a whole-organism tree can be sub-sampled with many "child" instances of "gatherings" (say, one gathering each year), and each gathering may have multiple child "specimen" individuals (e.g. individual botanical sheets created from the multiple items of a single gathering), and each specimen may have further "child" subsamples extracted for DNA analysis (or whatever), and the hierarchy can continue on down to whatever derivatives that people feel a need to keep track of (e.g., aliquot).
The point is, all Individual instances are well-defined physical objects (or well-defined sets of physical objects), and they can be arranged in an n-tiered hierarchy.
Moreover, each Individual that can be characterized as a "sample" (what we refer to as a "CollectionObject") may also have a property value for "CollectionOccurrenceID" -- which refers to the specific Occurrence instance at which the sample was obtained.
So, for example, if the tree is visited on May 27, 2013 and a specimen (sample) is taken, then:
1) An Event instance is generated to represent the event where the tree was visited;
2) An Occurrence instance is generated, which refers to the new EventID, and the existing IndividualID for the whole tree, and includes whatever other Occurrence properties are relevant for the tree at the time of this Occurrence;
3) An Individual instance is generated for the specimen, which has a property value for parentIndividualID that refers to the individualID for the whole tree, and a property value for collectionOccurrenceID that refers to the Occurrence instance where the specimen was collected.
So, to summarize the answer to your question:
- There are multiple Occurrence instances that refer to the same Individual instance representing the whole tree (and hence can be collapsed to the same IndividualID value).
- Any Individual can have derivatives that are themselves unique Individual instances.
- Individuals are arranged hierarchically, and certain properties can be inherited up or down the hierarchy, depending on the properties and their associated logical constraints.
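The tree example can be sketched roughly as follows. This is only an illustration in Python, not the actual implementation described in this message: the class layout, the `kind` field, the `ancestors` helper and all identifier values are assumptions made for the sketch, and only the names individualID, parentIndividualID, collectionOccurrenceID and eventID come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record types; not an official Darwin Core schema.


@dataclass
class Event:
    event_id: str
    event_date: str


@dataclass
class Individual:
    individual_id: str
    kind: str  # assumed labels, e.g. "WholeOrganism", "Specimen", "TissueSample"
    parent_individual_id: Optional[str] = None
    collection_occurrence_id: Optional[str] = None


@dataclass
class Occurrence:
    occurrence_id: str
    event_id: str
    individual_id: str


# One Individual for the whole tree, created once.
tree = Individual("ind:tree-1", "WholeOrganism")

# Monthly visits: one Event and one Occurrence each, all pointing at the tree.
events = [Event(f"evt:2013-{m:02d}", f"2013-{m:02d}-27") for m in (3, 4, 5)]
occurrences = [
    Occurrence(f"occ:{e.event_id[4:]}", e.event_id, tree.individual_id)
    for e in events
]

# On the May visit a specimen is taken: a new Individual derived from the tree,
# linked to the Occurrence at which it was collected.
may_occurrence = occurrences[-1]
specimen = Individual(
    "ind:specimen-1",
    "Specimen",
    parent_individual_id=tree.individual_id,
    collection_occurrence_id=may_occurrence.occurrence_id,
)

# A later subsample hangs off the specimen, extending the hierarchy.
tissue = Individual(
    "ind:tissue-1", "TissueSample", parent_individual_id=specimen.individual_id
)

index = {i.individual_id: i for i in (tree, specimen, tissue)}


def ancestors(ind, by_id):
    """Walk parent links up to the root and return the chain of IDs."""
    chain = []
    while ind.parent_individual_id is not None:
        ind = by_id[ind.parent_individual_id]
        chain.append(ind.individual_id)
    return chain


tissue_lineage = ancestors(tissue, index)
```

Every Occurrence here carries the tree's individualID, while each derived sample gets its own individualID and reaches the tree through its parent links.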
At some point, I will assemble a set of other specific use cases, and how we manage them through our use of the "Individual" instance (although I will probably not use the word "Individual", as this seems to cause too much confusion in these discussions).
Aloha, Rich
Since the original proposal was from a group of folks, we decided to put our heads together to construct a general response to the various issues and ideas expressed on this thread.
John Deck for Rob Guralnick, Ramona Walls, and John Wieczorek
In the text of the issue submitted for MaterialSample ( https://code.google.com/p/darwincore/issues/detail?id=167) we noted cases where the current basisOfRecord terms pertaining to the Occurrence class (Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation) do not adequately cover certain cases, including: environmental sample (for metagenomic analysis), transcriptomes (measuring genes but not taxa), and destructive samples (e.g. tissues destructively sampled in order to generate genomic DNA). The term we borrowed from OBI (http://purl.obolibrary.org/obo/OBI_0000747) is broad enough to be utilized across various cases that fulfill our criteria while still maintaining a consistent, clear and human understandable meaning. For our purposes, we can think of “Material Sample” as any type of matter that we can use in order to derive further evidence needed for identifications, and taxa, whether it is taxonomically homogenous, heterogenous, a single individual, sets of individuals, or populations.
How is MaterialSample different from Individual? The intent of individualID is fairly clear: since an Occurrence represents an organism at a place and time (per Markus’ email), the individualID term allows us to assign an instance identifier for a particular organism that can be present in multiple events. MaterialSampleID, on the other hand, is intended to allow users to say that the basis of an occurrence is a material entity (i.e. matter) that has been sampled according to some particular method. Whether or not this material entity is an individual (sensu individualID in DwC) is an independent axis of classification. As was already pointed out, there is no restriction on specifying that an occurrence is associated with more than one type, so any occurrence can have both an individualID and a materialSampleID.
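As a rough illustration of the two axes being independent, the sketch below uses flat Python dictionaries keyed by the DwC term names mentioned above. The record values, the `classify` helper and its output keys are hypothetical, and materialSampleID is treated here as if the proposal had been accepted.

```python
# Hypothetical flat records; keys mirror the DwC terms discussed above.
tree_sample = {
    "occurrenceID": "occ:2013-05",
    "basisOfRecord": "MaterialSample",
    "individualID": "ind:tree-1",
    "materialSampleID": "ms:leaf-punch-7",
}
env_sample = {
    "occurrenceID": "occ:soil-1",
    "basisOfRecord": "MaterialSample",
    "materialSampleID": "ms:soil-core-3",
}
observation = {
    "occurrenceID": "occ:orca-1",
    "basisOfRecord": "HumanObservation",
    "individualID": "Orca J 23",
}


def classify(record):
    """Report the two axes separately: tied to an individual, tied to a sample."""
    return {
        "has_individual": bool(record.get("individualID")),
        "has_material_sample": bool(record.get("materialSampleID")),
    }


results = [classify(r) for r in (tree_sample, env_sample, observation)]
```

The three records cover three of the four combinations: both identifiers, a sample with no individual (an environmental sample), and an individual with no sample (a repeated observation).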
We maintain our position on the proposal for MaterialSample as a value for the basisOfRecord, with an associated materialSampleID to identify instances of them. Per Steve’s initial comments, we have already withdrawn the proposal for a MaterialSample class distinct from that in the Darwin Core type vocabulary, which should make it easier to evaluate the implications of what we’re discussing.
********************
NOTES, MaterialSample from OBI:
OBI has fairly broad definitions of samples & specimens that are meant to be utilized across many different scientific activities. Material Sample is defined as a “material entity that has the material sample role”, while a material sample role is defined as “a specimen role borne by a material entity that is the output of a material sampling process”, and a material sampling process is “a specimen gathering process with the objective to obtain a specimen that is representative of the input material entity”.
On Mon, May 27, 2013 at 11:59 PM, Richard Pyle deepreef@bishopmuseum.orgwrote:
Hi Markus,
Great question! Particularly because this is exactly the sort of use case we designed our model around.
if you take a tissue sample of the same tree every year, would the
identifier
in individualID be the same for all of them or be different? WIth the
current
dwc:individualID definition it would be the same for all samples. If I understand you correct each sample would have its own "individual" identifier in your proposal? It can't see how you can collapse these two
things
into one definition.
No, that is not how we would handle it.
In our model, there would be one IndividualID to represent the tree, spanning the time period beginning (more or less) when the seed was germinated, until the time at which the entire physical structure of the tree was disintegrated. It is an individual tree.
There would be multiple Occurrence instances, for each time that someone observed or sampled or otherwise wished to document some condition of that tree. All of these Occurrence instances would refer to the same individualID value (i.e., the "tree"). In the example above, this means there would be a different Occurrence instance for each year that a sample is taken -- because in each case, an assertion that the full tree existed at a certain time and place can be made (I understand that trees tend not to move around very much, so the Location for each event associated with each Occurrence would, in this case, remain the same; but the other Event properties -- such as eventID, samplingProtocol, samplingEffort, eventDate, eventTime, startDayOfYear, endDayOfYear, year, month, day, verbatimEventDate, habitat, fieldNumber, fieldNotes, eventRemarks -- would be documented accordingly for each sampling Occurrence instance).
Suppose that the tree is visited every month, but only sampled once per year. In that case, there would be an Occurrence record for every monthly visit. In other words, an Occurrence instance exists regardless of whether a physical sample was made or not. Any in-situ images made of the tree would likewise be associated with the specific Occurrence instance, and each image would represent a separate instance of "Evidence".
Now, let's focus on the annual samplings. Every time a new sample is taken from the tree, at least one new instance of Individual (with a unique individualID value) is created to represent the sample. This sample (individual instance) may be a "gathering" (set of multiple individual specimens gathered at the same time), or it may be a single specimen, or it may be simply a tissue sample intended for destructive analysis. In any case, it's a new individual instance derived from the "parent" individual instance representing the whole tree. In our implementation, "Individual" can be hierarchical, such that a whole-organism tree can be sub-sampled with many "child" instances of "gatherings" (say, one gathering each year), and each gathering may have multiple child "specimen" individuals (e.g. individual botanical sheets created from the multiple items of a single gathering), and each specimen may have further "child" subsamples extracted for DNA analysis (or whatever), and the hierarchy can continue on down to whatever derivatives that people feel a need to keep track of (e.g., aliquot).
The point is, all Individual instances are well-defined physical objects (or well-defined sets of physical objects), and they can be arranged in a n-tiered hierarchy.
Moreover, each Individual that can be characterized as a "sample" (what we refer to as a "CollectionObject") may also have a property value for "CollectionOccurrenceID" -- which refers to the specific Occurrence instance at which the sample was obtained.
So, for example, if the tree is visited on May 27, 2013 and a specimen (sample) is taken, then:
1) An Event instance is generated to represent the event where the tree was visited;
2) An Occurrence instance is generated, which refers to the new EventID and the existing IndividualID for the whole tree, and includes whatever other Occurrence properties are relevant for the tree at the time of this Occurrence;
3) An Individual instance is generated for the specimen, which has a property value for parentIndividualID that refers to the individualID for the whole tree, and a property value for collectionOccurrenceID that refers to the Occurrence instance where the specimen was collected.
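The three steps above can be sketched as plain records. This is only an illustration under stated assumptions: the class layout, the Python field names, and the identifier strings (such as "ind:tree") are hypothetical and are not taken from any actual implementation; only the relationships (Occurrence refers to an Event and to the whole-tree Individual; the specimen Individual refers to its parent and to its collecting Occurrence) come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    event_id: str
    event_date: str


@dataclass
class Individual:
    individual_id: str
    # Set only for derivatives (gatherings, specimens, tissue samples, ...).
    parent_individual_id: Optional[str] = None
    # Set only for Individuals that are samples ("CollectionObject").
    collection_occurrence_id: Optional[str] = None


@dataclass
class Occurrence:
    occurrence_id: str
    event_id: str
    individual_id: str


# The whole tree already exists as an Individual (hypothetical identifier).
tree = Individual("ind:tree")

# 1) An Event for the visit on May 27, 2013.
event = Event("evt:2013-05-27", "2013-05-27")

# 2) An Occurrence that refers to the new Event and the existing tree.
occurrence = Occurrence("occ:1", event.event_id, tree.individual_id)

# 3) A new Individual for the specimen, linked to its parent and to the
#    Occurrence at which it was collected.
specimen = Individual(
    "ind:specimen-1",
    parent_individual_id=tree.individual_id,
    collection_occurrence_id=occurrence.occurrence_id,
)
```

A monthly visit without sampling would add only an Event and an Occurrence; no new Individual would be created.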
So, to summarize the answer to your question:
- There are multiple Occurrence instances that refer to the same Individual instance representing the whole tree (and hence can be collapsed to the same IndividualID value).
- Any Individual can have derivatives that are themselves unique Individual instances.
- Individuals are arranged hierarchically, and certain properties can be inherited up or down the hierarchy, depending on the properties and their associated logical constraints.
At some point, I will assemble a set of other specific use cases, and how we manage them through our use of the "Individual" instance (although I will probably not use the word "Individual", as this seems to cause too much confusion in these discussions).
Aloha, Rich
Many thanks, John. This is extremely helpful!
First of all, in the context of a distinct term for basisOfRecord, I see absolutely no problem with adding the term MaterialSample. I fully support your proposal (although if this is simply a basisOfRecord term to be used alongside Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation; does it need a defined ID term? Do all the others have defined ID terms?).
However, I'm excited by this conversation because I think we are very close to solving a bigger problem (which was the focus of the 2010 discussion on this list around IndividualOrganism).
This bigger problem involves the need for a defined concept (I'm hesitating to say class), and an associated ID, in dwc that refers to the physical/material basis of an Occurrence. We don't yet have a term for this concept in dwc (IndividualID hinted at the need for one, but that term was not well-defined, and the term itself seems to cause confusion). As Steve Baskauf and I have both been advocating for the establishment of a new class in dwc for exactly this purpose, I just want to make sure that we're on the same page about what each concept is. The more I understand about what you need for materialSample, the more convinced I am that both of our needs can be met with the same concept.
I am perfectly happy to adopt the term MaterialSample, but I guess it all boils down to this: In order for something to be a MaterialSample, must it necessarily be removed from nature?
If the answer is No, then I think we can merge the two concepts into one.
If the answer is Yes, then I think materialSample is best characterized as a subclass of what Steve and I have been pushing for (setting aside, for the moment, the additional complexity of taxonomically homogeneous vs. heterogeneous).
In the latter case, I would define the superclass (whatever term is used for it), along the lines of:
"The category of information pertaining to the physical basis of a sampling, subsampling, or observational event. In biological collections, the [SuperclassTerm] is typically a defined group of organisms, a single whole organism, or a part of a whole organism that is collected or otherwise documented in nature, and either preserved, destructively processed, or documented through some form of Evidence (such as images or reported visual observations)."
Aloha,
Rich
From: jdeck88@gmail.com [mailto:jdeck88@gmail.com] On Behalf Of John Deck
Sent: Wednesday, May 29, 2013 4:01 AM
To: Richard Pyle
Cc: Markus Döring; Steve Baskauf; TDWG Content Mailing List; Robert Whitton; Ramona Walls
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Since the original proposal was from a group of folks, we decided to put our heads together to construct a general response to the various issues and ideas expressed on this thread.
John Deck for Rob Guralnick, Ramona Walls, and John Wieczorek
In the text of the issue submitted for MaterialSample (https://code.google.com/p/darwincore/issues/detail?id=167) we noted cases where the current basisOfRecord terms pertaining to the Occurrence class (Occurrence, PreservedSpecimen, LivingSpecimen, FossilSpecimen, HumanObservation, MachineObservation) do not adequately cover certain cases, including: environmental samples (for metagenomic analysis), transcriptomes (measuring genes but not taxa), and destructive samples (e.g. tissues destructively sampled in order to generate genomic DNA). The term we borrowed from OBI (http://purl.obolibrary.org/obo/OBI_0000747) is broad enough to be utilized across various cases that fulfill our criteria while still maintaining a consistent, clear and human-understandable meaning. For our purposes, we can think of Material Sample as any type of matter that we can use in order to derive further evidence needed for identifications and taxa, whether it is taxonomically homogeneous, heterogeneous, a single individual, sets of individuals, or populations.
How is MaterialSample different from Individual? The intent of individualID is fairly clear: since an Occurrence represents an organism at a place and time (per Markus's email), the individualID term allows us to assign an instance identifier for a particular organism that can be present in multiple events. MaterialSampleID, on the other hand, is intended to allow users to say that the basis of an occurrence is a material entity (i.e. matter) that has been sampled according to some particular method. Whether or not this material entity is an individual (sensu individualID in DwC) is an independent axis of classification. As was already pointed out, there is no restriction on specifying that an occurrence is associated with more than one type, so any occurrence can have both an individualID and a materialSampleID.
We maintain our position on the proposal for MaterialSample as a value for the basisOfRecord, with an associated materialSampleID to identify instances of them. Per Steve's initial comments, we have already withdrawn the proposal for a MaterialSample class distinct from that in the Darwin Core type vocabulary, which should make it easier to evaluate the implications of what we're discussing.
********************
NOTES, MaterialSample from OBI:
OBI has fairly broad definitions of samples & specimens that are meant to be utilized across many different scientific activities. Material Sample is defined as a material entity that has the material sample role, while a material sample role is defined as a specimen role borne by a material entity that is the output of a material sampling process, and a material sampling process is a specimen gathering process with the objective to obtain a specimen that is representative of the input material entity.
Again this email is not related to the material sample proposal, but rather to the email quoted below.
Rich, I have been pondering your series of emails and wanted to flesh out in my mind the way that you envision implementing "individual" instances sensu Pyle/Whitton vs. "IndividualOrganism" instances sensu DSW. Once again, I recognize the deficiencies of those term names, but use them for convenience.
As was noted in the previous emails, dwc:individualID was minted as a way to tie together instances of repeated sampling/observing instances. In the 2010 discussion about the creation of a dwc:Individual class, it was noted that an "individual" (sensu vague) could not only tie together multiple occurrences (with "occurrence" defined as an organism recorded at a particular time and place) but also could tie together multiple identifications. In DSW, we can tie a third thing to dsw:IndividualOrganism: physical objects that are removed from the IndividualOrganism and which may constitute part or all of the dsw:IndividualOrganism. (For convenience at the moment, I'm going to talk about dsw:IndividualOrganism as if it were actually an individual organism but recognize that it could also represent some of the taxonomically homogeneous things in nature like herds, clones, and colonies.) So with some variation on what kinds of things one wants to connect to an "individual", an individual instance could be described: - in RDF terms as a node to which other types of resources are linked, and - in database terms as a join between various tables (hopefully I'm using this terminology correctly).
In DSW, we would prefer for there to be a single instance of dsw:IndividualOrganism for each set of resources that one would like to link. To make this more concrete, I'm going to introduce an example that I had already created a few months ago. You can see it at http://code.google.com/p/darwin-sw/wiki/BioBlitzUseCase . I'm going to focus on the first example called "Bird at BioBlitz". Although the example is described in detail in RDF, you can just look at the diagram and read the little story to get a feel for the situation. I'm going to use the following convention in talking about the example: the globally unique and persistent identifier for a resource is in angle brackets, e.g. <bird specimen> is the identifier for the bird specimen instance. (The RDF example has assigned actual fake URIs to each resource, but this convention should be easier to follow.) Anyway, you can see that in the example, <bird> is in the role I've just described for dsw:IndividualOrganism. There are two dwc:Identification instances linked to it and one dwc:Occurrence instance linked to it (the occurrence when the bird was shot and collected as a specimen). Although not included in the example, one could link N additional dwc:Occurrence instances to <bird> for instances when the bird was banded, observed, radio-tracked, etc. In the example, <bird specimen> is also linked to <bird>. In this particular case, the whole bird is collected, but one could imagine situations where specimens or material samples were collected without causing the bird to cease to exist in the environment (blood sample, toe clip, collect a feather or whatever). In the example, there is a branching series of derived material objects that came from <bird> which includes: <bird specimen>, <skeleton>, <skin>, <stomach>.
In DSW, these derived material objects are linked to their parent object by a transitive property, dsw:derivedFrom, which has an inverse transitive property, dsw:hasDerivative. In DSW, dsw:derivedFrom/dsw:hasDerivative is not limited to linking physical things to physical things and can be used to link any kind of resource to its parent resource, including information resources such as digital images or DNA sequences. To indicate that a physical thing is derived from another physical thing, we suggest dcterms:hasPart which is not transitive, but one could use other terms that are transitive (I think there is a BFO hasPart term that is transitive). Anyway, the point of using dsw:derivedFrom/dsw:hasDerivative is to get away from having to link multiple resources to the identification instances. We would not say

  <bird specimen> dsw:hasIdentification <Corvus caurinus>
  <bird specimen> dsw:hasIdentification <Corvus brachyrhynchos>
  <skeleton> dsw:hasIdentification <Corvus caurinus>
  <skeleton> dsw:hasIdentification <Corvus brachyrhynchos>
  <skin> dsw:hasIdentification <Corvus caurinus>
  <skin> dsw:hasIdentification <Corvus brachyrhynchos>
  <image> dsw:hasIdentification <Corvus caurinus>
  <image> dsw:hasIdentification <Corvus brachyrhynchos>
  etc.

we would just say that

  <bird> dsw:hasIdentification <Corvus caurinus>
  <bird> dsw:hasIdentification <Corvus brachyrhynchos>

and then connect derived things to <bird> using dsw:derivedFrom, e.g.

  <bird specimen> dsw:derivedFrom <bird>
  <skin> dsw:derivedFrom <bird specimen>

Because dsw:derivedFrom is transitive, a client could reason

  <skin> dsw:derivedFrom <bird>

and because dsw:hasDerivative is the inverse of dsw:derivedFrom, a client could also reason

  <bird> dsw:hasDerivative <bird specimen>
  <bird specimen> dsw:hasDerivative <skin>
  <bird> dsw:hasDerivative <skin>
There are many other dsw:derivedFrom and dsw:hasDerivative relationships in the diagram which for brevity I won't mention here, but they are explicitly stated in the Appendices to the example for anyone who is interested.
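For readers who do not work with reasoners, the inference described above can be imitated in a few lines of Python. This is a minimal sketch, not how a triplestore implements it: the function name transitive_closure and the use of plain strings for resources are assumptions made for illustration; only the two stated dsw:derivedFrom facts come from the example.

```python
def transitive_closure(pairs):
    """Return all (child, ancestor) pairs implied by a transitive relation."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # a derivedFrom b and b derivedFrom d implies a derivedFrom d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Explicitly stated dsw:derivedFrom facts, as (subject, object) pairs.
stated = {
    ("bird specimen", "bird"),
    ("skin", "bird specimen"),
}

# Transitivity adds ("skin", "bird").
derived_from = transitive_closure(stated)

# dsw:hasDerivative is the inverse: swap subject and object.
has_derivative = {(parent, child) for (child, parent) in derived_from}
```

The naive fixed-point loop is fine for a handful of facts; a real client would rely on the reasoner or a more efficient graph traversal.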
So the point of this method of organizing resources is to make querying simple. Assuming that a client has inferred all dsw:hasDerivative/dsw:derivedFrom relationships that aren't explicitly stated (by virtue of the inverse and transitive properties), it is a relatively simple matter to construct a query to discover all specimens identified as Corvus caurinus. Set up the query pattern:
  ?Individual dsw:hasIdentification <Corvus caurinus>.
  ?Specimen dsw:derivedFrom ?Individual.
  ?Specimen a dsw:PreservedSpecimen.

and then have the query engine find all instances of ?Specimen that fit the pattern (this query pattern is oversimplified because the identification is actually linked to a taxon concept instance; see real example at end). One could easily change the pattern to find all images of things derived from Individuals, all images of the Individuals themselves, all specimens that come from individuals identified as Corvus caurinus and which also have DNA sequences derived from that individual, etc., etc.
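To make the simplified pattern concrete without a triplestore, here is a rough Python sketch that evaluates the same three-part pattern over an in-memory set of triples. The sample triples, the assumption that the transitive dsw:derivedFrom facts have already been inferred, the dcmitype:StillImage type for the image, and the helper name specimens_identified_as are all illustrative and not part of DSW or the BioBlitz example.

```python
# (subject, predicate, object) triples; inferred derivedFrom facts are
# assumed to be present already.
triples = {
    ("bird", "dsw:hasIdentification", "Corvus caurinus"),
    ("bird", "dsw:hasIdentification", "Corvus brachyrhynchos"),
    ("bird specimen", "dsw:derivedFrom", "bird"),
    ("skin", "dsw:derivedFrom", "bird"),
    ("image", "dsw:derivedFrom", "bird"),
    ("bird specimen", "rdf:type", "dsw:PreservedSpecimen"),
    ("skin", "rdf:type", "dsw:PreservedSpecimen"),
    ("image", "rdf:type", "dcmitype:StillImage"),
}


def specimens_identified_as(taxon, triples):
    """Mimic the pattern:
    ?Individual dsw:hasIdentification <taxon>.
    ?Specimen dsw:derivedFrom ?Individual.
    ?Specimen a dsw:PreservedSpecimen.
    """
    individuals = {
        s for (s, p, o) in triples
        if p == "dsw:hasIdentification" and o == taxon
    }
    derived = {
        s for (s, p, o) in triples
        if p == "dsw:derivedFrom" and o in individuals
    }
    return sorted(
        s for s in derived
        if (s, "rdf:type", "dsw:PreservedSpecimen") in triples
    )


result = specimens_identified_as("Corvus caurinus", triples)
# result == ["bird specimen", "skin"]
```

Note that the identification is stated once, on the individual, and the specimens are found through the derivation links rather than through identifications of their own.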
If I'm understanding individual sensu Pyle correctly, in the example <bird>, <bird specimen>, <stomach>, <skin>, and <skeleton> would all be typed as individuals and possibly there would be no distinction between <bird> (the living thing that could be the subject of repeated Occurrence instances) and <bird specimen> (the dead bird in a drawer). In contrast, DSW would only type <bird> as dsw:IndividualOrganism and would type the other four things as dsw:LivingSpecimen (or dwctype:LivingSpecimen). From a philosophical point of view, one could take either position - the constraints on "individual" would just be different. As a practical matter, the DSW approach that I described above is desirable if one presupposes that the metadata will be expressed as RDF triples, that clients capable of inferring triples from transitive and inverse properties will be used, and that querying (i.e. SPARQL) will be performed on the triples. If one does NOT presuppose those things and simply lives in a relational database world, then some other approach (possibly the Pyle approach to individuals) might be better.
So what I'm wondering is: what are the advantages of typing <stomach>, <skin>, and <skeleton> as individual sensu Pyle? In the email below, you say "Any Individual can have derivatives that are themselves unique Individual instances." and "Individuals are arranged hierarchically, and certain properties can be inherited up or down the hierarchy, depending on the properties and their associated logical constraints." If someone asserts a third identification for <bird> as "Corvus novum", must that identification be inherited by <stomach>, <skin>, and <skeleton> instances to create additional facts like:

  <skeleton> dsw:hasIdentification <Corvus novum>
  <skin> dsw:hasIdentification <Corvus novum>
  <stomach> dsw:hasIdentification <Corvus novum>
It seems to me like some of the advantage of having some kind of "individual" instance as a central node or connection point gets lost if one starts proliferating related instances of "individual" because it requires duplicating assertions which one makes about one individual instance to all of the other related individual instances.
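To illustrate the trade-off being asked about, here is one hypothetical way a hierarchy of individuals could avoid duplicating assertions: store the identification once and resolve it by walking up the parent chain at lookup time. This is purely a sketch of the general idea; whether the Pyle/Whitton implementation works this way is not stated anywhere in this thread, and the names parent_of, identifications, and inherited_identifications are invented for the example.

```python
# Child -> parent links for the bird example (hypothetical layout).
parent_of = {
    "bird specimen": "bird",
    "skin": "bird specimen",
    "skeleton": "bird specimen",
    "stomach": "bird specimen",
}

# Identifications are asserted only on the node where they were made.
identifications = {
    "bird": ["Corvus caurinus", "Corvus brachyrhynchos"],
}


def inherited_identifications(node):
    """Return the identifications of the nearest ancestor that has any."""
    current = node
    while current is not None:
        if current in identifications:
            return identifications[current]
        current = parent_of.get(current)
    return []


# A third identification is asserted once, on <bird> only.
identifications["bird"].append("Corvus novum")

skin_ids = inherited_identifications("skin")
```

With this kind of lookup the new identification is visible from <skin>, <skeleton>, and <stomach> without creating any additional facts, at the cost of resolving the hierarchy on every query.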
I have a number of other questions, but I'll stop there in the interest of limiting the scope of this email to one question. If anyone is interested in the actual SPARQL query that corresponds to the example above, I will list it below.
Steve
To try out this actual SPARQL query, follow the instructions on http://code.google.com/p/darwin-sw/wiki/BioBlitzUseCase . These instructions give the URL for the triplestore sandbox that already has all of the triples for the example loaded (step 3). They also have the namespace abbreviations for cut and paste in step 4. Here is the actual query to paste below the namespace abbreviations:
SELECT DISTINCT ?Specimen
WHERE {
  ?Individual dsw:hasIdentification ?Identification.
  ?Identification dsw:toTaxon ?TaxonConcept.
  ?TaxonConcept tc:hasName ?Name.
  ?Name tn:genusPart "Corvus".
  ?Name tn:specificEpithet "caurinus".
  ?Specimen dsw:derivedFrom ?Individual.
  ?Specimen a dsw:PreservedSpecimen.
}
LIMIT 50
Here is another query which finds all images of any kind of resource which is derived from individuals identified as "Corvus caurinus":
SELECT DISTINCT ?Image
WHERE {
  ?Individual dsw:hasIdentification ?Identification.
  ?Identification dsw:toTaxon ?TaxonConcept.
  ?TaxonConcept tc:hasName ?Name.
  ?Name tn:genusPart "Corvus".
  ?Name tn:specificEpithet "caurinus".
  ?Resource dsw:derivedFrom ?Individual.
  ?Resource foaf:depiction ?Image.
}
LIMIT 50
Richard Pyle wrote:
Hi Markus,
Great question! Particularly because this is exactly the sort of use case we designed our model around.
if you take a tissue sample of the same tree every year, would the identifier in individualID be the same for all of them or be different? With the current dwc:individualID definition it would be the same for all samples. If I understand you correctly each sample would have its own "individual" identifier in your proposal? I can't see how you can collapse these two things into one definition.
No, that is not how we would handle it.
In our model, there would be one IndividualID to represent the tree, spanning the time period beginning (more or less) when the seed was germinated, until the time at which the entire physical structure of the tree was disintegrated. It is an individual tree.
There would be multiple Occurrence instances, for each time that someone observed or sampled or otherwise wished to document some condition of that tree. All of these Occurrence instances would refer to the same individualID value (i.e., the "tree"). In the example above, this means there would be a different Occurrence instance for each year that a sample is taken -- because in each case, an assertion that the full tree existed at a certain time and place can be made (I understand that trees tend not to move around very much, so the Location for each event associated with each Occurrence would, in this case, remain the same; but the other Event properties -- such as eventID, samplingProtocol, samplingEffort, eventDate, eventTime, startDayOfYear, endDayOfYear, year, month, day, verbatimEventDate, habitat, fieldNumber, fieldNotes, eventRemarks -- would be documented accordingly for each sampling Occurrence instance).
Suppose that the tree is visited every month, but only sampled once per year. In that case, there would be an Occurrence record for every monthly visit. In other words, an Occurrence instance exists regardless of whether a physical sample was made or not. Any in-situ images made of the tree would likewise be associated with the specific Occurrence instance, and each image would represent a separate instance of "Evidence".
Now, let's focus on the annual samplings. Every time a new sample is taken from the tree, at least one new instance of Individual (with a unique individualID value) is created to represent the sample. This sample (individual instance) may be a "gathering" (set of multiple individual specimens gathered at the same time), or it may be a single specimen, or it may be simply a tissue sample intended for destructive analysis. In any case, it's a new individual instance derived from the "parent" individual instance representing the whole tree. In our implementation, "Individual" can be hierarchical, such that a whole-organism tree can be sub-sampled with many "child" instances of "gatherings" (say, one gathering each year), and each gathering may have multiple child "specimen" individuals (e.g. individual botanical sheets created from the multiple items of a single gathering), and each specimen may have further "child" subsamples extracted for DNA analysis (or whatever), and the hierarchy can continue on down to whatever derivatives that people feel a need to keep track of (e.g., aliquot).
The point is, all Individual instances are well-defined physical objects (or well-defined sets of physical objects), and they can be arranged in an n-tiered hierarchy.
Moreover, each Individual that can be characterized as a "sample" (what we refer to as a "CollectionObject") may also have a property value for "CollectionOccurrenceID" -- which refers to the specific Occurrence instance at which the sample was obtained.
So, for example, if the tree is visited on May 27, 2013 and a specimen (sample) is taken, then:
1) An Event instance is generated to represent the event where the tree was visited;
2) An Occurrence instance is generated, which refers to the new EventID and the existing IndividualID for the whole tree, and includes whatever other Occurrence properties are relevant for the tree at the time of this Occurrence;
3) An Individual instance is generated for the specimen, which has a property value for parentIndividualID that refers to the individualID for the whole tree, and a property value for collectionOccurrenceID that refers to the Occurrence instance where the specimen was collected.
So, to summarize the answer to your question:
- There are multiple Occurrence instances that refer to the same Individual instance representing the whole tree (and hence can be collapsed to the same IndividualID value).
- Any Individual can have derivatives that are themselves unique Individual instances.
- Individuals are arranged hierarchically, and certain properties can be inherited up or down the hierarchy, depending on the properties and their associated logical constraints.
At some point, I will assemble a set of other specific use cases, and how we manage them through our use of the "Individual" instance (although I will probably not use the word "Individual", as this seems to cause too much confusion in these discussions).
Aloha, Rich
participants (4): John Deck, Markus Döring, Richard Pyle, Steve Baskauf