Background for the Individual class proposal. 3. Should an Individual also be a Collecting Unit?
In the first and second installments of this series, I have tried to show that the class Individual as I have proposed it is a central part of a fully denormalized Darwin Core model. Its connective role allows for one-to-many relationships between itself and both the Occurrence and Identification classes (see http://bioimages.vanderbilt.edu/pages/full-model.jpg). I have also pointed out that in that role, it has very few properties. The reason for this is described in detail on p.26 of my Biodiversity Informatics paper (https://journals.ku.edu/index.php/jbi/article/view/3664), but in summary the only way we can actually find out anything about an individual organism is through some kind of observation or collection, which is exactly what happens in an Occurrence. Thus things that we "know" about Individuals generally are directly or indirectly associated with Occurrences, not with the instances of Individual themselves.
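To make the connecting role concrete, here is a minimal sketch in Python. The class layout, the field names, and the helper names_for_occurrence are illustrative assumptions only, not Darwin Core terms; the identifiers and taxon names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Occurrence:
    occurrence_id: str
    recorded_by: str  # most of what we "know" hangs on the Occurrence


@dataclass
class Identification:
    identification_id: str
    scientific_name: str


@dataclass
class Individual:
    """A connecting node: little more than an identifier and its links."""
    individual_id: str
    occurrences: List[Occurrence] = field(default_factory=list)
    identifications: List[Identification] = field(default_factory=list)


def names_for_occurrence(individual: Individual, occurrence_id: str) -> List[str]:
    """Return every name applied to the Individual, if the Occurrence belongs to it."""
    if not any(o.occurrence_id == occurrence_id for o in individual.occurrences):
        return []
    return [i.scientific_name for i in individual.identifications]


# Hypothetical data: one Individual, two Occurrences, two Identifications.
tree = Individual("individual-1")
tree.occurrences.append(Occurrence("occ-1", "observer A"))
tree.occurrences.append(Occurrence("occ-2", "observer B"))
tree.identifications.append(Identification("id-1", "Taxon A"))
tree.identifications.append(Identification("id-2", "Taxon B"))

print(names_for_occurrence(tree, "occ-2"))
```

Both Identifications reach both Occurrences only by way of the Individual, which is why the node itself needs so few properties of its own.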
Rich has suggested that we should consider whether some properties that are currently properties of Occurrence should be moved into the proposed Individual class. It is good to think about this, because we do want to have an economy of classes and terms (no point in having two classes for something when one would do), and because the mental image that we have about an individual organism does include aspects of both the proposed Individual class and the part of the ASC diagram called "Collecting Unit". There are a number of ways of approaching this problem. The first approach, which is the way the discussion developed on the email list, is to just try moving terms from Occurrence to the proposed Individual class and to see whether that would "work" or not. As the discussion progressed, I began to feel increasingly uncomfortable with this process, but wasn't sure why. After I went back to the ASC diagram, it became clear to me what the problem was. I believe that the question is really being framed incorrectly. What I have proposed for the class Individual is precisely what I have described in the previous posts: for it to serve as a node connecting Occurrences to Identifications. What I think Rich wants to recognize is the section of the ASC model called Collecting Unit and the boxes below it: Unsorted Lot, Lot, Specimen, and Specimen Component (I'm not sure exactly what "Derived Object" is - maybe things like images of specimens?). If I am correct in understanding what Rich wants, then the question boils down to: can or should my proposed class be the same as (or possibly include) the section on the ASC diagram called Collecting Unit? I think that I have a pretty clear idea in my mind what Individual as I have defined it means, so my task has been to try to understand what exactly a CollectingUnit is and what properties it should have. Then I can approach the question of congruence with "my" Individual.
If all things that we would want to fold within CollectingUnit share properties that can be placed within the Individual class, then they are congruent and should be the same thing. If some or most properties that we want to fit within CollectingUnit don't fit the defined purpose of the Individual class, then they should be two separate classes.
Because the ASC model was developed by the museum community, I think that its creators were primarily concerned with handling dead specimens. However, as Rich has correctly pointed out, the distinction between dead and living CollectingUnits is probably artificial. Rather, both living and preserved specimens may be instances of the same class which have a different value for some "live/dead" property (see http://code.google.com/p/darwincore/issues/detail?id=91). So for the moment, I'm assuming that a CollectingUnit can be either living or preserved. The case of preserved specimens is fairly straightforward. They have their origin in a single Occurrence that happens at a single Event (what I called a "resource creation event" in my Biodiversity Informatics paper). Living specimens are more complex. They may originate when the whole organism is collected from the wild and moved to a zoo or botanical garden (John's wildebeest calf). In that case there is a clear "resource creation event" if we call the living specimen a resource that is distinct from the organism when it was in the wild. In some cases, the living specimen is born in captivity, grown from a seed, or propagated vegetatively from a cutting. In that case, there is also a definable event when the living specimen originated. What was really driving me crazy was the Bicentennial Oak (http://bioimages.vanderbilt.edu/vanderbilt/7-314), a tree that is growing in Vanderbilt's arboretum. It seemed to me that it was a living specimen because it is now a part of a collection of trees (the arboretum). But it is over 230 years old and Vanderbilt itself is only 137 years old. So clearly nobody captured, moved, or planted it to make it a part of the arboretum. For a while I tried to define it out of being a living specimen, but then I realized that the thing that made it different from other old trees that are standing around Nashville is that it has been accessioned.
In other words, when the tree was claimed as a part of the arboretum, assigned an identifier (7-314), and added to the arboretum database, it became a living specimen in addition to being just a normal tree. The event of calling the tree a part of the arboretum, assigning it an identifier, and adding it to the arboretum database is the Occurrence that marks the creation of the thing "living specimen". At that point it can have any attribute that other Occurrences have and it is then capable of serving as evidence for the Occurrence because anybody can examine it at will. The "claimed as a part of the arboretum" part is important, because I can go out into the woods and collect information about a tree there, assign it an identifier, and add it to my database, but that doesn't make it a living specimen because I don't assert that I have any control over it or that I can guarantee anyone that I can verify its status at will. If I band a bird and release it, I have assigned it an identifier and hopefully will be able to track it over time, but I can't claim it is a living specimen because I don't claim to exert control over it. That's different from John's wildebeest calf, which is in a pen and can be observed at will. The banded bird is similar to a maize plant in a field in Iowa which was cultivated by a human, but has no curator who is making sure that it can be found again and that it won't be harvested and ground up into wildebeest food without his or her knowledge.
If I think about all of the kinds of things that I would like to put into the spot on the ASC diagram labeled "Collecting Unit" (including things like the Bicentennial Oak that was never "collected" by anybody), the one thing that they all seem to have in common is this aspect of being "accessioned". So I would assert that in a general model, "AccessionedUnit" would be a better name than "CollectingUnit". Some of the terms that I think should come out of Occurrence (such as preparations and disposition) could apply to any AccessionedUnit.
So that brings me back to the question of whether this thing that I'm calling AccessionedUnit (which is sitting in the spot on the ASC diagram where Collecting Unit was originally) can or should be considered the same as what I have proposed to be the class dwc:Individual. The decision on this should not be made based on what we "think" an Individual should be, but rather on what we need it to be to fulfill the role that we have assigned it in our model. With that in mind, it might be better for the moment to change the name dwc:Individual to dwc:ResamplingUnitHavingDetermination because that is what it needs to do according to its current definition and location in the model diagram (I'm considering resampling to be the documentation of multiple Occurrences). The question then becomes: should AccessionedUnit be considered the same as ResamplingUnitHavingDetermination because they share the same properties (i.e. are described by the same terms)? To me the answer is clearly "no". It is very likely that an AccessionedUnit will never be associated with more than one Occurrence (i.e. be resampled), particularly if it is dead and has been put in a museum collection. It is possible that the thing referred to by an AccessionedUnit might be documented by multiple Occurrences if it is alive (like the Bicentennial Oak), but that is not an intrinsic property of an AccessionedUnit in the same way that preparations or disposition would be. On the other hand it is also quite clear that many "ResamplingUnitHavingDetermination"s will never become accessioned. That would include the banded bird, a tree photographed in the forest, or a whale observed swimming in the ocean. The longer I think about this, the more convinced I am that making a distinction between AccessionedUnit and ResamplingUnitHavingDetermination is the best course of action.
Having made a decision about this based on functional need and shared properties, it is still helpful for me to try to develop a mental image of what these two things are. In my mind, I imagine the ResamplingUnitHavingDetermination (which I will henceforth return to calling dwc:Individual) to be an entity having a homogeneous taxonomic identity. It has some moment when it came into existence as a living thing (by being born, planted, or founded) although we will never know when that moment was unless an Occurrence happens that allows us to document that Event. The Individual remains an entity as long as it has the potential to be documented as an Occurrence. That doesn't necessarily mean that it must be alive. But if it decomposes, or is preserved and put into a collection, it no longer is capable of being resampled (i.e. documented by an Occurrence). Thus a fossil that has been dead for a million years and is sitting in some stratum still fits my mental image of an Individual. If it gets chipped out of the rock and put in a museum, there would no longer be any point in documenting another Occurrence for it since there would be no useful Location or GeologicalContext information to be gained from that. A roadside population of herbaceous plants having homogeneous taxonomic identity would be an Individual from the first time it was capable of being sampled (when it was founded) and would end being an Individual when it was extirpated by some road construction crew and was no longer capable of being documented by an Occurrence. A wolf pack would be a similar case.
My mental image of AccessionedUnit is an entity that comes into existence when some human person or institution takes control of it, assigns it an identifier, and keeps records of it. I think I would never see it as coming to an end. Even if it is lost or destroyed, it would continue to exist as long as the person or institution maintains its record. It would just have dwc:disposition "lost" or "destroyed". It could be a dead, preserved specimen in a jar or glued to a sheet of paper, a living wildebeest calf in a zoo, or even a field sampling plot in a park as long as the park exerts control and ownership over it and maintains records about it. It could not be any wild, free-ranging animal or plant. It could not be roadkill left on the side of the road to decompose. It could not be a photograph of a wildebeest calf in the zoo, or the sound recording of the wildebeest calf's grunt. It COULD be a tissue sample from the wildebeest calf or from the roadkill. The critical thing is that it is a physical artifact originating from a living thing that has been cataloged and placed under human control. I think this is the kind of thing that Rich wanted to be able to define when he wanted to broaden the definition of Individual.
For any entity having an origin as a living thing (in my mental image), its status as an Individual is independent of its status as an AccessionedUnit. If the entity is removed and preserved in its entirety (fish killed and put in a jar of formaldehyde), it ceases to exist as a dwc:Individual and begins to exist as an AccessionedUnit. If a branch is removed from a tree or one plant pulled from a roadside population to become specimens, the removed part becomes an AccessionedUnit while the dwc:Individual continues to exist. In the case of the Bicentennial Oak or a permanent sampling plot, the entity simultaneously exists as both an AccessionedUnit and a dwc:Individual. In terms of metadata records, the establishment of any AccessionedUnit is an Occurrence (grouped under the Individual) having a property of recordedBy. Whether or not subsequent Occurrences are possible for the Individual depends on whether the act of creating the AccessionedUnit has rendered subsequent sampling irrelevant.
I agree with the point that was made previously that no specific taxonomic level should be placed in the definition of Individual. That would allow for the possibility that Individuals could contain several different lower level taxa as long as the Individual is homogeneous at the taxonomic level at which the determination is applied. I am open to suggestion for how this could be accomplished. Somehow there needs to be a value for a term like "individualScope" that allows one to make the kind of inferences about duplicates that I described previously. Maybe one controlled value for "individualScope" should be "DuplicateLevel" meaning that the Individual is homogeneous in taxonomic identity to the level at which a taxonomist would collect multiple specimens and call them duplicates. That would get us out of the problem of deciding whether the several grass stems we collect and send off to different herbaria are actually the same biological individual or clones connected by underground stems. Other possible levels could be "BiologicalIndividual" for things known to be single biological individuals, and "Heterogeneous" for things that are known or suspected to be mixtures of lower level taxa but for which it is convenient to assign a determination at a higher taxonomic level at which we know the mixture to be homogeneous.
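As a rough illustration of how a controlled "individualScope" value could drive inferences about duplicates, here is a small Python sketch. The three values come from the paragraph above; the enum, the function name, and the rule it encodes (a finer re-determination of one specimen carries over to its duplicates only when the Individual is not a known or suspected mixture) are my own assumptions about how the term might be used, not an agreed definition.

```python
from enum import Enum


class IndividualScope(Enum):
    """Candidate controlled values for a hypothetical dwc:individualScope term."""
    BIOLOGICAL_INDIVIDUAL = "BiologicalIndividual"
    DUPLICATE_LEVEL = "DuplicateLevel"
    HETEROGENEOUS = "Heterogeneous"


def finer_identification_carries_over(scope: IndividualScope) -> bool:
    """Assumed rule: a more precise determination made on one specimen may be
    extended to the other specimens of the same Individual unless the
    Individual is a known or suspected mixture of lower level taxa."""
    return scope in {
        IndividualScope.BIOLOGICAL_INDIVIDUAL,
        IndividualScope.DUPLICATE_LEVEL,
    }


# Grass stems sent to different herbaria as duplicates: inference allowed.
print(finer_identification_carries_over(IndividualScope.DUPLICATE_LEVEL))

# A mixed lot determined only at a higher taxonomic level: inference not allowed.
print(finer_identification_carries_over(IndividualScope.HETEROGENEOUS))
```

The point of the sketch is only that the scope value, not the biology of clonal stems, decides whether the inference is made.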
For AccessionedUnit, I think there should also be an accessionedUnitScope term. I defer to the museum people on this, but the boxes in the ASC diagram (unsorted lot, lot (presumably homogeneous), specimen (presumably one biological individual), and specimen component) could be a starting point. The "partOf" and "hasPart" properties could be used to relate AccessionedUnits that are related to each other. Relating these various levels of AccessionedUnits to levels of Individual above "DuplicateLevel" is going to be tricky, but if people want to do this, I'm sure there is a way to represent the relationships in RDF.
THE BOTTOM LINE I believe that the proposed definition for the DwC class Individual should stand as it is (i.e. as a node to connect multiple Occurrences to multiple Identifications). To allow Identifications for Individuals that are homogeneous at higher taxonomic levels, we also need a term like dwc:individualScope. I believe that there needs to be a separate class that represents what I've described here as "AccessionedUnit" which also has some kind of scope property. I am not going to propose a name for this thing or propose what properties belong with it. Rich and the herbarium/museum/botanical garden/zoo people need to decide and propose that. AccessionedUnit then becomes one of several types of evidence that can be used to support an Occurrence, with dctype:StillImage, dctype:Sound, dctype:Text as other possibilities. Darwin Core does not need to define their properties and types since others (MRTG, DCMI) have already done so. We then need two more terms: one to relate the evidence to the Occurrence and one to relate the Occurrence to the evidence (I would suggest "hasEvidence" and "isEvidenceFor" as possibilities). If we can do these things, I think we could say that a general (i.e. denormalized enough to satisfy everyone who is dissatisfied at the present moment) Darwin Core model is "complete" to the "left" of Identification on the http://bioimages.vanderbilt.edu/pages/full-model.jpg diagram. I'm not going to touch the Taxon side right now.
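For what it is worth, the two suggested linking terms behave as a pair of inverse properties, which can be sketched in a few lines of Python. The triple layout, the add_link and evidence_for helpers, and all identifiers below are hypothetical; only the names "hasEvidence" and "isEvidenceFor" come from the suggestion above.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# The two suggested terms, treated as inverses of each other.
INVERSE = {"hasEvidence": "isEvidenceFor", "isEvidenceFor": "hasEvidence"}


def add_link(graph: Set[Triple], subject: str, predicate: str, obj: str) -> None:
    """Store a statement and, if the predicate has an inverse, the inverse too."""
    graph.add((subject, predicate, obj))
    inverse = INVERSE.get(predicate)
    if inverse is not None:
        graph.add((obj, inverse, subject))


def evidence_for(graph: Set[Triple], occurrence: str) -> List[str]:
    """List everything recorded as evidence for one Occurrence."""
    return sorted(o for s, p, o in graph if s == occurrence and p == "hasEvidence")


graph: Set[Triple] = set()
# An accessioned unit and a still image both document the same Occurrence.
add_link(graph, "occurrence-1", "hasEvidence", "accessioned-unit-1")
add_link(graph, "image-1", "isEvidenceFor", "occurrence-1")

print(evidence_for(graph, "occurrence-1"))
```

Whichever direction a provider states the link in, a consumer can find the evidence from the Occurrence, and the evidence itself can be of any type.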
Whether or not action is taken on creating a class for what I'm calling "AccessionedUnit", there is no reason to hold up action on my Individual class proposal if people agree with the points I've made here.
Steve
Hi Steve,
I'm starting with your Part 3, as I have limited time right now, and this was the one most relevant to our discussions.
First, a clarification: Several times in earlier posts I made reference to "BiologicalObject" in the ASC model. I now realize that was my term, in one of my early databases, that corresponded to "Collecting Unit" in ASC. Sorry for the confusion.
What I have proposed for the class Individual is precisely what I have described in the previous posts: for it to serve as a node connecting Occurrences to Identifications.
I think we're both in agreement on this. The only real question is what is the scope of allowable and distinguishable things that can serve this function. I think the only issue on scoping Individual boils down to this:
A whole specimen is collected. Two tissue samples are taken from it. The (remainder of the) whole specimen is preserved as a voucher, and the two tissue samples are sent to two different institutions that specialize in DNA sequencing and analysis. A morphologist establishes an Identification instance for the whole specimen, and each of the molecular-based institutions assigns its own Identification instance to its respective tissue sample, based on DNA sequences of different genes. (In this example, it doesn't matter whether the Identifications are all the same and reinforcing, or different and competing.)
To me, this is the crux of the issue: Given that each of the three institutions will generate a record in their respective database to represent the physical object that they are managing, should we [Option A] treat these as three separate instances of the "Individual" class, which link to each other (and hence share Identification instances) via parent/child relationships (e.g., the two tissue-sample Individuals as "child" instances of the whole-specimen Individual)?
Or, [Option B] do we establish an abstract "Individual" instance to which all Identification instances are linked, and for which these three objects are representative "tokens"?
In my mind, both allow us to represent the information we want to represent. I think Option B might be better in terms of representing the "reality" we wish to model; but I think Option A is overwhelmingly more practical in the "reality" of how data of this sort are actually managed by the people who manage them.
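A small Python sketch may help compare the two options on the tissue-sample example. Everything in it (record names, dictionary layout, function names) is a hypothetical illustration, not a proposed schema.

```python
from typing import Dict, List, Optional, Set

# Option A: each physical object is its own Individual record, linked to a parent.
parent_of: Dict[str, Optional[str]] = {
    "whole-specimen": None,
    "tissue-1": "whole-specimen",
    "tissue-2": "whole-specimen",
}
ids_on_record: Dict[str, List[str]] = {
    "whole-specimen": ["morphological-id"],
    "tissue-1": ["dna-id-gene-1"],
    "tissue-2": ["dna-id-gene-2"],
}


def root_of(record: str) -> str:
    """Follow parent links up to the top-level record."""
    parent = parent_of[record]
    return record if parent is None else root_of(parent)


def option_a_identifications(record: str) -> Set[str]:
    """Pool the Identifications of every record sharing the same root."""
    root = root_of(record)
    return {
        ident
        for rec, idents in ids_on_record.items()
        if root_of(rec) == root
        for ident in idents
    }


# Option B: one abstract Individual holds all Identifications; objects are tokens.
tokens_of: Dict[str, List[str]] = {
    "abstract-individual": ["whole-specimen", "tissue-1", "tissue-2"],
}
ids_on_individual: Dict[str, List[str]] = {
    "abstract-individual": ["morphological-id", "dna-id-gene-1", "dna-id-gene-2"],
}


def option_b_identifications(token: str) -> Set[str]:
    """Look up the abstract Individual a token stands for and return its Identifications."""
    for individual, tokens in tokens_of.items():
        if token in tokens:
            return set(ids_on_individual[individual])
    return set()


print(option_a_identifications("tissue-1") == option_b_identifications("tissue-1"))
```

Under these assumptions both layouts return the same three Identifications for any of the three objects; the difference is whether each institution's record carries its own links (Option A) or someone has to mint and maintain the abstract node (Option B).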
If I think about all of the kinds of things that I would like to put into the spot on the ASC diagram labeled "Collecting Unit" (including things like the Bicentennial Oak that was never "collected" by anybody), the one thing that they all seem to have in common is this aspect of being "accessioned". So I would assert that in a general model, "AccessionedUnit" would be a better name than "CollectingUnit". Some of the terms that I think should come out of Occurrence (such as preparations and disposition) could apply to any AccessionedUnit.
I don't think that's quite right. Certainly, the ASC model was designed specifically for DeadSpecimen data management; but I think conceptually a CollectingUnit is not intrinsically defined as an Accessioned object (definitely don't use the word "Accession" as part of the term -- this is too loaded of a term). I think of a CollectingUnit as what I called "BiologicalObject". That is, a physical object consisting of biological material (or mineralized representations of biological material, in the case of fossils). Whether it is alive or dead, accessioned or not, captive or in the wild ... are all attributes of a BiologicalObject, but do not *define* what a BiologicalObject is. The definition of a BiologicalObject is that it is an object, primarily consisting of biological stuff (or mineralized biological stuff).
The decision on this should not be made based on what we "think" an Individual should be, but rather on what we need it to be to fulfill the role that we have assigned it in our model.
I completely agree!! But the problem is that there are two, sometimes competing, sets of needs: the needs of the data consumers, who want to use this information for answering biological questions; and the needs of the data providers, who are constrained by the nature of the data they manage. The "art" in DwC is in reconciling gaps between these needs in an elegant and simple way.
With that in mind, it might be better for the moment to change the name dwc:Individual to dwc:ResamplingUnitHavingDetermination because that is what it needs to do according to its current definition and location in the model diagram (I'm considering resampling to be the documentation of multiple Occurrences).
As we've discussed before, I think "Resampling" is only one benefit of an Individual class, and should not be the defining metric of an Individual. I'd be more comfortable following along with dwc:SampledUnitHavingDetermination; because some of these Units that need to have Determinations are resampled multiple times, and some are sampled only once.
The question then becomes: should AccessionedUnit be considered the same as ResamplingUnitHavingDetermination because they share the same properties (i.e. are described by the same terms)? To me the answer is clearly "no".
I would agree. But I don't think that's the right question. To me, the right question is:
Should the scope of SampledUnitHavingDetermination include all instances of BiologicalObject; or only a subset of them? (i.e., Should SampledUnitHavingDetermination *include* BiologicalObject, or *overlap with* BiologicalObject?)
It is very likely that an AccessionedUnit will never be associated with more than one Occurrence (i.e. be resampled),
Agreed! But the vast, vast majority of things that have Determinations and are associated with Occurrences are only sampled once. Even if they represented the vast, vast minority (but non-trivial), there would still be solid rationale for not restricting the scope of Individual to only those things that are Sampled more than once.
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
comments inline
Richard Pyle wrote:
I think the only issue on scoping Individual boils down to this:
A whole specimen is collected. Two tissue samples are taken from it. The (remainder of the) whole specimen is preserved as a voucher, and the two tissue samples are sent to two different institutions that specialize in DNA sequencing and analysis. A morphologist establishes an Identification instance for the whole specimen, and each of the molecular-based institutions assigns its own Identification instance to its respective tissue sample, based on DNA sequences of different genes. (In this example, it doesn't matter whether the Identifications are all the same and reinforcing, or different and competing.)
To me, this is the crux of the issue: Given that each of the three institutions will generate a record in their respective database to represent the physical object that they are managing, should we [Option A] treat these as three separate instances of the "Individual" class, which link to each other (and hence share Identification instances) via parent/child relationships (e.g., the two tissue-sample Individuals as "child" instances of the whole-specimen Individual)?
Or, [Option B] do we establish an abstract "Individual" instance to which all Identification instances are linked, and for which these three objects are representative "tokens"?
In my mind, both allow us to represent the information we want to represent. I think Option B might be better in terms of representing the "reality" we wish to model; but I think Option A is overwhelmingly more practical in the "reality" of how data of this sort are actually managed by the people who manage them.
You are denormalizing a more general model. What if the specimen is from a tree? I collect flowers in May and return to collect fruit in September. I have a hard time making those two Occurrences be one. According to the definition on the table for a vote, an Individual is to permit resampling over time. Read the definition of dwc:IndividualID.
I think we're both in agreement on this. The only real question is what are
If I think about all of the kinds of things that I would like to put into the spot on the ASC diagram labeled "Collecting Unit" (including things like the Bicentennial Oak that was never "collected" by anybody), the one thing that they all seem to have in common is this aspect of being "accessioned". So I would assert that in a general model, "AccessionedUnit" would be a better name than "CollectingUnit". Some of the terms that I think should come out of Occurrence (such as preparations and disposition) could apply to any AccessionedUnit.
I don't think that's quite right. Certainly, the ASC model was designed specifically for DeadSpecimen data management; but I think conceptually a CollectingUnit is not intrinsically defined as an Accessioned object (definitely don't use the word "Accession" as part of the term -- this is too loaded of a term). I think of a CollectingUnit as what I called "BiologicalObject". That is, a physical object consisting of biological material (or mineralized representations of biological material, in the case of fossils). Whether it is alive or dead, accessioned or not, captive or in the wild ... are all attributes of a BiologicalObject, but do not *define* what a BiologicalObject is. The definition of a BiologicalObject is that it is an object, primarily consisting of biological stuff (or mineralized biological stuff).
The decision on this should not be made based on what we "think" an Individual should be, but rather on what we need it to be to fulfill the role that we have assigned it in our model.
I completely agree!! But the problem is that there are two, sometimes competing, sets of needs: the needs of the data consumers, who want to use this information for answering biological questions; and the needs of the data providers, who are constrained by the nature of the data they manage. The "art" in DwC is in reconciling gaps between these needs in an elegant and simple way.
With that in mind, it might be better for the moment to change the name dwc:Individual to dwc:ResamplingUnitHavingDetermination because that is what it needs to do according to its current definition and location in the model diagram (I'm considering resampling to be the documentation of multiple Occurrences).
As we've discussed before, I think "Resampling" is only one benefit of an Individual class, and should not be the defining metric of an Individual. I'd be more comfortable following along with dwc:SampledUnitHavingDetermination; because some of these Units that need to have Determinations are resampled multiple times, and some are sampled only once.
You are denormalizing a more general model. We both confessed to this sin in an earlier series of emails. If you don't want to read the first post in the series, just click on the links in order and look at what happens to the diagram. I want (no NEED) the fully normalized model for what I do and so do others. You may not need it, but I do and that's why I made the proposal. If I'm the only person who ever needs to resample anything then vote the proposal down and I'll just use the definition of Individual that I have in RDF at Bioimages. I'm already doing that. I would just prefer it to be a "well known" Darwin Core term, not an ad hoc one that I made up. If it was important enough to put individualID into the Darwin Core standard to facilitate resampling, then why is it suddenly not very important to make that term usable in RDF?
The question then becomes: should AccessionedUnit be considered the same as ResamplingUnitHavingDetermination because they share the same properties (i.e. are described by the same terms)? To me the answer is clearly "no".
I would agree. But I don't think that's the right question. To me, the right question is:
Should the scope of SampledUnitHavingDetermination include all instances of BiologicalObject; or only a subset of them? (i.e., Should SampledUnitHavingDetermination *include* BiologicalObject, or *overlap with* BiologicalObject?)
It is very likely that an AccessionedUnit will never be associated with more than one Occurrence (i.e. be resampled),
Agreed! But the vast, vast majority of things that have Determinations and are associated with Occurrences are only sampled once. Even if they represented the vast, vast minority (but non-trivial), there would still be solid rationale for not restricting the scope of Individual to only those things that are Sampled more than once.
I will repeat my charge of denormalization again.
Having made a decision about this based on functional need and shared properties, it is still helpful for me to try to develop a mental image of what these two things are. In my mind, I imagine the ResamplingUnitHavingDetermination (which I will henceforth return to calling dwc:Individual) to be an entity having a homogeneous taxonomic identity. It has some moment when it came into existence as a living thing (by being born, planted, or founded) although we will never know when that moment was unless an Occurrence happens that allows us to document that Event. The Individual remains an entity as long as it has the potential to be documented as an Occurrence. That doesn't necessarily mean that it must be alive. But if it decomposes, or is preserved and put into a collection, it no longer is capable of being resampled (i.e. documented by an Occurrence). Thus a fossil that is dead for a million years and is sitting in some stratum still fits my mental image of an Individual. If it gets chipped out of the rock and put in a museum, there would no longer be any point in documenting another Occurrence for it since there would be no useful Location or GeologicalContext information to be gained from that. A roadside population of herbaceous plants having homogeneous taxonomic identity would be an Individual from the first time it was capable of being sampled (when it was founded) and would end being an Individual when it was extirpated by some road construction crew and was no longer capable of being documented by an Occurrence. A wolf pack would be a similar case.
My mental image of AccessionedUnit is an entity that comes into existence when some human person or institution takes control of it, assigns it an identifier, and keeps records of it. I think I would never see it as coming to an end. Even if it is lost or destroyed, it would continue to exist as long as the person or institution maintains its record. It would just have dwc:disposition "lost" or "destroyed". It could be a dead, preserved specimen in a jar or glued to a sheet of paper, a living wildebeest calf in a zoo, or even a field sampling plot in a park as long as the park exerts control and ownership over it and maintains records about it. It could not be any wild, free-ranging animal or plant. It could not be roadkill left on the side of the road to decompose. It could not be a photograph of a wildebeest calf in the zoo, or the sound recording of the wildebeest calf's grunt. It COULD be a tissue sample from the wildebeest calf or from the roadkill. The critical thing is that it is a physical artifact originating from a living thing that has been cataloged and placed under human control. I think this is the kind of thing that Rich wanted to be able to define when he wanted to broaden the definition of Individual.
For any entity having an origin as a living thing (in my mental image), its status as an Individual is independent of its status as an AccessionedUnit. If the entity is removed and preserved in its entirety (fish killed and put in a jar of formaldehyde), it ceases to exist as a dwc:Individual and begins to exist as an AccessionedUnit. If a branch is removed from a tree or one plant pulled from a roadside population to become specimens, the removed part becomes an AccessionedUnit while the dwc:Individual continues to exist. In the case of the Bicentennial Oak or a permanent sampling plot, the entity simultaneously exists as both an AccessionedUnit and a dwc:Individual. In terms of metadata records, the establishment of any AccessionedUnit is an Occurrence (grouped under the Individual) having a property of recordedBy. Whether or not subsequent Occurrences are possible for the Individual depends on whether the act of creating the AccessionedUnit has rendered subsequent sampling irrelevant.
I agree with the point that was made previously that no specific taxonomic level should be placed in the definition of Individual. That would allow for the possibility that Individuals could contain several different lower level taxa as long as the Individual is homogeneous at the taxonomic level at which the determination is applied. I am open to suggestion for how this could be accomplished. Somehow there needs to be a value for a term like "individualScope" that allows one to make the kind of inferences about duplicates that I described previously. Maybe one controlled value for "individualScope" should be "DuplicateLevel" meaning that the Individual is homogeneous in taxonomic identity to the level at which a taxonomist would collect multiple specimens and call them duplicates. That would get us out of the problem of deciding whether the several grass stems we collect and send off to different herbaria are actually the same biological individual or clones connected by underground stems. Other possible levels could be "BiologicalIndividual" for things known to be single biological individuals, and "Heterogeneous" for things that are known or suspected to be mixtures of lower level taxa but for which it is convenient to assign a determination at a higher taxonomic level at which we know the mixture to be homogeneous.
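A controlled vocabulary of this kind could be sketched as a small Python enumeration. The three values are the ones suggested above; the names `IndividualScope`, `parse_scope`, and `specimens_are_duplicates` are hypothetical, and the duplicate rule is only one possible reading of the paragraph, not a settled definition.

```python
from enum import Enum


class IndividualScope(Enum):
    """Candidate controlled values for a hypothetical individualScope term."""
    DUPLICATE_LEVEL = "DuplicateLevel"
    BIOLOGICAL_INDIVIDUAL = "BiologicalIndividual"
    HETEROGENEOUS = "Heterogeneous"


def parse_scope(value: str) -> IndividualScope:
    """Reject any value that is not in the controlled vocabulary."""
    try:
        return IndividualScope(value)
    except ValueError:
        raise ValueError(f"not a controlled individualScope value: {value!r}") from None


def specimens_are_duplicates(scope: IndividualScope) -> bool:
    # Assumption (one reading of the text): specimens taken from the same
    # Individual may be treated as duplicates unless the Individual is a
    # known or suspected mixture of lower level taxa.
    return scope in (IndividualScope.DUPLICATE_LEVEL,
                     IndividualScope.BIOLOGICAL_INDIVIDUAL)
```

For example, `specimens_are_duplicates(parse_scope("DuplicateLevel"))` is true in this sketch, while a value such as `"Population"` is rejected because it is not in the list.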
For AccessionedUnit, I think there should also be an accessionedUnitScope term. I defer to the museum people on this, but the boxes in the ASC diagram (unsorted lot, lot (presumably homogeneous), specimen (presumably one biological individual), and specimen component) could be a starting point. The "partOf" and "hasPart" properties could be used to relate AccessionedUnits that are related to each other. Relating these various levels of AccessionedUnits to levels of Individual above "DuplicateLevel" is going to be tricky, but if people want to do this, I'm sure there is a way to represent the relationships in RDF.
THE BOTTOM LINE I believe that the proposed definition for the DwC class Individual should stand as it is (i.e. as a node to connect multiple Occurrences to multiple Identifications). To allow Identifications for Individuals that are homogeneous at higher taxonomic levels, we also need a term like dwc:individualScope. I believe that there needs to be a separate class that represents what I've described here as "AccessionedUnit" which also has some kind of scope property. I am not going to propose a name for this thing or propose what properties belong with it. Rich and the herbarium/museum/botanical garden/zoo people need to decide and propose that. AccessionedUnit then becomes one of several types of evidence that can be used to support an Occurrence, with dctype:StillImage, dctype:Sound, dctype:Text as other possibilities. Darwin Core does not need to define their properties and types since others (MRTG, DCMI) have already done so. We then need two more terms: one to relate the evidence to the Occurrence and one to relate the Occurrence to the evidence (I would suggest "hasEvidence" and "isEvidenceFor" as possibilities). If we can do these things, I think we could say that a general (i.e. denormalized enough to satisfy everyone who is dissatisfied at the present moment) Darwin Core model is "complete" to the "left" of Identification on the http://bioimages.vanderbilt.edu/pages/full-model.jpg diagram. I'm not going to touch the Taxon side right now.
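The evidence relationship can be sketched with plain subject/predicate/object tuples in Python (no RDF library is assumed). "hasEvidence" and "isEvidenceFor" are only the names suggested above, not existing Darwin Core terms, and every `ex:` identifier, including `ex:AccessionedUnit`, is a hypothetical placeholder.

```python
HAS_EVIDENCE = "hasEvidence"        # suggested name, not an adopted term
IS_EVIDENCE_FOR = "isEvidenceFor"   # suggested inverse, not an adopted term

# One Occurrence documented by three different kinds of evidence.
triples = {
    ("ex:occurrence1", "rdf:type", "dwc:Occurrence"),
    ("ex:occurrence1", HAS_EVIDENCE, "ex:image1"),
    ("ex:occurrence1", HAS_EVIDENCE, "ex:recording1"),
    ("ex:occurrence1", HAS_EVIDENCE, "ex:specimen1"),
    ("ex:image1", "rdf:type", "dctype:StillImage"),
    ("ex:recording1", "rdf:type", "dctype:Sound"),
    ("ex:specimen1", "rdf:type", "ex:AccessionedUnit"),
}


def add_inverses(graph, prop, inverse):
    """Return the graph plus one inverse triple for every use of prop."""
    return graph | {(o, inverse, s) for (s, p, o) in graph if p == prop}


def evidence_types(graph, occurrence):
    """List the types of everything offered as evidence for an Occurrence."""
    evidence = {o for (s, p, o) in graph if s == occurrence and p == HAS_EVIDENCE}
    return sorted(o for (s, p, o) in graph if s in evidence and p == "rdf:type")


closed = add_inverses(triples, HAS_EVIDENCE, IS_EVIDENCE_FOR)
```

With the inverse triples added, a query can start from either the Occurrence or the piece of evidence and reach the other.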
Whether or not action is taken on creating a class for what I'm calling "AccessionedUnit", there is no reason to hold up action on my Individual class proposal if people agree with the points I've made here.
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
Hi Steve,
I really need to read your other messages before I can comment in full, but I did want to respond to some of your recent comments, below.
What if the specimen is from a tree? I collect flowers in May and return to collect fruit in September. I have a hard time making those two Occurrences be one.
Why would the two occurrences have to be one? You have an Individual instance for the tree, and two more Individual instances for the collected subsamples. The Individual instance corresponding to the tree could have its own related Occurrences (e.g., if you took a photo of the tree); and the two collected specimens would each have their respective collecting event represented as an Occurrence. The latter two occurrences can trace back to the Individual Occurrence by virtue of the fact that both of the collected Individuals are semantically linked to the "parent" Individual (i.e., the whole tree). Isn't that the sort of "reasoning" that RDF is supposed to be able to allow? That is:
Tree exists as LivingSpecimen Individual whole tree
Flowers exist as PreservedSpecimen Individual that is part of a tree
Fruit exist as PreservedSpecimen Individual that is part of a tree
The first can have as many Occurrences linked to it as needed (images, measurements, etc. all taken on different dates).
The latter two likely have one Occurrence each (the collecting Event for each).
The latter two Individuals are linked as "derivedFrom" or "partOf" or whatever back to the whole Tree Individual, so that the presence of the whole tree can be inferred by the Occurrence records for the two parts (specimens).
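The inference described here can be sketched in a few lines of Python. This is not RDF reasoning, only a plain lookup that follows "partOf" links; the `ex:` identifiers and the function names are hypothetical.

```python
# Each collected part points at the Individual it came from.
part_of = {
    "ex:flowers": "ex:tree",
    "ex:fruit": "ex:tree",
}

# One collecting Occurrence per specimen, as in the example above.
occurrences = [
    {"id": "ex:occ-may", "individual": "ex:flowers", "month": "May"},
    {"id": "ex:occ-september", "individual": "ex:fruit", "month": "September"},
]


def whole_of(individual, links):
    """Follow partOf links upward until no further parent is recorded."""
    seen = set()
    while individual in links and individual not in seen:
        seen.add(individual)  # guard against accidental cycles
        individual = links[individual]
    return individual


def occurrences_implying(individual, records, links):
    """Occurrences of any part count as evidence that the whole was present."""
    return [r["id"] for r in records if whole_of(r["individual"], links) == individual]


tree_occurrences = occurrences_implying("ex:tree", occurrences, part_of)
```

Here both collecting events are found for `ex:tree` even though neither Occurrence is attached to the tree directly.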
According to the definition on the table for a vote, an Individual is to permit resampling over time.
Absolutely! No argument there! You can go back to the WholeTree Individual and generate a zillion Occurrence records if you want. Not a problem.
You are denormalizing a more general model. We both confessed to this sin in an earlier series of emails.
I don't understand what you mean by that.
If you don't want to read the first post in the series, just click on the links in order and look at what happens to the diagram. I want (no NEED) the fully normalized model for what I do and so do others.
And so do I.
You may not need it,
No, I do.
If I'm the only person who ever needs to resample anything
You are not. Nothing in my proposal prevents resampling of an Individual. In fact, nothing in my proposal prevents *anything* you want to do, as far as I can tell. But your proposal prevents me from representing parts of a whole organism as Individuals unto themselves, which is especially encumbering when I know that I have many, many, many parts of whole organisms for which other parts of the same whole organism exist, but I lack the knowledge to build those links.
I would just prefer it to be a "well known" Darwin Core term, not an ad hoc one that I made up.
I would too! But in the same way that DwC started focused on DeadSpecimensInMuseums only, and was later expanded to a more general accommodation of occurrence records for both dead and live things; your proposal for "Individual" may have started out to fulfill a very specific need, and I am advocating a slightly broader interpretation that still meets your specific needs completely, but also accommodates a broader scope of needs (without denormalizing anything).
Maybe we need a new term like "BiologicalObject" that should be the more general class, of which "Individual" is a more narrowly defined subclass?
If it was important enough to put individualID into the Darwin Core standard to facilitate resampling, then why is it suddenly not very important to make that term usable in RDF?
I can't speak to how it would be used in RDF (I am Bob's "BETA"); but I can see how the same definition of "Individual" can accommodate both sets of needs rather elegantly. More after I digest your other two messages.
Rich
Richard Pyle wrote:
You are not. Nothing in my proposal prevents resampling of an Individual. In fact, nothing in my proposal prevents *anything* you want to do, as far as I can tell. But your proposal prevents me from representing parts of a whole organism as Individuals unto themselves, which is especially encumbering when I know that I have many, many, many parts of whole organisms for which other parts of the same whole organism exist, but I lack the knowledge to build those links.
What your proposal does is to repeat the mistake that was made with Occurrence (well I consider it a mistake for a fully normalized model). You want to take metadata terms that apply in one particular subset of cases (the terms that describe the physical aspects of the individual and its pieces) and combine them with terms that apply to a more general situation (the terms that describe the role that Individual plays as a node connecting multiple Occurrences to Identifications or as a joining table in a database). This sounds good to you because you mostly deal with the physical aspects of individuals and their pieces but some people (photographers and people who make observations) don't need to describe the physical aspect because they don't collect them.
My apples/orange analogy would be better if I'd said that you are talking about apples and I'm talking about trees. One could create a class called dwc:Tree and then say that a tree is a dwc:Tree having scope property "whole thing" and that the apple is dwc:Tree having scope property "apple". But there are properties that apply to trees in general (i.e. to trees that aren't apple trees or that don't even have fruit) and there are properties that apply specifically to apples. There would be little benefit to defining dwc:Tree in this way because you would then have the circumstance where you would have to complicate things by saying that when you talk about stuff like leaves and bark that those can't be applied to dwc:Tree (apple) but they can be applied to dwc:Tree (whole). If you are going to apply those kinds of restrictions, then why not just define dwc:Tree and dwc:Apple and if anybody cares, you can try to explain how the apple is connected to the tree. As I said, I could probably write RDF examples to demonstrate this more concretely, but I don't have time now because I'm so far behind on other tasks.
Steve
What your proposal does is to repeat the mistake that was made with Occurrence (well I consider it a mistake for a fully normalized model). You want to take metadata terms that apply in one particular subset of cases (the terms that describe the physical aspects of the individual and its pieces) and combine them with terms that apply to a more general situation (the terms that describe the role that Individual plays as a node connecting multiple Occurrences to Identifications or as a joining table in a database). This sounds good to you because you mostly deal with the physical aspects of individuals and their pieces but some people (photographers and people who make observations) don't need to describe the physical aspect because they don't collect them.
I disagree with this assessment. Forgetting which terms apply to Occurrences vs Individuals (that is much less important to me), the real issue here is the logical limit of what an "Individual" is.
We both agree that an Individual consists of actual biological stuff. Physical stuff. Cells with biomolecules within them. A digital image has no such stuff. A film image has no such stuff.
We also both agree that an Individual can have one to many Occurrences associated with it. One Occurrence will usually be the case if the entire organism or population was extracted from the natural habitat the first time that it was documented to exist, and placed in a Museum or zoo. There may very well be more than one Occurrence linked to an Individual that is not extracted from the natural habitat, and is revisited over time (and either left alone, or eventually extracted entirely, thereby representing the last meaningful Occurrence for that Individual).
We also both agree that an Individual may have more than one competing or reinforcing Identification associated with it, but it cannot have more than one concurrently legitimate Identification associated with it.
We also both agree that Individuals can be derived from other Individuals.
As far as I can tell, the only real difference we have (forget about the tokens for now) is that you want the chain of derived Individuals to stop at the level of WholeOrganism; whereas I would like to allow the scope of "Individual" instances to extend down to a Part of an Individual.
We can argue about the properties and tokens later; first we need to nail down the "essence" of an Individual.
My greatest concern about your scoping of "Individual" is that there are non-trivial numbers of examples that straddle the "WholeOrganism" threshold. Sponges, corals, certain fungi, budding organisms, clonal organisms, and a whole bunch of other examples make it unclear where the "Individual" (sensu you) ends, and the "part" or "token" begins.
My apples/orange analogy would be better if I'd said that you are talking about apples and I'm talking about trees.
No, I'm talking about Apples *and* trees. An Individual "WholeOrganism" Tree is derived from another Individual "Population" of the same species of tree. In the same way, an Individual "OrganismPart" apple is derived from an Individual "WholeOrganism" Tree. Each of these (Population of Trees, WholeOrganism Tree, Part of Tree) represents, in my mind, a potential Individual.
Aloha, Rich
Damn. Sent that too soon. Here's the rest:
Having made a decision about this based on functional need and shared properties, it is still helpful for me to try to develop a mental image of what these two things are. In my mind, I imagine the ResamplingUnitHavingDetermination (which I will henceforth return to calling dwc:Individual) to be an entity having a homogeneous taxonomic identity.
I know that your original point for establishing the Class Individual was to allow for Resampling of things -- and I think that's a key value to having a class for Individual. But I don't think a class Individual that is *restricted* to things that are resampled (or resamplable) is a wise approach. A broader approach that serves the needs of resampled things *and* things sampled only once would, I think, represent a better compromise between consumer needs and provider needs.
If the entity is removed and preserved in its entirety (fish killed and put in a jar of formaldehyde), it ceases to exist as a dwc:Individual and begins to exist as an AccessionedUnit.
If you're talking about these as two separate classes in DwC, I'm getting very nervous. There is very little ambiguity between an instance of "Locality" and an instance of "Taxon". Same can be said for the other DwC classes (except, maybe, Event and Occurrence -- but I think most people would not have any trouble deciding what those two things are). However, I see a lot of ambiguity between where an Individual ends, and a BiologicalObject(=AccessionedUnit) begins. To me that says that dividing them into separate classes is inviting confusion and inconsistent application of DwC to existing (and most future) datasets.
I agree with the point that was made previously that no specific taxonomic level should be placed in the definition of Individual. That would allow for the possibility that Individuals could contain several different lower level taxa as long as the Individual is homogeneous at the taxonomic level at which the determination is applied. I am open to suggestion for how this could be accomplished. Somehow there needs to be a value for a term like "individualScope" that allows one to make the kind of inferences about duplicates that I described previously.
Agreed. I think there does need to be a dwc:individualScope term, and there should be a recommended Controlled Vocabulary to go along with it.
THE BOTTOM LINE I believe that the proposed definition for the DwC class Individual should stand as it is (i.e. as a node to connect multiple Occurrences to multiple Identifications).
Unfortunately, we don't seem to be any closer to consensus on this point. Perhaps others who have been following this discussion can weigh in?
If we can do these things, I think we could say that a general (i.e. denormalized enough to satisfy everyone who is dissatisfied at the present moment) Darwin Core model is "complete" to the "left" of Identification on the http://bioimages.vanderbilt.edu/pages/full-model.jpg diagram. I'm not going to touch the Taxon side right now.
I agree with everything in this diagram *except* the box labelled "multiple tokens and types". I'm still unclear on what this thing is, and what sorts of properties it would have. However, if it represents what I think it represents, then I would hang it off the "Individual" class.
Whether or not action is taken on creating a class for what I'm calling "AccessionedUnit", there is no reason to hold up action on my Individual class proposal if people agree with the points I've made here.
Well, including you and me, there is at least 50% agreement! :-)
Maybe others can weigh in?
Aloha, Rich
On Sat, Nov 13, 2010 at 3:09 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
...
If you're talking about these as two separate classes in DwC, I'm getting very nervous. There is very little ambiguity between an instance of "Locality" and an instance of "Taxon". Same can be said for the other DwC classes (except, maybe, Event and Occurrence -- but I think most people would not have any trouble deciding what those two things are). However, I see a lot of ambiguity between where an Individual ends, and a BiologicalObject(=AccessionedUnit) begins. To me that says that dividing them into separate classes is inviting confusion and inconsistent application of DwC to existing (and most future) datasets. ...
Not necessarily. If two classes share some but not all properties, you can make them be children of a common parent class which holds (or dare I say "is the domain of") the common properties. Then the not-common properties can be put on each class as appropriate. Finally, you can arrange that nothing is ever in both classes (or more precisely, that a reasoner would signal so if it were).
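A rough Python sketch of that arrangement follows. It is not OWL and uses no reasoner; it only mimics domain-based type inference and a disjointness check with dictionaries. All class and property names (`ex:CommonParent`, `ex:hasIdentification`, and so on) and the assignment of properties to classes are hypothetical, chosen purely for illustration.

```python
# Both child classes sit under one hypothetical common parent.
SUBCLASS_OF = {
    "ex:Individual": "ex:CommonParent",
    "ex:AccessionedUnit": "ex:CommonParent",
}

# Declared so that a checker can flag anything typed as both.
DISJOINT = [("ex:Individual", "ex:AccessionedUnit")]

# Which class each property has as its domain (illustrative only).
DOMAIN = {
    "ex:hasIdentification": "ex:CommonParent",   # shared property
    "ex:individualScope": "ex:Individual",       # not shared
    "ex:disposition": "ex:AccessionedUnit",      # not shared
}


def infer_types(properties_used):
    """Types implied by the domains of the properties a resource uses."""
    types = set()
    for prop in properties_used:
        cls = DOMAIN[prop]
        while cls is not None:
            types.add(cls)
            cls = SUBCLASS_OF.get(cls)  # climb to the parent, if any
    return types


def disjointness_violations(types):
    """Pairs of disjoint classes that the resource has been placed in."""
    return [(a, b) for (a, b) in DISJOINT if a in types and b in types]


ok = infer_types({"ex:hasIdentification", "ex:individualScope"})
clash = disjointness_violations(
    infer_types({"ex:individualScope", "ex:disposition"})
)
```

A resource using only the shared property is inferred to be in the parent class alone, while one using both child-specific properties is reported as a clash.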
Bob
Not necessarily. If two classes share some but not all properties, you can make them be children of a common parent class which holds (or dare I say "is the domain of") the common properties. Then the not-common properties can be put on each class as appropriate. Finally, you can arrange that nothing is ever in both classes (or more precisely, that a reasoner would signal so if it were).
OK, that's more or less what I was trying to say (I was originally going to use the term "Subclass", but the "BETA" in me was afraid of being scolded for misapplying that term).
The point is, I would see the class "Individual" as the common parent, and various other things (perhaps mutually exclusive) as the children classes.
Rich
Richard Pyle wrote:
Damn. Sent that too soon. Here's the rest:
Having made a decision about this based on functional need and shared properties, it is still helpful for me to try to develop a mental image of what these two things are. In my mind, I imagine the ResamplingUnitHavingDetermination (which I will henceforth return to calling dwc:Individual) to be an entity having a homogeneous taxonomic identity.
I know that your original point for establishing the Class Individual was to allow for Resampling of things -- and I think that's a key value to having a class for Individual. But I don't think a class Individual that is *restricted* to things that are resampled (or resamplable) is a wise approach. A broader approach that serves the needs of resampled things *and* things sampled only once would, I think, represent a better compromise between consumer needs and provider needs.
If the entity is removed and preserved in its entirety (fish killed and put in a jar of formaldehyde), it ceases to exist as a dwc:Individual and begins to exist as an AccessionedUnit.
If you're talking about these as two separate classes in DwC, I'm getting very nervous. There is very little ambiguity between an instance of "Locality" and an instance of "Taxon". Same can be said for the other DwC classes (except, maybe, Event and Occurrence -- but I think most people would not have any trouble deciding what those two things are). However, I see a lot of ambiguity between where an Individual ends, and a BiologicalObject(=AccessionedUnit) begins. To me that says that dividing them into separate classes is inviting confusion and inconsistent application of DwC to existing (and most future) datasets.
Forget that I ever tried to try to describe how I "think" about Individuals. Retract all of that and say that the thing I want is a class for the object of the Darwin Core term individualID, whatever that thing is. If anybody else can figure out how to make it other things by nifty tricks of subclassing or domain definitions, have at it.
I agree with the point that was made previously that no specific taxonomic level should be placed in the definition of Individual. That would allow for the possibility that Individuals could contain several different lower level taxa as long as the Individual is homogeneous at the taxonomic level at which the determination is applied. I am open to suggestion for how this could be accomplished. Somehow there needs to be a value for a term like "individualScope" that allows one to make the kind of inferences about duplicates that I described previously.
Agreed. I think there does need to be a dwc:individualScope term, and there should be a recommended Controlled Vocabulary to go along with it.
THE BOTTOM LINE I believe that the proposed definition for the DwC class Individual should stand as it is (i.e. as a node to connect multiple Occurrences to multiple Identifications).
Unfortunately, we don't seem to be any closer to consensus on this point. Perhaps others who have been following this discussion can weigh in?
If we can do these things, I think we could say that a general (i.e. denormalized enough to satisfy everyone who is dissatisfied at the present moment) Darwin Core model is "complete" to the "left" of Identification on the http://bioimages.vanderbilt.edu/pages/full-model.jpg diagram. I'm not going to touch the Taxon side right now.
I agree with everything in this diagram *except* the box labelled "multiple tokens and types". I'm still unclear on what this thing is, and what sorts of properties it would have. However, if it represents what I think it represents, then I would hang it off the "Individual" class.
It means that in a single Occurrence you can take a picture, record a sound, and collect several specimens if you want. You can't hang it off of the Individual class if you come back a year later and do it again. Some people actually do that kind of thing (me, whale people, bird people). If you want to keep Darwin Core as the "one specimen-one time" club that it was when it started, then forget about the Individual class.
Whether or not action is taken on creating a class for what I'm calling "AccessionedUnit", there is no reason to hold up action on my Individual class proposal if people agree with the points I've made here.
Well, including you and me, there is at least 50% agreement! :-)
Maybe others can weigh in?
Aloha, Rich
OK, a couple comments on your second:
I agree with everything in this diagram *except* the box labelled "multiple tokens and types". I'm still unclear on what this thing is, and what sorts of properties it would have. However, if it represents what I think it represents, then I would hang it off the "Individual" class.
It means that in a single Occurrence you can take a picture, record a sound, and collect several specimens if you want. You can't hang it off of the Individual class if you come back a year later and do it again.
OK, I'll have to digest this a bit more. Got a plane ride coming up, which gives me a few hours of think-time. Perhaps my real problem is that I don't think that collected specimens should be treated the same way that images are (i.e., "tokens").
Some people actually do that kind of thing (me, whale people, bird people).
And so do I.
If you want to keep Darwin Core as the "one specimen-one time" club that it was when it started, then forget about the Individual class.
That's absolutely *NOT* my intention, and my preferred scope of Individual allows MORE flexibility on this; not less.
Rich
I have several comments / questions on all this:
The "Individual" debate seems to me like we are discussing very similar things, just trying to clarify the subtleties of the class/term. Tell me if I am wrong but it seems like Steve's idea for the Individual more closely resembles a many-to-many joining table in a database (ie doesn't serve much use other than connecting two tables/classes together - and doesn't normally relate to a "real world" type of object). Whereas it seems Rich's idea is to relate it more to "real-world" objects, such as samples, re-samples, etc, to allow tracking and connectability of the observed/collected/processed individuals. I think in a "model" we need to try to define the real world object types and relationships, then we apply that model to an instantiation of that model which will tend to contain more structure, such as joining table type classes.
Another comment is on how RDF works (RDF domains in particular, and the benefit of defining the domain, or having no domain). I thought we removed the RDF domain restrictions off the DwC classes to allow more flexible use of the DwC terms. So this is obviously an important requirement. So my question really is, would it be possible to do both? I.e. is it possible to have terms that can be defined as having a certain domain, then equivalent terms that are not part of the domain? I know this would complicate things immensely, but I do wonder if other communities using RDF have encountered similar problems and have found a way to do this. We could perhaps have SimpleDwC that defines a bunch of terms that have no domain, that have "rdf links" to terms defined in a FullDwC that defines all classes and the properties of each class. This seems flexible to me, and I thought this was one of the benefits of RDF, i.e. the links/mappings etc. Another way to look at this could be to develop DwC as a UML model with no RDF defined, then have several implementations of the DwC model for specific use cases (this would make Roger happy :-)). But I'm not sure how this will work with respect to the full advantages of using RDF, e.g. reasoning; i.e. we probably need to define the full model in RDF to make full use of RDF features.
Another comment is on the placing of your useful posts, Steve. Do you think you could do some of these as blogs that could be commented on. We can then link to blogs from google repositories, wikis, etc. Otherwise they may disappear into the mailing list archives (I know I have said this before, but it still bugs me). It seems to me you have a lot more time to spend on this stuff right now than a lot of us - being a predominantly voluntary community, the effort that can be put into discussions and activities by the community members is very sporadic. I personally would like more time to spend on this stuff, but have other tasks that are more demanding. Also, due to the sporadic nature of the effort in the TDWG community, I think a lot of people are getting left behind - especially with the large amount of content that is submitted to the mailing list - but I DO NOT want to curb your enthusiasm at all! Just thinking there might be a better place to store them?? Thanks for your efforts on these topics Steve, it is certainly valuable.
Kevin
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Steve Baskauf Sent: Sunday, 14 November 2010 5:26 a.m. To: tdwg-content@lists.tdwg.org Subject: [tdwg-content] Background for the Individual class proposal. 3. Should an Individual also be a Collecting Unit?
In the first and second installments of this series, I have tried to show that the class Individual as I have proposed it is a central part of a fully denormalized Darwin Core model. Its connective role allows for one-to-many relationships between itself and both the Occurrence and Identification classes (see http://bioimages.vanderbilt.edu/pages/full-model.jpg). I have also pointed out that in that role, it has very few properties. The reason for this is described in detail on p.26 of my Biodiversity Informatics paper (https://journals.ku.edu/index.php/jbi/article/view/3664), but in summary the only way we can actually find out anything about an individual organism is through some kind of observation or collection, which is exactly what happens in an Occurrence. Thus things that we "know" about Individuals generally are directly or indirectly associated with Occurrences, not with the instances of Individual themselves.
Rich has suggested that we should consider whether some properties that are currently properties of Occurrence should be moved into the proposed Individual class. It is good to think about this, because we do want to have an economy of classes and terms (no point in having two classes for something when one would do), and because the mental image that we have about an individual organism does include aspects of both the proposed Individual class and the part of the ASC diagram called "Collecting Unit". There are a number of ways of approaching this problem. The first approach, which is the way the discussion developed on the email list, is to just try moving terms from Occurrence to the proposed Individual class and to see whether that would "work" or not. As the discussion progressed, I began to feel increasingly uncomfortable with this process, but wasn't sure why. After I went back to the ASC diagram, it became clear to me what the problem was. I believe that the question is really being framed incorrectly. What I have proposed for the class Individual is precisely what I have described in the previous posts: for it to serve as a node connecting Occurrences to Identifications. What I think Rich wants to recognize is the section of the ASC model called Collecting Unit and the boxes below it: Unsorted Lot, Lot, Specimen, and Specimen Component (I'm not sure exactly what "Derived Object" is - maybe things like images of specimens?). If I am correct in understanding what Rich wants, then the question boils down to: can or should my proposed class be the same as (or possibly include) the section on the ASC diagram called Collecting Unit? I think that I have a pretty clear idea in my mind what Individual as I have defined it means, so my task has been to try to understand what exactly a CollectingUnit is and what properties it should have. Then I can approach the question of congruence with "my" Individual.
If all things that we would want to fold within CollectingUnit share properties that can be placed within the Individual class, then they are congruent and should be the same thing. If some or most properties that we want to fit within CollectingUnit don't fit the defined purpose of the Individual class, then they should be two separate classes.
Because the ASC model was developed by the museum community, I think that its creators were primarily concerned with handling dead specimens. However, as Rich has correctly pointed out, the distinction between dead and living CollectingUnits is probably artificial. Rather, both living and preserved specimens may be instances of the same class which have a different value for some "live/dead" property (see http://code.google.com/p/darwincore/issues/detail?id=91). So for the moment, I'm assuming that a CollectingUnit can be either living or preserved. The case of preserved specimens is fairly straightforward. They have their origin in a single Occurrence that happens at a single Event (what I called a "resource creation event" in my Biodiversity Informatics paper). Living specimens are more complex. They may originate when the whole organism is collected from the wild and moved to a zoo or botanical garden (John's wildebeest calf). In that case there is a clear "resource creation event" if we call the living specimen a resource that is distinct from the organism when it was in the wild. In some cases, the living specimen is born in captivity, grown from a seed, or propagated vegetatively from a cutting. In that case, there is also a definable event when the living specimen originated. What was really driving me crazy was this: http://bioimages.vanderbilt.edu/vanderbilt/7-314 The Bicentennial oak is a tree that is growing in Vanderbilt's arboretum. It seemed to me that it was a living specimen because it is now a part of a collection of trees (the arboretum). But it is over 230 years old and Vanderbilt itself is only 137 years old. So clearly nobody captured, moved, or planted it to make it a part of the arboretum. For a while I tried to define it out of being a living specimen, but then I realized that the thing that made it different from other old trees that are standing around Nashville is that it has been accessioned.
In other words, when the tree was claimed as a part of the arboretum, assigned an identifier (7-314), and added to the arboretum database, it became a living specimen in addition to being just a normal tree. The event of calling the tree a part of the arboretum, assigning it an identifier, and adding it to the arboretum database is the Occurrence that marks the creation of the thing "living specimen". At that point it can have any attribute that other Occurrences have and it is then capable of serving as evidence for the Occurrence because anybody can examine it at will. The "claimed as a part of the arboretum" part is important, because I can go out into the woods and collect information about a tree there, assign it an identifier, and add it to my database, but that doesn't make it a living specimen because I don't assert that I have any control over it or that I can guarantee anyone that I can verify its status at will. If I band a bird and release it, I have assigned it an identifier and hopefully will be able to track it over time, but I can't claim it is a living specimen because I don't claim to exert control over it. That's different from John's wildebeest calf, which is in a pen and can be observed at will. The banded bird is similar to a maize plant in a field in Iowa which was cultivated by a human, but has no curator who is making sure that it can be found again and that it won't be harvested and ground up into wildebeest food without his or her knowledge.
If I think about all of the kinds of things that I would like to put into the spot on the ASC diagram labeled "Collecting Unit" (including things like the Bicentennial Oak that was never "collected" by anybody), the one thing that they all seem to have in common is this aspect of being "accessioned". So I would assert that in a general model, "AccessionedUnit" would be a better name than "CollectingUnit". Some of the terms that I think should come out of Occurrence (such as preparations and disposition) could apply to any AccessionedUnit.
So that brings me back to the question of whether this thing that I'm calling AccessionedUnit (which is sitting in the spot on the ASC diagram where Collecting Unit was originally) can or should be considered the same as what I have proposed to be the class dwc:Individual. The decision on this should not be made based on what we "think" an Individual should be, but rather on what we need it to be to fulfill the role that we have assigned it in our model. With that in mind, it might be better for the moment to change the name dwc:Individual to dwc:ResamplingUnitHavingDetermination because that is what it needs to do according to its current definition and location in the model diagram (I'm considering resampling to be the documentation of multiple Occurrences). The question then becomes: should AccessionedUnit be considered the same as ResamplingUnitHavingDetermination because they share the same properties (i.e. are described by the same terms)? To me the answer is clearly "no". It is very likely that an AccessionedUnit will never be associated with more than one Occurrence (i.e. be resampled), particularly if it is dead and has been put in a museum collection. It is possible that the thing referred to by an AccessionedUnit might be documented by multiple Occurrences if it is alive (like the Bicentennial Oak), but that is not an intrinsic property of an AccessionedUnit in the same way that preparations or disposition would be. On the other hand it is also quite clear that many "ResamplingUnitHavingDetermination"s will never become accessioned. That would include the banded bird, a tree photographed in the forest, or a whale observed swimming in the ocean. The longer I think about this, the more convinced I am that making a distinction between AccessionedUnit and ResamplingUnitHavingDetermination is the best course of action.
Having made a decision about this based on functional need and shared properties, it is still helpful for me to try to develop a mental image of what these two things are. In my mind, I imagine the ResamplingUnitHavingDetermination (which I will henceforth return to calling dwc:Individual) to be an entity having a homogeneous taxonomic identity. It has some moment when it came into existence as a living thing (by being born, planted, or founded) although we will never know when that moment was unless an Occurrence happens that allows us to document that Event. The Individual remains an entity as long as it has the potential to be documented as an Occurrence. That doesn't necessarily mean that it must be alive. But if it decomposes, or is preserved and put into a collection, it no longer is capable of being resampled (i.e. documented by an Occurrence). Thus a fossil that has been dead for a million years and is sitting in some stratum still fits my mental image of an Individual. If it gets chipped out of the rock and put in a museum, there would no longer be any point in documenting another Occurrence for it since there would be no useful Location or GeologicalContext information to be gained from that. A roadside population of herbaceous plants having homogeneous taxonomic identity would be an Individual from the first time it was capable of being sampled (when it was founded) and would end being an Individual when it was extirpated by some road construction crew and was no longer capable of being documented by an Occurrence. A wolf pack would be a similar case.
My mental image of AccessionedUnit is an entity that comes into existence when some human person or institution takes control of it, assigns it an identifier, and keeps records of it. I think I would never see it as coming to an end. Even if it is lost or destroyed, it would continue to exist as long as the person or institution maintains its record. It would just have dwc:disposition "lost" or "destroyed". It could be a dead, preserved specimen in a jar or glued to a sheet of paper, a living wildebeest calf in a zoo, or even a field sampling plot in a park as long as the park exerts control and ownership over it and maintains records about it. It could not be any wild, free-ranging animal or plant. It could not be roadkill left on the side of the road to decompose. It could not be a photograph of a wildebeest calf in the zoo, or the sound recording of the wildebeest calf's grunt. It COULD be a tissue sample from the wildebeest calf or from the roadkill. The critical thing is that it is a physical artifact originating from a living thing that has been cataloged and placed under human control. I think this is the kind of thing that Rich wanted to be able to define when he wanted to broaden the definition of Individual.
For any entity having an origin as a living thing (in my mental image), its status as an Individual is independent of its status as an AccessionedUnit. If the entity is removed and preserved in its entirety (fish killed and put in a jar of formaldehyde), it ceases to exist as a dwc:Individual and begins to exist as an AccessionedUnit. If a branch is removed from a tree or one plant pulled from a roadside population to become specimens, the removed part becomes an AccessionedUnit while the dwc:Individual continues to exist. In the case of the Bicentennial Oak or a permanent sampling plot, the entity simultaneously exists as both an AccessionedUnit and a dwc:Individual. In terms of metadata records, the establishment of any AccessionedUnit is an Occurrence (grouped under the Individual) having a property of recordedBy. Whether or not subsequent Occurrences are possible for the Individual depends on whether the act of creating the AccessionedUnit has rendered subsequent sampling irrelevant.
I agree with the point that was made previously that no specific taxonomic level should be placed in the definition of Individual. That would allow for the possibility that Individuals could contain several different lower level taxa as long as the Individual is homogeneous at the taxonomic level at which the determination is applied. I am open to suggestions for how this could be accomplished. Somehow there needs to be a value for a term like "individualScope" that allows one to make the kind of inferences about duplicates that I described previously. Maybe one controlled value for "individualScope" should be "DuplicateLevel" meaning that the Individual is homogeneous in taxonomic identity to the level at which a taxonomist would collect multiple specimens and call them duplicates. That would get us out of the problem of deciding whether the several grass stems we collect and send off to different herbaria are actually the same biological individual or clones connected by underground stems. Other possible levels could be "BiologicalIndividual" for things known to be single biological individuals, and "Heterogeneous" for things that are known or suspected to be mixtures of lower level taxa but for which it is convenient to assign a determination at a higher taxonomic level at which we know the mixture to be homogeneous.
For AccessionedUnit, I think there should also be an accessionedUnitScope term. I defer to the museum people on this, but the boxes in the ASC diagram (unsorted lot, lot (presumably homogeneous), specimen (presumably one biological individual), and specimen component) could be a starting point. The "partOf" and "hasPart" properties could be used to relate AccessionedUnits that are related to each other. Relating these various levels of AccessionedUnits to levels of Individual above "DuplicateLevel" is going to be tricky, but if people want to do this, I'm sure there is a way to represent the relationships in RDF.
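As a rough illustration only, the three values suggested above could be held in a small controlled vocabulary like the one sketched here. The names `IndividualScope`, `parse_scope`, and `can_treat_as_duplicates` are hypothetical, and the duplicate rule is just one reading of the paragraph above (specimens count as duplicates unless the Individual is a known or suspected mixture), not an agreed definition.

```python
from enum import Enum


class IndividualScope(Enum):
    """Hypothetical controlled values for an individualScope term."""
    BIOLOGICAL_INDIVIDUAL = "BiologicalIndividual"
    DUPLICATE_LEVEL = "DuplicateLevel"
    HETEROGENEOUS = "Heterogeneous"


def parse_scope(value: str) -> IndividualScope:
    """Reject strings that are not in the controlled vocabulary.

    Looking an Enum up by value raises ValueError for unknown strings.
    """
    return IndividualScope(value)


def can_treat_as_duplicates(scope: IndividualScope) -> bool:
    """Assumed rule: specimens taken from one Individual may be treated
    as duplicates unless the Individual is a mixture of lower level taxa."""
    return scope in (
        IndividualScope.BIOLOGICAL_INDIVIDUAL,
        IndividualScope.DUPLICATE_LEVEL,
    )


scope = parse_scope("DuplicateLevel")
print(scope.name, can_treat_as_duplicates(scope))
```

Under this sketch the grass-stem case needs no decision about clones: both "BiologicalIndividual" and "DuplicateLevel" allow the duplicate inference.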
THE BOTTOM LINE I believe that the proposed definition for the DwC class Individual should stand as it is (i.e. as a node to connect multiple Occurrences to multiple Identifications). To allow Identifications for Individuals that are homogeneous at higher taxonomic levels, we also need a term like dwc:individualScope. I believe that there needs to be a separate class that represents what I've described here as "AccessionedUnit" which also has some kind of scope property. I am not going to propose a name for this thing or propose what properties belong with it. Rich and the herbarium/museum/botanical garden/zoo people need to decide and propose that. AccessionedUnit then becomes one of several types of evidence that can be used to support an Occurrence, with dctype:StillImage, dctype:Sound, dctype:Text as other possibilities. Darwin Core does not need to define their properties and types since others (MRTG, DCMI) have already done so. We then need two more terms: one to relate the evidence to the Occurrence and one to relate the Occurrence to the evidence (I would suggest "hasEvidence" and "isEvidenceFor" as possibilities). If we can do these things, I think we could say that a general (i.e. denormalized enough to satisfy everyone who is dissatisfied at the present moment) Darwin Core model is "complete" to the "left" of Identification on the http://bioimages.vanderbilt.edu/pages/full-model.jpg diagram. I'm not going to touch the Taxon side right now.
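To make the pair of evidence terms concrete, here is a minimal sketch that treats statements as plain (subject, predicate, object) tuples. "hasEvidence" and "isEvidenceFor" are only the names suggested above, not existing Darwin Core terms, and every identifier in the data is made up for illustration.

```python
# Hypothetical inverse pair, using the names suggested in the text.
INVERSES = {"hasEvidence": "isEvidenceFor", "isEvidenceFor": "hasEvidence"}


def with_inverses(triples):
    """Return the triples plus the inverse statement for every triple
    whose predicate has a declared inverse."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSES:
            result.add((obj, INVERSES[predicate], subject))
    return result


# One Occurrence supported by three kinds of evidence (made-up identifiers).
triples = {
    ("occ:1", "hasEvidence", "unit:7-314"),  # an AccessionedUnit
    ("img:42", "isEvidenceFor", "occ:1"),    # a dctype:StillImage
    ("snd:9", "isEvidenceFor", "occ:1"),     # a dctype:Sound
}

graph = with_inverses(triples)
evidence = sorted(o for s, p, o in graph if s == "occ:1" and p == "hasEvidence")
print(evidence)  # ['img:42', 'snd:9', 'unit:7-314']
```

Whichever direction a provider states the link in, a query from the Occurrence side finds all three pieces of evidence once the inverse statements are filled in.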
Whether or not action is taken on creating a class for what I'm calling "AccessionedUnit", there is no reason to hold up action on my Individual class proposal if people agree with the points I've made here.
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
On 15/11/2010, at 9:22 AM, Kevin Richards wrote:
I thought we removed the RDF domain restrictions off the DwC classes to allow more flexible use of the DwC terms. So this is obviously an important requirement. So my question really is, would it be possible to do both? I.e. is it possible to have terms that can be defined as having a certain domain, then equivalent terms that are not part of the domain
Technically, this is fairly straightforward: you define the "strict" property to be a subproperty of the "lax" one (I am borrowing the terms "strict" and "lax" from XML schema language). A use of a strict property also counts as a use of the lax one, allowing queries against the lax version to pick up all instances.
With respect to managing the terms, a simple solution is to use the same property names for both versions, but to have separate namespaces (URI prefixes). DwC-strict and DwC-lax can be called different "conformance levels" (borrowing the term from the OWL documentation).
Having this set up you can then discuss and define the DwC vocabulary in its strict sense, in terms of how the vocabulary ideally ought to be used.
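The subproperty arrangement described above can be sketched without any RDF library, again using plain tuples. The two namespace URIs are hypothetical placeholders, and the function applies only the standard RDFS subproperty rule; it is not a full reasoner.

```python
STRICT = "http://example.org/dwc-strict/"  # hypothetical namespace
LAX = "http://example.org/dwc-lax/"        # hypothetical namespace

# Each strict property is declared a subproperty of its lax namesake.
SUBPROPERTY_OF = {STRICT + "individualID": LAX + "individualID"}


def entail_superproperties(triples):
    """Apply the RDFS subproperty rule: if (s, p, o) holds and p is a
    subproperty of q, then (s, q, o) also holds."""
    inferred = set(triples)
    for s, p, o in triples:
        while p in SUBPROPERTY_OF:  # follow chains of subproperties
            p = SUBPROPERTY_OF[p]
            inferred.add((s, p, o))
    return inferred


data = {
    ("occ:1", STRICT + "individualID", "ind:A"),  # published with the strict term
    ("occ:2", LAX + "individualID", "ind:B"),     # published with the lax term
}

graph = entail_superproperties(data)
lax_hits = sorted(s for s, p, o in graph if p == LAX + "individualID")
strict_hits = sorted(s for s, p, o in graph if p == STRICT + "individualID")
print(lax_hits)     # ['occ:1', 'occ:2']
print(strict_hits)  # ['occ:1']
```

A query against the lax property picks up both records, while a query against the strict property returns only the record that was published under the strict conformance level.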
Kevin, Thanks for your comments, which I think are very helpful. In particular, your first paragraph hits the nail on the head. When I say that an Individual is a node (i.e. as in an RDF node) that connects one to many Occurrences to one to many Identifications, that is (I believe) exactly equivalent to a many-to-many joining table. I didn't say it that way because I don't know anything about databases. As I said in my posts, I don't think that Individual (as the object of the existing term dwc:individualID) has many properties (other than bookkeeping ones) for that very reason. One could then make the case that Darwin Core doesn't "need" Individual because it's not like the other classes which function primarily as categories under which to group other terms. But the imperative (which I tried to express in the second posting) is that it IS needed because of the RDF typing issue.
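For readers who think in database terms, the joining-table comparison might look like the following sketch, which uses Python's built-in sqlite3 module. The table and column names are hypothetical and the rows are made-up data; the only point is that the `individual` table carries nothing but its key, yet it is what lets every Occurrence be paired with every Identification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The Individual has no properties of its own beyond an identifier.
    CREATE TABLE individual (id TEXT PRIMARY KEY);
    CREATE TABLE occurrence (
        id TEXT PRIMARY KEY,
        individual_id TEXT REFERENCES individual(id),
        event_date TEXT
    );
    CREATE TABLE identification (
        id TEXT PRIMARY KEY,
        individual_id TEXT REFERENCES individual(id),
        scientific_name TEXT
    );
    INSERT INTO individual VALUES ('ind:1');
    INSERT INTO occurrence VALUES ('occ:1', 'ind:1', '2009-05-01');
    INSERT INTO occurrence VALUES ('occ:2', 'ind:1', '2010-05-01');
    INSERT INTO identification VALUES ('det:1', 'ind:1', 'Quercus alba');
    INSERT INTO identification VALUES ('det:2', 'ind:1', 'Quercus stellata');
""")

# Joining through the shared individual_id relates each Occurrence
# to each Identification of the same Individual.
rows = conn.execute("""
    SELECT o.id, d.scientific_name
    FROM occurrence AS o
    JOIN identification AS d ON d.individual_id = o.individual_id
    ORDER BY o.id, d.id
""").fetchall()
print(rows)
```

With two Occurrences and two Identifications for one Individual, the query returns four pairings, which is the many-to-many behaviour described above.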
The issue here is that I am talking about Apples and Rich is talking about Oranges. I need Darwin Core to define a type/class for Apples. Rich needs Darwin Core to define a type/class for Oranges. Rich feels (for some reason) that it is important to have dwc:Fruit where Apple is a subclass of Fruit and Orange is a subclass of Fruit. One could do that, but from the standpoint of data organization (at least in RDF) I can't see how that would serve any useful purpose except for allowing people to know that Apples and Oranges are both fruit, which we already know anyway. I think that I could write RDF examples relating to Individuals to show what I mean in a more rigorous way, but that would be about another two days of my spare time eaten up.
I could post this stuff on a blog somewhere where it would sit unread like most other blogs, but as this is related to an official proposal to add two Darwin Core terms (Individual and individualRemarks), I thought this was the place where official discussion was supposed to take place. I realize that many on the list may not be interested in reading long emails (two line jokes are easier and more fun) or may not have time to read and ponder them, but I really don't have any other alternative. It would be better to hash this out at a meeting (which I had hoped would happen at the TDWG meeting, but it didn't), but I have no money and no organizational support for international travel. So this is it for me - email or nothing. I actually DON'T have a lot of time to spend on this. I'm about a month behind on where I need to be on editing my spring lab manual and have made no progress on either my website development or the development of the live plant imaging interest group I'm supposed to be helping organize, largely because of the time that I've spent trying to explain what the proposal means and why I think it is important. But I've already invested 20 months in trying to develop the idea of how Individuals would fit into the DwC world and I'm not willing to drop it now without a vote. I thought we were almost at that point, but the questions keep coming back like a really bad dream that won't go away. I would like it if the TAG would read the last three emails I sent (which I tried to make as clear and concise as I could in addressing the issues that have been raised) and then vote on the proposal. If in the TAG's great wisdom they think there is no need for the Individual class, my feelings won't be hurt. But I really can't keep spending time on this.
Steve
P.S. I'm hoping Bob will comment on the domain comments below. I'm not sure why it would be necessary to define domains to have a working RDF model. What seems more important to me is to have the necessary terms to connect the classes in the model (i.e. "occursInEvent" and its inverse "hasOccurrence", etc.). People would hopefully apply predicates (i.e. terms) to reasonable subjects. Could not domains be added later when people "settle" on how to use the property terms based on what does and doesn't work?
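As background to the domain question, this small sketch shows what declaring a domain does under the standard RDFS rule: the subject of any statement using the property is inferred to be an instance of the domain class. The domain declaration for "occursInEvent" is hypothetical (the term itself is only a suggestion above), and the identifiers are made up.

```python
# Hypothetical domain declaration for the suggested term "occursInEvent".
DOMAINS = {"occursInEvent": "dwc:Occurrence"}


def infer_types(triples, domains):
    """RDFS domain rule: if (s, p, o) holds and p has domain C,
    then s is inferred to be an instance of C."""
    return {(s, "rdf:type", domains[p]) for s, p, o in triples if p in domains}


triples = {
    ("occ:1", "occursInEvent", "evt:1"),
    ("img:42", "occursInEvent", "evt:1"),  # a questionable use of the term
}

with_domain = infer_types(triples, DOMAINS)
without_domain = infer_types(triples, {})
print(sorted(with_domain))
print(without_domain)
```

With the domain declared, the image is also typed as dwc:Occurrence, so a domain does not block an unreasonable use but draws a conclusion from it. With no domain declared, nothing is inferred, which is consistent with the idea that domains could be added later.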
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Steve Baskauf Sent: Sunday, 14 November 2010 5:26 a.m. To: tdwg-content@lists.tdwg.org Subject: [tdwg-content] Background for the Individual class proposal. 3. Should an Individual also be a Collecting Unit?
In the first and second installment of this series, I have tried to show that the class Individual as I have proposed it is a central part of a fully denormalized Darwin Core model. It's connective role allows for one-to-many relationships between itself and both the Occurrence and Identification classes (see http://bioimages.vanderbilt.edu/pages/full-model.jpg). I have also pointed out that in that role, it has very few properties. The reason for this is described in detail on p.26 of my Biodiversity Informatics paper (https://journals.ku.edu/index.php/jbi/article/view/3664), but in summary the only way we can actually find out anything about an individual organism is through some kind of observation or collection, which is exactly what happens in an Occurrence. Thus things that we "know" about Individuals generally are directly or indirectly associated with Occurrences, not with the instances of Individual themselves.
Rich has suggested that we should consider whether some properties that are currently properties of Occurrence should be moved into the proposed Individual class. It is good to think about this, because we do want to have an economy of classes and terms (no point in having two classes for something when one would do), and because the mental image that we have about an individual organism does include aspects of both the proposed Individual class and the part of the ASC diagram called "Collecting Unit". There are a number of ways of approaching this problem. The first approach, which is the way the discussion developed on the email list, is to just try moving terms from Occurrence to the proposed Individual class and to see whether that would "work" or not. As the discussion progressed, I began to feel increasingly uncomfortable with this process, but wasn't sure why. After I went back to the ASC diagram, it became clear to me what the problem was. I believe that the question is really being framed incorrectly. What I have proposed for the class Individual is precisely what I have described in the previous posts: for it to serve as a node connecting Occurrences to Identifications. What I think Rich wants to recognize is the section of the ASC model called Collecting Unit and the boxes below it: Unsorted Lot, Lot, Specimen, and Specimen Component (I'm not sure exactly what "Derived Object" is - maybe things like images of specimens?). If I am correct in understanding what Rich wants, then the question boils down to: can or should my proposed class be the same as (or possibly include) the section on the ASC diagram called Collecting Unit? I think that I have a pretty clear idea in my mind what Individual as I have defined it means, so my task has been to try to understand what exactly a CollectingUnit is and what properties it should have. Then I can approach the question of congruence with "my" Individual.
If all things that we would want to fold within CollectingUnit share properties that can be placed within the Individual class, then they are congruent and should be the same thing. If some or most properties that we want to fit within CollectingUnit don't fit the defined purpose of the Individual class, then they should be two separate classes.
Because the ASC model was developed by the museum community, I think that its creators were primarily concerned with handling dead specimens. However, as Rich has correctly pointed out, the distinction between dead and living CollectingUnits is probably artificial. Rather, both living and preserved specimens may be instances of the same class which have a different value for some "live/dead" property (see http://code.google.com/p/darwincore/issues/detail?id=91). So for the moment, I'm assuming that a CollectingUnit can be either living or preserved. The case of preserved specimens is fairly straightforward. They have their origin in a single Occurrence that happens at a single Event (what I called a "resource creation event" in my Biodiversity Informatics paper). Living specimens are more complex. They may originate when the whole organism is collected from the wild and moved to a zoo or botanical garden (John's wildebeest calf). In that case there is a clear "resource creation event" if we call the living specimen a resource that is distinct from the organism when it was in the wild. In some cases, the living specimen is born in captivity, grown from a seed, or propagated vegetatively from a cutting. In that case, there is also a definable event when the living specimen originated. What was really driving me crazy was this: http://bioimages.vanderbilt.edu/vanderbilt/7-314 The Bicentennial oak is a tree that is growing in Vanderbilt's arboretum. It seemed to me that it was a living specimen because it is now a part of a collection of trees (the arboretum). But it is over 230 years old and Vanderbilt itself is only 137 years old. So clearly nobody captured, moved, or planted it to make it a part of the arboretum. For a while I tried to define it out of being a living specimen, but then I realized that the thing that made it different from other old trees that are standing around Nashville is that it has been accessioned.
In other words, when the tree was claimed as a part of the arboretum, assigned an identifier (7-314), and added to the arboretum database, it became a living specimen in addition to being just a normal tree. The event of calling the tree a part of the arboretum, assigning it an identifier, and adding it to the arboretum database is the Occurrence that marks the creation of the thing "living specimen". At that point it can have any attribute that other Occurrences have and it is then capable of serving as evidence for the Occurrence because anybody can examine it at will. The "claimed as a part of the arboretum" part is important, because I can go out into the woods and collect information about a tree there, assign it an identifier, and add it to my database, but that doesn't make it a living specimen because I don't assert that I have any control over it or that I can guarantee anyone that I can verify its status at will. If I band a bird and release it, I have assigned it an identifier and hopefully will be able to track it over time, but I can't claim it is a living specimen because I don't claim to exert control over it. That's different from John's wildebeest calf which is in a pen and can be observed at will. It is similar to a maize plant in a field in Iowa which was cultivated by a human, but has no curator who is making sure that it can be found again and that it won't be harvested and ground up into wildebeest food without his or her knowledge.
If I think about all of the kinds of things that I would like to put into the spot on the ASC diagram labeled "Collecting Unit" (including things like the Bicentennial Oak that was never "collected" by anybody), the one thing that they all seem to have in common is this aspect of being "accessioned". So I would assert that in a general model, "AccessionedUnit" would be a better name than "CollectingUnit". Some of the terms that I think should come out of Occurrence (such as preparations and disposition) could apply to any AccessionedUnit.
So that brings me back to the question of whether this thing that I'm calling AccessionedUnit (which is sitting in the spot on the ASC diagram where Collecting Unit was originally) can or should be considered the same as what I have proposed to be the class dwc:Individual. The decision on this should not be made based on what we "think" an Individual should be, but rather on what we need it to be to fulfill the role that we have assigned it in our model. With that in mind, it might be better for the moment to change the name dwc:Individual to dwc:ResamplingUnitHavingDetermination because that is what it needs to do according to its current definition and location in the model diagram (I'm considering resampling to be the documentation of multiple Occurrences). The question then becomes: should AccessionedUnit be considered the same as ResamplingUnitHavingDetermination because they share the same properties (i.e. are described by the same terms)? To me the answer is clearly "no". It is very likely that an AccessionedUnit will never be associated with more than one Occurrence (i.e. be resampled), particularly if it is dead and has been put in a museum collection. It is possible that the thing referred to by an AccessionedUnit might be documented by multiple Occurrences if it is alive (like the Bicentennial Oak), but that is not an intrinsic property of an AccessionedUnit in the same way that preparations or disposition would be. On the other hand it is also quite clear that many "ResamplingUnitHavingDetermination"s will never become accessioned. That would include the banded bird, a tree photographed in the forest, or a whale observed swimming in the ocean. The longer I think about this, the more convinced I am that making a distinction between AccessionedUnit and ResamplingUnitHavingDetermination is the best course of action.
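The congruence argument above can be sketched as a set comparison. This is only an illustration: the `Entity` type and its two flags are hypothetical, and the flag values are one reading of the examples in the text, not data from any collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    resampleable: bool  # may still be documented by further Occurrences
    accessioned: bool   # claimed, assigned an identifier, and recorded


# Examples named in the discussion; the flags are an interpretation of it.
EXAMPLES = [
    Entity("banded bird", resampleable=True, accessioned=False),
    Entity("tree photographed in the forest", resampleable=True, accessioned=False),
    Entity("whale observed in the ocean", resampleable=True, accessioned=False),
    Entity("preserved museum specimen", resampleable=False, accessioned=True),
    Entity("Bicentennial Oak", resampleable=True, accessioned=True),
]

individuals = {e.name for e in EXAMPLES if e.resampleable}
accessioned_units = {e.name for e in EXAMPLES if e.accessioned}

only_individual = individuals - accessioned_units
only_accessioned = accessioned_units - individuals
both = individuals & accessioned_units

# The two classes would be congruent only if neither difference had members.
congruent = not only_individual and not only_accessioned
print(congruent)
```

Because each class has members the other lacks, and only the oak falls in the overlap, the comparison supports keeping two classes rather than merging them.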
Having made a decision about this based on functional need and shared properties, it is still helpful for me to try to develop a mental image of what these two things are. In my mind, I imagine the ResamplingUnitHavingDetermination (which I will henceforth return to calling dwc:Individual) to be an entity having a homogeneous taxonomic identity. It has some moment when it came into existence as a living thing (by being born, planted, or founded) although we will never know when that moment was unless an Occurrence happens that allows us to document that Event. The Individual remains an entity as long as it has the potential to be documented as an Occurrence. That doesn't necessarily mean that it must be alive. But if it decomposes, or is preserved and put into a collection, it no longer is capable of being resampled (i.e. documented by an Occurrence). Thus a fossil that is dead for a million years and is sitting in some stratum still fits my mental image of an Individual. If it gets chipped out of the rock and put in a museum, there would no longer be any point in documenting another Occurrence for it since there would be no useful Location or GeologicalContext information to be gained from that. A roadside population of herbaceous plants having homogeneous taxonomic identity would be an Individual from the first time it was capable of being sampled (when it was founded) and would end being an Individual when it was extirpated by some road construction crew and was no longer capable of being documented by an Occurrence. A wolf pack would be a similar case.
My mental image of AccessionedUnit is an entity that comes into existence when some human person or institution takes control of it, assigns it an identifier, and keeps records of it. I think I would never see it as coming to an end. Even if it is lost or destroyed, it would continue to exist as long as the person or institution maintains its record. It would just have dwc:disposition "lost" or "destroyed". It could be a dead, preserved specimen in a jar or glued to a sheet of paper, a living wildebeest calf in a zoo, or even a field sampling plot in a park as long as the park exerts control and ownership over it and maintains records about it. It could not be any wild, free-ranging animal or plant. It could not be roadkill left on the side of the road to decompose. It could not be a photograph of a wildebeest calf in the zoo, or the sound recording of the wildebeest calf's grunt. It COULD be a tissue sample from the wildebeest calf or from the roadkill. The critical thing is that it is a physical artifact originating from a living thing that has been cataloged and placed under human control. I think this is the kind of thing that Rich wanted to be able to define when he wanted to broaden the definition of Individual.
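A minimal sketch of this mental image, assuming a simple in-memory register. The class names, the `held_by` field, the "fish-001" identifier, and the default disposition value are all hypothetical; only `disposition` and `preparations` correspond to terms mentioned in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AccessionedUnit:
    identifier: str                     # assigned by the holder
    held_by: str                        # person or institution exerting control
    disposition: str = "in collection"  # assumed default value
    preparations: Optional[str] = None


class AccessionRegister:
    """Hypothetical record keeper: units are updated, never deleted."""

    def __init__(self) -> None:
        self._records: Dict[str, AccessionedUnit] = {}

    def accession(self, unit: AccessionedUnit) -> None:
        self._records[unit.identifier] = unit

    def set_disposition(self, identifier: str, disposition: str) -> None:
        # A lost or destroyed unit keeps its record; only the value changes.
        self._records[identifier].disposition = disposition

    def get(self, identifier: str) -> AccessionedUnit:
        return self._records[identifier]


register = AccessionRegister()
register.accession(AccessionedUnit("7-314", "Vanderbilt arboretum"))
register.accession(
    AccessionedUnit("fish-001", "hypothetical museum", preparations="jar of formaldehyde")
)
register.set_disposition("fish-001", "lost")
lost_unit = register.get("fish-001")
print(lost_unit.disposition)
```

The register deliberately has no delete method, which mirrors the idea that an AccessionedUnit continues to exist for as long as its record is maintained.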
For any entity having an origin as a living thing (in my mental image), its status as an Individual is independent of its status as an AccessionedUnit. If the entity is removed and preserved in its entirety (fish killed and put in a jar of formaldehyde), it ceases to exist as a dwc:Individual and begins to exist as an AccessionedUnit. If a branch is removed from a tree or one plant pulled from a roadside population to become specimens, the removed part becomes an AccessionedUnit while the dwc:Individual continues to exist. In the case of the Bicentennial Oak or a permanent sampling plot, the entity simultaneously exists as both an AccessionedUnit and a dwc:Individual. In terms of metadata records, the establishment of any AccessionedUnit is an Occurrence (grouped under the Individual) having a property of recordedBy. Whether or not subsequent Occurrences are possible for the Individual depends on whether the act of creating the AccessionedUnit has rendered subsequent sampling irrelevant.
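The three cases in this paragraph can be written down as a small lookup. This is a sketch under stated assumptions: `AccessionKind`, `Outcome`, and `outcome_of` are hypothetical names introduced for illustration, and the boolean values simply restate the fish, branch, and oak examples.

```python
from enum import Enum
from typing import NamedTuple


class AccessionKind(Enum):
    WHOLE_REMOVED = "whole entity removed and preserved"   # fish in a jar
    PART_REMOVED = "part removed from the entity"          # branch from a tree
    IN_PLACE = "entity accessioned where it stands"        # Bicentennial Oak


class Outcome(NamedTuple):
    individual_persists: bool  # source can still be documented by Occurrences
    simultaneous: bool         # one entity is both Individual and AccessionedUnit


def outcome_of(kind: AccessionKind) -> Outcome:
    """Return what happens to the source Individual when a unit is accessioned."""
    if kind is AccessionKind.WHOLE_REMOVED:
        return Outcome(individual_persists=False, simultaneous=False)
    if kind is AccessionKind.PART_REMOVED:
        return Outcome(individual_persists=True, simultaneous=False)
    return Outcome(individual_persists=True, simultaneous=True)


for kind in AccessionKind:
    print(kind.name, outcome_of(kind))
```

In every case an AccessionedUnit comes into existence; what varies is whether the Individual survives the act and whether the two statuses belong to the same entity.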
I agree with the point that was made previously that no specific taxonomic level should be placed in the definition of Individual. That would allow for the possibility that Individuals could contain several different lower level taxa as long as the Individual is homogeneous at the taxonomic level at which the determination is applied. I am open to suggestion for how this could be accomplished. Somehow there needs to be a value for a term like "individualScope" that allows one to make the kind of inferences about duplicates that I described previously. Maybe one controlled value for "individualScope" should be "DuplicateLevel" meaning that the Individual is homogeneous in taxonomic identity to the level at which a taxonomist would collect multiple specimens and call them duplicates. That would get us out of the problem of deciding whether the several grass stems we collect and send off to different herbaria are actually the same biological individual or clones connected by underground stems. Other possible levels could be "BiologicalIndividual" for things known to be single biological individuals, and "Heterogeneous" for things that are known or suspected to be mixtures of lower level taxa but for which it is convenient to assign a determination at a higher taxonomic level at which we know the mixture to be homogeneous.
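A controlled vocabulary like this could be sketched as an enumeration. The three values come from the paragraph above; the helper `specimens_are_duplicates` is hypothetical, and treating "BiologicalIndividual" as also permitting the duplicate inference is an assumption of this sketch rather than something the proposal states.

```python
from enum import Enum


class IndividualScope(Enum):
    DUPLICATE_LEVEL = "DuplicateLevel"
    BIOLOGICAL_INDIVIDUAL = "BiologicalIndividual"
    HETEROGENEOUS = "Heterogeneous"


def specimens_are_duplicates(scope: IndividualScope) -> bool:
    """Whether specimens taken from one Individual may be treated as duplicates.

    Assumption: parts of a single biological individual are at least as
    homogeneous as a duplicate-level collection, so both scopes qualify.
    """
    return scope in (
        IndividualScope.DUPLICATE_LEVEL,
        IndividualScope.BIOLOGICAL_INDIVIDUAL,
    )


# Looking up by value rejects anything outside the controlled vocabulary.
scope = IndividualScope("DuplicateLevel")
print(specimens_are_duplicates(scope))
```

Constructing the enum from a string raises `ValueError` for an unlisted value, which is one simple way to enforce that only controlled values are used.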
For AccessionedUnit, I think there should also be an accessionedUnitScope term. I defer to the museum people on this, but the boxes in the ASC diagram (unsorted lot, lot (presumably homogeneous), specimen (presumably one biological individual), and specimen component) could be a starting point. The "partOf" and "hasPart" properties could be used to relate AccessionedUnits that are related to each other. Relating these various levels of AccessionedUnits to levels of Individual above "DuplicateLevel" is going to be tricky, but if people want to do this, I'm sure there is a way to represent the relationships in RDF.
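As a rough illustration of how "partOf" and "hasPart" might link units of different scope, here is a minimal sketch. The `Unit` class, the identifiers, and the spelling of the scope values are hypothetical; the scope names only echo the boxes in the ASC diagram.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through the links
class Unit:
    identifier: str
    scope: str  # e.g. "UnsortedLot", "Lot", "Specimen", "SpecimenComponent"
    part_of: Optional["Unit"] = None
    has_part: List["Unit"] = field(default_factory=list)

    def add_part(self, child: "Unit") -> None:
        # Keep the two inverse properties consistent in one step.
        child.part_of = self
        self.has_part.append(child)

    def ancestors(self) -> List[str]:
        """Identifiers of every unit this one is (transitively) part of."""
        chain: List[str] = []
        current = self.part_of
        while current is not None:
            chain.append(current.identifier)
            current = current.part_of
        return chain


lot = Unit("lot-1", "Lot")
specimen = Unit("specimen-1", "Specimen")
tissue = Unit("tissue-1", "SpecimenComponent")
lot.add_part(specimen)
specimen.add_part(tissue)
print(tissue.ancestors())
```

Setting both directions inside `add_part` means a reader can navigate from a lot down to its components or from a tissue sample back up to its lot.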
THE BOTTOM LINE
I believe that the proposed definition for the DwC class Individual should stand as it is (i.e. as a node to connect multiple Occurrences to multiple Identifications). To allow Identifications for Individuals that are homogeneous at higher taxonomic levels, we also need a term like dwc:individualScope. I believe that there needs to be a separate class that represents what I've described here as "AccessionedUnit" which also has some kind of scope property. I am not going to propose a name for this thing or propose what properties belong with it. Rich and the herbarium/museum/botanical garden/zoo people need to decide and propose that. AccessionedUnit then becomes one of several types of evidence that can be used to support an Occurrence, with dctype:StillImage, dctype:Sound, dctype:Text as other possibilities. Darwin Core does not need to define their properties and types since others (MRTG, DCMI) have already done so. We then need two more terms: one to relate the evidence to the Occurrence and one to relate the Occurrence to the evidence (I would suggest "hasEvidence" and "isEvidenceFor" as possibilities). If we can do these things, I think we could say that a general (i.e. denormalized enough to satisfy everyone who is dissatisfied at the present moment) Darwin Core model is "complete" to the "left" of Identification on the http://bioimages.vanderbilt.edu/pages/full-model.jpg diagram. I'm not going to touch the Taxon side right now.
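The suggested "hasEvidence" and "isEvidenceFor" pair can be sketched with plain subject-predicate-object tuples, without assuming any RDF library. The resource identifiers below are made up for illustration, and `with_inverses` and `evidence_for` are hypothetical helpers, not part of Darwin Core.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# The two suggested terms, treated as inverses of each other.
INVERSE = {"hasEvidence": "isEvidenceFor", "isEvidenceFor": "hasEvidence"}


def with_inverses(triples: Set[Triple]) -> Set[Triple]:
    """Return the triples plus the inverse statement for each evidence link."""
    closed = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSE:
            closed.add((obj, INVERSE[predicate], subject))
    return closed


def evidence_for(occurrence: str, triples: Set[Triple]) -> List[str]:
    """List everything asserted as evidence for one Occurrence."""
    return sorted(o for s, p, o in triples if s == occurrence and p == "hasEvidence")


graph = with_inverses({
    ("occurrence-1", "hasEvidence", "accessionedUnit-7-314"),
    ("image-1", "isEvidenceFor", "occurrence-1"),
    ("image-1", "rdf:type", "dctype:StillImage"),
})
print(evidence_for("occurrence-1", graph))
```

Either direction can be recorded by whoever creates the record; closing the set under the inverse lets an Occurrence find an image that was only ever described as evidence for it.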
Whether or not action is taken on creating a class for what I'm calling "AccessionedUnit", there is no reason to hold up action on my Individual class proposal if people agree with the points I've made here.
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
participants (5): Bob Morris, Kevin Richards, Paul Murray, Richard Pyle, Steve Baskauf