comments inline
Richard Pyle wrote:
function. I think the only issue on scoping Individual boils down to this:
A whole specimen is collected. Two tissue samples are taken from it. The (remainder of the) whole specimen is preserved as a voucher, and the two tissue samples are sent to two different institutions that specialize in DNA sequencing and analysis. A morphologist establishes an Identification instance for the whole specimen, and each of the molecular-based institutions assigns its own Identification instance to its respective tissue sample, based on DNA sequences of different genes. (In this example, it doesn't matter whether the Identifications are all the same and reinforcing, or different and competing.)
To me, this is the crux of the issue: Given that each of the three institutions will generate a record in their respective database to represent the physical object that they are managing, should we [Option A] treat these as three separate instances of the "Individual" class, which link to each other (and hence share Identification instances) via parent/child relationships (e.g., the two tissue-sample Individuals as "child" instances of the whole-specimen Individual)?
Or, [Option B] do we establish an abstract "Individual" instance to which all Identification instances are linked, and for which these three objects are representative "tokens"?
In my mind, both allow us to represent the information we want to represent. I think Option B might be better in terms of representing the "reality" we wish to model; but I think Option A is overwhelmingly more practical in the "reality" of how data of this sort are actually managed by the people who manage them.
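A minimal sketch of the two options may make the difference concrete. All class and field names here (IndividualA, IndividualB, Token, record_id, and so on) are hypothetical and are not Darwin Core terms; the taxon names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Identification:
    taxon: str   # placeholder taxon name
    basis: str   # e.g. "morphology" or "DNA, first gene"


# Option A: each managed object is its own Individual, linked parent/child.
@dataclass(eq=False)
class IndividualA:
    record_id: str
    parent: Optional["IndividualA"] = None
    identifications: List[Identification] = field(default_factory=list)

    def root(self) -> "IndividualA":
        """Walk up the parent links to the whole-specimen record."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def shared_identifications(records: List[IndividualA],
                           of: IndividualA) -> List[Identification]:
    """Collect Identifications from every record that shares the same root."""
    root = of.root()
    return [i for r in records if r.root() is root for i in r.identifications]


# Option B: one abstract Individual; the physical objects are tokens of it.
@dataclass(eq=False)
class IndividualB:
    individual_id: str
    identifications: List[Identification] = field(default_factory=list)


@dataclass(eq=False)
class Token:
    record_id: str
    individual: IndividualB


# Option A: voucher plus two tissue samples, each with its own Identification.
voucher = IndividualA("voucher-1",
                      identifications=[Identification("Taxon one", "morphology")])
tissue1 = IndividualA("tissue-1", parent=voucher,
                      identifications=[Identification("Taxon one", "DNA, first gene")])
tissue2 = IndividualA("tissue-2", parent=voucher,
                      identifications=[Identification("Taxon two", "DNA, second gene")])
option_a = shared_identifications([voucher, tissue1, tissue2], tissue1)

# Option B: the same three Identifications hang off one abstract Individual.
abstract = IndividualB("individual-1", identifications=list(option_a))
tokens = [Token(r, abstract) for r in ("voucher-1", "tissue-1", "tissue-2")]
```

Under these assumptions both structures reach the same three Identifications; the difference is whether the sharing is computed by walking parent links (Option A) or stored once on the abstract node (Option B).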
You are denormalizing a more general model. What if the specimen is from a tree? I collect flowers in May and return to collect fruit in September. I have a hard time making those two Occurrences be one. According to the definition on the table for a vote, an Individual is to permit resampling over time. Read the definition of dwc:IndividualID.
I think we're both in agreement on this. The only real question is what are
If I think about all of the kinds of things that I would like to put into the spot on the ASC diagram labeled "Collecting Unit" (including things like the Bicentennial Oak that was never "collected" by anybody), the one thing that they all seem to have in common is this aspect of being "accessioned". So I would assert that in a general model, "AccessionedUnit" would be a better name than "CollectingUnit". Some of the terms that I think should come out of Occurrence (such as preparations and disposition) could apply to any AccessionedUnit.
I don't think that's quite right. Certainly, the ASC model was designed specifically for DeadSpecimen data management; but I think conceptually a CollectingUnit is not intrinsically defined as an Accessioned object (definitely don't use the word "Accession" as part of the term -- this is too loaded of a term). I think of a CollectingUnit as what I called "BiologicalObject". That is, a physical object consisting of biological material (or mineralized representations of biological material, in the case of fossils). Whether it is alive or dead, accessioned or not, captive or in the wild ... are all attributes of a BiologicalObject, but do not *define* what a BiologicalObject is. The definition of a BiologicalObject is that it is an object, primarily consisting of biological stuff (or mineralized biological stuff).
The decision on this should not be made based on what we "think" an Individual should be, but rather on what we need it to be to fulfill the role that we have assigned it in our model.
I completely agree!! But the problem is that there are two, sometimes competing, sets of needs: the needs of the data consumers, who want to use this information for answering biological questions; and the needs of the data providers, who are constrained by the nature of the data they manage. The "art" in DwC is in reconciling gaps between these needs in an elegant and simple way.
With that in mind, it might be better for the moment to change the name dwc:Individual to dwc:ResamplingUnitHavingDetermination because that is what it needs to do according to its current definition and location in the model diagram (I'm considering resampling to be the documentation of multiple Occurrences).
As we've discussed before, I think "Resampling" is only one benefit of an Individual class, and should not be the defining metric of an Individual. I'd be more comfortable following along with dwc:SampledUnitHavingDetermination; because some of these Units that need to have Determinations are resampled multiple times, and some are sampled only once.
You are denormalizing a more general model. We both confessed to this sin in an earlier series of emails. If you don't want to read the first post in the series, just click on the links in order and look at what happens to the diagram. I want (no, NEED) the fully normalized model for what I do and so do others. You may not need it, but I do and that's why I made the proposal. If I'm the only person who ever needs to resample anything then vote the proposal down and I'll just use the definition of Individual that I have in RDF at Bioimages. I'm already doing that. I would just prefer it to be a "well known" Darwin Core term, not an ad hoc one that I made up. If it was important enough to put individualID into the Darwin Core standard to facilitate resampling, then why is it suddenly not very important to make that term usable in RDF?
The question then becomes: should AccessionedUnit be considered the same as ResamplingUnitHavingDetermination because they share the same properties (i.e. are described by the same terms)? To me the answer is clearly "no".
I would agree. But I don't think that's the right question. To me, the right question is:
Should the scope of SampledUnitHavingDetermination include all instances of BiologicalObject; or only a subset of them? (i.e., Should SampledUnitHavingDetermination *include* BiologicalObject, or *overlap with* BiologicalObject?)
It is very likely that an AccessionedUnit will never be associated with more than one Occurrence (i.e. be resampled),
Agreed! But the vast, vast majority of things that have Determinations and are associated with Occurrences are only sampled once. Even if they represented the vast, vast minority (but non-trivial), there would still be solid rationale for not restricting the scope of Individual to only those things that are Sampled more than once.
I will repeat my charge of denormalization again.
Having made a decision about this based on functional need and shared properties, it is still helpful for me to try to develop a mental image of what these two things are. In my mind, I imagine the ResamplingUnitHavingDetermination (which I will henceforth return to calling dwc:Individual) to be an entity having a homogeneous taxonomic identity. It has some moment when it came into existence as a living thing (by being born, planted, or founded) although we will never know when that moment was unless an Occurrence happens that allows us to document that Event. The Individual remains an entity as long as it has the potential to be documented as an Occurrence. That doesn't necessarily mean that it must be alive. But if it decomposes, or is preserved and put into a collection, it no longer is capable of being resampled (i.e. documented by an Occurrence). Thus a fossil that is dead for a million years and is sitting in some stratum still fits my mental image of an Individual. If it gets chipped out of the rock and put in a museum, there would no longer be any point in documenting another Occurrence for it since there would be no useful Location or GeologicalContext information to be gained from that. A roadside population of herbaceous plants having homogeneous taxonomic identity would be an Individual from the first time it was capable of being sampled (when it was founded) and would end being an Individual when it was extirpated by some road construction crew and was no longer capable of being documented by an Occurrence. A wolf pack would be a similar case.
My mental image of AccessionedUnit is an entity that comes into existence when some human person or institution takes control of it, assigns it an identifier, and keeps records of it. I think I would never see it as coming to an end. Even if it is lost or destroyed, it would continue to exist as long as the person or institution maintains its record. It would just have dwc:disposition "lost" or "destroyed". It could be a dead, preserved specimen in a jar or glued to a sheet of paper, a living wildebeest calf in a zoo, or even a field sampling plot in a park as long as the park exerts control and ownership over it and maintains records about it. It could not be any wild, free-ranging animal or plant. It could not be roadkill left on the side of the road to decompose. It could not be a photograph of a wildebeest calf in the zoo, or the sound recording of the wildebeest calf's grunt. It COULD be a tissue sample from the wildebeest calf or from the roadkill. The critical thing is that it is a physical artifact originating from a living thing that has been cataloged and placed under human control. I think this is the kind of thing that Rich wanted to be able to define when he wanted to broaden the definition of Individual.
For any entity having an origin as a living thing (in my mental image), its status as an Individual is independent of its status as an AccessionedUnit. If the entity is removed and preserved in its entirety (fish killed and put in a jar of formaldehyde), it ceases to exist as a dwc:Individual and begins to exist as an AccessionedUnit. If a branch is removed from a tree or one plant pulled from a roadside population to become specimens, the removed part becomes an AccessionedUnit while the dwc:Individual continues to exist. In the case of the Bicentennial Oak or a permanent sampling plot, the entity simultaneously exists as both an AccessionedUnit and a dwc:Individual. In terms of metadata records, the establishment of any AccessionedUnit is an Occurrence (grouped under the Individual) having a property of recordedBy. Whether or not subsequent Occurrences are possible for the Individual depends on whether the act of creating the AccessionedUnit has rendered subsequent sampling irrelevant.
I agree with the point that was made previously that no specific taxonomic level should be placed in the definition of Individual. That would allow for the possibility that Individuals could contain several different lower level taxa as long as the Individual is homogeneous at the taxonomic level at which the determination is applied. I am open to suggestion for how this could be accomplished. Somehow there needs to be a value for a term like "individualScope" that allows one to make the kind of inferences about duplicates that I described previously. Maybe one controlled value for "individualScope" should be "DuplicateLevel" meaning that the Individual is homogeneous in taxonomic identity to the level at which a taxonomist would collect multiple specimens and call them duplicates. That would get us out of the problem of deciding whether the several grass stems we collect and send off to different herbaria are actually the same biological individual or clones connected by underground stems. Other possible levels could be "BiologicalIndividual" for things known to be single biological individuals, and "Heterogeneous" for things that are known or suspected to be mixtures of lower level taxa but for which it is convenient to assign a determination at a higher taxonomic level at which we know the mixture to be homogeneous.
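The three suggested values could be sketched as a controlled vocabulary along the following lines. The enum, the Individual dataclass, and the determination_applies_to_all_parts rule are hypothetical illustrations: the rule is only one possible reading of the duplicate inference mentioned above, not something the proposal defines.

```python
from dataclasses import dataclass
from enum import Enum


class IndividualScope(Enum):
    """Candidate controlled values for a term like individualScope."""
    BIOLOGICAL_INDIVIDUAL = "BiologicalIndividual"  # known single organism
    DUPLICATE_LEVEL = "DuplicateLevel"              # homogeneous enough to yield duplicates
    HETEROGENEOUS = "Heterogeneous"                 # known or suspected mixture of lower taxa


@dataclass
class Individual:
    individual_id: str   # hypothetical identifier
    scope: IndividualScope


def determination_applies_to_all_parts(individual: Individual) -> bool:
    """Assumed rule: a species-level determination made on one specimen can be
    carried over to its duplicates unless the Individual is a known mixture."""
    return individual.scope is not IndividualScope.HETEROGENEOUS


grass_clump = Individual("ind-grass", IndividualScope.DUPLICATE_LEVEL)
mixed_lot = Individual("ind-mixed", IndividualScope.HETEROGENEOUS)
```

With this reading, the grass stems sent to different herbaria share a determination without anyone having to decide whether they are one biological individual, while the mixed lot does not.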
For AccessionedUnit, I think there should also be an accessionedUnitScope term. I defer to the museum people on this, but the boxes in the ASC diagram (unsorted lot, lot (presumably homogeneous), specimen (presumably one biological individual), and specimen component) could be a starting point. The "partOf" and "hasPart" properties could be used to relate AccessionedUnits that are related to each other. Relating these various levels of AccessionedUnits to levels of Individual above "DuplicateLevel" is going to be tricky, but if people want to do this, I'm sure there is a way to represent the relationships in RDF.
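A rough sketch of the partOf/hasPart idea, using the ASC diagram boxes as scope values. The class, the field names, and the add_part helper are hypothetical; an RDF representation would use properties rather than Python attributes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class AccessionedUnit:
    unit_id: str
    scope: str  # "unsorted lot", "lot", "specimen", or "specimen component"
    part_of: Optional["AccessionedUnit"] = None
    has_part: List["AccessionedUnit"] = field(default_factory=list)


def add_part(whole: AccessionedUnit, part: AccessionedUnit) -> None:
    """Record both directions of the relationship so they stay consistent."""
    part.part_of = whole
    whole.has_part.append(part)


lot = AccessionedUnit("lot-1", "lot")
specimen = AccessionedUnit("specimen-1", "specimen")
tissue = AccessionedUnit("tissue-1", "specimen component")
add_part(lot, specimen)
add_part(specimen, tissue)
```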
THE BOTTOM LINE I believe that the proposed definition for the DwC class Individual should stand as it is (i.e. as a node to connect multiple Occurrences to multiple Identifications). To allow Identifications for Individuals that are homogeneous at higher taxonomic levels, we also need a term like dwc:individualScope. I believe that there needs to be a separate class that represents what I've described here as "AccessionedUnit" which also has some kind of scope property. I am not going to propose a name for this thing or propose what properties belong with it. Rich and the herbarium/museum/botanical garden/zoo people need to decide and propose that. AccessionedUnit then becomes one of several types of evidence that can be used to support an Occurrence, with dctype:StillImage, dctype:Sound, dctype:Text as other possibilities. Darwin Core does not need to define their properties and types since others (MRTG, DCMI) have already done so. We then need two more terms: one to relate the evidence to the Occurrence and one to relate the Occurrence to the evidence (I would suggest "hasEvidence" and "isEvidenceFor" as possibilities). If we can do these things, I think we could say that a general (i.e. denormalized enough to satisfy everyone who is dissatisfied at the present moment) Darwin Core model is "complete" to the "left" of Identification on the http://bioimages.vanderbilt.edu/pages/full-model.jpg diagram. I'm not going to touch the Taxon side right now.
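To illustrate the shape of this, here is a minimal sketch of an Individual acting as the node between multiple Occurrences and multiple Identifications, with evidence linked in both directions. The Python classes, attribute names, and the link helper are hypothetical; only the "hasEvidence"/"isEvidenceFor" names and the dctype values come from the text above, and the tree example reuses the flowers-in-May, fruit-in-September case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Evidence:
    evidence_id: str
    evidence_type: str  # e.g. "AccessionedUnit", "dctype:StillImage", "dctype:Sound"
    is_evidence_for: List["Occurrence"] = field(default_factory=list)


@dataclass(eq=False)
class Occurrence:
    occurrence_id: str
    has_evidence: List[Evidence] = field(default_factory=list)


@dataclass(eq=False)
class Individual:
    individual_id: str
    occurrences: List[Occurrence] = field(default_factory=list)
    identifications: List[str] = field(default_factory=list)  # placeholder determinations


def link(occurrence: Occurrence, evidence: Evidence) -> None:
    """Set hasEvidence and isEvidenceFor together."""
    occurrence.has_evidence.append(evidence)
    evidence.is_evidence_for.append(occurrence)


# One tree, sampled twice: two Occurrences grouped under one Individual.
tree = Individual("tree-1")
flowers = Occurrence("occ-may")
fruit = Occurrence("occ-september")
tree.occurrences.extend([flowers, fruit])

photo = Evidence("img-1", "dctype:StillImage")
sheet = Evidence("sheet-1", "AccessionedUnit")
link(flowers, photo)
link(fruit, sheet)

# Any Identification attached to the Individual covers both Occurrences.
tree.identifications.append("placeholder determination")
```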
Whether or not action is taken on creating a class for what I'm calling "AccessionedUnit", there is no reason to hold up action on my Individual class proposal if people agree with the points I've made here.
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content