I have done a couple read-throughs of your posts and I had two immediate comments. The first is that I think we want to accomplish many of the same things here and that the problem really is what we want to call things, not what we want to accomplish. So I'm encouraged by that. I think that what I need to do is to get a piece of paper out and try to map out what you are saying and what I'm saying. I think they will turn out to be mostly congruent but with different labels.
Yes, I have the same sense. Maybe if we come to a consensus, we can create a summary description of everything, for the benefit of those who do not have time to read and digest all these posts.
The second thing is that the point of recounting that story had nothing to do with the rank of the tree (species, subspecies, or whatever).
OK, sorry. The parts that threw me on that were:
If we call mixtures of biological individuals of different lower-level taxa Individuals, then we lose the certainty that all instances of Occurrences arising from that Individual belong to the same taxon.
and
I have accepted the broadening of the definition to include "taxa" at any level, but I'm thinking that may have been a mistake.
I (mis)interpreted this to mean that we shouldn't try to regard organisms identified to infra-specific ranks as "Individuals". Sorry that I misunderstood.
The point I was trying to make with the story was that on the scale we've been talking about from the entire biosphere to populations to individual organisms to parts of organisms to molecules, the individual organism is the point at which we no longer have to worry that further subdivisions might not share a common Identification.
OK, I understand that, and despite my seemingly passionate pleas to maintain the scope of "Individual" to include sub-WholeOrganism units, I'm keeping an open mind on that. I (maybe) could be persuaded that the "Individual" ends with a single WholeOrganism, and that parts may be dealt with in some other way ("associatedParts"? "associatedSubunits"? As members of the Individual class?).
I guess part of the passion of my fight stems from my hope that the pendulum doesn't swing so far that biological collections objects can no longer be represented as records unto themselves through DwC (as opposed to only represented as attributes of some sort of semi-abstract unit of "Individual" or "Occurrence" that isn't directly represented in many/most real-world databases).
That is what I'm saying is "special" about the whole organism level (vs. parts). If you know that pieces came from the same whole organism, then you can be confident that an identification that is assigned to any of the pieces down to any level of further subdivision will be the same as an identification assigned to any other piece.
I agree that along the continuum from "population" down to "single molecule extracted from an organism", there is a pretty clear (though not perfect) inflection point at the level of a single whole organism, and I can see recognizing that in some way in DwC. But I just have this sense that the demarcation should be at the level of our controlled vocabulary for something like "individualScope", rather than what is considered "in" vs "out" of scope for the class "Individual".
Thus it is superfluous to assign separate identifications to every piece when you can simply assign a single identification to the whole organism and infer that that identification applies to all of the pieces.
Yes, but DwC is, by its nature, denormalized. There is unnecessary repetition of information built into it. As long as Identification instances have proper GUIDs (or even LUIDs within a defined dataset), which DwC encourages via "identificationID", then I see no reason why an identification instance cannot be simultaneously shared by instances of "Individual" at the scope of "WholeOrganism" and below. In fact, it logically works above the scope of "WholeOrganism". For example, our fish collection assigns catalog numbers to "lots", which contain 1...n WholeSpecimens (or sometimes parts of a whole organism). The Identifications apply to the lots. So if there are 15 whole specimens (dwc:individualCount=15) within a Lot identified as "Aus bus", then the identification is implied for each of the 15 individual whole organisms. Sometimes we have reason to recognize attributes of individual whole organisms (e.g., individual lengths, or other morphological characters); in such cases, I would imagine establishing child "Individual" instances (each with dwc:individualCount=1), but I don't see why I would have to replicate the Identification 15 times (each with a separate identificationID). I would rather have the 15 whole specimens inherit the single Identification instance through an appropriate relationship link between parent "Lot" and child "WholeSpecimen".
I think there is a difference, though. In the case of WholeOrganism and its derived parts, the inheritance of Identification instances is bidirectional. That is, if a tissue sample is sequenced, and evidence from that sequence leads to an Identification, then surely the WholeOrganism Individual instance would inherit this Identification.
However, at scopes of Individual broader than "WholeOrganism", the inheritance of Identification is unidirectional. That is, a child can inherit the Identifications of the parent, but the parent cannot necessarily inherit the Identifications of the child. This is the point that nags at the back of my brain and tries to persuade me to throw in the towel and agree with you that "Individual" does not extend below the level of "WholeOrganism".
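To make the fish-lot example concrete, here is a minimal Python sketch of what I have in mind. The class names, the "scope" field, and the parent link are my own placeholders for illustration, not existing DwC terms; only identificationID, individualID and individualCount correspond to things DwC already has.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Identification:
    identification_id: str  # stands in for dwc:identificationID
    scientific_name: str


@dataclass
class Individual:
    individual_id: str      # stands in for dwc:individualID
    scope: str              # hypothetical "individualScope" value
    individual_count: int = 1
    parent: Optional["Individual"] = None  # hypothetical parent/child link
    identifications: List[Identification] = field(default_factory=list)

    def effective_identifications(self) -> List[Identification]:
        """Own identifications followed by those inherited from ancestors."""
        inherited = self.parent.effective_identifications() if self.parent else []
        return self.identifications + inherited


# One Lot of 15 fish, identified once.
lot = Individual("lot-1", "Lot", individual_count=15)
lot.identifications.append(Identification("ident-1", "Aus bus"))

# Child Individuals are created only when we need per-fish attributes.
specimens = [
    Individual(f"lot-1-{i}", "WholeOrganism", parent=lot) for i in range(1, 16)
]
```

Each of the 15 children resolves to the very same Identification instance held by the Lot, so nothing is copied and no additional identificationID values are minted.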
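The asymmetry can be sketched roughly as follows. This is only an illustration under my own assumptions: the scope names "Lot", "WholeOrganism" and "Part" and the function names are made up, and a real implementation would need a proper controlled vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical scope values at or below a single whole organism.
AT_OR_BELOW_WHOLE = {"WholeOrganism", "Part"}


@dataclass
class Node:
    node_id: str
    scope: str
    own_idents: Set[str] = field(default_factory=set)  # identificationID values
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def organism_root(node: Node) -> Node:
    """Climb to the highest ancestor that is still a whole organism or a part."""
    while node.parent is not None and node.parent.scope in AT_OR_BELOW_WHOLE:
        node = node.parent
    return node


def subtree_idents(node: Node) -> Set[str]:
    ids = set(node.own_idents)
    for child in node.children:
        ids |= subtree_idents(child)
    return ids


def effective_idents(node: Node) -> Set[str]:
    """Bidirectional sharing within one organism; downward only above it."""
    if node.scope in AT_OR_BELOW_WHOLE:
        top = organism_root(node)
        ids = subtree_idents(top)    # parts and organism share everything
        ancestor = top.parent
    else:
        ids = set(node.own_idents)   # a Lot does not inherit from its children
        ancestor = node.parent
    while ancestor is not None:      # children always inherit from broader scopes
        ids |= ancestor.own_idents
        ancestor = ancestor.parent
    return ids


lot = Node("lot-1", "Lot", {"ident-lot"})
fish_a = lot.add_child(Node("fish-a", "WholeOrganism"))
fish_b = lot.add_child(Node("fish-b", "WholeOrganism"))
tissue = fish_a.add_child(Node("fish-a-tissue", "Part", {"ident-seq"}))
```

Here the sequence-based identification on the tissue flows up to fish-a, but not to the Lot or to fish-b, while the Lot's identification flows down to everything.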
This is assuming that you have all of the pieces from that organism. If you have some of the pieces and somebody else has some of them, of course the two of you would assign separate identifications to your sets of pieces (unless you had synchronized databases that "knew" that you were both talking about the same organism - one of the points of having an identifier for individual organisms is so you can do that).
The problem is, at least for the billions of specimen records already extant, this information is not known beforehand. Obviously, going forward we want to capture this information at the outset (ideally, in the field at the time of specimen acquisition). Indeed, this is one of the *key* goals of the NSF BiSciCol grant -- to facilitate exactly this.
So, what I think it all boils down to is, when we discover multiple specimens that each represent a part of a single "WholeOrganism" individual, how do we map the pre-existing dwc:individualID values assigned to each of the multiple parts to each other?
In my world-view, those parts are themselves individuals, so we would generate a new "parent" instance of Individual, with its own dwc:individualID and scope of "WholeOrganism", linked to all the parts via the appropriate parent/child semantics.
If I understand your world-view on this, all of those existing dwc:individualID values would have been implicitly established *for* the WholeOrganism (of which only a part is represented as "evidence" in a herbarium), and thus they would all be aggregated via some sort of "sameAs" semantic relationship.
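Side by side, the two approaches might be sketched like this. The predicate names "partOf" and "sameAs", the example identifiers, and both function names are placeholders of mine, not agreed DwC semantics.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject individualID, predicate, object individualID)


def link_via_parent(part_ids: List[str], new_parent_id: str) -> List[Triple]:
    """My view: mint a new WholeOrganism Individual and hang each part off it."""
    return [(part_id, "partOf", new_parent_id) for part_id in part_ids]


def link_via_same_as(part_ids: List[str]) -> List[Triple]:
    """Your view, as I read it: every existing ID already denotes the organism."""
    first = part_ids[0]
    return [(other, "sameAs", first) for other in part_ids[1:]]


# Two herbarium sheets later found to come from the same plant (made-up IDs).
existing_ids = ["herbarium-A:123", "herbarium-B:456"]

parent_links = link_via_parent(existing_ids, "organism:new-1")
same_as_links = link_via_same_as(existing_ids)
```

The first approach adds one new dwc:individualID and keeps the originals as distinct things; the second adds no new identifier and treats the originals as aliases for one thing.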
Is that a fair distinction between our respective perspectives?
I'm going to digest what you wrote for a while before I make further comments.
Likewise for me on your earlier posts.
Thanks for keeping this discussion interesting and thought-provoking!
And, apologies to everyone for the volume of email on this topic...not that any of those people have got this far into this message.....
Aloha, Rich