Rich, I have done a couple read-throughs of your posts and I had two immediate comments. The first is that I think we want to accomplish many of the same things here and that the problem really is what we want to call things, not what we want to accomplish. So I'm encouraged by that. I think that what I need to do is to get a piece of paper out and try to map out what you are saying and what I'm saying. I think they will turn out to be mostly congruent but with different labels.
The second thing is that the point of recounting that story had nothing to do with the rank of the tree (species, subspecies, or whatever). I have no opinion on lumping, splitting, or whether species, subspecies, etc. actually exist or not or whether it is better to call something a species, subspecies, or variety. I really don't care what kind of name you apply to the whole organism. The point I was trying to make with the story was that on the scale we've been talking about from the entire biosphere to populations to individual organisms to parts of organisms to molecules, the individual organism is the point at which we no longer have to worry that further subdivisions might not share a common Identification. That is what I'm saying is "special" about the whole organism level (vs. parts). If you know that pieces came from the same whole organism, then you can be confident that an identification that is assigned to any of the pieces down to any level of further subdivision will be the same as an identification assigned to any other piece. Thus it is superfluous to assign separate identifications to every piece when you can simply assign a single identification to the whole organism and infer that that identification applies to all of the pieces. This is assuming that you have all of the pieces from that organism. If you have some of the pieces and somebody else has some of them, of course the two of you would assign separate identifications to your sets of pieces (unless you had synchronized databases that "knew" that you were both talking about the same organism - one of the points of having an identifier for individual organisms is so you can do that). I think you pretty much said the same thing below using different words.
I'm going to digest what you wrote for a while before I make further comments. Steve
Richard Pyle wrote:
Hi Steve,
I've finally had time to carefully read your recent series of emails on the acceptable scope of "Individual".
It has become somewhat apparent that we each support the establishment of the class "Individual" in DWC for different reasons, as evidenced by our different perspectives on what the acceptable scope of an "Individual" can be. I tend to think of "Individual" in the context of the ASC model's "BiologicalObject"; whereas you tend to see it more in terms of an "organismal" individual.
DwC began as a very-much PreservedSpecimen-oriented exercise. In order to include non-PreservedSpecimen instances of biodiversity data, the attributes of PreservedSpecimen were largely folded into the core class "Occurrence". I am a HUGE fan of broadening the scope of data that can be represented and exchanged via DwC, so I mostly saw this as a Good Thing. But I always had a pang of apprehension for representing PreservedSpecimens as Occurrences, because whereas both HumanObservations and PreservedSpecimens bear Occurrence-related information, and this Occurrence-related information is one of the most popular uses of DwC content (e.g., maps, modelling), PreservedSpecimens are much more than "Occurrence". Things like DNA sequences, morphological characteristics, preservation methods, storage details, loan information, and so on are all kinds of information that people holding the data associate with a PreservedSpecimen and share via DwC, but it seems somewhat convoluted to represent these as attributes of an Occurrence.
I had supported the notion of a class "Individual" in large part to serve as a conceptual object onto which many of these things would be more appropriately attached as attributes than to Occurrence. My concern now is that the pendulum is swinging too far in the other direction. In other words, the move from supporting PreservedSpecimen data almost exclusively, to supporting more general biodiversity data, may be swinging further into a realm where it fails to support Specimen data adequately. As I said, I am very much a supporter of "big tent" DwC, and I would hate to see objects in DwC scoped in such a way that it unnecessarily excludes content representation.
So I guess what I'm trying to say is, that the less the proposed class "Individual" can solve what I see as problems with DwC, the less supportive of it I become.
Before I get into the nitty gritty, I want to dispense with your "splitter" example. "Splitters" work at the rank of species every bit as much (even more so) than at the rank of subspecies. There are analogous stories where the hyper-splitter would treat different parts of the same organism as different taxa at the rank of species. My point is, this story does not, in my mind, in any way support the exclusion of "Individuals" being identified to taxa at ranks above or below the (yes, I'll say it) "arbitrary" rank of species. As far as I'm concerned, limiting an "Individual" to be only those things we can confidently assign to a taxon at the rank of "species" is a non-starter. I could fill this email with reasons why, but I think I've already done that in previous emails, so no need to repeat here.
But I do concede there is a rational basis for not treating "parts" of an organism as distinct Individuals. I'm not yet completely convinced, however. To be persuaded that subcomponents (parts) of a single "organism" should not be represented through records of the proposed DwC "Individual" class, I'll need to believe that the potential harm/confusion in doing so (I'm still not clear on what that is) cannot be easily mitigated by filtering with an "individualScope" property.
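As a rough illustration of the filtering idea, here is a minimal Python sketch. The record layout, the identifiers, and the helper name filter_by_scope are assumptions made for this example; only the "individualScope" property and its two values come from the discussion in this thread.

```python
# Hypothetical flat records; "individualScope" is the proposed property,
# and the identifiers are made up for illustration.
records = [
    {"individualID": "ind-1", "individualScope": "WholeOrganism"},
    {"individualID": "ind-2", "individualScope": "PartOfOrganism"},
    {"individualID": "ind-3", "individualScope": "WholeOrganism"},
]


def filter_by_scope(records, scope):
    """Return only the records whose individualScope equals the given scope."""
    return [r for r in records if r.get("individualScope") == scope]


# A consumer who only wants whole organisms drops the parts with one filter.
whole_only = filter_by_scope(records, "WholeOrganism")
```

A consumer with stricter needs would apply such a filter on their side, which leaves the broader scope available to everyone else.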
OK, so I'll try to address each of your reasons why you think that the scope of instances of the proposed "Individual" class should not include units below a "single organism".
if you consider the comment, which describes the primary function of Individual: "Instances of this class can serve the purpose of connecting one or more instances of the Darwin Core class Occurrence to one or more instances of the Darwin Core class Identification" it becomes clear that making parts of organisms Individuals defeats this primary purpose for the term.
I'm not sure I agree with that last statement. In other words, I don't see how the purpose of "Individual" is defeated if the lower limit of the scope of "Individual" is below a whole organism.
Setting aside the cases where "whole organism" can be a bit ambiguous (corals, sponges, fungi, etc.), suppose we only have a preserved part of an organism -- a Herbarium specimen, for example. It's common practice to have multiple samples of the same plant preserved as different PreservedSpecimens, sometimes housed in different institutions. A large problem that several initiatives (ADBC, BiSciCol, Virtual Herbarium, etc.) are trying to solve is that of linking these disparate PreservedSpecimens (as well as the tissue samples derived therefrom) together. Different collections that house multiple specimens from the same individual plant (but don't yet realize it), would presumably each establish an instance of "Individual" to represent their specimen data via DwC. Thus, each of the individual PreservedSpecimens would have its own unique value of dwc:individualID. The question then becomes, how do we aggregate these instances of Individuals to represent the "same thing"?
In my way of thinking, where "Individual" is functionally equivalent to the ASC model's "BiologicalObject" (I'm waiting for Stan to jump in and tell me I'm wrong on this), then the original dwc:individualID records would continue to exist as their own distinct record, with dwc:individualScope="PartOfOrganism", with their own distinct associated data for preservation method, linked photos, etc., etc. They would be aggregated by the establishment of a new instance of "Individual", with its own value for dwc:individualID, with dwc:individualScope="WholeOrganism". The various Individual instances where dwc:individualScope="PartOfOrganism" would be aggregated when they each establish an "isPartOf" or "derivedFrom" relationship with the single Individual instance where dwc:individualScope="WholeOrganism". The same model could apply to tissue samples, and other derived bits of a whole organism. As long as the dwc:individualScope value is properly applied, then it should be easy to apply appropriate reasoning logic. No?
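The aggregation described above could be sketched roughly as follows in Python. The Individual dataclass, its field names, the identifiers, and the parts_of helper are all hypothetical; they only mirror the individualID, individualScope, and "isPartOf" ideas from the paragraph and are not part of any DwC specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Individual:
    individual_id: str                 # stands in for dwc:individualID
    individual_scope: str              # "WholeOrganism" or "PartOfOrganism"
    is_part_of: Optional[str] = None   # individual_id of the parent, if known


# Two herbaria each mint their own Individual for a sheet from the same plant.
herbarium_a = Individual("A:sheet-001", "PartOfOrganism")
herbarium_b = Individual("B:sheet-977", "PartOfOrganism")

# Later the duplicates are recognised: a new WholeOrganism Individual is
# created and each part records an isPartOf link to it. The part records
# themselves continue to exist with their own data.
whole = Individual("agg:plant-1", "WholeOrganism")
herbarium_a.is_part_of = whole.individual_id
herbarium_b.is_part_of = whole.individual_id

all_individuals = [herbarium_a, herbarium_b, whole]


def parts_of(whole_id, individuals):
    """Return the sorted IDs of every Individual linked to whole_id."""
    return sorted(i.individual_id for i in individuals if i.is_part_of == whole_id)
```

Here parts_of("agg:plant-1", all_individuals) returns both sheet identifiers, while each part keeps its own preservation and storage data.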
How, then, would you represent this sort of information if the class Individual were not allowed to be applied to less-than whole organism instances? I gather that the dwc:individualID values established by the different collections for parts of the same whole organism would each effectively refer to the same whole organism, so you would link them together via "sameAs" relationships?
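For comparison, if the "sameAs" route were taken instead, grouping the identifiers might look something like the following Python sketch. This is only a guess at how such links could be resolved; the function name group_same_as and the identifiers are assumptions, and the grouping uses a simple union-find over the sameAs pairs.

```python
def group_same_as(ids, same_as_pairs):
    """Group identifiers that are connected, directly or transitively, by sameAs."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent pointers to the representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical identifiers minted by three collections.
ids = ["A:1", "B:7", "C:3"]
links = [("A:1", "B:7")]  # A:1 sameAs B:7
groups = group_same_as(ids, links)
```

Under this approach no new record is minted; the whole organism is implied by the group of equivalent identifiers.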
The major selling point for having Individuals at all is to get out of the business of applying determinations to all of the pieces of evidence such as specimens, images, sounds, etc. that get collected from the same biological individual through multiple Occurrences.
For me the main selling point of the Individual class is to remove information that does not intrinsically belong to an "Occurrence" out of that class, and into a more appropriate class.
This has the benefit that if one applies an Identification to the Individual, all physical and information resources that are derived from the individual automatically get associated with the Identification and hence the taxonomic information referenced by the Identification. If we call preserved specimens that are pieces of organism Individuals having a value of individualScope="part", then do we do the same thing to them as we do with Individuals at higher levels, namely apply Identifications to them?
If appropriate, yes. By "appropriate", I mean if you are a herbarium, and have a specimen in your collection, and you don't know if other specimens from the same individual whole plant exist in other collections, then you assign it an individualID, and scope it as "PartOfOrganism". You attach a taxon Identification to it (of course), because you have nothing else to attach the Identification to. If later it is discovered that another specimen in another herbarium had a different dwc:individualID assigned to it (with its own Identification), then you establish a semantic link between them (either by aggregating them under a new Individual instance with scope "WholeOrganism", or by "sameAs" relationships as I imagine you would suggest). In either case, you've got two Identification instances applying to the same WholeOrganism, which have exactly the same relationship to each other as any Individual instance with more than one Identification. That is, the Identifications either compete with each other (if different taxa are implicated), or they reinforce each other (if the same taxon is implicated). Using my approach (establishing a new Individual instance with scope "WholeOrganism"), it's fairly easy to rationalize, because you simply impose the logic that parent Individual instances scoped as "WholeOrganism" inherit the Identifications of their constituent parts, and treat them accordingly.
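The inheritance logic described above might be sketched like this in Python. The dictionaries, the placeholder taxon labels, and the function names are assumptions for illustration; the sketch only shows a WholeOrganism collecting the Identifications of its parts and classifying them as reinforcing or competing.

```python
# part individualID -> whole individualID (hypothetical isPartOf links)
part_of = {"A:sheet-001": "agg:plant-1", "B:sheet-977": "agg:plant-1"}

# individualID -> taxa named by Identifications attached directly to it
# ("Taxon X" and "Taxon Y" are placeholders, not real names)
identifications = {"A:sheet-001": ["Taxon X"], "B:sheet-977": ["Taxon Y"]}


def inherited_identifications(whole_id, part_of, identifications):
    """Collect taxa identified on the whole and on every part linked to it."""
    members = [whole_id] + sorted(p for p, w in part_of.items() if w == whole_id)
    return [taxon for m in members for taxon in identifications.get(m, [])]


def relationship(taxa):
    """Classify a set of Identifications as none, single, reinforce, or compete."""
    if not taxa:
        return "none"
    if len(taxa) == 1:
        return "single"
    return "reinforce" if len(set(taxa)) == 1 else "compete"


taxa = inherited_identifications("agg:plant-1", part_of, identifications)
status = relationship(taxa)
```

With the sample data the two herbaria disagree, so status is "compete"; had both sheets been identified as "Taxon X" it would be "reinforce".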
So, where might it not be appropriate? Well, suppose I collect a fish, and establish it as a WholeOrganism PreservedSpecimen instance of Individual. Then I derive from it a tissue sample, that I assign a new Individual instance for, with scope "PartOfOrganism". In that case, the child would probably not receive its own Identification instance at all; rather, it would inherit the Identification instance from its parent. But then suppose I send that tissue off to Kansas, where it is accessioned in the tissue repository there, and then sequenced. Suppose the sequence then yields a competing Identification, different from the one assigned to the WholeOrganism. What I want to have happen is that this competing Identification instance becomes known to me, the holder of the WholeSpecimen. Conversely, if an expert re-identifies the WholeSpecimen, I would like to see that new Identification instance transferred to the derived Individuals that are "PartOfOrganism".
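One way to picture the two-way visibility wanted in the fish and tissue scenario is the Python sketch below. The identifiers, the placeholder taxon labels, and the functions root_of and family_identifications are hypothetical, and the sketch assumes the "derivedFrom" links form no cycles.

```python
# child individualID -> parent individualID (hypothetical derivedFrom links)
derived_from = {"tissue-1": "fish-1"}

# Identifications attached directly to each Individual (placeholder taxa)
identifications = {"fish-1": ["Taxon X"]}


def root_of(ind_id):
    """Walk derivedFrom links up to the whole organism (assumes no cycles)."""
    while ind_id in derived_from:
        ind_id = derived_from[ind_id]
    return ind_id


def family_identifications(ind_id):
    """Return (individualID, taxon) pairs visible from any member of the family."""
    root = root_of(ind_id)
    known = set(derived_from) | set(identifications) | {root, ind_id}
    result = []
    for member in sorted(known):
        if root_of(member) == root:
            for taxon in identifications.get(member, []):
                result.append((member, taxon))
    return result


# The tissue has no Identification of its own, so it sees the parent's.
before = family_identifications("tissue-1")

# Sequencing the tissue yields a competing Identification, which then
# becomes visible from the whole fish as well.
identifications.setdefault("tissue-1", []).append("Taxon Y")
after = family_identifications("fish-1")
```

Because the lookup is computed from the links rather than copied onto each record, a later re-identification of the whole fish would show up for the tissue in the same way.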
I *think* I understand how you would manage these things if instances of the class "Individual" were not allowed to apply to anything less than a WholeOrganism, but it would be better if you described it in your own words.
If so, then we are back in the business of assigning Identifications to all of our derivative resources rather than the biological individuals from which they came.
I don't think so. A photograph and a DNA sequence are *not* individuals. They are reflections of individuals. Very much like morphological character states scored for a particular WholeOrganism are not Individuals. These are clearly different classes of things, because they are not formed of physical biological material. The "essence" that unites everything from a population to a single cell extracted from a multicellular organism is that all of them represent biological material. The distinction between "WholeOrganism" and "PartOfOrganism" is reasonably clear in most cases, but not all cases. And to me, it seems to be a lesser offense in such cases to have to decide arbitrarily whether something falls into one of two different classes of thing, vs. whether it gets scored as one of two alternate scope terms (e.g., "WholeOrganism" vs. "PartOfOrganism").
If we just say that we'll skip assigning separate Identifications to the derivative resources, then we have something that doesn't fit the functional role for which Individual was designed.
That assumes that the *only* functional role of an Individual is to join an Occurrence to an Identification. As I have described above and elsewhere, I do not see this as the *only* functional role of an Individual.
In that case an "Individual" which is an organism part is such a different thing that one might as well call it something else (i.e. a PreservedSpecimen).
I don't think "PreservedSpecimen" is the appropriate alternative. This term can certainly apply to parts of an organism as well as whole organisms, etc. I think the alternative to including parts within the scope of Individual is to establish something new, like "DerivedIndividual", or "IndividualPart". But like I said, it seems dangerous to me to establish a new class for something that transitionally overlaps with another class. There is no overlap between the scope of "Taxon" and the scope of "Location". Indeed, I can't think of a single other case among the DwC classes where one would have to think carefully about which class a particular datum belonged to. But if you wanted to treat Populations through Whole organisms as one class, and derived components of Whole organisms as a separate class, I can think of many examples where there is potential overlap between the two.
The case of a whole organism (live as a LivingSpecimen or dead as a PreservedSpecimen) is different because in that case we would have a single resource serving as the evidence (the whole organism itself).
Evidence of what? Occurrence? I guess this comes back to my original point, and my reason for supporting an Individual class, which is that specimens serve the function of much more than evidence of occurrence. (So do images and HumanObservations and most other things of that sort -- but that's a topic for another thread).
By definition, there can't be many of those (there would just be one) and it would already have an Identification assigned to it, because it is the same Individual that it is providing evidence for. So there is no superfluous assignment of Identifications in that case.
In principle, I tend to agree -- but as we have discussed before, DwC is an exchange standard, and as such necessarily serves as a compromise between the way data "are", and the way data "ought to be".
I have had the tendency of thinking that the tokens supported the Occurrence, but there does not need to be just one purpose for the token. They also support the existence of the Individual.
Yes, exactly!
This should probably make you happy, because the pieces of the Individual (preserved specimens, tissue samples) would be derived from the Individual.
Yup! :-)
I have created a number of similar charts showing how these relationships could apply to various types of tokens:
I'll need to digest these some more before commenting.
I guess I'm still having difficulty understanding how you envision placing properties/attributes of tokens into records represented via DwC. I'll need to spend some more time thinking through what a token is, how it maps to fields and tables in my database, and how I structure their specific properties into DwC terms.
But what I'm not sure I understand is how any of this supports your contention that the scope of "Individual" should not be allowed to apply to parts of a WholeOrganism.
This also assumes that dwc:catalogNumber and dwc:otherCatalogNumbers be re-assigned to Record-level terms. Was there some reason this isn't appropriate?
I think it is appropriate because they should be usable with at least two classes: Individual (for living specimens) and Occurrences (e.g. preserved specimens, images)
I think this gets to the heart of the difference you and I have in viewing the function of "Individual". My *primary* reason for supporting it is to get properties/attributes of PreservedSpecimen *out* of the Occurrence class.
And, a mechanism to track series of "derived from" Individuals. The ASC model covered this, I think (right, Stan?)
I didn't see it in the flow chart, but it could be there somewhere.
I don't have the chart in front of me now, but I'm fairly certain that BiologicalObject can be a child of another BiologicalObject, and the scope included things like Lot, individual whole organism, part of organism, etc.
The risk that we make the definition of Individual so broad that it can't perform any of the functions it was defined to serve. We've already lost one of them (the ability to infer duplicates) when I agreed to the broader definition, but that's the subject of another post.
These are some principles that I always try to keep in mind when discussing these things:
- DwC is a data exchange standard, not so much a physical data model.
- There is a necessary balance between structuring DwC around how data actually exist in content-provider databases, and how data *should* be represented in a normalised world
- When in doubt, DwC should be accommodating, rather than restrictive -- especially when more restrictive needs can be met via associated data filtering
There are other principles as well, but these are the ones I keep having to remind myself of.
I think that what I have suggested above is very unrestrictive. We let evidence be the type of things that they are (PreservedSpecimens, Individuals, StillImages, SoundRecordings, DNA sequences, etc.). We don't determine their type by what we want to use them for. That was the mistake that I made in the Biodiversity Informatics paper. If we follow this approach, then a StillImage can fill any role that we want: evidence that an Occurrence happened, information to support an Identification, a character for a visual key, a logo, etc. We let it fulfill those roles by giving it an identifier and connecting it to other resources using appropriate terms (hasEvidence, derivedFrom, mrtg:attributionLogoURL, etc.).
I think maybe so. Maybe the appropriate course of action here as well is to let people try different approaches out and if they turn out to work and be needed, then we talk about applying them to Darwin Core.
Ultimately, I think people will use it in accordance to what terms are nested within it -- which is why I think it's important to have this conversation we're having now.
As I indicated at an earlier time, I think that there are very few terms that should be properties of Individual since it is primarily a node that connects Occurrences to Identifications (and I guess now to derived tokens).
Aloha, Rich
Looking forward to responses! But I don't think development of these ideas should hold up the proposal for the class Individual, which can stand on its own with its current (revised) definition. Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt