Why it matters what kind of things we include in the definition of Individual
What I think is getting lost in this attempt to define what is and what is not an Individual is that there is a clear and straightforward functional definition of Individual based on what it is intended to do:
An Individual serves as a resource relationship node that connects Occurrences to Identifications. (This is stated explicitly in the comment I included with the term definition.)
If you don't like the technical language, then look at the diagram: http://bioimages.vanderbilt.edu/pages/token-explicit.gif which shows that there is a many-to-one relationship between Occurrence and Individual, and a one-to-many relationship between Individual and Identification.
If you prefer it in layman's language: an Individual can connect many Occurrences to many Identifications.
If something that you want to call an Individual can't or doesn't do this, then it shouldn't be an Individual. I asked for this class to be added to DwC in order to accomplish the purpose described above, not to see how many things we can think of that we have philosophical reasons to call an "individual".
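The many-to-one (Occurrence to Individual) and one-to-many (Individual to Identification) relationships described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of Darwin Core: the class names, the `individual_id` field, the identifiers and the helper functions are hypothetical, and records are held in plain in-memory lists rather than any real DwC serialization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    occurrence_id: str
    individual_id: str  # many Occurrences may point at the same Individual


@dataclass(frozen=True)
class Identification:
    identification_id: str
    individual_id: str  # one Individual may carry many Identifications
    scientific_name: str


def occurrences_of(individual_id, occurrences):
    """Group the Occurrences (resampling events) that document one Individual."""
    return [o for o in occurrences if o.individual_id == individual_id]


def identifications_for(occurrence, identifications):
    """An Occurrence picks up every Identification applied to its Individual."""
    return [i for i in identifications if i.individual_id == occurrence.individual_id]


# Hypothetical identifiers; the names echo the Crataegus example in this thread.
occurrences = [Occurrence("occ-2000", "ind-1"), Occurrence("occ-2008", "ind-1")]
identifications = [
    Identification("det-1", "ind-1", "Crataegus harbisonii"),
    Identification("det-2", "ind-1", "Crataegus somethingelse"),
]

for occ in occurrences_of("ind-1", occurrences):
    names = [i.scientific_name for i in identifications_for(occ, identifications)]
    print(occ.occurrence_id, names)
```

Because neither Identification is attached to an Occurrence directly, adding a new Identification to "ind-1" makes it apply to both Occurrences without touching either Occurrence record.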
We gain three clear benefits from being able to create instances of the Individual class:
Benefit 1. We can group Occurrences that document the same Individual over time (i.e. resampling). This is exactly the reason why the present term dwc:individualID exists (read the definition at http://rs.tdwg.org/dwc/terms/index.htm#individualID). That function is represented by the triangle on the left side of Individual in the diagram referenced above.
Benefit 2. If there are multiple Identifications of an Individual, those Identifications are automatically associated with all Occurrences that are associated with the Individual. That function is represented by the triangle on the right side of the diagram. If we connect several tokens to an Individual, those multiple Identifications are automatically associated with all of the tokens as well.
Benefit 3. Individuals allow us to do semantic reasoning of a very primitive sort. If Occurrence A and the token that acts as its evidence are associated with Individual A having Identification A, and Occurrence B and the token that acts as its evidence are associated with Individual B having Identification B, then if we discover that Individual A is the same as Individual B, we know that Identification B also applies to Occurrence A (and its documenting token) and that Identification A applies to Occurrence B (and its documenting token).
Writing it in this abstract way is a bit hard to follow, so I'll illustrate with two examples. In a previous post, I mentioned a living individual (possibly the only one) of Crataegus harbisonii. I have documented the Occurrence of this Individual on 2008-10-31T09:49:29 at 36.07° latitude, -86.88° longitude by the token http://bioimages.vanderbilt.edu/baskauf/70915 (an image) and have applied an Identification of Crataegus harbisonii to that Individual.
Ron Lance has also recorded the Occurrence of the same Individual at the same location around 2000 and documented it by propagating a cutting from it, which is now a living specimen in the North Carolina Arboretum. If someone examines that living specimen and applies an Identification of Crataegus somethingelse to the Individual from which it was collected, then I can infer automatically that his/her Identification of Crataegus somethingelse applies to my 2008 Occurrence and its associated image. The person who looked at the living specimen would not need to look at my image for me to know that.
Another example happened when a taxonomist was looking at several bark and leaf images that I had taken of a particular species. He wanted to know which of my flower images came from the same tree as particular bark and leaf images. He knew that if he could identify the Individual by its flowers, then by inference that Identification would also apply to the bark images, even if he couldn't make the identification from the bark alone.
A final application involves Identifications of "duplicates" found in different herbaria. A taxonomist is doing a revision of a genus and borrows specimens of that genus from several herbaria. Specimen A from herbarium A was identified as species A in the genus of interest. Specimen B from herbarium B was identified as species B in the same genus. By careful examination of the label records, the taxonomist is able to determine that the specimens are "duplicates" (i.e. they are from the same Individual). By inference, the taxonomist knows that the identifications of species A and species B apply to both specimen A and specimen B because they are both from the same Individual.
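The "duplicates" inference amounts to merging two Individual nodes once they are discovered to be the same, and then pooling their Identifications. The sketch below uses a union-find (disjoint-set) structure to do the merge; that technique, the `IndividualRegistry` class, its `same_as` method and the herbarium identifiers are all hypothetical choices for illustration, not something Darwin Core specifies.

```python
class IndividualRegistry:
    """Hypothetical tracker of which individualIDs denote the same Individual."""

    def __init__(self):
        self.parent = {}

    def find(self, individual_id):
        # Every ID starts as its own root; follow parent links to the canonical ID.
        self.parent.setdefault(individual_id, individual_id)
        root = individual_id
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited ID straight at the root.
        while self.parent[individual_id] != root:
            next_id = self.parent[individual_id]
            self.parent[individual_id] = root
            individual_id = next_id
        return root

    def same_as(self, a, b):
        """Record the discovery that two individualIDs are the same Individual."""
        self.parent[self.find(a)] = self.find(b)


def identifications_for(specimen, registry, specimen_to_individual, determinations):
    """All determinations made on any Individual equivalent to the specimen's."""
    root = registry.find(specimen_to_individual[specimen])
    return sorted(name for ind, name in determinations if registry.find(ind) == root)


registry = IndividualRegistry()
specimen_to_individual = {"specimen A": "herbA:ind-17", "specimen B": "herbB:ind-903"}
determinations = [("herbA:ind-17", "species A"), ("herbB:ind-903", "species B")]

print(identifications_for("specimen A", registry, specimen_to_individual, determinations))
# ['species A']
registry.same_as("herbA:ind-17", "herbB:ind-903")  # the labels show they are duplicates
print(identifications_for("specimen A", registry, specimen_to_individual, determinations))
# ['species A', 'species B']
```

Nothing about specimen A's record changes; the second Identification reaches it only through the merged Individual, which is the point of Benefit 3.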
In my original thinking about what should constitute an instance of the class Individual, I only allowed actual biological individuals, or small localized populations so tightly linked that a taxonomist collecting specimens from them would call the specimens "duplicates". Under that definition of Individual, all three of the benefits listed above would apply.
My qualms about applying the term Individual to the various buckets of dead homogeneous and heterogeneous mixtures of organisms stem from the loss of Benefit #1 in those cases. Moving subsets of those dead organisms around and putting them into different jars involves no resampling. Sorting and reassigning individualIDs to the various jars still involves only a single Occurrence: the one in which the trawler collected the original bucket from the ocean. There are clever things we can do with multiple Identifications, but we've basically lost the triangle on the left side of Individual (no Benefit #1).
My qualms about applying the term Individual to cut-up pieces of organisms involve the triangle on the right side of Individual (connecting Individuals to Identifications). If you chop a fish up into 100 pieces (organs, tissues, DNA samples, etc.) and call all of those pieces Individuals, there is no point in assigning separate Identifications to all of them. Unless the original fish has had some kind of tricky human intervention like interspecific organ transplants, grafting, or creation of a chimera, it is a foregone conclusion that all of the parts of the individual fish have the same Identification. Assigning them all separate Identifications would be a waste of time: no Benefit #2.
Finally, applying the term Individual to containers that we know contain biological individuals that probably differ at lower taxonomic levels causes problems with Benefit #3.
Unless one has a way to specify that the Individual he is talking about is the kind of Individual that a taxonomist would take "duplicates" from (i.e. reliably a single taxon at a low level), it becomes difficult to be sure of the accuracy of the type of reasoning that I'd like us to be able to do based on Occurrences and tokens documenting a common Individual.
So what I've tried to do here is explain why I'm opposed to broadening the definition of Individual to include all of the things that people have suggested it should include. If the definition becomes so broad that we lose the benefits that were the reason for establishing the class Individual, then there is no point in having the class at all. I think that if we stick to the definition that I proposed, we can at least get Benefits #1 and #2. Now that "taxon" has been substituted for "species or lower...", I think that to get Benefit #3 we will also need the individualScope term that Rich proposed, and it would need to include a value indicating that the group of biological individuals is restricted to those that a taxonomist would call "duplicates".
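One way to read the individualScope requirement is as a guard on Benefit #3: Identifications are shared across an Individual's Occurrences only when its scope guarantees a single low-level taxon. This is a sketch under loud assumptions: the scope values below ("WholeOrganism", "DuplicateGroup", "MixedLot") are hypothetical placeholders, since no individualScope vocabulary had been agreed, and which values count as "duplicates"-safe is exactly what is being debated.

```python
# Hypothetical individualScope values; not an agreed Darwin Core vocabulary.
DUPLICATE_SAFE_SCOPES = {"WholeOrganism", "DuplicateGroup"}


def can_share_identifications(individual_scope: str) -> bool:
    """True if every Occurrence of this Individual must be the same low-level taxon."""
    return individual_scope in DUPLICATE_SAFE_SCOPES


def propagate(identification_names, occurrence_ids, individual_scope):
    """Map each Occurrence to the names it may inherit, or to none if unsafe."""
    if not can_share_identifications(individual_scope):
        return {occ: [] for occ in occurrence_ids}
    return {occ: list(identification_names) for occ in occurrence_ids}


print(propagate(["Crataegus harbisonii"], ["occ-1", "occ-2"], "WholeOrganism"))
print(propagate(["Crataegus harbisonii"], ["occ-1", "occ-2"], "MixedLot"))
```

A consumer that ignores the scope would propagate in both cases, which is the risk the paragraph above describes.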
Steve
Laughing at the "splitter"
After I wrote this post last night, I remembered a funny story that I heard when I used to hang out with plant taxonomists as a grad student at Vanderbilt (that was when Vanderbilt actually still HAD taxonomists). The story was told of a certain old taxonomist who was a "splitter". He was always taking species and dividing them into subspecies or varieties that didn't seem to be clearly definable. When others would question this, he always said that he could tell the difference. One day his graduate student went to a tree of a species that he had "split" and cut a half dozen branches from it. The student came in and asked his mentor to help him identify the specimens that he'd collected. The mentor confidently placed the branches into piles representing the several subspecies that he had established. Then the grad student revealed that they were all from the same tree.
Every time this story was told, it was greeted with snickering and laughter. Why do people think this story is funny? Because we know from our background in biology that several pieces of the same tree cannot be different taxa. That is why it is useful to anchor the concept of Individual at the level of the biological individual. If we call mixtures of biological individuals of different lower-level taxa Individuals, then we lose the certainty that all Occurrences arising from that Individual are of the same taxon. If we call pieces of biological individuals Individuals, then we create a situation in which we can assign those pieces differing Identifications, and we set ourselves up to be the butt of jokes like the one I told above. In my original thinking about the proposed class Individual, I only grudgingly accepted the idea that several biological individuals could fall into the category Individual. I did this because I acknowledged that sometimes I couldn't tell where one individual ended and another began (think of individuals of moss), or because it wasn't worth the effort to separate biological individuals (try to take a photo of a single individual of grass; herbarium sheets of grass often have several biological individuals on them because it's clear they are the same species). I have accepted the broadening of the definition to include "taxa" at any level, but I'm thinking that may have been a mistake. However, I don't know how to restate the definition to mean what I think it needs to mean without causing the taxonomists of the group to go for each other's jugular veins.
Steve
Hi Steve,
I've finally had time to carefully read your recent series of emails on the acceptable scope of "Individual".
It has become somewhat apparent that we each support the establishment of the class "Individual" in DwC for different reasons, as evidenced by our different perspectives on what the acceptable scope of an "Individual" can be. I tend to think of "Individual" in the context of the ASC model's "BiologicalObject", whereas you tend to see it more in terms of an "organismal" individual.
DwC began as a very much PreservedSpecimen-oriented exercise. In order to include non-PreservedSpecimen instances of biodiversity data, the attributes of PreservedSpecimen were largely folded into the core class "Occurrence". I am a HUGE fan of broadening the scope of data that can be represented and exchanged via DwC, so I mostly saw this as a Good Thing. But I always had a pang of apprehension about representing PreservedSpecimens as Occurrences. Both HumanObservations and PreservedSpecimens bear Occurrence-related information, and this Occurrence-related information is one of the most popular uses of DwC content (e.g., maps, modelling), but PreservedSpecimens are much more than "Occurrence". DNA sequences, morphological characteristics, preservation methods, storage details, loan information, and so on are all kinds of information that data holders associate with a PreservedSpecimen and share via DwC, but it seems somewhat convoluted to represent these as attributes of an Occurrence.
I had supported the notion of a class "Individual" in large part to serve as a conceptual object onto which many of these things could be attached as attributes more appropriately than to Occurrence. My concern now is that the pendulum is swinging too far in the other direction. In other words, the move from supporting PreservedSpecimen data almost exclusively to supporting more general biodiversity data may be swinging further into a realm where it fails to support Specimen data adequately. As I said, I am very much a supporter of "big tent" DwC, and I would hate to see objects in DwC scoped in a way that unnecessarily limits what content can be represented.
So I guess what I'm trying to say is that the less the proposed class "Individual" can solve what I see as problems with DwC, the less supportive of it I become.
Before I get into the nitty-gritty, I want to dispense with your "splitter" example. "Splitters" work at the rank of species every bit as much as (if not more than) at the rank of subspecies. There are analogous stories in which the hyper-splitter treats different parts of the same organism as different taxa at the rank of species. My point is that this story does not, in my mind, in any way support excluding "Individuals" identified to taxa at ranks above or below the (yes, I'll say it) "arbitrary" rank of species. As far as I'm concerned, limiting an "Individual" to only those things we can confidently assign to a taxon at the rank of "species" is a non-starter. I could fill this email with reasons why, but I think I've already done that in previous emails, so there is no need to repeat them here.
But I do concede there is a rational basis for not treating "parts" of an organism as distinct Individuals. I'm not yet completely convinced, however. To be persuaded that subcomponents (parts) of a single "organism" should not be represented through records of the proposed DwC "Individual" class, I'll need to believe that the potential harm or confusion in doing so (I'm still not clear on what that is) cannot be easily mitigated by filtering with an "individualScope" property.
OK, so I'll try to address each of your reasons why you think that the scope of instances of the proposed "Individual" class should not include units below a "single organism".
If you consider the comment, which describes the primary function of Individual: "Instances of this class can serve the purpose of connecting one or more instances of the Darwin Core class Occurrence to one or more instances of the Darwin Core class Identification", it becomes clear that making parts of organisms Individuals defeats this primary purpose for the term.
I'm not sure I agree with that last statement. In other words, I don't see how the purpose of "Individual" is defeated if the lower limit of the scope of "Individual" extends below a whole organism.
Setting aside the cases where "whole organism" can be a bit ambiguous (corals, sponges, fungi, etc.), suppose we only have a preserved part of an organism: a herbarium specimen, for example. It's common practice to have multiple samples of the same plant preserved as different PreservedSpecimens, sometimes housed in different institutions. A large problem that several initiatives (ADBC, BiSciCol, Virtual Herbarium, etc.) are trying to solve is linking these disparate PreservedSpecimens (as well as the tissue samples derived from them) together. Different collections that house multiple specimens from the same individual plant (but don't yet realize it) would presumably each establish an instance of "Individual" to represent their specimen data via DwC. Thus, each of the individual PreservedSpecimens would have its own unique value of dwc:individualID. The question then becomes: how do we aggregate these instances of Individual to represent the "same thing"?
In my way of thinking, where "Individual" is functionally equivalent to the ASC model's "BiologicalObject" (I'm waiting for Stan to jump in and tell me I'm wrong on this), the original dwc:individualID records would continue to exist as distinct records, each with dwc:individualScope="PartOfOrganism" and each with its own associated data for preservation method, linked photos, etc. They would be aggregated by establishing a new instance of "Individual", with its own value for dwc:individualID and with dwc:individualScope="WholeOrganism". The various Individual instances where dwc:individualScope="PartOfOrganism" would be aggregated when each of them establishes an "isPartOf" or "derivedFrom" relationship with the single Individual instance where dwc:individualScope="WholeOrganism". The same model could apply to tissue samples and other derived bits of a whole organism. As long as the dwc:individualScope value is properly applied, it should be easy to apply appropriate reasoning logic. No?
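The aggregation model in the previous paragraph can be sketched as records that carry a scope and an optional part-of link. In this sketch the `IndividualRecord` class, its `scope` and `is_part_of` fields, the identifiers, and the rule that a part may only roll up to a "WholeOrganism" are assumptions drawn from that paragraph, not agreed Darwin Core terms; a "derivedFrom" link could be handled the same way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndividualRecord:
    individual_id: str
    scope: str                        # hypothetical: "WholeOrganism" or "PartOfOrganism"
    is_part_of: Optional[str] = None  # individualID of the parent, if known


def parts_by_whole(records):
    """Group PartOfOrganism records under the WholeOrganism they belong to."""
    by_id = {r.individual_id: r for r in records}
    groups = {}
    for r in records:
        if r.scope != "PartOfOrganism" or r.is_part_of is None:
            continue  # a whole organism, or a part not yet linked to one
        parent = by_id.get(r.is_part_of)
        if parent is None or parent.scope != "WholeOrganism":
            raise ValueError(f"{r.individual_id} must be part of a WholeOrganism")
        groups.setdefault(parent.individual_id, []).append(r.individual_id)
    return groups


records = [
    IndividualRecord("herbA:sheet-1", "PartOfOrganism", "tree-42"),
    IndividualRecord("herbB:sheet-7", "PartOfOrganism", "tree-42"),
    IndividualRecord("tree-42", "WholeOrganism"),
]
print(parts_by_whole(records))  # {'tree-42': ['herbA:sheet-1', 'herbB:sheet-7']}
```

The scope check is what makes the reasoning "easy": a consumer can rely on every group having a single WholeOrganism at its head.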
How, then, would you represent this sort of information if the class Individual were not allowed to be applied to less-than whole organism instances? I gather that the dwc:individualID values established by the different collections for parts of the same whole organism would each effectively refer to the same whole organism, so you would link them together via "sameAs" relationships?
The major selling point for having Individuals at all is to get out of the business of applying determinations to all of the pieces of evidence such as specimens, images, sounds, etc. that get collected from the same biological individual through multiple Occurrences.
For me the main selling point of the Individual class is to remove information that does not intrinsically belong to an "Occurrence" out of that class, and into a more appropriate class.
This has the benefit that if one applies an Identification to the Individual, all physical and information resources that are derived from the individual automatically get associated with the Identification, and hence with the taxonomic information referenced by the Identification. If we call preserved specimens that are pieces of organisms Individuals having a value of individualScope="part", then do we do the same thing to them as we do with Individuals at higher levels, namely apply Identifications to them?
If appropriate, yes. By "appropriate", I mean that if you are a herbarium with a specimen in your collection, and you don't know whether other specimens from the same whole plant exist in other collections, then you assign it an individualID and scope it as "PartOfOrganism". You attach a taxon Identification to it (of course), because you have nothing else to attach the Identification to. If it is later discovered that a specimen in another herbarium had a different dwc:individualID assigned to it (with its own Identification), then you establish a semantic link between them (either by aggregating them under a new Individual instance with scope "WholeOrganism", or by "sameAs" relationships, as I imagine you would suggest). In either case, you've got two Identification instances applying to the same WholeOrganism, which have exactly the same relationship to each other as the Identifications of any Individual instance with more than one Identification. That is, the Identifications either compete with each other (if different taxa are implicated) or reinforce each other (if the same taxon is implicated). Using my approach (establishing a new Individual instance with scope "WholeOrganism"), it's fairly easy to rationalize, because you simply impose the logic that parent Individual instances scoped as "WholeOrganism" inherit the Identifications of their constituent parts, and treat them accordingly.
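The inheritance rule just described (a WholeOrganism parent inherits the Identifications of its parts, which then either compete or reinforce) can be sketched as follows. The function names, the `part_of` mapping and the identifiers are hypothetical, and the taxon names are the placeholder "species A"/"species B" used earlier in the thread.

```python
def inherited_identifications(whole_id, part_of, identifications):
    """Identifications on a WholeOrganism plus those on any of its parts.

    part_of maps a part's individualID to its WholeOrganism's individualID;
    identifications is a list of (individualID, taxon name) pairs.
    """
    members = {whole_id} | {part for part, whole in part_of.items() if whole == whole_id}
    return [name for individual_id, name in identifications if individual_id in members]


def assess(names):
    """Classify a set of inherited Identifications."""
    if not names:
        return "none"
    if len(set(names)) > 1:
        return "competing"      # different taxa are implicated
    return "reinforcing" if len(names) > 1 else "single"


part_of = {"herbA:sheet-1": "tree-42", "herbB:sheet-7": "tree-42"}
identifications = [("herbA:sheet-1", "species A"), ("herbB:sheet-7", "species B")]
names = inherited_identifications("tree-42", part_of, identifications)
print(names, assess(names))  # ['species A', 'species B'] competing
```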
So, where might it not be appropriate? Well, suppose I collect a fish and establish it as a WholeOrganism PreservedSpecimen instance of Individual. Then I derive from it a tissue sample, for which I create a new Individual instance with scope "PartOfOrganism". In that case, the child would probably not receive its own Identification instance at all; rather, it would inherit the Identification instance from its parent. But then suppose I send that tissue off to Kansas, where it is accessioned in the tissue repository there and then sequenced. Suppose the sequence then yields a competing Identification, different from the one assigned to the WholeOrganism. What I want to happen is that this competing Identification instance becomes known to me, the holder of the WholeOrganism specimen. Conversely, if an expert re-identifies the WholeOrganism specimen, I would like to see that new Identification instance transferred to the derived Individuals that are "PartOfOrganism".
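The two-way flow described for the fish and its tissue sample can be sketched by treating every record in one organism's derivation chain as sharing a single pool of Identifications. This is a hypothetical illustration: the `derived_from` mapping, the identifiers "fish-1" and "ku:tissue-88", and the choice to make all Identifications in the chain visible from every member are assumptions, not a described DwC mechanism.

```python
def whole_organism_of(individual_id, derived_from):
    """Follow derivedFrom links up to the WholeOrganism at the top of the chain."""
    seen = set()
    while individual_id in derived_from:
        if individual_id in seen:
            raise ValueError("cycle in derivedFrom links")
        seen.add(individual_id)
        individual_id = derived_from[individual_id]
    return individual_id


def visible_identifications(individual_id, derived_from, identifications):
    """Identifications made anywhere in the same organism's derivation tree."""
    root = whole_organism_of(individual_id, derived_from)
    return [
        (ind, name) for ind, name in identifications
        if whole_organism_of(ind, derived_from) == root
    ]


derived_from = {"ku:tissue-88": "fish-1"}
identifications = [("fish-1", "species A")]

# The tissue has no Identification of its own; it sees the fish's.
print(visible_identifications("ku:tissue-88", derived_from, identifications))
# [('fish-1', 'species A')]

# Sequencing the tissue yields a competing Identification...
identifications.append(("ku:tissue-88", "species B"))
# ...which the holder of the whole fish now sees as well.
print(visible_identifications("fish-1", derived_from, identifications))
# [('fish-1', 'species A'), ('ku:tissue-88', 'species B')]
```

A re-identification added to "fish-1" would likewise become visible from "ku:tissue-88", covering the converse case.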
I *think* I understand how you would manage these things if instances of the class "Individual" were not allowed to apply to anything less than a WholeOrganism, but it would be better if you described it in your own words.
If so, then we are back in the business of assigning Identifications to all of our derivative resources rather than the biological individuals from which they came.
I don't think so. A photograph and a DNA sequence are *not* individuals. They are reflections of individuals, very much as the morphological character states scored for a particular WholeOrganism are not Individuals. These are clearly different classes of things, because they are not formed of physical biological material. The "essence" that unites everything from a population to a single cell extracted from a multicellular organism is that all of them represent biological material. The distinction between "WholeOrganism" and "PartOfOrganism" is reasonably clear in most cases, but not all. And to me, in such cases it seems a lesser offense to have to decide arbitrarily which of two alternate scope terms something gets scored as (e.g., "WholeOrganism" vs. "PartOfOrganism") than which of two different classes of thing it falls into.
If we just say that we'll skip assigning separate Identifications to the derivative resources, then we have something that doesn't fit the functional role for which Individual was designed.
That assumes that the *only* functional role of an Individual is to join an Occurrence to an Identification. As I have described above and elsewhere, I do not see this as the *only* functional role of an Individual.
In that case an "Individual" which is an organism part is such a different thing that one might as well call it as something else (i.e. a PreservedSpecimen).
I don't think "PreservedSpecimen" is the appropriate alternative. This term can certainly apply to parts of an organism as well as whole organisms, etc. I think the alternative to including parts within the scope of Individual is to establish something new, like "DerivedIndividual" or "IndividualPart". But as I said, it seems dangerous to me to establish a new class for something that transitionally overlaps with another class. There is no overlap between the scope of "Taxon" and the scope of "Location". Indeed, I can't think of a single other case among the DwC classes where one would have to think carefully about which class a particular piece of data belonged to. But if you wanted to treat populations through whole organisms as one class, and derived components of whole organisms as a separate class, I can think of many examples where there is potential overlap between the two.
The case of a whole organism (live as a LivingSpecimen or dead as a PreservedSpecimen) is different because in that case we would have a single resource serving as the evidence (the whole organism itself).
Evidence of what? Occurrence? I guess this comes back to my original point, and my reason for supporting an Individual class, which is that specimens serve the function of much more than evidence of occurrence. (So do images and HumanObservations and most other things of that sort -- but that's a topic for another thread).
By definition, there can't be many of those (there would just be one) and it would already have an Identification assigned to it, because it is the same Individual that it is providing evidence for. So there is no superfluous assignment of Identifications in that case.
In principle, I tend to agree, but as we have discussed before, DwC is an exchange standard, and as such it necessarily serves as a compromise between the way data "are" and the way data "ought to be".
I have had the tendency of thinking that the tokens supported the Occurrence, but there does not need to be just one purpose for the token. They also support the existence of the Individual.
Yes, exactly!
This should probably make you happy, because the pieces of the Individual (preserved specimens, tissue samples) would be derived from the Individual.
Yup! :-)
I have created a number of similar charts showing how these relationships could apply to various types of tokens:
I'll need to digest these some more before commenting.
I guess I'm still having difficulty understanding how you envision placing properties/attributes of tokens into records represented via DwC. I'll need to spend some more time thinking through what a token is, how it maps to fields and tables in my database, and how I structure their specific properties into DwC terms.
But what I'm not sure I understand is how any of this supports your contention that the scope of "Individual" should not be allowed to apply to parts of a WholeOrganism.
This also assumes that dwc:catalogNumber and dwc:otherCatalogNumbers be re-assigned to Record-level terms. Was there some reason this isn't appropriate?
I think it is appropriate because they should be usable with at least two classes: Individual (for living specimens) and Occurrence (e.g., preserved specimens, images).
I think this gets to the heart of the difference you and I have in viewing the function of "Individual". My *primary* reason for supporting it is to get properties/attributes of PreservedSpecimen *out* of the Occurrence class.
And, a mechanism to track series of "derived from" Individuals. The ASC model covered this, I think (right, Stan?)
I didn't see it in the flow chart, but it could be there somewhere.
I don't have the chart in front of me now, but I'm fairly certain that BiologicalObject can be a child of another BiologicalObject, and the scope included things like Lot, individual whole organism, part of organism, etc.
The risk is that we make the definition of Individual so broad that it can't perform any of the functions it was defined to serve. We've already lost one of them (the ability to infer duplicates) when I agreed to the broader definition, but that's the subject of another post.
These are some principles that I always try to keep in mind when discussing these things:
- DwC is a data exchange standard, not so much a physical data model.
- There is a necessary balance between structuring DwC around how data actually exist in content-provider databases, and how data *should* be represented in a normalised world.
- When in doubt, DwC should be accommodating rather than restrictive, especially when more restrictive needs can be met via associated data filtering.
There are other principles as well, but these are the ones I keep having to remind myself of.
I think that what I have suggested above is very unrestrictive. We let evidence be the type of things that they are (PreservedSpecimens, Individuals, StillImages, SoundRecordings, DNA sequences, etc.). We don't determine their type by what we want to use them for. That was the mistake that I made in the Biodiversity Informatics paper. If we follow this approach, then a StillImage can fill any role that we want: evidence that an Occurrence happened, information to support an Identification, a character for a visual key, a logo, etc. We let it fulfill those roles by giving it an identifier and connecting it to other resources using appropriate terms (hasEvidence, derivedFrom, mrtg:attributionLogoURL, etc.).
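The idea that a resource keeps one type while its roles come from the relationships pointing at it can be sketched with a plain list of (subject, predicate, object) triples. This is a minimal illustration under stated assumptions: every identifier is hypothetical and stands in for a real IRI or URL, "supportedBy" is a placeholder predicate rather than an existing term, and only hasEvidence, derivedFrom and mrtg:attributionLogoURL are taken from the message.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# One StillImage whose type never changes. All identifiers are hypothetical,
# and "supportedBy" is a placeholder predicate, not an existing term.
triples: List[Triple] = [
    ("img:1", "rdf:type", "StillImage"),
    ("occ:7", "hasEvidence", "img:1"),               # evidence for an Occurrence
    ("ident:3", "supportedBy", "img:1"),             # support for an Identification
    ("img:1-thumb", "derivedFrom", "img:1"),         # source of a derived image
    ("img:99", "mrtg:attributionLogoURL", "img:1"),  # used as a logo
]


def types_of(resource: str, graph: List[Triple]) -> Set[str]:
    """The resource's declared types, independent of how it is used."""
    return {o for s, p, o in graph if s == resource and p == "rdf:type"}


def roles_of(resource: str, graph: List[Triple]) -> Set[str]:
    """Roles are read off the predicates of triples that point at the resource."""
    return {p for s, p, o in graph if o == resource}


print(types_of("img:1", triples))  # {'StillImage'}
print(sorted(roles_of("img:1", triples)))
# ['derivedFrom', 'hasEvidence', 'mrtg:attributionLogoURL', 'supportedBy']
```

Adding a new role means adding a triple; the image's type stays StillImage throughout.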
I think maybe so. Maybe the appropriate course of action here as well is to let people try different approaches out and if they turn out to work and be needed, then we talk about applying them to Darwin Core.
Ultimately, I think people will use it in accordance with what terms are nested within it, which is why I think it's important to have the conversation we're having now.
As I indicated at an earlier time, I think that there are very few terms that should be properties of Individual since it is primarily a node that connects Occurrences to Identifications (and I guess now to derived tokens).
Aloha, Rich
Looking forward to responses! But I don't think development of these ideas should hold up the proposal for the class Individual, which can stand on its own with its current (revised) definition. Steve
.
Rich, I have done a couple read-throughs of your posts and I had two immediate comments. The first is that I think we want to accomplish many of the same things here and that the problem really is what we want to call things, not what we want to accomplish. So I'm encouraged by that. I think that what I need to do is to get a piece of paper out and try to map out what you are saying and what I'm saying. I think they will turn out to be mostly congruent but with different labels.
The second thing is that the point of recounting that story had nothing to do with the rank of the tree (species, subspecies, or whatever). I have no opinion on lumping, splitting, or whether species, subspecies, etc. actually exist or not or whether it is better to call something a species, subspecies, or variety. I really don't care what kind of name you apply to the whole organism. The point I was trying to make with the story was that on the scale we've been talking about from the entire biosphere to populations to individual organisms to parts of organisms to molecules, the individual organism is the point at which we no longer have to worry that further subdivisions might not share a common Identification. That is what I'm saying is "special" about the whole organism level (vs. parts). If you know that pieces came from the same whole organism, then you can be confident that an identification that is assigned to any of the pieces down to any level of further subdivision will be the same as an identification assigned to any other piece. Thus it is superfluous to assign separate identifications to every piece when you can simply assign a single identification to the whole organism and infer that that identification applies to all of the pieces. This is assuming that you have all of the pieces from that organism. If you have some of the pieces and somebody else has some of them, of course the two of you would assign separate identifications to your sets of pieces (unless you had synchronized databases that "knew" that you were both talking about the same organism - one of the points of having an identifier for individual organisms is so you can do that). I think you pretty much said the same thing below using different words.
I'm going to digest what you wrote for a while before I make further comments. Steve
Richard Pyle wrote:
Hi Steve,
I've finally had time to carefully read your recent series of emails on the acceptable scope of "Individual".
It has become somewhat apparent that we each support the establishment of the class "Individual" in DwC for different reasons, as evidenced by our different perspectives on what the acceptable scope of an "Individual" can be. I tend to think of "Individual" in the context of the ASC model's "BiologicalObject", whereas you tend to see it more in terms of an "organismal" individual.
DwC began as a very-much PreservedSpecimen-oriented exercise. In order to include non-PreservedSpecimen instances of biodiversity data, the attributes of PreservedSpecimen were largely folded into the core class "Occurrence". I am a HUGE fan of broadening the scope of data that can be represented and exchanged via DwC, so I mostly saw this as a Good Thing. But I always had a pang of apprehension about representing PreservedSpecimens as Occurrences. Both HumanObservations and PreservedSpecimens bear Occurrence-related information, and this Occurrence-related information is one of the most popular uses of DwC content (e.g., maps, modelling), but PreservedSpecimens are much more than "Occurrence". Things like DNA sequences, morphological characteristics, preservation methods, storage details, loan information, and so on are all kinds of information that people holding the data associate with a PreservedSpecimen and share via DwC, but it seems somewhat convoluted to represent these as attributes of an Occurrence.
I had supported the notion of a class "Individual" in large part to serve as a conceptual object onto which many of these things could be attached as attributes more appropriately than to Occurrence. My concern now is that the pendulum is swinging too far in the other direction. In other words, the move from supporting PreservedSpecimen data almost exclusively to supporting more general biodiversity data may be swinging further into a realm where it fails to support Specimen data adequately. As I said, I am very much a supporter of "big tent" DwC, and I would hate to see objects in DwC scoped in such a way that they unnecessarily exclude content representation.
So I guess what I'm trying to say is, that the less the proposed class "Individual" can solve what I see as problems with DwC, the less supportive of it I become.
Before I get into the nitty gritty, I want to dispense with your "splitter" example. "Splitters" work at the rank of species every bit as much as (even more so than) at the rank of subspecies. There are analogous stories where the hyper-splitter would treat different parts of the same organism as different taxa at the rank of species. My point is, this story does not, in my mind, in any way support the exclusion of "Individuals" being identified to taxa at ranks above or below the (yes, I'll say it) "arbitrary" rank of species. As far as I'm concerned, limiting an "Individual" to be only those things we can confidently assign to a taxon at the rank of "species" is a non-starter. I could fill this email with reasons why, but I think I've already done that in previous emails, so no need to repeat here.
But I do concede there is a rational basis for not treating "parts" of an organism as distinct Individuals. I'm not yet completely convinced, however. To be persuaded that subcomponents (parts) of a single "organism" should not be represented through records of the proposed DwC "Individual" class, I'll need to believe that the potential harm or confusion in doing so (I'm still not clear on what that is) cannot be easily mitigated by filtering with an "individualScope" property.
OK, so I'll try to address each of your reasons why you think that the scope of instances of the proposed "Individual" class should not include units below a "single organism".
If you consider the comment, which describes the primary function of Individual: "Instances of this class can serve the purpose of connecting one or more instances of the Darwin Core class Occurrence to one or more instances of the Darwin Core class Identification", it becomes clear that making parts of organisms Individuals defeats this primary purpose for the term.
I'm not sure I agree with that last statement. In other words, I don't see how the purpose of "Individual" is defeated if the scope of "Individual" extends below the level of a whole organism.
Setting aside the cases where "whole organism" can be a bit ambiguous (corals, sponges, fungi, etc.), suppose we only have a preserved part of an organism, a Herbarium specimen, for example. It's common practice to have multiple samples of the same plant preserved as different PreservedSpecimens, sometimes housed in different institutions. A large problem that several initiatives (ADBC, BiSciCol, Virtual Herbarium, etc.) are trying to solve is the problem of linking these disparate PreservedSpecimens (as well as the tissue samples derived therefrom) together. Different collections that house multiple specimens from the same individual plant (but don't yet realize it) would presumably each establish an instance of "Individual" to represent their specimen data via DwC. Thus, each of the individual PreservedSpecimens would have its own unique value of dwc:individualID. The question then becomes, how do we aggregate these instances of Individuals to represent the "same thing"?
In my way of thinking, where "Individual" is functionally equivalent to the ASC model's "BiologicalObject" (I'm waiting for Stan to jump in and tell me I'm wrong on this), the original dwc:individualID records would continue to exist as their own distinct records, with dwc:individualScope="PartOfOrganism", and with their own distinct associated data for preservation method, linked photos, etc. They would be aggregated by the establishment of a new instance of "Individual", with its own value for dwc:individualID and with dwc:individualScope="WholeOrganism". The various Individual instances where dwc:individualScope="PartOfOrganism" would be aggregated when they each establish an "isPartOf" or "derivedFrom" relationship with the single Individual instance where dwc:individualScope="WholeOrganism". The same model could apply to tissue samples and other derived bits of a whole organism. As long as the dwc:individualScope value is properly applied, it should be easy to apply appropriate reasoning logic. No?
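The aggregation described above can be sketched as a walk up "derivedFrom" links until a record scoped as "WholeOrganism" is reached. This is a minimal sketch, assuming each Individual record carries an identifier, a scope value from this thread, and at most one parent link; the dataclass, field names, helper functions and identifiers are hypothetical illustrations, not DwC terms or an existing API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Individual:
    # Hypothetical record shape: individual_scope mirrors the proposed
    # individualScope values; derived_from points at a parent individual_id.
    individual_id: str
    individual_scope: str  # "WholeOrganism" or "PartOfOrganism"
    derived_from: Optional[str] = None


def whole_organism_of(ind_id: str,
                      index: Dict[str, Individual]) -> Optional[str]:
    """Follow derived_from links upward to the nearest WholeOrganism record."""
    current: Optional[str] = ind_id
    seen: Set[str] = set()
    while current is not None and current not in seen:
        seen.add(current)  # stop if the links accidentally form a cycle
        record = index.get(current)
        if record is None:
            return None  # the link points at a record we do not hold
        if record.individual_scope == "WholeOrganism":
            return current
        current = record.derived_from
    return None


def group_parts(records: List[Individual]) -> Dict[str, List[str]]:
    """Map each WholeOrganism id to the PartOfOrganism ids aggregated under it."""
    index = {r.individual_id: r for r in records}
    groups: Dict[str, List[str]] = {}
    for r in records:
        if r.individual_scope != "PartOfOrganism":
            continue
        root = whole_organism_of(r.individual_id, index)
        if root is not None:
            groups.setdefault(root, []).append(r.individual_id)
    return groups


# Two herbarium sheets, catalogued separately, later linked to one new
# WholeOrganism record (all identifiers hypothetical).
records = [
    Individual("herbA:123", "PartOfOrganism", derived_from="agg:plant1"),
    Individual("herbB:987", "PartOfOrganism", derived_from="agg:plant1"),
    Individual("agg:plant1", "WholeOrganism"),
]
print(group_parts(records))  # {'agg:plant1': ['herbA:123', 'herbB:987']}
```

Because the walk follows links of any depth, a tissue sample derived from one of the sheets would land in the same group without further changes.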
How, then, would you represent this sort of information if the class Individual were not allowed to be applied to less-than whole organism instances? I gather that the dwc:individualID values established by the different collections for parts of the same whole organism would each effectively refer to the same whole organism, so you would link them together via "sameAs" relationships?
The major selling point for having Individuals at all is to get out of the business of applying determinations to all of the pieces of evidence such as specimens, images, sounds, etc. that get collected from the same biological individual through multiple Occurrences.
For me the main selling point of the Individual class is to remove information that does not intrinsically belong to an "Occurrence" out of that class, and into a more appropriate class.
This has the benefit that if one applies an Identification to the Individual, all physical and information resources that are derived from the individual automatically get associated with the Identification and hence with the taxonomic information referenced by the Identification. If we call preserved specimens that are pieces of organisms Individuals having a value of individualScope="part", then do we do the same thing to them as we do with Individuals at higher levels, namely apply Identifications to them?
If appropriate, yes. By "appropriate", I mean if you are a herbarium, and have a specimen in your collection, and you don't know if other specimens from the same individual whole plant exist in other collections, then you assign it an individualID, and scope it as "PartOfOrganism". You attach a taxon Identification to it (of course), because you have nothing else to attach the Identification to. If later it is discovered that another specimen in another herbarium had a different dwc:individualID assigned to it (with its own Identification), then you establish a semantic link between them (either by aggregating them under a new Individual instance with scope "WholeOrganism", or by "sameAs" relationships as I imagine you would suggest). In either case, you've got two Identification instances applying to the same WholeOrganism, which have exactly the same relationship to each other as any Individual instance with more than one Identification. That is, the Identifications either compete with each other (if different taxa are implicated), or they reinforce each other (if the same taxon is implicated). Using my approach (establishing a new Individual instance with scope "WholeOrganism"), it's fairly easy to rationalize, because you simply impose the logic that parent Individual instances scoped as "WholeOrganism" inherit the Identifications of their constituent parts, and treat them accordingly.
So, where might it not be appropriate? Well, suppose I collect a fish, and establish it as a WholeOrganism PreservedSpecimen instance of Individual. Then I derive from it a tissue sample, for which I establish a new Individual instance with scope "PartOfOrganism". In that case, the child would probably not receive its own Identification instance at all; rather, it would inherit the Identification instance from its parent. But then suppose I send that tissue off to Kansas, where it is accessioned in the tissue repository there and then sequenced. Suppose the sequence then yields a competing Identification, different from the one assigned to the WholeOrganism. What I want to have happen is that this competing Identification instance becomes known to me, the holder of the WholeSpecimen. Conversely, if an expert re-identifies the WholeSpecimen, I would like to see that new Identification instance transferred to the derived Individuals that are "PartOfOrganism".
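The two-way flow wanted in the fish example can be sketched as pooling every Identification attached to the whole organism or to any part listed under it, then giving that pooled set to each member. This is a minimal sketch, assuming the part-to-whole links are already known and each record holds a set of identificationID values; the function name and all identifiers are hypothetical, and whether the pooled Identifications compete or reinforce each other is left to whoever consumes them.

```python
from typing import Dict, List, Set


def shared_identifications(
    whole_id: str,
    parts: Dict[str, List[str]],
    identifications: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Give the whole organism and each directly listed part the union of all
    their identificationID values, so propagation runs in both directions."""
    members = [whole_id] + parts.get(whole_id, [])
    pooled: Set[str] = set()
    for member in members:
        pooled |= identifications.get(member, set())
    # Copy the pooled set so callers can edit one member's set safely.
    return {member: set(pooled) for member in members}


# Hypothetical identifiers: a fish, and a tissue sample sent to another
# repository whose sequence produced a competing Identification.
parts = {"fish:1": ["tissue:1"]}
idents = {"fish:1": {"ident:expert"}, "tissue:1": {"ident:sequence"}}
result = shared_identifications("fish:1", parts, idents)
print(sorted(result["fish:1"]))    # ['ident:expert', 'ident:sequence']
print(sorted(result["tissue:1"]))  # ['ident:expert', 'ident:sequence']
```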
I *think* I understand how you would manage these things if instances of the class "Individual" were not allowed to apply to anything less than a WholeOrganism, but it would be better if you described it in your own words.
If so, then we are back in the business of assigning Identifications to all of our derivative resources rather than the biological individuals from which they came.
I don't think so. A photograph and a DNA sequence are *not* individuals. They are reflections of individuals, very much as morphological character states scored for a particular WholeOrganism are not Individuals. These are clearly different classes of things, because they are not formed of physical biological material. The "essence" that unites everything from a population to a single cell extracted from a multicellular organism is that all of them represent biological material. The distinction between "WholeOrganism" and "PartOfOrganism" is reasonably clear in most cases, but not all cases. And to me, in such cases it seems a lesser offense to have to decide arbitrarily which of two alternate scope terms something gets scored as (e.g., "WholeOrganism" vs. "PartOfOrganism"), than which of two different classes of thing it falls into.
If we just say that we'll skip assigning separate Identifications to the derivative resources, then we have something that doesn't fit the functional role for which Individual was designed.
That assumes that the *only* functional role of an Individual is to join an Occurrence to an Identification. As I have described above and elsewhere, I do not see this as the *only* functional role of an Individual.
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt
.
I have done a couple read-throughs of your posts and I had two immediate comments. The first is that I think we want to accomplish many of the same things here and that the problem really is what we want to call things, not what we want to accomplish. So I'm encouraged by that. I think that what I need to do is to get a piece of paper out and try to map out what you are saying and what I'm saying. I think they will turn out to be mostly congruent but with different labels.
Yes, I have the same sense. Maybe if we come to a consensus, we can create a summary description of everything, for the benefit of those who do not have time to read and digest all these posts.
The second thing is that the point of recounting that story had nothing to do with the rank of the tree (species, subspecies, or whatever).
OK, sorry. The parts that threw me on that were:
If we call mixtures of biological individuals of different lower-level taxa Individuals, then we lose the certainty that all instances of Occurrences arising from that Individual are of the same taxon.
and
I have accepted the broadening of the definition to include "taxa" at any level, but I'm thinking that may have been a mistake.
I (mis)interpreted this to mean that we shouldn't try to regard organisms identified to infra-specific ranks as "Individuals". Sorry that I misunderstood.
The point I was trying to make with the story was that on the scale we've been talking about from the entire biosphere to populations to individual organisms to parts of organisms to molecules, the individual organism is the point at which we no longer have to worry that further subdivisions might not share a common Identification.
OK, I understand that, and despite my seemingly passionate pleas to maintain the scope of "Individual" to include sub-WholeOrganism units, I'm keeping an open mind on that. I (maybe) could be persuaded that the "Individual" ends with a single WholeOrganism, and parts may be dealt with in some other way ("associatedParts"? "associatedSubunits"? As members of the Individual class?)
I guess part of the passion of my fight stems from my hope that the pendulum doesn't swing so far that biological collections objects can no longer be represented as records unto themselves through DwC (as opposed to only represented as attributes of some sort of semi-abstract unit of "Individual" or "Occurrence" that isn't directly represented in many/most real-world databases).
That is what I'm saying is "special" about the whole organism level (vs. parts). If you know that pieces came from the same whole organism, then you can be confident that an identification that is assigned to any of the pieces down to any level of further subdivision will be the same as an identification assigned to any other piece.
I agree that along the continuum from "population" down to "single molecule extracted from an organism", there is a pretty clear (though not perfect) inflection point at the level of a single whole organism, and I can see recognizing that in some way in DwC. But I just have this sense that the demarcation should be at the level of our controlled vocabulary for something like "individualScope", rather than in what is considered "in" vs. "out" of scope for the class "Individual".
Thus it is superfluous to assign separate identifications to every piece when you can simply assign a single identification to the whole organism and infer that that identification applies to all of the pieces.
Yes, but DwC is, by its nature, denormalized. There is unnecessary repetition of information built into it. As long as Identification instances have proper GUIDs (or even LUIDs within a defined dataset), which DwC encourages via "identificationID", I see no reason why an Identification instance cannot be simultaneously shared by instances of "Individual" at the scope of "WholeOrganism" and below. In fact, it logically works above the scope of "WholeOrganism". For example, our fish collection assigns catalog numbers to "lots", which contain 1...n WholeSpecimens (or sometimes parts of a whole organism). The Identifications apply to the lots. So if there are 15 whole specimens (dwc:individualCount=15) within a Lot identified as "Aus bus", then the identification is implied for each of the 15 individual whole organisms. Sometimes we have reason to recognize attributes of individual whole organisms (e.g., individual lengths, or other morphological characters); in such cases, I would imagine establishing child "Individual" instances (each with dwc:individualCount=1), but I don't see why I would have to replicate the Identification 15 times (each with a separate identificationID). I would rather have the 15 whole specimens inherit the single Identification instance through an appropriate relationship link between parent "Lot" and child "WholeSpecimen".
I think there is a difference, though. In the case of a WholeOrganism and its derived parts, the inheritance of Identification instances is bidirectional. That is, if a tissue sample is sequenced, and evidence from that sequence leads to an Identification, then surely the WholeOrganism Individual instance would inherit this Identification.
However, at scopes of Individual broader than "WholeOrganism", the inheritance of Identification is unidirectional. That is, a child can inherit the Identifications of the parent, but the parent cannot necessarily inherit the Identifications of the child. This is the point that nags at the back of my brain and tries to persuade me to throw in the towel and agree with you that "Individual" does not extend below the level of "WholeOrganism".
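The asymmetry described here can be sketched as a scope-dependent rule: Identifications always flow down from a parent to its children, but flow up from children only when the parent is at "WholeOrganism" scope or narrower. This is a minimal sketch under stated assumptions: each record has one of the three scope values used in this thread, at most one parent, and a set of identificationID values attached directly to it; the numeric ranks, function names and identifiers are hypothetical, and the links are assumed to form a tree.

```python
from typing import Dict, List, Optional, Set

# Hypothetical ordering, broadest first. Only the three scope values come from
# this thread; the numeric ranks are an illustrative assumption.
SCOPE_RANK = {"Lot": 0, "WholeOrganism": 1, "PartOfOrganism": 2}


def inherits_upward(scope: str) -> bool:
    """True if a record at this scope absorbs its children's Identifications.

    A Lot may hold several organisms, so it does not; a whole organism and
    its parts do, which makes whole/part inheritance bidirectional.
    """
    return SCOPE_RANK[scope] >= SCOPE_RANK["WholeOrganism"]


def from_below(ind: str, scope: Dict[str, str],
               children: Dict[str, List[str]],
               own: Dict[str, Set[str]]) -> Set[str]:
    """A record's own Identifications plus any passed up from descendants."""
    found = set(own.get(ind, set()))
    if inherits_upward(scope[ind]):
        for child in children.get(ind, []):
            found |= from_below(child, scope, children, own)
    return found


def visible(ind: str, scope: Dict[str, str],
            parent: Dict[str, Optional[str]],
            own: Dict[str, Set[str]]) -> Set[str]:
    """Identifications that apply to `ind` (assumes the links form a tree)."""
    children: Dict[str, List[str]] = {}
    for child, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(child)
    result: Set[str] = set()
    node: Optional[str] = ind
    while node is not None:
        # Walking up the tree gives downward inheritance from every ancestor.
        result |= from_below(node, scope, children, own)
        node = parent.get(node)
    return result


# A Lot identified as "Aus bus" holding two whole specimens; one specimen was
# re-identified, and a tissue sample from the other was sequenced.
scope = {"lot:42": "Lot", "spec:1": "WholeOrganism",
         "spec:2": "WholeOrganism", "tissue:9": "PartOfOrganism"}
parent = {"lot:42": None, "spec:1": "lot:42",
          "spec:2": "lot:42", "tissue:9": "spec:2"}
own = {"lot:42": {"ident:aus-bus"}, "spec:1": {"ident:aus-cus"},
       "tissue:9": {"ident:seq"}}

print(sorted(visible("lot:42", scope, parent, own)))  # ['ident:aus-bus']
print(sorted(visible("spec:2", scope, parent, own)))  # ['ident:aus-bus', 'ident:seq']
```

The Lot keeps only its own Identification, while the sequenced tissue's Identification reaches its whole specimen, matching the one-way versus two-way distinction drawn above.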
This is assuming that you have all of the pieces from that organism. If you have some of the pieces and somebody else has some of them, of course the two of you would assign separate identifications to your sets of pieces (unless you had synchronized databases that "knew" that you were both talking about the same organism - one of the points of having an identifier for individual organisms is so you can do that).
The problem is, at least for the billions of specimen records already extant, this information is not known beforehand. Obviously, going forward we want to capture this information at the outset (ideally, in the field at the time of specimen acquisition). Indeed, this is one of the *key* goals of the NSF BiSciCol grant -- to facilitate exactly this.
So, what I think it all boils down to is, when we discover multiple specimens that each represent a part of a single "WholeOrganism" individual, how do we map the pre-existing dwc:individualID values assigned to each of the multiple parts to each other?
In my world-view, those parts are themselves individuals, so we would generate a new "parent" instance of Individual, with its own dwc:individualID and scope of "WholeOrganism", linked to all the parts via the appropriate parent/child semantics.
If I understand your world-view on this, all of those existing dwc:individualID values would have been implicitly established *for* the WholeOrganism (of which only a part is represented as "evidence" in a herbarium), and thus they would all be aggregated via some sort of "sameAs" semantic relationship.
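The two world-views can be contrasted with a pair of minimal Python sketches that emit (subject, predicate, object) triples. The `scope` and `hasParent` predicates and the `ind:` identifiers are hypothetical placeholders; `owl:sameAs` is the standard OWL property. The first approach mints a new WholeOrganism parent and links each existing part to it; the second asserts that the existing dwc:individualID values all denote one organism.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def aggregate_as_parent(part_ids: List[str], parent_id: str) -> List[Triple]:
    """Parent/child view: each part is an Individual; mint a WholeOrganism parent.

    'scope' and 'hasParent' are hypothetical predicate names.
    """
    triples: List[Triple] = [(parent_id, "scope", "WholeOrganism")]
    for part_id in part_ids:
        triples.append((part_id, "hasParent", parent_id))
    return triples


def aggregate_as_same_as(existing_ids: List[str]) -> List[Triple]:
    """sameAs view: every existing ID already denotes the WholeOrganism,
    so link each one to the first (owl:sameAs is symmetric)."""
    canonical, *others = existing_ids
    return [(other, "owl:sameAs", canonical) for other in others]


herbarium_parts = ["ind:A", "ind:B", "ind:C"]
print(aggregate_as_parent(herbarium_parts, "ind:P"))
print(aggregate_as_same_as(herbarium_parts))
```

The sameAs version adds no new identifier, while the parent version adds one (`ind:P`) but keeps each part distinguishable as an Individual in its own right.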
Is that a fair distinction between our respective perspectives?
I'm going to digest what you wrote for a while before I make further comments.
Likewise for me on your earlier posts.
Thanks for keeping this discussion interesting and thought-provoking!
And, apologies to everyone for the volume of email on this topic...not that any of those people have got this far into this message.....
Aloha, Rich
When you guys finally agree (or not) here are some questions as a developer that I would ask you each. They are not that different from the issue I raised timidly about tying the semantics of "Individual" to distinguishing the "origin" of an aggregation. As always, pardon any misuse of biological terms.
1. Neglecting--if one can--issues about colonial organisms--is a lichen one individual or two?
2. In a way consistent with your answer to 1, counting the human obligate symbionts such as gut bacteria, is a human one individual or many?
3. If you feel no compulsion to be consistent in answering 1 and 2, will the addition of class Individual to DwC require further properties to determine which of your inconsistent uses is in play, in order that semantic integration about data on Individuals not become logically inconsistent?
4. In the case of lichens or other(???) "compound" organisms whose taxon name is conventionally given by the name of the fungal component, are new DwC terms needed to distinguish whether an Individual of that name is a lichen or a fungus?
(As far as I can tell from a bit of browsing on the web, the current answer to the conundrum of 4 seems to be that the distinction is made mainly in the dataset metadata, declaring in some way that the dataset is of fungi or is of lichens. This doesn't seem very satisfactory in a world where aggregators may isolate data from the datasets. It probably imposes higher-than-record-level provenance requirements on the integration.)
Bob Morris
Thanks, Bob. This is helpful.
- Neglecting--if one can--issues about colonial organisms--is a lichen one individual or two?
If my understanding of lichen is correct, I would answer two; one for each taxon represented.
- In a way consistent with your answer to 1, counting the human obligate symbionts such as gut bacteria, is a human one individual or many?
This is a problem that is so fundamental that I don't think we can address it through the DwC class individual (I thought about bringing it up at one point during the thread, but decided not to go there...). I would wager that something on the order of 90% of the biological cells represented by *any* Preserved Specimen are bacteria (some were there at the time of capture, some came later). If we took this into account, then every non-bacterial specimen we have in our collections could be identified only to the level of "Life". Not very practical. Nor is it very practical to take steps to remove all the bacterial cells from the non-bacteria specimens in our collection.
I think the best way to characterize it is that the "Individual" is the organism "of interest". This is not to say that parasites are not interesting. But when they are encountered on living or preserved specimen hosts, and are of interest to a parasitologist, then I would assume a new Individual would be established (whether or not the parasite was physically separated from the host). We have this a lot, for example, in shark specimens -- which have many internal (macro) parasites that are of interest to the respective biologists. Obviously, when a new Individual instance is established for a parasite (or other commensal organism), then the appropriate semantic relationship would be established between it and the host Individual.
So....from my perspective, the answer to your question is that a human is one individual, with a taxon Identification of "Homo sapiens Linnaeus sec. Linnaeus", which is implied to exclude all of the organisms physically associated with that human. As those physically associated organisms are recognized as something of interest, and Identified to a taxon other than "Homo sapiens Linnaeus sec. Linnaeus", a new instance of Individual is established for it/them.
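A minimal sketch of that policy in Python, assuming plain dictionaries as stand-ins for records. The relationship record reuses the Darwin Core ResourceRelationship terms `resourceID`, `relatedResourceID`, and `relationshipOfResource`; `scientificName` stands in for a full Identification as a simplification. The identifiers, the `register_associated_organism` helper, and the "parasite of" value are hypothetical. The host's Identification is left untouched; the newly recognized organism gets its own Individual plus a relationship back to the host.

```python
from typing import Dict, List

Record = Dict[str, str]


def register_associated_organism(
    individuals: Dict[str, Record],
    relationships: List[Record],
    host_id: str,
    new_id: str,
    scientific_name: str,
    relationship: str,
) -> None:
    """Hypothetical helper: establish a new Individual for an organism found on/in a host."""
    if host_id not in individuals:
        raise KeyError(f"unknown host Individual: {host_id}")
    individuals[new_id] = {"individualID": new_id, "scientificName": scientific_name}
    # ResourceRelationship-style record linking the new Individual to its host.
    relationships.append({
        "resourceID": new_id,
        "relatedResourceID": host_id,
        "relationshipOfResource": relationship,
    })


individuals: Dict[str, Record] = {
    "ind:human1": {"individualID": "ind:human1",
                   "scientificName": "Homo sapiens Linnaeus sec. Linnaeus"},
}
relationships: List[Record] = []
register_associated_organism(individuals, relationships,
                             "ind:human1", "ind:worm1", "Taenia solium", "parasite of")
```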
Does that adequately address your question?
- If you feel no compulsion to be consistent in answering 1 and 2, will the addition of class Individual to DwC require further properties to determine which of your inconsistent uses is in play, in order that semantic integration about data on Individuals not become logically inconsistent?
If my answer above for #2 is satisfactory, you can count on me using it consistently.
- In the case of lichens or other(???) "compound" organisms whose taxon name is conventionally given by the name of the fungal component, are new DwC terms needed to distinguish whether an Individual of that name is a lichen or a fungus?
Hmmm....isn't that a function of taxonomy? If more than one taxon of interest is involved, then I would think that a corresponding number of Individuals would be established, with appropriate semantic relationship(s) established between/among them.
(As far as I can tell from a bit of browsing on the web, the current answer to the conundrum of 4 seems to be that the distinction is made mainly in the dataset metadata, declaring in some way that the dataset is of fungi or is of lichens. This doesn't seem very satisfactory in a world where aggregators may isolate data from the datasets. It probably imposes higher-than-record-level provenance requirements on the integration.)
Maybe you could describe the situation better. My understanding of lichens is that we're dealing with composite organisms involving more than one taxon, in which case our definition of "Individual" would require more than one instance of such -- one for each discernible taxon (of interest) in the aggregate. The aggregate specimen can still be treated as a single Individual (with either an appropriately broad taxonomic identification, or no taxonomic identification at all). I suppose the situation is analogous with corals, many of which include symbiotic photosynthetic zooxanthellae (algae) in their tissues. Many coral specimens in museums are dried skeletons (hence, sans zooxanthellae); but some are alcohol-preserved (hence, taxonomic aggregate). I see no problem in establishing the coral as an Individual, with an appropriate coral taxon identification, and leaving it at that until the point at which someone has an interest in the zooxanthellae cells also contained in the sample -- at which point a new Individual instance is established.
I don't think the coral curators of the world will cry foul; but will the lichen curators of the world be resistant to the idea of establishing two individuals for each lichen specimen that consists of two taxa?
Aloha, Rich
It's a little technical, but I have written a document at http://paulmurraywork.wordpress.com/2010/11/08/it-depends-on-what-you-mean-b...
It demonstrates a method by which properties reified into property objects (such as dwc:ResourceRelationship individuals) can be turned into properties and then reasoned over, and by which the existence of properties can be made dependent on the kinds of things other properties apply to.
In filing away messages, I noticed that a response was (I think) requested from me here. I am going to dodge answering the question for two reasons: first, it wasn't my preference to address the issue of heterogeneous composite samples through defining them as Individuals and second, I think Rich's answer was as good as any I could give. However, in typical evil teacher fashion, I'll answer the question with a question. In the existing system where Individuals are denormalized out of the picture and Identifications are applied directly to Occurrences rather than Individuals, did anyone ask the same questions about Occurrences? If one is recording an Occurrence of a lichen, human, or coral, is it required that the Identification make the distinctions you are asking for here? I think we have been satisfied with a level of sloppiness in semantics in that case that is equal to what we may have to put up with if we allow Individuals to exist as a class.
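The normalization difference mentioned above can be sketched in Python, assuming simple dictionaries (all identifiers and the `identifications_for` helper are hypothetical). With Identifications attached directly to Occurrences, a re-identification has to be copied onto every Occurrence record; with an Individual in between, it is recorded once and every linked Occurrence resolves to it.

```python
from collections import defaultdict
from typing import Dict, List

# Occurrence -> Individual links (many Occurrences may share one Individual).
occurrence_to_individual: Dict[str, str] = {"occ:1": "ind:A", "occ:2": "ind:A"}

# Identifications are attached to the Individual, not to each Occurrence.
individual_identifications: Dict[str, List[str]] = defaultdict(list)


def identifications_for(occurrence_id: str) -> List[str]:
    """Resolve an Occurrence's Identifications through its Individual."""
    individual_id = occurrence_to_individual[occurrence_id]
    # .get() avoids inserting an empty entry into the defaultdict.
    return list(individual_identifications.get(individual_id, []))


individual_identifications["ind:A"].append("Crataegus")
# A later re-identification is recorded once, on the Individual...
individual_identifications["ind:A"].append("Crataegus harbisonii")
# ...and is visible from every Occurrence linked to that Individual.
print(identifications_for("occ:1"))
print(identifications_for("occ:2"))
```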
Steve
Crap. I accidentally sent that before I was ready to send it. I'll continue where I ended:
The risk that we make the definition of Individual so broad that it can't perform any of the functions it was defined to serve.
...or so narrow that it can only perform a fraction of what we'd like it to serve.....
We've already lost one of them (the ability to infer duplicates) when I agreed to the broader definition, but that's the subject of another post.
Huh? I don't understand how we've lost the ability to infer duplicates, and what aspect of the broader definition caused us to lose it.
I think that what I have suggested above is very unrestrictive. We let evidence be the type of things that they are (PreservedSpecimens, Individuals, StillImages, SoundRecordings, DNA sequences, etc.). We don't determine their type by what we want to use them for. That was the mistake that I made in the Biodiversity Informatics paper. If we follow this approach, then a StillImage can fill any role that we want: evidence that an Occurrence happened, information to support an Identification, a character for a visual key, a logo, etc. We let it fulfill those roles by giving it an identifier and connecting it to other resources using appropriate terms (hasEvidence, derivedFrom, mrtg:attributionLogoURL, etc.).
I guess what I still don't quite understand is how we represent the attributes of the "evidence" in DwC.
I'm out of time for now. Will address the other emails later.
Aloha, Rich
participants (4): Bob Morris, Paul Murray, Richard Pyle, Steve Baskauf