I got Rich's several emails just before I left work and had several hours in the car to think about it before responding (probably always a good idea - the hours of thinking, not the car ride). Since I got home, I've read the subsequent emails from various people and they confirm my "car thinking". I get the point Rich is making and after pondering, I'm now OK with just leaving the definition at "taxon". As in the other classes he mentions (Location,time) we have the freedom to define the scale and I suppose we should have that here as well.
What we need to realize is that in doing this, we are establishing dwc:Individual as the de facto "aggregation" unit of the sort we have talked about previously. I suppose that is OK as long as we provide sufficient means for humans or semantic reasoners to be able to know what they are getting information about when they retrieve metadata about an Individual. That can be done for humans by making appropriate comments in the proposed dwc:individualRemarks. For machines, I have been re-thinking the idea of having dwc:individualCount being a property of an Individual. In an earlier post, I suggested that it should remain a property of Occurrences, since the number of Individuals can change over time (think wolf pack or plant deme over time). However, in the current context, something like dwc:individualCount (if not individualCount itself) that is a property of Individual could have controlled values of "1" and ">1". If the value is "1", then the Individual is a biological individual (still subject to some uncertainty of definition in the case of clones) and we can safely assume that if taxonRank has a high level value then the determiner holds out the possibility that the same named biological individual (i.e. possessing a certain GUID) could be identified to a lower level. If the value is ">1" then the Individual is an aggregation, bag, cluster, or whatever we want to call it. The biological individuals involved could have a common identification on any level from "life" for a photo of a eubacterium and archaebacterium, to variety, subspecies, forma or whatever. In the case of ">1" it would be understood that it may or may not be possible to refine the taxonomic level of the Identification without creating new Individuals from subsets of the original one.
If the Individual had no associated Identification, then it would just have to be assumed that there were more than one living thing included in the Individual and nothing more than that could be inferred about what kind of taxa were included (e.g. the raw marine samples right out of the scoop, unanalyzed satellite photo, etc.). I suppose the value could even be "0", indicating that an unsuccessful effort was made to find the taxon linked through the identification during the Event associated with the Occurrence (something that came up earlier).
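A minimal sketch of how a consumer might interpret such a controlled count is given here. The function name, the returned labels, and the decision to treat a missing Identification as relevant only for ">1" are assumptions made for illustration; nothing like this is defined in DwC.

```python
# Hypothetical controlled vocabulary for a count-like property of Individual.
ALLOWED_COUNTS = {"0", "1", ">1"}


def interpret_individual(count: str, has_identification: bool) -> str:
    """Return a rough label for what an Individual record represents."""
    if count not in ALLOWED_COUNTS:
        raise ValueError(f"unexpected count value: {count!r}")
    if count == "0":
        # An unsuccessful effort was made to find the taxon during the Event.
        return "absence"
    if count == "1":
        # A biological individual; a high-rank identification may be refined later.
        return "biological individual"
    # count == ">1": an aggregation, bag, cluster, or lot.
    if not has_identification:
        # Nothing can be inferred about which taxa are included.
        return "unidentified aggregate"
    return "aggregate"
```

For example, `interpret_individual(">1", False)` would label a raw marine sample straight out of the scoop as an unidentified aggregate.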
The other issue would be how to indicate that one of these "aggregate" (individualCount>1) Individuals was segregated into several Individuals at a lower taxonomic level (e.g. partially sorted lots of marine organisms get more sorted, fossil aggregations get separated into individual fossils, an image of a forest has individual trees delineated). Rich has stated that this would be necessary any time subsets of the original Individual were given divergent Identifications and I agree totally. Assigning new GUIDs to these new Individuals would be no problem and I suppose Pete's suggestion of using "hasPart" and "isPartOf" could be used to establish the relationships between the original Individual and its "children".
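One way the parent/child bookkeeping could look is sketched below. The `Individual` class, its field names, and the use of random UUIDs as GUIDs are assumptions for illustration only; "hasPart" and "isPartOf" are represented simply as stored identifiers.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Individual:
    guid: str
    taxon: Optional[str] = None        # taxon of the current Identification, if any
    is_part_of: Optional[str] = None   # GUID of the parent Individual
    has_part: List[str] = field(default_factory=list)  # GUIDs of child Individuals


def split_individual(parent: Individual, child_taxa: List[str]) -> List[Individual]:
    """Create one child Individual per divergent identification.

    The parent keeps its own GUID and identification; each child gets a new
    GUID and is linked to the parent in both directions.
    """
    children = []
    for taxon in child_taxa:
        child = Individual(guid=str(uuid.uuid4()), taxon=taxon, is_part_of=parent.guid)
        parent.has_part.append(child.guid)
        children.append(child)
    return children
```

Calling `split_individual(lot, ["Ophiuroidea", "Gastropoda"])` on a partially sorted lot would leave the lot record intact and add two children that point back to it.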
I feel like there would be quite a bit of work that would be required to make this all work, but I guess I now believe that it COULD work, so I'm going to officially drop my objection to just inserting "taxon" into the definition as was proposed, since what others have said has swayed me into believing that the broader definition won't stop me from using Individual as I had originally intended. If the proposed addition goes through, we need to have a really good Google Code entry that summarizes the understanding that we have come to in this discussion (if we have come to one! :-).
Steve
Richard Pyle wrote:
Hi Dean,
I agree with you -- but the question is, would you want to require that your "rough sorts" create Individual instances on a 1:1 basis with taxa at the rank of species or lower? I was using the "Animalia" example as an extreme case. In the real world, it's more like you describe. That is, it's usually relatively easy, and extremely useful, to pre-sort such lots down to the level of, say, family, and then generate instances of Individuals accordingly. But the implication of the "species" requirement would be that you'd have to guesstimate how many distinct species are represented in the lot, then generate an Individual instance for each one of them. This, I think, would not be so easy in many cases.
I guess in your case you've divided Identifications into two subclasses: "real", and....err...."artificial"? "temporary"? "provisional"?
I'm not sure that's a solution that will work for most data providers/consumers (i.e., defining subclasses of Identifications), so it may not be ideal to "go there" in DwC.
I guess what I'm still struggling with is why species-level identifications are somehow fundamentally different from higher-rank identifications (fundamental enough that the scope of the Individual class is constrained by them). Especially when it's easy enough to filter out the higher-than-species-rank Identifications in cases where you want to exclude them.
I think what we're circling around is something to the effect of: "An individual should be circumscribed/scoped in such a way that it can be Identified to a single Taxon". I'm just saying there should be no constraints on the scope of that taxon.
For example, let's take the example of a lot that includes ophiuroids, gastropods, isopods, algae, and fish larvae. I think there is consensus that we would *not* want to establish a single Individual instance that has multiple and concurrently legitimate Identification instances applied to it. In other words, we don't want an Individual instance that has one Identification instance for "ophiuroid", one for "gastropod", one for "isopod", one for "algae", and one for "fish", all simultaneously and legitimately applied to the single Individual instance. Stated another way, we don't want to allow for an Identification instance to apply to only part of an Individual -- an Identification instance should apply to the *entire* Individual.
So, in this example, we either would want no Identification instance at all applied to the aggregated lot, or if we really wanted an Identification, the best we could do (depending on the kind of algae) would be "Eukarya". But if you want to acknowledge that there are five different subsets within that lot, then you generate five "child" Individual instances, each of which has a single legitimate Identification instance to its respective higher-rank taxon.
So, I think what we're really after here is that, when there are multiple Identification instances linking an instance of Individual to more than one distinct Taxon, these should always be mutually exclusive -- they should never be simultaneously legitimate (i.e., they should never apply to different "parts" of a single Individual instance).
Looking at the big picture....we seem to agree that the maximum scope of an Event instance in terms of place is "Planet Earth" (for now, anyway); the maximum scope for time is "any window of time since there has been life on the planet" (i.e., up to 4 billion years or so). We have discussed the idea of establishing scoping metadata for both place (e.g., coordinateUncertaintyInMeters, coordinatePrecision), and Time (not yet defined in DwC). I think the same logic should apply to Identifications. That is, the scope of Taxon that can be assigned to an Individual via an Identification should span anything from "forma" (or whatever the most precise taxon identification there is), all the way up to Domain (or even simply "Life"). DwC:taxonRank already serves as a rough guide to defining the scope of any particular taxon Identification (with apologies to Nico-the-other-one).
So the point is, I think the general rule for DwC should be "maximum flexibility for allowing breadth of usage, with sufficient metadata to allow for filtering to more restrictive usage".
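As a rough sketch of the "filter to more restrictive usage" idea, a consumer could drop records whose identification is coarser than some chosen rank. The rank list below is a simplified assumption, not a DwC vocabulary, and the records are plain dictionaries keyed by `taxonRank` purely for illustration.

```python
# Simplified, assumed ordering from broadest to most precise rank.
RANK_ORDER = [
    "domain", "kingdom", "phylum", "class", "order", "family",
    "genus", "species", "subspecies", "variety", "forma",
]
RANK_INDEX = {rank: i for i, rank in enumerate(RANK_ORDER)}


def is_at_or_below(rank: str, coarsest_rank: str) -> bool:
    """True if `rank` is as precise as, or more precise than, `coarsest_rank`."""
    position = RANK_INDEX.get(rank.lower())
    limit = RANK_INDEX[coarsest_rank.lower()]
    # Unknown or missing ranks are excluded rather than guessed at.
    return position is not None and position >= limit


def filter_by_rank(records, coarsest_rank="species"):
    """Keep only records identified at `coarsest_rank` or a more precise rank."""
    return [
        rec for rec in records
        if is_at_or_below(rec.get("taxonRank", ""), coarsest_rank)
    ]
```

A provider can then publish Individuals identified only to "Animalia", and a consumer who wants species-level data calls `filter_by_rank(records, "species")` to leave them out.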
I'm guessing that most of that doesn't make much sense....sorry for that.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences Associate Zoologist in Ichthyology Dive Safety Officer Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Dean Pentcheff
Sent: Wednesday, November 03, 2010 9:52 AM
To: tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] tdwg-content Digest, Vol 20, Issue 17
Leveraging off my earlier toss-in of the parent-child collection scheme, let me toss in this observation.
I'll preface it by saying that although it's a situation we deal with in reality, my gut impulse is that it should probably _not_ be accommodated by the "Individual" concept under development.
We have many records for un- or partially-sorted lots of marine invertebrate samples. Often we can make very rough determinations of what is in those lots (e.g., we can see that a jar contains ophiuroids, gastropods, sphaeromatid isopods, red algae, and larval fish). Critically, these are multiple particular and disjunct parts of the taxonomic hierarchy, not just a single "highest containing rank" determination.
It turns out to be super-useful to record that very rough determination because (as alluded to by Rich) we can then appropriately make that jar available to visitors seeking particular taxa (and save them the trouble of grubbing through shelves of jars where we already "know" there's nothing of interest to them).
Right now, we do _not_ conflate this rough determination with a Real Taxonomic Determination (RT and all that): they are two completely separate fields. So to find all the jars we know have ophiuroids, one does indeed have to search both the real taxonomic determination field as well as the rough-determination (text) field (if one wants to include unsorted lots in the quest).
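The two-field search could be sketched roughly as follows. The field names `determination` and `rough_determination` and the simple substring match are assumptions for illustration and do not describe the actual collection database.

```python
def find_lots(lots, term, include_unsorted=True):
    """Return lots whose taxonomic or rough determination mentions `term`.

    `lots` is a list of dicts; the rough-determination text field is only
    consulted when unsorted lots should be part of the search.
    """
    needle = term.lower()
    hits = []
    for lot in lots:
        in_real = needle in lot.get("determination", "").lower()
        in_rough = include_unsorted and needle in lot.get("rough_determination", "").lower()
        if in_real or in_rough:
            hits.append(lot)
    return hits
```

Searching for "ophiuroid" with `include_unsorted=False` returns only lots with a real taxonomic determination, while the default also returns jars where ophiuroids were merely noted during a rough sort.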
I'm introducing this case more with the idea that it may usefully help define the outer limits for "Individual" -- something that the "Individual" concept should _not_ accommodate. I can't really wrap my head around how the developing "Individual" concept can usefully be mutilated to accommodate this case.
-Dean
Dean Pentcheff pentcheff@gmail.com dpentche@nhm.org
On Wed, Nov 3, 2010 at 11:53 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
> I think if I'm understanding what John wrote, he was going to substitute "taxon" for "species (or lower taxonomic rank if it exists)" with the understanding that Individual is not intended to be used for aggregates of different taxa. That would solve this problem, right?
It depends on what you mean by "different taxa". If you are using the word "taxa" here to imply "species or lower ranks", then I don't think it would solve the problem. But if you mean it in a generic way, then I'm OK with that. By "in a generic way", suppose I had a trawl sample or a plankton tow sample that included unidentified organisms from multiple phyla, all of which are animals. I should not be prevented from representing this aggregate as an "Individual", with an Identification instance linked to a taxon concept labelled as "Animalia". This means the contents of the Individual all belong to a single taxon (Animalia), and therefore it does not violate the condition excluding aggregates of different taxa. An instance of Individual so identified would be almost useless for many purposes, I agree -- but it's easy enough to filter such Individuals out by looking at dwc:taxonRank of the Taxon to which the Individual was identified. Also, it's not useless for all purposes, because a botanist would like to know that s/he doesn't have to look through that sample to find stuff of interest. I guess my point is, there should not be any rank-based requirement for the implied taxon circumscription of an "Individual".
Rich
_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content