Hi Steve,
First of all, thank you for taking the time to carefully articulate your perspective on this. As someone who has written more than a few "epic" posts to these lists, I am personally grateful for the careful explanation. I know many people don't have time to read long messages, but I think that the lack of explicit descriptions of things has led to confusion, which, in the long run, has cost us all even more time.
For much of your email, I found myself nodding in agreement. I'm only commenting below on a few passages that caught my attention. I don't know how much your later posts supersede this one, but here are my comments on your first.
From your first post of Oct 15:
In the case of putting "dots on a map" to show the distribution of a species, the case is simple if the occurrences are specimens where the whole dead organism is collected. It is not so simple with other types of occurrences. Let me illustrate with an example. There is currently precisely one known individual of Crataegus harbisonii in nature. I have given this individual the URI http://bioimages.vanderbilt.edu/ind-baskauf/70905 . I have approximately 62 images of that individual at http://bioimages.vanderbilt.edu/ind-baskauf/70905.htm and http://www.cas.vanderbilt.edu/bioimages/species/crha2.htm . Each one of these images represents an occurrence in that I pressed the shutter on my camera at different times for each one.
Yes, technically, you could represent these 62 images as 62 separate Occurrence records (rather, as 62 separate events). But assuming the 62 shutter releases were all within a reasonable period of time (e.g., the same day), it would be perfectly appropriate to collapse these 62 events into a single event, which spanned in time from the first shutter-release to the last. The reason you would do that is that very, very few end-users would gain much wisdom about nature knowing the precise points in time that the organism occurred at that place. Most would be quite happy to infer that it also existed at the same place in-between the individual shutter-releases. Thus, representing a single event (anchored to the Occurrence) as a range of time from first shutter-release to last, would be adequate for essentially all use-cases.
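To make the collapsing concrete, here is a minimal Python sketch. The capture times are invented for illustration and the function name is hypothetical; the one DwC-related assumption is that the resulting range is written as an ISO 8601 interval, which is the form an eventDate value can take.

```python
from datetime import datetime

def collapse_to_event(timestamps):
    """Return (start, end) of a single event spanning all capture times."""
    if not timestamps:
        raise ValueError("need at least one timestamp")
    return min(timestamps), max(timestamps)

# Hypothetical shutter releases on one day (not the real image metadata).
shots = [
    datetime(2010, 5, 1, 9, 15),
    datetime(2010, 5, 1, 9, 2),
    datetime(2010, 5, 1, 10, 40),
]

start, end = collapse_to_event(shots)

# One event, expressed as a range from first shutter release to last.
event_date = f"{start.isoformat()}/{end.isoformat()}"
print(event_date)
```

The individual capture times are not lost by doing this; they stay with the images, and only the occurrence is flattened.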
I realize that each image carries with it the metadata for its moment of capture, but of course that is metadata that applies to the image (evidence of the occurrence), not the occurrence itself (which can be safely flattened to a single occurrence).
For analogous reasons, many natural history collections aggregate multiple specimens of the same taxon collected at the same event into a single "lot", which gets a single catalog number, and is represented as a single occurrence. If I collect 100 individuals of [what I identify as] the same fish species at the same poison station, I will establish one specimen record, with individualCount set to 100, and represent it as a single Occurrence (even though technically, the 100 fish were captured at slightly different times, and thus I could technically generate a different Event instance for each, and thus have 100 different occurrence records).
Ron Lance has collected tissue from this tree for grafting purposes and now has an occurrence with basisOfRecord="LivingSpecimen" in his arboretum in North Carolina. Andrea Bishop of the Tennessee Dept of Environment and Conservation has seeds collected from the tree - I'd call the collection of those seeds an occurrence record.
I would probably do the same, assuming they weren't collected at the same time (plus or minus a few hours to a few days) as the tissue sample. But in either case, the tree at the place and time is the occurrence; the tissue sample and the seeds are just evidence of the occurrence. If either the time is substantially different (certainly possible!), or the location is different (not likely, it being a tree), then I would see justification for treating it as two separate occurrence records, so we know that the tree was at that place at those two (reasonably separate) times. And, of course, we'd want to join the two Occurrences via individualID.
So my question to Markus and others at GBIF is: how many dots will you put on your map for this tree?
My answer: one for each distinctly different moment in time that the tree was confirmed to be in that place (by whatever evidence). How does one define "distinctly different"? Well, I suspect that would be best judged by a biologist who could determine whether there is meaningful knowledge to be gained by knowing this tree was at this place at multiple points in time within the span of an hour, vs. multiple points in time within the span of a day, or a month, or a year. Is diel variation important to document? Is lunar-cycle variation important to document? Is seasonal variation important to document? The answers to these questions would inform the decision of how many of these "potentially unique Occurrences" should be aggregated into a single occurrence, which would be represented with lower precision of time and/or place.
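One way to picture that decision is as a single tunable threshold: sightings closer together than some maximum gap are merged into one occurrence, and the biologist chooses the gap. The sketch below is only an illustration under that assumption; the function name, the threshold values, and the sighting times are all hypothetical.

```python
from datetime import datetime, timedelta

def split_into_occurrences(timestamps, max_gap):
    """Merge sightings separated by no more than max_gap.

    Returns a list of (first, last) time ranges, one per occurrence.
    """
    groups = []
    for t in sorted(timestamps):
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)   # close enough: same occurrence
        else:
            groups.append([t])     # distinctly different: new occurrence
    return [(g[0], g[-1]) for g in groups]

# Hypothetical sightings of one tree on three separate days in a year.
sightings = [
    datetime(2010, 4, 10, 9, 0),
    datetime(2010, 4, 10, 9, 30),
    datetime(2010, 7, 22, 14, 0),
    datetime(2010, 10, 5, 11, 0),
    datetime(2010, 10, 5, 11, 5),
]

# Seasonal variation matters: anything more than a day apart is separate.
seasonal = split_into_occurrences(sightings, timedelta(days=1))

# Seasonal variation does not matter: a year-wide window merges everything.
coarse = split_into_occurrences(sightings, timedelta(days=365))

print(len(seasonal), len(coarse))
```

The same five sightings yield three dots under the first threshold and one under the second, so the number of dots follows from the question being asked, not from the data alone.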
I anticipate that one response to this question will be to call each imaging bout one "observation" having a number of dwc:associatedMedia references. That collapses the number of occurrence records considerably, but not down to one.
It could be one. What is the time span of a "bout", compared to the time-span of the collection of all bouts combined? How important is it to resolve the time component of each bout individually, vs. aggregate them into a single bout, represented by a broader (less precise) window of time?
I took images of that tree on at least three separate instances over the course of a year and Ron collected his graft tissue years before that.
Sounds to me like four occurrence records, presuming there is value in distinguishing the occurrence of the tree at the place at different times of the year.
There is simply no way to reduce the number of occurrences for this tree to one, nor should we want to.
There is a way -- if you don't care about seasonal variation, you simply define the Event as a span of time covering all four visits to the tree. You probably wouldn't want to do that; but you certainly could.
A possible use of multiple occurrence records (i.e. my first point above) of this sort might be to establish how long individuals of Crataegus harbisonii live and each occurrence record (whether separated by years or by the seconds between shutter clicks) is a part of the record that we should be able to (and want to) preserve. Another use would be to track a non-sessile organism (e.g. a whale) in both time and space. In that case, the record on a map for an individual would be some kind of curve rather than a dot. But in any case, recognizing the existence of an entity that I'm calling an Individual facilitates these broader uses of occurrence data and it's really hard for me to see how that is going to happen if we ONLY have occurrences as separate entities. Response Markus? How does GBIF deal with whale tracks or multiple banded bird observations for a single bird?
If I understand your overall point correctly, it is something like this:
Individuals potentially span multiple Occurrences. DwC uses individualID to link these multiple Occurrences together. However, there is no class for individualID, and hence no way to apply additional dwc metadata to the object represented by the individualID, other than through one or more occurrence records.
Is that about right? I mean, we certainly can provide an individualID as a component of dwc:occurrence; but we have no way with dwc of assigning metadata specific to that individual, except through a series of occurrence instances.
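A small sketch of that situation, assuming flat occurrence records held as Python dicts. The occurrenceID values, dates, and the second individualID are made up for illustration; only the first individualID is the URI quoted earlier in this message.

```python
from collections import defaultdict

TREE = "http://bioimages.vanderbilt.edu/ind-baskauf/70905"

# Hypothetical flat occurrence records; there is no separate record for the
# individual itself, only occurrences that happen to share an individualID.
occurrences = [
    {"occurrenceID": "occ-1", "individualID": TREE, "eventDate": "2009-04-10"},
    {"occurrenceID": "occ-2", "individualID": TREE, "eventDate": "2009-10-05"},
    {"occurrenceID": "occ-3", "individualID": "urn:example:other-tree",
     "eventDate": "2009-04-10"},
]

def group_by_individual(records):
    """Link occurrences that share an individualID."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["individualID"]].append(rec["occurrenceID"])
    return dict(groups)

by_individual = group_by_individual(occurrences)
print(by_individual[TREE])
```

The grouping works, but anything one wants to say about the tree itself still has to be repeated on, or inferred from, the occurrence records.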
(In the oversimplified examples I gave earlier, I applied a scientific name directly to an individual. In actual practice, I relate individuals to identifications and then relate the identifications to taxa.)
Good to know! This is, in my opinion, the right way to do it.
Again, to illustrate with a real-life example, when Bruce Kirchoff was developing his Woody Plants of the Southeastern US learning software, he asked a taxonomist to go through the images of mine that he was using for the project to verify that they were identified correctly. My old website just threw together all images of a particular species onto one page without regard to the individuals from which they originated (e.g. http://www.cas.vanderbilt.edu/bioimages/species/sarar3.htm and http://www.cas.vanderbilt.edu/bioimages/species/soam3.htm). It turns out that I had carelessly misidentified a vegetative Sambucus racemosa ssp. racemosa individual as Sorbus americana. The taxonomist asked me which of the various bark, twig, leaf, etc. images were from the same plant and the only way I could find out was through the laborious process of looking for images with similar time/date values and my hand written field notes. It was a nightmare finding all of the particular image records that needed to have their identifications fixed and then correcting them. On my new website (e.g. http://bioimages.vanderbilt.edu/metadata.htm, then click on Quercus chrysolepis), the images are connected to the individual from which they originated. If I discover by looking at a particularly informative image that I have misidentified the individual, I only need to add an updated determination (i.e. identification) to that individual's record and automatically all images from that individual are displayed with the correct name and are placed on the correct species page.
Well...I would counter that you could achieve the same thing by representing things via Occurrence, and then cross-linking those Occurrences that represent the same individual by using the shared individualID. The only thing you don't have is metadata specific to the individual anchored directly to the individualID. Instead, dwc denormalizes this a bit and aggregates those individual-specific metadata to other classes (Occurrence, Identification, etc.)
It's not that I disagree that an "individual" is a useful class in the realm of biodiversity informatics. I also think there are a couple of important entities in taxon name/concept space that warrant their own classes. However, as John W. and Markus have both emphasized, DwC (necessarily) represents a compromise between a proper ontological mapping of the information classes, and a practical vehicle for information exchange amongst holders of biodiversity datasets. I find this constraining sometimes, but it helps when I remind myself of the following: DwC is a mechanism for exchanging data & metadata, not a database model or schema. As such, I think it covers most (but not all) of the need, with a reasonable (as opposed to normalized) set of classes and terms. If you have metadata for an individual, you can resolve (err...dereference) that metadata by providing an appropriate individualID via dwc occurrence records.
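As a rough sketch of that cross-linking (not a description of how either website actually works), suppose identifications carry the individualID and a date, and each occurrence looks up the latest one. The record IDs and dates are hypothetical; the two names are the ones from the misidentification described above.

```python
# Hypothetical image occurrences that share one individualID.
occurrences = [
    {"occurrenceID": "img-bark", "individualID": "ind-1"},
    {"occurrenceID": "img-leaf", "individualID": "ind-1"},
]

# Identifications refer to the individualID, not to each image.
identifications = [
    {"individualID": "ind-1", "scientificName": "Sorbus americana",
     "dateIdentified": "2005-06-01"},
]

def current_name(individual_id, idents):
    """Return the most recently dated name for an individualID, or None."""
    matches = [i for i in idents if i["individualID"] == individual_id]
    if not matches:
        return None
    # ISO 8601 dates sort correctly as plain strings.
    return max(matches, key=lambda i: i["dateIdentified"])["scientificName"]

before = current_name("ind-1", identifications)

# A single corrected determination, added once.
identifications.append(
    {"individualID": "ind-1",
     "scientificName": "Sambucus racemosa ssp. racemosa",
     "dateIdentified": "2010-10-20"}
)

# Every occurrence sharing the individualID now resolves to the new name.
names = {o["occurrenceID"]: current_name(o["individualID"], identifications)
         for o in occurrences}
print(before, names)
```

Nothing here requires a record for the individual as such; the individualID acts purely as a join key.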
So, to re-state -- I don't disagree with your premise that there logically ought to be a class for individual; I'm just not sure it is necessary for DwC at this time.
Having said that, and being a database-nerd with a tendency to hyper-normalize data models, I am mostly playing devil's advocate in my message here. In truth, I share your view that there should be (should have been) a class for Individual, and Occurrence should have simply been the union of an Individual and an Event. So....I reserve the right to stop playing Devil's Advocate, and join you in your efforts to make the case for an "individual" class. My only concern is that we may have different perspectives on how to scope "individual". In my mind, a better term would be "organism", rather than "individual"; because in my mind, once you allow a single coral head (as opposed to the individual polyps) to be represented as an "individual", you've just allowed for multiple "individuals" -- which opens the door to ever broader circumscriptions of multiple individuals (colony-->small group-->herd/school/flock-->population-->taxon concept).
I recognize that many "specimen-based" organizations aren't really going to care one whit about this. That's fine. In their databases and personal XML schemas they can ignore Individuals as it is their prerogative.
Actually, no we can't. It is often the case that a "lot" of 10 specimens is later determined to contain more than one taxon. In such cases, we *do* need to identify individuals, so we can separate the lot accordingly. I know it's not exactly the same as the examples you give, but fundamentally it's the same basic information flow: an aggregated/abstract occurrence needs more precise recognition of the individual organism.
But when we build RDF templates, I believe strongly that for the benefit of those of us who care about the broader applications of occurrences those templates should use individuals to connect (one or more) occurrences and (one or more) identifications. For those with a technical bent, you can see how I have done this for an herbarium specimen by looking at the page source RDF of the example http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0429.rdf . For those of a non-technical bent, just look at the webpage that shows up when you click on the link. It looks just like any other web page for a specimen and you don't even have to know that the underlying RDF supports using Individuals as a grouping mechanism.
I guess my real question is: is the case for individual as a class because you cannot represent certain information within DwC (via individualID) at all? Or that you could do so more elegantly if individual was broken out as a class? I certainly agree that it would be more elegant to treat individual as a class; I'm just not convinced that the increase in elegance justifies the increase in complexity/normalization of DwC.
In summary, I think we need Individual as a DwC class to enable understandable rdfs:typing of records of individuals and to create a context in which instances of individuals can be placed (i.e. people would assign and use identifiers for individuals when they document occurrences). These instances (and their assigned URI GUIDs) would allow for "connecting" identifications and occurrences in a more meaningful way. I am not suggesting that the occurrence be dethroned as the center of biodiversity records. Assuming that the xxxxID terms end up being moved out of the various classes and into the record-level terms area as was suggested recently, I think that there are really only about two terms that should be put into a new Individual class: the other new term I have proposed (individualRemarks) and establishmentMeans (but that is the topic of another email). It may seem odd to suggest adding a class that has very few terms in it, but if you follow my reasoning above you will hopefully understand why I have done so.
OK, I guess I should have read this paragraph first -- it would have saved me a lot of typing above. But the words are already typed above, and I don't have time to go figure out which ones are no longer necessary, so I'm leaving it as written (Sorry!). Anyway, the way you frame it here certainly clarifies things in my mind, and nudges me closer to joining your crusade for establishment of an individual class. But I'm not yet sure we fully agree on the scope of what an "individual" is. The most intriguing passage in your entire email (for me) is this one:
I think that there are really only about two terms that should be put into a new Individual class: the other new term I have proposed (individualRemarks) and establishmentMeans (but that is the topic of another email)
By including establishmentMeans in your attributes of individual, you've piqued my interest in reading your "another email".... :-)
I hope that the discussion (and criticism!) will continue. Again, I'm interested in hearing alternatives.
I'll find time later to read and contemplate your other emails. For now, I'd be interested in whether my comments in this email are useful, or whether I am just misunderstanding your basic point.
Aloha, Rich