Speaking strictly from ignorance rather than wisdom here, I don't believe there is one right way to use the standard, though I agree that there are innumerable wrong ways to do so. It's this basic unease that makes me intuitively shy of expressing "A [single] TDWG Ontology".
What if we try a slightly different world view from the one you propose centered on the Individual? Namely, let the Occurrence stand as "evidence that a taxon occurred at a place and time." That is to say, we may or may not care about the concept of an individual in our thinking and our data capture. In this view, the Occurrence remains the central concept, and the rest of the data highlights the evidence. Hence, a skull in a collection (and the information gathered about the collection event) is the evidence that a taxon occurred at a place and time. Similarly, a digital image of an identifiable individual from a camera trap is the evidence that a taxon occurred at a place and time. A fossil having myriad individuals is evidence that taxa occurred at a place and time based on a GeologicalContext. In plain English, which we could express as RDF with an appropriate set of predicates, we would always have the same pattern to describe Occurrences from the Occurrence-centric world view, namely
the Occurrence O gives evidence that Taxon T determined based on Identification criteria I occurred at Location L within GeologicalContext G during the Event E based on evidence captured in properties of the Occurrence and distinguishable in the type of evidence as recorded in the dcterms:type and/or the dwc:basisOfRecord.
I don't see anything "wrong" with this formulation, as all of the predicates appropriately associate subjects and objects.
In other words, what is special about the Individual-centric view (or any other view) except the way one wants to think about and express the relationships (predicates) or formulates the questions?
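For anyone who wants to see the Occurrence-centric pattern laid out, here is a minimal sketch in Python that treats each statement as a plain (subject, predicate, object) triple. Every "ex:" predicate and every bracketed identifier is a placeholder I made up for illustration, not a Darwin Core term; only dwc:basisOfRecord comes from the text above, and the values given for it are illustrative.

```python
# Sketch only: triples are plain (subject, predicate, object) tuples.
# Every "ex:" predicate and bracketed identifier is a hypothetical placeholder;
# dwc:basisOfRecord is the only term taken from the message, and its values
# here are illustrative.

def occurrence_triples(occurrence, taxon, identification, location, event,
                       basis_of_record, geological_context=None):
    """Describe one Occurrence with the same set of predicates every time."""
    triples = [
        (occurrence, "ex:givesEvidenceOf", taxon),
        (occurrence, "ex:determinedBy", identification),
        (occurrence, "ex:occurredAt", location),
        (occurrence, "ex:occurredDuring", event),
        (occurrence, "dwc:basisOfRecord", basis_of_record),
    ]
    if geological_context is not None:
        triples.append((occurrence, "ex:within", geological_context))
    return triples


skull = occurrence_triples("[skull]", "[taxon T]", "[identification I1]",
                           "[location L1]", "[event E1]", "PreservedSpecimen")
photo = occurrence_triples("[camera-trap image]", "[taxon T]", "[identification I2]",
                           "[location L2]", "[event E2]", "StillImage")

# Both kinds of evidence share one pattern: the same predicates, in the same order.
same_pattern = [p for _, p, _ in skull] == [p for _, p, _ in photo]
print(same_pattern)
```

The Occurrence is the subject of every triple, so a skull and a camera-trap image differ only in the values attached to it, which is the sense in which the pattern is "always the same".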
On Wed, Oct 13, 2010 at 7:07 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
I was just ready to leave work when I wrote this and since then I'm feeling like I should clarify just what I mean by "wrong" ways of using RDF. I recognize that TDWG encourages flexibility in the ways that standards such as DwC are used. As such, it doesn't usually define "right" and "wrong" ways of using the standards. What I mean by calling some uses "wrong" is not intended to discourage the creative use of DwC terms in RDF. What I mean is that one must be careful to make sure that RDF statements mean what is intended. Here is an example. The Dublin Core term dcterms:language means "the language of the resource". On multiple occasions, I've seen this term used in RDF as a property of a resource whose metadata is written in a certain language. This is "wrong" because the subject of the statement is the resource itself, not the resource's metadata. The need for this kind of clarity is apparent in the case of media. For example, if we are providing metadata in English that describes a nature film which has audio in German, the correct statement is that [film] dcterms:language "de", NOT [film] dcterms:language "en". This problem is handled appropriately in the MRTG schema by creating the (required) term mrtg:metadataLanguage. The correct statement would be [film] mrtg:metadataLanguage "en" . (I'm using "[film]" in lieu of a URI identifier for the film.) If, however, we were writing RDF to describe the metadata itself rather than the film, then it would be appropriate to say [film's metadata] dcterms:language "en" . In straight XML, we might get away with semantic sloppiness if the senders and receivers of the XML "understand" what the intended subject is of the term dcterms:language. But in RDF, we have to assume that the receiver of the RDF is a "stupid" computer which only infers exactly what is said and not what we MEANT to say.
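To make the film example concrete, here is a minimal sketch in Python, assuming statements are stored as plain (subject, predicate, object) tuples. As in the text, "[film]" and "[film's metadata]" stand in for URIs; the helper name `values` is only for illustration.

```python
# "[film]" and "[film's metadata]" stand in for URIs, as in the text above.
statements = [
    ("[film]", "dcterms:language", "de"),             # the film's audio is German
    ("[film]", "mrtg:metadataLanguage", "en"),        # its metadata is written in English
    ("[film's metadata]", "dcterms:language", "en"),  # a statement about the metadata itself
]


def values(subject, predicate):
    """Return exactly what is stated about subject, with no guessing at intent."""
    return [o for s, p, o in statements if s == subject and p == predicate]


film_language = values("[film]", "dcterms:language")
metadata_language = values("[film's metadata]", "dcterms:language")
print(film_language, metadata_language)
```

A consumer like this reads only what is stated about each subject, so writing "en" as the dcterms:language of "[film]" would tell it that the film itself is in English.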
I believe that this is a very important point that all parties need to keep in mind before we happily march off creating RDF templates for the general public to use. In particular, I have some serious problems with the way that people are associating properties with instances of the dwc:Occurrence class. I believe that these "wrong" ways originate with the historical roots of Darwin Core as a means to describe specimens. I will illustrate what I mean. In many cases, a specimen is created by killing an organism and gluing it to a piece of paper (if it's a plant) or putting it in a jar (if it's an animal). It is natural to ask the question "what kind of species is the specimen?". We can look at the specimen and make a statement like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much makes sense. However, in the new Darwin Core standard, we have a broader category of "things" (a.k.a. resources) that we call Occurrences, which include specimens but also observations and probably many other things such as images and DNA samples. If we try to apply the same kind of statement to other kinds of Occurrences besides specimens we immediately run into problems. If we say that [digital image] dwc:scientificName "Drosophila melanogaster" we are making a nonsensical statement. The digital image can have properties like its photographer, its format, its pixel dimensions, etc. but the image itself does not have a scientific name. The scientific name is a property of the thing that was photographed. It makes even less sense if we are talking about observations. An observation is a situation where somebody observes an organism. The observation can have properties like the observer, the location, etc. However, if we say [observation] dwc:scientificName "Drosophila melanogaster" we are saying that the act of observing has a scientific name. That is an incorrect statement.
So the general statement [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make sense when applied to all possible types of Occurrences. Rather, the organism that we are observing is the thing that has a scientific name.
In all of the examples above, the correct statement is [individual organism] dwc:scientificName "Drosophila melanogaster". The specimen is an occurrence of the individual organism. The image is an occurrence of the individual organism. The observation is an occurrence of the individual organism. These statements may seem odd because we are used to thinking of an Occurrence as being an occurrence of the "species", but it isn't really. The image is not an image of the Drosophila species concept nor is it an image of the string "Drosophila melanogaster". The image is an image of an individual fruit fly. The individual fruit fly is a representative of the taxon; the image and the observation are not.
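The difference can be sketched in a few lines of Python, again with statements as plain (subject, predicate, object) tuples. The predicate "ex:occurrenceOf" is a made-up placeholder for whatever property would link an Occurrence to its individual; only dwc:scientificName comes from Darwin Core.

```python
# "ex:occurrenceOf" is a hypothetical linking predicate, not a Darwin Core term.
triples = [
    ("[fruit fly]", "dwc:scientificName", "Drosophila melanogaster"),
    ("[specimen]", "ex:occurrenceOf", "[fruit fly]"),
    ("[digital image]", "ex:occurrenceOf", "[fruit fly]"),
    ("[observation]", "ex:occurrenceOf", "[fruit fly]"),
]


def objects(subject, predicate):
    """Return the objects stated for this exact subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def scientific_names(occurrence):
    """Follow the occurrence to its individual, then read the name stated there."""
    names = []
    for individual in objects(occurrence, "ex:occurrenceOf"):
        names.extend(objects(individual, "dwc:scientificName"))
    return names


image_names = scientific_names("[digital image]")                   # found via the fruit fly
direct_names = objects("[digital image]", "dwc:scientificName")     # nothing stated on the image
print(image_names, direct_names)
```

The name is stated once, on the individual, and each kind of Occurrence reaches it through the same link, so nothing ever claims that an image or an act of observing has a scientific name.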
This point becomes clearer if we look at a situation where several types of occurrence records are collected from the same individual. Let's say that we capture a bird, photograph it, collect a feather from it, collect a DNA sample, band it, and let it go. Later somebody sees the band and reports that as an observation. How do we connect all of these things? Do we create an identifier for the specimen (the feather) and then say that the image and the DNA sample came from it? That would be wrong. We could take an image of the feather, but that would be a different thing from an image of the bird. We didn't get the DNA sample from the feather, we got it via a blood sample from the bird. The band observation is not an observation of the feather, or the image or the DNA sample. It's an observation of the bird, which was never any kind of specimen, living or dead. The bird is an individual organism and that's what we need to call it. Right now we don't have anything in Darwin Core that can be used as the rdf:type of the bird, which is why I proposed Individual as a Darwin Core class.
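Here is a rough sketch in Python of how the bird example hangs together once the bird has its own identifier. Each pair records what a record was taken from or is a record of; the identifiers and the helper `siblings` are invented for illustration and are not part of Darwin Core.

```python
from collections import defaultdict

# (record, source) pairs: what each record was taken from or is a record of.
# All identifiers are placeholders standing in for URIs.
links = [
    ("[image]", "[bird]"),
    ("[feather]", "[bird]"),
    ("[DNA sample]", "[bird]"),
    ("[band observation]", "[bird]"),
    ("[image of feather]", "[feather]"),  # a different thing from an image of the bird
]

# Group the records by the thing they document.
occurrences_of = defaultdict(list)
for record, source in links:
    occurrences_of[source].append(record)


def siblings(record):
    """Other records that document the same source as the given record."""
    result = []
    for group in occurrences_of.values():
        if record in group:
            result.extend(r for r in group if r != record)
    return result


print(occurrences_of["[bird]"])
print(siblings("[band observation]"))
```

With the bird as the hub, the band observation is connected to the image, the feather, and the DNA sample without anything claiming that the DNA came from the feather, and the image of the feather stays attached to the feather rather than to the bird.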
I could say these things more clearly in RDF, but because many members of the audience of this message aren't familiar with RDF/XML they would probably zone out and the point would be lost. The point is that we need to have identifiable classes of "resources" (the technical name for "things" like physical artifacts, concepts, and electronic representations) for all of the things that we need to describe and inter-relate in the Darwin Core world. Right now, we are missing one of the important pieces that we need, which is a class for the Individual. If we are satisfied with creating an RDF model that only works for specimens and one-time observations, then we probably don't need Individual as a Darwin Core class. On the other hand, if TDWG and GBIF are really serious about creating a system (Darwin Core and RDF based on it) that can handle other types of Occurrences like multiple images of live organisms, observations of the same organism over time, and multiple types of Occurrences collected from the same organism, then this capability should be built into the system from the start. When I got back from the TDWG meeting, I was all excited about trying to use Darwin Core Archives with my live plant image collection. However, it quickly became evident that it could not work because Occurrences were at the center of the diagram rather than Individuals. So unless something changes, we are already embarking on the process of locking out these other Occurrence types.
I hate to sound like a broken record (do we have those any more?), but read my paper on this subject. It explains the rationale better than this email, has nice diagrams, and gives RDF examples to illustrate everything (https://journals.ku.edu/index.php/jbi/article/view/3664). If somebody has a better idea of how to develop an internally consistent system that can handle the problems I've raised here that DOESN'T involve Individuals (i.e. other "right" [= semantically accurate] ways to express properties and relationships among Identifications, Taxa, diverse types of Occurrences, etc.) I'd like to hear what it is. Or perhaps as Stan has suggested, there needs to be a task group that can hash out alternative views. But let's have the discussion before we post models and suggest people use them.
Steve