[tdwg-content] "Wrong" RDF, was Re: What I learned at the TechnoBioBlitz
Arlin Stoltzfus
arlin at umd.edu
Thu Oct 14 16:22:44 CEST 2010
On Oct 14, 2010, at 10:05 AM, John Wieczorek wrote:
> What if we try a slightly different world view from the one you
> propose centered on the Individual? Namely, let the Occurrence stand
> as "evidence that a taxon occurred at a place and time." That is to
> say, we may or may not care about the concept of an individual in
> our thinking and our data capture. In this view, the Occurrence
> remains the central concept, and the rest of the data highlights the
> evidence. Hence, a skull in a collection (and the information
> gathered about the collection event) is the evidence that a taxon
> occurred at a place and time. Similarly, a digital image of an
> identifiable individual from a camera trap is the evidence that a
> taxon occurred at a place and time. A fossil having myriad
> individuals is evidence that taxa occurred at a place and time based
> on a GeologicalContext.
If users try to pack a lot of context-dependent significance and
meaning into their annotations (what the user "cares about" in the
example), and present a fundamental observation only through layers of
inference, this makes it more difficult to re-use or re-purpose the
results, because the ultimate consumer of the information may not
share the same motivations and perspectives.
Arlin
> In plain English, which we could express as RDF with an appropriate
> set of predicates, we would always have the same pattern to describe
> Occurrences from the Occurrence-centric world view, namely
>
> the Occurrence O gives evidence that Taxon T determined based on
> Identification criteria I occurred at Location L within
> GeologicalContext G during the Event E based on evidence captured in
> properties of the Occurrence and distinguishable in the type of
> evidence as recorded in the dcterms:type and/or the dwc:basisOfRecord.
>
> I don't see anything "wrong" with this formulation, as all of the
> predicates appropriately associate subjects and objects.
>
> In other words, what is special about the Individual-centric view
> (or any other view) except the way one wants to think about and
> express the relationships (predicates) or formulates the questions?
>
> On Wed, Oct 13, 2010 at 7:07 PM, Steve Baskauf
> <steve.baskauf at vanderbilt.edu> wrote:
> I was just ready to leave work when I wrote this and since then I'm
> feeling like I should clarify just what I mean by "wrong" ways of
> using RDF. I recognize that TDWG encourages flexibility in the ways
> that standards such as DwC are used. As such, it doesn't usually
> define "right" and "wrong" ways of using the standards. What I mean
> by calling some uses "wrong" is not intended to discourage the
> creative use of DwC terms in RDF. What I mean is that one must be
> careful to make sure that RDF statements mean what is intended.
> Here is an example. The Dublin Core term dcterms:language means
> "the language of the resource". On multiple occasions, I've seen
> this term used in RDF as a property of a resource whose metadata is
> written in a certain language. This is "wrong" because the subject
> of the statement is the resource itself, not the resource's
> metadata. The need for this kind of clarity is apparent in the case
> of media. For example, if we are providing metadata in English that
> describes a nature film which has audio in German, the correct
> statement is that [film] dcterms:language "de", NOT [film]
> dcterms:language "en". This problem is handled appropriately in the
> MRTG schema by creating the (required) term mrtg:metadataLanguage.
> The correct statement would be [film] mrtg:metadataLanguage "en" .
> (I'm using "[film]" in lieu of a URI identifier for the film.) If,
> however, we were writing RDF to describe the metadata itself rather
> than the film, then it would be appropriate to say [film's metadata]
> dcterms:language "en" . In straight XML, we might get away with
> semantic sloppiness if the senders and receivers of the XML
> "understand" what the intended subject is of the term
> dcterms:language. But in RDF, we have to assume that the receiver
> of the RDF is a "stupid" computer which only infers exactly what is
> said and not what we MEANT to say.
>
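A minimal sketch of the distinction drawn above, not taken from the original message. Triples are modeled as plain Python tuples rather than with an RDF library, and the example.org URIs are hypothetical placeholders standing in for "[film]" and "[film's metadata]".

```python
# Triples are modeled as (subject, predicate, object) tuples.
# The URIs below are hypothetical placeholders, not real identifiers.
FILM = "http://example.org/film/1"
FILM_METADATA = "http://example.org/film/1/metadata"

triples = {
    # The film itself has German audio.
    (FILM, "dcterms:language", "de"),
    # The language of the metadata is stated with the MRTG term.
    (FILM, "mrtg:metadataLanguage", "en"),
    # If the metadata record is itself the subject, dcterms:language applies to it.
    (FILM_METADATA, "dcterms:language", "en"),
}


def objects(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


film_language = objects(FILM, "dcterms:language")
metadata_language = objects(FILM_METADATA, "dcterms:language")
print("film:", film_language, "metadata:", metadata_language)
```

A consumer that asks only for dcterms:language of the film gets the audio language, because that is all the triples literally say.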
> I believe that this is a very important point that all parties need
> to keep in mind before we happily march off creating RDF templates
> for the general public to use. In particular, I have some serious
> problems with the way that people are associating properties with
> instances of the dwc:Occurrence class. I believe that these "wrong"
> ways originate with the historical roots of Darwin Core as a means
> to describe specimens. I will illustrate what I mean. In many
> cases, a specimen is created by killing an organism and gluing it to
> a piece of paper (if it's a plant) or putting it in a jar (if it's
> an animal). It is natural to ask the question "what kind of species
> is the specimen?". We can look at the specimen and make a statement
> like [specimen] dwc:scientificName "Drosophila melanogaster" and it
> pretty much makes sense. However, in the new Darwin Core standard,
> we have a broader category of "things" (a.k.a. resources) that we
> call Occurrences which include specimens but which also include
> observations and probably all kinds of things like images, DNA
> samples, and a whole lot of other things. If we try to apply the
> same kind of statement to other kinds of Occurrences besides
> specimens we immediately run into problems. If we say that [digital
> image] dwc:scientificName "Drosophila melanogaster" we are making a
> nonsensical statement. The digital image can have properties like
> its photographer, its format, its pixel dimensions, etc. but the
> image itself does not have a scientific name. The scientific name
> is a property of the thing that was photographed. It makes even
> less sense if we are talking about observations. An observation is
> a situation where somebody observes an organism. The observation
> can have properties like the observer, the location, etc. However,
> if we say [observation] dwc:scientificName "Drosophila melanogaster"
> we are saying that that act of observing has a scientific name.
> That is an incorrect statement. So the general statement
> [Occurrence] dwc:scientificName "Drosophila melanogaster" does not
> make sense when applied to all possible types of Occurrences.
> Rather, the organism that we are observing is the thing that has a
> scientific name.
>
> In all of the examples above, the correct statement is [individual
> organism] dwc:scientificName "Drosophila melanogaster". The
> specimen is an occurrence of the individual organism. The image is
> an occurrence of the individual organism. The observation is an
> occurrence of the individual organism. These statements may seem
> odd because we are used to thinking of an Occurrence being an
> occurrence of the "species" but it's not really. The image is not
> an image of the Drosophila species concept nor is it an image of the
> string "Drosophila melanogaster". The image is an image of an
> individual fruit fly. The individual fruit fly is a representative
> of the taxon, the image and the observation are not.
>
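The Individual-centric pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the model from the paper: the ex: identifiers, the ex:Individual class, and the ex:occurrenceOf predicate are all hypothetical names chosen for the example, and triples are plain tuples.

```python
# All ex: names (identifiers, class, linking predicate) are hypothetical.
triples = [
    ("ex:fly1", "rdf:type", "ex:Individual"),
    # The scientific name is a property of the organism only.
    ("ex:fly1", "dwc:scientificName", "Drosophila melanogaster"),
    # Each Occurrence points at the individual it documents.
    ("ex:specimen1", "ex:occurrenceOf", "ex:fly1"),
    ("ex:image1", "ex:occurrenceOf", "ex:fly1"),
    ("ex:observation1", "ex:occurrenceOf", "ex:fly1"),
]


def scientific_names(occurrence):
    """Find names by following occurrence -> individual -> name."""
    individuals = {
        o for s, p, o in triples
        if s == occurrence and p == "ex:occurrenceOf"
    }
    return [
        o for s, p, o in triples
        if s in individuals and p == "dwc:scientificName"
    ]


def has_direct_name(resource):
    """True if the resource itself is the subject of dwc:scientificName."""
    return any(
        s == resource and p == "dwc:scientificName" for s, p, o in triples
    )


print(scientific_names("ex:image1"), has_direct_name("ex:image1"))
```

The image never carries a name of its own; the name is reached through the individual it depicts.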
> This point becomes clearer if we look at a situation where
> several types of occurrence records are collected from the same
> individual. Let's say that we capture a bird, photograph it,
> collect a feather from it, collect a DNA sample and band it and let
> it go. Later somebody sees the band and reports that as an
> observation. How do we connect all of these things? Do we create
> an identifier for the specimen (the feather) and then say that the
> image and the DNA sample came from it? That would be wrong. We
> could take an image of the feather, but that would be a different
> thing from an image of the bird. We didn't get the DNA sample from
> the feather, we got it via a blood sample from the bird. The band
> observation is not an observation of the feather, or the image or
> the DNA sample. It's an observation of the bird which was never any
> kind of specimen living or dead. The bird is an individual organism
> and that's what we need to call it. Right now we don't have
> anything in Darwin Core that can serve as the rdf:type of the bird,
> which is why I proposed Individual as a Darwin Core class.
>
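A rough sketch of the bird example above, assuming the same tuple representation of triples. The ex: identifiers and the ex:occurrenceOf and ex:depicts predicates are hypothetical; the point is only that every record links to the bird rather than to another record.

```python
# All ex: names (identifiers and predicates) are hypothetical placeholders.
triples = [
    ("ex:bird1", "rdf:type", "ex:Individual"),
    ("ex:feather1", "ex:occurrenceOf", "ex:bird1"),
    ("ex:image1", "ex:occurrenceOf", "ex:bird1"),
    ("ex:dnaSample1", "ex:occurrenceOf", "ex:bird1"),
    ("ex:bandSighting1", "ex:occurrenceOf", "ex:bird1"),
    # An image of the feather depicts the feather, not the bird.
    ("ex:featherImage1", "ex:depicts", "ex:feather1"),
]


def individuals_of(occurrence):
    """Return the set of individuals an occurrence record documents."""
    return {
        o for s, p, o in triples
        if s == occurrence and p == "ex:occurrenceOf"
    }


def occurrences_of(individual):
    """Return every occurrence record linked to the individual."""
    return sorted(
        s for s, p, o in triples
        if p == "ex:occurrenceOf" and o == individual
    )


def share_individual(a, b):
    """True if two occurrence records document the same individual."""
    return bool(individuals_of(a) & individuals_of(b))


print(occurrences_of("ex:bird1"))
print(share_individual("ex:dnaSample1", "ex:bandSighting1"))
```

With the individual as the hub, the later band sighting connects to the DNA sample without claiming that either was derived from the feather.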
> I could say these things more clearly in RDF, but because many
> members of the audience of this message aren't familiar with RDF/XML
> they would probably zone out and the point would be lost. The point
> is that we need to have identifiable classes of "resources" (the
> technical name for "things" like physical artifacts, concepts, and
> electronic representations) for all of the things that we need
> to describe and inter-relate in the Darwin Core world. Right now,
> we are missing one of the important pieces that we need, which is a
> class for the Individual. If we are satisfied with creating an RDF
> model that only works for specimens and one-time observations, then
> we probably don't need Individual as a Darwin Core class. On the
> other hand, if TDWG and GBIF are really serious about creating a
> system (Darwin Core and RDF based on it) that can handle other types
> of Occurrences like multiple images of live organisms, observations
> of the same organism over time, and multiple types of Occurrences
> collected from the same organism, then this capability should be
> built into the system from the start. When I got back from the TDWG
> meeting, I was all excited about trying to use Darwin Core Archives
> with my live plant image collection. However, it quickly became
> evident that it could not work because Occurrences were at the
> center of the diagram rather than Individuals. So unless something
> changes, we are already embarking on the process of locking out
> these other Occurrence types.
>
> I hate to sound like a broken record (do we have those any more?),
> but read my paper on this subject. It explains the rationale better
> than this email, has nice diagrams, and gives RDF examples to
> illustrate everything
> (https://journals.ku.edu/index.php/jbi/article/view/3664).
> If somebody has a better idea of how to develop an internally
> consistent system that can handle the problems I've raised here that
> DOESN'T involve Individuals (i.e. other "right" [= semantically
> accurate] ways to express properties and relationships among
> Identifications, Taxa, diverse types of Occurrences, etc.) I'd like
> to hear what it is. Or perhaps as Stan has suggested, there needs
> to be a task group that can hash out alternative views. But let's
> have the discussion before we post models and suggest people use them.
>
> Steve
>
-------
Arlin Stoltzfus (arlin at umd.edu)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org