[tdwg-content] "Wrong" RDF, was Re: What I learned at the TechnoBioBlitz

John Wieczorek tuco at berkeley.edu
Thu Oct 14 16:05:27 CEST 2010


Speaking strictly from ignorance rather than wisdom here, I don't believe
there is one right way to use the standard, though I agree that there are
innumerable wrong ways to do so. It's this basic unease that makes me
intuitively shy of expressing "A [single] TDWG Ontology".

What if we try a slightly different world view from the Individual-centered
one you propose? Namely, let the Occurrence stand as "evidence
that a taxon occurred at a place and time." That is to say, we may or may
not care about the concept of an individual in our thinking and our data
capture. In this view, the Occurrence remains the central concept, and the
rest of the data describes the evidence. Hence, a skull in a collection
(and the information gathered about the collection event) is the evidence
that a taxon occurred at a place and time. Similarly, a digital image of an
identifiable individual from a camera trap is the evidence that a taxon
occurred at a place and time. A fossil containing myriad individuals is evidence
that taxa occurred at a place and time based on a GeologicalContext.
In plain English, which we could express as RDF with an appropriate set of
predicates, we would always have the same pattern to describe Occurrences
from the Occurrence-centric world view, namely

the Occurrence O gives evidence that Taxon T, determined based on
Identification criteria I, occurred at Location L within GeologicalContext G
during the Event E, based on evidence captured in properties of the
Occurrence and distinguishable by the type of evidence as recorded in
dcterms:type and/or dwc:basisOfRecord.

I don't see anything "wrong" with this formulation, as all of the predicates
appropriately associate subjects and objects.
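The Occurrence-centric pattern above can be sketched as plain (subject, predicate, object) triples in Python. This is an illustration under stated assumptions, not a proposed vocabulary: dwc:basisOfRecord and dcterms:type are existing terms, and their values here are drawn from the Darwin Core basisOfRecord recommendations and the DCMI Type Vocabulary, but every "ex:" predicate and identifier is hypothetical, standing in for the "appropriate set of predicates" that has not been settled.

```python
# Occurrence-centric pattern as (subject, predicate, object) triples.
# Existing terms: dwc:basisOfRecord, dcterms:type.
# Hypothetical: every "ex:" predicate and identifier.
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def occurrence_triples(occ: str, taxon: str, identification: str,
                       location: str, context: Optional[str], event: str,
                       basis: str, dc_type: str) -> List[Triple]:
    """Emit the same pattern for every Occurrence, whatever its evidence."""
    triples = [
        Triple(occ, "ex:evidenceForTaxon", taxon),
        Triple(occ, "ex:determinedBy", identification),
        Triple(occ, "ex:atLocation", location),
        Triple(occ, "ex:duringEvent", event),
        Triple(occ, "dwc:basisOfRecord", basis),
        Triple(occ, "dcterms:type", dc_type),
    ]
    if context is not None:  # GeologicalContext applies only to some evidence
        triples.append(Triple(occ, "ex:withinGeologicalContext", context))
    return triples


graph = (
    # A skull in a collection.
    occurrence_triples("ex:occ/skull-1", "ex:taxon/T1", "ex:ident/I1",
                       "ex:loc/L1", None, "ex:event/E1",
                       "PreservedSpecimen", "PhysicalObject")
    # A camera-trap image of an identifiable individual.
    + occurrence_triples("ex:occ/cameratrap-7", "ex:taxon/T1", "ex:ident/I2",
                         "ex:loc/L2", None, "ex:event/E2",
                         "MachineObservation", "StillImage")
    # A fossil placed by its GeologicalContext (one taxon shown for brevity).
    + occurrence_triples("ex:occ/fossil-3", "ex:taxon/T2", "ex:ident/I3",
                         "ex:loc/L3", "ex:geo/G1", "ex:event/E3",
                         "FossilSpecimen", "PhysicalObject")
)


def describe(graph: List[Triple], occ: str) -> dict:
    """Collect the predicate -> object pairs recorded for one Occurrence."""
    return {t.predicate: t.obj for t in graph if t.subject == occ}


def evidence_for(graph: List[Triple], taxon: str) -> List[str]:
    """List the Occurrences that give evidence for a taxon."""
    return sorted(t.subject for t in graph
                  if t.predicate == "ex:evidenceForTaxon" and t.obj == taxon)
```

Here evidence_for(graph, "ex:taxon/T1") returns ['ex:occ/cameratrap-7', 'ex:occ/skull-1']. Every Occurrence has the same shape; the kind of evidence is read from dwc:basisOfRecord and dcterms:type rather than from the structure of the graph.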

In other words, what is special about the Individual-centric view (or any
other view) other than the way one wants to think about and express the
relationships (predicates) or formulate the questions?
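To make that question concrete, here is a sketch of the banded-bird example from the quoted message written both ways. It is an illustration only: "dwc:Individual" is the class proposed in this thread, not an adopted term, and every "ex:" predicate and identifier is hypothetical.

```python
# The banded-bird example from the quoted message, written two ways.
# "dwc:Individual" is the proposed class, not an adopted term;
# every "ex:" predicate and identifier is hypothetical.
BIRD = "ex:individual/bird-1"
TAXON = "ex:taxon/T"
OCCURRENCES = {"ex:occ/photo", "ex:occ/feather", "ex:occ/dna",
               "ex:occ/band-resighting"}

# Occurrence-centric: each Occurrence states the taxon it is evidence for.
occurrence_centric = {(occ, "ex:evidenceForTaxon", TAXON) for occ in OCCURRENCES}

# Individual-centric: the taxon is attached once, to the organism, and each
# Occurrence points at the organism it documents.
individual_centric = {(BIRD, "rdf:type", "dwc:Individual"),
                      (BIRD, "ex:identifiedAs", TAXON)}
individual_centric |= {(occ, "ex:occurrenceOf", BIRD) for occ in OCCURRENCES}


def taxon_occurrences_oc(graph, taxon):
    """Occurrence-centric query: one hop from Occurrence to taxon."""
    return {s for s, p, o in graph if p == "ex:evidenceForTaxon" and o == taxon}


def taxon_occurrences_ic(graph, taxon):
    """Individual-centric query: taxon -> Individual -> Occurrence."""
    individuals = {s for s, p, o in graph if p == "ex:identifiedAs" and o == taxon}
    return {s for s, p, o in graph if p == "ex:occurrenceOf" and o in individuals}


def same_organism(graph, occ_a, occ_b):
    """True only if the graph explicitly links both Occurrences to one organism."""
    def organisms(occ):
        return {o for s, p, o in graph if s == occ and p == "ex:occurrenceOf"}
    return bool(organisms(occ_a) & organisms(occ_b))
```

Both queries return the same four Occurrences, so the taxon question can be posed either way. The graphs differ in whether the feather, the DNA sample, and the later band resighting are explicitly tied to one bird: same_organism is true for the Individual-centric graph and false for the Occurrence-centric graph as written here, which carries no such link. Whether that link belongs on the Occurrence or on a separate class is the open question in this exchange.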

On Wed, Oct 13, 2010 at 7:07 PM, Steve Baskauf <steve.baskauf at vanderbilt.edu> wrote:

> I was just ready to leave work when I wrote this, and since then I've been
> feeling like I should clarify just what I mean by "wrong" ways of using
> RDF.  I recognize that TDWG encourages flexibility in the ways that
> standards such as DwC are used.  As such, it doesn't usually define "right"
> and "wrong" ways of using the standards.  What I mean by calling some uses
> "wrong" is not intended to discourage the creative use of DwC terms in RDF.
> What I mean is that one must be careful to make sure that RDF statements
> mean what is intended.  Here is an example.  The Dublin Core term
> dcterms:language means "the language of the resource".  On multiple
> occasions, I've seen this term used in RDF as a property of a resource whose
> metadata is written in a certain language.  This is "wrong" because the
> subject of the statement is the resource itself, not the resource's
> metadata.  The need for this kind of clarity is apparent in the case of
> media.  For example, if we are providing metadata in English that describes
> a nature film which has audio in German, the correct statement is that
> [film] dcterms:language "de", NOT [film] dcterms:language "en".  This
> problem is handled appropriately in the MRTG schema by creating the
> (required) term mrtg:metadataLanguage.  The correct statement would be
> [film] mrtg:metadataLanguage "en" .  (I'm using "[film]" in lieu of a URI
> identifier for the film.)  If, however, we were writing RDF to describe the
> metadata itself rather than the film, then it would be appropriate to say
> [film's metadata] dcterms:language "en" .  In straight XML, we might get
> away with semantic sloppiness if the senders and receivers of the XML
> "understand" what the intended subject of the term dcterms:language is.  But
> in RDF, we have to assume that the receiver of the RDF is a "stupid"
> computer which infers only exactly what is said and not what we MEANT to
> say.
>
> I believe that this is a very important point that all parties need to keep
> in mind before we happily march off creating RDF templates for the general
> public to use.  In particular, I have some serious problems with the way
> that people are associating properties with instances of the dwc:Occurrence
> class.  I believe that these "wrong" ways originate with the historical
> roots of Darwin Core as a means to describe specimens.  I will illustrate
> what I mean.  In many cases, a specimen is created by killing an organism
> and gluing it to a piece of paper (if it's a plant) or putting it in a jar
> (if it's an animal).  It is natural to ask the question "what kind of
> species is the specimen?".  We can look at the specimen and make a statement
> like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty
> much makes sense.  However, in the new Darwin Core standard, we have a
> broader category of "things" (a.k.a. resources) that we call Occurrences,
> which include specimens but also observations and probably
> all kinds of things like images, DNA samples, and a whole lot of other
> things.  If we try to apply the same kind of statement to other kinds of
> Occurrences besides specimens, we immediately run into problems.  If we say
> that [digital image] dwc:scientificName "Drosophila melanogaster" we are
> making a nonsensical statement.  The digital image can have properties like
> its photographer, its format, its pixel dimensions, etc. but the image
> itself does not have a scientific name.  The scientific name is a property
> of the thing that was photographed.  It makes even less sense if we are
> talking about observations.  An observation is a situation where somebody
> observes an organism.  The observation can have properties like the
> observer, the location, etc.  However, if we say [observation]
> dwc:scientificName "Drosophila melanogaster", we are saying that the act of
> observing has a scientific name.  That is an incorrect statement.  So the
> general statement [Occurrence] dwc:scientificName "Drosophila melanogaster"
> does not make sense when applied to all possible types of Occurrences.
> Rather, the organism that we are observing is the thing that has a
> scientific name.
>
> In all of the examples above, the correct statement is [individual
> organism] dwc:scientificName "Drosophila melanogaster".  The specimen is an
> occurrence of the individual organism.  The image is an occurrence of the
> individual organism.  The observation is an occurrence of the individual
> organism.  These statements may seem odd because we are used to thinking of
> an Occurrence as an occurrence of the "species", but it isn't really.  The
> image is not an image of the Drosophila species concept, nor is it an image
> of the string "Drosophila melanogaster".  The image is an image of an
> individual fruit fly.  The individual fruit fly is a representative of the
> taxon; the image and the observation are not.
>
> This point becomes more clear if we look at a situation where several types
> of occurrence records are collected from the same individual.  Let's say
> that we capture a bird, photograph it, collect a feather from it, collect a
> DNA sample, band it, and let it go.  Later somebody sees the band and
> reports that as an observation.  How do we connect all of these things?  Do
> we create an identifier for the specimen (the feather) and then say that the
> image and the DNA sample came from it?  That would be wrong.  We could take
> an image of the feather, but that would be a different thing from an image
> of the bird.  We didn't get the DNA sample from the feather, we got it via a
> blood sample from the bird.  The band observation is not an observation of
> the feather, or the image or the DNA sample.  It's an observation of the
> bird, which was never any kind of specimen, living or dead.  The bird is an
> individual organism and that's what we need to call it.  Right now we don't
> have anything in Darwin Core that can be used to rdf:type the bird, which
> is why I proposed Individual as a Darwin Core class.
>
> I could say these things more clearly in RDF, but because many
> members of the audience of this message aren't familiar with RDF/XML, they
> would probably zone out and the point would be lost.  The point is that we
> need to have identifiable classes of "resources" (the technical name for
> "things" like physical artifacts, concepts, and electronic representations)
> for all of the things that we need to describe and inter-relate in the
> Darwin Core world.  Right now, we are missing one of the important pieces
> that we need, which is a class for the Individual.  If we are satisfied with
> creating an RDF model that only works for specimens and one-time
> observations, then we probably don't need Individual as a Darwin Core
> class.  On the other hand, if TDWG and GBIF are really serious about
> creating a system (Darwin Core and RDF based on it) that can handle other
> types of Occurrences like multiple images of live organisms, observations of
> the same organism over time, and multiple types of Occurrences collected
> from the same organism, then this capability should be built into the system
> from the start.  When I got back from the TDWG meeting, I was all excited
> about trying to use Darwin Core Archives with my live plant image
> collection.  However, it quickly became evident that it could not work
> because Occurrences were at the center of the diagram rather than
> Individuals.  So unless something changes, we are already embarking on the
> process of locking out these other Occurrence types.
>
> I hate to sound like a broken record (do we have those any more?), but read
> my paper on this subject.  It explains the rationale better than this email,
> has nice diagrams, and gives RDF examples to illustrate everything
> (https://journals.ku.edu/index.php/jbi/article/view/3664).  If somebody has
> a better idea of how to develop an internally consistent system that can
> handle the problems I've raised here that DOESN'T involve Individuals (i.e.
> other "right"[=semantically accurate] ways to express properties and
> relationships among Identifications, Taxa, diverse types of Occurrences,
> etc.) I'd like to hear what it is.  Or perhaps as Stan has suggested, there
> needs to be a task group that can hash out alternative views.  But let's
> have the discussion before we post models and suggest people use them.
>
> Steve
>
>
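The dcterms:language point in the quoted message can be sketched the same way, as a set of triples queried the way a "stupid" receiver would query it. The terms dcterms:language and mrtg:metadataLanguage come from the message; the "ex:" URIs are hypothetical stand-ins for "[film]" and "[film's metadata]".

```python
# The dcterms:language example from the quoted message.
# The "ex:" URIs are hypothetical stand-ins for "[film]" and "[film's metadata]".
FILM = "ex:film"
FILM_METADATA = "ex:film-metadata"

graph = {
    (FILM, "dcterms:language", "de"),           # the film's audio is German
    (FILM, "mrtg:metadataLanguage", "en"),      # it is described in English
    (FILM_METADATA, "dcterms:language", "en"),  # a statement about the record itself
}

# The "sloppy" version also asserts the metadata language on the film itself.
sloppy = graph | {(FILM, "dcterms:language", "en")}


def objects(g, subject, predicate):
    """Everything asserted for one subject/predicate pair, and nothing more."""
    return {o for s, p, o in g if s == subject and p == predicate}
```

Asking what language the film is in returns {'de'} from graph but {'de', 'en'} from sloppy. The receiver has no way to tell that the "en" was meant to describe the metadata, so it concludes that the film is in English as well.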