[tdwg-content] "Wrong" RDF, was Re: What I learned at the TechnoBioBlitz

Thu Oct 14 16:22:44 CEST 2010

On Oct 14, 2010, at 10:05 AM, John Wieczorek wrote:

> What if we try a slightly different world view from the one you  
> propose centered on the Individual? Namely, let the Occurrence stand  
> as "evidence that a taxon occurred at a place and time." That is to  
> say, we may or may not care about the concept of an individual in  
> our thinking and our data capture. In this view, the Occurrence  
> remains the central concept, and the rest of the data highlights the  
> evidence. Hence, a skull in a collection (and the information  
> gathered about the collection event) is the evidence that a taxon  
> occurred at a place and time.  Similarly, a digital image of an  
> identifiable individual from a camera trap is the evidence that a  
> taxon occurred at a place and time. A fossil having myriad  
> individuals is evidence that taxa occurred at a place and time based  
> on a GeologicalContext.

If users try to pack a lot of context-dependent significance and  
meaning into their annotations (what the user "cares about" in the  
example), and present a fundamental observation only through layers of  
inference, this makes it more difficult to re-use or re-purpose the  
results, because the ultimate consumer of the information may not  
share the same motivations and perspectives.

Arlin

> In plain English, which we could express as RDF with an appropriate  
> set of predicates, we would always have the same pattern to describe  
> Occurrences from the Occurrence-centric world view, namely
>
> the Occurrence O gives evidence that Taxon T determined based on  
> Identification criteria I occurred at Location L within  
> GeologicalContext G during the Event E based on evidence captured in  
> properties of the Occurrence and distinguishable in the type of  
> evidence as recorded in the dcterms:type and/or the dwc:basisOfRecord.
>
> I don't see anything "wrong" with this formulation, as all of the  
> predicates appropriately associate subjects and objects.
>
> In other words, what is special about the Individual-centric view  
> (or any other view) except the way one wants to think about and  
> express the relationships (predicates) or formulates the questions?
>
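The Occurrence-centric pattern quoted above can be sketched in Python by treating each RDF statement as a plain (subject, predicate, object) tuple. This is only an illustration under stated assumptions: every "ex:" identifier and predicate is a hypothetical placeholder, and the basisOfRecord value is just an example; only dwc:basisOfRecord is a term named in the message.

```python
# Each RDF statement is modelled as a (subject, predicate, object) tuple.
# All "ex:" names are hypothetical placeholders, not Darwin Core terms.
OCCURRENCE = "ex:occurrenceO"

triples = [
    (OCCURRENCE, "ex:evidenceForTaxon", "ex:taxonT"),
    (OCCURRENCE, "ex:identifiedBy", "ex:identificationI"),
    (OCCURRENCE, "ex:atLocation", "ex:locationL"),
    (OCCURRENCE, "ex:inGeologicalContext", "ex:geologicalContextG"),
    (OCCURRENCE, "ex:duringEvent", "ex:eventE"),
    (OCCURRENCE, "dwc:basisOfRecord", "PreservedSpecimen"),  # example value
]


def objects(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects():
    """Return the set of resources that appear as the subject of a statement."""
    return {s for s, _, _ in triples}
```

In this view every statement hangs off the Occurrence, so subjects() contains a single resource. Whether that is adequate is exactly what the rest of the thread debates.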
> On Wed, Oct 13, 2010 at 7:07 PM, Steve Baskauf <steve.baskauf at vanderbilt.edu> wrote:
> I was just ready to leave work when I wrote this and since then I'm  
> feeling like I should clarify just what I mean by "wrong" ways of  
> using RDF.  I recognize that TDWG encourages flexibility in the ways  
> that standards such as DwC are used.  As such, it doesn't usually  
> define "right" and "wrong" ways of using the standards.  What I mean  
> by calling some uses "wrong" is not intended to discourage the  
> creative use of DwC terms in RDF.  What I mean is that one must be  
> careful to make sure that RDF statements mean what is intended.   
> Here is an example.  The Dublin Core term dcterms:language means  
> "the language of the resource".  On multiple occasions, I've seen  
> this term used in RDF as a property of a resource whose metadata is  
> written in a certain language.  This is "wrong" because the subject  
> of the statement is the resource itself, not the resource's  
> metadata.  The need for this kind of clarity is apparent in the case  
> of media.  For example, if we are providing metadata in English that  
> describes a nature film which has audio in German, the correct  
> statement is that [film] dcterms:language "de", NOT [film]  
> dcterms:language "en".  This problem is handled appropriately in the  
> MRTG schema by creating the (required) term mrtg:metadataLanguage.    
> The correct statement would be [film] mrtg:metadataLanguage "en" .   
> (I'm using "[film]" in lieu of a URI identifier for the film.)  If,  
> however, we were writing RDF to describe the metadata itself rather  
> than the film, then it would be appropriate to say [film's metadata]  
> dcterms:language "en" .  In straight XML, we might get away with  
> semantic sloppiness if the senders and receivers of the XML  
> "understand" what the intended subject is of the term  
> dcterms:language.  But in RDF, we have to assume that the receiver  
> of the RDF is a "stupid" computer which only infers exactly what is  
> said and not what we MEANT to say.
>
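The dcterms:language example above can be made concrete with a small Python sketch, again using (subject, predicate, object) tuples in place of real RDF. The two "ex:" identifiers stand in for the URIs of [film] and [film's metadata] and are assumptions, not identifiers from the message.

```python
# [film] and [film's metadata] are two distinct resources.
# Both identifiers below are made-up stand-ins for real URIs.
FILM = "ex:film"
FILM_METADATA = "ex:filmMetadata"

triples = {
    (FILM, "dcterms:language", "de"),           # the film's audio is German
    (FILM, "mrtg:metadataLanguage", "en"),      # its metadata is written in English
    (FILM_METADATA, "dcterms:language", "en"),  # a statement about the metadata itself
}


def language_of(resource):
    """Return what a literal-minded consumer concludes about a resource's language."""
    return sorted(
        o for s, p, o in triples if s == resource and p == "dcterms:language"
    )
```

A consumer calling language_of(FILM) sees only "de". Had the producer written (FILM, "dcterms:language", "en") to describe the metadata, the same query would have reported the film as English, which is the "wrong" usage described above.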
> I believe that this is a very important point that all parties need  
> to keep in mind before we happily march off creating RDF templates  
> for the general public to use.  In particular, I have some serious  
> problems with the way that people are associating properties with  
> instances of the dwc:Occurrence class.  I believe that these "wrong"  
> ways originate with the historical roots of Darwin Core as a means  
> to describe specimens.  I will illustrate what I mean.  In many  
> cases, a specimen is created by killing an organism and gluing it to  
> a piece of paper (if it's a plant) or putting it in a jar (if it's  
> an animal).  It is natural to ask the question "what kind of species  
> is the specimen?".  We can look at the specimen and make a statement  
> like [specimen] dwc:scientificName "Drosophila melanogaster" and it  
> pretty much makes sense.  However, in the new Darwin Core standard,  
> we have a broader category of "things" (a.k.a. resources) that we  
> call Occurrences, which include specimens but also  
> observations and probably all kinds of things like images, DNA  
> samples, and a whole lot of other things.  If we try to apply the  
> same kind of statement to other kinds of Occurrences besides  
> specimens we immediately run into problems.  If we say that [digital  
> image] dwc:scientificName "Drosophila melanogaster" we are making a  
> nonsensical statement.  The digital image can have properties like  
> its photographer, its format, its pixel dimensions, etc. but the  
> image itself does not have a scientific name.  The scientific name  
> is a property of the thing that was photographed.  It makes even  
> less sense if we are talking about observations.  An observation is  
> a situation where somebody observes an organism.  The observation  
> can have properties like the observer, the location, etc.  However,  
> if we say [observation] dwc:scientificName "Drosophila melanogaster"  
> we are saying that that act of observing has a scientific name.   
> That is an incorrect statement.  So the general statement  
> [Occurrence] dwc:scientificName "Drosophila melanogaster" does not  
> make sense when applied to all possible types of Occurrences.   
> Rather, the organism that we are observing is the thing that has a  
> scientific name.
>
> In all of the examples above, the correct statement is [individual  
> organism] dwc:scientificName "Drosophila melanogaster".  The  
> specimen is an occurrence of the individual organism.  The image is  
> an occurrence of the individual organism.  The observation is an  
> occurrence of the individual organism.  These statements may seem  
> odd because we are used to thinking of an Occurrence being an  
> occurrence of the "species" but it's not really.  The image is not  
> an image of the Drosophila species concept nor is it an image of the  
> string "Drosophila melanogaster".  The image is an image of an  
> individual fruit fly.  The individual fruit fly is a representative  
> of the taxon, the image and the observation are not.
>
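A minimal Python sketch of the Individual-centric statements above, with statements as (subject, predicate, object) tuples. The linking predicate "ex:occurrenceOf" and all "ex:" identifiers are hypothetical; the message does not name a Darwin Core term for this link.

```python
# The scientific name is asserted once, on the individual organism.
# "ex:occurrenceOf" is a hypothetical predicate used only for illustration.
INDIVIDUAL = "ex:fruitFly1"

triples = [
    (INDIVIDUAL, "dwc:scientificName", "Drosophila melanogaster"),
    ("ex:specimen1", "ex:occurrenceOf", INDIVIDUAL),
    ("ex:image1", "ex:occurrenceOf", INDIVIDUAL),
    ("ex:observation1", "ex:occurrenceOf", INDIVIDUAL),
]


def direct_names(resource):
    """Scientific names asserted directly on the resource itself."""
    return [o for s, p, o in triples if s == resource and p == "dwc:scientificName"]


def names_via_individual(occurrence):
    """Scientific names reached by following the occurrence to its individual."""
    individuals = [
        o for s, p, o in triples if s == occurrence and p == "ex:occurrenceOf"
    ]
    return [name for ind in individuals for name in direct_names(ind)]
```

Here the image carries no scientific name of its own, yet a query that follows the link to the individual still finds "Drosophila melanogaster".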
> This point becomes more clear if we look at a situation where  
> several types of occurrence records are collected from the same  
> individual.  Let's say that we capture a bird, photograph it,  
> collect a feather from it, collect a DNA sample and band it and let  
> it go.  Later somebody sees the band and reports that as an  
> observation.  How do we connect all of these things?  Do we create  
> an identifier for the specimen (the feather) and then say that the  
> image and the DNA sample came from it?  That would be wrong.  We  
> could take an image of the feather, but that would be a different  
> thing from an image of the bird.  We didn't get the DNA sample from  
> the feather, we got it via a blood sample from the bird.  The band  
> observation is not an observation of the feather, or the image or  
> the DNA sample.  It's an observation of the bird which was never any  
> kind of specimen living or dead.  The bird is an individual organism  
> and that's what we need to call it.  Right now we don't have  
> anything in Darwin Core that can be used to rdf:type the bird,  
> which is why I proposed Individual as a Darwin Core class.
>
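The bird example above can be sketched the same way. The class "ex:Individual" stands in for the proposed (not yet existing) Darwin Core class, and the predicates "ex:derivedFrom", "ex:depicts" and "ex:observes" are hypothetical names chosen for this illustration.

```python
# Every record points at the bird, not at another record.
# "ex:Individual" is a placeholder for the proposed class; all predicates are made up.
BIRD = "ex:bird1"

triples = [
    (BIRD, "rdf:type", "ex:Individual"),
    ("ex:feather1", "ex:derivedFrom", BIRD),
    ("ex:birdImage1", "ex:depicts", BIRD),
    ("ex:dnaSample1", "ex:derivedFrom", BIRD),
    ("ex:bandSighting1", "ex:observes", BIRD),
    ("ex:featherImage1", "ex:depicts", "ex:feather1"),  # an image of the feather
]


def records_about(resource):
    """Return, sorted, every resource that makes a statement pointing at `resource`."""
    return sorted(s for s, _, o in triples if o == resource)
```

With the bird as the hub, records_about(BIRD) gathers the feather, the image, the DNA sample and the band sighting, while the image of the feather stays attached to the feather rather than to the bird.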
> I could say these things more clearly in RDF, but since many  
> members of the audience of this message aren't familiar with RDF/XML  
> they would probably zone out and the point would be lost.  The point  
> is that we need to have identifiable classes of "resources" (the  
> technical name for "things" like physical artifacts, concepts, and  
> electronic representations) for all of the things that we need  
> to describe and inter-relate in the Darwin Core world.  Right now,  
> we are missing one of the important pieces that we need, which is a  
> class for the Individual.  If we are satisfied with creating an RDF  
> model that only works for specimens and one-time observations, then  
> we probably don't need Individual as a Darwin Core class.  On the  
> other hand, if TDWG and GBIF are really serious about creating a  
> system (Darwin Core and RDF based on it) that can handle other types  
> of Occurrences like multiple images of live organisms, observations  
> of the same organism over time, and multiple types of Occurrences  
> collected from the same organism, then this capability should be  
> built into the system from the start.  When I got back from the TDWG  
> meeting, I was all excited about trying to use Darwin Core Archives  
> with my live plant image collection.  However, it quickly became  
> evident that it could not work because Occurrences were at the  
> center of the diagram rather than Individuals.  So unless something  
> changes, we are already embarking on the process of locking out  
> these other Occurrence types.
>
> I hate to sound like a broken record (do we have those any more?),  
> but read my paper on this subject.  It explains the rationale better  
> than this email, has nice diagrams, and gives RDF examples to  
> illustrate everything (https://journals.ku.edu/index.php/jbi/article/view/3664).  
> If somebody has a better idea of how to develop an internally  
> consistent system that can handle the problems I've raised here that  
> DOESN'T involve Individuals (i.e. other "right"[=semantically  
> accurate] ways to express properties and relationships among  
> Identifications, Taxa, diverse types of Occurrences, etc.) I'd like  
> to hear what it is.  Or perhaps as Stan has suggested, there needs  
> to be a task group that can hash out alternative views.  But let's  
> have the discussion before we post models and suggest people use them.
>
> Steve
>

-------
Arlin Stoltzfus (arlin at umd.edu)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org
