[tdwg-content] "Wrong" RDF, was Re: What I learned at the TechnoBioBlitz

Thu Oct 14 11:37:11 CEST 2010

Steve,
just a quick mail while I'm still reading your longer post - it might be outdated already...

It feels like we are trapped in the old debate about simple vs complex models. Darwin Core was not meant to be a full, exact model. It's the "core" of the data we are dealing with and often contains shortcuts, as Rich explained. There are much richer models around, but you will find it hard to exchange data based on them between very heterogeneous databases.

You might be interested in looking at how the EDIT CDM implements the idea of derived specimens/observations:
http://wp5.e-taxonomy.eu/cdm/v22/EARoot/EA8/EA246.png
(diagram taken from complete model at http://wp5.e-taxonomy.eu/cdm/v22/ )

That is actually based on the CDEFD model published by Walter in 1997:
http://www.bgbm.org/CDEFD/CollectionModel/units.htm

Some quick remarks to increase the confusion:
- asserting that several occurrences are talking about the same individual can already be done via dwc:individualID. Establishing that knowledge is rather difficult, I would think, but for some occurrences at least, banding or DNA fingerprints might be a way
- the scientific name shortcut in DwC is most often rejected by people who need to track identification histories
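
As a minimal sketch of the first remark, assuming flat Darwin Core-style records held as Python dicts (the identifiers and the helper name `group_by_individual` are invented for illustration, not taken from any dataset or library), occurrences that share a dwc:individualID could be grouped like this:

```python
from collections import defaultdict

# Hypothetical records; only the term names come from Darwin Core.
occurrences = [
    {"occurrenceID": "occ-1", "individualID": "bird-A", "basisOfRecord": "PreservedSpecimen"},
    {"occurrenceID": "occ-2", "individualID": "bird-A", "basisOfRecord": "HumanObservation"},
    {"occurrenceID": "occ-3", "individualID": None, "basisOfRecord": "HumanObservation"},
]


def group_by_individual(records):
    """Map each individualID to the occurrenceIDs that share it."""
    groups = defaultdict(list)
    for rec in records:
        individual = rec.get("individualID")
        if individual:  # records without an individualID cannot be linked this way
            groups[individual].append(rec["occurrenceID"])
    return dict(groups)


linked = group_by_individual(occurrences)
```

The grouping only reflects what the data provider already asserted; it says nothing about how the shared identity was established (banding, DNA fingerprints, or otherwise).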

Markus

On Oct 14, 2010, at 10:53, Richard Pyle wrote:

>> In many cases, a specimen is created by killing an organism and gluing it
>> to a piece of paper (if it's a plant) or putting it in a jar (if it's an
>> animal).  It is natural to ask the question "what kind of species is the
>> specimen?".  We can look at the specimen and make a statement like
>> [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much
>> makes sense.  However, in the new Darwin Core standard, we have a broader
>> category of "things" (a.k.a. resources) that we call Occurrences which
>> include specimens but which also includes observations and probably all
>> kinds of things like images, DNA samples, and a whole lot of other things.
>> If we try to apply the same kind of statement to other kinds of Occurrences
>> besides specimens we immediately run into problems.  If we say that
>> [digital image] dwc:scientificName "Drosophila melanogaster" we are making
>> a nonsensical statement.  The digital image can have properties like its
>> photographer, its format, its pixel dimensions, etc. but the image itself
>> does not have a scientific name.  The scientific name is a property of the
>> thing that was photographed.  It makes even less sense if we are talking
>> about observations.  An observation is a situation where somebody observes
>> an organism.  The observation can have properties like the observer, the
>> location, etc.  However, if we say [observation] dwc:scientificName
>> "Drosophila melanogaster" we are saying that that act of observing has a
>> scientific name.  That is an incorrect statement.  So the general statement
>> [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make
>> sense when applied to all possible types of Occurrences.  Rather, the
>> organism that we are observing is the thing that has a scientific name.
> 
> OK, I admit that I have not been following this list as closely as I should
> have -- especially during the latter half of 2009.  But I have to
> ask....seriously....is this the level of misunderstanding that still exists
> in our community?
> 
> Perhaps I'm the idiot here, but it has *always* been my understanding that
> the "thing" (I hesitate to use the word "basis") of an Occurrence instance
> is *always* the organism (or set of organisms, or impression of an organism
> in the case of fossils).  If the organisms were captured and preserved in a
> Museum, then we call it a specimen.  If the organisms were only witnessed
> and not captured, we call it an observation.  Everything else (including the
> physical specimen) is just layers of evidence to support the existence and
> taxonomic identification of the organism within the Occurrence.  When
> photons reflected off the outer surface of an organism find their way
> through a lens and onto some mechanism for recording said photons (either a
> human retina and neurons in the brain, or sheet of celluloid, or digital
> image sensor and memory stick), it's still the organism that the photons
> reflected off of, which represents the "thing" of the Occurrence to which
> metadata apply. Same goes for vocalizations transmitted through pressure
> waves in the air onto some recording device (ear/brain, or microphone/tape).
> 
> So while it's certainly true that a media object such as a 35mm slide or
> digital image file does not itself have a scientificName (then again, some
> of my old Kodachromes have enough mold on them that they might....), said
> media objects are *not* the Occurrence itself -- they merely represent
> evidence of the occurrence.  Even a specimen in a jar is not the Occurrence
> itself.  The Occurrence occurred when the specimen was captured (e.g., 400
> feet deep on a coral reef).  A specimen in a jar on a shelf in a Museum is
> no longer the "Occurrence"; it is the evidence of the Occurrence.
> 
> When I assign a GUID to an Occurrence record that lacks a voucher (i.e., an
> "Observation"), I'm certainly not trying to identify the act of observation;
> I'm identifying the organism that was observed, at the time and place that
> it was observed.
> 
> For what it's worth, if I only have a still or video image of an organism
> (e.g., http://www.youtube.com/watch?v=GVTd11q3Ppc; taken by Rob Whitton, who
> some of you met at TDWG this year), and didn't collect the specimen, I
> create an Observation record, and link the image to it as associatedMedia.
> I would never assign a taxon name to the video clip -- only to the "content
> item" of the video that represents an organism, serving as the basis of an
> Occurrence record.
> 	
>> The specimen is an occurrence of the individual organism.  
>> The image is an occurrence of the individual organism.  
>> The observation is an occurrence of the individual organism.  
> 
> I would say in all three cases that the presence of an organism at a place
> and time was the Occurrence.  Specimens, images, and reported observations
> are merely the evidence that the occurrence existed (and to varying degrees,
> can also allow for subsequent interpretations of taxonomic identification).
> 
>> These statements may seem odd because we are used to 
>> thinking of an Occurrence being an occurrence of the 
>> "species" but it's not really.  
> 
> I completely agree.  The occurrence was the organism at a place and time.
> The "species" is merely the taxon concept that someone identified the
> organism as belonging to.  The scientificName is merely the label that
> someone applied to the taxon concept.  In other words, the scientificName is
> really a property of the Taxon Concept, and the Taxon Concept is the subject
> of an identification event, and the identification event was applied to the
> organism, which itself represents the basis of an Occurrence.  But very few
> people go to the trouble of creating that full chain of relationships, so as
> a short-hand, the scientificName is often treated as a direct property of
> the occurrence (collected or observed organism).  I think this short-hand is
> perfectly fine in the context of DwC, but only as long as people understand
> the implied chain of linked entities.  If we start to forget what's really
> going on, then we run into trouble. 
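
The implied chain described above can be sketched in code. This is only an illustration under stated assumptions: the class and attribute names are hypothetical, not Darwin Core terms, and the organism is folded into the Occurrence for brevity. The `scientific_name` property plays the role of the DwC short-hand by walking the chain rather than storing the name on the occurrence itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaxonConcept:
    scientific_name: str  # the label someone applied to the concept


@dataclass
class Identification:
    taxon_concept: TaxonConcept
    identified_by: str


@dataclass
class Occurrence:
    occurrence_id: str
    identifications: List[Identification] = field(default_factory=list)

    @property
    def scientific_name(self) -> Optional[str]:
        """Short-hand: resolve the name through the latest identification."""
        if not self.identifications:
            return None
        return self.identifications[-1].taxon_concept.scientific_name


occ = Occurrence("occ-1")
occ.identifications.append(
    Identification(TaxonConcept("Drosophila sp."), identified_by="collector")
)
occ.identifications.append(
    Identification(TaxonConcept("Drosophila melanogaster"), identified_by="specialist")
)
```

Keeping the identifications as a list preserves the identification history, while the property still gives the one-name view that most exchanges expect.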
> 
> Which, I guess, was the whole point of Steve's post.
> 
> What concerns me, though, is that we're not (yet?) already beyond this.
> 
>> This point becomes more clear if we look at a situation where several 
>> types of occurrence records are collected from the same individual.  
>> Let's say that we capture a bird, photograph it, collect a feather from it,
>> collect a DNA sample and band it and let it go.  Later somebody sees the 
>> band and reports that as an observation.  
>> How do we connect all of these things?  
> 
> Two Occurrences:  The first one when it was captured, photographed, and
> relieved of a feather. The second when it was observed at a later date.
> 
>> Do we create an identifier for the specimen (the feather) 
>> and then say that the image and the DNA sample came from it?  
> 
> We create an identifier for the first Occurrence, capture the
> specimen-relevant metadata of the preserved feather, and track the DNA
> sample via associatedSequences.
> 
>> That would be wrong.  We could take an image of the feather, 
>> but that would be a different thing from an image of the bird.  
> 
> It's certainly different from an image of the whole Bird, but that doesn't
> preclude us from including both bird and feather images among
> associatedMedia for the first Occurrence.
> 
>> We didn't get the DNA sample from the feather, we got it 
>> via a blood sample from the bird.  
> 
> I don't see that as a problem, because the feather is only the evidence of
> the bird at the place and time (i.e., the first Occurrence). Thus, the
> sequence can still be included as part of the associatedSequences for the
> first Occurrence.
> 
>> The band observation is not an observation of the feather, 
>> or the image or the DNA sample.  It's an observation of 
>> the bird which was never any kind of specimen living or dead.  
>> The bird is an individual organism and that's what we need to call it.  
> 
> Agreed -- it forms the basis for the second Occurrence record (later date).
> The two Occurrence records can be cross referenced, either via a shared
> individualID, or via associatedOccurrences.
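
A rough sketch of the bird example as two Occurrence records follows. The record values, file names, and the helper `cross_referenced` are invented for illustration, and the " | " list separator is an assumption of this sketch. It shows both cross-referencing routes mentioned above: a shared individualID and an associatedOccurrences entry.

```python
SEP = " | "  # list separator assumed for this sketch

# First Occurrence: capture, photographs, feather voucher, DNA sequence.
capture = {
    "occurrenceID": "occ-capture",
    "individualID": "bird-42",
    "eventDate": "2010-05-01",
    "associatedMedia": SEP.join(["bird.jpg", "feather.jpg"]),
    "associatedSequences": "seq-placeholder",
}

# Second Occurrence: the later sighting of the band.
resighting = {
    "occurrenceID": "occ-resighting",
    "individualID": "bird-42",
    "eventDate": "2010-09-15",
    "associatedOccurrences": "occ-capture",
}


def cross_referenced(a, b):
    """True if two records share an individualID or cite each other."""
    shared = a.get("individualID") is not None and a.get("individualID") == b.get("individualID")

    def cites(x, y):
        listed = x.get("associatedOccurrences", "")
        return y["occurrenceID"] in [s.strip() for s in listed.split("|") if s.strip()]

    return shared or cites(a, b) or cites(b, a)
```

Both bird and feather images sit in associatedMedia of the first record, and the sequence is attached to that same record regardless of which tissue it came from.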
> 
>> Right now we don't have anything in Darwin Core that can 
>> be used to rdfs:type the bird, which is why I proposed Individual 
>> as a Darwin Core class.  
> 
> As someone else alluded to earlier in this thread, there are near-infinite
> ways that we can slice & cluster biodiversity data. I think there are some
> cases where "individual" makes a lot of sense as a class (banded birds,
> managed organisms in zoos and curated gardens, whale and shark observation
> datasets, plant monitoring projects, etc.). But I think the notion of
> "Occurrence" makes more sense at this point in biodiversity informatics
> history, because the vast majority of datasets can be organized in this way
> relatively painlessly, and because the majority of questions being asked of
> these data revolve around presence of organisms identified to taxon concepts
> occurring at place and time.
> 	
>> I could say these things more clearly in RDF, but since 
>> many members of the audience of this message 
>> aren't familiar with RDF/XML they would probably zone 
>> out and the point would be lost.  
> 
> Myself among them.  Thank you for presenting it in the less-efficient
> English Prose form.
> 
>> The point is that we need to have identifiable classes of "resources" 
>> (the technical name for "things" like physical artifacts, concepts, 
>> and electronic representations) for all of the things that we
>> need to describe and inter-relate in the Darwin Core world.  
>> Right now, we are missing one of the important pieces that we need, 
>> which is a class for the Individual.  If we are satisfied with creating 
>> an RDF model that only works for specimens and one-time observations, 
>> then we probably don't need Individual as a Darwin Core class.  On the 
>> other hand, if TDWG and GBIF are really serious about creating a 
>> system (Darwin Core and RDF based on it) that can handle other types 
>> of Occurrences like multiple images of live organisms, observations 
>> of the same organism over time, and multiple types of Occurrences 
>> collected from the same organism, then this capability should be built 
>> into the system from the start.  When I got back from the TDWG meeting, 
>> I was all excited about trying to use Darwin Core Archives with my 
>> live plant image collection.  However, it quickly became evident 
>> that it could not work because Occurrences were at the center of the 
>> diagram rather than Individuals.  So unless something changes, we 
>> are already embarking on the process of locking out these other 
>> Occurrence types.
> 
> Well...I certainly agree with you that we need *clear* documentation on what
> these classes are intended to represent.  I had *thought* it was clear that
> an Occurrence was as I have outlined above.  But like I said, I'm perfectly
> willing to accept that I'm the idiot in this case, and am completely out of
> phase with the rest of the community.
> 
> As to whether or not we need to define a class for Individual, I'm not so
> sure that's entirely necessary.  I guess DwC is already primed for it
> (http://rs.tdwg.org/dwc/terms/index.htm#individualID) -- but I'm not sure
> what properties would apply to such a class that are not already covered in
> DwC.  Probably the next iteration of DwC would move some of the
> properties of the Occurrence class (catalogNumber, individualCount,
> preparations, disposition, associatedSequences, previousIdentifications)
> over to the Individual Class, at which point the Occurrence becomes the
> intersection of an Individual and an Event.
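
The idea of an Occurrence as the intersection of an Individual and an Event might look like the sketch below. It is purely hypothetical: no such Individual class exists in Darwin Core as discussed here, and the choice of which properties move to it (only catalogNumber and preparations are shown) is an assumption taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    individual_id: str
    catalog_number: str = ""   # property moved over from Occurrence
    preparations: str = ""     # property moved over from Occurrence


@dataclass(frozen=True)
class Event:
    event_date: str
    locality: str


@dataclass(frozen=True)
class Occurrence:
    """An Individual present at an Event (place and time)."""
    individual: Individual
    event: Event


bird = Individual("bird-42", catalog_number="CAT-0001", preparations="feather")
capture = Occurrence(bird, Event("2010-05-01", "banding station"))
resighting = Occurrence(bird, Event("2010-09-15", "shoreline"))
```

Under this layout the two Occurrences differ only in their Event, and anything known about the organism is stated once on the Individual.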
> 
> But let me ask: how would you scope "Individual"? (see my previous rants on
> this list in recent days)  Would it be restricted to a particular individual
> organism? Or, would it be extended to include specified groups of organisms
> (as dwc:individualID already does)? What about populations?  Taxon Concepts?
> 
>> I hate to sound like a broken record (do we have those any more?),
>> but read my paper on this subject.  
> 
> I've gotten through the first few pages, and intend to finish soon.  But
> it's much more fun to write emails about this stuff..... :-)
> 
> Aloha,
> Rich
> 
> Richard L. Pyle, PhD
> Database Coordinator for Natural Sciences
> Associate Zoologist in Ichthyology
> Dive Safety Officer
> Department of Natural Sciences, Bishop Museum
> 1525 Bernice St., Honolulu, HI 96817
> Ph: (808)848-4115, Fax: (808)847-8252
> email: deepreef at bishopmuseum.org
> http://hbs.bishopmuseum.org/staff/pylerichard.html
> 
> 
> 