[tdwg-content] Occurrences, Organisms, and CollectionObjects: a review

Steve Baskauf steve.baskauf at vanderbilt.edu
Tue Sep 13 17:48:17 CEST 2011


Well, I would respectfully disagree that we are building an ad hoc model 
here.  We are actually at the end of a rather lengthy process of trying 
to develop a consensus of what exactly an Occurrence is. 

Since at least October of 2009, there has been discussion on the 
tdwg-content list about the meaning of Occurrence.  I realize that 
because of the large number of posts, not everyone had the time to keep 
up with that conversation and parallel discussions on the topic of 
Organisms/Individuals, and CollectionObjects/Samples/Tokens.  The 
suggestion was made that someone take the time to summarize these 
discussions for those who didn't have the time to keep up with them as 
they were happening.  I have expended a rather large amount of time to 
do exactly that: you can find a somewhat chronological summary at 
http://code.google.com/p/darwin-sw/wiki/TdwgContentEmailSummary and 
topical summaries at 
http://code.google.com/p/darwin-sw/wiki/ClassOccurrence and 
http://code.google.com/p/darwin-sw/wiki/ClassIndividual , which have links 
to many of the individual posts that were made on those topics. 

What emerged from these discussions was what appeared (to me at least) 
to be a consensus about what an Occurrence was.  I will refer you to the 
http://code.google.com/p/darwin-sw/wiki/ClassOccurrence page for some of 
the definitions that were suggested.  This consensus made the 
distinction between the record that an organism was present at a 
particular time and place, and the evidence (if any) that was used to 
document the Occurrence.  The proposal that John made is a reflection of 
the apparent consensus that came out of that discussion. 
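
To make that distinction concrete, here is a minimal sketch in Python.  
The class and field names below (Organism, CollectionObject, Occurrence, 
evidence, and the example values) are only my illustrative assumptions.  
They are not the proposed Darwin Core term definitions, and the sketch is 
not meant as a formal model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Organism:
    """Hypothetical stand-in for the thing that was present."""
    organism_id: str


@dataclass(frozen=True)
class CollectionObject:
    """Hypothetical stand-in for physical evidence, e.g. a specimen."""
    catalog_number: str


@dataclass
class Occurrence:
    """The record that an organism was at a place at a time.

    The evidence list may be empty: the record is still an Occurrence
    even when nothing documents it.
    """
    organism: Organism
    event_date: str
    locality: str
    evidence: List[CollectionObject] = field(default_factory=list)


tree = Organism("example-tree-1")

# An undocumented observation: an Occurrence with no evidence at all.
sighting = Occurrence(tree, "2011-09-13", "example locality")

# The same kind of record, this time documented by a (made-up) specimen.
collected = Occurrence(
    tree, "2011-09-14", "example locality",
    evidence=[CollectionObject("EXAMPLE-0001")],
)

print(len(sighting.evidence), len(collected.evidence))
```

The only point of the sketch is that the evidence hangs off the 
Occurrence rather than being the Occurrence.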

The Darwin Core standard has a process in place for making additions and 
changes to the terms in its vocabulary.  That process involves 
discussion, consensus-building, and the defining and adoption of new 
terms (or modification of the definitions of existing terms) when they 
are needed by a significant portion of the TDWG community.  That process 
has taken place in this instance and I believe that John is right to 
"call the question" on these new term proposals.  TDWG has a reputation 
as an organization where people talk endlessly and nothing ever really 
gets accomplished.  We have an opportunity here to prove that reputation 
wrong.  If we simply start asking the same questions (which have already 
been discussed ad nauseam) over again without making an actual decision 
on the proposal, then what we are doing really IS a waste of time.  I, 
for one, have no interest in spending any more time on this issue.  So I 
would recommend that people who want to comment about the proposed 
definitions of Occurrence, Organism, and CollectionObject review the 
discussion summaries that I've noted above before restarting 
conversations that have already pretty much been run into the ground. 

I would also respectfully disagree that through these proposals we are 
building a complex model by adding terms for CollectionObject and 
Organism.  The proposals are ONLY for adding terms.  Nothing in those 
proposals models how the new classes are related to any other existing 
classes in Darwin Core in a formal way (e.g. through OWL or RDFS).  
There have been repeated calls for further discussion that would build a 
consensus about a more complex model, possibly built on top of a 
foundation based on Darwin Core classes and property terms.  As a 
consequence, we are attempting to charter a group to discuss more 
complex RDF models that can be used by those who need them (see 
http://code.google.com/p/tdwg-rdf/wiki/CharterOfIG).  As you suggest, 
the core members of the proposed group include representation from the 
Observations community as well as other constituencies within TDWG.  But 
that discussion is really just starting.

With regard to Markus' concern about whether people will be able to 
know whether somebody is talking about a "new-style" Occurrence or an 
"old" Occurrence, I would assert that the "old" Occurrence didn't really 
have a clear meaning.  If you review the summary of the discussion on 
Occurrence, you can see that it was used to mean at least three 
different kinds of "things" by different people.  What John is actually 
doing with his proposal is to add clarity about what an Occurrence is 
where it didn't exist before.  I think that is a good thing.  If by the 
"old" kind of Occurrence people mean that Occurrence is just a fancier 
name for PreservedSpecimen (which I believe is how some people in the 
museum community are thinking of it), then I would say that such a 
characterization is incorrect (based on the apparent consensus) and that 
correcting that view is a really good thing.
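
As a rough illustration of why a provider with "flat" records need not 
worry (the point I made in my earlier post, quoted below), the sketch 
that follows regroups the columns of one flat row by the class each 
column describes.  The helper name split_flat_row, the column-to-class 
assignment, and the example values are my own assumptions for 
illustration.  Nothing in the proposals defines them.

```python
from typing import Dict

# Which class each column is taken to describe.  This assignment is an
# assumption made for the example, not part of any proposal.
COLUMN_CLASS = {
    "dwc:catalogNumber": "CollectionObject",
    "dwc:identifiedBy": "Identification",
    "dwc:recordedBy": "Occurrence",
}


def split_flat_row(row: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Group the columns of one flat row by the class they describe."""
    records: Dict[str, Dict[str, str]] = {}
    for column, value in row.items():
        # Columns without a known class are kept, not silently dropped.
        class_name = COLUMN_CLASS.get(column, "Unassigned")
        records.setdefault(class_name, {})[column] = value
    return records


# One flat row: 1 specimen = 1 CollectionObject = 1 Occurrence.
flat_row = {
    "dwc:catalogNumber": "EXAMPLE-0001",
    "dwc:identifiedBy": "Example Identifier",
    "dwc:recordedBy": "Example Collector",
}

for class_name, fields in split_flat_row(flat_row).items():
    print(class_name, fields)
```

As long as the relationships are 1:1, the regrouping loses nothing and 
the flat row can stay as it is.  The split only matters once a provider 
wants, say, several Identifications for one specimen.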

Steve

Éamonn Ó Tuama (GBIF) wrote:
> It would be good to hear from someone who is familiar with the work going on in the Observations Task Group and could explain how a generic model for observations/measurements (e.g. OBOE) might help sort out these issues. It seems to me that we are trying to build in an ad-hoc manner an increasingly complex model on top of DwC which is really just a glossary of terms. That does not seem like a good approach - but I'm no modeller :-) 
> _Éamonn
>
> -----Original Message-----
> From: Dag Endresen (GBIF) [mailto:dendresen at gbif.org] 
> Sent: 13 September 2011 12:18
> To: "Markus Döring (GBIF)"
> Cc: tdwg-content at lists.tdwg.org; Éamonn Ó Tuama
> Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
>
>  Hi Markus,
>
>  I believe that the discussion here originates from the view that the 
>  "CollectionObject"/"Sample" is a different thing from the "Organism" - 
>  and that there can be a relationship between CollectionObjects/Samples 
>  and Organisms that could be difficult to describe if these things are 
>  identified as the same thing (occurrenceID). Do you think that the 
>  "Occurrence" would be seen as a thing different from the proposed 
>  CollectionObject/Sample and Organism - or as a super-class that would 
>  include CollectionObjects/Samples and Organisms? Would the semantics of 
>  Occurrence change?
>
>  I fully share your view that the Darwin Core Archive (DwC-A) would not 
>  be suited to share the full complex relationship between entities - even 
>  if persistent identifiers would be used. However if we start to describe 
>  and include other things (core types) than only the taxon and 
>  occurrences then perhaps the DwC-A could be a useful way to provide a 
>  simple list of these entities? This could perhaps provide easier 
>  indexing and discovery of these new entities?
>
>  Dag
>
>
>
>  On Tue, 13 Sep 2011 10:03:00 +0200, Markus Döring (GBIF) wrote:
>   
>> I have to say that the change in semantics to the Occurrence class
>> makes me a bit nervous.
>> Can someone try to help fight my fears?
>>
>> DarwinCore has no versioning of namespaces, so there is no way for a
>> consumer to detect if it's an old-style Occurrence or a new one. I am
>> currently parsing various RSS feeds and even though it's a mess having
>> to parse 10 different styles I am glad that at least the designers
>> made sure they all have their own namespace! Also removing or renaming
>> terms might cause serious problems. Would discrete versions of dwc
>> with their own namespace hurt?
>>
>> Another observation relates to dwc archives and their star schema. As
>> an index to data that has been flattened there is no problem with more
>> classes and core row types, but if you want it as a way to transfer
>> complete normalized data it will not work. But that never really was
>> the intention and I simply wanted to stress that fact.
>>
>> Markus
>>
>>
>>
>> On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:
>>
>>     
>>> Richard Pyle wrote:
>>>       
>>>> I'm also wondering if we necessarily need to "break" the traditional view of
>>>> the "Occurrence" class in order to implement Organism and CollectionObject.
>>>> As long as we keep in mind that DwC is a vocabulary of terms focused on
>>>> representing an exchange standard (rather than a full-blown Ontology),
>>>> perhaps Occurrence records can continue to be represented in the traditional
>>>> way as "flat" content, but the Organism and CollectionObject classes allow
>>>> us to present data in a somewhat more "normalized" way in those
>>>> circumstances that call for it (e.g. tracking individuals or groups over
>>>> time [Organism], or managing fossil rocks with multiple taxa
>>>> [CollectionObject] -- to name just two).
>>>>
>>>>         
>>> I've been thinking about this issue of "backward compatibility" with
>>> respect to Occurrences if the CollectionObject/Sample/Token/whatever
>>> class is adopted.  I really don't think it is going to be as big of a
>>> deal as people are making it out to be.
>>>
>>> It seems to me that the main problems arise in two areas: when one wants
>>> to be clear about typing and when one wants to express relationships in
>>> a system where it is possible to do so through semantics (like RDF).  In
>>> that kind of circumstance, it's bad (oh yeh, I forgot - the term is
>>> "naughty") to say something like
>>> resourceA hasOccurrence resourceB
>>> when resourceB isn't actually an Occurrence.  "Wrong" typing also
>>> happens all the time because the classes don't exist (yet) to do the
>>> typing correctly.  As a case in point, in the Morphbank system, I have
>>> multiple images of the same tree.  In that system the tree is typed as a
>>> "specimen".  That is totally wrong because the tree isn't a specimen,
>>> but what else is it going to be typed as?  There isn't (yet) an
>>> appropriate class to put it in.
>>>
>>> Although these two problems (wrong typing and using a term with the
>>> wrong kind of object, which are actually different manifestations of the
>>> same class-based problem) are naughty, realistically very few people are
>>> actually using a system that is "semantic-aware" (e.g. serving and
>>> consuming RDF) so right now making those mistakes doesn't really "break"
>>> anything.  Most data providers are using traditional databases or even
>>> Excel spreadsheets where the DwC terms are just column headings with no
>>> real "meaning" other than what the data managers intend for them to
>>> mean.  So if a manager has a table where each line contains a record for
>>> a specimen and has a column heading for a column entitled
>>> "dwc:catalogNumber", there isn't really anything other than an idea in
>>> the manager's head that the catalogNumber is a property of a specimen or
>>> Occurrence or CollectionObject.  If each line in the database table is
>>> "flat" such that one specimen=one CollectionObject=one Occurrence, all
>>> that is required to make catalogNumber be a property of a
>>> CollectionObject instead of an Occurrence is a different way of thinking
>>> in the manager's mind because there are really no semantics embedded in
>>> the table.  We are already doing this kind of mental gymnastics with
>>> existing classes like dwc:Identification.  If our hypothetical database
>>> manager has a column heading that says "dwc:identifiedBy" in the
>>> specimen table, that is really a property of dwc:Identification, not
>>> dwc:Occurrence, but again that is a distinction that is only going to be
>>> made in the manager's mind.  Making the distinction really only becomes
>>> an issue when the database stops being "flat" for a particular
>>> relationship, e.g. if the database wants to allow multiple
>>> Identifications per specimen record.  Then the database structure must
>>> be changed accordingly to accommodate that "normalization".
>>>
>>> What we have here at the present moment is a situation where data
>>> providers don't have any way to have anything but "flat" records where 1
>>> specimen=1 Occurrence=1 Organism.  By adding the Organism and
>>> CollectionObject classes, we allow people who need or want to have less
>>> "flat" (=more "normalized") databases to have something to call the
>>> entities that are represented by the new tables they create to handle
>>> 1:many relationships instead of 1:1 relationships.  Anybody who only
>>> cares about 1:1 relationships really doesn't need to worry about the
>>> fact that the new class exists, just as people currently don't have to
>>> worry about the Identification class if they only allow one
>>> Identification per specimen in their database.
>>>
>>> So I guess what I'm saying is that if a database manager has a table
>>> labeled Occurrence, they really don't have to freak out if we now tell
>>> them that their table actually should be labeled CollectionObject as
>>> long as there is only one CollectionObject per Occurrence.  They didn't
>>> freak out before when we told them that they should call their table
>>> "Occurrence" instead of "Observation" or "Specimen" in 2009, did they?
>>>
>>> I think what I'm saying here is what Rich was trying to say in the
>>> paragraph I quoted, but I'm not sure.
>>>
>>> Steve
>>>
>>> --
>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>> Vanderbilt University Dept. of Biological Sciences
>>>
>>> postal mail address:
>>> VU Station B 351634
>>> Nashville, TN  37235-1634,  U.S.A.
>>>
>>> delivery address:
>>> 2125 Stevenson Center
>>> 1161 21st Ave., S.
>>> Nashville, TN 37235
>>>
>>> office: 2128 Stevenson Center
>>> phone: (615) 343-4582,  fax: (615) 343-6707
>>> http://bioimages.vanderbilt.edu
>>>
>>>
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content

