[tdwg-content] Occurrences, Organisms, and CollectionObjects: a review

Steve Baskauf steve.baskauf at vanderbilt.edu
Thu Sep 8 17:22:24 CEST 2011

Well, this point brings to mind the previous discussions about 
"collapsing" or "denormalizing" complex models into simpler models.  In 
those discussions, it was noted that many if not most people 
"denormalize" out of existence parts of a more complex model when they 
don't need them in their particular situation.  People who have whole 
dead organisms in a jar do not care about the distinction between an 
Organism and a CollectionObject.  However, people who cut five branches 
from the same tree and send them to five different herbaria do care, as 
do people who both photograph and collect samples from an organism, 
people who make observations of the same organism over time, and people 
who make measurements of a temporarily captured animal at the same time 
they collect a DNA sample.  I don't consider the latter four examples to 
be very, very, very few cases.  If it seems like there are very few 
cases, that's just because TDWG was started by (and is currently run 
mostly by) people who run museums.  If TDWG wants to broaden the 
biodiversity informatics tent to include people outside of the museum 
community, then it is important to create the classes and terms 
necessary to accommodate those other people. 

Simple DwC does not require that users include classes that they don't 
need in their databases.  If a museum only has specimens that are whole 
dead organisms, then there is no need for them to have a table in their 
database for Organism; they only need a table for CollectionObjects and 
the Organism represented by that CollectionObject can be inferred.  This 
is no different than flattening the relationships among Location, Event, 
and Occurrence by having a 1:1:1 relationship among those three classes, 
which might be done in a Darwin Core Archive.  The fact that some people 
prefer to "flatten" those three classes into a single database table 
doesn't mean that other people will not have the need to have several 
events at a single Location, or several Occurrences at one Event. 
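
As a rough illustration of the point above, here is a minimal Python sketch of how an Organism can be inferred from a flat specimen table when the relationship is 1:1.  The class names, the field names, and the "#organism" ID scheme are assumptions made up for this example; they are not part of Darwin Core or of any existing database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organism:
    organism_id: str
    scientific_name: str


@dataclass(frozen=True)
class CollectionObject:
    collection_object_id: str
    organism_id: str


def infer_normalized(flat_rows):
    """Expand flat specimen rows into (organisms, collection_objects).

    Only valid under the 1:1 assumption: each row is taken to imply
    exactly one Organism, whose ID is derived from the row's own ID.
    """
    organisms, objects = [], []
    for row in flat_rows:
        # Hypothetical ID scheme, invented for this sketch.
        organism_id = row["collectionObjectID"] + "#organism"
        organisms.append(Organism(organism_id, row["scientificName"]))
        objects.append(CollectionObject(row["collectionObjectID"], organism_id))
    return organisms, objects


# A museum that keeps only a CollectionObject table (whole organisms in jars).
flat_rows = [
    {"collectionObjectID": "jar-001", "scientificName": "Rana clamitans"},
    {"collectionObjectID": "jar-002", "scientificName": "Rana clamitans"},
]
organisms, objects = infer_normalized(flat_rows)
```

The museum never has to store the Organism table; it can be generated on demand, which is the sense in which the class is "denormalized out of existence" without being lost.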

So I guess I would say that I don't agree that there are very, very, 
very few cases where there are multiple CollectionObjects per Organism.  
One of the major points of the BiSciCol initiative is that there needs 
to be some way for connecting diverse collection objects that originate 
from the same organism but end up being scattered around in different 
institutions.  The lack of an Organism class is a big hole in modeling 
these kinds of situations and I haven't heard of an alternative way to 
fix that hole.  There is a strong desire among a diverse group of TDWG 
participants to move forward on more precisely modeling relationships 
among classes using RDF.  If we do not make the changes John has 
suggested, it is difficult to see how this will be accomplished without 
people "making up" classes on the fly to accommodate the more precise 
modeling that is sought.  That's what Cam and I had to do when we 
created darwin-sw.
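
To make the "scattered collection objects" case concrete, here is a small Python sketch, under the assumption that each record carries some shared organism identifier.  The record layout, the identifier values, and the institution names are all hypothetical; the only point is that without a shared Organism key there is nothing to group on.

```python
from collections import defaultdict

# Hypothetical records: branches cut from one tree and sent to different
# herbaria, plus an unrelated specimen held by one of them.
records = [
    {"collectionObjectID": "herbA:1001", "organismID": "tree-42", "institution": "Herbarium A"},
    {"collectionObjectID": "herbB:77", "organismID": "tree-42", "institution": "Herbarium B"},
    {"collectionObjectID": "herbC:5", "organismID": "tree-42", "institution": "Herbarium C"},
    {"collectionObjectID": "herbA:1002", "organismID": "tree-43", "institution": "Herbarium A"},
]


def group_by_organism(rows):
    """Map each organismID to the CollectionObject IDs derived from it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["organismID"]].append(row["collectionObjectID"])
    return dict(groups)


linked = group_by_organism(records)
```

Here `linked["tree-42"]` lists the three objects held by three different institutions, which is the kind of link that cannot be expressed if Organism is collapsed into CollectionObject.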


Chuck Miller wrote:
> "in the many, many, many cases where there will be a 1:1 relationship between an Organism and a CollectionObject..."
> Rich has hit on an important point.  The discussion continues to focus on perfecting logic that is all encompassing.  But is it the best thing to do for the community as a whole to implement solutions that are more complex in order to accommodate the very, very, very few cases (using the counter description to Rich's)?
> Maybe we should consider keeping the solutions simple (i.e., current DwC) for the many, many, many cases and introduce complex extensions only for the very, very, very few.
> Chuck
> -----Original Message-----
> From: tdwg-content-bounces at lists.tdwg.org [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Richard Pyle
> Sent: Thursday, September 08, 2011 3:52 AM
> To: 'Gregor Hagedorn'; tuco at berkeley.edu
> Cc: 'TDWG Content Mailing List'
> Subject: Re: [tdwg-content] Occurrences, Organisms,and CollectionObjects: a review
>> I see a problem with the "taxonomically homogeneous" since many taxa 
>> are not.
>> All obligatory mutualistically symbiontic organisms are excluded (you mention
>> symbiont, but the symbiont is the part of a symbiontic relation, e.g. both the
>> algae taxon and fungus taxon each are a symbiont in a lichen.
> I don't understand the problem.  Isn't this simply two instances of "Organism" (one symbiont and one host)?
> Together, they may comprise a single collectionObject (e.g., specimen); but I see no trouble treating obligatory mutualistic symbionts and their host(s) as distinct instances of "Organism".
>> Definition: The information class pertaining to a specific instance or set of
>> instances of a life form or organism (virus, bacteria, symbiontic life forms,
>> individual, colony, group, population). Sets must reliably be known to be
>> taxonomically homogeneous (including obligatory symbiontic associations).
> I guess it could be defined that way, but I've come around to Steve's view that "taxonomically homogeneous" implies that in cases where more than one individual is involved (colonies, small groups, populations), all such individuals belong to a single species (independently of whether or not we can identify what that species is).  When more than one species is discovered amongst a multi-individual instance of "Organism", then one would create additional instances of Organism to accommodate the heterogeneous taxa.
> All of my original examples of things that I would want to be taxonomically heterogeneous (e.g., a single fossil rock with multiple phyla/kingdoms, or a single rock with multiple phyla of invertebrates attached) can be easily aggregated via a single instance of collectionObject, associated with multiple instances of Organism (one for each species[ish] level taxon).
> I originally thought that both "Organism" and "collectionObject" would be redundant, and that only one was really needed.  But I have now been convinced by Steve (and others) that this would be an unnecessary over-loading of "Organism".  Now that we are contemplating two distinct classes, I have no problem with the more "refined" definition of "Organism".
> One concern I do have, however, is in the many, many, many cases where there will be a 1:1 relationship between an Organism and a CollectionObject (i.e., the vast majority of all Museum specimens).  Does that mean that data providers will need to generate two separate Ids (one organismID and one collectionObjectID) to represent all of these specimens?
> Aloha,
> Rich
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content

Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
