Well, this point brings to mind the previous discussions about "collapsing" or "denormalizing" complex models into simpler models.  In those discussions, it was noted that many if not most people "denormalize" out of existence parts of a more complex model when they don't need them in their particular situation.  People who have whole dead organisms in a jar do not care about the distinction between an Organism and a CollectionObject.  However, people who cut five branches from the same tree and send them to five different herbaria do care, as do people who both photograph and collect samples from an organism, people who make observations of the same organism over time, and people who make measurements of a temporarily captured animal at the same time they collect a DNA sample.  I don't consider the latter four examples to be very, very, very few cases.  If it seems like there are very few cases, that's just because TDWG was started by (and is currently run mostly by) people who run museums.  If TDWG wants to broaden the biodiversity informatics tent to include people outside of the museum community, then it is important to create the classes and terms necessary to accommodate those other people.

Simple DwC does not require that users include classes that they don't need in their databases.  If a museum only has specimens that are whole dead organisms, then there is no need for them to have a table in their database for Organism; they only need a table for CollectionObjects and the Organism represented by that CollectionObject can be inferred.  This is no different than flattening the relationships among Location, Event, and Occurrence by having a 1:1:1 relationship among those three classes, which might be done in a Darwin Core Archive.  The fact that some people prefer to "flatten" those three classes into a single database table doesn't mean that other people will not have the need to have several events at a single Location, or several Occurrences at one Event. 
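
As a rough illustration of the inference described above, here is a minimal Python sketch that expands flattened 1:1 specimen rows into explicit Organism and CollectionObject records.  The class names, the field names, and the "#organism" suffix used to derive an organism identifier are all assumptions made for this example; none of them are Darwin Core terms or an agreed convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organism:
    organism_id: str


@dataclass(frozen=True)
class CollectionObject:
    collection_object_id: str
    organism_id: str  # link back to the Organism the object came from


def infer_organisms(flat_rows):
    """Expand flattened rows (1:1 case) into explicit Organism and
    CollectionObject records."""
    organisms, objects = [], []
    for row in flat_rows:
        # Hypothetical convention: derive the organism identifier from the
        # collection object identifier, since the two are 1:1 here.
        org_id = row["collectionObjectID"] + "#organism"
        organisms.append(Organism(org_id))
        objects.append(CollectionObject(row["collectionObjectID"], org_id))
    return organisms, objects


# Made-up identifiers for a museum that stores only whole specimens.
flat_rows = [
    {"collectionObjectID": "herb:0001"},
    {"collectionObjectID": "herb:0002"},
]
organisms, objects = infer_organisms(flat_rows)
```

The museum's own table keeps only the flattened rows; the expansion is something a consumer of the data could do when it needs the fuller model.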

So I guess I would say that I don't agree that there are very, very, very few cases where there are multiple CollectionObjects per Organism.  One of the major points of the BiSciCol initiative is that there needs to be some way to connect diverse collection objects that originate from the same organism but end up scattered among different institutions.  The lack of an Organism class is a big hole in modeling these kinds of situations and I haven't heard of an alternative way to fix that hole.  There is a strong desire among a diverse group of TDWG participants to move forward on more precisely modeling relationships among classes using RDF.  If we do not make the changes John has suggested, it is difficult to see how this will be accomplished without people "making up" classes on the fly to accommodate the more precise modeling that is sought.  That's what Cam and I had to do when we created darwin-sw.
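
To make the linking problem concrete, here is a small Python sketch (not BiSciCol's actual approach, just an illustration) in which a shared organism identifier ties together collection objects held by different institutions.  The record layout, the institution names, and the identifiers are all invented for the example.

```python
from collections import defaultdict

# Hypothetical records: (collectionObjectID, holding institution, organismID)
records = [
    ("A:1", "Herbarium A", "tree-42"),
    ("B:7", "Herbarium B", "tree-42"),
    ("C:3", "Image archive C", "tree-42"),
    ("A:2", "Herbarium A", "tree-43"),
]


def group_by_organism(records):
    """Group collection objects by the organism they originated from."""
    groups = defaultdict(list)
    for obj_id, institution, organism_id in records:
        groups[organism_id].append((obj_id, institution))
    return dict(groups)


linked = group_by_organism(records)
```

Without an Organism class there is no agreed place to hang the shared identifier, so each provider would have to invent its own way of expressing the link.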

Steve

Chuck Miller wrote:
"in the many, many, many cases where there will be a 1:1 relationship between an Organism and a CollectionObject..."

Rich has hit on an important point.  The discussion continues to focus on perfecting logic that is all-encompassing.  But is it the best thing for the community as a whole to implement solutions that are more complex in order to accommodate the very, very, very few cases (using the counter description to Rich's)?

Maybe we should consider keeping the solutions simple (i.e., current DwC) for the many, many, many cases and introduce complex extensions only for the very, very, very few.

Chuck

-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle
Sent: Thursday, September 08, 2011 3:52 AM
To: 'Gregor Hagedorn'; tuco@berkeley.edu
Cc: 'TDWG Content Mailing List'
Subject: Re: [tdwg-content] Occurrences, Organisms,and CollectionObjects: a review

  
I see a problem with the "taxonomically homogeneous" since many taxa are not.  All obligatory mutualistically symbiontic organisms are excluded (you mention symbiont, but the symbiont is the part of a symbiontic relation, e.g. both the algae taxon and fungus taxon each are a symbiont in a lichen).
    

I don't understand the problem.  Isn't this simply two instances of "Organism" (one symbiont and one host)?

Together, they may comprise a single collectionObject (e.g., specimen); but I see no trouble treating obligatory mutualistic symbionts and their host(s) as distinct instances of "Organism".

  
Definition: The information class pertaining to a specific instance or set of instances of a life form or organism (virus, bacteria, symbiontic life forms, individual, colony, group, population).  Sets must reliably be known to be taxonomically homogeneous (including obligatory symbiontic associations).
    

I guess it could be defined that way, but I've come around to Steve's view that "taxonomically homogeneous" implies that in cases where more than one individual is involved (colonies, small groups, populations), all such individuals belong to a single species (independently of whether or not we can identify what that species is).  When more than one species is discovered amongst a multi-individual instance of "Organism", then one would create additional instances of Organism to accommodate the heterogeneous taxa.

All of my original examples of things that I would want to be taxonomically heterogeneous (e.g., a single fossil rock with multiple phyla/kingdoms, or a single rock with multiple phyla of invertebrates attached) can be easily aggregated via a single instance of collectionObject, associated with multiple instances of Organism (one for each species[ish] level taxon).
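
A minimal Python sketch of that aggregation, assuming a made-up record layout: one collectionObject, with one Organism record created per taxon found on it.  The function name, the dictionary keys, the "/orgN" identifier pattern, and the taxon labels are all hypothetical.

```python
from collections import defaultdict


def split_by_taxon(collection_object_id, individuals):
    """Create one Organism record per taxon for a single collection object.

    `individuals` is a list of (label, taxon) pairs.
    """
    by_taxon = defaultdict(list)
    for label, taxon in individuals:
        by_taxon[taxon].append(label)
    # Sort by taxon name so the generated identifiers are deterministic.
    return [
        {
            "organismID": f"{collection_object_id}/org{i}",
            "taxon": taxon,
            "members": members,
            "collectionObjectID": collection_object_id,
        }
        for i, (taxon, members) in enumerate(sorted(by_taxon.items()), start=1)
    ]


# A single rock carrying individuals from two different phyla.
rock = split_by_taxon(
    "rock-001",
    [("a", "Bryozoa sp."), ("b", "Porifera sp."), ("c", "Bryozoa sp.")],
)
```

Each resulting Organism is taxonomically homogeneous, and all of them point back to the same collectionObject.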

I originally thought that both "Organism" and "collectionObject" would be redundant, and that only one was really needed.  But I have now been convinced by Steve (and others) that this would be an unnecessary over-loading of "Organism".  Now that we are contemplating two distinct classes, I have no problem with the more "refined" definition of "Organism".

One concern I do have, however, is in the many, many, many cases where there will be a 1:1 relationship between an Organism and a CollectionObject (i.e., the vast majority of all Museum specimens).  Does that mean that data providers will need to generate two separate IDs (one organismID and one collectionObjectID) to represent all of these specimens?

Aloha,
Rich


_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content

  

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu