[tdwg-content] Occurrences, Organisms, and CollectionObjects: a review

Richard Pyle deepreef at bishopmuseum.org
Fri Sep 9 00:21:39 CEST 2011

I just want to say that Steve *PERFECTLY* captured my own (current)
perspective on this in his message below.




From: tdwg-content-bounces at lists.tdwg.org
[mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Steve Baskauf
Sent: Thursday, September 08, 2011 5:22 AM
To: Chuck Miller
Cc: TDWG Content Mailing List
Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a


Well, this point brings to mind the previous discussions about "collapsing"
or "denormalizing" complex models into simpler models.  In those
discussions, it was noted that many if not most people "denormalize" out of
existence parts of a more complex model when they don't need them in their
particular situation.  People who have whole dead organisms in a jar do not
care about the distinction between an Organism and a CollectionObject.
However, people who cut five branches from the same tree and send them to
five different herbaria do care, as do people who both photograph and
collect samples from an organism, people who make observations of the same
organism over time, and people who make measurements of a temporarily
captured animal at the same time they collect a DNA sample.  I don't
consider the latter four examples to be very, very, very few cases.  If it
seems like there are very few cases, that's just because TDWG was started by
(and is currently run mostly by) people who run museums.  If TDWG wants to
broaden the biodiversity informatics tent to include people outside of the
museum community, then it is important to create the classes and terms
necessary to accommodate those other people.  

Simple DwC does not require that users include classes that they don't need
in their databases.  If a museum only has specimens that are whole dead
organisms, then there is no need for them to have a table in their database
for Organism; they only need a table for CollectionObjects and the Organism
represented by that CollectionObject can be inferred.  This is no different
than flattening the relationships among Location, Event, and Occurrence by
having a 1:1:1 relationship among those three classes, which might be done
in a Darwin Core Archive.  The fact that some people prefer to "flatten"
those three classes into a single database table doesn't mean that other
people will not have the need to have several events at a single Location,
or several Occurrences at one Event.  
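As a rough illustration of the flattening described above, here is a minimal
Python sketch.  The class and field names only loosely mirror Darwin Core
terms (locationID, eventID, occurrenceID) and are assumptions made for this
example, as are the sample values; this is not a prescribed DwC layout.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Location:
    location_id: str
    locality: str


@dataclass(frozen=True)
class Event:
    event_id: str
    location_id: str  # many Events may point at one Location
    event_date: str


@dataclass(frozen=True)
class Occurrence:
    occurrence_id: str
    event_id: str  # many Occurrences may point at one Event
    recorded_by: str


def flatten(occurrences, events, locations):
    """Join the three classes into one flat row per Occurrence."""
    events_by_id = {e.event_id: e for e in events}
    locations_by_id = {loc.location_id: loc for loc in locations}
    rows = []
    for occ in occurrences:
        event = events_by_id[occ.event_id]
        location = locations_by_id[event.location_id]
        # Location and Event values are simply repeated on every row.
        rows.append({**asdict(location), **asdict(event), **asdict(occ)})
    return rows


# Hypothetical data: two Occurrences recorded during a single Event.
locations = [Location("loc1", "Ridge trail")]
events = [Event("ev1", "loc1", "2011-09-08")]
occurrences = [
    Occurrence("occ1", "ev1", "Observer A"),
    Occurrence("occ2", "ev1", "Observer B"),
]
rows = flatten(occurrences, events, locations)
```

The flat rows are convenient when the relationship really is 1:1:1, but the
normalized form is what lets two Occurrences share one Event without the
Event being defined twice.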

So I guess I would say that I don't agree that there are very, very, very
few cases where there are multiple CollectionObjects per Organism.  One of
the major points of the BiSciCol initiative is that there needs to be some
way for connecting diverse collection objects that originate from the same
organism but end up being scattered around in different institutions.  The
lack of an Organism class is a big hole in modeling these kinds of
situations and I haven't heard of an alternative way to fix that hole.
There is a strong desire among a diverse group of TDWG participants to move
forward on more precisely modeling relationships among classes using RDF.
If we do not make the changes John has suggested, it is difficult to see how
this will be accomplished without people "making up" classes on the fly to
accommodate the more precise modeling that is sought.  That's what Cam and I
had to do when we created darwin-sw.
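
To make the "scattered collection objects" case concrete, here is a small
Python sketch, assuming each record carries a shared organism identifier.
The identifiers, institution names, and the function name group_by_organism
are all hypothetical; this is not how BiSciCol or darwin-sw actually
represents the link.

```python
from collections import defaultdict

# Hypothetical records: five branches cut from one tree and sent to five
# herbaria, plus one unrelated whole-organism specimen.
collection_objects = [
    {"collection_object_id": "herbarium-A:0001", "organism_id": "tree-42"},
    {"collection_object_id": "herbarium-B:0917", "organism_id": "tree-42"},
    {"collection_object_id": "herbarium-C:0033", "organism_id": "tree-42"},
    {"collection_object_id": "herbarium-D:5120", "organism_id": "tree-42"},
    {"collection_object_id": "herbarium-E:0208", "organism_id": "tree-42"},
    {"collection_object_id": "museum-X:fish-12", "organism_id": "fish-12"},
]


def group_by_organism(records):
    """Map each organism ID to the collection objects derived from it."""
    groups = defaultdict(list)
    for record in records:
        groups[record["organism_id"]].append(record["collection_object_id"])
    return dict(groups)


by_organism = group_by_organism(collection_objects)
```

Without something playing the role of organism_id, the five herbarium
records have nothing in common to join on.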


Chuck Miller wrote: 

"in the many, many, many cases where there will be a 1:1 relationship
between an Organism and a CollectionObject..."

Rich has hit on an important point.  The discussion continues to focus on
perfecting logic that is all encompassing.  But is it the best thing to do
for the community as a whole to implement solutions that are more complex in
order to accommodate the very, very, very few cases (using the counter
description to Rich's)?

Maybe we should consider keeping the solutions simple (i.e., current DwC) for
the many, many, many cases and introduce complex extensions only for the
very, very, very few.

-----Original Message-----
From: tdwg-content-bounces at lists.tdwg.org
[mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Richard Pyle
Sent: Thursday, September 08, 2011 3:52 AM
To: 'Gregor Hagedorn'; tuco at berkeley.edu
Cc: 'TDWG Content Mailing List'
Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a

I see a problem with the "taxonomically homogeneous" since many taxa are not.
All obligatory mutualistically symbiontic organisms are excluded (you


symbiont, but the symbiont is the part of a symbiontic relation, e.g. 


algae taxon and fungus taxon each are a symbiont in a lichen.

I don't understand the problem.  Isn't this simply two instances of
"Organism" (one symbiont and one host)?
Together, they may comprise a single collectionObject (e.g., specimen); but
I see no trouble treating obligatory mutualistic symbionts and their host(s)
as distinct instances of "Organism".

Definition:     The information class pertaining to a specific instance or
set of instances of a life form or organism (virus, bacteria, symbiontic life


individual, colony, group, population). Sets must reliably be known to be
taxonomically homogeneous (including obligatory symbiontic associations).

I guess it could be defined that way, but I've come around to Steve's view
that "taxonomically homogeneous" implies that in cases where more than one
individual is involved (colonies, small groups, populations), all such
individuals belong to a single species (independently of whether or not we
can identify what that species is).  When more than one species is
discovered amongst a multi-individual instance of "Organism", then one would
create additional instances of Organism to accommodate the heterogeneous

All of my original examples of things that I would want to be taxonomically
heterogeneous (e.g., a single fossil rock with multiple phyla/kingdoms, or a
single rock with multiple phyla of invertebrates attached) can be easily
aggregated via a single instance of collectionObject, associated with
multiple instances of Organism (one for each species[ish] level taxon).

I originally thought that both "Organism" and "collectionObject" would be
redundant, and that only one was really needed.   But I have now been
convinced by Steve (and others) that this would be an unnecessary
over-loading of "Organism".  Now that we are contemplating two distinct
classes, I have no problem with the more "refined" definition of "Organism".

One concern I do have, however, is in the many, many, many cases where there
will be a 1:1 relationship between an Organism and a CollectionObject (i.e.,
the vast majority of all Museum specimens).  Does that mean that data
providers will need to generate two separate Ids (one organismID and one
collectionObjectID) to represent all of these specimens?
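
One possible way a data consumer could handle both situations is sketched
below in Python: a collection object may list several organism IDs (the rock
example), and when none is given a single implied Organism is inferred.  The
record layout, the function name organism_ids_for, and the "#organism" suffix
are assumptions invented for this illustration, not anything proposed in
this thread or defined by Darwin Core.

```python
def organism_ids_for(record):
    """Return the organism IDs linked to one collection-object record."""
    explicit = record.get("organism_ids")
    if explicit:
        # One collectionObject, several Organisms (one per taxon).
        return list(explicit)
    # 1:1 case: no Organism was stated, so infer a single implied one.
    # The "#organism" suffix is a made-up convention for this sketch.
    return [record["collection_object_id"] + "#organism"]


# Hypothetical records.
rock = {
    "collection_object_id": "museum-X:rock-7",
    "organism_ids": ["rock-7:coral", "rock-7:sponge"],
}
jar = {"collection_object_id": "museum-X:fish-12"}

rock_organisms = organism_ids_for(rock)
jar_organisms = organism_ids_for(jar)
```

Under this assumption a provider with only whole-organism specimens supplies
one identifier per specimen, and the second is derived only if someone
downstream needs it.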
tdwg-content mailing list
tdwg-content at lists.tdwg.org

Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707