[tdwg-content] Occurrences, Organisms, and CollectionObjects: a review

John Wieczorek tuco at berkeley.edu
Wed Sep 14 03:08:03 CEST 2011


Comments inline.

On Tue, Sep 13, 2011 at 10:39 AM, Steve Baskauf
<steve.baskauf at vanderbilt.edu> wrote:
> Markus,
> Well, I don't know that I'd go so far as to say that it's a drastic change
> in semantics, at least in the formal semantics of the normative document
> (which I _think_ can be viewed at
> http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwcterms.rdf).
> That document says (in human terms) that Occurrence is an rdfs:Class, that
> its status is Recommended, and some bookkeeping stuff about versioning.  The
> main change is in the rdfs:comment which presents the description "The
> category of information pertaining to evidence of an occurrence in nature,
> in a collection, or in a dataset (specimen, observation, etc.)." but that's
> really a human thing and as I've said, there has been quite a bit of
> misunderstanding about what the "human" definition means.  As has been noted
> on this list before, DwC doesn't get into domain and range issues for its
> terms at all and usually doesn't get into subclassing, so there is very
> little in the normative document to be "broken" in terms of semantics.
> That's a rather different situation than changing an owl ontology class
> definition where relationships among classes and their properties are likely
> to be more complicated (e.g. disjoint classes, subclassing, range, domain)
> and therefore more easily "broken".
>
> There is the issue that a number of property terms would have their
> dwcattributes:organizedInClass property changed from dwc:Occurrence to
> something else.  But my understanding was that the organization of the
> property terms under the DwC classes was more of a suggestion as opposed to
> a declaration of domain.

That is correct. dwcattributes:organizedInClass is something like a
label to suggest the class under which to list the term. It is a
convenience, and does not assert an ontological relationship between
the property and the class.
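
To make the "label, not domain" point concrete, here is a minimal Python
sketch. It assumes the dwcattributes prefix resolves to the namespace URI
shown, and it parses a small hand-written RDF/XML fragment (not copied
from dwcterms.rdf). The function names are hypothetical. The label is
used only to group terms for listing, and the fragment contains no
rdfs:domain statement.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Assumed namespace URI for the dwcattributes prefix.
ATTR = "http://rs.tdwg.org/dwc/terms/attributes/"
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hand-written fragment for illustration only.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}"
         xmlns:dwcattributes="{ATTR}">
  <rdf:Description rdf:about="{DWC}catalogNumber">
    <dwcattributes:organizedInClass rdf:resource="{DWC}Occurrence"/>
  </rdf:Description>
  <rdf:Description rdf:about="{DWC}recordNumber">
    <dwcattributes:organizedInClass rdf:resource="{DWC}Occurrence"/>
  </rdf:Description>
  <rdf:Description rdf:about="{DWC}identifiedBy">
    <dwcattributes:organizedInClass rdf:resource="{DWC}Identification"/>
  </rdf:Description>
</rdf:RDF>"""


def group_terms_by_label(rdf_xml):
    """Group term names under the class named by organizedInClass."""
    groups = defaultdict(list)
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF}}}Description"):
        term = desc.get(f"{{{RDF}}}about").rsplit("/", 1)[-1]
        for label in desc.findall(f"{{{ATTR}}}organizedInClass"):
            cls = label.get(f"{{{RDF}}}resource").rsplit("/", 1)[-1]
            groups[cls].append(term)
    return dict(groups)


def has_domain_assertion(rdf_xml):
    """Report whether any rdfs:domain statement appears in the fragment."""
    root = ET.fromstring(rdf_xml)
    return root.find(f".//{{{RDFS}}}domain") is not None
```

Moving a term to a different label changes where it is listed, but a
consumer looking for domain assertions would find none either way.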

> So it doesn't seem likely that the understanding
> of machines will be "broken" by this change since I don't think that much of
> any machine reasoning based on DwC is going on at the present.  But this is
> getting beyond my area of expertise, so maybe others can clarify things on
> this point.

It wouldn't break existing schemas, but they would be incomplete if
the changes were not incorporated. So, it will require adjustments to
the reference schemas (http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd,
http://rs.tdwg.org/dwc/xsd/tdwg_dwc_class_terms.xsd, and
http://rs.tdwg.org/dwc/xsd/tdwg_dwcterms.xsd) to allow the use of the
new classes. It will affect Simple Darwin Core as text only insofar as
terms are added, removed, or their names changed, unless a new type of
core record is deemed necessary. We would probably also want to add new
examples, or extend the existing ones, throughout the documentation.

>
> There is an RDF version of DwC viewable at
> http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctermshistory.rdf
> which actually has dated versions of the terms (e.g.
> Occurrence-2009-04-29).  But I must confess, I don't understand how this
> document is related to the dwcterms.rdf document I mentioned above.  Perhaps
> John can enlighten us...

That document dwctermshistory.rdf contains the entire history of
Darwin Core terms as far back as I could go. It contains all versions
of terms, identifying them (for example, <rdf:Description
rdf:about="acceptedNameUsage-2009-09-21">), and relating them to each
other (for example, <dcterms:replaces
rdf:resource="http://rs.tdwg.org/dwc/terms/acceptedScientificName-2009-07-06"/>).

The document dwcterms.rdf contains only the terms for which the status
is recommended, without the history of relationships between versions.
So, if someone wanted to reason across versions of terms (to gain
backward compatibility), he or she would use the full history, not the
conveniently simplified dwcterms.rdf. In that sense,
dwctermshistory.rdf is the normative document for the terms.
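
As a rough sketch of what reasoning across versions could look like,
the Python below follows "replaces" statements from an old term version
to the newest one. The first pair is the example given above. The second
pair is hypothetical and exists only to make the chain longer. Resolving
both ends against the same base URI, and the function name
latest_version, are also assumptions of the sketch.

```python
BASE = "http://rs.tdwg.org/dwc/terms/"

# (newer version, older version) pairs, as "replaces" statements would give.
REPLACES = [
    (BASE + "acceptedNameUsage-2009-09-21",
     BASE + "acceptedScientificName-2009-07-06"),
    # Hypothetical older version, invented for illustration only.
    (BASE + "acceptedScientificName-2009-07-06",
     BASE + "hypotheticalOldName-2009-01-01"),
]


def latest_version(version, replaces):
    """Follow the replaces chain forward to the newest known version."""
    replaced_by = {old: new for new, old in replaces}
    seen = set()
    while version in replaced_by:
        if version in seen:
            # Guard against malformed history containing a loop.
            raise ValueError("cycle in replaces chain at " + version)
        seen.add(version)
        version = replaced_by[version]
    return version
```

A consumer holding data described with an old version could use a lookup
like this to map it onto the current term, which is not possible from
dwcterms.rdf alone because it carries no version relationships.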

> Steve
>
> Markus Döring (GBIF) wrote:
>
> Hi Steve,
> I agree it is a good thing to be more clear about what an occurrence
> actually is and I wouldn't disagree with the proposed 3 classes. Still there
> is a drastic change in semantics of an existing term Occurrence and I would
> feel more comfortable if we could tell those different usages apart. Whether
> that's via a namespace-based versioning of (all?) Darwin Core terms, through
> the use of a different term name, or something else, I don't know.
>
> Don't you think this is an issue? Would you also change an owl ontology class
> definition in the same way, and wouldn't that be harmful to existing
> instances?
>
> Markus
>
> With regards to Markus' concern about whether people will be able to know
> whether somebody is talking about a "new-style" Occurrence or an "old"
> Occurrence, I would assert that the "old" Occurrence didn't really have a
> clear meaning.  If you review the summary of the discussion on Occurrence,
> you can see that it was used to mean at least three different kinds of
> "things" by different people.  What John is actually doing with his proposal
> is to add clarity about what an Occurrence is where it didn't exist before.
> I think that is a good thing.  If, by the "old" kind of Occurrence people
> are meaning that Occurrence is a fancier name for PreservedSpecimen (which I
> believe is how some people in the museum community are thinking of it), then
> I would say that such a characterization is incorrect (based on the apparent
> consensus) and that clarifying the incorrectness of that view is a really
> good thing.
>
> Steve
>
> Éamonn Ó Tuama (GBIF) wrote:
>
>
> It would be good to hear from someone who is familiar with the work going on
> in the Observations Task Group and could explain how a generic model for
> observations/measurements (e.g. OBOE) might help sort out these issues. It
> seems to me that we are trying to build in an ad-hoc manner an increasingly
> complex model on top of DwC which is really just a glossary of terms. That
> does not seem like a good approach - but I'm no modeller :-)
> _Éamonn
>
> -----Original Message-----
> From: Dag Endresen (GBIF) [mailto:dendresen at gbif.org]
> Sent: 13 September 2011 12:18
> To: "Markus Döring (GBIF)"
> Cc: tdwg-content at lists.tdwg.org; Éamonn Ó Tuama
> Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a
> review
>
>  Hi Markus,
>
>  I believe that the discussion here originates from the view that the
>  "CollectionObject"/"Sample" is a different thing from the "Organism" -
>  and that there can be a relationship between CollectionObjects/Samples
>  and Organisms that could be difficult to describe if these things are
>  identified as the same thing (occurrenceID). Do you think that the
>  "Occurrence" would be seen as a thing different from the proposed
>  CollectionObject/Sample and Organism - or as a super-class that would
>  include CollectionObjects/Samples and Organisms? Would the semantics of
>  Occurrence change?
>
>  I fully share your view that the Darwin Core Archive (DwC-A) would not
>  be suited to share the full complex relationship between entities - even
>  if persistent identifiers would be used. However if we start to describe
>  and include other things (core types) than only the taxon and
>  occurrences then perhaps the DwC-A could be a useful way to provide a
>  simple list of these entities? This could perhaps provide easier
>  indexing and discovery of these new entities?
>
>  Dag
>
>
>
>  On Tue, 13 Sep 2011 10:03:00 +0200, Markus Döring (GBIF) wrote:
>
>
>
>
> I have to say that the change in semantics to the Occurrence class
> makes me a bit nervous.
> Can someone help me fight my fears?
>
> DarwinCore has no versioning of namespaces, so there is no way for a
> consumer to detect if it's an old-style Occurrence or a new one. I am
> currently parsing various RSS feeds and even though it's a mess having
> to parse 10 different styles I am glad that at least the designers
> made sure they all have their own namespace! Also, removing or renaming
> terms might cause serious problems. Would discrete versions of dwc
> with their own namespace hurt?
>
> Another observation relates to dwc archives and their star schema. As
> an index to data that has been flattened there is no problem with more
> classes and core row types, but if you want it as a way to transfer
> complete normalized data it will not work. But that never really was
> the intention and I simply wanted to stress that fact.
>
> Markus
>
>
>
> On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:
>
>
>
>
>
> Richard Pyle wrote:
>
>
>
>
> I'm also wondering if we necessarily need to "break" the traditional
> view of the "Occurrence" class in order to implement Organism and
> CollectionObject.  As long as we keep in mind that DwC is a vocabulary
> of terms focused on representing an exchange standard (rather than a
> full-blown Ontology), perhaps Occurrence records can continue to be
> represented in the traditional way as "flat" content, but the Organism
> and CollectionObject classes allow us to present data in a somewhat
> more "normalized" way in those circumstances that call for it (e.g.
> tracking individuals or groups over time [Organism], or managing fossil
> rocks with multiple taxa [CollectionObject] -- to name just two).
>
>
>
>
>
> I've been thinking about this issue of "backward compatibility" with
> respect to Occurrences if the CollectionObject/Sample/Token/whatever
> class is adopted.  I really don't think it is going to be as big of a
> deal as people are making it out to be.
>
> It seems to me that the main problems arise in two areas: when one wants
> to be clear about typing and when one wants to express relationships in
> a system where it is possible to do so through semantics (like RDF).  In
> that kind of circumstance, it's bad (oh yeah, I forgot - the term is
> "naughty") to say something like
> resourceA hasOccurrence resourceB
> when resourceB isn't actually an Occurrence.  "Wrong" typing also
> happens all the time because the classes don't exist (yet) to do the
> typing correctly.  As a case in point, in the Morphbank system, I have
> multiple images of the same tree.  In that system the tree is typed as a
> "specimen".  That is totally wrong because the tree isn't a specimen,
> but what else is it going to be typed as?  There isn't (yet) an
> appropriate class to put it in.
>
> Although these two problems (wrong typing and using a term with the
> wrong kind of object, which are actually different manifestations of the
> same class-based problem) are naughty, realistically very few people are
> actually using a system that is "semantic-aware" (e.g. serving and
> consuming RDF), so right now making those mistakes doesn't really "break"
> anything.  Most data providers are using traditional databases or even
> Excel spreadsheets where the DwC terms are just column headings with no
> real "meaning" other than what the data managers intend for them to
> mean.  So if a manager has a table where each line contains a record for
> a specimen and has a column entitled "dwc:catalogNumber", there isn't
> really anything other than an idea in the manager's head that the
> catalogNumber is a property of a specimen or Occurrence or
> CollectionObject.  If each line in the database table is "flat" such
> that one specimen=one CollectionObject=one Occurrence, all that is
> required to make catalogNumber be a property of a CollectionObject
> instead of an Occurrence is a different way of thinking in the
> manager's mind because there are really no semantics embedded in the
> table.  We are already doing this kind of mental gymnastics with
> existing classes like dwc:Identification.  If our hypothetical database
> manager has a column heading that says "dwc:identifiedBy" in the
> specimen table, that is really a property of dwc:Identification, not
> dwc:Occurrence, but again that is a distinction that is only going to be
> made in the manager's mind.  Making the distinction really only becomes
> an issue when the database stops being "flat" for a particular
> relationship, e.g. if the database wants to allow multiple
> Identifications per specimen record.  Then the database structure must
> be changed accordingly to accommodate that "normalization".
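
The move from "flat" to "normalized" that Steve describes can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions: the rows, catalog numbers, and identifier names are
made up, and the function name normalize is hypothetical. The column
headings are the DwC terms mentioned above plus dwc:scientificName.

```python
# Flat table: one row per specimen, except that specimen 0001 has been
# identified twice, so its catalog number has to be repeated.
FLAT_ROWS = [
    {"dwc:catalogNumber": "0001", "dwc:scientificName": "Quercus alba",
     "dwc:identifiedBy": "A. Botanist"},
    {"dwc:catalogNumber": "0001", "dwc:scientificName": "Quercus stellata",
     "dwc:identifiedBy": "B. Curator"},
    {"dwc:catalogNumber": "0002", "dwc:scientificName": "Quercus alba",
     "dwc:identifiedBy": "A. Botanist"},
]


def normalize(flat_rows):
    """Split flat rows into a specimen table and an Identification table."""
    specimens = {}
    identifications = []
    for row in flat_rows:
        catalog_number = row["dwc:catalogNumber"]
        # One specimen record per catalog number, however often it repeats.
        specimens.setdefault(catalog_number,
                             {"dwc:catalogNumber": catalog_number})
        # Each Identification points back to its specimen (1:many).
        identifications.append({
            "dwc:catalogNumber": catalog_number,
            "dwc:scientificName": row["dwc:scientificName"],
            "dwc:identifiedBy": row["dwc:identifiedBy"],
        })
    return list(specimens.values()), identifications
```

With only one identification per specimen the two tables line up 1:1 and
the split adds nothing, which is the case where the data manager does not
need to care that the Identification class exists.
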
>
> What we have here at the present moment is a situation where data
> providers don't have any way to have anything but "flat" records where
> 1 specimen=1 Occurrence=1 Organism.  By adding the Organism and
> CollectionObject classes, we allow people who need or want to have less
> "flat" (=more "normalized") databases to have something to call the
> entities that are represented by the new tables they create to handle
> 1:many relationships instead of 1:1 relationships.  Anybody who only
> cares about 1:1 relationships really doesn't need to worry about the
> fact that the new class exists, just as people currently don't have to
> worry about the Identification class if they only allow one
> Identification per specimen in their database.
>
> So I guess what I'm saying is that if a database manager has a table
> labeled Occurrence, they really don't have to freak out if we now tell
> them that their table actually should be labeled CollectionObject as
> long as there is only one CollectionObject per Occurrence.  They didn't
> freak out before when we told them that they should call their table
> "Occurrence" instead of "Observation" or "Specimen" in 2009, did they?
>
> I think what I'm saying here is what Rich was trying to say in the
> paragraph I quoted, but I'm not sure.
>
> Steve
>
> --
> Steven J. Baskauf, Ph.D., Senior Lecturer
> Vanderbilt University Dept. of Biological Sciences
>
> postal mail address:
> VU Station B 351634
> Nashville, TN  37235-1634,  U.S.A.
>
> delivery address:
> 2125 Stevenson Center
> 1161 21st Ave., S.
> Nashville, TN 37235
>
> office: 2128 Stevenson Center
> phone: (615) 343-4582,  fax: (615) 343-6707
>
> http://bioimages.vanderbilt.edu
>
>
>
> _______________________________________________
> tdwg-content mailing list
>
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content

