Markus,
Well, I don't know that I'd go so far as to say that it's a drastic change in semantics, at least in the formal semantics of the normative document (which I _think_ can be viewed at http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwcterms.rdf). That document says (in human terms) that Occurrence is an rdfs:Class, that its status is Recommended, and some bookkeeping stuff about versioning. The main change is in the rdfs:comment which presents the description "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)." but that's really a human thing and as I've said, there has been quite a bit of misunderstanding about what the "human" definition means. As has been noted on this list before, DwC doesn't get into domain and range issues for its terms at all and usually doesn't get into subclassing, so there is very little in the normative document to be "broken" in terms of semantics. That's a rather different situation than changing an owl ontology class definition where relationships among classes and their properties are likely to be more complicated (e.g. disjoint classes, subclassing, range, domain) and therefore more easily "broken".

There is the issue that a number of property terms would have their dwcattributes:organizedInClass property changed from dwc:Occurrence to something else. But my understanding was that the organization of the property terms under the DwC classes was more of a suggestion as opposed to a declaration of domain. So it doesn't seem likely that the understanding of machines will be "broken" by this change since I don't think that much of any machine reasoning based on DwC is going on at the present. But this is getting beyond my area of expertise, so maybe others can clarify things on this point.

There is an RDF version of DwC viewable at http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctermshistory.rdf which actually has dated versions of the terms (e.g. Occurrence-2009-04-29). But I must confess, I don't understand how this document is related to the dwcterms.rdf document I mentioned above. Perhaps John can enlighten us...

Steve

Markus Döring (GBIF) wrote:

Hi Steve,
I agree it is a good thing to be more clear about what an occurrence actually is, and I wouldn't disagree with the proposed 3 classes. Still, there is a drastic change in the semantics of an existing term, Occurrence, and I would feel more comfortable if we could tell those different usages apart. Whether that's via namespace-based versioning of (all?) Darwin Core terms, through the use of a different term name, or something else, I don't know.

Don't you think this is an issue? Would you also change an OWL ontology class definition in the same way, and wouldn't that be harmful to existing instances?

Markus

With regards to Markus' concern about whether people will be able to know whether somebody is talking about a "new-style" Occurrence or an "old" Occurrence, I would assert that the "old" Occurrence didn't really have a clear meaning.  If you review the summary of the discussion on Occurrence, you can see that it was used to mean at least three different kinds of "things" by different people.  What John is actually doing with his proposal is to add clarity about what an Occurrence is where it didn't exist before.  I think that is a good thing.  If, by the "old" kind of Occurrence people are meaning that Occurrence is a fancier name for PreservedSpecimen (which I believe is how some people in the museum community are thinking of it), then I would say that such a characterization is incorrect (based on the apparent consensus) and that clarifying the incorrectness of that view is a really good thing.

Steve

Éamonn Ó Tuama (GBIF) wrote:

It would be good to hear from someone who is familiar with the work going on in the Observations Task Group and could explain how a generic model for observations/measurements (e.g. OBOE) might help sort out these issues. It seems to me that we are trying to build in an ad-hoc manner an increasingly complex model on top of DwC which is really just a glossary of terms. That does not seem like a good approach - but I'm no modeller :-)
Éamonn

-----Original Message-----
From: Dag Endresen (GBIF) [mailto:dendresen@gbif.org]
Sent: 13 September 2011 12:18
To: "Markus Döring (GBIF)"
Cc: tdwg-content@lists.tdwg.org; Éamonn Ó Tuama
Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review

 Hi Markus,

 I believe that the discussion here originates from the view that the
 "CollectionObject"/"Sample" is a different thing from the "Organism" -
 and that there can be a relationship between CollectionObjects/Samples
 and Organisms that could be difficult to describe if these things are
 identified as the same thing (occurrenceID). Do you think that the
 "Occurrence" would be seen as a thing different from the proposed
 CollectionObject/Sample and Organism - or as a super-class that would
 include CollectionObjects/Samples and Organisms? Would the semantics of
 Occurrence change?

 I fully share your view that the Darwin Core Archive (DwC-A) would not
 be suited to share the full complex relationship between entities - even
 if persistent identifiers would be used. However if we start to describe
 and include other things (core types) than only the taxon and
 occurrences then perhaps the DwC-A could be a useful way to provide a
 simple list of these entities? This could perhaps provide easier
 indexing and discovery of these new entities?

 Dag

 On Tue, 13 Sep 2011 10:03:00 +0200, Markus Döring (GBIF) wrote:

I have to say that the change in semantics to the Occurrence class
makes me a bit nervous.
Can someone try to help fight my fears?

DarwinCore has no versioning of namespaces, so there is no way for a
consumer to detect whether it's an old-style Occurrence or a new one.
I am currently parsing various RSS feeds, and even though it's a mess
having to parse 10 different styles, I am glad that at least the
designers made sure they all have their own namespace! Also, removing
or renaming terms might cause serious problems. Would discrete
versions of dwc with their own namespace hurt?

Another observation relates to dwc archives and their star schema. As
an index to data that has been flattened there is no problem with more
classes and core row types, but if you want it as a way to transfer
complete normalized data it will not work. But that never really was
the intention and I simply wanted to stress that fact.

Markus



On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:

Richard Pyle wrote:

I'm also wondering if we necessarily need to "break" the traditional
view of the "Occurrence" class in order to implement Organism and
CollectionObject. As long as we keep in mind that DwC is a vocabulary
of terms focused on representing an exchange standard (rather than a
full-blown Ontology), perhaps Occurrence records can continue to be
represented in the traditional way as "flat" content, but the Organism
and CollectionObject classes allow us to present data in a somewhat
more "normalized" way in those circumstances that call for it (e.g.
tracking individuals or groups over time [Organism], or managing
fossil rocks with multiple taxa [CollectionObject] -- to name just
two).

I've been thinking about this issue of "backward compatibility" with
respect to Occurrences if the CollectionObject/Sample/Token/whatever
class is adopted.  I really don't think it is going to be as big of a
deal as people are making it out to be.

It seems to me that the main problems arise in two areas: when one
wants to be clear about typing, and when one wants to express
relationships in a system where it is possible to do so through
semantics (like RDF).  In that kind of circumstance, it's bad (oh
yeah, I forgot - the term is "naughty") to say something like
resourceA hasOccurrence resourceB
when resourceB isn't actually an Occurrence.  "Wrong" typing also
happens all the time because the classes don't exist (yet) to do the
typing correctly.  As a case in point, in the Morphbank system, I have
multiple images of the same tree.  In that system the tree is typed as
a "specimen".  That is totally wrong because the tree isn't a
specimen, but what else is it going to be typed as?  There isn't (yet)
an appropriate class to put it in.
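
The "naughty" case above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the triples are plain tuples rather than real RDF, the `find_mistyped_objects` helper is hypothetical, and the idea that hasOccurrence expects a dwc:Occurrence is invented here (DwC itself declares no ranges for its terms).

```python
RDF_TYPE = "rdf:type"

# Hypothetical expectation: the object of hasOccurrence should be typed as
# dwc:Occurrence. DwC does not actually declare this; it is assumed here.
EXPECTED_OBJECT_TYPE = {"hasOccurrence": "dwc:Occurrence"}

# Toy triples modelled on the example in the text: resourceB is typed as a
# "specimen", not as an Occurrence.
triples = [
    ("resourceA", "hasOccurrence", "resourceB"),
    ("resourceB", RDF_TYPE, "specimen"),
]


def find_mistyped_objects(triples):
    """Return triples whose object lacks the type its predicate expects."""
    # First pass: collect the declared types of every subject.
    types = {}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types.setdefault(subject, set()).add(obj)

    # Second pass: flag statements whose object is not typed as expected.
    problems = []
    for subject, predicate, obj in triples:
        expected = EXPECTED_OBJECT_TYPE.get(predicate)
        if expected is not None and expected not in types.get(obj, set()):
            problems.append((subject, predicate, obj))
    return problems


problems = find_mistyped_objects(triples)
```

With these toy triples the single hasOccurrence statement is flagged, because resourceB carries no dwc:Occurrence type. A spreadsheet or flat database performs no such check, which is why the mistake goes unnoticed there.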

Although these two problems (wrong typing and using a term with the
wrong kind of object, which are actually different manifestations of
the same class-based problem) are naughty, realistically very few
people are actually using a system that is "semantic-aware" (e.g.
serving and consuming RDF), so right now making those mistakes doesn't
really "break" anything.  Most data providers are using traditional
databases or even Excel spreadsheets where the DwC terms are just
column headings with no real "meaning" other than what the data
managers intend for them to mean.  So if a manager has a table where
each line contains a record for a specimen and has a column entitled
"dwc:catalogNumber", there isn't really anything other than an idea in
the manager's head that the catalogNumber is a property of a specimen
or Occurrence or CollectionObject.  If each line in the database table
is "flat" such that one specimen=one CollectionObject=one Occurrence,
all that is required to make catalogNumber be a property of a
CollectionObject instead of an Occurrence is a different way of
thinking in the manager's mind, because there are really no semantics
embedded in the table.  We are already doing this kind of mental
gymnastics with existing classes like dwc:Identification.  If our
hypothetical database manager has a column heading that says
"dwc:identifiedBy" in the specimen table, that is really a property of
dwc:Identification, not dwc:Occurrence, but again that is a
distinction that is only going to be made in the manager's mind.
Making the distinction really only becomes an issue when the database
stops being "flat" for a particular relationship, e.g. if the database
wants to allow multiple Identifications per specimen record.  Then the
database structure must be changed accordingly to accommodate that
"normalization".
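
As a rough illustration of that last step, here is a minimal Python sketch of a flat specimen row being split once multiple Identifications are allowed. The row layout, the sample values, and the `split_identification` helper are all hypothetical; only the DwC term names are real, and the set of Identification terms is a small illustrative subset.

```python
# Hypothetical flat specimen row: the DwC terms are nothing more than
# column headings, so nothing says which class each property belongs to.
flat_row = {
    "dwc:catalogNumber": "12345",
    "dwc:identifiedBy": "A. Botanist",
    "dwc:dateIdentified": "2009-06-01",
}

# Terms that conceptually describe a dwc:Identification rather than the
# specimen itself (illustrative subset only).
IDENTIFICATION_TERMS = {"dwc:identifiedBy", "dwc:dateIdentified"}


def split_identification(row):
    """Split one flat row into a specimen record holding a list of identifications."""
    specimen = {k: v for k, v in row.items() if k not in IDENTIFICATION_TERMS}
    identification = {k: v for k, v in row.items() if k in IDENTIFICATION_TERMS}
    # The 1:1 case becomes a list with a single entry.
    specimen["identifications"] = [identification]
    return specimen


specimen = split_identification(flat_row)

# Once the structure is normalized, a second identification can be recorded
# without touching the specimen's own properties.
specimen["identifications"].append(
    {"dwc:identifiedBy": "B. Curator", "dwc:dateIdentified": "2011-09-09"}
)
```

In the flat row the two kinds of property are indistinguishable; the split only becomes necessary when the second identification arrives.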

What we have here at the present moment is a situation where data
providers don't have any way to have anything but "flat" records where
1 specimen=1 Occurrence=1 Organism.  By adding the Organism and
CollectionObject classes, we allow people who need or want to have
less "flat" (=more "normalized") databases to have something to call
the entities that are represented by the new tables they create to
handle 1:many relationships instead of 1:1 relationships.  Anybody who
only cares about 1:1 relationships really doesn't need to worry about
the fact that the new class exists, just as people currently don't
have to worry about the Identification class if they only allow one
Identification per specimen in their database.
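
A minimal Python sketch of the 1:many case follows, loosely modelled on the tree that is imaged several times. The `Occurrence` dataclass, its field names, the identifiers, and the `group_by_organism` helper are all assumptions made for illustration, not part of DwC.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    occurrence_id: str
    organism_id: str  # hypothetical link from an Occurrence to its Organism
    event_date: str


def group_by_organism(occurrences):
    """Collect the occurrences recorded for each organism."""
    grouped = defaultdict(list)
    for occ in occurrences:
        grouped[occ.organism_id].append(occ)
    return dict(grouped)


# Made-up data: the same tree documented twice, plus one tree documented once.
occurrences = [
    Occurrence("occ-1", "tree-A", "2010-05-01"),
    Occurrence("occ-2", "tree-A", "2011-05-01"),
    Occurrence("occ-3", "tree-B", "2011-06-15"),
]

by_organism = group_by_organism(occurrences)

# A dataset is "flat" in the sense used above when every organism has
# exactly one occurrence; such a provider can ignore the Organism class.
is_flat = all(len(occs) == 1 for occs in by_organism.values())
```

Here tree-A has two occurrences, so the data is no longer flat and the Organism needs its own identity. Drop occ-2 and the same code reports a flat dataset, which is the situation where the new class can simply be ignored.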

So I guess what I'm saying is that if a database manager has a table
labeled Occurrence, they really don't have to freak out if we now tell
them that their table actually should be labeled CollectionObject, as
long as there is only one CollectionObject per Occurrence.  They
didn't freak out before when we told them that they should call their
table "Occurrence" instead of "Observation" or "Specimen" in 2009, did
they?

I think what I'm saying here is what Rich was trying to say in the
paragraph I quoted, but I'm not sure.

Steve

--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707

http://bioimages.vanderbilt.edu



_______________________________________________
tdwg-content mailing list

tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content

