OK, let's look at a concrete example. Take the specimen that is illustrated at http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0138.rdf
The identifier "http://www.cyberfloralouisiana.com/specimens/lsu000/0138" could be associated with this "thing". (It actually isn't; I just made it up as an example. If you wish, you could substitute a UUID.) Now let's say that someone had previously asserted:
occurrenceID=http://www.cyberfloralouisiana.com/specimens/lsu000/0138
with the understanding that the Occurrence represented not only the fact that a plant identified as Egeria densa occurred at the location N 29.79 deg., W 90.632 deg. on 7 Sep 1977 but that it also represented the actual dried plant specimen itself (i.e. the evidence that the plant occurred there). This is the meaning of Occurrence that was implied (but not stated very explicitly) in the 2009 Darwin Core standard.

Under the new definition of Occurrence that is under consideration, the Occurrence represents the fact that a plant identified as Egeria densa occurred at the location N 29.79 deg., W 90.632 deg. on 7 Sep 1977. These metadata fall under occurrenceID=http://www.cyberfloralouisiana.com/specimens/lsu000/0138. Technically, the actual dried plant specimen itself is now not part of the Occurrence but rather is a CollectionObject. Does that break something? Does it force the institution to create a new identifier for the CollectionObject that has just been defined into existence? I think not.

If the particular institution has ONLY occurrence records for which a single piece of evidence is associated with each Occurrence, then it has a flat database that does not distinguish between the Occurrence and the CollectionObject associated with that Occurrence. The change to the term definition is essentially irrelevant to that institution.

On the other hand, if the institution adopts a new policy requiring that every specimen be photographed prior to collection and that a DNA sample be collected and submitted to GenBank, the new definitions provide a way for them to associate three (or more) CollectionObjects, each having its own collectionObjectID, with the single Occurrence. If they "de-flatten" their database to accommodate this more "normalized" structure, they could easily implement a rule like 'put "#sp" after the identifier for the Occurrence to construct a default identifier for the single CollectionObject associated with that Occurrence' (e.g. collectionObjectID=http://www.cyberfloralouisiana.com/specimens/lsu000/0138#sp for the CollectionObject associated with the Occurrence having occurrenceID=http://www.cyberfloralouisiana.com/specimens/lsu000/0138), or make up new identifiers for the CollectionObjects if they want. But no TDWG "Big Brother" is making them change their database structure or add new identifiers unless they want to.
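
The "#sp" rule mentioned above could be sketched roughly as follows. This is only an illustration: the function name and the check for an existing fragment are my own assumptions, not part of any Darwin Core specification, and the identifier is the made-up one from this example.

```python
def default_collection_object_id(occurrence_id: str, suffix: str = "#sp") -> str:
    """Derive a default collectionObjectID from an occurrenceID.

    Hypothetical helper: it simply appends a fragment suffix, as in the
    'put "#sp" after the identifier' rule described in the text.
    """
    # Assumption of this sketch: refuse identifiers that already carry a
    # fragment, so the derived identifier stays unambiguous.
    if "#" in occurrence_id:
        raise ValueError("occurrenceID already contains a fragment: " + occurrence_id)
    return occurrence_id + suffix


# The made-up identifier used in the example above.
occurrence_id = "http://www.cyberfloralouisiana.com/specimens/lsu000/0138"
collection_object_id = default_collection_object_id(occurrence_id)
print(collection_object_id)
```

An institution that stays "flat" never needs to call anything like this; the rule only matters once a second CollectionObject has to be told apart from the first.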

To put this in perspective, look at http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0429.rdf
Here we have a specimen that has two dwc:Identifications, one that asserts the taxon Juncus diffusissimus sensu L. Urbatsch and one that asserts Juncus debilis sensu G. Montz. Accommodating a single specimen which has two dwc:Identifications in a completely flat database presents exactly the same problems as accommodating two CollectionObjects for a single Occurrence. All of the issues that have been raised for separating CollectionObjects from Occurrences apply equally well to creating the Identification class (like "do I have to assign new separate identifiers for the Identification instances?" and "do I 'break' things if I allow multiple Identifications in a world where people have databases that permit only a single determination per specimen/occurrence/organism?"). I don't hear anybody gnashing their teeth or frothing at the mouth about the fact that we let the term dwc:Identification sneak into the 2009 Darwin Core and mess up our nice perfectly flat database world. Can somebody explain to me how the issues raised with CollectionObject are different from this?

Steve

Richard Pyle wrote:

Ever since DwC transitioned from a "Federated Schema" to a "Vocabulary",
I've never been entirely clear on what sorts of alterations would break
backward-compatibility, and which are easily handled.  I've heard various
statements from people with much more understanding than I on the
implications of a "Vocabulary" that the classes are really intended as rough
clusters of terms, and it's the definition of terms that matter.  Have I
misunderstood this?  The point being: The only way we are threatening to
"break" DwC is by moving terms from the Occurrence class to two other new
classes.  Does that mean we are no longer allowed to represent those terms
as properties of a record with an OccurrenceID?  The tiny part of my brain
that "gets" ontology wants to believe that backward compatibility of what
would be the new DwC:Occurrence would be maintained with what is the
existing DwC:Occurrence *only* if the new classes ("Organism" and
"CollectionObject") are regarded as subclasses of Occurrence.  But the
slightly less tiny (but still tiny) part of my brain that "gets" information
modeling doesn't think that's the right way to represent the new classes.
Which tiny part of my brain is right? (I'm guessing neither...) Does it even
matter?

Obviously, we want a stable DwC.  But we also want a DwC that meets our
needs.  Clearly, there are needs that are not being met by the existing DwC.
The first question is, are those needs important enough to consider
destabilizing DwC (by introducing two new classes, and shuffling some terms
from one existing class to the new classes)?  The second question is: what
are the real costs/consequences of the "destabilization".  In my mind, the
answer to the first question is increasingly obvious ("yes").  But I don't
have a good feel for the answer to the second question.

Aloha,
Rich

P.S. Greg: I live on the other side of the world from *everyone*, yet that
hasn't prevented me from getting my words in... :-)

-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of "Markus Döring (GBIF)"
Sent: Tuesday, September 13, 2011 6:59 AM
To: Steve Baskauf
Cc: tdwg-content@lists.tdwg.org; "Éamonn Ó Tuama (GBIF)"
Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects:
a review

Hi Steve,
I agree it is a good thing to be more clear about what an occurrence
actually is, and I wouldn't disagree with the proposed 3 classes. Still,
there is a drastic change in semantics of an existing term, Occurrence, and
I would feel more comfortable if we could tell those different usages apart.
Whether that's via namespace-based versioning of (all?) Darwin Core terms,
through the use of a different term name, or something else, I don't know.

Don't you think this is an issue? Would you also change an OWL ontology class
definition in the same way, and wouldn't that be harmful to existing
instances?

Markus

With regards to Markus' concern about whether people will be able to
know whether somebody is talking about a "new-style" Occurrence or an
"old" Occurrence, I would assert that the "old" Occurrence didn't really
have a clear meaning.  If you review the summary of the discussion on
Occurrence, you can see that it was used to mean at least three different
kinds of "things" by different people.  What John is actually doing with his
proposal is to add clarity about what an Occurrence is where it didn't exist
before.  I think that is a good thing.  If, by the "old" kind of Occurrence,
people mean that Occurrence is a fancier name for PreservedSpecimen (which I
believe is how some people in the museum community are thinking of it), then
I would say that such a characterization is incorrect (based on the apparent
consensus) and that clarifying the incorrectness of that view is a really
good thing.

Steve

Éamonn Ó Tuama (GBIF) wrote:

It would be good to hear from someone who is familiar with the work
going on in the Observations Task Group and could explain how a
generic model for observations/measurements (e.g. OBOE) might help
sort out these issues. It seems to me that we are trying to build in
an ad-hoc manner an increasingly complex model on top of DwC which is
really just a glossary of terms. That does not seem like a good
approach - but I'm no modeller :-) _Éamonn

-----Original Message-----
From: Dag Endresen (GBIF) [mailto:dendresen@gbif.org]
Sent: 13 September 2011 12:18
To: "Markus Döring (GBIF)"
Cc: tdwg-content@lists.tdwg.org; Éamonn Ó Tuama
Subject: Re: [tdwg-content] Occurrences, Organisms, and
CollectionObjects: a review

 Hi Markus,

 I believe that the discussion here originates from the view that the
"CollectionObject"/"Sample" is a different thing from the "Organism"
- and that there can be a relationship between
CollectionObjects/Samples and Organisms that could be difficult to
describe if these things are identified as the same thing
(occurrenceID). Do you think that the "Occurrence" would be seen as
a thing different from the proposed CollectionObject/Sample and
Organism - or as a super-class that would include
CollectionObjects/Samples and Organisms? Would the semantics of
Occurrence change?

 I fully share your view that the Darwin Core Archive (DwC-A) would
not be suited to share the full complex relationship between
entities - even if persistent identifiers were used. However, if
we start to describe and include other things (core types) than only
the taxon and occurrences, then perhaps the DwC-A could be a useful
way to provide a simple list of these entities? This could perhaps
provide easier indexing and discovery of these new entities?

 Dag



 On Tue, 13 Sep 2011 10:03:00 +0200, Markus Döring (GBIF) wrote:

I have to say that the change in semantics to the Occurrence class
makes me a bit nervous.
Can someone try to help fight my fears?

DarwinCore has no versioning of namespaces, so there is no way for a
consumer to detect if it's an old-style Occurrence or a new one. I am
currently parsing various RSS feeds, and even though it's a mess
having to parse 10 different styles, I am glad that at least the
designers made sure they all have their own namespace! Also, removing
or renaming terms might cause serious problems. Would discrete
versions of dwc with their own namespace hurt?
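
To make the concern concrete, here is a minimal sketch of what a consumer could do if discrete versions did have their own namespaces. The unversioned namespace below is the existing Darwin Core terms namespace; the two versioned namespaces and the function name are hypothetical, invented only for this illustration.

```python
# Existing, unversioned Darwin Core terms namespace.
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical versioned namespaces, made up for this sketch only.
HYPOTHETICAL_VERSIONED = {
    "http://example.org/dwc/2009/terms/": "2009",
    "http://example.org/dwc/2011/terms/": "2011",
}


def term_version(term_uri: str):
    """Return the version label implied by a term's namespace, or None."""
    for namespace, version in HYPOTHETICAL_VERSIONED.items():
        if term_uri.startswith(namespace):
            return version
    # Unversioned namespace: the consumer cannot tell old from new.
    return None


print(term_version("http://example.org/dwc/2011/terms/Occurrence"))
print(term_version(DWC_TERMS + "Occurrence"))
```

With a single unversioned namespace the second call has nothing to go on, which is exactly the situation described above.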

Another observation relates to dwc archives and their star schema. As
an index to data that has been flattened, there is no problem with
more classes and core row types, but if you want it as a way to
transfer complete normalized data, it will not work. But that never
really was the intention, and I simply wanted to stress that fact.

Markus



On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:

Richard Pyle wrote:

I'm also wondering if we necessarily need to "break" the
traditional view of the "Occurrence" class in order to implement
Organism and CollectionObject.
As long as we keep in mind that DwC is a vocabulary of terms
focused on representing an exchange standard (rather than a
full-blown Ontology), perhaps Occurrence records can continue to
be represented in the traditional way as "flat" content, but the
Organism and CollectionObject classes allow us to present data in
a somewhat more "normalized" way in those circumstances that call
for it (e.g. tracking individuals or groups over time [Organism],
or managing fossil rocks with multiple taxa [CollectionObject] --
to name just two).

I've been thinking about this issue of "backward compatibility"
with respect to Occurrences if the CollectionObject/Sample/Token/whatever
class is adopted.  I really don't think it is going to be as big of
a deal as people are making it out to be.

It seems to me that the main problems arise in two areas: when one
wants to be clear about typing, and when one wants to express
relationships in a system where it is possible to do so through
semantics (like RDF).  In that kind of circumstance, it's bad (oh yeah,
I forgot - the term is "naughty") to say something like
resourceA hasOccurrence resourceB
when resourceB isn't actually an Occurrence.  "Wrong" typing also
happens all the time because the classes don't exist (yet) to do
the typing correctly.  As a case in point, in the Morphbank system,
I have multiple images of the same tree.  In that system the tree
is typed as a "specimen".  That is totally wrong because the tree
isn't a specimen, but what else is it going to be typed as?  There
isn't (yet) an appropriate class to put it in.

Although these two problems (wrong typing and using a term with the
wrong kind of object, which are actually different manifestations of
the same class-based problem) are naughty, realistically very few
people are actually using a system that is "semantic-aware" (e.g.
serving and consuming RDF), so right now making those mistakes
doesn't really "break" anything.  Most data providers are using
traditional databases or even Excel spreadsheets where the DwC terms
are just column headings with no real "meaning" other than what the
data managers intend for them to mean.  So if a manager has a table
where each line contains a record for a specimen and has a column
entitled "dwc:catalogNumber", there isn't really anything other than
an idea in the manager's head that the catalogNumber is a property of
a specimen or Occurrence or CollectionObject.  If each line in the
database table is "flat" such that one specimen=one
CollectionObject=one Occurrence, all that is required to make
catalogNumber be a property of a CollectionObject instead of an
Occurrence is a different way of thinking in the manager's mind,
because there are really no semantics embedded in the table.  We
are already doing this kind of mental gymnastics with existing
classes like dwc:Identification.  If our hypothetical database
manager has a column heading that says "dwc:identifiedBy" in the
specimen table, that is really a property of dwc:Identification,
not dwc:Occurrence, but again that is a distinction that is only
going to be made in the manager's mind.  Making the distinction
really only becomes an issue when the database stops being "flat"
for a particular relationship, e.g. if the database wants to allow
multiple Identifications per specimen record.  Then the database
structure must be changed accordingly to accommodate that
"normalization".

What we have here at the present moment is a situation where data
providers don't have any way to have anything but "flat" records
where 1 specimen=1 Occurrence=1 Organism.  By adding the Organism and
CollectionObject classes, we allow people who need or want to have
less "flat" (=more "normalized") databases to have something to
call the entities that are represented by the new tables they
create to handle 1:many relationships instead of 1:1 relationships.
Anybody who only cares about 1:1 relationships really doesn't need
to worry about the fact that the new class exists, just as people
currently don't have to worry about the Identification class if
they only allow one Identification per specimen in their database.

So I guess what I'm saying is that if a database manager has a
table labeled Occurrence, they really don't have to freak out if we
now tell them that their table actually should be labeled
CollectionObject as long as there is only one CollectionObject per
Occurrence.  They didn't freak out before when we told them that
they should call their table "Occurrence" instead of "Observation"
or "Specimen" in 2009, did they?

I think what I'm saying here is what Rich was trying to say in the
paragraph I quoted, but I'm not sure.

Steve

--
Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University
Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707

http://bioimages.vanderbilt.edu



_______________________________________________
tdwg-content mailing list

tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content

