<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

OK, let's look at a concrete example.&nbsp; Take the specimen that is

illustrated at

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0138.rdf">http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0138.rdf</a><br>

The identifier

<a class="moz-txt-link-rfc2396E" href="http://www.cyberfloralouisiana.com/specimens/lsu000/0138">"http://www.cyberfloralouisiana.com/specimens/lsu000/0138"</a> could be

associated with this "thing".&nbsp; (It actually isn't; I just made it up

for an example.&nbsp; If you wish, you could substitute a UUID.)&nbsp; Now let's

say that someone previously had asserted:<br>

occurrenceID=<a class="moz-txt-link-freetext" href="http://www.cyberfloralouisiana.com/specimens/lsu000/0138">http://www.cyberfloralouisiana.com/specimens/lsu000/0138</a><br>

with the understanding that the Occurrence represented not only the

fact that a plant identified as&nbsp;<i>Egeria densa</i> occurred at the

location N 29.79 deg., W 90.632 deg. on 7 Sep 1977 but that it also

represented the actual dried plant specimen itself (i.e. the evidence

that the plant occurred there).&nbsp; This is the meaning of Occurrence that

was implied (but not stated very explicitly) in the 2009 Darwin Core

standard.&nbsp; <br>

<br>

Under the new definition of Occurrence that is under consideration, the

Occurrence represents the fact that a plant identified as&nbsp;<i>Egeria

densa</i> occurred at the location N 29.79 deg., W 90.632 deg. on 7 Sep

1977.&nbsp; These metadata fall under the

occurrenceID=<a class="moz-txt-link-freetext" href="http://www.cyberfloralouisiana.com/specimens/lsu000/0138">http://www.cyberfloralouisiana.com/specimens/lsu000/0138</a>.&nbsp;

Technically, the actual dried plant specimen itself is now not part of

the Occurrence but rather is a CollectionObject. Does that break

something?&nbsp; Does it force the institution to create a new identifier

for the CollectionObject that has just been defined into existence?&nbsp; I

think not.&nbsp; If the particular institution has ONLY occurrence records

for which single pieces of evidence are associated with each

Occurrence, then they have a flat database that does not distinguish

between the Occurrence and the CollectionObject associated with that

Occurrence.&nbsp; The change to the term definition is essentially

irrelevant to that institution.&nbsp; On the other hand, if the institution

adopts a new policy requiring that every collected specimen be

photographed prior to collection and that a DNA sample be taken and

submitted to GenBank, the new definitions provide a way

for them to associate three (or more) CollectionObjects having separate

collectionObjectIDs with the single Occurrence.&nbsp; If they "de-flatten"

their database to accommodate this more "normalized" structure, they

could easily implement a rule like 'put "#sp" after the identifier for

the Occurrence to construct a default identifier for the single

CollectionObject associated with that Occurrence' (e.g.

collectionObjectID=<a class="moz-txt-link-freetext" href="http://www.cyberfloralouisiana.com/specimens/lsu000/0138#sp">http://www.cyberfloralouisiana.com/specimens/lsu000/0138#sp</a>

for the CollectionObject associated with the Occurrence having

occurrenceID=<a class="moz-txt-link-freetext" href="http://www.cyberfloralouisiana.com/specimens/lsu000/0138">http://www.cyberfloralouisiana.com/specimens/lsu000/0138</a>),

or make up new identifiers for the CollectionObjects if they want.&nbsp; But

no TDWG "Big Brother" is making them change their database structure or

add new identifiers unless they want to.<br>
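<br>
A minimal sketch of that default-identifier rule, assuming Python and a made-up helper name (default_collection_object_id). The "#sp" suffix is only the local convention suggested above, not something Darwin Core prescribes:<br>
<pre>
```python
def default_collection_object_id(occurrence_id, suffix="#sp"):
    """Build a default collectionObjectID from an occurrenceID.

    Hypothetical helper: appending "#sp" is a local rule an institution
    might adopt for the single CollectionObject tied to an Occurrence.
    """
    # Refuse identifiers that already carry a fragment, so the rule
    # never produces an ID with two "#" parts.
    if "#" in occurrence_id:
        raise ValueError("occurrenceID already contains a fragment")
    return occurrence_id + suffix


# The made-up example identifier from the message above.
occurrence_id = "http://www.cyberfloralouisiana.com/specimens/lsu000/0138"
collection_object_id = default_collection_object_id(occurrence_id)
print(collection_object_id)
```
</pre>
An institution that later adds photographs or DNA samples could call the same helper with other suffixes of its own choosing, or mint unrelated identifiers instead.<br>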

<br>

To put this in perspective, look at

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0429.rdf">http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0429.rdf</a><br>

Here we have a specimen that has two dwc:Identifications, one that

asserts the taxon <i>Juncus diffusissimus</i> sensu L. Urbatsch and

one that asserts <i>Juncus debilis</i> sensu G. Montz.&nbsp; Accommodating

a single specimen which has two dwc:Identifications in a completely

flat database presents exactly the same problems as accommodating two

CollectionObjects for a single Occurrence. All of the issues that have

been raised for separating CollectionObjects from Occurrences apply

equally well to creating the Identification class (like "do I have to

assign new separate identifiers for the Identification instances" and

"do I 'break' things if I allow multiple Identifications in a world

where people have databases that permit only a single determination per

specimen/occurrence/organism").&nbsp; I don't hear anybody gnashing their

teeth or frothing at the mouth

about the fact that we let the term dwc:Identification sneak into the

2009 Darwin Core and mess up our nice perfectly flat database world.&nbsp;

Can somebody explain to me how the issues raised with CollectionObject are

different from these?<br>
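<br>
To make the parallel concrete, here is a rough Python sketch of one specimen carrying two Identifications, and of what happens when it is pushed back into a flat table. The class names, the column names and the "lsu000/0429" placeholder identifier are all assumptions made for illustration; they are not taken from the RDF file or from Darwin Core:<br>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Identification:
    scientific_name: str
    sensu: str


@dataclass
class Specimen:
    specimen_id: str  # placeholder identifier, not a real one
    identifications: list = field(default_factory=list)


def flatten(specimen):
    """Return one flat row per Identification.

    A strictly flat table has room for one determination per row, so a
    specimen with two Identifications has to be repeated on two rows.
    """
    return [
        {
            "specimenID": specimen.specimen_id,
            "scientificName": ident.scientific_name,
            "sensu": ident.sensu,
        }
        for ident in specimen.identifications
    ]


specimen = Specimen(
    "lsu000/0429",
    [
        Identification("Juncus diffusissimus", "L. Urbatsch"),
        Identification("Juncus debilis", "G. Montz"),
    ],
)
rows = flatten(specimen)
for row in rows:
    print(row)
```
</pre>
Swapping Identification for CollectionObject in this sketch gives the same one-to-many shape, which is the point of the comparison.<br>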

<br>

Steve<br>

<br>

Richard Pyle wrote:

<blockquote cite="mid:003901cc7241$239e7650$6adb62f0$@bishopmuseum.org"

 type="cite">

  <pre wrap="">Ever since DwC transitioned from a "Federated Schema" to a "Vocabulary",

I've never been entirely clear on what sorts of alterations would break

backward-compatibility, and which are easily handled.  I've heard various

statements from people with much more understanding than I on the

implications of a "Vocabulary" that the classes are really intended as rough

clusters of terms, and it's the definition of terms that matter.  Have I

misunderstood this?  The point being: The only way we are threatening to

"break" DwC is by moving terms from the Occurrence class to two other new

classes.  Does that mean we are no longer allowed to represent those terms

as properties of a record with an OccurrenceID?  The tiny part of my brain

that "gets" ontology wants to believe that backward compatibility of what

would be the new DwC:Occurrence would be maintained with what is the

existing DwC:Occurrence *only* if the new classes ("Organism" and

"CollectionObject") are regarded as subclasses of Occurrence.  But the

slightly less tiny (but still tiny) part of my brain that "gets" information

modeling doesn't think that's the right way to represent the new classes.

Which tiny part of my brain is right? (I'm guessing neither...) Does it even

matter?

Obviously, we want a stable DwC.  But we also want a DwC that meets our

needs.  Clearly, there are needs that are not being met by the existing DwC.

The first question is, are those needs important enough to consider

destabilizing DwC (by introducing two new classes, and shuffling some terms

from one existing class to the new classes)?  The second question is: what

are the real costs/consequences of the "destabilization".  In my mind, the

answer to the first question is increasingly obvious ("yes").  But I don't

have a good feel for the answer to the second question.

Aloha,

Rich

P.S. Greg: I live on the other side of the world from *everyone*, yet that

hasn't prevented me from getting my words in... :-)

  </pre>

  <blockquote type="cite">

    <pre wrap="">-----Original Message-----

From: <a class="moz-txt-link-abbreviated" href="mailto:tdwg-content-bounces@lists.tdwg.org">tdwg-content-bounces@lists.tdwg.org</a> [<a class="moz-txt-link-freetext" href="mailto:tdwg-content-bounces@lists.tdwg.org">mailto:tdwg-content-bounces@lists.tdwg.org</a>] On Behalf Of "Markus D&ouml;ring (GBIF)"

Sent: Tuesday, September 13, 2011 6:59 AM

To: Steve Baskauf

Cc: <a class="moz-txt-link-abbreviated" href="mailto:tdwg-content@lists.tdwg.org">tdwg-content@lists.tdwg.org</a>; "&Eacute;amonn &Oacute; Tuama (GBIF)"

Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review

Hi Steve,

I agree it is a good thing to be more clear about what an occurrence

actually is, and I wouldn't disagree with the proposed 3 classes. Still,

there is a drastic change in the semantics of an existing term,

Occurrence, and I would feel more comfortable if we could tell those

different usages apart. Whether that's via namespace-based versioning of

(all?) Darwin Core terms, through the use of a different term name, or

something else, I don't know.

Don't you think this is an issue? Would you also change an OWL ontology

class definition in the same way, and wouldn't that be harmful to

existing instances?

    </pre>

  </blockquote>

  <blockquote type="cite">

    <pre wrap="">Markus

    </pre>

    <blockquote type="cite">

      <pre wrap="">With regards to Markus' concern about whether people will be able to

know whether somebody is talking about a "new-style" Occurrence or an

"old" Occurrence, I would assert that the "old" Occurrence didn't really

have a clear meaning.  If you review the summary of the discussion on

Occurrence, you can see that it was used to mean at least three

different kinds of "things" by different people.  What John is actually

doing with his proposal is to add clarity about what an Occurrence is

where it didn't exist before.  I think that is a good thing.  If, by the

"old" kind of Occurrence people are meaning that Occurrence is a fancier

name for PreservedSpecimen (which I believe is how some people in the

museum community are thinking of it), then I would say that such a

characterization is incorrect (based on the apparent consensus) and that

clarifying the incorrectness of that view is a really good thing.

      </pre>

    </blockquote>

  </blockquote>

  <blockquote type="cite">

    <blockquote type="cite">

      <pre wrap="">Steve

&Eacute;amonn &Oacute; Tuama (GBIF) wrote:

      </pre>

      <blockquote type="cite">

        <pre wrap="">It would be good to hear from someone who is familiar with the work

going on in the Observations Task Group and could explain how a

generic model for observations/measurements (e.g. OBOE) might help

sort out these issues. It seems to me that we are trying to build in

an ad-hoc manner an increasingly complex model on top of DwC which is

really just a glossary of terms. That does not seem like a good

approach - but I'm no modeller :-) _&Eacute;amonn

-----Original Message-----

From: Dag Endresen (GBIF) [

<a class="moz-txt-link-freetext" href="mailto:dendresen@gbif.org">mailto:dendresen@gbif.org</a>

]

Sent: 13 September 2011 12:18

To: "Markus D&ouml;ring (GBIF)"

Cc:

<a class="moz-txt-link-abbreviated" href="mailto:tdwg-content@lists.tdwg.org">tdwg-content@lists.tdwg.org</a>

; &Eacute;amonn &Oacute; Tuama

Subject: Re: [tdwg-content] Occurrences, Organisms, and

CollectionObjects: a review

 Hi Markus,

 I believe that the discussion here originates from the view that the

"CollectionObject"/"Sample" is a different thing from the "Organism"

-  and that there can be a relationship between

CollectionObjects/Samples  and Organisms that could be difficult to

describe if these things are  identified as the same thing

(occurrenceID). Do you think that the  "Occurrence" would be seen as

a thing different from the proposed  CollectionObject/Sample and

Organism - or as a super-class that would  include

CollectionObjects/Samples and Organisms? Would the semantics of

Occurrence change?

 I fully share your view that the Darwin Core Archive (DwC-A) would

not  be suited to share the full complex relationship between

entities - even  if persistent identifiers would be used. However if

we start to describe  and include other things (core types) than only

the taxon and  occurrences then perhaps the DwC-A could be a useful

way to provide a  simple list of these entities? This could perhaps

provide easier  indexing and discovery of these new entities?

 Dag

 On Tue, 13 Sep 2011 10:03:00 +0200, Markus D&ouml;ring (GBIF) wrote:

        </pre>

        <blockquote type="cite">

          <pre wrap="">I have to say that the change in semantics to the Occurrence class

makes me a bit nervous.

Can someone try to help fight my fears?

DarwinCore has no versioning of namespaces, so there is no way for a

consumer to detect if it's an old-style Occurrence or a new one. I am

currently parsing various RSS feeds and even though it's a mess

having to parse 10 different styles I am glad that at least the

designers made sure they all have their own namespace! Also removing

or renaming terms might cause serious problems. Would discrete

versions of dwc with their own namespace hurt?

Another observation relates to dwc archives and their star schema. As

an index to data that has been flattened there is no problem with

more classes and core row types, but if you want it as a way to

transfer complete normalized data it will not work. But that never

really was the intention and I simply wanted to stress that fact.

Markus

On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:

          </pre>

          <blockquote type="cite">

            <pre wrap="">Richard Pyle wrote:

            </pre>

            <blockquote type="cite">

              <pre wrap="">I'm also wondering if we necessarily need to "break" the

traditional view of the "Occurrence" class in order to implement

Organism and CollectionObject.

As long as we keep in mind that DwC is a vocabulary of terms

focused on representing an exchange standard (rather than a

full-blown Ontology), perhaps Occurrence records can continue to

be represented in the traditional way as "flat" content, but the

Organism and CollectionObject classes allow us to present data in

a somewhat more "normalized" way in those circumstances that call

for it (e.g. tracking individuals or groups over time [Organism],

or managing fossil rocks with multiple taxa [CollectionObject] --

to name just two).

              </pre>

            </blockquote>

            <pre wrap="">I've been thinking about this issue of "backward compatibility"

with respect to Occurrences if the

CollectionObject/Sample/Token/whatever

class is adopted.  I really don't think it is going to be as big of

a deal as people are making it out to be.

It seems to me that the main problems arise in two areas: when one

wants to be clear about typing and when one wants to express

relationships in a system where it is possible to do so through

semantics (like RDF).  In that kind of circumstance, it's bad (oh yeah,

I forgot - the term is "naughty") to say something like

resourceA hasOccurrence resourceB

when resourceB isn't actually an Occurrence.   "Wrong" typing also

happens all the time because the classes don't exist (yet) to do

the typing correctly.  As a case in point, in the Morphbank system,

I have multiple images of the same tree.  In that system the tree

is typed as a "specimen".  That is totally wrong because the tree

isn't a specimen, but what else is it going to be typed as?  There

isn't (yet) an appropriate class to put it in.

Although these two problems (wrong typing and using a term with the

wrong kind of object, which are actually different manifestations of

the same class-based problem) are naughty, realistically very few

people are actually using a system that is "semantic-aware" (e.g.

serving and consuming RDF), so right now making those mistakes

doesn't really "break" anything.  Most data providers are using

traditional databases or

even Excel spreadsheets where the DwC terms are just column

headings with no real "meaning" other than what the data managers

intend for them to mean.  So if a manager has a table where each

line contains a record for a specimen and has a column heading for

a column entitled "dwc:catalogNumber", there isn't really anything

other than an idea in the manager's head that the catalogNumber is

a property of a specimen or Occurrence or CollectionObject.  If each

line in the database table is "flat" such that one specimen=one

CollectionObject=one Occurrence, all that is required to make

catalogNumber be a property of a CollectionObject instead of an

Occurrence is a different way of thinking in the manager's mind

because there are really no semantics embedded in the table.  We

are already doing this kind of mental gymnastics with existing

classes like dwc:Identification .  If our hypothetical database

manager has a column heading that says "dwc:identifiedBy" in the

specimen table, that is really a property of dwc:Identification,

not dwc:Occurrence but again that is a distinction that is only

going to be made in the manager's mind.  Making the distinction

really only becomes an issue when the database stops being "flat"

for a particular relationship, e.g. if the database wants to allow

multiple Identifications per specimen record.  Then the database

structure must be changed accordingly to accommodate that

"normalization".

What we have here at the present moment is a situation where data

providers don't have any way to have anything but "flat" records

where 1 specimen=1 Occurrence=1 Organism.  By adding the Organism and

CollectionObject classes, we allow people who need or want to have

less "flat" (=more "normalized") databases to have something to

call the entities that are represented by the new tables they

create to handle 1:many relationships instead of 1:1 relationships.

Anybody who only cares about 1:1 relationships really doesn't need

to worry about the fact that the new class exists, just as people

currently don't have to worry about the Identification class if

they only allow one Identification per specimen in their database.

So I guess what I'm saying is that if a database manager has a

table labeled Occurrence, they really don't have to freak out if we

now tell them that their table actually should be labeled

CollectionObject as long as there is only one CollectionObject per

Occurrence.  They didn't freak out before when we told them that

they should call their table "Occurrence" instead of "Observation"

or "Specimen" in 2009, did they?

I think what I'm saying here is what Rich was trying to say in the

paragraph I quoted, but I'm not sure.

Steve

--

Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University

Dept. of Biological Sciences

postal mail address:

VU Station B 351634

Nashville, TN  37235-1634,  U.S.A.

delivery address:

2125 Stevenson Center

1161 21st Ave., S.

Nashville, TN 37235

office: 2128 Stevenson Center

phone: (615) 343-4582,  fax: (615) 343-6707

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu">http://bioimages.vanderbilt.edu</a>

_______________________________________________

tdwg-content mailing list

<a class="moz-txt-link-abbreviated" href="mailto:tdwg-content@lists.tdwg.org">tdwg-content@lists.tdwg.org</a>

<a class="moz-txt-link-freetext" href="http://lists.tdwg.org/mailman/listinfo/tdwg-content">http://lists.tdwg.org/mailman/listinfo/tdwg-content</a>

            </pre>

          </blockquote>

        </blockquote>


      </blockquote>


    </blockquote>


  </blockquote>


</blockquote>

<br>

<pre class="moz-signature" cols="72">-- 

Steven J. Baskauf, Ph.D., Senior Lecturer

Vanderbilt University Dept. of Biological Sciences

postal mail address:

VU Station B 351634

Nashville, TN  37235-1634,  U.S.A.

delivery address:

2125 Stevenson Center

1161 21st Ave., S.

Nashville, TN 37235

office: 2128 Stevenson Center

phone: (615) 343-4582,  fax: (615) 343-6707

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu">http://bioimages.vanderbilt.edu</a>

</pre>

</body>

</html>