<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META content=text/html;charset=ISO-8859-1 http-equiv=Content-Type>

<META name=GENERATOR content="MSHTML 8.00.6001.18939">

<STYLE></STYLE>

</HEAD>

<BODY bgColor=#ffffff text=#000000>

<DIV><FONT size=2 face=Arial>Dear Moderator,</FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>Could you please&nbsp;unsubscribe me from all TDWG 

mailing lists? Unfortunately, I have been flooded with TDWG emails since 

8th October without having asked for them. </FONT><FONT size=2 face=Arial>Most of the subjects are 

very&nbsp;interesting, but I have my job to do.</FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>Yours, Yuri </FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 

face=Arial>----------------------------------------------------------------------<BR>Dr. 

Yury Roskov<BR>Catalogue of Life Executive Editor<BR>School of Biological 

Sciences<BR>The Harborne Building<BR>The University of Reading<BR>Reading, RG6 

6AS, UK<BR>&nbsp;<BR>Tel. +44 (0) 118 378 6466<BR>Fax +44 (0) 118 378 

8160<BR>E-mail: <A 

href="mailto:y.roskov@reading.ac.uk">y.roskov@reading.ac.uk</A><BR>&nbsp;<BR><A 

href="http://www.sp2000.org">www.sp2000.org</A>, <A 

href="http://www.catalogueoflife.org">www.catalogueoflife.org</A><BR>EC 

projects: <A href="http://www.4d4life.eu">www.4d4life.eu</A>, <A 

href="http://www.i4life.eu">www.i4life.eu</A> 

<BR>----------------------------------------------------------------------<BR></FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<BLOCKQUOTE 

style="BORDER-LEFT: #000000 2px solid; PADDING-LEFT: 5px; PADDING-RIGHT: 0px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px">

  <DIV style="FONT: 10pt arial">----- Original Message ----- </DIV>

  <DIV 

  style="FONT: 10pt arial; BACKGROUND: #e4e4e4; font-color: black"><B>From:</B> 

  <A title=steve.baskauf@vanderbilt.edu 

  href="mailto:steve.baskauf@vanderbilt.edu">Steve Baskauf</A> </DIV>

  <DIV style="FONT: 10pt arial"><B>Cc:</B> <A title=tdwg-content@lists.tdwg.org 

  href="mailto:tdwg-content@lists.tdwg.org">tdwg-content@lists.tdwg.org</A> 

  </DIV>

  <DIV style="FONT: 10pt arial"><B>Sent:</B> Wednesday, October 27, 2010 11:42 

  AM</DIV>

  <DIV style="FONT: 10pt arial"><B>Subject:</B> Re: [tdwg-content] Treatise on 

  Occurrence, tokens, and basisOfRecord</DIV>

  <DIV><BR></DIV>Please note that in various examples, I have incorrectly placed 

  rdf:type in the namespace rdfs: (<A class=moz-txt-link-freetext 

  href="http://www.w3.org/2000/01/rdf-schema#">http://www.w3.org/2000/01/rdf-schema#</A>) 

  rather than rdf: (<A class=moz-txt-link-freetext 

  href="http://www.w3.org/1999/02/22-rdf-syntax-ns#">http://www.w3.org/1999/02/22-rdf-syntax-ns#</A>).&nbsp; 

  Thanks to Bob for pointing out this serious error.&nbsp; <BR><BR>Also, the ACS 

  model information is very cool.&nbsp; I wish I'd seen it a long time 

  ago.&nbsp; I especially like the giant relationship chart.&nbsp; Thanks Stan 

  and Rich.<BR>Steve<BR><BR>Steve Baskauf wrote: 

  <BLOCKQUOTE cite=mid:4CC3B6B2.3020808@vanderbilt.edu 

    type="cite">Rich,<BR>Thanks for taking the time to read the whole 

    thing.&nbsp; Based on the first series of comments you made, it seems as 

    though we are in agreement on most points.&nbsp; I think that what I wrote 

    was (as I had anticipated) somewhat less clear due to my misuse of (or 

    failure to use) some appropriate terms to describe what I was talking about.&nbsp; For 

    example, when I said "atomized" I probably should have said something like 

    "fine-grained" and correct use of the term "normalized" would have 

    helped.&nbsp; Some other comments inline:<BR><BR>Richard Pyle wrote:<BR>

    <BLOCKQUOTE cite=mid:3D9122E1A02541B2B3B08FE1673E3755@RLPLaptop 

      type="cite"><BLOCKQUOTE type="cite"><PRE wrap="">I believe that historically the assumed token model has been

the one which most people have had in mind.

    </PRE></BLOCKQUOTE><PRE wrap=""><!---->

Actually, I've always envisioned it as you have in your token-explicit

version (and have said as much at various meetings to discuss DwC, going

back to 1.0).  In fact, I remember discussing this exact issue with Stan

Blum long before DwC existed (he was the first to suggest to me the term

"evidence" in this context -- which I think is functionally equivalent to

your "token"). However, I've conceded that this level of normalization

would probably be too much for the intended purpose of the DwC terms.  But

I'll keep an open mind on that.

  </PRE>

      <BLOCKQUOTE type="cite"><PRE wrap="">Before the new DwC standard, we had specimens and we had

observations.  In order to avoid redundancies in terms for

those two types of "things", a combined "thing" called

"Occurrence" was created.  An Occurrence that was an

observation didn't have a token and an Occurrence that

was a specimen had a physical or living specimen as its

token.

    </PRE></BLOCKQUOTE><PRE wrap=""><!---->

My rationalization of it in the early days (pre-DwC) was that *everything*

was effectively an observation, and beyond that, the only question was a

matter of evidence.  In my earliest models, I categorized "evidence" into

"Specimen", "Image", "Literature Report", and "Unvouchered Observation" (I

was using the word "voucher" in the general sense, as in the verb "to vouch"

-- not in the more specific sense for our community, which implies "Specimen

preserved in Museum").  My read on the history of DwC is that it was

initially established as a means to aggregate and/or share Specimen data

amongst Museums (hence its Specimen-centric nature).  Later, the

Specimen/Observation dichotomy was introduced so that DwC content could give

more sophisticated and complete representations of the occurrence of

organisms in place and time, because there was much more information than

what existed as specimens in Museums.  In my mind, the "Observation" side

was effectively a collapsing of my "Image", "Literature Report" and

"Unvouchered Observation" -- which I was OK with in the context of the time.

Because at the time, the vast majority of content available in computer

databases came from museum specimen databases, and from observational

databases (largely in the bird realm).

  </PRE></BLOCKQUOTE>Well, I'm not surprised that the ideas that I'm trying 

    to put down in words and diagrams predate my entry into this arena a year 

    and a half ago.&nbsp; What is a bit frustrating to me is that ideas like 

    these aren't laid out in an easy-to-understand fashion and placed in 

    easy-to-find places.&nbsp; I have spent much of the last year and a half 

    trying to understand how the whole TDWG/DwC universe is supposed to fit 

    together.&nbsp; I think that the idea of having the Google Code site where 

    there are explanations and examples for the various DwC terms is the kind of 

    thing we need.&nbsp; Unfortunately, most of the terms do not yet have 

    entries there.&nbsp; Perhaps I'm just impatient.&nbsp; If it turns out that 

    any of the summaries that I've written here accurately reflect any kind of 

    consensus, then maybe someone could "clean them up" (i.e. use correct 

    technical terms after giving definitions of what they mean) and paste them 

    somewhere where people can find them.&nbsp; That would prevent another 

    person 10 years from now re-articulating the same ideas a third time.&nbsp; 

    I'm particularly thinking of the summary diagram <A 

    class=moz-txt-link-freetext 

    href="http://bioimages.vanderbilt.edu/pages/token-explicit.gif" 

    moz-do-not-send="true">http://bioimages.vanderbilt.edu/pages/token-explicit.gif</A> 

    along with an explanation of how people use the more normalized and more 

    flattened versions of it.&nbsp; We already do have quite lucid examples in 

    the Simple Darwin Core (flattened) and Darwin Core XML guide (normalized), 

    but some sort of overview of the big picture might be helpful.&nbsp; If an 

    RDF guide ever gets off the ground, that would be another example of how the 

    relationships assumed in DwC are expressed in a very explicit way.&nbsp; 

<BR>

    <BLOCKQUOTE cite=mid:3D9122E1A02541B2B3B08FE1673E3755@RLPLaptop type="cite"><PRE wrap="">So...I see the current iteration of DwC as another step in the evolution of

moving from "sharing and aggregating specimen data among museums" to

"documenting biodiversity in nature".  It's not all the way into the fully

normalized representation of biodiversity data, but it's far enough that it

is a nice compromise between practical and effective for the majority of the

user constituency. In my mind, the next logical step in this evolutionary

trajectory would be to recognize "Individual" as a class (which DwC is

already primed for, via individualID).

  </PRE></BLOCKQUOTE>I think I understand the message that you are trying to 

    convey above and in your later comments about creating new versions of DwC 

    (or new evolutionary states of DwC) that don't break the previous 

    ones.&nbsp; I think that is one reason why the process of examining and 

    clearly articulating the community consensus on what Darwin Core terms and 

    classes "mean" and how they are connected to each other is so important 

    before we embark on implementing GUIDs and RDF.&nbsp; Pete has suggested 

    that we may need a second version of DwC in order to make it work in the 

    Linked Open Data world and he's probably right.&nbsp; I'm not sure that the 

    existing vocabulary has all of the terms we need to do that.&nbsp; However, 

    if we are going to "evolve" Darwin Core so that it will work in the LOD 

    world, I hope that we do it in such a way that we maintain the same 

    "meaning" of things as Darwin Core 1.0.&nbsp; I think that is the way to 

    maintain the kind of "stability" that you described below.&nbsp; <BR>

    <BLOCKQUOTE cite=mid:3D9122E1A02541B2B3B08FE1673E3755@RLPLaptop 

      type="cite"><BLOCKQUOTE type="cite"><PRE wrap="">Unlike specimens where the token's metadata terms are placed in the

Occurrence class, I guess in the case of an image one is supposed

to use associatedMedia to link the so-called MachineObservation to

the image record.  If DNA were extracted, one would link the

sequence to the Occurrence using associatedSequences (although

it's not clear to me what the basisOfRecord for that would be -

"TookATissueSample"?).  But what does one do for other kinds of

tokens, like seeds or tissue samples - create terms like

associatedSeed and associatedTissueSample?

    </PRE></BLOCKQUOTE><PRE wrap=""><!---->

In my mind, things like seeds, tissue samples, and DNA sequences are simply

different kinds of specimens (just like dried skeletons vs. botanical

pressed sheets vs. whole organisms in jars of alcohol vs. prepared skins,

etc.)  They may have certain properties specific to each subclass of

specimen, but fundamentally I think it's fair to treat them as specimens.

DNA sequences are a bit different, of course, because they are not the

"stuff" of an organism, but rather an indirect representation of the

"stuff".  In my mind, that difference justifies associatedSequences, where

we don't have associatedSeeds, associatedTeeth, associatedSkins,

associatedSkeletons, etc.

  </PRE></BLOCKQUOTE>Your point is well taken in that we don't need a 

    proliferation of types of associated tokens.&nbsp; We need as many different 

    token "types" as we have coherent sets of metadata terms.&nbsp; One of the 

    points of typing resources is to let potential users know what kinds of 

    metadata properties (terms) they can reasonably expect to receive about that 

    resource.&nbsp; If one will receive the same set of properties about two 

    kinds of resources (e.g. skins and skeletons), there is no reason to type 

    them differently.&nbsp; The point that I was trying to get at (eventually) 

    was that it was inconsistent to say that images need to be referenced as 

    associatedMedia and sequences needed to be referenced as 

    associatedSequences, and yet not say that specimens needed to be referenced 

    as "associatedSpecimens".&nbsp; I actually think that based on Roger's 

    explanation of "to subclass or not" (<A class=moz-txt-link-freetext 

    href="http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot" 

    moz-do-not-send="true">http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot</A>), 

    it makes more sense to talk about using a generic "hasToken" or "tokenID" 

    along with "tagging" the token using rdf:type (as I suggested toward the 

    end of my "treatise") rather than a bunch of associatedXXXX terms.&nbsp; 

    <BR><BR>
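    As a minimal sketch of the generic-link idea, the snippet below builds two 
    triples as plain Python tuples: one linking an Occurrence to a token, and one 
    tagging the token with rdf:type.&nbsp; The two W3C namespace URIs are the ones 
    given in the correction at the top of this message; the example.org namespace 
    and the names hasToken and StillImage are assumptions made only for 
    illustration and are not existing Darwin Core terms.<BR>
<PRE wrap="">
```python
# Namespace URIs as given in the correction at the top of this message.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical namespace, used only for illustration.
EX = "http://example.org/terms/"

PREFIXES = {"rdf": RDF, "rdfs": RDFS, "ex": EX}


def expand(curie, prefixes):
    """Expand a prefix:localName string into a full URI."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local


def link_token(occurrence_uri, token_uri, token_type):
    """Return two triples: a generic link plus an rdf:type tag on the token."""
    return [
        # ex:hasToken is an assumed property name, not a Darwin Core term.
        (occurrence_uri, expand("ex:hasToken", PREFIXES), token_uri),
        # rdf:type lives in the RDF syntax namespace, not in rdfs:.
        (token_uri, expand("rdf:type", PREFIXES), expand(token_type, PREFIXES)),
    ]


triples = link_token(
    "http://example.org/occurrence/1",
    "http://example.org/image/1",
    "ex:StillImage",
)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```
</PRE>
    With this shape, adding a new kind of token means adding a new type value 
    rather than a new associatedXXXX property.<BR><BR>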

    <BLOCKQUOTE cite=mid:3D9122E1A02541B2B3B08FE1673E3755@RLPLaptop type="cite"><PRE wrap="">  </PRE>

      <BLOCKQUOTE type="cite"><PRE wrap="">If we accept the explicit token model, then as a biodiversity

informatics resource type "observation" will have to disappear

into a puff of nothingness

    </PRE></BLOCKQUOTE><PRE wrap=""><!---->

Not necessarily.  See my comment earlier about patterns on neurons in a

human brain that constitute a memory.  Just as a digital image rendered on a

hard disk requires certain machinery to convert into photons that strike our

retinas (i.e., a computer and monitor), so too does a memory require such

machinery (e.g., the brain itself, transmission of sound waves via vocal

cords, sound waves striking ear drums, etc.)  This may sound weird, but I'm

being serious: a human memory is, fundamentally, every bit as much of a

"token" as a specimen or a digital image.  It's just considerably less

accessible and well-resolved.

  </PRE></BLOCKQUOTE>I guess I'm thinking about this in terms of a token 

    being something to which we can assign an identifier and retrieve a 

    representation (a la representational state transfer).&nbsp; Although I 

    don't deny the existence of memory patterns in neurons that are associated 

    with a HumanObservation, there isn't any way that we can receive a 

    representation of that memory directly.&nbsp; If the person draws a sketch 

    of what he/she remembers, then we have a media item that we can convert into 

    a digital form and transmit through the Internet (a token).&nbsp; If the 

    person types up notes, then we have a text document (a token that can 

    also be delivered as a digital file or scan of a typewritten page).&nbsp; On 

    the other hand, if the person simply records the values of recordedBy, 

    eventDate, and Location terms, then we have only Occurrence metadata (no 

    token).&nbsp; If someone claims "basisOfRecord=HumanObservation" and has no 

    token of any kind, then what is there that is deliverable other than the 

    basic Occurrence metadata?&nbsp; That's why I'm claiming that 

    basisOfRecord=HumanObservation simply corresponds to an Occurrence record 

    with no token.<BR>
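    <BR>A rough sketch of that argument in code, assuming a token is anything with 
    an identifier from which a representation can be retrieved.&nbsp; The class 
    names, field names, and example values below are hypothetical and chosen 
    only to mirror the wording above; they are not part of Darwin Core.<BR>
<PRE wrap="">
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    """Something with an identifier and a retrievable representation."""
    token_id: str
    token_type: str  # e.g. "StillImage", "Text", "PreservedSpecimen"


@dataclass
class Occurrence:
    """Basic Occurrence metadata plus zero or more tokens."""
    recorded_by: str
    event_date: str
    location: str
    tokens: List[Token] = field(default_factory=list)


def deliverables(occurrence):
    """List what a consumer could actually retrieve for this Occurrence."""
    items = ["Occurrence metadata"]
    items.extend(token.token_id for token in occurrence.tokens)
    return items


def is_tokenless(occurrence):
    """True when only Occurrence metadata exists (no token of any kind)."""
    return not occurrence.tokens


# A sighting with nothing but recordedBy, eventDate, and Location values.
sighting = Occurrence("A. Observer", "2010-10-27", "Nashville, TN")

# The same sighting after the observer draws a sketch and it is digitized.
sketched = Occurrence(
    "A. Observer", "2010-10-27", "Nashville, TN",
    tokens=[Token("http://example.org/sketch/1", "StillImage")],
)

print(is_tokenless(sighting), deliverables(sighting))
print(is_tokenless(sketched), deliverables(sketched))
```
</PRE>
    In this sketch, the tokenless case is the one that would correspond to 
    basisOfRecord=HumanObservation.<BR>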

    <BLOCKQUOTE cite=mid:3D9122E1A02541B2B3B08FE1673E3755@RLPLaptop type="cite"><PRE wrap="">  </PRE>

      <BLOCKQUOTE type="cite"><PRE wrap="">Realistically, I can't see this kind of separation ever happening,

given the amount of trouble it's been just to get a few people

to admit that Individuals exist.

    </PRE></BLOCKQUOTE><PRE wrap=""><!---->

I don't think the issue was ever in convincing people that Individuals exist

-- that much, I think, was clear to everyone (as proof: see

dwc:individualID).  The issue was always more about where the current DwC

should lie on the scale of highly flattened (e.g., DwC 1.0) to highly

normalized (e.g., ABCD and CDM).  It's necessarily a compromise between

modelling the information "as it really is", vs. modelling the information

in a way that's both accessible to the majority of content providers, and

useful to the majority of content consumers.  I think we both understand

what the trade-offs are in either direction. The question is, what is the

"sweet spot" for the majority of our community at this time in history?

I would venture that at the time DwC 1.0 was developed, that hit the sweet

spot reasonably well.  As more content holders develop increasingly

sophisticated DBMS for their content, and as the user community delves into

increasingly sophisticated analyses of the data, the "sweet spot" will shift

from the flattened end of the scale to the normalized end of the scale. And,

I would hope, DwC will evolve accordingly.

  </PRE>

      <BLOCKQUOTE type="cite"><PRE wrap="">It is just too hard to get motion to happen in the TDWG community.

    </PRE></BLOCKQUOTE><PRE wrap=""><!---->

People make the same complaint about another organization that I'm involved

with (ICZN).  But here's the thing: as in the case of nomenclature,

stability in itself can be a very important thing.  If DwC changed every six

months, then by the time people developed software apps to work with it,

those apps would already be obsolete.  If someone writes code that consumes

DwC content as expressed in the current version of DwC, then that code may

break if people start providing content with class:individual and

class:token content.  If our community is going to move forward

successfully, I think standards like DwC need to evolve in a punctuated way,

rather than a gradualist way (same goes for the Codes of nomenclature). That

is, a bit of inertia in the system is probably a good thing.

  </PRE>

      <BLOCKQUOTE type="cite"><PRE wrap="">OK, I've now gone on for eight pages of text explaining the

rationale behind the question.  So I'll return to the basic

question: is the consensus for modeling the relationship

between an Occurrence and associated token(s) the assumed

token model:

      <A class=moz-txt-link-freetext href="http://bioimages.vanderbilt.edu/pages/token-assumed.gif" moz-do-not-send="true">http://bioimages.vanderbilt.edu/pages/token-assumed.gif</A>

      or the explicit token model:

      <A class=moz-txt-link-freetext href="http://bioimages.vanderbilt.edu/pages/token-explicit.gif" moz-do-not-send="true">http://bioimages.vanderbilt.edu/pages/token-explicit.gif</A>

      ?

    </PRE></BLOCKQUOTE><PRE wrap=""><!---->

Here's how I would answer:  When modelling my own databases, tracking my own

content, I would *definitely* (and indeed already have, for a long time now)

go with the token-explicit.

But when deciding on a community data exchange standard (i.e., DwC),

compromise between flat and normalized is still a necessity, and as such,

the answer in terms of modifying DwC needs to take into account the form of

the bulk of the existing content, the needs of the bulk of the existing

users/consumers, and the virtues of stability of Standards in a world where

software app development time stretches for months or years.

Maybe the answer to this is to treat different versions of DwC as

concurrent, rather than serial.  That is, as long as the next most

sophisticated version can easily be "collapsed" to all previous versions

(aka, backward compatibility), then maybe we just need a clear mechanism for

consuming applications to indicate desired DwC version. That way, apps

developed to work with v2.1 can indicate to a provider that is capable of

producing v3.6 content, that they want it in v2.1 format.  Assuming we

maintain backward compatibility (i.e., the more-normalized version can be

easily collapsed to the more flattened version), then it should be a very

simple matter for the content provider to stream the same content in v2.1

format.

  </PRE></BLOCKQUOTE>Yes, I agree about this concept.&nbsp; I think that 

    what I'm really advocating for is that we agree on what the most normalized 

    model is that will connect all of the existing Darwin Core classes and 

    terms.&nbsp; In that sense, when I'm asking for Individual to be accepted as 

    a class, I'm not arguing for a "new" thing, I'm arguing for a clarification 

    of what we mean when we use the existing term dwc:individualID.&nbsp; When 

    I'm asking for terms to facilitate a logically consistent way to connect 

    Occurrences with their tokens, I'm also not really asking for an expansion 

    of Darwin Core, I'm asking for a more consistent model than "subclassing" by 

    using associatedMedia and associatedSequences but not using 

    "associatedSpecimens".&nbsp; I think that this is important because if we 

    don't agree on these things, we are going to have a royal mess on our hands 

    if we start trying to develop an RDF guide for Darwin Core.&nbsp; As 

    an eternal optimist, I think that describing a fully normalized model that 

    can be translated into RDF can be achieved with only a few minor additions 

    to the existing terms as opposed to requiring a complete new version.&nbsp; 

    If we really need to completely rewrite Darwin Core for RDF I don't have any 

    delusions that it will be accomplished before I retire.&nbsp; <BR>
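    <BR>To make the "collapsing" idea concrete, here is a small sketch of 
    flattening a more-normalized record into flat rows.&nbsp; The nested layout, 
    the identifiers, and the function name are assumptions for illustration; 
    only individualID, occurrenceID, eventDate, and associatedMedia are meant 
    as existing Darwin Core term names.<BR>
<PRE wrap="">
```python
# A hypothetical normalized record: one Individual, its Occurrences,
# and the tokens attached to each Occurrence.
normalized = {
    "individualID": "ind-1",
    "occurrences": [
        {
            "occurrenceID": "occ-1",
            "eventDate": "2010-10-27",
            "tokens": [{"type": "StillImage", "id": "http://example.org/img/1"}],
        },
        {"occurrenceID": "occ-2", "eventDate": "2010-10-28", "tokens": []},
    ],
}


def collapse(individual):
    """Flatten the nested record into one flat row per Occurrence."""
    rows = []
    for occ in individual["occurrences"]:
        # Image tokens collapse into the associatedMedia list of the flat row.
        media = [t["id"] for t in occ["tokens"] if t["type"] == "StillImage"]
        rows.append({
            "individualID": individual["individualID"],
            "occurrenceID": occ["occurrenceID"],
            "eventDate": occ["eventDate"],
            "associatedMedia": " | ".join(media),
        })
    return rows


for row in collapse(normalized):
    print(row)
```
</PRE>
    The flattening only goes one way: the flat rows cannot be turned back into 
    the nested form without the explicit relationships, which is why agreeing on 
    the most normalized model first seems like the safer order.<BR>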

    <BLOCKQUOTE cite=mid:3D9122E1A02541B2B3B08FE1673E3755@RLPLaptop type="cite"><PRE wrap="">But now I'm dabbling in areas that are WAY outside my scope of expertise...

Anyway...I would reiterate that I, for one, appreciate that you took the

time to write all this down (took me over 3 hours to read &amp; respond -- so

obviously I care! -- of course, I'm waiting for a taxi to go to the airport,

so really not much else for me to do right now).  If I didn't reply to parts

of your message, it was either because I agreed with you and had nothing to

elaborate or expound upon, or I didn't really understand (e.g., all the rdf

stuff).

  </PRE></BLOCKQUOTE>Again, thanks for taking the time to read and 

    comment.<BR><BR>Steve<BR><PRE class=moz-signature cols="72">-- 

Steven J. Baskauf, Ph.D., Senior Lecturer

Vanderbilt University Dept. of Biological Sciences

postal mail address:

VU Station B 351634

Nashville, TN  37235-1634,  U.S.A.

delivery address:

2125 Stevenson Center

1161 21st Ave., S.

Nashville, TN 37235

office: 2128 Stevenson Center

phone: (615) 343-4582,  fax: (615) 343-6707

<A class=moz-txt-link-freetext href="http://bioimages.vanderbilt.edu" moz-do-not-send="true">http://bioimages.vanderbilt.edu</A>

  </PRE></BLOCKQUOTE><BR>

</BLOCKQUOTE></BODY></HTML>