Steve,<div><br></div><div>I think that it might be useful to point out that there are actually two different &quot;kinds&quot; of DarwinCore, at least as I see it.</div><div><br></div><div>1) The first is the current, more XML-ish version that people use for their records, a format largely intended to be consumed by GBIF.</div>

<div>This is a representation that allows users to map their Excel or RDBMS tables to a standard set of fields.</div><div>This appears to be pretty close to workable. Two areas I am not sure about are:</div><div><br></div>

<div>    Does this form need an &quot;Individual&quot; class as you propose?</div><div>    Should this form allow the use of the geo vocabulary?</div><div><br></div><div>In this version, something like TaxonConceptID is probably fine because the consuming application will know to look at the contents of that field, determine if it is an LSID or a URI and handle it appropriately. Generic semantic web tools will not be able to handle these but if this is</div>

<div>understood ahead of time, it is not a problem. These are not made for generic tools but for specific tools and use cases.</div><div><br></div><div><div>So what I propose is we address the issues of the current XMLish DarwinCore so that it handles some of the issues you mention. </div>

<div><br></div><div>I think Markus would probably have some valuable insight into how this can be done in a way that will work with GBIF.</div></div><div><br></div><div>This should result in a clear form 1 version of the Darwin Core fairly quickly.</div>

<div><br></div><div><div>This allows the submission process to continue without confusion while the issues of the other version get worked out.</div></div><div><br></div><div>For most users, form 1 is all they will need to understand.</div>

<div><br></div><div>2) The second version is a more fully semanticized version, which I think will require a lot more discussion. This version should be understandable</div><div>by the generic semantic web tools and should ideally also work well in the LOD cloud. But it will take some time for us to agree how this should</div>

<div>be done and even longer for the general community to get familiar with it. For most users, form 1 is all they will need to understand.</div><div><br></div><div>Once we have actually figured out a more semantic version that makes sense, GBIF can process and express the submitted data in this version.</div>

<div>(I am oversimplifying here, but I will be more specific below.)</div><div><br></div><div>This process might take a while and I think a number of groups will probably want to stick to submitting their data in DWC form 1.</div>

<div><br></div><div>I think the issues that you and I have run into arise where we try to use form 1 as if it were form 2.</div><div><br></div><div>Some of these issues are relatively easy to solve; others are very tricky and are only visible when you try to run specific kinds of SPARQL queries on the knowledge base.</div>

<div><br></div><div>Others are even more complicated. For instance, I see that there might be a need for several forms of &quot;species concepts&quot;.</div><div><br></div><div>Those that I have created have specific use cases in mind. A lot of the use cases that Steve describes overlap with my use cases.</div>

<div><br></div><div>Most of my concepts are structured to provide basic information and a map to other potentially related information sources.</div><div><br></div><div>What I eventually intend is for them to serve as a form of &quot;key element&quot; that allows different people to repeatably map the same</div>

<div>specimens to the same concepts. </div><div><br></div><div>I think these will eventually be useful as concepts for DWC form 1 because they basically represent a &quot;key element&quot;.</div><div><br></div><div>As currently proposed, do they cover some of the issues and relationships handled by Rich Pyle&#39;s TNC&#39;s?  </div>

<div><br></div><div>No they don&#39;t.</div><div><br></div><div>(We do think that we can make some interesting and useful interlinkages between the TNC&#39;s and the TXN&#39;s)</div><div><br></div><div>There are probably other groups with use cases for species concepts which may necessitate different underlying assumptions.</div>

<div><br></div><div>So, for just the aspects relating to species concepts, there still needs to be a lot more discussion.</div><div><br></div><div>As to the related things like occurrences, and individuals, it is clear to me there are a lot of similarities and overlap. </div>

<div><br></div><div>The differences seem to mainly involve the specifics of the RDF representations.</div><div><br></div><div>There are other aspects where the meaning or mental constructs are different for what - may at first - seem similar. For instance, I would see the</div>

<div>specimens derived from one individual plant but now existing as two separate plants in different locations as separate individuals, in the same way</div><div>as I see identical twins as different individuals. </div>

<div><br></div><div>So we have a slightly different conceptualization of what an &quot;individual&quot; is.</div><div><br></div><div>To solve this we will need to have some sort of meeting or videoconference between those groups that have a pretty good understanding of RDF.</div>

<div>This will allow us to hash these issues out. Here we could explain our different conceptualizations and use cases and see if we can come up with</div><div>some common standards that will allow people to do what they need to do while following a common standard.</div>

<div><br></div><div>This will need some sort of whiteboard enabled discussion, sample data sets, and example SPARQL queries.</div><div><br></div><div>We should also have use cases in the form of SPARQL queries, that allow the resulting knowledge bases to be successfully queried in the ways that people need, and return the kinds of results they expect.</div>

<div><br></div><div>It appears to me that the more semanticized version of the DarwinCore will require RDF that differentiates between:</div><div><br></div><div>1) entries that are literals, </div><div>2) entries that are LOD-compatible URIs</div>

<div>3) entries that are LSIDs dereferenceable via a proxy.</div><div><br></div><div>There are a number of basic issues involving RDF, about which the members will need to have a common understanding.</div><div><br></div>
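<div>As a rough sketch only, the three kinds of entries listed above could be told apart by simple prefix checks. The function names and the proxy address (lsid.example.org) are hypothetical and stand in for whatever a real consuming application would use; the only fixed fact relied on is that LSIDs begin with &quot;urn:lsid:&quot;.</div>
<pre>
```python
# Illustrative sketch: classify a field value as a literal, an HTTP URI, or an LSID.
# PROXY_BASE is a hypothetical resolver address, not a real service.
PROXY_BASE = "http://lsid.example.org/"


def classify_entry(value):
    """Return 'lsid', 'uri', or 'literal' for a field value."""
    lowered = value.strip().lower()
    if lowered.startswith("urn:lsid:"):
        return "lsid"
    if lowered.startswith("http://") or lowered.startswith("https://"):
        return "uri"
    return "literal"


def to_dereferenceable(value):
    """Return an address an HTTP client could fetch, or None for a literal."""
    kind = classify_entry(value)
    if kind == "lsid":
        # An LSID is not an HTTP address, so it is appended to a proxy address.
        return PROXY_BASE + value.strip()
    if kind == "uri":
        return value.strip()
    return None


if __name__ == "__main__":
    for entry in ["Crataegus harbisonii",
                  "http://bioimages.vanderbilt.edu/ind-baskauf/70905",
                  "urn:lsid:example.org:names:12345"]:
        print(classify_entry(entry), to_dereferenceable(entry))
```
</pre>
<div>A form 1 consumer can do this kind of guessing internally; a generic semantic web tool cannot, which is why form 2 would need the distinction to be explicit in the RDF itself.</div><div><br></div>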

<div>Does the entire TDWG community have to understand these nuances? </div><div><br></div><div>No, but the group working on a more semantic version will need to.</div><div><br></div><div>When we think we have something workable, I would suggest that we have others in the semantic web and LOD community look at it to see if we missed something. </div>

<div><br></div><div>If this all goes well, we will have an example data set, and documentation that explains how it works and why specific things are done the way they are.</div><div><br></div><div>This then goes to GBIF (if they want it) where they can work out exposing their current records in this new format.</div>

<div><br></div><div>My guess is that this might take a few passes to get right - simply because that is the nature of the beast. </div><div><br></div><div>GBIF will have to come up with some AI-type error and interpretation system to process data. (Which is a pretty cool problem in itself.)</div>

<div><br></div><div>They seem to already have a lot of this in place.</div><div><br></div><div>The initial example set and a GBIF set can be included in some of the gigantic billion triple challenge and inferencing studies that are going on.</div>

<div><br></div><div>I suspect that these studies will expose some issues and insights that we would never have been able to see ourselves.</div><div><br></div><div>These studies will also feed back into the standards process, where we can figure out how to deal with the strange edge cases they expose.</div>

<div><br></div><div>In the end, we should have a well-vetted format that is also well understood.</div><div><br></div><div>So what I propose is that we do the following to work out a future, more semantic version of the DarwinCore:</div>

<div><br></div><div>We break off a smaller group to work on pulling together the similar semantic representations and come up with a list of use cases and a test set.</div><div><br></div><div>The discussion of this new version should probably be moved to one of the separate lists (as, I think, Markus suggested).</div>

<div><br></div><div>This will avoid confusion between issues in the current DarwinCore and those issues relating to some more fully &quot;semantic&quot; future representation.</div><div><br></div><div>While this potential standard gets worked out, groups can use form 1 of the DarwinCore to submit their records. </div>

<div><br></div><div>As I said previously, I think this might be the only version they really need to be familiar with.</div><div><br></div><div>Does this plan seem to make sense to everyone?</div><div><br></div><div>Respectfully,</div>

<div><br></div><div>- Pete</div><div><br></div><div><br></div><div><br><div class="gmail_quote">On Thu, Oct 14, 2010 at 10:43 AM, Steve Baskauf <span dir="ltr">&lt;<a href="mailto:steve.baskauf@vanderbilt.edu">steve.baskauf@vanderbilt.edu</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div bgcolor="#ffffff" text="#000000">

Thanks for the various replies.  I&#39;m going to try to respond to several

of them in this one.  I realize that these lengthy replies may

overwhelm some readers.  However, I will beg your collective indulgence

because I&#39;ve got a proposal on the table for adding Individual as a

Darwin Core class.  It appears that the submission process is moving

forward and you can consider this as the &quot;pleading of my case&quot; for why

that addition is desirable (and, in my opinion, necessary).<br>

<br>

One point which I think has permeated the Darwin Core discussions since

I&#39;ve started following them is that DwC is designed to facilitate many

uses.  Although somebody might use Occurrence records to make dots on a

distribution map, somebody else might be using the same records to

track the movement of the individual organism as it swims around the

sea.  Somebody else may just be using the location and time metadata to

demonstrate that the photo that they took places the organism in a

reasonable location for the species they assert they have

photographed.  Another person may be using the location and time

metadata to indicate that two species co-occurred at the same location

at the same time.  Darwin Core will be functioning well when it allows

occurrence records to do any of these things or possibly all of these

things at the same time.  The case that I&#39;ll try to make here is that

Darwin Core

mostly allows these things, but lack of an Individual class is making

it difficult to do some of them.  I will illustrate with a couple

examples.<br>

<br>

The first one is the problem of tracking an individual over time.  As

Rich correctly points out, the &quot;new&quot; Darwin Core standard has the term

dwc:individualID which is designed to facilitate exactly this kind of

thing.  In a previous thread when we discussed the appropriate use of

the xxxxID terms, I believe that there was a consensus that using them

as &quot;idrefs&quot; (I can&#39;t remember the technical database term for this, I

mean when an item in a record points to the identifier of another

record) was appropriate.  In a flat &quot;table-based&quot; database system, you

would just have a table of records (i.e. rows) for some kind of

&quot;thing&quot; with a column heading of  &quot;xxxxID&quot;.  You would place the

identifier for

the related other thing in that column.  In the case of

dwc:individualID, the rows would be occurrence records and the entry in

the individualID column would be the identifier for the individual.  In

RDF, you would make

statements asserting the relationship between the thing and the other

thing.  For example, if you wanted to say that a dwc:Identification

asserted that something was a particular dwc:Taxon, you could make the

statement in RDF that [identification] dwc:taxonID [taxon], where

[identification] and [taxon] are instances of those two classes that

have been assigned some kind of (hopefully globally unique)

identifiers.  In the case of asserting that a number of occurrence

records track the same individual over time, in RDF, I would for each

occurrence make the statement [occurrence] dwc:individualID

[individual].  That&#39;s great and I can (and do) do that with Darwin Core

as it exists.  The problem that I face is that in RDF any time that one

makes a statement about a resource (I&#39;m switching to that term because

&quot;thing&quot; is too vague) using an identifier for it (in the form of a URI),

the identifier must dereference (resolve? sorry Bob!) to produce

metadata about the resource.  So when I assign a URI to an individual

organism a semantic client should be able to retrieve information about

the individual.  One of the fundamental pieces of information that a

client should (according to the TDWG GUID applicability statement) be

given about a resource is what type of thing the resource is.  This is

called the &quot;rdf:type&quot; of the resource.  The TDWG Applicability

statement (recommendation 11) says that resources identified by a GUID

&quot;should be typed using the TDWG ontology or other well-known

vocabularies&quot;.  I hate to be cynical about this, but I don&#39;t have

confidence that the TDWG ontology will be ready to use in my lifetime. 

The only &quot;well-known&quot; vocabulary that I know of that will work for this

purpose at the moment is Darwin Core and the Darwin Core classes are

just right for

typing all of the kinds of resources I want to talk about (occurrences,

taxon, identifications, etc.) EXCEPT for Individuals.  I think that

dwc:individualID is the only one of the xxxxID terms that refers to a

type of thing that doesn&#39;t have a class defined for it, hence my

request to add Individual as a class.  At the TDWG meeting, somebody

(Roger maybe?) commented that there isn&#39;t anything that would stop me

from creating my own URI for an Individual class.  That is absolutely

true and I already did that

(<a href="http://bioimages.vanderbilt.edu/rdf/terms#Individual" target="_blank">http://bioimages.vanderbilt.edu/rdf/terms#Individual</a>),

but that

doesn&#39;t make my term &quot;well known&quot;.  I want Individual to be a class in

Darwin Core so that people other than me know what it means.  There is

no way that I can currently follow the &quot;rules&quot; for GUIDs and RDF on

this, and anybody in the future who uses dwc:individualID in RDF is

going to face this same problem (i.e. anyone who wants to track

individuals over time).<br>

<br>
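To make the pattern above concrete, here is a minimal sketch that writes the statements as plain (subject, predicate, object) tuples rather than real RDF.  The occurrence URIs are hypothetical, and dwc:Individual is the class being proposed here, not an existing Darwin Core term.  The helper simply lists resources that are pointed to by dwc:individualID but carry no rdf:type statement.<br>
<pre>
```python
# Sketch only: triples as plain tuples, no RDF library involved.
DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

INDIVIDUAL = "http://bioimages.vanderbilt.edu/ind-baskauf/70905"
# Hypothetical occurrence identifiers, invented for this example.
OCC_1 = "http://example.org/occurrence/1"
OCC_2 = "http://example.org/occurrence/2"

triples = [
    (OCC_1, RDF_TYPE, DWC + "Occurrence"),
    (OCC_1, DWC + "individualID", INDIVIDUAL),
    (OCC_2, RDF_TYPE, DWC + "Occurrence"),
    (OCC_2, DWC + "individualID", INDIVIDUAL),
    # The proposed class: without it there is no well-known type to state here.
    (INDIVIDUAL, RDF_TYPE, DWC + "Individual"),
]


def occurrences_of(individual, statements):
    """Subjects that point to the given individual via dwc:individualID."""
    return sorted(s for s, p, o in statements
                  if p == DWC + "individualID" and o == individual)


def untyped_individuals(statements):
    """Resources referenced by dwc:individualID that have no rdf:type statement."""
    referenced = {o for s, p, o in statements if p == DWC + "individualID"}
    typed = {s for s, p, o in statements if p == RDF_TYPE}
    return sorted(referenced - typed)


if __name__ == "__main__":
    print(occurrences_of(INDIVIDUAL, triples))
    print(untyped_individuals(triples))        # with the typing statement
    print(untyped_individuals(triples[:-1]))   # without it
```
</pre>
Dropping the last tuple leaves the individual referenced but untyped, which is the gap described above.<br>
<br>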

In the case of putting &quot;dots on a map&quot; to show the distribution of a

species, the case is simple if the occurrences are specimens where the

whole dead organism is collected.  It is not so simple with other types

of occurrences.  Let me illustrate with an example.  There is currently

precisely one known individual of Crataegus harbisonii in nature.  I

have given this individual the URI

<a href="http://bioimages.vanderbilt.edu/ind-baskauf/70905" target="_blank">http://bioimages.vanderbilt.edu/ind-baskauf/70905</a>

.  I have

approximately 62 images of that individual at

<a href="http://bioimages.vanderbilt.edu/ind-baskauf/70905.htm" target="_blank">http://bioimages.vanderbilt.edu/ind-baskauf/70905.htm</a>

and

<a href="http://www.cas.vanderbilt.edu/bioimages/species/crha2.htm" target="_blank">http://www.cas.vanderbilt.edu/bioimages/species/crha2.htm</a>

.  Each one

of these images represents an occurrence in that I pressed the shutter

on my camera at different times for each one.  Ron Lance has collected

tissue from this tree for grafting purposes and now has an occurrence

with

basisOfRecord=&quot;LivingSpecimen&quot; in his arboretum in North Carolina. 

Andrea Bishop of the Tennessee Dept of Environment and Conservation has

seeds collected from the tree - I&#39;d call the collection of those seeds

an occurrence record.  I&#39;m pretty sure that there are one or more

specimens from this tree in herbaria (although I&#39;m not sure where).  So

my question to Markus and others at GBIF is: how many dots will you put

on your map for this tree?  65 (one for each occurrence) or 1 (one for

each individual)?  I think the answer should be one, but it isn&#39;t clear

to me how a data aggregator is going to achieve the goal of having one

dot per individual if the basic unit &quot;dot creation&quot; is an occurrence

rather than an individual.  At the present moment, this question seems

like a moot point because most records in big databases like GBIF are

based on one specimen (or observation) per record of an individual, but

that won&#39;t necessarily be the

case in the future if people take multiple live organism images,

perhaps also at the same time they collect a physical specimen.  I

anticipate that one response to this

question will be to call each imaging bout one &quot;observation&quot; having a

number of dwc:associatedMedia references.   That collapses the number

of occurrence records considerably, but not down to one.  I took images

of that tree on at least three separate instances over the course of a

year and Ron collected his graft tissue years before that.  There is

simply no way to reduce the number of occurrences for this tree to one,

nor should we want to.  A possible use of multiple occurrence records

(i.e. my first point above) of this sort might be to establish how long

individuals of Crataegus harbisonii live and each occurrence record

(whether separated by years or by the seconds between shutter clicks)

is a part of the record that we should be able to (and want to)

preserve.  Another use would be to track a non-sessile organism (e.g. a

whale) in both time and space.  In that case, the record on a map for

an individual would be some kind of curve rather than a dot.  But in

any case, recognizing the existence of an entity that I&#39;m calling an

Individual facilitates these broader uses of occurrence data and it&#39;s

really hard for me to see how that is going to happen if we ONLY have

occurrences as separate entities.  Response Markus?  How does GBIF deal

with whale tracks or multiple banded bird observations for a single

bird?<br>

<br>
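The difference between the two counting rules can be sketched in a few lines.  This is only an illustration: the record identifiers and dates are invented, and the rule that a record lacking an individualID counts as its own dot is an assumption of mine, not something an aggregator is known to do.<br>
<pre>
```python
# Sketch only: invented records using Darwin Core term names as dictionary keys.
records = [
    {"occurrenceID": "image-1", "individualID": "ind-70905", "eventDate": "2009-04-12"},
    {"occurrenceID": "image-2", "individualID": "ind-70905", "eventDate": "2009-04-12"},
    {"occurrenceID": "graft-1", "individualID": "ind-70905", "eventDate": "1998-03-01"},
    {"occurrenceID": "specimen-9", "individualID": None, "eventDate": "2001-06-30"},
]


def dots_per_occurrence(recs):
    """One dot for every occurrence record."""
    return len(recs)


def dots_per_individual(recs):
    """One dot per individual; a record with no individualID counts by itself."""
    keys = {r.get("individualID") or r["occurrenceID"] for r in recs}
    return len(keys)


def timeline(recs, individual_id):
    """Distinct dates on which the given individual was documented, in order."""
    dates = {r["eventDate"] for r in recs if r.get("individualID") == individual_id}
    return sorted(dates)


if __name__ == "__main__":
    print(dots_per_occurrence(records))
    print(dots_per_individual(records))
    print(timeline(records, "ind-70905"))
```
</pre>
The same grouping that collapses the dots also yields the per-individual time series, so nothing is lost by keeping every occurrence record.<br>
<br>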

The third compelling reason for recognizing the existence of

Individuals as a resource type is that it is the best way to maintain

the linkage between multiple occurrences of the same individual and

identifications.  (In the oversimplified examples I gave earlier, I

applied a scientific name directly to an individual.  In actual

practice, I relate individuals to identifications and then relate the

identifications to taxa.)  Again, to illustrate with a real-life

example, when Bruce Kirchoff was developing his Woody Plants of the

Southeastern US learning software, he asked a taxonomist to go through

the images of mine that he was using for the project to verify that

they were identified correctly.  My old website just threw together all

images of a particular species onto one page without regard to the

individuals from which they originated (e.g.

<a href="http://www.cas.vanderbilt.edu/bioimages/species/sarar3.htm" target="_blank">http://www.cas.vanderbilt.edu/bioimages/species/sarar3.htm</a>

and

<a href="http://www.cas.vanderbilt.edu/bioimages/species/soam3.htm" target="_blank">http://www.cas.vanderbilt.edu/bioimages/species/soam3.htm</a>). 

It turns

out that I had carelessly misidentified a vegetative Sambucus racemosa

ssp. racemosa individual as Sorbus americana.  The taxonomist asked me

which of the various bark, twig, leaf, etc. images were from the same

plant and the only way I could find out was through the laborious

process of looking for images with similar time/date values and my hand

written field notes.  It was a nightmare finding all of the particular

image records that needed to have their identifications fixed and then

correcting them.  On my new website (e.g.

<a href="http://bioimages.vanderbilt.edu/metadata.htm" target="_blank">http://bioimages.vanderbilt.edu/metadata.htm</a>,

then click on Quercus

chrysolepis), the images are connected to the individual from which

they originated.  If I discover by looking at a particularly

informative image that I have misidentified the individual, I only need

to add an updated determination (i.e. identification) to that

individual&#39;s record and automatically all images from that individual

are displayed with the correct name and are placed on the correct

species page.  Now imagine a situation that is larger and even more

complicated than this (think a Bioblitz).  Herbarium curators and live

plant photographers are working together to document the flora of an

area.  Multiple images and multiple specimens may be collected from the

same individual.  The images may go one place and the specimens may go

to several herbaria (if &quot;duplicates&quot; are distributed).  It&#39;s possible

that people might come back to the same individual later to photograph

or collect fruit having initially seen flowers.  Suppose on down

the line a taxonomist looks at one of the specimen duplicates and

realizes that

the initial identification was wrong (or maybe just wants to assert an

alternative opinion about the identity).  If the record is based on

that individual, then all that is required is for the annotating

taxonomist to add a determination (i.e. dwc:Identification) to the

Individual&#39;s record and poof! all images and duplicate specimens have

that opinion associated with them.  In contrast, if all of these

separate occurrence records are not tied together via the Individual,

and if each individual occurrence record has its own determination,

nobody is possibly going to ever track down and correct every one. 

Granted, the scenario that I&#39;ve suggested is contingent on the

existence of a large scale database that can connect metadata across

institutions, but exactly that kind of thing is what projects like the

US Virtual Herbarium and our Live Plants Imaging group are trying to

create.  Let&#39;s enable this by making it possible within Darwin Core to

have a record structure that is Individual-based.<br>

<br>
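A minimal sketch of the Individual-based record structure, using the misidentification described above.  The identifiers and dates are invented, and treating the most recent determination as the displayed name is a simplifying assumption; a real system might keep alternative opinions side by side.<br>
<pre>
```python
# Sketch only: occurrences point to an individual; identifications attach to the individual.
occurrences = [
    {"occurrenceID": "bark-image", "individualID": "ind-1"},
    {"occurrenceID": "twig-image", "individualID": "ind-1"},
    {"occurrenceID": "leaf-image", "individualID": "ind-1"},
]

identifications = [
    {"individualID": "ind-1", "dateIdentified": "2005-07-01",
     "scientificName": "Sorbus americana"},
]


def current_name(individual_id, idents):
    """Name from the most recent identification of the individual, or None."""
    matching = [i for i in idents if i["individualID"] == individual_id]
    if not matching:
        return None
    # ISO dates sort correctly as strings, so max() picks the latest one.
    latest = max(matching, key=lambda i: i["dateIdentified"])
    return latest["scientificName"]


def displayed_names(occs, idents):
    """Name shown for every occurrence, looked up through its individual."""
    return {o["occurrenceID"]: current_name(o["individualID"], idents) for o in occs}


# One added determination on the individual corrects every linked occurrence.
corrected = identifications + [
    {"individualID": "ind-1", "dateIdentified": "2010-02-15",
     "scientificName": "Sambucus racemosa ssp. racemosa"},
]

if __name__ == "__main__":
    print(displayed_names(occurrences, identifications))
    print(displayed_names(occurrences, corrected))
```
</pre>
<br>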

I recognize that many &quot;specimen-based&quot; organizations aren&#39;t really

going to care one whit about this.  That&#39;s fine.  In their databases

and personal XML schemas they can ignore Individuals as it is their

prerogative.  But when we build RDF templates, I believe strongly that

for the benefit of those of us who care about the broader applications

of occurrences those templates should use individuals to connect (one

or more) occurrences and (one or more) identifications.  For those with

a technical bent, you can see how I have done this for an herbarium

specimen by looking at the page source RDF of the example

<a href="http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0429.rdf" target="_blank">http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0429.rdf</a>

.  For

those of a non-technical bent, just look at the webpage that shows up

when you click on the link.  It looks just like any other web page for

a specimen and you don&#39;t even have to know that the underlying RDF

supports using Individuals as a grouping mechanism.<br>

<br>

In summary, I think we need Individual as a DwC class to enable

understandable rdf:typing of records of individuals and to create a

context in which instances of individuals can be placed (i.e. people

would assign and use identifiers for individuals when they document

occurrences).  These instances (and their assigned URI GUIDs) would

allow for &quot;connecting&quot; identifications and occurrences in a more

meaningful way.  I am not suggesting that the occurrence be dethroned

as the center of biodiversity records.  Assuming that the xxxxID terms

end up being moved out of the various classes and into the record-level

terms area as was suggested recently, I think that there are really

only about two terms that should be put into a new Individual class:

the other new term I have proposed (individualRemarks) and

establishmentMeans (but that is the topic of another email).  It may

seem odd to suggest adding a class that has very few terms in it, but

if you follow my reasoning above you will hopefully understand why I

have done so.  <br>

<br>

I hope that the discussion (and

criticism!) will continue.  Again, I&#39;m interested in hearing

alternatives.<br>

Steve<br>

<br>

Richard Pyle wrote:

<blockquote type="cite"><div><div></div><div class="h5">

  <blockquote type="cite">
    <pre>In many cases, a specimen is created by killing an organism and gluing it
to a piece of paper (if it&#39;s a plant) or putting it in a jar (if it&#39;s an
animal).  It is natural to ask the question &quot;what kind of species is the
specimen?&quot;.  We can look at the specimen and make a statement like
[specimen] dwc:scientificName &quot;Drosophila melanogaster&quot; and it pretty much
makes sense.  However, in the new Darwin Core standard, we have a broader
category of &quot;things&quot; (a.k.a. resources) that we call Occurrences which
include specimens but which also includes observations and probably all
kinds of things like images, DNA samples, and a whole lot of other things.
If we try to apply the same kind of statement to other kinds of Occurrences
besides specimens we immediately run into problems.  If we say that
[digital image] dwc:scientificName &quot;Drosophila melanogaster&quot; we are making
a nonsensical statement.  The digital image can have properties like its
photographer, its format, its pixel dimensions, etc. but the image itself
does not have a scientific name.  The scientific name is a property of the
thing that was photographed.  It makes even less sense if we are talking
about observations.  An observation is a situation where somebody observes
an organism.  The observation can have properties like the observer, the
location, etc.  However, if we say [observation] dwc:scientificName
&quot;Drosophila melanogaster&quot; we are saying that that act of observing has a
scientific name.  That is an incorrect statement.  So the general statement
[Occurrence] dwc:scientificName &quot;Drosophila melanogaster&quot; does not make
sense when applied to all possible types of Occurrences.  Rather, the
organism that we are observing is the thing that has a scientific name.
    </pre>
  </blockquote>

  <pre>OK, I admit that I have not been following this list as closely as I should

have -- especially during the latter half of 2009.  But I have to

ask....seriously....is this the level of misunderstanding that still exists

in our community?

Perhaps I&#39;m the idiot here, but it has *always* been my understanding that

the &quot;thing&quot; (I hesitate to use the word &quot;basis&quot;) of an Occurrence instance

is *always* the organism (or set of organisms, or impression of an organism

in the case of fossils).  If the organisms were captured and preserved in a

Museum, then we call it a specimen.  If the organisms were only witnessed

and not captured, we call it an observation.  Everything else (including the

physical specimen) is just layers of evidence to support the existence and

taxonomic identification of the organism within the Occurrence.  When

photons reflected off the outer surface of an organism find their way

through a lens and onto some mechanism for recording said photons (either a

human retina and neurons in the brain, or sheet of celluloid, or digital

image sensor and memory stick), it&#39;s still the organism that the photons

reflected off of, which represents the &quot;thing&quot; of the Occurrence to which

metadata apply. Same goes for vocalizations transmitted through pressure

waves in the air onto some recording device (ear/brain, or microphone/tape).

So while it&#39;s certainly true that a media object such as a 35mm slide or

digital image file does not itself have a scientificName (then again, some

of my old Kodachromes have enough mold on them that they might....), said

media objects are *not* the Occurrence itself -- they merely represent

evidence of the occurrence.  Even a specimen in a jar is not the Occurrence

itself.  The Occurrence occurred when the specimen was captured (e.g., 400

feet deep on a coral reef).  A specimen in a jar on a shelf in a Museum is

no longer the &quot;Occurrence&quot;; it is the evidence of the Occurrence.

When I assign a GUID to an Occurrence record that lacks a voucher (i.e., an

&quot;Observation&quot;), I&#39;m certainly not trying to identify the act of observation;

I&#39;m identifying the organism that was observed, at the time and place that

it was observed.

For what it&#39;s worth, if I only have a still or video image of an organism

(e.g., <a href="http://www.youtube.com/watch?v=GVTd11q3Ppc" target="_blank">http://www.youtube.com/watch?v=GVTd11q3Ppc</a>; taken by Rob Whitton, who

some of you met at TDWG this year), and didn&#39;t collect the specimen, I

create an Observation record, and link the image to it as associatedMedia.

I would never assign a taxon name to the video clip -- only to the &quot;content

item&quot; of the video that represents an organism, serving as the basis of an

Occurrence record.

  </pre>

  <blockquote type="cite">

    <pre>The specimen is an occurrence of the individual organism.

The image is an occurrence of the individual organism.

The observation is an occurrence of the individual organism.

    </pre>

  </blockquote>

  <pre>I would say in all three cases that the presence of an organism at a place

and time was the Occurrence.  Specimens, images, and reported observations

are merely the evidence that the occurrence existed (and to varying degrees,

can also allow for subsequent interpretations of taxonomic identification).

  </pre>

  <blockquote type="cite">

    <pre>These statements may seem odd because we are used to

thinking of an Occurrence being an occurrence of the

&quot;species&quot; but it&#39;s not really.

    </pre>

  </blockquote>

  <pre>I completely agree.  The occurrence was the organism at a place and time.

The &quot;species&quot; is merely the taxon concept that someone identified the

organism as belonging to.  The scientificName is merely the label that

someone applied to the taxon concept.  In other words, the scientificName is

really a property of the Taxon Concept, and the Taxon Concept is the subject

of an identification event, and the identification event was applied to the

organism, which itself represents the basis of an Occurrence.  But very few

people go to the trouble of creating that full chain of relationships, so as

a short-hand, the scientificName is often treated as a direct property of

the occurrence (collected or observed organism).  I think this short-hand is

perfectly fine in the context of DwC, but only as long as people understand

the implied chain of linked entities.  If we start to forget what&#39;s really

going on, then we run into trouble.

Which, I guess, was the whole point of Steve&#39;s post.

What concerns me, though, is that we&#39;re not (yet?) already beyond this.

  </pre>

  <blockquote type="cite">

    <pre>This point becomes more clear if we look at a situation where several

types of occurrence records are collected from the same individual.

Let&#39;s say that we capture a bird, photograph it, collect a feather from it,

collect a DNA sample and band it and let it go.  Later somebody sees the

band and reports that as an observation.

How do we connect all of these things?

    </pre>

  </blockquote>

  <pre>Two Occurrences:  The first one when it was captured, photographed, and

relieved of a feather. The second when it was observed at a later date.

  </pre>

  <blockquote type="cite">

    <pre>Do we create an identifier for the specimen (the feather)

and then say that the image and the DNA sample came from it?

    </pre>

  </blockquote>

  <pre>We create an identifier for the first Occurrence, capture the

specimen-relevant metadata of the preserved feather, and track the DNA

sample via associatedSequences.

  </pre>

  <blockquote type="cite">

    <pre>That would be wrong.  We could take an image of the feather,

but that would be a different thing from an image of the bird.

    </pre>

  </blockquote>

  <pre>It&#39;s certainly different from an image of the whole bird, but that doesn&#39;t

preclude us from including both bird and feather images among

associatedMedia for the first Occurrence.

  </pre>

  <blockquote type="cite">

    <pre>We didn&#39;t get the DNA sample from the feather, we got it

via a blood sample from the bird.

    </pre>

  </blockquote>

  <pre>I don&#39;t see that as a problem, because the feather is only the evidence of

the bird at the place and time (i.e., the first Occurrence). Thus, the

sequence can still be included as part of the associatedSequences for the

first Occurrence.

  </pre>

  <blockquote type="cite">

    <pre>The band observation is not an observation of the feather,

or the image or the DNA sample.  It&#39;s an observation of

the bird which was never any kind of specimen living or dead.

The bird is an individual organism and that&#39;s what we need to call it.

    </pre>

  </blockquote>

  <pre>Agreed -- it forms the basis for the second Occurrence record (later date).

The two Occurrence records can be cross referenced, either via a shared

individualID, or via associatedOccurrences.
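
A minimal sketch of that cross-referencing, in Python. It is only an
illustration: the records are plain dictionaries rather than any real DwC
serialization, the keys are Darwin Core term names, and every value
(occ-1, occ-2, bird-42, the dates, the file names) and the helper
group_by_individual are hypothetical.

```python
from collections import defaultdict

# Two Occurrence records for the same banded bird (all values are made up).
occurrences = [
    {
        "occurrenceID": "occ-1",
        "individualID": "bird-42",
        "eventDate": "2010-05-01",
        "basisOfRecord": "PreservedSpecimen",  # the feather is the evidence
        "associatedMedia": ["bird.jpg", "feather.jpg"],
        "associatedSequences": ["seq-from-blood-sample"],
    },
    {
        "occurrenceID": "occ-2",
        "individualID": "bird-42",
        "eventDate": "2010-09-15",
        "basisOfRecord": "HumanObservation",  # the later band re-sighting
    },
]


def group_by_individual(records):
    """Map each individualID to the occurrenceIDs that share it."""
    groups = defaultdict(list)
    for record in records:
        groups[record["individualID"]].append(record["occurrenceID"])
    return dict(groups)


linked = group_by_individual(occurrences)
print(linked)
```

Grouping on the shared individualID recovers the link between the capture
and the later re-sighting without either record naming the other; an
associatedOccurrences field would be the explicit alternative.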

  </pre>

  <blockquote type="cite">

    <pre>Right now we don&#39;t have anything in Darwin Core that can

be used to rdfs:type the bird, which is why I proposed Individual

as a Darwin Core class.

    </pre>

  </blockquote>

  <pre>As someone else alluded to earlier in this thread, there are near-infinite

ways that we can slice &amp; cluster biodiversity data. I think there are some

cases where &quot;individual&quot; makes a lot of sense as a class (banded birds,

managed organisms in zoos and curated gardens, whale and shark observation

datasets, plant monitoring projects, etc.). But I think the notion of

&quot;Occurrence&quot; makes more sense at this point in biodiversity informatics

history, because the vast majority of datasets can be organized in this way

realtively painlessly, and because the majority of questions being asked of

these data revolve around presence of organisms identified to taxon concepts

occurring at place and time.

  </pre>

  <blockquote type="cite">

    <pre>I could say these things more clearly in RDF, but

because many members of the audience of this message

aren&#39;t familiar with RDF/XML they would probably zone

out and the point would be lost.

    </pre>

  </blockquote>

  <pre>Myself among them.  Thank you for presenting it in the less-efficient

English Prose form.

  </pre>

  <blockquote type="cite">

    <pre>The point is that we need to have identifiable classes of &quot;resources&quot;

(the technical name for &quot;things&quot; like physical artifacts, concepts,

and electronic representations) for all of the things that we

need to describe and inter-relate in the Darwin Core world.

Right now, we are missing one of the important pieces that we need,

which is a class for the Individual.  If we are satisfied with creating

an RDF model that only works for specimens and one-time observations,

then we probably don&#39;t need Individual as a Darwin Core class.  On the

other hand, if TDWG and GBIF are really serious about creating a

system (Darwin Core and RDF based on it) that can handle other types

of Occurrences like multiple images of live organisms, observations

of the same organism over time, and multiple types of Occurrences

collected from the same organism, then this capability should be built

into the system from the start.  When I got back from the TDWG meeting,

I was all excited about trying to use Darwin Core Archives with my

live plant image collection.  However, it quickly became evident

that it could not work because Occurrences were at the center of the

diagram rather than Individuals.  So unless something changes, we

are already embarking on the process of locking out these other

Occurrence types.

    </pre>

  </blockquote>

  <pre>Well...I certainly agree with you that we need *clear* documentation on what

these classes are intended to represent.  I had *thought* it was clear that

an Occurrence was as I have outlined above.  But like I said, I&#39;m perfectly

willing to accept that I&#39;m the idiot in this case, and am completely out of

phase with the rest of the community.

As to whether or not we need to define a class for Individual, I&#39;m not so

sure that&#39;s entirely necessary.  I guess DwC is already primed for it

(<a href="http://rs.tdwg.org/dwc/terms/index.htm#individualID" target="_blank">http://rs.tdwg.org/dwc/terms/index.htm#individualID</a>) -- but I&#39;m not sure

what properties would apply to such a class that are not already covered in

DwC.  Probably the next iteration of DwC would move some of the

properties of the Occurrence class (catalogNumber, individualCount,

preparations, disposition, associatedSequences, previousIdentifications)

over to the Individual Class, at which point the Occurrence becomes the

intersection of an Individual and an Event.
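
One way to picture that arrangement, as a rough Python sketch. This is not
a proposed DwC schema: the class layout, the field names, and all of the
identifiers and dates are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    individual_id: str
    catalog_number: str = ""  # one of the properties that might move here


@dataclass(frozen=True)
class Event:
    event_id: str
    event_date: str
    locality: str


@dataclass(frozen=True)
class Occurrence:
    """An Individual present at an Event (a place and time)."""
    occurrence_id: str
    individual: Individual
    event: Event


bird = Individual("bird-42")
capture = Event("evt-1", "2010-05-01", "capture site")
resighting = Event("evt-2", "2010-09-15", "re-sighting site")

occurrences = [
    Occurrence("occ-1", bird, capture),
    Occurrence("occ-2", bird, resighting),
]

# Both Occurrences point at the same Individual but at different Events.
same_individual = occurrences[0].individual == occurrences[1].individual
different_events = occurrences[0].event != occurrences[1].event
print(same_individual, different_events)
```

In this layout the Occurrence carries almost nothing of its own; it simply
joins one Individual to one Event.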

But let me ask: how would you scope &quot;Individual&quot;? (see my previous rants on

this list in recent days)  Would it be restricted to a particular individual

organism? Or, would it be extended to include specified groups of organisms

(as dwc:individualID already does)? What about populations?  Taxon Concepts?

  </pre>

  <blockquote type="cite">

    <pre>I hate to sound like a broken record (do we have those any more?),

but read my paper on this subject.

    </pre>

  </blockquote>

  </div></div><pre><div><div></div><div class="h5">

I&#39;ve gotten through the first few pages, and intend to finish soon.  But

it&#39;s much more fun to write emails about this stuff..... :-)

Aloha,

Rich

Richard L. Pyle, PhD

Database Coordinator for Natural Sciences

Associate Zoologist in Ichthyology

Dive Safety Officer

Department of Natural Sciences, Bishop Museum

1525 Bernice St., Honolulu, HI 96817

Ph: (808)848-4115, Fax: (808)847-8252

email: <a href="mailto:deepreef@bishopmuseum.org" target="_blank">deepreef@bishopmuseum.org</a>

<a href="http://hbs.bishopmuseum.org/staff/pylerichard.html" target="_blank">http://hbs.bishopmuseum.org/staff/pylerichard.html</a></div></div>


  </pre>

</blockquote><div class="im">

<br>

<pre cols="72">-- 

Steven J. Baskauf, Ph.D., Senior Lecturer

Vanderbilt University Dept. of Biological Sciences

postal mail address:

VU Station B 351634

Nashville, TN  37235-1634,  U.S.A.

delivery address:

2125 Stevenson Center

1161 21st Ave., S.

Nashville, TN 37235

office: 2128 Stevenson Center

phone: (615) 343-4582,  fax: (615) 343-6707

<a href="http://bioimages.vanderbilt.edu" target="_blank">http://bioimages.vanderbilt.edu</a>

</pre>

</div></div>

</blockquote></div><br><br clear="all"><br>-- <br>----------------------------------------------------------------<br>Pete DeVries<br>Department of Entomology<br>University of Wisconsin - Madison<br>445 Russell Laboratories<br>

1630 Linden Drive<br>Madison, WI 53706<br><a href="http://www.taxonconcept.org/" target="_blank">TaxonConcept Knowledge Base</a> / <a href="http://lod.geospecies.org/" target="_blank">GeoSpecies Knowledge Base</a><br><a href="http://about.geospecies.org/" target="_blank">About the GeoSpecies Knowledge Base</a><br>

------------------------------------------------------------<br>

</div>