<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=UTF-8" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

Dean Pentcheff wrote:

<blockquote

 cite="mid:AANLkTikAtH4LMUO3Af5ZHAU+OxxjnVMXO8hW_1Ox4TSo@mail.gmail.com"

 type="cite">

  <pre wrap="">OK. I think this makes sense to me.

Seeing how my "jar of semi-identified scunge" example plays now:

Setting: A jar full of stuff collected from a coral reef. In it I can

see larval fish, sphaeromatid isopods, one "Diadema antillarum"

urchin, and a green alga.

"Individual"ization:

1. The record for the jarful of scunge is _not_ eligible to be an "Individual".

  </pre>

</blockquote>

I think under the definition as it stands and the way the discussion

has been going, it could be an Individual with an Identification of

urkingdom=Eukarya (or however taxonomists prefer to call it) because,

since there are animals and algae in it, that would be the lowest

taxon common to all organisms known to be in the jar.  Whether this

would be wise or whether you would actually want to do it or not is for

you to decide.  But I think it's allowed in the definition.  <br>
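<br>
A minimal sketch of the "lowest taxon common to all organisms" rule, assuming each organism's lineage is available as a list running from the highest rank downward. The lineages, the dictionary name and the function name are illustrative placeholders, not authoritative taxonomy or part of any Darwin Core tooling.<br>
<pre wrap="">
```python
# Hypothetical lineages, listed from the highest rank down. The names
# are illustrative placeholders, not an authoritative classification.
LINEAGES = {
    "larval fish": ["Eukarya", "Animalia", "Chordata"],
    "isopods": ["Eukarya", "Animalia", "Arthropoda", "Sphaeromatidae"],
    "urchin": ["Eukarya", "Animalia", "Echinodermata", "Diadema"],
    "green alga": ["Eukarya", "Plantae", "Chlorophyta"],
}


def lowest_common_taxon(lineages):
    """Return the last taxon shared by every lineage, or None."""
    common = None
    # zip() stops at the shortest lineage, so positions stay aligned.
    for taxa_at_position in zip(*lineages):
        if len(set(taxa_at_position)) != 1:
            break
        common = taxa_at_position[0]
    return common


# The whole jar: animals plus an alga.
jar = lowest_common_taxon(list(LINEAGES.values()))

# Only the animals from the jar.
animals = lowest_common_taxon(
    [LINEAGES[k] for k in ("larval fish", "isopods", "urchin")]
)
print(jar, animals)
```
</pre>
With these made-up lineages the whole jar can only be Identified to Eukarya, while a lot holding just the animals could be Identified one level lower.<br>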

<blockquote

 cite="mid:AANLkTikAtH4LMUO3Af5ZHAU+OxxjnVMXO8hW_1Ox4TSo@mail.gmail.com"

 type="cite">

  <pre wrap="">2. I can create four additional records, each of which is designated

as "partOf" the jar record, each of which can be an "Individual" with

a higher-level or species-level determination attached to it.

  </pre>

</blockquote>

Yes, as long as each of those four Individuals isn't Identified to a

level below the lowest taxon common to all organisms known to be in

that lot.<br>

<blockquote

 cite="mid:AANLkTikAtH4LMUO3Af5ZHAU+OxxjnVMXO8hW_1Ox4TSo@mail.gmail.com"

 type="cite">

  <pre wrap="">

Two years from now, I sort and ID the sphaeromatid isopods and

determine that they are indeed all in the Family Sphaeromatidae, and

split them out to three separate jars, each of which is a different

genus. Each of those jars can be referred to as an "Individual" (one

or more objects from a single taxon collected at a single time/place).

Has the older family-level record now lost its "Individual"ness, now

that it's clear it's a composition of three lower-level taxa? Or does

it still record the correct fact that a group of independently

locomoting bugs from one collecting instance all belonged to the same

family, and hence is still an "Individual" record, indicating that

higher-level taxon group of bugs?

  </pre>

</blockquote>

The older jar Identified to family=Sphaeromatidae is still an

Individual.  It has three "child" Individuals, each of which isPartOf

the Sphaeromatidae Individual.  Each of the "child" Individuals can

have an Identification to genus.  You could continue this process until

you have every individual isopod in a separate jar identified to

species or whatever the lowest taxon is that you believe in.  What I'm

saying in my post is that it is NOT allowed by the definition (or

useful based on the reason why the Individual class was proposed) to

cut the isopods into pieces and then call the pieces Individuals (for

the reasons I gave and won't repeat here).  You can call the pieces

PreservedSpecimens, choppedOrganismParts, derivedLifeBits, or whatever

the community decides this kind of thing should be.  But my assertion

is that you can't call them Individuals.  <br>
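<br>
A rough sketch of that parent/child arrangement, assuming a simple in-memory record per Individual. The class layout, the split() helper and the genus placeholders are assumptions made for illustration; only isPartOf and the family name come from the discussion above.<br>
<pre wrap="">
```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """One or more whole organisms known to represent a single taxon."""
    label: str
    identification: str
    # Parent link, excluded from repr/compare to avoid following cycles.
    is_part_of: "Individual | None" = field(
        default=None, repr=False, compare=False
    )
    children: list = field(default_factory=list)

    def split(self, label, identification):
        """Create a child Individual that isPartOf this one."""
        child = Individual(label, identification, is_part_of=self)
        self.children.append(child)
        return child


# The family-level jar keeps its record and its Identification.
family_jar = Individual("isopod jar", "family=Sphaeromatidae")

# Sorting it later adds three genus-level children (placeholder names).
for genus in ("genus A", "genus B", "genus C"):
    family_jar.split("jar of " + genus, "genus=" + genus)

for child in family_jar.children:
    print(child.label, "isPartOf", child.is_part_of.label)
```
</pre>
Nothing in this sketch models pieces of a single organism; those would be records of some other type, not further Individuals.<br>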

<br>

Steve<br>

<blockquote

 cite="mid:AANLkTikAtH4LMUO3Af5ZHAU+OxxjnVMXO8hW_1Ox4TSo@mail.gmail.com"

 type="cite">

  <pre wrap="">[As a side issue, note that I have IDed them to genus, not species, to

sidestep the debate on the specialness of the species taxon.]

-Dean

--

Dean Pentcheff

<a class="moz-txt-link-abbreviated" href="mailto:pentcheff@gmail.com">pentcheff@gmail.com</a>

<a class="moz-txt-link-abbreviated" href="mailto:dpentche@nhm.org">dpentche@nhm.org</a>

On Fri, Nov 5, 2010 at 12:29 PM, Steve Baskauf

<a class="moz-txt-link-rfc2396E" href="mailto:steve.baskauf@vanderbilt.edu">&lt;steve.baskauf@vanderbilt.edu&gt;</a> wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">For those of you who triage emails and don't read long emails, the bottom

line is that although I agree with some of Rich's points, I think that the

suggestion that parts of Individuals should be classified as Individuals

does not fit the definition that is on the table for the proposed class

dwc:Individual.  I argue that allowing pieces of organisms to be called

Individuals defeats the purpose of having the Individual class.  I suggest

an alternative approach that I think is the most straightforward method of

separating tokens from the occurrences they document.  The acceptance or

rejection of Individual as a new class does not hinge on my suggested

approach.  Development of a system to handle more complicated resource

relationships can take place independently of the proposal for the

Individual class.

Responses inline below:

Richard Pyle wrote:

...

previousIdentifications

Hmm.  I suppose yes, but better to just have

another instance of Identification.  Why not?

When the data are structured that way at the source, yes.  But a number of

DwC terms exist because many content sources have not parsed/normalized all

their data to the full extent of the DwC classes.  Therefore, I think

previousIdentifications should be kept, and if so, it should be part of the

Individual class.

Got it.  It should go with Individual.

associatedSequences

I suppose you won't agree on this, but I don't see sequences

as any different than other tokens/evidence types that I

think we should allow to document Occurrences.  I would like

this term to eventually go away, at least for people using

RDF who will explicitly create resources for tokens and then

type them.

OK, well I guess "Sequences" per se are functionally equivalent to images,

in that they are not the organism themselves, but rather a representation of

some aspect of the organisms (in this case, a representation of the

molecular structure of the DNA molecules contained within the cells of the

organism, rather than a representation of light waves reflected off the

exterior of an organism in the case of an image, or of x-ray waves

transmitted through an organism in the case of a radiograph image).  I was

thinking more in terms of tissue samples -- which I will much more

stubbornly defend as being in the Individual class -- but I guess more in

terms of "individualScope".

OK, as usual you are warping my brain into thinking about things in a

different way.  I'm going to separate the issue of "dead" from the issue of

"pieces" (for the moment I'm going to accept that it doesn't matter if a

whole organism is dead or not).  The advantage of letting pieces of the

organism be considered as a type of Individual is that it allows us to avoid

creating another class of things called "PreservedSpecimen" (although in a

sense we already have it because of dwctype:PreservedSpecimen, which when

used as a rdf:type would imply membership in some rdfs:Class called

"PreservedSpecimen").  The pieces could share properties that one might want

to also apply to the whole organism.  One could differentiate among the two

by the value of "individualScope".

But after another long commute to think about this, I'm realizing that

pieces of organisms really must not be Individuals.  First of all, the

definition that is under consideration is "The category of information

pertaining to an individual organism or a group of individual organisms that

can reliably be known to represent a single taxon." [the Google Code entry,

with substitution of "taxon" for "species (or lower taxonomic rank if it

exists)" as was discussed].  That definition as it stands applies to an

organism or group of organisms, but does not include parts of organisms.

Obviously the definition could be changed, but if you consider the comment,

which describes the primary function of Individual: "Instances of this class

can serve the purpose of connecting one or more instances of the Darwin Core

class Occurrence to one or more instances of the Darwin Core class

Identification" it becomes clear that making parts of organisms Individuals

defeats this primary purpose for the term.

The major selling point for having Individuals at all is to get out of the

business of applying determinations to all of the pieces of evidence such as

specimens, images, sounds, etc. that get collected from the same biological

individual through multiple Occurrences.  This has the benefit that if one

applies an Identification to the Individual, all physical and information

resources that are derived from the individual automatically get associated

with the Identification and hence the taxonomic information referenced by

the Identification.  If we call preserved specimens that are pieces of

organism Individuals having a value of individualScope="part", then do we do

the same thing to them as we do with Individuals at higher levels, namely

apply Identifications to them?  If so, then we are back in the business of

assigning Identifications to all of our derivative resources rather than the

biological individuals from which they came.  If we just say that we'll skip

assigning separate Identifications to the derivative resources, then we have

something that doesn't fit the functional role for which Individual was

designed.  In that case an "Individual" which is an organism part is such a

different thing that one might as well call it something else (i.e. a

PreservedSpecimen).
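
The benefit described above can be sketched as follows, assuming each
derived resource carries only an individualID and Identifications are
stored once per Individual. All identifiers and taxon strings below are
made-up placeholders.

```python
# Identifications are recorded against the Individual only.
identifications = {"ind-1": ["Taxon X"]}

# Derived resources point back to the Individual by individualID.
resources = [
    {"id": "specimen-1", "type": "PreservedSpecimen",
     "individualID": "ind-1"},
    {"id": "image-1", "type": "StillImage",
     "individualID": "ind-1"},
    {"id": "sound-1", "type": "SoundRecording",
     "individualID": "ind-1"},
]


def identifications_for(resource):
    """Look up Identifications through the resource's Individual."""
    return identifications.get(resource["individualID"], [])


# A new determination is recorded once, on the Individual ...
identifications["ind-1"].append("Taxon Y")

# ... and every derived resource sees it without being edited.
for r in resources:
    print(r["id"], identifications_for(r))
```

If the pieces were themselves Individuals, each entry in the list would
need its own Identifications and the single point of update would be lost.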

The case of a whole organism (live as a LivingSpecimen or dead as a

PreservedSpecimen) is different because in that case we would have a single

resource serving as the evidence (the whole organism itself).  By

definition, there can't be many of those (there would just be one) and it

would already have an Identification assigned to it, because it is the same

Individual that it is providing evidence for.  So there is no superfluous

assignment of Identifications in that case.

Here's one thing I'm not so certain about, though.  An in-situ image of an

organism is clearly a token of an Occurrence, because it is evidence of the

organism at the place/time.  An image of the preserved specimen in a Museum,

or an x-ray, etc., is not really a token of an Occurrence, because it's not

evidence of the organism at the place/time of its capture.  Same goes for

Sequences -- they are a token of the Individual organism, not of the

occurrence of the organism at a place and time.  This is why I have a hard

time thinking of such things as tokens of an Occurrence, when they are

really more tokens of the Individual.

I think the solution to this is to not call it a "token of the Occurrence".

Let's say that the token is derived from the Individual and that it MAY

serve as evidence for an Occurrence.

I think that the solution is something like you suggest: link the chain of

derivation of tokens to the Individual and not to the Occurrence.  Then have

a reference in the Occurrence record to the particular token that was

created or collected during the event of the Occurrence.  See

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/tree-branch.gif">http://bioimages.vanderbilt.edu/pages/tree-branch.gif</a> .  I have had the

tendency of thinking that the tokens supported the Occurrence, but there

does not need to be just one purpose for the token.  They also support the

existence of the Individual.  This should probably make you happy, because

the pieces of the Individual (preserved specimens, tissue samples) would be

derived from the Individual. The "provenance" if you want to call it that,

traces the connection of the tokens to the Individual.  The chain of

derivation can be traced using the property that I've called "derivedFrom".

The branch specimen is "derivedFrom" the Individual and the specimen image

is "derivedFrom" the specimen.  Your desire to differentiate between things

that are physically derived from the Individual vs. things that aren't can

be handled by the "isPartOf" property.  The branch specimen "isPartOf" the

Individual tree, but the image is not a part of the branch.   A token could

have both the isPartOf property and the derivedFrom property (if it's a

piece of the Individual), or only the derivedFrom property (if it's not).
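
A minimal sketch of tracing that chain, assuming the derivedFrom and
isPartOf links are held in plain dictionaries. The resource identifiers
are hypothetical stand-ins for the tree-branch example.

```python
# token: the thing it was derivedFrom
derived_from = {
    "branch-specimen": "tree",
    "specimen-image": "branch-specimen",
}

# Only physical pieces of the Individual get an isPartOf link.
is_part_of = {"branch-specimen": "tree"}


def provenance(token):
    """Follow derivedFrom links from a token back to its source."""
    chain = [token]
    while chain[-1] in derived_from:
        chain.append(derived_from[chain[-1]])
    return chain


print(provenance("specimen-image"))
print("specimen-image" in is_part_of)
```

The image reaches the Individual through the specimen by derivedFrom
alone, while the specimen carries both properties.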

In this diagram, the term "hasEvidence" is a property of the Occurrence.  It

has the branch specimen as its object, but not the image of the specimen

because as you note, the event marking the creation of the image is not the

same as the event documenting the Occurrence of the Individual (i.e. the

collection of the branch specimen). Either of the "tokens" (the specimen or

the image) could be used as evidence for the Identification (we could have a

property of the Identification called "basedOn" that could have the

specimen, the image, or both as its object - I did something similar to this

in the Biodiversity Informatics paper).

Please note that for each of the properties I've listed on the diagram,

there could and probably should be inverse properties (not shown):

hasDerivative for derivedFrom, hasPart for isPartOf, isEvidenceFor for

hasEvidence, and usedIn for basedOn.  All of the "tokens" and the Occurrence

could have the property individualID which would relate the resource

directly to the Individual and its Identifications.
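
The inverse pairs listed above could be generated mechanically. This is
only a sketch over subject/property/object tuples with hypothetical
identifiers, not a statement of how any RDF tool handles inverses.

```python
INVERSE = {
    "derivedFrom": "hasDerivative",
    "isPartOf": "hasPart",
    "hasEvidence": "isEvidenceFor",
    "basedOn": "usedIn",
}

triples = [
    ("branch-specimen", "derivedFrom", "tree"),
    ("branch-specimen", "isPartOf", "tree"),
    ("occurrence-1", "hasEvidence", "branch-specimen"),
    ("identification-1", "basedOn", "specimen-image"),
]


def with_inverses(stated):
    """Return the stated triples plus one inverse triple for each."""
    inferred = [
        (o, INVERSE[p], s) for s, p, o in stated if p in INVERSE
    ]
    return stated + inferred


for triple in with_inverses(triples):
    print(triple)
```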

I have created a number of similar charts showing how these relationships

could apply to various types of tokens:

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/tree-branch.gif">http://bioimages.vanderbilt.edu/pages/tree-branch.gif</a>  (tree branch

PreservedSpecimen)

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/tree-image.gif">http://bioimages.vanderbilt.edu/pages/tree-image.gif</a>  (image of a live tree)

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/whale-dna.gif">http://bioimages.vanderbilt.edu/pages/whale-dna.gif</a>  (tissue sample and DNA

sequence from a whale)

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/bird-observation.gif">http://bioimages.vanderbilt.edu/pages/bird-observation.gif</a> (bird

observation)

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/wildebeest.gif">http://bioimages.vanderbilt.edu/pages/wildebeest.gif</a>  (wildebeest calf

captured and put in zoo)

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/botanical-garden.gif">http://bioimages.vanderbilt.edu/pages/botanical-garden.gif</a>  (twig removed

and turned into a living specimen in a botanical garden)

Note that in every case, the "token" is typed based on the kind of thing

that it is.  We don't try to make it an Occurrence (my previous mistake) or

an Individual (what I'm saying is Rich's mistake).  Physical things that are

a part of the Individual have the special status of "isPartOf", electronic

representations never do.  Only the token that was created during the event

associated with the Occurrence record is connected to the Occurrence

record.  The token serving as evidence for the Occurrence can be anything -

there is no special class called "token".  In fact, the "token" can be the

organism itself if the organism is curated (John's favorite wildebeest calf

in the zoo or a whole dead fish in a jar).  The token can be another

individual such as a living specimen that originates as a clone (maybe also

seed) from the Individual being documented in the Occurrence.  We (DwC) only

get into the business of creating types and properties of tokens if they

don't already exist in other vocabularies.  DwC needs to do that for

specimens, but not for images that are already covered by MRTG.  An

observation may or may not have a token depending on whether there is some

kind of evidence that can be referred to (see bird example).

In these diagrams a single Occurrence and a single "line" of derived tokens

is shown.  But there can be many tokens per Occurrence and many tokens per

Individual.  There can also be many Occurrences per Individual.  I didn't

try to show this on the diagram because it would be too complicated.

Obviously many users will want to make this "flatter" and less complicated.

But I think this model allows for just about any kind of relationship among

occurrence-documenting resources that people want to handle.  It was the

kind of thing I was trying to do in the Biodiversity Informatics paper (e.g.

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/conceptual-scheme-insect.gif">http://bioimages.vanderbilt.edu/pages/conceptual-scheme-insect.gif</a>) but

better because I'm letting the tokens be what they are rather than trying to

force them all to be Occurrences.

This also assumes that dwc:catalogNumber and dwc:otherCatalogNumbers be

re-assigned to Record-level terms. Was there some reason this isn't

appropriate?

I think it is appropriate because they should be usable with at

least two classes: Individual (for living specimens) and

Occurrences (e.g. preserved specimens, images)

Hmmm...on that basis, should individualCount and the various tokens also be

Record-level terms -- on the basis that they can apply either to an

Occurrence, or to an Individual? Actually, in the case of DNA Barcodes and

such, isn't it possible to also represent a DNA Sequence as an attribute of

a Taxon as well?  If the purpose of Record-level terms is to aggregate terms

that apply to more than one class, then perhaps that is the solution for a

number of these things (including disposition, and maybe even preparation --

depending on how broadly those things are defined)?

individualCount would be metadata that results from an Occurrence, so I

think that's the only place it belongs.  Tokens aren't properties of

anything, they are resources in their own right that are connected by some

property term (e.g. hasEvidence/isEvidenceFor) to an Occurrence in which

they were collected/recorded and to the Individual from which they derived

(e.g. derivedFrom and hasDerivative).  A DNA sequence is another resource

that isn't an attribute of anything.  It could be the object of numerous

properties that could have a variety of subjects.

I haven't said this before, but are we allowing Individuals

to be dead?

Errr....fossils? Preserved specimens? Are they not Individuals?  I know you

think of them in terms of tracking a living organism over time.  But that's

only one of the reasons why I support an Individual class (not even the main

reason).  To me, the main reason is that an "Individual" represents the

actual organism(s), separate from an Occurrence, which represents the

presence of an organism at a particular place and time.

I think I am prepared to accept this as long as they are the whole thing and

not pieces.  There could be some issues with a fossil since in many cases

the tissues of the organism are replaced by minerals.  But there is still a

one-to-one relationship, so the problem I described in the long paragraph

above doesn't apply.

If we put it in a jar of alcohol

and cut it into many separately-cataloged pieces, are

all of the pieces still some of the Individual?

This is why we need two things for an Individual:

individualScope (which can range anywhere from the aggregates of multiple

individuals, all the way down to the smallest parts of individuals)

See above.

And, a mechanism to track series of "derived from" Individuals.  The ASC

model covered this, I think (right, Stan?)

I didn't see it in the flow chart, but it could be there somewhere.  I had

something like this in sernec:derivativeOccurrence and sernec:derivedFrom

(<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/rdf/terms.htm">http://bioimages.vanderbilt.edu/rdf/terms.htm</a>) when I was making every

token an Occurrence.  But it's better to do as you suggest here which is

what I did in the examples above.

I think Pete might have been suggesting modeling

things that way with "partOf".  What if we cut a

branch from a tree, glue part of it to a page

and turn part of it into a DNA sample that

gets sequenced.  Are those all a part of an

Individual?

It seems to me that each unit could represent a separate instance of

Individual, but the "parts" need to be clearly aggregated around the

single-organism parent Individual, which itself may be a part of another

Individual instance that is an aggregate lot of specimens, which itself

could be a subsampling of another Individual instance that represents a

population in nature.

Let the supporting evidence (tokens) be whatever type of thing they are

rather than call them Individuals.  Link them together through hierarchical

relationships to the single-organism parent Individual.

In my mind, we parse all of these things as separate instances of

Individuals, but join them via a hierarchical (parent/child) relationship.

If I'm not mistaken, this is how the ASC model managed instances of

BiologicalObject (again....right, Stan?)

I don't really want them to be, but maybe I must?

Somehow we need to be able to handle road-kill,

which will be dead when we make the

observation/collection.  If we cut a branch from

a tree (an Individual), root it, and grow it in

a botanical garden, do we call the resulting tree

in the garden the same Individual?  I would assign

it a new identifier and call it a new Individual.

I guess my point is that I would only apply the

term Individual to dead stuff, pieces of dead stuff,

and living pieces of things with extreme caution.

Why extreme caution?  What are the risks that we are cautioning ourselves

against?

The risk that we make the definition of Individual so broad that it can't

perform any of the functions it was defined to serve.  We've already lost

one of them (the ability to infer duplicates) when I agreed to the broader

definition, but that's the subject of another post.

These are some principles that I always try to keep in mind when discussing

these things:

- DwC is a data exchange standard, not so much a physical data model.

- There is a necessary balance between structuring DwC around how data

actually exist in content-provider databases, and how data *should* be

represented in a normalised world

- When in doubt, DwC should be accommodating, rather than restrictive --

especially when more restrictive needs can be met via associated data

filtering

There are other principles as well, but these are the ones I keep having to

remind myself of.

I think that what I have suggested above is very unrestrictive.  We let

evidence be the type of things that they are (PreservedSpecimens,

Individuals, StillImages, SoundRecordings, DNA sequences, etc.).  We don't

determine their type by what we want to use them for.  That was the mistake

that I made in the Biodiversity Informatics paper.  If we follow this

approach, then a StillImage can fill any role that we want: evidence that an

Occurrence happened, information to support an Identification, a character

for a visual key, a logo, etc. We let it fulfill those roles by giving it an

identifier and connecting it to other resources using appropriate terms

(hasEvidence, derivedFrom, mrtg:attributionLogoURL, etc.).
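
As a small illustration of typing a resource by what it is and letting
its roles come from the properties that point at it (identifiers are
hypothetical, and using mrtg:attributionLogoURL with the image as its
object is an assumption made for the example):

```python
# The image is typed once, by the kind of thing it is.
resource_type = {"image-1": "StillImage"}

# Its roles come from the properties that have it as their object.
triples = [
    ("occurrence-1", "hasEvidence", "image-1"),
    ("identification-1", "basedOn", "image-1"),
    ("provider-1", "mrtg:attributionLogoURL", "image-1"),
]


def roles_of(resource):
    """List the properties that have this resource as their object."""
    return sorted(p for s, p, o in triples if o == resource)


print(resource_type["image-1"], roles_of("image-1"))
```

Adding or removing a role changes only the list of triples; the type of
the image stays StillImage throughout.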

I think maybe so.  Maybe the appropriate course

of action here as well is to let people try

different approaches out and if they turn out

to work and be needed, then we talk about

applying them to Darwin Core.

Ultimately, I think people will use it in accordance with what terms are

nested within it -- which is why I think it's important to have this

conversation we're having now.

As I indicated at an earlier time, I think that there are very few terms

that should be properties of Individual since it is primarily a node that

connects Occurrences to Identifications (and I guess now to derived

tokens).

Aloha,

Rich

Looking forward to responses!  But I don't think development of these ideas

should hold up the proposal for the class Individual, which can stand on its

own with its current (revised) definition.

Steve

.

--

Steven J. Baskauf, Ph.D., Senior Lecturer

Vanderbilt University Dept. of Biological Sciences

postal mail address:

VU Station B 351634

Nashville, TN  37235-1634,  U.S.A.

delivery address:

2125 Stevenson Center

1161 21st Ave., S.

Nashville, TN 37235

office: 2128 Stevenson Center

phone: (615) 343-4582,  fax: (615) 343-6707

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu">http://bioimages.vanderbilt.edu</a>

_______________________________________________

tdwg-content mailing list

<a class="moz-txt-link-abbreviated" href="mailto:tdwg-content@lists.tdwg.org">tdwg-content@lists.tdwg.org</a>

<a class="moz-txt-link-freetext" href="http://lists.tdwg.org/mailman/listinfo/tdwg-content">http://lists.tdwg.org/mailman/listinfo/tdwg-content</a>

    </pre>

  </blockquote>


</blockquote>

<br>

<pre class="moz-signature" cols="72">-- 

Steven J. Baskauf, Ph.D., Senior Lecturer

Vanderbilt University Dept. of Biological Sciences

postal mail address:

VU Station B 351634

Nashville, TN  37235-1634,  U.S.A.

delivery address:

2125 Stevenson Center

1161 21st Ave., S.

Nashville, TN 37235

office: 2128 Stevenson Center

phone: (615) 343-4582,  fax: (615) 343-6707

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu">http://bioimages.vanderbilt.edu</a>

</pre>

</body>

</html>