[tdwg-content] tdwg-content Digest, Vol 20, Issue 17

Steve Baskauf steve.baskauf at vanderbilt.edu
Fri Nov 5 20:29:57 CET 2010


For those of you who triage emails and don't read long emails, the 
bottom line is that although I agree with some of Rich's points, I think 
that the suggestion that parts of Individuals should be classified as 
Individuals does not fit the definition that is on the table for the 
proposed class dwc:Individual.  I argue that allowing pieces of 
organisms to be called Individuals defeats the purpose of having the 
Individual class.  I suggest an alternative approach that I think is the 
most straightforward method of separating tokens from the occurrences 
they document.  The acceptance or rejection of Individual as a new class 
does not hinge on my suggested approach.  Development of a system to 
handle more complicated resource relationships can take place 
independently of the proposal for the Individual class.

Responses inline below:

Richard Pyle wrote:
> ...
>>> previousIdentifications
>>>       
>> Hmm.  I suppose yes, but better to just have
>> another instance of Identification.  Why not?
>>     
>
> When the data are structured that way at the source, yes.  But a number of
> DwC terms exist because many content sources have not parsed/normalized all
> their data to the full extent of the DwC classes.  Therefore, I think
> previousIdentifications should be kept, and if so, it should be part of the
> Individual class.
>   
Got it.  It should go with Individual.
>   
>>> associatedSequences
>>>       
>>
>> I suppose you won't agree on this, but I don't see sequences 
>> as any different than other tokens/evidence types that I 
>> think we should allow to document Occurrences.  I would like 
>> this term to eventually go away, at least for people using 
>> RDF who will explicitly create resources for tokens and then 
>> type them.
>>     
>
> OK, well I guess "Sequences" per se are functionally equivalent to images,
> in that they are not the organism itself, but rather a representation of
> some aspect of the organisms (in this case, a representation of the
> molecular structure of the DNA molecules contained within the cells of the
> organism, rather than a representation of light waves reflected off the
> exterior of an organism in the case of an image, or of x-ray waves
> transmitted through an organism in the case of a radiograph image).  I was
> thinking more in terms of tissue samples -- which I will much more
> stubbornly defend as being in the Individual class -- but I guess more in
> terms of "individualScope".
>   
OK, as usual you are warping my brain into thinking about things in a 
different way.  I'm going to separate the issue of "dead" from the issue 
of "pieces" (for the moment I'm going to accept that it doesn't matter 
if a whole organism is dead or not).  The advantage of letting pieces of 
the organism be considered as a type of Individual is that it allows us 
to avoid creating another class of things called "PreservedSpecimen" 
(although in a sense we already have it because of 
dwctype:PreservedSpecimen, which when used as an rdf:type would imply 
membership in some rdfs:Class called "PreservedSpecimen").  The pieces 
could share properties that one might also want to apply to the whole 
organism.  One could differentiate between the two by the value of 
"individualScope". 

But after another long commute to think about this, I'm realizing that 
pieces of organisms really must not be Individuals.  First of all, the 
definition that is under consideration is "The category of information 
pertaining to an individual organism or a group of individual organisms 
that can reliably be known to represent a single taxon." [the Google 
Code entry, with substitution of "taxon" for "species (or lower 
taxonomic rank if it exists)" as was discussed].  That definition as it 
stands applies to an organism or group of organisms, but does not 
include parts of organisms.  Obviously the definition could be changed, 
but if you consider the comment, which describes the primary function of 
Individual: "Instances of this class can serve the purpose of connecting 
one or more instances of the Darwin Core class Occurrence to one or more 
instances of the Darwin Core class Identification" it becomes clear that 
making parts of organisms Individuals defeats this primary purpose for 
the term.  

The major selling point for having Individuals at all is to get out of 
the business of applying determinations to all of the pieces of evidence 
such as specimens, images, sounds, etc. that get collected from the same 
biological individual through multiple Occurrences.  This has the 
benefit that if one applies an Identification to the Individual, all 
physical and information resources that are derived from the individual 
automatically get associated with the Identification and hence the 
taxonomic informations referenced by the Identification.  If we call 
preserved specimens that are pieces of organism Individuals having a 
value of individualScope="part", then do we do the same thing to them as 
we do with Individuals at higher levels, namely apply Identifications to 
them?  If so, then we are back in the business of assigning 
Identifications to all of our derivative resources rather than the 
biological individuals from which they came.  If we just say that we'll 
skip assigning separate Identifications to the derivative resources, 
then we have something that doesn't fit the functional role for which 
Individual was designed.  In that case an "Individual" which is an 
organism part is such a different thing that one might as well call it 
as something else (i.e. a PreservedSpecimen). 
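To make the "apply once, inherit everywhere" point concrete, here is a minimal Python sketch. It assumes each token simply records the individualID of the Individual it documents and that Identifications are stored only on the Individual; all identifiers and taxa are made up for illustration, and the structure is not a proposed Darwin Core serialization.

```python
# Hypothetical identifiers; "individualID" follows the term discussed in this
# thread and is not a ratified Darwin Core term.
identifications = {
    # Individual -> list of (dateIdentified, taxon) Identifications applied to it
    "ind:tree1": [("2009-06-01", "Quercus alba")],
}

tokens = {
    # token -> the Individual it documents (its individualID)
    "spec:branch1": "ind:tree1",
    "img:branch1-photo": "ind:tree1",
    "img:tree1-insitu": "ind:tree1",
}


def current_taxon(token):
    """Return the latest taxon for a token by looking it up on its Individual."""
    dets = identifications.get(tokens[token], [])
    if not dets:
        return None
    # ISO 8601 date strings sort chronologically, so max() picks the latest.
    return max(dets)[1]


# One new Identification on the Individual reaches every token at once;
# no token carries a determination of its own.
identifications["ind:tree1"].append(("2010-11-05", "Quercus montana"))
for t in tokens:
    print(t, current_taxon(t))
```

If a part such as spec:branch1 were instead made an Individual in its own right, it would need its own entry in identifications, which is exactly the duplication described above.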

The case of a whole organism (live as a LivingSpecimen or dead as a 
PreservedSpecimen) is different because in that case we would have a 
single resource serving as the evidence (the whole organism itself).  By 
definition, there can't be many of those (there would just be one) and 
it would already have an Identification assigned to it, because it is 
the same Individual that it is providing evidence for.  So there is no 
superfluous assignment of Identifications in that case. 

> Here's one thing I'm not so certain about, though.  An in-situ image of an
> organism is clearly a token of an Occurrence, because it is evidence of the
> organism at the place/time.  An image of the preserved specimen in a Museum,
> or an x-ray, etc., is not really a token of an Occurrence, because it's not
> evidence of the organism at the place/time of its capture.  Same goes for
> Sequences -- they are a token of the Individual organism, not of the
> occurrence of the organism at a place and time.  This is why I have a hard
> time thinking of such things as tokens of an Occurrence, when they are
> really more tokens of the Individual.
>   
I think the solution to this is to not call it a "token of the 
Occurrence".  Let's say that the token is derived from the Individual 
and that it MAY serve as evidence for an Occurrence. 

More concretely, I think the solution is something like what you suggest: 
link the chain of derivation of tokens to the Individual and not to the 
Occurrence.  Then have a reference in the Occurrence record to the 
particular token that was created or collected during the event of the 
Occurrence.  See http://bioimages.vanderbilt.edu/pages/tree-branch.gif .  
I have tended to think of tokens as supporting only the Occurrence, but 
a token does not need to have just one purpose.  Tokens also support 
the existence of the Individual.  This should probably make you happy, 
because the pieces of the Individual (preserved specimens, tissue 
samples) would be derived from the Individual. The "provenance" if you 
want to call it that, traces the connection of the tokens to the 
Individual.  The chain of derivation can be traced using the property 
that I've called "derivedFrom".  The branch specimen is "derivedFrom" 
the Individual and the specimen image is "derivedFrom" the specimen.  
Your desire to differentiate between things that are physically derived 
from the Individual vs. things that aren't can be handled by the 
"isPartOf" property.  The branch specimen "isPartOf" the Individual 
tree, but the image is not a part of the branch.   A token could have 
both the isPartOf property and the derivedFrom property (if it's a piece 
of the Individual), or only the derivedFrom property (if it's not). 
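The chain in the tree-branch diagram can be sketched as plain (subject, predicate, object) triples. This is a minimal Python illustration that assumes the derivedFrom and isPartOf properties proposed in this message (not existing Darwin Core terms), hypothetical identifiers, and at most one derivedFrom parent per resource.

```python
# Hypothetical identifiers for the tree-branch example.
triples = [
    ("spec:branch1", "derivedFrom", "ind:tree1"),
    ("spec:branch1", "isPartOf", "ind:tree1"),             # physically a piece of the tree
    ("img:branch1-photo", "derivedFrom", "spec:branch1"),  # derived, but not a part
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def derivation_chain(resource):
    """Follow derivedFrom links upward until a resource has no parent.

    Assumes at most one derivedFrom parent per resource; stops on cycles.
    """
    chain = [resource]
    while True:
        parents = objects(chain[-1], "derivedFrom")
        if not parents or parents[0] in chain:
            return chain
        chain.append(parents[0])


def is_physical_part(resource, individual):
    """True only if the resource is asserted to be a piece of the Individual."""
    return individual in objects(resource, "isPartOf")


print(derivation_chain("img:branch1-photo"))
# ['img:branch1-photo', 'spec:branch1', 'ind:tree1']
```

The image reaches the Individual through derivedFrom alone, while only the specimen also satisfies is_physical_part, which mirrors the distinction between derived things and physical pieces.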

In this diagram, the term "hasEvidence" is a property of the 
Occurrence.  It has the branch specimen as its object, but not the image 
of the specimen because as you note, the event marking the creation of 
the image is not the same as the event documenting the Occurrence of the 
Individual (i.e. the collection of the branch specimen). Either of the 
"tokens" (the specimen or the image) could be used as evidence for the 
Identification (we could have a property of the Identification called 
"basedOn" that could have the specimen, the image, or both as its object 
- I did something similar to this in the Biodiversity Informatics paper).
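A similar sketch for the two evidence properties: the Occurrence points only at the token collected during its event, while an Identification may cite any token. As above, hasEvidence and basedOn are the names proposed in this message, and the identifiers are hypothetical.

```python
triples = [
    # The collecting event is documented by the branch specimen only.
    ("occ:tree1-2010-06-01", "hasEvidence", "spec:branch1"),
    # The image was made later, from the specimen.
    ("img:branch1-photo", "derivedFrom", "spec:branch1"),
    # The determination cites both the specimen and its image.
    ("det:1", "basedOn", "spec:branch1"),
    ("det:1", "basedOn", "img:branch1-photo"),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


occurrence_evidence = objects("occ:tree1-2010-06-01", "hasEvidence")
identification_basis = objects("det:1", "basedOn")
print(occurrence_evidence)   # ['spec:branch1']
print(identification_basis)  # ['spec:branch1', 'img:branch1-photo']
```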

Please note that for each of the properties I've listed on the diagram, 
there could and probably should be inverse properties (not shown): 
hasDerivative for derivedFrom, hasPart for isPartOf, isEvidenceFor for 
hasEvidence, and usedIn for basedOn.  All of the "tokens" and the 
Occurrence could have the property individualID which would relate the 
resource directly to the Individual and its Identifications. 
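The inverse pairs listed above can be materialized mechanically. The following Python sketch assumes the four property pairs named in this message and simply adds the reverse triple for each forward one; in an RDF/OWL setting one would more likely declare each pair with owl:inverseOf and let a reasoner infer the reverse statements. Identifiers are hypothetical.

```python
# Forward property -> inverse property, as listed in the message.
INVERSES = {
    "derivedFrom": "hasDerivative",
    "isPartOf": "hasPart",
    "hasEvidence": "isEvidenceFor",
    "basedOn": "usedIn",
}


def with_inverses(triples):
    """Return the triples plus the inverse triple for every known forward property."""
    result = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            result.add((o, INVERSES[p], s))
    return result


forward = [
    ("spec:branch1", "derivedFrom", "ind:tree1"),
    ("spec:branch1", "isPartOf", "ind:tree1"),
    ("occ:1", "hasEvidence", "spec:branch1"),
    ("det:1", "basedOn", "spec:branch1"),
    ("spec:branch1", "individualID", "ind:tree1"),  # no inverse listed; left as-is
]
full = with_inverses(forward)
for triple in sorted(full):
    print(triple)
```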

I have created a number of similar charts showing how these 
relationships could apply to various types of tokens:
http://bioimages.vanderbilt.edu/pages/tree-branch.gif  (tree branch 
PreservedSpecimen)
http://bioimages.vanderbilt.edu/pages/tree-image.gif  (image of a live tree)
http://bioimages.vanderbilt.edu/pages/whale-dna.gif  (tissue sample and 
DNA sequence from a whale)
http://bioimages.vanderbilt.edu/pages/bird-observation.gif (bird 
observation)
http://bioimages.vanderbilt.edu/pages/wildebeest.gif  (wildebeest calf 
captured and put in zoo)
http://bioimages.vanderbilt.edu/pages/botanical-garden.gif  (twig 
removed and turned into a living specimen in a botanical garden)

Note that in every case, the "token" is typed based on the kind of thing 
that it is.  We don't try to make it an Occurrence (my previous mistake) 
or an Individual (what I'm saying is Rich's mistake).  Physical things 
that are a part of the Individual have the special status of "isPartOf", 
electronic representations never do.  Only the token that was created 
during the event associated with the Occurrence record is connected to 
the Occurrence record.  The token serving as evidence for the Occurrence 
can be anything - there is no special class called "token".  In fact, 
the "token" can be the organism itself if the organism is curated 
(John's favorite wildebeest calf in the zoo or a whole dead fish in a 
jar).  The token can be another individual such as a living specimen 
that originates as a clone (or perhaps as a seed) from the Individual being 
documented in the Occurrence.  We (DwC) only get into the business of 
creating types and properties of tokens if they don't already exist in 
other vocabularies.  DwC needs to do that for specimens, but not for 
images that are already covered by MRTG.  An observation may or may not 
have a token depending on whether there is some kind of evidence that 
can be referred to (see bird example). 

In these diagrams a single Occurrence and a single "line" of derived 
tokens is shown.  But there can be many tokens per Occurrence and many 
tokens per Individual.  There can also be many Occurrences per 
Individual.  I didn't try to show this on the diagram because it would 
be too complicated.  Obviously many users will want to make this 
"flatter" and less complicated.  But I think this model allows for just 
about any kind of relationship among occurrence-documenting resources 
that people want to handle.  It was the kind of thing I was trying to do 
in the Biodiversity Informatics paper (e.g. 
http://bioimages.vanderbilt.edu/pages/conceptual-scheme-insect.gif) but 
better because I'm letting the tokens be what they are rather than 
trying to force them all to be Occurrences.
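For users who want the "flatter" view, the graph can be collapsed into one row per Individual, Occurrence, and token. This is a minimal Python sketch, assuming each Occurrence carries exactly one individualID and lists its tokens with hasEvidence (hypothetical identifiers throughout); an Occurrence with no token, like the bird observation, yields a row with an empty token.

```python
# Several Occurrences of one tree, plus a bird observation with no token.
triples = [
    ("occ:1", "individualID", "ind:tree1"),
    ("occ:2", "individualID", "ind:tree1"),
    ("occ:3", "individualID", "ind:bird1"),
    ("occ:1", "hasEvidence", "spec:branch1"),
    ("occ:1", "hasEvidence", "img:insitu1"),
    ("occ:2", "hasEvidence", "img:insitu2"),
]


def flatten(triples):
    """One (individualID, occurrenceID, tokenID) row per token; None if no token."""
    # Assumes one individualID per Occurrence.
    individual_of = {s: o for s, p, o in triples if p == "individualID"}
    evidence = {}
    for s, p, o in triples:
        if p == "hasEvidence":
            evidence.setdefault(s, []).append(o)
    rows = []
    for occ, ind in sorted(individual_of.items()):
        for token in evidence.get(occ, [None]):
            rows.append((ind, occ, token))
    return rows


for row in flatten(triples):
    print(row)
```

The flat rows drop any derivedFrom chains among tokens, so they suit simple exchange but not the full set of relationships.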
>   
>> This also assumes that dwc:catalogNumber and dwc:otherCatalogNumbers be
>> re-assigned to Record-level terms. Was there some reason this isn't
>> appropriate?
>>
>> I think it is appropriate because they should be usable with at 
>> least two classes: Individual (for living specimens) and 
>> Occurrences (e.g. preserved specimens, images)
>>     
>
> Hmmm...on that basis, should individualCount and the various tokens also be
> Record-level terms -- on the basis that they can apply either to an
> Occurrence, or to an Individual? Actually, in the case of DNA Barcodes and
> such, isn't it possible to also represent a DNA Sequence as an attribute of
> a Taxon as well?  If the purpose of Record-level terms is to aggregate terms
> that apply to more than one class, then perhaps that is the solution for a
> number of these things (including disposition, and maybe even preparation --
> depending on how broadly those things are defined)?
>
>   
individualCount would be metadata that results from an Occurrence, so I 
think that's the only place it belongs.  Tokens aren't properties of 
anything; they are resources in their own right that are connected by 
property terms to the Occurrence in which they were collected or recorded 
(e.g. hasEvidence/isEvidenceFor) and to the Individual from which they 
were derived (e.g. derivedFrom/hasDerivative).  A DNA sequence is another 
resource that isn't an attribute of anything.  It could be the object of 
numerous properties that could have a variety of subjects.
>   
>> I haven't said this before, but are we allowing Individuals 
>> to be dead?  
>>     
>
> Errr....fossils? Preserved specimens? Are they not Individuals?  I know you
> think of them in terms of tracking a living organism over time.  But that's
> only one of the reasons why I support an Individual class (not even the main
> reason).  To me, the main reason is that an "Individual" represents the
> actual organism(s), separate from an Occurrence, which represents the
> presence of an organism at a particular place and time.
>   
I think I am prepared to accept this as long as they are the whole thing 
and not pieces.  There could be some issues with a fossil since in many 
cases the tissues of the organism are replaced by minerals.  But there 
is still a one-to-one relationship, so the problem I described in the 
long paragraph above doesn't apply.
>   
>> If we put it in a jar of alcohol 
>> and cut it into many separately-cataloged pieces, are 
>> all of the pieces still some of the Individual? 
>>     
>
> This is why we need two things for an Individual: 
>
> individualScope (which can range anywhere from the aggregates of multiple
> individuals, all the way down to the smallest parts of individuals)
>   
See above.
> And, a mechanism to track series of "derived from" Individuals.  The ASC
> model covered this, I think (right, Stan?)
>  
>   
I didn't see it in the flow chart, but it could be there somewhere.  I 
had something like this in sernec:derivativeOccurrence and 
sernec:derivedFrom (http://bioimages.vanderbilt.edu/rdf/terms.htm) when 
I was making every token an Occurrence.  But it's better to do as you 
suggest here which is what I did in the examples above. 
>> I think Pete might have been suggesting modeling 
>> things that way with "partOf".  What if we cut a 
>> branch from a tree, glue part of it to a page 
>> and turn part of it into a DNA sample that 
>> get sequenced.  Are those all a part of an 
>> Individual?  
>>     
>
> It seems to me that each unit could represent a separate instance of
> Individual, but the "parts" need to be clearly aggregated around the
> single-organism parent Individual, which itself may be a part of another
> Individual instance that is an aggregate lot of specimens, which itself
> could be a subsampling of another Individual instance that represents a
> population in nature.
>   
Let the supporting evidence (tokens) be whatever type of thing they are 
rather than call them Individuals.  Link them together through 
hierarchical relationships to the single-organism parent Individual.
> In my mind, we parse all of these things as separate instances of
> Individuals, but join them via a hierarchical (parent/child) relationship.
> If I'm not mistaken, this is how the ASC model managed instances of
> BiologicalObject (again....right, Stan?)
>
>   
>> I don't really want them to be, but maybe I must?  
>> Somehow we need to be able to handle road-kill, 
>> which will be dead when we make the 
>> observation/collection.  If we cut a branch from 
>> a tree (an Individual), root it, and grow it in 
>> a botanical garden, do we call the resulting tree 
>> in the garden the same Individual?  I would assign 
>> it a new identifier and call it a new Individual.  
>> I guess my point is that I would only apply the 
>> term Individual to dead stuff, pieces of dead stuff, 
>> and living pieces of things with extreme caution.
>>     
>
> Why extreme caution?  What are the risks that we are cautioning ourselves
> against?
>   
The risk that we make the definition of Individual so broad that it 
can't perform any of the functions it was defined to serve.  We've 
already lost one of them (the ability to infer duplicates) when I agreed 
to the broader definition, but that's the subject of another post.
> These are some principles that I always try to keep in mind when discussing
> these things:
>
> - DwC is a data exchange standard, not so much a physical data model.
> - There is a necessary balance between structuring DwC around how data
> actually exist in content-provider databases, and how data *should* be
> represented in a normalised world
> - When in doubt, DwC should be accommodating, rather than restrictive --
> especially when more restrictive needs can be met via associated data
> filtering
>   
> There are other principles as well, but these are the ones I keep having to
> remind myself of.
>   
I think that what I have suggested above is very unrestrictive.  We 
let evidence be the type of things that they are (PreservedSpecimens, 
Individuals, StillImages, SoundRecordings, DNA sequences, etc.).  We 
don't determine their type by what we want to use them for.  That was 
the mistake that I made in the Biodiversity Informatics paper.  If we 
follow this approach, then a StillImage can fill any role that we want: 
evidence that an Occurrence happened, information to support an 
Identification, a character for a visual key, a logo, etc. We let it 
fulfill those roles by giving it an identifier and connecting it to 
other resources using appropriate terms (hasEvidence, derivedFrom, 
mrtg:attributionLogoURL, etc.).
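As a small illustration of letting a resource keep one type while playing many roles, the sketch below types a single image once and then lists every role it plays by collecting the triples in which it is the object. The identifiers are hypothetical, usesCharacterImage is an invented property standing in for a visual-key link, and the mrtg:attributionLogoURL triple follows this message's mention of that term rather than a checked MRTG example.

```python
triples = [
    ("img:42", "rdf:type", "dcmitype:StillImage"),    # typed by what it is
    ("occ:7", "hasEvidence", "img:42"),               # evidence for an Occurrence
    ("det:3", "basedOn", "img:42"),                   # supports an Identification
    ("key:oaks", "usesCharacterImage", "img:42"),     # hypothetical visual-key property
    ("img:99", "mrtg:attributionLogoURL", "img:42"),  # used as a logo
]


def types_of(resource):
    """The rdf:type values asserted for a resource."""
    return [o for s, p, o in triples if s == resource and p == "rdf:type"]


def roles_of(resource):
    """(property, subject) pairs for every triple that points at the resource."""
    return sorted((p, s) for s, p, o in triples if o == resource)


print(types_of("img:42"))
for role in roles_of("img:42"):
    print(role)
```

Adding a new role only adds a triple; the image's type never changes.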
>
>   
>> I think maybe so.  Maybe the appropriate course 
>> of action here as well is to let people try 
>> different approaches out and if they turn out 
>> to work and be needed, then we talk about 
>> applying them to Darwin Core.
>>     
>
> Ultimately, I think people will use it in accordance with what terms are
> nested within it -- which is why I think it's important to have this
> conversation we're having now.
>   
As I indicated earlier, I think that there are very few terms 
that should be properties of Individual since it is primarily a node 
that connects Occurrences to Identifications (and I guess now to derived 
tokens). 
> Aloha,
> Rich
>
>
>   
Looking forward to responses!  But I don't think development of these 
ideas should hold up the proposal for the class Individual, which can 
stand on its own with its current (revised) definition.
Steve

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
