[tdwg-content] New Darwin Core terms proposed relating to material samples

Steve Baskauf steve.baskauf at vanderbilt.edu
Sun May 26 15:08:32 CEST 2013

I suppose that Rich and Rob W. have already looked at 
http://code.google.com/p/darwin-sw/wiki/TaxonomicHeterogeneity .  I 
think it pretty much encapsulates what they are talking about.  I should 
note that the way DSW defines dsw:IndividualOrganism does not require it 
to be a single organism.  It can be a collection of organisms (herd, 
colony, school) or part of an organism (tissue).  The basic requirement 
is that it is a "taxonomically homogeneous entity".  In a variant form 
of DSW (dsw_alt.owl) we included "taxonomically heterogeneous entity" 
(THeE) which would basically include what Rich and Rob W. are talking 
about (lots of organisms which are separable and aren't necessarily 
from the same lowest taxonomic level).  It should be no surprise that 
THeE does what Rich wants: we included it in DSW because Rich said 
during the preceding discussion that he wanted something like it.  In
dsw_alt.owl, properties like "hasPart" and "isPartOf" are used to 
connect physical entities whose properties can be inferred by 
inheritance.  What this diagram includes that Rich did not mention are 
"tokens" (evidence).  We defined a class for evidence, but we also 
considered not making evidence an explicit class.  Not defining an 
explicit Token class would have simplified the diagram at the bottom of 
the page - one could just say that there should be evidence and it 
should be linked to the resource it documents.  Token and 
THeE/IndividualOrganism are not disjoint classes - the physical entity 
can be the evidence if somebody "owns" it and makes it available for 
people to examine.  However, in DSW, Token and THeE are not synonymous 
because we allow evidence to include things that are not physically 
derived from the entity (e.g. images, sounds, string data records) in 
addition to physical specimens.
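
The hasPart/isPartOf inheritance idea can be sketched in a few lines of 
Python.  This is a minimal illustration under stated assumptions, not the 
dsw_alt.owl mechanism itself: the entity IDs, the property names, the rule 
that a part takes a value from the nearest whole that asserts it, and the 
INHERITABLE whitelist are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical isPartOf links (part -> whole); all IDs are made up.
IS_PART_OF: Dict[str, str] = {
    "tissue-1": "fish-7",  # a tissue sample taken from one fish
    "fish-7": "lot-42",    # the fish belongs to a (possibly mixed) lot
}

# Hypothetical properties asserted directly on some entities.
ASSERTED: Dict[str, Dict[str, str]] = {
    "lot-42": {"locality": "Example Creek", "individualCount": "12"},
    "fish-7": {"sex": "female"},
}

# Assumption: only these properties may flow from a whole to its parts.
# A count (or a determination of a heterogeneous lot) must not.
INHERITABLE: Set[str] = {"locality", "sex"}


def inherited_value(entity: str, prop: str) -> Optional[str]:
    """Return prop for entity, walking up isPartOf to the nearest whole
    that asserts it.  Non-inheritable properties are only read locally."""
    if prop not in INHERITABLE:
        return ASSERTED.get(entity, {}).get(prop)
    current: Optional[str] = entity
    visited: Set[str] = set()  # guard against accidental isPartOf cycles
    while current is not None and current not in visited:
        visited.add(current)
        value = ASSERTED.get(current, {}).get(prop)
        if value is not None:
            return value
        current = IS_PART_OF.get(current)
    return None
```

For example, inherited_value("tissue-1", "locality") walks tissue-1 -> 
fish-7 -> lot-42 and returns "Example Creek", while 
inherited_value("fish-7", "individualCount") returns None.  Making 
inheritance opt-in per property keeps a value that describes only the 
whole (like the lot's count) from leaking onto its parts; the same caution 
would apply to a determination made on a taxonomically heterogeneous lot.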

I think that we have to be careful when we say "we don't need X", "there 
is pressing value for X but not for Y", "X is too vaguely defined", 
etc.  MaterialSample does exactly what the metagenomics people need 
because they invented it to serve the purposes they want it to serve 
(handling material samples for which one may never know all of the 
organisms included, or even whether they contain any organisms at all).  
Individual (sensu Pyle/Whitton)/THeE is just vague enough to do what 
Rich and Rob W. want it to do with their lots and specimens, but is too 
vague for Rob G.  IndividualOrganism (sensu DSW) and Token do exactly 
what Cam Webb and I want them to do with our images, specimens, DNA 
samples, and data records.  The requirement that IndividualOrganism 
be taxonomically homogeneous allows us to infer that a determination 
applied to one resource also applies to other resources which are 
derived from the same IndividualOrganism (a requirement not stated by 
the others), but it's too restrictive for both Rob G. and Rich.  If we
start in on the game of saying "WE need the features that I think are 
important but not the features that YOU think are important" then we are 
in for another month of massive email traffic on this list and will end 
up no better off than we were when we started.
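
The inference that taxonomic homogeneity permits can be illustrated with a 
small Python sketch over (subject, predicate, object) tuples.  The property 
names derivedFrom, identifies and toTaxon, the IDs, and the taxon are 
hypothetical simplifications, not quoted from the DSW ontology; each token 
is assumed to be derived from exactly one organism.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: three tokens from org-1, one token from org-2.
TRIPLES: List[Triple] = [
    ("image-1", "derivedFrom", "org-1"),
    ("specimen-1", "derivedFrom", "org-1"),
    ("dna-1", "derivedFrom", "org-1"),
    ("sound-2", "derivedFrom", "org-2"),
    ("id-1", "identifies", "org-1"),
    ("id-1", "toTaxon", "Quercus alba"),
]


def taxa_for_tokens(triples: List[Triple]) -> Dict[str, Set[str]]:
    """Map each token to every taxon applied to the organism it came from.

    Sound only because each organism is assumed taxonomically homogeneous:
    a determination of the organism holds for everything derived from it.
    """
    org_of: Dict[str, str] = {}       # token -> organism
    identifies: Dict[str, str] = {}   # identification -> organism
    taxon_of_id: Dict[str, str] = {}  # identification -> taxon name
    for s, p, o in triples:
        if p == "derivedFrom":
            org_of[s] = o
        elif p == "identifies":
            identifies[s] = o
        elif p == "toTaxon":
            taxon_of_id[s] = o
    taxa_of_org: Dict[str, Set[str]] = defaultdict(set)
    for ident, org in identifies.items():
        if ident in taxon_of_id:
            taxa_of_org[org].add(taxon_of_id[ident])
    return {tok: set(taxa_of_org.get(org, set())) for tok, org in org_of.items()}
```

Here the image, the specimen, and the DNA sample all pick up "Quercus 
alba" from the single identification of org-1, while sound-2 gets an empty 
set because nothing identifies org-2.  If org-1 were a taxonomically 
heterogeneous lot, this propagation would not be sound, which is exactly 
what the homogeneity requirement rules out.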

I think that it is clear from this and preceding discussions that there 
is a need for some system of tracking things that are like 
individuals/organisms/samples/lots.  It is my belief that what needs to 
happen is:
1. define clearly what the various stakeholders want to accomplish by 
their version of individuals/organisms/samples/lots (i.e. use 
cases/competency questions)
2. use set theory or some other kind of logical system to describe 
clearly how the various versions of individuals/organisms/samples/lots 
are related to each other
3. examine alternative mechanisms for defining the relationships among 
the variously defined individuals/organisms/samples/lots terms and 
determine how each approach can or cannot satisfy the use 
cases/competency questions.
4. use one or more mechanisms which pass test #3 to define the terms 
that are deemed necessary and include them in some TDWG standard which 
may or may not be Darwin Core.
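
As a toy illustration of step 2, class definitions can be compared as sets 
of example entities.  The sketch below encodes the two DSW classes 
discussed above with hypothetical example members chosen to reflect the 
stated relationship (a curated physical entity can be both an 
IndividualOrganism and a Token, and Tokens also include images, sounds and 
data records).  Comparing finite example sets can refute a proposed axiom, 
such as disjointness, but cannot prove one.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple


def relation(a: FrozenSet[str], b: FrozenSet[str]) -> str:
    """Classify how two class extensions (sets of examples) relate."""
    if a == b:
        return "equivalent"
    if a < b:
        return "subset"
    if a > b:
        return "superset"
    if a.isdisjoint(b):
        return "disjoint"
    return "overlapping"


# Hypothetical example members; not an authoritative classification.
CLASSES: Dict[str, FrozenSet[str]] = {
    "IndividualOrganism": frozenset({"wild tree", "herd", "preserved specimen"}),
    "Token": frozenset({"preserved specimen", "image", "sound recording", "data record"}),
}


def pairwise_relations(classes: Dict[str, FrozenSet[str]]) -> Dict[Tuple[str, str], str]:
    """Relation for every unordered pair of class names (sorted order)."""
    names = sorted(classes)
    return {(x, y): relation(classes[x], classes[y]) for x, y in combinations(names, 2)}
```

With these examples the only pair comes out "overlapping": not disjoint, 
not synonymous.  Stakeholders could add their own classes (for example 
MaterialSample, populated from their use cases) to CLASSES and see which 
pairs conflict with the equivalences or disjointness their definitions 
assume.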

In September 2011, John Wieczorek had packaged several of the proposed 
class additions to Darwin Core into a concrete proposal: 
.  This proposal was deferred by the Executive Committee (see the last 
comment at http://code.google.com/p/darwincore/issues/detail?id=117 ) 
"... until we can further examine broader changes including the new 
classes and any insights that might come out of the RDF Interest 
Group."  So the RDF Task Group has specifically been charged with 
examining the addition of new classes to Darwin Core and their 
implications.  The RDF TG has assembled competency questions 
http://code.google.com/p/tdwg-rdf/wiki/CompetencyQuestions and use cases 
http://code.google.com/p/tdwg-rdf/wiki/UseCases, which is a start on 
item #1 in the list above.  However, the process has not moved beyond 
that.  I recently appealed to the TG for someone to take up work on 
concrete deliverables, but got no responses.  I cannot be the person to move this
forward for two reasons.  One is that I already have my hands full with 
the DwC RDF guide (which doesn't address these issues) and the other is 
that I have reached the limits of my technical skills and am not able to 
take leadership on items #2-#4.  Who will champion this?

At the risk of making this email too long, I will add one more comment.  
There seems to be a developing consensus that an OWL ontology structured 
according to the OBO Foundry (http://www.obofoundry.org/) principles is 
the answer to #2 and #3 above.  However, I have yet to see evidence 
that the complexity introduced by a formal OWL ontology is necessary, or 
any concrete examples of how an OBO-style ontology would be used 
to satisfy the use cases.  We have shown with DSW that some use cases 
can be met using only simple RDF and SPARQL (i.e. no actual reasoner 
involved).  I presume that Rich and Rob W. have in hand a technical 
solution to their use cases that doesn't involve RDF at all.  So I think 
that there need to be some iterations of defining and testing before we 
adopt a technology by acclamation.  We've been down that road before
with the TDWG Ontology and look how that turned out.
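
To illustrate what "simple RDF and SPARQL (no reasoner)" means in 
practice, here is a toy Python matcher for a SPARQL-style basic graph 
pattern.  It is not SPARQL and not how DSW was tested; the triples, the 
short property names (type, derivedFrom, identifies, toTaxon), and the 
competency question ("which images document an organism identified as 
Quercus alba?") are hypothetical.  Every answer comes from joining 
explicitly asserted triples; no class hierarchy or OWL axiom is consulted.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical asserted triples.
TRIPLES: List[Triple] = [
    ("image-1", "type", "Image"),
    ("image-1", "derivedFrom", "org-1"),
    ("specimen-1", "type", "PreservedSpecimen"),
    ("specimen-1", "derivedFrom", "org-1"),
    ("image-2", "type", "Image"),
    ("image-2", "derivedFrom", "org-2"),
    ("id-1", "identifies", "org-1"),
    ("id-1", "toTaxon", "Quercus alba"),
]

# Pattern for the hypothetical competency question.
PATTERNS: List[Triple] = [
    ("?img", "type", "Image"),
    ("?img", "derivedFrom", "?org"),
    ("?id", "identifies", "?org"),
    ("?id", "toTaxon", "Quercus alba"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals triple, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to something else
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Naive nested-loop join of all patterns over the asserted triples."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings
```

Only image-1 satisfies all four patterns; image-2 drops out because 
nothing identifies org-2.  A real triple store would evaluate the 
equivalent SPARQL query with optimized joins, but the logic is the same 
matching over asserted data.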


Robert Guralnick wrote:
> I agree with John and Gregor.  The term "individual" doesn't quite 
> seem to capture the concept or usage.  However, I think there is more 
> general agreement that there is a pressing need - and immediate value 
> - for a term to represent "material sample" and derivatives.  It seems 
> that the proposal on the table serves that need with the right 
> definition, that is explicit, and that provides necessary linkages to 
> other related domains.  
> Best, Rob
> On Sun, May 26, 2013 at 1:47 AM, Gregor Hagedorn
> <g.m.hagedorn at gmail.com> wrote:
>     > Basically, we’ve been running with the idea of an “Individual” class – as
>     > originally proposed by Steve and discussed at some length on this list a
>     > while ago.  This has been documented for DSW:
>     >
>     > https://code.google.com/p/darwin-sw/wiki/ClassIndividual
>     > We define an “Individual” as the physical “something” that underpins an
>     > Occurrence.  In the case of organisms, this can be a group (herd, school,
>     > flock, etc.), specimen (either a single specimen, or a lot of multiple
>     > specimens), or any sort of derivative of a specimen (part, tissue sample,
>     > dna extraction, etc.).  It corresponds to the intended meaning of
>     I disagree with using "Individual" for sets of objects. It is
>     surprising, and lacking any clear definition when to stop, that means
>     a taxon is an individual, a collection is an individual, etc.

Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
