[tdwg-content] dwc:associatedOccurrences

Steve Baskauf steve.baskauf at vanderbilt.edu
Tue Aug 24 07:16:43 CEST 2010

Several points:
1. dwc:individualID is defined as an identifier for "an 
individual or named group of individual organisms...".  Thus by this 
definition a small population of organisms of the same species found 
together can be considered an individual even if they are not biological 
individuals.  The definition of individual is a functional one that 
facilitates the common circumstance you describe where a collector 
collects "duplicates" that aren't really duplicate individuals but are 
considered to reliably be from the same species because the members of 
the population are in close proximity and (hopefully) the collector took 
care to collect only biological individuals of the same species. 
2. If it is discovered that a sheet has individuals of two species, a 
curator would undoubtedly create two records for the two species on the 
sheet.  I can't imagine the curator would try to somehow cram the 
metadata for the specimens of both species into the same record unless 
the curator had some kind of special way of dealing with "specimens" 
that were actually themselves collections of species.  The fact that the 
occurrences described in the two records were on the same piece of paper 
would not really be relevant other than that they would have the same 
collection date, collector, etc. in the same way as would two specimens 
of different species collected on the same day by the same collector 
placed on two different papers.
3. I do not see why cataloging the two different parts of the same 
organism as separate specimens would be any problem at all.  I've 
collected from the same tree in different seasons to get leafy stems and 
then winter twigs.  They are two dwc:Occurrences, each with its own set 
of metadata properties and its own identifier, that happen to be from 
the same individual.  People who do mark/recapture or repeated 
observations of live organisms have multiple Occurrences from the same 
individual all the time. 
4. The issue you raise about different determinations based on 
different specimens is not a problem if the 
determinations are associated with the individuals rather than with the 
Occurrences (i.e. specimens) as I suggest in the paper (see Fig. 8 on 
p.30).  If the assertion is made that the individual from which the 
first specimen is derived is owl:sameAs the individual from which the 
second specimen is derived, then a client capable of drawing inferences 
from RDF (e.g. a 'bot gleaning biodiversity metadata from many 
institutions to build a large database) will apply determinations linked 
to either of the individuals to both of them (since they are known to be 
the same entity).  All of the determinations applied to the (now 
singular) individual will be associated with both specimens through the 
relationship of each specimen to its source individual (i.e. its 
dwc:individualID property).  Users of the resulting database would be 
able to examine for either specimen the entire set of determinations 
associated with the (formerly separate) individuals.  They could assess 
the temporal order of the values of the determinations' 
dwc:dateIdentified properties (i.e. which is most "current") and assess 
the weight to place on a particular determination based on the identity 
of the determiner (dwc:identifiedBy property).  You say that the holders 
of the original duplicates would not be interested in the determination 
based on the acorn.  Really?  If the specimens and the acorn came from 
the same tree, the curators certainly ought to be interested because the 
tree couldn't simultaneously BE two different species (even if it had 
two determinations)!
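
As a rough sketch of the inference described in point 4 (a plain-Python illustration, not the RDF machinery from the paper): individuals asserted to be owl:sameAs one another are merged with a small union-find, and the determinations attached to any of them are pooled and ordered by dwc:dateIdentified.  All identifiers, taxon names, dates and determiner names below are invented for illustration, as are the function names.

```python
def find(parent, x):
    """Return the representative of x's sameAs group (with path compression)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merge_same_as(same_as_pairs):
    """Build a union-find structure from (individual, individual) sameAs assertions."""
    parent = {}
    for a, b in same_as_pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    return parent


def determinations_for(occurrence_id, occurrences, determinations, parent):
    """All determinations reachable from an occurrence via its individual, oldest first."""
    root = find(parent, occurrences[occurrence_id])
    pooled = [d for ind, d in determinations if find(parent, ind) == root]
    # ISO 8601 date strings sort correctly as plain strings.
    return sorted(pooled, key=lambda d: d["dateIdentified"])


# Hypothetical data: occurrence -> value of its dwc:individualID
occurrences = {"sheet-A": "ind-1", "sheet-B": "ind-1", "acorn-C": "ind-2"}

# Hypothetical determinations, each attached to an individual
determinations = [
    ("ind-1", {"scientificName": "Taxon A", "dateIdentified": "1985-06-01",
               "identifiedBy": "Determiner X"}),
    ("ind-2", {"scientificName": "Taxon B", "dateIdentified": "2009-10-12",
               "identifiedBy": "Determiner Y"}),
]

# The assertion: ind-1 owl:sameAs ind-2
parent = merge_same_as([("ind-1", "ind-2")])

result = determinations_for("sheet-A", occurrences, determinations, parent)
for d in result:
    print(d["dateIdentified"], d["scientificName"], d["identifiedBy"])
```

After the merge, asking for the determinations of either duplicate sheet or of the acorn returns the same pooled list, which is the behavior argued for above.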

This suggested solution isn't intended to address all kinds of 
associations, just the case of duplicates.  Someone else can take that 
one on... :-)

Bob Morris wrote:
> Good idea, but it suffers from the same fate as might
> associatedOccurrences  (not previously mentioned because I was after
> some clarification in principle, with the herbarium duplicate sheets
> only one current case of interest): I need to follow whatever the
> community practice is of regarding a sheet as part of a duplicate set
> distributed by the original collector.  I'm told by the people at the
> Harvard University Herbaria that "duplicate" usually, but not always,
> means from the same organism and same collection event---occasionally
> people used to put several organisms on the same sheet, raising the
> possibility that they are not even the same taxon. Worse, the
> different parts of the same organism might be catalogued as separate
> specimens. In this case, an assertion that they are from the same
> individual might be true and understandable, but the utility of that
> assertion depends on your purpose. Consider a use case in which one
> set of traditional duplicates all have a determination that is out of
> date, but another specimen---say your acorn collected later from the
> same tree---has a current determination.  For purposes of notifying
> duplicate holders that a new determination has been made to the
> original, the later acorn may not be interesting. This means that for
> this use, a distributed query of the form "find all records with the
> same dwc:individualID" is not as useful as "find all records with the
> same dwc:eventID".
> Also, as Mark writes, it doesn't address any other associatedOccurrences.
> More generally, we are working on annotations of data records.
> Probably the real issue here is that associatedOccurrences is an
> assertion about organisms, and we are making assertions about
> occurrence data.
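
The difference between the two queries mentioned above can be sketched with a few in-memory records.  This is only an illustration under assumed data: the record values and the helper name group_by are hypothetical, and a real distributed query would of course run against provider services rather than a list.

```python
from collections import defaultdict


def group_by(records, term):
    """Group occurrenceIDs by the value of one Darwin Core term."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[term]].append(rec["occurrenceID"])
    return dict(groups)


# Hypothetical records: two duplicate sheets from one collecting event,
# plus an acorn collected later from the same tree.
records = [
    {"occurrenceID": "sheet-A", "individualID": "tree-1", "eventID": "event-1"},
    {"occurrenceID": "sheet-B", "individualID": "tree-1", "eventID": "event-1"},
    {"occurrenceID": "acorn-C", "individualID": "tree-1", "eventID": "event-2"},
]

by_individual = group_by(records, "individualID")
by_event = group_by(records, "eventID")

print(by_individual["tree-1"])  # all three records
print(by_event["event-1"])      # only the two duplicate sheets
```

Grouping on individualID pulls in the later acorn, while grouping on eventID returns only the sheets from the original collecting event.
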
> On Mon, Aug 23, 2010 at 3:07 PM, Steve Baskauf
> <steve.baskauf at vanderbilt.edu> wrote:
>> Bob,
>> It seems to me that the most semantically clear way to indicate in a
>> machine-readable way that two herbarium sheets are duplicates would be to
>> assert that they have the same dwc:individualID.  individualID is defined as
>> "An identifier for an individual or named group of individual organisms
>> represented in the Occurrence" so asserting that two occurrences represent
>> the same individual or named group of individual organisms pretty much
>> exactly describes what duplicate specimens are.  I use this same approach to
>> indicate that
>> http://bioimages.vanderbilt.edu/baskauf/67307
>> is an image of an acorn from the same tree:
>> http://bioimages.vanderbilt.edu/ind-baskauf/67304
>> as the bark image
>> http://bioimages.vanderbilt.edu/baskauf/67312
>> I won't say more here as I have written more extensively on this approach in
>> Biodiversity Informatics 7:17-44
>> (https://journals.ku.edu/index.php/jbi/article/view/3664).  You can also
>> look at the RDF associated with those GUIDs to see what I mean.  Solving
>> this problem is also one of the reasons I have proposed adding the class
>> Individual to DwC (i.e. so that the individuals that are the object of
>> dwc:individualID can be rdf:type'd using a well-known vocabulary and
>> therefore be "understood" by linked data clients).
>> Steve
>> Bob Morris wrote:
>> http://rs.tdwg.org/dwc/terms/index.htm#associatedOccurrences carries
>> this description:
>> associatedOccurrences
>> Identifier:	http://rs.tdwg.org/dwc/terms/associatedOccurrences
>> Class:	http://rs.tdwg.org/dwc/terms/Occurrence
>> Definition:	A list (concatenated and separated) of identifiers of
>> other Occurrence records and their associations to this Occurrence.
>> Comment:	Example: "sibling of FMNH:Mammal:1234; sibling of
>> FMNH:Mammal:1235". For discussion see
>> http://code.google.com/p/darwincore/wiki/Occurrence
>> Details:	associatedOccurrences
>> My questions:
>> a.  Are the names of the associations, and/or the syntax of the value
>> meant to be community defined?
>> b. If no to a., where are those definitions? If yes, have any
>> communities defined any names and syntax? I am especially interested
>> in "duplicate of" in the case of herbarium sheets.
>> c. (May share an answer with b.) Is anyone making use of
>> associatedOccurrences in a way designed to have machine-readable
>> values?  If yes, where?
>> Thanks
>> Bob
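
To make question c. concrete, here is one way the example value from the term's comment could be read by a machine.  The syntax is an assumption, not something Darwin Core defines: entries are taken to be separated by ";" and the identifier is taken to be the last whitespace-delimited token of each entry.  The function name is hypothetical.

```python
def parse_associated_occurrences(value):
    """Split an associatedOccurrences string into (relationship, identifier) pairs.

    Assumes ';'-separated entries whose last token is the identifier;
    this convention is a guess, not part of the term definition.
    """
    pairs = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        relationship, _, identifier = entry.rpartition(" ")
        pairs.append((relationship.strip(), identifier))
    return pairs


parsed = parse_associated_occurrences(
    "sibling of FMNH:Mammal:1234; sibling of FMNH:Mammal:1235"
)
for relationship, identifier in parsed:
    print(relationship, "->", identifier)
```

A value such as "duplicate of" followed by an identifier would parse the same way, but only if the community agreed on both the relationship names and the separator.
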
>> --
>> Steven J. Baskauf, Ph.D., Senior Lecturer
>> Vanderbilt University Dept. of Biological Sciences
>> postal mail address:
>> VU Station B 351634
>> Nashville, TN  37235-1634,  U.S.A.
>> delivery address:
>> 2125 Stevenson Center
>> 1161 21st Ave., S.
>> Nashville, TN 37235
>> office: 2128 Stevenson Center
>> phone: (615) 343-4582,  fax: (615) 343-6707
>> http://bioimages.vanderbilt.edu

