I have had several off-list email exchanges about this topic and have been encouraged to bring the topic to this list for comment.  Along with addressing this topic specifically, I'd like to include clarification of how I view Occurrences, based partly on emails written during the October/November tdwg-content list discussion regarding the conflict between Darwin Core and Dublin Core usage of dcterms:type, partly on some off-list email discussions about various DwC terms and their appropriate application, and partly on my own view of the relationship between Occurrences and the proposed Darwin Core class Individual.  If I have misrepresented anything, please reply to the list with your perspective on the situation.   (The following namespaces are used for brevity: dcterms="http://purl.org/dc/terms/",  dcmitype="http://purl.org/dc/dcmitype/", dwc="http://rs.tdwg.org/dwc/terms/", and dwctype="http://rs.tdwg.org/dwc/dwctype/".)
 

A. CHARACTERISTICS OF MEMBERS OF THE CLASS dwc:Occurrence

1. Instances of the DwC class Occurrence represent evidence that a particular organism existed at some point in time.  Thus a photo of a certain bird would qualify as an Occurrence, while a painting that was made by looking at a number of stuffed and live birds (but no particular one bird) would not be an Occurrence. 

2. Categorization of a resource as an Occurrence does not imply fitness for any particular purpose.  An Occurrence may or may not be useful for any of the following purposes:  documenting the presence of an organism in nature at a particular place and time (i.e. to help model the natural distribution of a species), serving as a teaching tool, facilitating an identification key, illustrating a character state, providing an image for a trademark of the museum, etc.

3. An Occurrence may be typed using dcterms:type (having values such as dcmitype:PhysicalObject, dcmitype:Sound, or dcmitype:StillImage) and dwc:basisOfRecord (having values such as dwctype:PreservedSpecimen, dwctype:LivingSpecimen, and dwctype:HumanObservation).  However, the typing of an Occurrence again does not imply fitness for any particular purpose.  An Occurrence of a particular type may be used for any or all purposes for which it is useful.

 

B. HOW DO WE KNOW WHICH OCCURRENCES ARE USEFUL FOR DOCUMENTING DISTRIBUTIONS?

Although an Occurrence does not have an implied fitness for any particular use, one of the most common uses of DwC in the past has been to describe the metadata required to document the presence of organisms for the purpose of describing species' distributions.  I don't know if there is an official term for this use, but I'm going to refer to it as the "distribution documentation" use.  In the current Darwin Core standard, this use is facilitated by the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans), which is defined as "the process by which the biological individual(s) represented in the Occurrence became established at the location."  dwc:establishmentMeans helps a metadata user to assess the fitness of a record for distribution documentation by differentiating between individual organisms that occur naturally and those that do not.  Because dwc:establishmentMeans allows for values that span the range from native to adventive/naturalized to cultivated, a metadata user can select a level of stringency when searching.  dwc:establishmentMeans assumes that any type of Occurrence can be used to document the individual.  However, not all Occurrences are suitable for distribution documentation.  So an important question is: under what circumstances do Occurrences provide useful information for documenting a species distribution?  Another important question which should be addressed is: which Occurrence records should have dwc:establishmentMeans as a property? (Or should dwc:establishmentMeans be a property of something other than Occurrences?)
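
As a rough illustration of the "level of stringency" idea, here is a minimal Python sketch.  Only the term name dwc:establishmentMeans and the three example values come from the discussion above.  The ordering of those values, the record layout, the identifiers, and the function name filter_by_stringency are assumptions made for illustration and are not part of Darwin Core.

```python
# Hypothetical ordering from most to least "natural"; not defined by Darwin Core.
STRINGENCY_ORDER = ["native", "adventive/naturalized", "cultivated"]


def filter_by_stringency(records, least_natural_allowed):
    """Keep records whose establishmentMeans is at least as 'natural' as the cutoff."""
    cutoff = STRINGENCY_ORDER.index(least_natural_allowed)
    kept = []
    for record in records:
        value = record.get("dwc:establishmentMeans")
        # Records with no value, or with an unrecognized value, are excluded.
        if value in STRINGENCY_ORDER and STRINGENCY_ORDER.index(value) <= cutoff:
            kept.append(record)
    return kept


# Hypothetical Occurrence records.
records = [
    {"id": "occ-1", "dwc:establishmentMeans": "native"},
    {"id": "occ-2", "dwc:establishmentMeans": "adventive/naturalized"},
    {"id": "occ-3", "dwc:establishmentMeans": "cultivated"},
    {"id": "occ-4"},  # no value recorded
]

strict = filter_by_stringency(records, "native")
relaxed = filter_by_stringency(records, "adventive/naturalized")
print([r["id"] for r in strict])   # ['occ-1']
print([r["id"] for r in relaxed])  # ['occ-1', 'occ-2']
```

Note that this sketch filters on dwc:establishmentMeans alone, so it shares the limitation raised above: it says nothing about whether a given Occurrence is actually suitable for distribution documentation.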

 

C. EXAMPLES

In discussing these questions, I will use some examples which range from a relatively simple circumstance (preserved specimens in a museum or herbarium collection) to more complex networks of various kinds of resources.

Example 1: http://people.vanderbilt.edu/~steve.baskauf/conceptual-scheme-herbarium.gif

Example 2: http://people.vanderbilt.edu/~steve.baskauf/conceptual-scheme-insect.gif

Example 3: http://people.vanderbilt.edu/~steve.baskauf/conceptual-scheme-botanical.gif

 

The starting point in all of these example diagrams is an Individual organism found in the environment.  By Individual I mean the object of the term dwc:individualID, which is the entity that a dwc:Occurrence documents and is described by the class proposed for DwC at:  http://code.google.com/p/darwincore/issues/detail?id=69.  (As a practical matter an "Individual" may also represent a small population of organisms of the same species.)  This Individual is the entity that dwc:establishmentMeans describes.

 

The Occurrence resources that are created are represented by the rectangles in the diagram.  In these examples they are primarily specimens and images but they could also be other types of resources.  Each Occurrence is associated with a particular dcmitype:Event in which the Occurrence resource was created (an "Occurrence resource creation event" described by the date/time and dcterms:Location of the event and represented by arrows in the diagram).  Because of the one-to-one relationship between an Occurrence resource and the Event in which it is created, I have chosen to think of the metadata for both things as a single unit.  One could also divide them conceptually into two separate entities, i.e. different resources having different identifiers.  Whether this approach is desirable or not should probably be the subject of a separate thread.

 

D. WHICH OCCURRENCES CONTAIN LOCATION INFORMATION ABOUT INDIVIDUALS?

Since each Occurrence resource has location metadata associated with its resource creation event, each one has the potential for providing information to document a distribution.  However, it is clear from these diagrams that the location metadata for many of these Occurrences has no use for distribution documentation because the Occurrences were created in locations that have no relationship to the Individual (e.g. the location in a lab where a DNA sample was extracted or in a museum where an image was created from a specimen).  Examination of all three examples shows that the resource creation events for Occurrences that are derived directly from the Individual (highlighted in gray) provide useful information about where the Individual was at a particular time (i.e. distribution documentation), while the resource creation events for all of the other Occurrences do not.  (Note that whether or not an Occurrence documents the distribution of the Individual's species is not related to the type [dcterms:type or dwc:basisOfRecord] of the resource.)  It seems to me that there is a need for a Darwin Core term that indicates whether the location metadata for a particular Occurrence resource creation event is useful for distribution documentation - I can't see any terms currently in the standard that can serve this function.  Perhaps a term could be created such as "documentsDistribution" having values of "true" for those Occurrences derived directly from the Individual and "false" for those that are not.
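
The rule suggested above (true only for Occurrences derived directly from the Individual) can be sketched in Python as follows.  This is only an illustration under stated assumptions: "documentsDistribution" is the term proposed in this message and not an existing Darwin Core term, and the Occurrence data class, its field names, and all identifiers and locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    identifier: str
    resource_type: str  # a dcterms:type or dwc:basisOfRecord value; not used by the rule
    derived_from: str   # identifier of the Individual or of another Occurrence
    location: str       # location of the resource creation event


def documents_distribution(occurrence, individual_ids):
    """True only when the Occurrence was derived directly from an Individual."""
    return occurrence.derived_from in individual_ids


# Hypothetical network of resources, loosely modeled on the example diagrams.
individual_ids = {"ind-1"}
occurrences = [
    Occurrence("occ-specimen", "dwctype:PreservedSpecimen", "ind-1", "field site"),
    Occurrence("occ-field-image", "dcmitype:StillImage", "ind-1", "field site"),
    Occurrence("occ-museum-image", "dcmitype:StillImage", "occ-specimen", "museum"),
    Occurrence("occ-dna-sample", "dcmitype:PhysicalObject", "occ-specimen", "lab"),
]

useful = [o.identifier for o in occurrences
          if documents_distribution(o, individual_ids)]
print(useful)  # ['occ-specimen', 'occ-field-image']
```

The two dcmitype:StillImage records get different results, which matches the point that the value does not depend on the type of the resource.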

 

E. TO WHAT DOES dwc:establishmentMeans APPLY?

Currently dwc:establishmentMeans is assigned to the class Occurrence.  So theoretically, a value for the dwc:establishmentMeans of the Individual could be assigned to any Occurrence resource/resource creation event derived from the Individual regardless of how far the Occurrence is removed from the Individual through multiple resource creation events.  I believe that this would probably be a bad idea, since the person creating the metadata for more distantly derived resources (e.g. a technician doing a DNA extraction or photographing a leg on a pinned specimen in example 2) may not have a clue about the status of the original Individual.  At best, that person would just copy the value of dwc:establishmentMeans from the metadata for some other Occurrence resource derived more directly from the Individual, which would be rather pointless.

 

Alternatively, dwc:establishmentMeans could be assigned only to Occurrence resources/resource creation events for which "documentsDistribution"="true".  One could argue that each collector/observer (i.e. the object of dwc:recordedBy) of an Occurrence derived directly from the Individual could theoretically make an independent assessment of the correct value of dwc:establishmentMeans for the Individual.  It could also be argued that dwc:establishmentMeans should be associated with particular Occurrences because if an Individual's dwc:establishmentMeans status changed over time, then multiple Occurrences recorded from that Individual over time could have different values of dwc:establishmentMeans.  However, it is difficult for me to imagine circumstances under which this would happen.  An Individual that was "cultivated" is not likely to become "native" or vice-versa.  The one situation I can imagine is if a "native" (or "adventive/naturalized") individual were collected and moved to a zoo or botanical garden.  However, in that circumstance, upon collection the organism would cease to be in its natural environment and would conceptually become an Occurrence of type dwctype:LivingSpecimen rather than an Individual.  In such a circumstance, the value of dwc:establishmentMeans for the LivingSpecimen metadata should be "native" (or "adventive/naturalized") because establishmentMeans refers to "the biological individual(s) represented in the Occurrence", not to the Occurrence itself, and at the time of its collection, the Individual's establishmentMeans was "native" (or "adventive/naturalized"). 

 

To me, the most logical thing would be to re-assign dwc:establishmentMeans from the class dwc:Occurrence to the proposed Darwin Core class Individual.  That makes sense to me because after all, dwc:establishmentMeans is supposed to tell us something about an Individual.  However, there could be a couple of problems with this.  One is that it is likely that many collectors of preserved specimens (e.g. who have simple resource creation circumstances such as in example 1) are not going to see any reason to bother with the concept of Individuals.  If they do not create records for Individuals, then in what context will they assign a value of dwc:establishmentMeans?  The other problem that I foresee is if two different collectors/observers recorded Occurrences from the same Individual but assigned the Individual different values of dwc:establishmentMeans.  When those two Individual records are merged into one, there would either have to be some kind of rule for determining which value of dwc:establishmentMeans to use, or two values of dwc:establishmentMeans would have to be allowed for a single Individual.  However, it seems to me that this circumstance would probably occur only very rarely.
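
The merging problem described above can be sketched in Python.  This sketch takes the second option (allowing more than one value per Individual) and simply flags the Individuals where the values disagree; it does not propose a rule for choosing between them.  The tuple layout, the identifiers, the collector names, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical assertions: (Individual identifier, dwc:recordedBy, asserted value).
assertions = [
    ("ind-1", "Collector A", "native"),
    ("ind-1", "Collector B", "adventive/naturalized"),
    ("ind-2", "Collector A", "cultivated"),
    ("ind-2", "Collector C", "cultivated"),
]


def merge_by_individual(assertions):
    """Collect every distinct asserted value, and who asserted it, per Individual."""
    merged = defaultdict(lambda: defaultdict(list))
    for individual_id, recorded_by, value in assertions:
        merged[individual_id][value].append(recorded_by)
    return merged


def find_conflicts(merged):
    """Return the Individuals that ended up with more than one distinct value."""
    return {
        individual_id: sorted(values)
        for individual_id, values in merged.items()
        if len(values) > 1
    }


merged = merge_by_individual(assertions)
conflicting = find_conflicts(merged)
print(conflicting)  # {'ind-1': ['adventive/naturalized', 'native']}
```

Keeping the dwc:recordedBy value alongside each assertion means that a later rule, or a human reviewer, could still decide which value to prefer.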

 

F. SUMMARY

It seems to me that two things are needed here to meet the needs of users who create complex networks of Occurrence resources such as in Examples 2 and 3. 

1. A term such as "documentsDistribution" is needed to unambiguously indicate whether the dcterms:Location and dcmitype:Event (i.e. resource creation event) metadata for a particular Occurrence provide useful information about the Individual that the Occurrence represents.  

2. Clarification of whether the term dwc:establishmentMeans should be a part of the metadata for individual Occurrence resources (and if so, which ones) or whether it should be moved to the proposed class Individual.

 Comments???

Steve Baskauf

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu