Indicating which Occurrence resources are useful for documenting distributions
I have had several off-list email exchanges about this topic and have been encouraged to bring the topic to this list for comment. Along with addressing this topic specifically, I'd like to include clarification of how I view Occurrences, based partly on emails written during the October/November tdwg-content list discussion regarding the conflict between Darwin Core and Dublin Core usage of dcterms:type, partly on some off-list email discussions about various DwC terms and their appropriate application, and partly on my own view of the relationship between Occurrences and the proposed Darwin Core class Individual. If I have misrepresented anything, please reply to the list with your perspective on the situation. (The following namespaces are used for brevity: dcterms="http://purl.org/dc/terms/", dcmitype="http://purl.org/dc/dcmitype/", dwc="http://rs.tdwg.org/dwc/terms/", and dwctype="http://rs.tdwg.org/dwc/dwctype/".)
A. CHARACTERISTICS OF MEMBERS OF THE CLASS dwc:Occurrence
1. Instances of the DwC class Occurrence represent evidence that a particular organism existed at some point in time. Thus a photo of a certain bird would qualify as an Occurrence, while a painting that was made by looking at a number of stuffed and live birds (but no particular one bird) would not be an Occurrence.
2. Categorization of a resource as an Occurrence does not imply fitness of use for any particular purpose. An Occurrence may or may not be useful for any of the following purposes: serve to document the presence of an organism in nature at a particular place and time (i.e. to help model the natural distribution of a species), serve as a teaching tool, facilitate an identification key, illustrate a character state, provide an image for a trademark of the museum, etc.
3. An Occurrence may be typed using dcterms:type (having values such as dcmitype:PhysicalObject, dcmitype:Sound, or dcmitype:StillImage) and dwc:basisOfRecord (having values such as dwctype:PreservedSpecimen, dwctype:LivingSpecimen, and dwctype:HumanObservation). However, the typing of an Occurrence again does not imply fitness of use for any particular purpose. An Occurrence of a particular type may be used for any or all purposes for which it is useful.
B. HOW DO WE KNOW WHICH OCCURRENCES ARE USEFUL FOR DOCUMENTING DISTRIBUTIONS?
Although an Occurrence does not have an implied fitness for any particular use, one of the most common uses of DwC in the past has been to describe the metadata required to document the presence of organisms for the purpose of describing species' distributions. I don't know if there is an official term for this use, but I'm going to refer to it as the "distribution documentation" use. In the current Darwin Core standard, this use is facilitated by the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans), which is defined as "the process by which the biological individual(s) represented in the Occurrence became established at the location." dwc:establishmentMeans helps a metadata user to assess the fitness of a record for distribution documentation by differentiating between individual organisms that occur naturally and those that do not. Because dwc:establishmentMeans allows for values that span the range of native to adventive/naturalized to cultivated, a metadata user can select a level of stringency when searching. dwc:establishmentMeans assumes that any type of Occurrence can be used to document the individual. However, not all Occurrences are suitable for distribution documentation. So an important question is: under what circumstances do Occurrences provide useful information for documenting a species' distribution? Another important question which should be addressed is: which Occurrence records should have dwc:establishmentMeans as a property? (Or should dwc:establishmentMeans be a property of something other than Occurrences?)
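To make the "level of stringency" idea concrete, here is a minimal sketch of how a metadata user might filter records on dwc:establishmentMeans. The value strings are the ones mentioned above; the numeric ranking, the dictionary record layout, and the function name filter_by_stringency are assumptions made only for illustration and are not part of Darwin Core.

```python
# Hypothetical ranking of establishmentMeans values, most stringent first.
# The value strings come from the examples in this email; the numeric
# ranking itself is an assumption made for illustration.
STRINGENCY = {"native": 0, "adventive/naturalized": 1, "cultivated": 2}


def filter_by_stringency(records, loosest_allowed):
    """Return records whose establishmentMeans is at least as strict as loosest_allowed."""
    limit = STRINGENCY[loosest_allowed]
    kept = []
    for record in records:
        rank = STRINGENCY.get(record.get("establishmentMeans"))
        # Records with a missing or unrecognized value are excluded.
        if rank is not None and rank <= limit:
            kept.append(record)
    return kept


# Made-up records for illustration only.
records = [
    {"occurrenceID": "occ-1", "establishmentMeans": "native"},
    {"occurrenceID": "occ-2", "establishmentMeans": "cultivated"},
    {"occurrenceID": "occ-3", "establishmentMeans": "adventive/naturalized"},
    {"occurrenceID": "occ-4"},
]

strict = filter_by_stringency(records, "native")
relaxed = filter_by_stringency(records, "adventive/naturalized")
```

Note that this filter only looks at the status of the Individual; it says nothing about whether the location metadata of a given Occurrence is actually relevant to the distribution.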
C. EXAMPLES
In discussing these questions, I will use some examples which range from a relatively simple circumstance (preserved specimens in a museum or herbarium collection) to more complex networks of various kinds of resources.
Example 1: http://people.vanderbilt.edu/~steve.baskauf/conceptual-scheme-herbarium.gif
Example 2: http://people.vanderbilt.edu/~steve.baskauf/conceptual-scheme-insect.gif
Example 3: http://people.vanderbilt.edu/~steve.baskauf/conceptual-scheme-botanical.gif
The starting point in all of these example diagrams is an Individual organism found in the environment. By Individual I mean the object of the term dwc:individualID, which is the entity that a dwc:Occurrence documents and is described by the class proposed for DwC at: http://code.google.com/p/darwincore/issues/detail?id=69. (As a practical matter an "Individual" may also represent a small population of organisms of the same species.) This Individual is the entity that dwc:establishmentMeans describes.
The Occurrence resources that are created are represented by the rectangles in the diagram. In these examples they are primarily specimens and images but they could also be other types of resources. Each Occurrence is associated with a particular dcmitype:Event in which the Occurrence resource was created (an "Occurrence resource creation event" described by the date/time and dcterms:Location of the event and represented by arrows in the diagram). Because of the one-to-one relationship between an Occurrence resource and the Event in which it is created, I have chosen to think of the metadata for both things as a single unit. One could also divide them conceptually into two separate entities, i.e. different resources having different identifiers. Whether this approach is desirable or not should probably be the subject of a separate thread.
D. WHICH OCCURRENCES CONTAIN LOCATION INFORMATION ABOUT INDIVIDUALS?
Since each Occurrence resource has location metadata associated with its resource creation event, each one has the potential for providing information to document a distribution. However, it is clear from these diagrams that the location metadata for many of these Occurrences has no use for distribution documentation because the Occurrences were created in locations that have no relationship to the Individual (e.g. the location in a lab where a DNA sample was extracted or in a museum where an image was created from a specimen). Examination of all three examples shows that the resource creation events for Occurrences that are derived directly from the Individual (highlighted in gray) provide useful information about where the Individual was at a particular time (i.e. distribution documentation), while the resource creation events for all of the other Occurrences do not. (Note that whether or not an Occurrence documents the distribution of the Individual's species is not related to the type [dcterms:type or dwc:basisOfRecord] of the resource.) It seems to me that there is a need for a Darwin Core term that indicates whether the location metadata for a particular Occurrence resource creation event is useful for distribution documentation - I can't see any terms currently in the standard that can serve this function. Perhaps a term could be created such as "documentsDistribution" having values of "true" for those derived directly from the Individual and "false" for those that are not.
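As a rough sketch of how such a flag could be assigned, suppose each Occurrence record stores the identifier of the thing it was derived from, which is either an Individual or another Occurrence. The class OccurrenceRecord, its field names, and all identifiers below are hypothetical; "documentsDistribution" is only the term proposed above and does not exist in the standard.

```python
from dataclasses import dataclass


@dataclass
class OccurrenceRecord:
    occurrence_id: str
    derived_from: str    # identifier of an Individual or of another Occurrence
    event_location: str  # where the resource creation event took place


def documents_distribution(record, individual_ids):
    """True only when the Occurrence was derived directly from an Individual."""
    return record.derived_from in individual_ids


# Made-up identifiers and locations for illustration only.
individual_ids = {"individual-1"}
records = [
    OccurrenceRecord("specimen-1", "individual-1", "field site"),
    OccurrenceRecord("specimen-image-1", "specimen-1", "museum imaging room"),
    OccurrenceRecord("dna-sample-1", "specimen-1", "lab"),
]

flags = {r.occurrence_id: documents_distribution(r, individual_ids) for r in records}
```

In this sketch only specimen-1 is flagged true, so only its location metadata would be offered for distribution documentation; the type of the resource plays no part in the decision.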
E. TO WHAT DOES dwc:establishmentMeans APPLY?
Currently dwc:establishmentMeans is assigned to the class Occurrence. So theoretically, a value for the dwc:establishmentMeans of the Individual could be assigned to any Occurrence resource/resource creation event derived from the Individual regardless of how far the Occurrence is removed from the Individual through multiple resource creation events. I believe that this would probably be a bad idea, since the person creating the metadata for more distantly derived resources (e.g. a technician doing a DNA extraction or photographing a leg on a pinned specimen in example 2) may not have a clue about the status of the original Individual. At best, that person would just copy the value of dwc:establishmentMeans from the metadata for some other Occurrence resource derived more directly from the Individual, which would be rather pointless.
Alternatively, dwc:establishmentMeans could be assigned only to Occurrence resources/resource creation events for which "documentsDistribution"="true". One could argue that each collector/observer (i.e. the object of dwc:recordedBy) of an Occurrence derived directly from the Individual could theoretically make an independent assessment of the correct value of dwc:establishmentMeans for the Individual. It could also be argued that dwc:establishmentMeans should be associated with particular Occurrences because if an Individual's dwc:establishmentMeans status changed over time, then multiple Occurrences recorded from that Individual over time could have different values of dwc:establishmentMeans. However, it is difficult for me to imagine circumstances under which this would happen. An Individual that was "cultivated" is not likely to become "native" or vice-versa. The one situation I can imagine is if a "native" (or "adventive/naturalized") individual were collected and moved to a zoo or botanical garden. However, in that circumstance, upon collection the organism would cease to be in its natural environment and would conceptually become an Occurrence of type dwctype:LivingSpecimen rather than an Individual. In such a circumstance, the value of dwc:establishmentMeans for the LivingSpecimen metadata should be "native" (or "adventive/naturalized") because establishmentMeans refers to "the biological individual(s) represented in the Occurrence", not to the Occurrence itself, and at the time of its collection, the Individual's establishmentMeans was "native" (or "adventive/naturalized").
To me, the most logical thing would be to re-assign dwc:establishmentMeans from the class dwc:Occurrence to the proposed Darwin Core class Individual. That makes sense to me because after all, dwc:establishmentMeans is supposed to tell us something about an Individual. However, there could be a couple of problems with this. One is that it is likely that many collectors of preserved specimens (e.g. who have simple resource creation circumstances such as in example 1) are not going to see any reason to bother with the concept of Individuals. If they do not create records for Individuals, then in what context will they assign a value of dwc:establishmentMeans? The other problem that I foresee is if two different collectors/observers recorded Occurrences from the same Individual but assigned the Individual different values of dwc:establishmentMeans. When those two Individual records are merged into one, there would either have to be some kind of rule for determining which value of dwc:establishmentMeans to use, or two values of dwc:establishmentMeans would have to be allowed for a single Individual. However, it seems to me that this circumstance would probably occur only very rarely.
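To illustrate the second of the two options (allowing more than one value), here is a minimal sketch of merging Individual records while keeping every distinct dwc:establishmentMeans value that was asserted. The record layout and the function name merge_individual_records are assumptions; a rule-based merge that picks a single value would be the alternative.

```python
def merge_individual_records(records):
    """Merge records for one Individual, keeping all asserted establishmentMeans values."""
    merged = {"individualID": None, "establishmentMeans": set()}
    for record in records:
        if merged["individualID"] is None:
            merged["individualID"] = record["individualID"]
        elif record["individualID"] != merged["individualID"]:
            raise ValueError("records describe different Individuals")
        value = record.get("establishmentMeans")
        if value:
            merged["establishmentMeans"].add(value)
    return merged


# Two observers assess the same (made-up) Individual differently.
merged = merge_individual_records([
    {"individualID": "individual-1", "establishmentMeans": "native"},
    {"individualID": "individual-1", "establishmentMeans": "adventive/naturalized"},
])
conflict = len(merged["establishmentMeans"]) > 1
```

A merged record with more than one value could then be flagged for review rather than silently resolved.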
F. SUMMARY
It seems to me that two things are needed here to meet the needs of users who create complex networks of Occurrence resources such as in Examples 2 and 3.
1. A term such as "documentsDistribution" is needed to unambiguously indicate whether the dcterms:Location and dcmitype:Event (i.e. resource creation event) metadata for a particular Occurrence provide useful information about the Individual that the Occurrence represents.
2. Clarification is needed on whether the term dwc:establishmentMeans should be a part of the metadata for individual Occurrence resources (and if so, which ones) or whether it should be moved to the proposed class Individual.
Comments???
Steve Baskauf
This is a follow-up to the email that I just sent to the list. It is an issue that I see as related to the issues I raised in the other email.
This message is related to a DwC clarification given by John Wieczorek in http://lists.tdwg.org/pipermail/tdwg-content/2010-January/000219.html
in response to my post
http://lists.tdwg.org/pipermail/tdwg-content/2009-December/000201.html
With regard to John's email, it seems clear to me that the subject of dwc:recordedBy should be Occurrences derived directly from Individuals, i.e. those which, in the examples of my previous email, have gray arrows and would have been given "documentsDistribution" values of "true".
That is a more specific way of defining the applicability of recordedBy than the word "original" in the term definition.
I agree with what John said in his post that the dcterms:creator may not be the same as dwc:collector in many circumstances, but I think in his example of a specimen preparator, I would provide the name of the institution as dcterms:creator rather than the person who was the specimen preparator. I suppose that could be a matter of preference - it's probably more important that the dwc:collector be properly recorded.
I'm not sure what I think about John's statement that the dwc:recordedBy value for Occurrence resources with "documentsDistribution" values of true should be applied to all of the resources derived from them. The problem is that if the chain of derivation becomes too complicated, it becomes unclear what the "original" Occurrence is. What about the situation in example 3:
http://people.vanderbilt.edu/~steve.baskauf/conceptual-scheme-botanical.gif
which is a relatively realistic circumstance in a botanical garden? Would I give a dwc:recordedBy value to the DNA sample, and if I did would it be the name of the person who collected the seed, the person who planted the seed in the garden, or the person who collected the DNA sample from the living specimen? It seems to me that it would be better to recommend that a value for dwc:recordedBy be given for all Occurrences that would have a "documentsDistribution" value of true, but to make its assignment optional for any resource where "documentsDistribution"=false. If users want to know who collected the "original" Occurrence resource, let them trace the origin of the Occurrence resource back to the first resource that was derived from the Individual (how to do that is another can of worms that I will not open here, although I do have an answer to that question).
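Purely as an illustration of what such tracing could look like (this is not necessarily the answer alluded to above), here is a sketch that assumes every Occurrence record carries a pointer to the resource it was derived from. The mapping, the function name trace_to_original, and all identifiers are hypothetical, and the chain is only loosely modeled on the seed, living specimen, and DNA sample described for example 3.

```python
def trace_to_original(occurrence_id, derived_from, individual_ids):
    """Follow derived-from links back to the Occurrence taken directly from an Individual.

    Returns None if the chain is broken or loops back on itself.
    """
    seen = set()
    current = occurrence_id
    while current not in seen:
        seen.add(current)
        parent = derived_from.get(current)
        if parent is None:
            return None  # chain is broken; the origin is unknown
        if parent in individual_ids:
            return current  # this Occurrence was derived directly from the Individual
        current = parent
    return None  # a cycle was detected


# Hypothetical derivation chain: seed -> living plant -> DNA sample.
derived_from = {
    "seed-1": "individual-1",
    "living-plant-1": "seed-1",
    "dna-sample-1": "living-plant-1",
}
individual_ids = {"individual-1"}

original = trace_to_original("dna-sample-1", derived_from, individual_ids)
```

Under these assumptions the DNA sample traces back to seed-1, so a user wanting the "original" collector would read dwc:recordedBy from that record rather than from the DNA sample itself.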
Steve Baskauf