Consensus on what constitutes an Occurrence? (was Re: New Darwin Core terms proposed relating to material samples)
I have quoted below part of an email which has been sitting in my inbox for a month. It has been stuck there because there was a statement in it that (in my mind) needed clarification. In John Deck's email, he says "...since an Occurrence represents an organism at a place and time...". What I am wondering is whether there is actually a consensus that an Occurrence represents an organism at a place and time.
Caveat: I use "individual organism" here in a general way that probably includes more than individual organisms. But that is a different issue, so let's not rehash that in this thread.
The history of the discussion of the meaning of Occurrence is extensive. You can find my attempt to summarize it at: http://code.google.com/p/darwin-sw/wiki/ClassOccurrence so I won't repeat that here. In a nutshell, it seems to me that people have used dwc:Occurrence in three general ways:
- to indicate that we know from aggregate records that a taxon occurs or ever occurred, in a particular geographic area (the "checklist" meaning of Occurrence)
- as a broad term that includes both preserved specimens and observations (the "superclass" meaning of Occurrence)
- as a join between Events and individual organisms [database description]/as a node connecting Event instances to instances of individual organisms [RDF description]/as a tuple of (individual organism,Event) with properties to connect it to the individual organism and Event [computer science description] (the "node" meaning of Occurrence).
It has been noted that the "checklist" meaning of Occurrence is related to Occurrence as a primary unit of data gathering ("superclass" and "node" meanings; see history reference for details) but the "checklist" meaning is probably the least likely to be considered a consensus view, so I'm going to ignore it for the moment.
The "node" meaning of Occurrence corresponds to what is described by John Deck (quoting Markus Döring) in his email below. It is also the view taken by Darwin-SW and is reflected in Rich Pyle's emails (related since Darwin-SW was influenced by Rich Pyle's emails!). However, although it isn't explicitly stated as such, the Darwin Core standard as it currently stands really reflects the "superclass" meaning. I was involved in a conversation with John Wieczorek a few months ago which was on the topic of "fixing" dwc:Occurrence (i.e. getting rid of the ambiguity surrounding it). In that conversation, I confirmed with John W. that as things stand currently, Darwin Core effectively considers dwc:Occurrence to be a superclass of PreservedSpecimen and Observation.
So to me it does not seem that there actually is a consensus about what dwc:Occurrence means. Is an Occurrence the *thing* that documents the presence of an organism at a place and time ("superclass" meaning), or is the Occurrence an *abstract resource* connecting organisms to place/time with the thing itself as documentation for the abstract resource ("node" meaning)?
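
To make the contrast between the "superclass" and "node" readings concrete, here is a minimal sketch in Python. It is only an illustration, not anything defined by Darwin Core or Darwin-SW: every class name, field name, and value below (SuperclassOccurrence, NodeOccurrence, Evidence, "tree-1", "Site A", "CAT-0001") is a hypothetical stand-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    place: str
    date: str


@dataclass(frozen=True)
class Organism:
    organism_id: str


# "Superclass" reading: the documenting thing itself IS the Occurrence.
@dataclass(frozen=True)
class SuperclassOccurrence:
    organism: Organism
    event: Event


@dataclass(frozen=True)
class PreservedSpecimen(SuperclassOccurrence):
    catalog_number: str


# "Node" reading: the Occurrence is only the (organism, Event) link,
# and the specimen or observation is separate evidence pointing at it.
@dataclass(frozen=True)
class NodeOccurrence:
    organism: Organism
    event: Event


@dataclass(frozen=True)
class Evidence:
    kind: str  # e.g. "PreservedSpecimen" or "Observation"
    documents: NodeOccurrence


# Hypothetical example values.
tree = Organism("tree-1")
visit = Event("Site A", "2013-06-24")

# Superclass reading: a specimen is an Occurrence.
specimen = PreservedSpecimen(tree, visit, "CAT-0001")

# Node reading: two pieces of evidence document one and the same Occurrence.
occurrence = NodeOccurrence(tree, visit)
sheet = Evidence("PreservedSpecimen", occurrence)
photo = Evidence("Observation", occurrence)
```

Under the first reading a specimen and a photograph of the same organism at the same Event would be two Occurrences; under the second they are two pieces of evidence for one.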
In order to "fix" Occurrence by clarifying its meaning, it seems to me that there are two courses of action:
1. Declare clearly that Occurrence is a superclass of PreservedSpecimen and Observation and create a new term for the more abstract "organism at a place and time".
2. Declare clearly that Occurrence is an organism at a place and time and that it is NOT a superclass of PreservedSpecimen and Observation.
The second course of action would be the easiest from the standpoint of making a change to the standard. However, it might be the worst from an implementation standpoint because of the thousands (millions?) of specimen records that are typed as Occurrence.
If we can clarify these two uses of Occurrence, then the terms currently listed in DwC under the dwc:Occurrence class could be separated among the two "kinds" of Occurrence. Terms related to the recording of the presence of an organism at a time and place (dwc:recordedBy, dwc:behavior, etc.) would be separated from terms related to the specimens themselves (dwc:preparations, dwc:disposition, etc.). This may not seem like a big deal for flat specimen records, but it would be very helpful from the standpoint of advancing the use of DwC in RDF to clarify the types of resources that these terms can serve as properties of.
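
As a rough sketch of what that separation could look like for a flat record, the Python below sorts terms into two groups. Only the four terms named above are assigned; where any other term belongs is exactly the open question, so everything else is left unassigned. The function name, the sample values, and the use of dwc:catalogNumber as an extra field are assumptions for illustration.

```python
# Terms about recording the presence of an organism at a time and place.
RECORDING_TERMS = {"dwc:recordedBy", "dwc:behavior"}
# Terms about the specimen itself.
SPECIMEN_TERMS = {"dwc:preparations", "dwc:disposition"}


def split_record(flat_record):
    """Split a flat record into recording, specimen, and unassigned parts."""
    recording, specimen, unassigned = {}, {}, {}
    for term, value in flat_record.items():
        if term in RECORDING_TERMS:
            recording[term] = value
        elif term in SPECIMEN_TERMS:
            specimen[term] = value
        else:
            # Not classified here; its grouping is not settled in this thread.
            unassigned[term] = value
    return recording, specimen, unassigned


# Hypothetical flat specimen record.
flat = {
    "dwc:recordedBy": "A. Collector",
    "dwc:behavior": "foraging",
    "dwc:preparations": "skin",
    "dwc:disposition": "in collection",
    "dwc:catalogNumber": "12345",
}
recording, specimen, unassigned = split_record(flat)
```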
I would be interested in hearing some discussion about concrete steps that could be taken to "fix" Occurrence. The "best" solution would probably be to create a robust consensus ontology that includes Occurrence. However, that is not likely to happen on the timescale of a year or less. Given that this issue has dragged on for at least two years already, in the interest of moving forward it would be good to take some kind of decisive action in the near term.
Steve
-------- Original Message --------
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples
Date: Wed, 29 May 2013 16:00:35 +0200
From: John Deck jdeck@berkeley.edu
To: Richard Pyle deepreef@bishopmuseum.org
CC: Markus Döring m.doering@mac.com, Steve Baskauf steve.baskauf@vanderbilt.edu, TDWG Content Mailing List tdwg-content@lists.tdwg.org, Robert Whitton whittonr@gmail.com, "Ramona Walls" rlwalls2008@gmail.com
Since the original proposal was from a group of folks, we decided to put our heads together to construct a general response to the various issues and ideas expressed on this thread.
John Deck for Rob Guralnick, Ramona Walls, and John Wieczorek
...
How is MaterialSample different from Individual?
The intent of individualID is fairly clear: since an Occurrence represents an organism at a place and time (per Markus’ email), the individualID term allows us to assign an instance identifier for a particular organism that can be present in multiple events. MaterialSampleID, on the other hand, is intended to allow users to say that the basis of an occurrence is a material entity (i.e. matter) that has been sampled according to some particular method. Whether or not
...
Steve,
I agree with you that there is no consensus on what is a Darwin Core occurrence. In fact, the situation is worse than you describe, since you don't mention the "http://rs.tdwg.org/dwc/terms/type-vocabulary/" ("dwctype:") namespace below. (I assume that by "dwc:", you mean "http://rs.tdwg.org/dwc/terms/".) As you know, part of the confusion about occurrences is that the term exists in two distinct Darwin Core namespaces.
In fact, I've been meaning to ask you something ... The DwC RDF guide (correctly, IMO) specifies that occurrences should be rdf:typed using the dwctype:Occurrence class [1]. But Darwin-SW uses the dwc:Occurrence class as the rdf:type of occurrence records. I also used dwc:Occurrence when representing bioblitz occurrences. So my question is: Were we both wrong, and should we remove our (incorrectly typed?) occurrence records from the web? Or are you saying that it's hard to say what's right and what's wrong, since the documentation and existing usage are inconsistent?
Best, Joel.
[1] https://docs.google.com/document/d/1OLyVFuveGX1a0Yt6Niok9FfGxkrghii-xlGYhwO8... Section 2.3.1.5
Joel,
Yes, I purposefully avoided appending a namespace before "Occurrence" to avoid complicating the email with the issue you raise. When we discussed the dual class definitions (dwc: general namespace classes such as dwc:Occurrence and dwctype: type vocabulary namespace classes such as dwctype:Occurrence), John Wieczorek felt that it made sense to use the type vocabulary for rdf:type declarations and I agreed with this position. The type vocabulary already contains a number of classes that aren't declared in the general dwc: namespace, so it would require fewer changes to put all of the classes recommended for use with rdf:type into the type vocabulary. This would include the newly proposed dwctype:MaterialSample. The existing dwc: classes would remain as organizational suggestions, but would not have a role in RDF typing.
Yes, the DwC RDF Guide specifies the use of dwctype:Occurrence with rdf:type (assuming that people can figure out what an Occurrence is). Darwin-SW made the coin flip in the other direction. However, Cam and I intend to advance Darwin-SW to a next version which will be in line with the DwC RDF recommendations. We haven't done that yet because we wanted to wait to see if the DwC RDF Guide was going to fly or not first. I don't know how many records there are on the web that use Darwin-SW at present. I've got about 15,000 distinct RDF files based on it, but I intend to make them conform with the DwC RDF Guide recommendations when the recommendations are finished. Again, I'm not going to do that until I know if the Guide is going to fly.
There is some possibility that we might "break" something by changing Darwin-SW, but since Darwin-SW assumes using dwc: namespace Darwin Core properties for most of the datatype property triples and since some people are likely to be using the dwc: namespace terms in ways that conflict with the DwC RDF Guide anyway, there will probably be many things that will have to be cleaned up in RDF that uses Darwin Core. But better to do that now when most of the triples that are out there are test implementations than later when millions of triples have been exposed.
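
For anyone wondering what that kind of cleanup might amount to, here is a minimal sketch in plain Python, with triples held as simple (subject, predicate, object) tuples rather than in any particular RDF library. The two namespace IRIs are the ones quoted in Joel's message; the function name and the example.org subject are hypothetical, and this is not the actual procedure planned for Darwin-SW or the RDF files mentioned above.

```python
# Namespace IRIs as quoted earlier in this thread.
DWC = "http://rs.tdwg.org/dwc/terms/"
DWCTYPE = "http://rs.tdwg.org/dwc/terms/type-vocabulary/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def retype_occurrences(triples):
    """Swap dwc:Occurrence for dwctype:Occurrence in rdf:type triples only."""
    old = DWC + "Occurrence"
    new = DWCTYPE + "Occurrence"
    return [
        (s, p, new) if (p == RDF_TYPE and o == old) else (s, p, o)
        for (s, p, o) in triples
    ]


# Hypothetical record: one typing triple and one dwc: property triple.
subject = "http://example.org/occurrence/1"
before = [
    (subject, RDF_TYPE, DWC + "Occurrence"),
    (subject, DWC + "recordedBy", "A. Collector"),
]
after = retype_occurrences(before)
```

Only the typing triple changes; triples that use dwc: namespace properties are left alone.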
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: PMB 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 322-4942 If you fax, please phone or email so that I will know to look for it. http://bioimages.vanderbilt.edu
Oh, I get confused about which messages are coming from tdwg-content and which are coming from the RDF task group list. Since this message went out on the tdwg-content list, I should clarify that the document Joel cited below is a draft of the Darwin Core RDF Guide that is under review by the RDF Task Group until the end of June. If it can be edited into a form that the Task Group can recommend, it will advance to an official public review in accordance with the Darwin Core namespace policy. At that point, the modified draft will be announced to the tdwg-content list (this list) for an official comment period. It's not a problem if people want to look at the document as it stands now, but they should realize that it is a draft under review and not (yet) a submission for public comment.
Steve
joel sachs wrote:
In fact, I've been meaning to ask you something ... The DwC RDF guide (correctly, IMO) specifies that occurrences should be rdf:typed using the dwctype:Occurrence class [1]. But Darwin-SW uses the dwc:Occurrence class as the rdf:type of occurrence records. I also
...
https://docs.google.com/document/d/1OLyVFuveGX1a0Yt6Niok9FfGxkrghii-xlGYhwO8...
Section 2.3.1.5