http://rs.tdwg.org/dwc/terms/index.htm#associatedOccurrences carries this description:
associatedOccurrences
Identifier: http://rs.tdwg.org/dwc/terms/associatedOccurrences
Class: http://rs.tdwg.org/dwc/terms/Occurrence
Definition: A list (concatenated and separated) of identifiers of other Occurrence records and their associations to this Occurrence.
Comment: Example: "sibling of FMNH:Mammal:1234; sibling of FMNH:Mammal:1235". For discussion see http://code.google.com/p/darwincore/wiki/Occurrence
Details: associatedOccurrences
My questions:
a. Are the names of the associations, and/or the syntax of the value, meant to be community defined?
b. If the answer to a. is no, where are those definitions? If yes, have any communities defined any names and syntax? I am especially interested in "duplicate of" in the case of herbarium sheets.
c. (May share an answer with b.) Is anyone making use of associatedOccurrences in a way that is designed to have machine-readable values? If yes, where?
Thanks,
Bob
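To make question c. concrete, here is a minimal sketch of what machine-readable handling of the example value could look like. It rests on assumptions that Darwin Core does not state: that entries are separated by ";" and that the identifier is the last whitespace-delimited token of each entry. The function name parse_associated_occurrences is hypothetical.

```python
def parse_associated_occurrences(value):
    """Split an associatedOccurrences string into (relationship, identifier) pairs.

    Assumptions (not defined by Darwin Core): entries are separated by ';'
    and the identifier is the last whitespace-delimited token of each entry.
    """
    pairs = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing separator
        # Everything before the last space is treated as the relationship name.
        relationship, _, identifier = entry.rpartition(" ")
        pairs.append((relationship.strip(), identifier))
    return pairs


# The example value from the term's Comment field.
example = "sibling of FMNH:Mammal:1234; sibling of FMNH:Mammal:1235"
pairs = parse_associated_occurrences(example)
for relationship, identifier in pairs:
    print(relationship, "->", identifier)
```

A parser like this only works if data providers agree on the separator and on the relationship names, which is exactly what questions a. and b. are asking about.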
Bob,
It seems to me that the most semantically clear, machine-readable way to indicate that two herbarium sheets are duplicates would be to assert that they have the same dwc:individualID. individualID is defined as "An identifier for an individual or named group of individual organisms represented in the Occurrence", so asserting that two occurrences represent the same individual or named group of individual organisms pretty much exactly describes what duplicate specimens are.
I use this same approach to indicate that http://bioimages.vanderbilt.edu/baskauf/67307 is an image of an acorn from the same tree (http://bioimages.vanderbilt.edu/ind-baskauf/67304) as the bark image http://bioimages.vanderbilt.edu/baskauf/67312. I won't say more here, as I have written more extensively on this approach in /Biodiversity Informatics/ 7:17-44 (https://journals.ku.edu/index.php/jbi/article/view/3664). You can also look at the RDF associated with those GUIDs to see what I mean.
Solving this problem is also one of the reasons I have proposed adding the class Individual to DwC (i.e. so that the individuals that are the object of dwc:individualID can be rdf:type'd using a well-known vocabulary and therefore be "understood" by linked data clients).
Steve
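The shared-individualID approach can be sketched as a simple grouping step. This is only an illustration under stated assumptions: records are flat dictionaries with occurrenceID and individualID keys, the first two records reuse the image and tree URIs mentioned above, the third record is invented, and group_by_individual is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_individual(records):
    """Return {individualID: [occurrenceID, ...]} for individuals that are
    represented by more than one occurrence record."""
    groups = defaultdict(list)
    for rec in records:
        individual = rec.get("individualID")
        if individual:
            groups[individual].append(rec["occurrenceID"])
    # Keep only individuals shared by two or more occurrences.
    return {ind: occs for ind, occs in groups.items() if len(occs) > 1}


TREE = "http://bioimages.vanderbilt.edu/ind-baskauf/67304"
records = [
    {"occurrenceID": "http://bioimages.vanderbilt.edu/baskauf/67307", "individualID": TREE},
    {"occurrenceID": "http://bioimages.vanderbilt.edu/baskauf/67312", "individualID": TREE},
    # Hypothetical unrelated record, included only to show it is filtered out.
    {"occurrenceID": "urn:example:occurrence-1", "individualID": "urn:example:individual-1"},
]

shared = group_by_individual(records)
print(shared)
```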
This discussion about associated occurrences and identity reminds me that I really could use an advisor on my open source project for a common mark-recapture database:
http://www.ecoceanusa.org/shepherd/doku.php
As much as possible, I am trying to map database fields and data management functions to DarwinCore terms and standards.
Is there anyone who would be willing to act as an "advisor" on the project?
Thanks in advance, Jason Holmberg ECOCEAN USA
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Good idea, but it suffers from the same fate as might associatedOccurrences (not previously mentioned because I was after some clarification in principle, with the herbarium duplicate sheets only one current case of interest): I need to follow whatever the community practice is of regarding a sheet as part of a duplicate set distributed by the original collector. I'm told by the people at the Harvard University Herbaria that "duplicate" usually, but not always, means from the same organism and same collection event---occasionally people used to put several organisms on the same sheet, raising the possibility that they are not even the same taxon. Worse, the different parts of the same organism might be catalogued as separate specimens. In this case, an assertion that they are from the same individual might be true and understandable, but the utility of that assertion depends on your purpose. Consider a use case in which one set of traditional duplicates all have a determination that is out of date, but another specimen---say your acorn collected later from the same tree---has a current determination. For purposes of notifying duplicate holders that a new determination has been made to the original, the later acorn may not be interesting. This means that for this use, a distributed query of the form "find all records with the same dwc:individualID" is not as useful as "find all records with the same dwc:eventID".
Also, as Mark writes, it doesn't address any other associatedOccurrences.
More generally, we are working on annotations of data records. Probably the real issue here is that associatedOccurrences is an assertion about organisms, and we are making assertions about occurrence data.
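The difference between the two queries in the notification use case can be sketched as follows. All identifiers are invented for illustration, records are assumed to be flat dictionaries keyed by Darwin Core term names, and find_matching is a hypothetical helper, not part of any existing tool.

```python
def find_matching(records, term, value):
    """Return the occurrenceIDs of records whose `term` equals `value`."""
    return [r["occurrenceID"] for r in records if r.get(term) == value]


# Hypothetical data: two duplicate sheets from one collecting event, plus an
# acorn collected later from the same tree.
records = [
    {"occurrenceID": "herbarium-A:sheet-1", "individualID": "tree-1", "eventID": "event-1"},
    {"occurrenceID": "herbarium-B:sheet-1", "individualID": "tree-1", "eventID": "event-1"},
    {"occurrenceID": "herbarium-A:acorn-1", "individualID": "tree-1", "eventID": "event-2"},
]

same_individual = find_matching(records, "individualID", "tree-1")
same_event = find_matching(records, "eventID", "event-1")

print(same_individual)  # includes the later acorn
print(same_event)       # only the traditional duplicate set
```

Under these assumptions the individualID query returns the later acorn along with the duplicate sheets, while the eventID query returns only the sheets distributed from the original collecting event.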
Several points:
1. The definition of dwc:individualID is an identifier for "an individual or named group of individual organisms...". Thus, by this definition, a small population of organisms of the same species found together can be considered an individual even if they are not a single biological individual. The definition of individual is a functional one that accommodates the common circumstance you describe, where a collector collects "duplicates" that aren't really duplicate individuals but are considered to reliably be from the same species because the members of the population are in close proximity and (hopefully) the collector took care to collect only biological individuals of the same species.
2. If it is discovered that a sheet has individuals of two species, a curator would undoubtedly create two records for the two species on the sheet. I can't imagine the curator would try to somehow cram the metadata for the specimens of both species into the same record unless the curator had some kind of special way of dealing with "specimens" that were actually themselves collections of species. The fact that the occurrences described in the two records were on the same piece of paper would not really be relevant, other than that they would have the same collection date, collector, etc., in the same way as would two specimens of different species collected on the same day by the same collector and placed on two different papers.
3. I do not see why cataloging the two different parts of the same organism as separate specimens would be any problem at all. I've collected from the same tree in different seasons to get leafy stems and then winter twigs. They are two dwc:Occurrences, each with its own set of metadata properties and its own identifier, that happen to be from the same individual. People who do mark/recapture or repeated observations of live organisms have multiple Occurrences from the same individual all the time.
4. The issue that you bring up about different determinations based on different specimens is not a problem if the determinations are associated with the individuals rather than with the Occurrences (i.e. specimens), as I suggest in the paper (see Fig. 8 on p. 30). If the assertion is made that the individual from which the first specimen is derived is owl:sameAs the individual from which the second specimen is derived, then a client capable of drawing inferences from RDF (e.g. a 'bot gleaning biodiversity metadata from many institutions to build a large database) will apply determinations linked to either of the individuals to both of them (since they are known to be the same entity). All of the determinations applied to the (now singular) individual will be associated with both specimens through the relationship of each specimen to its source individual (i.e. its dwc:individualID property). Users of the resulting database would be able to examine, for either specimen, the entire set of determinations associated with the (formerly separate) individuals. They could assess the temporal order of the values of the determinations' dwc:dateIdentified properties (i.e. which is most "current") and assess the weight to place on a particular determination based on the identity of the determiner (dwc:identifiedBy property). You say that the holders of the original duplicates would not be interested in the determination based on the acorn. Really? If the specimens and the acorn came from the same tree, the curators certainly ought to be interested, because the tree couldn't simultaneously BE two different species (even if it had two determinations)!
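The inference described in point 4 can be sketched without an RDF toolkit. The sketch below stands in for what a reasoning client would do: it merges individuals linked by owl:sameAs using a small union-find structure, then pools the determinations attached to any of the merged individuals and orders them by dateIdentified. All identifiers, names, and dates are invented, and the function names are hypothetical.

```python
def find(parent, x):
    """Return the representative of x's equivalence class (with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merge_same_as(same_as_pairs):
    """Build equivalence classes from (individual, individual) sameAs assertions."""
    parent = {}
    for a, b in same_as_pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    return parent


def determinations_for(occurrence_id, individual_of, determinations, parent):
    """Pool determinations of every individual equivalent to the occurrence's
    individual, most recent dateIdentified first (ISO 8601 dates assumed)."""
    root = find(parent, individual_of[occurrence_id])
    pooled = [d for d in determinations if find(parent, d["individualID"]) == root]
    return sorted(pooled, key=lambda d: d["dateIdentified"], reverse=True)


# Hypothetical data: a duplicate sheet and a later acorn whose source
# individuals were originally given different identifiers.
individual_of = {"sheet-1": "ind-A", "acorn-1": "ind-B"}
determinations = [
    {"individualID": "ind-A", "scientificName": "Name A",
     "identifiedBy": "Determiner 1", "dateIdentified": "1985-06-01"},
    {"individualID": "ind-B", "scientificName": "Name B",
     "identifiedBy": "Determiner 2", "dateIdentified": "2010-08-20"},
]

parent = merge_same_as([("ind-A", "ind-B")])
for_sheet = determinations_for("sheet-1", individual_of, determinations, parent)
for_acorn = determinations_for("acorn-1", individual_of, determinations, parent)
print([d["scientificName"] for d in for_sheet])
```

Once the two individuals are asserted to be the same, both the sheet and the acorn see both determinations, and the user can weigh them by date and determiner.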
This suggested solution isn't intended to address all kinds of associations, just the case of duplicates. Someone else can take that one on... :-)
Steve
Note that associatedOccurrences is one of several terms that are meant to allow lists of relationships between resources to be captured in a single field. Others include associatedMedia, associatedReferences, associatedSequences, and associatedTaxa. The main purpose of these fields is to provide a mechanism to share relationship information in a flat application profile such as the Simple Darwin Core (http://rs.tdwg.org/dwc/terms/simple/index.htm). If an application profile isn't constrained by being flat, then there is a much more robust way to capture relationships, using the ResourceRelationship class and its constituent terms (http://rs.tdwg.org/dwc/terms/index.htm#relindex).
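As a rough illustration of the two representations, the sketch below flattens ResourceRelationship-style rows into a single associatedOccurrences string. The keys resourceID, relatedResourceID, and relationshipOfResource are terms of the ResourceRelationship class; the "; " separator, the subject identifier FMNH:Mammal:1233, and the function name flatten_relationships are assumptions made for the example.

```python
def flatten_relationships(rows, occurrence_id, separator="; "):
    """Collapse ResourceRelationship rows about `occurrence_id` into one
    associatedOccurrences-style string (the separator is an assumption)."""
    parts = [
        f'{row["relationshipOfResource"]} {row["relatedResourceID"]}'
        for row in rows
        if row["resourceID"] == occurrence_id
    ]
    return separator.join(parts)


# One row per relationship; the subject identifier is hypothetical.
rows = [
    {"resourceID": "FMNH:Mammal:1233",
     "relationshipOfResource": "sibling of",
     "relatedResourceID": "FMNH:Mammal:1234"},
    {"resourceID": "FMNH:Mammal:1233",
     "relationshipOfResource": "sibling of",
     "relatedResourceID": "FMNH:Mammal:1235"},
]

flat = flatten_relationships(rows, "FMNH:Mammal:1233")
print(flat)
```

The row form keeps each relationship as a separate record that can carry further terms, while the flat form has to fit everything into one delimited string.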
-- Robert A. Morris
Emeritus Professor of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: morris.bob@gmail.com
web: http://bdei.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)
As I was stuck in traffic this morning I was thinking about my response to Bob's comments. In retrospect, I should have simply said that indicating that specimens are duplicates by assigning their dwc:individualID property to the same URI is really not just one option, but rather that it is the semantically correct thing to do.
Assume that we are assembling a database of RDF triples about taxonomic names and their authors. We discover URI#1 whose metadata asserts that a foaf:Person has the rdfs:label "L.". If we know of URI#2 whose metadata asserts that a foaf:Person has the rdfs:label "Carl Linnaeus", it would be correct to assert that URI#2 is owl:sameAs URI#1. Anyone who was aware of this assertion through knowledge of our database and who trusted the veracity of the assertion would then know that both URIs referred to the same person because the labels "L." and "Carl Linnaeus" actually refer to the same person.
In contrast, assume we are assembling a database about specimens at various institutions. We discover URI#3 for a dwc:Occurrence of dwc:basisOfRecord="PreservedSpecimen". We realize by some means that this specimen is a "duplicate" of a dwc:Occurrence of dwc:basisOfRecord="PreservedSpecimen" having URI#4 and located in another institution. Despite the colloquial use of the word "duplicate", it would not be correct to assert that URI#4 is owl:sameAs URI#3, because the resources represented by those two URIs are NOT the same thing. They are different pieces of dead tissue in different jars or pasted to different pieces of paper. If we think of what curators mean by "duplicate", it has exactly the meaning that both duplicates were collected from either the same individual organism (in the case of a large organism like a tree) or from the same small population of organisms (all members of the same species) such as a clump of grass, ants in the same colony, etc. Given the definition of dwc:individualID as referring to "an individual or named group of individual organisms", assigning the two Occurrences the same value for their dwc:individualID property semantically describes "duplicates" exactly. Curators might not like this way of describing "duplicates" because it's not the way they are used to talking about them, but in the Linked Data world it is our job to correctly describe relationships using existing predicates. We don't make up a new term if there is already one available that will do the job.
This relationship may seem more apparent if one considers an example that involves dwc:Occurrences of a different type. I enjoyed looking at http://www.whaleshark.org/ yesterday. In this library, users report dwc:Occurrences of the proposed type dwc:basisOfRecord="DigitalStillImage". The database assigns identifiers (not URIs, but they could be someday) to individual whale sharks and associates the dwc:Occurrences with the Individuals (i.e. the equivalent of providing a value for dwc:individualID for the dwc:Occurrence). By using pattern recognition software, the project matches spot patterns and with luck can reach the conclusion that an Individual represented in a particular dwc:Occurrence is the same as the Individual documented by another dwc:Occurrence. It is abundantly clear in this circumstance that the correct thing to do is to assert that the Individual represented in the first dwc:Occurrence is owl:sameAs the Individual represented in the second dwc:Occurrence if the Individuals had previously been assigned different identifiers, or just to assign the second dwc:Occurrence the same value for dwc:individualID as the first dwc:Occurrence if a second identifier hadn't already been assigned to the Individual.
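The two outcomes described for a confirmed match can be sketched as a small decision function. This is an illustration only, not a description of how the whale shark library is implemented: the data structures, identifiers, and the name record_match are all hypothetical.

```python
def record_match(first_occ, second_occ, individual_of, same_as):
    """Record that two occurrences document the same individual.

    - If the second occurrence has no individual yet, reuse the first one's
      identifier (the equivalent of sharing a dwc:individualID value).
    - If both already have different identifiers, record a sameAs assertion
      between the two individuals instead of rewriting either record.
    """
    first_ind = individual_of.get(first_occ)
    if first_ind is None:
        raise ValueError(f"{first_occ} has no individual identifier assigned")
    second_ind = individual_of.get(second_occ)
    if second_ind is None:
        individual_of[second_occ] = first_ind
    elif second_ind != first_ind:
        same_as.add((first_ind, second_ind))


# Hypothetical image occurrences and individual identifiers.
individual_of = {"image-1": "shark-A", "image-2": None, "image-3": "shark-B"}
same_as = set()

record_match("image-1", "image-2", individual_of, same_as)  # no identifier yet
record_match("image-1", "image-3", individual_of, same_as)  # different identifiers

print(individual_of)
print(same_as)
```

Note that in both branches the assertion is made about the Individuals, never about the occurrences themselves, which remain distinct resources.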
The point here is that, from a semantic point of view, there is no difference between what is being done in linking duplicate specimens in different herbaria and in linking images that were taken of the same whale shark. In both cases, two dwc:Occurrences are related in a certain way because they have the same value for dwc:individualID. Multiple observations or mark/recapture might be used to establish where individuals move or how they behave; recognition of duplicate specimens might be used to update identifications, track relationships among herbaria, or anything we want. We do not have to imply some particular fitness for use when we assert that relationship, and we should not invent terms for a particular fitness for use when we have generic terms that already describe the relationship.
Thus I say that it is wrong to invent some other term to represent a relationship that can be clearly and unambiguously expressed using existing terms. One of the beauties of the Darwin Core standard is that it simplifies the vocabulary needed to express equivalent relationships by having a generic class (dwc:Occurrence) that can represent kinds of things that are distinguished by typing them with values for dwc:basisOfRecord such as PreservedSpecimen, HumanObservation, or DigitalStillImage. So let's not move backwards by proposing to invent some new terms that will only apply to herbarium specimens.
Steve
participants (4)
- Bob Morris
- Jason Holmberg
- John Wieczorek
- Steve Baskauf