Occurrences, Organisms, and CollectionObjects: a review
Dear all,
Prepare yourself mentally. After more than a year of discussions, prototypes, scholarly papers, bar room brawls, etc., we are very near having a path forward for two new, related classes for Darwin Core that attempt to remove ambiguity inherent in the Occurrence class as it currently stands. Adding classes is quite a bit more complicated than adding properties (as you'll see if you manage to get through this message), and so it is important to be as thorough as possible to make sure we get it right. I'll try here to synthesize the rough consensus and the remaining issues.
Basically, the idea is to pull two distinct concepts out of Occurrence and give them their own classes. Maybe not surprisingly, one of the hardest things to agree upon has been the names for these classes. The class that was proposed first as "Individual" has seen no fewer than 12 alternate names, none of them satisfying to everyone. The closest thing to an acceptable name was "Organism", with caveats that the definition should make it abundantly clear what is to be included in the class and what is not. I'll use "Organism" here to refer to the class in the hopes of offending the fewest people.
The rough consensus on "Organism" is that it should include viruses, symbionts, individuals, colonies, groups of individuals, and even populations, but that there should be taxonomic homogeneity to an instance of an "Organism". There has been some concern about how and where to draw the line on homogeneity. No attempt has yet been made to write a definitive description of the class, though many examples of representatives of the class have been given.
What we need to move forward on the "Organism" class are an official definition and an official comment, the combination of which should be sufficient for someone previously unfamiliar with the term and the arguments leading to its existence to understand. Some existing terms (individualCount, sex, lifeStage, reproductiveCondition, behavior, previousIdentifications, associatedTaxa) will have to be reorganized to be under this new class. These terms may require updated definitions for consistency. New terms (organismID, associatedOrganisms, organismRemarks) and an Organism Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "Organism" class:
Term Name: Organism
Identifier: http://rs.tdwg.org/dwc/terms/Organism
Namespace: http://rs.tdwg.org/dwc/terms/
Label: Organism
Definition: The category of information pertaining to a specific instance of an organism (virus, symbiont, individual, colony, group of individuals, population) reliably known to be taxonomically homogeneous.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/Organism
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: Organism-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}
Term Name: Organism
Identifier: http://rs.tdwg.org/dwc/dwctype/Organism
Namespace: http://rs.tdwg.org/dwc/dwctype/
Label: Organism
Definition: A resource describing an instance of the Organism class.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Member Of: http://rs.tdwg.org/dwc/terms/DwCType
Has Domain:
Has Range:
Version: Organism-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: not in ABCD
Term Name: organismID
Identifier: http://rs.tdwg.org/dwc/terms/organismID
Namespace: http://rs.tdwg.org/dwc/terms/
Label: organismID
Definition: An identifier for the set of information associated with an Organism. May be a global unique identifier or an identifier specific to the data set.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/BiologicalEntity
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines: http://purl.org/dc/terms/identifier
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: organismID-2011-09-09
Replaces: individualID-2009-09-24
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/UnitID
Term Name: organismRemarks
Identifier: http://rs.tdwg.org/dwc/terms/organismRemarks
Namespace: http://rs.tdwg.org/dwc/terms/
Label: organismRemarks
Definition: Comments or notes about the Organism.
Comment: Example: "seen several times in Tilden Park before capture". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: organismRemarks-2011-09-09
Replaces:
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/Notes
Term Name: associatedOrganisms
Identifier: http://rs.tdwg.org/dwc/terms/associatedOrganisms
Namespace: http://rs.tdwg.org/dwc/terms/
Label: associatedOrganisms
Definition: A list (concatenated and separated) of identifiers of other Organism records and their associations to this Organism.
Comment: Example: "sibling of MXA-231; sibling of MXA-232". For discussion see http://code.google.com/p/darwincore/wiki/Organism
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: associatedOrganisms-2011-09-09
Replaces: associatedOccurrences-2009-04-24
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitSourceInstitutionCode + DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitSourceName + DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitID
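To make the "concatenated and separated" idea concrete, here is a rough Python sketch of how a consumer might unpack an associatedOrganisms value. It is only an illustration: the function name parse_associated_organisms is hypothetical, and the sketch assumes a semicolon separator and that the identifier is the last whitespace-delimited token of each item, as in the example "sibling of MXA-231; sibling of MXA-232". The proposal itself does not prescribe a separator or an item layout.

```python
def parse_associated_organisms(value, separator=";"):
    """Split an associatedOrganisms string into (relationship, organism_id) pairs.

    Assumptions (not part of the proposal): items are separated by
    `separator`, and the identifier is the last whitespace-delimited
    token of each item.
    """
    pairs = []
    for item in value.split(separator):
        item = item.strip()
        if not item:
            continue  # skip empty items, e.g. from a trailing separator
        # rpartition splits on the LAST space: "sibling of MXA-231"
        # becomes ("sibling of", " ", "MXA-231").
        relationship, _, organism_id = item.rpartition(" ")
        pairs.append((relationship.strip(), organism_id))
    return pairs


example = "sibling of MXA-231; sibling of MXA-232"
print(parse_associated_organisms(example))
# [('sibling of', 'MXA-231'), ('sibling of', 'MXA-232')]
```

Identifiers that themselves contain spaces would break this heuristic, which is one reason a documented separator and item layout would help.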
The class proposed as "CollectionObject" has seen fewer alternate name proposals than "Organism", but the same call for clarity on inclusion and exclusion has been voiced. The basic idea is to use this class to cover information that could be considered "persistent evidence" that an organism occurred, and that the concept is distinct from both "Organism" and Occurrence. Evidence might include collection-based materials, digital media, written materials, and literature.
"Evidence" may be a bit vague as a name for the class, providing no real indication that the "Evidence" should apply to an "Organism" rather than to an Occurrence, Taxon, Identification, or any other class. Nor does it convey the idea that the evidence should be persistent. "PersistentEvidenceThatAnOrganismExisted" gets the idea across pretty well, but it is a bit lengthy (and no one actually proposed this name). ABCD isn't shy about vague term names - it uses "Unit" for roughly this concept. The long-standing term "CollectionObject" is less vague than the proposed alternatives, but it might lead people to assume that the object must be physical, and that it must be housed within a collection, neither of which is strictly required. No one objected to this name for the term, however, so I will continue to use it here to illustrate the proposed changes and additions to accommodate this concept.
Some existing terms (institutionID, institutionCode, collectionID, collectionCode, ownerInstitutionCode, catalogNumber, preparations, disposition, otherCatalogNumbers, associatedSequences) will have to be organized under this new class. These terms may require updated definitions for consistency. Note that with the addition of the "CollectionObject" class, the institutionCode, collectionCode, catalogNumber triplet would no longer apply to an Occurrence.
New terms (collectionObjectID and collectionObjectRemarks) and a CollectionObject Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "CollectionObject" class:
Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/terms/CollectionObject
Namespace: http://rs.tdwg.org/dwc/terms/
Label: CollectionObject
Definition: The category of information pertaining to persistent evidence that an organism existed (specimen, sample, image, sound, drawing, field notes, publication), including digital forms.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}
Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/dwctype/CollectionObject
Namespace: http://rs.tdwg.org/dwc/dwctype/
Label: CollectionObject
Definition: A resource describing an instance of the CollectionObject class.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Member Of: http://rs.tdwg.org/dwc/terms/DwCType
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: not in ABCD
Term Name: collectionObjectID
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectID
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectID
Definition: An identifier for the CollectionObject. In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the collectionObjectID globally unique.
Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]". Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines: http://purl.org/dc/terms/identifier
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: collectionObjectID-2011-09-09
Replaces:
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/UnitGUID
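The fallback described in the collectionObjectID comment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function name collection_object_id is hypothetical, the record is assumed to be a plain dictionary keyed by Darwin Core term names, and refusing to build an identifier when part of the triplet is missing is a choice made for this sketch only.

```python
def collection_object_id(record):
    """Return an existing collectionObjectID, or build the
    "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]" form.

    `record` is assumed to be a dict keyed by Darwin Core term names.
    """
    existing = (record.get("collectionObjectID") or "").strip()
    if existing:
        return existing  # a bona fide identifier is already present

    triplet = ("institutionCode", "collectionCode", "catalogNumber")
    parts = [(record.get(term) or "").strip() for term in triplet]
    if not all(parts):
        # Sketch-only policy: do not guess when the triplet is incomplete.
        raise ValueError("institutionCode, collectionCode and catalogNumber are all required")
    return "urn:catalog:" + ":".join(parts)


specimen = {"institutionCode": "FMNH", "collectionCode": "Mammal", "catalogNumber": "145732"}
print(collection_object_id(specimen))
# urn:catalog:FMNH:Mammal:145732
```

A record that already carries a value such as "urn:lsid:nhm.ku.edu:Herps:32" is returned unchanged.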
Term Name: collectionObjectRemarks
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectRemarks
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectRemarks
Definition: Comments or notes about the CollectionObject.
Comment: Example: "custody transferred in 1995 from National Park Service". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: collectionObjectRemarks-2011-09-09
Replaces: SampleRemarks-2009-01-18
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/Notes
Because of these changes for "Organism" and "CollectionObject", the definition of the Occurrence class will have to change, and quite a different set of terms will be organized under it, namely:
occurrenceID, occurrenceRemarks, recordNumber, recordedBy, establishmentMeans, and occurrenceStatus
The Occurrence definition will change from "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)." to something more akin to "The category of information pertaining to evidence of an occurrence of an Organism in nature."
The term occurrenceDetails will be deprecated in favor of the Dublin Core term dcterms:references at the record level. Also, associatedMedia, which was organized under Occurrence, would become a record level term, as it could apply as easily to Occurrences, "Organisms", and "CollectionObjects".
If you made it this far, I congratulate you on your dedication to the cause. Please let's clear up the remaining issues as a community and put these new terms to good use.
Cheers,
John
As one of the primary brawlers on this topic, I've already said enough about it, so I will restrain myself and just say that I fully support the proposal.
Well, mostly restrain myself... I will make one comment about what John said below. Although it is true that a CollectionObject (or "evidence") would probably need to have been derived from an organism to be relevant in the Darwin Core context, there is no reason why a CollectionObject cannot simultaneously serve as evidence that the Organism existed, that an Occurrence occurred, and as support for an Identification. Particularly in the case of specimens, it is likely that the CollectionObject will usually serve all three purposes at once. A CollectionObject could actually serve as "evidence" for anything you want. To some extent, that's one of the reasons for decoupling PreservedSpecimen from Occurrence.
For more pontification on this subject, I will refer to http://code.google.com/p/darwin-sw/wiki/ClassToken (where Token is equivalent to what John is calling CollectionObject). The first figure on the page illustrates diagrammatically what I said in the paragraph above.
Steve
John Wieczorek wrote:
and exclusion has been voiced. The basic idea is to use this class to cover information that could be considered "persistent evidence" that an organism occurred, and that the concept is distinct from both "Organism" and Occurrence. Evidence might include collection-based materials, digital media, written materials, and literature.
"Evidence" may be a bit vague as a name for the class, providing no real indication that the "Evidence" should apply to an "Organism" rather than to an Occurrence, Taxon, Identification, or any other class. Nor does it convey the idea that the evidence should be persistent. "PersistentEvidenceThatAnOrganismExisted" gets the idea
I am fine with the idea that the CollectionObject can be evidence for anything, I may have been projecting or misinterpreting what others said. Can you recommend a better definition than the one I provided?
On Wed, Sep 7, 2011 at 8:54 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
comments inline
John Wieczorek wrote:
I am fine with the idea that the CollectionObject can be evidence for anything, I may have been projecting or misinterpreting what others said. Can you recommend a better definition than the one I provided?
I don't think that the definition of CollectionObject which you gave needs to be changed. Being derived from an Organism is the minimal requirement for a collection object. If it can do other things (be evidence for an Occurrence, support an Identification, or anything else) that's great but not required. If it has a collection date, location of collection, or both, which CollectionObjects usually do, then it would be evidence for an Occurrence as well as evidence that the organism existed.
Richard Pyle wrote:
One concern I do have, however, is in the many, many, many cases where there will be a 1:1 relationship between an Organism and a CollectionObject (i.e., the vast majority of all Museum specimens). Does that mean that data providers will need to generate two separate Ids (one organismID and one collectionObjectID) to represent all of these specimens?
Shockingly, I think I agree with everything Rich wrote in response to Gregor's email. As far as this question is concerned, I would say that whether one generates a single ID or two IDs is up to the user. Assuming that one has GUIDs of some form (LSID, HTTP URI, or whatever), then if one wishes to consider the dead fish in the jar both the Organism and the CollectionObject, then use a single GUID for both organismID and collectionObjectID. If one prefers to think of the fish as a Platonic ideal of Organism in all of its incarnations (living and dead) but the fish in a jar as the CollectionObject, then give them separate GUIDs. That really is a data management decision by the GUID creator. I was hung up on this issue for a very long time, but after mulling it over I realized that I was only hung up about it because I assumed that all classes in DwC had to be disjoint (sensu OWL). They almost always are, but in this case I don't see any reason why they would have to be. DwC simply uses classes to categorize things and to suggest the types of terms that one might use to describe instances of those classes, but otherwise stays out of our personal data management lives.
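The one-ID-versus-two-IDs choice can be sketched in Python as follows. This is only an illustration of the data management decision described above: the function name assign_ids and the separate_organism flag are hypothetical, and minting "urn:uuid:" identifiers with the standard uuid module is an assumption made here, since any GUID scheme (LSID, HTTP URI, or whatever) would do.

```python
import uuid


def assign_ids(specimen, separate_organism=False):
    """Return a copy of a flat specimen record with both IDs filled in.

    separate_organism=False: the object in the jar is treated as both the
    Organism and the CollectionObject, so one GUID serves both terms.
    separate_organism=True: the Organism gets its own GUID.
    """
    record = dict(specimen)  # leave the caller's record untouched
    object_id = record.get("collectionObjectID") or f"urn:uuid:{uuid.uuid4()}"
    record["collectionObjectID"] = object_id
    if not record.get("organismID"):
        record["organismID"] = (
            f"urn:uuid:{uuid.uuid4()}" if separate_organism else object_id
        )
    return record


shared = assign_ids({"catalogNumber": "32"})
print(shared["organismID"] == shared["collectionObjectID"])    # True

distinct = assign_ids({"catalogNumber": "32"}, separate_organism=True)
print(distinct["organismID"] == distinct["collectionObjectID"])  # False
```

Either result is a valid flat record; the difference only matters once someone wants to attach several CollectionObjects to one Organism.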
Richard Pyle wrote:
I think I might agree with this, but I want to ask a simple question:
To what objects would an Identification instance apply? In other words, an Identification instance represents a link between an instance of Taxon to an instance of [XXXXXXX].
In my mind, this should always be "Organism". To me, neither an Occurrence
Totally. Is this the first time I've agreed with everything Rich has said? :-) Steve
DwC simply uses classes to categorize things and to suggest the types of terms that one might use to describe instances of those classes, but otherwise stays out of our personal data management lives.
Yes, I agree with you on that -- but it may get confusing trying to share data when some providers brand their "things" as instances of CollectionObjects, and some brand them as instances of Organism. Maybe some "best practices" guidelines could help. Or, maybe I'm worried about something that may not end up representing a real problem. I guess time will tell.
In my mind, this should always be "Organism".
Totally. Is this the first time I've agreed with everything Rich has said? :-)
If so, then the credit is all yours. Your very thoughtfully-worded posts during this long debate (all of which I have read, and forced myself to understand) have re-shaped my own view of how this information ought to be represented. A major breakthrough for me was the realization that "Organism" and "CollectionObject" were not necessarily the same things (again, thanks to you for allowing me to get my head around that). Once I got there, the other stuff (e.g., organisms as homogeneous taxa) all sort of fell into place.
This has been a long, but (I think) very productive discussion. My only hope is that it will actually work, and be practical, in the real world of biodiversity information exchange.
I'm also wondering if we necessarily need to "break" the traditional view of the "Occurrence" class in order to implement Organism and CollectionObject. As long as we keep in mind that DwC is a vocabulary of terms focused on representing an exchange standard (rather than a full-blown Ontology), perhaps Occurrence records can continue to be represented in the traditional way as "flat" content, but the Organism and CollectionObject classes allow us to present data in a somewhat more "normalized" way in those circumstances that call for it (e.g. tracking individuals or groups over time [Organism], or managing fossil rocks with multiple taxa [CollectionObject] -- to name just two).
Aloha, Rich
Richard Pyle wrote:
I'm also wondering if we necessarily need to "break" the traditional view of the "Occurrence" class in order to implement Organism and CollectionObject. As long as we keep in mind that DwC is a vocabulary of terms focused on representing an exchange standard (rather than a full-blown Ontology), perhaps Occurrence records can continue to be represented in the traditional way as "flat" content, but the Organism and CollectionObject classes allow us to present data in a somewhat more "normalized" way in those circumstances that call for it (e.g. tracking individuals or groups over time [Organism], or managing fossil rocks with multiple taxa [CollectionObject] -- to name just two).
I've been thinking about this issue of "backward compatibility" with respect to Occurrences if the CollectionObject/Sample/Token/whatever class is adopted. I really don't think it is going to be as big of a deal as people are making it out to be.
It seems to me that the main problems arise in two areas: when one wants to be clear about typing and when one wants to express relationships in a system where it is possible to do so through semantics (like RDF). In that kind of circumstance, it's bad (oh yeah, I forgot - the term is "naughty") to say something like resourceA hasOccurrence resourceB when resourceB isn't actually an Occurrence. "Wrong" typing also happens all the time because the classes don't exist (yet) to do the typing correctly. As a case in point, in the Morphbank system, I have multiple images of the same tree. In that system the tree is typed as a "specimen". That is totally wrong because the tree isn't a specimen, but what else is it going to be typed as? There isn't (yet) an appropriate class to put it in.
Although these two problems (wrong typing and using a term with the wrong kind of object, which are actually different manifestations of the same class-based problem) are naughty, realistically very few people are actually using a system that is "semantic-aware" (e.g. serving and consuming RDF), so right now making those mistakes doesn't really "break" anything. Most data providers are using traditional databases or even Excel spreadsheets where the DwC terms are just column headings with no real "meaning" other than what the data managers intend for them to mean. So if a manager has a table where each line contains a record for a specimen and has a column entitled "dwc:catalogNumber", there isn't really anything other than an idea in the manager's head that the catalogNumber is a property of a specimen or Occurrence or CollectionObject. If each line in the database table is "flat" such that one specimen=one CollectionObject=one Occurrence, all that is required to make catalogNumber be a property of a CollectionObject instead of an Occurrence is a different way of thinking in the manager's mind, because there are really no semantics embedded in the table. We are already doing this kind of mental gymnastics with existing classes like dwc:Identification. If our hypothetical database manager has a column heading that says "dwc:identifiedBy" in the specimen table, that is really a property of dwc:Identification, not dwc:Occurrence, but again that is a distinction that is only going to be made in the manager's mind. Making the distinction really only becomes an issue when the database stops being "flat" for a particular relationship, e.g. if the database wants to allow multiple Identifications per specimen record. Then the database structure must be changed accordingly to accommodate that "normalization".
What we have here at the present moment is a situation where data providers don't have any way to have anything but "flat" records where 1 specimen=1 Occurrence=1 Organism. By adding the Organism and CollectionObject classes, we allow people who need or want to have less "flat" (=more "normalized") databases to have something to call the entities that are represented by the new tables they create to handle 1:many relationships instead of 1:1 relationships. Anybody who only cares about 1:1 relationships really doesn't need to worry about the fact that the new class exists, just as people currently don't have to worry about the Identification class if they only allow one Identification per specimen in their database.
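The flat-versus-normalized point can be illustrated with a short Python sketch that takes one flat row and groups its columns by the class each term would be organized under. The term-to-class lists below follow the proposal in this thread (plus identifiedBy under Identification, as discussed above); the function name split_flat_record, the "Unclassified" bucket, and the sample values are assumptions made purely for illustration.

```python
# Term groupings as listed in the proposal; identifiedBy is included
# because it is discussed in this thread as a property of Identification.
TERM_CLASSES = {
    "Occurrence": [
        "occurrenceID", "occurrenceRemarks", "recordNumber", "recordedBy",
        "establishmentMeans", "occurrenceStatus",
    ],
    "Organism": [
        "organismID", "individualCount", "sex", "lifeStage",
        "reproductiveCondition", "behavior", "previousIdentifications",
        "associatedTaxa", "associatedOrganisms", "organismRemarks",
    ],
    "CollectionObject": [
        "collectionObjectID", "institutionID", "institutionCode",
        "collectionID", "collectionCode", "ownerInstitutionCode",
        "catalogNumber", "preparations", "disposition",
        "otherCatalogNumbers", "associatedSequences",
        "collectionObjectRemarks",
    ],
    "Identification": ["identifiedBy"],
}
CLASS_OF_TERM = {t: c for c, terms in TERM_CLASSES.items() for t in terms}


def split_flat_record(row):
    """Group the columns of one flat row by Darwin Core class."""
    grouped = {}
    for column, value in row.items():
        term = column.split(":", 1)[-1]  # drop an optional "dwc:" prefix
        cls = CLASS_OF_TERM.get(term, "Unclassified")
        grouped.setdefault(cls, {})[term] = value
    return grouped


flat_row = {
    "dwc:catalogNumber": "145732",
    "dwc:identifiedBy": "J. Doe",   # hypothetical value
    "dwc:recordedBy": "A. Smith",   # hypothetical value
    "dwc:sex": "female",
}
for cls, terms in split_flat_record(flat_row).items():
    print(cls, terms)
```

For a flat table nothing needs to change; the grouping only becomes a set of separate tables when a 1:many relationship has to be stored.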
So I guess what I'm saying is that if a database manager has a table labeled Occurrence, they really don't have to freak out if we now tell them that their table actually should be labeled CollectionObject as long as there is only one CollectionObject per Occurrence. They didn't freak out before when we told them that they should call their table "Occurrence" instead of "Observation" or "Specimen" in 2009, did they?
I think what I'm saying here is what Rich was trying to say in the paragraph I quoted, but I'm not sure.
Steve
I have to say that the change in semantics to the Occurrence class makes me a bit nervous. Can someone try to help allay my fears?
Darwin Core has no versioning of namespaces, so there is no way for a consumer to detect whether it's an old-style Occurrence or a new one. I am currently parsing various RSS feeds, and even though it's a mess having to parse 10 different styles, I am glad that at least the designers made sure they all have their own namespace! Also, removing or renaming terms might cause serious problems. Would discrete versions of DwC with their own namespace hurt?
Another observation relates to dwc archives and its star schema. As an index to data that has been flattened there is no problem with more classes and core row types, but if you want it as a way to transfer complete normalized data it will not work. But that never really was the intention and I simply wanted to stress that fact.
Markus
On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Hi Markus,
I believe that the discussion here originates from the view that the "CollectionObject"/"Sample" is a different thing from the "Organism" - and that there can be a relationship between CollectionObjects/Samples and Organisms that could be difficult to describe if these things are identified as the same thing (occurrenceID). Do you think that the "Occurrence" would be seen as a thing different from the proposed CollectionObject/Sample and Organism - or as a super-class that would include CollectionObjects/Samples and Organisms? Would the semantics of Occurrence change?
I fully share your view that the Darwin Core Archive (DwC-A) would not be suited to share the full complex relationship between entities - even if persistent identifiers would be used. However if we start to describe and include other things (core types) than only the taxon and occurrences then perhaps the DwC-A could be a useful way to provide a simple list of these entities? This could perhaps provide easier indexing and discovery of these new entities?
Dag
On Tue, 13 Sep 2011 10:03:00 +0200, Markus Döring (GBIF) wrote:
I have to say that the change in semantics to the Occurrence class makes me a bit nervous. Can someone try to help fight my fears?
DarwinCore has no versioning of namespaces, so there is no way for a consumer to detect if it's an old-style Occurrence or a new one. I am currently parsing various RSS feeds, and even though it's a mess having to parse 10 different styles, I am glad that at least the designers made sure they all have their own namespace! Also, removing or renaming terms might cause serious problems. Would discrete versions of dwc with their own namespace hurt?
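As a rough illustration of what namespace-based versioning would give a consumer, the following Python sketch tells two versions of a term apart by URI prefix. The two versioned namespaces and the function name `classify_term` are assumptions invented for this example; Darwin Core itself uses the single, unversioned namespace http://rs.tdwg.org/dwc/terms/.

```python
# Assumption: these versioned namespaces are hypothetical and exist only for this sketch.
HYPOTHETICAL_NAMESPACES = {
    "http://example.org/dwc/2009/terms/": "old-style",
    "http://example.org/dwc/2011/terms/": "new-style",
}


def classify_term(term_uri):
    """Return (version_label, local_name), or (None, term_uri) if no known prefix matches."""
    for namespace, label in HYPOTHETICAL_NAMESPACES.items():
        if term_uri.startswith(namespace):
            return label, term_uri[len(namespace):]
    return None, term_uri


# With versioned namespaces the consumer can tell the two usages apart.
print(classify_term("http://example.org/dwc/2011/terms/Occurrence"))
# With the real, unversioned namespace there is nothing to distinguish them by.
print(classify_term("http://rs.tdwg.org/dwc/terms/Occurrence"))
```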
Another observation relates to dwc archives and their star schema. As an index to data that has been flattened there is no problem with more classes and core row types, but if you want it as a way to transfer complete normalized data it will not work. But that never really was the intention and I simply wanted to stress that fact.
Markus
On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:
Richard Pyle wrote:
I'm also wondering if we necessarily need to "break" the traditional view of the "Occurrence" class in order to implement Organism and CollectionObject. As long as we keep in mind that DwC is a vocabulary of terms focused on representing an exchange standard (rather than a full-blown Ontology), perhaps Occurrence records can continue to be represented in the traditional way as "flat" content, but the Organism and CollectionObject classes allow us to present data in a somewhat more "normalized" way in those circumstances that call for it (e.g. tracking individuals or groups over time [Organism], or managing fossil rocks with multiple taxa [CollectionObject] -- to name just two).
I've been thinking about this issue of "backward compatibility" with respect to Occurrences if the CollectionObject/Sample/Token/whatever class is adopted. I really don't think it is going to be as big of a deal as people are making it out to be.
It seems to me that the main problems arise in two areas: when one wants to be clear about typing and when one wants to express relationships in a system where it is possible to do through semantics (like RDF). In that kind of circumstance, it's bad (oh yeh, I forgot - the term is "naughty") to say something like resourceA hasOccurrence resourceB when resourceB isn't actually an Occurrence. "Wrong" typing also happens all the time because the classes don't exist (yet) to do the typing correctly. As a case in point, in the Morphbank system, I have multiple images of the same tree. In that system the tree is typed as a "specimen". That is totally wrong because the tree isn't a specimen, but what else is it going to be typed as? There isn't (yet) an appropriate class to put it in.
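The "wrong kind of object" problem can be sketched in a few lines of Python. Triples are modeled here as plain tuples rather than with an RDF library, and the `ex:` resources, the `ex:Specimen` class, and the function name `mistyped_objects` are assumptions made up for the example; `hasOccurrence` is the illustrative property named above, not an actual Darwin Core term.

```python
RDF_TYPE = "rdf:type"

# Hypothetical triples: resourceB is typed as a specimen, resourceD as an Occurrence.
triples = [
    ("ex:resourceA", "ex:hasOccurrence", "ex:resourceB"),
    ("ex:resourceB", RDF_TYPE, "ex:Specimen"),
    ("ex:resourceC", "ex:hasOccurrence", "ex:resourceD"),
    ("ex:resourceD", RDF_TYPE, "dwc:Occurrence"),
]


def mistyped_objects(triples, prop, expected_class):
    """List objects of `prop` that are not explicitly typed as `expected_class`."""
    # First pass: collect the declared types of every subject.
    types = {}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types.setdefault(subject, set()).add(obj)
    # Second pass: flag objects of `prop` lacking the expected type.
    return [
        obj
        for subject, predicate, obj in triples
        if predicate == prop and expected_class not in types.get(obj, set())
    ]


print(mistyped_objects(triples, "ex:hasOccurrence", "dwc:Occurrence"))
```

In a flat spreadsheet nothing performs a check like this, which is why the mistake goes unnoticed there.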
It would be good to hear from someone who is familiar with the work going on in the Observations Task Group and could explain how a generic model for observations/measurements (e.g. OBOE) might help sort out these issues. It seems to me that we are trying to build in an ad-hoc manner an increasingly complex model on top of DwC which is really just a glossary of terms. That does not seem like a good approach - but I'm no modeller :-) _Éamonn
Well, I would respectfully disagree that we are building an ad hoc model here. We are actually at the end of a rather lengthy process of trying to develop a consensus of what exactly an Occurrence is.
Since at least October of 2009, there has been discussion on the tdwg-content list about the meaning of Occurrence. I realize that because of the large number of posts, not everyone had the time to keep up with that conversation and parallel discussions on the topic of Organisms/Individuals, and CollectionObjects/Samples/Tokens. The suggestion was made that someone take the time to summarize these discussions for those who didn't have the time to keep up with them as they were happening. I have expended a rather large amount of time to do exactly that: you can find a somewhat chronological summary at http://code.google.com/p/darwin-sw/wiki/TdwgContentEmailSummary and topical summaries at http://code.google.com/p/darwin-sw/wiki/ClassOccurrence , http://code.google.com/p/darwin-sw/wiki/ClassIndividual which have links to many of the individual posts that were made on those topics.
What emerged from these discussions was what appeared (to me at least) to be a consensus about what an Occurrence was. I will refer you to the http://code.google.com/p/darwin-sw/wiki/ClassOccurrence page for some of the definitions that were suggested. This consensus made the distinction between the record that an organism was present at a particular time and place, and the evidence (if any) that was used to document the Occurrence. The proposal that John made is a reflection of the apparent consensus that came out of that discussion.
The Darwin Core standard has a process in place for making additions and changes to the terms in its vocabulary. That process involves discussion, consensus-building, and the defining and adoption of new terms (or modification of the definitions of existing terms) when they are needed by a significant portion of the TDWG community. That process has taken place in this instance and I believe that John is right to "call the question" on these new term proposals. TDWG has a reputation as an organization where people talk endlessly and nothing ever really gets accomplished. We have an opportunity here to prove that reputation wrong. If we simply start asking the same questions (which have already been discussed ad nauseam) over again without making an actual decision on the proposal, then what we are doing really IS a waste of time. I, for one, have no interest in spending any more time on this issue. So I would recommend that people who want to comment about the proposed definitions of Occurrence, Organism, and CollectionObject review the discussion summaries that I've noted above before restarting conversations that have already pretty much been run into the ground.
I would also respectfully disagree that through these proposals we are building a complex model by adding terms for CollectionObject and Organism. The proposals are ONLY for adding terms. Nothing in those proposals models how the new classes are related to any other existing classes in Darwin Core in a formal way (e.g. through OWL or RDFS). There have been repeated calls for further discussion that would build a consensus about a more complex model, possibly built on top of a foundation based on Darwin Core classes and property terms. As a consequence, we are attempting to charter a group to discuss more complex RDF models that can be used by those who need them (see http://code.google.com/p/tdwg-rdf/wiki/CharterOfIG). As you suggest, the core members of the proposed group include representation from the Observations community as well as other constituencies within TDWG. But that discussion is really just starting.
With regards to Markus' concern about whether people will be able to know whether somebody is talking about a "new-style" Occurrence or an "old" Occurrence, I would assert that the "old" Occurrence didn't really have a clear meaning. If you review the summary of the discussion on Occurrence, you can see that it was used to mean at least three different kinds of "things" by different people. What John is actually doing with his proposal is to add clarity about what an Occurrence is where it didn't exist before. I think that is a good thing. If, by the "old" kind of Occurrence people are meaning that Occurrence is a fancier name for PreservedSpecimen (which I believe is how some people in the museum community are thinking of it), then I would say that such a characterization is incorrect (based on the apparent consensus) and that clarifying the incorrectness of that view is a really good thing.
Steve
Hi Steve, I agree it is a good thing to be more clear about what an occurrence actually is, and I wouldn't disagree with the proposed 3 classes. Still, there is a drastic change in semantics of an existing term, Occurrence, and I would feel more comfortable if we could tell those different usages apart. Whether that's via namespace-based versioning of (all?) darwin core terms, through the use of a different term name, or something else, I don't know.
Don't you think this is an issue? Would you also change an owl ontology class definition in the same way, and wouldn't that be harmful to existing instances?
Markus
With regards to Markus' concern about whether people will be able to know whether somebody is talking about a "new-style" Occurrence or an "old" Occurrence, I would assert that the "old" Occurrence didn't really have a clear meaning. If you review the summary of the discussion on Occurrence, you can see that it was used to mean at least three different kinds of "things" by different people. What John is actually doing with his proposal is to add clarity about what an Occurrence is where it didn't exist before. I think that is a good thing. If, by the "old" kind of Occurrence people are meaning that Occurrence is a fancier name for PreservedSpecimen (which I believe is how some people in the museum community are thinking of it), then I would say that such a characterization is incorrect (based on the apparent consensus) and that clarifying the incorrectness of that view is a really good thing.
Steve
Éamonn Ó Tuama (GBIF) wrote:
It would be good to hear from someone who is familiar with the work going on in the Observations Task Group and could explain how a generic model for observations/measurements (e.g. OBOE) might help sort out these issues. It seems to me that we are trying to build in an ad-hoc manner an increasingly complex model on top of DwC which is really just a glossary of terms. That does not seem like a good approach - but I'm no modeller :-) _Éamonn
-----Original Message----- From: Dag Endresen (GBIF) [ mailto:dendresen@gbif.org ] Sent: 13 September 2011 12:18 To: "Markus Döring (GBIF)" Cc: tdwg-content@lists.tdwg.org ; Éamonn Ó Tuama Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Hi Markus,
I believe that the discussion here originates from the view that the "CollectionObject"/"Sample" is a different thing from the "Organism" - and that there can be a relationship between CollectionObjects/Samples and Organisms that could be difficult to describe if these things are identified as the same think (occurrenceID). Do you think that the "Occurrence" would be seen as a thing different from the proposed CollectionObject/Sample and Organism - or as a super-class that would include CollectionObjects/Samples and Organisms? Would the semantics of Occurrence change?
I fully share your view that the Darwin Core Archive (DwC-A) would not be suited to share the full complex relationship between entities - even if persistent identifiers would be used. However if we start to describe and include other things (core types) than only the taxon and occurrences then perhaps the DwC-A could be a useful way to provide a simple list of these entities? This could perhaps provide easier indexing and discovery of these new entities?
Dag
On Tue, 13 Sep 2011 10:03:00 +0200, Markus Döring (GBIF) wrote:
I have to say that the change in semantics to the Occurrence class makes me a bit nervous. Can someone try to help fight my fears?
DarwinCore has no versioning of namespaces, so there is no way for a consumer to detect whether it's an old-style Occurrence or a new one. I am currently parsing various RSS feeds, and even though it's a mess having to parse 10 different styles, I am glad that at least the designers made sure they all have their own namespace! Also, removing or renaming terms might cause serious problems. Would discrete versions of dwc with their own namespace hurt?
Another observation relates to dwc archives and their star schema. As an index to data that has been flattened there is no problem with more classes and core row types, but if you want it as a way to transfer complete normalized data it will not work. But that never really was the intention and I simply wanted to stress that fact.
Markus
On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:
Richard Pyle wrote:
I'm also wondering if we necessarily need to "break" the traditional view of the "Occurrence" class in order to implement Organism and CollectionObject. As long as we keep in mind that DwC is a vocabulary of terms focused on representing an exchange standard (rather than a full-blown Ontology), perhaps Occurrence records can continue to be represented in the traditional way as "flat" content, but the Organism and CollectionObject classes allow us to present data in a somewhat more "normalized" way in those circumstances that call for it (e.g. tracking individuals or groups over time [Organism], or managing fossil rocks with multiple taxa [CollectionObject] -- to name just two).
I've been thinking about this issue of "backward compatibility" with respect to Occurrences if the CollectionObject/Sample/Token/whatever class is adopted. I really don't think it is going to be as big of a deal as people are making it out to be.
It seems to me that the main problems arise in two areas: when one wants to be clear about typing and when one wants to express relationships in a system where it is possible to do through semantics (like RDF). In that kind of circumstance, it's bad (oh yeh, I forgot - the term is "naughty") to say something like resourceA hasOccurrence resourceB when resourceB isn't actually an Occurrence. "Wrong" typing also happens all the time because the classes don't exist (yet) to do the typing correctly. As a case in point, in the Morphbank system, I have multiple images of the same tree. In that system the tree is typed as a "specimen". That is totally wrong because the tree isn't a specimen, but what else is it going to be typed as? There isn't (yet) an appropriate class to put it in.
Although these two problems (wrong typing and using a term with the wrong kind of object, which are actually different manifestations of the same class-based problem) are naughty, realistically very few people are actually using a system that is "semantic-aware" (e.g. serving and consuming RDF), so right now making those mistakes doesn't really "break" anything. Most data providers are using traditional databases or even Excel spreadsheets where the DwC terms are just column headings with no real "meaning" other than what the data managers intend for them to mean. So if a manager has a table where each line contains a record for a specimen and has a column entitled "dwc:catalogNumber", there isn't really anything other than an idea in the manager's head that the catalogNumber is a property of a specimen or Occurrence or CollectionObject. If each line in the database table is "flat" such that one specimen=one CollectionObject=one Occurrence, all that is required to make catalogNumber be a property of a CollectionObject instead of an Occurrence is a different way of thinking in the manager's mind, because there are really no semantics embedded in the table. We are already doing this kind of mental gymnastics with existing classes like dwc:Identification. If our hypothetical database manager has a column heading that says "dwc:identifiedBy" in the specimen table, that is really a property of dwc:Identification, not dwc:Occurrence, but again that is a distinction that is only going to be made in the manager's mind. Making the distinction really only becomes an issue when the database stops being "flat" for a particular relationship, e.g. if the database wants to allow multiple Identifications per specimen record. Then the database structure must be changed accordingly to accommodate that "normalization".
What we have here at the present moment is a situation where data providers don't have any way to have anything but "flat" records where 1 specimen=1 Occurrence=1 Organism. By adding the Organism and CollectionObject classes, we allow people who need or want to have less "flat" (=more "normalized") databases to have something to call the entities that are represented by the new tables they create to handle 1:many relationships instead of 1:1 relationships. Anybody who only cares about 1:1 relationships really doesn't need to worry about the fact that the new class exists, just as people currently don't have to worry about the Identification class if they only allow one Identification per specimen in their database.
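To make the flat versus normalized distinction concrete, here is a minimal sketch in Python. The rows, catalog numbers, names, and the helper functions are all hypothetical and only for illustration; the column names follow the DwC terms mentioned above. It shows that a flat table carries exactly one Identification per specimen, and that splitting the Identification columns into their own table is what makes a 1:many relationship possible.

```python
# Hypothetical flat rows: one specimen = one Occurrence = one Identification.
# All values below are invented for illustration.
flat_rows = [
    {"catalogNumber": "A-001", "identifiedBy": "J. Smith", "dateIdentified": "2009-05-01"},
    {"catalogNumber": "A-002", "identifiedBy": "R. Jones", "dateIdentified": "2010-03-12"},
]


def normalize(rows):
    """Split flat rows into a specimen table and an identification table."""
    specimens = []
    identifications = []
    for row in rows:
        specimens.append({"catalogNumber": row["catalogNumber"]})
        identifications.append({
            # catalogNumber acts as the key linking back to the specimen row
            "catalogNumber": row["catalogNumber"],
            "identifiedBy": row["identifiedBy"],
            "dateIdentified": row["dateIdentified"],
        })
    return specimens, identifications


def identifications_for(catalog_number, identifications):
    """Return every identification recorded for one specimen."""
    return [i for i in identifications if i["catalogNumber"] == catalog_number]


specimens, identifications = normalize(flat_rows)

# Once the tables are separate, a second Identification for the same
# specimen is just another row; the flat table had no place to put it.
identifications.append(
    {"catalogNumber": "A-001", "identifiedBy": "R. Jones", "dateIdentified": "2011-09-09"}
)
```

A provider who never needs the second identification can keep the flat table and ignore the split entirely, which is the point being made about the proposed Organism and CollectionObject classes.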
So I guess what I'm saying is that if a database manager has a table labeled Occurrence, they really don't have to freak out if we now tell them that their table actually should be labeled CollectionObject as long as there is only one CollectionObject per Occurrence. They didn't freak out before when we told them that they should call their table "Occurrence" instead of "Observation" or "Specimen" in 2009, did they?
I think what I'm saying here is what Rich was trying to say in the paragraph I quoted, but I'm not sure.
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
tdwg-content mailing list
tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Markus, Well, I don't know that I'd go so far as to say that it's a drastic change in semantics, at least in the formal semantics of the normative document (which I _think_ can be viewed at http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwcterms.rdf). That document says (in human terms) that Occurrence is an rdfs:Class, that its status is Recommended, and some bookkeeping stuff about versioning. The main change is in the rdfs:comment which presents the description "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)." but that's really a human thing and as I've said, there has been quite a bit of misunderstanding about what the "human" definition means. As has been noted on this list before, DwC doesn't get into domain and range issues for its terms at all and usually doesn't get into subclassing, so there is very little in the normative document to be "broken" in terms of semantics. That's a rather different situation than changing an owl ontology class definition where relationships among classes and their properties are likely to be more complicated (e.g. disjoint classes, subclassing, range, domain) and therefore more easily "broken".
There is the issue that a number of property terms would have their dwcattributes:organizedInClass property changed from dwc:Occurrence to something else. But my understanding was that the organization of the property terms under the DwC classes was more of a suggestion as opposed to a declaration of domain. So it doesn't seem likely that the understanding of machines will be "broken" by this change since I don't think that much of any machine reasoning based on DwC is going on at the present. But this is getting beyond my area of expertise, so maybe others can clarify things on this point.
There is an RDF version of DwC viewable at http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctermshistory.... which actually has dated versions of the terms (e.g. Occurrence-2009-04-29). But I must confess, I don't understand how this document is related to the dwcterms.rdf document I mentioned above. Perhaps John can enlighten us...
Steve
Markus Döring (GBIF) wrote:
Hi Steve, I agree it is a good thing to be more clear about what an occurrence actually is, and I wouldn't disagree with the proposed 3 classes. Still, there is a drastic change in the semantics of an existing term, Occurrence, and I would feel more comfortable if we could tell those different usages apart. Whether that's via namespace-based versioning of (all?) Darwin Core terms, through the use of a different term name, or something else, I don't know.
Don't you think this is an issue? Would you also change an OWL ontology class definition in the same way, and wouldn't that be harmful to existing instances?
Markus
Comments in line.
On Tue, Sep 13, 2011 at 10:39 AM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
Markus, Well, I don't know that I'd go so far as to say that it's a drastic change in semantics, at least in the formal semantics of the normative document (which I _think_ can be viewed at http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwcterms.rdf). That document says (in human terms) that Occurrence is an rdfs:Class, that its status is Recommended, and some bookkeeping stuff about versioning. The main change is in the rdfs:comment which presents the description "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)." but that's really a human thing and as I've said, there has been quite a bit of misunderstanding about what the "human" definition means. As has been noted on this list before, DwC doesn't get into domain and range issues for its terms at all and usually doesn't get into subclassing, so there is very little in the normative document to be "broken" in terms of semantics. That's a rather different situation than changing an owl ontology class definition where relationships among classes and their properties are likely to be more complicated (e.g. disjoint classes, subclassing, range, domain) and therefore more easily "broken".
There is the issue that a number of property terms would have their dwcattributes:organizedInClass property changed from dwc:Occurrence to something else. But my understanding was that the organization of the property terms under the DwC classes was more of a suggestion as opposed to a declaration of domain.
That is correct. dwcattributes:organizedInClass is something like a label to suggest the class under which to list the term. It is a convenience, and does not assert an ontological relationship between the property and the class.
So it doesn't seem likely that the understanding of machines will be "broken" by this change since I don't think that much of any machine reasoning based on DwC is going on at the present. But this is getting beyond my area of expertise, so maybe others can clarify things on this point.
It wouldn't break existing schemas, but they would be incomplete if the changes were not incorporated. So, it will require adjustments to the reference schemas (http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd, http://rs.tdwg.org/dwc/xsd/tdwg_dwc_class_terms.xsd, and http://rs.tdwg.org/dwc/xsd/tdwg_dwcterms.xsd) to allow the use of the new classes. It will affect Simple Darwin Core as text only insofar as terms are added, removed, or their names changed, unless a new type of core record is deemed necessary. We would probably also want to add or add to the examples throughout the documentation.
There is an RDF version of DwC viewable at http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctermshistory.... which actually has dated versions of the terms (e.g. Occurrence-2009-04-29). But I must confess, I don't understand how this document is related to the dwcterms.rdf document I mentioned above. Perhaps John can enlighten us...
The document dwctermshistory.rdf contains the entire history of Darwin Core terms as far back as I could go. It contains all versions of terms, identifying them (for example, <rdf:Description rdf:about="acceptedNameUsage-2009-09-21">), and relating them to each other (for example, <dcterms:replaces rdf:resource="http://rs.tdwg.org/dwc/terms/acceptedScientificName-2009-07-06"/>).
The document dwcterms.rdf contains only the terms for which the status is recommended, without the history of relationship. So, if someone wanted to reason across versions of terms (to gain a backward compatibility), he or she would use the full history, not the conveniently simplified dwcterms.rdf. In that sense, dwctermshistory.rdf is the normative document for the terms.
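As a rough illustration of what reasoning across versions could look like, the sketch below follows a "replaces" relation from a current term version back through its predecessors. The two version identifiers are the ones quoted above; the dictionary representation and the function name version_history are assumptions made for this example, not part of any Darwin Core tooling, and a real consumer would load the relation from dwctermshistory.rdf rather than hard-code it.

```python
# Assumed in-memory form of the "replaces" relation:
# newer term version -> the older version it replaces.
REPLACES = {
    "acceptedNameUsage-2009-09-21": "acceptedScientificName-2009-07-06",
}


def version_history(term_version, replaces):
    """Return the term version followed by every older version it replaces."""
    chain = [term_version]
    seen = {term_version}
    while chain[-1] in replaces:
        older = replaces[chain[-1]]
        if older in seen:
            # Guard against a cycle in malformed data.
            break
        chain.append(older)
        seen.add(older)
    return chain


history = version_history("acceptedNameUsage-2009-09-21", REPLACES)
```

A consumer holding data labeled with an older version could use such a chain to map it onto the currently recommended term, which dwcterms.rdf alone would not allow.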
Steve
Markus Döring (GBIF) wrote:
Hi Steve, I agree this is a good thing to me more clear about what an occurrence actualize is and I would't disagree with the proposed 3 classes. Still there is a drastic change in semantics of an existing term Occurrence and I would feel more comfortable if we can tell those different usages apart. If thats via a namespace based versioning of (all?) darwin core terms, through the use of a different term name or sth else I don't know.
Don't you think this an issue? Would you also change an owl ontology class definition in the same way and would't that be harmful to existing instances?
Markus
With regards to Markus' concern about whether people will be able to know whether somebody is talking about a "new-style" Occurrence or an "old" Occurrence, I would assert that the "old" Occurrence didn't really have a clear meaning. If you review the summary of the discussion on Occurrence, you can see that it was used to mean at least three different kinds of "things" by different people. What John is actually doing with his proposal is to add clarity about what an Occurrence is where it didn't exist before. I think that is a good thing. If, by the "old" kind of Occurrence people are meaning that Occurrence is a fancier name for PreservedSpecimen (which I believe is how some people in the museum community are thinking of it), then I would say that such a characterization is incorrect (based on the apparent consensus) and that clarifying the incorrectness of that view is a really good thing.
Steve
Éamonn Ó Tuama (GBIF) wrote:
It would be good to hear from someone who is familiar with the work going on in the Observations Task Group and could explain how a generic model for observations/measurements (e.g. OBOE) might help sort out these issues. It seems to me that we are trying to build in an ad-hoc manner an increasingly complex model on top of DwC which is really just a glossary of terms. That does not seem like a good approach - but I'm no modeller :-) _Éamonn
-----Original Message----- From: Dag Endresen (GBIF) [ mailto:dendresen@gbif.org ] Sent: 13 September 2011 12:18 To: "Markus Döring (GBIF)" Cc: tdwg-content@lists.tdwg.org ; Éamonn Ó Tuama Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Hi Markus,
I believe that the discussion here originates from the view that the "CollectionObject"/"Sample" is a different thing from the "Organism" - and that there can be a relationship between CollectionObjects/Samples and Organisms that could be difficult to describe if these things are identified as the same think (occurrenceID). Do you think that the "Occurrence" would be seen as a thing different from the proposed CollectionObject/Sample and Organism - or as a super-class that would include CollectionObjects/Samples and Organisms? Would the semantics of Occurrence change?
I fully share your view that the Darwin Core Archive (DwC-A) would not be suited to share the full complex relationship between entities - even if persistent identifiers would be used. However if we start to describe and include other things (core types) than only the taxon and occurrences then perhaps the DwC-A could be a useful way to provide a simple list of these entities? This could perhaps provide easier indexing and discovery of these new entities?
Dag
On Tue, 13 Sep 2011 10:03:00 +0200, Markus Döring (GBIF) wrote:
I have to say that the change in semantics to the Occurrence class makes me a bit nervous. Can someone try help fighting my fears?
DarwinCore has no versioning of namespaces, so there is no way for a consumer to detect if its an old style Occurrence or a new one. I am currently parsing various RSS feeds and even though its a mess having to parse 10 different styles I am glad that at least the designers made sure they all have their own namespace! Also removing or renaming terms might cause serious problems. Would discrete versions of dwc with their own namespace hurt?
Another observation relates to dwc archives and its star schema. As an index to data that has been flattened there is no problem with more classes and core row types, but if you want it as a way to transfer complete normalized data it will not work. But that never really was the intention and I simply wanted to stress that fact.
Markus
On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:
Richard Pyle wrote:
I'm also wondering if we necessarily need to "break" the traditional view of the "Occurrence" class in order to implement Organism and CollectionObject. As long as we keep in mind that DwC is a vocabulary of terms focused on representing an exchange standard (rather than a full-blown Ontology), perhaps Occurrence records can continue to be represented in the traditional way as "flat" content, but the Organism and CollectionObject classes allow us to present data in a somewhat more "normalized" way in those circumstances that call for it (e.g. tracking individuals or groups over time [Organism], or managing fossil rocks with multiple taxa [CollectionObject] -- to name just two).
I've been thinking about this issue of "backward compatibility" with respect to Occurrences if the CollectionObject/Sample/Token/whatever class is adopted. I really don't think it is going to be as big of a deal as people are making it out to be.
It seems to me that the main problems arise in two areas: when one wants to be clear about typing and when one wants to express relationships in a system where it is possible to do through semantics (like RDF). In that kind of circumstance, it's bad (oh yeh, I forgot - the term is "naughty") to say something like resourceA hasOccurrence resourceB when resourceB isn't actually an Occurrence. "Wrong" typing also happens all the time because the classes don't exist (yet) to do the typing correctly. As a case in point, in the Morphbank system, I have multiple images of the same tree. In that system the tree is typed as a "specimen". That is totally wrong because the tree isn't a specimen, but what else is it going to be typed as? There isn't (yet) an appropriate class to put it in.
Although these two problems (wrong typing and using a term with the wrong kind of object which are actually different manifestations of the same class-based problem) are naughty, realistically very few people are actually using a system that is "semantic-aware" (e.g. serving and consuming RDF) so right now making those mistakes doesn't really "break" anything. Most data providers are using traditional databases or even Excel spreadsheets where the DwC terms are just column headings with no real "meaning" other than what the data managers intend for them to mean. So if a manager has a table where each line contains a record for a specimen and has a column heading for a column entitled "dwc:catalogNumber", there isn't really anything other than an idea in the manager's head that the catalogNumber is a property of a specimen or Occurence or CollectionObject. If each line in the database table is "flat" such that one specimen=one CollectionObject=one Occurrence, all that is required to make catalogNumber be a property of a CollectionObject instead of an Occurrence is a different way of thinking in the managers mind because there are really no semantics embedded in the table. We are already doing this kind of mental gymnastics with existing classes like dwc:Identification . If our hypothetical database manager has a column heading that says "dwc:identifiedBy" in the specimen table, that is really a property of dwc:Identification, not dwc:Occurrence but again that is a distinction that is only going to be made in the manager's mind. Making the distinction really only becomes an issue when the database stops being "flat" for a particular relationship, e.g. if the database wants to allow multiple Identifications per specimen record. Then the database structure must be changed accordingly to accommodate that "normalization".
What we have here at the present moment is a situation where data providers don't have any way to have anything but "flat" records where 1 specimen=1 Occurrence=1 Organism. By adding the Organism and CollectionObject classes, we allow people who need or want to have less "flat" (=more "normalized") databases to have something to call the entities that are represented by the new tables they create to handle 1:many relationships instead of 1:1 relationships. Anybody who only cares about 1:1 relationships really doesn't need to worry about the fact that the new class exists, just as people currently don't have to worry about the Identification class if they only allow one Identification per specimen in their database.
So I guess what I'm saying is that if a database manager has a table labeled Occurrence, they really don't have to freak out if we now tell them that their table actually should be labeled CollectionObject as long as there is only one CollectionObject per Occurrence. They didn't freak out before when we told them that they should call their table "Occurrence" instead of "Observation" or "Specimen" in 2009, did they?
I think what I'm saying here is what Rich was trying to say in the paragraph I quoted, but I'm not sure.
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
tdwg-content mailing list
tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Thanks everyone, I'm not so much concerned about producing dwc in whatever form. Adapting the schemas or IPT "extension" definitions is straightforward and, as John says, we could even add support for all 3 classes as core records in the IPT easily.
My concern is rather on the consumer's side of things. When I get a record in dwca with a rowType of dwc:Occurrence, I currently treat it as if it's an observation, a specimen or anything that we used to accept as an occurrence. With the change I should be able to say this is *not* a collection object or organism, but that is something I can't say for sure, as I don't know which version of dwc these records adhere to. Is this a no-brainer and doesn't matter in practice? Or does the problem lie rather in the implementation technology, and we should do versioning of our "schemas" and transmit them with records, but not the dwc namespace? At first glance that actually sounds like a good way to go. The dwc xml schemas would have to have a new attribute with a default value in the root element in that case though - something agreeable?
Markus
On Sep 13, 2011, at 7:39 PM, Steve Baskauf wrote:
Markus, Well, I don't know that I'd go so far as to say that it's a drastic change in semantics, at least in the formal semantics of the normative document (which I _think_ can be viewed at http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwcterms.rdf). That document says (in human terms) that Occurrence is an rdfs:Class, that its status is Recommended, and some bookkeeping stuff about versioning. The main change is in the rdfs:comment which presents the description "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)." but that's really a human thing and as I've said, there has been quite a bit of misunderstanding about what the "human" definition means. As has been noted on this list before, DwC doesn't get into domain and range issues for its terms at all and usually doesn't get into subclassing, so there is very little in the normative document to be "broken" in terms of semantics. That's a rather different situation than changing an owl ontology class definition where relationships among classes and their properties are likely to be more complicated (e.g. disjoint classes, subclassing, range, domain) and therefore more easily "broken".
There is the issue that a number of property terms would have their dwcattributes:organizedInClass property changed from dwc:Occurrence to something else. But my understanding was that the organization of the property terms under the DwC classes was more of a suggestion as opposed to a declaration of domain. So it doesn't seem likely that the understanding of machines will be "broken" by this change since I don't think that much of any machine reasoning based on DwC is going on at the present. But this is getting beyond my area of expertise, so maybe others can clarify things on this point.
There is an RDF version of DwC viewable at http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctermshistory.... which actually has dated versions of the terms (e.g. Occurrence-2009-04-29). But I must confess, I don't understand how this document is related to the dwcterms.rdf document I mentioned above. Perhaps John can enlighten us...
Steve
Markus Döring (GBIF) wrote:
Hi Steve, I agree it is a good thing to be more clear about what an occurrence actually is, and I wouldn't disagree with the proposed 3 classes. Still, there is a drastic change in semantics of an existing term, Occurrence, and I would feel more comfortable if we could tell those different usages apart. Whether that's via a namespace-based versioning of (all?) Darwin Core terms, through the use of a different term name, or something else, I don't know.
Don't you think this is an issue? Would you also change an OWL ontology class definition in the same way, and wouldn't that be harmful to existing instances?
Markus
With regards to Markus' concern about whether people will be able to know whether somebody is talking about a "new-style" Occurrence or an "old" Occurrence, I would assert that the "old" Occurrence didn't really have a clear meaning. If you review the summary of the discussion on Occurrence, you can see that it was used to mean at least three different kinds of "things" by different people. What John is actually doing with his proposal is to add clarity about what an Occurrence is where it didn't exist before. I think that is a good thing. If, by the "old" kind of Occurrence people are meaning that Occurrence is a fancier name for PreservedSpecimen (which I believe is how some people in the museum community are thinking of it), then I would say that such a characterization is incorrect (based on the apparent consensus) and that clarifying the incorrectness of that view is a really good thing.
Steve
Éamonn Ó Tuama (GBIF) wrote:
It would be good to hear from someone who is familiar with the work going on in the Observations Task Group and could explain how a generic model for observations/measurements (e.g. OBOE) might help sort out these issues. It seems to me that we are trying to build in an ad-hoc manner an increasingly complex model on top of DwC which is really just a glossary of terms. That does not seem like a good approach - but I'm no modeller :-) _Éamonn
-----Original Message-----
From: Dag Endresen (GBIF) [mailto:dendresen@gbif.org]
Sent: 13 September 2011 12:18
To: "Markus Döring (GBIF)"
Cc: tdwg-content@lists.tdwg.org; Éamonn Ó Tuama
Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Hi Markus,
I believe that the discussion here originates from the view that the "CollectionObject"/"Sample" is a different thing from the "Organism", and that there can be a relationship between CollectionObjects/Samples and Organisms that could be difficult to describe if these things are identified as the same thing (occurrenceID). Do you think that the "Occurrence" would be seen as a thing different from the proposed CollectionObject/Sample and Organism, or as a super-class that would include CollectionObjects/Samples and Organisms? Would the semantics of Occurrence change?
I fully share your view that the Darwin Core Archive (DwC-A) would not be suited to share the full complex relationship between entities - even if persistent identifiers would be used. However if we start to describe and include other things (core types) than only the taxon and occurrences then perhaps the DwC-A could be a useful way to provide a simple list of these entities? This could perhaps provide easier indexing and discovery of these new entities?
Dag
On Tue, 13 Sep 2011 10:03:00 +0200, Markus Döring (GBIF) wrote:
I have to say that the change in semantics to the Occurrence class makes me a bit nervous. Can someone try to help fight my fears?
DarwinCore has no versioning of namespaces, so there is no way for a consumer to detect if it's an old-style Occurrence or a new one. I am currently parsing various RSS feeds, and even though it's a mess having to parse 10 different styles, I am glad that at least the designers made sure they all have their own namespace! Also, removing or renaming terms might cause serious problems. Would discrete versions of dwc with their own namespace hurt?
Another observation relates to dwc archives and their star schema. As an index to data that has been flattened there is no problem with more classes and core row types, but if you want it as a way to transfer complete normalized data it will not work. But that never really was the intention, and I simply wanted to stress that fact.
Markus
On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:
Richard Pyle wrote:
> I'm also wondering if we necessarily need to "break" the traditional view of the "Occurrence" class in order to implement Organism and CollectionObject. As long as we keep in mind that DwC is a vocabulary of terms focused on representing an exchange standard (rather than a full-blown Ontology), perhaps Occurrence records can continue to be represented in the traditional way as "flat" content, but the Organism and CollectionObject classes allow us to present data in a somewhat more "normalized" way in those circumstances that call for it (e.g. tracking individuals or groups over time [Organism], or managing fossil rocks with multiple taxa [CollectionObject] -- to name just two).

I've been thinking about this issue of "backward compatibility" with respect to Occurrences if the CollectionObject/Sample/Token/whatever class is adopted. I really don't think it is going to be as big of a deal as people are making it out to be.
It seems to me that the main problems arise in two areas: when one wants to be clear about typing and when one wants to express relationships in a system where it is possible to do through semantics (like RDF). In that kind of circumstance, it's bad (oh yeh, I forgot - the term is "naughty") to say something like resourceA hasOccurrence resourceB when resourceB isn't actually an Occurrence. "Wrong" typing also happens all the time because the classes don't exist (yet) to do the typing correctly. As a case in point, in the Morphbank system, I have multiple images of the same tree. In that system the tree is typed as a "specimen". That is totally wrong because the tree isn't a specimen, but what else is it going to be typed as? There isn't (yet) an appropriate class to put it in.
On Wed, Sep 14, 2011 at 12:34 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
Thanks everyone, I'm not so much concerned about producing dwc in whatever form. Adapting the schemas or IPT "extension" definitions is straightforward and, as John says, we could even add support for all 3 classes as core records in the IPT easily.
My concern is rather on the consumer's side of things. When I get a record in dwca with a rowType of dwc:Occurrence, I currently treat it as if it's an observation, a specimen or anything that we used to accept as an occurrence. With the change I should be able to say this is *not* a collection object or organism, but that is something I can't say for sure, as I don't know which version of dwc these records adhere to. Is this a no-brainer and doesn't matter in practice?
I think it matters, and is not a no-brainer, and the solution depends on the implementation. For Simple Darwin Core the distinction is indeed simple, and wouldn't change anything, because the information contained in a record is defined by basisOfRecord. An old-style Occurrence that was a PreservedSpecimen will be a new-style CollectionObject that is a PreservedSpecimen. There will be no difference except to those who care that PreservedSpecimen will no longer refine Occurrence; rather, it will refine CollectionObject. Existing Simple Darwin Core observation records would have to change only two fields (individualID to organismID, and occurrenceRemarks to organismRemarks under some circumstances), if they were already in use. Existing Simple Darwin Core records of things that fall into the new CollectionObject category would have to change up to four fields, if they were already in use: 1) individualID to organismID, 2) occurrenceRemarks to organismRemarks under some circumstances, 3) associatedOccurrences to associatedOrganisms in some cases, and 4) associatedOccurrences to associatedCollectionObjects in some cases.
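As a rough illustration of the field changes listed above, a publisher-side migration could look something like the following Python sketch. The function name `rename_for_new_classes` and its `remarks_describe_organism` flag are hypothetical; the message only says the occurrenceRemarks rename applies "under some circumstances", so the sketch leaves that decision to the caller rather than guessing a rule.

```python
# Sketch only: rename_for_new_classes() and its flag are hypothetical names.
def rename_for_new_classes(record, remarks_describe_organism=False):
    """Return a copy of a Simple Darwin Core record using the proposed term names."""
    updated = dict(record)
    # Unconditional rename named in the proposal.
    if "individualID" in updated:
        updated["organismID"] = updated.pop("individualID")
    # Conditional rename: the criteria are not spelled out in the thread,
    # so the caller decides whether the remarks are about the organism.
    if remarks_describe_organism and "occurrenceRemarks" in updated:
        updated["organismRemarks"] = updated.pop("occurrenceRemarks")
    return updated

old = {
    "basisOfRecord": "PreservedSpecimen",
    "individualID": "ind-42",
    "occurrenceRemarks": "collected at dusk",
}
new = rename_for_new_classes(old)
```

Note that basisOfRecord is left untouched, which matches the point that for Simple Darwin Core it continues to define what kind of record this is.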
Or does the problem lie rather in the implementation technology, and we should do versioning of our "schemas" and transmit them with records, but not the dwc namespace? At first glance that actually sounds like a good way to go. The dwc xml schemas would have to have a new attribute with a default value in the root element in that case though - something agreeable?
I think XML schemas will have to be versioned unless we can make them completely backward compatible. I don't like the prospects of maintenance for the latter option. I understand what you mean by adding an attribute for version in the schemas, but what did you mean by "but not the dwc namespace?"
I'm curious. Are any of the Darwin Core XML schemas in use beyond the Apiary and the GermPlasm extensions?
Markus
Thanks everyone, I'm not so much concerned about producing dwc in whatever form. Adapting the schemas or IPT "extension" definitions is straightforward and, as John says, we could even add support for all 3 classes as core records in the IPT easily.
My concern is rather on the consumer's side of things. When I get a record in dwca with a rowType of dwc:Occurrence, I currently treat it as if it's an observation, a specimen or anything that we used to accept as an occurrence. With the change I should be able to say this is *not* a collection object or organism, but that is something I can't say for sure, as I don't know which version of dwc these records adhere to. Is this a no-brainer and doesn't matter in practice?
I think it matters, and is not a no-brainer, and the solution depends on the implementation. For Simple Darwin Core the distinction is indeed simple, and wouldn't change anything, because the information contained in a record is defined by basisOfRecord. An old-style Occurrence that was a PreservedSpecimen will be a new-style CollectionObject that is a PreservedSpecimen. There will be no difference except to those who care that PreservedSpecimen will no longer refine Occurrence; rather, it will refine CollectionObject. Existing Simple Darwin Core observation records would have to change only two fields (individualID to organismID, and occurrenceRemarks to organismRemarks under some circumstances), if they were already in use. Existing Simple Darwin Core records of things that fall into the new CollectionObject category would have to change up to four fields, if they were already in use:
1) individualID to organismID,
2) occurrenceRemarks to organismRemarks under some circumstances,
3) associatedOccurrences to associatedOrganisms in some cases, and
4) associatedOccurrences to associatedCollectionObjects in some cases.
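The renames listed above can be sketched as a small lookup table. This is only an illustration, assuming records are plain dictionaries keyed by unqualified term names. The `rename_terms` helper is hypothetical, and because several of the changes apply only "under some circumstances", the sketch renames individualID automatically and merely flags the other terms for a human decision.

```python
# Hypothetical sketch of the term renames discussed in this thread.
# Only individualID -> organismID is treated as unconditional.
UNCONDITIONAL = {"individualID": "organismID"}

# Renames that apply only in some cases; the candidate targets are the
# terms proposed in this thread, and choosing between them needs a human.
CONDITIONAL = {
    "occurrenceRemarks": ["organismRemarks"],
    "associatedOccurrences": [
        "associatedOrganisms",
        "associatedCollectionObjects",
    ],
}


def rename_terms(record):
    """Return (new_record, needs_review) for one flat record.

    new_record has the unconditional renames applied; needs_review maps
    each conditionally renamed term present in the record to its
    candidate replacement terms.
    """
    new_record = {}
    needs_review = {}
    for term, value in record.items():
        new_record[UNCONDITIONAL.get(term, term)] = value
        if term in CONDITIONAL:
            needs_review[term] = CONDITIONAL[term]
    return new_record, needs_review
```

A consumer that receives mixed old and new records could run every record through such a table, so that only the conditional cases need attention.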
These are still issues from the publisher's point of view, which I'm not so much concerned about. But what about consumers receiving mixed versions with old and new records? There are certain term name changes that you mentioned, so a consumer must know about the historical terms too. But as basisOfRecord for simple dwc did and still does define the true record "class", there doesn't seem to be a big change after all.
For dwc archives the row type - which is a dwc "class" term - is more crucial, and consumers' logic depends on it. But for GBIF at least I would think there is not much of a change, as we - at least currently - treat all occurrence records the same way.
Or does the problem lie rather in the implementation technology, and should we version our "schemas" and transmit the version with the records, but not version the dwc namespace? At first glance that actually sounds like a good way to go. The dwc xml schemas would have to have a new attribute with a default value in the root element in that case though. Would that be agreeable?
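To make the consumer-side concern concrete, here is a minimal sketch of reading the core rowType from an archive's meta.xml using only the Python standard library. The sample descriptor is made up, the `describe` helper is hypothetical, and the Organism and CollectionObject URIs are assumptions, since those classes are only proposals in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace of the meta.xml descriptor, as I understand the text guide.
DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
OCCURRENCE = "http://rs.tdwg.org/dwc/terms/Occurrence"

# Hypothetical URIs: these classes are only proposed in this thread.
PROPOSED = {
    "http://rs.tdwg.org/dwc/terms/Organism",
    "http://rs.tdwg.org/dwc/terms/CollectionObject",
}


def core_row_type(meta_xml):
    """Return the rowType attribute of the <core> element, or None."""
    root = ET.fromstring(meta_xml)
    core = root.find(f"{{{DWC_TEXT_NS}}}core")
    return None if core is None else core.get("rowType")


def describe(row_type):
    """Classify a rowType the way a cautious consumer might."""
    if row_type == OCCURRENCE:
        # The rowType alone cannot distinguish old-style from new-style.
        return "occurrence (old or new style: ambiguous)"
    if row_type in PROPOSED:
        return "new-style class"
    return "other"


# Made-up descriptor used only for illustration.
SAMPLE_META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
  </core>
</archive>"""
```

The point of the sketch is the first branch of `describe`: with an unversioned namespace, nothing in the rowType tells the consumer which definition of Occurrence the publisher had in mind.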
I think XML schemas will have to be versioned unless we can make them completely backward compatible. I don't like the prospects of maintenance for the latter option. I understand what you mean by adding an attribute for version in the schemas, but what did you mean by "but not the dwc namespace?"
Versioning the namespace is something people (also in tdwg) usually do for versioning standards. Dwc used to do this before, but it was abandoned for reasons I can't remember. Versioning the xml schemas themselves using the schema's version attribute doesn't really help, as we don't exchange the schema files but the instances they define. So if the namespace always remains the same, another option would be to define an additional attribute/element that becomes part of every record instance. For dwc archives this could be a new attribute in the meta.xml file; for simple xml it could be, for example, a new version attribute, as in <SimpleDarwinRecordSet version="1.1">
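A consumer-side sketch of that idea, assuming a `version` attribute on the root element as suggested here. No such attribute exists in the current schemas, and the default of "1.0" for unversioned documents is purely an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed default for documents that predate the (hypothetical) attribute.
DEFAULT_VERSION = "1.0"


def record_set_version(xml_text):
    """Return the root element's version attribute, or the default."""
    root = ET.fromstring(xml_text)
    return root.get("version", DEFAULT_VERSION)


# Made-up instance documents, with and without the proposed attribute.
NEW_STYLE = '<SimpleDarwinRecordSet version="1.1"></SimpleDarwinRecordSet>'
OLD_STYLE = "<SimpleDarwinRecordSet></SimpleDarwinRecordSet>"
```

Because the attribute travels with the instance document rather than with the schema file, a consumer could branch on it without ever fetching the schema.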
Markus
I'm curious. Are any of the Darwin Core XML schemas in use beyond the Apiary and the GermPlasm extensions?
Markus
On Sep 13, 2011, at 7:39 PM, Steve Baskauf wrote:
Markus, Well, I don't know that I'd go so far as to say that it's a drastic change in semantics, at least in the formal semantics of the normative document (which I _think_ can be viewed at http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwcterms.rdf). That document says (in human terms) that Occurrence is an rdfs:Class, that its status is Recommended, and some bookkeeping stuff about versioning. The main change is in the rdfs:comment which presents the description "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)." but that's really a human thing and as I've said, there has been quite a bit of misunderstanding about what the "human" definition means. As has been noted on this list before, DwC doesn't get into domain and range issues for its terms at all and usually doesn't get into subclassing, so there is very little in the normative document to be "broken" in terms of semantics. That's a rather different situation than changing an owl ontology class definition where relationships among classes and their properties are likely to be more complicated (e.g. disjoint classes, subclassing, range, domain) and therefore more easily "broken".
There is the issue that a number of property terms would have their dwcattributes:organizedInClass property changed from dwc:Occurrence to something else. But my understanding was that the organization of the property terms under the DwC classes was more of a suggestion as opposed to a declaration of domain. So it doesn't seem likely that the understanding of machines will be "broken" by this change since I don't think that much of any machine reasoning based on DwC is going on at the present. But this is getting beyond my area of expertise, so maybe others can clarify things on this point.
There is an RDF version of DwC viewable at http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctermshistory.... which actually has dated versions of the terms (e.g. Occurrence-2009-04-29). But I must confess, I don't understand how this document is related to the dwcterms.rdf document I mentioned above. Perhaps John can enlighten us...
Steve
Markus Döring (GBIF) wrote:
Hi Steve, I agree it is a good thing to be more clear about what an occurrence actually is, and I wouldn't disagree with the proposed 3 classes. Still, there is a drastic change in the semantics of an existing term, Occurrence, and I would feel more comfortable if we could tell those different usages apart. Whether that's via namespace-based versioning of (all?) darwin core terms, through the use of a different term name, or something else, I don't know.
Don't you think this is an issue? Would you also change an owl ontology class definition in the same way, and wouldn't that be harmful to existing instances?
Markus
With regards to Markus' concern about whether people will be able to know whether somebody is talking about a "new-style" Occurrence or an "old" Occurrence, I would assert that the "old" Occurrence didn't really have a clear meaning. If you review the summary of the discussion on Occurrence, you can see that it was used to mean at least three different kinds of "things" by different people. What John is actually doing with his proposal is to add clarity about what an Occurrence is where it didn't exist before. I think that is a good thing. If, by the "old" kind of Occurrence people are meaning that Occurrence is a fancier name for PreservedSpecimen (which I believe is how some people in the museum community are thinking of it), then I would say that such a characterization is incorrect (based on the apparent consensus) and that clarifying the incorrectness of that view is a really good thing.
Steve
Éamonn Ó Tuama (GBIF) wrote:
It would be good to hear from someone who is familiar with the work going on in the Observations Task Group and could explain how a generic model for observations/measurements (e.g. OBOE) might help sort out these issues. It seems to me that we are trying to build in an ad-hoc manner an increasingly complex model on top of DwC which is really just a glossary of terms. That does not seem like a good approach - but I'm no modeller :-) _Éamonn
-----Original Message----- From: Dag Endresen (GBIF) [mailto:dendresen@gbif.org] Sent: 13 September 2011 12:18 To: "Markus Döring (GBIF)" Cc: tdwg-content@lists.tdwg.org; Éamonn Ó Tuama Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Hi Markus,
I believe that the discussion here originates from the view that the "CollectionObject"/"Sample" is a different thing from the "Organism" - and that there can be a relationship between CollectionObjects/Samples and Organisms that could be difficult to describe if these things are identified as the same thing (occurrenceID). Do you think that the "Occurrence" would be seen as a thing different from the proposed CollectionObject/Sample and Organism - or as a super-class that would include CollectionObjects/Samples and Organisms? Would the semantics of Occurrence change?
I fully share your view that the Darwin Core Archive (DwC-A) would not be suited to share the full complex relationship between entities - even if persistent identifiers would be used. However if we start to describe and include other things (core types) than only the taxon and occurrences then perhaps the DwC-A could be a useful way to provide a simple list of these entities? This could perhaps provide easier indexing and discovery of these new entities?
Dag
On Tue, 13 Sep 2011 10:03:00 +0200, Markus Döring (GBIF) wrote:
I have to say that the change in semantics to the Occurrence class makes me a bit nervous. Can someone try to help fight my fears?
DarwinCore has no versioning of namespaces, so there is no way for a consumer to detect if it's an old-style Occurrence or a new one. I am currently parsing various RSS feeds, and even though it's a mess having to parse 10 different styles, I am glad that at least the designers made sure they all have their own namespace! Also, removing or renaming terms might cause serious problems. Would discrete versions of dwc with their own namespace hurt?
Another observation relates to dwc archives and their star schema. As an index to data that has been flattened there is no problem with more classes and core row types, but if you want it as a way to transfer complete normalized data it will not work. But that never really was the intention and I simply wanted to stress that fact.
Markus
On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:
Richard Pyle wrote:
I'm also wondering if we necessarily need to "break" the traditional view of the "Occurrence" class in order to implement Organism and CollectionObject. As long as we keep in mind that DwC is a vocabulary of terms focused on representing an exchange standard (rather than a full-blown Ontology), perhaps Occurrence records can continue to be represented in the traditional way as "flat" content, but the Organism and CollectionObject classes allow us to present data in a somewhat more "normalized" way in those circumstances that call for it (e.g. tracking individuals or groups over time [Organism], or managing fossil rocks with multiple taxa [CollectionObject] -- to name just two).
I've been thinking about this issue of "backward compatibility" with respect to Occurrences if the CollectionObject/Sample/Token/whatever class is adopted. I really don't think it is going to be as big of a deal as people are making it out to be.
It seems to me that the main problems arise in two areas: when one wants to be clear about typing and when one wants to express relationships in a system where it is possible to do through semantics (like RDF). In that kind of circumstance, it's bad (oh yeh, I forgot - the term is "naughty") to say something like resourceA hasOccurrence resourceB when resourceB isn't actually an Occurrence. "Wrong" typing also happens all the time because the classes don't exist (yet) to do the typing correctly. As a case in point, in the Morphbank system, I have multiple images of the same tree. In that system the tree is typed as a "specimen". That is totally wrong because the tree isn't a specimen, but what else is it going to be typed as? There isn't (yet) an appropriate class to put it in.
Although these two problems (wrong typing and using a term with the wrong kind of object, which are actually different manifestations of the same class-based problem) are naughty, realistically very few people are actually using a system that is "semantic-aware" (e.g. serving and consuming RDF), so right now making those mistakes doesn't really "break" anything. Most data providers are using traditional databases or even Excel spreadsheets where the DwC terms are just column headings with no real "meaning" other than what the data managers intend for them to mean. So if a manager has a table where each line contains a record for a specimen and has a column heading for a column entitled "dwc:catalogNumber", there isn't really anything other than an idea in the manager's head that the catalogNumber is a property of a specimen or Occurrence or CollectionObject. If each line in the database table is "flat" such that one specimen=one CollectionObject=one Occurrence, all that is required to make catalogNumber be a property of a CollectionObject instead of an Occurrence is a different way of thinking in the manager's mind, because there are really no semantics embedded in the table. We are already doing this kind of mental gymnastics with existing classes like dwc:Identification. If our hypothetical database manager has a column heading that says "dwc:identifiedBy" in the specimen table, that is really a property of dwc:Identification, not dwc:Occurrence, but again that is a distinction that is only going to be made in the manager's mind. Making the distinction really only becomes an issue when the database stops being "flat" for a particular relationship, e.g. if the database wants to allow multiple Identifications per specimen record. Then the database structure must be changed accordingly to accommodate that "normalization".
What we have here at the present moment is a situation where data providers don't have any way to have anything but "flat" records where 1 specimen=1 Occurrence=1 Organism. By adding the Organism and CollectionObject classes, we allow people who need or want to have less "flat" (=more "normalized") databases to have something to call the entities that are represented by the new tables they create to handle 1:many relationships instead of 1:1 relationships. Anybody who only cares about 1:1 relationships really doesn't need to worry about the fact that the new class exists, just as people currently don't have to worry about the Identification class if they only allow one Identification per specimen in their database.
So I guess what I'm saying is that if a database manager has a table labeled Occurrence, they really don't have to freak out if we now tell them that their table actually should be labeled CollectionObject as long as there is only one CollectionObject per Occurrence. They didn't freak out before when we told them that they should call their table "Occurrence" instead of "Observation" or "Specimen" in 2009, did they?
I think what I'm saying here is what Rich was trying to say in the paragraph I quoted, but I'm not sure.
Steve
Ever since DwC transitioned from a "Federated Schema" to a "Vocabulary", I've never been entirely clear on what sorts of alterations would break backward-compatibility, and which are easily handled. I've heard various statements from people with much more understanding than I on the implications of a "Vocabulary" that the classes are really intended as rough clusters of terms, and it's the definition of terms that matter. Have I misunderstood this? The point being: The only way we are threatening to "break" DwC is by moving terms from the Occurrence class to two other new classes. Does that mean we are no longer allowed to represent those terms as properties of a record with an OccurrenceID? The tiny part of my brain that "gets" ontology wants to believe that backward compatibility of what would be the new DwC:Occurrence would be maintained with what is the existing DwC:Occurrence *only* if the new classes ("Organism" and "CollectionObject") are regarded as subclasses of Occurrence. But the slightly less tiny (but still tiny) part of my brain that "gets" information modeling doesn't think that's the right way to represent the new classes. Which tiny part of my brain is right? (I'm guessing neither...) Does it even matter?
Obviously, we want a stable DwC. But we also want a DwC that meets our needs. Clearly, there are needs that are not being met by the existing DwC. The first question is, are those needs important enough to consider destabilizing DwC (by introducing two new classes, and shuffling some terms from one existing class to the new classes)? The second question is: what are the real costs/consequences of the "destabilization". In my mind, the answer to the first question is increasingly obvious ("yes"). But I don't have a good feel for the answer to the second question.
Aloha, Rich
P.S. Greg: I live on the other side of the world from *everyone*, yet that hasn't prevented me from getting my words in... :-)
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of "Markus Döring (GBIF)" Sent: Tuesday, September 13, 2011 6:59 AM To: Steve Baskauf Cc: tdwg-content@lists.tdwg.org; "Éamonn Ó Tuama (GBIF)" Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Hi Steve, I agree it is a good thing to be more clear about what an occurrence actually is, and I wouldn't disagree with the proposed 3 classes. Still, there is a drastic change in the semantics of an existing term, Occurrence, and I would feel more comfortable if we could tell those different usages apart. Whether that's via namespace-based versioning of (all?) darwin core terms, through the use of a different term name, or something else, I don't know.
Don't you think this is an issue? Would you also change an owl ontology class definition in the same way, and wouldn't that be harmful to existing instances?
Markus
OK, let's look at a concrete example. Take the specimen that is illustrated at http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0138.rdf The identifier "http://www.cyberfloralouisiana.com/specimens/lsu000/0138" could be associated with this "thing". (It actually isn't, I just made it up for an example. If you wish, you could substitute a UUID.) Now let's say that someone previously had asserted: occurrenceID=http://www.cyberfloralouisiana.com/specimens/lsu000/0138 with the understanding that the Occurrence represented not only the fact that a plant identified as /Egeria densa/ occurred at the location N 29.79 deg., W 90.632 deg. on 7 Sep 1977 but that it also represented the actual dried plant specimen itself (i.e. the evidence that the plant occurred there). This is the meaning of Occurrence that was implied (but not stated very explicitly) in the 2009 Darwin Core standard.
Under the new definition of Occurrence that is under consideration, the Occurrence represents the fact that a plant identified as /Egeria densa/ occurred at the location N 29.79 deg., W 90.632 deg. on 7 Sep 1977. These metadata fall under the occurrenceID=http://www.cyberfloralouisiana.com/specimens/lsu000/0138. Technically, the actual dried plant specimen itself is now not part of the Occurrence but rather is a CollectionObject. Does that break something? Does it force the institution to create a new identifier for the CollectionObject that has just been defined into existence? I think not. If the particular institution has ONLY occurrence records for which a single piece of evidence is associated with each Occurrence, then they have a flat database that does not distinguish between the Occurrence and the CollectionObject associated with that Occurrence. The change to the term definition is essentially irrelevant to that institution. On the other hand, if the institution decides that they have a new policy which requires that all collected specimens must now be photographed prior to collection and a DNA sample collected and submitted to Genbank, the new definitions provide a way for them to associate three (or more) CollectionObjects having separate collectionObjectIDs with the single Occurrence. If they "de-flatten" their database to accommodate this more "normalized" structure, they could easily implement a rule like 'put "#sp" after the identifier for the Occurrence to construct a default identifier for the single CollectionObject associated with that Occurrence' (e.g. collectionObjectID=http://www.cyberfloralouisiana.com/specimens/lsu000/0138#sp for the CollectionObject associated with the Occurrence having occurrenceID=http://www.cyberfloralouisiana.com/specimens/lsu000/0138), or make up new identifiers for the CollectionObjects if they want. But no TDWG "Big Brother" is making them change their database structure or add new identifiers unless they want to.
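The "#sp" rule described above is simple enough to write down. The following is only a sketch: the function name is hypothetical, and the guard against identifiers that already carry a fragment is an added assumption, not part of the rule as stated.

```python
def default_collection_object_id(occurrence_id, suffix="#sp"):
    """Derive a default collectionObjectID from an occurrenceID.

    Implements the rule 'put "#sp" after the identifier for the
    Occurrence'. Refusing identifiers that already contain '#' is an
    extra safeguard assumed here, not something the rule requires.
    """
    if "#" in occurrence_id:
        raise ValueError("occurrenceID already contains a fragment")
    return occurrence_id + suffix


# The made-up example identifier from the message above.
OCCURRENCE_ID = "http://www.cyberfloralouisiana.com/specimens/lsu000/0138"
COLLECTION_OBJECT_ID = default_collection_object_id(OCCURRENCE_ID)
```

An institution that later adds a photograph or a DNA sample could pass a different suffix for each additional CollectionObject, though any such suffixes would be its own convention.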
To put this in perspective, look at http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0429.rdf Here we have a specimen that has two dwc:Identifications, one that asserts the taxon /Juncus diffusissimus/ sensu L. Urbatsch and one that asserts /Juncus debilis/ sensu G. Montz. Accommodating a single specimen which has two dwc:Identifications in a completely flat database presents exactly the same problems as accommodating two CollectionObjects for a single Occurrence. All of the issues that have been raised for separating CollectionObjects from Occurrences apply equally well to creating the Identification class (like "do I have to assign new separate identifiers for the Identification instances?" and "do I 'break' things if I allow multiple Identifications in a world where people have databases that permit only a single determination per specimen/occurrence/organism?"). I don't hear anybody gnashing their teeth or frothing at the mouth about the fact that we let the term dwc:Identification sneak into the 2009 Darwin Core and mess up our nice perfectly flat database world. Can somebody explain to me how the issues raised with CollectionObject are different from this?
Steve
Richard Pyle wrote:
Ever since DwC transitioned from a "Federated Schema" to a "Vocabulary", I've never been entirely clear on what sorts of alterations would break backward-compatibility, and which are easily handled. I've heard various statements from people with much more understanding than I on the implications of a "Vocabulary" that the classes are really intended as rough clusters of terms, and it's the definition of terms that matter. Have I misunderstood this? The point being: The only way we are threatening to "break" DwC is by moving terms from the Occurrence class to two other new classes. Does that mean we are no longer allowed to represent those terms as properties of a record with an OccurrenceID? The tiny part of my brain that "gets" ontology wants to believe that backward compatibility of what would be the new DwC:Occurrence would be maintained with what is the existing DwC:Occurrence *only* if the new classes ("Organism" and "CollectionObject") are regarded as subclasses of Occurrence. But the slightly less tiny (but still tiny) part of my brain that "gets" information modeling doesn't think that's the right way to represent the new classes. Which tiny part of my brain is right? (I'm guessing neither...) Does it even matter?
Obviously, we want a stable DwC. But we also want a DwC that meets our needs. Clearly, there are needs that are not being met by the existing DwC. The first question is, are those needs important enough to consider destabilizing DwC (by introducing two new classes, and shuffling some terms from one existing class to the new classes)? The second question is: what are the real costs/consequences of the "destabilization". In my mind, the answer to the first question is increasingly obvious ("yes"). But I don't have a good feel for the answer to the second question.
Aloha, Rich
P.S. Greg: I live on the other side of the world from *everyone*, yet that hasn't prevented me from getting my words in... :-)
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content- bounces@lists.tdwg.org] On Behalf Of "Markus Döring (GBIF)" Sent: Tuesday, September 13, 2011 6:59 AM To: Steve Baskauf Cc: tdwg-content@lists.tdwg.org; "Éamonn Ó Tuama (GBIF)" Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Hi Steve, I agree it is a good thing to be more clear about what an occurrence actually is, and I wouldn't disagree with the proposed 3 classes. Still, there is a drastic change in semantics of an existing term, Occurrence, and I would feel more comfortable if we could tell those different usages apart. Whether that's via namespace-based versioning of (all?) Darwin Core terms, through the use of a different term name, or something else, I don't know.
Don't you think this is an issue? Would you also change an OWL ontology class definition in the same way, and wouldn't that be harmful to existing instances?
Markus
With regard to Markus' concern about whether people will be able to know whether somebody is talking about a "new-style" Occurrence or an "old" Occurrence, I would assert that the "old" Occurrence didn't really have a clear meaning. If you review the summary of the discussion on Occurrence, you can see that it was used to mean at least three different kinds of "things" by different people. What John is actually doing with his proposal is to add clarity about what an Occurrence is where it didn't exist before. I think that is a good thing. If, by the "old" kind of Occurrence people are meaning that Occurrence is a fancier name for PreservedSpecimen (which I believe is how some people in the museum community are thinking of it), then I would say that such a characterization is incorrect (based on the apparent consensus) and that clarifying the incorrectness of that view is a really good thing.
Steve
Éamonn Ó Tuama (GBIF) wrote:
It would be good to hear from someone who is familiar with the work going on in the Observations Task Group and could explain how a generic model for observations/measurements (e.g. OBOE) might help sort out these issues. It seems to me that we are trying to build in an ad-hoc manner an increasingly complex model on top of DwC which is really just a glossary of terms. That does not seem like a good approach - but I'm no modeller :-) _Éamonn
-----Original Message----- From: Dag Endresen (GBIF) [ mailto:dendresen@gbif.org ] Sent: 13 September 2011 12:18 To: "Markus Döring (GBIF)" Cc: tdwg-content@lists.tdwg.org ; Éamonn Ó Tuama Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Hi Markus,
I believe that the discussion here originates from the view that the "CollectionObject"/"Sample" is a different thing from the "Organism", and that there can be a relationship between CollectionObjects/Samples and Organisms that could be difficult to describe if these things are identified as the same thing (occurrenceID). Do you think that the "Occurrence" would be seen as a thing different from the proposed CollectionObject/Sample and Organism, or as a super-class that would include CollectionObjects/Samples and Organisms? Would the semantics of Occurrence change?
I fully share your view that the Darwin Core Archive (DwC-A) would not be suited to share the full complex relationship between entities - even if persistent identifiers would be used. However if we start to describe and include other things (core types) than only the taxon and occurrences then perhaps the DwC-A could be a useful way to provide a simple list of these entities? This could perhaps provide easier indexing and discovery of these new entities?
Dag
On Tue, 13 Sep 2011 10:03:00 +0200, Markus Döring (GBIF) wrote:
I have to say that the change in semantics to the Occurrence class makes me a bit nervous. Can someone try to help fight my fears?
Darwin Core has no versioning of namespaces, so there is no way for a consumer to detect whether it's an old-style Occurrence or a new one. I am currently parsing various RSS feeds, and even though it's a mess having to parse 10 different styles, I am glad that at least the designers made sure they all have their own namespace! Also, removing or renaming terms might cause serious problems. Would discrete versions of dwc with their own namespace hurt?
Another observation relates to dwc archives and their star schema. As an index to data that has been flattened, there is no problem with more classes and core row types, but if you want it as a way to transfer complete normalized data, it will not work. But that never really was the intention, and I simply wanted to stress that fact.
Markus
On Sep 9, 2011, at 4:52 PM, Steve Baskauf wrote:
Richard Pyle wrote:
> I'm also wondering if we necessarily need to "break" the traditional view of the "Occurrence" class in order to implement Organism and CollectionObject.
> As long as we keep in mind that DwC is a vocabulary of terms focused on representing an exchange standard (rather than a full-blown Ontology), perhaps Occurrence records can continue to be represented in the traditional way as "flat" content, but the Organism and CollectionObject classes allow us to present data in a somewhat more "normalized" way in those circumstances that call for it (e.g. tracking individuals or groups over time [Organism], or managing fossil rocks with multiple taxa [CollectionObject] -- to name just two).
I've been thinking about this issue of "backward compatibility" with respect to Occurrences if the CollectionObject/Sample/Token/whatever class is adopted. I really don't think it is going to be as big of a deal as people are making it out to be.
It seems to me that the main problems arise in two areas: when one wants to be clear about typing and when one wants to express relationships in a system where it is possible to do through semantics (like RDF). In that kind of circumstance, it's bad (oh yeh, I forgot - the term is "naughty") to say something like resourceA hasOccurrence resourceB when resourceB isn't actually an Occurrence. "Wrong" typing also happens all the time because the classes don't exist (yet) to do the typing correctly. As a case in point, in the Morphbank system, I have multiple images of the same tree. In that system the tree is typed as a "specimen". That is totally wrong because the tree isn't a specimen, but what else is it going to be typed as? There isn't (yet) an appropriate class to put it in.
Although these two problems (wrong typing and using a term with the wrong kind of object, which are actually different manifestations of the same class-based problem) are naughty, realistically very few people are actually using a system that is "semantic-aware" (e.g. serving and consuming RDF), so right now making those mistakes doesn't really "break" anything. Most data providers are using traditional databases or even Excel spreadsheets where the DwC terms are just column headings with no real "meaning" other than what the data managers intend for them to mean. So if a manager has a table where each line contains a record for a specimen and has a column heading for a column entitled "dwc:catalogNumber", there isn't really anything other than an idea in the manager's head that the catalogNumber is a property of a specimen or Occurrence or CollectionObject. If each line in the database table is "flat" such that one specimen=one CollectionObject=one Occurrence, all that is required to make catalogNumber be a property of a CollectionObject instead of an Occurrence is a different way of thinking in the manager's mind, because there are really no semantics embedded in the table. We are already doing this kind of mental gymnastics with existing classes like dwc:Identification. If our hypothetical database manager has a column heading that says "dwc:identifiedBy" in the specimen table, that is really a property of dwc:Identification, not dwc:Occurrence, but again that is a distinction that is only going to be made in the manager's mind. Making the distinction really only becomes an issue when the database stops being "flat" for a particular relationship, e.g. if the database wants to allow multiple Identifications per specimen record. Then the database structure must be changed accordingly to accommodate that "normalization".
What we have here at the present moment is a situation where data providers don't have any way to have anything but "flat" records where 1 specimen=1 Occurrence=1 Organism. By adding the Organism and CollectionObject classes, we allow people who need or want to have less "flat" (=more "normalized") databases to have something to call the entities that are represented by the new tables they create to handle 1:many relationships instead of 1:1 relationships. Anybody who only cares about 1:1 relationships really doesn't need to worry about the fact that the new class exists, just as people currently don't have to worry about the Identification class if they only allow one Identification per specimen in their database.
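As a rough illustration of that "mental gymnastics", the Python sketch below takes one flat row and regroups its columns by the class each term would belong to. The term-to-class table is only an assumption following this discussion (CollectionObject is still a proposal), and the row values are made up.

```python
from collections import defaultdict

# Illustrative assignments only; "CollectionObject" is the proposed class
# discussed in this thread, not an adopted Darwin Core class.
TERM_TO_CLASS = {
    "occurrenceID": "Occurrence",
    "catalogNumber": "CollectionObject",
    "identifiedBy": "Identification",
    "scientificName": "Taxon",
}


def group_by_class(flat_row):
    """Regroup the columns of one flat record by their (assumed) class."""
    grouped = defaultdict(dict)
    for term, value in flat_row.items():
        # Terms missing from the table are kept under "Unclassified".
        grouped[TERM_TO_CLASS.get(term, "Unclassified")][term] = value
    return dict(grouped)


# Made-up example values for a single flat specimen row.
flat_row = {
    "occurrenceID": "http://www.cyberfloralouisiana.com/specimens/lsu000/0138",
    "catalogNumber": "example-0138",
    "identifiedBy": "A. Botanist",
    "scientificName": "Egeria densa",
}

for class_name, terms in group_by_class(flat_row).items():
    print(class_name, terms)
```

Nothing in the row itself changes; only the labels on the groups do, which is why a strictly 1:1 provider can ignore the new classes.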
So I guess what I'm saying is that if a database manager has a table labeled Occurrence, they really don't have to freak out if we now tell them that their table actually should be labeled CollectionObject as long as there is only one CollectionObject per Occurrence. They didn't freak out before when we told them that they should call their table "Occurrence" instead of "Observation" or "Specimen" in 2009, did they?
I think what I'm saying here is what Rich was trying to say in the paragraph I quoted, but I'm not sure.
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
tdwg-content mailing list
tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
On Tue, Sep 13, 2011 at 11:15 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Ever since DwC transitioned from a "Federated Schema" to a "Vocabulary", I've never been entirely clear on what sorts of alterations would break backward-compatibility, and which are easily handled. I've heard various statements from people with much more understanding than I on the implications of a "Vocabulary" that the classes are really intended as rough clusters of terms, and it's the definition of terms that matter. Have I misunderstood this?
Well, that depends partially on what technology you have invested in. If you only need to share records of Simple Darwin Core, whether in text files or in XML documents that are valid to the Simple Darwin Core XML schema, then the classes really don't have much of a function at all, except maybe as a convenience. For example, people might find it convenient to talk about the Location terms in Simple Darwin Core as a subset of terms that are grouped together and pertain to Locations. But in the record, the Location class does not appear. IPT uses the label for the Class to create a list of terms in its mapping user interface to help users quickly find relevant terms among the many, many terms that Darwin Core supports.
Once you get into more relational structures, Classes may take on a more active role. For example, using the IPT's capacity to represent a star schema in text files, one might have all of the Identification information in a single file and relate that to a core record of the Occurrence type, and thereby support many Identifications for a single Occurrence. The classes aren't explicit in this example, but they are "understood" by the humans who attach the identifiers in the Identifications file that relate to the core record. The same sort of structural understanding might be made more explicit in a database structure that tries to mimic the recommended dcattributes:organizedInClass, or in an XML Schema that explicitly uses the Classes as containers for properties.
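A minimal Python sketch of that star-schema idea follows. It is not actual IPT output: the column names ("id", "coreid"), the local identifiers, and the blank identifiedBy are assumptions for illustration, and a real Darwin Core Archive would describe its files in a meta.xml descriptor. The two Juncus determinations are the ones from the lsu000/0429 example earlier in this thread.

```python
import csv
import io

# Core file: one row per Occurrence (identifiers are made up).
CORE_CSV = """\
id,basisOfRecord
occ-0138,PreservedSpecimen
occ-0429,PreservedSpecimen
"""

# Extension file: one row per Identification, pointing back at the core row.
IDENT_CSV = """\
coreid,scientificName,identifiedBy
occ-0138,Egeria densa,
occ-0429,Juncus diffusissimus,L. Urbatsch
occ-0429,Juncus debilis,G. Montz
"""


def attach_identifications(core_text, ext_text):
    """Return {core id: [identification rows]} for a core/extension pair."""
    result = {row["id"]: [] for row in csv.DictReader(io.StringIO(core_text))}
    for row in csv.DictReader(io.StringIO(ext_text)):
        core_id = row.pop("coreid")
        # Extension rows that point at an unknown core id are skipped.
        if core_id in result:
            result[core_id].append(row)
    return result


for core_id, idents in attach_identifications(CORE_CSV, IDENT_CSV).items():
    print(core_id, [i["scientificName"] for i in idents])
```

The Identification class never appears by name; it is implied by which file a row lives in, which is the "understood by the humans" point made above.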
In the semantic world, the Classes might be used to make world views (plural on purpose) that reflect an understanding of how the higher level concepts ought to relate to each other when applied to a particular problem.
All of these are valid uses of Darwin Core.
The point being: The only way we are threatening to "break" DwC is by moving terms from the Occurrence class to two other new classes. Does that mean we are no longer allowed to represent those terms as properties of a record with an OccurrenceID?
You weren't ever really allowed to do so in Darwin Core in the strict sense of assigning a domain to a property. This is because, for example, just because something has a dwc:scientificName doesn't mean it's a Taxon.
The tiny part of my brain that "gets" ontology wants to believe that backward compatibility of what would be the new DwC:Occurrence would be maintained with what is the existing DwC:Occurrence *only* if the new classes ("Organism" and "CollectionObject") are regarded as subclasses of Occurrence.
They really are distinct, and subclassing them from Occurrence would not give them any properties. Backward compatibility is much more affected by what you did with Occurrence before and what you will do with Occurrence, Organism, and CollectionObject instead. To give an example, the IPT could be modified to allow Organism and CollectionObject to be core types instead of or in addition to Occurrence. That would require re-engineering. IPT could just as easily ignore these distinctions and still pump out perfectly good Darwin Core Archives about Taxa and Occurrences. It's really about what you want to do with it.
But the slightly less tiny (but still tiny) part of my brain that "gets" information modeling doesn't think that's the right way to represent the new classes. Which tiny part of my brain is right? (I'm guessing neither...) Does it even matter?
I think it matters to those who want to make Darwin Core more capable of being used in ways that require distinctions that it cannot currently represent. It might be useful to think of a counterexample to help people feel more comfortable with the proposed changes. If you have a monitoring network of camera traps, does it really bother you that Darwin Core has this GeologicalContext class? You don't use it. Should it matter that there are classes for Organisms and the persistent evidence of them? It should matter if you can do something interesting with that; otherwise you can ignore the distinction between them and think of the properties as Occurrence-related.
Obviously, we want a stable DwC. But we also want a DwC that meets our needs. Clearly, there are needs that are not being met by the existing DwC. The first question is, are those needs important enough to consider destabilizing DwC (by introducing two new classes, and shuffling some terms from one existing class to the new classes)? The second question is: what are the real costs/consequences of the "destabilization". In my mind, the answer to the first question is increasingly obvious ("yes"). But I don't have a good feel for the answer to the second question.
The answer to the second one is going to depend on how Darwin Core is being used. It might be good to get some anecdotes about how the proposed changes are going to "break" anything currently in existence.
The definition of collection object has been bothering me, and I'll use Steve's comment below as a jumping off point.
On Thu, 8 Sep 2011 06:02:49 -0500 Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
Being derived from an Organism is the minimal requirement for a collection object.
This is dramatically different from the long history of use of the term collection object (since at least the ASC model), as being a physical thing in a natural science collection, where any of the following can be collection objects cataloged in a collection:
A single herbarium sheet with two pressed plants of different species collected in two different collecting events at two different localities.
A cast of a fossil bone.
A model of a hypothetical soft part reconstruction of an extinct organism.
A herbarium sheet bearing a drawing of a plant.
A bulk sample of sediment and fossils with collecting event and locality data.
A slab of rock containing many fossils of different taxa.
A vial of insects in ethanol collected in a trap.
A herbarium packet containing a rock, two lichens, and a moss.
A mouse, comprised of skull, skeleton, skin, and tissue sample preparations.
A mouse, comprised of a set of related collection objects: skull, skeleton, skin, and tissue sample, each with its own catalog number (perhaps with the tissue sample in a different collection using a number in a different catalog number series).
In many instances in many collections there is not a one-to-one relationship between a collection object and an organism. In lot-based collections, collection objects are often sets of individual organisms. In some disciplines, collection objects are often aggregates of many different individuals belonging to multiple different taxa. Collection objects are often hierarchies of objects derived through various preparation techniques (the bulk sample that has been partly picked with macrofossils sorted into lots by higher taxon, with some of these lots sorted and identified down to the species level, with some parts of some specimens mounted on SEM stubs; or the mouse prepared into multiple preparation types).
A short definition of a collection object might be: "a thing that can be sent on loan from a collection".
-Paul
Paul J. Morris wrote:
The definition of collection object has been bothering me, and I'll use Steve's comment below as a jumping off point.
On Thu, 8 Sep 2011 06:02:49 -0500 Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
Being derived from an Organism is the minimal requirement for a collection object.
Well, my point here wasn't that the collection object must be derived from a SINGLE organism. I don't see any reason why a collection object could not be an aggregate derived from several organisms. In fact, there may be more cases where a collection object contains evidence for more than one organism than there are cases where it is derived from a single organism. I don't think there is a problem with that. Take the collection object which is an image having the GUID http://bioimages.vanderbilt.edu/baskauf/15833 . That image includes visible parts of at least 16 organisms. At the present moment, I'm interested in documenting the big tree in the middle of the picture, but that wouldn't stop me from creating identifiers for any of the other Organisms, such as the five people who are also shown in the image, if I had a reason to do so. The existence of any of those people could be documented by the CollectionObject which is the image just as easily as the tree. The actual point of my statement was that a CollectionObject that had no obvious connection to an Organism (such as an abstract painting in a museum) would be outside of the scope of DwC.
...
In many instances in many collections there is not a one to one relationship between a collection object and an organism. In lot based collections, collection objects are often sets of individual organisms. In some disciplines, collection objects are often aggregates of many different individuals belonging to multiple different taxa. Collection objects are often hierarchies of derived objects derived through various preparation techniques (the bulk sample that has been partly picked with macrofossils sorted into lots by higher taxon, with some of these lots sorted and identified down to the species level, with some parts of some specimens mounted on SEM stubs; or the mouse prepared into multiple preparation types).
As I said, what is the problem with that? We are not demanding a one-to-one relationship between a collection object and an organism. I would say the connection is at least potentially many-to-many.
A short definition of a collection object might be: "a thing that can be sent on loan from a collection".
This definition is problematic. The large oak tree having the GUID http://bioimages.vanderbilt.edu/vanderbilt/7-314 is a part of a physical collection (the Vanderbilt arboretum). However, it cannot be sent on loan. The definition as it stands does not limit CollectionObjects to physical objects and I don't think it should be, because the goal (at least I hope it's a goal!) is to allow things other than museum specimens to document organisms. I would say a better definition is that a CollectionObject is a resource that has been cataloged and is being maintained as part of a collection. That would include any PreservedSpecimens, but would also include LivingSpecimens, Images, MachineObservations, etc. If the history of the name "CollectionObject" is an impediment to people's understanding of what it means, then use a different name. But I think broadening the meaning to include all kinds of things that are maintained as persistent evidence is fine as long as the meaning of the term is documented clearly (which I think it is in John's definition).
Steve
On Thu, 8 Sep 2011 11:52:56 -0500 Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
Being derived from an Organism is the minimal requirement for a collection object.
Well, my point here wasn't that the collection object must be derived from a SINGLE organism. I don't see any reason why a collection object could not be an aggregate derived from several organisms.
I wholly agree with your point. My issue is with the phrasing of the definition (and not, I think, with your points at all):
Definition: The category of information pertaining to persistent evidence that an organism existed (specimen, sample, image, sound, drawing, field notes, publication), including digital forms.
The word "an", in "evidence that an organism existed" implies to me that the intent is a one to one relationship between organism and collection object.
just as easily as the tree. The actual point of my statement was that a CollectionObject that had no obvious connection to an Organism (such as an abstract painting in a museum) would be outside of the scope of DwC.
There does exist a small class of collection objects in natural science collections that are similar to the abstract painting. We've got two entire collections of them here at Harvard in the glass flowers and the glass invertebrates; other examples include the Wards and Charles Knight dinosaur models.
A short definition of a collection object might be: "a thing that can be sent on loan from a collection".
This definition is problematic. The large oak tree having the GUID http://bioimages.vanderbilt.edu/vanderbilt/7-314 is a part of a physical collection (the Vanderbilt arboretum). However, it cannot be sent on loan.
Granted. I'm always forgetting living collections.
The definition as it stands does not limit CollectionObjects to physical objects and I don't think it should be, because the goal (at least I hope it's a goal!) is to allow things other than museum specimens to document organisms. I would say a better definition is that a CollectionObject is a resource that has been cataloged and is being maintained as part of a collection.
Resource maintained as part of a collection sounds like a description of a collection object, "has been cataloged" does not. The vast majority of the 5 million or so collection objects in the Harvard University Herbaria have not been cataloged. The vast majority of all specimens in the stratigraphic portions of paleontological collections have not been cataloged. A hierarchical collection object consisting of a dry snail shell with one catalog number in one collection and soft parts in ethanol with a separate catalog number in another collection is also problematic for associating a collection object with cataloged items. Botanical duplicates likewise.
That would include any PreservedSpecimens, but would also include LivingSpecimens, Images, MachineObservations, etc. If the history of the name "CollectionObject" is an impediment to people's understanding of what it means, then use a different name.
Field notes, publications, and digital images fall outside the scope of the meaning of collection object. (Though digital images might be derived objects derived from a collection object through a digital imaging preparation process). Field notes tend to be seen as metadata about the collecting event, though they might be the only source of information about other observations.
Voucher is perhaps a better term for the broader concept.
But I think then broadening the meaning to include all kinds of things that are maintained as persistent evidence is fine as long as the meaning of the term is documented clearly (which I think it is in John's definition).
I don't agree. Taking a term that is widely used for a central entity in relational models of natural science collections data, and expanding its scope to include concepts that are commonly held in distantly related entities is likely to cause confusion.
Perhaps: Voucher: The category of information pertaining to persistent evidence for the existence of organisms that is held in a collection, including digital forms.
-Paul
Paul J. Morris wrote:
Resource maintained as part of a collection sounds like a description of a collection object, "has been cataloged" does not. The vast majority of the 5 million or so collection objects in the Harvard University Herbaria have not been cataloged. The vast majority of all specimens in the stratigraphic portions of paleontological collections have not been cataloged. A hierarchical collection object consisting of a dry snail shell with one catalog number in one collection and soft parts in ethanol with a separate catalog number in another collection is also problematic for associating a collection object with cataloged items. Botanical duplicates likewise.
Well, we may be talking past each other here if we have different understandings about what it means for something to be cataloged. But I would assert that any of the 5 million uncatalogued items in the Harvard University Herbaria are not relevant to Darwin Core in their present state because Darwin Core is a scheme for organizing metadata. How are you going to organize metadata for an item that has no database record/has not been cataloged? If you look at the terms that John is suggesting fall under the category of CollectionItem (i.e. properties of collection items), they are things like catalogNumber, disposition, otherCatalogNumbers, collectionObjectID, etc. These are properties of things which have been cataloged and have a record in a database. You can't assign those properties to herbarium sheets that have no records in a database. In other words, the purpose of creating the class CollectionObject is not to describe the idea of what a collection object is, but rather to organize metadata for things that HAVE recorded metadata. No metadata, no reason for DwC to deal with it.
I'm not understanding your issue with separate parts or duplicates. If each one has its own record, then each one is a separate instance of CollectionObject. They could be related to each other or the organism from which they came by properties like dcterms:hasPart, dcterms:isPartOf, or dsw:derivedFrom.
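[Editorial illustration] The linking described above can be sketched with plain subject/predicate/object tuples. This is only a minimal sketch: the predicate names are the ones mentioned in the message, while every `ex:` identifier and the helper `subjects_of` are hypothetical and exist only for this example.

```python
# Each record is its own CollectionObject instance; links are stored as
# (subject, predicate, object) triples. All "ex:" identifiers are made up.
triples = [
    ("ex:skull-1", "dsw:derivedFrom", "ex:mouse-1"),
    ("ex:skin-1", "dsw:derivedFrom", "ex:mouse-1"),
    ("ex:tissue-1", "dsw:derivedFrom", "ex:mouse-1"),
    ("ex:shell-7", "dcterms:isPartOf", "ex:snail-specimen-7"),
]


def subjects_of(triples, predicate, obj):
    """Return, sorted, every subject linked to `obj` by `predicate`."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


# All separately cataloged parts that trace back to the same organism.
mouse_parts = subjects_of(triples, "dsw:derivedFrom", "ex:mouse-1")
print(mouse_parts)
```

Because each part carries its own record and its own link, the skull, skin, and tissue sample can live in different collections under different catalog number series and still be traced to one organism.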
Field notes, publications, and digital images fall outside the scope of the meaning of collection object.
Maybe we need a different name for the class if this is an impediment to too many people.
(Though digital images might be derived objects derived from a collection object through a digital imaging preparation process). Field notes tend to be seen as metadata about the collecting event, though they might be the only source of information about other observations.
Voucher is perhaps a better term for the broader concept.
Again, I'm repeating something I said on the list earlier, but I've been chided by some botanists for collecting images of live plants without collecting "vouchers". In that context, the person who scolded me intended for the term "voucher" to only include physical parts of a plant, and not images - a narrower concept than what John has in his definition. So we would run into the same problem of people having a preconceived idea of what "voucher" means that would basically be the same as what you are asserting "CollectionObject" means.
Steve
Regarding CollectionObject and Paul's comment "Resource maintained as part of a collection sounds like a description of a collection object, "has been cataloged" does not. The vast majority of the 5 million or so collection objects in the Harvard University Herbaria have not been cataloged."
CollectionObject in this case is the string/label applied to a particular class of data. It exists only for data. The fact that there are "collection objects" that are in the real world helps confuse these discussions that center on data entities, like a class called CollectionObject. So, although there are millions of non-cataloged collection objects in the HUH, none of them could ever be included in a CollectionObject data class unless they were first digitized, databased, or cataloged. So, I think defining a CollectionObject class to contain cataloged objects is valid.
The overlap between the real, or worse mental, world and the data world does cause some confusion. Not everything in the real world is actually captured in the data world and some may never be.
Chuck
I agree with Paul that the new proposed DwC class expands the scope of "CollectionObject" from what it has traditionally been. I'm just not sure any of the alternative terms are any better. Basically, we're talking about elevating "collectionObject" to a superclass of what it used to be. Everything you list below falls within scope of my understanding of the new (sensu lato) definition of "collectionObject". Now, one could argue that we keep "collectionObject" for the traditional (sensu stricto) meaning and come up with another term for the superclass. But I can't think of another term that would adequately represent the superclass, and "preservedSpecimen" works well for the subclass (=traditional "collectionObject"). I originally liked "voucher" because I use the term "virtual voucher" for photographic/videographic evidence of organisms. But I agree with Steve's comment that "voucher" probably has an even more explicit implication of "preservedSpecimen" than "collectionObject" does.
Rich
As one of the primary brawlers on this topic, I've already said enough about it, so I will restrain myself and just say that I fully support the proposal.
As another of the primary brawlers, I fully concur with Steve's comments below. Nice job, John!
Well, mostly restrain myself... I will make one comment about what John said below. Although it is true that a CollectionObject (or "evidence") would probably need to have been derived from an organism to be relevant in the Darwin Core context, there is no reason why a CollectionObject cannot simultaneously serve as evidence that the Organism existed, that an Occurrence occurred, and as support for an Identification. Particularly in the case of specimens, it is likely that the CollectionObject will usually serve all three purposes at once. A CollectionObject could actually serve as "evidence" for anything you want. To some extent, that's one of the reasons for decoupling PreservedSpecimen from Occurrence.
I think I might agree with this, but I want to ask a simple question:
To what objects would an Identification instance apply? In other words, an Identification instance represents a link between an instance of Taxon and an instance of [XXXXXXX].
In my mind, this should always be "Organism". To me, neither an Occurrence instance nor a CollectionObject instance has a taxonomic identity. Thinking about it in database terms, an Occurrence represents a join-table between Events (Place+time) and Organisms, and CollectionObjects represent the provenance for the Organism.
I agree with Steve that a CollectionObject can certainly provide evidence that assists with Identifications and for documenting Occurrences. However, that doesn't mean that I think that there should be direct relationships between instances of Occurrence or CollectionObject with instances of Identification, or directly between CollectionObject and Occurrence (these should happen through an Organism).
However, I also fully understand that assertions of this sort are outside the scope of DwC (more an issue of ontology), so this may not be the right time & place to raise this issue.
Aloha, Rich
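[Editorial illustration] The database reading in the message above (an Occurrence as a join table between Events and Organisms, with Identifications attached only to Organisms) might be sketched as follows. This is a minimal sketch under stated assumptions, not a proposed schema: the class fields, the identifiers, the placeholder taxon labels, and the helper `taxa_at_event` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """Join record: which Organism was present at which Event."""
    occurrence_id: str
    event_id: str
    organism_id: str


@dataclass(frozen=True)
class Identification:
    """Links a taxon to an Organism, never directly to an Occurrence."""
    organism_id: str
    taxon: str


def taxa_at_event(event_id, occurrences, identifications):
    """Reach taxa for an Event only by going through Organism."""
    organism_ids = {o.organism_id for o in occurrences if o.event_id == event_id}
    return sorted({i.taxon for i in identifications if i.organism_id in organism_ids})


occurrences = [
    Occurrence("occ-1", "event-1", "org-tree"),
    Occurrence("occ-2", "event-1", "org-person"),
    Occurrence("occ-3", "event-2", "org-tree"),  # same organism, later event
]
identifications = [
    Identification("org-tree", "Taxon A"),
    Identification("org-person", "Taxon B"),
]

taxa_event_1 = taxa_at_event("event-1", occurrences, identifications)
taxa_event_2 = taxa_at_event("event-2", occurrences, identifications)
print(taxa_event_1, taxa_event_2)
```

In this arrangement a re-identification changes one Identification row, and every Occurrence of that Organism picks it up without being edited.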
Term Name: Organism
Identifier: http://rs.tdwg.org/dwc/terms/Organism
Namespace: http://rs.tdwg.org/dwc/terms/
Label: Organism
Definition: The category of information pertaining to a specific instance of an organism (virus, symbiont, individual, colony, group of individuals, population) reliably known to be taxonomically homogeneous.
I see a problem with the "taxonomically homogeneous" since many taxa are not. All obligatory mutualistically symbiontic organisms are excluded (you mention symbiont, but the symbiont is the part of a symbiontic relation, e.g. both the algae taxon and fungus taxon each are a symbiont in a lichen).
Contradict if my German biology is at odds with English.
The problem is that individual and set are mixed, so that the "homogeneous" appears to apply also to the individual. Proposal:
Definition: The information class pertaining to a specific instance or set of instances of a life form or organism (virus, bacteria, symbiontic life forms, individual, colony, group, population). Sets must reliably be known to be taxonomically homogeneous (including obligatory symbiontic associations).
Gregor
I see a problem with the "taxonomically homogeneous" since many taxa are not. All obligatory mutualistically symbiontic organisms are excluded (you mention symbiont, but the symbiont is the part of a symbiontic relation, e.g. both the algae taxon and fungus taxon each are a symbiont in a lichen).
I don't understand the problem. Isn't this simply two instances of "Organism" (one symbiont and one host)?
Together, they may comprise a single collectionObject (e.g., specimen); but I see no trouble treating obligatory mutualistic symbionts and their host(s) as distinct instances of "Organism".
Definition: The information class pertaining to a specific instance or set of instances of a life form or organism (virus, bacteria, symbiontic life forms, individual, colony, group, population). Sets must reliably be known to be taxonomically homogeneous (including obligatory symbiontic associations).
I guess it could be defined that way, but I've come around to Steve's view that "taxonomically homogeneous" implies that in cases where more than one individual is involved (colonies, small groups, populations), all such individuals belong to a single species (independently of whether or not we can identify what that species is). When more than one species is discovered amongst a multi-individual instance of "Organism", then one would create additional instances of Organism to accommodate the heterogeneous taxa.
All of my original examples of things that I would want to be taxonomically heterogeneous (e.g., a single fossil rock with multiple phyla/kingdoms, or a single rock with multiple phyla of invertebrates attached) can be easily aggregated via a single instance of collectionObject, associated with multiple instances of Organism (one for each species[ish] level taxon).
I originally thought that both "Organism" and "collectionObject" would be redundant, and that only one was really needed. But I have now been convinced by Steve (and others) that this would be an unnecessary over-loading of "Organism". Now that we are contemplating two distinct classes, I have no problem with the more "refined" definition of "Organism".
One concern I do have, however, is in the many, many, many cases where there will be a 1:1 relationship between an Organism and a CollectionObject (i.e., the vast majority of all Museum specimens). Does that mean that data providers will need to generate two separate IDs (one organismID and one collectionObjectID) to represent all of these specimens?
Aloha, Rich
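[Editorial illustration] The rule described above (one collectionObject, with one Organism instance per species-level taxon found in it) could be sketched like this. It is a rough sketch only: the function name `organisms_for_lot`, the ID scheme, and the placeholder taxon labels are assumptions made for the example, and nothing here answers the open question about how organismID values should really be generated.

```python
from collections import Counter


def organisms_for_lot(collection_object_id, taxa_of_individuals):
    """Create one taxonomically homogeneous Organism record per taxon.

    `taxa_of_individuals` holds one taxon label per individual in the lot.
    The organismID scheme below is hypothetical.
    """
    counts = Counter(taxa_of_individuals)
    return [
        {
            "organismID": f"{collection_object_id}/org-{n}",
            "collectionObjectID": collection_object_id,
            "taxon": taxon,
            "individualCount": count,
        }
        for n, (taxon, count) in enumerate(sorted(counts.items()), start=1)
    ]


# A single rock bearing individuals of two different taxa.
rock_organisms = organisms_for_lot("rock-1", ["Taxon A", "Taxon B", "Taxon A"])
for record in rock_organisms:
    print(record)
```

When a lot turns out to hold only one taxon, the same function yields a single Organism record, which is the 1:1 case raised at the end of the message.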
"in the many, many, many cases where there will be a 1:1 relationship between an Organism and a CollectionObject..."
Rich has hit on an important point. The discussion continues to focus on perfecting logic that is all encompassing. But is it the best thing for the community as a whole to implement solutions that are more complex in order to accommodate the very, very, very few cases (using the counter description to Rich's)?
Maybe we should consider keeping the solutions simple (i.e. current DwC) for the many, many, many cases and introduce complex extensions only for the very, very, very few.
Chuck
Well, this point brings to mind the previous discussions about "collapsing" or "denormalizing" complex models into simpler models. In those discussions, it was noted that many if not most people "denormalize" out of existence parts of a more complex model when they don't need them in their particular situation. People who have whole dead organisms in a jar do not care about the distinction between an Organism and a CollectionObject. However, people who cut five branches from the same tree and send them to five different herbaria do care, as do people who both photograph and collect samples from an organism, people who make observations of the same organism over time, and people who make measurements of a temporarily captured animal at the same time they collect a DNA sample. I don't consider the latter four examples to be very, very, very few cases. If it seems like there are very few cases, that's just because TDWG was started by (and is currently run mostly by) people who run museums. If TDWG wants to broaden the biodiversity informatics tent to include people outside of the museum community, then it is important to create the classes and terms necessary to accommodate those other people.
Simple DwC does not require that users include classes that they don't need in their databases. If a museum only has specimens that are whole dead organisms, then there is no need for them to have a table in their database for Organism; they only need a table for CollectionObjects and the Organism represented by that CollectionObject can be inferred. This is no different than flattening the relationships among Location, Event, and Occurrence by having a 1:1:1 relationship among those three classes, which might be done in a Darwin Core Archive. The fact that some people prefer to "flatten" those three classes into a single database table doesn't mean that other people will not have the need to have several events at a single Location, or several Occurrences at one Event.
So I guess I would say that I don't agree that there are very, very, very few cases where there are multiple CollectionObjects per Organism. One of the major points of the BiSciCol initiative is that there needs to be some way for connecting diverse collection objects that originate from the same organism but end up being scattered around in different institutions. The lack of an Organism class is a big hole in modeling these kinds of situations and I haven't heard of an alternative way to fix that hole. There is a strong desire among a diverse group of TDWG participants to move forward on more precisely modeling relationships among classes using RDF. If we do not make the changes John has suggested, it is difficult to see how this will be accomplished without people "making up" classes on the fly to accommodate the more precise modeling that is sought. That's what Cam and I had to do when we created darwin-sw.
Steve
Chuck Miller wrote:
"in the many, many, many cases where there will be a 1:1 relationship between an Organism and a CollectionObject..."
Rich has hit on an important point. The discussion continues to focus on perfecting logic that is all encompassing. But is it the best thing to do for the community as a whole to implement solutions that are more complex in order to accommodate the very, very, very few cases (using the counter description to Rich's)?
Maybe we should consider keeping the solutions simple (i.e., current DwC) for the many, many, many cases and introduce complex extensions only for the very, very, very few.
Chuck
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Thursday, September 08, 2011 3:52 AM To: 'Gregor Hagedorn'; tuco@berkeley.edu Cc: 'TDWG Content Mailing List' Subject: Re: [tdwg-content] Occurrences, Organisms,and CollectionObjects: a review
I see a problem with the "taxonomically homogeneous" since many taxa are not. All obligatory mutualistically symbiontic organisms are excluded (you mention symbiont, but the symbiont is the part of a symbiontic relation, e.g. both the algae taxon and fungus taxon each are a symbiont in a lichen).
I don't understand the problem. Isn't this simply two instances of "Organism" (one symbiont and one host)?
Together, they may comprise a single collectionObject (e.g., specimen); but I see no trouble treating obligatory mutualistic symbionts and their host(s) as distinct instances of "Organism".
Definition: The information class pertaining to a specific instance or set of instances of a life form or organism (virus, bacteria, symbiontic life forms, individual, colony, group, population). Sets must reliably be known to be taxonomically homogeneous (including obligatory symbiontic associations).
I guess it could be defined that way, but I've come around to Steve's view that "taxonomically homogeneous" implies that in cases where more than one individual is involved (colonies, small groups, populations), all such individuals belong to a single species (independently of whether or not we can identify what that species is). When more than one species is discovered amongst a multi-individual instance of "Organism", then one would create additional instances of Organism to accommodate the heterogeneous taxa.
All of my original examples of things that I would want to be taxonomically heterogeneous (e.g., a single fossil rock with multiple phyla/kingdoms, or a single rock with multiple phyla of invertebrates attached) can be easily aggregated via a single instance of collectionObject, associated with multiple instances of Organism (one for each species[ish] level taxon).
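The aggregation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names, field names, identifiers, and placeholder taxon names are hypothetical and are not part of any Darwin Core proposal. One CollectionObject holds several Organism instances, each of which carries exactly one taxon:

```python
from dataclasses import dataclass, field


@dataclass
class Organism:
    organism_id: str
    taxon: str  # exactly one taxon per Organism (taxonomically homogeneous)


@dataclass
class CollectionObject:
    collection_object_id: str
    organisms: list = field(default_factory=list)

    def taxa(self):
        """Return the sorted, de-duplicated taxa represented on this object."""
        return sorted({o.taxon for o in self.organisms})


# A single rock with invertebrates from two phyla attached (placeholder names).
rock = CollectionObject("ROCK-1")
rock.organisms.append(Organism("ROCK-1/org1", "Porifera sp."))
rock.organisms.append(Organism("ROCK-1/org2", "Bryozoa sp."))
```

The heterogeneity lives on the CollectionObject, while each Organism stays homogeneous; discovering a further species on the rock means appending another Organism rather than changing an existing one.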
I originally thought that both "Organism" and "collectionObject" would be redundant, and that only one was really needed. But I have now been convinced by Steve (and others) that this would be an unnecessary over-loading of "Organism". Now that we are contemplating two distinct classes, I have no problem with the more "refined" definition of "Organism".
One concern I do have, however, is in the many, many, many cases where there will be a 1:1 relationship between an Organism and a CollectionObject (i.e., the vast majority of all Museum specimens). Does that mean that data providers will need to generate two separate IDs (one organismID and one collectionObjectID) to represent all of these specimens?
Aloha, Rich
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
I just want to say that Steve *PERFECTLY* captured my own (current) perspective on this in his message above (sent Thursday, September 08, 2011 5:22 AM).
Rich
I agree with Chuck on the many many cases. Is it possible to define
Single_Organism_Collection
which inherits both CollectionObject and Organism/LifeForm properties?
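One way to picture such a combined class is the Python sketch below. It is purely illustrative: the class and attribute names are hypothetical, and reusing a single identifier for both roles is an assumption made for the example, not something the thread has agreed on.

```python
class Organism:
    def __init__(self, organism_id, taxon, **kwargs):
        super().__init__(**kwargs)  # cooperative init for multiple inheritance
        self.organism_id = organism_id
        self.taxon = taxon


class CollectionObject:
    def __init__(self, collection_object_id, institution, **kwargs):
        super().__init__(**kwargs)
        self.collection_object_id = collection_object_id
        self.institution = institution


class SingleOrganismCollection(CollectionObject, Organism):
    """A specimen that is also exactly one organism (the 1:1 case)."""

    def __init__(self, record_id, taxon, institution):
        # Assumption: one identifier serves as both organismID and
        # collectionObjectID, so providers mint a single ID per record.
        super().__init__(
            collection_object_id=record_id,
            institution=institution,
            organism_id=record_id,
            taxon=taxon,
        )


specimen = SingleOrganismCollection("MUS:1", "Quercus alba", "MUS")
```

Code written against either parent class can consume such a record unchanged, which is the appeal of the idea; whether the same holds for the formal semantics is a separate question.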
Gregor
That's what I would call "Occurrence Sensu Lato" (=traditional use of Occurrence class). What I'm a little fuzzy on is whether we can have meaningful co-functionality of Occurrence "sensu stricto" (new definition) and Occurrence "sensu lato" (traditional definition) concurrently without breaking stuff, violating natural laws of semantic integrity, or enduring some other manner of unexpected hell.
Rich
-----Original Message----- From: Gregor Hagedorn [mailto:g.m.hagedorn@gmail.com] Sent: Thursday, September 08, 2011 9:12 AM To: Chuck Miller Cc: Richard Pyle; tuco@berkeley.edu; TDWG Content Mailing List Subject: Re: [tdwg-content] Occurrences, Organisms,and CollectionObjects: a review
I agree with Chuck on the many many cases. Is it possible to define
Single_Organism_Collection
which inherits both CollectionObject and Organism/LifeForm properties?
Gregor
Rich has hit on an important point.
Yes, but I later somewhat negated it.
The discussion continues to focus on perfecting logic that is all encompassing. But is it the best thing to do for the community as a whole to implement solutions that are more complex in order to accommodate the very, very, very few cases (using the counter description to Rich's)?
Well, I'm not sure that's right. I don't think there are "very, very, very few cases" that would benefit from these proposed new classes. I think there are very many of them, but they happen to be shoe-horned into the existing Occurrence class. I think that was fine to get DwC through its adolescence (are we there yet?), but now I think it's starting to represent a potential barrier to a potentially non-trivial amount of biodiversity information exchange. Indeed, when you consider datasets involving the tracking of migratory organisms & such, the potential scope of content that uses the Organism class may come to dwarf that of the traditional specimen-in-museum set.
Maybe we should consider keeping the solutions simple (i.e., current DwC) for the many, many, many cases and introduce complex extensions only for the very, very, very few.
This was the basis for my musings about maintaining "backward compatibility" with the traditional sense of Occurrence. That is, let's try to allow traditionally cast Occurrence records to continue to function, but also allow parsing into Organism and CollectionObject instances when needed. This sounds dangerously close to implying that "Organism" and "CollectionObject" are subclasses of "Occurrence" -- but I've explored the logical implications of that, and there be dragons. I don't think they're subclasses of Occurrence; but I do not think that the traditional framing of Occurrence instances is necessarily logically incompatible with the implementation of these two new classes.
Rich
Rich wrote:
I don't understand the problem. Isn't this simply two instances of "Organism" (one symbiont and one host)?
Together, they may comprise a single collectionObject (e.g., specimen); but I see no trouble treating obligatory mutualistic symbionts and their host(s) as distinct instances of "Organism".
a) I think it is not that simple. The combination of fungus and algae is given its own taxonomic name, the lichen name. A lichen taxon always identifies at least two organisms.
b) I think we very often use the main, dominant organism when recording mutualistic symbiosis. Most trees and many other plants die without their mycorrhiza. No group of oaks is taxonomically homogeneous - it is always a mixture of plant and fungus. We normally just know, but don't record this. I would like to be able to keep it that way. You can say definitions don't matter, I prefer to explicitly state the flexibility inside the def.
Even humans depend on a wide spectrum of, e.g., gut bacteria. Without them, we would normally die (I believe, not being a medical person).
That should not mean, that we should not record a symbiosis as two records where appropriate, just that there are good reasons not to force people to do it, because it would be at the expense of what they really want to achieve.
I would prefer a bit more flexible definition, but am not starting a bar brawl :-)
Gregor
a) I think it is not that simple. The combination of fungus and algae is given its own taxonomic name, the lichen name. A lichen taxon always identifies at least two organisms.
It may be two "organisms", but it's taxonomically homogeneous because it has one (lichen) taxon name. I didn't advocate that Organisms be "phylogenetically homogeneous", only "taxonomically homogeneous". If lichen names are intended to represent a taxon, then I still don't see the problem.
b) I think we very often use the main, dominant organism when recording mutualistic symbiosis. Most trees and many other plants die without their mycorrhiza. No group of oaks is taxonomically homogeneous - it is always a mixture of plant and fungus. We normally just know, but don't record this. I would like to be able to keep it that way. You can say definitions don't matter, I prefer to explicitly state the flexibility inside the def.
Perhaps, but maybe this is a perfect example of the difference between "Organism" and "CollectionObject"; the former should be taxonomically homogeneous, but the latter need not be.
That should not mean, that we should not record a symbiosis as two records where appropriate, just that there are good reasons not to force people to do it, because it would be at the expense of what they really want to achieve.
I don't think anyone is forcing anyone to do anything. If you want to talk about a fish as it swims through the water (or "sleeps" in a jar of alcohol), you create a single Organism instance for it, even though there are many symbionts physically attached to it. If/when you need to recognize these symbionts as individual things with their own properties (e.g., taxonomic identifications), you then create the necessary Organism instances to which those properties are applied.
Rich
What exactly is accomplished by requiring "taxonomically homogeneous"? Perhaps the problem is that Organism is a subclass of something slightly more general, some more general "biologically organized" object that has a context-dependent organizing principle. For example, biologists seem willing to talk about ecosystem instances in this way. Also, for some purposes, people seem willing to have discourse about an organism in which they include microbes that must survive not only on or in the organism, but even a tiny bit away from it. So, if one had a slightly more general class, and Organism is required to have some enumerated set of specific kinds of organizing principles, e.g. those presently on the table, several things happen: (a) those who need to have a different organizing principle than the current consensus of what organizes an Organism have a place to hang their organizing principle, (b) scientific advances about the organizing principles of life don't require massive ontological disruption(*)...you just move a principle into the appropriate subclass.
Bob Morris aka Recovering Algebraist
(*)well, I suppose the important ones do for the biologists, but I suspect they needn't for the formal ontologies, if the upper level organizing principle is "organizing principle".
On Thu, Sep 8, 2011 at 3:56 AM, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
Term Name: Organism
Identifier: http://rs.tdwg.org/dwc/terms/Organism
Namespace: http://rs.tdwg.org/dwc/terms/
Label: Organism
Definition: The category of information pertaining to a specific instance of an organism (virus, symbiont, individual, colony, group of individuals, population) reliably known to be taxonomically homogeneous.
I see a problem with the "taxonomically homogeneous" since many taxa are not. All obligatory mutualistically symbiontic organisms are excluded (you mention symbiont, but the symbiont is the part of a symbiontic relation, e.g. both the algae taxon and fungus taxon each are a symbiont in a lichen).
Contradict if my German biology is at odds with English.
The problem is, that individual and set are mixed, so that the "homogeneous" appears to apply also to the individual. Proposal:
Definition: The information class pertaining to a specific instance or set of instances of a life form or organism (virus, bacteria, symbiontic life forms, individual, colony, group, population). Sets must reliably be known to be taxonomically homogeneous (including obligatory symbiontic associations).
Gregor
Well, I think we've plowed this ground before (actually several times before). In the first attempt to come up with a consensus definition for "Individual" (previous name for what we are now calling "Organism"), we had allowed that an Individual be identified to a single Taxon, but with no restriction on the level of the taxon. In other words, the Individual could be taxonomically heterogeneous at a lower taxonomic level as long as its components were part of the same higher-level taxon (e.g. the infamous marine trawl sample and various jars of samples taken from it; each jar an "Individual" identified to some higher taxonomic level that was common to all organisms in the jar). However, there was a point more recently when someone (I think it was actually you) requested competency questions for the proposed class. I provided three, one of which was the ability to track "duplicates" and to infer that any Identification which applies to one duplicate also applies to all others. I will say no more here, but simply refer to the email where I discussed this: http://lists.tdwg.org/pipermail/tdwg-content/2011-July/002690.html Rich agreed that the ability to draw this kind of inference was valuable and agreed that requiring Individuals (now called Organisms) to be taxonomically homogeneous was a benefit that outweighed the benefits that would accrue from allowing them to be taxonomically heterogeneous. Rich can correct this if I've misrepresented anything he said.
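The "duplicates" competency question can be sketched in Python as follows. This is a minimal illustration with made-up identifiers and a hypothetical function name, not code from darwin-sw or any TDWG tool. The inference is only safe because each Organism is assumed to be taxonomically homogeneous:

```python
# Hypothetical data: specimen identifier -> identifier of the source organism.
specimen_to_organism = {
    "HERB-A:001": "tree-42",
    "HERB-B:917": "tree-42",
    "HERB-C:305": "tree-77",
}


def propagate_identification(specimen_id, taxon, links):
    """Apply an identification of one specimen to all of its duplicates.

    Duplicates are the specimens linked to the same organism. Returns a
    mapping {specimen_id: taxon} covering the identified specimen and
    every other specimen derived from that organism.
    """
    organism = links[specimen_id]
    return {s: taxon for s, o in links.items() if o == organism}


result = propagate_identification("HERB-A:001", "Quercus alba", specimen_to_organism)
```

If an Organism were allowed to mix species, the same lookup could attach a wrong name to a duplicate, which is the modeling cost of heterogeneity discussed in this message.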
Your suggestion that an Organism be a subclass of something more general is what Cam and I suggested in an alternate version of darwin-sw. I will not comment further on this because this approach has already been outlined in text and diagrams at http://code.google.com/p/darwin-sw/wiki/TaxonomicHeterogeneity I don't have any objection to having a superclass of Organism that allows taxonomic heterogeneity, but one of the principles of Darwin Core is that in order for a term to become a part of the vocabulary, at least several people have to indicate that they want the term and there should be some reasonable explanation of how people would use the term. That has happened for Organism. It has NOT happened for TaxonomicallyHeterogeneousEntity or whatever you want to call it. As I discuss on the page referenced above, allowing taxonomic heterogeneity introduces some significant complexities in modeling and I for one have no clue how to deal with them.
Steve
I was perhaps unclear. I don't mean to suggest a superclass that has some other notion of taxonomic organization. I meant to suggest one that simply has \some/ notion of organization. That wouldn't change the offered definition of Organism, but rather give people who feel they need some notion of an organized set of biological stuff a way to define other subclasses with different organizations. It would, for example, let people use DwC to describe some aspects of ecosystems without having to pretend that an ecosystem is always a special kind of Organism, or vice-versa.
It doesn't look to me like you envision that http://code.google.com/p/darwin-sw/wiki/TaxonomicHeterogeneity would be suitable for describing ecosystems, possibly even for those ecologists who think a hierarchy of ecosystem types is as fundamental to what they study as classical taxonomic hierarchies are to classical taxonomists.
Bob
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
I see your point here. There has been some previous discussion on this list about classes for describing things like ecosystems and other larger scale phenomena involving collections of living things. But I don't think that discussion has been anything close to as extensive as the discussion surrounding individuals/organisms and tokens/evidence/collectionObjects. So I think the questions: "who needs the superclass?" and "what do they want to do with it?" (i.e. competency questions) need to be asked. Then somebody needs to create a definition and a proposal. I don't think any of those things have happened so far for larger-scale aggregates of living things.
I think it is possible (actually likely) that other groups may already have terms for some aggregates that we might adopt. Vegetation classification schemes come to mind as well as several ecoregions classification schemes.
Steve
Bob Morris wrote:
I was perhaps unclear. I don't mean to suggest a superclass that has some other notion of taxonomic organization. I meant to suggest one that simply has \some/ notion of organization. That wouldn't change the offered definition of Organism, but rather give people who feel they need some notion of an organized set of biological stuff a way to define other subclasses with different organizations. It would, for example, let people who use DwC to describe some aspects of ecosystems do so without having to pretend that an ecosystem is always a special kind of Organism, or vice-versa.
It doesn't look to me like you envision that http://code.google.com/p/darwin-sw/wiki/TaxonomicHeterogeneity would be suitable for describing ecosystems, possibly even for those ecologists who think a hierarchy of ecosystem types is as fundamental to what they study as classical taxonomic hierarchies are to classical taxonomists.
Bob
On Thu, Sep 8, 2011 at 10:40 AM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
Well, I think we've plowed this ground before (actually several times before). In the first attempt to come up with a consensus definition for "Individual" (the previous name for what we are now calling "Organism"), we had allowed that an Individual be identified to a single Taxon, but with no restriction on the level of the taxon. In other words, the Individual could be taxonomically heterogeneous at a lower taxonomic level as long as its components were part of the same higher-level taxon (e.g. the infamous marine trawl sample and various jars of samples taken from it; each jar an "Individual" identified to some higher taxonomic level that was common to all organisms in the jar). However, there was a point more recently when someone (I think it was actually you) requested competency questions for the proposed class. I provided three, one of which was the ability to track "duplicates" and to infer that any Identification which applies to one duplicate also applies to all others. I will say no more here, but simply refer to the email where I discussed this: http://lists.tdwg.org/pipermail/tdwg-content/2011-July/002690.html Rich agreed that the ability to draw this kind of inference was valuable and agreed that requiring Individuals (now called Organisms) to be taxonomically homogeneous was a benefit that outweighed the benefits that would accrue from allowing them to be taxonomically heterogeneous. Rich can correct this if I've misrepresented anything he said.
Your suggestion that an Organism be a subclass of something more general is what Cam and I suggested in an alternate version of darwin-sw. I will not comment further on this because this approach has already been outlined in text and diagrams at http://code.google.com/p/darwin-sw/wiki/TaxonomicHeterogeneity I don't have any objection to having a superclass of Organism that allows taxonomic heterogeneity, but one of the principles of Darwin Core is that in order for a term to become a part of the vocabulary, at least several people have to indicate that they want the term and there should be some reasonable explanation of how people would use the term. That has happened for Organism. It has NOT happened for TaxonomicallyHeterogeneousEntity or whatever you want to call it. As I discuss on the page referenced above, allowing taxonomic heterogeneity introduces some significant complexities in modeling and I for one have no clue how to deal with them.
Steve
Bob Morris wrote:
What exactly is accomplished by requiring "taxonomically homogeneous"? Perhaps the problem is that Organism is a subclass of something slightly more general, some more general "biologically organized" object that has a context-dependent organizing principle. For example, biologists seem willing to talk about ecosystem instances in this way. Also, for some purposes, people seem willing to have discourse about an organism in which they include microbes that must survive not only on or in the organism, but even a tiny bit away from it. So, if one had a slightly more general class, and Organism is required to have some enumerated set of specific kinds of organizing principles, e.g. those presently on the table, several things happen: (a) those who need a different organizing principle than the current consensus of what organizes an Organism have a place to hang their organizing principle; (b) scientific advances about the organizing principles of life don't require massive ontological disruption(*)...you just move a principle into the appropriate subclass.
Bob Morris aka Recovering Algebraist
(*)well, I suppose the important ones do for the biologists, but I suspect they needn't for the formal ontologies, if the upper level organizing principle is "organizing principle".
On Thu, Sep 8, 2011 at 3:56 AM, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
Term Name: Organism
Identifier: http://rs.tdwg.org/dwc/terms/Organism
Namespace: http://rs.tdwg.org/dwc/terms/
Label: Organism
Definition: The category of information pertaining to a specific instance of an organism (virus, symbiont, individual, colony, group of individuals, population) reliably known to be taxonomically homogeneous.
I see a problem with "taxonomically homogeneous", since many taxa are not. All obligately mutualistic symbiotic organisms are excluded (you mention symbiont, but a symbiont is one part of a symbiotic relation; e.g. the algal taxon and the fungal taxon are each a symbiont in a lichen).
Contradict if my German biology is at odds with English.
The problem is that individual and set are mixed, so that "homogeneous" appears to apply also to the individual. Proposal:
Definition: The information class pertaining to a specific instance or set of instances of a life form or organism (virus, bacteria, symbiotic life forms, individual, colony, group, population). Sets must reliably be known to be taxonomically homogeneous (including obligate symbiotic associations).
Gregor
Great job John.
1 comment - I feel a little uncomfortable about the unstructured nature of the associatedOrganisms term. If I was writing some software to read and import some DwC data, I would be interested in capturing the associations that have been defined (in some structured way).
There are possibly a few ways to improve this:
- define a structured way of filling this field, e.g. "[relationship type (from controlled vocabulary), e.g. host]:[related Organism ID]"
- add another field for just associatedOrganismIDs, which has just the IDs and no free-form text (not sure how this will help, though)
- add another class, AssociatedOrganism, that has fromOrganismID, toOrganismID and relationshipType
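To make the first and third options concrete, here is a minimal sketch in Python of how importing software might read a structured associatedOrganisms value. It assumes the "[relationship type]:[related Organism ID]" layout suggested above, with ";" as the list separator. The class and function names, and the ID "MXA-230", are hypothetical and only for illustration; nothing here is part of Darwin Core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrganismAssociation:
    """One directed association, mirroring the suggested AssociatedOrganism class."""
    from_organism_id: str
    relationship_type: str
    to_organism_id: str


def parse_associated_organisms(from_organism_id, value, separator=";"):
    """Parse 'relationshipType:organismID' entries into structured associations.

    Assumption: the relationship type never contains ':'. Only the first ':'
    splits an entry, so IDs that themselves contain ':' (e.g. URNs) survive.
    """
    associations = []
    for entry in value.split(separator):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing separators
        relationship_type, colon, related_id = entry.partition(":")
        if not colon or not relationship_type.strip() or not related_id.strip():
            raise ValueError(
                f"expected 'relationshipType:organismID', got {entry!r}"
            )
        associations.append(
            OrganismAssociation(
                from_organism_id, relationship_type.strip(), related_id.strip()
            )
        )
    return associations


if __name__ == "__main__":
    for a in parse_associated_organisms("MXA-230", "sibling:MXA-231; sibling:MXA-232"):
        print(a)
```

A free-text value such as "sibling of MXA-231" would be rejected by this parser, which is the point of the structured form: the relationship type can then be checked against a controlled vocabulary.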
I also notice with a few of the DwC terms that it is recommended to use a controlled vocabulary for setting the value of the term, but the controlled vocabulary itself is not provided. Is it not the place for the DwC vocabulary to provide these? Or has the work just not been done to define them?
Kevin
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of John Wieczorek
Sent: Thursday, 8 September 2011 1:05 p.m.
To: TDWG Content Mailing List
Subject: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Dear all,
Prepare yourself mentally. After more than a year of discussions, prototypes, scholarly papers, bar room brawls, etc., we are very near having a path forward for two new, related classes for Darwin Core that attempt to remove ambiguity inherent in the Occurrence class as it currently stands. Adding classes is quite a bit more complicated than adding properties (as you'll see if you manage to get through this message), and so it is important to be as thorough as possible to make sure we get it right. I'll try here to synthesize the rough consensus and the remaining issues.
Basically, the idea is to pull two distinct concepts out of Occurrence and give them their own classes. Maybe not surprisingly, one of the hardest things to agree upon has been the names for these classes. The class that was proposed first as "Individual" has seen no less than 12 alternate names, none of them satisfying to everyone. The closest thing to an acceptable name was "Organism", with caveats that the definition should make it abundantly clear what is to be included in the class and what is not. I'll use "Organism" here to refer to the class in the hopes of offending the fewest people.
The rough consensus on "Organism" is that it should include viruses, symbionts, individuals, colonies, groups of individuals, and even populations, but that there should be taxonomic homogeneity to an instance of an "Organism". There has been some concern about how and where to draw the line on homogeneity. No attempt has yet been made to write a definitive description of the class, though many examples of representatives of the class have been given.
What we need to move forward on the "Organism" class are an official definition and an official comment, the combination of which should be sufficient for someone previously unfamiliar with the term and the arguments leading to its existence to understand. Some existing terms (individualCount, sex, lifeStage, reproductiveCondition, behavior, previousIdentifications, associatedTaxa) will have to be reorganized to be under this new class. These terms may require updated definitions for consistency. New terms (organismID, associatedOrganisms, organismRemarks) and an Organism Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "Organism" class:
Term Name: Organism
Identifier: http://rs.tdwg.org/dwc/terms/Organism
Namespace: http://rs.tdwg.org/dwc/terms/
Label: Organism
Definition: The category of information pertaining to a specific instance of an organism (virus, symbiont, individual, colony, group of individuals, population) reliably known to be taxonomically homogeneous.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/Organism
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: Organism-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}

Term Name: Organism
Identifier: http://rs.tdwg.org/dwc/dwctype/Organism
Namespace: http://rs.tdwg.org/dwc/dwctype/
Label: Organism
Definition: A resource describing an instance of the Organism class.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Member Of: http://rs.tdwg.org/dwc/terms/DwCType
Has Domain:
Has Range:
Version: Organism-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: not in ABCD

Term Name: organismID
Identifier: http://rs.tdwg.org/dwc/terms/organismID
Namespace: http://rs.tdwg.org/dwc/terms/
Label: organismID
Definition: An identifier for the set of information associated with an Organism. May be a global unique identifier or an identifier specific to the data set.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/BiologicalEntity
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines: http://purl.org/dc/terms/identifier
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: organismID-2011-09-09
Replaces: individualID-2009-09-24
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/UnitID

Term Name: organismRemarks
Identifier: http://rs.tdwg.org/dwc/terms/organismRemarks
Namespace: http://rs.tdwg.org/dwc/terms/
Label: organismRemarks
Definition: Comments or notes about the Organism.
Comment: Example: "seen several times in Tilden Park before capture". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: organismRemarks-2011-09-09
Replaces:
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/Notes

Term Name: associatedOrganisms
Identifier: http://rs.tdwg.org/dwc/terms/associatedOrganisms
Namespace: http://rs.tdwg.org/dwc/terms/
Label: associatedOrganisms
Definition: A list (concatenated and separated) of identifiers of other Organism records and their associations to this Organism.
Comment: Example: "sibling of MXA-231; sibling of MXA-232". For discussion see http://code.google.com/p/darwincore/wiki/Organism
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: associatedOrganisms-2011-09-09
Replaces: associatedOccurrences-2009-04-24
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitSourceInstitutionCode + DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitSourceName + DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitID
The class proposed as "CollectionObject" has seen fewer alternate name proposals than "Organism", but the same call for clarity on inclusion and exclusion has been voiced. The basic idea is to use this class to cover information that could be considered "persistent evidence" that an organism occurred, and that the concept is distinct from both "Organism" and Occurrence. Evidence might include collection-based materials, digital media, written materials, and literature.
"Evidence" may be a bit vague as a name for the class, providing no real indication that the "Evidence" should apply to an "Organism" rather than to an Occurrence, Taxon, Identification, or any other class. Nor does it convey the idea that the evidence should be persistent. "PersistentEvidenceThatAnOrganismExisted" gets the idea across pretty well, but it is a bit lengthy (and no one actually proposed this name). ABCD isn't shy about vague term names - it uses "Unit" for roughly this concept. The long-standing term "CollectionObject" is less vague than the proposed alternatives, but it might lead people to assume that the object must be physical, and that it must be housed within a collection, neither of which is strictly required. No one objected to this name for the term, however, so I will continue to use it here to illustrate the proposed changes and additions to accommodate this concept.
Some existing terms (institutionID, institutionCode, collectionID, collectionCode, ownerInstitutionCode, catalogNumber, preparations, disposition, otherCatalogNumbers, associatedSequences) will have to be organized under this new class. These terms may require updated definitions for consistency. Note that with the addition of the "CollectionObject" class, the institutionCode, collectionCode, catalogNumber triplet would no longer apply to an Occurrence.
New terms (collectionObjectID and collectionObjectRemarks) and a CollectionObject Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "CollectionObject" class:
Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/terms/CollectionObject
Namespace: http://rs.tdwg.org/dwc/terms/
Label: CollectionObject
Definition: The category of information pertaining to persistent evidence that an organism existed (specimen, sample, image, sound, drawing, field notes, publication), including digital forms.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}

Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/dwctype/CollectionObject
Namespace: http://rs.tdwg.org/dwc/dwctype/
Label: CollectionObject
Definition: A resource describing an instance of the CollectionObject class.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Member Of: http://rs.tdwg.org/dwc/terms/DwCType
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: not in ABCD

Term Name: collectionObjectID
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectID
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectID
Definition: An identifier for the CollectionObject. In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the collectionObjectID globally unique.
Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]". Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines: http://purl.org/dc/terms/identifier
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: collectionObjectID-2011-09-09
Replaces:
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/UnitGUID

Term Name: collectionObjectRemarks
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectRemarks
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectRemarks
Definition: Comments or notes about the CollectionObject.
Comment: Example: "custody transferred in 1995 from National Park Service". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: collectionObjectRemarks-2011-09-09
Replaces: SampleRemarks-2009-01-18
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/Notes
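The fallback described in the collectionObjectID comment above can be sketched as a small Python helper. This is an illustration only: the function name and the optional guid parameter are hypothetical, and the sketch assumes the three codes do not themselves contain ":".

```python
def build_collection_object_id(institution_code, collection_code, catalog_number, guid=None):
    """Return a collectionObjectID for a specimen record.

    If a bona fide global unique identifier is supplied, use it unchanged.
    Otherwise fall back to the form suggested in the proposal:
    urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]
    """
    if guid:
        return guid
    parts = [institution_code, collection_code, catalog_number]
    cleaned = [str(p).strip() if p is not None else "" for p in parts]
    if not all(cleaned):
        # Without all three parts the result would not be even locally unique.
        raise ValueError("institutionCode, collectionCode and catalogNumber are all required")
    return "urn:catalog:" + ":".join(cleaned)


if __name__ == "__main__":
    print(build_collection_object_id("FMNH", "Mammal", "145732"))
    print(build_collection_object_id("KU", "Herps", "32", guid="urn:lsid:nhm.ku.edu:Herps:32"))
```

The constructed form is only as unique as the institutionCode, collectionCode and catalogNumber triplet itself, so a real GUID should win whenever one exists.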
Because of these changes for "Organism" and "CollectionObject", the definition of the Occurrence class will have to change, and quite a different set of terms will be organized under it, namely:
occurrenceID, occurrenceRemarks, recordNumber, recordedBy, establishmentMeans, and occurrenceStatus
The Occurrence definition will change from "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)." to something more akin to "The category of information pertaining to evidence of an occurrence of an Organism in nature."
The term occurrenceDetails will be deprecated in favor of the Dublin Core term dcterms:references at the record level. Also, associatedMedia, which was organized under Occurrence, would become a record level term, as it could apply as easily to Occurrences, "Organisms", and "CollectionObjects".
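As a rough illustration of what this reorganization would mean for a flat record, the following Python sketch groups terms by the class proposed for them in this message. The term lists are taken from the text above. The function name, the group labels "record" and "unassigned", and the handling of occurrenceDetails as a plain "references" key are assumptions made for the example.

```python
ORGANISM_TERMS = {
    "organismID", "associatedOrganisms", "organismRemarks",
    "individualCount", "sex", "lifeStage", "reproductiveCondition",
    "behavior", "previousIdentifications", "associatedTaxa",
}
COLLECTION_OBJECT_TERMS = {
    "collectionObjectID", "collectionObjectRemarks",
    "institutionID", "institutionCode", "collectionID", "collectionCode",
    "ownerInstitutionCode", "catalogNumber", "preparations", "disposition",
    "otherCatalogNumbers", "associatedSequences",
}
OCCURRENCE_TERMS = {
    "occurrenceID", "occurrenceRemarks", "recordNumber", "recordedBy",
    "establishmentMeans", "occurrenceStatus",
}
RECORD_LEVEL_TERMS = {"associatedMedia", "references"}

# Build a single lookup from term name to proposed group.
TERM_TO_GROUP = {}
for _group, _terms in (
    ("Organism", ORGANISM_TERMS),
    ("CollectionObject", COLLECTION_OBJECT_TERMS),
    ("Occurrence", OCCURRENCE_TERMS),
    ("record", RECORD_LEVEL_TERMS),
):
    for _term in _terms:
        TERM_TO_GROUP[_term] = _group


def split_record(record):
    """Split a flat term->value mapping into the proposed class groupings.

    Terms not mentioned in the proposal end up under "unassigned".
    """
    groups = {name: {} for name in
              ("Organism", "CollectionObject", "Occurrence", "record", "unassigned")}
    for term, value in record.items():
        if term == "occurrenceDetails":
            continue  # deprecated in the proposal; mapped to references below
        groups[TERM_TO_GROUP.get(term, "unassigned")][term] = value
    if "occurrenceDetails" in record:
        # Keep an explicit references value if the record already has one.
        groups["record"].setdefault("references", record["occurrenceDetails"])
    return groups


if __name__ == "__main__":
    flat = {"occurrenceID": "occ-1", "sex": "female", "catalogNumber": "145732"}
    print(split_record(flat))
```

Note how catalogNumber lands under CollectionObject rather than Occurrence, which is the practical effect of the triplet no longer applying to an Occurrence.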
If you made it this far, I congratulate you on your dedication to the cause. Please let's clear up the remaining issues as a community and put these new terms to good use.
Cheers,
John
This is *exactly* what I'm talking about. What is dwc:ResourceRelationship for, if not exactly what Kevin is discussing below?
Rich
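For illustration, here is a minimal Python sketch of how the associatedOrganisms example from the proposal ("sibling of MXA-231; sibling of MXA-232") might be written as ResourceRelationship rows instead. The column names resourceID, relationshipOfResource and relatedResourceID are existing Darwin Core ResourceRelationship terms. The function names and the subject ID "MXA-230" are hypothetical. Which end of the pair relationshipOfResource describes should be checked against the term definitions; "sibling of" is symmetric, so the example sidesteps that question.

```python
import csv
import io

FIELDS = ["resourceID", "relationshipOfResource", "relatedResourceID"]


def to_resource_relationships(organism_id, associations):
    """Turn (relationship, related organism ID) pairs into one row per relationship."""
    return [
        {
            "resourceID": organism_id,
            "relationshipOfResource": relationship,
            "relatedResourceID": related_id,
        }
        for relationship, related_id in associations
    ]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = to_resource_relationships(
        "MXA-230", [("sibling of", "MXA-231"), ("sibling of", "MXA-232")]
    )
    print(to_csv(rows))
```

Each association becomes its own row with explicit IDs on both ends, so importing software does not have to parse a concatenated string.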
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Kevin Richards
Sent: Thursday, September 08, 2011 11:20 AM
To: tuco@berkeley.edu; TDWG Content Mailing List
Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Great job John.
1 comment - I feel a little uncomfortable about the unstructured nature of the associatedOrganisms term. If I was writing some software to read and import some DwC data, I would be interested in capturing the associations that have been defined (in some structured way).
There are possibly a few ways to improve this:
- define a structured way of filling this field, e.g. "[relationship type (from controlled vocabulary), e.g. host]:[related Organism ID]"
- add another field for just associatedOrganismIDs, which has just the IDs and no free-form text (not sure how this will help, though)
- add another class, AssociatedOrganism, that has fromOrganismID, toOrganismID and relationshipType
I also notice with a few of the DwC terms that it is recommended to use a controlled vocabulary for setting the value of the term, but the controlled vocabulary itself is not provided. Is it not the place for the DwC vocabulary to provide these? Or has the work just not been done to define them?
Kevin
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content- bounces@lists.tdwg.org] On Behalf Of John Wieczorek Sent: Thursday, 8 September 2011 1:05 p.m. To: TDWG Content Mailing List Subject: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Dear all,
Prepare yourself mentally. After more than a year of discussions,
prototypes,
scholarly papers, bar room brawls, etc., we are very near having a path forward for two new, related classes for Darwin Core that attempt to
remove
ambiguity inherent in the Occurrence class as it currently stands. Adding classes is quite a bit more complicated than adding properties (as you'll
see if
you manage to get through this message), and so it is important to be as thorough as possible to make sure we get it right. I'll try here to
synthesize
the rough consensus and the remaining issues.
Basically, the idea is to pull two distinct concepts out of Occurrence and
give
them their own classes. Maybe not surprisingly, one of the hardest things to agree upon has been the names for these classes. The class that was proposed first as
"Individual"
has seen no less than 12 alternate names, none of them satisfying to everyone. The closest thing to an acceptable name was "Organism", with caveats that the definition should make it abundantly clear what is to be included in the class and what is not. I'll use "Organism" here to refer
to the
class in the hopes of offending the fewest people.
The rough consensus on "Organism" is that is should include viruses, symbionts, individuals, colonies, groups of individuals, and even
populations,
but that there should be taxonomic homogeneity to an instance of an "Organism". There has been some concern about how and where to draw the line on homogeneity. No attempt has yet been made to write a
definitive
description of the class, though many examples of representatives of the class have been given.
What we need to move forward on the "Organism" class are an official definition and an official comment, the combination of which should be sufficient for someone previously unfamiliar with the term and the arguments leading to its existence to understand. Some existing terms (individualCount, sex, lifeStage, reproductiveCondition, behavior, previousIdentifications, associatedTaxa) will have to be reorganized to be under this new class. These terms may require updated definitions for consistency. New terms (organismID, associatedOrganisms, organismRemarks) and an Organism Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "Organism" class:
Term Name: Organism Identifier: http://rs.tdwg.org/dwc/terms/Organism Namespace: http://rs.tdwg.org/dwc/terms/ Label: Organism Definition: The category of information pertaining to a specific instance of an organism (virus, symbiont, individual, colony, group of individuals, population) reliably be known to taxonomically homogeneous. Comment: For discussion see http://code.google.com/p/darwincore/wiki/Organism Type of Term: http://www.w3.org/2000/01/rdf-schema#Class Refines: Status: recommended Date Issued: 2011-09-09 Date Modified: 2011-09-09 Has Domain: Has Range: Version: Organism-2011-09-09 Replaces: Is Replaced By: Class: ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}
Term Name: Organism Identifier: http://rs.tdwg.org/dwc/dwctype/Organism Namespace: http://rs.tdwg.org/dwc/dwctype/ Label: Organism Definition: A resource describing an instance of the Organism class. Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary Type of Term: http://www.w3.org/2000/01/rdf-schema#Class Refines: Status: recommended Date Issued: 2011-09-09 Date Modified: 2011-09-00 Member Of: http://rs.tdwg.org/dwc/terms/DwCType Has Domain: Has Range: Version: Organism-2011-09-09 Replaces: Is Replaced By: Class: ABCD 2.06: not in ABCD
Term Name: organismID Identifier: http://rs.tdwg.org/dwc/terms/organismID Namespace: http://rs.tdwg.org/dwc/terms/ Label: organismID Definition: An identifier for the set of information associated with an Organism. May be a global unique identifier or an identifier specific
to the
data set. Comment: For discussion see http://code.google.com/p/darwincore/wiki/BiologicalEntity Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property Refines: http://purl.org/dc/terms/identifier Status: recommended Date Issued: 2011-09-09 Date Modified: 2011-09-09 Has Domain: Has Range: Version: organismID-2011-09-09 Replaces: individualID-2009-09-24 Is Replaced By: Class: http://rs.tdwg.org/dwc/terms/Organism ABCD 2.06: DataSets/DataSet/Units/Unit/UnitID
Term Name: organismRemarks Identifier: http://rs.tdwg.org/dwc/terms/organismRemarks Namespace: http://rs.tdwg.org/dwc/terms/ Label: organismRemarks Definition: Comments or notes about the Organism. Comment: Example: "seen several times in Tilden Park before
capture".
For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property Refines: Status: recommended Date Issued: 2011-09-09 Date Modified: 2009-09-09 Has Domain: Has Range: Version: organismRemarks-2011-09-09 Replaces: Is Replaced By: Class: http://rs.tdwg.org/dwc/terms/Organism ABCD 2.06: DataSets/DataSet/Units/Unit/Notes
Term Name: associatedOrganisms Identifier: http://rs.tdwg.org/dwc/terms/associatedOrganisms Namespace: http://rs.tdwg.org/dwc/terms/ Label: associatedOrganisms Definition: A list (concatenated and separated) of identifiers of other Organism records and their associations to this Organism. Comment: Example: "sibling of MXA-231; sibling of MXA-232". For discussion see http://code.google.com/p/darwincore/wiki/Organism Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property Refines: Status: recommended Date Issued: 2011-09-09 Date Modified: 2011-09-09 Has Domain: Has Range: Version: associatedOrganisms-2011-09-09 Replaces: associatedOccurrences-2009-04-24 Is Replaced By: Class: http://rs.tdwg.org/dwc/terms/Organism ABCD 2.06: DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitS ourceInstitutionCode
- DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUni
- tSourceName
- DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUni
- tID
The class proposed as "CollectionObject" has seen fewer alternate name proposals than "Organism", but the same call for clarity on inclusion and exclusion has been voiced. The basic idea is to use this class to cover information that could be considered "persistent evidence" that an
organism
occurred, and that the concept is distinct from both "Organism" and Occurrence. Evidence might include collection-based materials, digital
media,
written materials, and literature.
"Evidence" may be a bit vague as a name for the class, providing no real indication that the "Evidence" should apply to an "Organism" rather than to an Occurrence, Taxon, Identification, or any other class.
Nor
does it convey the idea that the evidence should be persistent. "PersistentEvidenceThatAnOrganismExisted" gets the idea across pretty well, but it is a bit lengthy (and no one actually proposed this name).
ABCD
isn't shy about vague term names - it uses "Unit" for roughly this
concept.
The long-standing term "CollectionObject" is less vague than the proposed alternatives, but it might lead people to assume that the object must be physical, and that it must be housed within a collection, neither of which
is
strictly required. No one objected to this name for the term, however, so
I
will continue to use it here to illustrate the proposed changes and
additions
to accommodate this concept.
Some existing terms (institutionID, institutionCode, collectionID, collectionCode, ownerInstitutionCode, catalogNumber, preparations, disposition, otherCatalogNumbers, associatedSequences) will have to be organized under this new class. These terms may require updated
definitions
for consistency. Note that with the addition of the "CollectionObject"
class,
the institutionCode, collectionCode, catalogNumber triplet would no longer apply to an Occurrence.
New terms (collectionObjectID and collectionObjectRemarks) and an CollectionObject Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "CollectionObject" class:

Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/terms/CollectionObject
Namespace: http://rs.tdwg.org/dwc/terms/
Label: CollectionObject
Definition: The category of information pertaining to persistent evidence that an organism existed (specimen, sample, image, sound, drawing, field notes, publication), including digital forms.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}

Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/dwctype/CollectionObject
Namespace: http://rs.tdwg.org/dwc/dwctype/
Label: CollectionObject
Definition: A resource describing an instance of the CollectionObject class.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Member Of: http://rs.tdwg.org/dwc/terms/DwCType
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: not in ABCD

Term Name: collectionObjectID
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectID
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectID
Definition: An identifier for the CollectionObject. In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the collectionObjectID globally unique.
Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]". Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines: http://purl.org/dc/terms/identifier
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: collectionObjectID-2011-09-09
Replaces:
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/UnitGUID

Term Name: collectionObjectRemarks
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectRemarks
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectRemarks
Definition: Comments or notes about the CollectionObject.
Comment: Example: "custody transferred in 1995 from National Park Service". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2009-09-09
Has Domain:
Has Range:
Version: collectionObjectRemarks-2011-09-09
Replaces: SampleRemarks-2009-01-18
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/Notes

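To make the fallback rule in the collectionObjectID comment concrete, here is a minimal sketch of how a provider might build the identifier. It is only an illustration: the function name build_collection_object_id and its guid parameter are hypothetical and are not part of the proposal. It assumes that an existing global unique identifier is used unchanged and that the "urn:catalog:" form is built only when none exists.

```python
def build_collection_object_id(institution_code, collection_code,
                               catalog_number, guid=None):
    """Return a collectionObjectID for one record (hypothetical helper).

    If the record already carries a global unique identifier, use it as is.
    Otherwise fall back to the form suggested in the term comment:
    urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]
    """
    if guid:
        return guid

    parts = [institution_code, collection_code, catalog_number]
    # All three pieces are needed for the triplet to be meaningful.
    if any(p is None or not str(p).strip() for p in parts):
        raise ValueError(
            "institutionCode, collectionCode and catalogNumber are all required"
        )
    return "urn:catalog:" + ":".join(str(p).strip() for p in parts)


if __name__ == "__main__":
    print(build_collection_object_id("FMNH", "Mammal", 145732))
    print(build_collection_object_id("KU", "Herps", 32,
                                     guid="urn:lsid:nhm.ku.edu:Herps:32"))
```

The triplet is only as unique as the codes that go into it, which is why the definition says "most closely" rather than promising global uniqueness.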
Because of these changes for "Organism" and "CollectionObject", the definition of the Occurrence class will have to change, and quite a different set of terms will be organized under it, namely:
occurrenceID, occurrenceRemarks, recordNumber, recordedBy, establishmentMeans, and occurrenceStatus
The Occurrence definition will change from "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)." to something more akin to "The category of information pertaining to evidence of an occurrence of an Organism in nature."
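For anyone who wants to see what the reorganization would mean for an existing flat record, here is a rough sketch that groups the terms of one record by the class they would fall under. The term lists are the ones named in this message, including associatedMedia as a record-level term. The names PROPOSED_GROUPS and split_by_class, the "Unassigned" bucket, and the sample values are all hypothetical and only serve the illustration.

```python
# Term groupings as listed in the proposal; nothing here is normative.
PROPOSED_GROUPS = {
    "Occurrence": [
        "occurrenceID", "occurrenceRemarks", "recordNumber", "recordedBy",
        "establishmentMeans", "occurrenceStatus",
    ],
    "Organism": [
        "organismID", "associatedOrganisms", "organismRemarks",
        "individualCount", "sex", "lifeStage", "reproductiveCondition",
        "behavior", "previousIdentifications", "associatedTaxa",
    ],
    "CollectionObject": [
        "collectionObjectID", "collectionObjectRemarks", "institutionID",
        "institutionCode", "collectionID", "collectionCode",
        "ownerInstitutionCode", "catalogNumber", "preparations",
        "disposition", "otherCatalogNumbers", "associatedSequences",
    ],
    "Record-level": ["associatedMedia"],
}

# Invert the groups into a term -> class lookup.
TERM_CLASS = {
    term: cls for cls, terms in PROPOSED_GROUPS.items() for term in terms
}


def split_by_class(record):
    """Group the terms of one flat record by their proposed class.

    Terms not mentioned in the proposal land in an "Unassigned" bucket
    (a hypothetical catch-all, since this sketch covers only the terms
    discussed in this message).
    """
    grouped = {}
    for term, value in record.items():
        cls = TERM_CLASS.get(term, "Unassigned")
        grouped.setdefault(cls, {})[term] = value
    return grouped


if __name__ == "__main__":
    flat = {
        "occurrenceID": "occ-1",        # made-up sample values
        "sex": "female",
        "catalogNumber": "145732",
        "associatedMedia": "http://example.org/img.jpg",
    }
    for cls, terms in split_by_class(flat).items():
        print(cls, terms)
```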
The term occurrenceDetails will be deprecated in favor of the Dublin Core term dcterms:references at the record level. Also, associatedMedia, which was organized under Occurrence, would become a record-level term, as it could apply as easily to Occurrences, "Organisms", and "CollectionObjects".
If you made it this far, I congratulate you on your dedication to the cause. Please let's clear up the remaining issues as a community and put these new terms to good use.
Cheers,
John
Let's see if I can catch up.
On Thu, Sep 8, 2011 at 2:20 PM, Kevin Richards <RichardsK@landcareresearch.co.nz> wrote:
> Great job John.
> 1 comment - I feel a little uncomfortable about the unstructured nature of the associatedOrganisms term. If I was writing some software to read and import some DwC data, I would be interested in capturing the associations that have been defined (in some structured way).
I understand your discomfort. That proposed term is to take the place of the corresponding associatedOccurrences. As with the other "associatedX" terms, it is meant to allow useful information to be retained as part of flat records, not necessarily to be rigorously interpretable for the contents of the list, which would require data content standards as well.
> There are possibly a few ways to improve this:
> - define a structured way of filling this field, e.g. "[relationship type (from controlled vocabulary), e.g. host]:[related Organism ID]"
> - add another field for just associatedOrganismIDs, that just has the IDs and not free-form text - not sure how this will help though
> - add another class for AssociatedOrganism, that has fromOrganismID, toOrganismID and relationshipType
This is really the realm of more relational representations, in which you could use the ResourceRelationship class to good effect.
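As a rough sketch of how the two ideas fit together, the following turns a flat value written in the suggested "[relationship type]:[related Organism ID]" form into one row per association, keyed by the ResourceRelationship terms resourceID, relationshipOfResource and relatedResourceID. The function name parse_associated_organisms, the ";" separator, and the organism ID "MXA-230" are assumptions for illustration; the proposal does not fix a separator or a structured format.

```python
def parse_associated_organisms(organism_id, value, separator=";"):
    """Split a structured associatedOrganisms value into relationship rows.

    Assumes each item looks like "<relationship type>:<related Organism ID>".
    Only the first colon is treated as the delimiter, so IDs that contain
    colons themselves (for example urn:catalog:...) survive intact.
    """
    rows = []
    for item in value.split(separator):
        item = item.strip()
        if not item:
            continue  # tolerate trailing separators
        relationship, colon, related_id = item.partition(":")
        if not colon or not relationship.strip() or not related_id.strip():
            raise ValueError(f"not in 'type:ID' form: {item!r}")
        rows.append({
            "resourceID": organism_id,
            "relationshipOfResource": relationship.strip(),
            "relatedResourceID": related_id.strip(),
        })
    return rows


if __name__ == "__main__":
    # "MXA-230" is a made-up ID for the organism the record describes.
    for row in parse_associated_organisms(
        "MXA-230", "sibling of:MXA-231; sibling of:MXA-232"
    ):
        print(row)
```

Free-text values such as "sibling of MXA-231" would be rejected by this sketch, which is the trade-off: the flat field stays readable by people, while anything meant for software is better carried as relationship rows from the start.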
> I also notice with a few of the DwC terms that it is recommended to use a controlled vocabulary for setting the value of the term, but the controlled vocabulary itself is not provided. Is it not the place for the DwC vocabulary to provide these? Or has the work just not been done to define them?
No Darwin Core term enforces controlled vocabularies. Only recommendations are made, and that's on purpose. One of the biggest reasons is that the vocabularies can be quite dynamic, variable, and/or contentious, and we did not want that distraction in the management of the Darwin Core terms.
> Kevin
Thanks John, the ResourceRelationship does satisfy my concerns! 1 person satisfied, 10 to go. :-)
Kevin
-----Original Message----- From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Wednesday, 14 September 2011 8:22 a.m. To: Kevin Richards Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Let's see if I can catch up.
On Thu, Sep 8, 2011 at 2:20 PM, Kevin Richards RichardsK@landcareresearch.co.nz wrote:
Great job John.
1 comment - I feel a little uncomfortable about the unstructured nature of the associatedOrganisms term. If I was writing some software to read and import some DwC data, I would be interested in capturing the associations that have been defined (in some structured way).
I understand your discomfort. That proposed term is to take the place of the corresponding associatedOccurrences. As with the other "associatedX" terms, it is meant to allow useful information to be retained as part of flat records, not necessarily to be rigorously interpretable for the contents of the list, which would require data content standards as well.
There is possibly a few ways to improve this:
- define a structured way of filling this field, eg "[relationship type (from controlled vocabulary), eg host]:[related Organism ID]"
- add another field for just associatedOrganismIDs, that just has the
IDs and not free form text - not sure how this will help though
- add another class for AssociatedOrganism, that has fromOrganismID,
toOrganismID and relationshipType
This is really the realm of more relational representations, in which you could use the ResourceRelationship class to good effect.
I also notice with a few of the DwC terms that it is recommended to use a controlled vocabulary for setting the value of the term, but the controlled vocabulary itself is not provided. Is it not the place for the DwC vocabulary to provide these? Or has the work just not been done to define them?
No Darwin Core term enforces controlled vocabularies. Only recommendations are made, and that's on purpose. One of the biggest reasons is that the vocabularies can be quite dynamic, variable, and/or contentious, and we did not want that distraction in the management of the Darwin Core terms.
Kevin
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of John Wieczorek Sent: Thursday, 8 September 2011 1:05 p.m. To: TDWG Content Mailing List Subject: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Dear all,
Prepare yourself mentally. After more than a year of discussions, prototypes, scholarly papers, bar room brawls, etc., we are very near having a path forward for two new, related classes for Darwin Core that attempt to remove ambiguity inherent in the Occurrence class as it currently stands. Adding classes is quite a bit more complicated than adding properties (as you'll see if you manage to get through this message), and so it is important to be as thorough as possible to make sure we get it right. I'll try here to synthesize the rough consensus and the remaining issues.
Basically, the idea is to pull two distinct concepts out of Occurrence and give them their own classes. Maybe not surprisingly, one of the hardest things to agree upon has been the names for these classes. The class that was proposed first as "Individual" has seen no less than 12 alternate names, none of them satisfying to everyone. The closest thing to an acceptable name was "Organism", with caveats that the definition should make it abundantly clear what is to be included in the class and what is not. I'll use "Organism" here to refer to the class in the hopes of offending the fewest people.
The rough consensus on "Organism" is that is should include viruses, symbionts, individuals, colonies, groups of individuals, and even populations, but that there should be taxonomic homogeneity to an instance of an "Organism". There has been some concern about how and where to draw the line on homogeneity. No attempt has yet been made to write a definitive description of the class, though many examples of representatives of the class have been given.
What we need to move forward on the "Organism" class are an official definition and an official comment, the combination of which should be sufficient for someone previously unfamiliar with the term and the arguments leading to its existence to understand. Some existing terms (individualCount, sex, lifeStage, reproductiveCondition, behavior, previousIdentifications, associatedTaxa) will have to be reorganized to be under this new class. These terms may require updated definitions for consistency. New terms (organismID, associatedOrganisms, organismRemarks) and an Organism Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "Organism" class:
Term Name: Organism Identifier: http://rs.tdwg.org/dwc/terms/Organism Namespace: http://rs.tdwg.org/dwc/terms/ Label: Organism Definition: The category of information pertaining to a specific instance of an organism (virus, symbiont, individual, colony, group of individuals, population) reliably be known to taxonomically homogeneous. Comment: For discussion see http://code.google.com/p/darwincore/wiki/Organism Type of Term: http://www.w3.org/2000/01/rdf-schema#Class Refines: Status: recommended Date Issued: 2011-09-09 Date Modified: 2011-09-09 Has Domain: Has Range: Version: Organism-2011-09-09 Replaces: Is Replaced By: Class: ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}
Term Name: Organism Identifier: http://rs.tdwg.org/dwc/dwctype/Organism Namespace: http://rs.tdwg.org/dwc/dwctype/ Label: Organism Definition: A resource describing an instance of the Organism class. Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary Type of Term: http://www.w3.org/2000/01/rdf-schema#Class Refines: Status: recommended Date Issued: 2011-09-09 Date Modified: 2011-09-00 Member Of: http://rs.tdwg.org/dwc/terms/DwCType Has Domain: Has Range: Version: Organism-2011-09-09 Replaces: Is Replaced By: Class: ABCD 2.06: not in ABCD
Term Name: organismID Identifier: http://rs.tdwg.org/dwc/terms/organismID Namespace: http://rs.tdwg.org/dwc/terms/ Label: organismID Definition: An identifier for the set of information associated with an Organism. May be a global unique identifier or an identifier specific to the data set. Comment: For discussion see http://code.google.com/p/darwincore/wiki/BiologicalEntity Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property Refines: http://purl.org/dc/terms/identifier Status: recommended Date Issued: 2011-09-09 Date Modified: 2011-09-09 Has Domain: Has Range: Version: organismID-2011-09-09 Replaces: individualID-2009-09-24 Is Replaced By: Class: http://rs.tdwg.org/dwc/terms/Organism ABCD 2.06: DataSets/DataSet/Units/Unit/UnitID
Term Name: organismRemarks Identifier: http://rs.tdwg.org/dwc/terms/organismRemarks Namespace: http://rs.tdwg.org/dwc/terms/ Label: organismRemarks Definition: Comments or notes about the Organism. Comment: Example: "seen several times in Tilden Park before capture". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property Refines: Status: recommended Date Issued: 2011-09-09 Date Modified: 2009-09-09 Has Domain: Has Range: Version: organismRemarks-2011-09-09 Replaces: Is Replaced By: Class: http://rs.tdwg.org/dwc/terms/Organism ABCD 2.06: DataSets/DataSet/Units/Unit/Notes
Term Name: associatedOrganisms Identifier: http://rs.tdwg.org/dwc/terms/associatedOrganisms Namespace: http://rs.tdwg.org/dwc/terms/ Label: associatedOrganisms Definition: A list (concatenated and separated) of identifiers of other Organism records and their associations to this Organism. Comment: Example: "sibling of MXA-231; sibling of MXA-232". For discussion see http://code.google.com/p/darwincore/wiki/Organism Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property Refines: Status: recommended Date Issued: 2011-09-09 Date Modified: 2011-09-09 Has Domain: Has Range: Version: associatedOrganisms-2011-09-09 Replaces: associatedOccurrences-2009-04-24 Is Replaced By: Class: http://rs.tdwg.org/dwc/terms/Organism ABCD 2.06: DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUni tSourceInstitutionCode
- DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedU
- ni
- tSourceName
- DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedU
- ni
- tID
The class proposed as "CollectionObject" has seen fewer alternate name proposals than "Organism", but the same call for clarity on inclusion and exclusion has been voiced. The basic idea is to use this class to cover information that could be considered "persistent evidence" that an organism occurred, and that the concept is distinct from both "Organism" and Occurrence. Evidence might include collection-based materials, digital media, written materials, and literature.
"Evidence" may be a bit vague as a name for the class, providing no real indication that the "Evidence" should apply to an "Organism" rather than to an Occurrence, Taxon, Identification, or any other class. Nor does it convey the idea that the evidence should be persistent. "PersistentEvidenceThatAnOrganismExisted" gets the idea across pretty well, but it is a bit lengthy (and no one actually proposed this name). ABCD isn't shy about vague term names - it uses "Unit" for roughly this concept. The long-standing term "CollectionObject" is less vague than the proposed alternatives, but it might lead people to assume that the object must be physical, and that it must be housed within a collection, neither of which is strictly required. No one objected to this name for the term, however, so I will continue to use it here to illustrate the proposed changes and additions to accommodate this concept.
Some existing terms (institutionID, institutionCode, collectionID, collectionCode, ownerInstitutionCode, catalogNumber, preparations, disposition, otherCatalogNumbers, associatedSequences) will have to be organized under this new class. These terms may require updated definitions for consistency. Note that with the addition of the "CollectionObject" class, the institutionCode, collectionCode, catalogNumber triplet would no longer apply to an Occurrence.
New terms (collectionObjectID and collectionObjectRemarks) and an CollectionObject Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "CollectionObject" class:
Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/terms/CollectionObject
Namespace: http://rs.tdwg.org/dwc/terms/
Label: CollectionObject
Definition: The category of information pertaining to persistent evidence that an organism existed (specimen, sample, image, sound, drawing, field notes, publication), including digital forms.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}

Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/dwctype/CollectionObject
Namespace: http://rs.tdwg.org/dwc/dwctype/
Label: CollectionObject
Definition: A resource describing an instance of the CollectionObject class.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Member Of: http://rs.tdwg.org/dwc/terms/DwCType
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: not in ABCD

Term Name: collectionObjectID
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectID
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectID
Definition: An identifier for the CollectionObject. In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the collectionObjectID globally unique.
Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]". Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines: http://purl.org/dc/terms/identifier
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: collectionObjectID-2011-09-09
Replaces:
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/UnitGUID

Term Name: collectionObjectRemarks
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectRemarks
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectRemarks
Definition: Comments or notes about the CollectionObject.
Comment: Example: "custody transferred in 1995 from National Park Service". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2009-09-09
Has Domain:
Has Range:
Version: collectionObjectRemarks-2011-09-09
Replaces: SampleRemarks-2009-01-18
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/Notes
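As an illustration of the fallback form given in the collectionObjectID comment, here is a minimal Python sketch that builds "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]" from the triplet. The function name and the choice to reject empty parts are assumptions made for the example, not part of the proposal.

```python
def build_collection_object_id(institution_code, collection_code, catalog_number):
    """Build a fallback identifier from the institutionCode, collectionCode,
    catalogNumber triplet (hypothetical helper, not a Darwin Core API)."""
    parts = [institution_code, collection_code, catalog_number]
    # Normalize each part to a stripped string; None becomes an empty string.
    cleaned = ["" if p is None else str(p).strip() for p in parts]
    if any(not p for p in cleaned):
        # Assumption: an incomplete triplet should not yield an identifier.
        raise ValueError("institutionCode, collectionCode and catalogNumber are all required")
    return "urn:catalog:" + ":".join(cleaned)


if __name__ == "__main__":
    print(build_collection_object_id("FMNH", "Mammal", 145732))
```

With the values from the second example in the comment, this yields "urn:catalog:FMNH:Mammal:145732". A record that already carries a bona fide global unique identifier would use that instead.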
Because of these changes for "Organism" and "CollectionObject", the definition of the Occurrence class will have to change, and quite a different set of terms will be organized under it, namely:
occurrenceID, occurrenceRemarks, recordNumber, recordedBy, establishmentMeans, and occurrenceStatus
The Occurrence definition will change from "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)." to something more akin to "The category of information pertaining to evidence of an occurrence of an Organism in nature."
The term occurrenceDetails will be deprecated in favor of the Dublin Core term dcterms:references at the record level. Also, associatedMedia, which was organized under Occurrence, would become a record level term, as it could apply as easily to Occurrences, "Organisms", and "CollectionObjects".
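To make the proposed reorganization concrete, here is a rough Python sketch that sorts the fields of a flat record into the classes named in this proposal. The term lists are taken from this message; the function name, the "RecordLevel" and "Other" buckets, and the sample record are assumptions for illustration only.

```python
# Term groupings as listed in this proposal (not an official Darwin Core artifact).
TERMS_BY_CLASS = {
    "Organism": [
        "organismID", "organismRemarks", "associatedOrganisms", "individualCount",
        "sex", "lifeStage", "reproductiveCondition", "behavior",
        "previousIdentifications", "associatedTaxa",
    ],
    "CollectionObject": [
        "collectionObjectID", "collectionObjectRemarks", "institutionID",
        "institutionCode", "collectionID", "collectionCode", "ownerInstitutionCode",
        "catalogNumber", "preparations", "disposition", "otherCatalogNumbers",
        "associatedSequences",
    ],
    "Occurrence": [
        "occurrenceID", "occurrenceRemarks", "recordNumber", "recordedBy",
        "establishmentMeans", "occurrenceStatus",
    ],
    "RecordLevel": ["associatedMedia", "dcterms:references"],
}

# Reverse lookup: term name -> proposed class.
CLASS_OF_TERM = {term: cls for cls, terms in TERMS_BY_CLASS.items() for term in terms}


def split_flat_record(record):
    """Group the fields of a flat record by proposed class.
    Terms not mentioned in the proposal land in an 'Other' bucket."""
    groups = {cls: {} for cls in TERMS_BY_CLASS}
    groups["Other"] = {}
    for term, value in record.items():
        groups[CLASS_OF_TERM.get(term, "Other")][term] = value
    return groups


if __name__ == "__main__":
    flat = {  # hypothetical record
        "occurrenceID": "occ-1",
        "recordedBy": "J. Doe",
        "sex": "female",
        "institutionCode": "FMNH",
        "collectionCode": "Mammal",
        "catalogNumber": "145732",
        "associatedMedia": "http://example.org/img.jpg",
        "country": "Peru",
    }
    for cls, fields in split_flat_record(flat).items():
        print(cls, fields)
```

Note that under this grouping the institutionCode, collectionCode, catalogNumber triplet ends up with the CollectionObject and no longer with the Occurrence.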
If you made it this far, I congratulate you on your dedication to the cause. Please let's clear up the remaining issues as a community and put these new terms to good use.
Cheers,
John
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Count me satisfied as well! (Thanks, John, for the answer to my query on ResourceRelationship, which I both understand and fully agree with).
2/10
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Kevin Richards Sent: Tuesday, September 13, 2011 10:40 AM To: tuco@berkeley.edu Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Thanks John, the ResourceRelationship does satisfy my concerns! 1 person satisfied, 10 to go. :-)
Kevin
-----Original Message----- From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Wednesday, 14 September 2011 8:22 a.m. To: Kevin Richards Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] Occurrences, Organisms, and CollectionObjects: a review
Let's see if I can catch up.
On Thu, Sep 8, 2011 at 2:20 PM, Kevin Richards RichardsK@landcareresearch.co.nz wrote:
Great job John.
1 comment - I feel a little uncomfortable about the unstructured nature of the associatedOrganisms term. If I was writing some software to read and import some DwC data, I would be interested in capturing the associations that have been defined (in some structured way).
I understand your discomfort. That proposed term is to take the place of the corresponding associatedOccurrences. As with the other "associatedX" terms, it is meant to allow useful information to be retained as part of flat records, not necessarily to be rigorously interpretable for the contents of the list, which would require data content standards as well.
There are possibly a few ways to improve this:
- define a structured way of filling this field, e.g. "[relationship type (from controlled vocabulary), e.g. host]:[related Organism ID]"
- add another field for just associatedOrganismIDs, that just has the IDs and not free form text - not sure how this will help though
- add another class for AssociatedOrganism, that has fromOrganismID, toOrganismID and relationshipType
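As a sketch of the first option above, the following Python parses a value written as "[relationship type]:[related Organism ID]". The ";" separator between entries, the function name and the sample values are assumptions; the proposal does not fix any of them, and no controlled vocabulary is checked here.

```python
def parse_associated_organisms(value, separator=";"):
    """Parse entries of the hypothetical form 'relationshipType:organismID'
    into (relationship_type, organism_id) pairs."""
    associations = []
    for item in value.split(separator):
        item = item.strip()
        if not item:
            continue  # tolerate trailing separators and empty entries
        # Split on the first colon only, so IDs such as urn:catalog:... stay whole.
        relationship, colon, organism_id = item.partition(":")
        relationship, organism_id = relationship.strip(), organism_id.strip()
        if not colon or not relationship or not organism_id:
            raise ValueError(f"expected 'relationshipType:organismID', got {item!r}")
        associations.append((relationship, organism_id))
    return associations


if __name__ == "__main__":
    print(parse_associated_organisms("host:MXA-231; sibling:urn:catalog:FMNH:Mammal:145732"))
```

Splitting on the first colon is a design choice that lets the related Organism ID itself contain colons, at the cost of forbidding colons in the relationship type.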
This is really the realm of more relational representations, in which you could use the ResourceRelationship class to good effect.
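For illustration, here is how the example "sibling of MXA-231; sibling of MXA-232" might look as separate relationship records. This is only a sketch: the field names follow the Darwin Core ResourceRelationship terms resourceID, relationshipOfResource and relatedResourceID, while the dataclass, the helper function and the subject identifier "MXA-230" are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ResourceRelationship:
    """One relationship row: resourceID is the subject,
    relatedResourceID is the object of the relationship."""
    resourceID: str
    relationshipOfResource: str
    relatedResourceID: str


def sibling_relationships(organism_id, sibling_ids):
    """Build one 'sibling of' row per related Organism (hypothetical helper)."""
    return [
        ResourceRelationship(
            resourceID=organism_id,
            relationshipOfResource="sibling of",
            relatedResourceID=sibling_id,
        )
        for sibling_id in sibling_ids
    ]


if __name__ == "__main__":
    for row in sibling_relationships("MXA-230", ["MXA-231", "MXA-232"]):
        print(asdict(row))
```

Each association becomes its own row, so software can read the related identifiers directly instead of interpreting a concatenated text field.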
I also notice with a few of the DwC terms that it is recommended to use a controlled vocabulary for setting the value of the term, but the controlled vocabulary itself is not provided. Is it not the place for the DwC vocabulary to provide these? Or has the work just not been done to define them?
No Darwin Core term enforces controlled vocabularies. Only recommendations are made, and that's on purpose. One of the biggest reasons is that the vocabularies can be quite dynamic, variable, and/or contentious, and we did not want that distraction in the management of the Darwin Core terms.
Kevin
I really like the synopsis and the clarity of this proposal. However, I'm going to muddy things up a bit by sending your message to the Biodiversity Genomics GSC Group list and alerting them to this discussion so they can provide input.
There has been an ongoing effort to bridge DwC and GSC standards (for more information on GSC standards see http://www.nature.com/nbt/journal/v29/n5/full/nbt.1823.html). My sense is that CollectionObject is a nice analogue to what MIMARKS refers to as a Sample. Some possibilities for integration here (assuming I'm on target with Sample and CollectionObject being analogous):
1. Adopt the term "Sample" instead of "CollectionObject"
2. Develop a general understanding of how "Sample" and "CollectionObject" are related (or differ) and write up some explanatory text in the reference (with recognition of both terms).
3. Clarify the differences between CollectionObject and Samples and develop a ResourceRelationship between the two.
John Deck
On Wed, Sep 7, 2011 at 6:04 PM, John Wieczorek tuco@berkeley.edu wrote:
Dear all,
Prepare yourself mentally. After more than a year of discussions, prototypes, scholarly papers, bar room brawls, etc., we are very near having a path forward for two new, related classes for Darwin Core that attempt to remove ambiguity inherent in the Occurrence class as it currently stands. Adding classes is quite a bit more complicated than adding properties (as you'll see if you manage to get through this message), and so it is important to be as thorough as possible to make sure we get it right. I'll try here to synthesize the rough consensus and the remaining issues.
Basically, the idea is to pull two distinct concepts out of Occurrence and give them their own classes. Maybe not surprisingly, one of the hardest things to agree upon has been the names for these classes. The class that was proposed first as "Individual" has seen no less than 12 alternate names, none of them satisfying to everyone. The closest thing to an acceptable name was "Organism", with caveats that the definition should make it abundantly clear what is to be included in the class and what is not. I'll use "Organism" here to refer to the class in the hopes of offending the fewest people.
The rough consensus on "Organism" is that it should include viruses, symbionts, individuals, colonies, groups of individuals, and even populations, but that there should be taxonomic homogeneity to an instance of an "Organism". There has been some concern about how and where to draw the line on homogeneity. No attempt has yet been made to write a definitive description of the class, though many examples of representatives of the class have been given.
What we need to move forward on the "Organism" class are an official definition and an official comment, the combination of which should be sufficient for someone previously unfamiliar with the term and the arguments leading to its existence to understand. Some existing terms (individualCount, sex, lifeStage, reproductiveCondition, behavior, previousIdentifications, associatedTaxa) will have to be reorganized to be under this new class. These terms may require updated definitions for consistency. New terms (organismID, associatedOrganisms, organismRemarks) and an Organism Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "Organism" class:
Term Name: Organism
Identifier: http://rs.tdwg.org/dwc/terms/Organism
Namespace: http://rs.tdwg.org/dwc/terms/
Label: Organism
Definition: The category of information pertaining to a specific instance of an organism (virus, symbiont, individual, colony, group of individuals, population) reliably known to be taxonomically homogeneous.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/Organism
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: Organism-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}

Term Name: Organism
Identifier: http://rs.tdwg.org/dwc/dwctype/Organism
Namespace: http://rs.tdwg.org/dwc/dwctype/
Label: Organism
Definition: A resource describing an instance of the Organism class.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Member Of: http://rs.tdwg.org/dwc/terms/DwCType
Has Domain:
Has Range:
Version: Organism-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: not in ABCD

Term Name: organismID
Identifier: http://rs.tdwg.org/dwc/terms/organismID
Namespace: http://rs.tdwg.org/dwc/terms/
Label: organismID
Definition: An identifier for the set of information associated with an Organism. May be a global unique identifier or an identifier specific to the data set.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/BiologicalEntity
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines: http://purl.org/dc/terms/identifier
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: organismID-2011-09-09
Replaces: individualID-2009-09-24
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/UnitID

Term Name: organismRemarks
Identifier: http://rs.tdwg.org/dwc/terms/organismRemarks
Namespace: http://rs.tdwg.org/dwc/terms/
Label: organismRemarks
Definition: Comments or notes about the Organism.
Comment: Example: "seen several times in Tilden Park before capture". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2009-09-09
Has Domain:
Has Range:
Version: organismRemarks-2011-09-09
Replaces:
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/Notes

Term Name: associatedOrganisms
Identifier: http://rs.tdwg.org/dwc/terms/associatedOrganisms
Namespace: http://rs.tdwg.org/dwc/terms/
Label: associatedOrganisms
Definition: A list (concatenated and separated) of identifiers of other Organism records and their associations to this Organism.
Comment: Example: "sibling of MXA-231; sibling of MXA-232". For discussion see http://code.google.com/p/darwincore/wiki/Organism
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: associatedOrganisms-2011-09-09
Replaces: associatedOccurrences-2009-04-24
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitSourceInstitutionCode
  DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitSourceName
  DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitID
The class proposed as "CollectionObject" has seen fewer alternate name proposals than "Organism", but the same call for clarity on inclusion and exclusion has been voiced. The basic idea is to use this class to cover information that could be considered "persistent evidence" that an organism occurred, and that the concept is distinct from both "Organism" and Occurrence. Evidence might include collection-based materials, digital media, written materials, and literature.
"Evidence" may be a bit vague as a name for the class, providing no real indication that the "Evidence" should apply to an "Organism" rather than to an Occurrence, Taxon, Identification, or any other class. Nor does it convey the idea that the evidence should be persistent. "PersistentEvidenceThatAnOrganismExisted" gets the idea across pretty well, but it is a bit lengthy (and no one actually proposed this name). ABCD isn't shy about vague term names - it uses "Unit" for roughly this concept. The long-standing term "CollectionObject" is less vague than the proposed alternatives, but it might lead people to assume that the object must be physical, and that it must be housed within a collection, neither of which is strictly required. No one objected to this name for the term, however, so I will continue to use it here to illustrate the proposed changes and additions to accommodate this concept.
Some existing terms (institutionID, institutionCode, collectionID, collectionCode, ownerInstitutionCode, catalogNumber, preparations, disposition, otherCatalogNumbers, associatedSequences) will have to be organized under this new class. These terms may require updated definitions for consistency. Note that with the addition of the "CollectionObject" class, the institutionCode, collectionCode, catalogNumber triplet would no longer apply to an Occurrence.
New terms (collectionObjectID and collectionObjectRemarks) and a CollectionObject Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "CollectionObject" class:
Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/terms/CollectionObject
Namespace: http://rs.tdwg.org/dwc/terms/
Label: CollectionObject
Definition: The category of information pertaining to persistent evidence that an organism existed (specimen, sample, image, sound, drawing, field notes, publication), including digital forms.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}

Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/dwctype/CollectionObject
Namespace: http://rs.tdwg.org/dwc/dwctype/
Label: CollectionObject
Definition: A resource describing an instance of the CollectionObject class.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Member Of: http://rs.tdwg.org/dwc/terms/DwCType
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: not in ABCD

Term Name: collectionObjectID
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectID
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectID
Definition: An identifier for the CollectionObject. In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the collectionObjectID globally unique.
Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]". Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines: http://purl.org/dc/terms/identifier
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: collectionObjectID-2011-09-09
Replaces:
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/UnitGUID

Term Name: collectionObjectRemarks
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectRemarks
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectRemarks
Definition: Comments or notes about the CollectionObject.
Comment: Example: "custody transferred in 1995 from National Park Service". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2009-09-09
Has Domain:
Has Range:
Version: collectionObjectRemarks-2011-09-09
Replaces: SampleRemarks-2009-01-18
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/Notes
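As a rough illustration of the fallback form given in the collectionObjectID comment, the triplet could be joined as in the sketch below. The function name `catalog_urn` and the empty-value check are assumptions made for this example only; the proposal itself specifies just the "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]" pattern.

```python
def catalog_urn(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Build a fallback collectionObjectID of the form
    urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber].

    Hypothetical helper: the name and the validation are not part of the proposal.
    """
    parts = [p.strip() for p in (institution_code, collection_code, catalog_number)]
    # Assumption: a triplet with a missing part cannot approximate global uniqueness.
    if any(not p for p in parts):
        raise ValueError("institutionCode, collectionCode and catalogNumber are all required")
    return "urn:catalog:" + ":".join(parts)


# Reproduces the second example from the term comment.
print(catalog_urn("FMNH", "Mammal", "145732"))  # urn:catalog:FMNH:Mammal:145732
```

Such an identifier is only as unique as the triplet it is built from, which is why the definition prefers a persistent global unique identifier when one exists.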
Because of these changes for "Organism" and "CollectionObject", the definition of the Occurrence class will have to change, and quite a different set of terms will be organized under it, namely:
occurrenceID, occurrenceRemarks, recordNumber, recordedBy, establishmentMeans, and occurrenceStatus
The Occurrence definition will change from "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)." to something more akin to "The category of information pertaining to evidence of an occurrence of an Organism in nature."
The term occurrenceDetails will be deprecated in favor of the Dublin Core term dcterms:references at the record level. Also, associatedMedia, which was organized under Occurrence, would become a record level term, as it could apply as easily to Occurrences, "Organisms", and "CollectionObjects".
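To make the proposed reorganization concrete, here is a minimal sketch that groups the terms of a flat record by the class they would fall under. The term lists are taken from this message; the names `PROPOSED_CLASS_TERMS` and `split_record`, the "record-level" bucket label, and the sample identifier and media URL are assumptions for illustration, not part of Darwin Core.

```python
from collections import defaultdict

# Term organization as proposed in this message (not a ratified standard).
PROPOSED_CLASS_TERMS = {
    "Occurrence": [
        "occurrenceID", "occurrenceRemarks", "recordNumber",
        "recordedBy", "establishmentMeans", "occurrenceStatus",
    ],
    "Organism": [
        "organismID", "organismRemarks", "associatedOrganisms",
        "individualCount", "sex", "lifeStage", "reproductiveCondition",
        "behavior", "previousIdentifications", "associatedTaxa",
    ],
    "CollectionObject": [
        "collectionObjectID", "collectionObjectRemarks",
        "institutionID", "institutionCode", "collectionID", "collectionCode",
        "ownerInstitutionCode", "catalogNumber", "preparations",
        "disposition", "otherCatalogNumbers", "associatedSequences",
    ],
}

# Reverse lookup: term name -> proposed class.
TERM_TO_CLASS = {
    term: cls for cls, terms in PROPOSED_CLASS_TERMS.items() for term in terms
}


def split_record(flat_record: dict) -> dict:
    """Group a flat term/value record by proposed class.

    Terms not listed above (for example associatedMedia) land in a
    hypothetical "record-level" bucket.
    """
    grouped = defaultdict(dict)
    for term, value in flat_record.items():
        grouped[TERM_TO_CLASS.get(term, "record-level")][term] = value
    return dict(grouped)


# Sample values: the triplet and remark come from the examples in this
# message; the occurrenceID and media URL are made up.
record = {
    "occurrenceID": "occ-1",
    "institutionCode": "FMNH",
    "collectionCode": "Mammal",
    "catalogNumber": "145732",
    "organismRemarks": "seen several times in Tilden Park before capture",
    "associatedMedia": "http://example.org/photo.jpg",
}
parts = split_record(record)
print(sorted(parts))  # ['CollectionObject', 'Occurrence', 'Organism', 'record-level']
```

Under this split the institutionCode, collectionCode, catalogNumber triplet travels with the CollectionObject rather than with the Occurrence, which is the change noted above.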
If you made it this far, I congratulate you on your dedication to the cause. Please let's clear up the remaining issues as a community and put these new terms to good use.
Cheers,
John
I like sample. It is much more general and appropriate to work outside of museums. Preserving everything is a luxury not all can afford. It should be done for many cases, but we usually don't have the resources to do it always.
Gregor
Funny thing for all of you Darwin Core trivia buffs. In one iteration of the class that became Occurrence, it was called "Sample" (http://rs.tdwg.org/dwc/terms/history/index.htm#Sample-2008-11-19). It was rejected as being too biased toward collections and away from observations. With CollectionObjects, we no longer need to worry about that sensitive issue. So, to me it seems Sample is no worse than CollectionObject, but suffers the same shortcomings when it comes to types of evidence that people wouldn't think of as samples (drawings, digital media, written notes and literature).
But I applaud the proposal to reconcile with GSC's Sample. Are GSC terms defined as vocabularies in a way that is compatible with Dublin Core and Darwin Core? Can someone point to the normative document containing the authoritative definition of the term?
On Thu, Sep 8, 2011 at 5:03 PM, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
I like sample. It is much more general and appropriate to work outside of museums. Preserving everything is a luxury not all can afford. It should be done for many cases, but we usually don't have the resources to do it always.
Gregor
I found an Excel spreadsheet containing "the MIGS/MIMS checklist" at http://gensc.org/gc_wiki/index.php/MIGS/MIMS#The_MIGS.2FMIMS_checklist_as_a_... It mentions "sample" in the description of various terms, but I couldn't find "sample" as a term itself. But there may be something I'm missing or maybe I haven't actually found the "real" term definitions (i.e. the normative document).
I didn't see any reference to term definitions as URIs or with RDF descriptions (as in DCMI and DwC). Just cells in a spreadsheet.
Steve
One of the problems with living on the other side of the world is the difficulty with getting a word into the appropriate place in a conversation that has been raging all night. Usually I tend to sigh and just let it pass - taking note of your various positions and simply trying to live with the consequences. But I do need to comment here.
The current debate reminds me a little of one of Bob's favourite theorems - if we had an organism we could make an occurrence - if we had a taxon. Darwin Core began its life as a flattened set of standardized biodiversity data access points employed in the aggregation and interrogation of content from built-for-purpose [collection] metadata systems. The use case here was to provide standardized content for value-added projects with a focus on occurrence that could benefit from the large datasets that it made possible. This "Occurrence" was an abstraction derived from point-in-time sampling from our collection metadata repositories using "current" determinations. We do not build occurrence systems.
At the back end, in the real world, our use cases evolve from requirements that we manage collections of biodiversity content (..., individuals, parts, impressions, cultures, molecules, observations, events, images, names, taxa, citations and annotation histories) as a resource for scientific inquiry and research and for the development of practical tools for biodiversity management and data interchange. Without these efforts toward establishment of taxonomic hypotheses the very concept of "occurrence" is meaningless.
So, to me, it now seems a little absurd that with a bit of tweaking and classing the TDWG Domain Model might be derivable from this "occurrence" set. Darwin Core is extremely useful as a vocabulary. It is beautifully documented and entirely suited to its aggregation use case. It has been taken up in many quarters. The last thing we should be thinking about now is how to set about breaking it.
A workable domain model deserves to be a high priority, with interoperability across our standard offerings a primary goal. Roger's existing work at rs.tdwg.org may not have standards ratification, but it has been widely used and tested and forms the basis (along with TCS) of ongoing semantic systems research and linked data developments within the biodiversity space. The difficulty in maintaining LSID traction, the general lack of interest (especially in the vested interests) in RDF, the pressure of real work and the focus on application-level schema and aggregation have all contributed to the current state of progress there. Nevertheless, it should still be our point of reference as we try to bootstrap this modelling effort.
There is a proposal to form a DwC RDF task group that comes from the primary advocates of the Darwin Core changes under discussion here. It would seem to make sense to leave the Darwin Core alone until this group reports. There is a parallel movement in the TAG to resurrect the domain model effort, leveraging experiences with the candidate TDWG ontologies. Hopefully we will have time to air these options during TDWG 2011.
greg
A few observations:
The real database is in the collection and the labels on the objects there contained, the real metadata. Our electronic versions are in effect meta-metadata.
Falsification of an occurrence may very well be the primary function of a scientific voucher.
There is no avenue of connection from "occurrence" to "taxon" other than through typification, direct citation of vouchered material or annotation by the taxon authority. Mostly they simply share a taxon name string.
On 8 September 2011 11:04, John Wieczorek tuco@berkeley.edu wrote:
[...]
+1 on keeping things simple (i.e., Darwin Core is a success, why muck with it?)
+1 on Roger Hyam's work on http://rs.tdwg.org/, which is the only useful thing I've ever found on the TDWG web site
Regards
Rod
On 13 Sep 2011, at 15:32, greg whitbread wrote:
One of the problems with living on the other side of the world is the difficulty with getting a word into the appropriate place in a conversation that has been raging all night. Usually I tend to sigh and just let it pass - taking note of your various positions and simply trying to live with the consequences. But I do need to comment here.
The current debate reminds me a little of one of Bob's favourite theorems - if we had an organism we could make an occurrence - if we had a taxon. Darwin Core began its life as an flattened set of standardized biodiversity data access points employed at the aggregation and interrogation of content from built for purpose [collection] metadata systems. The use case here was to provide standardized content for value added projects with a focus on occurrence that could benefit from the large datasets that it made possible. This "Occurrence" was an abstraction derived from point in time sampling from our collection metadata repositories using "current" determinations. We do not build occurrence systems.
At the back end, in the real world, our use cases evolve from requirements that we manage collections of biodiversity content (..., individuals, parts, impressions, cultures, molecules, observations, events, images, names, taxa, citations and annotation histories) as a resource for scientific inquiry and research and for the development of practical tools for biodiversity management and data interchange. Without these efforts toward establishment of taxonomic hypotheses the very concept of "occurrence" is meaningless.
So, to me, it now seems a little absurd that with a bit of tweaking and classing the TDWG Domain Model might be derivable from this "occurrence" set. Darwin Core is extremely useful as a vocabulary. It is beautifully documented and entirely suited to its aggregation use case. It has been taken up in many quarters. The last thing we should be thinking about now is how to set about breaking it.
A workable domain model deserves to be a high priority with Interoperability across our standard offerings a primary goal. Roger's existing work at rs.tdwg.org may not have that standards ratification but it has been widely used and tested and forms the basis (along with TCS) of ongoing semantic systems research and linked data developments within the biodiversity space. The difficulty in maintaining LSID traction, the general lack of interest (especially in the vested interests) in RDF, the pressure of real work and the focus on application level schema and aggregation have all contributed to the current state of progress there. Never-the-less it should still be our point of reference as we try to boot-strap this modelling effort.
There is a proposal to form an DwC RDF task group that comes from the primary advocates of the Darwin Core changes under discussion here. It would seem to make sense to leave the Darwin Core alone until this group reports. There is a parallel movement in the TAG to resurrect the domain model effort leveraging experiences with the candidate TDWG ontologies. Hopefully we will have time to air these options during TDWG 2011.
greg
A few observations:
The real database is in the collection and the labels on the objects there contained, the real metadata. Our electronic versions are in effect meta-metadata.
Falsification of an occurrence may very well be the primary function of a scientific voucher.
There is no avenue of connection from "occurrence" to "taxon" other than through typification, direct citation of vouchered material or annotation by the taxon authority. Mostly they simply share taxon name string
On 8 September 2011 11:04, John Wieczorek tuco@berkeley.edu wrote:
Dear all,
Prepare yourself mentally. After more than a year of discussions, prototypes, scholarly papers, bar room brawls, etc., we are very near having a path forward for two new, related classes for Darwin Core that attempt to remove ambiguity inherent in the Occurrence class as it currently stands. Adding classes is quite a bit more complicated than adding properties (as you'll see if you manage to get through this message), and so it is important to be as thorough as possible to make sure we get it right. I'll try here to synthesize the rough consensus and the remaining issues.
Basically, the idea is to pull two distinct concepts out of Occurrence and give them their own classes. Maybe not surprisingly, one of the hardest things to agree upon has been the names for these classes. The class that was proposed first as "Individual" has seen no less than 12 alternate names, none of them satisfying to everyone. The closest thing to an acceptable name was "Organism", with caveats that the definition should make it abundantly clear what is to be included in the class and what is not. I'll use "Organism" here to refer to the class in the hopes of offending the fewest people.
The rough consensus on "Organism" is that is should include viruses, symbionts, individuals, colonies, groups of individuals, and even populations, but that there should be taxonomic homogeneity to an instance of an "Organism". There has been some concern about how and where to draw the line on homogeneity. No attempt has yet been made to write a definitive description of the class, though many examples of representatives of the class have been given.
What we need to move forward on the "Organism" class are an official definition and an official comment, the combination of which should be sufficient for someone previously unfamiliar with the term and the arguments leading to its existence to understand. Some existing terms (individualCount, sex, lifeStage, reproductiveCondition, behavior, previousIdentifications, associatedTaxa) will have to be reorganized to be under this new class. These terms may require updated definitions for consistency. New terms (organismID, associatedOrganisms, organismRemarks) and an Organism Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "Organism" class:
Term Name: Organism Identifier: http://rs.tdwg.org/dwc/terms/Organism Namespace: http://rs.tdwg.org/dwc/terms/ Label: Organism Definition: The category of information pertaining to a specific instance of an organism (virus, symbiont, individual, colony, group of individuals, population) reliably be known to taxonomically homogeneous. Comment: For discussion see http://code.google.com/p/darwincore/wiki/Organism Type of Term: http://www.w3.org/2000/01/rdf-schema#Class Refines: Status: recommended Date Issued: 2011-09-09 Date Modified: 2011-09-09 Has Domain: Has Range: Version: Organism-2011-09-09 Replaces: Is Replaced By: Class: ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}
Term Name: Organism Identifier: http://rs.tdwg.org/dwc/dwctype/Organism Namespace: http://rs.tdwg.org/dwc/dwctype/ Label: Organism Definition: A resource describing an instance of the Organism class. Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary Type of Term: http://www.w3.org/2000/01/rdf-schema#Class Refines: Status: recommended Date Issued: 2011-09-09 Date Modified: 2011-09-00 Member Of: http://rs.tdwg.org/dwc/terms/DwCType Has Domain: Has Range: Version: Organism-2011-09-09 Replaces: Is Replaced By: Class: ABCD 2.06: not in ABCD
Term Name: organismID
Identifier: http://rs.tdwg.org/dwc/terms/organismID
Namespace: http://rs.tdwg.org/dwc/terms/
Label: organismID
Definition: An identifier for the set of information associated with an Organism. May be a global unique identifier or an identifier specific to the data set.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/BiologicalEntity
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines: http://purl.org/dc/terms/identifier
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: organismID-2011-09-09
Replaces: individualID-2009-09-24
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/UnitID

Term Name: organismRemarks
Identifier: http://rs.tdwg.org/dwc/terms/organismRemarks
Namespace: http://rs.tdwg.org/dwc/terms/
Label: organismRemarks
Definition: Comments or notes about the Organism.
Comment: Example: "seen several times in Tilden Park before capture". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: organismRemarks-2011-09-09
Replaces:
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/Notes

Term Name: associatedOrganisms
Identifier: http://rs.tdwg.org/dwc/terms/associatedOrganisms
Namespace: http://rs.tdwg.org/dwc/terms/
Label: associatedOrganisms
Definition: A list (concatenated and separated) of identifiers of other Organism records and their associations to this Organism.
Comment: Example: "sibling of MXA-231; sibling of MXA-232". For discussion see http://code.google.com/p/darwincore/wiki/Organism
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: associatedOrganisms-2011-09-09
Replaces: associatedOccurrences-2009-04-24
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/Organism
ABCD 2.06: DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitSourceInstitutionCode
- DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitSourceName
- DataSets/DataSet/Units/Unit/Associations/UnitAssociation/AssociatedUnitID
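To make the "concatenated and separated" wording of associatedOrganisms a little more concrete, here is a rough Python sketch of how a consumer might split such a value. The function name, the choice of ";" as the separator, and the rule that the identifier is the last whitespace-delimited token are all assumptions drawn only from the single example in the Comment above; the proposal itself does not prescribe a separator or a grammar.

```python
def parse_associated_organisms(value, separator=";"):
    """Split an associatedOrganisms string into (relationship, organism_id) pairs.

    Assumptions (not part of the proposal): items are separated by
    `separator`, and the identifier is the last whitespace-delimited
    token of each item, as in "sibling of MXA-231".
    """
    pairs = []
    for item in value.split(separator):
        item = item.strip()
        if not item:
            continue  # skip empty items, e.g. from a trailing separator
        # Everything before the last space is treated as the relationship.
        relationship, _, organism_id = item.rpartition(" ")
        pairs.append((relationship.strip(), organism_id))
    return pairs


example = "sibling of MXA-231; sibling of MXA-232"
for relationship, organism_id in parse_associated_organisms(example):
    print(relationship, "->", organism_id)
```

A free-text list like this is easy to publish but fragile to parse (an identifier containing a space would break the sketch), which is worth keeping in mind when settling on the definition.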
The class proposed as "CollectionObject" has seen fewer alternate name proposals than "Organism", but the same call for clarity on inclusion and exclusion has been voiced. The basic idea is to use this class to cover information that could be considered "persistent evidence" that an organism occurred, a concept distinct from both "Organism" and Occurrence. Evidence might include collection-based materials, digital media, written materials, and literature.
"Evidence" may be a bit vague as a name for the class, providing no real indication that the "Evidence" should apply to an "Organism" rather than to an Occurrence, Taxon, Identification, or any other class. Nor does it convey the idea that the evidence should be persistent. "PersistentEvidenceThatAnOrganismExisted" gets the idea across pretty well, but it is a bit lengthy (and no one actually proposed this name). ABCD isn't shy about vague term names - it uses "Unit" for roughly this concept. The long-standing term "CollectionObject" is less vague than the proposed alternatives, but it might lead people to assume that the object must be physical, and that it must be housed within a collection, neither of which is strictly required. No one objected to this name for the term, however, so I will continue to use it here to illustrate the proposed changes and additions to accommodate this concept.
Some existing terms (institutionID, institutionCode, collectionID, collectionCode, ownerInstitutionCode, catalogNumber, preparations, disposition, otherCatalogNumbers, associatedSequences) will have to be organized under this new class. These terms may require updated definitions for consistency. Note that with the addition of the "CollectionObject" class, the institutionCode, collectionCode, catalogNumber triplet would no longer apply to an Occurrence.
New terms (collectionObjectID and collectionObjectRemarks) and a CollectionObject Darwin Core Type vocabulary term will have to be added. Following is an updated proposal for changes related to the adoption of a new "CollectionObject" class:
Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/terms/CollectionObject
Namespace: http://rs.tdwg.org/dwc/terms/
Label: CollectionObject
Definition: The category of information pertaining to persistent evidence that an organism existed (specimen, sample, image, sound, drawing, field notes, publication), including digital forms.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: {DataSets/DataSet/Units/Unit/CultureCollectionUnit or DataSets/DataSet/Units/Unit/MycologicalUnit or DataSets/DataSet/Units/Unit/HerbariumUnit or DataSets/DataSet/Units/Unit/BotanicalGardenUnit or DataSets/DataSet/Units/Unit/PlantGeneticResourceUnit or DataSets/DataSet/Units/Unit/ZoologicalUnit or DataSets/DataSet/Units/Unit/PalaeontologicalUnit or DataSets/DataSet/Units/Unit/MultimediaObjects/MultimediaObject}

Term Name: CollectionObject
Identifier: http://rs.tdwg.org/dwc/dwctype/CollectionObject
Namespace: http://rs.tdwg.org/dwc/dwctype/
Label: CollectionObject
Definition: A resource describing an instance of the CollectionObject class.
Comment: For discussion see http://code.google.com/p/darwincore/wiki/DwCTypeVocabulary
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Member Of: http://rs.tdwg.org/dwc/terms/DwCType
Has Domain:
Has Range:
Version: CollectionObject-2011-09-09
Replaces:
Is Replaced By:
Class:
ABCD 2.06: not in ABCD

Term Name: collectionObjectID
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectID
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectID
Definition: An identifier for the CollectionObject. In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the collectionObjectID globally unique.
Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]". Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines: http://purl.org/dc/terms/identifier
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: collectionObjectID-2011-09-09
Replaces:
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/UnitGUID

Term Name: collectionObjectRemarks
Identifier: http://rs.tdwg.org/dwc/terms/collectionObjectRemarks
Namespace: http://rs.tdwg.org/dwc/terms/
Label: collectionObjectRemarks
Definition: Comments or notes about the CollectionObject.
Comment: Example: "custody transferred in 1995 from National Park Service". For discussion see http://code.google.com/p/darwincore/wiki/CollectionObject
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: recommended
Date Issued: 2011-09-09
Date Modified: 2011-09-09
Has Domain:
Has Range:
Version: collectionObjectRemarks-2011-09-09
Replaces: SampleRemarks-2009-01-18
Is Replaced By:
Class: http://rs.tdwg.org/dwc/terms/CollectionObject
ABCD 2.06: DataSets/DataSet/Units/Unit/Notes

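As an illustration of the fallback described in the collectionObjectID Comment, the following Python sketch builds an identifier of the form "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]" when no global unique identifier is available. The function name, the optional guid parameter, and the decision to raise an error on a missing part are assumptions made for the example, not part of the proposal.

```python
def build_collection_object_id(institution_code, collection_code,
                               catalog_number, guid=None):
    """Return a collectionObjectID for a specimen record.

    If a bona fide global unique identifier is supplied, it is used as is.
    Otherwise the "urn:catalog:" form from the term's Comment is constructed.
    Rejecting empty parts is an assumption of this sketch.
    """
    if guid:
        return guid
    parts = [institution_code, collection_code, catalog_number]
    cleaned = [str(p).strip() if p is not None else "" for p in parts]
    if not all(cleaned):
        raise ValueError("institutionCode, collectionCode and catalogNumber "
                         "are all needed to construct an identifier")
    return "urn:catalog:" + ":".join(cleaned)


print(build_collection_object_id("FMNH", "Mammal", "145732"))
print(build_collection_object_id("KU", "Herps", "32",
                                 guid="urn:lsid:nhm.ku.edu:Herps:32"))
```

The first call reproduces the "urn:catalog:FMNH:Mammal:145732" example from the Comment; the second simply passes through an existing identifier.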
Because of these changes for "Organism" and "CollectionObject", the definition of the Occurrence class will have to change, and quite a different set of terms will be organized under it, namely:
occurrenceID, occurrenceRemarks, recordNumber, recordedBy, establishmentMeans, and occurrenceStatus
The Occurrence definition will change from "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)." to something more akin to "The category of information pertaining to evidence of an occurrence of an Organism in nature."
The term occurrenceDetails will be deprecated in favor of the Dublin Core term dcterms:references at the record level. Also, associatedMedia, which was organized under Occurrence, would become a record level term, as it could apply as easily to Occurrences, "Organisms", and "CollectionObjects".
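For anyone trying to picture what the reorganization means for an existing flat record, here is a small Python sketch that regroups terms by the class they would fall under. The term lists are taken from this message; the dictionary name, the "record-level" and "unassigned" keys, the function name, and the sample values are hypothetical and only there for illustration.

```python
# Term-to-class grouping as proposed in this message (a sketch, not a spec).
PROPOSED_CLASS_TERMS = {
    "Organism": {
        "organismID", "organismRemarks", "associatedOrganisms",
        "individualCount", "sex", "lifeStage", "reproductiveCondition",
        "behavior", "previousIdentifications", "associatedTaxa",
    },
    "CollectionObject": {
        "collectionObjectID", "collectionObjectRemarks", "institutionID",
        "institutionCode", "collectionID", "collectionCode",
        "ownerInstitutionCode", "catalogNumber", "preparations",
        "disposition", "otherCatalogNumbers", "associatedSequences",
    },
    "Occurrence": {
        "occurrenceID", "occurrenceRemarks", "recordNumber", "recordedBy",
        "establishmentMeans", "occurrenceStatus",
    },
    # associatedMedia would move to the record level under the proposal.
    "record-level": {"associatedMedia"},
}


def split_record(flat_record):
    """Group the terms of a flat record by their proposed class."""
    grouped = {name: {} for name in PROPOSED_CLASS_TERMS}
    grouped["unassigned"] = {}  # terms this message does not mention
    for term, value in flat_record.items():
        for class_name, terms in PROPOSED_CLASS_TERMS.items():
            if term in terms:
                grouped[class_name][term] = value
                break
        else:
            grouped["unassigned"][term] = value
    return grouped


# Hypothetical sample record, for illustration only.
record = {
    "institutionCode": "FMNH",
    "collectionCode": "Mammal",
    "catalogNumber": "145732",
    "sex": "female",
    "recordedBy": "A. Collector",
    "country": "United States",
}
for class_name, terms in split_record(record).items():
    print(class_name, terms)
```

Note how the institutionCode, collectionCode, catalogNumber triplet ends up under CollectionObject rather than Occurrence, as described above.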
If you made it this far, I congratulate you on your dedication to the cause. Please let's clear up the remaining issues as a community and put these new terms to good use.
Cheers,
John
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
-- Greg Whitbread Australian National Botanic Gardens Australian National Herbarium +61 2 62509482 ghw@anbg.gov.au
"And therfore, at the kynges court, my brother, Ech man for hymself, ther is noon oother." The Knight's Tale, l. 1181-1182
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 AIM: rodpage1962@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
Roderic Page wrote:
+1 on keeping things simple (i.e., Darwin Core is a success, why muck with it?)
Well, I guess the accuracy of the statement "Darwin Core is a success, why muck with it?" depends on your point of view. If the only thing you do is collect specimens, then you would probably agree with that statement. If you document organisms by photographing them, collecting DNA from them, or a combination of the above, then you would probably NOT agree with it.
GBIF makes the following statement on its website: "Since its inception, GBIF has focused its data digitisation and mobilisation activities on natural history collections data. However, targets such as the discovery of 5 billion and mobilisation of up to 2 billion primary biodiversity data records require that GBIF data discovery and mobilisation activities place equal emphasis on other data types." (http://www.gbif.org/informatics/primary-data/types-of-primary-biodiversity-d...). I would be interested in hearing from some of the GBIF representatives how they would envision accomplishing the goal of integrating other data types with specimen data if Darwin Core remains frozen in a form that really only works well for collections metadata.
Steve
participants (14)
- "Markus Döring (GBIF)"
- Bob Morris
- Chuck Miller
- Dag Endresen (GBIF)
- greg whitbread
- Gregor Hagedorn
- John Deck
- John Wieczorek
- Kevin Richards
- Paul J. Morris
- Richard Pyle
- Roderic Page
- Steve Baskauf
- Éamonn Ó Tuama (GBIF)