Conflict between DarwinCore and DublinCore usage of dcterms:type / basisOfRecord
Dear John
we (Gregor Hagedorn, Bob Morris, Steve Baskauf) realize that the public review period for DarwinCore (DwC) is over, but we believe we need to bring a potentially highly problematic issue to your attention. This issue was originally found by Steve Baskauf. Essentially, it is an issue that is not very apparent when reading DarwinCore for review, but is detected when trying to implement it in combination with other technologies.
Steve describes the problem on: http://www.keytonature.eu/wiki/MRTGv08_Type_term_inconsistent_with_DwC
DarwinCore seems to use dcterms:type in a way that is inconsistent with the DublinCore (DC) recommendations for publication artifacts, which is the way most users of DC are likely to use dcterms:type. Steve pointed out that MRTG's use, which does follow the DC recommendation, is inconsistent with DwC. We believe that this is not a problem of MRTG; the problem equally occurs, e.g., where natural history collections collaborate with the culture and library initiative Europeana.eu, which equally uses DublinCore type in the original sense.
DublinCore dcterms:type has an explicit type vocabulary: http://dublincore.org/documents/dcmi-terms/#terms-type whose annotation says: "Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]." This vocabulary: http://dublincore.org/documents/dcmi-type-vocabulary/ defines values like StillImage, Sound, MovingImage, Text.
In contrast, the DwC type vocabulary acts on an abstract level of recording occurrences that are independent of physical records. These occurrences can then be vouchered by physical resources like specimens, photos, movies, etc. The actual resources treated in DublinCore are therefore only potential vouchers for a DarwinCore resource. The terms recommended for DublinCore "type" are therefore expected in the DarwinCore "basisOfRecord" property.
We do not mean to imply that there is anything wrong with the DarwinCore perspective. Unfortunately, we believe that DarwinCore cannot coexist with DublinCore data as long as DarwinCore does not define its own dwc:type/dwc:abstractType property.
-------------
Test case: An image showing a taxon observation shall be documented both in DarwinCore and DublinCore.
DarwinCore prescribes or recommends dcterms:type=Occurrence, plus: basisOfRecord=StillImage.
DublinCore recommends dcterms:type=StillImage.
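
The test case can be illustrated with a minimal sketch. The plain dictionaries and the helper name conflicting_terms are assumptions made for illustration only; neither standard prescribes this representation.

```python
# Hypothetical stand-ins for the two descriptions of the same image.
# Plain dicts are an assumption; real data would use an actual DwC/DC serialization.
dwc_record = {"dcterms:type": "Occurrence", "dwc:basisOfRecord": "StillImage"}
dc_record = {"dcterms:type": "StillImage"}


def conflicting_terms(a, b):
    """Return the terms present in both records whose values differ."""
    shared = a.keys() & b.keys()
    return {term: (a[term], b[term]) for term in shared if a[term] != b[term]}


conflicts = conflicting_terms(dwc_record, dc_record)
print(conflicts)  # {'dcterms:type': ('Occurrence', 'StillImage')}
```

The same term carries two different values for one resource, which is the conflict described in this message.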
-------------
We have internally begun to discuss possible solutions. In DublinCore, dcterms:type does not express a particular type of metadata record, but is metadata about the underlying resource. We therefore consider replacing the DwC use of dcterms:type with something in the dwc namespace, and replacing dwc:basisOfRecord with dcterms:type as an option that minimizes the necessary design changes in DwC. We can see some other issues arise that depend on how one tries to bring DwC into closer coherence with the DublinCore recommendations, but perhaps these are best put forth on a wiki.
Here we would like only to point out that we believe that the values for basisOfRecord fit into the dcterms:type vocabulary. Observations (dwc:HumanObservation and dwc:MachineObservation) may be placed as subtypes of http://purl.org/dc/dcmitype/Event, and specimens (dwc:PreservedSpecimen, dwc:FossilSpecimen, dwc:LivingSpecimen) as subtypes of http://purl.org/dc/dcmitype/PhysicalObject. For different communities, the dwc specimen types may have to be further subtyped as "Seed", "TissueSample", "DNA_Sample".
However, we believe it is not possible to create a hierarchy like "StillImage - isSubtypeOf - Image - isSubtypeOf - Occurrence" because this is a use-case dependent view: A character image may be a subtype of a taxon representation, and it may or may not be a subtype of an occurrence representation.
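
As a rough sketch, the subtype placement suggested above could be written as a lookup table. The table name PROPOSED_PARENT and the function parent_type are hypothetical; only the term names and the two DCMI type URIs come from the message.

```python
DCMITYPE = "http://purl.org/dc/dcmitype/"

# Proposed parent DCMI type for each basisOfRecord value, as suggested in the
# message. The dict itself is only an illustration, not part of any standard.
PROPOSED_PARENT = {
    "HumanObservation": DCMITYPE + "Event",
    "MachineObservation": DCMITYPE + "Event",
    "PreservedSpecimen": DCMITYPE + "PhysicalObject",
    "FossilSpecimen": DCMITYPE + "PhysicalObject",
    "LivingSpecimen": DCMITYPE + "PhysicalObject",
}


def parent_type(basis_of_record):
    """Return the proposed DCMI parent type, or None if none was proposed."""
    return PROPOSED_PARENT.get(basis_of_record)


print(parent_type("FossilSpecimen"))
print(parent_type("Seed"))  # no placement is listed for community subtypes here
```

A flat table like this only works because each listed value has exactly one parent; the use-case dependent hierarchy discussed in the last paragraph would not fit it.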
-------------
Best wishes
Gregor, Bob, Steve
Gregor and others,
Thanks for this very clear explanation of the problem. There were good reasons for defining dcterms:type and basisOfRecord the way they are published, but the reasons did not account for this serious conflict with the use of dcterms:type. This must be fixed, and right away. It is a pity it wasn't caught earlier, but a good thing it was caught early.
I think the most expedient solution is to switch the usage of dcterms:type and basisOfRecord, swapping their definitions and the type vocabulary control recommendations. That solution only gives me one real hesitation, which is that I believe there will be an explosion of new subtypes in our community, and the maintenance of them in the standard will become onerous. This was the primary reason for having basisOfRecord not formally controlled by a type vocabulary - because it would be changing all of the time.
So, a second solution is to remove the formal type vocabulary from Darwin Core altogether in combination with the switch in term definitions suggested above. In doing so, the expansion of type usage would be a matter of vocabulary control outside of the normative standard and much easier to maintain.
Thanks again for exposing this issue. Let's find a reasonable solution to this as quickly as possible. I favor the first solution, for now, as it will have the least impact on the standard. If that is agreeable, I'll go through the prescribed process for changes to the standard to get it implemented. Namely:
"3.3. Semantic changes in Darwin Core terms
"Changes of definitions within Darwin Core recommendations and/or Darwin Core term declarations will be reflected in the affected Darwin Core recommendation and/or Darwin Core term declaration. Semantic changes of this type will undergo a request for comments, and will result in a decision [DECISIONS]. If, in the judgment of the TDWG Architecture Group, such changes of meaning are likely to have substantial impact on either machine processing of Darwin Core terms or the functional semantics of the terms, then these changes will be reflected in a change of URI for the Darwin Core term or terms in question. The URIs for any new Darwin Core namespaces resulting from such changes will conform to the Darwin Core namespace URI pattern defined above.
"Requests for semantic changes to a term should be made to the Technical Architecture Group [DWC-USAGE], and should consist of a complete list of attributes to be changed along with a statement of justification for the changes."
Let's consider this to be the beginning of the request for comments.
John
On Fri, Oct 23, 2009 at 2:38 AM, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
--
Dr. Gregor Hagedorn Heinrich-Seidel-Str. 2 12167 Berlin skype: g.hagedorn
This message is sent on a personal basis and does not constitute an activity of the German Federal Government or its research institutions. Together with any attachments, this message is intended only for the person to whom it is addressed and may not be redistributed or published without permission.
I agree this should be changed as soon as possible. Do I understand your proposal John correctly to have:
dcterms:type uncontrolled by darwin core and expected to be StillImage, HumanObservation, etc. The regular DC vocabulary or the classic dwc basis of record?
and still keep dwc:basisOfRecord with dwc:Occurrence, dwc:Taxon, etc?
If so, basisOfRecord does not seem intuitive for those kinds of "classes" to me and I would prefer to either drop it altogether or define a new term instead - fully aware that this is opening up a can of worms again. Personally I think I would prefer to simply drop basisOfRecord. What are the downsides of this?
Markus
On Oct 23, 2009, at 18:48, John R. WIECZOREK wrote:
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
The downside is that we would have no idea what "Class" of record (Taxon, Occurrence, Location) a record is. That was the reason for adding dcterms:type to begin with. basisOfRecord may not be the best name for the term we need for the record class, and it has a history of being associated with the StillImage, PreservedSpecimen, etc.
How about we retain basisOfRecord, but have it refine dcterms:type, drop dcterms:type and add a "recordClass" term in its place that is governed exactly as dcterms:type is currently being used in the recently ratified version of the Core?
On Fri, Oct 23, 2009 at 10:26 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
How about we retain basisOfRecord, but have it refine dcterms:type, drop dcterms:type and add a "recordClass" term in its place that is governed exactly as dcterms:type is currently being used in the recently ratified version of the Core?
recordClass for Taxon/Occurrence/Event sounds good.
I am less sure about keeping the "perspective-dependent" basisOfRecord-term in a place where dcterms:type. The dcterms:type vocabulary is, in principle, extensible, and meant to be extended. Except, of course, that some specific XML implementations of Dublin Core prevent this... To avoid problems with this, one might desire to have only the strict resource type vocabulary in dcterms:type. Then this could be PhysicalObject/Event, with a dwc:subtype added to express PreservedSpecimen/MachineObservation etc. Essentially, MRTG intends to use such a mrtg:subtype as well to differentiate different StillImage or Text subtypes: http://www.keytonature.eu/wiki/MRTG_Schema_v0.8#Subtype
This would then mean that DarwinCore might support: dwc:recordClass, dcterms:type, dwc:subtype
Gregor
Gregor,
That sounds like a good solution to all of the problems. I would propose that the basisOfRecord IS the same thing as your proposed dwc:subtype, so we should keep basisOfRecord.
Net solution:
1) keep dcterms:type
2) use DCType vocabulary to control dcterms:type (so, StillImage, PhysicalObject, Event, etc.)
3) keep basisOfRecord
4) use our DwC-specific subtypes (PreservedSpecimen, FossilSpecimen, HumanObservation, etc.) as the controlled vocabulary for basisOfRecord without a formal type vocabulary (very close to how it is now, just some of the terms would go to dcterms:type).
5) add a recordClass term
6) use the DwCType vocabulary to control the recordClass term instead of the dcterms:type term.
This solution fixes the Dublin Core - Darwin Core controlled vocabulary problem, retains all existing terms, and isolates the controlled vocabulary that is specific to our domain, making it very easy to expand without changes to the standard.
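
A minimal sketch of how a record might be checked under this net solution follows. The function check_record, the dict layout, and the dwc:recordClass spelling are assumptions for illustration; the DCMI Type terms are those of the DCMI Type Vocabulary, and the record classes are the ones named in this thread.

```python
# Terms of the DCMI Type Vocabulary, used here to control dcterms:type.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Record classes mentioned in this thread; the real DwC type vocabulary may differ.
RECORD_CLASSES = {"Occurrence", "Event", "Location", "Taxon"}


def check_record(record):
    """Return a list of problems found in a record under the proposed scheme."""
    problems = []
    if record.get("dcterms:type") not in DCMI_TYPES:
        problems.append("dcterms:type is not a DCMI Type term")
    if record.get("dwc:recordClass") not in RECORD_CLASSES:
        problems.append("dwc:recordClass is not a known record class")
    # basisOfRecord is deliberately not checked against a closed list,
    # since its vocabulary is meant to grow outside the standard.
    return problems


specimen = {
    "dwc:recordClass": "Occurrence",
    "dcterms:type": "PhysicalObject",
    "dwc:basisOfRecord": "PreservedSpecimen",
}
print(check_record(specimen))  # []
```

Leaving basisOfRecord unchecked mirrors point 4: its values can expand without any change to the validation or to the standard.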
Any objections?
John
On Fri, Oct 23, 2009 at 12:33 PM, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
Yes! For one thing, the "request for comments" hasn't even been open for four hours and we are already suggesting final solutions. I've been at work and have hardly had time to read the emails yet, let alone think them through. There may be people on the other side of the world who haven't even got out of bed yet.
I have some serious concerns, but I will need to have time to drive home, read carefully and formulate a reply. Slow down! Steve Baskauf
John R. WIECZOREK wrote:
Any objections?
John
My apologies to John Wieczorek for the panicked tone of my previous email to the list. It appeared to me (apparently incorrectly) that the issue was being closed without addressing the concern that I raised when I initially brought this up in my post to the MRTG wiki (http://www.keytonature.eu/wiki/MRTGv08_Type_term_inconsistent_with_DwC). I have the following general concerns about what is being proposed. I will follow with a particular use-case comment in a separate email.
*Hierarchy clarification needed. * If I am understanding the proposal as John has summarized it there would be three terms that could apply to a resource with metadata subject to DwC. A new DwC term /recordClass/ has with controlled values corresponding to the TDWG classes: "Occurrence", "Event", "Location", "Taxon". /dcterms:type/ is another term which could theoretically have DCMI Type values of: "Collection", "Dataset", "Event", "Image", "InteractiveResource", "MovingImage", "PhysicalObject", "Service", "Software", "Sound", "StillImage", or "Text" (although not all may be appropriate in the DwC context). It is not clear to me in John's proposal whether the assignment of /dcterms:type/ is intended to be independent of the value of /recordClass/, or if /dcterms:type/ is intended to be a subclass of /recordClass/ Occurrence (i.e. applicable only to resources that represent things that can document Occurrences such as Images and PhysicalObjects). In Gregor's email initiating this discussion he indicated that he felt that they should be independent. But in John's proposal, the third term: /basisOfRecord/ is clearly intended to be a subclass of /dcterms:type/ with possible values of "StillImage", "MovingImage", "Sound", "PreservedSpecimen", FossilSpecimen", LivingSpecimen", "HumanObservation", "MachineObservation". Since all of these /basisOfRecord/ objects are the bases for documenting Occurrences, that insinuates that their parent /dcterms:type/ terms should fall under /recordClass/ Occurrence.
*Problems with calling /basisOfRecord/ a subclass of /dcterms:type/.* PreservedSpecimen, FossilSpecimen, and LivingSpecimen can clearly fall under PhysicalObject, but what do we do with the rest? /basisOfRecord/ terms StillImage and MovingImage could be subtypes of /dcterms:type/ Image, but what about a Sound? Is /basisOfRecord/ Sound a subtype of /dcterms:type/ Sound? What about a 35mm slide picturing an organism? It is an Image, but is also a PhysicalObject. Gregor suggested that /basisOfRecord/ HumanObservation and /basisOfRecord/ MachineObservation could be subtypes of /dcterms:type/ Event but I disagree. Just like observations, StillImages and PreservedSpecimens have Event information associated with them (the time and location of their creation) but we don't classify them as Events. An Event is a conceptually different thing from the resource that is created at that Event.
*How does this change facilitate machine processing?* In John's request for comments, he quoted "3.3. Semantic changes in Darwin Core terms", which mentioned "if ... such changes of meaning are likely to have substantial impact on either machine processing of Darwin Core terms... " It is not at all clear to me how the proposed reorganization of these three terms (/recordClass/, /dcterms:type/, and /basisOfRecord/) will facilitate machine processing, in particular because of the problems in associating particular /basisOfRecord/ terms with /dcterms:type/ terms that I discussed in the previous paragraph. From a machine-processing standpoint, it makes a lot more sense to subclass /recordClass/ Occurrence as follows:
- *PhysicalObject* (including /basisOfRecord/ "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen", and any other relevant material objects such as Seeds, FilmImage, etc.)
- *DigitalObject* (including /basisOfRecord/ "StillImage", "MovingImage", "Sound", and any other relevant file-representable objects such as DnaSequences)
- *NoObject* (including /basisOfRecord/ "HumanObservation", "MachineObservation")
The consuming application that is receiving the metadata can then know that, if the record involves an Occurrence and the record's subclass is:
- PhysicalObject, then there is an object somewhere that is not deliverable through the Internet but which could be visited in an herbarium or museum.
- DigitalObject, then there is a representational file that should be retrieved and presented to the user of the application in an appropriate way.
- NoObject, then there will only be metadata including measurements but nothing for the user to see, hear, etc.
Alternatively, rather than creating the three subclasses, simply create a property for resources of the class Occurrence called "objectType" and allow it to have values of PhysicalObject, DigitalObject, or NoObject. This would serve the same purpose of informing the consuming application of the nature of the resource without having to create another hierarchical layer. From a machine-processing standpoint, I don't see a great benefit to presenting a biodiversity-related consuming application with both /dcterms:type/ and /basisOfRecord/.
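A minimal sketch of how a consuming application might act on such a property. The property name "objectType" and its three values come from the suggestion above; the function name, the dictionary layout of a record, and the returned action strings are hypothetical and only illustrate the dispatch.

```python
# Hypothetical sketch: dispatch on the suggested "objectType" property.
# Only the three values are taken from the proposal; the record layout,
# function name, and action strings are assumptions for illustration.

OBJECT_TYPES = {"PhysicalObject", "DigitalObject", "NoObject"}


def plan_presentation(record: dict) -> str:
    """Return what a consuming application might do with a record."""
    if record.get("recordClass") != "Occurrence":
        return "not an Occurrence: objectType does not apply"

    object_type = record.get("objectType")
    if object_type not in OBJECT_TYPES:
        raise ValueError(f"unknown objectType: {object_type!r}")

    if object_type == "PhysicalObject":
        # A specimen or similar: it can be visited but not delivered online.
        return "show holding institution; object cannot be delivered online"
    if object_type == "DigitalObject":
        # An image, sound, or other file-representable resource.
        return "retrieve the file and present it"
    # NoObject: an observation with metadata and measurements only.
    return "show metadata and measurements only"
```

With this shape, the application needs a single lookup to decide whether there is a file to fetch, and it never has to reconcile /dcterms:type/ with /basisOfRecord/.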
*An alternative.* To me, trying to merge /dcterms:type/ and its DCMI Type values together with the new DwC /recordClass/ and /basisOfRecord/ is like trying to fit a square peg in a round hole. DCMI wasn't created with biodiversity records in mind and its vocabulary really doesn't mesh very well with DwC. I fully support the idea of creating the new DwC term /recordClass/ to contain what DwC formerly put in /dcterms:type/. However, I think it would be better to leave the actual /dcterms:type/ and its DCMI Type values as an independent thing without trying to make DwC /basisOfRecord/ its subtype. Biodiversity media providers who need their databases to mesh with non-biodiversity records should assign /dcterms:type/ values to their records; non-media providers can if they want. Biodiversity data providers (including those who provide media) should assign /recordClass/ and /basisOfRecord/ values to their records.
Steve Baskauf
On Fri, Oct 23, 2009 at 4:20 PM, John R. WIECZOREK tuco@berkeley.edu wrote:
Gregor,
That sounds like a good solution to all of the problems. I would propose that the basisOfRecord IS the same thing as your proposed dwc:subtype, so we should keep basisOfRecord.
Net solution:
1) keep dcterms:type
2) use DCType vocabulary to control dcterms:type (so, StillImage, PhysicalObject, Event, etc.)
3) keep basisOfRecord
4) use our DwC-specific subtypes (PreservedSpecimen, FossilSpecimen, HumanObservation, etc.) as the controlled vocabulary for basisOfRecord without a formal type vocabulary (very close to how it is now, just some of the terms would go to dcterms:type).
5) add a recordClass term
6) use the DwCType vocabulary to control the recordClass term instead of the dcterms:type term.
This solution fixes the Dublin Core - Darwin Core controlled vocabulary problem, retains all existing terms, and isolates the controlled vocabulary that is specific to our domain, making it very easy to expand without changes to the standard.
Any objections?
John
On Fri, Oct 23, 2009 at 12:33 PM, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
How about we retain basisOfRecord, but have it refine dcterms:type, drop dcterms:type and add a "recordClass" term in its place that is governed exactly as dcterms:type is currently being used in the recently ratified version of the Core?
recordClass for Taxon/Occurrence/Event sounds good.
I am less sure about keeping the "perspective-dependent" basisOfRecord-term in a place where dcterms:type. The dcterms:type vocabulary is, in principle, extensible, and meant to be extended. Except, of course, some specific XML implementations of Dublin Core prevent this... To avoid problems with this, one might desire to have only the strict resource type vocabulary in dcterms:type. Then this could be PhysicalObject/Event, with a dwc:subtype added to express PreservedSpecimen/MachineObservation etc. Essentially, MRTG intends to use such a mrtg:subtype as well to differentiate different StillImage or Text subtypes: http://www.keytonature.eu/wiki/MRTG_Schema_v0.8#Subtype
This would then mean DarwinCore might support: dwc:recordClass, dcterms:type, dwc:subtype
Gregor
It seems to me that there is an underlying issue that makes some of the DwC typing mechanisms difficult to apply to multimedia, at least in the breadth with which MRTG means to approach it: DwC is heavily slanted towards documenting organisms as opposed to documenting descriptions. Gregor's (and my) favorite examples are pictures meant to illustrate a character and its states. It's possible, but likely pointless, to document a picture such as http://bit.ly/pottedTomato as only, or even primarily, something about the particular organism photographed. The photograph was (speculatively) taken to illustrate the concept of compound leaf for use in the Morphster ontobrowser. Even if its original purpose \were/ to document, say, an Occurrence, MRTG attempts to provide assistance in determining, without fetching the media, a resource's fitness-for-use for some use perhaps unknown or of no interest to the originator of the image. To support this, a third party might be motivated to create a MRTG or even a DwC record as an annotation of the original resource record. Such a new record must sometimes not be bound by any semantics that tie it to that particular potted tomato plant, or the time and place the picture was taken.
This particular example might be addressed by adding, e.g., CharacterIllustration to the DwC-specific basis of record vocabulary. That might be a good idea, but it does not fully address my worry, which is how scalable DwC is to concerns wider than documenting occurrences. Even if such scalability is to come principally from the addition of recordClass terms for specific uses, it would be good if one could examine whether such extensions have unintended consequences.
Unbridled class extension has a dark side: it's easy to introduce inconsistencies and circularities.
Bob Morris
I've wrestled with similar issues; namely collections of images that span from obvious occurrence records to illustrative images of the sort that Bob shows, to diagrams of particular specimens, to abstract diagrams of no specimen in particular.
My concern about Bob's CharacterIllustration BoR is that it is not mutually exclusive with the others. For example, a StillImage in-situ could represent both a geographic occurrence and a representation of a particular morphological character. How to represent such cases: two separate records?
My gut feeling is that we need to separate records that represent an occurrence from records that represent the evidence documenting the occurrence. Very often we have underwater video of a fish in its habitat, then we collect the specimen, then we take a digital photograph of the prepared specimen. I assume the appropriate way to represent this through DwC is via three separate occurrence records, each appropriately typed, and each cross-referenced to the others. But perhaps there should be only one occurrence record, with three cross-linked "Evidence" records of some sort.
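A rough sketch of the second option, one occurrence record with three cross-linked "Evidence" records, using the fish example. Nothing here is DwC: the class names, field names, and identifiers are all hypothetical, and only the kinds of evidence are borrowed from the basisOfRecord values discussed in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Hypothetical record for one piece of evidence documenting an occurrence."""
    evidence_id: str
    kind: str  # e.g. "MovingImage", "PreservedSpecimen", "StillImage"
    related: list = field(default_factory=list)  # ids of sibling Evidence records


@dataclass
class Occurrence:
    """Hypothetical occurrence record that owns its evidence records."""
    occurrence_id: str
    evidence: list = field(default_factory=list)

    def add_evidence(self, item: Evidence) -> None:
        # Cross-link the new item with every evidence record already attached.
        for other in self.evidence:
            other.related.append(item.evidence_id)
            item.related.append(other.evidence_id)
        self.evidence.append(item)


# One fish, documented three ways (identifiers are made up).
fish = Occurrence("occ-1")
fish.add_evidence(Evidence("ev-video", "MovingImage"))
fish.add_evidence(Evidence("ev-specimen", "PreservedSpecimen"))
fish.add_evidence(Evidence("ev-photo", "StillImage"))
```

In this shape the occurrence is stated once, and each evidence record carries its own type, which sidesteps the question of which single type the occurrence itself should have.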
Too early in the morning for me to think this through thoroughly -- just throwing it out there.
Aloha, Rich
Your examples step on the dark side of class extension that I mentioned, and are a special case of my main concern in that posting, which is that one must think of unintended consequences of any proposal to fix the agreed-upon problem.
Off-topic, but perhaps an answer to your question about multiple records: one model is what we hope will emerge at the Annotations Working Session in Montpellier. With sufficient cyberinfrastructure for annotations, it shouldn't matter what is the original purpose (or even the original metadata or content schema) of a digital record. A set of annotations against an available record or set of records should form a kind of discoverable view of the original record(s). Those annotations might even be machine generated, as might be the case for, e.g., error corrections such as addressing name mis-spellings, or translations from one metadata schema to another. Nevertheless, the payload of annotations will be against \some/ metadata schema, and DwC is likely to be a common one, so the problem of this thread is not irrelevant to real implementations of annotation mechanisms.
Bob
The proposed solution is simpler than people seem to be thinking. From my previous post...
Net solution:
1) keep dcterms:type
2) use DCType vocabulary to control dcterms:type (so, StillImage, PhysicalObject, Event, etc.)
3) keep basisOfRecord
4) use our DwC-specific subtypes (PreservedSpecimen, FossilSpecimen, HumanObservation, etc.) as the controlled vocabulary for basisOfRecord without a formal type vocabulary (very close to how it is now, just some of the terms would go to dcterms:type).
5) add a recordClass term
6) use the DwCType vocabulary to control the recordClass term instead of the dcterms:type term.
Net solution more fully explained: 1) and 2). dcterms:type will be used in Darwin Core exactly as in the Dublin Core, with exactly the same controlled vocabulary as in Dublin Core ("Collection", "Dataset", "Event", "Image", "InteractiveResource", "MovingImage", "PhysicalObject", "Service", "Software", "Sound", "StillImage", or "Text").
3) and 4). basisOfRecord will be used in Darwin Core as it is now, without a formal type vocabulary. The recommended controlled vocabulary will continue to be managed outside of the standard as supplementary documentation, as was ratified already. The current recommendations are given at http://code.google.com/p/darwincore/wiki/RecordLevelTerms#basisOfRecord. The values on this list can be used or not, changed or not, or added to without affecting the Darwin Core standard. When I mentioned "some of the terms would go to dcterms:type" in my net solution, above, I was thinking that it would be redundant to keep "StillImage", "MovingImage", and "Sound" on the list of controlled vocabulary for basisOfRecord, as they are already in dcterms:type. Communities would be free to add to the vocabulary to the level of specificity they require. For example, MRTG could dispense with the mrtg:subtype term and use dwc:basisOfRecord instead - adding "Photograph", for example, to the controlled vocabulary list. This is exactly the sort of thing basisOfRecord was always meant for.
5) and 6). Add dwc:recordClass and use the formal DwCType vocabulary (Taxon, Occurrence, Location, Event) to control this term rather than control dcterms:type.
One-liner summaries of actual changes to make:
1) Let dcterms:type comply 100% with Dublin Core
2) Create dwc:recordClass to do what was attempted incorrectly with dcterms:type.
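As a rough illustration of what the two changes mean for a record, here is a small sketch that checks dcterms:type against the DCMI Type values and dwc:recordClass against the DwCType values listed above, and deliberately leaves dwc:basisOfRecord unchecked because its vocabulary is only recommended. The function name and the dictionary layout are hypothetical.

```python
# Values as listed earlier in this message.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}
DWC_TYPES = {"Taxon", "Occurrence", "Location", "Event"}


def check_record(record: dict) -> list:
    """Return a list of vocabulary problems found in a record (hypothetical helper)."""
    problems = []

    dc_type = record.get("dcterms:type")
    if dc_type is not None and dc_type not in DCMI_TYPES:
        problems.append(f"dcterms:type {dc_type!r} is not a DCMI Type value")

    record_class = record.get("dwc:recordClass")
    if record_class is not None and record_class not in DWC_TYPES:
        problems.append(f"dwc:recordClass {record_class!r} is not a DwCType value")

    # dwc:basisOfRecord is intentionally not validated: its vocabulary is
    # recommended, not formal, and communities may extend it.
    return problems
```

A record that still puts "Occurrence" in dcterms:type, as the ratified version did, would be reported, while any basisOfRecord value passes.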
Use cases from Bob:
UC-Bob1 * http://bit.ly/AudubonOspreyDescription an original Audubon manuscript describing the Osprey in the Audubon Osprey drawing
UC-Bob2 * http://bit.ly/AudubonOspreyPrint an original of the Audubon Osprey print, for sale at a gallery, or as in a stable Collection
UC-Bob3 * http://bit.ly/AudubonOspreyDigitalImage N.Y. Public Library Digital Image of Audubon Osprey print.
UC-Bob4 * http://www.flickr.com/photos/mikebaird/324182767/ a cc licensed picture on Flickr of an Osprey, georeferenced to named location.
UC-Bob1, UC-Bob2, and UC-Bob3 are not Darwin Core resources - they can't be made into records of any of the DwCTypes (Taxon, Occurrence, Location, Event). This doesn't mean that DwC terms couldn't be used to describe these resources - you just can't make Darwin Core records out of them.
UC-Bob4 can be made into a Darwin Core Occurrence record having:
dcterms:type = "StillImage"
dwc:basisOfRecord = "Photograph" or "DigitalStillImage" or "DigitalPhotograph" or whatever vocabulary you decide.
dwc:recordClass = "Occurrence"
Use cases from Steve:
UC-Steve1 * In the example of a live plant image from http://www.cas.vanderbilt.edu/bioimages/species/frame/oslo.htm I would assign the image record:
DwC:recordClass = Occurrence
DwC:basisOfRecord = StillImage
dcterms:type = StillImage
mrtg:subtype = Photograph
[Note that DwC:basisOfRecord is not synonymous with mrtg:subtype as it currently stands. Would it have to be under John's proposal?]
UC-Steve2 * In the example of an image of an herbarium sheet shown at http://www.morphbank.net/Show/?pop=Yes&id=142009 I would assign the record for the herbarium sheet itself:
DwC:recordClass = Occurrence
DwC:basisOfRecord = PreservedSpecimen
dcterms:type = PhysicalObject
UC-Steve3 * and for the record of the specimen image:
DwC:recordClass = Occurrence
DwC:basisOfRecord = StillImage
dcterms:type = StillImage
mrtg:subtype = Photograph
For UC-Steve1, looking at the URL, I see nothing that suggests that the resource at http://www.cas.vanderbilt.edu/bioimages/species/frame/oslo.htm refers to an Occurrence record, but I suppose it is no different from having a specimen with no location information. Nevertheless, if the resource was being described with DwC terms, I would assign:
dcterms:type = "StillImage"
dwc:basisOfRecord = "Photograph" or "DigitalStillImage" or "DigitalPhotograph" or whatever vocabulary you decide.
dwc:recordClass = "Occurrence"
For UC-Steve2 the Alaska museum of the North should have an Occurrence record for that specimen with:
dcterms:type = "PhysicalObject"
dwc:basisOfRecord = "PreservedSpecimen"
dwc:recordClass = "Occurrence"
The image would be a different resource that could be referred to by the Occurrence record via dwc:associatedMedia or through an instance of the dwc:ResourceRelationship class. The image resource could be described by the terms:
dcterms:type = "StillImage"
dwc:basisOfRecord = "Photograph" or "DigitalStillImage" or "DigitalPhotograph" or whatever vocabulary you decide.
dwc:recordClass = "Occurrence"
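To make the UC-Steve2 arrangement concrete, here is a small sketch of the two records with the specimen pointing at the image through dwc:associatedMedia. The term names and values are the ones given above; the "id" key, the specimen identifier, the choice of "Photograph", and the helper function are hypothetical.

```python
# Occurrence record for the herbarium sheet itself.
specimen = {
    "id": "urn:example:specimen-1",  # made-up identifier for illustration
    "dcterms:type": "PhysicalObject",
    "dwc:basisOfRecord": "PreservedSpecimen",
    "dwc:recordClass": "Occurrence",
    # The specimen record refers to its image by the image's identifier.
    "dwc:associatedMedia": ["http://www.morphbank.net/Show/?pop=Yes&id=142009"],
}

# Separate record describing the image of the sheet.
image = {
    "id": "http://www.morphbank.net/Show/?pop=Yes&id=142009",
    "dcterms:type": "StillImage",
    "dwc:basisOfRecord": "Photograph",  # one of the candidate values above
    "dwc:recordClass": "Occurrence",
}

# Index of known records by identifier.
records = {r["id"]: r for r in (specimen, image)}


def media_for(record: dict, index: dict) -> list:
    """Return the known records listed in a record's dwc:associatedMedia."""
    return [index[m] for m in record.get("dwc:associatedMedia", []) if m in index]
```

Calling media_for(specimen, records) returns the image record, so a consumer can tell from dcterms:type that the specimen is a physical object while its linked resource is a retrievable still image.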
I hope that helps.
John
On Sat, Oct 24, 2009 at 9:48 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
I've wrestled with similar issues; namely collections of images that span from obvious occurrence records to illustrative images of the sort that Bob shows, to diagrams of particular specimens, to abstract diagrams of no specimen in particular.
My concern about Bob's CharacterIllustration BoR is that this is non-mutually exclusive to others. For example, a StillImage in-situ could represent both a geographic occurrence and a representation of a particular morphological character. How to represent such cases: two separate records?
My gut feeling is that we need to separate records that represent an occurrence, from records that represent the evidence documenting the occurrence. Very often we have undewater video of a fish in its habitat, then we collect the specimen, then we take a prepared specimen digital photograph. I assume the appropriate way to represent this through DwC is via three separate occurrence records, each appropriately types, and each cross-referenced to each other. But perhaps there should be only one occurrence record, with three cross-linked "Evidence" records of some sort.
Too early in the morning for me to think this through thoroughly -- just throwing it out there.
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Bob Morris Sent: Saturday, October 24, 2009 5:50 AM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; Steve Baskauf; Vishwas Chavan (GBIF) Subject: Re: [tdwg-content] Conflict between DarwinCore and DublinCore usageof dcterms:type / basisOfRecord
It seems to me that there is an underlying issue that makes some of the DwC typing mechanisms difficult to apply to multimedia---at least in the breadth the MRTG means to approach it--is that DwC is heavily slanted towards documenting organisms as opposed to documenting descriptions. Gregor's (and my) favorite examples are pictures meant to illustrate a character and its states. It's possible, but likely pointless, to document a picture such as http://bit.ly/pottedTomato as only, or even primarily, something about the particular organism photographed. The photograph was (speculatively) taken to illustrate the concept of compound leaf for use in the Morphster ontobrowser. Even if its original purpose \were/ to document, say, an Occurrence, MRTG attempts to provide assistance in determining, without fetching the media, a resource's fitness-for-use for some use perhaps unknown or of no interest to the originator of the image. To support this, a third-party might be motivated to create a MRTG or even a DwC record as an annotation of the original resource record. Such a new record must sometimes not be bound by any semantics that tie it to that particular potted tomato plant, or the time and place the picture was taken.
This particular example might be addressed by adding, e.g. CharacterIllustration, to the DwC-specific basis of record vocabulary. That might be a good idea, but it does not fully address my worry, which is how scalable is DwC to concerns wider than documenting occurrences. Even if such scalability is to be principally the addition of recordClass terms for specific uses, it would be good if one can examine whether such extensions have unintended consequences.
Unbridled class extension has a dark side: it's easy to introduce inconsistencies and circularities.
Bob Morris
On Fri, Oct 23, 2009 at 4:20 PM, John R. WIECZOREK tuco@berkeley.edu wrote:
Gregor,
That sounds like a good solution to all of the problems. I would propose that basisOfRecord IS the same thing as your proposed dwc:subtype, so we should keep basisOfRecord.
Net solution:
1) keep dcterms:type
2) use DCType vocabulary to control dcterms:type (so, StillImage, PhysicalObject, Event, etc.)
3) keep basisOfRecord
4) use our DwC-specific subtypes (PreservedSpecimen, FossilSpecimen, HumanObservation, etc.) as the controlled vocabulary for basisOfRecord without a formal type vocabulary (very close to how it is now, just some of the terms would go to dcterms:type).
5) add a recordClass term
6) use the DwCType vocabulary to control the recordClass term instead of the dcterms:type term.
This solution fixes the Dublin Core - Darwin Core controlled vocabulary problem, retains all existing terms, and isolates the controlled vocabulary that is specific to our domain, making it very easy to expand without changes to the standard.
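A minimal sketch of how the three terms in the net solution above might be checked together, assuming a record is a plain dictionary of term names to string values. The DCMI list is the published DCMI Type Vocabulary; the recordClass values are only the three mentioned in this thread, and the helper name check_record is hypothetical.

```python
# DCMI Type Vocabulary terms, which would control dcterms:type.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Assumed recordClass values, taken only from the ones named in this thread.
RECORD_CLASSES = {"Taxon", "Occurrence", "Event"}


def check_record(record):
    """Return a list of problems found in a record's three typing terms."""
    problems = []
    if record.get("dcterms:type") not in DCMI_TYPES:
        problems.append("dcterms:type is not a DCMI Type term")
    if record.get("dwc:recordClass") not in RECORD_CLASSES:
        problems.append("dwc:recordClass is not a known record class")
    # basisOfRecord stays open: its vocabulary is recommended, not formal.
    if not record.get("dwc:basisOfRecord"):
        problems.append("dwc:basisOfRecord is empty")
    return problems


specimen = {
    "dcterms:type": "PhysicalObject",
    "dwc:basisOfRecord": "PreservedSpecimen",
    "dwc:recordClass": "Occurrence",
}
print(check_record(specimen))  # prints []
```

Only dcterms:type and recordClass are validated against closed lists here; basisOfRecord is left open on purpose, which mirrors point 4 of the list.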
Any objections?
John
On Fri, Oct 23, 2009 at 12:33 PM, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
How about we retain basisOfRecord, but have it refine dcterms:type, drop dcterms:type and add a "recordClass" term in its place that is governed exactly as dcterms:type is currently being used in the recently ratified version of the Core?
recordClass for Taxon/Occurrence/Event sounds good.
I am less sure about keeping the "perspective-dependent" basisOfRecord-term in a place where dcterms:type. The dcterms:type vocabulary is, in principle, extensible, and meant to be extended. Except, of course, some specific XML implementations of Dublin Core prevent this... To avoid problems with this one might desire to have only the strict resource type vocabulary in dcterms:type. Then this could be PhysicalObject/Event and a dwc:subtype added to express PreservedSpecimen/MachineObservation etc. Essentially, MRTG intends to use such a mrtg:subtype as well to differentiate different StillImage or Text subtypes: http://www.keytonature.eu/wiki/MRTG_Schema_v0.8#Subtype
This would then mean, DarwinCore might support:
dwc:recordClass
dcterms:type
dwc:subtype
Gregor
-- Robert A. Morris Professor of Computer Science (nominally retired) UMASS-Boston 100 Morrissey Blvd Boston, MA 02125-3390 Associate, Harvard University Herberia email: ram@cs.umb.edu web: http://bdei.cs.umb.edu/ web: http://etaxonomy.org/FilteredPush http://www.cs.umb.edu/~ram phone (+1)617 287 6466 _______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Comments inline.
John R. WIECZOREK wrote:
One-liner summaries of actual changes to make:
- Let dcterms:type comply 100% with Dublin Core
- Create dwc:recordClass to do what was attempted incorrectly with dcterms:type.
OK, I think this is fine if basisOfRecord is not supposed to be a subclass of dcterms:type. Then is basisOfRecord considered to be a subclass of the new dwc:recordClass that replaces the former DwC use of dcterms:type as would be inferred from the "definition" line of http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord ?
Use cases from Steve: UC-Steve1
- In the example of a live plant image from http://www.cas.vanderbilt.edu/bioimages/species/frame/oslo.htm I would assign image record
DwC:recordClass = Occurrence
DwC:basisOfRecord = StillImage
dcterms:type = StillImage
mrtg:subtype = Photograph
[Note that DwC:basisOfRecord is not synonymous with mrtg:subtype as it currently stands. Would it have to be under John's proposal?]
...
For UC-Steve1, looking at the URL, I see nothing that suggests that the resource at http://www.cas.vanderbilt.edu/bioimages/species/frame/oslo.htm refers to an Occurrence record, but I suppose it is no different from having a specimen with no location information. Nevertheless, if the resource was being described with DwC terms, I would assign:
dcterms:type = "StillImage"
dwc:basisOfRecord = "Photograph" or "DigitalStillImage" or "DigitalPhotograph" or whatever vocabulary you decide.
dwc:recordClass = "Occurrence"
These images are geolocated - I just haven't finished writing the software to display the location information yet. My intention eventually is to make available all metadata that would be available for a specimen AND for an image resource type, hence my interest in both DwC and MRTG.
The image would be a different resource that could be referred to by the Occurrence record via dwc:associatedMedia or through an instance of the dwc:ResourceRelationship class. The image resource could be described by the terms:
dcterms:type = "StillImage"
dwc:basisOfRecord = "Photograph" or "DigitalStillImage" or "DigitalPhotograph" or whatever vocabulary you decide.
dwc:recordClass = "Occurrence"
I hope that helps.
Yes, that helps a lot. I agree that they should be separate records. I have a comment/question related to the issue of relationships between resources of these types, but since it's also related to another comment as well, I'll attach it to that message.
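A small sketch of the "separate records" arrangement discussed above, assuming records are plain dictionaries keyed by an identifier. The example.org identifiers, the "id" key, the "|" separator for dwc:associatedMedia, and the helper associated_media are all illustrative assumptions rather than anything defined in the thread.

```python
# Hypothetical identifiers; example.org is a placeholder, not a real service.
IMAGE_ID = "http://example.org/images/oslo-001"

image_record = {
    "id": IMAGE_ID,
    "dcterms:type": "StillImage",
    "dwc:basisOfRecord": "Photograph",
    "dwc:recordClass": "Occurrence",
}

occurrence_record = {
    "id": "http://example.org/occurrences/oslo-001",
    "dwc:recordClass": "Occurrence",
    # Assumed here to be a "|"-separated list of media identifiers.
    "dwc:associatedMedia": IMAGE_ID,
}


def associated_media(occurrence, records_by_id):
    """Return the media records an occurrence points to, skipping unknown ids."""
    raw = occurrence.get("dwc:associatedMedia", "")
    ids = [part.strip() for part in raw.split("|") if part.strip()]
    return [records_by_id[i] for i in ids if i in records_by_id]


records = {image_record["id"]: image_record}
media = associated_media(occurrence_record, records)
print([m["dcterms:type"] for m in media])  # prints ['StillImage']
```

The point of the sketch is only that the image keeps its own typing terms while the Occurrence record refers to it by identifier.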
Steve
More comments inline.
On Sat, Oct 24, 2009 at 12:02 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
Comments inline.
John R. WIECZOREK wrote:
One-liner summaries of actual changes to make:
- Let dcterms:type comply 100% with Dublin Core
- Create dwc:recordClass to do what was attempted incorrectly with dcterms:type.
OK, I think this is fine if basisOfRecord is not supposed to be a subclass of dcterms:type. Then is basisOfRecord considered to be a subclass of the new dwc:recordClass that replaces the former DwC use of dcterms:type as would be inferred from the "definition" line of http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord ?
A few points of information. basisOfRecord is a property, not a class, and so it cannot be a subclass. dcterms:type is also a property, not a class. basisOfRecord has no value for the narrowerThan attribute, which it would need to have if it was a sub-property of anything. It isn't. Nor would I propose that it should be. The definition of basisOfRecord was
"The specific nature of the data record - a subtype of the dcterms:type. Recommended best practice is to use a controlled vocabulary."
The word subtype here has no ontological meaning, it is simply used for human understanding of the relationship between the two terms (properties). Under the changes I've proposed the new definition of basisOfRecord should be
"The specific nature of the data record - a subtype of the recordClass. Recommended best practice is to use a controlled vocabulary."
The following may be confusing and isn't necessary to the resolution of the issue mentioned, but I thought I'd bring it up for the sake of completeness. Continue at your own risk. The DublinCore type vocabulary (Image, StillImage, MovingImage, etc.) is composed of classes, not properties, and some of the classes are subtypes of others (for example, StillImage and MovingImage have the subClassOf attribute <rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/Image"/>). One could create formal vocabulary terms under the Darwin Core type vocabulary for dwctype:PreservedSpecimen, dwctype:FossilSpecimen, dwctype:LivingSpecimen as subclasses of dcmitype:PhysicalObject (<rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>), for dwctype:HumanObservation and dwctype:MachineObservation as subclasses of dcmitype:Event (<rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/Event"/>), and for the remaining terms (TaxonDistribution, TaxonName, NomenclaturalAct, and TaxonNameUsage) as entirely new classes. This was actually done in a previous iteration of the Darwin Core before public review. These new classes (the new vocabulary terms) could be used as values of dcterms:type (in XML, for example, by declaring <dcterms:type xsi:type="dwc:DwCType">) and a basisOfRecord term wouldn't be needed. This is what I mean by a formal vocabulary. In the Darwin Core as published this option was abandoned as being too much of a maintenance burden even within our community, as there is a tendency to split and lump and split ad infinitum. Precedent for not taking on more than can be reasonably handled by volunteer labor can be found in Thomas Baker's "Maintaining a vocabulary: Practices, policies, and models around Dublin Core" (http://dcpapers.dublincore.org/ojs/pubs/article/viewFile/765/761), in which the tendency toward simplicity is defended.
In the face of this challenge it was deemed prudent to use an informal recommended controlled vocabulary for basisOfRecord that could be managed without affecting the standard, and therefore without incurring the process overhead every time someone wants to add new capabilities using Darwin Core. Time will tell us if this was the right course, but for now, it is what we all ratified by not saying anything to the contrary.
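To make the formal-vocabulary option described above more concrete, here is a rough sketch of the class hierarchy as a lookup table. The parent links only restate the subclass relations named in the message; the dwctype: terms are the would-be terms under discussion (not published ones), and the helpers ancestors and dcmi_root are hypothetical.

```python
# child -> parent, restating the rdfs:subClassOf relations described above.
SUBCLASS_OF = {
    "dcmitype:StillImage": "dcmitype:Image",
    "dcmitype:MovingImage": "dcmitype:Image",
    "dwctype:PreservedSpecimen": "dcmitype:PhysicalObject",
    "dwctype:FossilSpecimen": "dcmitype:PhysicalObject",
    "dwctype:LivingSpecimen": "dcmitype:PhysicalObject",
    "dwctype:HumanObservation": "dcmitype:Event",
    "dwctype:MachineObservation": "dcmitype:Event",
}


def ancestors(term):
    """Return the chain of superclasses of a term, nearest first."""
    chain = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        chain.append(term)
    return chain


def dcmi_root(term):
    """Return the most general DCMI type of a term, or None if it has none."""
    for candidate in reversed([term] + ancestors(term)):
        if candidate.startswith("dcmitype:"):
            return candidate
    return None


print(dcmi_root("dwctype:PreservedSpecimen"))  # prints dcmitype:PhysicalObject
```

With such a table a machine can derive the Dublin Core type from the Darwin Core term, which is what a formal vocabulary buys; the cost is that every new term means editing the table, which is the maintenance burden the message refers to.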
Don't say I didn't warn you about reading this far. ;-)
Use cases from Steve: UC-Steve1
- In the example of a live plant image from http://www.cas.vanderbilt.edu/bioimages/species/frame/oslo.htm I would assign image record
DwC:recordClass = Occurrence
DwC:basisOfRecord = StillImage
dcterms:type = StillImage
mrtg:subtype = Photograph
[Note that DwC:basisOfRecord is not synonymous with mrtg:subtype as it currently stands. Would it have to be under John's proposal?]
...
For UC-Steve1, looking at the URL, I see nothing that suggests that the resource at http://www.cas.vanderbilt.edu/bioimages/species/frame/oslo.htm refers to an Occurrence record, but I suppose it is no different from having a specimen with no location information. Nevertheless, if the resource was being described with DwC terms, I would assign:
dcterms:type = "StillImage"
dwc:basisOfRecord = "Photograph" or "DigitalStillImage" or "DigitalPhotograph" or whatever vocabulary you decide.
dwc:recordClass = "Occurrence"
These images are geolocated - I just haven't finished writing the software to display the location information yet. My intention eventually is to make available all metadata that would be available for a specimen AND for an image resource type, hence my interest in both DwC and MRTG.
The image would be a different resource that could be referred to by the Occurrence record via dwc:associatedMedia or through an instance of the dwc:ResourceRelationship class. The image resource could be described by the terms:
dcterms:type = "StillImage"
dwc:basisOfRecord = "Photograph" or "DigitalStillImage" or "DigitalPhotograph" or whatever vocabulary you decide.
dwc:recordClass = "Occurrence"
I hope that helps.
Yes, that helps a lot. I agree that they should be separate records. I have a comment/question related to the issue of relationships between resources of these types, but since it's also related to another comment as well, I'll attach it to that message. Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
With respect to the discussion of subclasses: the new recordType is on a different level than the resource types. We should not mix the information that something can be usefully interpreted as an Occurrence or Taxon concept with the type of resource that vouchers for this information.
Thus, while I think recordType is a DarwinCore categorization of intent, not resource, and is fine, I still feel that the basisOfRecord vocabulary is a subtyping of resource types.
I therefore believe that it would make life simpler for many consumers of DwC if DwC would adopt DublinCore type for its own purposes. Instead of having
basisOfRecord = PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, MachineObservation, StillImage, MovingImage, Sound, NomenclaturalChecklist
DarwinCore would first use the DublinCore vocabulary:
dcterms:type = StillImage, MovingImage, Sound, Event, Text
and then use
dwc:subtype = PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, MachineObservation, NomenclaturalChecklist
for those subtypes of dcterms:type that DarwinCore cares about to specify further. This would allow consumers to directly map DwC records into their DublinCore metadata, rather than analysing the implied hierarchy and mapping in the flattened basisOfRecord.
Gregor
Can you explain the difference between your new term dwc:subtype and the term dwc:basisOfRecord most recently proposed in this thread?
JOHN R. WIECZOREK wrote (24 Oct 2009 11:29AM):
"basisOfRecord will be used in Darwin Core as it is now, without a formal type vocabulary. The recommended controlled vocabulary will continue to be managed outside of the standard as supplementary documentation, as was ratified already. The current recommendations are given at http://code.google.com/p/darwincore/wiki/RecordLevelTerms#basisOfRecord. The values on this list can be used or not, changed or not, or added to without affecting the Darwin Core standard. When I mentioned "some of the terms would go to dcterms:type" in my net solution, above, I was thinking that it would be redundant to keep "StillImage", "MovingImage", and "Sound" on the list of controlled vocabulary for basisOfRecord, as they are already in dcterms:type. Communities would be free to add to the vocabulary to the level of specificity they require. For example, MRTG could dispense with the mrtg:subtype term and use dwc:basisOfRecord instead - adding "Photograph", for example, to the controlled vocabulary list. This is exactly the sort of thing basisOfRecord was always meant for."
I see no difference between your dwc:subtype and the proposed dwc:basisOfRecord except the name. The term basisOfRecord has been used for this purpose in Darwin Core since 13 Jun 2003. I think precedence should prevail.
2009/10/25 John R. WIECZOREK tuco@berkeley.edu:
Can you explain the difference between your new term dwc:subtype and the term dwc:basisOfRecord most recently proposed in this thread?
I see no difference between your dwc:subtype and the proposed dwc:basisOfRecord except the name. The term basisOfRecord has been used for this purpose in Darwin Core since 13 Jun 2003. I think precedence should prevail.
Please see my slight preference for the word "subtype" over "basisOfRecord" as a secondary question.
The essence is that I propose to use DublinCore (precedence since 1995 and extremely widely adopted) where it applies.
basisOfRecord is a mixture of DublinCore type terms, and subtypes of DublinCore terms. In the latter case DwC omits the applicable DublinCore resource type vocabulary.
Thus any DC-aware consumer of the data has to do both a mapping of dwc:StillImage to http://purl.org/dc/dcmitype/StillImage and imply that the resource quoted through PreservedSpecimen, FossilSpecimen, LivingSpecimen is a http://purl.org/dc/dcmitype/PhysicalObject, that a HumanObservation or MachineObservation must be http://purl.org/dc/dcmitype/Event, and NomenclaturalChecklist a http://purl.org/dc/dcmitype/Text.
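The work a DC-aware consumer would have to do, as described above, might look roughly like the following sketch. The table restates the implied mapping from the flattened basisOfRecord list; the function name split_basis and the choice to return a (type URI, subtype) pair are illustrative assumptions.

```python
DCMI = "http://purl.org/dc/dcmitype/"

# Flattened basisOfRecord value -> implied DCMI type, as listed above.
IMPLIED_TYPE = {
    "StillImage": "StillImage",
    "MovingImage": "MovingImage",
    "Sound": "Sound",
    "PreservedSpecimen": "PhysicalObject",
    "FossilSpecimen": "PhysicalObject",
    "LivingSpecimen": "PhysicalObject",
    "HumanObservation": "Event",
    "MachineObservation": "Event",
    "NomenclaturalChecklist": "Text",
}


def split_basis(basis):
    """Split a flat basisOfRecord into (dcterms:type URI, subtype or None)."""
    dc_type = IMPLIED_TYPE.get(basis)
    if dc_type is None:
        return None, basis  # unknown value: no DCMI type can be implied
    subtype = None if dc_type == basis else basis
    return DCMI + dc_type, subtype


print(split_basis("PreservedSpecimen"))
# prints ('http://purl.org/dc/dcmitype/PhysicalObject', 'PreservedSpecimen')
```

Every consumer would need to maintain its own copy of such a table, and any value added to the informal vocabulary later falls through to the unknown branch.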
Gregor
Gregor wrote:
Thus, while I think recordType is a DarwinCore categorization of intent, not resource, and is fine, I still feel that the basisOfRecord vocabulary is a subtyping of resource types.
I therefore believe that it would make life simpler for many consumers of DwC if DwC would adopt DublinCore type for its own purposes. Instead of having
basisOfRecord = PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, MachineObservation, StillImage, MovingImage, Sound, NomenclaturalChecklist
DarwinCore would first use the DublinCore vocabulary:
dcterms:type = StillImage, MovingImage, Sound, Event, PhysicalObject (ADDED, forgotten in previous mail), Text
and then use
dwc:subtype = PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, MachineObservation, NomenclaturalChecklist
for those subtypes of dcterms:type that DarwinCore cares about to specify further. This would allow consumers to directly map DwC records into their DublinCore metadata, rather than analysing the implied hierarchy and mapping in the flattened basisOfRecord.
Comments inline.
On Sun, Oct 25, 2009 at 11:50 AM, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
2009/10/25 John R. WIECZOREK tuco@berkeley.edu:
Can you explain the difference between your new term dwc:subtype and the term dwc:basisOfRecord most recently proposed in this thread?
I see no difference between your dwc:subtype and the proposed dwc:basisOfRecord except the name. The term basisOfRecord has been used for this purpose in Darwin Core since 13 Jun 2003. I think precedence should prevail.
Please see my slight preference for the word "subtype" over "basisOfRecord" as a secondary question.
The essence is that I propose to use DublinCore (precedence since 1995 and extremely widely adopted) where it applies.
Agreed.
basisOfRecord is a mixture of DublinCore type terms, and subtypes of DublinCore terms. In the latter case DwC omits the applicable DublinCore resource type vocabulary.
I think you missed some messages in the thread. What you say above is true of the Darwin Core as published (basisOfRecord has StillImage, MovingImage, and Sound among the recommendations), but I proposed a solution (24 Oct 11:29AM) in which there is no mixing. What is "the latter case" to which you refer - subtypes of Dublin Core terms defined by Darwin Core? If so, you're correct, Darwin Core recommends a vocabulary of literals, not refinements of Dublin Core Type vocabulary (see my commentary in this thread 24 Oct 11:57PM, beginning with "The following may be confusing...").
Thus any DC-aware consumer of the data has to do both a mapping of dwc:StillImage to http://purl.org/dc/dcmitype/StillImage and imply that the resource quoted through PreservedSpecimen, FossilSpecimen, LivingSpecimen is a http://purl.org/dc/dcmitype/PhysicalObject, that a HumanObservation or MachineObservation must be http://purl.org/dc/dcmitype/Event, and NomenclaturalChecklist a http://purl.org/dc/dcmitype/Text.
Again, there would be no dwc:StillImage. Instead dcmitype:StillImage would be a possible value for dcterms:type. There would be no formal declaration of the string literal vocabulary of basisOfRecord, and no refinements of DCMI Type vocabulary for them.
I really think the answer (well, my answer) to your concerns is in that paragraph about "The following may be confusing...", which I'll repeat here for convenience. If I haven't understood your concern, please let me know. If you think I did understand it but you don't like my response (encapsulated below), please propose an alternative.
Thanks,
John
JOHN R WIECZOREK wrote (24 Oct 2009 11:57PM):
"The following may be confusing and isn't necessary to the resolution of the issue mentioned, but I thought I'd bring it up for the sake of completeness. Continue at your own risk. The DublinCore type vocabulary (Image, StillImage, MovingImage, etc.) is composed of classes, not properties, and some of the classes are subtypes of others (for example, StillImage and MovingImage have the subClassOf attribute <rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/Image"/>). One could create formal vocabulary terms under the Darwin Core type vocabulary for dwctype:PreservedSpecimen, dwctype:FossilSpecimen, dwctype:LivingSpecimen as subclasses of dcmitype:PhysicalObject (<rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>), for dwctype:HumanObservation and dwctype:MachineObservation as subclasses of dcmitype:Event (<rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/Event"/>), and for the remaining terms (TaxonDistribution, TaxonName, NomenclaturalAct, and TaxonNameUsage) as entirely new classes. This was actually done in a previous iteration of the Darwin Core before public review. These new classes (the new vocabulary terms) could be used as values of dcterms:type (in XML, for example, by declaring <dcterms:type xsi:type="dwc:DwCType">) and a basisOfRecord term wouldn't be needed. This is what I mean by a formal vocabulary. In the Darwin Core as published this option was abandoned as being too much of a maintenance burden even within our community, as there is a tendency to split and lump and split ad infinitum. Precedent for not taking on more than can be reasonably handled by volunteer labor can be found in Thomas Baker's "Maintaining a vocabulary: Practices, policies, and models around Dublin Core" (http://dcpapers.dublincore.org/ojs/pubs/article/viewFile/765/761), in which the tendency toward simplicity is defended.
In the face of this challenge it was deemed prudent to use an informal recommended controlled vocabulary for basisOfRecord that could be managed without affecting the standard, and therefore without incurring the process overhead every time someone wants to add new capabilities using Darwin Core. Time will tell us if this was the right course, but for now, it is what we all ratified by not saying anything to the contrary."
John, yes, I was confused about the intended solution. I did not understand the text introduced with "The following may be confusing" - it is confusing to me, I don't understand which conclusion you draw from this. But you warned us, so I ignored this, and it probably does not matter here.
With respect to the Oct. 24th mail, I see that I based my understanding on the wrong one of the two solutions for one of the examples (reading too fast, my fault!). This one is clear:
John writes:
For UC-Steve2 the Alaska museum of the North should have an Occurrence record for that specimen with: dcterms:type = "PhysicalObject" dwc:basisOfRecord = "PreservedSpecimen" dwc:recordClass = "Occurrence"
I cannot find the relation of HumanObservation / MachineObservation to dcterms:type spelled out. Do you propose
dcterms:type = "Event" dwc:basisOfRecord = "HumanObservation" dwc:recordClass = "Occurrence"
for unvouchered (no image) observations, and potentially
dcterms:type = "Dataset" dwc:basisOfRecord = "MachineObservation" dwc:recordClass = "Occurrence"?
Sorry for email noise by not reading carefully enough!
Gregor
Hi, late in the discussion, but I would like to shed some light on using Darwin Core for taxonomic data exchange, not "occurrences". Would the following example make sense?
dc:type=Dataset dwc:basisOfRecord=TaxonName dwc:recordClass=Taxon dwc:scientificName=Abies alba Mill.
dwc:basisOfRecord and dwc:recordClass seem redundant. What is the need to have both? PreservedSpecimen implies it's an occurrence and TaxonName implies a Taxon class. Isn't this yet another class hierarchy that is flattened to share different levels of granularity for different clients?
Markus
Comments inline.
On Mon, Oct 26, 2009 at 7:53 AM, Markus Döring m.doering@mac.com wrote:
Hi, late in the discussion, but I would like to shed some light on using Darwin Core for taxonomic data exchange, not "occurrences". Would the following example make sense?
dc:type=Dataset dwc:basisOfRecord=TaxonName dwc:recordClass=Taxon dwc:scientificName=Abies alba Mill.
Yes, that would make sense.
dwc:basisOfRecord and dwc:recordClass seem redundant. What is the need to have both? PreservedSpecimen implies it's an occurrence and TaxonName implies a Taxon class. Isn't this yet another class hierarchy that is flattened to share different levels of granularity for different clients?
Yes, it is. Is there harm in that? We do it all over the place. Keeping this one means we don't have to imply anything, we can get it explicitly. The alternative is a formal type vocabulary for what we now have as string literals in the recommended vocabulary of basisOfRecord. In such a type vocabulary (already developed once and discarded as a standards maintenance issue, or as stepping too far into TDWG Ontology work) a term (not a string literal) dwctype:PreservedSpecimen would be a refinement of dcmitype:PhysicalObject, and a term dwctype:TaxonName would be a refinement of dcmitype:Dataset.
Here are some relevant rdf snippets.
From StillImage, which is a refinement of Image:
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
<rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/Image"/>
For the would-be PreservedSpecimen:
<rdfs:isDefinedBy rdf:resource="http://rs.tdwg.org/dwc/dwctype/"/>
<dcam:memberOf rdf:resource="http://rs.tdwg.org/dwc/dwctype/"/>
<rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
For the would-be TaxonName:
<rdfs:isDefinedBy rdf:resource="http://rs.tdwg.org/dwc/dwctype/"/>
<dcam:memberOf rdf:resource="http://rs.tdwg.org/dwc/dwctype/"/>
<rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/Dataset"/>
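As a rough illustration of what a machine could do with declarations like these, the sketch below reads the subclass links out of a small RDF/XML document using only the Python standard library. The enclosing rdf:RDF and rdfs:Class elements and the rdf:about URI are assumptions added to make the fragment a complete document; the function name superclasses is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# The wrapper elements and rdf:about value are assumed for illustration.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen">
    <rdfs:isDefinedBy rdf:resource="http://rs.tdwg.org/dwc/dwctype/"/>
    <rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
  </rdfs:Class>
</rdf:RDF>"""


def superclasses(rdf_xml):
    """Map each declared class URI to the list of its rdfs:subClassOf targets."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for cls in root.findall(f"{{{RDFS}}}Class"):
        about = cls.get(f"{{{RDF}}}about")
        parents = [
            el.get(f"{{{RDF}}}resource")
            for el in cls.findall(f"{{{RDFS}}}subClassOf")
        ]
        result[about] = parents
    return result


for term, parents in superclasses(DOC).items():
    print(term, "->", parents)
```

This treats RDF/XML as ordinary XML, which is enough for a fragment this regular; a real consumer would more likely use an RDF library.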
John commented on Markus' question:
Would the following example make sense?
dc:type=Dataset dwc:basisOfRecord= TaxonName dwc:recordClass=Taxon dwc:scientificName=Abies alba Mill.
Yes, that would make sense.
I can't remember where, maybe in one of Rich's examples, I thought I saw the basisOfRecord for a taxonName designated as: "NomenclaturalAct". I thought that was both correct and precise. Similarly, I think the basis of a taxon record should be a "TaxonomicAct", i.e., a published description or reclassification.
Agreement or objections?
-Stan
I can't remember where, maybe in one of Rich's examples, I thought I saw the basisOfRecord for a taxonName designated as: "NomenclaturalAct". I thought that was both correct and precise. Similarly, I think the basis of a taxon record should be a "TaxonomicAct", i.e., a published description or reclassification.
Well...."NomenclaturalAct" is certainly in my vocabulary; but the overarching term I use (and likely would have used in my posted examples) is "TaxonNameUsage". Anytime a human uses a taxon name (and, really: who else besides humans use taxon names?), it's a TaxonNameUsage instance. Taken to the extreme, this includes casual/ephemeral conversations and other such utterances. However, in the realm of biodiversity informatics, I generally confine it to "documented" TaxonNameUsage instances. Although "documented" can be very broadly defined (including, e.g., correspondence, field notes, specimen labels and other single-copy documents), in our world it is largely dominated by publications and publication-like documentation sources.
TaxonNameUsage instances can "carry" things like Nomenclatural Acts and Taxon Concept circumscriptions, but I'm not sure if, ontologically speaking, it's appropriate to think of these as subtypes. They are certainly *not* mutually exclusive -- indeed, essentially every TaxonNameUsage that contains a nomenclatural act also represents a taxon concept/circumscription. However, there are many, many, many TNUs that represent Taxon Concept circumscriptions, but do not carry Nomenclatural Acts. And then there are TNUs that are neither: things like published type catalogs which certainly "use" taxon names (hence: TNU), but neither carry a Nomenclatural Act nor imply, assert or represent a Taxon Concept.
Anyway....I'm not entirely sure I would populate basisOfRecord with any of these terms for records representing recordClass=Taxon. But maybe I would -- I don't know. In part, I guess, it depends on where this particular discussion ends up.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
I can't remember where, maybe in one of Rich's examples, I thought I saw the basisOfRecord for a taxonName designated as: "NomenclaturalAct". I thought that was both correct and precise. Similarly, I think the basis of a taxon record should be a "TaxonomicAct", i.e., a published description or reclassification.
I would favor it, because it keeps recordClass and resource type better separated. "NomenclaturalAct" and "TaxonomicAct" would be dcterms:type=Event for unpublished acts, or dcterms:type=Text for published acts. In fact, in this case the dcterms:type would no longer be redundant.
Gregor
So....how does one represent a record that is both a NomenclaturalAct *and* a TaxonomicAct at the same time (as I said, virtually all of the former also constitute the latter)? Perhaps this is the solution that I've been looking for a while now -- that is, the basisOfRecord in this case is not really the "basis of the record" (I would describe the basis of the record as a TaxonNameUsage); but rather represents something more like "basis of representation". That is, if a single TaxonNameUsage instance both carries a NomenclaturalAct and represents a TaxonomicAct, then the basisOfRecord could distinguish which of the two "things" that the specific record is intended to represent. If basisOfRecord=NomenclaturalAct, then metadata elements would include all the nomenclatural bits associated with the record (e.g., various Code-governed events, etc.). If basisOfRecord=TaxonomicAct, then the metadata elements would include things like classification, synonymy, included non-name-bearing specimens, etc. In other words, the "thing" is the same in both cases (i.e., a TaxonNameUsage instance), but the difference would be which aspect of that thing the record is intended to represent.
I suspect strongly that the preceding paragraph makes almost no sense whatsoever to anyone other than me (and I'm not even sure I understand it).
Aloha, Rich
In my last post, I wrote:
That is, if a single TaxonNameUsage instance both carries a NomenclaturalAct and represents a TaxonomicAct, then the basisOfRecord could distinguish which of the two "things" that the specific record is intended to represent.
That sentence should have ended with: "...that the specific *instance* of the record is intended to represent."
Aloha, Rich
I can easily imagine similar conversations occurring every time someone wants to share some new kind of information with Darwin Core. It is a perfect example to illustrate the convenience of having a controlled vocabulary that can respond to change without affecting the published standard.
The recommended vocabulary is given at http://code.google.com/p/darwincore/wiki/RecordLevelTerms#basisOfRecord, which is type 2 documentation in TDWG Standards nomenclature. If we wanted to add a new term to that list, we could do so without invoking a standards process. If we wanted to modify any of the circular definitions already there, or add any missing ones, we could without invoking a standards process. The alternative, defining type vocabularies that machines can use to understand the intricacies of our distinctions, would not be so easy to change, but would admittedly offer much more stability as its reward.
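As a rough illustration of why a recommended list is easy to extend, the sketch below keeps the vocabulary as plain data and checks values against it. The set contains only values mentioned in this thread, not the full recommended list, and the function name check_basis_of_record is hypothetical.

```python
# Hypothetical sketch: recommended basisOfRecord values kept as plain data.
# Only values mentioned in this thread are listed; this is not the full list.
RECOMMENDED_BASIS_OF_RECORD = frozenset({
    "PreservedSpecimen",
    "StillImage",
    "HumanObservation",
    "MachineObservation",
})


def check_basis_of_record(value, vocabulary=RECOMMENDED_BASIS_OF_RECORD):
    """Return True if the value appears in the given recommended list."""
    return value in vocabulary


# Extending the list is a data change only; the checking code is untouched.
extended = RECOMMENDED_BASIS_OF_RECORD | {"NomenclaturalAct", "TaxonomicAct"}
```

A value outside the list is not necessarily wrong under this scheme, since the list is a recommendation; the check only reports whether the value is currently recognised.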
PreservedSpecimen implies it's an occurrence and TaxonName implies a Taxon class. Isn't this yet another class hierarchy that is flattened to share different levels of granularity for different clients?
Yes, it is. Is there harm in that? We do it all over the place. Keeping this one means we don't have to imply anything, we can get it explicitly.
I agree with John. Redundancy need not be a purist design principle, but here the higher-level redundancy is an expression in a vocabulary that many more people and machines understand, making data accessible to more potential users.
Gregor
Found a moment to get back to this. There currently is no formal (that is, rdf) relation between observations and dcmitype:Event. In fact, there are no formal relations between recommended basisOfRecord values ("PreservedSpecimen", "StillImage", "HumanObservation") and any dcmitype terms (dcmitype:PhysicalObject, dcmitype:StillImage, dcmitype:Event). The former are string literals in a list that is recommended while the latter are the DC recommended Type Vocabulary for dcterms:type.
So, if I were to publish a record for a human observation of a species in nature (a dwc:Occurrence), I would populate the terms as follows (note specifically the use of terms for type and recordClass and the use of a string for basisOfRecord):
dcterms:type = dcmitype:Event
dwc:basisOfRecord = "HumanObservation"
dwc:recordClass = dwctype:Occurrence
For your other example I would populate the terms similarly, with:
dcterms:type = dcmitype:Event
dwc:basisOfRecord = "MachineObservation"
dwc:recordClass = dwctype:Occurrence
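The pairings used in these examples could be written down as an explicit lookup table, sketched here. Since no formal (RDF) relation between the basisOfRecord strings and the dcmitype terms exists, the table BASIS_TO_DCMITYPE and the helper occurrence_record are assumptions made for illustration, not part of Darwin Core.

```python
# Illustrative only: pairings taken from the examples in this thread.
# Darwin Core itself defines no formal relation between these values.
BASIS_TO_DCMITYPE = {
    "PreservedSpecimen": "dcmitype:PhysicalObject",
    "StillImage": "dcmitype:StillImage",
    "HumanObservation": "dcmitype:Event",
    "MachineObservation": "dcmitype:Event",
}


def occurrence_record(basis_of_record):
    """Build the three record-level terms for an Occurrence record."""
    return {
        # A term from the DCMI Type Vocabulary.
        "dcterms:type": BASIS_TO_DCMITYPE[basis_of_record],
        # A plain string literal from the recommended list.
        "dwc:basisOfRecord": basis_of_record,
        "dwc:recordClass": "dwctype:Occurrence",
    }
```

Calling occurrence_record("HumanObservation") reproduces the first example above; an unknown basisOfRecord raises KeyError rather than guessing a type.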
On Mon, Oct 26, 2009 at 2:24 AM, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
John, yes, I was confused about the intended solution. I did not understand the text introduced with "The following may be confusing" - it is confusing to me, I don't understand which conclusion you draw from this. But you warned us, so I ignored this, and it probably does not matter here.
With respect to the Oct. 24th mail I see that I was reading too fast, and I based my understanding on the wrong of the two solutions for one of the examples (reading too fast, my fault!). This one is clear:
John writes:
For UC-Steve2 the Alaska Museum of the North should have an Occurrence record for that specimen with:
dcterms:type = "PhysicalObject"
dwc:basisOfRecord = "PreservedSpecimen"
dwc:recordClass = "Occurrence"
I cannot find the relation of HumanObservation / MachineObservation to dcterms:type spelled out. Do you propose
dcterms:type = "Event"
dwc:basisOfRecord = "HumanObservation"
dwc:recordClass = "Occurrence"
for unvouchered (no image) observations, and potentially
dcterms:type = "Dataset"
dwc:basisOfRecord = "MachineObservation"
dwc:recordClass = "Occurrence"
Sorry for email noise by not reading carefully enough!
Gregor
I hesitated to respond to this earlier because I am not up on what is going on in the "observation" community of TDWG and how they define things. However, I have a philosophical problem with saying that HumanObservations and MachineObservations should be typed as dcterms:type=dcmitype:Event. My problem stems from the way that we use "observation" in English.
1. When a taxonomist conducts a field collecting trip (a dcmitype:Event) at a certain time and place, we end up with a specimen (dcterms:type=dcmitype:PhysicalObject, dwc:basisOfRecord="PhysicalSpecimen").
2. When a photographer conducts a photo shoot (a dcmitype:Event) at a certain time and place, we end up with a still image (dcterms:type=dcmitype:StillImage, dwc:basisOfRecord="StillImage").
3. When a birdwatcher conducts an observation (a dcmitype:Event) at a certain time and place, we end up with an observation (dcterms:type=???, dwc:basisOfRecord="HumanObservation").
In example 3, the problem is that we use the same word ("Observation") for the act of observing and the product that we create when we observe. We don't do this for other types of things that belong in the dwc:recordClass = dwctype:Occurrence (i.e. examples 1 and 2). If you look at the elements in the DwC class Occurrence, you see elements that can describe the things that are created when we document the presence of an individual, e.g. catalogNumber, preparations, sex, etc. If you look at the elements in the Dwc class Event, you see elements related to the act of creating, e.g. the time (eventDate), place (habitat), and method (samplingProtocol). The use of the word Event to describe a DwC class is true to the DCMI definition of Event: "Metadata for an event provides descriptive information that is the basis for discovery of the purpose, location, duration, and responsible agents associated with an event. Examples include an exhibition, webcast, conference, workshop, open day, performance, battle, trial, wedding, tea party, conflagration." The examples given in the definition all indicate things that happen, not the things that result from those happenings (i.e. exhibitions, not exhibits; conference, not proceedings; battle, not dead people; tea party, not tea; etc.).
As I look at the DCMI classes, the problem is that I don't think there is one that describes observation (in the sense of an Occurrence, i.e. the product that is created when we observe). So I think that the appropriate thing in the case of observation is either to provide no value for dcterms:type or to petition DCMI to create a class for observations (if we can get them to understand the distinction that I'm making between act and product). I do not think that the appropriate thing is to call observations (sensu created product) something that they are not. I have previously raised the question of "Why do we need dcterms:type?". I do not think that the reason is to satisfy our need to place all things into conceptual boxes. I think the reason is to let a machine or user know what kinds of metadata to expect when they are told that a record has a particular value for dcterms:type. Under that rationale, if a record identifier for an observation is being resolved and a consuming application is told that the record has a dcterms:type of Event, that application is going to be expecting metadata about an act (time, place, and method), not metadata about a created entity (catalog number, sex, or whatever types of things that you record about an observation). It is better to tell the machine nothing than to tell it to expect the wrong thing. While it is true that observations have a time, place, and method of creation, PhysicalObjects and StillImages also have a time, place, and method of their creation, yet we do not class them as Events. Let's not put a round peg into a square hole. Either leave the peg out or drill a round hole.
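The rationale can be sketched from the consuming application's side. The table EXPECTED_FIELDS and the function expected_fields are hypothetical; the field names are the DwC Event and Occurrence elements mentioned in this message. The point of the sketch is only that an absent dcterms:type promises nothing, while a wrong one promises the wrong metadata.

```python
# Hypothetical expectations a consumer might attach to a dcterms:type value.
EXPECTED_FIELDS = {
    # Metadata about an act: time, place, method.
    "dcmitype:Event": ["eventDate", "habitat", "samplingProtocol"],
    # Metadata about a created entity.
    "dcmitype:PhysicalObject": ["catalogNumber", "preparations", "sex"],
}


def expected_fields(record):
    """Return the fields a consumer would look for, or None if untyped."""
    dc_type = record.get("dcterms:type")
    if dc_type is None:
        # No type was declared, so no expectation is set up at all.
        return None
    # Unknown types yield an empty expectation rather than an error.
    return EXPECTED_FIELDS.get(dc_type, [])


typed_as_event = {
    "dcterms:type": "dcmitype:Event",
    "dwc:basisOfRecord": "HumanObservation",
}
untyped = {"dwc:basisOfRecord": "HumanObservation"}
```

For typed_as_event the consumer goes looking for act metadata, which is the mismatch described above; for untyped it is told nothing and must fall back on basisOfRecord.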
Steve Baskauf
Comments follow post extracts.
Richard Pyle wrote:
I've wrestled with similar issues; namely collections of images that span from obvious occurrence records to illustrative images of the sort that Bob shows, to diagrams of particular specimens, to abstract diagrams of no specimen in particular.
My concern about Bob's CharacterIllustration BoR is that this is non-mutually exclusive to others. For example, a StillImage in-situ could represent both a geographic occurrence and a representation of a particular morphological character. How to represent such cases: two separate records?
My gut feeling is that we need to separate records that represent an occurrence from records that represent the evidence documenting the occurrence. Very often we have underwater video of a fish in its habitat, then we collect the specimen, then we take a digital photograph of the prepared specimen. I assume the appropriate way to represent this through DwC is via three separate occurrence records, each appropriately typed, and each cross-referenced to the others. But perhaps there should be only one occurrence record, with three cross-linked "Evidence" records of some sort.
This is a topic that our SERNEC Live Plant Image subgroup has discussed at length over the last several years - a summary is at http://www.sernec.org/?q=node/220 for anyone who can stomach it. I think that the problem stems fundamentally from a confusion between what a resource IS and the PURPOSE of the resource. The most helpful way to classify a resource is according to what it IS and to let the purpose be established through relationships that the resource has with other (probably abstract) resources or through assignment of attributes to the resource. Unfortunately, this issue has been clouded somewhat by adoption of the term Occurrence for the class that includes specimens and observations. I understand the reason why this was done (i.e. because specimens and observations both can serve as records of occurrence), but I think it would be better to have used something like "DerivativeResource" (i.e. a resource that is derived from an organism) for the dwc:recordClass rather than "Occurrence" because an occurrence can be documented by resources other than specimens and observations (i.e. by images) and because a specimen does not have to document an occurrence if it is not collected from an organism in nature. This can be illustrated by two figures from a paper I'm writing:
http://www.cas.vanderbilt.edu/bsci110a/conceptual-scheme-botanical.gif http://www.cas.vanderbilt.edu/bsci110a/conceptual-scheme-insect.gif
These diagrams represent scenarios similar to several ongoing large-scale documentation projects - the situation you describe with the fish would be similar. In these situations, the resources that document occurrences are those that were derived directly from an individual in the wild (gray arrows). Note that they are seeds, still images, and preserved specimens, but could potentially also be living specimens, observations, tissue samples, or sounds. The other resources in the diagrams, which are derived indirectly from the individual (white arrows) and which are NOT occurrences, are: still images, preserved specimens, living specimens, DNA samples, and DNA sequences. The point is that there is no fundamental relationship between the type of resource and whether it is an occurrence or not. The same could be said about Bob and Gregor's character state illustrations. In the insect scheme diagram, the leg and wing preparations (dwc:basisOfRecord=PreservedSpecimen) and their images (dwc:basisOfRecord=StillImage) are specifically created to illustrate character states, but they could just as easily be used to identify the individual in the wild, as a part of a visual key, or as a computer tool to teach visual recognition. Thus it seems to me a bad idea for the value of the recordClass term assigned to a resource to be based on the intended use (and therefore requiring several records with nearly identical metadata, one for each class). Better just to say that a resource is derived from an organism (i.e. call its recordClass "DerivativeResource" rather than "Occurrence"), and indicate its fitness-for-use through other means (e.g. an RDF relationship "isAnOccurrence of", or "illustratesCharacterState" or something like that).
John R. WIECZOREK wrote:
The image would be a different resource that could be referred to by the Occurrence record via dwc:associatedMedia or through an instance of the dwc:ResourceRelationship class. ...
I have been pondering for some time the appropriate way to indicate relationships among resources in complex networks such as those shown in the two diagrams. It seems to me that using dcterms:source exactly indicates the relationship of a resource to the one that it was derived from (e.g. the dcterms:source for a specimen image is the identifier for the PreservedSpecimen of which it is a representation). However, I can find in DwC no general term for the inverse relationship: derived resources (i.e. resources that are created from another). There are the specific instances of dwc:associatedMedia and dwc:associatedSequences, but as you can see from the two diagrams, there are many kinds of resources that can be derived from others including living and preserved specimens, DNA samples, etc. in addition to media items (such as StillImage) and sequences. It would be better to have a general term to point to a derived resource (such as "derivedResource"). The type of that resource could then easily be determined by a machine by looking at the basisOfRecord for the derived resource. I looked at the dwc:ResourceRelationship class for something that would work for this and considered a combination of dwc:relatedResourceID and dwc:relationshipOfResource (which could have a value of something like "derived"). However, the descriptions of these two terms look like they are intended for describing ecological relationships rather than resource creation relationships.
I would be interested to know the best way under DwC to indicate these relationships. The scope of this is really beyond the "Simple Darwin Core" because you can't handle such complex collection schemes with a flat database, but presumably it is intended at some point for DwC to be able to handle complex situations beyond the "Simple" cases.
Steve Baskauf
Beginning at the end of Steve's commentary - Darwin Core can handle the complex cases you are describing, but yes, the ResourceRelationships are beyond the capabilities of the Simple Darwin Core (flat by design) except in the limited way of sharing a list (in a single given term) of resources related in the specific ways defined by associatedMedia, associatedReferences, associatedOccurrences, associatedSequences, and associatedTaxa. ResourceRelationship is wide open in its capacity to relate resources to each other forward and backward. As Steve points out, dcterms:source can't do this. Not sure what threw you off thinking that the descriptions suggested ecological relationships, Steve, because you were right on track.
dwc:resourceID Definition: An identifier for the resource that is the subject of the relationship.
dwc:relatedResourceID Definition: An identifier for a related resource (the object, rather than the subject of the relationship).
dwc:relationshipOfResource Definition: The relationship of the resource identified by relatedResourceID to the subject (optionally identified by the resourceID). Recommended best practice is to use a controlled vocabulary. Comment: Examples: "duplicate of", "mother of", "endoparasite of", "host to", "sibling of", "valid synonym of", "located within".
There is no reason relationshipOfResource couldn't contain the values "derived from" or "source of" or any other values deemed clear and appropriate.
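A minimal sketch of how the three terms might carry a derivation relationship follows. The class, the function derived_resources, and the identifiers are all hypothetical. The direction follows the definition quoted above, in which relationshipOfResource states how the related resource stands to the subject; other readings of the direction are possible and would flip the two identifier fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRelationship:
    resource_id: str               # dwc:resourceID (the subject)
    related_resource_id: str       # dwc:relatedResourceID (the object)
    relationship_of_resource: str  # dwc:relationshipOfResource


def derived_resources(relationships, source_id):
    """Return IDs of related resources recorded as derived from source_id."""
    return [
        rel.related_resource_id
        for rel in relationships
        if rel.resource_id == source_id
        and rel.relationship_of_resource == "derived from"
    ]


# Hypothetical identifiers: an image and a DNA sample derived from a specimen.
relationships = [
    ResourceRelationship("specimen-1", "image-1", "derived from"),
    ResourceRelationship("specimen-1", "dna-1", "derived from"),
    ResourceRelationship("specimen-1", "specimen-2", "duplicate of"),
]
```

Because the relationship value is an open string, the same structure covers both the ecological examples in the term's comment and the resource-creation case; a machine could then read the basisOfRecord of each returned resource to learn its type.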
On Sat, Oct 24, 2009 at 2:21 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
I'm not sure I agree here...
Steve Baskauf [steve.baskauf@vanderbilt.edu] Unfortunately, this issue has been clouded somewhat by adoption of the term Occurrence for the class that includes specimens and observations. I understand the reason why this was done (i.e. because specimens and observations both can serve as records of occurrence), but I think it would be better to have used something like "DerivativeResource" (i.e. a resource that is derived from an organism) for the dwc:recordClass rather than "Occurrence" because an occurrence can be documented by resources other than specimens and observations
I think there are really only two categories of occurrence here: those with physical vouchered specimens, and those with digital-only representations. Only those with a physical specimen are "specimen occurrences"; all others are "observed occurrences" (even if they have an image associated with them). I can't see why this would really restrict you from representing any occurrence data you may have.
Also, one of the beneficial things about DwC is its simplicity and specificity. If we generalise again (to handle "all" types of occurrence, "resources derived from organisms"), then I feel the ontology will become less usable, and obvious, to end users. Sometimes it is a good thing to specify precise data fields and types in an ontology.
Kevin
Kevin Richards wrote:
I'm not sure I agree here...
Steve Baskauf [steve.baskauf@vanderbilt.edu] Unfortunately, this issue has been clouded somewhat by adoption of the term Occurrence for the class that includes specimens and observations. I understand the reason why this was done (i.e. because specimens and observations both can serve as records of occurrence), but I think it would be better to have used something like "DerivativeResource" (i.e. a resource that is derived from an organism) for the dwc:recordClass rather than "Occurrence" because an occurrence can be documented by resources other than specimens and observations
I think there are really only two categories of occurrence here: those with physical vouchered specimens, and those with digital-only representations. Only those with a physical specimen are "specimen occurrences"; all others are "observed occurrences" (even if they have an image associated with them).
The distinction I was drawing was between non-physical resources that return a representation of the organism and those that do not. For example, a database record representing a digital image of a bird could contain a URL to the location from which the bird image can be retrieved. A consuming application could retrieve this file and display it on the screen for the user to see. In contrast, a database record representing a checked box for a Christmas Bird Count observation of the same bird can return no representation of the bird. Both records would have the same metadata about location, date, taxonomy, observer, etc., but only the former would have metadata of the sort that MRTG is dealing with (copyright and licensing information for the image, a title, caption, etc.). In a third case where a bird was mist-netted and the wing length measured, one could put the record in either the first category or the second depending on whether one considered the wing length to be data or metadata. But that is a question for the observation people and out of my area. My point was that aside from occurrences with physical vouchers, there are two fundamentally different types of resources: those that return a digital representation of the organism and those that don't. If a record is linked to a digital representation (StillImage, MovingImage, or Sound), a user may examine that representation for physical or behavioral characters that would allow the taxonomic determination of the organism to be verified, while in the checklist example, the user would simply have to trust the identification ability of the observer.
I can't see why this would really restrict you from representing any occurrence data you may have. Also, one of the beneficial things about DwC is its simplicity and specificity. If we generalise again (to handle "all" types of occurrence, "resources derived from organisms"), then I feel the ontology will become less usable, and obvious, to end users. Sometimes it is a good thing to specify precise data fields and types in an ontology.
My problem here is with use of the word "occurrence". The nature of that word implies that the record represents a valid occurrence record for a species, i.e. that the record could appropriately be used to put a dot on a distribution map for the species. If I take a StillImage of an /Osmorhiza longistylis/ plant in the woods and my digital camera records the time and GPS coordinates, then those metadata indicate that /Osmorhiza longistylis/ occurred in that woods on the day that I took the image. On the other hand, if I take an image of a PreservedSpecimen of /Osmorhiza longistylis/ in an herbarium and my camera records the same information, it would not be appropriate to use those time and location metadata to put a dot on the /Osmorhiza longistylis/ distribution map at the location of the herbarium. Rather, the time and location metadata for the collection of the PreservedSpecimen should be used to place the dot. I still need to record the time and place where the specimen image was taken; I just don't want it to represent an occurrence. That is why it bothers me to classify a StillImage of a PreservedSpecimen as a /recordType/=Occurrence. My suggestion of the term "DerivativeResource" was an attempt to divorce the USE of the image (to document a valid occurrence or not) from what the thing IS (a representation that was derived directly or indirectly from an organism). Calling such representations something other than "Occurrence" gets us away from the issue raised by Gregor and Bob where there are many possible uses for a resource. When I take live plant images, I consciously intend for them to be used simultaneously to record an occurrence, illustrate characters, and be used for media tools such as visual keys and visual recognition software, not just to document an occurrence.
I should also note that although this problem is widespread for images, it can apply to physical resources as well. A PreservedSpecimen taken from a wild-collected plant growing in a botanical garden or animal in a zoo (i.e. from a LivingSpecimen) has the same problem. Both would provide useful information for identifying the organism, but in neither case would the PreservedSpecimen collection time and location represent a valid occurrence that should be used to put a dot on a map. The collection time and location for the LivingSpecimen would be the metadata to use to place the dot (i.e. valid occurrence).
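To make the herbarium and zoo cases concrete, here is a minimal sketch of the "which metadata places the dot" rule described above. Everything in it is an assumption made for illustration: the `Resource` class, its `wild` flag and `derived_from` link, and the place and date values are hypothetical and are not DwC or MRTG terms.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Resource:
    """Hypothetical record for something derived directly or indirectly from an organism."""
    kind: str                 # e.g. "StillImage", "PreservedSpecimen", "LivingSpecimen"
    place: str                # where this resource was created or collected
    date: str                 # when this resource was created or collected
    wild: bool = False        # True if place/date describe the organism in nature
    derived_from: Optional["Resource"] = None


def mappable_event(resource: Optional[Resource]) -> Optional[Tuple[str, str]]:
    """Walk the derivation chain and return (place, date) of the first
    resource that was recorded in the wild, or None if there is none."""
    current = resource
    while current is not None:
        if current.wild:
            return (current.place, current.date)
        current = current.derived_from
    return None


# Hypothetical data: a specimen collected in the woods, later photographed in a herbarium.
specimen = Resource("PreservedSpecimen", "woods", "2008-05-01", wild=True)
herbarium_image = Resource("StillImage", "herbarium", "2009-10-20", derived_from=specimen)

# Hypothetical data: a plant collected in the wild, grown in a garden, then pressed there.
living = Resource("LivingSpecimen", "mountain slope", "2005-06-15", wild=True)
garden_specimen = Resource("PreservedSpecimen", "botanical garden", "2009-07-04", derived_from=living)

# An image of a live plant in the woods documents the occurrence directly.
live_image = Resource("StillImage", "woods", "2009-04-12", wild=True)
```

In this sketch each resource keeps its own time and place, so the herbarium photograph still records where and when it was taken; only the decision about which event goes on the map is deferred to the derivation chain.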
Because DwC has traditionally been applied primarily to preserved specimens which usually represent valid species occurrences, this may not have been a very important issue, but for people like me who want to apply DwC to images it is a big deal.
Steve Baskauf
Steve,
Perhaps the definition of the dwc:Occurrence class will help with your quandary, as it was meant specifically to apply to all of the cases you have brought forth so far, at least if you don't get too caught up in the two explicit examples and read more into the "etc."
dwc:Occurrence
Definition: The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.).
Hence, it is perfectly permissible to have a record of a specimen in a collection for which there is no location information, or a record of a zoo animal for which the location information is the zoo, or a photograph of the animal in the zoo with no location information, or the photograph of the animal in the zoo with the location information for the zoo perfectly georeferenced. All of these are Occurrence records in DwC. How you use them depends on you. How you discover whether they are fit for use is by the content of terms such as dcterms:type (as newly proposed), dwc:basisOfRecord (as newly proposed), dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:samplingProtocol, dwc:georeferenceMethod, or indeed any other term that allows you to filter your special criteria.
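A minimal sketch of that kind of fitness-for-use filtering, assuming records are plain dictionaries keyed by term name. The record contents, the set of excluded dwc:establishmentMeans values, and the rule that a missing dwc:establishmentMeans counts as "in nature" are all assumptions made for illustration, not anything DwC prescribes.

```python
# Assumed for illustration: values of dwc:establishmentMeans that a
# distribution-map consumer might choose to exclude.
EXCLUDED_ESTABLISHMENT_MEANS = {"managed", "cultivated", "captive"}


def fit_for_distribution_map(record):
    """Return True if this (hypothetical) record could place a dot on a map."""
    has_coordinates = (
        record.get("dwc:decimalLatitude") is not None
        and record.get("dwc:decimalLongitude") is not None
    )
    # Assumption: a record without dwc:establishmentMeans is treated as wild.
    in_nature = record.get("dwc:establishmentMeans") not in EXCLUDED_ESTABLISHMENT_MEANS
    return has_coordinates and in_nature


# Hypothetical Occurrence records matching the cases described above.
records = [
    {"id": "wild-specimen", "dwc:basisOfRecord": "PreservedSpecimen",
     "dwc:decimalLatitude": 36.14, "dwc:decimalLongitude": -86.80},
    {"id": "specimen-no-location", "dwc:basisOfRecord": "PreservedSpecimen"},
    {"id": "zoo-animal", "dwc:basisOfRecord": "LivingSpecimen",
     "dwc:establishmentMeans": "managed",
     "dwc:decimalLatitude": 36.09, "dwc:decimalLongitude": -86.74},
]

mappable_ids = [r["id"] for r in records if fit_for_distribution_map(r)]
```

All three records remain Occurrence records; the filter only decides which of them a particular consumer wants to use.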
John
On Sun, Oct 25, 2009 at 4:59 AM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
Kevin Richards wrote:
I'm not sure I agree here...
Steve Baskauf [steve.baskauf@vanderbilt.edu] Unfortunately, this issue has been clouded somewhat by adoption of the term Occurrence for the class that includes specimens and observations. I understand the reason why this was done (i.e. because specimens and observations both can serve as records of occurrence), but I think it would be better to have used something like "DerivativeResource" (i.e. a resource that is derived from an organism) for the dwc:recordClass rather than "Occurrence" because an occurrence can be documented by resources other than specimens and observations.
I think there are really only two categories of occurrence here: those with physical vouchered specimens, and those with digital-only representations. Only those with a physical specimen are "specimen occurrences"; all others are "observed occurrences" (even if they have an image associated with them).
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
My feeling is that in its present form MRTG doesn't mean to directly support all the concerns expressible in http://code.google.com/p/darwincore/wiki/RecordLevelTerms#basisOfRecord (other than those we've already agreed belong in dc:type). Instead, in its "Content Category Vocabulary" MRTG provides a mechanism "CVTerm" to identify and use other communities' controlled vocabularies to describe the content. Thus, if I had an image of a PreservedSpecimen, and wanted to signal that in MRTG metadata, I believe our recommendation would be to use something like the informally stated CVterm = dwc:PreservedSpecimen
The exact representation must depend on the MRTG implementation language, e.g. RDF or XML Schema, and on whether there actually is a DwC URI dwc:PreservedSpecimen, or whether PreservedSpecimen is just a favored literal in DwC, something which I can't quite tell at the moment. A secondary, but important, rationale is the anecdotal evidence that many existing biodiversity image stores already try to use DwC terms, and we want it to be easy to reuse those in MRTG in many cases.
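Purely as an informal sketch of that split, and not an MRTG serialization: the function below keeps dc:type restricted to the DCMI Type Vocabulary and carries terms from other vocabularies as opaque strings. The dictionary keys "dc:type" and "CVTerm" and the function name are hypothetical, and the string "dwc:PreservedSpecimen" is treated as an uninterpreted label because it is not settled here whether it is a URI or a literal.

```python
# Class names from the DCMI Type Vocabulary.
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})


def describe_media(dc_type, cv_terms):
    """Build a hypothetical media metadata record.

    dc_type must be a DCMI Type Vocabulary name; cv_terms are kept as
    opaque strings from whatever external vocabulary they come from.
    """
    if dc_type not in DCMI_TYPES:
        raise ValueError(f"{dc_type!r} is not in the DCMI Type Vocabulary")
    return {"dc:type": dc_type, "CVTerm": list(cv_terms)}


# An image of a preserved specimen: the medium goes in dc:type,
# the subject category goes in the CVTerm list.
image_of_specimen = describe_media("StillImage", ["dwc:PreservedSpecimen"])
```

Putting "PreservedSpecimen" into dc:type would raise an error in this sketch, which is the inconsistency under discussion.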
From MRTG's point of view, I think the real discussion is the original one: get dc:type consistent across MRTG, DwC and DC. The rest of the discussion probably doesn't impact MRTG very directly, except that MRTG has to track the fallout and chime in in two places: (a) where we have used another DwC term that was consistent with DwC before but becomes inconsistent after the DwC revision and (b) where expertise of MRTG contributors smells DwC usage problems. I suspect (a) is minimal. Happily for me, my expertise probably doesn't cover (b), but Gregor and Steve seem to be chiming just fine. :-)
Bob
participants (9)
- "Markus Döring (GBIF)"
- Blum, Stan
- Bob Morris
- Gregor Hagedorn
- John R. WIECZOREK
- Kevin Richards
- Markus Döring
- Richard Pyle
- Steve Baskauf