My apologies to John Wieczorek for the panicked tone of my previous email to the list.  It appeared to me (apparently incorrectly) that the issue was being closed without addressing the concern that I raised when I initially brought this up in my post to the MRTG wiki (http://www.keytonature.eu/wiki/MRTGv08_Type_term_inconsistent_with_DwC).   I have the following general concerns about what is being proposed.  I will follow with a particular use-case comment in a separate email. 

Hierarchy clarification needed.  If I am understanding the proposal as John has summarized it, there would be three terms that could apply to a resource with metadata subject to DwC.  A new DwC term recordClass would have controlled values corresponding to the TDWG classes: "Occurrence", "Event", "Location", "Taxon".  dcterms:type is another term which could theoretically have DCMI Type values of: "Collection", "Dataset", "Event", "Image", "InteractiveResource", "MovingImage", "PhysicalObject", "Service", "Software", "Sound", "StillImage", or "Text" (although not all may be appropriate in the DwC context).   It is not clear to me in John's proposal whether the assignment of dcterms:type is intended to be independent of the value of recordClass, or if dcterms:type is intended to be a subclass of recordClass Occurrence (i.e. applicable only to resources that represent things that can document Occurrences such as Images and PhysicalObjects). In Gregor's email initiating this discussion he indicated that he felt that they should be independent.  But in John's proposal, the third term, basisOfRecord, is clearly intended to be a subclass of dcterms:type with possible values of "StillImage", "MovingImage", "Sound", "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen", "HumanObservation", "MachineObservation".  Since all of these basisOfRecord objects are the bases for documenting Occurrences, that implies that their parent dcterms:type terms should fall under recordClass Occurrence. 

Problems with calling basisOfRecord a subclass of dcterms:type.  PreservedSpecimen, FossilSpecimen, and LivingSpecimen can clearly fall under PhysicalObject, but what do we do with the rest?  basisOfRecord terms StillImage and MovingImage could be subtypes of dcterms:type Image, but what about a Sound?  Is basisOfRecord Sound a subtype of dcterms:type Sound?  What about a 35mm slide picturing an organism?  It is an Image, but is also a PhysicalObject.  Gregor suggested that basisOfRecord HumanObservation and basisOfRecord MachineObservation could be subtypes of dcterms:type Event, but I disagree.  Just like observations, StillImages and PreservedSpecimens have Event information associated with them (the time and location of their creation), but we don't classify them as Events.  An Event is a conceptually different thing from the resource that is created at that Event. 

How does this change facilitate machine processing?  In John's request for comments, he quoted "3.3. Semantic changes in Darwin Core terms" which mentioned "if ... such changes of meaning are likely to have substantial impact on either machine processing of Darwin Core terms... "  It is not at all clear to me how the proposed reorganization of these three terms (recordClass, dcterms:type, and basisOfRecord) will facilitate machine processing, in particular because of the problems in associating particular basisOfRecord terms with dcterms:type terms as I discussed in the previous paragraph.  From a machine-processing standpoint, it makes a lot more sense to subclass recordClass Occurrence as follows:
PhysicalObject (including basisOfRecord "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen", and any other relevant material objects such as Seeds, FilmImage, etc.)
DigitalObject (including basisOfRecord "StillImage", "MovingImage", "Sound", and any other relevant file-representable objects such as DnaSequences)
NoObject (including basisOfRecord "HumanObservation", "MachineObservation")
The consuming application that is receiving the metadata can then know, when the record involves an Occurrence, that if the record's subclass is:
- PhysicalObject then there is an object somewhere that is not deliverable through the Internet but which could be visited in an herbarium or museum.
- DigitalObject then there is a representational file that should be retrieved and presented to the user of the application in an appropriate way.
- NoObject then there will only be metadata including measurements but nothing for the user to see, hear, etc.
Alternatively, rather than creating the three subclasses, simply create a property for resources of the class Occurrence called "objectType" and allow it to have values of PhysicalObject, DigitalObject, or NoObject.  This would serve the same purpose of informing the consuming application of the nature of the resource without having to create another hierarchical layer.  From a machine-processing standpoint, I don't see a great benefit to presenting a biodiversity-related consuming application with both dcterms:type and basisOfRecord.
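
As a rough sketch of how a consuming application might act on such a value, the following Python fragment groups the basisOfRecord values as listed above and picks a handling strategy per group.  The record layout (a plain dict keyed by term name), the function names, and the handling descriptions are all assumptions made for illustration; none of them are part of DwC or of John's proposal.

```python
from enum import Enum


class ObjectType(Enum):
    """The three proposed values for a hypothetical "objectType" property."""
    PHYSICAL_OBJECT = "PhysicalObject"
    DIGITAL_OBJECT = "DigitalObject"
    NO_OBJECT = "NoObject"


# Grouping of basisOfRecord values as suggested in this message.
BASIS_TO_OBJECT_TYPE = {
    "PreservedSpecimen": ObjectType.PHYSICAL_OBJECT,
    "FossilSpecimen": ObjectType.PHYSICAL_OBJECT,
    "LivingSpecimen": ObjectType.PHYSICAL_OBJECT,
    "StillImage": ObjectType.DIGITAL_OBJECT,
    "MovingImage": ObjectType.DIGITAL_OBJECT,
    "Sound": ObjectType.DIGITAL_OBJECT,
    "HumanObservation": ObjectType.NO_OBJECT,
    "MachineObservation": ObjectType.NO_OBJECT,
}

# Illustrative handling per object type (wording is an assumption).
HANDLING = {
    ObjectType.PHYSICAL_OBJECT: "object held in a collection; tell the user where it can be visited",
    ObjectType.DIGITAL_OBJECT: "representational file; retrieve it and present it to the user",
    ObjectType.NO_OBJECT: "metadata and measurements only; nothing to see or hear",
}


def object_type_of(record):
    """Return the ObjectType of an Occurrence record, or None for other classes.

    An explicit "objectType" value wins; otherwise the type is inferred from
    "basisOfRecord".  Unknown values raise ValueError or KeyError.
    """
    if record.get("recordClass") != "Occurrence":
        return None
    raw = record.get("objectType")
    if raw is not None:
        return ObjectType(raw)
    return BASIS_TO_OBJECT_TYPE[record["basisOfRecord"]]


def handling_for(record):
    """Describe how a consuming application might treat the record."""
    object_type = object_type_of(record)
    if object_type is None:
        return "not an Occurrence; objectType does not apply"
    return HANDLING[object_type]


if __name__ == "__main__":
    print(handling_for({"recordClass": "Occurrence", "basisOfRecord": "Sound"}))
```

Note that the application never needs to consult dcterms:type to make this decision, which is the point of the paragraph above.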

An alternative.  To me, trying to merge dcterms:type and its DCMI Type values together with the new DwC recordClass and basisOfRecord is like trying to fit a square peg in a round hole.  DCMI wasn't created with biodiversity records in mind and its vocabulary really doesn't mesh very well with DwC.  I fully support the idea of creating the new DwC term recordClass to contain what DwC formerly put in dcterms:type.  However, I think it would be better to leave the actual dcterms:type and its DCMI Type values as an independent thing without trying to make DwC basisOfRecord its subtype.  Biodiversity media providers who need their databases to mesh with non-biodiversity records should assign dcterms:type values to their records; non-media providers can do so if they want.  Biodiversity data providers (including those who provide media) should assign recordClass and basisOfRecord values to their records. 

Steve Baskauf

John R. WIECZOREK wrote:
Gregor,

That sounds like a good solution to all of the problems. I would
propose that the basisOfRecord IS the same thing as your proposed
dwc:subtype, so we should keep basisOfRecord.

Net solution:

1) keep dcterms:type
2) use DCType vocabulary to control dcterms:type (so, StillImage,
PhysicalObject, Event, etc.)
3) keep basisOfRecord
4) use our DwC-specific subtypes (PreservedSpecimen, FossilSpecimen,
HumanObservation, etc.) as the controlled vocabulary for basisOfRecord
without a formal type vocabulary (very close to how it is now, just
some of the terms would go to dcterms:type).
5) add a recordClass term
6) use the DwCType vocabulary to control the recordClass term instead
of the dcterms:type term.

This solution fixes the Dublin Core - Darwin Core controlled
vocabulary problem, retains all existing terms, isolates the
controlled vocabulary that is specific to our domain, making it very
easy to expand without changes to the standard.

Any objections?

John

On Fri, Oct 23, 2009 at 12:33 PM, Gregor Hagedorn
<g.m.hagedorn@gmail.com> wrote:
  
How about we retain basisOfRecord, but have it refine dcterms:type,
drop dcterms:type and add a "recordClass" term in its place that is
governed exactly as dcterms:type is currently being used in the
recently ratified version of the Core?
      
recordClass for Taxon/Occurrence/Event sounds good.

I am less sure about keeping the "perspective-dependent"
basisOfRecord-term in a place where dcterms:type. The dcterms:type
vocabulary is, in principle, extensible, and meant to be extended.
Except, of course, some specific xml-implementation of dublin core
prevent this... To avoid problems with this one might desire to have
only the strict resource type vocabulary in dcterms:type. Then this
could be PhysicalObject/Event and a dwc:subtype added to express
PreservedSpecimen/MachineObservation etc. Essentially, MRTG intends to
use such a mrtg:subtype as well to differentiate different StillImage
or Text subtypes:
 http://www.keytonature.eu/wiki/MRTG_Schema_v0.8#Subtype

This would then mean, DarwinCore might support:
 dwc:recordClass
 dcterms:type
 dwc:subtype

Gregor

    

  

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu