John reminded me that I hadn't responded to his last post in the thread about my call for action on my proposal (http://code.google.com/p/darwincore/issues/detail?id=68) for adding DigitalStillImage to the DwC type vocabulary (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm) to serve as a controlled value for basisOfRecord (http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord).  After he sent the reply I actually lay awake in bed pondering what it was that disturbed me so much about basisOfRecord.  After mulling it over for a few days I have concluded that the problem is that basisOfRecord and the DwC type vocabulary are too heavily overloaded.  I will try to briefly state why I think this and then say what I think the implications are for my DigitalStillImage proposal.  Apologies in advance to Bob for sloppy use of technical terms.

THE PURPOSES
It seems like there are basically three separate purposes that basisOfRecord and the dwctype vocabulary can serve/do serve/are intended to serve:
1. To define the types of resources, e.g. to serve as an object for the predicate rdf:type .  Semantic reasoning based on assigning an rdf:type property to a resource would indicate that the resource was an instance of an rdfs:Class (see http://www.w3.org/TR/rdf-schema/#ch_type).
2. To serve as a property of an instance of dwc:Occurrence to allow a user to evaluate its fitness of use, e.g. an Occurrence having basisOfRecord=PreservedSpecimen is an Occurrence having supporting evidence (or a "token") that is a preserved specimen. 
3. To formally define the ontological relationship among DwC classes through the assignment of an rdfs:subClassOf property to a member of the DwC type vocabulary (see http://www.w3.org/TR/rdf-schema/#ch_subclassof).  As noted below, the current term definitions state that dwctype:PreservedSpecimen is a subclass of dwctype:Occurrence, which is a subclass of dcmitype:Event.

I think that at the time of the adoption of Darwin Core as a standard, this overloading may have made sense, but now that people are considering using Darwin Core terms in RDF, this overloading is a hindrance, not a help.  If I state that a resource has rdf:type=dwctype:PreservedSpecimen in an attempt to describe what the resource is, I end up causing unintended inferences to be drawn by semantic reasoners.  If I state that a resource has dwc:basisOfRecord=dwctype:PreservedSpecimen, it is not clear whether I'm talking about the Occurrence or the physical specimen itself. 
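To make the unintended inference concrete, here is a minimal sketch in plain Python (no RDF library) of what a reasoner applying the rdfs:subClassOf rule would conclude.  It assumes only the two subclass statements mentioned in purpose #3 above; the resource name ex:specimen1 and the helper functions are hypothetical and for illustration only.

```python
# Toy RDFS-style inference over (subject, predicate, object) triples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Subclass statements as described in the current term definitions.
schema = {
    ("dwctype:PreservedSpecimen", SUBCLASS_OF, "dwctype:Occurrence"),
    ("dwctype:Occurrence", SUBCLASS_OF, "dcmitype:Event"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def inferred_types(resource, data, triples):
    """Return the asserted rdf:type values plus all of their superclasses."""
    asserted = {o for s, p, o in data if s == resource and p == RDF_TYPE}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return inferred


# ex:specimen1 is a made-up resource that I only meant to describe as a specimen.
data = {("ex:specimen1", RDF_TYPE, "dwctype:PreservedSpecimen")}
types = inferred_types("ex:specimen1", data, schema)
print(sorted(types))
```

Under these assumptions, the single statement that the resource is a PreservedSpecimen also makes it an Occurrence and an Event, which is the kind of inference I did not intend to make.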

SUGGESTIONS
I would like to suggest that the way forward here is to separate these three functions, at least for the present.  It seems clear from the recent discussion on the email list that the kind of subclassing that currently exists (#3 above) does not represent the way that many people in the Darwin Core constituency are looking at PreservedSpecimens, Occurrences, and Events.  I would recommend removing the rdfs:subClassOf properties from all of the terms in the Darwin Core vocabulary until more substantial work is completed (with community consensus) on the TDWG ontology.  To facilitate purpose #1, I would also recommend that all current Darwin Core classes be represented in the dwctype vocabulary.  As I noted, the dwc:Identification class and some others are missing.  Based on what John said below, I guess this was intentional, but if dwctype was intended for use #1, then all of the classes should be there.  I actually believe that it would be beneficial to have a PhysicalSpecimen class and to move some Occurrence terms to it (similar issues with Taxon class), but that is a hornet's nest that I don't want to kick at the moment.  But anyway it would be relatively straightforward to just include all of the existing classes as dwctypes. 

Purpose #2 is the one that seems the most problematic to me.  It seems to me that if we want to refine Occurrence (as seems to be the purpose of making PreservedSpecimen, HumanObservation, etc. subclasses of Occurrence), then it would be better to create separate terms for those refined Occurrences to be used as the object of basisOfRecord rather than using the dwctypes.  Thus we would have something like basisOfRecord="OccurrenceWithEvidencePreservedSpecimen", etc. rather than basisOfRecord="PreservedSpecimen".  It would then be clear that the subject is the Occurrence, not the evidence.  With this approach, there would be no problem with making the statement [OccurrenceWithEvidencePreservedSpecimen] rdfs:subClassOf [dwctype:Occurrence] in contrast with saying [PreservedSpecimen] rdfs:subClassOf [dwctype:Occurrence].  I'm not saying that this is a good thing to do - in fact I don't like it at all based on the philosophy that Roger laid out in http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot.  But it is the approach that would get us out of making statements that don't make sense due to our lack of separation between the desire to refine Occurrence and to define rdf:type's. 

ABOUT THE DigitalStillImage PROPOSAL
Having said that, it seems like DigitalStillImage is not needed to serve purpose #1.  For that we already have http://purl.org/dc/dcmitype/StillImage .  However, if people want to do #2, then DigitalStillImage should be there as a value for basisOfRecord.  As I said in the paragraph above, "OccurrenceWithEvidenceDigitalStillImage" would probably be better than "DigitalStillImage" for this purpose.  But I don't really care anymore because I have no intention of doing #2 since I would prefer not to subclass Occurrence.  Instead I would create a new term (desperately needed, I think) that is a property of Occurrence and call it "hasEvidence", "hasToken", "evidenceID", or "tokenID", and that connects an Occurrence to an evidence URI.  I would then type the evidence using rdf:type.  But from what John says, it seems that there may be a lot of people who want to subclass Occurrence and for their benefit perhaps DigitalStillImage should be there. 
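As a rough illustration of the alternative, the sketch below links an Occurrence to its evidence through a separate property and types each piece of evidence with rdf:type.  The property name ex:hasEvidence is a placeholder for whichever of the candidate names might be chosen (no such term exists in Darwin Core), and the ex: resources and the helper function are likewise hypothetical.

```python
RDF_TYPE = "rdf:type"
HAS_EVIDENCE = "ex:hasEvidence"  # hypothetical term, not part of Darwin Core

# Made-up example data: one Occurrence documented by two pieces of evidence.
triples = [
    ("ex:occurrence1", RDF_TYPE, "http://rs.tdwg.org/dwc/terms/Occurrence"),
    ("ex:occurrence1", HAS_EVIDENCE, "ex:image1"),
    ("ex:occurrence1", HAS_EVIDENCE, "ex:specimen1"),
    ("ex:image1", RDF_TYPE, "http://purl.org/dc/dcmitype/StillImage"),
    ("ex:specimen1", RDF_TYPE, "dwctype:PreservedSpecimen"),
]


def evidence_types(occurrence, triples):
    """Map each evidence URI linked to the occurrence to its asserted rdf:type values."""
    evidence = [o for s, p, o in triples if s == occurrence and p == HAS_EVIDENCE]
    return {
        e: sorted(o for s, p, o in triples if s == e and p == RDF_TYPE)
        for e in evidence
    }


result = evidence_types("ex:occurrence1", triples)
for uri, types in result.items():
    print(uri, types)
```

In this sketch the Occurrence keeps a single type and the question "what kind of evidence is there?" is answered by the types of the linked resources, so nothing has to be said about subclasses of Occurrence.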

From my point of view, I would be happy either with letting the proposal go up for a vote as is and letting the TAG accept it or shoot it down, or with "freezing" the proposal (something John at some point hinted might be possible) until some future time when the ontology and RDF development have reached a point where there is a consensus view (expressed in the DwC standard itself, not just in the email list) about what an Occurrence is and what its properties (including basisOfRecord) should be and mean.  I just think that I'm ready to be done with worrying about it.

Steve

P.S. I'm going to stick in a few inline comments below as appropriate.  Especially take note of the comment regarding the creation of new classes and typing.

John Wieczorek wrote:
...


I think we have a lot of work to go the next step, to get the semantics right, for which choices have to be considered more carefully than has been done in the past. Thankfully, it seems we have reached a critical mass to try to meet the challenge. The effort is much appreciated.
Yes that is encouraging!

...

No. We don't want to say that, even the way Darwin Core is right now, because specimens don't have properties separate from Occurrences - they ARE Occurrences.
Well, I'm glad to hear you say that because I was starting to think that I was crazy when I treated PreservedSpecimens, images, sounds, etc. AS Occurrences in the Biodiversity Informatics paper.  I think I got that notion from reading your posts on the list last (northern hemisphere) fall. 
  ...

Yes. What this says is that Darwin Core, as it currently stands, doesn't care to distinguish between these classes. I agree that this is a mistake. It makes me wonder if it is EVER appropriate to subClass.
OK, how hard would it be to get rid of the subClassOf properties in the RDF term definitions?  Is that something that the TAG could do by fiat or would it have to go through the official DwC change process?

...
 
I guess if one objects to this kind of overloading, one could just use the URIs of the DwC classes themselves as values for rdf:type rather than using the dwctype URIs.  The RDF there doesn't seem to have any kind of subclassing (but there is the problem of typing PhysicalSpecimen which is not a generic DwC class).

And a new class would have to be created every time a new token was conceived, and new predicates for every relationship of every new token to every other Class they might relate to. Doesn't seem scalable to try to define the whole biodiversity universe in this way, so what is the "sweet spot" solution? Create only what you need to create to solve the specific problem at hand? Don't try to standardize at this level?
Well, upon further reflection, I'm thinking that it isn't necessary to create a new class for every kind of token.  I have been mentally equating Darwin Core classes with rdfs:Class'es.  They could be the same thing, but wouldn't have to be.  What is important here is that we have a way to meet the (somewhat vague) requirement in the GUID Applicability statement (http://www.tdwg.org/stdtrack/article/download/150/51) that resources in the biodiversity informatics world should be typed using a well-known vocabulary.  If there are already well-known classes or types for the thing we need to describe (like StillImage in Dublin Core or Person in FOAF) then we can use them.  The issue comes to a head when other vocabularies don't have terms for the things we need to type, e.g. Individuals and PreservedSpecimens.  That's where DwC needs to create either classes or dwctype terms (unencumbered with subClass properties!).  The problem here is that there needs to be a consensus in the community about appropriate values for rdf:type in RDF provided for GUIDs.  Can a "semantic reasoner" program know that http://rs.tdwg.org/dwc/dwctype/Occurrence is the same kind of thing as http://rs.tdwg.org/dwc/terms/Occurrence and as the TDWG ontology URI for Occurrence (which I can't remember at the moment)?  I think not, unless we assert some kind of sameAs relationship, which seems dangerous to me.  The GUID applicability statement specifically mentions the TDWG ontology, but I think the GUID implementation is happening too fast to wait for all of the required "cat herding" that would be needed before a TDWG Ontology is done enough to serve this purpose.  I think dwctype is the best answer for things that aren't already typed in another vocabulary.  But I'm not going to use it while the subclass problems are still there.
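The point about the two Occurrence URIs can be sketched as follows.  This is a toy check in plain Python, not a real reasoner: it assumes that two URIs count as "the same kind of thing" only if they are identical or connected by explicitly asserted equivalence statements.  I use owl:sameAs as the label for that relationship purely for illustration; which equivalence property (if any) would be appropriate is exactly the open question.

```python
SAME_AS = "owl:sameAs"

DWCTYPE_OCCURRENCE = "http://rs.tdwg.org/dwc/dwctype/Occurrence"
DWC_TERMS_OCCURRENCE = "http://rs.tdwg.org/dwc/terms/Occurrence"


def equivalent(a, b, triples):
    """True if a and b are the same URI or linked by a chain of sameAs statements."""
    seen = {a}
    stack = [a]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            # sameAs is symmetric, so follow the link in both directions.
            if s == current:
                neighbor = o
            elif o == current:
                neighbor = s
            else:
                continue
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return b in seen


# With no assertion, the two URIs are just different strings.
without_assertion = equivalent(DWCTYPE_OCCURRENCE, DWC_TERMS_OCCURRENCE, [])

# With an explicit (hypothetical) assertion, they are treated as equivalent.
with_assertion = equivalent(
    DWCTYPE_OCCURRENCE,
    DWC_TERMS_OCCURRENCE,
    [(DWC_TERMS_OCCURRENCE, SAME_AS, DWCTYPE_OCCURRENCE)],
)
print(without_assertion, with_assertion)
```

The sketch only shows that the equivalence has to come from somewhere; it says nothing about whether asserting it would be safe.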
 
By the way, we don't have one DwCType for every DwC (or borrowed Dublin Core) Class as you said.  There are no types for Identification, Event, and GeologicalContext.

There is a type for Event (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm#Event), but not for the other two. When I said "We have one DwCType for every DwC (or borrowed Dublin Core) Class" I should have added "for which it makes sense to have a record."
As I implied above, I think it makes sense to have a dwctype for an instance of any DwC class for which one might reasonably create a separate GUID (which is probably all of the classes). 


-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu