[tdwg-content] Overloading of basisOfRecord and the DwC type vocabulary (includes DigitalStillImage term addition proposal)

Steve Baskauf steve.baskauf at vanderbilt.edu
Wed Nov 3 18:02:03 CET 2010


John reminded me that I hadn't responded to his last post in the thread 
about my call for action on my proposal 
(http://code.google.com/p/darwincore/issues/detail?id=68) for adding 
DigitalStillImage to the DwC type vocabulary 
(http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm) to serve as a 
controlled value for basisOfRecord 
(http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord).  After he sent 
the reply I actually lay awake in bed pondering what it was that
disturbed me so much about basisOfRecord.  After mulling it over for a
few days I have concluded that the problem is that basisOfRecord and the
DwC type vocabulary are too heavily overloaded.  I will try to briefly
state why I think this and then say what I think the implications are 
for my DigitalStillImage proposal.  Apologies in advance to Bob for 
sloppy use of technical terms.

THE PURPOSES
It seems like there are basically three separate purposes that 
basisOfRecord and the dwctype vocabulary can serve/do serve/are intended 
to serve:
1. To define the types of resources, e.g. to serve as an object for the
predicate rdf:type.  Semantic reasoning based on assigning an rdf:type
property to a resource would indicate that the resource was an instance 
of an rdfs:Class (see http://www.w3.org/TR/rdf-schema/#ch_type).
2. To serve as a property of an instance of dwc:Occurrence to allow a
user to evaluate its fitness for use, e.g. an Occurrence having
basisOfRecord=PreservedSpecimen is an Occurrence having supporting
evidence (or a "token") that is a preserved specimen.
3. To formally define the ontological relationship among DwC classes 
through the assignment of an rdfs:subClassOf property to a member of the 
DwC type vocabulary (see
http://www.w3.org/TR/rdf-schema/#ch_subclassof).  As noted below, the
current term definitions state that dwctype:PreservedSpecimen is a
subclass of dwctype:Occurrence, which is a subclass of dcmitype:Event.
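
A minimal Python sketch of why purposes #1 and #3 collide.  It hand-rolls
the RDFS entailment rules for rdfs:subClassOf (rdfs9, type inheritance,
and rdfs11, transitivity) over plain string triples rather than using any
real reasoner.  The two subclass triples are the ones in the current term
definitions; ex:specimen1 is a hypothetical URI for a physical specimen.

```python
# Minimal sketch: RDFS subclass entailment over (subject, predicate, object) tuples.
# Prefixed names stand in for full URIs; "ex:" resources are hypothetical.

def entailed_types(resource, triples):
    """Return every rdf:type of `resource`, closed under rdfs:subClassOf."""
    parents = {}
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            parents.setdefault(s, set()).add(o)
    # Start from the directly asserted types; walking up the subClassOf graph
    # applies rdfs9 (type inheritance) and rdfs11 (transitivity).
    stack = [o for s, p, o in triples if s == resource and p == "rdf:type"]
    seen = set()
    while stack:
        cls = stack.pop()
        if cls not in seen:  # the seen-set also guards against cycles
            seen.add(cls)
            stack.extend(parents.get(cls, ()))
    return seen

triples = {
    # Subclass chain as stated in the current DwC type vocabulary.
    ("dwctype:PreservedSpecimen", "rdfs:subClassOf", "dwctype:Occurrence"),
    ("dwctype:Occurrence", "rdfs:subClassOf", "dcmitype:Event"),
    # Purpose #1: try to say what the (hypothetical) specimen is.
    ("ex:specimen1", "rdf:type", "dwctype:PreservedSpecimen"),
}

print(sorted(entailed_types("ex:specimen1", triples)))
# ['dcmitype:Event', 'dwctype:Occurrence', 'dwctype:PreservedSpecimen']
```

The specimen, a physical object in a cabinet, ends up entailed to be a
dcmitype:Event, which is the kind of unintended inference discussed here.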

I think that at the time of the adoption of Darwin Core as a standard, 
this overloading may have made sense, but now that people are 
considering using Darwin Core terms in RDF, this overloading is a 
hindrance, not a help.  If I state that a resource has 
rdf:type=dwctype:PreservedSpecimen in an attempt to describe what the 
resource is, I end up causing unintended inferences to be drawn by 
semantic reasoners.  If I state that a resource has 
dwc:basisOfRecord=dwctype:PreservedSpecimen, it is not clear whether I'm 
talking about the Occurrence or the physical specimen itself. 

SUGGESTIONS
I would like to suggest that the way forward here is to separate these 
three functions, at least for the present.  It seems clear from the 
recent discussion on the email list that the kind of subclassing that 
currently exists (#3 above) does not represent the way that many people 
in the Darwin Core constituency are looking at PreservedSpecimens, 
Occurrences, and Events.  I would recommend removing the rdfs:subClassOf 
properties from all of the terms in the Darwin Core vocabulary until 
more substantial work is completed (with community consensus) on the 
TDWG ontology.  To facilitate purpose #1, I would also recommend that 
all current Darwin Core classes be represented in the dwctype 
vocabulary.  As I noted, the dwc:Identification class and some others 
are missing.  Based on what John said below, I guess this was 
intentional, but if dwctype was intended for use #1, then all of the 
classes should be there.  I actually believe that it would be beneficial 
to have a PhysicalSpecimen class and to move some Occurrence terms to it 
(similar issues with Taxon class), but that is a hornet's nest that I 
don't want to kick at the moment.  But anyway it would be relatively 
straightforward to just include all of the existing classes as dwctypes. 

Purpose #2 is the one that seems most problematic to me.  I think that
if we want to refine Occurrence (as seems to be the purpose
of making PreservedSpecimen, HumanObservation, etc. subclasses of 
Occurrence), then it would be better to create separate terms for those 
refined Occurrences to be used as the object of basisOfRecord rather 
than using the dwctypes.  Thus we would have something like 
basisOfRecord="OccurrenceWithEvidencePreservedSpecimen", etc. rather 
than basisOfRecord="PreservedSpecimen".  It would then be clear that the 
subject is the Occurrence, not the evidence.  With this approach, there 
would be no problem with making the statement 
[OccurrenceWithEvidencePreservedSpecimen] rdfs:subClassOf 
[dwctype:Occurrence] in contrast with saying [PreservedSpecimen] 
rdfs:subClassOf [dwctype:Occurrence].  I'm not saying that this is a 
good thing to do - in fact I don't like it at all based on the 
philosophy that Roger laid out in 
http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot.  But it is the 
approach that would get us out of making statements that don't make 
sense due to our lack of separation between the desire to refine
Occurrence and the desire to define rdf:types.
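
A minimal sketch of the refined-Occurrence alternative, using the same
hand-rolled subclass closure.  ex:OccurrenceWithEvidencePreservedSpecimen
is the hypothetical term suggested above (the ex: namespace is invented
for illustration), used here as the rdf:type of the Occurrence, which is
what the rdfs:subClassOf statement implies.  dwctype:PreservedSpecimen is
assumed to carry no rdfs:subClassOf property.

```python
# Minimal sketch: type the Occurrence with a refined class; keep the specimen's type flat.

def entailed_types(resource, triples):
    """Return every rdf:type of `resource`, closed under rdfs:subClassOf."""
    parents = {}
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            parents.setdefault(s, set()).add(o)
    stack = [o for s, p, o in triples if s == resource and p == "rdf:type"]
    seen = set()
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(parents.get(cls, ()))
    return seen

triples = {
    # Hypothetical refined-Occurrence class; its subject is clearly the Occurrence.
    ("ex:OccurrenceWithEvidencePreservedSpecimen", "rdfs:subClassOf", "dwctype:Occurrence"),
    ("ex:occurrence1", "rdf:type", "ex:OccurrenceWithEvidencePreservedSpecimen"),
    # The specimen keeps a type with no subClassOf attached (assumption).
    ("ex:specimen1", "rdf:type", "dwctype:PreservedSpecimen"),
}

print(sorted(entailed_types("ex:occurrence1", triples)))
# ['dwctype:Occurrence', 'ex:OccurrenceWithEvidencePreservedSpecimen']
print(sorted(entailed_types("ex:specimen1", triples)))
# ['dwctype:PreservedSpecimen']
```

Only the Occurrence inherits dwctype:Occurrence; nothing is entailed
about the specimen beyond its own type.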

ABOUT THE DigitalStillImage PROPOSAL
Having said that, it seems like DigitalStillImage is not needed to serve 
purpose #1.  For that we already have 
http://purl.org/dc/dcmitype/StillImage .  However, if people want to do 
#2, then DigitalStillImage should be there as a value for 
basisOfRecord.  As I said in the paragraph above, 
"OccurrenceWithEvidenceDigitalStillImage" would probably be better than 
"DigitalStillImage" for this purpose.  But I don't really care anymore 
because I have no intention of doing #2 since I would prefer not to 
subclass Occurrence.  Instead I would create a new term (desperately 
needed, I think) that is a property of Occurrence and call it 
"hasEvidence", "hasToken", "evidenceID", or "tokenID", and that connects 
an Occurrence to an evidence URI.  I would then type the evidence using 
rdf:type.  But from what John says, it seems that there may be a lot of 
people who want to subclass Occurrence and for their benefit perhaps 
DigitalStillImage should be there. 
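
A minimal sketch of the alternative just described: leave Occurrence
unsubclassed, link it to its evidence with a property, and type each
piece of evidence with rdf:type from whatever vocabulary already has a
suitable class.  ex:hasEvidence is a hypothetical property (its name is
one of those floated above), and all ex: URIs are invented.

```python
# Minimal sketch: Occurrence --ex:hasEvidence--> evidence URI; evidence typed directly.
from collections import defaultdict

HAS_EVIDENCE = "ex:hasEvidence"  # hypothetical property name

triples = {
    ("ex:occurrence1", "rdf:type", "dwctype:Occurrence"),
    ("ex:occurrence1", HAS_EVIDENCE, "ex:image1"),
    ("ex:occurrence1", HAS_EVIDENCE, "ex:specimen1"),
    # Use an existing well-known class where one exists (Dublin Core StillImage)...
    ("ex:image1", "rdf:type", "dcmitype:StillImage"),
    # ...and a DwC type only where no other vocabulary has one.
    ("ex:specimen1", "rdf:type", "dwctype:PreservedSpecimen"),
}

def evidence_by_type(occurrence, triples):
    """Map each rdf:type of the occurrence's evidence to the evidence URIs having it."""
    evidence = {o for s, p, o in triples if s == occurrence and p == HAS_EVIDENCE}
    grouped = defaultdict(set)
    for s, p, o in triples:
        if s in evidence and p == "rdf:type":
            grouped[o].add(s)
    return dict(grouped)

print(sorted(evidence_by_type("ex:occurrence1", triples).items()))
# [('dcmitype:StillImage', {'ex:image1'}), ('dwctype:PreservedSpecimen', {'ex:specimen1'})]
```

Because no subclass statements are involved, a fitness-for-use filter
(purpose #2) can become a query on the evidence's type rather than a
refinement of Occurrence.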

From my point of view, I would be happy either with letting the
proposal go up for a vote as is, so that the TAG can accept it or shoot it
down, or with "freezing" the proposal (something John at some point hinted
might be possible) until some future time when the ontology and RDF 
development have reached a point where there is a consensus view 
(expressed in the DwC standard itself, not just in the email list) about 
what an Occurrence is and what its properties (including basisOfRecord) 
should be and mean.  I just think that I'm ready to be done with 
worrying about it.

Steve

P.S. I'm going to stick in a few inline comments below as appropriate.  
Especially take note of the comment regarding the creation of new 
classes and typing.

John Wieczorek wrote:
> ...
>
>
> I think we have a lot of work to go the next step, to get the 
> semantics right, for which choices have to be considered more 
> carefully than has been done in the past. Thankfully, it seems we have 
> reached a critical mass to try to meet the challenge. The effort is 
> much appreciated.
Yes that is encouraging!
>
> ...
>
> No. We don't want to say that, even the way Darwin Core is right now, 
> because specimens don't have properties separate from Occurrences - 
> they ARE Occurrences.
Well, I'm glad to hear you say that because I was starting to think that 
I was crazy when I treated PreservedSpecimens, images, sounds, etc. AS 
Occurrences in the Biodiversity Informatics paper.  I think I got that 
notion from reading your posts on the list last (northern hemisphere) 
fall. 
>   ...
>
> Yes. What this says is that Darwin Core, as it currently stands, 
> doesn't care to distinguish between these classes. I agree that this 
> is a mistake. It makes me wonder if it is EVER appropriate to subClass.
OK, how hard would it be to get rid of the subClassOf properties in the 
RDF term definitions?  Is that something that the TAG could do by fiat 
or would it have to go through the official DwC change process?
>
> ...
>  
>
>     I guess if one objects to this kind of overloading, one could just
>     use the URIs of the DwC classes themselves as values for rdf:type
>     rather than using the dwctype URIs.  The RDF there doesn't seem to
>     have any kind of subclassing (but there is the problem of typing
>     PhysicalSpecimen which is not a generic DwC class).
>
>
> And a new class would have to be created every time a new token was 
> conceived, and new predicates for every relationship of every new 
> token to every other Class they might relate to. Doesn't seem scalable 
> to try to define the whole biodiversity universe in this way, so what 
> is the "sweet spot" solution? Create only what you need to create to
> solve the specific problem at hand? Don't try to standardize at this 
> level?
Well, upon further reflection, I'm thinking that it isn't necessary to 
create a new class for every kind of token.  I have been mentally 
equating Darwin Core classes with RDFS classes (rdfs:Class).  They could be the same
thing, but wouldn't have to be.  What is important here is that we have 
a way to meet the (somewhat vague) requirement in the GUID Applicability 
statement (http://www.tdwg.org/stdtrack/article/download/150/51) that 
resources in the biodiversity informatics world should be typed using a 
well-known vocabulary.  If there are already well-known classes or types 
for the thing we need to describe (like StillImage in Dublin Core or 
Person in FOAF) then we can use them.  The issue comes to a head when 
other vocabularies don't have terms for the things we need to type, e.g. 
Individuals and PreservedSpecimens.  That's where DwC needs to create 
either classes or dwctype terms (unencumbered with subClass 
properties!).  The problem here is that there needs to be a consensus in 
the community about appropriate values for rdf:type in RDF provided for 
GUIDs.  Can a "semantic reasoner" program know that 
http://rs.tdwg.org/dwc/dwctype/Occurrence is the same kind of thing as 
http://rs.tdwg.org/dwc/terms/Occurrence and as the TDWG ontology URI for 
Occurrence (which I can't remember at the moment)?  I think not unless 
we assert some kind of sameAs relationship, which seems dangerous to 
me.  The GUID applicability statement specifically mentions the TDWG 
ontology, but I think the GUID implementation is happening too fast to
wait for all of the required "cat herding" that would be needed before a 
TDWG Ontology is done enough to serve this purpose.  I think dwctype is 
the best answer for things that aren't already typed in another 
vocabulary.  But I'm not going to use it while the subclass problems are 
still there.
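
A minimal sketch of the sameAs question.  To a reasoner,
dwctype:Occurrence and dwc:Occurrence (http://rs.tdwg.org/dwc/terms/Occurrence)
are just two different URIs, so a query for one misses resources typed
with the other until some equivalence is asserted.  Here the equivalence
is modeled as rdfs:subClassOf in both directions, which is what an
owl:equivalentClass statement would entail; the ex: records are
hypothetical.

```python
# Minimal sketch: a class query is only as good as the asserted class relationships.

def entailed_types(resource, triples):
    """Return every rdf:type of `resource`, closed under rdfs:subClassOf."""
    parents = {}
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            parents.setdefault(s, set()).add(o)
    stack = [o for s, p, o in triples if s == resource and p == "rdf:type"]
    seen = set()
    while stack:
        cls = stack.pop()
        if cls not in seen:  # the seen-set stops the two-way subClassOf cycle
            seen.add(cls)
            stack.extend(parents.get(cls, ()))
    return seen

def instances_of(cls, triples):
    """All typed resources whose entailed types include `cls`."""
    typed = {s for s, p, o in triples if p == "rdf:type"}
    return {r for r in typed if cls in entailed_types(r, triples)}

data = {
    ("ex:occ1", "rdf:type", "dwctype:Occurrence"),  # typed with the dwctype URI
    ("ex:occ2", "rdf:type", "dwc:Occurrence"),      # typed with the terms-namespace URI
}
print(sorted(instances_of("dwc:Occurrence", data)))  # ['ex:occ2']

# Assert the equivalence as subClassOf in both directions.
equivalence = {
    ("dwctype:Occurrence", "rdfs:subClassOf", "dwc:Occurrence"),
    ("dwc:Occurrence", "rdfs:subClassOf", "dwctype:Occurrence"),
}
print(sorted(instances_of("dwc:Occurrence", data | equivalence)))  # ['ex:occ1', 'ex:occ2']
```

The same mechanism that rescues the query also copies every superclass
attached to one URI onto the other (e.g. the dcmitype:Event link), which
is one reason such an assertion can be risky.
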
>  
>
>     By the way, we don't have one DwCType for every DwC (or borrowed
>     Dublin Core) Class as you said.  There are no types for
>     Identification, Event, and GeologicalContext
>
>
> There is a type for Event 
> (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm#Event), but 
> not for the other two. When I said "We have one DwCType for every DwC 
> (or borrowed Dublin Core) Class" I should have added "for which it 
> makes sense to have a record."
As I implied above, I think it makes sense to have a dwctype for an 
instance of any DwC class for which one might reasonably create a 
separate GUID (which is probably all of the classes). 
>

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
