[tdwg-content] Overloading of basisOfRecord and the DwC type vocabulary (includes DigitalStillImage term addition proposal)
Steve Baskauf
steve.baskauf at vanderbilt.edu
Wed Nov 3 18:02:03 CET 2010
John reminded me that I hadn't responded to his last post in the thread
about my call for action on my proposal
(http://code.google.com/p/darwincore/issues/detail?id=68) for adding
DigitalStillImage to the DwC type vocabulary
(http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm) to serve as a
controlled value for basisOfRecord
(http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord). After he sent
the reply I actually lay in bed awake pondering what it was that
disturbed me so much about basisOfRecord. After mulling it over for a
few days I have concluded that the problem is that basisOfRecord and the
DwC type vocabulary are overloaded too much. I will try to briefly
state why I think this and then say what I think the implications are
for my DigitalStillImage proposal. Apologies in advance to Bob for
sloppy use of technical terms.
THE PURPOSES
It seems like there are basically three separate purposes that
basisOfRecord and the dwctype vocabulary can serve/do serve/are intended
to serve:
1. To define the types of resources, e.g. to serve as an object for the
predicate rdf:type. Semantic reasoning based on assigning an rdf:type
property to a resource would indicate that the resource was an instance
of an rdfs:Class (see http://www.w3.org/TR/rdf-schema/#ch_type).
2. To serve as a property of an instance of dwc:Occurrence to allow a
user to evaluate its fitness for use, e.g. an Occurrence having
basisOfRecord=PreservedSpecimen is an Occurrence having supporting
evidence (or a "token") that is a preserved specimen.
3. To formally define the ontological relationship among DwC classes
through the assignment of an rdfs:subClassOf property to a member of the
DwC type vocabulary (see
http://www.w3.org/TR/rdf-schema/#ch_subclassof). As noted below, the
current term definitions state that dwctype:PreservedSpecimen is a
subclass of dwctype:Occurrence, which is a subclass of dcmitype:Event.
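To make the overloading concrete, here is a small Python sketch (not part of any DwC tooling) that writes the three uses as (subject, predicate, object) triples and lists the roles that a single term ends up playing. The resource names ex:specimen1 and ex:occ1 are hypothetical; the subclass statements are the ones described in item 3.

```python
# Illustrative triples written as CURIE strings.
# "ex:specimen1" and "ex:occ1" are hypothetical resource identifiers.
TRIPLES = [
    # Purpose 1: typing a resource
    ("ex:specimen1", "rdf:type", "dwctype:PreservedSpecimen"),
    # Purpose 2: describing the evidence behind an Occurrence
    ("ex:occ1", "dwc:basisOfRecord", "dwctype:PreservedSpecimen"),
    # Purpose 3: placing the term in a class hierarchy
    ("dwctype:PreservedSpecimen", "rdfs:subClassOf", "dwctype:Occurrence"),
    ("dwctype:Occurrence", "rdfs:subClassOf", "dcmitype:Event"),
]


def roles_of(term, triples):
    """Return the set of (position, predicate) roles a term plays."""
    roles = set()
    for s, p, o in triples:
        if s == term:
            roles.add(("subject", p))
        if o == term:
            roles.add(("object", p))
    return roles


if __name__ == "__main__":
    # One term, three different jobs.
    for role in sorted(roles_of("dwctype:PreservedSpecimen", TRIPLES)):
        print(role)
```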
I think that at the time of the adoption of Darwin Core as a standard,
this overloading may have made sense, but now that people are
considering using Darwin Core terms in RDF, this overloading is a
hindrance, not a help. If I state that a resource has
rdf:type=dwctype:PreservedSpecimen in an attempt to describe what the
resource is, I end up causing unintended inferences to be drawn by
semantic reasoners. If I state that a resource has
dwc:basisOfRecord=dwctype:PreservedSpecimen, it is not clear whether I'm
talking about the Occurrence or the physical specimen itself.
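A toy forward-chaining reasoner shows where the unintended inference comes from. This sketch implements only two RDFS entailment rules (rdfs9, which propagates rdf:type up the hierarchy, and rdfs11, which makes rdfs:subClassOf transitive) and uses CURIE strings instead of full URIs; ex:specimen1 is a hypothetical specimen. A full RDFS reasoner applies these same two rules among others.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Forward-chain two RDFS rules until no new triples appear:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    inferred = set(triples)
    while True:
        new = set()
        subclass = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        for x, p, a in inferred:
            if p == RDF_TYPE:
                for a2, b in subclass:
                    if a == a2:
                        new.add((x, RDF_TYPE, b))
        if new <= inferred:
            return inferred
        inferred |= new


def types_of(resource, triples):
    """All rdf:type values stated or inferred for a resource."""
    return {o for s, p, o in triples if s == resource and p == RDF_TYPE}


# Typing a specimen, plus the subclass statements in the current definitions.
DATA = {
    ("ex:specimen1", RDF_TYPE, "dwctype:PreservedSpecimen"),
    ("dwctype:PreservedSpecimen", SUBCLASS, "dwctype:Occurrence"),
    ("dwctype:Occurrence", SUBCLASS, "dcmitype:Event"),
}

print(sorted(types_of("ex:specimen1", rdfs_closure(DATA))))
# ['dcmitype:Event', 'dwctype:Occurrence', 'dwctype:PreservedSpecimen']
```

The specimen, a physical object, comes out typed as an Event, which is exactly the kind of statement nobody intended to make.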
SUGGESTIONS
I would like to suggest that the way forward here is to separate these
three functions, at least for the present. It seems clear from the
recent discussion on the email list that the kind of subclassing that
currently exists (#3 above) does not represent the way that many people
in the Darwin Core constituency are looking at PreservedSpecimens,
Occurrences, and Events. I would recommend removing the rdfs:subClassOf
properties from all of the terms in the Darwin Core vocabulary until
more substantial work is completed (with community consensus) on the
TDWG ontology. To facilitate purpose #1, I would also recommend that
all current Darwin Core classes be represented in the dwctype
vocabulary. As I noted, the dwc:Identification class and some others
are missing. Based on what John said below, I guess this was
intentional, but if dwctype was intended for use #1, then all of the
classes should be there. I actually believe that it would be beneficial
to have a PhysicalSpecimen class and to move some Occurrence terms to it
(similar issues with Taxon class), but that is a hornet's nest that I
don't want to kick at the moment. But anyway it would be relatively
straightforward to just include all of the existing classes as dwctypes.
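The two concrete suggestions in this paragraph are mechanical enough to sketch as simple checks. The class and type lists below are deliberately partial: they contain only the classes and dwctype terms named in this thread, not the full Darwin Core term lists, and the rdfs:label triple is a hypothetical stand-in for any non-subclass statement.

```python
# Partial lists: only the classes and types mentioned in this thread.
DWC_CLASSES = {"Occurrence", "Event", "Identification", "GeologicalContext"}
DWCTYPE_TERMS = {"Occurrence", "Event", "PreservedSpecimen"}

SCHEMA = [
    ("dwctype:PreservedSpecimen", "rdfs:subClassOf", "dwctype:Occurrence"),
    ("dwctype:Occurrence", "rdfs:subClassOf", "dcmitype:Event"),
    # Hypothetical label, kept to show that other statements survive.
    ("dwctype:PreservedSpecimen", "rdfs:label", "Preserved Specimen"),
]


def strip_subclass_of(triples):
    """Suggestion 1: drop every rdfs:subClassOf statement, keep the rest."""
    return [t for t in triples if t[1] != "rdfs:subClassOf"]


def missing_dwctypes(classes, dwctype_terms):
    """Suggestion 2: classes that do not yet have a same-named dwctype term."""
    return sorted(classes - dwctype_terms)


print(strip_subclass_of(SCHEMA))
print(missing_dwctypes(DWC_CLASSES, DWCTYPE_TERMS))
# ['GeologicalContext', 'Identification']
```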
Purpose #2 is the one that seems the most problematic to me. It seems
to me that if we want to refine Occurrence (as seems to be the purpose
of making PreservedSpecimen, HumanObservation, etc. subclasses of
Occurrence), then it would be better to create separate terms for those
refined Occurrences to be used as the object of basisOfRecord rather
than using the dwctypes. Thus we would have something like
basisOfRecord="OccurrenceWithEvidencePreservedSpecimen", etc. rather
than basisOfRecord="PreservedSpecimen". It would then be clear that the
subject is the Occurrence, not the evidence. With this approach, there
would be no problem with making the statement
[OccurrenceWithEvidencePreservedSpecimen] rdfs:subClassOf
[dwctype:Occurrence] in contrast with saying [PreservedSpecimen]
rdfs:subClassOf [dwctype:Occurrence]. I'm not saying that this is a
good thing to do - in fact I don't like it at all based on the
philosophy that Roger laid out in
http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot. But it is the
approach that would get us out of making statements that don't make
sense due to our lack of separation between the desire to refine
Occurrence and to define rdf:types.
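A sketch of the refined-Occurrence approach, using a hypothetical ex: namespace and hypothetical helper names. It assumes the Occurrence itself is typed with the refined class so that a reasoner can use the subclass statement; the only inference drawn is then about the Occurrence, never about the specimen.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def refined_class(dwctype_value):
    """Build the hypothetical refined-Occurrence class for a value, e.g.
    "PreservedSpecimen" -> "ex:OccurrenceWithEvidencePreservedSpecimen"."""
    return "ex:OccurrenceWithEvidence" + dwctype_value


def describe_occurrence(occ_id, dwctype_value):
    """Type the Occurrence with its refined class and declare that class
    a subclass of dwctype:Occurrence."""
    cls = refined_class(dwctype_value)
    return {
        (occ_id, RDF_TYPE, cls),
        (cls, SUBCLASS, "dwctype:Occurrence"),
    }


def inferred_types(resource, triples):
    """One application of RDFS rule rdfs9 (no transitivity needed here)."""
    direct = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    via_subclass = {b for a, p, b in triples if p == SUBCLASS and a in direct}
    return direct | via_subclass


triples = describe_occurrence("ex:occ1", "PreservedSpecimen")
print(sorted(inferred_types("ex:occ1", triples)))
# ['dwctype:Occurrence', 'ex:OccurrenceWithEvidencePreservedSpecimen']
```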
ABOUT THE DigitalStillImage PROPOSAL
Having said that, it seems like DigitalStillImage is not needed to serve
purpose #1. For that we already have
http://purl.org/dc/dcmitype/StillImage . However, if people want to do
#2, then DigitalStillImage should be there as a value for
basisOfRecord. As I said in the paragraph above,
"OccurrenceWithEvidenceDigitalStillImage" would probably be better than
"DigitalStillImage" for this purpose. But I don't really care anymore
because I have no intention of doing #2 since I would prefer not to
subclass Occurrence. Instead I would create a new term (desperately
needed, I think) that is a property of Occurrence and call it
"hasEvidence", "hasToken", "evidenceID", or "tokenID", and that connects
an Occurrence to an evidence URI. I would then type the evidence using
rdf:type. But from what John says, it seems that there may be a lot of
people who want to subclass Occurrence and for their benefit perhaps
DigitalStillImage should be there.
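A sketch of the alternative: link the Occurrence to its evidence with a property and put rdf:type only on the evidence. Here ex:hasEvidence stands in for whichever of hasEvidence, hasToken, evidenceID or tokenID might be adopted; it is not an existing Darwin Core term. The class ex:PreservedSpecimen and all resource names are likewise hypothetical. Fitness for use is then evaluated by looking at the evidence type, with no subclass of Occurrence involved.

```python
RDF_TYPE = "rdf:type"
# Hypothetical property; stands in for hasEvidence/hasToken/evidenceID/tokenID.
HAS_EVIDENCE = "ex:hasEvidence"

TRIPLES = {
    ("ex:occ1", RDF_TYPE, "dwc:Occurrence"),
    ("ex:occ1", HAS_EVIDENCE, "ex:image42"),
    ("ex:image42", RDF_TYPE, "dcmitype:StillImage"),
    ("ex:occ2", RDF_TYPE, "dwc:Occurrence"),
    ("ex:occ2", HAS_EVIDENCE, "ex:specimen7"),
    ("ex:specimen7", RDF_TYPE, "ex:PreservedSpecimen"),  # hypothetical class
}


def occurrences_with_evidence_type(triples, evidence_type):
    """Occurrences whose linked evidence is typed with evidence_type.
    Only the evidence carries the specific type; the Occurrence does not."""
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == evidence_type}
    return sorted(s for s, p, o in triples if p == HAS_EVIDENCE and o in typed)


print(occurrences_with_evidence_type(TRIPLES, "dcmitype:StillImage"))
# ['ex:occ1']
```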
From my point of view, I would be happy with either letting the
proposal go up for a vote as is and let the TAG accept it or shoot it
down, or to "freeze" the proposal (something John at some point hinted
might be possible) until some future time when the ontology and RDF
development have reached a point where there is a consensus view
(expressed in the DwC standard itself, not just in the email list) about
what an Occurrence is and what its properties (including basisOfRecord)
should be and mean. I just think that I'm ready to be done with
worrying about it.
Steve
P.S. I'm going to stick in a few inline comments below as appropriate.
Especially take note of the comment regarding the creation of new
classes and typing.
John Wieczorek wrote:
> ...
>
>
> I think we have a lot of work to go the next step, to get the
> semantics right, for which choices have to be considered more
> carefully than has been done in the past. Thankfully, it seems we have
> reached a critical mass to try to meet the challenge. The effort is
> much appreciated.
Yes, that is encouraging!
>
> ...
>
> No. We don't want to say that, even the way Darwin Core is right now,
> because specimens don't have properties separate from Occurrences -
> they ARE Occurrences.
Well, I'm glad to hear you say that because I was starting to think that
I was crazy when I treated PreservedSpecimens, images, sounds, etc. AS
Occurrences in the Biodiversity Informatics paper. I think I got that
notion from reading your posts on the list last (northern hemisphere)
fall.
> ...
>
> Yes. What this says is that Darwin Core, as it currently stands,
> doesn't care to distinguish between these classes. I agree that this
> is a mistake. It makes me wonder if it is EVER appropriate to subClass.
OK, how hard would it be to get rid of the subClassOf properties in the
RDF term definitions? Is that something that the TAG could do by fiat
or would it have to go through the official DwC change process?
>
> ...
>
>
>> I guess if one objects to this kind of overloading, one could just
>> use the URIs of the DwC classes themselves as values for rdf:type
>> rather than using the dwctype URIs. The RDF there doesn't seem to
>> have any kind of subclassing (but there is the problem of typing
>> PhysicalSpecimen which is not a generic DwC class).
>
>
> And a new class would have to be created every time a new token was
> conceived, and new predicates for every relationship of every new
> token to every other Class they might relate to. Doesn't seem scalable
> to try to define the whole biodiversity universe in this way, so what
> is the "sweet spot" solution. Create only what you need to create to
> solve the specific problem at hand? Don't try to standardize at this
> level?
Well, upon further reflection, I'm thinking that it isn't necessary to
create a new class for every kind of token. I have been mentally
equating Darwin Core classes with instances of rdfs:Class. They could be the same
thing, but wouldn't have to be. What is important here is that we have
a way to meet the (somewhat vague) requirement in the GUID Applicability
statement (http://www.tdwg.org/stdtrack/article/download/150/51) that
resources in the biodiversity informatics world should be typed using a
well-known vocabulary. If there are already well-known classes or types
for the thing we need to describe (like StillImage in Dublin Core or
Person in FOAF) then we can use them. The issue comes to a head when
other vocabularies don't have terms for the things we need to type, e.g.
Individuals and PreservedSpecimens. That's where DwC needs to create
either classes or dwctype terms (unencumbered with subClass
properties!). The problem here is that there needs to be a consensus in
the community about appropriate values for rdf:type in RDF provided for
GUIDs. Can a "semantic reasoner" program know that
http://rs.tdwg.org/dwc/dwctype/Occurrence is the same kind of thing as
http://rs.tdwg.org/dwc/terms/Occurrence and as the TDWG ontology URI for
Occurrence (which I can't remember at the moment)? I think not unless
we assert some kind of sameAs relationship, which seems dangerous to
me. The GUID applicability statement specifically mentions the TDWG
ontology, but I think the GUID implementation is happening too fast to
wait for all of the required "cat herding" that would be needed before a
TDWG Ontology is done enough to serve this purpose. I think dwctype is
the best answer for things that aren't already typed in another
vocabulary. But I'm not going to use it while the subclass problems are
still there.
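The identity problem can be illustrated with a toy "smushing" step that rewrites every URI to one representative. The two Occurrence URIs are the ones quoted in this paragraph; the sameAs links are given as a dict pointing each URI at a chosen representative, which is a simplification since real owl:sameAs is symmetric. The point: once the link is asserted, the subClassOf statement attached to the dwctype URI is carried over to the terms URI as well.

```python
DWCTYPE_OCC = "http://rs.tdwg.org/dwc/dwctype/Occurrence"
TERMS_OCC = "http://rs.tdwg.org/dwc/terms/Occurrence"

TRIPLES = [
    ("ex:occ1", "rdf:type", DWCTYPE_OCC),  # hypothetical resources
    ("ex:occ2", "rdf:type", TERMS_OCC),
    (DWCTYPE_OCC, "rdfs:subClassOf", "dcmitype:Event"),
]


def canonical(uri, same_as):
    """Follow sameAs-style links (a dict) to a representative URI."""
    seen = set()
    while uri in same_as and uri not in seen:
        seen.add(uri)
        uri = same_as[uri]
    return uri


def merge(triples, same_as):
    """Rewrite every URI in every triple to its representative."""
    return {tuple(canonical(t, same_as) for t in triple) for triple in triples}


def same_class(a, b, same_as):
    """Plain URI comparison after applying any asserted links."""
    return canonical(a, same_as) == canonical(b, same_as)


links = {DWCTYPE_OCC: TERMS_OCC}
print(same_class(DWCTYPE_OCC, TERMS_OCC, {}))     # False
print(same_class(DWCTYPE_OCC, TERMS_OCC, links))  # True
print((TERMS_OCC, "rdfs:subClassOf", "dcmitype:Event") in merge(TRIPLES, links))  # True
```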
>
>
>> By the way, we don't have one DwCType for every DwC (or borrowed
>> Dublin Core) Class as you said. There are no types for
>> Identification, Event, and GeologicalContext
>
>
> There is a type for Event
> (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm#Event), but
> not for the other two. When I said "We have one DwCType for every DwC
> (or borrowed Dublin Core) Class" I should have added "for which it
> makes sense to have a record."
As I implied above, I think it makes sense to have a dwctype for an
instance of any DwC class for which one might reasonably create a
separate GUID (which is probably all of the classes).
>
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu