[tdwg-content] Use of dwc:xxxxID terms in RDF, was Re: dwc:associatedOccurrences

Steve Baskauf steve.baskauf at vanderbilt.edu
Tue Aug 31 21:34:22 CEST 2010


Markus,
Thanks!  After I posted the other night, it occurred to me that the way I 
had constructed the RDF was a bit convoluted.  Your example is much more 
straightforward (although, apart from a lack of typing on my part, the 
two examples actually result in the same triples). 

As you noted, the other thing that my example has is that the record for 
the offspring makes reference to the dwc:resourceRelationshipID as a way 
to assert that the offspring has that relationship.  I would be 
interested in some discussion about the appropriate use of the various 
dwc:xxxxID terms (e.g. dwc:occurrenceID, dwc:institutionID, etc.).  As 
was the case with the resourceRelationship terms, there aren't a lot of 
examples showing the proper use of these terms, with the noteworthy 
exception of http://rs.tdwg.org/dwc/terms/guides/xml/index.htm .  As I 
see it, there are two possible uses of these terms.  One (the "first" 
use) is to identify the actual resource that is the subject of a 
record.  The other (the "second" use) is as an "IDRef" term that 
connects the subject resource to another record of a different type.  In 
the XML example, we see both uses.  In the first example in section 2.7 
we see

    <dcterms:Location>
        <dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
             ... more metadata...
    </dcterms:Location>
    <dwc:Occurrence>
             ... more metadata ...
        <dwc:occurrenceID>urn:catalog:MVZ:Mammals:14523</dwc:occurrenceID>
             ... more metadata ...
        <dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
    </dwc:Occurrence>

where in the dcterms:Location record, dwc:locationID indicates the 
identifier for the Location itself (i.e. the subject), while in the 
dwc:Occurrence record, dwc:locationID refers to the object of the 
location property of the Occurrence.  Because of the freewheeling nature 
of generic XML files, I guess this is OK.  However, in trying to think 
clearly about how to express relationships in RDF, I've decided that I 
don't like this dual use.  I would express the above relationships in 
RDF as follows:

<rdf:Description rdf:about="http://guid.mvz.org/sites/arg/127">
    <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
       ...metadata...
</rdf:Description>
<rdf:Description rdf:about="http://resolver.org/urn:catalog:MVZ:Mammals:14523">
    <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence"/>
       ... metadata...
    <dwc:locationID rdf:resource="http://guid.mvz.org/sites/arg/127"/>
</rdf:Description>

To me this makes the most sense.  It corresponds to the "second" use in 
the XML example (it indicates the Location property of the Occurrence, 
i.e. serves as the predicate connecting the subject Occurrence to the 
object Location).   One could also do something like

<rdf:Description rdf:about="http://guid.mvz.org/sites/arg/127">
    <dwc:locationID rdf:resource="http://guid.mvz.org/sites/arg/127"/>
    <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
       ...metadata...
etc.

or

<rdf:Description rdf:about="http://guid.mvz.org/sites/arg/127">
    <dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
    <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
       ...metadata...
etc.

But this seems rather pointless because in the first case, the rdf:about 
attribute already provides information about the subject of the 
relationship in a Linked Data manner.  In the second case, why have a 
special term to indicate a literal version of the identifier when there 
is a more generic and well-known term that is in common use?  E.g.

<rdf:Description rdf:about="http://guid.mvz.org/sites/arg/127">
    <dcterms:identifier>http://guid.mvz.org/sites/arg/127</dcterms:identifier>
    <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
       ...metadata...
etc.

In contrast, there is no simple alternative that I can see (with the 
possible exception of the dwc:relatedResource, which I wouldn't call a 
SIMPLE alternative) to using dwc:locationID as the predicate that points 
to the Location object of an Occurrence subject.  Darwin Core doesn't 
have any terms specifically designed for this (such as "hasLocation" or 
"hasIdentification", for example), but I don't see any reason why the 
dwc:xxxxID terms couldn't be used this way. 
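
As a rough illustration of the IDRef ("second") use described above, the 
following Python sketch parses the RDF example from this message and 
follows dwc:locationID from the Occurrence to its Location.  It is only 
a sketch under stated assumptions: the rdf:RDF wrapper and namespace 
declarations are added so the fragment parses on its own, the type 
property is written as rdf:type, the elided metadata is left out, and 
the triples() helper is a hypothetical name that handles only flat 
rdf:Description blocks, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"

# The wrapper element and xmlns declarations are assumptions added so that
# the fragment is a parseable document; the elided metadata is omitted.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <rdf:Description rdf:about="http://guid.mvz.org/sites/arg/127">
    <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://resolver.org/urn:catalog:MVZ:Mammals:14523">
    <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence"/>
    <dwc:locationID rdf:resource="http://guid.mvz.org/sites/arg/127"/>
  </rdf:Description>
</rdf:RDF>"""


def triples(xml_text):
    """Yield (subject, predicate, object) for flat rdf:Description blocks.

    Hypothetical helper: it ignores nested nodes, blank nodes, datatypes and
    every other RDF/XML feature not used in the fragment above.
    """
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree writes qualified names as "{namespace}local".
            predicate = prop.tag.replace("{", "").replace("}", "")
            obj = prop.get(f"{{{RDF}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()  # literal value
            yield (subject, predicate, obj)


GRAPH = list(triples(DOC))
OCCURRENCE = "http://resolver.org/urn:catalog:MVZ:Mammals:14523"

# IDRef use: dwc:locationID is the predicate linking Occurrence to Location.
LOCATIONS = [o for s, p, o in GRAPH
             if s == OCCURRENCE and p == DWC + "locationID"]

for triple in GRAPH:
    print(triple)
```

Here LOCATIONS holds the URI of the Location resource, which can then be 
looked up as a subject in the same graph.  A real application would more 
likely use a full RDF parser than this hand-rolled reader.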

The main issue I can see with the use of these dwc:xxxxID terms is their 
placement in classes, which I guess is related to their rdfs:domain.  
Although the term description comments at 
http://darwincore.googlecode.com/svn/trunk/rdf/dwcterms.rdf (is this the 
actual place where DwC is officially defined?) say that each term has an 
rdfs:domain property, the terms actually don't have one in the RDF.  So the only real 
clue to the intended subjects of the dwc:xxxxID terms is the placement 
of the dwc:xxxxID terms into classes.  In most cases, a dwc:xxxxID term 
is placed in the class of the thing that the term is identifying (e.g. 
occurrenceID is in the Occurrence class, identificationID is in the 
Identification class, etc.).  This would hint that the terms were 
intended to be used with instances of their class as their subject.  
However, if the terms were used as IDRefs (the "second" use shown in the 
XML example) then they really should be in different classes.  For 
example, dwc:locationID could be a member of the class Occurrence, since 
an Occurrence could have dwc:locationID as a property with an instance 
of a Location as its object.  But then the question arises of 
whether an Event could also have dwc:locationID as a property, and 
dwc:locationID shouldn't be in two classes.  In the absence of defined 
rdfs:domains, I guess the terms can be used with pretty much any subject 
a user thinks makes sense.  As Bob Morris pointed out in a recent post, 
there aren't very many applications that are doing much in the way of 
semantic reasoning at this point.  But I suppose (hope?) that there 
could be some in the future, so achieving some kind of consensus on how 
the terms should be used in RDF would probably be good.  
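
To make the rdfs:domain concern concrete, here is a small Python sketch 
of what a reasoner could conclude if a domain were declared.  It is 
hypothetical: as noted above, the Darwin Core RDF does not declare a 
domain for dwc:locationID, so the schema triple below is an assumption 
made purely for illustration, and apply_domain_rule is a made-up helper 
implementing only the RDFS domain entailment rule (rdfs2).

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
DWC = "http://rs.tdwg.org/dwc/terms/"
LOCATION_CLASS = "http://purl.org/dc/terms/Location"
OCCURRENCE = "http://resolver.org/urn:catalog:MVZ:Mammals:14523"
LOCATION = "http://guid.mvz.org/sites/arg/127"

# Data triples: the IDRef use, with an Occurrence as the subject.
DATA = {
    (OCCURRENCE, RDF_TYPE, DWC + "Occurrence"),
    (OCCURRENCE, DWC + "locationID", LOCATION),
}

# HYPOTHETICAL schema triple, not part of Darwin Core: it pretends that
# dwc:locationID has the Location class as its rdfs:domain.
HYPOTHETICAL_SCHEMA = {
    (DWC + "locationID", RDFS_DOMAIN, LOCATION_CLASS),
}


def apply_domain_rule(data, schema):
    """RDFS rule rdfs2: (p rdfs:domain C) and (s p o) entail (s rdf:type C)."""
    domains = {(p, c) for p, pred, c in schema if pred == RDFS_DOMAIN}
    inferred = {(s, RDF_TYPE, c)
                for s, p, o in data
                for domain_p, c in domains
                if p == domain_p}
    return inferred - data  # report only the newly entailed triples


INFERRED = apply_domain_rule(DATA, HYPOTHETICAL_SCHEMA)
for triple in sorted(INFERRED):
    print(triple)
```

Under that assumed domain, the rule types the Occurrence itself as a 
dcterms:Location, which is presumably not what anyone intends.  Whether 
and how domains get declared therefore decides which of the two uses a 
reasoner would treat as consistent.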

At this point, the dwc:xxxxID terms serve a useful purpose for me in 
connecting one kind of resource to another (examples in 
http://bioimages.vanderbilt.edu/ind-baskauf/41870.rdf and 
http://bioimages.vanderbilt.edu/baskauf/51363.rdf), so unless somebody 
tells me that's "wrong" and comes up with a better term to describe the 
relationships I need to describe, I'll keep using the dwc:xxxxID terms 
as IDRefs.

Comments?
Steve

Markus Döring wrote:
> I must say it feels strange to use the id terms in rdf to refer to rdf:resources instead of holding ID literals.
> More natural to the rdf language would be <dwc:resource rdf:resource="..." /> and <dwc:relatedResource rdf:resource="..." /> I suppose, but the dwc vocabulary was built for different technologies, not rdf alone. So I guess that's the price we have to pay.
>
> Markus
>   
-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
