Re: [tdwg-content] Comments on Cam's RDF practical details of recording a determination What is an Occurrence?

28 Oct 2010

Cam,
I have finally taken the time to look carefully at your RDF example (I'm 
not used to the Turtle serialization but managed to translate it into 
XML which is the way I "think" about RDF).  I'm not going to try to 
comment on every question that you asked because, since this discussion 
began, I've changed my thinking somewhat about how I would model 
certain things.  But you raise a number of important questions and I'll 
give opinions on a few.  Whether those opinions are shared by others or 
not remains to be seen and should be part of the discussion if an RDF 
task group gets off the ground. 

Q6.  The question of how DwC resources should be rdf:type'd remains 
open.  When I first tried to write RDF using DwC terms, I tried to type 
things using dwcvoc: .  However, there were too many types of resources 
that didn't have terms there and when I looked at the ontology, I wasn't 
sure that some of the terms that were there actually meant what I 
thought they should.  So I gave up and just decided to use the Darwin 
Core classes since they also qualified as "well known".  The DwC type 
vocabulary is another possibility for typing since it includes both some 
of the DwC classes as well as other types, such as PreservedSpecimen, 
which we would need if the model of separating the "token" from the 
Occurrences were followed.  However, the Identification class isn't 
included in the DwC type vocabulary (is that intentional or an 
oversight?).  Also, tokens that are StillImages, Sounds, etc. would have 
to be typed using the Dublin Core type vocabulary.  So at this point it 
seems like the rdf:type values would have to be drawn from at least 
three different sources to get the job done. 
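
A minimal Turtle sketch of what typing from three sources might look 
like.  The example.org subject URIs are placeholders, and the dwctype: 
namespace URI is an assumption about where the DwC type vocabulary terms 
live; only the dcmitype namespace appears elsewhere in this thread.

```turtle
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dwctype: <http://rs.tdwg.org/dwc/dwctype/> .   # assumed namespace
@prefix dcmitype: <http://purl.org/dc/dcmitype/> .

# 1. A Darwin Core class used directly as the type
<http://example.org/det/1> a dwc:Identification .

# 2. A DwC type vocabulary term for a physical token
<http://example.org/specimen/1> a dwctype:PreservedSpecimen .

# 3. A Dublin Core type vocabulary term for a media token
<http://example.org/image/1> a dcmitype:StillImage .
```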

Q1. I think sernec:Individual would be right rather than 
dwc:individualID (as described above).  I originally used 
dwc:individualID, but I now think that is not right and that the xxxxxID 
terms should be used to show the relationship among described resources. 

Q2-4.  If the "token" is separated from the Occurrence, then 
dwc:recordedBy is a property of the Occurrence and dcterms:created and 
dcterms:creator are properties of the token (if it's a create-able thing). 
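
As a rough sketch of that split, assuming placeholder example.org URIs 
and a made-up linking term ex:hasToken (hypothetical, not part of DwC or 
sernec), the properties might be divided like this:

```turtle
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dcmitype: <http://purl.org/dc/dcmitype/> .
@prefix ex: <http://example.org/terms#> .   # hypothetical vocabulary

# The Occurrence records who observed the organism.
<http://example.org/occur/9-2>
    a dwc:Occurrence ;
    dwc:recordedBy "Cam Webb" ;
    ex:hasToken <http://example.org/image/cw_28617> .

# The token (here an image) is a create-able thing, so it carries
# the Dublin Core creation metadata.
<http://example.org/image/cw_28617>
    a dcmitype:StillImage ;
    dcterms:creator "Cam Webb" ;
    dcterms:created "2008-01-01" .
```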

Q3 and Q5.  I think that for the sake of a consistent RDF structure that 
a client could actually know how to "crawl" and "understand", it would 
be best to have nodes (having URI identifiers) for all of the resources 
that end up being in a consensus fully-normalized model like 
http://bioimages.vanderbilt.edu/pages/token-explicit.gif .  As you know, 
I suggested the strategy of creating hash URIs to name nodes that the 
user doesn't care to maintain as separate database items.  
This has worked well for me in my experimentation. 
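
A minimal sketch of the hash URI idea, using placeholder example.org 
URIs (assumptions, not real records).  The location that would otherwise 
be a blank node is named by appending a fragment to the URI of the 
resource that the user does maintain:

```turtle
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .

# The occurrence is a maintained database item with its own URI.
<http://example.org/occur/9-1>
    a dwc:Occurrence ;
    dcterms:spatial <http://example.org/occur/9-1#loc> .

# The location is not tracked separately, so it is named with a hash URI
# hanging off the occurrence's URI instead of being left as a blank node.
<http://example.org/occur/9-1#loc>
    a dcterms:Location ;
    geo:lat "-1.25530" ;
    geo:long "109.95371" .
```

A client can then refer to the location node directly, which is not 
possible with a blank node outside the file that defines it.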

Q7.  I think we had a discussion in an earlier thread as to whether in 
RDF the xxxxxID terms should be used to identify the subject resource or 
just be reserved for indicating a reference to another related 
resource.  It was suggested (and I agree) that since a tag like
<dwc:Occurrence rdf:about="http://phylodiversity.net/xmalesia/occur/9-1">
already indicates that http://phylodiversity.net/xmalesia/occur/9-1 is a 
URI that identifies the occurrence, it's a bit redundant to also assert 
an identifier as an explicit property of the occurrence.  But I suppose 
it doesn't hurt anything.  Pete uses <dcterms:identifier> to do this as 
you did in your image example. 
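
For comparison, the same point in Turtle, as a sketch only.  Whether the 
identifier is best given as a literal or as a URI reference is left 
open; a literal is assumed here:

```turtle
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# The subject URI already identifies the occurrence ...
<http://phylodiversity.net/xmalesia/occur/9-1>
    a dwc:Occurrence ;
    # ... so repeating it as a property is redundant but harmless.
    dcterms:identifier "http://phylodiversity.net/xmalesia/occur/9-1" .
```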

Q9. If dwc:identificationReferences is appropriate here, then 
sernec:basedOnOccurrence does not need to exist.  Actually 
sernec:basedOnOccurrence probably shouldn't be used anyway if we 
separate tokens from their Occurrences (the appropriate term would then 
be basedOnToken or something like that). 

Hmm.  I guess I ended up commenting on most of the questions anyway.  
Two more general comments. 
1. After considerable thought, I've decided that I don't want to use 
direct access URLs for images as their identifying URIs.  There is 
nothing "wrong" with doing so, but once you use such a URL as a GUID, 
you're stuck with keeping the image at that location forever.  Also, 
that URI then refers to the specific pattern of bytes in the particular 
version of the image that you are serving from that URL, which can 
never change either (i.e. no editing).  A lot of the image metadata 
applies to any sized version of the image, not just the one that you've 
identified 
using the URL.  Then there are content negotiation issues with using a 
.jpg extension for a URI which I could discuss but won't get into here.  
For all of these reasons, I've decided for myself that I'd prefer to 
consider the image as a conceptual thing (non-information resource) and 
assign it an identifier with no extension which could then be subject to 
content negotiation.  I then use MRTG service access class instances to 
supply mrtg:accessURL values for whatever sizes of images I want to 
provide.  Because the access URLs are metadata and not themselves 
identifiers, I can change the access URLs without breaking any GUID 
rules.  This gives you the option to move your high-res images to an 
image repository rather than serving them from the domain from which 
your RDF is being served.  You can see an example of this approach at:
http://bioimages.vanderbilt.edu/baskauf/10685.rdf
2. If one assigns URIs to each resource included in the RDF file (i.e. 
the Individual, the image, the Occurrence, the Event, etc.), the degree 
of nesting can be reduced and blank nodes eliminated.  Of course you 
then need a way to connect the various resources.  I have been using the 
xxxxxID terms for this, i.e. to say that the individual has a certain 
Occurrence I say
[individual] dwc:occurrenceID [occurrence]
I think this is within the spirit of what the xxxxxID terms were 
intended to do and if we can use them in this way, it greatly reduces 
the number of new terms that would have to be created to express DwC in 
RDF (i.e. we don't need to make up dwc:hasOccurrence).  The downside to 
this is that few (none?) of the relationships that could be expressed by 
xxxxxID terms have inverse properties defined.  I made up a few in the 
sernec: vocabulary, but the need for such terms would have to be 
discussed at some point in a future RDF task group.  I don't know enough 
about how semantic clients work to know if just providing the properties 
in one direction would be good enough for the client to infer the 
inverse relationship and make use of it as necessary. 
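
A minimal sketch of the flat structure described in comment 2, with 
placeholder example.org URIs.  The inverse term ex:occurrenceOfIndividual 
is hypothetical (it is not in DwC or sernec) and is included only to 
show how an inverse could be declared with owl:inverseOf:

```turtle
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
@prefix ex: <http://example.org/terms#> .   # hypothetical vocabulary

# Each resource gets its own URI, so no nesting or blank nodes are needed.
<http://example.org/indiv/9>
    a sernec:Individual ;
    dwc:occurrenceID <http://example.org/occur/9-1> .

<http://example.org/occur/9-1>
    a dwc:Occurrence ;
    dwc:eventID <http://example.org/event/9-1> .

<http://example.org/event/9-1>
    a dwc:Event .

# Hypothetical inverse of dwc:occurrenceID, declared once in a vocabulary.
ex:occurrenceOfIndividual owl:inverseOf dwc:occurrenceID .
```

With such a declaration, a client that applies OWL reasoning could 
derive the triple in the other direction.  A client that does no 
reasoning would see only the stated direction, although a SPARQL query 
can still match that triple starting from either the subject or the 
object.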

Hope these comments are helpful.  I am a novice RDF user, so take what 
I've said with a grain of salt.
Steve

Cam Webb wrote:
...
Dear Steve and Rich,
Encouraged by your discussion of models of Occurrences and Individuals, 
and by Steve's related Biodiv. Informatics paper, I have modeled a real 
example of an individual plant and some of its various occurrences in RDF, 
using Steve's sernec terms to provide the predicates that are missing from 
DwC.  As I did so, a number of questions came up relating to choices of 
terms, and I would greatly appreciate your input on these choices.  The 
following includes all the choices considered, and so may not be 
semantically correct.  The questions (Q1-9) are interspersed with the RDF 
(serialized as Turtle).
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
<http://phylodiversity.net/xmalesia/indiv/9>
     a sernec:Individual ;  # Q1 - Is this an Individual ...
     a dwc:individualID ;   # Q1 - ... or an individualID (Baskauf 2010 app'x)?
# Specimen
     sernec:derivativeOccurrence [
         # Q2 : Use generic Occurrence from dwc:Occurrence ...
         a dwc:Occurrence ;
         dwc:basisOfRecord "PreservedSpecimen" ;
         dwc:recordNumber "Webb 5008" ;
         dwc:recordedBy "Cam Webb" ;
         # Q2 : ... or treat directly as a Specimen?
         a dwcvoc:Specimen ;
         dwcvoc:collectorsFieldNumber "5008" ;
         dwcvoc:collector "Cam Webb" ;
         # Q3 : Add the dwc:eventDate here as suggested by Baskauf?
         dwc:eventDate "2008-01-01" ;
         # Q4 : Treat occurrence as generic resource, using dc metadata?
         dcterms:creator "Cam Webb" ;
         dcterms:created "2008-01-01" ;
         # Q5 : Add dwc location data for Occurrence...
         dwc:coordinateUncertaintyInMeters "100" ;
         dwc:decimalLatitude "-1.25530" ;
         dwc:decimalLongitude "109.95371" ;
         dwc:geodeticDatum "WGS84" ;
         dwc:locality "Sukadana" ;
         # Q5 : ... or a Location.
         dcterms:spatial _:blank1 ;
         ] ;
# Photo:
     sernec:derivativeOccurrence [
         # Q6 : a dwc:Occurrence...
         a dwc:Occurrence ;
         # Q6 : ... or a dwcvoc:TaxonOccurrence. Which is better?
         a dwcvoc:TaxonOccurrence ;
         # Q7 : Again, use dwc terms...
         dwc:occurrenceID
                 <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
         dwc:basisOfRecord "DigitalStillImage" ;
         dwc:recordedBy "Cam Webb" ;
         dwc:eventDate "2008-01-01" ;
         # Q7 : ... or dc terms?
         dcterms:identifier
                 <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
         dcterms:creator "Cam Webb" ;
         dcterms:created "2008-01-01" ;
         dcterms:type <http://purl.org/dc/dcmitype/StillImage> ;
         # Q8 : Spatial data, same issue as above
         dcterms:spatial _:blank1 ;
         ] .
# Determination
[]  a dwc:Identification ;
     sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
     dwc:identifiedBy "Ferry Slik" ;
     dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
     dwc:dateIdentified "2009-02-22" ;
     # Q9 : Use dwc:identificationReferences or...
     dwc:identificationReferences
          <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
     # Q9 : ... sernec:basedOnOccurrence ?
     sernec:basedOnOccurrence
          <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .
# Location data for photo and specimen
_:blank1
     a dcterms:Location ;
     geo:long "109.95371" ;
     geo:lat "-1.25530" ;
     dwc:locality "Sukadana, on Tanah Merah road to beach" ;
     dwc:coordinateUncertaintyInMeters "100" .
I realize that for LOD applications the blank nodes should eventually have 
GUIDs.  Now, here is a slimmed down version of the above with my own 
choices. In general, I went with dcterms over dwc, where appropriate. 
You can also see the network (via dot) at: 
http://phylodiversity.net/cwebb/img/indiv9-slim.jpg or 
http://linkeddata.uriburner.com/ode/?uri=http://phylodiversity.net/cwebb/tmp...
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
<http://phylodiversity.net/xmalesia/indiv/9>
     a sernec:Individual ;
     sernec:derivativeOccurrence [  # Specimen
         a dwc:Occurrence ;
         dwc:basisOfRecord "PreservedSpecimen" ;
         dcterms:identifier "Webb 5008" ;
         dcterms:creator "Cam Webb" ;
         dcterms:created "2008-01-01" ;
         dcterms:spatial _:blank1 ;
         ] ;
     sernec:derivativeOccurrence [  # Photo
         a dwc:Occurrence ;
         dwc:basisOfRecord "DigitalStillImage" ;
         dcterms:identifier
                 <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
         dcterms:creator "Cam Webb" ;
         dcterms:created "2008-01-01" ;
         dcterms:spatial _:blank1 ;
         ] .
[]  a dwc:Identification ;
     sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
     dwc:identifiedBy "Ferry Slik" ;
     dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
     dwc:dateIdentified "2009-02-22" ;
     sernec:basedOnOccurrence
          <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .
_:blank1
     a dcterms:Location ;
     geo:long "109.95371" ;
     geo:lat "-1.25530" ;
     dwc:locality "Sukadana, on Tanah Merah road to beach" ;
     dwc:coordinateUncertaintyInMeters "100" .
I didn't think this could be done without creating new terms, so I'm very 
pleased to be getting closer to my goal of a LOD representation of our data
that maintains the Individuals as base entities.
Many thanks in advance for any thoughts.
Best,
Cam
-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
