Cam, I have finally taken the time to look carefully at your RDF example (I'm not used to the Turtle serialization, but I managed to translate it into XML, which is the way I "think" about RDF). I'm not going to try to comment on every question that you asked because, since this discussion began, I've changed my thinking somewhat about how I would model certain things. But you raise a number of important questions and I'll give opinions on a few. Whether those opinions are shared by others remains to be seen and should be part of the discussion if an RDF task group gets off the ground.
Q6. The question of how DwC resources should be rdf:type'd remains open. When I first tried to write RDF using DwC terms, I tried to type things using dwcvoc: . However, there were too many types of resources that didn't have terms there and when I looked at the ontology, I wasn't sure that some of the terms that were there actually meant what I thought they should. So I gave up and just decided to use the Darwin Core classes since they also qualified as "well known". The DwC type vocabulary is another possibility for typing since it includes both some of the DwC classes as well as other types, such as PreservedSpecimen, which we would need if the model of separating the "token" from the Occurrences were followed. However, the Identification class isn't included in the DwC type vocabulary (is that intentional or an oversight?). Also, tokens that are StillImages, Sounds, etc. would have to be typed using the Dublin Core type vocabulary. So at this point it seems like the rdf:type values would have to be drawn from at least three different sources to get the job done.
Q1. I think sernec:Individual would be right rather than dwc:individualID (as described above). I originally used dwc:individualID, but I now think that is not right and that the xxxxxID terms should be used to show the relationship among described resources.
Q2-4. If the "token" is separated from the Occurrence, then dwc:recordedBy is a property of the Occurrence and dcterms:created and dcterms:creator are properties of the token (if it's a create-able thing).
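A minimal Turtle sketch of that split, offered only as an illustration. The example.org URIs and the ex:hasToken linking predicate are hypothetical; they are not defined by DwC or sernec.

```turtle
@prefix dwc:      <http://rs.tdwg.org/dwc/terms/> .
@prefix dcterms:  <http://purl.org/dc/terms/> .
@prefix dcmitype: <http://purl.org/dc/dcmitype/> .
@prefix ex:       <http://example.org/terms#> .   # hypothetical namespace

# The Occurrence: the organism recorded at a place and time.
<http://example.org/occur/1>
    a dwc:Occurrence ;
    dwc:recordedBy "Cam Webb" ;
    ex:hasToken <http://example.org/image/1> .   # hypothetical predicate

# The token: the create-able thing that documents the Occurrence.
<http://example.org/image/1>
    a dcmitype:StillImage ;
    dcterms:creator "Cam Webb" ;
    dcterms:created "2008-01-01" .
```

Here dwc:recordedBy stays on the Occurrence, while the dcterms properties describe only the image.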
Q3 and Q5. I think that for the sake of a consistent RDF structure that a client could actually know how to "crawl" and "understand", it would be best to have nodes (having URI identifiers) for all of the resources that end up being in a consensus fully-normalized model like http://bioimages.vanderbilt.edu/pages/token-explicit.gif . As you know, I suggested the strategy of creating hash URIs to name nodes that the user doesn't care to maintain as separate database items. This has worked well for me in my experimentation.
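As a rough sketch of the hash URI idea applied to your example, the blank nodes could be named with fragments of the individual's URI. The fragment names (#occ-specimen, #loc) are invented here for illustration; any scheme that is unique within the document would do.

```turtle
@prefix dwc:     <http://rs.tdwg.org/dwc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix sernec:  <http://bioimages.vanderbilt.edu/rdf/terms#> .

<http://phylodiversity.net/xmalesia/indiv/9>
    a sernec:Individual ;
    sernec:derivativeOccurrence
        <http://phylodiversity.net/xmalesia/indiv/9#occ-specimen> .

# Formerly a blank node; the fragment name is hypothetical.
<http://phylodiversity.net/xmalesia/indiv/9#occ-specimen>
    a dwc:Occurrence ;
    dwc:basisOfRecord "PreservedSpecimen" ;
    dcterms:spatial <http://phylodiversity.net/xmalesia/indiv/9#loc> .

# Formerly _:blank1; the fragment name is hypothetical.
<http://phylodiversity.net/xmalesia/indiv/9#loc>
    a dcterms:Location ;
    dwc:locality "Sukadana, on Tanah Merah road to beach" .
```

Dereferencing any of the hash URIs retrieves the same document, so no extra database records are needed to serve them.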
Q7. I think we had a discussion in an earlier thread as to whether in RDF the xxxxxID terms should be used to identify the subject resource or just be reserved for indicating a reference to another related resource. It was suggested (and I agree) that since a tag like <dwc:Occurrence rdf:about="http://phylodiversity.net/xmalesia/occur/9-1"> already indicates that http://phylodiversity.net/xmalesia/occur/9-1 is a URI that identifies the occurrence, it's a bit redundant to also assert an identifier as an explicit property of the occurrence. But I suppose it doesn't hurt anything. Pete uses dcterms:identifier to do this, as you did in your image example.
Q9. If dwc:identificationReferences is appropriate here, then sernec:basedOnOccurrence does not need to exist. Actually sernec:basedOnOccurrence probably shouldn't be used anyway if we separate tokens from their Occurrences (the appropriate term would then be basedOnToken or something like that).
Hmm. I guess I ended up commenting on most of the questions anyway. Two more general comments.

1. After considerable thought, I've decided that I don't want to use direct access URLs for images as their identifying URIs. There is nothing "wrong" with doing so, but once you use a URL as a GUID, you're stuck with keeping the image at that location forever. Also, that URI then refers to the specific pattern of bytes in the particular version of the image that you are serving from that URL, which also may never change (i.e. no editing). A lot of the image metadata applies to any sized version of the image, not just the one that you've identified using the URL. Then there are content negotiation issues with using a .jpg extension for a URI, which I could discuss but won't get into here. For all of these reasons, I've decided for myself that I'd prefer to consider the image as a conceptual thing (non-information resource) and assign it an identifier with no extension, which could then be subject to content negotiation. I then use MRTG service access class instances to provide the mrtg:accessURL's for whatever sizes of images I want to provide. Because the accessURLs are metadata and not themselves identifiers, I can change the access URLs without breaking any GUID rules. This gives you the option to move your high-res images to an image repository rather than serving them from the domain from which your RDF is being served. You can see an example of this approach at: http://bioimages.vanderbilt.edu/baskauf/10685.rdf

2. If one assigns URIs to each resource included in the RDF file (i.e. the Individual, the image, the Occurrence, the Event, etc.), the degree of nesting can be reduced and blank nodes eliminated. Of course you then need a way to connect the various resources. I have been using the xxxxxID terms for this, i.e. to say that the individual has a certain Occurrence I say

[individual] dwc:occurrenceID [occurrence]

I think this is within the spirit of what the xxxxxID terms were intended to do, and if we can use them in this way, it greatly reduces the number of new terms that would have to be created to express DwC in RDF (i.e. we don't need to make up dwc:hasOccurrence). The downside to this is that few (none?) of the relationships that could be expressed by xxxxxID terms have inverse properties defined. I made up a few in the sernec: vocabulary, but the need for such terms would have to be discussed at some point in a future RDF task group. I don't know enough about how semantic clients work to know if just providing the properties in one direction would be good enough for the client to infer the inverse relationship and make use of it as necessary.
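To make the linking pattern concrete, here is a small Turtle sketch using the occurrence URI mentioned under Q7. The ex:occurrenceOf property and its namespace are hypothetical, shown only to illustrate how an inverse might be declared; whether a client acts on such a declaration depends on whether it applies OWL reasoning, which a plain triple store generally does not do by default.

```turtle
@prefix dwc:    <http://rs.tdwg.org/dwc/terms/> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
@prefix ex:     <http://example.org/terms#> .   # hypothetical namespace

# Flat, URI-identified resources linked by an xxxxxID term.
<http://phylodiversity.net/xmalesia/indiv/9>
    a sernec:Individual ;
    dwc:occurrenceID <http://phylodiversity.net/xmalesia/occur/9-1> .

<http://phylodiversity.net/xmalesia/occur/9-1>
    a dwc:Occurrence .

# Hypothetical inverse declaration: an OWL-aware client could derive
#   <.../occur/9-1> ex:occurrenceOf <.../indiv/9>
# from the single dwc:occurrenceID triple above.
ex:occurrenceOf owl:inverseOf dwc:occurrenceID .
```

Without such a declaration (or without a reasoner), a client can still follow the link backwards by querying for triples whose object is the occurrence, so one direction may be enough for many uses.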
Hope these comments are helpful. I am a novice RDF user, so take what I've said with a grain of salt.
Steve
Cam Webb wrote:
Dear Steve and Rich,
Encouraged by your discussion of models of Occurrences and Individuals, and by Steve's related Biodiv. Informatics paper, I have modeled a real example of an individual plant and some of its various occurrences in RDF, using Steve's sernec terms to provide the predicates that are missing from DwC. As I did so, a number of questions came up relating to choices of terms, and I would greatly appreciate your input on these choices. The following includes all the choices considered, and so may not be semantically correct. The questions (Q1-9) are interspersed with the RDF (serialized as Turtle).
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .

<http://phylodiversity.net/xmalesia/indiv/9>
  a sernec:Individual ;   # Q1 - Is this an Individual ...
  a dwc:individualID ;    # Q1 - ... or an individualID (Baskauf 2010 app'x)?

  # Specimen
  sernec:derivativeOccurrence [
    # Q2 : Use generic Occurrence from dwc:Occurrence ...
    a dwc:Occurrence ;
    dwc:basisOfRecord "PreservedSpecimen" ;
    dwc:recordNumber "Webb 5008" ;
    dwc:recordedBy "Cam Webb" ;
    # Q2 : ... or treat directly as a Specimen?
    a dwcvoc:Specimen ;
    dwcvoc:collectorsFieldNumber "5008" ;
    dwcvoc:collector "Cam Webb" ;
    # Q3 : Add the dwc:eventDate here as suggested by Baskauf?
    dwc:eventDate "2008-01-01" ;
    # Q4 : Treat occurrence as generic resource, using dc metadata?
    dcterms:creator "Cam Webb" ;
    dcterms:created "2008-01-01" ;
    # Q5 : Add dwc location data for Occurrence...
    dwc:coordinateUncertaintyInMeters "100" ;
    dwc:decimalLatitude "-1.25530" ;
    dwc:decimalLongitude "109.95371" ;
    dwc:geodeticDatum "WGS84" ;
    dwc:locality "Sukadana" ;
    # Q5 : ... or a Location.
    dcterms:spatial _:blank1 ;
  ] ;

  # Photo:
  sernec:derivativeOccurrence [
    # Q6 : a dwc:Occurrence...
    a dwc:Occurrence ;
    # Q6 : ... or a dwcvoc:TaxonOccurrence. Which is better?
    a dwcvoc:TaxonOccurrence ;
    # Q7 : Again, use dwc terms...
    dwc:occurrenceID <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
    dwc:basisOfRecord "DigitalStillImage" ;
    dwc:recordedBy "Cam Webb" ;
    dwc:eventDate "2008-01-01" ;
    # Q7 : or dc terms?
    dcterms:identifier <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
    dcterms:creator "Cam Webb" ;
    dcterms:created "2008-01-01" ;
    dcterms:type <http://purl.org/dc/dcmitype/StillImage> ;
    # Q8 : Spatial data, same issue as above
    dcterms:spatial _:blank1 ;
  ] .

# Determination
[] a dwc:Identification ;
  sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
  dwc:identifiedBy "Ferry Slik" ;
  dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
  dwc:dateIdentified "2009-02-22" ;
  # Q9 : Use dwc:identificationReferences or...
  dwc:identificationReferences <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
  # Q9 : ... sernec:basedOnOccurrence ?
  sernec:basedOnOccurrence <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .

# Location data for photo and specimen
_:blank1 a dcterms:Location ;
  geo:long "109.95371" ;
  geo:lat "-1.25530" ;
  dwc:locality "Sukadana, on Tanah Merah road to beach" ;
  dwc:coordinateUncertaintyInMeters "100" .
I realize that for LOD applications the blank nodes should eventually have GUIDs. Now, here is a slimmed down version of the above with my own choices. In general, I went with dcterms over dwc, where appropriate. You can also see the network (via dot) at: http://phylodiversity.net/cwebb/img/indiv9-slim.jpg or http://linkeddata.uriburner.com/ode/?uri=http://phylodiversity.net/cwebb/tmp...
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .

<http://phylodiversity.net/xmalesia/indiv/9>
  a sernec:Individual ;
  sernec:derivativeOccurrence [   # Specimen
    a dwc:Occurrence ;
    dwc:basisOfRecord "PreservedSpecimen" ;
    dcterms:identifier "Webb 5008" ;
    dcterms:creator "Cam Webb" ;
    dcterms:created "2008-01-01" ;
    dcterms:spatial _:blank1 ;
  ] ;
  sernec:derivativeOccurrence [   # Photo
    a dwc:Occurrence ;
    dwc:basisOfRecord "DigitalStillImage" ;
    dcterms:identifier <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
    dcterms:creator "Cam Webb" ;
    dcterms:created "2008-01-01" ;
    dcterms:spatial _:blank1 ;
  ] .

[] a dwc:Identification ;
  sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
  dwc:identifiedBy "Ferry Slik" ;
  dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
  dwc:dateIdentified "2009-02-22" ;
  sernec:basedOnOccurrence <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .

_:blank1 a dcterms:Location ;
  geo:long "109.95371" ;
  geo:lat "-1.25530" ;
  dwc:locality "Sukadana, on Tanah Merah road to beach" ;
  dwc:coordinateUncertaintyInMeters "100" .
I didn't think this could be done without creating new terms, so I'm very pleased to be getting closer to my goal of a LOD representation of our data that maintains the Individuals as base entities.
Many thanks in advance for any thoughts.
Best,
Cam