Because of actual work that I've had to get done, I haven't had time yet to carefully read Rich's response and to carefully go through Cam and Pete's posts to digest them. But I also am encouraged by this discussion because it seems like most people are agreeing on the basic conceptual arrangement of entities in Rich's diagram. In some cases people choose to "collapse" the more general model when some of the components have only one-to-one connections (e.g. leave out individuals because all individuals in a database have only one occurrence, leave out event because every occurrence in a database has a separate event defined by atomized lat/long/time) but there seems to be a general agreement that those omitted components exist conceptually and that it is convenient for other users to include them when they are needed as nodes for one-to-many relationships. This makes the creation of an eventual general template for RDF simpler because it means there will be less arguing about how entities in the RDF should be "connected" to each other (i.e. what are the appropriate classes of subjects and objects).
As I said, I haven't yet looked carefully at Cam's example, but she made a comment about blank nodes. One of the things that's troubled me is how to have a consistent RDF template that works both for records generated by people who are "compressing" their databases as I described above and for records from people who aren't. For example, people whose database contains only one specimen per individual are probably going to generate GUIDs (i.e. URIs) for the specimens but not for the individuals, which exist but weren't explicitly recognized in the database used to generate the RDF. According to the "Rich diagram1" general model, the dwc:Identification should be connected to the Individual and the Individual to the Occurrence, but since the specimen databaser didn't explicitly assign a URI to the Individual, the RDF would have a blank node for the Individual. The solution I settled on (illustrated in the http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf example with the non-actionable URIs; use view page source to see the underlying RDF) was to create a default URI for the assumed Individual by appending "#ind" to the URI for the specimen.
I should probably do the same thing in RDF like http://bioimages.vanderbilt.edu/baskauf/51249.rdf, where I ignore dwc:Location as an entity (i.e. I "collapse" Rich's model because all of my Occurrences have separate lat/long). That is, I should probably enclose the RDF for the location metadata in an rdf:Description element about http://bioimages.vanderbilt.edu/baskauf/51249#location. That would make my RDF format consistent with that of others who connect multiple Occurrences to a single Location, and it would also make it possible for someone to "reuse" my Location identifier if they later wanted to assert that an event happened at the same location. (I got this idea by looking at Pete's RDF!)
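To make the default-URI idea concrete, here is a minimal sketch in Turtle. It is only an illustration under stated assumptions: the specimen URI under example.org is hypothetical, the locality value is a placeholder, and the predicates (sernec:derivativeOccurrence, sernec:identifiesIndividual, dcterms:spatial) are borrowed from Cam's example quoted below rather than taken from the RDF files linked above.

```turtle
@prefix dwc:     <http://rs.tdwg.org/dwc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix sernec:  <http://bioimages.vanderbilt.edu/rdf/terms#> .

# Hypothetical specimen URI: the only identifier the database explicitly assigned.
<http://example.org/specimen/0428>
    a dwc:Occurrence ;
    dwc:basisOfRecord "PreservedSpecimen" ;
    dcterms:spatial <http://example.org/specimen/0428#location> .

# Assumed Individual: the specimen URI plus "#ind", used in place of a blank node.
<http://example.org/specimen/0428#ind>
    a sernec:Individual ;
    sernec:derivativeOccurrence <http://example.org/specimen/0428> .

# The Identification attaches to the assumed Individual, not to the specimen.
[] a dwc:Identification ;
    sernec:identifiesIndividual <http://example.org/specimen/0428#ind> .

# Assumed Location: the specimen URI plus "#location" (placeholder metadata).
<http://example.org/specimen/0428#location>
    a dcterms:Location ;
    dwc:locality "placeholder locality" .
```

Because the assumed Individual and Location now have URIs rather than blank nodes, someone else could attach a second Occurrence to the same Individual, or a second event to the same Location, without any change to the original record.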
This question of when one needs to apply a GUID to a resource came up in the draft Beginners Guide to Persistent Identifiers. In cases like the ones I discussed above, where there is only a single resource connected to another resource that has an explicitly assigned GUID, having a default method for creating "assumed" URIs would reduce the need to generate and maintain a lot of separate identifiers for entities that the creator of the GUID isn't really interested in.
Steve
Cam Webb wrote:
Dear Steve and Rich,
Encouraged by your discussion of models of Occurrences and Individuals, and by Steve's related Biodiv. Informatics paper, I have modeled a real example of an individual plant and some of its various occurrences in RDF, using Steve's sernec terms to provide the predicates that are missing from DwC. As I did so, a number of questions came up relating to choices of terms, and I would greatly appreciate your input on these choices. The following includes all the choices considered, and so may not be semantically correct. The questions (Q1-9) are interspersed with the RDF (serialized as Turtle).
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .

<http://phylodiversity.net/xmalesia/indiv/9>
  a sernec:Individual ;    # Q1 - Is this an Individual ...
  a dwc:individualID ;     # Q1 - ... or an individualID (Baskauf 2010 app'x)?

  # Specimen
  sernec:derivativeOccurrence [
    # Q2 : Use generic Occurrence from dwc:Occurrence ...
    a dwc:Occurrence ;
    dwc:basisOfRecord "PreservedSpecimen" ;
    dwc:recordNumber "Webb 5008" ;
    dwc:recordedBy "Cam Webb" ;
    # Q2 : ... or treat directly as a Specimen?
    a dwcvoc:Specimen ;
    dwcvoc:collectorsFieldNumber "5008" ;
    dwcvoc:collector "Cam Webb" ;
    # Q3 : Add the dwc:eventDate here as suggested by Baskauf?
    dwc:eventDate "2008-01-01" ;
    # Q4 : Treat occurrence as generic resource, using dc metadata?
    dcterms:creator "Cam Webb" ;
    dcterms:created "2008-01-01" ;
    # Q5 : Add dwc location data for Occurrence...
    dwc:coordinateUncertaintyInMeters "100" ;
    dwc:decimalLatitude "-1.25530" ;
    dwc:decimalLongitude "109.95371" ;
    dwc:geodeticDatum "WGS84" ;
    dwc:locality "Sukadana" ;
    # Q5 : ... or a Location.
    dcterms:spatial _:blank1 ;
  ] ;

  # Photo:
  sernec:derivativeOccurrence [
    # Q6 : a dwc:Occurrence...
    a dwc:Occurrence ;
    # Q6 : ... or a dwcvoc:TaxonOccurrence. Which is better?
    a dwcvoc:TaxonOccurrence ;
    # Q7 : Again, use dwc terms...
    dwc:occurrenceID <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
    dwc:basisOfRecord "DigitalStillImage" ;
    dwc:recordedBy "Cam Webb" ;
    dwc:eventDate "2008-01-01" ;
    # Q7 : or dc terms?
    dcterms:identifier <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
    dcterms:creator "Cam Webb" ;
    dcterms:created "2008-01-01" ;
    dcterms:type <http://purl.org/dc/dcmitype/StillImage> ;
    # Q8 : Spatial data, same issue as above
    dcterms:spatial _:blank1 ;
  ] .

# Determination
[] a dwc:Identification ;
  sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
  dwc:identifiedBy "Ferry Slik" ;
  dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
  dwc:dateIdentified "2009-02-22" ;
  # Q9 : Use dwc:identificationReferences or...
  dwc:identificationReferences <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
  # Q9 : ... sernec:basedOnOccurrence ?
  sernec:basedOnOccurrence <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .

# Location data for photo and specimen
_:blank1 a dcterms:Location ;
  geo:lon "109.95371" ;
  geo:lat "-1.25530" ;
  dwc:locality "Sukadana, on Tanah Merah road to beach" ;
  dwc:coordinateUncertaintyInMeters "100" .
I realize that for LOD applications the blank nodes should eventually have GUIDs. Now, here is a slimmed down version of the above with my own choices. In general, I went with dcterms over dwc, where appropriate. You can also see the network (via dot) at: http://phylodiversity.net/cwebb/img/indiv9-slim.jpg or http://linkeddata.uriburner.com/ode/?uri=http://phylodiversity.net/cwebb/tmp...
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .

<http://phylodiversity.net/xmalesia/indiv/9>
  a sernec:Individual ;
  sernec:derivativeOccurrence [    # Specimen
    a dwc:Occurrence ;
    dwc:basisOfRecord "PreservedSpecimen" ;
    dcterms:identifier "Webb 5008" ;
    dcterms:creator "Cam Webb" ;
    dcterms:created "2008-01-01" ;
    dcterms:spatial _:blank1 ;
  ] ;
  sernec:derivativeOccurrence [    # Photo
    a dwc:Occurrence ;
    dwc:basisOfRecord "DigitalStillImage" ;
    dcterms:identifier <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
    dcterms:creator "Cam Webb" ;
    dcterms:created "2008-01-01" ;
    dcterms:spatial _:blank1 ;
  ] .

[] a dwc:Identification ;
  sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
  dwc:identifiedBy "Ferry Slik" ;
  dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
  dwc:dateIdentified "2009-02-22" ;
  sernec:basedOnOccurrence <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .

_:blank1 a dcterms:Location ;
  geo:lon "109.95371" ;
  geo:lat "-1.25530" ;
  dwc:locality "Sukadana, on Tanah Merah road to beach" ;
  dwc:coordinateUncertaintyInMeters "100" .
I didn't think this could be done without creating new terms, so I'm very pleased to be getting closer to my goal of a LOD representation of our data that maintains the Individuals as base entities.
Many thanks in advance for any thoughts.
Best,
Cam