[tdwg-content] practical details of recording a determination What is an Occurrence?

Steve Baskauf steve.baskauf at vanderbilt.edu
Wed Oct 20 19:06:43 CEST 2010


Because of actual work that I've had to get done, I haven't had time yet 
to carefully read Rich's response and to carefully go through Cam and 
Pete's posts to digest them.  But I also am encouraged by this 
discussion because it seems like most people are agreeing on the basic 
conceptual arrangement of entities in Rich's diagram.  In some cases 
people choose to "collapse" the more general model when some of the 
components have only one-to-one connections (e.g. leave out individuals 
because all individuals in a database have only one occurrence, leave 
out event because every occurrence in a database has a separate event 
defined by atomized lat/long/time) but there seems to be a general 
agreement that those omitted components exist conceptually and that it 
is convenient for other users to include them when they are needed as 
nodes for one-to-many relationships.  This makes the creation of an 
eventual general template for RDF simpler because it means there will be 
less arguing about how entities in the RDF should be "connected" to each 
other (i.e. what are the appropriate classes of subjects and objects).

As I said, I haven't yet looked carefully at Cam's example, but she made 
a comment about blank nodes.  One of the things that's troubled me is 
how to have a consistent RDF template that can be used for both records 
generated by people who are "compressing" their databases as I described 
above and people who aren't "compressing".  For example, if people have 
a database that only contains one specimen per individual, they are 
probably going to be generating GUIDs (i.e. URIs) for the specimens but 
not for the individuals that exist but weren't explicitly recognized in 
the database they are using to generate the RDF.  According to the "Rich 
diagram1" general model, the dwc:Identification should be connected to 
the Individual and the Individual to the Occurrence, but since the 
specimen databaser didn't explicitly assign a URI to the Individual, the 
RDF would have a blank node for the Individual.  The solution I settled 
on (illustrated in the 
http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf example 
with the non-actionable URIs; use "view page source" to see the underlying 
RDF) was to create a default URI for the assumed Individual by slapping 
a "#ind" onto the end of the URI for the specimen.  I should probably do 
the same thing in RDF like 
http://bioimages.vanderbilt.edu/baskauf/51249.rdf where I ignore 
dwc:Location as an entity (i.e. I "collapse" Rich's model because all of 
my Occurrences have separate Lat/Long), i.e. I should probably enclose 
the RDF for the location metadata in an rdf:Description element about 
http://bioimages.vanderbilt.edu/baskauf/51249#location.  That 
would make my RDF format consistent with that of others who connected 
multiple Occurrences to a single Location and would also make it 
possible for someone to "reuse" my Location identifier if they later 
wanted to assert that an event happened at the same location.  (I got 
this idea by looking at Pete's RDF!)
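
The "default URI" convention described above can be sketched in a few lines 
of Python.  This is only an illustration: the function name assumed_uri is 
hypothetical, and the specimen URI is assumed to be the URL of the example 
RDF document with the ".rdf" extension removed.  Dropping any fragment that 
is already present is also an assumption, not something settled in this 
thread.

```python
from urllib.parse import urldefrag


def assumed_uri(resource_uri: str, role: str) -> str:
    """Build a default URI for an entity implied by resource_uri.

    The role ("ind", "location") becomes the fragment identifier.
    Any fragment already on resource_uri is dropped first (an assumption).
    """
    base, _old_fragment = urldefrag(resource_uri)
    return f"{base}#{role}"


# Assumed specimen URI (the example document URL without ".rdf").
specimen = "http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428"
# Occurrence URI taken from the location example in the text.
occurrence = "http://bioimages.vanderbilt.edu/baskauf/51249"

individual_uri = assumed_uri(specimen, "ind")
location_uri = assumed_uri(occurrence, "location")
```

Because the derived URI depends only on the URI that was explicitly 
assigned, anyone who knows the specimen or Occurrence URI can reconstruct 
the identifier for the assumed Individual or Location without a lookup.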

This question of when one needs to apply a GUID to a resource came up in 
the draft Beginners Guide to Persistent Identifiers.  In cases like I 
discussed above where there is only a single resource connected to 
another resource that has an explicitly assigned GUID, having a default 
method for creating "assumed" URIs would reduce the need to generate and 
maintain a lot of separate identifiers for entities that the 
creator of the GUID isn't really interested in.
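
As a rough sketch of how such a default method might "expand" a collapsed 
specimen record into the general model, the Python below emits triples that 
connect an Identification to an assumed Individual and the Individual to 
the specimen Occurrence.  The predicates are the sernec terms used in Cam's 
Turtle; the function name expand_collapsed and the "#det" fragment for the 
Identification are hypothetical, introduced only for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DWC = "http://rs.tdwg.org/dwc/terms/"
SERNEC = "http://bioimages.vanderbilt.edu/rdf/terms#"


def expand_collapsed(specimen_uri: str) -> list:
    """Return (subject, predicate, object) URI triples for one specimen.

    The Individual gets the assumed URI <specimen>#ind.  The "#det"
    fragment for the Identification is a hypothetical convention.
    """
    individual = specimen_uri + "#ind"
    identification = specimen_uri + "#det"
    return [
        (individual, RDF_TYPE, SERNEC + "Individual"),
        (individual, SERNEC + "derivativeOccurrence", specimen_uri),
        (specimen_uri, RDF_TYPE, DWC + "Occurrence"),
        (identification, RDF_TYPE, DWC + "Identification"),
        (identification, SERNEC + "identifiesIndividual", individual),
    ]


def to_ntriples(triples: list) -> str:
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Assumed specimen URI (the example document URL without ".rdf").
triples = expand_collapsed(
    "http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428"
)
ntriples_text = to_ntriples(triples)
```

A provider who does track Individuals explicitly would supply its own 
Individual URI in place of the "#ind" default; the shape of the graph stays 
the same either way.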

Steve

Cam Webb wrote:
> Dear Steve and Rich,
>
> Encouraged by your discussion of models of Occurrences and Individuals, 
> and by Steve's related Biodiv. Informatics paper, I have modeled a real 
> example of an individual plant and some of its various occurrences in RDF, 
> using Steve's sernec terms to provide the predicates that are missing from 
> DwC.  As I did so, a number of questions came up relating to choices of 
> terms, and I would greatly appreciate your input on these choices.  The 
> following includes all the choices considered, and so may not be 
> semantically correct.  The questions (Q1-9) are interspersed with the RDF 
> (serialized as Turtle).
>
>
> @prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
> @prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
> @prefix dcterms: <http://purl.org/dc/terms/> .
> @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
> @prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
>
> <http://phylodiversity.net/xmalesia/indiv/9>
>      a sernec:Individual ;  # Q1 - Is this an Individual ...
>      a dwc:individualID ;   # Q1 - ... or an individualID (Baskauf 2010 app'x)?
>
>      # Specimen
>      sernec:derivativeOccurrence [
>          # Q2 : Use generic Occurrence from dwc:Occurrence ...
>          a dwc:Occurrence ;
>          dwc:basisOfRecord "PreservedSpecimen" ;
>          dwc:recordNumber "Webb 5008" ;
>          dwc:recordedBy "Cam Webb" ;
>          # Q2 : ... or treat directly as a Specimen?
>          a dwcvoc:Specimen ;
>          dwcvoc:collectorsFieldNumber "5008" ;
>          dwcvoc:collector "Cam Webb" ;
>          # Q3 : Add the dwc:eventDate here as suggested by Baskauf?
>          dwc:eventDate "2008-01-01" ;
>          # Q4 : Treat occurrence as generic resource, using dc metadata?
>          dcterms:creator "Cam Webb" ;
>          dcterms:created "2008-01-01" ;
>          # Q5 : Add dwc location data for Occurrence...
>          dwc:coordinateUncertaintyInMeters "100" ;
>          dwc:decimalLatitude "-1.25530" ;
>          dwc:decimalLongitude "109.95371" ;
>          dwc:geodeticDatum "WGS84" ;
>          dwc:locality "Sukadana" ;
>          # Q5 : ... or a Location.
>          dcterms:spatial _:blank1 ;
>          ] ;
>
>      # Photo:
>      sernec:derivativeOccurrence [
>          # Q6 : a dwc:Occurrence...
>          a dwc:Occurrence ;
>          # Q6 : ... or a dwcvoc:TaxonOccurrence. Which is better?
>          a dwcvoc:TaxonOccurrence ;
>          # Q7 : Again, use dwc terms...
>          dwc:occurrenceID
>                  <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
>          dwc:basisOfRecord "DigitalStillImage" ;
>          dwc:recordedBy "Cam Webb" ;
>          dwc:eventDate "2008-01-01" ;
>          # Q7 : or dc terms?
>          dcterms:identifier
>                  <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
>          dcterms:creator "Cam Webb" ;
>          dcterms:created "2008-01-01" ;
>          dcterms:type <http://purl.org/dc/dcmitype/StillImage> ;
>          # Q8 : Spatial data, same issue as above
>          dcterms:spatial _:blank1 ;
>          ] .
>
> # Determination
> []  a dwc:Identification ;
>      sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
>      dwc:identifiedBy "Ferry Slik" ;
>      dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
>      dwc:dateIdentified "2009-02-22" ;
>      # Q9 : Use dwc:identificationReferences or...
>      dwc:identificationReferences
>           <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
>      # Q9 : ... sernec:basedOnOccurrence ?
>      sernec:basedOnOccurrence
>           <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .
>
> # Location data for photo and specimen
> _:blank1
>      a dcterms:Location ;
>      geo:lon "109.95371" ;
>      geo:lat "-1.25530" ;
>      dwc:locality "Sukadana, on Tanah Merah road to beach" ;
>      dwc:coordinateUncertaintyInMeters "100" .
>
>
> I realize that for LOD applications the blank nodes should eventually have 
> GUIDs.  Now, here is a slimmed down version of the above with my own 
> choices. In general, I went with dcterms over dwc, where appropriate. 
> You can also see the network (via dot) at: 
> http://phylodiversity.net/cwebb/img/indiv9-slim.jpg or 
> http://linkeddata.uriburner.com/ode/?uri=http://phylodiversity.net/cwebb/tmp/indiv9-slim.rdf
>
>
> @prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
> @prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
> @prefix dcterms: <http://purl.org/dc/terms/> .
> @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
> @prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
>
> <http://phylodiversity.net/xmalesia/indiv/9>
>      a sernec:Individual ;
>      sernec:derivativeOccurrence [  # Specimen
>          a dwc:Occurrence ;
>          dwc:basisOfRecord "PreservedSpecimen" ;
>          dcterms:identifier "Webb 5008" ;
>          dcterms:creator "Cam Webb" ;
>          dcterms:created "2008-01-01" ;
>          dcterms:spatial _:blank1 ;
>          ] ;
>      sernec:derivativeOccurrence [  # Photo
>          a dwc:Occurrence ;
>          dwc:basisOfRecord "DigitalStillImage" ;
>          dcterms:identifier
>                  <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
>          dcterms:creator "Cam Webb" ;
>          dcterms:created "2008-01-01" ;
>          dcterms:spatial _:blank1 ;
>          ] .
>
> []  a dwc:Identification ;
>      sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
>      dwc:identifiedBy "Ferry Slik" ;
>      dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
>      dwc:dateIdentified "2009-02-22" ;
>      sernec:basedOnOccurrence
>           <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .
>
> _:blank1
>      a dcterms:Location ;
>      geo:lon "109.95371" ;
>      geo:lat "-1.25530" ;
>      dwc:locality "Sukadana, on Tanah Merah road to beach" ;
>      dwc:coordinateUncertaintyInMeters "100" .
>
>
> I didn't think this could be done without creating new terms, so I'm very 
> pleased to be getting closer to my goal of a LOD representation of our data
> that maintains the Individuals as base entities.
>
> Many thanks in advance for any thoughts.
>
> Best,
>
> Cam
>

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu


