[tdwg-content] practical details of recording a determination What is an Occurrence?
Steve Baskauf
steve.baskauf at vanderbilt.edu
Wed Oct 20 19:06:43 CEST 2010
Because of actual work that I've had to get done, I haven't had time yet
to carefully read Rich's response and to carefully go through Cam and
Pete's posts to digest them. But I also am encouraged by this
discussion because it seems like most people are agreeing on the basic
conceptual arrangement of entities in Rich's diagram. In some cases
people choose to "collapse" the more general model when some of the
components have only one-to-one connections (e.g. leave out individuals
because all individuals in a database have only one occurrence, leave
out event because every occurrence in a database has a separate event
defined by atomized lat/long/time) but there seems to be a general
agreement that those omitted components exist conceptually and that it
is convenient for other users to include them when they are needed as
nodes for one-to-many relationships. This makes the creation of an
eventual general template for RDF simpler because it means there will be
less arguing about how entities in the RDF should be "connected" to each
other (i.e. what are the appropriate classes of subjects and objects).
As I said, I haven't yet looked carefully at Cam's example, but she made
a comment about blank nodes. One of the things that's troubled me is
how to have a consistent RDF template that can be used for both records
generated by people who are "compressing" their databases as I described
above and people who aren't "compressing". For example, if people have
a database that only contains one specimen per individual, they are
probably going to be generating GUIDs (i.e. URIs) for the specimens but
not for the individuals that exist but weren't explicitly recognized in
the database they are using to generate the RDF. According to the "Rich
diagram1" general model, the dwc:Identification should be connected to
the Individual and the Individual to the Occurrence, but since the
specimen databaser didn't explicitly assign a URI to the Individual, the
RDF would have a blank node for the Individual. The solution I settled
on (illustrated in the
http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf example
with the non-actionable URIs; use view page source to see the underlying
RDF) was to create a default URI for the assumed Individual by appending
"#ind" to the end of the URI for the specimen. I should probably do
the same thing in RDF like
http://bioimages.vanderbilt.edu/baskauf/51249.rdf, where I ignore
dwc:Location as an entity (i.e. I "collapse" Rich's model because all of
my Occurrences have separate Lat/Long). There I should probably enclose
the RDF for the location metadata in an rdf:Description element about
http://bioimages.vanderbilt.edu/baskauf/51249#location. That
would make my RDF format consistent with that of others who connected
multiple Occurrences to a single Location and would also make it
possible for someone to "reuse" my Location identifier if they later
wanted to assert that an event happened at the same location. (I got
this idea by looking at Pete's RDF!)
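To make the "#ind" convention concrete, here is a rough sketch in Python
(standard library only) of how a "collapsed" specimen record could still
be written out with an explicit Individual node instead of a blank node.
The function name, the N-Triples output and the example.org URIs are my
own assumptions for illustration; the two sernec predicates are the ones
used in Cam's example quoted below.

```python
SERNEC = "http://bioimages.vanderbilt.edu/rdf/terms#"
DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def collapsed_record_triples(specimen_uri, identification_uri):
    """Return N-Triples lines linking Identification -> Individual -> Occurrence.

    The database never assigned an identifier to the Individual, so its URI
    is derived from the specimen URI by appending "#ind" (an assumed default,
    not a separately minted GUID).
    """
    individual_uri = specimen_uri + "#ind"
    triples = [
        (individual_uri, RDF_TYPE, SERNEC + "Individual"),
        (specimen_uri, RDF_TYPE, DWC + "Occurrence"),
        (individual_uri, SERNEC + "derivativeOccurrence", specimen_uri),
        (identification_uri, RDF_TYPE, DWC + "Identification"),
        (identification_uri, SERNEC + "identifiesIndividual", individual_uri),
    ]
    return ["<%s> <%s> <%s> ." % t for t in triples]


if __name__ == "__main__":
    # Hypothetical URIs, used only to show the shape of the output.
    for line in collapsed_record_triples(
        "http://example.org/specimen/1234",
        "http://example.org/determination/1",
    ):
        print(line)
```

The point of the sketch is only that the Identification attaches to the
Individual, not directly to the specimen, even when the source database
has no Individual table.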
This question of when one needs to apply a GUID to a resource came up in
the draft Beginners Guide to Persistent Identifiers. In cases like the ones I
discussed above, where there is only a single resource connected to
another resource that has an explicitly assigned GUID, having a default
method for creating "assumed" URIs would reduce the need to generate and
maintain a lot of separate identifiers for entities that the
creator of the GUID isn't really interested in.
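One possible shape for such a default method, sketched in Python under
my own assumptions: the function name, the "role" argument and the rule
of refusing anchors that already carry a fragment are all hypothetical,
not an agreed convention. An explicitly assigned URI always wins; the
"assumed" URI is only a fallback.

```python
from urllib.parse import urldefrag


def assumed_uri(anchor_uri, role, explicit_uri=None):
    """Return the URI to use for a resource attached to anchor_uri.

    explicit_uri: a GUID the data provider actually assigned, if any.
    role: short fragment naming the assumed resource, e.g. "ind" or "location".
    """
    if explicit_uri:
        return explicit_uri
    base, fragment = urldefrag(anchor_uri)
    if fragment:
        # Deriving a default from something that is already a fragment URI
        # would be ambiguous, so this sketch refuses to guess.
        raise ValueError("anchor URI already has a fragment: " + anchor_uri)
    return base + "#" + role


if __name__ == "__main__":
    print(assumed_uri("http://bioimages.vanderbilt.edu/baskauf/51249", "location"))
```

With the URI from my own record this yields the
http://bioimages.vanderbilt.edu/baskauf/51249#location identifier
mentioned above, and a provider who later mints a real GUID for the
Location just passes it in as explicit_uri.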
Steve
Cam Webb wrote:
> Dear Steve and Rich,
>
> Encouraged by your discussion of models of Occurrences and Individuals,
> and by Steve's related Biodiv. Informatics paper, I have modeled a real
> example of an individual plant and some of its various occurrences in RDF,
> using Steve's sernec terms to provide the predicates that are missing from
> DwC. As I did so, a number of questions came up relating to choices of
> terms, and I would greatly appreciate your input on these choices. The
> following includes all the choices considered, and so may not be
> semantically correct. The questions (Q1-9) are interspersed with the RDF
> (serialized as Turtle).
>
>
> @prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
> @prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
> @prefix dcterms: <http://purl.org/dc/terms/> .
> @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
> @prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
>
> <http://phylodiversity.net/xmalesia/indiv/9>
> a sernec:Individual ; # Q1 - Is this an Individual ...
> a dwc:individualID ; # Q1 - ... or an individualID (Baskauf 2010 app'x)?
>
> # Specimen
> sernec:derivativeOccurrence [
> # Q2 : Use generic Occurrence from dwc:Occurrence ...
> a dwc:Occurrence ;
> dwc:basisOfRecord "PreservedSpecimen" ;
> dwc:recordNumber "Webb 5008" ;
> dwc:recordedBy "Cam Webb" ;
> # Q2 : ... or treat directly as a Specimen?
> a dwcvoc:Specimen ;
> dwcvoc:collectorsFieldNumber "5008" ;
> dwcvoc:collector "Cam Webb" ;
> # Q3 : Add the dwc:eventDate here as suggested by Baskauf?
> dwc:eventDate "2008-01-01" ;
> # Q4 : Treat occurrence as generic resource, using dc metadata?
> dcterms:creator "Cam Webb" ;
> dcterms:created "2008-01-01" ;
> # Q5 : Add dwc location data for Occurrence...
> dwc:coordinateUncertaintyInMeters "100" ;
> dwc:decimalLatitude "-1.25530" ;
> dwc:decimalLongitude "109.95371" ;
> dwc:geodeticDatum "WGS84" ;
> dwc:locality "Sukadana" ;
> # Q5 : ... or a Location.
> dcterms:spatial _:blank1 ;
> ] ;
>
> # Photo:
> sernec:derivativeOccurrence [
> # Q6 : a dwc:Occurrence...
> a dwc:Occurrence ;
> # Q6 : ... or a dwcvoc:TaxonOccurrence. Which is better?
> a dwcvoc:TaxonOccurrence ;
> # Q7 : Again, use dwc terms...
> dwc:occurrenceID
> <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
> dwc:basisOfRecord "DigitalStillImage" ;
> dwc:recordedBy "Cam Webb" ;
> dwc:eventDate "2008-01-01" ;
> # Q7 : or dc terms?
> dcterms:identifier
> <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
> dcterms:creator "Cam Webb" ;
> dcterms:created "2008-01-01" ;
> dcterms:type <http://purl.org/dc/dcmitype/StillImage> ;
> # Q8 : Spatial data, same issue as above
> dcterms:spatial _:blank1 ;
> ] .
>
> # Determination
> [] a dwc:Identification ;
> sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
> dwc:identifiedBy "Ferry Slik" ;
> dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
> dwc:dateIdentified "2009-02-22" ;
> # Q9 : Use dwc:identificationReferences or...
> dwc:identificationReferences
> <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
> # Q9 : ... sernec:basedOnOccurrence ?
> sernec:basedOnOccurrence
> <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .
>
> # Location data for photo and specimen
> _:blank1
> a dcterms:Location ;
> geo:lon "109.95371" ;
> geo:lat "-1.25530" ;
> dwc:locality "Sukadana, on Tanah Merah road to beach" ;
> dwc:coordinateUncertaintyInMeters "100" .
>
>
> I realize that for LOD applications the blank nodes should eventually have
> GUIDs. Now, here is a slimmed down version of the above with my own
> choices. In general, I went with dcterms over dwc, where appropriate.
> You can also see the network (via dot) at:
> http://phylodiversity.net/cwebb/img/indiv9-slim.jpg or
> http://linkeddata.uriburner.com/ode/?uri=http://phylodiversity.net/cwebb/tmp/indiv9-slim.rdf
>
>
> @prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
> @prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
> @prefix dcterms: <http://purl.org/dc/terms/> .
> @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
> @prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
>
> <http://phylodiversity.net/xmalesia/indiv/9>
> a sernec:Individual ;
> sernec:derivativeOccurrence [ # Specimen
> a dwc:Occurrence ;
> dwc:basisOfRecord "PreservedSpecimen" ;
> dcterms:identifier "Webb 5008" ;
> dcterms:creator "Cam Webb" ;
> dcterms:created "2008-01-01" ;
> dcterms:spatial _:blank1 ;
> ] ;
> sernec:derivativeOccurrence [ # Photo
> a dwc:Occurrence ;
> dwc:basisOfRecord "DigitalStillImage" ;
> dcterms:identifier
> <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
> dcterms:creator "Cam Webb" ;
> dcterms:created "2008-01-01" ;
> dcterms:spatial _:blank1 ;
> ] .
>
> [] a dwc:Identification ;
> sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
> dwc:identifiedBy "Ferry Slik" ;
> dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
> dwc:dateIdentified "2009-02-22" ;
> sernec:basedOnOccurrence
> <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .
>
> _:blank1
> a dcterms:Location ;
> geo:lon "109.95371" ;
> geo:lat "-1.25530" ;
> dwc:locality "Sukadana, on Tanah Merah road to beach" ;
> dwc:coordinateUncertaintyInMeters "100" .
>
>
> I didn't think this could be done without creating new terms, so I'm very
> pleased to be getting closer to my goal of a LOD representation of our data
> that maintains the Individuals as base entities.
>
> Many thanks in advance for any thoughts.
>
> Best,
>
> Cam
>
> .
>
>
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu