[tdwg-content] Comments on Cam's RDF practical details of recording a determination What is an Occurrence?
Steve Baskauf
steve.baskauf at vanderbilt.edu
Thu Oct 28 06:48:22 CEST 2010
Cam,
I have finally taken the time to look carefully at your RDF example (I'm
not used to the Turtle serialization, but I managed to translate it into
XML, which is the way I "think" about RDF). I'm not going to try to comment
on every question you asked because, since this discussion began, I've
changed my thinking somewhat about how I would model certain things. But
you raise a number of important questions, and I'll give opinions on a
few. Whether those opinions are shared by others remains to be seen and
should be part of the discussion if an RDF task group gets off the ground.
Q6. The question of how DwC resources should be rdf:type'd remains
open. When I first tried to write RDF using DwC terms, I tried to type
things using dwcvoc: . However, there were too many types of resources
that didn't have terms there and when I looked at the ontology, I wasn't
sure that some of the terms that were there actually meant what I
thought they should. So I gave up and just decided to use the Darwin
Core classes since they also qualified as "well known". The DwC type
vocabulary is another possibility for typing since it includes both some
of the DwC classes as well as other types, such as PreservedSpecimen,
which we would need if the model of separating the "token" from the
Occurrences were followed. However, the Identification class isn't
included in the DwC type vocabulary (is that intentional or an
oversight?). Also, tokens that are StillImages, Sounds, etc. would have
to be typed using the Dublin Core type vocabulary. So at this point it
seems like the rdf:type values would have to be drawn from at least
three different sources to get the job done.
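A minimal sketch of what typing from three sources could look like, in Turtle. The example.org URIs are made up for illustration, and the dwctype: namespace is an assumption about where the DwC type vocabulary lives; it should be checked against the published term list before use.

```turtle
@prefix dwc:      <http://rs.tdwg.org/dwc/terms/> .
@prefix dwctype:  <http://rs.tdwg.org/dwc/dwctype/> .   # assumed namespace
@prefix dcmitype: <http://purl.org/dc/dcmitype/> .

# Source 1: a Darwin Core class
<http://example.org/det/1>      a dwc:Identification .

# Source 2: the DwC type vocabulary
<http://example.org/specimen/1> a dwctype:PreservedSpecimen .

# Source 3: the Dublin Core (DCMI) type vocabulary
<http://example.org/image/1>    a dcmitype:StillImage .
```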
Q1. I think sernec:Individual would be right rather than
dwc:individualID (as described above). I originally used
dwc:individualID, but I now think that is not right and that the xxxxxID
terms should be used to show the relationship among described resources.
Q2-4. If the "token" is separated from the Occurrence, then
dwc:recordedBy is a property of the Occurrence and dcterms:created and
dcterms:creator are properties of the token (if it's a create-able thing).
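A rough sketch of that split, assuming the token is a photograph. The example.org URIs and the ex:hasToken linking property are hypothetical; no such term is proposed in this thread, and some agreed-upon term would be needed in its place.

```turtle
@prefix dwc:     <http://rs.tdwg.org/dwc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/terms#> .   # hypothetical, for illustration only

# The Occurrence: who recorded the organism.
<http://example.org/occur/9-2>
    a dwc:Occurrence ;
    dwc:recordedBy "Cam Webb" ;
    ex:hasToken <http://example.org/image/28617> .   # hypothetical linking term

# The token: a created thing that documents the Occurrence.
<http://example.org/image/28617>
    a <http://purl.org/dc/dcmitype/StillImage> ;
    dcterms:creator "Cam Webb" ;
    dcterms:created "2008-01-01" .
```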
Q3 and Q5. I think that for the sake of a consistent RDF structure that
a client could actually know how to "crawl" and "understand", it would
be best to have nodes (having URI identifiers) for all of the resources
that end up being in a consensus fully-normalized model like
http://bioimages.vanderbilt.edu/pages/token-explicit.gif . As you know,
I suggested the strategy of creating hash URIs to name nodes that the
user doesn't care to maintain as separate database items.
This has worked well for me in my experimentation.
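A small sketch of the hash URI strategy, applied to the location that was a blank node in Cam's example. The "#loc" fragment is a made-up name, and attaching it to this particular occurrence URI is only an assumption for illustration.

```turtle
@prefix dwc:     <http://rs.tdwg.org/dwc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo:     <http://www.w3.org/2003/01/geo/wgs84_pos#> .

# The occurrence is a separately maintained record with its own URI.
<http://phylodiversity.net/xmalesia/occur/9-1>
    a dwc:Occurrence ;
    dcterms:spatial <http://phylodiversity.net/xmalesia/occur/9-1#loc> .

# The location is not a separate database item, so it is named with a
# hash URI built on the occurrence's URI ("#loc" is a made-up fragment).
<http://phylodiversity.net/xmalesia/occur/9-1#loc>
    a dcterms:Location ;
    geo:lat "-1.25530" ;
    geo:long "109.95371" .
```

Because the fragment is resolved against the parent document, the location gets a globally unique name without needing its own database record.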
Q7. I think we had a discussion in an earlier thread as to whether in
RDF the xxxxxID terms should be used to identify the subject resource or
just be reserved for indicating a reference to another related
resource. It was suggested (and I agree) that since a tag like
<dwc:Occurrence rdf:about="http://phylodiversity.net/xmalesia/occur/9-1">
already indicates that http://phylodiversity.net/xmalesia/occur/9-1 is a
URI that identifies the occurrence, it's a bit redundant to also assert
an identifier as an explicit property of the occurrence. But I suppose
it doesn't hurt anything. Pete uses <dcterms:identifier> to do this as
you did in your image example.
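The same point in Turtle, as a sketch: the subject URI already does the identifying, so the explicit identifier property repeats it. Writing the identifier as a string literal here is a choice made for this example, not something settled in the thread.

```turtle
@prefix dwc:     <http://rs.tdwg.org/dwc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# The subject URI already identifies the occurrence ...
<http://phylodiversity.net/xmalesia/occur/9-1>
    a dwc:Occurrence ;
    # ... so this explicit identifier is redundant but harmless.
    dcterms:identifier "http://phylodiversity.net/xmalesia/occur/9-1" .
```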
Q9. If dwc:identificationReferences is appropriate here, then
sernec:basedOnOccurrence does not need to exist. Actually
sernec:basedOnOccurrence probably shouldn't be used anyway if we
separate tokens from their Occurrences (the appropriate term would then
be basedOnToken or something like that).
Hmm. I guess I ended up commenting on most of the questions anyway.
Two more general comments.
1. After considerable thought, I've decided that I don't want to use
direct access URLs for images as their identifying URIs. There is
nothing "wrong" with doing so, but once you use a URL as a GUID, you're
stuck with keeping the image at that location forever. Also, that URI
then refers to the specific pattern of bytes in the particular version
of the image that you are serving from that URL, which therefore may
never change (i.e. no editing). A lot of the image metadata applies to any
sized version of the image, not just the one that you've identified
using the URL. Then there are content negotiation issues with using a
.jpg extension for a URI which I could discuss but won't get into here.
For all of these reasons, I've decided for myself that I'd prefer to
consider the image as a conceptual thing (non-information resource) and
assign it an identifier with no extension which could then be subject to
content negotiation. I then use MRTG service access class instances to
provide the mrtg:accessURL's for whatever sizes of images I want to
provide. Because the accessURLs are metadata and not themselves
identifiers, I can change the access URLs without breaking any GUID
rules. This gives you the option to move your high-res images to an
image repository rather than serving them from the domain from which your
RDF is being served. You can see an example of this approach at:
http://bioimages.vanderbilt.edu/baskauf/10685.rdf
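A simplified sketch of the pattern, not a copy of the file above. The mrtg: prefix is bound to a placeholder namespace because the real MRTG namespace is not given here, ex:hasServiceAccessPoint is a hypothetical linking property, and all example.org and example.net URIs are invented.

```turtle
@prefix mrtg: <http://example.org/mrtg-placeholder#> .   # placeholder, not the real MRTG namespace
@prefix ex:   <http://example.org/terms#> .              # hypothetical

# The image as a conceptual resource: a stable identifier with no extension.
<http://example.org/image/10685>
    a <http://purl.org/dc/dcmitype/StillImage> ;
    ex:hasServiceAccessPoint <http://example.org/image/10685#thumb> ,
                             <http://example.org/image/10685#full> .

# Service access points: the access URLs are plain metadata and may change.
<http://example.org/image/10685#thumb>
    mrtg:accessURL <http://example.org/files/thumb/10685.jpg> .

<http://example.org/image/10685#full>
    mrtg:accessURL <http://images.example.net/highres/10685.jpg> .
```

If the high-res file moves to another host, only the last triple changes; the image's identifier and everything that links to it stay the same.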
2. If one assigns URIs to each resource included in the RDF file (i.e.
the Individual, the image, the Occurrence, the Event, etc.), the degree
of nesting can be reduced and blank nodes eliminated. Of course you
then need a way to connect the various resources. I have been using the
xxxxxID terms for this, i.e. to say that the individual has a certain
Occurrence I say
[individual] dwc:occurrenceID [occurrence]
I think this is within the spirit of what the xxxxxxxID terms were
intended to do and if we can use them in this way, it greatly reduces
the number of new terms that would have to be created to express DwC in
RDF (i.e. we don't need to make up dwc:hasOccurrence). The downside to
this is that few (none?) of the relationships that could be expressed by
xxxxxxID terms have inverse properties defined. I made up a few in the
sernec: vocabulary, but the need for such terms would have to be
discussed at some point in a future RDF task group. I don't know enough
about how semantic clients work to know if just providing the properties
in one direction would be good enough for the client to infer the
inverse relationship and make use of it as necessary.
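A sketch of the flat, URI-linked structure together with one way an inverse could be declared. The ex:isOccurrenceOf property is hypothetical, and the pairing of this individual with this occurrence URI is only illustrative. owl:inverseOf is the standard OWL term for relating a property to its inverse.

```turtle
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/terms#> .   # hypothetical

# Flat structure: every resource has a URI, linked by an xxxxxID term.
<http://phylodiversity.net/xmalesia/indiv/9>
    dwc:occurrenceID <http://phylodiversity.net/xmalesia/occur/9-1> .

<http://phylodiversity.net/xmalesia/occur/9-1>
    a dwc:Occurrence .

# A hypothetical inverse property. With this declaration an OWL-aware client
# could infer: <.../occur/9-1> ex:isOccurrenceOf <.../indiv/9> .
ex:isOccurrenceOf owl:inverseOf dwc:occurrenceID .
```

Without such a declaration, a client can still follow the link backwards by searching for triples whose object is the occurrence, but nothing tells it that the reverse relationship has a name.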
Hope these comments are helpful. I am a novice RDF user, so take what
I've said with a grain of salt.
Steve
Cam Webb wrote:
> Dear Steve and Rich,
>
> Encouraged by your discussion of models of Occurrences and Individuals,
> and by Steve's related Biodiv. Informatics paper, I have modeled a real
> example of an individual plant and some of its various occurrences in RDF,
> using Steve's sernec terms to provide the predicates that are missing from
> DwC. As I did so, a number of questions came up relating to choices of
> terms, and I would greatly appreciate your input on these choices. The
> following includes all the choices considered, and so may not be
> semantically correct. The questions (Q1-9) are interspersed with the RDF
> (serialized as Turtle).
>
>
> @prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
> @prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
> @prefix dcterms: <http://purl.org/dc/terms/> .
> @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
> @prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
>
> <http://phylodiversity.net/xmalesia/indiv/9>
>   a sernec:Individual ;   # Q1 - Is this an Individual ...
>   a dwc:individualID ;    # Q1 - ... or an individualID (Baskauf 2010 app'x)?
>
>   # Specimen
>   sernec:derivativeOccurrence [
>     # Q2 : Use generic Occurrence from dwc:Occurrence ...
>     a dwc:Occurrence ;
>     dwc:basisOfRecord "PreservedSpecimen" ;
>     dwc:recordNumber "Webb 5008" ;
>     dwc:recordedBy "Cam Webb" ;
>     # Q2 : ... or treat directly as a Specimen?
>     a dwcvoc:Specimen ;
>     dwcvoc:collectorsFieldNumber "5008" ;
>     dwcvoc:collector "Cam Webb" ;
>     # Q3 : Add the dwc:eventDate here as suggested by Baskauf?
>     dwc:eventDate "2008-01-01" ;
>     # Q4 : Treat occurrence as generic resource, using dc metadata?
>     dcterms:creator "Cam Webb" ;
>     dcterms:created "2008-01-01" ;
>     # Q5 : Add dwc location data for Occurrence...
>     dwc:coordinateUncertaintyInMeters "100" ;
>     dwc:decimalLatitude "-1.25530" ;
>     dwc:decimalLongitude "109.95371" ;
>     dwc:geodeticDatum "WGS84" ;
>     dwc:locality "Sukadana" ;
>     # Q5 : ... or a Location.
>     dcterms:spatial _:blank1 ;
>   ] ;
>
>   # Photo:
>   sernec:derivativeOccurrence [
>     # Q6 : a dwc:Occurrence...
>     a dwc:Occurrence ;
>     # Q6 : ... or a dwcvoc:TaxonOccurrence. Which is better?
>     a dwcvoc:TaxonOccurrence ;
>     # Q7 : Again, use dwc terms...
>     dwc:occurrenceID
>       <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
>     dwc:basisOfRecord "DigitalStillImage" ;
>     dwc:recordedBy "Cam Webb" ;
>     dwc:eventDate "2008-01-01" ;
>     # Q7 : ... or dc terms?
>     dcterms:identifier
>       <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
>     dcterms:creator "Cam Webb" ;
>     dcterms:created "2008-01-01" ;
>     dcterms:type <http://purl.org/dc/dcmitype/StillImage> ;
>     # Q8 : Spatial data, same issue as above
>     dcterms:spatial _:blank1 ;
>   ] .
>
> # Determination
> [] a dwc:Identification ;
>   sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
>   dwc:identifiedBy "Ferry Slik" ;
>   dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
>   dwc:dateIdentified "2009-02-22" ;
>   # Q9 : Use dwc:identificationReferences or...
>   dwc:identificationReferences
>     <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
>   # Q9 : ... sernec:basedOnOccurrence ?
>   sernec:basedOnOccurrence
>     <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .
>
> # Location data for photo and specimen
> _:blank1
>   a dcterms:Location ;
>   geo:long "109.95371" ;
>   geo:lat "-1.25530" ;
>   dwc:locality "Sukadana, on Tanah Merah road to beach" ;
>   dwc:coordinateUncertaintyInMeters "100" .
>
>
> I realize that for LOD applications the blank nodes should eventually have
> GUIDs. Now, here is a slimmed down version of the above with my own
> choices. In general, I went with dcterms over dwc, where appropriate.
> You can also see the network (via dot) at:
> http://phylodiversity.net/cwebb/img/indiv9-slim.jpg or
> http://linkeddata.uriburner.com/ode/?uri=http://phylodiversity.net/cwebb/tmp/indiv9-slim.rdf
>
>
> @prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
> @prefix dwcvoc: <http://rs.tdwg.org/ontology/voc/> .
> @prefix dcterms: <http://purl.org/dc/terms/> .
> @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
> @prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
>
> <http://phylodiversity.net/xmalesia/indiv/9>
>   a sernec:Individual ;
>   sernec:derivativeOccurrence [   # Specimen
>     a dwc:Occurrence ;
>     dwc:basisOfRecord "PreservedSpecimen" ;
>     dcterms:identifier "Webb 5008" ;
>     dcterms:creator "Cam Webb" ;
>     dcterms:created "2008-01-01" ;
>     dcterms:spatial _:blank1 ;
>   ] ;
>   sernec:derivativeOccurrence [   # Photo
>     a dwc:Occurrence ;
>     dwc:basisOfRecord "DigitalStillImage" ;
>     dcterms:identifier
>       <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> ;
>     dcterms:creator "Cam Webb" ;
>     dcterms:created "2008-01-01" ;
>     dcterms:spatial _:blank1 ;
>   ] .
>
> [] a dwc:Identification ;
>   sernec:identifiesIndividual <http://phylodiversity.net/xmalesia/indiv/9> ;
>   dwc:identifiedBy "Ferry Slik" ;
>   dwc:taxonConceptID <urn:lsid:ubio.org:namebank:5963772> ;
>   dwc:dateIdentified "2009-02-22" ;
>   sernec:basedOnOccurrence
>     <http://phylodiversity.net/xmalimg/cw_28617.400px.jpg> .
>
> _:blank1
>   a dcterms:Location ;
>   geo:long "109.95371" ;
>   geo:lat "-1.25530" ;
>   dwc:locality "Sukadana, on Tanah Merah road to beach" ;
>   dwc:coordinateUncertaintyInMeters "100" .
>
>
> I didn't think this could be done without creating new terms, so I'm very
> pleased to be getting closer to my goal of a LOD representation of our data
> that maintains the Individuals as base entities.
>
> Many thanks in advance for any thoughts.
>
> Best,
>
> Cam
>
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu