[tdwg-content] data provenance; was Re: Updated TDWG BioBlitz RDF Example with Pivot View and Data Browsing

Steve Baskauf steve.baskauf at vanderbilt.edu
Mon Feb 28 03:43:21 CET 2011


Comments/questions inline

Peter DeVries wrote:
> Hi Steve,
>
> This is mainly an advantage if everyone supports the ietf.org 
> proposal in the way they support the geo vocabulary.
Well, up to this point, I've seen people using the geo: vocabulary as 
it was described in http://www.w3.org/2003/01/geo/ , i.e. as RDF 
predicates (geo:lat, geo:long) that take string literals as values.  The 
use as you've described it is new to me.
>
> The way to think about this is ietf geo could become a well-known URN.
Speaking completely out of ignorance, are there any LOD applications 
that know how to interpret URIs that aren't HTTP URIs?  I mean, if you 
have RDF that has something like geo:41.53000000,-70.67000000 as an 
object, is there any application that "understands" it?  I learned in 
the last few months that it's "legal" to have an LSID as an object in an 
RDF triple, but that doesn't mean that any application will know what to 
do with it. 
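
To make the question concrete, here is a minimal Python sketch of what a 
client would have to do to "understand" a geo: URI: parse the string by 
rule instead of dereferencing it. It follows the basic layout of RFC 5870 
(coordinates, then optional ;crs= and ;u= parameters), but it is simplified 
and not a full implementation. The names GeoPoint and parse_geo_uri are 
hypothetical, made up for this illustration.

```python
from typing import NamedTuple, Optional


class GeoPoint(NamedTuple):
    """Hypothetical container for the parts of a geo: URI."""
    lat: float
    lon: float
    alt: Optional[float]
    uncertainty_m: Optional[float]
    crs: str


def parse_geo_uri(uri: str) -> GeoPoint:
    """Parse a geo: URI such as 'geo:41.53,-70.67;u=100' (simplified)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "geo":
        raise ValueError(f"not a geo: URI: {uri!r}")

    # Coordinates come first; parameters follow, separated by ';'.
    coords, *params = rest.split(";")
    numbers = [float(part) for part in coords.split(",")]
    if len(numbers) not in (2, 3):
        raise ValueError(f"expected 2 or 3 coordinates in {uri!r}")

    lat, lon = numbers[0], numbers[1]
    alt = numbers[2] if len(numbers) == 3 else None

    crs = "wgs84"          # the default CRS when ;crs= is absent
    uncertainty = None     # absent ;u= means the uncertainty is unknown
    for param in params:
        name, _, value = param.partition("=")
        name = name.lower()
        if name == "u":
            uncertainty = float(value)
        elif name == "crs":
            crs = value.lower()
        # Other parameters are ignored in this sketch.

    return GeoPoint(lat, lon, alt, uncertainty, crs)


point = parse_geo_uri("geo:41.53000000,-70.67000000;u=100")
print(point)
```

Because the coordinates are read as numbers, "geo:41.53000000,-70.67000000" 
and "geo:41.53,-70.67" parse to equal points in this sketch, which matches 
the RFC's statement that the number of digits says nothing about 
uncertainty. The point remains that every client needs code like this; 
nothing is retrieved over HTTP.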
> ......
>
> Note that your use of geo is not standard DarwinCore. What is the 
> official word from the DarwinCore Illuminati on the use of geo?
Well there was a proposal on the table for including geo:lat, geo:long 
in the Darwin Core standard.  As far as I know, there hasn't been any 
movement on that proposal (haven't checked recently).  But I really 
don't feel compelled to use Darwin Core exclusively in RDF that I 
write.  If a vocabulary like FOAF is more "well-known" for describing 
people, I don't think there is any reason not to use it.  I think at 
this point, geo:lat is more well-known than dwc:decimalLatitude (plus 
specifying the dwc:geodeticDatum isn't required with geo: since WGS84 is 
assumed). 
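
As a rough illustration of the datum point: a converter from a 
geo:lat/geo:long pair to Darwin Core terms has to state the datum 
explicitly, because in the geo: vocabulary WGS84 is implied. A small 
Python sketch follows. The function name geo_to_dwc and the dictionary 
layout are hypothetical, and the literal "WGS84" is an assumed spelling 
for the datum value.

```python
def geo_to_dwc(geo_props: dict) -> dict:
    """Translate W3C Basic Geo lat/long literals into Darwin Core terms."""
    return {
        "dwc:decimalLatitude": float(geo_props["geo:lat"]),
        "dwc:decimalLongitude": float(geo_props["geo:long"]),
        # geo: assumes WGS84, so the datum is spelled out on the DwC side.
        "dwc:geodeticDatum": "WGS84",
    }


record = geo_to_dwc({"geo:lat": "36.144719", "geo:long": "-86.801498"})
print(record)
```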

Thanks for the comments,
Steve
>
> I am sympathetic to the need for some measure of radius, 
> pointSpatialFit,  coordinateUncertaintyInMeters etc. but adoption of 
> these has been poor.
>
> I think the "radius" and area form that I am proposing is easy for 
> providers to understand and for tools to interpret.
>
> It probably maps to dwc:coordinateUncertaintyInMeters
>
> Here is a modification of your earlier location example using a TDWG 
> BioBlitz record. Note how I replace some standard vocabulary terms 
> with URIs.
>
> The terms with the txn: prefix could be incorporated into Darwin Core.
>
> <!-- The who, what, where and when and how of the observation, as 
> efficient as I can make it but with a literal for scientific name 
> (label) -->
>
> <dwc:Occurrence 
> rdf:about="http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1607">
>     <rdfs:label>OBS: Branta canadensis</rdfs:label>
>     <dwc:Area rdf:resource="geo:41.53000000,-70.67000000;u=100"/>
>     <txn:occurrenceHasSpeciesConcept 
> rdf:resource="http://lod.taxonconcept.org/ses/SeecQ#Species"/>
>     <txn:hasCollector 
> rdf:resource="http://lod.taxonconcept.org/people/tdwg2010bioblitz#Dmitry_Mozzherin"/>
>     <txn:occurrenceHasIndividual 
> rdf:resource="http://www.cs.umbc.edu/~jsachs/individuals/tdwg2010bioblitz_1607"/>
>     <foaf:depiction 
> rdf:resource="http://farm5.static.flickr.com/4126/5037315500_4c555f742a_b.jpg"/>
>     <txn:basisOfRecord 
> rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl#BasisOfRecord_StillImage"/>
>     <dcterms:date>2010-09-29</dcterms:date>
> </dwc:Occurrence>
>
> <!-- The area is a Location and the georeference method is included 
> as metadata. wdrs is used to link the area to the RDF. -->
> <!-- I will change the Area ontology so that it is a subclass of 
> dcterms:Location -->
> <!-- If others make statements about this particular area, their RDF's 
> wdrs will create the provenance links -->
>
> <dwc:Area rdf:about="geo:41.53000000,-70.67000000;u=100">
>     <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
>     <dwc:georeferenceMethod 
> rdf:resource="http://rs.tdwg.org/dwc/terms/index.htm#GeoMethod_GoogleMaps"/>
>     <dwc_area:areaWithInFeature 
> rdf:resource="http://sws.geonames.org/4929772/"/>
>     <wdrs:describedby 
> rdf:resource="http://my_organization.com/occurrence/123.rdf"/>
> </dwc:Area>
>
> <!-- Like the occurrence the individual is only one thing but 
> different people make assertions about what that thing is -->
>
> <dwc:Individual 
> rdf:about="http://www.cs.umbc.edu/~jsachs/individuals/tdwg2010bioblitz_1607">
>     <rdfs:label>IND_1607: Branta canadensis</rdfs:label>
>     <txn:occurrenceHasSpeciesConcept 
> rdf:resource="http://lod.taxonconcept.org/ses/SeecQ#Species"/>
>     <txn:hasCollector 
> rdf:resource="http://lod.taxonconcept.org/people/tdwg2010bioblitz#Dmitry_Mozzherin"/>
>     <txn:individualHasOccurrence 
> rdf:resource="http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1607"/>
>     <foaf:depiction 
> rdf:resource="http://farm5.static.flickr.com/4126/5037315500_4c555f742a_b.jpg"/>
>     <!-- Identification history should be part of the documentation of 
> the individual -->
>     <txn:individualHasCurrrentIdentificationAssertion 
> rdf:resource="http://www.cs.umbc.edu/~jsachs/identifications/tdwg2010bioblitz_1607_id_2"/>
>     <txn:individualHasPreviousIdentificationAssertion 
> rdf:resource="http://www.cs.umbc.edu/~jsachs/identifications/tdwg2010bioblitz_1607_id_1"/>
> </dwc:Individual>
>
> Respectfully,
>
> - Pete
>
> On Sun, Feb 27, 2011 at 6:55 AM, Steve Baskauf 
> <steve.baskauf at vanderbilt.edu> wrote:
>
>     Pete,
>     This topic has come up several times before and each time I've
>     left the email sitting in my inbox with the intention of trying to
>     understand it better.  I guess what I don't understand is what one
>     "does" with it.  I suppose I could dig in and do some research,
>     but the multiple times the message has sat in my inbox without
>     action tells me that I'm probably not going to get around to doing
>     that.  So maybe you can explain it further. 
>
>     Is this supposed to be usable as a "Linked Data" resource (i.e.
>     object of a predicate such as "hasLocation")?  It isn't an HTTP
>     URI, so it can't get dereferenced via http.  I guess one parses it
>     as a string and interprets the string directly based on some
>     rules.  Isn't that a no-no in the GUID rules?  I guess with enough
>     community support, applications would know what to do with it, but
>     look what happened with
>     "urn:lsid:my_organization.com:location:123".  Web browsers and
>     "regular" Linked Data clients (e.g. Linked Data browsers) didn't
>     know what to do with it.
>
>     The whole issue of provenance is pretty satisfactorily handled in
>     the existing Darwin Core. There are a multitude of terms available
>     to express all kinds of uncertainty and shapes.  For example, if I
>     create a record like:
>
>     <rdf:Description rdf:about="http://my_organization.com/location/123">
>         <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
>         <geo:lat>36.144719</geo:lat>
>         <geo:long>-86.801498</geo:long>
>         <dwc:coordinateUncertaintyInMeters>1000</dwc:coordinateUncertaintyInMeters>
>         <dwc:georeferenceRemarks>Location determined from Google maps</dwc:georeferenceRemarks>
>     </rdf:Description>
>
>     I very explicitly express the type of thing, geocoordinates, datum
>     (implicit in the use of geo:), uncertainty, and method of
>     generating the data.  If necessary, I could also use
>     dwc:dataGeneralizations and dwc:informationWithheld to explain how
>     and why I have provided less precise coordinates than I actually
>     know.  This is clearly more verbose, but hey, Linked data in xml
>     IS verbose, and existing Linked Data applications would be able to
>     "understand" something like what I wrote without any kind of
>     special "plug-in" to interpret the string.  I guess I don't really
>     understand why you are proposing this, especially since you are a
>     passionate advocate of Linked Data.  Your proposed thing is more
>     succinct, but it doesn't seem like it would be usable by normal
>     Linked Data clients.
>
>     Steve
>
>
>     Bob Morris wrote:
>>     Your arguably reasonable recoding of the geo uri's of your
>>     example illustrates an issue on which so much metadata is silent:
>>     provenance. Once exposed, it is probably impossible for someone
>>     to know how the uncertainty (or any other data that might be the
>>     subject of opinion or estimate) was determined and whether the
>>     data is fit for some particular purpose, e.g. that the species
>>     were observed near each other. 
>>
>>     BTW, the IETF geo proposal was adopted in 2010, in the final form
>>     given at  http://tools.ietf.org/html/rfc5870 . One interesting
>>     point is  http://tools.ietf.org/html/rfc5870#section-3.4.3 which
>>     says 
>>       "Note: The number of digits of the values in <coordinates> MUST
>>     NOT be interpreted as an indication to the level of uncertainty."
>>     The section following is also interesting, albeit irrelevant for
>>     your procedure. It implies that when uncertainty is omitted (and
>>     therefore unknown), then "geo:41.53000000,-70.67000000"  and
>>     "geo:41.53,-70.67"  identify  the same geo resource.
>>
>>
>>     Bob Morris
>>
>>     On Wed, Feb 23, 2011 at 4:56 PM, Peter DeVries
>>     <pete.devries at gmail.com> wrote:
>>
>>
>>         [...]
>>
>>         5) I added in my proposed "area" so that it is easy to see
>>         what species were observed near each other. Since there was
>>         no measure of radius in these longitude and latitudes I made
>>         the radius 100 meters.
>>             Normally I would estimate the radius for a GPS reading to
>>         be within 10 meters but some of these observations were made
>>         where the GPS reading was taken and the readings were given
>>         only to two decimals.
>>
>>         Area = lat, long; radius in meters following the ietf
>>         proposal but with the precision of the lat and long
>>         standardized example "geo:41.53000000,-70.67000000;u=100"
>>
>>         [...]
>>
>>     -- 
>>     Robert A. Morris
>>     Emeritus Professor  of Computer Science
>>     UMASS-Boston
>>     100 Morrissey Blvd
>>     Boston, MA 02125-3390
>>     Associate, Harvard University Herbaria
>>     email: morris.bob at gmail.com
>>     web: http://efg.cs.umb.edu/
>>     web: http://etaxonomy.org/mw/FilteredPush
>>     http://www.cs.umb.edu/~ram
>>     phone (+1) 857 222 7992 (mobile)
>>
>
>     -- 
>     Steven J. Baskauf, Ph.D., Senior Lecturer
>     Vanderbilt University Dept. of Biological Sciences
>
>     postal mail address:
>     VU Station B 351634
>     Nashville, TN  37235-1634,  U.S.A.
>
>     delivery address:
>     2125 Stevenson Center
>     1161 21st Ave., S.
>     Nashville, TN 37235
>
>     office: 2128 Stevenson Center
>     phone: (615) 343-4582,  fax: (615) 343-6707
>     http://bioimages.vanderbilt.edu
>         
>
>
>
>
> -- 
> ---------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / 
> GeoSpecies Knowledge Base <http://lod.geospecies.org/>
> About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
> ------------------------------------------------------------

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
