[tdwg-content] data provenance; was Re: Updated TDWG BioBlitz RDF Example with Pivot View and Data Browsing

Steve Baskauf steve.baskauf at vanderbilt.edu
Sun Feb 27 13:55:25 CET 2011


Pete,
This topic has come up several times before and each time I've left the 
email sitting in my inbox with the intention of trying to understand it 
better.  I guess what I don't understand is what one "does" with it.  I 
suppose I could dig in and do some research, but the multiple times the 
message has sat in my inbox without action tells me that I'm probably 
not going to get around to doing that.  So maybe you can explain it 
further. 

Is this supposed to be usable as a "Linked Data" resource (i.e. object 
of a predicate such as "hasLocation")?  It isn't an HTTP URI, so it 
can't get dereferenced via http.  I guess one parses it as a string and 
interprets the string directly based on some rules.  Isn't that a no-no 
in the GUID rules?  I guess with enough community support, applications 
would know what to do with it, but look what happened with 
"urn:lsid:my_organization.com:location:123".  Web browsers and "regular" 
Linked Data clients (e.g. Linked Data browsers) didn't know what to do 
with it.
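
To make concrete what "parsing it as a string" would involve, here is a
minimal sketch in Python, assuming the basic syntax described in RFC
5870 (latitude, longitude, optional ";u=" uncertainty in meters).  The
names parse_geo_uri and GeoPoint are made up for illustration and are
not part of any existing library; altitude and the "crs" parameter are
ignored.

```python
from typing import NamedTuple, Optional


class GeoPoint(NamedTuple):
    """Hypothetical container for the parts of a geo URI we care about."""
    lat: float
    lon: float
    uncertainty_m: Optional[float]  # None when ";u=" is absent (unknown)


def parse_geo_uri(uri: str) -> GeoPoint:
    """Parse a string such as 'geo:41.53,-70.67;u=100' (illustrative only)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "geo":
        raise ValueError(f"not a geo URI: {uri!r}")

    # Coordinates come first; parameters follow, separated by ';'.
    coords, *params = rest.split(";")
    parts = coords.split(",")
    if len(parts) not in (2, 3):  # an optional third value is altitude
        raise ValueError(f"expected lat,lon[,alt] in {uri!r}")
    lat, lon = float(parts[0]), float(parts[1])

    uncertainty = None
    for param in params:
        name, _, value = param.partition("=")
        if name.lower() == "u":  # parameter names are case-insensitive
            uncertainty = float(value)
    return GeoPoint(lat, lon, uncertainty)


print(parse_geo_uri("geo:41.53000000,-70.67000000;u=100"))
```

Because the coordinates are converted to numbers, the count of digits
carries no meaning here: "geo:41.53000000,-70.67000000" and
"geo:41.53,-70.67" parse to the same value.  The point stands that a
client needs this kind of scheme-specific code before it can do
anything with the identifier.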

The whole issue of provenance is pretty satisfactorily handled in the 
existing Darwin Core. There are a multitude of terms available to 
express all kinds of uncertainty and shapes.  For example, if I create a 
record like:

<rdf:Description rdf:about="http://my_organization.com/location/123">
    <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
    <geo:lat>36.144719</geo:lat>
    <geo:long>-86.801498</geo:long>
    <dwc:coordinateUncertaintyInMeters>1000</dwc:coordinateUncertaintyInMeters>
    <dwc:georeferenceRemarks>Location determined from Google maps</dwc:georeferenceRemarks>
</rdf:Description>

I very explicitly express the type of thing, geocoordinates, datum 
(implicit in the use of geo:), uncertainty, and method of generating the 
data.  If necessary, I could also use dwc:dataGeneralizations and 
dwc:informationWithheld to explain how and why I have provided less 
precise coordinates than I actually know.  This is clearly more verbose, 
but hey, Linked Data in XML IS verbose, and existing Linked Data 
applications would be able to "understand" something like what I wrote 
without any kind of special "plug-in" to interpret the string.  I guess 
I don't really understand why you are proposing this, especially since 
you are a passionate advocate of Linked Data.  Your proposed thing is 
more succinct, but it doesn't seem like it would be usable by normal 
Linked Data clients.

Steve

Bob Morris wrote:
> Your arguably reasonable recoding of the geo uri's of your example 
> illustrates an issue on which so much metadata is silent: provenance. 
> Once exposed, it is probably impossible for someone to know how the 
> uncertainty (or any other data that might be the subject of opinion or 
> estimate) was determined and whether the data is fit for some 
> particular purpose, e.g. that the species were observed near each other. 
>
> BTW, the IETF geo proposal was adopted in 2010, in the final form 
> given at  http://tools.ietf.org/html/rfc5870 . One interesting point 
> is  http://tools.ietf.org/html/rfc5870#section-3.4.3 which says 
>   "Note: The number of digits of the values in <coordinates> MUST NOT 
> be interpreted as an indication to the level of uncertainty." The 
> section following is also interesting, albeit irrelevant for your 
> procedure. It implies that when uncertainty is omitted (and therefore 
> unknown), then "geo:41.53000000,-70.67000000"  and "geo:41.53,-70.67" 
>  identify  the same geo resource.
>
>
> Bob Morris
>
> On Wed, Feb 23, 2011 at 4:56 PM, Peter DeVries
> <pete.devries at gmail.com> wrote:
>
>
>     [...]
>
>     5) I added in my proposed "area" so that it is easy to see what
>     species were observed near each other. Since there was no measure
>     of radius in these longitude and latitudes I made the radius 100
>     meters.
>         Normally I would estimate the radius for a GPS reading to be
>     within 10 meters but some of these observations were made where
>     the GPS reading was taken and the readings were given only to two
>     decimals.
>
>     Area = long, lat; radius in meters following the ietf proposal but
>     with the precision of the long and lat standardized
>     example "geo:41.53000000,-70.67000000;u=100"
>
>     [...]
>
> -- 
> Robert A. Morris
> Emeritus Professor  of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
> Associate, Harvard University Herbaria
> email: morris.bob at gmail.com
> web: http://efg.cs.umb.edu/
> web: http://etaxonomy.org/mw/FilteredPush
> http://www.cs.umb.edu/~ram
> phone (+1) 857 222 7992 (mobile)
>

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
