[tdwg-content] data provenance; was Re: Updated TDWG BioBlitz RDF Example with Pivot View and Data Browsing

John Wieczorek tuco at berkeley.edu
Sun Feb 27 17:05:25 CET 2011


The number of digits given is definitely not a good substitute
for this, for many reasons. Just one of them is that the original
may have been captured in a different coordinate system (such as
degrees decimal minutes - the most precise coordinate system other
than UTM or other meter-based systems when recording data from a GPS)
and then converted to decimal degrees, at which point the number of
significant digits becomes meaningless.
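
To illustrate the conversion problem, here is a minimal Python sketch. The
function name, the sample reading (41 degrees 31.823 minutes N), and the
0.001-minute display step are all hypothetical, chosen only for
illustration. The converted value carries a long tail of digits that says
nothing about how precisely the original was recorded.

```python
def ddm_to_decimal_degrees(degrees: int, minutes: float, hemisphere: str) -> float:
    """Convert degrees decimal minutes to signed decimal degrees."""
    # South and west hemispheres are negative in decimal degrees.
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return sign * (abs(degrees) + minutes / 60.0)


# Hypothetical GPS reading, displayed to 0.001 minute.
lat = ddm_to_decimal_degrees(41, 31.823, "N")

# The precision actually carried by the original reading, in degrees.
minute_step = 0.001
precision_in_degrees = minute_step / 60.0

print(repr(lat))                      # many digits, none of them telling us the precision
print(f"{precision_in_degrees:.7f}")  # the value one would have to state explicitly
```

The precision has to travel alongside the coordinates as its own value,
since it cannot be recovered from the digits after conversion.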

Happily, the Darwin Core also has a dwc:coordinatePrecision term
(http://rs.tdwg.org/dwc/terms/index.htm#coordinatePrecision), which
can say explicitly what the level of precision is in the coordinates
given. The dwc:coordinateUncertaintyInMeters
(http://rs.tdwg.org/dwc/terms/index.htm#coordinateUncertaintyInMeters)
is supposed to account for all sources of uncertainty in the
coordinates given, including GPS accuracy and coordinate precision.
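
As a rough sketch of how the two terms relate, the Python below derives
an uncertainty in meters from a stated coordinate precision plus a GPS
accuracy. The 111,320 meters-per-degree figure is a spherical-Earth
approximation, and simply adding the two sources together is an assumption
made here for illustration, not something Darwin Core prescribes. The
function names and the 10 meter GPS accuracy are likewise hypothetical.

```python
import math

# Assumption: rough spherical-Earth length of one degree of latitude.
METERS_PER_DEGREE_LAT = 111_320.0


def precision_uncertainty_m(latitude: float, coordinate_precision: float) -> float:
    """Distance in meters spanned by one precision step in both axes."""
    lat_err = coordinate_precision * METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    lon_err = lat_err * math.cos(math.radians(latitude))
    return math.hypot(lat_err, lon_err)


def total_uncertainty_m(latitude: float, coordinate_precision: float,
                        gps_accuracy_m: float) -> float:
    """Assumed combination rule: add the independent sources together."""
    return gps_accuracy_m + precision_uncertainty_m(latitude, coordinate_precision)


record = {
    "decimalLatitude": 41.53,
    "decimalLongitude": -70.67,
    "coordinatePrecision": 0.01,  # coordinates given to two decimals
}
record["coordinateUncertaintyInMeters"] = math.ceil(
    total_uncertainty_m(record["decimalLatitude"],
                        record["coordinatePrecision"],
                        gps_accuracy_m=10.0)
)
print(record)
```

Under these assumptions, coordinates given to two decimal places contribute
well over a kilometer of uncertainty on their own, which dwarfs the GPS
accuracy term.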

For the sake of completeness, there are the terms
dwc:pointRadiusSpatialFit
(http://rs.tdwg.org/dwc/terms/index.htm#pointRadiusSpatialFit) to
capture analytically how well the point-radius given matches the
actual uncertainty of the coordinates given (in case someone
artificially adds uncertainty), the dwc:georeferenceProtocol
(http://rs.tdwg.org/dwc/terms/index.htm#georeferenceProtocol) to
explain the method used to georeference, and the
dwc:dataGeneralizations
(http://rs.tdwg.org/dwc/terms/index.htm#dataGeneralizations) to
explain what was done to the georeference post-facto to obscure it.

On Sat, Feb 26, 2011 at 4:04 PM, Peter DeVries <pete.devries at gmail.com> wrote:
> Hi Bob,
> Yes, an estimate of the precision / extent should be recorded by the original
> observer.
> This has been repeated several times and it is interesting that even TDWG
> did not incorporate this into their data collection.
> What I was proposing was a specific extension to the ietf proposal for
> occurrence records.
> It adds something very similar to pointRadiusSpatialFit to a latitude and
> longitude.
> By standardizing on the significant digits we gain something even before
> there is general software support for this standard.
> That is, records with "geo:41.53000000,-70.67000000;u=100" share an equivalent
> URN, while
>  "geo:41.53000000,-70.67000000;u=100" and "geo:41.53,-70.67;u=100" do not.
> That allows those records to be linked within a triple or quadstore.
> As in this earlier example:
>     Here is a browsable view of one of the
> areas: http://bit.ly/hBtVFL
>
>  http://lsd.taxonconcept.org/describe/?url=geo:41.53000000,-70.67000000;u%3D100
> Without doing anything other than standardizing on the number of digits,
> occurrences attached to the same GPS reading are linked in both a triple
> store and a google search.
> Whereas software would need to be written to
> interpret  "geo:41.53000000,-70.67000000;u=100"
> and "geo:41.53,-70.67;u=100" as equivalent.
> Try Googling "geo:44.86294500,-87.23120400;u=10"
> If the ietf.org standard is supported in future versions of Virtuoso and
> other tools then we would not need to include the redundant use of geo:lat
> geo:long for the dynamic maps.
> - Pete
>
> On Sat, Feb 26, 2011 at 1:01 PM, Bob Morris <morris.bob at gmail.com> wrote:
>>
>> Your arguably reasonable recoding of the geo uri's of your example
>> illustrates an issue on which so much metadata is silent: provenance. Once
>> exposed, it is probably impossible for someone to know how the uncertainty
>> (or any other data that might be the subject of opinion or estimate) was
>> determined and whether the data is fit for some particular purpose, e.g.
>> that the species were observed near each other.
>> BTW, the IETF geo proposal was adopted in 2010, in the final form given
>> at http://tools.ietf.org/html/rfc5870 . One interesting point
>> is http://tools.ietf.org/html/rfc5870#section-3.4.3 which says
>>   "Note: The number of digits of the values in <coordinates> MUST NOT be
>> interpreted as an indication to the level of uncertainty." The section
>> following is also interesting, albeit irrelevant for your procedure. It
>> implies that when uncertainty is omitted (and therefore unknown), then
>> "geo:41.53000000,-70.67000000"  and "geo:41.53,-70.67"  identify  the same
>> geo resource.
>>
>> Bob Morris
>> On Wed, Feb 23, 2011 at 4:56 PM, Peter DeVries <pete.devries at gmail.com>
>> wrote:
>>>
>>> [...]
>>>
>>> 5) I added in my proposed "area" so that it is easy to see what species
>>> were observed near each other. Since there was no measure of radius in these
>>> longitude and latitudes I made the radius 100 meters.
>>>     Normally I would estimate the radius for a GPS reading to be within
>>> 10 meters but some of these observations were made where the GPS reading was
>>> taken and the readings were given only to two decimals.
>>> Area = long, lat; radius in meters following the ietf proposal but with
>>> the precision of the long and lat standardized
>>> example "geo:41.53000000,-70.67000000;u=100"
>>> [...]
>>
>> --
>> Robert A. Morris
>> Emeritus Professor  of Computer Science
>> UMASS-Boston
>> 100 Morrissey Blvd
>> Boston, MA 02125-3390
>> Associate, Harvard University Herbaria
>> email: morris.bob at gmail.com
>> web: http://efg.cs.umb.edu/
>> web: http://etaxonomy.org/mw/FilteredPush
>> http://www.cs.umb.edu/~ram
>> phone (+1) 857 222 7992 (mobile)
>>
>
>
>
> --
> ---------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
> About the GeoSpecies Knowledge Base
> ------------------------------------------------------------
>
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
>

