[tdwg-content] data provenance; was Re: Updated TDWG BioBlitz RDF Example with Pivot View and Data Browsing

Peter DeVries pete.devries at gmail.com
Tue Mar 1 22:05:34 CET 2011


Hi Bob,

I thought it might be useful to elaborate on the geoareas.

My hope is that these could be incorporated into the DarwinCore so that an
Area would look something like this.

<dwc:Area rdf:about="geo:41.53000000,-70.67000000;u=100">
    <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
    <dwc:georeferenceMethod rdf:resource="http://rs.tdwg.org/dwc/terms/index.htm#GeoMethod_GoogleMaps"/>
    <dwc_area:areaWithInFeature rdf:resource="http://sws.geonames.org/4929772/"/>
    <wdrs:describedby rdf:resource="http://my_organization.com/occurrence/123.rdf"/>
</dwc:Area>

The IETF came up with the following proposal for dealing with location
data.

http://tools.ietf.org/html/rfc5870

What I am proposing is that we
incorporate a subset of this into the DarwinCore.

That is, the latitude and longitude plus an uncertainty in meters (radius).
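
As a rough sketch of what a producer would have to do, here is one way to build such a string in Python. The function name format_geoarea and the default of 8 decimal places are my own assumptions, taken from the examples in this message; RFC 5870 itself does not require a fixed number of digits.

```python
from decimal import Decimal


def format_geoarea(lat, lon, radius_m, places=8):
    """Build a 'geo:' string with a fixed number of decimal places.

    Hypothetical helper: places=8 mirrors the examples in this thread and
    is a convention proposed here, not something RFC 5870 mandates.
    radius_m is expected to be a whole number of meters.
    """
    # Go through str() so that 41.53 becomes Decimal('41.53'), not a long
    # binary-float expansion.
    lat_d, lon_d = Decimal(str(lat)), Decimal(str(lon))
    if not (-90 <= lat_d <= 90 and -180 <= lon_d <= 180):
        raise ValueError("latitude or longitude out of range")
    if radius_m < 0:
        raise ValueError("uncertainty radius must not be negative")
    return f"geo:{lat_d:.{places}f},{lon_d:.{places}f};u={radius_m}"


print(format_geoarea(41.53, -70.67, 100))
```

The padding only makes the strings comparable; it says nothing about how precise the reading was, which is what the u= value is for.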

For now Virtuoso does not understand that this is a mappable thing so I
still include the regular geo:lat and geo:long in my mapping examples.

OK, so what is the main advantage?

If I create URIs for insect collection areas in my namespace (
http://lod.taxonconcept.org/#####)

And Steve has plant collections from those same geographic locations that
he identifies in his namespace (http://bioimages.org/#####)

Our two datasets, about the same place, are not linked.

But if Steve marks up his data with the geoarea <dwc:Area
rdf:resource="geo:41.53000000,-70.67000000;u=100">, it would be possible for a
consuming service to see which observations have identical or overlapping
locations.
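
To make the "overlap" case concrete, here is a minimal sketch of what a consuming service might do. Everything in it is an assumption on my part: the regular expression only handles the lat,long;u= subset used in this message (not the full RFC 5870 grammar with altitude or crs), the earth is treated as a sphere, and two areas are said to overlap when the distance between their centers is no more than the sum of their radii.

```python
import math
import re

# Simplified pattern for the subset proposed here: geo:<lat>,<lon>;u=<meters>
GEO_RE = re.compile(
    r"^geo:(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)(?:;u=(\d+(?:\.\d+)?))?$"
)


def parse_geoarea(uri):
    """Return (lat, lon, uncertainty_m); uncertainty is None if absent."""
    m = GEO_RE.match(uri)
    if m is None:
        raise ValueError(f"not a geoarea in the simplified form: {uri!r}")
    lat, lon = float(m.group(1)), float(m.group(2))
    u = float(m.group(3)) if m.group(3) else None
    return lat, lon, u


def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371008.8):
    """Great-circle distance in meters on a spherical earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def areas_overlap(uri_a, uri_b):
    """True if the two point-radius circles touch or intersect."""
    lat1, lon1, u1 = parse_geoarea(uri_a)
    lat2, lon2, u2 = parse_geoarea(uri_b)
    if u1 is None or u2 is None:
        raise ValueError("both areas need an uncertainty radius")
    return haversine_m(lat1, lon1, lat2, lon2) <= u1 + u2
```

Identical strings trivially overlap; the distance test is only needed when the two producers recorded slightly different readings for the same place.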

A link back to the RDF that makes a statement about a particular geoarea is
included using the *wdrs* vocabulary.

This allows a consumer to find the original RDF that contained statements
about a particular geoarea. (provenance)

Does this make sense?

It is likely that what I am proposing will need to be tweaked in some way
that I have not anticipated.

I would like them to still work as a subset of the more complicated
IETF proposal.

That is, tools and services that understand the IETF standard will be
able to consume the geoareas.

These already work as URNs in a triple/quadstore, allowing occurrences tied
to the same GPS reading to be linked together.
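
A toy illustration of that linking, without any real triple store: the occurrence identifiers and the ex:hasArea predicate below are made up for the example, and the "store" is just a Python list of (subject, predicate, object) tuples.

```python
from collections import defaultdict

# Hypothetical triples from two independent producers.
triples = [
    ("http://example.org/insects/1", "ex:hasArea", "geo:41.53000000,-70.67000000;u=100"),
    ("http://example.org/plants/7", "ex:hasArea", "geo:41.53000000,-70.67000000;u=100"),
    ("http://example.org/plants/8", "ex:hasArea", "geo:41.53,-70.67;u=100"),
]


def link_by_area(triples, predicate="ex:hasArea"):
    """Group subjects by the exact geoarea string they point to.

    Only areas shared by more than one subject are returned. Matching is
    plain string equality, which is all a store needs to do.
    """
    by_area = defaultdict(list)
    for subject, pred, obj in triples:
        if pred == predicate:
            by_area[obj].append(subject)
    return {area: subs for area, subs in by_area.items() if len(subs) > 1}


linked = link_by_area(triples)
```

Here the insect record and the first plant record end up linked, while the record using the short form "geo:41.53,-70.67;u=100" does not, which is the reason for standardizing the digits.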

If these catch on I suspect Virtuoso and others will start supporting them
so that entities like  "geo:41.53000000,-70.67000000;u=100" are seen as
mappable things and processed accordingly.

This does not offer everything that pointRadiusSpatialFit and
coordinateUncertaintyInMeters provide, but I think that these areas are
more efficient, easier for people to understand, and more likely to be
adopted. I think we should keep pointRadiusSpatialFit and
coordinateUncertaintyInMeters in the DarwinCore for those who want to
use them.

Standardizing on a fixed number of decimal places makes the standard easier
for producers to implement, and the areas easier to compare as simple strings
and URNs.
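
For data that already exists in a shorter form, a consumer could rewrite it into the fixed-width form before comparing. This is only a sketch under my own assumptions: the name canonical_geoarea is hypothetical, 8 decimal places is the convention used in the examples here, and only the geo:lat,long;u=meters subset with a whole-number radius is handled.

```python
import re
from decimal import Decimal

_GEO = re.compile(r"^geo:(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?);u=(\d+)$")


def canonical_geoarea(uri, places=8):
    """Rewrite a geoarea so both coordinates carry exactly `places` decimals."""
    m = _GEO.match(uri)
    if m is None:
        raise ValueError(f"unsupported geoarea form: {uri!r}")
    quantum = Decimal(10) ** -places  # e.g. Decimal('1E-8')
    lat = Decimal(m.group(1)).quantize(quantum)
    lon = Decimal(m.group(2)).quantize(quantum)
    # The 'f' format avoids scientific notation for values very close to zero.
    return f"geo:{lat:f},{lon:f};u={int(m.group(3))}"


short = "geo:41.53,-70.67;u=100"
padded = "geo:41.53000000,-70.67000000;u=100"
same_place = canonical_geoarea(short) == canonical_geoarea(padded)
```

The added zeros are there only to make the strings identical; they should not be read as a claim about precision.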

The current ontology is here and I am open to having it tweaked and
improved.

http://lod.taxonconcept.org/ontology/dwc_area.owl

OntDoc http://lod.taxonconcept.org/ontology/dwc_area_doc/index.html

* I was thinking that it might make sense to add support for linking to
Yahoo Placefinder's Where On Earth ID (WOEID) in addition to Geonames

   http://developer.yahoo.com/geo/placefinder/


Respectfully,

- Pete



On Sun, Feb 27, 2011 at 12:14 PM, Bob Morris <morris.bob at gmail.com> wrote:

> Ah, well, the point I was raising was only about geo as a URI scheme, the
> subject of the now adopted IETF RFC 5870. To my mind this raises several
> distinguishable questions (whose answers may not be independent):
>
> 1. What mappings are there to other georeferencing schemes with richer
> semantics? RFC 5870 offers a non-normative one to parts of GML.
> Pete and you seem to be discussing possible mappings to DwC.  At the very
> least, I would agree with any subtext in that discussion that such a mapping
> and best practices for its use be the subject of a TDWG applicability
> statement or other non-normative document.
>
> 2. The questions implicit in a recent posting of Steve Baskauf:
> a. To the extent that LOD or other uses of RDF require or urge an http URI,
> what provides the mapping of the now approved IANA geo URI scheme to the
> IANA http URI scheme, and what should the http service status values be?
> (Maybe service status is a separate question. But LOD as used seems to
> depend in practice on conventions about http as a service protocol, not just
> as a URI scheme.)
> b. What defines the (semantics of the) dereferencing of a geo URI?
> http://tools.ietf.org/html/rfc5870#section-5 is rather spare on this
> point, but has the warning:
> "Currently, just one operation on a 'geo' URI is defined - location
>
>    dereference: in that operation, a client dereferences the URI by
>    extracting the geographical coordinates from the URI path component
>    <geo-path>.  Further use of those coordinates (and the uncertainty
>    value from <uval>) is then up to the application processing the URI,
>    and might depend on the context of the URI."
>
>
>
> Bob
>
>
>
>
>
>
> On Sun, Feb 27, 2011 at 11:05 AM, John Wieczorek <tuco at berkeley.edu> wrote:
>
>> The number of digits given is definitely not a good substitute
>> for this for many reasons, just one of which is that the original
>> may have been captured in a different coordinate system (such as
>> degrees decimal minutes - the most precise coordinate system other
>> than UTM or other meter-based systems when recording data from a GPS)
>> and then converted to decimal degrees where the number of significant
>> digits then becomes meaningless.
>>
>> Happily, the Darwin Core also has a dwc:coordinatePrecision term
>> (http://rs.tdwg.org/dwc/terms/index.htm#coordinatePrecision), which
>> can say explicitly what the level of precision is in the coordinates
>> given. The dwc:coordinateUncertaintyInMeters
>> (http://rs.tdwg.org/dwc/terms/index.htm#coordinateUncertaintyInMeters)
>> is supposed to account for all sources of uncertainty in the
>> coordinates given, including GPS accuracy and coordinate precision.
>>
>> For the sake of completeness, there are the terms
>> dwc:pointRadiusSpatialFit
>> (http://rs.tdwg.org/dwc/terms/index.htm#pointRadiusSpatialFit) to
>> capture analytically how well the point-radius given matches the
>> actual uncertainty of the coordinates given (in case someone
>> artificially adds uncertainty), the dwc:georeferenceProtocol
>> (http://rs.tdwg.org/dwc/terms/index.htm#georeferenceProtocol) to
>> explain the method used to georeference, and the
>> dwc:dataGeneralizations
>> (http://rs.tdwg.org/dwc/terms/index.htm#dataGeneralizations) to
>> explain what was done to the georeference post-facto to obscure it.
>>
>> On Sat, Feb 26, 2011 at 4:04 PM, Peter DeVries <pete.devries at gmail.com>
>> wrote:
>> > Hi Bob,
>> > Yes, an estimate of the precision / extent should be recorded by the
>> > original observer.
>> > This has been repeated several times, and it is interesting that even
>> > TDWG did not incorporate this into their data collection.
>> > What I was proposing was a specific extension to the IETF proposal for
>> > occurrence records.
>> > It adds something very similar to pointRadiusSpatialFit to a latitude
>> > and longitude.
>> > By standardizing on the significant digits we gain something even before
>> > there is general software support for this standard.
>> > That is, records with "geo:41.53000000,-70.67000000;u=100" share an
>> > equivalent URN, while "geo:41.53000000,-70.67000000;u=100" and
>> > "geo:41.53,-70.67;u=100" do not.
>> > That allows those records to be linked within a triple or quadstore.
>> > As in this earlier example:
>> >     Here is a browsable view of one of the areas: http://bit.ly/hBtVFL
>> >
>> > http://lsd.taxonconcept.org/describe/?url=geo:41.53000000,-70.67000000;u%3D100
>> > Without doing anything other than standardizing on the number of digits,
>> > occurrences attached to the same GPS reading are linked in both a triple
>> > store and a Google search.
>> > Whereas software would need to be written to interpret
>> > "geo:41.53000000,-70.67000000;u=100" and "geo:41.53,-70.67;u=100" as
>> > equivalent.
>> > Try Googling "geo:44.86294500,-87.23120400;u=10"
>> > If the IETF standard is supported in future versions of Virtuoso and
>> > other tools, then we would not need to include the redundant use of
>> > geo:lat and geo:long for the dynamic maps.
>> > - Pete
>> >
>> > On Sat, Feb 26, 2011 at 1:01 PM, Bob Morris <morris.bob at gmail.com>
>> > wrote:
>> >>
>> >> Your arguably reasonable recoding of the geo URIs of your example
>> >> illustrates an issue on which so much metadata is silent: provenance.
>> >> Once exposed, it is probably impossible for someone to know how the
>> >> uncertainty (or any other data that might be the subject of opinion or
>> >> estimate) was determined and whether the data is fit for some
>> >> particular purpose, e.g. that the species were observed near each
>> >> other.
>> >> BTW, the IETF geo proposal was adopted in 2010, in the final form given
>> >> at http://tools.ietf.org/html/rfc5870 . One interesting point
>> >> is http://tools.ietf.org/html/rfc5870#section-3.4.3 which says
>> >>   "Note: The number of digits of the values in <coordinates> MUST NOT
>> >> be interpreted as an indication to the level of uncertainty." The
>> >> section following is also interesting, albeit irrelevant for your
>> >> procedure. It implies that when uncertainty is omitted (and therefore
>> >> unknown), then "geo:41.53000000,-70.67000000" and "geo:41.53,-70.67"
>> >> identify the same geo resource.
>> >>
>> >> Bob Morris
>> >> On Wed, Feb 23, 2011 at 4:56 PM, Peter DeVries
>> >> <pete.devries at gmail.com> wrote:
>> >>>
>> >>> [...]
>> >>>
>> >>> 5) I added in my proposed "area" so that it is easy to see what
>> >>> species were observed near each other. Since there was no measure of
>> >>> radius in these longitudes and latitudes I made the radius 100 meters.
>> >>>     Normally I would estimate the radius for a GPS reading to be
>> >>> within 10 meters but some of these observations were made where the
>> >>> GPS reading was taken and the readings were given only to two
>> >>> decimals.
>> >>> Area = lat, long; radius in meters following the IETF proposal but
>> >>> with the precision of the lat and long standardized
>> >>> example "geo:41.53000000,-70.67000000;u=100"
>> >>> [...]
>> >>
>> >> --
>> >> Robert A. Morris
>> >> Emeritus Professor  of Computer Science
>> >> UMASS-Boston
>> >> 100 Morrissey Blvd
>> >> Boston, MA 02125-3390
>> >> Associate, Harvard University Herbaria
>> >> email: morris.bob at gmail.com
>> >> web: http://efg.cs.umb.edu/
>> >> web: http://etaxonomy.org/mw/FilteredPush
>> >> http://www.cs.umb.edu/~ram
>> >> phone (+1) 857 222 7992 (mobile)
>> >>
>> >
>> >
>> >
>> > --
>> > ---------------------------------------------------------------
>> > Pete DeVries
>> > Department of Entomology
>> > University of Wisconsin - Madison
>> > 445 Russell Laboratories
>> > 1630 Linden Drive
>> > Madison, WI 53706
>> > TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
>> > About the GeoSpecies Knowledge Base
>> > ------------------------------------------------------------
>> >
>> >
>> >
>>
>
>
>
>
>


-- 
---------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / GeoSpecies
Knowledge Base <http://lod.geospecies.org/>
About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
------------------------------------------------------------