data provenance; was Re: Updated TDWG BioBlitz RDF Example with Pivot View and Data Browsing
Your arguably reasonable recoding of the geo uri's of your example illustrates an issue on which so much metadata is silent: provenance. Once exposed, it is probably impossible for someone to know how the uncertainty (or any other data that might be the subject of opinion or estimate) was determined and whether the data is fit for some particular purpose, e.g. that the species were observed near each other.
BTW, the IETF geo proposal was adopted in 2010, in the final form given at http://tools.ietf.org/html/rfc5870 . One interesting point is http://tools.ietf.org/html/rfc5870#section-3.4.3 which says "Note: The number of digits of the values in <coordinates> MUST NOT be interpreted as an indication to the level of uncertainty." The section following is also interesting, albeit irrelevant for your procedure. It implies that when uncertainty is omitted (and therefore unknown), then "geo:41.53000000,-70.67000000" and "geo:41.53,-70.67" identify the same geo resource.
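A minimal sketch of that last point, assuming that geo URIs are compared by numeric value rather than as text. The helper names `parse_geo_uri` and `same_geo_resource` are hypothetical, and the parser is deliberately simplified: it ignores altitude and any parameter other than `u`.

```python
from decimal import Decimal


def parse_geo_uri(uri):
    """Split a simple 'geo' URI into (lat, lon, uncertainty or None).

    Simplified on purpose: altitude and parameters other than 'u' are ignored.
    """
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "geo":
        raise ValueError(f"not a geo URI: {uri!r}")
    path, *params = rest.split(";")
    lat, lon = (Decimal(part) for part in path.split(",")[:2])
    uncertainty = None
    for param in params:
        name, _, value = param.partition("=")
        if name.lower() == "u":
            uncertainty = Decimal(value)
    return lat, lon, uncertainty


def same_geo_resource(a, b):
    """Compare two geo URIs by numeric value instead of by string."""
    return parse_geo_uri(a) == parse_geo_uri(b)


long_form = "geo:41.53000000,-70.67000000"
short_form = "geo:41.53,-70.67"
print(long_form == short_form)                   # the strings differ
print(same_geo_resource(long_form, short_form))  # the numbers do not
```

Decimal is used instead of float so that trailing zeros are dropped without any rounding surprises.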
Bob Morris
On Wed, Feb 23, 2011 at 4:56 PM, Peter DeVries pete.devries@gmail.com wrote:
[...]
- I added in my proposed "area" so that it is easy to see what species were observed near each other. Since there was no measure of radius in these longitude and latitudes I made the radius 100 meters. Normally I would estimate the radius for a GPS reading to be within 10 meters but some of these observations were made where the GPS reading was taken and the readings were given only to two decimals.
Area = long, lat; radius in meters following the ietf proposal but with the precision of the long and lat standardized example "geo:41.53000000,-70.67000000;u=100"
[...]
Hi Bob,
Yes, an estimate of the precision / extent should be recorded by the original observer.
This has been repeated several times and it is interesting that even TDWG did not incorporate this into their data collection.
What I was proposing was a specific extension to the ietf proposal for occurrence records.
It adds something very similar to pointRadiusSpatialFit to a latitude and longitude.
By standardizing on the significant digits we gain something even before there is general software support for this standard.
That is, records with "geo:41.53000000,-70.67000000;u=100" share an equivalent URN, while "geo:41.53000000,-70.67000000;u=100" and "geo:41.53,-70.67;u=100" do not.
That allows those records to be linked within a triple or quadstore.
As in this earlier example:
Here is a browsable view of one of the areas: http://bit.ly/hBtVFL
http://lsd.taxonconcept.org/describe/?url=geo:41.53000000,-70.67000000;u%3D1...
Without doing anything other than standardizing on the number of digits, occurrences attached to the same GPS reading are linked in both a triple store and a google search.
By contrast, software would need to be written to interpret "geo:41.53000000,-70.67000000;u=100" and "geo:41.53,-70.67;u=100" as equivalent.
Try Googling "geo:44.86294500,-87.23120400;u=10"
If the ietf.org standard is supported in future versions of Virtuoso and other tools then we would not need to include the redundant use of geo:lat geo:long for the dynamic maps.
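The fixed-width convention described above can be sketched as follows. The function name `canonical_geo_uri` and the choice of eight decimal places (taken from the examples in this message) are assumptions for illustration, not part of RFC 5870.

```python
from decimal import Decimal


def canonical_geo_uri(lat, lon, radius_m, places=8):
    """Format a point-radius area as a fixed-width geo URI string.

    Coordinates are padded to a fixed number of decimal places so that
    identical readings always produce byte-identical strings.
    """
    lat_text = format(Decimal(str(lat)), f".{places}f")
    lon_text = format(Decimal(str(lon)), f".{places}f")
    return f"geo:{lat_text},{lon_text};u={int(radius_m)}"


# Two providers who recorded the same reading with different padding
# end up with the same key, so a plain string match links their records.
a = canonical_geo_uri("41.53", "-70.67", 100)
b = canonical_geo_uri("41.53000000", "-70.67000000", 100)
print(a, a == b)
```

The padding only makes the strings comparable; it says nothing about how precise the reading actually was.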
- Pete
On Sat, Feb 26, 2011 at 1:01 PM, Bob Morris morris.bob@gmail.com wrote:
[...]
-- Robert A. Morris Emeritus Professor of Computer Science UMASS-Boston 100 Morrissey Blvd Boston, MA 02125-3390 Associate, Harvard University Herbaria email: morris.bob@gmail.com web: http://efg.cs.umb.edu/ web: http://etaxonomy.org/mw/FilteredPush http://www.cs.umb.edu/~ram phone (+1) 857 222 7992 (mobile)
The number of digits given is definitely not a good substitute for this for many reasons, just one of which is that the original may have been captured in a different coordinate system (such as degrees decimal minutes - the most precise coordinate system other than UTM or other meter-based systems when recording data from a GPS) and then converted to decimal degrees where the number of significant digits then becomes meaningless.
Happily, the Darwin Core also has a dwc:coordinatePrecision term (http://rs.tdwg.org/dwc/terms/index.htm#coordinatePrecision), which can say explicitly what the level of precision is in the coordinates given. The dwc:coordinateUncertaintyInMeters (http://rs.tdwg.org/dwc/terms/index.htm#coordinateUncertaintyInMeters) is supposed to account for all sources of uncertainty in the coordinates given, including GPS accuracy and coordinate precision.
For the sake of completeness, there are the terms dwc:pointRadiusSpatialFit (http://rs.tdwg.org/dwc/terms/index.htm#pointRadiusSpatialFit) to capture analytically how well the point-radius given matches the actual uncertainty of the coordinates given (in case someone artificially adds uncertainty), the dwc:georeferenceProtocol (http://rs.tdwg.org/dwc/terms/index.htm#georeferenceProtocol) to explain the method used to georeference, and the dwc:dataGeneralizations (http://rs.tdwg.org/dwc/terms/index.htm#dataGeneralizations) to explain what was done to the georeference post-facto to obscure it.
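To illustrate the conversion problem mentioned at the start of this message, here is a small sketch of converting degrees decimal minutes to decimal degrees. The function name `ddm_to_decimal_degrees` is hypothetical, and the sample readings are made up for the example.

```python
def ddm_to_decimal_degrees(degrees, minutes):
    """Convert degrees plus decimal minutes to signed decimal degrees.

    The sign is taken from the degrees value; a reading of exactly
    -0 degrees would need separate handling and is not covered here.
    """
    sign = -1 if degrees < 0 else 1
    return sign * (abs(degrees) + minutes / 60)


# Both readings were recorded to a tenth of a minute, i.e. the same
# precision. After conversion one has two decimals and the other has
# an unbounded run of digits (41.528333...), so the digit count in the
# decimal-degree value reflects the arithmetic, not the measurement.
print(ddm_to_decimal_degrees(41, 31.8))
print(ddm_to_decimal_degrees(41, 31.7))
```

This is why an explicit term such as dwc:coordinateUncertaintyInMeters carries information that the digits cannot.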
On Sat, Feb 26, 2011 at 4:04 PM, Peter DeVries pete.devries@gmail.com wrote:
[...]
--
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 TaxonConcept Knowledge Base / GeoSpecies Knowledge Base About the GeoSpecies Knowledge Base
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Ah, well, the point I was raising was only about geo as a URI scheme, the subject of the now adopted IETF RFC 5870. To my mind this raises several distinguishable questions (whose answers may not be independent):
1. What mappings are there to other georeferencing schemes with richer semantics? RFC 5870 offers a non-normative one to parts of GML. Pete and you seem to be discussing possible mappings to DwC. At the very least, I would agree with any subtext in that discussion that such a mapping and best practices for its use be the subject of a TDWG applicability statement or other non-normative document.
2. The questions implicit in a recent posting of Steve Baskauf:
a. To the extent that LOD or other uses of RDF require or urge an http URI, what provides the mapping of the now approved IANA geo URI scheme to the IANA http URI scheme, and what should the http service status values be? (Maybe service status is a separate question. But LOD as used seems to depend in practice on conventions about http as a service protocol, not just as a URI scheme.)
b. What defines the (semantics of the) dereferencing of a geo URI? http://tools.ietf.org/html/rfc5870#section-5 is rather spare on this point, but has the warning: "Currently, just one operation on a 'geo' URI is defined - location dereference: in that operation, a client dereferences the URI by extracting the geographical coordinates from the URI path component <geo-path>. Further use of those coordinates (and the uncertainty value from <uval>) is then up to the application processing the URI, and might depend on the context of the URI."
Bob
On Sun, Feb 27, 2011 at 11:05 AM, John Wieczorek tuco@berkeley.edu wrote:
[...]
Hi Bob,
I thought it might be useful to elaborate on the geoareas.
My hope is that these could be incorporated into the DarwinCore so that an Area would look something like this.
    <dwc:Area about="geo:41.53000000,-70.67000000;u=100">
      <rdf:type resource="http://purl.org/dc/terms/Location"/>
      <dwc:georeferenceMethod resource="http://rs.tdwg.org/dwc/terms/index.htm#GeoMethod_GoogleMaps"/>
      <dwc_area:areaWithInFeature rdf:resource="http://sws.geonames.org/4929772/"/>
      <wdrs:describedby rdf:resource="http://my_organization.com/occurrence/123.rdf"/>
    </dwc:Area>
The ietf.org came up with the following proposal for dealing with location data.
http://tools.ietf.org/html/rfc5870
What I am proposing is that we incorporate a subset of this into the DarwinCore.
That is, the latitude and longitude and an uncertainty in meters (radius).
For now Virtuoso does not understand that this is a mappable thing so I still include the regular geo:lat and geo:long in my mapping examples.
OK, so what is the main advantage?
If I create URI's for insect collection areas in my namespace ( http://lod.taxonconcept.org/#####)
And Steve has plant collections from those same geographic locations that he identifies in his namespace (http://bioimages.org/#####)
Our two datasets, about the same place, are not linked.
But if Steve marks up his data with the geoarea dwc:Area resource="geo:41.53000000,-70.67000000;u=100" it would be possible for a consuming service to see what observations have identical locations or overlap.
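One way a consuming service might test two such areas for overlap is sketched below. It assumes a spherical Earth and the haversine formula, which is an illustrative choice and not something specified in this thread. The names `haversine_m` and `areas_overlap` are hypothetical, and each area is a (latitude, longitude, radius in meters) tuple.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS-84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def areas_overlap(area_a, area_b):
    """True if two point-radius areas touch or intersect.

    Two circles overlap when the distance between their centers is no
    greater than the sum of their radii.
    """
    lat1, lon1, r1 = area_a
    lat2, lon2, r2 = area_b
    return haversine_m(lat1, lon1, lat2, lon2) <= r1 + r2


insects = (41.53, -70.67, 100)   # geo:41.53000000,-70.67000000;u=100
plants = (41.531, -70.67, 100)   # about 111 m to the north
print(areas_overlap(insects, plants))
```

Identical geoarea strings need no such calculation, since they can be matched directly as keys; the distance test only matters for areas that are close but not identical.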
A link back to the RDF that makes a statement about a particular geoarea is included using the *wdrs* vocabulary.
This allows a consumer to find the original RDF that contained statements about a particular geoarea. (provenance)
Does this make sense?
It is likely that what I am proposing will need to be tweaked in some way that I have not anticipated.
I would like them to still work as a subset of the more complicated ietf.org proposal.
That is, tools and services that understand the ietf.org standard will be able to consume the geoareas.
These already work as URNs in a triple/quadstore, allowing occurrences tied to the same GPS reading to be linked together.
If these catch on I suspect Virtuoso and others will start supporting them so that entities like "geo:41.53000000,-70.67000000;u=100" are seen as mappable things and processed accordingly.
This does not offer everything that PointRadiusSpatialFit, coordinateUncertaintyInMeters provides but I think that these areas are more efficient, easier for people to understand and more likely to be adopted. I think we should keep PointRadiusSpatialFit, coordinateUncertaintyInMeters in the DarwinCore for those who want to use them.
Standardizing on some set of decimal points makes the standard easier for producers to implement, and the areas easier to compare as simple strings and urn's.
The current ontology is here and I am open to having it tweaked and improved.
http://lod.taxonconcept.org/ontology/dwc_area.owl
OntDoc http://lod.taxonconcept.org/ontology/dwc_area_doc/index.html
* I was thinking that it might make sense to add support for linking to Yahoo Placefinder's Where On Earth ID (WOEID) in addition to Geonames
http://developer.yahoo.com/geo/placefinder/
Respectfully,
- Pete
On Sun, Feb 27, 2011 at 12:14 PM, Bob Morris morris.bob@gmail.com wrote:
[...]
Pete, This topic has come up several times before and each time I've left the email sitting in my inbox with the intention of trying to understand it better. I guess what I don't understand is what one "does" with it. I suppose I could dig in and do some research, but the multiple times the message has sat in my inbox without action tells me that I'm probably not going to get around to doing that. So maybe you can explain it further.
Is this supposed to be usable as a "Linked Data" resource (i.e. object of a predicate such as "hasLocation")? It isn't an HTTP URI, so it can't get dereferenced via http. I guess one parses it as a string and interprets the string directly based on some rules. Isn't that a no-no in the GUID rules? I guess with enough community support, applications would know what to do with it, but look what happened with "urn:lsid:my_organization.com:location:123". Web browsers and "regular" Linked Data clients (e.g. Linked Data browsers) didn't know what to do with it.
The whole issue of provenance is pretty satisfactorily handled in the existing Darwin Core. There are a multitude of terms available to express all kinds of uncertainty and shapes. For example, if I create a record like:
    <rdf:Description about="http://my_organization.com/location/123">
      <rdf:type resource="http://purl.org/dc/terms/Location"/>
      <geo:lat>36.144719</geo:lat>
      <geo:long>-86.801498</geo:long>
      <dwc:coordinateUncertaintyInMeters>1000</dwc:coordinateUncertaintyInMeters>
      <dwc:georeferenceRemarks>Location determined from Google maps</dwc:georeferenceRemarks>
    </rdf:Description>
I very explicitly express the type of thing, geocoordinates, datum (implicit in the use of geo:), uncertainty, and method of generating the data. If necessary, I could also use dwc:dataGeneralizations and dwc:informationWithheld to explain how and why I have provided less precise coordinates than I actually know. This is clearly more verbose, but hey, Linked data in xml IS verbose, and existing Linked Data applications would be able to "understand" something like what I wrote without any kind of special "plug-in" to interpret the string. I guess I don't really understand why you are proposing this, especially since you are a passionate advocate of Linked Data. Your proposed thing is more succinct, but it doesn't seem like it would be usable by normal Linked Data clients.
Steve
Bob Morris wrote:
Your arguably reasonable recoding of the geo uri's of your example illustrates an issue on which so much metadata is silent: provenance. Once exposed, it is probably impossible for someone to know how the uncertainty (or any other data that might be the subject of opinion or estimate) was determined and whether the data is fit for some particular purpose, e.g. that the species were observed near each other.
BTW, the IETF geo proposal was adopted in 2010, in the final form given at http://tools.ietf.org/html/rfc5870 . One interesting point is http://tools.ietf.org/html/rfc5870#section-3.4.3 which says "Note: The number of digits of the values in <coordinates> MUST NOT be interpreted as an indication to the level of uncertainty." The section following is also interesting, albeit irrelevant for your procedure. It implies that when uncertainty is omitted (and therefore unknown), then "geo:41.53000000,-70.67000000" and "geo:41.53,-70.67" identify the same geo resource.
Bob Morris
On Wed, Feb 23, 2011 at 4:56 PM, Peter DeVries <pete.devries@gmail.com mailto:pete.devries@gmail.com> wrote:
[...] 5) I added in my proposed "area" so that it is easy to see what species were observed near each other. Since there was no measure of radius in these longitude and latitudes I made the radius 100 meters. Normally I would estimate the radius for a GPS reading to be within 10 meters but some of these observations were made where the GPS reading was taken and the readings were given only to two decimals. Area = long, lat; radius in meters following the ietf proposal but with the precision of the long and lat standardized example "geo:41.53000000,-70.67000000;u=100" [...]
-- Robert A. Morris Emeritus Professor of Computer Science UMASS-Boston 100 Morrissey Blvd Boston, MA 02125-3390 Associate, Harvard University Herbaria email: morris.bob@gmail.com web: http://efg.cs.umb.edu/ web: http://etaxonomy.org/mw/FilteredPush http://www.cs.umb.edu/~ram phone (+1) 857 222 7992 (mobile)
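Bob's point that "geo:41.53000000,-70.67000000" and "geo:41.53,-70.67" identify the same resource can be illustrated with a small comparison routine. This is only a sketch of the idea, not a complete RFC 5870 implementation: the function names parse_geo_uri and same_geo_resource are made up for this example, and the special cases for the poles, the 180-degree meridian and any parameters other than crs and u are deliberately left out. Note that RFC 5870 puts latitude first, then longitude.

```python
from decimal import Decimal


def parse_geo_uri(uri):
    """Split a geo URI into (lat, lon, alt, params).

    Simplified sketch: no validation of ranges, no percent-decoding.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "geo":
        raise ValueError("not a geo URI: %r" % uri)
    coord_part, *param_parts = rest.split(";")
    coords = [Decimal(c) for c in coord_part.split(",")]
    if len(coords) not in (2, 3):
        raise ValueError("expected lat,lon or lat,lon,alt: %r" % uri)
    alt = coords[2] if len(coords) == 3 else None
    params = {}
    for part in param_parts:
        name, _, value = part.partition("=")
        params[name.lower()] = value
    return coords[0], coords[1], alt, params


def same_geo_resource(uri_a, uri_b):
    """Compare two geo URIs by numeric value rather than by string."""
    def key(uri):
        lat, lon, alt, params = parse_geo_uri(uri)
        crs = params.get("crs", "wgs84").lower()  # WGS84 is the default CRS
        u = Decimal(params["u"]) if "u" in params else None
        return (lat, lon, alt, crs, u)

    return key(uri_a) == key(uri_b)
```

Because the coordinates are compared as numbers, trailing zeros make no difference, while adding or changing the u parameter does: "geo:41.53000000,-70.67000000;u=100" is not the same resource as "geo:41.53,-70.67".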
Hi Steve,
This is mainly an advantage if everyone supports the ietf.org proposal in the way they support the geo vocabulary.
The way to think about this is that ietf geo could become a well-known URN.
The reason to have a standard number of digits is not as a measure of precision, but to make the units, strings and urns consistent for both the LOD and Google.
What I would like is a standard efficient way to markup potentially billions of occurrences, that can be correctly interpreted and mapped using current and future LOD tools.
Note that your use of geo is not standard DarwinCore. What is the official word from the DarwinCore Illuminati on the use of geo?
I am sympathetic to the need for some measure of radius, pointSpatialFit, coordinateUncertaintyInMeters etc. but adoption of these has been poor.
I think the "radius" and area form that I am proposing is easy for providers to understand and for tools to interpret.
It probably maps to dwc:coordinateUncertaintyInMeters.
Here is a modification of your earlier location example using a TDWG BioBlitz record. Note how I replace some standard vocabulary terms with URIs.
The terms prefixed with txn could be incorporated into Darwin Core.
<!-- The who, what, where, when and how of the observation, as efficient as I can make it but with a literal for scientific name (label) -->
<dwc:Occurrence about="http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1607">
  <rdfs:label>OBS: Branta canadensis</rdfs:label>
  <dwc:Area resource="geo:41.53000000,-70.67000000;u=100"/>
  <txn:occurrenceHasSpeciesConcept rdf:resource="http://lod.taxonconcept.org/ses/SeecQ#Species"/>
  <txn:hasCollector rdf:resource="http://lod.taxonconcept.org/people/tdwg2010bioblitz#Dmitry_Mozzherin"/>
  <txn:occurrenceHasIndividual rdf:resource="http://www.cs.umbc.edu/~jsachs/individuals/tdwg2010bioblitz_1607"/>
  <foaf:depiction rdf:resource="http://farm5.static.flickr.com/4126/5037315500_4c555f742a_b.jpg"/>
  <txn:basisOfRecord rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl#BasisOfRecord_StillImage"/>
  <dcterms:date>2010-09-29</dcterms:date>
</dwc:Occurrence>
<!-- The area is a Location and the georeference method is included as metadata. wdrs is used to link the area to the RDF. -->
<!-- I will change the Area ontology so that it is a subclass of dcterms:Location -->
<!-- If others make statements about this particular area, their RDF's wdrs will create the provenance links -->
<dwc:Area about="geo:41.53000000,-70.67000000;u=100">
  <rdf:type resource="http://purl.org/dc/terms/Location"/>
  <dwc:georeferenceMethod resource="http://rs.tdwg.org/dwc/terms/index.htm#GeoMethod_GoogleMaps"/>
  <dwc_area:areaWithInFeature rdf:resource="http://sws.geonames.org/4929772/"/>
  <wdrs:describedby rdf:resource="http://my_organization.com/occurrence/123.rdf"/>
</dwc:Area>
<!-- Like the occurrence the individual is only one thing but different people make assertions about what that thing is -->
<dwc:Individual about="http://www.cs.umbc.edu/~jsachs/individuals/tdwg2010bioblitz_1607">
  <rdfs:label>IND_1607: Branta canadensis</rdfs:label>
  <txn:occurrenceHasSpeciesConcept rdf:resource="http://lod.taxonconcept.org/ses/SeecQ#Species"/>
  <txn:hasCollector rdf:resource="http://lod.taxonconcept.org/people/tdwg2010bioblitz#Dmitry_Mozzherin"/>
  <txn:individualHasOccurrence rdf:resource="http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1607"/>
  <foaf:depiction rdf:resource="http://farm5.static.flickr.com/4126/5037315500_4c555f742a_b.jpg"/>
  <!-- Identification history should be part of the documentation of the individual -->
  <txn:individualHasCurrrentIdentificationAssertion rdf:resource="http://www.cs.umbc.edu/~jsachs/identifications/tdwg2010bioblitz_1607_id_2"/>
  <txn:individualHasPreviousIdentificationAssertion rdf:resource="http://www.cs.umbc.edu/~jsachs/identifications/tdwg2010bioblitz_1607_id_1"/>
</dwc:Individual>
Respectfully,
- Pete
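Pete's remark that the "u" radius probably maps to dwc:coordinateUncertaintyInMeters can be made concrete with a short conversion routine. What follows is a rough sketch under that assumption only: the function name geo_uri_to_dwc is hypothetical, the mapping of "u" to the Darwin Core term is Pete's suggestion rather than anything either standard states, and only WGS84 geo URIs without altitude are handled.

```python
def geo_uri_to_dwc(uri):
    """Turn a WGS84 geo URI into a dict keyed by Darwin Core term names.

    Assumes the 'u' parameter is a radius in meters and treats it as
    dwc:coordinateUncertaintyInMeters (an assumption, see the note above).
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "geo":
        raise ValueError("not a geo URI: %r" % uri)

    coord_part, *param_parts = rest.split(";")
    coords = coord_part.split(",")
    if len(coords) != 2:
        raise ValueError("this sketch only handles lat,lon: %r" % uri)

    params = {}
    for part in param_parts:
        name, _, value = part.partition("=")
        params[name.lower()] = value

    if params.get("crs", "wgs84").lower() != "wgs84":
        raise ValueError("this sketch only handles WGS84")

    record = {
        "dwc:decimalLatitude": float(coords[0]),   # latitude comes first
        "dwc:decimalLongitude": float(coords[1]),
        "dwc:geodeticDatum": "WGS84",
    }
    if "u" in params:
        record["dwc:coordinateUncertaintyInMeters"] = float(params["u"])
    return record
```

Calling geo_uri_to_dwc("geo:41.53000000,-70.67000000;u=100") yields the latitude, longitude, datum and a 100 meter uncertainty as separate properties. It does not recover how the radius was chosen, which is the provenance gap raised earlier in the thread.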
On Sun, Feb 27, 2011 at 6:55 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
[...]
Comments/questions inline
Peter DeVries wrote:
Hi Steve,
This is mainly an advantage if everyone supports the ietf.org proposal in the way they support the geo vocabulary.
Well, up to this point, I've seen people using the geo: vocabulary as it was described in http://www.w3.org/2003/01/geo/ , i.e. as an RDF predicate to define string literal property elements. The use as you've described it is new to me.
The way to think about this is that ietf geo could become a well-known URN.
Speaking completely out of ignorance, are there any LOD applications that know how to interpret URIs that aren't HTTP URIs? I mean, if you have RDF that has something like geo:41.53000000,-70.67000000 as an object, is there any application that "understands" it? I learned in the last few months that it's "legal" to have an LSID as an object in an RDF triple, but that doesn't mean that any application will know what to do with it.
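Steve's question can be made concrete by looking at what a generic URI parser sees. The sketch below uses only Python's standard urllib.parse; the helper name describe_uri and the "dereferenceable_over_http" flag are made up for this illustration. It shows that a geo URI has a scheme but no network location, so a client that only knows how to follow http(s) links has nothing to fetch and must interpret the remaining string itself.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Report how a generic URI parser splits the given URI."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "netloc": parts.netloc,  # empty for geo: and urn: URIs
        "path": parts.path,      # everything after the scheme, left opaque
        "dereferenceable_over_http": parts.scheme in ("http", "https"),
    }


geo_info = describe_uri("geo:41.53000000,-70.67000000;u=100")
http_info = describe_uri("http://my_organization.com/location/123")
```

The same is true of "urn:lsid:my_organization.com:location:123": the URI is legal as an RDF object, but any meaning beyond identity has to come from scheme-specific code in the client.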
......
Note that your use of geo is not standard DarwinCore. What is the official word from the DarwinCore Illuminati on the use of geo?
Well, there was a proposal on the table for including geo:lat, geo:long in the Darwin Core standard. As far as I know, there hasn't been any movement on that proposal (haven't checked recently). But I really don't feel compelled to use Darwin Core exclusively in RDF that I write. If a vocabulary like FOAF is more "well-known" for describing people, I don't think there is any reason not to use it. I think at this point, geo:lat is more well-known than dwc:decimalLatitude (plus specifying the dwc:geodeticDatum isn't required with geo: since WGS84 is assumed).
Thanks for the comments, Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A. delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235 office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
--
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 TaxonConcept Knowledge Base http://www.taxonconcept.org/ / GeoSpecies Knowledge Base http://lod.geospecies.org/ About the GeoSpecies Knowledge Base http://about.geospecies.org/
On 27/02/2011, at 6:01 AM, Bob Morris wrote:
"Note: The number of digits of the values in <coordinates> MUST NOT be interpreted as an indication to the level of uncertainty." The section following is also interesting, albeit irrelevant for your procedure. It implies that when uncertainty is omitted (and therefore unknown), then "geo:41.53000000,-70.67000000" and "geo:41.53,-70.67" identify the same geo resource.
This follows from the definition of xs:float, which RDF (and presumably geo) borrow. These two strings are each representations of the same IEEE floating-point value. Something with a precision is a different datatype.
It seems to me that georeferencing needs a notion of density - some way to express a location as a value at a point and a standard deviation (we assume a normal distribution). A rectangle becomes an integral of these densities at each point - standard deviation of zero corresponding to the usual sharply-defined rectangle. You'd probably also want to supply a cutoff value.
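One possible reading of this density idea, sketched in code: treat the recorded point as the mean of an isotropic normal distribution and integrate that density over a rectangle to get the probability that the true location lies inside it. Everything here is an assumption made for illustration: the function names, the use of local planar coordinates in meters, and the choice of an independent normal error in x and y.

```python
import math


def _normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def prob_in_rectangle(mean_x, mean_y, sd, x_min, x_max, y_min, y_max):
    """Probability mass of an isotropic normal density inside a rectangle.

    Coordinates and sd are in the same planar unit (for example meters).
    With sd == 0 the point is exact, so the result is 1.0 or 0.0, which
    corresponds to the usual sharply defined rectangle test.
    """
    if sd == 0:
        inside = x_min <= mean_x <= x_max and y_min <= mean_y <= y_max
        return 1.0 if inside else 0.0
    # x and y errors are independent, so the 2-D integral factorizes.
    p_x = _normal_cdf(x_max, mean_x, sd) - _normal_cdf(x_min, mean_x, sd)
    p_y = _normal_cdf(y_max, mean_y, sd) - _normal_cdf(y_min, mean_y, sd)
    return p_x * p_y
```

A cutoff value of the kind mentioned above could then be applied to the returned probability, for example accepting "observed inside this rectangle" only when it exceeds some chosen threshold.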
_______________________________________________
If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments.
Please consider the environment before printing this email.
If you are going to go that far, then you might as well not use the naive normal distribution for the location. See the following two articles if you would like to understand why:
Guo Q., Y. Liu, and J. Wieczorek. 2008. Georeferencing locality descriptions and computing associated uncertainty using a probabilistic approach. International Journal of Geographical Information Science, Vol. 22, No. 10., pp. 1067-1090.
Liu Y., Q.H. Guo, J. Wieczorek, and M.J. Goodchild. 2009. Positioning localities based on spatial assertions. International Journal of Geographical Information Science, Vol. 23, No. 11., pp. 1471-1501.
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
How many times have I said that *standardizing on the number of significant digits does not say anything about the level of uncertainty*?
Again, a rough approximation of the uncertainty is given by the *radius*.
Why do you assume that the measure of extent or accuracy follows a normal distribution?
There is also *no* assumption in *geo* or in the ietf.org proposal that these are typed as a float. They are a string of characters.
The point of standardizing on the number of digits is to standardize the resulting geo *URI* as a string.
According to the ietf proposal (if it is adopted), "geo:41.53000000,-70.67000000" and "geo:41.53,-70.67" identify the same resource, but they are not the same URI string and will not be seen as the same URI in a triple/quadstore.
This means that you will not be able to browse between occurrences associated with the same GPS reading, or search for them via Google.
"geo:41.53000000,-70.67000000" and "geo:41.53000000,-70.67000000" will be interpreted as the same URI by triplestores, the same string by Google, and the same location by software that correctly interprets the ietf.org standard.
Rather than having all the providers round to different numbers of digits, standardizing on one makes the resulting data more comparable.
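A minimal sketch of that standardization, assuming eight decimal places as in the example strings above. The function name `normalize_geo_uri` is hypothetical; it only pads or rounds the coordinate strings and passes any parameters (such as `;u=100`) through unchanged.

```python
from decimal import Decimal

def normalize_geo_uri(uri, places=8):
    """Rewrite the coordinates of a geo URI with a fixed number of decimals,
    so that equal readings always produce the identical string."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "geo":
        raise ValueError("not a geo URI: " + uri)
    # Everything after the first ';' is parameters (e.g. u=100); keep as-is.
    coords, sep, params = rest.partition(";")
    fixed = [format(Decimal(part), f".{places}f") for part in coords.split(",")]
    return "geo:" + ",".join(fixed) + sep + params

print(normalize_geo_uri("geo:41.53,-70.67;u=100"))
```

The padded zeros carry no claim about uncertainty; that stays in the `u` parameter. The only purpose is that a triplestore or a text search sees one string per reading.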
- Pete
On 02/03/2011, at 2:55 PM, Peter DeVries wrote:
According to the ietf proposal (if it is adopted), "geo:41.53000000,-70.67000000" and "geo:41.53,-70.67" identify the same resource, but they are not the same URI string and will not be seen as the same URI in a triple/quadstore.
Indeed, unless that triple store specifically implements an implied rule that all the different ways of writing a given georeference are "same-as". We can write an OWL rule that specifically handles some particular coordinate (by crafting a regular expression that matches that coordinate), but not one that handles any location.
Time and date have a similar problem. This is managed by treating date and time as primitive values with "facets". The specification describes the various lexical forms and how to deal with them, but that specification must be implemented in code.
It seems to me that this would be the way to go with georeferences and polygons. They should be OWL data values (with a datatype), not OWL resources. Things that don't know about particular datatypes treat their various lexical representations as opaque. But looking into it a bit further, OWL datatypes are limited in expressivity because the restriction value of a datatype restriction must be a literal. There is no easy way to say "a thing is near Sydney if the polygon that's 100km larger than its polygon intersects Sydney's polygon".
Oh well.
Perhaps the real problem is that we are trying to do too much with vocabulary. OWL and RDF are not really built for performing any sort of numerical calculation - there is no add or subtract in the OWL language at all. But if you are serving up SPARQL, then the engine could do some magic. For instance, we could parameterise georeference queries as annotation properties like so:
* declare distanceFromSydney as a subproperty of geo:calculatedDistanceFrom
* add annotations to the distanceFromSydney property:
  * geo:useLocationOf #Sydney
An engine that does geo can now treat everything with a location as also having a property "distanceFromSydney". You search on this with the existing tools for defining classes with numerical values within certain ranges.
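What such an engine would compute behind the scenes can be sketched roughly as follows. Everything here is an assumption for illustration: the great-circle (haversine) formula on a spherical Earth, the approximate coordinates for Sydney, and the hypothetical helper names `distance_from` and `within_km`. It is not how any particular SPARQL engine works.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

SYDNEY = (-33.8688, 151.2093)  # approximate, for illustration only

def distance_from(reference, occurrences):
    """Give every located thing a computed 'distance from reference' value."""
    ref_lat, ref_lon = reference
    return {name: haversine_km(ref_lat, ref_lon, lat, lon)
            for name, (lat, lon) in occurrences.items()}

def within_km(reference, occurrences, max_km):
    """The range restriction: names of things no farther than max_km."""
    distances = distance_from(reference, occurrences)
    return sorted(name for name, d in distances.items() if d <= max_km)
```

The computed value plays the role of the "distanceFromSydney" property; the numerical-range class definition then becomes an ordinary comparison on it.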
participants (5)
- Bob Morris
- John Wieczorek
- Paul Murray
- Peter DeVries
- Steve Baskauf