The geolocations in the *stans apparently have a systematic sign error in the longitude. The one on the Turkish coast is more mysterious. If the actual observer and observation time were available, the strategy for quality control on geolocation outliers would start by correlating the locations of all of that day's observations by the observer whose data assert geo:lat 38.47, geo:long 27.09. For example, it's fairly unlikely that the observer could get from Massachusetts to Turkey in a few hours. OTOH, if it's a solitary observation from that observer, it might really be a Turkish observation.
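A rough Python sketch of the two checks described above. Everything here is an assumption for illustration: the `Observation` record (observer, timestamp, lat/long), the 1000 km/h speed threshold, and any bounding box you pass in are hypothetical, since the posted bioblitz data may not carry observer or time fields. The first check compares consecutive same-day observations by one observer and flags moves that imply impossible travel speeds; the second flags points that fall outside the expected region but land inside it once the longitude's sign is flipped, which is the pattern the *stans points suggest.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Observation:
    observer: str   # hypothetical field: who made the observation
    time: datetime  # hypothetical field: when it was made
    lat: float      # decimal degrees, as in geo:lat
    lon: float      # decimal degrees, as in geo:long


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def implausible_moves(observations, max_kmh=1000.0):
    """Flag consecutive same-day observations by one observer whose
    implied travel speed exceeds max_kmh (a hypothetical threshold)."""
    by_observer_day = defaultdict(list)
    for obs in observations:
        by_observer_day[(obs.observer, obs.time.date())].append(obs)
    flagged = []
    for day_obs in by_observer_day.values():
        day_obs.sort(key=lambda o: o.time)
        for prev, cur in zip(day_obs, day_obs[1:]):
            km = haversine_km(prev.lat, prev.lon, cur.lat, cur.lon)
            hours = (cur.time - prev.time).total_seconds() / 3600
            # Two different places at the same instant count as infinite speed.
            speed = km / hours if hours > 0 else (math.inf if km > 0 else 0.0)
            if speed > max_kmh:
                flagged.append((prev, cur, km, speed))
    return flagged


def longitude_sign_flip_suspect(lat, lon, bbox):
    """True if (lat, lon) lies outside bbox but (lat, -lon) lies inside it.
    bbox is (south, west, north, east) in decimal degrees."""
    south, west, north, east = bbox

    def inside(la, lo):
        return south <= la <= north and west <= lo <= east

    return not inside(lat, lon) and inside(lat, -lon)
```

For example, a Massachusetts record at 41.53, -70.67 followed three hours later by the same observer's record at 38.47, 27.09 would be flagged by the speed check, while a point at 41.53, 70.67 would be flagged by `longitude_sign_flip_suspect` given a hypothetical Cape Cod box such as (41.0, -71.5, 42.2, -69.8). A solitary observation has no neighbours, so the speed check says nothing about it, which matches the caveat above.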
Bob
On Thu, Jan 13, 2011 at 8:55 PM, Peter DeVries pete.devries@gmail.com wrote:
... Also, the geoquery works, but it seems to return some strange locations: http://bit.ly/dYQXUp
Respectfully,
- Pete
On Thu, Jan 13, 2011 at 6:53 AM, joel sachs jsachs@csee.umbc.edu wrote:
Pete, Thanks - I corrected the geo properties. Joel.
On Wed, 12 Jan 2011, Peter DeVries wrote:
Hi Joel,
Cool :-)
I just loaded this into my SPARQL endpoint.
In the named graph urn:org:linkedopenspeciesdata:dataspace:tdwg2010bioblitz
It consists of 19,990 triples.
Here is one of the dwc:taxonConceptID entries.
About: http://spire.umbc.edu/ethan/Ampelopsis_brevipedunculata
http://lsd.taxonconcept.org/describe/?url=http://spire.umbc.edu/ethan/Ampelopsis_brevipedunculata
About: http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1627
http://lsd.taxonconcept.org/describe/?url=http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1627
This should give you a count of occurrences.
SELECT (COUNT(*) AS ?n) WHERE { ?s a <http://rs.tdwg.org/dwc/terms/#Occurrence> }
= 1882
SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s <http://rs.tdwg.org/dwc/terms/#taxonConceptID> ?o }
This should give you a list of occurrences
http://lsd.taxonconcept.org/describe/?url=http://rs.tdwg.org/dwc/terms/%23Oc...
If this link did not come through your email system, try the bit.ly link.
I tried the following query, which should have given me a Google map of all the occurrences, but it did not produce the map.
DESCRIBE ?x WHERE { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rs.tdwg.org/dwc/terms/#Occurrence> . }
I looked at the RDF and I think I see the problem.
In the RDF,
<geo:latitude>41.53</geo:latitude>
<geo:longitude>-70.67</geo:longitude>
should be
<geo:lat>41.53</geo:lat>
<geo:long>-70.67</geo:long>
See http://www.w3.org/2003/01/geo/
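If regenerating the RDF is inconvenient, the property names could also be patched after the fact. A minimal Python sketch using the standard library's ElementTree, assuming the `geo:` prefix is bound to the W3C Basic Geo namespace (`http://www.w3.org/2003/01/geo/wgs84_pos#`) and that the file is plain RDF/XML. Note that ElementTree drops XML comments and rewrites namespace declarations on output, so fixing the generating script is probably the cleaner long-term route.

```python
import xml.etree.ElementTree as ET

# Assumption: the geo: prefix in the posted RDF is bound to the W3C Basic Geo namespace.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Clark-notation ({namespace}local) element names to rename.
RENAMES = {
    f"{{{GEO}}}latitude": f"{{{GEO}}}lat",
    f"{{{GEO}}}longitude": f"{{{GEO}}}long",
}


def fix_geo_properties(rdf_xml: str) -> str:
    """Rewrite geo:latitude/geo:longitude elements as geo:lat/geo:long."""
    # Keep familiar prefixes on output instead of ns0, ns1, ...
    ET.register_namespace("geo", GEO)
    ET.register_namespace("rdf", RDF)
    root = ET.fromstring(rdf_xml)
    for elem in root.iter():
        if elem.tag in RENAMES:
            elem.tag = RENAMES[elem.tag]
    return ET.tostring(root, encoding="unicode")
```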
I ran the following query to get a list of all the dwc:taxonConceptIDs and have attached the results as a .txt file.
SELECT DISTINCT ?o WHERE { ?s <http://rs.tdwg.org/dwc/terms/#taxonConceptID> ?o }
Pretty neat :-)
There are some things that I will get back to Joel on.
Here is where you can manually enter a SPARQL query. Click on "Advanced" for the entry window.
http://lsd.taxonconcept.org/isparql/
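Besides the iSPARQL page, the same queries can be sent over the standard SPARQL 1.1 Protocol. A minimal Python sketch using only the standard library; the endpoint address `http://lsd.taxonconcept.org/sparql` is a hypothetical placeholder (the thread gives only the iSPARQL page), and the named graph is passed with the protocol's `default-graph-uri` parameter. The count query is written in SPARQL 1.1 form, `(COUNT(*) AS ?n)`.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical endpoint address; substitute the real SPARQL endpoint.
ENDPOINT = "http://lsd.taxonconcept.org/sparql"
GRAPH = "urn:org:linkedopenspeciesdata:dataspace:tdwg2010bioblitz"

COUNT_OCCURRENCES = (
    "SELECT (COUNT(*) AS ?n) "
    "WHERE { ?s a <http://rs.tdwg.org/dwc/terms/#Occurrence> }"
)


def build_query_url(endpoint: str, query: str, default_graph: Optional[str] = None) -> str:
    """Encode a query as a SPARQL 1.1 Protocol GET request URL."""
    params = {"query": query}
    if default_graph:
        params["default-graph-uri"] = default_graph
    return endpoint + "?" + urllib.parse.urlencode(params)


def first_value(results: dict, var: str) -> Optional[str]:
    """Value of `var` in the first row of a SPARQL JSON result, or None."""
    rows = results["results"]["bindings"]
    return rows[0][var]["value"] if rows and var in rows[0] else None


def run_query(endpoint: str, query: str, default_graph: Optional[str] = None) -> dict:
    """Send the query and parse an application/sparql-results+json response."""
    request = urllib.request.Request(
        build_query_url(endpoint, query, default_graph),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (needs network access and a live endpoint):
# print(first_value(run_query(ENDPOINT, COUNT_OCCURRENCES, GRAPH), "n"))
```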
Respectfully,
- Pete
On Wed, Jan 12, 2011 at 5:55 PM, joel sachs jsachs@csee.umbc.edu wrote:
Hi Everyone,
I've posted RDF of the bioblitz data. It's at http://www.cs.umbc.edu/~jsachs/occurrences/TechnoBioblitzOccurrences.rdf .
Individual occurrences can be retrieved via http://www.cs.umbc.edu/~jsachs/occurrences/[occurrence_id], e.g. http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1835
Individual identifications can be retrieved via http://www.cs.umbc.edu/~jsachs/identifications/[identification_id], e.g. http://www.cs.umbc.edu/~jsachs/identifications/tdwg2010bioblitz_1835_id_1
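The two retrieval patterns above can be captured in a few helper functions. This Python sketch assumes, from the single example given, that an identification ID is the occurrence ID followed by `_id_<n>`; that rule is inferred rather than documented, and the function names are hypothetical.

```python
# Base path taken from the example URLs above.
BASE = "http://www.cs.umbc.edu/~jsachs"


def occurrence_uri(occurrence_id: str) -> str:
    """URI at which a single occurrence can be retrieved."""
    return f"{BASE}/occurrences/{occurrence_id}"


def identification_uri(occurrence_id: str, n: int) -> str:
    """URI of the n-th identification of an occurrence.
    Assumption: IDs follow the "<occurrence_id>_id_<n>" pattern seen in the example."""
    return f"{BASE}/identifications/{occurrence_id}_id_{n}"


def occurrence_id_for(identification_id: str) -> str:
    """Inverse of the assumed pattern: strip the trailing "_id_<n>"."""
    return identification_id.rsplit("_id_", 1)[0]
```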
The scripts behind this are on the kludgy side, so reports of errors and abnormalities will be warmly welcomed.
Implicit in each of the following notes is the question "Is this a good way to do it?":
- The data is "normalized" w.r.t. identification. "Normalized" is in quotes because I mean it in the sense that Steve Baskauf was using in his Fall 2010 series of posts. His meaning of the term makes sense to me, but many people (e.g. the OBO folks) take "normalized ontology" to mean "disentangled" (i.e. no multiple inheritance). As an example, here's an occurrence with two crowdsourced determinations: http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1644
- I used sequential integers for observation and identification IDs; in practice, a mechanism needs to be in place to prevent two people from assigning the same ID to their respective identifications.
- My answer to Cam Webb's Question #1 from http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001720.html is "both". In other words, just as "Joel Sachs" is both me and also my name, so http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1668 is both an occurrence and an occurrence ID, expressed as:
  <dwc:Occurrence rdf:about="http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1644">
    <dwc:occurrenceID>http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1644</dwc:occurrenceID>
    <blah blah blah/>
  </dwc:Occurrence>
- I was surprised to see that the Darwin Core Identification class has no "occurrenceID" or "specimenID" term. How is one supposed to tie an identification to an observation (assuming the identification is not in-lined, of course)? DeVries and Baskauf each mint their own terms for doing this (txn:identificationHasOccurrence and sernec:basedOnOccurrence, respectively); I used dwc:occurrenceID as if it were a record-level term.
- We had scope for multiple taxonConceptID columns in the Fusion table, and assigned LSIDs where possible. I also mean to work with Pete to assign GUIDs from taxonconcept.org. In addition, I assigned ethan taxon concept IDs, which look like this: http://spire.umbc.edu/ethan/Coffea_arabica
In their argument over opaque vs. transparent taxonConceptIDs, I was sympathetic to both Pete's and Gregor's arguments. Ultimately, if the tooling exists to always display the rdfs:labels every time I'm looking at a list of opaque IDs, then transparent IDs are unnecessary. But, for now, it's really helpful to look at an ID and know what it refers to.
(For species names not in the spire database, the RDF returned by http://spire.umbc.edu/ethan/$name is simply an rdfs:seeAlso to http://gni.globalnames.org/name_strings?search_term=$name)
- It was easy to assert membership in RDF classes corresponding to various Cape Cod categories of concern: invasive species, threatened species, indicators, etc. You can see these classes at http://spire.umbc.edu/ontologies/lists (Information about where these lists come from is included as rdfs:comments. I'll add further documentation, e.g. links to EML files.)
Note that "ThingOfConcern" is defined as the superclass of all the other classes in the collection. The idea here is that people can create their own "ThingOfConcern" class and then query for observations that are of concern to them. You can see sample SPARQL queries at http://www.csee.umbc.edu/~jsachs/occurrences/queries/sample.txt
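The "ThingOfConcern" pattern amounts to a transitive rdfs:subClassOf lookup. In SPARQL 1.1 it can be expressed with the property path `rdfs:subClassOf*`; the Python sketch below does the same over an in-memory list of triples, just to show the logic. It assumes the list classes are asserted on taxon concepts (rather than directly on occurrences) and uses hypothetical compact names such as `lists:ThingOfConcern` and `my:MyConcerns` in place of full IRIs.

```python
from collections import defaultdict

# Hypothetical compact names; the real classes live under
# http://spire.umbc.edu/ontologies/lists and would be full IRIs.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
TAXON_CONCEPT = "dwc:taxonConceptID"


def classes_under(triples, root):
    """Return root plus every class that is transitively rdfs:subClassOf root
    (the same set that ?c rdfs:subClassOf* root would match)."""
    children = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            children[o].add(s)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def occurrences_of_concern(triples, root="lists:ThingOfConcern"):
    """Occurrences whose taxon concept is typed with any class under root."""
    classes = classes_under(triples, root)
    taxa = {s for s, p, o in triples if p == RDF_TYPE and o in classes}
    return sorted({s for s, p, o in triples if p == TAXON_CONCEPT and o in taxa})
```

A user's own root class (here the hypothetical `my:MyConcerns`) works the same way: declare the lists you care about as its subclasses and pass it as `root`.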
As an aside, I think we, as a community, should come up with a biodiversity benchmark suite of RDF data and corresponding SPARQL queries that can be used to test the suitability and scalability of semantic web knowledge bases. I'll take this up in a future post (unless someone beats me to it).
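Returning to the earlier note about sequential integer IDs: one common way to let independent contributors mint identifiers without colliding is to use UUIDs, which need no central counter. A minimal Python sketch; the base URI is a hypothetical placeholder, and the choice between random (version 4) and name-based (version 5) UUIDs is a design assumption, not something settled in this thread.

```python
import uuid

# Hypothetical base: any namespace the minting party controls would work.
ID_BASE = "http://www.cs.umbc.edu/~jsachs/identifications/"


def mint_random_id(base: str = ID_BASE) -> str:
    """New identifier from a random (version 4) UUID; no coordination needed."""
    return base + str(uuid.uuid4())


def mint_stable_id(source_key: str, base: str = ID_BASE) -> str:
    """Name-based (version 5) UUID: re-running a script on the same
    source_key reproduces the same identifier."""
    return base + str(uuid.uuid5(uuid.NAMESPACE_URL, source_key))
```

Random UUIDs suit identifications minted by different people at different times; name-based UUIDs suit conversion scripts that are re-run over the same source rows (for example, a hypothetical Fusion table row key) and should produce the same IDs each time.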
Comments, questions, and better ideas are welcome.
Thanks - Joel.
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 TaxonConcept Knowledge Base http://www.taxonconcept.org/ / GeoSpecies Knowledge Base http://lod.geospecies.org/ About the GeoSpecies Knowledge Base http://about.geospecies.org/