[tdwg-content] Bioblitz as rdf: does this make sense?

joel sachs jsachs at csee.umbc.edu
Thu Jan 13 13:53:45 CET 2011


Pete,
Thanks - I corrected the geo properties.
Joel.



On Wed, 12 Jan 2011, Peter DeVries wrote:

> Hi Joel,
>
> Cool :-)
>
> I just loaded this into my SPARQL endpoint.
>
> In the named graph urn:org:linkedopenspeciesdata:dataspace:tdwg2010bioblitz
>
> It consists of 19,990 Triples
>
> Here is one of the dwc:taxonConceptID entries.
>
> *About: http://spire.umbc.edu/ethan/Ampelopsis_brevipedunculata*
> http://lsd.taxonconcept.org/describe/?url=http://spire.umbc.edu/ethan/Ampelopsis_brevipedunculata
>
> *About: http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1627*
> http://lsd.taxonconcept.org/describe/?url=http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1627
>
> This should give you a count of occurrences.
>
> SELECT count(*) WHERE {?s a <http://rs.tdwg.org/dwc/terms/#Occurrence>};
>
> = 1882
>
> SELECT count(*) WHERE {?s a <http://rs.tdwg.org/dwc/terms/#taxonConceptID>};
>
> This should give you a list of occurrences
>
> http://lsd.taxonconcept.org/describe/?url=http://rs.tdwg.org/dwc/terms/%23Occurrence
>
> If this did not come through your email system try the bit.ly.
>
> http://bit.ly/g9BcoL
>
> I tried the following that should have given me a google map of all the
> occurrences but it did not result in the map.
>
> DESCRIBE ?x WHERE {
>  ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rs.tdwg.org/dwc/terms/#Occurrence>.
> }
>
> I looked at the RDF and I think I see the problem.
>
> In the RDF
>
> <geo:latitude>
> 41.53
> </geo:latitude>
>
> <geo:longitude>
> -70.67
> </geo:longitude>
>
> Should be
>
> <geo:lat>
> 41.53
> </geo:lat>
>
> <geo:long>
> -70.67
> </geo:long>
>
> See http://www.w3.org/2003/01/geo/
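
A minimal sketch of that rename, for anyone applying the same fix to their own file. It assumes the geo prefix is bound to the W3C Basic Geo namespace (http://www.w3.org/2003/01/geo/wgs84_pos#); the sample document and the function name fix_geo_properties are made up for illustration and are not the scripts behind the bioblitz data.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the geo prefix in the file is bound to the W3C Basic Geo namespace.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RENAMES = {"latitude": "lat", "longitude": "long"}


def fix_geo_properties(rdf_xml: str) -> str:
    """Rename geo:latitude / geo:longitude elements to geo:lat / geo:long."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("geo", GEO)
    root = ET.fromstring(rdf_xml)
    for elem in root.iter():
        for wrong, right in RENAMES.items():
            if elem.tag == f"{{{GEO}}}{wrong}":
                elem.tag = f"{{{GEO}}}{right}"
                # Also trim the whitespace that surrounds the literal value.
                if elem.text:
                    elem.text = elem.text.strip()
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample document, not taken from the posted RDF file.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <rdf:Description rdf:about="http://example.org/occurrence/1">
    <geo:latitude>
41.53
</geo:latitude>
    <geo:longitude>
-70.67
</geo:longitude>
  </rdf:Description>
</rdf:RDF>"""

fixed = fix_geo_properties(SAMPLE)
print(fixed)
```
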
>
> I did the following query to get a list of all the dwc:taxonConceptID's and
> have attached them as a .txt file.
>
> select distinct ?o WHERE {?s <http://rs.tdwg.org/dwc/terms/#taxonConceptID> ?o}
>
> Pretty neat :-)
>
> There are some things that I will get back to Joel on.
>
> Here is where you can manually enter a SPARQL query. Click on "Advanced" for
> the entry window.
>
> http://lsd.taxonconcept.org/isparql/
>
> Respectfully,
>
> - Pete
>
>
>
> On Wed, Jan 12, 2011 at 5:55 PM, joel sachs <jsachs at csee.umbc.edu> wrote:
>
>> Hi Everyone,
>>
>> I've posted rdf of the bioblitz data. It's at
>> http://www.cs.umbc.edu/~jsachs/occurrences/TechnoBioblitzOccurrences.rdf .
>>
>> Individual occurrences can be retrieved via
>> http://www.cs.umbc.edu/~jsachs/occurrences/[occurrence_id]
>> e.g. http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1835
>>
>> Individual identifications can be retrieved via
>> http://www.cs.umbc.edu/~jsachs/identifications/[identification_id]
>> e.g.
>> http://www.cs.umbc.edu/~jsachs/identifications/tdwg2010bioblitz_1835_id_1
>>
>> The scripts behind this are on the kludgy side, so reports of errors and
>> abnormalities will be warmly welcomed.
>>
>> Implicit in each of the following notes is the question "Is this a good
>> way to do it?":
>>
>> 1. The data is "normalized" w.r.t. identification. "Normalized" is in
>> quotes because I mean it in the sense that Steve Baskauf was using in his
>> Fall 2010 series of posts. His meaning of the term makes sense to me, but
>> many people (e.g. the OBO folks), take "normalized ontology" to mean
>> "disentangled" (i.e. no multiple inheritance.)
>> As an example, here's an occurrence with two crowdsourced determinations:
>> http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1644
>>
>> 2. I used sequential integers for observation and identification IDs; in
>> practice, a mechanism needs to be in place to prevent two people from
>> assigning the same id to their respective identifications.
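
One possible mechanism, sketched under assumptions: replace the sequential integer with a random UUID, so that two people can mint identification IDs for the same occurrence without coordinating. The function names are hypothetical and the ID pattern only mirrors the tdwg2010bioblitz_1835_id_1 example; the posted data does not work this way.

```python
import uuid

# Base URL taken from the identification examples in this thread.
BASE = "http://www.cs.umbc.edu/~jsachs/identifications/"


def mint_identification_id(occurrence_id: str) -> str:
    """Return an identification ID with a random suffix instead of a counter.

    Hypothetical scheme: <occurrence_id>_id_<32 hex characters>.
    """
    return f"{occurrence_id}_id_{uuid.uuid4().hex}"


def identification_uri(identification_id: str) -> str:
    """Build the retrieval URL for an identification ID."""
    return BASE + identification_id


first = mint_identification_id("tdwg2010bioblitz_1835")
second = mint_identification_id("tdwg2010bioblitz_1835")
print(identification_uri(first))
print(identification_uri(second))
```

The trade-off is that the IDs stop being short and guessable. A central counter service would keep them readable, at the cost of a shared point of coordination.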
>>
>> 3. My answer to Cam Webb's Question #1 from
>> http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001720.html
>> is "both". In other words, just as "Joel Sachs" is both me and also my
>> name, so
>> http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1668 is both
>> an occurrence and an occurrence_id, expressed as:
>> ---
>> <dwc:Occurrence
>> rdf:about="
>> http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1644">
>> <dwc:occurrenceID>
>> http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1644
>> </dwc:occurrenceID>
>> <blah blah blah/>
>> </dwc:Occurrence>
>> ---
>>
>> 4. I was surprised to see that the Darwin Core Identification class has no
>> "occurrenceID" or "specimenID" term. How is one supposed to tie an
>> identification to an observation (assuming the identification is not
>> in-lined, of course)? DeVries and Baskauf each mint their own terms for
>> doing this (txn:identificationHasOccurrence, and sernec:basedOnOccurrence,
>> respectively); I used dwc:occurrenceID as if it were a record level term.
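
A rough sketch of what a stand-alone identification record looks like under that choice, with dwc:occurrenceID pointing back at the occurrence. The dwc namespace URI is copied from the queries earlier in this thread, the occurrenceID value is written as a literal as in the example under note 3, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespace as it appears in the SPARQL queries in this thread (assumption
# that the RDF file binds the dwc prefix the same way).
DWC = "http://rs.tdwg.org/dwc/terms/#"


def identification_xml(identification_uri: str, occurrence_uri: str) -> str:
    """Serialize a dwc:Identification that refers to its occurrence by ID."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("dwc", DWC)
    root = ET.Element(f"{{{RDF}}}RDF")
    ident = ET.SubElement(
        root,
        f"{{{DWC}}}Identification",
        {f"{{{RDF}}}about": identification_uri},
    )
    # dwc:occurrenceID is used here as if it were a record-level term.
    occ = ET.SubElement(ident, f"{{{DWC}}}occurrenceID")
    occ.text = occurrence_uri
    return ET.tostring(root, encoding="unicode")


xml_text = identification_xml(
    "http://www.cs.umbc.edu/~jsachs/identifications/tdwg2010bioblitz_1835_id_1",
    "http://www.cs.umbc.edu/~jsachs/occurrences/tdwg2010bioblitz_1835",
)
print(xml_text)
```
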
>>
>> 5. We had scope for multiple taxonConceptID columns in the Fusion table,
>> and assigned lsids where possible. I also mean to work with Pete to assign
>> GUIDs from taxonconcept.org. In addition, I assigned ethan taxon concept
>> ids, which look like this:
>> http://spire.umbc.edu/ethan/Coffea_arabica
>>
>> In their argument over opaque vs. transparent taxonConceptIDs, I was
>> sympathetic to both Pete's and Gregor's arguments. Ultimately, if the
>> tooling exists to always display the rdfs:labels every time I'm looking
>> at a list of opaqueIDs, then transparent IDs are unnecessary. But, for
>> now, it's really helpful to look at an ID and know what it's referring to.
>>
>> (For species names not in the spire database, the rdf returned by
>> http://spire.umbc.edu/ethan/$name
>> is simply an rdfs:seeAlso to
>> http://gni.globalnames.org/name_strings?search_term=$name)
>>
>> 6. It was easy to assert membership in RDF classes corresponding to
>> various Cape Cod categories of concern - invasive species, threatened
>> species, indicators, etc. You can see these classes at
>> http://spire.umbc.edu/ontologies/lists (Information on where these lists
>> come from is included as rdfs:comments. I'll add further documentation,
>> e.g. links to eml files.)
>>
>> Note that "ThingOfConcern" is defined as the superclass of all the other
>> classes in the collection. The idea here is that people can create their
>> own "ThingOfConcern" class, and then query for observations that are of
>> concern to them. You can see sample sparql queries at
>> http://www.csee.umbc.edu/~jsachs/occurrences/queries/sample.txt
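
The superclass idea can be sketched without a triple store: follow the subclass links upward from an observation's class and keep the observation if "ThingOfConcern" is reached. Only "ThingOfConcern" comes from the lists ontology; the other class names, the observation IDs, and the function names below are hypothetical.

```python
# Hypothetical subclass links: class name -> set of direct superclasses.
SUBCLASS_OF = {
    "InvasiveSpecies": {"ThingOfConcern"},
    "ThreatenedSpecies": {"ThingOfConcern"},
    "AquaticInvasive": {"InvasiveSpecies"},
}


def superclasses(cls, subclass_of):
    """Return every direct or indirect superclass of cls."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def observations_of_concern(observations, subclass_of, concern="ThingOfConcern"):
    """Keep the IDs of observations whose class is the concern class or below it."""
    return [
        obs_id
        for obs_id, cls in observations
        if cls == concern or concern in superclasses(cls, subclass_of)
    ]


# Hypothetical (observation ID, class) pairs.
OBSERVATIONS = [
    ("obs_1", "AquaticInvasive"),
    ("obs_2", "CommonSpecies"),
    ("obs_3", "ThreatenedSpecies"),
]

print(observations_of_concern(OBSERVATIONS, SUBCLASS_OF))
```

Someone with a different notion of concern would supply their own subclass links and reuse the same query logic.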
>>
>>
>> As an aside, I think we, as a community, should come up with a
>> biodiversity benchmark suite of rdf data and corresponding sparql queries
>> that can be used to test the suitability and scalability of semantic web
>> knowledge bases. I'll take this up in a future post (unless someone beats
>> me to it).
>>
>> Comments, questions, and better ideas are welcome.
>>
>> Thanks -
>> Joel.
>>
>>
>
>
>
> -- 
> ---------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / GeoSpecies
> Knowledge Base <http://lod.geospecies.org/>
> About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
> ------------------------------------------------------------
>

