Thanks for the information, Pete. I was very interested to try doing a SPARQL query using the uriburner interface (as I have already confessed to lack of experience with that). One thing I was curious about was how OpenLink/uriburner knew what metadata to run the query on. I was going to redo the process to see if there was an option to point to some particular triple store, but the site seems to be down at the moment. Or do they run the query on data that they have "discovered" through links on the cloud or on data that people have asked them to scrape/crawl?
As cool as the SPARQL querying thing is, I still think that I have a general issue with the approach that you are suggesting, i.e. that each "species concept" has a set of classes defined as "partOf" the general species concept class for that species. For the sake of argument, let's say that you manage to describe a species concept for each of the approximately 1.7 million described species. That means that you will have 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Image classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Occurrence classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Individual classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Taxonomy classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#NCBI_Taxonomy classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#OriginalDescription classes, and 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Population classes in addition to the 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Species classes that describe the species concept itself. That is a total of 13.6 million separate classes in your model that are needed to describe biodiversity records of life on earth. In contrast, we defined or imported a total of seven classes to do the same thing in Darwin-SW (not counting foaf:Person which is somewhat tangential to the ontology) and those seven classes should be capable of describing biodiversity records of life on earth. My point here is that the structure of the taxonconcept.org ontology seems to be designed around making queries easy (by creating a class for anything that somebody may want to ask about), but not around describing classes that reflect the structure of databases that people in the TDWG community are likely to use.
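As a quick check of that arithmetic, here is a minimal sketch in Python. The eight fragment names are the ones enumerated above; the 1.7 million figure is only the rough estimate used for the sake of argument, and the function name is made up for illustration.

```python
# Class-like URIs minted per species concept, as enumerated above
# (the fragment identifier after the '#').
TAGS_PER_SPECIES = [
    "Species", "Image", "Occurrence", "Individual", "Taxonomy",
    "NCBI_Taxonomy", "OriginalDescription", "Population",
]

# Rough estimate of described species, used only for the sake of argument.
DESCRIBED_SPECIES = 1_700_000


def total_classes(species_count, tags):
    """Number of per-species classes if every species gets every tag."""
    return species_count * len(tags)


print(total_classes(DESCRIBED_SPECIES, TAGS_PER_SPECIES))  # 13600000
```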
In contrast, simple queries would (it seems to me) be difficult to construct based on Darwin-SW, but it would be relatively easy to adapt the class structure to the primary types of things that people keep track of in databases (even "flattened" databases that only explicitly recognize fewer than the seven classes in Darwin-SW). So it's a trade-off, but it seems like it would be more productive to put the burden on the few software developers (i.e. people who would be creating clients that could search RDF databases/triple stores) than on the many data providers.
I also still do not see how you get around the problem that I mentioned in my May 1 email (http://lists.tdwg.org/pipermail/tdwg-content/2011-May/002385.html). In a nutshell, let's say that a tree in an arboretum has its HTTP URI GUID on a label nailed to its trunk. If I take a picture of that tree (recording evidence of an Occurrence) and assign that tree to a Taxon through an Identification, and somebody else collects a specimen from that tree and assigns that same tree to a different Taxon through their own Identification, how could a query on a txn: species occurrence tag ever show me both the occurrence record associated with the image and the one associated with the specimen? I am going to query for
describe ?s where { ?s a <http://lod.taxonconcept.org/ses/[myTaxon]#Occurrence> }
which will pick up the occurrence documented by my image, but it would not pick up the occurrence documented by the specimen, which would require the search
describe ?s where { ?s a <http://lod.taxonconcept.org/ses/[theOtherPersonsTaxon]#Occurrence> }
In other words, the approach that you are suggesting requires me to know in advance what other Identifications somebody else may apply to the tree and either:
type my occurrence record with those other taxa tags or
know to run a separate query for each of those taxa
Either of these involves mind-reading on my part. This is different than the way one would find this out using Darwin-SW. In Darwin-SW, one would first query for Identifications that specified [myTaxon] and then find the dsw:Individuals associated with those Identifications. Then one would look for all of the dwc:Occurrences that were associated with the dsw:Individuals. The fact that somebody else assigned the tree to a different taxon is irrelevant to me finding the occurrences of the tree. This is messy and I don't see how you could do it with SPARQL, but I don't think it would require complex programming to write software that could do it. Since the taxonconcept.org ontology also has properties to relate occurrences to individuals and individuals to identifications and taxa, one could do the same kind of complex search. But that leaves me wondering what purpose the "lightweight tags" have if they can't be used reliably to search for all of the metadata that others have put out on the cloud. They allow me to find out about things that I already know but restrict my ability to discover unknown things.
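To make the tree example concrete, here is a minimal sketch in plain Python over a small in-memory set of triples, not a query against a real triple store. All identifiers are hypothetical, and the predicate names (ex:identifies, ex:toTaxon, ex:occurrenceOf) are placeholders standing in for whatever Darwin-SW or txn properties would actually link these resources.

```python
# Hypothetical data: one tree, two Identifications to different taxa,
# and two Occurrences (one documented by an image, one by a specimen).
TRIPLES = {
    ("ident:mine", "ex:identifies", "tree:1"),
    ("ident:mine", "ex:toTaxon", "taxon:mine"),
    ("ident:other", "ex:identifies", "tree:1"),
    ("ident:other", "ex:toTaxon", "taxon:other"),
    ("occ:image", "ex:occurrenceOf", "tree:1"),
    ("occ:image", "rdf:type", "taxon:mine#Occurrence"),
    ("occ:specimen", "ex:occurrenceOf", "tree:1"),
    ("occ:specimen", "rdf:type", "taxon:other#Occurrence"),
}


def subjects(predicate, obj):
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def objects(subject, predicate):
    """All objects of the given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def occurrences_by_tag(taxon):
    """Tag approach: occurrences typed with that taxon's occurrence tag."""
    return subjects("rdf:type", taxon + "#Occurrence")


def occurrences_by_traversal(taxon):
    """Identification -> individual -> every occurrence of that individual."""
    found = set()
    for ident in subjects("ex:toTaxon", taxon):
        for individual in objects(ident, "ex:identifies"):
            found |= subjects("ex:occurrenceOf", individual)
    return found


print(sorted(occurrences_by_tag("taxon:mine")))        # ['occ:image']
print(sorted(occurrences_by_traversal("taxon:mine")))  # ['occ:image', 'occ:specimen']
```

In this toy data the tag lookup returns only the occurrence typed with my taxon, while the traversal also reaches the specimen occurrence because both hang off the same individual.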
Steve
Peter DeVries wrote:
Hi Steve,
I try to take some time to think about your notes; sorry for the delay.
There are many different contexts that can be used when thinking about species and related data.
It is often useful to separate these contexts into different kinds of related entities.
Here are some contexts that I think are useful to separate:
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
* Note that in this model a species can have several Taxonomies or classifications. This reflects the reality that the same species has one hierarchy in NCBI and a different one in CoL.
You can find all the tagged images of the Cougar by finding all those that are of the type <http://lod.taxonconcept.org/ses/mCcSp#Image>
Here is one example of an image that is tagged in this way. (From http://lod.taxonconcept.org/ses/v6n7p.html )
<foaf:Image rdf:about="http://assets.taxonconcept.org/seuuids/603bebac-cc44-4168-bbf7-b11b976f9d79/Puma_concolor_480x320.jpg">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Image"/>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <dcterms:source rdf:resource="http://commons.wikimedia.org/wiki/File:Mountain_lion.jpg"/>
  <dcterms:contributor>United States Department of Agriculture</dcterms:contributor>
  <cc:license rdf:resource="http://creativecommons.org/publicdomain/"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</foaf:Image>
You are correct in noting that an occurrence of a species could simply be typed in a similar way, and maybe that would be better than the somewhat awkward
txn:occurrenceHasSpeciesOccurrenceTag
I originally went with this name because I wanted it to be clear what the subject and object should be.
If we use this data set as an example http://ocs.taxonconcept.org/ocs/index.html (Mainly TDWG BioBlitz 2010)
We can demonstrate how this is useful for SPARQL Queries.
We can run a SPARQL describe query for all the observations of the Honey Bee with this query.
PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
describe ?s where { ?s txn:occurrenceHasSpeciesOccurrenceTag <http://lod.taxonconcept.org/ses/z9oqP#Occurrence> }
* It might be simpler to mark these observations up as having a type of <http://lod.taxonconcept.org/ses/z9oqP#Occurrence>.
In this case the query would look like this. (You can use "a" as a shortcut meaning http://www.w3.org/1999/02/22-rdf-syntax-ns#type.)
PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
describe ?s where { ?s a <http://lod.taxonconcept.org/ses/z9oqP#Occurrence> }
* I would need to redo the occurrence record RDF for this new query to work.
We can take that original query above and paste into the LOD SPARQL Endpoint http://uriburner.com/isparql/ (Advanced Tab)
Run the query
This link will run the query - it will probably not go through all email systems intact. See bit.ly link below.
Bit.ly version http://bit.ly/lM6vWB
and get a result (Not very pretty, or interpretable by humans)
We can select "Make Pivot" from the top left corner of the Window.
This will run the query and feed the data to MS Pivot which parses and displays the result.
In theory, and I hope in the future, there will be an open source solution that does this as easily and does not require MS Silverlight.
The result is a Browsable Pivot View which you can select to view the result by Observer, Location etc.
This bit.ly will take you to a view by observer (the person who made the observation) http://bit.ly/lacRb1
This bit.ly will take you to a view by dwcArea http://t.co/eu55BaG
I have bundled all these examples including screenshots into one bit.ly bundle so you won't need Silverlight to get an idea on how this works.
http://bit.ly/iXg2y8 <- Link to Bit.ly bundle with screen shots etc.
I have included closeups of the Pivot settings in the top right corner so you can see how to change the attribute that Pivot uses to create the view.
Note also that if you go to the Knowledge Base View of the Honey Bee you can browse to the observations of that species.
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%2Fses%2Fz9oqP%23Species Bit.ly Link http://bit.ly/g1zzJC
Since I have updated to the latest version of Virtuoso the strange URI links have been replaced with Human readable text from the label view for that entity.
This includes the links to occurrences, gni names strings, and links to GeoNames.
Part of the reasoning behind this structure is to make explicit to computers what context we are talking about.
The human brain makes these context switches automatically but computers do not.
That said there are areas where they could be improved or simplified.
Also I think that you will need a class for each species concept, but they are all instances of txn:SpeciesConcept - something allowed in OWL2.
My ontology has probably changed slightly since you last saw it.
Respectfully,
- Pete
On Mon, May 2, 2011 at 8:16 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
OK, Pete, I'm going to try to write the other email that I mentioned in the previous one. This email relates to the actual suggestion that you made in the email, that is to use the URIs of the form like: "http://lod.taxonconcept.org/ses/mCcSp#Occurrence". In the RDF that defines what this URI means, the URI is described as "A lightweight tag that can be used to label occurrences of this species". What I'm not sure about is what exactly one is supposed to do with it. From the example that I was talking about in the previous email (http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9.rdf), this "tag" is the object of the predicate txn:occurrenceHasSpeciesOccurrenceTag . So I guess that it is another way that one could query Occurrence records to find out which ones are Occurrences of the species having the identifier "ICmLC" (Boloria selene). But I'm not sure what the advantage of that is. The RDF for the Occurrence already tells me that the Occurrence has the txn:occurrenceHasSpeciesConcept property with object URI http://lod.taxonconcept.org/ses/ICmLC#Species . I can resolve that URI and "find out" that the "species concept" (sensu DeVries) is Boloria selene . But if I used the "lightweight tag" I'd also have to resolve its URI to find out about it and the RDF for the tag directs me to the http://lod.taxonconcept.org/ses/ICmLC#Species URI anyway via the dcterms:isPartOf property of the tag. I guess the point is that if one wants to "find out" about the Occurrence, it takes two steps to get to the species concept description if I use the tag (first through txn:occurrenceHasSpeciesOccurrenceTag, then through dcterms:isPartOf) which is no advantage over just getting there in one step (via txn:occurrenceHasSpeciesConcept). If the only point is to have something to put in as a search term, then why not just make the txn:occurrenceHasSpeciesOccurrenceTag a data property with the literal object the string "ICmLC"?
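The one-step versus two-step comparison can be sketched in a few lines of Python. This is only an illustration over three hand-written triples: the property names and the ICmLC URIs are the ones discussed above, the occurrence URI is assumed from the RDF document cited, and the follow helper is made up for the example.

```python
OCC = "http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurrence"
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"
SPECIES = "http://lod.taxonconcept.org/ses/ICmLC#Species"

# Hand-written stand-in for the relevant statements in the published RDF.
TRIPLES = [
    (OCC, "txn:occurrenceHasSpeciesConcept", SPECIES),
    (OCC, "txn:occurrenceHasSpeciesOccurrenceTag", TAG),
    (TAG, "dcterms:isPartOf", SPECIES),
]


def follow(start, path):
    """Follow a sequence of predicates from start; return the nodes reached."""
    frontier = {start}
    for predicate in path:
        frontier = {o for s, p, o in TRIPLES if s in frontier and p == predicate}
    return frontier


# One hop directly to the species concept.
direct = follow(OCC, ["txn:occurrenceHasSpeciesConcept"])
# Two hops: occurrence -> tag -> species concept.
via_tag = follow(OCC, ["txn:occurrenceHasSpeciesOccurrenceTag", "dcterms:isPartOf"])

print(direct == via_tag == {SPECIES})
```

Both paths end at the same species concept URI; the route through the tag just takes one more hop.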
I suppose that one could say that an advantage of the "lightweight tag" approach would be that one is indicating that the particular Occurrence is an instance of a class that consists of all Occurrences of the species Boloria selene. That seems to be what the intention is. But this seems to be a case of creating many subclasses rather than having a general class and assigning it properties that help one to understand the nature of the instance of that class. It requires the creation of a class for every species on the planet. Instead of there being a relatively small number of classes that includes the basic kinds of resources (Occurrence, individual, Identification, taxon concept) there is a class for occurrences of every kind of taxon concept. Actually, there are several classes for every instance of taxon concept, because you are recommending that the "lightweight tag" approach be used for other types of things as well, such as individuals and (in your suggestion below) populations. There isn't anything intrinsically "wrong" with this approach, but with my bias toward preferring "well known" types/classes it just seems like a lot to expect consuming applications to "understand" what amounts to potentially millions of classes that this method would introduce.
I also don't quite understand what a txn:SpeciesOccurrenceTag is exactly. In the RDF that defines the txn:SpeciesOccurrenceTag instance for Boloria selene (http://lod.taxonconcept.org/ses/ICmLC#Occurrence) the dcterms:description says that it "allow species occurrences to be modeled as instances of SpeciesOccurrenceTag". But that doesn't seem to be what is actually occurring. When the Occurrence instance http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurrence is described, it is not typed as the lightweight tag (which IS a txn:SpeciesOccurrenceTag because of the implicit typing caused by the XML container element name). The lightweight tag URI is the object of the txn:occurrenceHasSpeciesOccurrenceTag property, but that doesn't make the Occurrence an instance of SpeciesOccurrenceTag as would be the case (I think) if the lightweight tag URI were the object of a rdf:type property. Anyway, I'm confused about this.
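The distinction between being the subject of an rdf:type statement and merely pointing at the tag through another property can be sketched as follows. This is a toy Python illustration with no reasoner involved; it assumes nothing in the ontology maps txn:occurrenceHasSpeciesOccurrenceTag onto rdf:type, and "occ:1" is a made-up short name for an occurrence.

```python
RDF_TYPE = "rdf:type"
TAG_PROPERTY = "txn:occurrenceHasSpeciesOccurrenceTag"
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"


def instances_of(triples, cls):
    """Subjects explicitly typed as cls (no inference is applied)."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


# How the occurrence is described now: the tag is the object of a property.
as_published = {("occ:1", TAG_PROPERTY, TAG)}
# How it would look if the tag URI were the object of rdf:type instead.
retyped = {("occ:1", RDF_TYPE, TAG)}

print(instances_of(as_published, TAG))  # set()
print(instances_of(retyped, TAG))       # {'occ:1'}
```

A query of the form "?s a <tag>" behaves like instances_of here, so it only matches the retyped form.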
The other issue that I would raise with this approach is that it brings up the same issue that I raised in the other email that I wrote. It essentially puts a burden of anticipating the results of a query onto the metadata provider. If one follows the model of allowing multiple Identifications for an organism, then it is possible that someone somewhere else might apply their own Identification instance to the individual represented in the Occurrence. As was the case in my earlier example, for txn:occurrenceHasSpeciesOccurrenceTag to be useful as a thing to be queried, the metadata provider would need to somehow know that this additional Identification had been made, and then create another txn:occurrenceHasSpeciesOccurrenceTag property for the Occurrence instance. This seems to be somewhat at odds with the benefit that the Linked Data world has in allowing resources to be created by people all over the cloud and then linked rather than expecting a centralized authority to do everything.
Anyway, maybe you can explain what is going on so that I can understand it better and maybe explain why this approach is better than just creating a few classes and describing their instances by descriptive properties.
Steve
Peter DeVries wrote:
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique.
I was thinking that it would be best to create a separate class that can be used for populations of a species.
This would require adding an additional tag to the TaxonConcept Species Concept Model, which currently includes several tag-like entities:
http://lod.taxonconcept.org/ses/mCcSp#Species <- The Species Concept for the Cougar
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%2Fses%2Fv6n7p%23Species Knowledge Base View (http://bit.ly/gMFqR1)
The model mints URIs for the following related entities. See RDF or KB View.
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
Here is how a subset of these would relate to the new #Population Tag and related semantic entities.
This tag, in the species concept RDF, is used for an individual organism that is an instance of the species concept.
This allows you to refer to an individual cougar in a way that is separate from the concept of cougar and retains links to other data relating to that species concept.
<txn:SpeciesIndividualTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Individual">
  <dcterms:title>A Tag for individuals of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label individuals of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Individual</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label individuals of this species. These allow individual organisms to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesIndividualTag>
Add a tag for a species population to the species concept RDF.
This allows you to refer to a population of cougars in a way that is separate from an individual cougar and retains links to other data relating to that species concept.
<txn:SpeciesPopulationTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Population">
  <dcterms:title>A Tag for populations of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label populations of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Population</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label populations of this species. These allow populations of a species to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesPopulationTag>
This is the RDF for a population; it has as one of its parts an individual organism.
It is typed to indicate that it refers to a population of Cougars.
<owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population"/>
  <skos:prefLabel>The population of North American Cougars Puma concolor se:v6n7 </skos:prefLabel>
  <dcterms:hasPart rdf:resource="http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individual"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf"/>
</owl:Class>
Respectfully,
- Pete
-------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept & GeoSpecies Knowledge Bases
A Semantic Web, Linked Open Data Project
---------------------------------------------------------------------------------------
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu