Hilmar,
You're a fan of LOD who sees a number of use cases where deep domain ontologies play a crucial role. So am I. Our differences are irreconcilable!
If we're arguing, it could be because we differ on what those use cases are and, more generally, how to characterize them. My sense, regarding the ontologies recommended by the GBIF KOS report:
Darwin Core: As I've been arguing, I wouldn't get too carried away here.
SDD: This is a great example of a good match for description logics. SDD expressed as OWL2-DL *could* be a path towards robust polyclave keys, which (I believe) have long been a goal not just for citizen science, but for field identification in general. I stress "could" above, because I'm a little surprised it hasn't happened yet. There was movement in that direction at least as far back as 2005 [http://dcpapers.dublincore.org/ojs/pubs/article/viewFile/808/804], and I heard the idea discussed back at the Montpellier VoCamp. I don't know if the lack of progress here is due to a lack of funding, or because the problem is a lot harder than it at first appears. I'd love to see a concerted effort in this direction, starting modestly, focusing on a small taxonomic group for which there is already a lot of SDD instance data. (This would, IMHO, make a strong funding proposal.)
Taxonomic treatments: I don't know a lot about this, but, as I previously indicated, I think that ontologies for the artifacts of human behaviour should be less constrained than ontologies for the natural world.
SPM: We're mostly talking here about the resources that humans want and expect in a species description. This should be straightforward.
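To make the polyclave idea concrete, here is a minimal Python sketch of the elimination step a multi-access key performs. It is not SDD and not OWL: the character matrix, taxon names and character names are all hypothetical, and it ignores numeric characters, inapplicable characters and character dependencies, which are the hard parts a real SDD-based key would have to handle.

```python
from typing import Dict, Set

# Hypothetical toy character matrix: taxon -> character -> set of permitted states.
# All names are invented for illustration; this is not drawn from any SDD dataset.
MATRIX: Dict[str, Dict[str, Set[str]]] = {
    "Taxon A": {"leaf_margin": {"entire"}, "flower_colour": {"white", "pink"}},
    "Taxon B": {"leaf_margin": {"serrate"}, "flower_colour": {"yellow"}},
    "Taxon C": {"leaf_margin": {"entire", "lobed"}, "flower_colour": {"yellow"}},
}


def remaining_taxa(matrix: Dict[str, Dict[str, Set[str]]],
                   observations: Dict[str, str]) -> Set[str]:
    """Return the taxa consistent with every observed character state.

    Observations may cover any subset of characters, in any order; that
    freedom is what distinguishes a polyclave (multi-access) key from a
    dichotomous one. A taxon with no recorded states for a character is
    kept, i.e. missing data is treated as "unknown", not as a mismatch.
    """
    result: Set[str] = set()
    for taxon, chars in matrix.items():
        if all(
            character not in chars or state in chars[character]
            for character, state in observations.items()
        ):
            result.add(taxon)
    return result
```

For example, observing only a yellow flower leaves Taxon B and Taxon C; adding an entire leaf margin narrows it to Taxon C. The user chooses which characters to score, which is what makes such keys attractive for field identification.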
Moving beyond the report to characterizing the ontological needs of general use cases:
DATA INTEGRATION: Recourse to upper-level ontologies for data integration has so far proved to be of limited utility. Can anyone point me to examples of this approach working for anything other than contrived examples or narrow domain areas? Maybe OBOE will be the first to succeed.
DISCOVERING NEW KNOWLEDGE (as envisioned by Einstein and the Queen in last year's KR classic, http://www.xtranormal.com/watch/7471601/): This is one of the most potentially exciting areas of the semantic web, and has been for ten years. Consider two approaches to answering the query "Find occurrences of invasive species."
i. In the ETHAN ontology, we define a taxon as invasive by asserting it to be a subClass of a class of invaders, like the class "GISDThing". So querying for occurrences of invaders simply involves looking for occurrences, and then doing subsumption reasoning over the Invasives ontology and branches of the Tree of Life.
ii. What if, instead, we defined an invader as any species which has a definite tendency to expand its range into areas where it is unwanted (Thorpe's definition)? Could we still answer the query "Find occurrences of invasive species"? To do so with this definition would, potentially, involve the discovery of a new scientific fact: the discovery that a species, previously thought benign, is, in fact, invasive. Is there the prospect of being able to do this? Maybe. It would take a lot of work (and would be another good funding proposal).
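Approach (i) boils down to a subsumption query. The Python sketch below shows only that shape: a naive transitive closure over asserted subclass links, then a filter over occurrence records. It is not an OWL reasoner and not the ETHAN ontology; the hierarchy, taxon names and occurrence records are hypothetical, and only the class name "GISDThing" comes from the text above. Note that an occurrence identified to a subspecies is still found, because the closure walks up that branch of the tree. Approach (ii) is exactly what this sketch cannot do: there the invasive class would be defined by a condition (range expansion), and deciding membership would need a reasoner plus data on range expansion, not just asserted links.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical asserted hierarchy: class -> its direct superclasses.
# Names are placeholders; the GISDThing assertion is part of the toy example.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "Species_X_subsp_y": {"Species_X"},
    "Species_X": {"Genus_X", "GISDThing"},
    "Genus_X": {"Family_F"},
    "Species_Z": {"Genus_Z"},
    "Genus_Z": {"Family_F"},
}

# Hypothetical occurrence records: (record id, taxon the record is identified to).
OCCURRENCES: List[Tuple[str, str]] = [
    ("occ1", "Species_X_subsp_y"),
    ("occ2", "Species_Z"),
    ("occ3", "Species_X"),
]


def superclasses(cls: str) -> Set[str]:
    """All transitive superclasses of cls (naive subsumption over asserted links)."""
    seen: Set[str] = set()
    stack = list(SUBCLASS_OF.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return seen


def is_subsumed_by(cls: str, ancestor: str) -> bool:
    """True if cls is ancestor itself or one of its (transitive) subclasses."""
    return cls == ancestor or ancestor in superclasses(cls)


def occurrences_of(ancestor: str, occurrences: List[Tuple[str, str]]) -> List[str]:
    """Ids of occurrence records whose taxon is subsumed by ancestor."""
    return [occ_id for occ_id, taxon in occurrences if is_subsumed_by(taxon, ancestor)]
```

Here `occurrences_of("GISDThing", OCCURRENCES)` returns `["occ1", "occ3"]`: the subspecies record is picked up only because the query walks up through Species_X.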
(I do realize that the line between data integration and discovering new knowledge is blurry. If you can integrate the data, you can apply exploratory data mining techniques to discover new knowledge. So by discovering new knowledge via ontologies, I (like Einstein and the Queen) mean that it's the OWL reasoner itself that's making the discoveries.)
Before responding to a couple of your specific comments, I want to stress for anyone following that there is (or should be) no tension between LOD and ontologies. LOD is simply the RESTful way to do the semantic web, and is the current semantic web best practice. So whether our semantic web rests on fancy ontologies or simple ones, we all (I think) agree that the ontologies and instance data should be published according to best practices.
Further comments below ...
On Mon, 21 Feb 2011, Hilmar Lapp wrote:
Joel:
On Feb 21, 2011, at 4:51 PM, joel sachs wrote:
most ontologies don't have users. I'll check Swoogle for some statistics to back that up, but does anyone really dispute it?
I'm not sure that's a useful statement by itself. It is akin to saying that most software source code doesn't have users, and therefore the way we think about software is flawed.
True. I apologize for trying to pull a fast one. (Although the way most people think about software *is* flawed.)
So, of course if you count any ontology that has ever been started by anyone, the majority of those will likely not have users. That doesn't mean at all that that is necessarily also so for each and every community of practice. Most of the ontologies in the OBO Foundry/Library do have users, and publications arising from that.
And what does that then mean for TDWG / Biodiversity ontologies, if you mean to say that most of those do not have users? I don't claim to know, but I think it does go to suggest 3 things: 1) Ontologies created by a narrow (not the same as small) group of people and intended to be used by many will likely end up not getting used at all. 2) To get domain scientists engaged in ontology development at breadth, training and community are not dispensable. 3) Ontology building is time consuming, and merely talking about ontologies, or developing ontologies for the sake of having developed ontologies, doesn't justify anyone's time investment. But using them to demonstrate biological discovery does.
I agree with all the above. The only point I would add (which is how this conversation got started) is 4) Ontologies that are developed without generating significant amounts of instance data as part of the development spiral start life with two strikes against them.
I'm a big fan of LOD, in particular *because* it does not require full-blown ontologies for entry. I'm hugely in favor of de-siloing data, and LOD has much promise in this regard by applying the ultimate normalization. But we should also not fool ourselves into believing that somehow normalizing all data into triple form will let us discover new knowledge. I have yet to see the paper that reports a scientific discovery from a flat vocabulary LOD-style RDF integration that you couldn't have achieved in a fraction of the time by cobbling together a database schema and some massaging scripts.
You can always cobble something together if you happen to know where the data is, and have easy access to it. LOD exposes data.
Have you seen any papers that report a scientific discovery from fancy ontologies *on the semantic web*? The Washington paper, which we both think is a good example of ontologies at work, doesn't mention RDF, OWL, or the semantic web. Ontologies predate the semantic web, and live fine without it. Many of us are working at migrating Washington's approach onto the semantic web. Time will tell whether description logics distribute well.
Joel.
-hilmar
--
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :