Re: [tdwg-content] More schema-last (was Monkey Business)

25 Feb 2011

Hilmar,

You're a fan of LOD who sees a number of use cases where deep domain 
ontologies play a crucial role. So am I. Our differences are irreconcilable!

If we're arguing, it could be because we differ on what those use cases 
are and, more generally, on how to characterize them. My sense, in regard 
to the ontologies recommended by the GBIF KOS report:

Darwin Core: As I've been arguing, I wouldn't get too carried away here.

SDD: This is a great example of a good match for description 
logics. SDD expressed as OWL2-DL *could* be a path towards robust 
polyclave keys, which (I believe) have long been a goal not just for 
citizen science, but for field identification in general. I stress "could" 
above, because I'm a little surprised it hasn't happened yet. There was 
movement in that direction at least as far back as 2005 
[http://dcpapers.dublincore.org/ojs/pubs/article/viewFile/808/804], and I 
heard the idea discussed back at the Montpellier VoCamp. I don't 
know if lack of progress here is because of lack of funding, or because 
the problem is a lot harder than it at first appears. I'd love to see a 
concerted effort in this direction, starting modestly, focusing on a 
small taxonomic group for which there is already a lot of SDD instance 
data. (This would, IMHO, make a strong funding proposal.)

Taxonomic treatments: I don't know a lot about this, but, as I previously 
indicated, I think that ontologies for the artifacts of human behaviour 
should be less constrained than ontologies for the natural world.

SPM: We're mostly talking here about the resources that humans want and 
expect in a species description. This should be straightforward.

Moving beyond the report to characterizing the ontological needs of general 
use cases:

DATA INTEGRATION: Recourse to upper level ontologies for data 
integration has so far proved to be of limited utility. Can anyone point 
me to examples of this approach working for anything other than contrived 
examples, or narrow domain areas? Maybe OBOE will be the first to 
succeed.

DISCOVERING NEW KNOWLEDGE (as envisioned by Einstein and the Queen in last 
year's KR classic, http://www.xtranormal.com/watch/7471601/): This is one 
of the most potentially exciting areas of the semantic web, and has been 
for ten years. Consider two approaches to answering the query "Find 
occurrences of invasive species."

i. In the ETHAN ontology, we define a taxon as invasive by asserting it to 
be a subClass of a class of invaders, like the class "GISDThing". So 
querying for occurrences of invaders simply involves looking for 
occurrences, and then doing subsumption reasoning over the Invasives 
ontology and branches of the Tree of Life.
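
A minimal sketch of approach (i), assuming a toy subClassOf graph. Apart 
from "GISDThing", every class name, taxon, and occurrence id below is 
hypothetical, and a plain transitive-closure walk stands in for what an 
OWL reasoner would do over the real ontologies:

```python
# Toy subClassOf assertions. "GISDThing" is the class of invaders named in
# the text; every other name here is a made-up placeholder.
SUBCLASS_OF = {
    "TaxonA_subsp1": {"TaxonA"},            # branch of the Tree of Life
    "TaxonA": {"GenusA", "GISDThing"},      # asserted to be an invader
    "TaxonB": {"GenusA"},                   # not asserted to be an invader
    "GenusA": {"FamilyA"},
}

# Hypothetical occurrence records, each identified to some taxon.
OCCURRENCES = [
    {"id": "occ1", "taxon": "TaxonA"},
    {"id": "occ2", "taxon": "TaxonB"},
    {"id": "occ3", "taxon": "TaxonA_subsp1"},
]


def superclasses(cls):
    """Return every class that subsumes cls (transitive subClassOf)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def occurrences_of(target_class):
    """Ids of occurrences whose taxon is, or is subsumed by, target_class."""
    return [
        occ["id"]
        for occ in OCCURRENCES
        if occ["taxon"] == target_class
        or target_class in superclasses(occ["taxon"])
    ]


invasive_occurrences = occurrences_of("GISDThing")
```

Here occ3 is picked up only because its taxon sits below TaxonA in the 
tree, which is the subsumption step; nothing is discovered that was not 
already asserted.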

ii. What if, instead, we defined an invader as any species which has a 
definite tendency to expand its range into areas where it is unwanted 
(Thorpe's definition)? Could we still answer the query "Find occurrences 
of invasive species."? To do so with this definition would, potentially, 
involve the discovery of a new scientific fact, the discovery that a 
species, previously thought benign, is, in fact, invasive. Is there the 
prospect of being able to do this? Maybe. It would take a lot of 
work (and would be another good funding proposal).
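
To make the contrast with (i) concrete, here is a deliberately crude 
sketch of (ii). It is a procedural stand-in, not an OWL reasoner, and it 
reduces "definite tendency" to a single observed expansion. The record 
layout, the region names, and the idea of a given set of "unwanted" 
regions are all assumptions made for illustration:

```python
from collections import defaultdict


def expands_into_unwanted(records, unwanted_regions):
    """Crude test of the range-expansion definition.

    records: iterable of (year, region) sightings for one species.
    unwanted_regions: set of regions where the species is unwanted
    (assumed to be known in advance).

    Returns True if, after the first observed year, the species turns up
    in a region it had not occupied before and that region is unwanted.
    """
    regions_by_year = defaultdict(set)
    for year, region in records:
        regions_by_year[year].add(region)

    occupied = set()
    for index, year in enumerate(sorted(regions_by_year)):
        newly_occupied = regions_by_year[year] - occupied
        # The first year only establishes the baseline range.
        if index > 0 and newly_occupied & unwanted_regions:
            return True
        occupied |= regions_by_year[year]
    return False


# Hypothetical sightings: the species spreads from region R1 into R2.
spreading = [(2000, "R1"), (2005, "R1"), (2005, "R2")]
stable = [(2000, "R1"), (2005, "R1")]
```

With this definition the answer depends on the instance data rather than 
on an asserted subclass link, so a species nobody has labelled invasive 
can still be flagged.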

(I do realize that the line between data integration and discovering new 
knowledge is blurry. If you can integrate the data, you can apply 
exploratory data mining techniques to discover new knowledge. So by 
discovering new knowledge via ontologies, I (like Einstein and the Queen) 
mean that it's the OWL reasoner itself that's making the discoveries.)

Before responding to a couple of your specific comments, I want to stress 
for anyone following that there is (or should be) no tension between LOD 
and ontologies. LOD is simply the RESTful way to do the semantic web, and 
is the current semantic web best practice. So whether our semantic web 
rests on fancy ontologies or simple ones, we all (I think) agree that the 
ontologies and instance data should be published according to best 
practices.

Further comments below ...

On Mon, 21 Feb 2011, Hilmar Lapp wrote:
...
Joel:
On Feb 21, 2011, at 4:51 PM, joel sachs wrote:
...
most ontologies don't have users. I'll check Swoogle for some statistics to 
back that up, but does anyone really dispute it?

I'm not sure that's a useful statement by itself. It is akin to saying that 
most software source code doesn't have users, and therefore the way we think 
about software is flawed.

True. I apologize for trying to pull a fast one. (Although the way most 
people think about software *is* flawed.)
...
So, of course if you count any ontology that has ever been started by anyone, 
the majority of those will likely not have users. That doesn't mean at all 
that that is necessarily also so for each and every community of practice. 
Most of the ontologies in the OBO Foundry/Library do have users, and 
publications arising from that.
And what does that then mean for TDWG / Biodiversity ontologies, if you mean 
to say that most of those do not have users? I don't claim to know, but I 
think it does go to suggest 3 things: 1) Ontologies created by a narrow (not 
the same as small) group of people and intended to be used by many will 
likely end up not getting used at all. 2) To get domain scientists engaged in 
ontology development at breadth, training and community are not dispensable. 
3) Ontology building is time consuming, and merely talking about ontologies, 
or developing ontologies for the sake of having developed ontologies, doesn't 
justify anyone's time investment. But using them to demonstrate biological 
discovery does.

I agree with all the above. The only point I would add (which is how this 
conversation got started) is 4) Ontologies that are developed without 
generating significant amounts of instance data as part of the development 
spiral start life with two strikes against them.
...
I'm a big fan of LOD, in particular *because* it does not require full-blown 
ontologies for entry. I'm hugely in favor of de-siloing data, and LOD has 
much promise in this regard by applying the ultimate normalization. But we 
should also not fool ourselves into believing that somehow normalizing all 
data into triple form will let us discover new knowledge. I have yet to see 
the paper that reports a scientific discovery from a flat vocabulary 
LOD-style RDF integration that you couldn't have achieved in a fraction of 
the time by cobbling together a database schema and some massaging scripts.

You can always cobble something together if you happen to know where
the data is, and have easy access to it. LOD exposes data.

Have you seen any papers that report a scientific discovery 
from fancy ontologies *on the semantic web*? The Washington paper, which 
we both think is a good example of ontologies at work, doesn't mention 
RDF, OWL, or the semantic web. Ontologies predate the 
semantic web, and live fine without it. Many of us are working on 
migrating Washington's approach onto the semantic web. Time will tell if 
description logics distribute well.

Joel.
...
-hilmar
-- 
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
===========================================================