[tdwg-content] More schema-last (was Monkey Business)
joel sachs
jsachs at csee.umbc.edu
Fri Feb 25 13:59:05 CET 2011
Hilmar,
You're a fan of LOD who sees a number of use cases where deep domain
ontologies play a crucial role. So am I. Our differences are irreconcilable!
If we're arguing, it could be because we differ on what those use cases
are, and, generally, how to characterize them. My sense, regarding the
ontologies recommended by the GBIF KOS report:
Darwin Core: As I've been arguing, I wouldn't get too carried away here.
SDD: This is a great example of a good match for description
logics. SDD expressed as OWL2-DL *could* be a path towards robust
polyclave keys, which (I believe) have long been a goal not just for
citizen science, but for field identification in general. I stress "could"
above, because I'm a little surprised it hasn't happened yet. There was
movement in that direction at least as far back as 2005
[http://dcpapers.dublincore.org/ojs/pubs/article/viewFile/808/804], and I
heard the idea discussed back at the Montpellier VoCamp. I don't
know if lack of progress here is because of lack of funding, or because
the problem is a lot harder than it at first appears. I'd love to see a
concerted effort in this direction, starting modestly, focusing on a
small taxonomic group for which there is already a lot of SDD instance
data. (This would, IMHO, make a strong funding proposal.)
Taxonomic treatments: I don't know a lot about this, but, as I previously
indicated, I think that ontologies for the artifacts of human behaviour
should be less constrained than ontologies for the natural world.
SPM: We're mostly talking here about the resources that humans want and
expect in a species description. This should be straightforward.
Moving beyond the report to characterizing the ontological needs of general
use cases:
DATA INTEGRATION: Recourse to upper level ontologies for data
integration has so far proved to be of limited utility. Can anyone point
me to examples of this approach working for anything other than contrived
examples, or narrow domain areas? Maybe OBOE will be the first to
succeed.
DISCOVERING NEW KNOWLEDGE (as envisioned by Einstein and the Queen in last
year's KR classic, http://www.xtranormal.com/watch/7471601/): This is one
of the most potentially exciting areas of the semantic web, and has been
for ten years. Consider two approaches to answering the query "Find
occurrences of invasive species."
i. In the ETHAN ontology, we define a taxon as invasive by asserting it to
be a subClass of a class of invaders, like the class "GISDThing". So
querying for occurrences of invaders simply involves looking for
occurrences, and then doing subsumption reasoning over the Invasives
ontology and branches of the Tree of Life.
ii. What if, instead, we defined an invader as any species which has a
definite tendency to expand its range into areas where it is unwanted
(Thorpe's definition)? Could we still answer the query "Find occurrences
of invasive species."? To do so with this definition would, potentially,
involve the discovery of a new scientific fact, the discovery that a
species, previously thought benign, is, in fact, invasive. Is there the
prospect of being able to do this? Maybe. It would take a lot of
work (and would be another good funding proposal).
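
Approach (i) can be illustrated with a minimal Python sketch. This is not the
ETHAN ontology and not an OWL reasoner: the class names other than
"GISDThing" (SpeciesA, GenusA, Invader, and so on), the occurrence records,
and both function names are hypothetical, and the "reasoning" is only a walk
up asserted subClassOf links.

```python
from collections import deque

# Hypothetical toy hierarchy: each class maps to its directly asserted
# superclasses. Only "GISDThing" is a name taken from the message above.
SUBCLASS_OF = {
    "SpeciesA": {"GenusA", "GISDThing"},  # asserted to be an invader
    "SubspeciesA1": {"SpeciesA"},
    "SpeciesB": {"GenusA"},               # no invader assertion
    "GenusA": {"Taxon"},
    "GISDThing": {"Invader"},
}

# Hypothetical occurrence records: (occurrence id, taxon class).
OCCURRENCES = [
    ("occ1", "SubspeciesA1"),
    ("occ2", "SpeciesB"),
    ("occ3", "SpeciesA"),
]


def superclasses(cls, hierarchy):
    """Return cls plus every class reachable from it via subClassOf links."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in hierarchy.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def invasive_occurrences(occurrences, hierarchy, invader_class="Invader"):
    """Keep the occurrences whose taxon is subsumed by invader_class."""
    return [
        occ_id
        for occ_id, taxon in occurrences
        if invader_class in superclasses(taxon, hierarchy)
    ]


print(invasive_occurrences(OCCURRENCES, SUBCLASS_OF))
```

Here occ1 is returned even though SubspeciesA1 was never directly asserted to
be an invader; it inherits that status through SpeciesA. Approach (ii) cannot
be reduced to a lookup like this, since membership in the invader class would
have to be inferred from range data rather than asserted.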
(I do realize that the line between data integration and discovering new
knowledge is blurry. If you can integrate the data, you can apply
exploratory data mining techniques to discover new knowledge. So by
discovering new knowledge via ontologies, I (like Einstein and the Queen)
mean that it's the OWL reasoner itself that's making the discoveries.)
Before responding to a couple of your specific comments, I want to stress
for anyone following that there is (or should be) no tension between LOD
and ontologies. LOD is simply the RESTful way to do the semantic web, and
is the current semantic web best practice. So whether our semantic web
rests on fancy ontologies or simple ones, we all (I think) agree that the
ontologies and instance data should be published according to best
practices.
Further comments below ...
On Mon, 21 Feb 2011, Hilmar Lapp wrote:
> Joel:
>
> On Feb 21, 2011, at 4:51 PM, joel sachs wrote:
>
>> most ontologies don't have users. I'll check Swoogle for some statistics to
>> back that up, but does anyone really dispute it?
>
>
> I'm not sure that's a useful statement by itself. It is akin to saying that
> most software source code doesn't have users, and therefore the way we think
> about software is flawed.
True. I apologize for trying to pull a fast one. (Although the way most
people think about software *is* flawed.)
> So, of course if you count any ontology that has ever been started by anyone,
> the majority of those will likely not have users. That doesn't mean at all
> that that is necessarily also so for each and every community of practice.
> Most of the ontologies in the OBO Foundry/Library do have users, and
> publications arising from that.
>
> And what does that then mean for TDWG / Biodiversity ontologies, if you mean
> to say that most of those do not have users? I don't claim to know, but I
> think it does go to suggest 3 things: 1) Ontologies created by a narrow (not
> the same as small) group of people and intended to be used by many will
> likely end up not getting used at all. 2) To get domain scientists engaged in
> ontology development at breadth, training and community are not dispensable.
> 3) Ontology building is time consuming, and merely talking about ontologies,
> or developing ontologies for the sake of having developed ontologies, doesn't
> justify anyone's time investment. But using them to demonstrate biological
> discovery does.
I agree with all the above. The only point I would add (which is how this
conversation got started) is 4) Ontologies that are developed without
generating significant amounts of instance data as part of the development
spiral start life with two strikes against them.
>
> I'm a big fan of LOD, in particular *because* it does not require full-blown
> ontologies for entry. I'm hugely in favor of de-siloing data, and LOD has
> much promise in this regard by applying the ultimate normalization. But we
> should also not fool ourselves into believing that somehow normalizing all
> data into triple form will let us discover new knowledge. I have yet to see
> the paper that reports a scientific discovery from a flat vocabulary
> LOD-style RDF integration that you couldn't have achieved in a fraction of
> the time by cobbling together a database schema and some massaging scripts.
>
You can always cobble something together if you happen to know where
the data is, and have easy access to it. LOD exposes data.
Have you seen any papers that report a scientific discovery
from fancy ontologies *on the semantic web*? The Washington paper, which
we both think is a good example of ontologies at work, doesn't mention
RDF, OWL, or the semantic web. Ontologies predate the
semantic web, and live fine without it. Many of us are working at
migrating Washington's approach onto the semantic web. Time will tell if
description logics distribute well.
Joel.
> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
> ===========================================================
>
>