Hilmar,
Schema-last, to me, is an attitude of holding back (sometimes forever) before i) restricting the vocabulary available to users; and/or ii) defining a semantics that draws inferences way beyond a user's assertions.
I think this attitude can apply not only to the terms of an ontology, but to the general shape and style of the ontology, and I am concerned about GBIF/TDWG assuming that its ontologies should be DL in flavour. By DL, I mean more than whether an ontology is technically within the OWL-DL profile. I mean the general approach of building classifiers, which, traditionally, has been the goal of description logics. So, by DL in flavour, I mean making heavy use of domain and range restrictions, functional and inverseFunctional properties, class definition via property restriction, etc. This DL-based approach seems to be working in genomics.
Will it work in biodiversity informatics? One cause for concern is that the current Darwin Core, which is simple, is widely misunderstood and intimidates many. It is possible that the problem will be solved with tighter restriction and more formalisms. But I'm skeptical.
Even if we are able, through the laborious process of doing things a certain way, to build classifiers for biodiversity informatics artifacts (occurrence records, evidence, identifications, etc.) in the same way that we can build them for actual objects of biology (genes, taxa, etc.), why would we want to? The natural world comes without labels, so it's helpful to be able to synthesize everything that we know about something to determine what it is. But human-made information artifacts are typically labeled, or have their types implied by context.
I'm currently arguing with someone off-list about what I think is a minimal example, one that I hope everyone can agree on. It's about domain constraints on "hasIdentification". If I say
"http://fu.bar hasIdentification rabbit",
should we, as a community, interpret that to mean that http://fu.bar is an individualOrganism (as opposed to, say, a picture)? Must I, as a guy who likes to make assertions, be told either
a. that I need additional vocabulary terms (pictureHasIdentification, occurrenceHasIdentification, individualHasIdentification, etc.), or
b. that I need to limit hasIdentification to describing a single type of thing?
If you can convince me of either (a) or (b) above, then I'll be inclined to accept your entire vision for the semantic web.
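To make the stakes of that domain constraint concrete, here is a minimal sketch in Python of the RDFS domain rule (if a property has domain C, every subject of that property is inferred to be a C). The prefixed names ex:hasIdentification, ex:IndividualOrganism and ex:rabbit are hypothetical stand-ins for the terms in the example above, not terms from any published vocabulary.

```python
RDF_TYPE = "rdf:type"

def infer_types_from_domains(triples, domains):
    """Apply the RDFS domain rule to a set of (subject, predicate, object) triples.

    `domains` maps a property name to its declared domain class.
    Returns only the newly inferred rdf:type triples.
    """
    inferred = set()
    for s, p, o in triples:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))
    return inferred

# The single assertion from the example (names are hypothetical).
assertions = {("http://fu.bar", "ex:hasIdentification", "ex:rabbit")}

# Schema-last: no domain declared, so nothing is inferred about http://fu.bar.
no_domain = infer_types_from_domains(assertions, {})

# With a domain declared, the same assertion now says that
# http://fu.bar is an individual organism, even if it is really a picture.
with_domain = infer_types_from_domains(
    assertions, {"ex:hasIdentification": "ex:IndividualOrganism"}
)
```

Without the domain declaration the assertion stays as stated; with it, a reasoner adds a type that the person making the assertion never asserted.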
A few more comments, in-line, below ...
On Thu, 17 Feb 2011, Hilmar Lapp wrote:
On Feb 17, 2011, at 3:23 PM, Shawn Bowers wrote:
Both OBOE and EQ do introduce classes that prescribe how to structure new classes and type individuals
That's actually not quite true. The EQ model itself doesn't prescribe any new classes or the types that individuals must be of; instead it simply says that a phenotype instance can be expressed as some instance of a quality Q that inheres_in some instance of an entity E, and thus a class of phenotypes (or observations of an organism's characteristics) is the intersection of all instances of Q (a subclass restriction), and all things that inhere_in E (a property restriction).
While typically we will draw Q and E from certain ontologies (such as PATO for qualities), you can designate any class (term) in those places, and the class expression by itself will not support inferences about the nature of Q or E or their instances (the ontologies that Q and E are drawn from do that). The class expression itself is often anonymous, but there are (so-called "pre-composed") ontologies that identify and label them.
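As a rough illustration of the EQ class expression described above, the following Python sketch treats a phenotype class as the intersection of two sets: instances of a quality Q, and things that inhere_in some instance of an entity E. All individuals and class names here (ex:Round, ex:Fin, q1, fin1, and so on) are made up for the example, and subclass reasoning over the source ontologies is deliberately left out.

```python
# Hypothetical instance data: each individual's asserted classes,
# and which bearer each quality instance inheres in.
types = {
    "q1": {"ex:Round"},
    "q2": {"ex:Round"},
    "q3": {"ex:Elongate"},
    "fin1": {"ex:Fin"},
    "eye1": {"ex:Eye"},
}
inheres_in = {"q1": "fin1", "q2": "eye1", "q3": "fin1"}

def eq_class(quality, entity):
    """Members of the anonymous class 'Q and (inheres_in some E)'."""
    # Instances of the quality Q.
    is_q = {x for x, classes in types.items() if quality in classes}
    # Things that inhere in some instance of the entity E.
    in_e = {
        x for x, bearer in inheres_in.items()
        if entity in types.get(bearer, set())
    }
    return is_q & in_e

round_fin = eq_class("ex:Round", "ex:Fin")
elongate_eye = eq_class("ex:Elongate", "ex:Eye")
```

Note that the expression itself says nothing about what kind of thing Q or E must be; any class name can be passed in either position.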
That being said, while EQ in principle allows you to do real crazy things if you want to (which perhaps is what Joel means by schema-last?), if you want to be able to do discovery and reasoning with a set of EQ class expressions from different sources, they will need to follow some shared conventions, such as not simply making up quality and entity terms as needed, but drawing them from PATO and shared entity ontologies.
Conversely, OBOE does prescribe the nature of the things that it relates to each other in the model, the cardinality of those relationships, and what it means for an instance if it has such a relationship. For example, if I assert o oboe:ofEntity e, the semantics of oboe:ofEntity prescribe that o is an instance of oboe:Observation, e is an instance of oboe:Entity, and if I also assert o oboe:ofEntity e1, it prescribes that e and e1 are identical, i.e., the same instance.
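The entailments in that oboe:ofEntity example can be sketched in a few lines of Python. This is only an illustration of the three rules as described in the paragraph above (domain, range, and functional property); the domain, range and functional declarations are taken from that description, not checked against the OBOE ontology file, and the function name `entail` is hypothetical.

```python
def entail(triples, domain, range_, functional):
    """Return triples inferred from domain, range and functional declarations."""
    inferred = set()
    first_value = {}  # (subject, property) -> first object seen
    for s, p, o in sorted(triples):
        if p in domain:
            inferred.add((s, "rdf:type", domain[p]))
        if p in range_:
            inferred.add((o, "rdf:type", range_[p]))
        if p in functional:
            # A functional property has at most one value per subject,
            # so two different objects must denote the same individual.
            first = first_value.setdefault((s, p), o)
            if first != o:
                inferred.add((first, "owl:sameAs", o))
    return inferred

asserted = {
    ("o", "oboe:ofEntity", "e"),
    ("o", "oboe:ofEntity", "e1"),
}
inferred = entail(
    asserted,
    domain={"oboe:ofEntity": "oboe:Observation"},
    range_={"oboe:ofEntity": "oboe:Entity"},
    functional={"oboe:ofEntity"},
)
```

Two plain assertions yield four inferred triples, including the identity of e and e1, which is the sense in which the model draws inferences beyond what was asserted.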
I think these differences are a result of how they were motivated, and it is interesting to me that Joel would pick these as examples for illustrating "schema-lastishness".
An example of why I see EQ being more schema-last than OBOE is the question you recently forwarded to the Observations list: How do you represent "petiole 5x longer than wide"?
In EQ, you could say something like: <5:1 length to width ratio> <inheres_in> <petiole> and then wait for some more examples of ratios to come in, before deciding how to update your Quality ontology to handle ratios. In OBOE (please correct me if I'm wrong), it seems (to me) that you need to make more of an ontological commitment to express the same thing.
(Also, could you please direct me to sources of OBOE instance data? A quick search of TDWG-Observation, SONet, Google, and Swoogle only turned up the ontology itself, and a few examples of the "how do you do this in OBOE" variety.)
OBOE was motivated by having a unified data model for observational data, in the interest of better data exchange and integration. I think all its class and property constraints are a reflection of that - there is a desire not to "allow anything". Conversely, EQ wouldn't make for a good model in which to exchange arbitrary observational data - there would be no guarantees for what you get. However, it is very powerful for reasoning over the semantics of the observations (see the Washington et al 2009 paper), which is what it was conceived for.
I like the Washington paper a lot. One thing it illustrates to me is the power that comes from the judicious use of an appropriate domain ontology with which to value simple attributes. One of the most important recommendations in the KOS report, IMO, is the one I quoted to Pete: "Promote widespread adoption of URI-based standard values for key Darwin Core attribute values." Constructing appropriate ontologies for these values strikes me as a much better way to bring DwC on to the semantic web than recrafting DwC as an OWL ontology. (I'm not opposed to the latter, which may serve a data validation need, but I don't think it's necessary for typical data integration use cases.)
On Thu, Feb 17, 2011 at 11:28 AM, joel sachs <jsachs@csee.umbc.edu> wrote:
Do you (or does anyone else on the list) know the status of OBD? From the NCBO FAQ:
Funny you should ask. We're in the final stages of writing up a manuscript about it. I can share a preprint with you next week. OBD is what is underpinning the Phenoscape Knowledgebase (http://kb.phenoscape.org).
The URL is http://www.berkeleybop.org/obd/. It is still pretty outdated, but will be updated very soon.
Is it still the plan to integrate OBD into BioPortal?
I don't think so. And there are lots of resources working on that (at least in the biomedical domain), so it'd be hard for them to pick what to follow.
So in the OBOE case, the characteristics (color, perimeter texture, basic shape) are given a priori, while in the EQ case they would (presumably) be abstracted during subsequent ontology development.
Yes. They are implied by the subclass structure of PATO (and thus subject to change).
it might be worth experimenting with tag-driven ontology evolution, as in [1], where tags are associated to concepts in an ontology. [...] So the domain expert/knowledge engineer partnership is preserved, but with the domain expert role being replaced by collective wisdom from the community.
Are you aware of the "Fast, Cheap, and Out of Control" paper from Mark Wilkinson's group: Good et al. 2006. Fast, Cheap and Out of Control: A Zero Curation Model for Ontology Development. Pacific Symposium on Biocomputing 11: 128-139.
http://psb.stanford.edu/psb-online/proceedings/psb06/good.pdf
Cool, thanks. Looks like what they're describing is, essentially, the first VoCamp.
Joel.
-hilmar
=========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : ===========================================================