Fwd: LSIDs and ontology segmentation
I'm not sure how many of you follow the mailing list public-semweb-lifesci@w3.org, but here is an interesting recent posting regarding LSIDs and ontology documents. This group is currently in a similar situation to ours, debating the best approach for GUIDs and sorting out ontologies. It includes the BioRDF group that Suzie Stevens spoke of at the RDF workshop in Edinburgh (for those of you who attended). They are looking at LSIDs but, as always, there are people for and against the technology. The thread started by the following post goes on with some interesting arguments for and against using LSIDs as identifiers for "ontology fragments". You can see more at http://lists.w3.org/Archives/Public/public-semweb-lifesci/2006Jul/0076.html.
Kevin
>>> Mark Wilkinson <markw@illuminae.com> 14/07/2006 5:01:33 a.m. >>>
Hi all,
I chatted with Sean Martin yesterday and he indicated that, on the last SWLS teleconference, he mentioned one of the ideas that my lab and his group have been tossing around for the past few weeks vis-à-vis LSIDs representing ontology nodes. He asked me to fill in some more detail in a message to this mailing list.
In a publication that will be available soon [1] we (briefly) discuss the problem of actually *using* the currently available ontologies in a "real" Semantic Web setting - i.e. dynamically downloading whatever ontologies are necessary given the predicates that you find in some discovered RDF instance document. The OWL representation of GO is over 10 MB... for heaven's sake!... and GO is a small ontology compared to things like the NCI Metathesaurus.
The problem with using document#fragment URLs to identify ontology nodes is that the defined behaviour for resolving such an identifier is to drop the fragment (since that isn't available server-side anyway) and to return the entire document... all 10 MB of GO... each time... We would argue, therefore, that the URL (if you adopt its default behaviour) is not only a bit of a nuisance, it is a blocker in some/many cases.
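As a minimal illustration of the fragment-dropping behaviour described above, the following Python sketch splits a document#fragment identifier the way a client does before issuing the HTTP request. The URL is a hypothetical placeholder, not the real location of the GO OWL file.

```python
from urllib.parse import urldefrag

# Hypothetical document#fragment identifier for a single ontology node.
node_uri = "http://example.org/ontologies/go.owl#GO_0008150"

# The client separates the fragment from the rest of the URL; only the
# document part is ever sent to the server.
document_url, fragment = urldefrag(node_uri)

print(document_url)  # the part the server is asked for: the whole document
print(fragment)      # the node name, which stays on the client side
```

Because the server only ever sees the document part, it has no way of knowing which node was wanted and can only return the full file.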
There's been some exciting work in the domain of ontology segmentation [2,3,4,5] that, we believe, is perhaps a more rational way of working with these massive ontologies when you need to get on-the-fly access to only the portions of the ontology that are relevant to your BlackBerry's agent at that moment. I know that others (e.g. Damian Gessler and collaborators at NCGR, but I don't have the reference to his submitted manuscript at hand right now... sorry Damian!) are also working on the problem of segmentation by passing a self-inflating "flattened" ontology fragment. The problem is that there is no Semantic Web-style protocol available to specify that this is the behaviour you want, or for the agent to know that this is the behaviour to expect. Some of these projects are setting up the ontology fragment-generator as a Web Service (if I recall correctly, Rector's group does this [4]); however, this doesn't solve the SW problem either, because we can't (easily) model a Web Service invocation as a single URI (at least, not by any existing standard or convention... I guess some long REST-style URLs could do this...)
Here is where I think the LSID could really shine! Unlike a URL, the LSID does not have to return an entire document in response to a getMetadata call. Thus, if an LSID were used as the identifier for an ontology node, the behaviour of the getMetadata call could be, by convention or by standard, to return only the relevant ontology fragment, where that fragment was generated by e.g. the Rector Segmentation generator in the background.
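The idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy ontology, the authority "example.org", the namespace "toy" and the Python function get_metadata are all hypothetical stand-ins, and the "segment" here is a plain walk up the parent links rather than the Rector segmentation algorithm or a real LSID resolution service. The only thing taken from the LSID scheme itself is the identifier layout urn:lsid:authority:namespace:object[:revision].

```python
from typing import Dict, List, NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(lsid: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    return LSID(*parts[2:])


# Hypothetical toy ontology: each term maps to its direct parents.
TOY_PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["A"],  # unrelated to C, so it should not appear in C's segment
}


def get_metadata(lsid: str) -> Dict[str, List[str]]:
    """Hypothetical stand-in for getMetadata: return only the node named by
    the LSID plus its ancestors, instead of the whole ontology."""
    node = parse_lsid(lsid).object_id
    if node not in TOY_PARENTS:
        raise KeyError(node)
    segment: Dict[str, List[str]] = {}
    stack = [node]
    while stack:
        term = stack.pop()
        if term in segment:
            continue  # already collected via another path
        segment[term] = TOY_PARENTS[term]
        stack.extend(TOY_PARENTS[term])
    return segment


segment = get_metadata("urn:lsid:example.org:toy:C")
print(sorted(segment))  # terms in the returned fragment
```

The point of the sketch is that the identifier itself carries the node name to the resolver, so the response can be limited to the part of the ontology relevant to that node.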
These were just early thoughts we've been having, but Sean asked me to share them with the group in hopes of fanning the flames of discussion and debate. It seems to me to be a "blocker" issue when it comes to deploying SW applications in the wild, and I know that projects like Damian's Semantic MOBY have hit this problem early and hard, as have I in my own sandbox. It's all well and good when we play SW on our own local machine, but as soon as we try to play SW in the wi(l)der world this problem cripples us almost instantly. We think the LSID is (a/the) solution to this problem, but no solution will be useful if it doesn't have wider adoption, so...
opinions?
Cheers all!
Mark
[1] Good, B, Wilkinson, M. (in press). The Life Sciences Semantic Web is Full of Creeps! Briefings in Bioinformatics.
[2] Noy, N, Musen, M. Specifying Ontology Views by Traversal. 2004.
[3] Alani, H, Harris, S, O'Neil, B. Ontology Winnowing: A Case Study on the AKT Reference Ontology. 2005.
[4] Seidenberg, J, Rector, A (2006), 'Web Ontology Segmentation: Analysis, Classification and Use', World Wide Web, ACM, Edinburgh, Scotland.
[5] Stuckenschmidt, H, Klein, M. Structure-Based Partitioning of Large Concept Hierarchies. 2004.