Re: RDF query and inference in a distributed environment

3 Jan 2006

Dear Ricardo,

The issues you raise are the ones we are currently playing with at
Glasgow. It's still early days, but here are a few thoughts.

In terms of distributed queries, languages such as SPARQL support
querying remote sources (for an introduction see Leigh Dodds's article
http://www.xml.com/pub/a/2005/11/16/introducing-sparql-querying-semantic-web-tutorial.html).
SPARQL also has its own protocol for sending queries to web services.
I've not played with this much yet, but it looks promising. Basically,
your query can name the URL of a remote RDF document (for example, in a
FROM clause) and a SPARQL engine can retrieve and query it. How this
will scale across multiple sources I'm not sure.
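As a rough illustration of the protocol side, here is a minimal Python sketch (standard library only) that encodes a query as an HTTP GET request in the style of the SPARQL Protocol, naming two remote sources through repeated default-graph-uri parameters. The endpoint and data URLs are hypothetical placeholders, and whether a given service honours default-graph-uri, or is willing to fetch remote documents at all, depends on the implementation. The sketch only builds the request URL; it does not send it.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and data sources, for illustration only.
ENDPOINT = "http://example.org/sparql"
SOURCES = [
    "http://example.org/data/names.rdf",
    "http://example.org/data/specimens.rdf",
]

# Ask for every Dublin Core title found in the chosen graphs.
QUERY = """PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?s ?title
WHERE { ?s dc:title ?title }"""


def build_query_url(endpoint, query, default_graphs=()):
    """Return a GET URL carrying `query` plus one default-graph-uri per source."""
    params = [("query", query)]
    # The parameter may be repeated; the dataset's default graph is then
    # the RDF merge of all the listed documents.
    params += [("default-graph-uri", g) for g in default_graphs]
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url(ENDPOINT, QUERY, SOURCES))
```

Scaling to many sources is then largely a question of how the service fetches and merges those documents, which is exactly the open issue.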

Another tool to consider is TAP (http://tap.stanford.edu/).

Discovery is, in some ways, a separate issue. An obvious thing to look
at is BioMoby. A lightweight version of this would be an RDF source that
lists data sources and describes what kinds of information they contain.
A bit like GBIF's UDDI registry, but actually useful ;-)
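To make the lightweight registry idea a little more concrete, here is a minimal Python sketch that treats the registry as a small set of subject-predicate-object triples and answers two questions: which sources claim to use a given vocabulary, and where each source's query service lives. The ex: predicates, vocabulary names and example.org URLs are all hypothetical; a real registry would publish this as RDF using agreed-upon URIs.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    s: str  # subject: the data source being described
    p: str  # predicate: a (hypothetical) registry property
    o: str  # object: the value of that property


# Hypothetical registry content describing two data sources.
REGISTRY = {
    Triple("http://example.org/sourceA", "ex:usesVocabulary", "ex:TaxonNames"),
    Triple("http://example.org/sourceA", "ex:queryService", "http://example.org/sourceA/sparql"),
    Triple("http://example.org/sourceB", "ex:usesVocabulary", "ex:Specimens"),
    Triple("http://example.org/sourceB", "ex:usesVocabulary", "ex:TaxonNames"),
    Triple("http://example.org/sourceB", "ex:queryService", "http://example.org/sourceB/sparql"),
}


def sources_using(registry, vocabulary):
    """Sorted list of sources whose description says they use `vocabulary`."""
    return sorted(t.s for t in registry
                  if t.p == "ex:usesVocabulary" and t.o == vocabulary)


def query_service(registry, source) -> Optional[str]:
    """Query-service URL advertised for `source`, or None if none is listed."""
    for t in registry:
        if t.s == source and t.p == "ex:queryService":
            return t.o
    return None


if __name__ == "__main__":
    for src in sources_using(REGISTRY, "ex:TaxonNames"):
        print(src, "->", query_service(REGISTRY, src))
```

Because the registry is itself just triples, it could be harvested and queried with the same machinery as any other RDF source.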

In the long term, what I think might happen is that users will have their
own triple stores; as they run queries, the results get added to their
own store, and they can make whatever local inferences they are
interested in. MIT's Piggy Bank project
(http://simile.mit.edu/piggy-bank/) is an example of this sort of
approach.
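The personal-triple-store idea can be sketched in a few lines of Python. This is a toy, not how Piggy Bank or any real triple store works: triples are plain string tuples, merging query results is set union (a valid RDF merge only when no blank nodes are involved; blank nodes would first need renaming apart), and inference is forward chaining of just two RDFS rules, transitivity of rdfs:subClassOf and type inheritance through it, until nothing new is derived. The ex: names are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


class LocalStore:
    """A toy personal triple store that accumulates query results."""

    def __init__(self):
        self.triples = set()

    def add_results(self, triples):
        # Merge by set union; duplicates from different queries collapse.
        self.triples |= set(triples)

    def infer(self):
        """Apply two RDFS rules repeatedly until no new triples appear."""
        while True:
            subclass = [(s, o) for s, p, o in self.triples if p == SUBCLASS_OF]
            new = set()
            # rdfs11: A subClassOf B and B subClassOf C => A subClassOf C
            for a, b in subclass:
                for c, d in subclass:
                    if b == c:
                        new.add((a, SUBCLASS_OF, d))
            # rdfs9: x type A and A subClassOf B => x type B
            for s, p, o in self.triples:
                if p == RDF_TYPE:
                    for a, b in subclass:
                        if o == a:
                            new.add((s, RDF_TYPE, b))
            if new <= self.triples:
                return self.triples
            self.triples |= new


if __name__ == "__main__":
    store = LocalStore()
    # Results of two separate (hypothetical) remote queries.
    store.add_results({("ex:Quercus_robur", RDF_TYPE, "ex:Species")})
    store.add_results({("ex:Species", SUBCLASS_OF, "ex:Taxon"),
                       ("ex:Taxon", SUBCLASS_OF, "ex:Concept")})
    store.infer()
    print(("ex:Quercus_robur", RDF_TYPE, "ex:Concept") in store.triples)
```

Neither query alone says that Quercus robur is an ex:Concept; that fact only appears once both result sets sit in the same local store.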

Regards

Rod

On 3 Jan 2006, at 13:24, Ricardo Scachetti Pereira wrote:
...
Dear Rod,
Thanks for putting those ideas together and sharing them with the
group. We really needed some more concrete examples to explore the use
of LSID and RDF in the GUID metadata. It definitely sounds like a good
combination of technologies to explore.
Reading your manuscript, one question kept coming to my mind. The
most important benefits of using RDF seem to be the graph
representation, the possibility of merging separate graphs into larger
ones (the merge operation), and the possibility of making inferences on
the resulting graph. All those features are supported by the so called
triple stores.
However, the triple store examples I come across always use a local,
centralized architecture. I'm aware that those are simplifications
needed to get the new concepts across, but I suppose we need to address
the scalability issues of those technologies if we want to use them in
a production environment.
So what I would like to explore a little on this list is the following:
How do RDF inference mechanisms scale up to a distributed
environment like ours? In my opinion, GUID technologies provide only a
bit (although a fundamental bit) of the infrastructure needed to make
the RDF metadata framework fully functional, i.e., they provide the
function "get me all metadata for this ID" (besides persistence and
maybe provenance). I suppose that other pieces of functionality will be
required, such as: i) other metadata discovery mechanisms (to find out,
for example, who has metadata using this or that ontology); ii) a more
flexible search mechanism (such as RDQL or SPARQL). In other words, we
might need an RDF-compatible query protocol.
Then it makes a lot of sense to have "aggregators" all around (such
as GBIF or others) harvesting and merging RDF triples of interest into
local triple stores and providing query and inference services, and
providing links (for humans and machines) back to the information
source.
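One simple way an aggregator could keep its links back to the information source is to record, for every harvested triple, which source it came from. The Python sketch below does this with a dictionary from triple to source URLs and answers simple triple-pattern queries with their provenance attached. It is illustrative only: the LSID, predicates and source URLs are hypothetical, and a production aggregator would more likely use named graphs in a real triple store.

```python
from collections import defaultdict


class Aggregator:
    """Toy aggregator that remembers where each harvested triple came from."""

    def __init__(self):
        # triple -> set of source URLs it was harvested from
        self.provenance = defaultdict(set)

    def harvest(self, source_url, triples):
        for triple in triples:
            self.provenance[triple].add(source_url)

    def match(self, s=None, p=None, o=None):
        """Return sorted (triple, sources) pairs; None acts as a wildcard."""
        hits = []
        for (ts, tp, to), sources in self.provenance.items():
            if ((s is None or s == ts) and (p is None or p == tp)
                    and (o is None or o == to)):
                hits.append(((ts, tp, to), sorted(sources)))
        return sorted(hits)


if __name__ == "__main__":
    agg = Aggregator()
    name = "urn:lsid:example.org:names:1"  # hypothetical LSID
    agg.harvest("http://example.org/sourceA",
                [(name, "ex:nameString", "Quercus robur")])
    agg.harvest("http://example.org/sourceB",
                [(name, "ex:nameString", "Quercus robur"),
                 (name, "ex:publishedIn", "ex:Pub42")])
    for triple, sources in agg.match(s=name):
        print(triple, sources)
```

The same statement harvested from two sources is stored once but keeps both origins, so every answer can point users and machines back to where it came from.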
Does anyone have an idea of how well SPARQL or related technologies
currently support this kind of functionality?
Cheers,
Ricardo
Roderic Page wrote:
...
I've finally managed to put down some ideas on LSIDs, metadata, and
taxonomic names in the form of a manuscript. It's a bit rough, but I
thought members of this list might find it of interest. You can grab a
PDF here:
http://darwin.zoology.gla.ac.uk/~rpage/lsid/examples/lsid.pdf
The web site http://darwin.zoology.gla.ac.uk/~rpage/lsid/examples has
links to some example RDF files that are mentioned in the manuscript
(where they are trimmed to save space).
I'd welcome any comments.
Regards
Rod
----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page@bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names at
http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org

