RDF query and inference in a distributed environment

Tue Jan 3 13:48:17 CET 2006

Dear Ricardo,

The issues you raise are the ones we are currently playing with at
Glasgow. It's still early days, but here are a few thoughts.

In terms of distributed queries, languages such as SPARQL support
querying remote sources (for an introduction see Leigh Dodds's article
http://www.xml.com/pub/a/2005/11/16/introducing-sparql-querying-semantic-web-tutorial.html).
SPARQL has its own query protocol using web services. I've not played
with this much yet, but it looks promising. Basically, your query can
contain the URL of a resource and the SPARQL processor will query it.
How this will scale across multiple sources I'm not sure.
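
To make the "query contains the URL of a resource" idea concrete, here is
a minimal sketch in Python. It only builds the request; it does not send
it. The endpoint and data URLs are hypothetical placeholders, and I am
assuming an endpoint that accepts the query as a "query" parameter on a
GET request, which is how the SPARQL protocol passes it over HTTP.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and data source, for illustration only.
ENDPOINT = "http://example.org/sparql"
DATA_URL = "http://example.org/names.rdf"

# The FROM clause names the remote RDF document the processor should query.
QUERY = "\n".join([
    "PREFIX dc: <http://purl.org/dc/elements/1.1/>",
    "SELECT ?resource ?title",
    "FROM <" + DATA_URL + ">",
    "WHERE { ?resource dc:title ?title }",
])


def build_sparql_request(endpoint, query):
    """Return a GET URL that carries `query` in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_request(ENDPOINT, QUERY)
print(url)
```

Querying several sources would mean either several FROM clauses in one
query or one request per endpoint, which is where the scaling question
comes in.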

Another tool to consider is TAP (http://tap.stanford.edu/).

In terms of discovery, this is a separate issue in some ways. I think
an obvious thing to look at is BioMoby. A lightweight version of this
would be to have an RDF source that lists data sources and describes
what kinds of information they contain. A bit like GBIF's UDDI registry,
but actually useful ;-)
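
As a rough illustration of such a registry, the sketch below holds a few
triples as Python tuples and looks up which sources say they provide a
given kind of information. The vocabulary (the "provides" and "endpoint"
properties, the "TaxonName" and "Specimen" classes) and all the URLs are
made up for the example; a real registry would need an agreed vocabulary.

```python
# Hypothetical registry namespace; not an existing vocabulary.
REG = "http://example.org/registry#"

# Each triple is (subject, predicate, object).
registry = {
    ("http://example.org/sources/names", REG + "provides", REG + "TaxonName"),
    ("http://example.org/sources/names", REG + "endpoint",
     "http://example.org/names/sparql"),
    ("http://example.org/sources/specimens", REG + "provides", REG + "Specimen"),
    ("http://example.org/sources/checklist", REG + "provides", REG + "TaxonName"),
}


def sources_providing(triples, kind):
    """Return, sorted, the sources that declare they provide `kind`."""
    return sorted(
        s for (s, p, o) in triples
        if p == REG + "provides" and o == kind
    )


for source in sources_providing(registry, REG + "TaxonName"):
    print(source)
```

A client could then send its queries only to the sources the registry
returns, rather than to every source it knows about.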

Long term, what I think might happen is that users will have their own
triple stores. As they run queries, the results get added to their own
store, and they can make locally whatever inferences they are
interested in. MIT's Piggy Bank project
(http://simile.mit.edu/piggy-bank/) is an example of this sort of
approach.
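
Here is a toy sketch of that workflow, not how Piggy Bank or any real
triple store does it. Triples are plain tuples, merging is set union
(which is only safe when the graphs share no blank node labels), and the
single inference rule applied is the RDFS subclass rule: if x has type C
and C is a subclass of D, then x has type D. The "ex:" names are invented
for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def merge(store, results):
    """Add query results to the local store (set union of triples)."""
    return store | results


def infer_types(store):
    """Apply the subclass rule repeatedly until no new triples appear."""
    inferred = set(store)
    changed = True
    while changed:
        changed = False
        subclass = {(s, o) for (s, p, o) in inferred if p == SUBCLASS_OF}
        # Iterate over a copy so that new triples can be added safely.
        for (s, p, o) in list(inferred):
            if p != RDF_TYPE:
                continue
            for (c, d) in subclass:
                if c == o and (s, RDF_TYPE, d) not in inferred:
                    inferred.add((s, RDF_TYPE, d))
                    changed = True
    return inferred


# What the user already holds locally.
local_store = {
    ("ex:Species", SUBCLASS_OF, "ex:Taxon"),
    ("ex:Taxon", SUBCLASS_OF, "ex:Concept"),
}

# What a remote query returned.
query_results = {("ex:name1", RDF_TYPE, "ex:Species")}

local_store = infer_types(merge(local_store, query_results))
print(("ex:name1", RDF_TYPE, "ex:Concept") in local_store)
```

The point is that the inference runs over the merged graph on the
user's machine, so no remote source has to support it.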

Regards

Rod

On 3 Jan 2006, at 13:24, Ricardo Scachetti Pereira wrote:

>    Dear Rod,
>
>    Thanks for putting those ideas together and sharing them with the
> group. We really needed some more concrete examples to explore the use
> of LSID and RDF in the GUID metadata. It definitely sounds like a good
> combination of technologies to explore.
>
>    Reading your manuscript one question kept coming to my mind. The
> most important benefits of using RDF seem to be the graph
> representation, the possibility of merging separate graphs into larger
> ones (the merge operation), and the possibility of making inferences on
> the resulting graph. All those features are supported by the so-called
> triple stores.
>
>    However, the triple store examples I come across always use a local
> and centralized architecture. I'm aware that those are simplifications
> needed to get the new concepts through, but I suppose we need to
> address the scalability issues of those technologies if we want to use
> them in a production environment.
>
>    So what I wanted to explore a little in this list is the following
> issue: How do the RDF inference mechanisms scale up to a distributed
> environment like ours? In my opinion, GUID technologies provide only a
> bit (although a fundamental bit) of the infrastructure needed to make
> the RDF metadata framework fully functional, i.e., they provide the
> function: get me all metadata for this ID (besides persistence and
> maybe provenance). I suppose that other pieces of functionality will be
> required, such as: i) other metadata discovery mechanisms (to find out,
> for example, who has metadata using this or that ontology); ii) a more
> flexible search mechanism (such as RDQL or SPARQL). In other words, we
> might need an RDF-compatible query protocol.
>
>    Then it makes a lot of sense to have "aggregators" all around (such
> as GBIF or others) harvesting and merging RDF triples of interest into
> local triple stores and providing query and inference services, and
> providing links (for humans and machines) back to the information
> source.
>
>    Does anyone have an idea of where SPARQL or related technologies
> stand right now in terms of supporting this kind of functionality?
>
>    Cheers,
>
> Ricardo
>
>
> Roderic Page wrote:
>
>> I've finally managed to put down some ideas on LSIDs, metadata, and
>> taxonomic names in the form of a manuscript. It's a bit rough, but I
>> thought members of this list might find it of interest. You can grab a
>> PDF here:
>> http://darwin.zoology.gla.ac.uk/~rpage/lsid/examples/lsid.pdf
>>
>> The web site http://darwin.zoology.gla.ac.uk/~rpage/lsid/examples has
>> links to some example RDF files that are mentioned in the manuscript
>> (where they are trimmed to save space).
>>
>> I'd welcome any comments.
>>
>> Regards
>>
>> Rod
>>
>>
>>
>> ----------------------------------------------------------------------
>> Professor Roderic D. M. Page
>> Editor, Systematic Biology
>> DEEB, IBLS
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QP
>> United Kingdom
>>
>> Phone:    +44 141 330 4778
>> Fax:      +44 141 330 2792
>> email:    r.page at bio.gla.ac.uk
>> web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>>
>> Subscribe to Systematic Biology through the Society of Systematic
>> Biologists Website:  http://systematicbiology.org
>> Search for taxon names at
>> http://darwin.zoology.gla.ac.uk/~rpage/portal/
>> Find out what we know about a species at http://ispecies.org
>>
>