RDF query and inference in a distributed environment

Tue Jan 3 11:24:47 CET 2006

    Dear Rod,

    Thanks for putting those ideas together and sharing them with the
group. We really needed some more concrete examples to explore the use
of LSID and RDF in the GUID metadata. It definitely sounds like a good
combination of technologies to explore.

    Reading your manuscript, one question kept coming to my mind. The
most important benefits of using RDF seem to be the graph
representation, the possibility of merging separate graphs into larger
ones (the merge operation), and the possibility of making inferences on
the resulting graph. All those features are supported by the so-called
triple stores.
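
    To make the merge and inference steps concrete, here is a minimal
sketch in plain Python. It is not how any particular triple store works:
it assumes triples are simple (subject, predicate, object) tuples with
no blank nodes, so that merging is just set union, and it applies only
two RDFS rules (subclass transitivity and type propagation). The LSID
and the ex: class names are made up for illustration.

```python
# A graph is a set of (subject, predicate, object) tuples.
# All identifiers below are hypothetical examples.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def merge(*graphs):
    """Merge graphs by set union (valid only when no blank nodes are involved)."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged

def infer(graph):
    """Apply two RDFS rules until no new triples appear:
    A subClassOf B, B subClassOf C  =>  A subClassOf C
    x type A,       A subClassOf B  =>  x type B
    """
    closed = set(graph)
    while True:
        sub = [(s, o) for s, p, o in closed if p == SUBCLASS]
        new = set()
        for a, b in sub:
            for s, p, o in closed:
                if p == SUBCLASS and s == b:
                    new.add((a, SUBCLASS, o))
                if p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))
        if new <= closed:  # fixed point reached
            return closed
        closed |= new

# Two graphs from different (imaginary) sources.
names = {("urn:lsid:example.org:names:1", RDF_TYPE, "ex:SpeciesName")}
ontology = {
    ("ex:SpeciesName", SUBCLASS, "ex:TaxonName"),
    ("ex:TaxonName", SUBCLASS, "ex:Name"),
}
graph = infer(merge(names, ontology))
```

    Neither source states that the name is an ex:Name; that triple only
appears after the two graphs are merged and the rules are applied.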

    However, the triple store examples I come across always use a local
and centralized architecture. I'm aware that those are simplifications
needed to get the new concepts across, but I suppose we need to address
the scalability issues of those technologies if we want to use them in
a production environment.

    So what I want to explore a little on this list is the following
issue: How do the RDF inference mechanisms scale up to a distributed
environment like ours? In my opinion, GUID technologies provide only a
bit (although a fundamental bit) of the infrastructure needed to make
the RDF metadata framework fully functional, i.e., they provide the
function "get me all metadata for this ID" (besides persistence and
maybe provenance). I suppose that other pieces of functionality will be
required, such as: i) other metadata discovery mechanisms (to find out,
for example, who has metadata using this or that ontology); ii) a more
flexible search mechanism (such as RDQL or SPARQL). In other words, we
might need an RDF-compatible query protocol.
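
    As a rough illustration of the difference between these functions,
the sketch below uses plain Python sets of triples. It is not SPARQL or
RDQL, only a single-pattern matcher where None plays the role of a query
variable. The provider names, the LSIDs, and the dc: and tn: prefixes
are all assumptions made up for the example.

```python
def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None is a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

def providers_using(stores, namespace):
    """Discovery: which providers hold predicates from a given vocabulary?"""
    return sorted(
        name for name, g in stores.items()
        if any(pred.startswith(namespace) for _, pred, _ in g)
    )

# Hypothetical providers, each with its own local set of triples.
stores = {
    "provider-a": {
        ("urn:lsid:example.org:names:1", "dc:title", "Aus bus"),
        ("urn:lsid:example.org:names:1", "tn:rank", "species"),
    },
    "provider-b": {
        ("urn:lsid:example.org:names:2", "dc:title", "Cus dus"),
    },
}

# What a GUID resolver gives us: all metadata for one ID.
by_id = match(stores["provider-a"], s="urn:lsid:example.org:names:1")

# What it does not give us: discovery and search across providers.
tn_users = providers_using(stores, "tn:")
titles = [t for g in stores.values() for t in match(g, p="dc:title")]
```

    The last two calls have to visit every provider, which is exactly
the part that a shared query protocol would need to standardize.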

    Then it makes a lot of sense to have "aggregators" all around (such
as GBIF or others) harvesting and merging RDF triples of interest into
local triple stores and providing query and inference services, and
providing links (for humans and machines) back to the information source.
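
    A minimal sketch of such an aggregator, again in plain Python and
under simplifying assumptions: harvesting is reduced to handing over an
in-memory set of triples (no network access), and provenance is kept by
remembering which source URL each triple came from. The class name, the
URLs, and the data are hypothetical.

```python
from collections import defaultdict

class Aggregator:
    """Merges triples from several sources and remembers where each came from."""

    def __init__(self):
        # Maps each triple to the set of source URLs that asserted it.
        self.sources = defaultdict(set)

    def harvest(self, source_url, triples):
        """Add one provider's triples to the local store."""
        for t in triples:
            self.sources[t].add(source_url)

    def graph(self):
        """The merged graph, ready for local query and inference."""
        return set(self.sources)

    def provenance(self, triple):
        """Links back to the information source(s) for a triple."""
        return sorted(self.sources.get(triple, ()))

agg = Aggregator()
shared = ("urn:lsid:example.org:names:1", "dc:title", "Aus bus")
agg.harvest("http://provider-a.example.org/rdf", {shared})
agg.harvest("http://provider-b.example.org/rdf", {
    shared,
    ("urn:lsid:example.org:names:2", "dc:title", "Cus dus"),
})
```

    Here agg.graph() holds two distinct triples, and
agg.provenance(shared) lists both provider URLs, so a client can follow
any result back to where it was published.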

    Does anyone have an idea of where SPARQL or related technologies
stand right now in terms of supporting this kind of functionality?

    Cheers,

Ricardo

Roderic Page wrote:

> I've finally managed to put down some ideas on LSIDs, metadata, and
> taxonomic names in the form of a manuscript. It's a bit rough, but I
> thought members of this list might find it of interest. You can grab a
> PDF here: http://darwin.zoology.gla.ac.uk/~rpage/lsid/examples/lsid.pdf
>
> The web site http://darwin.zoology.gla.ac.uk/~rpage/lsid/examples has
> links to some example RDF files that are mentioned in the manuscript
> (where they are trimmed to save space).
>
> I'd welcome any comments.
>
> Regards
>
> Rod
>
> ------------------------------------------------------------------------
> Professor Roderic D. M. Page
> Editor, Systematic Biology
> DEEB, IBLS
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QP
> United Kingdom
>
> Phone:    +44 141 330 4778
> Fax:      +44 141 330 2792
> email:    r.page at bio.gla.ac.uk
> web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>
> Subscribe to Systematic Biology through the Society of Systematic
> Biologists Website:  http://systematicbiology.org
> Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
> Find out what we know about a species at http://ispecies.org
>

Ricardo Pereira
Software Engineer
International Working Group on Taxonomic Databases
ricardo at tdwg.org
http://www.tdwg.org/