RDF query and inference in a distributed environment

Roderic Page r.page at BIO.GLA.AC.UK
Wed Jan 4 15:25:10 CET 2006


On 4 Jan 2006, at 13:23, Sally Hinchcliffe wrote:

> Species 2000 uses a federated data model. As to whether that counts as
>  'large scale'  or not I don't know. I believe that even getting
> permission for a data cache from the various providers was a painful
> exercise.
> Sally

I think there is a continuum of possibilities:

1. Pure distributed - query always sent to remote sources, no data ever
held locally

2. Distributed with cache - results from a user query are cached
(either for a limited time, or until source sends a message saying its
data has been updated).

3. Distributed with partial local copy - some information harvested
from sources stored locally (e.g., metadata), detailed information only
held by sources.

4. Not distributed (data warehouse) - all data from distributed sources
held locally (harvested), with periodic updates.

My own Taxonomic Search Engine belongs in category 2 (it has a 24-hour
cache for queries made by users). I'm guessing GBIF is category 3 (or
is it 4? I can get full details from GBIF as well as from the remote
source).
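A category-2 cache like this can be sketched as follows. This is a minimal illustration, not how the Taxonomic Search Engine is actually implemented. It assumes each remote source can be modelled as a callable that takes a query string and returns a list of names, and it supports both expiry rules from the list above: a fixed time-to-live (24 hours by default) and explicit invalidation when a source announces that its data has changed. The names `QueryCache`, `query`, and `invalidate` are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical model of a remote source: query string in, list of names out.
Source = Callable[[str], List[str]]


class QueryCache:
    """Sketch of a category-2 system: live queries, with results cached for a TTL."""

    def __init__(self, sources: Dict[str, Source],
                 ttl_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self.sources = sources
        self.ttl = ttl_seconds
        self.clock = clock
        # (source name, query) -> (time fetched, results)
        self._store: Dict[Tuple[str, str], Tuple[float, List[str]]] = {}

    def query(self, q: str) -> Dict[str, List[str]]:
        results: Dict[str, List[str]] = {}
        now = self.clock()
        for name, fetch in self.sources.items():
            key = (name, q)
            hit = self._store.get(key)
            if hit is not None and now - hit[0] < self.ttl:
                # Cached result is still within its time-to-live: serve it locally.
                results[name] = hit[1]
            else:
                # Missing or expired: send the query to the remote source.
                data = fetch(q)
                self._store[key] = (now, data)
                results[name] = data
        return results

    def invalidate(self, source_name: str) -> None:
        """Drop cached results for one source, e.g. when it reports an update."""
        for key in [k for k in self._store if k[0] == source_name]:
            del self._store[key]
```

Only queries that users have actually made are ever stored, so anything nobody has asked for still has to be fetched live. That is the line between category 2 and category 3.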

I'm not totally sure where Species 2000's SPICE fits in. They certainly
use a cache, and my feeling is that they cache more than just the queries
made by individual users (if the speed of response is anything to go by).

Moving from 1 to 4 we trade off several things. 1 gives us the freshest
data, at the cost of speed (querying multiple sources takes time, sources
can go offline, etc.). 4 gives us maximum power (we can do queries that
are hard if not impossible in a distributed environment, such as finding
how many names two data sources have in common), at the cost of possibly
being out of date.
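To illustrate why the "names in common" query is easy with a local copy: once both sources' name lists have been harvested, it reduces to a set intersection. The sketch below assumes the lists are already held locally as plain strings. The function name `names_in_common` and the whitespace-only normalization are hypothetical choices, not anyone's actual matching rules. In a purely distributed system the same query would mean pulling each source's complete name list at query time.

```python
from typing import Iterable


def names_in_common(source_a: Iterable[str], source_b: Iterable[str]) -> int:
    """Count distinct names present in both locally held name lists."""

    def norm(name: str) -> str:
        # Hypothetical normalization: trim and collapse runs of whitespace.
        return " ".join(name.split())

    return len({norm(n) for n in source_a} & {norm(n) for n in source_b})


# "Homo sapiens" appears in both lists (once with a double space), so this prints 1.
print(names_in_common(["Homo sapiens", "Pan troglodytes"],
                      ["Homo  sapiens", "Gorilla gorilla"]))
```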

My comment about the lack of large-scale federated systems was really
about systems of types 1 and 2, where essentially every search query is
live. If the search is done on a local copy of the data, or on a
persistent cache, then it's not really distributed (IMHO).

Regards

Rod






----------------------------------------------------------------------------------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page at bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org




More information about the tdwg-tag mailing list