On 4 Jan 2006, at 13:23, Sally Hinchcliffe wrote:
Species 2000 uses a federated data model. As to whether that counts as 'large scale' or not I don't know. I believe that even getting permission for a data cache from the various providers was a painful exercise. Sally
I think there is a continuum of possibilities:
1. Pure distributed - query always sent to remote sources, no data ever held locally
2. Distributed with cache - results from a user query are cached (either for a limited time, or until source sends a message saying its data has been updated).
3. Distributed with partial local copy - some information harvested from sources stored locally (e.g., metadata), detailed information only held by sources.
4. Not distributed (data warehouse) - all data from distributed sources held locally (harvested), with periodic updates.
My own Taxonomic Search Engine belongs in category 2 (it has a 24 hour cache for queries made by users). I'm guessing GBIF is category 3 (or is it 4? I can get full details from GBIF as well as from the remote source).
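To make category 2 concrete, here is a minimal Python sketch of a query cache with a 24 hour lifetime sitting in front of live remote sources. It is not the Taxonomic Search Engine's actual code: the class name, the `sources` mapping of fetch functions, and the `invalidate` hook (standing in for a source's "data has been updated" message) are all made up for illustration.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as described above


class CachedFederatedSearch:
    """Category 2: queries go to remote sources, recent results are reused."""

    def __init__(
        self,
        sources: Dict[str, Callable[[str], List[str]]],  # hypothetical fetchers
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.sources = sources
        self.ttl = ttl
        self.clock = clock
        # query string -> (time stored, merged results)
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def search(self, query: str) -> List[str]:
        now = self.clock()
        hit = self._cache.get(query)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: no remote traffic at all

        results: List[str] = []
        for fetch in self.sources.values():
            try:
                results.extend(fetch(query))
            except Exception:
                # A source being offline should not sink the whole query.
                continue
        self._cache[query] = (now, results)
        return results

    def invalidate(self, query: Optional[str] = None) -> None:
        """Drop one cached query, or everything if no query is given."""
        if query is None:
            self._cache.clear()
        else:
            self._cache.pop(query, None)
```

Only queries that users have actually made end up in the cache, so the first person to ask still pays the full cost of a live distributed search.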
I'm not totally sure where Species 2000's SPICE fits in. They certainly use a cache, and my feeling is that they cache more than just queries made by individual users (if the speed of response is anything to go by).
From 1 to 4 we trade off several things. 1 gives us the freshest data, at the cost of speed (querying multiple sources takes time, sources can go offline, etc.). 4 gives us maximum power (we can do queries that are hard if not impossible in a distributed environment, such as finding how many names are common to two data sources) at the cost of possibly being out of date.
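As a rough illustration of why category 4 is the powerful end, the "names in common" query is a one-line set intersection once both name lists are held locally. The sketch below is Python; the function names and the simple whitespace and case normalisation are my own assumptions, and real name matching would need to handle authorities, spelling variants and so on.

```python
from typing import Iterable, Set


def normalise(name: str) -> str:
    """Collapse runs of whitespace and ignore case (a deliberately crude rule)."""
    return " ".join(name.split()).casefold()


def names_in_common(source_a: Iterable[str], source_b: Iterable[str]) -> Set[str]:
    """Return the normalised names that appear in both harvested lists."""
    return {normalise(n) for n in source_a} & {normalise(n) for n in source_b}


# Toy data standing in for two harvested sources
a = ["Homo sapiens", "Pan  troglodytes"]
b = ["pan troglodytes", "Gorilla gorilla"]
print(len(names_in_common(a, b)))
```

In a type 1 or 2 system the same question needs every name from both sources to cross the network first, which is exactly the harvesting step that turns the system into type 3 or 4.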
My comment about the lack of large scale federated systems was really about systems of type 1 and 2, where essentially every search query is live. If the search is done on a local copy of the data, or on a persistent cache, then it's not really distributed (IMHO).
Regards
Rod
------------------------------------------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org