Dave,
I think you are right. Whatever strategy is used, there has to be an element of push in it. The production database has to push data to the publicly visible database that can then be scraped/searched by interested parties. The difference between pushing data to a public database that is managed by your own institution and one that is managed by a third party is really quite minor.
I am concerned because I am not sure of the technical facilities and human resources of potential data suppliers.
I wonder if anyone has some figures on this stuff?
All the best,
Roger.
On 11 May 2007, at 12:02, Dave Vieglais wrote:
Hi Everyone, Not really a TAPIR-specific response, but perhaps the right audience. I'm probably stating the obvious, but the simplest way to get around the hassles of running a server and the associated firewall headaches is not to serve the data but instead to push it. With an authentication layer added, the GBIF REST services could be extended to allow POSTing data rather than just GET (I may be wrong on this - not exactly sure what degree of REST implementation has been done by GBIF). Add in DELETE and UPDATE, and instead of GBIF running harvesters to capture data, contributors could simply push their data when necessary. This would, I expect, be an attractive solution for those data providers that would prefer not to operate servers but would still like to contribute to the global knowledge pool of biodiversity.
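Just to make that concrete, here is a rough sketch of what a push from the provider's side might look like. The endpoint URL, credentials and record fields are all invented for illustration - as far as I know no such service exists yet:

```python
# Rough sketch of provider-side pushes to a GBIF-like REST service.
# The URL, credentials and record fields below are invented for
# illustration only - no such endpoint exists at the moment.
import requests

ENDPOINT = "https://registry.example.org/occurrences"   # hypothetical
AUTH = ("provider-id", "secret")                         # hypothetical

record = {
    "institutionCode": "XYZ",
    "catalogNumber": "1234567",
    "scientificName": "Puma concolor",
    "decimalLatitude": 51.5,
    "decimalLongitude": -0.12,
}

# POST a new record to the central service...
resp = requests.post(ENDPOINT, json=record, auth=AUTH)
resp.raise_for_status()

# ...PUT an updated version of the same record...
record["decimalLatitude"] = 51.51
resp = requests.put(ENDPOINT + "/XYZ/1234567", json=record, auth=AUTH)
resp.raise_for_status()

# ...and DELETE it again, keeping the provider in control of distribution.
resp = requests.delete(ENDPOINT + "/XYZ/1234567", auth=AUTH)
resp.raise_for_status()
```

The key point is that the provider only ever makes outbound requests, so nothing has to pass inwards through the institutional firewall.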
More than likely a mixed model would be ideal - with some data sources acting as servers and others pushing their data. How about if those institutions that were comfortable running and maintaining servers also adopted the same complete REST implementation as the hypothetical GBIF mentioned above? And what if the servers were, for the most part, aware of each other and so could act as proxies or mirrors for the other servers (perhaps even automatically replicating content)? The end result would be more of a mesh topology with comparatively high reliability and availability.
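Roughly, each node in such a mesh would just poll its peers for changes and fold them into its own copy - something along these lines (the peer URLs, the "/changes" endpoint and the change-feed format are pure invention here, just to illustrate the topology):

```python
# Sketch of the mirroring idea: each node periodically asks its peers for
# records changed since the last sync and copies them into its own store.
# Peer URLs, the "/changes" endpoint and the change-feed format are made up.
import requests

PEERS = ["https://node-a.example.org", "https://node-b.example.org"]
local_store = {}                      # record id -> record (stand-in for a DB)
last_sync = "2007-05-01T00:00:00Z"    # persisted between runs in practice

for peer in PEERS:
    resp = requests.get(peer + "/changes", params={"since": last_sync})
    resp.raise_for_status()
    for change in resp.json():
        if change.get("deleted"):
            # Honour deletions propagated from the originating provider.
            local_store.pop(change["id"], None)
        else:
            local_store[change["id"]] = change["record"]
```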
It would, I think, be a more scalable solution, and perhaps more maintainable in the long term, since it is relatively simple to update a standalone application that pushes the data compared with updating and securing a server. The expense of participating in the networks would drop, and resources could be directed towards operating a few high-quality / high-reliability services for accessing the data.
Just a thought. There are obvious social implications, such as the perception of losing control of one's data - but then it could also be argued that if a provider had the ability to DELETE their records from a server, they would actually have more control over the distribution of their data than they do now.
cheers, Dave V.
On May 11, 2007, at 19:19, Roger Hyam wrote:
Hi Everyone,
There is a requirement that all wrapper-type applications (TAPIR, DiGIR, BioCASe and others) have but that I don't think we address.
All instances need to have:
Either a database on a server in a DMZ or with an ISP, plus the ability to export data from the production database to the public database and then keep changes in the production database synchronized with the public copy (a rough sketch of that synchronisation step follows below).
Or the ability to provide a secured/restricted connection directly to the production database through the firewall.
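For the first option, the synchronisation step typically amounts to something like the following very rough sketch - the database files, table and column names and timestamp handling are made up purely for illustration:

```python
# Very rough sketch of keeping a public copy in step with production,
# assuming the table carries a last_modified timestamp. The schema and
# file names are invented; a real setup would use its own schema and
# scheduling (e.g. a nightly cron job).
import sqlite3

prod = sqlite3.connect("production.db")   # stand-in for the production DB
public = sqlite3.connect("public.db")     # stand-in for the DMZ/ISP copy

schema = ("CREATE TABLE IF NOT EXISTS specimens "
          "(id TEXT PRIMARY KEY, scientific_name TEXT, "
          "locality TEXT, last_modified TEXT)")
prod.execute(schema)      # the real production table would already exist
public.execute(schema)

last_run = "2007-05-01 00:00:00"          # stored between runs in practice

# Copy across anything that changed in production since the last run.
changed = prod.execute(
    "SELECT id, scientific_name, locality, last_modified "
    "FROM specimens WHERE last_modified > ?",
    (last_run,),
).fetchall()

for row in changed:
    public.execute(
        "INSERT OR REPLACE INTO specimens "
        "(id, scientific_name, locality, last_modified) VALUES (?, ?, ?, ?)",
        row,
    )

public.commit()
```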
Configuring the wrapper software against a database seems a smaller problem than getting a handle on an up-to-date database to configure it against!
Should we have a recommended strategy or best practice for overcoming these problems? Do we have any figures on how they are overcome in the existing BioCASe and DiGIR networks?
Many thanks for your thoughts,
Roger