Thanks Wouter,
How are you getting the data into the remote database? Have you designed your own service? Do you post spreadsheets to the server and get it to ingest them?
Roger
On 14 May 2007, at 13:12, Wouter Addink wrote:
Dave, we are currently following a kind of push strategy with the development of a checklist provider tool for GBIF. The tool converts data in spreadsheets or text files into a public, TCS-compatible database with a TAPIR service. That database can be at the local provider or directly at GBIF.
regards, Wouter
----- Original Message -----
From: "Roger Hyam" roger@tdwg.org
To: "Dave Vieglais" vieglais@ku.edu
Cc: tdwg-tapir@lists.tdwg.org
Sent: Monday, May 14, 2007 11:40 AM
Subject: Re: [tdwg-tapir] Hosting strategies
Dave,
I think you are right. Whatever strategy is used there has to be an element of push in it. The production database has to push data to the publicly visible database, which can then be scraped/searched by interested parties. The difference between pushing data to a public database that is managed by your own institution and one that is managed by a third party is really quite minor.
I am concerned because I am not sure of the technical facilities and human resources of potential data suppliers.
I wonder if anyone has some figures on this stuff?
All the best,
Roger.
On 11 May 2007, at 12:02, Dave Vieglais wrote:
Hi Everyone, Not really a TAPIR specific response, but perhaps the right audience. I'm probably stating the obvious, but the simplest way to get around the hassles of running a server and the associated firewall headaches is not to serve the data but instead to push it. With an authentication layer added, the GBIF REST services could be extended to allow POSTing data rather than just GET (I may be wrong on this - not exactly sure what degree of REST implementation has been done by GBIF). Add in DELETE and UPDATE, and instead of GBIF running harvesters to capture data, contributors could simply push their data when necessary. This would I expect be an attractive solution for those data providers that would prefer not to operate servers but would still like to contribute to the global knowledge pool of biodiversity.
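To make that concrete, a provider-side push client might look something like the sketch below (a sketch only: the base URL, credentials, record identifiers and record fields are placeholders I have made up, not an actual GBIF interface):

# Sketch of a provider-side "push" client: POST new or changed records to
# a central service and DELETE withdrawn ones, instead of running a local
# wrapper/server. The base URL, credentials and record fields below are
# hypothetical placeholders, not a real GBIF endpoint.
import requests

BASE_URL = "https://registry.example.org/records"  # hypothetical service
AUTH = ("provider-id", "secret-token")             # hypothetical credentials

def push_record(record_id, record):
    # Create or update one record on the central service.
    response = requests.post(f"{BASE_URL}/{record_id}", json=record, auth=AUTH)
    response.raise_for_status()

def delete_record(record_id):
    # Withdraw a record the provider no longer wants distributed.
    response = requests.delete(f"{BASE_URL}/{record_id}", auth=AUTH)
    response.raise_for_status()

if __name__ == "__main__":
    push_record("spec-0001", {"scientificName": "Puma concolor", "country": "US"})
    delete_record("spec-0002")

The point is that the provider-side piece is a small standalone program rather than a publicly reachable server.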
More than likely a mixed model would be ideal - with some data sources acting as servers, and others pushing their data. How about if those institutions that were comfortable running and maintaining servers also adopted the same complete REST implementation as the hypothetical GBIF mentioned above? And what if the servers were, for the most part, aware of each other and so could act as proxies or mirrors for the other servers (perhaps even automatically replicating content)? The end result would be more of a mesh topology of comparatively high reliability and availability.
It would I think be a more scalable solution, and perhaps more maintainable in the long term, since it is a relatively simple thing to update a standalone application that pushes the data, compared with updating and securing a server. The expense of participation in the networks would drop, and resources could be directed towards operating a few high-quality, high-reliability services for accessing the data.
Just a thought. There are obvious social implications, such as the perception of losing control of one's data - but then it could also be argued that if a provider had the ability to DELETE their records from a server, they would actually have more control over the distribution of their data than they do currently.
cheers, Dave V.
On May 11, 2007, at 19:19, Roger Hyam wrote:
Hi Everyone,
There is a requirement that all wrapper-type applications (TAPIR, DiGIR, BioCASe and others) share, but that I don't think we address.
All instances need to have:
Either a database on a server in a DMZ or with an ISP, with the ability to export data from the production database to the public database and then keep the public database synchronized with changes in the production database (see the sketch after this list).
Or the ability to provide a secured/restricted connection directly to the production database through the firewall.
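For the first option, the export/synchronization step could be as simple as something like the following (a rough sketch under assumptions of my own: SQLite and the specimens table/columns are illustrative stand-ins, not a real provider schema, and both tables are assumed to already exist):

# Sketch of the "push to a public copy" option: copy rows changed since the
# last run from the production database to the public one. SQLite and the
# specimens table/columns are illustrative stand-ins; a real deployment would
# use whatever RDBMS and schema it already has.
import sqlite3

def sync_changes(production_path, public_path, since):
    production = sqlite3.connect(production_path)
    public = sqlite3.connect(public_path)
    changed = production.execute(
        "SELECT id, scientific_name, modified FROM specimens WHERE modified > ?",
        (since,)).fetchall()
    public.executemany(
        "INSERT OR REPLACE INTO specimens (id, scientific_name, modified) VALUES (?, ?, ?)",
        changed)
    public.commit()
    production.close()
    public.close()
    return len(changed)

if __name__ == "__main__":
    print(sync_changes("production.db", "public.db", "2007-05-01"))

Run from cron (or by hand after an update), this is all the "synchronization" many providers would need.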
Configuring the wrapper software against a database seems a smaller problem than getting a handle on an up-to-date database to configure it against!
Should we have a recommended strategy or best practice for overcoming these problems? Do we have any figures on how they are overcome in the existing BioCASe and DiGIR networks?
Many thanks for your thoughts,
Roger