[tdwg-tapir] Hosting strategies
wouter at eti.uva.nl
Mon May 14 14:12:44 CEST 2007
We are currently following a kind of push strategy in developing a
checklists provider tool for GBIF. The tool converts data in
spreadsheets or text files into a public, TCS-compatible database with
a TAPIR service. That database can be at the local provider or directly at
----- Original Message -----
From: "Roger Hyam" <roger at tdwg.org>
To: "Dave Vieglais" <vieglais at ku.edu>
Cc: <tdwg-tapir at lists.tdwg.org>
Sent: Monday, May 14, 2007 11:40 AM
Subject: Re: [tdwg-tapir] Hosting strategies
> I think you are right. Whatever strategy is used there has to be an
> element of push in it. The production database has to push data to the
> publicly visible database that can then be scraped/searched by interested
> parties. The difference between pushing data to a public database that is
> managed by your own institution or one that is managed by a third party
> is really quite minor.
> I am concerned because I am not sure of the technical facilities and
> human resources of potential data suppliers.
> I wonder if anyone has some figures on this stuff?
> All the best,
> On 11 May 2007, at 12:02, Dave Vieglais wrote:
>> Hi Everyone,
>> Not really a TAPIR specific response, but perhaps the right audience.
>> I'm probably stating the obvious, but the simplest way to get around the
>> hassles of running a server and the associated firewall headaches is not
>> to serve the data but instead to push it. By adding an authentication
>> layer, it would be an extension to the GBIF REST services to allow
>> POSTing data, rather than just GET (I may be wrong on this - not exactly
>> sure what degree of REST implementation has been done by GBIF). Add in
>> DELETE and UPDATE and instead of GBIF running harvesters to capture
>> data, contributors could simply push their data when necessary. This
>> would I expect be an attractive solution for those data providers that
>> would prefer not to operate servers but would still like to contribute
>> to the global knowledge pool of biodiversity.
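As a rough illustration of the push idea above, here is a minimal Python sketch of a client that builds authenticated write requests. Everything in it is an assumption made for illustration: the endpoint URL, the bearer-token scheme, the JSON payload, and the function names are hypothetical, and nothing in the thread says GBIF offers such an API. HTTP has no UPDATE method, so the sketch uses PUT for updates.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://example.org/api/records"


def build_push_request(method: str,
                       record_id: Optional[str] = None,
                       record: Optional[dict] = None,
                       token: str = "secret-token") -> urllib.request.Request:
    """Build (but do not send) an authenticated request that pushes one record."""
    if method not in {"POST", "PUT", "DELETE"}:
        raise ValueError(f"unsupported method: {method}")
    if method in {"PUT", "DELETE"} and record_id is None:
        raise ValueError(f"{method} needs a record_id")
    # POST creates under the collection URL; PUT and DELETE address one record.
    url = BASE_URL if record_id is None else f"{BASE_URL}/{record_id}"
    body = None if record is None else json.dumps(record).encode("utf-8")
    request = urllib.request.Request(url, data=body, method=method)
    # Assumed authentication layer: a bearer token in the Authorization header.
    request.add_header("Authorization", f"Bearer {token}")
    if body is not None:
        request.add_header("Content-Type", "application/json")
    return request


def push(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

A provider would call `push(build_push_request("POST", record={...}))` whenever a record changes. The connection is always outbound, so no inbound firewall rule is needed on the provider's side.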
>> More than likely a mixed model would be ideal - with some data
>> sources acting as servers, and others pushing their data. How about if
>> those institutions that were comfortable running and maintaining servers
>> also adopted the same complete REST implementation as the hypothetical
>> GBIF mentioned above? And what if the servers were, for the most part,
>> aware of each other and so could act as proxies or mirrors for the other
>> servers (perhaps even automatically replicating content). The end
>> result would be more of a mesh topology of comparatively high
>> reliability and availability.
>> It would I think be a more scalable solution, and perhaps more
>> maintainable in the long term, since it is a relatively simple thing to
>> update a standalone application that could push the data compared with
>> updating and securing a server. The expense of participation in the
>> networks would drop, and resources could be directed towards operation
>> of a few high quality / high reliability services for accessing the
>> Just a thought. There are obvious social implications, such as the
>> perception of losing control of one's data - but then it could also be
>> argued that if a provider had the ability to DELETE their records from a
>> server, then they actually have more control over the distribution of
>> their data than currently.
>> Dave V.
>> On May 11, 2007, at 19:19, Roger Hyam wrote:
>>> Hi Everyone,
>>> There is a requirement that all wrapper type applications (TAPIR,
>>> DiGIR, BioCASe and others) have but that I don't think we address.
>>> All instances need to have:
>>> Either a database on a server in a DMZ or with an ISP with the ability
>>> to export data from the production database to the public database and
>>> then keep changes in the production database synchronized with the
>>> public database.
>>> Or the ability to provide a secured/restricted connection directly to
>>> production database through the firewall.
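The first option above (export to a public database, then keep it synchronized) could be sketched as an incremental copy keyed on a modification stamp. This is a minimal Python sketch under stated assumptions: both databases are SQLite, and the `specimen` table, its `modified` column, and the `sync_public` function are hypothetical names chosen for illustration. Deleted rows are not propagated here.

```python
import sqlite3

# Hypothetical schema shared by the production and public databases.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS specimen ("
    "id INTEGER PRIMARY KEY, name TEXT, modified INTEGER)"
)


def sync_public(production: sqlite3.Connection,
                public: sqlite3.Connection,
                last_sync: int) -> int:
    """Copy rows changed after last_sync and return the new high-water mark."""
    rows = production.execute(
        "SELECT id, name, modified FROM specimen "
        "WHERE modified > ? ORDER BY modified",
        (last_sync,),
    ).fetchall()
    # One transaction: either every changed row lands in the public copy or none.
    with public:
        public.executemany(
            "INSERT OR REPLACE INTO specimen (id, name, modified) "
            "VALUES (?, ?, ?)",
            rows,
        )
    # If nothing changed, keep the previous mark.
    return max((row[2] for row in rows), default=last_sync)
```

The caller stores the returned mark and passes it to the next run, so each run transfers only rows changed since the previous one.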
>>> Configuring the wrapper software against a database seems a smaller
>>> problem than getting a handle on an up to date database to configure it
>>> Should we have a recommended strategy or best practice for overcoming
>>> these problems? Do we have any figures on how they are overcome in the
>>> existing BioCASe and DiGIR networks?
>>> Many thanks for your thoughts,
>>> tdwg-tapir mailing list
>>> tdwg-tapir at lists.tdwg.org