[tdwg-tapir] Hosting strategies
roger at tdwg.org
Mon May 14 17:22:06 CEST 2007
How are you getting the data into the remote database? Have you
designed your own service? Do you post spreadsheets to the server and
get it to ingest them?
On 14 May 2007, at 13:12, Wouter Addink wrote:
> We are currently following a kind of push strategy in developing a
> checklists provider tool for GBIF. The tool converts data
> in spreadsheets or text files into a public database compatible
> with TCS, with a TAPIR service. That database can be at the local
> provider or directly at GBIF.
> ----- Original Message -----
> From: "Roger Hyam" <roger at tdwg.org>
> To: "Dave Vieglais" <vieglais at ku.edu>
> Cc: <tdwg-tapir at lists.tdwg.org>
> Sent: Monday, May 14, 2007 11:40 AM
> Subject: Re: [tdwg-tapir] Hosting strategies
>> I think you are right. Whatever strategy is used there has to be
>> an element of push in it. The production database has to push data
>> to the publicly visible database that can then be scraped/searched
>> by interested parties. The difference between pushing
>> data to a public database that is managed by your own institution
>> or one that is managed by a third party is really quite minor.
>> I am concerned because I am not sure of the technical facilities
>> and human resources of potential data suppliers.
>> I wonder if anyone has some figures on this stuff?
>> All the best,
>> On 11 May 2007, at 12:02, Dave Vieglais wrote:
>>> Hi Everyone,
>>> Not really a TAPIR specific response, but perhaps the right
>>> audience. I'm probably stating the obvious, but the simplest way
>>> to get around the hassles of running a server and the associated
>>> firewall headaches is not to serve the data but instead to push
>>> it. By adding an authentication layer, it would be an extension
>>> to the GBIF REST services to allow POSTing data, rather than
>>> just GET (I may be wrong on this - not exactly sure what degree
>>> of REST implementation has been done by GBIF). Add in DELETE
>>> and UPDATE and instead of GBIF running harvesters to capture
>>> data, contributors could simply push their data when necessary.
>>> This would, I expect, be an attractive solution for those data
>>> providers that would prefer not to operate servers but would
>>> still like to contribute to the global knowledge pool of
>>> More than likely a mixed model may be more ideal - with some data
>>> sources acting as servers, and others pushing their data. How
>>> about if those institutions that were comfortable running and
>>> maintaining servers also adopted the same complete REST
>>> implementation as the hypothetical GBIF mentioned above? And
>>> what if the servers were, for the most part, aware of each other
>>> and so could act as proxies or mirrors for the other servers
>>> (perhaps even automatically replicating content)? The end
>>> result would be more of a mesh topology of comparatively high
>>> reliability and availability.
>>> It would, I think, be a more scalable solution, and perhaps more
>>> maintainable in the long term, since it is a relatively simple
>>> thing to update a standalone application that could push the
>>> data compared with updating and securing a server. The expense
>>> of participation in the networks would drop, and resources could
>>> be directed towards operation of a few high quality / high
>>> reliability services for accessing the data.
>>> Just a thought. There are obvious social implications, such as
>>> the perception of losing control of one's data - but then it
>>> could also be argued that if a provider had the ability to
>>> DELETE their records from a server, then they actually have more
>>> control over the distribution of their data than currently.
>>> Dave V.
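The push model Dave describes above can be sketched as a small client that builds authenticated write requests. This is only an illustration under stated assumptions: the endpoint URL, the bearer-token scheme, the JSON record format, and the function names are all hypothetical and do not come from any GBIF API. HTTP has no UPDATE method, so PUT stands in for it here.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; the thread does not name a real write API.
BASE_URL = "https://aggregator.example.org/records"


def build_request(method, record_id, token, record=None):
    """Build (but do not send) an authenticated request for one record."""
    body = None if record is None else json.dumps(record).encode("utf-8")
    # Percent-encode the id so it is safe to use as a URL path segment.
    url = f"{BASE_URL}/{urllib.parse.quote(str(record_id), safe='')}"
    request = urllib.request.Request(
        url,
        data=body,
        method=method,
        # Assumed authentication scheme: a bearer token issued by the host.
        headers={"Authorization": f"Bearer {token}"},
    )
    if body is not None:
        request.add_header("Content-Type", "application/json")
    return request


def push_record(record_id, record, token):
    # PUT creates or replaces the record at a URL the provider chooses.
    return build_request("PUT", record_id, token, record)


def withdraw_record(record_id, token):
    # DELETE lets the provider remove its own record from the host.
    return build_request("DELETE", record_id, token)
```

Passing either request to urllib.request.urlopen would send it. A standalone provider tool could do this whenever local records change, without the provider having to run a server of its own.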
>>> On May 11, 2007, at 19:19, Roger Hyam wrote:
>>>> Hi Everyone,
>>>> There is a requirement that all wrapper type applications
>>>> (TAPIR, DiGIR, BioCASe and others) have but that I don't think
>>>> we address.
>>>> All instances need to have:
>>>> Either a database on a server in a DMZ or with an ISP with the
>>>> ability to export data from the production database to the
>>>> public database and then keep changes in the production
>>>> database synchronized with the public database.
>>>> Or the ability to provide a secured/restricted connection
>>>> directly to the production database through the firewall.
>>>> Configuring the wrapper software against a database seems a
>>>> smaller problem than getting a handle on an up to date database
>>>> to configure it against!
>>>> Should we have a recommended strategy or best practice for
>>>> overcoming these problems? Do we have any figures on how they
>>>> are overcome in the existing BioCASe and DiGIR networks?
>>>> Many thanks for your thoughts,
>>>> tdwg-tapir mailing list
>>>> tdwg-tapir at lists.tdwg.org
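The export-and-synchronize requirement raised in the first message of this thread can be illustrated with a minimal sketch. It assumes, purely for illustration, that both the production and the public database can be viewed as mappings from a record identifier to a record; the function names are hypothetical and no particular wrapper (TAPIR, DiGIR, BioCASe) works this way by definition.

```python
def plan_sync(production, public):
    """Return the changes needed to make `public` match `production`.

    Both arguments map a record id to a record (a plain dict here).
    """
    # Records that are new or changed in production must be pushed.
    upserts = {
        rid: rec for rid, rec in production.items() if public.get(rid) != rec
    }
    # Records that no longer exist in production must be withdrawn.
    deletes = [rid for rid in public if rid not in production]
    return upserts, deletes


def apply_sync(public, upserts, deletes):
    """Apply a planned set of changes to the public copy in place."""
    public.update(upserts)
    for rid in deletes:
        del public[rid]


production = {1: {"name": "Puma concolor"}, 2: {"name": "Lynx lynx"}}
public = {2: {"name": "Lynx linx"}, 3: {"name": "Felis catus"}}

upserts, deletes = plan_sync(production, public)
apply_sync(public, upserts, deletes)
```

Only the differences cross the firewall, so the same plan could feed either a public database run by the institution itself or one hosted by a third party.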