Phil, as far as I am aware, incremental harvesting is not implemented on the GBIF side, and I don't think it would be simple to add to the current system. Also, even if we can detect only the records changed since the last harvest via dateLastModified, we still have no information about deletions. We could agree that you keep deleted records as empty records with just the ID and nothing else (I vaguely remember LSIDs were supposed to work like this too). But that would also need to be supported on your side: you would never entirely remove any record. I will discuss this with the others at GBIF.
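A minimal Python sketch of the harvester side of the arrangement described above. It assumes records arrive as dicts with an `id` and an ISO 8601 `dateLastModified` in UTC (so plain string comparison orders them), and that the provider somehow includes ID-only "tombstone" records in its incremental response. The record layout and the `apply_harvest` helper are hypothetical illustrations, not part of the GBIF indexer.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: field name -> value, always including "id".
Record = Dict[str, str]


def is_tombstone(record: Record) -> bool:
    """A tombstone is a record that carries its ID and nothing else."""
    return set(record) == {"id"}


def apply_harvest(index: Dict[str, Record], batch: List[Record]) -> Optional[str]:
    """Merge one incremental harvest batch into the local index.

    Returns the latest dateLastModified seen in the batch, to use as the
    lower bound of the next incremental harvest (None if the batch had none).
    Assumes timestamps are ISO 8601 strings in one format and time zone.
    """
    high_water: Optional[str] = None
    for record in batch:
        rid = record["id"]
        if is_tombstone(record):
            # Deleted on the provider side: drop our copy if we have one.
            index.pop(rid, None)
            continue
        index[rid] = record  # insert a new record or overwrite a changed one
        modified = record.get("dateLastModified")
        if modified is not None and (high_water is None or modified > high_water):
            high_water = modified
    return high_water
```

Taking the next lower bound from the provider's own timestamps, rather than from the harvester's clock, sidesteps clock skew between the two machines.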
Markus
On 1 May, 2008, at 15:28, Phil Cryer wrote:
On Mon, 2008-04-28 at 17:34 -0500, Markus Döring wrote:
Phil, from the GBIF side it doesn't matter whether you use DiGIR or TAPIR; the GBIF indexer currently supports both protocols. If you use TapirLink, simply mapping to DarwinCore is enough. For other TAPIRlite providers, please make sure your service works with the following two DarwinCore TAPIR templates published by TDWG:
http://rs.tdwg.org/tapir/cs/dwc/1.4/template/dwc_sci_name_range.xml
http://rs.tdwg.org/tapir/cs/dwc/1.4/template/dwc_unfiltered_search.xml
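To make the templates more concrete, here is a hedged Python sketch of how a harvesting client might page through a provider with the unfiltered-search template. It assumes the TAPIR key-value-pair (HTTP GET) binding with the parameters `op=search`, `t` (template URL), `start` and `limit`, as I understand them from the TAPIR specification; check them against the spec and your provider. The endpoint URL and the `fetch` callable (which should issue the request and return the parsed records) are hypothetical placeholders, which keeps the sketch runnable without a network.

```python
from typing import Callable, Iterator, List
from urllib.parse import urlencode

UNFILTERED_TEMPLATE = (
    "http://rs.tdwg.org/tapir/cs/dwc/1.4/template/dwc_unfiltered_search.xml"
)


def search_url(endpoint: str, template: str, start: int, limit: int) -> str:
    """Build a TAPIR search request in KVP form (parameter names assumed)."""
    params = {"op": "search", "t": template, "start": start, "limit": limit}
    return endpoint + "?" + urlencode(params)


def tapir_harvest(
    endpoint: str,
    fetch: Callable[[str], List[dict]],  # hypothetical: GET url, return records
    limit: int = 100,
) -> Iterator[dict]:
    """Page through every record the template returns, `limit` at a time."""
    start = 0
    while True:
        page = fetch(search_url(endpoint, UNFILTERED_TEMPLATE, start, limit))
        yield from page
        if len(page) < limit:  # a short (or empty) page means we are done
            break
        start += limit
```

Stopping on a short page avoids relying on any record count in the response; the cost is one extra, empty request when the total is an exact multiple of `limit`.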
Markus,
I've gotten DiGIR back in line and will start tracking it to see what kind of usage we're experiencing. After that I want to bring up Tapir, mapping our data via ABCD, and then I will talk to you so we can determine whether I have things configured as efficiently as possible. I'm also interested in how we can have the harvester pull only the latest data; I'll think about that.
Phil
At GBIF we are currently also thinking about a much simpler provider software package tailored for harvesting. That would reduce load on providers enormously while still supporting basic TAPIR capabilities for true distributed queries. We will keep this list informed once we have thought this through.
Markus
--
Markus Döring, Berlin
Senior Software Developer
GBIF Secretariat
mdoering@gbif.org
On 28 Apr, 2008, at 23:02, Blum, Stan wrote:
Phil,
TAPIR was intended to be a unification of DiGIR and BioCASE. There are a few implementations of providers but fewer instances of portals built on TAPIR. Networks built on DiGIR may eventually switch to TAPIR, but that remains to be seen. DiGIR and BioCASE were designed for distributed queries, not really for harvesting. I understand harvesting can be done more simply and efficiently by other approaches, such as OAI-PMH. If the sensibilities of data providers evolve to accept and allow harvesting (which seems likely), we may see "networks" built on that architecture instead of on distributed queries.
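For comparison with the harvesting approach mentioned above, a minimal Python sketch of an OAI-PMH `ListRecords` harvest. The verb, the `metadataPrefix`, `from` and `resumptionToken` arguments, and the `status="deleted"` header attribute come from the OAI-PMH 2.0 specification: `from` gives incremental harvesting, and deleted-record headers address the deletion problem Markus raises (whether a repository reports deletions at all depends on its declared deletedRecord policy). The base URL and the injected `fetch` function are hypothetical, and the sketch extracts only identifiers, not metadata.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base: str, from_date: Optional[str] = None,
                     token: Optional[str] = None) -> str:
    if token:
        # resumptionToken is an exclusive argument: send it alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date  # only records changed since this date
    return base + "?" + urlencode(params)


def parse_page(xml_text: str) -> Tuple[List[Tuple[str, bool]], Optional[str]]:
    """Return (identifier, is_deleted) pairs and the next resumptionToken."""
    root = ET.fromstring(xml_text)
    records = []
    for header in root.iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        records.append((identifier, header.get("status") == "deleted"))
    token_el = root.find(f".//{OAI}resumptionToken")
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token


def oai_harvest(base: str, fetch: Callable[[str], str],
                from_date: Optional[str] = None) -> Iterator[Tuple[str, bool]]:
    token: Optional[str] = None
    while True:
        records, token = parse_page(fetch(list_records_url(base, from_date, token)))
        yield from records
        if token is None:
            break
```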
If your only goal is to provide data to GBIF, I would suggest installing TAPIR (unless Tim Robertson tells you something else). If you are concerned about providing data to other networks, like www.SERNEC.org, you'll need a DiGIR provider, too. (Such is the nature of technical transition.)
-Stan
Stanley D. Blum, Ph.D.
Research Information Manager
California Academy of Sciences
875 Howard St.
San Francisco, CA
+1 (415) 321-8183
-----Original Message----- From: tdwg-tapir-bounces@lists.tdwg.org [mailto:tdwg-tapir-bounces@lists.tdwg.org] On Behalf Of Phil Cryer Sent: Monday, April 28, 2008 1:22 PM To: Renato De Giovanni; tdwg-tapir@lists.tdwg.org Subject: RE: [tdwg-tapir] Tapir protocol - Harvest methods?
So we have DiGIR running at Mobot for Tropicos data, and clients hit it to harvest data. I was just wondering whether people are still deploying DiGIR at all, or whether they just use Tapir by default. Tapir seems to have taken over from DiGIR, and I want to know if that's a 'standard' we should follow.
For testing, yes, we're talking more about performance: making sure our network and server will handle X load. So I guess what I really want to know is how clients attach to a Tapir server and how they pull data from us.
Sorry if this is such a newbie question, but I can't understand this aspect from the docs I've read.
Thanks for the reply!
Phil
-----Original Message----- From: tdwg-tapir-bounces@lists.tdwg.org [mailto:tdwg-tapir-bounces@lists.tdwg.org] On Behalf Of Renato De Giovanni Sent: Monday, April 28, 2008 1:54 PM To: tdwg-tapir@lists.tdwg.org Subject: Re: [tdwg-tapir] Tapir protocol - Harvest methods?
Phil,
Is the "DiGIR implementation that you want to move away from" just a DiGIR service? Or is it something else?
I would only keep a parallel DiGIR service if there are older clients that can only talk to it and, for some reason (time/resources), can't be updated. I'm not sure if this is your case.
Also, when you said that you want to "test your implementation", did you mean that you want to test a TAPIR service, or some other application based on TAPIR? If you just want to test a TAPIR service, you could simply run TapirTester on it instead of developing your own harvester:
Note: If necessary, the existing tests can be improved. New ones can also be created (TapirTester is open source).
Hope this helps,
Renato
On 28 Apr 2008 at 10:39, Phil Cryer wrote:
Just starting with Tapir/DiGIR, I have two questions:
- I would like to know if the Tapir protocol is the preferred method over DiGIR. We have a DiGIR implementation that we want to move away from and bring up a Tapir one in its place. Is this normal, or do organizations run both so that their older clients can keep harvesting?
- What is a method to harvest data from Tapir and/or DiGIR? We want to do this internally to test our implementation before we open it up to the world. How can I do this? (We run Windows and Linux as clients.)
Thank you
Phil
Phil Cryer Open Source Development Missouri Botanical Garden
tdwg-tapir mailing list tdwg-tapir@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tapir