[tdwg-tapir] Tapir protocol - Harvest methods?

Markus Döring mdoering at gbif.org
Mon May 5 11:24:36 CEST 2008


Phil,
incremental harvesting is not implemented on the GBIF side as far as I
am aware, and I don't think it would be simple to implement on the
current system. Also, even if we could detect only the records changed
since the last harvest via dateLastModified, we would still have no
information about deletions. We could agree that you keep deleted
records as empty records containing just the ID and nothing else (I
vaguely remember LSIDs were supposed to work like this too). But that
would also need to be supported on your side: you would never entirely
remove any record. I will discuss this with the others at GBIF.
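A minimal Python sketch of the incremental scheme described above, under stated assumptions: the provider can filter on dateLastModified, and deleted records are kept as tombstones carrying only the ID. The names `Record`, `HarvestIndex` and `changed_since` are hypothetical; `changed_since` stands in for whatever query the provider would actually answer, and neither GBIF nor TAPIR defines this API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable, List, Optional


@dataclass
class Record:
    record_id: str
    date_last_modified: datetime
    # Hypothetical convention: payload None marks a tombstone, i.e. the
    # provider kept only the ID of a record that was deleted.
    payload: Optional[dict] = None

    @property
    def is_tombstone(self) -> bool:
        return self.payload is None


@dataclass
class HarvestIndex:
    records: Dict[str, dict] = field(default_factory=dict)
    last_harvest: Optional[datetime] = None

    def apply(self, changed: Iterable[Record], harvest_time: datetime) -> None:
        """Merge records changed since last_harvest into the local index."""
        for rec in changed:
            if rec.is_tombstone:
                # Deletion: drop the local copy if we have one.
                self.records.pop(rec.record_id, None)
            else:
                # New or updated record: overwrite the local copy.
                self.records[rec.record_id] = rec.payload
        self.last_harvest = harvest_time


def changed_since(provider: Iterable[Record], since: Optional[datetime]) -> List[Record]:
    """Stand-in for a provider-side filter dateLastModified > since (full harvest if None)."""
    return [r for r in provider if since is None or r.date_last_modified > since]
```

The first harvest passes `since=None` and pulls everything; later harvests pass the stored `last_harvest` and receive only changed records and tombstones. Without the tombstones, a deleted record would simply never appear in the delta and would linger in the index.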

Markus



On 1 May, 2008, at 15:28, Phil Cryer wrote:

> On Mon, 2008-04-28 at 17:34 -0500, Markus Döring wrote:
>> Phil,
>> from the GBIF side it doesn't matter whether you use DiGIR or TAPIR.
>> Both protocols are currently supported by the GBIF indexer.
>> If you use TapirLink, simply mapping to DarwinCore is enough. For
>> other TAPIRlite providers, please make sure your service works with
>> the following 2 DarwinCore TAPIR templates found at TDWG:
>>
>> http://rs.tdwg.org/tapir/cs/dwc/1.4/template/dwc_sci_name_range.xml
>> http://rs.tdwg.org/tapir/cs/dwc/1.4/template/dwc_unfiltered_search.xml
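For illustration, a client could address the two templates above through TAPIR's key-value-pair (HTTP GET) request encoding. This sketch assumes the parameter names `op`, `t`, `start` and `limit`, and the access point URL used with it is hypothetical; check the names against the TAPIR specification and your provider's capabilities response before relying on them. Any template-specific parameters are passed through unchanged.

```python
from urllib.parse import urlencode

# The two DarwinCore templates named in the message above.
DWC_TEMPLATES = {
    "sci_name_range": "http://rs.tdwg.org/tapir/cs/dwc/1.4/template/dwc_sci_name_range.xml",
    "unfiltered_search": "http://rs.tdwg.org/tapir/cs/dwc/1.4/template/dwc_unfiltered_search.xml",
}


def tapir_search_url(access_point: str, template: str, start: int = 0,
                     limit: int = 100, **template_params: str) -> str:
    """Build a KVP search request URL (parameter names are assumptions)."""
    params = {"op": "search", "t": template, "start": start, "limit": limit}
    # Extra template parameters, if the template declares any, go straight through.
    params.update(template_params)
    return f"{access_point}?{urlencode(params)}"
```

A harvester would call this repeatedly with increasing `start` values to page through the unfiltered search.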
>
> Markus
> I've gotten DiGIR back in line and will start tracking it to see what
> kind of usage we're experiencing. After that I want to bring up Tapir,
> mapping out data via ABCD; then I will speak to you so we can
> determine whether I have things configured as efficiently as possible.
> I'm interested in how we can have the harvester pull only the latest
> data. I'll think about that.
>
> Phil
>
>
>>
>> At GBIF we are currently also thinking about a much simpler provider
>> software tailored for harvesting. That will reduce load on providers
>> enormously while still supporting basic TAPIR capabilities for true
>> distributed queries. We will keep this list informed once we have
>> thought this through.
>>
>> Markus
>>
>> --
>>  Markus Döring, Berlin
>>  Senior Software Developer
>>  GBIF Secretariat
>>  mdoering at gbif.org
>>
>>
>>
>>
>> On 28 Apr, 2008, at 23:02, Blum, Stan wrote:
>>
>>> Phil,
>>>
>>> TAPIR was intended to be a unification of DiGIR and BioCASE. There
>>> are a few provider implementations but fewer portals built on
>>> TAPIR. Networks built on DiGIR may eventually switch to TAPIR, but
>>> that remains to be seen. DiGIR and BioCASE were designed for
>>> distributed queries, not really harvesting. I understand harvesting
>>> can be done more simply and efficiently by other approaches, such
>>> as OAI-PMH. If the sensibilities of data providers evolve to accept
>>> and allow harvesting (which seems likely), we may see "networks"
>>> built on that architecture instead of on distributed queries.
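For comparison, this is roughly what harvesting looks like in OAI-PMH, the approach mentioned above: a `ListRecords` request with an optional `from` datestamp for incremental harvests, paging via `resumptionToken`, and deletions reported as record headers with `status="deleted"`, which is one answer to the deletion problem Markus raises. The base URL used with it is hypothetical and only record headers are inspected; this is a sketch, not a complete harvester.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     since: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is exclusive: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if since:
            # Selective harvest: only records with a datestamp on or after `since`.
            params["from"] = since
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], List[str], Optional[str]]:
    """Return (updated identifiers, deleted identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    updated: List[str] = []
    deleted: List[str] = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            updated.append(identifier)
    # An empty or missing resumptionToken means this was the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return updated, deleted, (token or None)
```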
>>>
>>> If your only goal is to provide data to GBIF, I would suggest
>>> installing TAPIR (unless Tim Robertson tells you something else). If
>>> you are concerned about providing data to other networks, like
>>> www.SERNEC.org, you'll need a DiGIR provider, too. (Such is the
>>> nature of technical transition.)
>>>
>>> -Stan
>>>
>>> Stanley D. Blum, Ph.D.
>>> Research Information Manager
>>> California Academy of Sciences
>>> 875 Howard St.
>>> San Francisco,  CA
>>> +1 (415) 321-8183
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: tdwg-tapir-bounces at lists.tdwg.org
>>> [mailto:tdwg-tapir-bounces at lists.tdwg.org] On Behalf Of Phil Cryer
>>> Sent: Monday, April 28, 2008 1:22 PM
>>> To: Renato De Giovanni; tdwg-tapir at lists.tdwg.org
>>> Subject: RE: [tdwg-tapir] Tapir protocol - Harvest methods?
>>>
>>>
>>> So we have DiGIR running at Mobot for Tropicos data, and clients
>>> hit it to harvest data. I was just wondering whether people are
>>> still deploying DiGIR at all, or are they just using Tapir by
>>> default? It seems to have taken over from DiGIR, and I want to know
>>> if that's a 'standard' that we should follow.
>>>
>>> For testing, yes, we're talking more about performance: making sure
>>> our network and server will handle X load. So I guess I want to know
>>> more about how clients attach to a Tapir server and how they pull
>>> the data from us.
>>>
>>> Sorry if this is such a newbie question, but I can't understand
>>> this aspect from the docs I've read.
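One way to picture how a client pulls everything from a provider: it requests one page at a time with a start offset and a page size until the provider signals there is nothing more. The sketch below is transport-agnostic and assumes a hypothetical `fetch_page` callable that would issue the real HTTP request (for TAPIR, a search with start/limit) and return the page's records plus the next start offset, or `None` when done. How the next offset is reported depends on the protocol's response format, so that parsing is deliberately left out.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical fetcher signature: (start, limit) -> (records, next_start or None).
PageFetcher = Callable[[int, int], Tuple[List[dict], Optional[int]]]


def harvest_all(fetch_page: PageFetcher, limit: int = 100) -> Iterator[dict]:
    """Yield every record by paging through the provider."""
    start: Optional[int] = 0
    while start is not None:
        records, start = fetch_page(start, limit)
        if not records:
            # Defensive stop: an empty page should never loop forever.
            break
        yield from records
```

For load testing, the same loop can be pointed at your own server with different `limit` values to see how page size affects response time.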
>>>
>>> Thanks for the reply!
>>>
>>> Phil
>>>
>>> -----Original Message-----
>>> From: tdwg-tapir-bounces at lists.tdwg.org
>>> [mailto:tdwg-tapir-bounces at lists.tdwg.org] On Behalf Of Renato De
>>> Giovanni
>>> Sent: Monday, April 28, 2008 1:54 PM
>>> To: tdwg-tapir at lists.tdwg.org
>>> Subject: Re: [tdwg-tapir] Tapir protocol - Harvest methods?
>>>
>>> Phil,
>>>
>>> Is the "DiGIR implementation that you want to move away from" just a
>>> DiGIR service? Or is it something else?
>>>
>>> I would only keep a parallel DiGIR service if there are older
>>> clients that can only talk to it and for some reason
>>> (time/resources) can't be updated. I'm not sure if this is your case.
>>>
>>> Also, when you said that you want to "test your implementation", did
>>> you mean that you want to test a TAPIR service, or is it some other
>>> application based on TAPIR? If you just want to test a TAPIR
>>> service, you could simply run TapirTester on it instead of
>>> developing your own harvester:
>>>
>>> http://tapir.tdwg.org/tester/
>>>
>>> Note: If necessary, the existing tests can be improved. New ones can
>>> also be created (TapirTester is open source).
>>>
>>> Hope this helps,
>>> --
>>> Renato
>>>
>>> On 28 Apr 2008 at 10:39, Phil Cryer wrote:
>>>>
>>>> Just starting with Tapir/DiGIR - I have 2 questions:
>>>>
>>>> * I would like to know if the Tapir protocol is the preferred
>>>> method over DiGIR. We have a DiGIR implementation that we want to
>>>> move away from, and bring up a Tapir one in its place. Is this
>>>> normal, or do organizations run both so that their older clients
>>>> can still harvest?
>>>>
>>>> * What is a method to harvest data from Tapir and/or DiGIR? We want
>>>> to do this internally to test our implementation before we open up
>>>> to the world. How can I do this? (We run Windows and Linux as
>>>> clients.)
>>>>
>>>> Thank you
>>>>
>>>> Phil
>>>> --
>>>> Phil Cryer
>>>> Open Source Development
>>>> Missouri Botanical Garden
>>>
>>> _______________________________________________
>>> tdwg-tapir mailing list
>>> tdwg-tapir at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
>>>




More information about the tdwg-tag mailing list