[tdwg-tapir] Tapir protocol - Harvest methods?

Tue May 13 23:24:41 CEST 2008

Hi Kevin,

This is the same as what I do for WFS...
I can't offer the full rich schema in WFS for the GBIF density layers, but
by putting in what I call a "callback url" as a mapped concept (feature),
the client can call a REST service to get back the (in this case) RDF for
extra info.  This is analogous to your LSID mapped concept.  I can't see a
better way of doing it, as a WFS response contains a flat structure.
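
To make the callback idea concrete, here is a minimal sketch of the client
side. The property name "callback_url" and both function names are
assumptions made for illustration; they are not part of WFS or of any
GBIF service. The fetcher is passed in so the logic can be tried without
a network connection.

```python
from typing import Callable, Dict
from urllib.request import urlopen

# Hypothetical property name; the real mapped concept name is not given here.
CALLBACK_FIELD = "callback_url"


def http_fetch(url: str) -> str:
    """Default fetcher: plain HTTP GET, decoded as UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def fetch_extra_info(feature: Dict[str, str],
                     fetch: Callable[[str], str] = http_fetch) -> str:
    """Follow the callback URL carried in a flat feature.

    The flat feature only holds simple values, so the richer document
    (RDF in this case) is retrieved with a second request.
    """
    url = feature.get(CALLBACK_FIELD)
    if not url:
        raise KeyError(f"feature has no {CALLBACK_FIELD!r} property")
    return fetch(url)
```

A client would call fetch_extra_info(feature) for each feature it wants
to expand, which keeps the WFS response itself flat.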

Cheers

Tim

> I think this is a great idea.
> I have thought a bit about how we can "build upon" the Tapir protocol
> and services that currently exist, and this post reminded me of a few
> that I would like to look at.  One in particular is extending the type
> of data sources that the Tapir configurator tools can connect to - I
> have done this a little in my TapirDotNET implementation where you can
> connect a concept to an LSID data source (ie it resolves the LSID and
> returns the resulting xml as the value for that mapped Tapir concept).
> But connecting to web services, etc, and also providing a "Tapir API"
> for the advanced user to programmatically provide data through a Tapir
> service would also be cool.  Any thoughts?
>
> Kevin
>
>>>> "Aaron D. Steele" <eightysteele at gmail.com> 14/05/2008 8:40 a.m.
>>>>
> at berkeley we've recently prototyped a simple php program that uses
> an existing tapirlink installation to periodically dump tapir
> resources into a csv file. the solution is totally generic and can
> dump darwin core (and technically abcd schema, although it's currently
> untested). the resulting csv files are zip archived and made
> accessible using a web service. it's a simple approach that has proven
> to be, at least internally, quite reliable and useful.
>
> for example, several of our caching applications use the web service
> to harvest csv data from tapirlink resources using the following
> process:
> 1) download latest csv dump for a resource using the web service.
> 2) flush all locally cached records for the resource.
> 3) bulk load the latest csv data into the cache.
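
Steps 2 and 3 above can be sketched roughly as follows. This is only an
illustration, not the Berkeley PHP program: the SQLite cache table, its
column names and the CSV header names are all assumptions, and step 1
(the download) is left out, so the zipped dump is passed in as bytes.

```python
import csv
import io
import sqlite3
import zipfile


def make_cache() -> sqlite3.Connection:
    """Create an in-memory cache table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cache (resource TEXT, record_id TEXT, scientific_name TEXT)"
    )
    return conn


def reload_resource(conn: sqlite3.Connection, resource: str, zip_bytes: bytes) -> int:
    """Replace all cached rows for one resource with the rows of a zipped CSV dump."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        name = archive.namelist()[0]  # assumes one CSV file per archive
        text = archive.read(name).decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(text)))
    with conn:  # single transaction: flush and load succeed or fail together
        conn.execute("DELETE FROM cache WHERE resource = ?", (resource,))
        conn.executemany(
            "INSERT INTO cache (resource, record_id, scientific_name) VALUES (?, ?, ?)",
            [(resource, r["record_id"], r["scientific_name"]) for r in rows],
        )
    return len(rows)
```

Doing the flush and the load in one transaction means a failed load
leaves the previous cache contents in place.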
>
> in this way, cached data are always synchronized with the resource and
> there's no need to track new, deleted, or changed records. as an
> aside, each time these cached data are queried by the caching
> application or selected in the user interface, log-only search
> requests are sent back to the resource.
>
> after discussion with renato giovanni and john wieczorek, we've
> decided that merging this functionality into the tapirlink codebase
> would benefit the broader community. csv generation support would be
> declared through capabilities. although incremental harvesting
> wouldn't be immediately implemented, we could certainly extend the
> service to include it later.
>
> i'd like to pause here to gauge the consensus, thoughts, concerns, and
> ideas of others. anyone?
>
> thanks,
> aaron
>
> 2008/5/5 Kevin Richards <RichardsK at landcareresearch.co.nz>:
>>
>>
>> I think I agree here.
>>
>> The harvesting "procedure" is really defined outside the Tapir protocol,
>> is it not?  So it is really an agreement between the harvester and the
>> harvestees.
>>
>> So what is really needed here is the standard procedure for maintaining
>> a "harvestable" dataset and the standard procedure for harvesting that
>> dataset.
>> We have a general rule at Landcare, that we never delete records in our
>> datasets - they are either deprecated in favour of another record, and
>> so the resolution of that record would point to the new record, or they
>> are set to a state of "deleted", but are still kept in the dataset, and
>> can be resolved (which would indicate a state of deleted).
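
The never-delete rule described above could be modelled along these
lines. The Record class, its field names and the three status values are
assumptions made for the sketch; the thread does not say how Landcare
actually stores this.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Record:
    record_id: str
    status: str = "active"             # "active", "deprecated" or "deleted"
    replaced_by: Optional[str] = None  # only meaningful when deprecated


def resolve(records: Dict[str, Record], record_id: str) -> Record:
    """Follow deprecation pointers until an active or deleted record is reached."""
    seen = set()
    current = records[record_id]
    while current.status == "deprecated" and current.replaced_by:
        if current.record_id in seen:
            raise ValueError("deprecation cycle at " + current.record_id)
        seen.add(current.record_id)
        current = records[current.replaced_by]
    return current
```

Because nothing is ever removed, a harvester can always resolve an ID it
already holds and learn whether the record was replaced or deleted.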
>>
>> Kevin
>>
>>
>> >>> "Renato De Giovanni" <renato at cria.org.br> 6/05/2008 7:33 a.m.
>>>>
>>
>>
>> Hi Markus,
>>
>> I would suggest creating new concepts for incremental harvesting,
>> either in the data standards themselves or in some new extension. In
>> the case of TAPIR, GBIF could easily check the mapped concepts before
>> deciding between incremental or full harvesting.
>>
>> Actually it could be just one new concept such as "recordStatus" or
>> "deletionFlag". Or perhaps you might also want to create your own
>> definition for dateLastModified, indicating which set of concepts
>> should be considered to see if something has changed or not, but I
>> guess this level of granularity would be difficult to support.
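
The check suggested above might look something like this on the
harvester side. The concept names are the ones floated in this message
and are not standardised anywhere; real TAPIR concepts are identified by
fuller identifiers, so treat the bare names and the function itself as
assumptions.

```python
from typing import Iterable

# Names proposed in the discussion; hypothetical, not existing concepts.
DELETION_CONCEPTS = {"recordStatus", "deletionFlag"}
MODIFIED_CONCEPT = "dateLastModified"


def choose_harvest_mode(mapped_concepts: Iterable[str]) -> str:
    """Return "incremental" only if the provider maps both a modification
    date and some way of signalling deletions; otherwise return "full"."""
    mapped = set(mapped_concepts)
    if MODIFIED_CONCEPT in mapped and mapped & DELETION_CONCEPTS:
        return "incremental"
    return "full"
```

A provider that maps only dateLastModified still gets a full harvest,
since changed records alone say nothing about deletions.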
>>
>> Regards,
>> --
>> Renato
>>
>> On 5 May 2008 at 11:24, Markus Döring wrote:
>>
>> > Phil,
>> > incremental harvesting is not implemented on the GBIF side as far as I
>> > am aware. And I don't think that will be a simple thing to implement on
>> > the current system. Also, even if we can detect only the changed
>> > records since the last harvesting via dateLastModified, we still have
>> > no information about deletions. We could have an arrangement saying
>> > that you keep deleted records as empty records with just the ID and
>> > nothing else (I vaguely remember LSIDs were supposed to work like this
>> > too). But that also needs to be supported on your side then, never
>> > entirely removing any record. I will have a discussion with the others
>> > at GBIF about that.
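
As a rough sketch of the arrangement described above, a harvester could
treat any changed record that carries only its ID as a deletion marker.
The key names "id" and "dateLastModified" and the dictionary-based cache
are assumptions; the sketch also assumes the empty record keeps a
modification date so that it shows up among the changed records at all.

```python
from typing import Dict, Iterable

# Keys that do not count as record content (assumed names).
BOOKKEEPING_KEYS = {"id", "dateLastModified"}


def apply_changes(cache: Dict[str, dict], changed: Iterable[dict]) -> None:
    """Apply records modified since the last harvest to a local cache, in place."""
    for record in changed:
        record_id = record["id"]
        # Anything left after dropping bookkeeping keys and empty values is content.
        payload = {
            key: value
            for key, value in record.items()
            if key not in BOOKKEEPING_KEYS and value not in (None, "")
        }
        if payload:
            cache[record_id] = record    # new or updated record
        else:
            cache.pop(record_id, None)   # empty record: treat as deleted
```

This only works if the provider never physically removes a record, which
is the support needed on the provider side.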
>> >
>> > Markus
>>
>> _______________________________________________
>> tdwg-tapir mailing list
>> tdwg-tapir at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
>>
>>
>>
>>
>>  Please consider the environment before printing this email
>>
>>  WARNING : This email and any attachments may be confidential and/or
>> privileged. They are intended for the addressee only and are not to be
>> read, used, copied or disseminated by anyone receiving them in error. If
>> you are not the intended recipient, please notify the sender by return
>> email and delete this message and any attachments.
>>
>> The views expressed in this email are those of the sender and do not
>> necessarily reflect the official views of Landcare Research.
>> http://www.landcareresearch.co.nz