[tdwg-tag] FW: Final changes in TAPIR

Mon Feb 2 18:12:09 CET 2009

Hi Guido,

Thanks again for your thoughts.

(as you already know, I'm copying this message to the mailing list)

Incremental harvesting does require some sort of "modified since"
parameter used against the last harvesting date. Ideally you should also
be able to get all deleted records in another step using a "deleted since"
parameter. The way to achieve this in TAPIR is to define two concepts and
then use them in filters if the providers have mapped those concepts. I
think we shouldn't force providers to have the corresponding content, and
networks should remain completely free to define their own data
abstraction layers.
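
A minimal sketch of the two-step incremental harvest described above, assuming the network has defined two concepts that the provider has mapped. The field names "modified" and "deleted", the sample records, and all function names are hypothetical and are not taken from the TAPIR specification. The two filters are simulated locally instead of being sent to a real provider.

```python
from datetime import datetime, timezone


def utc(year, month, day):
    """Build a timezone-aware UTC timestamp."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical provider content. "modified" and "deleted" stand in for two
# network-defined concepts; they are illustrative names, not TAPIR terms.
PROVIDER_RECORDS = [
    {"id": "a", "name": "Puma concolor", "modified": utc(2009, 1, 20), "deleted": None},
    {"id": "b", "name": "Felis catus", "modified": utc(2009, 1, 1), "deleted": utc(2009, 1, 25)},
    {"id": "c", "name": "Lynx lynx", "modified": utc(2009, 1, 5), "deleted": None},
    {"id": "d", "name": "Panthera onca", "modified": utc(2009, 1, 30), "deleted": None},
]


def modified_since(records, since):
    """Step 1 filter: live records changed after the last harvest."""
    return [r for r in records if r["deleted"] is None and r["modified"] > since]


def deleted_since(records, since):
    """Step 2 filter: identifiers of records deleted after the last harvest."""
    return [r["id"] for r in records if r["deleted"] is not None and r["deleted"] > since]


def harvest_increment(cache, records, last_harvest):
    """Apply both steps to a local cache (id -> name) and return it."""
    for record in modified_since(records, last_harvest):
        cache[record["id"]] = record["name"]  # insert or update
    for record_id in deleted_since(records, last_harvest):
        cache.pop(record_id, None)  # ignore ids we never cached
    return cache


if __name__ == "__main__":
    local_cache = {"a": "Felis concolor", "b": "Felis catus", "c": "Lynx lynx"}
    print(harvest_increment(local_cache, PROVIDER_RECORDS, utc(2009, 1, 15)))
```

If a provider has not mapped the "deleted" concept, the second step cannot be run and the harvester would need an occasional full harvest to detect removals.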

In your case, you don't need to embed such concepts in all query templates
as filter conditions, unless each query template returns completely
different things. You can try to define a single query template just for
harvesting.

Regarding dump files, even if a provider is developed and configured to
return a dump file behind certain service calls, there may be lots of
other queries that can return almost all records. For this reason
providers may still want to limit the number of records that can be
returned and advertise this limitation through capabilities, so the
situation can get a bit confusing for clients. I still think it makes more
sense to simply allow providers to declare dump files separately. Please
note that this would be an optional feature, so you're totally free to
decide whether you want to implement it or not.
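
To illustrate why a separately declared dump file is simpler for clients, here is a rough sketch of how a harvester might plan a full harvest. The dictionary keys "dump_url" and "max_records" are hypothetical stand-ins for whatever a provider would advertise; they are not actual TAPIR capabilities element names, and the example URL is made up.

```python
import math


def plan_full_harvest(capabilities, total_records):
    """Choose a full-harvest strategy from (hypothetical) advertised capabilities."""
    dump_url = capabilities.get("dump_url")
    if dump_url:
        # A declared dump is unambiguous: one download, no paging.
        return {"strategy": "dump", "url": dump_url}

    limit = capabilities.get("max_records")
    if limit is None:
        # No advertised limit: a single query may return everything.
        return {"strategy": "single_query", "requests": 1}

    # Advertised limit: page through the records, always at least one request.
    requests = max(1, math.ceil(total_records / limit))
    return {"strategy": "paged", "requests": requests, "page_size": limit}


if __name__ == "__main__":
    print(plan_full_harvest({"max_records": 200}, 1000))
    print(plan_full_harvest({"dump_url": "http://example.org/dump", "max_records": 200}, 1000))
```

With a dump hidden behind ordinary service calls, the client cannot make this decision up front, because the advertised record limit and the actual behaviour of the call may disagree.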

Regarding the dump format, I'm certainly happy to leave it open to any
option that makes sense.

Thanks again,
--
Renato

> Hi Renato,
>
> that whole thing sounds good to me, just two comments:
>
> In order to facilitate incremental updates (which I am in absolute favor
> of), each query requires some sort of a timestamp parameter for
> specifying the maximum age of the data to return as an incremental
> update, some sort of "modified_since" parameter. This should become an
> inherently permitted part of every query then, without having to define
> it in each template.
>
> Using a dump file should be up to the individual TAPIR providers, since
> it easily hides behind the web front-end. That file just needs to
> contain what querying the database would return anyway, as sort of a
> file based cache. Returning the dump instead of querying the database
> can be done either completely inside the TAPIR provider (invisible to
> the client), or using a redirect (possibly one generated dynamically).
> So it should not be part of the specification, imho.
> If you decide, however, to include dumps in the capabilities, the
> format should definitely be customizable, not strictly bound to XML or
> CSV, because XML is overkill for some data, while CSV is too flat for
> other data. How about JSON, in addition? It's a nice combination of CSV's
> simplicity and XML's power to express hierarchical content. In the
> future, some might want RDF as a further format ... to be continued. In
> order not to unnecessarily hamper TAPIR's acceptance, it should really be
> up to the individual providers which format to use.
>
> So far my two cents,
> Guido
