[tdwg-tapir] Fwd: Tapir protocol - Harvest methods?

Thu May 15 20:42:05 CEST 2008

Right. I agree there's no particular reason to expose the dump file
through a typical TAPIR URL. Headers could also be in a separate file.
However, from a TAPIR service perspective, I think it's still important to
somehow advertise the availability of a dump file in capabilities (even if
GBIF doesn't use this). There's a slot at the end of a capabilities
response that could be used for this purpose:

...
<custom>
  <ext:dump baseurl="http://somehost/somepath/"/>
</custom>
...

Providers that only want to see their data being served through GBIF could
simply make the dump files available somewhere, without the need to
install and maintain a web service. TAPIR providers that have other
reasons to exist could decide if they want to register the TAPIR endpoint
or just the base URL of the dump file in GBIF's registry.

HTTP headers ("If-Modified-Since" and "Last-Modified") seem to solve the
timestamp issue in an elegant way.
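
A minimal sketch of how a harvester might use these headers, in Python with
only the standard library. The function names and the dump file name used
with them are assumptions for illustration; nothing here is defined by TAPIR
or by any GBIF tool.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import urllib.error
import urllib.request


def build_conditional_request(dump_url, last_harvest):
    """Build a GET request that asks for the dump only if it has changed.

    last_harvest must be a timezone-aware datetime in UTC, because
    HTTP dates are always expressed in GMT.
    """
    stamp = format_datetime(last_harvest, usegmt=True)
    return urllib.request.Request(dump_url, headers={"If-Modified-Since": stamp})


def fetch_if_modified(dump_url, last_harvest):
    """Return (body, last_modified), or (None, None) on 304 Not Modified."""
    request = build_conditional_request(dump_url, last_harvest)
    try:
        with urllib.request.urlopen(request) as response:
            # Keep Last-Modified so it can be sent back on the next harvest.
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # dump unchanged since last_harvest
            return None, None
        raise
```

A harvester would store the returned Last-Modified value and pass it back on
the next run, so an unchanged dump costs only one small request.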

Regarding complex data, I would be inclined to propose some compact XML
representation compatible with TAPIR so that existing wrapper
functionality could be used to generate the dump file. I suppose this
could save considerable time. Another advantage is that it would be a
generic solution, not restricted to one-level relationships. Since TAPIR
output models can map XML nodes to a concatenation of concepts and
literals, it's also possible to have a single record element with some
sort of CSV content inside. I'm just not sure how to escape any
separators that could be present in real content.
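
One common convention for the separator problem, shown here only as a
possible approach, is CSV quoting: fields containing the separator are
wrapped in double quotes and embedded quotes are doubled. The sketch below
uses Python's csv module; the function names are made up for illustration,
and this assumes the dump producer and the harvester agree on that
convention.

```python
import csv
import io


def write_rows(rows):
    """Serialize rows to CSV text, quoting only the fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Parse CSV text back into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))
```

With this convention a value such as `Smith, J.` is written as `"Smith, J."`
and read back unchanged, so separators in real content survive a round trip.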

We could also provide more information about the format in the new dump
element:

<ext:dump baseurl="http://somehost/somepath/" format="csv"/>

or

<ext:dump baseurl="http://somehost/somepath/" format="xml"
outputModel="some_url"/>
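
A rough sketch of how a client could read such an element from a
capabilities response, using Python's standard library. The namespace URI
bound to the "ext" prefix and the function name are hypothetical, since
nothing in this thread defines them, and the document structure is
simplified compared with a real TAPIR response.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "ext" prefix (an assumption).
EXT_NS = "http://example.org/tapir/ext"


def find_dump(capabilities_xml):
    """Return the attributes of the first ext:dump element, or None."""
    root = ET.fromstring(capabilities_xml)
    dump = root.find(".//{%s}dump" % EXT_NS)
    if dump is None:
        return None
    return {
        "baseurl": dump.get("baseurl"),
        "format": dump.get("format"),            # e.g. "csv" or "xml"
        "outputModel": dump.get("outputModel"),  # only expected for "xml"
    }
```

Attributes that are absent come back as None, so a client can tell a plain
baseurl advertisement from one that also declares a format.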

Regards,
--
Renato

> Hi Renato,
>
> Do you think this really belongs in the TAPIR spec?
>
> Sure we want the wrappers to produce it but it's just a document on a URL
> and can be described in such a simple way that loads of other people could
> incorporate it without getting into TAPIR specs, nor can they claim any
> TAPIR compliance just because they can do a 'select to outfile'.
>
> I would also request that the headers go in the metafile, not in the data
> file.  It is way easier to dump a big DB to this 'document standard'
> without needing to worry about how to get headers into a 20 GB file.
>
> Just some more thoughts
>
> Cheers
>
> Tim