Right. I agree there's no particular reason to expose the dump file through a typical TAPIR URL. Headers could also be in a separate file. However, from a TAPIR service perspective, I think it's still important to somehow advertise the availability of a dump file in capabilities (even if GBIF doesn't use this). There's a slot at the end of a capabilities response that could be used for this purpose:
... <custom> <ext:dump baseurl="http://somehost/somepath/"/> </custom> ...
Providers that only want to see their data being served through GBIF could simply make the dump files available somewhere, without the need to install and maintain a web service. TAPIR providers that have other reasons to exist could decide if they want to register the TAPIR endpoint or just the base URL of the dump file in GBIF's registry.
HTTP headers ("If-Modified-Since" and "Last-Modified") seem to solve the timestamp issue in an elegant way.
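To illustrate the timestamp idea, here is a minimal sketch of a harvester that only downloads the dump when it has changed. It assumes the server hosting the dump honours "If-Modified-Since"; the function names and the `opener` parameter are only illustrative, not part of any proposal.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_request(url, last_fetch_epoch=None):
    """Build a GET request that asks for the dump only if it has changed."""
    req = urllib.request.Request(url)
    if last_fetch_epoch is not None:
        # HTTP dates are always expressed in GMT.
        req.add_header("If-Modified-Since",
                       formatdate(last_fetch_epoch, usegmt=True))
    return req


def fetch_dump(url, last_fetch_epoch=None, opener=urllib.request.urlopen):
    """Return (body, Last-Modified header), or (None, None) if unchanged."""
    req = conditional_request(url, last_fetch_epoch)
    try:
        with opener(req) as resp:
            return resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the copy we already have
            return None, None
        raise
```

The harvester would store the returned "Last-Modified" value (or its own fetch time) and pass it back on the next run.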
Regarding complex data, I would be inclined to propose some compact XML representation compatible with TAPIR so that existing wrapper functionality could be used to generate the dump file. I suppose this could save considerable time. Another advantage is that it would be a generic solution, not restricted to one-level relationships. Since TAPIR output models can map XML nodes to a concatenation of concepts and literals, it's also possible to have a single record element with some sort of CSV content inside. I'm just not sure how to escape any separators that might be present in real content.
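On the escaping question, one possible approach (only a sketch, not something TAPIR prescribes) is to rely on the usual CSV quoting convention, where a field containing the separator is wrapped in double quotes, and then XML-escape the resulting line. The `record` element name and both function names below are hypothetical. Note that a wrapper concatenating concepts and literals would have to apply the quoting itself.

```python
import csv
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def encode_record(values):
    """Serialise one record as a CSV line inside a single XML element."""
    buf = io.StringIO()
    # The default QUOTE_MINIMAL policy quotes only fields that contain
    # the separator, a double quote, or a line break.
    csv.writer(buf, lineterminator="\n").writerow(values)
    line = buf.getvalue()[:-1]  # drop the trailing line terminator
    # escape() takes care of &, < and > for the XML layer.
    return "<record>%s</record>" % escape(line)


def decode_record(xml_fragment):
    """Reverse encode_record: the XML parser unescapes, csv unquotes."""
    text = ET.fromstring(xml_fragment).text or ""
    return next(csv.reader([text]))
```

For example, `encode_record(["Puma concolor", "Linnaeus, 1771"])` quotes the second field because it contains a comma, and `decode_record` returns the original two values.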
We could also provide more information about the format in the new dump element:
<ext:dump baseurl="http://somehost/somepath/" format="csv"/>
or
<ext:dump baseurl="http://somehost/somepath/" format="xml" outputModel="some_url"/>
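As a rough sketch of how a client might discover the dump from a capabilities response: the namespace URI bound to the `ext` prefix below is made up for the example (no such namespace has been agreed), and `find_dumps` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the "ext" prefix; the real one is undecided.
EXT_NS = "http://example.org/tapir/ext"


def find_dumps(capabilities_xml):
    """Return the attributes of every ext:dump element in the document."""
    root = ET.fromstring(capabilities_xml)
    dumps = []
    for el in root.iter("{%s}dump" % EXT_NS):
        dumps.append({
            "baseurl": el.get("baseurl"),
            "format": el.get("format"),            # None if not advertised
            "outputModel": el.get("outputModel"),  # only meaningful for XML
        })
    return dumps
```

A client that finds no `ext:dump` element would simply fall back to the regular TAPIR operations.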
Regards, -- Renato
Hi Renato,
Do you think this should really go under the TAPIR spec?
Sure, we want the wrappers to produce it, but it's just a document at a URL, and it can be described so simply that loads of other people could adopt it without getting into the TAPIR specs. Nor could they claim any TAPIR compliance just because they can do a 'select to outfile'.
I would also request that the headers go in the metafile rather than in the data file. It is much easier to dump a big DB to this 'document standard' without having to worry about how to get headers into a 20 GB file.
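A minimal sketch of what a consumer could do with that split, assuming the metafile simply lists the column names as one CSV line (the layout and the function names are assumptions; nothing has been agreed yet):

```python
import csv


def read_headers(metafile):
    """Read the column names from the (assumed) one-line CSV metafile."""
    return next(csv.reader(metafile))


def iter_records(datafile, headers):
    """Yield each row of the header-less data file as a dict."""
    # Passing fieldnames= stops DictReader from treating the first
    # data row as a header line.
    return csv.DictReader(datafile, fieldnames=headers)
```

The data file is streamed row by row, so its size does not matter to the reader either.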
Just some more thoughts
Cheers
Tim