[tdwg-tapir] Fwd: Tapir protocol - Harvest methods?
mdoering at gbif.org
Fri May 16 10:29:51 CEST 2008
I was thinking along those lines too. It would be nice for TAPIR
services to announce the availability of the index files. I wouldn't
mind even adding it to the regular TAPIR schema once it has proven to
work with the custom slot approach you have given.
Regarding star-shaped data, I would prefer to agree on one format
instead of allowing different ones, to save consumers this pain.
There is a straightforward XML serialisation for this scheme that we
could use instead of tab files:
The advantage is that it can be produced by TAPIR software, and XML
serialisation is required for many services anyway, e.g. RSS.
But then again, the whole point of the index files is that they are
easy to generate and consume. On the other hand, this XML structure is
pretty simple to process and can be generated straight away by
databases like SQL Server that have XML output, without the need of
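To make "pretty simple to process" concrete, here is a minimal sketch of how one star-shaped record (a core row plus its extension rows) might be serialised as nested XML using only Python's standard xml.etree.ElementTree. The element names (record, core, extension, row) and the field names are hypothetical placeholders, not an agreed schema.

```python
import xml.etree.ElementTree as ET

def star_record_to_xml(core, extensions):
    """Serialise one core row and its extension rows as a nested <record>.

    core: dict of field name -> value for the central (core) row.
    extensions: dict of extension name -> list of row dicts.
    All element names are hypothetical placeholders, not an agreed schema.
    """
    record = ET.Element("record")
    core_el = ET.SubElement(record, "core")
    for name, value in core.items():
        ET.SubElement(core_el, name).text = value
    for ext_name, rows in extensions.items():
        # One <extension> element per extension, one <row> per extension row.
        ext_el = ET.SubElement(record, "extension", name=ext_name)
        for row in rows:
            row_el = ET.SubElement(ext_el, "row")
            for name, value in row.items():
                ET.SubElement(row_el, name).text = value
    return ET.tostring(record, encoding="unicode")

xml_text = star_record_to_xml(
    {"id": "rec1", "ScientificName": "Example taxon A"},
    {"identification": [{"IdentifiedBy": "A. Smith"}, {"IdentifiedBy": "B. Jones"}]},
)
print(xml_text)
```

A consumer can walk such a record with the same library (e.g. root.findall("extension/row")), which is about as little effort as splitting a tab file.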
That touches on a different issue I am facing with the star scheme, by
the way. I have created an identification extension for Darwin Core
that holds the historical list of identification events and their
outcomes. This is a YAML section of the metafile describing the columns
for this extension through fully qualified concepts, à la TAPIR:
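The YAML itself did not survive in the archive, so the following is only a sketch of the idea, assuming the metafile boils down to an ordered list of fully qualified concept identifiers, one per column of a header-less, tab-delimited data file. The concept URIs (under example.org) and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical column definitions standing in for the metafile's YAML section:
# one fully qualified concept identifier per column, in column order.
COLUMNS = [
    "http://example.org/concepts/occurrenceId",
    "http://example.org/concepts/ScientificName",
    "http://example.org/concepts/IdentifiedBy",
    "http://example.org/concepts/DateIdentified",
]

def read_extension(data_file, columns):
    """Yield one dict per row, keyed by concept, from a header-less tab file."""
    reader = csv.reader(data_file, delimiter="\t")
    for row_no, row in enumerate(reader, start=1):
        if len(row) != len(columns):
            raise ValueError(f"row {row_no}: expected {len(columns)} columns, got {len(row)}")
        yield dict(zip(columns, row))

sample = io.StringIO(
    "rec1\tExample taxon A\tA. Smith\t1998-06-01\n"
    "rec1\tExample taxon B\tB. Jones\t2005-03-12\n"
)
events = list(read_extension(sample, COLUMNS))
print(events[-1]["http://example.org/concepts/IdentifiedBy"])  # B. Jones
```

Because the concepts live in the metafile, the data file needs no header row, and two extensions that reuse the same concept URI can be matched up by a consumer without extra configuration.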
When creating this I realised that pretty much all the concepts I was
interested in already existed in Darwin Core or the curatorial
extension. Wouldn't it be wise to reuse those concepts? Or are they
strictly tied to the idea of a current identification and therefore
can't be used for historical ones? This is probably more of a Darwin
Core question than a TAPIR one, but we are all on this list anyway ...
The XML in that case would look something like this:
<dwc:ScientificName>Aster alpinus subsp.
<dwc:ScientificName>Aster alpinus subsp.
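One possible way to look at the reuse question, sketched here under the assumption that each historical event simply reuses the existing Darwin Core-style concepts plus a date: the "current" identification would not need concepts of its own, because it can be derived as the most recent event. The Identification structure, its field names and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Identification:
    # Field names mirror Darwin Core-style concepts; the structure is hypothetical.
    scientific_name: str
    identified_by: str
    date_identified: date

def current_identification(history):
    """Return the most recent identification event, or None for an empty history."""
    return max(history, key=lambda ident: ident.date_identified, default=None)

history = [
    Identification("Example taxon A", "A. Smith", date(1998, 6, 1)),
    Identification("Example taxon B", "B. Jones", date(2005, 3, 12)),
]
print(current_identification(history).scientific_name)  # Example taxon B
```

This only works if every event carries a date (or some other ordering); events without one would need an explicit "is current" flag instead.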
On 15 May, 2008, at 20:42, Renato De Giovanni wrote:
> Right. I agree there's no particular reason to expose the dump file
> through a typical TAPIR URL. Headers could also be in a separate file.
> However, from a TAPIR service perspective, I think it's still
> important to somehow advertise the availability of a dump file in
> capabilities (even if GBIF doesn't use this). There's a slot at the
> end of a capabilities response that could be used for this purpose:
> <ext:dump baseurl="http://somehost/somepath/"/>
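On the consumer side, a harvester could look for such an element with a few lines of standard-library Python. This is a sketch only: the namespace URIs and the <custom> wrapper in the sample are placeholders (the real slot name is not given here), so the code matches on the local name "dump" and also picks up the optional format attribute suggested later in the message.

```python
import xml.etree.ElementTree as ET

def find_dump_urls(capabilities_xml):
    """Return (baseurl, format) pairs for every <dump> element in the document.

    Matches on the local name only, so the (hypothetical) extension namespace
    does not need to be known in advance. format is None if the attribute is absent.
    """
    root = ET.fromstring(capabilities_xml)
    found = []
    for el in root.iter():
        local_name = el.tag.rsplit("}", 1)[-1]  # strip "{namespace}" if present
        if local_name == "dump" and el.get("baseurl"):
            found.append((el.get("baseurl"), el.get("format")))
    return found

# Hypothetical, heavily trimmed capabilities response; namespaces are placeholders.
sample = """<response xmlns="http://example.org/tapir"
          xmlns:ext="http://example.org/tapir-ext">
  <capabilities>
    <custom>
      <ext:dump baseurl="http://somehost/somepath/" format="csv"/>
    </custom>
  </capabilities>
</response>"""

print(find_dump_urls(sample))  # [('http://somehost/somepath/', 'csv')]
```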
> Providers that only want to see their data served through GBIF
> could simply make the dump files available somewhere, without the need
> to install and maintain a web service. TAPIR providers that have other
> reasons to exist could decide whether they want to register the TAPIR
> service or just the base URL of the dump file in GBIF's registry.
> HTTP headers ("If-Modified-Since" and "Last-Modified") seem to solve
> the timestamp issue in an elegant way.
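Seen from the harvester's side, the header approach could look roughly like this standard-library sketch: send If-Modified-Since with the Last-Modified value remembered from the previous download, and treat a 304 reply as "nothing new". The function names, the NEVER_HARVESTED default and any dump file name passed in are hypothetical; urllib reports a 304 as an HTTPError, which is why it is caught.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# For a first harvest, an old timestamp makes any existing dump count as modified.
NEVER_HARVESTED = datetime(1970, 1, 1, tzinfo=timezone.utc)

def build_conditional_request(url, last_harvest):
    """Build a GET request asking the server to reply 304 if nothing changed.

    last_harvest must be an aware datetime in UTC (HTTP dates are GMT).
    """
    request = urllib.request.Request(url)
    request.add_header("If-Modified-Since", format_datetime(last_harvest, usegmt=True))
    return request

def fetch_if_modified(url, last_harvest=NEVER_HARVESTED):
    """Return (body, new_timestamp), or (None, last_harvest) if unchanged."""
    try:
        with urllib.request.urlopen(build_conditional_request(url, last_harvest)) as response:
            stamp = response.headers.get("Last-Modified")
            new_stamp = parsedate_to_datetime(stamp) if stamp else last_harvest
            return response.read(), new_stamp
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the previously harvested copy
            return None, last_harvest
        raise
```

A plain web server serving a static dump file typically handles this without any extra software on the provider's side, which fits the "no web service to maintain" case above.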
> Regarding complex data, I would be inclined to propose some compact
> representation compatible with TAPIR so that existing wrapper
> functionality could be used to generate the dump file. I suppose this
> could save considerable time. Another advantage is that it would be a
> generic solution, not restricted to one-level relationships. Since
> output models can map XML nodes to a concatenation of concepts and
> literals, it's also possible to have a single record element with some
> sort of CSV content inside. I'm just not sure how to escape any
> separators that could be present in real content.
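On the escaping question, one well-established convention is the one described in RFC 4180 and implemented by Python's csv module: wrap a field in double quotes when it contains the separator, a quote or a line break, and double any embedded quotes. A minimal sketch of packing a record into one CSV string and back (the function names are hypothetical):

```python
import csv
import io

def pack_record(values):
    """Join field values into one CSV string, quoting fields that contain
    separators, quotes or line breaks (embedded quotes are doubled)."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(values)
    return buf.getvalue().removesuffix("\r\n")  # drop the writer's line terminator

def unpack_record(packed):
    """Inverse of pack_record: split one packed string back into its fields."""
    return next(csv.reader(io.StringIO(packed)))

packed = pack_record(["Example taxon A", 'Smith, A. "Andy"', "1998"])
print(packed)  # Example taxon A,"Smith, A. ""Andy""",1998
```

Whatever generates the record content (here Python, in practice a TAPIR output model) would have to apply the same quoting rule, so this only helps if the wrapper can emit the quotes.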
> We could also provide more information about the format in the new
> <ext:dump baseurl="http://somehost/somepath/" format="csv"/>
> <ext:dump baseurl="http://somehost/somepath/" format="xml"
>> Hi Renato,
>> Do you think this really goes under the TAPIR spec?
>> Sure, we want the wrappers to produce it, but it's just a document on
>> a URL and can be described in such a simple way that loads of other
>> people could incorporate it without getting into the TAPIR specs, nor
>> can they claim TAPIR compliance just because they can do a 'select to
>> outfile'.
>> I would also request that the headers aren't in the data file but in
>> the metafile. It is way easier to dump a big DB to this 'document'
>> without needing to worry about how to get headers into a 20 GB file.
>> Just some more thoughts
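Keeping headers out of the data file is easy to honour when dumping: the column names are known from the query before the first row is written, so they can go to a separate metafile while rows are streamed. A sketch using the standard sqlite3 module as a stand-in for "a big DB"; the table, column names and one-name-per-line metafile layout are hypothetical.

```python
import csv
import io
import sqlite3

def dump_query(conn, query, data_out, meta_out):
    """Stream query rows to a header-less tab file; write column names to a
    separate metafile, one per line. Returns the column names."""
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]
    meta_out.write("\n".join(columns) + "\n")
    writer = csv.writer(data_out, delimiter="\t", lineterminator="\n")
    for row in cursor:  # rows are fetched incrementally, not all at once
        writer.writerow(row)
    return columns

# Usage with an in-memory database and in-memory "files" (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occurrence (id TEXT, scientific_name TEXT)")
conn.executemany("INSERT INTO occurrence VALUES (?, ?)",
                 [("rec1", "Example taxon A"), ("rec2", "Example taxon B")])
data_buf, meta_buf = io.StringIO(), io.StringIO()
dump_query(conn, "SELECT id, scientific_name FROM occurrence ORDER BY id", data_buf, meta_buf)
print(meta_buf.getvalue())
print(data_buf.getvalue())
```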
> tdwg-tapir mailing list
> tdwg-tapir at lists.tdwg.org