[tdwg-tag] Final changes in TAPIR

renato at cria.org.br
Mon Feb 2 15:01:03 CET 2009

Hi Michael,

Thanks for your input.

The original idea was to advertise only complete dumps, not delta files,
otherwise things can get more complicated and, as Markus said, it looks a
bit out of scope. Even if there can be multiple "archive" elements for
whatever reasons (multiple locations, different output models, etc.), in
most cases I think there will only be a single element pointing to the
most recent dump file. After getting the file, it should be possible to do
incremental harvesting using the search operation instead of handling
multiple delta files.

Regarding the other suggestions, I don't mind adding attributes for
creation timestamp and number of records. Following this approach, we can
also add an attribute to indicate compression, as Tim suggested on the
Wiki. I'll do this if there are no further comments or ideas.
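
As a rough illustration of how a harvester might use such attributes, here
is a minimal Python sketch. It is not part of the TAPIR schema: the
attribute names dateCreated and numberOfRecords are the ones suggested by
Michael, "compression" is a hypothetical name for the compression
indicator, and the example.org URLs and record counts are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical capabilities fragment. The dateCreated, numberOfRecords and
# compression attributes are assumptions, not part of any agreed schema.
SAMPLE = """
<archives>
  <archive format="xml" location="http://example.org/dumps/2008-12.xml.gz"
           outputModel="http://example.org/models/dwc.xml"
           dateCreated="2008-12-01T00:00:00" numberOfRecords="118000"
           compression="gzip"/>
  <archive format="xml" location="http://example.org/dumps/2009-01.xml.gz"
           outputModel="http://example.org/models/dwc.xml"
           dateCreated="2009-01-01T00:00:00" numberOfRecords="120500"
           compression="gzip"/>
</archives>
"""


def latest_archive(xml_text):
    """Return the attributes of the most recently created archive, or None."""
    root = ET.fromstring(xml_text)
    archives = [dict(a.attrib) for a in root.findall("archive")]
    dated = [a for a in archives if "dateCreated" in a]
    if not dated:
        # No timestamps advertised: fall back to the first declared archive.
        return archives[0] if archives else None
    return max(dated, key=lambda a: datetime.fromisoformat(a["dateCreated"]))


latest = latest_archive(SAMPLE)
print(latest["location"], latest["numberOfRecords"])
```

After fetching the selected file, the harvester would switch to the search
operation for incremental updates, as described above.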

Thanks again,

> Michael,
> I think you have a good point here. But is this use case related to
> TAPIR really?
> It doesn't seem you want to use TAPIR at all, but rather to aggregate or
> sync complete datasets using only dump files.
> Although this is a very frequent scenario and I agree dump files are
> much better at doing this job than OAI, Atom feeds or TAPIR, I don't
> see the need to use TAPIR for this.
> Things become TAPIR-related only when providing already supported or
> advertised output models as full dumps. Those XML models are already
> part of TAPIR and generated for responses, so why not provide the
> entire dataset like this?
> Markus
> On Jan 30, 2009, at 16:30, Michael Giddens wrote:
>> Renato,
>> I really think #2 is worth including. There are times when I wish to
>> send small requests through the TAPIR protocol, but there are other
>> times when, especially on first inspection, it would be nice to pull an
>> initial dump. Your XML format for archives looks fine, but I would
>> consider adding attributes like dateCreated and numberOfRecords. This
>> way, if there are monthly archives, for example, I could pull the
>> latest one. Secondly, the number of records would be useful to know how
>> much is in the dump and not just the size. Depending on the actual dump
>> files, the other question I would have is whether the dump is a delta of
>> the previous dump or a complete dump. That way I would know whether I
>> was getting the full dump or had to get all the dumps and then merge
>> them together.
>> These are my initial thoughts.
>> Regards,
>> Michael Giddens
>> Biodiversity Informatics Software Development
>> www.SilverBiology.com
>> Baton Rouge, LA
>> phone: +1 225-937-9657
>> email: mikegiddens at silverbiology.com
>> skype: mikegiddens
>> renato at cria.org.br wrote:
>>> Dear all,
>>> There are just two items left on the list of possible changes before
>>> submitting TAPIR to the TDWG standards track:
>>> 1) Allow custom operations to be declared as part of capabilities.
>>> I would suggest simply including a new custom slot for this in the
>>> schema in case someone needs to use it in the future.
>>> 2) Allow dump files to be declared.
>>> This was discussed some time ago on the TAPIR mailing list but we
>>> didn't come to a final conclusion.
>>> Since some networks are starting to harvest data by fetching entire
>>> dump files, I think it's important to allow TAPIR services to declare
>>> any dump files that may be available. Fetching a dump file from a
>>> provider and using incremental harvesting in later interactions with
>>> the service will probably be the most efficient approach.
>>> Since TAPIR generates XML output, it makes more sense to me to see
>>> dump files in XML. However, Tim/Markus (GBIF) are proposing another
>>> format for dump files using tab/csv files together with a metafile. It
>>> should be easy to allow both options when declaring a dump file in
>>> TAPIR capabilities, but I don't think it's the role of TAPIR to define
>>> specific formats. We can probably use something like this to declare
>>> dump files:
>>> <archives>
>>>  <archive format="" location="" outputModel=""/>
>>>  ...
>>> </archives>
>>> Where format could be "xml" or any custom term, and outputModel would
>>> be optional (only used with "xml" format). Things like the date when
>>> the dump file was generated and whether it's gzipped or not could be
>>> additional attributes, but in most cases this can be discovered
>>> through the protocol used to retrieve the file, so in principle I
>>> would not include the attributes.
>>> Please let me know if this is an acceptable solution or if you have
>>> any different thoughts. Also let me know if you have any other ideas
>>> or suggestions about TAPIR in general. This is the time.
>>> I would like to finally submit the specification & schema to the
>>> standards track at the beginning of the week.
>>> Best Regards,
>>> --
>>> Renato
>> _______________________________________________
>> tdwg-tag mailing list
>> tdwg-tag at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-tag
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.

