[tdwg-tapir] Fwd: Tapir protocol - Harvest methods?

Mon May 26 03:18:56 CEST 2008

I agree, and I'd suggest there are a couple of other useful formats: JSON and serialized PHP. Both are commonly implemented in other web services, such as those offered by Yahoo and Google.

Both of these can be consumed directly in most programming languages, which would make it very easy to digest and display biodiversity information without the overhead of, say, ABCD or our other XML structures. It would be interesting to see whether JSON and/or serialized PHP are easier or faster to consume than CSV. (I'm still looking for a reference for the term "star CSV"; does anyone want to explain?)
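
To make the comparison concrete, here is a minimal sketch in Python (standard library only) that reads the same record from a JSON dump and from a CSV dump. The field names and the sample record are illustrative assumptions, not taken from any TAPIR output model.

```python
import csv
import io
import json

# Hypothetical dumps of one record; the field names are illustrative only.
JSON_DUMP = '[{"id": "1", "scientificName": "Banksia grandis", "country": "Australia"}]'
CSV_DUMP = "id,scientificName,country\n1,Banksia grandis,Australia\n"


def records_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def records_from_csv(text):
    """Parse CSV with a header row into a list of dicts (all values are strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


json_records = records_from_json(JSON_DUMP)
csv_records = records_from_csv(CSV_DUMP)
```

Both paths end in the same list of dictionaries here. The practical difference is that CSV yields only strings and flat rows, while JSON can carry numbers and nested values without extra rules.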

We need to make it as easy as possible to be involved at both ends of the data connection.

Cheers,
Ben

--
Ben Richardson
w=http://science.dec.wa.gov.au/people/?sid=98
e=ben.richardson at dec.wa.gov.au tz=ADST (UTC+9)
t=+61 8 9334 0511 f=+61 8 9334 0515

> -----Original Message-----
> From: tdwg-tapir-bounces at lists.tdwg.org
> [mailto:tdwg-tapir-bounces at lists.tdwg.org]On Behalf Of Aaron D. Steele
> Sent: Thursday, 22 May 2008 3:02
> To: tdwg-tapir at lists.tdwg.org
> Subject: Re: [tdwg-tapir] Fwd: Tapir protocol - Harvest methods?
> 
> 
> it's my intuition that harvesting data in *different formats* is going
> to become a dominant use case handled by data providers worldwide. for
> example, some clients will want csv or star, while others will want
> xml or sqlite. i'd like to explore adding a simple plug-in
> architecture to tapirlink that, given a format plug-in (for example,
> csv_plugin.php), creates a resource data dump in that format which can
> be zip archived (along with any other metadata files required by the
> format) and downloaded by clients. in this way, as new formats are
> requested by the community, new format plug-ins can be added. it's a
> simple approach that's scalable, improves interoperability with
> clients, and avoids the need to agree on a single format to support.
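
A rough sketch of the plug-in idea described above, written in Python rather than TapirLink's PHP and not based on TapirLink's actual code. The names FORMAT_PLUGINS, format_plugin and build_dump are hypothetical, as are the file names inside the archive. Each plug-in turns a list of records into one or more named files, and a shared function zips whatever the plug-in returns.

```python
import csv
import io
import json
import zipfile

# Hypothetical registry mapping a format name to a plug-in function.
FORMAT_PLUGINS = {}


def format_plugin(name):
    """Decorator that registers a function as the plug-in for one format."""
    def register(func):
        FORMAT_PLUGINS[name] = func
        return func
    return register


@format_plugin("csv")
def dump_csv(records):
    """Return {filename: text} for a CSV dump; assumes at least one record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return {"data.csv": buffer.getvalue()}


@format_plugin("json")
def dump_json(records):
    """Return {filename: text} for a JSON dump."""
    return {"data.json": json.dumps(records)}


def build_dump(records, fmt):
    """Run the plug-in for fmt and zip every file it produced."""
    if fmt not in FORMAT_PLUGINS:
        raise ValueError(f"unsupported format: {fmt}")
    files = FORMAT_PLUGINS[fmt](records)
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for filename, content in files.items():
            zf.writestr(filename, content)
    return archive.getvalue()
```

Because a plug-in returns a mapping of file names to contents, a format that needs extra metadata files can return them alongside the data file, and supporting a new format means registering one more function.
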
> 
> i'd also like to explore using a new 'harvest' tapir operation to
> facilitate harvest requests. for example:
> 
> tapir.php/myresource?op=harvest&format=csv&sbn=604800
> 
> the optional sbn parameter above stands for seconds before now. you
> can interpret the above request as:
> 
> "i want to download a csv dump of myresource only if it has been
> created within the last week (604,800 seconds)."
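
One possible reading of the request above, sketched in Python. The function names are hypothetical, and how a missing format parameter should be treated is left open (the sketch simply returns None). The freshness check follows the quoted interpretation: a dump qualifies only if it was created no more than sbn seconds before now.

```python
from urllib.parse import parse_qs, urlsplit


def parse_harvest_request(url):
    """Extract (format, sbn) from a harvest URL; sbn is None when omitted."""
    params = parse_qs(urlsplit(url).query)
    if params.get("op", [""])[0] != "harvest":
        raise ValueError("not a harvest request")
    fmt = params.get("format", [None])[0]
    sbn = params.get("sbn")
    return fmt, (int(sbn[0]) if sbn else None)


def dump_is_acceptable(dump_created_at, now, sbn):
    """True if the dump is recent enough; both times are in epoch seconds."""
    if sbn is None:
        # sbn is optional: without it, any existing dump is acceptable.
        return True
    return now - dump_created_at <= sbn


fmt, sbn = parse_harvest_request(
    "tapir.php/myresource?op=harvest&format=csv&sbn=604800"
)
```

Here 604800 is 7 * 24 * 60 * 60, so a dump created exactly one week ago still qualifies and one created a second earlier does not.
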
> 
> this approach might be somewhat controversial since it involves
> potential changes in the tapir protocol that not everyone agrees with.
> on the other hand, after consulting with renato and john, i don't see
> any harm with prototyping these new features, and giving the community
> the opportunity to experiment with concrete harvesting functionality
> before coming to a general consensus.
> 
> if you're keen on collaborating, i've created a new branch to
> prototype these ideas in:
> https://digir.svn.sourceforge.net/svnroot/digir/tapirlink/branches/harvest
> 
> thoughts? concerns?
> 
> thanks,
> aaron
> 
> On Wed, May 21, 2008 at 11:16 AM, Renato De Giovanni 
> <renato at cria.org.br> wrote:
> > Markus,
> >
> > If we want to ensure the lowest possible barrier for providers, then
> > I think zipped csv files need to be supported. If we really want to
> > handle complex data using the same format, then we need something
> > like the csv star scheme you mentioned (with well-defined rules about
> > all files and how the records are related).
> >
> > The limitation in this case is that we would only handle one-level
> > relationships (not a generic solution) and providers with complex
> > data would probably need to write some code to generate the dumps
> > (not sure how many providers would do it) - unless wrappers that can
> > handle complex data implement additional functionality to produce
> > these dumps.
> >
> > On the other hand, if we allow more than one format, complex data
> > could be handled with compact XML representations (in a generic way)
> > which could be automatically produced by existing wrappers.
> >
> > So my understanding is that the biggest decision is: Use a single
> > format (csv) with additional rules for complex data, or allow
> > different formats (one for simple and another for complex data).
> >
> > Although I know it's usually much better for clients to deal with a
> > single format, my *feeling* in this case is that it would be more
> > effective to allow different formats. I'm also not sure if it would
> > be easier for clients to handle additional star scheme rules when
> > importing complex data than it would be to parse a single XML file
> > encoded in some compact structure.
> >
> > Just some thoughts...
> >
> > Best Regards,
> > --
> > Renato
> >
> > On 20 May 2008 at 17:36, Markus Döring wrote:
> >
> >> Renato,
> >> complex data can also be represented by tab files, with a file for
> >> each extension that has a pointer in the first column.
> >> That is what we originally had in mind with the star scheme.
> >>
> >> Markus
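
As I understand the star scheme from the description above, a core file holds one row per record and each extension file points back to a core record through its first column. The Python sketch below joins one extension to the core under that assumption. The column names, the sample rows and the function names are made up for illustration and do not come from any agreed specification.

```python
import csv
import io
from collections import defaultdict

# Made-up tab-delimited files: a core file plus one extension file whose
# first column points at the core record's id (also in the first column).
CORE_TAB = "id\tscientificName\n1\tTaxon A\n2\tTaxon B\n"
EXT_TAB = "coreId\tvernacularName\n1\tname A1\n2\tname B1\n2\tname B2\n"


def read_tab(text):
    """Split tab-delimited text into (header, rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[0], rows[1:]


def attach_extension(core_text, ext_text, ext_name):
    """Return core records, each with its matching extension rows attached."""
    core_header, core_rows = read_tab(core_text)
    ext_header, ext_rows = read_tab(ext_text)
    by_core_id = defaultdict(list)
    for row in ext_rows:
        # row[0] is the pointer; the remaining columns are extension data.
        by_core_id[row[0]].append(dict(zip(ext_header[1:], row[1:])))
    records = []
    for row in core_rows:
        record = dict(zip(core_header, row))
        record[ext_name] = by_core_id.get(row[0], [])
        records.append(record)
    return records


records = attach_extension(CORE_TAB, EXT_TAB, "vernacularNames")
```

This also shows the one-level limitation Renato mentions: an extension row can point at a core record, but there is no way here for it to have children of its own.
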
> >
> > _______________________________________________
> > tdwg-tapir mailing list
> > tdwg-tapir at lists.tdwg.org
> > http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
> >
> 
