<HTML><HEAD>

<META http-equiv=Content-Type content="text/html; charset=utf-8">

</HEAD>

<BODY style="MARGIN: 4px 4px 1px">

<DIV>I think this is a great idea.</DIV>

<DIV>I have thought a bit about how we can "build upon" the TAPIR protocol and the services that currently exist, and this post reminded me of a few that I would like to look at. One in particular is extending the types of data sources that the TAPIR configurator tools can connect to. I have done this a little in my TapirDotNET implementation, where you can connect a concept to an LSID data source (i.e., it resolves the LSID and returns the resulting XML as the value for that mapped TAPIR concept). But connecting to web services, etc., and also providing a "TAPIR API" through which advanced users could programmatically supply data to a TAPIR service, would also be cool. Any thoughts?</DIV>

<DIV>&nbsp;</DIV>

<DIV>Kevin<BR><BR>&gt;&gt;&gt; "Aaron D. Steele" &lt;eightysteele@gmail.com&gt; 14/05/2008 8:40 a.m. &gt;&gt;&gt;<BR>at berkeley we've recently prototyped a simple php program that uses<BR>an existing tapirlink installation to periodically dump tapir<BR>resources into a csv file. the solution is totally generic and can<BR>dump darwin core (and, technically, abcd schema, although that is currently<BR>untested). the resulting csv files are zip archived and made<BR>accessible through a web service. it's a simple approach that has proven<BR>to be, at least internally, quite reliable and useful.<BR><BR>for example, several of our caching applications use the web service<BR>to harvest csv data from tapirlink resources using the following<BR>process:<BR>1) download the latest csv dump for a resource using the web service.<BR>2) flush all locally cached records for the resource.<BR>3) bulk load the latest csv data into the cache.<BR><BR>in this way, cached data stay synchronized with the resource as of its latest dump, and<BR>there's no need to track new, deleted, or changed records. as an<BR>aside, each time these cached data are queried by the caching<BR>application or selected in the user interface, log-only search<BR>requests are sent back to the resource.<BR><BR>after discussion with renato de giovanni and john wieczorek, we've<BR>decided that merging this functionality into the tapirlink codebase<BR>would benefit the broader community. csv generation support would be<BR>declared through the tapir capabilities response. although incremental harvesting<BR>wouldn't be implemented immediately, we could certainly extend the<BR>service to include it later.<BR><BR>i'd like to pause here to gauge the consensus, thoughts, concerns, and<BR>ideas of others. 
anyone?<BR><BR>thanks,<BR>aaron<BR><BR>2008/5/5 Kevin Richards &lt;RichardsK@landcareresearch.co.nz&gt;:<BR>&gt;<BR>&gt;<BR>&gt; I think I agree here.<BR>&gt;<BR>&gt; The harvesting "procedure" is really defined outside the Tapir protocol, is<BR>&gt; it not? So it is really an agreement between the harvester and the<BR>&gt; harvestees.<BR>&gt;<BR>&gt; So what is really needed here is the standard procedure for maintaining a<BR>&gt; "harvestable" dataset and the standard procedure for harvesting that<BR>&gt; dataset.<BR>&gt; We have a general rule at Landcare that we never delete records in our<BR>&gt; datasets - they are either deprecated in favour of another record, and so<BR>&gt; the resolution of that record would point to the new record, or they are set<BR>&gt; to a state of "deleted", but are still kept in the dataset, and can be<BR>&gt; resolved (the result would indicate the "deleted" state).<BR>&gt;<BR>&gt; Kevin<BR>&gt;<BR>&gt;<BR>&gt; &gt;&gt;&gt; "Renato De Giovanni" &lt;renato@cria.org.br&gt; 6/05/2008 7:33 a.m. &gt;&gt;&gt;<BR>&gt;<BR>&gt;<BR>&gt; Hi Markus,<BR>&gt;<BR>&gt; I would suggest creating new concepts for incremental harvesting,<BR>&gt; either in the data standards themselves or in some new extension. In<BR>&gt; the case of TAPIR, GBIF could easily check the mapped concepts before<BR>&gt; deciding between incremental and full harvesting.<BR>&gt;<BR>&gt; Actually it could be just one new concept such as "recordStatus" or<BR>&gt; "deletionFlag". 
Or perhaps you could also create your own<BR>&gt; definition for dateLastModified indicating which set of concepts<BR>&gt; should be considered to see if something has changed or not, but I<BR>&gt; guess this level of granularity would be difficult to support.<BR>&gt;<BR>&gt; Regards,<BR>&gt; --<BR>&gt; Renato<BR>&gt;<BR>&gt; On 5 May 2008 at 11:24, Markus Döring wrote:<BR>&gt;<BR>&gt; &gt; Phil,<BR>&gt; &gt; incremental harvesting is not implemented on the GBIF side as far as I<BR>&gt; &gt; am aware. And I don't think that will be a simple thing to implement on<BR>&gt; &gt; the current system. Also, even if we can detect the records changed<BR>&gt; &gt; since the last harvest via dateLastModified, we still have<BR>&gt; &gt; no information about deletions. We could have an arrangement saying<BR>&gt; &gt; that you keep deleted records as empty records with just the ID and<BR>&gt; &gt; nothing else (I vaguely remember LSIDs were supposed to work like this<BR>&gt; &gt; too). But that also needs to be supported on your side then, never<BR>&gt; &gt; entirely removing any record. I will have a discussion with the others<BR>&gt; &gt; at GBIF about that.<BR>&gt; &gt;<BR>&gt; &gt; Markus<BR>&gt;<BR>&gt; _______________________________________________<BR>&gt; tdwg-tapir mailing list<BR>&gt; tdwg-tapir@lists.tdwg.org<BR>&gt; <A href="http://lists.tdwg.org/mailman/listinfo/tdwg-tapir">http://lists.tdwg.org/mailman/listinfo/tdwg-tapir</A><BR>&gt;<BR>&gt;&nbsp; Please consider the environment before printing this email<BR>&gt;<BR>&gt;&nbsp; WARNING : This email and any attachments may be confidential and/or<BR>&gt; privileged. They are intended for the addressee only and are not to be read,<BR>&gt; used, copied or disseminated by anyone receiving them in error. 
If you are<BR>&gt; not the intended recipient, please notify the sender by return email and<BR>&gt; delete this message and any attachments.<BR>&gt;<BR>&gt; The views expressed in this email are those of the sender and do not<BR>&gt; necessarily reflect the<BR>&gt; official views of Landcare Research. <A href="http://www.landcareresearch.co.nz">http://www.landcareresearch.co.nz</A><BR></DIV><BR>
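<DIV>The flush-and-reload harvest aaron describes (download the latest dump, flush the resource's cached records, bulk-load the new data) can be sketched in Python as follows. This is a minimal sketch, not the Berkeley implementation: the URL passed to the hypothetical download_dump() helper, the one-CSV-per-zip layout, the CatalogNumber and ScientificName column headers, and the SQLite table named cache are all assumptions made for illustration.</DIV>
<PRE>
```python
import csv
import io
import sqlite3
import urllib.request
import zipfile


def download_dump(url):
    """Step 1: fetch the latest zipped CSV dump (the URL is a hypothetical placeholder)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def read_csv_rows(zip_bytes):
    """Unpack the archive and parse its first member as a CSV file with a header row."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        member = archive.namelist()[0]  # assumption: exactly one CSV per archive
        text = archive.read(member).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def refresh_cache(conn, resource, zip_bytes):
    """Steps 2 and 3: flush this resource's cached rows, then bulk-load the dump."""
    rows = read_csv_rows(zip_bytes)
    # One transaction: commits if both statements succeed, rolls back otherwise.
    with conn:
        conn.execute("DELETE FROM cache WHERE resource = ?", (resource,))
        conn.executemany(
            "INSERT INTO cache (resource, catalog_number, scientific_name) "
            "VALUES (?, ?, ?)",
            # Hypothetical column headers; a real dump's headers may differ.
            [(resource, r["CatalogNumber"], r["ScientificName"]) for r in rows],
        )
    return len(rows)


if __name__ == "__main__":
    # Toy in-memory dump, used instead of calling download_dump() on a real service.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("dump.csv", "CatalogNumber,ScientificName\nX-1,Puma concolor\n")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cache (resource TEXT, catalog_number TEXT, scientific_name TEXT)"
    )
    print(refresh_cache(conn, "example-resource", buffer.getvalue()))  # prints 1
```
</PRE>
<DIV>Running the delete and the bulk insert in a single transaction means a reader of the cache never sees a resource whose old records are gone but whose new ones are not yet loaded. That is what lets the full-dump approach skip tracking new, deleted, or changed records, at the cost of reloading everything on each harvest.</DIV>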



  </BODY></HTML>