Hi Markus,
Without the notion of incremental harvesting there is little notion of harvesting at all, I think. The supplier may as well burn a table to CD and mail it to you (or gzip it or something). In that case the EML model of having a data set (table) bound to a descriptive file is more appropriate than an online data provider model.
If supplier 'A' provides 10,000 records this week, then replaces them with 10,001 next week and with 9,999 the week after, how many records do we have from the point of view of a data consumer? 30k, or just 3 (with ~10k data points in each)? It is a very different way to look at the data from the original specimen-based one that we started with. If the data represents an entomological collection it seems crazy (we are not replacing the specimens each week); if it represents bird sightings it seems sensible (these may be different studies, and are not replacements but separate data sets).
Are we trying to combine two kinds of data that don't fit together very well?
I keep coming back to the need to know how people will use the data...
All the best,
Roger
On 5 May 2008, at 20:33, Renato De Giovanni wrote:
Hi Markus,
I would suggest creating new concepts for incremental harvesting, either in the data standards themselves or in some new extension. In the case of TAPIR, GBIF could easily check the mapped concepts before deciding between incremental or full harvesting.
Actually it could be just one new concept such as "recordStatus" or "deletionFlag". Or perhaps you might also want to create your own definition for dateLastModified indicating which set of concepts should be considered when deciding whether something has changed, but I guess this level of granularity would be difficult to support.
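As a rough illustration of the check described above, here is a minimal Python sketch. It assumes the harvester already has the list of concept names a provider has mapped; how that list is obtained from a TAPIR provider is not shown. The function name choose_harvest_mode is hypothetical, and requiring both a modification date and a deletion concept is an assumption about how the decision might be made.

```python
# Concepts that could signal a deletion (names taken from the suggestion above).
DELETION_CONCEPTS = {"recordStatus", "deletionFlag"}


def choose_harvest_mode(mapped_concepts):
    """Return "incremental" only if the provider maps both a modification
    date and some deletion indicator; otherwise fall back to "full".

    Hypothetical helper: the rule itself is an assumption, not GBIF policy.
    """
    mapped = set(mapped_concepts)
    has_modified = "dateLastModified" in mapped
    has_deletion = bool(mapped & DELETION_CONCEPTS)
    return "incremental" if has_modified and has_deletion else "full"
```

A provider that maps dateLastModified but neither deletion concept would still be harvested in full, since deletions could not be detected.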
Regards,
Renato
On 5 May 2008 at 11:24, Markus Döring wrote:
Phil, incremental harvesting is not implemented on the GBIF side as far as I am aware, and I don't think it will be a simple thing to implement on the current system. Also, even if we can detect only the records changed since the last harvesting via dateLastModified, we still have no information about deletions. We could have an arrangement saying that you keep deleted records as empty records with just the ID and nothing else (I vaguely remember LSIDs were supposed to work like this too). But that also needs to be supported on your side then, by never entirely removing any record. I will have a discussion with the others at GBIF about that.
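The arrangement above could look roughly like the following Python sketch. It is not how the GBIF system works; it assumes records are plain dictionaries keyed by "id", and that an "empty" deleted record still carries a dateLastModified so that it shows up among the changed records. The function names are hypothetical.

```python
from datetime import datetime

# Assumption: a deleted record keeps only these fields and nothing else.
TOMBSTONE_FIELDS = {"id", "dateLastModified"}


def is_tombstone(record):
    """True if the record is an empty placeholder for a deleted record."""
    return set(record) <= TOMBSTONE_FIELDS


def changed_since(records, last_harvest):
    """Select the records modified after the previous harvest."""
    return [r for r in records if r["dateLastModified"] > last_harvest]


def apply_changes(cache, changed):
    """Merge changed records into a local cache keyed by record ID.

    Tombstones remove the cached record; anything else replaces or adds it.
    """
    for record in changed:
        if is_tombstone(record):
            cache.pop(record["id"], None)
        else:
            cache[record["id"]] = record
    return cache
```

The scheme only works if the provider never physically removes a record, which is exactly the support needed on the provider side.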
Markus
tdwg-tapir mailing list tdwg-tapir@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tapir