[tdwg-tapir] Tapir protocol - Harvest methods?

Roger Hyam rogerhyam at mac.com
Mon May 5 22:41:54 CEST 2008


Hi Markus,

Without the notion of incremental harvesting there is little notion of
harvesting at all, I think. The supplier may as well burn a table to CD
and mail it to you (or gzip it or something). The EML model of having
a data set (table) bound to a descriptive file is then more appropriate
than an online data provider model.

If supplier 'A' provides 10,000 records this week, then replaces
them with 10,001 next week and with 9,999 the week after, how many
records do we have from the point of view of a data consumer? 30k, or
just 3 (with ~10k data points in each)? It is a very different way to
look at the data from the original specimen-based one that we
started with. If the data represents an entomological collection it
seems crazy (we are not replacing the specimens each week); if it
represents bird sightings it seems sensible (these may be different
studies and are not replacements but separate data sets).

Are we trying to combine two kinds of data that don't fit together  
very well?
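
To make the two readings concrete, here is a minimal sketch of a consumer that keeps both views of the same three weekly deliveries. The Consumer class, make_delivery and the record IDs are hypothetical names invented for illustration; nothing here is TAPIR or GBIF code.

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    """Hypothetical data consumer receiving full dumps from a supplier."""
    # Replacement view: each delivery overwrites the previous one.
    current: dict = field(default_factory=dict)
    # Accumulation view: each delivery is kept as its own data set.
    snapshots: list = field(default_factory=list)

    def receive(self, delivery):
        self.current = dict(delivery)
        self.snapshots.append(dict(delivery))


def make_delivery(n):
    """Build a made-up delivery of n records keyed by an invented ID."""
    return {f"rec-{i}": {"value": i} for i in range(n)}


consumer = Consumer()
for size in (10_000, 10_001, 9_999):
    consumer.receive(make_delivery(size))

# Specimen-style reading: only the latest delivery counts.
replacement_count = len(consumer.current)
# Sightings-style reading: three data sets, every data point kept.
data_set_count = len(consumer.snapshots)
accumulated_count = sum(len(s) for s in consumer.snapshots)
```

Under the replacement reading the consumer ends up with 9,999 records; under the accumulation reading it holds 3 data sets with 30,000 data points between them.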

I keep coming back to the need to know how people will use the data...

All the best,

Roger





On 5 May 2008, at 20:33, Renato De Giovanni wrote:

> Hi Markus,
>
> I would suggest creating new concepts for incremental harvesting,
> either in the data standards themselves or in some new extension. In
> the case of TAPIR, GBIF could easily check the mapped concepts before
> deciding between incremental or full harvesting.
>
> Actually it could be just one new concept such as "recordStatus" or
> "deletionFlag". Or perhaps you might also want to create your own
> definition for dateLastModified indicating which set of concepts
> should be considered to see if something has changed or not, but I
> guess this level of granularity would be difficult to support.
>
> Regards,
> --
> Renato
>
> On 5 May 2008 at 11:24, Markus Döring wrote:
>
>> Phil,
>> incremental harvesting is not implemented on the GBIF side as far as I
>> am aware. And I don't think that will be a simple thing to implement on
>> the current system. Also, even if we can detect only the changed
>> records since the last harvesting via dateLastModified we still have
>> no information about deletions. We could have an arrangement saying
>> that you keep deleted records as empty records with just the ID and
>> nothing else (I vaguely remember LSIDs were supposed to work like this
>> too). But that also needs to be supported on your side then, never
>> entirely removing any record. I will have a discussion with the others
>> at GBIF about that.
>>
>> Markus
>
> _______________________________________________
> tdwg-tapir mailing list
> tdwg-tapir at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
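
The arrangement discussed in the quoted messages (changed records found via dateLastModified, deleted records kept as near-empty placeholders, and a check of the mapped concepts before choosing a harvest mode) could look roughly like the sketch below. It assumes a placeholder record keeps its ID, a dateLastModified and a "deletionFlag" concept, which is slightly more than "just the ID"; the function names and the record layout are hypothetical and not part of TAPIR.

```python
from datetime import datetime


def choose_mode(mapped_concepts):
    """Pick incremental harvesting only if the provider maps both concepts."""
    needed = {"dateLastModified", "deletionFlag"}
    return "incremental" if needed <= set(mapped_concepts) else "full"


def apply_incremental(cache, provider_records, last_harvest):
    """Apply records changed since last_harvest to the consumer's cache."""
    for rec in provider_records:
        if rec["dateLastModified"] <= last_harvest:
            continue  # unchanged since the last harvest
        if rec.get("deletionFlag"):
            cache.pop(rec["id"], None)  # placeholder record: drop it locally
        else:
            cache[rec["id"]] = rec  # new or updated record
    return cache


# Made-up example data.
cache = {
    "a": {"id": "a", "name": "old a"},
    "b": {"id": "b", "name": "old b"},
}
last_harvest = datetime(2008, 4, 28)
provider_records = [
    {"id": "a", "dateLastModified": datetime(2008, 5, 1), "deletionFlag": True},
    {"id": "b", "dateLastModified": datetime(2008, 4, 1), "deletionFlag": False,
     "name": "stale b"},
    {"id": "c", "dateLastModified": datetime(2008, 5, 2), "deletionFlag": False,
     "name": "new c"},
]

mode = choose_mode(["dateLastModified", "deletionFlag", "scientificName"])
if mode == "incremental":
    apply_incremental(cache, provider_records, last_harvest)
```

In this example record "a" is removed, "b" is left untouched because it has not changed since the last harvest, and "c" is added. A provider that maps only dateLastModified would fall back to a full harvest, since deletions could not be detected.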



