[tdwg] Ideas on having Harvesters like GBIF clean, flag inconsistencies, and add additional value to the data

Arthur Chapman tdwg.lists at achapman.org
Sat May 9 03:50:28 CEST 2009


Peter, there is one additional issue here.

You imply that if the data is not in WGS84, the lat and long should be
removed ("Those records without a Datum would still be exposed but the
added geo:latitude and geo:longitude fields would be empty."). However,
the lat and long could still be included, with the uncertainty
increased as discussed in the Georeferencing Best Practices document
(http://www.gbif.org/prog/digit/data_quality/BioGeomancerGuide) and as
implemented in the MaNIS Georeferencing Calculator
(http://manisnet.org/gc.html) and the BioGeomancer toolkit
(http://biogeomancer.org/).
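A minimal Python sketch of this suggestion follows. The record keeps its coordinates, and only its uncertainty radius grows when no datum is recorded. It assumes, as a conservative simplification, that uncertainty components are simply added together. The size of the datum penalty is left to the caller, since the Best Practices document and the MaNIS calculator are the places to look for appropriate values. `Georeference` and `widen_for_unknown_datum` are hypothetical names, and the 1000 m penalty in the demo is purely illustrative.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Georeference:
    """Hypothetical minimal record: a point plus an uncertainty radius."""
    latitude: float
    longitude: float
    datum: Optional[str]   # None when the provider did not record a datum
    uncertainty_m: float   # uncertainty radius in metres


def widen_for_unknown_datum(geo: Georeference, datum_penalty_m: float) -> Georeference:
    """Keep the coordinates, but enlarge the uncertainty if the datum is unknown.

    Assumption: components are combined by simple addition (a conservative
    choice); datum_penalty_m is supplied by the caller, not computed here.
    """
    if geo.datum is not None:
        return geo  # datum known: nothing to add
    return replace(geo, uncertainty_m=geo.uncertainty_m + datum_penalty_m)


if __name__ == "__main__":
    rec = Georeference(-35.28, 149.13, None, 500.0)
    widened = widen_for_unknown_datum(rec, 1000.0)  # illustrative penalty only
    print(widened.uncertainty_m)  # 1500.0
```

The point of the design is that nothing is thrown away. A consumer can still map the record, and can decide from the enlarged radius whether it is precise enough for a given analysis.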

Cheers

Arthur

Peter DeVries wrote:
> Arthur Chapman sent me some good comments regarding Datums etc.
>
> The discussion made me realize that there may be a need for two types
> of formats: one for the providers and a second one that is output by
> the harvesting service.
>
> This is because the needs and abilities of the data providers are
> different from the needs and abilities of those who would like to
> consume the data.
>
> Consumers, who analyze and map the data, would like something that is
> easy to process, standardized and as error-free as possible.
>
> It could work in the following way.  
>
> Data harvesters, like GBIF, would collect the records and run them
> through cleaning algorithms that check attributes, including whether
> the lat and long actually match the location described.
>
> These harvesters would then expose this cleaned data via XML and RDF 
> with tags that flag possible inconsistencies. The harvesters would 
> also add a field for the lat and long in WGS84 if the original record 
> contains a valid Datum. Those records without a Datum would still be 
> exposed but the added geo:latitude and geo:longitude fields would be 
> empty.
>
> I can imagine that the data uploaded to GBIF and other harvester
> services will be replete with typos and inconsistencies that will
> frustrate people trying to analyze or simply map the data. The
> harvester services could add value by minimizing these frustrations.
>
> Originally, it seemed that a global service should standardize on a
> global Datum like WGS84. After all, haven't we standardized on
> meters? However, after discussing this with Arthur, I realize that
> this is not possible for a number of reasons. That said, I think the
> data would be much more valuable and less likely to be misinterpreted
> if a version of it were available in WGS84. This solution would
> eventually encourage data providers to understand what a Datum is and
> include it in their data. It would also help solve a number of other
> data integration problems.
>
> Respectfully,
>
> Pete
>
>
> ---------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> ------------------------------------------------------------
>
> _______________________________________________
> tdwg mailing list
> tdwg at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg
>   
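The harvester-side step Peter describes (collect records, run checks, flag possible inconsistencies, and add WGS84 fields only when a Datum is given) could look roughly like the Python sketch below. Everything in it is an assumption for illustration. The input keys follow Darwin Core term names (decimalLatitude, decimalLongitude, geodeticDatum, countryCode). The flag names, the single coarse country bounding box and the converter table are hypothetical. A real service would check localities against proper boundary data and gazetteers, and would take datum transformations from a geodesy library rather than the identity mapping used here for WGS84 itself.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical converter table: datum name -> function returning WGS84 (lat, lon).
# Only WGS84 itself (identity) is filled in; real transformations would be plugged in.
Converter = Callable[[float, float], Tuple[float, float]]
CONVERTERS: Dict[str, Converter] = {"WGS84": lambda lat, lon: (lat, lon)}

# Hypothetical, deliberately coarse boxes: (min_lat, max_lat, min_lon, max_lon).
COUNTRY_BOXES: Dict[str, Tuple[float, float, float, float]] = {
    "AU": (-44.0, -10.0, 112.0, 154.0),
}


def normalise_datum(raw: Optional[str]) -> Optional[str]:
    """Map typo-prone variants such as 'wgs 84' or 'WGS-84' to 'WGS84'; blank -> None."""
    if raw is None or not raw.strip():
        return None
    return re.sub(r"[\s_-]", "", raw).upper()


def clean_record(rec: dict) -> dict:
    """Return a copy of the record with flags and (possibly empty) WGS84 fields added."""
    out = dict(rec)
    flags: List[str] = []
    lat, lon = rec.get("decimalLatitude"), rec.get("decimalLongitude")
    datum = normalise_datum(rec.get("geodeticDatum"))
    out["geodeticDatum"] = datum
    out["geo:latitude"] = out["geo:longitude"] = None  # empty unless filled below

    if lat is None or lon is None:
        flags.append("missingCoordinates")
    elif not (-90 <= lat <= 90 and -180 <= lon <= 180):
        flags.append("coordinatesOutOfRange")
    else:
        # Crude consistency check between the point and the stated country.
        box = COUNTRY_BOXES.get(rec.get("countryCode") or "")
        if box and not (box[0] <= lat <= box[1] and box[2] <= lon <= box[3]):
            flags.append("coordinatesOutsideCountry")
        if datum is None:
            flags.append("datumMissing")
        elif datum in CONVERTERS:
            out["geo:latitude"], out["geo:longitude"] = CONVERTERS[datum](lat, lon)
        else:
            flags.append("datumUnsupported")
    out["flags"] = flags
    return out
```

Flagged records are still returned rather than dropped, which matches the idea of exposing everything while marking possible problems for consumers.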

