Very Cool and Thanks,<div><br></div><div>I downloaded <span class="Apple-style-span" style="border-collapse: collapse; ">(<a href="http://code.google.com/p/gbif-providertoolkit/" target="_blank" style="color: rgb(51, 51, 204); ">http://code.google.com/p/gbif-providertoolkit/</a>) </span></div>
<div><span class="Apple-style-span" style="border-collapse: collapse;">and got it working on one of my test machines.</span></div><div><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></div><div>
<span class="Apple-style-span" style="border-collapse: collapse;">Is there a plan to move this to the new Darwin Core?</span></div><div><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></div>
<div><span class="Apple-style-span" style="border-collapse: collapse;">Thanks!</span></div><div><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></div><div><span class="Apple-style-span" style="border-collapse: collapse;">Pete</span></div>
<div><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></div><div><br><div class="gmail_quote">On Mon, May 11, 2009 at 2:48 AM, Tim Robertson <span dir="ltr">&lt;<a href="mailto:trobertson@gbif.org">trobertson@gbif.org</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi Peter,<br>
<br>
Just to expand on what Donald has written here:<br>
<div class="im"><br>
> My current thinking is that we should offer this as a service which<br>
> can both be executed during harvesting and also as a stand-alone<br>
> service for which users can submit a batch of Darwin Core-style<br>
> records (probably tab-delimited) and get back a report for whichever<br>
> set of tests or value-add operations they choose. This could help<br>
> providers with data cleaning even before they share their data (and<br>
> also could help them to make sure there are no known sensitivity<br>
> issues around their data). Such a service could be extended more or<br>
> less indefinitely to report more and more aspects of interest. One<br>
> of the major options could be to cross-reference records to accepted<br>
> taxonomic authorities (via LSIDs or other identifiers).<br>
<br>
</div>GBIF recently launched an early release of a biodiversity data<br>
publishing tool (<a href="http://code.google.com/p/gbif-providertoolkit/" target="_blank">http://code.google.com/p/gbif-providertoolkit/</a>) which<br>
allows for serving of occurrence and species oriented data, in a "star<br>
schema" format with Darwin Core as the core of the star. This tool<br>
has an embedded database, which allows for serving of text files (csv,<br>
tab delimited etc) and also the ability to sit in front of an existing<br>
database to offer DwC through a complete archive, TAPIR, WFS and WMS<br>
services. As you publish data through this tool, it currently does<br>
very basic type checking of input data, and creates "annotations" on<br>
the records that have issues (e.g. <a href="http://ipt.gbif.org/annotations.html?resource_id=11" target="_blank">http://ipt.gbif.org/annotations.html?resource_id=11</a>).<br>
As the tool matures in the coming months, we plan to open up an API<br>
so that data providers can call external services and have them push<br>
back annotations - e.g. check my coordinates, check my names with IPNI<br>
etc. By publishing the complete dataset as an "archive" (a zipped<br>
dump with an xml file describing the columns, <a href="http://rs.tdwg.org/dwc/terms/guides/text/index.htm" target="_blank">http://rs.tdwg.org/dwc/terms/guides/text/index.htm</a><br>
as Donald mentions) the technical threshold is reduced to a minimum<br>
for the data transfer to implement such a quality service, while also<br>
ensuring decent harvesting performance. It is in the current GBIF<br>
workplan to register such quality services in the GBIF registry which<br>
is undergoing development now, so that they may be discovered and used<br>
by all, including the GBIF publishing toolkit, and portals. By doing<br>
this, the roles of checking data, or implementing quality services are<br>
not centralised in a GBIF portal, but can be used by the data owner<br>
before sharing with GBIF or other networks.<br>
<br>
Additionally, by allowing for remote annotations, we can aim to<br>
ultimately push back all feedback from the GBIF portal (or others)<br>
into the publishing tools as opposed to through email as is the<br>
current feedback mechanism - this is related to other topics such as<br>
uniquely identifying resources as they are shared through various<br>
networks for example. It would then be trivial to have (for example)<br>
a google map with a clickable point which opens the details holding a<br>
link "this record has bad coordinates", or a form to fill in.<br>
Feedback could take the form of free text or perhaps even better, as<br>
"structured annotations" where possible (this record would be correct<br>
if the isoCountryCode was "DE") which could then be automatically<br>
removed should the source be updated to meet the annotation criteria.<br>
<br>
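To make the "structured annotations" idea concrete, here is a minimal sketch in Python. It is not how the GBIF toolkit does it: the class <code>StructuredAnnotation</code>, the function <code>open_annotations</code> and the dictionary record layout are hypothetical names chosen for illustration, and only the isoCountryCode example comes from the text above.<br>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredAnnotation:
    """Hypothetical annotation: 'this record would be correct if term == expected'."""
    record_id: str
    term: str
    expected: str

    def is_satisfied_by(self, record):
        # The annotation is resolved once the source value matches the expected one.
        return record.get(self.term) == self.expected


def open_annotations(annotations, records_by_id):
    """Return only the annotations whose criteria the current source does not meet."""
    remaining = []
    for ann in annotations:
        # A record missing from the source cannot satisfy anything, so keep the annotation.
        record = records_by_id.get(ann.record_id, {})
        if not ann.is_satisfied_by(record):
            remaining.append(ann)
    return remaining
```
</pre>
Re-running <code>open_annotations</code> after each update of the source would drop an annotation as soon as the owner sets isoCountryCode to "DE", which is the automatic removal described above.<br>
<br>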
Best wishes,<br>
<font color="#888888"><br>
Tim<br>
</font><div><div></div><div class="h5"><br>
<br>
<br>
<br>
><br>
><br>
> Best wishes,<br>
><br>
> Donald<br>
><br>
><br>
> Donald Hobern, Director, Atlas of Living Australia<br>
> CSIRO Entomology, GPO Box 1700, Canberra, ACT 2601<br>
> Phone: (02) 62464352 Mobile: 0437990208<br>
> Email: Donald.Hobern@csiro.au<br>
> Web: <a href="http://www.ala.org.au/" target="_blank">http://www.ala.org.au/</a><br>
><br>
><br>
> -----Original Message-----<br>
> Date: Fri, 8 May 2009 19:23:32 -0500<br>
> From: Peter DeVries &lt;<a href="mailto:pete.devries@gmail.com">pete.devries@gmail.com</a>&gt;<br>
> Subject: [tdwg] Ideas on having Harvesters like GBIF clean, flag<br>
> inconsistencies, and add additional value to the data<br>
> To: <a href="mailto:tdwg@lists.tdwg.org">tdwg@lists.tdwg.org</a><br>
> Message-ID:<br>
> &lt;<a href="mailto:3833bf630905081723l2f1d5369je8af6b0e4a26324d@mail.gmail.com">3833bf630905081723l2f1d5369je8af6b0e4a26324d@mail.gmail.com</a>&gt;<br>
> Content-Type: text/plain; charset="iso-8859-1"<br>
><br>
> Arthur Chapman sent me some good comments regarding Datums etc.<br>
> The discussion made me realize that there may be a need for two<br>
> types of<br>
> formats. One for the providers and a second one that is output by the<br>
> harvesting service.<br>
><br>
> This is because the needs and abilities of the data providers are<br>
> different<br>
> than the needs and abilities of those who would like to consume the<br>
> data.<br>
><br>
> Consumers, who analyze and map the data, would like something that<br>
> is easy<br>
> to process, standardized and as error free as possible.<br>
><br>
> It could work in the following way.<br>
><br>
> Data harvesters, like GBIF, collect the records and run them through<br>
> cleaning algorithms that check attributes including that the lat and<br>
> long<br>
> actually match the location described.<br>
><br>
> These harvesters would then expose this cleaned data via XML and RDF<br>
> with<br>
> tags that flag possible inconsistencies. The harvesters would also<br>
> add a<br>
> field for the lat and long in WGS84 if the original record contains<br>
> a valid<br>
> Datum. Those records without a Datum would still be exposed but the<br>
> added<br>
> geo:latitude and geo:longitude fields would be empty.<br>
><br>
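A rough sketch of the added fields described above, in Python. The input keys <code>latitude</code>, <code>longitude</code> and <code>datum</code>, the function <code>add_wgs84_fields</code> and the <code>to_wgs84</code> callable are assumptions made for illustration. The sketch does no datum transformation itself: records already in WGS84 are copied through, and any other datum is handed to the caller-supplied function.<br>
<pre>
```python
def add_wgs84_fields(record, to_wgs84=None):
    """Return a copy of record with geo:latitude and geo:longitude added.

    to_wgs84 is a hypothetical callable (lat, lon, datum) returning (lat, lon);
    it stands in for a real datum transformation library.
    """
    out = dict(record)
    out["geo:latitude"] = ""
    out["geo:longitude"] = ""

    datum = (record.get("datum") or "").strip().upper()
    if not datum:
        # No datum: the record is still exposed, but the WGS84 fields stay empty.
        return out

    try:
        lat = float(record["latitude"])
        lon = float(record["longitude"])
    except (KeyError, TypeError, ValueError):
        # Missing or non-numeric coordinates: leave the WGS84 fields empty.
        return out

    if datum == "WGS84":
        out["geo:latitude"], out["geo:longitude"] = lat, lon
    elif to_wgs84 is not None:
        out["geo:latitude"], out["geo:longitude"] = to_wgs84(lat, lon, datum)
    return out
```
</pre>
The original values are never overwritten, so a consumer can always compare what the provider supplied with what the harvester derived.<br>
<br>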
> I can imagine that data uploaded to GBIF and other harvester<br>
> services<br>
> will be replete with typos and inconsistencies that will frustrate<br>
> people<br>
> trying to analyze or simply map the data. The harvester services<br>
> could add<br>
> value by minimizing these frustrations.<br>
><br>
> Originally, it seemed that a global service should standardize on a<br>
> global<br>
> Datum like WGS84. After all, we have standardized on meters?<br>
> However, after<br>
> discussing this with Arthur, I realize that this is not possible for a<br>
> number of reasons. That said, I think the data would be much more<br>
> valuable<br>
> and less likely to be misinterpreted if a version of it was<br>
> available in<br>
> WGS84. This solution would eventually encourage data providers to<br>
> understand<br>
> what a Datum is and include it in their data. It would also help<br>
> solve a<br>
> number of other data integration problems.<br>
><br>
> Respectfully,<br>
><br>
> Pete<br>
> _______________________________________________<br>
> tdwg mailing list<br>
> <a href="mailto:tdwg@lists.tdwg.org">tdwg@lists.tdwg.org</a><br>
> <a href="http://lists.tdwg.org/mailman/listinfo/tdwg" target="_blank">http://lists.tdwg.org/mailman/listinfo/tdwg</a><br>
><br>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>---------------------------------------------------------------<br>Pete DeVries<br>Department of Entomology<br>University of Wisconsin - Madison<br>445 Russell Laboratories<br>
1630 Linden Drive<br>Madison, WI 53706<br>------------------------------------------------------------<br>
</div>