Very Cool and Thanks,<div><br></div><div>I downloaded <span class="Apple-style-span" style="border-collapse: collapse; ">(<a href="http://code.google.com/p/gbif-providertoolkit/" target="_blank" style="color: rgb(51, 51, 204); ">http://code.google.com/p/gbif-providertoolkit/</a>) </span></div>

<div><span class="Apple-style-span" style="border-collapse: collapse;">and got it working on one of my test machines.</span></div><div><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></div><div>

<span class="Apple-style-span" style="border-collapse: collapse;">Is there a plan to move or not move this to the new DarwinCore?</span></div><div><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></div>

<div><span class="Apple-style-span" style="border-collapse: collapse;">Thanks!</span></div><div><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></div><div><span class="Apple-style-span" style="border-collapse: collapse;">Pete</span></div>

<div><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></div><div><br><div class="gmail_quote">On Mon, May 11, 2009 at 2:48 AM, Tim Robertson <span dir="ltr">&lt;<a href="mailto:trobertson@gbif.org">trobertson@gbif.org</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi Peter,<br>

<br>

Just to expand on what Donald has written here:<br>

<div class="im"><br>

&gt; My current thinking is that we should offer this as a service which<br>

&gt; can both be executed during harvesting and also as a stand-alone<br>

&gt; service for which users can submit a batch of Darwin Core-style<br>

&gt; records (probably tab-delimited) and get back a report for whichever<br>

&gt; set of tests or value-add operations they choose.  This could help<br>

&gt; providers with data cleaning even before they share their data (and<br>

&gt; also could help them to make sure there are no known sensitivity<br>

&gt; issues around their data).  Such a service could be extended more or<br>

&gt; less indefinitely to report more and more aspects of interest.  One<br>

&gt; of the major options could be to cross-reference records to accepted<br>

&gt; taxonomic authorities (via LSIDs or other identifiers).<br>

<br>

</div>GBIF recently launched an early release of a biodiversity data<br>

publishing tool (<a href="http://code.google.com/p/gbif-providertoolkit/" target="_blank">http://code.google.com/p/gbif-providertoolkit/</a>) which<br>

allows for serving of occurrence and species oriented data, in a &quot;star<br>

schema&quot; format with Darwin Core as the core of the star.  This tool<br>

has an embedded database, which allows for serving of text files (csv,<br>

tab delimited etc) and also the ability to sit in front of an existing<br>

database to offer DwC through a complete archive, TAPIR, WFS and WMS<br>

services.  As you publish data through this tool, it currently does<br>

very basic type checking of input data, and creates &quot;annotations&quot; on<br>

the records that have issues (e.g. <a href="http://ipt.gbif.org/annotations.html?resource_id=11" target="_blank">http://ipt.gbif.org/annotations.html?resource_id=11</a>).<br>

As the tool matures in the coming months, we plan to open up an API<br>

so that data providers can call external services and have them push<br>

back annotations - e.g. check my coordinates, check my names with IPNI<br>

etc.  By publishing the complete dataset as an &quot;archive&quot; (a zipped<br>

dump with an xml file describing the columns, <a href="http://rs.tdwg.org/dwc/terms/guides/text/index.htm" target="_blank">http://rs.tdwg.org/dwc/terms/guides/text/index.htm</a><br>

  as Donald mentions) the technical threshold is reduced to a minimum<br>

for the data transfer to implement such a quality service, while also<br>

ensuring decent harvesting performance.  It is in the current GBIF<br>

workplan to register such quality services in the GBIF registry which<br>

is undergoing development now, so that they may be discovered and used<br>

by all, including the GBIF publishing toolkit, and portals.  By doing<br>

this, the roles of checking data, or implementing quality services are<br>

not centralised in a GBIF portal, but can be used by the data owner<br>

before sharing with GBIF or other networks.<br>
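<br>
To make the "basic type checking" step concrete, here is a minimal sketch in Python, assuming tab-delimited input with Darwin Core column names. The function name annotate, the annotation fields and the two range limits are illustrative assumptions and not the publishing tool's actual API or rules.<br>
<pre>
```python
import csv
import io

# Assumed rules for this sketch only: Darwin Core term -> largest allowed absolute value.
NUMERIC_LIMITS = {"decimalLatitude": 90.0, "decimalLongitude": 180.0}


def annotate(tab_text):
    """Return one annotation dict per value that fails a basic numeric check."""
    annotations = []
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    for row_number, record in enumerate(reader, start=1):
        for term, limit in NUMERIC_LIMITS.items():
            raw = (record.get(term) or "").strip()
            if not raw:
                continue  # empty values are not flagged in this sketch
            try:
                value = float(raw)
            except ValueError:
                annotations.append(
                    {"row": row_number, "term": term, "issue": "not a number", "value": raw}
                )
                continue
            if abs(value) > limit:
                annotations.append(
                    {"row": row_number, "term": term, "issue": "out of range", "value": raw}
                )
    return annotations


# Hypothetical two-record batch: the second record has two problems.
sample = (
    "scientificName\tdecimalLatitude\tdecimalLongitude\n"
    "Puma concolor\t43.07\t-89.41\n"
    "Aedes aegypti\tnorth\t200\n"
)
report = annotate(sample)
```
</pre>
Each annotation points at a row and a term rather than changing the record, so the data owner decides how to correct the source.<br>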

<br>

Additionally, by allowing for remote annotations, we can aim to<br>

ultimately push back all feedback from the GBIF portal (or others)<br>

into the publishing tools as opposed to through email as is the<br>

current feedback mechanism - this is related to other topics such as<br>

uniquely identifying resources as they are shared through various<br>

networks for example.  It would then be trivial to have (for example)<br>

a google map with a clickable point which opens the details holding a<br>

link &quot;this record has bad coordinates&quot;, or a form to fill in.<br>

Feedback could take the form of free text or perhaps even better, as<br>

&quot;structured annotations&quot; where possible (this record would be correct<br>

if the isoCountryCode was &quot;DE&quot;) which could then be automatically<br>

removed should the source be updated to meet the annotation criteria.<br>
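<br>
A structured annotation of this kind could be sketched as follows in Python. The class StructuredAnnotation, the function open_annotations and the record layout are hypothetical names chosen for illustration; nothing here is an existing GBIF interface.<br>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredAnnotation:
    """Hypothetical form of: 'this record would be correct if term was expected'."""
    record_id: str
    term: str
    expected: str

    def is_satisfied_by(self, record):
        return record.get(self.term) == self.expected


def open_annotations(annotations, records):
    """Return the annotations whose criterion the current source does not yet meet.

    Annotations for records missing from the source are kept (a choice made
    for this sketch), so that feedback is not silently lost.
    """
    remaining = []
    for annotation in annotations:
        record = records.get(annotation.record_id)
        if record is None or not annotation.is_satisfied_by(record):
            remaining.append(annotation)
    return remaining


# Usage: the annotation stays open until the source is updated to match it.
records = {"r1": {"isoCountryCode": "FR"}}
feedback = [StructuredAnnotation("r1", "isoCountryCode", "DE")]
before_fix = open_annotations(feedback, records)
records["r1"]["isoCountryCode"] = "DE"
after_fix = open_annotations(feedback, records)
```
</pre>
Because the criterion is machine-readable, the annotation can be dropped on the next publish without anyone closing it by hand, which free text feedback does not allow.<br>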

<br>

Best wishes,<br>

<font color="#888888"><br>

Tim<br>

</font><div><div></div><div class="h5"><br>

<br>

<br>

<br>

&gt;<br>

&gt;<br>

&gt; Best wishes,<br>

&gt;<br>

&gt; Donald<br>

&gt;<br>

&gt;<br>

&gt; Donald Hobern, Director, Atlas of Living Australia<br>

&gt; CSIRO Entomology, GPO Box 1700, Canberra, ACT 2601<br>

&gt; Phone: (02) 62464352 Mobile: 0437990208<br>

&gt; Email: Donald.Hobern@csiro.au<br>

&gt; Web: <a href="http://www.ala.org.au/" target="_blank">http://www.ala.org.au/</a><br>

&gt;<br>

&gt;<br>

&gt; -----Original Message-----<br>

&gt; Date: Fri, 8 May 2009 19:23:32 -0500<br>

&gt; From: Peter DeVries &lt;<a href="mailto:pete.devries@gmail.com">pete.devries@gmail.com</a>&gt;<br>

&gt; Subject: [tdwg] Ideas on having Harvesters like GBIF clean,   flag<br>

&gt;       inconsistencies, and    add additional value to the data<br>

&gt; To: <a href="mailto:tdwg@lists.tdwg.org">tdwg@lists.tdwg.org</a><br>

&gt; Message-ID:<br>

&gt;       &lt;<a href="mailto:3833bf630905081723l2f1d5369je8af6b0e4a26324d@mail.gmail.com">3833bf630905081723l2f1d5369je8af6b0e4a26324d@mail.gmail.com</a>&gt;<br>

&gt; Content-Type: text/plain; charset=&quot;iso-8859-1&quot;<br>

&gt;<br>

&gt; Arthur Chapman sent me some good comments regarding Datums etc.<br>

&gt; The discussion made me realize that there may be a need for two<br>

&gt; types of<br>

&gt; formats. One for the providers and a second one that is output by the<br>

&gt; harvesting service.<br>

&gt;<br>

&gt; This is because the needs and abilities of the data providers are<br>

&gt; different<br>

&gt; than the needs and abilities of those who would like to consume the<br>

&gt; data.<br>

&gt;<br>

&gt; Consumers, who analyze and map the data, would like something that<br>

&gt; is easy<br>

&gt; to process, standardized and as error-free as possible.<br>

&gt;<br>

&gt; It could work in the following way.<br>

&gt;<br>

&gt; Data harvesters, like GBIF, collect the records. Run them through<br>

&gt; cleaning algorithms that check attributes including that the lat and<br>

&gt; long<br>

&gt; actually match the location described.<br>

&gt;<br>

&gt; These harvesters would then expose this cleaned data via XML and RDF<br>

&gt; with<br>

&gt; tags that flag possible inconsistencies. The harvesters would also<br>

&gt; add a<br>

&gt; field for the lat and long in WGS84 if the original record contains<br>

&gt; a valid<br>

&gt; Datum. Those records without a Datum would still be exposed but the<br>

&gt; added<br>

&gt; geo:latitude and geo:longitude fields would be empty.<br>

&gt;<br>

&gt; I can imagine that the data uploaded to GBIF and other harvester<br>

&gt; services<br>

&gt; will be replete with typos and inconsistencies that will frustrate<br>

&gt; people<br>

&gt; trying to analyze or simply map the data. The harvester services<br>

&gt; could add<br>

&gt; value by minimizing these frustrations.<br>

&gt;<br>

&gt; Originally, it seemed that a global service should standardize on a<br>

&gt; global<br>

&gt; Datum like WGS84. After all, we have standardized on meters?<br>

&gt; However, after<br>

&gt; discussing this with Arthur, I realize that this is not possible for a<br>

&gt; number of reasons. That said, I think the data would be much more<br>

&gt; valuable<br>

&gt; and less likely to be misinterpreted if a version of it was<br>

&gt; available in<br>

&gt; WGS84. This solution would eventually encourage data providers to<br>

&gt; understand<br>

&gt; what a Datum is and include it in their data. It would also help<br>

&gt; solve a<br>

&gt; number of other data integration problems.<br>

&gt;<br>

&gt; Respectfully,<br>

&gt;<br>

&gt; Pete<br>

&gt; _______________________________________________<br>

&gt; tdwg mailing list<br>

&gt; <a href="mailto:tdwg@lists.tdwg.org">tdwg@lists.tdwg.org</a><br>

&gt; <a href="http://lists.tdwg.org/mailman/listinfo/tdwg" target="_blank">http://lists.tdwg.org/mailman/listinfo/tdwg</a><br>

&gt;<br>

<br>


</div></div></blockquote></div><br><br clear="all"><br>-- <br>---------------------------------------------------------------<br>Pete DeVries<br>Department of Entomology<br>University of Wisconsin - Madison<br>445 Russell Laboratories<br>

1630 Linden Drive<br>Madison, WI 53706<br>------------------------------------------------------------<br>

</div>