[tdwg] Country name reconciliation

Matt Jones jones at nceas.ucsb.edu
Fri May 17 16:39:32 CEST 2013


A good official list of countries is available from the Library of Congress:
  http://www.loc.gov/standards/codelists/countries.xml
For background, see:
  http://www.loc.gov/marc/countries/

And of course there's ISO 3166, the list of country codes:
  http://www.iso.org/iso/home/standards/country_codes/country_names_and_code_elements_xml.htm
  http://www.iso.org/iso/country_codes

I'm not sure either list covers the alternate representations and
misspellings, though.
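
One possible starting point for the embedded thesaurus, as a minimal sketch only: the JDK already ships ISO 3166 alpha-2 codes (Locale.getISOCountries()) and localized display names (Locale.getDisplayCountry(Locale)), so a lookup table can be seeded from those and then extended with known misspellings. The class and method names here (CountryLookup, addVariant, resolve) are hypothetical and are not part of narwhal-processor, and the JDK's bundled data is assumed to be recent enough, which may not hold for newly created or renamed countries.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: maps country-name variants to ISO 3166 alpha-2 codes. */
public class CountryLookup {
    // Key: normalized name or code; value: ISO 3166 alpha-2 code.
    private final Map<String, String> index = new HashMap<>();

    /** Seeds the index from the JDK's country data in the given languages. */
    public CountryLookup(Locale... languages) {
        for (String code : Locale.getISOCountries()) {
            Locale country = new Locale.Builder().setRegion(code).build();
            index.put(normalize(code), code);
            for (Locale language : languages) {
                String name = country.getDisplayCountry(language);
                index.putIfAbsent(normalize(name), code);
            }
        }
    }

    /** Registers an extra variant, e.g. a known misspelling. */
    public void addVariant(String variant, String code) {
        index.put(normalize(variant), code);
    }

    /** Returns the alpha-2 code for a raw value, or empty if unknown. */
    public Optional<String> resolve(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(index.get(normalize(raw)));
    }

    // Trims, strips diacritics, collapses whitespace, and lower-cases.
    static String normalize(String s) {
        String decomposed = Normalizer.normalize(s.trim(), Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "")
                         .replaceAll("\\s+", " ")
                         .toLowerCase(Locale.ROOT);
    }
}
```

Usage would look like new CountryLookup(Locale.ENGLISH, Locale.FRENCH), followed by addVariant("Cananda", "CA") for each misspelling harvested from an external source. Keeping the misspellings in a separate, refreshable list while the official names come from a maintained code list is one way to meet the low-maintenance goal, though it does no fuzzy matching.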

Matt


On Fri, May 17, 2013 at 5:57 AM, Shorthouse, David <
davidpshorthouse at gmail.com> wrote:

> Folks,
>
> The Canadensys development team (http://www.canadensys.net) is looking
> for efficient, low-maintenance ways to validate and reconcile data in
> its national cache of occurrence data. We are working on a Java
> library, https://github.com/Canadensys/narwhal-processor, that will
> initially tackle single-field Darwin Core validations. We hope this
> library is sufficiently general for uses outside our project.
>
> Our current challenge is to reconcile country names, which requires
> access to an up-to-date, well-maintained knowledge base of country
> names, their alternative representations (possibly multilingual), and
> mappings to known misspellings. For performance reasons, we'd like
> this thesaurus to be embedded in the library, but with the capacity to
> be periodically refreshed with data pulled from external resources
> such as dbpedia.org. This clearly has ties to semantic web thinking
> and, because we're new to the tools and services in this space, we'd
> like to solicit pointers and feedback so that we build this part of
> our library with maximal benefit to other projects. We have started
> collecting thoughts here:
> https://github.com/Canadensys/narwhal-processor/issues/14.
>
> Cheers,
>
> David P. Shorthouse
> Christian Gendreau