David,

You might like to use the GBIF vocabulary server. It has a multilingual country name thesaurus based on ISO 3166, with over 23K terms for 226 ISO countries. You can download the data or use the service. It may already contain some lexical variants and misspellings, and you can get an account and add any you know of. It is all presented in your old friend Drupal. Perhaps you might like to serve as curator? There's a diamond in the rough here, I'm sure of it.

http://vocabularies.gbif.org/vocabularies/country

Best,
Dave

----------------------------------------------------------------------------
David Remsen
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +1 508 289 7477   Fax: +1 508 289 7900
Mobile +1 508 274 4055
Skype: dremsen
----------------------------------------------------------------------------

On May 17, 2013, at 10:39 AM, Matt Jones wrote:

A good official list of countries is available from the Library of Congress:
  For background, see: http://www.loc.gov/marc/countries/

And of course there's ISO 3166, the list of country codes:

Not sure about the alternate representations and misspellings, though.

Matt


On Fri, May 17, 2013 at 5:57 AM, Shorthouse, David <davidpshorthouse@gmail.com> wrote:
Folks,

The Canadensys development team (http://www.canadensys.net) is looking
for efficient, low-maintenance ways to validate and reconcile data in
its national cache of occurrence data. We are working on a Java
library that initially tackles single-field Darwin Core validations:
https://github.com/Canadensys/narwhal-processor. We hope this library
is sufficiently generalized for use outside our project.

Our current challenge is to reconcile country names, which requires
access to an up-to-date, well-maintained knowledge base of country
names, their alternative representations (possibly multilingual), and
mappings to known misspellings. For performance reasons, we'd like
this thesaurus to be embedded in the library, but with the capacity to
be periodically refreshed with data pulled from external resources
such as dbpedia.org. This clearly has ties to semantic web thinking
and, because we're new to the tools and services in this space, we'd
like to solicit pointers and feedback such that we build this part of
our library with maximal benefit to other projects. We started
collecting thoughts here:
https://github.com/Canadensys/narwhal-processor/issues/14.
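
To make the idea concrete, here is a minimal sketch of the kind of
embedded, refreshable thesaurus described above. It is not code from
narwhal-processor: the class name CountryThesaurus, its methods, and
the sample entries (including the misspelling "Canda") are hypothetical
and only illustrate the lookup. It assumes exact matching after
normalization is enough, and leaves fuzzy matching out.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory thesaurus: name variants -> ISO 3166-1 alpha-2 code. */
public class CountryThesaurus {
    // Keyed by normalized variant so lookups ignore case, accents and punctuation.
    private final Map<String, String> variantToIso = new HashMap<>();

    /** Strips diacritics, lower-cases, and collapses anything that is not a letter or digit. */
    static String normalize(String raw) {
        String decomposed = Normalizer.normalize(raw, Normalizer.Form.NFD);
        return decomposed
                .replaceAll("\\p{M}", "")               // drop combining marks
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}]+", " ")    // punctuation and whitespace -> one space
                .trim();
    }

    /** Registers a preferred name, translations or known misspellings for one code. */
    public void add(String isoCode, String... variants) {
        for (String variant : variants) {
            variantToIso.put(normalize(variant), isoCode);
        }
    }

    /** Replaces the embedded data with a freshly pulled variant -> code mapping. */
    public void refresh(Map<String, String> freshVariantToIso) {
        variantToIso.clear();
        freshVariantToIso.forEach((variant, iso) -> add(iso, variant));
    }

    /** Returns the ISO code for a verbatim country string, or empty if unknown. */
    public Optional<String> reconcile(String verbatim) {
        if (verbatim == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(variantToIso.get(normalize(verbatim)));
    }

    public static void main(String[] args) {
        CountryThesaurus thesaurus = new CountryThesaurus();
        // Illustrative entries only; real data would come from an external source.
        thesaurus.add("CA", "Canada", "Canda");
        thesaurus.add("CI", "C\u00f4te d'Ivoire", "Ivory Coast");
        System.out.println(thesaurus.reconcile("  CANADA. "));
        System.out.println(thesaurus.reconcile("Cote d Ivoire"));
        System.out.println(thesaurus.reconcile("Atlantis"));
    }
}
```

Keeping the map behind a refresh method means the library can ship
with a bundled snapshot and still swap in newer data without changing
how callers reconcile a value.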

Cheers,

David P. Shorthouse
Christian Gendreau
_______________________________________________
tdwg mailing list
tdwg@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg
