Now that we've been talking about the variations, perhaps we can move toward doing something about them.
I mentioned that we have been talking about having a service built that can read and unpack a DarwinCore Archive, evaluate specific elements like the ones we have been discussing that may be expressed inconsistently, set them right, and spit back out a new, more consistent archive file. Before we can put out a call for a developer, we need to capture what the service should do. Markus and I have started on this, but we have a lot of end-of-year business and less time than money. We still have some funds available to at least start the process.
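To make the first step ("read and unpack") a little more concrete, here is a minimal Python sketch of reading an archive's core file. It assumes the archive follows the Darwin Core text guidelines (a zip file holding a meta.xml descriptor plus a delimited core data file). The function name read_core is hypothetical, and the sketch deliberately ignores extensions, default values and quoting rules that a real service would have to handle.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the Darwin Core text descriptor (meta.xml).
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def read_core(archive):
    """Return (rowType, list of {term: value} dicts) for the archive's core file.

    Hypothetical helper: `archive` is a path or a binary file object for the zip.
    Only the core file is read; extensions and default values are skipped.
    """
    with zipfile.ZipFile(archive) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(DWC_TEXT_NS + "core")
        location = core.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location").text
        # meta.xml stores the delimiter escaped (e.g. "\t"), so unescape it.
        delimiter = core.get("fieldsTerminatedBy", ",").encode().decode("unicode_escape")
        skip = int(core.get("ignoreHeaderLines", "0"))
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))

    # Map column index -> full term URI; fields without an index (defaults) are skipped.
    columns = {int(f.get("index")): f.get("term")
               for f in core.findall(DWC_TEXT_NS + "field")
               if f.get("index") is not None}
    id_el = core.find(DWC_TEXT_NS + "id")
    if id_el is not None:
        columns.setdefault(int(id_el.get("index")), "id")

    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))[skip:]
    records = [{term: row[i] for i, term in columns.items() if i < len(row)}
               for row in rows if row]
    return core.get("rowType"), records
```

Once the records are plain dictionaries keyed by term URI, the later checks can work on them directly, and writing the cleaned archive would be roughly the same process in reverse.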
I'd like to know whether anyone is interested in developing, and feels qualified to develop, a more complete set of requirements for such a service, which we would then try to find a developer to build. We aren't trying to deal with everything at once, mind you; just some key things that might make ingesting a DarwinCore Archive of either Taxon data or Occurrence data a bit more consistent with regard to the taxonomic elements. I'd need two or three days to do this as completely as I think is needed, so that is the sort of time I'm anticipating you smart people would need as well.
For example,
• Checking the integrity of normal and denormal classifications. What are the steps and conditions for checking integrity in a normal classification, and for transforming a denormal classification into a normal one? In the latter case, for example, you have to make sure the same Family value doesn't have two different parents. If it does, what then?
• Creating IDs for ID-less, normalised records (e.g. parentNameUsage).
• Mapping taxon ranks to our taxon rank vocabulary so that alternative forms (ssp, subspec, ss.) are replaced, where possible, with the standard form.
• Normalising taxonomic and nomenclatural status.
• Splitting a merged name into name and authorship parts.
• Checking that the split version is consistent with the complete one, if both are given.
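The first two items (denormal to normal, plus IDs for ID-less records) might look something like the following Python sketch. It assumes each denormal row is a dictionary of higher-classification columns (the RANKS list below is a hypothetical subset), treats a name as the pair (rank, name), and generates arbitrary sequential IDs. It only reports a name with more than one parent and leaves it unlinked; deciding what should happen then is exactly the open question above.

```python
from collections import defaultdict

# Hypothetical subset of higher-classification columns, highest rank first.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def normalise(rows):
    """Turn denormal rows into (taxa, conflicts).

    taxa maps a generated taxonID to scientificName, taxonRank and parentNameUsageID;
    conflicts maps (rank, name) to the set of distinct parents seen for that name.
    """
    parents = defaultdict(set)  # (rank, name) -> {(parent_rank, parent_name) or None}
    for row in rows:
        parent = None
        for rank in RANKS:
            name = (row.get(rank) or "").strip()
            if not name:
                continue  # blank rank: link to the nearest filled-in higher rank
            parents[(rank, name)].add(parent)
            parent = (rank, name)

    conflicts = {key: ps for key, ps in parents.items() if len(ps) > 1}

    # Sequential IDs for ID-less records; the "t" prefix is an arbitrary choice.
    ordered = sorted(parents, key=lambda k: (RANKS.index(k[0]), k[1]))
    ids = {key: f"t{i}" for i, key in enumerate(ordered, 1)}

    taxa = {}
    for key, ps in parents.items():
        # Only link a name when its parent is unambiguous.
        parent = next(iter(ps)) if len(ps) == 1 else None
        taxa[ids[key]] = {
            "scientificName": key[1],
            "taxonRank": key[0],
            "parentNameUsageID": ids[parent] if parent else None,
        }
    return taxa, conflicts
```

Note that linking across blank ranks is itself a choice: a Family recorded once with and once without its Order will show up as a conflict, which the requirements would need to address.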
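For the rank mapping, a lookup table after light normalisation (lower-casing, dropping dots and spaces) is probably the simplest starting point. The variant table below is a hypothetical illustration built from the examples in the list; the real one would come from our rank vocabulary. Anything that cannot be mapped returns None, so the original value can be kept and flagged ("where possible").

```python
import re

# Hypothetical variant table; the standard forms should come from the rank vocabulary.
RANK_VARIANTS = {
    "subspecies": {"ssp", "subsp", "subspec", "ss"},
    "variety": {"var"},
    "form": {"f", "forma"},
    "species": {"sp", "spec"},
}
# Flatten to variant -> standard form, including each standard form itself.
_LOOKUP = {v: std for std, variants in RANK_VARIANTS.items() for v in variants | {std}}


def normalise_rank(raw):
    """Return the standard rank for a raw value, or None if it cannot be mapped."""
    key = re.sub(r"[\s.]+", "", raw or "").lower()
    return _LOOKUP.get(key)
```

The same pattern would presumably serve for taxonomic and nomenclatural status values, with a different table.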
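The last two items (splitting a merged name and checking it against separately supplied parts) could be prototyped with a deliberately naive regular expression, as in the Python sketch below. This is not a real name parser: it assumes a capitalised genus, up to two lower-case epithets with an optional rank marker, and treats everything after that as authorship. Hybrids, cultivars and authorships beginning with a lower-case particle such as "de" would all need proper handling. The function names are hypothetical.

```python
import re

# Naive pattern: genus, up to two epithets (optionally preceded by a rank marker),
# then whatever follows is taken to be the authorship.
NAME_RE = re.compile(
    r"^(?P<name>[A-Z][a-z-]+(?:\s+(?:(?:subsp|ssp|var|f)\.\s+)?[a-z-]+){0,2})"
    r"(?:\s+(?P<author>.+))?$"
)


def _squash(s):
    """Collapse runs of whitespace so spacing differences are not reported."""
    return " ".join((s or "").split())


def split_name(full):
    """Return (name, authorship), or None if the string does not fit the pattern."""
    m = NAME_RE.match(_squash(full))
    if not m:
        return None
    return m.group("name"), m.group("author") or ""


def check_consistency(full, given_name, given_author):
    """True/False if the split parts agree with the full name; None if undecidable."""
    split = split_name(full)
    if split is None:
        return None  # cannot tell; flag the record for a human
    return split == (_squash(given_name), _squash(given_author))
```

Returning None rather than False for names the pattern cannot handle keeps "inconsistent" separate from "could not check", which seems worth spelling out in the requirements.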
I've put the working document Markus and I have been using in Dropbox; it can be viewed here: https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srci...
Again, we are not looking for someone to do the programming, but for someone to provide enough detail that the eventual developer knows the steps needed to do that work.
So, does anyone want to help out a data standard over the holidays? We are eager to start because I'd like to get a call for a developer out before the middle of December, when the little bag of gold gets taken away. If multiple people are interested, we will have to draw straws or pistols or use some other means of making a good decision. But please contact me directly and we can follow up offline.
Best, David Remsen
----------------------------------------------------------------------------
David Remsen, Senior Programme Officer
Electronic Catalog of Names of Known Organisms
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321472 Fax: +45-35321480 Mobile +45 28751472 Skype: dremsen
----------------------------------------------------------------------------