[tdwg-content] DwC-A taxonomic normaliser/standardiser - get paid to write requirements

Thu Nov 25 17:10:23 CET 2010

Now that we've been talking about the variations, perhaps we can move
toward doing something about them.

I mentioned we have been talking about having a service built that can
read and unpack a DarwinCore Archive, evaluate specific things that may
be expressed inconsistently (like those we have been discussing), set
them right, and spit back out a new and more consistent archive file.
In order to put out a call for a developer to do this, we need to
capture some of the things it should do. Markus and I have started this
but we have a lot of end-of-year business and less time than money. We
still have some funds available to at least start this process.

I'd like to know if anyone is interested in, and feels qualified to,
develop a more complete set of requirements for such a service, which
we would then try to find a developer to build. We aren't trying to
deal with everything at once, mind you. Just some key things that
might make ingesting a DarwinCore Archive for either Taxon data or
Occurrence data a bit more consistent in regard to the taxonomic
elements. I'd need two or three days to do this as completely as I
think is needed, so that is the sort of time I'm anticipating you smart
people would need as well.

For example,

* Checking the integrity of normalised and denormalised
classifications. What are the steps and conditions to check integrity
in normalised classifications, and to transform a denormalised
classification into a normalised one? In the latter, for example, you
have to make sure the same Family value doesn't have two different
parents. If it does, what then?
* Creating IDs for IDless, normalised records (e.g. parentNameUsage).
* Mapping taxon ranks to our taxon rank vocabulary so that alternative
forms (ssp, subspec, ss.) are replaced, when possible, with the
standard form.
* Normalising taxon and nomenclatural status.
* Splitting a merged name into name and authorship parts.
* Checking the split version is consistent with the complete one if
both are given.
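
To make a few of these concrete, here is a rough Python sketch of three
of the checks: rank mapping, the "same Family, two parents" test on a
denormalised classification, and the split-versus-complete name check.
It is only an illustration, not a proposed design. The function names,
the synonym table, the choice of "subspecies" as the standard form, and
the sample rows are all assumptions made up for the example; the real
rank vocabulary and archive parsing would replace them.

```python
from collections import defaultdict

# Hypothetical synonym table. The real taxon rank vocabulary would supply
# both the variants and the standard form they map to.
RANK_SYNONYMS = {
    "ssp": "subspecies",
    "ssp.": "subspecies",
    "subspec": "subspecies",
    "subspec.": "subspecies",
    "ss.": "subspecies",
}


def normalise_rank(raw):
    """Return the standard rank for a known variant, else the cleaned input."""
    key = raw.strip().lower()
    return RANK_SYNONYMS.get(key, key)


def conflicting_parents(rows, child="family", parent="order"):
    """Return each child value that appears under more than one parent value."""
    seen = defaultdict(set)
    for row in rows:
        c, p = row.get(child), row.get(parent)
        if c and p:
            seen[c].add(p)
    return {c: sorted(ps) for c, ps in seen.items() if len(ps) > 1}


def split_is_consistent(full_name, name, authorship):
    """Check that name + authorship rebuilds the complete name.

    Only differences in whitespace are ignored.
    """
    rebuilt = " ".join(f"{name} {authorship}".split())
    return rebuilt == " ".join(full_name.split())


# Made-up denormalised rows: Felidae is listed under two different orders.
rows = [
    {"order": "Carnivora", "family": "Felidae"},
    {"order": "Rodentia", "family": "Felidae"},
    {"order": "Carnivora", "family": "Canidae"},
]

print(normalise_rank("Ssp"))
print(conflicting_parents(rows))
print(split_is_consistent("Puma concolor (Linnaeus, 1771)",
                          "Puma concolor", "(Linnaeus, 1771)"))
```

The sketch only detects the conflicting Family; deciding what to do
about it (the "what then?" above) is exactly the kind of rule the
requirements would need to spell out.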

I put the working doc Markus and I have in Dropbox, and it is also available here: https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B1c8QPPC59XkZTBlZDE4YjUtZDczYS00MWUxLTlkZGItNDhhYzg0ZmNhOTNk&hl=en&authkey=CK_rqc4B

Again, we are not looking for someone to do the programming, but for
someone to provide enough detail that a developer knows the steps
needed to do that work.

So anyone want to help out a data standard over the holidays? We are
eager to start because I'd like to get a call for a developer out
before the middle of December, when the little bag of gold gets taken
away. If multiple people are interested we will have to draw straws or
pistols or find some other means of making a good decision. But please
contact me directly and we can follow up offline.

Best,
David Remsen

----------------------------------------------------------------------------
David Remsen, Senior Programme Officer
Electronic Catalog of Names of Known Organisms
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321472   Fax: +45-35321480
Mobile +45 28751472
Skype: dremsen
----------------------------------------------------------------------------