[tdwg-content] DwC-A taxonomic normaliser/standardiser - get paid to write requirements
David Remsen (GBIF)
dremsen at gbif.org
Thu Nov 25 17:10:23 CET 2010
Now that we've been talking about the variations, perhaps we can move
toward doing something about them.
I mentioned that we have been talking about having a service built that can
read and unpack a DarwinCore Archive, evaluate the kinds of things we
have been discussing that may be expressed inconsistently, set
them right, and spit back out a new, more consistent archive
file. In order to put out a call for a developer to do this, we
need to capture what it should do. Markus and I have
started this, but we have a lot of end-of-year business and less time
than money. We still have some funds available to at least start this
process.
I'd like to know if anyone is interested in developing, and feels qualified
to develop, a more complete set of requirements for such a service, which
we would then try to find a developer to build. We aren't trying to
deal with everything at once, mind you. Just some key things that
might make ingesting a DarwinCore Archive of either Taxon data or
Occurrence data a bit more consistent with regard to the taxonomic
elements. I'd need two or three days to do this as completely
as I think is needed, so that's the sort of time I'm anticipating you
smart people would need as well.
For example,
* Checking the integrity of normal and denormal classifications.
What are the steps and conditions for checking integrity in normal
classifications and for transforming a denormal classification into a
normal one? In the latter case, for example, you have to make sure the
same Family value doesn't have two different parents. If it does, what then?
* Creating IDs for IDless, normalised records (e.g. parentNameUsage).
* Mapping taxon ranks to our taxon rank vocabulary so that alternative
forms (ssp, subspec, ss.) are replaced, when possible, with the standard
form.
* Normalising taxonomic and nomenclatural status.
* Splitting a merged name into name and authorship parts.
* Checking that the split version is consistent with the complete one if
both are given.
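As a rough illustration of two of the checks listed above, here is a minimal Python sketch. It assumes each record has already been read into a dict keyed by Darwin Core term names (for example with csv.DictReader over the archive's core file). It flags Family values that appear under more than one distinct Order, and maps rank variants to a standard form. The function names, the rank lists and the variant table are hypothetical placeholders, not the GBIF rank vocabulary. The conflict check only reports problems and leaves the "what then?" decision open.

```python
from collections import defaultdict

# Hypothetical: denormalised higher-taxon columns, ordered highest to lowest.
HIGHER_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]

# Hypothetical subset of a standard rank vocabulary (not the GBIF list).
STANDARD_RANKS = {"kingdom", "phylum", "class", "order", "family",
                  "genus", "species", "subspecies", "variety", "form"}

# Hypothetical variant table, seeded with the alternative forms mentioned above.
RANK_VARIANTS = {
    "ssp": "subspecies", "ssp.": "subspecies",
    "subsp": "subspecies", "subsp.": "subspecies",
    "subspec": "subspecies", "ss.": "subspecies",
}


def conflicting_parents(rows, rank="family"):
    """Return {value: set_of_parents} for values at `rank` that occur with
    more than one distinct value in the column immediately above it.
    Rows where either cell is empty are skipped rather than counted as conflicts."""
    idx = HIGHER_RANKS.index(rank)
    if idx == 0:
        raise ValueError(f"{rank!r} has no higher rank to check against")
    parent_rank = HIGHER_RANKS[idx - 1]
    parents = defaultdict(set)
    for row in rows:
        value, parent = row.get(rank), row.get(parent_rank)
        if value and parent:
            parents[value].add(parent)
    return {v: ps for v, ps in parents.items() if len(ps) > 1}


def normalise_rank(raw):
    """Map a verbatim rank string to its standard form.
    Returns None when no mapping is known, so the record can be flagged
    instead of being silently changed."""
    key = raw.strip().lower()
    if key in STANDARD_RANKS:
        return key
    return RANK_VARIANTS.get(key)


if __name__ == "__main__":
    rows = [
        {"order": "Lepidoptera", "family": "Nymphalidae"},
        {"order": "Diptera", "family": "Nymphalidae"},
        {"order": "Coleoptera", "family": "Carabidae"},
        {"family": "Carabidae"},  # no order given: not treated as a conflict
    ]
    # Sort the parent sets so the printed output is stable.
    print({k: sorted(v) for k, v in conflicting_parents(rows).items()})
    print([normalise_rank(r) for r in ["ssp", "Subspec", "ss.", "species", "sect."]])
```

Run as a script, this prints `{'Nymphalidae': ['Diptera', 'Lepidoptera']}` followed by `['subspecies', 'subspecies', 'subspecies', 'species', None]`. Comparing only the immediately higher column keeps the sketch simple. A fuller check might compare the whole higher classification, and would still need a rule for resolving the conflicts it finds.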
I've put the working document Markus and I have been using in Dropbox; it is here: https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B1c8QPPC59XkZTBlZDE4YjUtZDczYS00MWUxLTlkZGItNDhhYzg0ZmNhOTNk&hl=en&authkey=CK_rqc4B
Again, we are not looking for someone to do the programming, but for
someone to provide enough detail to give that person the steps needed
to do the work.
So, does anyone want to help out a data standard over the holidays? We are
eager to start because I'd like to get a call for a developer out before
the middle of December, when the little bag of gold gets taken away.
If multiple people are interested, we will have to draw straws or
pistols or use some other means of making a good decision. But please
contact me directly and we can follow up offline.
Best,
David Remsen
----------------------------------------------------------------------------
David Remsen, Senior Programme Officer
Electronic Catalog of Names of Known Organisms
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321472 Fax: +45-35321480
Mobile +45 28751472
Skype: dremsen
----------------------------------------------------------------------------