[tdwg-content] DwC-A taxonomic normaliser/standardiser - get paid to write requirements

Fri Nov 26 11:22:26 CET 2010

Dear David & all,

As a side note from me, as a data user who would like to help avoid
errors in the interpretation of taxonomic names (not sure whether it helps
in planning your future strategies):
I'm just wondering why we don't explore a few gentle steps towards... who
knows... something like elements of a future "BioCode". I am aware of the
incompatible parts of the current domain-specific IC*N Codes, but we could
make use of their uniting principles.

The usefulness of automated name parsing is limited: in too many cases it
is insufficient or even misleading, e.g. with binomina combined with
different homonymous genera, as in the following:

Tylonotus bimaculatus
Tylonotus rugicollis
Tylonotus fryi

The first genus is Tylonotus HALDEMAN 1847 (Coleoptera Cerambycidae), the
second Tylonotus FIEBER 1858 (Heteroptera Miridae), and the third Tylonotus
SCHAUM 1863 (Coleoptera Carabidae).
Now imagine we had an 'expanded parsing' strategy (with the assistance of
taxon experts!) that produced unique name strings containing all the basic
information on the nomenclatural status of names.
In the above example, unique ID-strings for the nomenclatural content might
look like this:

ZS-Tylonotus_bimaculatus
ZS-Lygaeus_rugicollis/2Tylonotus_rugicollis/=Plesiocoris_rugicollis
ZS-3Tylonotus_fryi/=Nototylus_fryi

(The prefix 'ZS-' stands for zoological species-group names; the second
string shows that we are dealing with a new generic combination, etc.)
Such strings can be perfectly stable, unique, and both human- and
machine-readable. They could serve as a solid basis for interpreting
actual name usages [= the GNUB task?] ... at least in my imagination.
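
A minimal Python sketch of how a program might read such strings. The
grammar is inferred from the three examples above and is an assumption,
not a defined format: a code prefix before the first '-', '/'-separated
combinations, an optional leading number selecting among homonymous genera
(no number meaning the first), and a leading '=' read here simply as a flag
on an equivalent (e.g. currently used) combination. `parse_id_string` and
`Combination` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed grammar for one combination: optional '=', optional homonym
# number, genus, '_', specific epithet (e.g. '2Tylonotus_rugicollis').
SEGMENT = re.compile(
    r"^(?P<eq>=)?(?P<hom>\d+)?(?P<genus>[A-Z][a-z]+)_(?P<epithet>[a-z]+)$"
)


@dataclass
class Combination:
    genus: str
    epithet: str
    homonym_no: int   # 1 when the genus carries no number
    equivalent: bool  # True when the segment starts with '='


def parse_id_string(s: str) -> tuple[str, list[Combination]]:
    """Split a hypothetical ID string into its code prefix and combinations."""
    prefix, sep, body = s.partition("-")
    if not sep:
        raise ValueError(f"missing code prefix: {s!r}")
    combos = []
    for seg in body.split("/"):
        m = SEGMENT.match(seg)
        if m is None:
            raise ValueError(f"unparseable segment: {seg!r}")
        combos.append(
            Combination(
                genus=m["genus"],
                epithet=m["epithet"],
                homonym_no=int(m["hom"]) if m["hom"] else 1,
                equivalent=m["eq"] is not None,
            )
        )
    return prefix, combos
```

For example, parsing 'ZS-3Tylonotus_fryi/=Nototylus_fryi' yields the prefix
'ZS' and two combinations, the first with homonym_no 3 and the second
flagged as equivalent. Epithets containing hyphens or other characters
would need a wider pattern.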

Best regards,
Wolfgang

----------------------------------

Wolfgang Lorenz, Tutzing, Germany

2010/11/25 David Remsen (GBIF) <dremsen at gbif.org>

> Now that we've been talking about the variations, perhaps we can move
> toward doing something about them.
>
> I mentioned that we have been talking about having a service built that
> can read and unpack a DarwinCore Archive, evaluate specific things like
> those we have been discussing that may be expressed inconsistently, set
> them right, and spit out a new, more consistent archive file. In order
> to put out a call for a developer to do this, we need to capture some of
> the things it should do. Markus and I have started on this, but we have
> a lot of end-of-year business and less time than money. We still have
> some funds available to at least start this process.
>
> I'd like to know if anyone is interested in, and feels qualified to,
> develop a more complete set of requirements for such a service, which
> we would then try to find a developer to build. We aren't trying to
> deal with everything at once, mind you; just some key things that might
> make ingesting a DarwinCore Archive of either Taxon data or Occurrence
> data a bit more consistent with regard to the taxonomic elements. It
> would take me two or three days to do this as completely as I think is
> needed, so I expect you smart people could do it in about the same
> time.
>
> For example,
>
> * Checking the integrity of normal and denormal classifications.
> What are the steps and conditions for checking integrity in normal
> classifications and for transforming a denormal classification into a
> normal one? In the latter case, for example, you have to make sure the
> same Family value doesn't have two different parents. If it does, what
> then?
> * Creating IDs for ID-less, normalised records (e.g. parentNameUsage).
> * Mapping taxon ranks to our taxon rank vocabulary so that alternative
> forms (ssp, subspec, ss.) are replaced, where possible, by the standard
> form.
> * Normalising taxonomic and nomenclatural status.
> * Splitting a merged name into name and authorship parts.
> * Checking that the split version is consistent with the complete one if
> both are given.
>
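For the first point, a minimal Python sketch of the Family-parent check,
assuming denormalised records arrive as dicts keyed by the Darwin Core
rank terms (kingdom, phylum, class, order, family, genus). It only reports
names that appear under more than one direct parent; deciding "what then"
(data error versus genuine homonym, as with the Tylonotus genera mentioned
earlier) is left open. `parent_conflicts` is a hypothetical name.

```python
from collections import defaultdict

# Darwin Core rank columns of a denormalised record, highest rank first.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def parent_conflicts(rows):
    """Return {(rank, name): sorted parents} for names with more than one parent.

    Only the immediately higher rank column is compared; rows where either
    value is empty are skipped for that pair (a deliberate simplification).
    """
    parents = defaultdict(set)
    for row in rows:
        for higher, lower in zip(RANKS, RANKS[1:]):
            child, parent = row.get(lower), row.get(higher)
            if child and parent:
                parents[(lower, child)].add(parent)
    return {key: sorted(ps) for key, ps in parents.items() if len(ps) > 1}
```

Two rows placing the genus Tylonotus under Cerambycidae and under Miridae
would come back as {("genus", "Tylonotus"): ["Cerambycidae", "Miridae"]}.
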
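For creating IDs, one possible approach sketched in Python under the same
input assumption: walk each denormalised row from the highest rank down,
emit one record per distinct classification path, and derive a stable
taxonID from that path with uuid5, so re-running the job yields the same
IDs. The output keys taxonID, scientificName, taxonRank and
parentNameUsageID are Darwin Core terms; the function name `normalise` and
the use of uuid5 over the path are assumptions for illustration.

```python
import uuid

# Darwin Core rank columns of a denormalised record, highest rank first.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def normalise(rows):
    """Turn denormalised rows into parent-linked records with generated IDs."""
    taxa = {}  # classification path -> record, in first-seen order
    for row in rows:
        path = ()
        parent_id = None
        for rank in RANKS:
            name = row.get(rank)
            if not name:
                continue  # empty ranks are skipped; link to the next one up
            path += ((rank, name),)
            if path not in taxa:
                key = "|".join(f"{r}:{n}" for r, n in path)
                taxa[path] = {
                    # uuid5 is deterministic: same path, same ID on every run.
                    # NAMESPACE_URL is just an arbitrary fixed namespace here.
                    "taxonID": str(uuid.uuid5(uuid.NAMESPACE_URL, key)),
                    "scientificName": name,
                    "taxonRank": rank,
                    "parentNameUsageID": parent_id,
                }
            parent_id = taxa[path]["taxonID"]
    return list(taxa.values())
```

Keying on the whole path keeps homonymous genera apart, but it also means
a Family recorded under two different parents becomes two records, so an
integrity check should run first. Rows that omit an intermediate rank
likewise produce a separate path.
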
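For the rank mapping, a minimal Python sketch: a lookup table of variant
forms (ssp, subspec and ss. from the list above, the others assumed for
illustration) mapped onto standard terms. The target terms are assumed to
match the rank vocabulary, which should remain the source of truth.
Unknown forms return None so they can be flagged rather than guessed,
in line with "where possible". `normalise_rank` is a hypothetical name.

```python
# Variant rank spellings -> assumed standard terms. 'ssp', 'subspec' and
# 'ss' come from the requirements list; the other entries are illustrative.
RANK_SYNONYMS = {
    "ssp": "subspecies",
    "subsp": "subspecies",
    "subspec": "subspecies",
    "ss": "subspecies",
    "var": "variety",
    "sp": "species",
    "gen": "genus",
    "fam": "family",
}


def normalise_rank(raw):
    """Map a rank string to its standard form, or return None if unknown."""
    key = raw.strip().lower().rstrip(".")
    if key in RANK_SYNONYMS.values():
        return key  # already a standard term
    return RANK_SYNONYMS.get(key)
```
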
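For the last two points, a deliberately naive Python sketch: treat the
first token as the genus (or uninomial), any following lowercase tokens as
epithets or rank markers, and everything from the first other token onward
as authorship. It mishandles cases such as lowercase author particles
('de', 'van'), so it illustrates the consistency check rather than
replacing a real name parser. Both function names are hypothetical.

```python
def split_name(full):
    """Split a merged name into (name, authorship) using a naive heuristic."""
    tokens = full.split()
    if not tokens:
        return "", ""
    i = 1  # the first token is always taken to be the genus or uninomial
    while i < len(tokens) and tokens[i][0].islower():
        i += 1  # lowercase tokens: epithets or markers such as 'subsp.'
    return " ".join(tokens[:i]), " ".join(tokens[i:])


def authorship_consistent(full, authorship):
    """Check a separately supplied authorship against the merged name.

    Whitespace differences are ignored. If the merged name carries no
    authorship at all, there is nothing to contradict, so this returns True.
    """
    _, parsed = split_name(full)
    if not parsed:
        return True
    return " ".join(parsed.split()) == " ".join(authorship.split())
```

For example, split_name("Tylonotus bimaculatus Haldeman, 1847") returns
("Tylonotus bimaculatus", "Haldeman, 1847").
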
> I have put the working document Markus and I have been using in
> Dropbox; it is here:
> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B1c8QPPC59XkZTBlZDE4YjUtZDczYS00MWUxLTlkZGItNDhhYzg0ZmNhOTNk&hl=en&authkey=CK_rqc4B
>
> Again, we are not looking for someone to do the programming, but for
> someone to provide enough detail to give that person the steps needed
> to do the work.
>
> So, does anyone want to help out a data standard over the holidays? We
> are eager to start because I'd like to get a call for a developer out
> before the middle of December, when the little bag of gold gets taken
> away. If multiple people are interested, we will have to draw straws or
> pistols or find some other means of making a good decision. But please
> contact me directly and we can follow up offline.
>
> Best,
> David Remsen
>
>
> ----------------------------------------------------------------------------
> David Remsen, Senior Programme Officer
> Electronic Catalog of Names of Known Organisms
> Global Biodiversity Information Facility Secretariat
> Universitetsparken 15, DK-2100 Copenhagen, Denmark
> Tel: +45-35321472   Fax: +45-35321480
> Mobile +45 28751472
> Skype: dremsen
>
> ----------------------------------------------------------------------------
>
>
>
>
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>