Dear David & all,

As a side note from a data user who would like to help avoid errors in the interpretation of taxonomic names (I'm not sure whether it helps in planning your future strategies):
I'm just wondering, why not explore a few gentle steps towards ... who knows ... something like elements of a future "BioCode" (being aware of the incompatible parts of the current domain-specific IC*N Codes, but making use of their uniting principles)?

The usefulness of automated name parsing is limited: in too many cases it is not enough, or is even misleading, for example when binomina are combined with different homonymous genera, as in the following:

Tylonotus bimaculatus
Tylonotus rugicollis
Tylonotus fryi

The first genus is Tylonotus HALDEMAN 1847 (Coleoptera: Cerambycidae), the second is Tylonotus FIEBER 1858 (Heteroptera: Miridae), and the third is Tylonotus SCHAUM 1863 (Coleoptera: Carabidae).
Now imagine if we had an 'expanded parsing' strategy (with the assistance of taxon experts!) resulting in unique namestrings that contain all basic information on the nomenclatural status of names.
In the above example, unique ID-strings for the nomenclatural content might look like this:

ZS-Tylonotus_bimaculatus
ZS-Lygaeus_rugicollis/2Tylonotus_rugicollis/=Plesiocoris_rugicollis
ZS-3Tylonotus_fryi/=Nototylus_fryi

(prefix 'ZS-' for zoological species-group names; the second string shows that we are dealing with a new generic combination, etc.)
Such strings can be perfectly stable, unique, and both human- and machine-readable. They could serve as a solid basis for interpretation of actual name usages [= the GNUB task?] ... in my imagination.
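
To illustrate the machine-readable part, here is a minimal Python sketch that takes such a string apart. The grammar is only my reading of the three examples above, not a worked-out specification: I assume '/' separates successive combinations, a leading digit is the number of the homonymous genus (1 when absent), and a leading '=' marks the combination now in use. The names parse_id_string and Combination are made up for this illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Only 'ZS' appears in the message; any further prefixes would be assumptions.
PREFIXES = {"ZS": "zoological species-group name"}

# One segment: optional '=', optional homonym number, Genus_epithet.
SEGMENT = re.compile(r"^(=?)(\d*)([A-Z][a-z]+)_([a-z]+)$")


@dataclass
class Combination:
    genus: str
    epithet: str
    homonym_no: int   # assumed: 1 when no digit precedes the genus
    is_current: bool  # assumed meaning of a leading '='


def parse_id_string(id_string: str) -> Tuple[str, List[Combination]]:
    """Split an ID-string into its prefix and its list of combinations."""
    prefix, sep, rest = id_string.partition("-")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {id_string!r}")
    combinations = []
    for segment in rest.split("/"):
        match = SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"cannot parse segment {segment!r}")
        eq, number, genus, epithet = match.groups()
        combinations.append(
            Combination(genus, epithet, int(number) if number else 1, eq == "=")
        )
    return prefix, combinations


prefix, combos = parse_id_string(
    "ZS-Lygaeus_rugicollis/2Tylonotus_rugicollis/=Plesiocoris_rugicollis"
)
for combo in combos:
    print(combo)
```

Under these assumptions the second segment comes back with homonym_no 2 (Tylonotus FIEBER) and the third with is_current set, which is the information a plain binomen parser cannot recover.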

Best regards,
Wolfgang

----------------------------------

Wolfgang Lorenz, Tutzing, Germany


2010/11/25 David Remsen (GBIF) <dremsen@gbif.org>
Now that we've been talking about the variations,  perhaps we can move
toward doing something about it.

I mentioned we have been talking about having a service built that can
read and unpack a DarwinCore Archive,  evaluate specific things like
we have been discussing  that may be expressed inconsistently,  set
them right,  and spit back out a new and more consistent archive
file.   In order to put out a call for a developer to do this,  we
need to capture some of those things it should do.   Markus and I have
started this but we have a lot of end-of-year business and less time
than money.  We still have some funds available to at least start this
process.

I'd like to know if anyone is interested in, and feels qualified to
develop a more complete set of requirements for such a service, which
we would then try to find a developer to build.   We aren't trying to
deal with everything at once, mind you.   Just some key things that
might make ingesting a DarwinCore Archive for either Taxon data or
Occurrence data a bit more consistent in regard to the taxonomic
elements.    I'd need two or three days to do this as completely
as I think is needed, so that's the sort of time I'm anticipating; I
expect you smart people can do it in about the same.

For example,

* Checking the integrity of normal and denormal classifications.
What are the steps and conditions to check integrity in normal
classifications and to transform a denormal one into a normal one?  In the
latter, for example,  you have to make sure the same Family value
doesn't have two different parents.  If so,  what then?
* Creating IDs for IDless, normalised records (e.g. parentNameUsage)
* Mapping taxon ranks to our taxon rank vocabulary so that alternative
forms (ssp, subspec, ss.) are replaced, when possible, with the standard
form.
* Normalising taxon and nomenclatural status
* Splitting a merged name into name and authorship parts.
* Checking the split version is consistent with the complete one if
both are given.
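
Two of the checks listed above can be sketched in a few lines of Python, just to show the level of detail a requirement might go to. This is an illustration under assumptions, not a proposed design: the function names and the RANK_SYNONYMS table are hypothetical, the table is not the actual rank vocabulary, and the rows stand in for denormalised records that have already been read from an archive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Hypothetical synonym table; the real rank vocabulary is not reproduced here.
RANK_SYNONYMS = {
    "ssp": "subspecies",
    "subsp": "subspecies",
    "subspec": "subspecies",
    "ss": "subspecies",
    "subspecies": "subspecies",
}


def normalise_rank(raw: str) -> str:
    """Return the standard rank form when known, else the input unchanged."""
    key = raw.strip().lower().rstrip(".")
    return RANK_SYNONYMS.get(key, raw)


def find_parent_conflicts(
    rows: Iterable[Mapping[str, str]], child: str = "family", parent: str = "order"
) -> Dict[str, List[str]]:
    """Report child values that occur under more than one parent value."""
    parents = defaultdict(set)
    for row in rows:
        child_value, parent_value = row.get(child), row.get(parent)
        if child_value and parent_value:
            parents[child_value].add(parent_value)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


rows = [
    {"family": "Carabidae", "order": "Coleoptera"},
    {"family": "Miridae", "order": "Hemiptera"},
    {"family": "Miridae", "order": "Heteroptera"},
]
print(find_parent_conflicts(rows))  # {'Miridae': ['Hemiptera', 'Heteroptera']}
print(normalise_rank("ss."))        # subspecies
```

The sketch only detects the conflict; what to do about it (the "what then?") is exactly the kind of decision the requirements would need to spell out.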

I put the working doc Markus and I have in Dropbox and put it here:  https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B1c8QPPC59XkZTBlZDE4YjUtZDczYS00MWUxLTlkZGItNDhhYzg0ZmNhOTNk&hl=en&authkey=CK_rqc4B

Again, we are not looking for someone to do the programming but to provide
enough details so that person has the steps needed to do that work.

So anyone want to help out a data standard over the holidays?   We are
eager to start because I'd like to get a call for a developer out before
the middle of December, when the little bag of gold gets taken away.
If multiple people are interested we will have to draw straws or
pistols or some other means for making a good decision.  But please
contact me directly and we can follow up offline.

Best,
David Remsen

----------------------------------------------------------------------------
David Remsen, Senior Programme Officer
Electronic Catalog of Names of Known Organisms
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321472   Fax: +45-35321480
Mobile +45 28751472
Skype: dremsen
----------------------------------------------------------------------------




_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content