Dear David &amp; all,<br><br>A side note from a data user who would like to help avoid errors in the interpretation of taxonomic names; I am not sure whether it helps in planning your future strategies.<br>I&#39;m just wondering: why not explore a few gentle steps towards ... who knows ... something like elements of a future &quot;BioCode&quot; (being aware of the incompatible parts of the current domain-specific IC*N Codes, but making use of their uniting principles)?<br>

<br>The usefulness of automated name parsing is limited: it is insufficient or even misleading in too many cases, for instance when binomina are combined with different homonymous genera, as in the following:<br><br>Tylonotus bimaculatus<br>

Tylonotus rugicollis<br>Tylonotus fryi<br><br>The first genus is Tylonotus HALDEMAN 1847 (Coleoptera Cerambycidae), second: Tylonotus FIEBER 1858 (Heteroptera Miridae), third: Tylonotus SCHAUM 1863 (Coleoptera Carabidae).<br>

Now imagine if we had an &#39;expanded parsing&#39; strategy (with the assistance of taxon experts!) resulting in unique namestrings that contain all basic information on the nomenclatural status of names. <br>In the above example, unique ID-strings for the nomenclatural content might look like this:<br>

<br>ZS-Tylonotus_bimaculatus<br>ZS-Lygaeus_rugicollis/2Tylonotus_rugicollis/=Plesiocoris_rugicollis<br>ZS-3Tylonotus_fryi/=Nototylus_fryi<br><br>(The prefix &#39;ZS-&#39; stands for zoological species-group names; the second string shows that we are dealing with a new generic combination, etc.)<br>
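<br>To make the idea a little more concrete, here is a minimal Python sketch of how a machine might take such ID-strings apart again. The grammar is only an assumption read off the three examples: a prefix ending in '-', segments separated by '/', an optional leading digit saying which of the homonymous genera is meant, and an optional '=' which presumably marks the combination in current use. The names parse_id_string and Combination are hypothetical, not part of any existing tool.<br>
<pre>
```python
import re
from typing import NamedTuple, Optional


class Combination(NamedTuple):
    genus: str
    epithet: str
    homonym_no: Optional[int]  # e.g. 2 for the second genus named Tylonotus
    is_current: bool           # True if the segment started with '='


# Assumed segment shape: optional '=', optional digits, then Genus_epithet.
SEGMENT = re.compile(r"^(=?)(\d*)([A-Z][a-z]+)_([a-z]+)$")


def parse_id_string(id_string):
    """Split an ID-string into its prefix and a list of combinations."""
    prefix, _, rest = id_string.partition("-")
    combos = []
    for segment in rest.split("/"):
        m = SEGMENT.match(segment)
        if m is None:
            raise ValueError(f"unrecognised segment: {segment!r}")
        eq, num, genus, epithet = m.groups()
        homonym_no = int(num) if num else None
        combos.append(Combination(genus, epithet, homonym_no, eq == "="))
    return prefix, combos


prefix, combos = parse_id_string(
    "ZS-Lygaeus_rugicollis/2Tylonotus_rugicollis/=Plesiocoris_rugicollis")
for combo in combos:
    print(prefix, combo)
```
</pre>
A real scheme would need rules for subgenera, subspecies and epithets with hyphens, which this sketch deliberately ignores.<br>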

Such strings can be perfectly stable, unique, and both human- and machine-readable. They could serve as a solid basis for interpretation of actual name usages [= the GNUB task?] ... in my imagination.<br><br>Best regards,<br>Wolfgang<br>

<br>----------------------------------<br><br>Wolfgang Lorenz, Tutzing, Germany<br><br><br><div class="gmail_quote">2010/11/25 David Remsen (GBIF) <span dir="ltr">&lt;<a href="mailto:dremsen@gbif.org">dremsen@gbif.org</a>&gt;</span><br>

<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Now that we&#39;ve been talking about the variations,  perhaps we can move<br>

toward doing something about it.<br>

<br>

I mentioned we have been talking about having a service built that can<br>

read and unpack a DarwinCore Archive,  evaluate specific things like<br>

we have been discussing  that may be expressed inconsistently,  set<br>

them right,  and spit back out a new and more consistent archive<br>

file.   In order to put out a call for a developer to do this,  we<br>

need to capture some of those things it should do.   Markus and I have<br>

started this but we have a lot of end-of-year business and less time<br>

than money.  We still have some funds available to at least start this<br>

process.<br>

<br>

I&#39;d like to know if anyone is interested in, and feels qualified to<br>

develop a more complete set of requirements for such a service, which<br>

we would then try to find a developer to build.   We aren&#39;t trying to<br>

deal with everything at once, mind you.   Just some key things that<br>

might make ingesting a DarwinCore Archive for either Taxon data or<br>

Occurrence data a bit more consistent in regard to the taxonomic<br>

elements.    I&#39;d need two or three days to do this as completely<br>

as I think is needed, so I&#39;m anticipating that you smart people<br>

can do it in about the same time.<br>

<br>

For example,<br>

<br>

* Checking the integrity of normalised and denormalised classifications.<br>

What are the steps and conditions to check integrity in normalised<br>

classifications and to transform a denormalised one into a normalised one?   In the<br>

latter, for example,  you have to make sure the same Family value<br>

doesn&#39;t have two different parents.  If so,  what then?<br>

* Creating IDs for IDless, normalised records (e.g. parentNameUsage)<br>

* Mapping taxon ranks to our taxon rank vocabulary so that alternative<br>

forms (ssp, subspec, ss.) are replaced, when possible, with the standard<br>

form.<br>

* Normalising taxon and nomenclatural status<br>

* Splitting a merged name into name and authorship parts.<br>

* Checking the split version is consistent with the complete one if<br>

both are given.<br>
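<br>
A minimal Python sketch of the last two items, splitting and then checking. It rests on a simple heuristic that is an assumption and not taken from the working doc: the name part is the first word plus any following all-lowercase words, and everything after that is authorship. This will misfire on authors whose names start in lower case (such as "de" or "van") and on rank markers like "subsp.". The function names are hypothetical.<br>
<pre>
```python
import re

# A word made only of lowercase letters and hyphens, e.g. an epithet.
LOWER_WORD = re.compile(r"[a-z][a-z-]*")


def split_name(full_name):
    """Return (name, authorship) using the heuristic described above."""
    tokens = full_name.split()
    if not tokens:
        return "", ""
    cut = 1  # the first token is taken to be the genus (or uninomial)
    while cut != len(tokens) and LOWER_WORD.fullmatch(tokens[cut]):
        cut += 1
    return " ".join(tokens[:cut]), " ".join(tokens[cut:])


def is_consistent(full_name, name, authorship):
    """Check that the split parts, rejoined, equal the complete name,
    ignoring differences in whitespace only."""
    rejoined = " ".join((name + " " + authorship).split())
    return rejoined == " ".join(full_name.split())


name, authorship = split_name("Tylonotus Haldeman, 1847")
print(repr(name), repr(authorship))
print(is_consistent("Tylonotus  Haldeman, 1847", name, authorship))
```
</pre>
The consistency check only normalises whitespace; whether differences in punctuation or case should also be tolerated is a requirement still to be decided.<br>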

<br>
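To illustrate two of the other checks, rank mapping and the "same Family, two parents" test, here is a small Python sketch. Everything in it is an assumption for illustration: the synonym table (including "subspecies" as the standard form), the column names, and the helper names normalise_rank and conflicting_parents. It only reports conflicts and leaves the "what then?" question open.<br>
<pre>
```python
from collections import defaultdict

# Hypothetical synonym table; the real rank vocabulary would supply this.
RANK_SYNONYMS = {
    "ssp": "subspecies", "ssp.": "subspecies",
    "subspec": "subspecies", "subspec.": "subspecies",
    "ss.": "subspecies",
}


def normalise_rank(raw):
    """Map an alternative rank form to the standard one. Unknown values
    are returned trimmed and lower-cased so nothing is silently lost."""
    key = raw.strip().lower()
    return RANK_SYNONYMS.get(key, key)


def conflicting_parents(rows, child="family", parent="order"):
    """Return {child value: set of parent values} for every child value
    that appears under more than one parent in a denormalised table."""
    seen = defaultdict(set)
    for row in rows:
        if row.get(child) and row.get(parent):
            seen[row[child]].add(row[parent])
    return {c: p for c, p in seen.items() if len(p) != 1}


rows = [
    {"order": "Coleoptera", "family": "Cerambycidae", "genus": "Tylonotus"},
    {"order": "Coleoptera", "family": "Carabidae", "genus": "Tylonotus"},
    # Deliberate error for the example: Carabidae under a second parent.
    {"order": "Heteroptera", "family": "Carabidae", "genus": "Tylonotus"},
]
print(normalise_rank(" Ssp. "))
print(conflicting_parents(rows))
```
</pre>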

I put the working doc Markus and I have in Dropbox and put it here:  <a href="https://docs.google.com/viewer?a=v&amp;pid=explorer&amp;chrome=true&amp;srcid=0B1c8QPPC59XkZTBlZDE4YjUtZDczYS00MWUxLTlkZGItNDhhYzg0ZmNhOTNk&amp;hl=en&amp;authkey=CK_rqc4B" target="_blank">https://docs.google.com/viewer?a=v&amp;pid=explorer&amp;chrome=true&amp;srcid=0B1c8QPPC59XkZTBlZDE4YjUtZDczYS00MWUxLTlkZGItNDhhYzg0ZmNhOTNk&amp;hl=en&amp;authkey=CK_rqc4B</a><br>


<br>

Again, we are not looking for someone to do the programming but to provide<br>

enough detail so that that person knows the steps needed to do that work.<br>

<br>

So anyone want to help out a data standard over the holidays?   We are<br>

eager to start because I&#39;d like to get a call for a developer out before<br>

the middle of December, when the little bag of gold gets taken away.<br>

If multiple people are interested we will have to draw straws or<br>

pistols or some other means for making a good decision.  But please<br>

contact me directly and we can follow up offline.<br>

<br>

Best,<br>

David Remsen<br>

<br>

----------------------------------------------------------------------------<br>

David Remsen, Senior Programme Officer<br>

Electronic Catalog of Names of Known Organisms<br>

Global Biodiversity Information Facility Secretariat<br>

Universitetsparken 15, DK-2100 Copenhagen, Denmark<br>

Tel: +45-35321472   Fax: +45-35321480<br>

Mobile +45 28751472<br>

Skype: dremsen<br>

----------------------------------------------------------------------------<br>

<br>


_______________________________________________<br>

tdwg-content mailing list<br>

<a href="mailto:tdwg-content@lists.tdwg.org">tdwg-content@lists.tdwg.org</a><br>

<a href="http://lists.tdwg.org/mailman/listinfo/tdwg-content" target="_blank">http://lists.tdwg.org/mailman/listinfo/tdwg-content</a><br>

</blockquote></div><br>