Dear David & all,<br><br>as a side note from me as a data user who would like to help avoid errors in the interpretation of taxonomic names (not sure if it helps in planning your future strategies):<br>I'm just wondering, why not explore a few gentle steps towards ... who knows ... something like elements of a future "BioCode" (being aware of the incompatible parts of the current domain-specific IC*N Codes, but making use of uniting principles)? <br>
<br>The usefulness of automated name parsing is limited: it is insufficient or even misleading in too many cases, for instance when binomina are combined with different homonymous genera, as in the following:<br><br>Tylonotus bimaculatus<br>
Tylonotus rugicollis<br>Tylonotus fryi<br><br>The first genus is Tylonotus HALDEMAN 1847 (Coleoptera Cerambycidae), second: Tylonotus FIEBER 1858 (Heteroptera Miridae), third: Tylonotus SCHAUM 1863 (Coleoptera Carabidae).<br>
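<br>To illustrate why the name string alone is not enough, here is a minimal sketch using only the three genera listed above. The table <code>HOMONYMS</code> and the function <code>candidate_genera</code> are hypothetical names made up for this example, not part of any existing parser.<br>

```python
# Hypothetical lookup table holding the three homonymous genera named above.
HOMONYMS = {
    "Tylonotus": [
        ("Haldeman", 1847, "Coleoptera Cerambycidae"),
        ("Fieber", 1858, "Heteroptera Miridae"),
        ("Schaum", 1863, "Coleoptera Carabidae"),
    ],
}


def candidate_genera(binomen):
    """Return every genus concept the first word of a binomen could refer to."""
    genus = binomen.split()[0]
    return HOMONYMS.get(genus, [])


for name in ("Tylonotus bimaculatus", "Tylonotus rugicollis", "Tylonotus fryi"):
    # The parser sees the same genus string each time, so it cannot
    # narrow the list down without extra (expert) information.
    print(name, "->", len(candidate_genera(name)), "candidate genera")
```

<br>In each case the parsed genus string is identical, so all candidates remain open.<br>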
Now imagine if we had an 'expanded parsing' strategy (with the assistance of taxon experts!) resulting in unique namestrings that contain all basic information on the nomenclatural status of names. <br>In the above example, unique ID-strings for the nomenclatural content might look like this:<br>
<br>ZS-Tylonotus_bimaculatus<br>ZS-Lygaeus_rugicollis/2Tylonotus_rugicollis/=Plesiocoris_rugicollis<br>ZS-3Tylonotus_fryi/=Nototylus_fryi<br><br>(prefix 'ZS-' for zoological species-group names; the second string shows that we are dealing with a new generic combination, etc.)<br>
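<br>As a rough sketch of how machine-readable such strings could be, the following splits them purely by syntax. The function name <code>parse_id_string</code> and the field names are hypothetical, and the reading of the markers is an assumption: a leading digit is taken to number the homonymous genus (2 = second Tylonotus, 3 = third), and "=" is simply recorded as a flag without deciding what it means nomenclaturally.<br>

```python
import re

# One segment: optional "=" marker, optional homonym number, then Genus_epithet.
SEGMENT = re.compile(r"^(=?)(\d*)([A-Z][a-z]+)_([a-z]+)$")


def parse_id_string(id_string):
    """Split e.g. 'ZS-3Tylonotus_fryi/=Nototylus_fryi' into prefix and segments."""
    prefix, _, body = id_string.partition("-")
    segments = []
    for raw in body.split("/"):
        match = SEGMENT.match(raw)
        if match is None:
            raise ValueError("unrecognised segment: " + raw)
        marker, number, genus, epithet = match.groups()
        segments.append({
            "genus": genus,
            "epithet": epithet,
            # Assumption: the digit numbers the homonymous genus (1 if absent).
            "homonym_no": int(number) if number else 1,
            # Assumption: "=" is kept as a plain flag, not interpreted further.
            "equals_marker": marker == "=",
        })
    return prefix, segments


prefix, segments = parse_id_string(
    "ZS-Lygaeus_rugicollis/2Tylonotus_rugicollis/=Plesiocoris_rugicollis"
)
for segment in segments:
    print(prefix, segment)
```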
Such strings can be perfectly stable, unique, human and machine-readable. They could serve as a solid basis for interpretation of actual name usages [= the GNUB task?] ... in my imagination.<br><br>Best regards,<br>Wolfgang<br>
<br>----------------------------------<br><br>Wolfgang Lorenz, Tutzing, Germany<br><br><br><div class="gmail_quote">2010/11/25 David Remsen (GBIF) <span dir="ltr">&lt;<a href="mailto:dremsen@gbif.org">dremsen@gbif.org</a>&gt;</span><br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Now that we've been talking about the variations, perhaps we can move<br>
toward doing something about it.<br>
<br>
I mentioned we have been talking about having a service built that can<br>
read and unpack a DarwinCore Archive, evaluate specific things like<br>
we have been discussing that may be expressed inconsistently, set<br>
them right, and spit back out a new and more consistent archive<br>
file. In order to put out a call for a developer to do this, we<br>
need to capture some of those things it should do. Markus and I have<br>
started this but we have a lot of end-of-year business and less time<br>
than money. We still have some funds available to at least start this<br>
process.<br>
<br>
I'd like to know if anyone is interested in, and feels qualified to<br>
develop a more complete set of requirements for such a service, which<br>
we would then try to find a developer to build. We aren't trying to<br>
deal with everything at once, mind you. Just some key things that<br>
might make ingesting a DarwinCore Archive for either Taxon data or<br>
Occurrence data a bit more consistent in regard to the taxonomic<br>
elements. I'd need two or three days to do this as completely<br>
as I think is needed, so I anticipate that you smart people could<br>
do it in about the same time.<br>
<br>
For example,<br>
<br>
* Checking the integrity of normal and denormal classifications.<br>
What are the steps and conditions for checking integrity in normal<br>
classifications and for transforming a denormal one into a normal one? In the<br>
latter case, for example, you have to make sure the same Family value<br>
doesn't have two different parents. If it does, what then?<br>
* Creating IDs for ID-less, normalised records (e.g. parentNameUsage)<br>
* Mapping taxon ranks to our taxon rank vocabulary so that alternative<br>
forms (ssp, subspec, ss.) are replaced, when possible, with the standard<br>
form.<br>
* Normalising taxon and nomenclatural status<br>
* Splitting a merged name into name and authorship parts.<br>
* Checking the split version is consistent with the complete one if<br>
both are given.<br>
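<br>
The classification check in the first item above could be sketched roughly<br>
as follows. This is only an illustration under stated assumptions: the<br>
function name conflicting_parents is hypothetical, rows are assumed to be<br>
dictionaries keyed by the Darwin Core terms family and order, and the sample<br>
rows are invented to trigger a conflict. What to do once a conflict is found<br>
is left open, as in the list.<br>

```python
from collections import defaultdict


def conflicting_parents(rows, child="family", parent="order"):
    """Return child values that appear under more than one parent value."""
    parents = defaultdict(set)
    for row in rows:
        # Skip rows where either rank is missing or empty.
        if row.get(child) and row.get(parent):
            parents[row[child]].add(row[parent])
    return {name: sorted(found) for name, found in parents.items() if len(found) > 1}


# Invented sample rows of a denormalised classification.
rows = [
    {"family": "Carabidae", "order": "Coleoptera"},
    {"family": "Carabidae", "order": "Hemiptera"},  # deliberate conflict
    {"family": "Miridae", "order": "Hemiptera"},
]
print(conflicting_parents(rows))
```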
<br>
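Two of the smaller items in the list (mapping rank variants, and checking a<br>
split name against the complete one) might look like the sketch below. The<br>
names RANK_SYNONYMS, normalise_rank and split_is_consistent are hypothetical,<br>
the choice of "subspecies" as the standard form is an assumption, and the<br>
authorship string is illustrative input only.<br>

```python
# Assumption: these variants all map to "subspecies"; a real vocabulary
# would cover many more ranks and spellings.
RANK_SYNONYMS = {"ssp": "subspecies", "subspec": "subspecies", "ss": "subspecies"}


def normalise_rank(raw):
    """Map a rank variant to the standard form; unknown values pass through lowercased."""
    key = raw.strip().lower().rstrip(".")
    return RANK_SYNONYMS.get(key, key)


def split_is_consistent(full_name, name, authorship):
    """Check that name plus authorship rebuilds the complete name, ignoring extra spaces."""
    rejoined = " ".join((name + " " + authorship).split())
    return rejoined == " ".join(full_name.split())


print(normalise_rank("ssp."))
print(split_is_consistent(
    "Tylonotus bimaculatus  Haldeman, 1847",  # illustrative input only
    "Tylonotus bimaculatus",
    "Haldeman, 1847",
))
```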
I put the working doc Markus and I have in Dropbox and put it here: <a href="https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B1c8QPPC59XkZTBlZDE4YjUtZDczYS00MWUxLTlkZGItNDhhYzg0ZmNhOTNk&hl=en&authkey=CK_rqc4B" target="_blank">https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B1c8QPPC59XkZTBlZDE4YjUtZDczYS00MWUxLTlkZGItNDhhYzg0ZmNhOTNk&hl=en&authkey=CK_rqc4B</a><br>
<br>
Again, we are not looking for someone to do the programming but to provide<br>
enough detail that the developer knows the steps needed to do that work.<br>
<br>
So anyone want to help out a data standard over the holidays? We are<br>
eager to start because I'd like to get a call for a developer out before<br>
the middle of December, when the little bag of gold gets taken away.<br>
If multiple people are interested we will have to draw straws or<br>
pistols or some other means for making a good decision. But please<br>
contact me directly and we can follow up offline.<br>
<br>
Best,<br>
David Remsen<br>
<br>
----------------------------------------------------------------------------<br>
David Remsen, Senior Programme Officer<br>
Electronic Catalog of Names of Known Organisms<br>
Global Biodiversity Information Facility Secretariat<br>
Universitetsparken 15, DK-2100 Copenhagen, Denmark<br>
Tel: +45-35321472 Fax: +45-35321480<br>
Mobile +45 28751472<br>
Skype: dremsen<br>
----------------------------------------------------------------------------<br>
<br>
<br>
<br>
<br>
_______________________________________________<br>
tdwg-content mailing list<br>
<a href="mailto:tdwg-content@lists.tdwg.org">tdwg-content@lists.tdwg.org</a><br>
<a href="http://lists.tdwg.org/mailman/listinfo/tdwg-content" target="_blank">http://lists.tdwg.org/mailman/listinfo/tdwg-content</a><br>
</blockquote></div><br>