<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">
<TITLE>Re: [tdwg-content] canonical name for named hybrid &amp; infrageneric names</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>From: Bob Morris [<A HREF="mailto:morris.bob@gmail.com">mailto:morris.bob@gmail.com</A>]<BR>
Sent: Thu 9-12-2010 14:52<BR>
<BR>
Thanks. To me what is interesting about this thread is that documents<BR>
whose main(?) audience is authors and publishers do not always<BR>
address the needs of parser writers.<BR>
<BR>
***<BR>
That depends on how you look at it. The ICBN is mostly written so that<BR>
nobody who just browses through will make sense of it. It requires<BR>
any user to read it in some depth, if he is to apply it. Perhaps the<BR>
parser writer should realize that he is no exception?<BR>
<BR>
But, actually, parsers are not going to be the answer to any question<BR>
in biodiversity informatics. This is impossible, as the natural laws<BR>
in a nomenclatural universe are subject to change (almost without<BR>
notice). What was true ten years ago is not necessarily true now:<BR>
it may have been retroactively changed. Anybody doing anything in<BR>
biodiversity informatics should have at least some basic awareness<BR>
of the natural laws that govern nomenclatural universes.<BR>
* * *<BR>
<BR>
It is a rare and happy<BR>
circumstance for a programmer to have the document author to consult!<BR>
<BR>
***<BR>
Not the document, just the recommendation (excluding the Note). The<BR>
ICBN is more or less a wiki (and has been for a hundred years).<BR>
* * *<BR>
<BR>
What I \think/ is implied by your answer is (something that requires<BR>
biological knowledge that I don't have, namely) that there are hybrid<BR>
names which do not necessarily spell out a cross of two things, but in<BR>
which only one is mentioned.<BR>
<BR>
***<BR>
No, numbers are irrelevant, provided there are at least two parents<BR>
involved.<BR>
* * *<BR>
<BR>
The distinction then is that "formula" means at<BR>
least two, but there are uses which do not appear in a formula, right?<BR>
<BR>
***<BR>
No, the distinction is that a name is a name, while a formula is a<BR>
summation of (at least two) names.<BR>
<BR>
×Agropogon littoralis is a name, and it is the same as<BR>
Agropogon littoralis, for most purposes.<BR>
<BR>
Agrostis stolonifera × Polypogon monspeliensis are two names,<BR>
and the formula indicates their relation, which may be more<BR>
complex than here: see Rec. H.2A.1; so just lifting a formula<BR>
in isolation from the literature is out (Mentha longifolia ><BR>
× rotundifolia is an obsolete form).<BR>
<BR>
* * *<BR>
<BR>
So a natural language name extractor should follow this rule:<BR>
- If the × adjoins text, the token to the left of any preceding<BR>
white space is not part of a taxon name; otherwise it is.<BR>
Example: In the fragment "not unlike ×Agropogon littoralis" the token<BR>
'unlike' is not part of a name.<BR>
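<BR>
As a minimal sketch, the rule above could be applied to tokens separated by<BR>
white space as shown here. The function name left_neighbours and the constant<BR>
MULT are hypothetical, chosen only for illustration, and splitting on white<BR>
space is an assumption rather than anything the ICBN prescribes. The sketch<BR>
makes no attempt to handle a sign placed before an epithet rather than before<BR>
a genus name, nor the more complex formulae mentioned above.<BR>
<PRE>
```python
MULT = "\u00d7"  # multiplication sign used in hybrid names and formulae


def left_neighbours(text):
    """Apply the rule to every multiplication sign found in `text`.

    Returns a list of (left_token, belongs_to_name) pairs, one for each
    white-space-separated token that contains the sign. Assumption: the
    input is a short fragment and tokens are split on white space only.
    """
    tokens = text.split()
    results = []
    for i, token in enumerate(tokens):
        if MULT not in token:
            continue
        # Token immediately to the left, or None at the start of the text.
        left = tokens[i - 1] if i != 0 else None
        # A bare sign is set off by white space on both sides, so under
        # the rule the token to its left belongs to the name or formula.
        # A sign that adjoins text starts a name of its own, so the
        # token to its left does not belong to that name.
        belongs = left is not None and token == MULT
        results.append((left, belongs))
    return results
```
</PRE>
Under these assumptions, the fragment "not unlike ×Agropogon littoralis" yields<BR>
'unlike' marked as outside the name, while "Agrostis stolonifera × Polypogon<BR>
monspeliensis" yields 'stolonifera' marked as belonging to the formula.<BR>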
<BR>
Believe it or not, I am not complaining about ICBN. No programmer<BR>
interpreting a document not written for programmers should complain if<BR>
understanding it assumes knowledge and insight of the intended<BR>
audience. Nor should they complain if they are raising points that are<BR>
addressed in other parts of the document that they haven't read--which<BR>
in this case for me is everything but H.3A.<BR>
<BR>
Robust context-sensitive parsers are marginally more complicated to<BR>
write than those that require no lookahead, but this is surely not the<BR>
only name parsing issue that requires lookahead, so I can't even<BR>
complain on that score. In a vaguely related setting, parser writers<BR>
might see the rather nicely set forth discussion at<BR>
<A HREF="http://stackoverflow.com/questions/1952931/how-to-rewrite-this-nondeterministic-xml-schema-to-deterministic">http://stackoverflow.com/questions/1952931/how-to-rewrite-this-nondeterministic-xml-schema-to-deterministic</A><BR>
<BR>
<BR>
<BR>
Bob Morris<BR>
p.s. Hey, I thought of something to complain about, albeit not about<BR>
ICBN: I sure wish spec writers targeting software would banish<BR>
"should" from their documents in favor of "must", even if multiple<BR>
choices are accompanied by "... is preferred". Well, maybe it's a<BR>
little complaint about the nomenclatural codes, because movement<BR>
towards born-digital, semantically marked-up systematics literature<BR>
will bump into it when people try to write semantically enhanced<BR>
applications. It would be far better if publishers followed a set of<BR>
rules with no "should" in them, for which compliance could be tested<BR>
before publication.<BR>
<BR>
***<BR>
There are more distinctions than just "must" and "should" in the ICBN.<BR>
Eliminating the "should" is not going to happen, but sometimes a<BR>
"should" will grow up to become a "must".<BR>
<BR>
Paul van Rijckevorsel<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>