Re: [tdwg-content] canonical name for named hybrid & infrageneric names

9 Dec 2010

     From: Bob Morris [mailto:morris.bob@gmail.com]
Sent: Thu 9-12-2010 14:52

Thanks. To me, what is interesting about this thread is that documents
whose main(?) audience is authors and publishers do not always
address the needs of parser writers.

***
That depends on how you look at it. The ICBN is mostly written so that
nobody who just browses through will make sense of it. It requires 
any user to read it in some depth, if he is to apply it. Perhaps the 
parser writer should realize that he is no exception?

But, actually, parsers are not going to be the answer to any question
in biodiversity informatics. This is impossible, as the natural laws
in a nomenclatural universe are subject to change (almost without
notice). What was true ten years ago is not necessarily true now: 
it may have been retroactively changed. Anybody doing anything in
biodiversity informatics should have at least some basic awareness
of the natural laws that govern nomenclatural universes.
* * *

It is a rare and happy
circumstance for a programmer to have the document author to consult!

***
Not the document, just the recommendation (excluding the Note). The
ICBN more or less is a wiki (has been for a hundred years).
* * *

What I \think/ is implied by your answer is (something that requires
biological knowledge that I don't have, namely) that there are hybrid
names which are not necessarily a cross of two things, but rather only
one is mentioned. 

***
No, numbers are irrelevant, provided there are at least two parents 
involved.
* * *

The distinction then is that "formula" means at
least two, but there are uses which do not appear in a formula, right?

***
No, the distinction is that a name is a name, while a formula is a 
summation of (at least two) names.

   ×Agropogon littoralis is a name, and it is the same as
    Agropogon littoralis, for most purposes.

    Agrostis stolonifera × Polypogon monspeliensis are two names,
    and the formula indicates their relation, which may be more 
    complex than here: see Rec. H.2A.1; so just lifting a formula 
    in isolation from the literature is out (Mentha longifolia > 
    × rotundifolia is an obsolete form).

* * *
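
The name/formula distinction in the reply above can be sketched in code.
This is only an illustration, not anything the ICBN prescribes: the
function name classify_hybrid_string is hypothetical, and it assumes
that a formula sets the × off with white space on both sides while a
name has the × attached to the word that follows. Text in the
literature does not always follow that spacing.

```python
import re

# Assumption (not from the ICBN): in a formula the multiplication sign
# stands alone between white space; in a name it adjoins the next word.
FORMULA_SEPARATOR = re.compile(r"\s+×\s+")


def classify_hybrid_string(text):
    """Return ("formula", parents) or ("name", [text]).

    Hypothetical helper: a formula is treated as a list of at least
    two names joined by a free-standing ×; anything else is a name.
    """
    parts = [p.strip() for p in FORMULA_SEPARATOR.split(text.strip())]
    parts = [p for p in parts if p]
    if len(parts) >= 2:
        return "formula", parts
    return "name", parts


if __name__ == "__main__":
    print(classify_hybrid_string("×Agropogon littoralis"))
    print(classify_hybrid_string(
        "Agrostis stolonifera × Polypogon monspeliensis"))
```

The sketch does nothing to interpret the relation a formula expresses,
which is the part the reply warns cannot be lifted from the literature
in isolation.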

So a natural language name extractor should follow this rule:
    - If the × adjoins text, the token to the left of any predecessor
      white space is not part of a taxon name, but otherwise it is.
Example: In the fragment "not unlike ×Agropogon littoralis" the token
'unlike' is not part of a name.
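
A minimal sketch of that rule, taken literally as worded here. The
function name left_token_status is hypothetical, only the first × in
the fragment is examined, and "adjoins text" is read as "the character
right after the × is not white space". It makes no claim about cases
the rule does not mention.

```python
def left_token_status(fragment):
    """Apply the rule to the first × in the fragment.

    Returns (left_token, is_part_of_name), or None if there is no ×.
    left_token is the white-space-delimited token before the ×,
    or None when the × starts the fragment.
    """
    i = fragment.find("×")
    if i == -1:
        return None

    # One character of lookahead decides whether the × adjoins text.
    following = fragment[i + 1 : i + 2]
    adjoins_text = following != "" and not following.isspace()

    left_tokens = fragment[:i].split()
    left_token = left_tokens[-1] if left_tokens else None

    # Rule as stated: × adjoining text means the left token is not
    # part of a taxon name; otherwise it is.
    is_part_of_name = left_token is not None and not adjoins_text
    return left_token, is_part_of_name


if __name__ == "__main__":
    print(left_token_status("not unlike ×Agropogon littoralis"))
    print(left_token_status(
        "Agrostis stolonifera × Polypogon monspeliensis"))
```

For the example fragment this reports 'unlike' as outside the name, and
for the formula it reports 'stolonifera' as belonging to one.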

Believe it or not, I am not complaining about ICBN. No programmer
interpreting a document not written for programmers should complain if
understanding it assumes knowledge and insight of the intended
audience. Nor should they complain if they are raising points that are
addressed in other parts of the document that they haven't read--which
in this case for me is everything but H.3A.

Robust context-sensitive parsers are marginally more complicated to
write than those that require no lookahead, but this is surely not the
only name parsing issue that requires lookahead, so I can't even
complain on that score. In a vaguely related setting, parser writers
might see the rather nicely set forth
http://stackoverflow.com/questions/1952931/how-to-rewrite-this-nondeterminis...

Bob Morris
p.s. Hey, I thought of something to complain about, albeit not about
ICBN: I sure wish spec writers targeting software would banish
"should" from their documents in favor of "must", even if multiple
choices are accompanied by "... is preferred". Well, maybe it's a
little complaint about the nomenclatural codes, because movement
towards born-digital, semantically marked-up systematics literature
will bump into it when people try to write semantically enhanced
applications. It would be far better if publishers followed a set of
rules with no "should" in them, for which compliance could be tested
before publication.

***
There are more distinctions than just "must" and "should" in the ICBN. 
Eliminating the "should" is not going to happen, but sometimes a 
"should" will grow up to become a "must".

Paul van Rijckevorsel

dipteryx＠freeler.nl