I think it makes the most sense to model these names on the underlying biology and informatics, and then be able to output a code-compliant string.

One reason is that the codes can change, so you don't want a particular code's rules to be a fixed part of the most fundamental units of your information systems.

Another advantage is that you don't need different intermediate structures and forms for each of the nomenclatural codes.

Do we want to have one entity for species, or four or more that have to be duplicated throughout the entire software stack?

Done this way, if a code changes you only need to change the code that produces the output.

You also don't always know what is the appropriate code for a given string until the end.
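A minimal sketch of this approach in Python, under stated assumptions: Name, render_botanical, render_zoological and the "ICBN"/"ICZN" keys are hypothetical names, not an existing library; only two codes are shown; and the output formats simply follow the canonical forms proposed further down this thread. The point is that the stored record carries no code-specific punctuation, and the code-specific part is a small table of output functions chosen at the very end.

```python
from dataclasses import dataclass
from typing import Optional

TIMES = "\u00d7"  # MULTIPLICATION SIGN


@dataclass(frozen=True)
class Name:
    """Hypothetical code-neutral record: ranks and hybrid flags, no punctuation."""
    genus: str
    subgenus: Optional[str] = None
    species: Optional[str] = None
    hybrid_genus: bool = False    # named hybrid at genus rank, e.g. ×Agropogon
    hybrid_species: bool = False  # named hybrid at species rank, e.g. Salix ×capreola


def _mark(is_hybrid: bool, part: str) -> str:
    # ICBN-preferred style described in this thread: sign directly before the name.
    return TIMES + part if is_hybrid else part


def render_botanical(n: Name) -> str:
    parts = [_mark(n.hybrid_genus, n.genus)]
    if n.species:
        # Species rank: genus + epithet only, as in the canonical forms proposed here.
        parts.append(_mark(n.hybrid_species, n.species))
    elif n.subgenus:
        parts += ["subgen.", n.subgenus]
    return " ".join(parts)


def render_zoological(n: Name) -> str:
    # Subgenus in parentheses between genus and species; hybrid flags are ignored.
    parts = [n.genus]
    if n.subgenus:
        parts.append(f"({n.subgenus})")
    if n.species:
        parts.append(n.species)
    return " ".join(parts)


# If a code changes its formatting rules, only its entry here needs to change.
RENDERERS = {"ICBN": render_botanical, "ICZN": render_zoological}


def render(n: Name, code: str) -> str:
    """Pick the output format at the end, once the governing code is known."""
    return RENDERERS[code](n)
```

With this sketch, Name("Leptochilus", subgenus="Neoleptochilus", species="beaumonti") comes out as "Leptochilus (Neoleptochilus) beaumonti" through the ICZN function and as "Leptochilus beaumonti" through the botanical one, from the same stored record.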

Respectfully,

- Pete


On Thu, Dec 9, 2010 at 7:52 AM, Bob Morris <morris.bob@gmail.com> wrote:
Thanks. To me what is interesting about this thread is that documents
whose main(?) audience is authors and publishers do not always
address the needs of parser writers. It is a rare and happy
circumstance for a programmer to have the document author to consult!

What I \think/ is implied by your answer (something that requires
biological knowledge that I don't have) is that there are hybrid
names in which only one name is mentioned, rather than the two
parents of a cross. The distinction, then, is that a "formula" names
at least two, but there are uses of × that do not appear in a
formula, right? So a natural-language name extractor should follow
this rule:
   - If the × adjoins text, the token to the left of any preceding
     white space is not part of a taxon name; otherwise it is.
Example: In the fragment "not unlike ×Agropogon littoralis"  the token
'unlike' is not part of a name.
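A sketch of that rule in Python, for text that uses the actual multiplication sign (the lower-case "x" surrogate of H.3A.2 is not handled). The helper names left_token and left_token_in_name are made up for illustration. One refinement is an assumption, not part of the rule above: Markus's example "Salix ×capreola" further down this thread shows the sign can also adjoin a lower-case epithet, and then the token to the left is the genus and does belong to the name, so the sketch looks at the case of the letter following the sign rather than only at whether the sign adjoins text.

```python
import re

TIMES = "\u00d7"  # MULTIPLICATION SIGN; the "x" surrogate is not handled here
LAST_WORD = re.compile(r"(\S+)\s*$")


def left_token(text: str, i: int) -> str:
    """Return the token immediately left of position i, skipping white space."""
    m = LAST_WORD.search(text[:i])
    return m.group(1) if m else ""


def left_token_in_name(text: str, i: int) -> bool:
    """Decide whether the token left of the sign at text[i] belongs to the name."""
    if text[i] != TIMES:
        raise ValueError(f"no multiplication sign at index {i}")
    following = text[i + 1 : i + 2]  # one character of lookahead
    if following.isupper():
        # "×Agropogon": the sign adjoins a generic name, which starts here.
        return False
    if following.islower():
        # "Salix ×capreola": the sign adjoins an epithet; the genus is on the left.
        return True
    # White space (or end of text) after the sign: a hybrid formula, so the
    # left token ends the first parent's name.
    return True
```

For the fragment "not unlike ×Agropogon littoralis", left_token returns "unlike" and left_token_in_name returns False, as in the example above.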

Believe it or not, I am not complaining about ICBN. No programmer
interpreting a document not written for programmers should complain if
understanding it assumes knowledge and insight of the intended
audience. Nor should they complain if they are raising points that are
addressed in other parts of the document that they haven't read--which
in this case for me is everything but H.3A.

Robust context-sensitive parsers are marginally more complicated to
write than those that require no lookahead, but this is surely not the
only name-parsing issue that requires lookahead, so I can't even
complain on that score. In a vaguely related setting, parser writers
might see the rather nicely set forth discussion at
http://stackoverflow.com/questions/1952931/how-to-rewrite-this-nondeterministic-xml-schema-to-deterministic
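One way to keep the amount of white space from becoming significant is to normalize before parsing, with one token of lookahead. In the sketch below, the hypothetical normalize_hybrid function accepts the × or the H.3A.2 lower-case "x" surrogate with any spacing and rewrites a single name in the ICBN-preferred style Paul describes further down (sign directly before a named hybrid's name or epithet; space, sign, space between the parents of a formula). Treating a standalone sign as a formula separator when there is a name on its left and a capitalized token on its right is a heuristic assumed here, not a rule from the Code, and the input is assumed to be a name without authorship.

```python
TIMES = "\u00d7"       # MULTIPLICATION SIGN
SIGNS = {TIMES, "x"}   # "x" is the H.3A.2 surrogate, recognized only as a separate token


def normalize_hybrid(name: str) -> str:
    """Rewrite one name (no authorship) in ICBN-preferred hybrid spacing.

    Heuristic: a standalone sign with a name on its left and a capitalized
    token on its right separates the parents of a formula; any other
    standalone sign marks a named hybrid and is attached to the next token.
    """
    tokens = name.split()  # any run of white space counts as one separator
    out = []
    attach = False
    for k, tok in enumerate(tokens):
        if tok in SIGNS:
            nxt = tokens[k + 1] if k + 1 < len(tokens) else ""  # one-token lookahead
            if not nxt:
                raise ValueError(f"multiplication sign at end of {name!r}")
            if out and nxt[0].isupper():
                out.append(TIMES)  # formula: "Agrostis stolonifera × Polypogon ..."
            else:
                attach = True      # named hybrid: "×Agropogon", "Salix ×capreola"
            continue
        out.append(TIMES + tok if attach else tok)
        attach = False
    return " ".join(out)
```

So "Salix  x capreola" and "Salix ×capreola" both come out as "Salix ×capreola", and a downstream parser never sees the original spacing.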



Bob Morris
P.S. Hey, I thought of something to complain about, albeit not about
ICBN: I sure wish spec writers targeting software would banish
"should" from their documents in favor of "must", even if multiple
choices are accompanied by "... is preferred". Well, maybe it's a
little complaint about the nomenclatural codes, because the movement
toward born-digital, semantically marked-up systematics literature
will run into it when people try to write semantically enhanced
applications. It would be far better if publishers followed a set of
rules with no "should" in them, for which compliance could be tested
before publication.



Robert A. Morris
Emeritus Professor of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: morris.bob@gmail.com
web: http://bdei.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)


On Thu, Dec 9, 2010 at 3:39 AM,  <dipteryx@freeler.nl> wrote:
> Having personally written Rec. H.3A.1, I do not see that it offers
> scope for being misread: the placement of the multiplication sign
> is a matter of style (and insight). As background information, the
> ICBN-preferred style is to put it directly in front of the name or
> epithet (no space whatsoever: ×Agropogon littoralis): just keep
> it nice together, so as to give computers no chance to mess it
> up (after all, at a line break, a computer is likely to separate
> these over more than one line).
>
> Rec. H.3A Note 1 has been put in there (redundantly) for those who
> are careless readers, just to make sure the matter could not
> possibly be misunderstood by even the most whimsical. So, in a
> formula, the parents are separated by space, multiplication sign,
> space:
>      Agrostis stolonifera × Polypogon monspeliensis.
>
> Paul van Rijckevorsel
>
> * * *
> -----Original Message-----
> From: tdwg-content-bounces@lists.tdwg.org on behalf of Bob Morris
> Sent: Wed 8-12-2010 20:12
> To: Markus Döring (GBIF)
> CC: tdwg-content@lists.tdwg.org List
> Subject: Re: [tdwg-content] canonical name for named hybrid &
> infrageneric names
>
> Your placement of the multiplication sign × does not seem
> code-compliant. It looks too close. Maybe. Also there might be a
> question about whether a TDWG requirement to use the multiplication
> sign can be easily implemented by all providers.
>
> On these subjects, the Appendix on Hybrid Names of the ICBN seems
> contradictory, in that H.3A.1
> (http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm, quoted
> below) seems to allow your placement, but Note 1 there seems to
> require space. Note 1 would, together with H.3A.1, imply that there
> must be more white space to the left than to the right of the
> multiplication sign or its surrogate. One spacing that seems to
> violate all interpretations of H.3A.1 is equal white space around
> the multiplication sign. My guess is that the overwhelming majority
> of printed hybrid names are thereby noncompliant (unless something
> elsewhere resolves the issue). Making the amount of white space
> significant in a parsed string is a horrifying thought.
>
> --Bob Morris
>
> "Recommendation H.3A
>
> H.3A.1. The multiplication sign ×, indicating the hybrid nature of a
> taxon, should be placed so as to express that it belongs with the name
> or epithet but is not actually part of it. The exact amount of space,
> if any, between the multiplication sign and the initial letter of the
> name or epithet should depend on what best serves readability.
>
> Note 1. The multiplication sign × in a hybrid formula is always placed
> between, and separate from, the names of the parents.
> H.3A.2. If the multiplication sign is not available it should be
> approximated by a lower case letter "x" (not italicized)."
> http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm
>
>
> ======================
>
>
>
> On Wed, Dec 8, 2010 at 1:14 PM, "Markus Döring (GBIF)"
> <mdoering@gbif.org> wrote:
>> Talking about canonical names again, I want to use the opportunity to
>> get rid of another question I have.
>> What is the code-compliant canonical version of named hybrids (not
>> formulas) and of infrageneric names?
>>
>>
>> Are these examples correct?
>>
>> Botanical section:
>> verbatim: Maxillaria sect. Multiflorae Christenson
>> canonical:  Maxillaria sect. Multiflorae
>>
>> Botanical subgenus:
>> verbatim: Anthemis subgen. Maruta (Cass.) Tzvelev
>> canonical:  Anthemis subgen. Maruta
>>
>> Botanical series:
>> verbatim: Artemisia ser. Codonocephalae (Pamp.) Y.R.Ling
>> canonical:  Artemisia ser. Codonocephalae
>>
>> Zoological subgenus:
>> verbatim: Murex (Promurex) Ponder & Vokes, 1988
>> canonical:  Murex subgen. Promurex
>> # if we use parentheses to indicate the subgenus, we can only guess
>> whether it's an author or a subgenus name
>>
>> Zoological species:
>> verbatim: Leptochilus (Neoleptochilus) beaumonti Giordani Soika 1953
>> canonical: Leptochilus beaumonti
>>
>>
>>
>> Botanical named genus hybrid:
>> verbatim: ×Agropogon littoralis (Sm.) C. E. Hubb.
>> canonical: ×Agropogon littoralis
>>
>> Botanical named infrageneric hybrid:
>> verbatim: Eryngium nothosect. Alpestria Burdet & Miège
>> canonical: Eryngium nothosect. Alpestria
>>
>> Botanical named species hybrid:
>> verbatim: Salix ×capreola Andersson (1867)
>> canonical: Salix ×capreola Andersson (1867)
>>
>> Botanical variety, named species hybrid:
>> verbatim: Populus ×canadensis var. serotina (R. Hartig) Rehder
>> canonical: Populus ×canadensis var. serotina
>>
>> Botanical named infraspecific hybrid:
>> verbatim: Polypodium vulgare nothosubsp. mantoniae (Rothm.) Schidlay
>> canonical: Polypodium vulgare nothosubsp. mantoniae
>>
>>
>>
>
> _______________________________________________
> tdwg-content mailing list
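Markus's comment on the zoological subgenus marks the case that really does need a guess. A sketch of one such guess in Python, assuming a verbatim name of the shape Genus [(group)] [epithet] [authorship]: the parenthesized group is taken as a subgenus only if it is a single capitalized word (no period, comma, ampersand or digits), and the output follows the canonical forms Markus proposes (subgenus dropped from species names, "subgen." when the subgenus is the name itself), which are his question, not a confirmed rule of either Code. The function name zoological_canonical is hypothetical.

```python
import re

# Genus, optional parenthesized group, optional lower-case epithet; the rest
# of the string (authorship, year) is ignored.
ZOO_NAME = re.compile(r"([A-Z][a-z]+)(?:\s+\(([^)]*)\))?(?:\s+([a-z]+)(?=\s|$))?")
# The guess: a subgenus is one capitalized word; author strings usually
# contain a period, comma, ampersand or year.
SUBGENUS = re.compile(r"[A-Z][a-z]+")


def zoological_canonical(verbatim: str) -> str:
    m = ZOO_NAME.match(verbatim.strip())
    if not m:
        raise ValueError(f"not a genus-first name: {verbatim!r}")
    genus, group, epithet = m.groups()
    subgenus = group if group and SUBGENUS.fullmatch(group) else None
    if epithet:
        return f"{genus} {epithet}"           # species: drop subgenus and authorship
    if subgenus:
        return f"{genus} subgen. {subgenus}"
    return genus                              # group was authorship, or absent
```

The weak spot is exactly the one Markus names: a single-word author in parentheses would be read as a subgenus, and a lower-case author particle such as "de" right after the genus would be read as an epithet.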



--
---------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
About the GeoSpecies Knowledge Base
------------------------------------------------------------