I think it makes the most sense to model these names based on biology and informatics, and then be able to output a code-compliant string.<div><br></div><div>One reason is that the code can change, so you don't want it to be a fixed part of the most fundamental units of your information systems.</div>
<div><br></div><div>Another advantage is that you don't need different intermediate structures and forms for each of the nomenclatural codes.</div><div><br></div><div>Do we want to have one entity for species, or four or more that have to be duplicated throughout the entire software stack?</div>
<div><br></div><div>Done this way, if the code changes you only need to alter the software that generates the output string.</div><div><br></div><div>You also don't always know which code is appropriate for a given string until the end.</div>
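<div><br></div><div>A minimal sketch of this idea, under stated assumptions: TaxonName, botanical_string, zoological_string and render are hypothetical names invented for illustration, not part of any existing library. The botanical style follows the canonical examples quoted below; the parenthesized zoological subgenus style is only taken from the verbatim string "Murex (Promurex)" in the thread, not from a reading of the zoological code.</div>

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical code-neutral record: one entity, whatever the nomenclatural code.
@dataclass(frozen=True)
class TaxonName:
    genus: str
    infrageneric: Optional[str] = None       # e.g. "Promurex"
    infrageneric_rank: Optional[str] = None  # e.g. "subgen.", "sect."
    species: Optional[str] = None
    genus_hybrid: bool = False               # named hybrid at genus level
    species_hybrid: bool = False             # named hybrid at species level


def botanical_string(n: TaxonName) -> str:
    """Botanical-style output: sign directly in front of the name or epithet."""
    parts = [("×" if n.genus_hybrid else "") + n.genus]
    if n.species:
        parts.append(("×" if n.species_hybrid else "") + n.species)
    elif n.infrageneric:
        parts += [n.infrageneric_rank or "", n.infrageneric]
    return " ".join(p for p in parts if p)


def zoological_string(n: TaxonName) -> str:
    """Assumed zoological-style output: subgenus in parentheses."""
    parts = [n.genus]
    if n.infrageneric:
        parts.append(f"({n.infrageneric})")
    if n.species:
        parts.append(n.species)
    return " ".join(parts)


# Only this table and the formatters change if a code changes.
FORMATTERS = {"botanical": botanical_string, "zoological": zoological_string}


def render(n: TaxonName, code: str) -> str:
    """Pick the output style at the end, once the applicable code is known."""
    return FORMATTERS[code](n)


murex = TaxonName("Murex", infrageneric="Promurex", infrageneric_rank="subgen.")
print(render(murex, "botanical"))
print(render(murex, "zoological"))
```

<div>The record itself carries no code-specific formatting, so the same entity can be rendered either way and the choice of code can be deferred until output.</div>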
<div><br></div><div>Respectfully,</div><div><br></div><div>- Pete</div><div><br><br><div class="gmail_quote">On Thu, Dec 9, 2010 at 7:52 AM, Bob Morris <span dir="ltr"><<a href="mailto:morris.bob@gmail.com">morris.bob@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Thanks. To me what is interesting about this thread is that documents<br>
whose main(?) audience is authors and publishers do not always<br>
address the needs of parser writers. It is a rare and happy<br>
circumstance for a programmer to have the document author to consult!<br>
<br>
What I \think/ is implied by your answer is (something that requires<br>
biological knowledge that I don't have, namely) that there are hybrid<br>
names which are not necessarily a cross of two things, but rather only<br>
one is mentioned. The distinction then is that "formula" means at<br>
least two, but there are uses which do not appear in a formula, right?<br>
So a natural-language name extractor should follow this rule:<br>
- If the × adjoins the text that follows it, the token to the left of<br>
any preceding white space is not part of a taxon name; otherwise it is.<br>
Example: In the fragment "not unlike ×Agropogon littoralis" the token<br>
'unlike' is not part of a name.<br>
<br>
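A minimal sketch of this rule, assuming whitespace tokenization and treating only the × character (not the "x" surrogate) as the sign. The function names are hypothetical. The sketch adds one assumption that is not in the rule above: when the text after × starts with a lowercase letter (as in "Salix ×capreola" from earlier in the thread), the sign is taken to mark an epithet, so the token to its left is kept as the genus.<br>

```python
HYBRID_SIGN = "×"


def classify_sign(token):
    """Return how a token uses the hybrid sign, or None if it does not."""
    if token == HYBRID_SIGN:
        # Sign standing alone between white space: hybrid formula.
        return "formula"
    if token.startswith(HYBRID_SIGN) and len(token) > 1:
        # Sign adjoining text: named hybrid. Uppercase after the sign is
        # assumed to be a genus name, lowercase an epithet.
        return "genus-hybrid" if token[1].isupper() else "epithet-hybrid"
    return None


def left_token_in_name(text):
    """For each sign in text, report (kind, left token, left token in name?)."""
    tokens = text.split()
    results = []
    for i, token in enumerate(tokens):
        kind = classify_sign(token)
        if kind is None:
            continue
        left = tokens[i - 1] if i > 0 else None
        # Only a genus-level named hybrid starts a new name at the sign.
        in_name = left is not None and kind != "genus-hybrid"
        results.append((kind, left, in_name))
    return results


for fragment in (
    "not unlike ×Agropogon littoralis",
    "Agrostis stolonifera × Polypogon monspeliensis",
    "Salix ×capreola",
):
    print(fragment, left_token_in_name(fragment))
```

Whether the left token really is a name part still needs a check against known genus names; the sketch only decides what the position of the sign allows.<br>
<br>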
Believe it or not, I am not complaining about ICBN. No programmer<br>
interpreting a document not written for programmers should complain if<br>
understanding it assumes knowledge and insight of the intended<br>
audience. Nor should they complain if they are raising points that are<br>
addressed in other parts of the document that they haven't read--which<br>
in this case for me is everything but H.3A.<br>
<br>
Robust context-sensitive parsers are marginally more complicated to<br>
write than those that require no lookahead, but this is surely not the<br>
only name parsing issue that requires lookahead, so I can't even<br>
complain on that score. In a vaguely related setting, parser writers<br>
might want to see the rather nicely set forth discussion at<br>
<a href="http://stackoverflow.com/questions/1952931/how-to-rewrite-this-nondeterministic-xml-schema-to-deterministic" target="_blank">http://stackoverflow.com/questions/1952931/how-to-rewrite-this-nondeterministic-xml-schema-to-deterministic</a><br>
<br>
<br>
<br>
Bob Morris<br>
p.s. Hey, I thought of something to complain about, albeit not about<br>
ICBN: I sure wish spec writers targeting software would banish<br>
"should" from their documents in favor of "must", even if multiple<br>
choices are accompanied by "... is preferred". Well, maybe it's a<br>
little complaint about the nomenclatural codes, because movement<br>
towards born-digital, semantically marked-up systematics literature<br>
will bump into it when people try to write semantically enhanced<br>
applications. It would be far better if publishers followed a set of<br>
rules with no "should" in them, for which compliance could be tested<br>
before publication.<br>
<div class="im"><br>
<br>
<br>
Robert A. Morris<br>
Emeritus Professor of Computer Science<br>
UMASS-Boston<br>
100 Morrissey Blvd<br>
Boston, MA 02125-3390<br>
Associate, Harvard University Herbaria<br>
email: <a href="mailto:morris.bob@gmail.com">morris.bob@gmail.com</a><br>
web: <a href="http://bdei.cs.umb.edu/" target="_blank">http://bdei.cs.umb.edu/</a><br>
web: <a href="http://etaxonomy.org/mw/FilteredPush" target="_blank">http://etaxonomy.org/mw/FilteredPush</a><br>
<a href="http://www.cs.umb.edu/~ram" target="_blank">http://www.cs.umb.edu/~ram</a><br>
phone (+1) 857 222 7992 (mobile)<br>
<br>
<br>
</div><div><div></div><div class="h5">On Thu, Dec 9, 2010 at 3:39 AM, <<a href="mailto:dipteryx@freeler.nl">dipteryx@freeler.nl</a>> wrote:<br>
> Having personally written Rec. H.3A.1, I do not see that it offers<br>
> scope for being misread: the placement of the multiplication sign<br>
> is a matter of style (and insight). As background information, the<br>
> ICBN-preferred style is to put it directly in front of the name or<br>
> epithet (no space whatsoever: ×Agropogon littoralis): just keep<br>
> it nice together, so as to give computers no chance to mess it<br>
> up (after all, at a line break, a computer is likely to separate<br>
> these over more than one line).<br>
><br>
> Rec. H.3A Note 1 has been put in there (redundantly) for those who<br>
> are careless readers, just to make sure the matter could not<br>
> possibly be misunderstood by even the most whimsical. So, in a<br>
> formula, the parents are separated by: space, multiplication sign,<br>
> space;<br>
> Agrostis stolonifera × Polypogon monspeliensis.<br>
><br>
> Paul van Rijckevorsel<br>
><br>
> * * *<br>
> -----Original message-----<br>
> From: <a href="mailto:tdwg-content-bounces@lists.tdwg.org">tdwg-content-bounces@lists.tdwg.org</a> on behalf of Bob Morris<br>
> Sent: Wed 8-12-2010 20:12<br>
> To: Markus Döring (GBIF)<br>
> CC: <a href="mailto:tdwg-content@lists.tdwg.org">tdwg-content@lists.tdwg.org</a> List<br>
> Subject: Re: [tdwg-content] canonical name for named hybrid &<br>
> infrageneric names<br>
><br>
> Your placement of the multiplication sign × does not seem code<br>
> compliant. It looks too close. Maybe. Also there might be a question<br>
> about whether a TDWG requirement to use the multiplication sign can be<br>
> easily implemented by all providers.<br>
><br>
> On these subjects, the Appendix on Hybrid Names of the ICBN seems<br>
> contradictory in that H.3A.1<br>
> (<a href="http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm" target="_blank">http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm</a>, quoted<br>
> below) seems to allow your placement, but Note 1. there seems to<br>
> require space. Note 1. would, with H.3A.1 imply that there must be<br>
> more white space to the left than right of the multiplication sign or<br>
> its surrogate. One spacing that seems to violate all interpretations<br>
> of H.3A.1 is equal white space around the multiplication sign. (My<br>
> guess is that the overwhelming fraction of printed hybrid names are<br>
> thereby noncompliant unless something elsewhere resolves the issue.)<br>
> Making the amount of white space significant in a parsed string is a<br>
> horrifying thought.<br>
><br>
> --Bob Morris<br>
><br>
> "Recommendation H.3A<br>
><br>
> H.3A.1. The multiplication sign ×, indicating the hybrid nature of a<br>
> taxon, should be placed so as to express that it belongs with the name<br>
> or epithet but is not actually part of it. The exact amount of space,<br>
> if any, between the multiplication sign and the initial letter of the<br>
> name or epithet should depend on what best serves readability.<br>
><br>
> Note 1. The multiplication sign × in a hybrid formula is always placed<br>
> between, and separate from, the names of the parents.<br>
> H.3A.2. If the multiplication sign is not available it should be<br>
> approximated by a lower case letter "x" (not italicized)."<br>
> <a href="http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm" target="_blank">http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm</a><br>
><br>
><br>
> ======================<br>
><br>
><br>
><br>
> On Wed, Dec 8, 2010 at 1:14 PM, "Markus Döring (GBIF)"<br>
> <<a href="mailto:mdoering@gbif.org">mdoering@gbif.org</a>> wrote:<br>
>> Talking about canonical names again, I want to use the opportunity to<br>
>> raise another question I have.<br>
>> What is the code compliant canonical version of named hybrids (not<br>
>> formulas) and infrageneric names?<br>
>><br>
>><br>
>> Are these examples correct?<br>
>><br>
>> Botanical section:<br>
>> verbatim: Maxillaria sect. Multiflorae Christenson<br>
>> canonical: Maxillaria sect. Multiflorae<br>
>><br>
>> Botanical subgenus:<br>
>> verbatim: Anthemis subgen. Maruta (Cass.) Tzvelev<br>
>> canonical: Anthemis subgen. Maruta<br>
>><br>
>> Botanical series:<br>
>> verbatim: Artemisia ser. Codonocephalae (Pamp.) Y.R.Ling<br>
>> canonical: Artemisia ser. Codonocephalae<br>
>><br>
>> Zoological subgenus:<br>
>> verbatim: Murex (Promurex) Ponder & Vokes, 1988<br>
>> canonical: Murex subgen. Promurex<br>
>> # if we use parentheses to indicate the subgenus, we can only guess whether<br>
>> it's an author or a subgenus name<br>
>><br>
>> Zoological species<br>
>> verbatim: Leptochilus (Neoleptochilus) beaumonti Giordani Soika 1953<br>
>> canonical: Leptochilus beaumonti<br>
>><br>
>><br>
>><br>
>> Botanical named genus hybrid:<br>
>> verbatim: ×Agropogon littoralis (Sm.) C. E. Hubb.<br>
>> canonical: ×Agropogon littoralis<br>
>><br>
>> Botanical named infrageneric hybrid:<br>
>> verbatim: Eryngium nothosect. Alpestria Burdet & Miège<br>
>> canonical: Eryngium nothosect. Alpestria<br>
>><br>
>> Botanical named species hybrid:<br>
>> verbatim: Salix ×capreola Andersson (1867)<br>
>> canonical: Salix ×capreola Andersson (1867)<br>
>><br>
>> Botanical variety, named species hybrid:<br>
>> verbatim: Populus ×canadensis var. serotina (R. Hartig) Rehder<br>
>> canonical: Populus ×canadensis var. serotina<br>
>><br>
>> Botanical named infraspecific hybrid:<br>
>> verbatim: Polypodium vulgare nothosubsp. mantoniae (Rothm.) Schidlay<br>
>> canonical: Polypodium vulgare nothosubsp. mantoniae<br>
>><br>
>><br>
>><br>
><br>
</div></div>> _______________________________________________<br>
> tdwg-content mailing list<br>
<div class="im">> <a href="mailto:tdwg-content@lists.tdwg.org">tdwg-content@lists.tdwg.org</a><br>
</div><div class="im">> <a href="http://lists.tdwg.org/mailman/listinfo/tdwg-content" target="_blank">http://lists.tdwg.org/mailman/listinfo/tdwg-content</a><br>
><br>
><br>
<br>
<br>
<br>
</div>--<br>
<div><div></div><div class="h5"><br>
</div></div><div class="im">
_______________________________________________<br>
tdwg-content mailing list<br>
</div><div class="im"><a href="mailto:tdwg-content@lists.tdwg.org">tdwg-content@lists.tdwg.org</a><br>
</div><div><div></div><div class="h5"><a href="http://lists.tdwg.org/mailman/listinfo/tdwg-content" target="_blank">http://lists.tdwg.org/mailman/listinfo/tdwg-content</a><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>---------------------------------------------------------------<br>Pete DeVries<br>Department of Entomology<br>University of Wisconsin - Madison<br>445 Russell Laboratories<br>
1630 Linden Drive<br>Madison, WI 53706<br><a href="http://www.taxonconcept.org/" target="_blank">TaxonConcept Knowledge Base</a> / <a href="http://lod.geospecies.org/" target="_blank">GeoSpecies Knowledge Base</a><br><a href="http://about.geospecies.org/" target="_blank">About the GeoSpecies Knowledge Base</a><br>
------------------------------------------------------------<br>
</div>