proposed term: dwc:verbatimScientificName
On Dec 8, 2010, at 17:09, David Remsen (GBIF) wrote:
Markus and I want to consolidate the issues around the current use and definition of scientificName that were the focus of last week's discussion, as simply as we can, and leave a simple proposal that we will add to the issue tracker on the project site.
1. We propose that a new term, dwc:verbatimScientificName, carry the existing definition for dwc:scientificName, and
2. that dwc:scientificName follow the more accepted convention that is better represented by the earlier proposed definition for Canonical Name.
The intention is to enable data publishers to distinguish unparsed, complex scientific names from more cleanly separated scientific name data. This will relieve consumers of these data from testing each instance of a name for one of these two conditions.
Here are the definitions for the two existing terms that have been part of the discussion:
dwc:scientificName - The full scientific name, with authorship and date information if known. When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term.
dwc:scientificNameAuthorship - The authorship information for the scientificName formatted according to the conventions of the applicable nomenclaturalCode.
Here are terms and definitions used in the following 5 source data configurations we came up with. They don't have to be exact for this purpose.
canonical name - The nomenclatural components of a scientific name without authorship information.
authorship - The authorship information that follows a scientific name.
verbatim name - The verbatim text stored in a source database when it differs from or combines the two definitions above. This is a bit broader than the definition of scientificName.
We identified the following configurations in a source database and how they would be mapped to the existing terms. In cases 4 and 5 we also propose how we would map these were there a 3rd available term (called 'mapping b:')
When a source database contains:
1. canonical names only
Mapping: canonical name -> dwc:scientificName
2. canonical name and authorship in two fields
Mapping: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
3. verbatim name only
Mapping: verbatim name -> dwc:scientificName
4. all three: canonical name, authorship, and verbatim name in three different columns
Mapping a: verbatim name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
Mapping b: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship / verbatim name -> dwc:verbatimScientificName
5. a mix of canonical and verbatim names in a single column
Mapping a: verbatim name + canonical names -> dwc:scientificName
Mapping b: verbatim name + canonical names -> dwc:verbatimScientificName
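The five configurations above can be sketched as a small mapping function. This is only an illustration under stated assumptions: the source column keys ("canonical", "authorship", "verbatim", "mixed") and the function name are hypothetical, and the flag use_verbatim_term switches cases 4 and 5 from "mapping a" to "mapping b".

```python
from typing import Dict


def map_to_dwc(source: Dict[str, str], use_verbatim_term: bool = False) -> Dict[str, str]:
    """Map hypothetical source columns to Darwin Core terms (configurations 1-5).

    The keys "canonical", "authorship", "verbatim" and "mixed" are assumptions
    made for this sketch; "mixed" stands for configuration 5, a single column
    holding both canonical and verbatim names.
    """
    canonical = source.get("canonical")
    authorship = source.get("authorship")
    verbatim = source.get("verbatim")
    mixed = source.get("mixed")

    record: Dict[str, str] = {}
    if mixed:
        # Configuration 5: mapping a uses scientificName, mapping b the new term.
        term = "dwc:verbatimScientificName" if use_verbatim_term else "dwc:scientificName"
        record[term] = mixed
        return record

    if canonical and verbatim:
        # Configuration 4: all three fields are available.
        if use_verbatim_term:
            record["dwc:scientificName"] = canonical
            record["dwc:verbatimScientificName"] = verbatim
        else:
            record["dwc:scientificName"] = verbatim
    elif verbatim:
        # Configuration 3: verbatim name only.
        record["dwc:scientificName"] = verbatim
    elif canonical:
        # Configurations 1 and 2: canonical name, optionally with authorship.
        record["dwc:scientificName"] = canonical

    if authorship:
        record["dwc:scientificNameAuthorship"] = authorship
    return record
```

With only the two existing terms (use_verbatim_term=False), configuration 4 has to discard the canonical name and configuration 5 puts mixed content into dwc:scientificName, which is the difficulty the summary describes.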
Summary - with the current two terms we are left with no choice but to support both canonical and verbatim names in a single term, which makes consuming these data difficult.
We propose that a new term, dwc:verbatimScientificName, carry the existing definition for dwc:scientificName, and that dwc:scientificName follow the more accepted convention that is better represented by the definition for Canonical Name.
Best, David Remsen / Markus Döring
On Wed, Dec 8, 2010 at 1:14 PM, Markus Döring (GBIF) wrote:
Talking about canonical names again, I want to use the opportunity to raise another question I have. What is the code-compliant canonical version of named hybrids (not formulas) and infrageneric names?
Are these examples correct?
Botanical section:
verbatim: Maxillaria sect. Multiflorae Christenson
canonical: Maxillaria sect. Multiflorae
Botanical subgenus:
verbatim: Anthemis subgen. Maruta (Cass.) Tzvelev
canonical: Anthemis subgen. Maruta
Botanical series:
verbatim: Artemisia ser. Codonocephalae (Pamp.) Y.R.Ling
canonical: Artemisia ser. Codonocephalae
Zoological subgenus:
verbatim: Murex (Promurex) Ponder & Vokes, 1988
canonical: Murex subgen. Promurex
# if we use parentheses to indicate the subgenus, we can only guess whether it is an author or a subgenus name
Zoological species:
verbatim: Leptochilus (Neoleptochilus) beaumonti Giordani Soika 1953
canonical: Leptochilus beaumonti
Botanical named genus hybrid:
verbatim: ×Agropogon littoralis (Sm.) C. E. Hubb.
canonical: ×Agropogon littoralis
Botanical named infrageneric hybrid:
verbatim: Eryngium nothosect. Alpestria Burdet & Miège
canonical: Eryngium nothosect. Alpestria
Botanical named species hybrid:
verbatim: Salix ×capreola Andersson (1867)
canonical: Salix ×capreola
Botanical variety, named species hybrid:
verbatim: Populus ×canadensis var. serotina (R. Hartig) Rehder
canonical: Populus ×canadensis var. serotina
Botanical named infraspecific hybrid:
verbatim: Polypodium vulgare nothosubsp. mantoniae (Rothm.) Schidlay
canonical: Polypodium vulgare nothosubsp. mantoniae
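For the botanical examples, the canonical form is the verbatim string with the trailing authorship removed. A minimal sketch of that relationship, assuming the authorship is already available as a separate string (the function name is hypothetical, and this is not a general name parser):

```python
from typing import Optional


def strip_authorship(verbatim: str, authorship: str) -> Optional[str]:
    """Return the verbatim name without its trailing authorship.

    Returns None when the verbatim string does not end with the given
    authorship, so the caller can fall back to manual handling.
    """
    verbatim = verbatim.strip()
    authorship = authorship.strip()
    if authorship and verbatim.endswith(authorship):
        # Cut the authorship off the end and drop the separating whitespace.
        return verbatim[: -len(authorship)].rstrip()
    return None
```

This does not cover the zoological cases: stripping "Ponder & Vokes, 1988" from the Murex example leaves "Murex (Promurex)", not "Murex subgen. Promurex", and the Leptochilus example also drops the parenthesised subgenus.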
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
On Wed, Dec 8, 2010 at 1:13 PM, Bob Morris wrote:
Your placement of the multiplication sign × does not seem code compliant. It looks too close. Maybe. Also, there might be a question about whether a TDWG requirement to use the multiplication sign can be easily implemented by all providers.
On these subjects, the Appendix on Hybrid Names of the ICBN seems contradictory, in that H.3A.1 (http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm, quoted below) seems to allow your placement, but Note 1 there seems to require space. Note 1 would, with H.3A.1, imply that there must be more white space to the left than to the right of the multiplication sign or its surrogate. One spacing that seems to violate all interpretations of H.3A.1 is equal white space around the multiplication sign. My guess is that the overwhelming fraction of printed hybrid names are thereby noncompliant (unless something elsewhere resolves the issue).
Making the amount of white space significant in a parsed string is a horrifying thought.
--Bob Morris
"Recommendation H.3A
H.3A.1. The multiplication sign ×, indicating the hybrid nature of a taxon, should be placed so as to express that it belongs with the name or epithet but is not actually part of it. The exact amount of space, if any, between the multiplication sign and the initial letter of the name or epithet should depend on what best serves readability.
Note 1. The multiplication sign × in a hybrid formula is always placed between, and separate from, the names of the parents.
H.3A.2. If the multiplication sign is not available it should be approximated by a lower case letter “x” (not italicized)."
http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm
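One way to keep the amount of white space from mattering is to accept any spacing after the marker when reading a single name or epithet. The sketch below assumes the marker is either × or the lower-case "x" surrogate from H.3A.2; the function name is hypothetical.

```python
import re
from typing import Tuple


def split_hybrid_marker(text: str) -> Tuple[bool, str]:
    """Return (is_hybrid, name) for one name or epithet.

    Any amount of white space after the multiplication sign is ignored.
    A lower-case "x" is treated as a marker only when white space follows
    it, because otherwise it may be the first letter of the name itself.
    """
    text = text.strip()
    if text.startswith("×"):
        return True, text[1:].lstrip()
    surrogate = re.match(r"x\s+(\S.*)", text)
    if surrogate:
        return True, surrogate.group(1)
    return False, text
```

With the real multiplication sign, "×Agropogon" and "× Agropogon" read the same. With the "x" surrogate, white space cannot be ignored entirely, since only the space distinguishes a marker from an initial letter.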
On Wed, Dec 8, 2010 at 3:25 PM, Chuck Miller wrote:
Don't forget the 1994 publication (copyrighted by TDWG as ISBN 0-913196-62-2, and a prior TDWG standard) that Greg Whitbread sent out and called "full circle". It spells out how to handle plant names in databases. http://www.nhm.ac.uk/hosted_sites/tdwg/plants.html
Plant name structure has changed minimally since 1994 (ICBN revisions in 1999 and 2005), so this comprehensive document is still somewhat relevant, except for the various name "Levels", which have not been adopted by any of the more modern standards. The same topics have been passed around in detail of late. And we now have UTF-8, which makes available the recommended multiplication sign that ASCII lacks.
The various sections in this publication concerning hybrids say:
Intergeneric hybrids (and graft chimaeras)
The full name of an intergeneric hybrid has in addition an "x" (lower case alphabetic x symbol) preceding the generic name as a generic hybrid marker. Similarly the name of an intergeneric graft-chimaera is preceded by a "+" (plus symbol). The lower case x symbol is used instead of the multiplication sign, which is not available in the ASCII character set of most computers. Wherever possible this symbol should be converted back to a multiplication sign in typesetting or printing operations. To distinguish the marker from the following name, a space should separate them in data files.
× Cupressocyparis leylandii (A.B. Jacks. & Dallim.) Dallim.
× - intergeneric hybrid marker
Cupressocyparis - genus name
leylandii - species epithet
(A.B. Jacks. & Dallim.) Dallim. - author string
Interspecific hybrids (and graft chimaeras)
The full name of a named interspecific hybrid or chimaera has in addition an "x" (lower case alphabetic x) or "+" (plus sign) preceding the species epithet. As above, the alphabetic x substitutes for a multiplication sign.
Spartina × townsendii H.Groves & J.Groves
Spartina - genus name
× - interspecific hybrid marker
townsendii - species epithet
H.Groves & J.Groves - author string
The full name of an interspecific hybrid that has not been named, that is one given by hybrid formula, is composed of two parts, the genus name and the hybrid formula. The hybrid formula is given in place of the species epithet element. Again an alphabetic x substitutes for a multiplication sign.
Primula veris × vulgaris
Primula - genus name
veris × vulgaris - hybrid formula
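The three breakdowns above can be sketched as a small parser for the data-file form, where the marker is separated from the name by a space. This is an illustration only: the class and function names are hypothetical, and it assumes well-formed input without an author string.

```python
from dataclasses import dataclass
from typing import Optional

HYBRID_MARKERS = {"×", "x"}  # "x" is the ASCII surrogate for the multiplication sign
CHIMAERA_MARKER = "+"


@dataclass
class PlantName:
    genus: str
    intergeneric_marker: Optional[str] = None
    interspecific_marker: Optional[str] = None
    species_epithet: Optional[str] = None
    hybrid_formula: Optional[str] = None


def _is_marker(token: str) -> bool:
    return token in HYBRID_MARKERS or token == CHIMAERA_MARKER


def parse_data_file_name(text: str) -> PlantName:
    """Split a space-separated name (no author string) into its elements."""
    tokens = text.split()
    # A marker in front of the genus name marks an intergeneric hybrid or chimaera.
    intergeneric = tokens.pop(0) if tokens and _is_marker(tokens[0]) else None
    if not tokens:
        raise ValueError("no genus name found")
    name = PlantName(genus=tokens[0], intergeneric_marker=intergeneric)
    rest = tokens[1:]
    if len(rest) >= 2 and _is_marker(rest[0]):
        # Marker directly after the genus: named interspecific hybrid or chimaera.
        name.interspecific_marker, name.species_epithet = rest[0], rest[1]
    elif len(rest) >= 3 and rest[1] in HYBRID_MARKERS:
        # Epithet, marker, epithet: a hybrid formula replaces the species epithet.
        name.hybrid_formula = " ".join(rest[:3])
    elif rest:
        name.species_epithet = rest[0]
    return name
```

The position of the marker, not the spacing around it, decides between a named hybrid and a hybrid formula in this sketch.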
Name element 1: Intergeneric hybrid (or chimaera) marker
Content:
* An "×" or "+" placed before a hybrid or chimaera genus name.
Composed of:
* x (lower case alphabetic x) or + (addition sign).
Examples:
× Cupressocyparis leylandii
+ Crataegomespilus dardarii
Rules:
* Each full name of an intergeneric hybrid must include the × marker.
* Each full name of an intergeneric chimaera must include the + marker.
* The alphabetic x substitutes in computers for the multiplication sign specified by the International Code of Botanical Nomenclature. Whenever possible it should be replaced by a multiplication sign in printed output.
* In printout the x (or multiplication sign, ×, if available) or + is normally printed adjacent to the name with no intervening space. However, in data files they should be separated by a space to ensure that the marker is not confused with the first letter of the name.
Other Standards:
* In ITF and HISPID; unspecified in CHIN.
Name element 3: Interspecific hybrid (or chimaera) marker
Content:
* The × marker for named interspecific hybrids or the + marker for named interspecific chimaeras.
Composed of:
* "×" (lower case alphabetic x) or "+" (addition sign).
Example:
Spartina × townsendii
Rules:
* Each full name of a named interspecific hybrid must contain the × symbol. This is placed before the species epithet without an intervening space in printed output. However, it should be separated in data files by an intervening space to ensure that it is not confused with the first letter of the name.
* Each full name of a named interspecific chimaera must contain the + symbol placed before the species epithet.
Other Standards:
* ITF, HISPID; not specified in CHIN.
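The rules quoted above distinguish a data-file form (marker separated by a space, "x" allowed as a surrogate) from a printed form (multiplication sign adjacent to the name). A minimal sketch of the conversion, assuming space-separated input without an author string; the function name is hypothetical:

```python
HYBRID_MARKERS = {"x", "×"}  # "x" is the ASCII surrogate for the multiplication sign
CHIMAERA_MARKER = "+"


def to_print_form(data_file_name: str) -> str:
    """Convert a space-separated data-file name to its printed form.

    Markers before the genus name or directly after it are attached to the
    following name; a marker inside a hybrid formula keeps its spaces.
    """
    tokens = data_file_name.split()
    if not tokens:
        return ""

    def is_marker(token: str) -> bool:
        return token in HYBRID_MARKERS or token == CHIMAERA_MARKER

    def symbol(token: str) -> str:
        return "+" if token == CHIMAERA_MARKER else "×"

    # The genus is the second token when an intergeneric marker leads the name.
    genus_index = 1 if is_marker(tokens[0]) else 0
    attach_positions = {0, genus_index + 1}

    out = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if is_marker(token) and i in attach_positions and i + 1 < len(tokens):
            # Named hybrid or chimaera: print the marker adjacent to the name.
            out.append(symbol(token) + tokens[i + 1])
            i += 2
        elif is_marker(token):
            # Hybrid formula: the marker stays separate from the parent names.
            out.append(symbol(token))
            i += 1
        else:
            out.append(token)
            i += 1
    return " ".join(out)
```

For example, to_print_form("Spartina x townsendii") gives "Spartina ×townsendii", while the formula "Primula veris x vulgaris" keeps its spacing and only has the surrogate replaced.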
Actually I wasn't forgetting it, I was ignoring it and its cousins, because they are TDWG "Prior Standards" in the specific sense that they are "Standards that were ratified prior to 2005 and that are not currently being promoted for ratification under the post 2006 ratification process. These standards currently lack a 'champion' to bring them into line with the draft specification and submit them to the new standards development process adopted in St Louis in 2006." Are you offering to become its champion? :-)
On Wed, Dec 8, 2010 at 3:25 PM, Chuck Miller Chuck.Miller@mobot.org wrote:
Don't forget that 1994 publication (copyrighted by TDWG as ISBN 0-913196-62-2, and a prior TDWG standard) that Greg Whitbread sent out and called “full circle” It spells out how to handle Plant Names in databases. http://www.nhm.ac.uk/hosted_sites/tdwg/plants.html
Plant name structure has changed minimally since 1994 (ICBN revisions in 99 and 05) so this comprehensive document is still somewhat relevant except for all the various name “Levels” which have not been adopted by any of the more modern standards, but the same topics have been passed around in detail of late. And we now have UTF-8 which enables the multiplication sign recommended but unavailable with ASCII only.
The various sections in this publication concerning hybrids say:
*Intergeneric hybrids (and graft chimaeras)*
The full name of an intergeneric hybrid has in addition an "x" (lower case alphabetic x symbol) preceding the generic name as a generic hybrid marker. Similarly the name of an intergeneric graft-chimaera is preceded by a "+" (plus symbol). The lower case x symbol is used instead of the multiplication sign, which is not available in the ASCII character set of most computers. Wherever possible this symbol should be converted back to a multiplication sign in typesetting or printing operations. To distinguish the marker from the following name, a space should separate them in data files.
´ *Cupressocyparis leylandii* (A.B. Jacks. & Dallim.) Dallim.
´
intergeneric hybrid marker
*Cupressocyparis*
genus name
*leylandii*
species epithet
(A.B. Jacks. & Dallim.) Dallim.
author string
*Interspecific hybrids (and graft chimaeras)*
The full name of a named interspecific hybrid or chimaera has in addition an "x" (lower case alphabetic x) or "+" (plus sign) preceding the species epithet. As above, the alphabetic x substitutes for a multiplication sign.
*Spartina* ´ *townsendii* H.Groves & J.Groves
*Spartina*
genus name
´
interspecific hybrid marker
*townsendii*
species epithet
H.Groves & J.Groves
author string
The full name of an interspecific hybrid that has not been named, that is one given by hybrid formula, is composed of two parts, the genus name and the hybrid formula. The hybrid formula is given in place of the species epithet element. Again an alphabetic x substitutes for a multiplication sign.
*Primula veris* ´ *vulgaris*
*Primula*
genus name
*veris* ´ *vulgaris*
hybrid formula
*Name element 1: Intergeneric hybrid (or chimaera) marker*
Content:
• An "´ " or "+" placed before a hybrid or chimaera genus name.
Composed of:
• x (lower case alphabetic x) or + (addition sign).
Examples:
´ *Cupressocyparis leylandii
- Crataegomespilus dardarii*
Rules:
• Each full name of an intergeneric hybrid must include the ´ marker. • Each full name of an intergeneric chimaera must include the + marker. • The alphabetic x substitutes in computers for the multiplication sign specified by the International Code of Botanical Nomenclature. Whenever possible it should be replaced by a multiplication sign in printed output. • In printout the x (or multiplication sign, ´ , if available) or + is normally printed adjacent to the name with no intervening space. However, in data files they should be separated by a space to ensure that the marker is not confused with the first letter of the name.
Other Standards:
• In ITF and HISPID; unspecified in CHIN.
*Name element 3: Interspecific hybrid (or chimaera) marker*
Content:
• The ´ marker for named interspecific hybrids or the + marker for named interspecific chimaeras.
Composed of:
• "´ " (lower case alphabetic x) or "+" (addition sign).
Example:
*Spartina* ´ *townsendii*
Rules:
• Each full name of a named interspecific hybrid must contain the ´symbol. This is placed before the species epithet without an intervening space in printed output. However, it should be separated in data files by an intervening space to ensure that it is not confused with the first letter of the name. • Each full name of a named interspecific chimaera must contain the + symbol placed before the species epithet.
Other Standards:
• ITF, HISPID; not specified in CHIN.
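The data-file and printed-output rules quoted above could be sketched roughly as follows in Python. This is only an illustration, not part of the 1994 standard: the function name `to_print_form` is hypothetical, and the heuristic assumes that a marker in the first or second position belongs to a named hybrid or chimaera, while a marker further along is part of an interspecific hybrid formula within a single genus (as in Primula veris × vulgaris) and stays separated from the parent names.

```python
MULT = "\u00d7"  # multiplication sign

# Data-file markers and the symbol to print for each (assumed token forms).
MARKERS = {"x": MULT, MULT: MULT, "+": "+"}


def to_print_form(data_name: str) -> str:
    """Convert a space-separated data-file name to a printed form.

    Markers of named hybrids/chimaeras are attached to the following
    name; a marker inside a hybrid formula keeps its surrounding spaces.
    """
    tokens = data_name.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in MARKERS and i + 1 < len(tokens):
            if i <= 1:
                # Assumption: position 0 = intergeneric marker,
                # position 1 = interspecific marker of a named hybrid.
                out.append(MARKERS[tok] + tokens[i + 1])
                i += 2
                continue
            # Assumption: any later marker belongs to a hybrid formula.
            out.append(MARKERS[tok])
        else:
            out.append(tok)
        i += 1
    return " ".join(out)


print(to_print_form("x Cupressocyparis leylandii"))
print(to_print_form("Spartina x townsendii"))
print(to_print_form("Primula veris x vulgaris"))
```

An intergeneric formula would be misread by this heuristic, so real data would need an explicit flag distinguishing formulas from named hybrids.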
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto: tdwg-content-bounces@lists.tdwg.org] On Behalf Of Bob Morris Sent: Wednesday, December 08, 2010 1:13 PM To: Markus Döring (GBIF) Cc: tdwg-content@lists.tdwg.org List Subject: Re: [tdwg-content] canonical name for named hybrid & infragenericnames
Your placement of the multiplication sign × does not seem code compliant. It looks too close. Maybe. Also there might be a question about whether a TDWG requirement to use the multiplication sign can be easily implemented by all providers.
On these subjects, the Appendix on Hybrid Names of the ICBN seems contradictory, in that H.3A.1 (http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm, quoted below) seems to allow your placement, but Note 1 there seems to require space. Note 1 would, with H.3A.1, imply that there must be more white space to the left than right of the multiplication sign or its surrogate. One spacing that seems to violate all interpretations of H.3A.1 is equal white space around the multiplication sign. My guess is that the overwhelming fraction of printed hybrid names are thereby noncompliant (unless something elsewhere resolves the issue).
Making the amount of white space significant in a parsed string is a horrifying thought.
--Bob Morris
"Recommendation H.3A
H.3A.1. The multiplication sign ×, indicating the hybrid nature of a taxon, should be placed so as to express that it belongs with the name or epithet but is not actually part of it. The exact amount of space, if any, between the multiplication sign and the initial letter of the name or epithet should depend on what best serves readability.
Note 1. The multiplication sign × in a hybrid formula is always placed between, and separate from, the names of the parents.
H.3A.2. If the multiplication sign is not available it should be approximated by a lower case letter “x” (not italicized)."
http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm
======================
On Wed, Dec 8, 2010 at 1:14 PM, "Markus Döring (GBIF)"
mdoering@gbif.org wrote:
Talking about canonical names again, I want to use the opportunity to get rid of another question I have.
What is the code compliant canonical version of named hybrids (not
formulas) and infrageneric names?
Are these examples correct?
Botanical section:
verbatim: Maxillaria sect. Multiflorae Christenson
canonical: Maxillaria sect. Multiflorae
Botanical subgenus:
verbatim: Anthemis subgen. Maruta (Cass.) Tzvelev
canonical: Anthemis subgen. Maruta
Botanical series:
verbatim: Artemisia ser. Codonocephalae (Pamp.) Y.R.Ling
canonical: Artemisia ser. Codonocephalae
Zoological subgenus:
verbatim: Murex (Promurex) Ponder & Vokes, 1988
canonical: Murex subgen. Promurex
# if we use parentheses to indicate the subgenus we can only guess if
it's an author or subgenus name
Zoological species
verbatim: Leptochilus (Neoleptochilus) beaumonti Giordani Soika 1953
canonical: Leptochilus beaumonti
Botanical named genus hybrid:
verbatim: ×Agropogon littoralis (Sm.) C. E. Hubb.
canonical: ×Agropogon littoralis
Botanical named infrageneric hybrid:
verbatim: Eryngium nothosect. Alpestria Burdet & Miège
canonical: Eryngium nothosect. Alpestria
Botanical named species hybrid:
verbatim: Salix ×capreola Andersson (1867)
canonical: Salix ×capreola
Botanical variety, named species hybrid:
verbatim: Populus ×canadensis var. serotina (R. Hartig) Rehder
canonical: Populus ×canadensis var. serotina
Botanical named infraspecific hybrid:
verbatim: Polypodium vulgare nothosubsp. mantoniae(Rothm.) Schidlay
canonical: Polypodium vulgare nothosubsp. mantoniae
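For the botanical examples above, where the authorship is held separately, the canonical form can be approximated by removing the authorship from the end of the verbatim string. The Python sketch below is only an assumption-laden illustration: `canonical_from_parts` is a hypothetical helper, it needs the authorship to be supplied exactly as it appears in the verbatim name, and it does nothing about a zoological subgenus in parentheses.

```python
def canonical_from_parts(verbatim: str, authorship: str) -> str:
    """Strip a known authorship suffix from a verbatim name.

    Whitespace is collapsed first so that spacing differences between
    the two fields do not prevent a match. If the authorship is not a
    suffix of the verbatim name, the verbatim name is returned as is.
    """
    name = " ".join(verbatim.split())
    author = " ".join(authorship.split())
    if author and name.endswith(author):
        return name[: -len(author)].rstrip()
    return name


print(canonical_from_parts("Maxillaria sect. Multiflorae Christenson",
                           "Christenson"))
print(canonical_from_parts("Populus \u00d7canadensis var. serotina (R. Hartig) Rehder",
                           "(R. Hartig) Rehder"))
```

For "Leptochilus (Neoleptochilus) beaumonti Giordani Soika 1953" this would leave the subgenus in place, so the zoological species example still needs separate handling.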
On Dec 8, 2010, at 17:09, David Remsen (GBIF) wrote:
Markus and I wanted to consolidate the issues related to the current use and definition of scientificName that have been the focus of last week's discussion in as simple a way as we can, and leave it with a simple proposal which we will add to the issue tracking on the project site.
1. We propose that a new term, dwc:verbatimScientificName, carry the existing definition for dwc:scientificName, and
2. that dwc:scientificName follow the more accepted convention that is better represented by the earlier proposed definition for Canonical Name.
The intention is to enable data publishers to distinguish unparsed,
complex scientific names from more cleanly separated scientific name data. This will relieve consumers of these data from testing each instance of a name for one of these two conditions.
Here are the definitions for the two existing terms that have been part
of the discussion:
dwc:scientificName - The full scientific name, with authorship and date
information if known. When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term.
dwc:scientificNameAuthorship - The authorship information for the
scientificName formatted according to the conventions of the applicable nomenclaturalCode.
Here are terms and definitions used in the following 5 source data
configurations we came up with. They don't have to be exact for this purpose.
canonical name - The nomenclatural components of a scientific name without authorship information.
authorship - the authorship information that follows a scientific name
verbatim name - the verbatim text stored in a source database when it differs from or combines the two definitions above. This is a bit more broad than the def for scientificName.
We identified the following configurations in a source database and
how they would be mapped to the existing terms. In cases 4 and 5 we
also propose how we would map these were there a 3rd available term
(called 'mapping b:')
When a source database contains:
1. canonical names only
Mapping: canonical name -> dwc:scientificName
2. canonical name and authorship in two fields
Mapping: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
3. verbatim name only
Mapping: verbatim name -> dwc:scientificName
4. all three: canonical name, authorship, and verbatim name in 3 diff. columns
Mapping a: verbatim name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
Mapping b: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship / verbatim name -> dwc:verbatimScientificName
5. a mix of canonical and verbatim names in a single column
Mapping a: verbatim name + canonical names -> dwc:scientificName
Mapping b: verbatim name + canonical names -> dwc:verbatimScientificName
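The five configurations and their mappings can be written down as a small lookup table. The Python sketch below is only an illustration of the list above: the source column names ("canonical", "authorship", "verbatim", "mixed"), the case keys and the function `apply_mapping` are hypothetical, and dwc:verbatimScientificName is the proposed term, not an existing one.

```python
SN = "dwc:scientificName"
AUTH = "dwc:scientificNameAuthorship"
VERB = "dwc:verbatimScientificName"  # proposed term

# Source column -> Darwin Core term, per configuration.
# "4a"/"5a" use only the two existing terms; "4b"/"5b" use the third term.
MAPPINGS = {
    "1": {"canonical": SN},
    "2": {"canonical": SN, "authorship": AUTH},
    "3": {"verbatim": SN},
    "4a": {"verbatim": SN, "authorship": AUTH},
    "4b": {"canonical": SN, "authorship": AUTH, "verbatim": VERB},
    "5a": {"mixed": SN},
    "5b": {"mixed": VERB},
}


def apply_mapping(case: str, source_row: dict) -> dict:
    """Return a Darwin Core record for one source row under one mapping."""
    return {term: source_row[column] for column, term in MAPPINGS[case].items()}


row = {
    "canonical": "Populus \u00d7canadensis var. serotina",
    "authorship": "(R. Hartig) Rehder",
    "verbatim": "Populus \u00d7canadensis var. serotina (R. Hartig) Rehder",
}
print(apply_mapping("4a", row))
print(apply_mapping("4b", row))
```

Under mapping a the consumer cannot tell from the record whether dwc:scientificName holds a canonical or a verbatim string; under mapping b the term itself says which it is.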
Summary - with the current two terms we are left with no choice but to support both canonical and verbatim names in a single term, which makes consuming these data difficult.
We propose that a new term, dwc:verbatimScientificName, carry the existing definition for dwc:scientificName and that dwc:scientificName follow the more accepted convention that is better represented by the definition for Canonical Name.
Best,
David Remsen / Markus Döring
tdwg-content mailing list
tdwg-content@lists.tdwg.org
--
Robert A. Morris
Emeritus Professor of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: morris.bob@gmail.com
web: http://etaxonomy.org/mw/FilteredPush
phone (+1) 857 222 7992 (mobile)
Bob,
Maybe we should reconvene the folks who authored the '94 publication. Frank is still active, as are many others.
But, champion or not, truth is truth wherever you find it. The ICBN isn't a TDWG standard either. A lot of work by those botanical taxonomists went into the preparation of that detailed 1994 document. It seems a shame to rediscover facts that haven't changed. Exchange formats and representations have changed, but the fundamental structures of plant names haven't. Same can be said of zoological, mycological, microbial, etc. names I'm sure but no group has ever published a document like this for them as far as I know.
Chuck
From: Bob Morris [mailto:morris.bob@gmail.com] Sent: Wednesday, December 08, 2010 3:05 PM To: Chuck Miller Cc: Markus Döring (GBIF); tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] canonical name for named hybrid & infragenericnames
Actually I wasn't forgetting it, I was ignoring it and its cousins, because they are TDWG "Prior Standards" in the specific sense that they are
"Standards that were ratified prior to 2005 and that are not currently being promoted for ratification under the post 2006 ratification process. These standards currently lack a 'champion' to bring them into line with the draft specification and submit them to the new standards development process adopted in St Louis in 2006."
Are you offering to become its champion? :-)
On Wed, Dec 8, 2010 at 3:25 PM, Chuck Miller Chuck.Miller@mobot.org wrote:
Don't forget that 1994 publication (copyrighted by TDWG as ISBN 0-913196-62-2, and a prior TDWG standard) that Greg Whitbread sent out and called "full circle". It spells out how to handle plant names in databases. http://www.nhm.ac.uk/hosted_sites/tdwg/plants.html
Plant name structure has changed minimally since 1994 (ICBN revisions in 99 and 05) so this comprehensive document is still somewhat relevant except for all the various name "Levels" which have not been adopted by any of the more modern standards, but the same topics have been passed around in detail of late. And we now have UTF-8 which enables the multiplication sign recommended but unavailable with ASCII only.
The various sections in this publication concerning hybrids say:
Intergeneric hybrids (and graft chimaeras)
The full name of an intergeneric hybrid has in addition an "x" (lower case alphabetic x symbol) preceding the generic name as a generic hybrid marker. Similarly the name of an intergeneric graft-chimaera is preceded by a "+" (plus symbol). The lower case x symbol is used instead of the multiplication sign, which is not available in the ASCII character set of most computers. Wherever possible this symbol should be converted back to a multiplication sign in typesetting or printing operations. To distinguish the marker from the following name, a space should separate them in data files.
× Cupressocyparis leylandii (A.B. Jacks. & Dallim.) Dallim.
×
intergeneric hybrid marker
Cupressocyparis
genus name
leylandii
species epithet
(A.B. Jacks. & Dallim.) Dallim.
author string
Interspecific hybrids (and graft chimaeras)
The full name of a named interspecific hybrid or chimaera has in addition an "x" (lower case alphabetic x) or "+" (plus sign) preceding the species epithet. As above, the alphabetic x substitutes for a multiplication sign.
Spartina × townsendii H.Groves & J.Groves
Spartina
genus name
×
interspecific hybrid marker
townsendii
species epithet
H.Groves & J.Groves
author string
The full name of an interspecific hybrid that has not been named, that is one given by hybrid formula, is composed of two parts, the genus name and the hybrid formula. The hybrid formula is given in place of the species epithet element. Again an alphabetic x substitutes for a multiplication sign.
Primula veris × vulgaris
Primula
genus name
veris × vulgaris
hybrid formula
Name element 1: Intergeneric hybrid (or chimaera) marker
Content:
* An "×" or "+" placed before a hybrid or chimaera genus name.
Composed of:
* x (lower case alphabetic x) or + (addition sign).
Examples:
× Cupressocyparis leylandii
+ Crataegomespilus dardarii
Rules:
* Each full name of an intergeneric hybrid must include the × marker.
* Each full name of an intergeneric chimaera must include the + marker.
* The alphabetic x substitutes in computers for the multiplication sign specified by the International Code of Botanical Nomenclature. Whenever possible it should be replaced by a multiplication sign in printed output.
* In printout the x (or multiplication sign, ×, if available) or + is normally printed adjacent to the name with no intervening space. However, in data files they should be separated by a space to ensure that the marker is not confused with the first letter of the name.
Other Standards:
* In ITF and HISPID; unspecified in CHIN.
Name element 3: Interspecific hybrid (or chimaera) marker
Content:
* The × marker for named interspecific hybrids or the + marker for named interspecific chimaeras.
Composed of:
* "×" (lower case alphabetic x) or "+" (addition sign).
Example:
Spartina × townsendii
Rules:
* Each full name of a named interspecific hybrid must contain the × symbol. This is placed before the species epithet without an intervening space in printed output. However, it should be separated in data files by an intervening space to ensure that it is not confused with the first letter of the name.
* Each full name of a named interspecific chimaera must contain the + symbol placed before the species epithet.
Other Standards:
* ITF, HISPID; not specified in CHIN.
Well, as pointed out earlier, this 'standard' is now fifteen to twenty years out of date, although strictly speaking, it never was quite accurate, ever.
Paul van Rijckevorsel
-----Oorspronkelijk bericht----- Van: tdwg-content-bounces@lists.tdwg.org namens Bob Morris Verzonden: wo 8-12-2010 22:05 Aan: Chuck Miller CC: tdwg-content@lists.tdwg.org; Markus Döring (GBIF) Onderwerp: Re: [tdwg-content] canonical name for named hybrid &infragenericnames
Actually I wasn't forgetting it, I was ignoring it and its cousins, because they are TDWG "Prior Standards" in the specific sense that they are "Standards that were ratified prior to 2005 and that are not currently being promoted for ratification under the post 2006 ratification process. These standards currently lack a 'champion' to bring them into line with the draft specification and submit them to the new standards development process adopted in St Louis in 2006." Are you offering to become its champion? :-)
On Wed, Dec 8, 2010 at 3:25 PM, Chuck Miller Chuck.Miller@mobot.org wrote:
Don't forget that 1994 publication (copyrighted by TDWG as ISBN 0-913196-62-2, and a prior TDWG standard) that Greg Whitbread sent out and called "full circle" It spells out how to handle Plant Names in databases. http://www.nhm.ac.uk/hosted_sites/tdwg/plants.html
Plant name structure has changed minimally since 1994 (ICBN revisions in 99 and 05) so this comprehensive document is still somewhat relevant except for all the various name "Levels" which have not been adopted by any of the more modern standards, but the same topics have been passed around in detail of late. And we now have UTF-8 which enables the multiplication sign recommended but unavailable with ASCII only.
The various sections in this publication concerning hybrids say:
*Intergeneric hybrids (and graft chimaeras)*
The full name of an intergeneric hybrid has in addition an "x" (lower case alphabetic x symbol) preceding the generic name as a generic hybrid marker. Similarly the name of an intergeneric graft-chimaera is preceded by a "+" (plus symbol). The lower case x symbol is used instead of the multiplication sign, which is not available in the ASCII character set of most computers. Wherever possible this symbol should be converted back to a multiplication sign in typesetting or printing operations. To distinguish the marker from the following name, a space should separate them in data files.
´ *Cupressocyparis leylandii* (A.B. Jacks. & Dallim.) Dallim.
´
intergeneric hybrid marker
*Cupressocyparis*
genus name
*leylandii*
species epithet
(A.B. Jacks. & Dallim.) Dallim.
author string
*Interspecific hybrids (and graft chimaeras)*
The full name of a named interspecific hybrid or chimaera has in addition an "x" (lower case alphabetic x) or "+" (plus sign) preceding the species epithet. As above, the alphabetic x substitutes for a multiplication sign.
*Spartina* ´ *townsendii* H.Groves & J.Groves
*Spartina*
genus name
´
interspecific hybrid marker
*townsendii*
species epithet
H.Groves & J.Groves
author string
The full name of an interspecific hybrid that has not been named, that is one given by hybrid formula, is composed of two parts, the genus name and the hybrid formula. The hybrid formula is given in place of the species epithet element. Again an alphabetic x substitutes for a multiplication sign.
*Primula veris* ´ *vulgaris*
*Primula*
genus name
*veris* ´ *vulgaris*
hybrid formula
*Name element 1: Intergeneric hybrid (or chimaera) marker*
Content:
. An "´ " or "+" placed before a hybrid or chimaera genus name.
Composed of:
. x (lower case alphabetic x) or + (addition sign).
Examples:
´ *Cupressocyparis leylandii
- Crataegomespilus dardarii*
Rules:
. Each full name of an intergeneric hybrid must include the ´ marker. . Each full name of an intergeneric chimaera must include the + marker. . The alphabetic x substitutes in computers for the multiplication sign specified by the International Code of Botanical Nomenclature. Whenever possible it should be replaced by a multiplication sign in printed output. . In printout the x (or multiplication sign, ´ , if available) or + is normally printed adjacent to the name with no intervening space. However, in data files they should be separated by a space to ensure that the marker is not confused with the first letter of the name.
Other Standards:
. In ITF and HISPID; unspecified in CHIN.
*Name element 3: Interspecific hybrid (or chimaera) marker*
Content:
. The ´ marker for named interspecific hybrids or the + marker for named interspecific chimaeras.
Composed of:
. "´ " (lower case alphabetic x) or "+" (addition sign).
Example:
*Spartina* ´ *townsendii*
Rules:
. Each full name of a named interspecific hybrid must contain the ´symbol. This is placed before the species epithet without an intervening space in printed output. However, it should be separated in data files by an intervening space to ensure that it is not confused with the first letter of the name. . Each full name of a named interspecific chimaera must contain the + symbol placed before the species epithet.
Other Standards:
. ITF, HISPID; not specified in CHIN.
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto: tdwg-content-bounces@lists.tdwg.org] On Behalf Of Bob Morris Sent: Wednesday, December 08, 2010 1:13 PM To: Markus Döring (GBIF) Cc: tdwg-content@lists.tdwg.org List Subject: Re: [tdwg-content] canonical name for named hybrid & infragenericnames
Your placement of the multiplication sign × does not seem code compliant. It looks too close. Maybe. Also there might be a question about whether a TDWG requirement to use the multiplication sign can be easily implemented by all providers.
On these subjects The Appendix on Hybrid Names of ICBN seems contradictory in that H.3A.1 (http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm, quoted
below) seems to allow your placement, but Note 1. there seems to require space. Note 1. would, with H.3A.1 imply that there must be more white space to the left than right of the multiplication sign or its surrogate. One spacing that seems to violate all interpretations of A.3A.1 is equal white space around the multiplication sign. My guess is that the overwhelming fraction of printed hybrid names are thereby noncompliant unless something elsewhere resolves the issue).
Making the amount of white space significant in a parsed string is a horrifying thought.
--Bob Morris
"Recommendation H.3A
H.3A.1. The multiplication sign ×, indicating the hybrid nature of a taxon, should be placed so as to express that it belongs with the name or epithet but is not actually part of it. The exact amount of space, if any, between the multiplication sign and the initial letter of the name or epithet should depend on what best serves readability.
Note 1. The multiplication sign × in a hybrid formula is always placed between, and separate from, the names of the parents.
H.3A.2. If the multiplication sign is not available it should be approximated by a lower case letter "x" (not italicized)."
http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm
======================
On Wed, Dec 8, 2010 at 1:14 PM, "Markus Döring (GBIF)"
mdoering@gbif.org wrote:
talking about canonical names again I want to use the oppertunity and get
rid of another question I have.
What is the code compliant canonical version of named hybrids (not
formulas) and infrageneric names?
Are these examples correct?
Botanical section:
verbatim: Maxillaria sect. Multiflorae Christenson
canonical: Maxillaria sect. Multiflorae
Botanical subgenus:
verbatim: Anthemis subgen. Maruta (Cass.) Tzvelev
canonical: Anthemis subgen. Maruta
Botanical series:
verbatim: Artemisia ser. Codonocephalae (Pamp.) Y.R.Ling
canonical: Artemisia ser. Codonocephalae
Zoological subgenus:
verbatim: Murex (Promurex) Ponder & Vokes, 1988
canonical: Murex subgen. Promurex
# if we use parenthesis to indicate the subgenus we can only guess if
its an author or subgenus name
Zoological species
verbatim: Leptochilus (Neoleptochilus) beaumonti Giordani Soika 1953
canonical: Leptochilus beaumonti
Botanical named genus hybrid:
verbatim: ×Agropogon littoralis (Sm.) C. E. Hubb.
canonical: ×Agropogon littoralis
Botanical named infrageneric hybrid:
verbatim: Eryngium nothosect. Alpestria Burdet & Miège
canonical: Eryngium nothosect. Alpestria
Botanical named species hybrid:
verbatim: Salix ×capreola Andersson (1867)
canonical: Salix ×capreola Andersson (1867)
Botanical variety, named species hybrid:
verbatim: Populus ×canadensis var. serotina (R. Hartig) Rehder
canonical: Populus ×canadensis var. serotina
Botanical named infraspecific hybrid:
verbatim: Polypodium vulgare nothosubsp. mantoniae(Rothm.) Schidlay
canonical: Polypodium vulgare nothosubsp. mantoniae
On Dec 8, 2010, at 17:09, David Remsen (GBIF) wrote:
Markus and I wanted to try to consolidate the issues related to the
current use and definition of scientificName that have been the focus of last weeks discussion in as simple a way as we can and leave it with a simple proposal which we will add to the issue tracking on the project site.
- We propose that a new term, dwc:verbatimScientificName carry the
existing definition for dwc:scientificName and 2. dwc:scientificName
follow the more accepted convention that is better represented by the
earlier proposed definition for Canonical Name
The intention is to enable data publishers to distinguish unparsed,
complex scientific names from more cleanly separated scientific name data. This will relieve consumers of these data from testing each instance of a name for one of these two conditions.
Here are the definitions for the two existing terms that have been part
of the discussion:
dwc:scientificName - The full scientific name, with authorship and date
information if known. When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term.
dwc:scientificNameAuthorship - The authorship information for the
scientificName formatted according to the conventions of the applicable nomenclaturalCode.
Here are terms and definitions used in the following 5 source data
configurations we came up with. They don't have to be exact for this purpose.
canonical name - The nomenclatural components of a scentific name
without authorship information.
authorship - the authorship information that follows a scientific
name verbatim name - the verbatim text stored in a source database when
it differs from or combines the two definitions above. This is a bit more broad than the def for scientificName.
We identified the following configurations in a source database and
how they would be mapped to the existing terms. In cases 4 and 5 we
also propose how we would map these were there a 3rd available term
(called 'mapping b:')
When a source database contains:
1. canonical names only
Mapping: canonical name -> dwc:scientificName
2. canonical name and authorship in two fields
Mapping: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
3. verbatim name only
Mapping: verbatim name -> dwc:scientificName
4. all three: canonical name, authorship, and verbatim name in 3 different columns
Mapping a: verbatim name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
Mapping b: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship / verbatim name -> dwc:verbatimScientificName
5. a mix of canonical and verbatim names in a single column
Mapping a: verbatim name + canonical names -> dwc:scientificName
Mapping b: verbatim name + canonical names -> dwc:verbatimScientificName
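As a rough illustration of the five configurations above, the mapping might be sketched as follows, using "mapping b" where one is given. The function name `map_name_fields` and the source keys `canonical`, `authorship`, `verbatim` and `mixed` are hypothetical; only the dwc: term names come from the proposal.

```python
from typing import Dict, Optional


def map_name_fields(source: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map hypothetical source columns to Darwin Core terms (sketch only)."""
    canonical = source.get("canonical")
    authorship = source.get("authorship")
    verbatim = source.get("verbatim")
    mixed = source.get("mixed")  # one column holding both kinds of name

    record: Dict[str, str] = {}
    if mixed:
        # Case 5, mapping b: the two kinds cannot be told apart.
        record["dwc:verbatimScientificName"] = mixed
        return record
    if canonical:
        # Cases 1, 2 and 4 (mapping b): the canonical name is the scientificName.
        record["dwc:scientificName"] = canonical
        if verbatim:
            record["dwc:verbatimScientificName"] = verbatim
    elif verbatim:
        # Case 3: the message lists only this mapping for verbatim-only sources.
        record["dwc:scientificName"] = verbatim
    if authorship:
        record["dwc:scientificNameAuthorship"] = authorship
    return record
```

A consumer would then only need to check which terms are present, rather than inspect each name string.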
Summary - with the current two terms we are left with no choice but to support both canonical and verbatim names in a single term, which makes consuming these data difficult.
We propose that a new term, dwc:verbatimScientificName carry the
existing definition for dwc:scientificName and that
dwc:scientificName follow the more accepted convention that is better
represented by the definition for Canonical Name
Best,
David Remsen / Markus Döring
tdwg-content mailing list
tdwg-content@lists.tdwg.org
--
Robert A. Morris
Emeritus Professor of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: morris.bob@gmail.com
web: http://etaxonomy.org/mw/FilteredPush
phone (+1) 857 222 7992 (mobile)
Having personally written Rec. H.3A.1, I do not see that it offers scope for being misread: the placement of the multiplication sign is a matter of style (and insight). As background information, the ICBN-preferred style is to put it directly in front of the name or epithet (no space whatsoever: ×Agropogon littoralis): just keep it nice together, so as to give computers no chance to mess it up (after all, at a line break, a computer is likely to separate these over more than one line).
Rec. H.3A Note 1 has been put in there (redundantly) for those who are careless readers, just to make sure the matter could not possibly be misunderstood by even the most whimsical. So, in a formula, the parents are separated by: space, multiplication sign, space; Agrostis stolonifera × Polypogon monspeliensis.
Paul van Rijckevorsel
* * * -----Original message----- From: tdwg-content-bounces@lists.tdwg.org on behalf of Bob Morris Sent: Wed 8-12-2010 20:12 To: Markus Döring (GBIF) CC: tdwg-content@lists.tdwg.org List Subject: Re: [tdwg-content] canonical name for named hybrid & infragenericnames
Your placement of the multiplication sign × does not seem code compliant. It looks too close. Maybe. Also there might be a question about whether a TDWG requirement to use the multiplication sign can be easily implemented by all providers.
On these subjects the Appendix on Hybrid Names of ICBN seems contradictory in that H.3A.1 (http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm, quoted below) seems to allow your placement, but Note 1 there seems to require space. Note 1 would, with H.3A.1, imply that there must be more white space to the left than right of the multiplication sign or its surrogate. One spacing that seems to violate all interpretations of H.3A.1 is equal white space around the multiplication sign. My guess is that the overwhelming fraction of printed hybrid names are thereby noncompliant (unless something elsewhere resolves the issue). Making the amount of white space significant in a parsed string is a horrifying thought.
--Bob Morris
"Recommendation H.3A
H.3A.1. The multiplication sign ×, indicating the hybrid nature of a taxon, should be placed so as to express that it belongs with the name or epithet but is not actually part of it. The exact amount of space, if any, between the multiplication sign and the initial letter of the name or epithet should depend on what best serves readability.
Note 1. The multiplication sign × in a hybrid formula is always placed between, and separate from, the names of the parents.
H.3A.2. If the multiplication sign is not available it should be approximated by a lower case letter "x" (not italicized)."
http://ibot.sav.sk/icbn/frameset/0071AppendixINoHa003.htm
======================
On Wed, Dec 8, 2010 at 1:14 PM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
Talking about canonical names again, I want to use the opportunity to get rid of another question I have. What is the code-compliant canonical version of named hybrids (not formulas) and infrageneric names?
Are these examples correct?
Botanical section:
verbatim: Maxillaria sect. Multiflorae Christenson
canonical: Maxillaria sect. Multiflorae
Botanical subgenus:
verbatim: Anthemis subgen. Maruta (Cass.) Tzvelev
canonical: Anthemis subgen. Maruta
Botanical series:
verbatim: Artemisia ser. Codonocephalae (Pamp.) Y.R.Ling
canonical: Artemisia ser. Codonocephalae
Zoological subgenus:
verbatim: Murex (Promurex) Ponder & Vokes, 1988
canonical: Murex subgen. Promurex
# if we use parentheses to indicate the subgenus we can only guess if it's an author or subgenus name
Zoological species:
verbatim: Leptochilus (Neoleptochilus) beaumonti Giordani Soika 1953
canonical: Leptochilus beaumonti
Botanical named genus hybrid:
verbatim: ×Agropogon littoralis (Sm.) C. E. Hubb.
canonical: ×Agropogon littoralis
Botanical named infrageneric hybrid:
verbatim: Eryngium nothosect. Alpestria Burdet & Miège
canonical: Eryngium nothosect. Alpestria
Botanical named species hybrid:
verbatim: Salix ×capreola Andersson (1867)
canonical: Salix ×capreola Andersson (1867)
Botanical variety, named species hybrid:
verbatim: Populus ×canadensis var. serotina (R. Hartig) Rehder
canonical: Populus ×canadensis var. serotina
Botanical named infraspecific hybrid:
verbatim: Polypodium vulgare nothosubsp. mantoniae (Rothm.) Schidlay
canonical: Polypodium vulgare nothosubsp. mantoniae
Thanks. To me what is interesting about this thread is that documents whose main(?) audience is authors and publishers, do not always address the needs of parser writers. It is a rare and happy circumstance for a programmer to have the document author to consult!
What I \think/ is implied by your answer is (something that requires biological knowledge that I don't have, namely) that there are hybrid names which are not necessarily a cross of two things, but rather only one is mentioned. The distinction then is that "formula" means at least two, but there are uses which do not appear in a formula, right? So a natural language name extractor should follow this rule: - If the × adjoins text, the token to the left of any predecessor white space is not part of a taxon name, but otherwise it is. Example: In the fragment "not unlike ×Agropogon littoralis" the token 'unlike' is not part of a name.
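The rule stated above can be sketched in a few lines. This is only an illustration of the rule as worded, not a tested extractor; the function name `left_token_in_name` is hypothetical.

```python
HYBRID_SIGN = "\u00d7"  # the multiplication sign ×


def left_token_in_name(text: str) -> bool:
    """Apply the rule to the first × found in text.

    Returns False when × adjoins the text that follows it, so the token to
    its left is treated as outside the name. Returns True when × is followed
    by white space, so the token to its left is treated as part of the name.
    """
    pos = text.find(HYBRID_SIGN)
    if pos == -1:
        raise ValueError("no multiplication sign in text")
    following = text[pos + 1 : pos + 2]
    adjoins_text = following != "" and not following.isspace()
    return not adjoins_text
```

Note that the rule, taken literally, would also exclude the genus in a name such as Salix ×capreola, so it appears to fit only cases where × precedes the first word of the name, as in the example given.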
Believe it or not, I am not complaining about ICBN. No programmer interpreting a document not written for programmers should complain if understanding it assumes knowledge and insight of the intended audience. Nor should they complain if they are raising points that are addressed in other parts of the document that they haven't read--which in this case for me is everything but H.3A.
Robust context sensitive parsers are marginally more complicated to write than those that require no lookahead, but this is surely not the only name parsing issue that requires lookahead, so I can't even complain on that score. In a vaguely related setting, parser writers might see the rather nicely set forth http://stackoverflow.com/questions/1952931/how-to-rewrite-this-nondeterminis...
Bob Morris p.s. Hey, I thought of something to complain about, albeit not about ICBN: I sure wish spec writers targeting software would banish "should" from their documents in favor of "must", even if multiple choices are accompanied by "... is preferred". Well, maybe it's a little complaint about the nomenclatural codes, because movement towards born-digital, semantically marked-up systematics literature will bump into it when people try to write semantically enhanced applications. It would be far better if publishers followed a set of rules with no "should" in them, for which compliance could be tested before publication.
From: Bob Morris [mailto:morris.bob@gmail.com] Sent: Thu 9-12-2010 14:52
Thanks. To me what is interesting about this thread is that documents whose main(?) audience is authors and publishers, do not always address the needs of parser writers.
*** That depends on how you look at it. The ICBN is mostly written so that nobody who just browses through will make sense of it. It requires any user to read it in some depth, if he is to apply it. Perhaps the parser writer should realize that he is no exception?
But, actually, parsers are not going to be the answer to any question in biodiversity informatics. This is impossible, as the natural laws in a nomenclatural universe are subject to change (almost without notice). What was true ten years ago is not necessarily true now: it may have been retroactively changed. Anybody doing anything in biodiversity informatics should have at least some basic awareness of the natural laws that govern nomenclatural universes. * * *
It is a rare and happy circumstance for a programmer to have the document author to consult!
*** Not the document, just the recommendation (excluding the Note). The ICBN more or less is a wiki (has been for a hundred years). * * *
What I \think/ is implied by your answer is (something that requires biological knowledge that I don't have, namely) that there are hybrid names which are not necessarily a cross of two things, but rather only one is mentioned.
*** No, numbers are irrelevant, provided there are at least two parents involved. * * *
The distinction then is that "formula" means at least two, but there are uses which do not appear in a formula, right?
*** No, the distinction is that a name is a name, while a formula is a summation of (at least two) names.
×Agropogon littoralis is a name, and it is the same as Agropogon littoralis, for most purposes.
Agrostis stolonifera × Polypogon monspeliensis are two names, and the formula indicates their relation, which may be more complex than here: see Rec. H.2A.1; so just lifting a formula in isolation from the literature is out (Mentha longifolia > × rotundifolia is an obsolete form).
* * *
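The distinction between a name and a formula can be illustrated with a small sketch that relies only on the spacing convention described earlier in the thread (space, multiplication sign, space between parents). The names `parent_names` and `is_formula` are hypothetical, and more complex formulas of the kind covered by Rec. H.2A.1 are not handled.

```python
import re
from typing import List

# A formula separator: the multiplication sign with white space on both sides.
_FORMULA_SEPARATOR = re.compile(r"\s+\u00d7\s+")


def parent_names(text: str) -> List[str]:
    """Split a string on ' × ' and return the non-empty pieces."""
    parts = [part.strip() for part in _FORMULA_SEPARATOR.split(text.strip())]
    return [part for part in parts if part]


def is_formula(text: str) -> bool:
    """Treat a string as a formula when it yields at least two parent names."""
    return len(parent_names(text)) >= 2
```

Under this sketch ×Agropogon littoralis and Salix ×capreola stay whole as single names, while Agrostis stolonifera × Polypogon monspeliensis splits into its two parents.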
Bob Morris p.s. Hey, I thought of something to complain about, albeit not about ICBN: I sure wish spec writers targeting software would banish "should" from their documents in favor of "must", even if multiple choices are accompanied by "... is preferred". Well, maybe it's a little complaint about the nomenclatural codes, because movement towards born-digital, semantically marked-up systematics literature will bump into it when people try to write semantically enhanced applications. It would be far better if publishers followed a set of rules with no "should" in them, for which compliance could be tested before publication.
*** There are more distinctions than just "must" and "should" in the ICBN. Eliminating the "should" is not going to happen, but sometimes a "should" will grow up to become a "must".
Paul van Rijckevorsel
I think it makes the most sense to model these based on biology and informatics and then be able to output a code compliant string.
One reason is that the code can change, so you don't want it to be a fixed part of the most fundamental units of your information systems.
Another advantage is you don't have to have different intermediate structures and forms for each of the nomenclatural codes.
Do we want to have one entity for species or four+ that have to be duplicated throughout the entire software stack?
Done this way if the code changes you just need to alter the output code.
You also don't always know what is the appropriate code for a given string until the end.
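A minimal sketch of this idea, assuming a deliberately simplified species-level model: the structured record holds the facts, and a separate output function produces the string. The class `SpeciesName` and both render functions are hypothetical and cover only a fraction of real names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeciesName:
    """Structured name data, independent of any output convention."""
    genus: str
    epithet: str
    is_hybrid: bool = False
    authorship: Optional[str] = None


def render_canonical(name: SpeciesName, hybrid_sign: str = "\u00d7") -> str:
    """Output the name without authorship; the sign placement lives only here."""
    marker = hybrid_sign if name.is_hybrid else ""
    return f"{name.genus} {marker}{name.epithet}"


def render_with_authorship(name: SpeciesName, hybrid_sign: str = "\u00d7") -> str:
    """Output the name followed by its authorship, when known."""
    base = render_canonical(name, hybrid_sign)
    return f"{base} {name.authorship}" if name.authorship else base
```

If the preferred placement of the sign changed, or only a lower case "x" were available, only the render functions or their `hybrid_sign` argument would need to change, not the stored data.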
Respectfully,
- Pete
I think it makes the most sense to model these based on biology and informatics and then be able to output a code compliant string.
Yes, that is the ideal. Unfortunately, many/most content providers are not in a position to meet that ideal (they only have a text string for the name). GNA is intended to help content holders reach the ideal, but even if the information and tools are available, many/most content providers do not have the means to implement that ideal.
Done this way if the code changes you just need to alter the output code.
Homonymous use of the word "code"? :-)
You also don't always know what is the appropriate code for a given string until the end.
"Code" as in Nomenclatural Code, or as in software code?
Serious question. I assume you mean nomenclatural code, but I could read it either way.
Just an example of why these conversations are always so difficult.
Aloha, Rich
On 10/12/2010, at 12:52 AM, Bob Morris wrote:
Thanks. To me what is interesting about this thread is that documents whose main(?) audience is authors and publishers, do not always address the needs of parser writers. It is a rare and happy circumstance for a programmer to have the document author to consult!
What I \think/ is implied by your answer is (something that requires biological knowledge that I don't have, namely) that there are hybrid names which are not necessarily a cross of two things, but rather only one is mentioned.
I have just been running through some code in APNI dealing with just this issue. The cases handled by the code at present are:
A generic name may be marked as a hybrid. It is rendered × Foo
An infrageneric name may be marked as a hybrid. It is rendered $genericName × rank. bar
A specific name may be marked as a hybrid. It is rendered $genericName × bar
An infraspecific name may be marked as a hybrid. It is rendered $genericName bar rank. × baz
And we have names that are hybrid names.
hybrid_code 'I' --> foo - bar (intergrade)
hybrid_code '+' --> foo + bar (graft)
hybrid_code 'U' --> foo hybrid (unspecified hybrid?)
If foo or bar are themselves hybrids (a typical example being a grafting with hybrids - I think you get that sort of thing in commercial fruit production), then that term must be enclosed in parentheses.
This last case illustrates the real problem: that a "single record" model is not adequate for names that complex. These types of graftings potentially have four different specific epithets.
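A minimal Python sketch of the hybrid-formula rendering described above. This is not APNI's actual code: the names `Name`, `Formula` and `render` are hypothetical, and only the three hybrid codes and the rule about parenthesising nested hybrids are taken from the post. It models a formula as a small tree, which is one way to see why a flat "single record" struggles with these names.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Name:
    """A plain, already-rendered name (hypothetical helper type)."""
    text: str


@dataclass(frozen=True)
class Formula:
    """A hybrid formula joining one or two terms with a hybrid code."""
    code: str                          # 'I' intergrade, '+' graft, 'U' unspecified
    first: "Term"
    second: Optional["Term"] = None    # unused for code 'U'


Term = Union[Name, Formula]

# Joining symbol for the two-parent codes listed in the post.
_JOINERS = {"I": "-", "+": "+"}


def _operand(term: Term) -> str:
    """Render a term, wrapping it in parentheses if it is itself a hybrid."""
    text = render(term)
    return f"({text})" if isinstance(term, Formula) else text


def render(term: Term) -> str:
    """Render a name or hybrid formula as a display string."""
    if isinstance(term, Name):
        return term.text
    if term.code == "U":
        return f"{_operand(term.first)} hybrid"
    if term.code in _JOINERS:
        if term.second is None:
            raise ValueError(f"hybrid code {term.code!r} needs two terms")
        return f"{_operand(term.first)} {_JOINERS[term.code]} {_operand(term.second)}"
    raise ValueError(f"unknown hybrid code {term.code!r}")


# A graft whose parts are both intergrades: four epithets in one name.
graft = Formula("+",
                Formula("I", Name("a"), Name("b")),
                Formula("I", Name("c"), Name("d")))
print(render(graft))
```

Under these assumptions the graft of two intergrades comes out as "(a - b) + (c - d)", so a single name carries four epithets and cannot be squeezed into one genus/species/infraspecies record.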
_______________________________________________
If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments.
Please consider the environment before printing this email.
Dear all,
Paul Murray wrote:
A generic name may be marked as a hybrid. It is rendered × Foo
Actually my understanding is that ×Foo is the Code-endorsed version:
<snip>
APPENDIX I NAMES OF HYBRIDS Article H.3 H.3.1. Hybrids between representatives of two or more taxa may receive a name. For nomenclatural purposes, the hybrid nature of a taxon is indicated by placing the multiplication sign × before the name of an intergeneric hybrid or before the epithet in the name of an interspecific hybrid, or by prefixing the term "notho-" (optionally abbreviated "n-") to the term denoting the rank of the taxon (see Art. 3.2 and 4.4). All such taxa are designated nothotaxa.
Ex. 1. (The putative or known parentage is found in Art. H.2 Ex. 1.) ×Agropogon P. Fourn. (1934); ×Agropogon littoralis (Sm.) C. E. Hubb. (1946); Salix ×capreola Andersson (1867); Mentha ×smithiana R. A. Graham (1949); Polypodium vulgare nothosubsp. mantoniae (Rothm.) Schidlay (in Futák, Fl. Slov. 2: 225. 1966).
</snip>
My comment: Since the multiplication sign is used it is clear that the hybrid indicator is not part of the scientific name, even without a space after it. If the hybrid indicator is instead the lowercase "x" rather than the multiplication sign, then that is where a space between the indicator and the name is preferable, so as to make clear that the "x" is not part of the name...
So maybe when parsing you have to expect any of the following:
×Foo (as per the official examples)
× Foo (also permissible I think, and widely used)
x Foo
but hopefully not xFoo (although there are certainly examples of the latter in e.g. the GBIF cache...).
Interestingly, the Kew list of angiosperm names on the web uses the style X Foo (even less correct)... e.g. see
http://data.kew.org/cgi-bin/vpfg1992/genlist.pl?ORCHIDACEAE
(Just keeping you on your toes here)
Regards - Tony
PS to the post below:
This is also explained fairly clearly (?) here:
http://books.google.com.au/books?id=In_Lv8iMt24C&pg=PA49&lpg=PA49
Cheers - Tony
Apart from the one glaring error and the fact that it is running three Codes behind, it appears to be copied fairly accurately from the relevant Codes.
Why not use the actual texts instead?
PvR
From: tdwg-content-bounces@lists.tdwg.org on behalf of Tony.Rees@csiro.au Sent: Fri 10-12-2010 7:52
PS to the post below:
This is also explained fairly clearly (?) here:
http://books.google.com.au/books?id=In_Lv8iMt24C&pg=PA49&lpg=PA49
Cheers - Tony
On 10/12/2010, at 5:47 PM, Tony.Rees@csiro.au wrote:
Dear all,
Paul Murray wrote:
A generic name may be marked as a hybrid. It is rendered × Foo
Actually my understanding is that ×Foo is the Code-endorsed version:
Hmm ... . Well, perhaps there's an issue with our existing (software) code.
http://www.anbg.gov.au/cgi-bin/apni?taxon_id=272817 http://www.anbg.gov.au/cgi-bin/apni?taxon_id=40429
There are only 9 genera in APNI that look like this.
X Cynochloris Clifford & Everist
X Calassodia M.A.Clem.
X Agropogon P.Fourn.
X Chilosimpliglottis Jeanes
X Vappaculum M.A.Clem. & D.L.Jones
X Taurodium D.L.Jones & M.A.Clem.
X Glossadenia Kavulak
X Cyanthera Hopper & A.P.Br.
X Festulolium Asch. & Graebn.
Perhaps the issue is that it's a shade tricky to get it right if you have to use the letter, and so we err on the side of caution and put spaces around the genus name. Although a multiplication sign is preferred to an 'x', I believe our PowerBuilder interface was written back in the day before all this new-fangled Unicode. Or perhaps it's simply that getting Windows to do proper multiplication signs involves explaining codepages to the Windows Oracle client: a byzantine process at best, and one which involves getting Admin access to the box. Hence the 'x'.
(Once, ages ago, I got Oracle SQL*Plus to work correctly on one Windows machine on the departmental network, but we never did succeed in getting the machine sitting right next to it to do umlauts properly. Our users addressed this issue by inserting HTML escape codes into the data.)
In any case - anyone looking to parse names into their components may encounter something like this.
Oh - here's some more:
http://www.anbg.gov.au/cgi-bin/apni?taxon_id=257927
xAstackea
xAstackea 'Winter Pink'
xChamecordia
xChamecordia 'Eric John'
xChamecordia 'Jasper'
xChamecordia 'Southern Stars'
xChamecordia 'Susie'
xRhinochilus
xRhinochilus 'Dorothy'
I think that the issue is that these genera were only inserted into the data in order to make it possible to construct the cultivar name. The genera are not published scientific names at all - 'Chamecordia' doesn't appear anywhere as a genus name except in these records (heck: even google has never heard of it), but Wrigley, J. & Fagg, M. (2003) named the cultivars thusly, so we have to jam them into the data somehow.
Hi Paul, all,
Well, if the existing ANBG software code is displaying the hybrid symbol as an uppercase rather than a lowercase "x", then I would say there was something wrong with it, since this usage is not supported by the ICBN.
In any case, to summarise, a recipient / parser of incoming taxonomic names and associated data must therefore be able to cope successfully with hybrid indicators for genera in any of the following formats:
×Foo (ICBN preferred usage as per examples)
× Foo (apparently tolerated, since white space appears to be optional??)
x Foo (ICBN preferred alternative)
X Foo (apparently incorrect, but found in some quite reputable systems)
xFoo (again, probably tolerated, but not sure...)
Have I missed anything? (e.g. "Goo × Hoo" or variants thereof?)
I am also presuming that in all these cases, the equivalent canonical version would be Foo. Does this mean that an extra DwC field would also be needed now, for hybrid indicator?
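As a rough illustration of the question above, here is a Python sketch that accepts the five generic-hybrid formats listed and splits each into a hybrid flag plus the remaining name. The function name `split_generic_hybrid`, the `ParsedGenus` type and the regular expression are assumptions made for this example, not part of DwC or any existing parser. A bare lowercase "x" is only treated as a marker when it is directly followed by a capital letter, so that genera which really begin with X are left alone.

```python
import re
from typing import NamedTuple

# Hybrid marker at the start of a generic name:
#   "×" with or without following space, "x"/"X" followed by space,
#   or lowercase "x" glued to a capitalised name (e.g. "xFoo").
_MARKER = re.compile(r"^(?:×\s*|[xX]\s+|x(?=[A-Z]))")


class ParsedGenus(NamedTuple):
    name: str        # the name with any leading hybrid marker removed
    is_hybrid: bool  # True if a leading hybrid marker was found


def split_generic_hybrid(raw: str) -> ParsedGenus:
    """Separate a leading generic hybrid marker from the name string."""
    text = raw.strip()
    match = _MARKER.match(text)
    if match is None:
        return ParsedGenus(text, False)
    return ParsedGenus(text[match.end():], True)


for raw in ["×Foo", "× Foo", "x Foo", "X Foo", "xFoo", "Xanthorrhoea"]:
    print(repr(raw), "->", split_generic_hybrid(raw))
```

An uppercase "X" glued to the name (e.g. "XFoo") is deliberately not handled, since it cannot be told apart from an ordinary genus name without a reference list.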
Regards - Tony
From: tdwg-content-bounces@lists.tdwg.org on behalf of Tony.Rees@csiro.au Sent: Sat 11-12-2010 22:10
In any case, to summarise, a recipient / parser of incoming taxonomic names and associated data must therefore be able to cope successfully with hybrid indicators for genera in any of the following formats:
×Foo (ICBN preferred usage as per examples)
*** Yes, the preferred style. * * *
× Foo (apparently tolerated, since white space appears to be optional??)
*** This is not merely tolerated, but perfectly in order. It is a matter of style. * * *
x Foo (ICBN preferred alternative)
*** No, this is disallowed by the ICBN (Art. H.1). This is something that is only to be used when out of reach of a computer, as in when using a typewriter. * * *
X Foo (apparently incorrect, but found in some quite reputable systems)
*** Again, disallowed by the ICBN. * * *
xFoo (again, probably tolerated, but not sure...)
*** Again, disallowed by the ICBN. * * *
I am also presuming that in all these cases, the equivalent canonical version would be Foo. Does this mean that an extra DwC field would also be needed now, for hybrid indicator?
*** In that case it is a good idea to keep in mind that this extra DwC field for hybrid indicator would be needed at each of the three levels.
Paul van Rijckevorsel
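The verdicts in the reply above could be turned into a simple data-quality check. The Python sketch below is only an illustration: the function name `marker_style` and the labels "preferred", "acceptable" and "disallowed" are shorthand invented here for the three kinds of answer given, and the patterns cover only the five generic-name formats discussed.

```python
import re
from typing import Optional

# Ordered (pattern, verdict) pairs; the first pattern that matches wins.
_STYLES = [
    (re.compile(r"^×(?=\S)"), "preferred"),                   # ×Foo
    (re.compile(r"^×\s+"), "acceptable"),                     # × Foo
    (re.compile(r"^(?:[xX]\s+|x(?=[A-Z]))"), "disallowed"),   # x Foo, X Foo, xFoo
]


def marker_style(raw: str) -> Optional[str]:
    """Classify the leading hybrid marker of a generic name, if any."""
    text = raw.strip()
    for pattern, verdict in _STYLES:
        if pattern.match(text):
            return verdict
    return None  # no recognisable hybrid marker


for raw in ["×Foo", "× Foo", "x Foo", "X Foo", "xFoo", "Foo"]:
    print(repr(raw), "->", marker_style(raw))
```

A check like this only reports on style; a consumer would presumably still accept and normalise the disallowed forms rather than reject the record.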
Tony,
A mix-up in the method generating APNI name strings: it was reusing the HISPID hybrid indicator code for a named hybrid, 'X'. Now fixed. Perhaps more interesting is that this method has been in place since 1992 and, while the APNI web interface has since addressed c. 12 million download requests, not one of them has resulted in a complaint about the style of the hybrid indicator. I wonder if anybody cares.
One complaint that we do have, though, is that the hybrid indicator upsets searching and sorting, at least for named generic hybrids. It seems that users of the data would like to see it separated from the name string completely.
greg
Hi Greg, Paul,
I checked versions 3 and 4 of HISPID and the intended hybrid indicator "x" is given as lowercase, not uppercase, so your display code may be fine, just have to fix the content :-) - actually as stated by Paul (Holland) and the ICBN, it needs to be replaced by a multiplication symbol for display anyway.
Re 12 million people can't be wrong - maybe they can, if the latest sales figures for Andre Rieu live in concert DVDs are anything to go by. (apologies to any A.R. lovers out there - just kidding of course)
On a more serious note - presumably those 12 million downloads were not machine-machine transactions where hybrid nomenclature was ever checked, or the downloaders cared particularly about compliance with the Code or other standards, or reconciling content from multiple sources - not an attitude we would encourage on this list, surely!
Cheers - Tony
Tony Rees Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000) e-mail: Tony.Rees@csiro.au Manager, OBIS Australia regional node, http://www.obis.org.au/ Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
Tony,
I didn't think I was being serious:( The presentation was corrected as soon as Paul pointed it out. The APNI web pages use the '×' entity. It just looks like a lower case 'x'.
Our herbarium database standardises hybrid indicator to upper case and translates this to lower case when talking HISPIDx.
The important bit for this discussion is that we do need extra elements to correctly discover, list and render names. These include (though incomplete): a named hybrid indicator; the kind of name; and for autonyms and nominate names, the author of the superordinate name element. A place for the rendered name may also be desirable if we expect clients to use these names correctly.
Most client systems, however, expect to find names separated into family, genus, species, rank, infraspecific epithet and authority. They do not want prefixed, named hybrid indicators but they do need the name assembled.
greg
From: "Greg Whitbread" ghw@anbg.gov.au Sent: Monday, December 13, 2010 1:37 AM
[...]
The important bit for this discussion is that we do need extra elements to correctly discover, list and render names. These include (though incomplete): a named hybrid indicator; [...]
*** A single hybrid indicator will not do it; commonly there are three levels at which hybridity may be registered: 1) genus 2) species 3) infraspecific taxon
As in Mentha ×piperita subsp. ×pyramidalis (Rec. H.11 Ex. 3), in contrast with Mentha ×piperita f. hirsuta (Rec. H.12 Ex. 1),
[in addition there is also the subdivision of a genus which may be a hybrid, but this will hardly ever be included in the name of a lower ranking taxon]
Paul
On 12/12/2010, at 10:43 PM, greg whitbread wrote:
One complaint that we do have though is that hybrid indicator upsets searching and sorting at least for named generic hybrids. It seems that users of the data would like to see it separated from the name string completely.
A somewhat cheeky alternative would be to separate them with a unicode zero-width space. In principle, anything scanning the text of the page should understand that the character is whitespace and begins a new word.
From: "Paul Murray" pmurray@bigpond.com Sent: Monday, December 13, 2010 1:20 AM
A somewhat cheeky alternative would be to separate them with a unicode zero-width space. In principle, anything scanning the text of the page should understand that the character is whitespace and begins a new word.
*** If this works (that is, anything scanning the text of the page does understand that the character is whitespace) then this is perfectly in keeping with the ICBN. The multiplication sign is not part of the name.
"H.3A.1. The multiplication sign ×, indicating the hybrid nature of a taxon, should be placed so as to express that it belongs with the name or epithet but is not actually part of it. [...] should depend on what best serves readability."
Paul
"H.3A.1. The multiplication sign ×, indicating the hybrid nature of a taxon, should be placed so as to express that it belongs with the name or epithet but is not actually part of it. [...] should depend on what best serves readability."
My reading of this would be that a non-breaking space, possibly narrower, would satisfy both this statement and correct typographic typesetting (which does not endorse "M. ×piperita" -- the multiplication sign is designed to be used with spacing).
Gregor
Van: Gregor Hagedorn [mailto:g.m.hagedorn@gmail.com] Verzonden: ma 13-12-2010 16:57
"H.3A.1. The multiplication sign ×, indicating the hybrid nature of a taxon, should be placed so as to express that it belongs with the name or epithet but is not actually part of it. [...] should depend on what best serves readability."
My reading of this would be that a non-breaking space, possibly narrower, would satisfy both this statement and correct typographic typesetting (which does not endorse "M. ×piperita" -- the multiplication sign is designed to be used with spacing).
*** Actually, it endorses anything, a full space, a half space, no space, or whatever (with sophisticated typography, kerning, anything is possible), as long as it is clear that the multiplication sign is not part of the name. A character that is read by a machine as a space, but by the eye as a not-space is perfectly OK.
I never heard of any natural or higher law that requires a multiplication sign to be used with spacing. Here it is just a typographic character that can be used any which way, just like most any other typographic character. The ICBN uses it without a space, a perfectly acceptable style.
Paul
Oooh! I propose the invisible space 0x00 :-)
Bob
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
On 14/12/2010, at 4:44 AM, Bob Morris wrote:
Oooh! I propose the invisible space 0x00 :-)
Didn't Phil Collins do a song about that?
The zero-width space is Unicode \u200B, HTML &#8203; or &#x200B;. Hence (hopefully this will come through correctly in the mail):
×​<i>Agropogon</i> P.Fourn.
×Agropogon P.Fourn.
If you copy that text into an editor and step through the letters, you'll find the invisible space. It works fine, but it does mean that anything consuming the text and parsing it into words has to be aware of unicode spaces beyond the usual ascii control characters.
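As a rough illustration of that last point, here is a minimal Python sketch. The helper name `split_words` is made up for this example, and treating U+200B as a word separator is an assumption of the sketch, not something Python does by default: the built-in `str.split()` does not regard the zero-width space as whitespace.

```python
import re

# Assumption for this sketch: U+200B (zero-width space) counts as a word
# separator, in addition to ordinary whitespace matched by \s.
_SEPARATORS = re.compile(r"[\s\u200b]+")


def split_words(text):
    """Split text into words on whitespace and zero-width spaces (hypothetical helper)."""
    return [word for word in _SEPARATORS.split(text) if word]


# Multiplication sign, zero-width space, then the generic name and author.
name = "\u00d7\u200bAgropogon P.Fourn."

# The default split keeps the sign glued to the generic name.
print(name.split())
# The Unicode-aware split separates the sign from the name.
print(split_words(name))
```

A consumer that sorts or searches on the first token would only see "Agropogon" as its own word with the second approach.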
Van: Paul Murray [mailto:pmurray@anbg.gov.au] Verzonden: vr 10-12-2010 3:25
I have just been running through some code in APNI dealing with just this issue.
*** The ICBN-way to deal with these are:
* A generic name may be marked as a hybrid. It is rendered ×Foo or nothogenus Foo
* An infrageneric name may be marked as a hybrid. It is rendered $genericName rank ×Foz or $genericName nothorank Foz
* A species name may be marked as a hybrid. It is rendered $genericName ×bar or nothospecies $genericName bar
* An infraspecific name may be marked as a hybrid. It is rendered $genericName bar rank ×baz or $genericName bar nothorank baz
The following do not concern hybrid names or names in the sense of the ICBN at all:
hybrid_code 'I' --> foo - bar (intergrade)
hybrid_code '+' --> foo + bar (graft)
hybrid_code 'U' --> foo hybrid (unspecified hybrid?)
However, the ICNCP uses +Foo and Foo +bar but not +Foo bar
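The ICBN rendering rules listed above (with the multiplication sign, not the notho- rank form) could be sketched roughly as follows in Python. The function name `render_name` and its parameters are hypothetical and not taken from APNI or any existing library; infrageneric names are left out to keep the sketch short, and the hybrid flag is kept per level because hybridity can be registered at more than one rank.

```python
MULT = "\u00d7"  # multiplication sign, placed directly before the name or epithet


def render_name(genus, species=None, rank=None, infra=None, *,
                genus_hybrid=False, species_hybrid=False, infra_hybrid=False):
    """Assemble a name string with one hybrid flag per level (illustrative only)."""
    parts = [(MULT if genus_hybrid else "") + genus]
    if species:
        parts.append((MULT if species_hybrid else "") + species)
    # An infraspecific epithet is only rendered when a species and a rank are given.
    if species and rank and infra:
        parts.append(rank)
        parts.append((MULT if infra_hybrid else "") + infra)
    return " ".join(parts)


print(render_name("Agropogon", genus_hybrid=True))
print(render_name("Mentha", "piperita", "subsp.", "pyramidalis",
                  species_hybrid=True, infra_hybrid=True))
print(render_name("Mentha", "piperita", "f.", "hirsuta", species_hybrid=True))
```

Keeping the flags separate from the epithets means the plain name elements stay searchable and sortable, and the sign is only added at display time.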
Paul van Rijckevorsel
I think this makes a lot of sense.
I was making up some sample lists and I thought something similar.
If I have a list which is missing some authorship string, should just the genus and species go into the scientificName field?
What David and Markus propose makes it much easier to interpret the meaning of these terms.
Also I have always thought the following:
Puma concolor <= This is the scientific name
(Linnaeus 1771) <= This is the authorship string of that name
If you try to work with the other suggested way with real lists you run into all sorts of problems, especially when trying to match one list to another.
In summary, I support this clarification.
Respectfully,
- Pete
On Wed, Dec 8, 2010 at 10:09 AM, David Remsen (GBIF) dremsen@gbif.org wrote:
Markus and I wanted to try to consolidate the issues related to the current use and definition of scientificName that have been the focus of last week's discussion in as simple a way as we can and leave it with a simple proposal which we will add to the issue tracking on the project site.
1. We propose that a new term, dwc:verbatimScientificName, carry the existing definition for dwc:scientificName, and
2. that dwc:scientificName follow the more accepted convention that is better represented by the earlier proposed definition for Canonical Name.
The intention is to enable data publishers to distinguish unparsed, complex scientific names from more cleanly separated scientific name data. This will relieve consumers of these data from testing each instance of a name for one of these two conditions.
Here are the definitions for the two existing terms that have been part of the discussion:
*dwc:scientificName* - The full scientific name, with authorship and date information if known. When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term.
*dwc:scientificNameAuthorship* - The authorship information for the scientificName formatted according to the conventions of the applicable nomenclaturalCode.
Here are terms and definitions used in the following 5 source data configurations we came up with. They don't have to be exact for this purpose.
*canonical name* - The nomenclatural components of a scientific name without authorship information.
*authorship* - the authorship information that follows a scientific name
*verbatim name* - the verbatim text stored in a source database when it differs from or combines the two definitions above. This is a bit more broad than the def for scientificName.
We identified the following configurations in a source database and how they would be mapped to the existing terms. In cases 4 and 5 we also propose how we would map these were there a 3rd available term (called 'mapping b:')
When a source database contains:
1. canonical names only
Mapping: canonical name -> dwc:scientificName
2. canonical name and authorship in two fields
Mapping: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
3. verbatim name only
Mapping: verbatim name -> dwc:scientificName
4. all three: canonical name, authorship, and verbatim name in 3 different columns
Mapping a: verbatim name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
Mapping b: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship / verbatim name -> dwc:verbatimScientificName
5. a mix of canonical and verbatim names in a single column
Mapping a: verbatim name + canonical names -> dwc:scientificName
Mapping b: verbatim name + canonical names -> dwc:verbatimScientificName
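The five configurations above can be written down as a small lookup table. This is only an illustrative Python sketch: the column labels ("canonical", "authorship", "verbatim", "mixed") and the function `map_columns` are invented for the example, and for configurations 1 to 3 the same mapping is assumed under both options because the proposal lists only one.

```python
SCI = "dwc:scientificName"
AUTH = "dwc:scientificNameAuthorship"
VERB = "dwc:verbatimScientificName"  # the proposed third term

# Key: set of source columns present. Value: (mapping a, mapping b).
MAPPINGS = {
    frozenset({"canonical"}): (
        {"canonical": SCI},
        {"canonical": SCI},
    ),
    frozenset({"canonical", "authorship"}): (
        {"canonical": SCI, "authorship": AUTH},
        {"canonical": SCI, "authorship": AUTH},
    ),
    frozenset({"verbatim"}): (
        {"verbatim": SCI},
        {"verbatim": SCI},
    ),
    frozenset({"canonical", "authorship", "verbatim"}): (
        {"verbatim": SCI, "authorship": AUTH},  # canonical column stays unmapped
        {"canonical": SCI, "authorship": AUTH, "verbatim": VERB},
    ),
    frozenset({"mixed"}): (
        {"mixed": SCI},
        {"mixed": VERB},
    ),
}


def map_columns(columns, proposed=False):
    """Return the source-column -> term mapping for a configuration.

    proposed=False gives 'mapping a' (existing two terms only),
    proposed=True gives 'mapping b' (with the proposed third term).
    """
    mapping_a, mapping_b = MAPPINGS[frozenset(columns)]
    return mapping_b if proposed else mapping_a


print(map_columns({"canonical", "authorship", "verbatim"}))
print(map_columns({"canonical", "authorship", "verbatim"}, proposed=True))
```

Written this way, the difference between the two options is visible at a glance: only with the third term does configuration 4 keep all three source columns, and only then does configuration 5 stop mixing unparsed strings into dwc:scientificName.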
Summary - with the current two terms we are left with no choice but to support both canonical and verbatim names in a single term, which makes consuming these data difficult.
We propose that a new term, dwc:verbatimScientificName carry the existing definition for dwc:scientificName and that dwc:scientificName follow the more accepted convention that is better represented by the definition for Canonical Name
Best, David Remsen / Markus Döring
I am missing a canonical name including authorship. Rebuilding the canonical name from the name without authorship and the authorship requires parsing into the name, determining whether the name is an autonym, and if so, rebuilding. Assuming no separate autonym status is transmitted, this means parsing EVERY name before being able to output a name with authors.
scientificName: Lobelia spicata var. spicata
scientificNameAuthorship: Lam.
-> Lobelia spicata Lam. var. spicata
Gregor
Right....I'll amend my previous recommendation:
When a source database maintains separate fields corresponding to scientificName and scientificNameAuthorship, they should be concatenated (with a single space between them) in most cases, or formatted appropriately for botanical autonyms, to form the required verbatimScientificName
Rich
Rich, it is not a question of __formatting__; concatenation is just not possible, you have to parse into EVERY name, take it apart, determine whether it is an autonym, and if so __insert__ the author at the correct position for botanical names.
I understand this is tough on Zoologists :-), but I therefore propose
verbatimScientificName, scientificName, scientificNameAuthorship, scientificNameWithAuthorship
This covers all cases in my opinion. The comments should express that scientificNameWithAuthorship should follow all canonical name rules and recommendations of the respective Code.
Gregor
Rich, it is not a question of __formatting__; concatenation is just not possible, you have to parse into EVERY name, take it apart, determine whether it is an autonym, and if so __insert__ the author at the correct position for botanical names.
Yes, but that's the responsibility of the provider. Either they have the information sufficiently atomized to populate verbatimScientificName appropriately for autonyms, or they just have a pre-formatted "scientificNameWithAuthorship" (which can go in verbatimScientificName), or they do not have autonyms appropriately formatted, in which case we can't really do anything for them.
Thus, the expected content would be:
verbatimScientificName: Lobelia spicata Lam. var. spicata
scientificName: Lobelia spicata var. spicata
scientificNameAuthorship: Lam.
I understand this is tough on Zoologists :-), but I therefore propose
Actually, it's the botanists who are making things tough in this case... :-)
verbatimScientificName, scientificName, scientificNameAuthorship, scientificNameWithAuthorship
This covers all cases in my opinion. The comments should express that scientificNameWithAuthorship should follow all canonical name rules and recommendations of the respective Code.
I'm still not convinced we need scientificNameWithAuthorship.
Aloha, Rich
Perhaps we need to add a "rule" element as Bob Morris has suggested. Then with that additional fact, the usage of the other terms would be specifically declared by the provider and all this assumption/inferencing would not be needed, where the declaration of the rule was provided.
But, millions of rows of legacy data may never conform to anything done at this point. If the meaning of ScientificName is altered by a definitional change after 10 years of the DarwinCore term being used with a different definition, no doubt the end result will be even more world-wide data hegemony because there will not be a sudden switchover of all the legacy data to the new definition. That herd of elephants is not going to turn quickly, so for some long time you really won't know what you have in a given ScientificName field - the old definition or the new.
Chuck
It is my understanding that GBIF intends to split the verbatim scientific name and authorship in their processing.
So those millions of records will be split.
They will probably also provide a service that will split the names for you.
This clarification of scientificName just makes this more clear all the way down.
It also is a better match for how scientific name is used in most publications and related knowledge bases like Wikipedia, NCBI Taxonomy and eBird (probably Wikispecies too).
The parsers will need to look for well formed strings and in the interim they can split it if needed.
If they are already split it makes it much easier to determine what are correctly formed strings and what are not.
This reminds me of the many ways you can format a bibliographic citation. What some people seem to want is to have everyone adopt their citation format. What makes more sense is to keep the separate things separate and then combine them into whatever format is needed at the end.
It makes sense for end users to keep the authorship string in a separate field anyway.
How do they search for all the descriptions that might be tied to same publication etc.?
Respectfully,
- Pete
Perhaps we need to add a "rule" element as Bob Morris has suggested. Then with that additional fact, the usage of the other terms would be specifically declared by the provider and all this assumption/inferencing would not be needed, where the declaration of the rule was provided.
I don't follow. From a DwC perspective, the providers are serving text strings. I think the proposal by David and Markus adequately captures the needs of both the providers, and the consumers. It's a good compromise solution (80% of the benefit with 20% of the work).
But, millions of rows of legacy data may never conform to anything done at this point. If the meaning of ScientificName is altered by a definitional change after 10 years of the DarwinCore term being used with a different definition, no doubt the end result will be even more world-wide data hegemony because there will not be a sudden switchover of all the legacy data to the new definition. That herd of elephants is not going to turn quickly, so for some long time you really won't know what you have in a given ScientificName field - the old definition or the new.
That was the basis for my original hesitation to redefine scientificName; but here's the thing -- over those ten years, the term has *not* been consistently applied or used. The herd of elephants has just been meandering aimlessly in this sense. What I think the proposed solution allows is at least a start of orienting the elephants heading in the same general direction. The point is, as evidenced by the data GBIF harvests, we *already* don't know what we have in a given scientificName field, because providers are not applying the stated definition consistently.
Aloha, Rich
About the problem of author strings in the middle of the scientific names in autonyms: Perhaps the debate is argued too much from the providers side. My proposal to add a scientificNameWithAuthorship is based on consumer use cases.
verbatimScientificName is meant to give the consumer no guarantee whatever on what form the name has. This is good, because it guarantees that a maximum of records can be served from providers having no guarantees themselves. However, it does not allow the consuming applications or services to make decisions.
I believe the majority of consumers need either a scientificNameWithoutAuthorship (output policy, name-matching policy) or a scientificNameWithAuthorship (unambiguous name representation policy, name-matching policy).
The latter use case cannot be served with the modification following Markus's and David's proposal. This means that every service intending to match or display unambiguous name strings with authors needs to do parsing. Furthermore, providers that know that they have canonical scientific names with authorship cannot transmit information about this fact.
I even doubt, whether the use cases for an isolated scientificNameAuthorship string are very frequent (although they certainly exist). I therefore propose to emend DwC with:
verbatimScientificName
scientificNameWithoutAuthorship (which might continue to be called scientificName)
scientificNameWithAuthorship
and drop the scientificNameAuthorship to reduce complexity.
Gregor
I guess what I don't understand is: what would go in scientificNameWithAuthorship that isn't already achievable via these three terms:
verbatimScientificName (any text string purporting to represent as unambiguously as possible a scientific name, inclusive of authorship if available)
scientificName (effectively canonical name explicitly without authorship = your scientificNameWithoutAuthorship)
scientificNameAuthorship (only the authorship; no scientific name elements)
Give me a use case where one would want to use something like scientificNameWithAuthorship in a way that couldn't be served by these three elements.
The only one I can think of is the case where the original verbatim name-string for the record is a mis-spelling of an autonym (and the provider wanted to represent the mis-spelling in verbatimScientificName). In that case, the onus would be on the consumer to generate the correct autonym form, with authorship after the species epithet, rather than after the infraspecific epithet. This seems like a rare use case to me. There are *plenty* of rare use cases out there that DwC does not accommodate, so I don't see that as justification for introduction of a new term, that might further confuse people. Moreover, it seems like a very simple algorithm for a consumer to recognize an autonym (nomenclaturalCode=ICBN + Rank is below species + second two components of trinomial are identical), and then format the string accordingly from scientificName and scientificNameAuthorship.
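The autonym check described above (nomenclaturalCode=ICBN, rank below species, last two name components identical) might look roughly like this in Python. It is only a sketch under a strong assumption: scientificName is already a clean canonical string of the form "Genus species rank epithet", with no hybrid signs, subgenera or abbreviations. The function name and the list of rank markers are illustrative, not part of Darwin Core.

```python
# Illustrative, incomplete set of botanical rank markers below species.
INFRA_RANKS = {"subsp.", "var.", "subvar.", "f.", "subf."}


def name_with_authorship(scientific_name, authorship, nomenclatural_code="ICBN"):
    """Combine a canonical name and its authorship, handling botanical autonyms.

    Assumes scientific_name is a clean canonical name; anything else falls
    through to plain concatenation.
    """
    if not authorship:
        return scientific_name
    parts = scientific_name.split()
    is_autonym = (
        nomenclatural_code == "ICBN"
        and len(parts) == 4
        and parts[2] in INFRA_RANKS
        and parts[1] == parts[3]
    )
    if is_autonym:
        genus, species, rank, infra = parts
        # In an autonym the author follows the species epithet.
        return f"{genus} {species} {authorship} {rank} {infra}"
    return f"{scientific_name} {authorship}"


print(name_with_authorship("Lobelia spicata var. spicata", "Lam."))
print(name_with_authorship("Lobelia spicata var. campanulata", "McVaugh"))
```

The check itself is short, but it only holds when the input really is canonical; it does nothing for the non-canonical or misformatted strings that a consumer may also receive.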
I *STRONGLY* disagree with your suggestion to drop scientificNameAuthorship. This is an extremely fundamental component to nomenclatural disambiguation, and a relative "pain in the parse" for a consumer when provided only with a full name-string-with-authorship. To me, the use cases where your suggested scientificNameWithAuthorship cannot be easily met with a combination of verbatimScientificName , scientificName, and scientificNameAuthorship are far, far, far fewer than the use cases that would benefit from receiving content where authorship is pre-parsed from canonical name.
Aloha, Rich
On 9 December 2010 22:31, Richard Pyle deepreef@bishopmuseum.org wrote:
I guess what I don't understand is: what would go in scientificNameWithAuthorship that isn't already achievable via these three terms:
verbatimScientificName (any text string purporting to represent as unambiguously as possible a scientific name, inclusive of authorship if available)
scientificName (effectively canonical name explicitly without authorship = your scientificNameWithoutAuthorship)
scientificNameAuthorship (only the authorship; no scientific name elements)
Give me a use case where one would want to use something like scientificNameWithAuthorship in a way that couldn't be served by these three elements.
Use case: A database desiring to identify taxa unambiguously by nomenclatural name would have a field containing:
Lobelia spicata Lam. var. spicata
I can export this to verbatimScientificName, but I cannot inform anyone that verbatimScientificName contains a full canonical name with authorship. As a consumer, I must expect verbatimScientificName to contain anything like:
Lobelia spicata Lam. var. spicata
Lobelia spicata var. spicata
L. spicata Lam. var. spicata
Lobelia sp. var. spicata (a common form!)
Lobelia (Lobelia) spicata Lam. var. spicata (Lobeliaceae)
or, in the case of a non-autonym, the canonical form would be:
Lobelia spicata var. campanulata McVaugh
but many many databases may have:
Lobelia spicata Lam. var. campanulata McVaugh
What I am striving for is a field in which the best canonical form available to the provider can be expressed. If a provider has canonical form, this can be expressed as:
verbatimScientificName = Lobelia spicata Lam. var. spicata
scientificNameWithAuthorship = Lobelia spicata Lam. var. spicata
if the provider knows that it cannot provide a canonical name, then as:
verbatimScientificName = Lobelia spicata Lam. var. spicata
scientificNameWithAuthorship =
Note that this also applies to zoological databases that did not split author and name. Under the present proposal it is impossible to provide a name that is known to be a canonical name including the authorship. You can put that into verbatim, but the consumer may not have any quality expectation on the verbatim name.
*plenty* of rare use cases out there that DwC does not accommodate, so I don't see that as justification for introduction of a new term, that might further confuse people. Moreover, it seems like a very simple algorithm for a consumer to recognize an autonym (nomenclaturalCode=ICBN + Rank is below species + second two components of trinomial are identical), and then format the string accordingly from scientificName and scientificNameAuthorship.
I doubt this. You would have to be able to correctly parse all names in all variants, including non-canonical and mistreated ones.
My point is: with the present proposal you are required to parse ANY name... You cannot have any expectations on a fully qualified name including authorship with the present proposal. With the modified proposal, you can make quality expectations on most names, and limit any extra work to those that have no canonical form.
I *STRONGLY* disagree with your suggestion to drop scientificNameAuthorship. This is an extremely fundamental component to nomenclatural disambiguation, and a relative "pain in the parse" for a consumer when provided only with a full name-string-with-authorship. To me, the use cases where your suggested scientificNameWithAuthorship cannot be easily met with a combination of verbatimScientificName, scientificName, and scientificNameAuthorship are far, far, far fewer than the use cases that would benefit from receiving content where authorship is pre-parsed from canonical name.
I think we must have a misunderstanding here: The most common way to express a taxon name is with authorship. This does not solve all problems of concepts, but it does solve the problem of homonyms. Most journals require this (let us not debate whether instead they should require tdwg-lsids). Most online systems show names with authors. Why then is the desire to have this form where the downstream user of dwc data can have an assurance of this form of name a rare use case?
Gregor
I feel somewhat confused by this. The actual situation looks eminently simple to me:
1) In a database there sits a text string (in any of a bewildering number of shapes). The straightforward thing would be to call this something like
NameInDatabase: Dalbergia baroni Baker or VerbatimNameInDatabase: Dalbergia baroni Baker
2) There also is the scientific name, as governed by the relevant nomenclatural Code. The straightforward thing would be to call this
ScientificName: Dalbergia baronii
3) Finally, there is whatever intermediate result the algorithms that are being used are converting the harvested text string into. I suppose I would call this something like
ParserResult: Dalbergia baroni or ParserResultForScientificName: Dalbergia baroni
Obviously, it would be nice if algorithms did exist which could convert a text string into a scientific name, but this still lies in the future.
In the meantime, I do not see why all the complexity needs to come in?
Paul van Rijckevorsel
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org on behalf of David Remsen (GBIF) Sent: Wed 8-12-2010 17:09 To: tdwg-content@lists.tdwg.org List Subject: [tdwg-content] proposed term: dwc:verbatimScientificName
Markus and I wanted to try to consolidate the issues related to the current use and definition of scientificName that have been the focus of last week's discussion in as simple a way as we can and leave it with a simple proposal which we will add to the issue tracking on the project site.
1. We propose that a new term, dwc:verbatimScientificName carry the existing definition for dwc:scientificName and
2. dwc:scientificName follow the more accepted convention that is better represented by the earlier proposed definition for Canonical Name
The intention is to enable data publishers to distinguish unparsed, complex scientific names from more cleanly separated scientific name data. This will relieve consumers of these data from testing each instance of a name for one of these two conditions.
Here are the definitions for the two existing terms that have been part of the discussion:
dwc:scientificName - The full scientific name, with authorship and date information if known. When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term.
dwc:scientificNameAuthorship - The authorship information for the scientificName formatted according to the conventions of the applicable nomenclaturalCode.
Here are terms and definitions used in the following 5 source data configurations we came up with. They don't have to be exact for this purpose.
canonical name - The nomenclatural components of a scientific name without authorship information.
authorship - the authorship information that follows a scientific name
verbatim name - the verbatim text stored in a source database when it differs from or combines the two definitions above. This is a bit more broad than the def for scientificName.
We identified the following configurations in a source database and how they would be mapped to the existing terms. In cases 4 and 5 we also propose how we would map these were there a 3rd available term (called 'mapping b:')
When a source database contains:
1. canonical names only
Mapping: canonical name -> dwc:scientificName
2. canonical name and authorship in two fields
Mapping: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
3. verbatim name only
Mapping: verbatim name -> dwc:scientificName
4. all three: canonical name, authorship, and verbatim name in 3 diff. columns
Mapping a: verbatim name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
Mapping b: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship / verbatim name -> dwc:verbatimScientificName
5. a mix of canonical and verbatim names in a single column
Mapping a: verbatim name + canonical names -> dwc:scientificName
Mapping b: verbatim name + canonical names -> dwc:verbatimScientificName
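As a rough illustration, the five configurations can be written as one mapping function. The function name and the third_term flag are hypothetical, a mixed column (case 5) is treated as verbatim, and sending a verbatim-only source (case 3) to dwc:verbatimScientificName when the third term exists is an inference from the proposal rather than something listed above.

```python
SN = "dwc:scientificName"
SNA = "dwc:scientificNameAuthorship"
VSN = "dwc:verbatimScientificName"  # the proposed third term


def map_name(canonical=None, authorship=None, verbatim=None, third_term=False):
    """Map source columns to DwC terms.

    third_term=False follows 'Mapping' / 'Mapping a' (current two terms);
    third_term=True follows 'Mapping b' (with dwc:verbatimScientificName).
    """
    record = {}
    if third_term:
        if canonical:
            record[SN] = canonical
        if verbatim:
            record[VSN] = verbatim
    else:
        # With only two terms, a verbatim string takes the scientificName slot.
        name = verbatim or canonical
        if name:
            record[SN] = name
    if authorship:
        record[SNA] = authorship
    return record


source = {
    "canonical": "Lobelia spicata var. campanulata",
    "authorship": "McVaugh",
    "verbatim": "Lobelia spicata Lam. var. campanulata McVaugh",
}
print(map_name(**source))                   # case 4, mapping a
print(map_name(**source, third_term=True))  # case 4, mapping b
```

In case 4 under mapping a the canonical column is simply dropped, which is the loss the third term is meant to avoid.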
Summary - with the current two terms we are left with no choice but to support both canonical and verbatim names in a single term, which makes consuming these data difficult.
We propose that a new term, dwc:verbatimScientificName carry the existing definition for dwc:scientificName and that dwc:scientificName follow the more accepted convention that is better represented by the definition for Canonical Name
Best, David Remsen / Markus Döring
...
Obviously, it would be nice if algorithms did exist which could convert a text string into a scientific name, but this still lies in the future.
For those of us attempting to populate databases with information extracted from published literature, the future is now. It seems to me that normalizing the extraction to some standardized form \before/ putting it in the database is more robust than forcing the parsing to be done afterwards. So we need rules for those forms, and an unambiguous way in our metadata to cite which rules have been followed. In a previous post my p.s. also whined about a similar need for born-digital taxonomic treatments.
Bob
--
Robert A. Morris
Emeritus Professor of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: morris.bob@gmail.com
web: http://bdei.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
The GBIF web parser service is a good step in that direction.
http://tools.gbif.org/nameparser/
Select the test names and review the extended output.
David
Bob, Funny. It starts to sound like the "Levels" that were defined in that 94 plant names publication. They were wrangling back then with how to format the different ways the names could occur. We have some real tension going on between having a minimal set of terms while enabling a maximal variety of use cases. I think something has to give somewhere. The use cases are not going away. So, maybe we need to accept a longer set of terms. One of them might be a "rule used" element as you suggest. But, that would mean the creation of the rules, definition of controlled vocabulary and then implementation into DwC somehow. Tall order but it may be something lacking in DwC that is causing all these late night emails to fly back and forth seemingly endlessly.
Chuck
What we *REALLY* need is a globally shared system for representing taxon names either as verbatim strings (we could call it something like "Grand Names Index", or "Glorious Nomenclatural Inventory", or something like that), or as highly parsed, atomized data objects corresponding to specific usages of taxon names (we could call it the "Global Nomenclatural Utility Base", or "Great Names Universal Bucket") coordinated in such a way that they interact with each other in a common architecture (maybe we could call it the "Guttenberg Names Analog").
In any case, I think the real solution is to get to the point where we need only two terms for taxonomic data:
taxonID verbatimScientificName
As Chuck says, it's a tall order...but it sure would reduce the traffic on the tdwg-content list....
:-)
Aloha, Rich
It does not look so to me, this is perhaps the first 5% of the way. The point is that a parser will never convert a text string into a scientific name (except by accident), but will at most label several components, opening the way for a more sophisticated algorithm (although the 'more sophisticated algorithm' would probably work just as well without a parser).
It does look viable to build such a 'more sophisticated algorithm' now, as the spellcheckers that are built into so much software these days would do the job, if they were loaded with a sufficiently comprehensive vocabulary. However, I don't see any sign of that happening?
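A minimal sketch of the spellchecker idea, using Python's standard difflib for approximate matching. The three-name vocabulary and the 0.8 cutoff are purely illustrative assumptions; a real service would need the comprehensive vocabulary mentioned above.

```python
import difflib

# Tiny illustrative vocabulary; an assumption standing in for a full name list.
VOCABULARY = ["Dalbergia baronii", "Dalbergia latifolia", "Lobelia spicata"]


def suggest_name(text, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary entry to `text`, or None if nothing is close."""
    matches = difflib.get_close_matches(text, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_name("Dalbergia baroni"))  # the misspelled parser result
print(suggest_name("Quercus robur"))     # not in this vocabulary
```

Such matching only proposes a candidate; whether the candidate is the correct scientific name under the relevant Code still has to be decided elsewhere.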
Paul van Rijckevorsel
I think this is *exactly* the right solution. I would go further to make it clear that:
- verbatimScientificName is the required field (with scientificName and scientificNameAuthorship as optional)
- When a source database maintains separate fields corresponding to scientificName and scientificNameAuthorship, they should be concatenated (with a single space between them) to form the required verbatimScientificName
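The concatenation rule could look roughly like this. The function name is hypothetical, and falling back to the name alone when no authorship is available is an assumption about how sources without authors would be handled.

```python
def build_verbatim(scientific_name, authorship=None, verbatim=None):
    """Produce a verbatimScientificName value for one source record."""
    if verbatim:
        # The source already stores a verbatim string: pass it through untouched.
        return verbatim
    # Otherwise join the non-empty parts with a single space.
    parts = [p.strip() for p in (scientific_name, authorship) if p and p.strip()]
    return " ".join(parts)


print(build_verbatim("Dalbergia baronii", "Baker"))
print(build_verbatim("Dalbergia baronii"))
```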
Aloha,
Rich
So basically what you are saying is that the entire NCBI taxonomy database as well as the eBird database cannot output the required format.
- Pete
They cannot provide a verbatimScientificName???? That would imply they have no text field whatsoever.
Hi Rich,
You said that the "required" form would be ...
If they don't have the authorship then they cannot output the "required" form.
Yes. I should have written code in some of those as "Code".
Kind of like "Bible"
Both sets of rules seem to be applied by some in areas where they don't make sense :-)
Both take on the feeling of "Dogma", for those just trying to make the world a better place.
Respectfully,
- Pete
On Thu, Dec 9, 2010 at 11:24 AM, Richard Pyle deepreef@bishopmuseum.orgwrote:
They cannot provide a verbatimScientificName???? That would imply they have no text field whatsoever.
*From:* Peter DeVries [mailto:pete.devries@gmail.com] *Sent:* Thursday, December 09, 2010 6:48 AM *To:* Richard Pyle *Cc:* David Remsen (GBIF); tdwg-content@lists.tdwg.org *Subject:* Re: [tdwg-content] proposed term: dwc:verbatimScientificName
So basically what you are saying is that the entire NCBI taxonomy database as well as the ebird database cannot output the required format.
- Pete
On Thu, Dec 9, 2010 at 9:44 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
I think this is **exactly** the right solution. I would go further to make it clear that:
verbatimScientificName is the required field (with
scientificName and scientificNameAuthorship as optional)
When a source database maintains separate fields corresponding
to scientificName and scientificNameAuthorship, they should be concatenated (with a single space between them) to form the required verbatimScientificName
Aloha,
Rich
*From:* tdwg-content-bounces@lists.tdwg.org [mailto: tdwg-content-bounces@lists.tdwg.org] *On Behalf Of *David Remsen (GBIF) *Sent:* Wednesday, December 08, 2010 6:10 AM *To:* tdwg-content@lists.tdwg.org List *Subject:* [tdwg-content] proposed term: dwc:verbatimScientificName
Markus and I wanted to try to consolidate the issues related to the current use and definition of scientificName that have been the focus of last weeks discussion in as simple a way as we can and leave it with a simple proposal which we will add to the issue tracking on the project site.
- We propose that a new term, dwc:verbatimScientificName carry the existing definition for dwc:scientificName and
- dwc:scientificName follow the more accepted convention that is better represented by the earlier proposed definition for Canonical Name
The intention is to enable data publishers to distinguish unparsed, complex scientific names from more cleanly separated scientific name data. This will relieve consumers of these data from testing each instance of a name for one of these two conditions.
Here are the definitions for the two existing terms that have been part of the discussion:
*dwc:scientificName* - The full scientific name, with authorship and date information if known. When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term.
*dwc:scientificNameAuthorship* - The authorship information for the scientificName formatted according to the conventions of the applicable nomenclaturalCode.
Here are terms and definitions used in the following 5 source data configurations we came up with. They don't have to be exact for this purpose.
*canonical name* - The nomenclatural components of a scientific name without authorship information.
*authorship* - the authorship information that follows a scientific name
*verbatim name* - the verbatim text stored in a source database when it differs from or combines the two definitions above. This is a bit more broad than the def for scientificName.
We identified the following configurations in a source database and how they would be mapped to the existing terms. In cases 4 and 5 we also propose how we would map these were there a 3rd available term (called 'mapping b:')
When a source database contains:
1. canonical names only
   Mapping: canonical name -> dwc:scientificName
2. canonical name and authorship in two fields
   Mapping: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
3. verbatim name only
   Mapping: verbatim name -> dwc:scientificName
4. all three: canonical name, authorship, and verbatim name in 3 different columns
   Mapping a: verbatim name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship
   Mapping b: canonical name -> dwc:scientificName / authorship -> dwc:scientificNameAuthorship / verbatim name -> dwc:verbatimScientificName
5. a mix of canonical and verbatim names in a single column
   Mapping a: verbatim name + canonical names -> dwc:scientificName
   Mapping b: verbatim name + canonical names -> dwc:verbatimScientificName
Summary - with the current two terms we are left with no choice but to support both canonical and verbatim names in a single term, which makes consuming these data difficult.
We propose that a new term, dwc:verbatimScientificName carry the existing definition for dwc:scientificName and that dwc:scientificName follow the more accepted convention that is better represented by the definition for Canonical Name
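To make the five configurations concrete, here is a rough Python sketch of the mappings as listed. The source keys ('canonical', 'authorship', 'verbatim', 'mixed') and the function name are hypothetical, invented only for this illustration, and dwc:verbatimScientificName is the proposed term, not an existing one. Cases 1 to 3 have a single mapping listed, so the flag only changes the outcome for cases 4 and 5.

```python
SCI = "dwc:scientificName"
AUTH = "dwc:scientificNameAuthorship"
VERB = "dwc:verbatimScientificName"  # proposed term, not an existing Darwin Core term


def map_name_fields(source, third_term=False):
    """Map one source record (a dict with hypothetical keys) to Darwin Core terms.

    third_term=False applies 'mapping a' (the two existing terms);
    third_term=True applies 'mapping b' (with the proposed third term).
    Only the five listed configurations are handled deliberately.
    """
    canonical = source.get("canonical")
    authorship = source.get("authorship")
    verbatim = source.get("verbatim")
    mixed = source.get("mixed")

    # Case 5: one column holding a mix of canonical and verbatim names.
    if mixed is not None:
        return {VERB if third_term else SCI: mixed}

    # Case 4: all three values held in separate columns.
    if canonical is not None and authorship is not None and verbatim is not None:
        if third_term:
            return {SCI: canonical, AUTH: authorship, VERB: verbatim}
        return {SCI: verbatim, AUTH: authorship}

    # Case 3: verbatim name only (only one mapping is listed for this case).
    if verbatim is not None:
        return {SCI: verbatim}

    # Cases 1 and 2: canonical name, optionally with authorship.
    if canonical is None:
        raise ValueError("no name field found in source record")
    mapped = {SCI: canonical}
    if authorship is not None:
        mapped[AUTH] = authorship
    return mapped
```

With mapping a, a consumer receiving dwc:scientificName cannot tell whether the value came from case 1, 3, 4 or 5 without inspecting the string itself, which is the difficulty the summary describes.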
Best,
David Remsen / Markus Döring
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 TaxonConcept Knowledge Base http://www.taxonconcept.org/ / GeoSpecies Knowledge Base http://lod.geospecies.org/ About the GeoSpecies Knowledge Base http://about.geospecies.org/
You said that the "required" form would be ...
Actually, what I said was: "verbatimScientificName is the required field"; I didn't say anything about "required form" (see original post appended below). Right now, scientificName is (I think?) treated as a required field. My point was that verbatimScientificName should take on that role (along with the existing definition for scientificName), because *everyone* can provide a verbatim scientific name.
My second point was not part of a "required" form (as evidenced by my use of the word "should", no doubt much to the dismay of Bob Morris); it was a suggestion that for content holders with parsed data, the "required" element (verbatimScientificName) could easily be generated from the parsed bits.
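As a minimal sketch of that suggestion (the function name is hypothetical, and treating a missing authorship as "just use the name" is my assumption, not something the proposal spells out), the parsed bits could be joined with a single space like this:

```python
def build_verbatim_scientific_name(scientific_name, authorship=None):
    """Join parsed name parts with a single space.

    Assumption for this sketch: when no authorship is available, the
    name on its own is used as the verbatim value.
    """
    parts = [scientific_name.strip()]
    if authorship is not None and authorship.strip():
        parts.append(authorship.strip())
    return " ".join(parts)


with_author = build_verbatim_scientific_name("Puma concolor", "(Linnaeus, 1771)")
without_author = build_verbatim_scientific_name("Puma concolor")
```

Under that assumption, a provider that holds no authorship at all still produces a usable verbatimScientificName from the name string alone.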
If they don't have the authorship then they cannot output the "required" form.
Again, if you read my post, you'll see that I never said anything about a "required form".
Yes. I should have written code in some of those as "Code".
Kind of like "Bible"
I wouldn't rely on capitalization alone to disambiguate (I know some programmers who would hold software Code at the biblical level). I think when mixing the terms in the same post, there should be a qualifier (like "nomenclatural" vs. "software"). Same goes for the unqualified word "name", which should NEVER, EVER, EVER be used in our community without some sort of qualifier ("canonical", "in the botanical sense", "in the zoological sense", "in the bacterial sense", "as a text string", etc.)
Aloha, Rich
===========================
On Thu, Dec 9, 2010 at 9:44 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
I think this is *exactly* the right solution. I would go further to make it clear that:
- verbatimScientificName is the required field (with scientificName and scientificNameAuthorship as optional)
- When a source database maintains separate fields corresponding to scientificName and scientificNameAuthorship, they should be concatenated (with a single space between them) to form the required verbatimScientificName
Aloha, Rich
participants (14)
- "Markus Döring (GBIF)"
- Bob Morris
- Chuck Miller
- David Remsen (GBIF)
- dipteryx@freeler.nl
- Greg Whitbread
- greg whitbread
- Gregor Hagedorn
- Paul Murray
- Paul Murray
- Paul van Rijckevorsel
- Peter DeVries
- Richard Pyle
- Tony.Rees@csiro.au