Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?
There is a subtle difference between the common but loosely expressed assertion of scientific names having (i.e. including) authors and that of name components having authorship (which may or may not be displayed).
Jim
On Friday, November 19, 2010, John van Breda john.vanbreda@biodiverseit.co.uk wrote:
I'm coming in a bit late on this conversation so I hope I am not repeating what has already been said, but botanical names can also have authorship at both specific and infraspecific levels, e.g. Centaurea apiculata Ledeb. ssp. adpressa (Ledeb.) Dostál
And to make it even more complex, you can have varieties within subspecies, so 2 infraspecific levels, e.g. Centaurea affinis Friv. ssp. affinis var. affinis
Atomising this properly could be quite complex, but it is necessary in order to present the name as it should be written, with italics in the correct places. E.g. in the example above, the author string and rank strings are not normally italicised, but the rest of the name is. Unless we can include this formatting information in dwc:scientificName?
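As a rough illustration of why the atomised parts matter for presentation, here is a minimal Python sketch. The `NamePart` class and `render_html` function are hypothetical names made up for this example (they are not part of Darwin Core), and the split of the first example name into parts is done by hand.

```python
from dataclasses import dataclass


@dataclass
class NamePart:
    """One piece of a name string (hypothetical helper, not a DwC term)."""
    text: str
    italic: bool  # True for genus and epithets, False for rank markers and authors


def render_html(parts):
    """Join the parts with spaces, wrapping only the italic ones in <i>...</i>."""
    return " ".join(f"<i>{p.text}</i>" if p.italic else p.text for p in parts)


# The first example above, atomised by hand.
parts = [
    NamePart("Centaurea", True),
    NamePart("apiculata", True),
    NamePart("Ledeb.", False),
    NamePart("ssp.", False),
    NamePart("adpressa", True),
    NamePart("(Ledeb.) Dostál", False),
]

print(render_html(parts))
```

Without the atomised parts, a renderer would have to guess which tokens of the full string are authors or rank markers.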
Regards
John
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of "Markus Döring (GBIF)"
Sent: 19 November 2010 09:24
To: Roderic Page
Cc: tdwg-content@lists.tdwg.org; Jim Croft
Subject: Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?
What Darwin Core offers right now are 2 ways of expressing the name:
A) the complete string as dwc:scientificName
B) the atomised parts: genus, subgenus, specificEpithet, infraspecificEpithet, verbatimTaxonRank (+taxonRank), scientificNameAuthorship
Those 2 options are there to satisfy the different needs we have seen in this thread - the consumers call for a simple input and the need to express complex names in their verbatim form. Is there really anything we are missing?
When it comes to how it's being used in the wild right now, I agree with Dima that there is a lot of variety out there. It would be very, very useful if everyone would always publish both options in a consistent way.
Right now the full name can be found in one of these combinations:
- scientificName
- scientificName & scientificNameAuthorship
- scientificName, taxonRank & scientificNameAuthorship
- scientificName, verbatimTaxonRank & scientificNameAuthorship
- genus, subgenus, specificEpithet, infraspecificEpithet, taxonRank, scientificNameAuthorship
- genus, subgenus, specificEpithet, infraspecificEpithet, verbatimTaxonRank, scientificNameAuthorship
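To make the consumer's problem concrete, here is a minimal Python sketch of how one might assemble a full name string from whichever of the combinations above a record happens to use. The function name `full_name` is hypothetical, and the sketch rests on several assumptions that real data may violate: that the record is a plain dict keyed by Darwin Core term names, that verbatimTaxonRank holds the rank marker exactly as it should be written (e.g. "ssp."), and that an authorship already present at the end of scientificName should not be appended a second time.

```python
def full_name(record):
    """Best-effort full name string from a dict of Darwin Core terms."""

    def get(term):
        # Treat missing and None values as empty strings.
        return (record.get(term) or "").strip()

    author = get("scientificNameAuthorship")
    name = get("scientificName")

    if not name:
        # Fall back to the atomised parts.
        parts = [get("genus")]
        if get("subgenus"):
            parts.append(f"({get('subgenus')})")
        parts.append(get("specificEpithet"))
        if get("infraspecificEpithet"):
            # Assumption: verbatimTaxonRank is the rank marker as written.
            parts.append(get("verbatimTaxonRank"))
            parts.append(get("infraspecificEpithet"))
        name = " ".join(p for p in parts if p)

    # Append the authorship unless scientificName already ends with it.
    if author and not name.endswith(author):
        name = f"{name} {author}"
    return name


print(full_name({
    "genus": "Centaurea",
    "specificEpithet": "apiculata",
    "verbatimTaxonRank": "ssp.",
    "infraspecificEpithet": "adpressa",
    "scientificNameAuthorship": "(Ledeb.) Dostál",
}))
```

This only handles a single authorship per record; it cannot reproduce names that carry authors at both the specific and infraspecific levels.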
To make matters worse, the way the authorship is expressed is also impressively rich in variants. In particular, the use of brackets is not always consistent. You find things like:
# regular botanical names with ex authors
Mycosphaerella eryngii (Fr. ex Duby) Johanson ex Oudem. 1897

# original name authors not in brackets, but year is
Lithobius chibenus Ishii & Tamura (1994)

# original name in brackets but year not
Zophosis persis (Chatanay), 1914

# names with imprint years cited
Ctenotus alacer Storr, 1970 ["1969"]
Anomalopus truncatus (Peters, 1876 ["1877"])
Deyeuxia coarctata Kunth, 1815 [1816]
Proasellus arnautovici (Remy 1932 1941)
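For a sense of what consumers end up writing to cope with these variants, here is a small Python sketch. The function `describe_authorship` is a hypothetical helper, and the rules in it are assumptions for illustration: any four-digit number from 1700 to 2099 is treated as a year, and an authorship is flagged as parenthesised only if the string starts with "(". It makes no attempt to separate original from combining authors.

```python
import re

# Assumption: any standalone number 1700-2099 in an authorship string is a year.
YEAR = re.compile(r"\b(?:1[7-9]\d{2}|20\d{2})\b")


def describe_authorship(authorship):
    """Return (years, starts_with_parenthesis) for an authorship string.

    years keeps every year in order of appearance, so imprint years
    such as ["1969"] are reported alongside the cited year.
    """
    years = [int(y) for y in YEAR.findall(authorship)]
    starts_with_parenthesis = authorship.lstrip().startswith("(")
    return years, starts_with_parenthesis


for s in [
    "(Fr. ex Duby) Johanson ex Oudem. 1897",
    "Ishii & Tamura (1994)",
    "(Chatanay), 1914",
    'Storr, 1970 ["1969"]',
    '(Peters, 1876 ["1877"])',
    "Kunth, 1815 [1816]",
    "(Remy 1932 1941)",
]:
    print(s, "->", describe_authorship(s))
```

Even this crude check shows the inconsistency: the Lithobius example is not flagged as parenthesised although its year is in brackets, while the Zophosis example is flagged although its year is not.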
On Nov 19, 2010, at 8:50, Roderic Page wrote:
I'm with Jim. For the love of God let's keep things clean and simple. Have a field for the name without any extraneous junk (and by that I include authorship), and have a separate field for the name plus all the extra stuff. Having fields that atomise the name is also useful, but not at the expense of a field with just the name.
Please, please think of data consumers like me who have to parse this stuff. There is no excuse in this day and age for publishing data that users have to parse before they can do anything sensible with it.
Regards
Rod
On 19 Nov 2010, at 07:06, Jim Croft wrote:
Including the authors, dates and anything else (with the exception of the infraspecific rank and the hybrid symbol in botany) as part of a thing called "the name" is an unholy abomination, a lexical atrocity, an affront to logic and an insult to the natural order of the cosmos and any deity conceived by humankind.
In botany at least, the "name" (which I take to be the basic communication handle for a taxo
What puzzles me about the highly taxonomically technical parts of these threads is not that the codes of nomenclature seem difficult to parse in the sense of formal languages---that's true of lots of human-produced legislation. It is that in 15 years of hanging out with biologists, I have rarely heard them use anything other than binomials in conversation about anything other than whether binomials are adequate. Why, I wonder, are they not utterly confused during all those other conversations, and if they are, does that mean that conversations about biological topics cannot advance biology? (This seems unlikely to me; else why do they keep doing it?) Does it mean that "only" hypotheses can come out of these discussions, but that support for hypotheses can only come from data that is rigorously tied to delicate name formalisms? It is hard to believe that only hypotheses can be the subject of these conversations, except for the position that everything in science is "only" hypothesis. But maybe when the amateurs leave the room, they suddenly start talking in more code-compliant names.
There are plenty of use cases---and successful information systems---that don't depend on rigorous names. Some aspects of morphology form a simple example. For some uses, it is not a problem to illustrate what a sepal is with several images of different taxa which are either not named, inadequately named, or even incorrectly named. Furthermore, this wouldn't change if those images were fetched from a database in which it is impossible to decide which of those name defects is in play, e.g. one in which there is nothing other than binomials as names.
Another example is this conversation, recalled from memory, that I was party to in Morocco a few years ago:
Bob Morris: Ooh, that's a beautiful cactus.
Kevin Thiele: It's a Euphorb, not a cactus. There are no cacti here.
Bob: Why does it look like a cactus?
Kevin: It's pretty much the most successful way to deal with very dry environments. But they are pretty distant from a phylogenetic point of view.
Since most of the listeners were biologists, I imagine I was the only one this was news to. But what I don't believe is that some of the party had a radically different understanding of the conversation than I did.
So, the importance of code-compliant names notwithstanding, I would find it very interesting to see a resource devoted to use cases and competency questions that are independent of them, along with accompanying "not fit for use X" annotations. Sort of like warnings on pharmaceuticals.
Bob Morris
On Fri, Nov 19, 2010 at 5:29 PM, Jim Croft jim.croft@gmail.com wrote:
-- _________________ Jim Croft ~ jim.croft@gmail.com ~ +61-2-62509499 ~ http://www.google.com/profiles/jim.croft 'A civilized society is one which tolerates eccentricity to the point of doubtful sanity.' - Robert Frost, poet (1874-1963)
Dear Bob:
I think that your "metapoint" as I understand it, i.e. (1) that different levels of semantic resolution will be necessary and/or sufficient for a particular task, and (2) that considerations of specific use cases should take into account how much resolution is needed, is solid. The argument from example can work the other way as well, however:
http://onlinelibrary.wiley.com/doi/10.1046/j.1523-1739.1999.013002427.x/pdf
I also think that your anecdote touches on another subject that might need more attention. Here's a set of related (and similarly self-exploring) quotes from my "Letter to Linnaeus":
"We’re at a juncture in systematics when more precise phylogenetic estimates are published at an increasing rate. There is a concomitant trend to archive the results in networked repositories intended to serve as the primary ‘hubs’ for systematic information. Both the systematic and the computer science community seem to have bought into this vision. However it is likely that each community underestimates just how much we need to adjust our linguistic habits in order to achieve long-term integration of systematic products. Computer scientists use a formal language (description logic) to build highly structured networks (ontologies) that may include classes, instances, parts, properties, relationships, and other components and qualifiers. Once the structure is in place then powerful algorithms can ‘reason’ about the constituent elements, connect them to other ontologies created for related subject areas, and so on.
As computer scientists learn about systematics they must initially see a strong match between an ontology and a published taxonomy. However, as we’ve seen, a classification is never entirely comprehensible in isolation, and instead represents a complex mosaic of previous and new elements with implicit identities and relationships to each other. Too often such expert-made classifications are only comprehensible to other expert speakers, i.e., persons who share an intimate understanding of the contextuality of the new system and are thus able to make explicit the implicit semantic links to previous systems." [...]
"Why have systematists relied so much on painstakingly acquired, implicit assumptions about the taxonomic history of particular groups when presenting their new classifications? I believe the reason is neither some form of elitism (“take that, users!”) nor a lack of self-esteem (“who wants to read about all these subtle similarities and differences?”). More likely, it’s simply human habit—we make things just as explicit as we think is needed at the moment—paired with the similarly human notion that the latest perspective is really the one that’s going to last for a long time, in spite of all historical evidence to the contrary. And so we pass the burden of full semantic resolution, both looking backward and forward, on to future specialists." [...]
"However, the Linnaean system is not capable of capturing the entirety of semantic adjustments that occur when a previous classification is revised in light of new evidence. [...] Instead of abandoning the Linnaean system, this observation should lead us to express more clearly and more consistently what we mean when presenting a new classification. [...] At the human level, this requires that we routinely acknowledge the ephemerality of our latest insights, spend more time comparing our perspective to a previous one that we no longer think holds true, and generally pay more attention to the context in which we use taxonomic names. [...] If we supplement the Linnaean system with these conventions, there will be more linguistic transparency and less mistaken urgency to purge the idiosyncrasies of the past or legislate a wrong consensus."
http://academic.uprm.edu/~franz/publications/LetterLinnaeus.pdf
---------------
A few additional comments:
The view that taxonomic concepts represent hypotheses about how certain names, types, and descriptions relate to perceived entities in nature is by no means incompatible with your point that people understand each other sufficiently well in many particular situations using even crude shorthands for names and concepts (euphorb versus cactus). The reliability of the notion that those two lineages are phylogenetically distinct is in the ballpark of that of the law of gravity (well, actually I am rather foolishly relying on the veracity of your judgment of Dr. Thiele's expertise...making all kinds of ancillary assumptions that undermine deductive reasoning, sorry Prof. Popper). So yes, we are advancing.
But isn't the point of some of the most critical use cases (e.g., the EEA one) not just to properly spell names, but to load up the "system" (ontologies, databases, metadata annotations, what have you) with some degree of specific taxonomic insight? If so, then we shouldn't assume that matters of contextuality are going to be largely insignificant, and instead at some level we will have to "teach" the system that contextuality.
Binomials and informal names are shorthands that can hold water in most casual conversations among humans, especially if and when the involved speakers share a similar scientific and even taxon-specific training (in addition to all the other semantic and inferential expertise they share just by having been born into and raised in society; see Quine). I do feel, however, that our reluctance (if there actually is one) to go deeper with ontological representations is neither necessarily due to an obvious limitation of computers - that remains to be shown - nor is it the most prudent way to move ahead.
Nico Franz
On 11/20/2010 5:45 PM, Bob Morris wrote: