Hi Bob,
I think of the individual specimens that are included in a species concept definition as representative of the variation that exists within the species concept. In a sense, they serve as a guide as to what specimens are a match to a particular species concept. I might be misinterpreting your comments but this use does not seem to match the formal definition of individuals in a set.
Species description documents should be testable in the following way: given 1,000 specimens and set of species description documents can different scientists repeatably match the same specimens to the same concepts about 95% of the time.
Contrast this approach to the current method of determining whether a given specimen matches the original species description for Culex triseriatus => Aedes triseriatus => Ochlerotatus triseriatus. I may be wrong but the original type specimen seems to be missing. Attached is the original species description.
Also consider the proportion of existing identifications of this species, how many were determined based on the original description? In this example species, the formal species description does not seem to play a role in how these mosquitoes are actually identified.
I would suspect that this is true for a number of species identifications. So we are modeling this domain as if the original species descriptions were used when in actuality they were not used for the vast majority of identifications.
- Pete
On Fri, Jul 9, 2010 at 9:28 AM, Bob Morris morris.bob@gmail.com wrote:
I barely dare to poke in here since you know how little biology I know. Casting in terms of rudimentary Set Theory, let me take a Computer Science/Math view of the issues you raised, under the assumption that we are talking exactly about circumscription. Correspondents who don't agree to that maybe don't care about this post. Anyone anxious about. or disinterested in, the term "boolean expression" should stop reading now. Apologies if I have just eliminated all the readership.
In Set Theory, it is rare, to my knowledge, to assign to a set a globally unique identifier, or a widely used name with huge social impediment to change. Only "The Empty Set" comes to mind, and even that has several widely recognized single character orthographic representations. What one names a set can never change its contents. However, it \is/ often the case that sets are defined by descriptive data, e.g. a function that determines what is or isn't in the set (called the indicator, or characteristic function http://en.wikipedia.org/wiki/Indicator_function.) It is somewhat useful for proving theorems, but in many useful cases, it may sometimes be hard to compute it, and sometimes not. For example the characteristic function of the set of positive even integers is defined by I(x) = 1 if and only if x is a positive whole number and there is a positive even integer y such that x=2y.
Now that was easy, wasn't it? But how many readers of this post could quickly determine if a number is even when that number is represented in base 10?, base 2?, base 53? , base 54? I bet the answer is " 'all' for 10, 'many' for 2, 'few or none' for 53 and 54). Thus, the name "Positive Even Integers" is disambiguated by the characteristic function, but it may still be hard to figure out the circumscription of that name, depending on the representation of the underlying set members.
Of course, in an \application// of Set Theory, e.g. Number Theory, some widely used names for specific sets come with understood names carrying a huge social impediment to changing the circumscription. "Positive Even Integers" is an example.
In Set Theory applications, proliferation of names is often controlled by use of boolean expressions, especially expressions involving union, intersection, and negation (OR, AND, NOT) (*) For computational use, it is arguably sometimes better to refer to a set as a boolean function of some other, temporarily named, or anonymous sets than to identify it with a characteristic function of its own. That's because there is a long literature on computational use of boolean expressions, both as to the computational complexity of problems framed in terms of them, and as to algorithms for many such problems. Some problems don't require computation of indicator functions. An example is deciding whether two boolean expressions in the same variables represent determine the same set. That is independent of what the variables are and how set members may be represented.
Neither assigning multiple names or GUIDs to a set, nor reassigning one of those from one set to some other set, can change either set. However, either can surely confuse human readers and both raise problems for use as database keys (as in the special case of taxonomic names). So also can an opinion--especially a hard to find or obscurely applied one--such as "In my opinion the name "Blah blah" is more appropriately applied to the set A OR B, than it is to the set A"
For programming languages problems of shifting names usually fall under the rubric of variable "scope", which I don't discuss here.
My conclusion: for machine use, be spare on names and GUIDs and generous with boolean expressions. But take care for the computational complexity of the problem at hand. The latter sounds scary, but if the truth be known (pun intended), harmful--i.e. fundamentally exponential--computational complexity necessarily bites you exactly as badly no matter what computational methodology you use(**). Then arises the question: "Does generous use of boolean expressions cause proliferation of other resource use worse than caused by name and GUID management?" My answer is "It depends, but if it also leads to more robust reasoning, it is likely advantageous."
Bob Morris Recovering Algebraist
(*)It is sometimes startling to new students of logic who are comfortable with expressions made of complex combinations of OR, AND, and NOT, to learn that every such expression can be represented as an expression using only NOR ("not or", functionally equivalent to NOR(A,B) = NOT(A OR B) colloquially, "neither A nor B" ). NOR can be defined without use of NOT or OR by the use of a 2x2 truth table.
(**)Not exactly. Parallel computing and probabilistic approaches such as quantum computing sometimes help.
On Wed, Jul 7, 2010 at 10:28 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Mark,
I agree with everything you say (which is why I think that essentially everything related to "taxa" should be represented through Usage
Instances).
I also agree that taxonomists do not often articulate the scope of their taxon concepts by enumerating the included organisms. However, I would
argue
that when most (all?) taxonomists conceive of a taxon concept, the
"essence"
of the concept is the set of organisms implied to be circumscribed by it. Thus, there is an historical disconnect between what a taxonomist means
by a
taxon concept, and how a taxonomist articulates the scope of that
concept.
And therein lies what I think is the biggest biodiversity informatics challenge. That is, one of the most fundamental units of biology has a history of being very imprecisely defined by the practitioners who
establish
those units.
Aloha, Rich
-----Original Message----- From: Mark Wilden [mailto:mark@mwilden.com] Sent: Tuesday, July 06, 2010 5:46 AM To: Richard Pyle Subject: Re: [tdwg-content] Taxon Concept dilemma
On Tue, Jul 6, 2010 at 6:10 AM, Richard Pyle deepreef@bishopmuseum.org wrote:
This is why the only way we're going to be able to establish RelationshipAssertions (sensu TCS) is via third-party
assertions. In
other words, someone is going to have to assert an opinion over whether the implied members of Smith's Aus bus would have
included the
population in Hawaii, and whether the implied set of Jones' Aus cus would have included the population in the Marshall Islands.
I think that a "someone" is always asserting such an opinion
- Smith and Jones included. There is no Platonic ideal of a particular
species. Every single classification is a matter of educated opinion. Smith has one opinion and Jones has another opinion. Brown may step in and decide that Smith's opinion is the correct one - but that's just another opinion. Consumers of the classification choose whose opinions are the most useful.
A taxon is always related to a taxon-assigner. In this sense, "circumscription" is perhaps not the best way to think about it, because very few assigners actually determine taxa by enumerating organisms.
The idea of researchers creating taxa, and third parties adjudicating them to arrive at the "true" classification, is too limited. It's third parties all the way down.
///ark
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
-- Robert A. Morris Emeritus Professor of Computer Science UMASS-Boston 100 Morrissey Blvd Boston, MA 02125-3390 Associate, Harvard University Herbaria email: ram@cs.umb.edu web: http://bdei.cs.umb.edu/ web: http://etaxonomy.org/FilteredPush http://www.cs.umb.edu/~ram phone (+1) 857 222 7992 (mobile) _______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content