[tdwg-content] Fwd: Taxon Concept dilemma

Bob Morris morris.bob at gmail.com
Fri Jul 9 16:28:56 CEST 2010

I barely dare to poke in here since you know how little biology I
know.  Casting in terms of  rudimentary Set Theory, let me take a
Computer Science/Math view of the issues you raised, under the
assumption that we are talking exactly about circumscription.
Correspondents who don't agree to that maybe don't care about this
post. Anyone anxious about. or disinterested in, the term "boolean
expression" should stop reading now. Apologies if I have just
eliminated all the readership.

In Set Theory, it is rare, to my knowledge, to assign to a set a
globally unique identifier,  or a widely used name with huge social
impediment to change. Only "The Empty Set" comes to mind, and even
that has several widely recognized single character orthographic
representations.  What one names a set can never change its contents.
However, it \is/ often the case that sets are defined by descriptive
data, e.g. a function that determines what is or isn't in the set
(called the indicator, or characteristic  function
http://en.wikipedia.org/wiki/Indicator_function.) It is somewhat
useful for proving theorems, but in many useful cases, it may
sometimes be hard to compute it, and sometimes not.  For example the
characteristic function of the set of positive even integers is
defined by
    I(x) = 1 if and only if x is a positive whole number
    and there is a positive even integer y such that x=2y.

Now that was easy, wasn't it? But how many readers of this post could
quickly determine if a number is even when that number is represented
in base 10?,  base 2?, base 53? , base 54? I bet the answer is " 'all'
for 10, 'many' for 2, 'few or none' for 53 and 54). Thus, the name
"Positive Even Integers" is disambiguated by the characteristic
function, but it may still be hard to figure out the circumscription
of that name, depending on the representation of the underlying set

Of course, in an \\application// of Set Theory, e.g. Number Theory,
some widely used names for specific sets come with understood names
carrying a huge social impediment to changing the circumscription.
"Positive Even Integers" is an example.

In Set Theory applications, proliferation of names is often controlled
by use of boolean expressions, especially expressions involving union,
intersection, and negation (OR, AND, NOT) (*)  For computational use,
it is arguably sometimes better to refer to a set as a boolean
function of some other, temporarily named, or anonymous sets than to
identify it with a characteristic function of its own. That's because
there is a long literature on computational use of boolean
expressions, both as to the computational complexity of problems
framed in terms of them, and as to algorithms for many such problems.
Some problems don't require computation of indicator functions. An
example is deciding whether two boolean expressions in the same
variables represent determine the same set. That is independent of
what the variables are and how set members may be represented.

Neither assigning multiple names or GUIDs to a set, nor reassigning
one of those from one set to some other set, can change either set.
However, either can surely confuse human readers and both raise
problems for use as database keys (as in the special case of taxonomic
names).  So also can an opinion--especially a hard to find or
obscurely applied one--such as "In my opinion the name "Blah blah" is
more appropriately applied to the set A OR B, than it is to the set A"

For programming languages problems of shifting names usually fall
under the rubric of variable "scope", which I don't discuss here.

My conclusion: for machine use, be spare on names and GUIDs and
generous with boolean expressions. But take care for the computational
complexity of the problem at hand.  The latter sounds scary, but if
the truth be known (pun intended), harmful--i.e. fundamentally
exponential--computational complexity necessarily bites you exactly as
badly no matter what computational methodology you use(**). Then
arises the question: "Does generous use of boolean expressions cause
proliferation of other resource use worse than caused by name and GUID
management?" My answer is "It depends, but if it also leads to more
robust reasoning, it is likely advantageous."

Bob Morris
Recovering Algebraist

(*)It is sometimes startling to new students of logic who are
comfortable with expressions made of complex combinations of OR, AND,
and NOT,  to learn that every such expression can be represented as an
expression using only NOR ("not or", functionally equivalent to
NOR(A,B) = NOT(A OR B)   colloquially, "neither A nor B" ).   NOR can
be defined without use of NOT or OR by the use of a 2x2 truth table.

(**)Not exactly. Parallel computing and  probabilistic approaches such
as quantum computing sometimes help.

On Wed, Jul 7, 2010 at 10:28 AM, Richard Pyle <deepreef at bishopmuseum.org> wrote:
> Hi Mark,
> I agree with everything you say (which is why I think that essentially
> everything related to "taxa" should be represented through Usage Instances).
> I also agree that taxonomists do not often articulate the scope of their
> taxon concepts by enumerating the included organisms. However, I would argue
> that when most (all?) taxonomists conceive of a taxon concept, the "essence"
> of the concept is the set of organisms implied to be circumscribed by it.
> Thus, there is an historical disconnect between what a taxonomist means by a
> taxon concept, and how a taxonomist articulates the scope of that concept.
> And therein lies what I think is the biggest biodiversity informatics
> challenge.  That is, one of the most fundamental units of biology has a
> history of being very imprecisely defined by the practitioners who establish
> those units.
> Aloha,
> Rich
>> -----Original Message-----
>> From: Mark Wilden [mailto:mark at mwilden.com]
>> Sent: Tuesday, July 06, 2010 5:46 AM
>> To: Richard Pyle
>> Subject: Re: [tdwg-content] Taxon Concept dilemma
>> On Tue, Jul 6, 2010 at 6:10 AM, Richard Pyle
>> <deepreef at bishopmuseum.org> wrote:
>> > This is why the only way we're going to be able to establish
>> > RelationshipAssertions (sensu TCS) is via third-party
>> assertions.  In
>> > other words, someone is going to have to assert an opinion over
>> > whether the implied members of Smith's Aus bus would have
>> included the
>> > population in Hawaii, and whether the implied set of Jones' Aus cus
>> > would have included the population in the Marshall Islands.
>> I think that a "someone" is always asserting such an opinion
>> - Smith and Jones included. There is no Platonic ideal of a particular
>> species. Every single classification is a matter of educated opinion.
>> Smith has one opinion and Jones has another opinion. Brown may step in
>> and decide that Smith's opinion is the correct one - but that's just
>> another opinion. Consumers of the classification choose whose opinions
>> are the most useful.
>> A taxon is always related to a taxon-assigner. In this sense,
>> "circumscription" is perhaps not the best way to think about it,
>> because very few assigners actually determine taxa by enumerating
>> organisms.
>> The idea of researchers creating taxa, and third parties adjudicating
>> them to arrive at the "true" classification, is too limited. It's
>> third parties all the way down.
>> ///ark
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content

Robert A. Morris
Emeritus Professor  of Computer Science
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: ram at cs.umb.edu
web: http://bdei.cs.umb.edu/
web: http://etaxonomy.org/FilteredPush
phone (+1) 857 222 7992 (mobile)

More information about the tdwg-content mailing list