[tdwg-content] Fwd: Taxon Concept dilemma

Bob Morris morris.bob at gmail.com
Fri Jul 9 20:11:04 CEST 2010


Based on my experience building species id systems, I agree with you
entirely, even, by hearsay, for determinations by hard-core
taxonomists.  However, all I was addressing in my post is the issue of
whether for informatics purposes, some alternatives to proliferating
names and/or GUIDs are advantageous or not. My conclusion was that
they are, for machine-centric questions. I enjoy trying to follow the
rest of the arguments in this thread, but "trying to" is the operative
word.

On Fri, Jul 9, 2010 at 12:51 PM, Peter DeVries <pete.devries at gmail.com> wrote:
> Hi Bob,
> I think of the individual specimens that are included in a species concept
> definition as representative of the variation that exists within the species
> concept. In a sense, they serve as a guide as to what specimens are a match
> to a particular species concept. I might be misinterpreting your comments
> but this use does not seem to match the formal
> definition of individuals in a set.
> Species description documents should be testable in the following way: given
> 1,000 specimens and set of species description documents can different
> scientists repeatably match the same specimens to the same concepts about
> 95% of the time.
> Contrast this approach to the current method of determining whether a given
> specimen matches the original species description for Culex triseriatus =>
> Aedes triseriatus => Ochlerotatus triseriatus. I may be wrong but the
> original type specimen seems to be missing. Attached is the original species
> description.
> Also consider the proportion of existing identifications of this species,
> how many were determined based on the original description? In this example
> species, the formal species description does not seem to play a role in how
> these mosquitoes are actually identified.
> I would suspect that this is true for a number of species identifications.
> So we are modeling this domain as if the original species descriptions were
> used when in actuality they were not used for the vast majority of
> identifications.
> - Pete
>
>
>
> On Fri, Jul 9, 2010 at 9:28 AM, Bob Morris <morris.bob at gmail.com> wrote:
>>
>> I barely dare to poke in here since you know how little biology I
>> know.  Casting in terms of  rudimentary Set Theory, let me take a
>> Computer Science/Math view of the issues you raised, under the
>> assumption that we are talking exactly about circumscription.
>> Correspondents who don't agree to that maybe don't care about this
>> post. Anyone anxious about. or disinterested in, the term "boolean
>> expression" should stop reading now. Apologies if I have just
>> eliminated all the readership.
>>
>> In Set Theory, it is rare, to my knowledge, to assign to a set a
>> globally unique identifier,  or a widely used name with huge social
>> impediment to change. Only "The Empty Set" comes to mind, and even
>> that has several widely recognized single character orthographic
>> representations.  What one names a set can never change its contents.
>> However, it \is/ often the case that sets are defined by descriptive
>> data, e.g. a function that determines what is or isn't in the set
>> (called the indicator, or characteristic  function
>> http://en.wikipedia.org/wiki/Indicator_function.) It is somewhat
>> useful for proving theorems, but in many useful cases, it may
>> sometimes be hard to compute it, and sometimes not.  For example the
>> characteristic function of the set of positive even integers is
>> defined by
>>    I(x) = 1 if and only if x is a positive whole number
>>    and there is a positive even integer y such that x=2y.
>>
>> Now that was easy, wasn't it? But how many readers of this post could
>> quickly determine if a number is even when that number is represented
>> in base 10?,  base 2?, base 53? , base 54? I bet the answer is " 'all'
>> for 10, 'many' for 2, 'few or none' for 53 and 54). Thus, the name
>> "Positive Even Integers" is disambiguated by the characteristic
>> function, but it may still be hard to figure out the circumscription
>> of that name, depending on the representation of the underlying set
>> members.
>>
>> Of course, in an \\application// of Set Theory, e.g. Number Theory,
>> some widely used names for specific sets come with understood names
>> carrying a huge social impediment to changing the circumscription.
>> "Positive Even Integers" is an example.
>>
>> In Set Theory applications, proliferation of names is often controlled
>> by use of boolean expressions, especially expressions involving union,
>> intersection, and negation (OR, AND, NOT) (*)  For computational use,
>> it is arguably sometimes better to refer to a set as a boolean
>> function of some other, temporarily named, or anonymous sets than to
>> identify it with a characteristic function of its own. That's because
>> there is a long literature on computational use of boolean
>> expressions, both as to the computational complexity of problems
>> framed in terms of them, and as to algorithms for many such problems.
>> Some problems don't require computation of indicator functions. An
>> example is deciding whether two boolean expressions in the same
>> variables represent determine the same set. That is independent of
>> what the variables are and how set members may be represented.
>>
>> Neither assigning multiple names or GUIDs to a set, nor reassigning
>> one of those from one set to some other set, can change either set.
>> However, either can surely confuse human readers and both raise
>> problems for use as database keys (as in the special case of taxonomic
>> names).  So also can an opinion--especially a hard to find or
>> obscurely applied one--such as "In my opinion the name "Blah blah" is
>> more appropriately applied to the set A OR B, than it is to the set A"
>>
>> For programming languages problems of shifting names usually fall
>> under the rubric of variable "scope", which I don't discuss here.
>>
>> My conclusion: for machine use, be spare on names and GUIDs and
>> generous with boolean expressions. But take care for the computational
>> complexity of the problem at hand.  The latter sounds scary, but if
>> the truth be known (pun intended), harmful--i.e. fundamentally
>> exponential--computational complexity necessarily bites you exactly as
>> badly no matter what computational methodology you use(**). Then
>> arises the question: "Does generous use of boolean expressions cause
>> proliferation of other resource use worse than caused by name and GUID
>> management?" My answer is "It depends, but if it also leads to more
>> robust reasoning, it is likely advantageous."
>>
>> Bob Morris
>> Recovering Algebraist
>>
>> (*)It is sometimes startling to new students of logic who are
>> comfortable with expressions made of complex combinations of OR, AND,
>> and NOT,  to learn that every such expression can be represented as an
>> expression using only NOR ("not or", functionally equivalent to
>> NOR(A,B) = NOT(A OR B)   colloquially, "neither A nor B" ).   NOR can
>> be defined without use of NOT or OR by the use of a 2x2 truth table.
>>
>> (**)Not exactly. Parallel computing and  probabilistic approaches such
>> as quantum computing sometimes help.
>>
>>
>>
>> On Wed, Jul 7, 2010 at 10:28 AM, Richard Pyle <deepreef at bishopmuseum.org>
>> wrote:
>> >
>> > Hi Mark,
>> >
>> > I agree with everything you say (which is why I think that essentially
>> > everything related to "taxa" should be represented through Usage
>> > Instances).
>> > I also agree that taxonomists do not often articulate the scope of their
>> > taxon concepts by enumerating the included organisms. However, I would
>> > argue
>> > that when most (all?) taxonomists conceive of a taxon concept, the
>> > "essence"
>> > of the concept is the set of organisms implied to be circumscribed by
>> > it.
>> > Thus, there is an historical disconnect between what a taxonomist means
>> > by a
>> > taxon concept, and how a taxonomist articulates the scope of that
>> > concept.
>> > And therein lies what I think is the biggest biodiversity informatics
>> > challenge.  That is, one of the most fundamental units of biology has a
>> > history of being very imprecisely defined by the practitioners who
>> > establish
>> > those units.
>> >
>> > Aloha,
>> > Rich
>> >
>> >> -----Original Message-----
>> >> From: Mark Wilden [mailto:mark at mwilden.com]
>> >> Sent: Tuesday, July 06, 2010 5:46 AM
>> >> To: Richard Pyle
>> >> Subject: Re: [tdwg-content] Taxon Concept dilemma
>> >>
>> >> On Tue, Jul 6, 2010 at 6:10 AM, Richard Pyle
>> >> <deepreef at bishopmuseum.org> wrote:
>> >>
>> >> > This is why the only way we're going to be able to establish
>> >> > RelationshipAssertions (sensu TCS) is via third-party
>> >> assertions.  In
>> >> > other words, someone is going to have to assert an opinion over
>> >> > whether the implied members of Smith's Aus bus would have
>> >> included the
>> >> > population in Hawaii, and whether the implied set of Jones' Aus cus
>> >> > would have included the population in the Marshall Islands.
>> >>
>> >> I think that a "someone" is always asserting such an opinion
>> >> - Smith and Jones included. There is no Platonic ideal of a particular
>> >> species. Every single classification is a matter of educated opinion.
>> >> Smith has one opinion and Jones has another opinion. Brown may step in
>> >> and decide that Smith's opinion is the correct one - but that's just
>> >> another opinion. Consumers of the classification choose whose opinions
>> >> are the most useful.
>> >>
>> >> A taxon is always related to a taxon-assigner. In this sense,
>> >> "circumscription" is perhaps not the best way to think about it,
>> >> because very few assigners actually determine taxa by enumerating
>> >> organisms.
>> >>
>> >> The idea of researchers creating taxa, and third parties adjudicating
>> >> them to arrive at the "true" classification, is too limited. It's
>> >> third parties all the way down.
>> >>
>> >> ///ark
>> >
>> >
>> > _______________________________________________
>> > tdwg-content mailing list
>> > tdwg-content at lists.tdwg.org
>> > http://lists.tdwg.org/mailman/listinfo/tdwg-content
>> >
>>
>>
>>
>> --
>> Robert A. Morris
>> Emeritus Professor  of Computer Science
>> UMASS-Boston
>> 100 Morrissey Blvd
>> Boston, MA 02125-3390
>> Associate, Harvard University Herbaria
>> email: ram at cs.umb.edu
>> web: http://bdei.cs.umb.edu/
>> web: http://etaxonomy.org/FilteredPush
>> http://www.cs.umb.edu/~ram
>> phone (+1) 857 222 7992 (mobile)
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
>
>
> --
> ----------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> GeoSpecies Knowledge Base
> About the GeoSpecies Knowledge Base
> ------------------------------------------------------------
>



-- 
Robert A. Morris
Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: ram at cs.umb.edu
web: http://bdei.cs.umb.edu/
web: http://etaxonomy.org/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)



More information about the tdwg-content mailing list