[tdwg-content] Fwd: Taxon Concept dilemma

Peter DeVries pete.devries at gmail.com
Fri Jul 9 18:51:13 CEST 2010


Hi Bob,

I think of the individual specimens that are included in a species concept
definition as representative of the variation that exists within the species
concept. In a sense, they serve as a guide as to what specimens are a match
to a particular species concept. I might be misinterpreting your comments,
but this use does not seem to match the formal definition of individuals
in a set.

Species description documents should be testable in the following way: given
1,000 specimens and a set of species description documents, can different
scientists repeatably match the same specimens to the same concepts about
95% of the time?
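
A minimal sketch of how such a test could be scored, assuming each
scientist's determinations are recorded as a mapping from specimen to
concept. The function names, the data layout, and the choice of pairwise
agreement as the measure are my assumptions for illustration; only the 95%
threshold comes from the text above.

```python
from itertools import combinations


def pairwise_agreement(determinations):
    """Return the fraction of pairwise comparisons that agree.

    determinations: dict mapping an identifier's name to a dict that maps
    specimen id -> concept id (hypothetical layout). Every pair of
    identifiers is compared on the specimens both of them determined.
    """
    agree = 0
    total = 0
    for a, b in combinations(sorted(determinations), 2):
        shared = determinations[a].keys() & determinations[b].keys()
        for specimen in shared:
            total += 1
            if determinations[a][specimen] == determinations[b][specimen]:
                agree += 1
    if total == 0:
        raise ValueError("no specimen was determined by two or more people")
    return agree / total


def passes_repeatability_test(determinations, threshold=0.95):
    """True if agreement reaches the threshold (95% by default)."""
    return pairwise_agreement(determinations) >= threshold
```

Pairwise agreement is only one possible measure; a chance-corrected
statistic might be preferable when a few concepts dominate the sample.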

Contrast this approach to the current method of determining whether a given
specimen matches the original species description for Culex triseriatus =>
Aedes triseriatus => Ochlerotatus triseriatus. I may be wrong but the
original type specimen seems to be missing. Attached is the original species
description.

Also consider the existing identifications of this species: what proportion
were determined based on the original description? In this example
species, the formal species description does not seem to play a role in how
these mosquitoes are actually identified.

I would suspect that this is true for a number of species identifications.
So we are modeling this domain as if the original species descriptions were
used when in actuality they were not used for the vast majority of
identifications.

- Pete




On Fri, Jul 9, 2010 at 9:28 AM, Bob Morris <morris.bob at gmail.com> wrote:

> I barely dare to poke in here since you know how little biology I
> know.  Casting in terms of  rudimentary Set Theory, let me take a
> Computer Science/Math view of the issues you raised, under the
> assumption that we are talking exactly about circumscription.
> Correspondents who don't agree to that maybe don't care about this
> post. Anyone anxious about, or uninterested in, the term "boolean
> expression" should stop reading now. Apologies if I have just
> eliminated all the readership.
>
> In Set Theory, it is rare, to my knowledge, to assign to a set a
> globally unique identifier,  or a widely used name with huge social
> impediment to change. Only "The Empty Set" comes to mind, and even
> that has several widely recognized single character orthographic
> representations.  What one names a set can never change its contents.
> However, it \is/ often the case that sets are defined by descriptive
> data, e.g. a function that determines what is or isn't in the set
> (called the indicator, or characteristic, function; see
> http://en.wikipedia.org/wiki/Indicator_function). It is somewhat
> useful for proving theorems, but in practice it is sometimes hard to
> compute and sometimes easy.  For example the
> characteristic function of the set of positive even integers is
> defined by
>    I(x) = 1 if and only if x is a positive whole number
>    and there is a positive integer y such that x=2y.
>
> Now that was easy, wasn't it? But how many readers of this post could
> quickly determine if a number is even when that number is represented
> in base 10? base 2? base 53? base 54? I bet the answer is 'all' for
> 10, 'many' for 2, and 'few or none' for 53 and 54. Thus, the name
> "Positive Even Integers" is disambiguated by the characteristic
> function, but it may still be hard to figure out the circumscription
> of that name, depending on the representation of the underlying set
> members.
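
The point about representation can be made concrete with a small sketch.
It relies on a standard fact: in an even base a number is even exactly
when its last digit is even, while in an odd base (such as 53) it is even
exactly when the sum of its digits is even. The function names and the
digit-list representation are assumptions chosen for illustration.

```python
def indicator_positive_even(n):
    """Characteristic function of the positive even integers."""
    return 1 if n > 0 and n % 2 == 0 else 0


def to_digits(n, base):
    """Digits of a non-negative integer n in the given base, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits[::-1]


def is_even_from_digits(digits, base):
    """Decide evenness from the digits alone, without rebuilding the number."""
    if base < 2 or not digits:
        raise ValueError("need a base of at least 2 and at least one digit")
    if any(not 0 <= d < base for d in digits):
        raise ValueError("digit out of range for this base")
    if base % 2 == 0:
        # Even base: every place value except the units place is even.
        return digits[-1] % 2 == 0
    # Odd base: every place value is odd, so parity follows the digit sum.
    return sum(digits) % 2 == 0
```

The set is the same in every base; only the procedure a reader must carry
out to test membership changes.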
>
> Of course, in an \application/ of Set Theory, e.g. Number Theory,
> some widely used names for specific sets come with an understood
> circumscription and a huge social impediment to changing it.
> "Positive Even Integers" is an example.
>
> In Set Theory applications, proliferation of names is often controlled
> by use of boolean expressions, especially expressions involving union,
> intersection, and negation (OR, AND, NOT) (*).  For computational use,
> it is arguably sometimes better to refer to a set as a boolean
> function of some other, temporarily named, or anonymous sets than to
> identify it with a characteristic function of its own. That's because
> there is a long literature on computational use of boolean
> expressions, both as to the computational complexity of problems
> framed in terms of them, and as to algorithms for many such problems.
> Some problems don't require computation of indicator functions. An
> example is deciding whether two boolean expressions in the same
> variables determine the same set. That is independent of
> what the variables are and how set members may be represented.
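
A minimal sketch of that equivalence check, assuming each expression is
written as a Python function of membership flags (True meaning "the
element is in that named set"). The helper name same_set is hypothetical.
The brute-force truth table used here grows as 2 to the power of the
number of variables, which is the complexity concern raised further down.

```python
from itertools import product


def same_set(expr1, expr2, n_vars):
    """True if two boolean expressions agree on every membership pattern.

    No set contents and no indicator functions are consulted: only the
    2**n_vars combinations of "in" and "out" for the named sets.
    """
    return all(
        expr1(*flags) == expr2(*flags)
        for flags in product((False, True), repeat=n_vars)
    )


# De Morgan: NOT(A OR B) names the same set as (NOT A) AND (NOT B).
lhs = lambda a, b: not (a or b)
rhs = lambda a, b: (not a) and (not b)
print(same_set(lhs, rhs, 2))

# "A OR B" and "A" differ whenever something lies in B but not in A.
print(same_set(lambda a, b: a or b, lambda a, b: a, 2))
```
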
>
> Neither assigning multiple names or GUIDs to a set, nor reassigning
> one of those from one set to some other set, can change either set.
> However, either can surely confuse human readers and both raise
> problems for use as database keys (as in the special case of taxonomic
> names).  So also can an opinion--especially a hard to find or
> obscurely applied one--such as "In my opinion the name 'Blah blah' is
> more appropriately applied to the set A OR B than it is to the set A."
>
> In programming languages, problems of shifting names usually fall
> under the rubric of variable "scope", which I don't discuss here.
>
> My conclusion: for machine use, be spare on names and GUIDs and
> generous with boolean expressions. But take care for the computational
> complexity of the problem at hand.  The latter sounds scary, but if
> the truth be known (pun intended), harmful--i.e. fundamentally
> exponential--computational complexity necessarily bites you exactly as
> badly no matter what computational methodology you use(**). Then
> arises the question: "Does generous use of boolean expressions cause
> proliferation of other resource use worse than caused by name and GUID
> management?" My answer is "It depends, but if it also leads to more
> robust reasoning, it is likely advantageous."
>
> Bob Morris
> Recovering Algebraist
>
> (*)It is sometimes startling to new students of logic who are
> comfortable with expressions made of complex combinations of OR, AND,
> and NOT,  to learn that every such expression can be represented as an
> expression using only NOR ("not or", defined by
> NOR(A,B) = NOT(A OR B); colloquially, "neither A nor B").  NOR can
> be defined without use of NOT or OR by the use of a 2x2 truth table.
>
> (**)Not exactly. Parallel computing and  probabilistic approaches such
> as quantum computing sometimes help.
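
Footnote (*) can be checked mechanically. The sketch below defines NOR
purely by its 2x2 truth table and then builds NOT, OR, and AND from it.
The constructions are the standard ones; the Python names are chosen here
for illustration.

```python
# NOR defined only by its truth table, with no use of "not" or "or".
NOR_TABLE = {
    (False, False): True,
    (False, True): False,
    (True, False): False,
    (True, True): False,
}


def nor(a, b):
    return NOR_TABLE[(a, b)]


def not_(a):
    # NOT A = NOR(A, A)
    return nor(a, a)


def or_(a, b):
    # A OR B = NOT(NOR(A, B))
    return nor(nor(a, b), nor(a, b))


def and_(a, b):
    # A AND B = NOR(NOT A, NOT B)
    return nor(nor(a, a), nor(b, b))
```
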
>
>
>
> On Wed, Jul 7, 2010 at 10:28 AM, Richard Pyle <deepreef at bishopmuseum.org> wrote:
> >
> > Hi Mark,
> >
> > I agree with everything you say (which is why I think that essentially
> > everything related to "taxa" should be represented through Usage
> > Instances). I also agree that taxonomists do not often articulate the
> > scope of their taxon concepts by enumerating the included organisms.
> > However, I would argue that when most (all?) taxonomists conceive of a
> > taxon concept, the "essence" of the concept is the set of organisms
> > implied to be circumscribed by it. Thus, there is an historical
> > disconnect between what a taxonomist means by a taxon concept, and how
> > a taxonomist articulates the scope of that concept. And therein lies
> > what I think is the biggest biodiversity informatics challenge.  That
> > is, one of the most fundamental units of biology has a history of
> > being very imprecisely defined by the practitioners who establish
> > those units.
> >
> > Aloha,
> > Rich
> >
> >> -----Original Message-----
> >> From: Mark Wilden [mailto:mark at mwilden.com]
> >> Sent: Tuesday, July 06, 2010 5:46 AM
> >> To: Richard Pyle
> >> Subject: Re: [tdwg-content] Taxon Concept dilemma
> >>
> >> On Tue, Jul 6, 2010 at 6:10 AM, Richard Pyle
> >> <deepreef at bishopmuseum.org> wrote:
> >>
> >> > This is why the only way we're going to be able to establish
> >> > RelationshipAssertions (sensu TCS) is via third-party assertions.
> >> > In other words, someone is going to have to assert an opinion over
> >> > whether the implied members of Smith's Aus bus would have included
> >> > the population in Hawaii, and whether the implied set of Jones'
> >> > Aus cus would have included the population in the Marshall Islands.
> >>
> >> I think that a "someone" is always asserting such an opinion
> >> - Smith and Jones included. There is no Platonic ideal of a particular
> >> species. Every single classification is a matter of educated opinion.
> >> Smith has one opinion and Jones has another opinion. Brown may step in
> >> and decide that Smith's opinion is the correct one - but that's just
> >> another opinion. Consumers of the classification choose whose opinions
> >> are the most useful.
> >>
> >> A taxon is always related to a taxon-assigner. In this sense,
> >> "circumscription" is perhaps not the best way to think about it,
> >> because very few assigners actually determine taxa by enumerating
> >> organisms.
> >>
> >> The idea of researchers creating taxa, and third parties adjudicating
> >> them to arrive at the "true" classification, is too limited. It's
> >> third parties all the way down.
> >>
> >> ///ark
> >
> >
> > _______________________________________________
> > tdwg-content mailing list
> > tdwg-content at lists.tdwg.org
> > http://lists.tdwg.org/mailman/listinfo/tdwg-content
> >
>
>
>
> --
> Robert A. Morris
> Emeritus Professor  of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
> Associate, Harvard University Herbaria
> email: ram at cs.umb.edu
> web: http://bdei.cs.umb.edu/
> web: http://etaxonomy.org/FilteredPush
> http://www.cs.umb.edu/~ram
> phone (+1) 857 222 7992 (mobile)



-- 
----------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
GeoSpecies Knowledge Base
About the GeoSpecies Knowledge Base
------------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Type_Description_Culex_triseriatus.png
Type: image/png
Size: 276301 bytes
Desc: not available
Url : http://lists.tdwg.org/pipermail/tdwg-content/attachments/20100709/a0575570/attachment.png 

