Re: Progressive Revelation

28 Feb 2000

      ...
From: Kevin Thiele <kevin.thiele@PI.CSIRO.AU>
To: TDWG - Structure of Descriptive Data <TDWG-SDD@USOBI.ORG>
...
Create a key to all grass species so you're working with a list of all
taxa at species level including all the Poa species. The character list
has two classes of characters - ones that are scored over all taxa (these
will be the easily generalised characters) and ones that are scored for
only a subset of taxa (the characters that are highly specific and/or not
easily generalisable). When the key program starts it splashes up the
generalised characters only. But if after answering some characters you
end up with only Poas, the program finds and adds to the character list
the Poa-specific characters. Characters are progressively revealed as you
proceed through the key, with as much depth as necessary - e.g. you may
come down to a species complex of alpine Poas and presto! some characters
appear that are just the ticket to separate them.

Something like this, but less rigid, is achieved automatically by algorithms
for finding 'best' characters. A 'best' algorithm typically has a penalty
for characters which are unknown, inapplicable, or variable for some taxa,
but it does not completely exclude them. The 'best' algorithm used in our
programs Intkey (interactive identification) and Key (generation of
conventional keys) has a natural penalty for such characters, arising from
the goal of minimizing the average length of an identification. The
algorithm also has a parameter, Varywt, which can be used to add an
arbitrary penalty for such characters. However, an added penalty tends to
increase the average length of an identification, so using Varywt in Intkey
is not recommended. A value of 0 for Varywt would have an effect similar to that
proposed above by Kevin.
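
The contrast between the two approaches can be sketched in a few lines of
Python. This is only a toy illustration, not the algorithm used in Intkey or
Key: the data, the function names, the pair-counting measure of separating
power, and the `weight` parameter (standing in loosely for Varywt) are all
assumptions made for the example, and only missing scores are modelled, not
intra-taxon variability.

```python
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical toy data: taxon -> {character: state}. A missing entry means
# the character is unknown or inapplicable for that taxon.
SCORES: Dict[str, Dict[str, str]] = {
    "Poa A":   {"habit": "tufted", "awn": "absent",  "lemma_hairs": "hairy"},
    "Poa B":   {"habit": "tufted", "awn": "absent",  "lemma_hairs": "glabrous"},
    "Festuca": {"habit": "tufted", "awn": "present"},
    "Bromus":  {"habit": "annual", "awn": "present"},
}
CHARS: List[str] = ["habit", "awn", "lemma_hairs"]
ALL: Set[str] = set(SCORES)
POAS: Set[str] = {"Poa A", "Poa B"}


def revealed_characters(remaining: Set[str]) -> List[str]:
    """Rigid rule: offer only characters scored for every remaining taxon."""
    return [c for c in CHARS if all(c in SCORES[t] for t in remaining)]


def separating_power(remaining: Set[str], character: str) -> int:
    """Toy measure: number of taxon pairs the character tells apart.

    Pairs involving an unscored taxon are never counted, so characters with
    missing scores are penalised naturally, without any extra parameter.
    """
    return sum(
        1
        for a, b in combinations(sorted(remaining), 2)
        if character in SCORES[a]
        and character in SCORES[b]
        and SCORES[a][character] != SCORES[b][character]
    )


def ranked_characters(remaining: Set[str], weight: float) -> List[str]:
    """Soft rule: rank by separating power, scaled by `weight` (0..1) when
    the character is unscored for some remaining taxon. Useless characters
    (score 0) are dropped, so weight 0 excludes partly scored characters."""
    scored = []
    for c in CHARS:
        score = float(separating_power(remaining, c))
        if any(c not in SCORES[t] for t in remaining):
            score *= weight  # arbitrary extra penalty (assumed form)
        if score > 0:
            scored.append((c, score))
    return [c for c, _ in sorted(scored, key=lambda cs: -cs[1])]


print(revealed_characters(ALL))      # rigid rule, all taxa remaining
print(ranked_characters(ALL, 1.0))   # natural penalty only
print(ranked_characters(ALL, 0.0))   # partly scored characters excluded
print(ranked_characters(POAS, 0.0))  # Poa-specific character appears
```

In this sketch, a weight of 1 leaves the partly scored character in the list
but ranks it low, while a weight of 0 hides it until only the Poas remain,
which is the behaviour of the rigid rule.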

If you try this in Intkey, note that a value of 0 is treated as 0.01, an ad
hoc adjustment made specifically to _avoid_ the complete exclusion of the
characters. In a data set with a substantial number of missing or
inapplicable values (e.g. the sample data supplied with the DELTA programs),
it is easy to observe how low values of Varywt cause characters with
comparatively low separating power (and which therefore result in longer
identifications) to move towards the top of the 'best' list.

Varywt is primarily intended for use in Key, where the increased average
length of an identification is offset by a reduction in the _printed_ length
of the key. In Key, a value of 0 completely prevents the use of characters
with intra-taxon variability.

--

Mike Dallwitz

CSIRO Entomology, GPO Box 1700, Canberra ACT 2601, Australia
Phone: +61 2 6246 4075   Fax: +61 2 6246 4000
Email: md@ento.csiro.au  Internet: biodiversity.uno.edu/delta/
