Progressive Revelation

Kevin Thiele kevin.thiele at PI.CSIRO.AU
Mon Feb 28 09:47:10 CET 2000


At 15:30 24/02/00 +1100, Eric Zurcher wrote:

>6) I'm intrigued by the notion of a "Progressive Revelation model"
>(footnote 5). It sounds terribly theological - or perhaps that's
>Thiele-logical? (my apologies to Kevin, but I really can't resist bad puns).

I'm often accused of teleology, but rarely of theology.

Progressive Revelation is perhaps a new way of handling holes in data
matrices for random-access keys. The background is this:

The simplest data structure for a random-access key is a fully populated
matrix i.e. all taxa are scored for all characters/states. Works well
sometimes, especially if the taxa are highly comparable e.g. the species of
a genus or the genera of a family.
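
A minimal sketch of such a fully populated matrix, in Python. The taxa,
characters and scores are invented for illustration only, and the function
name is hypothetical, not taken from LucID or DELTA:

```python
# Hypothetical data: every taxon is scored for every character.
MATRIX = {
    "Poa annua":        {"habit": "annual",    "awn": "absent"},
    "Poa sieberiana":   {"habit": "perennial", "awn": "absent"},
    "Themeda triandra": {"habit": "perennial", "awn": "present"},
}


def remaining(matrix, answers):
    """Return the taxa whose scores agree with every answer given so far.

    Answers may be supplied for any characters, in any order, which is
    what makes the key random-access.
    """
    return sorted(
        taxon
        for taxon, scores in matrix.items()
        if all(scores[ch] == state for ch, state in answers.items())
    )
```

Because no cell is empty, scores[ch] can be looked up directly without
any special cases.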

This structure is problematic sometimes though, for two reasons. Firstly and
most simply, you may not have data for all taxa, and need to leave holes in
the matrix. Solution is simple - fill the holes with ?s and allow for this
in the key program. But it often also happens that some characters are
simply inapplicable to some taxa, or (worse) are unambiguous for some taxa
but ambiguous for others. For instance, stipules don't occur in monocots but
stipule-like structures sometimes do, and if you try scoring stipule
characters as defined for dicots against monocots you run into all sorts of
strife because of ambiguity of context. LucID can handle this to some extent
using the "present by misinterpretation" score, but the problem is in the
character definition, not the score.
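
As a rough sketch of how a key program might allow for holes, in Python.
The "?" marker is the one mentioned above. The "-" marker for inapplicable
characters, and the rule that an inapplicable score never matches, are
assumptions made for illustration, not a description of any existing
program:

```python
UNKNOWN = "?"        # no data: the taxon could have any state
INAPPLICABLE = "-"   # assumed marker: character does not apply to the taxon


def matches(score, chosen):
    """Decide whether a taxon survives when the user picks state `chosen`."""
    if score == UNKNOWN:
        # Missing data must never eliminate a taxon.
        return True
    if score == INAPPLICABLE:
        # Assumed rule: choosing any state rules out taxa that lack
        # the character altogether.
        return False
    return score == chosen
```

Note that this only deals with the score. It does nothing for the harder
problem of a character whose definition is ambiguous for some taxa.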

Sometimes a better way to handle such circumstances using LucID is to break
the key up into a hierarchically nested set of keys & subkeys. For instance,
you want to create a key to grass species of Australia but there are many
special characters needed for identifying Poa species that are either
inapplicable to or ambiguous with respect to the remaining grasses. So put
Poa as a genus in the top-level key and attach to it a subkey to Poa species
in which you can optimise your character definitions for the Poas. There are
some disadvantages to this but often the advantages (in having an optimised
rather than generalised and suboptimal character list) outweigh the
disadvantages.
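
The nesting can be pictured with a small Python sketch. The Key class, the
descend function and all the data are hypothetical, invented for
illustration, and do not reflect how LucID stores its keys:

```python
from dataclasses import dataclass, field


@dataclass
class Key:
    matrix: dict                                  # taxon -> {character: state}
    subkeys: dict = field(default_factory=dict)   # taxon -> Key

    def remaining(self, answers):
        """Taxa in this key that agree with every answer given so far."""
        return sorted(
            taxon
            for taxon, scores in self.matrix.items()
            if all(scores.get(ch) == st for ch, st in answers.items())
        )


# Subkey with characters defined specifically for Poa (invented scores).
poa_key = Key(matrix={
    "Poa annua":      {"lemma hairs": "absent"},
    "Poa sieberiana": {"lemma hairs": "present"},
})

# Top-level key in which Poa appears only as a genus.
grass_key = Key(
    matrix={"Poa": {"awn": "absent"}, "Themeda": {"awn": "present"}},
    subkeys={"Poa": poa_key},
)


def descend(key, answers):
    """Return the subkey to open next, or None if there is none yet."""
    left = key.remaining(answers)
    if len(left) == 1 and left[0] in key.subkeys:
        return key.subkeys[left[0]]
    return None
```

Each key keeps its own character list, so the Poa characters never have to
be scored against the other grasses.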

But there may be another solution - Progressive Revelation. As far as I know
no-one's done this yet, but I think it has merit. It would work like this.

Create a key to all grass species so you're working with a list of all taxa
at species level including all the Poa species. The character list has two
classes of characters - ones that are scored over all taxa (these will be
the easily generalised characters) and ones that are scored for only a
subset of taxa (the characters that are highly specific and/or not easily
generalisable). When the key program starts it splashes up the generalised
characters only. But if after answering some characters you end up with only
Poas, the program finds and adds to the character list the Poa-specific
characters. Characters are progressively revealed as you proceed through the
key, with as much depth as necessary - e.g. you may come down to a species
complex of alpine Poas and presto! some characters appear that are just the
ticket to separate them.
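
A minimal sketch of the idea in Python, under one assumed rule: a character
is revealed once it is scored for every taxon still in play. The data and
function names are invented for illustration, since as far as I know no
program does this yet:

```python
# Hypothetical data: "lemma hairs" is scored only for the Poa species.
SCORES = {
    "Poa annua":        {"habit": "annual",    "awn": "absent",
                         "lemma hairs": "absent"},
    "Poa sieberiana":   {"habit": "perennial", "awn": "absent",
                         "lemma hairs": "present"},
    "Themeda triandra": {"habit": "perennial", "awn": "present"},
}


def remaining_taxa(scores, answers):
    """Taxa whose scores agree with every answer given so far."""
    return sorted(
        taxon
        for taxon, taxon_scores in scores.items()
        if all(taxon_scores.get(ch) == st for ch, st in answers.items())
    )


def visible_characters(scores, remaining):
    """Characters scored for every remaining taxon (assumed reveal rule)."""
    if not remaining:
        return []
    shared = set.intersection(*(set(scores[t]) for t in remaining))
    return sorted(shared)
```

With no answers given, only "awn" and "habit" are offered. Once the answers
leave only the two Poa species, "lemma hairs" is scored for everything that
remains and so joins the list.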

Might work.

This seems to me to be a more natural way of approaching both the building
and running of a key. In some ways it's like a hybrid between a traditional
random-access key and a traditional nested hard-copy key, but it has more
flexibility than either.

In the context of natural-language descriptions (and more controversially)
it would also provide a challenge to what to me is the Universalist furphy
that all descriptions should be strictly comparable!

I've included these ideas in the SDD Specification because I think this
may be one for the future.

Cheers - k

ps Eric - if you incorporate these ideas into DELTA we'll need to make an
arrangement regarding due acknowledgement and royalties :-)



