Re: Taxonomic hierarchy in SDD

29 Nov 2001

I agree with all this, Eric :-)

| While canalization is a risk, there certainly is also merit in trying to
| retain the best features of "prior art". The DELTA format has persisted for
| over 20 years, which is nearly an eternity in the IT field. It must have
| been doing one or two things right. There's no need to follow it too
| closely, but certainly we can learn quite a bit from DELTA about what works
| well and what doesn't.

No-one has suggested ignoring prior art, be it DELTA or any of the other
programs. I think we have agreed several times to start with more or less a
blank slate, learning, as you say, quite a bit from DELTA, Lucid etc. about
what works and what doesn't along the way.

| >2. Of all the descriptions in the world, 99.9999999% of them are not in
| >DELTA. Probably 99.999% of them are textual (natural language) descriptions.
| >Surely this should form the basis of our first challenge, methinks.
|
| I think that's actually a rather harsh assessment of DELTA's uptake. Let's
| say there are about 4x10^6 known species. We'd like descriptions for them
| all, along with descriptions of higher taxonomic levels (genera, families).
| So the magnitude of total number of descriptions is about 10^7. I'd guess
| that the number of taxa with DELTA descriptions is of the magnitude of
| 10^5. So that means roughly 1% of all taxa already have a DELTA description.

I didn't stop to consider the maths, for which my apologies. My point was
simply that only a tiny minority of descriptions are in any type of
standardised format at present. If your estimates are correct (and I can't
judge that) then there are still 99 non-DELTA descriptions for every one in
DELTA, or in any other standardised format.
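
As a quick sanity check on that arithmetic, here is a minimal sketch that uses
only the order-of-magnitude guesses from your message (10^7 descriptions in
total, 10^5 of them in DELTA). The variable names are mine, and the inputs are
estimates rather than measured data.

```python
# Order-of-magnitude figures quoted in the message above (guesses, not data).
total_descriptions = 10**7   # species plus higher taxa such as genera and families
delta_descriptions = 10**5   # taxa guessed to already have a DELTA description

# Share of all descriptions that are in DELTA.
delta_share = delta_descriptions / total_descriptions

# How many non-DELTA descriptions exist for each DELTA one.
non_delta_per_delta = (total_descriptions - delta_descriptions) // delta_descriptions

print(f"DELTA share: {delta_share:.0%}")
print(f"Non-DELTA descriptions per DELTA one: {non_delta_per_delta}")
```

With those inputs the share comes out at 1% and the ratio at 99 to 1, which is
where the figure above comes from.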

| >3. Related to 2 above, many (though by no means all) DELTA datasets are
| >already abstractions from the source (a set of natural language
| >descriptions). We should start with the source.
|
| The source? Surely the ultimate source is observations made on individual
| specimens. If we wish to start with the source, the first thing that needs
| to be done is to make sure that we have a system which can adequately
| describe a specimen in hand.

Yes, you're right about the ultimate source, but there are chains of
sources. The reason why I'm interested in the vast legacy of textual
descriptions as source material is that I think that effective
semi-automated processing of these descriptions, along the lines of Pankhurst
and Taylor, is not too far off (some processing is here already, of course).
We should make this easier rather than harder, if at all possible. Capturing
the truly ultimate direct observations is something that any Builder program
supporting the new standard will have to be good at.

| (Sorry if I get too defensive about DELTA. I just can't help myself...)

Please don't think that I'm too dismissive of DELTA. I just think we should
be assuming that we can do a fair bit better.

Cheers - k

Kevin Thiele