I agree with all this Eric :-)
| While canalization is a risk, there certainly is also merit in trying to
| retain the best features of "prior art". The DELTA format has persisted for
| over 20 years, which is nearly an eternity in the IT field. It must have
| been doing one or two things right. There's no need to follow it too
| closely, but certainly we can learn quite a bit from DELTA about what works
| well and what doesn't.
No-one has suggested ignoring prior art, be it DELTA or any of the other programs. I think we have agreed several times to start with more or less a blank slate, learning, as you say, quite a bit from DELTA, Lucid, etc. about what works and what doesn't along the way.
| >2. Of all the descriptions in the world, 99.9999999% of them are not in
| >DELTA. Probably 99.999% of them are textual (natural language) descriptions.
| >Surely this should form the basis of our first challenge, methinks.
|
| I think that's actually a rather harsh assessment of DELTA's uptake. Let's
| say there are about 4x10^6 known species. We'd like descriptions for them
| all, along with descriptions of higher taxonomic levels (genera, families).
| So the magnitude of total number of descriptions is about 10^7. I'd guess
| that the number of taxa with DELTA descriptions is of the magnitude of
| 10^5. So that means roughly 1% of all taxa already have a DELTA description.
I didn't stop to consider the maths, for which my apologies. My point was simply that only a tiny minority of descriptions are in any type of standardised format at present. If your estimates are correct (and I can't judge that), then there are still 99 non-DELTA descriptions for every DELTA one (or one in any other format).
| >3. Related to 2 above, many (though by no means all) DELTA datasets are
| >already abstractions from the source (a set of natural language
| >descriptions). We should start with the source.
|
| The source? Surely the ultimate source is observations made on individual
| specimens. If we wish to start with the source, the first thing that needs
| to be done is to make sure that we have a system which can adequately
| describe a specimen in hand.
Yes, you're right about the ultimate source, but there are chains of sources. The reason I'm interested in the vast legacy of textual descriptions as source material is that I think effective semi-automated processing of these descriptions, along the lines of Pankhurst and Taylor, is not too far off (some processing is here already, of course). We should make this easier rather than harder, if at all possible. Capturing the truly ultimate direct observations is something that any Builder program supporting the new standard will have to be good at.
| (Sorry if I get too defensive about DELTA. I just can't help myself...)
Please don't think that I'm too dismissive of DELTA. I just think we should be assuming that we can do a fair bit better.
Cheers - k