At 09:41 24/07/2000 +1000, Kevin Thiele wrote:
My way of looking at it is that if DELTA is a bicycle, I'm proposing a motor bike, and you're sketching out plans for a space shuttle. Maybe I'm not being visionary enough?
Kevin's analogy is actually quite flattering to DELTA. In contrast to a space shuttle, a bicycle is simple, affordable, robust, very efficient, and usable by nearly anyone once they've had a bit of training (though admittedly the training can be off-putting to the novice).
So I would suggest that before we consign the old bike to rust away in the garage, we take a good long look at what we perceive to be its strengths and weaknesses. It has, after all, served fairly reliably for over 20 years - a remarkably long time in the data-processing field.
If I may be indulged, I'd like to quickly list a few of what I perceive to be DELTA's strengths and weaknesses.
Perceived strengths:
1) Flexibility - the DELTA system is capable of encoding a large range of character types, especially those traditionally used for identification and classification. The system has been or is being used for encoding descriptions of pottery as well as Poaceae; soils as well as spiders; viruses as well as violets.
2) Large range of translations - this was really the impetus for the initial development of DELTA. Once entered, data can be readily converted into a variety of formats for use by more specialized applications. Hence data need to be entered only once to generate natural language, conventional keys, or interactive keys, or to perform phenetic or cladistic analyses.
3) Efficiency - the format is very compact, yet is still human-readable. It is relatively easy to parse - I know of parsers written independently in Fortran, C++ (both from our CSIRO group), Pascal (Mauro Cavalcanti), and Basic (Gregor Hagedorn's DeltaAccess), and I've even helped write one in Perl.
4) Internationalization - the format makes it easy to prepare descriptions and keys (conventional or interactive) in a variety of languages, without having to re-enter data.
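To make points 2 and 4 a little more concrete, here is a minimal Python sketch of the "enter once, translate many times" idea. It is only an illustration: the in-memory layout, the names CHARACTERS, ITEMS, describe and to_matrix_row, and the sample characters are all my own assumptions, and none of it is the DELTA file format or any existing DELTA program.

```python
# Hypothetical in-memory data (NOT the DELTA file format): characters and
# their states carry wording in several languages; taxa are coded by number.
CHARACTERS = {
    1: {"name": {"en": "leaf shape", "es": "forma de la hoja"},
        "states": {1: {"en": "ovate", "es": "ovada"},
                   2: {"en": "lanceolate", "es": "lanceolada"}}},
    2: {"name": {"en": "flower colour", "es": "color de la flor"},
        "states": {1: {"en": "white", "es": "blanco"},
                   2: {"en": "yellow", "es": "amarillo"}}},
}

# One coded description: character number -> list of state numbers.
ITEMS = {"Species A": {1: [1, 2], 2: [1]}}

# Word used to join alternative states in each language.
OR_WORD = {"en": " or ", "es": " o "}


def describe(taxon, lang):
    """Render the coded description as natural language in the given language."""
    parts = []
    for char_no, state_nos in sorted(ITEMS[taxon].items()):
        char = CHARACTERS[char_no]
        states = OR_WORD[lang].join(char["states"][s][lang] for s in state_nos)
        parts.append(f"{char['name'][lang]} {states}")
    return f"{taxon}: " + "; ".join(parts) + "."


def to_matrix_row(taxon):
    """Render the same coded data as a row of state sets for numerical analysis."""
    return [set(ITEMS[taxon].get(c, [])) for c in sorted(CHARACTERS)]


print(describe("Species A", "en"))
print(describe("Species A", "es"))
print(to_matrix_row("Species A"))
```

The point of the sketch is simply that the coded values are stored once, and each output (English text, Spanish text, a matrix row) is a separate translation of the same data.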
Perceived weaknesses:
1) Limited extensibility - although the DELTA format can be extended (see, for example, the "proposed new features" document available at the DELTA web site < http://www.biodiversity.uno.edu/delta/www/programs.htm >), most extensions tend to be done on a rather ad hoc basis, and typically must be done by program developers, rather than end users (that is, DELTA's extensibility is perhaps analogous to that of HTML, rather than XML).
2) Need for new data types - several new types of information associated with taxonomy and systematics have arisen in recent years and are not well catered for in the current DELTA format. Notable examples include sequence data, hyperlinks to related material in remote locations, and links to other programs (e.g., bibliographic databases). Another aspect might be the inclusion of meta-data (e.g., who recorded this observation, and on what material?). While this sort of information can be placed into DELTA "text" characters and comments (as could virtually anything representable as a string of bytes, with a bit of effort), the use of text characters does not provide for well-structured access to the information.
3) Poor support for hierarchies or classifications - the current DELTA system offers little in the way of representing classifications of either the taxa within the treatment, or of the characters used to describe those taxa.
4) Doesn't use "industry standard" technology - because DELTA is a special-purpose format, it is not possible to obtain off-the-shelf, general purpose software which can understand the format. This restricts DELTA's ability to interact with other programs (for example, those that might use SQL queries, or make use of embedded objects).
5) Difficulty in merging or comparing datasets - it is rather difficult to combine datasets based on differing character lists, even when those character lists are fairly similar. There is no mechanism for "mapping" character states from one dataset onto those of another. (Disparate character lists are another matter entirely. My personal view is that the holy grail of a "universal" character list for, say, all of botany will tend to remain tantalizingly just out of reach, and the efforts of this group should not be distracted in that direction.)
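As a rough illustration of the kind of "mapping" mechanism missing under point 5, here is a minimal Python sketch. Everything in it is an assumption made for the example: the STATE_MAP table, the translate_item function, and the character and state numbers are hypothetical, and nothing like this exists in the current DELTA format.

```python
# Hypothetical mapping table between two similar character lists:
# (character, state) in dataset A -> (character, state) in dataset B.
STATE_MAP = {
    (1, 1): (3, 1),
    (1, 2): (3, 1),  # A distinguishes two states that B lumps into one
    (1, 3): (3, 2),
    (2, 1): (7, 2),
}


def translate_item(item, state_map):
    """Re-express an item coded against list A in terms of list B.

    `item` maps character numbers to lists of state numbers.
    Returns (translated, unmapped): the re-coded item, plus every
    (character, state) pair that has no counterpart in the mapping.
    """
    translated = {}
    unmapped = []
    for char_no, states in item.items():
        for state in states:
            target = state_map.get((char_no, state))
            if target is None:
                unmapped.append((char_no, state))
                continue
            new_char, new_state = target
            translated.setdefault(new_char, set()).add(new_state)
    return translated, unmapped


item_a = {1: [1, 2], 2: [1], 4: [2]}
translated, unmapped = translate_item(item_a, STATE_MAP)
print(translated)
print(unmapped)
```

Reporting the unmapped pairs rather than silently dropping them seems to me the important design choice: where two character lists differ, someone still has to decide what to do with the leftovers.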
Well, that's a start. Others may wish to add to it. I hope that whatever new model is developed can duplicate DELTA's strengths, but circumvent some of the weaknesses.
My view is that an XML-based approach does offer hope for achieving these goals. Like DELTA, XML somewhat bridges the gap between "document" and "database", providing for great flexibility. Parsers and translators are readily available, and XML is designed to be extensible. But getting the design "right" is going to take a lot of thought and effort.
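To show what "parsers are readily available" buys us, here is a minimal Python sketch that reads a small XML description using only the standard library's xml.etree.ElementTree. The element and attribute names (description, character, state, recordedBy) and the sample data are purely hypothetical; they are not a proposed schema, just an assumption to make the example run.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for this example only.
DOC = """\
<description taxon="Species A">
  <character name="leaf shape" recordedBy="A. Observer">
    <state>ovate</state>
    <state>lanceolate</state>
  </character>
  <character name="flower colour">
    <state>white</state>
  </character>
</description>
"""


def states_by_character(root):
    """Map each character name to the list of states recorded for it."""
    return {
        char.get("name"): [state.text for state in char.findall("state")]
        for char in root.findall("character")
    }


def recorders(root):
    """Collect the meta-data (who recorded what) as structured values."""
    return {
        char.get("name"): char.get("recordedBy")
        for char in root.findall("character")
        if char.get("recordedBy") is not None
    }


root = ET.fromstring(DOC)
print(root.get("taxon"))
print(states_by_character(root))
print(recorders(root))
```

No special-purpose parser had to be written, and the meta-data comes back as a separate, structured value rather than as free text buried in a comment. Deciding what the elements and attributes ought to be is, of course, the hard part.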
Cheers,
Eric Zurcher
CSIRO Division of Entomology
Canberra, Australia
E-mail: ericz@ento.csiro.au