Space shuttles and bicycles

Eric Zurcher ericz at ENTO.CSIRO.AU
Tue Jul 25 11:29:22 CEST 2000


At 09:41 24/07/2000 +1000, Kevin Thiele wrote:
>My way of looking at it is that if DELTA is a bicycle, I'm proposing a
>motor bike, and you're sketching out plans for a space shuttle. Maybe I'm
>not being visionary enough?

Kevin's analogy is actually quite flattering to DELTA. In contrast to a
space shuttle, a bicycle is simple, affordable, robust, very efficient, and
usable by nearly anyone once they've had a bit of training (though
admittedly the training can be off-putting to the novice).

So I would suggest that before we consign the old bike to rust away in the
garage, we take a good long look at what we perceive to be its strengths
and weaknesses. It has, after all, served fairly reliably for over 20 years
- a remarkably long time in the data-processing field.

If I may be indulged, I'd like to quickly list a few of what I perceive to
be DELTA's strengths and weaknesses.

Perceived strengths:

1) Flexibility - the DELTA system is capable of encoding a large range of
character types, especially those traditionally used for identification and
classification. The system has been or is being used for encoding
descriptions of pottery as well as Poaceae; soils as well as spiders;
viruses as well as violets.

2) Large range of translations - this was really the impetus for the
initial development of DELTA. Once entered, data can be readily converted
into a variety of formats for usage by more specialized applications. Hence
data needs to be entered only once to generate natural language,
conventional keys, or interactive keys, or to perform phenetic or cladistic
analyses.

3) Efficiency - the format is very compact, yet is still human-readable. It
is relatively easy to parse - I know of parsers written independently in
Fortran, C++ (both from our CSIRO group), Pascal (Mauro Cavalcanti), and
Basic (Gregor Hagedorn's DeltaAccess), and I've even helped write one in Perl.

4) Internationalization - the format makes it easy to prepare descriptions
and keys (conventional or interactive) in a variety of languages, without
having to re-enter data.

Perceived weaknesses:

1) Limited extensibility - although the DELTA format can be extended (see,
for example, the "proposed new features" document available at the DELTA
web site < http://www.biodiversity.uno.edu/delta/www/programs.htm >), most
extensions tend to be done on a rather ad hoc basis, and typically must be
done by program developers, rather than end users (that is, DELTA's
extensibility is perhaps analogous to that of HTML, rather than XML).

2) Need for new data types - several new types of information associated
with taxonomy and systematics have arisen in recent years that are
not well catered for in the current DELTA format. Notable examples include
sequence data, hyperlinks to related material in remote locations, and
links to other programs (e.g., bibliographic databases). Another aspect
might be the inclusion of meta-data (e.g., who recorded this observation,
and on what material?). While this sort of information can be placed into
DELTA "text" characters and comments (as could virtually anything
representable as a string of bytes, with a bit of effort), the use of text
characters does not provide for well-structured access to the information.

3) Poor support for hierarchies or classifications - the current DELTA
system offers little in the way of representing classifications of either
the taxa within the treatment, or of the characters used to describe those
taxa.

4) Doesn't use "industry standard" technology - because DELTA is a
special-purpose format, it is not possible to obtain off-the-shelf, general
purpose software which can understand the format. This restricts DELTA's
ability to interact with other programs (for example, those that might use
SQL queries, or make use of embedded objects).

5) Difficulty in merging or comparing datasets - it is rather difficult to
combine datasets based on differing character lists, even when those
character lists are fairly similar. There is no mechanism for "mapping"
character states from one dataset onto those of another. (Disparate
character lists are another matter entirely. My personal view is that the
holy grail of a "universal" character list for, say, all of botany will
tend to remain tantalizingly just out of reach, and the efforts of this
group should not be distracted in that direction.)
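
To make the "mapping" idea in point 5 concrete, here is a minimal sketch
in Python of what such a mechanism might look like. It is not part of
DELTA or of any existing program: the function name, the data layout, and
the two tiny character lists are all invented for illustration, and it
assumes only one state per character, which real descriptions often
exceed.

```python
# Hypothetical sketch: every name and value here is invented for illustration.
from typing import Dict, List, Tuple

# A (character number, state number) pair within one dataset's character list.
StateKey = Tuple[int, int]


def remap_item(
    item: Dict[int, int], mapping: Dict[StateKey, StateKey]
) -> Tuple[Dict[int, int], List[StateKey]]:
    """Translate one taxon's coded states from dataset A's list to dataset B's.

    Returns the translated item, plus the (character, state) pairs from A
    that have no counterpart in B and so could not be carried across.
    """
    translated: Dict[int, int] = {}
    unmapped: List[StateKey] = []
    for char, state in item.items():
        target = mapping.get((char, state))
        if target is None:
            unmapped.append((char, state))
        else:
            new_char, new_state = target
            translated[new_char] = new_state
    return translated, unmapped


# Dataset A: character 1 = leaf margin (1 entire, 2 toothed);
#            character 2 = flower colour (1 white, 2 red).
# Dataset B: character 5 = leaf margin (1 toothed, 2 entire);
#            flower colour is not recorded at all.
A_TO_B = {(1, 1): (5, 2), (1, 2): (5, 1)}

taxon_in_a = {1: 2, 2: 1}  # toothed margin, white flowers
translated, unmapped = remap_item(taxon_in_a, A_TO_B)
```

The explicit "unmapped" list is the point of the design: where two
character lists only partly overlap, the leftover states are reported
rather than silently dropped.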

Well, that's a start. Others may wish to add to it. I hope that whatever
new model is developed can duplicate DELTA's strengths, but circumvent some
of the weaknesses.

My view is that an XML-based approach does offer hope for achieving these
goals. Like DELTA, XML somewhat bridges the gap between "document" and
"database", providing for great flexibility. Parsers and translators are
readily available, and XML is designed to be extensible. But getting the
design "right" is going to take a lot of thought and effort.
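
As a rough illustration of why XML looks promising, the sketch below
shows how nested elements might carry both a classification and the
meta-data mentioned under weakness 2, and how an off-the-shelf parser
(here Python's standard xml.etree.ElementTree) can read them. The
element and attribute names, and the sample record, are invented purely
for illustration; they are not a proposed schema.

```python
# Hypothetical sketch: the element names, attribute names, and sample
# record are invented for illustration and are not a proposed standard.
import xml.etree.ElementTree as ET

SAMPLE = """
<dataset>
  <taxon name="Poaceae" rank="family">
    <taxon name="Poa" rank="genus">
      <observation character="leaf margin" state="entire"
                   recorder="A. Botanist" material="specimen 123"/>
    </taxon>
  </taxon>
</dataset>
"""

root = ET.fromstring(SAMPLE)


def walk(element, depth=0):
    """Yield (depth, name) for each nested <taxon>, preserving the hierarchy."""
    for child in element.findall("taxon"):
        yield depth, child.get("name")
        yield from walk(child, depth + 1)


# The classification falls out of the nesting itself.
hierarchy = list(walk(root))

# Meta-data is reachable as structured attributes rather than free text.
recorders = [obs.get("recorder") for obs in root.iter("observation")]
```

Nothing here is specific to taxonomy: the parser is general-purpose,
which is exactly the "industry standard" property DELTA lacks. Deciding
what the elements should actually be is the hard part.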

Cheers,


Eric Zurcher
CSIRO Division of Entomology
Canberra, Australia
E-mail: ericz at ento.csiro.au



