My summary answer to Kevin Thiele's question:
Why XML?
* It is the ultimate standard for exchanging data of any kind: textual, database, knowledge representation, etc. The standard is growing quickly, and there is no doubt it will succeed along with the growth of the Internet.
* It is extensible, and can be as formal or as informal as needed.
* Many free tools and standards exist for parsing, viewing, transforming and querying.
* The same techniques can be applied in the client browser or on the server, which adds flexibility.
See my main XML links at http://jmvanel.free.fr/XML.htm
And now the figures about the "verbosity" of XML:
My XML example (a single plant species from the flora of China) compresses from 2500 bytes to 1090 (a saving of about 56%): http://jmvanel.free.fr/Samples/DisplayDescriptions/species_example.xml
An Xdelta example (Lepidoptera) on Leigh Dodds' site compresses from 50500 bytes to 3600 (93%), because it has a large number of repeated tags.
I used WinZip.
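To make the point about repeated tags concrete, here is a rough sketch in Python. It is not the Xdelta file measured above: the markup is made up for illustration, and zlib's DEFLATE stands in for WinZip (the usual zip method is also DEFLATE, but the exact figures will differ).

```python
import zlib


def saving(data: bytes) -> float:
    """Return the fraction of bytes saved by DEFLATE compression (0.0 to 1.0)."""
    return 1 - len(zlib.compress(data, 9)) / len(data)


# Hypothetical markup: 500 items that all share the same tag names,
# which is what makes tag-heavy XML so compressible.
items = "".join(
    f'<item><character id="{i}"/><state value="{i % 7}"/></item>\n'
    for i in range(500)
)
doc = f"<dataset>\n{items}</dataset>\n".encode("utf-8")

print(f"{len(doc)} bytes uncompressed, saving {saving(doc):.0%}")
```

The more the same tags recur, the larger the saving, which is consistent with the larger file above compressing better than the single-species one.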
So all plant species would fit on a CD (650 MB), even without compression: 2500 bytes * 250 000 species = 625 MB
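The same back-of-envelope estimate as a small Python check. The figures are the ones quoted above; the only assumption added here is that a megabyte is counted as 1 000 000 bytes.

```python
BYTES_PER_SPECIES = 2500    # uncompressed size of the single-species example
SPECIES_COUNT = 250_000     # number of plant species used in the estimate
CD_CAPACITY_MB = 650        # CD capacity quoted in the text
BYTES_PER_MB = 1_000_000    # assumption: decimal megabytes

total_mb = BYTES_PER_SPECIES * SPECIES_COUNT / BYTES_PER_MB
fits_on_cd = total_mb <= CD_CAPACITY_MB

print(f"{total_mb:.0f} MB needed, fits on one CD: {fits_on_cd}")
```

Applying a compression saving on top of this would only widen the margin.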
And on a DVD you can have about 10 times more! On the Internet too, with the new HTTP 1.1 protocol, files can be compressed during transmission. Two more arguments: the verbose repeated tags are no longer repeated once the XML file has been parsed and is in memory; and, financially, today you can buy a 10 000 MB disk for 200 US$, and prices will keep falling.
So file size is NO problem for taxonomy with XML. It is a small price to pay for extensibility and interoperability.
Noel Cross wrote:
On Thu, 2 Dec 1999, Kevin Thiele wrote:
Over the past couple of days I've partially implemented an export function to produce Leigh's XDELTA documents (as a simple example of a possible XML format) for the data in Lucid keys. I have a key to families of flowering plants of Australia (240 taxa, 166 characters, 600 states). The data I'm using are simple - basically a score matrix, a list of taxa and a list of characters. The file sizes in three formats for these data are:
LucID   166 kb
DELTA   240 kb
XML     c. 3 Mb
And this is only the most basic XML!