(XML) XML?

Jean-Marc Vanel jmvanel at FREE.FR
Sat Dec 4 18:45:12 CET 1999


My summary answer to Kevin Thiele's question:

Why XML?

   * it is the standard of choice for exchanging data of any kind: textual,
     database, knowledge representation, etc.; it is growing quickly, and I
     have no doubt it will succeed along with the growth of the Internet;
   * it is extensible, and can be as formal or as informal as needed;
   * many free tools and standards exist for parsing, viewing, transforming,
     and querying it;
   * the same techniques can be applied in the client browser or on the
     server, which adds flexibility.
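
To illustrate the parsing and querying point, here is a minimal sketch using
Python's standard xml.etree.ElementTree module. The sample document, its
element names (taxa, taxon, character, state) and the helper states_for are
hypothetical, made up for this example; they are not taken from XDELTA or
from any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: invented element and attribute names.
SAMPLE = """
<taxa>
  <taxon name="Species A">
    <character name="leaf shape"><state>ovate</state></character>
    <character name="flower colour"><state>white</state></character>
  </taxon>
  <taxon name="Species B">
    <character name="leaf shape"><state>linear</state></character>
  </taxon>
</taxa>
"""


def states_for(xml_text, character_name):
    """Map each taxon name to its state for the given character."""
    root = ET.fromstring(xml_text)
    result = {}
    for taxon in root.findall("taxon"):
        for character in taxon.findall("character"):
            if character.get("name") == character_name:
                result[taxon.get("name")] = character.findtext("state")
    return result


if __name__ == "__main__":
    print(states_for(SAMPLE, "leaf shape"))
```

The same few lines work unchanged on a server or in any client that can run
Python, since only the standard library is used.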

See my main XML links at http://jmvanel.free.fr/XML.htm


And now the figures about the "verbosity" of XML:

My XML example (a single plant species from the Flora of China) compresses from
2500 bytes to 1090 (roughly 57% smaller):
http://jmvanel.free.fr/Samples/DisplayDescriptions/species_example.xml

An XDELTA example (Lepidoptera) on Leigh Dodds' site
compresses from 50500 bytes to 3600 (93% smaller), because it has a large
number of repeated tags.

I used WinZip.
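
For anyone who wants to repeat the measurement, here is a rough sketch in
Python. It uses the standard zlib module (DEFLATE, the same family of
algorithm that zip tools use) rather than WinZip itself, so the exact byte
counts will differ. The sample document is a made-up stand-in for a
tag-heavy file, and the function names are mine. The first two calls use the
rounded byte counts quoted above, so the results may be off by a point from
the percentages given.

```python
import zlib


def reduction_percent(original_size, compressed_size):
    """Percentage by which the compressed file is smaller than the original."""
    return 100.0 * (1 - compressed_size / original_size)


def deflate_reduction(data):
    """Compress bytes with zlib at the highest level and report the reduction."""
    return reduction_percent(len(data), len(zlib.compress(data, 9)))


# Hypothetical tag-heavy document: 500 near-identical records.
sample = (
    "<taxa>"
    + "".join(
        f'<taxon id="{i}"><character name="leaf shape">'
        f"<state>ovate</state></character></taxon>"
        for i in range(500)
    )
    + "</taxa>"
).encode("utf-8")

if __name__ == "__main__":
    print(f"species example: {reduction_percent(2500, 1090):.1f}% smaller")
    print(f"XDELTA example:  {reduction_percent(50500, 3600):.1f}% smaller")
    print(f"synthetic file:  {deflate_reduction(sample):.1f}% smaller")
```

The more often the same tags recur, the better the compressor can replace
them with short back-references, which is why the tag-heavy file shrinks so
much more than the single-species one.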

Now we see that all plant species would fit on a CD (650 MB), even without
compression:
2500 bytes * 250 000 species = 625 MB
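
The same arithmetic as a small Python sketch, in case you want to plug in
other figures. It assumes 1 MB = 1 000 000 bytes and that every species
description is about the size of my example; both are simplifications, and
the names are only illustrative.

```python
BYTES_PER_SPECIES = 2500          # size of the single-species example
SPECIES_COUNT = 250_000           # number of plant species assumed above
CD_CAPACITY_BYTES = 650 * 10**6   # assumption: 1 MB = 10**6 bytes


def total_bytes(bytes_per_item, item_count):
    """Uncompressed size of item_count descriptions of equal size."""
    return bytes_per_item * item_count


def fits(total, capacity):
    """True if the data fits on a medium of the given capacity."""
    return total <= capacity


if __name__ == "__main__":
    total = total_bytes(BYTES_PER_SPECIES, SPECIES_COUNT)
    print(total // 10**6, "MB needed;", "fits" if fits(total, CD_CAPACITY_BYTES) else "does not fit")
```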

And on a DVD you can have about 10 times more! On the Internet, too, the new
HTTP/1.1 protocol lets files be compressed during transmission. Two more
arguments: the verbose repeated tags are no longer repeated once the XML file
has been parsed and is in memory; and the last argument is financial: today you
can buy a 10 000 MB (10 GB) disk for 200 US$, and the price will keep falling.
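
To make the HTTP/1.1 point concrete, here is a minimal sketch of the server
side of that negotiation in Python. The client announces what it accepts in
the Accept-Encoding header, and the server labels a compressed body with
Content-Encoding. The function encode_body is a hypothetical helper, not part
of any web server, and for brevity it ignores the q-values that a full
implementation would have to honour.

```python
import gzip


def encode_body(body, accept_encoding):
    """Return (extra_headers, payload) for a response body.

    Simplified: gzip is used whenever the client lists it at all;
    q-values such as "gzip;q=0" are deliberately ignored here.
    """
    offered = [
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    ]
    if "gzip" in offered:
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body


if __name__ == "__main__":
    # Hypothetical tag-heavy XML body.
    xml = b"<taxa>" + b"<taxon><state>ovate</state></taxon>" * 200 + b"</taxa>"
    headers, payload = encode_body(xml, "gzip, deflate")
    print(headers, len(xml), "->", len(payload), "bytes on the wire")
```

The client decompresses the payload before parsing, so the XML tools on both
sides never see anything but the original document.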

So file size is NO problem for taxonomy with XML. You must understand that it's
the small price to pay for extensibility and interoperability.


Noel Cross wrote:

> On Thu, 2 Dec 1999, Kevin Thiele wrote:
>
> > Over the past couple of days I've partially implemented an export function
> > to produce Leigh's XDELTA documents (as a simple example of a possible XML
> > format) for the data in Lucid keys. I have a key to families of flowering
> > plants of Australia (240 taxa, 166 characters, 600 states). The data I'm
> > using are simple - basically a score matrix, a list of taxa and a list of
> > characters. The file sizes in three formats for these data are:
> >
> > LucID    166 kb
> > DELTA    240 kb
> > XML      c3  Mb
> >
> > And this is only the most basic XML!.


