People interested in XML file size may want to look at XMill, a new free program from AT&T Research Labs and UPenn. XMill is said to compress data up to twice as well as gzip does on the original proprietary format, provided the data is first rendered in XML and then compressed with XMill. This is essentially because XMill can use the tag information to reorganize the data so that similar values sit together, which gives the compressor more redundancy to exploit.
http://www.research.att.com/sw/tools/xmill/
points to the software and technical material about it.
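To make the "reorganize by tag" idea concrete, here is a minimal Python sketch. It is not XMill's actual algorithm or file format; it only illustrates the general principle of separating the tag structure from the text values and grouping the values by tag before handing each group to a general-purpose compressor (zlib here). The function names `split_by_tag` and `compress_grouped` are made up for illustration, and the sketch ignores attributes and mixed content.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_by_tag(xml_text):
    """Split a document into a structure stream and per-tag value containers.

    Hypothetical helper for illustration; attributes and tail text are ignored.
    """
    root = ET.fromstring(xml_text)
    structure = []                  # tag open/close tokens in document order
    containers = defaultdict(list)  # tag name -> text values in document order

    def walk(elem):
        structure.append("<" + elem.tag)
        text = (elem.text or "").strip()
        if text:
            containers[elem.tag].append(text)
            structure.append("#")   # placeholder: "a value goes here"
        for child in elem:
            walk(child)
        structure.append(">")       # close the current element

    walk(root)
    return structure, dict(containers)


def compress_grouped(xml_text):
    """Compress the structure stream and each tag's values separately."""
    structure, containers = split_by_tag(xml_text)
    blobs = {"__structure__": zlib.compress("\n".join(structure).encode("utf-8"))}
    for tag, values in containers.items():
        blobs[tag] = zlib.compress("\n".join(values).encode("utf-8"))
    return blobs


sample = (
    "<log>"
    "<entry><host>a.example</host><status>200</status></entry>"
    "<entry><host>b.example</host><status>404</status></entry>"
    "</log>"
)
structure, containers = split_by_tag(sample)
blobs = compress_grouped(sample)
```

In this sample, all `host` values end up in one container and all `status` values in another, so each compressed stream holds values of a similar kind. Whether that beats plain gzip on a given file depends on the data; the sketch makes no size claims.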
One of the authors, Dan Suciu, is also a co-author of "Data on the Web: From Relations to Semistructured Data and XML", Morgan Kaufmann. This is a great book whose title is accurate. To read it, you have to be comfortable with graph theory, regular expressions, and finite state automata. If you are not, you can accept this as a summary: XML is more general than relational databases and a little less general than pure object-oriented databases.
Robert A. (Bob) Morris