[tdwg] comparison of schemas

Donat Agosti agosti at amnh.org
Fri Aug 31 13:49:02 CEST 2007


Here is the link to our new paper, which may be of interest to the TDWG community:

http://jbi.nhm.ku.edu/index.php/jbi/article/view/36/20


Biodiversity Informatics, 4, 2007, pp. 1-13

A QUANTITATIVE COMPARISON OF XML SCHEMAS FOR
TAXONOMIC PUBLICATIONS

GUIDO SAUTTER(1,3), KLEMENS BÖHM(1), AND DONAT AGOSTI(2)
(1) Department of Computer Science, Universität Karlsruhe (TH), 76128 Karlsruhe, Germany;
(2) Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024-5192, and Naturmuseum der Burgergemeinde Bern, 3005 Bern, Switzerland;
(3) sautter at ipd.uka.de
Abstract.— Large numbers of legacy taxonomic publications are currently
being digitized to make them available online and ready for full-text
search. The documents are being marked up with XML for two purposes: to
preserve the document structure, and to facilitate access via standard
query languages like XQuery. With regard to the second purpose, the choice
of an appropriate XML schema is crucial, because it affects both query
performance and the correctness of query results. Over the last few years,
several different XML schemas have been proposed as markup standards for
taxonomic publications. In this paper, we report on a thorough evaluation
and comparison of these schemas. We have examined whether they facilitate
the formulation and correct processing of queries that are common for
taxonomic literature. We also compare the performance of these queries on
documents marked up with the different schemas. Finally, we propose
extensions to the schemas that improve the correctness of query results.
Key words. — Heritage literature, quantitative analysis, systematics,
taxonomy, TaxonX, XML schema

Donat Agosti
