Markup of text descriptions vs. structured data (GEN)

Mon Nov 29 18:09:53 CET 1999

Jean-Marc Varel, Don Kirkup and others have repeatedly touched on the
problem of existing textual descriptions in their contributions to
this list.

On the other hand, I myself belong to the group of DELTA/NEXUS-minded
people, who are concerned mostly with knowledge management of
newly created descriptive data.

I think that these issues should be seen quite separately, and that
we need both:

1. A markup language for text documents. This includes:
  - existing descriptions, captured by OCR or other means, where the
    markup would be added manually (or automatically, by data mining?)
  - computer-generated natural-language descriptions that are
    published as electronic documents. A new CSIRO package could have
    a ToNat command that automatically adds the necessary, hidden
    markup code.
2. A data language for new observations, including repeated
   measurements or repeated observations of categorical data, e.g.
   shape of multiple leaves in a single specimen. Further, many
   structures for knowledge management (data revision, annotation,
   quality control and assessment) need to be implemented here.
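
As a rough illustration of the two needs listed above, here is a small
Python sketch. It is only a sketch under stated assumptions: the tag
name "char", its attributes, the example taxon and values, and the
Observation/Specimen classes are all made up for this example and are
not taken from DELTA, NEXUS or any proposed standard. Part (1) keeps
the prose intact and only annotates it; part (2) has no prose at all
and stores each repeated observation as its own record, with room for
annotation.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from dataclasses import dataclass, field

# (1) Markup of a text description. The wording of the description is
# untouched; the hypothetical <char> elements merely annotate it.
MARKED_UP = (
    '<description taxon="Example species">'
    'Leaves <char name="leaf shape" state="ovate">ovate</char>, '
    '<char name="leaf length" unit="cm" min="3" max="5">3-5 cm long</char>.'
    '</description>'
)


def plain_text(xml_source: str) -> str:
    """Recover the original prose by dropping all markup."""
    return "".join(ET.fromstring(xml_source).itertext())


def marked_characters(xml_source: str) -> dict:
    """Map each annotated character name to the text span it covers."""
    root = ET.fromstring(xml_source)
    return {el.get("name"): el.text for el in root.iter("char")}


# (2) Structured data for new observations. Each measurement is a
# separate record, so repeated observations on one specimen are kept
# individually instead of being summarised in prose.
@dataclass
class Observation:
    character: str
    value: object
    observer: str = ""   # hypothetical field for quality control
    note: str = ""       # hypothetical field for annotation


@dataclass
class Specimen:
    identifier: str
    observations: list = field(default_factory=list)

    def values(self, character: str) -> list:
        """All recorded values of one character, in recording order."""
        return [o.value for o in self.observations
                if o.character == character]


specimen = Specimen("specimen-1")
for shape in ("ovate", "ovate", "lanceolate"):
    specimen.observations.append(Observation("leaf shape", shape))
for length_cm in (3.2, 4.1, 4.7):
    specimen.observations.append(
        Observation("leaf length", length_cm, note="measured in cm"))

print(plain_text(MARKED_UP))
print(marked_characters(MARKED_UP))
print(Counter(specimen.values("leaf shape")))
print(specimen.values("leaf length"))
```

In (1) the text is primary and the structure is recovered from it; in
(2) the structure is primary and any text would have to be generated
from it. The two could share names for characters and states, but the
containers are clearly different.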

The issues overlap, and it would be beneficial to use as much common
syntax as possible, but fundamentally I believe them to be quite
different.

See also the separate post "Item description data in XML (XML)" for a
discussion of how to achieve this in XML, with XML examples.

Gregor
----------------------------------------------------------
Inst. for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Gregor Hagedorn                 Net: G.Hagedorn at bba.de
Koenigin-Luise-Str. 19          Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203

Often wrong but never in doubt!