Markup of text descriptions Vs. structured data (GEN)

Tue Nov 30 15:17:38 CET 1999

Jim Croft wrote:

> Can you explain in more detail why they should be fundamentally different?
> Aren't the differences just a matter of degree, different points
> on a continuum of data as it were?  And aren't the basic principles
> applicable across the whole?

>>The issues overlap, and it would be beneficial to use as much common
>>syntax as possible, but fundamentally I believe them to be quite
>>different.

I completely agree with you that the issues overlap and should be treated
together. However, I still see them as distinct, so that they can be
handled by separate groups of commands that use a common system.
But maybe that is only my attempt to keep my brain from blowing up in
confusion :-)

Could you perhaps give examples where you see the continuum?
The differences I see (see also the example in the other post) are:
o Free text that is marked up may be written in different languages,
  while the underlying data should be language-independent
o Similarly, on a more subtle level, the concepts used by one author
  may map only approximately onto the concepts developed in the
  character definition
o Data may be more detailed than the text; detailed data may map
  unambiguously onto a coarse natural-language representation, but
  not necessarily the reverse
o Data carry many additional knowledge-management attributes:
  multiple authoring/editing/revision layers, annotation and
  assessment, quality control, status regarding the data source,
  passing information up and down the taxonomic hierarchy,
  assumptions about possible misinterpretations, etc.
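To illustrate the first difference, here is a minimal Python sketch. It assumes a purely hypothetical XML vocabulary (the elements `description`, `char`, `state` and ids such as `leaf_shape` or `ovate` are invented for this example, not taken from any existing standard): an English and a German marked-up text differ as text, yet both point to the same language-independent character and state identifiers, so the extracted data are identical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: free text in two languages. The "ref" attributes
# point to language-independent character and state identifiers.
ENGLISH = """<description xml:lang="en">
  <char ref="leaf_shape">Leaves <state ref="ovate">ovate</state></char>,
  <char ref="petal_colour">petals <state ref="yellow">yellow</state></char>.
</description>"""

GERMAN = """<description xml:lang="de">
  <char ref="leaf_shape">Blätter <state ref="ovate">eiförmig</state></char>,
  <char ref="petal_colour">Kronblätter <state ref="yellow">gelb</state></char>.
</description>"""

# ElementTree expands the predefined "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def coded_data(markup: str) -> dict[str, str]:
    """Extract the language-independent character -> state pairs."""
    root = ET.fromstring(markup)
    data = {}
    for char in root.iter("char"):
        state = char.find("state")
        if state is not None:
            data[char.get("ref")] = state.get("ref")
    return data


def text_language(markup: str) -> str:
    """Return the language declared for the marked-up free text."""
    return ET.fromstring(markup).get(XML_LANG)
```

Here `coded_data(ENGLISH) == coded_data(GERMAN)` holds, while `text_language` returns "en" and "de" respectively: the language belongs to the text, not to the data.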

Much information may be used for markup as well as for data, so there
will be overlap. For example, multiple independent hierarchical
classifications may be defined for a given character definition (by
morphological structure, perhaps according to different schools, by
method, by function); the natural-language definition, however, will
usually use only a single classification. This classification is
superfluous in the item-description part of the data, but it is
desirable to add it to the markup. Or isn't it? If it is not added,
we could not use the idea, put forward by Don Kirkup and Bryan Heidorn
at the TDWG meeting, of using a coarse, structure-only markup as a
first step to limit the scope of XML queries.
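A minimal Python sketch of the point above, under loudly stated assumptions: the classification names, the `structure` element and the helpers `characters_in` and `search_within` are all hypothetical, and this is only an illustration, not a proposal from the TDWG meeting. The character definitions carry several independent classifications, while the text markup uses just one coarse, morphological one, which is already enough to restrict a search to the relevant part of a description.

```python
import xml.etree.ElementTree as ET

# Hypothetical character definitions: each character may be placed in
# several independent hierarchical classifications at once.
CHARACTERS = {
    "leaf_shape": {
        "morphology": ("shoot", "leaf", "lamina"),
        "function": ("photosynthesis",),
    },
    "petal_colour": {
        "morphology": ("flower", "corolla", "petal"),
        "function": ("pollinator attraction",),
    },
}

# Hypothetical coarse, structure-only markup of a natural-language
# description: a single (morphological) classification, top level only.
DESCRIPTION = """<description>
  <structure name="shoot">Leaves ovate, 3-5 cm long, hairy.</structure>
  <structure name="flower">Petals yellow, hairy at base.</structure>
</description>"""


def characters_in(scheme: str, node: str) -> list[str]:
    """Characters whose path in the given classification contains node."""
    return sorted(
        name for name, schemes in CHARACTERS.items()
        if node in schemes.get(scheme, ())
    )


def search_within(markup: str, structure: str, term: str) -> list[str]:
    """Step 1: keep only text marked up as `structure` (coarse filter).
    Step 2: plain substring search inside those segments only."""
    root = ET.fromstring(markup)
    hits = []
    for seg in root.iter("structure"):
        if seg.get("name") == structure and term in (seg.text or ""):
            hits.append(seg.text)
    return hits
```

For instance, `search_within(DESCRIPTION, "flower", "hairy")` returns only the petal sentence, whereas an unrestricted search for "hairy" would also hit the leaf text. The functional classification exists only in `CHARACTERS`; the markup never needs it.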

Gregor
----------------------------------------------------------
Gregor Hagedorn                 G.Hagedorn at bba.de
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Koenigin-Luise-Str. 19          Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203

Often wrong but never in doubt!