Markup of text descriptions Vs. structured data (GEN)

Jim Croft jrc at ANBG.GOV.AU
Tue Nov 30 08:00:54 CET 1999

>These issues should be seen quite separately, and that we need both
>1. A markup language for text documents. This includes:
>  - existing descriptions, captured by OCR or other means, where the
>    markup would be added manually (or automated/data mining?)
>  - computer generated natural language descriptions that are
>    published as electronic documents. A new CSIRO package could have
>    a ToNat command that automatically adds the necessary, hidden
>    markup code.
>2. A data language for new observations, including repeated
>   measurements or repeated observations of categorical data, e.g.
>   shape of multiple leaves in a single specimen. Further, many
>   structures for knowledge management (data revision, annotation,
>   quality control and assessment) need to be implemented here.
>The issues overlap, and it would be beneficial to use as much common
>syntax as possible, but fundamentally I believe them to be quite


Can you explain in more detail why they should be fundamentally different?
Aren't the differences just a matter of degree, different points
on a continuum of data as it were?  And aren't the basic principles
applicable across the whole?

Is not the issue we are dealing with here one of how to deal with
biological descriptive information whether at the level of individual
components of the specimen (e.g. each of the leaves), the specimen as a
whole (e.g. the variation of leaves on a specimen) or the taxon (e.g. the
variation of leaves across all specimens of the species, or the higher
level taxon)?  Or at any other level, or mixture of levels, we might choose?

Is it not possible, and indeed preferable, to have a single scheme that
covers all of these situations?
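
To make the question concrete, here is a rough sketch of what one such
scheme might look like. It is only an illustration, not a proposal: the
Description class, its field names and the leaf lengths are all invented
for the example. The same record type is used for a single leaf, a
specimen and a taxon, and the variation at any level is derived from the
observations recorded at or below it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Description:
    """One node in a hypothetical single scheme: a leaf, a specimen or a taxon."""
    level: str   # e.g. "component", "specimen", "taxon" (labels are assumptions)
    name: str
    # Observations recorded directly at this level (invented character: leaf length).
    leaf_lengths_mm: List[float] = field(default_factory=list)
    # Lower-level descriptions that this one summarises.
    parts: List["Description"] = field(default_factory=list)

    def all_lengths(self) -> List[float]:
        """Collect observations recorded here and at every lower level."""
        values = list(self.leaf_lengths_mm)
        for part in self.parts:
            values.extend(part.all_lengths())
        return values

    def length_range(self) -> Optional[Tuple[float, float]]:
        """Return (min, max) over all collected observations, or None if there are none."""
        values = self.all_lengths()
        return (min(values), max(values)) if values else None


# Made-up data: one specimen described leaf by leaf, another described
# only at the specimen level, both belonging to the same taxon.
leaf_1 = Description("component", "leaf 1", [42.0])
leaf_2 = Description("component", "leaf 2", [55.0])
specimen_a = Description("specimen", "specimen A", parts=[leaf_1, leaf_2])
specimen_b = Description("specimen", "specimen B", [38.0, 61.0])
taxon = Description("taxon", "example species", parts=[specimen_a, specimen_b])

print(specimen_a.length_range())
print(taxon.length_range())
```

Whether a marked-up text description fits the same structure is exactly
the open question; the sketch only shows that the levels themselves need
not force different schemes.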

I am finding it a bit difficult to get a grip on this discussion and where
it is going (or where it is coming from, for that matter).  I thought I
understood (and agree with) your previous post about the need for a
definition component as distinct from the description and data in the
same manner as the DTDs of SGML, but I am now not so sure if that is what,
or all of what, is being suggested.

