Re: Markup of text descriptions Vs. structured data (GEN)
These issues should be seen quite separately, and we need both:
- A markup language for text documents. This includes:
- existing descriptions, captured by OCR or other means, where the markup would be added manually (or automatically, e.g. by data mining?)
- computer generated natural language descriptions that are published as electronic documents. A new CSIRO package could have a ToNat command that automatically adds the necessary, hidden markup code.
- A data language for new observations, including repeated measurements or repeated observations of categorical data, e.g. the shape of multiple leaves in a single specimen. Further, many structures for knowledge management (data revision, annotation, quality control and assessment) need to be implemented here.
The issues overlap, and it would be beneficial to use as much common syntax as possible, but fundamentally I believe them to be quite different.
Gregor
Can you explain in more detail why they should be fundamentally different? Aren't the differences just a matter of degree, different points on a continuum of data as it were? And aren't the basic principles applicable across the whole?
Is not the issue we are dealing with here one of how to deal with biological descriptive information, whether at the level of individual components of the specimen (e.g. each of the leaves), the specimen as a whole (e.g. the variation of leaves on a specimen) or the taxon (e.g. the variation of leaves across all specimens of the species, or the higher level taxon)? Or at any other level, or mixture of levels, we might choose?
Is it not possible, and indeed preferable, to have a single scheme that covers all of these situations?
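To make the question about levels concrete, here is a minimal sketch in Python. It is an illustration only, not a format anyone in this thread has proposed: the `Observation` record, the `summarise` function and the sample data ("Taxon A", "specimen 1" and so on) are all hypothetical. It assumes that every observation is recorded once, at the finest level (a single leaf), and that specimen-level and taxon-level descriptions are derived by grouping the same records.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One recorded character state (hypothetical record layout)."""
    taxon: str
    specimen: str
    component: str   # e.g. an individual leaf
    character: str   # e.g. "leaf shape"
    state: str       # e.g. "ovate"


# Grouping keys for the three levels mentioned in the text.
LEVELS = {
    "component": lambda o: (o.taxon, o.specimen, o.component),
    "specimen": lambda o: (o.taxon, o.specimen),
    "taxon": lambda o: (o.taxon,),
}


def summarise(observations, level):
    """Count the states of each character, grouped at the given level."""
    group_key = LEVELS[level]
    summary = {}
    for obs in observations:
        key = (group_key(obs), obs.character)
        summary.setdefault(key, Counter())[obs.state] += 1
    return summary


# Made-up sample data: three leaves on two specimens of one taxon.
OBSERVATIONS = [
    Observation("Taxon A", "specimen 1", "leaf 1", "leaf shape", "ovate"),
    Observation("Taxon A", "specimen 1", "leaf 2", "leaf shape", "lanceolate"),
    Observation("Taxon A", "specimen 2", "leaf 1", "leaf shape", "ovate"),
]

if __name__ == "__main__":
    for level in ("component", "specimen", "taxon"):
        print(level, summarise(OBSERVATIONS, level))
```

Under these assumptions the three levels differ only in the grouping key, which is the sense in which they could be points on one continuum. Whether hand-marked legacy text can be brought into the same scheme is a separate question that this sketch does not address.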
I am finding it a bit difficult to get a grip on this discussion and where it is going (or where it is coming from, for that matter). I thought I understood (and agreed with) your previous post about the need for a definition component as distinct from the description and data, in the same manner as the DTDs of SGML, but I am now not so sure if that is what, or all of what, is being suggested.
jim
__________________________________________________________________________
Jim Croft ~ jrc@anbg.gov.au ~ http://www.anbg.gov.au/people/croft.jim.html
ph 02-6246-5500 ~ fx 02-6246-5248 ~ GPO Box 1600 Canberra ACT 2601