I too would argue that what we seek is a system that distinguishes between "observations", measured in some sense (either directly using some device, or perhaps "mapped" using known or widely used conventions), and higher-level abstractions that often pass as qualitative "data" (e.g. leaf "ovate", or state "relatively derived" or "1").
Would it be too simplistic to let the format express the source (qualitative/quantitative, or a more detailed description) of a particular data set?
Deciding which sources you trust, value, or otherwise wish to manipulate is an option you then have at the application level.
For example: "I want only direct measures of leaf shape, and not a description such as 'ovate'."
If you also wanted to map between these measures ("give me an approximate textual description given these measurements") then this again lies at the application level and can be achieved assuming the data format is expressive enough.
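
As a rough sketch of what I mean, here is how that division of labour might look. Everything in it is an assumption made for illustration: the `Record` fields (`source`, `method`), the sample values, and especially the thresholds in `describe_shape` are invented and are not part of any proposed format or botanical convention.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Record:
    """One datum plus where it came from (hypothetical field names)."""
    character: str             # what is being described
    value: Union[float, str]   # a measurement or a descriptive term
    source: str                # "quantitative" or "qualitative"
    method: str                # free-text note on how the value was obtained


# Invented sample data, for illustration only.
records = [
    Record("leaf length/width ratio", 1.6, "quantitative", "measured with calipers"),
    Record("leaf shape", "ovate", "qualitative", "visual description"),
]


def direct_measures(recs: List[Record]) -> List[Record]:
    """Application-level choice: keep only directly measured values."""
    return [r for r in recs if r.source == "quantitative"]


def describe_shape(ratio: float) -> str:
    """Map a measurement to an approximate textual description.

    The thresholds are made up; a real mapping would follow whatever
    convention the application chooses to trust.
    """
    if ratio < 1.2:
        return "roughly round"
    if ratio < 3.0:
        return "ovate-like"
    return "narrow"


measured = direct_measures(records)
approximate = [describe_shape(r.value) for r in measured]
```

The format only has to carry `source` and `method`; which records to trust, and how to translate between measurements and descriptions, is decided by whoever writes the equivalent of `direct_measures` and `describe_shape`.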
In any case we need a means to distinguish between these two [maybe more?] "fundamental [?] types" of "data", while at the same time formulating a searching/description language rich enough to characterize reasonably precisely the context in which the representations were made, as well as the objects themselves.
Again, allowing the format to express where the data came from and how it was obtained lets this kind of processing be deferred to the next processing layer (the application, or additional metadata).
Or am I missing the point here?
L.