I almost entirely agree with Peter.
Most existing formats deal not with data per se, but rather with representations of data. The 0's and 1's of phylogenetic analysis are an obvious example. While either value could represent a particular molecular base pair substitution, it need not, and it will almost certainly rarely reflect an actual measurement of a particular specimen or series of specimens, even though such information would be required, through some measurement or estimation, to actually reach the "higher-order inference".
I too would argue that what we seek is a system that distinguishes between "observations", which are measured in some sense (either directly using some device, or perhaps "mapped" using known or widely used conventions), and the higher-level abstractions that often pass as qualitative "data" (e.g. leaf "ovate", or state "relatively derived" or "1").
Because science requires that assumptions be made, it is clear that in many areas such representations are routinely assumed, with much or relatively little justification, as the issues require. Consequently, our "matrix" would be extremely sparse should one insist that only "measured variables" be specified as "character data" to reach the abstractions needed to conduct some kinds of investigations (e.g. pooling museum records for a given species whose taxonomy is established on the basis of morphology). DELTA and NEXUS serve well in a variety of contexts. However, one would also like to be able to request the "intersection" of a variety of "weakly associated" or "semi-structured" information that might help determine the distribution of, and the taxonomic conclusions for, select sub-populations measured for a number of potentially widely different "features/properties/potential synonyms".
To use the XML analogy, perhaps we need a tag language with tags that "wrap" "representations of data", as well as other tags that indicate the circumstances under which "measured variables" were actually taken (e.g. measurements taken from landmarks that may subtly differ among investigators; device used; preparation methods, etc.). In any case we need a means to distinguish between these two [maybe more?] "fundamental [?] types" of "data", while at the same time formulating a search/description language rich enough to characterize reasonably precisely the context in which the representations were made, as well as the objects themselves.
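To make the XML analogy a little more concrete, here is a rough sketch using Python's standard xml.etree.ElementTree module. The element and attribute names (specimen, measurement, representation, context, device, landmarks, preparation, convention) and all of the values are invented for illustration only; they are not taken from DELTA, NEXUS, or any existing schema. The point is simply that a measured value carries its circumstances with it, a coded representation carries the convention it relies on, and a query can then tell the two apart.

```python
import xml.etree.ElementTree as ET


def measured(name, value, unit, device, landmarks, preparation):
    """Wrap a value that was actually measured, plus how it was taken."""
    el = ET.Element("measurement", name=name, unit=unit)
    el.text = str(value)
    ctx = ET.SubElement(el, "context")
    ET.SubElement(ctx, "device").text = device
    ET.SubElement(ctx, "landmarks").text = landmarks
    ET.SubElement(ctx, "preparation").text = preparation
    return el


def representation(name, state, convention):
    """Wrap a coded or qualitative state, plus the convention it relies on."""
    el = ET.Element("representation", name=name, convention=convention)
    el.text = state
    return el


# Hypothetical specimen record mixing both kinds of "data".
specimen = ET.Element("specimen", id="example-001")
specimen.append(measured("leaf_length", 42.5, "mm",
                         device="digital caliper",
                         landmarks="base of petiole to apex",
                         preparation="pressed and dried"))
specimen.append(representation("leaf_shape", "ovate", "shape chart"))
specimen.append(representation("character_12", "1", "binary coding"))

# A search can now separate what was measured from what was merely coded.
measurements = specimen.findall("measurement")
representations = specimen.findall("representation")

print(ET.tostring(specimen, encoding="unicode"))
```

Whether the context belongs inside each measurement, as here, or in a shared block referenced by many measurements is a design choice this sketch does not try to settle.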
If I disagree with Peter, it is in this: for at least a set of morphological features that can be imaged, we are rapidly approaching a day when it will be possible to take thousands of measures on each of thousands of objects quickly. For these, automated, well-defined descriptions of shape will no longer be an issue (except perhaps for deciding which method/measure is most useful for a particular or general purpose). Rather, the issue is how to set up a system for describing such measured data that allows us to evaluate them against taxonomic and morphological conclusions reached in the past using other methods, not to mention compare them against molecular, developmental, physiological, and ecological data collected in the past, or to be collected in the future, using a wide variety of techniques. Also, comparison of methods for taking such data will become increasingly important as the devices (hardware and software) become ever more sophisticated and our language for interpreting their output becomes increasingly precise.
Perhaps a useful approach might be to evaluate various "data types" separately to establish the appropriate set of "context" tags (the essence of DELTA, LUCID, NEXUS and related approaches), while also seeking to better understand the nature of the "conceptual wrappers" that will be needed to associate (tag) different contexts. At the least, such an approach might permit comparison of the controlled vocabularies of alternative tagging methods where there is content and concept overlap. It might also permit us to assess just what kinds of associations we need to be able to make, and hence what kind of language we need to construct to permit "multi-dimensional" extensions.
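As a very small illustration of what comparing controlled vocabularies "where there is content and concept overlap" could look like, the sketch below maps the terms of two vocabularies onto shared concept identifiers and reports which concepts both cover. The two vocabularies, their terms, and the concept identifiers are all made up for the example and do not represent the actual term lists of DELTA, LUCID, or NEXUS.

```python
# Hypothetical vocabularies: each maps a term to a shared concept identifier.
vocab_a = {
    "ovate": "shape:egg",
    "lanceolate": "shape:lance",
    "1": "state:derived",
}
vocab_b = {
    "egg-shaped": "shape:egg",
    "lanceolate": "shape:lance",
    "cordate": "shape:heart",
}


def concept_overlap(a, b):
    """Return {concept: (terms in a, terms in b)} for concepts both cover."""
    shared = set(a.values()) & set(b.values())
    return {
        concept: (
            sorted(term for term, c in a.items() if c == concept),
            sorted(term for term, c in b.items() if c == concept),
        )
        for concept in sorted(shared)
    }


overlap = concept_overlap(vocab_a, vocab_b)
for concept, (terms_a, terms_b) in overlap.items():
    print(concept, terms_a, terms_b)
```

The hard part, of course, is agreeing on the concept identifiers in the first place; the sketch assumes that mapping already exists.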
"P. F. Stevens" wrote:
Of course, the IAPT shapes are not "real", they are conventions (the actual
circumscription of the various shapes was decided at a meeting back in the '50s, I think). ...
Stuart G. Poss