Re: characters/states and measurements and other hoary problems
At 12:52 PM 3/8/00 -0600, Stuart Poss wrote:
1 Attribution and sources for an item datum override those for a character or taxon, which in turn override those for the treatment as a whole. Attribution for characters and taxa is equivalent and additive.
Don't we also need to say that item data attribution at the specimen level may override that for a character or taxon as a whole? Otherwise, the system has no way of dealing with misidentifications, particularly if some but not all parts have been associated with the wrong ID, as might frequently be encountered in fossils, or when dealing with taxa whose character state definitions are later found to apply only to a specific size range (i.e. they break down differentially at small sizes). Specimen- or part-level data/attributions would presumably also be additive, even if potentially contradictory (subject to differences of opinion), would they not?
An item in the DDS is like an old OTU: it may be anything from a specimen to a species to a Kingdom. In footnote 1 I'm trying to address the relationships between attributions from the different possible sources of data, not between different levels of the hierarchy.
That is, there may be an attribution for the treatment as a whole:
e.g. Data for this treatment, unless otherwise attributed, are derived from the literature source Bloggs & Hogwash (1899)
But data for a particular character (for all OTUs) may come from a different source:
e.g. Data for character 9 (presence or absence of the CAM photosynthetic pathway) come from the literature source Meg & Mog (1999)
and the attribution Meg & Mog 1999 will override the default attribution.
But data for a particular item (in this case a taxon that was not included in Meg & Mog 1999) for character 9 come from pers. obs., and this item-level attribution overrides both of the above.
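A minimal sketch of the precedence in footnote 1, in Python. The class and field names are hypothetical (nothing in the DDS defines them), items are identified by name and characters by number, character 3 is arbitrary, and "Imaginaria nova" is a made-up taxon standing in for one not covered by Meg & Mog 1999:

```python
from dataclasses import dataclass, field

@dataclass
class Treatment:
    """Hypothetical model of the attribution layers within one treatment."""
    default_source: str
    character_sources: dict[int, list[str]] = field(default_factory=dict)
    taxon_sources: dict[str, list[str]] = field(default_factory=dict)
    item_datum_sources: dict[tuple[str, int], list[str]] = field(default_factory=dict)

    def attribution(self, item: str, character: int) -> list[str]:
        """Return the source(s) for one datum, applying footnote 1's precedence."""
        # 1. An attribution on the item datum itself overrides everything else.
        datum = self.item_datum_sources.get((item, character))
        if datum:
            return list(datum)
        # 2. Character and taxon attributions are equivalent and additive, so
        #    both are combined (duplicates dropped, order kept).
        combined = self.character_sources.get(character, []) + self.taxon_sources.get(item, [])
        if combined:
            return list(dict.fromkeys(combined))
        # 3. Otherwise the treatment-wide default applies.
        return [self.default_source]

t = Treatment(
    default_source="Bloggs & Hogwash (1899)",
    character_sources={9: ["Meg & Mog (1999)"]},
    item_datum_sources={("Imaginaria nova", 9): ["pers. obs."]},
)
print(t.attribution("Imaginaria magica", 3))  # ['Bloggs & Hogwash (1899)']
print(t.attribution("Imaginaria magica", 9))  # ['Meg & Mog (1999)']
print(t.attribution("Imaginaria nova", 9))    # ['pers. obs.']
```

Character and taxon attributions are merged here rather than ranked against each other, since footnote 1 treats them as equivalent and additive.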
The possibility of specimens being items comes from the ability to nest treatments.
e.g. Treatment A comprises measurements of a series of specimens of Imaginaria magica for some characters (e.g. hypopode length). Treatment B comprises as its items species of Imaginaria. Some characters (e.g. CAM metabolism) are scored directly in B while others (e.g. the range of hypopode lengths) are collated from A.
Data for the character hypopode length in B would then be attributed to treatment A, where each data element is attributed to measurements taken by me on a specimen. Redetermination of a specimen followed by rebuilding of the treatment tree would allow the redetermination to be incorporated all the way up.
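The redetermination case can be sketched the same way. This is a hypothetical Python model, not part of the DDS: treatment A holds one record per specimen (made-up specimen IDs and lengths, and "Imaginaria nova" is a made-up name), and treatment B is simply rebuilt from A, so a changed determination flows up on the next rebuild:

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    specimen: str           # hypothetical specimen identifier
    determination: str      # current identification of the specimen
    hypopode_length: float  # made-up value; units not specified
    source: str             # attribution for this single data element

def build_treatment_b(treatment_a: list[Measurement]) -> dict[str, dict]:
    """Collate specimen records in A into a per-species length range for B."""
    b: dict[str, dict] = {}
    for m in treatment_a:
        entry = b.setdefault(m.determination,
                             {"min": m.hypopode_length, "max": m.hypopode_length, "from": []})
        entry["min"] = min(entry["min"], m.hypopode_length)
        entry["max"] = max(entry["max"], m.hypopode_length)
        # Each collated value in B stays attributed to the specimens in A it came from.
        entry["from"].append(m.specimen)
    return b

treatment_a = [
    Measurement("S1", "Imaginaria magica", 4.2, "pers. obs."),
    Measurement("S2", "Imaginaria magica", 5.0, "pers. obs."),
    Measurement("S3", "Imaginaria magica", 9.1, "pers. obs."),
]
before = build_treatment_b(treatment_a)

# Redetermine S3, then rebuild B from A.
treatment_a[2].determination = "Imaginaria nova"
after = build_treatment_b(treatment_a)
print(before)
print(after)
```

Because B is derived rather than edited by hand, the only change needed is the redetermination in A.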
Would this work, and does it address your query?
Statement:
One file will comprise one treatment, the basic unit of which is one or more characters describing one or more taxa or individuals.
If we are to presume nested levels of groupings of descriptor elements, then we need to be able to clearly distinguish data values that are regarded as referring to "individual units" (specimen, species, higher-level taxon) at one level, but are "collective" when evaluated at a different level. Not only will the "collation rules" be different for different levels, but they may also differ depending on whether a given feature (possibly "the same feature but defined differently") is regarded as a data item referring to a specific "individual unit" or as a "collective unit". That is, the data item may represent data that apply to a collection, rather than a measurable value whose scope is no larger than a single measurement of a single specimen. Perhaps some treatments might include "collections of collections", which would imply a mixing of both situations.
It seems to me we need a means of distinguishing between collective representations and "unit values" (for lack of a better word), if for no other reason than to be able to track which data items apply to particular "features of general interest" (e.g. leaves, roots, head, foot) and which taxa are involved (higher-level taxon, species, individuals, parts of individuals). It's not clear to me how the current DDST can be used to associate "basic units" that might reasonably be defined differently at different "levels of composition" by different investigators. Perhaps we need some general means to assign the "scope" over which the definitions of "unit" apply.
Surely the data level to which a character is attributed is determined by the context of the treatment:
The attribution of data for a collated character in B to "raw" data in A provides the context: we know that the character in B is collated and hence holds higher-level data than A does. The same goes for "collations of collations": the attribution allows you to track the data down to its source.
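A hypothetical Python sketch of that tracking: each datum either carries its own attribution (raw data in A) or lists the lower-level data it was collated from, and a recursive walk recovers the sources behind a genus-level value in a third treatment C. The ids, values and the C level are made up for illustration, and the walk assumes the collation links contain no cycles:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Datum:
    treatment: str
    value: object
    source: Optional[str] = None  # attribution of a raw datum, e.g. "pers. obs."
    collated_from: list[str] = field(default_factory=list)  # ids of lower-level data

def trace(data: dict[str, Datum], datum_id: str) -> list[tuple[str, str]]:
    """Follow collation links down to raw data; return (datum id, source) pairs."""
    datum = data[datum_id]
    if not datum.collated_from:  # raw datum: its own attribution is the source
        return [(datum_id, datum.source or "unattributed")]
    sources: list[tuple[str, str]] = []
    for child_id in datum.collated_from:
        sources.extend(trace(data, child_id))
    return sources

data = {
    "a1": Datum("A", 4.2, source="pers. obs."),
    "a2": Datum("A", 5.0, source="pers. obs."),
    "a3": Datum("A", 9.1),  # deliberately left without an attribution
    "b_magica": Datum("B", (4.2, 5.0), collated_from=["a1", "a2"]),
    "b_nova": Datum("B", (9.1, 9.1), collated_from=["a3"]),
    "c_imaginaria": Datum("C", (4.2, 9.1), collated_from=["b_magica", "b_nova"]),
}
print(trace(data, "c_imaginaria"))
```

An unattributed raw datum shows up explicitly in the trace rather than being silently dropped, which makes gaps in attribution easy to spot.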
Or am I misinterpreting you?
Peter Stevens added:
The measurement(s) of either the specimen or the species could be assigned to states; those states might encompass a range of variation such that the attribution of a specimen/species to that state would mean that, when referring to the state, one loses sight of the original measurements.
A collation rule may derive a higher-order numeric from numeric data:
e.g. Treatment A stores hypopode lengths for specimens, Treatment B collates a range (and/or mean, median, etc.) of hypopode lengths for a species, Treatment C collates a higher-order range of lengths for a genus, etc.
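A minimal Python sketch of numeric collation rules chained across levels, with made-up lengths. The function names and the species "Imaginaria nova" are hypothetical. It also shows why the choice of rule matters when collating a collation: a genus range built from species ranges matches one built directly from the specimens, whereas a mean of species means does not match the mean over all specimens unless every species has the same number of specimens:

```python
from statistics import mean, median

def collate_species(lengths: list[float]) -> dict[str, float]:
    """Collation rule A -> B: specimen lengths to a species-level summary."""
    return {"min": min(lengths), "max": max(lengths),
            "mean": mean(lengths), "median": median(lengths)}

def collate_genus(species: dict[str, dict[str, float]]) -> dict[str, float]:
    """Collation rule B -> C: species summaries to a genus-level range."""
    return {"min": min(s["min"] for s in species.values()),
            "max": max(s["max"] for s in species.values())}

# Made-up specimen lengths for two species.
specimens = {"Imaginaria magica": [4.0, 5.0, 6.0], "Imaginaria nova": [10.0]}
species = {name: collate_species(v) for name, v in specimens.items()}
genus = collate_genus(species)

# A range collated from ranges equals the range taken directly from A...
pooled = [x for v in specimens.values() for x in v]
assert genus == {"min": min(pooled), "max": max(pooled)}
# ...but a mean of species means is not the mean of all specimens.
print(mean(s["mean"] for s in species.values()), mean(pooled))  # prints 7.5 6.25
```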
Another collation rule may convert low-order numeric data into a higher-order multistate:
e.g. Treatment A stores leaf length, leaf width and distance to widest point for specimens, Treatment B collates these into shapes for species, and Treatment C collates the shapes again, this time directly from treatment A rather than bringing in the problems of collating a collation. (Note that collating a collation may sometimes be acceptable, e.g. when determining the full range of petal colours in a higher-level taxon, where the rules at all levels are simply additive.)
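A rough Python sketch of the second kind of rule. The thresholds are hypothetical, chosen only so the example runs (a widest point below 40% of the length from the base is called ovate, 40 to 60% elliptic, above that obovate, with "narrowly" added when length/width exceeds 3), and the distance to the widest point is assumed to be measured from the base. Shapes at both species and genus level are collated directly from the specimen records, while petal colours use a simply additive (union) rule, where collating a collation is harmless:

```python
def leaf_shape(length: float, width: float, widest: float) -> str:
    """Hypothetical rule: one leaf's measurements to a shape state."""
    position = widest / length  # 0 = base, 1 = apex (assumes distance from base)
    if position < 0.4:
        shape = "ovate"
    elif position <= 0.6:
        shape = "elliptic"
    else:
        shape = "obovate"
    if length / width > 3:  # hypothetical cut-off for "narrowly"
        shape = "narrowly " + shape
    return shape

def collate_shapes(records: list[dict], rank: str, name: str) -> set[str]:
    """Shapes for a species or genus, collated directly from specimen data in A."""
    return {leaf_shape(r["length"], r["width"], r["widest"])
            for r in records if r[rank] == name}

def collate_colours(records: list[dict], rank: str, name: str) -> set[str]:
    """Petal colours for a taxon: a simply additive (union) rule."""
    return set().union(*(r["petals"] for r in records if r[rank] == name))

# Made-up specimen records for treatment A (lengths in mm; names partly hypothetical).
treatment_a = [
    {"species": "Imaginaria magica", "genus": "Imaginaria",
     "length": 60.0, "width": 30.0, "widest": 20.0, "petals": {"white"}},
    {"species": "Imaginaria magica", "genus": "Imaginaria",
     "length": 50.0, "width": 25.0, "widest": 25.0, "petals": {"pink"}},
    {"species": "Imaginaria nova", "genus": "Imaginaria",
     "length": 80.0, "width": 20.0, "widest": 56.0, "petals": {"white", "blue"}},
]

genus_shapes = collate_shapes(treatment_a, "genus", "Imaginaria")
print(sorted(genus_shapes))  # ['elliptic', 'narrowly obovate', 'ovate']

# For the additive colour rule, collating the species collations gives the same
# answer as collating directly from A.
by_species = [collate_colours(treatment_a, "species", s)
              for s in ("Imaginaria magica", "Imaginaria nova")]
assert set().union(*by_species) == collate_colours(treatment_a, "genus", "Imaginaria")
```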
Thus, with both the collation rules and attribution at all levels fully explicit, one should be able to track any data element to its source. Any good, Peter?
Cheers - k
Kevin Thiele