characters/states and measurements and other hoary problems

Mon Aug 7 10:04:24 CEST 2000

At 12:52 PM 3/8/00 -0600, Stuart Poss wrote:

>1 Attribution and sources for an item datum override that for a character or
>taxon, which override that for the treatment as a whole. Attribution for
>characters and taxa are equivalent and additive.
>
>Don't we also need to say that item data attribution at the specimen level
>may override that for a character or taxon as a whole?  Otherwise, the system
>has no way of dealing with misidentifications, particularly if some but not
>all parts have been associated with the wrong ID, as might be frequently
>encountered in fossils, or when dealing with taxa whose character state
>definitions are later found applicable for only a specific size range (i.e.
>differentially break down at small sizes).  Specimen or parts level
>data/attributions also would presumably be additive, even if potentially
>contradictory (subject to differences of opinion), would they not?

An item in the DDS is like an old OTU - it may be anything from a specimen
to a species to a Kingdom. In footnote 1 I'm trying to address the
relationships between attribution of the different possible sources of
data, not between different levels of the hierarchy.

That is, there may be an attribution for the treatment as a whole:

e.g. Data for this treatment, unless otherwise attributed, are derived from
the literature source Bloggs & Hogwash (1899)

But data for a particular character (for all OTUs) may come from a
different source:

e.g. Data for character 9 (presence or absence of the CAM photosynthetic
pathway) come from the literature source Meg & Mog (1999)

and the attribution Meg & Mog 1999 will override the default attribution.

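And data for a particular item datum may come from yet another source:

e.g. Data for character 9 for one taxon that was not included in Meg & Mog
(1999) come from pers. obs.

and the attribution pers. obs. will override both the character-level and
the default attribution for that datum.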
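
The override order in footnote 1 can be sketched in a few lines of Python.
This is only an illustration of the rule, not part of any draft of the
standard: the function name resolve_attribution and its parameters are
made up for the example.

```python
from typing import List, Optional


def resolve_attribution(
    treatment_default: str,
    character_attr: Optional[str] = None,
    taxon_attr: Optional[str] = None,
    datum_attr: Optional[str] = None,
) -> List[str]:
    """Return the attribution(s) that apply to one item datum.

    Hypothetical helper: the names are assumptions for illustration.
    """
    # An attribution on the item datum itself overrides everything else.
    if datum_attr is not None:
        return [datum_attr]
    # Character and taxon attributions are equivalent and additive.
    combined = [a for a in (character_attr, taxon_attr) if a is not None]
    if combined:
        return combined
    # Otherwise the treatment-wide default applies.
    return [treatment_default]


default = "Bloggs & Hogwash (1899)"
unattributed = resolve_attribution(default)
character_9 = resolve_attribution(default, character_attr="Meg & Mog (1999)")
one_taxon = resolve_attribution(
    default, character_attr="Meg & Mog (1999)", datum_attr="pers. obs."
)
```

Here unattributed falls back to Bloggs & Hogwash (1899), character_9
resolves to Meg & Mog (1999), and one_taxon resolves to pers. obs.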

The possibility of specimens being items comes from the ability to nest
treatments.

e.g. Treatment A comprises measurements of some characters (e.g. hypopode
length) on a series of specimens of Imaginaria magica. Treatment B has
species of Imaginaria as its items. Some characters (e.g. CAM metabolism)
are scored directly in B, while others (e.g. the range of hypopode
lengths) are collated from A.

Data for the character hypopode length in B would then be attributed to
treatment A, where each data element is attributed to measurements taken by
me on a specimen.  Redetermination of a specimen followed by rebuilding of
the treatment tree would allow the redetermination to be incorporated all
the way up.
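
A rough Python sketch of that rebuild, assuming treatment A is just a list
of specimen records. The class Measurement, the function
build_treatment_b, the specimen labels, the lengths and the second species
name "Imaginaria altera" are all invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    specimen: str           # invented specimen label
    species: str            # current determination of the specimen
    hypopode_length: float  # invented value
    attribution: str


def build_treatment_b(treatment_a):
    """Collate per-species hypopode-length ranges from specimen records."""
    treatment_b = {}
    for m in treatment_a:
        entry = treatment_b.setdefault(
            m.species,
            {"min": m.hypopode_length, "max": m.hypopode_length, "sources": []},
        )
        entry["min"] = min(entry["min"], m.hypopode_length)
        entry["max"] = max(entry["max"], m.hypopode_length)
        # Keep the link back to the specimens the range was built from.
        entry["sources"].append(m.specimen)
    return treatment_b


treatment_a = [
    Measurement("spec-1", "Imaginaria magica", 2.0, "pers. obs."),
    Measurement("spec-2", "Imaginaria magica", 3.5, "pers. obs."),
    Measurement("spec-3", "Imaginaria magica", 9.0, "pers. obs."),
]
before = build_treatment_b(treatment_a)

# Redetermine spec-3, then rebuild B from A.
treatment_a[2].species = "Imaginaria altera"
after = build_treatment_b(treatment_a)
```

After the rebuild, the range for Imaginaria magica no longer includes the
redetermined specimen, and nothing in B had to be edited by hand.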

Would this work, and does it address your query?

>Statement:
>
>One file will comprise one treatment, the basic unit of which is one or more
>characters describing one or more taxa or individuals.
>
>If we are to presume nested levels of groupings of descriptor elements, then
>we need to be able to clearly distinguish data values that are regarded as
>referring to "individual units" (specimen, species, higher level taxon) at
>one level, but are "collective" when evaluated at a different level.  Not
>only will the "collation rules" be different for different levels, but may be
>different depending on whether a given feature (possibly "same feature but
>defined differently") is regarded as a data item referring to a specific
>"individual unit", or as a "collective unit".  That is, the data item is a
>representation of data that might apply to a collection, rather than a
>measurable value that may have a scope no larger than a specific measure of a
>specific specimen.  Perhaps some treatments might include "collections of
>collections" that would imply a mixing of both situations.
>
>Seems to me we need a means of distinguishing between collective
>representations and "unit values" (for lack of a better word), if for no
>other reason than to be able to track which data items apply to particular
>"features of general interest" (eg. leaves, roots, head, foot, etc) and which
>taxa are involved (higher-level taxon, species, individuals, parts of
>individuals).  It's not clear to me how the current DDST can be used to
>associate "basic units" that might be reasonably differentially defined at
>different "levels of composition" by different investigators.  Perhaps we
>need some general means to assign the "scope" over which the definitions of
>"unit" apply.

Surely the data level to which a character is attributed is determined by
the context of the treatment:

>we need to be able to clearly distinguish data values that are regarded as
>referring to "individual units" (specimen, species, higher level taxon) at
>one level, but are "collective" when evaluated at a different level.

The attribution of data for a collated character in B to "raw" data in A
provides the context: we know that the character in B is collated and hence
higher-level data than in A. Same goes for "collations of collations" - the
attribution allows you to track the data down to its source.

Or am I misinterpreting you?

Peter Stevens added:

>the measurement(s) of either the specimen or the species could be assigned
>to states; those states might encompass a range of variation such that the
>attribution of a specimen/species to that state would mean that, when
>referring to the state, one loses sight of the original measurements.

A collation rule may derive a higher-order numeric from numeric data:

e.g. Treatment A stores hypopode lengths for specimens, Treatment B
collates a range (and/or mean, median etc) of hypopode lengths for a
species, Treatment C collates a higher-order range of lengths for a genus etc.
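
As a minimal sketch of such a rule in Python (the function names, the
lengths and the second species name "Imaginaria altera" are invented for
the example, and the data layout is an assumption):

```python
from statistics import mean


def collate_range(values):
    """Collate raw numeric values into a (min, max) range."""
    return (min(values), max(values))


def collate_ranges(ranges):
    """Collate lower-level (min, max) ranges into one higher-order range."""
    lows = [low for low, _ in ranges]
    highs = [high for _, high in ranges]
    return (min(lows), max(highs))


# Treatment A: hypopode lengths per specimen, grouped by species.
treatment_a = {
    "Imaginaria magica": [2.0, 3.5, 3.0],
    "Imaginaria altera": [9.0, 7.0],
}

# Treatment B: one range per species, collated from A.
treatment_b = {sp: collate_range(v) for sp, v in treatment_a.items()}

# Treatment C: one range for the genus, collated from B.
genus_range = collate_ranges(list(treatment_b.values()))

# A range of ranges matches the range of the pooled raw data ...
pooled = [x for v in treatment_a.values() for x in v]
assert genus_range == collate_range(pooled)

# ... but a mean of species means need not match the pooled mean.
mean_of_means = mean(mean(v) for v in treatment_a.values())
pooled_mean = mean(pooled)
```

Ranges survive being collated twice, but means in general do not: with
unequal numbers of specimens per species, mean_of_means and pooled_mean
differ, so a genus-level mean would have to go back to treatment A.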

Another collation rule may convert low-order numeric data into a
higher-order multistate:

e.g. Treatment A stores leaf length, leaf width and distance to widest
point for specimens, Treatment B collates these into shapes for species,
Treatment C collates the shapes again, this time direct from treatment A
rather than bringing in the problems of collating a collation. (Note that
collating a collation is sometimes acceptable, e.g. when determining the
full range of petal colours in a higher-level taxon, where the rules at all
levels are simply additive.)
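
A Python sketch of a numeric-to-multistate rule along these lines. The
thresholds (0.4, 0.6 and a length/width ratio of 3), the state names, the
measurements, the assumption that distance to the widest point is measured
from the leaf base, and the species name "Imaginaria altera" are all
invented for illustration; a real rule would be whatever the treatment's
author makes explicit.

```python
def leaf_shape(length, width, dist_to_widest):
    """Convert three measurements into one shape state (assumed cut-offs)."""
    position = dist_to_widest / length  # 0 = widest at base, 1 = at tip
    if position < 0.4:
        shape = "ovate"
    elif position > 0.6:
        shape = "obovate"
    else:
        shape = "elliptic"
    # Long, narrow leaves get a qualifier.
    if length / width >= 3:
        shape = "narrowly " + shape
    return shape


# Treatment A: (species, length, width, distance to widest point) per specimen.
treatment_a = [
    ("Imaginaria magica", 10.0, 4.0, 3.0),
    ("Imaginaria magica", 12.0, 3.0, 6.0),
    ("Imaginaria altera", 8.0, 4.0, 6.0),
]

# Treatment B: the set of shapes observed in each species.
treatment_b = {}
for species, length, width, dist in treatment_a:
    treatment_b.setdefault(species, set()).add(leaf_shape(length, width, dist))

# Treatment C: shapes for the genus, collated directly from A rather than B.
genus_shapes = {leaf_shape(l, w, d) for _, l, w, d in treatment_a}
```

With a set-union rule like this one, collating C from B would give the
same answer as collating from A, which is the simple additive case; a rule
that, say, re-derived shapes from averaged measurements would not.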

Thus, with both the collation rules and attribution at all levels fully
explicit, one should be able to track any data element to its source. Any
good, Peter?

Cheers - k