characters/states and measurements and other hoary problems

Sun Jul 30 21:11:26 CEST 2000

While I am philosophically most sympathetic to Peter's arguments, I must agree
with Kevin on the need to distinguish measurements as "characters" from
qualitative characters.

The bases of comparison for the former typically differ fundamentally from
those used in assigning the latter to states. They depend not only on the
exact definition [criteria for recognition] of the endpoints, which may differ
systematically between investigators (or methods of measurement), but also on
temporal scope (i.e., some endpoints may not be identifiable except within a
particular size range for a given structure in a given taxon). As Kevin pointed
out, they would also depend on sampling, although one would hope that most
investigators are attuned to the need to increase sample sizes to avoid bias or
mischaracterization. With measurements, the bases of comparison are
geometrically constrained, whereas qualitative descriptions are not so
constrained, even though both may be used to characterize the "same" features.
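The factors just listed (endpoint definitions, method, and the size range in
which endpoints are identifiable) could be carried with each measurement so that
comparability is checked rather than assumed. A minimal Python sketch under that
assumption; the field names, endpoint labels, and size range are hypothetical,
and sampling is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A measured value plus the context that defines what was measured.

    Field names are hypothetical and simply mirror the factors listed above.
    """
    value: float           # e.g. length in mm
    endpoint1: str         # criterion for recognizing the first endpoint
    endpoint2: str         # criterion for recognizing the second endpoint
    method: str            # e.g. "calipers", "image analysis"
    structure_size: float  # overall size of the structure when measured


def comparable(a: Measurement, b: Measurement,
               size_low: float, size_high: float) -> bool:
    """True only if a and b share a basis of comparison.

    That means identical endpoint definitions and method, and both structures
    within [size_low, size_high], the range in which the endpoints are
    assumed to be identifiable for this structure and taxon.
    """
    same_basis = ((a.endpoint1, a.endpoint2, a.method)
                  == (b.endpoint1, b.endpoint2, b.method))
    in_range = all(size_low <= m.structure_size <= size_high for m in (a, b))
    return same_basis and in_range


m1 = Measurement(45.2, "petiole junction", "blade apex", "calipers", 60.0)
m2 = Measurement(47.0, "petiole junction", "blade apex", "calipers", 75.0)
m3 = Measurement(46.1, "lowest lateral vein", "blade apex", "calipers", 70.0)
print(comparable(m1, m2, 10.0, 200.0))  # True
print(comparable(m1, m3, 10.0, 200.0))  # False: different first endpoint
```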

Nonetheless, one could in some cases meaningfully tag measurements with some of
the same tags as qualitative characters to indicate that they are proxies for
the "same" features. This would allow data matrices conforming to these
"generalized" features to be associated, and would let the user assess whether
the association is "sufficiently tight" to warrant regarding them as proxies for
the "same" feature. The difficulty lies in consistently applying/defining the
basis of comparison and in establishing criteria to recognize when the
constraints inherent in using measurements create inconsistency with various
qualitative definitions of "state". Perhaps attributes of a tag element could be
used to indicate (hint at) different shades of meaning when measurements are
associated with qualitative states, e.g.

    <leaf_character>
      <measurement_character endpoint1="regionally localized"
          endpoint1_precision=">>5%" endpoint2="precisely localized"
          endpoint2_precision="&lt;&lt;1%">45.2</measurement_character>
    </leaf_character>

or some such construction, where in this case <leaf_character> may take the
same tag name as the qualitative character. However, we may also need
additional refining elements, such as
<data_measured_by>Peter Stevens</data_measured_by> or
<data_measured_using>calipers</data_measured_using>, to fully relate
fundamentally different data types or qualify the measurement context.

In any event, such shades of meaning would require that an "appropriate" context
be established, and that context could vary enormously depending on the features
being evaluated. In image analysis, one of the reasons image algebra was
developed was to assist in the standard characterization (transformation) of
images and image fragments ("features"). However, this only works when there is
a consistent basis for comparison (typically pixel size and pixel neighborhood)
underlying the sets being defined. Unfortunately (or perhaps fortunately), the
human visual system is vastly more variable and complex.
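A tagged measurement of this kind could be generated and checked mechanically
rather than typed by hand. The sketch below uses Python's standard
`xml.etree.ElementTree`; the element and attribute names (`leaf_character`,
`measurement_character`, `endpoint1_precision`, `data_measured_by`, etc.) are
hypothetical underscore forms of the tags suggested above, not part of any
agreed standard.

```python
import xml.etree.ElementTree as ET


def measurement_element(feature, value, endpoints, measured_by, measured_using):
    """Build a hypothetical tagged-measurement fragment.

    feature   -- tag shared with the qualitative character being proxied
    endpoints -- list of (criterion, precision) pairs, one per endpoint
    """
    outer = ET.Element(feature)
    meas = ET.SubElement(outer, "measurement_character")
    # Attributes hint at the "shade of meaning" of each endpoint.
    for i, (criterion, precision) in enumerate(endpoints, start=1):
        meas.set(f"endpoint{i}", criterion)
        meas.set(f"endpoint{i}_precision", precision)
    meas.text = str(value)
    # Refining elements that qualify the measurement context.
    ET.SubElement(outer, "data_measured_by").text = measured_by
    ET.SubElement(outer, "data_measured_using").text = measured_using
    return outer


elem = measurement_element(
    "leaf_character", 45.2,
    [("regionally localized", ">>5%"), ("precisely localized", "<<1%")],
    "Peter Stevens", "calipers")
print(ET.tostring(elem, encoding="unicode"))
```

Serializing through a library also takes care of escaping: the `<` in a
precision value such as "<<1%" is not legal inside an XML attribute and is
written out as `&lt;`, then restored when the fragment is parsed back.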

In the near term, Kevin is probably correct to suggest that the onus is on those
who would care to tackle the "hellish difficulties" of making such
associations. It may be best to fork the dialog on this point, which is not to
say that we shouldn't seek a general approach that could later be extended to
include possible solutions to such "issues of association" (i.e., better
establish a useful lexicon and grammar for <measurement_character> elements, as
opposed to <qualitative_character> elements).

Nonetheless, one of the unfortunate practices in systematics has been the
relegation of measurement data to "notebooks" or, more recently, "floppy/CD
disks" that are not properly archived in any coordinated fashion, except perhaps
for the duration of a study or some unspecified time after it, and that are
often at best only partially summarized in the original publication. The
discipline could be much, much richer if there were a consistent standard for
archiving measurement data and associating them with qualitative descriptions,
if for no other reason than to recognize the various contexts in which these
data might be useful in the future, not to mention to encourage workers to let
others know "exactly" what they did to arrive at their conclusions. Measurement
data can be extremely useful in providing insight into how different
investigators actually recognize the various states of their characters, and
may suggest why different investigators may not recognize the same states for
the "same" characters.
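As an illustration of that last point: if two investigators have scored the
same measured specimens into states, the measurements reveal where each one
implicitly draws the line. A minimal Python sketch, assuming a two-state
character with the hypothetical labels "short" and "long"; taking the midpoint
of the gap between states is an illustrative choice, not an established method.

```python
def implied_boundary(scored):
    """Estimate where an investigator splits a measured character into two states.

    `scored` is a list of (value, state) pairs using the hypothetical states
    "short" and "long". Returns the midpoint of the gap between the largest
    "short" value and the smallest "long" value, or None if either state is
    missing or the two overlap (i.e. no single threshold explains the scoring).
    """
    short = [v for v, s in scored if s == "short"]
    long_ = [v for v, s in scored if s == "long"]
    if not short or not long_:
        return None
    hi_short, lo_long = max(short), min(long_)
    if hi_short >= lo_long:
        return None
    return (hi_short + lo_long) / 2


# Same four specimens, scored by two investigators.
investigator_a = [(12.0, "short"), (18.0, "short"), (25.0, "long"), (31.0, "long")]
investigator_b = [(12.0, "short"), (18.0, "short"), (25.0, "short"), (31.0, "long")]
print(implied_boundary(investigator_a))  # 21.5
print(implied_boundary(investigator_b))  # 28.0
```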

Kevin Thiele wrote:

> At 10:23 AM 25/7/00 -0500, Peter Stevens & Stinger commented on the hoary
> issue of characters/states vs measurements:
>
> It's completely true that a set of measurements is a more fundamental data
> element than a character state. But we need to allow for both ways of
> recording data, for two reasons. Firstly, some characters don't lend
> themselves to measurement (e.g. indumentum - it would be possible but
> hellishly difficult). Secondly, if I'm writing a key to Families, I'm
> afraid I probably won't be recording measurements of individual specimens
> very often, as the data blowout would be alarming. And what specimens would
> I use? I see no way around the problem (for higher-order taxa at least) of
> doing the collation of data (measurements->states) in my head, with all the
> admitted problems that ensue.
>
> The draft standard document I've put to the group allows for some
> characters to be scored after a head-wise (hopefully wise, anyway)
> collation, others to be collated from individual measurements using some
> form of collation rule.
>
> As to data attribution, the draft standard document allows (I hope) all
> data elements to be attributed (to a specimen, taxon, literature citation,
> person etc), but doesn't force the issue.
>
> >Kevin:
> > >Or can we have a set of linked standards - one for describing in the boring
> > >old characters/states/taxa way, and others for the more space-shuttle ways
> > >that can be linked in as they develop.
> >
> >and Eric:
> > >it is rather difficult to
> > >combine datasets based on differing character lists, even when those
> > >character lists are fairly similar. There is no mechanism for "mapping"
> > >character states from one dataset onto those of another.
> >
> >We are back to an issue that was raised earlier, so at the risk of boring
> >people by repetition, the way to record data is not (if at all possible)
> >as states, but as measurements.  Measurements of the same character from
> >different studies may then be combinable, and states extracted from the
> >measurements.  But if one is combining states, then one is likely to be in
> >trouble.  The "approved shapes" that a committee of the Systematics
> >Association came up with some years back are fine for communication in
> >descriptions and conversations, but the best way to record shape data,
> >whether for phylogenetic analysis or deciding if one heap of specimens
> >really is distinct from another, is as measurements.
> >
> >(I agree that a list of "approved" characters is probably not attainable
> >at present.)
> >
> >This raises another issue that Eric mentioned:
> >
> > >Another aspect
> > >might be the inclusion of meta-data (e.g., who recorded this observation,
> > >and on what material?) While this sort of information can be placed into
> > >DELTA "text" characters and comments (as could virtually anything
> > >representable as a string of bytes, with a bit of effort), the use of text
> > >characters does not provide for well-structured access to the information.
> >
> >This issue of metadata is very important: if a DNA sequence is linked to a
> >voucher, then all measurements/observations should be linked to
> >specimens.  In an "ideal" database, there would be links to specimens, if
> >only through a literature citation; not being able to make easy links in
> >the literature can make it difficult or impossible to use, and not having
> >such links in a database greatly reduces its value.  If I go wrong in
> >including a specimen in a particular taxon, I want to be able to pull out
> >all information associated with this specimen, whether it is simply length
> >of leaf blade or how the anther wall develops, when I put it in its
> >correct place.
> >
> >Peter S.
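Kevin's point about collating states from individual measurements with a
collation rule, and Peter's point that observations should be linked to
specimens so that they follow a specimen when it is re-identified, can be
sketched together. This is a minimal Python sketch, not the mechanism of Kevin's
draft standard: the classes, the 30 mm boundary in `length_rule`, and the
voucher and taxon names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Observation:
    character: str                     # e.g. "blade length"
    value: float
    recorded_by: Optional[str] = None  # attribution is allowed, not forced


@dataclass
class Specimen:
    voucher: str                       # every observation hangs off a voucher
    taxon: str                         # current identification
    observations: List[Observation] = field(default_factory=list)


def length_rule(values: List[float]) -> str:
    """Hypothetical collation rule with an assumed 30 mm boundary."""
    if all(v < 30 for v in values):
        return "short"
    if all(v >= 30 for v in values):
        return "long"
    return "short to long"


def collate(specimens: List[Specimen], taxon: str, character: str,
            rule: Callable[[List[float]], str]) -> Optional[str]:
    """Collate a taxon's specimen measurements into a state using `rule`."""
    values = [o.value
              for s in specimens if s.taxon == taxon
              for o in s.observations if o.character == character]
    return rule(values) if values else None


specimens = [
    Specimen("voucher 1", "Taxon A", [Observation("blade length", 22.0, "P. Stevens")]),
    Specimen("voucher 2", "Taxon A", [Observation("blade length", 41.0, "P. Stevens")]),
    Specimen("voucher 3", "Taxon B", [Observation("blade length", 45.0)]),
]
print(collate(specimens, "Taxon A", "blade length", length_rule))  # short to long

# Re-identify voucher 2: its observations move with it, and states re-collate.
specimens[1].taxon = "Taxon B"
print(collate(specimens, "Taxon A", "blade length", length_rule))  # short
print(collate(specimens, "Taxon B", "blade length", length_rule))  # long
```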