>This raises the old argument for scoring _data_ in our databases rather
>than information derived from those data. This is to say that your
>system should at least allow of scoring descriptive data at specimen
>level rather than at the conceptual level of species (or other taxa).
>Thus the descriptions of taxa could be derived "just in time", ie on the
>fly when required. Redetermination of a given suite of specimens would
>result in all the relevant descriptions, keys, and other products being,
>in effect, dynamic. This will be vital for projects which are
>institutional or international in scope.
>
I think that if databases are to be anything other than local,
this is the central issue. One can, in theory at least, go from
measurement to shape terms and still keep the basic data intact. At
the risk of overkill, below are some points that I think are important
to bear in mind. Note that the measurements and images I talk about
(sorry, this is edited from some notes which may turn into a paper or a
lecture for next term's class) are really the metadata that ground the
data of the description of an animal or plant or the character states
of a phylogenetic analysis. Although the comments below are couched in
terms of phylogenetic analysis, the metadata assembled would be useful
for a variety of purposes.
Access to
individual observations linked to individual specimens is the ideal
towards which we should strive. Photographs, drawings, images in
general and measurements as well as bibliographic references are the
proper metadata of morphological-phylogenetic data; they ground those
data.
Having such metadata readily accessible has several advantages.
1. We should be able to revisit earlier decisions about the delimitation
of states by examining the observations available to the authors who made
those decisions.
2. The same character may be quite properly divided in different ways
for different taxa within a larger group. When studying the larger
group, we need to combine all the raw data from which the states were
separately abstracted in the smaller studies. Combining raw data may
destroy states; looking at a subset of the raw data may suggest
additional states.
3. There is no knowing how future morphological observations will
relate to a gap that currently separates two states (this is unlike the
normal situation for molecular data). Similarly, data based on
misidentified specimens or taxa assigned to the wrong higher taxon
should be in a form that allows them to be easily placed in their
correct contexts, at the same time evaluating any effects this may have
on state delimitation.
4. Gap coding is not universally accepted, and there are at present no
sound reasons for preferring one method of coding over another. There
are few studies that code the same variation in different ways and
compare the results. Absent such reasons, studies and comparisons,
metadata will enable observations to be used by a variety of coding
practices.
5. If the states scored for a taxon are those believed to be basal in
it (Nandi et al. 1998; Doyle and Endress 2000), details of the
variation within the taxon need to be accessible. This is particularly
necessary because our knowledge of which taxa are "basal" in a lineage
is in a state of flux, and such taxa are sometimes (but largely
incorrectly) emphasized when assigning states.
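
Points 2 and 4 can be made concrete with a small sketch. It assumes a very simple form of gap coding (a new state starts wherever two neighbouring values differ by more than a chosen threshold); the function name, the threshold and the measurements are all made up for illustration, and this is not meant as anybody's published coding method.

```python
def gap_code(values, min_gap):
    """Code raw measurements into states by simple gap coding.

    A new state begins wherever two neighbouring values (in sorted order)
    differ by more than min_gap. Returns one state index per input value,
    in the original order of the input.
    """
    state_of = {}
    state = 0
    previous = None
    for value in sorted(set(values)):
        if previous is not None and value - previous > min_gap:
            state += 1
        state_of[value] = state
        previous = value
    return [state_of[v] for v in values]


# Hypothetical leaf lengths (mm) from two separate studies of one character.
study_a = [10, 11, 12, 20, 21]
study_b = [14, 15, 16, 17, 18]
pooled = study_a + study_b

# Study A on its own shows a clear gap, hence two states.
states_a = gap_code(study_a, min_gap=3)

# Pooling the raw data fills the gap, and the two states disappear.
states_pooled = gap_code(pooled, min_gap=3)

# A different coding decision on the same raw data gives three states.
states_pooled_fine = gap_code(pooled, min_gap=1.5)
```

Had only the states from study A been stored, neither the pooled coding nor the recoding with a different threshold could have been done.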
The general direction in which we should go is clear. GenBank is more
reliable than Vernon Heywood's shoebox in which we often place our data
(Heywood, 19). The shoebox may be discarded when we move, and our
heirs and assignees may well consign it to the recycling pile or
bonfire on our deaths. Unless our observations can be stored
somewhere, phylogenetic hypotheses will not cumulate. Information on
each character should be stored as individual measurements and images
as far as is possible, and linked to specimens and place of
publication. If we are serious about morphology, we need an equivalent
to GenBank, a MorphBank.
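
As a rough sketch of what storing observations at specimen level might look like, and of how a taxon description could then be derived "just in time", consider the following. The record layout, field names, taxon names and values are all hypothetical; this is an illustration under those assumptions, not a proposed schema for any existing database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Specimen:
    specimen_id: str
    determination: str  # current taxon name; may change on redetermination


@dataclass
class Observation:
    specimen_id: str
    character: str
    value: float
    source: str  # hypothetical pointer to an image or place of publication


def describe(specimens, observations):
    """Derive per-taxon character ranges from specimen-level observations.

    Nothing is stored at the taxon level; the ranges are recomputed from
    the current determinations every time the function is called.
    """
    taxon_of = {s.specimen_id: s.determination for s in specimens}
    ranges = defaultdict(dict)
    for obs in observations:
        taxon = taxon_of[obs.specimen_id]
        low, high = ranges[taxon].get(obs.character, (obs.value, obs.value))
        ranges[taxon][obs.character] = (min(low, obs.value), max(high, obs.value))
    return {taxon: dict(chars) for taxon, chars in ranges.items()}


# Made-up specimens and measurements.
s1 = Specimen("S1", "Aus bus")
s2 = Specimen("S2", "Aus bus")
s3 = Specimen("S3", "Aus cus")
observations = [
    Observation("S1", "petal_length_mm", 4.0, "image-001"),
    Observation("S2", "petal_length_mm", 5.0, "image-002"),
    Observation("S3", "petal_length_mm", 9.0, "image-003"),
]

before = describe([s1, s2, s3], observations)

# Redetermining one specimen changes both descriptions; no observation is edited.
s2.determination = "Aus cus"
after = describe([s1, s2, s3], observations)
```

Because the measurements stay attached to the specimens, a redetermination only changes one field, and every description derived afterwards reflects it.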