Version History:
Version 1.0 February 24, 2000, K.Thiele
Version 1.1 revised July 18, 2000, K.Thiele
General Requirements
The DDST will be a data file structure that allows the capture and management of all types of data required for describing the morphology and anatomy of an organism or taxon. All data and metadata needed will be stored in one file, structured into several blocks (character lists, taxon lists, items data etc.).
One file will comprise one treatment, the basic unit of which is one or more characters describing one or more taxa or individuals.
The DDST will support the following:
External lexica: these are externally-referenced lists of characters and states, or taxa, shared between several treatments. Lexica may be used without modification, or with one or more characters, states or taxa added internally (e.g. global vs local characters).
Collation of data: data in the DDST may be captured and managed at several levels. One treatment (see above for definition of treatment) may store descriptive data for individual specimens, another may store data for species-level taxa, while another may store data for higher-level taxa. These individual treatments may be linked into a nested hierarchy, with specified collation rules allowing collation of data up the hierarchy, and passing of data down the hierarchy. Thus, some characters in the species-level treatment may be scored directly in that treatment, while others will collate data (e.g. leaf measurements) from items in the specimen-level treatment. Conversely, some characters may be scored in a genus-level treatment, and these become implicitly true for all taxa in a linked species-level treatment.
Rich Attribution: all data elements in the DDST may be fully attributed to a source (e.g. contributor, published reference, specimen etc.). Attribution will be optional at any level. Attribution will allow data-tracking and house-keeping, especially when several contributors work on one treatment.
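The downward passing of data described under "Collation of data" above can be illustrated with a minimal Python sketch. The function name inherit_scores, the dictionary layout and the example scores are hypothetical and are not part of the DDST. The sketch also assumes that a score recorded directly in the species-level treatment takes precedence over an inherited one, which this draft does not specify.

```python
from typing import Dict

Scores = Dict[str, str]  # character name -> state (hypothetical layout)


def inherit_scores(parent: Scores, children: Dict[str, Scores]) -> Dict[str, Scores]:
    """Pass parent-level scores down to every taxon in a linked lower-level treatment."""
    effective: Dict[str, Scores] = {}
    for taxon, own in children.items():
        merged = dict(parent)   # start from the scores of the higher-level treatment
        merged.update(own)      # assumption: a directly scored character wins
        effective[taxon] = merged
    return effective


# Hypothetical genus-level and species-level treatments.
genus_scores = {"habit": "tree"}
species_scores = {
    "Species A": {"leaf shape": "ovate"},
    "Species B": {"leaf shape": "linear"},
}

effective = inherit_scores(genus_scores, species_scores)
```

Here "habit" is scored once at genus level and becomes implicitly true for both species, while "leaf shape" remains scored per species.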
The list of data elements below is structured using tabbed levels. Items tabbed across one level and enclosed in square brackets are replicable within the higher level.
Items in bold are required within their level (although the higher-level structure to which they belong may not be required).
Comments are in curly braces.
Note that this draft specification does not imply any particular structure for the data file used. It should be read as a list of required data elements for the final specification.
1 Attribution and sources for an item datum override those for a character or taxon, which in turn override those for the treatment as a whole. Attributions for characters and taxa are equivalent and additive.
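One possible reading of this precedence rule, as a minimal Python sketch. The function resolve_attribution and its argument names are hypothetical, and treating "additive" as "combine the character and taxon sources, dropping duplicates" is an assumption rather than something this draft states.

```python
from typing import List, Sequence


def resolve_attribution(
    treatment: Sequence[str],
    character: Sequence[str] = (),
    taxon: Sequence[str] = (),
    item: Sequence[str] = (),
) -> List[str]:
    """Return the sources that apply to a single item datum."""
    if item:
        # Item-level attribution overrides everything above it.
        return list(item)
    # Character and taxon attributions rank equally and are combined
    # (assumption: duplicates are dropped, first occurrence kept).
    combined = list(dict.fromkeys([*character, *taxon]))
    if combined:
        return combined
    # Otherwise fall back to the attribution of the treatment as a whole.
    return list(treatment)
```

For example, with a hypothetical treatment source "T", character source "A" and taxon source "B", the datum resolves to both "A" and "B" unless it carries its own attribution.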
2 Treatments are nestable. That is, one treatment may contain data on specimens, while a higher-level treatment contains data on taxa. The higher-level treatment gathers information for some characters from lower-level treatments, using a specified collation rule. Collation rules will be specified externally to the treatment, and will cover e.g. how to merge scores, calculate values, deal with conflicts in source data etc.
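As an illustration only, a collation rule can be thought of as a function that is kept separate from the treatment data and applied when values are gathered from a lower level. The names collate and range_rule, the data layout and the measurements below are hypothetical; the draft does not define how rules are expressed.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical specimen-level treatment:
# specimen id -> (parent species, leaf length in mm)
specimens: Dict[str, Tuple[str, float]] = {
    "spec-001": ("Species A", 42.0),
    "spec-002": ("Species A", 55.5),
    "spec-003": ("Species B", 12.0),
}


def range_rule(values: List[float]) -> Tuple[float, float]:
    """Example collation rule: summarise measurements as (min, max)."""
    return (min(values), max(values))


def collate(
    items: Dict[str, Tuple[str, float]],
    rule: Callable[[List[float]], Tuple[float, float]],
) -> Dict[str, Tuple[float, float]]:
    """Group lower-level values by parent taxon and apply the collation rule."""
    grouped: Dict[str, List[float]] = {}
    for parent, value in items.values():
        grouped.setdefault(parent, []).append(value)
    return {parent: rule(values) for parent, values in grouped.items()}


species_leaf_length = collate(specimens, range_rule)
```

Because the rule is passed in as an argument, a different rule (for instance a mean, or one that flags conflicting scores) could be substituted without changing the specimen data.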
3 Character names may be hierarchically nested. Character properties (e.g. sets, dependencies, attachments) are only specified for the lowest level characters.
Leaves
    teeth
        shape } properties
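The Leaves / teeth / shape example can be modelled as a small tree in which only the lowest-level character carries properties. This is a sketch under assumptions: the Character class, the leaf_characters helper and the property value shown are hypothetical and not part of the draft.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Character:
    name: str
    children: List["Character"] = field(default_factory=list)
    # Properties (e.g. sets, dependencies, attachments) are meant
    # to be filled in only on the lowest-level characters.
    properties: Dict[str, str] = field(default_factory=dict)


def leaf_characters(
    node: Character, path: Tuple[str, ...] = ()
) -> Iterator[Tuple[Tuple[str, ...], Dict[str, str]]]:
    """Yield (full name path, properties) for each lowest-level character."""
    path = path + (node.name,)
    if not node.children:
        yield path, node.properties
    for child in node.children:
        yield from leaf_characters(child, path)


# Hypothetical property value, for illustration only.
shape = Character("shape", properties={"set": "vegetative"})
leaves = Character("Leaves", [Character("teeth", [shape])])

lowest_level = list(leaf_characters(leaves))
```

Walking the tree recovers the full nested name of each scoreable character together with the properties specified for it.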
5 The idea here is to specify a subset of taxa for which this character is scored, or to specify that the character is non-global, then leave it to the parsing program to determine the taxon list. This feature would be used by future identification programs that employ the Progressive Revelation model.
6 The item data may be stored as the equivalent of either a taxon-state matrix or a state-taxon matrix, depending upon whether taxa are nested within characters or characters are nested within taxa. There will need to be a way of specifying which of these is operative.
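The two orientations hold the same item data and differ only in which key is the outer one, as the following Python sketch suggests. The nested-dictionary layout, the transpose function and the example scores are hypothetical.

```python
from typing import Dict

Matrix = Dict[str, Dict[str, str]]


def transpose(matrix: Matrix) -> Matrix:
    """Swap the nesting order of a two-level matrix (outer and inner keys)."""
    flipped: Matrix = {}
    for outer, row in matrix.items():
        for inner, value in row.items():
            flipped.setdefault(inner, {})[outer] = value
    return flipped


# Characters nested within taxa (hypothetical scores).
by_taxon: Matrix = {
    "Species A": {"leaf shape": "ovate", "habit": "tree"},
    "Species B": {"leaf shape": "linear", "habit": "shrub"},
}

# The same data with taxa nested within characters.
by_character = transpose(by_taxon)
```

A parsing program would still need to be told which orientation a given file uses, since the nested data alone do not always make this clear.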
7 Public Notes are available for parsing; Private Notes are not, and are intended for private housekeeping within the treatment.