Re: characters/states and measurements and other hoary problems

3 Aug 2000

      Kevin Thiele <kevin.thiele@PI.CSIRO.AU> wrote:
...
attached find DDST Specifications.htm.
Here is the plain text equivalent, courtesy of lynx.

        Una Smith

-------------------------------- snip ------------------------------------

   Draft Specifications for a Descriptive Data Standard for Taxonomy

   Version History:

   Version 1.0 February 24, 2000, K.Thiele

   Version 1.1 revised July 18, 2000, K.Thiele

   General Requirements

   The DDST will be a data file structure that allows the capture and
   management of all types of data required for describing the morphology
   and anatomy of an organism or taxon. All data and metadata needed will
   be stored in one file, structured into several blocks (character
   lists, taxon lists, items data etc.).

   One file will comprise one treatment, the basic unit of which is one
   or more characters describing one or more taxa or individuals.

   The DDST will support the following:

   External lexica: these are externally-referenced lists of characters
   and states, or taxa, shared between several treatments. Lexica may be
   used without modification, or with one or more characters, states or
   taxa added internally (e.g. global vs local characters).

   Collation of data: data in the DDST may be captured and managed at
   several levels. One treatment (see above for definition of treatment)
   may store descriptive data for individual specimens, another may store
   data for species-level taxa, while another may store data for
   higher-level taxa. These individual treatments may be linked into a
   nested hierarchy, with specified collation rules allowing collation of
   data up the hierarchy, and passing of data down the hierarchy. Thus,
   some characters in the species-level treatment may be scored directly
   in that treatment, while others will collate data (e.g. leaf
   measurements) from items in the specimen-level treatment. Conversely,
   some characters may be scored in a genus-level treatment, and these
   become implicitly true for all taxa in a linked species-level
   treatment.
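
   As a rough illustration of collation up the hierarchy and passing
   data down it, here is a minimal Python sketch. Everything in it is
   assumed for the example: the dictionaries standing in for
   treatments, the names collate_range and species_description, and the
   choice of a (min, max) range as the collation rule. The draft does
   not prescribe any of these.

```python
# Hypothetical in-memory stand-ins for three linked treatments.
# None of these names or structures come from the DDST draft.

# Specimen-level treatment: leaf length (mm) scored per specimen.
specimen_treatment = {
    "specimen-1": {"taxon": "Species A", "leaf length": 42.0},
    "specimen-2": {"taxon": "Species A", "leaf length": 55.5},
    "specimen-3": {"taxon": "Species B", "leaf length": 12.0},
}

# Genus-level treatment: scores implicitly true for every linked species.
genus_treatment = {"leaf arrangement": "opposite"}


def collate_range(specimens, taxon, character):
    """One possible collation rule: summarise measurements as (min, max)."""
    values = [
        scores[character]
        for scores in specimens.values()
        if scores["taxon"] == taxon and character in scores
    ]
    return (min(values), max(values)) if values else None


def species_description(taxon, direct_scores):
    """Combine inherited, collated and directly scored characters."""
    description = dict(genus_treatment)  # passed down from the genus level
    description["leaf length"] = collate_range(  # collated up from specimens
        specimen_treatment, taxon, "leaf length"
    )
    description.update(direct_scores)  # scored directly in this treatment
    return description


species_a = species_description("Species A", {"flower colour": "yellow"})
print(species_a)
```

   Here the species-level description mixes all three origins of data:
   one character inherited from the genus, one collated from specimens,
   and one scored directly.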

   Rich Attribution: all data elements in the DDST may be fully
   attributed to a source (e.g. contributor, published reference,
   specimen etc). Attribution will be optional at any level. Attribution
   will allow data-tracking and house-keeping, especially in circumstances
   when several contributors work on one treatment.

   The list of data elements below is structured using tabbed levels.
   Items tabbed across one level and enclosed in square parentheses are
   replicable within the higher level.

   Items in bold are required within their level (although the
   higher-level structure to which they belong may not be required).

   Comments are in curly parentheses.

   Note that this draft specification does not imply any particular
   structure for the data file used. It should be read as a list of
   required data elements for the final specification.
   ______________________________________________________________________

   Treatment Name {Free-text title for the treatment}
       Description {Free-text description of the treatment}
       Treatment build/revision number {A real numeric e.g. 4.1 used for
       version control}
       Treatment build/revision date {Date string (standardised format?)}
       Contributors List {List of contributors to the treatment,
       including the principal builder}

   [
       ID {Unique (in the context of this treatment) number for the
       contributor}
       Name {contributor's name}
       Contact details {contributor's address, email etc}
       Private notes {internal notes on contributor}7
       ]

   Attribution {ID of principal treatment builder - this is the default
   attribution unless a lower-level item is specifically attributed}1
   List of sources
   [
       ID {Number for the source}
       Description {e.g. reference, description of specimen set etc}
       ]

   Principal Source {ID of the principal (default) source for the data}1
   Treatment attachments {General information topics applicable to the
   treatment as a whole}
   [
       Attachment name
       Attachment type {e.g. xml,html,txt,rtf,jpeg,gif}
       Attachment path/URL
       Public attachment notes7
       Private attachment notes7
       ]

   Private treatment notes {internal freeform notes for treatment}7
   Character list source {path to an external lexicon that defines the
   character list for the treatment}
   Character set names list {list of set names for characters}
   [
       Name {name string for a character set}
       ]

   Character List {required unless an external lexicon resource has been
   specified above}
   [
       Character Name3
       Character ID
       Set membership {list of sets to which the character belongs; a
       character must be able to belong to more than one set}
       Attribution1 {reference to a contributor's ID from the Contributors
       list}
       Source1 {reference to a source's ID from the Sources list}
       Collated Character source {path name for another treatment that
       contains lower-level data for this character}

   Collation rule name {Name of a collation rule as defined in the
       Collation Rules list}2

   Character type {ordered multistate, unordered multistate etc}
   Character dependencies (up) 4
   Applies To list (or global/restricted type definition, then leave it
   to program to extract) 5
   Character attachments
   [
       attachment name
       attachment type
       attachment path/URL
       Public notes7
       Private notes7
       ]

   Private notes {internal notes for character}7
   Character State List
   [
       Character state name | Character state ID
       Character dependencies (down)
       Character state attachments

   [
       attachment name
       attachment type
       attachment path/URL
       Public notes7
       Private notes7
       ]

   Private notes7
   ]
   ]

   Taxon list source {path to an external resource that defines the
   taxon list for the treatment}
   Taxon set names {defines a list of allowable names for taxon sets}
   [
       name
       ]

   Taxon List
   [
       Name | Taxon ID
       Taxon set membership {list of sets to which the taxon belongs; a
       taxon must be able to belong to more than one set?}
       Taxon attribution1
       Taxon attachments

   [
       attachment name
       attachment type
       attachment path/URL
       Public notes7
       Private notes7
       ]

   Private notes7
   ]

   Item Data {This will hold the "score matrix"}
   Taxon Name|ID/Character Name|ID6

   Character Name|ID/Taxon Name|ID6

   State Name|ID

   Score {normally present, rare, present by misinterpretation etc}
       Score Attribution1
       Public Notes7
       Private Notes7
   ______________________________________________________________________

   1 Attribution and sources for an item datum override those for a
   character or taxon, which in turn override those for the treatment as
   a whole. Attributions for characters and taxa are equivalent and
   additive.
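
   The override order in note 1 could be resolved along the following
   lines. This is only a sketch: the function name resolve_attribution
   and its parameters are made up for illustration, and reading
   "additive" as "combine the character and taxon attributions" is an
   assumption, not something the draft spells out.

```python
def resolve_attribution(treatment_default, character=None, taxon=None, item=None):
    """Return the contributor IDs that apply to one item datum.

    Hypothetical helper: an item-level attribution wins outright;
    otherwise character and taxon attributions (equal rank) are
    combined; otherwise the treatment default applies.
    """
    if item is not None:
        return [item]
    combined = [cid for cid in (character, taxon) if cid is not None]
    if combined:
        # Assumed meaning of "additive": keep both, without duplicates.
        return list(dict.fromkeys(combined))
    return [treatment_default]


print(resolve_attribution(1))                        # treatment default only
print(resolve_attribution(1, character=2, taxon=3))  # character and taxon combined
print(resolve_attribution(1, character=2, item=4))   # item datum overrides
```

   The same function would serve for sources, since the note treats
   attribution and sources alike.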

   2 Treatments are nestable. That is, one treatment may contain data on
   specimens, a higher-level treatment on taxa. The higher-level
   treatment gathers information for some characters from lower-level
   treatments, using a specified collation rule. Collation rules will be
   specified externally to the treatment, and will cover e.g. how to
   merge scores, calculate values, deal with conflicts in source data etc.

   3 Character names may be hierarchically nested. Character properties
   (e.g. sets, dependencies, attachments) are only specified for the
   lowest level characters.
   e.g.
       Leaves
           margins
               teeth
                   orientation } only these have
                   shape       } properties
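
   One way a parser might pick out the lowest-level characters (the
   only ones that carry properties) is sketched below. The nested
   dictionary and the function name lowest_level are assumptions made
   for the example, and the nesting shown is one reading of the example
   above, in which each name sits one level below the previous one.

```python
# Nested character names; an empty dict marks a lowest-level character.
# The structure is a hypothetical representation, not part of the draft.
characters = {
    "Leaves": {
        "margins": {
            "teeth": {
                "orientation": {},
                "shape": {},
            },
        },
    },
}


def lowest_level(tree, path=()):
    """Yield the full path of every character that has no children."""
    for name, children in tree.items():
        here = path + (name,)
        if children:
            yield from lowest_level(children, here)
        else:
            yield here


leaf_characters = list(lowest_level(characters))
for path in leaf_characters:
    print(" / ".join(path))
```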

   4 Dependencies may be defined either up or down (but not both?). An up
   dependency lists the character states that make this character
   inapplicable; a down dependency lists characters that become
   inapplicable when this state is chosen.
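
   The two directions carry the same information, which the following
   sketch tries to show by deriving up dependencies from down ones. The
   character and state names, and the helpers down_to_up and
   is_applicable, are invented for the example.

```python
from collections import defaultdict

# Down dependencies: a (character, state) pair lists the characters that
# become inapplicable when that state is chosen. Example data only.
down = {
    ("leaves", "absent"): ["leaf shape", "leaf margin"],
    ("petals", "absent"): ["petal colour"],
}


def down_to_up(down_deps):
    """Invert down dependencies into up dependencies.

    The result maps each dependent character to the list of
    (character, state) pairs that make it inapplicable.
    """
    up_deps = defaultdict(list)
    for controlling_state, dependents in down_deps.items():
        for dependent in dependents:
            up_deps[dependent].append(controlling_state)
    return dict(up_deps)


def is_applicable(character, scored_states, up_deps):
    """True unless one of the character's controlling states has been scored."""
    return not any(s in scored_states for s in up_deps.get(character, []))


up = down_to_up(down)
print(up["leaf shape"])
print(is_applicable("leaf shape", {("leaves", "absent")}, up))
```

   Since either form can be computed from the other, a file would only
   need to store one of them, which may be the reason for the "but not
   both?" question.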

   5 The idea here is to specify a subset of taxa for which this
   character is scored, or to specify that the character is non-global,
   then leave it to the parsing program to determine the taxon list. This
   feature would be used by future identification programs that employ
   the Progressive Revelation model.

   6 The item data may be stored as the equivalent of either a
   taxon-state matrix or a state-taxon matrix, depending upon whether
   taxa are nested within characters or characters are nested within
   taxa. There will need to be a way of specifying which of these is
   operative.
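
   The two nestings in note 6 are transposes of one another, as the
   sketch below suggests. The nested dictionaries, the transpose
   helper, and the "orientation" flag are all assumptions made for
   illustration; the draft leaves open how the operative nesting would
   be specified.

```python
def transpose(nested):
    """Swap the outer and inner keys of a two-level dict of scores."""
    flipped = {}
    for outer, inner in nested.items():
        for key, score in inner.items():
            flipped.setdefault(key, {})[outer] = score
    return flipped


# Characters nested within taxa. The "orientation" flag is a hypothetical
# way of recording which nesting is operative.
item_data = {
    "orientation": "taxon-major",
    "scores": {
        "Species A": {"leaf shape": "ovate", "petal colour": "yellow"},
        "Species B": {"leaf shape": "linear"},
    },
}

# Taxa nested within characters: the same scores, transposed.
character_major = transpose(item_data["scores"])
print(character_major)
```

   Transposing twice gives back the original nesting, so no
   information is lost by choosing one form over the other.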

   7 Public Notes are available for parsing, Private Notes are not, and
   are designed for private housekeeping within the treatment.