characters/states and measurements and other hoary problems

Kevin Thiele kevin.thiele at PI.CSIRO.AU
Thu Aug 3 15:58:21 CEST 2000

Dear List'eners

attached find DDST Specifications.htm. See if this works.

Cheers - k
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="DDST Specifications.htm"

<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
   <meta http-equiv=3D"Content-Type" content=3D"text/html;=
   <meta name=3D"Generator" content=3D"Microsoft Word 97">
   <meta name=3D"Template" content=3D"C:\PROGRAM FILES\MICROSOFT=
   <meta name=3D"GENERATOR" content=3D"Mozilla/4.73 [en] (Win98; I)=
   <title>Treatment level</title>
<body link=3D"#0000FF" vlink=3D"#800080">
<b><font size=3D-1>Draft Specifications for a Descriptive
Data Standard for Taxonomy</font></b>
<p><font size=3D-1>Version History:</font>
<p><font size=3D-1>Version 1.0 February 24, 2000, K.Thiele</font>
<p><font size=3D-1>Version 1.1 revised July 18, 2000, K.Thiele</font>
<p><font size=3D-1>General Requirements</font>
<p><font size=3D-1>The DDST will be a data file structure that allows the
capture and management of all types of data required for describing the
morphology and anatomy of an organism or taxon. All data and metadata needed
will be stored in one file, structured into several blocks (character lists,
taxon lists, items data etc.).</font>
<p><font size=3D-1>One file will comprise one treatment, the basic unit of
which is one or more characters describing one or more taxa or=
<p><font size=3D-1>The DDST will support the following:</font>
<p><font size=3D-1>External lexica: these are externally-referenced lists
of characters and states, or taxa, shared between several treatments. Lexica
may be used without modification, or with one or more characters, states
or taxa added internally (e.g. global vs local characters).</font>
<p><font size=3D-1>Collation of data: data in the DDST may be captured and
managed at several levels. One treatment (see above for definition of=
may store descriptive data for individual specimens, another may store
data for species-level taxa, while another may store data for higher-level
taxa. These individual treatments may be linked into a nested hierarchy,
with specified collation rules allowing collation of data up the hierarchy,
and passing of data down the hierarchy. Thus, some characters in the=
treatment may be scored directly in that treatment, while others will=
data (e.g. leaf measurements) from items in the specimen-level treatment.
Conversely, some characters may be scored in a genus-level treatment, and
these become implicitly true for all taxa in a linked species-level=
<p><font size=3D-1>Rich Attribution: all data elements in the DDST may be
fully attributed to a source (e.g. contributor, published reference,=
etc). Attribution will be optional at any level. Attribution will allow
data-tracking and house-keeping, especially in circumstances when several
contributors work on one treatment.</font>
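<p><font size=3D-1>The override order given in footnote 1 (item datum first, then character or taxon, then the treatment default) can be read as a simple fallback lookup. The following is only a minimal sketch of that reading: the function name resolve_attribution, its arguments, and the use of None for "no attribution given" are illustrative assumptions and are not part of the draft.</font>

```python
from typing import Optional


def resolve_attribution(item: Optional[int],
                        char_or_taxon: Optional[int],
                        treatment: Optional[int]) -> Optional[int]:
    """Return the most specific contributor ID that has been set.

    Assumption: each argument is a contributor ID from the Contributors
    list, or None when no attribution was given at that level.
    """
    for contributor_id in (item, char_or_taxon, treatment):
        if contributor_id is not None:
            return contributor_id
    # Attribution is optional at every level, so nothing may be set at all.
    return None
```

<p><font size=3D-1>With these assumptions, resolve_attribution(None, 2, 1) falls back to the character- or taxon-level contributor 2, while resolve_attribution(None, None, 1) falls back to the treatment default 1.</font>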
<p><font size=3D-1>The list of data elements below is structured using=
levels. Items tabbed across one level and enclosed in square brackets
are replicable within the higher level.</font>
<p><font size=3D-1>Items in <b>bold</b> are required within their level=
the higher-level structure to which they belong may not be required)</font>
<p><font size=3D-1>Comments are in curly braces.</font>
<p><font size=3D-1>Note that this draft specification does not imply any
particular structure for the data file used. It should be read as a list
of required data elements for the final specification.</font><font=
<hr WIDTH=3D"100%">
<dir><font size=3D-1><b>Treatment Name </b>{Free-text title for the=
<br><font size=3D-1>Description {Free-text description of the=
<br><font size=3D-1>Treatment build/revision number {A real number, e.g.
4.1, used for version control}</font>
<br><font size=3D-1>Treatment build/revision date {Date string (standardised
<br><font size=3D-1>Contributors List {List of contributors to the=
including the principal builder}</font>
<dir><font size=3D-1>[</font>
<br><font size=3D-1><b>ID</b> {Unique (in the context of this treatment)
number for the contributor}</font>
<br><font size=3D-1><b>Name</b> {contributor's name}</font>
<br><font size=3D-1>Contact details {contributor's address, email=
<br><font size=3D-1>Private notes {internal notes on=
<br><font size=3D-1>]</font></dir>
<font size=3D-1>Attribution {ID of principal treatment builder - this is
the default attribution unless a lower-level item is specifically=
<br><font size=3D-1>List of sources</font>
<dir><font size=3D-1>[</font>
<br><font size=3D-1><b>ID</b> {Number for the source}</font>
<br><font size=3D-1><b>Description</b> {e.g. reference, description of=
set etc}</font>
<br><font size=3D-1>]</font></dir>
<font size=3D-1>Principal Source {ID of the principal (default) source for
the data}<sup>1</sup></font>
<br><font size=3D-1>Treatment attachments {General information topics=
to the treatment as a whole}</font>
<dir><font size=3D-1>[</font>
<br><b><font size=3D-1>Attachment name</font></b>
<br><font size=3D-1><b>Attachment type</b> {e.g.=
<br><b><font size=3D-1>Attachment path/URL</font></b>
<br><font size=3D-1>Public attachment notes<sup>7</sup></font>
<br><font size=3D-1>Private attachment notes<sup>7</sup></font>
<br><font size=3D-1>]</font></dir>
<font size=3D-1>Private treatment notes {internal freeform notes for=
<br><font size=3D-1>Character list source {path to an external lexicon that
defines the character list for the treatment}</font>
<br><font size=3D-1>Character set names list {list of set names for=
<dir><font size=3D-1>[</font>
<br><font size=3D-1><b>Name</b> {name string for a character set}</font>
<br><font size=3D-1>]</font></dir>
<font size=3D-1>Character List {required unless an external lexicon resource
has been specified above}</font>
<dir><font size=3D-1>[</font>
<br><font size=3D-1><b>Character Name</b><sup>3</sup></font>
<br><b><font size=3D-1>Character ID</font></b>
<br><font size=3D-1>Set membership {list of sets to which the character=
a character must be able to belong to more than one set}</font>
<br><font size=3D-1>Attribution<sup>1 </sup>{reference to a contributor's
ID from the <i>Contributors </i>list}</font>
<br><font size=3D-1>Source<sup>1</sup> (reference to a source's ID from=
<br><font size=3D-1>Collated Character source {path name for another=
that contains lower-level data for this character}</font>
<dir><font size=3D-1><b>Collation rule name</b> {Name of a collation rule
as defined in the <i>Collation Rules</i> list}<sup>2</sup></font></dir>
<font size=3D-1><b>Character type</b> {ordered multistate, unordered=
<br><font size=3D-1>Character dependencies (up)<sup> 4</sup></font>
<br><font size=3D-1>Applies To list (or global/restricted type definition,
then leave it to the program to extract)<sup> 5</sup></font>
<br><font size=3D-1>Character attachments</font>
<dir><font size=3D-1>[</font>
<br><b><font size=3D-1>attachment name</font></b>
<br><b><font size=3D-1>attachment type</font></b>
<br><b><font size=3D-1>attachment path/URL</font></b>
<br><font size=3D-1>Public notes<sup>7</sup></font>
<br><font size=3D-1>Private notes<sup>7</sup></font>
<br><font size=3D-1>]</font></dir>
<font size=3D-1>Private notes {internal notes for=
<br><b><font size=3D-1>Character State List</font></b>
<dir><font size=3D-1>[</font>
<br><b><font size=3D-1>Character state name | Character state ID</font></b>
<br><font size=3D-1>Character dependencies (down)</font>
<br><font size=3D-1>Character state attachments</font>
<dir><font size=3D-1>[</font>
<br><b><font size=3D-1>attachment name</font></b>
<br><b><font size=3D-1>attachment type</font></b>
<br><b><font size=3D-1>attachment path/URL</font></b>
<br><font size=3D-1>Public notes<sup>7</sup></font>
<br><font size=3D-1>Private notes<sup>7</sup></font>
<br><font size=3D-1>]</font></dir>
<font size=3D-1>Private notes<sup>7</sup></font>
<br><font size=3D-1>]</font></dir>
<font size=3D-1>]</font></dir>
<font size=3D-1>Taxon list source {path to an external resource that defines
the taxon list for the treatment}</font>
<br><font size=3D-1>Taxon set names {defines a list of allowable names for
taxon sets}</font>
<dir><font size=3D-1>[</font>
<br><b><font size=3D-1>name</font></b>
<br><font size=3D-1>]</font></dir>
<b><font size=3D-1>Taxon List</font></b>
<dir><font size=3D-1>[</font>
<br><b><font size=3D-1>Name | Taxon ID</font></b>
<br><font size=3D-1>Taxon set membership {list of sets to which the taxon
belongs; a taxon must be able to belong to more than one set?}</font>
<br><font size=3D-1>Taxon attribution<sup>1</sup></font>
<br><font size=3D-1>Taxon attachments</font>
<dir><font size=3D-1>[</font>
<br><b><font size=3D-1>attachment name</font></b>
<br><b><font size=3D-1>attachment type</font></b>
<br><b><font size=3D-1>attachment path/URL</font></b>
<br><font size=3D-1>Public notes<sup>7</sup></font>
<br><font size=3D-1>Private notes<sup>7</sup></font>
<br><font size=3D-1>]</font></dir>
<font size=3D-1>Private notes<sup>7</sup></font>
<br><font size=3D-1>]</font></dir>
<font size=3D-1><b>Item Data</b> {This will hold the "score matrix"}</font>
<dir><font size=3D-1><b>Taxon Name|ID/Character=
<dir><font size=3D-1><b>Character Name|ID/Taxon=
<dir><b><font size=3D-1>State Name|ID</font></b>
<dir><font size=3D-1><b>Score</b> {normally present, rare, present by=
<br><font size=3D-1>Score Attribution<sup>1</sup></font>
<br><font size=3D-1>Public Notes<sup>7</sup></font>
<br><font size=3D-1>Private Notes<sup>7</sup></font></dir>

<hr WIDTH=3D"100%">
<p><sup><font size=3D-1>1 Attribution and sources for an item datum override
those for a character or taxon, which in turn override those for the treatment as
a whole. Attribution for characters and taxa are equivalent and=
<p><sup><font size=3D-1>2 Treatments are nestable. That is, one treatment
may contain data on specimens, a higher-level treatment on taxa. The=
treatment gathers information for some characters from lower-level=
using a specified collation rule. Collation rules will be specified=
to the treatment, and will cover e.g. how to merge scores, calculate values,
deal with conflicts in source data etc.</font></sup>
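<p><font size=3D-1>As an illustration of what named collation rules might look like, the sketch below collates specimen-level scores into a single higher-level score. The rule names "range" and "union", the function names, and the shape of the returned values are all hypothetical; the draft does not define any particular rules.</font>

```python
from statistics import mean


def collate_range(values):
    """Hypothetical 'range' rule: summarise numeric specimen scores
    (e.g. leaf measurements) as min, max and mean."""
    if not values:
        return None  # nothing scored at the lower level
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def collate_union(state_sets):
    """Hypothetical 'union' rule: merge multistate scores so the
    higher-level taxon carries every state seen in its specimens."""
    merged = set()
    for states in state_sets:
        merged |= set(states)
    return merged


# A character would name one of these rules in its "Collation rule name".
COLLATION_RULES = {"range": collate_range, "union": collate_union}
```

<p><font size=3D-1>A higher-level treatment could then look up the rule by name, for example COLLATION_RULES["range"]([10.0, 12.0, 14.0]), and store the result as the score for the collated character.</font>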
<p><sup><font size=3D-1>3 Character names may be hierarchically nested.=
properties (e.g. sets, dependencies, attachments) are only specified for
the lowest level characters.</font></sup>
<dir><sup><font size=3D-1>e.g.</font></sup>
<p><sup><font size=3D-1>Leaves</font></sup>
<dir><sup><font size=3D-1>margins</font></sup>
<p><sup><font size=3D-1>teeth</font></sup>
<dir><sup><font size=3D-1>orientation }only these have</font></sup>
<p><sup><font size=3D-1>shape } properties</font></sup></dir>
<sup><font size=3D-1>4 Dependencies may be defined either up or down (but
not both?). An up dependency lists the character states that make this
character inapplicable; a down dependency lists characters that become
inapplicable when this state is chosen.</font></sup>
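<p><font size=3D-1>The two directions of dependency carry the same information, so one can be derived from the other. The sketch below inverts a table of down dependencies into up dependencies and uses the result to test whether a character is applicable. The identifiers (down_to_up, is_applicable, and the example character and state IDs) are invented for illustration only.</font>

```python
def down_to_up(down):
    """Invert {state_id: [character_id, ...]} (down dependencies)
    into {character_id: {state_id, ...}} (up dependencies)."""
    up = {}
    for state_id, characters in down.items():
        for character_id in characters:
            up.setdefault(character_id, set()).add(state_id)
    return up


def is_applicable(character_id, chosen_states, up):
    """A character is inapplicable if any chosen state appears
    in its up-dependency list."""
    return not (up.get(character_id, set()) & set(chosen_states))


# Hypothetical IDs: choosing "leaves:absent" rules out the leaf characters.
down = {"leaves:absent": ["leaf_margin", "leaf_shape"]}
up = down_to_up(down)
```

<p><font size=3D-1>Storing only one direction and deriving the other, as here, is one way to avoid the two lists disagreeing; whether the standard should allow both is left open in the draft.</font>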
<p><sup><font size=3D-1>5 The idea here is to specify a subset of taxa for
which this character is scored, or to specify that the character is=
then leave it to the parsing program to determine the taxon list. This
feature would be used by future identification programs that employ the
Progressive Revelation model.</font></sup>
<p><sup><font size=3D-1>6 The item data may be stored as the equivalent of
either a taxon-state matrix or a state-taxon matrix, depending upon whether
taxa are nested within characters or characters are nested within taxa.
There will need to be a way of specifying which of these is=
<p><sup><font size=3D-1>7 Public Notes are available for parsing; Private
Notes are not, and are designed for private housekeeping within the=
