SDD Specifications Document

Kevin Thiele kevin.thiele at PI.CSIRO.AU
Wed Feb 23 16:02:40 CET 2000

Dear list_eners

attached find SDDspecs.rtf. This is my attempt to formalise the discussion
so far into a set of specifications for a new descriptive data standard.

Leigh has explained that his XDELTA is a first-cut approach to such a
standard, done more to show what's possible than as a serious suggestion for
a final cut. He has offered to revisit and rework "XDELTA" in light of the
list's discussions.

To make his job easier, I think it would be a good idea for those of us still
out there to work up a structured specification document for Leigh to work
his magic on. If it works OK, the attached may be the first step to this.

Again, this is a first cut. It incorporates my ideas, and those I've been
able to trawl from the list that have made sense to me. I may have got
things all wrong, and it will almost certainly not be exhaustive, but I
think at this stage it's better for the authors of other ideas to work on
this directly rather than for me to try to understand what they were saying.

I hope the document's self-explanatory - if not, ask me to explain.

Any suggestions as to the best way to work up the document? If we all edit
bits and repost we'll end up with diverging versions.

Anyway, see if this makes sense to you.

Cheers - k

Content-Type: application/rtf; charset="us-ascii"
Content-Disposition: attachment; filename="SDDSPECS.RTF"

The document below outlines data requirements for an XML-based descriptive data standard. The list is structured using indented levels (items indented one level are replicable within the higher level). Items marked * (bold in the original RTF) are required within their level (but the higher-level structure to which they belong may not be required). Comments are in {}. Bracketed numbers such as [1] refer to the notes at the end.

Treatment

    * Treatment Name {Free-text title for the treatment}
    Description {Free-text description of the treatment}
    Treatment build/revision number {A real numeric e.g. 4.1 used for version control}
    Treatment build/revision date {Date string (standardised format?)}
    Contributors List {List of contributors to the treatment, including the principal builder}
        * ID {Unique (in the context of this treatment) number for the contributor}
        * Name {contributor's name}
        Contact details {contributor's address, email etc}
        Private notes {internal notes on contributor, not for parsing}
    Attribution {ID of principal treatment builder - this is the default attribution unless a lower-level item is specifically attributed} [1]
    List of sources
        * ID {Number for the source}
        * Description {e.g. reference, description of specimen set etc}
    Principal Source {ID of the principal (default) source for the data} [1]
    Treatment attachments {General information topics applicable to the treatment as a whole}
        * Attachment name
        * Attachment type {e.g. xml,html,txt,rtf,jpeg,gif}
        * Attachment path/URL
        Public attachment notes
        Private attachment notes
    Private treatment notes {internal freeform notes for treatment}
    Character list source {path to an external lexicon that defines the character list for the treatment}
    Character set names list {list of set names for characters}
        * Name {name string for a character set}
    Collation rules {Set of rules for collating characters - e.g. how to merge scores, calculate values, deal with conflicts in source data etc} [2]
        * Name {A reference name for the rule}
        * Rule definition {don't know how these will be structured yet}
    Character List {required unless an external lexicon resource has been specified above}
        * Character Name [3]
        * Character ID
        Set membership {list of sets to which the character belongs; a character must be able to belong to more than one set}
        Attribution [1] {reference to a contributor's ID from the Contributors list}
        Source [1] {reference to a source's ID from the Sources list}
        Collated Character source {path name for another treatment that contains lower-level data for this character}
            * Collation rule name {Name of a collation rule as defined in the Collation Rules list}
        * Character type {ordered multistate, unordered multistate etc}
        Character dependencies (up) [4]
        Applies To list (or global/restricted type definition, then leave it to program to extract) [5]
        Character attachments
            * attachment name
            * attachment type
            * attachment path/URL
            Public notes
            Private notes
        Private notes {internal notes for character}
        * Character State List
            * Character state name | Character state ID
            Character dependencies (down)
            Character state attachments
                * attachment name
                * attachment type
                * attachment path/URL
                Public notes
                Private notes
            Private notes
    Taxon list source {path to an external resource that defines the taxon list for the treatment}
    Taxon set names {defines a list of allowable names for taxon sets}
        * name
    * Taxon List
        * Name | Taxon ID
        Taxon set membership {list of sets to which the taxon belongs; a taxon must be able to belong to more than one set?}
        Taxon attribution [1]
        Taxon attachments
            * attachment name
            * attachment type
            * attachment path/URL
            Public notes
            Private notes {not parsed}
        Private notes {not parsed}
    * Item Data {This will hold the "score matrix"}
        * Taxon Name|ID/Character Name|ID [6]
            * Character Name|ID/Taxon Name|ID [6]
                * State Name|ID
                    * Score {normally present, rare, present by misinterpretation etc}
                    Public Notes
                    Private Notes

------------------------

[1] Attribution and sources for an item datum override those for a character or taxon, which override those for the treatment as a whole. Attribution for characters and taxa are equivalent and additive.

[2] Treatments are nestable. That is, one treatment may contain data on specimens, a higher-level treatment on taxa. The higher-level treatment gathers information for some characters from lower-level treatments, using the collation rules defined here.

[3] Character names may be hierarchically nested. Character properties (e.g. sets, dependencies, attachments) are only specified for the lowest level characters.

    e.g.
    Leaves
        margins
            teeth
                orientation  }only these have
                shape        } properties

[4] Dependencies may be defined either up or down (but not both?). An up dependency lists the character states that make this character inapplicable; a down dependency lists characters that become inapplicable when this state is chosen.

[5] The idea here is to specify a subset of taxa for which this character is scored, or to specify that the character is non-global, then leave it to the parsing program to determine the taxon list. This feature would be used by future identification programs that employ the Progressive Revelation model.

[6] The item data may be stored as the equivalent of either a taxon-state matrix or a state-taxon matrix, depending upon whether taxa are nested within characters or characters are nested within taxa. There will need to be a way of specifying which of these is operative.
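
Note 1 describes a fallback order for attribution: item datum first, then character or taxon, then the treatment as a whole. A minimal Python sketch of one way a parser might apply that order follows. The function name and arguments are hypothetical, and reading "equivalent and additive" as "combine the character and taxon contributor IDs" is an assumption, not something the specification settles.

```python
from typing import List, Optional


def resolve_attribution(treatment: str,
                        character: Optional[str] = None,
                        taxon: Optional[str] = None,
                        item: Optional[str] = None) -> List[str]:
    """Return the contributor IDs that apply to a single item datum."""
    # An attribution on the item datum itself overrides everything else.
    if item is not None:
        return [item]
    # Assumption: character and taxon attributions rank equally and are
    # combined ("additive"); duplicates are dropped, order is preserved.
    combined = [c for c in (character, taxon) if c is not None]
    if combined:
        return list(dict.fromkeys(combined))
    # Otherwise fall back to the treatment-level default attribution.
    return [treatment]
```

The same fallback order would apply to sources, with the Principal Source as the treatment-level default.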

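Note 4 leaves open whether dependencies are stored "up" or "down". The two forms carry the same information, so a program could derive one from the other. The sketch below assumes a dependency table is a plain dictionary from a key to a list of names; the character and state labels are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def invert_dependencies(table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn an up-dependency table into a down-dependency table, or back.

    Up form:   character -> states that make the character inapplicable.
    Down form: state -> characters that become inapplicable when it is chosen.
    """
    inverted = defaultdict(list)
    for key, values in table.items():
        for value in values:
            inverted[value].append(key)
    return dict(inverted)


# Hypothetical example: entire margins make both teeth characters inapplicable.
up = {
    "teeth orientation": ["margins: entire"],
    "teeth shape": ["margins: entire"],
}
down = invert_dependencies(up)
```

Because the inversion is lossless for data like this, a standard could allow either form in a file and leave the conversion to the reading program.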

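Note 6 says the item data may nest characters within taxa or taxa within characters. As a rough illustration, assuming the item data has been read into nested dictionaries (outer key, inner key, then state name to score), switching between the two orientations is a simple transposition. The taxon, character, and state names below are invented.

```python
from typing import Dict

# outer key -> inner key -> state name -> score
ItemData = Dict[str, Dict[str, Dict[str, str]]]


def transpose_item_data(item_data: ItemData) -> ItemData:
    """Swap the two outer nesting levels of the score matrix."""
    flipped: ItemData = {}
    for outer, inner_map in item_data.items():
        for inner, states in inner_map.items():
            flipped.setdefault(inner, {})[outer] = states
    return flipped


# Hypothetical data with characters nested within taxa.
by_taxon = {
    "Taxon A": {"leaf shape": {"ovate": "normally present"}},
    "Taxon B": {"leaf shape": {"linear": "rare"}},
}
by_character = transpose_item_data(by_taxon)
```

A file would still need some flag stating which orientation it uses, as the note points out; the sketch only shows that neither choice loses information.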