Morphological Data Representation

Steve Shattuck
Fri Nov 23 14:33:06 CET 2001


Below and attached is a first attempt at representing simple, common
DELTA-type data in an XML-based structure.  I've used a selected set of
characters and items from the Butterfly sample data on the DELTA web site.
The DELTA-formatted data looks like this:


==== DELTA-Standard CHARS File ====

*SHOW: Lepidoptera demonstration characters. Revised 28-AUG-91.

*CHARACTER LIST

#1. main colour of inner part of front wing/
       1. white/
       2. cream/
       3. grey/
       4. brown/
       5. black/
       6. yellow/
       7. orange/
       8. blue/
       9. green/

#2. wings <transparency>/
       1. with transparent areas/
       2. without transparent areas/

#3. length of front wing/
       mm/

#4. antennae <length>/
       times length of front wing/


==== DELTA-Standard ITEMS File ====

*SHOW: Lepidoptera demonstration items. Revised 18-OCT-94.

*ITEM DESCRIPTIONS

# Antheraea/
1,4 2,2 3,43-50 4,0.15-0.2

# Ethmia/
1,2-4 2,2 3,11-14 4,0.6-0.65

# Graphium/
1,1-2/9 2,2 3,29-33 4,0.45-0.5

# Hecatesia/
1,4 2,1<small, translucent window>/2 3,11-14 4,0.8-0.9


==== DELTA-Standard SPECS File ====

*SHOW: Lepidoptera demonstration specifications. Revised 28-AUG-91.

*NUMBER OF CHARACTERS 4
*MAXIMUM NUMBER OF STATES 9
*MAXIMUM NUMBER OF ITEMS 4

*CHARACTER TYPES 3,RN 4,RN

*NUMBERS OF STATES 1,9




==== For these files, the DELTA-generated natural language would look
something like this:

Antheraea
Main colour of inner part of front wing brown. Wings without transparent
areas. Length of front wing 43-50 mm. Antennae 0.15-0.2 times length of
front wing.

Ethmia
Main colour of inner part of front wing cream to brown. Wings without
transparent areas. Length of front wing 11-14 mm. Antennae 0.6-0.65 times
length of front wing.

Graphium
Main colour of inner part of front wing white to cream, or green. Wings
without transparent areas. Length of front wing 29-33 mm. Antennae 0.45-0.5
times length of front wing.

Hecatesia
Main colour of inner part of front wing brown. Wings with transparent areas
(small, translucent window), or without transparent areas. Length of front
wing 11-14 mm. Antennae 0.8-0.9 times length of front wing.




==== Hand-generated natural language would be essentially the same except
for the last item, where it might look more like this:

Hecatesia
Main colour of inner part of front wing brown. Wings with or without
transparent areas (when present, forming a small window). Length of front
wing 11-14 mm. Antennae 0.8-0.9 times length of front wing.


============================================

I've translated this into the XML file that is attached.  (Even this fairly
simple example is moderately large, and I would recommend using an XML viewer
such as Microsoft's XML Notepad when working with it - see
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxml/html/xmlpaddownload.asp.)

The basic structure of the attached file is:

<Morphology>
   <Characters>
      <Character> - ANY NUMBER
         <Type>
         <ID>
         <Short_Descriptor>
         <Long_Descriptor>
         <State_Descriptor>
         <Order>
         <State> - ANY NUMBER
            <Value>
            <ID>
            <Order>
   <Items>
      <Item> - ANY NUMBER
         <Type>
         <ID>
         <Descriptor>
         <CodedCharacter> - ANY NUMBER
            <CharacterID>
            <Description>
            <CodedState> - ANY NUMBER
               <StateID>
               <Value>
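As a quick check that the outline above is easy to consume, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the attachment has already been decoded from quoted-printable, so no stray "=" line breaks remain. The function name load_morphology and the dictionaries it returns are illustrative only and are not part of the proposed format.

```python
import xml.etree.ElementTree as ET

def load_morphology(xml_text):
    """Hypothetical helper: index characters, states and items by their <ID>.

    Returns (characters, states, items); states maps a state ID to a
    (owning CharacterID, state Value) pair.
    """
    root = ET.fromstring(xml_text)
    characters, states, items = {}, {}, {}
    for char in root.iter("Character"):
        char_id = char.findtext("ID")
        characters[char_id] = char
        # Each <State> is a direct child of exactly one <Character>.
        for state in char.findall("State"):
            states[state.findtext("ID")] = (char_id, state.findtext("Value"))
    for item in root.iter("Item"):
        items[item.findtext("ID")] = item
    return characters, states, items

SAMPLE = """<Morphology>
   <Characters>
      <Character>
         <Type>Unordered</Type>
         <ID>C2</ID>
         <State><Value>with transparent areas</Value><ID>C2S1</ID><Order>1</Order></State>
         <State><Value>without transparent areas</Value><ID>C2S2</ID><Order>2</Order></State>
      </Character>
   </Characters>
   <Items>
      <Item><Type>Taxon</Type><ID>T1</ID><Descriptor>Antheraea</Descriptor></Item>
   </Items>
</Morphology>"""

characters, states, items = load_morphology(SAMPLE)
print(states["C2S2"])                      # ('C2', 'without transparent areas')
print(items["T1"].findtext("Descriptor"))  # Antheraea
```

Because <State> and <CodedState> are distinct tag names, the lookups never confuse state definitions with coded values.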


Note that I've treated everything as elements and haven't used attributes.
Simplicity is the only reason for this, and some elements would be better as
attributes; these can be converted when the dust settles.
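Converting elements to attributes later should be mechanical. The following is a minimal sketch of that conversion. Which elements to promote (here <ID> and <Type>) and the attribute names are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of child elements to turn into attributes.
PROMOTE = ("ID", "Type")

def promote_to_attributes(root, names=PROMOTE):
    """Turn simple child elements such as <ID>C3</ID> into attributes (ID="C3")."""
    # Take a snapshot of the tree first, so removing children is safe.
    for parent in list(root.iter()):
        for name in names:
            child = parent.find(name)
            # Only promote leaf elements, which have no children of their own.
            if child is not None and len(child) == 0:
                parent.set(name, child.text or "")
                parent.remove(child)
    return root

root = ET.fromstring(
    "<Character><Type>Real</Type><ID>C3</ID>"
    "<State_Descriptor>mm</State_Descriptor></Character>"
)
promote_to_attributes(root)
print(ET.tostring(root, encoding="unicode"))
# <Character ID="C3" Type="Real"><State_Descriptor>mm</State_Descriptor></Character>
```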

I've tried to generalise as much as possible and use only two main elements:
<Character> and <Item>.  Each character is assigned a <Type> that tells what
it is (ordered multistate, unordered multistate, real, integer, etc.).
Similarly, the item <Type> can be specified as taxon or specimen (or
potentially something else).
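The character <Type> could be filled directly from the DELTA SPECS file. The sketch below maps the DELTA type codes (UM, OM, IN, RN, TE) to <Type> strings. "Ordered", "Unordered" and "Real" appear in the attachment. "Integer" and "Text" are assumed labels. As far as I understand DELTA, characters not listed under *CHARACTER TYPES default to unordered multistate. A straight conversion of the SPECS file above would therefore give character 1 the type Unordered, whereas the attached file marks it Ordered. That difference may be worth settling.

```python
# DELTA character-type codes mapped to <Type> values.
# "Integer" and "Text" are assumed labels not used in the attachment.
DELTA_TYPES = {
    "UM": "Unordered",
    "OM": "Ordered",
    "IN": "Integer",
    "RN": "Real",
    "TE": "Text",
}

def character_types(directive, n_chars, default="UM"):
    """Parse a *CHARACTER TYPES list such as '3,RN 4,RN'.

    Characters that are not listed get the default type (unordered multistate).
    """
    types = {n: DELTA_TYPES[default] for n in range(1, n_chars + 1)}
    for entry in directive.split():
        chars, code = entry.split(",")
        # Ranges such as '3-4,RN' are also accepted by this sketch.
        if "-" in chars:
            low, high = (int(x) for x in chars.split("-"))
        else:
            low = high = int(chars)
        for n in range(low, high + 1):
            types[n] = DELTA_TYPES[code]
    return types

print(character_types("3,RN 4,RN", 4))
# {1: 'Unordered', 2: 'Unordered', 3: 'Real', 4: 'Real'}
```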

A couple of points are probably worth making:

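The <Short_Descriptor> and <Long_Descriptor> elements are used to support
DELTA comments (for example, "wings <transparency>" becomes the short
descriptor "wings" and the long descriptor "transparent areas on wings").
This probably needs to be generalised further to support any number of
alternate phrasings.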
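The short descriptor can be derived automatically by stripping the DELTA comment. The long descriptor ("transparent areas on wings") is hand-written and cannot be derived this way. This is a minimal sketch. The function name is hypothetical, and the sketch does not handle nested angle-bracket comments, which DELTA permits.

```python
import re

# A DELTA comment: text in angle brackets, with any whitespace before it.
# Nested comments are NOT handled by this simple pattern.
COMMENT = re.compile(r"\s*<([^<>]*)>")

def split_delta_comment(text):
    """Return (descriptor with comments removed, list of comment texts)."""
    comments = COMMENT.findall(text)
    bare = COMMENT.sub("", text).strip()
    return bare, comments

print(split_delta_comment("wings <transparency>"))  # ('wings', ['transparency'])
print(split_delta_comment("antennae <length>"))     # ('antennae', ['length'])
```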

<State_Descriptor> holds the units of numeric characters and isn't needed
(?) for other character types; it's an attempt at keeping <Character>
general.
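As an illustration of how <State_Descriptor> would be used, the numeric <Description> text in the attachment (e.g. "43 to 50 mm") can be built from the minimum and maximum <CodedState> values plus the units. This is a sketch under the assumption that a single value is recorded with minimum equal to maximum. Values stay as strings because the attachment writes "0.50", and converting to float would turn that into 0.5.

```python
def numeric_description(minimum, maximum, units):
    """Build text such as '43 to 50 mm' from minimum/maximum values and
    the character's <State_Descriptor> (units)."""
    text = minimum if minimum == maximum else f"{minimum} to {maximum}"
    # An empty <State_Descriptor/> means no units are appended.
    return f"{text} {units}" if units else text

print(numeric_description("43", "50", "mm"))  # 43 to 50 mm
print(numeric_description("0.15", "0.2", "times length of front wing"))
# 0.15 to 0.2 times length of front wing
```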

The <Description> element in <CodedCharacter> holds natural-language
representations. Codes for the states (when needed) are placed in square
brackets and translated during generation.  As noted above, this may need to
be generalised to support any number of phrasings.
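The bracket translation step can be as small as a single regular-expression substitution. In this minimal sketch, the state table is hard-coded from the attachment; in practice it would be built from the <State> elements. The names are illustrative.

```python
import re

# State ID -> <Value>, copied from the attached file for illustration.
STATE_VALUES = {
    "C1S1": "white", "C1S2": "cream", "C1S4": "brown", "C1S9": "green",
    "C2S1": "with transparent areas", "C2S2": "without transparent areas",
}

# A state code inside a <Description>, e.g. [C1S4].
CODE = re.compile(r"\[([A-Za-z0-9]+)\]")

def expand_description(description, state_values):
    """Replace each [StateID] with the text of that state's <Value>."""
    return CODE.sub(lambda m: state_values[m.group(1)], description)

print(expand_description("[C1S1] to [C1S2], or [C1S9]", STATE_VALUES))
# white to cream, or green
```

An unknown code raises KeyError, which is probably the desired behaviour because it exposes typos in <Description> text.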

In <CodedState>, <Value> holds numeric values, with the <State> elements of
the <Character> defining what each number means (minimum, maximum, etc.),
rather than relying on position in the attribute string as in the DELTA
standard.  This element won't (?) be needed for multistate characters.
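To show the difference from positional DELTA coding, here is a sketch that converts a simple DELTA numeric attribute such as "3,43-50" into <CodedState> pairs, following the attachment's convention that S1 is the minimum and S2 the maximum. Only the plain "min-max" or single-value forms are handled. DELTA extreme values in parentheses, negative numbers and comments are not handled. Treating a single value as both minimum and maximum is an assumption.

```python
def coded_states(attribute):
    """Convert a DELTA numeric attribute like '3,43-50' into
    (CharacterID, [(StateID, value), ...]) using C<n>S1 = minimum and
    C<n>S2 = maximum, as in the attached file."""
    char_no, values = attribute.split(",", 1)
    char_id = f"C{char_no}"
    low, _, high = values.partition("-")
    high = high or low  # assumption: a single value is both min and max
    return char_id, [(f"{char_id}S1", low), (f"{char_id}S2", high)]

print(coded_states("3,43-50"))     # ('C3', [('C3S1', '43'), ('C3S2', '50')])
print(coded_states("4,0.15-0.2"))  # ('C4', [('C4S1', '0.15'), ('C4S2', '0.2')])
```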

I think/hope the remainder is fairly clear.


** The Next Step **

I would suggest the following path from here:

1) Make sure the above representation makes sense for the data given.

2) Expand the above data to support LucID-specific requirements (without
adding further complexity).
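Part of step 1 could be automated. This minimal sketch (function name hypothetical) checks that every <CodedState> points to an existing state of its own <CodedCharacter> and that no state is repeated within one coded character. The sample shows the kind of slip this catches: a second "minimum" where a "maximum" was intended.

```python
import xml.etree.ElementTree as ET

def check_items(xml_text):
    """List coded states that reference a missing state, a state of another
    character, or that repeat within one <CodedCharacter>."""
    root = ET.fromstring(xml_text)
    owner = {}  # StateID -> CharacterID
    for char in root.iter("Character"):
        for state in char.findall("State"):
            owner[state.findtext("ID")] = char.findtext("ID")
    problems = []
    for item in root.iter("Item"):
        item_id = item.findtext("ID")
        for coded in item.findall("CodedCharacter"):
            char_id = coded.findtext("CharacterID")
            seen = set()
            for cs in coded.findall("CodedState"):
                state_id = cs.findtext("StateID")
                if owner.get(state_id) != char_id:
                    problems.append(f"{item_id}: {state_id} is not a state of {char_id}")
                if state_id in seen:
                    problems.append(f"{item_id}: {state_id} repeated in {char_id}")
                seen.add(state_id)
    return problems

SAMPLE = """<Morphology><Characters><Character><ID>C4</ID>
<State><ID>C4S1</ID></State><State><ID>C4S2</ID></State></Character></Characters>
<Items><Item><ID>T4</ID><CodedCharacter><CharacterID>C4</CharacterID>
<CodedState><StateID>C4S1</StateID><Value>0.8</Value></CodedState>
<CodedState><StateID>C4S1</StateID><Value>0.9</Value></CodedState>
</CodedCharacter></Item></Items></Morphology>"""

print(check_items(SAMPLE))  # ['T4: C4S1 repeated in C4']
```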

Once this is finished we can:

Add additional DELTA features (dependencies, default values, etc.)

Add more complex data sets and examples

Add new features on our assorted "wish lists"


I look forward to comments and forward progress!


Thanks, Steve

Steve Shattuck
CSIRO Entomology
biolink at ento.csiro.au


==== Attachment: Morphology 23 Nov 01.xml ====

<Morphology>
   <Characters>
      <Character>
         <Type>Ordered</Type>
         <ID>C1</ID>
         <Short_Descriptor>main colour of inner part of front wing</Short_Descriptor>
         <Long_Descriptor>main colour of inner part of front wing</Long_Descriptor>
         <State_Descriptor/>
         <Order>1</Order>
         <State>
            <Value>white</Value>
            <ID>C1S1</ID>
            <Order>1</Order>
         </State>
         <State>
            <Value>cream</Value>
            <ID>C1S2</ID>
            <Order>2</Order>
         </State>
         <State>
            <Value>grey</Value>
            <ID>C1S3</ID>
            <Order>3</Order>
         </State>
         <State>
            <Value>brown</Value>
            <ID>C1S4</ID>
            <Order>4</Order>
         </State>
         <State>
            <Value>black</Value>
            <ID>C1S5</ID>
            <Order>5</Order>
         </State>
         <State>
            <Value>yellow</Value>
            <ID>C1S6</ID>
            <Order>6</Order>
         </State>
         <State>
            <Value>orange</Value>
            <ID>C1S7</ID>
            <Order>7</Order>
         </State>
         <State>
            <Value>blue</Value>
            <ID>C1S8</ID>
            <Order>8</Order>
         </State>
         <State>
            <Value>green</Value>
            <ID>C1S9</ID>
            <Order>9</Order>
         </State>
      </Character>
      <Character>
         <Type>Unordered</Type>
         <ID>C2</ID>
         <Short_Descriptor>wings</Short_Descriptor>
         <Long_Descriptor>transparent areas on wings</Long_Descriptor>
         <State_Descriptor/>
         <Order>2</Order>
         <State>
            <Value>with transparent areas</Value>
            <ID>C2S1</ID>
            <Order>1</Order>
         </State>
         <State>
            <Value>without transparent areas</Value>
            <ID>C2S2</ID>
            <Order>2</Order>
         </State>
      </Character>
      <Character>
         <Type>Real</Type>
         <ID>C3</ID>
         <Short_Descriptor>length of front wing</Short_Descriptor>
         <Long_Descriptor>length of front wing</Long_Descriptor>
         <State_Descriptor>mm</State_Descriptor>
         <Order>3</Order>
         <State>
            <Value>minimum</Value>
            <ID>C3S1</ID>
            <Order>1</Order>
         </State>
         <State>
            <Value>maximum</Value>
            <ID>C3S2</ID>
            <Order>2</Order>
         </State>
      </Character>
      <Character>
         <Type>Real</Type>
         <ID>C4</ID>
         <Short_Descriptor>antennae</Short_Descriptor>
         <Long_Descriptor>length of antennae</Long_Descriptor>
         <State_Descriptor>times length of front wing</State_Descriptor>
         <Order>4</Order>
         <State>
            <Value>minimum</Value>
            <ID>C4S1</ID>
            <Order>1</Order>
         </State>
         <State>
            <Value>maximum</Value>
            <ID>C4S2</ID>
            <Order>2</Order>
         </State>
      </Character>
   </Characters>
   <Items>
      <Item>
         <Type>Taxon</Type>
         <ID>T1</ID>
         <Descriptor>Antheraea</Descriptor>
         <CodedCharacter>
            <CharacterID>C1</CharacterID>
            <Description>[C1S4]</Description>
            <CodedState>
               <StateID>C1S4</StateID>
               <Value/>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C2</CharacterID>
            <Description>[C2S2]</Description>
            <CodedState>
               <StateID>C2S2</StateID>
               <Value/>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C3</CharacterID>
            <Description>43 to 50 mm</Description>
            <CodedState>
               <StateID>C3S1</StateID>
               <Value>43</Value>
            </CodedState>
            <CodedState>
               <StateID>C3S2</StateID>
               <Value>50</Value>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C4</CharacterID>
            <Description>0.15 to 0.2 times length of front wing</Description>
            <CodedState>
               <StateID>C4S1</StateID>
               <Value>0.15</Value>
            </CodedState>
            <CodedState>
               <StateID>C4S2</StateID>
               <Value>0.2</Value>
            </CodedState>
         </CodedCharacter>
      </Item>
      <Item>
         <Type>Taxon</Type>
         <ID>T2</ID>
         <Descriptor>Ethmia</Descriptor>
         <CodedCharacter>
            <CharacterID>C1</CharacterID>
            <Description>[C1S2] to [C1S4]</Description>
            <CodedState>
               <StateID>C1S2</StateID>
               <Value/>
            </CodedState>
            <CodedState>
               <StateID>C1S4</StateID>
               <Value/>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C2</CharacterID>
            <Description>[C2S2]</Description>
            <CodedState>
               <StateID>C2S2</StateID>
               <Value/>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C3</CharacterID>
            <Description>11 to 14 mm</Description>
            <CodedState>
               <StateID>C3S1</StateID>
               <Value>11</Value>
            </CodedState>
            <CodedState>
               <StateID>C3S2</StateID>
               <Value>14</Value>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C4</CharacterID>
            <Description>0.6 to 0.65 times length of front wing</Description>
            <CodedState>
               <StateID>C4S1</StateID>
               <Value>0.6</Value>
            </CodedState>
            <CodedState>
               <StateID>C4S2</StateID>
               <Value>0.65</Value>
            </CodedState>
         </CodedCharacter>
      </Item>
      <Item>
         <Type>Taxon</Type>
         <ID>T3</ID>
         <Descriptor>Graphium</Descriptor>
         <CodedCharacter>
            <CharacterID>C1</CharacterID>
            <Description>[C1S1] to [C1S2], or [C1S9]</Description>
            <CodedState>
               <StateID>C1S1</StateID>
               <Value/>
            </CodedState>
            <CodedState>
               <StateID>C1S2</StateID>
               <Value/>
            </CodedState>
            <CodedState>
               <StateID>C1S9</StateID>
               <Value/>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C2</CharacterID>
            <Description>[C2S2]</Description>
            <CodedState>
               <StateID>C2S2</StateID>
               <Value/>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C3</CharacterID>
            <Description>29 to 33 mm</Description>
            <CodedState>
               <StateID>C3S1</StateID>
               <Value>29</Value>
            </CodedState>
            <CodedState>
               <StateID>C3S2</StateID>
               <Value>33</Value>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C4</CharacterID>
            <Description>0.45 to 0.50 times length of front wing</Description>
            <CodedState>
               <StateID>C4S1</StateID>
               <Value>0.45</Value>
            </CodedState>
            <CodedState>
               <StateID>C4S2</StateID>
               <Value>0.50</Value>
            </CodedState>
         </CodedCharacter>
      </Item>
      <Item>
         <Type>Taxon</Type>
         <ID>T4</ID>
         <Descriptor>Hecatesia</Descriptor>
         <CodedCharacter>
            <CharacterID>C1</CharacterID>
            <Description>[C1S4]</Description>
            <CodedState>
               <StateID>C1S4</StateID>
               <Value/>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C2</CharacterID>
            <Description>with or without transparent areas (when present, forming a small window)</Description>
            <CodedState>
               <StateID>C2S1</StateID>
               <Value/>
            </CodedState>
            <CodedState>
               <StateID>C2S2</StateID>
               <Value/>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C3</CharacterID>
            <Description>11 to 14 mm</Description>
            <CodedState>
               <StateID>C3S1</StateID>
               <Value>11</Value>
            </CodedState>
            <CodedState>
               <StateID>C3S2</StateID>
               <Value>14</Value>
            </CodedState>
         </CodedCharacter>
         <CodedCharacter>
            <CharacterID>C4</CharacterID>
            <Description>0.8 to 0.9 times length of front wing</Description>
            <CodedState>
               <StateID>C4S1</StateID>
               <Value>0.8</Value>
            </CodedState>
            <CodedState>
               <StateID>C4S2</StateID>
               <Value>0.9</Value>
            </CodedState>
         </CodedCharacter>
      </Item>
   </Items>
</Morphology>


More information about the tdwg-content mailing list