Morphological Data Representation

Kevin Thiele kevin.thiele at BIGPOND.COM
Wed Nov 28 12:42:01 CET 2001


----- Original Message -----
From: "Steve Shattuck" <Steve.Shattuck at CSIRO.AU>
To: <TDWG-SDD at USOBI.ORG>
Sent: Friday, November 23, 2001 2:33 PM
Subject: Morphological Data Representation


| Below and attached are a first attempt at representing simple, common
| DELTA-type data in an XML-based structure.  I've used a selected set of
| characters and items from the Butterfly sample data on the DELTA web site.
| The DELTA-formatted data looks like this:
|
|
| ==== DELTA-Standard CHARS File ====
|
| *SHOW: Lepidoptera demonstration characters. Revised 28-AUG-91.
|
| *CHARACTER LIST
|
| #1. main colour of inner part of front wing/
|        1. white/
|        2. cream/
|        3. grey/
|        4. brown/
|        5. black/
|        6. yellow/
|        7. orange/
|        8. blue/
|        9. green/
|
| #2. wings <transparency>/
|        1. with transparent areas/
|        2. without transparent areas/
|
| #3. length of front wing/
|        mm/
|
| #4. antennae <length>/
|        times length of front wing/
|
|
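Here is a minimal Python sketch of how the CHARS listing above might be read into memory before it is converted to XML. It assumes only the layout visible in this sample: each character opens with "#n." at the start of a line, every line ends in "/", comments sit in angle brackets, and an un-numbered line gives the units of a numeric character. The function name and dict keys are invented for illustration and are not part of DELTA or of the attached file.

```python
import re

def parse_characters(text):
    """Parse a simple DELTA *CHARACTER LIST into a list of dicts (sketch only)."""
    characters = []
    # Split at each "#" that begins a line; the first chunk (before "#1") is dropped.
    for block in re.split(r"^#", text, flags=re.M)[1:]:
        # Strip indentation and the trailing "/" terminator; skip blank lines.
        lines = [ln.strip().rstrip("/").strip() for ln in block.splitlines() if ln.strip()]
        head = re.match(r"(\d+)\.\s*(.*)", lines[0])
        feature = head.group(2)
        comment = re.search(r"<(.*?)>", feature)
        character = {
            "number": int(head.group(1)),
            "feature": re.sub(r"\s*<.*?>", "", feature).strip(),  # text minus comment
            "comment": comment.group(1) if comment else None,
            "states": [],
            "units": None,
        }
        for ln in lines[1:]:
            state = re.match(r"(\d+)\.\s*(.*)", ln)
            if state:  # numbered line: a state of a multistate character
                character["states"].append((int(state.group(1)), state.group(2)))
            else:      # un-numbered line: units of a numeric character
                character["units"] = ln
        characters.append(character)
    return characters

CHARS = """\
#1. main colour of inner part of front wing/
       1. white/
       2. cream/
       3. grey/
       4. brown/
       5. black/
       6. yellow/
       7. orange/
       8. blue/
       9. green/

#2. wings <transparency>/
       1. with transparent areas/
       2. without transparent areas/

#3. length of front wing/
       mm/

#4. antennae <length>/
       times length of front wing/
"""

for c in parse_characters(CHARS):
    print(c["number"], c["feature"], c["comment"], c["units"], len(c["states"]))
```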
| ==== DELTA-Standard ITEMS File ====
|
| *SHOW: Lepidoptera demonstration items. Revised 18-OCT-94.
|
| *ITEM DESCRIPTIONS
|
| # Antheraea/
| 1,4 2,2 3,43-50 4,0.15-0.2
|
| # Ethmia/
| 1,2-4 2,2 3,11-14 4,0.6-0.65
|
| # Graphium/
| 1,1-2/9 2,2 3,29-33 4,0.45-0.5
|
| # Hecatesia/
| 1,4 2,1<small, translucent window>/2 3,11-14 4,0.8-0.9
|
|
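Each attribute string in the ITEMS file packs several conventions into one line. A "," separates the character number from its value, "/" separates alternatives, and "-" marks a range. Text in <...> is a comment, and a comment may itself contain spaces and commas. The following Python sketch handles just those cases. It does not handle negative numbers or the rest of DELTA's value syntax, and `parse_item` is a hypothetical name.

```python
import re

def split_outside_comments(text, is_separator):
    """Split text at separator characters that are not inside <...> comments."""
    parts, current, depth = [], "", 0
    for ch in text:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if depth == 0 and is_separator(ch):
            if current:
                parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts

def parse_item(attributes):
    """Map character number -> list of alternatives ("/"), each with a range ("-")
    and the <comment> attached to it, if any."""
    coded = {}
    for attribute in split_outside_comments(attributes, str.isspace):
        number, value = attribute.split(",", 1)  # first comma ends the character number
        alternatives = []
        for alt in split_outside_comments(value, lambda ch: ch == "/"):
            m = re.fullmatch(r"([^<]*)(?:<(.*)>)?", alt)
            alternatives.append({"range": m.group(1).split("-"), "comment": m.group(2)})
        coded[int(number)] = alternatives
    return coded

hecatesia = parse_item("1,4 2,1<small, translucent window>/2 3,11-14 4,0.8-0.9")
print(hecatesia[2])
# [{'range': ['1'], 'comment': 'small, translucent window'}, {'range': ['2'], 'comment': None}]
```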
| ==== DELTA-Standard SPECS File ====
|
| *SHOW: Lepidoptera demonstration specifications. Revised 28-AUG-91.
|
| *NUMBER OF CHARACTERS 4
| *MAXIMUM NUMBER OF STATES 9
| *MAXIMUM NUMBER OF ITEMS 4
|
| *CHARACTER TYPES 3,RN 4,RN
|
| *NUMBERS OF STATES 1,9
|
|
|
|
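This sketch shows how the SPECS directives could supply each <Character>'s <Type>. It assumes two DELTA defaults as understood here. First, a character not listed under *CHARACTER TYPES is unordered multistate (UM). Second, a multistate character not listed under *NUMBERS OF STATES has two states. The helper names are hypothetical.

```python
import re

# DELTA character-type codes; unlisted characters are assumed to be UM.
TYPE_NAMES = {"UM": "unordered multistate", "OM": "ordered multistate",
              "IN": "integer numeric", "RN": "real numeric", "TE": "text"}

def directive_args(text, name):
    """Return the argument text of *NAME up to the next '*' ('' if absent)."""
    m = re.search(r"\*" + re.escape(name) + r"\s+([^*]*)", text)
    return m.group(1) if m else ""

def expand_pairs(args):
    """Expand tokens like '3,RN' or '3-4,RN' into {3: 'RN', 4: 'RN'}."""
    out = {}
    for token in args.split():
        chars, value = token.split(",")
        first, _, last = chars.partition("-")
        for c in range(int(first), int(last or first) + 1):
            out[c] = value
    return out

def parse_specs(text):
    n_chars = int(directive_args(text, "NUMBER OF CHARACTERS").split()[0])
    types = expand_pairs(directive_args(text, "CHARACTER TYPES"))
    n_states = expand_pairs(directive_args(text, "NUMBERS OF STATES"))
    specs = {}
    for c in range(1, n_chars + 1):
        code = types.get(c, "UM")
        entry = {"type": TYPE_NAMES[code]}
        if code in ("UM", "OM"):
            # Assumed default of two states when *NUMBERS OF STATES omits c.
            entry["states"] = int(n_states.get(c, 2))
        specs[c] = entry
    return specs

SPECS = """\
*SHOW: Lepidoptera demonstration specifications. Revised 28-AUG-91.
*NUMBER OF CHARACTERS 4
*MAXIMUM NUMBER OF STATES 9
*MAXIMUM NUMBER OF ITEMS 4
*CHARACTER TYPES 3,RN 4,RN
*NUMBERS OF STATES 1,9
"""
print(parse_specs(SPECS))
```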
| ==== For these files, the DELTA-generated natural language would look
| something like this:
|
| Antheraea
| Main colour of inner part of front wing brown. Wings without transparent
| areas. Length of front wing 43-50 mm. Antennae 0.15-0.2 times length of
| front wing.
|
| Ethmia
| Main colour of inner part of front wing cream to brown. Wings without
| transparent areas. Length of front wing 11-14 mm. Antennae 0.6-0.65 times
| length of front wing.
|
| Graphium
| Main colour of inner part of front wing white to cream, or green. Wings
| without transparent areas. Length of front wing 29-33 mm. Antennae
| 0.45-0.5 times length of front wing.
|
| Hecatesia
| Main colour of inner part of front wing brown. Wings with transparent
| areas (small, translucent window), or without
| transparent areas. Length of front wing 11-14 mm. Antennae 0.8-0.9 times
| length of front wing.
|
|
|
|
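The DELTA-style sentences above follow mechanically from the coded data. The rough Python sketch below assumes the parsed values are already held in simple dicts; this layout is invented and is not taken from the attached XML. It joins the ends of a range with "to", joins alternatives with ", or", and appends any comment in parentheses.

```python
# Hand-built structures mirroring the sample CHARS data (layout is hypothetical).
CHARACTERS = {
    1: {"feature": "main colour of inner part of front wing",
        "states": {1: "white", 2: "cream", 3: "grey", 4: "brown", 5: "black",
                   6: "yellow", 7: "orange", 8: "blue", 9: "green"}},
    2: {"feature": "wings",
        "states": {1: "with transparent areas", 2: "without transparent areas"}},
    3: {"feature": "length of front wing", "units": "mm"},
    4: {"feature": "antennae", "units": "times length of front wing"},
}

def describe_attribute(character, alternatives):
    """Render one coded attribute: '-' ranges become 'to', '/' becomes ', or'."""
    phrases = []
    for alt in alternatives:
        if "states" in character:
            text = " to ".join(character["states"][int(v)] for v in alt["range"])
        else:  # numeric: keep the range as written and append the units
            text = "-".join(alt["range"]) + " " + character["units"]
        if alt["comment"]:
            text += " (" + alt["comment"] + ")"
        phrases.append(text)
    return ", or ".join(phrases)

def describe_item(name, coded):
    """One capitalised sentence per character, in character order."""
    sentences = []
    for number in sorted(coded):
        character = CHARACTERS[number]
        phrase = character["feature"] + " " + describe_attribute(character, coded[number])
        sentences.append(phrase[0].upper() + phrase[1:] + ".")
    return name + "\n" + " ".join(sentences)

graphium = {1: [{"range": ["1", "2"], "comment": None}, {"range": ["9"], "comment": None}],
            2: [{"range": ["2"], "comment": None}],
            3: [{"range": ["29", "33"], "comment": None}],
            4: [{"range": ["0.45", "0.5"], "comment": None}]}
print(describe_item("Graphium", graphium))
```

Rules this simple cannot produce the hand-written Hecatesia sentence that follows. That is one argument for storing free-text phrasings alongside the codes.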
| ==== Hand-generated natural language would be essentially the same except
| for the last item, where it might look more like this:
|
| Hecatesia
| Main colour of inner part of front wing brown. Wings with or without
| transparent areas (when present, forming a small window). Length of front
| wing 11-14 mm. Antennae 0.8-0.9 times length of front wing.
|
|
| ============================================
|
| I've translated this into the XML file that is attached.  (Even this fairly
| simple example is moderately large, and I would recommend using an XML viewer
| such as Microsoft's XML Notepad when working with it; see
| http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxml/html/xmlpaddownload.asp.)
|
| The basic structure of the attached file is:
|
| <Morphology>
|    <Characters>
|       <Character> - ANY NUMBER
|          <Type>
|          <ID>
|          <Short_Descriptor>
|          <Long_Descriptor>
|          <State_Descriptor>
|          <Order>
|          <State> - ANY NUMBER
|             <Value>
|             <ID>
|             <Order>
|    <Items>
|       <Item> - ANY NUMBER
|          <Type>
|          <ID>
|          <Descriptor>
|          <CodedCharacter> - ANY NUMBER
|             <CharacterID>
|             <Description>
|             <CodedState> - ANY NUMBER
|                <StateID>
|                <Value>
|
|
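Python's standard xml.etree.ElementTree is one way to produce a fragment with this shape. The sketch below builds one <Character> (DELTA character 2) and one <Item> (Hecatesia). The attached file isn't reproduced here, so the ID values, the <Type> strings and the content of <Long_Descriptor> are all guesses. <State_Descriptor> and <Description> are left out for brevity.

```python
import xml.etree.ElementTree as ET

def add(parent, tag, text=None):
    """Append a child element, optionally with text content."""
    element = ET.SubElement(parent, tag)
    if text is not None:
        element.text = str(text)
    return element

def build_example():
    root = ET.Element("Morphology")

    # One <Character>: DELTA character 2 (IDs such as "C2" are invented).
    character = add(add(root, "Characters"), "Character")
    add(character, "Type", "unordered multistate")
    add(character, "ID", "C2")
    add(character, "Short_Descriptor", "wings")
    add(character, "Long_Descriptor", "wings <transparency>")  # escaped on output
    add(character, "Order", 2)
    for order, text in enumerate(["with transparent areas",
                                  "without transparent areas"], start=1):
        state = add(character, "State")
        add(state, "Value", text)
        add(state, "ID", f"C2S{order}")
        add(state, "Order", order)

    # One <Item>: Hecatesia, coded with both states of character 2.
    item = add(add(root, "Items"), "Item")
    add(item, "Type", "taxon")
    add(item, "ID", "I4")
    add(item, "Descriptor", "Hecatesia")
    coded = add(item, "CodedCharacter")
    add(coded, "CharacterID", "C2")
    for state_id in ("C2S1", "C2S2"):
        add(add(coded, "CodedState"), "StateID", state_id)
    return root

root = build_example()
ET.indent(root)  # pretty-printing; requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```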
| Note that I've treated everything as elements and haven't used attributes.
| Simplicity is the only reason for this, and some elements would be better as
| attributes; these can be converted when the dust settles.
|
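When the dust does settle, the conversion could be mechanical. Here is a hedged sketch using xml.etree.ElementTree. Promoting <Type>, <ID> and <Order> is only an assumption about which elements would make better attributes.

```python
import xml.etree.ElementTree as ET

def promote_to_attributes(root, tags=("Type", "ID", "Order")):
    """Turn leaf child elements with the given tags into attributes of their parent.

    If a tag repeats under one parent, the last value wins.
    """
    for parent in list(root.iter()):  # snapshot before mutating the tree
        for child in list(parent):
            if child.tag in tags and len(child) == 0 and not child.attrib:
                parent.set(child.tag, child.text or "")
                parent.remove(child)
    return root

doc = ET.fromstring(
    "<State><Value>with transparent areas</Value><ID>C2S1</ID><Order>1</Order></State>")
print(ET.tostring(promote_to_attributes(doc), encoding="unicode"))
```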
| I've tried to generalise as much as possible and use only two main elements:
| <Character> and <Item>.  Each character is assigned a <Type> that tells what
| it is (ordered multistate, unordered multistate, real, integer, etc.).
| Similarly, the item <Type> can be specified as taxon or specimen (or
| potentially something else).
|
| A couple of points are probably worth making:
|
| The <Short_Descriptor> and <Long_Descriptor> are used to support DELTA
| comments.  This probably needs to be generalised further to support any
| number of alternate phrasings.
|
| <State_Descriptor> is used for the units of numeric characters and isn't
| needed (?) for other character types; it's an attempt at keeping
| <Character> general.
|
| The <Description> element in <CodedCharacter> is used to house natural
| language representations. Codes for the states (when needed) are placed in
| square brackets and are translated during generation.  As noted above,
| this may need to be generalised to support any number of phrasings.
|
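Here is a minimal sketch of the bracket translation. It assumes the codes inside the brackets are state IDs; the "C2S1"-style IDs and the function name are hypothetical.

```python
import re

def expand_codes(description, state_text):
    """Replace each [code] in a stored description with that state's wording."""
    def lookup(match):
        code = match.group(1)
        if code not in state_text:
            raise KeyError(f"unknown state code [{code}]")
        return state_text[code]
    return re.sub(r"\[([^\]]+)\]", lookup, description)

states = {"C2S1": "with transparent areas", "C2S2": "without transparent areas"}
print(expand_codes("Wings [C2S1] (small, translucent window), or [C2S2].", states))
```

The sketch raises an error on an unknown code rather than leaving the brackets in place. That choice makes stale descriptions visible instead of letting them slip into generated text.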
| In <CodedState>, <Value> is used to hold numeric values, with the <Character>'s
| <State> elements defining what each number means (minimum, maximum, etc.,
| rather than relying on its position in the attribute string, as the DELTA
| standard does).  This element won't (?) be needed for multistate characters.
|
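The idea can be sketched for a numeric character using hypothetical <State> definitions whose <Value> is "minimum" or "maximum". The sketch handles only the plain "x" and "x-y" forms from the sample. Treating a single value as both minimum and maximum is an assumption; the post does not specify it.

```python
# Hypothetical <State> definitions for numeric character 3: each state names
# what a number means rather than relying on its position in "43-50".
LENGTH_STATES = [{"ID": "C3S1", "Value": "minimum"},
                 {"ID": "C3S2", "Value": "maximum"}]

def coded_states(delta_value, states):
    """Turn a DELTA numeric value ('x' or 'x-y') into (StateID, Value) pairs."""
    numbers = [float(part) for part in delta_value.split("-")]
    if len(numbers) == 1:
        numbers *= 2  # assumed: a single value is both minimum and maximum
    if len(numbers) != 2:
        raise ValueError(f"unsupported value in this sketch: {delta_value!r}")
    state_id = {state["Value"]: state["ID"] for state in states}
    return [(state_id["minimum"], numbers[0]), (state_id["maximum"], numbers[1])]

print(coded_states("43-50", LENGTH_STATES))
```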
| I think/hope the remainder is fairly clear.
|
|
| ** The Next Step **
|
| I would suggest the following path from here:
|
| 1) Make sure the above representation makes sense for the data given.
|
| 2) Expand the above data to support LucID-specific requirements (without
| adding additional complexity).
|
| Once this is finished we can:
|
| Add additional DELTA features (dependencies, default values, etc.)
|
| Add more complex data sets and examples
|
| Add new features on our assorted "wish lists"
|
|
| I look forward to comments and forward progress!
|
|
| Thanks, Steve
|
| Steve Shattuck
| CSIRO Entomology
| biolink at ento.csiro.au
|
|



