Morphological Data Representation
Kevin Thiele
kevin.thiele at BIGPOND.COM
Wed Nov 28 12:42:01 CET 2001
----- Original Message -----
From: "Steve Shattuck" <Steve.Shattuck at CSIRO.AU>
To: <TDWG-SDD at USOBI.ORG>
Sent: Friday, November 23, 2001 2:33 PM
Subject: Morphological Data Representation
| Below and attached is a first attempt at representing simple, common
| DELTA-type data in an XML-based structure. I've used a selected set of
| characters and items from the Butterfly sample data on the DELTA web site.
| The DELTA-formatted data looks like this:
|
|
| ==== DELTA-Standard CHARS File ====
|
| *SHOW: Lepidoptera demonstration characters. Revised 28-AUG-91.
|
| *CHARACTER LIST
|
| #1. main colour of inner part of front wing/
| 1. white/
| 2. cream/
| 3. grey/
| 4. brown/
| 5. black/
| 6. yellow/
| 7. orange/
| 8. blue/
| 9. green/
|
| #2. wings <transparency>/
| 1. with transparent areas/
| 2. without transparent areas/
|
| #3. length of front wing/
| mm/
|
| #4. antennae <length>/
| times length of front wing/
|
|
| ==== DELTA-Standard ITEMS File ====
|
| *SHOW: Lepidoptera demonstration items. Revised 18-OCT-94.
|
| *ITEM DESCRIPTIONS
|
| # Antheraea/
| 1,4 2,2 3,43-50 4,0.15-0.2
|
| # Ethmia/
| 1,2-4 2,2 3,11-14 4,0.6-0.65
|
| # Graphium/
| 1,1-2/9 2,2 3,29-33 4,0.45-0.5
|
| # Hecatesia/
| 1,4 2,1<small, translucent window>/2 3,11-14 4,0.8-0.9
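As a quick check on how these attribute strings break down, here is a minimal Python sketch. It assumes a simplified reading of the syntax seen in the four items above: attributes separated by whitespace, each written "character,values", with "/" separating alternatives, "-" marking a range and <...> enclosing a free-text comment. Negative numbers and other DELTA features are not handled, and the function names are hypothetical.

```python
import re

# Simplified, assumed reading of the DELTA item syntax in the sample:
# whitespace separates attributes, "/" separates alternatives, "-" marks a
# range and <...> holds a comment. Not a full DELTA parser.

def split_outside(text, is_sep):
    """Split text where is_sep(ch) is true, ignoring characters inside <...>."""
    parts, buf, depth = [], "", 0
    for ch in text:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if depth == 0 and is_sep(ch):
            if buf:
                parts.append(buf)
            buf = ""
        else:
            buf += ch
    if buf:
        parts.append(buf)
    return parts

VALUE_AND_COMMENT = re.compile(r"([^<]*)(?:<(.*)>)?")

def parse_attribute(attr):
    """Turn e.g. '1,1-2/9' into (1, [(('1', '2'), None), (('9', '9'), None)])."""
    char_text, values_text = attr.split(",", 1)
    alternatives = []
    for alt in split_outside(values_text, lambda ch: ch == "/"):
        value, comment = VALUE_AND_COMMENT.fullmatch(alt).groups()
        low, _, high = value.partition("-")
        alternatives.append(((low, high or low), comment))
    return int(char_text), alternatives

def parse_item(line):
    """Map character number -> list of ((low, high), comment) alternatives."""
    return dict(parse_attribute(a) for a in split_outside(line, str.isspace))
```

For Hecatesia, parse_item("1,4 2,1<small, translucent window>/2 3,11-14 4,0.8-0.9")[2] keeps the comment attached to state 1 and leaves state 2 bare.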
|
|
| ==== DELTA-Standard SPECS File ====
|
| *SHOW: Lepidoptera demonstration specifications. Revised 28-AUG-91.
|
| *NUMBER OF CHARACTERS 4
| *MAXIMUM NUMBER OF STATES 9
| *MAXIMUM NUMBER OF ITEMS 4
|
| *CHARACTER TYPES 3,RN 4,RN
|
| *NUMBERS OF STATES 1,9
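The SPECS directives only list exceptions, so every character's type (the value that would go into <Type>) has to be filled in from defaults. A minimal sketch, assuming the DELTA defaults as I understand them: a character not listed under *CHARACTER TYPES is unordered multistate (UM), and a multistate character not listed under *NUMBERS OF STATES has 2 states. Function names are hypothetical.

```python
# Assumed DELTA defaults: unlisted characters are UM (unordered multistate),
# and unlisted multistate characters have 2 states.
MULTISTATE_TYPES = {"UM", "OM"}

def parse_pairs(body):
    """Parse a directive body such as '3,RN 4,RN' into {3: 'RN', 4: 'RN'}."""
    pairs = {}
    for entry in body.split():
        number, value = entry.split(",", 1)
        pairs[int(number)] = value
    return pairs

def character_specs(number_of_characters, character_types, numbers_of_states):
    """Return {character number: {'type': ..., 'states': ...}} for every character."""
    types = parse_pairs(character_types)
    states = parse_pairs(numbers_of_states)
    specs = {}
    for n in range(1, number_of_characters + 1):
        char_type = types.get(n, "UM")
        count = int(states.get(n, 2)) if char_type in MULTISTATE_TYPES else None
        specs[n] = {"type": char_type, "states": count}
    return specs

specs = character_specs(4, "3,RN 4,RN", "1,9")
```

With the sample SPECS this gives character 1 as UM with 9 states, character 2 as UM with the default 2 states, and characters 3 and 4 as RN (real numeric) with no state count.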
|
|
|
|
| ==== For these files, the DELTA-generated natural language would look
| something like this:
|
| Antheraea
| Main colour of inner part of front wing brown. Wings without transparent
| areas. Length of front wing 43-50 mm. Antennae 0.15-0.2 times length of
| front wing.
|
| Ethmia
| Main colour of inner part of front wing cream to brown. Wings without
| transparent areas. Length of front wing 11-14 mm. Antennae 0.6-0.65 times
| length of front wing.
|
| Graphium
| Main colour of inner part of front wing white to cream, or green. Wings
| without transparent areas. Length of front wing 29-33 mm. Antennae 0.45-0.5
| times length of front wing.
|
| Hecatesia
| Main colour of inner part of front wing brown. Wings with transparent areas
| (small, translucent window), or without transparent areas. Length of front
| wing 11-14 mm. Antennae 0.8-0.9 times length of front wing.
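The multistate phrasing in the DELTA-style output can be reproduced with a few joining rules inferred from these four descriptions: " to " for a range of states, ", or " between alternatives, and a comment in parentheses after its state. A minimal Python sketch, with state names and feature phrases copied from the CHARS file; it only claims to match the one- and two-alternative cases in the sample, and the names are hypothetical.

```python
# State names and feature phrases from the CHARS file; joining rules are
# inferred from the sample output, not taken from the DELTA programs.
STATE_NAMES = {
    1: ["white", "cream", "grey", "brown", "black",
        "yellow", "orange", "blue", "green"],
    2: ["with transparent areas", "without transparent areas"],
}
FEATURES = {1: "Main colour of inner part of front wing", 2: "Wings"}

def describe(char, alternatives):
    """alternatives: list of (low, high, comment) using 1-based state numbers."""
    names = STATE_NAMES[char]
    phrases = []
    for low, high, comment in alternatives:
        if low == high:
            phrase = names[low - 1]
        else:
            phrase = f"{names[low - 1]} to {names[high - 1]}"
        if comment:
            phrase += f" ({comment})"
        phrases.append(phrase)
    return f"{FEATURES[char]} {', or '.join(phrases)}."

print(describe(1, [(1, 2, None), (9, 9, None)]))
```

The call shown prints the Graphium colour sentence, "Main colour of inner part of front wing white to cream, or green."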
|
|
|
|
| ==== Hand-generated natural language would be essentially the same except
| for the last item, where it might look more like this:
|
| Hecatesia
| Main colour of inner part of front wing brown. Wings with or without
| transparent areas (when present, forming a small window). Length of front
| wing 11-14 mm. Antennae 0.8-0.9 times length of front wing.
|
|
| ============================================
|
| I've translated this into the XML file that is attached. (Even this fairly
| simple example is moderately large, and I would recommend using an XML viewer
| such as Microsoft's XML Notepad when working with it; see
| http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxml/html/xmlpaddownload.asp.)
|
| The basic structure of the attached file is:
|
| <Morphology>
|   <Characters>
|     <Character> - ANY NUMBER
|       <Type>
|       <ID>
|       <Short_Descriptor>
|       <Long_Descriptor>
|       <State_Descriptor>
|       <Order>
|       <State> - ANY NUMBER
|         <Value>
|         <ID>
|         <Order>
|   <Items>
|     <Item> - ANY NUMBER
|       <Type>
|       <ID>
|       <Descriptor>
|       <CodedCharacter> - ANY NUMBER
|         <CharacterID>
|         <Description>
|         <CodedState> - ANY NUMBER
|           <StateID>
|           <Value>
|
|
| Note that I've treated everything as elements and haven't used attributes.
| Simplicity is the only reason for this, and some elements would be better as
| attributes; these can be converted when the dust settles.
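To see what one character and one item look like in the element-only layout, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element names come from the structure listing; everything else is an assumption for illustration: the ID scheme ("c2", "c2s1", "i4"), the "UM" type code, putting the state text in <State>/<Value>, and the <Long_Descriptor> wording. <State_Descriptor> is left out because character #2 is not numeric.

```python
import xml.etree.ElementTree as ET

def add(parent, tag, text=None):
    """Append a child element, optionally with text content."""
    element = ET.SubElement(parent, tag)
    if text is not None:
        element.text = str(text)
    return element

def build_example():
    """Character #2 and item Hecatesia, elements only (IDs are hypothetical)."""
    root = ET.Element("Morphology")
    character = add(add(root, "Characters"), "Character")
    add(character, "Type", "UM")                # assumed type code
    add(character, "ID", "c2")                  # hypothetical ID scheme
    add(character, "Short_Descriptor", "wings")
    add(character, "Long_Descriptor", "wings (transparency)")  # assumed wording
    add(character, "Order", 2)
    states = ["with transparent areas", "without transparent areas"]
    for order, text in enumerate(states, start=1):
        state = add(character, "State")
        add(state, "Value", text)
        add(state, "ID", f"c2s{order}")
        add(state, "Order", order)

    item = add(add(root, "Items"), "Item")
    add(item, "Type", "taxon")
    add(item, "ID", "i4")
    add(item, "Descriptor", "Hecatesia")
    coded = add(item, "CodedCharacter")
    add(coded, "CharacterID", "c2")
    add(coded, "Description", "Wings with or without transparent areas "
                              "(when present, forming a small window)")
    for state_id in ("c2s1", "c2s2"):           # both states recorded
        add(add(coded, "CodedState"), "StateID", state_id)
    return root

print(ET.tostring(build_example(), encoding="unicode"))
```

Converting some of these elements to attributes later would only change the add() calls for those tags.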
|
| I've tried to generalise as much as possible and use only two main elements:
| <Character> and <Item>. Each character is assigned a <Type> that tells what
| it is (ordered multistate, unordered multistate, real, integer, etc.).
| Similarly, the item <Type> can be specified as taxon or specimen (or
| potentially something else).
|
| A couple of points are probably worth making:
|
| The <Short_Descriptor> and <Long_Descriptor> are used to support DELTA
| comments. This probably needs to be generalised further to support any
| number of alternate phrasings.
|
| <State_Descriptor> is used for the units of numeric characters and isn't
| needed (?) for other character types; it's an attempt at keeping
| <Character> general.
|
| The <Description> element in <CodedCharacter> is used to house natural
| language representations. Codes for the states (when needed) are placed in
| square brackets, these being translated during generation. As noted above,
| this may need to be generalised to support any number of phrasings.
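The square-bracket convention can be handled at generation time with a single regular-expression substitution. A minimal Python sketch, assuming the bracketed codes are state IDs and that their text is available in a lookup table; the IDs shown are hypothetical, and unknown codes are deliberately left untouched so that mistakes stay visible.

```python
import re

STATE_CODE = re.compile(r"\[([^\]]+)\]")

def render_description(description, state_text):
    """Replace each [code] with that state's text; unknown codes are left as-is."""
    return STATE_CODE.sub(lambda m: state_text.get(m.group(1), m.group(0)),
                          description)

# Hypothetical state IDs; the real coding scheme is still open.
states = {"c1s1": "white", "c1s2": "cream", "c1s9": "green"}
print(render_description(
    "Main colour of inner part of front wing [c1s1] to [c1s2], or [c1s9].",
    states))
```

This prints "Main colour of inner part of front wing white to cream, or green." Supporting alternate phrasings would then just mean storing several <Description> strings against the same codes.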
|
| In <CodedState>, <Value> is used to hold numeric values, the <Character>
| <State> being used to define what the number means (minimum, maximum,
| etc.), rather than relying on position in the attribute string as the DELTA
| standard does. This element won't (?) be needed for multistate characters.
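For a numeric character this means a DELTA range such as "3,43-50" becomes one <CodedState> per number, each pointing at the <State> that says what the number is. A minimal Python sketch, assuming hypothetical state IDs for "minimum" and "maximum" on character #3 and treating a single number as minimum equal to maximum; negative numbers and DELTA extreme values are not handled.

```python
# Hypothetical state IDs for character #3 (length of front wing, mm); the
# character's <State> elements would declare what each number means.
LENGTH_STATES = {"minimum": "c3min", "maximum": "c3max"}

def coded_states(range_text, state_ids):
    """Convert a 'low-high' value into (StateID, Value) pairs.

    A single number is treated, by assumption, as minimum == maximum.
    """
    low, _, high = range_text.partition("-")
    high = high or low
    return [(state_ids["minimum"], float(low)),
            (state_ids["maximum"], float(high))]

print(coded_states("43-50", LENGTH_STATES))
```

For Antheraea this gives [('c3min', 43.0), ('c3max', 50.0)]; the meaning of each number now lives in the character definition instead of its position in the string.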
|
| I think/hope the remainder is fairly clear.
|
|
| ** The Next Step **
|
| I would suggest the following path from here:
|
| 1) Make sure the above representation makes sense for the data given.
|
| 2) Expand the above data to support LucID-specific requirements (without
| adding additional complexity).
|
| Once this is finished we can:
|
| Add additional DELTA features (dependencies, default values, etc.)
|
| Add more complex data sets and examples
|
| Add new features on our assorted "wish lists"
|
|
| I look forward to comments and forward progress!
|
|
| Thanks, Steve
|
| Steve Shattuck
| CSIRO Entomology
| biolink at ento.csiro.au
|
|