Barry - these are just the sort of problems that Challenge 1 was designed to throw up. In retrospect, though, it's perhaps too complex already. Maybe, as we work through it, we can break it up into sub-challenges?
Dependencies are indeed a big issue that needs to be dealt with carefully. Should we set them aside for the time being, or do you want to deal with them now?
When I put in that bit about "mucronate within a notch", my thought was that there would be two states, "obtuse" and "mucronate within an apical notch". Jim is obviously right, though, that whether to treat it like this or otherwise depends on the context, i.e. the other taxa in the treatment.
Your interpretation:
| Apex shape      | obtuse      | notched |
| Apex projection | immucronate | mucronate within notch (dependent on 'Apex shape - notched') |
seems to set up a dependency between a state of one character and a state of another character (I'll use the character/state terminology here for the time being, for simplicity). This is not the way it's usually done: dependencies normally set up a relationship between an entire character and a state of another character. For example, the character "Leaf shape" (ovate/elliptic) will be dependent on the state "present" of the character "Leaves" (present/absent). In an interactive key, the behaviour will be that if someone chooses "Leaves absent", the character "Leaf shape" disappears. Likewise, when building natural-language descriptions, if a taxon has no leaves then all the leaf characters will be skipped.
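To make the usual behaviour concrete, here is a minimal Python sketch, assuming a simple in-memory model. The Character class, its depends_on field and the applicable()/describe() functions are hypothetical names for illustration only, not part of any agreed schema. Each dependency links a whole character to one state of another character, so choosing "Leaves absent" hides "Leaf shape" in the key and drops it from a generated description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Character:
    name: str
    states: List[str]
    # Hypothetical field: (controlling character, required state), or None
    # if the character is always applicable.
    depends_on: Optional[Tuple[str, str]] = None


def applicable(char: Character, chosen: Dict[str, str]) -> bool:
    """Should this character be offered, given the states chosen so far?

    Only one level of dependency is checked in this sketch. If the
    controlling character has not been answered yet, the dependent
    character stays visible.
    """
    if char.depends_on is None:
        return True
    controller, required_state = char.depends_on
    if controller not in chosen:
        return True
    return chosen[controller] == required_state


def describe(scores: Dict[str, str], characters: List[Character]) -> str:
    """Build a simple description, skipping any character made
    inapplicable by the state of its controlling character."""
    parts = []
    for char in characters:
        if not applicable(char, scores):
            continue
        state = scores.get(char.name)
        if state is not None:
            parts.append(f"{char.name}: {state}")
    return "; ".join(parts)


leaves = Character("Leaves", ["present", "absent"])
leaf_shape = Character("Leaf shape", ["ovate", "elliptic"],
                       depends_on=("Leaves", "present"))
characters = [leaves, leaf_shape]

# Interactive key: choosing "Leaves absent" hides "Leaf shape".
visible = [c.name for c in characters if applicable(c, {"Leaves": "absent"})]
print(visible)  # ['Leaves']

# Description: a stray leaf-shape score is skipped for a leafless taxon.
print(describe({"Leaves": "absent", "Leaf shape": "ovate"}, characters))
# Leaves: absent
print(describe({"Leaves": "present", "Leaf shape": "ovate"}, characters))
# Leaves: present; Leaf shape: ovate
```

Note that the dependency hangs off the dependent character as a whole; nothing here ties one state to another state, which is the case I'm unsure about above.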
You seem to be doing something else here, and I'm not sure what behaviour you were trying to get.
Can you give us a fragment of your model for the simplest case extractable from Challenge 1, without the complications of numeric characters, dependencies, etc.?
Yes, there's a real need for character lists and identifiers. Do we go with, e.g.,
<Feature Name="Leaves"><Value>ovate</Value></Feature>
or <Feature FeatureID="23" valueID="6"/>
The former is certainly more human-readable, but does that matter?
I would have thought that a computer can make sense of either, so we could support both. The parser would simply need one rule: if the FeatureID attribute is present, use it and look up the referenced ID to get the name; otherwise, if the Name attribute is present, use that.
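A rough sketch of that parser rule in Python, using the standard library's xml.etree.ElementTree. The lookup tables, the keying of values by (FeatureID, valueID) pair, the resolve_feature() function and the wrapping <Description> element are all assumptions made up for the example; only the IDs 23 and 6 come from the example above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Hypothetical character list: in a real document these would be read from
# the character/state definitions that the IDs refer to.
FEATURE_NAMES: Dict[str, str] = {"23": "Leaves"}
VALUE_NAMES: Dict[Tuple[str, str], str] = {("23", "6"): "ovate"}


def resolve_feature(elem: ET.Element) -> Tuple[str, Optional[str]]:
    """Return (feature name, value) for a <Feature> element written either
    by ID (FeatureID/valueID attributes) or by name (Name attribute plus a
    <Value> child)."""
    feature_id = elem.get("FeatureID")
    if feature_id is not None:
        # ID form: look the IDs up in the character list to get the names.
        name = FEATURE_NAMES[feature_id]
        value_id = elem.get("valueID")
        value = VALUE_NAMES[(feature_id, value_id)] if value_id is not None else None
        return name, value
    name = elem.get("Name")
    if name is not None:
        # Name form: the names are given directly in the document.
        return name, elem.findtext("Value")
    raise ValueError("<Feature> has neither a FeatureID nor a Name attribute")


doc = ET.fromstring(
    '<Description>'
    '<Feature Name="Leaves"><Value>ovate</Value></Feature>'
    '<Feature FeatureID="23" valueID="6"/>'
    '</Description>'
)
for feature in doc.iter("Feature"):
    print(resolve_feature(feature))
# ('Leaves', 'ovate')
# ('Leaves', 'ovate')
```

Both forms resolve to the same (name, value) pair, so everything downstream of the parser can ignore which form the author used.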
Presumably our document structure will be the same whether we use Names or Idrefs?
Cheers - k