Kerry wrote:
First, I am stuck trying to adequately model the structure and dependencies of the leaf apex characters "apex obtuse or minutely mucronate within an apical notch". I am parsing this out as:
Apex shape
    obtuse
    notched
Apex projection
    immucronate
    mucronate within notch (dependent on 'Apex shape - notched')
The model has an attribute for 'dependent on' under the character element. Does the proper solution of this problem require a 'dependent on' attribute as part of the //character/state element as well? Am I parsing the data correctly?
Maybe... but is there a correct answer to this in the absence of context? Without knowing what the other taxa in the application are, you could get away with:
Apex shape
    obtuse
    minutely mucronate in an apical notch
Or even, trivially:
Apex shape
    obtuse or minutely mucronate in an apical notch
It may be that there are only two character states in this group of taxa: the bluntish ones with obtuse/mucronate tips and the sharp ones with acute/attenuate tips. In that case, building a line of dependency might be not only unnecessary but also misleading.
But I do not think that was the real issue, was it? The challenge was: if you had a dependency, how would you deal with it...
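Kerry's question is whether 'dependent on' should also be allowed on individual states. A minimal Python sketch of that option, using hypothetical class names (`StateRef`, `State`, `Character`) and plain names rather than ids as references: the 'mucronate within notch' state is only offered when 'Apex shape' has 'notched' recorded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class StateRef:
    """Hypothetical pointer to one state of one character."""
    character: str
    state: str


@dataclass
class State:
    name: str
    # Hypothetical state-level 'dependent on': the state only makes sense
    # if the referenced state is recorded for the same taxon.
    depends_on: StateRef | None = None


@dataclass
class Character:
    name: str
    states: list[State] = field(default_factory=list)


def usable_states(char: Character, description: dict[str, set[str]]) -> list[str]:
    """Names of the states of `char` whose dependency (if any) is satisfied.

    `description` maps character names to the state names recorded for a taxon.
    """
    usable = []
    for st in char.states:
        dep = st.depends_on
        if dep is None or dep.state in description.get(dep.character, set()):
            usable.append(st.name)
    return usable


apex_projection = Character("Apex projection", [
    State("immucronate"),
    State("mucronate within notch", depends_on=StateRef("Apex shape", "notched")),
])

print(usable_states(apex_projection, {"Apex shape": {"obtuse"}}))
# ['immucronate']
print(usable_states(apex_projection, {"Apex shape": {"notched"}}))
# ['immucronate', 'mucronate within notch']
```

A character-level 'dependent on' would be the same check applied to a whole `Character` instead of one `State`. The sketch deliberately leaves open how a reference such as `StateRef("Apex shape", "notched")` is made unique, which is exactly the identifier question that follows.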
The first problem led me to the second problem, which is: how do we identify the character or state that another character depends on? This is not a problem when working with DELTA-format data, which already has unique identifiers, but text descriptions do not have them. Is an XPath description adequate, or do we need a structure that allows us to uniquely identify what a character state describes?
Is there any harm in requiring a unique id for each node? Most descriptive applications probably do this already to manage their data matrices. DELTA lets you see them, which is considered maybe useful; Lucid doesn't, which is considered healthy... :)
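Requiring an id on every node is at least cheap to check mechanically. A minimal sketch, assuming every element is meant to carry an attribute literally called `id` and using Python's standard `xml.etree.ElementTree`; the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample: one state lacks an id and one character id is reused.
DOC = """
<feature id="f1">
  <character id="c6">
    <state id="s1">oblong</state>
    <state id="s2">ovate</state>
    <state>obovate</state>
  </character>
  <character id="c6"/>
</feature>
"""


def id_problems(root):
    """Return (tags of elements lacking an id, ids used more than once)."""
    missing = [el.tag for el in root.iter() if "id" not in el.attrib]
    counts = Counter(el.get("id") for el in root.iter() if "id" in el.attrib)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return missing, duplicates


print(id_problems(ET.fromstring(DOC.strip())))
# (['state'], ['c6'])
```

A schema could enforce the same rule declaratively: attributes typed `xs:ID` in XML Schema must be unique within a document, although `xs:ID` values have to be valid NCNames, so purely numeric DELTA-style ids such as `123` would need a letter prefix.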
I originally thought a unique identifier was needed to identify missing data not explicitly encoded in the description, and to provide a pointer to the characters or states that other characters or states depend on. Now that I see how complex this may be to implement, I'm wondering if another way would be better.
If our structure/Schema/DTD is right, it would be nice to think that it was self-referencing and that we did not have to worry about internal ids and pointers, wouldn't it...
Thinking further about Kevin's two types of dependency, context and content (or whatever the terminology was), isn't the former an artifact of the character set we have decided to use, and the latter an artifact of the taxa we are covering? In the former case the character is just not possible; in the latter it might be possible but is just not there. If this really is the case, and one sort of dependency arises from the character state while the other arises from the taxa being considered, then they should very likely be modelled in slightly or completely different ways.
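If the two kinds of dependency really come from different places, a description processor could keep them apart. A speculative Python sketch, assuming (hypothetically) that character-set rules live in a `CHARACTER_RULES` table next to the character list, while taxon-specific absence is asserted in each description; the `Status` names are invented here and are not Kevin's terminology.

```python
from __future__ import annotations

from enum import Enum


class Status(Enum):
    RECORDED = "recorded"
    INAPPLICABLE = "inapplicable"  # ruled out by the character set itself
    ABSENT = "absent"              # possible in principle, not in this taxon
    MISSING = "missing"            # simply not scored


# Hypothetical rule stored with the character list, not with any taxon:
# 'Apex projection' cannot apply when 'Apex shape' is only 'obtuse'.
CHARACTER_RULES = {"Apex projection": ("Apex shape", {"obtuse"})}


def status(states: dict[str, set[str]], absent: set[str], character: str) -> Status:
    """Classify one character for one taxon.

    `states` maps character names to the states recorded for the taxon and
    `absent` lists characters the author asserts are absent in that taxon.
    The character-list rule is consulted first, the taxon's own data second.
    """
    rule = CHARACTER_RULES.get(character)
    if rule is not None:
        controller, blocking = rule
        recorded = states.get(controller, set())
        if recorded and recorded <= blocking:
            return Status.INAPPLICABLE
    if character in absent:
        return Status.ABSENT
    if character in states:
        return Status.RECORDED
    return Status.MISSING


print(status({"Apex shape": {"obtuse"}}, set(), "Apex projection").value)   # inapplicable
print(status({"Apex shape": {"notched"}}, set(), "Apex projection").value)  # missing
print(status({}, {"Stipules"}, "Stipules").value)                           # absent
```

The design point is only that the first kind of dependency is data about the characters and the second is data about the taxa, so they end up in different tables here.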
I originally included an attribute called 'describes' under the //character/state element. I thought that this could be used for two purposes: (1) to add descriptive text to the character state and (2) to help build a unique identifier for a character state. The first use is no problem; this structure allows an author to easily add 'high' to the value for the character state describing height, for example. The structure is less useful for building an identifier for what the character state describes.
//character/state is not always going to be unique, is it? The mucronate, obtuse or acute tip condition could exist in scales, bracts, leaves, petals, sepals, anthers, etc. A more complete path is going to have to be specified, which could get pretty nasty... but hey, computers are good at nasty...
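To make the ambiguity concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath 1.0 (a full implementation such as lxml's would be needed for the rest). The element names and `name` attributes are hypothetical: the short path matches states under two different organs, and qualifying every step of the path tells them apart.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: the same character/state pair under two different organs.
DOC = """
<description>
  <feature name="leaf">
    <character name="apex"><state>mucronate</state></character>
  </feature>
  <feature name="bract">
    <character name="apex"><state>mucronate</state></character>
  </feature>
</description>
"""


def state_paths(elem, prefix=""):
    """Yield an absolute path for every <state>, qualifying each step with
    its name attribute so that otherwise identical paths can be told apart."""
    step = elem.tag
    if "name" in elem.attrib:
        step += f"[@name='{elem.get('name')}']"
    path = f"{prefix}/{step}"
    if elem.tag == "state":
        yield path
    for child in elem:
        yield from state_paths(child, path)


root = ET.fromstring(DOC.strip())
print(len(root.findall(".//character/state")))  # 2: the short path is ambiguous
for p in state_paths(root):
    print(p)
print(len(root.findall("./feature[@name='leaf']/character[@name='apex']/state")))  # 1
```

The generated paths are unique only as long as sibling names are unique, which is itself an assumption the schema would have to guarantee.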
I thought I could build a unique identifier by combining the character name and the 'describes' attribute. So for
    <Character>
      <CharName>Leaf</CharName>
      <State describes="PlaneShape" modifier="+/-">oblong</State>
    </Character>
The identifier 'LeafPlaneShape' identifies what the state describes. This allows you to recognize that even when there are multiple states (e.g. 'ovate', 'round', etc.) they are all describing one aspect of the leaf. If nested characters are recognized, the identifier must get longer: 'ApexShape' or 'ApexProjection' wouldn't work; it would have to be 'LeafApexShape' or 'LeafApexProjection'. If the nesting gets deep, this gets awkward pretty fast. It would work, though.
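Kerry's composite-identifier scheme can be sketched in a few lines of Python with the standard `xml.etree.ElementTree`. The nesting of one `<Character>` inside another is a hypothetical way of expressing "Apex" as part of "Leaf"; each identifier is every enclosing `CharName` followed by the state's `describes` value, and grouping by identifier collects the states that describe one aspect.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical nested layout; only the Leaf/PlaneShape part comes from the email.
DOC = """
<Character>
  <CharName>Leaf</CharName>
  <State describes="PlaneShape" modifier="+/-">oblong</State>
  <State describes="PlaneShape">ovate</State>
  <Character>
    <CharName>Apex</CharName>
    <State describes="Shape">obtuse</State>
    <State describes="Projection">mucronate within notch</State>
  </Character>
</Character>
"""


def state_ids(char, prefix=""):
    """Yield (identifier, state text) for every State, prefixing the
    'describes' value with every enclosing CharName."""
    name = prefix + char.findtext("CharName", default="").strip()
    for state in char.findall("State"):
        yield name + state.get("describes", ""), state.text
    for child in char.findall("Character"):
        yield from state_ids(child, name)


root = ET.fromstring(DOC.strip())
groups = defaultdict(list)
for ident, text in state_ids(root):
    groups[ident].append(text)
print(dict(groups))
# {'LeafPlaneShape': ['oblong', 'ovate'], 'LeafApexShape': ['obtuse'],
#  'LeafApexProjection': ['mucronate within notch']}
```

The identifiers grow with every level of nesting, which is the awkwardness Kerry anticipates.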
Yes, but not very elegantly... Dependency is the act of a state/value controlling the applicability or otherwise of a character/feature or, as Kevin has pointed out, of a taxon controlling a character/feature. (In the identification process this gets a bit circular, as you have to have an idea what a taxon is before you can decide whether a character is inapplicable, but such knowledge can be built into the data in various ways.) Thus the model will have to accommodate a certain character/feature being unavailable if a certain state/value exists, perhaps in a totally different branch of the character/feature tree, as well as the obligate reciprocal relationship between controlling states/values and controlled characters/features. Unique ids, even arbitrary sequential numeric ones à la DELTA, might be the best (or an adequate) way to do this.
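A sketch of how arbitrary sequential numbers could carry such dependencies, loosely modelled on DELTA-style numbering but not reproducing any DELTA directive syntax; `CHARACTERS`, `INAPPLICABLE_IF` and the helper functions are hypothetical. The reciprocal relationship is derived from one table rather than stored twice, and numeric references can be checked for dangling pointers.

```python
from collections import defaultdict

# Hypothetical numbering: characters are numbered 1..n and states are
# numbered within their character, so a state is a (character, state) pair.
CHARACTERS = {
    1: ("apex shape", ["obtuse", "notched"]),
    2: ("apex projection", ["immucronate", "mucronate within notch"]),
    3: ("leaf shape", ["oblong", "ovate", "obovate"]),
}

# Controlling state -> characters it makes inapplicable; the controlled
# character may sit anywhere else in the character tree.
INAPPLICABLE_IF = {(1, 1): {2}}  # 'obtuse' apex -> no apex projection


def controlled_by(rules):
    """Invert the rules (controlled character -> controlling states), so the
    reciprocal relationship is derived rather than stored twice."""
    inverse = defaultdict(set)
    for state_ref, targets in rules.items():
        for target in targets:
            inverse[target].add(state_ref)
    return dict(inverse)


def dangling(characters, rules):
    """List references to character or state numbers that do not exist."""
    problems = []
    for (char_no, state_no), targets in rules.items():
        if char_no not in characters or not 1 <= state_no <= len(characters[char_no][1]):
            problems.append(("controlling", char_no, state_no))
        for target in sorted(targets):
            if target not in characters:
                problems.append(("controlled", target))
    return problems


print(controlled_by(INAPPLICABLE_IF))         # {2: {(1, 1)}}
print(dangling(CHARACTERS, INAPPLICABLE_IF))  # []
print(dangling(CHARACTERS, {(1, 5): {9}}))    # [('controlling', 1, 5), ('controlled', 9)]
```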
How about:
    <feature id="123">
      <featureName>leaf</featureName>
      <character id="6" nullcharacter="3" nullstate="4">
        <characterName>leaf shape</characterName>
        <state modifier="+/-" value="present">oblong</state>
        <state value="rarely">ovate</state>
        <state value="rarely">obovate</state>
      </character>
    </feature>
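Reading the doodle back is straightforward with Python's standard `xml.etree.ElementTree`. This sketch uses the doodle with its mismatched tags corrected, and it assumes (the email does not say) that `nullcharacter="3" nullstate="4"` means "character 6 is inapplicable when character 3 has state 4".

```python
import xml.etree.ElementTree as ET

DOC = """
<feature id="123">
  <featureName>leaf</featureName>
  <character id="6" nullcharacter="3" nullstate="4">
    <characterName>leaf shape</characterName>
    <state modifier="+/-" value="present">oblong</state>
    <state value="rarely">ovate</state>
    <state value="rarely">obovate</state>
  </character>
</feature>
"""


def null_rules(root):
    """Map character id -> (controlling character id, controlling state id),
    under the assumed reading of nullcharacter/nullstate."""
    rules = {}
    for ch in root.iter("character"):
        if "nullcharacter" in ch.attrib:
            rules[ch.get("id")] = (ch.get("nullcharacter"), ch.get("nullstate"))
    return rules


root = ET.fromstring(DOC.strip())
print(null_rules(root))  # {'6': ('3', '4')}
for st in root.iter("state"):
    print(st.text, st.get("value"), st.get("modifier"))
```

One limitation of putting the pair in attributes is that a character can name only one controlling state; several controllers would need child elements or a separate list.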
... just doodling... maybe this needs to be specified in a separate character list up front, rather than in the guts of the descriptive data.
Actually, we have not even decided that yet, have we? Should the list of characters/states and the list of taxa be described as separate blocks, or should they be implied from the content and structure of the standard data file? I prefer the former, and both DELTA and Lucid do this... but have we decided that this is the way to go?
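One practical difference between the two options, sketched in Python with hypothetical ids and data: an implied list simply absorbs whatever the descriptions contain, typos included, while a character list declared up front lets descriptions be checked against it.

```python
# Hypothetical explicit character list declared up front (id -> name, states).
CHARACTER_LIST = {
    "c1": ("leaf shape", {"oblong", "ovate", "obovate"}),
    "c2": ("apex shape", {"obtuse", "notched"}),
}

# Hypothetical taxon descriptions keyed by character id.
TAXA = {
    "t1": {"c1": {"oblong"}, "c2": {"obtuse"}},
    "t2": {"c1": {"ovate", "obovate"}, "c2": {"notch"}},  # 'notch' is a typo
}


def implied_list(taxa):
    """Character list implied purely from the descriptions."""
    implied = {}
    for desc in taxa.values():
        for cid, states in desc.items():
            implied.setdefault(cid, set()).update(states)
    return implied


def undeclared(taxa, char_list):
    """(taxon, character, state) triples not declared in the explicit list."""
    bad = []
    for tid, desc in taxa.items():
        for cid, states in desc.items():
            declared = char_list.get(cid, ("", set()))[1]
            for s in sorted(states - declared):
                bad.append((tid, cid, s))
    return bad


print(sorted(implied_list(TAXA)["c2"]))   # ['notch', 'obtuse']: typo becomes a state
print(undeclared(TAXA, CHARACTER_LIST))   # [('t2', 'c2', 'notch')]
```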
Do others think that having an identifier is necessary and, if so, has anyone been able to come up with a better way to handle it? Is XPath adequate for our needs?
I always have problems with XPath: it is supposed to work, but I always manage to get lost in the hierarchy... need more practice... :)
jim