Questions on Dependencies and identifiers

Thu Dec 6 08:46:38 CET 2001

Kerry wrote:
>  First, I am stuck trying to adequately model the structure and dependencies
> of the leaf apex characters
> "apex obtuse or minutely mucronate within an apical notch". I am parsing
> this out as:
>       Apex shape
>               obtuse
>               notched
>       Apex projection
>               immucronate
>               mucronate within notch (dependent on 'Apex shape - notched')
> The model has an attribute for 'dependent on' under the character element.
> Does the proper solution of this problem require a 'dependent on' attribute
> as part of the //character/state element as well? Am I parsing the data
> correctly?

Maybe...  but is there a correct answer to this in the absence of
context?  Without knowing what the other taxa in the application are, you
could get away with:

        Apex shape
                obtuse
                minutely mucronate in an apical notch
Or even, trivially:
        Apex shape
                obtuse or minutely mucronate in an apical notch

It may be that there are only two character states in this group of taxa -
the bluntish ones with obtuse/mucronate tips and the sharp ones with
acute/attenuate tips.  In this case building a line of dependency might
be not only unnecessary but also misleading.

But I do not think that was the real issue, was it?  The challenge was,
if you had a dependency, how would you deal with it...

> The first problem led me to the second problem, which is: how do we
> identify the character or state another character depends on?  This is not a
> problem when working with DELTA format data which has unique identifiers
> already, but text descriptions do not.  Is an Xpath description adequate or
> do we need to be able to have a structure that allows us to uniquely
> identify what a character state describes?

Is there any harm in requiring a unique id for each node? Most
descriptive applications probably do this already to manage their data
matrices. DELTA lets you see them, which is considered maybe useful;
Lucid doesn't, which is considered healthy... :)

>         I originally thought a unique identifier was needed to identify
> missing data not explicitly encoded in the description and to provide a
> pointer to characters or states that other characters or states depend
> on.  Now that I see how complex this may be to implement, I'm wondering if
> another way would be better.

If our structure/Schema/DTD is right, it would be nice to think that it
was self-referencing and that we did not have to worry about internal
ids and pointers, wouldn't it...

Thinking further about Kevin's two types of dependency, context and
content (or whatever the terminology was), isn't the former an artifact
of the character set we have decided to use and the latter an artifact
of the taxa we are covering? The former is just not possible, the latter
might be possible but is just not there.  If this really is the case and
one sort of dependency arises from the character state and another
arises from the taxa being considered, then it is very likely that they
should be modelled in slightly or completely different ways.

>         I originally included an attribute called 'describes' under the
> //character/state element. I thought that this could be used for two
> purposes: (1) to add descriptive text to the character state and (2) to help
> build a unique identifier for a character state.  The first use is no
> problem; this structure allows an author to easily add "high" to the value
> for the character state describing height for example.  The structure is
> less useful for building an identifier for what the character state
> describes.

//character/state is not always going to be unique, is it? The tip
mucronate, obtuse, acute condition could exist in scales, bracts,
leaves, petals, sepals, anthers, etc. A more complete path is going to
have to be specified which could get pretty nasty...  but hey, computers
are good at nasty...
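
To make the ambiguity concrete, here is a minimal Python sketch.  The
element and attribute names are made up for illustration and are not
part of any agreed schema.  It walks a small nested feature tree and
builds the full path for every state, which shows the same state name
turning up under more than one parent:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <feature name="..."> may nest; <state> is a leaf.
DOC = """
<description>
  <feature name="leaf">
    <feature name="apex">
      <state>obtuse</state>
      <state>mucronate</state>
    </feature>
  </feature>
  <feature name="bract">
    <feature name="apex">
      <state>obtuse</state>
    </feature>
  </feature>
</description>
"""

def state_paths(elem, trail=()):
    """Yield (full feature path, state text) for every <state> below elem."""
    for child in elem:
        if child.tag == "feature":
            # Descend, extending the path with this feature's name.
            yield from state_paths(child, trail + (child.get("name"),))
        elif child.tag == "state":
            yield "/".join(trail), child.text

paths = list(state_paths(ET.fromstring(DOC)))
for path, state in paths:
    print(f"{path}: {state}")
```

Here "obtuse" appears under both leaf/apex and bract/apex, so only the
full path (or an explicit id) tells the two apart.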

>         I thought I could build a unique identifier by combining the
> character name and the 'describes' attribute. So for
>
>         <Character>
>                 <CharName>Leaf</CharName>
>                 <State describes="PlaneShape" modifier="+/-">oblong</State>
>         </Character>
>
> The identifier 'LeafPlaneShape' identifies what the state describes.  This
> allows you to recognize that even when there are multiple states (ie.
> 'ovate', 'round', etc.) they are all describing one aspect of the leaf.  If
> nested characters are recognized, the identifier must get longer.
> 'ApexShape' or 'ApexProjection' wouldn't work, it would have to be
> 'LeafApexShape' or 'LeafApexProjection'.  If the nesting gets deep, this
> gets awkward pretty fast. It would work, though.

but not very elegantly...

Dependency is the act of a state/value controlling the applicability or
otherwise of a character/feature or, as Kevin has pointed out, a taxon
controlling a character/feature (in the identification process this gets
a bit circular, as you have to have an idea what a taxon is before you can
decide if a character is inapplicable, but such knowledge can be built
into the data in various ways).  Thus the model will have to accommodate
the fact that a certain character/feature is unavailable if a certain
state/value exists, perhaps in a totally different branch of the
character/feature tree - and the obligate reciprocal relationship between
controlling state/values and controlled characters/features.  Unique ids,
even arbitrary sequential numeric ones a la DELTA, might be the best (or
an adequate) way to do this.

How about:

<feature id="123">
        <featureName>leaf</featureName>
        <character id="6" nullcharacter="3" nullstate="4">
                <characterName>leaf shape</characterName>
                <state modifier="+/-" value="present">oblong</state>
                <state value="rarely">ovate</state>
                <state value="rarely">obovate</state>
        </character>
</feature>

... just doodling...  maybe this needs to be specified in a separate
characterlist up front, rather than in the guts of the descriptive data.

Actually we have not even decided that yet, have we?  Should the list of
characters/states and the lists of taxa be described as separate blocks,
or should they be implied from the content and structure of the standard
data file?  I prefer the former, and both DELTA and Lucid do this... but
have we decided that this is the way to go?
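
For what it is worth, the id-based dependency in the doodle above can
be resolved with very little code.  The Python sketch below is only
that, a sketch: it assumes that nullcharacter="3" nullstate="4" means
"this character is inapplicable when character 3 is scored with state
4", and the extra character 3 ("leaf presence") is invented purely for
the example.

```python
import xml.etree.ElementTree as ET

# Assumed reading of the doodle: character 6 is switched off when
# character 3 has state 4.  Character 3 itself is hypothetical.
DOC = """
<feature id="123">
  <featureName>leaf</featureName>
  <character id="3">
    <characterName>leaf presence</characterName>
  </character>
  <character id="6" nullcharacter="3" nullstate="4">
    <characterName>leaf shape</characterName>
  </character>
</feature>
"""

def controls(root):
    """Map (controlling character id, state id) -> controlled character ids."""
    table = {}
    for char in root.iter("character"):
        key = (char.get("nullcharacter"), char.get("nullstate"))
        if None not in key:  # skip characters with no dependency declared
            table.setdefault(key, []).append(char.get("id"))
    return table

def inapplicable(table, scored):
    """Given {character id: state id}, return the character ids switched off."""
    off = set()
    for char_id, state_id in scored.items():
        off.update(table.get((char_id, state_id), []))
    return off

table = controls(ET.fromstring(DOC))
print(inapplicable(table, {"3": "4"}))
```

The controlled character carries the pointer, and the lookup table
built from it gives the reciprocal view from the controlling state, so
the relationship only has to be written down once.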

>         Do others think that having an identifier is necessary and, if so,
> has anyone been able to come up with a better way to handle it? Is Xpath
> adequate to our needs?

I always have problems with Xpath - it is supposed to work, but I always
manage to get lost in the hierarchy...  need more practice...  :)

jim