Re: Questions on Dependencies and identifiers

6 Dec 2001

      Kerry wrote:
...
First, I am stuck trying to adequately model the structure and dependencies
of the leaf apex characters
"apex obtuse or minutely mucronate within an apical notch". I am parsing
this out as:
      Apex shape
              obtuse
              notched
      Apex projection
              immucronate
              mucronate within notch (dependent on 'Apex shape - notched')
The model has an attribute for 'dependent on' under the character element.
Does the proper solution of this problem require a 'dependent on' attribute
as part of the //character/state element as well? Am I parsing the data
correctly?
Maybe...  but is there a correct answer to this in the absence of
context?  Without knowing what the other taxa in the application are, you
could get away with:

        Apex shape
                obtuse
                minutely mucronate in an apical notch
Or even, trivially:
        Apex shape
                obtuse or minutely mucronate in an apical notch

It may be that there are only two character states in this group of taxa -
the bluntish ones with obtuse/mucronate tips and the sharp ones with
acute/attenuate tips.  In this case building a line of dependency might
not only be unnecessary, but might also be misleading.

But I do not think that was the real issue, was it?  The challenge was,
if you had a dependency, how would you deal with it...
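
One way to read Kerry's state-level 'dependent on' is sketched below in
Python. This is only an illustration, not a proposal for the schema: the
State class, its depends_on field and the unmet_dependencies helper are
all made-up names, and the states are the ones from Kerry's parse above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    """One character state; depends_on is a hypothetical state-level dependency."""
    character: str
    name: str
    depends_on: Optional[Tuple[str, str]] = None  # (character, state) that must also be scored


OBTUSE = State("Apex shape", "obtuse")
NOTCHED = State("Apex shape", "notched")
IMMUCRONATE = State("Apex projection", "immucronate")
MUCRONATE_IN_NOTCH = State(
    "Apex projection",
    "mucronate within notch",
    depends_on=("Apex shape", "notched"),
)


def unmet_dependencies(scored):
    """Return the scored states whose required state is not scored alongside them."""
    present = {(s.character, s.name) for s in scored}
    return [s for s in scored if s.depends_on is not None and s.depends_on not in present]


# A mucro within a notch makes no sense unless the apex is scored as notched.
problems = unmet_dependencies([OBTUSE, MUCRONATE_IN_NOTCH])
no_problems = unmet_dependencies([NOTCHED, MUCRONATE_IN_NOTCH])
```

Here the dependency hangs off the state rather than the character, which
is the case Kerry asks about; whether the model should allow that is
exactly the open question.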
...
The first problem led me to the second problem, which is: how do we
identify the character or state another character depends on?  This is not a
problem when working with DELTA format data which has unique identifiers
already, but text descriptions do not.  Is an XPath description adequate or
do we need to be able to have a structure that allows us to uniquely
identify what a character state describes?
Is there any harm in requiring a unique id for each node? Most
descriptive applications probably do this already to manage their data
matrices. DELTA lets you see them, which is considered maybe useful;
Lucid doesn't, which is considered healthy... :)
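
As a rough sketch of how cheap arbitrary ids are to generate, the Python
below numbers every element that does not already carry one. The function
name assign_ids, the attribute name "id" and the sample elements are
assumptions for illustration only.

```python
import itertools
import xml.etree.ElementTree as ET


def assign_ids(root, attribute="id"):
    """Give every element without an id the next unused sequential number."""
    used = {node.get(attribute) for node in root.iter() if node.get(attribute)}
    counter = itertools.count(1)
    for node in root.iter():
        if node.get(attribute) is None:
            candidate = str(next(counter))
            while candidate in used:  # skip numbers an author already assigned
                candidate = str(next(counter))
            node.set(attribute, candidate)
            used.add(candidate)
    return root


# Hypothetical fragment: one state already has an id, the rest do not.
doc = ET.fromstring(
    "<character><characterName>leaf shape</characterName>"
    "<state id='2'>oblong</state><state>ovate</state></character>"
)
assign_ids(doc)
ids = [node.get("id") for node in doc.iter()]
```

Whether the ids are then shown to the user (DELTA) or hidden (Lucid) is a
separate decision from having them at all.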
...
I originally thought a unique identifier was needed to identify
missing data not explicitly encoded in the description and to provide a
pointer to characters or states that other characters or states depend
on.  Now that I see how complex this may be to implement, I'm wondering if
another way would be better.
If our structure/Schema/DTD is right, it would be nice to think that it
was self-referencing and that we did not have to worry about internal
ids and pointers, wouldn't it...

Thinking further about Kevin's two types of dependency, context and
content (or whatever the terminology was), isn't the former an artifact
of the character set we have decided to use and the latter an artifact
of the taxa we are covering? The former is just not possible, the latter
might be possible but is just not there.  If this really is the case and
one sort of dependency arises from the character state and another
arises from the taxa being considered, then it is very likely that they
should be modelled in slightly or completely different ways.
...
I originally included an attribute called 'describes' under the
//character/state element. I thought that this could be used for two
purposes: (1) to add descriptive text to the character state and (2) to help
build a unique identifier for a character state.  The first use is no
problem; this structure allows an author to easily add 'high' to the value
for the character state describing height for example.  The structure is
less useful for building an identifier for what the character state
describes.
//character/state is not always going to be unique, is it? The tip
mucronate, obtuse, acute condition could exist in scales, bracts,
leaves, petals, sepals, anthers, etc. A more complete path is going to
have to be specified which could get pretty nasty...  but hey, computers
are good at nasty...
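
To make the ambiguity concrete, here is a small Python sketch using the
limited XPath support in the standard library's xml.etree.ElementTree. The
document, its element names and the "name" attribute are invented for the
example and are not part of any agreed structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical description: the same character appears under two organs.
DOC = """
<description>
  <feature name="leaf">
    <character name="apex shape">
      <state>obtuse</state>
      <state>acute</state>
    </character>
  </feature>
  <feature name="bract">
    <character name="apex shape">
      <state>obtuse</state>
    </character>
  </feature>
</description>
"""

root = ET.fromstring(DOC)

# The short path matches every state, whatever organ it belongs to.
all_states = root.findall(".//character/state")

# A fuller path with predicates narrows the match to one organ and one character.
leaf_apex_states = root.findall(
    "./feature[@name='leaf']/character[@name='apex shape']/state"
)

print(len(all_states))                      # 3
print([s.text for s in leaf_apex_states])   # ['obtuse', 'acute']
```

The fuller path works, but it is only as stable as the names and nesting
it spells out.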
...
I thought I could build a unique identifier by combining the
character name and the 'describes' attribute. So for
<Character>
        <CharName>Leaf</CharName>
        <State describes="PlaneShape" modifier="+/-">oblong</State>
</Character>
The identifier 'LeafPlaneShape' identifies what the state describes.  This
allows you to recognize that even when there are multiple states (i.e.
'ovate', 'round', etc.) they are all describing one aspect of the leaf.  If
nested characters are recognized, the identifier must get longer.
'ApexShape' or 'ApexProjection' wouldn't work, it would have to be
'LeafApexShape' or 'LeafApexProjection'.  If the nesting gets deep, this
gets awkward pretty fast. It would work, though.
but not very elegantly...
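
For what it is worth, the concatenation Kerry describes is easy enough to
sketch in Python. The helper name build_identifier and the idea of passing
the nesting as a list are assumptions made for this example.

```python
def build_identifier(path, describes):
    """Join nested character names and the 'describes' value into one identifier."""
    parts = list(path) + [describes]
    # Capitalise only the first letter so 'PlaneShape' keeps its inner capital.
    return "".join(part[:1].upper() + part[1:] for part in parts)


flat = build_identifier(["Leaf"], "PlaneShape")
nested = build_identifier(["leaf", "apex"], "Projection")

print(flat)    # LeafPlaneShape
print(nested)  # LeafApexProjection
```

Every extra level of nesting adds another name to the list, which is where
the awkwardness comes from.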
Dependency is the act of a state/value controlling the applicability or
otherwise of a character/feature or as Kevin has pointed out a taxon
controlling a character feature (in the identification process this gets
a bit circular as you have to have an idea what a taxon is before you can
decide if a character is inapplicable, but such knowledge can be built
into the data in various ways).  Thus the model will have to accommodate
that a certain character/feature is unavailable if a certain state/value
exists, perhaps in a totally different branch of the character/feature
tree - and the obligate reciprocal relationship between controlling
state/values and controlled characters/features.  Unique ids, even
arbitrary sequential numeric ones a la DELTA, might be the best (or an
adequate) way to do this.
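
A minimal Python sketch of the controlling/controlled relationship, using
arbitrary numeric ids in the DELTA manner. The ids, the CONTROLS table and
both function names are hypothetical; the only point is that the reciprocal
view can be derived from one table rather than stored twice.

```python
from collections import defaultdict

# Hypothetical table: controlling (character id, state id) -> character ids
# that become inapplicable when that state is scored.
CONTROLS = {
    (3, 4): {6, 7},
}


def controlled_by(controls):
    """Invert the table so each controlled character lists its controlling states."""
    inverse = defaultdict(set)
    for controlling_state, characters in controls.items():
        for character_id in characters:
            inverse[character_id].add(controlling_state)
    return dict(inverse)


def applicable(character_id, scored_states, controls):
    """A character is applicable unless one of its controlling states has been scored."""
    blockers = controlled_by(controls).get(character_id, set())
    return not (blockers & set(scored_states))


reverse_index = controlled_by(CONTROLS)
blocked = applicable(6, [(3, 4)], CONTROLS)   # False: state 4 of character 3 is scored
allowed = applicable(6, [(3, 1)], CONTROLS)   # True: a different state is scored
```

The controlling state and the controlled character can sit anywhere in the
character/feature tree, since only the ids are compared.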

How about:

<feature id="123">
        <featureName>leaf</featureName>
                <character id="6" nullcharacter="3" nullstate="4">
                        <characterName>leaf shape</characterName>
                        <state modifier="+/-" value="present">oblong</state>
                        <state value="rarely">ovate</state>
                        <state value="rarely">obovate</state>
                </character>
</feature>

... just doodling...  maybe this needs to be specified in a separate
characterlist up front, rather than in the guts of the descriptive data.

Actually we have not even decided that yet, have we?  Should the list of
characters/states and the lists of taxa be described as separate blocks,
or should they be implied from the content and structure of the standard
data file?  I prefer the former, and both DELTA and Lucid do this... but
have we decided that this is the way to go?
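
A sketch of what the "separate blocks" option could look like, with the
descriptive data pointing back to the lists by id. Everything here is
assumed for illustration: the element names (characterList, taxonList,
descriptions, score), the attribute names and the sample taxon.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: character and taxon lists up front, descriptions after.
DOC = """
<dataset>
  <characterList>
    <character id="6" name="leaf shape">
      <state id="1">oblong</state>
      <state id="2">ovate</state>
    </character>
  </characterList>
  <taxonList>
    <taxon id="t1" name="Taxon A"/>
  </taxonList>
  <descriptions>
    <description taxon="t1">
      <score character="6" state="1"/>
    </description>
  </descriptions>
</dataset>
"""

root = ET.fromstring(DOC)

# Index the up-front lists once, keyed by id.
state_names = {
    (ch.get("id"), st.get("id")): (ch.get("name"), st.text)
    for ch in root.iterfind("./characterList/character")
    for st in ch.iterfind("state")
}
taxon_names = {t.get("id"): t.get("name") for t in root.iterfind("./taxonList/taxon")}


def resolve(description):
    """Turn the id references in one description back into readable names."""
    taxon = taxon_names[description.get("taxon")]
    scores = [
        state_names[(s.get("character"), s.get("state"))]
        for s in description.iterfind("score")
    ]
    return taxon, scores


resolved = [resolve(d) for d in root.iterfind("./descriptions/description")]
print(resolved)  # [('Taxon A', [('leaf shape', 'oblong')])]
```

The alternative, implying the lists from the descriptions themselves, would
need no such lookup but would also have no single place to hang ids.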
...
Do others think that having an identifier is necessary and, if so,
has anyone been able to come up with a better way to handle it? Is XPath
adequate to our needs?
I always have problems with XPath - it is supposed to work, but I always
manage to get lost in the hierarchy...  need more practice...  :)

jim

Jim Croft
