I am still working through Challenge 1, modifying my data model so that it can completely model the text descriptions. I am now stuck on two problems that arose while I was working through the practical implications of the model.
First, I am stuck trying to adequately model the structure and dependencies of the leaf apex characters "apex obtuse or minutely mucronate within an apical notch". I am parsing this out as:
Apex shape: obtuse | notched
Apex projection: immucronate | mucronate within notch (dependent on 'Apex shape - notched')
The model has an attribute for 'dependent on' under the character element. Does the proper solution of this problem require a 'dependent on' attribute as part of the //character/state element as well? Am I parsing the data correctly?
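To make the question concrete, here is a minimal sketch in Python of what a state-level dependency might look like. The `id` and `dependentOn` attributes, the nested `Character` elements, and the `applicable_states` helper are all assumptions made up for illustration; they are not part of the current model or of any agreed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the attribute names "id" and "dependentOn" and the
# nesting of Character elements are assumptions for illustration only.
DESCRIPTION = """
<Character>
  <CharName>Apex</CharName>
  <Character>
    <CharName>Shape</CharName>
    <State id="apex.shape.obtuse">obtuse</State>
    <State id="apex.shape.notched">notched</State>
  </Character>
  <Character>
    <CharName>Projection</CharName>
    <State id="apex.proj.immucronate">immucronate</State>
    <State id="apex.proj.mucronate" dependentOn="apex.shape.notched">mucronate within notch</State>
  </Character>
</Character>
"""

def applicable_states(root, observed_ids):
    """Return the text of every state whose dependency, if any, was observed."""
    result = []
    for state in root.iter("State"):  # document order
        dependency = state.get("dependentOn")
        if dependency is None or dependency in observed_ids:
            result.append(state.text)
    return result

root = ET.fromstring(DESCRIPTION)
# With only an obtuse apex observed, "mucronate within notch" is filtered out.
print(applicable_states(root, {"apex.shape.obtuse"}))
# With a notched apex observed, the dependent state becomes applicable.
print(applicable_states(root, {"apex.shape.notched"}))
```

In this sketch the dependency sits on the single state that needs it, so 'immucronate' stays applicable whatever the apex shape is. A character-level attribute alone could not express that distinction.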
The first problem led me to the second problem, which is: how do we identify the character or state that another character depends on? This is not a problem when working with DELTA format data, which already has unique identifiers, but text descriptions do not. Is an XPath description adequate, or do we need a structure that allows us to uniquely identify what a character state describes? I originally thought a unique identifier was needed to identify missing data not explicitly encoded in the description and to provide a pointer to characters or states that other characters or states depend on. Now that I see how complex this may be to implement, I'm wondering if another way would be better.
I originally included an attribute called 'describes' under the //character/state element. I thought that this could be used for two purposes: (1) to add descriptive text to the character state and (2) to help build a unique identifier for a character state. The first use is no problem; this structure allows an author to easily add "high" to the value for the character state describing height, for example. The structure is less useful for building an identifier for what the character state describes. I thought I could build a unique identifier by combining the character name and the 'describes' attribute. So for
<Character> <CharName>Leaf</CharName> <State describes="PlaneShape" modifier="+/-">oblong</State> </Character>
the identifier 'LeafPlaneShape' identifies what the state describes. This allows you to recognize that even when there are multiple states (i.e. 'ovate', 'round', etc.) they are all describing one aspect of the leaf. If nested characters are recognized, the identifier must get longer. 'ApexShape' or 'ApexProjection' wouldn't work; it would have to be 'LeafApexShape' or 'LeafApexProjection'. If the nesting gets deep, this gets awkward pretty fast. It would work, though. Do others think that having an identifier is necessary and, if so, has anyone been able to come up with a better way to handle it? Is XPath adequate to our needs?
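As a rough illustration of the two options, the following Python sketch builds the concatenated identifiers and then finds the same states with an XPath-style query. The nested `Character` layout, the sample states, and the `state_identifiers` helper are assumptions for the sake of the example; only `CharName`, `State`, `describes` and `modifier` come from the fragment above.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested description; the nesting of Character inside Character
# and the Apex states are assumed here purely for illustration.
NESTED = """
<Character>
  <CharName>Leaf</CharName>
  <State describes="PlaneShape" modifier="+/-">oblong</State>
  <Character>
    <CharName>Apex</CharName>
    <State describes="Shape">obtuse</State>
    <State describes="Shape">notched</State>
    <State describes="Projection">mucronate</State>
  </Character>
</Character>
"""

def state_identifiers(character, prefix=""):
    """Yield (identifier, state text), concatenating ancestor CharNames."""
    name = prefix + character.findtext("CharName", default="")
    for state in character.findall("State"):
        yield name + state.get("describes", ""), state.text
    for child in character.findall("Character"):
        yield from state_identifiers(child, name)

root = ET.fromstring(NESTED)
ids = list(state_identifiers(root))
for identifier, text in ids:
    print(identifier, text)

# The same states located by path instead of by a stored identifier.
# ElementTree supports only a limited subset of XPath, which is enough here.
apex_shapes = root.findall("Character[CharName='Apex']/State[@describes='Shape']")
print([state.text for state in apex_shapes])
```

Both approaches group 'obtuse' and 'notched' under the same aspect of the leaf. The identifier is derived from the nesting, so it grows with depth, while the path query carries the same information without storing anything extra in the document.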
Thanks, Kerry
Kerry Barringer (Curator of the Herbarium)
Herbarium, Brooklyn Botanic Garden
1000 Washington Avenue, Brooklyn, NY 11225-1099 U.S.A.
718-623-7318 (office), 718-623-7312 (herbarium), 718-941-4774 (fax)
kbarringer@bbg.org
http://www.bbg.org/