Document vs. database

Fri Nov 30 01:03:10 CET 2001

I have been watching the animated discussion on the revived SDD list and
it is good to see... too busy to comment, but still interested, and
useful things are being said by others...  :)

> Existing data structures allow 2-level hierarchies e.g.
> #1. Leaf shape/
>     1. ovate/
>     2. elliptic/
> #2. Flower colour/
>     1. blue/
>     2. red/

Is that really two levels of character, or is it only one with a state
attribute?

Also, in our new model we want to show more than just presence or
absence of a state.  Don't we want to show whether it is present/rarely
present/present by misinterpretation/rarely present by
misinterpretation, etc., and my favourites yet to be implemented as
a character state attribute: definitely absent, absent by
misinterpretation, unknown, unscored?
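
To make the idea concrete, here is a rough Python sketch of a state
attribute that carries more than presence/absence. The class name
StateScore, the sample scores and the helper score_of are all made up
for illustration; only the eight score labels come from the list above.

```python
from enum import Enum


class StateScore(Enum):
    """Possible scores for one character state (labels from the text above)."""
    PRESENT = "present"
    RARELY_PRESENT = "rarely present"
    PRESENT_BY_MISINTERPRETATION = "present by misinterpretation"
    RARELY_PRESENT_BY_MISINTERPRETATION = "rarely present by misinterpretation"
    DEFINITELY_ABSENT = "definitely absent"
    ABSENT_BY_MISINTERPRETATION = "absent by misinterpretation"
    UNKNOWN = "unknown"
    UNSCORED = "unscored"


# Hypothetical scores for a single taxon: (character, state) -> score.
scores = {
    ("leaf shape", "ovate"): StateScore.PRESENT,
    ("leaf shape", "elliptic"): StateScore.RARELY_PRESENT,
    ("flower colour", "blue"): StateScore.DEFINITELY_ABSENT,
}


def score_of(character, state):
    """Return the recorded score, or UNSCORED if nothing was recorded."""
    return scores.get((character, state), StateScore.UNSCORED)
```

The point of the default is that "unscored" stays distinct from
"definitely absent": a state nobody looked at is not the same as a
state known to be missing.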

> I'm simply suggesting allowing n levels:
> <Leaves>
>     <shape>
>         <ovate>
>         <elliptic>
> <Flowers>
>     <colour>
>         <blue>
>         <red>

That is where we want to get to, but echoing Bob's words, what we
are after is the structure that allows us to get to that, giving
people all the appearance of freedom to do what they like without
actually granting it, and imposing a schematic straitjacket on the data.
A schema for the data structure rather than the descriptive data itself.
Like all freedoms, the descriptive data one too must be an illusion or
we will miss out on all the creative potential that it offers as the
free spirits among us do their own thing.  But that is another
discussion...  :)

There has been a lot of talk about the feature/value paradigm and how
this might be made to represent a biological description and even nested
features in such descriptions...

At the recent TDWG meeting Richard Pankhurst described something like a
feature/character/value paradigm and at the time I made a note that this
was probably worth considering in more detail, but so far I have not had
the time.

It probably would mean something like:

<feature name="leaf" character="margin" value="serrate"/> or some
equivalent using XML elements rather than attributes.
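
As a rough sketch only: the two spellings could carry the same
feature/character/value triple. The element-based layout below (child
elements called name, character and value) and the helper read_triple
are my own guesses for illustration, not an agreed SDD structure.

```python
import xml.etree.ElementTree as ET

# Attribute-based form, as written above.
ATTRIBUTE_FORM = '<feature name="leaf" character="margin" value="serrate"/>'

# One possible element-based equivalent (assumed layout).
ELEMENT_FORM = """
<feature>
  <name>leaf</name>
  <character>margin</character>
  <value>serrate</value>
</feature>
"""


def read_triple(xml_text):
    """Return (feature, character, value) from either spelling."""
    node = ET.fromstring(xml_text.strip())
    if node.attrib:
        # Attribute form: everything sits on the <feature> tag itself.
        return (node.get("name"), node.get("character"), node.get("value"))
    # Element form: each part is the text of a child element.
    return (
        node.findtext("name"),
        node.findtext("character"),
        node.findtext("value"),
    )
```

Both read_triple(ATTRIBUTE_FORM) and read_triple(ELEMENT_FORM) give the
same triple, so the choice between them is about the schema, not about
the information held.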

Is there any merit in this approach over using one feature that is, say,
"leaf margin", another that is, say, "leaf tip", and yet another that
is, say, "leaf base"? It would seem to give some structure to the
character set data.

On the surface it would also allow easy generation of more readable
descriptions without excessive text processing:

Leaf outline ovate, margin serrate, tip acute, base attenuate, etc.

        as opposed to:

Leaf outline ovate, leaf margin serrate, leaf tip acute, leaf base
attenuate,  leaf... etc.
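
A small Python sketch of the difference, assuming the data arrives as
plain tuples (the sample data and the function names describe and
describe_flat are invented for illustration). With triples, the
feature name can be written once per group; with flat feature/value
pairs it is repeated unless the text is post-processed.

```python
from itertools import groupby

# Feature/character/value triples for one taxon (sample data).
triples = [
    ("leaf", "outline", "ovate"),
    ("leaf", "margin", "serrate"),
    ("leaf", "tip", "acute"),
    ("leaf", "base", "attenuate"),
]

# The same data as flat feature/value pairs.
pairs = [(f"{feature} {character}", value)
         for feature, character, value in triples]


def describe(triples):
    """Name each feature once, then list its characters and values.

    groupby only merges neighbouring items, so the triples are assumed
    to be ordered by feature already.
    """
    sentences = []
    for feature, group in groupby(triples, key=lambda t: t[0]):
        parts = [f"{character} {value}" for _, character, value in group]
        sentences.append(f"{feature.capitalize()} {', '.join(parts)}")
    return "; ".join(sentences)


def describe_flat(pairs):
    """Join flat pairs as they come; the feature name is repeated."""
    text = ", ".join(f"{feature} {value}" for feature, value in pairs)
    return text[:1].upper() + text[1:]


print(describe(triples))
print(describe_flat(pairs))
```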

Has anyone else considered this approach?

jim