Draft Spec mark 2

Fri Sep 1 14:56:26 CEST 2000

Kevin has offered us not just one but four different document models for
comment. That's a lot to consider, and I won't attempt to offer detailed
criticism of each alternative. But from my point of view, the first several
appear far too "weak" to be of much use. In particular, I think it is
essential that there be a mechanism for specifying the universe of
characters and state values (or, mapping roughly onto Kevin's terminology,
"elements" and "values", respectively) which may appear within a
description. That is, we need to be able to define (or, rather, provide a
mechanism for the dataset designer to define) the equivalent of a DELTA
"character list". This is needed to enforce consistency and to disallow
nonsensical constructs. For example, a portion of Document 1 reads:

        <STATEMENT>
                <ITEM>
                        <ITEM_NAME> Gouania exilis </ITEM_NAME>
                </ITEM>
                <ELEMENT>
                        <ELEMENT_NAME> Flower colour </ELEMENT_NAME>
                </ELEMENT>
                <VALUE> green </VALUE>
                <QUALIFIER> rarely </QUALIFIER>
        </STATEMENT>

There needs to be a way to preclude a nonsensical entry like:

        <STATEMENT>
                <ITEM>
                        <ITEM_NAME> Gouania exilis </ITEM_NAME>
                </ITEM>
                <ELEMENT>
                        <ELEMENT_NAME> Flour colur </ELEMENT_NAME>
                </ELEMENT>
                <VALUE> Puerto Rico </VALUE>
                <QUALIFIER> anchovies, please! </QUALIFIER>
        </STATEMENT>

I might also note that I strongly question the way the above is organized.
I think a rearrangement better expressing the relationships (but still not
really addressing the problem of a lack of meaningful validation) would be
more along the lines of:

        <ITEM>
           <ITEM_NAME> Gouania exilis </ITEM_NAME>
           <ELEMENT>
              <ELEMENT_NAME> Flower colour </ELEMENT_NAME>
              <VALUE> green
                <QUALIFIER> rarely </QUALIFIER>
              </VALUE>
           </ELEMENT>
        </ITEM>

And I'd be inclined to make a bit more use of attributes, rather than
element content, though as Bob Morris points out, that is largely (though
not entirely) a matter of stylistic convention.
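
To illustrate (the attribute names NAME and QUALIFIER below are made up for
the example, not taken from any of Kevin's documents), an attribute-based
version of the same fragment might look something like:

        <!-- Hypothetical attribute names; a sketch only -->
        <ITEM NAME="Gouania exilis">
           <ELEMENT NAME="Flower colour">
              <VALUE QUALIFIER="rarely"> green </VALUE>
           </ELEMENT>
        </ITEM>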

I don't really see any great difficulties in implementing some sort of
"character list" within the XML syntax. "Document 4" appears to be heading
in that direction, but doesn't (in my opinion) go quite far enough. What I
think we need is a Schema definition that would allow a validating parser
to detect obvious data errors, and assist editing software in enforcing
"correctness" of the data. I've recently been looking through the
description of XML Schema, and it seems to have the expressive power needed
for this sort of thing.
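
As a rough sketch of the sort of thing I mean, and assuming each character
is given its own element (FLOWER_COLOUR below) rather than being named in
element content, a schema fragment might enumerate the permitted states.
The element, type, and attribute names, the states other than "green", and
the qualifier "usually" are placeholders invented for the example, and the
namespace URI depends on which version of the XML Schema specification is
in use:

        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

           <!-- Permitted states of one character (placeholder list) -->
           <xs:simpleType name="FlowerColourState">
              <xs:restriction base="xs:token">
                 <xs:enumeration value="green"/>
                 <xs:enumeration value="white"/>
                 <xs:enumeration value="yellow"/>
              </xs:restriction>
           </xs:simpleType>

           <!-- Permitted qualifiers (placeholder list) -->
           <xs:simpleType name="Qualifier">
              <xs:restriction base="xs:token">
                 <xs:enumeration value="rarely"/>
                 <xs:enumeration value="usually"/>
              </xs:restriction>
           </xs:simpleType>

           <!-- The character itself: content must be one of the states,
                and the optional QUALIFIER attribute one of the qualifiers -->
           <xs:element name="FLOWER_COLOUR">
              <xs:complexType>
                 <xs:simpleContent>
                    <xs:extension base="FlowerColourState">
                       <xs:attribute name="QUALIFIER" type="Qualifier"/>
                    </xs:extension>
                 </xs:simpleContent>
              </xs:complexType>
           </xs:element>

        </xs:schema>

With a definition along these lines, a validating parser should accept
<FLOWER_COLOUR QUALIFIER="rarely"> green </FLOWER_COLOUR> but reject a
value of "Puerto Rico" or a qualifier of "anchovies, please!", since
neither is among the enumerated values.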

Cheers,

Eric Zurcher
CSIRO Division of Entomology
Canberra, Australia
E-mail: ericz at ento.csiro.au