Draft Spec mark 2

Kevin Thiele kevin.thiele at PI.CSIRO.AU
Sat Sep 2 08:13:57 CEST 2000

From Eric's comments:

| Kevin has offered us not just 1, but 4, different document models for
| comment. That's a lot to consider, and I won't attempt to offer detailed
| criticism on each alternative. But from my point of view, the first
| appear far too "weak" to be of much use.

These are four (very early) examples, not four models. My intention is to
develop a specification that allows data to be represented at varying levels
of complexity, from very simple statements to fully worked-up integral data
files such as Lucid and DELTA use. I'm not sure why, but I like the idea
that the most basic document would simply say "Eric Zurcher has two feet".
Perhaps current programs wouldn't be able to make much use of such a
document, but in the future we may have a way of collecting and
organising clouds of such basic scraps (atoms) of description into something

| In particular, I think it is
| essential that there be a mechanism for specifying a universe of the
| character and state values (or, mapping roughly from Kevin's terminology,
| "elements" and "values", respectively) which may appear within a
| description. That is, we need to be able to define (or, rather, provide a
| mechanism for the dataset designer to define) the equivalent of a DELTA
| "character list". This is needed to enforce consistency and to disallow
| nonsensical constructs. For example, a portion of Document 1 reads:
|         <STATEMENT>
|                 <ITEM>
|                         <ITEM_NAME> Gouania exilis </ITEM_NAME>
|                 </ITEM>
|                 <ELEMENT>
|                         <ELEMENT_NAME> Flower colour </ELEMENT_NAME>
|                 </ELEMENT>
|                 <VALUE> green </VALUE>
|                 <QUALIFIER> rarely </QUALIFIER>
|         </STATEMENT>
| There needs to be a way to preclude a nonsensical entry like:
|         <STATEMENT>
|                 <ITEM>
|                         <ITEM_NAME> Gouania exilis </ITEM_NAME>
|                 </ITEM>
|                 <ELEMENT>
|                         <ELEMENT_NAME> Flour colur </ELEMENT_NAME>
|                 </ELEMENT>
|                 <VALUE> Puerto Rico </VALUE>
|                 <QUALIFIER> anchovies, please! </QUALIFIER>
|         </STATEMENT>

Yes, I agree. Validation rules need to be built in. I think I said that I
was putting this up for development, not as a finished product.
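
As a rough illustration of the kind of check being asked for, here is a
minimal sketch in Python. It assumes a character list is available as a
simple lookup table; CHARACTER_LIST, QUALIFIERS and check_statement are
hypothetical names invented for this example and are not part of any draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical character list: each character name maps to its permitted states.
CHARACTER_LIST = {
    "Flower colour": {"green", "white", "yellow"},
}
# Hypothetical set of permitted qualifiers.
QUALIFIERS = {"rarely", "usually", "sometimes"}


def check_statement(xml_text):
    """Return a list of problems found in one <STATEMENT>; empty means valid."""
    stmt = ET.fromstring(xml_text)
    character = stmt.findtext("ELEMENT/ELEMENT_NAME", default="").strip()
    value = stmt.findtext("VALUE", default="").strip()
    qualifier = stmt.findtext("QUALIFIER")
    problems = []
    if character not in CHARACTER_LIST:
        problems.append("unknown character: " + character)
    elif value not in CHARACTER_LIST[character]:
        problems.append("unknown state for " + character + ": " + value)
    if qualifier is not None and qualifier.strip() not in QUALIFIERS:
        problems.append("unknown qualifier: " + qualifier.strip())
    return problems


GOOD = """<STATEMENT>
  <ITEM><ITEM_NAME> Gouania exilis </ITEM_NAME></ITEM>
  <ELEMENT><ELEMENT_NAME> Flower colour </ELEMENT_NAME></ELEMENT>
  <VALUE> green </VALUE>
  <QUALIFIER> rarely </QUALIFIER>
</STATEMENT>"""

BAD = """<STATEMENT>
  <ITEM><ITEM_NAME> Gouania exilis </ITEM_NAME></ITEM>
  <ELEMENT><ELEMENT_NAME> Flour colur </ELEMENT_NAME></ELEMENT>
  <VALUE> Puerto Rico </VALUE>
  <QUALIFIER> anchovies, please! </QUALIFIER>
</STATEMENT>"""

print(check_statement(GOOD))  # no problems reported
print(check_statement(BAD))   # unknown character and unknown qualifier
```

In a real specification the lookup table would presumably come from the
document itself (or from a schema) rather than being hard-coded like this.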

Perhaps another reason for having character & taxon lists at the head of the
document is to allow an agent that visits the document to determine which
taxa and characters it's about without having to parse the whole thing?
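
A minimal sketch of how such an agent might work, assuming a layout in
which the lists sit in a header ahead of the statements. The element names
DOCUMENT, HEADER, TAXON_LIST, TAXON, CHARACTER_LIST and CHARACTER, and the
function read_header, are hypothetical; none of the four example documents
defines them.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical layout: a <HEADER> holding the taxon and character lists
# comes before any <STATEMENT>.
DOCUMENT = """<DOCUMENT>
  <HEADER>
    <TAXON_LIST><TAXON> Gouania exilis </TAXON></TAXON_LIST>
    <CHARACTER_LIST><CHARACTER> Flower colour </CHARACTER></CHARACTER_LIST>
  </HEADER>
  <STATEMENT>
    <ITEM><ITEM_NAME> Gouania exilis </ITEM_NAME></ITEM>
    <ELEMENT><ELEMENT_NAME> Flower colour </ELEMENT_NAME></ELEMENT>
    <VALUE> green </VALUE>
  </STATEMENT>
</DOCUMENT>"""


def read_header(stream):
    """Return (taxa, characters) from the header of an XML byte stream.

    The function returns as soon as </HEADER> has been seen, so for a
    large file the later chunks are never read.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "HEADER":
            taxa = [t.text.strip() for t in elem.iter("TAXON")]
            characters = [c.text.strip() for c in elem.iter("CHARACTER")]
            return taxa, characters
    return [], []  # no header found


taxa, characters = read_header(io.BytesIO(DOCUMENT.encode("utf-8")))
print(taxa, characters)
```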

| I might also note that I strongly question the way the above is organized.
| I think a rearrangement better expressing the relationships (but still not
| really addressing the problem of a lack of meaningful validation) would be
| more along the lines of:
|         <ITEM>
|            <ITEM_NAME> Gouania exilis </ITEM_NAME>
|            <ELEMENT>
|               <ELEMENT_NAME> Flower colour </ELEMENT_NAME>
|               <VALUE> green
|                 <QUALIFIER> rarely </QUALIFIER>
|               </VALUE>
|            </ELEMENT>
|         </ITEM>

This is perhaps a subtle point, perhaps a trivial one, and I'm glad you
raised it. Why is this better than the above? What exactly is the advantage?
I know yours is more DELTA-like and I agree it seems more intuitively
correct, but is there more to your strong belief than that? In truth, I
played with different ways of structuring unitary statements and presented
the one I did to draw comment rather than just follow down the path of what
we do already.
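
For comparison, a minimal sketch of converting a single flat statement into
the nested arrangement. The conversion is mechanical, which suggests that
for one unitary statement the two forms carry the same information; any
difference would more likely show up when one item carries several values,
each with its own qualifier. The function name nest is hypothetical.

```python
import xml.etree.ElementTree as ET

FLAT = """<STATEMENT>
  <ITEM><ITEM_NAME> Gouania exilis </ITEM_NAME></ITEM>
  <ELEMENT><ELEMENT_NAME> Flower colour </ELEMENT_NAME></ELEMENT>
  <VALUE> green </VALUE>
  <QUALIFIER> rarely </QUALIFIER>
</STATEMENT>"""


def nest(statement):
    """Rebuild a flat <STATEMENT> element as a nested <ITEM> element."""
    item = ET.Element("ITEM")
    name = ET.SubElement(item, "ITEM_NAME")
    name.text = statement.findtext("ITEM/ITEM_NAME").strip()
    element = ET.SubElement(item, "ELEMENT")
    element_name = ET.SubElement(element, "ELEMENT_NAME")
    element_name.text = statement.findtext("ELEMENT/ELEMENT_NAME").strip()
    value = ET.SubElement(element, "VALUE")
    value.text = statement.findtext("VALUE").strip()
    qualifier = statement.findtext("QUALIFIER")
    if qualifier is not None:
        # In the nested form the qualifier is a child of the value it modifies.
        ET.SubElement(value, "QUALIFIER").text = qualifier.strip()
    return item


nested = nest(ET.fromstring(FLAT))
print(ET.tostring(nested, encoding="unicode"))
```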

| And I'd be inclined to make a bit more use of attributes, rather than
| element content, though as Bob Morris points out, that is largely (though
| not entirely) a matter of stylistic convention.

See my general comment about true- and pseudo-XML.

| I don't really see any great difficulties in implementing some sort of
| "character list" within the XML syntax. "Document 4" appears to be heading
| in that direction, but doesn't (in my opinion) go quite far enough.

I agree.

| What I think we need is a Schema definition that would allow a validating
| parser to detect obvious data errors, and assist editing software in enforcing
| "correctness" of the data. I've recently been looking through the
| description of XML Schema, and it seems to have the expressive power
| for this sort of thing.

Very likely. My opinion is that we should work out our data requirements
(one of which is the need for validation as you properly point out) and then
determine the best vehicle to carry the data.

Cheers - k
