Validation

Kevin Thiele kevin.thiele at PI.CSIRO.AU
Sat Sep 2 11:16:48 CEST 2000


Eric,

I'm not sure about a requirement for strong validation in the spec. I agree
that validation should be allowed, but not required.

Your example:

|         <STATEMENT>
|                 <ITEM>
|                         <ITEM_NAME> Gouania exilis </ITEM_NAME>
|                 </ITEM>
|                 <ELEMENT>
|                         <ELEMENT_NAME> Flour colur </ELEMENT_NAME>
|                 </ELEMENT>
|                 <VALUE> Puerto Rico </VALUE>
|                 <QUALIFIER> anchovies, please! </QUALIFIER>
|         </STATEMENT>

There are three types of validation you suggest here:

1. Validate the element strings (and presumably also items, values and
qualifiers) against predefined lists to guard against typographical and
spelling errors;
2. Validate values against a list of allowable values for the character;
3. Validate qualifiers against a list of allowable qualifiers.

Taking these in reverse order:

3. As I see it, the spec itself would define a set of allowable qualifiers
(such as "rare" (= "rarely"?), "by misinterpretation", "uncertain", etc.). I
think we could probably agree on a limited set of qualifiers, and stick to
that (with allowance for extension). If we do this, then "anchovies,
please!" will be out for several reasons.
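
A minimal sketch of such a qualifier check, in Python. The set below only
repeats the three examples above; the draft spec has not fixed a list, and
the names ALLOWED_QUALIFIERS, check_qualifier and extensions are assumptions
made for illustration.

```python
# Hypothetical qualifier list: only the examples mentioned in this message.
ALLOWED_QUALIFIERS = {"rare", "by misinterpretation", "uncertain"}


def check_qualifier(qualifier, extensions=frozenset()):
    """Return True if the qualifier is in the agreed set or a declared extension."""
    normalized = qualifier.strip().lower()
    return normalized in ALLOWED_QUALIFIERS or normalized in extensions


print(check_qualifier(" uncertain "))                        # agreed qualifier
print(check_qualifier("anchovies, please!"))                 # not in the list
print(check_qualifier("probably", extensions={"probably"}))  # declared extension
```

The extensions argument stands in for the "allowance for extension": a
document could declare extra qualifiers without changing the core list.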

2. Validation of allowable values is covered in the draft spec. One of the
properties of an element (~"character") is a list of allowable values
(~"states"). If such a property is filled, then validation can be done by
the program that handles the data, just as in DELTA and Lucid.

Two notes:
* I'd like to allow, but not require, that a valid document have the
allowable_values property filled. Without that requirement, a simply
marked-up natural-language description could be a valid document. The spec
could then meet the automated markup of legacy descriptions half-way, and
I'm keen to do this. Of course, a document without this property specified
could not be validated in the way you suggest, and hence may not be very
useful for some purposes, but this may be a price one is willing to pay for
other benefits, and I think we need to keep this open.
* I'm using "allowable values" rather than "states" as this seems to me to
be more general, and subsumes the requirement to distinguish between, for
instance, multistate and numeric "characters". A numeric "character", of
course, doesn't have "states", but it does have allowable values (integers
in the case of an integer numeric, real numbers in the case of a real
numeric).
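
A rough Python sketch of how a handling program might apply these two notes.
The function name validate_value and the "integer"/"real" markers are
assumptions for illustration, not part of the draft spec. The point is that
an unfilled allowable_values property yields "cannot check" rather than
"invalid".

```python
def validate_value(value, allowable=None):
    """Check a value against an element's allowable values.

    Returns True or False when a check is possible, and None when the
    element declares no allowable values, so nothing can be checked.
    """
    if allowable is None:
        return None
    if allowable == "integer":  # hypothetical marker for an integer numeric
        return isinstance(value, int) and not isinstance(value, bool)
    if allowable == "real":     # hypothetical marker for a real numeric
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return value in allowable   # enumerated list of allowable values


colours = ["green", "yellow"]
print(validate_value("green", colours))
print(validate_value("Puerto Rico", colours))
print(validate_value(5, "integer"))
print(validate_value("Sweet"))  # no allowable values declared
```

Enumerated and numeric "characters" go through the same call, which is the
sense in which "allowable values" is more general than "states".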

1. How strong is the requirement for this type of validation? Enforcing this
seems to me to be like requiring that all Word documents carry in their
header a dictionary of the English language to allow validation of
spellings. It seems to me that providing tools that allow people to check
these strings against a predefined list (defined either within the document
or in an external resource) would be useful, but not absolutely necessary. A
document that is not or cannot be validated in this way would not be
useless, and would perhaps be more free.
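
As an illustration of the kind of optional tool meant here, a short Python
sketch using the standard library's difflib. The function name suggest, the
list of known names and the 0.6 cutoff are assumptions for illustration. A
string with no close match is reported as such, not rejected.

```python
import difflib


def suggest(name, known_names, cutoff=0.6):
    """Return the name itself if known, else the closest known name, else None."""
    if name in known_names:
        return name
    matches = difflib.get_close_matches(name, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical predefined list, held in the document or an external resource.
known = ["Flower colour", "Petal shape"]

print(suggest("Flour colur", known))  # misspelling from the quoted example
print(suggest("Petal shape", known))  # already in the list
print(suggest("Smell", known))        # nothing similar enough in the list
```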

Note that the spec as I see it would allow (but again, not enforce, as DELTA
and Lucid do) the encoding of descriptions. Thus, a valid document may be d1
as below. Because its statements refer to elements and values by ID rather
than by free text, this would preempt the need for typographic validation,
and make allowable-values validation possible. But for some reason I don't
want to disallow d2 as a valid document as well.

<DOCUMENT ID = "d1" Name = "Treatment of Gouania">

    <ITEM_PROPERTIES>
        <ITEM ID = "1" NAME = "Gouania exilis"/>
        <ITEM ID = "2" NAME = "Gouania australiana"/>
    </ITEM_PROPERTIES>

    <ELEMENT_PROPERTIES>
        <ELEMENT ID = "1" >
            <ELEMENT_NAME> Flower colour </ELEMENT_NAME>
            <VALUE_LIST>
                <VALUE ID = "1">
                    <VALUE_NAME> "green" </VALUE_NAME>
                </VALUE>
                <VALUE ID = "2">
                    <VALUE_NAME> "yellow" </VALUE_NAME>
                </VALUE>
            </VALUE_LIST>
        </ELEMENT>
        <ELEMENT ID = "2" >
            <ELEMENT_NAME> Petal shape </ELEMENT_NAME>
            <VALUE_LIST>
                <VALUE ID = "1">
                    <VALUE_NAME> "ovate" </VALUE_NAME>
                </VALUE>
                <VALUE ID = "2">
                    <VALUE_NAME> "obovate" </VALUE_NAME>
                </VALUE>
            </VALUE_LIST>
        </ELEMENT>
    </ELEMENT_PROPERTIES>

    <COLLECTION Item_ID = "1">
        <STATEMENT>
            <ELEMENT Element_ID="1"/>
            <VALUE Value_ID = "1"/>
        </STATEMENT>
        <STATEMENT>
            <ELEMENT Element_ID="2"/>
            <VALUE Value_ID = "1"/>
        </STATEMENT>
        <STATEMENT>
            <ELEMENT Element_ID="2"/>
            <VALUE Value_ID = "2"/>
        </STATEMENT>
    </COLLECTION>
</DOCUMENT>

<DOCUMENT ID = "d2" Name = "One thing I know about a rose">
    <STATEMENT>
        <ITEM>
            <ITEM_NAME> Rose </ITEM_NAME>
        </ITEM>
        <ELEMENT>
            <ELEMENT_NAME> Smell </ELEMENT_NAME>
        </ELEMENT>
        <VALUE> Sweet </VALUE>
    </STATEMENT>
</DOCUMENT>
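
To show how an encoded document like d1 could be checked by the handling
program, here is a minimal Python sketch using xml.etree.ElementTree. It
works on a trimmed, well-formed copy of d1 (one item, one element, quotes
dropped from the value names). The second statement, with Value_ID "9", is
not in d1 and is added only to show an undeclared value being caught. The
function name decode is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of d1; the Value_ID="9" statement is added for illustration.
D1 = """<DOCUMENT ID="d1" Name="Treatment of Gouania">
  <ITEM_PROPERTIES>
    <ITEM ID="1" NAME="Gouania exilis"/>
  </ITEM_PROPERTIES>
  <ELEMENT_PROPERTIES>
    <ELEMENT ID="1">
      <ELEMENT_NAME> Flower colour </ELEMENT_NAME>
      <VALUE_LIST>
        <VALUE ID="1"><VALUE_NAME> green </VALUE_NAME></VALUE>
        <VALUE ID="2"><VALUE_NAME> yellow </VALUE_NAME></VALUE>
      </VALUE_LIST>
    </ELEMENT>
  </ELEMENT_PROPERTIES>
  <COLLECTION Item_ID="1">
    <STATEMENT>
      <ELEMENT Element_ID="1"/>
      <VALUE Value_ID="1"/>
    </STATEMENT>
    <STATEMENT>
      <ELEMENT Element_ID="1"/>
      <VALUE Value_ID="9"/>
    </STATEMENT>
  </COLLECTION>
</DOCUMENT>"""


def decode(xml_text):
    """Resolve coded statements to (item, element, value) name triples.

    The value is None when a statement uses a Value_ID that the element's
    VALUE_LIST does not declare, i.e. it fails allowable-values validation.
    """
    root = ET.fromstring(xml_text)
    items = {i.get("ID"): i.get("NAME")
             for i in root.findall("ITEM_PROPERTIES/ITEM")}
    elements = {}
    for el in root.findall("ELEMENT_PROPERTIES/ELEMENT"):
        values = {v.get("ID"): v.findtext("VALUE_NAME").strip()
                  for v in el.findall("VALUE_LIST/VALUE")}
        elements[el.get("ID")] = (el.findtext("ELEMENT_NAME").strip(), values)
    results = []
    for coll in root.findall("COLLECTION"):
        item = items.get(coll.get("Item_ID"))
        for st in coll.findall("STATEMENT"):
            name, values = elements[st.find("ELEMENT").get("Element_ID")]
            value = values.get(st.find("VALUE").get("Value_ID"))
            results.append((item, name, value))
    return results


for triple in decode(D1):
    print(triple)
```

A document like d2 has no ELEMENT_PROPERTIES to resolve against, so a
program of this kind could only pass its free-text strings through
unchecked, which is the trade-off discussed above.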





Cheers - k



