Eric,
I'm not sure the spec should require strong validation. I agree that validation should be allowed, but not required.
Your example:
  <STATEMENT>
    <ITEM>
      <ITEM_NAME> Gouania exilis </ITEM_NAME>
    </ITEM>
    <ELEMENT>
      <ELEMENT_NAME> Flour colur </ELEMENT_NAME>
    </ELEMENT>
    <VALUE> Puerto Rico </VALUE>
    <QUALIFIER> anchovies, please! </QUALIFIER>
  </STATEMENT>
There are three types of validation you suggest here:
1. Validate the element strings (and presumably also items, values and qualifiers) against predefined lists to guard against typographical and spelling errors;
2. Validate values against a list of allowable values for the character;
3. Validate qualifiers against a list of allowable qualifiers.
Taking these in reverse order:
3. As I see it, the spec itself would define a set of allowable qualifiers (such as "rare" (= "rarely"?), "by misinterpretation", "uncertain", etc.). I think we could probably agree on a limited set of qualifiers and stick to it (with allowance for extension). If we do this, then "anchovies, please!" will be ruled out, for several reasons.
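To make the idea of a limited but extensible qualifier list concrete, here is a minimal sketch in Python. The core set contains only the qualifiers named above; the names `CORE_QUALIFIERS` and `check_qualifier`, case-insensitive matching, and passing extensions as a separate set are all assumptions for illustration, not anything the draft spec defines.

```python
# Hypothetical core vocabulary: only the qualifiers mentioned in this message.
CORE_QUALIFIERS = frozenset({"rare", "by misinterpretation", "uncertain"})


def check_qualifier(qualifier, extensions=frozenset()):
    """Return True if the qualifier is in the core set or a declared extension.

    Matching ignores surrounding whitespace and letter case (an assumption).
    """
    allowed = CORE_QUALIFIERS | {q.lower() for q in extensions}
    return qualifier.strip().lower() in allowed


print(check_qualifier(" uncertain "))                            # True
print(check_qualifier("anchovies, please!"))                     # False
print(check_qualifier("cultivated", extensions={"cultivated"}))  # True
```

Keeping extensions in a separate set is one way to allow a community to add qualifiers while free text such as "anchovies, please!" is still rejected.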
2. Validation of allowable values is covered in the draft spec. One of the properties of an element (~"character") is a list of allowable values (~"states"). If this property is filled in, the program that handles the data can validate values against it, just as DELTA and Lucid do; "Puerto Rico", for instance, would not be an allowable value for flower colour.
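A minimal sketch of the allowable-values check a data-handling program might perform, using the element names and values from d1 below. The dictionary layout and the function `validate_value` are hypothetical; a real program would read these properties from the document rather than hard-code them.

```python
# Hypothetical element properties, modelled on d1: each element (~"character")
# maps to its list of allowable values (~"states").
ELEMENT_PROPERTIES = {
    "Flower colour": ["green", "yellow"],
    "Petal shape": ["ovate", "obovate"],
}


def validate_value(element, value, properties=ELEMENT_PROPERTIES):
    """Return a list of problems found; an empty list means the value passed."""
    if element not in properties:
        return [f"unknown element {element!r}"]
    if value not in properties[element]:
        return [f"{value!r} is not an allowable value for {element!r}"]
    return []


print(validate_value("Flower colour", "green"))        # []
print(validate_value("Flower colour", "Puerto Rico"))  # one problem reported
```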
Two notes:
* I'd like a valid document to be allowed, but not required, to have the allowable_values property filled. If it is not required, a simply marked-up natural-language description could be a valid document. This would perhaps let the spec meet the automated markup of legacy descriptions half-way, and I'm keen to do this. Of course, a document without this property could not be validated in the way you suggest and hence may not be very useful for some purposes, but this may be a price one is willing to pay for other benefits, and I think we need to keep this open.
* I'm using "allowable values" rather than "states" because it seems more general to me, and it subsumes the requirement to distinguish between, for instance, multistate and numeric "characters". A numeric "character", of course, doesn't have "states", but it does have allowable values (integers in the case of an integer numeric, real numbers in the case of a real numeric).
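The point that "allowable values" subsumes states can be sketched by giving each element an allowable-values object that answers a single question, whether a value is allowed, regardless of whether the character is multistate or numeric. The class names, and the convention that a missing property returns None ("not validated") rather than False, are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Multistate:
    """Allowable values are a fixed set of named values (~"states")."""
    states: frozenset

    def allows(self, value):
        return value in self.states


@dataclass(frozen=True)
class IntegerNumeric:
    """Allowable values are integers."""

    def allows(self, value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)


@dataclass(frozen=True)
class RealNumeric:
    """Allowable values are real numbers (integers are accepted too)."""

    def allows(self, value):
        return isinstance(value, (int, float)) and not isinstance(value, bool)


AllowableValues = Union[Multistate, IntegerNumeric, RealNumeric]


def check(value, allowable: Optional[AllowableValues]):
    """Return True or False if allowable values are given, None if they are not.

    None means "not validated", not "invalid": the statement is still accepted.
    """
    if allowable is None:
        return None
    return allowable.allows(value)


colour = Multistate(frozenset({"green", "yellow"}))
print(check("green", colour))        # True
print(check("Puerto Rico", colour))  # False
print(check(2.5, IntegerNumeric()))  # False
print(check(2.5, RealNumeric()))     # True
print(check("Sweet", None))          # None
```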
1. How strong is the requirement for this type of validation? Enforcing it seems to me like requiring that every Word document carry in its header a dictionary of the English language so that spellings can be validated. Providing tools that let people check these strings against a predefined list (defined either within the document or in an external resource) would be useful, but not absolutely necessary. A document that is not, or cannot be, validated in this way would not be useless, and would perhaps be freer.
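An optional tool of the kind described in point 1 might only suggest corrections rather than reject anything. This sketch uses `difflib.get_close_matches` from Python's standard library; the reference list `KNOWN_ELEMENTS`, the function `suggest` and the 0.6 similarity cutoff are assumptions, and the list could equally be defined in the document or loaded from an external resource.

```python
import difflib

# Hypothetical reference list of element names.
KNOWN_ELEMENTS = ["Flower colour", "Petal shape", "Leaf length"]


def suggest(name, known=KNOWN_ELEMENTS, cutoff=0.6):
    """Advisory check: return close matches for an unrecognised name.

    Returns an empty list when the name is already known or nothing is close.
    Nothing is rejected; the caller decides what to do with the suggestions.
    """
    if name in known:
        return []
    return difflib.get_close_matches(name, known, n=3, cutoff=cutoff)


print(suggest("Flour colur"))  # expected to suggest "Flower colour" first
```

Because the function only returns suggestions, a document that fails this check is still accepted as written.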
Note that the spec as I see it would allow (but again, unlike DELTA and Lucid, not enforce) the encoding of descriptions. Thus d1, below, would be a valid document. This would preempt the need for typographic validation and allow allowable-values validation. But for some reason I don't want to disallow d2 as a valid document either.
<DOCUMENT ID = "d1" Name = "Treatment of Gouania">
  <ITEM_PROPERTIES>
    <ITEM ID = "1" NAME = "Gouania exilis"/>
    <ITEM ID = "2" NAME = "Gouania australiana"/>
  </ITEM_PROPERTIES>
  <ELEMENT_PROPERTIES>
    <ELEMENT ID = "1">
      <ELEMENT_NAME> Flower colour </ELEMENT_NAME>
      <VALUE_LIST>
        <VALUE ID = "1"> <VALUE_NAME> "green" </VALUE_NAME> </VALUE>
        <VALUE ID = "2"> <VALUE_NAME> "yellow" </VALUE_NAME> </VALUE>
      </VALUE_LIST>
    </ELEMENT>
    <ELEMENT ID = "2">
      <ELEMENT_NAME> Petal shape </ELEMENT_NAME>
      <VALUE_LIST>
        <VALUE ID = "1"> <VALUE_NAME> "ovate" </VALUE_NAME> </VALUE>
        <VALUE ID = "2"> <VALUE_NAME> "obovate" </VALUE_NAME> </VALUE>
      </VALUE_LIST>
    </ELEMENT>
  </ELEMENT_PROPERTIES>
  <COLLECTION Item_ID = "1">
    <STATEMENT> <ELEMENT Element_ID="1"/> <VALUE Value_ID = "1"/> </STATEMENT>
    <STATEMENT> <ELEMENT Element_ID="2"/> <VALUE Value_ID = "1"/> </STATEMENT>
    <STATEMENT> <ELEMENT Element_ID="2"/> <VALUE Value_ID = "2"/> </STATEMENT>
  </COLLECTION>
</DOCUMENT>
<DOCUMENT ID = "d2" Name = "One thing I know about a rose">
  <STATEMENT>
    <ITEM> <ITEM_NAME> Rose </ITEM_NAME> </ITEM>
    <ELEMENT> <ELEMENT_NAME> Smell </ELEMENT_NAME> </ELEMENT>
    <VALUE> Sweet </VALUE>
  </STATEMENT>
</DOCUMENT>
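Because d1 refers to items, elements and values by ID, a program can check that every reference resolves, while d2, which uses free text, simply has nothing to resolve and still passes. A rough sketch using Python's standard `xml.etree.ElementTree`; the function name `check_references`, and the assumption that value IDs are scoped to their element (as the repeated `VALUE ID = "1"` in d1 suggests), are illustrative assumptions, not part of the draft spec.

```python
import xml.etree.ElementTree as ET


def check_references(xml_text):
    """Return a list of unresolved ID references in a d1-style document.

    Only statements inside a COLLECTION that carry an Element_ID are checked,
    so a free-text document such as d2 yields an empty list.
    """
    root = ET.fromstring(xml_text)
    items = {i.get("ID") for i in root.iterfind("ITEM_PROPERTIES/ITEM")}
    # Assumption: value IDs are scoped to their element, as in d1, where
    # both elements have a value with ID "1".
    values = {
        e.get("ID"): {v.get("ID") for v in e.iterfind("VALUE_LIST/VALUE")}
        for e in root.iterfind("ELEMENT_PROPERTIES/ELEMENT")
    }
    problems = []
    for coll in root.iterfind("COLLECTION"):
        item_id = coll.get("Item_ID")
        if item_id not in items:
            problems.append(f"collection refers to unknown item {item_id}")
        for stmt in coll.iterfind("STATEMENT"):
            el = stmt.find("ELEMENT")
            if el is None or el.get("Element_ID") is None:
                continue  # free-text statement: nothing to resolve
            el_id = el.get("Element_ID")
            val = stmt.find("VALUE")
            value_id = val.get("Value_ID") if val is not None else None
            if el_id not in values:
                problems.append(f"unknown element {el_id}")
            elif value_id is not None and value_id not in values[el_id]:
                problems.append(f"unknown value {value_id} for element {el_id}")
    return problems


# Abridged copies of d1 (names omitted) and d2.
D1 = """<DOCUMENT ID="d1" Name="Treatment of Gouania">
  <ITEM_PROPERTIES>
    <ITEM ID="1" NAME="Gouania exilis"/>
    <ITEM ID="2" NAME="Gouania australiana"/>
  </ITEM_PROPERTIES>
  <ELEMENT_PROPERTIES>
    <ELEMENT ID="1"><VALUE_LIST><VALUE ID="1"/><VALUE ID="2"/></VALUE_LIST></ELEMENT>
    <ELEMENT ID="2"><VALUE_LIST><VALUE ID="1"/><VALUE ID="2"/></VALUE_LIST></ELEMENT>
  </ELEMENT_PROPERTIES>
  <COLLECTION Item_ID="1">
    <STATEMENT><ELEMENT Element_ID="1"/><VALUE Value_ID="1"/></STATEMENT>
    <STATEMENT><ELEMENT Element_ID="2"/><VALUE Value_ID="1"/></STATEMENT>
    <STATEMENT><ELEMENT Element_ID="2"/><VALUE Value_ID="2"/></STATEMENT>
  </COLLECTION>
</DOCUMENT>"""

D2 = """<DOCUMENT ID="d2" Name="One thing I know about a rose">
  <STATEMENT>
    <ITEM><ITEM_NAME>Rose</ITEM_NAME></ITEM>
    <ELEMENT><ELEMENT_NAME>Smell</ELEMENT_NAME></ELEMENT>
    <VALUE>Sweet</VALUE>
  </STATEMENT>
</DOCUMENT>"""

print(check_references(D1))  # []
print(check_references(D2))  # []
```

Both documents pass, but for different reasons: d1 because every reference resolves, d2 because it makes no references to check.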
Cheers - k