Jim's and Bryan's comments are exactly where we want to be at the moment, methinks.
I've realised over the weekend, thinking about Eric Zurcher's criticisms, what it is I'm trying to do with Draft Spec Mark 2 (some people are a bit slow).
In Lucid and DELTA we enforce a great deal of structure on the data file:
*CHARACTER LIST
#1. Distribution by State (or known provenance)/
   1. South Australia - south-eastern (south of the line from Port Augusta to Broken Hill)/
   2. New South Wales (including Jervis Bay - A.C.T.)/
   3. Australian Capital Territory/
   4. Victoria/
   5. Tasmania/
   ...etc
If the * or the # or a / is left off or put in the wrong place, the whole thing falls over. Only one very narrowly constrained data format can be a valid file. This is primarily done for ease of processing, and was a perfectly reasonable constraint for DELTA and Lucid to enforce, since what they were trying to do was create a data file for their particular program.
The standard I'm working towards ALLOWS this degree of formality, but doesn't ENFORCE it. In the standard, for a file to be valid input for Lucid or DELTA, it would need to conform to higher-order structure as imposed by those programs. But not all descriptive data out there is in DELTA or Lucid format, as Bryan notes, and we need to be inclusive of other types of data as well (by far the majority of which is natural-language legacy data). If all descriptive data needs to be encoded to a strong specification, it will never be so encoded (that's one reason, I think, why DELTA has failed as a global specification).
So I'm trying to create a spec where both this:
<DOCUMENT Name = "d1">
<ITEM_PROPERTIES> <ITEM ID = "1" NAME = "Gouania exilis"/> <ITEM ID = "2" NAME = "Gouania australiana"/> </ITEM_PROPERTIES>
<ELEMENT_PROPERTIES> <ELEMENT ID = "1" > <ELEMENT_NAME> Flower colour </ELEMENT_NAME> <VALUE_LIST> <VALUE ID = "1"> <VALUE_NAME> "green" </VALUE_NAME> </VALUE> <VALUE ID = "2"> <VALUE_NAME> "yellow" </VALUE_NAME> </VALUE> </VALUE_LIST> </ELEMENT> </ELEMENT_PROPERTIES>
<DESCRIPTION Name = "Gouania exilis"> <ELEMENT> <ELEMENT_ID> 1 </ELEMENT_ID> </ELEMENT> <VALUE_ID> 1 </VALUE_ID> <QUALIFIER> rarely </QUALIFIER> </DESCRIPTION>
</DOCUMENT>
... and this ...
<DOCUMENT Name = "d2">
Viola eminens K. Thiele & Prober, sp. nov.
<DESCRIPTION Name = "Viola eminens"> <ELEMENT = "Longevity"> <VALUE>Perennial</VALUE> </ELEMENT> <ELEMENT = "Life form"> <VALUE> herb </VALUE> </ELEMENT> spreading by stolons; rootstock sometimes somewhat swollen and bulbous at the stem bases. Stems contracted so that the leaves form rosettes, never elongate with caulescent leaves. <ELEMENT Name = "Leaves">Leaves <ELEMENT Name = "lamina"><ELEMENT Name = "Shape"><VALUE>broad-reniform</VALUE></ELEMENT>, the largest (10-)12-15(-25) mm long, (20-)25-35(-45) mm wide, 1.5-3.2 times wider than long, usually with a broad basal sinus; lamina with 9-20 +/- prominent teeth, glabrous or with scattered unicellular hairs on the upper surface, +/- concolorous bright green </ELEMENT>; petioles 2-8 cm long; stipules narrowly triangular, usually with several small, glandular teeth on each side.</ELEMENT> .........etc </DESCRIPTION> </DOCUMENT>
are valid documents.
Now there's a lot of blob text in d2, but it's still a description, and surely the very simple markup has added enormous value to it. If we find this document through a web search we at least know that it includes a description of Viola eminens, and that this description includes a statement about the life form of that species. This is a huge advance on only knowing that we've found a document that contains the words "Viola" and "eminens". Isn't that what XML is all about?
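To make that concrete, here is a rough sketch in Python of what a search tool could pull out of a d2-style document. It is only an illustration: the D2 string is a cut-down copy of d2 that I have tidied so that it is well-formed XML (a Name attribute on every ELEMENT, the ampersand escaped), and the function name summarise is made up for the example, not part of any spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down d2, tidied so that it parses as well-formed XML.
D2 = """<DOCUMENT Name="d2">
Viola eminens K. Thiele &amp; Prober, sp. nov.
<DESCRIPTION Name="Viola eminens">
<ELEMENT Name="Longevity"><VALUE>Perennial</VALUE></ELEMENT>
<ELEMENT Name="Life form"><VALUE> herb </VALUE></ELEMENT>
spreading by stolons; rootstock sometimes somewhat swollen and bulbous at the stem bases.
<ELEMENT Name="Leaves">Leaves <ELEMENT Name="lamina"><ELEMENT Name="Shape"><VALUE>broad-reniform</VALUE></ELEMENT>, usually with a broad basal sinus</ELEMENT>; petioles 2-8 cm long.</ELEMENT>
</DESCRIPTION>
</DOCUMENT>"""


def summarise(xml_text):
    """Map each described item to the names of the elements marked up in it."""
    root = ET.fromstring(xml_text)
    summary = {}
    for desc in root.iter("DESCRIPTION"):
        # iter() also visits nested ELEMENTs, in document order.
        summary[desc.get("Name")] = [e.get("Name") for e in desc.iter("ELEMENT")]
    return summary


summary = summarise(D2)
print(summary)
# {'Viola eminens': ['Longevity', 'Life form', 'Leaves', 'lamina', 'Shape']}
```

The blob text is simply skipped, so the same few lines work whether a description is marked up lightly or heavily.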
Further, it seems to me that imposing only the most basic formalities (that a description is about something, that it starts at <DESCRIPTION> and ends with </DESCRIPTION>, and that it may optionally include some structured statements of the form <ELEMENT></ELEMENT> etc) can, surprisingly easily, allow at least parts of d2 to be mapped (or reformatted) to d1 (once we've got the syntax right, which isn't yet the case with d1 and d2, he hastens to add).
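A minimal sketch of that mapping, again in Python and again under assumptions of my own: the D2 string is a tidied, well-formed cut-down of d2, the function name structured_statements is hypothetical, and joining nested element names with "/" is just one possible convention. It collects the (item, element, value) statements that d1 would record by ID and leaves the unstructured text alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down d2, tidied so that it parses as well-formed XML.
D2 = """<DOCUMENT Name="d2">
<DESCRIPTION Name="Viola eminens">
<ELEMENT Name="Longevity"><VALUE>Perennial</VALUE></ELEMENT>
<ELEMENT Name="Life form"><VALUE> herb </VALUE></ELEMENT>
spreading by stolons;
<ELEMENT Name="Leaves">Leaves <ELEMENT Name="lamina"><ELEMENT Name="Shape"><VALUE>broad-reniform</VALUE></ELEMENT>, usually with a broad basal sinus</ELEMENT>; petioles 2-8 cm long.</ELEMENT>
</DESCRIPTION>
</DOCUMENT>"""


def structured_statements(xml_text):
    """Return (item, element path, value) for every ELEMENT that directly holds a VALUE."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(node, item, path):
        for child in node:
            if child.tag != "ELEMENT":
                continue
            child_path = path + [child.get("Name")]
            # Only VALUEs that are direct children belong to this element.
            for value in child.findall("VALUE"):
                rows.append((item, "/".join(child_path), (value.text or "").strip()))
            walk(child, item, child_path)

    for desc in root.iter("DESCRIPTION"):
        walk(desc, desc.get("Name"), [])
    return rows


rows = structured_statements(D2)
for row in rows:
    print(row)
# ('Viola eminens', 'Longevity', 'Perennial')
# ('Viola eminens', 'Life form', 'herb')
# ('Viola eminens', 'Leaves/lamina/Shape', 'broad-reniform')
```

Turning these rows into d1 proper would still need a lookup from element and value names to IDs, which is exactly the part that depends on getting the syntax settled.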
It seems to me that creating a standard like this would be more valuable than simply XMLifying the DELTA or Lucid data file structure.
So that's what I'm trying to do, I think.
Cheers - k