Re: Minimalism AND functionalism

5 Sep 2000

      At 09:31 5/09/2000 +1000, Kevin Thiele wrote:
...
So I'm proposing, as ground-zero, that the only absolute requirements are
that a document have DESCRIPTION elements, that these have ELEMENT elements
(terminology is unfortunate here) and that these have VALUEs. This is a long
way down the track towards minimalism, I know, but I'm floating it for
comment.
So, as I've exampled before, I'd like something like:
<DOCUMENT ID = "d2" Name = "One thing I know about a rose">
   <DESCRIPTION>
       <ITEM>
           <ITEM_NAME> Rose </ITEM_NAME>
       </ITEM>
       <ELEMENT>
           <ELEMENT_NAME> Smell </ELEMENT_NAME>
       </ELEMENT>
       <VALUE> Sweet </VALUE>
   </DESCRIPTION>
</DOCUMENT>
to be a valid document.
I don't mean to be petty, but didn't we pretty much agree on a different
sort of nesting of elements in this general model?

Anyway, I can see that we will need to be cautious with terminology here.
"Element" (and "Attribute") have specific meanings within XML that could
potentially be confused with the terms as used for descriptive data.
Similarly, "valid" has a specific meaning within XML (basically, that a
document is both well-formed and compliant with a specified DTD or Schema),
that is slightly different from the notion of "validity" you are referring
to here (or is it?). I realize that in this context, none of us has much
difficulty using the appropriate mental "namespace", but if we ever reach
the stage of a formal number, we'll need to be careful of this. (Yes, yes,
I know I'm being terribly pedantic, and I'm sorry. But then I am addressing
taxonomists, after all.)
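
To make the XML sense of these terms concrete, here is a minimal sketch, assuming Python and its standard xml.etree.ElementTree parser. The helper name is_well_formed and the two sample strings are mine, for illustration only. It tests well-formedness alone; validity in the strict XML sense would additionally need a DTD or Schema and a validating parser, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise.

    This checks well-formedness only. It says nothing about validity
    against a DTD or Schema.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples (not taken from any agreed format).
GOOD = "<DESCRIPTION><VALUE> Sweet </VALUE></DESCRIPTION>"
BAD = "<DESCRIPTION><VALUE> Sweet </VALUES></DESCRIPTION>"  # mismatched end tag

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

A document can pass this check and still be "invalid" in the XML sense, which is why the two meanings of "valid" need to be kept apart.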
...
It contains information in a structured way that
means something, and the information can be parsed. Sure, this document
couldn't be used directly as input to Lucid or DELTA, but does that make it
worthless? I'm working at the moment towards a future generation of the
Lucid Builder (and perhaps future DELTA Builders will be similar) that will
indeed be able to collect and interactively integrate such structured bits
of information.
Such a document is not "worthless", but it is definitely "worth less" than
it could be. Arguably, the chief goal in designing a descriptive data model
is to provide a format or structure which can be readily accessed or
manipulated by computer. We humans can usually cope fairly well with
conventional natural language descriptions, but we can do so because of our
substantial background knowledge of life, the universe, and everything. We
can know from a quick glance at the above what is intended (provided we can
read English!). But today's computers have no concept of "Smell", and have
no awareness that the term "Smell" in this description may be in any way
related to the term "fragrance" in some other description. In short, such a
"loosely" structured description has very little advantage over a
description written in natural language in terms of machine readability,
yet suffers some reduction in its human readability.
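
As a rough illustration of the point, here is a sketch (again assuming Python's standard xml.etree.ElementTree) that parses a document of the proposed shape and then compares two character names. The function names triples and same_character and the SYNONYMS table are hypothetical, invented for this example. The parser recovers the strings easily enough, but nothing in the document tells it that "Smell" and "fragrance" might be the same character; that link has to come from some shared vocabulary outside the document.

```python
import xml.etree.ElementTree as ET

# A document of the proposed minimal shape.
DOC = """<DOCUMENT ID="d2" Name="One thing I know about a rose">
  <DESCRIPTION>
    <ITEM><ITEM_NAME> Rose </ITEM_NAME></ITEM>
    <ELEMENT><ELEMENT_NAME> Smell </ELEMENT_NAME></ELEMENT>
    <VALUE> Sweet </VALUE>
  </DESCRIPTION>
</DOCUMENT>"""


def triples(xml_text):
    """Extract (item, element, value) strings from each DESCRIPTION."""
    root = ET.fromstring(xml_text)
    out = []
    for d in root.iter("DESCRIPTION"):
        item = d.findtext("ITEM/ITEM_NAME", default="").strip()
        elem = d.findtext("ELEMENT/ELEMENT_NAME", default="").strip()
        value = d.findtext("VALUE", default="").strip()
        out.append((item, elem, value))
    return out


# Hypothetical controlled vocabulary: an assumption for illustration,
# not part of any proposed standard.
SYNONYMS = {"smell": "odour", "fragrance": "odour"}


def same_character(a, b, vocabulary=None):
    """Compare two character names, optionally via a shared vocabulary."""
    a, b = a.strip().lower(), b.strip().lower()
    if vocabulary is not None:
        a, b = vocabulary.get(a, a), vocabulary.get(b, b)
    return a == b


print(triples(DOC))
print(same_character("Smell", "fragrance"))            # plain string match
print(same_character("Smell", "fragrance", SYNONYMS))  # with shared vocabulary
```

Without the vocabulary the comparison is just a string match, which is all a program can do with a free-text element name.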

If you can actually create software that can make real sense of bits of
information like this, you will have accomplished something that has eluded
AI researchers for the past half century.

Of course it's important to be able to store loosely structured bits of
information, but in general the more rigour we can apply, the better.

This discussion brings to mind a debate between the mid-20th-century
American poets Robert Frost and Carl Sandburg. Frost argued that writing
poetry in free verse was like playing tennis without a net. Sandburg
countered that tennis would be a better game if there were no net to get in
the way. As one might guess, I've always been inclined to agree with Frost...

Eric Zurcher
CSIRO Division of Entomology
Canberra, Australia
E-mail: ericz@ento.csiro.au
