Re: Minimalism AND functionalism

8 Sep 2000

At 09:01 8/09/2000 +1000, Kevin Thiele wrote:
> ...
> So tell me, does *anyone* out there agree with me?

Well, I don't think we're really all that far apart. As Jim Croft nicely
summarized it:

"Is this the consensus we are arriving at?  That we strive for structure and
comparability but that we accommodate the free text 'blob' because
oftentimes it may be the best we can get? Yes? Great!"

So I'm largely in agreement with you, Kevin, with the disagreement perhaps
being on where we should aim as our primary target on the
structured/unstructured spectrum. (That is, I'd prefer to see highly
structured information as the "default", with loosely structured "blobs"
being optional, rather than the other way round.)

A similar sort of problem must occur commonly with specimen databases. For
example, how do you capture a "location"? You might prefer to have lats,
longs, and elevations nicely geocoded to the nearest meter, but the label
on an older specimen might not say much more than "in a shady little
billabong along Reedy Creek, back of Beyond". It's desirable to be able to
store that information one way or another, even when it doesn't fit into
your preferred structure, but to use structured data when you can get it.
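
To make the specimen-database case concrete, here is a minimal sketch of a
record that always keeps the label text and fills in structured fields only
when they are known. The class name, field names, and coordinate values are
all made up for illustration; they are not taken from any existing database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    """Hypothetical locality record: free text plus optional geocoding."""

    # Text as written on the label; always stored, even when geocoded.
    verbatim: str
    # Structured fields; None means the label did not supply a value.
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    elevation_m: Optional[float] = None

    def is_geocoded(self) -> bool:
        # A record counts as structured once both coordinates are present.
        return self.latitude is not None and self.longitude is not None


# An older label with nothing but a description.
old_label = Location("in a shady little billabong along Reedy Creek, back of Beyond")

# A newer label; the numbers are invented placeholder values.
new_label = Location(
    "hypothetical geocoded site",
    latitude=-35.0,
    longitude=149.0,
    elevation_m=600.0,
)
```

A program can then call is_geocoded() to decide whether a record can go
into a distance query or can only be shown as text.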

As a programmer, I'd like to be able to know whether I'm looking at a
rigorously defined object or something "fuzzier". I think one could fairly
easily accommodate both. Modifying one of Kevin's examples only slightly,
you could have something like:

<DOCUMENT>
    <DESCRIPTION Taxon_Name = "Viola odorata">
        <CHARACTER type="defined" Character_Name = "Leaves">
            <STATE State_Name = "present"/>
        </CHARACTER>
        <CHARACTER type="arbitrary" Character_Name = "scent">
           a marvelous perfume on a perfect spring day
        </CHARACTER>
    </DESCRIPTION>
</DOCUMENT>

where there is some small distinction made between rigorously defined
characters (those which can be validated against some "character list",
sensu lato, and for which cross-taxa comparisons are clearly meaningful),
and characters defined "on the fly". (Note: I'm not so sure that using
attributes is syntactically the best way to do this, but it illustrates the
principle.)
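
From the programmer's side, the distinction could be used roughly as in the
sketch below, which reads a description, validates the "defined" characters
against a character list, and passes the "arbitrary" ones through as plain
text. The CHARACTER_LIST contents (including the "absent" state) and the
read_description function are assumptions made for illustration, and the
STATE element is written as an empty-element tag so that the parser accepts
it.

```python
import xml.etree.ElementTree as ET

# Hypothetical character list: character name -> permitted state names.
CHARACTER_LIST = {"Leaves": {"present", "absent"}}

DOC = """<DOCUMENT>
    <DESCRIPTION Taxon_Name="Viola odorata">
        <CHARACTER type="defined" Character_Name="Leaves">
            <STATE State_Name="present"/>
        </CHARACTER>
        <CHARACTER type="arbitrary" Character_Name="scent">
           a marvelous perfume on a perfect spring day
        </CHARACTER>
    </DESCRIPTION>
</DOCUMENT>"""


def read_description(description):
    """Split a DESCRIPTION element into validated and free-text characters."""
    defined, arbitrary = {}, {}
    for char in description.findall("CHARACTER"):
        name = char.get("Character_Name")
        if char.get("type") == "defined":
            # Defined characters must appear in the character list,
            # and every state must be one the list permits.
            allowed = CHARACTER_LIST.get(name)
            if allowed is None:
                raise ValueError(f"unknown defined character: {name!r}")
            states = [s.get("State_Name") for s in char.findall("STATE")]
            bad = [s for s in states if s not in allowed]
            if bad:
                raise ValueError(f"invalid states for {name!r}: {bad}")
            defined[name] = states
        else:
            # Arbitrary characters are kept as an unvalidated text "blob".
            arbitrary[name] = (char.text or "").strip()
    return defined, arbitrary


root = ET.fromstring(DOC)
defined, arbitrary = read_description(root.find("DESCRIPTION"))
```

Only the entries in "defined" would be safe to compare across taxa; the
entries in "arbitrary" are carried along for display.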

Of course, as Mike would point out, such characters are really not terribly
different from DELTA "text" characters...

I suspect that how all this eventually gets used will depend substantially
on the sorts of editing and markup tools that are developed. I don't really
anticipate that many taxonomists will want to go through their existing
natural language descriptions and insert <ELEMENT></ELEMENT> tags manually.
It's not only tedious, it's too darned easy to make mistakes. And perhaps
the editing tools can be made clever enough that maintaining a character
list isn't all that difficult. But it's still a good idea to have a system
flexible enough to catch material that doesn't fit the mold.

So we do agree, kind of...

Eric Zurcher
CSIRO Division of Entomology
Canberra, Australia
E-mail: ericz@ento.csiro.au