At 09:01 8/09/2000 +1000, Kevin Thiele wrote:
> So tell me, does *anyone* out there agree with me?
Well, I don't think we're really all that far apart. As Jim Croft nicely summarized it:
"Is this the consensus we are arriving at? That we strive for structure and comparability but that we accommodate the free text 'blob' because oftentimes it may be the best we can get? Yes? Great!"
So I'm largely in agreement with you, Kevin, with the disagreement perhaps being on where we should aim as our primary target on the structured/unstructured spectrum. (That is, I'd prefer to see highly structured information as the "default", with loosely structured "blobs" being optional, rather than the other way round.)
A similar sort of problem must occur commonly with specimen databases. For example, how do you capture a "location"? You might prefer to have lats, longs, and elevations nicely geocoded to the nearest meter, but the label on an older specimen might not say much more than "in a shady little billabong along Reedy Creek, back of Beyond". It's desirable to be able to store that information one way or another, even when it doesn't fit into your preferred structure, but to use structured data when you can get it.
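To make the location example concrete, here is a minimal sketch in Python of a record that always keeps the verbatim label text and fills in structured fields only when they are available. The class name `Locality`, its field names, and the coordinates in the second record are all invented for illustration; they are not taken from any real specimen database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locality:
    """Hypothetical specimen locality record.

    The verbatim label text is always stored; the structured
    fields are optional and stay None when the label gives
    nothing that can be geocoded.
    """
    verbatim: str
    latitude: Optional[float] = None     # decimal degrees
    longitude: Optional[float] = None    # decimal degrees
    elevation_m: Optional[float] = None  # meters

    def is_geocoded(self) -> bool:
        # "Structured" here means at least a latitude and a longitude.
        return self.latitude is not None and self.longitude is not None


# An old label: only the free-text "blob" is available.
old_label = Locality(
    "in a shady little billabong along Reedy Creek, back of Beyond"
)

# A modern record; the site and numbers are made up for this sketch.
modern = Locality(
    "hypothetical collecting site",
    latitude=-35.0,
    longitude=149.0,
    elevation_m=600.0,
)

for record in (old_label, modern):
    kind = "structured" if record.is_geocoded() else "free text only"
    print(f"{kind}: {record.verbatim}")
```

A program can then prefer the structured fields when `is_geocoded()` is true and fall back to displaying the verbatim text otherwise.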
As a programmer, I'd like to be able to know whether I'm looking at a rigorously defined object or something "fuzzier". I think one could fairly easily accommodate both. Modifying one of Kevin's examples only slightly, you could have something like:
<DOCUMENT>
  <DESCRIPTION Taxon_Name = "Viola odorata">
    <CHARACTER type="defined" Character_Name = "Leaves">
      <STATE State_Name = "present"/>
    </CHARACTER>
    <CHARACTER type="arbitrary" Character_Name = "scent">
      a marvelous perfume on a perfect spring day
    </CHARACTER>
  </DESCRIPTION>
</DOCUMENT>
where there is some small distinction made between rigorously defined characters (those which can be validated against some "character list", sensu lato, and for which cross-taxa comparisons are clearly meaningful), and characters defined "on the fly". (Note: I'm not so sure that using attributes is syntactically the best way to do this, but it illustrates the principle.)
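As a rough illustration of what that distinction buys a programmer, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It checks "defined" characters against a character list and simply collects the text of "arbitrary" ones. The `CHARACTER_LIST` contents and the `check_description` function are assumptions made up for this example, and the STATE element is written as self-closing so that the document parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

DOC = """
<DOCUMENT>
  <DESCRIPTION Taxon_Name="Viola odorata">
    <CHARACTER type="defined" Character_Name="Leaves">
      <STATE State_Name="present"/>
    </CHARACTER>
    <CHARACTER type="arbitrary" Character_Name="scent">
      a marvelous perfume on a perfect spring day
    </CHARACTER>
  </DESCRIPTION>
</DOCUMENT>
"""

# Hypothetical character list: character name -> allowed state names.
CHARACTER_LIST = {"Leaves": {"present", "absent"}}


def check_description(description):
    """Return (problems, free_text) for one DESCRIPTION element.

    "defined" characters are validated against CHARACTER_LIST;
    anything else is treated as free text and stored as-is.
    """
    problems = []
    free_text = {}
    for char in description.findall("CHARACTER"):
        name = char.get("Character_Name")
        if char.get("type") == "defined":
            allowed = CHARACTER_LIST.get(name)
            if allowed is None:
                problems.append(f"unknown character: {name}")
                continue
            for state in char.findall("STATE"):
                state_name = state.get("State_Name")
                if state_name not in allowed:
                    problems.append(f"bad state for {name}: {state_name}")
        else:
            # Fuzzy character: keep the text, make no claim about it.
            free_text[name] = (char.text or "").strip()
    return problems, free_text


root = ET.fromstring(DOC)
problems, free_text = check_description(root.find("DESCRIPTION"))
print("problems:", problems)
print("free text:", free_text)
```

Only the "defined" characters take part in validation and cross-taxa comparison; the "arbitrary" ones are carried along untouched, much as a text character would be.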
Of course, as Mike would point out, such characters are really not terribly different from DELTA "text" characters...
I suspect that how all this eventually gets used will depend substantially on the sorts of editing and markup tools that are developed. I don't really anticipate that many taxonomists will want to go through their existing natural language descriptions and insert <ELEMENT></ELEMENT> tags manually. It's not only tedious, it's too darned easy to make mistakes. And perhaps the editing tools can be made clever enough that maintaining a character list isn't all that difficult. But it's still a good idea to have a system flexible enough to catch material that doesn't fit the mold.
So we do agree, kind of...
Eric Zurcher
CSIRO Division of Entomology
Canberra, Australia
E-mail: ericz@ento.csiro.au