Re: Minimalism AND functionalism

9 Sep 2000

      ...
At 09:01 8/09/2000 +1000, Kevin Thiele wrote:
...
So tell me, does *anyone* out there agree with me?
Well, I don't think we're really all that far apart. As Jim Croft nicely
summarized it:
"Is this the consensus we are arriving at?  That we strive for structure and
comparability but that we accommodate the free text 'blob' because
oftentimes it may be the best we can get? Yes? Great!"
So I'm largely in agreement with you, Kevin, with the disagreement perhaps
being on where we should aim as our primary target on the
structured/unstructured spectrum. (That is, I'd prefer to see highly
structured information as the "default", with loosely structured "blobs"
being optional, rather than the other way round.)
Yes, I think that this should be the primary focus as well -- both as a
programmer and a botanist.
...
A similar sort of problem must occur commonly with specimen databases. For
example, how do you capture a "location"? You might prefer to have lats,
longs, and elevations nicely geocoded to the nearest meter, but the label
on an older specimen might not say much more than "in a shady little
billabong along Reedy Creek, back of Beyond". It's desirable to be able to
store that information one way or another, even when it doesn't fit into
your preferred structure, but use structured data when you can get it.
And this is something which I am currently *painfully* aware of.  I'm a
member of the ATBI plants group (the Biodiversity Inventory of the Great
Smoky Mountains National Park ...  qv. www.discoverlife.org or
www.goldsword.com/sfarmer/ATBI for our taxon pages ... they're supposed
to get crosslinked *eventually* ... ) and one of the things that we're
doing at the moment is databasing all of our legacy data -- all the
herbarium sheets that we have.

Unfortunately, for starters, that location
data is all going into a big blob *initially* so that we can analyse it
and determine what fields we have and which ones we really want
to use -- we're also working with the folks who are designing the Final
Database that we're going to use because we have the most legacy data
already in databases.

And this might be something to consider --
your legacy label says something like "adjacent to Reedy Creek." and
that's all.  So, you go back later, and *can* add additional information
because that population is still there -- perhaps it's the only one
along the creek -- how do you add that information into the existing
record and identify it as *added* data?  It's valid -- some of it
more so than others, perhaps.  Maybe all you might be able to add is
the County -- or a gross locality name -- "Greenbrier Section" but
you want to identify it as added data.  Would that be something that
this group would want to consider?  I know that it goes back to
the Validation issues ...
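
To make the "added data" question concrete, here is a minimal sketch of
one way a record could keep the legacy label untouched while flagging
anything supplied later. It is only an illustration, not the design of
the ATBI database or of any existing system: the class names, the field
names (added_by, added_on) and the sample values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """Locality information added after the original label was written."""
    field_name: str   # e.g. "county" or "section"
    value: str
    added_by: str     # who supplied it (hypothetical field)
    added_on: str     # ISO date string (hypothetical field)


@dataclass
class LocalityRecord:
    verbatim_label: str                  # the legacy "blob", never modified
    latitude: Optional[float] = None     # structured fields, filled when known
    longitude: Optional[float] = None
    elevation_m: Optional[float] = None
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, field_name, value, added_by, added_on):
        """Record later information without touching the label text."""
        self.annotations.append(
            Annotation(field_name, value, added_by, added_on)
        )

    def is_geocoded(self):
        """True only when both coordinates are present."""
        return self.latitude is not None and self.longitude is not None


# Sample values below are made up for illustration.
record = LocalityRecord(verbatim_label="adjacent to Reedy Creek.")
record.annotate("section", "Greenbrier Section",
                added_by="example annotator", added_on="2000-09-09")
```

Because the additions live in their own list, a program can always tell
what was on the sheet from what somebody worked out afterwards, and the
structured fields can stay empty until there is real data for them.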
...
As a programmer, I'd like to be able to know whether I'm looking at a
rigorously defined object or something "fuzzier". I think one could fairly
easily accommodate both. Modifying one of Kevin's examples only slightly,
you could have something like:
<DOCUMENT>
   <DESCRIPTION Taxon_Name = "Viola odorata">
       <CHARACTER type="defined" Character_Name = "Leaves">
           <STATE State_Name = "present"/>
       </CHARACTER>
       <CHARACTER type="arbitrary" Character_Name = "scent">
          a marvelous perfume on a perfect spring day
       </CHARACTER>
   </DESCRIPTION>
</DOCUMENT>
where there is some small distinction made between rigorously defined
characters (those which can be validated against some "character list",
sensu lato, and for which cross-taxa comparisons are clearly meaningful),
and characters defined "on the fly". (Note: I'm not so sure that using
attributes is syntactically the best way to do this, but it illustrates the
principle.)
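
As a rough sketch of how a program might use that distinction, the Python
below reads a document of this shape, checks the "defined" characters
against a character list, and keeps the "arbitrary" ones as plain text.
The CHARACTER_LIST contents and the read_description helper are made up
for illustration, and the STATE element is written as an empty element
so that the XML parses; none of this is a proposed standard.

```python
import xml.etree.ElementTree as ET

DOC = """
<DOCUMENT>
   <DESCRIPTION Taxon_Name="Viola odorata">
       <CHARACTER type="defined" Character_Name="Leaves">
           <STATE State_Name="present"/>
       </CHARACTER>
       <CHARACTER type="arbitrary" Character_Name="scent">
          a marvelous perfume on a perfect spring day
       </CHARACTER>
   </DESCRIPTION>
</DOCUMENT>
"""

# Hypothetical character list: character name -> allowed state names.
CHARACTER_LIST = {"Leaves": {"present", "absent"}}


def read_description(xml_text, character_list):
    """Return (defined, arbitrary, problems) for the first DESCRIPTION."""
    root = ET.fromstring(xml_text)
    description = root.find("DESCRIPTION")
    defined, arbitrary, problems = {}, {}, []
    for char in description.findall("CHARACTER"):
        name = char.get("Character_Name")
        if char.get("type") == "defined":
            states = [s.get("State_Name") for s in char.findall("STATE")]
            allowed = character_list.get(name)
            if allowed is None:
                problems.append(f"unknown character: {name}")
            else:
                for state in states:
                    if state not in allowed:
                        problems.append(f"unknown state for {name}: {state}")
            defined[name] = states
        else:
            # Free-text "blob": keep the text, do not try to validate it.
            arbitrary[name] = (char.text or "").strip()
    return defined, arbitrary, problems


defined, arbitrary, problems = read_description(DOC, CHARACTER_LIST)
```

Only the "defined" characters are held to the list, so a new "arbitrary"
character never causes a validation failure; it simply cannot be compared
across taxa until someone promotes it.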
Of course, as Mike would point out, such characters are really not terribly
different from DELTA "text" characters...
I suspect that how all this eventually gets used will depend substantially
on the sorts of editing and markup tools that are developed. I don't really
anticipate that many taxonomists will want to go through their existing
natural language descriptions and insert <ELEMENT></ELEMENT> tags manually.
It's not only tedious, it's too darned easy to make mistakes. And perhaps
the editing tools can be made clever enough that maintaining a character
list isn't all that difficult. But it's still a good idea to have a system
flexible enough to catch material that doesn't fit the mold.
So we do agree, kind of...
Eric Zurcher
CSIRO Division of Entomology
Canberra, Australia
E-mail: ericz@ento.csiro.au
Susan Farmer
-----

Susan Farmer
sfarmer@goldsword.com
Botany Department, University of Tennessee
http://www.goldsword.com/sfarmer/Trillium
