Eric wrote:
A similar sort of problem must occur commonly with specimen databases. For example, how do you capture a "location"? You might prefer to have lats, longs, and elevations nicely geocoded to the nearest meter, but the label on an older specimen might not say much more than "in a shady little billabong along Reedy Creek, back of Beyond". It's desirable to be able to store that information one way or another, even when it doesn't fit into your preferred structure, but to use structured data when you can get it.
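One way to picture the "store it one way or another" idea is a record that always keeps the label text verbatim and fills in coordinates only when they are known. This is a rough sketch, not any existing database's schema: the class name, field names, and the coordinates in the second record are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locality:
    """Hypothetical locality record: free text always, structure when available."""
    verbatim: str                         # label text exactly as written
    latitude: Optional[float] = None      # decimal degrees, if geocoded
    longitude: Optional[float] = None     # decimal degrees, if geocoded
    elevation_m: Optional[float] = None   # metres, if recorded

    def is_geocoded(self) -> bool:
        # A record counts as geocoded only when both coordinates are present.
        return self.latitude is not None and self.longitude is not None


# An older specimen: only the label text survives.
old_label = Locality(
    verbatim="in a shady little billabong along Reedy Creek, back of Beyond"
)

# A newer specimen: label text plus (invented) coordinates.
new_label = Locality(
    verbatim="Reedy Creek crossing",
    latitude=-5.5,
    longitude=145.0,
    elevation_m=120.0,
)
```

Queries that need coordinates can filter on `is_geocoded()`, while the verbatim text is never thrown away.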
We had a real example of this sort of thing here the other day. Someone wanted a list of tree species from New Guinea, so we attacked our specimen database with a query for records with "New Guinea" or "Irian Jaya" in the locality field and "tree" in the habit and notes fields. The result was absolutely disastrous because of the lack of rigour in these free-text fields. There were herbs, shrubs and vines all through the list because they were recorded as growing in, on, under or around trees. But on the surface, it seemed like a fairly reasonable and straightforward query to make.
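The failure in the New Guinea query is easy to reproduce in miniature. The sketch below uses invented records and assumes a controlled `habit` field exists alongside the free-text `notes`; our real database had no such coded field, which was exactly the problem.

```python
# Invented specimen records; taxon names are placeholders.
records = [
    {"taxon": "Species A", "locality": "New Guinea", "habit": "tree",
     "notes": "Tree to 20 m"},
    {"taxon": "Species B", "locality": "Irian Jaya", "habit": "vine",
     "notes": "Climbing on trees at forest edge"},
    {"taxon": "Species C", "locality": "New Guinea", "habit": "herb",
     "notes": "Growing under trees in deep shade"},
    {"taxon": "Species D", "locality": "Queensland", "habit": "tree",
     "notes": "Tree to 10 m"},
]

REGIONS = ("new guinea", "irian jaya")


def in_region(rec):
    # Case-insensitive substring match on the locality text.
    loc = rec["locality"].lower()
    return any(region in loc for region in REGIONS)


def free_text_trees(recs):
    # What we actually did: look for the word "tree" anywhere in the notes.
    return [r["taxon"] for r in recs
            if in_region(r) and "tree" in r["notes"].lower()]


def coded_trees(recs):
    # What we wanted: match on a controlled habit value (assumed to exist).
    return [r["taxon"] for r in recs
            if in_region(r) and r["habit"] == "tree"]
```

Here `free_text_trees(records)` picks up the vine and the herb along with the real tree, while `coded_trees(records)` returns only Species A.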
I suspect that how all this eventually gets used will depend substantially on the sorts of editing and markup tools that are developed. I don't really anticipate that many taxonomists will want to go through their existing natural language descriptions and insert <ELEMENT></ELEMENT> tags manually.
This has to be a given, doesn't it? No one is going to want to score this data longhand; people will expect point-and-click tools like the DELTA and Lucid editors, which save typing, maintain consistency, etc. What we are talking about here is the type of data that is scored and stored, and how it is transferred between applications, not what the various applications might decide to do with it. Right?
jim