Bryan wrote:
I think that everyone should produce the most standardized, consistent and meaningful data possible, grounded only on fact and direct observation.
...and...
Structure is good when we can get it.
Is this the consensus we are arriving at? That we strive for structure and comparability but that we accommodate the free text 'blob' because oftentimes it may be the best we can get?
Yes?
Great!
Now we can move on to fixing up Bryan's dream:
If we had a perfect data model and everyone did that all our problems would be solved!
The problem lies not so much with the 'perfect data model', because even imperfect ones can be made to work, but rather with the 'everyone did that' thing. An imperfect data model would work if everyone agreed to use it within its acknowledged limitations. After all, we do that sort of stuff all the time in other database applications.
Problem is that we'll never make a data model that can capture all of the semantics of the natural world.
But I do not think we should bust ourselves trying to do this. Kevin's approach of taking incremental steps in both directions to accommodate plain language description on one hand, character lists and structure on the other, and a mixture of both in between, depending on end use and available data, is the way to go, and is potentially achievable.
If we accept this, the next step would surely be to revisit Kevin's draft spec to see if it can do all this, and if it cannot, beat it around the head until it can.
The description blob is an easy thing to model and is not particularly exciting: <blob>problem solved</blob>, but I accept Kevin's desire to allow for this in the specs - even this level of structure is better than none at all. A problem for me is that it is very easy to go from the character list to the blob without losing anything along the way, but it is next to impossible to do the reverse. However, Bryan and others have shown that it is possible to partially deblob or subblob text automatically - this is surely a step in the right direction that can only improve with technology...
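To make the asymmetry concrete, here is a rough sketch of the easy direction, character list to blob. The names (CharacterState, to_blob) and the sample characters are made up for illustration only; they are not taken from Kevin's draft spec, DELTA or Lucid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterState:
    """One atomized observation (hypothetical structure, not from the draft spec)."""
    character: str  # e.g. "leaf shape"
    state: str      # e.g. "ovate"


def to_blob(taxon: str, items: list[CharacterState]) -> str:
    """Render a character list as one plain-language description string."""
    phrases = [f"{item.character} {item.state}" for item in items]
    return f"{taxon}: " + "; ".join(phrases) + "."


# Invented sample data, purely illustrative.
description = [
    CharacterState("habit", "shrub"),
    CharacterState("leaf shape", "ovate"),
    CharacterState("petal colour", "yellow"),
]

blob = to_blob("Example taxon", description)
print(blob)
```

Every character and state survives in the output string, which is why this direction is cheap. Going back means guessing where one character ends and the next begins, and real descriptions are nowhere near as regular as this generated one.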
What I would like to see now is a 'real example' of the draft specs applied to a full 'real taxon' description, to see if it really can hold a full description of a plant (or animal, if you must) in a fully atomized character list form, a single blob, and something in between... If the draft specs can do that, we will be getting somewhere.
<aside relevance=marginal>Speaking with Kevin this evening, I mentioned an early impression of the SDD list as being set up to create a standard to cover descriptive data in a digital environment; at the time the focus was heavily, if not entirely, on character-based systems such as DELTA and Lucid. As the list and discussion have matured, we have come to embrace all forms of biological description, so we are now really talking about a formal specification for all forms of description, and we acknowledge that some people will want to use one part of the spec and others another part of it. This seems like a very positive development in the quest for a universal descriptive data standard, and one that I do not think we have actually acknowledged before.</aside>
jim