Comparative data

Thu Sep 7 23:24:10 CEST 2000

Bryan wrote:
>I think that everyone should produce the most standardized, consistent and
>meaningful data possible, grounded only on fact and direct observation.
...and...
>Structure is good when we can get it.

Is this the consensus we are arriving at?  That we strive for structure and
comparability but that we accommodate the free text 'blob' because
oftentimes it may be the best we can get?

Yes?

Great!

Now we can move on to fixing up Bryan's dream:

>If we had a perfect data model and everyone did that all our problems
>would be solved!

The problem lies not so much with the 'perfect data model', because even
imperfect ones can be made to work, but rather with the 'everyone did that'
thing.   An imperfect data model would work if everyone agreed to use it
within its acknowledged limitations.  After all, we do that sort of stuff
all the time in other database applications.

>Problem is that we'll never make a data model that can capture all of
>the semantics of the natural world.

But I do not think we should bust ourselves trying to do this.  Kevin's
approach of taking incremental steps in both directions, to accommodate
plain language description on one hand, character lists and structure on
the other, and a mixture of both in between, depending on end use and
available data, is the way to go, and is potentially achievable.

If we accept this, the next step would surely be to revisit Kevin's draft
spec to see if it can do all this, and if it cannot, beat it around the
head until it can.

The description blob is an easy thing to model and is not particularly
exciting: <blob>problem solved</blob>, but I accept Kevin's desire to allow
for this in the specs - even this level of structure is better than none at
all. A problem for me is that it is very easy to go from the character list
to the blob without losing anything along the way, but it is next to
impossible to do the reverse.  However, Bryan and others have shown that it
is possible to automatically partially deblob or subblob text - this is
surely a step in the right direction that can only improve with
technology...
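
To make the one-way nature of that conversion concrete, here is a rough
sketch in Python.  It is not taken from Kevin's draft spec: the names
Character, Blob and to_blob, and the sample leaf/petal/fruit data, are all
made up for illustration.  It assumes a description is simply a list of
segments, each either an atomized character or a piece of free text, which
also covers the 'something in between' case.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Character:
    """One atomized character (hypothetical shape: structure, property, value)."""
    structure: str
    prop: str
    value: str


@dataclass
class Blob:
    """A free-text fragment of a description, kept verbatim."""
    text: str


# A description is a list of segments: all Characters, a single Blob,
# or a mixture of both.
Segment = Union[Character, Blob]


def to_blob(segments: List[Segment]) -> Blob:
    """Flatten any description into one blob of plain text.

    The values are only concatenated, so no content is dropped, but the
    explicit structure/property/value boundaries are gone afterwards.
    """
    parts = []
    for seg in segments:
        if isinstance(seg, Character):
            parts.append(f"{seg.structure} {seg.prop} {seg.value}")
        else:
            parts.append(seg.text)
    return Blob("; ".join(parts) + ".")


# Invented sample data: two atomized characters plus one free-text fragment.
description = [
    Character("leaf", "shape", "ovate"),
    Character("petal", "colour", "white"),
    Blob("fruit a capsule, splitting when dry"),
]

print(to_blob(description).text)
```

Going this way is a few lines of string joining.  Going back would mean
working out from the text alone where each structure, property and value
begins and ends, which is exactly the deblobbing problem.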

What I would like to see now is a 'real example' of the draft specs applied
to a full 'real taxon' description to see if it really can hold a full
description of a plant (or animal, if you must) in a fully atomized
character list form, a single blob, and something in between...  If the
draft specs can do that, we will be getting somewhere.

<aside relevance=marginal>Speaking with Kevin this evening, I mentioned an
early impression of the SDD list as being set up to create a standard to
cover descriptive data in a digital environment, and that at the time the
focus was heavily, if not entirely, on character-based systems such as
DELTA and Lucid.  As the list and discussion have matured, we have come to
embrace all forms of biological description, so we are now really talking
about a formal specification for all forms of description, and acknowledge
that some people will want to use one part of the spec and others another
part of it.  This seems like a very positive development in the quest for a
universal descriptive data standard, one that I do not think we have
actually acknowledged before.</aside>

jim