Special states

Fri Mar 21 10:18:06 CET 2003

Steve Shattuck's response to Gregor's Special States document raises some
important points. There is a clear divergence of opinion as to the role and
scope of SDD. It seems to me that the difference is that Steve wants SDD to
record pure descriptive facts, while Gregor wants it to capture a scientific
work-in-progress, including the judgements and process decisions of the
scientist. Hence, Gregor wants SDD to capture the types of statements that a
taxonomist may wish to make with respect to an evolving description (in this
case, with respect to uncertainty or missing data in the description), while
Steve is sticking with the pure known: the data are missing, so let's
leave it at that. Am I right, you two?

Steve holds that there is a potentially infinite universe of possibly
useful special states. In this view, any attempt to specify a fixed set of
special states is restrictive, so we need to generalize, Generalize, GENERALIZE.
Steve's suggestion is:

| A much better way to implement this functionality would be to store an
| "uncoded" flag with the description along with an (encoded or
| text-based) explanation ("unknown", "not interpretable", "too lazy to
| code this", "don't have proper specimens", "To Do" or what ever).  This
| is both direct and allows the explanation to change in a simple and
| flexible way.

It seems to me that if the "explanation" is encoded, then this suggestion is
not a long way from Gregor's. If the "explanation" is text-based, then it
will be a mere comment that may be useful to the original author but will be
impossible for any other application to process. I agree that processing
issues need to be kept firmly under control in SDD, but I don't think they
should be excluded altogether; after all, we're capturing these data in order
to process them, not just to archive them.

Under Gregor's model and Steve's "encoded explanation" model, we would need
to be quite sure that it is possible to capture the entire universe of
statements describing the uncertainty. This seems to me to be possible. For
instance, the following list of possibilities for an unencoded datum seems to
be exhaustive:

- It's logically possible to code, and I intend to code it but haven't
  gotten around to it yet (unfinished business).
- It's logically possible to code, and I intend not to do it (character
  scoped out).
- It's logically impossible to code (inapplicable).

Surely there's no space between the logically possible and the logically
impossible, or between intending to do it and intending not to do it. (There
may, of course, be subcategories of these that we choose to capture, and
there will of course be a role for free-form text.)
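The exhaustiveness argument can be read as saying that the three categories are fixed by two yes/no questions: is it logically possible to code, and does the author intend to code it? The sketch below assumes that reading; the names `Unencoded` and `classify` are hypothetical. Because each question admits only yes or no, every combination of answers maps to exactly one category. Subcategories, if we choose to capture them, would be extra detail layered on top rather than new top-level cases.

```python
from enum import Enum
from itertools import product


class Unencoded(Enum):
    """The three kinds of unencoded datum listed above (hypothetical names)."""
    UNFINISHED = "unfinished business"
    SCOPED_OUT = "character scoped out"
    INAPPLICABLE = "inapplicable"


def classify(logically_possible: bool, intend_to_code: bool) -> Unencoded:
    """Map the two yes/no questions onto exactly one category."""
    if not logically_possible:
        # Intent is irrelevant when the character cannot be coded at all.
        return Unencoded.INAPPLICABLE
    if intend_to_code:
        return Unencoded.UNFINISHED
    return Unencoded.SCOPED_OUT


# Check all four combinations of answers: each lands in some category,
# and every category is reached by at least one combination.
reached = {classify(p, i) for p, i in product([True, False], repeat=2)}
assert reached == set(Unencoded)
```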

I agree with Steve (and Gregor) that Gregor's document is messy and we need
to clean up and tease out the concepts.

Cheers - k