Special states

Steve Shattuck
Wed Mar 26 08:24:17 CET 2003

Thanks to both Kevin and Gregor for the comments on my comments.

Kevin is pretty much right that I see the main use of SDD as documenting
essentially finished data rather than a "work in progress".  My thinking
is that most people will work on their dataset until it's finished, then
make it available through SDD to other interested parties.  My
assumption is that most collaborative projects will select a single tool
to use while building the dataset (be it DELTA, LucID, DeltaAccess or
whatever) and will use the native format of that application to share
the dataset while it is still being built.

Now, I have no problem with capturing details needed by specific
situations or applications; I would say that these are important and
should be supported by SDD.  However, I would try to make them extensions
to the basic, core data rather than part of the core itself.
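One way to keep application-specific details out of the core, sketched here in Python under assumptions of my own (the `Coding` record, `core_view`, and namespaced keys such as `delta:comment` are hypothetical, not part of any SDD draft), is to give each core record an open slot for namespaced extensions that a core-only reader simply ignores:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Coding:
    """Core data: one character scored for one taxon (hypothetical shape)."""
    taxon: str
    character: str
    state: Optional[str]
    # Application-specific extras, keyed by a namespace the application owns
    # (e.g. "delta:" or "biolink:"), kept apart from the core fields.
    extensions: Dict[str, Any] = field(default_factory=dict)


def core_view(coding: Coding) -> Dict[str, Any]:
    """What a reader that understands only the core sees; extensions are ignored."""
    return {"taxon": coding.taxon, "character": coding.character, "state": coding.state}


record = Coding("Taxon A", "wing colour", "red", {"delta:comment": "from the key"})
print(core_view(record))  # {'taxon': 'Taxon A', 'character': 'wing colour', 'state': 'red'}
```

The point of this design choice is that adding an extension never changes the shape of the core, so nobody has to follow one group's lead just to stay compatible.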

The fundamental problem I see is making SDD truly extensible and yet
generic at the same time.  The big problem with the DELTA standard is
that every time Mike needed new DELTA program functionality he extended
the DELTA standard by adding new directives.  This meant that if you
wanted to support the DELTA standard you needed to follow Mike's lead.
It's hardly a "standard" if it's being pushed by one group.

A specific example: "Use default state."  We are looking hard at this
in BioLink and will probably stay away from it, using different
functionality to achieve the same thing (rapid data entry and global
data changes).  We obviously support this when importing from DELTA
(and possibly from SDD) but it won't be core functionality for us and we
won't export it.  To enshrine this in SDD simply because DELTA had it
(or some other application currently supports it) may not be the best
approach (it may be, but I'm not willing to make that assumption unless
we have to).
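An importer can get the effect of a default state without carrying the concept into its own data by expanding the shorthand into explicit codings at import time. The sketch below assumes, purely for illustration, that a default state applies to every taxon with no explicit coding for that character; the function name and data shapes are hypothetical and are not taken from DELTA or BioLink:

```python
from typing import Dict, Iterable, Optional


def expand_default_state(
    taxa: Iterable[str],
    explicit: Dict[str, str],
    default_state: Optional[str],
) -> Dict[str, str]:
    """Replace 'default state' shorthand with explicit per-taxon codings.

    Assumption: the default applies to any taxon lacking an explicit coding.
    """
    expanded: Dict[str, str] = {}
    for taxon in taxa:
        if taxon in explicit:
            expanded[taxon] = explicit[taxon]  # an explicit coding always wins
        elif default_state is not None:
            expanded[taxon] = default_state  # fill the gap with the default
    return expanded


codings = expand_default_state(["Taxon A", "Taxon B", "Taxon C"], {"Taxon B": "blue"}, "red")
print(codings)  # {'Taxon A': 'red', 'Taxon B': 'blue', 'Taxon C': 'red'}
```

Because the result holds only explicit codings, an exporter never needs to know that a default existed.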

The same seems to be true for describing uncertainty.  Kevin suggests
that there are only three "special states":

It's logically possible to code and I intend to code it but haven't gotten around to it yet (unfinished business)
It's logically possible to code and I intend not to do it (character scoped out)
It's logically impossible to code (inapplicable)

I only see two here:

It's logically possible to code but I haven't done it
It's logically impossible to code

PLUS a reason:

unfinished business
character scoped out
don't have specimens
+ a thousand other reasons.
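A minimal Python sketch of this two-part alternative, with names that are my own invention: a status with exactly the two cases above, plus a reason string kept outside the status so the list of reasons can grow freely. Kevin's three special states then collapse onto two statuses:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    """The only two cases a machine must distinguish (hypothetical names)."""
    NOT_CODED = "logically possible to code, but not done"
    INAPPLICABLE = "logically impossible to code"


@dataclass(frozen=True)
class Uncoded:
    """Why a cell has no state: a fixed status plus an open-ended reason."""
    status: Status
    reason: Optional[str] = None  # e.g. "don't have specimens"; may be absent


# Kevin's three special states expressed as status + reason (illustrative).
KEVIN_SPECIAL_STATES: Dict[str, Uncoded] = {
    "unfinished business": Uncoded(Status.NOT_CODED, "unfinished business"),
    "character scoped out": Uncoded(Status.NOT_CODED, "character scoped out"),
    "inapplicable": Uncoded(Status.INAPPLICABLE),
}
```

A new reason such as "don't have specimens" needs no new status; it is just another string.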

To overcome the "machine processing" problem you'll need to enumerate
this list and that's fine.  But don't build a structure that makes it
hard to change that enumeration at any time or as any set of users sees
fit.  The problem with "special states" is that it overloads "states",
which have nothing to do with the uncertainty of the data.  
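To keep the enumeration machine-processable yet easy to change, the list of reasons can live in data rather than in the structure of the records. A hypothetical sketch (the class and method names are assumptions, and "specimen damaged" is an invented example of a project-specific reason):

```python
from typing import Iterable, Set


class ReasonVocabulary:
    """An editable list of reason codes, stored as data (a sketch)."""

    def __init__(self, codes: Iterable[str]) -> None:
        self._codes: Set[str] = set(codes)

    def add(self, code: str) -> None:
        """Any group of users can extend the list without a format change."""
        self._codes.add(code)

    def is_known(self, code: str) -> bool:
        """Machine processing can still check a reason against the list."""
        return code in self._codes


reasons = ReasonVocabulary(
    ["unfinished business", "character scoped out", "don't have specimens"]
)
reasons.add("specimen damaged")  # hypothetical project-specific reason
```

A record citing a reason its reader doesn't know still parses, since the record only stores the string; the reader can simply report it as unrecognised.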

I think this is a priority issue.  It is FAR more important to know if
this character has been scored for this taxon than to know the reason
why it was or wasn't.  If you want to then tell me the reason, great.
But keep it simple and flexible.  This approach also separates data from
metadata along slightly different lines.  I consider the state to be data
and the reason for coding/not coding to be metadata, and I want to keep
these as separate as possible.
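The priority question can be answered from the data alone if the reasons sit in a separate table. A small sketch, with hypothetical taxon and character names and plain dictionaries standing in for whatever storage SDD would actually use:

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (taxon, character)

# Data: the state scored for each (taxon, character); absent means not scored.
states: Dict[Key, str] = {
    ("Taxon A", "wing colour"): "red",
}

# Metadata, kept apart: the reason a cell was or wasn't coded.
reasons: Dict[Key, str] = {
    ("Taxon B", "wing colour"): "don't have specimens",
}


def is_scored(taxon: str, character: str) -> bool:
    """The high-priority question, answered without touching the metadata."""
    return (taxon, character) in states


def reason_for(taxon: str, character: str) -> Optional[str]:
    """Optional extra detail; callers that don't care never need to look."""
    return reasons.get((taxon, character))
```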

Kevin's discussion of attaching uncertainty to states breaks Gregor's
SDD model more fundamentally than what I had proposed.  As Gregor
suggests, this would result in fundamental changes to the current SDD
model.  The use of "modifiers" might work, but again, they are fairly
tightly associated with natural-language representation (as I understand
it) and may not generalize well enough to fulfil Kevin's needs.

This leads to two thoughts:

First, it is crucial for SDD to be the product of a team with differing
views who work together towards some middle ground.  Otherwise it is
unlikely that it will meet the needs of the broader community.

Second, I worry about SDD becoming too specialised and too complex.  The
"joy" of the DELTA standard is its simplicity.  And this simplicity is,
I would say, one of the reasons it has been so successful.  I would
contrast this with the Nexus format.  Nexus is only seriously being used
by the 3 or 4 applications involved in its original development.  It is
hardly a "standard" outside this small community.  The power of Nexus is
that it supports essentially all of the features of these original
applications.  This seems to be the way SDD is heading: support all of
the features of character-based programs that currently exist.  My
concern is that if we do this we will be the only people who use SDD,
with new players finding the format so complex and so specialised that
it's expensive to implement and has little flexibility to support
"extensions" outside of the strict standard.  If this concern is
realised, we all lose.


Steve Shattuck
CSIRO Entomology
Steve.shattuck at csiro.au
