Thanks to both Kevin and Gregor for the comments on my comments.
Kevin is pretty much right that I see the main use of SDD as documenting essentially finished data rather than a "work in progress". My thinking is that most people will work on their dataset until it's finished, then make it available through SDD to other interested parties. My assumption is that most collaborative projects will select a single tool to use while building the dataset (be it DELTA, LucID, DeltaAccess or whatever) and will use the native format of this application to share data.
Now, I have no problem with capturing details needed by specific situations or applications and would say that these are important and should be supported by SDD. However, I would try to make them extensions to the basic, core data rather than part of the core itself.
The fundamental problem I see is making SDD truly extensible and yet generic at the same time. The big problem with the DELTA standard is that every time Mike needed new DELTA program functionality he extended the DELTA standard by adding new directives. This meant that if you wanted to support the DELTA standard you needed to follow Mike's lead. It's hardly a "standard" if it's being pushed by one group.
A specific example - "Use default state." We are looking hard at this in BioLink and will probably stay away from it, using different functionality to achieve the same thing (rapid data entry and global data changes). We obviously support this when importing from DELTA (and possibly from SDD) but it won't be core functionality for us and we won't export it. To enshrine this in SDD simply because DELTA had it (or some other application currently supports it) may not be the best approach (it may be, but I'm not willing to make that assumption unless we have to).
The same seems to be true for describing uncertainty. Kevin suggests that there are only three "special states":
It's logically possible to code and I intend to code it but haven't gotten around to it yet (unfinished business)
It's logically possible to code and I intend not to do it (character scoped out)
It's logically impossible to code (inapplicable)
I only see two here:
It's logically possible to code but I haven't done it
It's logically impossible to code
PLUS a reason:
unfinished business
character scoped out
don't have specimens
+ a thousand other reasons.
To overcome the "machine processing" problem you'll need to enumerate this list and that's fine. But don't build a structure that makes it hard to change that enumeration at any time or as any set of users sees fit. The problem with "special states" is that it overloads "states", which have nothing to do with the uncertainty of the data.
I think this is a priority issue. It is FAR more important to know if this character has been scored for this taxon than to know the reason why it was or wasn't. If you want to then tell me the reason, great. But keep it simple and flexible. Handling the reason separately also splits data from metadata along slightly different lines: I consider the state to be data and the reason for coding/not coding to be metadata, and I want to separate these as much as possible.
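To make the separation concrete, here is a rough sketch in Python of what I have in mind. It is not part of SDD or any existing application; the names (CodingStatus, Coding, KNOWN_REASONS, is_scored) and the example taxa are all made up for illustration. The status answers "has this been scored?", the states are the data, and the reason is optional metadata drawn from a list that users can extend at any time.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class CodingStatus(Enum):
    """Hypothetical status values: scored, or one of the two 'not coded' cases."""
    SCORED = "scored"
    NOT_SCORED = "not scored"      # logically possible to code, but not done
    INAPPLICABLE = "inapplicable"  # logically impossible to code


# Reasons are metadata, kept as an open vocabulary rather than fixed "special states".
# Any group of users can add to this set without changing the structure below.
KNOWN_REASONS = {"unfinished business", "character scoped out", "don't have specimens"}


@dataclass(frozen=True)
class Coding:
    """One taxon/character cell: states are data, reason is metadata."""
    taxon: str
    character: str
    status: CodingStatus
    states: Tuple[str, ...] = ()
    reason: Optional[str] = None

    def __post_init__(self) -> None:
        # A scored cell must carry at least one state; an unscored cell must carry none.
        if self.status is CodingStatus.SCORED and not self.states:
            raise ValueError("a scored coding needs at least one state")
        if self.status is not CodingStatus.SCORED and self.states:
            raise ValueError("an unscored coding must not carry states")


def is_scored(coding: Coding) -> bool:
    """The priority question: has this character been scored for this taxon?"""
    return coding.status is CodingStatus.SCORED


# Illustrative records only; the taxa and character are invented.
scored = Coding("Taxon A", "wing colour", CodingStatus.SCORED, states=("brown",))
pending = Coding("Taxon B", "wing colour", CodingStatus.NOT_SCORED,
                 reason="don't have specimens")

# Extending the enumeration of reasons does not touch the states at all.
KNOWN_REASONS.add("literature ambiguous")
```

An application that doesn't care about reasons can call is_scored() and ignore the reason field entirely, which is the point: the states are never overloaded with uncertainty information.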
Kevin's discussion of attaching uncertainty to states breaks Gregor's SDD model more fundamentally than what I had proposed. As Gregor suggests, this would result in fundamental changes to the current SDD model. The use of "modifiers" might work, but again, they are fairly tightly associated with natural language representation (as I understand it) and may not generalize well to fulfil Kevin's needs.
This leads to two thoughts:
First, it is crucial for SDD to be the product of a team with differing views who work together towards some middle ground. Otherwise it is unlikely that it will meet the needs of the broader community.
Second, I worry about SDD becoming too specialised and too complex. The "joy" of the DELTA standard is its simplicity. And this simplicity is, I would say, one of the reasons it has been so successful. I would contrast this with the Nexus format. Nexus is only seriously being used by the 3 or 4 applications involved in its original development. It is hardly a "standard" outside this small community. The power of Nexus is that it supports essentially all of the features of these original applications. This seems to be the way SDD is heading: support all of the features of character-based programs that currently exist. My concern would be that if we do this we will be the only people who use SDD, new players finding the format so complex and so specialised that it's expensive to implement and has little flexibility to support "extensions" outside of the strict standard. If this is true and is realised we all lose.
Steve
Steve Shattuck
CSIRO Entomology
Steve.shattuck@csiro.au