Hello,
My apologies to all list members for dropping out of the discussion for almost 5 months. I was overwhelmed with other work and, although I felt guilty, could not bring myself to pick this work up again. I have now finished that other work and have also just returned from vacation.
Steve Shattuck wrote:
Kevin is pretty much right that I see the main use of SDD as documenting essentially finished data rather than a "work in progress". My thinking is that most people will work on their dataset until it's finished, then make it available through SDD to other interested parties. My assumption is that most collaborative projects will select a single tool to use while building the dataset (be it DELTA, LucID, DeltaAccess or whatever) and will use the native format of this application to share data.
Now, I have no problem with capturing details needed by specific situations or applications and would say that these are important and should be supported by SDD. However, I would try to make them extensions to the basic, core data rather than part of the core itself.
I agree with your concern about complexity. It is already apparent that the existing complexity is difficult to document. We urgently need more introductory material.
However, I disagree with the notion of "finished datasets". I may be wrong, but I think that this is a thing of the past. The current practice of publishing scientific information in unchangeable batches is primarily a consequence of publishing printed materials, which are difficult to revise. I believe that in the future we will move towards continuously updated and revised versions. The interval between versions may be months or years, but not decades, as is currently the case.
When I think about collaboration, I am not thinking of 2 or 3 people working closely together, but of a loose network of collaborators. This also includes datasets being passed on from one generation to the next. Suppose a researcher created a very valuable dataset as a finished work and then retired. In this first step, it is irrelevant whether creation and intellectual property details are recorded in the SDD dataset. Now a new Ph.D. candidate revises this dataset, and the outcome is a dataset perhaps 5% improved and 95% original material. I believe it is undesirable to pass the entire dataset on to the next reviser without any indication of what happened. In this second step, I believe it is the task of the SDD exchange format to record some fundamental attribution of intellectual property.
I am, however, open to the argument that supporting this (web-updated data etc.) makes our task too complicated.
This leads to two thoughts:
First, it is crucial for SDD to be the product of a team with differing views who work together towards some middle ground. Otherwise it is unlikely that it will meet the needs of the broader community.
Second, I worry about SDD becoming too specialised and too complex. The "joy" of the DELTA standard is its simplicity. And this simplicity is, I would say, one of the reasons it has been so successful. I would contrast this with the Nexus format. Nexus is only seriously being used by the 3 or 4 applications involved in its original development. It is hardly a "standard" outside this small community. The power of Nexus is that it supports essentially all of the features of these original applications. This seems to be the way SDD is heading: support all of the features of character-based programs that currently exist. My concern is that if we do this, we will be the only people who use SDD, with new players finding the format so complex and so specialised that it is expensive to implement and offers little flexibility to support "extensions" outside of the strict standard. If this happens, we all lose.
Here I would maintain that we do use existing programs to help us identify the requirements, but that the presence of a feature in a program has never had any bearing on whether it should become a feature of SDD. Rather, the question has always been: Is there a need to exchange such data? Do these data describe scientific information, or management information that should be interoperable as metadata of the scientific process? Quite a number of DeltaAccess features are either NOT supported in SDD or are supported in a completely different way.
The concern about complexity is justified, however.
A call to everyone: Please do join the TDWG/SDD meeting in Lisbon, 21-26 October of this year, to help us find a good compromise!
(Please let me know by direct mail if you are likely to attend.)
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Koenigin-Luise-Str. 19          Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203
Often wrong but never in doubt!