Complexity of SDD format

27 Aug 2003

      Hello,

my apologies to all list members for dropping out of the discussion
for almost 5 months. I was overwhelmed with other work and got stuck
feeling guilty without picking this work up again. I am now back
from that other work and also just back from vacation.

Steve Shattuck wrote:
...
Kevin is pretty much right that I see the main use of SDD as
documenting essentially finished data rather than a "work in
progress".  My thinking is that most people will work on their dataset
until it's finished, then make it available through SDD to other
interested parties.  My assumption is that most collaborative projects
will select a single tool to use while building the dataset (be it
DELTA, LucID, DeltaAccess or what ever) and will use the native format
of this application to share data.
...
Now, I have no problem with capturing details needed by specific
situations or applications and would say that these are important and
should be supported by SDD.  However, I would try to make them
extensions to the basic, core data rather than part of the core
itself.

I agree with your concern about complexity. It is already apparent
that the existing complexity is difficult to document. We urgently
need more introductory material.

However, I disagree about the notion of "finished datasets". I may be
wrong, but I think that this is a thing of the past. The current
process of publishing scientific information in unchangeable batches
is primarily a function of the process of publishing printed
materials, which are difficult to revise. I believe that in the
future we will move to continuously updated and revised versions.
The granularity in terms of time may be months or years, but not
decades, as is currently the case.

When I think about collaboration in particular, I am thinking not
of 2 or 3 people working closely together, but of a loose network of
collaborators. This also means that datasets are passed on from
one generation to the next. If a researcher created a very
valuable dataset as a finished work and then retired, and a new Ph.D.
candidate revises this dataset, it will be irrelevant whether
creation and intellectual property details are in the SDD set.
However, the outcome of this will be a dataset perhaps 5% improved,
95% original material. Now I believe it is undesirable to pass the
entire dataset on to the next reviser without any indication of what
happened. In this second step, I believe it is the task of the SDD
exchange format to record some fundamental attribution of
intellectual property.

I am, however, receptive to the argument that caring for this (web
updated data etc.) makes our task too complicated.
...
This leads to two thoughts:
First, it is crucial for SDD to be the product of a team with
differing views who work together towards some middle ground.
Otherwise it is unlikely that it will meet the needs of the broader
community.
Second, I worry about SDD becoming too specialised and too complex.
The "joy" of the DELTA standard is its simplicity.  And this
simplicity is, I would say, one of the reasons it has been so
successful.  I would contrast this with the Nexus format.  Nexus is
only seriously being used by the 3 or 4 applications involved in its
original development.  It is hardly a "standard" outside this small
community.  The power of Nexus is that it supports essentially all of
the features of these original applications.  This seems to be the way
SDD is heading: support all of the features of character-based
programs that currently exist.  My concern would be that if we do this
we will be the only people who use SDD, new players finding the format
so complex and so specialised that it's expensive to implement and has
little flexibility to support "extensions" outside of the strict
standard.  If this is true and is realised we all lose.

Here I would maintain that we do use existing programs to help us
find the requirements, but that the presence of a feature in a
program has never had any bearing on whether it should become a
feature of SDD. Rather, the argument always is: Is there a need to
exchange such data? Do these data describe scientific information,
or management information that should be interoperable as metadata
of the scientific process? Quite a number of DeltaAccess features
are NOT supported in SDD, or are supported completely differently.

The concern about complexity is correct, however.

A call to everyone: Please do join the TDWG/SDD meeting in Lisbon,
21-26 October of this year, to help us find a good compromise!

(Please inform me by direct mail if you are likely to come.)

Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Koenigin-Luise-Str. 19          Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203

Often wrong but never in doubt!
