As an attempt to start a discussion, I am posting this to ask: do we all really agree that there should be a new standard for descriptive data based on XML, as a substitute for DELTA (as well as NEXUS and XDF)? Or should we instead just try to improve one of the existing formats?
From a purely technological perspective I'm convinced that XML is an
excellent mechanism for describing structured data - which is what much of DELTA is all about.
Standardising on something like XML means that custom parsers and parsing routines can be eliminated, making it much easier to write software to manipulate the information. I've already gone over these issues in my web page on XDELTA.
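To make the point about parsers concrete, here is a minimal sketch in Python that reads a small descriptive data set using nothing but the standard library's `xml.etree.ElementTree`. The element and attribute names (`dataset`, `character`, `item`, `value`) are hypothetical and are not taken from any actual XDELTA schema. The point is only that no format-specific parser has to be written.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment: element and attribute names are illustrative
# assumptions, not taken from any published XDELTA schema.
SAMPLE = """<dataset>
  <character id="1" type="integer">
    <label>Number of petals</label>
  </character>
  <character id="2" type="text">
    <label>Leaf shape</label>
  </character>
  <item name="Species A">
    <value character="1">5</value>
    <value character="2">ovate</value>
  </item>
</dataset>"""


def read_dataset(xml_text):
    """Return (characters, items) using only the standard-library XML parser."""
    root = ET.fromstring(xml_text)
    # Map character id -> human-readable label.
    characters = {c.get("id"): c.findtext("label") for c in root.iter("character")}
    # Map item name -> {character label: recorded value}.
    items = {}
    for item in root.iter("item"):
        items[item.get("name")] = {
            characters[v.get("character")]: v.text for v in item.iter("value")
        }
    return characters, items


characters, items = read_dataset(SAMPLE)
print(items["Species A"])  # {'Number of petals': '5', 'Leaf shape': 'ovate'}
```

The same file could be read just as easily from Java, C++ or any other language with an XML library, which is the portability argument in a nutshell.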
As for myself, when I first read the specifications of "XDELTA", I was under the impression that in some way the DELTA format as we know it would then become obsolete... But what about NEXUS, or XDF? Has anyone considered integrating these formats, together with DELTA, into a single new XML-based format for descriptive data?
:) I didn't mean to imply that, just that using XML would provide a better underlying data format.
The reason I've been holding fire on additional work is that I believe that a proper requirements analysis needs to be done to decide what data needs to be captured by the standard - does DELTA meet all the current requirements of taxonomists? Are there other standards (e.g. NEXUS, etc) which are better suited to some applications/data? Can these be integrated into a single new standard (or modular standard)?
As a computer scientist rather than a taxonomist I don't feel qualified to address these issues, as I'm not working with this data on a daily basis.
Perhaps we should start by building a list of requirements and then measuring DELTA, NEXUS, etc. against them to see whether they meet them. If they do, then there's no reason for change; if they don't, then it may be time for a new standard. XML is only one possible 'syntax' for such a standard, but it's one which has a lot of software support.
Note that when I refer to DELTA I refer to the data/file format specifically and not the application software. These should be evaluated separately.
So, what do people see as the basic requirements for this kind of format?
- ease of use (i.e. authoring)
- ease of processing (parsing, validating, reading, converting)
- ease of sharing (i.e. distribution)
- openness (i.e. proprietary vs. non-proprietary)
- ease of extensibility (i.e. ability to add more information cleanly at a later date)
- internationalization
- unambiguous data representation
- unlimited size of data sets? (i.e. any limitations on character names, lengths, item names, numbers, etc.)
What types of data need to be modelled by the format? (this can be a post-requirements gathering step, but some consideration needs to be given early on to measure the 'success' of the current formats)
- what types of characters? (real, integer, text, etc.)
- what types of data? (text, images, other formats?)
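As a rough illustration of how declared character types could support the "validating" and "un-ambiguity" requirements, the sketch below assumes a hypothetical `type` attribute on each character (`integer`, `real`, `text`, matching the examples above) and checks a raw value against it. The markup and the `load_types`/`check_value` helpers are assumptions for illustration only, not part of DELTA, NEXUS or XDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "type" attribute values mirror the character types
# mentioned in the discussion (real, integer, text); the names are assumptions.
CHARACTERS = """<characters>
  <character id="1" type="integer"><label>Number of petals</label></character>
  <character id="2" type="real"><label>Leaf length (mm)</label></character>
  <character id="3" type="text"><label>Notes</label></character>
</characters>"""

# Map each declared type to a converter; int() or float() raising ValueError
# means the value does not match its declared type.
CONVERTERS = {"integer": int, "real": float, "text": str}


def load_types(xml_text):
    """Return a mapping of character id -> declared type."""
    root = ET.fromstring(xml_text)
    return {c.get("id"): c.get("type") for c in root.iter("character")}


def check_value(types, character_id, raw):
    """Convert raw to the character's declared type, or raise ValueError."""
    char_type = types[character_id]
    converter = CONVERTERS.get(char_type)
    if converter is None:
        raise ValueError(f"unknown character type {char_type!r}")
    return converter(raw)


types = load_types(CHARACTERS)
print(check_value(types, "1", "5"))     # 5
print(check_value(types, "2", "12.5"))  # 12.5
```

A value such as "five" for the integer character raises `ValueError`. In a real standard this kind of check would more likely live in a schema than in hand-written code, but the principle is the same.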
Just some thoughts.
L.