As far as DELTA goes, I think that the data are fairly close to what taxonomists have been encoding into traditional taxonomic descriptions. 'Success' is pretty good.
Nexus can include character/state data but also other things, such as a way to describe phylogenetic trees in the so-called "New Hampshire" format:
((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700, seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201, weasel:18.87953):2.09460):3.87382,dog:25.46154);
If the basic format contains the relevant data, then a phylogenetic tree can be constructed from that data - so I'd have seen the above as something that should be derived from your data set rather than stored within it.
I assume "New Hampshire" = "Newick" format? If so, here's an XML equivalent.
<!ELEMENT label (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT branch (label?, branch*)>
<!ATTLIST branch length CDATA #IMPLIED>
<!ELEMENT newick (branch?)>
Just something I was toying with before.
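To make the mapping concrete, here is a rough Python sketch that reads a Newick string and emits elements following the DTD above (newick, branch, label, and the length attribute). The function name newick_to_xml is my own invention, and the parser assumes plain unquoted labels with no Newick comments, so treat it as an illustration rather than a complete reader.

```python
import xml.etree.ElementTree as ET


def newick_to_xml(text):
    """Convert a simple Newick string into a <newick> element.

    Assumptions (not from any spec discussed here): labels are unquoted,
    there are no [comments], and whitespace is insignificant.
    """
    text = "".join(text.split())  # drop all whitespace
    pos = 0

    def peek():
        # Current character, or "" once the input is exhausted.
        return text[pos] if pos < len(text) else ""

    def read_until(stops):
        # Consume characters up to (not including) the next stop character.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_branch():
        nonlocal pos
        branch = ET.Element("branch")
        children = []
        if peek() == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_branch())
                if peek() == ",":
                    pos += 1
                elif peek() == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("expected ',' or ')' at offset %d" % pos)
        label = read_until(",():;")
        if label:
            ET.SubElement(branch, "label").text = label
        if peek() == ":":
            pos += 1  # consume ":"
            branch.set("length", read_until(",();"))
        # The DTD puts label before the nested branches, so add them last.
        branch.extend(children)
        return branch

    root = ET.Element("newick")
    if text and text != ";":
        root.append(parse_branch())
    if peek() not in (";", ""):
        raise ValueError("unexpected text at offset %d" % pos)
    return root


tree = newick_to_xml(
    "((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700, "
    "seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201, "
    "weasel:18.87953):2.09460):3.87382,dog:25.46154);"
)
print(ET.tostring(tree, encoding="unicode"))
```

Branch lengths are kept as strings so that nothing is lost to floating-point rounding on the way through.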
I sense that application developers who use Nexus are pretty happy with the format since it was a collaborative effort to begin with, and that the Nexus community would not see a great benefit in adopting a new data format. Anybody else have this impression?
Given this, and other similar comments on the list, I'd suggest that an additional requirement of any new format (if that's the way things go) is that it provides backwards-compatibility, as far as possible, with other formats.
Granted this might well be lossy, but given the numbers of users of Nexus, and the amount of software currently available, making efforts to provide for data conversion between formats means that we don't have to start completely from scratch.
The first thing I did with XDELTA was provide a stylesheet which produces DELTA files. The same could be achieved for other formats.
For example, how could retrieval of descriptive information across a department or institution be facilitated? Is it because of the sheer flexibility in how characters can be defined using DELTA that querying across projects is difficult, to say the least, unless the character set is global (and therefore carries a lot of redundancy)?
I agree. XML only solves the syntax problem. The semantics problem can't be solved as easily with technology, but would rely either on community agreement, or on extensive mappings between various ontologies. I think.
Definitely, and this is an important point to make. XML is an enabler; it's not a magic bullet. What types of problems have people met in attempting to share characters between research efforts?
At a basic level, linking within and between documents and data is easy. It's defining the semantics of those links, and ensuring that the data retrieved is meaningful, that is hard.
This on the face of it would seem to map onto an XML element/attribute/value schema pretty well. Would that help define more closely how we construct characters and maybe even prove universally applicable for all character types?
Great question. Leigh? Anyone? :-)
Sure, element/attribute/value, but also element/element/content,
i.e. <leaf shape="obovate" /> or
<leaf> <shape>obovate</shape> </leaf>
Or did I miss something?
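For what it's worth, the two forms carry the same information as far as a consuming program is concerned. A small Python sketch, assuming a made-up helper called character_states and that a "state" is either an attribute value or the text of a childless sub-element:

```python
import xml.etree.ElementTree as ET


def character_states(xml_text):
    """Return (structure, {character: state}) for a single element.

    Hypothetical helper: states are read both from attributes and from
    child elements that contain only text.
    """
    elem = ET.fromstring(xml_text)
    states = dict(elem.attrib)
    for child in elem:
        has_text = child.text is not None and child.text.strip() != ""
        if len(child) == 0 and has_text:
            states[child.tag] = child.text.strip()
    return elem.tag, states


as_attribute = '<leaf shape="obovate" />'
as_element = "<leaf> <shape>obovate</shape> </leaf>"

# Both encodings reduce to the same structure/character/state triple.
assert character_states(as_attribute) == character_states(as_element)
print(character_states(as_attribute))
```

So the choice between attributes and sub-elements looks more like a matter of convention than of expressive power, at least for simple single-valued characters.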
Could one constrain further by expressing within the schema the hierarchical relationships between the elements (e.g. 'blade' and 'petiole' as child elements of 'leaf'), or would the introduction of terminology into the 'standard' be a step too far?
This would be The Lexicon, no? It does seem to go beyond the task of making a file format for data transfer. But it gets my vote nevertheless.
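As a rough illustration of what such a constraint might look like if it were kept outside the file format itself, here is a Python sketch. The ALLOWED_CHILDREN table and the hierarchy_errors function are entirely hypothetical; the table stands in for whatever lexicon the community might agree on.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon fragment: which structures may be nested in which.
ALLOWED_CHILDREN = {
    "leaf": {"blade", "petiole"},
    "blade": set(),
    "petiole": set(),
}


def hierarchy_errors(elem, allowed=ALLOWED_CHILDREN):
    """Return a list of messages for children that the lexicon does not permit."""
    errors = []
    for child in elem:
        if child.tag not in allowed.get(elem.tag, set()):
            errors.append("<%s> is not allowed inside <%s>" % (child.tag, elem.tag))
        # Keep descending so nested problems are reported as well.
        errors.extend(hierarchy_errors(child, allowed))
    return errors


good = ET.fromstring('<leaf><blade shape="obovate"/><petiole/></leaf>')
bad = ET.fromstring("<petiole><leaf/></petiole>")

print(hierarchy_errors(good))
print(hierarchy_errors(bad))
```

Keeping the table separate from the parser means the terminology could change without touching the transfer format.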
I'd suggest that something like this (expressing meta-data relationships amongst elements) could be layered on top of a basic data description format. RDF might provide the facility to do this effectively. I need to revisit the spec.
cheers,
L.