Re: so far...

24 Nov 1999

      ...
As far as Delta goes, I think that the data are fairly close to what
taxonomists have been encoding into traditional taxonomic descriptions.
'Success' is pretty good.
Nexus can include character/state data but also other things such as ways
to describe phylogenetic trees in the so-called "New Hampshire" format:
((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700,
  seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201,
  weasel:18.87953):2.09460):3.87382,dog:25.46154);
If the basic format contains the relevant data, then a phylogenetic tree
can be constructed from that data - so I'd have seen the above as
something that should be derived from your data set rather than stored
within it.

I assume "New Hampshire" = "Newick" format? If so, here's an XML equivalent.

<!ELEMENT label   (#PCDATA)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT branch (label?, branch*)>
<!ATTLIST branch length  CDATA    #IMPLIED>
<!ELEMENT newick  (branch?)>

Just something I was toying with before.
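
To make the mapping concrete, here is a rough sketch of how a Newick
string could be turned into a document shaped like the DTD above. It is
an illustration under assumptions, not a full Newick parser: the function
name `newick_to_xml` is made up for this example, and quoted labels and
bracketed comments are not handled.

```python
import xml.etree.ElementTree as ET


def newick_to_xml(text):
    """Convert a simple Newick string into a <newick> tree of <branch> elements."""
    # Drop all whitespace (including line breaks) and the trailing semicolon.
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def read_token():
        # Read characters up to the next structural delimiter.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:":
            pos += 1
        return s[start:pos]

    def parse_branch():
        nonlocal pos
        branch = ET.Element("branch")
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_branch())
            while pos < len(s) and s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_branch())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # consume ")"
        name = read_token()
        if name:
            # The DTD places the optional label before any child branches.
            ET.SubElement(branch, "label").text = name
        branch.extend(children)
        if pos < len(s) and s[pos] == ":":
            pos += 1  # consume ":"
            branch.set("length", read_token())
        return branch

    root = ET.Element("newick")
    if s:
        root.append(parse_branch())
    if pos != len(s):
        raise ValueError("unexpected text at position %d" % pos)
    return root
```

Calling `ET.tostring(newick_to_xml(tree_text), encoding="unicode")` on the
tree quoted earlier should give nested `<branch>` elements, with each
branch length carried in the `length` attribute.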
...
I sense that application developers who use Nexus are pretty happy with
the format since it was a collaborative effort to begin with, and that the
Nexus community would not see a great benefit in adopting a new data
format.  Anybody else have this impression?
Given this, and other similar comments on the list, I'd suggest that an
additional requirement of any new format (if that's the way things go)
is that it provides backwards compatibility, as far as possible, with
other formats.

Granted this might well be lossy, but given the numbers of users of Nexus,
and the amount of software currently available, making efforts to
provide for data conversion between formats means that we don't
have to start completely from scratch.

The first thing I did with XDELTA was provide a stylesheet to produce
DELTA files. The same could be achieved for other formats.
...
...
For example, how could retrieval of descriptive information across a
department or institution be facilitated? Is it because of the sheer
flexibility in how the characters can be defined using DELTA that
querying across projects is difficult, to say the least, unless the
character set is global (and therefore carries a lot of redundancy)?
I agree.  XML only solves the syntax problem.  The semantics problem can't
be solved as easily with technology, but would rely either on community
agreement, or on extensive mappings between various ontologies.  I think.
Definitely, and this is an important point to make. XML is an enabler;
it's not a magic bullet. What types of problems have people met in
attempting to share characters between research efforts?

At a basic level, linking within and between documents and data is
easy. It's defining the semantics of those links, and ensuring that the
data retrieved is meaningful, that is hard.
...
...
This on the face of it would seem to map onto an XML
element/attribute/value schema pretty well. Would that help define more
closely how we construct characters and maybe even prove universally
applicable for all character types?
Great question. Leigh? Anyone? :-)
Sure element/attribute/value, but also element/element/content

i.e. <leaf shape="obvate" />
or

<leaf>
  <shape>obvate</shape>
</leaf>

Or did I miss something?
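
As a quick check that the two styles carry the same information, here is
a small sketch that reads either form into the same character/state
mapping. The helper name `read_characters` is invented for this example,
and the state value is spelled exactly as in the snippets above.

```python
import xml.etree.ElementTree as ET


def read_characters(xml_text):
    """Return {character: state} for one element, whichever style it uses."""
    element = ET.fromstring(xml_text)
    # Attribute style: each attribute is a character, its value the state.
    characters = dict(element.attrib)
    # Element style: each child element is a character, its text the state.
    for child in element:
        characters[child.tag] = (child.text or "").strip()
    return characters


attribute_form = '<leaf shape="obvate" />'
element_form = "<leaf><shape>obvate</shape></leaf>"
```

Both `read_characters(attribute_form)` and `read_characters(element_form)`
give a mapping from `shape` to the same state, so for simple values the
choice is mostly one of style. The element form has more room to grow,
since a child element can itself carry attributes or nested content.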
...
...
Could one constrain further by expressing within the schema the
hierarchical relationships between the elements (e.g. 'blade' and
'petiole' as child elements of 'leaf'), or would the introduction of
terminology into the 'standard' be a step too far?
This would be The Lexicon, no?  It does seem to go beyond the task of
making a file format for data transfer.  But it gets my vote
nevertheless.
I'd suggest that something like this (expressing meta-data relationships
amongst elements) could be layered on top of a basic data description
format. RDF might provide the facility to do this effectively. I need
to revisit the spec.
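
To illustrate the layering idea only, here is a sketch that keeps the
part-of statements apart from the data and checks a document against
them. It uses plain Python tuples in a subject/predicate/object shape
rather than RDF itself; the `partOf` predicate, the `STATEMENTS` list and
the `misplaced_parts` helper are all hypothetical names made up for this
example.

```python
import xml.etree.ElementTree as ET

# Hypothetical (subject, predicate, object) statements, held separately
# from the descriptive data they constrain.
STATEMENTS = [
    ("blade", "partOf", "leaf"),
    ("petiole", "partOf", "leaf"),
]


def misplaced_parts(xml_text, statements=STATEMENTS):
    """List (child, parent) tag pairs that contradict the partOf statements."""
    expected_parent = {s: o for s, p, o in statements if p == "partOf"}
    problems = []
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        for child in parent:
            expected = expected_parent.get(child.tag)
            # Elements with no statement about them are left unconstrained.
            if expected is not None and expected != parent.tag:
                problems.append((child.tag, parent.tag))
    return problems
```

The data format stays free of terminology here: swapping in a different
set of statements changes what counts as well-placed without touching
the documents themselves.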

cheers,

L.

Leigh Dodds
