so far...

Wed Nov 24 09:46:12 CET 1999

> As far as Delta goes, I think that the data are fairly close to what
> taxonomists have been encoding into traditional taxonomic descriptions.
> 'Success' is pretty good.
>
> Nexus can include character/state data but also other things, such as ways
> to describe phylogenetic trees in the so-called "New Hampshire" format:
>
>   ((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700,
>   seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201,
>   weasel:18.87953):2.09460):3.87382,dog:25.46154);

If the basic format contains the relevant data, then a phylogenetic tree
can be constructed from that data, so I'd see the above as something
that should be derived from your data set rather than stored within it.

I assume "New Hampshire" = "Newick" format? If so, here's an XML equivalent:

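<!ELEMENT label   (#PCDATA)>
<!ELEMENT title   (#PCDATA)>
<!-- a branch without sub-branches is a leaf; its length is optional -->
<!ELEMENT branch  (label?, branch*)>
<!ATTLIST branch  length  CDATA  #IMPLIED>
<!-- the document element: an optional title and the root branch -->
<!ELEMENT newick  (title?, branch?)>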

Just something I was toying with before.
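To make the mapping concrete, here is a rough sketch in Python (standard-library xml.etree.ElementTree) that turns a Newick string such as the raccoon/bear tree above into elements following that DTD: each node becomes a <branch>, its name a <label>, and its ":length" the length attribute. The function name parse_newick is made up for illustration, and it assumes plain unquoted labels with no [comments].

```python
import xml.etree.ElementTree as ET


def parse_newick(text):
    """Parse a Newick string into a <newick> element (hypothetical helper).

    Handles nesting, leaf/internal names and branch lengths only; quoted
    labels and [comments] are not supported in this sketch.
    """
    s = "".join(text.split())  # drop all whitespace, including line breaks
    pos = 0

    def read_token():
        # Read a name or a length up to the next Newick punctuation mark.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:;":
            pos += 1
        return s[start:pos]

    def parse_branch():
        nonlocal pos
        branch = ET.Element("branch")
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume '('
            children.append(parse_branch())
            while pos < len(s) and s[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_branch())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
        name = read_token()
        if name:
            ET.SubElement(branch, "label").text = name
        branch.extend(children)  # the DTD wants <label> before sub-branches
        if pos < len(s) and s[pos] == ":":
            pos += 1  # consume ':'
            branch.set("length", read_token())
        return branch

    root = ET.Element("newick")
    root.append(parse_branch())
    if pos >= len(s) or s[pos] != ";":
        raise ValueError("a Newick tree must end with ';'")
    return root


tree = """((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700,
  seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201,
  weasel:18.87953):2.09460):3.87382,dog:25.46154);"""
doc = parse_newick(tree)
print(ET.tostring(doc, encoding="unicode"))
```

Lengths are kept as strings, since the DTD declares them CDATA, so a value like 0.84600 survives without losing its trailing zeros.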

> I sense that application developers who use Nexus are pretty happy with
> the format since it was a collaborative effort to begin with, and that the
> Nexus community would not see a great benefit in adopting a new data
> format.  Anybody else have this impression?

Given this, and other similar comments on the list, I'd suggest that an
additional requirement of any new format (if that's the way things go)
is that it provide backward compatibility with other formats, as far as
possible.

Granted, this might well be lossy, but given the number of Nexus users
and the amount of software currently available, making an effort to
support data conversion between formats means that we don't have to
start completely from scratch.

The first thing I did with XDELTA was provide a stylesheet that produces
DELTA files. The same could be done for other formats.
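As an illustration of going the other way for trees, here is a sketch that writes a document in the Newick DTD above back out as plain Newick. It uses Python's xml.etree.ElementTree instead of an XSLT stylesheet like the XDELTA one, and the function names are hypothetical. The conversion is lossy in the sense discussed: anything Newick has no slot for, such as a <title>, is simply dropped, and labels needing Newick quoting are not handled.

```python
import xml.etree.ElementTree as ET


def branch_to_newick(branch):
    """Serialise one <branch> element (and its sub-branches) as Newick text."""
    children = branch.findall("branch")
    text = ""
    if children:
        text = "(" + ",".join(branch_to_newick(c) for c in children) + ")"
    label = branch.findtext("label")  # None when there is no <label>
    if label:
        text += label
    length = branch.get("length")
    if length is not None:
        text += ":" + length
    return text


def xml_to_newick(xml_text):
    """Convert a <newick> document to a Newick string ending in ';'."""
    root = ET.fromstring(xml_text)
    top = root.find("branch")  # any <title> is ignored: Newick has no slot
    if top is None:
        return ";"  # no branch at all: nothing to write but the terminator
    return branch_to_newick(top) + ";"


sample = """
<newick>
  <branch>
    <branch length="0.84600">
      <branch length="19.19959"><label>raccoon</label></branch>
      <branch length="6.80041"><label>bear</label></branch>
    </branch>
    <branch length="25.46154"><label>dog</label></branch>
  </branch>
</newick>
"""
print(xml_to_newick(sample))
# -> ((raccoon:19.19959,bear:6.80041):0.84600,dog:25.46154);
```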

> > For example, how could retrieval of descriptive information across a
> > department or institution be facilitated? Is it because of the sheer
> > flexibility in how the characters can be defined using DELTA that
> > querying across projects is difficult, to say the least, unless the
> > character set is global (and therefore carries a lot of redundancy)?
>
> I agree.  XML only solves the syntax problem.  The semantics problem can't
> be solved as easily with technology, but would rely either on community
> agreement, or on extensive mappings between various ontologies.  I think.

Definitely, and this is an important point to make. XML is an enabler,
not a magic bullet. What kinds of problems have people run into when
trying to share characters between research efforts?

At a basic level, linking within and between documents and data is
easy. The hard part is defining the semantics of those links, and
ensuring that the data retrieved is meaningful.

> > This on the face of it would seem to map onto an XML
> > element/attribute/value schema pretty well. Would that help define more
> > closely how we construct characters and maybe even prove universally
> > applicable for all character types?
>
> Great question. Leigh? Anyone? :-)

Sure, element/attribute/value, but also element/element/content,
i.e.

  <leaf shape="obovate" />

or

  <leaf>
    <shape>obovate</shape>
  </leaf>

Or did I miss something?
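Either way the information is the same, so software can treat the two forms alike. A small Python sketch (hypothetical function character_states, standard-library ElementTree) that reads both into the same (structure, character, state) triple, on the assumption that an attribute or a text-only child element is a character and any other child is a nested structure, such as a blade inside a leaf:

```python
import xml.etree.ElementTree as ET


def character_states(elem, path=""):
    """Flatten an element into sorted (structure, character, state) triples.

    An attribute (shape="obovate") and a text-only child element
    (<shape>obovate</shape>) are both read as one character and its state;
    any other child is treated as a sub-structure and walked recursively.
    """
    here = f"{path}/{elem.tag}" if path else elem.tag
    triples = [(here, name, value) for name, value in elem.attrib.items()]
    for child in elem:
        text = (child.text or "").strip()
        if len(child) == 0 and not child.attrib and text:
            triples.append((here, child.tag, text))
        else:
            triples.extend(character_states(child, here))
    return sorted(triples)


as_attribute = ET.fromstring('<leaf shape="obovate" />')
as_element = ET.fromstring("<leaf><shape>obovate</shape></leaf>")
print(character_states(as_attribute) == character_states(as_element))  # True
```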

> > Could one constrain further by expressing within the schema the
> > hierarchical relationships between the elements (e.g. 'blade' and
> > 'petiole' as child elements of 'leaf'), or would the introduction of
> > terminology into the 'standard' be a step too far?
>
> This would be The Lexicon, no?  It does seem to go beyond the task of
> making a file format for data transfer.  But it gets my vote
> nevertheless.

I'd suggest that something like this (expressing meta-data relationships
amongst elements) could be layered on top of a basic data description
format. RDF might provide the facility to do this effectively. I need
to revisit the spec.
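RDF describes things as subject/predicate/object statements, which is one way such relationships could sit alongside the data rather than inside the DTD. A rough Python sketch with no RDF toolkit and made-up names (the partOf predicate, the LEXICON set, parts_of); real RDF would identify resources and properties by URI rather than bare strings.

```python
# Hypothetical lexicon, kept apart from the data format itself.
# Each statement is (subject, predicate, object), in the spirit of RDF.
LEXICON = {
    ("blade", "partOf", "leaf"),
    ("petiole", "partOf", "leaf"),
    ("midrib", "partOf", "blade"),
    ("leaf", "partOf", "shoot"),
}


def parts_of(whole, statements):
    """Return every structure that is, directly or indirectly, part of `whole`."""
    found = set()
    frontier = [whole]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in statements:
            if predicate == "partOf" and obj == current and subject not in found:
                found.add(subject)
                frontier.append(subject)  # look for parts of this part too
    return found


print(sorted(parts_of("leaf", LEXICON)))  # ['blade', 'midrib', 'petiole']
```

Because the lexicon is separate, the same data files could be checked against different terminologies without changing the underlying format.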

cheers,

L.