XDELTA and LucID

Fri Jan 7 10:34:43 CET 2000

Dear Leigh,

I've been looking through your XDELTA DTD. The group has already commented
that it's "merely" a literal mapping of DELTA, but it's a very useful start :-)

LucID can do a number of things that DELTA can't, and perhaps XDELTA as it
stands can't either. You should consider these (quotes are from your
xdelta.dtd document):

(forgive me if I've misinterpreted some things in the DTD - I'm new to these
things)

1. Character groups

><!-- A character group allows the grouping of related characters together.
>     DELTA does not enforce the grouping of characters, and neither does
>     XDELTA. However the CHARACTER HEADINGS directive seems to assume that
>     related characters will be grouped together within a DELTA file (as it
>     only specifies the first character that falls under each heading).
>     XDELTA therefore allows characters to be grouped in this way.

I assume that character groups here are equivalent to character sets in
LucID - groupings of characters so that you can e.g. display only leaf
characters in a random-access key. In LucID there is no requirement that
characters in a set are consecutive in the character list. Further, any
character may belong to more than one set - this is very useful so that e.g.
you can have the sets "leaves", "flowers", "petals", "simple" in which
"petal" characters also belong to the "flowers" set, and the "simple" set
comprises a selection of characters from all the other sets.
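
To make the difference concrete, here is a minimal sketch in Python of
overlapping, non-consecutive character sets. The character names, numbers and
function names are invented for illustration; this is not how LucID or XDELTA
actually stores anything.

```python
# Hypothetical character list; numbering and names are invented.
characters = {
    1: "leaf shape",
    2: "leaf margin",
    3: "flower colour",
    4: "petal number",
    5: "petal shape",
}

# Each set is just a collection of character numbers. Members need not be
# consecutive, and a character may appear in several sets.
character_sets = {
    "leaves": {1, 2},
    "flowers": {3, 4, 5},
    "petals": {4, 5},   # petal characters also belong to "flowers"
    "simple": {1, 3},   # a selection drawn from the other sets
}

def sets_containing(char_id):
    """Return the names of all sets that include the given character."""
    return sorted(name for name, members in character_sets.items()
                  if char_id in members)

def characters_in(set_name):
    """Return the character names in a set, in character-list order."""
    return [characters[c] for c in sorted(character_sets[set_name])]
```

Here sets_containing(4) gives both "flowers" and "petals", which a scheme
based only on consecutive runs under a heading could not express.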

2. Character state notes

DELTA stores all notes as plain text, sometimes marked up with its own
typesetting marks, but pretty plain nonetheless. LucID allows builders of
keys to associate a formatted file (txt, rtf, html) with a character state,
to hold explanatory notes. One option would be to include the notes as XML
markup in the XDELTA file itself (would this get messy if the notes included
embedded images, hotlinks etc.?). Another would be to treat state notes as
you do images, with an href
pointer to a file. LucID also allows sound files to be associated with a
state (imagine a key to cicadas or frogs with calls as characters).

3. Taxon multimedia

LucID allows any number of multimedia files (images, videos, sounds, txt,
rtf, html) to be associated with a taxon. Each file has a title for
display in a drop-down menu, e.g. the taxon "Polygalaceae" may have notes
files "Brief Description", "Full Description", "List of genera", "Ecology"
etc., images "Polygala japonica (habit)", "Polygala japonica (detail of
flower)", "Comesperma volubile", sound file "The wind blowing through leaves
of Polygala minuta" etc.

We store all these as separate files for the convenience of the builder and
speed of retrieval. I can see that for the notes it would perhaps be
possible to store everything in one xml file or block and use a style sheet
to give different views, but this may be awkward and limiting.
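
A rough sketch in Python of the "titled pointer to a separate file"
arrangement described in this section may help. The class, the "kind"
categories and all file paths are hypothetical; only the titles come from the
example above.

```python
from dataclasses import dataclass

@dataclass
class MediaFile:
    title: str  # text shown in the drop-down menu
    kind: str   # "notes", "image" or "sound" (hypothetical categories)
    href: str   # pointer to the separate file (hypothetical path)

# Any number of files per taxon, each kept as a separate file on disk.
taxon_media = {
    "Polygalaceae": [
        MediaFile("Brief Description", "notes", "polygalaceae/brief.html"),
        MediaFile("Full Description", "notes", "polygalaceae/full.html"),
        MediaFile("Polygala japonica (habit)", "image", "polygalaceae/pj_habit.jpg"),
        MediaFile("Comesperma volubile", "image", "polygalaceae/cv.jpg"),
    ],
}

def menu(taxon, kind):
    """Return the titles to list in the menu for one kind of file."""
    return [m.title for m in taxon_media.get(taxon, []) if m.kind == kind]
```

The point is only that the title is held separately from the file, so an
application can build the menu without opening any of the files.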

4. Images

><!-- An image. An image can have associated text (e.g. a title).

It's useful to allow an image to have both a title and associated text e.g.
notes, credits, copyright notice. Defining the title separately from the
text allows an application to retrieve and display just this without the rest.

5. Character values

LucID allows a richer assignation of character state values to taxa. Thus, a
taxon may be scored as having state 1 as its normal state, state 2 rarely,
state 3 by misinterpretation, and be unknown for state 4. The full list is:

normally present
rarely present
uncertain
present by misinterpretation
rarely present by misinterpretation

These are important. Note that the proposed extensions to DELTA incorporate
some or all of these. Gregor has already proposed a system of modifiers to
allow this.
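
As an illustration only, the scoring described above could be modelled like
this in Python. The taxon and character names are invented, the layout is not
LucID's internal format, and treating "unknown" for state 4 as the "uncertain"
score is my assumption.

```python
from enum import Enum

class Score(Enum):
    """The five score values listed above."""
    NORMALLY_PRESENT = "normally present"
    RARELY_PRESENT = "rarely present"
    UNCERTAIN = "uncertain"
    PRESENT_BY_MISINTERPRETATION = "present by misinterpretation"
    RARELY_PRESENT_BY_MISINTERPRETATION = "rarely present by misinterpretation"

# scores[taxon][character][state number] -> Score
# Hypothetical taxon and character; state 4 as UNCERTAIN is an assumption.
scores = {
    "Taxon A": {
        "flower colour": {
            1: Score.NORMALLY_PRESENT,
            2: Score.RARELY_PRESENT,
            3: Score.PRESENT_BY_MISINTERPRETATION,
            4: Score.UNCERTAIN,
        }
    }
}

def states_scored(taxon, character, *wanted):
    """Return the state numbers whose score for this taxon is in `wanted`."""
    by_state = scores.get(taxon, {}).get(character, {})
    return sorted(s for s, score in by_state.items() if score in wanted)
```

A format that records only "state present" for a taxon would collapse all of
these into one value, which is the information I'd like XDELTA not to lose.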

6. Taxon subkeys

In LucID keys one key may be linked as a child to another. A taxon thus has
the name of a subkey associated with it, so that a user on reaching the
taxon can ask the program to automatically drop to the subkey to continue
the identification.

Note that if the promise of a universal lexicon fails to materialize, we'll
need a way of mapping characters from one key to another, so that characters
can be passed from the parent key to the child.
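
The kind of mapping I have in mind could look something like the following
Python sketch. The character and state names, and the function itself, are
made up for illustration; nothing here reflects an existing LucID feature.

```python
# Hypothetical mapping from (parent character, parent state) to the
# equivalent (child character, child state) in the subkey.
parent_to_child = {
    ("leaf arrangement", "opposite"): ("leaves", "opposite"),
    ("leaf arrangement", "alternate"): ("leaves", "alternate"),
    ("flower colour", "white"): ("corolla colour", "white"),
}

def carry_over(parent_choices, mapping):
    """Translate the user's parent-key choices into child-key choices.

    Returns the translated choices plus the list of parent characters
    that have no counterpart in the child key.
    """
    child_choices = {}
    unmapped = []
    for character, state in parent_choices.items():
        target = mapping.get((character, state))
        if target is None:
            unmapped.append(character)
        else:
            child_character, child_state = target
            child_choices[child_character] = child_state
    return child_choices, unmapped
```

Characters with no counterpart are reported rather than silently dropped, so
the program could tell the user which choices did not carry over.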

7. A small point:

>     A multistate character must contain at least one state.

Surely a multistate character must contain at least two states.

8. General Information

Not all information in a key or other treatment can be associated with
character states or taxa. Some is associated with the treatment as a whole
e.g. introductory notes, notes on using the key, general notes for the
parent taxon of the key (e.g. a key to genera of Grossulariaceae may have a
description of the family) etc. In LucID this information may comprise html
files, images etc just as with taxa and states. This could be easily dealt
with by expanding your DTD. A special case is an "About this key" file. This
needs much more than is allowed in the document description:

-->
<!ELEMENT xdelta (description?, character-list?, item-list?)>
<!ATTLIST xdelta revised        CDATA #IMPLIED>

9. Collaborative projects

>TODO - its feasible that documents may grow quite lengthy with
>     large datasets and users may prefer to hold character and item
>     descriptions in separate documents. The item document can then
>     refer to a main document which describes the characters. This might
>     facilitate collaborative working as a centrally located document
>     describing common characters could be referenced by several teams.
>     In this case the semantics of importing other character lists should
>     be defined - i.e. how inter-document references are managed, and
>     how conflicting/over-riding character descriptions should be dealt
>     with by applications.

It will sometimes happen that several collaborators will share a common
character list and score different taxa, as you describe. But it will also
happen that the collaborators share a taxon list and score different
characters. This may be simple, but the same issues (interdocument
references, conflicts etc) need to be addressed.
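
To show what I mean by conflicts in the shared-taxon-list case, here is a
small Python sketch of merging score sheets from two teams. The data layout,
the names and the "first value wins" rule are all assumptions for the sake of
the example, not a proposal for how applications must behave.

```python
def merge_scores(*sheets):
    """Merge score sheets of the form {taxon: {character: state}}.

    Where two sheets disagree on a score, the earlier value is kept and
    the disagreement is recorded as (taxon, character, kept, rejected).
    """
    merged = {}
    conflicts = []
    for sheet in sheets:
        for taxon, chars in sheet.items():
            row = merged.setdefault(taxon, {})
            for character, state in chars.items():
                if character in row and row[character] != state:
                    conflicts.append((taxon, character, row[character], state))
                else:
                    row[character] = state
    return merged, conflicts

# Two hypothetical teams sharing a taxon list but scoring (mostly)
# different characters, with one accidental overlap.
team_a = {"Taxon A": {"leaf shape": "ovate", "flower colour": "white"}}
team_b = {"Taxon A": {"flower colour": "pink", "petal number": "5"}}
merged, conflicts = merge_scores(team_a, team_b)
```

With these two sheets the merged record for "Taxon A" has three characters,
and the disagreement over flower colour is listed in conflicts for someone to
resolve by hand.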

Cheers - k