XDELTA and LucID

Fri Jan 7 12:54:40 CET 2000

From a strictly practical point of view, multistate characters must be
allowed to function with just a single state.  In an ongoing study, a
character such as corolla pubescence is included in the character list.
After collecting data from one, or just a few taxa, you may only have a
single character state known.  As the research progresses, additional
character states will be discovered and added to the character list.  In
the preliminary stages of any taxonomic study the number of known
character states is usually very low, and must be allowed for!

When I first started using DELTA, my character list was peppered with "2.
FILLER/" so that I could get characters onto the character list.  Some
years back, Mike changed the DELTA software so that it will now accept a
multistate character with a single state.

My major concern with this process and its discussions is that the results
will be something that works and is practical.  The most computationally
perfect system may well turn out to be impractical and unworkable.  Some
years ago, I pointed out to someone that his program did not work because
my numerical data for one character, "1.5-2/3.5-7.8", could not be input
into his software because it would only accept "1.5-7.8".  I sincerely
hope that it is kept firmly in mind that the results have to work in the
real world, not the ideal world.
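
To make the numeric example concrete, here is a minimal sketch in Python of
an input routine that keeps "1.5-2/3.5-7.8" as two separate ranges instead
of collapsing it to "1.5-7.8". It is an illustration only, not how DELTA or
any other program actually does it. The function names parse_ranges and
contains are hypothetical, and the sketch assumes non-negative values, "-"
between the bounds of a range, and "/" between alternative ranges.

```python
def parse_ranges(text):
    """Parse 'low-high' ranges separated by '/' into (low, high) tuples.

    Assumption: values are non-negative, so '-' only ever separates bounds.
    A single value such as '3' is treated as the range 3-3.
    """
    ranges = []
    for part in text.split("/"):
        bounds = [float(piece) for piece in part.split("-")]
        if len(bounds) == 1:
            bounds = bounds * 2  # single value: low and high are the same
        if len(bounds) != 2 or bounds[0] > bounds[1]:
            raise ValueError("cannot read range: %r" % part)
        ranges.append((bounds[0], bounds[1]))
    return ranges


def contains(ranges, value):
    """Return True if value falls inside any one of the ranges."""
    return any(low <= value <= high for low, high in ranges)


disjoint = parse_ranges("1.5-2/3.5-7.8")
merged = parse_ranges("1.5-7.8")

# A measurement of 3.0 lies in the gap between the two real ranges,
# but a program that only stores 1.5-7.8 would wrongly accept it.
print(contains(disjoint, 3.0))
print(contains(merged, 3.0))
```

The point of keeping a list of ranges is that the gap between 2 and 3.5 is
itself data: merging the ranges throws it away.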

Joe K

Joseph H. Kirkbride, Jr.
USDA, Agricultural Research Service
Systematic Botany and Mycology Laboratory
Room 304, Building 011A, BARC-West
Beltsville, Maryland 20705-2350 USA
Voice telephone: 301-504-9447
FAX: 301-504-5810
Internet: jkirkbri at asrr.arsusda.gov

On Fri, 7 Jan 2000, Leigh Dodds wrote:

> > I've been looking through your XDELTA DTD. The group has already commented
> > that it's "merely" a literal mapping of DELTA, but it's a very
> > useful start :-)
>
> Thanks I was hoping it would be a discussion point.
>
> > 1. Character groups
> >
> > I assume that character groups here are equivalent to character sets in
> > LucID - groupings of characters so that you can e.g. display only leaf
> > characters in a random-access key. In LucID there is no requirement that
> > characters in a set are consecutive in the character list. Further, any
> > character may belong to more than one set - this is very useful
> > so that e.g.
> > you can have the sets "leaves", "flowers", "petals", "simple" in which
> > "petal" characters also belong to the "flowers" set, and the "simple" set
> > comprises a selection of characters from all the other sets.
>
> Yes you've interpreted correctly, and XDELTA is limited in that
> characters can only be within a single group, and must be grouped
> together.
>
> This did concern me when I was initially writing the DTD, but based on
> the fact that DELTA had the same limitation I thought it would be good
> enough for a first cut. LucID is obviously more flexible.
>
> > 2. Character state notes
> >
> > DELTA stores all notes as plain text, sometimes marked up with its own
> > typesetting marks, but pretty plain nonetheless. LucID allows builders of
> > keys to associate a formatted file (txt, rtf, html) with a
> > character state,
> > to hold explanatory notes. One option would be to include xml markup as
> > comment (would this be messy if the notes included embedded
> > images, hotlinks
> > etc). Another would be to treat state notes as you do images, with an href
> > pointer to a file. LucID also allows sound files to be associated with a
> > state (imagine a key to cicadas or frogs with calls as characters).
>
> How about if the text allowed (X)HTML markup to be used? This would
> allow the full range of HTML expression (tables, bold, images, etc).
> However it would mean that processors (e.g. a stylesheet) would need to
> understand HTML. I suggest that a subset of the available markup
> would be enough in most cases. A pointer to additional document(s)
> would also be useful.
>
> > 3. Taxon multimedia
> >
> > LucID allows any number of multimedia files (images, videos, sounds, txt,
> > rtf, html).
>
> Ideally an XML format would allow this as well. A generic 'link' to
> additional information could be used. It then depends on the application
> to decide whether it can fetch, and handle the particular
> multimedia format.
>
> > 4. Images
>
> > It's useful to allow an image to have both a title and associated
> > text e.g.
> > notes, credits, copyright notice. Defining the title separately from the
> > text allows an application to retrieve and display just this
> > without the rest.
>
> More generally I think that (from other comments on the list), it would
> be useful to attach this kind of information : notes, credit, copyright,
> origin, etc in multiple places in the format - including textual
> descriptions.
>
> > 5. Character values
> >
> > LucID allows a richer assignation of character state values to
> > taxa. Thus, a taxon may be scored as having state 1 as its normal state,
> > state 2 rarely, state 3 by misinterpretation, and be unknown for state 4.
> > The full list is:
> >
> > normally present
> > rarely present
> > uncertain
> > present by misinterpretation
> > rarely present by misinterpretation
> >
> > These are important. Note that the proposed extensions to DELTA
> > incorporate some or all of these. Gregor has already proposed a system of
> > modifiers to allow this.
>
> This sounds like a strong requirement then.
>
> >
> > 6. Taxon subkeys
> >
> > In LucID keys one key may be linked as a child to another. A
> > taxon thus has the name of a subkey associated with it, so that a user on
> > reaching the taxon can ask the program to automatically drop to the subkey
> > to continue the identification.
>
> If I understand this correctly you mean that the subkey provides additional
> information regarding the taxon. To take a trivial example, the first
> key may identify a plant to the Family level, whilst the subkey had
> additional information - e.g. to the species?
>
> If so, then this is related to the document linking problem below...
>
> > 7. A small point:
> >
> > >     A multistate character must contain at least one state.
> >
> > Surely a multistate character must contain at least two states.
>
> Yes, you're quite right! :)
>
> > 8. General Information
> >
> > Not all information in a key or other treatment can be associated with
> > character states or taxa. Some is associated with the treatment as a whole
> > e.g. introductory notes, notes on using the key, general notes for the
> > parent taxon of the key (e.g. a key to genera of Grossulariaceae
> > may have a description of the family) etc. In LucID this information may
> > comprise html files, images etc just as with taxa and states. This could
> > be easily dealt with by expanding your DTD. A special case is an "About
> > this key" file. This needs much more than is allowed in the document
> > description:
>
> This is one area that I don't think we've touched upon to a great extent
> (I could be wrong). Here's a question:
>
> - is it enough to just have an 'additional information' element
> so that any required explanatory notes, etc can be added there?
>
> Here's my answer (and I'd welcome comment):
>
> No, it's not. It would be useful to try to quantify this additional
> information, so that a generic catch-all placeholder isn't used. This
> sidesteps issues such as that shown by the comment marker in DELTA, which
> ends up being used for all sorts of other purposes. It's this kind of
> information which is highly useful both to the user and the application
> developer: the user has a defined place to add his/her information, and the
> developer has explicit cues on how particular sections of information should
> be used.
>
> So to start a running list:
>         - introduction
>         - about this key?
>         - parent taxon / related keys
>         - usage guidelines
>         - other meta information?
>                 (see above - copyright, funding body, author, change history, etc).
>
> Anything else?
>
> > It will sometimes happen that several collaborators will share a common
> > character list and score different taxa, as you describe. But it will also
> > happen that the collaborators share a taxon list and score different
> > characters. This may be simple, but the same issues (interdocument
> > references, conflicts etc) need to be addressed.
>
> I hadn't thought of this - it's the reverse of what I'd originally
> expected.
>
> I expect the referencing and conflict resolution mechanism to be
> tricky to solve, and I'm hoping we can delay some of those decisions
> until such time as a document structure has been determined - mainly
> because not all the issues will be obvious until that point.
>
> ------------------------------
>
> You've commented that you're just getting to grips with DTDs, is
> it going to cause a great problem if I design the next version of
> XDELTA using XML Schemas [1,2]? Schemas provide a great deal more
> flexibility when defining document structures, as well as validation (not
> that I expect the latter issue to matter to a huge degree in this
> context). There are some structures in the XDELTA DTD which I'm not
> particularly happy with, but which are imposed by the limitations of
> DTD syntax. I'm hoping that Schemas will remove some of these constraints.
>
> When I get chance to attempt another iteration of XDELTA (hopefully
> quite shortly), I can fully document the Schema to bring people
> up to speed - i.e. use XDELTA as a tutorial piece.
>
> As an aside - if anyone wishes additional explanation of the current
> DTD then I'm more than happy to oblige.
>
> I also don't want to appear to be leading everyone by the nose
> towards an XML based format, or one which I've produced myself.
> So please consider XDELTA, as I've always maintained, as a discussion
> point.
>
> L.
>