At 16:02 23/02/2000 +1100, Kevin Thiele wrote:
Dear list_eners
attached find SDDspecs.rtf. This is my attempt to formalise the discussion so far into a set of specifications for a new descriptive data standard.
and requested comments on his proposal. I've not had time to think this through in detail, but I'd nonetheless like to offer a few scattered observations.
1) One "pattern" that recurs in the document is the use of "attachments" to entities. These consist of a name, type, path, public notes, and private notes. Presumably one could define some sort of generic "attachment" object and avoid the multiple (4) redefinitions currently given in the outline. This would shorten the outline appreciably.
2) ID numbers crop up in a number of places. But it's not obvious how "unique" these various IDs need to be. For example, must character IDs be unique only within the context of the set of all character IDs, or of all IDs used anywhere within the treatment? Or perhaps they are intended to be used across treatments (to facilitate merging, etc.), and must be unique (but consistent) in an even broader sense? (But perhaps this is an area that is best left deliberately ambiguous for now.)
3) Character (and taxon) sets - these should probably be defined hierarchically, so that sets would be able to contain other sets, as well as the base elements (characters or taxa). Note that there seems to be another "up vs. down" problem here - is it better to define a set by listing the members within it, or for each of the members to list the sets to which it belongs?
4) I'm rather confused by footnote 3, regarding the nesting of character names, and the restriction of "properties" to only the lowest level. What is the reason for this restriction? But certainly there seems to be merit in separating the "properties" of a character from its textual representation. This is almost essential when attempting to generate natural-language descriptions in multiple languages. Similarly, different wordings may be appropriate in different application contexts (natural-language vs. interactive keys vs. conventional keys, or keys for the layman vs. the specialist).
5) This draft allows for a "score" only within the context of a "state name". It is not obvious how characters with non-discrete values (e.g. numeric values) would be handled.
6) I'm intrigued by the notion of a "Progressive Revelation model" (footnote 5). It sounds terribly theological - or perhaps that's Thiele-logical? (my apologies to Kevin, but I really can't resist bad puns).
7) For purposes of natural-language generation (and perhaps other applications), it is desirable to have some sort of "connection operator" between states within a character (e.g., "flowers blue or violet" vs. "flowers blue and violet" vs. "flowers blue to violet" all carry slightly different meanings). This and other requirements of generating natural-language descriptions might be an argument for generally preferring a "characters within taxa" representation to "taxa within characters".
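As a rough illustration of points 1, 3 and 7 above, the sketch below shows one way a generic "attachment" object, hierarchically nested character sets, and a "connection operator" between states might look. It is written in Python purely for convenience. None of the class, field, or function names come from the draft specification; they are all hypothetical, and the sketch takes the "set lists its members" side of the up vs. down question only because that is the simpler one to write down.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Attachment:
    """Generic attachment (point 1): defined once, reusable by any entity."""
    name: str
    type: str
    path: str
    public_notes: str = ""
    private_notes: str = ""


@dataclass
class Character:
    """A base element. The 'id' and 'attachments' fields are assumptions."""
    id: int
    name: str
    attachments: List[Attachment] = field(default_factory=list)


@dataclass
class CharacterSet:
    """Hierarchical set (point 3): members may be characters or other sets."""
    name: str
    members: List[Union["CharacterSet", Character]] = field(default_factory=list)

    def flatten(self) -> List[Character]:
        """Return every character in this set, at any depth of nesting."""
        result: List[Character] = []
        for member in self.members:
            if isinstance(member, CharacterSet):
                result.extend(member.flatten())
            else:
                result.append(member)
        return result


# Hypothetical set of connection operators (point 7).
CONNECTORS = ("or", "and", "to")


def describe(character_name: str, states: List[str], connector: str) -> str:
    """Join the states of one character with the given connection operator."""
    if connector not in CONNECTORS:
        raise ValueError(f"unknown connector: {connector!r}")
    return f"{character_name} " + f" {connector} ".join(states)


# Usage: a nested set, and the three wordings from point 7.
petal = Character(1, "petal colour",
                  [Attachment("petal photo", "image", "images/petal.jpg")])
leaf = Character(2, "leaf shape")
floral = CharacterSet("floral characters", [petal])
everything = CharacterSet("all characters", [floral, leaf])

print([c.name for c in everything.flatten()])
for op in CONNECTORS:
    print(describe("flowers", ["blue", "violet"], op))
```

Note that this sketch does not guard against a set that (directly or indirectly) contains itself; a real format would need either to forbid such cycles or to detect them.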
Cheers,
Eric Zurcher
CSIRO Division of Entomology
Canberra, Australia
E-mail: ericz@ento.csiro.au