- tdwg-content - lists.tdwg.org

Oh phooey... Re: [TDWG-SDD] Where are we?
by Peter Rauch 31 Jul '00

On Mon, 31 Jul 2000, Peter Rauch wrote:
> Send email to listserv(a)usobi.org with
> GETPOST TDWD-SDD 183
             X

Make that: GETPOST TDWG-SDD 183

Re: Where are we?
by Peter Rauch 31 Jul '00

On Tue, 1 Aug 2000, Kevin Thiele wrote:
>
> I put up a while ago an initial attempt at a proposal (the document DDST
> Specifications.doc, you all will have this buried in your attachments
> folder somewhere). Some people have looked at this, ...

Send email to listserv(a)usobi.org with GETPOST TDWD-SDD 183 to retrieve another copy of this Macintosh-created, MS Word formatted, base64-encoded-attachment document. :>)

(Can it be posted in plain-ascii? Maybe not everyone could view it?)

Peter

Re: characters/states and measurements and other hoary problems
by Stuart G. Poss 30 Jul '00

While I am philosophically most sympathetic to Peter's arguments, I must agree with Kevin with regard to the need to distinguish measurements as "characters" from qualitative characters. The bases of comparison for the former are typically fundamentally different from those used to assign states in the latter, and would be dependent not only on the exact definition [criteria for recognition] of the endpoints, which may differ systematically between investigators (or methods of measurement), but also on temporal scope (i.e. some endpoints may not be identifiable, except within a particular size range for a given structure in a given taxon). As pointed out by Kevin, they would also be dependent on sampling, although one would hope that most investigators are attuned to the need to increase sample sizes to avoid bias or mischaracterization.

With respect to measurements, the bases of comparison are geometrically constrained, whereas qualitative descriptions are not so constrained, even though both may be used to characterize the "same" features. Nonetheless, one could in some cases meaningfully tag measurements with some of the same tags as qualitative characters to indicate that they are proxies for the "same" features. This would allow data matrices conforming to these "generalized" features to be associated, and the user would then be able to assess whether the association is "sufficiently tight" to warrant regarding them as proxies for the "same" feature. The difficulty comes in consistently applying/defining the basis of comparison and in establishing criteria to recognize when constraints imposed on using measurements create inconsistency with various qualitative definitions of "state".

Perhaps attributes of a tag element could be used to indicate (hint at) different shades of meaning when measurements are associated with qualitative states, e.g.

    <leaf character><measurement character endpoint1="regionally localized" endpoint1_precision=">>5%" endpoint2="precisely localized" endpoint2_precision="<<1%">45.2</measurement character></leaf character>

or some such construction, where in this case <leaf character> may take the same tag name as might the qualitative character. However, we may also need to include additional refining elements, such as <data measured by>Peter Stevens</data measured by><data measured using>calipers</data measured using>, or some such, to fully relate fundamentally different data types or qualify the measurement context. In any event, such shades of meaning would require that "appropriate" context be established, and context could vary enormously depending on the features being evaluated.

In image analysis, one of the reasons image algebra was developed was to assist in standard characterization (transformation) of images and image fragments ("features"). However, this only works when there is a consistent basis for comparison (typically pixel size and pixel neighborhood) that underlies the sets being defined. Unfortunately (or perhaps fortunately), the human vision system is vastly more variable and complex.

In the near term, Kevin is probably correct to suggest that the onus is on those who would care to tackle the "hellish difficulties" of making such associations. It may be best to fork the dialog with regard to this point, which is not to say that we shouldn't seek a general approach that could be extended to include possible solutions to such "issues of association" at a later time (i.e. better establish a useful lexicon and grammar for <measurement character> elements, as opposed to <qualitative character> elements).
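
The tagging idea above is written as informal pseudo-markup (the tag names contain spaces, so an XML parser would reject them). As a minimal sketch only, here is one way the same information could be expressed as well-formed XML using Python's standard library. Every element and attribute name below (`character`, `measurement`, `endpoint`, `measuredBy`, `measuredUsing`, and so on) is a hypothetical choice made for illustration, not part of any proposal on this list.

```python
import xml.etree.ElementTree as ET


def build_measurement(feature, value, endpoints, measured_by, measured_using):
    """Wrap one measurement in a <character> element.

    All element and attribute names are hypothetical; they only mirror the
    idea in the message above in a form an XML parser will accept.
    """
    character = ET.Element("character", {"feature": feature, "kind": "measurement"})
    measurement = ET.SubElement(character, "measurement")
    # One <endpoint> per measured endpoint, carrying how well it is localized.
    for index, (localization, precision) in enumerate(endpoints, start=1):
        ET.SubElement(measurement, "endpoint", {
            "n": str(index),
            "localization": localization,
            "precision": precision,
        })
    ET.SubElement(measurement, "value").text = str(value)
    # Context that qualifies the measurement: who measured it, and with what.
    ET.SubElement(measurement, "measuredBy").text = measured_by
    ET.SubElement(measurement, "measuredUsing").text = measured_using
    return character


leaf = build_measurement(
    feature="leaf",
    value=45.2,
    endpoints=[("regionally localized", ">>5%"), ("precisely localized", "<<1%")],
    measured_by="Peter Stevens",
    measured_using="calipers",
)
xml_text = ET.tostring(leaf, encoding="unicode")
print(xml_text)
```

Because the `feature` attribute is shared, a qualitative character for the same feature could carry `feature="leaf"` with a different `kind`, which is one possible way to mark the two as proxies for the "same" feature.
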

Nonetheless, one of the unfortunate practices in systematics has been the relegation of measurement data to "notebooks" or, more recently, "floppy/CD disks" that are not properly archived in any coordinated fashion, except perhaps for the duration of a study or some unspecified time after that, and that are often at best only partially summarized in the original publication. The discipline could be much, much richer if there were a consistent standard for archival and association between measurement data and qualitative description, if for no other reason than to recognize the various contexts within which these data might be useful in the future, not to mention to encourage workers to let others know "exactly" what they did to arrive at their conclusions. Measurement data can be extremely useful in providing insight into how different investigators actually recognize the various states of their characters and may suggest why different investigators may not recognize the same states for the "same" characters.

Kevin Thiele wrote:
> At 10:23 AM 25/7/00 -0500, Peter Stevens & Stinger commented on the hoary
> issue of characters/states vs measurements:
>
> It's completely true that a set of measurements is a more fundamental data
> element than a character state. But we need to allow for both ways of
> recording data, for two reasons. Firstly, some characters don't lend
> themselves to measurement (e.g. indumentum - it would be possible but
> hellishly difficult). Secondly, if I'm writing a key to Families, I'm
> afraid I probably won't be recording measurements of individual specimens
> very often, as the data blowout would be alarming. And what specimens would
> I use? I see no way around the problem (for higher-order taxa at least) of
> doing the collation of data (measurements->states) in my head, with all the
> admitted problems that ensue.
>
> The draft standard document I've put to the group allows for some
> characters to be scored after a head-wise (hopefully wise, anyway)
> collation, others to be collated from individual measurements using some
> form of collation rule.
>
> As to data attribution, the draft standard document allows (I hope) all
> data elements to be attributed (to a specimen, taxon, literature citation,
> person etc), but doesn't force the issue.
>
> >Kevin:
> > >Or can we have a set of linked standards - one for describing in the boring
> > >old characters/states/taxa way, and others for the more space-shuttle ways
> > >that can be linked in as they develop.
> >
> >and Eric
> > >it is rather difficult to
> > >combine datasets based on differing character lists, even when those
> > >character lists are fairly similar. There is no mechanism for "mapping"
> > >character states from one dataset onto those of another.
> >
> >We are back to an issue that was raised earlier, so at the risk of boring
> >people by repetition, the way to record data is not (if at all possible)
> >as states, but as measurements. Measurements of the same character from
> >different studies may then be combinable, and states extracted from the
> >measurements. But if one is combining states, then one is likely to be in
> >trouble. The "approved shapes" that a committee of the Systematics
> >Association came up with some years back are fine for communication in
> >descriptions and conversations, but the best way to record shape data,
> >whether for phylogenetic analysis or deciding if one heap of specimens
> >really is distinct from another, is as measurements.
> >
> >(I agree that a list of "approved" characters is probably not attainable
> >at present.)
> >
> >This raises another issue that Eric mentioned:
> >
> > >Another aspect
> > >might be the inclusion of meta-data (e.g., who recorded this observation,
> > >and on what material?) While this sort of information can be placed into
> > >DELTA "text" characters and comments (as could virtually anything
> > >representable as a string of bytes, with a bit of effort), the use of text
> > >characters does not provide for well-structured access to the information.
> >
> >This issue of metadata is very important: if a DNA sequence is linked to a
> >voucher, then all measurements/observations should be linked to
> >specimens. In an "ideal" database, there would be links to specimens, if
> >only through a literature citation; not being able to make easy links in
> >the literature can make it difficult or impossible to use, and not having
> >such links in a database greatly reduces its value. If I go wrong in
> >including a specimen in a particular taxon, I want to be able to pull out
> >all information associated with this specimen, whether it is simply length
> >of leaf blade or how the anther wall develops, when I put it in its
> >correct place.
> >
> >Peter S.

Re: characters/states
by Kevin Thiele 29 Jul '00

At 10:23 AM 25/7/00 -0500, Peter Stevens & Stinger commented on the hoary issue of characters/states vs measurements:

It's completely true that a set of measurements is a more fundamental data element than a character state. But we need to allow for both ways of recording data, for two reasons. Firstly, some characters don't lend themselves to measurement (e.g. indumentum - it would be possible but hellishly difficult). Secondly, if I'm writing a key to Families, I'm afraid I probably won't be recording measurements of individual specimens very often, as the data blowout would be alarming. And what specimens would I use? I see no way around the problem (for higher-order taxa at least) of doing the collation of data (measurements->states) in my head, with all the admitted problems that ensue.

The draft standard document I've put to the group allows for some characters to be scored after a head-wise (hopefully wise, anyway) collation, others to be collated from individual measurements using some form of collation rule.

As to data attribution, the draft standard document allows (I hope) all data elements to be attributed (to a specimen, taxon, literature citation, person etc), but doesn't force the issue.

>Kevin:
> >Or can we have a set of linked standards - one for describing in the boring
> >old characters/states/taxa way, and others for the more space-shuttle ways
> >that can be linked in as they develop.
>
>and Eric
> >it is rather difficult to
> >combine datasets based on differing character lists, even when those
> >character lists are fairly similar. There is no mechanism for "mapping"
> >character states from one dataset onto those of another.
>
>We are back to an issue that was raised earlier, so at the risk of boring
>people by repetition, the way to record data is not (if at all possible)
>as states, but as measurements. Measurements of the same character from
>different studies may then be combinable, and states extracted from the
>measurements. But if one is combining states, then one is likely to be in
>trouble. The "approved shapes" that a committee of the Systematics
>Association came up with some years back are fine for communication in
>descriptions and conversations, but the best way to record shape data,
>whether for phylogenetic analysis or deciding if one heap of specimens
>really is distinct from another, is as measurements.
>
>(I agree that a list of "approved" characters is probably not attainable
>at present.)
>
>This raises another issue that Eric mentioned:
>
> >Another aspect
> >might be the inclusion of meta-data (e.g., who recorded this observation,
> >and on what material?) While this sort of information can be placed into
> >DELTA "text" characters and comments (as could virtually anything
> >representable as a string of bytes, with a bit of effort), the use of text
> >characters does not provide for well-structured access to the information.
>
>This issue of metadata is very important: if a DNA sequence is linked to a
>voucher, then all measurements/observations should be linked to
>specimens. In an "ideal" database, there would be links to specimens, if
>only through a literature citation; not being able to make easy links in
>the literature can make it difficult or impossible to use, and not having
>such links in a database greatly reduces its value. If I go wrong in
>including a specimen in a particular taxon, I want to be able to pull out
>all information associated with this specimen, whether it is simply length
>of leaf blade or how the anther wall develops, when I put it in its
>correct place.
>
>Peter S.
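
To make the phrase "collated from individual measurements using some form of collation rule" concrete, here is a minimal sketch. It is not taken from the draft standard: the state names, the millimetre thresholds, and the `Measurement` record are all assumptions chosen for illustration. It also keeps the specimen label with each measurement, so that every derived state stays attributed to the material it came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    specimen: str      # attribution, e.g. a voucher label (hypothetical)
    character: str
    value_mm: float


# Hypothetical collation rule: (exclusive upper bound in mm, state name).
LEAF_LENGTH_STATES = [(20.0, "short"), (60.0, "medium"), (float("inf"), "long")]


def state_for(value, rule):
    """Return the first state whose upper bound exceeds the value."""
    for upper, state in rule:
        if value < upper:
            return state
    raise ValueError(f"no state covers value {value!r}")


def collate(measurements, rule):
    """Map each observed state to the specimens that support it."""
    support = {}
    for m in measurements:
        support.setdefault(state_for(m.value_mm, rule), []).append(m.specimen)
    return support


observations = [
    Measurement("A", "leaf length", 15.0),
    Measurement("B", "leaf length", 45.0),
    Measurement("C", "leaf length", 52.5),
]
print(collate(observations, LEAF_LENGTH_STATES))
```

If a specimen later turns out to belong to a different taxon, the supporting-specimen lists show which states need to be re-collated, which is the kind of traceability Peter S. asks for in the quoted text.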

Converting Nexus data to DELTA
by Mike Dallwitz 28 Jul '00

> From: "Susan B. Farmer" <sfarmer(a)GOLDSWORD.COM>
> To: TDWG - Structure of Descriptive Data <TDWG-SDD(a)USOBI.ORG>

> As a person who went from DELTA to PAUP and tried to come back again
> (after much massaging of the data), I'd add that the transference needs
> to go both ways -- after you port your data to PAUP/MacClade and massage
> it somewhat, it would be nice to (relatively) painlessly come back to
> DELTA with those changes rather than having to begin from scratch and then
> combine the two data sets.

The program Nex2del, supplied with our DELTA programs, converts a Nexus data matrix to DELTA format.

Information will usually be lost in converting DELTA to Nexus, so it's usually better to edit the data in DELTA format and re-export to Nexus as necessary.

--
Mike Dallwitz
CSIRO Entomology, GPO Box 1700, Canberra ACT 2601, Australia
Phone: +61 2 6246 4075 Fax: +61 2 6246 4000
Email: md(a)ento.csiro.au Internet: biodiversity.uno.edu/delta/

Re: Space shuttles and bicycles
by Susan B. Farmer 27 Jul '00

>
>But Eric's comment reminds me that there is a STRONG reason for moving a
>data set between the various applications that deal with descriptive
>data: a single person might want to use DELTA, LucID, PAUP and MacClade in
>the same study. It would be "nice" to have the capability to create and
>maintain a single data set that could store and "serve" data to each
>application. If we could create the specification for that data set, I
>would judge this effort a success.
>

As a person who went from DELTA to PAUP and tried to come back again (after much massaging of the data), I'd add that the transference needs to go both ways -- after you port your data to PAUP/MacClade and massage it somewhat, it would be nice to (relatively) painlessly come back to DELTA with those changes rather than having to begin from scratch and then combine the two data sets.

Susan Farmer
sfarmer(a)goldsword.com
Botany Department, University of Tennessee
http://www.goldsword.com/sfarmer/Trillium

Re: SDD standard - kinds of description
by Susan B. Farmer 27 Jul '00

A plea from a list member with an HTML-challenged mail reader: please post plain text or (less desirably) attachments. I can't make heads or tails of this very easily.

Thanks,

Susan Farmer
sfarmer(a)goldsword.com
Botany Department, University of Tennessee
http://www.goldsword.com/sfarmer/Trillium

><html>
>At 08:10 AM 7/20/00 +1000, Kevin Thiele wrote:
><blockquote type=cite cite>At 11:56 PM 18/7/00 -0300, Bob Allkin
>wrote:</blockquote>
>[...]
>
><blockquote type=cite cite><blockquote type=cite cite>1) description of
>distribution (a hierarchical descriptor - with attributes
>attached to each substate such as native/introduced/etc)
>
remainder snipped ...

Re: SDD standard - purpose
by Mike Dallwitz 27 Jul '00

> From: Kevin Thiele <kevin.thiele(a)PI.CSIRO.AU>
> To: TDWG-SDD

> Initially, I think we should aim at descriptions and interactive ID. The
> idea of massaging one data file into two or more different products (e.g.
> natural-language and keys) is very attractive, but surprisingly
> problematical, since the structure of data needed for the two purposes is
> often subtly different. Of course, doing just this is the basis of the
> DELTA system, but we may need to do it in a more sophisticated way.
...
> The problems inherent in the multiple-product model become even more
> alarming when you try to maintain one data file for both
> description/identification and phylogenetic analysis. My personal view is
> that we should leave cladistics out of the scope at least for the time
> being.

INTRODUCTION

I think that it's not particularly difficult to accommodate description, identification, and phenetic or phylogenetic analysis in a single database. Perhaps some of the people who have done or taught it would care to comment.

By 'not particularly difficult' I don't mean that it's easy, particularly without the help of an experienced teacher. It's of comparable difficulty to many other aspects of professional work, for which we usually prepare by undertaking a degree course. Nor should we expect that advances in software will ever make it easy, in the sense that it could be done well without aptitude, training, thought, and experience. (In fact, software advances often make tasks _more_ difficult, as greater capabilities lead to higher expectations.)

In addition to the obvious benefits of making as much use as possible of laboriously acquired data, there can be valuable synergies between the different kinds of application. For example, even if the data are primarily for phylogenetic analysis, using them for description and identification can help detect errors. It is not unheard of for published work to contain gross errors (such as frame shifts caused by the accidental deletion of matrix elements) which could easily have been detected in this way. Also, the information-retrieval functions of Intkey can help in exploring patterns and relationships in the data.

DETAILED DISCUSSION - permission granted to stop reading here :-)

Within a given project, it's possible to define a 'universal' set of characters which are suitable for all applications. To these can be added characters designed for particular purposes, which are to be omitted for other purposes. For example, classification characters (e.g. the family to which a taxon belongs) and geographical distribution characters (what countries, states, etc. a taxon occurs in) are useful in description and identification, but would not normally be used in classification (for want of a better word, I will use this as an abbreviation for 'phenetic and phylogenetic analysis').

Sometimes it will be necessary to define alternative characters to represent similar concepts for different purposes. Obviously, efforts should be made to keep such alternative characters to a minimum. Software can help by combining character states, converting numeric characters to multistate, and checking the scoring of characters against relationships defined between them (not done by any current software as far as I know, except for the special case of character dependencies).

(An aside. While some 'identification' characters are unsuitable for classification, the converse is not true. To claim that a 'classification' character is not suitable for identification is tantamount to an admission that the author's scoring of the character is not reproducible by others. Of course, I am referring to interactive identification, using a program with a 'best characters' calculation fast enough to be used routinely, and supporting character weights.)

With a given set of characters, it may be necessary to record attributes (i.e. the cells of the taxa X characters 'matrix') in different ways for different purposes. For example, in LucID it is possible to flag state values as 'present by misinterpretation'. Values so flagged would normally be used for identification but not for description. Our proposed enhancements for the Delta format (see, for example, http://biodiversity.uno.edu/delta/www/descdata.htm) contain more general methods of flagging values for use in any number of user-defined applications. For example, consider the coding

    16,2/1<@only keys> 17,7<@only keys>-8.5-9<@for classification>-10-12<@only keys> 18,2<@for classification>/3 20,1/2<@not Australia>

For the application 'keys', this would be interpreted as

    16,2/1 17,7-12 18,2/3 20,1/2

and for the application 'classification Australia' as

    16,2 17,9 18,2 20,1

It is often necessary to use alternative _wordings_ of characters for different purposes. (This is different from the alternative character _concepts_ discussed above.) This arises:

(1) because of the different contexts in which the words are used;
(2) because of the different audiences for whom the words are intended (e.g. different native language, different knowledge of terminology).

The contexts in which the words appear range from full natural-language descriptions, which may contain all the characters in their natural order, supplemented by headings, to applications such as conventional keys in which the characters appear in random order, completely out of the context of their related characters. Intermediate cases are descriptions in which parts are omitted because of: missing data; inapplicable characters; inclusion of only a subset of the characters; or inclusion of only diagnostic attributes.
Other examples of random order are: lists of 'best' characters in interactive identification; displaying the attributes of a specimen in the order in which they were entered; displaying diagnostic descriptions in the order in which the characters were added.

Another requirement is an abbreviated form of the character for displaying in applications, such as interactive identification, where characters must be selected from a list. In DELTA, this is achieved by means of comments in the 'feature' line of a character. For example, with the character

    ...#10. leaves <presence>/
         1. present/
         2. absent/

Intkey would display 'leaves (presence)' in character-selection lists, but a natural-language description would read (say) 'leaves absent'.

It is often possible to meet the requirements of various contexts from a single character list, though doing so may require some compromise - the results for some purposes may not be optimal. For the best results, it may be necessary to have alternative wordings. In the past, we have accommodated this in DELTA simply by having separate character lists, and invoking the appropriate one for different applications. This is inefficient, because a large proportion of the words can usually be used in all applications. We therefore want to move towards a single list, with groups of words flagged for use in different applications or contexts. The same mechanism can be used for different languages, and also in other text such as character notes, and text in 'item' descriptions (text characters, and comments associated with attributes). The syntax suggested for this in our Web publications does not reflect our current preferences - I have not had time to update these documents.

--
Mike Dallwitz
CSIRO Entomology, GPO Box 1700, Canberra ACT 2601, Australia
Phone: +61 2 6246 4075 Fax: +61 2 6246 4000
Email: md(a)ento.csiro.au Internet: biodiversity.uno.edu/delta/
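
The 'leaves <presence>' example in this message can be sketched in a few lines. This is only an illustration of the behaviour described, not the actual logic of Intkey or Confor: the function names are hypothetical, and the pattern assumes simple, non-nested angle-bracket comments.

```python
import re

# Matches one angle-bracket comment plus any whitespace before it.
# Assumption: comments are not nested.
COMMENT = re.compile(r"\s*<([^<>]*)>")


def list_label(feature):
    """Label for a character-selection list: comments shown in parentheses."""
    return COMMENT.sub(lambda m: f" ({m.group(1)})", feature).strip()


def description_phrase(feature, state):
    """Natural-language phrase: comments dropped, state wording appended."""
    return f"{COMMENT.sub('', feature).strip()} {state}"


feature = "leaves <presence>"
print(list_label(feature))                    # label for a selection list
print(description_phrase(feature, "absent"))  # phrase for a description
```

The point of the design is that a single feature string carries both wordings, so the character list does not have to be duplicated for each context.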

A 'superset' data format
by Mike Dallwitz 26 Jul '00

> From: Stan Blum <sblum(a)CALACADEMY.ORG>
> To: TDWG-SDD

> There is a STRONG reason for moving a data set between the various
> applications that deal with descriptive data: a single person might want
> to use DELTA, LucID, PAUP and MacClade in the same study. It would be
> "nice" to have the capability to create and maintain a single data set
> that could store and "serve" data to each application.

This is the raison d'etre of DELTA. To quote from the Introduction in the User's Guide:

    When taxonomic descriptions are prepared for input to computer programs, the form of the coding is usually dictated by the requirements of a particular program or set of programs. This restricts the type of data that can be represented, and the number of other programs that can use the data. ... The DELTA (DEscription Language for TAxonomy) system was developed to overcome these problems. ... A format-conversion program, Confor, converts DELTA-format data into natural language, or into formats required by several other programs, including Key (generation of keys), Dist (generation of distance matrices), Paup, MacClade (Nexus), and Hennig86 (cladistic analysis), and Intkey (interactive identification and information retrieval).

In other words, DELTA was designed to be a single source for the descriptive data required by a variety of applications. It still fulfils this function better than other formats currently in use - see http://biodiversity.uno.edu/delta/www/compdata.htm.

> If we could create the specification for that data set, I would judge this
> effort a success.

Producing such a specification, i.e. for a new 'superset' data format, was one of the goals mentioned in the minutes of the last meeting of the TDWG-SDD group. I agree with Stan that it is the most important goal, and perhaps the discussion should start to focus more on it. Also, more progress might be made if the initial effort were directed to the less controversial and more achievable parts - perhaps a discussion and consolidation of the features used by existing programs. The above-mentioned document might be a starting point.

--
Mike Dallwitz
CSIRO Entomology, GPO Box 1700, Canberra ACT 2601, Australia
Phone: +61 2 6246 4075 Fax: +61 2 6246 4000
Email: md(a)ento.csiro.au Internet: biodiversity.uno.edu/delta/
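
The "single source, several products" idea can be illustrated with a toy data set that is rendered both as a Nexus DATA block and as a one-line description. This is a sketch only: it is not how Confor works, the in-memory layout (`CHARACTERS`, `TAXA`) is invented for the example, and the taxon and character names are placeholders. The `SYMBOLS="01"` setting is hard-coded because both example characters are binary.

```python
# Each character: (feature wording, list of state wordings).
CHARACTERS = [
    ("leaves", ["present", "absent"]),
    ("petals", ["free", "fused"]),
]
# Scores are 1-based state numbers; None means "not recorded".
TAXA = {
    "Taxon_A": [1, 2],
    "Taxon_B": [2, None],
}


def to_nexus(characters, taxa):
    """Render the scores as a minimal Nexus DATA block (state n -> symbol n-1)."""
    rows = []
    for name, scores in taxa.items():
        symbols = "".join("?" if s is None else str(s - 1) for s in scores)
        rows.append(f"    {name} {symbols}")
    return "\n".join([
        "#NEXUS",
        "BEGIN DATA;",
        f"    DIMENSIONS NTAX={len(taxa)} NCHAR={len(characters)};",
        '    FORMAT DATATYPE=STANDARD MISSING=? SYMBOLS="01";',
        "    MATRIX",
        *rows,
        "    ;",
        "END;",
    ])


def to_description(characters, taxa, name):
    """Render one taxon as a short natural-language description."""
    phrases = [
        f"{feature} {states[score - 1]}"
        for (feature, states), score in zip(characters, taxa[name])
        if score is not None
    ]
    return f"{name}: " + "; ".join(phrases) + "."


print(to_nexus(CHARACTERS, TAXA))
print(to_description(CHARACTERS, TAXA, "Taxon_A"))
```

Note that the matrix keeps only state symbols, while the state wordings survive only in the description. That is a small-scale view of why exporting to a matrix format tends to lose information that the richer source retains.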

Re: General reference schemes for classifying characters
by Stuart G. Poss 25 Jul '00

> At 11:29 AM 7/25/00 +1000, Eric Zurcher wrote:
> >5) Difficulty in merging or comparing datasets - it is rather difficult to
> >combine datasets based on differing character lists, even when those
> >character lists are fairly similar. There is no mechanism for "mapping"
> >character states from one dataset onto those of another. (Disparate
> >character lists are another matter entirely. My personal view is that the
> >holy grail of a "universal" character list for, say, all of botany will
> >tend to remain tantalizingly just out of reach, and the efforts of this
> >group should not be distracted in that direction.)
>

The issue of constructing mechanisms for mapping characters into one another is quite different from attempting to circumscribe a language necessary to describe all possible definitions of "character" so that rules governing their description can be unambiguously applied. One doesn't need "universal" characters, only a usefully large "universe" of potential character descriptors. Properties of some classes of characters will be wholly irrelevant to the characterization of others. I would agree it would be useless to search for such mechanisms that could be "universally applied".

Although I entirely agree with Peter, so few cladists actually measure the objects that they assign to states that one can safely conclude that matching "characters" by their underlying measurement data would likely only reference a very sparse matrix. Quantitative characters are fundamentally different from qualitative ones, in large part because qualitative characters are inherently more ambiguous and less constrained in terms of implicit notions of "equivalence". However, if one did want to associate features represented in widely different formats, it would be helpful to have a data standard that could describe the various kinds of associations that might be usefully employed and how they might be related.

As Stan points out, the measurements actually taken are often quite context sensitive and may even vary from one investigator to another or from one method of measurement to another. It would seem that one would need to have a good description of exactly how the measurements were taken, should one want either to repeat them or simply to understand what measures exist and what they tell us. Other than that, however, "caveat emptor".

Nonetheless, despite numerous problems of correspondence, it would be extremely useful to be able to use consistently applied metadata to tag two different kinds of data (characters), both taken on the same organisms and perhaps on the very same objects, so that they could be reevaluated. Whether the possible "remappings" could be made automatically would largely depend on how much ambiguity one could remove from the qualitative characterizations. In my estimation, at present this could only be done for an extremely small number of characters under special circumstances and consequently would not now be of much interest to most. However, newer data-rich acquisition techniques continue to expand the number of features for which human-induced ambiguity can be better excluded from the data.

Consequently, if our aim is to develop a standard protocol able to describe and relate such data, as well as to track and associate taxonomic data captured by more traditional methods, it would be worthwhile to have a means to do this and not leave such kinds of data out of the protocol because they do not readily conform to currently accepted paradigms (e.g. characters in states always specified by 0 or 1, my least favorite approach, etc.).

It seems to me we need protocols for describing how multiple kinds of characters can be referenced. I think we need a reference model that permits us to associate various kinds of metadata and not (necessarily) establish the equivalence of specific character definitions.
For example, it would be extremely helpful to be able to query the web oracle and ask "Please give me a list of all the characters (names and descriptors) associated with venation in taxa X, Y, and Z" or "What character systems have been used to establish the propinquity of [or simply describe] members of taxon A?" If our lexicon and grammar are inadequate for classifying some of these "characters", then the list will always be incomplete and the reference model will be of limited value.

> But Eric's comment reminds me that there is a STRONG reason for moving a
> data set between the various applications that deal with descriptive
> data: a single person might want to use DELTA, LucID, PAUP and MacClade in
> the same study. It would be "nice" to have the capability to create and
> maintain a single data set that could store and "serve" data to each
> application. If we could create the specification for that data set, I
> would judge this effort a success.
> -Stan

Obviously, it would be nice (probably essential) if we can then translate from different representations, but if this list is really only about translation among different existing data formats, then perhaps conceptualization for a more general reference model for representation of taxonomic data can be done elsewhere [ :( ].

Nonetheless, I really don't think that in many cases one would want to actually store the reference data in a common format and then serve it. Rather, because some forms of storage will be much more compact, and hence more efficient for specific tasks, there would often be a strong incentive to keep data in its "native" format. However, these representations would necessarily be more arcane and probably not human readable. Instead, I think it would more often be useful to consider how one could specify how the native format could be reformulated dynamically into a common "standard?" reference format and then retranslated into another native format.
XSLT is well suited for this purpose. If the "standard reference" involved "standard tagging" then use of XML could provide a nice human readable format that might exist "conceptually" or only briefly in memory during translation. This would require, however, understanding what kind of language (tags?) we might formulate to describe the common representations so that the translations are correct when made. This has three advantages that would be useful:

1) that the "view" of the data can be separated from its inherent logic,
2) that natives who like to use "native" languages need not become restless as a result of our efforts, and
3) various "native" formats might die out simply because other approaches are found more useful, or new formats could emerge, without impacting our ability to track and reference such data.

Of course, we need to keep in mind that in taxonomy and systematics, generally when we refer to character data, we are usually really referring to representations of data rather than the data itself (i.e. values ultimately derived from some repeatable measurement process). For those who shun phenetics, where characters have a quantitative meaning, the measurement process is largely implicit in the resultant conceptualizations ("states").

If translation is required, then one might at a suitable place in the descriptor structure/lexicon establish a means for specifying <character encoding="DELTA"> or <character encoding="LucID"> etc. Then let the processor handle the rules required by the transform, at least to the extent the inherent level of ambiguity allows. The question is then what information is needed by the processor to perform the specific requested transformation. Seems to me we require at a minimum:

1) tag(s) identifying the character,
2) specification(s?) of the character/data class (what kind of character/data is it?), and
3) a means of informing the processor what its various properties (i.e. states, ordering, values, etc.) are and how they are encoded.

Might detailed discussion of a few example characters permit us to better understand exactly how these specific (target?) implementations handle different kinds of characters and how they might be alternatively represented? This would give us a better idea of what kinds of "rules" we require and in what circumstances one kind of representation might be more or less ambiguous than another.

Stuart
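
The three minimum requirements listed in this message (an identifying tag, a character/data class, and the properties with their encoding) can be sketched as a small reference record that is first written to XML and then translated into one "native" form. Everything here is an assumption made for illustration: the `CharacterDescriptor` fields, the XML element and attribute names, and the class vocabulary are hypothetical, and a plain Python function stands in for the transform step that the message assigns to XSLT. The target layout follows the DELTA character-list example quoted elsewhere in this thread (`#10. leaves <presence>/ 1. present/ 2. absent/`).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class CharacterDescriptor:
    number: int                 # 1) tag identifying the character
    feature: str
    data_class: str             # 2) kind of character (hypothetical vocabulary)
    states: list = field(default_factory=list)   # 3) properties: state wordings


def to_reference_xml(char, encoding):
    """Build a hypothetical 'reference' element for one character."""
    element = ET.Element("character", {
        "id": str(char.number),
        "class": char.data_class,
        "encoding": encoding,
    })
    ET.SubElement(element, "feature").text = char.feature
    for n, wording in enumerate(char.states, start=1):
        ET.SubElement(element, "state", {"n": str(n)}).text = wording
    return element


def to_delta_entry(element):
    """Translate the reference element into a DELTA-style character-list entry."""
    parts = [f"#{element.get('id')}. {element.findtext('feature')}/"]
    parts += [f"{s.get('n')}. {s.text}/" for s in element.findall("state")]
    return " ".join(parts)


leaves = CharacterDescriptor(10, "leaves <presence>", "unordered multistate",
                             ["present", "absent"])
reference = to_reference_xml(leaves, encoding="DELTA")
print(ET.tostring(reference, encoding="unicode"))
print(to_delta_entry(reference))
```

A second translator for another target format would read the same reference element, which is the separation of "view" from "inherent logic" that the message argues for.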
