At 13:57 6/03/00 -0500, Peter Stevens wrote:
If the rules need to be formulated rather than just named, the challenge is to do it in such a way that an XML parser could do the collation, and a klutz taxonomist could create their own if an existing one is not sufficient.
There are many ways one can go from measurements to some sort of collation, and this is an area where there is a fair amount of activity - I would have thought that specifying rules was not the way to go.
By rule here I just mean a quasimathematical rule that the computer uses to effect the collation. Some rules should be common and straightforward, such as the "gather all values" rule, others may be specific to a treatment/builder. As I envisage it, you (Peter) could formulate your own rule for your own treatment to do the collation just as you want.
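To make the difference between named common rules and builder-formulated rules concrete, here is a minimal Python sketch. It assumes a rule is simply a function from the values gathered from lower-level items to the value recorded for the higher-level taxon. The registry `RULES`, the helper `register_rule` and the `range` rule are hypothetical, not part of any existing standard, and the numbers are made up.

```python
from typing import Callable, Dict, List, Tuple

# A collation rule turns the values gathered from lower-level items into
# the value recorded for the higher-level taxon.
CollationRule = Callable[[List[float]], object]


def gather_all_values(values: List[float]) -> List[float]:
    """The common 'gather all values' rule: keep every value, sorted."""
    return sorted(values)


def value_range(values: List[float]) -> Tuple[float, float]:
    """A hypothetical builder-specific rule: keep only the observed extremes."""
    return (min(values), max(values))


# Hypothetical registry: treatments refer to common rules by name, and a
# builder can add a rule of their own for their own treatment.
RULES: Dict[str, CollationRule] = {"gather-all-values": gather_all_values}


def register_rule(name: str, rule: CollationRule) -> None:
    RULES[name] = rule


def collate(rule_name: str, values: List[float]) -> object:
    return RULES[rule_name](values)


register_rule("range", value_range)
print(collate("gather-all-values", [12.0, 9.5, 11.0]))  # [9.5, 11.0, 12.0]
print(collate("range", [12.0, 9.5, 11.0]))              # (9.5, 12.0)
```

The point of the registry is that the "gather all values" rule can be referred to by name, while a rule specific to one treatment is just another entry added by its builder.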
If there's a lot of activity in this area, can you help with this bit?
- Collated Character source. I see these as essentially a drill-down mechanism that further identifies a 'bottom-level' taxon.
The necessary thing is to identify a path to the treatment that contains the data for collation, and the taxa of the lower-level treatment that will provide the data for a taxon in the current treatment. In the example above it's perhaps deceptively simple, because the target taxa in the lower-level treatment (in this case, the specimens) are nested within a taxon with the same name as the current taxon in the higher-level treatment (Macrolepiota clelandii or Lilium turkestanicum). This is perhaps the easiest way to do it (thinking aloud now): require that the taxon of interest exist somewhere in the taxon hierarchy in both treatments, then step one level down in the lower-level treatment to find the elements of interest.
Are you then going to have to specify all ranks in advance, so that one knows what is up or down? And how would one proceed if unranked naming does indeed become popular?
No, the system as I envisage it will be rank-free (I agree, we need to stay ahead of the game here), and "up" and "down" will be locally defined relative to the hierarchy as defined in the treatment. E.g.
Treatment 1: Key to Australian species of Rhamnaceae
    Rhamnaceae
        Colletieae
            Discaria
                Discaria nitida
                Discaria pubescens

Treatment 2: Morphometric data for Discaria specimens
    Discaria pubescens
        CANB4452885
        CANB4788710
    Discaria nitida
        CANB9633902
        CANB9644921
Treatment 1 here defines Treatment 2 as the source for some collated characters (e.g. leaf length) for the taxa of Discaria. When it's looking for data for Discaria nitida, it drills through Treatment 2 until it finds Discaria nitida in its hierarchy (in this case, at the top level). It then assumes that all items at the next level down (specimens here) are source items for collation.
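A minimal sketch of this drill-down, under stated assumptions: the class `Node` and the functions `find`, `source_items` and `collate_character` are hypothetical names, each treatment is held as a list of top-level items, and the leaf-length values attached to the Treatment 2 specimens are invented for illustration. Because the search only compares names and then steps one level down from the match, no ranks are needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One rank-free item in a treatment hierarchy (taxon, group or specimen)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    # Character scores recorded for this item, e.g. {"leaf length": 8.0}
    data: Dict[str, float] = field(default_factory=dict)


def find(node: Node, name: str) -> Optional[Node]:
    """Depth-first search for the item with the given name."""
    if node.name == name:
        return node
    for child in node.children:
        found = find(child, name)
        if found is not None:
            return found
    return None


def source_items(roots: List[Node], taxon: str) -> List[Node]:
    """Drill down to `taxon`, then step one level down: every child is a source item."""
    for root in roots:
        target = find(root, taxon)
        if target is not None:
            return target.children
    return []


def collate_character(roots: List[Node], taxon: str, character: str) -> List[float]:
    """Gather all values of `character` from the source items of `taxon`."""
    return sorted(item.data[character]
                  for item in source_items(roots, taxon)
                  if character in item.data)


# Treatment 2, with made-up leaf lengths on the specimens.
treatment2 = [
    Node("Discaria pubescens", [
        Node("CANB4452885", data={"leaf length": 12.0}),
        Node("CANB4788710", data={"leaf length": 15.5}),
    ]),
    Node("Discaria nitida", [
        Node("CANB9633902", data={"leaf length": 8.0}),
        Node("CANB9644921", data={"leaf length": 9.5}),
    ]),
]
print(collate_character(treatment2, "Discaria nitida", "leaf length"))  # [8.0, 9.5]
```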
Similarly, if another treatment were a key to genera of Rhamnaceae, it may use Treatment 2 as its source, drill down to Discaria and collate data from all items at the next level (in this case, species of Discaria).
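The same step can be repeated level by level. This sketch assumes a hypothetical lower-level hierarchy with Discaria above its species, and further assumes that each species' own values are in turn gathered from its specimens with the "gather all values" rule. `collate_levels` is an illustrative name and the measurements are made up.

```python
from typing import Dict, List, Union

# Hypothetical shape: an inner item maps child names to subtrees; a leaf
# (a specimen) holds the raw measurements recorded for it.
Item = Union[Dict[str, "Item"], List[float]]


def collate_levels(tree: Dict[str, Item]) -> Dict[str, List[float]]:
    """Apply the 'gather all values' rule at every level, bottom-up."""
    collated: Dict[str, List[float]] = {}

    def walk(name: str, item: Item) -> List[float]:
        if isinstance(item, list):          # a specimen: its own measurements
            values = sorted(item)
        else:                               # a taxon: everything below it
            values = sorted(v for child, sub in item.items() for v in walk(child, sub))
        collated[name] = values
        return values

    for name, item in tree.items():
        walk(name, item)
    return collated


lower = {"Discaria": {
    "Discaria nitida": {"CANB9633902": [8.0, 8.5], "CANB9644921": [9.5]},
    "Discaria pubescens": {"CANB4452885": [12.0]},
}}
levels = collate_levels(lower)
print(levels["Discaria nitida"])  # [8.0, 8.5, 9.5]
print(levels["Discaria"])         # [8.0, 8.5, 9.5, 12.0]
```

A genus-level key would then read `levels["Discaria"]`, while a species-level key reads the species entries from the same pass.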
The actual taxa in the hierarchies are rank-free and not constrained in their definition. The only constraint would be that since the "taxon" name is used for identification purposes it would need to be unique in a treatment and common to both treatments. "Taxa" in this sense may equate to formal taxa or they may be species groups, whatever.
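The naming constraint can be checked mechanically. A sketch, assuming each hierarchy is held as nested dictionaries of names (a hypothetical representation): `duplicated` reports names that break uniqueness within one treatment, and `linkable` tests whether a taxon name occurs exactly once in each of the two treatments.

```python
from collections import Counter
from typing import Dict, Iterator, List

# Hypothetical representation: each item name maps to a dict of its children.
Tree = Dict[str, "Tree"]


def names(tree: Tree) -> Iterator[str]:
    """Every item name in a treatment, at every level."""
    for name, children in tree.items():
        yield name
        yield from names(children)


def duplicated(tree: Tree) -> List[str]:
    """Names that occur more than once in a single treatment."""
    counts = Counter(names(tree))
    return sorted(name for name, n in counts.items() if n > 1)


def linkable(higher: Tree, lower: Tree, taxon: str) -> bool:
    """True only if `taxon` occurs exactly once in each of the two treatments."""
    return Counter(names(higher))[taxon] == 1 and Counter(names(lower))[taxon] == 1


treatment1 = {"Rhamnaceae": {"Colletieae": {"Discaria": {
    "Discaria nitida": {}, "Discaria pubescens": {}}}}}
treatment2 = {
    "Discaria pubescens": {"CANB4452885": {}, "CANB4788710": {}},
    "Discaria nitida": {"CANB9633902": {}, "CANB9644921": {}},
}
print(linkable(treatment1, treatment2, "Discaria nitida"))  # True
print(linkable(treatment1, treatment2, "Colletieae"))       # False: absent from treatment 2
```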
Note that I still don't know whether all this collation business is actually possible - I just think it's important to try something like this. I take my cues in this from Peter (who's pointed out the importance of transparently recording all the steps that lead to a "character" and its "score", since these often hide a multitude of sins) and Bernie Hyland (are you still out there Bernie?), who has built a key to rainforest trees of Australia recording every measurement on every specimen used along the way. We need a way of 1. recording all these data if they're available and 2. automating the process of collating these data to higher levels.
For instance, in the above example, if one of the specimens measured as Discaria nitida were redetermined to D. pubescens, that change could be made in the lowest-level treatment, and the higher-level treatments would automatically reflect the possible change in the "circumscription" of the taxa.
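A sketch of why the change would propagate: if the higher-level treatment keeps no copy of the collated values and recomputes them from the lowest level whenever asked, a redetermination only has to be recorded once. The helpers `redetermine` and `collated`, the dictionary layout, and the leaf lengths are hypothetical.

```python
from typing import Dict, List

# Hypothetical lowest-level treatment: taxon -> {specimen id: leaf length}.
Specimens = Dict[str, Dict[str, float]]


def redetermine(treatment: Specimens, specimen: str, old: str, new: str) -> None:
    """Record a redetermination once, in the lowest-level treatment."""
    treatment.setdefault(new, {})[specimen] = treatment[old].pop(specimen)


def collated(treatment: Specimens, taxon: str) -> List[float]:
    """The higher-level view is recomputed from current determinations,
    so it never holds a stale copy."""
    return sorted(treatment.get(taxon, {}).values())


lower = {
    "Discaria nitida": {"CANB9633902": 8.0, "CANB9644921": 9.5},
    "Discaria pubescens": {"CANB4452885": 12.0, "CANB4788710": 15.5},
}
redetermine(lower, "CANB9644921", "Discaria nitida", "Discaria pubescens")
print(collated(lower, "Discaria nitida"))     # [8.0]
print(collated(lower, "Discaria pubescens"))  # [9.5, 12.0, 15.5]
```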
- It's possible to provide a Character list internally to the treatment or reference an external one. Will it be a requirement that both can be used together (i.e. combining the internal and external lists)?
Yes. The way I see it, an external character list is a resource, but not a constraint. If one is referenced, a treatment builder should also be able to
- define their own internal characters, and
- perhaps pick and choose amongst the characters from the referenced list, rather than have to use them all.
I think such flexibility is necessary if lexica are to grow.
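One way the combination could work, sketched in Python: `build_characters` merges an optional external list (or only a picked subset of it) with the treatment's own internal characters. The function name, the dictionary layout, the sample definitions, and the choice that an internal definition overrides an external one of the same name are all assumptions, not anything the standard currently specifies. Passing no external list at all corresponds to opting out of a shared lexicon.

```python
from typing import Dict, Iterable, Optional

# Hypothetical: a character list maps a character name to its definition.
CharacterList = Dict[str, str]


def build_characters(internal: CharacterList,
                     external: Optional[CharacterList] = None,
                     pick: Optional[Iterable[str]] = None) -> CharacterList:
    """Combine a referenced (external) list with the treatment's own characters.

    `pick` selects a subset of the external list; None means use all of it.
    Assumption: an internal definition wins when a name appears in both.
    """
    chosen: CharacterList = {}
    if external is not None:
        names = external.keys() if pick is None else pick
        for name in names:
            chosen[name] = external[name]  # KeyError if a picked name is absent
    chosen.update(internal)
    return chosen


external = {
    "leaf length": "length of lamina",
    "leaf width": "width of lamina at widest point",
    "petal number": "number of petals per flower",
}
internal = {"distance to widest point": "from lamina base to widest point"}

chars = build_characters(internal, external, pick=["leaf length", "leaf width"])
print(list(chars))  # ['leaf length', 'leaf width', 'distance to widest point']
```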
This issue of character lists in advance gives me the heebie-jeebies. Perhaps I am confusing the issue, but to a certain extent one could argue that characters flow out of/depend on observations/measurements, so would a character list drawn up in advance be unduly constraining? I am thinking of characters like helobial/cellular/nuclear endosperm (which can and should be decomposed), embryo development, anther wall development, etc.
I agree Peter - I'm worried by predefined characters also. That's why I've been arguing against building lexica into the standard. BUT, it seems to me that if one is to proceed with these treatments, there need to be a priori defined characters. I don't know how to let characters define themselves from (flow out of) the observations/measurements.
I've suggested the structure of remote and local character lists to allow people to share resources. If someone creates a list of characters for Rhamnaceae and someone else finds that much of that list will be just the ticket for Vitaceae, then they should be able to share it. Perhaps global lexica will progressively grow from this, perhaps not. There seems to be such enthusiasm for lexica that we need to allow for their possibility, but with a bottom-up rather than top-down approach to their creation.
Conversely, if other treatment builders want to opt out of constraining lexica altogether, they should be allowed to do so also.
My understanding of your problem (I may be wrong) is that you have difficulty with people using qualitative characters as handy (but dangerous) shortcuts for characters that are in reality quantitative. There are no Platonic ideals in the qualitative world, but the quantitative world gets closer. Thus, "crassinucellate" is really an abstraction of a set of measurements. I may argue that it's a handy abstraction, you'd argue that it's a dangerous abstraction, and we'd both be right (this may be a bad example as we'd probably both argue for the danger in crassinucellate, but you get my drift).
As the standard stands, I could use Leaves: ovate as a character state in one treatment, and you could use leaf length, width and distance to widest point in yours (or some more sophisticated mathematical descriptor). Can we do any better?
Cheers - k