Re: SDD Specifications Document

7 Mar 2000

      At 13:57 6/03/00 -0500, Peter Stevens wrote:
...
...
If the rules need to be formulated rather than just named, the challenge is
to do it in such a way that an XML parser could do the collation, and a
klutz taxonomist could create their own if an existing one is not sufficient.
There are many ways one can go from measurements to some sort of collation,
and this is an area where there is a fair amount of activity - I would have
thought that specifying rules was not the way to go.
By rule here I just mean a quasimathematical rule that the computer uses to
effect the collation. Some rules should be common and straightforward, such
as the "gather all values" rule, others may be specific to a
treatment/builder. As I envisage it, you (Peter) could formulate your own
rule for your own treatment to do the collation just as you want.
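
As a rough illustration of what "a rule" could mean here, the following is a minimal Python sketch and not part of any draft specification. The registry RULES, the function names, and the "range" rule are all hypothetical; only the "gather all values" rule is named above.

```python
from typing import Callable, Dict, List

# A "rule" is modelled as a function from lower-level values to one collated value.
Rule = Callable[[List[float]], object]

def gather_all_values(values: List[float]) -> List[float]:
    """The common, straightforward rule: keep every value (sorted for readability)."""
    return sorted(values)

def value_range(values: List[float]) -> tuple:
    """An example of a builder-specific rule: summarise the values as (min, max)."""
    return (min(values), max(values))

# Hypothetical registry so that a treatment can refer to a rule by name.
RULES: Dict[str, Rule] = {
    "gather all values": gather_all_values,
    "range": value_range,
}

def collate(rule_name: str, values: List[float]) -> object:
    """Apply the named rule to the values supplied by a lower-level treatment."""
    return RULES[rule_name](values)
```

A builder wanting a different collation would add their own function to the registry under a new name, so the rule is formulated rather than merely named.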

If there's a lot of activity in this area, can you help with this bit?
...
...
...
2. Collated Character source. I see these as essentially
a drill down mechanism that further identifies a 'bottom-level'
taxon.
The necessary thing is to identify a path to the treatment that contains the
data for collation, and the taxa of the lower-level treatment that will
provide the data for a taxon in the current treatment. In the example above
it's perhaps deceptively simple, because the target taxa in the lower-level
treatment (in this case, the specimens) are nested within a taxon with the
same name as the current taxon in the higher-level treatment (Macrolepiota
clelandii or Lilium turkestanicum). This is perhaps the easiest way to do it
(thinking aloud now) - require that the taxon of interest exist somewhere in
the taxon hierarchy in both treatments, then step one down in the
lower-level treatment to find the elements of interest.
Are you then going to have to specify all ranks in advance, so that one
knows what is up or down?  And how would one proceed if unranked naming
does indeed become popular?
No, the system as I envisage it will be rank-free (I agree, we need to stay
ahead of the game here), and "up" and "down" will be locally defined
relative to the hierarchy as defined in the treatment. E.g.

Treatment 1: Key to Australian species of Rhamnaceae
Rhamnaceae
        Colletieae
                Discaria
                        Discaria nitida
                        Discaria pubescens

Treatment 2: Morphometric data for Discaria specimens
Discaria pubescens
        CANB4452885
        CANB4788710
Discaria nitida
        CANB9633902
        CANB9644921

Treatment 1 here defines Treatment 2 as the source for some collated
characters (e.g. leaf length) for the taxa of Discaria. When it's looking
for data for Discaria nitida it drills through Treatment 2 until it finds
Discaria nitida in its hierarchy (in this case, at the top level). It then
assumes that all items at the next level down (specimens here) are source
items for collation.

Similarly, if another treatment were a key to genera of Rhamnaceae, it may
use Treatment 2 as its source, drill down to Discaria and collate data from
all items at the next level (in this case, species of Discaria).
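
The drill-down described above can be sketched as follows. This is only an illustration under assumptions: the nested-dictionary representation and the function find_children are hypothetical, and the genus-level case assumes a variant of Treatment 2 in which the species are nested under a Discaria node, which the listing above does not show.

```python
from typing import Dict, List, Optional

# A rank-free hierarchy: each name maps to its children (an empty dict is a leaf).
Hierarchy = Dict[str, "Hierarchy"]

# Treatment 2 as listed above.
TREATMENT_2: Hierarchy = {
    "Discaria pubescens": {"CANB4452885": {}, "CANB4788710": {}},
    "Discaria nitida": {"CANB9633902": {}, "CANB9644921": {}},
}

def find_children(hierarchy: Hierarchy, name: str) -> Optional[List[str]]:
    """Drill through the hierarchy until `name` is found, then return the
    items one level below it. Returns None if the name is absent."""
    for item, children in hierarchy.items():
        if item == name:
            return list(children)
        found = find_children(children, name)
        if found is not None:
            return found
    return None

# Species-level treatment: the source items are specimens.
specimens = find_children(TREATMENT_2, "Discaria nitida")

# Genus-level treatment, assuming the species sit under a Discaria node.
species = find_children({"Discaria": TREATMENT_2}, "Discaria")
```

No ranks are consulted anywhere: "down" just means the children of the matched name in that particular treatment.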

The actual taxa in the hierarchies are rank-free and not constrained in
their definition. The only constraint would be that since the "taxon" name
is used for identification purposes it would need to be unique in a
treatment and common to both treatments. "Taxa" in this sense may equate to
formal taxa or they may be species groups, whatever.

Note that I still don't know whether all this collation business is actually
possible - I just think it's important to try something like this. I take
my cues in this from Peter (who's pointed out the importance of
transparently recording all the steps that lead to a "character" and its
"score" since these often hide a multitude of sins) and Bernie Hyland (are
you still out there Bernie?), who has built a key to rainforest trees of
Australia recording every measurement on every specimen used along the way.
We need a way of 1. recording all these data if they're available and 2.
automating the process of collating these data to higher levels.

For instance, in the above example, if one of the specimens measured as
Discaria nitida were redetermined to D. pubescens, that change could be
made at the lowest level treatment, then the higher-level treatments would
automatically reflect the possible change in the "circumscription" of the taxa.
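
A small sketch of that redetermination scenario, under stated assumptions: the leaf-length measurements are invented purely for illustration, and the record layout and the (min, max) collation are hypothetical choices, not anything fixed by the draft.

```python
from collections import defaultdict

# Lowest-level treatment: each specimen carries its current determination and
# a leaf length in mm. The measurements are made up for this example.
specimens = {
    "CANB4452885": {"taxon": "Discaria pubescens", "leaf_length": 4.0},
    "CANB4788710": {"taxon": "Discaria pubescens", "leaf_length": 6.0},
    "CANB9633902": {"taxon": "Discaria nitida", "leaf_length": 10.0},
    "CANB9644921": {"taxon": "Discaria nitida", "leaf_length": 14.0},
}

def collate_leaf_length(records):
    """Group specimen measurements by determination and report (min, max)."""
    grouped = defaultdict(list)
    for record in records.values():
        grouped[record["taxon"]].append(record["leaf_length"])
    return {taxon: (min(v), max(v)) for taxon, v in grouped.items()}

before = collate_leaf_length(specimens)

# Redetermine one specimen at the lowest level only...
specimens["CANB9633902"]["taxon"] = "Discaria pubescens"

# ...and the collated values for both taxa change when recomputed.
after = collate_leaf_length(specimens)
```

With these invented numbers, the collated leaf length for D. pubescens widens from (4.0, 6.0) to (4.0, 10.0) and that for D. nitida narrows to (14.0, 14.0), without the higher-level treatment being edited.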
...
...
...
4. It's possible to provide a Character list internally to the
treatment or reference an external one. Will it be a requirement
that both could be used (i.e. combine the internal and external lists)?
Yes. The way I see it an external character list is a resource, but not a
constraint. If one is referenced a treatment builder should also be able to
1. define their own internal characters,
2. perhaps pick and choose amongst the characters from the referenced list,
rather than have to use them all.
I think such flexibility is necessary if lexica are to grow.
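
To make the "resource, not constraint" idea concrete, here is a minimal sketch. Everything in it is hypothetical: the function build_character_list, the example character names and states, and the choice that an internal definition overrides an external one of the same name.

```python
from typing import Dict, Iterable, List

# A character list maps a character name to its allowed states.
CharacterList = Dict[str, List[str]]

def build_character_list(
    external: CharacterList,
    picked: Iterable[str],
    internal: CharacterList,
) -> CharacterList:
    """Combine the characters picked from a referenced external list with the
    treatment's own internal characters."""
    chosen = {name: external[name] for name in picked}
    # Assumption: on a name clash the internal definition wins.
    chosen.update(internal)
    return chosen

# Hypothetical external list shared by another treatment builder.
external_list: CharacterList = {
    "leaf shape": ["ovate", "obovate", "elliptic"],
    "spines": ["present", "absent"],
    "fruit type": ["drupe", "capsule"],
}

# This treatment uses two of the three external characters and adds its own.
characters = build_character_list(
    external_list,
    picked=["leaf shape", "spines"],
    internal={"stipules": ["persistent", "caducous"]},
)
```

The external list is left untouched, so two treatments can reference the same list while each picks and extends it differently.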
This issue of character lists in advance gives me the heebie-jeebies.
Perhaps I am confusing the issue, but to a certain extent one could argue
that characters flow out of/depend on observations/measurements, so would a
character list in advance unduly constrain?  I am thinking of characters
like helobial/cellular/nuclear endosperm, which can and should be
decomposed, embryo development, anther wall development, etc.
I agree Peter - I'm worried by predefined characters also. That's why I've
been arguing against building lexica into the standard. BUT, it seems to me
that if one is to proceed with these treatments there need to be a priori
defined characters. I don't know how to leave characters to define
themselves (flow out of) the observations/measurements.

I've suggested the structure of remote and local character lists to allow
people to share resources. If someone creates a list of characters for
Rhamnaceae and someone else finds that much of that list will be just the
ticket for Vitaceae, then they should be able to share. Perhaps
progressively global lexica will grow from this, perhaps not. There seems to
be such enthusiasm for lexica that we need to allow for their possibility,
but with a bottom-up rather than top-down approach to their creation.

Conversely, if other treatment builders want to opt out of constraining
lexica altogether, they should be allowed to do so also.

My understanding of your problem (I may be wrong) is that you have
difficulty with people using qualitative characters as handy (but dangerous)
shortcuts for characters that are in reality quantitative. There are no
Platonic ideals in the qualitative world, but the quantitative world gets
closer. Thus, "crassinucellate" is really an abstraction of a set of
measurements. I may argue that it's a handy abstraction, you'd argue that
it's a dangerous abstraction, and we'd both be right (this may be a bad
example as we'd probably both argue for the danger in crassinucellate, but
you get my drift).

As the standard stands, I could use Leaves: ovate as a character state in
one treatment, and you could use leaf length, width and distance to widest
point in yours (or some more sophisticated mathematical descriptor). Can we
do any better?

Cheers - k

Kevin Thiele