Re: SDD Specifications Document

6 Mar 2000

      Leigh,
...
1. Collation rules. These are currently unspecified. Any objections
if I leave them out of an attempt to express these in XML? We
can revisit and revise this portion as time goes on.
I think it's important to try to implement this part: it's a fundamental
part of the structure, but it's also new and difficult, and we need to
give people a chance to comment and revise. The three main aspects of the
draft standard as documented are:
1. The ability (but not necessity) to attribute every data element to a source
2. The ability to collate data from lower-level (e.g. real-data) to
higher-level (e.g. synthesized) sources
3. The open-ended hierarchical structure for character and taxon lists.

I haven't given a lot of thought to how the collation rules would work in
practice. The basic idea is this. Suppose you have one treatment that stores
base data (measurements etc) for observed specimens of your taxa, then
another treatment that wants to collate parts of these data to a taxon-level
character.

e.g. One file has

Character = "Spore length (um)"
        Taxon =  "Macrolepiota clelandii"
                Specimen = "CANB4545601"
                        Value = 15
                        Value = 18
                        ...
                Specimen = "CANB233976"
                        Value = 16
.....

The higher-level file wants to collate from this:
Character = "Spore length (um)"
        Taxon = "Macrolepiota clelandii"
                Minvalue = 15
                Maxvalue = 18

The collation necessary here is simple - "Find the limits of the range".
Other simple rules may be "Find the limits and mean", "Find the limits and
25% quartiles" etc. For multistate characters the most common rule will be
an additive one - "collect all values for the character and concatenate them":

Character = "Petal colour"
        Taxon =  "Lilium turkestanicum"
                Specimen = "CANB4545601"
                        Value = "Pink"
                Specimen = "CANB233976"
                        Value = "Purple"
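
The two kinds of rule described above can be sketched in a few lines of
Python. This is only an illustration, not part of the draft standard: the
function names and the dictionary layout are assumptions, only the values
actually listed in the examples are used (the elided ones are left out),
and dropping repeated states in the additive rule is a guess at what
"concatenate" should mean.

```python
# Hypothetical layout: specimen id -> list of values recorded for one character.
spore_length = {
    "CANB4545601": [15, 18],
    "CANB233976": [16],
}
petal_colour = {
    "CANB4545601": ["Pink"],
    "CANB233976": ["Purple"],
}


def collate_range(specimens):
    """'Find the limits of the range' across all specimen values."""
    values = [v for vals in specimens.values() for v in vals]
    return {"Minvalue": min(values), "Maxvalue": max(values)}


def collate_additive(specimens):
    """'Collect all values and concatenate them', dropping repeats, keeping order."""
    seen = []
    for vals in specimens.values():
        for v in vals:
            if v not in seen:
                seen.append(v)
    return seen


print(collate_range(spore_length))     # {'Minvalue': 15, 'Maxvalue': 18}
print(collate_additive(petal_colour))  # ['Pink', 'Purple']
```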

Now, I don't know what's the best way of formulating the rules. Would you
simply have a set of standard names for rules as above, then leave it to the
processing program to do the work, or can the whole thing be embedded in the
XML (so that looking at the higher-level treatment with an XML interpreter
would show you the collated data)?
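
The first option (standard rule names, with the processing program doing
the work) might look roughly like the sketch below. The element and
attribute names ("Character", "Taxon", "collate") and the rule names are
invented for illustration; nothing here is proposed markup. Note that with
this approach a plain XML viewer would show only the rule name, not the
collated data.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are invented for illustration.
HIGHER_LEVEL = """
<Character name="Spore length (um)">
  <Taxon name="Macrolepiota clelandii" collate="range"/>
</Character>
"""

# Standard rule names mapped to the code that carries them out.
RULES = {
    "range": lambda values: {"Minvalue": min(values), "Maxvalue": max(values)},
    "range-mean": lambda values: {
        "Minvalue": min(values),
        "Maxvalue": max(values),
        "Mean": sum(values) / len(values),
    },
}


def collate(xml_text, lower_level_values):
    """Look up the named rule for each taxon and apply it to the lower-level values."""
    root = ET.fromstring(xml_text)
    results = {}
    for taxon in root.iter("Taxon"):
        rule = RULES[taxon.get("collate")]
        results[taxon.get("name")] = rule(lower_level_values[taxon.get("name")])
    return results


# Values taken from the spore-length example, flattened across specimens.
values = {"Macrolepiota clelandii": [15, 18, 16]}
print(collate(HIGHER_LEVEL, values))
```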

If the rules need to be formulated rather than just named, the challenge is
to do it in such a way that an XML parser could do the collation, and a
klutz taxonomist could create their own if an existing one is not sufficient.

Is this possible or am I talking out of my stovepipe?
...
2. Collated Character source. I see these as essentially
a drill down mechanism that further identifies a 'bottom-level'
taxon.
The necessary thing is to identify a path to the treatment that contains the
data for collation, and the taxa of the lower-level treatment that will
provide the data for a taxon in the current treatment. In the example above
it's perhaps deceptively simple, because the target taxa in the lower-level
treatment (in this case, the specimens) are nested within a taxon with the
same name as the current taxon in the higher-level treatment (Macrolepiota
clelandii or Lilium turkestanicum). This is perhaps the easiest way to do it
(thinking aloud now) - require that the taxon of interest exist somewhere in
the taxon hierarchy in both treatments, then step one down in the
lower-level treatment to find the elements of interest. Does any of this
make sense to anyone - I'd quite like some support here, if anyone's still
out there.
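
The "find the taxon, then step one down" idea can be sketched as follows.
The nested-tuple representation, the genus node at the top, and the
function names are all assumptions made for the example; the draft says
nothing about how a hierarchy would be stored.

```python
# Hypothetical taxon hierarchy of a lower-level treatment:
# each node is (name, list of child nodes).
lower_level = ("Macrolepiota", [
    ("Macrolepiota clelandii", [
        ("CANB4545601", []),
        ("CANB233976", []),
    ]),
])


def find_taxon(node, name):
    """Depth-first search for the node with the given name; None if absent."""
    node_name, children = node
    if node_name == name:
        return node
    for child in children:
        found = find_taxon(child, name)
        if found is not None:
            return found
    return None


def collation_sources(lower_root, taxon_name):
    """Names of the items one step below the matching taxon."""
    match = find_taxon(lower_root, taxon_name)
    if match is None:
        raise KeyError(f"{taxon_name!r} not found in the lower-level treatment")
    return [child_name for child_name, _ in match[1]]


print(collation_sources(lower_level, "Macrolepiota clelandii"))
# ['CANB4545601', 'CANB233976']
```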
...
Is it reasonable to include character dependencies 'upwards'?
Don't know - I need to think about this more. There may be a problem if you
mix and match "up" and "down" dependency definitions. The thing is to choose
the one that will be most easily defined.
...
Am I right in believing that multiple character sets could drill
down into the same source (perhaps produced by different
organisations, researchers, techniques)? In this case, is
there a principal source for going back upwards? It seems
unfair to expect a treatment to keep track of all other treatments
which point to it (I may be inferring too much here).
I think you may be off the track (or I may be). A character dependency is a
property of a character or a state (depending on whether it's an up or down
dependency). It should be internal to a treatment, so there's no "keeping
track" to worry about.
...
3. Why should characters only have properties at the lowest level?
What is the 'lowest level' given that drill-down can occur? Could
you outline the reasoning here?
You're right - characters at higher levels may also have properties, e.g.

Leaves
        venation
                prominence
                pattern

It may be worth providing notes, illustrations etc for "Leaves" so basic
information can be moved up the hierarchy. Similarly, an "up" dependency set
to "Leaves" would apply to all lower-level character items.
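
One way to picture a property set on "Leaves" applying to everything
beneath it is sketched below. The child-to-parent table mirrors the
hierarchy above; the property name "depends_on" and its value are purely
hypothetical placeholders.

```python
# Character hierarchy from the example, as child -> parent links.
PARENT = {
    "venation": "Leaves",
    "prominence": "venation",
    "pattern": "venation",
}

# Hypothetical property attached once, at the top-level character.
PROPERTIES = {"Leaves": {"depends_on": "Leaves present"}}


def inherited_properties(character):
    """Merge properties from the root down, so nearer levels override higher ones."""
    chain = []
    while character is not None:
        chain.append(character)
        character = PARENT.get(character)
    merged = {}
    for name in reversed(chain):
        merged.update(PROPERTIES.get(name, {}))
    return merged


print(inherited_properties("pattern"))  # {'depends_on': 'Leaves present'}
```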
...
4. It's possible to provide a Character list internally to the
treatment or reference an external one. Will it be a requirement
that both could be used (i.e. combine the internal and external lists)?
Yes. The way I see it, an external character list is a resource, but not a
constraint. If one is referenced, a treatment builder should also be able to:
1. define their own internal characters,
2. perhaps pick and choose amongst the characters from the referenced list,
rather than have to use them all.
I think such flexibility is necessary if lexica are to grow.
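
A minimal sketch of that flexibility, assuming character lists are just
lists of names (the function name, the sample lists, and the choice to
reject picks that are not in the referenced list are all assumptions):

```python
def build_character_list(external, picked, internal):
    """Combine characters chosen from a referenced list with locally defined ones."""
    missing = [name for name in picked if name not in external]
    if missing:
        raise ValueError(f"not in the referenced list: {missing}")
    # Keep the referenced list's own order for the characters that were picked.
    combined = [name for name in external if name in picked]
    # Internal characters are appended; one that duplicates a picked name is skipped.
    combined += [name for name in internal if name not in combined]
    return combined


# Hypothetical lists for illustration only.
external_list = ["Petal colour", "Spore length (um)", "Leaf venation"]
my_list = build_character_list(
    external_list,
    picked={"Petal colour", "Leaf venation"},
    internal=["Bark texture"],
)
print(my_list)  # ['Petal colour', 'Leaf venation', 'Bark texture']
```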

Possible?

Cheers - k


Kevin Thiele