Re: SDD Specifications Document
Leigh,
- Collation rules. These are currently unspecified. Any objections
if I leave them out of an attempt to express these in XML? We can revisit and revise this portion as time goes on.
I think it's important to try to implement this part: it's a fundamental part of the structure, but it's also new and difficult, and we need to give people a chance to comment and revise. The three main aspects of the draft standard as documented are:
1. The ability (but not necessity) to attribute every data element to a source.
2. The ability to collate data from lower-level (e.g. real-data) to higher-level (e.g. synthesized) sources.
3. The open-ended hierarchical structure for character and taxon lists.
I haven't given a lot of thought to how the collation rules would work in practice. The basic idea is this. Suppose you have one treatment that stores base data (measurements etc) for observed specimens of your taxa, then another treatment that wants to collate parts of these data to a taxon-level character.
e.g. One file has
Character = "Spore length (um)"
Taxon = "Macrolepiota clelandii"
  Specimen = "CANB4545601"
    Value = 15
    Value = 18
    ...
  Specimen = "CANB233976"
    Value = 16
    .....
The higher-level file wants to collate from this:
Character = "Spore length (um)"
Taxon = "Macrolepiota clelandii"
  Minvalue = 15
  Maxvalue = 18
The collation necessary here is simple - "Find the limits of the range". Other simple rules may be "Find the limits and mean", "Find the limits and 25% quartiles" etc. For multistate characters the most common rule will be an additive one - "collect all values for the character and concatenate them":
Character = "Petal colour"
Taxon = "Lilium turkestanicum"
  Specimen = "CANB4545601"
    Value = "Pink"
  Specimen = "CANB233976"
    Value = "Purple"
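To make the "standard names for rules" option concrete, here is a rough sketch in Python of how a processing program might apply named collation rules to the examples above. Everything in it is an assumption for illustration only: the rule names ("range", "range-and-mean", "concatenate"), the output keys other than Minvalue and Maxvalue, and the dictionary layout are not part of the draft, and the data hold only the values actually shown above (the elided ones are left out).

```python
from statistics import mean

# Illustrative base data: specimen -> values recorded for one character
# of one taxon. Only the values shown in the examples are included.
spore_length = {
    "CANB4545601": [15, 18],
    "CANB233976": [16],
}
petal_colour = {
    "CANB4545601": ["Pink"],
    "CANB233976": ["Purple"],
}


def pool(specimens):
    """Flatten the per-specimen value lists into one list, keeping order."""
    return [v for values in specimens.values() for v in values]


def rule_range(specimens):
    """'Find the limits of the range.'"""
    values = pool(specimens)
    return {"Minvalue": min(values), "Maxvalue": max(values)}


def rule_range_and_mean(specimens):
    """'Find the limits and mean.'"""
    values = pool(specimens)
    return {"Minvalue": min(values), "Maxvalue": max(values), "Mean": mean(values)}


def rule_concatenate(specimens):
    """Additive rule for multistate characters: keep each distinct state once."""
    states = []
    for value in pool(specimens):
        if value not in states:
            states.append(value)
    return {"Values": states}


# Hypothetical registry of standard rule names; a treatment would only
# need to name the rule, and the processing program does the work.
RULES = {
    "range": rule_range,
    "range-and-mean": rule_range_and_mean,
    "concatenate": rule_concatenate,
}


def collate(rule_name, specimens):
    """Apply the named rule to the specimen-level data."""
    return RULES[rule_name](specimens)


print(collate("range", spore_length))
print(collate("concatenate", petal_colour))
```

With this approach the higher-level treatment stores only a rule name and a pointer to the source data, so the collated values are not visible in the XML itself until a program runs the rule.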
Now, I don't know what's the best way of formulating the rules. Would you simply have a set of standard names for rules as above, then leave it to the processing program to do the work, or can the whole thing be embedded in the XML (so that looking at the higher-level treatment with an XML interpreter would show you the collated data)?
If the rules need to be formulated rather than just named, the challenge is to do it in such a way that an XML parser could do the collation, and a klutz taxonomist could create their own if an existing one is not sufficient.
Is this possible or am I talking out of my stovepipe?
- Collated Character source. I see these as essentially
a drill down mechanism that further identifies a 'bottom-level' taxon.
The necessary thing is to identify a path to the treatment that contains the data for collation, and the taxa of the lower-level treatment that will provide the data for a taxon in the current treatment. In the example above it's perhaps deceptively simple, because the target taxa in the lower-level treatment (in this case, the specimens) are nested within a taxon with the same name as the current taxon in the higher-level treatment (Macrolepiota clelandii or Lilium turkestanicum). This is perhaps the easiest way to do it (thinking aloud now): require that the taxon of interest exist somewhere in the taxon hierarchy in both treatments, then step one level down in the lower-level treatment to find the elements of interest. Does any of this make sense to anyone? I'd quite like some support here, if anyone's still out there.
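The "find the same taxon, then step one level down" idea can be sketched as below. This is only an illustration under assumptions: the nested-dictionary layout, the function names, and the genus node placed above the species are all hypothetical, and the path to the lower-level treatment is taken as already resolved.

```python
# Hypothetical lower-level treatment: a nested taxon hierarchy in which
# every node has a name and a list of child nodes.
lower_treatment = {
    "name": "Macrolepiota",
    "children": [
        {
            "name": "Macrolepiota clelandii",
            "children": [
                {"name": "CANB4545601", "children": []},
                {"name": "CANB233976", "children": []},
            ],
        },
    ],
}


def find_taxon(node, name):
    """Depth-first search for the node with the given name; None if absent."""
    if node["name"] == name:
        return node
    for child in node["children"]:
        found = find_taxon(child, name)
        if found is not None:
            return found
    return None


def collation_sources(treatment, taxon_name):
    """Return the names one level below taxon_name in the lower-level treatment."""
    node = find_taxon(treatment, taxon_name)
    if node is None:
        # The proposal requires the taxon to exist in both treatments.
        raise KeyError(f"taxon not found in lower-level treatment: {taxon_name}")
    return [child["name"] for child in node["children"]]


print(collation_sources(lower_treatment, "Macrolepiota clelandii"))
```

Matching purely by name is the simplest option; it fails loudly when the taxon is missing from the lower-level treatment, which is the case the "exist in both treatments" requirement is meant to rule out.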
Is it reasonable to include character dependencies 'upwards'?
Don't know - I need to think about this more. There may be a problem if you mix and match "up" and "down" dependency definitions. The thing is to choose the one that will be most easily defined.
Am I right in believing that multiple character sets could drill down into the same source (perhaps produced by different organisations, researchers, techniques)? In this case, is there a principal source for going back upwards? It seems unfair to expect a treatment to keep track of all other treatments which point to it (I may be inferring too much here).
I think you may be off the track (or I may be). A character dependency is a property of a character or a state (depending whether it's an up or down dependency). It should be internal to a treatment, so there's no "keeping track" to worry about.
- Why should characters only have properties at the lowest level?
What is the 'lowest level' given that drill-down can occur? Could you outline the reasoning here?
You're right - characters at higher levels may have properties also e.g.
Leaves
  venation
    prominence
    pattern
It may be worth providing notes, illustrations etc for "Leaves" so basic information can be moved up the hierarchy. Similarly, an "up" dependency set to "Leaves" would apply to all lower-level character items.
- It's possible to provide a Character list internally to the
treatment or reference an external one. Will it be a requirement that both could be used (i.e. combine the internal and external lists)?
Yes. The way I see it an external character list is a resource, but not a constraint. If one is referenced a treatment builder should also be able to 1. define their own internal characters, 2. perhaps pick and choose amongst the characters from the referenced list, rather than have to use them all. I think such flexibility is necessary if lexica are to grow.
Possible?
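As a rough sketch of the flexibility described above, assuming characters can be identified by simple ids: a treatment picks some characters from a referenced external list and adds its own. The ids, the "Cap scales" character, and the function name are all made up for illustration and are not part of the draft.

```python
# Hypothetical external character list (a shared resource, not a constraint).
external_characters = {
    "spore-length": "Spore length (um)",
    "petal-colour": "Petal colour",
}


def build_character_list(external, selected, internal):
    """Combine chosen external characters with internally defined ones.

    external: id -> label from the referenced list
    selected: ids picked from the external list (need not be all of them)
    internal: id -> label defined inside the treatment itself
    """
    missing = [cid for cid in selected if cid not in external]
    if missing:
        raise KeyError(f"not in the external list: {missing}")
    clashes = [cid for cid in internal if cid in selected]
    if clashes:
        raise ValueError(f"internal ids clash with selected external ids: {clashes}")
    # Record where each character came from so the source stays traceable.
    combined = {cid: ("external", external[cid]) for cid in selected}
    combined.update({cid: ("internal", label) for cid, label in internal.items()})
    return combined


characters = build_character_list(
    external_characters,
    selected=["spore-length"],
    internal={"cap-scales": "Cap scales"},
)
print(characters)
```

Keeping the origin alongside each character is one way to let a lexicon grow: internally defined characters that prove useful can later be proposed for the external list.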
Cheers - k
participants (1)
-
Kevin Thiele