There is a subtext running through this discussion (whether part of our scope is the creation of lexicons or standard name-spaces) that, to me, is causing confusion.
For instance:
From Leigh:
Restricting states to particular options depending on the property in question (e.g. leaf and/or wing shape) leads back to the prior discussion on accepted standards for character description.
Defining and agreeing upon these standard notations/descriptions are A FUNDAMENTAL PART (my caps) of specifying this new format, and one that isn't solved simply by deciding to use XML (for example). It's part of the fundamental design and modelling, and is therefore something that should be addressed early on.
But from Leigh again:
I'd say that the DELTA approach, of avoiding domain-specific (e.g. zoology, virology, etc.) notations in the format, has worked well. And I think this is the level that any initial work should be pitched at, i.e. the data format should encode taxonomic *data*, just as DELTA does. Any domain-specific schema can be layered on top of this, or include it. Begin with capturing the relevant data just as DELTA does, and then progress from there.
So do we or don't we? Am I misinterpreting these passages, or do they really say opposite things?
From Gregor:
It is true: morphological structures may have containment hierarchies, but I believe that these depend strongly on the viewpoint of the author or user.
EXAMPLE 2: Stuff can be in between: the inflorescence contains part of the stem, some of the leaves, and all of the flowers. Which leaves are part of the inflorescence, and thus called bracts, and which aren't is often a matter of taste, school, country...
Thus: there are multiple concurrent or competing hierarchies, which may overlap.
Competing hierarchies are only a problem if we are trying to standardise them and resolve the conflicts. If every worker resolves for their own project what to call bracts, this is not a problem for us.
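A minimal Python sketch of that arrangement, assuming a hypothetical model in which each worker's containment hierarchy is stored separately as a child-to-parent mapping (the viewpoint names and part names are illustrative only): the same "upper leaf" is a bract inside the inflorescence in one hierarchy and an ordinary stem leaf in the other, and neither has to be reconciled with the other.

```python
from typing import Dict, List

# Hypothetical model: each viewpoint (author, school, project) keeps its own
# containment hierarchy as a child -> parent mapping. Nothing forces the
# hierarchies to agree, so they may overlap or conflict.
Hierarchy = Dict[str, str]

hierarchies: Dict[str, Hierarchy] = {
    # Viewpoint A treats the upper leaves as bracts within the inflorescence.
    "viewpoint_a": {
        "flower": "inflorescence",
        "upper leaf": "inflorescence",
        "lower leaf": "stem",
        "inflorescence": "shoot",
        "stem": "shoot",
    },
    # Viewpoint B keeps all leaves on the stem; only flowers are in the inflorescence.
    "viewpoint_b": {
        "flower": "inflorescence",
        "upper leaf": "stem",
        "lower leaf": "stem",
        "inflorescence": "shoot",
        "stem": "shoot",
    },
}


def ancestors(hierarchy: Hierarchy, part: str) -> List[str]:
    """Walk up one viewpoint's hierarchy from `part` to its root (assumes no cycles)."""
    chain: List[str] = []
    while part in hierarchy:
        part = hierarchy[part]
        chain.append(part)
    return chain


for name, hierarchy in hierarchies.items():
    print(name, ancestors(hierarchy, "upper leaf"))
```

The data format only has to carry each hierarchy; deciding which one is "right" stays with the worker who made it.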
From Jean-Marc:
We are designing XML vocabularies for the description of biological species.
Are we? I thought we were designing a format by which such a vocabulary can be represented.
For the record, all the current systems (DELTA, LucID, NEXUS etc) enforce nothing lexically, they merely enforce a particular way of representing data. Two data sets for similar groups of plants may contain entirely different characters, or the same characters worded in different ways, or the same characters resolved into states in different ways, or (occasionally) identical characters. Comparing and combining datasets automatically is thus impossible. This seems such a shame, but is it perhaps unavoidable?
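To make the point concrete, here is a minimal Python sketch assuming a hypothetical, DELTA-like in-memory model (the `Character`, `Dataset` and `shared_characters` names and the sample characters are illustrative, not taken from any existing system): both datasets share one representation, yet only a character worded and resolved into states identically can be matched mechanically.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Character:
    # Free-text wording plus ordered state names, much as in DELTA-style data.
    text: str
    states: Tuple[str, ...]


@dataclass
class Dataset:
    characters: List[Character]
    # taxon name -> character index -> set of coded state indices
    codings: Dict[str, Dict[int, FrozenSet[int]]]


def shared_characters(a: Dataset, b: Dataset) -> List[Character]:
    """Characters that can be matched mechanically: only exact duplicates."""
    in_b = set(b.characters)
    return [c for c in a.characters if c in in_b]


# Two hypothetical datasets for similar plants, in the same representation.
survey_1 = Dataset(
    characters=[
        Character("Leaf shape", ("linear", "elliptic", "obovate")),
        Character("Ovary position", ("superior", "inferior")),
    ],
    codings={"Taxon A": {0: frozenset({1}), 1: frozenset({0})}},
)
survey_2 = Dataset(
    characters=[
        Character("Leaves: lamina shape", ("narrow", "broad")),
        Character("Ovary position", ("superior", "inferior")),
    ],
    codings={"Taxon B": {0: frozenset({0}), 1: frozenset({0})}},
)

print([c.text for c in shared_characters(survey_1, survey_2)])  # ['Ovary position']
```

The two leaf-shape characters describe much the same thing, but nothing in the shared representation says so; only the (occasionally) identical ovary character lines up.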
Thus, if we are designing vocabularies, we are going a long way beyond what's been attempted before.
Personally, I think designing domain-specific vocabularies will never work, unless the domain is the individual worker or group of collaborating workers. The popularity of lexicons is the old seductive universalism again. Great idea, but...
There are two problems. Firstly, there are (broadly) two types of characters used in descriptions (and keys); let's call them comparative and diagnostic characters. Comparative characters are the fairly general ones (e.g. leaf shape, ovary position), the sorts of characters that one would aim to describe consistently for all taxa in a monograph. Diagnostic characters are special characters that are useful for separating two or more taxa (of course, fairly general characters are sometimes diagnostic too).
A real example of a diagnostic character (from Synaphea: Proteaceae):
Ovary with an apical ring of translucent glands ........ S. bifurcata
Ovary without glands ................................... S. oulopha
Clearly, no generalised lexicon or name-space will allow such diagnostic characters to be captured.
BUT, perhaps we can have a standardised representation for the generalised characters using a lexicon and then use extensibility to allow user-specific diagnostic characters? To some extent, but perhaps not...:
I will (foolishly) lay down a challenge here: any generalised morphological character that anyone can come up with (in the plant domain) will be entirely inadequate for capturing data in some groups. For example, the most straightforward character I can think of is
Leaves: present / absent
But a diagnostic difference between Discaria pubescens and Discaria nitida (Rhamnaceae) is the degree to which the leaves persist: in both, leaves tend to be absent in the adult plant, but in D. pubescens they are often completely absent, while in D. nitida there are usually scattered reduced leaves on younger branchlets. And in Podostemaceae and Utricularia there's no guarantee that a leaf-like part is a leaf, because the conventional differentiation of vegetative parts into leaves and stems doesn't hold.
Mother Nature's a tricky old dame, and any character definition will be inadequate to catch her. But do we put up with the inadequacy for the advantages that universality brings? If it means constraining our ability to capture data, then I'd say no.
So, I'd like to suggest that we try to develop a standardised data representation, but put no constraints on character definitions whatsoever.
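As a rough illustration of what that might mean, here is a minimal Python sketch under stated assumptions: the `CharacterDef` and `Dataset` classes, the `validate` method and the character wording are all hypothetical. The representation (characters, states, codings) is fixed and checked, but the labels and state wording are whatever the author writes, so the Discaria distinction can be captured as the author sees it rather than squeezed into "present / absent".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Hypothetical sketch: the *representation* is standardised, but the
# wording of characters and states is entirely up to the author.
Value = Union[List[int], float]


@dataclass
class CharacterDef:
    char_id: str                        # unique within this dataset only
    label: str                          # free text; never checked against a lexicon
    states: Optional[List[str]] = None  # None means a numeric character


@dataclass
class Dataset:
    characters: Dict[str, CharacterDef] = field(default_factory=dict)
    # taxon name -> character id -> state indices (categorical) or a number (numeric)
    values: Dict[str, Dict[str, Value]] = field(default_factory=dict)

    def add_character(self, char: CharacterDef) -> None:
        self.characters[char.char_id] = char

    def validate(self) -> List[str]:
        """Check structural consistency only; labels and state wording are unconstrained."""
        errors: List[str] = []
        for taxon, coded in self.values.items():
            for char_id, value in coded.items():
                char = self.characters.get(char_id)
                if char is None:
                    errors.append(f"{taxon}: unknown character {char_id!r}")
                elif char.states is None:
                    if not isinstance(value, (int, float)):
                        errors.append(f"{taxon}: {char_id!r} expects a number")
                elif not isinstance(value, list) or not all(
                    isinstance(i, int) and 0 <= i < len(char.states) for i in value
                ):
                    errors.append(f"{taxon}: invalid state index for {char_id!r}")
        return errors


ds = Dataset()
ds.add_character(CharacterDef(
    "c1",
    "Leaves on younger branchlets of adult plants",
    ["often completely absent", "usually present as scattered reduced leaves"],
))
ds.values["Discaria pubescens"] = {"c1": [0]}
ds.values["Discaria nitida"] = {"c1": [1]}
print(ds.validate())  # []
```

The validator rejects a malformed coding (an unknown character or an out-of-range state) but has no opinion on what a character is called or how it is divided into states.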
Cheers - k
Beware the Universe - it bites
Kevin Thiele