From Alex R. Chapman (alexc@calm.wa.gov.au), 14/12/99, 10.46:
Kevin's comments concur with my earlier posting on the scope of this list's topic. While reference to particular biological characters and attributes is essential to illustrate the potential functionality of any new standard system of representing descriptive data, it is surely beyond the scope of this group to attempt to develop or standardise the character lexicons themselves. A worthy and extremely useful task (and here I specifically differ from Kevin T.) but surely a major threat to reporting back to TDWG with a new data model in a timely manner!
I am not opposed to constructing a standard character list for one or more (or all) kingdoms. If part of our group thinks this is worthwhile, I suggest we add a first-level topic identifier for subject lines in our postings (currently we use GEN, RQT and XML).
Perhaps LEX - for the discussion of developing standard lexicons or name-space vocabularies for one or more fundamental taxonomic groups.
This way those most interested in this area can continue the debate. (It would be a fascinating outcome if most of the postings end up under this category!)
In fact such a sub-subgroup would perhaps help to provide a range of character types or exemplars with which we could test any proposed data model arising from the main discussion.
Cheers,
Alex
____
Alex R. Chapman
Research Scientist
WA Herbarium - Department of Conservation and Land Management
Locked Bag 104 Bentley Delivery Centre
Western Australia 6983
Email: alexc@calm.wa.gov.au
Voice/Fax: +61 8 9334 0513 / 0515

---------- Original Text ----------
From: "Kevin Thiele" kevin.thiele@PI.CSIRO.AU, on 9/12/99 7:55:
There is a subtext running in this discussion - whether part of our scope is the creation of lexicons or standard name-spaces - that to me is causing confusion.
For instance:
From Leigh:
Restricting states to particular options depending on the property in question (e.g. leaf and/or wing shape) leads back to the prior discussion on accepted standards for character description.
Defining and agreeing upon these standard notations/descriptions is A FUNDAMENTAL PART (my caps) of specifying this new format, and one that isn't solved simply by deciding to use XML (for example). It's part of the fundamental design and modelling, and is therefore something that should be addressed early on.
But from Leigh again:
I'd say that the DELTA approach - of avoiding domain (i.e. zoology, virology, etc) specific notations in the format has worked well. And I think this is the level that any initial work should be pitched at. i.e. the data format should encode taxonomic *data* - just as DELTA does. Any domain specific schema can be layered on top of this, or include it. Begin with capturing the relevant data just as DELTA does, and then progress from there.
So do we or don't we? Am I misinterpreting these, or do they really say opposite things?
From Gregor:
It is true: Morphological structures may have containment hierarchies, but I believe that these depend strongly on the viewpoint of the author or user.
EXAMPLE 2: Stuff can be in-between: The inflorescence contains part of stem, part of leaves, and all flowers. Which leaves are part of inflorescence and thus called bracts, and which aren't is often a matter of taste, school, country...
Thus: there are multiple concurrent or competing hierarchies, which may overlap.
The only problem with competing hierarchies is if we are trying to standardise and resolve the conflicts. If every worker resolves for their own project what to call bracts, this is not a problem for us.
From Jean-Marc:
We are designing XML vocabularies for the description of biological species.
Are we? I thought we were designing a format by which such a vocabulary can be represented.
For the record, all the current systems (DELTA, LucID, NEXUS etc) enforce nothing lexically, they merely enforce a particular way of representing data. Two data sets for similar groups of plants may contain entirely different characters, or the same characters worded in different ways, or the same characters resolved into states in different ways, or (occasionally) identical characters. Comparing and combining datasets automatically is thus impossible. This seems such a shame, but is it perhaps unavoidable?
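To make the problem concrete, here is a minimal sketch in Python. The two datasets, their characters and the `shared_characters` helper are all invented for illustration; none of this is DELTA, LucID or NEXUS syntax. Both datasets share the same structure (character name mapped to a list of states), yet a naive match on character names recovers only the one character that happens to be worded identically.

```python
# Hypothetical datasets for similar plant groups. Same structure
# (character name -> list of states), but independently worded.
dataset_a = {
    "leaf shape": ["ovate", "lanceolate"],
    "ovary position": ["superior", "inferior"],
    "petal number": ["4", "5"],
}
dataset_b = {
    "shape of leaves": ["ovate", "lanceolate", "linear"],       # same character, reworded
    "ovary": ["superior", "half-inferior", "inferior"],         # same character, different states
    "petal number": ["4", "5"],                                 # (occasionally) identical
}


def shared_characters(a, b):
    """Return the character names that match exactly in both datasets."""
    return sorted(set(a) & set(b))


print(shared_characters(dataset_a, dataset_b))  # prints ['petal number']
```

The format guarantees that both datasets can be read, but nothing in it says that "leaf shape" and "shape of leaves" are the same character.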
Thus, if we are designing vocabularies, we are going a long way beyond what's been attempted before.
Personally, I think designing domain-specific vocabularies will never work, unless the domain is the individual worker or group of collaborating workers. The popularity of lexicons is the old seductive universalism again. Great idea, but...
There are two problems. Firstly, there are (broadly) two types of characters used in descriptions (and keys) - let's call them comparative and diagnostic characters. Comparative characters are the fairly general ones - e.g. leaf shape, ovary position - the sorts of characters that one would aim to describe consistently for all taxa in a monograph. Diagnostic characters are special characters that are useful for separating two or more taxa (of course, sometimes fairly general characters are diagnostic, but not always).
A real example of a diagnostic character (from Synaphea: Proteaceae):
Ovary with an apical ring of translucent glands......S. bifurcata
Ovary without glands.................................S. oulopha
Clearly, no generalised lexicon or name-space will allow for capture of such diagnostic characters.
BUT, perhaps we can have a standardised representation for the generalised characters using a lexicon and then use extensibility to allow user-specific diagnostic characters? To some extent, but perhaps not...:
I will (foolishly) raise a challenge here that any generalised morphological character that anyone can come up with (in the plant domain) will be entirely inadequate for capturing data for some groups. For example, the most straightforward character I can think of is
Leaves
  present
  absent
But, a diagnostic difference between Discaria pubescens and Discaria nitida (Rhamnaceae) is the degree to which the leaves persist - in both, leaves tend to be absent in the adult plant, but in D. pubescens they are often completely absent while in D. nitida there are usually scattered reduced leaves on younger branchlets. And in Podostemaceae and Utricularia there's no guarantee that a leaf-like part is a leaf, because the conventional differentiation of vegetative parts into leaves/stems doesn't hold.
Mother Nature's a tricky old dame, and any character definition will be inadequate to catch her. But do we put up with the inadequacy for the advantages that the universality brings? - if it means we constrain our ability to capture data, then I'd say no.
So, I'd like to suggest that we try to develop a standardised data representation, but put no constraints on character definitions whatsoever.
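A minimal sketch of what that could look like, in Python. The `Character` and `Dataset` classes and their methods are hypothetical names made up for this illustration, not part of any existing system or proposed standard. The structure is fixed (characters have states, taxa are scored against them), but character and state wording is free text chosen by the author, so a diagnostic character like the ovary-gland example above needs no entry in any lexicon.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Character:
    name: str          # free text, worded by the author
    states: List[str]  # free text, worded by the author


@dataclass
class Dataset:
    characters: Dict[str, Character] = field(default_factory=dict)
    # (taxon, character name) -> scored state
    scores: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_character(self, name: str, states: List[str]) -> None:
        """Declare a character; no vocabulary check is applied to the wording."""
        self.characters[name] = Character(name, list(states))

    def score(self, taxon: str, character: str, state: str) -> None:
        """Record a state for a taxon. Only the structure is enforced:
        the state must be one the author declared for this character."""
        if state not in self.characters[character].states:
            raise ValueError(f"{state!r} is not a declared state of {character!r}")
        self.scores[(taxon, character)] = state


ds = Dataset()
ds.add_character("Ovary glands", ["apical ring of translucent glands", "absent"])
ds.score("S. bifurcata", "Ovary glands", "apical ring of translucent glands")
ds.score("S. oulopha", "Ovary glands", "absent")
```

The only check here is internal consistency within one author's dataset; what the characters mean, and whether they match anyone else's, is left entirely to the worker.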
Cheers - k
Beware the Universe - it bites