- tdwg-content - lists.tdwg.org

GEN RE: Types of Quantitative Data - Outline Methods
by Stuart G. Poss 27 Nov '99

Gregor wrote:

> > Does anybody know about good parameterization of shapes, so they can
> > be stored in more objective forms?

To which Don responded:

> Wouldn't the most objective representation of shape be given by a set of
> coordinates for points placed around the outline? If orientation of the
> object is important, then give landmark coordinates too, e.g. mark the organ
> base and apex. From that info, parameters such as object area, perimeter
> length, circularity (perimeter length*perimeter length/area), centroid,
> longest and shortest axes, aspect ratios etc. can be calculated.
>
> You'd also be able to standardise your outlines (e.g. using Bookstein's
> transformation) and compare against reference shapes (e.g. elliptic Fourier
> analysis), make pairwise comparisons (using thin plate splines) or fit to an
> average shape using the Procrustes metric.
>
> Feature extraction from digital images (e.g. defining the object outline,
> sampling of points around the outline), calculation of the shape parameters
> and export to a spreadsheet are virtually automatic in packages like OPTIMAS.
> James Rohlf's NTSYSpc will perform the basic standardisations, thin plate
> splines and EFA.
>
> don

Approaches to evaluating OUTLINES include:

Freeman Chain Code Encoding

Chris Meacham's MorphoSys package uses a Freeman Chain Code to automatically identify and encode points along the perimeter of a target object using video digitization, as well as along the edges of holes within it. There is a published citation for this, but I cannot find it at the moment. This is not a parametric technique, since it generates a point set for which no parameters need be calculated.

Median or Symmetrical Axis

Blum, H. 1967. A transformation for extracting new descriptors of shape, pp. 362-380, In: Wathen-Dunn, W. (ed.), Models for the perception of speech and visual form, MIT Press, Cambridge, MA, 470 pp.

Straney, D. O. 1990. Median Axis Methods in Morphometrics, pp. 179-200, In: Rohlf, F. J. and F. L. Bookstein (eds.), Proceedings of the Michigan Morphometrics Workshop. Special Publication Number 2, The University of Michigan Museum of Zoology, Ann Arbor, 380 pp.

Fourier Analysis using Polar Coordinates

Kaesler, R. L. and J. A. Waters. 1972. Fourier analysis of the ostracode margin. Geol. Soc. Amer. Bull., 83:1169-1178.

Younker, J. L. and R. Ehrlich. 1977. Fourier biometrics: harmonic amplitudes as multivariate shape descriptors. Syst. Zool., 26:336-342.

Elliptical Fourier transforms

Kuhl, F. P. and C. R. Giardina. 1982. Elliptic Fourier features of a closed contour. Computer Graphics and Image Processing, 18:236-258.

Rohlf, F. J. and J. W. Archie. 1984. A Comparison of Fourier methods for the description of wing shape in mosquitoes (Diptera: Culicidae). Syst. Zool., 33:302-317.

Eigenshape analysis

Lohmann, G. P. 1984. Eigenshape analysis of microfossils: A general morphometric procedure for describing changes in shape. Math. Geol., 15(6):659-672.

Lohmann, G. P. and P. N. Schweitzer. 1990. On Eigenshape Analysis, pp. 147-166, In: Rohlf, F. J. and F. L. Bookstein (eds.), Proceedings of the Michigan Morphometrics Workshop. Special Publication Number 2, The University of Michigan Museum of Zoology, Ann Arbor, 380 pp.

Bezier Curves

Engles, H. 1986. A least squares method for estimation of Bezier curves and surfaces and its applicability to multivariate analysis. Mathematical Biosciences, 79:155-170.

Cubic Splines

Evans, D. G., P. N. Schweitzer, and M. S. Hanna. 1985. Parametric cubic splines and geological shape descriptions. Mathematical Geology, 17:611-624.

Fractals

Barnsley, M. F., V. Ervin, D. Hardin, and J. Lancaster. 1986. Solution of an inverse problem for fractals and other sets. Proceedings of the National Academy of Sciences USA, 83:1975-1977.
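
As a rough illustration of the Freeman chain code idea mentioned above, here is a minimal Python sketch. It is not MorphoSys's implementation; the function names, the direction numbering (0 = east, counting counterclockwise, with y growing upward) and the sample square are all assumptions made for the example. It only shows that an ordered outline can be stored as a start point plus a sequence of direction codes, and recovered again.

```python
# Freeman 8-direction codes, assuming x grows to the right and y grows upward:
# 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE (this numbering is an assumption).
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}
STEPS = {code: step for step, code in DIRECTIONS.items()}


def freeman_chain_code(boundary):
    """Encode a closed, ordered list of 8-connected (x, y) pixels as direction codes."""
    codes = []
    n = len(boundary)
    for i in range(n):
        x0, y0 = boundary[i]
        x1, y1 = boundary[(i + 1) % n]  # wrap around so the outline is closed
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"{boundary[i]} and {boundary[(i + 1) % n]} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def decode_chain(start, codes):
    """Rebuild the outline points from a start pixel and its chain code."""
    points = [start]
    x, y = start
    for code in codes[:-1]:  # the final step only returns to the start pixel
        dx, dy = STEPS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


# Hypothetical outline: a unit square traced counterclockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
codes = freeman_chain_code(square)
restored = decode_chain(square[0], codes)
```

The chain code stores no fitted parameters, only the path itself, which matches the point made above that this is not a parametric technique.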

Re: (XML) feature-property-value
by Noel Cross 26 Nov '99

Hello XML'ers,

A few minor comments with regard to Jean-Marc's recent proposals which, on the whole, I find intriguing:

On Fri, 26 Nov 1999, Jean-Marc Vanel wrote:

> This feature-property-value trilogy is what I had in mind the first
> time I read about Delta.
> A few remarks:
>
> 1. the feature can be a hierarchy, like:
>
>    leaf/lamina/abaxial_surface/vein_islands/indumentum/density

I wouldn't have thought that "density" would be the feature, but rather the property, i.e.:

    feature  = leaf/lamina/abaxial_surface/vein_islands/indumentum
    property = density
    value    = (some value)

> * we can turn the Flora of Australia GLOSSARY into an XML vocabulary in
>   XML Schema or RDF Schema syntax; each glossary entry should be
>   classified either as a feature, or a property, or a property-value;

Would properties be reified into features (as is done in RDF), or will there be some other way to deal with properties of properties? Intensifiers such as "very", for example.

> * the current type of characters of Delta (multistate=enumerate,
>   integer, real numeric, text) will become type information for
>   properties in our new Taxonomic XML Schema; there is a standard for
>   data types in the 2nd part of W3C's XML Schema Recommendation; we
>   must avoid re-inventing the wheel;

Note: the XML Schema part 2 of November 1999 is a Working Draft, not even a Proposed Recommendation, let alone a Recommendation.

> * the proposed XDELTA format (http://www.bath.ac.uk/~ccslrd/delta/)
>   is too much a direct translation of a Delta file;

Of course, since its intent is to be a direct translation of the DELTA format, this is not exactly a failing of the XDELTA format.

> * I propose to have 3 XML Namespaces for our different XML
>   vocabularies:
>     o biological descriptions (generalities)
>     o botany
>     o zoology

I'm sure you don't mean to leave out Mycology, Bryology, etc.

Best wishes,
-Noel
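
To make the distinction between feature and property concrete, here is a small Python sketch that treats the last path segment as the property and everything before it as the feature. The function name and the `modifier` field are hypothetical; in particular, `modifier` is only one possible way to carry an intensifier such as "very", and the thread leaves that question open.

```python
def split_feature_property(path, value, modifier=None):
    """Split a slash-separated character path into a feature-property-value record.

    Assumption for this sketch: the last segment names the property and the
    preceding segments name the (hierarchical) feature.
    """
    parts = [segment for segment in path.strip("/").split("/") if segment]
    if len(parts) < 2:
        raise ValueError("need at least one feature segment and one property segment")
    return {
        "feature": "/".join(parts[:-1]),
        "property": parts[-1],
        "value": value,
        "modifier": modifier,  # hypothetical slot for intensifiers such as "very"
    }


record = split_feature_property(
    "leaf/lamina/abaxial_surface/vein_islands/indumentum/density",
    "sparse",
    modifier="very",
)
```

Keeping the modifier outside the value means "very sparse" and "sparse" still share the same property-value, which may or may not be what a final schema wants.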

Nexus -- from a user
by Susan B. Farmer 26 Nov '99

Note that these comments are based on what is available in the current versions of Paup (more Paup 3.1.1 than Paup*) and MacClade that I've been using -- not the published standard. I can get the version number for the MacClade that I've been using, if that will help.

You can have numeric characters in your data, but they must be "hard coded" (e.g., character #4 "number of leaves" with states "1. three," "2. four," etc.).

You can't put comments in your character matrix, but you can put comments in your state names and character labels section.

There's not an unlimited number of states. Paup will let you have about 36, I think, but MacClade allows fewer. I know I had to combine some of mine to get the data set into MacClade. Practically, unless you're modeling color (which I was), this is usually equivalent to an unlimited number of states.

Character notes and illustrations exist only in MacClade. If you revert/convert to Paup, you lose all that information.

Taxon and character names are limited in the number of characters they can contain. If I'm not mistaken this is only true in MacClade. Either that or MacClade is once again "shorter" than Paup.

Multi-state characters -- they handle these weirdly -- and differently. It used to be that they all had to be either AND or OR. And it's not even a true AND or OR type situation. It's mostly an OR, but not quite. (And I know that this is making *no* sense whatsoever!) Multiple states tend to be interpreted as uncertainty rather than polymorphism. You can now make that distinction, but it is still polymorphic rather than all of the above at the same time.

Inapplicable is treated as missing. There's really no other way to do it. If you have an apetalous plant, all the petal characters can either be treated as "missing" or you can have an extra state "not applicable." Then you run into the problem of that extra state providing support that otherwise might not be there.

I'm not aware of value probabilities in either Paup or MacClade.

The same situation applies to taxon notes and taxon illustrations. Those are only good in MacClade. The typesetting markup in MacClade is only for HTML rather than for typesetting.

Anyway, I hope this helps. Maybe you can get some other experiential comments about Paup and MacClade.

Susan Farmer
sfarmer(a)goldsword.com
Botany Department, University of Tennessee
http://www.goldsword.com/sfarmer/Trillium
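
The "hard coding" of numeric characters and the treatment of inapplicable as missing can be sketched as follows. This is a minimal Python illustration, not taken from the Paup or MacClade documentation: the 36-symbol list (digits plus uppercase letters), the `?` missing symbol and both function names are assumptions made for the example.

```python
import string

# Assumed symbol list: 10 digits + 26 letters = 36 one-character state symbols.
SYMBOLS = string.digits + string.ascii_uppercase
MISSING = "?"  # assumed symbol for missing (and therefore also inapplicable) data


def code_states(observed_values, max_states=len(SYMBOLS)):
    """Map each distinct observed value to a one-character state symbol."""
    distinct = sorted({v for v in observed_values if v is not None})
    if len(distinct) > max_states:
        raise ValueError(f"{len(distinct)} states exceed the limit of {max_states}")
    return {value: SYMBOLS[i] for i, value in enumerate(distinct)}


def encode_character(values, coding):
    """Encode one character across taxa; None (inapplicable) becomes missing."""
    return "".join(MISSING if v is None else coding[v] for v in values)


# Hypothetical data: number of leaves for five taxa, one of them inapplicable.
leaf_counts = [3, 4, None, 3, 5]
coding = code_states(leaf_counts)
column = encode_character(leaf_counts, coding)
```

Lowering `max_states` imitates a program with a tighter state limit: the call fails, and states have to be combined by hand before the data set will fit.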

(XML) feature-property-value
by Jean-Marc Vanel 26 Nov '99

I saw this in:

Minutes of the Subgroup "Structure of descriptive data" workshop at TDWG 1999 in Harvard

Diederich's "Basic properties"

    General agreement was reached that a direct application of the
    "structure-property-value" model would be too restrictive, applicable
    mainly to morphological descriptions. A more general model, including
    cultural/physiological and molecular descriptions should be developed.
    The term "feature" was proposed as a more general replacement for
    structure.

This feature-property-value trilogy is what I had in mind the first time I read about Delta.

A few remarks:

1. the feature can be a hierarchy, like:

   leaf/lamina/abaxial_surface/vein_islands/indumentum/density

* the property can default to an "unnamed property", that is, a textual content for the feature as a whole; this allows floristic descriptions to be imported directly, see http://jmvanel.free.fr/Samples/parsing.htm;
* we can turn the Flora of Australia GLOSSARY into an XML vocabulary in XML Schema or RDF Schema syntax; each glossary entry should be classified either as a feature, or a property, or a property-value;
* the current characters of Delta are in fact feature-property couples;
* the current type of characters of Delta (multistate=enumerate, integer, real numeric, text) will become type information for properties in our new Taxonomic XML Schema; there is a standard for data types in the 2nd part of W3C's XML Schema Recommendation; we must avoid re-inventing the wheel;
* the proposed XDELTA format (http://www.bath.ac.uk/~ccslrd/delta/) is too much a direct translation of a Delta file;
* I propose to have 3 XML Namespaces for our different XML vocabularies:
    o biological descriptions (generalities)
    o botany
    o zoology
* XML Namespaces are important to avoid name clashes; they will allow us to mix in descriptions or reports with other vocabularies from other origins, like biochemical, paleontological, ecological, phytosociological, pedological, climatic, agronomic, plant uses, ethnobotany, etc. (again, avoid re-inventing the wheel)

Altogether this is a sound and state-of-the-art XML foundation; of course a lot of details remain to be worked out.

Cheers
JMV
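
The idea that the Delta character types become type information for properties, and that a property can default to an "unnamed property", can be sketched in Python as below. This is only an illustration under stated assumptions: the class names, the use of an empty string for the unnamed property, and the sample state list are all hypothetical and are not part of Delta, XDELTA or any W3C schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyType:
    """Type information for a property, using the four Delta character types."""
    name: str
    kind: str            # "multistate", "integer", "real" or "text"
    states: tuple = ()   # allowed values, used only when kind == "multistate"

    def validate(self, value):
        if self.kind == "multistate":
            return value in self.states
        if self.kind == "integer":
            return isinstance(value, int) and not isinstance(value, bool)
        if self.kind == "real":
            return isinstance(value, (int, float)) and not isinstance(value, bool)
        if self.kind == "text":
            return isinstance(value, str)
        raise ValueError(f"unknown kind: {self.kind}")


@dataclass(frozen=True)
class Statement:
    """One feature-property-value statement."""
    feature: str
    value: object
    prop: str = ""  # "" stands for the unnamed property: free text about the whole feature


# Hypothetical vocabulary entry and statements.
density = PropertyType("density", "multistate", ("absent", "sparse", "dense"))
typed = Statement("leaf/lamina/abaxial_surface/vein_islands/indumentum", "sparse", "density")
free_text = Statement("leaf", "Leaves alternate, shortly petiolate, ovate.")
```

A statement with the unnamed property carries imported prose as-is, while a named property can be checked against its declared type with `density.validate(typed.value)`.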

Re: so far ...
by Susan B. Farmer 26 Nov '99

>> This is typical of the level of malpractice of those who seem to be taking the
>> running with this discussion. Can you imagine being so clumsy and foolhardy
>> with hard-won data? I despair of this debate.
>
>Patience Nick - new lists take a while to settle down as participants
>try and establish a common agenda, and people strut personal favorite
>products and ideologies.
>
>Like Bernie, I am waiting for the discussion to focus itself.
>
>I had initially thought we were looking at a common and comprehensive
>(interchange?) format for biological descriptive data, perhaps
>involving an information model of the topic we are dealing with, and
>importantly, its boundaries. But all too quickly we have got to the
>level of all things to all people end-products, software and a degree
>of daunting complexity that we could probably do without at this stage.

Putting my programmer hat on here -- this debate may seem endless; but before one can define how data looks, one must have an idea of what goes into it. I have no more knowledge of what an entomologist's needs are in that respect than he has of a botanist's (or even a synantherologist of a bryologist's).

If I'm going to define a group of objects to define the morphology of my organisms (in my case of members of the Trilliaceae), I'll have a small set. If I want to make my program/definitions useful enough to a vascular plant person (or anybody else who wants to record data outside my tiny group of organisms), then there's a whole other set of definitions that I'll need to include.

I don't think that we're trying to get to the definition level -- especially not immediately (or at least I don't think so), but for this kind of work I think a Data Dictionary would be a useful thing to have available at some point.

The second point to consider is that all of the other standards have their shortcomings. This one will probably have its shortcomings as well -- no standard is perfect. We can't hope to address those problems in this standard if we're not aware of the shortcomings in the other ones.

>
>Approaching things from an 'if it ain't broke don't fix it' point of
>view, is someone in a position to enunciate/tabulate exactly what it is
>we are trying to achieve and the shortcomings/limitations of existing
>formats in reaching this goal? Having done this we might be better
>able to partition things into manageable and achievable lumps. I was not
>at the Harvard meeting so I'm a bit reluctant to stick an oar in, but
>since when has ignorance been a reason not to have an opinion... :)
>
>All the suggestions of 'do it this way using this', while interesting
>and educational, are tending to obscure rather than clarify what it is
>we are trying to achieve.
>
>Well, I am getting a bit lost, and I love this data stuff...
>
>jim
>

Susan Farmer
sfarmer(a)goldsword.com
Botany Department, University of Tennessee
http://www.goldsword.com/sfarmer/Trillium

Re: Limits of the Discussion.
by Stuart G. Poss 26 Nov '99

Are existing methods and this discussion restricted to "morphological characters"?

I think it would be much more useful to broadly define a character as "a feature or property that varies among taxa as defined for a given basis of comparison." This could be a morphological trait, as it often will be for taxonomic considerations. However, characters can also be defined on molecular, physiological, ecological, and behavioral bases (the last perhaps less so for plants).

This would greatly broaden the potential group of users and begin the much needed dialog among disciplines that are not defined by the taxa themselves. Even among morphologists and taxonomists, there are many of us who seek to understand morphology in a variety of contexts.

How are non-morphological characters treated by contemporary methods (besides 0's and 1's)?

"Susan B. Farmer" wrote:

> If I'm going to define a group of objects to define the morphology
> of my organisms (in my case of members of the Trilliaceae), I'll
> have a small set.

Re: Types of data
by Jim Croft 26 Nov '99

><My_description>
>  <leaf>
>    <lamina>
>      <vein_islands>
>        <indumentum density="sparse"/>
>      </vein_islands>
>    </lamina>
>  </leaf>
></My_description>

I really like this open-ended nested or hierarchical approach to characters and the contextual and relational information it brings with it. It would be really nice to be able to handle this arrangement of data in a DELTA or DELTA-like application. There are instances where it would be very useful and would certainly improve ease of drilling through large character sets.

However, data-wise, it is a lot more verbose, and you could end up with a lot of repetition to describe similar indumentum on leaves, stems, pedicels, fruit, etc.; but if someone wants to build such a thing, I'm sure we could live with this... :)

jim

__________________________________________________________________________
Jim Croft ~ jrc(a)anbg.gov.au ~ http://www.anbg.gov.au/people/croft.jim.html
ph 02-6246-5500 ~ fx 02-6246-5248 ~ GPO Box 1600 Canberra ACT 2601
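
For what it is worth, the nested form quoted above can be flattened mechanically into one row per attribute, which is roughly what a DELTA-like application would need. The following Python sketch uses the standard library's `xml.etree.ElementTree`; the `flatten` function and its row layout (feature path, property, value) are assumptions made for the example, not an existing tool.

```python
import xml.etree.ElementTree as ET

# The nested description quoted at the top of this message.
XML = """
<My_description>
  <leaf>
    <lamina>
      <vein_islands>
        <indumentum density="sparse"/>
      </vein_islands>
    </lamina>
  </leaf>
</My_description>
"""


def flatten(element, path=()):
    """Return (feature_path, property, value) rows for every attribute below element."""
    rows = [("/".join(path), name, value) for name, value in element.attrib.items()]
    for child in element:
        rows.extend(flatten(child, path + (child.tag,)))
    return rows


root = ET.fromstring(XML)
rows = flatten(root)  # the root tag itself is not part of the feature path
for feature, prop, value in rows:
    print(f"{feature} | {prop} = {value}")
```

The verbosity lives in the nesting; the flattened row is no longer than a conventional character statement.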

Re: Types of data
by Jim Croft 26 Nov '99

>Surely
>the way you measure an attribute depends on what you are going to use the
>measurements for, ultimately that's a question of precision. If there is a
>possibility that there will be a variety of uses (reuse) for the data, then
>an indication of the degree of precision of the measurement would be highly
>desirable.

Source and precision stuff is surely important, but it is quite possible to end up with major bloat and more meta than meaningful data. This may not be a problem, but there is certainly a practical limit as to how far we can or should go in this direction.

>Isn't it also important to distinguish whether the description refers to an
>actual leaf (ie. that leaf there, on that specimen is ovate) or whether you
>are making a generalisation about a taxon (ie leaves ovate to obovate)?

Most certainly. I must admit I thought we were talking about taxon descriptive data, but there is probably no reason at all why the same principles and data structure/definitions cannot be used for the structures/features themselves.

>A taxon cannot have a leaf shape, only leaves can, and an actual leaf does not
>usually vary in shape except through time.

Conversely, only a taxon can have, or not have, ovate leaves. Is this a generalization or a fact? (and does it matter?)

><my_taxon_description>
>  <specimen>
>    <leaf shape="ovate" length_mm="10"/>
>    <leaf shape="ovate" length_mm="8"/>
>    <leaf shape="obovate" length_mm="25"/>
>  </specimen>
>  <specimen>
>    <leaf shape="ovate" length_mm="5"/>
>    <leaf shape="obovate" length_mm="31"/>
>  </specimen>
></my_taxon_description>

Conceptually I like this representation - it is what we all do and may or may not write down. But as you say, maybe people want to know this, maybe they don't...

jim

__________________________________________________________________________
Jim Croft ~ jrc(a)anbg.gov.au ~ http://www.anbg.gov.au/people/croft.jim.html
ph 02-6246-5500 ~ fx 02-6246-5248 ~ GPO Box 1600 Canberra ACT 2601
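
The step from specimen-level observations to a taxon-level generalization ("leaves ovate to obovate") can be sketched in a few lines of Python using the standard library's `xml.etree.ElementTree`. The `summarise_leaves` function and the layout of its result are hypothetical; the XML is the example quoted above.

```python
import xml.etree.ElementTree as ET

# The specimen-level description quoted in this message.
XML = """
<my_taxon_description>
  <specimen>
    <leaf shape="ovate" length_mm="10"/>
    <leaf shape="ovate" length_mm="8"/>
    <leaf shape="obovate" length_mm="25"/>
  </specimen>
  <specimen>
    <leaf shape="ovate" length_mm="5"/>
    <leaf shape="obovate" length_mm="31"/>
  </specimen>
</my_taxon_description>
"""


def summarise_leaves(root):
    """Collapse per-leaf observations into a taxon-level summary."""
    leaves = list(root.iter("leaf"))
    lengths = [float(leaf.get("length_mm")) for leaf in leaves]
    return {
        "n_specimens": len(root.findall("specimen")),
        "n_leaves": len(leaves),
        "shapes": sorted({leaf.get("shape") for leaf in leaves}),
        "length_mm": (min(lengths), max(lengths)),
    }


summary = summarise_leaves(ET.fromstring(XML))
```

Keeping the individual leaves allows the generalization to be recomputed when specimens are added; storing only the summary does not.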

FW: GEN RE: Types of data
by Don Kirkup 25 Nov '99

> but ovate is rather difficult to parameterize. It may be possible to
> get ellipsoidal parameters, but width and length are clearly
> insufficient.
>
> Does anybody know about good parameterization of shapes, so they can
> be stored in more objective forms?

Wouldn't the most objective representation of shape be given by a set of coordinates for points placed around the outline? If orientation of the object is important, then give landmark coordinates too, e.g. mark the organ base and apex. From that info, parameters such as object area, perimeter length, circularity (perimeter length*perimeter length/area), centroid, longest and shortest axes, aspect ratios etc. can be calculated.

You'd also be able to standardise your outlines (e.g. using Bookstein's transformation) and compare against reference shapes (e.g. elliptic Fourier analysis), make pairwise comparisons (using thin plate splines) or fit to an average shape using the Procrustes metric.

Feature extraction from digital images (e.g. defining the object outline, sampling of points around the outline), calculation of the shape parameters and export to a spreadsheet are virtually automatic in packages like OPTIMAS. James Rohlf's NTSYSpc will perform the basic standardisations, thin plate splines and EFA.

don
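
As a minimal sketch of how some of these parameters follow from outline coordinates alone, the Python below computes area (shoelace formula), perimeter, circularity as defined above (perimeter squared over area) and the area centroid of a closed polygon. The function name and the sample rectangle are assumptions made for the example; packages such as OPTIMAS may well compute these differently.

```python
import math


def outline_parameters(points):
    """Area, perimeter, circularity (P*P/A) and centroid of a closed polygon outline.

    `points` is an ordered list of (x, y) outline coordinates; the last point is
    joined back to the first.
    """
    n = len(points)
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0          # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    signed_area = twice_area / 2.0
    if signed_area == 0:
        raise ValueError("outline has zero area")
    area = abs(signed_area)
    return {
        "area": area,
        "perimeter": perimeter,
        "circularity": perimeter * perimeter / area,
        "centroid": (cx / (6.0 * signed_area), cy / (6.0 * signed_area)),
    }


# Hypothetical outline: a 4 x 2 rectangle.
params = outline_parameters([(0, 0), (4, 0), (4, 2), (0, 2)])
```

With this definition a circle has the smallest possible circularity, 4*pi, and elongated or lobed outlines score higher; the rectangle above gives 12*12/8 = 18.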

First level topics on this list (GEN)
by Gregor Hagedorn 25 Nov '99

The following is an attempt to structure the topics that must be discussed. On a first level, I believe we need to discuss:

(GEN) General topics, e.g. discussion about the agenda in general, the standardization process in general, perhaps exploration of funding possibilities to organize one or several workshops.

(RQT) Requirement analysis and information model for descriptive information in biology: stepping back and asking ourselves what we need in terms of information model, entities/attributes or functionality. We should try to create an outline here to further structure this discussion.

(XML) Exploration of the suitability or desirability of XML to be used as a metaformat for a new exchange format for descriptive data (incl. discussion of XML-Data, XML-Schema, XML-RDF, namespaces, Dublin Core, etc.).

What else?

RQT and XML should definitely be structured further. I will make a proposal for RQT tomorrow.

Please:

- Always use a subject line.
- If possible, add the outline code (GEN, RQT, XML) and perhaps the number in the outline for this topic after your own subject in brackets. See the example for (GEN) in this posting.
- When responding to a posting, please delete any information you are not commenting on. Do not just leave the entire posting as quoted text before or after your own text!

I believe that structuring will help the discussion, but please feel free at any time to submit a discussion paper on any topic you consider important!

Gregor

----------------------------------------------------------
Inst. for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Gregor Hagedorn          Net: G.Hagedorn(a)bba.de
Koenigin-Luise-Str. 19   Tel: +49-30-8304-2220
14195 Berlin, Germany    Fax: +49-30-8304-2203

Often wrong but never in doubt!
