<RAMBLING_PHILOSOPHY flame="occasionally" biology="rarely">
Erik Westlin writes:
In my mind it's totally wrong to create a language for descriptions using XML. Instead create a dedicated language for descriptions - then construct an XML generator for this language if you like.
One would do this only if one decided that XML and its associated languages did not have enough expressive power to solve the problem at hand. If they do, not using them means abandoning LOTS of existing tools for parsing, transformation, presentation, data exchange and data entry.
I'm in computer science; my interest in botany is just a small part-time hobby. But I hate to see a lot of energy go to waste and I would like to see a computer flora with a deep knowledge I could interact with.
This is a good goal for system architects, but I doubt that treatment representation will be the enabling technology for it. Rather, it need only not preclude it. What taxonomists mean by a treatment is too narrow to be the sole support for what non-taxonomists, especially non-biologists, need from computer systems about organisms.
Treatment data is only part of the knowledge that an author of, say, a field guide, must represent in the guide, or that is needed by projects as ambitious as the Global Biodiversity Information Facility (www.gbif.org), or many of the numerous efforts with intersecting goals (http://www.gbif.org/relafram.htm).
Here's an example that can not(?) be represented in what Kevin is attempting, nor does it belong there: Some years ago on a rural road in upstate New York, I passed a litter of fox cubs who---at the likely cost of their own lives---would not leave their mother as she lay dying in the middle of the road, obviously hit by a car. There is a lot of important behavioral and ecological information encompassed in that scene: the age of the cubs; the rearing habits of the species; the impact of roads on populations; etc. etc. None of this matters to a taxonomist (well, maybe a little. Probably one can deduce from the behavior that foxes are mammals).
Kevin's effort, and TDWG's charter, is foremost about data in the service of knowledge, rather than knowledge in the service of users. As such, his effort supports the system builders, especially those with a desire to exchange or federate data or (one hopes) to build extensible systems.
I think you should consult some people in the knowledge representation field.
<FLAME level="extreme" offense_intended="none">
This is the last thing one would do if one believes---as I do---that computer scientists should be building tools that /free/ biologists from computer science specialists rather than indenture biologists to computer scientists. Were it truly necessary, it would be analogous to advice to consult with automotive engineers, physicists and organic chemists before you buy a car.
</FLAME>
Plant descriptive data is much too complex to be captured in XML which is
I don't believe this. XML in particular and semi-structured data in general can support schema inference (see especially the literature referenced in "Data on the Web: From Relations to Semistructured Data and XML" (Morgan Kaufmann Series in Data Management Systems) -- Serge Abiteboul, et al; 2000). This alone probably disproves your claim.
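As a minimal sketch of what inferring structure from instances can look like (this is not a method taken from the cited literature; the markup, the values in it, and the function name infer_schema are all invented for illustration), one can walk a document and record which children and attributes each element is actually seen with:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical descriptive markup. Element names, attribute names and
# values are invented for illustration, not taken from any real standard.
SAMPLE = """
<treatment>
  <taxon name="Acer rubrum">
    <character name="leaf shape" value="palmately lobed"/>
    <character name="leaf length" value="9" unit="cm"/>
  </taxon>
  <taxon name="Acer saccharum">
    <character name="leaf shape" value="palmately lobed"/>
    <note>Free-text remark.</note>
  </taxon>
</treatment>
"""

def infer_schema(root):
    """For each element name, collect the child and attribute names observed."""
    children = defaultdict(set)
    attributes = defaultdict(set)
    for elem in root.iter():
        attributes[elem.tag].update(elem.attrib)          # attribute names only
        children[elem.tag].update(child.tag for child in elem)
    return {
        tag: {
            "children": sorted(children[tag]),
            "attributes": sorted(attributes[tag]),
        }
        for tag in children
    }

schema = infer_schema(ET.fromstring(SAMPLE))
for tag, info in schema.items():
    print(tag, info)
```

This only gathers the union of what was observed; a real inference tool would also have to decide about ordering, optionality and data types.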
more geared towards presentation than representation.
Ummm.... unless these terms mean something different in the Knowledge Representation field than in the Document Management field, this is the opposite of what most people believe:
"Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure." Sec. 1 (Origin and Goals) of the XML 1.0 Recommendation, http://www.w3.org/TR/REC-xml#sec-origin-goals
So strong is the desire to separate presentation that only a small part of XSL, the Extensible Stylesheet Language, addresses it (Section 6, "Formatting Objects", http://www.w3.org/TR/xsl/slice6.html#fo-section).
In general, neither Kevin's goals nor XML impose any presentation or other semantics at all. That's good, because it leaves knowledge representation to the application builder---who one hopes has lots of help from biologists, all of whom are smarter than software.
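To make the point concrete, here is a small sketch, assuming an invented fragment of descriptive markup (the element names and the helper functions as_text and as_html are hypothetical). The same data is presented two different ways, and nothing in the markup itself dictates either one:

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical markup: it records structure only, with no hint of layout.
DOC = ET.fromstring(
    '<taxon name="Acer rubrum">'
    '<character name="leaf shape" value="palmately lobed"/>'
    '</taxon>'
)

def as_text(taxon):
    """One presentation: an indented plain-text listing."""
    lines = [taxon.get("name")]
    lines += [f"  {c.get('name')}: {c.get('value')}" for c in taxon.iter("character")]
    return "\n".join(lines)

def as_html(taxon):
    """Another presentation of the same data: an HTML table."""
    rows = "".join(
        f"<tr><td>{escape(c.get('name'))}</td><td>{escape(c.get('value'))}</td></tr>"
        for c in taxon.iter("character")
    )
    return f"<table><caption>{escape(taxon.get('name'))}</caption>{rows}</table>"

print(as_text(DOC))
print(as_html(DOC))
```

Both presentations live entirely in the application; the markup would be unchanged if a third were added.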
</RAMBLING_PHILOSOPHY>
Bob Morris
p.s. After about a year of hanging out with a variety of biologists, I've come to think of the taxonomists as the grammarians and etymologists (not /entomologists/) of Life:
"etymology n., pl. The origin and historical development of a linguistic form as shown by determining its basic elements, earliest known use, and changes in form and meaning, tracing its transmission from one language to another, identifying its cognates in other languages, and reconstructing its ancestral form where possible."[1]
Taxonomists, God bless 'em. You can't live with 'em and you can't live without 'em.
[1] The American Heritage Dictionary of the English Language, Third Edition, 1996, as cited after ALT-Click with software installed from guruNet. This software is incredibly cool. It gives you information about any word in /any/ application on your Windows or Palm screen. See http://www.gurunet.com.