Hi Kevin
Looking over the character lists you use for keys, I would say that they can all be expressed by decomposing them into a number of atomic statements (our angiosperm ontology does not yet include some of the structures and states required, but this is not a problem as the terminology is readily expandable). I have briefly outlined the sort of mapping that is done in the table below (note: I am not a botanist, so there may be some gaffes ;-).
The approach that we are proposing is that descriptions are collected as atomic statements, and that more traditional 'characters' can be discovered by analysis of this data (many characters are apparently a collection of atomic scores/states).
Our taxonomists find this quite a departure from how they compose and record their characters at the moment (they recognize/discover and define a set of characters by looking at the variation that exists in their specimens, then create a scoring sheet/proforma that allows them to pick one of these alternative characters). Our system might be tweaked to allow them to work in a more character-oriented manner if they precompose sets of statements as part of the proforma specification, and then score these alternates as present or absent.
A major advantage of our system can be seen from some of your simple characters, e.g. growth habit: you have split this into two alternatives, 1. Epiphytic or lithophytic habit vs 2. (not epiphytic or lithophytic). Whilst this might make sense for a key, and is a DELTA-like representation, we would argue that if the ACTUAL growth habit was scored for each specimen as epiphytic, lithophytic, terrestrial, aquatic (or concatenations of these), far more accurate information would be recorded. For example, this would allow the same specimen description to be divided into other character sets if desired (someone else may think that a key would work better if the alternates were soil-dwelling or lithophytic vs epiphytic; another person might want the alternates separately). If the description data had been recorded in the original two-alternate-character division, this data reuse would not be possible.
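The reuse argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the specimen names, their scores, the two key definitions and the function derive_character are all hypothetical, and none of this is taken from Prometheus or Lucid.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical atomic scores: each specimen records its ACTUAL substrate states.
SPECIMENS: Dict[str, FrozenSet[str]] = {
    "specimen_A": frozenset({"epiphytic"}),
    "specimen_B": frozenset({"lithophytic", "terrestrial"}),
    "specimen_C": frozenset({"terrestrial"}),
    "specimen_D": frozenset({"aquatic"}),
}

# The two-alternative division used in the key (assumed grouping of states).
KEY_TWO_WAY: Dict[str, Set[str]] = {
    "epiphytic or lithophytic": {"epiphytic", "lithophytic"},
    "not epiphytic or lithophytic": {"terrestrial", "aquatic"},
}

# A different division that someone else might prefer for their key.
KEY_SOIL_OR_ROCK: Dict[str, Set[str]] = {
    "soil-dwelling or lithophytic": {"terrestrial", "lithophytic"},
    "epiphytic": {"epiphytic"},
}


def derive_character(
    scores: Dict[str, FrozenSet[str]],
    alternatives: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """For each specimen, list the alternatives whose states overlap its atomic scores."""
    return {
        specimen: sorted(
            name for name, members in alternatives.items() if states & members
        )
        for specimen, states in scores.items()
    }


if __name__ == "__main__":
    # The same atomic data feeds both character divisions.
    print(derive_character(SPECIMENS, KEY_TWO_WAY))
    print(derive_character(SPECIMENS, KEY_SOIL_OR_ROCK))
```

Going the other way is not possible: data recorded only as "epiphytic or lithophytic" cannot be split back into the second division, which is the point being made above.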
I hope this shows some of the salient features of our model, and how we think it would benefit working taxonomists.
LUCID CHARACTERS | STRUCTURE | PROPERTY/STATEGROUP | STATES

Salt tolerance
· plants tolerating high salt levels (halophytes)
· plants not salt tolerant
    Entire Plant | Ecological Adaptations | Halophytic
    Note: there is a list of alternate states that could be scored, or NOT-halophytic is allowed

General habit
· tree
· shrub
· climber (woody or herbaceous)
· herb
· grass- or sedge-like plant
    Entire Plant | Habit | Tree, Shrub, Herb etc. are scorable (or the negative)
    Entire Plant | Architecture | Climbing, Bushy, Creeper, Twining etc.
    Note: we can collect more specific data by scoring more states for additional properties

Epiphytic or lithophytic habit
· plants growing in soil (not epiphytic or lithophytic)
· plants growing on other plants or on bare rock surfaces (epiphytic or lithophytic)
    Entire Plant | Preferred Substrate | Epiphytic, Aquatic, Lithophytic, Terrestrial

Habit (aquatic herbs only)
· free-floating
· rooted in substrate with leaves all or mostly submerged
· rooted in substrate with leaves mostly floating on the water surface
· rooted in substrate with leaves mostly emergent above the water surface
    Root | Root attachment | free-floating, substrate-attached
    Note: we don't have appropriate terms etc. for these states in our ontology as yet, but they could be added
    Leaf | Aquatic Position | floating, submerged, emergent

Seasonal longevity
· annual, biennial or ephemeral
· perennial
    Entire Plant | Lifespan | Annual, Biennial, Ephemeral, Perennial

Seasonality of leaves (woody plants)
· evergreen
· deciduous or semi-deciduous
    Leaf | Lifespan | deciduous, semi-deciduous, evergreen

Structures for spreading vegetatively
· none (plants not spreading vegetatively)
· underground bulbs, corms or tubers etc
· rhizomes, stolons or root-suckers
· detached aerial stem parts, bulbils or proliferous flowerheads
    Entire Plant | sex and reproduction | vegetative
    Note: list of alternatives, or use NOT
    Bulb | Presence | present, absent
    Corm | Presence | present, absent
    Tuber | Presence | present, absent
    Rhizome | Presence | present, absent
    Stolon | Presence | present, absent
    Root-sucker | Presence | present, absent
    detached aerial stem parts | Presence | present, absent
    bulbils | Presence | present, absent
    inflorescence | Type | proliferous
    Note: we can identify 'types' of structures, with associated sets of states (aerial stem parts might be a type of stem)

Chlorophyll in stems or leaves
· present (plants green or grey-green)
· absent (plants colourless, white or yellowish)
    Leaf-Chlorophyll | Presence | present, absent
    Note: uses our structure hierarchy to identify which chlorophyll we are describing
    Stem-Chlorophyll | Presence | present, absent
    Entire Plant | Colour | specify any colour

Nutritional strategy
· neither carnivorous nor parasitic (normal plants)
· partially or totally parasitic on other plants
· carnivorous
    Entire Plant | Habit-Lifestyle | carnivorous, parasite, partial parasite, etc.
    Note: any combination of states including NOT can be allowed

Trap structures (carnivorous plants only)
· submerged or underground bladders
· pitcher-traps
· sticky glands or glandular hairs on leaves and/or stems
· trap like irritable leaf blade segments
    Note: we haven't had to address traps yet, but we have a number of ways in which the terminology can be expanded to represent this information:
    · we don't have 'trap' as a structure in our ontology yet - we could add trap structure in various structural contexts, and allow scoring presence or absence.
    · we can add the presence of hairs or glandular hairs anywhere - and again score presence/absence
    · we would have to add some states to the ontology - e.g. irritable
Trevor Paterson PhD t.paterson@napier.ac.uk
School of Computing Napier University Merchiston Campus 10 Colinton Road Edinburgh Scotland EH10 5DT
tel: +44 (0)131 455-2752
http://www.dcs.napier.ac.uk/~cs175
http://www.prometheusdb.org
-----Original Message-----
From: Kevin Thiele [mailto:kevin.thiele@BIGPOND.COM]
Sent: 16 March 2004 22:59
To: TDWG-SDD@LISTSERV.NHM.KU.EDU
Subject: Re: SDD Schema in relationship to Prometheus_Response to Kevin
Trevor,
I agree that there will be considerable congruence between SDD and Prometheus, imposed by the needs of the system. The main difference is in our decision early on not to constrain a taxonomist's ability to express any necessary differences in SDD.
We looked at a prescribed part-region-property-state data model such as you've developed for Prometheus early on, and rejected it in favour of a more general, simple character-state model (such as DELTA had, with the extension that characters may optionally be arranged into character hierarchies).
For instance, in an interactive key that I published a few years ago (The Families of Flowering Plants of Australia, in Lucid), I have (amongst many others) the following characters:
Salt tolerance
· plants tolerating high salt levels (halophytes)
· plants not salt tolerant
General habit
· tree
· shrub
· climber (woody or herbaceous)
· herb
· grass- or sedge-like plant
Epiphytic or lithophytic habit
· plants growing in soil (not epiphytic or lithophytic)
· plants growing on other plants or on bare rock surfaces (epiphytic or lithophytic)
Habit (aquatic herbs only)
· free-floating
· rooted in substrate with leaves all or mostly submerged
· rooted in substrate with leaves mostly floating on the water surface
· rooted in substrate with leaves mostly emergent above the water surface
Seasonal longevity
· annual, biennial or ephemeral
· perennial
Seasonality of leaves (woody plants)
· evergreen
· deciduous or semi-deciduous
Structures for spreading vegetatively
· none (plants not spreading vegetatively)
· underground bulbs, corms or tubers etc
· rhizomes, stolons or root-suckers
· detached aerial stem parts, bulbils or proliferous flowerheads
Chlorophyll in stems or leaves
· present (plants green or grey-green)
· absent (plants colourless, white or yellowish)
Nutritional strategy
· neither carnivorous nor parasitic (normal plants)
· partially or totally parasitic on other plants
· carnivorous
Trap structures (carnivorous plants only)
· submerged or underground bladders
· pitcher-traps
· sticky glands or glandular hairs on leaves and/or stems
· trap like irritable leaf blade segments
I don't see how it would be possible to represent these in your data model, but as a taxonomist I need to represent them for my purpose. There would be no problem with these in SDD.
The good news I suppose is that since SDD is (hopefully) more general, there should be no problem rendering your data in SDD. The more interesting question is whether your data *model* (the part-structure stuff) would come out the other end of an SDD roundtrip.
It should be possible, if you represent your part-structure stuff in a character hierarchy, and assume after roundtripping that the hierarchy you get back can still be interpreted in the same way. That is, you would need to check whether the hierarchy you receive conforms with your model (and I suppose reject it if it does not).
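As a rough sketch of the conformance check described above, assuming the hierarchy that comes back from a roundtrip can be read as structure -> property -> states. The MODEL fragment and the function conformance_errors are hypothetical illustrations; this does not use the real SDD schema or the Prometheus database.

```python
from typing import Dict, List, Set

# Hypothetical fragment of a part-structure model: structure -> property -> allowed states.
MODEL: Dict[str, Dict[str, Set[str]]] = {
    "Entire Plant": {
        "Preferred Substrate": {"Epiphytic", "Aquatic", "Lithophytic", "Terrestrial"},
    },
    "Leaf": {
        "Lifespan": {"deciduous", "semi-deciduous", "evergreen"},
    },
}


def conformance_errors(
    hierarchy: Dict[str, Dict[str, List[str]]],
    model: Dict[str, Dict[str, Set[str]]] = MODEL,
) -> List[str]:
    """Return a list of problems; an empty list means the hierarchy fits the model."""
    errors: List[str] = []
    for structure, properties in hierarchy.items():
        if structure not in model:
            errors.append(f"unknown structure: {structure}")
            continue
        for prop, states in properties.items():
            if prop not in model[structure]:
                errors.append(f"unknown property: {structure}/{prop}")
                continue
            for state in states:
                if state not in model[structure][prop]:
                    errors.append(f"unknown state: {structure}/{prop}/{state}")
    return errors


if __name__ == "__main__":
    received = {"Leaf": {"Lifespan": ["evergreen", "marcescent"]}}
    problems = conformance_errors(received)
    # Reject the roundtripped data if anything falls outside the model.
    print("accepted" if not problems else f"rejected: {problems}")
```

A receiving application could accept the data only when the returned list is empty, which corresponds to the "reject it if it does not" step.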
Cheers - k
----- Original Message -----
From: Paterson, Trevor
To: TDWG-SDD@LISTSERV.NHM.KU.EDU
Sent: Tuesday, March 16, 2004 10:36 PM
Subject: Re: SDD Schema in relationship to Prometheus_Response to Kevin
Kevin
Thanks for your reply; it is becoming much clearer to me that actually a lot of our thoughts are convergent (probably because we are all thinking about the same issues...). You have clarified a lot of points, and I have added a little more clarification below... It looks like a worthwhile task would be to try and represent our angiosperm terminology in SDD format at some stage (time permitting etc., as ever). This is probably more straightforward than representing our descriptive data according to SDD, as our underlying data model is quite different in some aspects (I think!!!).
cheers
Trevor
Trevor Paterson PhD t.paterson@napier.ac.uk School of Computing Napier University Merchiston Campus 10 Colinton Road Edinburgh Scotland EH10 5DT tel: +44 (0)131 455-2752 www.dcs.napier.ac.uk/~cs175 www.prometheusdb.org
-----Original Message-----
From: Kevin Thiele [mailto:kevin.thiele@BIGPOND.COM]
Sent: 15 March 2004 22:33
To: TDWG-SDD@LISTSERV.NHM.KU.EDU
Subject: Re: SDD Schema in relationship to Prometheus
Apologies: The previous post from me with this title was an unfinished version sent off prematurely by my email editor. Please ignore and use this one instead.
Hi Trevor - thanks very much for your comments and comparative document - this is really useful, and we need to get much more feedback like this.
The main difference between SDD and Prometheus seems to be that you are working specifically on the basis of defining a controlled terminology whereas SDD explicitly decided early on that a controlled terminology was outside our scope. History will judge which approach is best.
[Paterson, Trevor] A controlled terminology was not such a large feature of our work initially - we were more interested in the model for saving 'character' data - however, it became obvious that the only way to allow unambiguous interpretation of data - for reuse, comparison etc. - was to provide full definitions. It then seemed desirable that people would share definitions to allow compatibility... whether this will be achieved by bottom-up adoption is an open question. Taxonomists don't seem to like the idea of top-down imposition, though they may be happier when it is restricted to quite a small domain of users.
We did have early discussions about a controlled terminology (see the list archives for a history of this). One difficulty for us is that SDD is designed to be biology-wide (indeed, we have even removed specific references to biology, such as "taxon", because SDD is equally applicable to descriptions of non-taxa such as diseases, nutrient deficiency syndromes, soils and minerals. Perhaps here we have drawn our bow too wide, but we were informed by the fact that at our Lisbon meeting all but one of the contributors who were working with identification tools had removed their biology-specific tags to become more general). Prometheus (as I understand it from your document) is specifically botanical. This would be an intolerable restriction for us given our brief.
[Paterson, Trevor] We are constrained by the expertise of whoever we are collaborating with... the taxonomists at RBGE are full partners in this project, so the 'test domains' reflect their interests and expertise (or we will never get real test data). We hope that our character model will be applicable to the whole field of biological taxonomy, and that specific ontologies/terminologies could be developed to allow description of other groups (mammals, insects etc.).
Obviously, a botany-wide controlled terminology is more achievable than a biology-wide one. Personally, however, I think that you run the danger even in botany with any controlled terminology of trying to force nature kicking and screaming into small boxes, and do it an injustice therewith. I don't know how any botany-wide controlled terminology could cope with the leaves of Drosera auriculata, for instance, or the morphology of Podostemaceae. (In fact, I wonder whether the dream of a controlled terminology is more likely in a cold Northern Hemisphere climate than in the biodiverse South or tropics?).
[Paterson, Trevor] Yes - we know that diverse taxa would probably require specific ontologies. We may be able to develop a system that allows a core central terminology, with taxon-specific extensions... We want to allow MEANINGFUL comparison of data - and often there is no need or sense in comparing data across widely divergent taxa, i.e. you might want to compare the properties of stalks on angiosperm flowers, but it is probably of no taxonomic interest to compare these with the stalks of a slime mould fruiting body...
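One way to picture the "core terminology with taxon-specific extensions" idea is the Python sketch below. It is an assumption-laden illustration only: the term lists, the taxon keys and both function names are hypothetical and are not drawn from the Prometheus ontology.

```python
from typing import Dict, FrozenSet

# Hypothetical core terms shared by every terminology.
CORE_TERMS: FrozenSet[str] = frozenset({"stalk", "colour", "length"})

# Hypothetical taxon-specific extensions layered on top of the core.
EXTENSIONS: Dict[str, FrozenSet[str]] = {
    "angiosperms": frozenset({"petal", "pedicel", "inflorescence"}),
    "slime moulds": frozenset({"sporangium", "plasmodium"}),
}


def terminology_for(taxon: str) -> FrozenSet[str]:
    """Core terms plus whatever extension exists for the given taxon."""
    return CORE_TERMS | EXTENSIONS.get(taxon, frozenset())


def comparable_terms(taxon_a: str, taxon_b: str) -> FrozenSet[str]:
    """Terms defined in both terminologies, i.e. where comparison is at least possible."""
    return terminology_for(taxon_a) & terminology_for(taxon_b)


if __name__ == "__main__":
    print(sorted(terminology_for("angiosperms")))
    print(sorted(comparable_terms("angiosperms", "slime moulds")))
```

Sharing a term such as "stalk" only makes a comparison technically possible; whether it is taxonomically meaningful remains a judgement for the user, as noted above.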
In general, we have taken the view that a controlled terminology in particular domains (e.g. legumes) may develop as an emergent property of SDD, rather than imposed top-down.
[Paterson, Trevor] Yes - this is the working model we have come round to... users develop an ontology and share it with colleagues in a closely related field etc.
On more specific points from your document:
Complexity: SDD was scoped to be a superset of existing systems and standards, e.g. DELTA, Lucid, DeltaAccess, and also to accommodate future developments that those of us working in the field can envisage but no-one's really done yet (particularly federation issues - and you may be further down this track than we are). This is part of the reason for the complexity.
It is not clear to me whether SDD is proposing this schema as a unifying schema to which different description formats
would map their own schema
or whether the SDD schema is being proposed as a schema for
developers to (partially) implement when designing applications
and repositories for capturing descriptive data.
It is designed as a unifying standard, to allow lossless roundtripping between applications. At the same time, we are struggling with how much should be mandatory and how much optional (your second option).
From our own collaborative experiences with botanical
taxonomists, data models and structures hold no interest to them in
practice, and they find even our simple conceptual model of
character description complex to understand. Probably few working
taxonomists would wish to interact at any level with the
SDD schema and applications would have to achieve this mapping
transparently.
On this I'm sure you're right, and we have had many discussions within SDD about this problem. There are differing views as to the importance of taxonomists themselves coming to grips with SDD, as the standard itself will generally be invisible to a taxonomist using an SDD-compliant application.
[Paterson, Trevor] The problem is that someone has to write and implement the applications (e.g. me!) - and they have to do this in collaboration with taxonomists, or it will be perceived as an irrelevant imposition - therefore it will always be necessary to get at least some taxonomists from a variety of fields to understand the schema...
From my perspective it actually seems much easier for computer scientists to get to grips with taxonomy than vice versa (although taxonomists would be a bit defensive about this...).
Translation and multiple language representations: allowing multiple languages is seen as a fundamental part of the SDD brief. Life would indeed be much simpler if everyone spoke the same language, but they don't so we need to handle that.
It is not clear whether SDD proposes that a single document
can include multiple language representations, or whether these
would form separate documents, conforming to the same standard
SDD can handle multiple language representations of every character string within the one document.
[Paterson, Trevor] Still not convinced that allowing multiple language representations would make for good/accurate science.
Multiple expertise levels
I am similarly suspicious of the necessity for including
the ability for recording different expertise levels in one document format.
Is SDD proposing/allowing multiple representations within
the same document : or just that the same format/standard can be
used for documents aimed at different expertise level.
There clearly is value in being able to extract/translate
simple language descriptions from complex data resources - as is
necessary for compiling flora and keys from monographs and original
descriptions. However, is including the ability to describe
descriptive data in language suitable for primary
schoolchildren relevant to an accurate scientific database of taxonomic data.
[Again this would appear to be a political requirement??]
This is not a political requirement, but an attempt to broaden the application of taxonomy beyond taxonomists (surely a requirement if the taxonomic crisis is to be resolved). It also derives neatly from the XML underpinning (XML is based on the idea of multiple representations of a single document).
[Paterson, Trevor] Yes, I see where you are coming from - I think the problem is that taxonomists are concerned about accurate representation of their data for their purposes - making a shareable version of this is not perceived as of any value to them - they want a scientific tool to do their job. I am not sure how/if we could encourage a dual markup approach, or whether different 'markets' for descriptive data would exist independently.
Defining the descriptive terminology
Are you suggesting that the SDD Terminology Section will be
adequate and appropriate to store
and represent any (allowed) defined terminology?
Yes, we hope so. Do you think it will be inadequate?
[Paterson, Trevor] It probably would be adequate to store Prometheus terminologies - with minor tinkering, the semantics of the terminologies could be saved just with glossary entries and relationships etc. - but would obviously require interpretation by suitable applications. I can't understand concept trees well enough to get a feel for whether they could store the semantics of a terminology more explicitly... However, that is not to say SDD could cope with other terminologies that might have further unforeseen relationships - there might need to be a facility for recording 'user defined' relationships such as our stategroup membership and restrictions between these and sets of structures. I think that these types of relationships are representable in concept trees, but I would need a primer/tutorial to show me how.
Is the standard going to allow descriptions to reference
other defined terminologies?
It will be possible to outsource the terminology section, so if a group creates a controlled vocabulary, that could be referenced in multiple SDD documents. So presumably Prometheus could be the source of a controlled vocabulary that other users (if they found it adequate) could reference.
Would SDD only accept Data marked up in an SDD terminology?
Yes - or do I misunderstand this question?
[Paterson, Trevor] If I understand your previous remark, you could have completely external terminologies whose structure and semantics SDD 'knows' nothing about, so it would have to accept 'non-SDD' terminology.
Would existing terminologies have to be
translated/mapped/redescribed in SDD format?
Any existing terminology can be represented in SDD, so there will be no remapping necessary.
[Paterson, Trevor] Again, this will only be knowable if people try doing it. Obviously if there is a standard structure people would be encouraged to use it - but pre-existing description ontologies (e.g. PlantOntology and GeneOntology) would not want to retrofit to the standard... even if there is no 'redesign or remapping' necessary, there is the matter of re-expressing it in an SDD format.
Who is going to create terminologies, e.g. users on an
ad hoc basis, or expert user groups?
As above, these may develop particularly for some groups (e.g. ferns, legumes), but some users may choose to stay outside such a system (there will be benefits and costs of using a controlled vocabulary, so people will have to weigh it up for themselves). SDD itself is agnostic.
[Paterson, Trevor] Almost our opinion - but we are not agnostic - we believe 'interpretability' and 'reusability' are Gods, and we should proselytize on their behalf - gently of course - by offering only carrots and no sticks...
Is it an aim to promote re-use and sharing of terminologies?
It would be a desirable outcome, but we hope it evolves bottom-up rather than being imposed top-down. [Paterson, Trevor] Sure
Is there going to be policing of SDD terminologies, e.g. maintaining
versioning, additions etc?
Versioning will be handled within SDD, but there can be no possibility of policing a system.
[Paterson, Trevor] This is what really worries the taxonomists - imposed standards reducing the flexibility and expressivity of their descriptions - and hatred of a police state. I think standardisation WILL be perceived to lead to a loss of expressivity - but as one person's expressivity is often another's incomprehensibility, this seems like potentially 'a good thing' (in small doses, obviously)...
How was the terminology section created - by examining examples of
terminology specifications,
ontology representations etc?
We have had a mix of off-the-top-of-the-head speculation as to how best to do things, and proofing of concepts against real-world examples. I would have liked to see more proofing going on during development, but this has been hard to maintain, and may be to our cost. We are now at a phase where several groups are trying to implement SDD-compliance for their systems - this will be the proof of the pudding.
Note also that SDD is currently v0.9 - with the explicit statement on release that everything may change if we find that the proofing fails.
Does it form a standard template for storing a terminology?
What do you mean by this? It provides the standard schema for representing an (undefined) terminology.
Is it compatible with any existing tools, standards or
formats - e.g. ontology editors?
We have specifically made it very general. There is currently no existing tool that can handle SDD. Hopefully this will change shortly.
[Paterson, Trevor] Keep me posted
So does Prometheus have any data yet, or is it still at the model stage? It would be very interesting to try representing Prometheus data in SDD.
[Paterson, Trevor] To date we have:
· the model (and a database implementation to save model-compliant data)
· a tool for creating simple terminologies, with definitions, PartOf, Typeof, Stategroup relations etc.
· a prototype angiosperm terminology/ontology
· a prototype tool for using ontologies to specify project description templates (proformas), which also allows recording of specimen descriptions
· the ability to save these descriptions to the database in the model-compliant format
We are currently getting user testing done on the prototype by the taxonomists, who are making proformas and saving specimen descriptions. We are just about ready to use the tool for a 'real' project if we can get time and interest to do this, e.g. for a small taxonomic revision. We are rapidly approaching the end of our project funding, however, so it may switch to being a 'back burner' project...
Cheers - k
----- Original Message -----
From: Paterson, Trevor
To: TDWG-SDD@LISTSERV.NHM.KU.EDU
Sent: Monday, March 15, 2004 9:41 PM
Subject: SDD Schema in relationship to Prometheus
Gregor
I have written a rough document considering several aspects of the SDD-schema - largely interpreted with reference to our Prometheus Database model for descriptive data. It seems easier to keep this all together, rather than post it to various sections on twiki, so I am attaching it here.
My main problems in interpreting the schema were the lack of documentation ( as always...) especially for the conceptually complex parts like concept trees. I think clear, visual summary models for description, characters, concept trees etc would help a novice to get to grips with the concepts, and might make some of the complexities more tractable. I do worry that the overall schema is over complex and 'trying to do too much in one go' - eg considering multiple language and expertise representations, although I am sure that there are good political reasons for everything.....
yours trevor
Trevor Paterson PhD t.paterson@napier.ac.uk School of Computing Napier University Merchiston Campus 10 Colinton Road Edinburgh Scotland EH10 5DT tel: +44 (0)131 455-2752 www.dcs.napier.ac.uk/~cs175 www.prometheusdb.org