SDD Schema in relationship to Prometheus_Response to Kevin

Kevin Thiele kevin.thiele at BIGPOND.COM
Wed Mar 17 09:59:00 CET 2004


Trevor,

I agree that there will be considerable congruence between SDD and
Prometheus, imposed by the needs of the system. The main difference is in
our decision early on not to constrain a taxonomist's ability to express any
necessary differences in SDD.

We looked at a prescribed part-region-property-state data model such as
you've developed for Prometheus early on, and rejected it in favour of a
more general, simple character-state model (such as DELTA had; with the
extension that characters may optionally be arranged into character
hierarchies).

For instance, in an interactive key that I published a few years ago (The
Families of Flowering Plants of Australia, in Lucid), I have (amongst many
others) the following characters:

Salt tolerance
  plants tolerating high salt levels (halophytes)
  plants not salt tolerant

General habit
  tree
  shrub
  climber (woody or herbaceous)
  herb
  grass- or sedge-like plant

Epiphytic or lithophytic habit
  plants growing in soil (not epiphytic or lithophytic)
  plants growing on other plants or on bare rock surfaces (epiphytic or lithophytic)

Habit (aquatic herbs only)
  free-floating
  rooted in substrate with leaves all or mostly submerged
  rooted in substrate with leaves mostly floating on the water surface
  rooted in substrate with leaves mostly emergent above the water surface

Seasonal longevity
  annual, biennial or ephemeral
  perennial

Seasonality of leaves (woody plants)
  evergreen
  deciduous or semi-deciduous

Structures for spreading vegetatively
  none (plants not spreading vegetatively)
  underground bulbs, corms or tubers etc
  rhizomes, stolons or root-suckers
  detached aerial stem parts, bulbils or proliferous flowerheads

Chlorophyll in stems or leaves
  present (plants green or grey-green)
  absent (plants colourless, white or yellowish)

Nutritional strategy
  neither carnivorous nor parasitic (normal plants)
  partially or totally parasitic on other plants
  carnivorous

Trap structures (carnivorous plants only)
  submerged or underground bladders
  pitcher-traps
  sticky glands or glandular hairs on leaves and/or stems
  trap-like irritable leaf blade segments

I don't see how it would be possible to represent these in your data model,
but as a taxonomist I need to represent them for my purpose. There would be
no problem with these in SDD.
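
As a rough illustration of the general character-state model described above, here is a minimal Python sketch. The class and field names are hypothetical and are not taken from the SDD schema, and nesting "Trap structures" under "Nutritional strategy" is only one assumed way of arranging an optional character hierarchy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Character:
    """A character with a flat list of states (hypothetical names, not SDD's)."""
    name: str
    states: List[str]
    # Optional scope note, e.g. "carnivorous plants only"
    applies_to: Optional[str] = None
    # Optional child characters, giving a character hierarchy
    children: List["Character"] = field(default_factory=list)


nutrition = Character(
    name="Nutritional strategy",
    states=[
        "neither carnivorous nor parasitic (normal plants)",
        "partially or totally parasitic on other plants",
        "carnivorous",
    ],
    children=[
        Character(
            name="Trap structures",
            applies_to="carnivorous plants only",
            states=[
                "submerged or underground bladders",
                "pitcher-traps",
                "sticky glands or glandular hairs on leaves and/or stems",
                "trap-like irritable leaf blade segments",
            ],
        )
    ],
)


def count_states(character: Character) -> int:
    """Count the states of a character and of all its descendants."""
    return len(character.states) + sum(count_states(c) for c in character.children)
```

Nothing in this structure constrains what a character or state may be, which is the point being made: free-text characters such as "Salt tolerance" fit without any part or property vocabulary.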

The good news I suppose is that since SDD is (hopefully) more general, there
should be no problem rendering your data in SDD. The more interesting
question is whether your data *model* (the part-structure stuff) would come
out the other end of an SDD roundtrip.

It should be possible, if you represent your part-structure stuff in a
character hierarchy, and assume after roundtripping that the hierarchy you
get back can still be interpreted in the same way. That is, you would need
to check whether the hierarchy you receive conforms with your model (and I
suppose reject it if it does not).
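
The conformance check suggested above might look something like the following Python sketch. The nesting rules are an assumption read off the phrase "part-region-property-state"; the real Prometheus constraints are not given in this thread, and the `Node` type is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed nesting order, taken from the phrase "part-region-property-state";
# the real Prometheus rules may well differ.
ALLOWED_CHILDREN = {
    "part": {"part", "region", "property"},
    "region": {"property"},
    "property": {"state"},
    "state": set(),
}


@dataclass
class Node:
    kind: str
    label: str
    children: List["Node"] = field(default_factory=list)


def conforms(node: Node) -> bool:
    """Return True if every parent-child link follows ALLOWED_CHILDREN."""
    allowed = ALLOWED_CHILDREN.get(node.kind)
    if allowed is None:
        return False  # unknown kind: cannot be interpreted under the model
    return all(c.kind in allowed and conforms(c) for c in node.children)


# A hierarchy that fits the assumed model, and one that does not.
leaf = Node("part", "leaf", [Node("property", "shape", [Node("state", "ovate")])])
bad = Node("property", "habit", [Node("part", "stem")])
```

A receiving application would run such a check on the hierarchy it gets back and reject (or flag) anything for which it returns False.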

Cheers - k

----- Original Message -----
From: Paterson, Trevor
To: TDWG-SDD at LISTSERV.NHM.KU.EDU
Sent: Tuesday, March 16, 2004 10:36 PM
Subject: Re: SDD Schema in relationship to Prometheus_Response to Kevin


Kevin
Thanks for your reply, it is becoming much clearer to me that actually a lot
of our thoughts are convergent (probably because we are all thinking about
the same issues...). You have clarified a lot of points, and I have added
a little more clarification below...
It looks like a worthwhile task would be to try and represent our angiosperm
terminology in SDD format at some stage (time permitting etc... as ever).
This is probably more straightforward than representing our descriptive data
according to SDD, as our underlying data model is quite different in some
aspects (I think!).
cheers
Trevor


Trevor Paterson PhD
t.paterson at napier.ac.uk
School of Computing
Napier University
Merchiston Campus
10 Colinton Road
Edinburgh
Scotland
EH10 5DT
tel:          +44 (0)131 455-2752
www.dcs.napier.ac.uk/~cs175
www.prometheusdb.org
-----Original Message-----
From: Kevin Thiele [mailto:kevin.thiele at BIGPOND.COM]
Sent: 15 March 2004 22:33
To: TDWG-SDD at LISTSERV.NHM.KU.EDU
Subject: Re: SDD Schema in relationship to Prometheus


Apologies: The previous post from me with this title was an unfinished
version sent off prematurely by my email editor. Please ignore and use this
one instead.

-----------------------------

Hi Trevor - thanks very much for your comments and comparative document -
this is really useful, and we need to get much more feedback like this.

The main difference between SDD and Prometheus seems to be that you are
working specifically on the basis of defining a controlled terminology
whereas SDD explicitly decided early on that a controlled terminology was
outside our scope. History will judge which approach is best.
[Paterson, Trevor]
A controlled terminology was not such a large feature of our work
initially - we were more interested in the model for saving 'character'
data - however it became obvious that the only way to allow unambiguous
interpretation of data - for reuse, comparison etc - was to provide full
definitions. It then seemed desirable that people would share definitions to
allow compatibility... whether this will be achieved by bottom-up
adoption is an open question. Taxonomists don't seem to like the idea of
top-down imposition - though they may be happier when it is restricted to
quite a small domain of users.

We did have early discussions about a controlled terminology (see the list
archives for a history of this). One difficulty for us is that SDD is designed
to be biology-wide (indeed, we have even removed specific references to
biology, such as "taxon", because SDD is equally applicable to descriptions
of non-taxa such as diseases, nutrient deficiency syndromes, soils and
minerals. Perhaps here we have drawn our bow too wide, but we were informed
by the fact that at our Lisbon meeting all but one of the contributors who
were working with identification tools had removed their biology-specific
tags to become more general). Prometheus (as I understand it from your
document) is specifically botanical. This would be an intolerable
restriction for us given our brief.
[Paterson, Trevor]
We are constrained by the expertise of whoever we are collaborating with...
the taxonomists at RBGE are full partners in this project so the 'test
domains' reflect their interests and expertise (or we will never get real
test data). We hope that our character model will be applicable to the whole
field of biological taxonomy - and that specific ontologies/terminologies
could be developed to allow description of other groups (mammals, insects
etc).

Obviously, a botany-wide controlled terminology is more achievable than a
biology-wide one. Personally, however, I think that you run the danger even
in botany with any controlled terminology of trying to force nature kicking
and screaming into small boxes, and do it an injustice therewith. I don't
know how any botany-wide controlled terminology could cope with the leaves
of Drosera auriculata, for instance, or the morphology of Podostemaceae. (In
fact, I wonder whether the dream of a controlled terminology is more likely
in a cold Northern Hemisphere climate than in the biodiverse South or
tropics?).
[Paterson, Trevor]
Yes - we know that diverse taxa would probably require specific ontologies.
We may be able to develop a system that allows a core central terminology -
with taxon-specific extensions... We want to allow MEANINGFUL comparison
of data - and often there is no need or sense in comparing data across
widely divergent taxa, i.e. you might want to compare the properties of
stalks on angiosperm flowers, but it is probably of no taxonomic interest to
compare these with the stalks of a slime mould fruiting body...

In general, we have taken the view that a controlled terminology in
particular domains (e.g. legumes) may develop as an emergent property of
SDD, rather than imposed top-down.
[Paterson, Trevor]
Yes - this is the working model we have come round to... users develop an
ontology and share it with colleagues in a closely related field etc...

On more specific points from your document:

Complexity: SDD was scoped to be a superset of existing systems and
standards, e.g. DELTA, Lucid, DeltaAccess, and also to accommodate future
developments that those of us working in the field can envisage but no-one's
really done yet (particularly federation issues - and you may be further
down this track than we are). This is part of the reason for the complexity.

>It is not clear to me whether SDD is proposing this schema as
>a unifying schema to which different description formats would map their
>own schema
>or
>whether the SDD schema is being proposed as a schema for developers to
>(partially) implement when designing applications
>and repositories for capturing descriptive data.

It is designed as a unifying standard, to allow lossless roundtripping
between applications. At the same time, we are struggling with how much
should be mandatory and how much optional (your second option).

>From our own collaborative experiences with botanical taxonomists, data
>models and structures hold no interest to them in
>practice, and they find even our simple conceptual model of character
>description complex to understand. Probably few working
>taxonomists would wish to interact at any level with the SDD schema and
>applications would have to achieve this mapping
>transparently.

On this I'm sure you're right, and we have had many discussions within SDD
about this problem. There are differing views as to the importance of
taxonomists themselves coming to grips with SDD, as the standard itself will
generally be invisible to a taxonomist using an SDD-compliant application.
[Paterson, Trevor]
The problem is that someone has to write and implement the applications
(e.g. me!) - and they have to do this in collaboration with taxonomists - or
it will be perceived as an irrelevant imposition - therefore it will always
be necessary to get at least some taxonomists from a variety of fields to
understand the schema...
From my perspective it actually seems much easier for computer scientists to
get to grips with taxonomy rather than vice versa (although taxonomists
would be a bit defensive about this...)

Translation and multiple language representations: allowing multiple
languages is seen as a fundamental part of the SDD brief. Life would indeed
be much simpler if everyone spoke the same language, but they don't so we
need to handle that.

>It is not clear whether SDD proposes that a single document can include
>multiple language representations, or whether these
>would form separate documents, conforming to the same standard

SDD can handle multiple language representations of every character string
within the one document.
[Paterson, Trevor]
Still not convinced allowing multiple language representations would make
for good/accurate science
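
To make the "multiple language representations of every character string within the one document" idea concrete, here is a minimal Python sketch. It assumes each string is stored as a mapping from a language code to its text; the `Label` type, the `render` helper and the fallback behaviour are all hypothetical and say nothing about how SDD actually encodes this.

```python
from typing import Dict

# Each human-readable string carries one representation per language code.
# The codes and the German rendering are illustrative only.
Label = Dict[str, str]

perennial: Label = {"en": "perennial", "de": "ausdauernd"}


def render(label: Label, lang: str, fallback: str = "en") -> str:
    """Pick the requested language, falling back to a default if it is missing."""
    if lang in label:
        return label[lang]
    return label[fallback]
```

Here `render(perennial, "de")` picks the German text, while a request for a language that is not present falls back to English rather than failing.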

Multiple expertise levels

>I am similarly suspicious of the necessity for including the ability for
>recording different expertise levels in one document format.
>Is SDD proposing/allowing multiple representations within the same document:
>or just that the same format/standard can be
>used for documents aimed at different expertise level.
>
>There clearly is value in being able to extract/translate simple language
>descriptions from complex data resources - as is
>necessary for compiling flora and keys from monographs and original
>descriptions. However, is including the ability to describe
>descriptive data in language suitable for primary schoolchildren relevant
>to an accurate scientific database of taxonomic data.
>[Again this would appear to be a political requirement??]

This is not a political requirement, but an attempt to broaden the
application of taxonomy beyond taxonomists (surely a requirement if the
taxonomic crisis is to be resolved). It also derives neatly from the XML
underpinning (XML is based on the idea of multiple representations of a
single document)
[Paterson, Trevor]
Yes I see where you are coming from - I think the problem is that
taxonomists are concerned about accurate representation of their data for
their purposes - making a shareable version of this is not perceived as of
any value to them - they want a scientific tool to do their job - I am not
sure how/if we could encourage a dual markup approach - or whether
different 'markets' for descriptive data would exist independently.

Defining the descriptive terminology

>Are you suggesting that the SDD Terminology Section will be adequate and
>appropriate to store
>and represent any (allowed) defined terminology?

Yes, we hope so. Do you think it will be inadequate?
[Paterson, Trevor]
It probably would be adequate to store Prometheus terminologies - with
minor tinkering, the semantics of the terminologies could be saved just with
glossary entries and relationships etc - but would obviously require
interpretation by suitable applications. I can't understand concept trees well
enough to get a feel for whether they could store the semantics of a
terminology more explicitly...
However, that is not to say SDD could cope with other terminologies that
might have further unforeseen relationships - there might need to be a
facility for recording 'user defined' relationships such as our stategroup
membership and restrictions between these and sets of structures - I think
that these types of relationships are representable in concept trees - but
would need a primer/tutorial to show me how.

>Is the standard going to allow descriptions to reference other defined
>terminologies?

It will be possible to outsource the terminology section, so if a group
creates a controlled vocabulary, that could be referenced in multiple SDD
documents. So presumably Prometheus could be the source of a controlled
vocabulary that other users (if they found it adequate) could reference.
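
A minimal Python sketch of the "outsourced terminology" idea: several documents hold references into one shared vocabulary instead of carrying their own copies. The vocabulary name, term IDs and `Document` type are invented for illustration and do not reflect the actual SDD referencing mechanism.

```python
from dataclasses import dataclass
from typing import Dict, List

# A shared vocabulary, identified by a made-up name; term IDs are hypothetical.
VOCABULARIES: Dict[str, Dict[str, str]] = {
    "example-vocab": {
        "t1": "evergreen",
        "t2": "deciduous or semi-deciduous",
    }
}


@dataclass
class Document:
    title: str
    vocabulary: str       # name of the external vocabulary
    term_ids: List[str]   # references into it, instead of local copies


def resolve(doc: Document) -> List[str]:
    """Look up each referenced term; raises KeyError for an unknown reference."""
    terms = VOCABULARIES[doc.vocabulary]
    return [terms[t] for t in doc.term_ids]


# Two documents sharing the same external vocabulary.
doc_a = Document("Key A", "example-vocab", ["t1"])
doc_b = Document("Key B", "example-vocab", ["t1", "t2"])
```

Because both documents point at the same entries, a correction to a definition in the shared vocabulary is picked up by every document that references it.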

>Would SDD only accept Data marked up in an SDD terminology?

Yes - or do I misunderstand this question?
[Paterson, Trevor]
If I understand your previous remark - you could have completely external
terminologies - whose structure and semantics SDD 'knows' nothing about -
so it would have to accept 'non-SDD' terminology

>Would existing terminologies have to be translated/mapped/redescribed in
>SDD format?

Any existing terminology can be represented in SDD, so there will be no
remapping necessary.
[Paterson, Trevor]
Again this will only be knowable if people try doing it. Obviously if there
is a standard structure people would be encouraged to use it - but
pre-existing description ontologies (e.g. Plant Ontology and Gene Ontology)
would not want to retrofit to the standard... even if there is no redesign
or remapping necessary - there is the matter of re-expressing it in an SDD
format

>Who is going to create terminologies, e.g. users on an ad hoc basis, or
>expert user groups?

As above, these may develop particularly for some groups (e.g. ferns,
legumes), but some users may choose to stay outside such a system (there
will be benefits and costs of using a controlled vocabulary, so people will
have to weigh it up for themselves). SDD itself is agnostic.
[Paterson, Trevor]
Almost our opinion - but we are not agnostic - we believe 'interpretability'
and 'reusability' are Gods, and we should proselytize on their behalf -
gently of course - by offering only carrots and no sticks...

>Is it an aim to promote re-use and sharing of terminologies?

It would be a desirable outcome, but we hope it evolves bottom-up rather
than being imposed top-down.
[Paterson, Trevor]
Sure

>Is there going to be policing of SDD terminologies, e.g. maintaining
>versioning, additions etc?

Versioning will be handled within SDD, but there can be no possibility of
policing a system.
[Paterson, Trevor]
This is what really worries the taxonomists - imposed standards reducing the
flexibility and expressivity of their descriptions - and hatred of a police
state. I think standardisation WILL be perceived to lead to a loss of
expressivity - but as often one person's expressivity is
incomprehensibility to an outsider, this seems like potentially 'a good thing'
(in small doses obviously)...


>How was the terminology section created - by examining examples of
>terminology specifications,
>ontology representations etc?

We have had a mix of off-the-top-of-the-head speculation as to how best to
do things, and proofing of concepts against real-world examples. I would
have liked to see more proofing going on during development, but this has
been hard to maintain, and may be to our cost. We are now at a phase where
several groups are trying to implement SDD-compliance for their systems -
this will be the proof of the pudding.

Note also that SDD is currently v0.9 - with the explicit statement on
release that everything may change if we find that the proofing fails.

>Does it form a standard template for storing a terminology?

What do you mean by this? It provides the standard schema for representing
an (undefined) terminology.

>Is it compatible with any existing tools, standards or formats - e.g.
>ontology editors?

We have specifically made it very general. There is currently no existing
tool that can handle SDD. Hopefully this will change shortly.
[Paterson, Trevor]
Keep me posted

----------------------------------
So does Prometheus have any data yet, or is it still at the model stage? It
would be very interesting to try representing Prometheus data in SDD.
[Paterson, Trevor] To date we have:
  the model (and a database implementation to save model-compliant data)
  a tool for creating simple terminologies - with definitions, PartOf,
    TypeOf, Stategroup relations etc
  a prototype angiosperm terminology/ontology
  a prototype tool for using ontologies to specify project description
    templates (proformas), which also allows recording of specimen
    descriptions
  the ability to save these descriptions to the database in the
    model-compliant format
and we are currently getting user testing done on the prototype by the
taxonomists, who are making proformas and saving specimen descriptions. We
are just about ready to use the tool for a 'real' project if we can get time
and interest to do this... e.g. for a small taxonomic revision. We are rapidly
approaching the end of our project funding however... so it may switch to
being a 'back burner' project...



Cheers - k
----- Original Message -----
From: Paterson, Trevor
To: TDWG-SDD at LISTSERV.NHM.KU.EDU
Sent: Monday, March 15, 2004 9:41 PM
Subject: SDD Schema in relationship to Prometheus


Gregor

I have written a rough document considering several aspects of the
SDD-schema - largely interpreted with reference to our Prometheus Database
model for descriptive data. It seems easier to keep this all together,
rather than post it to various sections on twiki, so I am attaching it here.

My main problems in interpreting the schema were the lack of documentation
(as always...) especially for the conceptually complex parts like concept
trees. I think clear, visual summary models for description, characters,
concept trees etc would help a novice to get to grips with the concepts, and
might make some of the complexities more tractable. I do worry that the
overall schema is over-complex and 'trying to do too much in one go' - e.g.
considering multiple language and expertise representations, although I am
sure that there are good political reasons for everything...

yours
trevor


Trevor Paterson PhD
t.paterson at napier.ac.uk
School of Computing
Napier University
Merchiston Campus
10 Colinton Road
Edinburgh
Scotland
EH10 5DT
tel:          +44 (0)131 455-2752
www.dcs.napier.ac.uk/~cs175
www.prometheusdb.org



