-----Original Message-----
From: Kevin Thiele [mailto:kevin.thiele@BIGPOND.COM]
Sent: 16 March 2004 22:59
To: TDWG-SDD@LISTSERV.NHM.KU.EDU
Subject: Re: SDD Schema in relationship to Prometheus_Response to Kevin
Trevor,
I agree that there will be considerable congruence between SDD and
Prometheus, imposed by the needs of the system. The main
difference is in
our decision early on not to constrain a taxonomist's
ability to express any
necessary differences in SDD.
We looked at a prescribed part-region-property-state data model such as
you've developed for Prometheus early on, and rejected it in favour of a
more general, simple character-state model (such as DELTA had; with the
extension that characters may optionally be arranged into character
hierarchies).
For instance, in an interactive key that I published a few
years ago (The
Families of Flowering Plants of Australia, in Lucid), I have
(amongst many
others) the following characters:
Salt tolerance
  plants tolerating high salt levels (halophytes)
  plants not salt tolerant
General habit
  tree
  shrub
  climber (woody or herbaceous)
  herb
  grass- or sedge-like plant
Epiphytic or lithophytic habit
  plants growing in soil (not epiphytic or lithophytic)
  plants growing on other plants or on bare rock surfaces (epiphytic or lithophytic)
Habit (aquatic herbs only)
  free-floating
  rooted in substrate with leaves all or mostly submerged
  rooted in substrate with leaves mostly floating on the water surface
  rooted in substrate with leaves mostly emergent above the water surface
Seasonal longevity
  annual, biennial or ephemeral
  perennial
Seasonality of leaves (woody plants)
  evergreen
  deciduous or semi-deciduous
Structures for spreading vegetatively
  none (plants not spreading vegetatively)
  underground bulbs, corms or tubers etc
  rhizomes, stolons or root-suckers
  detached aerial stem parts, bulbils or proliferous flowerheads
Chlorophyll in stems or leaves
  present (plants green or grey-green)
  absent (plants colourless, white or yellowish)
Nutritional strategy
  neither carnivorous nor parasitic (normal plants)
  partially or totally parasitic on other plants
  carnivorous
Trap structures (carnivorous plants only)
  submerged or underground bladders
  pitcher-traps
  sticky glands or glandular hairs on leaves and/or stems
  trap like irritable leaf blade segments
I don't see how it would be possible to represent these in
your data model,
but as a taxonomist I need to represent them for my purpose.
There would be
no problem with these in SDD.
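
To make the character-state idea concrete, here is a minimal sketch in
Python. It assumes a character is just a free-text name with a list of
free-text states, an optional free-text scope note, and optional child
characters for the hierarchy. The class, field and function names, and the
"Nutrition" grouping node, are hypothetical illustrations and are not SDD
element names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Character:
    """A free-text character with its states (illustrative names only)."""
    name: str
    states: List[str] = field(default_factory=list)
    # Free-text scope note, e.g. "carnivorous plants only" (an assumption
    # about how the bracketed qualifiers in the list above might be kept).
    applies_to: Optional[str] = None
    # Optional character hierarchy: a character may group other characters.
    children: List["Character"] = field(default_factory=list)


def count_states(character: Character) -> int:
    """Count the states on this character and on everything below it."""
    return len(character.states) + sum(count_states(c) for c in character.children)


# "Nutrition" is a hypothetical grouping node; the two characters under it
# are copied from the list above.
nutrition = Character(
    name="Nutrition",
    children=[
        Character(
            name="Nutritional strategy",
            states=[
                "neither carnivorous nor parasitic (normal plants)",
                "partially or totally parasitic on other plants",
                "carnivorous",
            ],
        ),
        Character(
            name="Trap structures",
            applies_to="carnivorous plants only",
            states=[
                "submerged or underground bladders",
                "pitcher-traps",
                "sticky glands or glandular hairs on leaves and/or stems",
                "trap like irritable leaf blade segments",
            ],
        ),
    ],
)

total_states = count_states(nutrition)
```

Nothing in this structure forces a character to name a part, a region or a
property, which is the point of the more general model.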
The good news I suppose is that since SDD is (hopefully)
more general, there
should be no problem rendering your data in SDD. The more interesting
question is whether your data *model* (the part-structure
stuff) would come
out the other end of an SDD roundtrip.
It should be possible, if you represent your part-structure stuff in a
character hierarchy, and assume after roundtripping that the
hierarchy you
get back can still be interpreted in the same way. That is,
you would need
to check whether the hierarchy you receive conforms with
your model (and I
suppose reject it if it does not).
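
The conformance check described above could look something like the
following sketch. It assumes each node in the returned hierarchy carries a
"kind" label and that the model can be reduced to a table of which kinds
may sit beneath which. The kind names, the nesting rules and the function
name are all assumptions made for illustration; they are not taken from
the Prometheus model or from the SDD schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Assumed nesting rules, for illustration only: the kinds of node that may
# sit directly beneath each kind of node.
ALLOWED_CHILDREN: Dict[str, Set[str]] = {
    "part": {"part", "region", "property"},
    "region": {"property"},
    "property": {"state"},
    "state": set(),
}


@dataclass
class Node:
    kind: str
    label: str
    children: List["Node"] = field(default_factory=list)


def violations(node: Node, path: str = "") -> List[str]:
    """Return a list of places where the hierarchy breaks the assumed rules."""
    here = f"{path}/{node.label}"
    if node.kind not in ALLOWED_CHILDREN:
        # Unknown kinds are reported once; their subtrees are not inspected.
        return [f"{here}: unknown kind '{node.kind}'"]
    problems: List[str] = []
    for child in node.children:
        known = child.kind in ALLOWED_CHILDREN
        if known and child.kind not in ALLOWED_CHILDREN[node.kind]:
            problems.append(
                f"{here}: '{child.kind}' not allowed under '{node.kind}'"
            )
        problems.extend(violations(child, here))
    return problems


# A hierarchy that fits the assumed rules, and one that does not.
conforming = Node("part", "leaf", [
    Node("property", "shape", [Node("state", "ovate")]),
])
non_conforming = Node("part", "plant", [Node("state", "halophyte")])

ok_report = violations(conforming)
bad_report = violations(non_conforming)
```

An empty report means the hierarchy can still be read as a part-structure
model; a non-empty one is the case where the receiving system would reject
it.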
Cheers - k
----- Original Message -----
From: Paterson, Trevor
To: TDWG-SDD@LISTSERV.NHM.KU.EDU
Sent: Tuesday, March 16, 2004 10:36 PM
Subject: Re: SDD Schema in relationship to Prometheus_Response to Kevin
Kevin
Thanks for your reply. It is becoming much clearer to me that a lot of
our thoughts are actually convergent (probably because we are all
thinking about the same issues....). You have clarified a lot of points,
and I have added a little more clarification below....
It looks like a worthwhile task would be to try and
represent our angiosperm
terminology in SDD format at some stage ( time permitting
etc...as ever).
This is probably more straightforward than representing our descriptive
data according to SDD, as our underlying data model is quite different in
some aspects (I think!).
cheers
Trevor
Trevor Paterson PhD
t.paterson@napier.ac.uk
School of Computing
Napier University
Merchiston Campus
10 Colinton Road
Edinburgh
Scotland
EH10 5DT
tel: +44 (0)131 455-2752
www.dcs.napier.ac.uk/~cs175
www.prometheusdb.org
-----Original Message-----
From: Kevin Thiele [mailto:kevin.thiele@BIGPOND.COM]
Sent: 15 March 2004 22:33
To: TDWG-SDD@LISTSERV.NHM.KU.EDU
Subject: Re: SDD Schema in relationship to Prometheus
Apologies: The previous post from me with this title was an
unfinished
version sent off prematurely by my email editor. Please
ignore and use this
one instead.
-----------------------------
Hi Trevor - thanks very much for your comments and
comparative document -
this is really useful, and we need to get much more feedback
like this.
The main difference between SDD and Prometheus seems to be
that you are
working specifically on the basis of defining a controlled
terminology
whereas SDD explicitly decided early on that a controlled
terminology was
outside our scope. History will judge which approach is best.
[Paterson, Trevor]
A controlled terminology was not such a large feature of our work
initially - we were more interested in the model for saving 'character'
data. However, it became obvious that the only way to allow unambiguous
interpretation of data (for reuse, comparison etc.) was to provide full
definitions. It then seemed desirable that people would share definitions
to allow compatibility. Whether this will be achieved by bottom-up
adoption is an open question. Taxonomists don't seem to like the idea of
top-down imposition, though they may be happier when it is restricted to
quite a small domain of users.
We did have early discussions about a controlled terminology (see the list
archives for a history of this). One difficulty for us is that SDD is
designed
to be biology-wide (indeed, we have even removed specific
references to
biology, such as "taxon", because SDD is equally applicable
to descriptions
of non-taxa such as diseases, nutrient deficiency syndromes,
soils and
minerals. Perhaps here we have drawn our bow too wide, but
we were informed
by the fact that at our Lisbon meeting all but one of the
contributors who
were working with identification tools had removed their
biology-specific
tags to become more general). Prometheus (as I understand it
from your
document) is specifically botanical. This would be an intolerable
restriction for us given our brief.
[Paterson, Trevor]
We are constrained by the expertise of whoever we are
collaborating with...
the taxonomists at RBGE are full partners in this project so
the 'test
domains' reflect their interests and expertise ( or we will
never get real
test data). We hope that our character model will be
applicable to the whole
field of biological taxonomy - - and that specific
ontologies/terminologies
could be developed to allow description of other groups (
mammals, insects
etc)
Obviously, a botany-wide controlled terminology is more
achievable than a
biology-wide one. Personally, however, I think that you run
the danger even
in botany with any controlled terminology of trying to force
nature kicking
and screaming into small boxes, and do it an injustice
therewith. I don't
know how any botany-wide controlled terminology could cope
with the leaves
of Drosera auriculata, for instance, or the morphology of
Podostemaceae. (In
fact, I wonder whether the dream of a controlled terminology
is more likely
in a cold Northern Hemisphere climate than in the biodiverse South or
tropics?).
[Paterson, Trevor]
Yes - we know that diverse taxa would probably require specific
ontologies. We may be able to develop a system that allows a core central
terminology with taxon-specific extensions. We want to allow MEANINGFUL
comparison of data, and often there is no need or sense in comparing data
across widely divergent taxa, i.e. you might want to compare the
properties of stalks on angiosperm flowers, but it is probably of no
taxonomic interest to compare these with the stalks of a slime mould
fruiting body.
In general, we have taken the view that a controlled terminology in
particular domains (e.g. legumes) may develop as an emergent
property of
SDD, rather than imposed top-down.
[Paterson, Trevor]
Yes - this is the working model we have come round to: users develop an
ontology and share it with colleagues in a closely related field etc.
On more specific points from your document:
Complexity: SDD was scoped to be a superset of existing systems and
standards e.g. DELTA, Lucid, DeltaAccess, and also to accommodate future
developments that those of us working in the field can envisage but
no-one's really done yet (particularly federation issues - and you may be
further down this track than we are). This is part of the reason for the
complexity.
It is not clear to me whether SDD is proposing this schema as
a unifying schema to which different description formats
would map their
own schema
or
whether the SDD schema is being proposed as a schema for
developers to
(partially) implement when designing applications
and repositories for capturing descriptive data.
It is designed as a unifying standard, to allow lossless
roundtripping
between applications. At the same time, we are struggling
with how much
should be mandatory and how much optional (your second option)
From our own collaborative experiences with botanical
taxonomists, data
models and structures hold no interest to them in
practice, and they find even our simple conceptual model of
character
description complex to understand. Probably few working
taxonomists would wish to interact at any level with the
SDD schema and
applications would have to achieve this mapping
transparently.
On this I'm sure you're right, and we have had many
discussions within SDD
about this problem. There are differing views as to the importance of
taxonomists themselves coming to grips with SDD, as the
standard itself will
generally be invisible to a taxonomist using an
SDD-compliant application.
[Paterson, Trevor]
The problem is that someone has to write and implement the applications
(e.g. me!), and they have to do this in collaboration with taxonomists, or
it will be perceived as an irrelevant imposition. Therefore it will always
be necessary to get at least some taxonomists from a variety of fields to
understand the schema...
From my perspective it actually seems much easier for computer scientists
to get to grips with taxonomy rather than vice versa (although taxonomists
would be a bit defensive about this....)
Translation and multiple language representations: allowing multiple
languages is seen as a fundamental part of the SDD brief.
Life would indeed
be much simpler if everyone spoke the same language, but
they don't so we
need to handle that.
It is not clear whether SDD proposes that a single document
can include
multiple language representations, or whether these
would form separate documents, conforming to the same standard
SDD can handle multiple language representations of every
character string
within the one document.
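
As a rough illustration of what "multiple language representations of
every character string within the one document" could mean for an
application, here is a minimal Python sketch. It assumes each string is
stored once per language code, with a fallback to a default language. The
class name, method name and fallback rule are hypothetical and say nothing
about how the SDD schema itself encodes languages.

```python
from typing import Dict


class MultilingualLabel:
    """One character or state string, held in several languages at once."""

    def __init__(self, texts: Dict[str, str], default_lang: str = "en") -> None:
        if default_lang not in texts:
            raise ValueError(f"no text supplied for default language '{default_lang}'")
        self._texts = dict(texts)
        self._default_lang = default_lang

    def get(self, lang: str) -> str:
        """Return the text for `lang`, or the default language if it is missing."""
        return self._texts.get(lang, self._texts[self._default_lang])


# "tree" is one of the General habit states listed earlier in the thread;
# the German rendering is supplied here only as example data.
habit_tree = MultilingualLabel({"en": "tree", "de": "Baum"})

german = habit_tree.get("de")    # a language that is present
french = habit_tree.get("fr")    # a language that is absent, so falls back
```

The fallback is a design choice for the sketch: an application could just
as reasonably refuse to display a string that has no translation.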
[Paterson, Trevor]
Still not convinced that allowing multiple language representations would
make for good/accurate science.
Multiple expertise levels
I am similarly suspicious of the necessity for including
the ability for
recording different expertise levels in one document format.
Is SDD proposing/allowing multiple representations within the same
document, or just that the same format/standard can be used for documents
aimed at different expertise levels?
There clearly is value in being able to extract/translate
simple language
descriptions from complex data resources - as is
necessary for compiling flora and keys from monographs and original
descriptions. However, is including the ability to describe descriptive
data in language suitable for primary schoolchildren relevant to an
accurate scientific database of taxonomic data?
[Again this would appear to be a political requirement??]
This is not a political requirement, but an attempt to broaden the
application of taxonomy beyond taxonomists (surely a
requirement if the
taxonomic crisis is to be resolved). It also derives neatly
from the XML
underpinning (XML is based on the idea of multiple
representations of a
single document)
[Paterson, Trevor]
Yes, I see where you are coming from. I think the problem is that
taxonomists are concerned about accurate representation of their data for
their purposes; making a shareable version of this is not perceived as of
any value to them. They want a scientific tool to do their job. I am not
sure how (or if) we could encourage a dual markup approach, or whether
different 'markets' for descriptive data would exist independently.
Defining the descriptive terminology
Are you suggesting that the SDD Terminology Section will be
adequate and
appropriate to store
and represent any (allowed) defined terminology?
Yes, we hope so. Do you think it will be inadequate?
[Paterson, Trevor]
It probably would be adequate to store Prometheus terminologies. With
minor tinkering, the semantics of the terminologies could be saved just
with glossary entries and relationships etc., but would obviously require
interpretation by suitable applications. I can't understand concept trees
well enough to get a feel for whether they could store the semantics of a
terminology more explicitly.
However, that is not to say SDD could cope with other terminologies that
might have further unforeseen relationships. There might need to be a
facility for recording 'user defined' relationships such as our stategroup
membership and restrictions between these and sets of structures. I think
that these types of relationships are representable in concept trees, but
I would need a primer/tutorial to show me how.
Is the standard going to allow descriptions to reference
other defined
terminologies?
It will be possible to outsource the terminology section, so
if a group
creates a controlled vocabulary, that could be referenced in
multiple SDD
documents. So presumably Prometheus could be the source of a controlled
vocabulary that other users (if they found it adequate) could reference.
Would SDD only accept Data marked up in an SDD terminology?
Yes - or do I misunderstand this question?
[Paterson, Trevor]
If I understand your previous remark, you could have completely external
terminologies, about whose structure and semantics SDD 'knows' nothing,
so it would have to accept 'non-SDD' terminology.
Would existing terminologies have to be
translated/mapped/redescribed in
SDD format?
Any existing terminology can be represented in SDD, so there
will be no
remapping necessary.
[Paterson, Trevor]
Again, this will only be knowable if people try doing it. Obviously, if
there is a standard structure people would be encouraged to use it, but
pre-existing description ontologies (e.g. PlantOntology and GeneOntology)
would not want to retrofit to the standard. Even if there is no 'redesign
or remapping' necessary, there is the matter of re-expressing it in an SDD
format.
Who is going to create terminologies, e.g. users on an ad hoc basis, or
expert user groups?
As above, these may develop particularly for some groups (e.g. ferns,
legumes), but some users may choose to stay outside such a
system (there
will be benefits and costs of using a controlled vocabulary,
so people will
have to weigh it up for themselves). SDD itself is agnostic.
[Paterson, Trevor]
Almost our opinion - but we are not agnostic. We believe
'interpretability' and 'reusability' are Gods, and we should proselytize
on their behalf - gently of course - by offering only carrots and no
sticks...
Is it an aim to promote re-use and sharing of terminologies?
It would be a desirable outcome, but we hope it evolves
bottom-up rather
than being imposed top-down.
[Paterson, Trevor]
Sure
Is there going to be policing of SDD terminologies, e.g. maintaining
versioning, additions etc?
Versioning will be handled within SDD, but there can be no
possibility of
policing a system
[Paterson, Trevor]
This is what really worries the taxonomists: imposed standards reducing
the flexibility and expressivity of their descriptions, and hatred of a
police state. I think standardisation WILL be perceived to lead to a loss
of expressivity, but as one person's expressivity is often an outsider's
incomprehensibility, this seems like potentially 'a good thing' (in small
doses, obviously).
How was the terminology section created - by examining examples of
terminology specifications,
ontology representations etc?
We have had a mix of off-the-top-of-the-head speculation as
to how best to
do things, and proofing of concepts against real-world
examples. I would
have liked to see more proofing going on during development,
but this has
been hard to maintain, and may be to our cost. We are now at a phase where
several groups are trying to implement SDD-compliance for their systems -
this will be the proof of the pudding.
Note also that SDD is currently v0.9 - with the explicit statement on
release that everything may change if we find that the
proofing fails.
Does it form a standard template for storing a terminology?
What do you mean by this? It provides the standard schema
for representing
an (undefined) terminology.
Is it compatible with any existing tools, standards or
formats - e.g.
ontology editors?
We have specifically made it very general. There is
currently no existing
tool that can handle SDD. Hopefully this will change shortly.
[Paterson, Trevor]
Keep me posted
----------------------------------
So does Prometheus have any data yet, or is it still at the
model stage? It
would be very interesting to try representing Prometheus data in SDD.
[Paterson, Trevor] To date we have:
- the model (and a database implementation to save model-compliant data)
- a tool for creating simple terminologies, with definitions, PartOf,
  Typeof, Stategroup relations etc.
- a prototype angiosperm terminology/ontology
- a prototype tool for using ontologies to specify project description
  templates (proformas), which also allows recording of specimen
  descriptions
- the ability to save these descriptions to the database in the
  model-compliant format
We are currently getting user testing done on the prototype by the
taxonomists, who are making proformas and saving specimen descriptions. We
are just about ready to use the tool for a 'real' project if we can get
time and interest to do this, e.g. for a small taxonomic revision. We are
rapidly approaching the end of our project funding, however, so it may
switch to being a 'back burner' project....
Cheers - k
----- Original Message -----
From: Paterson, Trevor
To: TDWG-SDD@LISTSERV.NHM.KU.EDU
Sent: Monday, March 15, 2004 9:41 PM
Subject: SDD Schema in relationship to Prometheus
Gregor
I have written a rough document considering several aspects of the
SDD-schema - largely interpreted with reference to our
Prometheus Database
model for descriptive data. It seems easier to keep this all together,
rather than post it to various sections on twiki, so I am attaching it
here.
My main problems in interpreting the schema were the lack of
documentation
( as always...) especially for the conceptually complex
parts like concept
trees. I think clear, visual summary models for
description, characters,
concept trees etc would help a novice to get to grips with
the concepts, and
might make some of the complexities more tractable. I do worry that the
overall schema is over-complex and 'trying to do too much in one go', e.g.
considering multiple language and expertise representations, although I am
sure that there are good political reasons for everything.....
yours
trevor