Apologies: The previous post from me with this
title was an unfinished version sent off prematurely by my email editor. Please
ignore and use this one instead.
-----------------------------
Hi Trevor - thanks very much for your comments and
comparative document - this is really useful, and we need to get much more
feedback like this.
The main difference between SDD and Prometheus
seems to be that you are working specifically on the basis of defining a
controlled terminology whereas SDD explicitly decided early on that a controlled
terminology was outside our scope. History will judge which approach is
best.
We did have early discussions about a controlled
terminology (see the list archives for a history of this). One difficulty for us
is that SDD is designed to be biology-wide (indeed, we have even removed
specific references to biology, such as "taxon", because SDD is equally
applicable to descriptions of non-taxa such as diseases, nutrient deficiency
syndromes, soils and minerals. Perhaps here we have drawn our bow too wide, but
we were informed by the fact that at our Lisbon meeting all but one of the
contributors who were working with identification tools had removed their
biology-specific tags to become more general). Prometheus (as I understand it
from your document) is specifically botanical. This would be an intolerable
restriction for us given our brief.
Obviously, a botany-wide controlled terminology is
more achievable than a biology-wide one. Personally, however, I think that you
run the danger even in botany with any controlled terminology of trying to force
nature kicking and screaming into small boxes, and do it an injustice therewith.
I don't know how any botany-wide controlled terminology could cope with the
leaves of Drosera auriculata, for instance, or the morphology of
Podostemaceae. (In fact, I wonder whether the dream of a controlled terminology
is more likely in a cold Northern Hemisphere climate than in the biodiverse
South or tropics?).
In general, we have taken the view that a
controlled terminology in particular domains (e.g. legumes) may develop as an
emergent property of SDD, rather than imposed top-down.
On more specific points from your
document:
Complexity: SDD was scoped to be a
superset of existing systems and standards, e.g. DELTA, Lucid, DeltaAccess, and
also to accommodate future developments that those of us working in the field
can envisage but no-one's really done yet (particularly federation issues - and
you may be further down this track than we are). This is part of the reason for
the complexity.
>It is not clear to me whether SDD is proposing this schema as
>a unifying schema to which different description formats would
>map their own schema
>or
>whether the SDD schema is being proposed as a schema for
>developers to (partially) implement when designing applications
>and repositories for capturing descriptive data.
It is designed as a unifying standard, to allow lossless
roundtripping between applications. At the same time, we are struggling with how
much should be mandatory and how much optional (your second
option).
>From our own collaborative experiences with botanical
>taxonomists, data models and structures hold no interest to them in
>practice, and they find even our simple conceptual model of
>character description complex to understand. Probably few working
>taxonomists would wish to interact at any level with the SDD
>schema and applications would have to achieve this mapping
>transparently.
On this I'm sure you're right, and we have had many
discussions within SDD about this problem. There are differing views as to the
importance of taxonomists themselves coming to grips with SDD, as the standard
itself will generally be invisible to a taxonomist using an
SDD-compliant application.
Translation and multiple language
representations: allowing multiple languages is seen as a
fundamental part of the SDD brief. Life would indeed be much simpler if everyone
spoke the same language, but they don't so we need to handle that.
>It is not clear whether SDD proposes that a single document can include multiple
>language representations, or whether these would form separate documents,
>conforming to the same standard
SDD can handle multiple language representations of every character string
within the one document.
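As a rough illustration of what "multiple language representations within the one document" could look like, here is a minimal Python sketch. The element names (Character, Label) are hypothetical and are NOT taken from the SDD schema; only the xml:lang attribute is standard XML. It simply shows several language variants of one string living side by side, with a fallback when a language is missing.

```python
import xml.etree.ElementTree as ET

# Standard XML namespace for the xml:lang attribute, as ElementTree exposes it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical element names for illustration only; not the SDD schema.
DOC = """
<Character id="c1">
  <Label xml:lang="en">leaf shape</Label>
  <Label xml:lang="de">Blattform</Label>
  <Label xml:lang="pt">forma da folha</Label>
</Character>
"""

def label_for(character, lang, fallback="en"):
    """Return the label text in `lang`, or in `fallback` if `lang` is absent."""
    labels = {el.get(XML_LANG): el.text for el in character.findall("Label")}
    return labels.get(lang, labels.get(fallback))

character = ET.fromstring(DOC)
print(label_for(character, "de"))  # a language present in the document
print(label_for(character, "fr"))  # not present, so the English label is used
```

An application built this way would pick the representation at display time, so the user never needs to see the other language variants.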
Multiple expertise levels
>I am similarly suspicious of the necessity for including the ability for recording
>different expertise levels in one document format.
>Is SDD proposing/allowing multiple representations within the same document : or just
>that the same format/standard can be used for documents aimed at different
>expertise level.
>
>There clearly is value in being able to extract/translate simple language descriptions
>from complex data resources – as is necessary for compiling flora and keys from
>monographs and original descriptions. However, is including the ability to describe
>descriptive data in language suitable for primary schoolchildren relevant to an
>accurate scientific database of taxonomic data.
>[Again this would appear to be a political requirement??]
This is not a political requirement, but an attempt to broaden
the application of taxonomy beyond taxonomists (surely a requirement if the
taxonomic crisis is to be resolved). It also derives neatly from
the XML underpinning (XML is based on the idea of multiple representations
of a single document).
Defining the descriptive terminology
>Are you suggesting that the SDD
>Terminology Section will be adequate and appropriate to store
>and represent any (allowed) defined
>terminology?
Yes, we hope so. Do you think it will be
inadequate?
>Is the standard going to allow
>descriptions to reference other defined terminologies?
It will be possible to outsource the
terminology section, so if a group creates a controlled vocabulary, that
could be referenced in multiple SDD documents. So presumably Prometheus could be
the source of a controlled vocabulary that other users (if they found it
adequate) could reference.
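To make the idea of an outsourced terminology concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the "term:" identifiers, the dictionary layout and the unresolved_terms helper are all hypothetical and say nothing about how SDD or Prometheus actually store terms. The point is just that several descriptions can point at one shared vocabulary, and that a description remains free to use terms the vocabulary does not define.

```python
# Hypothetical shared vocabulary, maintained outside any single description.
SHARED_TERMS = {
    "term:ovate": "egg-shaped in outline, broadest below the middle",
    "term:linear": "long and narrow with roughly parallel sides",
}

# Two independent descriptions that reference the vocabulary by identifier.
doc_a = {"item": "Species A", "leaf_shape": "term:ovate"}
doc_b = {
    "item": "Species B",
    "leaf_shape": "term:linear",
    "petal_shape": "term:spathulate",  # not defined in the shared vocabulary
}

def unresolved_terms(doc, vocabulary):
    """List the term identifiers used in `doc` that `vocabulary` does not define."""
    return sorted(
        value
        for value in doc.values()
        if value.startswith("term:") and value not in vocabulary
    )

print(unresolved_terms(doc_a, SHARED_TERMS))
print(unresolved_terms(doc_b, SHARED_TERMS))
```

A check like this reports gaps rather than rejecting the document, which matches the intent that users can adopt a controlled vocabulary where it suits them without being forced into it.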
>Would SDD only accept Data marked up in
>an SDD terminology?
Yes - or do I misunderstand this
question?
>Would existing terminologies have to be
>translated/mapped/redescribed in SDD format?
Any existing terminology can be represented
in SDD, so there will be no remapping necessary.
>Who is going to create terminologies,
>e.g. users on an adhoc basis, or expert user groups?
As above, these may develop particularly for
some groups (e.g. ferns, legumes), but some users may choose to stay outside
such a system (there will be benefits and costs of using a controlled
vocabulary, so people will have to weigh it up for themselves). SDD itself is
agnostic.
>Is it an aim to promote re-use and
>sharing of terminologies?
It would be a desirable outcome, but we hope
it evolves bottom-up rather than being imposed top-down.
>Is there going to be policing of SDD
>terminologies, e.g. maintaining versioning, additions etc?
Versioning will be handled within SDD, but
there can be no possibility of policing such a system.
>How was the terminology section created –
>by examining examples of terminology specifications,
>ontology representations etc?
We have had a mix of off-the-top-of-the-head
speculation as to how best to do things, and proofing of concepts against
real-world examples. I would have liked to see more proofing going on during
development, but this has been hard to maintain, and may be to our cost. We are
now at a phase where several groups are trying to implement SDD-compliance for
their systems - this will be the proof of the pudding.
Note also that SDD is currently v0.9 - with
the explicit statement on release that everything may change if we find that the
proofing fails.
>Does it form a standard template for
>storing a terminology?
What do you mean by this? It provides the
standard schema for representing an (undefined) terminology.
>Is it compatible with any existing tools,
>standards or formats – e.g. ontology editors?
We have specifically made it very general.
There is currently no existing tool that can handle SDD. Hopefully this will
change shortly.
----------------------------------
So does Prometheus have any data yet, or is
it still at the model stage? It would be very interesting to try representing
Prometheus data in SDD.
Cheers - k
----- Original Message -----
Sent: Monday, March 15, 2004 9:41 PM
Subject: SDD Schema in relationship to Prometheus
Gregor
I have written a
rough document considering several aspects of the SDD-schema - largely
interpreted with reference to our Prometheus Database model for descriptive
data. It seems easier to keep this all together, rather than post it to
various sections on twiki, so I am attaching it here.
My main problems
in interpreting the schema were the lack of documentation (as always...)
especially for the conceptually complex parts like concept trees. I think
clear, visual summary models for description, characters, concept trees
etc would help a novice to get to grips with the concepts, and might make some
of the complexities more tractable. I do worry that the overall schema is over
complex and 'trying to do too much in one go' - eg considering multiple
language and expertise representations, although I am sure that there are good
political reasons for everything.....
yours
trevor
Trevor Paterson PhD
t.paterson@napier.ac.uk
School of Computing
Napier University
Merchiston Campus
10 Colinton Road
Edinburgh
Scotland
EH10 5DT
tel: +44 (0)131 455-2752
www.dcs.napier.ac.uk/~cs175
www.prometheusdb.org