Re: Morphological Data Representation

25 Nov 2001

      Steve Shattuck writes:
...
Date:         Mon, 26 Nov 2001 12:31:40 +1100
From: Steve Shattuck <Steve.Shattuck@csiro.au>
To: TDWG-SDD@usobi.org
Subject:      Re: Morphological Data Representation
First, I see Kevin has proposed a process slightly different to the one I
started on Friday.  I would suggest that we focus on one of these rather
than running both at the same time.  I don't think it matters which one we
follow as they both have pros and cons.  My idea was to start simple and
build as we go while Kevin's is to scope the project first then fill in the
details.  Any strong views on the process?
The "data challenges" process was what was agreed in Sydney. Also, it
corresponds closely to processes generally regarded as successful in
software design, namely Use Case analysis, though it focuses on the
use and not the user (in the sense of both human and software).
...
Guillaume commented that he's "always surprised seeing people recommend
using proprietary stuff" in response to my suggestion of using Microsoft's
XML Notepad (which, by the way, is actually free).  The point I was trying
to make was that XML can get complicated very quickly and using an XML viewer
(of any sort) is better than using a text editor.  Nothing more.
Also, MSIE is a reasonable XML viewer, albeit not an editor.
...
He also commented that my example "was closely related to delta (as xdelta),
whereas everyone seems to agree on building something from scratch".  I
fully agree, but I would suggest that DELTA has it basically right and that
whatever we develop will look suspiciously like DELTA.  For years
taxonomists have talked about things called "characters" with "states" and
have used these to build "descriptions" (= DELTA attributes).  DELTA is
firmly founded in taxonomic practices and since this is driving this process
I would be surprised if they diverge too far.  In other words there's a
reason DELTA has been as broadly accepted as it has and we shouldn't ignore
this.
In Sydney it was agreed that DELTA experience and functionality should
not be ignored, i.e. we should not start from scratch. [I don't happen
to agree with that because of the large community outside TDWG that is
already heavily involved in descriptive data if only informally. But
it is what was decided, and it does leverage the experience of most of
SDD. The trick will be to not arrive at something that only models DELTA]
...
Guillaume's final comment, about the use of XML elements and attributes, is
important but I still think it can wait.  There often isn't a clear
distinction between information that "has real-world meaning" and that which
is "modelling artefacts."  ["One person's data is another person's
metadata."]  The fact that "default XSLT transformation enforces this by
outputting elements contents and ignoring attributes" is too application
specific.  Many people process XML using DOM tools and they shouldn't be
constrained just because XSLT does it another way.
Not only can it wait, but these things are possibly technical enough
that they shouldn't initially be on this mailing list. We have nearly
finished installing vBulletin forum software and will offer
to operate a forum off this list for discussion of the technical bits.
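
Steve's point that DOM-style tools are not bound by XSLT's defaults can be illustrated with a small sketch. It assumes Python's standard xml.etree.ElementTree as the tree API; the element and attribute names (item, length, unit) are hypothetical and not part of any proposal in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a value as element content, its unit as an attribute.
doc = ET.fromstring('<item><length unit="mm">0.5</length></item>')

# Element content only, with attributes left out. This is roughly what the
# built-in XSLT template rules emit when no explicit templates are given.
text_only = "".join(doc.itertext())

# A tree API reaches the content and the attribute equally easily.
length = doc.find("length")
value = float(length.text)
unit = length.get("unit")

print(text_only, value, unit)
```

Either placement of the unit is readable from a tree API, which is why the element/attribute choice need not be settled by how one particular tool behaves.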
...
Leigh's comments are good and worth a detailed look.  His
model/representation/syntax (or whatever you want to call it) of the same
data I used (see http://www.bath.ac.uk/~ccslrd/delta/lep.xml) is exactly the
kind of thing I had in mind.  Does his representation make more sense than
the one I proposed?  What are the strengths/weaknesses of our approaches?
Does one allow us to get to where we want to be?  Again, I don't think the
exact syntax is important at this stage.  For example, both models describe
the meaning of the numbers present in item descriptions for numeric
characters, Leigh as "<value start="0.5" end="0.55" />" and myself as
"<CodedState StateID="C3S1" Value="0.5" /> <CodedState StateID="C3S2"
Value="0.55" />" with the meaning stored with the state rather than the item description.
At the syntax-level these are very different but at the modelling-level they
are the same - the same information is being managed (and both differ from
the current DELTA-standard in this regard).  We will need to work on the
syntax but let's get the model agreed to first, then worry about specific
syntax.
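
To make the "different syntax, same model" point concrete, here is a minimal sketch that reads both fragments into one structure. Everything beyond the two fragments is an assumption for illustration: the NumericRange type, the Item wrapper element, and the STATE_MEANING table standing in for "the meaning stored with the state" (it assumes C3S1 is the lower bound and C3S2 the upper).

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class NumericRange(NamedTuple):
    """Shared model: a numeric character recorded as a low/high range."""
    low: float
    high: float


# Leigh's style: the range sits directly on the value element.
leigh_xml = '<value start="0.5" end="0.55" />'

# Steve's style: two coded states, each carrying one bound.
# Attribute values are quoted so the fragment parses as XML;
# the "Item" wrapper element is hypothetical.
steve_xml = (
    '<Item>'
    '<CodedState StateID="C3S1" Value="0.5" />'
    '<CodedState StateID="C3S2" Value="0.55" />'
    '</Item>'
)

# Hypothetical stand-in for the state definitions, where the meaning lives.
STATE_MEANING = {"C3S1": "low", "C3S2": "high"}


def from_leigh(xml_text: str) -> NumericRange:
    el = ET.fromstring(xml_text)
    return NumericRange(float(el.get("start")), float(el.get("end")))


def from_steve(xml_text: str) -> NumericRange:
    root = ET.fromstring(xml_text)
    bounds = {
        STATE_MEANING[s.get("StateID")]: float(s.get("Value"))
        for s in root.iter("CodedState")
    }
    return NumericRange(bounds["low"], bounds["high"])


print(from_leigh(leigh_xml) == from_steve(steve_xml))
```

Both readers yield the same NumericRange, which is the sense in which the two syntaxes manage the same information.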
...
However, we must agree on the model's extent: will it concern only concept
descriptions (aka characters) or also case descriptions (aka items)? IMHO,
only the first one can be generalized, or we'll have to validate the case
description twice: against a generic model and against its concept.
To the extent these are separable, there would probably be less
dispute about character representation. Probably this means that lots
of case description models have to fit on the same character model.
...
As an example, if I have a description of the characters of the
Pociloporidae family, and a description of the items of the Pociloporidae
family, I'll have to make sure characters are really characters (validating
against a generic character model), make sure items are really items
(validating against a generic items model), and make sure Pociloporidae items
are really Pociloporidae (validating items against characters). I would
prefer to have only to validate characters against a generic character model
and validate items against characters, meaning using a character description
as a suitable model for item descriptions.
As a non-biologist, this sounds right to me.
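
The two-step validation preferred above (characters against a generic character model, then items against those characters) might be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the function names, and the sample character and states are all hypothetical.

```python
from typing import Dict, List, Set


def check_characters(characters: Dict[str, Set[str]]) -> List[str]:
    """Generic character model: each character has a name and at least one state."""
    errors = []
    for name, states in characters.items():
        if not name:
            errors.append("character with empty name")
        if not states:
            errors.append(f"character {name!r} has no states")
    return errors


def check_item(item: Dict[str, str], characters: Dict[str, Set[str]]) -> List[str]:
    """The character list itself serves as the model for an item description."""
    errors = []
    for name, state in item.items():
        if name not in characters:
            errors.append(f"unknown character {name!r}")
        elif state not in characters[name]:
            errors.append(f"state {state!r} not allowed for {name!r}")
    return errors


# Hypothetical character list and two item descriptions.
characters = {"wing spots": {"clear", "opaque", "absent"}}
good_item = {"wing spots": "clear"}
bad_item = {"wing spots": "striped", "antenna length": "long"}

print(check_characters(characters))
print(check_item(good_item, characters))
print(check_item(bad_item, characters))
```

Note that no separate generic item model appears: an item is valid exactly when it uses only characters and states that the character list defines.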
...
I believe this misses the goals of this project in a number of important
ways and we should avoid going down this path.  It seems to mix the (i)
processing of the data with (ii) the data representation with (iii)
taxonomic work practices.  I'm very uncomfortable going there.

...
Finally, Peter's concerns are important for the next step, expanding the
proposed representation to include information not currently managed.  One
of the strong recommendations from SDD Round 1 was to manage raw data.  This
needs to be housed under the summarized data (in this case, the actual
measurements under the ratios).  We WILL need to do this eventually.
Peter also pointed out possibly our largest challenge. He noted that "having
clear spots in wings is not very precise if the data is to be comparable
beyond the group in question - which I suppose is part of the goal."  At
every TDWG meeting I've been to we
decide that we can't build standards for specific character values and yet
at every TDWG meeting I've been to we try to build standards for specific
character values.  We need to build mechanisms to allow sharing of character
lists across projects IF THOSE PROJECTS WANT TO USE THIS FEATURE.  If
projects don't want to share character lists, for whatever reason, then
they won't, no matter how important we think it is to do so.
I agree with this. Whether sharing character lists matters may be a
function of the purpose of the list. For example, paper field guides
to a given group of taxa often have a far greater commonality of
description of characters than they do of characters. And they often
come equipped with a character metadata section that explains how to
use the characters.
...
<soapbox on>
I think this focus on "standard character lists" is very much a "plant
thing."  In animals it would never occur to me to use "clear spots in wings"
for anything other than the local context for which it was established.  No
one would suggest that "clear spots in wings" in bees has anything to do
with "clear spots in wings" in butterflies and trying to use a single
character coding for this would receive limited support at best.  I think
the problem is that it's common to talk about "identifying a plant" but very
rare to talk about "identifying an animal."  There are no "faunas" that are
equivalent to "floras."  This very fundamental difference between plants and
animals and the way people view them has a huge impact on this very
development - it's one of the reasons that the botanical community has
accepted DELTA much more strongly than the zoological community.  While a
"global flora" is a completely reasonable goal, a "global fauna" isn't even
a faint blip on distant radar.  Zoologists work in relative isolation
compared to botanists and have very different work practices and needs.
Meeting the needs of both of these communities will be a significant
challenge, one I'm not sure we can meet in a single set of tools.
<soapbox off>
Thanks, Steve
Steve Shattuck
CSIRO Entomology
steve.shattuck@csiro.au

Robert A. (Bob) Morris
