Steve Shattuck writes:
Date: Mon, 26 Nov 2001 12:31:40 +1100
From: Steve Shattuck Steve.Shattuck@csiro.au
To: TDWG-SDD@usobi.org
Subject: Re: Morphological Data Representation
First, I see Kevin has proposed a process slightly different from the one I started on Friday. I would suggest that we focus on one of these rather than running both at the same time. I don't think it matters which one we follow as they both have pros and cons. My idea was to start simple and build as we go, while Kevin's is to scope the project first and then fill in the details. Any strong views on the process?
The "data challenges" process was what was agreed in Sydney. Also, it corresponds closely to processes generally regarded as successful in software design, namely Use Case analysis, though it focuses on the use and not the user (in the sense of both human and software).
Guillaume commented that he's "always surprised seeing people recommend using proprietary stuff" in response to my suggestion of using Microsoft's XML Notepad (which, by the way, is actually free). The point I was trying to make was that XML can get complicated very quickly and using an XML view (of any sort) is better than using a text editor. Nothing more.
Also, MSIE is a reasonable XML viewer, albeit not an editor.
He also commented that my example "was closely related to delta (as xdelta), whereas everyone seems to agree on building something from scratch". I fully agree, but I would suggest that DELTA has it basically right and that whatever we develop will look suspiciously like DELTA. For years taxonomists have talked about things called "characters" with "states" and have used these to build "descriptions" (= DELTA attributes). DELTA is firmly founded in taxonomic practices and since this is driving this process I would be surprised if they diverge too far. In other words there's a reason DELTA has been as broadly accepted as it has and we shouldn't ignore this.
In Sydney it was agreed that DELTA experience and functionality should not be ignored, i.e. we should not start from scratch. [I don't happen to agree with that because of the large community outside TDWG that is already heavily involved in descriptive data if only informally. But it is what was decided, and it does leverage the experience of most of SDD. The trick will be to not arrive at something that only models DELTA]
Guillaume's final comment, about the use of XML elements and attributes, is important but I still think it can wait. There often isn't a clear distinction between information that "has real-world meaning" and that which is "modelling artefacts." ["One person's data is another person's metadata."] The fact that "default XSLT transformation enforces this by outputting elements contents and ignoring attributes" is too application specific. Many people process XML using DOM tools and they shouldn't be constrained just because XSLT does it another way.
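To make the element-versus-attribute point concrete, here is a minimal sketch in Python. The XML fragment and its names are hypothetical, not part of any proposal on this list. Collecting only text nodes roughly mimics what XSLT's built-in template rules emit when no templates are supplied, while tree-style access reaches attributes just as easily.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one piece of information stored as element content,
# others stored as attributes.
DOC = (
    '<character id="C3">'
    "<name>wing length ratio</name>"
    '<state value="0.5"/>'
    "</character>"
)

root = ET.fromstring(DOC)

# Approximation of XSLT's built-in rules: only element text content is output.
text_only = "".join(root.itertext())

# Tree (DOM-style) processing: attributes are as accessible as element content.
attributes = {el.tag: dict(el.attrib) for el in root.iter() if el.attrib}

print(text_only)
print(attributes)
```

Which placement is "right" therefore depends on the processing tool, which is the reason for not letting one tool's defaults drive the model.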
Not only can it wait, but these things are possibly technical enough that they shouldn't initially be on this mailing list. We have nearly finished installing vBulletin forum software and will offer to operate a forum off this list for discussion of the technical bits.
Leigh's comments are good and worth a detailed look. His model/representation/syntax (or whatever you want to call it) of the same data I used (see http://www.bath.ac.uk/~ccslrd/delta/lep.xml) is exactly the kind of thing I had in mind. Does his representation make more sense than the one I proposed? What are the strengths/weaknesses of our approaches? Does one allow us to get to where we want to be? Again, I don't think the exact syntax is important at this stage. For example, both models describe the meaning of the numbers present in item descriptions for numeric characters, Leigh as "<value start="0.5" end="0.55" />" and myself as "<CodedState StateID=C3S1 Value=0.5 /> <CodedState StateID=C3S2 Value=0.55 />" with the meaning stored with the state rather than the item description. At the syntax-level these are very different but at the modelling-level they are the same - the same information is being managed (and both differ from the current DELTA-standard in this regard). We will need to work on the syntax but let's get the model agreed to first, then worry about specific syntax.
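As a rough illustration of "different syntax, same model", the sketch below parses both fragments into one shared structure. Several things here are assumptions: the NumericRange class, the function names, and the STATE_MEANING table (which stands in for state definitions saying that C3S1 is the lower bound and C3S2 the upper bound). The attribute values in the second fragment are also quoted so that it parses as well-formed XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericRange:
    """Hypothetical shared model: a numeric character recorded as a range."""
    start: float
    end: float


# Assumption: the character definition stores the meaning of each state.
STATE_MEANING = {"C3S1": "start", "C3S2": "end"}


def from_range_syntax(xml_text):
    """Meaning is carried by attribute names in the item description."""
    el = ET.fromstring(xml_text)
    return NumericRange(float(el.get("start")), float(el.get("end")))


def from_coded_state_syntax(xml_text):
    """Meaning is looked up from the state definitions, not the item."""
    root = ET.fromstring(f"<item>{xml_text}</item>")  # wrap the fragment
    values = {
        STATE_MEANING[cs.get("StateID")]: float(cs.get("Value"))
        for cs in root.iter("CodedState")
    }
    return NumericRange(values["start"], values["end"])


leigh = from_range_syntax('<value start="0.5" end="0.55" />')
steve = from_coded_state_syntax(
    '<CodedState StateID="C3S1" Value="0.5" /> '
    '<CodedState StateID="C3S2" Value="0.55" />'
)
print(leigh == steve)
```

Both parsers yield the same NumericRange, which is the sense in which the two representations manage the same information.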
However, we must agree on the model's extent: will it concern only concept description (aka: characters) or also case description (aka: items)? IMHO, only the first one can be generalized, or we'll have to validate the case description twice: against a generic model and against its concept.
To the extent these are separable, there would probably be less dispute about character representation. Probably this means that lots of case description models have to fit on the same character model.
For example, if I have a description of the characters of the Pociloporidae family, and a description of the items of the Pociloporidae family, I'll have to make sure characters are really characters (validating against a generic character model), make sure items are really items (validating against a generic items model), and make sure Pociloporidae items are really Pociloporidae (validating items against characters). I would prefer to have only to validate characters against a generic character model, and validate items against characters, meaning using a character description as a suitable model for items description.
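The two-step validation preferred above could be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the function names, and the sample character and item are all hypothetical, and the "generic character model" is reduced to "a name plus at least one state".

```python
def validate_characters(characters):
    """Step 1: check each character against a (hypothetical) generic model."""
    errors = []
    for cid, char in characters.items():
        if not char.get("name"):
            errors.append(f"{cid}: missing name")
        if not char.get("states"):
            errors.append(f"{cid}: no states defined")
    return errors


def validate_item(item, characters):
    """Step 2: check an item against the character list, which acts as its model."""
    errors = []
    for cid, state in item["attributes"].items():
        if cid not in characters:
            errors.append(f"{item['name']}: unknown character {cid}")
        elif state not in characters[cid]["states"]:
            errors.append(f"{item['name']}: {cid} has no state {state!r}")
    return errors


# Hypothetical sample data.
characters = {
    "C1": {"name": "clear spots in wings", "states": ["present", "absent"]},
}
good_item = {"name": "item A", "attributes": {"C1": "present"}}
bad_item = {"name": "item B", "attributes": {"C1": "striped", "C9": "present"}}

character_errors = validate_characters(characters)
good_errors = validate_item(good_item, characters)
bad_errors = validate_item(bad_item, characters)
print(character_errors, good_errors, bad_errors)
```

No separate generic items model appears anywhere: an item is valid exactly when it conforms to the characters it refers to.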
As a non-biologist, this sounds right to me.
I believe this misses the goals of this project in a number of important ways and we should avoid going down this path. It seems to mix the (i) processing of the data with (ii) the data representation with (iii) taxonomic work practices. I'm very uncomfortable going there.
Finally, Peter's concerns are important for the next step, expanding the proposed representation to include information not currently managed. One of the strong recommendations from SDD Round 1 was to manage raw data. This needs to be housed under the summarized data (in this case, the actual measurements under the ratios). We WILL need to do this eventually.
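One way to picture raw data housed under summarized data is sketched below. The RatioCharacter class, its field names, and the sample measurements are all hypothetical; the point is only that the summary (a ratio range) is derived from the stored measurements rather than replacing them.

```python
from dataclasses import dataclass, field


@dataclass
class RatioCharacter:
    """Hypothetical ratio character that keeps its raw measurements."""
    name: str
    # Raw data: (numerator, denominator) measurement pairs.
    measurements: list = field(default_factory=list)

    def ratios(self):
        """Ratio computed from each raw measurement pair."""
        return [num / den for num, den in self.measurements]

    def summary(self):
        """Summarized data: the (min, max) range of the ratios."""
        values = self.ratios()
        return (min(values), max(values))


# Hypothetical measurements, not real data.
rc = RatioCharacter("hypothetical ratio", [(2.0, 4.0), (3.0, 4.0)])
print(rc.summary())
```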
Peter also pointed out possibly our largest challenge. He noted that "having clear spots in wings is not very precise if the data is to be comparable beyond the group in question - which I suppose is part of the goal." At every TDWG meeting I've been to we decide that we can't build standards for specific character values and yet at every TDWG meeting I've been to we try to build standards for specific character values. We need to build mechanisms to allow sharing of character lists across projects IF THOSE PROJECTS WANT TO USE THIS FEATURE. If projects don't want to share character lists, for whatever reason, then they won't, no matter how important we think it is to do so.
I agree with this. Whether sharing character lists matters may be a function of the purpose of the list. For example, paper field guides to a given group of taxa often have a far greater commonality of description of characters than they do of characters. And they often come equipped with a character metadata section that explains how to use the characters.
<soapbox on> I think this focus on "standard character lists" is very much a "plant thing." In animals it would never occur to me to use "clear spots in wings" for anything other than the local context for which it was established. No one would suggest that "clear spots in wings" in bees has anything to do with "clear spots in wings" in butterflies and trying to use a single character coding for this would receive limited support at best. I think the problem is that it's common to talk about "identifying a plant" but very rare to talk about "identifying an animal." There are no "faunas" that are equivalent to "floras." This very fundamental difference between plants and animals and the way people view them has a huge impact on this very development - it's one of the reasons that the botanical community has accepted DELTA much more strongly than the zoological community. While a "global flora" is a completely reasonable goal, a "global fauna" isn't even a faint blip on distant radar. Zoologists work in relative isolation compared to botanists and have very different work practices and needs. Meeting the needs of both of these communities will be a significant challenge, one I'm not sure we can meet in a single set of tools. <soapbox off>
Thanks, Steve
Steve Shattuck CSIRO Entomology steve.shattuck@csiro.au
participants (1): Robert A. (Bob) Morris