Re: TDWG-SDD XML proposals

2 Nov 2000

      Bob wrote:
...
The
distinction we are presently thinking about is purely syntactic. It is
whether the XML data representation is tree structured (by a
non-linear tree) and without idref's or is linearly structured and with
idrefs. In fact all trees can be flattened into lists if by nothing
else than adding references to the mix.
But this would seem to me to be a pretty important distinction and
something that would have to be decided one way or the other in the
Recommendation that we are going to come up with. Or are we thinking of a
Recommendation that would allow both?  It would be nice to be able to say,
here is the one true path, wouldn't it?
...
It's easy to be convinced of
this: trees can be represented in the conventional linear memory
supplied with most computers, so it is clearly possible (and not that
hard). What we want to get right is not whether this can be
done---since we know it can---but specifically whether XML's reference
mechanisms will capture the requirements of the intended stakeholders
in the enterprise. We hope it does, because there is a BIG advantage
to using XML: there is an abundance of tools, protocols, databases and
services available to those who adopt it.
Bob, can you post an example of what a tree structure represented linearly
with idrefs _might_ look like?
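
As a rough illustration of the two representations being discussed, here is a
minimal Python sketch. It is not a proposed format: the NODE and NODES element
names and the ID, PARENT and NAME attributes are made up for the example. It
takes a small nested tree, flattens it into a linear list whose records point
at their parent by idref, and rebuilds the nested form from that list.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested form: structure is carried by element containment.
NESTED = """
<NODE NAME="plant">
  <NODE NAME="leaf">
    <NODE NAME="margin"/>
  </NODE>
  <NODE NAME="flower"/>
</NODE>
""".strip()


def flatten(root):
    """Return a flat <NODES> list; each NODE gets an ID and a PARENT idref."""
    flat = ET.Element("NODES")
    counter = 0

    def visit(node, parent_id):
        nonlocal counter
        counter += 1
        node_id = f"n{counter}"
        attrs = {"ID": node_id, "NAME": node.get("NAME")}
        if parent_id is not None:
            attrs["PARENT"] = parent_id
        ET.SubElement(flat, "NODE", attrs)
        for child in node:
            visit(child, node_id)

    visit(root, None)
    return flat


def unflatten(flat):
    """Rebuild the nested tree by resolving each PARENT idref.

    Assumes a parent record always appears before its children,
    which holds for the pre-order list produced by flatten().
    """
    by_id = {}
    root = None
    for rec in flat:
        node = ET.Element("NODE", {"NAME": rec.get("NAME")})
        by_id[rec.get("ID")] = node
        parent = rec.get("PARENT")
        if parent is None:
            root = node
        else:
            by_id[parent].append(node)
    return root


def shape(node):
    """Nested (name, children) structure, used to compare two trees."""
    return (node.get("NAME"), [shape(child) for child in node])


nested = ET.fromstring(NESTED)
flat = flatten(nested)
rebuilt = unflatten(flat)

# Print the linear, idref-based form.
print(ET.tostring(flat, encoding="unicode"))
```

The round trip loses nothing here, which is the "it can be done" part. Whether
XML's own ID/IDREF mechanism covers what the stakeholders need is a separate
question that this sketch does not settle.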
...
For me, what is lurking under the covers here is this: I don't think
that high quality human readability of the naked markup is that
important. I think it is the job of application software to render
markup palatable to humans if that is the problem, or to other
software if that is the problem.
At the end of the day we unfortunately have to deal with humans, and
taxonomic humans which are even worse, so readability of final output is
going to be a *major* issue.  The data will need to be in a form that
appropriate tools can transmute into something as close to natural language
as possible.  Otherwise the punters will walk away...
...
We extracted "highly" and "maybe" from the original.  This raises an
interesting point. It may well be good to recommend a specific set of
qualifiers. That allows more intelligent applications. For example,
(and neglecting that "highly variable" and "frequently variable" are
probably not acceptable as nearly equivalent) it might be easier to
guide identification software if "frequently" is uniformly used where
appropriate, thus perhaps identifying some initially more (or less)
important feature. However, (a) an author might not be completely
satisfied with that qualifier and (b) an author may be satisfied but
prefer something else in natural language discourse.
I was thinking the exact same thoughts and even started preparing a list of
suggested qualifiers but quickly gave up because the list would end as
infinite as individual human vaguery and indecision and preferences for
particular turns of phrase.  It started to look too much like the trying to
standardize character lists exercise.

Lucid appeared to offer one - the by misinterpretation thing.  Delta
appeared to offer infinite - through the comment thing.   Or are these
conceptually different attributes?  Perhaps it might be possible to come up
with a manageable list of qualifiers?  Qualifiers In Current Use?

To have any useful application you would have to come up with a limited
list of recommended qualifiers, and a mountain of possible synonyms, or
almost synonyms, beneath them.  All too scary for me...
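
For what it is worth, the "limited list plus synonyms beneath" idea could be
sketched as a simple lookup. The qualifiers and synonyms below are invented
purely for illustration and are not a proposed vocabulary; deciding which
words really belong together is exactly the scary part.

```python
# Hypothetical recommended qualifiers, each with a few near-synonyms.
# The groupings are illustrative assumptions, not an agreed list.
RECOMMENDED = {
    "frequently": {"often", "commonly", "usually"},
    "rarely": {"seldom", "occasionally"},
    "maybe": {"perhaps", "possibly"},
}

# Invert into a lookup from any accepted word to its recommended qualifier.
CANONICAL = {qualifier: qualifier for qualifier in RECOMMENDED}
for qualifier, synonyms in RECOMMENDED.items():
    for word in synonyms:
        CANONICAL[word] = qualifier


def normalize(word):
    """Return the recommended qualifier for word, or None if it is unknown."""
    return CANONICAL.get(word.strip().lower())
```

An application could call normalize() on whatever the author typed and fall
back to treating the text as a free comment when it returns None.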
...
Probably this
entails recommending a mechanism for specifying alternatives to the
'official' qualifier. Those mechanisms could guide applications that
were aware of them. For example
<FEATURE_VALUE QUALIFIER="frequently" RENDER_QUALIFIER_AS="highly">
Why would you want to say this sort of thing in the XML dataset rather than
the XML aware application?  If your preferred qualifier is 'highly' rather
than 'frequently', why wouldn't you just code the data that way in the
first place?  Surely the most appropriate place for these sort of
translations is in the application that is designed to render data in a
particular way for a particular purpose (as is the wont of those fascist
editor types and other control freaks)?

I guess you might want to tell an application this: if the only choice of
qualifiers you have is this, then in this instance, use this particular
one.   Sort of like the converse of how style sheets can specify a font, but
give the application a choice of what to use if that font can not be found.
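
To make the two positions concrete, here is a small Python sketch of an
application reading the FEATURE_VALUE element from the example above. The
QUALIFIER and RENDER_QUALIFIER_AS attributes come from that example; the
rendered_qualifier function and its honour_author_preference switch are
hypothetical, standing in for whatever policy a rendering application adopts.

```python
import xml.etree.ElementTree as ET


def rendered_qualifier(element, honour_author_preference=True):
    """Pick the word to display for a FEATURE_VALUE element.

    Uses RENDER_QUALIFIER_AS when present and the application chooses to
    honour it; otherwise falls back to the 'official' QUALIFIER.
    """
    qualifier = element.get("QUALIFIER")
    preferred = element.get("RENDER_QUALIFIER_AS")
    if honour_author_preference and preferred:
        return preferred
    return qualifier


with_hint = ET.fromstring(
    '<FEATURE_VALUE QUALIFIER="frequently" RENDER_QUALIFIER_AS="highly"/>'
)
without_hint = ET.fromstring('<FEATURE_VALUE QUALIFIER="frequently"/>')

print(rendered_qualifier(with_hint))
print(rendered_qualifier(with_hint, honour_author_preference=False))
print(rendered_qualifier(without_hint))
```

Either way the identification software can keep reasoning on QUALIFIER alone;
the rendering hint only matters to applications that choose to look at it.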

Nope...  still too scary...

jim

Jim Croft