TDWG-SDD XML proposals

Jim Croft jrc at ANBG.GOV.AU
Thu Nov 2 06:51:34 CET 2000


Bob wrote:
>The
>distinction we are presently thinking about is purely syntactic. It is
>whether the XML data representation is tree structured (by a
>non-linear tree) and without idref's or is linearly structured and with
>idrefs. In fact all trees can be flattened into lists if by nothing
>else than adding references to the mix.

But this would seem to me to be a pretty important distinction and
something that would have to be decided one way or the other in the
Recommendation that we are going to come up with. Or are we thinking of a
Recommendation that would allow both?  It would be nice to be able to say,
here is the one true path, wouldn't it?

>It's easy to be convinced of
>this: trees can be represented in the conventional linear memory
>supplied with most computers, so it is clearly possible (and not that
>hard). What we want to get right is not whether this can be
>done---since we know it can---but specifically whether XML's reference
>mechanisms will capture the requirements of the intended stakeholders
>in the enterprise. We hope it does, because there is a BIG advantage
>to using XML: there is an abundance of tools, protocols, databases and
>services available to those who adopt it.

Bob, can you post an example of what a tree structure represented linearly
with idrefs _might_ look like?
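
My own guess at the sort of thing meant, and not Bob's actual proposal,
is sketched below in Python.  It takes a nested fragment and rewrites it
as a flat list in which each item carries an ID and a PARENT reference.
The element and attribute names (CHARACTER, STATE, LIST, ID, PARENT) are
invented for illustration, and the attributes would only count as real
ID/IDREF types if a DTD declared them that way.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree-structured markup; all names here are invented.
NESTED = """
<CHARACTER NAME="leaf">
  <CHARACTER NAME="shape">
    <STATE NAME="ovate"/>
    <STATE NAME="linear"/>
  </CHARACTER>
  <CHARACTER NAME="margin">
    <STATE NAME="entire"/>
  </CHARACTER>
</CHARACTER>
"""

def flatten(root):
    """Return a flat <LIST>; nesting is replaced by ID/PARENT attributes."""
    flat = ET.Element("LIST")
    counter = 0

    def visit(node, parent_id):
        nonlocal counter
        counter += 1
        node_id = f"n{counter}"
        # Copy the node's own attributes and add the generated identifier.
        attrs = dict(node.attrib, ID=node_id)
        if parent_id is not None:
            attrs["PARENT"] = parent_id  # reference back to the parent item
        ET.SubElement(flat, node.tag, attrs)
        for child in node:
            visit(child, node_id)

    visit(root, None)
    return flat

flat = flatten(ET.fromstring(NESTED))
for item in flat:
    print(ET.tostring(item, encoding="unicode"))
```

Every item ends up as a sibling in one list, and the original tree can be
rebuilt by following the PARENT references, which I take to be Bob's point
about flattening.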

>For me, what is lurking under the covers here is this: I don't think
>that high quality human readability of the naked markup is that
>important. I think it is the job of application software to render
>markup palatable to humans if that is the problem, or to other
>software if that is the problem.

At the end of the day we unfortunately have to deal with humans, and
taxonomic humans which are even worse, so readability of final output is
going to be a *major* issue.  The data will need to be in a form that
appropriate tools can transmute into something as close to natural language
as possible.  Otherwise the punters will walk away...

>We extracted "highly" and "maybe" from the original.  This raises an
>interesting point. It may well be good to recommend a specific set of
>qualifiers. That allows more intelligent applications. For example,
>(and neglecting that "highly variable" and "frequently variable" are
>probably not acceptable as nearly equivalent) it might be easier to
>guide identification software if "frequently" is uniformly used where
>appropriate, thus perhaps identifying some initially more (or less)
>important feature. However, (a) an author might not be completely
>satisfied with that qualifier and (b) an author may be satisfied but
>prefer something else in natural language discourse.

I was thinking the exact same thoughts and even started preparing a list of
suggested qualifiers, but quickly gave up because the list would end up as
infinite as individual human vagueness, indecision and preference for
particular turns of phrase.  It started to look too much like the exercise
of trying to standardize character lists.

Lucid appeared to offer one - the by misinterpretation thing.  Delta
appeared to offer infinite - through the comment thing.   Or are these
conceptually different attributes?  Perhaps it might be possible to come up
with a manageable list of qualifiers?  Qualifiers In Current Use?

To have any useful application you would have to come up with a limited
list of recommended qualifiers, and a mountain of possible synonyms, or
almost synonyms, beneath them.  All too scary for me...
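
For what it is worth, the shape of such a thing might be sketched as below
(Python, purely illustrative).  The recommended words and the synonyms
filed beneath them are made up for the example; nobody has agreed on any
such list, and the names RECOMMENDED, CANONICAL and normalise are mine.

```python
# Hypothetical recommended qualifiers, each with synonyms filed beneath it.
# The vocabulary is illustrative only; no such list has been agreed.
RECOMMENDED = {
    "frequently": {"often", "commonly"},
    "maybe": {"possibly", "perhaps"},
}

# Invert the table: any accepted word maps to its recommended qualifier.
CANONICAL = {
    word: recommended
    for recommended, synonyms in RECOMMENDED.items()
    for word in synonyms | {recommended}
}

def normalise(qualifier):
    """Return the recommended qualifier, or None if the word is not listed."""
    return CANONICAL.get(qualifier.strip().lower())

print(normalise("Often"))   # frequently
print(normalise("highly"))  # None: not in the list, so someone has to decide
```

The second call is the scary part: every unlisted word an author reaches
for means another decision about which recommended qualifier, if any, it
belongs under.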

>Probably this
>entails recommending a mechanism for specifying alternatives to the
>'official' qualifier. Those mechanisms could guide applications that
>were aware of them. For example
>
><FEATURE_VALUE QUALIFIER="frequently" RENDER_QUALIFIER_AS="highly">

Why would you want to say this sort of thing in the XML dataset rather than
the XML aware application?  If your preferred qualifier is 'highly' rather
than 'frequently', why wouldn't you just code the data that way in the
first place?  Surely the most appropriate place for this sort of
translation is in the application that is designed to render data in a
particular way for a particular purpose (as is the wont of those fascist
editor types and other control freaks)?

I guess you might want to tell an application this: if this is the only
choice of qualifiers you have then, in this instance, use this particular
one.  Sort of like the converse of how style sheets can specify a font, but
give the application a choice of what to use if that font cannot be found.
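
A rough sketch of how an application might resolve this, using Bob's two
attributes, follows (Python).  The order of precedence is my assumption
and not part of any proposal: the application's own house style first,
then the author's RENDER_QUALIFIER_AS hint, then the official QUALIFIER.
The element content "variable", the function name render and the
house_style table are likewise invented for the example.

```python
import xml.etree.ElementTree as ET

# Bob's example attributes; the element content "variable" is assumed.
SAMPLE = (
    '<FEATURE_VALUE QUALIFIER="frequently" RENDER_QUALIFIER_AS="highly">'
    "variable</FEATURE_VALUE>"
)

def render(elem, house_style=None):
    """Choose the wording for the qualifier and prepend it to the value.

    Assumed precedence (not from any proposal):
    1. the application's house style, keyed on the official qualifier;
    2. the author's RENDER_QUALIFIER_AS hint carried in the data;
    3. the official QUALIFIER itself.
    """
    official = elem.get("QUALIFIER")
    house_style = house_style or {}
    wording = (
        house_style.get(official)
        or elem.get("RENDER_QUALIFIER_AS")
        or official
    )
    return f"{wording} {elem.text}" if wording else elem.text

elem = ET.fromstring(SAMPLE)
print(render(elem))                           # highly variable
print(render(elem, {"frequently": "often"}))  # often variable
```

Either way the official qualifier stays in the data for identification
software to use, and only the rendering changes.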

Nope...  still too scary...

jim



