(LEX) Global but non-universal lexicons

Robert A. (Bob) Morris ram at CS.UMB.EDU
Sat Jan 15 08:40:00 CET 2000


Kevin Thiele writes:
 > Date:         Sat, 15 Jan 2000 17:10:42 +1100
 > From: Kevin Thiele <kevin.thiele at PI.CSIRO.AU>
 >
 > Question for XMLHeads - can an XML DTD refer to a remote XML document as a
 > resource?

There's a short answer, a long answer, and a quibble about this first
question. Like most quibbles, the quibble is the longest.

The short answer is: Yes, this is routinely done.
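
For the curious, here is a minimal sketch of one way it can look: a document whose internal DTD subset declares an external entity pointing at a lexicon, and a parser that pulls the lexicon in when the entity is used. The URL, the element names and the canned lexicon text are all invented for illustration, and the resolver hands back canned bytes instead of touching the network. Python's SAX parser leaves external entity loading off by default in recent versions, so the sketch has to switch it on.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical URL and lexicon content, invented for this sketch.
LEXICON_URL = "http://example.org/lexicon.xml"
CANNED = {LEXICON_URL: b'<lexicon><character id="leaf-shape"/></lexicon>'}

# The internal DTD subset names the remote document as an external entity;
# &lexicon; in the body is where its content gets pulled in.
DOCUMENT = f"""<!DOCTYPE treatment [
  <!ENTITY lexicon SYSTEM "{LEXICON_URL}">
]>
<treatment>&lexicon;</treatment>"""


class CannedResolver:
    """Stand-in for the network: returns stored bytes for a known system id."""

    def resolveEntity(self, publicId, systemId):
        source = InputSource()
        source.setByteStream(io.BytesIO(CANNED[systemId]))
        return source


class ElementNames(ContentHandler):
    """Records element names in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_with_lexicon(document):
    parser = xml.sax.make_parser()
    # External general entities must be enabled explicitly.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(CannedResolver())
    handler = ElementNames()
    parser.setContentHandler(handler)
    parser.feed(document)
    parser.close()
    return handler.names
```

Calling parse_with_lexicon(DOCUMENT) should report the lexicon's elements as if they had been typed inside the treatment. A real resolver would fetch the system id over the network instead of looking it up in a dictionary.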

The long answer: There's little point in putting the long answer into
this group.

The quibble: As Leigh and others have suggested, there is substantial
motivation to specify data structuring in XML-Schema instead of the
older DTD formalisms, even if the result is then translated into a DTD
so that more of the current tools can use it. This is especially true
here, because the mechanisms for importing definitions are less clunky
in Schema than in DTD (largely because DTD arises from SGML, whose
roots in GML predate the web by 20 years).

If nothing else, the vocabulary of Schema allows more definitive
descriptions of data types and so forms a better reference standard
than a DTD. Although I count myself more an XMLTongue or XMLNose, I
think it fair to say that the XMLHeads are probably thinking: "whenever
the TaxonHeads say DTD, we should privately understand Schema."


 > ...
 >

 > If it can, could we solve the lexicon problem (which is, that some people
 > like the idea of a lexicon and others don't) like this?
 >
 > A lexicographer puts onto the web an XML resource that provides a
 > character/state list for a given universe of interest (seed plants/all
 > life/the flagellated sporozoans of deepsea hydrothermal vents, whatever). If
 > other workers like the character list and think they can live with it, then
 > they include a URL to the lexicon in their document, and the lexicon is used
 > as their character list. But if no available lexicon works for a given user,
 > then they can define their own character list.

This is pretty much one of the things that XML is designed to make
feasible.


 > That way, perhaps we'll all be happy.

This is pretty much one of the things that XML is designed to make
feasible, too. :-)

 > If this model works, it could perhaps also allow for extensibility ie mixed
 > lists comprising remote lexica and extra document-defined characters.
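
A mixed list of that kind is easy enough to sketch. The snippet below assumes an invented layout: a treatment element with an optional lexicon attribute holding a URL, plus any character elements of its own. None of these names come from a real standard, and the fetch argument is a hypothetical stand-in for whatever retrieves the lexicon text.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon, as a lexicographer might publish it on the web.
LEXICON_XML = """
<lexicon>
  <character id="leaf-shape"><state>ovate</state><state>linear</state></character>
  <character id="petal-colour"><state>white</state><state>yellow</state></character>
</lexicon>
"""

# Hypothetical treatment: points at the lexicon and adds one character of its own.
TREATMENT_XML = """
<treatment lexicon="http://example.org/seed-plants-lexicon.xml">
  <character id="bark-texture"><state>smooth</state><state>fissured</state></character>
</treatment>
"""


def read_characters(parent):
    """Map character id -> list of state names for the direct children of parent."""
    return {
        character.get("id"): [state.text for state in character.findall("state")]
        for character in parent.findall("character")
    }


def character_list(treatment_xml, fetch):
    """Combine the remote lexicon (if any) with document-defined characters."""
    treatment = ET.fromstring(treatment_xml)
    characters = {}
    url = treatment.get("lexicon")
    if url is not None:
        # fetch(url) is assumed to return the lexicon document as text.
        characters.update(read_characters(ET.fromstring(fetch(url))))
    local = read_characters(treatment)
    clashes = characters.keys() & local.keys()
    if clashes:
        # Refuse to silently redefine a lexicon character.
        raise ValueError(f"document redefines lexicon characters: {sorted(clashes)}")
    characters.update(local)
    return characters
```

A treatment with no lexicon attribute simply yields its own list, which is the "define their own character list" case. Rejecting clashes is only one possible policy; letting the document override the lexicon would be another.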


 >
 > A problem with such lexica would be data security - what would happen if the
 > lexicon's server crashes permanently - and how to allow for evolving lexica
 > without invalidating previous treatments.

Definition evolution and resource location problems are so pervasive
that they are an important part of the XML specifiers' work (and that
of W3C in general). FWIW it's not that hard to make an application use
a local copy of a resource if it can't find the network copy.
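
As a rough sketch of that fallback, assuming the lexicon is a plain file fetched by URL and that the application keeps a cached copy on disk: the function name, its parameters and the cache path are all made up for illustration.

```python
import urllib.request
from pathlib import Path


def load_lexicon(url, local_copy, opener=urllib.request.urlopen, timeout=10):
    """Return (data, origin), preferring the network copy of the lexicon.

    opener is a hypothetical hook so that something other than urlopen
    can be swapped in; it must accept (url, timeout=...).
    """
    try:
        with opener(url, timeout=timeout) as response:
            data = response.read()
    except OSError:
        # URLError and timeouts are OSError subclasses. The network copy is
        # unavailable, so fall back to the cached file.
        return Path(local_copy).read_bytes(), "local"
    # Refresh the cache so the next outage still finds something to use.
    Path(local_copy).write_bytes(data)
    return data, "network"
```

If the cached file is missing as well, the read raises and the caller finds out, which seems preferable to pretending a lexicon exists. This does nothing about versioning; a lexicon that changes under a fixed URL would still need some convention, such as a version in the URL, to keep old treatments valid.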

Backward compatibility of course rests not so much on mechanisms
enabling it as on the authors actually accomplishing it. The task is
fraught with the risk of thinkos. (A thinko is to your brain as a typo
is to your fingers. The term was coined by David Fuchs, one of the
architects of FrameMaker and before that an implementor at Stanford of
the portable version of TeX, which had originally been written in the
SAIL language and only ran on a handful of PDP-10s.)


 >
 > Any good?

Smells good to me...

 >
 > Cheers - k

Bob Morris




More information about the tdwg-content mailing list