(LEX) Global but non-universal lexicons

Leigh Dodds ldodds at INGENTA.COM
Sat Jan 15 12:54:46 CET 2000

> Question for XMLHeads - can an XML DTD refer to a remote XML document as a
> resource?

Judging by the example you outline below I believe you want to link
together two XML documents, rather than a DTD and a document. The latter
doesn't bring any benefits in this case...

> A lexicographer puts onto the web an XML resource that provides a
> character/state list for a given universe of interest (seed plants/all
> life/the flagellated sporozoans of deepsea hydrothermal vents,
> whatever). If other workers like the character list and think they can
> work with it, then they include a URL to the lexicon in their document, and the
> lexicon is used as their character list. But if no available lexicon works
> for a given user, then they can define their own character list. That way,
> perhaps we'll all be happy.

This is exactly the sort of thing I was hoping would happen. The advantage
is that character and item information are separated. Therefore a character
can be reused, either between groups (across the web, departments, etc.)
or by the same researcher when producing a new taxonomy.

I debated this structure in XDELTA, but plumped for a single file comprising
all information. I think I commented on this somewhere in the DTD.

In my second iteration I'm going to split them into two files - actually
I'm thinking that a modular approach may be useful : define a module for
character/state information and a module for expressing item information.
I've tentatively identified a third module comprising textual
markup, links to additional resources, etc.

Breaking the problem down into 3 sections should make tackling it a lot
easier. The schema itself can then be modular as well.

There is a standard for document linking - XLink - which allows
two documents to be linked, and even portions of a document to be
selected. So we should be able to use this kind of feature
quite easily:

e.g. <characters import="http://www.foo.org/characters.xml" />
or   <characters import="file:///c:/my-project/characters.xml" />

(note these are not XLinks, I don't have the spec to hand for
a clear example)
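
For comparison, here is a rough sketch of how the same import might look as
a simple XLink. The namespace and the type/href/show/actuate attributes are
taken from the XLink 1.0 Recommendation, which was finalised after this
message was written. The <characters> element and the "leaf-shape" ID are
only illustrative assumptions, not part of any published DTD.

```xml
<!-- Sketch only: link to a whole remote character list -->
<characters xmlns:xlink="http://www.w3.org/1999/xlink"
            xlink:type="simple"
            xlink:href="http://www.foo.org/characters.xml"
            xlink:show="embed"
            xlink:actuate="onLoad" />

<!-- Sketch only: select one portion of the remote document.
     "leaf-shape" is a hypothetical ID on an element in characters.xml -->
<characters xmlns:xlink="http://www.w3.org/1999/xlink"
            xlink:type="simple"
            xlink:href="http://www.foo.org/characters.xml#leaf-shape"
            xlink:show="embed"
            xlink:actuate="onLoad" />
```

Note that show="embed" only states the intended behaviour. It is still up to
the processing application to fetch the remote list and merge it in.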

> If this model works, it could perhaps also allow for
> extensibility ie mixed lists comprising remote lexica and extra
> document-defined characters.


> A problem with such lexica would be data security - what would
> happen if the lexicon's server crashes permanently - and how to allow for
> evolving lexica without invalidating previous treatments.

There are a number of issues:

- how are character lists imported if a computer is not connected to
the net?
- what happens if a server is not available?
- how are multiple imported character lists merged, and which takes
precedence (local or remote)?
- versioning of character lists
- what if a remote character list is (or becomes) invalid (related
to versioning)?
- how are an item list and its associated character list (and any
additional supporting files) packaged together for delivery
across the internet, or on traditional media like CDs?

The good news is that these issues are not isolated to this
particular problem and we can benefit from solutions defined elsewhere.

> Any good?

I think so!


