Morphological Data Representation

Fri Nov 23 15:07:30 CET 2001

Thus spoke Steve Shattuck:
[..]
> I've translated this into the XML file that is attached.  (Even this fairly
> simple example is moderately large and I would recommend using an XML
> viewer such as Microsoft's XML Notepad when working with it - see
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxml/html/xmlpaddownload.asp.)
I'm always surprised to see people recommend proprietary software when
there are plenty of free-software equivalents:
http://freshmeat.net/search/?site=Freshmeat&q=xml+editor&section=projects
sourceforge.net also provides a complete list.
vim and emacs also have XML support and are available on many platforms.

> The basic structure of the attached file is:
>
> <Morphology>
>    <Characters>
>       <Character> - ANY NUMBER
>          <Type>
>          <ID>
>          <Short_Descriptor>
>          <Long_Descriptor>
>          <State_Descriptor>
>          <Order>
>          <State> - ANY NUMBER
>             <Value>
>             <ID>
>             <Order>
>    <Items>
>       <Item> - ANY NUMBER
>          <Type>
>          <ID>
>          <Descriptor>
>          <CodedCharacter> - ANY NUMBER
>             <CharacterID>
>             <Description>
>             <CodedState> - ANY NUMBER
>                <StateID>
>                <Value>
This is a mixed approach, whereas Gregor's proposal is to separate
character descriptions from items. Moreover, it is closely related to DELTA (as
xdelta is), whereas everyone seems to agree on building something from scratch.
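
To make the discussion concrete, here is a minimal sketch of how a file in the
quoted structure could be read: character and state definitions are indexed by
ID, then each item's coded states are resolved against that index. The sample
document, its values (c1, s1, "leaf shape", ...) and the describe_items()
helper are my own invention for illustration, not taken from Steve's
attachment.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the quoted structure; all values are made up.
SAMPLE = """
<Morphology>
  <Characters>
    <Character>
      <Type>unordered</Type>
      <ID>c1</ID>
      <Short_Descriptor>leaf shape</Short_Descriptor>
      <State><Value>ovate</Value><ID>s1</ID><Order>1</Order></State>
      <State><Value>linear</Value><ID>s2</ID><Order>2</Order></State>
    </Character>
  </Characters>
  <Items>
    <Item>
      <Type>taxon</Type>
      <ID>i1</ID>
      <Descriptor>Example species</Descriptor>
      <CodedCharacter>
        <CharacterID>c1</CharacterID>
        <CodedState><StateID>s2</StateID></CodedState>
      </CodedCharacter>
    </Item>
  </Items>
</Morphology>
"""


def describe_items(xml_text):
    """Return {item descriptor: {character name: [state values]}}."""
    root = ET.fromstring(xml_text.strip())

    # Index every character by its ID, together with its states by state ID.
    characters = {}
    for char in root.findall("./Characters/Character"):
        states = {s.findtext("ID"): s.findtext("Value")
                  for s in char.findall("State")}
        characters[char.findtext("ID")] = (char.findtext("Short_Descriptor"),
                                           states)

    # Resolve each item's CharacterID / StateID references against the index.
    result = {}
    for item in root.findall("./Items/Item"):
        coded = {}
        for cc in item.findall("CodedCharacter"):
            name, states = characters[cc.findtext("CharacterID")]
            coded[name] = [states[cs.findtext("StateID")]
                           for cs in cc.findall("CodedState")]
        result[item.findtext("Descriptor")] = coded
    return result


print(describe_items(SAMPLE))
```

Note that the items only make sense once the character IDs are resolved, so
any tool reading this format has to do that lookup itself.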

> Note that I've treated everything as elements and haven't used attributes.
> Simplicity is the only reason for this and some elements would be better as
> attributes; these can be converted when the dust settles.
No, there is also a performance argument for attributes: because they make
files less verbose, parsing is quicker. See the xerces-j/xalan-j FAQ.
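
As a rough illustration of the verbosity point only (it does not measure
parsing time), the sketch below writes one hypothetical state both ways and
compares the sizes. The attribute names id and order are my assumption about
how the conversion might look, nothing more.

```python
import xml.etree.ElementTree as ET

# The same hypothetical state, written with elements only and with attributes.
AS_ELEMENTS = "<State><Value>ovate</Value><ID>s1</ID><Order>1</Order></State>"
AS_ATTRIBUTES = '<State id="s1" order="1">ovate</State>'

elem = ET.fromstring(AS_ELEMENTS)
attr = ET.fromstring(AS_ATTRIBUTES)

# Both forms carry the same information...
assert elem.findtext("ID") == attr.get("id")
assert elem.findtext("Order") == attr.get("order")
assert elem.findtext("Value") == attr.text

# ...but the attribute form needs fewer characters.
print(len(AS_ELEMENTS), len(AS_ATTRIBUTES))
```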

However, from a modeling point of view, the rule should be:
use elements for whatever has real-world meaning
use attributes for modeling artifacts (id, idrefs, etc.)
The default XSLT transformation (the built-in template rules) reinforces this
by outputting the text content of elements and ignoring attributes.
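
A small sketch of that behaviour: Python's standard library has no XSLT
processor, so I use ElementTree's itertext() as a stand-in here. Like the
default transformation, it yields the text found inside elements and never
the attribute values. The sample document is again hypothetical.

```python
import xml.etree.ElementTree as ET

# Real-world content as element text, modeling artifacts as attributes.
DOC = '<State id="s1" order="1"><Value>ovate</Value></State>'

root = ET.fromstring(DOC)

# itertext() walks the element tree and yields text only; the id and order
# attribute values never appear in the result.
text_only = "".join(root.itertext())
print(text_only)
```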
--
Guillaume Rousse <rousse at ccr.jussieu.fr>
GPG key http://lis.snv.jussieu.fr/~rousse/gpgkey.html