[Tdwg-tag] A very simple question stated again.

Mon Mar 27 10:46:18 CEST 2006

> *"Are the semantics encoded in the XML Schema or in the structure of the 
> XML instance documents that validate against that schema? Is it possible 
> to 'understand' an instance document without reference to the schema?"*

I would say:

A little semantics is embedded in the instance document; with an intuitive 
design, quite a bit more can be guessed.

Much more semantics is embedded in the schema: Type information, complete 
knowledge of optional properties, cardinality constraints, extension points. 
More semantics is commonly added in the schema annotations.
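
To make the difference concrete, here is a rough Python sketch using only the 
standard library. The Specimen element, its children, and the schema fragment 
are made up for this illustration. The instance alone gives element names and 
strings; declared types and cardinality have to be read from the schema.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical instance document: every value is just text.
INSTANCE = """<Specimen>
  <CatalogNumber>1234</CatalogNumber>
  <CollectionDate>2006-03-27</CollectionDate>
</Specimen>"""

# Hypothetical schema fragment declaring the same elements.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Specimen">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="CatalogNumber" type="xs:integer"/>
        <xs:element name="CollectionDate" type="xs:date"/>
        <xs:element name="Collector" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def instance_view(xml_text):
    """What the instance alone tells us: element names and string values."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def schema_view(xsd_text):
    """What the schema adds: declared type and cardinality per element."""
    root = ET.fromstring(xsd_text)
    info = {}
    for el in root.iter(XS + "element"):
        if "type" in el.attrib:
            # minOccurs and maxOccurs both default to 1 in XML Schema.
            info[el.get("name")] = (
                el.get("type"),
                el.get("minOccurs", "1"),
                el.get("maxOccurs", "1"),
            )
    return info
```

Note that the optional, repeatable Collector element is invisible in this 
instance; only the schema reveals that it may occur.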

Certain kinds of semantics, such as the meaning of the same type (a simple type 
like string, or a complex, self-created type, i.e. a class) when it is reused 
in different properties of a class, are not expressed formally in schema. Nor 
are they formally expressed in UML static class modeling or ER modeling.

Other semantics, like relations between classes (in XML Schema: "complex 
types"), are expressible through identity constraints, but unfortunately most 
people skip the work of doing it.
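
As a sketch of what such a constraint buys you: the Python below checks by 
hand the kind of relation that an xs:key on one element plus an xs:keyref on 
another would let a validating parser enforce. It is not a schema validator, 
and the Dataset, Taxon, and Specimen names and the taxonRef attribute are 
invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: each Specimen refers to a Taxon by id.
DOC = """<Dataset>
  <Taxon id="t1"/>
  <Taxon id="t2"/>
  <Specimen taxonRef="t1"/>
  <Specimen taxonRef="t9"/>
</Dataset>"""


def dangling_references(xml_text):
    """Return taxonRef values that match no Taxon id.

    This mimics by hand what an xs:key on Taxon/@id together with an
    xs:keyref on Specimen/@taxonRef would express in the schema.
    """
    root = ET.fromstring(xml_text)
    keys = {t.get("id") for t in root.findall("Taxon")}
    refs = [s.get("taxonRef") for s in root.findall("Specimen")]
    return [r for r in refs if r not in keys]
```

With the constraint declared in the schema, the second Specimen would be 
rejected at validation time and consumers would not need a check like this.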

> XML is 'self describing' so you would think this must be true. 

Perhaps XML is "self-guessable" :-)

> If the answer in No then we need clear statements about how all 
> instances must always bear links to a permanently retrievable schema - 
> or they become meaningless. 

In XML Schema this is done through the namespace and the namespace (schema) 
location. Note that the location is only a hint; a consumer is not required to 
follow it. Consumers may use their own version of a schema at their own 
responsibility.
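
A minimal Python sketch of that behaviour, assuming the instance carries the 
hint in an xsi:schemaLocation attribute. The namespace URI, the schema URL, 
and the local catalog are placeholders, and resolve_schema is a hypothetical 
helper, not part of any library.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance; namespace URI and schema URL are placeholders.
DOC = """<d:Dataset xmlns:d="http://example.org/ns/dataset"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.org/ns/dataset
                        http://example.org/schemas/dataset.xsd"/>"""


def schema_hints(xml_text):
    """Map namespace URI -> suggested schema location."""
    root = ET.fromstring(xml_text)
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    # The attribute holds whitespace-separated (namespace, location) pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


def resolve_schema(namespace, hints, local_catalog):
    """Prefer the consumer's own copy; fall back to the document's hint."""
    return local_catalog.get(namespace) or hints.get(namespace)
```

The namespace identifies the vocabulary; where the schema text actually comes 
from remains the consumer's decision.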

> We need very tight version control of 
> schemas and a method of linking between the versions so we can track how 
> the meaning has changed. 

Updating a deployed schema in the same namespace implies a statement that, at 
the least, previous documents remain valid under the new schema. Depending on 
how the schema's users are managed, the reverse may or may not be required as 
well.

Guidelines exist as to how forward and backward compatibility can be achieved 
in XML Schema (provide extension points with xs:any and the self or non-self 
namespace options, and provide them within a container element), which makes a 
schema extensible. If these guidelines are followed, a new schema version may 
be produced having the same namespace but a different version attribute. The 
version attribute documents "evolutionary changes".
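
Seen from the consumer side, an extension point might be handled roughly like 
this in Python. I assume here that the schema declares a container element 
(called Extensions in this sketch) holding an xs:any wildcard that admits 
elements from other namespaces; all element names and namespace URIs are 
invented.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns/dataset"  # hypothetical target namespace

# Written against a newer schema version: the container carries an
# element from a foreign namespace that an older consumer does not know.
DOC = """<Dataset xmlns="http://example.org/ns/dataset"
         xmlns:x="http://example.org/ns/extension">
  <Title>Fungi of Berlin</Title>
  <Extensions>
    <x:Rights>CC-BY</x:Rights>
  </Extensions>
</Dataset>"""


def read_dataset(xml_text):
    """Read the known part; collect, but do not fail on, extensions."""
    root = ET.fromstring(xml_text)
    title = root.findtext("{%s}Title" % NS)
    container = root.find("{%s}Extensions" % NS)
    # Compare with None explicitly: an element without children is falsy.
    unknown = [] if container is None else [child.tag for child in container]
    return title, unknown
```

An older consumer keeps working on the part it understands and can report or 
skip whatever it finds in the container.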

> We also need clear statements on what happens 
> when you can validate a document with multiple schemas? Does this imply 
> multiple meanings? Schemas must be archived with any data etc.

Schema is primarily about syntax, not semantics (although some semantics are 
implied by syntax). Clearly, additional semantics are desirable, and that makes 
this discussion worthwhile.

However, I can see great benefits in syntax. According to Roger's post, RDFS 
cares very little about this ("we could use external OWL-based [...]"). Knowing 
that the syntax is correct enables you to guarantee that the imported/consumed 
data fulfill a number of validation constraints built into your local data 
structure. Having these constraints enforced then allows you to write code 
that relies on assumptions. If you import unconstrained data, you have to 
write super-error-tolerant code with exceptions for all possible 
inconsistencies in the data.
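
A small Python sketch of the contrast, with a made-up record layout (a list of 
dictionaries with a "height" field). The first function relies on constraints 
having been enforced upstream; the second has to guard against everything.

```python
def mean_height_validated(records):
    """Assumes upstream validation: the list is non-empty and every
    record carries a numeric 'height'."""
    return sum(r["height"] for r in records) / len(records)


def mean_height_unvalidated(records):
    """No guarantees: tolerate missing, empty, or non-numeric values."""
    values = []
    for r in records:
        raw = r.get("height") if isinstance(r, dict) else None
        try:
            values.append(float(raw))
        except (TypeError, ValueError):
            continue  # skip anything that cannot be read as a number
    return sum(values) / len(values) if values else None
```

The second version also has to decide silently what a skipped value means, 
which is a semantic decision the data provider never got to make.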

Is this a problem with using RDF/S?

> If you respond to this message please state a preference for either 1 or 
> 2. There is no middle road on this one!

I disagree. I believe formal semantic definitions are a question of degree, 
not "yes" or "no". Even OWL claims only to do the "doable" things and does not 
aim to solve all AI problems.

Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn at bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19           Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203