*"Are the semantics encoded in the XML Schema or in the structure of the XML instance documents that validate against that schema? Is it possible to 'understand' an instance document without reference to the schema?"*
I would say:
A little semantics is embedded in the instance document; with intuitive design, quite a bit more can be guessed.
Much more semantics is embedded in the schema: type information, complete knowledge of optional properties, cardinality constraints, and extension points. Further semantics is commonly added in the schema annotations.
Certain kinds of semantics, such as the meaning of reusing the same type (whether a simple type like string or a complex, self-created type, i.e. a class) in different properties of a class, are not expressed formally in the schema. Nor are they formally expressed in UML static class modeling or ER modeling.
Other semantics, such as relations between classes (in XML Schema: complex types), are expressible through identity constraints (xs:key, xs:keyref, xs:unique), but unfortunately most people skip the work of defining them.
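To make the identity-constraint point concrete, here is a minimal Python sketch. It assumes a hypothetical dataset in which Specimen elements refer to Taxon elements by id (all element and attribute names are invented for illustration). Python's standard library has no XML Schema validator, so the checks that xs:key and xs:keyref would declare are done by hand:

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: Specimen elements refer to Taxon elements by id.
# A schema would declare this relation with xs:key (Taxon/@id) and
# xs:keyref (Specimen/@taxonRef); here the same checks are done by hand.
INSTANCE = """<Dataset>
  <Taxon id="t1"/>
  <Taxon id="t2"/>
  <Specimen taxonRef="t1"/>
  <Specimen taxonRef="t3"/>
</Dataset>"""


def check_keyref(root, key_path, key_attr, ref_path, ref_attr):
    """Return (duplicate keys, dangling references), like xs:key plus xs:keyref."""
    keys = [el.get(key_attr) for el in root.findall(key_path)]
    duplicates = sorted({k for k in keys if keys.count(k) > 1})
    known = set(keys)
    dangling = [el.get(ref_attr) for el in root.findall(ref_path)
                if el.get(ref_attr) not in known]
    return duplicates, dangling


root = ET.fromstring(INSTANCE)
print(check_keyref(root, "Taxon", "id", "Specimen", "taxonRef"))  # ([], ['t3'])
```

Without such a declaration, nothing in the schema says that taxonRef points at a Taxon; a reader can only guess it from the name.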
XML is 'self describing' so you would think this must be true.
Perhaps XML is "self-guessable" :-)
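A small Python sketch of what "guessing" from instances amounts to, assuming two hypothetical sample documents (element names invented for illustration) and no access to the schema:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two hypothetical instance documents; the schema itself is not available.
SAMPLES = [
    "<Specimen><Name>Rosa canina</Name><Collector>A</Collector></Specimen>",
    "<Specimen><Name>Rosa gallica</Name>"
    "<Collector>B</Collector><Collector>C</Collector></Specimen>",
]


def guess_cardinality(samples):
    """Guess (min, max) occurrences of each child element from instances alone."""
    per_doc = [Counter(child.tag for child in ET.fromstring(text)) for text in samples]
    tags = sorted({tag for counts in per_doc for tag in counts})
    # Counter returns 0 for tags missing from a document.
    return {tag: (min(c[tag] for c in per_doc), max(c[tag] for c in per_doc))
            for tag in tags}


print(guess_cardinality(SAMPLES))
# {'Collector': (1, 2), 'Name': (1, 1)}
```

The guess cannot tell whether Collector is really limited to two occurrences or unbounded, nor that an optional element absent from both samples exists at all; that information lives only in the schema.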
If the answer is No, then we need clear statements about how all instances must always bear links to a permanently retrievable schema, or they become meaningless.
In XML Schema this is done through the namespace and the schema location (xsi:schemaLocation). Note that the schema location is only a hint that processors are not required to follow. Consumers may use their own version of a schema at their own responsibility.
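As a rough illustration, here is a Python sketch of a consumer that reads the xsi:schemaLocation hint but prefers its own archived copy of the schema. The namespace URI, the schema URLs, and the local catalog are all hypothetical:

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance pointing at a hypothetical schema location.
INSTANCE = (
    '<d:Dataset xmlns:d="http://example.org/ns/dataset" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://example.org/ns/dataset '
    'http://example.org/schemas/dataset-1.0.xsd"/>'
)


def schema_hints(root):
    """Return {namespace: location} pairs from xsi:schemaLocation (hints only)."""
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    return dict(zip(tokens[0::2], tokens[1::2]))


def resolve_schema(namespace, hints, local_catalog):
    """Prefer the consumer's own copy; fall back to the document's hint."""
    return local_catalog.get(namespace, hints.get(namespace))


root = ET.fromstring(INSTANCE)
ns = root.tag[1:].split("}")[0]  # namespace of the root element
hints = schema_hints(root)
catalog = {"http://example.org/ns/dataset": "file:///archive/dataset-1.0.xsd"}
print(resolve_schema(ns, hints, catalog))  # file:///archive/dataset-1.0.xsd
```

With an empty catalog the same call falls back to the location given in the document.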
We need very tight version control of schemas and a method of linking between the versions so we can track how the meaning has changed.
Updating a deployed schema within the same namespace implies that, at a minimum, all previously valid documents remain valid under the new schema (backward compatibility). Depending on how the schema users are managed, the reverse (documents written for the new schema being valid under the old one) may or may not be required as well.
Guidelines exist for achieving forward and backward compatibility in XML Schema: provide extension points using xs:any with the "self" (##targetNamespace) or "non-self" (##other) namespace options, and place them within a container element. This makes the schema extensible. If these guidelines are followed, a new schema version can be produced with the same namespace but a different version attribute; the version attribute documents such "evolutionary changes".
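A Python sketch of these guidelines, assuming a hypothetical schema with an Extension container holding an xs:any wildcard for other namespaces. It reads the version attribute and the wildcard settings, and shows a consumer that keeps only extension elements from namespaces it understands (all namespaces and element names are invented for illustration):

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
TNS = "http://example.org/ns/dataset"  # hypothetical target namespace

# Hypothetical schema following the guidelines: an Extension container
# whose content is an xs:any wildcard for elements from other namespaces.
SCHEMA = f"""<xs:schema xmlns:xs="{XS}" targetNamespace="{TNS}"
    elementFormDefault="qualified" version="1.1">
  <xs:element name="Extension">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def describe_schema(text):
    """Return the schema's version attribute and its xs:any extension points."""
    root = ET.fromstring(text)
    # XML Schema defaults: namespace="##any", processContents="strict".
    wildcards = [(w.get("namespace", "##any"), w.get("processContents", "strict"))
                 for w in root.iter(f"{{{XS}}}any")]
    return root.get("version"), wildcards


def usable_extensions(extension, understood):
    """Forward-compatible reading: keep extension elements from namespaces
    this consumer understands and silently ignore the rest."""
    return [child for child in extension
            if child.tag.startswith("{") and child.tag[1:].split("}")[0] in understood]


print(describe_schema(SCHEMA))  # ('1.1', [('##other', 'lax')])
```

An older consumer that ignores unknown extension content in this way keeps working when a newer schema version adds extensions it has never seen.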
We also need clear statements on what happens when a document can be validated against multiple schemas. Does this imply multiple meanings? Schemas must be archived with any data, etc.
Schema is primarily about syntax, not semantics (although some semantics are implied by syntax). Clearly, additional semantics is desirable, and makes this discussion worthwhile.
However, I can see great benefits in syntax. According to Roger's post, RDFS cares very little about this ("we could use external OWL-based ..."). Knowing that the syntax is correct lets you guarantee that imported or consumed data fulfill a number of validation constraints built into your local data structures. With these constraints enforced, you can write code that relies on them as assumptions. If you import unconstrained data, you would have to write highly error-tolerant code that handles every possible inconsistency in the data.
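A minimal Python sketch of this "check once, then rely on it" pattern. The hand-written checks stand in for what a validating XML Schema processor would enforce, and the record structure (one Name, at least one Collector) is a hypothetical example:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


class InvalidRecord(ValueError):
    """Raised when imported data violate the local constraints."""


@dataclass(frozen=True)
class Specimen:
    name: str
    collectors: tuple


def import_specimen(text):
    """Enforce the (hypothetical) constraints once, at the import boundary."""
    root = ET.fromstring(text)
    names = [(n.text or "").strip() for n in root.findall("Name")]
    if len(names) != 1 or not names[0]:
        raise InvalidRecord("exactly one non-empty Name is required")
    collectors = tuple((c.text or "").strip() for c in root.findall("Collector"))
    if not collectors or not all(collectors):
        raise InvalidRecord("at least one non-empty Collector is required")
    return Specimen(names[0], collectors)


def label(specimen):
    # No defensive checks: import_specimen already guaranteed a name and
    # at least one collector, so indexing collectors[0] is safe here.
    return f"{specimen.name}, collected by {specimen.collectors[0]}"


record = import_specimen(
    "<Specimen><Name>Rosa canina</Name><Collector>A</Collector></Specimen>")
print(label(record))  # Rosa canina, collected by A
```

Without the import-time checks, every function like label would need its own handling for missing names and empty collector lists.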
Is this a problem with using RDF/S?
If you respond to this message please state a preference for either 1 or 2. There is no middle road on this one!
I disagree. I believe formal semantic definitions are a question of degree, not "yes" or "no". Even OWL claims only to do the "doable" things and does not aim to solve all AI problems.
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19          Tel: +49-30-8304-2220
14195 Berlin, Germany          Fax: +49-30-8304-2203