Re: [Tdwg-tag] A very simple question stated again.

27 Mar 2006

Hi Stan,

Can I take that as a Yes then? Or is it a No?

Concrete example:

Take this instance document.

<?xml version="1.0" encoding="UTF-8"?>
<ExampleDataSet xmlns="http://example.org/specimens#">
    <Specimen>
        <Collector>
            <Name>John Doe</Name>
        </Collector>
    </Specimen>
</ExampleDataSet>

Is John Doe a person or a research vessel?

If the document validates against this schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="http://example.org/specimens#"
           xmlns:specimens="http://example.org/specimens#">
  <xs:element name="ExampleDataSet">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="specimens:Specimen"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Specimen">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Collector" type="specimens:personType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

then it looks like John Doe is a person.

Without looking at the schema, John Doe could as easily have been a team
of people, an expedition, a sampling machine, etc. If we say meaning is
in the schema, then we can change the meaning of the instance document
by changing the schema it validates against, and there are many schemas
that this document could validate against.
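
To make that concrete, here is a sketch of a second schema that the same
instance document would also validate against. The type name vesselType
is invented for illustration; everything else is copied from the schema
above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical alternative schema: same element names and nesting,
     but Collector is typed as a research vessel rather than a person. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="http://example.org/specimens#"
           xmlns:specimens="http://example.org/specimens#">
  <xs:element name="ExampleDataSet">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="specimens:Specimen"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Specimen">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Collector" type="specimens:vesselType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- vesselType is an invented name; its content model is identical
       to personType in the schema above. -->
  <xs:complexType name="vesselType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

The instance document is unchanged and still valid, but read through
this schema John Doe looks like a research vessel.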

This is the reason for my question. If we interpret meaning from the
type hierarchy in XML Schema then we are stuck with single inheritance
(good in Java but maybe not so hot in data modeling). It also means that
we hard-link structure to meaning. It is very difficult for someone to
come along and extend this schema saying "I have an element and my
element represents a collector that isn't a person" because the notion
of collector is hard-coded to the structure for representing a person.
They can't abstractly say "This machine is a type of collector".
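
As a rough sketch of the corner the extender is pushed into: because
Collector is declared with type personType, one route open to them (as
far as I can see) is to derive their own type from personType and use it
via xsi:type in the instance. The machines namespace, the file name
specimens.xsd, samplingMachineType and SerialNumber are all invented for
illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical extension schema written by a third party. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="http://example.org/machines#"
           xmlns:machines="http://example.org/machines#"
           xmlns:specimens="http://example.org/specimens#">
  <!-- schemaLocation is an assumed file name for the schema above. -->
  <xs:import namespace="http://example.org/specimens#"
             schemaLocation="specimens.xsd"/>
  <!-- To be usable where a Collector is expected, the new type has to
       derive from personType, so the schema ends up asserting that a
       sampling machine is a kind of person. -->
  <xs:complexType name="samplingMachineType">
    <xs:complexContent>
      <xs:extension base="specimens:personType">
        <xs:sequence>
          <xs:element name="SerialNumber" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

An instance would then have to say something like
xsi:type="machines:samplingMachineType" on the Collector element, and
the type hierarchy would claim the machine is a person, which is exactly
the statement the extender did not want to make.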

Another way to imply meaning is through namespaces. The element 
http://example.org/specimens#Collector could resolve to a document that 
tells us what we mean by collector. Then we wouldn't have to worry so 
much about the 'structure' of the document but about the individual 
meanings of elements. We could still use XML Schema to encode useful 
structures but the meanings of elements would come from the namespace. 
(And I didn't even mention that this is how RDF does it - oh bother - 
now I have...).
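
For illustration only, the document found at the namespace might look
something like the RDF Schema sketch below. The class names Instrument
and SamplingMachine and the wording of the comment are my own
assumptions, not anything agreed or published.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical description served at http://example.org/specimens -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://example.org/specimens">
  <!-- rdf:ID plus xml:base gives http://example.org/specimens#Collector -->
  <rdfs:Class rdf:ID="Collector">
    <rdfs:label>Collector</rdfs:label>
    <rdfs:comment>An agent that gathered a specimen.</rdfs:comment>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Instrument">
    <rdfs:label>Instrument</rdfs:label>
  </rdfs:Class>
  <!-- A class may have more than one superclass, so a machine can be
       declared a kind of collector without being a kind of person. -->
  <rdfs:Class rdf:ID="SamplingMachine">
    <rdfs:subClassOf rdf:resource="#Collector"/>
    <rdfs:subClassOf rdf:resource="#Instrument"/>
  </rdfs:Class>
</rdf:RDF>
```

Here the statement "This machine is a type of collector" is made
directly, independently of how a Collector element happens to be
structured in any particular instance document.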

My central question is how we map between existing and future schemas. 
If we can't say where the meaning is encoded in our current schemas then 
we can't even start the process.

All the best,

Roger

Blum, Stan wrote:
...
"Are the semantics encoded in the XML Schema or in the structure of the XML
instance documents that validate against that schema? Is it possible to
'understand' an instance document without reference to the schema?"
Possible answers are:
1. Yes: you can understand an XML instance document in the
absence of a schema it validates against, i.e. just from the structure of the
elements and the namespaces used.
2. No: you require the XML Schema to understand the document.
Roger, if you postulate that the instance document is valid against the
schema, and that the element and attribute names are meaningful to the
reader (a human, or software written by a human who understands their
meaning), then the only additional semantics the schema could provide would
be in the annotations/documentation, if any exist in the schema.
I'm not entirely sure what you include in [data] "structure", but if you only
mean concepts such as tuples, trees, sets, lists, bags, etc., then I would
disagree that semantics are encoded substantially in data structure (of the
XML instance doc or any other record).  It is true that without proper
structure, semantics cannot be encoded, but I think semantics are encoded
predominantly in class/element-attribute names and any referenced
documentation (i.e., natural language).  If you replace meaningful names with
surrogate keys (e.g., integers) and thereby obscure any meaning conveyed by
the names, then the instance document would lose a lot of its meaning.
I'm not exactly sure how this relates to the earlier discussion about XML
Schema, RDF, and more powerful modeling methodologies like UML, but I hope it
helps.
Cheers,
-Stan
-- 

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger@tdwg.org
 +44 1578 722782
-------------------------------------