[Tdwg-tag] xml-schema and RDFS

Fri Mar 24 18:37:36 CET 2006

Roger wrote off-list:

> We might have a complexType for a 'personType' and then create an 
> element within 'bookType' called 'author' that is of type 'personType'. 
> Is the notion of "an author" a property of a bookType or the extension 
> of the notion of a person?

I think neither. If in any given OO-model you define a class or struct 
"Coordinates" with lat and long, and you use it for two properties (say 
TransectStart and TransectEnd) of another class, you are not extending 
"Coordinates".

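A minimal sketch of this point in Python. The owning class name `Transect` and 
the sample values are assumptions made for illustration; only "Coordinates", 
TransectStart and TransectEnd come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Coordinates:
    lat: float
    long: float


@dataclass
class Transect:  # hypothetical owning class, not named in the original
    # Two properties reuse the same type; neither one extends Coordinates.
    transect_start: Coordinates
    transect_end: Coordinates


t = Transect(Coordinates(52.45, 13.30), Coordinates(52.46, 13.31))

# The "start" versus "end" meaning lives in the property names of Transect;
# both values are plain Coordinates instances.
same_type = type(t.transect_start) is type(t.transect_end) is Coordinates
print(same_type)
```

No subclass of Coordinates is needed to tell the two properties apart.
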
Both xml-elements and xml-attributes are OO-properties of the current OO-class 
(called "Type" in xml-schema). Attributes are always simple types (OO: value 
types), whereas elements can be simple or complex (or mixed, but let us forget 
that; all TDWG standards avoid mixed types). What is confusing is that 
xml-schema often allows you to be implicit about this by defining "anonymous 
types". So you can make complex nested structures without giving the types 
names, or even define global elements. But in principle you are still creating 
a composition of class definitions; it is just that the type naming can be 
omitted.
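
To illustrate, here is a rough sketch that holds two small schema fragments in 
Python strings and lists the names of their complex types. Both fragments and 
all element names in them (book, author, name) are made up for this example, 
following Roger's personType/bookType wording; they are not taken from any TDWG 
schema.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Variant 1: every complex type is given a name (hypothetical fragment).
NAMED = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="author" type="personType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

# Variant 2: the same composition, built from a global element and
# anonymous (unnamed) complex types.
ANONYMOUS = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="author">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def complex_type_names(xsd_text):
    """Return the name of every complexType; None marks an anonymous one."""
    root = ET.fromstring(xsd_text)
    return [ct.get("name") for ct in root.iter(XS + "complexType")]


named_types = complex_type_names(NAMED)
anonymous_types = complex_type_names(ANONYMOUS)
print(named_types)
print(anonymous_types)
```

Both variants contain two complex types nested in the same way; only the second 
one leaves them unnamed.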

> In an instance document the thing encoded 
> within the book>author element gets the meaning of person from the type 
> hierarchy but the meaning of author from its position within a bookType 
> and the name of the element. Should we really create a complexType 
> called authorType that extends personType and the author element should 
> be of type authorType. (i.e. are the semantics in the complexType 
> hierarchy) or should we just have an author element of type person (i.e. 
> some of the semantics are in the document structure and some in the 
> type hierarchy)?

I think not. Otherwise in current OO programming you would have to extend the 
class "string" for each property where you want to use the type string.

> Simply put: Are the semantics encoded in the XML Schema or in the 
> structure of the XML instance documents that validate against that 
> schema? Is it possible to 'understand' an instance document without 
> reference to the schema?

You are correct that the semantics of a property are not defined in an 
ontological way, but they are implied by the property that is using a type. 
(They are implied in the *schema*, not the "instance").

If you add several string properties to a class in Java, C++, Python, Visual 
Basic, etc., this is exactly what you do. That is why, at the moment, I feel 
that xml-schema is intuitive to users of OO languages like the ones named. RDFS 
seems strange, at least to me.

I realize that more OO-languages exist; I just perceive the ones named as 
"mainstream" and know little about the others.

> How do we map 'concepts' in two schemas to each other if one encodes 
> meaning in the document structure and the other in the type hierarchy?  
> Even doing it manually how do we express that authorType in one schema 
> is equivalent to book>author(of type person) in another. This gets very 
> complex as the Tapir team have found. Is there a concept for every 
> possible element in every possible instance document for a given schema? 
> What about iterative nested structures?

I have no answer other than a simple xml transformation based on a one-time 
(per schema combination) analysis. Although I do not believe we will truly 
achieve backward compatibility (it could be nominal, but not useful!) very 
soon, regardless of technology, these questions are certainly important for 
schema evolution.
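
As a sketch of what such a transformation could look like: the one-time 
analysis is written down as a table of path pairs, and a small function applies 
it. The element names, the PATH_MAP table and the transform() helper are all 
hypothetical; a real schema pair would need its own analysis, and XSLT would be 
an equally valid way to express the same mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical result of the one-time analysis for one schema pair:
# source path (relative to the root) -> target path (relative to the root).
PATH_MAP = {
    "author/name": "creator/person/fullName",
    "title": "title",
}


def transform(source_xml, target_root_tag, path_map):
    """Copy mapped leaf values from the source layout into the target layout."""
    source = ET.fromstring(source_xml)
    target = ET.Element(target_root_tag)
    for src_path, dst_path in path_map.items():
        for leaf in source.findall(src_path):
            node = target
            steps = dst_path.split("/")
            # Reuse intermediate elements, but always append a new leaf.
            for step in steps[:-1]:
                child = node.find(step)
                if child is None:
                    child = ET.SubElement(node, step)
                node = child
            ET.SubElement(node, steps[-1]).text = leaf.text
    return target


source_doc = (
    "<book><title>Flora</title>"
    "<author><name>A. Author</name></author></book>"
)
result = transform(source_doc, "publication", PATH_MAP)
result_xml = ET.tostring(result, encoding="unicode")
print(result_xml)
```

This only covers fixed, non-repeating paths. Repeated or recursively nested 
structures are not handled, which is where Roger's question becomes hard.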

Do we have to go to the length of atomizing our models into 
subject-predicate-object tuples, or do we have other options? I find xml-schema 
a much simpler technology, despite all its shortcomings, some of which we have 
encountered ourselves. Can we answer Roger's question about mapping concepts 
from one schema to another based on xml-schema standards like TDWG has produced 
so far?
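
For readers unfamiliar with the idea, here is a rough sketch of what 
"atomizing" a nested record into subject-predicate-object tuples means. The 
to_triples() helper, the sample record and the "_:b1" style of generated node 
ids are assumptions for illustration only; this is not an RDF library.

```python
from itertools import count


def to_triples(record, subject, ids=None):
    """Flatten a nested dict into (subject, predicate, object) tuples."""
    ids = ids if ids is not None else count(1)
    triples = []
    for predicate, value in record.items():
        if isinstance(value, dict):
            # A nested structure becomes its own node with a generated id.
            node = f"_:b{next(ids)}"
            triples.append((subject, predicate, node))
            triples.extend(to_triples(value, node, ids))
        else:
            triples.append((subject, predicate, value))
    return triples


book = {"title": "Flora", "author": {"name": "A. Author"}}
triples = to_triples(book, "book1")
for triple in triples:
    print(triple)
```

The nesting of the original record survives only as links between node ids; 
the grouping that a schema type expresses is no longer visible in any single 
tuple.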

Gregor

BTW: I just wonder how you do inheritance of a class with properties in RDFS. 
It seems to me that since in RDF/RDFS you do not know the number of properties 
a class may have, you can always only extend an abstract ("property-less") 
concept, but never communicate an agreement that two pieces of software desire 
to use an agreed set of properties. Is that correct?

----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn at bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19           Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203