Hi Gregor,
I've placed some comments regarding your exchange with Roger in-line:
Gregor Hagedorn wrote:
Hi Roger
TAG list url is here with the archive:
http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org
Thanks, I registered.
I need help understanding RDF. Whereas XML Schema has a conceptual mapping to database or OO-programming design, RDF seems to have none; I lack anything I can relate it to. I still have not seen any software that helps me understand what you produced.
XML Schema was not designed to have a conceptual mapping to databases or object-oriented frameworks. There is a set of tools and a series of conventions for loading XML Schema instances into objects and for mapping schemas into relational table structures, but most of these systems only work if you use a subset of XML Schema language features. For example, XML Schema features like substitution groups and xsd:any cause many of these tools to have problems.
RDF is no more complex than XML Schema. The RDFS way of doing things is far more object-oriented than XML Schema: it forces you to have classes and properties, whereas arbitrary XML document structures can be ambiguous as to whether they are defining objects or properties of objects. So I don't see your reasoning.
I never encountered atomizing every statement into subject-predicate-object in OO design...
That's very true and whether you think of RDF as a graph of 3-tuples or whether you envision it as a set of "objects" that are instances of classes depends on the type of problem you're trying to solve. Triples are the lowest level but thinking in terms of the abstraction of objects and classes can be helpful for some tasks.
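A minimal Python sketch of the two views, assuming triples are plain (subject, predicate, object) string tuples; the urn:ex: identifiers and ex: property names are hypothetical. Grouping the triples by subject recovers an object-like view without changing the underlying data:

```python
from collections import defaultdict

# Hypothetical triples describing a specimen and its collector.
triples = [
    ("urn:ex:specimen1", "rdf:type", "ex:Specimen"),
    ("urn:ex:specimen1", "ex:collector", "urn:ex:person7"),
    ("urn:ex:specimen1", "ex:catalogNumber", "B-100234"),
    ("urn:ex:person7", "rdf:type", "ex:Person"),
    ("urn:ex:person7", "ex:name", "A. Collector"),
]

def as_objects(triples):
    """Group triples by subject: each subject becomes an 'object' whose
    properties map to lists of values (a property may occur more than once)."""
    objects = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        objects[s][p].append(o)
    return {s: dict(props) for s, props in objects.items()}

view = as_objects(triples)
print(view["urn:ex:specimen1"]["ex:collector"])  # ['urn:ex:person7']
```

Note that the collector is still just a reference (a URI) in the object view; following it means looking up another subject, much like following a pointer.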
As an aside, some XML databases reduce XML Schema instances to a low-level structure called a flattened tree that can be analogous to triples. It is possible to decompose any XML instance into an ordered list of XPath = value pairs where the XPaths are concrete and used to refer to any element or attribute in the document. This is one of two approaches for building an XML database from scratch.
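A rough sketch of that XPath = value decomposition using Python's standard xml.etree.ElementTree, assuming a namespace-free document without mixed content (tail text is ignored). The element names are simplified and hypothetical, not actual ABCD:

```python
import xml.etree.ElementTree as ET

def flatten(xml_text):
    """Return an ordered list of (xpath, value) pairs, one per attribute and
    per non-empty element text. Positional predicates ([1], [2], ...) make
    every path concrete, so each pair refers to exactly one node."""
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(elem, path):
        for name, value in elem.attrib.items():
            pairs.append((f"{path}/@{name}", value))
        text = (elem.text or "").strip()
        if text:
            pairs.append((path, text))
        counts = {}  # per-tag counter for sibling positions
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    return pairs

doc = """<Unit id="u1"><UnitID>B-100234</UnitID>
<Identification><Name>Abies alba</Name></Identification>
<Identification><Name>Abies nordmanniana</Name></Identification></Unit>"""

pairs = flatten(doc)
for xpath, value in pairs:
    print(f"{xpath} = {value}")
# /Unit[1]/@id = u1
# /Unit[1]/UnitID[1] = B-100234
# /Unit[1]/Identification[1]/Name[1] = Abies alba
# /Unit[1]/Identification[2]/Name[1] = Abies nordmanniana
```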
"as to whether they are defining objects or properties of objects": XML Schema is about classes, not objects (instances). Can you give an example of what you find confusing in XML Schema? I don't see it.
XML Schema is a grammar for accepting or rejecting documents. It does not define classes of objects and the relationships among those classes. With care and common agreement from the stakeholders of an XML Schema one can create a schema such that it describes classes of objects, but this is by agreement not by design. The only class of object described by the ABCD schema is an ABCD document, not a specimen or a publication or a name or what have you.
Of course you do have the strange animal of mixed content in XML Schema, but ignoring this (none of the TDWG standards use it) you have classes, and each class has a type. The type can be simple or complex, just like in OO languages.
I agree that you can define simple and complex types in an XML Schema, however these are syntactic types. An XML Schema type (simple or complex) is simply a rule for accepting or rejecting an XML subtree. It does not define what a thing is and how it relates to other things, it merely describes the form a thing must have in order to be acceptable to a validating XML parser.
Because XML Schema was designed to be a grammar for the validation of XML trees and not a semantic typing system, using it to build a global collection of interrelated data objects introduces a variety of issues:
First, XML Schema is very limited in the relationships you can define between types. One of the most-used relationships in OOA/OOD is inheritance, and XML Schema does not provide proper inheritance. In XML Schema, substitution groups can be used as an inheritance-like language feature, but they only work within a single schema. To do more than that, one must start importing other schemas, which can cause some surprising problems.
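For comparison, a toy Python sketch (not a real reasoner) of the two RDFS entailment rules that give inheritance-like behaviour: rdfs11 (rdfs:subClassOf is transitive) and rdfs9 (an instance of a subclass is an instance of its superclass). Because classes are identified by URIs, the subclass statements could come from different vocabularies; the ex: class names here are hypothetical:

```python
def rdfs_closure(triples):
    """Apply rdfs11 and rdfs9 repeatedly until nothing new is derived."""
    graph = set(triples)
    while True:
        sub = {(s, o) for s, p, o in graph if p == "rdfs:subClassOf"}
        # rdfs11: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
        new = {(a, "rdfs:subClassOf", c)
               for a, b in sub for b2, c in sub if b == b2}
        # rdfs9: (x type A) and (A subClassOf B) => (x type B)
        new |= {(x, "rdf:type", sup)
                for x, p, cls in graph if p == "rdf:type"
                for c, sup in sub if c == cls}
        if new <= graph:  # fixpoint: no new triples were derived
            return graph
        graph |= new

hierarchy = {
    ("ex:PreservedSpecimen", "rdfs:subClassOf", "ex:Specimen"),
    ("ex:Specimen", "rdfs:subClassOf", "ex:Occurrence"),
    ("urn:ex:specimen1", "rdf:type", "ex:PreservedSpecimen"),
}
closed = rdfs_closure(hierarchy)
print(sorted(o for s, p, o in closed
             if s == "urn:ex:specimen1" and p == "rdf:type"))
# ['ex:Occurrence', 'ex:PreservedSpecimen', 'ex:Specimen']
```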
Second, there is no global identity property in XML. One can use IDs, but they are local to a single instance document. The use of GUIDs will enable us to build a large collection of interrelated data objects of different types. To accomplish this in XML we would have to agree on how to represent GUIDs in all of the TDWG schemas. Again, this is something we can accomplish, but it will be accomplished by agreement rather than enforced by the technology stack.
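A minimal sketch of what global identity buys in RDF, assuming two hypothetical providers that both use the same LSID-style GUID for one specimen (all URIs and ex: property names are made up). With no blank nodes involved, merging the two graphs is just a set union, and statements about the same GUID line up automatically:

```python
# Hypothetical shared GUID for one specimen.
GUID = "urn:lsid:example.org:specimens:100234"

herbarium = {
    (GUID, "ex:catalogNumber", "B-100234"),
    (GUID, "ex:collector", "urn:lsid:example.org:people:7"),
}
sequencing_lab = {
    (GUID, "ex:hasSequence", "urn:lsid:example.org:sequences:AB123"),
}

# Set union merges the graphs; an XML ID value, by contrast, would only be
# meaningful inside the document that declared it.
merged = herbarium | sequencing_lab
about_specimen = sorted(p for s, p, o in merged if s == GUID)
print(about_specimen)
# ['ex:catalogNumber', 'ex:collector', 'ex:hasSequence']
```

Graphs containing blank nodes would first need those nodes renamed apart before the union, which this sketch does not handle.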
Third, XML Schema introduces the problem of schema interoperability. If I have a TCS XML Schema that allows pointers to instances of a publication XML Schema, and I want instances of my TCS schema to be able to represent publications either as GUIDs or as actual data, then I must design my TCS schema to import my publication schema. This is fine for taxon concepts and publications, but what about taxon concepts and specimens? The Specimen XML Schema would have to import the TCS schema (because a specimen can be identified as an instance of a particular taxon concept), and TCS would have to import the specimen schema. This is a circular import, and it is not allowed. Furthermore, there is no sophisticated XML instance pre-processor (like the C preprocessor) that supports conditional imports. To do this with XML we would have to change our requirements so that we only ever allow references to data objects defined under a foreign schema by GUID, and never allow copies of those foreign data objects to be embedded in our XML instance. In plain English, this means our TCS instance can't embed a publication data object; it can only refer to it by GUID. Once again, this increases our dependence on the GUID framework, which, because of the second problem listed above, exists by agreement only.
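By contrast, in RDF two vocabularies can point at each other's resources without either "importing" the other, because a reference is just a URI. A small Python sketch with hypothetical spec: and tc: properties and made-up GUIDs; whether the concept's triples travel in the same graph as the specimen's or are fetched separately by GUID is a data-exchange choice, not a schema constraint:

```python
# Hypothetical GUIDs for a specimen and a taxon concept.
SPECIMEN = "urn:lsid:example.org:specimens:100234"
CONCEPT = "urn:lsid:example.org:concepts:42"

graph = {
    # The specimen vocabulary refers to a taxon concept ...
    (SPECIMEN, "spec:identifiedAs", CONCEPT),
    (SPECIMEN, "spec:catalogNumber", "B-100234"),
    # ... and the taxon concept vocabulary refers back to the specimen.
    (CONCEPT, "tc:basedOnSpecimen", SPECIMEN),
    (CONCEPT, "tc:nameString", "Abies alba"),
}

def values(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}

(concept,) = values(graph, SPECIMEN, "spec:identifiedAs")
print(values(graph, concept, "tc:nameString"))                     # {'Abies alba'}
print(values(graph, concept, "tc:basedOnSpecimen") == {SPECIMEN})  # True
```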
These are only three of the many issues with using XML Schema to build a large collection of interrelated data objects. RDF (along with RDF Schema and/or OWL) solves many of these problems. To be fair, RDF also has its drawbacks, including but not limited to the complexity of client APIs and the inefficiency of triple stores. I'd be happy to discuss the problems on both sides of this ontological divide at more length if anyone else is interested.
I already tried the primer, but it did not help me; it seemed to discuss use cases from artificial intelligence that are hard for me to follow.
The RDF primer is a good place to start reading:
http://www.w3.org/TR/rdf-primer/
It is fewer than 100 printed pages, so it can probably be read in an evening and understood in several evenings!
There is a tutorial here:
http://www.w3schools.com/rdf/default.asp
and loads of books and things
The key to understanding it, I found, was that it is about describing resources, not validating documents. When using XML Schema we are trying to create a set of rules to validate a document that describes the resource; we are effectively designing forms. With RDF we describe the resource directly, using whichever attributes we want to describe it with. Thus the two approaches are not mutually exclusive, which I hoped to demonstrate with my code.
That may point to the problem I have, because I do not think we are describing resources. In my mind we are sharing scientific data. I want the data, not the resources.
Resources only act as identifiers for things, for data objects of a particular type. What is important is the description of those things (the data). In the RDF universe I've been imagining, resources are GUIDs for things like names, specimens, observations, publications, people, institutions, sequences, etc.
One thing we haven't talked about is the fundamental unit of data exchange in an RDF universe. It's not a document (as in the XML universe), nor is it a single statement (a triple); instead it is a set of triples that together form a concise description of a resource, a Concise Bounded Description (CBD). See http://swdev.nokia.com/uriqa/CBD.html (a W3C Member Submission).
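A simplified Python sketch of computing a CBD, assuming triples are string tuples and blank nodes are strings starting with "_:" (an assumption of this sketch, not part of the proposal). It includes every triple whose subject is the resource and recursively expands blank-node objects; the proposal's extra step for reified statements is omitted. All identifiers and values are hypothetical:

```python
def concise_bounded_description(graph, resource):
    """All triples whose subject is `resource`, plus, recursively, all
    triples whose subject is a blank node reached as an object.
    Reification triples (part of the full definition) are ignored."""
    result = set()
    pending = [resource]
    visited = set()
    while pending:
        subject = pending.pop()
        if subject in visited:
            continue
        visited.add(subject)
        for s, p, o in graph:
            if s == subject:
                result.add((s, p, o))
                if o.startswith("_:"):  # only blank nodes are expanded
                    pending.append(o)
    return result

S = "urn:lsid:example.org:specimens:100234"
PERSON = "urn:lsid:example.org:people:7"
graph = {
    (S, "ex:catalogNumber", "B-100234"),
    (S, "ex:collectingEvent", "_:e1"),
    ("_:e1", "ex:date", "1998-06-14"),
    ("_:e1", "ex:locality", "_:loc"),
    ("_:loc", "ex:country", "DE"),
    (S, "ex:collector", PERSON),
    (PERSON, "ex:name", "A. Collector"),  # has its own URI, so its own CBD
}
cbd = concise_bounded_description(graph, S)
print(len(cbd))  # 6
```

The collector's name is left out because the collector has a URI of its own; a client that wants it can request that resource's description separately.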
-Steve