[Tdwg-tag] roger at tdwg.org
Steven Perry
smperry at ku.edu
Tue Mar 28 17:27:48 CEST 2006
Hi Gregor,
I've placed some comments regarding your exchange with Roger in-line:
Gregor Hagedorn wrote:
>Hi Roger
>
>
>
>>TAG list url is here with the archive:
>>
>>http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org
>>
>>
>
>thanks, I registered.
>
>
>
>>>I need help to understand rdf. Whereas xml schema has a conceptual mapping to
>>>database or oo-programming design, rdf seems to have none; I lack anything I can
>>>relate it to. I still have not seen any software to help me understand what
>>>you produced.
>>>
>>>
>>>
XML Schema was not designed to have a conceptual mapping to databases or
object-oriented frameworks. There is a set of tools and a series of
conventions for loading XML schema instances into objects and for
mapping schemas into relational table structures, but most of these
systems only work if you use a subset of XML Schema language features.
For example, XML Schema features like substitution groups and xsd:any
cause many of these tools to have problems.
>>RDF is no more complex than xml schema. The RDFS way of doing things is
>>far more object orientated than schema. It forces you to have classes
>>and properties whereas arbitrary XML document structures can be
>>ambiguous as to whether they are defining objects or properties of
>>objects so - I don't see your reasoning.
>>
>>
>
>I never encountered atomizing every statement into subject-predicate-object in
>OO design...
>
>
>
That's very true and whether you think of RDF as a graph of 3-tuples or
whether you envision it as a set of "objects" that are instances of
classes depends on the type of problem you're trying to solve. Triples
are the lowest level but thinking in terms of the abstraction of objects
and classes can be helpful for some tasks.
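To make the two views concrete, here is a minimal Python sketch of the
same data seen first as bare triples and then as "objects". The
identifiers and property names (urn:example:..., ex:collector, and so
on) are invented for illustration and are not TDWG vocabulary; plain
tuples stand in for a real RDF library.

```python
from collections import defaultdict

# Hypothetical identifiers and property names, for illustration only.
triples = [
    ("urn:example:specimen:1", "rdf:type", "ex:Specimen"),
    ("urn:example:specimen:1", "ex:collector", "urn:example:person:7"),
    ("urn:example:specimen:1", "ex:identifiedAs", "urn:example:concept:42"),
    ("urn:example:person:7", "rdf:type", "ex:Person"),
    ("urn:example:person:7", "ex:name", "A. Collector"),
]

def as_objects(triples):
    """Group triples by subject, giving one property dict per resource."""
    objects = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        objects[subject][predicate].append(obj)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {s: dict(props) for s, props in objects.items()}

objects = as_objects(triples)
specimen = objects["urn:example:specimen:1"]
print(specimen["rdf:type"])      # the "class" of the object
print(specimen["ex:collector"])  # a reference to another object, by identifier
```

Nothing is added or lost in the grouping step: the object view is only
a different way of reading the same set of statements.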
As an aside, some XML databases reduce XML Schema instances to a
low-level structure called a flattened tree that can be analogous to
triples. It is possible to decompose any XML instance into an ordered
list of XPath = value pairs where the XPaths are concrete and used to
refer to any element or attribute in the document. This is one of two
approaches for building an XML database from scratch.
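The decomposition described above can be sketched in a few lines of
Python using the standard library. This is only an illustration under
simplifying assumptions: it ignores namespaces and mixed content, and
the sample document and its element names are made up.

```python
import xml.etree.ElementTree as ET

def flatten(xml_text):
    """Return an ordered list of (concrete XPath, value) pairs for a document."""
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(elem, path):
        # Attributes come first, addressed as path/@name.
        for name, value in elem.attrib.items():
            pairs.append((f"{path}/@{name}", value))
        text = (elem.text or "").strip()
        if text:
            pairs.append((path, text))
        # Position predicates make each child path concrete (unambiguous).
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    return pairs

# Hypothetical sample document, not an instance of any TDWG schema.
doc = """<specimen id="s1">
  <name>Quercus alba</name>
  <collector>A. Collector</collector>
  <collector>B. Collector</collector>
</specimen>"""

for xpath, value in flatten(doc):
    print(f"{xpath} = {value}")
```

Each pair reads much like a triple whose subject is the document, whose
predicate is the path and whose object is the value, which is where the
analogy comes from.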
>"to whether they are defining objects or properties of objects so": xml schema
>is about classes, not objects (instances). Can you give an example what you
>find confusing in xml schema, I don't see it.
>
>
>
XML Schema is a grammar for accepting or rejecting documents. It does
not define classes of objects and the relationships among those
classes. With care and common agreement from the stakeholders of an XML
Schema one can create a schema such that it describes classes of
objects, but this is by agreement not by design. The only class of
object described by the ABCD schema is an ABCD document, not a specimen
or a publication or a name or what have you.
>Of course you do have the strange animal of mixed content in xml schema, but
>ignoring this (none of the TDWG standard used it) you have classes and each
>class has a type. The type can be simple or complex, just like in OO languages.
>
>
>
I agree that you can define simple and complex types in an XML Schema,
however these are syntactic types. An XML Schema type (simple or
complex) is simply a rule for accepting or rejecting an XML subtree. It
does not define what a thing is and how it relates to other things, it
merely describes the form a thing must have in order to be acceptable to
a validating XML parser.
Because XML Schema was designed to be a grammar for the validation of
XML trees and not a semantic typing system, using it to build a global
collection of interrelated data objects introduces a variety of issues:
First, XML Schema is very limited in the relationships you can define
between types. One of the most-used relationships in OOA/OOD is
inheritance and XML Schema does not provide proper inheritance. In XML
Schema, substitution groups can be used as an inheritance-like language
function, but they only work within a single schema. To do more than
that, one must start importing other schemas which can cause some
surprising problems.
Second, there is no global identity property in XML. One can use IDs,
but they are local to a single instance document. The use of GUIDs will
enable us to build a large collection of interrelated data objects of
different types. To accomplish this in XML we would have to agree on
how to represent GUIDs in all of the TDWG schemas. Again, this is
something we can accomplish, but it will be accomplished by agreement
rather than being enforced by the technology stack.
Third, XML Schema introduces the problem of schema interoperability. If
I have a TCS XML Schema that allows pointers to instances of a
publication XML Schema and I want instances of my TCS schema to be able
to represent publications either as GUIDs or as actual data, then I must
design my TCS schema to import my publication schema. This is fine for
taxon concepts and publications, but what about taxon concepts and
specimens? The Specimen XML Schema would have to import the TCS schema
(because a specimen can be identified as an instance of a particular
taxon concept) and TCS would have to import the specimen schema. This
is a circular import and it is not allowed. Furthermore, there is no
sophisticated XML instance pre-processor system (as in C compilers) that
supports conditional imports. In order to do this with XML we would
have to change our requirements such that we only ever allow references
to data objects defined under a foreign schema by GUID and never allow
copies of those foreign data objects to be embedded in our XML
instance. In plain English this means our TCS instance can't embed a
publication data object, it can only refer to it by GUID. Once again,
this builds greater dependency upon the GUID framework which exists by
agreement only due to the second problem listed above.
These are only three of a great many issues with using XML Schema to
build a large collection of interrelated data objects. RDF (along with
RDF-Schema and/or OWL) solves many of these problems. To be fair, RDF
also has its drawbacks, including the complexity of client APIs and the
inefficiency of triple stores. I'd be happy to discuss problems on both
sides of this ontological divide at more length if anyone else is
interested.
>I already tried the primer but it did not help me; it seemed to talk
>of use cases in artificial intelligence that are hard for me to follow.
>
>
>
>>The RDF primer is a good place to start reading:
>>
>>http://www.w3.org/TR/rdf-primer/
>>
>>It is less than 100 printed pages so can probably be read in an evening
>>and understood in several evenings!
>>
>>There is a tutorial here:
>>
>>http://www.w3schools.com/rdf/default.asp
>>
>>and loads of books and things
>>
>>The key to understanding it I found was that it is about describing
>>resources not validating documents. When using XML Schema we are trying
>>to create a set of rules to validate a document that describes the
>>resource. We are effectively designing forms. With RDF we are describing
>>the attributes of the resource that we want to use to describe it. Thus
>>the two things are not mutually exclusive - which I hoped to demonstrate
>>with my code.
>>
>>
>
>That may be a good pointer to the problems I have. Because I do not think we
>are describing resources. In my mind we are sharing scientific data. I want the
>data, not the resources.
>
>
Resources only act as identifiers for things, for data objects of a
particular type. What is important is the description of those things
(the data). In the RDF universe I've been imagining, resources are
GUIDs for things like names, specimens, observations, publications,
people, institutions, sequences, etc.
One thing we haven't talked about is the fundamental unit of data
exchange in an RDF universe. It's not a document (as in the XML
universe) nor is it a statement (a triple), instead it is a set of
triples that form a concise description of a resource. See
http://swdev.nokia.com/uriqa/CBD.html (a W3C proposal).
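As a rough illustration of what such a unit of exchange looks like,
here is a simplified Python sketch in the spirit of the concise bounded
description idea. It takes every statement whose subject is the
resource and, where an object is a blank node, follows that node too.
It leaves out the handling of reified statements, treats any string
starting with "_:" as a blank node, and uses invented identifiers and
property names, so read it as a sketch and not as the algorithm from
the linked document.

```python
def concise_bounded_description(triples, resource):
    """Collect the triples describing `resource`, following blank nodes."""
    description = []
    pending = [resource]
    visited = set()
    while pending:
        node = pending.pop()
        if node in visited:
            continue
        visited.add(node)
        for s, p, o in triples:
            if s == node:
                description.append((s, p, o))
                # Blank nodes have no global identity, so their
                # statements must travel with the description.
                if o.startswith("_:"):
                    pending.append(o)
    return description

# Hypothetical data: one specimen with an anonymous locality node.
triples = [
    ("urn:example:specimen:1", "ex:identifiedAs", "urn:example:concept:42"),
    ("urn:example:specimen:1", "ex:locality", "_:b1"),
    ("_:b1", "ex:country", "Kenya"),
    ("_:b1", "ex:elevation", "1200"),
    ("urn:example:concept:42", "ex:name", "Quercus alba"),
]

cbd = concise_bounded_description(triples, "urn:example:specimen:1")
for triple in cbd:
    print(triple)
```

The statement about the taxon concept's name is not part of the result:
the concept has its own GUID, so a client that wants its description
asks for it separately.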
-Steve