Dear Steve,
many thanks for your comments. You definitely pointed out many places where my language was inaccurate.
XML Schema is a grammar for accepting or rejecting documents. It does not define classes of objects and the relationships among those classes. With care and common agreement from the stakeholders of an XML Schema one can create a schema such that it describes classes of objects, but this is by agreement not by design. The only class of object described by the ABCD schema is an ABCD document, not a specimen or a publication or a name or what have you.
I agree, but then you can (and TCS and UBIF/SDD do) use schema in a way that by design avoids global elements, substitution groups, etc. Instead, we use types, which are always intended to map to a class.
I agree that you can define simple and complex types in an XML Schema, however these are syntactic types. An XML Schema type (simple or complex) is simply a rule for accepting or rejecting an XML subtree. It does not define what a thing is and how it relates to other things, it merely describes the form a thing must have in order to be acceptable to a validating XML parser.
Yes, but if you write these rules by means of class inheritance, extension and polymorphism, and you add a note that this is not meant to be random, you are surely entitled to interpret this as design. You could just as well claim that a Java (or any other) OO architecture is not about defining what a thing is and how it relates to other things. Strictly speaking you are correct, but I believe that no strict separation is in place here.
First XML Schema is very limited in the relationships you can define between types. One of the most-used relationships in OOA/OOD is inheritance and XML Schema does not provide proper inheritance.
I believe it does. W3C schema has type derivation, extension, and even type polymorphism (all somewhat limited by the parsing-determinism optimisations in schema). You can have extension on both simple and complex types. For polymorphism, it even has the special xsi:type attribute (strictly a separate schema, but documented in the W3C schema documentation).
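To make the point concrete, here is a minimal sketch. It is not taken from TCS or UBIF/SDD: the type names BaseObject and Specimen, the element names Object, Label and Collector, and the helper functions are all invented for illustration. It uses only Python's standard library, which cannot validate against a schema, so the code merely reads the extension chain out of the schema and checks that the type named in xsi:type derives from the declared type of the element.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical schema: an abstract base type and one type derived by extension.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="BaseObject" abstract="true">
    <xs:sequence>
      <xs:element name="Label" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Specimen">
    <xs:complexContent>
      <xs:extension base="BaseObject">
        <xs:sequence>
          <xs:element name="Collector" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="Object" type="BaseObject"/>
</xs:schema>
"""

# Hypothetical instance: the element is declared as BaseObject,
# but xsi:type says that this occurrence is really a Specimen.
INSTANCE = """
<Object xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="Specimen">
  <Label>Ipomoea violacea</Label>
  <Collector>N.N.</Collector>
</Object>
"""


def derivation_map(schema_root):
    """Map each named complex type to the base it extends (None if it has no base)."""
    bases = {}
    for ctype in schema_root.findall(f"{{{XS}}}complexType"):
        ext = ctype.find(f"{{{XS}}}complexContent/{{{XS}}}extension")
        bases[ctype.get("name")] = ext.get("base") if ext is not None else None
    return bases


def derives_from(type_name, base_name, bases):
    """Walk up the extension chain and report whether base_name is reached."""
    while type_name is not None:
        if type_name == base_name:
            return True
        type_name = bases.get(type_name)
    return False


schema_root = ET.fromstring(SCHEMA)
bases = derivation_map(schema_root)
instance = ET.fromstring(INSTANCE)

declared = schema_root.find(f"{{{XS}}}element[@name='Object']").get("type")
actual = instance.get(f"{{{XSI}}}type", declared)
print(actual, "derives from", declared, ":", derives_from(actual, declared, bases))
```

A validating parser performs the same kind of check when it meets xsi:type, which is why I read type extension as an inheritance mechanism rather than as a purely syntactic rule.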
Substitution groups can be used as an inheritance-like language function, but they only work within a single schema. To do more than that, one must start importing other schemas, which can cause some surprising problems.
I believe what you say may be true for substitution groups, but not for extension.
Second, there is no global identity property in XML. One can use IDs, but they are local to a single instance document. The use of GUIDs will enable us to build a large collection of interrelated data objects of different types. To accomplish this in XML we would have to agree on how to represent GUIDs in all of the TDWG schemas. Again, this is something we can accomplish, but it will be accomplished by agreement as opposed to being enforced by the technology stack.
I think this is erroneous. Whenever you define a data element as the simple type xs:anyURI, you inform any parser that you mean this to be a GUID. Whether parsers use that information is another question; it is certainly not validated by current validators.
Also, there is no restriction that id attributes must be local. In fact they can be freely typed, including as xs:anyURI.
=== By the way, conversely, in SDD we have identified a major problem in forcing people to use URIs for every internal reference. The problem is partly one of learning curve (school children trying to develop their own LUCID key to backyard plants should NOT be bothered with defining their GUID scheme first, and as a biologist I may say that biologists would often like to be treated the same...) and partly legal (e.g. my current base address is my employer's, but as soon as I leave or retire, I am legally forced to no longer use bba.de under any circumstances).
I am not sure how to overcome this; perhaps someone should indeed register a urn:local schema. ===
Third, XML Schema introduces the problem of schema interoperability. If I have a TCS XML Schema that allows pointers to instances of a publication XML Schema and I want instances of my TCS schema to be able to represent publications either as GUIDs or as actual data, then I must design my TCS schema to import my publication schema. This is fine for taxon concepts and publications, but what about taxon concepts and specimens? The Specimen XML Schema would have to import the TCS schema (because a specimen can be identified as an instance of a particular taxon concept) and TCS would have to import the specimen schema. This is circular import and it is not allowed.
I fully agree with this being a serious problem.
In principle it is possible to overcome this with the use of type polymorphism. UBIF would define an abstract base type (and yes, if we need more base types we would need to extend the UBIF schema, creating a new version of it).
However, in testing in 2002/2003 it turned out that major XML tools did not handle multi-namespace schemata correctly, so we never got very far down that road. So I cannot say how realistic the solution is with current software.
I agree this is an open problem with w3c schema.
These are only three out of a great many issues with using XML Schema to build a large collection of interrelated data objects. RDF (along with RDF-Schema and/or OWL) solves many of these problems. To be fair, RDF also has its drawbacks, not limited to complexity of client APIs and inefficiency of triple stores. I'd be happy to discuss problems on both sides of this ontological divide at more length if anyone else is interested.
I am and I believe we should be. Please do so, to help us get a clearer picture. Current usage ("mainstream") seems to point to XML Schema, but I think ontological approaches are exciting. I just feel we lose quite a bit as well, simply because RDF may be so general that it does not allow us to write software for more constrained (and therefore easier to analyse) cases. Although an RDBMS can be used as a triple store, that is not what it is designed for, so my current impression is that we do lose the time-proven utility of ER models implemented in RDBMS. I may still be wrong; I am just starting to learn about RDF/S.
Resources only act as identifiers for things, for data objects of a particular type. What is important is the description of those things (the data). In the RDF universe I've been imagining, resources are GUIDs for things like names, specimens, observations, publications, people, institutions, sequences, etc.
What are the resources, what are metadata and data when expressing knowledge about a taxon or specimen in saying:
Ipomoea violacea in the USA: "Flowers frequently dark to light blue, sometimes bordering on violet (G. Hagedorn, 29.3.2006)" and then "Flowers dark or light blue to purplish (Much. Better, 30.3.2006)"
We have object parts, characters, states, frequency modifiers, IPR metadata, versions, etc.
SDD expresses this through XML Schema. I find it very hard to think how to express this in RDF triples. Attempting this may help to understand what we lose in RDF.
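As a starting point for such an attempt, here is a rough sketch of how the first statement might be decomposed into subject-predicate-object triples. Everything under the ex: prefix (ex:desc1, ex:objectPart, ex:frequency and so on) is a made-up vocabulary, not an existing ontology, and a plain Python list of tuples stands in for a real triple store. I make no claim that this is the right modelling, only that it shows the kind of decomposition that would be needed.

```python
# Each triple is (subject, predicate, object).
# All "ex:" names are hypothetical; none comes from SDD or a published ontology.
triples = [
    ("ex:desc1", "rdf:type", "ex:Description"),
    ("ex:desc1", "ex:aboutTaxon", "ex:Ipomoea_violacea"),
    ("ex:desc1", "ex:region", "USA"),
    ("ex:desc1", "ex:author", "G. Hagedorn"),
    ("ex:desc1", "ex:date", "29.3.2006"),
    ("ex:desc1", "ex:statement", "ex:stmt1"),
    ("ex:desc1", "ex:statement", "ex:stmt2"),
    # "Flowers frequently dark to light blue"
    ("ex:stmt1", "ex:objectPart", "ex:Flower"),
    ("ex:stmt1", "ex:character", "ex:Colour"),
    ("ex:stmt1", "ex:state", "dark to light blue"),
    ("ex:stmt1", "ex:frequency", "frequently"),
    # "sometimes bordering on violet"
    ("ex:stmt2", "ex:objectPart", "ex:Flower"),
    ("ex:stmt2", "ex:character", "ex:Colour"),
    ("ex:stmt2", "ex:state", "bordering on violet"),
    ("ex:stmt2", "ex:frequency", "sometimes"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def render(description):
    """Reassemble a readable line per statement node of a description."""
    lines = []
    for stmt in objects(description, "ex:statement"):
        part = objects(stmt, "ex:objectPart")[0]
        freq = objects(stmt, "ex:frequency")[0]
        state = objects(stmt, "ex:state")[0]
        lines.append(f"{part}: {freq} {state}")
    return lines


for line in render("ex:desc1"):
    print(line)
```

Even this small case needs an intermediate node per statement just to carry the frequency modifier, and the second version by "Much. Better" would need a further description node linked to the first as a revision. That bookkeeping is implicit in the SDD document structure, which may be part of what we give up.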
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19           Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203