[Tdwg-tag] w3c xml-schema discussion
Gregor Hagedorn
G.Hagedorn at BBA.DE
Wed Mar 29 23:27:22 CEST 2006
Dear Steve,
Many thanks for your comments. You definitely pointed out many places where my
language was inaccurate.
> XML Schema is a grammar for accepting or rejecting documents. It does
> not define classes of objects and the relationships among those
> classes. With care and common agreement from the stakeholders of an XML
> Schema one can create a schema such that it describes classes of
> objects, but this is by agreement not by design. The only class of
> object described by the ABCD schema is an ABCD document, not a specimen
> or a publication or a name or what have you.
I agree, but you can (and TCS and UBIF/SDD do) use schema in a way that by
design avoids global elements, substitution groups, etc. Instead, we use
types, each of which is intended to map to a class.
> I agree that you can define simple and complex types in an XML Schema,
> however these are syntactic types. An XML Schema type (simple or
> complex) is simply a rule for accepting or rejecting an XML subtree. It
> does not define what a thing is and how it relates to other things, it
> merely describes the form a thing must have in order to be acceptable to
> a validating XML parser.
Yes, but if you write these rules by means of class inheritance, extension and
polymorphism, and you add a note that this is not meant to be random, you are
surely entitled to interpret this as design. You could just as well claim that
a Java (or any other) OO architecture is not about defining what a thing is and
how it relates to other things. Strictly speaking you are correct, but I do not
believe a strict separation holds here.
> First XML Schema is very limited in the relationships you can define
> between types. One of the most-used relationships in OOA/OOD is
> inheritance and XML Schema does not provide proper inheritance.
I believe it does. W3C schema has type derivation, extension, and even type
polymorphism (all somewhat limited by the parsing-determinism optimisations in
schema). You can have extension on both simple and complex types. For
polymorphism, it even has the special xsi:type attribute (strictly part of a
separate schema, but documented in the W3C schema documentation).
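As a rough illustration of how xsi:type can carry this kind of polymorphism
into software, here is a minimal Python sketch. The type names Agent and
Person, the element names and the sample document are hypothetical and are not
taken from TCS, UBIF or SDD. The sketch does not validate anything against a
schema; it only shows a consumer mapping schema types onto classes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Qualified name of the xsi:type attribute as ElementTree reports it.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


@dataclass
class Agent:
    """Mirrors a hypothetical base complex type 'Agent'."""
    label: str


@dataclass
class Person(Agent):
    """Mirrors a hypothetical type 'Person' derived from 'Agent' by extension."""
    email: str = ""


# Map from schema type name to the class it is intended to represent.
TYPE_MAP = {"Agent": Agent, "Person": Person}


def load_agent(elem):
    # xsi:type, when present, names the derived type actually used;
    # otherwise the declared (base) type applies.
    type_name = elem.get(XSI_TYPE, "Agent")
    # Drop a namespace prefix such as "ex:" if there is one.
    type_name = type_name.split(":")[-1]
    cls = TYPE_MAP[type_name]
    fields = {child.tag: (child.text or "") for child in elem}
    return cls(**fields)


DOC = """
<Agents xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Agent><label>BBA</label></Agent>
  <Agent xsi:type="Person"><label>G. Hagedorn</label><email>someone@example.org</email></Agent>
</Agents>
"""

agents = [load_agent(e) for e in ET.fromstring(DOC)]
```

The second element is declared as Agent but is loaded as a Person, which is
the substitution that type extension plus xsi:type is meant to allow.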
> substitution groups can be used as an inheritance-like language
> function, but they only work within a single schema. To do more than
> that, one must start importing other schemas which can cause some
> surprising problems.
I believe what you say may be true for substitution groups, but not for
extension.
> Second, there is no global identity property in XML. One can use id's,
> but they are local to a single instance document. The use of GUIDs will
> enable us to build a large collection of interrelated data objects of
> different types. To accomplish this in XML we would have to agree on
> how to represent GUIDs in all of the TDWG schema. Again, this is
> something we can accomplish, but it will be accomplished by agreement
> instead as opposed to being enforced by the technology stack.
I think this is erroneous. Whenever you define a data element as the simple
type xs:anyURI you inform any parser that you mean this to be a GUID. Whether
parsers use that information is another question; it certainly is not validated
in current validators.
Also, there is no restriction that id attributes must be local. In fact they
can be freely typed, including as xs:anyURI.
===
By the way, conversely, in SDD we have identified a major problem in forcing
people to use URIs for every internal reference. The problems are the learning
curve (school children trying to develop their own LUCID key to backyard plants
should NOT be bothered with defining their GUID scheme first, and as a
biologist I may say that biologists often would like to be treated the same...)
and legal constraints (e.g. my current base address is my employer's, but as
soon as I leave or retire, I am legally forced to no longer use bba.de under
any circumstances).
I am not sure how to overcome this; perhaps someone should indeed register a
urn:local scheme.
===
> Third, XML Schema introduces the problem of schema interoperability. If
> I have a TCS XML Schema that allows pointers to instances of a
> publication XML Schema and I want instances of my TCS schema to be able
> to represent publications either as GUIDs or as actual data, then I must
> design my TCS schema to import my publication schema. This is fine for
> taxon concepts and publications, but what about taxon concepts and
> specimens? The Specimen XML Schema would have to import the TCS schema
> (because a specimen can be identified as an instance of a particular
> taxon concept) and TCS would have to import the specimen schema. This
> is circular import and it is not allowed.
I fully agree with this being a serious problem.
In principle it is possible to overcome this with the use of type polymorphism.
UBIF would define an abstract base type (and yes, if we need more base types we
would need to extend the UBIF schema, creating a new version of it).
However, in testing in 2002/2003 it turned out that major XML tools did not
handle multi-namespace schemata correctly, so we never got very far down that
road. So I cannot say how realistic the solution is with current software.
I agree this is an open problem with W3C schema.
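The abstract-base-type idea can be sketched in OO terms as follows. This is
only an analogy in Python, with hypothetical class and field names (BaseObject,
TaxonConcept, Specimen, the urn:example identifiers); it says nothing about
whether XML tools handle the equivalent multi-namespace schemata correctly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# "Base layer" (the role UBIF would play): the only definition that both
# of the other layers depend on. Intended as abstract, never used directly.
@dataclass(eq=False)
class BaseObject:
    guid: str


# "Taxon concept layer": refers to the base type only, never to Specimen.
@dataclass(eq=False)
class TaxonConcept(BaseObject):
    name: str = ""
    vouchers: List[BaseObject] = field(default_factory=list)


# "Specimen layer": refers to the base type only, never to TaxonConcept.
@dataclass(eq=False)
class Specimen(BaseObject):
    identified_as: Optional[BaseObject] = None


# At the instance level the two objects still point at each other,
# although neither definition mentions the other.
concept = TaxonConcept(guid="urn:example:concept:1", name="Ipomoea violacea")
specimen = Specimen(guid="urn:example:specimen:7", identified_as=concept)
concept.vouchers.append(specimen)
```

The mutual dependency between the two definitions is replaced by a shared
dependency on the base layer, at the price that each reference is only known
to be "some base object" rather than specifically a specimen or a concept.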
> This is only three out of a great many issues with using XML Schema to
> build a large collection of interrelated data objects. RDF (along with
> RDF-Schema and/or OWL) solve many of these problems. To be fair RDF
> also has its drawbacks, not limited to complexity of client-APIs and
> inefficiency of triple stores. I'd be happy to discuss problems on both
> sides of this ontological divide at more length if anyone else is
> interested.
I am, and I believe we should be. Please do so, to help us get a clearer
picture. Current usage ("mainstream") seems to point to xml-schema, but I think
ontological approaches are exciting. I just feel we lose quite a bit as well,
simply because RDF may be so general that it does not allow software to be
written for more constrained (and therefore easier to analyse) cases. Although
an RDBMS can be used as a triple store, that is not what it is designed for, so
my current impression is that we lose the time-proven utility of ER models
implemented in RDBMS. I may still be wrong; I have only just started to learn
about RDF/S.
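To make the contrast between an ER model and a triple store in an RDBMS a
little more concrete, here is a minimal sketch using Python's sqlite3 module.
The table names, column names and sample rows are invented for illustration;
real triple stores and real specimen databases are of course more elaborate.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# ER-style: one typed table per entity; the database enforces the columns.
con.execute(
    "CREATE TABLE specimen ("
    "id TEXT PRIMARY KEY, collector TEXT NOT NULL, country TEXT NOT NULL)"
)
con.execute("INSERT INTO specimen VALUES ('s1', 'A. Collector', 'USA')")

# Triple-style: a single generic table; all structure lives in the data.
con.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
con.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("s1", "rdf:type", "Specimen"),
        ("s1", "collector", "A. Collector"),
        ("s1", "country", "USA"),
    ],
)

# The same question ("collectors of specimens from the USA") in both designs.
er_rows = con.execute(
    "SELECT id, collector FROM specimen WHERE country = 'USA'"
).fetchall()

# In the triple table every property needed costs one more self-join.
triple_rows = con.execute(
    """
    SELECT c.subject, c.object
    FROM triple AS t
    JOIN triple AS k ON k.subject = t.subject AND k.predicate = 'country'
    JOIN triple AS c ON c.subject = t.subject AND c.predicate = 'collector'
    WHERE t.predicate = 'rdf:type' AND t.object = 'Specimen' AND k.object = 'USA'
    """
).fetchall()
```

Both queries return the same row, but the triple version needs one self-join
per property and the database no longer guarantees that a specimen has a
collector or a country at all.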
> Resources only act as identifiers for things, for data objects of a
> particular type. What is important is the description of those things
> (the data). In the RDF universe I've been imagining, resources are
> GUIDs for things like names, specimens, observations, publications,
> people, institutions, sequences, etc.
What are the resources, what are metadata and data when expressing knowledge
about a taxon or specimen in saying:
Ipomoea violacea in the USA: "Flowers frequently dark to light blue, sometimes
bordering on violet (G. Hagedorn, 29.3.2006)" and then "Flowers dark or light
blue to purplish (Much. Better, 30.3.2006)"
We have object parts, characters, states, frequency modifiers, IPR metadata,
versions, etc.
SDD expresses this through xml-schema. I find it very hard to think how to
express this in RDF triples. Perhaps attempting it would help us understand
what we lose in RDF.
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn at bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19 Tel: +49-30-8304-2220
14195 Berlin, Germany Fax: +49-30-8304-2203