All:
I sent this yesterday without being properly subscribed to the mailing lists. Apologies to anyone for whom this is a repeat.
Flip
From: Phillip C. Dibner pcd@ecosystem.com
Date: February 12, 2006 2:21:49 PM PST
To: Stan Blum sblum@calacademy.org
Cc: tdwg-guid@lists.tdwg.org, tdwg-tag@lists.tdwg.org, Donald Hobern dhobern@gbif.org, John Wieczorek tuco@berkeley.edu, renato@cria.org.br, Reed Beaman reed.beaman@yale.edu, Simon Cox Simon.Cox@csiro.au
Subject: Re: RDF versus GML; old but relevant (?)
Stan,
Thanks for the message. You've mentioned some items that are central to several discussions in which I've been engaged during the past few months. I've taken the liberty of copying Simon Cox on this response since, as you know, he's quite knowledgeable about these issues, and also very open to the use of other languages despite his participation in the authorship of GML. Simon, I'm hoping you (as well as anyone else who wants to contribute) will correct any misstatements that I make here.
I take your message to contain two fundamental points. Responding to each of them:
First point:
I (we) desperately need to solve the puzzle of how to compose (re-use) conceptual specifications. Can we create a flexible system of base classes that can be snapped together to make useful data exchange applications? I think this is one of, if not THE most important tasks for the incipient TDWG Architecture group.
In short, absolutely. I have no doubt that we can create such a flexible collection of classes. The key is in your choice of the word "conceptual." If we get the information model right, we can express it in any of several languages. Any given language (even UML, I'm given to understand) has its limitations and inherent biases, but if we develop appropriate intuitive and formal understandings of the relationships - mappings - between the various expressions, we'll have achieved the goal that I think you correctly identify as the most important task for the Architecture group. The question is how to factor them, and perhaps how, if at all, to identify them as belonging to the same family of specifications by defining an abstract DarwinCore base class.
Second point, relating to Chris Goad's comments (mapbureau.com) on the non-composability of GML (which I won't reproduce inline here):
Chris is a great thinker and very knowledgeable about these technologies, but I think his comments leave out some important points. There are arguably two different notions of what GML is.
In one sense, GML is a formal specification written in XML Schema, well on its way to becoming an ISO-blessed International Standard. As such, it does impose the formal requirement that features derive from gml:AbstractFeatureType. I view this partly as a by-product of the way that XML Schema works, and partly as the reasonable decision by the designers of GML to provide a generic, abstract notion of what a Feature is, and by the way provide some minimal properties that every Feature has.
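To make the derivation requirement concrete, here is a minimal sketch of the kind of XML Schema fragment it implies, generated with Python's standard library. The names SpecimenType, scientificName and collector are hypothetical and are not taken from DarwinCore or any published GML application schema. The fragment assumes it will sit inside a schema that declares the gml prefix.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def feature_type_schema(type_name, properties):
    """Build an xs:complexType that extends gml:AbstractFeatureType.

    `properties` is a list of (element name, XML Schema type) pairs.
    The result is a fragment only; the enclosing schema is assumed to
    declare the `gml` prefix and import the GML schema.
    """
    ET.register_namespace("xs", XS)
    complex_type = ET.Element(f"{{{XS}}}complexType", {"name": type_name})
    content = ET.SubElement(complex_type, f"{{{XS}}}complexContent")
    # The formal requirement discussed above: every feature type derives
    # from the abstract GML feature type.
    extension = ET.SubElement(
        content, f"{{{XS}}}extension", {"base": "gml:AbstractFeatureType"}
    )
    sequence = ET.SubElement(extension, f"{{{XS}}}sequence")
    for prop_name, xs_type in properties:
        ET.SubElement(
            sequence, f"{{{XS}}}element", {"name": prop_name, "type": xs_type}
        )
    return complex_type


# Hypothetical feature type, for illustration only.
specimen = feature_type_schema(
    "SpecimenType",
    [("scientificName", "xs:string"), ("collector", "xs:string")],
)
schema_text = ET.tostring(specimen, encoding="unicode")
print(schema_text)
```

The only part that the GML specification dictates here is the extension base; everything inside the sequence is up to the application schema.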
Whether this is a problem or not depends on what you want to do with it. GML definitions can be used for many purposes. The place where you have no choice but to use them is where you want to make information available through an OGC-compliant Web Feature Server, and accessible to Web Feature Service client applications. That is very simply because these software applications have been written to comply with the published specification; the motivation for using it is simply that there are a great and increasing number of such implementations. Ultimately, this makes the capability to collate and analyze biocollections or other data significantly more accessible and less costly. Note that the data need only be expressed as (XML Schema / )GML from the point where they exit the server to the point where they enter the client application. It is necessary to have a well-defined and broadly accepted GML schema for this, but the data need not reside in the source datastore in this format. Nonetheless, serving the source data as (XML Schema / )GML is much easier if there is a natural mapping of the source data to (XML Schema / )GML.
This leads to the other notion of what GML is: an information model. GML can be expressed in other languages than XML Schema. As many of you know, an early version of GML was written in RDF, but the authors decided to use XML Schema instead because at the time, tools to support RDF (and perhaps the RDF language itself?) were too immature to support extensive information modeling, application development, and deployment. UML can also be used to express the GML model.
Simon Cox's web page at https://www.seegrid.csiro.au/twiki/bin/view/Xmml/UmlGml should be required reading for anyone who wants to explore these issues more deeply. That page provides substance to the above claim. (Note that the page might challenge you to log in, but this is only for editing privileges. Cancel the challenges and you will ultimately reach the page. I'll also put in a plug here for https://www.seegrid.csiro.au/twiki/bin/view/Xmml/ObservationsAndMeasurements, which I expect to explore more completely with respect to DarwinCore in the late March / April timeframe.)
The distinction between these two notions of GML is blurred a bit by the fact that GML expressions of geographic and other features ( / objects) are sometimes used for purposes other than service through a WFS server or access by inherently GML-compatible geoanalysis applications. This may be a matter of convenience for data that are to be used for multiple applications, or because of the implementors' familiarity with GML, or it might be due to limitations of other languages. For example, I've been given to understand (although I'm not familiar with the issues myself) that numeric data of any sort are very cumbersome to use with RDF. If true, that would be a severe limitation for many kinds of geoinformatic analysis.
Regardless of when and where we choose to implement biocollections descriptions and related data as XML Schema / GML documents, I do feel it's essential to maintain the equivalent of the Class-property / Object-property pattern. This is inherent to GML, it is part and parcel of what RDF is, and it is substantially the point of the ISO 19103 / ISO 19109 metamodel for object definitions. If we do this, then regardless of the longevity of XML Schema / GML as a broadly-supported standard, our data will be translatable to RDF, applications built, e.g., upon UML/XMI representations, or whatever other technologies may come to the fore in future decades.
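As a rough sketch of why this pattern keeps the data translatable: when elements strictly alternate between objects and properties, a generic walk can turn a document into subject / predicate / object statements of the kind RDF uses. The sample document, its element names and its id attributes below are all hypothetical; real GML uses gml:id and namespaced elements, and real RDF would use URIs rather than bare strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in object/property/object striping:
# Specimen (object) -> collectedAt (property) -> Locality (object).
SAMPLE = """
<Specimen id="s1">
  <scientificName>Puma concolor</scientificName>
  <collectedAt>
    <Locality id="loc1">
      <name>Big Sur</name>
    </Locality>
  </collectedAt>
</Specimen>
"""


def object_to_triples(obj):
    """Yield (subject, predicate, object) triples for an object element.

    Each child of an object element is treated as a property. A property
    holds either a literal (its text) or one or more nested objects.
    """
    subject = obj.get("id")
    yield (subject, "type", obj.tag)
    for prop in obj:
        nested = list(prop)
        if nested:
            for child in nested:
                # Property whose value is another object: link, then recurse.
                yield (subject, prop.tag, child.get("id"))
                yield from object_to_triples(child)
        else:
            # Property whose value is a literal.
            yield (subject, prop.tag, (prop.text or "").strip())


triples = list(object_to_triples(ET.fromstring(SAMPLE)))
for triple in triples:
    print(triple)
```

The walk needs no knowledge of the vocabulary, which is the point: the regular striping alone carries enough structure for a mechanical translation.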
What do I think we should ultimately do? I would like to see our classes defined as GML - UML, and I would like our outputs to include XSLT documents that perform the translation of the corresponding XMI to formal (XML Schema /) GML. My own provisional implementations will be in (XML Schema /) GML to start, but I consider that an interim step. In answer to Stan's final question, I think it would also be a useful interim step for TDWG, for non-spatial as well as spatial biodiversity information. I would also like to see provisional translations of UML DarwinCore to RDF, and for these to be done in coordination with, or at least with coordinated awareness of, other groups exploring the whole question of how properly to express the GML information model in RDF.
I see that Roger Hyam has also responded to Stan's mail, and that his comments are not incompatible with mine.
I hope this note provides a useful addition to the discussion.
Best regards, Flip
On Feb 11, 2006, at 6:57 PM, Blum, Stan wrote:
I (we) desperately need to solve the puzzle of how to compose (re-use) conceptual specifications. Can we create a flexible system of base classes that can be snapped together to make useful data exchange applications? I think this is one of, if not THE most important tasks for the incipient TDWG Architecture group. So I'm trying to educate myself about RDF versus XML (and their respective schema tools). I came across this comment on RDF versus GML, http://www.mapbureau.com/gml/, which contained this:
<excerpt> [...] GML is not directly composable with other XML languages. Entities that are described by other languages cannot legally play the role of geographic features in GML. This because all types of geographic features are required to derive from the GML abstract class gml:AbstractFeatureType. Even if it were not for this formal requirement, the lack of conventions about how to represent even simple semantic notions in XML languages would prevent effective integration of GML with XML languages developed independently.
The non-composability of GML requires that it absorb as application schemas the multitude of other domains to which geographical information is relevant. Failing this, non-standard mechanisms of some kind must be used to relate GML content with external data.
Indeed, GML positions itself as a universal, rather than geography-specific, semantic standard by including its own general formalisms for collections, assertion of properties (in a style very much like RDF), time and processes, and reference between content in separate files (via Xlink). GML can be viewed as an alternative not just to geography in RDF, but to RDF itself.
</excerpt>
This seems like a problem for us because some aspects of our biodiversity information are decidedly not spatial. Is this a problem with XML Schema generally or just the way it was used to create GML? Several TDWGers are getting enthusiastic about RDF, despite the cautions of McCool (referenced by Bob Morris earlier on the TDWG-GUID list). Should we go ahead and cast DarwinCore as a GML application while we gear up for a coordinated switch to RDF?
-Stan
Phillip C. Dibner Ecosystem Associates (650) 948-3537 (650) 948-7895 Fax