Re: [Tdwg-tag] RDF versus GML; old but relevant (?)

13 Feb 2006

All:

I sent this yesterday without being properly subscribed to the mailing  
lists.  Apologies to anyone for whom this is a repeat.

Flip

From: Phillip C. Dibner <pcd@ecosystem.com>
Date: February 12, 2006 2:21:49 PM PST
To: Stan Blum <sblum@calacademy.org>
Cc: <tdwg-guid@lists.tdwg.org>, <tdwg-tag@lists.tdwg.org>,
Donald Hobern <dhobern@gbif.org>, John Wieczorek <tuco@berkeley.edu>,
<renato@cria.org.br>, Reed Beaman <reed.beaman@yale.edu>,
Simon Cox <Simon.Cox@csiro.au>
Subject: Re: RDF versus GML; old but relevant (?)

Stan,

Thanks for the message.  You've mentioned some items that are central  
to several discussions in which I've been engaged during the past few  
months.  I've taken the liberty of copying Simon Cox on this response  
since, as you know, he's quite knowledgeable about these issues, and  
also very open to the use of other languages despite his participation  
in the authorship of GML.  Simon, I'm hoping you (as well as anyone  
else who wants to contribute) will correct any misstatements that I  
make here.

I take your message to contain two fundamental points.  Responding to  
each of them:

First point:
...
I (we) desperately need to solve the puzzle of how to compose (re-use)  
conceptual specifications.  Can we create a flexible system of base  
classes that can be snapped together to make useful data exchange  
applications?  I think this is one of, if not THE most important tasks
for the incipient TDWG Architecture group.

In short, absolutely.  I have no doubt that we can create such a
flexible collection of classes.  The key is in your choice of the word  
"conceptual."  If we get the information model right, we can express it  
in any of several languages.  Any given language (even UML, I'm given  
to understand) has its limitations and inherent biases, but if we  
develop appropriate intuitive and formal understandings of the  
relationships - mappings - between the various expressions, we'll have  
achieved the goal that I think you correctly identify as the most  
important task for the Architecture group.  The question is how to
factor those classes, and perhaps how, if at all, to identify them as
belonging to the same family of specifications by defining an abstract
DarwinCore base class.

Second point, relating to Chris Goad's comments (mapbureau.com) on the  
non-composability of GML (which I won't reproduce inline here):

Chris is a great thinker and very knowledgeable about these  
technologies, but I think his comments leave out some important points.
There are arguably two different notions of what GML is.

In one sense, GML is a formal specification written in XML Schema, well  
on its way to becoming an ISO-blessed International Standard.  As such,  
it does impose the formal requirement that features derive from  
gml:AbstractFeatureType.  I view this partly as a by-product of the way  
that XML Schema works, and partly as the reasonable decision by the  
designers of GML to provide a generic, abstract notion of what a  
Feature is, and by the way provide some minimal properties that every  
Feature has.
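
As a rough illustration of that derivation requirement, the sketch below embeds a small application-schema fragment and checks which base type a feature type extends.  The "dwc" namespace, the SpecimenType name, and the extension_base helper are all hypothetical, chosen only for this example; they are not taken from any published DarwinCore or GML application schema, and the fragment is not a complete, valid GML application schema.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical application-schema fragment. The "dwc" namespace and the
# SpecimenType name are illustrative assumptions, not a published schema.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:gml="http://www.opengis.net/gml"
           xmlns:dwc="http://example.org/dwc"
           targetNamespace="http://example.org/dwc">
  <xs:complexType name="SpecimenType">
    <xs:complexContent>
      <xs:extension base="gml:AbstractFeatureType">
        <xs:sequence>
          <xs:element name="scientificName" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
"""


def extension_base(schema_text, type_name):
    """Return the declared base of a named complexType, or None."""
    root = ET.fromstring(schema_text)
    for ctype in root.iter(f"{{{XS}}}complexType"):
        if ctype.get("name") == type_name:
            ext = ctype.find(f"{{{XS}}}complexContent/{{{XS}}}extension")
            return ext.get("base") if ext is not None else None
    return None


print(extension_base(SCHEMA, "SpecimenType"))
```

The feature type adds its own properties in the xs:sequence while inheriting whatever minimal properties the abstract base provides.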

Whether this is a problem or not depends on what you want to do with  
it.  GML definitions can be used for many purposes.  The place where  
you have no choice but to use them is where you want to make  
information available through an OGC-compliant Web Feature Server, and  
accessible to Web Feature Service client applications.  That is
because these software applications have been written to comply
with the published specification; the motivation for using it is simply
that there is a large and growing number of such implementations.
Ultimately, this makes the capability to collate and analyze  
biocollections or other data significantly more accessible and less  
costly.  Note that the data need only be expressed as (XML Schema /)
GML from the point where they exit the server to the point where they
enter the client application.  It is necessary to have a well-defined  
and broadly accepted GML schema for this, but the data need not reside  
in the source datastore in this format.  Nonetheless, serving the  
source data as (XML Schema /) GML is much easier if there is a natural
mapping of the source data to (XML Schema /) GML.
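
A minimal sketch of that boundary, assuming the source datastore hands back plain records: the data stay in their native form until a small mapping function serializes them as a GML-style feature on the way out of the server.  The record fields, the "dwc" namespace and element names, and the record_to_feature function are hypothetical; the lat/lon ordering inside gml:pos is also an assumption, since the real axis order depends on the coordinate reference system in use.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"
DWC = "http://example.org/dwc"  # hypothetical namespace, for illustration

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("gml", GML)
ET.register_namespace("dwc", DWC)


def record_to_feature(record):
    """Map one source record (a plain dict) to a GML-style feature element."""
    feature = ET.Element(f"{{{DWC}}}Specimen", {f"{{{GML}}}id": record["id"]})
    name = ET.SubElement(feature, f"{{{DWC}}}scientificName")
    name.text = record["scientific_name"]
    # The location property wraps a geometry object, which has its own property.
    location = ET.SubElement(feature, f"{{{DWC}}}location")
    point = ET.SubElement(location, f"{{{GML}}}Point")
    pos = ET.SubElement(point, f"{{{GML}}}pos")
    pos.text = f"{record['lat']} {record['lon']}"  # axis order is an assumption
    return feature


row = {"id": "spec1", "scientific_name": "Quercus agrifolia",
       "lat": 37.4, "lon": -122.1}
print(ET.tostring(record_to_feature(row), encoding="unicode"))
```

The closer the source record's fields are to the feature's properties, the shorter this mapping function becomes, which is the sense in which a natural mapping makes serving easier.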

This leads to the other notion of what GML is: an information model.   
GML can be expressed in other languages than XML Schema.  As many of  
you know, an early version of GML was written in RDF, but the authors  
decided to use XML Schema instead because at the time, tools to support  
RDF (and perhaps the RDF language itself?) were too immature to support  
extensive information modeling, application development, and  
deployment.  UML can also be used to express the GML model.

Simon Cox's web page at  
https://www.seegrid.csiro.au/twiki/bin/view/Xmml/UmlGml should be  
required reading for anyone who wants to explore these issues more  
deeply.  That page provides substance to the above claim.  (Note that
the page might challenge you to log in, but this is only for editing
privileges.  Cancel the challenges and you will ultimately reach the
page.  I'll also put in a plug here for
https://www.seegrid.csiro.au/twiki/bin/view/Xmml/ObservationsAndMeasurements,
which I expect to explore more completely
with respect to DarwinCore in the late March / April timeframe.)

The distinction between these two notions of GML is blurred a bit by  
the fact that GML expressions of geographic and other features (/
objects) are sometimes used for purposes other than service through a
WFS server or access by inherently GML-compatible geoanalysis  
applications.  This may be a matter of convenience for data that are to  
be used for multiple applications, or because of the implementors'  
familiarity with GML, or it might be due to limitations of other  
languages.  For example, I've been given to understand (although I'm  
not familiar with the issues myself) that numeric data of any sort are  
very cumbersome to use with RDF.  If true, that would be a severe  
limitation for many kinds of geoinformatic analysis.

Regardless of when and where we choose to implement biocollections  
descriptions and related data as XML Schema / GML documents, I do feel  
it's essential to maintain the equivalent of the Class-property /  
Object-property pattern.  This is inherent to GML, it is part and  
parcel of what RDF is, and it is substantially the point of the ISO  
19103 / ISO 19109 metamodel for object definitions.   If we do this,  
then regardless of the longevity of XML Schema / GML as a  
broadly-supported standard, our data will be translatable to RDF,  
applications built, e.g., upon UML/XMI representations, or whatever  
other technologies may come to the fore in future decades.
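
To make the translatability argument concrete, here is a minimal sketch, under stated assumptions, of why the alternating object / property pattern maps so directly onto RDF-style statements.  Each object element becomes a subject, each child element is a property, and the property's value is either a literal or another object.  The sample document, the "dwc" names, and the to_triples function are hypothetical, and the output is a plain list of tuples rather than real RDF.

```python
import xml.etree.ElementTree as ET

GML_ID = "{http://www.opengis.net/gml}id"

# Hypothetical instance document following the object / property pattern.
DOC = """\
<dwc:Specimen xmlns:dwc="http://example.org/dwc"
              xmlns:gml="http://www.opengis.net/gml" gml:id="spec1">
  <dwc:scientificName>Quercus agrifolia</dwc:scientificName>
  <dwc:collector>
    <dwc:Person gml:id="p1">
      <dwc:name>A. Collector</dwc:name>
    </dwc:Person>
  </dwc:collector>
</dwc:Specimen>
"""


def to_triples(obj, triples=None):
    """Walk object -> property -> (literal | object), collecting triples."""
    if triples is None:
        triples = []
    subject = obj.get(GML_ID)
    triples.append((subject, "type", obj.tag))
    for prop in obj:                  # children of an object are properties
        nested = list(prop)
        if nested:                    # the property's value is another object
            for child in nested:
                triples.append((subject, prop.tag, child.get(GML_ID)))
                to_triples(child, triples)
        else:                         # the property's value is a literal
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return triples


for triple in to_triples(ET.fromstring(DOC)):
    print(triple)
```

The walk needs no knowledge of the particular schema, only of the pattern, which is why data that keep to it remain portable across representations.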

What do I think we should ultimately do?  I would like to see our  
classes defined as GML - UML, and I would like our outputs to include  
XSLT documents that perform the translation of the corresponding XMI to
formal (XML Schema /) GML.  My own provisional implementations will be  
in (XML Schema /) GML to start, but I consider that an interim step.   
In answer to Stan's final question, I think it would also be a useful  
interim step for TDWG, for non-spatial as well as spatial biodiversity  
information.  I would also like to see provisional translations of UML  
DarwinCore to RDF, and for these to be done in coordination with, or at  
least with coordinated awareness of, other groups exploring the whole  
question of how properly to express the GML information model in RDF.

I see that Roger Hyam has also responded to Stan's mail, and that his  
comments are not incompatible with mine.

I hope this note provides a useful addition to the discussion.

Best regards,
Flip

On Feb 11, 2006, at 6:57 PM, Blum, Stan wrote:
...
I (we) desperately need to solve the puzzle of how to compose (re-use)  
conceptual specifications.  Can we create a flexible system of base  
classes that can be snapped together to make useful data exchange  
applications?  I think this is one of, if not THE most important tasks
for the incipient TDWG Architecture group.

So I'm trying to educate myself about RDF versus XML (and their  
respective schema tools).  I came across this comment on RDF versus  
GML, http://www.mapbureau.com/gml/, which contained this:
<excerpt>
[...] GML is not directly composable with other XML languages.  
Entities that are described by other languages cannot legally play the  
role of geographic features in GML. This because all types of  
geographic features are required to derive from the GML abstract class  
gml:AbstractFeatureType. Even if it were not for this formal  
requirement, the lack of conventions about how to represent even  
simple semantic notions in XML languages would prevent effective  
integration of GML with XML languages developed independently.
The non-composability of GML requires that it absorb as application  
schemas the multitude of other domains to which geographical  
information is relevant. Failing this, non-standard mechanisms of some  
kind must be used to relate GML content with external data.
Indeed, GML positions itself as a universal, rather than  
geography-specific, semantic standard by including its own general  
formalisms for collections, assertion of properties (in a style very  
much like RDF), time and processes, and reference between content in  
separate files (via Xlink). GML can be viewed as an alternative not  
just to geography in RDF, but to RDF itself.
</excerpt>
This seems like a problem for us because some aspects of our  
biodiversity information are decidedly not spatial.  Is this a problem  
with XML Schema generally or just the way it was used to create GML?   
Several TDWGers are getting enthusiastic about RDF, despite the  
cautions of McCool (referenced by Bob Morris earlier on the TDWG-GUID  
list).  Should we go ahead and cast DarwinCore as a GML application  
while we gear up for a coordinated switch to RDF?
-Stan

Phillip C. Dibner
Ecosystem Associates
(650) 948-3537
(650) 948-7895 Fax
