[Tdwg-tag] RDF versus GML; old but relevant (?)

Mon Feb 13 19:35:07 CET 2006

All:

I sent this yesterday without being properly subscribed to the mailing  
lists.  Apologies to anyone for whom this is a repeat.

Flip

From: Phillip C. Dibner <pcd at ecosystem.com>
Date: February 12, 2006 2:21:49 PM PST
To: Stan Blum <sblum at calacademy.org>
Cc: <tdwg-guid at lists.tdwg.org>, <tdwg-tag at lists.tdwg.org>,
Donald Hobern <dhobern at gbif.org>, John Wieczorek <tuco at berkeley.edu>,
<renato at cria.org.br>, Reed Beaman <reed.beaman at yale.edu>,
Simon Cox <Simon.Cox at csiro.au>
Subject: Re: RDF versus GML; old but relevant (?)

Stan,

Thanks for the message.  You've mentioned some items that are central  
to several discussions in which I've been engaged during the past few  
months.  I've taken the liberty of copying Simon Cox on this response  
since, as you know, he's quite knowledgeable about these issues, and  
also very open to the use of other languages despite his participation  
in the authorship of GML.  Simon, I'm hoping you (as well as anyone  
else who wants to contribute) will correct any misstatements that I  
make here.

I take your message to contain two fundamental points.  Responding to  
each of them:

First point:

> I (we) desperately need to solve the puzzle of how to compose (re-use)  
> conceptual specifications.  Can we create a flexible system of base  
> classes that can be snapped together to make useful data exchange  
> applications?  I think this is one of, if not THE most important tasks
> for the incipient TDWG Architecture group.

In short, absolutely.  I have no doubt that we can create such a  
flexible collection of classes.  The key is in your choice of the word  
"conceptual."  If we get the information model right, we can express it  
in any of several languages.  Any given language (even UML, I'm given  
to understand) has its limitations and inherent biases, but if we  
develop appropriate intuitive and formal understandings of the  
relationships - mappings - between the various expressions, we'll have  
achieved the goal that I think you correctly identify as the most  
important task for the Architecture group.  The question is how to  
factor them, and perhaps how, if at all, to identify them as belonging  
to the same family of specifications by defining an abstract DarwinCore  
base class.

Second point, relating to Chris Goad's comments (mapbureau.com) on the
non-composability of GML (which I won't reproduce inline here):

Chris is a great thinker and very knowledgeable about these
technologies, but I think his comments leave out some important points.
There are arguably two different notions of what GML is.

In one sense, GML is a formal specification written in XML Schema, well  
on its way to becoming an ISO-blessed International Standard.  As such,  
it does impose the formal requirement that features derive from  
gml:AbstractFeatureType.  I view this partly as a by-product of the way  
that XML Schema works, and partly as the reasonable decision by the  
designers of GML to provide a generic, abstract notion of what a
Feature is and, along the way, to provide some minimal properties that
every Feature has.
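
As a rough illustration of that derivation requirement, here is a
minimal Python sketch.  It is an analogy only, not GML itself: the
class AbstractFeature stands in for gml:AbstractFeatureType, and the
particular common properties (an identifier, a name, a description),
the CollectionSite class, and the describe helper are all hypothetical
names chosen for the example.

```python
from abc import ABC, abstractmethod


class AbstractFeature(ABC):
    """Hypothetical stand-in for gml:AbstractFeatureType."""

    def __init__(self, feature_id, name=None, description=None):
        # Minimal properties shared by every feature (illustrative choice).
        self.feature_id = feature_id
        self.name = name
        self.description = description

    @abstractmethod
    def feature_type(self):
        """Return the name of the concrete feature type."""


class CollectionSite(AbstractFeature):
    """Hypothetical biodiversity feature; not part of GML or DarwinCore."""

    def __init__(self, feature_id, latitude, longitude, **common):
        super().__init__(feature_id, **common)
        self.latitude = latitude
        self.longitude = longitude

    def feature_type(self):
        return "CollectionSite"


def describe(feature):
    # Generic code relies only on what the abstract base guarantees.
    label = feature.name or "(unnamed)"
    return f"{feature.feature_type()} {feature.feature_id}: {label}"
```

Generic software such as describe can handle any derived type without
knowing it in advance, which is the benefit of the common base.  The
cost is the one Chris points to: a type that does not derive from the
base cannot play the role of a feature.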

Whether this is a problem or not depends on what you want to do with  
it.  GML definitions can be used for many purposes.  The place where  
you have no choice but to use them is where you want to make  
information available through an OGC-compliant Web Feature Server, and  
accessible to Web Feature Service client applications.  That is simply
because these software applications have been written to comply with
the published specification; the motivation for using it is that there
is a large and growing number of such implementations.  Ultimately,
this makes the capability to collate and analyze biocollections or
other data significantly more accessible and less costly.  Note that
the data need only be expressed as (XML Schema /) GML from the point
where they exit the server to the point where they enter the client
application.  It is necessary to have a well-defined and broadly
accepted GML schema for this, but the data need not reside in the
source datastore in this format.  Nonetheless, serving the source data
as (XML Schema /) GML is much easier if there is a natural mapping of
the source data to (XML Schema /) GML.
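
The idea that GML is needed only at the server boundary can be
sketched as follows, assuming the source record is a plain Python
dict.  The GML namespace URI is the real one; the application
namespace, the Specimen and collectingLocation element names, and the
record keys are hypothetical, and the output is GML-style XML for
illustration, not a document validated against any published schema.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
APP_NS = "http://example.org/biocollections"  # hypothetical application namespace

ET.register_namespace("gml", GML_NS)
ET.register_namespace("app", APP_NS)


def record_to_feature_xml(record):
    """Serialise one source record (a plain dict) as a GML-style feature."""
    # The feature element lives in the application namespace and carries
    # a gml:id attribute.
    feature = ET.Element(f"{{{APP_NS}}}Specimen", {f"{{{GML_NS}}}id": record["id"]})

    name = ET.SubElement(feature, f"{{{GML_NS}}}name")
    name.text = record["scientific_name"]

    # Property element (lower case) wrapping an object element (upper case).
    location = ET.SubElement(feature, f"{{{APP_NS}}}collectingLocation")
    point = ET.SubElement(location, f"{{{GML_NS}}}Point")
    pos = ET.SubElement(point, f"{{{GML_NS}}}pos")
    pos.text = f"{record['lat']} {record['lon']}"

    return ET.tostring(feature, encoding="unicode")


# The datastore keeps its own format; XML is produced only on the way out.
source_record = {
    "id": "spec-42",
    "scientific_name": "Quercus agrifolia",
    "lat": 37.38,
    "lon": -122.11,
}
xml_text = record_to_feature_xml(source_record)
```

The mapping function is short here because each key in the record
corresponds directly to one property of the feature, which is the
"natural mapping" case described above.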

This leads to the other notion of what GML is: an information model.   
GML can be expressed in languages other than XML Schema.  As many of
you know, an early version of GML was written in RDF, but the authors  
decided to use XML Schema instead because at the time, tools to support  
RDF (and perhaps the RDF language itself?) were too immature to support  
extensive information modeling, application development, and  
deployment.  UML can also be used to express the GML model.

Simon Cox's web page at  
https://www.seegrid.csiro.au/twiki/bin/view/Xmml/UmlGml should be  
required reading for anyone who wants to explore these issues more  
deeply.  That page provides substance to the above claim.  (Note that
the page might challenge you to log in, but this is only for editing
privileges.  Cancel the challenges and you will ultimately reach the
page.  I'll also put in a plug here for
https://www.seegrid.csiro.au/twiki/bin/view/Xmml/ObservationsAndMeasurements,
which I expect to explore more completely with respect to DarwinCore in
the late March / April timeframe.)

The distinction between these two notions of GML is blurred a bit by  
the fact that GML expressions of geographic and other features (or
objects) are sometimes used for purposes other than service through a
WFS server or access by inherently GML-compatible geoanalysis
applications.  This may be a matter of convenience for data that are to  
be used for multiple applications, or because of the implementors'  
familiarity with GML, or it might be due to limitations of other  
languages.  For example, I've been given to understand (although I'm  
not familiar with the issues myself) that numeric data of any sort are  
very cumbersome to use with RDF.  If true, that would be a severe  
limitation for many kinds of geoinformatic analysis.

Regardless of when and where we choose to implement biocollections  
descriptions and related data as XML Schema / GML documents, I do feel  
it's essential to maintain the equivalent of the Class-property /  
Object-property pattern.  This is inherent to GML, it is part and  
parcel of what RDF is, and it is substantially the point of the ISO  
19103 / ISO 19109 metamodel for object definitions.  If we do this,
then regardless of the longevity of XML Schema / GML as a
broadly-supported standard, our data will be translatable to RDF, to
applications built upon, e.g., UML/XMI representations, or to whatever
other technologies may come to the fore in future decades.
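
To make the translatability claim concrete, here is a minimal Python
sketch, assuming objects are held as nested dicts that alternate
between a class and its properties.  The dict layout, the to_triples
helper, and the sample names are hypothetical; the tuples only mimic
the shape of RDF statements and are not output from any RDF library.

```python
def to_triples(subject, class_name, properties):
    """Flatten a class/property structure into (subject, predicate, object) triples."""
    # The class of the object becomes a type statement about the subject.
    triples = [(subject, "rdf:type", class_name)]
    for prop, value in properties.items():
        if isinstance(value, dict):
            # Object-valued property: link to the nested object, then recurse.
            child = f"{subject}/{prop}"
            triples.append((subject, prop, child))
            triples.extend(to_triples(child, value["class"], value["properties"]))
        else:
            # Literal-valued property.
            triples.append((subject, prop, value))
    return triples


specimen_properties = {
    "scientificName": "Quercus agrifolia",
    "collectingLocation": {
        "class": "Point",
        "properties": {"pos": "37.38 -122.11"},
    },
}
triples = to_triples("specimen/42", "Specimen", specimen_properties)
```

Because every nesting level is either an object of some class or a
property of that object, the conversion is mechanical.  Data that
collapse the two levels (for example, a location stored as bare
attributes with no property linking it to its owner) would need
case-by-case handling instead.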

What do I think we should ultimately do?  I would like to see our  
classes defined as GML - UML, and I would like our outputs to include  
XSLT documents that perform the translation of the corresponding XMI to
formal (XML Schema /) GML.  My own provisional implementations will be  
in (XML Schema /) GML to start, but I consider that an interim step.   
In answer to Stan's final question, I think it would also be a useful  
interim step for TDWG, for non-spatial as well as spatial biodiversity  
information.  I would also like to see provisional translations of UML  
DarwinCore to RDF, and for these to be done in coordination with, or at  
least with coordinated awareness of, other groups exploring the whole  
question of how properly to express the GML information model in RDF.

I see that Roger Hyam has also responded to Stan's mail, and that his  
comments are not incompatible with mine.

I hope this note provides a useful addition to the discussion.

Best regards,
Flip

On Feb 11, 2006, at 6:57 PM, Blum, Stan wrote:

> I (we) desperately need to solve the puzzle of how to compose (re-use)  
> conceptual specifications.  Can we create a flexible system of base  
> classes that can be snapped together to make useful data exchange  
> applications?  I think this is one of, if not THE most important tasks
> for the incipient TDWG Architecture group.
>  
> So I'm trying to educate myself about RDF versus XML (and their  
> respective schema tools).  I came across this comment on RDF versus  
> GML, http://www.mapbureau.com/gml/, which contained this:
>
> <excerpt>
> [...] GML is not directly composable with other XML languages.  
> Entities that are described by other languages cannot legally play the  
> role of geographic features in GML. This because all types of  
> geographic features are required to derive from the GML abstract class  
> gml:AbstractFeatureType. Even if it were not for this formal  
> requirement, the lack of conventions about how to represent even  
> simple semantic notions in XML languages would prevent effective  
> integration of GML with XML languages developed independently.
>
> The non-composability of GML requires that it absorb as application  
> schemas the multitude of other domains to which geographical  
> information is relevant. Failing this, non-standard mechanisms of some  
> kind must be used to relate GML content with external data.
>
> Indeed, GML positions itself as a universal, rather than  
> geography-specific, semantic standard by including its own general  
> formalisms for collections, assertion of properties (in a style very  
> much like RDF), time and processes, and reference between content in  
> separate files (via Xlink). GML can be viewed as an alternative not  
> just to geography in RDF, but to RDF itself.
> </excerpt>
>
> This seems like a problem for us because some aspects of our  
> biodiversity information are decidedly not spatial.  Is this a problem  
> with XML Schema generally or just the way it was used to create GML?   
> Several TDWGers are getting enthusiastic about RDF, despite the  
> cautions of McCool (referenced by Bob Morris earlier on the TDWG-GUID  
> list).  Should we go ahead and cast DarwinCore as a GML application  
> while we gear up for a coordinated switch to RDF?
>
> -Stan

Phillip C. Dibner
Ecosystem Associates
(650) 948-3537
(650) 948-7895 Fax
