[Tdwg-tag] RDF versus GML; old but relevant (?)
Phillip C. Dibner
pcd at ecosystem.com
Mon Feb 13 19:35:07 CET 2006
All:
I sent this yesterday without being properly subscribed to the mailing
lists. Apologies to anyone for whom this is a repeat.
Flip
From: Phillip C. Dibner <pcd at ecosystem.com>
Date: February 12, 2006 2:21:49 PM PST
To: Stan Blum <sblum at calacademy.org>
Cc: <tdwg-guid at lists.tdwg.org>, <tdwg-tag at lists.tdwg.org>,
Donald Hobern <dhobern at gbif.org>, John Wieczorek <tuco at berkeley.edu>,
<renato at cria.org.br>, Reed Beaman <reed.beaman at yale.edu>,
Simon Cox <Simon.Cox at csiro.au>
Subject: Re: RDF versus GML; old but relevant (?)
Stan,
Thanks for the message. You've mentioned some items that are central
to several discussions in which I've been engaged during the past few
months. I've taken the liberty of copying Simon Cox on this response
since, as you know, he's quite knowledgeable about these issues, and
also very open to the use of other languages despite his participation
in the authorship of GML. Simon, I'm hoping you (as well as anyone
else who wants to contribute) will correct any misstatements that I
make here.
I take your message to contain two fundamental points. Responding to
each of them:
First point:
> I (we) desperately need to solve the puzzle of how to compose (re-use)
> conceptual specifications. Can we create a flexible system of base
> classes that can be snapped together to make useful data exchange
> applications? I think this is one of, if not THE most important tasks
> for the incipient TDWG Architecture group.
In short, absolutely. I have no doubt that we can create such a
flexible collection of classes. The key is in your choice of the word
"conceptual." If we get the information model right, we can express it
in any of several languages. Any given language (even UML, I'm given
to understand) has its limitations and inherent biases, but if we
develop appropriate intuitive and formal understandings of the
relationships - mappings - between the various expressions, we'll have
achieved the goal that I think you correctly identify as the most
important task for the Architecture group. The question is how to
factor the classes, and perhaps how, if at all, to identify them as
belonging to the same family of specifications by defining an abstract
DarwinCore base class.
Second point, relating to Chris Goad's comments (mapbureau.com) on the
non-composability of GML (which I won't reproduce inline here):
Chris is a great thinker and very knowledgeable about these
technologies, but I think his comments leave out some important points.
There are arguably two different notions of what GML is.
In one sense, GML is a formal specification written in XML Schema, well
on its way to becoming an ISO-blessed International Standard. As such,
it does impose the formal requirement that features derive from
gml:AbstractFeatureType. I view this partly as a by-product of the way
that XML Schema works, and partly as the reasonable decision by the
designers of GML to provide a generic, abstract notion of what a
Feature is, and by the way provide some minimal properties that every
Feature has.
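As a rough illustration of that formal requirement, here is a minimal
sketch in Python. The schema fragment, the "ex" namespace, and the
SpecimenType name are all hypothetical, made up for this example; only
the XML Schema and GML namespace URIs are real. The script simply reads
a schema and reports which base type each complex type extends.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Hypothetical application schema: SpecimenType and the "ex" namespace
# are invented for illustration and are not part of any TDWG standard.
SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:gml="http://www.opengis.net/gml"
            xmlns:ex="http://example.org/specimen"
            targetNamespace="http://example.org/specimen">
  <xsd:complexType name="SpecimenType">
    <xsd:complexContent>
      <xsd:extension base="gml:AbstractFeatureType">
        <xsd:sequence>
          <xsd:element name="scientificName" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
"""


def extension_bases(schema_text):
    """Map each complex type name to the base type it extends."""
    root = ET.fromstring(schema_text)
    bases = {}
    for ctype in root.iter(f"{{{XSD}}}complexType"):
        ext = ctype.find(f"{{{XSD}}}complexContent/{{{XSD}}}extension")
        if ext is not None:
            bases[ctype.get("name")] = ext.get("base")
    return bases


BASES = extension_bases(SCHEMA)
print(BASES)
```

The derivation lives entirely in the xsd:extension element, which is
why I describe it as partly a by-product of how XML Schema works.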
Whether this is a problem or not depends on what you want to do with
it. GML definitions can be used for many purposes. The place where
you have no choice but to use them is where you want to make
information available through an OGC-compliant Web Feature Server, and
accessible to Web Feature Service client applications. That is very
simply because these software applications have been written to comply
with the published specification; the motivation for using it is simply
that there are a great and increasing number of such implementations.
Ultimately, this makes the capability to collate and analyze
biocollections or other data significantly more accessible and less
costly. Note that the data need only be expressed as (XML Schema /) GML
from the point where they exit the server to the point where they
enter the client application. It is necessary to have a well-defined
and broadly accepted GML schema for this, but the data need not reside
in the source datastore in this format. Nonetheless, serving the
source data as (XML Schema /) GML is much easier if there is a natural
mapping of the source data to (XML Schema /) GML.
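To make the "only at the boundary" idea concrete, here is a small
Python sketch, under stated assumptions. The record layout, the "ex"
namespace, and the Specimen, scientificName, and location element names
are hypothetical; gml:featureMember, gml:Point, and gml:pos are GML 3
elements. The coordinate order (latitude, then longitude) is also an
assumption and in practice depends on the reference system. The source
record stays in whatever form the datastore uses and is turned into GML
only on the way out.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"
EX = "http://example.org/specimen"  # hypothetical application namespace


def record_to_feature(record):
    """Serialize one source record as a GML-style feature member."""
    ET.register_namespace("gml", GML)
    ET.register_namespace("ex", EX)
    member = ET.Element(f"{{{GML}}}featureMember")
    specimen = ET.SubElement(member, f"{{{EX}}}Specimen")
    name = ET.SubElement(specimen, f"{{{EX}}}scientificName")
    name.text = record["scientific_name"]
    # The location property wraps a GML geometry object.
    location = ET.SubElement(specimen, f"{{{EX}}}location")
    point = ET.SubElement(location, f"{{{GML}}}Point", srsName="EPSG:4326")
    pos = ET.SubElement(point, f"{{{GML}}}pos")
    pos.text = f"{record['lat']} {record['lon']}"  # assumed axis order
    return member


# A source record as it might sit in a datastore (made-up values).
ROW = {"scientific_name": "Puma concolor", "lat": 37.4, "lon": -122.1}
FEATURE = record_to_feature(ROW)
print(ET.tostring(FEATURE, encoding="unicode"))
```

The flatter and more regular the source record, the more mechanical
this mapping becomes, which is the sense in which a natural mapping
makes serving the data easier.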
This leads to the other notion of what GML is: an information model.
GML can be expressed in languages other than XML Schema. As many of
you know, an early version of GML was written in RDF, but the authors
decided to use XML Schema instead because at the time, tools to support
RDF (and perhaps the RDF language itself?) were too immature to support
extensive information modeling, application development, and
deployment. UML can also be used to express the GML model.
Simon Cox's web page at
https://www.seegrid.csiro.au/twiki/bin/view/Xmml/UmlGml should be
required reading for anyone who wants to explore these issues more
deeply. That page provides substance to the above claim. (Note that
the page might challenge you to log in, but this is only for editing
privileges. Cancel the challenges and you will ultimately reach the
page. I'll also put in a plug here for
https://www.seegrid.csiro.au/twiki/bin/view/Xmml/ObservationsAndMeasurements,
which I expect to explore more completely
with respect to DarwinCore in the late March / April timeframe.)
The distinction between these two notions of GML is blurred a bit by
the fact that GML expressions of geographic and other features /
objects are sometimes used for purposes other than service through a
WFS server or access by inherently GML-compatible geoanalysis
applications. This may be a matter of convenience for data that are to
be used for multiple applications, or because of the implementors'
familiarity with GML, or it might be due to limitations of other
languages. For example, I've been given to understand (although I'm
not familiar with the issues myself) that numeric data of any sort are
very cumbersome to use with RDF. If true, that would be a severe
limitation for many kinds of geoinformatic analysis.
Regardless of when and where we choose to implement biocollections
descriptions and related data as XML Schema / GML documents, I do feel
it's essential to maintain the equivalent of the Class-property /
Object-property pattern. This is inherent to GML, it is part and
parcel of what RDF is, and it is substantially the point of the ISO
19103 / ISO 19109 metamodel for object definitions. If we do this,
then regardless of the longevity of XML Schema / GML as a
broadly-supported standard, our data will be translatable to RDF,
applications built, e.g., upon UML/XMI representations, or whatever
other technologies may come to the fore in future decades.
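A minimal Python sketch of why the pattern makes data translatable: if
documents strictly alternate object elements and property elements, a
generic walk can turn them into subject / predicate / object triples
without knowing the schema. The sample document, its element names, the
use of an "id" attribute as the subject identifier, and the "type"
predicate are all hypothetical simplifications; real GML uses gml:id
and namespaces, and real RDF uses URIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: capitalized elements are objects, lower-case
# elements are properties, and the two levels strictly alternate.
DOC = """\
<Specimen id="s1">
  <scientificName>Puma concolor</scientificName>
  <collectedAt>
    <Locality id="loc1">
      <country>USA</country>
    </Locality>
  </collectedAt>
</Specimen>
"""


def object_to_triples(obj, triples):
    """Append (subject, predicate, object) tuples for one object element."""
    subject = obj.get("id")
    triples.append((subject, "type", obj.tag))
    for prop in obj:
        children = list(prop)
        if children:
            # The property value is a nested object: link to it, then recurse.
            for child in children:
                triples.append((subject, prop.tag, child.get("id")))
                object_to_triples(child, triples)
        else:
            # The property value is a literal.
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return triples


TRIPLES = object_to_triples(ET.fromstring(DOC), [])
for triple in TRIPLES:
    print(triple)
```

The walk never inspects element names, only the alternation of levels,
so the same function would apply to any document that keeps to the
pattern.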
What do I think we should ultimately do? I would like to see our
classes defined as GML - UML, and I would like our outputs to include
XSLT documents that perform the translation of the corresponding XMI to
formal (XML Schema /) GML. My own provisional implementations will be
in (XML Schema /) GML to start, but I consider that an interim step.
In answer to Stan's final question, I think it would also be a useful
interim step for TDWG, for non-spatial as well as spatial biodiversity
information. I would also like to see provisional translations of UML
DarwinCore to RDF, and for these to be done in coordination with, or at
least with coordinated awareness of, other groups exploring the whole
question of how properly to express the GML information model in RDF.
I see that Roger Hyam has also responded to Stan's mail, and that his
comments are not incompatible with mine.
I hope this note provides a useful addition to the discussion.
Best regards,
Flip
On Feb 11, 2006, at 6:57 PM, Blum, Stan wrote:
> I (we) desperately need to solve the puzzle of how to compose (re-use)
> conceptual specifications. Can we create a flexible system of base
> classes that can be snapped together to make useful data exchange
> applications? I think this is one of, if not THE most important tasks
> for the incipient TDWG Architecture group.
>
> So I'm trying to educate myself about RDF versus XML (and their
> respective schema tools). I came across this comment on RDF versus
> GML, http://www.mapbureau.com/gml/, which contained this:
>
> <excerpt>
> [...] GML is not directly composable with other XML languages.
> Entities that are described by other languages cannot legally play the
> role of geographic features in GML. This because all types of
> geographic features are required to derive from the GML abstract class
> gml:AbstractFeatureType. Even if it were not for this formal
> requirement, the lack of conventions about how to represent even
> simple semantic notions in XML languages would prevent effective
> integration of GML with XML languages developed independently.
>
> The non-composability of GML requires that it absorb as application
> schemas the multitude of other domains to which geographical
> information is relevant. Failing this, non-standard mechanisms of some
> kind must be used to relate GML content with external data.
>
> Indeed, GML positions itself as a universal, rather than
> geography-specific, semantic standard by including its own general
> formalisms for collections, assertion of properties (in a style very
> much like RDF), time and processes, and reference between content in
> separate files (via Xlink). GML can be viewed as an alternative not
> just to geography in RDF, but to RDF itself.
> </excerpt>
>
> This seems like a problem for us because some aspects of our
> biodiversity information are decidedly not spatial. Is this a problem
> with XML Schema generally or just the way it was used to create GML?
> Several TDWGers are getting enthusiastic about RDF, despite the
> cautions of McCool (referenced by Bob Morris earlier on the TDWG-GUID
> list). Should we go ahead and cast DarwinCore as a GML application
> while we gear up for a coordinated switch to RDF?
>
> -Stan
Phillip C. Dibner
Ecosystem Associates
(650) 948-3537
(650) 948-7895 Fax