Re: [Tdwg-tag] RDF instead of xml schema
What I somehow still failed to say clearly at the end of my previous post was that "RDF technologies are an excellent way to do this... BUT we may well have other mechanisms we could use". My real interests are not in getting RDF adopted but in finding a really good solution to all our modelling problems and answering the three needs I gave in my previous post. I have two options that I see at this stage as plausible ways to do this.
In both cases, I would of course recommend that we model our objects using a more neutral language such as UML and then generate the encoding models we actually use. After that here are the two options I see (not in any order of priority).
- - - - -
1. The "new-style" RDF-based ontology approach
I see the direct use of plain RDF as the encoding as roughly equivalent to using an assembler as our programming language. We would be able to do absolutely everything we would like to do but there is no real need for the pain and ugliness of such a low-level representation. I understand the natural response of shock when people are presented with a mass of RDF triples.
I would much rather adopt a higher-level standard (probably OWL Lite) to allow us to represent the same information in a more familiar and friendly way and make our underlying classes clearer.
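To make the "assembler" comparison concrete, here is a minimal Python sketch, assuming a made-up `ex:` vocabulary (`ex:Specimen`, `ex:collectedAt`, etc. are hypothetical placeholders, not TDWG terms). The first structure is the raw set of triples. The helper regroups them by subject into the kind of typed, object-like view that a higher-level language such as OWL Lite lets us read and write directly.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Raw (subject, predicate, object) triples: the "assembler-level" view.
# Every ex: name here is a hypothetical placeholder, not a TDWG term.
TRIPLES = [
    ("ex:specimen1", RDF_TYPE, "ex:Specimen"),
    ("ex:specimen1", "ex:scientificName", "Quercus robur"),
    ("ex:specimen1", "ex:collectedAt", "ex:locality1"),
    ("ex:locality1", RDF_TYPE, "ex:Locality"),
    ("ex:locality1", "ex:country", "Denmark"),
]


def group_by_subject(triples):
    """Regroup flat triples into one typed, object-like record per subject."""
    grouped = defaultdict(lambda: {"types": [], "properties": defaultdict(list)})
    for subject, predicate, obj in triples:
        record = grouped[subject]
        if predicate == RDF_TYPE:
            # RDF allows a resource to have several types, so keep a list.
            record["types"].append(obj)
        else:
            # Properties may repeat, so each one maps to a list of values.
            record["properties"][predicate].append(obj)
    # Convert the nested defaultdicts to plain dicts for printing/comparison.
    return {
        subject: {"types": r["types"], "properties": dict(r["properties"])}
        for subject, r in grouped.items()
    }


if __name__ == "__main__":
    for subject, record in group_by_subject(TRIPLES).items():
        print(subject, record["types"], record["properties"])
```

The information is identical in both forms; only the second is something a biologist would want to look at.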
I believe that the main problems from use of RDF will arise from attempts to manage inferences based on the underlying triples rather than in the basic modelling of objects and exchanging and consuming data encoded in RDF or RDF-based languages. For me, the main reasons for considering RDF are all in this second (easier) set of functions, and therefore I'm not really worried by the technology. If the tools mature and we can subsequently use inferential approaches (and exploit the power of SPARQL, etc.) then we will be in an excellent position to benefit - otherwise nothing is lost.
Despite the doubts that met this approach last week, I still tend to think that we should consider using an RDF (OWL Lite) approach while at the same time supporting the use of XML schema models that conform to valid subsets of our RDF models. This doesn't seem hard (although I have been too busy this week to look closely at what Roger has done in this area) and would immediately mean that the less IT-oriented members of our community could continue working with the tools they have already got used to.
- - - - -
2. The modified "new-style" XML schema approach
In my opinion, the power present in the DiGIR family of protocols and particularly TAPIR, when combined with conceptual schemas such as Darwin Core, is enormous. The developers of DiGIR and Darwin Core produced a model which has most of the strengths I was looking for in my previous post, but uses the XML schema concept of substitution groups. It supports extension in a way that is very like RDF. Its biggest weakness is that it does not provide any ontological underpinnings of any kind. Neither DiGIR nor Darwin Core makes any commitments regarding the class of object described in the records returned. DiGIR provides a generic query language. Darwin Core provides a set of useful descriptors. I could use Darwin Core to encode data on a collection of stamps illustrating plants and animals (ScientificName, Country, CollectionDay, etc.). We need the ability to identify what sort of objects are being described. I gave a long and confusing presentation at the TDWG meeting in Christchurch which was my first attempt to explain this (I'm not sure anyone had a clue what I was trying to say). See:
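The stamp example can be put in a few lines of Python. This is only a sketch of the idea: the element names come from the example above, while the values, the `TypedRecord` type and the class names "Stamp" and "Specimen" are invented for illustration. Two records carrying the same descriptors cannot be told apart until each one also states its class.

```python
from typing import NamedTuple


class TypedRecord(NamedTuple):
    """A record that also states which class of object it describes."""
    object_class: str  # hypothetical class name, e.g. "Specimen" or "Stamp"
    properties: dict   # Darwin Core-style descriptors


def untyped_view(record):
    """What a consumer sees when only the descriptors are returned."""
    return dict(record.properties)


def select_class(records, object_class):
    """Keep only the records that declare the requested class."""
    return [r for r in records if r.object_class == object_class]


# Invented values; the element names are the ones used in the stamp example.
descriptors = {"ScientificName": "Puma concolor", "Country": "Brazil", "CollectionDay": "12"}
records = [TypedRecord("Stamp", descriptors), TypedRecord("Specimen", descriptors)]

if __name__ == "__main__":
    # Without the class the two records look identical...
    print(untyped_view(records[0]) == untyped_view(records[1]))  # True
    # ...with it, a consumer can ask for specimens only.
    print(select_class(records, "Specimen"))
```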
http://www.tdwg.org/2004meet/EV/TDWG_2004_Papers_Hobern_4.zip
We also had real problems applying DiGIR to ABCD because substitution groups will not work with complex documents like ABCD.
However, we could solve these problems with some fairly simple modifications to our existing XML schemas. We would need to do the following:
* Determine a basic ontology of biodiversity data objects (Specimen, Locality, Character, etc.) and some of their fundamental relationships (collectedAt, hasCharacter, etc.)
* Restructure our current schemas so that each schema is a collection of descriptive properties for one of these classes (perhaps a substitution group of properties for each class - like GML?) plus a container element representing an instance of the class (and holding a collection of descriptive properties for the class). Note that some property elements would be RDF-like references to other objects (e.g. <collectedAt ref="locality1"/>) or inline versions of such objects (e.g. <collectedAt><location/></collectedAt>).
* Enhance TAPIR so that each resource identifies itself as returning objects from one of the standard classes (rather than untyped records)
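A minimal sketch of how a consumer might read such restructured documents, assuming a hypothetical instance layout (the `records`, `Specimen`, `Locality` and `collectedAt` elements and `id`/`ref` attributes follow the examples in the text but are not a published TDWG schema). It uses only Python's standard `xml.etree.ElementTree` and flattens each class container into triples, treating a `ref` attribute and an inline object the same way.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: one class container per object, with
# collectedAt given once as a reference and once as an inline object.
DOC = """
<records>
  <Locality id="locality1">
    <country>Denmark</country>
  </Locality>
  <Specimen id="specimen1">
    <scientificName>Quercus robur</scientificName>
    <collectedAt ref="locality1"/>
  </Specimen>
  <Specimen id="specimen2">
    <scientificName>Fagus sylvatica</scientificName>
    <collectedAt>
      <Locality id="locality2">
        <country>Sweden</country>
      </Locality>
    </collectedAt>
  </Specimen>
</records>
"""


def to_triples(element, triples):
    """Emit (subject, predicate, object) triples for one class container.

    Returns the container's id so that a parent property can link to it.
    """
    subject = element.get("id")
    triples.append((subject, "type", element.tag))
    for prop in element:
        if prop.get("ref") is not None:
            # Reference to an object described elsewhere in the document.
            triples.append((subject, prop.tag, prop.get("ref")))
        elif len(prop):
            # Inline object: describe it first, then link to its id.
            child_id = to_triples(prop[0], triples)
            triples.append((subject, prop.tag, child_id))
        else:
            # Simple literal-valued descriptive property.
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return subject


def parse(doc):
    triples = []
    for container in ET.fromstring(doc.strip()):
        to_triples(container, triples)
    return triples


if __name__ == "__main__":
    for t in parse(DOC):
        print(t)
```

In this sketch the referenced and inline forms produce the same kind of link triple, which is roughly what it would mean for such XML documents to remain a valid subset of an RDF model.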
I suspect that the structure of TCS and SDD would make this really easy in those cases. ABCD would need to be split into classes such as Unit, GatheringEvent, Locality, Collection and Collector rather than having everything presented as nested properties of a Unit, but most of the work would survive quite cleanly.
This approach would allow us to extend the properties for any class just as easily as we can Darwin Core. We could keep using TAPIR as our search protocol almost unchanged.
- - - - -
I am sure that there are other options, but each of these seems a fairly easy and powerful development from where TDWG has already gone. The second is much closer to what has been done in the past (although it still represents a more object-oriented approach instead of the current document-oriented one). However, I am not sure what we would gain at that stage by not simply using OWL Lite for the models.
Anyway, I hope this clarifies where I am coming from.
Donald
--------------------------------------------------------------- Donald Hobern (dhobern@gbif.org) Programme Officer for Data Access and Database Interoperability Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480 ---------------------------------------------------------------
Donald Hobern wrote:
Gregor,
I can understand your angst, but I would like to suggest that XML schema actually only really provides good support for some aspects of OO modelling.
Extending classes is a real problem.
A data model encoded in RDF can still make use of an ontology language to provide greater rigour in the way that objects are defined.
As was indicated in some of the earlier messages here, it is even possible to put together a data model which looks fundamentally just the same as one defined using XML schema, but which uses RDF technologies under the covers and is consequently easier to extend than XML schema.
For me, however, the most important factors in a revision of our data models would be:
- A cleaner separation between different object classes (not all versioned in a single schema).
- A good model to support easy extension (using a multiple inheritance approach) so that different (potentially overlapping) communities can add extra information in the ways that best suit them.
- An underlying ontology that is sufficient for us at least to identify the object class of each record.
RDF technologies are an excellent way to do this. GML has managed to produce many of the same features, but has probably done so largely by replicating the essentials of RDF modelling.
Thanks,
Donald
-----Original Message----- From: Tdwg-tag-bounces@lists.tdwg.org [mailto:Tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Gregor Hagedorn Sent: 24 March 2006 18:37 To: Tdwg-tag@lists.tdwg.org Subject: [Tdwg-tag] RDF instead of xml schema
Hi all,
RDF to me appears to be at a level of abstraction that makes it very hard for me to follow the documentation and discussion. Most of the examples are embedded in artificial intelligence/reasoning use cases that I have no experience with.
I am a biologist and I feel comfortable with UML, ER modeling, xml-schema modeling, and - surprise - relational databases. I believe many others are as well - how many datastores are actually built upon RDBMS technology?
To me, xml-schema maps nicely to both UML-like OO modeling and relational DBMS. I can guess at the advantages of opening this all up and seeing the world as a huge set of unstructured statement tuples. But it also scares me.
Angst is a bad advisor. But then, what if only a minority of the few people currently involved can follow at the RDF level of abstraction? A few questions I have:
- Would we be first in line to try rdf for such complex models as biodiversity informatics?
- Do GenBank/EMBL, with their hundreds of employees and programmers, use rdf? Internally/externally? Molecular bioinformatics is probably 1000 times larger than our biodiversity informatics.
- Why are GML, SVG etc. based on xml schema and not RDFS? Is this just historical?
- Are there any tools around that let me import RDF into a relational database? (Simple tools for xml-schema-based import/export are almost a standard part of databases now, or you can use comfortable graphical tools like Altova MapForce.)
-- I am just trying out some tools to help me visualize RDFS productions (like those Roger has sent around) at a level comparable with the UML-like xml-schema editors (Spy, Stylus, Oracle, etc.). I will try Altova SemanticWorks and Protege over the next week. The screenshots seem to be about AI and the semantic web much more than about information models (those creatures where you try to simplify the world to make it manageable...).
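No particular tool is implied here, but one generic technique shows how RDF can live in a relational database. The following is a minimal sketch under stated assumptions: plain Python with the standard `sqlite3` module, an invented table name `triple`, and invented data. The triples go into one three-column table, and a pivot query turns all subjects of one class back into ordinary relational rows.

```python
import sqlite3

# Invented triples; a real import would take them from an RDF parser.
TRIPLES = [
    ("specimen1", "type", "Specimen"),
    ("specimen1", "scientificName", "Quercus robur"),
    ("specimen1", "country", "Denmark"),
    ("specimen2", "type", "Specimen"),
    ("specimen2", "scientificName", "Fagus sylvatica"),
]


def load(conn, triples):
    """Store every statement in one generic (subject, predicate, object) table."""
    conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", triples)


def specimen_rows(conn):
    """Pivot the triple table into one row per Specimen (missing values are NULL)."""
    return conn.execute(
        """
        SELECT t.subject,
               MAX(CASE WHEN p.predicate = 'scientificName' THEN p.object END),
               MAX(CASE WHEN p.predicate = 'country' THEN p.object END)
        FROM triple AS t
        JOIN triple AS p ON p.subject = t.subject
        WHERE t.predicate = 'type' AND t.object = 'Specimen'
        GROUP BY t.subject
        ORDER BY t.subject
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, TRIPLES)
    for row in specimen_rows(conn):
        print(row)
```

The catch is that the pivot query has to know the class and its properties in advance, which is exactly the information an ontology (or a UML model) would supply.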
Gregor---------------------------------------------------------- Gregor Hagedorn (G.Hagedorn@bba.de) Institute for Plant Virology, Microbiology, and Biosafety Federal Research Center for Agriculture and Forestry (BBA) Königin-Luise-Str. 19 Tel: +49-30-8304-2220 14195 Berlin, Germany Fax: +49-30-8304-2203
Tdwg-tag mailing list Tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org
Hi Donald,
Just a quick comment below concerning TAPIR...
...
- Enhance TAPIR so that each resource identifies itself as returning objects from one of the standard classes (rather than untyped records) ...
If I understood correctly, the protocol already covers this situation. In the capabilities response, nothing prevents a "mappedConcept" from relating to a class. And when defining an output model, one could also map a complex node to that kind of concept.
Regards, -- Renato
Donald wrote:
In both cases, I would of course recommend that we model our objects using a more neutral language such as UML and then generate the encoding models we actually use. After that here are the two options I see (not in any order of priority).
UML would be absolutely excellent. However do we have tools that support this?
For SDD we failed to communicate in UML, partly because people wanted example documents that looked like the html they had become so used to, and partly because at some stage the complexity could not be handled by manually synchronizing two separate forms of expression. However, tools for UML to w3c schema seem to be rather experimental and not widely available. Or perhaps we just did not find them?
Actually, both TCS and UBIF/SDD use w3c schema in a way that corresponds to UML static class diagrams (and are built on class definitions). SDD is built with the assumption that someone wants to create a 1:1 mapping of Java classes, including type derivation, extension, and type polymorphism. However, it is clear that w3c schema is a hodgepodge compromise, and I guess that makes a general UML/w3c-schema mapping difficult.
Is there a good UML editor that exports RDFS/OWL? One that we can use? One that we can use in discussions between information scientists and biologists?
If we can forget about RDF/S and simply use UML tools for all relevant discussions (AND for defining the constraints we consider necessary), and the product of this discussion can then automatically be turned into RDF, I would *love RDF*. (Some comments about using text editors to edit RDF, however, point me in the opposite direction.)
Perhaps schema-driven applications could also be built, similar to what CASTOR is for Java and w3c-xml-schema?
Then it really does not matter that RDF/S in my view has a much steeper learning curve and higher level of abstraction than xml itself - because it is software that handles it.
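As a toy illustration of "UML in, RDF out" (not an existing UML tool or exporter), here is a Python sketch that assumes a hypothetical UML-like class model held as a plain dict. `Unit`, `GatheringEvent` and `Locality` echo class names used earlier in this thread; `SpecimenUnit` and all attribute and association names are invented. Generalizations become `rdfs:subClassOf`, attributes become literal-valued properties, and associations become object-valued properties, written out as N-Triples.

```python
# Hypothetical UML-like model: classes with an optional superclass,
# literal attributes, and associations to other classes.
MODEL = {
    "Unit": {"superclass": None, "attributes": ["scientificName"],
             "associations": {"gatheredIn": "GatheringEvent"}},
    "SpecimenUnit": {"superclass": "Unit", "attributes": ["catalogNumber"],
                     "associations": {}},
    "GatheringEvent": {"superclass": None, "attributes": ["date"],
                       "associations": {"atLocality": "Locality"}},
    "Locality": {"superclass": None, "attributes": ["country"], "associations": {}},
}

NS = "http://example.org/tdwg-sketch#"  # placeholder namespace, not a TDWG URI
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def model_to_rdfs(model):
    """Translate the class model into RDFS statements, one N-Triples line each."""
    lines = []

    def triple(s, p, o):
        lines.append(f"<{s}> <{p}> <{o}> .")

    def declare_property(name, domain, range_):
        prop = NS + name
        triple(prop, RDF + "type", RDF + "Property")
        triple(prop, RDFS + "domain", domain)
        triple(prop, RDFS + "range", range_)

    for name, spec in model.items():
        cls = NS + name
        triple(cls, RDF + "type", RDFS + "Class")
        if spec["superclass"]:
            # UML generalization becomes rdfs:subClassOf.
            triple(cls, RDFS + "subClassOf", NS + spec["superclass"])
        for attr in spec["attributes"]:
            # UML attributes become literal-valued properties.
            declare_property(attr, cls, RDFS + "Literal")
        for assoc, target in spec["associations"].items():
            # UML associations become object-valued properties.
            declare_property(assoc, cls, NS + target)
    return "\n".join(lines)


if __name__ == "__main__":
    print(model_to_rdfs(MODEL))
```

A real generator would also need a policy for an attribute name shared by several classes: in RDFS, several `rdfs:domain` statements on one property mean the subject belongs to all of those classes, not to any one of them.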
Gregor
Somewhat tangential to Gregor's question, but maybe of interest:
There are tools available to convert ISO 19109-compliant UML models to GML, which would address WFS applications that use our models. One such can be downloaded from:
http://www.interactive-instruments.de/ugas/
... and for information:
http://www.interactive-instruments.de/ugas/ShapeChange.pdf
I don't know of such tools for RDF, but I suspect they exist now or will soon.
Flip
On Mar 29, 2006, at 12:33 PM, Gregor Hagedorn wrote:
...
http://w3.antd.nist.gov/pubs/quirolgico-wise2004.pdf Slightly specialized, but explores the issues of mapping UML to Semantic Web (SW) technologies. Predates UML2, which is probably a better fit.
http://www.sandsoft.com/edoc2004/ChangODMDesignMDSW.pdf Insights about ontology metamodeling from the authors of the OMG proposal http://www.omg.org/docs/ad/05-09-08.pdf I mentioned in an earlier post. One point quite consistent with observations Roger, Donald, and Steve have made: "In addition, ontology development must be accomplished within the context of a business need, and should be grounded in requirements relevant to a particular software development activity, or set of activities, to be of practical use in most business settings. That is, the ontologies that an enterprise develops must form an integral part of that enterprise’s information and application infrastructure."
http://www.sfu.ca/~dgasevic/Tutorials/ISWC2005/ "MDA Standards for Ontology Development" including handouts from the tutorial
Phillip C. Dibner wrote:
...
participants (5)
- Bob Morris
- Donald Hobern
- Gregor Hagedorn
- Phillip C. Dibner
- Renato De Giovanni