[Tdwg-tag] Automatic derivation of XML schemas from models

Donald Hobern dhobern at gbif.org
Thu Mar 2 10:13:14 CET 2006



I'm going to stick my neck on the block here.  


GBIF clearly has an urgent need for a vocabulary for classes of data
accessed and served by the Data Portal and for a vocabulary of the concepts
or properties which we handle in connection with each of these classes.  The
GBIF Data Portal Strategy document includes the idea of a Schema Repository,
which was originally conceived as a way to manage the various XML schema
definitions of interest to us, but everything is moving so fast that it now
seems much more plausible than it did before for us to use a basic ontology
of biodiversity data classes and the properties which we need to manage in
relation to them (including all of the different syntactic representations
of semantically related properties).
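
To make the idea of one property with several syntactic representations concrete, here is a minimal sketch in Python. It is only an illustration: the class PropertyStore, its methods, the schema names "schema_a" and "schema_b", and the element paths are all hypothetical and are not taken from the wiki pages or from any TDWG schema.

```python
from collections import defaultdict


class PropertyStore:
    """Hypothetical store linking one semantic property to many syntaxes."""

    def __init__(self):
        # concept name -> {schema name: path of the element in that schema}
        self._by_concept = defaultdict(dict)

    def register(self, concept, schema, path):
        """Record that `path` in `schema` expresses `concept`."""
        self._by_concept[concept][schema] = path

    def representations(self, concept):
        """Return a copy of all known representations of `concept`."""
        return dict(self._by_concept.get(concept, {}))

    def concept_for(self, schema, path):
        """Look up which concept a schema-specific element stands for."""
        for concept, reps in self._by_concept.items():
            if reps.get(schema) == path:
                return concept
        return None


# Illustrative data only: schema names and paths are made up.
store = PropertyStore()
store.register("Specimen.collector", "schema_a", "/record/collector")
store.register("Specimen.collector", "schema_b", "/unit/gathering/agent")

print(store.representations("Specimen.collector"))
print(store.concept_for("schema_b", "/unit/gathering/agent"))
```

The point of the sketch is that the portal would work against the concept name, and the schema-specific paths become lookup data rather than something hard-coded in each service.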


I would like to see this work take place under the control of TDWG, but in
the meantime I need to make some decisions which can guide the development
of GBIF's portal and services over the next few months, so I have taken a
pass at representing the simplest core ontology I can for the data portal.
There are certainly several obvious areas for discussion, but much of it
simply reflects what I believe is implicit in the current TDWG data


I have also put together a sketchy proposal for how GBIF might use such a
core ontology as the basis for the function which was originally suggested
for the Schema Repository.  I would value comments on any or all of these
things - allowing for the fact that the current description is little more
than a stream-of-consciousness first pass.


You can see the materials at the Data Portal wiki at:


Follow the links for the CoreOntology, PropertyStore, SchemaRepositoryFacade
and SchemaRepositoryUseCases to get some idea of what I am thinking.




Donald Hobern (dhobern at gbif.org)
Programme Officer for Data Access and Database Interoperability 
Global Biodiversity Information Facility Secretariat 
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480


From: Tdwg-tag-bounces at lists.tdwg.org
[mailto:Tdwg-tag-bounces at lists.tdwg.org] On Behalf Of Roger Hyam
Sent: 01 March 2006 13:35
To: Javier de la Torre
Cc: Tdwg-tag at lists.tdwg.org
Subject: Re: [Tdwg-tag] Automatic derivation of XML schemas from models


Hi Javier,

Sorry for the delay in responding to this one.

Javier de la Torre wrote:

I think I also agree with what you are saying, but I suppose it is also too
general, and differences can arise in the details. Especially if we want
TAG only to define a shared vocabulary or classes, how complicated should
these classes be?

This is part of my plan. It is easier to agree on the general things and so
I want to fix these things in stone and slowly move towards the detailed and
more controversial subjects. We may find that some of the controversial
subjects can be avoided altogether or, when we have a framework of
agreement in place, it may be possible to have a polymorphic approach to
some well-defined areas.

This is counterintuitive for most of us (including me) as we tend to be
analysts who look for the problems and exceptions in every case - and are
therefore unlikely to ever reach agreement.

Your suggestions about auto generation of XML from UML are great. I have
added them to a wiki page here:


and added that to a list of things to be discussed here:


All the best,



On 22/02/2006, at 16:50, Roger Hyam wrote:

Hi All,

It is generally agreed that we need a representation-independent object
model or ontology of some kind. I would like to put together a list of the
things that need to be agreed or investigated in order to do this.

Firstly the things I believe we can all agree on (stop me if I am wrong).

1. It should be representation-independent (i.e. we should be able to
move it between 'languages': UML, OWL, BNF etc.).
2. It should be dynamic (i.e. capable of evolving through time).
3. It should be polymorphic. This is a result of it being dynamic.
There will, at a minimum, be multiple versions of any one part of the model
when new versions are introduced.
4. It should NOT attempt to be omniscient, i.e. it will not cover
everything in our domain, only the parts that need to be communicated.
5. It will be managed in a distributed fashion. Different teams will
take responsibility for different parts of it.

My first Question is:

Does the centralization of the ontology need to go beyond a small shared
vocabulary of terms or base classes? 

I envisage this ontology containing things like Collection, Specimen,
TaxonConcept, TaxonName but not defining the detailed structure of these
objects. It would contain a maximum of a few tens of objects and properties.
TDWG subgroups would be responsible for building ontologies that extend
these base objects but that generally didn't refer to each other - only to
the core. If this is true then I think the definition of the top-level
objects falls within the remit of the TAG (in consultation with others).
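
As a rough illustration of the rule that extension ontologies refer to the core but not to each other, here is a small Python sketch. Only the four core terms come from this message; the ExtensionOntology class, the find_violations function, and the example subgroup classes (HerbariumSheet, BotanicalName, SheetLabelName) are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Core terms named in the message; everything else below is hypothetical.
CORE_TERMS = frozenset({"Collection", "Specimen", "TaxonConcept", "TaxonName"})


@dataclass
class ExtensionOntology:
    """A subgroup ontology: each local class names the term it extends."""
    name: str
    extends: Dict[str, str] = field(default_factory=dict)


def find_violations(
    extensions: List[ExtensionOntology],
) -> List[Tuple[str, str, str]]:
    """Return (ontology, class, base) where base is neither a core term
    nor a class defined in the same extension ontology."""
    violations = []
    for ext in extensions:
        allowed = CORE_TERMS | set(ext.extends)
        for cls, base in ext.extends.items():
            if base not in allowed:
                violations.append((ext.name, cls, base))
    return violations


# Made-up subgroup ontologies for illustration.
specimens = ExtensionOntology("specimens", {"HerbariumSheet": "Specimen"})
names = ExtensionOntology(
    "names",
    {
        "BotanicalName": "TaxonName",
        # Refers to a class owned by another subgroup, so it is flagged.
        "SheetLabelName": "HerbariumSheet",
    },
)

print(find_violations([specimens, names]))
```

A check of this kind would let each subgroup evolve its own ontology independently, with the small core as the only shared dependency.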

If this is not a valid way forward what are the alternatives?

Are there questions we should ask before this one?

Once again I'd be grateful for your thoughts.


 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
 roger at tdwg.org
 +44 1578 722782


