Roger,

I’m going to stick my neck out here.

GBIF clearly has an urgent need for a vocabulary of the classes of data accessed and served by the Data Portal, and for a vocabulary of the concepts or properties we handle in connection with each of these classes. The GBIF Data Portal Strategy document includes the idea of a Schema Repository, which was originally conceived as a way to manage the various XML schema definitions of interest to us. Things are moving so fast, however, that it now seems much more plausible than before for us to use a basic ontology of biodiversity data classes and of the properties we need to manage in relation to them (including all of the different syntactic representations of semantically related properties).

I would like to see this work take place under the control of TDWG, but in the meantime I need to make some decisions to guide the development of GBIF’s portal and services over the next few months, so I have taken a pass at representing the simplest core ontology I can for the data portal. There are certainly several obvious areas for discussion, but much of it simply reflects what I believe is implicit in the current TDWG data standards.

I have also put together a sketchy proposal for how GBIF might use such a core ontology as the basis for the function which was originally suggested for the Schema Repository. I would value comments on any or all of these things – allowing for the fact that the current description is little more than a stream-of-consciousness first pass.

You can find the materials on the Data Portal wiki at: http://wiki.gbif.org/dadiwiki/wikka.php?wakka=SchemaRepository

Follow the links for the CoreOntology, PropertyStore, SchemaRepositoryFacade and SchemaRepositoryUseCases to get some idea of what I am thinking.

Thanks,

Donald

---------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
---------------------------------------------------------------

From: Tdwg-tag-bounces@lists.tdwg.org [mailto:Tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Roger Hyam
Sent: 01 March 2006 13:35
To: Javier de la Torre
Cc: Tdwg-tag@lists.tdwg.org
Subject: Re: [Tdwg-tag] Automatic derivation of XML schemas from models

Hi Javier,

Sorry for the delay in responding to this one.

Javier de la Torre wrote:

I think I also agree with what you are saying, but I suppose it is also too general, and differences may arise in the details. Especially if we want TAG only to define a shared vocabulary or classes, how complicated should these classes be?

This is part of my plan. It is easier to agree on the general things, so I want to fix these in stone and then slowly move towards the detailed and more controversial subjects. We may find that some of the controversial subjects can be avoided altogether or, once we have a framework of agreement in place, that it is possible to take a polymorphic approach to some well-defined areas.

This is counter-intuitive for most of us (including me), as we tend to be analysts who look for the problems and exceptions in every case, and are therefore unlikely ever to reach agreement.

Your suggestions about auto-generation of XML Schema from UML are great. I have added them to a wiki page here:

http://www.tdwg.hyam.net/twiki/bin/view/TAG/AutoGenerationOfXMLSchema

and added that to a list of things to be discussed here:

http://www.tdwg.hyam.net/twiki/bin/view/TAG/TagDiscussionRoadMap

All the best,

Roger

On 22/02/2006, at 16:50, Roger Hyam wrote:

Hi All,

It is generally agreed that we need a representation-independent object model or ontology of some kind. I would like to put together a list of the things that need to be agreed or investigated in order to build one.

Firstly the things I believe we can all agree on (stop me if I am wrong).

It should be representation-independent (i.e. we should be able to move it between 'languages' such as UML, OWL and BNF).
It should be dynamic (i.e. capable of evolving through time).
It should be polymorphic. This follows from it being dynamic: at a minimum, there will be multiple versions of any one part of the model when new versions are introduced.
It should NOT attempt to be omniscient, i.e. it will not cover everything in our domain, only the parts that need to be communicated.
It will be managed in a distributed fashion. Different teams will take responsibility for different parts of it.

My first question is:

Does the centralization of the ontology need to go beyond a small shared vocabulary of terms or base classes?

I envisage this ontology containing things like Collection, Specimen, TaxonConcept and TaxonName, but not defining the detailed structure of these objects. It would contain at most a few tens of objects and properties. TDWG subgroups would be responsible for building ontologies that extend these base objects but that would generally not refer to each other, only to the core. If this is true, then I think the definition of the top-level objects falls within the remit of the TAG (in consultation with others).
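To make the proposed layering a little more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the core is just the four class names listed above and that each subgroup ontology can be reduced to a mapping from its own new classes to the class each one extends. The names ExtensionOntology, foreign_references and check_extensions, and the example subclasses, are hypothetical. Because the rule is only that extensions should "generally" refer to the core, the check reports references that bypass the core rather than rejecting them.

```python
from dataclasses import dataclass, field

# Hypothetical core vocabulary: only the base classes named in the message.
CORE_CLASSES = frozenset({"Collection", "Specimen", "TaxonConcept", "TaxonName"})


@dataclass
class ExtensionOntology:
    """A subgroup ontology that extends core classes (illustrative only)."""

    name: str
    # Maps each class defined by this ontology to the class it extends.
    subclass_of: dict[str, str] = field(default_factory=dict)

    def foreign_references(self) -> set[str]:
        """Return parent classes that are neither core nor defined locally."""
        local = set(self.subclass_of)
        return {
            parent
            for parent in self.subclass_of.values()
            if parent not in CORE_CLASSES and parent not in local
        }


def check_extensions(extensions: list[ExtensionOntology]) -> dict[str, set[str]]:
    """Report, per ontology, any references that bypass the core."""
    report: dict[str, set[str]] = {}
    for ext in extensions:
        refs = ext.foreign_references()
        if refs:  # only list ontologies that actually reach past the core
            report[ext.name] = refs
    return report


if __name__ == "__main__":
    specimens = ExtensionOntology(
        "SpecimenExtension",
        {"PreservedSpecimen": "Specimen", "FossilSpecimen": "PreservedSpecimen"},
    )
    names = ExtensionOntology(
        "NameExtension",
        {"BotanicalName": "TaxonName", "TypifiedName": "PreservedSpecimen"},
    )
    # NameExtension reaches into SpecimenExtension instead of the core.
    print(check_extensions([specimens, names]))
    # prints {'NameExtension': {'PreservedSpecimen'}}
```

In this sketch a subgroup is free to build deep hierarchies on top of its own classes (FossilSpecimen extends PreservedSpecimen), and only cross-subgroup references are surfaced for discussion.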

If this is not a valid way forward what are the alternatives?

Are there questions we should ask before this one?

Once again I'd be grateful for your thoughts.

Roger

--

-------------------------------------
Roger Hyam
Technical Architect
Taxonomic Databases Working Group
-------------------------------------
http://www.tdwg.org
roger@tdwg.org
+44 1578 722782
-------------------------------------
