Hi Javier,
Comments below.
Nice document. Just some comments:
-Tonto is a bit of a bad name in Spanish.
Yes I know it means Fool. I thought it was appropriate for something that people might think of as a knowledge management tool - keep things in perspective. Ricardo thought it was OK so I presume it is OK in Brazilian Portuguese but not sure about Spanish in all the different countries. If anyone knows it to be very derogatory or a 'bad' word in any particular country let me know and I'll change it. Fool is quite mild in English. Fool also means "Cold dessert consisting of fruit puree and whipped cream." in English.
-Have you considered something about controlled vocabularies in the semantic hub? Or maybe what I prefer to call managed controlled vocabularies. For example, in ABCD there is a concept called "KindOfRecord"; it is not a controlled vocabulary, it is just text. It would be too difficult to provide a fixed list of terms for it, so it would be great if the list could somehow be created on a community basis. Let's say that I want to map my database to this field: I could get a list of proposed terms already being used, and if none of them satisfies me then I can create my own. It is a little bit like tagging in a controlled way. I love the del.icio.us example: they propose tags to you and most of the time I use them, and by doing this the data is much more accessible because the tags have not exploded. The opposite is what is happening now in ABCD: everybody uses a different term for the same thing and the unified data becomes useless.
There will be instances of classes in Tonto. I should have mentioned that. Chatting to Rob Gales about it, it seems a good way of doing controlled vocabularies. They will be extensible because Tonto will always be capable of change. Also in some languages (OWL etc.) you could always define your own instances outside of Tonto - but it would depend on how we do the coding for the XML Schema based renderings of the semantics.
Populating drop-down menus with information out of Tonto when someone is mapping a data source is the ultimate goal - the dream!
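To make the "propose terms already in use, otherwise add your own" idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the class name ManagedVocabulary, its methods and the example terms are hypothetical and are not part of Tonto or ABCD.

```python
from collections import Counter


class ManagedVocabulary:
    """Hypothetical community-managed term list for a single concept."""

    def __init__(self, concept):
        self.concept = concept
        self._usage = Counter()  # term -> number of times it has been chosen

    def use(self, term):
        """Record a term; reuse an existing spelling if one matches ignoring case."""
        key = term.strip()
        for existing in self._usage:
            if existing.lower() == key.lower():
                key = existing
                break
        self._usage[key] += 1
        return key

    def suggest(self, prefix="", limit=5):
        """Propose the most used terms that start with the given prefix."""
        matches = [
            (term, count)
            for term, count in self._usage.items()
            if term.lower().startswith(prefix.lower())
        ]
        # Most used first, then alphabetical so the order is stable.
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [term for term, _ in matches[:limit]]
```

A mapping tool could call suggest() to fill a drop-down menu and call use() with whatever the provider finally picks, so that popular terms rise to the top and near-duplicates are folded together.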
-In the implementation section you say something like "Data Providers must map their data to these views", referring to views from the semantic hub. This is actually what we are trying to avoid. TAPIR was originally created with the vision of data providers mapping their databases once and being accessible through different views that are explicitly declared in the request. We have changed the name now and call them outputModels. On the other hand, you know that WASABI and PyWrapper are now becoming multi-protocol. That means that we want providers to map their databases once and make the data available in different protocols.
The plan is that data mapping only has to be done to one of the views that Tonto has onto its internal semantics, then the other views/representations could be used as outputModels (or custom output models could be created by clients etc). The goal is definitely to only map once - a single set of semantics - but then represent in multiple ways. Tonto could provide a view of the semantics that a graphical tool could then pick up to help someone build a mapping file - the dream...
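The "map once, represent in multiple ways" goal can be sketched roughly as below. This is a toy illustration, not how TAPIR or PyWrapper do it: the concept names, column names and output model names are all made up for the example.

```python
# One mapping from hub concepts to local database columns (hypothetical names).
MAPPING = {"ScientificName": "sci_name", "Collector": "coll_by"}

# Each output model is a list of (output label, hub concept) pairs.
# Adding a new output model does not require touching MAPPING.
OUTPUT_MODELS = {
    "flat": [("name", "ScientificName"), ("collector", "Collector")],
    "names_only": [("scientificName", "ScientificName")],
}


def render(row, model_name):
    """Render one database row through the named output model."""
    out = {}
    for label, concept in OUTPUT_MODELS[model_name]:
        column = MAPPING.get(concept)
        if column is None:
            raise KeyError(f"concept {concept!r} is not mapped")
        out[label] = row[column]
    return out
```

The provider only ever maintains MAPPING; every output model is expressed against the shared concepts, which is the role the hub's semantics would play.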
So, again, within TAPIR itself and within protocols we need something like the semantic hub you are proposing. And we are doing it right now, but in a very primitive way. I am working on the implementation of the BioMOBY protocol inside PyWrapper. I have created a mapping file between TDWG schemas and the MOBY data types registry so that I can resolve questions like: -OK, if I have these TDWG concepts mapped, which MOBY services could I create? -How can I create these MOBY types using TDWG concepts?
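The two questions above amount to simple set lookups over such a mapping file. A rough Python sketch follows, assuming the file can be loaded into a dictionary; the type and concept names are hypothetical and are not taken from the real MOBY registry or any TDWG schema.

```python
# MOBY data type -> TDWG concepts needed to build it (hypothetical entries).
MOBY_TYPE_REQUIRES = {
    "TaxonName": {"ScientificName"},
    "SpecimenRecord": {"ScientificName", "CatalogNumber", "Collector"},
}


def buildable_types(mapped_concepts):
    """Which MOBY types can be produced from the concepts a provider has mapped?"""
    mapped = set(mapped_concepts)
    return sorted(
        moby_type
        for moby_type, needed in MOBY_TYPE_REQUIRES.items()
        if needed <= mapped  # every required concept is mapped
    )


def missing_concepts(moby_type, mapped_concepts):
    """Which concepts still have to be mapped before this type can be built?"""
    return sorted(MOBY_TYPE_REQUIRES[moby_type] - set(mapped_concepts))
```

With only ScientificName mapped, buildable_types returns just the hypothetical TaxonName, and missing_concepts lists what is still needed for SpecimenRecord.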
This is definitely where everyone is heading. The idea with the hub is to start by clearly defining the semantic constructs we are going to use (classes, properties, instances, literals, ranges) so that we can be sure that we can represent the semantics in different 'representations'. It is no good using some UML or OWL construct that doesn't have a good representation in XML Schema or GML for example.
This plan will not answer all the questions on the first day. The existing schemas will need to be mapped into Tonto so they are represented in a uniform way and this will take time - but months not years I hope. It is not always clear in the existing schemas what the classes and properties are, for example, so this is not an automatic process but will need thinking through - and there are issues around cardinality and multiple inheritance that will need to be discussed.
As I said, this is now being implemented in a simple flat file that will be available on the Internet for all data providers, but I am not accessing it as a service and I have to do all the handling on the client. The semantic hub you are proposing is exactly what we need and want in order to do this more properly.
I am glad to hear that. I hope that we can generate the file you need to do the mapping automatically in future.
So... summarizing, from our side I can imagine now that we need:
-the semantic hub must expose the concepts in a way that we can use them in our configuration tools in the data providers to allow mapping a database there.
Just specify the way and we will make it do it.
-the semantic hub must expose the different views, or outputModels as we call them in TAPIR, so that providers' software can produce them.
Just specify them and we write a script to produce them from Tonto.
There is a full list of other requirements I would love to see there that can be found in Dave Thau's work on a schema repository, do you remember? http://ww3.bgbm.org/schemarepowiki
In particular, there were things like: -Give an XSLT that transforms ABCD 1.2 into 2.06
Ah ha! this is where I have to say no!
What I am proposing is that we have a central semantic model that can be presented in a multitude of ways. This is *very* different from a service that can automatically transform one existing schema to another. That is far more ambitious and may not be possible in what remains of human history. We should build the central semantic model out of the existing schemas but that is a manual process where decisions will have to be made about what was meant by the different constructs in the existing schemas. Think about the different ways inclusion, adjacency, cardinality and type extension/restriction are used within XML Schema documents and what they 'mean' in terms of GML feature types or RDFS classes and properties. In general there just isn't a mapping.
When they built GML they started with the model of feature types and properties and then decided how they would represent this using XML and control it with XML Schema. We need to take a similar approach. Decide on what our modeling technique is, then decide how we will represent the model in different technologies including GML and semantic web stuff.
A crazy analogy would be to say it is like getting a machine to write a story in different languages based on the same plot. This might be achievable because the plot can be encoded in some machine readable way and the machine can just use rules to bolt together stored sentences. Adding another language is just a matter of new output rules and new stored sentences. It wouldn't produce great literature but it would work.
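In code terms the "same plot, different languages" idea might look like the sketch below: one small model and one set of output rules per representation. This is only a toy under assumptions; the model contents are hypothetical, the output is a fragment (namespace and prefix declarations are omitted), and it is not a proposal for how Tonto would actually be coded.

```python
# Toy central model: class name -> {property name: literal type}. Names are hypothetical.
MODEL = {"Specimen": {"catalogNumber": "string", "collector": "string"}}


def render_xsd(model):
    """Output rules for an XML Schema fragment (schema wrapper omitted)."""
    lines = []
    for cls, props in model.items():
        lines.append(f'<xs:complexType name="{cls}">')
        lines.append("  <xs:sequence>")
        for prop, typ in props.items():
            lines.append(f'    <xs:element name="{prop}" type="xs:{typ}"/>')
        lines.append("  </xs:sequence>")
        lines.append("</xs:complexType>")
    return "\n".join(lines)


def render_rdfs(model):
    """Output rules for an RDFS fragment in Turtle (prefix declarations omitted)."""
    lines = []
    for cls, props in model.items():
        lines.append(f":{cls} a rdfs:Class .")
        for prop, typ in props.items():
            lines.append(
                f":{prop} a rdf:Property ; rdfs:domain :{cls} ; rdfs:range xsd:{typ} ."
            )
    return "\n".join(lines)


# Adding another representation means adding one more entry here.
RENDERERS = {"xsd": render_xsd, "rdfs": render_rdfs}
```

Both renderers read the same MODEL, so nothing ever has to be translated from one output format into another, which is the part that would be the hard problem.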
This is very different from asking a machine to read a book in one language, understand the plot and print out the story in a completely different language. For a start there may be things expressed in the first language for which there is no direct equivalent. Just ask a babel fish!
I think this is the main way the Semantic Hub proposal differs from the Schema Repository approach. I am trying to make something happen by restricting the scope - if you remember the talk I gave on managing scope and resources.
I hope this is OK.
-Give me labels for this concept
This should be easy. Internationalization won't be in the first implementation but should be in subsequent versions.
-Give me examples of values for this concept
This should be easy but isn't the highest priority.
...
I hope to be half as clear as your document is :)
Sounds good. Thank you again for your input.
All the best,
Roger
Javier.
On 8/7/06, Roger Hyam roger@tdwg.org wrote:
Hi Everyone,
The TDWG Infrastructure Project team have a milestone/deliverable to produce a document to act as a road map for the TDWG Architecture.
I have prepared a document that I believe pulls together the ideas and consequences from the TAG-1 meeting and the two GUID meetings plus various discussions held at the OGC meeting in Edinburgh. This document introduces the notion of a 'Semantic Hub' which is a proposed mechanism to expose TDWG semantics through multiple technologies: XML Schema; GML; OWL; etc.
I would be most grateful if you could find time to have a look at this six page document:
http://wiki.tdwg.org/twiki/bin/view/TAG/ArchitectureOverview
and give any feedback on the wiki page, via this list or directly to me.
Many thanks,
Roger
--
Roger Hyam Technical Architect Taxonomic Databases Working Group
http://www.tdwg.org roger@tdwg.org
+44 1578 722782