Re: [tdwg-tag] TAG Road Map

9 Aug 2006

      Hi Javier,

Comments below.
...
Nice document. Just some comments:
-Tonto is a bit of a bad name in Spanish.
Yes I know it means Fool. I thought it was appropriate for something 
that people might think of as a knowledge management tool - keep things 
in perspective. Ricardo thought it was OK so I presume it is OK in 
Brazilian Portuguese but not sure about Spanish in all the different 
countries. If anyone knows it to be very derogatory or a 'bad' word in 
any particular country let me know and I'll change it. Fool is quite 
mild in English. Fool also means "Cold dessert consisting of fruit puree 
and whipped cream." in English.
...
-Have you considered something about controlled vocabularies in the
semantic hub? Or maybe what I prefer to call managed controlled
vocabularies. For example in ABCD there is a concept called
"KindOfRecord"; it is not a controlled vocabulary, it is just text. It
would be too difficult to provide a fixed list of terms for it, so it
would be great if the list could somehow be created on a community
basis. Let's say that I want to map my database to this field: I could
get a list of proposed terms already being used, and if none of them
satisfies me then I can create my own. It is a little bit like tagging
in a controlled way. I love the del.icio.us example: they propose tags
to you and most of the time I use them, and by doing this the data is
much more accessible because the tags have not exploded. The opposite
is what is happening now in ABCD: everybody uses a different term for
the same thing and the unified data becomes useless.
There will be instances of classes in Tonto. I should have mentioned 
that. Chatting to Rob Gales about it, it seems a good way of doing 
controlled vocabularies. They will be extensible because Tonto will 
always be capable of change. Also in some languages (OWL etc.) you could 
always define your own instances outside of Tonto - but it would depend 
on how we do the coding for the XML Schema based renderings of the 
semantics.

Populating drop down menus with information out of Tonto when someone 
is mapping a data source is the ultimate goal - the dream!
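
The suggest-then-extend behaviour discussed above could look roughly
like the sketch below. It is only an illustration: the
ManagedVocabulary class, its methods and the example terms are all
hypothetical and are not taken from ABCD or from any Tonto design.

```python
from collections import Counter


class ManagedVocabulary:
    """Community-managed term list for one concept (hypothetical helper)."""

    def __init__(self, concept):
        self.concept = concept
        self._usage = Counter()  # term -> number of times it has been used

    def suggest(self, prefix="", limit=5):
        """Return the most-used existing terms that start with `prefix`."""
        prefix = prefix.lower()
        matches = [
            (term, count)
            for term, count in self._usage.items()
            if term.lower().startswith(prefix)
        ]
        # Most-used first, then alphabetical so the order is stable.
        matches.sort(key=lambda item: (-item[1], item[0].lower()))
        return [term for term, _ in matches[:limit]]

    def use(self, term):
        """Record a term, reusing an existing spelling if one matches
        case-insensitively; otherwise the new term joins the list."""
        term = term.strip()
        for existing in self._usage:
            if existing.lower() == term.lower():
                term = existing
                break
        self._usage[term] += 1
        return term


# Example terms are invented for illustration only.
vocab = ManagedVocabulary("KindOfRecord")
for t in ["PreservedSpecimen", "Observation", "preservedspecimen", "LivingSpecimen"]:
    vocab.use(t)

suggestions = vocab.suggest("p")
```

A mapping tool could fill a drop down menu from suggest() and call
use() only when the provider picks or types a term, so that variant
spellings of an existing term do not multiply.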
...
-In the implementation section you say something like "Data Providers
must map their data to these views" referring to views from the
semantic hub. This is actually what we are trying to avoid. TAPIR was
created at the beginning with the vision of data providers mapping
their databases once and being accessible through different views that
are explicitly declared in the request. We have changed the name now
and we are calling them outputModels.
On the other hand you know that WASABI and PyWrapper are now becoming
multi-protocol. That means that we want providers to map their
databases once and make the data available in different protocols.
The plan is that data mapping only has to be done to one of the views 
that Tonto has onto its internal semantics then the other 
views/representations could be used as outputModels (or custom output 
models could be created by clients etc). The goal is definitely to only 
map once - a single set of semantics - but then represent in multiple 
ways. Tonto could provide a view of the semantics that a graphical tool 
could then pick up to help someone build a mapping file - the dream...
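
As a rough illustration of "map once, represent in multiple ways",
the sketch below maps a database row to concepts a single time and
then renders the same record through two different output functions.
The column names, concept names, the urn:example subject and both
renderers are assumptions made up for the example; they are not
TAPIR outputModels or anything defined by Tonto.

```python
from xml.sax.saxutils import escape

# The single mapping a provider maintains: local column -> hub concept.
# All names here are illustrative.
MAPPING = {
    "sci_name": "ScientificName",
    "coll_date": "CollectionDate",
    "country": "Country",
}


def to_concepts(row, mapping):
    """Translate a raw row into concept -> value pairs; unmapped columns are dropped."""
    return {mapping[col]: value for col, value in row.items() if col in mapping}


def render_xml(record):
    """One hypothetical representation: a flat XML element per concept."""
    fields = "".join(
        f"<{name}>{escape(str(value))}</{name}>"
        for name, value in sorted(record.items())
    )
    return f"<Record>{fields}</Record>"


def render_triples(record, subject="urn:example:record:1"):
    """Another hypothetical representation: (subject, property, value) triples."""
    return [(subject, name, str(value)) for name, value in sorted(record.items())]


row = {
    "sci_name": "Bellis perennis",
    "coll_date": "2006-08-09",
    "country": "GB",
    "internal_id": 17,
}
record = to_concepts(row, MAPPING)   # mapping is applied once
xml_view = render_xml(record)        # ...and represented in one way
triple_view = render_triples(record) # ...and in another
```

Adding a further representation means adding another render function;
the provider's mapping is not touched.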
...
So, again, within TAPIR itself and within protocols we need something
like the semantic hub you are proposing. And we are doing it right now,
but in a very primitive way. I am working on the implementation of the
BioMOBY protocol inside PyWrapper. I have created a mapping file
between TDWG schemas and the MOBY data types registry so that I can
resolve questions like:
-OK, if I have these TDWG concepts mapped, which MOBY services could I 
create?
-How can I create these MOBY types using TDWG concepts?
This is definitely where everyone is heading. The idea with the hub is 
to start by clearly defining the semantic constructs we are going to use 
(classes, properties, instances, literals, ranges) so that we can be 
sure that we can represent the semantics in different 'representations'. 
It is no good using some UML or OWL construct that doesn't have a good 
representation in XML Schema or GML for example.
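
The two questions Javier lists above come down to set comparisons
once the mapping file is loaded. The sketch below is a guess at the
general shape only: the type names, the concept names and the
dictionary layout are invented, and it does not use the real MOBY
registry or the actual PyWrapper mapping file.

```python
# Hypothetical mapping file content: target data type -> concepts it needs.
TYPE_REQUIREMENTS = {
    "TaxonName": {"ScientificName"},
    "Occurrence": {"ScientificName", "CollectionDate", "Country"},
    "GeoPoint": {"Latitude", "Longitude"},
}


def buildable_types(mapped_concepts, requirements):
    """Types whose required concepts are all mapped by this provider."""
    mapped = set(mapped_concepts)
    return sorted(
        type_name
        for type_name, needed in requirements.items()
        if needed <= mapped  # subset test: nothing required is missing
    )


def missing_concepts(type_name, mapped_concepts, requirements):
    """Concepts still to be mapped before `type_name` can be produced."""
    return sorted(requirements[type_name] - set(mapped_concepts))


mapped = ["ScientificName", "CollectionDate", "Latitude"]
available = buildable_types(mapped, TYPE_REQUIREMENTS)
todo = missing_concepts("Occurrence", mapped, TYPE_REQUIREMENTS)
```

Here available lists what the provider can already offer and todo
lists what would have to be mapped to offer one more type.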

This plan will not answer all the questions on the first day. The 
existing schemas will need to be mapped into Tonto so they are 
represented in a uniform way and this will take time - but months not 
years I hope. It is not always clear in the existing schemas what the 
classes and properties are, for example, so this is not an automatic 
process but will need thinking through - and there are issues around 
cardinality and multiple inheritance that will need to be discussed.
...
As I said this is now being implemented in a simple flat file that
will be available on the Internet for all data providers, but I am not
accessing it as a service and I have to do all the handling on the
client. The semantic hub you are proposing is exactly what we need and
want in order to do this more properly.
I am glad to hear that. I hope that we can generate the file you need to 
do the mapping automatically in future.
...
So... summarizing, from our side I can imagine now that we need:
-the semantic hub must expose the concepts in a way that we can use
them in our configuration tools in the data providers to allow mapping
a database there.
Just specify the way and we will make it do it.
-the semantic hub must expose the different views, or outputModels as
we call them in TAPIR, so that provider software can produce them.
Just specify them and we write a script to produce them from Tonto.
...
There is a full list of other requirements I would love to see there
that can be found in Dave Thau's work on a schema repository, do you
remember?
http://ww3.bgbm.org/schemarepowiki
In particular there were things like:
-Give an XSLT that transforms ABCD 1.2 into 2.06
Ah ha! this is where I have to say no!
What I am proposing is that we have a central semantic model that can be 
presented in a multitude of ways. This is *very* different from a 
service that can automatically transform one existing schema to another. 
That is far more ambitious and may not be possible in what remains of 
human history. We should build the central semantic model out of the 
existing schemas but that is a manual process where decisions will have 
to be made about what was meant by the different constructs in the 
existing schemas. Think about the different ways inclusion, adjacency, 
cardinality and type extension/restriction are used within XML Schema 
documents and what they 'mean' in terms of GML feature types or RDFS 
classes and properties. In general there just isn't a mapping.

When they built GML they started with the model of feature types and 
properties and then decided how they would represent this using XML and 
control it with XML Schema. We need to take a similar approach. Decide 
on what our modeling technique is then decide how we will represent the 
model in different technologies including GML and semantic web stuff.

A crazy analogy would be to say it is like getting a machine to write a 
story in different languages based on the same plot. This might be 
achievable because the plot can be encoded in some machine readable way 
and the machine can just use rules to bolt together stored sentences. 
Adding another language is just a matter of new output rules and new 
stored sentences. It wouldn't produce great literature but it would work.
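
To make the analogy concrete, here is a toy version of such a machine.
Everything in it (the event encoding, the two events, the stored
sentences) is invented for illustration.

```python
# The "plot": language-neutral events in a machine readable form.
PLOT = [
    ("meet", {"who": "Ana", "whom": "Ben"}),
    ("travel", {"who": "Ana", "where": "Madrid"}),
]

# Stored sentences: one template per event type, per language.
SENTENCES = {
    "en": {"meet": "{who} meets {whom}.", "travel": "{who} travels to {where}."},
    "es": {"meet": "{who} conoce a {whom}.", "travel": "{who} viaja a {where}."},
}


def tell(plot, language):
    """Bolt the stored sentences together in plot order."""
    templates = SENTENCES[language]
    return " ".join(templates[event].format(**args) for event, args in plot)


english = tell(PLOT, "en")
spanish = tell(PLOT, "es")
```

Supporting another language means adding one more entry to SENTENCES;
the plot and the tell() function stay the same.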

This is very different from asking a machine to read a book in one 
language, understand the plot and print out the story in a completely 
different language. For a start there may be things expressed in the 
first language for which there is no direct equivalent. Just ask a Babel 
fish!

I think this is the main way the Semantic Hub proposal differs from the 
Schema Repository approach. I am trying to make something happen by 
restricting the scope - if you remember the talk I gave on managing 
scope and resources.

I hope this is OK.
...
-Give me labels for this concept
This should be easy. Internationalization won't be in the first 
implementation but should be in subsequent versions.
-Give me examples of values for this concept
This should be easy but isn't in the highest priority.
...
I hope to be half as clear as your document is :)
Sounds good. Thank you again for your input.

All the best,

Roger
...
Javier.
On 8/7/06, Roger Hyam <roger@tdwg.org> wrote:
...
Hi Everyone,
The TDWG Infrastructure Project team have a milestone/deliverable to
produce a document to act as a road map for the TDWG Architecture.
I have prepared a document that I believe pulls together the ideas and
consequences from the TAG-1 meeting and the two GUID meetings plus
various discussions held at the OGC meeting in Edinburgh. This document
introduces the notion of a 'Semantic Hub' which is a proposed mechanism
to expose TDWG semantics through multiple technologies: XML Schema;
GML;  OWL; etc.
I would be most grateful if you could find time to have a look at this
six page document:
http://wiki.tdwg.org/twiki/bin/view/TAG/ArchitectureOverview
and give any feedback on the wiki page, via this list or directly to me.
Many thanks,
Roger
--
-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger@tdwg.org
 +44 1578 722782
-------------------------------------
_______________________________________________
tdwg-tag mailing list
tdwg-tag@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag