Hi Javier,
Comments below.
Nice document. Just some comments:
-Tonto is a bit of a bad name in Spanish.
Yes I know it means Fool. I thought it was appropriate for something
that people might think of as a knowledge management tool - keep things
in perspective. Ricardo thought it was OK so I presume it is OK in
Brazilian Portuguese but not sure about Spanish in all the different
countries. If anyone knows it to be very derogatory or a 'bad' word in
any particular country let me know and I'll change it. Fool is quite
mild in English. Fool also means "Cold dessert consisting of fruit
puree and whipped cream." in English.
-Have you considered something about controlled vocabularies in the
semantic hub? Or maybe what I prefer to call managed controlled
vocabularies. For example in ABCD there is a concept called
"KindOfRecord"; it is not a controlled vocabulary, it is just free text.
It would be too difficult to provide a fixed list of terms for it, so it
would be great if the list could somehow be built up by the community.
Let's say that I want to map my database to this field: I could get a
list of proposed terms already being used, and if none of them satisfies
me then I can create my own. It is a little bit like tagging in a
controlled way. I love the del.icio.us example: they propose tags to you
and most of the time I use them, and because of this the data is much
more accessible, since the tags have not exploded. The opposite is what
is happening now in ABCD: everybody uses a different term for the same
thing and the unified data becomes useless.
There will be instances of classes in Tonto. I should have mentioned
that. From chatting to Rob Gales about it, this seems a good way of
doing controlled vocabularies. They will be extensible because Tonto
will always be capable of change. Also, in some languages (OWL etc.) you
could always define your own instances outside of Tonto - but it would
depend on how we do the coding for the XML Schema based renderings of
the semantics.
Populating drop-down menus with information out of Tonto when someone
is mapping a data source is the ultimate goal - the dream!
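To make the suggest-then-extend idea concrete, here is a minimal Python
sketch. It is not Tonto code: the ManagedVocabulary class, its method
names and the sample terms are all hypothetical, and it simply assumes
the hub can count how often each term has been used for a concept.

```python
from collections import Counter


class ManagedVocabulary:
    """Community-built term list for one free-text concept (hypothetical)."""

    def __init__(self, concept):
        self.concept = concept
        self._usage = Counter()  # normalised term -> times it has been used
        self._display = {}       # normalised term -> first spelling seen

    @staticmethod
    def _key(term):
        # Treat "Specimen", "specimen " and "SPECIMEN" as the same term.
        return " ".join(term.lower().split())

    def use(self, term):
        """Record that a provider used this term; return its canonical spelling."""
        key = self._key(term)
        self._display.setdefault(key, term.strip())
        self._usage[key] += 1
        return self._display[key]

    def suggest(self, prefix="", limit=5):
        """Propose existing terms, most used first, optionally filtered by prefix."""
        wanted = self._key(prefix)
        ranked = [k for k, _ in self._usage.most_common() if k.startswith(wanted)]
        return [self._display[k] for k in ranked[:limit]]


vocab = ManagedVocabulary("KindOfRecord")
for term in ["Specimen", "specimen", "Observation", "Specimen ", "Living culture"]:
    vocab.use(term)

print(vocab.suggest())       # existing terms, most used first
print(vocab.suggest("obs"))  # what a drop-down might show while typing
```

A mapping tool could call suggest() to fill the drop-down and use() only
when the person mapping accepts a term or adds a new one, which is what
keeps the list from exploding.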
-In the implementation section you say something like "Data Providers
must map their data to these views", referring to views from the
semantic hub. This is actually what we are trying to avoid. TAPIR was
created at the beginning with the vision of data providers mapping
their databases once and being accessible through different views that
are explicitly declared in the request. We have changed the name now and
we are calling them outputModels.
On the other hand, you know that WASABI and PyWrapper are now becoming
multi-protocol. That means that we want providers to map their
databases once and make the data available in different protocols.
The plan is that data mapping only has to be done to one of the views
that Tonto has onto its internal semantics then the other
views/representations could be used as outputModels (or custom output
models could be created by clients etc). The goal is definitely to only
map once - a single set of semantics - but then represent in multiple
ways. Tonto could provide a view of the semantics that a graphical tool
could then pick up to help someone build a mapping file - the dream...
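As a rough illustration of "map once, represent in multiple ways", the
Python sketch below keeps a single mapping from local columns to hub
concepts and then hands the result to interchangeable renderers. All of
the names (MAPPING, the column and concept names, the two renderers) are
assumptions made up for the example; real outputModels in TAPIR are
considerably richer than this.

```python
import xml.etree.ElementTree as ET

# The single mapping a provider writes: local column -> hub concept.
# Column and concept names are invented for illustration only.
MAPPING = {"sci_name": "ScientificName", "coll": "Collector"}


def to_concepts(row, mapping=MAPPING):
    """Translate one database row into hub concepts, dropping unmapped columns."""
    return {mapping[col]: value for col, value in row.items() if col in mapping}


def render_xml(record):
    """One possible representation: a flat XML element per concept."""
    root = ET.Element("Record")
    for concept, value in record.items():
        ET.SubElement(root, concept).text = value
    return ET.tostring(root, encoding="unicode")


def render_triples(record, subject="_:record"):
    """Another representation: (subject, property, value) triples."""
    return [(subject, concept, value) for concept, value in record.items()]


# Adding a representation means adding a renderer, not a new mapping.
OUTPUT_MODELS = {"xml": render_xml, "triples": render_triples}

row = {"sci_name": "Quercus robur", "coll": "R. Hyam", "internal_id": "7"}
record = to_concepts(row)
for name, render in OUTPUT_MODELS.items():
    print(name, render(record))
```

The point of the design is that the provider only ever touches MAPPING;
everything in OUTPUT_MODELS could in principle be generated from the hub.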
So, again, within TAPIR itself and within protocols we need something
like the semantic hub you are proposing. And we are doing it right now,
but in a very primitive way. I am working on the implementation of the
BioMOBY protocol inside PyWrapper. I have created a mapping file between
the TDWG schemas and the MOBY data types registry so that I can resolve
questions like:
-OK, if I have these TDWG concepts mapped, which MOBY services could I
create?
-How can I create these MOBY types using TDWG concepts?
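Those two questions can be sketched as simple set operations over such a
mapping file. The Python below is only an illustration under loud
assumptions: the MOBY type names, the concept names and the in-memory
MOBY_REQUIREMENTS table are invented here and are not the contents of
the actual mapping file or of the MOBY registry.

```python
# MOBY data type -> TDWG concepts needed to build it.
# Every name in this table is hypothetical.
MOBY_REQUIREMENTS = {
    "TaxonName": {"ScientificName"},
    "GeoPoint": {"DecimalLatitude", "DecimalLongitude"},
    "SpecimenRecord": {"ScientificName", "CatalogNumber", "Collector"},
}


def buildable_types(mapped_concepts, requirements=MOBY_REQUIREMENTS):
    """Which MOBY types can be produced from the concepts already mapped?"""
    mapped = set(mapped_concepts)
    return sorted(t for t, needed in requirements.items() if needed <= mapped)


def missing_concepts(moby_type, mapped_concepts, requirements=MOBY_REQUIREMENTS):
    """Which concepts still have to be mapped before this type can be built?"""
    return sorted(requirements[moby_type] - set(mapped_concepts))


mapped = ["ScientificName", "DecimalLatitude", "DecimalLongitude"]
print(buildable_types(mapped))
print(missing_concepts("SpecimenRecord", mapped))
```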
This is definitely where everyone is heading. The idea with the hub is
to start by clearly defining the semantic constructs we are going to
use (classes, properties, instances, literals, ranges) so that we can
be sure that we can represent the semantics in different
'representations'. It is no good using some UML or OWL construct that
doesn't have a good representation in XML Schema or GML for example.
This plan will not answer all the questions on the first day. The
existing schemas will need to be mapped into Tonto so they are
represented in a uniform way and this will take time - but months not
years I hope. It is not always clear in the existing schemas what the
classes and properties are, for example, so this is not an automatic
process but will need thinking through - and there are issues around
cardinality and multiple inheritance that will need to be discussed.
As I said, this is now being implemented in a simple flat file that
will be available on the Internet for all data providers, but I am not
accessing it as a service and I have to do all the handling on the
client. The semantic hub you are proposing is exactly what we need in
order to do this more properly.
I am glad to hear that. I hope that we can generate the file you need
to do the mapping automatically in future.
So... summarizing, from our side I can imagine now that we need:
-the semantic hub must expose the concepts in a way that we can use
them in our configuration tools in the data providers to allow mapping
a database there.
Just specify the way and we will make it do it.
-the semantic hub must expose the different views, or outputModels as
we call them in TAPIR, so that provider software can produce them.
Just specify them and we write a script to produce them from Tonto.
There is a full list of other requirements I would love to see there
that can be found in Dave Thau's work on a schema repository, do you
remember?
http://ww3.bgbm.org/schemarepowiki
In particular, there were things like:
-Give an XSLT that transforms ABCD 1.2 into 2.06
Ah ha! this is where I have to say no!
What I am proposing is that we have a central semantic model that can
be presented in a multitude of ways. This is very different
from a service that can automatically transform one existing schema to
another. That is far more ambitious and may not be possible in what
remains of human history. We should build the central semantic model
out of the existing schemas but that is a manual process where
decisions will have to be made about what was meant by the different
constructs in the existing schemas. Think about the different ways
inclusion, adjacency, cardinality and type extension/restriction are
used within XML Schema documents and what they 'mean' in terms of GML
feature types or RDFS classes and properties. In general there just
isn't a mapping.
When they built GML they started with the model of feature types and
properties and then decided how they would represent this using XML and
control it with XML Schema. We need to take a similar approach. Decide
on what our modeling technique is then decide how we will represent the
model in different technologies including GML and semantic web stuff.
A crazy analogy would be to say it is like getting a machine to write a
story in different languages based on the same plot. This might be
achievable because the plot can be encoded in some machine readable way
and the machine can just use rules to bolt together stored sentences.
Adding another language is just a matter of new output rules and new
stored sentences. It wouldn't produce great literature but it would
work.
This is very different from asking a machine to read a book in one
language, understand the plot and print out the story in a completely
different language. For a start there may be things expressed in the
first language for which there is no direct equivalent. Just ask a
babel fish!
I think this is the main way the Semantic Hub proposal differs from the
Schema Repository approach. I am trying to make something happen by
restricting the scope - if you remember the talk I gave on managing
scope and resources.
I hope this is OK.
-Give me labels for this concept
This should be easy. Internationalization won't be in the first
implementation but should be in subsequent versions.
-Give me examples of values for this concept
This should be easy but isn't the highest priority.
...
I hope to be half as clear as your document is :)
Sounds good. Thank you again for your input.
All the best,
Roger
Javier.
On 8/7/06, Roger Hyam <roger@tdwg.org> wrote:
Hi Everyone,
The TDWG Infrastructure Project team have a milestone/deliverable to
produce a document to act as a road map for the TDWG Architecture.
I have prepared a document that I believe pulls together the ideas and
consequences from the TAG-1 meeting and the two GUID meetings plus
various discussions held at the OGC meeting in Edinburgh. This document
introduces the notion of a 'Semantic Hub' which is a proposed mechanism
to expose TDWG semantics through multiple technologies: XML Schema;
GML; OWL; etc.
I would be most grateful if you could find time to have a look at this
six page document:
http://wiki.tdwg.org/twiki/bin/view/TAG/ArchitectureOverview
and give any feedback on the wiki page, via this list or directly to
me.
Many thanks,
Roger
--
-------------------------------------
Roger Hyam
Technical Architect
Taxonomic Databases Working Group
-------------------------------------
http://www.tdwg.org
roger@tdwg.org
+44 1578 722782
-------------------------------------