Hi Steve,
Good to have your input.
This thread is getting a little difficult to read, so I'll just pick out a few bits and then spin out some new threads.
Steve Perry wrote:
First, a silly point. The term "controlled vocabulary" is used by some as a synonym for "ontology". Can we find or coin a new term for concepts with a range of enumerated values that isn't overloaded?
I agree with Rob that any "concept" with enumerated values ought to have those values represented as instances with assigned GUIDs. Some values will be blessed by the fact that they are stored in the core model but people will be free to invent their own without breaking existing software or mapping rules (at the expense of losing interoperability with software that only understands the approved values).
Sounds good. A terminology for talking about terminology would be useful. I think 'enumerated values' is quite a good one, but I am open to suggestions. Any property whose range is the instances of a class is really an enumeration, though.
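In RDF terms I imagine it looking something like this - a sketch in Turtle, with the namespaces, class name and GUIDs all invented for illustration:

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix tdwg: <http://example.org/tdwg/terms#> .

    # The 'concept' is a class; each enumerated value is a first-class
    # instance with its own GUID.
    tdwg:SexTerm a rdfs:Class .

    <urn:lsid:example.org:terms:male>   a tdwg:SexTerm ; rdfs:label "male" .
    <urn:lsid:example.org:terms:female> a tdwg:SexTerm ; rdfs:label "female" .

    # A third party can mint its own value without breaking anything, at the
    # cost of interoperability with software that only knows approved values.
    <urn:lsid:myproject.org:terms:indeterminate> a tdwg:SexTerm .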
First, since some of us are also involved in working to create data models that one day may be added to the semantic hub, is there a defined list of the common subset of modeling constructs (between UML, XSD, OWL, and RDFS) and suggestions about how to implement them? Are there discussions about what constructs will be dropped and the trade-offs of different implementations?
I am hoping to kick that discussion off. Donald made the point, on an earlier version of the road map document, that I should expand on the basic list of constructs (I use the word 'construct' because I can't think of another term for the things we use to construct our ontology), but I decided to leave it 'enigmatic' so that it would provoke debate.
For example, it could be argued that N-ary associations could be implemented in RDFS and OWL (and perhaps in XML Schemas that can describe directed labeled graphs through the use of GUIDs), but the implementation recommended by the research community for N-ary associations in RDF-based systems is reification. As implementors of systems that work with RDF-based data, we feel that reification is not the way to go and that it may be better to drop support for N-ary associations than to put in place a "flawed work-around" like reification. Just off the top of my head, there are also issues with modeling arbitrary cardinality, cardinality on one or both sides of an association, primitive type mapping and data type promotion, and modeling aggregation and/or composition (sequences and bags mean anonymous nodes, which don't play nicely in a system that uses GUIDs to name resources), and with how to implement many of these modeling constructs in XSD, which was designed to describe trees (not graphs) and has no built-in notion of global identifiers.
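(As an aside for anyone on the list who hasn't met it, the reification pattern Steve is objecting to looks roughly like this - a sketch in Turtle, with all the names invented:

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix ex:  <http://example.org/ns#> .

    # The plain binary statement:
    ex:specimen1 ex:identifiedAs ex:taxon42 .

    # The reified version - four statements about the statement - which lets
    # us hang extra participants (who, when) off the association:
    ex:det1 a rdf:Statement ;
        rdf:subject   ex:specimen1 ;
        rdf:predicate ex:identifiedAs ;
        rdf:object    ex:taxon42 ;
        ex:identifiedBy      ex:person7 ;
        ex:determinationDate "1998-03-12" .

Every association costs at least four extra statements, and asserting the reification does not itself assert the original statement, which is part of why implementors dislike it.)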
What I would like to do is start from a position of zero constructs and then add them in one at a time (stopping when we have the bare minimum we need to communicate), rather than look for the intersection of the known target languages. Fewer constructs mean less implementation, documentation and so on, and when another language appears that we have to support, we are less likely to have used something it can't do. Rather contentiously, I think it will also make it easier for us to agree on models: easier for people to understand and appreciate the alternative ways of modeling something before settling on one.
I am just thinking about how to kick off the debate on constructs and have started a wiki page here:
http://wiki.tdwg.org/twiki/bin/view/TAG/TDWGOntologyConstructs
but I am running out of time today, so I will have to work on it tomorrow.
The big ones I'd like to start the discussion with are cardinality and multiple inheritance. I have good arguments for not having either (I'm just not sure how to state the inheritance ones). I'll maybe kick off with an email on cardinality tomorrow. I tried it out on Jessie, Rob and Robert last week and they seemed OK about it - but they may have just been bored!
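For context, the kind of statement I am suggesting we live without is the OWL cardinality restriction. A sketch in Turtle, with the class and property names invented:

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/ns#> .

    # 'Every specimen has exactly one collector' as an OWL restriction:
    ex:Specimen rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty  ex:collectedBy ;
        owl:cardinality "1"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>
    ] .

Whether a rule like that belongs in the shared semantics at all is exactly what I want to argue about.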
The big question is "What is business/application logic and what is semantics?"
I don't mean to get bogged down in detail or to make trouble, but creation of the concrete data models (in XSD, RDFS, OWL, etc.) from the abstract semantic core will depend on sorting all these issues out. Is there a place where these discussions are happening and is there some way that implementors can feed back into the decisions made on these issues by the technical infrastructure group? Once they have been formalized, I think it may be important that these modeling recommendations be made available to the community in a document. One idea behind the semantic core is that it can grow over time. As the community models new areas of biodiversity informatics there has to be a way for the new data models to be incorporated into the semantic core (after being blessed by some TDWG body). This will be easier if the people creating new data models understand which modeling constructs they have available to them.
What I would like to think is that the basic agreed semantics will be modeled from within Tonto. Application-logic modeling can happen outside, but the basic agreed semantics have to be modeled collaboratively, extending what is already there - or they aren't agreed.
Is there a more complete description of Tonto functionality? I'm still not clear about what kind of interface it will provide to the network of services (if any). Is it correct to think of Tonto as a schema repository, a collaborative schema editor, or a tool that will be used by technical infrastructure group to generate concrete implementations of the semantic core in various typing systems (XSD, RDFS, OWL, etc)? It may be lack of sleep, but I took the wording in section 5.2 to suggest all three at different points.
The Semantic Hub, as a notion, will be a place where semantic material is stored. Tonto will be an application for doing 'ontology governance': effectively a simple web application that allows the building of a single, agreed ontology according to a set of rules (the constructs that are permitted) and then exposes the ontology (writes files out somewhere) in different technologies. The semantics are internal to Tonto (controlled by its business rules) and are exposed in whatever flavours are required.
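To make 'flavours' concrete, the RDFS file that Tonto writes out might contain something like this - a sketch only, since the output format is not settled and the names are invented:

    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/ns#> .

    ex:Specimen a rdfs:Class ;
        rdfs:label   "Specimen" ;
        rdfs:comment "A physical object collected in the field." .

    ex:collectedBy a rdf:Property ;
        rdfs:domain ex:Specimen ;
        rdfs:range  ex:Agent .

The same internal definition would be written out again as an XML Schema, an OWL file and so on, each generated from the single agreed model.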
All this is only possible by managing the scope of what is done. If we extend the scope of what Tonto is supposed to do too far, it will never work. The lessons of the Gene Ontology echo in my mind. They had far too few constructs and ended up with something that people complain about - but people complain about it because they are using it! There are plenty of OWL ontologies out there that no one complains about because no one is using them. I'd like to hit the happy medium.
Currently I am hacking together a first iteration of Tonto, as I believe it is easier (and quicker) to demonstrate the thing as a proof of concept than to formally describe it, justify it and then engineer it properly. It is probably quicker to discover what is not possible this way. I hope to have a version up for comment in a few weeks, and certainly a demo by October.
This should be easy but isn't the highest priority.
This could be difficult when the value of a concept is not a single string value, but one of several different types of objects.
You are right, it could be difficult. In which case it should only be done if we can't achieve our business goals without it - the same applies to choosing constructs.
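To illustrate what 'not a single string value' means in practice - a sketch in Turtle, names invented:

    @prefix ex: <http://example.org/ns#> .

    # Easy case - the value is a plain string:
    ex:specimen1 ex:locality "10 km N of Nairobi" .

    # Harder case - the value is a structured object with its own type and GUID:
    ex:specimen2 ex:locality <urn:lsid:example.org:localities:99> .
    <urn:lsid:example.org:localities:99> a ex:GeoreferencedLocality ;
        ex:decimalLatitude  "-1.20" ;
        ex:decimalLongitude "36.80" .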
Hope this helps.
I'll kick off a thread on why we shouldn't have cardinality tomorrow morning my time - then you can pick it to pieces ;)
All the best,
Roger