Hi Steve,
Good to have your input.
This thread is getting a little difficult to read, so I'll just pick out a few bits and then spin out some new threads.
Steve Perry wrote:
First, a silly point. The term "controlled vocabulary" is used by some as a synonym for "ontology". Can we find or coin a new term for concepts with a range of enumerated values that isn't overloaded?
I agree with Rob that any "concept" with enumerated values ought to have those values represented as instances with assigned GUIDs. Some values will be blessed by the fact that they are stored in the core model but people will be free to invent their own without breaking existing software or mapping rules (at the expense of losing interoperability with software that only understands the approved values).
Sounds good. A terminology for talking about terminology would be useful. I think 'enumerated values' is quite a good one, but I am open to suggestions. Any property whose range is the instances of a class is really an enumeration, though.
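In RDF terms I imagine it looking something like this - a sketch in Turtle, with the namespaces, class name and GUIDs all invented for illustration:

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix tdwg: <http://example.org/tdwg/terms#> .

    # The 'concept' is a class; each enumerated value is a first-class
    # instance with its own GUID.
    tdwg:SexTerm a rdfs:Class .

    <urn:lsid:example.org:terms:male>   a tdwg:SexTerm ; rdfs:label "male" .
    <urn:lsid:example.org:terms:female> a tdwg:SexTerm ; rdfs:label "female" .

    # A third party can mint its own value without breaking anything, at the
    # cost of interoperability with software that only knows approved values.
    <urn:lsid:myproject.org:terms:indeterminate> a tdwg:SexTerm .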
First, since some of us are also involved in working to create data models that one day may be added to the semantic hub, is there a defined list of the common subset of modeling constructs (between UML, XSD, OWL, and RDFS) and suggestions about how to implement them? Are there discussions about what constructs will be dropped and the trade-offs of different implementations?
I am hoping to kick that discussion off. Donald made the point, on an earlier version of the road map document, that I should expand on the basic list of constructs (I use the word 'construct' because I can't think of another term for the things we use to construct our ontology), but I decided to leave it 'enigmatic' so that it would provoke debate.
For example, it could be argued that N-ary associations could be implemented in RDFS and OWL (and perhaps in XML Schemas that can describe directed labeled graphs through the use of GUIDs), but the implementation recommended by the research community for N-ary associations in RDF-based systems is reification. As implementors of systems that work with RDF-based data, we feel that reification is not the way to go and that it may be better to drop support for N-ary associations than to put in place a "flawed work-around" like reification. Just off the top of my head, there are also issues with modeling arbitrary cardinality, cardinality on one or both sides of an association, primitive type mapping and data type promotion, and modeling aggregation and/or composition (sequences and bags mean anonymous nodes, which don't play nicely in a system that uses GUIDs to name resources), and with how to implement many of these modeling constructs in XSD, which was designed to describe trees (not graphs) and has no built-in notion of global identifiers.
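(As an aside for anyone on the list who hasn't met it, the reification pattern Steve is objecting to looks roughly like this - a sketch in Turtle, with all the names invented:

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix ex:  <http://example.org/ns#> .

    # The plain binary statement:
    ex:specimen1 ex:identifiedAs ex:taxon42 .

    # The reified version - four statements about the statement - which lets
    # us hang extra participants (who, when) off the association:
    ex:det1 a rdf:Statement ;
        rdf:subject   ex:specimen1 ;
        rdf:predicate ex:identifiedAs ;
        rdf:object    ex:taxon42 ;
        ex:identifiedBy      ex:person7 ;
        ex:determinationDate "1998-03-12" .

Every association costs at least four extra statements, and asserting the reification does not itself assert the original statement, which is part of why implementors dislike it.)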
What I would like to do is start from a position of zero constructs and then add them in one at a time (stopping when we have the bare minimum we need to communicate), rather than look for the intersection of the known target languages. Fewer constructs mean less implementation, documentation and so on, and when another language appears that we have to support, we are less likely to have used something it can't do. Rather contentiously, I think it will also make it easier for us to agree on models: easier for people to understand and appreciate the alternative ways of modeling something before settling on one.
I am just thinking about how to kick off the debate on constructs and have started a wiki page here:
http://wiki.tdwg.org/twiki/bin/view/TAG/TDWGOntologyConstructs
but I am running out of time today, so I will have to work on it tomorrow.
The big ones I'd like to start the discussion with are cardinality and multiple inheritance. I have good arguments for not having either (I'm just not sure how to state the inheritance ones). I'll maybe kick off with an email on cardinality tomorrow. I tried it out on Jessie, Rob and Robert last week and they seemed OK about it - but they may have just been bored!
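For context, the kind of statement I am suggesting we live without is the OWL cardinality restriction. A sketch in Turtle, with the class and property names invented:

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/ns#> .

    # 'Every specimen has exactly one collector' as an OWL restriction:
    ex:Specimen rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty  ex:collectedBy ;
        owl:cardinality "1"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>
    ] .

Whether a rule like that belongs in the shared semantics at all is exactly what I want to argue about.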
The big question is "What is business/application logic and what is semantics?"
I don't mean to get bogged down in detail or to make trouble, but creation of the concrete data models (in XSD, RDFS, OWL, etc.) from the abstract semantic core will depend on sorting all these issues out. Is there a place where these discussions are happening and is there some way that implementors can feed back into the decisions made on these issues by the technical infrastructure group? Once they have been formalized, I think it may be important that these modeling recommendations be made available to the community in a document. One idea behind the semantic core is that it can grow over time. As the community models new areas of biodiversity informatics there has to be a way for the new data models to be incorporated into the semantic core (after being blessed by some TDWG body). This will be easier if the people creating new data models understand which modeling constructs they have available to them.
What I would like to think is that the basic agreed semantics will be modeled from within Tonto. Application-logic modeling can happen outside, but the basic agreed semantics have to be modeled collaboratively, extending what is already there - or they aren't agreed.
Is there a more complete description of Tonto functionality? I'm still not clear about what kind of interface it will provide to the network of services (if any). Is it correct to think of Tonto as a schema repository, a collaborative schema editor, or a tool that will be used by technical infrastructure group to generate concrete implementations of the semantic core in various typing systems (XSD, RDFS, OWL, etc)? It may be lack of sleep, but I took the wording in section 5.2 to suggest all three at different points.
The Semantic Hub, as a notion, will be a place where semantic material is stored. Tonto will be an application for doing 'ontology governance': effectively a simple web application that allows the building of a single, agreed ontology according to a set of rules (the constructs that are permitted) and then exposes the ontology (writes files out somewhere) in different technologies. The semantics are internal to Tonto (controlled by its business rules) and are exposed in whatever flavours are required.
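To make 'flavours' concrete, the RDFS file that Tonto writes out might contain something like this - a sketch only, since the output format is not settled and the names are invented:

    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/ns#> .

    ex:Specimen a rdfs:Class ;
        rdfs:label   "Specimen" ;
        rdfs:comment "A physical object collected in the field." .

    ex:collectedBy a rdf:Property ;
        rdfs:domain ex:Specimen ;
        rdfs:range  ex:Agent .

The same internal definition would be written out again as an XML Schema, an OWL file and so on, each generated from the single agreed model.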
All this is only possible by managing the scope of what is done. If we extend the scope of what Tonto is supposed to do too far, it will never work. The lessons of the Gene Ontology echo in my mind. They had far too few constructs and ended up with something that people complain about - but people complain about it because they are using it! There are plenty of OWL ontologies out there that no one complains about because no one is using them. I'd like to hit the happy medium.
Currently I am hacking together a first iteration of Tonto, as I believe it is easier (and quicker) to demonstrate the thing as a proof of concept than to formally describe it, justify it and then engineer it properly. It is probably quicker to discover what is not possible this way. I hope to have a version up for comment in a few weeks, and certainly a demo by October.
This should be easy but isn't the highest priority.
This could be difficult when the value of a concept is not a single string value, but one of several different types of objects.
You are right, it could be difficult. In which case it should only be done if we can't achieve our business goals without it - the same applies to choosing constructs.
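To illustrate what 'not a single string value' means in practice - a sketch in Turtle, names invented:

    @prefix ex: <http://example.org/ns#> .

    # Easy case - the value is a plain string:
    ex:specimen1 ex:locality "10 km N of Nairobi" .

    # Harder case - the value is a structured object with its own type and GUID:
    ex:specimen2 ex:locality <urn:lsid:example.org:localities:99> .
    <urn:lsid:example.org:localities:99> a ex:GeoreferencedLocality ;
        ex:decimalLatitude  "-1.20" ;
        ex:decimalLongitude "36.80" .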
Hope this helps.
I'll kick off a thread on why we shouldn't have cardinality tomorrow morning my time - then you can pick it to pieces ;)
All the best,
Roger