[tdwg-tag] TAG Road Map
roger at tdwg.org
Wed Aug 9 19:23:29 CEST 2006
Good to have your input.
This thread is getting a little difficult to read so I'll just pick out
a few bits then spin out some new threads.
Steve Perry wrote:
> First, a silly point. The term "controlled vocabulary" is used by
> some as a synonym for "ontology". Can we find or coin a new term for
> concepts with a range of enumerated values that isn't overloaded?
> I agree with Rob that any "concept" with enumerated values ought to
> have those values represented as instances with assigned GUIDs. Some
> values will be blessed by the fact that they are stored in the core
> model but people will be free to invent their own without breaking
> existing software or mapping rules (at the expense of losing
> interoperability with software that only understands the approved
Sounds good. A terminology to talk about terminology would be good. I
think "enumerated values" is quite a good one but am open to suggestions.
Any property with a range of instance of a class is really an
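As a rough illustration of the quoted suggestion (enumerated values held as instances with GUIDs, with unapproved values tolerated rather than rejected), here is a minimal Python sketch. The class name, the `urn:example:` identifiers and the rank values are all hypothetical placeholders, not anything agreed on this list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnumeratedValue:
    guid: str   # globally unique identifier for this value
    label: str  # human-readable label
    core: bool  # True if the value is part of the approved core model


# Hypothetical approved values; the urn:example scheme is a placeholder.
CORE_VALUES = {
    value.guid: value
    for value in (
        EnumeratedValue("urn:example:rank:species", "species", True),
        EnumeratedValue("urn:example:rank:genus", "genus", True),
    )
}


def resolve(guid: str) -> EnumeratedValue:
    """Return the approved value for a GUID, or an unapproved stand-in.

    Unknown GUIDs are accepted so that locally invented values do not
    break the software; they are only flagged as outside the core.
    """
    known = CORE_VALUES.get(guid)
    if known is not None:
        return known
    return EnumeratedValue(guid, guid, False)
```

Software that only understands the approved list can check the `core` flag and ignore everything else, which is where the loss of interoperability mentioned in the quote would show up.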
> First, since some of us are also involved in working to create data
> models that one day may be added to the semantic hub, is there a
> defined list of the common subset of modeling constructs (between UML,
> XSD, OWL, and RDFS) and suggestions about how to implement them? Are
> there discussions about what constructs will be dropped and the
> trade-offs of different implementations?
I am hoping to kick the discussion off. Donald made the point on an
earlier version of the road map document that I should expand on the
basic lists of constructs (I use the word construct as I can't think of
another term for the things we use to construct our ontology) but I
decided to leave it 'enigmatic' so it would kick off the debate.
> For example, it could be argued that N-ary associations could be
> implemented in RDFS and OWL (and perhaps in XML-Schemas that can
> describe directed labeled graphs through the use of GUIDs), but from
> the research community, the recommended implementation for N-ary
> associations in RDF-based systems is reification. As implementors of
> systems that work with RDF-based data, we feel that reification is not
> the way to go and that it may be better to drop support for N-ary
> associations rather than put in place a "flawed work-around" like
> reification. Just off the top of my head there are also issues with
> modeling arbitrary cardinality, cardinality on one or both sides of an
> association, primitive type mapping and data type promotion, modeling
> aggregation and/or composition (sequences and bags mean anonymous
> nodes which don't play nice in a system that uses GUIDs to name
> resources), and how to implement many of these modeling constructs in
> XSD which was designed to describe trees (not graphs) and has no
> built-in notion of global identifiers.
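To show what reification costs in practice, here is a small Python sketch that treats triples as plain tuples. The `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms are the standard RDF reification vocabulary; the `ex:` resources and the three-way "identification" association are hypothetical examples chosen only for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def reify(triple: Triple, statement_id: str) -> List[Triple]:
    """Describe one triple as an rdf:Statement resource."""
    subject, predicate, obj = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]


# Hypothetical association between a specimen, a taxon name and a person.
base = ("ex:specimen1", "ex:identifiedAs", "ex:taxonName1")

# The third participant can only be attached to the reified statement.
triples = (
    [base]
    + reify(base, "ex:stmt1")
    + [("ex:stmt1", "ex:identifiedBy", "ex:person1")]
)
```

One binary triple plus one extra participant becomes six triples, and the statement resource needs its own identifier, which is part of why implementors find the approach awkward.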
What I would like to do is start from the position of 0 constructs and
then add them in one at a time (stopping when we have the bare minimum
we need to communicate) rather than look for an intersection of known
target languages. This is because fewer constructs mean less
implementation, documentation etc. It also means that when another
language appears that we have to support, we are less likely to have
used something that it can't do. Rather contentiously I think it will also
make it easier for us to agree on models - easier for people to
understand and appreciate the alternative ways of modeling something
before settling on one.
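One way to picture the "add constructs one at a time" approach is a whitelist that every model is checked against. The Python sketch below is only an illustration under assumed names: the starting set of constructs, the dictionary layout and the example model are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical set of constructs agreed so far; it grows only by decision.
PERMITTED_CONSTRUCTS: Set[str] = {"class", "property"}


def unsupported_constructs(model: List[Dict[str, object]]) -> Set[str]:
    """Return the constructs a model uses that are not yet permitted."""
    used = {str(element["construct"]) for element in model}
    return used - PERMITTED_CONSTRUCTS


# Hypothetical model that relies on a construct not yet agreed.
model = [
    {"construct": "class", "name": "Specimen"},
    {"construct": "property", "name": "collector"},
    {"construct": "cardinality", "name": "collector", "max": 1},
]

rejected = unsupported_constructs(model)
```

Here `rejected` contains only `"cardinality"`, so the model would be sent back until that construct is either agreed or removed.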
I am just thinking about how to kick off the debate on constructs and
have started a wiki page here:
but running out of time today so will have to work on it tomorrow.
The big ones I'd like to start discussion with are cardinality and
multiple inheritance. I have good arguments for not having either (just
not sure how to state the inheritance ones). I'll maybe kick off with
an email on cardinality tomorrow. I tried it out on Jessie, Rob and
Robert last week and they seemed OK about it - but may have just been
The big question is "What is business/application logic and what is
> I don't mean to get bogged down in detail or to make trouble, but
> creation of the concrete data models (in XSD, RDFS, OWL, etc.) from
> the abstract semantic core will depend on sorting all these issues
> out. Is there a place where these discussions are happening and is
> there some way that implementors can feed back into the decisions made
> on these issues by the technical infrastructure group?
> Once they have been formalized, I think it may be important that these
> modeling recommendations be made available to the community in a
> document. One idea behind the semantic core is that it can grow over
> time. As the community models new areas of biodiversity informatics
> there has to be a way for the new data models to be incorporated into
> the semantic core (after being blessed by some TDWG body). This will
> be easier if the people creating new data models understand which
> modeling constructs they have available to them.
What I would like to think is that we will be able to model from within
Tonto for the basic agreed semantics. Application logic modeling can
happen outside but for the basic agreed semantics it has to happen
collaboratively and extend what is already there - or it isn't agreed.
> Is there a more complete description of Tonto functionality? I'm
> still not clear about what kind of interface it will provide to the
> network of services (if any). Is it correct to think of Tonto as a
> schema repository, a collaborative schema editor, or a tool that will
> be used by technical infrastructure group to generate concrete
> implementations of the semantic core in various typing systems (XSD,
> RDFS, OWL, etc)? It may be lack of sleep, but I took the wording in
> section 5.2 to suggest all three at different points.
The Semantic Hub as a notion will be a place where semantic stuff is
stored. Tonto will be an application to do 'ontology governance'.
Effectively a simple web application that allows the building of a
single, agreed ontology according to a set of rules (the constructs that
are permitted) and then exposes (writes files out some place) the
ontology in different technologies. The semantics are internal to Tonto
(controlled by its business rules) and exposed in flavours that are
All this is only possible by managing the scope of what is done. If we
extend the scope of what Tonto is supposed to do too far then it will
never work. The lessons of the Gene Ontology echo in my mind. They had
far too few constructs and they ended up with something that people
complain about. People complain about it because they are using it!
There are plenty of OWL ontologies out there that no one complains about
because no one is using them. I'd like to hit the happy medium.
Currently I am hacking together a first iteration of Tonto as I believe
it is easier (and quicker) to demonstrate the thing as a proof of
concept than try and formally describe it, justify it and then engineer
it properly. It is probably quicker to discover what is not possible
this way. I hope to have a version up for comment in a few weeks and
certainly for demo by October.
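The general shape described above (one internal model, written out in several flavours) can be sketched in a few lines of Python. This is not Tonto's actual design; the model layout, the function names, the `ex` prefix and the example class are assumptions made for illustration, and the output is fragments only (no prefix declarations, no `xs:schema` wrapper).

```python
from typing import Dict, List

# Hypothetical internal model: class name -> list of property names.
MODEL: Dict[str, List[str]] = {"Specimen": ["collector", "catalogNumber"]}


def to_rdfs(model: Dict[str, List[str]], prefix: str = "ex") -> str:
    """Write the model as Turtle-style RDFS statements."""
    lines = []
    for cls, props in model.items():
        lines.append(f"{prefix}:{cls} a rdfs:Class .")
        for prop in props:
            lines.append(
                f"{prefix}:{prop} a rdf:Property ; "
                f"rdfs:domain {prefix}:{cls} ."
            )
    return "\n".join(lines)


def to_xsd(model: Dict[str, List[str]]) -> str:
    """Write the same model as XML Schema element declarations."""
    lines = []
    for cls, props in model.items():
        lines.append(f'<xs:element name="{cls}">')
        lines.append("  <xs:complexType><xs:sequence>")
        for prop in props:
            # Every property is typed as a string in this sketch.
            lines.append(f'    <xs:element name="{prop}" type="xs:string"/>')
        lines.append("  </xs:sequence></xs:complexType>")
        lines.append("</xs:element>")
    return "\n".join(lines)
```

Because both writers read the same `MODEL`, a construct that one target cannot express would have to be kept out of the internal model, which is the scope argument made above.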
>> This should be easy but isn't in the highest priority.
> This could be difficult when the value of a concept is not a single
> string value, but one of several different types of objects.
You are right it could be difficult. In which case it should only be
done if we can't achieve our business goals without it - same applies to
Hope this helps.
I'll kick off a thread on why we shouldn't have cardinality tomorrow
morning my time - then you can pick it to pieces ;)
All the best,
Taxonomic Databases Working Group
roger at tdwg.org
+44 1578 722782