[tdwg-tapir] Mapping to CNS file

8 Mar 2007

Hi Folks,

I have been doing some thinking about mapping to the ontology.

We have a way of defining TAPIR concepts as paths through the  
ontology. I have written a script to generate a CNS file from a
particular view onto the ontology, i.e. starting at a root class and
following all the links in the RDF to extinction, but not allowing
recursion on any one path. It needs a little work to include the
common and external properties but seems to run OK.
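As an illustration only (this is not the actual script), the path generation could be sketched in Python as follows. It assumes a toy ontology held in a dictionary, hypothetical class and property names, '/'-separated paths, and that every path that cannot be extended any further becomes one concept.

```python
from typing import Dict, List, Tuple

# Hypothetical toy ontology: each class maps to its (property, target class) links.
ONTOLOGY: Dict[str, List[Tuple[str, str]]] = {
    "TaxonConcept": [("hasName", "TaxonName"), ("accordingTo", "Publication")],
    "TaxonName": [("hasBasionym", "TaxonName"), ("publishedIn", "Publication")],
    "Publication": [("title", "Literal")],
    "Literal": [],
}


def concept_paths(root: str, ontology: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Follow every link from root until it dies out, never visiting the same
    class twice on one path. Each path that cannot be extended is one concept."""
    paths: List[str] = []

    def walk(cls: str, path: List[str], seen: frozenset) -> None:
        extended = False
        for prop, target in ontology.get(cls, []):
            if target in seen:
                continue  # skip: the target class is already on this path
            extended = True
            walk(target, path + [prop, target], seen | {target})
        if not extended:
            paths.append("/".join(path))

    walk(root, [root], frozenset({root}))
    return paths
```

On this toy data, `concept_paths("TaxonConcept", ONTOLOGY)` yields two paths; the `hasBasionym` link is never followed because it would put `TaxonName` on the same path twice.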

I attach an example CNS if it will make it through the mail server.

How can we use such a CNS file to configure the configurators of the
wrappers, and is this desirable?

I believe the current configurators assume that the full name of the  
concepts is resolvable to the documentation for the concept. The  
concept paths generated from the ontology are not resolvable in this
way. I could write a script that generated a documentation page when  
it was passed one of these paths. Would this be a reasonable thing to  
do?
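A documentation page for a generated path could, for instance, be assembled from per-step descriptions. The sketch below assumes '/'-separated paths that alternate class and property names and a hypothetical `DOCS` lookup table; a web script would only need to pass the requested path to a function like this.

```python
from typing import Dict

# Hypothetical documentation snippets keyed by class or property name.
DOCS: Dict[str, str] = {
    "TaxonConcept": "A taxon as circumscribed by a particular author.",
    "hasName": "Links a taxon concept to the name it uses.",
    "TaxonName": "A scientific name.",
}


def document_path(path: str, docs: Dict[str, str]) -> str:
    """Build a plain-text documentation page for a '/'-separated concept path
    by looking up each step, which alternates between class and property."""
    steps = path.split("/")
    lines = [f"Concept: {path}", ""]
    for i, step in enumerate(steps):
        kind = "class" if i % 2 == 0 else "property"
        text = docs.get(step, "No documentation found.")
        lines.append(f"{i + 1}. {step} ({kind}): {text}")
    return "\n".join(lines)
```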

There are lots of concepts in this file (138), and it will get a lot
bigger when I add in the general properties. We really need a way of
turning the mapping process into one of browsing a hierarchy.
Could we have a pre-mapping phase where someone browses the ontology
and creates a list of the concepts they want to map? They could then
use this much shorter list in the configurator. Would this be more
productive? I have started mocking something up along these lines but
will abandon it if there is another way forward.
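The pre-mapping shortlist could be as simple as filtering the full concept list by the branches a user ticked while browsing. In this sketch the function name, the '/'-separated path format and the idea of selecting branches as path prefixes are all assumptions.

```python
from typing import Iterable, List


def shortlist(all_paths: Iterable[str], chosen: Iterable[str]) -> List[str]:
    """Keep only the concept paths that fall under a branch the user picked.
    A branch is a '/'-separated path prefix; matching is done step by step,
    so 'TaxonConcept/hasName' does not match 'TaxonConcept/hasNameX'."""
    chosen_steps = [c.split("/") for c in chosen]
    result: List[str] = []
    for path in all_paths:
        steps = path.split("/")
        if any(steps[: len(c)] == c for c in chosen_steps):
            result.append(path)
    return result
```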

I was thinking of writing a script to automate the production of
output model structures in much the same way as creating the CNS, but
I ran into an interesting problem with preventing recursion.

When building paths for the CNS, any one path can't include the same
class twice. That is easy to enforce when you are only thinking about
paths, but it becomes more complex when you are building a series of
XML Schemas.

A --includes--> B --includes--> C --includes--> A

This can be detected, and C can contain a link instead of the include:

A --includes--> B --includes--> C --linksTo--> A

So we avoid recursion when building the concepts in the CNS.
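A sketch of that include-or-link decision along a single path, assuming a hypothetical `INCLUDES` graph: any child that is already on the path from the root becomes a link rather than an include.

```python
from typing import Dict, List, Tuple

# Hypothetical include graph for the A -> B -> C -> A cycle.
INCLUDES: Dict[str, List[str]] = {"A": ["B"], "B": ["C"], "C": ["A"]}


def build(cls: str, graph: Dict[str, List[str]], path: Tuple[str, ...] = ()) -> dict:
    """Expand cls in place; a child already on the path from the root
    becomes a link instead of an include, which breaks the cycle."""
    on_path = path + (cls,)
    node: dict = {"class": cls, "includes": [], "linksTo": []}
    for child in graph.get(cls, []):
        if child in on_path:
            node["linksTo"].append(child)
        else:
            node["includes"].append(build(child, graph, on_path))
    return node


def edges(node: dict) -> List[Tuple[str, str, str]]:
    """Flatten the nested structure into (parent, relation, child) triples."""
    out: List[Tuple[str, str, str]] = []
    for sub in node["includes"]:
        out.append((node["class"], "includes", sub["class"]))
        out.extend(edges(sub))
    for target in node["linksTo"]:
        out.append((node["class"], "linksTo", target))
    return out
```

Here `edges(build("A", INCLUDES))` gives A includes B, B includes C, and C linksTo A, which is the second chain in the example.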

But if we have to consider multiple paths and possible loops, then we
end up in situations where we can't make the choices automatically,
for example if B can include C and C can include B:

A --includes--> B --includes--> C --linksTo--> B
A --includes--> C --includes--> B --linksTo--> C

There are two definitions of B here, one with a link to C and one
which includes C. We can't do this in XML Schema because we would need
two definitions of the same element in the same namespace. When you
reference the element in the schema for A, there would be no way to
tell which one you meant. Please correct me if I am wrong - I'd like to be.

This example is likely to arise more often than you might think, as
there are lots of circular links in the ontology. Just think of
synonymy, basionyms, hierarchies, etc.

So a decision still has to be made - by a human I guess - about which  
is the optimum combination of links and includes for a useful output  
structure.
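Detecting the clash, as opposed to resolving it, can be automated. The sketch below (hypothetical names, toy graph taken from the B/C example) expands every path from A, records each distinct include/link shape a class ends up with, and reports the classes that would need more than one definition in the same namespace. For this example it flags both B and C; choosing which definition to keep is still left to a human.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical graph: A includes B and C, and B and C can each include the other.
GRAPH: Dict[str, List[str]] = {"A": ["B", "C"], "B": ["C"], "C": ["B"]}

Signature = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (included, linked)


def definitions(root: str, graph: Dict[str, List[str]]) -> Dict[str, Set[Signature]]:
    """Expand the graph from root, turning any child already on the current
    path into a link, and record every distinct shape each class ends up with."""
    seen: Dict[str, Set[Signature]] = defaultdict(set)

    def visit(cls: str, path: Tuple[str, ...]) -> None:
        on_path = path + (cls,)
        children = graph.get(cls, [])
        included = tuple(c for c in children if c not in on_path)
        linked = tuple(c for c in children if c in on_path)
        seen[cls].add((included, linked))
        for child in included:
            visit(child, on_path)

    visit(root, ())
    return dict(seen)


def conflicts(root: str, graph: Dict[str, List[str]]) -> List[str]:
    """Classes that would need more than one definition in the same namespace."""
    return sorted(c for c, sigs in definitions(root, graph).items() if len(sigs) > 1)
```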

It makes me think a convention on namespace substitution is the  
easiest way forward for building output models - but that might be  
just because we are getting near the end of the day here....

What do you think?

Roger

Roger Hyam